Archiving

Archiving is a critical part of data management, but it is often forgotten. Ideally, you should start thinking about archiving during data collection.

What is data archiving?

It is common to think of archiving as the last link in the data management chain: the point at which you are done with your research and want to "box up" your data and move on to other things. Unfortunately, many projects do not set aside time and resources for archiving. As a result, a lot of data is left in storage and never archived, which means it cannot be reused. Storing is not the same as archiving!

What to archive?

The goal should be to collect data of such high quality that it is publishable and shareable. In many cases, however, some of the collected data should be discarded for one or more reasons. It is therefore up to the researcher to evaluate which data to archive.

As a general rule of thumb, all articles should be published with supporting data. This is also becoming the standard in many journals. Just be careful about the license that you select (or that is forced upon you).

It is also generally smart to release your data as a dataset, or as a database containing several datasets. If you do this properly (see below), the dataset may also be citable.

Where to archive?

This is the big question. Until now, many researchers have simply left their data on our internal servers. That is not an archive. It is becoming increasingly common for journals to require that the data used in a paper be deposited in a specific archive. This is a good idea since it creates a link between the article and the data. However, it is not a satisfactory solution for long-term archiving. For example, in many cases only a subset of the dataset will be deposited. In addition to such depositing, you should think about archiving your complete data properly and according to the FAIR principles.

UiO does not have its own data archive but relies on researchers to use other archives. Below we go through some of the alternatives.

The NSD archive

The preferred solution, for now, is to use NSD's archive. The benefit of depositing data with NSD is that a professional data curator will check the data before it is deposited. This ensures that the data files are readable and that the metadata are understandable. It is a free service for UiO researchers. Data deposited with NSD will be FAIR, but not necessarily open. If a dataset is closed, other researchers will need to contact NSD to find out whether they can get access.

Field-specific archives

Instead of, or in addition to, using the NSD archive, it may be relevant to use a field-specific archive. One relevant archive for motion capture (mocap) data is Physionet. It also has professional data curators who review the data. An added benefit of this archive is that its curators know mocap data, so the data curation is more detailed than at NSD.

"Data journals" are now also emerging, such as the Research Data Journal for the Humanities and Social Sciences. Here, data is published after a peer-review process.

General-purpose archives

Several archives are open to all sorts of data:

  • Open Science Framework. This is part of a larger suite of tools for Open Science practice. It is backed by a US-based organization, but has servers in Germany that comply with GDPR.
  • Zenodo. This is a pure data archive run by CERN and funded by the EU.

If you go for a general-purpose archive, remember that there is no quality control of the data. The data may therefore not fully comply with the FAIR principles.

Please be aware that other general-purpose archives may be commercial and/or may not comply with GDPR. You may be required to archive your data in such an archive as part of a publication process. If so, be careful about which type of license is used. You should also still archive the data in one or more free and open repositories.

Web page

For some time, it has been common to "archive" data on your own web pages. Remember that putting your data on a web page does not comply with the FAIR principles. However, it may be a good idea to make a landing page for your data. For example, the Oslo Standstill Database page describes the different datasets it contains, with pointers to the data stored with NSD and Physionet.

Archiving Checklist

  • Start to think about archiving as part of the data collection
  • Think about the FAIR principles when storing the data. This includes adding proper metadata and the use of open formats.
  • If you are asked to archive data with an article: do it, but do not give away your copyright
  • If you don't know where to archive: deposit in the NSD archive
  • If you have a good field-specific archive, go for it
  • Create a UiO landing page that points to your archive(s)
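
As an illustration of the metadata and open-format points in the checklist, the following is a minimal sketch of how a dataset folder could be described before deposit. It is not a procedure required by NSD or any other archive. The function names, the manifest layout, and the list of "open" file extensions are all assumptions made for this example and should be adapted to the archive you use.

```python
import hashlib
import json
from pathlib import Path

# Assumption: an illustrative and incomplete set of open file formats.
OPEN_FORMATS = {".csv", ".tsv", ".txt", ".json", ".wav", ".png", ".pdf"}


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path, metadata: dict) -> dict:
    """List every file in dataset_dir together with descriptive metadata."""
    files = []
    for path in sorted(dataset_dir.rglob("*")):
        if path.is_file():
            files.append({
                "path": path.relative_to(dataset_dir).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
                # Flags files whose extension is not in the assumed open list.
                "open_format": path.suffix.lower() in OPEN_FORMATS,
            })
    return {"metadata": metadata, "files": files}


def write_manifest(dataset_dir: Path, metadata: dict, out_file: Path) -> dict:
    """Build the manifest and save it as a JSON file."""
    manifest = build_manifest(dataset_dir, metadata)
    out_file.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

A call such as write_manifest(Path("my_dataset"), {"title": "...", "license": "..."}, Path("manifest.json")) would produce a file that can accompany the deposit. Writing the manifest outside the dataset folder keeps it from being listed in itself on a later run. Files flagged with "open_format": false are candidates for conversion before archiving.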

Published Jan. 26, 2021 10:46 AM - Last modified Nov. 14, 2023 1:59 PM