3. Data Storage and Protection

As the “lifecycle of data” indicates, data generated in a research project is usually not discarded after the results are published. Instead, it is kept for future research and application. For example, the same data set might be analyzed with new methods to answer additional research questions, or it might be used by other research groups in different studies. Because of the potential value of existing data, researchers (especially research team leaders and PIs) should develop plans to evaluate, store, and retain the data after a research project is concluded. Ideally, planning for data preservation should take place at the beginning of a project. The plan should specify the methods, techniques, and devices that will be used for storing data, the period of data retention, and the measures that will be taken to ensure data security.

 

3.1 Methods of data storage

When choosing a method for storing research data, one should take into account the scope, format, and volume of the anticipated data as well as the intended length of storage. As we explain in Unit 1, research data includes a broad range of information, such as test results, samples, graphics, experimental procedures, and preliminary analyses. Given the limitations in space, equipment, and personnel, it is likely that only a portion of the “raw data” will be retained. After analyzing the data and publishing the results, a research group should collectively evaluate all the data sets generated in the project and decide which ones should be stored. The selection should be based on an assessment of different categories of data (e.g., test results vs. lab notes). One should not pick and choose the “best performing” subsets of data, because doing so will bias future analysis.

Based on its format, data can be categorized as physical or electronic. Physical data might include material samples, hardcopies of test results, lab notes, written analyses, etc. Material samples should be stored in appropriate containers and, where necessary, refrigerated; hardcopies of written data are usually compiled in binders and placed in secured cabinets. Electronic data includes any digital files of text, graphics, tables, and audio and video recordings. Electronic data can be stored on common digital storage devices, such as computer disks, portable hard drives, and CDs. Some data generated in the past is stored on tapes. More recently, some researchers have come to prefer storing their data in secured online space, a service that is often enabled by cloud-based technology. While cloud-based services provide convenient access to data, there are risks associated with storing data on the cloud. We discuss this issue later in Data Security.

The choice of medium for data storage should also take into account its durability. For example, CDs are easy to copy and transport, but their durability is limited. We should also consider whether a storage device will remain compatible with new digital technology as it develops, so that the data can still be read in the future. To prevent accidental loss, it is good practice to retain multiple copies of important data sets. These copies might be stored on different media for different purposes. For example, CDs might be used for transporting data to other locations, and hard drives can be stored in locked cabinets for long-term preservation. For researchers who deal with large volumes of data in multiple formats, it is essential to properly index the stored data, so that a stored data set can be easily located by current as well as future members of the research group.
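
For electronic data, indexing and checking copies can be partly automated. The following is a minimal sketch in Python, not a prescribed procedure: it lists every file in a storage folder with its size and a SHA-256 checksum, saves that list as a CSV index, and compares the index of a second copy against the first. The function names (build_index, write_index, find_mismatches) and the column names are our own illustrative choices, and the sketch assumes that each copy is an ordinary folder on a disk or drive.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root):
    """List every file under root with its relative path, size, and checksum."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "relative_path": path.relative_to(root).as_posix(),
                "size_bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return rows


def write_index(rows, index_file):
    """Save the index as a CSV file that can be kept alongside the data."""
    fields = ["relative_path", "size_bytes", "sha256"]
    with open(index_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)


def find_mismatches(primary_rows, copy_rows):
    """Return relative paths that are missing from, or differ in, the copy."""
    copy_sums = {row["relative_path"]: row["sha256"] for row in copy_rows}
    return [row["relative_path"] for row in primary_rows
            if copy_sums.get(row["relative_path"]) != row["sha256"]]
```

As an illustration with hypothetical folder names, a group might call build_index("project_data") on the primary copy and build_index("backup_drive/project_data") on a backup, save the first result with write_index, and pass both results to find_mismatches. An empty list means that every file in the primary copy has an identical counterpart in the backup. An index of this kind records only where files are and whether they are intact; a description of what each data set contains still has to be written by the researchers.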