1. Sources of Data

Two things are at the heart of ethical data generation: the quality of data and the way in which data is generated. First, researchers who are involved in data generation have a duty to produce and record accurate, adequate, and verifiable data. Second, when human subjects and other living organisms are involved in the production of research data, researchers have an ethical responsibility to protect the rights of the human and living nonhuman research subjects. To develop a more comprehensive understanding of ethical issues in research data generation, let’s first look at the various sources of research data.

It is good practice to think about the potential sources of data when you brainstorm research ideas or formulate new research questions. Just as the tools available to us sometimes help define who we are as humans, the availability of data and the difficulty of collecting it often shape the direction of a research project in important ways. In many cases research data comes from three sources: existing data, observations, and experiments. Consider existing data first. Sometimes a research project generates data that is beyond the scope of the researchers’ current analysis, and the “surplus” data might be used to answer different sets of research questions. A published data set collected during previous research might also be used for other research purposes. Remember the “lifecycle” of research data? The cycle reminds us that data rarely ends in the “using, sharing, and preserving” bulb. Instead, it is often reused in new research initiatives and thus repeats its lifecycle. Using existing data is often more cost-effective than collecting new data; therefore, in the planning stage of a research project, one should always examine the possibility of reusing existing data. Of course, if the data is owned by another research group or by the sponsor of a previous research project, you will need to acquire appropriate permissions from the owners of the data, sometimes with payment, before using it in your own research.

New data could come from observational or experimental studies. In an observational study the researchers collect data without applying treatment to the research subjects (Hill, Reiter, and Zanutto, 2004). For example, if we want to estimate how many vehicles pass an intersection during certain hours of the day, we could send a team of researchers, or set up a camera, to count the number of passing vehicles during those hours for several days and then calculate an average. Because the observer in principle does not alter the dynamics of the research subjects, observational data might be considered representative of the natural occurrence; i.e., the data reflects what actually happens in the day-to-day environment. However, we should caution that observational data is not the same as the natural occurrence: the mere presence of an observer might change the dynamics of the scene being observed. Many of us can recall traffic slowing down on a highway when a police car is present, even if the police do not take any action. Therefore, when collecting data, we should be aware of the possible impact our presence might have on the integrity of the data.
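The averaging step in the intersection example can be sketched in a few lines of Python. This is only an illustration: the daily tallies and the function name `average_daily_count` are hypothetical, not data or code from any actual study, and the sketch assumes each count covers the same hours on each day.

```python
from statistics import mean

# Hypothetical tallies: vehicles counted during the same hours on each day.
daily_counts = {
    "Mon": 412,
    "Tue": 398,
    "Wed": 405,
    "Thu": 421,
    "Fri": 389,
}


def average_daily_count(counts):
    """Return the mean number of vehicles per observed day."""
    if not counts:
        raise ValueError("at least one day of observations is required")
    return mean(counts.values())


average = average_daily_count(daily_counts)
print(f"Average vehicles per observation window: {average:.1f}")
```

Keeping the per-day tallies, rather than recording only the final average, leaves the raw observations available for later verification or reuse.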

Experimentation is another common avenue for generating and collecting research data. Experimentalists usually create an environment in which they manipulate certain independent variables and document how the outcomes measured on the research objects (the dependent variables) change accordingly. For example, if we want data about how drivers behave under ineffective regulators (e.g., traffic lights), we might intervene in the functioning of traffic lights (e.g., making a light permanently red) and record how long drivers wait before driving through the red light. Experimentation in a controlled environment, such as a research lab, grants easier access to data, because the researchers can decide when to run the experiments and how the data can be isolated from “noise.”

 

Person conducting an experiment in a lab.

Figure 2 Experimentation Is an Important Source of Data.
(Genetics Research by Penn State News / CC BY-NC-ND 2.0)

 

We sometimes take for granted that the mechanisms of data generation (e.g., existing data, observations, and experimentation) are the “sources” of data, in part because such mechanisms often indicate the credibility of acquired data. What’s more, big data technologies have dramatically enhanced researchers’ ability to turn almost every sphere of our physical and social worlds into quantifiable and computerized data, a process Mayer-Schönberger and Cukier (2013) call “datafication.” However, ethical reflection should help us recognize that the real “sources” of data are not the tools or methods for data collection but the subjects or objects from which information is extracted. Therefore, in the study of drivers under ineffective regulators, the source of the data is the drivers. More generally, the sources of research data in most cases are materials, humans, living nonhuman organisms, and systems. For many engineering researchers, materials are a familiar source of data: they test the strength of concrete, observe the structure of polymers, or document the conductivity of ceramics. Ensuring the integrity of data is one important ethical concern when generating data from materials. In addition, researchers who work with materials might consider other ethical issues, such as the cost, safety (for the lab crew and for end users), and environmental impact of the materials.

Humans are a common source of research data, even though human subjects are sometimes involved in data generation only indirectly. For example, in the earlier example where we count the number of vehicles passing an intersection, we are actually collecting data about the drivers. Collecting data from human subjects is a sensitive issue, and it often raises a wide range of ethical, legal, and technical questions. In Section 3 of this unit we will discuss some relevant ethical values, regulations, and recommended practices for using human subjects in research. Some key points include respecting the rights of the research subjects and acquiring consent before collecting data. In addition, although we are not able to acquire consent from living nonhuman organisms, there are important ethical values and regulations that protect the rights of these research subjects.

Sometimes research data is generated not from individual subjects but from systems. An example would be the testing of the response rate of software. Traditionally the research community does not grant nonliving systems the same ethical rights as living organisms. However, research in the past few years in fields like robot ethics has started to raise questions about our ethical obligations to artifacts, such as robots (Coeckelbergh, 2010).