2. The Lifecycle of Data: an Overview of Ethical Data Management

2.1 The research process and relevant actors

In the Introduction of this tutorial, we presented the lifecycle of research data, consisting of four interrelated stages. Each stage involves particular responsibilities for different actors to manage research data ethically. Planning for data should take place during the research design. Data planning provides crucial opportunities for a research team to build consensus and establish overall guidelines for handling data during the entire research process. Usually the PIs should take the lead in the planning but also encourage the whole team to participate. The stage of generating data is critical for the quality of the research. Graduate students and technicians who are responsible for data collection should follow proper protocols and best practices in the discipline. PIs and senior researchers should also take part in regular reviews of data generated from experiments and observations. Raw data usually needs to be processed by analysts and statisticians. This stage often involves ethical questions regarding appropriate ways of analyzing and representing the data. Finally, the results of data analysis will be used in supporting or refuting arguments, and very often, partial or entire datasets will be shared with other researchers and the public through publications. The use and sharing of data raise important questions about who has what kind of rights over the data. In many cases, research data will be preserved for future research or development, so it is important to specify who is responsible for preserving the data after a research project is completed.


Research that cannot be replicated: The STAP cells scandal

In 2014, Haruko Obokata became one of the most talked-about stem cell researchers in the world. Obokata earned her Bachelor of Science and Master of Science degrees in Applied Chemistry from Waseda University and continued there for a Ph.D. focusing on stem cell research. During her Ph.D. studies, Obokata spent two years in the U.S. doing research in a lab led by Harvard professor Charles Vacanti, who first proposed the idea of stimulus-triggered acquisition of pluripotency (STAP) cells. The theory behind STAP cells suggests that under proper stimulation (e.g., stress), ordinary body cells can be converted back to stem cells, which are capable of growing into any type of cell in the body.

In 2011, Obokata received her Ph.D. degree in chemical engineering from Waseda University, and the renowned Riken Center for Developmental Biology hastened to hire her to lead a laboratory that focused on making STAP cells. In January 2014, Obokata and her collaborators published two articles in Nature claiming that they had found a simple way of making STAP cells: bathing body cells in a weakly acidic liquid. Because this discovery represented a significant breakthrough in stem cell research and showed great business potential, the world soon greeted Obokata as one of the stars of stem cell research.

However, Obokata’s success was short-lived. Shortly after her publications in Nature, readers found that some images in the articles had been inappropriately manipulated and some text had been copied from previous publications. An investigation undertaken by the Riken Center concluded that Obokata had committed research misconduct. In July 2014, Nature retracted the two articles. Meanwhile, a more disturbing problem loomed large: no other lab was able to replicate Obokata’s results following the method she had reported. It became increasingly doubtful whether Obokata had successfully produced STAP cells at all. In March 2014, Vacanti, Obokata’s mentor in the U.S. and a co-author of the Nature articles, claimed that STAP cells are easy to make and posted a protocol for making them on the website of his lab, but no one else was able to produce STAP cells following his protocol either. The Riken Center required Obokata to lead a team to reproduce her experiments in a monitored environment. After several months of work, Obokata announced that she was not able to reproduce the results, and she resigned from her position at Riken. A subsequent investigation by Riken concluded that the stem cells identified as STAP cells in Obokata’s research were actually embryonic stem cells that had been taken from elsewhere.

The STAP cells research scandal was followed by a chain of severe consequences. On August 5, 2014, Yoshiki Sasai, Deputy Director of the Riken Center and Obokata’s supervisor, died by suicide. In March 2015, Nobel Prize winner Ryoji Noyori resigned from his position as the president of Riken. In November 2015, Waseda University revoked Obokata’s Ph.D. degree. It is estimated that the Riken Center spent about 145 million Japanese yen (about 1.33 million US dollars) on the failed STAP cells research.


  1. John Rasko and Carl Power, “What pushes scientists to lie? The disturbing but familiar story of Haruko Obokata.” The Guardian. Feb 18, 2015.
  2. A series of reports is also available in The Japan Times.


Questions for Case Analysis

  1. Which persons and institutions are accountable for the STAP cells scandal?
  2. What actions should the leaders of Riken take in response to the outbreak of the STAP cells scandal?


2.2 Relevant ethical concepts

Some basic concepts from ethical theory will help us clearly state and examine the ethical problems associated with the lifecycle of research data, such as the replicability of data in the case above. In this tutorial, we will repeatedly draw on four ethical concepts: integrity, rights, impact, and epistemic norms.

Integrity



Figure 3. Immanuel Kant. (Public Domain)

Oxforddictionaries.com defines “integrity” as 1) the quality of being honest and having strong moral principles; and 2) the state of being whole and undivided. These two definitions nicely illustrate the meanings of “integrity” in data management: integrity refers to the character of the researcher as well as the quality of the data. Philosopher Immanuel Kant (Figure 3) reminds us that we have an ethical duty to be honest all the time, and under no circumstance should we abandon this duty in order to advance a particular interest. Accordingly, researchers have an ethical duty to truthfully report their research findings and should never deceive the audience. While this principle seems simple and clear, it is sometimes a challenge to maintain integrity in research. The second definition of integrity, “being whole and undivided,” relates to what researchers call “data integrity.” Giffels, Vollmer, and Bird (2010) define data integrity as the accuracy and reliability of data; they explain that data integrity “encompasses a broad range of topics relevant to the collection, selection, interpretation, storage, and distribution of data.” In science and engineering research, data integrity is sometimes referred to as “replicability.” According to a popular guide to responsible conduct of research, On Being a Scientist, “researchers have a fundamental obligation to create and maintain an accurate, accessible, and permanent record of what they have done in sufficient detail for others to check and replicate their work” (Committee on Science, Engineering, and Public Policy, 2009).

Rights



When dealing with data, we should be careful about potential infringement of others’ rights. The word “right” appears frequently in our political, social, and economic lives. Examples include human rights, the right of free speech, property rights, etc. We can understand a right as a special form of freedom: owners of a right have the freedom to conduct certain activities, while everyone else has a duty to honor and protect that freedom. For example, I have a right to my own writings, so I can share them with anyone I wish. Meanwhile, other people have a duty not to share my writings without my permission. In principle, people have a right to information about themselves; i.e., they should control whether and how such information is collected, shared, and used. Under certain circumstances, people may consent to give up or transfer this right to others; for example, we consent to let security personnel look at our luggage at an airport. However, we should be very careful about when a right is transferred and to whom it is transferred, for confusion about these questions can raise ethical issues. For example, drivers consent to be filmed by speed cameras installed by the traffic authority when they apply for licenses. In this case, the right to film the drivers is transferred to the traffic authority, not to everyone. Therefore, a researcher who films the traffic flow at a busy intersection could potentially violate the rights of the passing drivers and passengers. Even when the research subjects consent to have their personal information collected, there remain questions with regard to who should have a right to use the research findings, and who has a right to the potential benefits (including profit) generated from the research. As we will see in Unit 4, our rights to information about ourselves are challenged by emerging technologies of data handling, such as big data analysis.

Impact



To understand the impact of research as an ethical concept, we might start by asking “Why do we do research?” One possible answer is: we do research to satisfy our intellectual curiosity. Indeed, many researchers are driven by the enthusiasm to ask questions and find new answers. However, curiosity alone may not always justify the cost of research, especially when research has become increasingly expensive and relies very often on external funding. Therefore, an additional reason for conducting and supporting research might be the economic and social benefits that result from the research findings, especially benefits to those who fund the research. The emphasis on benefits may sound familiar to engineering researchers, as engineers often engage in “applied research” that seeks to improve sociotechnical realities. Hence, when assessing the ethical implications of a research project, we should ask: What are the positive and negative impacts of the research? How do we maximize the positive impact while minimizing the negative? Once again, these questions seem simple and clear in principle, but answering them in the real context of research may not always be straightforward.

To begin with, how should we identify the objects (people, living species, the environment, etc.) that are impacted by our research? Imagine a research project in which blood samples from cancer patients are collected to test the effects of a newly designed radiation therapy device. During each stage in the lifecycle of this research, a variety of “stakeholders” will be impacted in different ways. In the data planning stage, the researchers need to decide which population should be included in the research; for example, whether samples will be collected from patient groups evenly distributed by age, gender, and ethnicity. The choice of sample population could impact which subgroup’s diseases receive more attention from the researchers. Furthermore, the way data generation is organized and implemented also impacts patients, their families, and other medical practitioners. For example, will the patients donate their blood samples at a single research facility or at local hospitals? Will the sample providers receive adequate treatment during the research? Will they receive compensation for participating in the research? Obviously, the processing and analysis of the research data will have a huge impact on the success of the new radiation therapy device, which in turn impacts the financial prospects of the company that developed the device as well as the career advancement of the researchers. Finally, the application and sharing of the research results will impact who gets the new treatment and the cost for patients, their families, and insurance companies. The storage or disposal of the blood samples will impact future researchers, the privacy of patients, the health of involved medical practitioners, and the environment. In many cases the impact of research is complex and uncertain. Therefore, besides identifying the various stakeholders and how they are affected, researchers also need to make difficult decisions in order to balance the positive and negative impacts.


Epistemic norms

Epistemic norms are principles and standards related to questions like “what is true” and “how do we know it.” Widely shared in the research community, epistemic norms help ensure the quality of research. Examples of common epistemic norms include rigor, objectivity, and robustness. Because epistemic norms specify not only criteria for solid research but also proper behaviors in the research community, following these norms becomes a technical as well as an ethical matter. For example, while “integrity” describes the quality of research data from a technical standpoint, it also reflects an ethical quality of the researcher: honesty. In order to uphold common epistemic norms in managing research data, it is important to recognize and avoid research practices that are broadly considered “questionable.” Questionable research practices do not amount to blatant research misconduct; however, researchers who engage in these practices fail to uphold important values widely shared in the research community (Pascal, 2006). For example, the National Academy of Sciences names the following questionable research practices with regard to data management:

  • Failing to retain significant research data for a reasonable period;
  • Maintaining inadequate research records, especially for results that are published or are relied on by others;
  • Refusing to give peers reasonable access to unique research materials or data that support published papers;
  • Using inappropriate statistical or other methods of measurement to enhance the significance of research findings;
  • Misrepresenting speculations as fact or releasing preliminary research results, especially in the public media, without providing sufficient data to allow peers to judge the validity of the results or to reproduce the experiments (cited in Pascal, 2006).

Some of these practices can be attributed to “slackness,” “lack of courtesy,” or “lousy research,” but they also raise questions about the researchers’ ethical standards.

