
The Feds’ “Vast DNA Database”

A few reporters have inquired about the Texas tale concerning “hundreds of dried blood samples [shipped] to the federal government to help build a vast DNA database–a forensics tool designed to identify missing persons and crack cold cases.” [1] Earlier postings (on March 4 and 15) explained that, contrary to the impression created in a series of Texas Tribune stories, the Guthrie cards were not destined for a database like the FBI’s National DNA Index System (NDIS), which matches DNA recovered from crime scenes against stored DNA profiles from convicted offenders or arrestees.

This was not to say that, as a matter of public policy and research ethics, the state Department of Health Services should have released its Guthrie cards for any research without first securing parental consent. That is an issue about which reasonable minds can and do differ. In judging the propriety of these decisions, however, it is vital to understand the true privacy risks that the dissemination of the samples entailed. Indeed, inasmuch as plaintiffs are demanding that the Armed Forces DNA Identification Laboratory (AFDIL) return the cards (which would lead to their incineration under Texas’s current policy), the question is crucial to the continuing tussle in Texas. If there is a substantial privacy risk, the demand obviously has more merit than if there is no such risk. Furthermore, because plaintiffs also want the researchers to extirpate the DNA sequence data in AFDIL’s anonymized research database, the issue is of national concern. The research database, although small, is important to a fair presentation of mtDNA matches in all criminal cases and to the correct interpretation of these findings during an investigation. This posting therefore provides an assessment of some of the risks that children conceivably could face from the presence of their mtDNA sequences in the research database and from their Guthrie cards in the laboratory’s files.

I. Use of the AFDIL Research Database

The population-genetics database for mitotypes poses a rather remote threat to the Texas newborns. The FBI explained how such databases work in 1999:

The FBI Laboratory, the Armed Forces DNA Identification Laboratory, and other laboratories have collaborated to compile a mtDNA population database. … The database is referred to as the SWGDAM (Scientific Working Group on DNA Analysis Methods) database. It contains sequences from four main racial groups: Caucasians, Africans, Hispanics, and Asians. Most of these samples have been obtained from paternity-testing laboratories, blood banks, or academic groups studying ethnic populations. The database currently contains 2,426 mtDNA sequences from unrelated individuals. However, the database is updated frequently and is constantly growing. …

When a sequence from a questioned sample [one found at a crime-scene] and a known [suspect’s] sample is the same, the SWGDAM database is searched for this sequence. … The FBI Laboratory lists the number of observations of a sequence in each racial subgroup of the database in a report of a mtDNA examination. For example, a sequence might be seen five times in the database samples of Caucasian descent and one time in the database samples of Hispanic descent yet not appear in the remaining database subgroups. …

[¶] Most of the sequences in the forensic mtDNA database occur a single time (approximately 60 percent), and the total number of mtDNA sequences in the entire human population is not known. Reliable frequency estimates for most mtDNA sequences are therefore not possible [because] small databases are not effective tools for estimating frequencies of rare events.

[¶] However, statistical methods exist for calculating an upper-bound estimate of the frequency of mtDNA types with zero occurrences or very few occurrences in a database of limited size. This upper-bound estimate describes the highest frequency expected for a particular mtDNA sequence using the database. … As the database grows in size, the frequency estimates for individual mtDNA profiles will become more and more refined and eventually lead to reliable population frequency estimates. [2]
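
The upper-bound idea in the quoted passage can be made concrete with a short calculation. The sketch below is only an illustration and assumes the simplest textbook approach, which may differ from the method the FBI Laboratory actually uses: if a mitotype has true frequency p, the chance of seeing it zero times in N independent samples is (1 - p)^N, so a 95% upper bound is the value of p at which that chance drops to 5%. The function name and the 95% confidence level are assumptions made for illustration; only the figure of 2,426 sequences comes from the quotation.

```python
def upper_bound_unobserved(n: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the frequency of a type seen 0 times in n samples.

    Solves (1 - p)**n = alpha for p, where alpha = 1 - confidence.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


# 2,426 is the database size reported in the quoted 1999 description.
bound = upper_bound_unobserved(2426)
print(f"upper bound on frequency: {bound:.5f}")
print(f"roughly 1 in {1 / bound:.0f}")
```

Under these assumptions, a mitotype never seen in a database of that size could still be carried by somewhat more than one person in a thousand, which is why a small reference database supports only cautious statements about rarity.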

When used to estimate the frequency of a mitotype in a crime-scene sample that matches a defendant in a criminal case, the population-genetics database poses no risk that a Texas baby will be accused of a crime–correctly or otherwise. But one can imagine a different scenario: Suppose that Inspector Javert is pursuing the perpetrator of a horrific crime in Texas. The usual suspects have excellent alibis. A search of the Texas DNA database of convicted offenders draws a blank. A search of NDIS also comes up negative. Javert, who never gives up, takes the mtDNA sequence from the crime scene and compares it to the sequences in the anonymized population-genetics database. Voila! He finds a match.

Javert demands that the custodians of the population reference database tell him whether it came from the 800 or so Texas samples. It does. Now he could go to the state health department to obtain the names of the 800 suspects (if such records still exist). Or better, if the cards supplied to AFDIL retained their original numbers and if the health department retained the numbers linked to personally identifying information, then Javert could find his way to one family, and he could investigate whether anyone in that maternal lineage could be the culprit. There might be other families with the same mitotype, but Javert at least would have found a lead.

Although the Javert scenario is fictional, it is not impossible. Even anonymized population reference mtDNA databases could lead the police to a family in some situations. But it’s a stretch.

II. Genetic Discrimination

Every tissue repository contains human biological material that could be tested for genetic markers or predictors of various diseases. Most states have laws to prevent insurance companies and employers from conducting such genetic tests or using their results, and the recent federal Genetic Information Nondiscrimination Act (GINA) provides comprehensive national protection as well. Considering that the AFDIL research samples came with no names attached to them, the risk that the children will face “genetic discrimination” if the cards are retained seems very small indeed.

III. Leakage into NDIS

The National DNA Index System used to find “cold hits” to convicted offenders in criminal investigations contains over 7,000,000 STR profiles. It is possible to extract STR profiles from the Texas cards. But adding these 800 or so STR profiles to the database would violate the federal law establishing the Combined DNA Index System (CODIS). Furthermore, lacking identifying information, the database administrators would not find them terribly useful. A hit from an unsolved crime to one of these profiles would mean that one of 800 Texas babies has grown up to deposit DNA at a crime scene. Our Inspector Javert then could find his way to a single suspect–if the cards supplied to AFDIL retained their original numbers, if the illegal NDIS record kept track of this number, and if the health department retained the numbers linked to personally identifying information. Still, the full scenario–that AFDIL researchers would supply the cards to the FBI, that the FBI would analyze the STRs and illegally add them to the operational database, and that a hit in the database would lead to a suspect–is strained.

IV. Leakage into the National Missing Persons Database

The FBI maintains a National Missing Persons database, also known as CODIS(mp). [3] When a child is missing, a family member with the same mitotype (anyone in the same maternal lineage) can supply a DNA sample for mtDNA sequencing. The mitotype will be kept in the missing persons database to be checked against mtDNA extracted from unidentified human remains that come to the attention of the police. A hit between the mtDNA from the remains and the family member’s DNA serves to identify the remains as the reported missing person’s. It would make little sense to include several hundred de-identified samples from Texas newborns in this database.


I would not claim that the retention of the de-identified Texas samples or the presence of the anonymized mtDNA sequences in the population-genetics research database poses absolutely no risk of someday implicating today’s newborns in a criminal investigation. But the pertinent scenarios seem farfetched. The real population genetics database bears little resemblance to “a vast DNA database” for finding missing persons and solving criminal cases. The risk it poses to the Texas newborns and their families is minimal.


[1] Emily Ramshaw, DNA Deception, Texas Tribune, Feb. 22, 2010, available at http://www.texastribune.org/stories/2010/feb/22/dna-deception/ (last viewed March 2, 2010)

[2] Alice R. Isenberg & Jodi M. Moore, Mitochondrial DNA Analysis at the FBI Laboratory, Forensic Sci. Commun., Vol. 1, No. 2 (July 1999), available at http://www.pocketexpert.net/files/U.pdf (last viewed April 10, 2010)

[3] Nancy Ritter, Missing Persons and Unidentified Remains: The Nation’s Silent Mass Disaster, NIJ Journal, No. 256 (2007), available at http://www.ojp.usdoj.gov/nij/journals/256/missing-persons.html (last viewed April 10, 2010)

© 2010 David H. Kaye

Up in Smoke: 5 Million Neonatal Blood Samples Incinerated

An effort to avoid the pointless destruction of the millions of Guthrie cards maintained by the Texas Department of Health Services has come to naught. Plaintiffs who sued the department as well as privacy advocates initially were open to the idea of preserving all the cards with neonatal bloodspots for future research while seeking consent for their storage from millions of parents. [1]

However, this was not to be. After the state quickly settled the dubious lawsuit, an enterprising but inadequately informed journalist published Internet stories alleging that the department had turned “over hundreds of dried blood samples to the federal government to help build a vast DNA database–a forensics tool designed to identify missing persons and crack cold cases” and that the samples “were forwarded along to the federal government to create a vast DNA database, one that could help crack cold cases and identify missing persons.” [2] In her latest installment of this tall tale she continues to write that the samples will “help identify missing persons and crack cold cases.” [1]

The suggestion that the U.S. military is using the samples to build a database to “crack cold cases” or to identify “missing persons” in this country is preposterous. To summarize a previous posting [2]: First, the research project is limited to mitochondrial DNA, which rarely is used in forensic investigations because it is not capable of providing specific identification. Second, AFDIL does not maintain any databases of DNA profiles to crack cold cases. Third, even if AFDIL were authorized to maintain a database of civilian DNA profiles for criminal investigations, a collection of nameless mtDNA sequences from de-identified samples would be pretty useless. Finally, the true purpose of the research is clear from the “Federal MtDNA Paper” posted on the Tribune’s website. The AFDIL paper explains that the research database, which cannot be used to identify individuals, simply allows geneticists to put estimates of random-match probabilities for mtDNA on a sounder footing. These estimates are necessary to understand the probative value of an mtDNA match in any criminal investigation or trial. They have nothing in particular to do with cold hits or missing persons.

In sum, the research database has virtually no meaningful privacy implications. Some parents might not want their children’s blood samples used to improve the criminal justice system, but that alone is not much of a reason to destroy what the article calls a medical “treasure trove.” The children’s DNA is not going into any military or law-enforcement database for tracking down missing persons or cracking cold cases.

Yet, this fear apparently was the monkey wrench that jammed the effort to preserve the samples while seeking consent. Here are some excerpts from the latest news as described by the same journalist:

[T]he Department of State Health Services . . . agreed in December to destroy the blood spots, after a civil rights attorney and several Texas parents sued the state for storing them for research purposes without permission. But after the court settlement was signed, privacy advocates lobbied the agency for an alternate solution: a research database that would keep the blood spots intact while seeking electronic consent from parents. They got the go-ahead from some key lawmakers and from the lawsuit’s plaintiffs, who pledged to void the settlement, but not from DSHS.

When The Texas Tribune discovered last month that state health officials had turned hundreds of baby blood spots over to a federal Armed Forces lab between 2003 and 2007 to build a mitochondrial DNA database . . . any chance for saving the blood spots fizzled out. All 5 million blood spots were sent to a Houston-area incinerator last week.

“If there was any way the blood spots were going to be saved, the whole thing fell apart at that point,” said state Sen. Bob Deuell, R-Greenville, . . . “When this came out about these specimens going to the military, I said, ‘We’ve lost this one.'”

. . . State health officials say Austin-based national patient privacy advocate Deborah Peel and Deuell, a physician, approached DSHS Commissioner David Lakey early this year about using electronic consents to save the 5 million existing blood spots from destruction. The agency reviewed the idea but never pursued it. . . .

Critics say . . . DSHS conveniently settled the lawsuit before the trial went to the discovery phase, meaning the documents on the federal DNA study were never disclosed to the plaintiffs. (The Tribune obtained the documents on the federal project — designed to build a forensics tool to help identify missing persons and crack cold cases — through Texas open-records laws.) “Unfortunately, that of course confirmed the plaintiffs’ worst fears,” said Peel, founder of the nonprofit advocacy group Patient Privacy Rights.

Peel said the state’s decision not to seek a non-destructive solution is a shame. . . . “We were going to … reach out to those 5 million families and let them know they had an alternative to having their blood spots destroyed,” Peel said. . . .

Deuell said the impression he got from state health officials was that they feared they would be subject to litigation from other parents if they negotiated with the plaintiffs not to destroy the blood spots. . . . “They said, ‘The plaintiffs are just three people out of 5 million. Who’s to say somebody else wouldn’t come back and file a new suit?'”

Harrington [plaintiffs’ attorney] said that worry is “utter nonsense.” He said both sides could have gone back to the judge to have a new settlement drafted — one that would’ve protected the agency. “What’s the harm in that?” Harrington asked. “We would have supplemented or amended the settlement. It would have been totally possible.”

But once news broke that some of the blood spots had been turned over to the federal lab — and that the state had no intention of destroying those samples — the plaintiffs’ offer was off the table. Instead, they have demanded that the state get the blood spots back from the federal government, or they’ll file another lawsuit. . . . [1]

Maybe another lawsuit would be a good thing. With competent lawyering and journalism, the people of Texas finally might realize that none of their children’s DNA has found its way into any DNA database for identifying anyone.


[1] Emily Ramshaw, DNA Destruction, Tex. Tribune, http://www.texastribune.org/stories/2010/mar/09/blood-drive/

[2] David H. Kaye, A Texas Tall Tale of “DNA Deception,” Double Helix Law, Mar. 4, 2010, https://sites.psu.edu/dhlaw/2010/03/04/a-texas-tall-tale-of-dna-deception/

A Texas Tall Tale of “DNA Deception”

A “non-profit, nonpartisan public media organization,” the Texas Tribune [1] broke a story that is bound to attract national outrage. The story goes like this. Texas, like every other state, pricks the heels of newborn children for a blood sample. It screens these samples for rare metabolic and genetic diseases and stores spots of blood on a card for each child. As the March of Dimes explains, “[w]hen test results show that the baby has a birth defect, early diagnosis and treatment can make the difference between lifelong disabilities and healthy development.” [2]

As these “Guthrie cards” began to accumulate, it became clear that they might be useful for medical research. In 1994, law professor Jean McEwen and doctor-lawyer Phil Reilly called them “inchoate databases” and found that many laboratories were open to the idea of sharing them — in anonymized form — for research that would benefit the public. [3]

The Texas State Department of Health Services did exactly this. It provided medical researchers with de-identified Guthrie cards to study “the gene involved in club foot, to inspect the DNA of infants who develop childhood cancer, [and] to examine prenatal lead exposure.” [4] For its efforts, the department was sued. It had treated the cards as free for the taking, without going back to every pair of parents to obtain explicit permission to release their (nameless) child’s blood spots. Although the claim is a huge jump from any existing case law, and even though the legally cognizable damages suffered by any parent whose unknown child’s blood spot made its way to a laboratory are obscure, five plaintiffs alleged violations of the Fourth Amendment, the Texas Constitution, and the common law. On their behalf and seeking to represent a much larger class of plaintiffs, the Texas Civil Rights Project sought declaratory and injunctive relief. [5]

The case promptly settled. The state agreed to destroy millions of cards, to give parents clearer procedures to opt out of the storage of the cards, and to pay $26,000 in attorneys’ fees and costs.

There things might have stayed — but for a journalist’s “review of nine years’ worth of e-mails and internal documents on the Department of State Health Services’ newborn blood screening program.” [4] She found that the state had concealed its involvement in a nefarious and far-reaching military or law-enforcement project. The Texas doctors had turned “over hundreds of dried blood samples to the federal government to help build a vast DNA database — a forensics tool designed to identify missing persons and crack cold cases.” [4] The samples, she repeated, “were forwarded along to the federal government to create a vast DNA database, one that could help crack cold cases and identify missing persons.” [6] The database would be shared worldwide, “for international law enforcement and investigation in the context of homeland security and anti-terrorism efforts.” [4]

Incensed, the lawyer for the five plaintiffs fired off a letter to the governor and the attorney-general. He accused the “TDSHS [of] supplying those blood samples taken from newborn babies to the military, not just for research, but so that the military can build a mitochondria DNA data base, which can be used in part for law enforcement purposes.” [6] He complained that “[t]his … alarming development … raises the specter of the federal government building an international DNA data base,” and he demanded that “within ten (10) days of this letter, you retrieve from the federal government all the blood samples that Texas has sent to the U.S. military and retrieve and destroy all information taken from those samples … .” [6] Indeed, he suddenly realized that this military project was why the state was so willing to settle the case: “‘Sometimes there are slam-dunk cases, but I’d never seen this kind of case settle without discovery,’ says [Jim] Harrington, director of the Texas Civil Rights Project. ‘This explains the mystery of why they gave up so fast.'” [4]

The trouble is that it’s all smoke and no fire. The reporter and the lawyer apparently have misread the report of the Armed Forces DNA Identification Laboratory (AFDIL) detailing its efforts to collect and study mitochondrial DNA (mtDNA) from varied people and places. As explained in Chapter 11 of The Double Helix and the Law of Evidence, AFDIL is a world leader in mitochondrial DNA sequencing because the technique is exceedingly valuable in identifying the remains of soldiers missing in action. [7] But mtDNA is not used to “crack cold cases,” at least not by generating cold hits in any law-enforcement database of DNA profiles from possible offenders. The national database (NDIS) maintained by the FBI, the one that actually helps in cracking cold cases, is limited to STR profiles in the DNA from the cell nucleus. These DNA sequences are wonderful for discriminating among individuals. When a 13-locus match from a crime-scene sample to one of the more than seven million profiles in NDIS pops up, it can constitute a practically conclusive identification of a known individual. [7] And, the bigger NDIS is, the more likely it is that the culprit will be in it. This kind of database is “only as valuable as its … size.” [4]

Not so with mtDNA. Everyone in the same maternal line shares the same sequence, and other essentially unrelated maternal lineages might have the same sequences. [7] Moreover, it would be inane to put anonymous sequences — nuclear or mitochondrial — into the database used in searching for cold hits. A hit from a crime-scene sample to a profile from a Guthrie card with no name attached to it would have little or no investigative value. The (nameless) Texas children need not fear being swept up in criminal or terrorist investigations because AFDIL sequenced their anonymous DNA.

But if the federal government does not want the samples for a database that will be used to catch criminals or terrorists, what nefarious international database are these profiles going into? Prosaically, they are part of a scientific, population-genetics database that will be helpful in understanding the significance of a match in an ordinary criminal case. Consider State v. Ware, the very first case with mtDNA evidence. Hairs were found in the bed where a young girl was attacked. [7, chap. 12] The hairs looked similar to the defendant’s under a microscope, but there have been false convictions with hairs that happen to look similar. (Just check with the Innocence Project.) Nuclear DNA, which could yield well-nigh conclusive results, was absent in the hair shafts, but there were enough mitochondria to get a useful sequence, and this sequence matched the defendant’s. [7]

Because mtDNA just does not have the power of nuclear DNA to differentiate among individuals, however, defense counsel in such cases can object (appropriately) that the evidence is confusing or misleading without statistics on how rare the mitotype in question would be in the general population. How many people would be falsely incriminated by the mtDNA sequence in the case?

By understanding the variations in the mtDNA sequences in different places and populations, scientists can estimate how rare or how common a mitotype that incriminates a suspect might be. Such estimates require reference databases, but the existing forensic-statistical-reference databases, defense counsel and a number of scientists have argued, are too small and full of gaps in the population groups represented. [7] Indeed, the federal government has received considerable flak from the media and a vocal group of scientists, lawyers, and sundry others for its refusal to supply de-identified nuclear-DNA profiles from law-enforcement databases for new studies to supplement the existing statistical-reference databases long used to estimate the probability of random STR-profile matches in criminal cases. [9, 10]
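
To see why database size matters for these estimates, consider a rough sketch of how an upper confidence bound on a mitotype’s frequency tightens as the reference database grows. It assumes a standard one-sided exact binomial (Clopper-Pearson) bound found by bisection, which is offered only as a plausible illustration and not as the method any laboratory actually uses. The function names, the 95% confidence level, and every database size other than 2,426 (the size the FBI reported in 1999) are hypothetical.

```python
import math


def binom_cdf(x: int, n: int, p: float) -> float:
    """Probability of observing at most x matches in n samples at frequency p."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def upper_bound(x: int, n: int, confidence: float = 0.95) -> float:
    """One-sided exact upper bound on frequency after x observations in n samples."""
    alpha = 1.0 - confidence
    lo, hi = x / n, 1.0
    # binom_cdf decreases as p increases, so bisect for the p where it equals alpha.
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


# 2,426 is the reported 1999 database size; the larger sizes are hypothetical.
for n in (2426, 10000, 50000):
    print(f"N = {n:>6}: mitotype never observed -> frequency at most {upper_bound(0, n):.5f}")
```

With zero observations the bound reduces to 1 - 0.05^(1/N), so on these assumptions a larger database shrinks the ceiling on how common an unobserved mitotype could be, which is the practical sense in which a small database is too small.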

In sum, the AFDIL study is a response to a legitimate scientific and legal concern. The federal government (as it should) wants to improve the infrastructure for using mtDNA evidence in court by enlarging the statistical-reference databases. Thus, the AFDIL report (the supposed smoking gun posted on the Tribune’s website) is entitled “Development and Expansion of High-quality Control Region Databases to Improve Forensic mtDNA Evidence Interpretation.” [8] As the title indicates, these scientific databases do not generate DNA evidence. They “improve” the “interpretation” of mtDNA evidence from other sources. The very first sentence of the report makes it plain that the databases are for statistical purposes only:

Mitochondrial DNA testing in the forensic context requires appropriate, high-quality population databases for estimating the rarity of questioned haplotypes. However, large forensic mtDNA databases, which adhere to strict guidelines in terms of their generation and maintenance, are not widely available for many regional populations of the United States or most global populations outside of the United States and Western Europe.

After elaborating, the report continues:

In order to address this issue, the Armed Forces DNA Identification Lab (AFDIL) has undertaken a high-throughput control region databasing effort. … Global populations that are currently underrepresented in available forensic mtDNA databases will comprise approximately 25% of the total number of samples. The remaining individuals will represent regional samples of various U.S. populations and global populations that contribute to the overall mtDNA diversity of the U.S. The high-quality mtDNA data generated from these efforts will be publicly available to permit examination of regional mtDNA substructure and admixture, and ultimately to improve our ability to interpret mtDNA evidence.

This population-genetics study is entirely different from building a huge database of mitotypes to generate cold hits. MtDNA does not work well for this purpose, and even if the FBI wanted to do it, anonymous data from AFDIL would be useless. All that those data can do is help investigators, judges and juries better assess the results of a match to a known suspect or defendant. Suggestions that neonatal samples are being put into databases that could result in the unknowing “donors” being swept up in future investigations of crime or terrorism are troubling — but not because they are true.


[1] Texas Tribune, About the Texas Tribune, http://www.texastribune.org/about/

[2] March of Dimes, Newborn Screening Tests, Mar. 2008, http://www.marchofdimes.com/pnhec/298_834.asp

[3] J. E. McEwen & P. R. Reilly, Stored Guthrie Cards as DNA “Banks,” 55 Am. J. Human Genetics 196-200 (1994), available at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1918213/

[4] Emily Ramshaw, DNA Deception, Texas Tribune, Feb. 22, 2010, available at http://www.texastribune.org/stories/2010/feb/22/dna-deception/, last viewed, March 2, 2010

[5] Beleno v. State Dep’t of Health Serv., Civ. No. SA09CA1088 (W.D. Tex. Mar. 12, 2009) (complaint)

[6] Emily Ramshaw, TribBlog: AG’s Office Fires Back at Blood Spot Attorney, Feb. 22, 2010, available at http://www.texastribune.org/blogs/post/2010/feb/22/tribblog-attorney-asks-perry-get-dna-back-feds/, last viewed, March 2, 2010

[7] David H. Kaye, The Double Helix and the Law of Evidence (2010)

[8] Jodi A. Irwin et al., Development and Expansion of High-quality Control Region Databases to Improve Forensic mtDNA Evidence Interpretation, 1 Forensic Sci. Int’l: Genetics 154-157 (2007)

[9] David H. Kaye, Trawling DNA Databases for Partial Matches: What Is the FBI Afraid Of?, 19 Cornell J. L. & Public Pol’y 145-171 (2009)

[10] D. E. Krane et al., Time for DNA Disclosure, 326 Science 1631-1632 (2009), DOI: 10.1126/science.326.5960.1631

© 2010 David H. Kaye