
The Feds’ “Vast DNA Database”

A few reporters have inquired about the Texas tale concerning “hundreds of dried blood samples [shipped] to the federal government to help build a vast DNA database–a forensics tool designed to identify missing persons and crack cold cases.” [1] Earlier postings (on March 4 and 15) explained that contrary to the impression created in a series of Texas Tribune stories, the Guthrie cards were not destined for a data base like the FBI’s National DNA Index System (NDIS) that matches DNA recovered from crime scenes against stored DNA profiles from convicted offenders or arrestees.

This was not to say that, as a matter of public policy and research ethics, the Texas Department of State Health Services should have released its Guthrie cards for any research without first securing parental consent. That is an issue about which reasonable minds can and do differ. In judging the propriety of these decisions, however, it is vital to understand the true privacy risks that the dissemination of the samples entailed. Indeed, inasmuch as plaintiffs are demanding that the Armed Forces DNA Identification Laboratory (AFDIL) return the cards (which would lead to their incineration under Texas’s current policy), the question is crucial to the continuing tussle in Texas. If there is a substantial privacy risk, the demand obviously has more merit than if there is no such risk. Furthermore, because plaintiffs also want the researchers to extirpate the DNA sequence data in their anonymized research database, the issue is of national concern. The research database, although small, is important to a fair presentation of mtDNA matches in all criminal cases and to the correct interpretation of these findings during an investigation. This posting therefore provides an assessment of some of the risks that children conceivably could face from the presence of their mtDNA sequences in the research database and from their Guthrie cards in the laboratory’s files.

I. Use of the AFDIL Research Database

The population-genetics database for mitotypes poses a rather remote threat to the Texas newborns. The FBI explained how such databases work in 1999:

The FBI Laboratory, the Armed Forces DNA Identification Laboratory, and other laboratories have collaborated to compile a mtDNA population database. … The database is referred to as the SWGDAM (Scientific Working Group on DNA Analysis Methods) database. It contains sequences from four main racial groups: Caucasians, Africans, Hispanics, and Asians. Most of these samples have been obtained from paternity-testing laboratories, blood banks, or academic groups studying ethnic populations. The database currently contains 2,426 mtDNA sequences from unrelated individuals. However, the database is updated frequently and is constantly growing. …

When a sequence from a questioned sample [one found at a crime-scene] and a known [suspect’s] sample is the same, the SWGDAM database is searched for this sequence. … The FBI Laboratory lists the number of observations of a sequence in each racial subgroup of the database in a report of a mtDNA examination. For example, a sequence might be seen five times in the database samples of Caucasian descent and one time in the database samples of Hispanic descent yet not appear in the remaining database subgroups. …

[¶] Most of the sequences in the forensic mtDNA database occur a single time (approximately 60 percent), and the total number of mtDNA sequences in the entire human population is not known. Reliable frequency estimates for most mtDNA sequences are therefore not possible [because] small databases are not effective tools for estimating frequencies of rare events.

[¶] However, statistical methods exist for calculating an upper-bound estimate of the frequency of mtDNA types with zero occurrences or very few occurrences in a database of limited size. This upper-bound estimate describes the highest frequency expected for a particular mtDNA sequence using the database. … As the database grows in size, the frequency estimates for individual mtDNA profiles will become more and more refined and eventually lead to reliable population frequency estimates. [2]
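The quoted passage does not spell out the "statistical methods" it has in mind. As a rough illustration only, the sketch below uses two textbook formulas that are commonly applied to this kind of problem: a one-sided confidence bound for a type never seen in the database, and a normal-approximation bound for a type seen a few times. The function names, the 95 percent confidence level, and the choice of formulas are assumptions made for this example; they are not taken from the FBI article.

```python
import math


def upper_bound_zero_observations(n, confidence=0.95):
    """Upper bound on the frequency of a type NOT seen in n samples.

    Assumed formula: 1 - alpha**(1/n), where alpha = 1 - confidence.
    """
    if n <= 0:
        raise ValueError("database size must be positive")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


def upper_bound_observed(count, n, z=1.96):
    """Normal-approximation upper bound for a type seen `count` times in n samples.

    Assumed formula: p + z * sqrt(p * (1 - p) / n), with p = count / n.
    """
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("need 0 <= count <= n and n > 0")
    p = count / n
    return p + z * math.sqrt(p * (1.0 - p) / n)


# Database size quoted in the FBI passage above.
N = 2426
print(f"type never seen:      upper bound {upper_bound_zero_observations(N):.5f}")
print(f"type seen five times: upper bound {upper_bound_observed(5, N):.5f}")
```

With the 2,426 sequences mentioned in the quotation, a mitotype that never appears in the database gets an upper bound of roughly 0.12 percent under these assumptions. Enlarging the database tightens that bound, which is why the size and coverage of the reference database matter.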

When used to estimate the frequency of a mitotype in a crime-scene sample that matches a defendant in a criminal case, the population-genetics database poses no risk that a Texas baby will be accused of a crime–correctly or otherwise. But one can imagine a different scenario: Suppose that Inspector Javert is pursuing the perpetrator of a horrific crime in Texas. The usual suspects have excellent alibis. A search of the Texas DNA database of convicted offenders draws a blank. A search of NDIS also comes up negative. Javert, who never gives up, takes the mtDNA sequence from the crime scene and compares it to the sequences in the anonymized population-genetics database. Voila! He finds a match.

Javert demands that the custodians of the population reference database tell him whether the matching sequence came from the 800 or so Texas samples. It did. Now he could go to the state health department to obtain the names of the 800 suspects (if such records still exist). Or better, if the cards supplied to AFDIL retained their original numbers and if the health department retained the numbers linked to personally identifying information, then Javert could find his way to one family, and he could investigate whether anyone in that maternal lineage could be the culprit. There might be other families with the same mitotype, but Javert at least would have found a lead.

Although the Javert scenario is fictional, it is not impossible. Even anonymized population reference mtDNA databases could lead the police to a family in some situations. But it’s a stretch.

II. Genetic Discrimination

Every tissue repository contains human biological material that could be tested for genetic markers or predictors of various diseases. Most states have laws to prevent insurance companies and employers from conducting such genetic tests or using their results, and the recent federal Genetic Information Nondiscrimination Act (GINA) provides comprehensive national protection as well. Considering that the AFDIL research samples came with no names attached to them, the risk that the children will face “genetic discrimination” if the cards are retained seems very small indeed.

III. Leakage into NDIS

The National DNA Index System used to find “cold hits” to convicted offenders in criminal investigations contains over 7,000,000 STR profiles. It is possible to extract STR profiles from the Texas cards. But adding these 800 or so STR profiles to the database would violate the federal law establishing the Combined DNA Index System (CODIS). Furthermore, lacking identifying information, the database administrators would not find them terribly useful. A hit from an unsolved crime to one of these profiles would mean that one of 800 Texas babies has grown up to deposit DNA at a crime scene. Our Inspector Javert then could find his way to a single suspect, but only if the cards supplied to AFDIL retained their original numbers, if the illegal NDIS record kept track of this number, and if the health department retained the numbers linked to personally identifying information. Still, the full scenario (that AFDIL researchers would supply the cards to the FBI, that the FBI would analyze the STRs and illegally add them to the operational database, and that a hit in the database would lead to a suspect) is strained.

IV. Leakage into the National Missing Persons Database

The FBI maintains a National Missing Persons database, also known as CODIS(mp). [3] When a child is missing, a family member with the same mitotype (anyone in the same maternal lineage) can supply a DNA sample for mtDNA sequencing. The mitotype will be kept in the missing persons database to be checked against mtDNA extracted from unidentified human remains that come to the attention of the police. A hit between the mtDNA from the remains and the family member’s DNA serves to identify the remains as the reported missing person’s. It would make little sense to include several hundred de-identified samples from Texas newborns in this database.


I would not claim that the retention of the de-identified Texas samples or the presence of the anonymized mtDNA sequences in the population-genetics research database poses absolutely no risk of someday implicating today’s newborns in a criminal investigation. But the pertinent scenarios seem farfetched. The real population genetics database bears little resemblance to “a vast DNA database” for finding missing persons and solving criminal cases. The risk it poses to the Texas newborns and their families is minimal.


[1] Emily Ramshaw, DNA Deception, Texas Tribune, Feb. 22, 2010, available at http://www.texastribune.org/stories/2010/feb/22/dna-deception/ (last viewed March 2, 2010)

[2] Alice R. Isenberg & Jodi M. Moore, Mitochondrial DNA Analysis at the FBI Laboratory, Forensic Sci. Commun., vol. 1, no. 2 (July 1999), available at http://www.pocketexpert.net/files/U.pdf (last viewed April 10, 2010)

[3] Nancy Ritter, Missing Persons and Unidentified Remains: The Nation’s Silent Mass Disaster, NIJ Journal, No. 256 (2007), available at http://www.ojp.usdoj.gov/nij/journals/256/missing-persons.html (last viewed April 10, 2010)

© 2010 David H. Kaye

A Texas Tall Tale of “DNA Deception”

A “non-profit, nonpartisan public media organization,” the Texas Tribune [1] broke a story that is bound to attract national outrage. The story goes like this. Texas, like every other state, pricks the heels of newborn children for a blood sample. It screens these samples for rare metabolic and genetic diseases and stores spots of blood on a card for each child. As the March of Dimes explains, “[w]hen test results show that the baby has a birth defect, early diagnosis and treatment can make the difference between lifelong disabilities and healthy development.” [2]

As these “Guthrie cards” began to accumulate, it became clear that they might be useful for medical research. In 1994, law professor Jean McEwen and doctor-lawyer Phil Reilly called them “inchoate databases” and found that many laboratories were open to the idea of sharing them — in anonymized form — for research that would benefit the public. [3]

The Texas Department of State Health Services did exactly this. It provided medical researchers with de-identified Guthrie cards to study “the gene involved in club foot, to inspect the DNA of infants who develop childhood cancer, [and] to examine prenatal lead exposure.” [4] For its efforts, the department was sued. It had treated the cards as free for the taking, without going back to every pair of parents to obtain explicit permission to release their (nameless) child’s blood spots. Although the claims are a huge jump from any case law, and even though the legally cognizable damages suffered by any parent whose unknown child’s blood spot made its way to a laboratory are obscure, five plaintiffs alleged violations of the Fourth Amendment, the Texas Constitution, and the common law. On their behalf and seeking to represent a much larger class of plaintiffs, the Texas Civil Rights Project sought declaratory and injunctive relief. [5]

The case promptly settled. The state agreed to destroy millions of cards, to give parents clearer procedures to opt out of the storage of the cards, and to pay $26,000 in attorneys’ fees and costs.

There things might have stayed — but for a journalist’s “review of nine years’ worth of e-mails and internal documents on the Department of State Health Services’ newborn blood screening program.” [4] She found that the state had concealed its involvement in a nefarious and far-reaching military or law-enforcement project. The Texas doctors had turned “over hundreds of dried blood samples to the federal government to help build a vast DNA database — a forensics tool designed to identify missing persons and crack cold cases.” [4] The samples, she repeated, “were forwarded along to the federal government to create a vast DNA database, one that could help crack cold cases and identify missing persons.” [6] The database would be shared worldwide, “for international law enforcement and investigation in the context of homeland security and anti-terrorism efforts.” [4]

Incensed, the lawyer for the five plaintiffs fired off a letter to the governor and the attorney-general. He accused the “TDSHS [of] supplying those blood samples taken from newborn babies to the military, not just for research, but so that the military can build a mitochondria DNA data base, which can be used in part for law enforcement purposes.” [6] He complained that “[t]his … alarming development … raises the specter of the federal government building an international DNA data base,” and he demanded that “within ten (10) days of this letter, you retrieve from the federal government all the blood samples that Texas has sent to the U.S. military and retrieve and destroy all information taken from those samples … .” [6] Indeed, he suddenly realized that this military project was why the state was so willing to settle the case: “‘Sometimes there are slam-dunk cases, but I’d never seen this kind of case settle without discovery,’ says [Jim] Harrington, director of the Texas Civil Rights Project. ‘This explains the mystery of why they gave up so fast.’” [4]

The trouble is that it’s all smoke and no fire. The reporter and the lawyer apparently have misread the report of the Armed Forces DNA Identification Laboratory (AFDIL) detailing its efforts to collect and study mitochondrial DNA (mtDNA) from varied people and places. As explained in Chapter 11 of The Double Helix and the Law of Evidence, AFDIL is a world leader in mitochondrial DNA sequencing because the technique is exceedingly valuable in identifying the remains of soldiers missing in action. [7] But mtDNA is not used to “crack cold cases,” at least not by generating cold hits in any law-enforcement database of DNA profiles from possible offenders. The national database (NDIS) maintained by the FBI, the one that actually helps in cracking cold cases, is limited to STR profiles in the DNA from the cell nucleus. These DNA sequences are wonderful for discriminating among individuals. When a 13-locus match from a crime scene to one of the more than seven million profiles in NDIS pops up, it can constitute a practically conclusive identification of a known individual. [7] And the bigger NDIS is, the more likely it is that the culprit will be in it. This kind of database is “only as valuable as its … size.” [4]

Not so with mtDNA. Everyone in the same maternal line shares the same sequence, and other essentially unrelated maternal lineages might have the same sequences. [7] Moreover, it would be inane to put anonymous sequences — nuclear or mitochondrial — into the database used in searching for cold hits. A hit from a crime-scene sample to a profile from a Guthrie card with no name attached to it would have little or no investigative value. The (nameless) Texas children need not fear being swept up in criminal or terrorist investigations because AFDIL sequenced their anonymous DNA.

But if the federal government does not want the samples for a database that will be used to catch criminals or terrorists, what nefarious international database are these profiles going into? Prosaically, they are part of a scientific, population-genetics database that will be helpful in understanding the significance of a match in an ordinary criminal case. Consider State v. Ware, the very first case with mtDNA evidence. Hairs were found in the bed where a young girl was attacked. [7, chap. 12] The hairs looked similar to the defendant’s under a microscope, but there have been false convictions with hairs that happen to look similar. (Just check with the Innocence Project.) Nuclear DNA, which could yield well-nigh conclusive results, was absent in the hair shafts, but there were enough mitochondria to get a useful sequence, and this sequence matched the defendant’s. [7]

Because mtDNA just does not have the power of nuclear DNA to differentiate among individuals, however, defense counsel in such cases can object (appropriately) that the evidence is confusing or misleading without statistics on how rare the mitotype in question would be in the general population. How many people would be falsely incriminated by the mtDNA sequence in the case?

By understanding the variations in the mtDNA sequences in different places and populations, scientists can estimate how rare or how common a mitotype that incriminates a suspect might be. Such estimates require reference databases, but the existing forensic-statistical-reference databases, defense counsel and a number of scientists have argued, are too small and full of gaps in the population groups represented. [7] Indeed, the federal government has received considerable flak from the media and a vocal group of scientists, lawyers, and sundry others for its refusal to supply de-identified nuclear-DNA profiles from law-enforcement databases for new studies to supplement the existing statistical-reference databases long used to estimate the probability of random STR-profile matches in criminal cases. [9, 10]
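To make concrete what a statistical-reference database contributes here, the following minimal sketch tallies how often a matching mitotype appears in each population group of a reference sample, along with the size of each group. Everything in it is hypothetical: the group labels, the mitotype labels such as type_A, the counts, and the function name are invented for illustration and do not come from any real database.

```python
from collections import Counter

# Hypothetical toy reference database: one (population group, mitotype label)
# pair per unrelated donor. All labels and counts are invented.
REFERENCE_DB = (
    [("Caucasian", "type_A")] * 5
    + [("Caucasian", "type_B")] * 95
    + [("Hispanic", "type_A")] * 1
    + [("Hispanic", "type_C")] * 49
    + [("African", "type_D")] * 60
    + [("Asian", "type_E")] * 40
)


def observations_by_group(db, mitotype):
    """Return {group: (times the mitotype was seen, group sample size)}."""
    group_sizes = Counter(group for group, _ in db)
    hits = Counter(group for group, seq in db if seq == mitotype)
    # A Counter returns 0 for groups in which the mitotype never appears.
    return {group: (hits[group], size) for group, size in sorted(group_sizes.items())}


for group, (seen, size) in observations_by_group(REFERENCE_DB, "type_A").items():
    print(f"{group}: seen {seen} time(s) among {size} samples")
```

A count of zero in a group with only a few dozen samples says little about how rare the mitotype really is in that population, which is the practical meaning of the complaint that the existing databases are too small and full of gaps.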

In sum, the AFDIL study is a response to a legitimate scientific and legal concern. The federal government (as it should) wants to improve the infrastructure for using mtDNA evidence in court by enlarging the statistical-reference databases. Thus, the AFDIL report (the supposed smoking gun posted on the Tribune’s website) is entitled “Development and Expansion of High-quality Control Region Databases to Improve Forensic mtDNA Evidence Interpretation.” [8] As the title indicates, these scientific databases do not generate DNA evidence. They “improve” the “interpretation” of mtDNA evidence from other sources. The very first sentence of the report makes it plain that the databases are for statistical purposes only:

Mitochondrial DNA testing in the forensic context requires appropriate, high-quality population databases for estimating the rarity of questioned haplotypes. However, large forensic mtDNA databases, which adhere to strict guidelines in terms of their generation and maintenance, are not widely available for many regional populations of the United States or most global populations outside of the United States and Western Europe.

After elaborating, the report continues:

In order to address this issue, the Armed Forces DNA Identification Lab (AFDIL) has undertaken a high-throughput control region databasing effort. … Global populations that are currently underrepresented in available forensic mtDNA databases will comprise approximately 25% of the total number of samples. The remaining individuals will represent regional samples of various U.S. populations and global populations that contribute to the overall mtDNA diversity of the U.S. The high-quality mtDNA data generated from these efforts will be publicly available to permit examination of regional mtDNA substructure and admixture, and ultimately to improve our ability to interpret mtDNA evidence.

This population-genetics study is entirely different from building a huge database of mitotypes to generate cold hits. MtDNA does not work well for this purpose, and even if the FBI wanted to do it, anonymous data from AFDIL would be useless. All that those data can do is help investigators, judges and juries better assess the results of a match to a known suspect or defendant. Suggestions that neonatal samples are being put into databases that could result in the unknowing “donors” being swept up in future investigations of crime or terrorism are troubling — but not because they are true.


[1] Texas Tribune, About the Texas Tribune, http://www.texastribune.org/about/

[2] March of Dimes, Newborn Screening Tests, Mar. 2008, http://www.marchofdimes.com/pnhec/298_834.asp

[3] J. E. McEwen & P. R. Reilly, Stored Guthrie Cards as DNA “Banks,” 55 Am. J. Human Genetics 196-200 (1994), available at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1918213/

[4] Emily Ramshaw, DNA Deception, Texas Tribune, Feb. 22, 2010, available at http://www.texastribune.org/stories/2010/feb/22/dna-deception/, last viewed, March 2, 2010

[5] Beleno v. State Dep’t of Health Serv., Civ. No. SA09CA1088 (W.D. Tex. Mar. 12, 2009) (complaint)

[6] Emily Ramshaw, TribBlog: AG’s Office Fires Back at Blood Spot Attorney, Feb. 22, 2010, available at http://www.texastribune.org/blogs/post/2010/feb/22/tribblog-attorney-asks-perry-get-dna-back-feds/, last viewed, March 2, 2010

[7] David H. Kaye, The Double Helix and the Law of Evidence (2010)

[8] Jodi A. Irwin et al., Development and Expansion of High-quality Control Region Databases to Improve Forensic mtDNA Evidence Interpretation, 1 Forensic Sci. Int’l: Genetics 154-157 (2007)

[9] David H. Kaye, Trawling DNA Databases for Partial Matches: What Is the FBI Afraid Of?, 19 Cornell J. L. & Public Pol’y 145-171 (2009)

[10] D. E. Krane et al., Time for DNA Disclosure, 326 Science 1631-1632 (2009), DOI: 10.1126/science.326.5960.1631

© 2010 David H. Kaye