The Feds’ “Vast DNA Database”

A few reporters have inquired about the Texas tale concerning “hundreds of dried blood samples [shipped] to the federal government to help build a vast DNA database–a forensics tool designed to identify missing persons and crack cold cases.” [1] Earlier postings (on March 4 and 15) explained that, contrary to the impression created in a series of Texas Tribune stories, the Guthrie cards were not destined for a database like the FBI’s National DNA Index System (NDIS), which matches DNA recovered from crime scenes against stored DNA profiles from convicted offenders or arrestees.

This was not to say that, as a matter of public policy and research ethics, the state Department of Health Services should have released its Guthrie cards for any research without first securing parental consent. That is an issue about which reasonable minds can and do differ. In judging the propriety of these decisions, however, it is vital to understand the true privacy risks that the dissemination of the samples entailed. Indeed, inasmuch as plaintiffs are demanding that the Armed Forces DNA Identification Laboratory (AFDIL) return the cards (which would lead to their incineration under Texas’s current policy), the question is crucial to the continuing tussle in Texas. If there is a substantial privacy risk, the demand obviously has more merit than if there is no such risk. Furthermore, because plaintiffs also want the researchers to extirpate the DNA sequence data in AFDIL’s anonymized research database, the issue is of national concern. The research database, although small, is important to a fair presentation of mtDNA matches in all criminal cases and to the correct interpretation of these findings during an investigation. This posting therefore provides an assessment of some of the risks that children conceivably could face from the presence of their mtDNA sequences in the research database and from their Guthrie cards in the laboratory’s files.

I. Use of the AFDIL Research Database

The population-genetics database for mitotypes poses a rather remote threat to the Texas newborns. The FBI explained how such databases work in 1999:

The FBI Laboratory, the Armed Forces DNA Identification Laboratory, and other laboratories have collaborated to compile a mtDNA population database. … The database is referred to as the SWGDAM (Scientific Working Group on DNA Analysis Methods) database. It contains sequences from four main racial groups: Caucasians, Africans, Hispanics, and Asians. Most of these samples have been obtained from paternity-testing laboratories, blood banks, or academic groups studying ethnic populations. The database currently contains 2,426 mtDNA sequences from unrelated individuals. However, the database is updated frequently and is constantly growing. …

When a sequence from a questioned sample [one found at a crime-scene] and a known [suspect’s] sample is the same, the SWGDAM database is searched for this sequence. … The FBI Laboratory lists the number of observations of a sequence in each racial subgroup of the database in a report of a mtDNA examination. For example, a sequence might be seen five times in the database samples of Caucasian descent and one time in the database samples of Hispanic descent yet not appear in the remaining database subgroups. …

[¶] Most of the sequences in the forensic mtDNA database occur a single time (approximately 60 percent), and the total number of mtDNA sequences in the entire human population is not known. Reliable frequency estimates for most mtDNA sequences are therefore not possible [because] small databases are not effective tools for estimating frequencies of rare events.

[¶] However, statistical methods exist for calculating an upper-bound estimate of the frequency of mtDNA types with zero occurrences or very few occurrences in a database of limited size. This upper-bound estimate describes the highest frequency expected for a particular mtDNA sequence using the database. … As the database grows in size, the frequency estimates for individual mtDNA profiles will become more and more refined and eventually lead to reliable population frequency estimates. [2]
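
To make the upper-bound idea concrete, here is a minimal sketch in Python. It assumes a 95 percent confidence level and uses two textbook formulas: the exact binomial bound 1 - alpha^(1/n) for a sequence never seen in a database of n sequences, and a normal-approximation bound for a sequence seen x times. The function names are hypothetical, and the choice of formulas is an assumption made for illustration, not a statement of any laboratory's reporting protocol.

```python
import math

def upper_bound_unobserved(n, alpha=0.05):
    """Upper confidence bound on the frequency of a type seen 0 times in n samples.

    Solves (1 - p)**n = alpha for p (assumed formula, for illustration).
    """
    return 1.0 - alpha ** (1.0 / n)

def upper_bound_observed(x, n, z=1.96):
    """Normal-approximation upper bound for a type seen x times in n samples.

    z = 1.96 is the upper limit of a two-sided 95 percent interval (assumption).
    """
    p = x / n
    return p + z * math.sqrt(p * (1.0 - p) / n)

n = 2426  # database size quoted in the 1999 FBI description above
print(f"never observed:   {upper_bound_unobserved(n):.4%}")
print(f"observed 5 times: {upper_bound_observed(5, n):.4%}")
```

With 2,426 sequences, the bound for a never-observed mitotype comes out at about 0.12 percent, close to the familiar rule of thumb 3/n. Both bounds shrink as n grows, which is the sense in which a larger reference database yields more refined estimates.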

When used to estimate the frequency of a mitotype in a crime-scene sample that matches a defendant in a criminal case, the population-genetics database poses no risk that a Texas baby will be accused of a crime–correctly or otherwise. But one can imagine a different scenario: Suppose that Inspector Javert is pursuing the perpetrator of a horrific crime in Texas. The usual suspects have excellent alibis. A search of the Texas DNA database of convicted offenders draws a blank. A search of NDIS also comes up negative. Javert, who never gives up, takes the mtDNA sequence from the crime scene and compares it to the sequences in the anonymized population-genetics database. Voila! He finds a match.

Javert demands that the custodians of the population reference database tell him whether the matching sequence came from the 800 or so Texas samples. It did. Now he could go to the state health department to obtain the names of the 800 suspects (if such records still exist). Or better, if the cards supplied to AFDIL retained their original numbers and if the health department retained the numbers linked to personally identifying information, then Javert could find his way to one family, and he could investigate whether anyone in that maternal lineage could be the culprit. There might be other families with the same mitotype, but Javert at least would have found a lead.

Although the Javert scenario is fictional, it is not impossible. Even anonymized population reference mtDNA databases could lead the police to a family in some situations. But it’s a stretch.
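
The chain of conditions in the Javert scenario can be made explicit with a small sketch. Everything in it is hypothetical: the record layouts, the card numbers, the mitotype labels, and the function name are invented for illustration and do not describe how AFDIL or the Texas health department actually stores its records. The point is only that a match in the anonymized database leads to a family when both links survive, and to no one when either is broken.

```python
from typing import Dict, List

def trace_families(crime_mitotype: str,
                   research_db: List[dict],
                   card_registry: Dict[str, str]) -> List[str]:
    """Return the families reachable from a crime-scene mitotype (hypothetical model)."""
    families = []
    for entry in research_db:
        if entry["mitotype"] != crime_mitotype:
            continue                      # no match in the reference database
        card = entry.get("card_number")   # link 1: the lab kept the original card number
        if card is None:
            continue
        family = card_registry.get(card)  # link 2: the health department kept the key
        if family is not None:
            families.append(family)
    return families

# Invented example data.
db_with_numbers = [{"mitotype": "H1a", "card_number": "TX-0417"},
                   {"mitotype": "L2b", "card_number": "TX-0418"}]
db_stripped = [{"mitotype": "H1a", "card_number": None}]
registry = {"TX-0417": "Family A", "TX-0418": "Family B"}

print(trace_families("H1a", db_with_numbers, registry))  # both links intact
print(trace_families("H1a", db_stripped, registry))      # card numbers stripped
print(trace_families("H1a", db_with_numbers, {}))        # registry destroyed
```

Only the first call returns a family; stripping the card numbers or discarding the registry leaves Javert with a match but no name.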

II. Genetic Discrimination

Every tissue repository contains human biological material that could be tested for genetic markers or predictors of various diseases. Most states have laws to prevent insurance companies and employers from conducting such genetic tests or using their results, and the recent federal Genetic Information Nondiscrimination Act (GINA) provides comprehensive national protection as well. Considering that the AFDIL research samples came with no names attached to them, the risk that the children will face “genetic discrimination” if the cards are retained seems very small indeed.

III. Leakage into NDIS

The National DNA Index System, which is used to find “cold hits” to convicted offenders in criminal investigations, contains over 7,000,000 STR profiles. It is possible to extract STR profiles from the Texas cards. But adding these 800 or so STR profiles to the database would violate the federal law establishing the Combined DNA Index System (CODIS). Furthermore, lacking identifying information, the database administrators would not find them terribly useful. A hit from an unsolved crime to one of these profiles would mean that one of 800 Texas babies has grown up to deposit DNA at a crime scene. Our Inspector Javert then could find his way to a single suspect–if the cards supplied to AFDIL retained their original numbers, if the illegal NDIS record kept track of this number, and if the health department retained the numbers linked to personally identifying information. Still, the full scenario–that AFDIL researchers would supply the cards to the FBI, that the FBI would analyze the STRs and illegally add them to the operational database, and that a hit in the database would lead to a suspect–is strained.

IV. Leakage into the National Missing Persons Database

The FBI maintains a National Missing Persons database, also known as CODIS(mp). [3] When a child is missing, a family member with the same mitotype (anyone in the same maternal lineage) can supply a DNA sample for mtDNA sequencing. The mitotype will be kept in the missing persons database to be checked against mtDNA extracted from unidentified human remains that come to the attention of the police. A hit between the mtDNA from the remains and the family member’s DNA serves to identify the remains as the reported missing person’s. It would make little sense to include several hundred de-identified samples from Texas newborns in this database.


I would not claim that the retention of the de-identified Texas samples or the presence of the anonymized mtDNA sequences in the population-genetics research database poses absolutely no risk of someday implicating today’s newborns in a criminal investigation. But the pertinent scenarios seem farfetched. The real population genetics database bears little resemblance to “a vast DNA database” for finding missing persons and solving criminal cases. The risk it poses to the Texas newborns and their families is minimal.


[1] Emily Ramshaw, DNA Deception, Texas Tribune, Feb. 22, 2010 (last viewed March 2, 2010).

[2] Alice R. Isenberg & Jodi M. Moore, Mitochondrial DNA Analysis at the FBI Laboratory, Forensic Sci. Commun., Vol. 1, No. 2 (July 1999) (last viewed April 10, 2010).

[3] Nancy Ritter, Missing Persons and Unidentified Remains: The Nation’s Silent Mass Disaster, NIJ Journal, No. 257 (2007) (last viewed April 10, 2010).

© 2010 David H. Kaye