Monthly Archives: February 2010

European Recommendations on Managing Forensic DNA Databases

In some ways, America remains a self-absorbed and insular nation. For instance, the thinking of European forensic scientists about interpretation and presentation of data from forensic investigations deserves more attention than it has received. The report of a European Network of Forensic Science Institutes working group on managing law-enforcement DNA databases also should be significant in refining policies on the many such databases in the United States. It addresses such issues as whose DNA profiles should be on file, how long the profiles and the underlying biological samples should be retained, DNA mixtures, low-template DNA, and “wild-cards” in searches.

Its recommendation number 22, about the statistics to use in reporting cold-hits (the topic of yesterday’s post), is this:

A DNA-database match report of a crime scene related DNA-profile with a person should be informative and apart from the usual indication of the evidential value of the match (RMP) it should also contain a warning indicating the possibility of finding adventitious matches (as mentioned in recommendation 21) and its implication that the match should be considered together with other information.


ENFSI DNA Working Group, DNA-database Management Review and Recommendations, April 2009, available via

Rehash and Mishmash in the Washington Monthly

More bad press on the practice of trawling DNA databases to locate suspects is here. Writing in the Washington Monthly, Michael Bobelian reveals “DNA’s Dirty Little Secret” — “a tool renowned for exonerating the innocent may actually be putting a growing number of them behind bars.” The centerpiece in the expose is the difficult case of People v. Puckett.

To some extent, the article rehashes previous writing on Puckett in the Los Angeles Times, San Francisco Magazine, and the California Lawyer, but it also makes mincemeat of statistics. Having analyzed one aspect of the case elsewhere, I shall limit myself to commenting on a few choice excerpts from this story.

“[T]he jury was told that the chance that a random person’s DNA would match that found at the crime scene was one in 1.1 million. If Puckett’s were an ordinary criminal case, this figure might have been accurate. Indeed, when police use fresh DNA material to link a crime directly to a suspect identified through eyewitness accounts or other evidence, the chances of accidentally hitting on an innocent person are extraordinarily slim. But when suspects are found by combing through large databases, the odds are exponentially higher. In Puckett’s case the actual chance of a false match is a staggering one in three … .”

One in 1.1 million was an accurate statement of “the chance that a random person’s DNA would match that found at the crime scene” at the loci in question. There was a bona fide dispute at trial over which loci to count as matching and what the resulting random-match probability should have been, but the probability that a person plucked at random from the Caucasian population will have DNA that matches the loci in question does not grow larger (“exponentially” or otherwise) because Puckett’s name emerged from a trawl through the state database. The random-match probability is just the frequency of the profile in the population. This number is what it is.

Of course, there is a dispute over the use of the random-match probability to express the probative value of a DNA match arising from a database trawl. Many statisticians agree that, if anything, the fact of the search enhances the probative value of the match, primarily because it not only identifies a matching profile (as in the non-database-search case) but also eliminates as possible contributors thousands or even millions of individuals. (“Dirty Little Secret” keeps this fact secret.) A respectable minority of the statisticians who have written on the subject, however, maintain that the random-match probability (p) should be inflated by the size of the database (N) — the Np rule. The Np statistic is an upper bound on the probability that there would be a match to one or more profiles in a database composed entirely of profiles from people innocent of the crime under investigation.

However one resolves the debate on the relevance of the innocent-database-match probability, it is misleading to suggest that “[i]n Puckett’s case the actual chance of a false match is a staggering one in three … .” The stubborn fact is that we do not know the actual chance that the match in Puckett’s case was true or false. The figure of 338,000 x 1/1,100,000 ≈ 0.31, or roughly 1/3, assumes that the database is innocent. To slide from the innocent-database-match probability to the probability that the match was to an innocent man named Puckett is to commit the transposition fallacy condemned in the recent Supreme Court case of McDaniel v. Brown (noted in a posting on January 18, 2010).
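The arithmetic behind the one-in-three figure can be checked with a short sketch. It compares the Np bound with the probability of at least one match, 1 - (1 - p)^N, under the assumption that every profile in the database comes from an innocent, unrelated person and that matches are independent. The function name is hypothetical, chosen for illustration; the inputs are the database size and random-match probability quoted above.

```python
import math


def innocent_database_match(n_profiles: int, rmp: float) -> tuple[float, float]:
    """Return (Np bound, chance of at least one match).

    Assumption: every profile belongs to an innocent, unrelated person,
    and each profile matches independently with probability rmp.
    """
    np_bound = min(1.0, n_profiles * rmp)
    # 1 - (1 - p)^N, written with log1p/expm1 for numerical stability
    at_least_one = -math.expm1(n_profiles * math.log1p(-rmp))
    return np_bound, at_least_one


np_bound, at_least_one = innocent_database_match(338_000, 1 / 1_100_000)
print(f"Np bound:              {np_bound:.2f}")
print(f"P(at least one match): {at_least_one:.2f}")
```

Under these assumptions the two figures come out at about 0.31 and 0.26. Both describe an innocent database; neither is the probability that a particular matching individual is innocent.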

“In cases where a suspect is found by searching through large databases, the chances of accidentally hitting on the wrong person are orders of magnitude higher.”

This statement is much better. For large N, Np is much greater than p. If there are many trawl cases with only a few loci to search, some hits to innocent people will occur. For example, if a database of 500,000 profiles from individuals is trawled 1,000 times for matches to crime-scene samples that each have a random-match probability of one in a million, and if all the contributors of the profiles and the crime-scene samples are unrelated, then the expected number of adventitious matches will be 500.
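The expected-value calculation in this example is simply the number of profile-to-sample comparisons multiplied by the random-match probability. A minimal sketch, assuming unrelated contributors as stated above (the function name is illustrative, and exact fractions are used to avoid floating-point rounding):

```python
from fractions import Fraction


def expected_adventitious_matches(n_profiles: int, n_searches: int, rmp: Fraction) -> Fraction:
    """Expected number of coincidental hits across all searches.

    Assumption: contributors are unrelated, so each of the
    n_profiles * n_searches comparisons matches with probability rmp.
    """
    return n_profiles * n_searches * rmp


expected = expected_adventitious_matches(500_000, 1_000, Fraction(1, 1_000_000))
print(expected)  # 500
```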

Nonetheless, “hitting on the wrong person” will not always put people behind bars. For one thing, if the database is not innocent (that is, if it includes the culprit), there will be more than one match if an “accidental” match occurs. As a rule, the “accident” is unlikely to be the one prosecuted. Moreover, even if only one suspect emerges, it often will be easy to eliminate the innocent ones. The 2004 database used in Puckett, for example, contained profiles from many people who were not even alive over 30 years ago, when Diana Sylvester was sexually assaulted and stabbed. Hits to those wrong persons could not lead to false prosecutions. Thus, the Np statistic exaggerates the true danger to the criminal justice system of the practice of trawling databases to find investigative leads. However, a nonzero danger remains.

“[T]he little information that has come to light about the actual rate of coincidental matches in offender databases suggests the chances of hitting on the wrong person may be even higher than the Database Match Probability suggests. In 2005, Barlow heard that an Arizona state employee named Kathryn Troyer had run a series of tests on the state’s DNA database, which at the time included 65,000 profiles, and found multiple people with nine or more identical markers. If you believe the FBI’s rarity statistics, this was all but impossible–the chances of any two people in the general population sharing that many markers was supposed to be about one in 750 million, while the Database Match Probability for a nine-marker match in a system the size of Arizona’s is roughly one in 11,000.”

These remarks are so confused that it is hard to know where to begin. A series of papers in the scientific and legal literature (reviewed in Trawling DNA Databases for Partial Matches: What Is the FBI Afraid Of?) have shown that the Arizona numbers of partial matches are roughly what one would expect if the theoretical random-match probabilities are accurate. Studies of offender databases in other countries also confirm the theoretical estimates for matches at moderate numbers of loci.

The effort to apply “the Database Match Probability” of Np = 65,000 x 1/750,000,000 = 1/11,538 in this context is nonsensical. The Np formula applies to a search involving a single nine-locus DNA profile as against N = 65,000 nine-locus DNA profiles in the database. The Arizona trawl was totally different. It was an all-pairs search of N(N-1)/2 = 2,112,467,500 pairs of 13-locus profiles for matches at any combination of 9 or more loci. There are 715 ways to choose the nine matching loci in a database of 13-locus profiles. Instead of a mere 65,000 comparisons, this peculiar trawl (not representative of a real database search) involved 715 x 2,112,467,500, or roughly 1,500,000,000,000, nine-locus comparisons! It is no wonder that matches at as many as nine loci were observed.
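The counts in this paragraph follow from two binomial coefficients: the number of distinct pairs among 65,000 profiles, and the number of ways to pick 9 loci out of 13. A short sketch for checking them (variable names are illustrative; `math.comb` requires Python 3.8 or later):

```python
import math

n_profiles = 65_000

# Distinct pairs of profiles in an all-pairs search: N(N-1)/2
pairs = math.comb(n_profiles, 2)

# Ways to choose which 9 of the 13 loci match
locus_subsets = math.comb(13, 9)

# Nine-locus comparisons implied by the all-pairs trawl
nine_locus_comparisons = pairs * locus_subsets

print(f"pairs of profiles:      {pairs:,}")
print(f"nine-locus subsets:     {locus_subsets:,}")
print(f"nine-locus comparisons: {nine_locus_comparisons:,}")
```

The product is a count of opportunities for a nine-locus match, not a count of independent trials, since the subsets drawn from a single pair of profiles overlap.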

In short, the paragraph mixes two distinct issues: (a) whether expert witnesses should use Np instead of p to explain a match arising from an ordinary database trawl to a jury, and (b) whether the p as currently computed (“laughably” according to Mr. Bobelian’s source) is a reasonable estimate of the random-match probability — regardless of how the defendant was selected for prosecution.

Jurors told the Los Angeles Times that the one-in-1.1-million statistic had been pivotal to their decision. Asked whether the jury might have reached a different conclusion if they had been presented with the one-in-three figure, juror Joe Deluca replied, “Of course it would have changed things. It would have changed a lot of things.”

What did the Los Angeles reporters who interviewed the poor juror say that “1 in 3” meant? Their article mischaracterizes it as “the probability that the database search had hit upon an innocent person.”  As noted above, no one knows the probability that this search hit upon an innocent person. We know only that if Puckett and everyone in the database were innocent, then the chance that at least one person would have matched could have been no larger than about 1/3. Was it error to keep this information from the jury? Mr. Puckett’s opening brief and the state’s brief give different answers. They are better sources of information about the case than is DNA’s Dirty Little Secret.


Michael Bobelian, DNA’s Dirty Little Secret, Washington Monthly, Mar.-April 2010, available at

Charles Brenner, Arizona DNA Database Matches,

Jason Felch and Maura Dolan, DNA Matches Aren’t Always a Lock, Los Angeles Times, May 3, 2008, available at,0,6156934,full.story

David H. Kaye, Rounding Up the Usual Suspects: A Legal and Logical Analysis of DNA Database Trawls, North Carolina Law Review, Vol. 87, No. 2, January 2009, pp. 425-503.

—–, Trawling DNA Databases for Partial Matches: What Is the FBI Afraid Of?, Cornell Journal of Law and Public Policy, Vol. 19, No. 1, Fall 2009, pp. 145-171

Appellant’s Opening Brief, People v. Puckett, available at

Respondent’s Brief, People v. Puckett, available at

Striking Out with GINA

An article in the New York Times refers to the problems the major leagues have encountered in verifying the ages of young baseball players from the Dominican Republic. It reports that there is talk of taking fingerprints from aspiring players at around age 10. It adds that

The disclosure that Major League Baseball is considering fingerprinting young prospects comes six months after The New York Times reported that investigators for the commissioner’s office were conducting genetic testing on some Dominican prospects and their parents to ensure that the players were not lying about their identities and ages.

The practice was widely criticized by experts in genetics and bioethics who said they believed it was a violation of personal privacy and that it was illegal under an act passed by Congress that took effect in November 2009.

According to several people in baseball, the commissioner’s office has not conducted DNA testing since the practice was disclosed.

But was the earlier “genetic testing” really “a violation of personal privacy” and illegal under the Genetic Information Nondiscrimination Act? It depends, I would submit, on the tests done. GINA was meant to prohibit tests for medically relevant conditions in employment and insurance. Its application, if any, to identity testing is discussed in the essay, GINA’s Genotypes. There, I argue that GINA should not prohibit an employer from testing potential or actual employees at the usual forensic identification loci. Of course, the privacy issue is more acute if the leagues were doing parentage testing, but even that is outside the strike zone of GINA.


David H. Kaye, Commentary, GINA’s Genotypes, 108 Mich. L. Rev. First Impressions 51 (2010),

Michael S. Schmidt, Baseball Considers Plan to Curtail Age Fraud, N.Y. Times, Feb. 10, 2010, at B11
