More bad press on the practice of trawling DNA databases to locate suspects is here. Writing in the Washington Monthly, Michael Bobelian reveals “DNA’s Dirty Little Secret” — “a tool renowned for exonerating the innocent may actually be putting a growing number of them behind bars.” The centerpiece of the exposé is the difficult case of People v. Puckett.
To some extent, the article rehashes previous writing on Puckett in the Los Angeles Times, San Francisco Magazine, and the California Lawyer, but it also makes mincemeat of statistics. Having analyzed one aspect of the case elsewhere, I shall limit myself to commenting on a few choice excerpts from this story.
“[T]he jury was told that the chance that a random person’s DNA would match that found at the crime scene was one in 1.1 million. If Puckett’s were an ordinary criminal case, this figure might have been accurate. Indeed, when police use fresh DNA material to link a crime directly to a suspect identified through eyewitness accounts or other evidence, the chances of accidentally hitting on an innocent person are extraordinarily slim. But when suspects are found by combing through large databases, the odds are exponentially higher. In Puckett’s case the actual chance of a false match is a staggering one in three … .”
One in 1.1 million was an accurate statement of “the chance that a random person’s DNA would match that found at the crime scene” at the loci in question. There was a bona fide dispute at trial over which loci to count as matching and what the resulting random-match probability should have been, but the probability that a person plucked at random from the Caucasian population will have DNA that matches at the loci in question does not grow larger (“exponentially” or otherwise) because Puckett’s name emerged from a trawl through the state database. The random-match probability is just the frequency of the profile in the population. This number is what it is.
Of course, there is a dispute over the use of the random-match probability to express the probative value of a DNA match arising from a database trawl. Many statisticians agree that, if anything, the fact of the search enhances the probative value of the match, primarily because it not only identifies a matching profile (as in the non-database-search case) but also eliminates as possible contributors thousands or even millions of individuals. (“Dirty Little Secret” keeps this fact secret.) A respectable minority of the statisticians who have written on the subject, however, maintain that the random-match probability (p) should be inflated by the size of the database (N) — the Np rule. The Np statistic is an upper bound on the probability that there would be a match to one or more profiles in a database composed entirely of profiles from people innocent of the crime under investigation.
However one resolves the debate over the relevance of the innocent-database match probability, it is misleading to suggest that “[i]n Puckett’s case the actual chance of a false match is a staggering one in three … .” The stubborn fact is that we do not know the actual chance that the match in Puckett’s case was true or false. The figure of 338,000 x 1/1,100,000 ≈ 1/3 assumes that the database is innocent. To slide from the innocent-database-match probability to the probability that the match was to an innocent man named Puckett is to commit the transposition fallacy condemned in the recent Supreme Court case of McDaniel v. Brown (noted in a posting on January 18, 2010).
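For readers who want to check the arithmetic, here is a minimal sketch using the figures in the case (a database of N = 338,000 profiles and a random-match probability p of 1 in 1.1 million):

```python
# Probability of at least one match in a database composed entirely of
# profiles from innocent, unrelated individuals (the "innocent database").
p = 1 / 1_100_000   # random-match probability at the loci in question
N = 338_000         # number of profiles in the database trawled

np_bound = N * p              # the Np rule: a simple upper bound
exact = 1 - (1 - p) ** N      # exact value, assuming independent profiles

print(f"Np upper bound: {np_bound:.3f}")   # about 0.307, i.e., roughly 1 in 3
print(f"Exact:          {exact:.3f}")      # about 0.265

# Note: neither figure is the probability that the matching man is innocent;
# reading it that way is the transposition fallacy discussed in the text.
```

As the sketch shows, Np slightly overstates even the innocent-database match probability; it is an upper bound, not the thing itself.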
“In cases where a suspect is found by searching through large databases, the chances of accidentally hitting on the wrong person are orders of magnitude higher.”
This statement is much better. For large N, Np is much greater than p. If there are many trawl cases with only a few loci to search, some hits to innocent people will occur. For example, if a database of 500,000 profiles is trawled 1,000 times for matches to crime-scene samples that each have a random-match probability of one in a million, and if all the contributors of the profiles and the crime-scene samples are unrelated, then the expected number of adventitious matches will be 500.
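The expected-number calculation is simple multiplication; a quick sketch with the numbers above:

```python
# Expected number of adventitious (coincidental) matches when an innocent
# database is trawled repeatedly. Assumes unrelated contributors and
# independent comparisons.
N = 500_000          # profiles in the database
trawls = 1_000       # crime-scene samples searched against it
p = 1 / 1_000_000    # random-match probability per comparison

expected_matches = trawls * N * p
print(expected_matches)  # 500.0
```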
Nonetheless, “hitting on the wrong person” will not always put people behind bars. For one thing, if the database is not innocent — if it includes the culprit — there will be more than one match when an “accidental” match occurs, and the “accident” is unlikely to be the one prosecuted. Moreover, even if only one suspect emerges, it often will be easy to eliminate the innocent ones. The 2004 database used in Puckett, for example, contained profiles from many people who were not even alive over 30 years ago, when Diane Sylvester was sexually assaulted and stabbed. Hits to those wrong persons could not lead to false prosecutions. Thus, the Np statistic exaggerates the true danger to the criminal justice system of the practice of trawling databases to find investigative leads. However, a nonzero danger remains.
“[T]he little information that has come to light about the actual rate of coincidental matches in offender databases suggests the chances of hitting on the wrong person may be even higher than the Database Match Probability suggests. In 2005, Barlow heard that an Arizona state employee named Kathryn Troyer had run a series of tests on the state’s DNA database, which at the time included 65,000 profiles, and found multiple people with nine or more identical markers. If you believe the FBI’s rarity statistics, this was all but impossible–the chances of any two people in the general population sharing that many markers was supposed to be about one in 750 million, while the Database Match Probability for a nine-marker match in a system the size of Arizona’s is roughly one in 11,000.”
These remarks are so confused that it is hard to know where to begin. A series of papers in the scientific and legal literature (reviewed in Trawling DNA Databases for Partial Matches: What Is the FBI Afraid Of?) has shown that the Arizona numbers of partial matches are roughly what one would expect if the theoretical random-match probabilities are accurate. Studies of offender databases in other countries also confirm the theoretical estimates for matches at moderate numbers of loci.
The effort to apply “the Database Match Probability” of Np = 65,000 x 1/750,000,000 ≈ 1/11,538 in this context is nonsensical. The Np formula applies to a search of a single nine-locus DNA profile against N = 65,000 nine-locus profiles in the database. The Arizona trawl was totally different. It was an all-pairs search of N(N-1)/2 = 2,112,467,500 pairs of 13-locus profiles for matches at any combination of nine or more loci. There are 715 ways to choose nine loci out of 13 at which a pair could match. Instead of a mere 65,000 comparisons, this peculiar trawl (not representative of a real database search) involved 715 x 2,112,467,500 ≈ 1.5 trillion locus-combination comparisons! It is no wonder that matches at as many as nine loci were observed.
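The combinatorics can be verified directly (the same figures as in the text, computed with Python’s standard library):

```python
import math

N = 65_000                    # 13-locus profiles in the Arizona database
pairs = N * (N - 1) // 2      # distinct pairs compared in an all-pairs trawl
ways = math.comb(13, 9)       # ways to choose 9 matching loci out of 13

print(pairs)          # 2112467500
print(ways)           # 715
print(ways * pairs)   # 1510414262500, i.e., about 1.5 trillion
```

The contrast with the 65,000 comparisons contemplated by the Np formula is what makes the “one in 11,000” figure inapplicable here.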
In short, the paragraph mixes two distinct issues: (a) whether expert witnesses should use Np instead of p to explain a match arising from an ordinary database trawl to a jury, and (b) whether the p as currently computed (“laughably” according to Mr. Bobelian’s source) is a reasonable estimate of the random-match probability — regardless of how the defendant was selected for prosecution.
Jurors told the Los Angeles Times that the one-in-1.1-million statistic had been pivotal to their decision. Asked whether the jury might have reached a different conclusion if they had been presented with the one-in-three figure, juror Joe Deluca replied, “Of course it would have changed things. It would have changed a lot of things.”
What did the Los Angeles reporters who interviewed the poor juror say that “1 in 3” meant? Their article mischaracterizes it as “the probability that the database search had hit upon an innocent person.” As noted above, no one knows the probability that this search hit upon an innocent person. We know only that if Puckett and everyone in the database were innocent, then the chance that at least one person would have matched could have been no larger than about 1/3. Was it error to keep this information from the jury? Mr. Puckett’s opening brief and the state’s brief give different answers. They are better sources of information about the case than is DNA’s Dirty Little Secret.
Michael Bobelian, DNA’s Dirty Little Secret, Washington Monthly, Mar.-April 2010, available at http://www.washingtonmonthly.com/features/2010/1003.bobelian.html
Charles Brenner, Arizona DNA Database Matches, http://dna-view.com/ArizonaMatch.htm
Jason Felch and Maura Dolan, DNA Matches Aren’t Always a Lock, Los Angeles Times, May 3, 2008, available at http://www.latimes.com/news/local/la-me-dna4-2008may04,0,6156934,full.story
David H. Kaye, Rounding Up the Usual Suspects: A Legal and Logical Analysis of DNA Database Trawls, North Carolina Law Review, Vol. 87, No. 2, January 2009, pp. 425-503
—–, Trawling DNA Databases for Partial Matches: What Is the FBI Afraid Of?, Cornell Journal of Law and Public Policy, Vol. 19, No. 1, Fall 2009, pp. 145-171
Appellant’s Opening Brief, People v. Puckett, available at http://www.personal.psu.edu/dhk3/dhblog/AOB(Puckett-CA).pdf
Respondent’s Brief, People v. Puckett, available at http://www.personal.psu.edu/dhk3/dhblog/ROB(Puckett-CA).pdf