Monthly Archives: October 2011

DNA Identification Technology: Fast and Furious

Today’s talks at the International Symposium on Human Identification indicated some directions in which DNA-based identification technology will move in the near future. For example, one company reported a way to type 26 different STRs simultaneously. Is that enough to justify testimony of global individualization (with the exception of identical twins)?

The Departments of Defense, Homeland Security, and Justice are seeking self-contained devices for rapid STR profiling and interpretation, and several companies claim to be on the verge of delivering them. “Rapid” means an hour or so, and the hope is that these microfluidic devices will permit on-the-spot (or at-the-police-station) results for investigations as well as DNA database queries and entries. One company promises a functioning product in April 2012. Another refers to an existing instrument “compact enough to be used in an office setting, airport security area, mobile van, or field-forward military site.”

None of these has been fully validated. The FBI is figuring on widespread implementation at local police stations in 4-7 years, but police in Palm Bay, Florida, have posted videos on YouTube to advertise their success with a microfluidic device in “Operation Rapid Hit.”

Finally, companies are supplying police with phenotype and ancestry data, including probable eye and hair color. For the future, the most impressive — and disquieting — approach uses “next-generation sequencing” to extract all the usual STRs, together with phenotypically and medically informative data in one fell swoop.

Indeed, sequencing the oral bacteria that we host is possible. A speaker described one individual whose microbiome included a bacterium used in the industrial production of yogurt and cheese. Just imagine the APB: “The suspect is a white male with brown hair (probability = 0.45) and blue eyes (probability = 0.95) who likes yogurt.”

An Odd Set of Odds in Kinship Matching with DNA Databases

The 22d International Symposium on the Future of Human Identification began yesterday with a set of workshops. One was on “familial searching.” The phrase refers to trawling the profiles in a DNA database for certain types of partial matches to a DNA profile from a crime-scene sample.

Partial matches that are useful in generating investigative leads to family members arise much more often when a particular kind of relative (say, a full sibling) is the source of the crime-scene sample than when an individual who is not closely related to the database inhabitant is the source. The ratio of the probability of the partial match under the former condition (a given genetic relationship) to its probability under the latter (unrelated individuals) is a likelihood ratio (LR). The LR (or, technically, its logarithm) for siblingship expresses the weight of the evidence in favor of the hypothesis that the source is a full sibling as opposed to an unrelated individual.
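As a minimal illustration of this definition, the Python sketch below computes an LR and its base-10 logarithm from two conditional probabilities. The function name and both probabilities are hypothetical, chosen only to show the arithmetic; they do not come from any real database search.

```python
import math

def likelihood_ratio(p_match_if_sibling, p_match_if_unrelated):
    """Ratio of the partial-match probability under the two hypotheses."""
    return p_match_if_sibling / p_match_if_unrelated

# Hypothetical probabilities, for illustration only.
lr = likelihood_ratio(1e-3, 1e-7)
weight = math.log10(lr)  # weight of evidence on a log10 scale
print(f"LR = {lr:.0f}, log10(LR) = {weight:.1f}")
```

An LR above 1 favors the sibling hypothesis, and an LR below 1 favors the unrelated hypothesis.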

After explaining this idea, the first speaker presented the following formula:

“Odds” = LRautosomal x LRY-STR x 1/N         (1)

She attributed this formula to the California state DNA laboratory that does familial searching in that state. In this equation, N is the size of the database, LRautosomal is the likelihood ratio for the partial match at a set of autosomal STR loci, and LRY-STR is the likelihood ratio for the matching Y-STR haplotype.

She described this as a Bayesian computation that could lead to statements in court such as “there is a 98% probability” that the person whose DNA was found at the crime scene is a brother of Joe Smith, a convicted offender whose DNA profile is in a DNA database.
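To see how equation (1) could yield a figure like 98%, here is a rough Python sketch of the arithmetic. The function names and the input values are assumptions made for illustration; they are not the California laboratory's numbers or software.

```python
def equation_1_odds(lr_autosomal, lr_y_str, n):
    """'Odds' as given by equation (1): LRautosomal x LRY-STR x 1/N."""
    return lr_autosomal * lr_y_str / n

def odds_to_probability(odds):
    """Convert odds in favor of a hypothesis to a probability."""
    return odds / (1.0 + odds)

# Hypothetical inputs, for illustration only.
odds = equation_1_odds(lr_autosomal=4.9e5, lr_y_str=100.0, n=1_000_000)
prob = odds_to_probability(odds)
print(f"odds = {odds:.1f}, probability = {prob:.2f}")
```

With these made-up inputs the "odds" are 49 to 1, which correspond to a probability of 0.98.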

There are three interesting things to note about these suggestions. To begin with, it is not clear why such a statement would be introduced in a trial. By the time the suspect has become a defendant, a new sample of his DNA should have been tested to establish a full match to the crime-scene sample. At that point, why would the judge or jury care whether the defendant is related to a database inhabitant? The relevance of the DNA evidence lies in the full match to the crime-scene sample, and the jury need not consider whether the defendant is a relative of someone not involved in the alleged crime. (One might ask whether the trawl through the database somehow degrades the probative value of the full match, but, if anything, it increases it. [1])

The issue could arise, however, if police were to seek a court order or search warrant to collect a DNA sample from the suspect. At that point, they would need to describe the significance of the partial match to the convicted offender.

This possibility brings us to the second noteworthy point about equation (1). The “odds” (or the corresponding probability) are not the way to present the weight of the partial match. Consider the prior probability of a match in a small database, say, of size N=2. Prior to considering the partial match, why would one think that the probability of a database inhabitant being the sibling of the criminal who resides outside the database is 1/N = 1/2? It is quite improbable that a database of two people includes a relative of every criminal who leaves DNA at a crime scene. The a priori probability for a small database must be closer to 0 than to 1/N.

That the prior probability is less than 1/N is a general result. The only exception occurs when it is absolutely certain that a sibling of the perpetrator is in the database. On that assumption, prior odds of 1 to N-1 are not unreasonable. But that assumption is entirely artificial, and to advise a magistrate that the posterior odds have the value computed according to (1) would be to overstate the implications of the partial match.
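The effect of that artificial assumption can be sketched numerically. The Python below is only an illustration under assumptions of my own: there is at most one sibling in the database, every inhabitant is equally likely to be that sibling, and `p_sibling_in_db` (a hypothetical parameter, not part of equation (1)) is the chance that a sibling is in the database at all. The LR and the 5% figure are made up.

```python
def prior_odds(n, p_sibling_in_db):
    """Prior odds that one given database inhabitant is the source's sibling.

    Assumes (hypothetically) at most one sibling in the database and that
    each of the n inhabitants is equally likely to be that sibling.
    """
    prior_prob = p_sibling_in_db / n
    return prior_prob / (1.0 - prior_prob)

def posterior_probability(lr, odds_prior):
    """Bayes' rule in odds form, converted to a probability."""
    odds_post = lr * odds_prior
    return odds_post / (1.0 + odds_post)

n = 1_000_000
lr = 4.9e7  # hypothetical combined LR (autosomal x Y-STR)

# Sibling certainly in the database: prior odds of 1 to N-1.
certain = posterior_probability(lr, prior_odds(n, 1.0))
# Hypothetical 5% chance that a sibling is in the database at all.
uncertain = posterior_probability(lr, prior_odds(n, 0.05))
print(f"sibling certainly in database: {certain:.3f}")
print(f"5% chance sibling in database: {uncertain:.3f}")
```

Under these assumed inputs, the posterior probability falls from about 98% to roughly 71% once the certainty assumption is dropped, which is the sense in which equation (1) overstates the implications of the partial match.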

The third thing to note about dividing by N is that it accomplishes nothing in producing a viable list of partially matching profiles in a DNA database trawl. The straightforward approach is to produce a short list of candidates in the database whose first-degree relatives might be the source of the crime-scene sample. The minimum value of LRautosomal x LRY-STR should be large enough to keep the two conditional error probabilities (including a candidate when there is no relationship, and not including a candidate when there is a relationship) small. This threshold value does not depend on N. (A later speaker made this observation.)
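A short Python sketch shows why the threshold need not involve N. It assumes, purely for illustration, that log10 of the combined LR is normally distributed under each hypothesis; the means and standard deviations are invented, and real values would have to come from simulation or validation data.

```python
from statistics import NormalDist

# Hypothetical distributions of log10(LR); real ones would come from
# simulation or validation data, not from these made-up parameters.
unrelated = NormalDist(mu=-2.0, sigma=2.0)  # source unrelated to candidate
sibling = NormalDist(mu=6.0, sigma=2.0)     # source is candidate's full sibling

def error_rates(log10_threshold):
    """Return (false inclusion, false exclusion) rates at a threshold."""
    false_inclusion = 1.0 - unrelated.cdf(log10_threshold)
    false_exclusion = sibling.cdf(log10_threshold)
    return false_inclusion, false_exclusion

for t in (2.0, 3.0, 4.0):
    fi, fe = error_rates(t)
    print(f"log10 threshold {t}: false inclusion {fi:.4f}, false exclusion {fe:.4f}")
```

Raising the threshold lowers the false inclusion rate and raises the false exclusion rate. The database size appears nowhere in the calculation of either conditional error probability.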

Equation (1), it seems, is useless. Instead, the magistrate should be told the value of the LR and how often such large LRs would occur when a crime-scene sample comes from a relative versus how often they would occur when it comes from an unrelated person.


1. David H. Kaye, 2009, Rounding Up the Usual Suspects: A Legal and Logical Analysis of DNA Database Trawls, North Carolina Law Review, 87(2), 425-503.