Williams v. Illinois (Part II: More Facts, from Outside the Record)

This morning, Professor Richard Friedman posted a revealing report that Cellmark sent to the Illinois State Police (ISP) in Williams v. Illinois. As he explains, and as my previous posting on the facts of the case indicated, the report consists of much more than “machine-generated” statements. But the report — which is a “lodging” that is not part of the record in the case — and Professor Friedman’s remarks warrant a revision to my presentation of the facts of the case and some queries about the ethics of the state’s presentation of the DNA evidence.

On cross-examination, ISP analyst Karen Kooi Abbinanti, who examined the blood sample that Williams gave under court order in another case, testified to Williams’ STR profile. Because ISP analyst Sandra Lambatos, who provided the state’s only evidence of a DNA match, testified that “there [was] a computer match generated of the male DNA profile found in semen from the vaginal swabs of [LJ] to a male DNA profile that had been identified as having originated from Sandy Williams,” I presumed that the Cellmark report listed this profile as coming from the male fraction of DNA in the vaginal swab. Indeed, Lambatos testified that the “allele chart” in the Cellmark report “included data that [she] used to run [her] data bank search.” Joint Appendix at 61. Thus, I wrote that

The unnamed analyst believed that the semen had the following profile: D3 (16, 19), vWA (17, 17), FGA (18.2, 22), D8 (14, 14), D21 (29, 30), D18 (13, 17), D5 (12, 13), D13 (11, 11), D7 (10, 12), D16 (9, 11), TH01 (7, 7), TPOX (11, 11), and CSF (8, 10). The analyst’s report included this profile . . . .

Now that the report is lodged, it is clear that this singular profile is not what the anonymous Cellmark analyst and Cellmark’s two laboratory directors, Robin Cotton and Jennifer Reynolds, signed off on. Their table, which was Lambatos’s “data,” has the entry of (10, 12, 13) instead of (12, 13) for the D5S818 locus. Had Ms. Lambatos used this tri-allelic genotype, Williams would have been excluded! (Tri-allelic, single locus profiles are rare, but they are not unheard of. For example, one paper reports three cases of tri-allelic patterns observed during routine forensic casework on 5964 Belgian residents [1], and the D5S818 (10, 12, 13) profile has been observed [2].)

Ms. Lambatos, however, testified on cross-examination that the Cellmark report’s “deduced male donor profile” (to quote the report itself) was not actually a deduced profile, but only a list of deduced alleles. Joint Appendix at 71. Interpreting it in this fashion (which may well be the correct understanding of what the unknown analyst meant to write), she searched the unspecified database for certain two-allele subsets of the three alleles, namely, (13, 13), (10, 13), and (12, 13). Id. This made sense because, if Cellmark had correctly identified the victim’s profile (something that Lambatos did not check), then the rapist rather than the victim had to be the source of the 13-repeat allele.

The circumscribed nature of Ms. Lambatos’s testimony on direct examination about the “DNA match” is worthy of comment. Full disclosure would have required a scientist to reveal that male profiles other than Williams’ were “consistent with” the vaginal-swab mixture and could have been picked out of a database in her trawl. Instead, Ms. Lambatos acquiesced in or suggested confining her testimony to Williams’ matching profile and the random-match probability associated with that one profile. In other words, she chose not to acknowledge possibilities that were inconsistent with the state’s theory. Does such selectivity contravene the professional responsibility of forensic scientists to “[a]ttempt to qualify their responses while testifying when asked a question with the requirement that a simple ‘yes’ or ‘no’ answer be given, if answering ‘yes’ or ‘no’ would be misleading to the judge or the jury”? [3]

The answer, I think, depends on how misleading Ms. Lambatos’s answers on direct examination were. This was not a case of a single profile that probably could exclude everybody except a twin brother. The analysts were unable to distinguish between Sandy Williams and other males with similar, but not identical, profiles as possible sources of the male DNA. By not disclosing this fact, Ms. Lambatos and the prosecutor made the DNA “match” sound especially compelling. The prosecutor asked about “the male DNA profile found in the semen.” Ms. Lambatos made no effort to correct or clarify even though she firmly believed that Cellmark was reporting at least three different male profiles for the semen (and that Williams was, of course, a match to only one of them). Hammered with Ms. Lambatos’s figures for the frequency of Williams’ profile, a judge surely would think that only Williams could have been the rapist. In contrast, a judge who understood that Cellmark’s tests pointed to men with other DNA profiles might have been more willing to entertain some doubt.

The counterargument is that the probative value of the evidence for the ambiguous profile is essentially the same as the probative value of the evidence for the unambiguous profile that Ms. Lambatos was asked about. There probably are no other men in Chicago with the alternative profiles. Assuming that the vaginal swab DNA is a mixture of the victim’s DNA and one man’s DNA, and assuming that the laboratory called all the alleles correctly, the likelihood ratio for the hypothesis that Williams is the source versus the hypothesis that a random, unrelated man is the source is 1/[p(10,12,13) + p(13,13) + p(10,13) + p(12,13)], where each p is the random-match probability for the full multilocus profile that has the D5S818 genotype shown in parentheses. Ms. Lambatos computed the probability p(12,13) as falling in the quadrillionths. Although I have not consulted allele frequency tables, it is a safe bet that similarly small probabilities would pertain to the profiles with the (13,13) and (10,13) genotypes. The random-match probability for the profile with the tri-allelic pattern would be even smaller. (When asked by the defense, Ms. Lambatos testified that a tri-allelic male was not a real possibility.) Therefore, I would predict that the correct computation would differ from the number given to the judge by less than an order of magnitude. Hence, the witness’s failure to clarify or correct the prosecutor in her questioning affected the probative value of the evidence minimally.
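
A rough numerical sketch may make the counterargument concrete. The four probabilities below are hypothetical placeholders chosen only to be "in the quadrillionths" or smaller; they do not come from the Cellmark report, the testimony, or any allele frequency table. Only the form of the likelihood ratio, one over the sum of the random-match probabilities of the consistent profiles, is taken from the paragraph above.

```python
from fractions import Fraction

# Hypothetical random-match probabilities for the four full profiles that
# are consistent with the mixture, keyed by D5S818 genotype. These are
# illustrative placeholders, NOT figures from the case.
p = {
    "(12,13)": Fraction(1, 10**15),     # the one profile described on direct
    "(13,13)": Fraction(1, 10**15),
    "(10,13)": Fraction(2, 10**15),
    "(10,12,13)": Fraction(1, 10**18),  # tri-allelic pattern, assumed rarer
}


def likelihood_ratio(probs):
    """LR for 'Williams is the source' vs. 'a random, unrelated man is'."""
    return 1 / sum(probs)


lr_single = likelihood_ratio([p["(12,13)"]])  # single-profile figure
lr_all = likelihood_ratio(p.values())         # all consistent profiles

# Factor by which the single-profile figure overstates the likelihood ratio.
ratio = lr_single / lr_all
print(float(lr_single), float(lr_all), float(ratio))
```

With these made-up inputs the single-profile figure overstates the likelihood ratio by a factor of about four. Different inputs would give a different factor, but it stays below ten as long as the alternative profiles together are less than nine times as common as the (12,13) profile.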

Nevertheless, for the expert to present such oversimplified testimony without any qualification seems problematic to me. When confronted with the omissions on cross-examination, the expert owned up to them, but did she ask the prosecutor to present the expert’s reasoning accurately in the first place? And if she did ask, why didn’t the prosecutor do it?

References

1. G. Mertens, S. Rand, E. Jehaes et al., Observation of Tri-allelic Patterns in Autosomal STRs During Routine Casework, 2 Forensic Sci. Int’l: Genetics Supplement Series 38-40 (2009).

2. NIST STR-base, Tri-Allelic Patterns, June 2, 2011, http://www.cstl.nist.gov/strbase/var_D5S818.htm#Tri

3. American Society of Crime Laboratory Directors Laboratory Accreditation Board, ASCLD/LAB Guiding Principles of Professional Responsibility for Crime Laboratories and Forensic Scientists, Principle 19, Version 1.1, 2009.

Another expert succumbs to the transposition fallacy

A book that attempts to inform defense lawyers on how to handle DNA cases is Dealing with DNA Evidence: A Legal Guide (London: Routledge-Cavendish 2007). In this short primer, Andrei Semikhodskii, Director of Medical Genomics, Ltd., explains that “[u]nderstanding how DNA evidence is obtained and evaluated helps lawyers to find pitfalls in evidence and in data interpretation … .” (P. xi).

Fair enough, but the burden on a book whose purpose is to provide accurate explanations is a heavy one. A common mistake in DNA and other statistical testimony is transposition: mistaking the probability of the evidence given a hypothesis, P(E|H), for the probability of the hypothesis given the evidence, P(H|E). (See the blog of January 18, 2010, on McDaniel v. Brown.) A variation on the transposition fallacy occurs in parentage tests. Dr. Semikhodskii’s laboratory advertises the “world’s most accurate paternity testing,” but Dealing with DNA Evidence is less than pellucid when it explains that

DNA testing does not give a 100 per cent probability of confirming parentage. When biological parentage is possible, its likelihood is estimated by the CPI [Combined Parentage Index]. The value of the CPI indicates how many more times the alleged parent is likely to be the true biological parent of the child than in comparison to an untested unrelated individual from the same population. (P. 45).

Apparently, the book is referring to a likelihood ratio for the hypothesis that the tested man is the father as against the hypothesis that an unknown man (with no close genetic relationship to the accused) is. But a likelihood ratio that takes on some value x does not mean that the tested man is x times more likely to be the father than is the untested man. It means that the genetic data are x times more likely to arise if the tested man is the father than if the unknown man is.

Not clear? Well, suppose a ridiculously limited genetic test indicated that a child is 10 times more likely to inherit a genotype from his mother and the putative father than from his mother and a randomly selected man (of equal fertility). Does this mean that the putative father is ten times more likely than Mr. Random to be the biological father? It cannot mean this (in general). After all, if the putative father were up in the International Space Station (and the mother was not) during any plausible date of conception, the likelihood ratio would still be 10. Geneticists can compute the chances of a child’s inheriting various alleles if and when a given man is the father. Even with the best paternity test in the universe, the laboratory cannot compute the chance that the man is the father just by knowing the alleles the child inherited from his father.
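
The point can be put in a few lines using Bayes’ rule in odds form (posterior odds = prior odds × likelihood ratio). This is only a sketch: the function name and the prior probabilities are illustrative assumptions, and it treats the putative father and Mr. Random as the only two candidates.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    if prior_probability >= 1:
        return 1.0  # certainty cannot be revised by any evidence
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


LR = 10  # the "ridiculously limited" genetic test from the text

# Putative father was in orbit at every plausible date of conception,
# so the prior probability of paternity is zero.
on_space_station = posterior_probability(0.0, LR)

# Hypothetical even prior: he and Mr. Random are equally likely beforehand.
even_prior = posterior_probability(0.5, LR)

print(on_space_station, even_prior)
```

The same likelihood ratio of 10 leaves the probability of paternity at zero in the first case and raises it to 10/11 in the second. The laboratory supplies the likelihood ratio; the prior has to come from somewhere else.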

Therefore — and contrary to this expensive guide for lawyers — the likelihood ratio does not “show how many times more plausible the prosecution hypothesis is given the DNA evidence.” (P. 76). The ubiquitous transposition fallacy is at work here, as it is in the case law. (I discuss some cases involving such transposition in the likelihood ratio in The Modern Wigmore on Evidence: Expert Evidence.)

This confusion between a “likelihood” P(E|H) (the probability of data given a hypothesis) and a “posterior probability” (that the hypothesis is true given evidence in support of that hypothesis) infects a later discussion of the rule that “[t]he expert should not be asked his opinion on the likelihood that it was the defendant who left the crime stain … .” R. v. Doheny [1997] 1 Cr. App. R. 369. Dr. Semikhodskii thinks that “in contravention of this ruling, almost every DNA report submitted to courts does contain the verbal expression of how much support is to be given to the prosecution hypothesis and in most cases this is allowed to be admitted and aired in front of the jury.” (P. 60). But if “what is admitted and aired” is merely a likelihood ratio and a characterization of its magnitude in English, the expert is not giving “an opinion on the likelihood that it was the defendant who left the crime stain.” An expert who states that it is, say, 100,000 times more likely for certain evidence to arise when the defendant really is the source than otherwise and that this means that the evidence gives “very strong support” to this hypothesis is avoiding rather than offering a statement about the source probability.

Somehow or other, the expert must explain the strength of the evidence to the jury, and classifying it as weak or strong is one way to do it. Indeed, a committee of the U.S. National Academy of Sciences recently recommended that forensic scientists use such standardized terminology to characterize evidence. The problem with this recommendation is not that it invades the province of the jury by directly expressing an opinion on an ultimate issue, but that the verbal predicate is superfluous. If the expert can state the numerical value of the likelihood ratio (the quantity that measures the strength of the evidence rather than the probability of the hypothesis), then what does adding an arbitrary but standard adjective accomplish?

Let’s hope there is a better guide for lawyers.