80A Statistical evaluation in forensic DNA typing
by Henry Roberts and Aimee Pollett

[We are indebted to numerous people for communicating their ideas to us. Sections of this chapter are based on material presented in particular by J.S. Buckleton, B. Budowle, I.W. Evett and B.S. Weir in international forums and informal discussions. Suggestions by J.S. Buckleton, S.J. Gutowski, C.M. Triggs and B.S. Weir greatly improved the first edition of this chapter.]

THOMSON REUTERS 80A - 1 Update: 68 EXPERT EVIDENCE

Author information

Aimee Pollett is a forensic scientist in the Victoria Police Forensic Services Department. She gained her Bachelor of Science degree majoring in Biochemistry and Molecular Biology at the University of Melbourne in 2000. She completed a Postgraduate Diploma in forensic science at La Trobe University in 2002. In 2003, she started her employment within the Biology division of the Victoria Police Forensic Services Centre at Macleod, Victoria. In 2008, she completed a postgraduate course entitled “Biostatistics for Forensic DNA Profile Interpretation” offered by the University of Washington.

Dr Henry Roberts is a forensic scientist in the Victoria Police Forensic Services Department. He gained his Bachelor of Arts degree in Biology with Chemistry at the University of York (United Kingdom) in 1969. He completed a Doctor of Philosophy degree at the University of Oxford in 1971, working in protein chemistry. He has 30 years’ experience in the areas of forensic biology and forensic chemistry. From 1988 to 2000 he was head of the VPFSD DNA analysis laboratory. His current position is leader of the DNA Interpretation and Statistics Unit. He is a member of the Australasian Scientific Working Group – Forensic DNA Statistics (STATSWG). He is author or co-author of 12 papers in the scientific literature on the subjects of biochemistry and the use of DNA profiling in forensic science.
Ms Pollett and Dr Roberts may be contacted at:
DNA Interpretation and Statistics Unit
Biological Examination Branch
Victoria Police Forensic Services Department
31 Forensic Drive
MACLEOD VIC 3085
AUSTRALIA
Telephone: 61 (03) 9450 3444
Fax: 61 (03) 9450 3601
Email: [email protected]; [email protected]

COPYRIGHT AND INFRINGEMENT NOTICE
All rights reserved under Australian and International Copyright Conventions. No part of this work covered by copyright may be used, reproduced or copied in any form or by any means (graphic, electronic or mechanical, including photocopying, recording, record taping, or information retrieval systems) without the written permission of the Victoria Police Force. Copyright in this work has worldwide protection, and any unauthorised use, reproduction or copying of this work may be an infringement of copyright which the Victoria Police Force is entitled to prevent.

TABLE OF CONTENTS
INTRODUCTION [80A.10]
  Why do we need statistics to interpret DNA profiles? [80A.10]
    What does a DNA profile tell us? [80A.20]
    What kind of information does a DNA profile NOT provide? [80A.30]
  Laboratory error [80A.100]
  Match probability [80A.200]
    Can we ever be certain that the suspect is the source of the DNA at the crime scene? [80A.210]
    If alleles are not unique, are whole profiles unique? [80A.220]
    Is DNA typing different from other comparative forensic techniques in this respect? [80A.230]
  Approaches to solving the match probability problem [80A.300]
    What assumptions are made? [80A.310]
    Which population? [80A.320]
    Drawing conclusions from a database [80A.330]
PROBABILITY THEORY [80A.1000]
  Probability [80A.1000]
    Estimating allele frequency [80A.1110]
  Laws of Probability [80A.1210]
    First law of probability [80A.1210]
    Second law of probability [80A.1220]
    Joint probabilities [80A.1230]
    Conditional probabilities - Third Law of Probability [80A.1240]
  Likelihood Ratios [80A.1300]
  Genotype frequency and match probability [80A.1400]
FALLACIES AND FANTASIES [80A.2100]
  The danger of misusing statistics [80A.2100]
    Prosecutor’s fallacy [80A.2110]
    Defence attorney’s fallacy [80A.2120]
    The meaning of frequencies [80A.2130]
    Database searches [80A.2140]
    Uniqueness and individualisation [80A.2150]
  Verbal scales [80A.2200]
DATABASES [80A.3000]
  Sample selection [80A.3100]
  Making estimates from population samples [80A.3200]
  Sampling uncertainty [80A.3300]
    Confidence limits [80A.3310]
    The “factor of 10” rule [80A.3320]
    Bootstrap [80A.3330]
    Bayesian support interval or size bias correction [80A.3340]
    Highest posterior density [80A.3350]
    Comparison of methods to estimate sampling effects [80A.3360]
MODELLING THE POPULATION [80A.4100]
  A simple model [80A.4100]
    Testing the model [80A.4110]
    Chi-square test [80A.4120]
    Exact tests [80A.4130]
    Conclusions from testing [80A.4140]
  Subpopulation theory [80A.4200]
    Modelling subpopulations [80A.4210]
    The sampling formula [80A.4220]
    Heterozygote probability [80A.4230]
    Homozygote probability [80A.4240]
    Subpopulation theory and linkage between loci [80A.4250]
    What value of θ to use [80A.4260]
  Nonconcordances [80A.4300]
    Considering drop-out in an assumed single source profile [80A.4310]
MORE COMPLEX PROBABILITIES [80A.5100]
  Mixtures [80A.5100]
    Identifying the presence of a mixture [80A.5110]
    Characteristics of single-source DNA profiles [80A.5120]
    Procedure for the interpretation of mixtures [80A.5130]
    Simple mixed stain example [80A.5140]
    Random Man Not Excluded [80A.5150]
    Application of subpopulation theory to mixtures [80A.5160]
    Complex mixtures [80A.5170]
    Implementing guidelines for mixture interpretation [80A.5180]
    Resolving 2-person mixtures [80A.5190]
    Unresolvable two-person mixtures [80A.5200]
    Low-level profiles with the possibility of dropout [80A.5210]
    Mixtures of DNA from more than two people [80A.5220]
  Paternity calculations [80A.5300]
    Paternity trio [80A.5310]
    Exclusions [80A.5320]
  Missing persons [80A.5400]
    Family tree [80A.5410]
    Comparison of DNA profiles [80A.5420]
  Relatives [80A.5500]

Abbreviations
df      degrees of freedom
DNA     deoxyribonucleic acid
FST     Wright’s Fixation Index
IBD     identical by descent
LR      Likelihood Ratio
NAFIS   National Automated Fingerprint Identification System
NRC     National Research Council
PCR     polymerase chain reaction
POI     person of interest
RFLP    restriction fragment length polymorphism
RFU     relative fluorescence unit
RMNE    random man not excluded
STR     short tandem repeat
θ       co-ancestry coefficient

GLOSSARY
allele — one of two or more different forms of a gene or DNA sequence at a genetic locus that can exist on different chromosomes
allele frequency — the proportion of all the alleles observed at a locus, among the profiles of individuals within a particular database, that are of a particular type
Bayes theorem — a mathematical formula used for calculating conditional probabilities. It figures prominently in Bayesian approaches to statistics
co-ancestry coefficient — see FST
concordance — an allele in an evidentiary sample that matches a corresponding allele in a person of interest’s profile
confidence interval — an interval which is expected to include the unknown true value of a particular parameter a specified proportion of the time
constrained model — a model that uses peak height information and/or mixture proportion rules to exclude genotype combinations that do not meet acceptable thresholds within mixed DNA profiles
cumulative distribution function — a function, derived from a probability density function, that gives the probability that a variable takes a value less than or equal to a specified value; it corresponds to the area under the probability density curve up to that value
database — a list of DNA profiles obtained from a collection of individuals in a group or population
drop-out — a phenomenon where an allele may not be detected due to low levels of template DNA in a sample
ethnic group — a group of people whose members have common ancestral origin
explicable non-concordance — the absence of a person of interest’s allele in an evidentiary profile that can be explained by known phenomena such as drop-out or somatic mutation leading to extreme peak height imbalance. This type of non-concordance does not lead to an exclusion.
FST or θ — more or less interchangeable terms that describe the relatedness of individuals within a population
genetic drift — the tendency for the genetic makeup of a population to change with time owing to the random nature of inheritance of alleles, and the consequent finite probability of some alleles becoming rare or even extinct in the population simply because they failed by chance to be passed from one generation to another
genotype — characterisation of an individual’s alleles at a particular site on their DNA
Hardy-Weinberg Equilibrium — the observation that the proportions of the various genotypes at a particular locus are the same in successive generations in a population
highest posterior density — a statistical method that is used to account for the uncertainty which arises as a result of using a sample of a population to make estimates about the whole population. The method generates an interval which captures the most probable values of a particular variable such as allele frequency.
inexplicable non-concordance — the absence of a person of interest’s allele in an evidentiary profile that cannot be explained in terms of drop-out or the number of contributors proposed. In other words, the absence of an allele when it would be expected to be present if the person of interest were a contributor to the evidentiary sample. This type of non-concordance leads to exclusions.
intron — a portion of the gene not translated into protein; an intervening or non-coding sequence
Likelihood Ratio — the ratio of the probability of the evidence given one proposition (usually the prosecution hypothesis) to the probability of the evidence given an alternative proposition (usually the defence hypothesis). For single-source profiles the Likelihood Ratio is the inverse of the match probability.
linkage equilibrium — a state in which multilocus genotype proportions are the same in successive generations in a population, in which alleles at different loci are statistically independent, and in which the genotype at one locus does not influence the probability of a genotype at another locus
match/matches — the situation where a person of interest’s alleles are the same as those in the evidentiary profile. This means that the person of interest is not excluded as being the source of the DNA.
match probability — the probability that a second person from some population possesses the same single-source DNA profile
mean — the mathematical average of a set of numbers
mixture proportion — the relative proportions of DNA from the individual contributors to a mixed DNA profile
multinomial distribution — a statistical formula that gives the probability of the possible results of an experiment with repeated trials in which each trial can result in a specified number of outcomes that is greater than two, eg the results of tossing two dice, because each die can land on one of six possible values
mutually exclusive — a statistical term used to describe two or more possible alternative outcomes of which only one can occur: the occurrence of any one of them rules out the others. Mutually exclusive events cannot occur at the same time.
non-coding — sections of the DNA that are not translated into protein
non-concordance — an allele in a person of interest’s profile that is not present in an evidentiary profile
normal distribution — a symmetrical, bell-shaped statistical distribution. A normal distribution is completely described by its mean and variance.
peak height ratio — the ratio of the heights of the two peaks of a heterozygote (the smaller peak height divided by the larger)
population genetics — the study of the frequency of genes and alleles in various populations
probability density function — a statistical distribution that describes the probability that a variable may take on a range of values
probability interval — an interval which is expected to include the unknown true value of a particular parameter, with an associated probability
product rule — a model used to evaluate the strength of a DNA match, in which allele frequencies are multiplied to obtain locus genotype frequencies, and these are then multiplied to estimate the frequency of the whole profile; a statistical model in which the probability of a set of characteristics is the product of the probabilities of the individual characteristics
racial group — a population genetic term used to describe one of the four major racial classifications of humans: Caucasian, Negroid, Mongoloid (east Asian) and Australoid
random man not excluded — the chance that someone selected at random (random man) could not be excluded as a contributor to a set of alleles observed in a mixture
relative fluorescence unit — the unit of measurement of the intensity (height) of an allele peak
relative frequency — the number of times a particular outcome is observed (counts) divided by the total number of trials
standard deviation — a measure of the spread of a set of data about its mean. The more spread apart the data, the higher the standard deviation. Mathematically, the standard deviation is the square root of the variance.
stutter — a phenomenon occurring during the amplification process that generates a small peak one repeat unit (four base pairs for tetranucleotide STRs) shorter or longer than a larger allele peak
unconstrained model — a model that considers all possible genotype combinations within a mixed DNA profile
Wahlund effect — the change in genotype proportions (an excess of homozygotes and a deficit of heterozygotes, relative to Hardy-Weinberg expectations) observed when two or more populations that differ genetically, or were once isolated, and that do not mate at random with each other, are pooled and treated as one

INTRODUCTION
Why do we need statistics to interpret DNA profiles?
[80A.10] Let us suppose that a DNA profile has been obtained from a biological sample found at a crime scene that is believed to have been left by the perpetrator of the crime. This profile is then compared with profiles from one or more reference samples from individuals who are considered to be possible sources of this material.

[80A.20] What does a DNA profile tell us?
1. We can eliminate people whose DNA profile characteristics (alleles) are not present when we would expect to find them.
2. Conversely, a person whose DNA profile matches the profile of the crime scene sample is not excluded as a source of the biological material in question.

[80A.30] What kind of information does a DNA profile NOT provide?
1. It does not identify a matching suspect as the source of the crime scene material, because we cannot be sure that no-one else has the same set of matching characteristics (alleles).
2. It tells us nothing about how or when the DNA came to be at the crime scene.
3. In particular, it does not tell us who else could have been the source of the DNA.

There are several possible explanations for a match between two DNA profiles:
1. The samples come from the same person.
2. The crime scene sample comes from another individual whose profile matches by chance.
3. The profile matches because it comes from a close relative.
4. A laboratory error occurred.

The second and third explanations are the main focus of this chapter. However, to put the debate in context, it is first necessary to consider how the possibility of an error may be handled.

Laboratory error
[80A.100] Scientific evidence is only as good as the people and systems that produce it. Many forensic science laboratories have adopted quality assurance programs and are regularly inspected by accrediting bodies (for example, the National Association of Testing Authorities, Australia, and the American Society of Crime Laboratory Directors – Laboratory Accreditation Board) to demonstrate that their testing procedures adhere to accepted standards. Nevertheless, the possibility of human error cannot be entirely eliminated.

Errors in test results may be classified as either false positives or false negatives. A report of a DNA match between two samples that actually have different profiles is an example of a false positive. A false negative occurs when a report excludes a person as the source of a DNA sample that actually came from that person. False negative results may be a concern to the community, in that they may allow guilty persons to evade detection. However, false positive results are of more immediate concern to innocent suspects, who may be falsely implicated in criminal investigations.

Errors leading to false positives or false negatives can have several possible origins, for example technical errors or limitations, contamination, sample substitution and clerical errors. (Some laboratories even go as far as to investigate their own staff to see whether one of them could be the source of the DNA.
There are proposals that such an investigation should be extended to include other investigators and law enforcement personnel who may have come into contact with the evidence. The UK has established an elimination database of Scenes of Crime Officers and police who attend scenes. Ethical, privacy and employee rights considerations may prevent such a search.)

The second report of the National Research Council (1996) argued that it would be inappropriate to incorporate laboratory error rates into estimates of the strength of DNA profile evidence; it suggested that a better approach would be to establish whether an error had occurred in each individual case. Thompson, Taroni and Aitken (2003), however, consider that the smaller the chance of a random match, the larger the impact the probability of a laboratory error may have on the weight of DNA evidence. That is, suppose one were to weigh the two possible causes of a DNA match that are consistent with the innocence of the defendant: that someone else is the culprit and his DNA matches the defendant’s by chance, or that the laboratory made an error that falsely incriminated the defendant. The smaller the probability of the first explanation, the more the probability of error comes to dominate the overall probability that an innocent defendant is reported as matching. That is not to say, however, that a low random match probability implies a high probability of error, which clearly is not the case.

There are few reliable studies of the rate of false positive results reported by laboratories. Possible sources of information include proficiency test reports and documented instances of errors in cases. Both sources of information have drawbacks. Proficiency tests are sometimes used as training aids for inexperienced laboratory staff.
Staff undertaking proficiency tests generally know that they are being tested, and therefore may be tempted either to take more care, or alternatively to complete the test more hastily, than they otherwise might. Blind proficiency tests are difficult and expensive to devise and administer. The number of errors that come to light in casework reports, on the other hand, can be assumed to underestimate the actual number of errors that occur. Estimating an error rate from historical industry-wide proficiency test results, or from casework, and then applying the estimate to current casework in a particular laboratory would involve the bold assumption that the probability of making an error is the same across laboratories and over time. The probability that an error has occurred in a particular case can be reduced by ensuring that the laboratory adheres to accepted protocols, by scrutinising the actual analytical records, by duplicate testing, or by finding more than one matching profile.

It is preferable to consider separately the issues of chance matches, matches with close relatives, and laboratory error. It is not the purpose of this chapter to evaluate the impact of laboratory error. In the following discussion of the genetic and statistical aspects of match probabilities, the possibility that errors occur will not be considered.

Match probability
[80A.200] Consider the situation where a suspect has a DNA profile that matches the profile of biological material associated with a crime. DNA statistics can be used to estimate the probability that a second person from some population possesses the same profile. This is termed the match probability. Multi-locus DNA profiles containing information from nine loci have not been found to occur more than once in large collections of profiles: Weir (2003).
Therefore we would make an informed guess that it is highly unlikely that a second person would have the same DNA profile as the suspect. If this could be shown to be true of our particular crime scene material, it would provide very strong support for the alternative view, namely that the DNA came from the suspect, and very little support for the view that it came from a person other than the suspect.

Note that the DNA expert cannot go any further than this conclusion: Taroni, Lambert, Fereday and Werrett (2002). No matter how strongly the DNA evidence supports the proposition that it was the suspect who left the DNA, and no matter how unlikely it is that a second person has the same profile, the statistics take no account of, for example, an alibi that the suspect may have, or the chance that someone else with the same profile had the opportunity to leave DNA at the crime scene. This other evidence is not within the realm of the DNA expert.

So the DNA expert needs to estimate the chance of finding a matching profile in someone other than the suspect, and the Court needs to put this information together with the other information about the suspect, the crime scene and other possible sources of the material, to arrive at its own estimate of the probability that the source of the DNA was the suspect.

Needless to say, these two quite different roles have occasionally been confused. On the one hand, the scientific question of how best to estimate match probabilities has been the subject of disagreement among experts called by opposing sides. Conducting this debate in court has, in the authors’ experience, occupied up to ten days of court time: see R v Sfoygaristos (unreported, Victorian County Court, 1992). On the other hand, the legal question of the probability that the suspect was not the source of the DNA should not be put to a forensic scientist: Taroni, Lambert, Fereday and Werrett (2002).
This can be a ground for a successful appeal: see R v Deen (unreported, English Court of Appeal, 1993), cited in Matthews (1994); Doheny and Adams v The Queen [1997] 1 Cr App R 369.

[80A.210] Can we ever be certain that the suspect is the source of the DNA at the crime scene?
DNA profiles are made up of combinations of alleles that, for the purposes of this discussion, are the same in all tissues in a person’s body (with some rare exceptions: Ainsworth (2003)). However, none of the alleles that can be identified using current technology is unique:
• They were inherited from that person’s parents, and therefore were present in his/her ancestors, and are almost certainly present in some of his/her other relatives;
• Databases (lists of DNA profiles) show that any given allele is present in many other people who are not known to be closely related to the person.

Particular alleles can be rare or common, but only very occasionally does an allele turn up that has never been seen before: Margolis-Nunno, Brenner, Cascardi and Kobilinsky (2001); Walsh et al (2003).

[80A.220] If alleles are not unique, are whole profiles unique?
A profile is simply a combination of alleles in an individual. Studies have shown that profiles composed of alleles at four loci may occur more than once in population samples comprising a few hundred individuals: Sudbury, Marinopoulos and Gunn (1993). Matches at six or seven loci have occasionally been found: Kidd et al (1991); Weir (2003); JS Buckleton, personal communication. It has been calculated (BS Weir, personal communication) that for a profile consisting of 8 STR loci having 10 alleles each, there are 8.4 x 10^13 possible genotypes (each locus has 10 homozygous and 45 heterozygous genotypes, 55 in all, and 55^8 is approximately 8.4 x 10^13). There are not enough people in the world for every possible profile to exist: even if every person had a different profile, less than 0.01% of all possible 8-locus profiles could exist.
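The genotype count in [80A.220] can be checked with simple arithmetic. The sketch below is a minimal illustration, not the method used in the cited calculation: it assumes, as in the example, that every locus has exactly 10 alleles and that any two alleles (including two copies of the same allele) can form a genotype. The world population of 8 billion is an illustrative assumption, not a figure taken from this chapter, and the function names are hypothetical.

```python
from math import comb


def genotypes_per_locus(n_alleles: int) -> int:
    """Distinct genotypes at one locus with n_alleles possible alleles:
    n_alleles homozygotes plus C(n_alleles, 2) heterozygotes."""
    return n_alleles + comb(n_alleles, 2)


def possible_profiles(n_loci: int, n_alleles: int) -> int:
    """Distinct multi-locus profiles, assuming (for illustration) that every
    locus has the same number of alleles and that genotypes at different
    loci can combine freely."""
    return genotypes_per_locus(n_alleles) ** n_loci


# Figures from the example in the text: 8 STR loci, 10 alleles per locus.
total = possible_profiles(n_loci=8, n_alleles=10)

# Illustrative assumption, not from the chapter: a world population of 8 billion.
WORLD_POPULATION = 8_000_000_000

# Even if every living person carried a different profile, at most this
# fraction of the possible profiles could actually exist.
max_fraction = WORLD_POPULATION / total

print(f"{genotypes_per_locus(10)} genotypes per locus")
print(f"{total:,} possible profiles (about {total:.1e})")
print(f"at most {max_fraction:.4%} of possible profiles can exist")
```

Under these assumptions there are 55^8 = 83,733,937,890,625 possible profiles, and at most about 0.0096% of them (roughly one in ten thousand) could be carried by living people. Changing the assumed population by a billion or so does not change the order of magnitude of this bound.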
Herein lies the problem:
• logic says (and studies such as those above imply) that some existing multi-locus profiles, though extremely rare, may occur in more than one person;
• but we have not yet examined enough DNA profiles to find out (by counting) how often any given profile occurs (ie how many people have the same profile);
• and we could not prove conclusively that each profile was unique until all people in the world (or in any particular group or population considered relevant) had been typed;
• and the rarity of any profile prevents any check of the accuracy of methods for calculating the probability that a second person has the same profile as the suspect.

Nevertheless, some experts are prepared to claim that the probability of a matching profile occurring at random is so small that they consider the profile in question to be unique (Budowle, Chakraborty, Carmody and Monson (2000; 2001)) or effectively individual (L. Freney, personal communication). It is important to realise that statements of this kind are given as expert opinions based on common sense rather than on scientific or statistical proof.

[80A.230] Is DNA typing different from other comparative forensic techniques in this respect?
Physical or chemical comparisons to establish identity of source are commonplace in the fields of hair and fibre examination, fingerprints, handwriting, firearms and toolmarks, shoe impressions, and glass refractive index and chemical composition data. All of these sciences currently rely on the examiner’s opinion as to whether two samples are from the same or different sources. In some of these fields, statistical data on which estimates of rarity can be based do exist (for example, NAFIS in Australia and the FBI database in the United States of America each contain several million sets of fingerprints). However, following the US Supreme Court’s decision in Daubert v Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993), fingerprints, shoe impressions (R v Wong) and knife marks were challenged in the courts as to their ability to “individualise”. It has been argued (Champod and Evett (2001); Giannelli (2010); Aitken et al (2011)) that these other sciences will eventually need to provide estimates of chance match probability. Perhaps because such estimates were made from the outset in DNA profiling, DNA evidence has been more closely scrutinised than other types of forensic comparisons in some jurisdictions, particularly in the United States (Lynch (2003)), but also in Victoria.