DNA Forensics
• DNA forensics deals with the use of recombinant DNA technology on one or more biological specimens for forensic investigation
• Common uses of DNA forensics include human identification, kinship analysis for missing person identification, parentage testing, etc.
• Probability and statistics play important roles in assessing the strength of DNA evidence in all such applications
• Events in DNA forensics are generally low-probability events, and statistical assessment of DNA forensic data requires estimation based on sparse multi-dimensional data

Brief Introduction of the DNA Forensics Session of the Symposium
• Four talks will address some of the major statistical/probabilistic issues of DNA forensics
• The current paradigm of the topic will be the focus of the first talk (R. Chakraborty)
• B. Budowle will address challenges to this paradigm when DNA quantity is low, and the identification of the source of microbial agents in forensic samples
• T. Wang will introduce the need for pedigree-based probabilistic calculations for missing person identification
• A. Eisenberg will discuss possible statistical formulations applicable to newer technologies that are being (or are about to be) implemented in the field
• All four speakers are major players in DNA forensics in the country, have contributed significantly to its development, and together have over 75 years of experience working in the subject

Statistical and Probabilistic Issues in DNA Forensics: Current Paradigms
Ranajit Chakraborty, PhD
Robert A. Kehoe Professor and Director, Center for Genome Information
Department of Environmental Health, University of Cincinnati College of Medicine
Cincinnati, OH 45267, USA
Tel. (513) 558-4925/3757; Fax (513) 558-4505
(Presentation at the University of Cincinnati Symposium on Probability Theory and Applications on March 21, 2009)

Overview of the Talk
• Brief History of DNA Forensics
• Currently Used DNA Markers in Forensics
• Three Generic Forensic Scenarios
• Examples of DNA Evidence Data
• Frequency, Likelihood, and Bayesian Logic of DNA Statistics
• Population Substructure and Its Effect on DNA Statistics
• Lineage Markers (mtDNA and Y-STR haplotypes)
• Match and Partial Match in Databases

Brief History of DNA Forensics
• 1980 – Ray White described the first hypervariable RFLP marker
• 1985 – Alec Jeffreys discovered multilocus VNTR probes (the term "DNA fingerprinting" was coined)
• 1985 – First paper on PCR published
• 1988 – In the US, the FBI started DNA forensic casework
• 1991 – First STR paper published
• 1992 – NRC-I Report issued
• 1994 – CODIS STR loci characterized
• 1995 – FSS started the UK DNA Database
• 1996 – NRC-II Report issued; mtDNA introduced in forensics
• 1997 – 13 CODIS STR loci validated for forensic use; Y-STRs described for forensic investigation purposes
• 1998 – FBI launched the CODIS database
• 2000 – RFLP technology replaced by multiplex STR technology
• 2002 – FBI mtDNA population database published; Y-STR 20plex published
• 2002 – SNPs proposed as supplementary markers
• 2004 – Large sizes of offender databases opened issues of coincidental full/partial matches
• 2007 – Familial search through partial-match occurrences in databases

Advantages of Use of STR Loci in DNA Forensics
• PCR based
• Applicable to low-quantity DNA
• Applicable to degraded DNA
• Amenable to automation
• Non-isotopic
• Rapid typing
• Discrete alleles
• Abundant in the genome
• Highly informative (satisfied by the CODIS STRs)

[Figure: 15 CODIS STR Loci with Chromosomal Positions (CSF1PO, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, FGA, TH01, TPOX, VWA, Penta D, Penta E; plus AMEL)]

Three Types of DNA Forensic Issues
• Transfer evidence: the DNA profile of the evidence sample indicates that it is of single-source origin
• Mixture of DNA: the DNA profile of the evidence sample suggests that it is a mixture of DNA from multiple (more than one) individuals
• Kinship determination: the DNA of the evidence sample, compared with one or more reference profiles, is used to determine the validity of a stated biological relatedness among individuals

[Figure: Transfer Evidence – An Example]
[Figure: DNA Mixture Analysis (amelogenin, D8S1179, D21S11, D18S51) – Inclusion]
[Figure: mtDNA Lineage Marker]
[Figure: Y-Chromosomal Genes (Lahn, Pearson & Jegalian 2001)]
[Figure: Y-STR Loci]

Three Types of Conclusions
• Exclusion
• Match, or inclusion
• Inconclusive

Statistical Assessment of DNA Evidence
• Needed most frequently in inclusionary events
• (Apparent) exclusionary cases may also sometimes be subjected to statistical assessment, particularly for kinship determination, because of genetic events such as mutation, recombination, etc.
• Loci providing inconclusive results are often excluded from statistical consideration
• Even if one or more loci show inconclusive results, inclusionary observations at the other typed loci can be subjected to statistical assessment

Approaches for Statistical Assessment of DNA Evidence
• Frequentist approach: indicates the coincidental chance of the observed event
• Likelihood approach: indicates the relative support for the observed event under two contrasting (mutually exclusive) stipulations regarding the source of the evidence sample
• Bayesian approach: provides a posterior probability regarding the source, when the data in hand are combined with a prior probability for the source (the latter is generally not provided by the DNA profiles being assessed)

Frequentist Approach of Statistical Assessment for Transfer Evidence
When the evidence sample DNA profile matches that of the reference sample, one or more of the following questions are answered:
• How often would a random person provide such a DNA match? Equivalently, what is the expected frequency of the profile observed in the evidence sample?
  This is also called the Random Match Probability; its complement is the Exclusion Probability
• What is the expected frequency of the profile seen in the evidence sample, given that it has been observed in another person (namely, in the reference sample)? This is also called the Conditional Match Probability
• What would be the expected frequency of the profile seen in the evidence sample in a relative (of specified kinship) of the reference individual, given that the reference and evidence samples match? This is also called the Match Probability in Relatives

Frequentist Approach of Statistical Assessment for DNA Mixture
When the evidence mixture DNA profile fails to exclude a reference sample as a contributor, and more commonly when a set of reference samples together explains all alleles seen in the mixture, one or more of the following questions are answered:
• How often would a random person be excluded as a contributor to the mixture sample? This is also called the Exclusion Probability; its complement, the inclusion probability, gives the expected chance of Coincidental Inclusion. (Note: this answer is based on the data from the evidence sample alone, without any consideration of the profiles of the reference samples.)
• With a stipulated number of contributors, how often would a random person's DNA, mixed with that of one or more of the reference persons, produce a mixture profile like the one seen in the evidence sample, given that the reference persons are also contributors to the DNA mixture? (Note: this answer considers the profiles of the evidence sample as well as those of the reference samples stipulated to be contributors.)

Kinship Assessment – Frequentist Approach
When comparisons of evidence and reference samples fail to exclude a stated relationship between the evidence sample and the reference individual(s), the frequency-based question is of the form:
• What is the chance of excluding the stated relationship?
  This is called the Exclusion Probability (PE), and it is generally computed conditional on the profiles of the reference samples and the stated relationship
Note: An average exclusion probability can also be computed disregarding the profiles examined; this rationalizes the choice of loci to be typed for validating the stated relationship

Concept of Likelihood
A likelihood represents the support for a given hypothesis (or value of a parameter) provided by the observations in the data, written as Likelihood = Prob. (Data | Hypothesis). Technically, the likelihood is mathematically identical to the probability of the data given the hypothesis, but it is interpreted as a function of the hypothesis (or of the parameter values specified by the hypothesis) for the observed data.

Likelihood Ratio
With two (mutually exclusive) hypotheses, say H1 and H2, the likelihood ratio (LR) is the ratio of the probabilities of observing the same data under H1 and H2:
LR = Prob. (Data | H1) / Prob. (Data | H2)
Meaning of LR:
• LR < 1: Data less well supported by H1, compared with H2
• LR = 1: Data equally well supported by H1 and H2
• LR > 1: Data better supported by H1, compared with H2

LR in Transfer Evidence – Background
Data: The DNA profile of the evidence sample (E) matches that of the suspect (S); i.e., E = S
Contrasting scenarios of source (hypotheses):
• Hp: DNA in the evidence sample came from the suspect
• Hd: DNA in the evidence sample came from someone other than the suspect, but it coincidentally matches the DNA profile of the suspect

LR in Transfer Evidence – Computation
LR = Pr. (Data | Hp) / Pr. (Data | Hd) = Pr. (E = S | Hp) / Pr. (E = S | Hd) = 1 / Pr. (coincidental match)
Thus, the LR in this case is simply the inverse (reciprocal) of the relative frequency of the evidence sample's DNA profile in the population, given that it is the same as that of the suspect.

LR in Transfer Evidence – Variation
Since an LR can be defined for any two mutually exclusive hypotheses, one may also consider the alternative hypothesis:
• Hr: A relative of the suspect is the source of the evidence DNA
In this case the likelihood ratio, LR(r), is
LR(r) = Prob. (E = S | Hp) / Prob. (E = S | Hr) = 1 / Pr. (DNA match in the relative),
which equals the reciprocal of the probability of observing the evidence sample's DNA profile in the relative of the suspect, given that the suspect has the same DNA profile.

LR in DNA Mixture – Background
Data: The DNA evidence profile, E (a DNA mixture), has alleles that are all explained by alleles present in the suspect's DNA profile (S) and the victim's DNA profile (V)
Contrasting hypotheses:
• Hp: DNA in the evidence sample is a mixture of DNA from the suspect and the victim (i.e., Hp: E = V + S)
• Hd1: Evidence DNA is a mixture of DNA from the victim and an unknown person (i.e., Hd1: E = V + UN)
• Hd2: Evidence DNA is a mixture of DNA from two unknown persons (i.e., Hd2: E = UN + UN)

LR in DNA Mixture – Computation
• Pr. (Data | Hp: E = V + S) = 1, since all alleles in the mixture are explained by alleles present in V and S, and no extra alleles are present in V and/or S; hence, under Hp: E = V + S, the observed data are the only possible outcome
• Pr. (Data | Hd1: E = V + UN) = relative frequency of a random person whose DNA, mixed with the DNA of the victim, would yield a mixture matching the evidence sample
• Pr. (Data | Hd2: E = UN + UN) = relative frequency of a pair of random persons whose DNA mixture would match the profile seen in the evidence sample

LR in DNA Mixture – Interpretation
• LR for Hp vs. Hd1 = 1 / Pr. (Data | Hd1: E = V + UN), the reciprocal of the relative frequency of a random person whose DNA, mixed with the DNA of the victim, would yield a mixture matching the evidence sample
• Likewise, LR for Hp vs. Hd2 = 1 / Pr. (Data | Hd2: E = UN + UN), the inverse of the relative frequency of a pair of random persons whose DNA mixture would match the profile seen in the evidence sample

Other Considerations of Computing LR in DNA Mixture
Computations of the numerator and denominator of the LR in mixture interpretation depend on:
• Precise knowledge of the number of contributors to the DNA mixture
• Assumptions regarding the biological relatedness of the unknown contributors (among themselves, or with the reference individuals)
• Population origin of the contributors

Likelihood Ratio in Kinship Assessment
Although the logic is similar, the principles of LR formulation in kinship analysis can be simply illustrated with:
• Standard paternity analysis (with DNA of the mother, child, and alleged father typed for several loci), and
• Kinship assessment for a pair of individuals (with genotype data from one or more loci)

Interpretation of LR in Paternity Testing
• The LR in paternity testing, also called the Paternity Index (PI), is the ratio of two conditional probabilities
• It contrasts the chance of observing the specific trio of genotypes (GC, GM, and GAF) given that AF = BF, as opposed to AF ≠ BF
• PI (or LR) can be computed even when M and AF, or AF and BF, are biologically related
• PI can be computed for apparent exclusion events as well, invoking mutation and/or recombination (generally leading to drastically reduced PI or LR at the loci where such events are observed)

LR in Standard Paternity Testing
Data: The mother's DNA profile (GM) and that of the child (GC) suggest that all obligatory alleles (i.e., the alleles that the child must have received from its biological father, BF) are present in the DNA profile of the alleged father, AF (GAF)
Hypotheses contrasted:
• Hp: Alleged father (AF) is the biological father (BF)
  of the child (M is assumed to be the true mother); i.e., Hp: AF = BF
• Hd: Alleged father is not the biological father, although he is not excluded from paternity (i.e., Hd: AF ≠ BF)

SAMPLING THEORY OF ALLELE FREQUENCIES
Under mutation-drift balance, the probability of a particular (ordered) sample of n = Σ n_i alleles in which n_i copies of allele A_i are observed, for any set of n_i, is given by
$$\Pr(n_1, n_2, \ldots) = \frac{\Gamma(\gamma)}{\Gamma(\gamma + n)} \prod_i \frac{\Gamma(\gamma p_i + n_i)}{\Gamma(\gamma p_i)}, \qquad \gamma = \frac{1 - \theta}{\theta},$$
where p_i is the frequency of allele A_i in the population, Γ(·) is the Gamma function, and θ is the coefficient of coancestry (equivalent to FST or GST, the coefficient of gene differentiation between subpopulations within the population)

Match Probability – Formulae under HWE with Substructure Adjustment
Genotype            | HWE    | Unconditional (with θ) | Conditional (with θ)
Homozygote (AiAi)   | pi²    | pi² + θpi(1 − pi)      | [pi(1 − θ) + 2θ][pi(1 − θ) + 3θ] / [(1 + θ)(1 + 2θ)]
Heterozygote (AiAj) | 2pipj  | 2pipj(1 − θ)           | 2[pi(1 − θ) + θ][pj(1 − θ) + θ] / [(1 + θ)(1 + 2θ)]

CONDITIONAL MATCH PROBABILITY
$$\Pr(A_iA_i \mid A_iA_i) = \frac{[2\theta + (1-\theta)p_i]\,[3\theta + (1-\theta)p_i]}{(1+\theta)(1+2\theta)}$$
$$\Pr(A_iA_j \mid A_iA_j) = \frac{2\,[\theta + (1-\theta)p_i]\,[\theta + (1-\theta)p_j]}{(1+\theta)(1+2\theta)}$$
where pi and pj are the frequencies of alleles Ai and Aj, and θ is the coefficient of coancestry (FST/GST), representing the extent of the population substructure effect (Balding and Nichols, 1994)

Match Probability – Examples under HWE with Substructure Adjustment (θ = 0.01)
Locus (genotype)        | HWE        | Unconditional | Conditional
D3S1358 (14, 18)        | 0.0457     | 0.0457        | 0.0495
vWA (14, 16)            | 0.0411     | 0.0411        | 0.0451
FGA (23, 25)            | 0.0218     | 0.0218        | 0.0253
D8S1179 (12, 14)        | 0.0586     | 0.0586        | 0.0626
D21S11 (29, 30)         | 0.0840     | 0.0840        | 0.0881
D18S51 (13, 17)         | 0.0381     | 0.0381        | 0.0418
D5S818 (12, 12)         | 0.1252     | 0.1275        | 0.1367
D13S317 (9, 11)         | 0.0488     | 0.0488        | 0.0542
D7S820 (10, 10)         | 0.0844     | 0.0865        | 0.0949
Cumulative              | 3.96×10⁻¹² | 4.13×10⁻¹²    | 9.15×10⁻¹²
Upper bound of 95% C.I. | 1.02×10⁻¹¹ | 1.05×10⁻¹¹    | 2.17×10⁻¹¹

Paternity Testing – Frequentist Approach Example
In a standard paternity testing case, with the mother's genotype being A1A1 and the child's A1A2, an alleged father whose genotype does not contain the A2 allele would be excluded, giving
PE = 1 – Freq.(A2A2) – Freq.(A2Ā2),
where Ā2 denotes any allele other than A2. Under HWE, with p2 the frequency of A2, this equals 1 – p2² – 2p2(1 – p2) = (1 – p2)².
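The match-probability formulas and the single-locus paternity exclusion probability above can be evaluated numerically. The following is a minimal Python sketch, assuming the formulas exactly as written on the slides, independence across loci (product rule), and, for PE, Hardy-Weinberg proportions and no mutation; all function names are hypothetical.

```python
from math import prod

# Minimal sketch of the match-probability formulas (Balding and Nichols, 1994)
# and the single-locus paternity exclusion probability. Function names are
# hypothetical; theta is the coancestry coefficient (F_ST).


def match_prob_unconditional(p_i, p_j=None, theta=0.0):
    """Genotype frequency with substructure adjustment.

    Homozygote A_iA_i (p_j is None): p_i^2 + theta * p_i * (1 - p_i)
    Heterozygote A_iA_j:             2 * p_i * p_j * (1 - theta)
    With theta = 0 both reduce to the Hardy-Weinberg values.
    """
    if p_j is None:
        return p_i ** 2 + theta * p_i * (1 - p_i)
    return 2 * p_i * p_j * (1 - theta)


def match_prob_conditional(p_i, p_j=None, theta=0.0):
    """Pr(profile | same profile already seen in another person)."""
    denom = (1 + theta) * (1 + 2 * theta)
    if p_j is None:
        return ((2 * theta + (1 - theta) * p_i)
                * (3 * theta + (1 - theta) * p_i)) / denom
    return (2 * (theta + (1 - theta) * p_i)
            * (theta + (1 - theta) * p_j)) / denom


def profile_match_prob(loci, theta=0.0, conditional=False):
    """Multiply single-locus values across loci (product rule, assuming
    independence). `loci` is a list of (p_i, p_j) pairs; p_j = None marks
    a homozygote."""
    f = match_prob_conditional if conditional else match_prob_unconditional
    return prod(f(p_i, p_j, theta) for p_i, p_j in loci)


def paternity_exclusion_prob(p2):
    """PE for mother A1A1 and child A1A2: a random man is excluded unless he
    carries the obligate paternal allele A2 (frequency p2). Assumes HWE and
    no mutation: PE = 1 - p2^2 - 2*p2*(1 - p2) = (1 - p2)^2."""
    return 1 - p2 ** 2 - 2 * p2 * (1 - p2)


# D5S818 (12, 12) from the example table: HWE frequency 0.1252, so p = sqrt(0.1252)
p = 0.1252 ** 0.5
print(round(match_prob_unconditional(p, theta=0.01), 4))  # 0.1275
print(round(match_prob_conditional(p, theta=0.01), 4))    # 0.1367
```

The two printed values reproduce the D5S818 row of the example table. In that table the unconditional heterozygote values equal the HWE values, which corresponds to omitting the (1 − θ) factor; since that factor is below 1, omitting it can only increase the estimate.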
This computation assumes that no mutation occurred during the transmission of alleles across generations.
Note: An average exclusion probability can also be computed disregarding the profiles examined; this rationalizes the choice of loci to be typed for validating the stated relationship

LR for Kinship of a Pair of Individuals
Data: The DNA profile (GX) of one individual, X, is compared with that (GY) of another individual, Y, to assess the accuracy of a stated biological relationship between X and Y
Hypotheses contrasted:
• Hp: X and Y are biologically related (i.e., the stated relationship is correct)
• Hd: X and Y are not biologically related
Note: Two stated relationships may also be compared with each other

IBD (Identical-by-Descent) Probabilities – ITO Method
Two individuals with genotypes GX and GY can share:
• Both alleles IBD (scenario I),
• Only one allele from each IBD (scenario T), or
• None of their alleles IBD (scenario O).
Their probabilities are denoted by Φ2, Φ1, and Φ0, respectively, and for any biological relatedness
$$0 \le \Phi_2, \Phi_1, \Phi_0 \le 1, \qquad \Phi_2 + \Phi_1 + \Phi_0 = 1, \qquad 4\,\Phi_0\,\Phi_2 \le \Phi_1^2$$

Kinship Analysis of a Pair of Individuals: IBD Coefficients in Relatives
Relationship Type | Symbol | Φ0  | Φ1  | Φ2
Monozygotic twins | MZ     | 0   | 0   | 1
Parent-Offspring  | PO     | 0   | 1   | 0
Full Sib          | S      | 1/4 | 1/2 | 1/4
First Cousin      | 1C     | 3/4 | 1/4 | 0
Unrelated         | U      | 1   | 0   | 0

Conditional Probability of GY given GX for a Specific Kinship of X and Y
• The stipulated kinship between X and Y specifies the IBD probabilities Φ0, Φ1, Φ2 for X and Y
• For observed GX and GY:
$$\Pr(G_Y \mid G_X, \text{relationship}) = \Phi_0 \Pr(G_Y \mid G_X, O) + \Phi_1 \Pr(G_Y \mid G_X, T) + \Phi_2 \Pr(G_Y \mid G_X, I)$$
Rule: The conditional probability of GY given GX for a stated kinship is the weighted average of the conditional probabilities of the same event under each IBD state, weighted by the IBD probabilities that the kinship specifies

[Table: GENOTYPE PROBABILITIES FOR A PAIR OF INDIVIDUALS CONDITIONED BY IBD PROBABILITIES OF ALLELES]

Bayes Formula (Odds Form)
$$\frac{P(H_1 \mid E)}{P(H_2 \mid E)} = \frac{P(E \mid H_1)}{P(E \mid H_2)} \times \frac{P(H_1)}{P(H_2)}$$
posterior odds = likelihood ratio × prior odds
E = DNA evidence
H1 = alleged father is the biological father
H2 = alleged father is not the biological father
Note: While the first factor on the RHS is computed from the DNA evidence, the second factor, P(H1)/P(H2), is not necessarily DNA-based information

Synthesis of Three Approaches of Statistical Assessment
• The frequency approach provides the probability of the observed DNA evidence (unconditional as well as conditional) under a given stipulated hypothesis
• The likelihood ratio (LR) contrasts such probabilities for two mutually exclusive hypotheses
• In the Bayesian approach, the LR is combined with the prior probability to obtain the relative odds of one hypothesis against another, given the DNA data of the evidence (and those from the known persons tested)

Synthesis of Three Approaches (Contd.)
• The three approaches are built on one another; hence, it is inaccurate to say that one is wrong and the others are correct
• The LR, without transformation by the prior probability, may be incorrectly interpreted as the answer of the Bayesian computation; stating the numerator and denominator of the LR with their frequentist interpretation avoids this error of reverse conditioning
• The prior probability of the Bayesian approach generally comes from non-DNA evidence; hence, its assumptions are untestable from DNA data

Important Fact with an Example
• The LR by itself is not a Bayesian approach, and the prosecutor's fallacy can be avoided by explaining the two conditional probabilities separately
• Example: Consider a mixture case where the victim's profile (V) together with the defendant's profile (S) explains all alleles in the mixture profile (E). Under Hp: E = V + S, the conditional probability of E given Hp is 1.0, but under Hd: E = V + UN, say the conditional probability of E given that the other contributor is unknown (UN) is 1 in 100,000.
  Instead of stating LR = 100,000, it is less confusing to say: if we assume that the mixture DNA came from the victim and this defendant, the observed profile is the only possible observation (it is certain); but if the other contributor is unknown, we would expect to sample about 100,000 unrelated persons before finding one whose DNA, mixed with that of the victim, would produce a profile matching the one seen in the mixture DNA evidence sample.

Is the Extent of Population Substructure Uncertain for the Forensic Loci?

Inbreeding Coefficient (FST)
Locus   | Caucasian | African American | Hispanic | Asian   | Native American
CSF1PO  | -0.0007   | -0.0009          | -0.0003  | -0.0012 | 0.0244
D13S317 | -0.0008   | 0.0029           | 0.0047   | 0.0071  | 0.0157
D18S51  | 0.0001    | 0.0012           | 0.0011   | 0.0046  | 0.0268
D21S11  | 0.0008    | 0.0005           | 0.0013   | 0.0056  | 0.0371
D3S1358 | -0.0009   | -0.0009          | 0.0010   | 0.0035  | 0.0764
D5S818  | -0.0001   | 0.0010           | 0.0010   | 0.0028  | 0.0656
D7S820  | -0.0005   | 0.0000           | 0.0010   | 0.0039  | 0.0201
D8S1179 | 0.0000    | -0.0001          | 0.0005   | 0.0025  | 0.0125
FGA     | -0.0004   | 0.0004           | 0.0008   | 0.0029  | 0.0168
TH01    | -0.0012   | 0.0015           | 0.0041   | 0.0058  | 0.0356
TPOX    | -0.0015   | 0.0021           | 0.0024   | 0.0100  | 0.0164
VWA     | -0.0011   | 0.0011           | 0.0029   | 0.0027  | 0.0172
Average | -0.0005   | 0.0006           | 0.0021   | 0.0039  | 0.0282

• The NRC-II recommendation of θ = 0.01 for large cosmopolitan populations and θ = 0.03 for small isolated populations is well validated by empirical as well as theoretical foundations

Are the DNA Forensic Population Databases Random, and Are Their Sample Sizes Sufficient?
Features of Genetic Databases
• Population genetics has historically always employed 'convenient' sampling instead of strict random sampling
• 'Convenient sampling', defined as sampling individuals without any prior knowledge of their DNA types, is operationally random, in particular when variation at the DNA loci does not affect fertility, viability, cognitive ability, or life achievement
• Allele frequency estimates from convenient samples have been shown to approximate well the estimates from structured strict random sampling
• Strict random samples collected at one point in time from a natural population may not remain random at another time point because of births, deaths, immigration, and emigration

Features of Genetic Databases – 2
• Allele frequencies from convenient samples of subjects described by 'self-identified' ethnicity have been shown to represent genetic affinities comparable with similar inferences drawn from anthropologically well-defined populations
• The occasional presence of biological relatives in convenient samples does not affect allele frequency estimates, but may produce excess allele/genotype sharing at some loci

[Figure: Phylogenetic Tree (UPGMA) for Some World Populations with Allele Frequency Data of the CODIS STR Loci; populations: SW Hispanic (TX), SW Hispanic (CA), US Caucasian, Swiss, Italian, SE Hispanic (FL), Chinese, Japanese, African American (TX), African American (CA), Apache, Navajo, Athabaskan, Inupiat, Yupik]

Sample Size Limitation Issue
• Strictly speaking, no sample size is universally sufficient unless all individuals are continually genotyped over time
• Sample sizes of 100 to 150 individuals per population have been shown to produce stable estimates of allele frequencies above a prescribed minimum threshold allele frequency
• Current forensic DNA statistics employ the concepts of a minimum threshold allele frequency and the upper 95% confidence limit to account for sampling variation

Concerns Related to Databases Used for Lineage Markers (e.g., mtDNA and Y-STRs)

[Figure: Inheritance of Lineage Markers. Colors denote mtDNA type; letters (X, A, B) indicate Y-linked information, where X denotes no Y chromosome and A and B are Y-linked alleles or haplotypes]

Introductory Comments on Lineage Markers
• mtDNA is maternally inherited, and Y-STRs are transmitted only from fathers to sons
• Barring mutations, all maternally related persons (males as well as females) will have the same mtDNA profile, and all paternally related males will have the same Y-STR profile
• Different markers on mtDNA are genetically linked (with virtually no recombination), and so are the Y-STRs (which reside on the non-recombining region of the Y chromosome)

Comments on Lineage Markers (Contd.)
• Consequently, mtDNA sequence data have to be treated as a haploid haplotype whose frequency is NOT multiplicative across markers; the same holds for a Y-STR profile
• The counting method is the one that captures the genetic information
• The stated ethnicity of individuals does not necessarily reflect patrilineal or matrilineal ancestry (e.g., the mtDNA of Hispanics may be almost entirely of Native American descent, while for the autosomal STRs only 30-50% of their genes are of Native American descent)
• Thus, the population groupings used for autosomal nuclear STR loci do not necessarily provide accurate frequency estimates of a Y-linked STR haplotype, nor of a specific mtDNA sequence

Fundamental Difference between the Frequency of a CODIS STR DNA Profile and That Based on mtDNA and Y-STRs
• For CODIS STR loci, the profile frequency provides information regarding the rarity of the profile in the population, or the conditional probability given that the profile is found in someone else
• For mtDNA, it is the frequency among individuals who are NOT maternally related
• For Y-STRs, likewise, it is the frequency among individuals who are NOT paternally related

Computation of Frequency of Lineage-based Marker Profile
Using the general theory, the unconditional frequency of a haplotype (say Ai), which is its count divided by the sample size, can be modified to obtain the conditional probability
$$\Pr(A_i \mid A_i) = \frac{p_i^2 + \theta p_i (1 - p_i)}{p_i} = p_i + \theta (1 - p_i) = \theta + p_i (1 - \theta)$$
Hence, the conditional probability always exceeds θ, the adjustment factor for possible population substructure in the database used

Computation of Frequency of Lineage-based Marker Profile (Contd.)
Some advocates suggest that the quantity pi in Pr. (Ai | Ai) = θ + pi(1 − θ) can be substituted by (Count of Ai + 2)/(N + 3), where N is the sample size. When N is large this has little effect, but it can help when the count of Ai in the database is zero (i.e., when the evidence profile has not been seen in the database)

mtDNA and Y-STR θ-Value
Since, in terms of match versus non-match, how different the haplotypes are is not an issue, the θ values for mtDNA and Y-linked haplotypes should not be computed with mismatch-based approaches (such as AMOVA), but by treating all haplotypes as different alleles, which generally leads to much smaller θ values

Issues Related to DNA-Match Statistics when Suspects Are Identified by Database Search
Three Approaches – Three Types of Questions!
• The NRC-I recommendation to use only the additional loci, not used in the database search, is counter-productive
• The chance of coincidentally finding a profile in a database depends on the expected rarity of the profile and the database size
• NRC-II's Np rule answers the question of the expected number of profiles matching a target profile (of rarity p) in a database (random with respect to the crime) of size N
• The Bayesian approach makes additional assumptions regarding the prior odds of each individual in the database being the contributor of the DNA of the target profile

DOES SOMEONE HAVE YOUR BIRTHDAY?
The probability that in a sample of n persons all birthdays are different (assuming 365 equally likely birthdays) is given by
$$\Pr(\text{all different}) = \prod_{k=0}^{n-1}\left(1 - \frac{k}{365}\right) = \frac{365!}{(365 - n)!\;365^{n}}$$

[Figure: SAMPLE SIZE NEEDED FOR AT LEAST ONE DUPLICATE FOR GIVEN VALUES OF EVENT PROBABILITY AND DEGREE OF CONFIDENCE]

[Figure: OBSERVED AND EXPECTED MATCH PROBABILITY; frequency vs. number of loci (0-13), observed and expected; panels: Caucasian, African-American, Caribbean Hispanic]

[Figure: EXPECTED NUMBER OF MATCHES IN DATABASE SEARCH (CARIBBEAN)]

[Figure: OBSERVED AND EXPECTED NUMBER OF MATCHES IN PAIRWISE COMPARISON OF PROFILES IN DATABASE (CARIBBEAN)]

[Figure: EFFECT OF PRESENCE OF RELATIVES (Caucasian data on CODIS loci, θ = 0, N = 1000); number of pairs (log scale) vs. number of loci (0-13); series: Unrelated, 1 Full sib, 10 Full sibs, 100 Full sibs]

Conclusions
• With the larger amount of data collected since 1996, and with experience of statistical results from casework, the NRC-II recommendations remain appropriate suggestions for the statistical evaluation of forensic DNA evidence
• Statistical answers to different questions are necessarily different; they do not constitute a lack of general acceptance
• mtDNA and Y-STR database groupings are necessarily different from those of autosomal STRs because of the uniparental ancestry of lineage markers
• Convenient sampling effects and sample size limitations are embedded in current protocols of DNA statistics
• A suspect identified from a database search raises multiple types of questions, whose answers are different

Acknowledgements
• Dr. Bruce Budowle – FBI Academy
• Hee S. Lee, Xiaohua Sheng, Jianye Ge – Graduate Students at CGI, Univ. Cincinnati
• SWGDAM members – for providing databases
• US granting agencies NIH and NIJ – for partial support of the research

Thank You!
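The database-search quantities discussed in the slides (NRC-II's Np rule, the birthday problem, and the sample size needed for at least one duplicate) can be illustrated with a short Python sketch. It assumes independent profiles and a database that is random with respect to the crime. The pairwise count C(N, 2)·p and the Poisson approximation used for the sample size are standard textbook devices, not necessarily the calculations behind the slides' figures, and all function names are hypothetical.

```python
from math import ceil, comb, log, sqrt

# Minimal sketch of database-search quantities. Function names are
# hypothetical; all formulas assume independent profiles and a database
# that is random with respect to the crime.


def expected_database_matches(n_database, p_profile):
    """NRC-II 'Np rule': expected number of profiles in a database of size N
    that match a target profile with random match probability p."""
    return n_database * p_profile


def prob_all_birthdays_distinct(n, days=365):
    """Birthday problem: probability that n people, each with an independent,
    equally likely birthday among `days` days, all have different birthdays."""
    prob = 1.0
    for k in range(n):
        prob *= (days - k) / days
    return prob


def expected_pairwise_matches(n_database, p_match):
    """Expected number of matching pairs among all C(N, 2) pairwise
    comparisons when each pair matches with probability p_match."""
    return comb(n_database, 2) * p_match


def sample_size_for_duplicate(p_event, confidence):
    """Approximate smallest n with P(at least one coinciding pair) >= confidence
    (confidence < 1), using P(no coincidence) ~= exp(-C(n, 2) * p_event)."""
    pairs_needed = -log(1 - confidence) / p_event   # required C(n, 2)
    n = ceil((1 + sqrt(1 + 8 * pairs_needed)) / 2)  # solve n(n-1)/2 >= pairs_needed
    return max(n, 2)


print(1 - prob_all_birthdays_distinct(23))      # about 0.507
print(sample_size_for_duplicate(1 / 365, 0.5))  # 23
```

As a hypothetical illustration of why the pairwise and single-target questions differ: with one million profiles and a per-pair match probability of 10⁻¹², `expected_pairwise_matches` gives about 0.5 coincidentally matching pairs, while `expected_database_matches` for a single target profile of the same rarity gives only 10⁻⁶.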