Maximum Likelihood Estimation for Allele Frequencies
Biostatistics 666, Lecture 7

Last Three Lectures: Introduction to Coalescent Models
- Computationally efficient framework
  • Alternative to forward simulations
- Predictions about sequence variation
  • Number of polymorphisms
  • Frequency of polymorphisms

Coalescent Models: Key Ideas
- Proceed backwards in time
- Genealogies shaped by
  • Population size
  • Population structure
  • Recombination rates
- Given a particular genealogy ...
  • Mutation rate predicts variation

Next Series of Lectures
- Estimating allele and haplotype frequencies from genotype data
  • Maximum likelihood approach
  • Application of an E-M algorithm
- Challenges
  • Using information from related individuals
  • Allowing for non-codominant genotypes
  • Allowing for ambiguity in haplotype assignments

Objective: Parameter Estimation
- Learn about population characteristics
  • E.g. allele frequencies, population size
- Using a specific sample
  • E.g. a set of sequences, unrelated individuals, or even families

Maximum Likelihood
- A general framework for estimating model parameters
- Find the set of parameter values that maximizes the probability of the observed data
- Applicable to many different problems

Example: Allele Frequencies
- Consider ...
  • A sample of n chromosomes
  • X of these are of type "a"
  • The parameter of interest is the allele frequency p
- The likelihood is

  L(p | n, X) = \binom{n}{X} p^X (1 - p)^{n - X}

Evaluate for Various Parameters (n = 10, X = 4)

  p      1 - p    L
  0.0    1.0      0.000
  0.2    0.8      0.088
  0.4    0.6      0.251
  0.6    0.4      0.111
  0.8    0.2      0.006
  1.0    0.0      0.000

[Figure: likelihood plotted against allele frequency for n = 10 and X = 4; the curve peaks at p = 0.4]

In This Case
- The likelihood tells us the data are most probable if p = 0.4
- The likelihood curve allows us to evaluate alternatives ...
  • Is p = 0.8 a possibility?
  • Is p = 0.2 a possibility?

Example: Estimating 4Nµ
- Consider S polymorphisms in a sample of n sequences ...

  L(\theta | n, S) = P_n(S | \theta)

- where P_n is calculated using the Q_n and P_2 functions defined previously

[Figure: likelihood plotted against θ = 4Nµ for n = 5 and S = 10, with the MLE marked]

Maximum Likelihood Estimation
- Two basic steps ...
  a) Write down the likelihood function L(\theta | x) \propto f(x | \theta)
  b) Find the value \hat{\theta} that maximizes L(\theta | x)
- In principle, applicable to any problem where a likelihood function exists

MLEs
- Parameter values that maximize the likelihood
  • The θ under which the observations have maximum probability
- Finding MLEs is an optimization problem
- How do MLEs compare to other estimators?

Comparing Estimators
- How do MLEs rate in terms of ...
  • Unbiasedness
  • Consistency
  • Efficiency
- For a review, see Garthwaite, Jolliffe, and Jones (1995), Statistical Inference, Prentice Hall

Analytical Solutions
- Write out the log-likelihood

  l(\theta | \text{data}) = \ln L(\theta | \text{data})

- Calculate the derivative of the log-likelihood

  \frac{d\, l(\theta | \text{data})}{d\theta}

- Find the zeros of the derivative function

Information
- The second derivative is also extremely useful

  I_\theta = -E\left[\frac{d^2 l(\theta | \text{data})}{d\theta^2}\right], \qquad V_{\hat{\theta}} = \frac{1}{I_\theta}

- It measures how quickly the log-likelihood falls away from its maximum
- It provides an asymptotic variance for estimates

Allele Frequency Estimation ...
- When individual chromosomes are observed, this does not seem tricky ...
- What about with genotypes?
- What about with parent-offspring pairs?

Coming Up ...
- We will walk through allele frequency estimation in three distinct settings:
  • Samples of single chromosomes ...
  • Samples of unrelated individuals ...
  • Samples of parents and offspring ...
- A short numerical sketch of the single-chromosome (binomial) example follows below
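As a concrete companion to the binomial example above, here is a minimal Python sketch (mine, not from the lecture; the function name binomial_likelihood is hypothetical) that evaluates L(p | n, X) on a grid for n = 10 and X = 4, reproducing the likelihood table and the peak of the likelihood plot:

```python
from math import comb

def binomial_likelihood(p, n, x):
    """L(p | n, X): probability of observing x copies of allele 'a'
    among n sampled chromosomes when the allele frequency is p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, x = 10, 4
grid = [i / 100 for i in range(101)]  # p = 0.00, 0.01, ..., 1.00
p_hat = max(grid, key=lambda p: binomial_likelihood(p, n, x))
print(f"grid maximum at p = {p_hat:.2f}")  # 0.40

# Reproduce the table of likelihood values shown earlier
for p in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"L({p:.1f}) = {binomial_likelihood(p, n, x):.3f}")
```

The grid maximum lands exactly on 0.40 here because the analytic MLE X/n happens to fall on the grid.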
I. Single Alleles Observed
- Consider ...
  • A sample of n chromosomes
  • X of these are of type "a"
  • The parameter of interest is the allele frequency p
- The likelihood is

  L(p | n, X) = \binom{n}{X} p^X (1 - p)^{n - X}

Some Notes
- The following two likelihoods are just as good:

  L(p; X, n) = \binom{n}{X} p^X (1 - p)^{n - X}

  L(p; x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i}

- For ML estimation, constant factors in the likelihood don't matter

Analytic Solution
- The log-likelihood

  \ln L(p | n, X) = \ln \binom{n}{X} + X \ln p + (n - X) \ln(1 - p)

- The derivative

  \frac{d \ln L(p | n, X)}{dp} = \frac{X}{p} - \frac{n - X}{1 - p}

- Setting the derivative to zero gives \hat{p} = X / n

Samples of Individual Chromosomes
- The natural estimator (where we count the proportion of sequences of a particular type) and the MLE give identical solutions
- Maximum likelihood provides a justification for using the "natural" estimator

II. Genotypes Observed

Consider a Set of Genotypes ...
- Use the notation n_ij to denote the number of individuals with genotype A_i A_j
- Sample of n individuals

  Genotype     A1A1   A1A2   A2A2   Total
  Observed     n11    n12    n22    n = n11 + n12 + n22
  Frequency    p11    p12    p22    1.0

Allele Frequencies by Counting ...
- A natural estimate of each allele frequency is the proportion of the 2n sampled alleles that are of that type

  Allele       A1                A2                Total
  Observed     n1 = 2n11 + n12   n2 = 2n22 + n12   2n = n1 + n2
  Frequency    p1 = n1 / 2n      p2 = n2 / 2n      1.0

MLE Using Genotype Data ...
- Consider a sample of genotype counts n11, n12, and n22, as above
- Assuming Hardy-Weinberg proportions, the likelihood as a function of the allele frequency p_1 (with p_2 = 1 - p_1) is

  L(p_1; n) = \frac{n!}{n_{11}! \, n_{12}! \, n_{22}!} \, (p_1^2)^{n_{11}} (2 p_1 p_2)^{n_{12}} (p_2^2)^{n_{22}}

Which Gives ...
- The log-likelihood and its derivative

  l = \ln L = (2 n_{11} + n_{12}) \ln p_1 + (2 n_{22} + n_{12}) \ln(1 - p_1) + C

  \frac{dl}{dp_1} = \frac{2 n_{11} + n_{12}}{p_1} - \frac{2 n_{22} + n_{12}}{1 - p_1}

- Giving the MLE as

  \hat{p}_1 = \frac{2 n_{11} + n_{12}}{2 (n_{11} + n_{12} + n_{22})}

Samples of Unrelated Individuals
- Again, the natural estimator (where we count the proportion of alleles of a particular type) and the MLE give identical solutions
- Maximum likelihood provides a justification for using the "natural" estimator

III. Parent-Offspring Pairs
- Counts for the possible parent-child genotype combinations, with N pairs in total:

  Child \ Parent   A1A1      A1A2           A2A2      Total
  A1A1             a1        a2             0         a1 + a2
  A1A2             a3        a4             a5        a3 + a4 + a5
  A2A2             0         a6             a7        a6 + a7
  Total            a1 + a3   a2 + a4 + a6   a5 + a7   N

Probability for Each Observation

  Child \ Parent   A1A1        A1A2        A2A2       Total
  A1A1             p1^3        p1^2 p2     0          p1^2
  A1A2             p1^2 p2     p1 p2       p1 p2^2    2 p1 p2
  A2A2             0           p1 p2^2     p2^3       p2^2
  Total            p1^2        2 p1 p2     p2^2       1.0

Which Gives ...

  \ln L = a_1 \ln(p_1^3) + (a_2 + a_3) \ln(p_1^2 p_2) + a_4 \ln(p_1 p_2) + (a_5 + a_6) \ln(p_1 p_2^2) + a_7 \ln(p_2^3) + \text{constant}
        = B \ln p_1 + C \ln(1 - p_1)

- where

  p_2 = 1 - p_1
  B = 3 a_1 + 2 (a_2 + a_3) + a_4 + (a_5 + a_6)
  C = (a_2 + a_3) + a_4 + 2 (a_5 + a_6) + 3 a_7

- so that

  \hat{p}_1 = \frac{B}{B + C}

Samples of Parent-Offspring Pairs
- The natural estimator (where we count the proportion of alleles of a particular type) and the MLE no longer give identical solutions
- In this case, we expect the MLE to be more accurate, as the numerical sketch below illustrates
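As a quick numerical check of this claim (a sketch with hypothetical function names and made-up counts, not lecture data), the following compares the MLE B/(B + C) with one reading of the "natural" allele-counting estimator, which treats each pair as four independent alleles and therefore double-counts the transmitted allele:

```python
def parent_offspring_mle(a):
    """MLE of p1 from parent-offspring pair counts a = (a1, ..., a7),
    using ln L = B ln p1 + C ln(1 - p1) as derived above."""
    a1, a2, a3, a4, a5, a6, a7 = a
    b = 3 * a1 + 2 * (a2 + a3) + a4 + (a5 + a6)
    c = (a2 + a3) + a4 + 2 * (a5 + a6) + 3 * a7
    return b / (b + c)

def allele_counting_estimate(a):
    """'Natural' estimator: the proportion of A1 alleles among the four
    alleles in each pair, double-counting the shared transmitted allele."""
    a1, a2, a3, a4, a5, a6, a7 = a
    n_a1 = 4 * a1 + 3 * (a2 + a3) + 2 * a4 + (a5 + a6)
    return n_a1 / (4 * sum(a))

counts = (30, 20, 20, 25, 10, 10, 5)  # made-up example counts a1..a7
print(parent_offspring_mle(counts), allele_counting_estimate(counts))
```

For these counts the two estimates differ (about 0.642 versus 0.646), illustrating that counting alleles in related individuals is no longer equivalent to maximum likelihood.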
Comparing Sampling Strategies
- We can compare sampling strategies by calculating the information for each one

  I_\theta = -E\left[\frac{d^2 l(\theta | \text{data})}{d\theta^2}\right], \qquad V_{\hat{\theta}} = \frac{1}{I_\theta}

- Which one do you expect to be most informative?

How Informative Is Each Setting?
- Single chromosomes: Var(\hat{p}) = pq / N
- Unrelated individuals: Var(\hat{p}) = pq / (2N)
- Parent-offspring pairs: Var(\hat{p}) = pq / (3N - a_4)

Other Likelihoods
- Allele frequencies when individuals are ...
  • Diagnosed for a Mendelian disorder
  • Genotyped at two neighboring loci
  • Phenotyped for the ABO blood groups
- Many other interesting problems ...
- ... but some have no analytical solution

Today's Summary
- Examples of maximum likelihood
- Allele frequency estimation from
  • Allele counts
  • Genotype counts
  • Pairs of individuals

Take-Home Reading
- Excoffier and Slatkin (1996)
- Introduces the E-M algorithm
- Widely used for maximizing likelihoods in genetic problems

Properties of Estimators (For Review)

Unbiasedness
- An estimator is unbiased if E(\hat{\theta}) = \theta
- In general, bias(\hat{\theta}) = E(\hat{\theta}) - \theta
- Multiple unbiased estimators may exist
- Other properties may also be desirable

Consistency
- An estimator is consistent if, for any \varepsilon > 0,

  P(|\hat{\theta} - \theta| > \varepsilon) \to 0 \text{ as } n \to \infty

- The estimate converges to the true value in probability as the sample size increases

Mean Squared Error
- MSE is defined as

  MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \mathrm{var}(\hat{\theta}) + \mathrm{bias}(\hat{\theta})^2

- If MSE \to 0 as n \to \infty, then the estimator must be consistent
  • The reverse is not true

Efficiency
- The relative efficiency of two estimators is the ratio of their variances:

  \text{if } \frac{\mathrm{var}(\hat{\theta}_2)}{\mathrm{var}(\hat{\theta}_1)} > 1, \text{ then } \hat{\theta}_1 \text{ is more efficient}

- The comparison is only meaningful for estimators with equal biases

Sufficiency
- Consider ...
  • Observations X_1, X_2, ..., X_n
  • A statistic T(X_1, X_2, ..., X_n)
- T is a sufficient statistic if it includes all information about \theta in the sample
  • The distribution of the X_i conditional on T is independent of \theta
  • The posterior distribution of \theta conditional on T is independent of the X_i

Minimal Sufficient Statistic
- There can be many alternative sufficient statistics
- A statistic is a minimal sufficient statistic if it can be expressed as a function of every other sufficient statistic

Typical Properties of MLEs
- Bias: can be biased or unbiased
- Consistency: subject to regularity conditions, MLEs are consistent
- Efficiency: typically, MLEs are asymptotically efficient estimators
- Sufficiency: often, but not always
- See Cox and Hinkley, 1974

Strategies for Likelihood Optimization (For Review)

Generic Approaches
- Suitable for when analytical solutions are impractical
- Bracketing
- The simplex method
- Newton-Raphson

Bracketing
- Find 3 points such that
  • \theta_a < \theta_b < \theta_c
  • L(\theta_b) > L(\theta_a) and L(\theta_b) > L(\theta_c)
- Search for the maximum by
  • Selecting a trial point in the interval
  • Keeping the maximum and its flanking points

[Figure: successive bracketing steps, numbered 1-6]

The Simplex Method
- Calculate likelihoods at the simplex vertices
  • A simplex is a geometric shape with k + 1 corners
  • E.g. a triangle in k = 2 dimensions
- At each step, move the worst vertex in the direction of the better points

The Simplex Method II

[Figure: the original simplex with its high and low vertices, and the candidate moves: reflection, reflection and expansion, contraction, and multiple contraction]

One-Parameter Maximization
- A simple but inefficient approach
- Consider
  • Parameters \theta = (\theta_1, \theta_2, ..., \theta_k)
  • Likelihood function L(\theta; x)
- Maximize L with respect to each \theta_i in turn
  • Cycle through the parameters

[Figure: zig-zag search path in the (\theta_1, \theta_2) plane, illustrating the inefficiency]

Steepest Descent
- Consider
  • Parameters \theta = (\theta_1, \theta_2, ..., \theta_k)
  • Likelihood function L(\theta; x)
- Score vector

  S = \frac{d \ln L}{d\theta} = \left( \frac{d \ln L}{d\theta_1}, \ldots, \frac{d \ln L}{d\theta_k} \right)

- Find the maximum along the line \theta + \delta S
- Still inefficient: consecutive steps are perpendicular!
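To show one of these generic approaches in action (an illustrative setup of mine, not part of the lecture), SciPy's Nelder-Mead implementation of the simplex method can maximize a log-likelihood numerically by minimizing its negative; here it recovers the binomial MLE from the earlier example:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, n=10, x=4):
    """Negative binomial log-likelihood; generic optimizers minimize,
    so we flip the sign of ln L(p | n, X). The constant term
    ln binom(n, x) is omitted, since it does not affect the maximizer."""
    p = theta[0]
    if not 0.0 < p < 1.0:
        return np.inf  # reject trial points outside (0, 1)
    return -(x * np.log(p) + (n - x) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=[0.5], method="Nelder-Mead")
print(result.x[0])  # ~0.4, matching the analytic MLE X / n
```

Nelder-Mead needs only likelihood evaluations, no derivatives, which is why it is a common fallback when the score function is awkward to derive.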
Local Approximations to the Log-Likelihood Function
- In the neighborhood of \theta_i,

  l(\theta) \approx l(\theta_i) + S (\theta - \theta_i) - \frac{1}{2} (\theta - \theta_i)^t I_\theta (\theta - \theta_i)

- where
  • l(\theta) = \ln L(\theta) is the log-likelihood function
  • S = d l(\theta_i) / d\theta is the score vector
  • I_\theta = -d^2 l(\theta_i) / d\theta^2 is the observed information matrix

Newton's Method
- Maximize the approximation by setting its derivative to zero ...

  S - I (\theta - \theta_i) = 0

- ... which gives a new trial point

  \theta_{i+1} = \theta_i + I^{-1} S

Fisher Scoring
- Use the expected information matrix instead of the observed information:

  E\left[-\frac{d^2 l(\theta)}{d\theta^2}\right] \quad \text{instead of} \quad -\frac{d^2 l(\theta | \text{data})}{d\theta^2}

- Compared to Newton-Raphson:
  • Converges faster when estimates are poor
  • Converges slower when close to the MLE
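Tying this back to the allele-frequency example (my sketch, with a hypothetical function name), here is the scoring update \theta_{i+1} = \theta_i + I^{-1} S applied to the binomial likelihood, using the expected information I = n / (p (1 - p)):

```python
def fisher_scoring(n, x, p=0.25, tol=1e-10, max_iter=50):
    """Fisher scoring for the binomial allele-frequency likelihood:
    iterate p <- p + score / information until the step is negligible."""
    for _ in range(max_iter):
        score = x / p - (n - x) / (1 - p)  # S = d ln L / dp
        info = n / (p * (1 - p))           # expected (Fisher) information
        step = score / info
        p += step
        if abs(step) < tol:
            break
    return p

print(fisher_scoring(10, 4))  # 0.4, the analytic MLE X / n
```

For this model the update p + S/I simplifies algebraically to X/n, so scoring converges in a single step; at the MLE, 1/I = p̂(1 - p̂)/n reproduces the Var(p̂) = pq/N quoted earlier for samples of single chromosomes.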