DATA ANALYSIS
Module Code: CA660
Lecture Block 6: Alternative estimation methods and their implementation
MAXIMUM LIKELIHOOD ESTIMATION
• Recall general points: Estimation, definition of the Likelihood function for a vector of parameters θ and set of values x. Find the most likely value of θ = maximise the Likelihood fn. Also defined the Log-likelihood (Support fn. S(θ)) and its derivative, the Score, together with the Information content per observation, which for a single-parameter likelihood is given by

$$I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log L(\theta \mid x)\right)^{2}\right] = -E\left[\frac{\partial^{2}}{\partial\theta^{2}}\log L(\theta \mid x)\right]$$
• Why MLE? (Need to know underlying distribution).
Properties: Consistency; sufficiency; asymptotic efficiency (linked
to variance); unique maximum; invariance and, hence most
convenient parameterisation; usually MVUE; amenable to
conventional optimisation methods.
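To make these quantities concrete, the short Python sketch below (not part of the original slides) evaluates the Support function, the Score and the per-observation Information for the Binomial likelihood that appears later in these notes; the trial and success counts n and x are made-up illustrative values.

```python
import numpy as np

# Binomial log-likelihood (up to an additive constant), Score and Information.
# n and x are illustrative values only.
n, x = 100, 23          # hypothetical number of trials and successes

def support(theta):
    """Log-likelihood (Support function) S(theta) for Binomial data."""
    return x * np.log(theta) + (n - x) * np.log(1 - theta)

def score(theta):
    """dS/dtheta = x/theta - (n - x)/(1 - theta)."""
    return x / theta - (n - x) / (1 - theta)

def info_per_obs(theta):
    """Fisher information per Bernoulli observation: 1 / (theta (1 - theta))."""
    return 1.0 / (theta * (1 - theta))

theta_hat = x / n                      # analytical MLE (see the later slide)
print("MLE:", theta_hat)
print("Score at MLE (should be ~0):", score(theta_hat))
print("Per-observation information at MLE:", info_per_obs(theta_hat))
print("Approx. variance of MLE, 1/(n I):", 1 / (n * info_per_obs(theta_hat)))
```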
VARIANCE, BIAS & CONFIDENCE
1
2
ˆ
i  
 k

k

ˆ
i 


k
• Variance of an Estimator - usual form or ˆ2 
i 1
i 1
for k independent estimates
• For a large sample, variance of MLE can be approximated by
1
ˆ2 
nI ( )
2
can also estimate empirically, using re-sampling* techniques.
• Variance of a linear function (of several estimates) – a common need in genomics analysis (e.g. heritability) and in risk analysis.
• Recall the Bias of the Estimator, E(θ̂) − θ (zero for an unbiased estimator). The Mean Square Error is defined to be MSE = E(θ̂ − θ)², which expands to

$$E\{[\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta]\}^{2} = \sigma^{2}_{\hat{\theta}} + [E(\hat{\theta}) - \theta]^{2}$$

so we have the basis for C.I.s and tests of hypothesis.
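The following sketch compares the large-sample approximation 1/(n I(θ)) with an empirical re-sampling (bootstrap) estimate of the variance of a Bernoulli MLE; the simulated data, sample size and number of bootstrap replicates are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Bernoulli data; theta estimated by its MLE, the sample proportion.
data = rng.binomial(1, 0.3, size=200)
n = data.size
theta_hat = data.mean()

# Large-sample approximation: var(theta_hat) ~ 1 / (n I(theta)),
# with I(theta) = 1 / (theta (1 - theta)) per observation for Bernoulli data.
var_info = theta_hat * (1 - theta_hat) / n

# Empirical alternative: bootstrap re-sampling of the observed data.
boot = np.array([rng.choice(data, size=n, replace=True).mean()
                 for _ in range(2000)])
var_boot = boot.var(ddof=1)

print("1/(n I) approximation:", var_info)
print("bootstrap estimate   :", var_boot)
```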
COMMONLY-USED METHODS of obtaining MLE
• Analytical – solving dL/dθ = 0 or dS/dθ = 0, when simple solutions exist
• Grid search or likelihood profile approach
• Newton-Raphson iteration methods
• EM (expectation and maximisation) algorithm
N.B. Use the Log-likelihood, because:
it has its max. at the same θ value as the Likelihood;
it is easier to compute;
there is a close relationship between the statistical properties of the MLE and the Log-likelihood.
MLE Methods in outline
Analytical – recall the Binomial example earlier:

$$\text{Score} = \frac{dS(\theta)}{d\theta} = \frac{x}{\theta} - \frac{n-x}{1-\theta} = 0 \;\Rightarrow\; \hat{\theta} = \frac{x}{n}$$
• Example: For the Normal, the MLEs of the mean and variance (taking derivatives w.r.t. the mean and variance separately) are the sample mean and the "actual" variance (i.e. Σ(xᵢ − x̄)²/N) – the latter unbiased if the mean is known, biased if not.
• Invariance : One-to-one relationships preserved
• Used: when MLE has a simple solution
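As a quick check of the Normal example, this sketch simulates repeated samples (with made-up population values) and compares the average of the MLE variance (divide by N) with the unbiased version (divide by N − 1), illustrating the bias when the mean must be estimated.

```python
import numpy as np

rng = np.random.default_rng(7)
mu_true, sigma2_true = 5.0, 4.0       # hypothetical population values

# Average the variance estimators over many simulated samples to see the bias.
n, reps = 10, 20000
mle_var, unbiased_var = [], []
for _ in range(reps):
    x = rng.normal(mu_true, np.sqrt(sigma2_true), size=n)
    mle_var.append(((x - x.mean()) ** 2).sum() / n)        # MLE: divide by N
    unbiased_var.append(x.var(ddof=1))                     # divide by N - 1

print("true variance            :", sigma2_true)
print("mean of MLE (biased)     :", np.mean(mle_var))      # ~ (n-1)/n * sigma^2
print("mean of unbiased version :", np.mean(unbiased_var))
```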
MLE Methods in outline contd.
Grid Search – Computational
Plot the likelihood or log-likelihood vs the parameter. Various features:
• Relative Likelihood = Likelihood / Max. Likelihood (ML set = 1).
The peak of the R.L. can be identified visually or sought algorithmically, e.g. for

$$S(\theta) = \log\left[\theta^{20}(1-\theta)^{80} + \theta^{80}(1-\theta)^{20}\right]$$

plotting the likelihood over the parameter space range 0 ≤ θ ≤ 1 gives 2 symmetrical peaks, around θ̂ = 0.2 and θ̂ = 0.8 (likelihood profile for e.g. the well-known mixed linkage analysis problem, or for a similar example of populations following known proportion splits).
If we now constrain 0 ≤ θ ≤ 0.5, the MLE solution is unique, since θ ≤ 0.5 = R.F. between genes (possible mixed linkage phase). A sketch of this grid search is given below.
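A minimal sketch of the grid-search idea for the support function quoted above; the grid resolution and the simple neighbouring-point peak test are illustrative choices, not a method prescribed in the slides.

```python
import numpy as np

def support(theta):
    """S(theta) = log[theta^20 (1-theta)^80 + theta^80 (1-theta)^20] from the slide."""
    return np.log(theta**20 * (1 - theta)**80 + theta**80 * (1 - theta)**20)

grid = np.linspace(0.001, 0.999, 999)
S = support(grid)
rel_lik = np.exp(S - S.max())          # relative likelihood, max set to 1

# Unrestricted search over 0 < theta < 1: two symmetric peaks (~0.2 and ~0.8).
peaks = grid[(rel_lik > np.roll(rel_lik, 1)) & (rel_lik > np.roll(rel_lik, -1))]
print("peaks on (0,1):", np.round(peaks, 3))

# Constrained search 0 < theta <= 0.5: a unique maximum.
mask = grid <= 0.5
print("MLE on (0,0.5]:", grid[mask][np.argmax(S[mask])])
```

Constraining the grid to the admissible range is what removes the second, mirror-image peak.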
MLE Methods in outline contd.
• Graphic/numerical Implementation – initial estimate of θ. Direction of search determined by evaluating the likelihood to both sides of θ. The search takes the direction giving an increase, because we are looking for a max. Initial search increments are large, e.g. 0.1; then, when the likelihood change starts to decrease or become negative, stop and refine the increment (see the sketch below).
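A compact version of this coarse-then-refine search, applied purely for illustration to the same example support function restricted to 0 < θ ≤ 0.5; the starting value and step sizes are assumptions.

```python
import numpy as np

def support(theta):
    # Same example support function as on the previous slide.
    return np.log(theta**20 * (1 - theta)**80 + theta**80 * (1 - theta)**20)

def directional_search(s, theta=0.3, step=0.1, min_step=1e-4, lo=0.001, hi=0.499):
    """Coarse-to-fine search: move in the direction that increases s(theta);
    when neither neighbouring value improves, refine the increment."""
    while step > min_step:
        left = max(lo, theta - step)
        right = min(hi, theta + step)
        best = max([left, theta, right], key=s)
        if best == theta:        # no improvement at this step size -> refine
            step /= 10
        else:
            theta = best
    return theta

print("approximate MLE:", round(directional_search(support), 4))
```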
Issues:
• Multiple peaks – can miss global maximum, computationally
intensive ; see e.g.
http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html
• Multiple Parameters - grid search. Interpretation of Likelihood
profiles can be difficult, e.g.
http://blogs.sas.com/content/iml/2011/10/12/maximumlikelihood-estimation-in-sasiml/
Example in outline
• Data e.g. used to show a linkage relationship (non-independence) between e.g. a marker and a given disease gene, or e.g. between sex and purchase of computer games.
Escapes = individuals who are susceptible, but show no disease phenotype under experimental conditions (express interest but no purchase record).
So define β and θ as the proportion of escapes and the R.F. respectively.
1 − β is then the penetrance for the disease trait, or for purchasing, i.e.
P{individual with susceptible genotype has the disease phenotype}, or
P{individual of given sex and interested who actually buys}.
Purpose of expt. – typically to estimate the R.F. between marker and gene, or the proportion of a sex that purchases.
• Use: Support function = Log-Likelihood. Often quite complex, e.g. for the above example we might have

$$S(\theta, \beta) = k_{1}\ln f_{1}(\theta,\beta) + k_{2}\ln f_{2}(\theta,\beta) + k_{3}\ln f_{3}(\theta,\beta) + k_{4}\ln f_{4}(\theta,\beta)$$

where the kᵢ are the observed class counts and the fᵢ(θ, β) are the corresponding expected class proportions, each a function of both the R.F. and the escape proportion.
Example contd.
• Setting the 1st derivatives (Scores) w.r.t. θ and w.r.t. β to 0:
• The expected value of the Score w.r.t. θ is zero (see analogies in classical sampling/hypothesis testing); similarly for β. Here, however, there is no simple analytical solution, so we cannot solve directly for either parameter.
• Using a grid search, the likelihood reaches its maximum at estimates of approximately 0.02 and 0.22 for the two parameters (a sketch of such a search is given below).
• In general, this type of experiment tests H0: independence between the factors (marker and gene), (sex and purchase), i.e. θ = 0.5,
• and H0: no escapes, i.e. β = 0,
using Likelihood Ratio Test statistics (M.L.E./χ² equivalent).
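The exact class proportions behind these numbers are not reproduced here, so the sketch below assumes a hypothetical backcross-with-escapes model and made-up class counts, purely to illustrate a two-parameter grid search for (θ, β) and the associated likelihood-ratio test of H0: β = 0; the model, counts and grid ranges are all assumptions.

```python
import numpy as np

# Hypothetical 4-class model for a backcross with escapes (an illustrative
# assumption, not the exact model behind the slide's numbers):
#   class 1: marker present, disease      p1 = (1 - theta)(1 - beta) / 2
#   class 2: marker present, no disease   p2 = ((1 - theta) beta + theta) / 2
#   class 3: marker absent,  disease      p3 = theta (1 - beta) / 2
#   class 4: marker absent,  no disease   p4 = (theta beta + 1 - theta) / 2
def class_probs(beta, theta):
    return np.array([(1 - theta) * (1 - beta),
                     (1 - theta) * beta + theta,
                     theta * (1 - beta),
                     theta * beta + 1 - theta]) / 2

k = np.array([380, 130, 95, 395])       # hypothetical observed class counts

def support(beta, theta):
    """S(theta, beta) = sum_i k_i ln p_i(theta, beta)."""
    return np.sum(k * np.log(class_probs(beta, theta)))

# Two-parameter grid search for the joint MLE.
betas = np.linspace(0.001, 0.3, 300)
thetas = np.linspace(0.001, 0.5, 500)
S = np.array([[support(b, t) for t in thetas] for b in betas])
i, j = np.unravel_index(np.argmax(S), S.shape)
print("beta_hat  ~", round(betas[i], 3))
print("theta_hat ~", round(thetas[j], 3))

# Likelihood ratio statistic for H0: no escapes (beta = 0, approximate).
S_H0 = max(support(1e-9, t) for t in thetas)
G = 2 * (S[i, j] - S_H0)
print("G statistic for H0: beta = 0 :", round(G, 2))
```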
MLE Methods in outline contd.
Newton-Raphson Iteration
Have Score(θ) = 0 from previously. N-R consists of replacing the Score by the linear terms of its Taylor expansion, so if θ'' is a solution and θ' the 1st guess,

$$\frac{dS(\theta'')}{d\theta} \approx \frac{dS(\theta')}{d\theta} + (\theta'' - \theta')\,\frac{d^{2}S(\theta')}{d\theta^{2}} = 0$$

$$\Rightarrow\;\; \theta'' = \theta' - \frac{dS(\theta')/d\theta}{d^{2}S(\theta')/d\theta^{2}}$$
Repeat with θ'' replacing θ'. Each iteration fits a parabola to the Likelihood Fn.
[Figure: likelihood function with successive iterates θ' (1st) and θ'' (2nd).]
• Problems - Multiple peaks, zero Information, extreme estimates
• Multiple parameters – need matrix notation, where the Score vector S e.g. has elements = the derivatives of S(θ, β) w.r.t. θ and β respectively. Similarly, the Information matrix has terms of the form

$$-E\left[\frac{\partial^{2}}{\partial\theta^{2}}S(\theta,\beta)\right], \quad -E\left[\frac{\partial^{2}}{\partial\theta\,\partial\beta}S(\theta,\beta)\right], \;\text{etc.}$$

(i.e. the variance of the Log-L derivatives). The estimates are then updated as

$$\begin{pmatrix}\theta''\\ \beta''\end{pmatrix} = \begin{pmatrix}\theta'\\ \beta'\end{pmatrix} + \left[N\,I(\theta,\beta)\right]^{-1} S(\theta,\beta)$$

i.e. the previous estimates plus the inverse Information matrix times the Score vector S(θ, β). A one-parameter numerical sketch is given below.
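A minimal one-parameter numerical sketch of the update θ'' = θ' − S'(θ')/S''(θ'), using central finite differences for the derivatives and the earlier two-peak example support function (both are illustrative choices); it also shows the sensitivity to the starting value noted under "Problems" above.

```python
import numpy as np

def support(theta):
    # Example support function from the grid-search slide.
    return np.log(theta**20 * (1 - theta)**80 + theta**80 * (1 - theta)**20)

def newton_raphson(s, theta, h=1e-4, tol=1e-6, max_iter=100):
    """theta'' = theta' - S'(theta')/S''(theta'), derivatives by central differences."""
    for _ in range(max_iter):
        d1 = (s(theta + h) - s(theta - h)) / (2 * h)                 # dS/dtheta
        d2 = (s(theta + h) - 2 * s(theta) + s(theta - h)) / h**2     # d2S/dtheta2
        step = d1 / d2
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Converges to different peaks depending on the start (the multiple-peak issue).
for start in (0.1, 0.3, 0.9):
    print("start", start, "->", round(newton_raphson(support, start), 4))
```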
MLE Methods in outline contd.
Expectation-Maximisation Algorithm – iterative; for incomplete data.
(Much genomic, financial and other data fit this situation, e.g. linkage analysis with marker genotypes of F2 progeny. Usually 9 categories are observed for a 2-locus, 2-allele model, but 16 = complete info., while 14 give info. on linkage. Some are hidden, but if the linkage parameter were known, expected frequencies could be predicted and the complete data restored using expectation.)
• Steps: (1) Expectation estimates statistics of complete data, given
observed incomplete data.
• (2) Maximisation uses the estimated complete data to give the MLE.
• Iterate till converges (no further change)
E-M contd.
Implementation
• Initial guess, θ', chosen (e.g. = 0.25, say, for the R.F.).
• Taking this as "true", the complete data are estimated via distributional statements, e.g. P(individual is recombinant, given observed genotype) for R.F. estimation.
• The MLE estimate θ'' is then computed; for the R.F. this = sum of recombinants / N.
• Thus the MLE, with fᵢ the observed count for genotype class i, is

$$\theta'' = \frac{1}{N}\sum_{i} f_{i}\, P(R \mid G_{i})$$

• Convergence when θ'' = θ', or |θ'' − θ'| ≤ tolerance (e.g. 0.00001). A sketch is given below.
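A sketch of this E-M iteration for a recombination fraction. The three observable classes and their emission probabilities form a made-up toy model (an assumption, not the F2 example on the previous slide); only the update θ'' = (1/N) Σ fᵢ P(R | Gᵢ) and the convergence tolerance follow the slide.

```python
# Toy model (assumed): recombinant individuals are scored "R" with prob. 0.8 and
# "ambiguous" with prob. 0.2; parental individuals are scored "P" with prob. 0.9
# and "ambiguous" with prob. 0.1. Counts below are hypothetical.
counts = {"P": 140, "R": 35, "ambiguous": 25}     # observed counts f_i

def p_recombinant(cls, theta):
    """E-step quantity P(recombinant | observed class, theta)."""
    if cls == "R":
        return 1.0
    if cls == "P":
        return 0.0
    # Bayes' rule for the ambiguous class under the toy emission probabilities.
    return 0.2 * theta / (0.2 * theta + 0.1 * (1 - theta))

def em_rf(counts, theta=0.25, tol=1e-5):
    n = sum(counts.values())
    while True:
        # M-step: theta'' = (1/N) * sum_i f_i * P(R | G_i)   (slide's update)
        new = sum(f * p_recombinant(c, theta) for c, f in counts.items()) / n
        if abs(new - theta) <= tol:
            return new
        theta = new

print("EM estimate of theta:", round(em_rf(counts), 4))
```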
LIKELIHOOD : C.I. and H.T.
• Likelihood Ratio Tests – c.f. with χ².
• Principal advantage of G is power, as unknown parameters are involved in the hypothesis test.
Have: the Likelihood of θ taking the value θ_A which maximises it, i.e. its MLE, and the Likelihood under H0: θ = θ_N (e.g. θ_N = 0.5).
• Form of the L.R. Test Statistic:

$$G = -2\log\!\left[\frac{L(\theta_{N}\mid x)}{L(\theta_{A}\mid x)}\right] \quad\text{or, conventionally,}\quad G = 2\log\!\left[\frac{L(\theta_{A}\mid x)}{L(\theta_{N}\mid x)}\right]$$

– choose whichever is easier to interpret.
• Distribution of G ~ approx. χ², with d.o.f. = the difference in dimension of the parameter spaces for L(θ_A) and L(θ_N).
• Goodness of Fit: notation as for χ², with G ~ χ²_(n−1):

$$G = 2\sum_{i=1}^{n} O_{i}\log\frac{O_{i}}{E_{i}}$$

• Independence (r × c table), notation again as for χ²:

$$G = 2\sum_{i=1}^{r}\sum_{j=1}^{c} O_{ij}\log\frac{O_{ij}}{E_{ij}}$$
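A sketch of both G statistics; the observed counts and the assumed expected ratios are illustrative, and scipy is used only to obtain the χ² tail probability.

```python
import numpy as np
from scipy.stats import chi2

# Goodness of fit: G = 2 * sum O_i * ln(O_i / E_i), d.o.f. = n - 1 here
# (illustrative counts; expected values from a hypothetical 9:3:3:1 ratio).
O = np.array([315, 101, 108, 32])
E = O.sum() * np.array([9, 3, 3, 1]) / 16
G = 2 * np.sum(O * np.log(O / E))
print("goodness-of-fit G:", round(G, 3), "p:", round(chi2.sf(G, df=len(O) - 1), 3))

# Independence for an r x c table: G = 2 * sum O_ij * ln(O_ij / E_ij),
# with E_ij from the row and column totals; d.o.f. = (r-1)(c-1).
table = np.array([[30, 10],
                  [20, 40]])                     # hypothetical 2 x 2 counts
E = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
G = 2 * np.sum(table * np.log(table / E))
df = (table.shape[0] - 1) * (table.shape[1] - 1)
print("independence G:", round(G, 3), "p:", round(chi2.sf(G, df), 4))
```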
Likelihood C. I.’s – graphical method
• Example: Consider the Likelihood function L(θ) = (1 − θ)^a θ^b, where θ is the unknown parameter and a, b are observed counts.
• For 4 data sets observed:
A: (a,b) = (8,2), B: (a,b) = (16,4), C: (a,b) = (80,20), D: (a,b) = (400,100)
• Likelihood estimates can be plotted vs possible parameter values, with the MLE = peak value,
e.g. MLE θ̂ = 0.2 with Lmax = 0.0067 for A, and Lmax ≈ 4.5 × 10⁻⁵ for B, etc.
Set A: Log Lmax − Log L = Log(0.0067) − Log(0.00091) = 2 gives an ≈ 95% C.I.,
so θ = (0.035, 0.496), corresponding to L = 0.00091, is the ≈ 95% C.I. for A.
Similarly, manipulating this expression, the Likelihood value corresponding to the ≈ 95% confidence interval is given by L = (7.389)⁻¹ Lmax, i.e. Lmax/e².
Note: Usually plot the Log-likelihood vs the parameter, rather than the Likelihood.
As the sample size increases, the C.I. becomes narrower and ≈ symmetric. A sketch for set A is given below.
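A sketch for data set A: it locates the MLE on a fine grid and reports the θ values whose likelihood exceeds Lmax/e², which should land close to the interval quoted above.

```python
import numpy as np

a, b = 8, 2                                   # data set A from the slide

def lik(theta):
    return (1 - theta) ** a * theta ** b

theta = np.linspace(0.001, 0.999, 100_000)
L = lik(theta)
L_max = L.max()
theta_hat = theta[np.argmax(L)]               # MLE, ~ b/(a+b) = 0.2

# Approximate 95% interval: theta values whose likelihood exceeds L_max / e^2.
cutoff = L_max / np.exp(2)                    # = (7.389)^-1 * L_max
inside = theta[L >= cutoff]
print("MLE:", round(theta_hat, 3), " L_max:", round(L_max, 5))
print("~95% likelihood interval: (%.3f, %.3f)" % (inside.min(), inside.max()))
```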
Maximum Likelihood Benefits
• Strong estimator properties – sufficiency, efficiency, consistency, non-bias etc., as before
• Good Confidence Intervals – coverage probability realised and intervals meaningful
• MLE a good estimator:
MSE-consistent: $\lim_{n\to\infty} E(\hat{\theta} - \theta)^{2} = 0$
Absence of bias: $E(\hat{\theta}) = \theta$ – does not "stand alone"; minimum variance also important
Asymptotically Normal: $(\hat{\theta} - \theta)/\sigma_{\hat{\theta}} \sim N(0, 1)$ as $n \to \infty$
Precise for large samples – inferences valid, ranges realistic