Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Maximum Likelihood Estimates and the EM Algorithms II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University [email protected] http://tigpbp.iis.sinica.edu.tw/courses.htm 1 Part 1 Computation Tools 2 Include Functions in R source(“file path”) Example In MME.R: In R: 3 Part 2 Motivation Examples 4 Example 1 in Genetics (1) Two linked loci with alleles A and a, and B and b A, B: dominant a, b: recessive A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab A A A a B A a B b a B b b b a B F ( Female) 1- r’ r’ (female recombination fraction) M (Male) 1-r r (male recombination fraction) 5 Example 1 in Genetics (2) r and r’ are the recombination rates for male and female Suppose the parental origin of these heterozygote is from the mating of AABB aabb. The problem is to estimate r and r’ from the offspring of selfed heterozygotes. Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92. http://en.wikipedia.org/wiki/Genetics http://www2.isye.gatech.edu/~brani/isyebayes/ba nk/handout12.pdf 6 Example 1 in Genetics (3) MALE F E M A L E AB (1-r)/2 ab (1-r)/2 aB r/2 Ab r/2 AB (1-r’)/2 AABB (1-r) (1-r’)/4 aABb (1-r) (1-r’)/4 aABB r (1-r’)/4 AABb r (1-r’)/4 ab (1-r’)/2 AaBb (1-r) (1-r’)/4 aabb (1-r) (1-r’)/4 aaBb r (1-r’)/4 Aabb r (1-r’)/4 aB r’/2 AaBB (1-r) r’/4 aabB (1-r) r’/4 aaBB r r’/4 AabB r r’/4 Ab r’/2 AABb (1-r) r’/4 aAbb (1-r) r’/4 aABb r r’/4 AAbb r r’/4 7 Example 1 in Genetics (4) Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*. A*: the dominant phenotype from (Aa, AA, aA). a*: the recessive phenotype from aa. B*: the dominant phenotype from (Bb, BB, bB). b* : the recessive phenotype from bb. A*B*: 9 gametic combinations. A*b*: 3 gametic combinations. a*B*: 3 gametic combinations. a*b*: 1 gametic combination. Total: 16 combinations. 8 Example 1 in Genetics (5) Let (1 r )(1 r '), then 2 P ( A * B*) , 4 1 P ( A * b*) P (a * B*) , 4 P (a * b*) 4 . 9 Example 1 in Genetics (6) Hence, the random sample of n from the offspring of selfed heterozygotes will follow a multinomial distribution: 2 1 1 Multinomial n; , , , . 4 4 4 4 We know that (1 r )(1 r '), 0 r 1/ 2, and 0 r ' 1/ 2. So, 1 1/ 4. 10 Example 1 in Genetics (7) Suppose that we observe the data of y = (y1, y2, y3, y4) = (125, 18, 20, 24), which is a random sample from 2 1 1 Multinomial n; , , , . 4 4 4 4 Then the probability mass function is n! 2 y1 1 y2 y3 y4 g ( y, ) ( ) ( ) ( ) . y1 ! y2 ! y3 ! y4 ! 4 4 4 11 Maximum Likelihood Estimate (MLE) Likelihood: Maximize likelihood: Solve the score equations, which are setting the first derivates of likelihood to be zeros. Under regular conditions, the MLE is consistent, asymptotic efficient and normal! More: http://en.wikipedia.org/wiki/Maximum_lik elihood 12 MLE for Example 1 (1) n! 2 y1 1 y2 y3 y4 L( ) ( ) ( ) ( ) y1 ! y2 ! y3 ! y4 ! 4 4 4 Likelihood MLE: ˆMLE max L( ) ˆMLE max log L( ) ( ) logL( ) n! 2 1 log( ) y log( ) ( y2 y3 ) log( ) y4 log( ) y1 ! y2 ! y3 ! y4 ! 4 4 4 y y3 y4 y d log L( ) 1 2 0 d 2 1 ( y1 y2 y3 y4 ) 2 ( y1 2 y2 2 y3 y4 ) 2 y4 0 A B C 13 MLE for Example 1 (2) MLE B B 2 4 AC 2A Checking: d ( ) 2 d ˆ 2 (1) 0? MLE (2) 1/ 4 ˆMLE 1? (3) Compare log L(ˆMLE )? 14 Part 3 Numerical Solutions for the Score Equations of MLEs 15 A Banach Space A Banach space B is a vector space over the field K such that 1. x 0 x B. 2. x 0 x B x 0. 3. rx r x r K ,x B. 4. x y x y x, y B. 5. Every Cauchy sequence x ( k ) of B converges in B (i.e., B is complete). (http://en.wikipedia.org/wiki/Banach_space) 16 Lipschitz Continuous A closed subset A B and mapping F : A A. 1. F is Lipschitz continuous on A with 0 L if F ( x) F ( y) L x y . 2. F is a contraction mapping on A if F is Lipschitz continuous and 0 L 1. (http://en.wikipedia.org/wiki/Lipschitz_continuous) 17 Fixed Point Theorem If F is a contraction mapping on A if F is Lipschitz continuous and 0 L 1. 1. F has an unique fixed point s A such that F ( s ) s. (0) x A , x( k ) F ( x( k -1) ), k=1,2,… 2. initial k 3. x ( k ) s. k -t L (k ) sx x(t 1) x(t ) , 0 t k . 1- L (http://en.wikipedia.org/wiki/Fixed-point_theorem) (http://www.math-linux.com/spip.php?article60) 18 Applications for MLE (1) log L( ) ( ) 0, i.e. 0 ( ) ' 0 0 F ( ) ( ) ' is a contraction mapping (0) : initial (1) F ( ) '( ) (0) (k ) F ( ) (0) Then, (k ) ( k 1) (0) ) '( ( k 1) ( k 1) k s s.t. F ( s ) s 19 Applications for MLE (2) ? lim yx F ( x) F ( y ) x y F '( x) ''( x) 1 L 1. Optimal ? 20 Parallel Chord Method (1) Parallel chord method is also called simple iteration. (k) ( k 1) '( ( k 1) ) 0 '( ) 1 (k ) ( k 1) ( k 1) 21 Parallel Chord Method (2) s '( (0) ) '( (1) ) (0) (1) (2) 22 Plot the Parallel Chord Method by R 23 Define Functions for Example 1 in R We will define some functions and variables for finding the MLE in Example 1 by R 24 Parallel Chord Method by R (1) 25 Parallel Chord Method by R (2) 26 Parallel Chord Method by C/C++ 27 Newton-Raphson Method (1) '(s) '( ( k 1) ) ( S ( k 1) ) ''( ( k 1) ) 0 ( k 1) '( ) ( k 1) s ''( ( k 1) ) ( k ) ( k 1) '( ) ( k 1) ( k 1) G ( ) ''( ( k 1) ) http://math.fullerton.edu/mathews/n2003 /Newton'sMethodMod.html http://en.wikipedia.org/wiki/Newton%27s _method 28 Newton-Raphson Method (2) s '( (0) ) '( (1) ) (0) (1) (2) 29 Plot the Newton-Raphson Method by R 30 Newton-Raphson Method by R (1) 31 Newton-Raphson Method by R (2) 32 Newton-Raphson Method by C/C++ 33 Halley’s Method The Newton-Raphson iteration function is f ( x) g ( x) x f '( x) It is possible to speed up convergence by using more expansion terms than the Newton-Raphson method does when the object function is very smooth, like the method by Edmond Halley (1656-1742): f ( x) f ( x) f ''( x) g ( x) x 1 f '( x) 2( f ( x)) 2 1 34 (http://math.fullerton.edu/mathews/n2003/Halley'sMethodMod.html) Halley’s Method by R (1) 35 Halley’s Method by R (2) 36 Halley’s Method by C/C++ 37 Bisection Method (1) Assume that f C[a, b] and that there exists a number r [ a, b] such that f (r ) 0 . If f ( a ) and f (b ) have opposite signs, and cn represents the sequence of midpoints generated by the bisection process, then ba r cn n 1 n 1, 2,... 2 and the sequence cn converges to r. That is, lim cn r . n (http://en.wikipedia.org/wiki/Bisection_method ) 38 Bisection Method (2) 1 39 Plot the Bisection Method by R 40 Bisection Method by R (1) > fix(Bisection) 41 Bisection Method by R (2) 42 Bisection Method by R (3) 43 Bisection Method by C/C++ (1) 44 Bisection Method by C/C++ (2) 45 Secant Method 1 ''( ( k 1) ) slop ''( ) (k ) '( ) '( ( k 2) ( k 1) ( k 2) ( k 1) ) (http://en.wikipedia.org/wiki/Secant_method ) (http://math.fullerton.edu/mathews/n2003/Secant MethodMod.html ) 46 Secant Method by R (1) >fix(Secant) 47 Secant Method by R (2) 48 Secant Method by C/C++ 49 Secant-Bracket Method The secant-bracket method is also called the regular falsi method. '( A) '( B) '(C ) '( A) A B CA B '( A) A '( B) C '( A) '( B) S A C B 50 Secant-Bracket Method by R (1) >fix(RegularFalsi) 51 Secant-Bracket Method by R (2) 52 Secant-Bracket Method by R (3) 53 Secant-Bracket Method by C/C++ (1) 54 Secant-Bracket Method by C/C++ (1) 55 Fisher Scoring Method (k ) ''( ) Fisher scoring method replaces by E ( ''( ( k 1) )) I ( ( k 1) ) where I is the Fisher information matrix when the parameter may be multivariate. 56 Fisher Scoring Method by R (1) > fix(Fisher) 57 Fisher Scoring Method by R (2) 58 Fisher Scoring Method by C/C++ 59 Order of Convergence Order of convergence is p if lim sup k x ( k 1) s x (k ) s p c 0c and c<1 for p=1. (http://en.wikipedia.org/wiki/Order_of_convergence) ( k 1) (k ) ln x s c p ln x s as k . Note: Hence, we can use regression to estimate p. 60 Theorem for Newton-Raphson Method If I [a, b] , F : I I , F is a contraction mapping F ( x) 0 x I then p=1 and 1 lim k x ( k 1) s x (k ) s F '( s) 1 If I [a, b] , '''exists, ''( s ) 0, '( ) 0 has a simple zero, then I1 [s , s ], >0, such that G () of the Newton-Raphson method is a contraction mapping and p=2. 1 61 Find Convergence Order by R (1) R=Newton(y1, y2, y3, y4, initial) #Newton method can be substitute for different method temp=log(abs(R$iteration-R$phi)); y=temp[2:(length(temp)-1)] x=temp[1:(length(temp)-2)] lm(y~x) 62 Find Convergence Order by R (2) 63 Find Convergence Order by R (3) 64 Find Convergence Order by C/C++ 65 Exercises Write your own programs for those examples presented in this talk. Write programs for those examples mentioned at the following web page: http://en.wikipedia.org/wiki/Maximum_li kelihood Write programs for the other examples that you know. 66 More Exercises (1) Example 3 in genetics: The observed data are (nO, nA, nB, nAB) = (176, 182, 60, 17) ~ Multinomial(r^2, p^2+2pr, q^2+2qr, 2pq), where p, q, and r fall in [0,1] such that p+q+r = 1. Find the MLEs for p, q, and r. 67 More Exercises (2) Example 4 in the positron emission tomography (PET): The observed data are n*(d) ~Poisson(λ*(d)), d = 1, 2, …, D, and B * (d ) p(b, d ) (b). . b 1 The values of p(b,d) are known and the unknown parameters are λ(b), b = 1, 2, …, B. Find the MLEs for λ(b), b = 1, 2, …, B. 68 More Exercises (3) Example 5 in the normal mixture: The observed data xi, i = 1, 2, …, n, are random samples from the following probability density function: K f ( xi ) K k 1 k k 1 k Normal ( k , ), 2 k 1, and 0 k 1 for all k. Find the MLEs for the following parameters: (1 ,..., K , 1 ,..., K , 1 ,..., K ). 69