Statistics 550 Notes 13

Reading: Section 2.3.

Schedule:
1. Take-home midterm due Wed., Oct. 25th.
2. No class next Tuesday due to fall break. We will have class on Thursday.
3. The next homework will be assigned next week and due Friday, Nov. 3rd.

I. Asymptotic Relative Efficiency (Clarification from last class)

Consider two estimators $T_n$ and $U_n$ and suppose that
$$\sqrt{n}(T_n - \theta) \xrightarrow{L} N(0, t^2) \quad \text{and} \quad \sqrt{n}(U_n - \theta) \xrightarrow{L} N(0, u^2).$$
We define the asymptotic relative efficiency of $U$ to $T$ by $ARE(U, T) = t^2 / u^2$.

For $X_1, \ldots, X_n$ iid $N(\mu, 1)$,
$$ARE(\text{sample median}, \text{sample mean}) = \frac{1}{\pi/2} \approx 0.63.$$
The interpretation is that if person A uses the sample median as her estimator of $\mu$ and person B uses the sample mean as his estimator of $\mu$, person B needs a sample size that is only 0.63 times as large as person A's to obtain the same approximate variance of the estimator. (A Monte Carlo check of this claim appears after the normal-distribution example below.)

Theorem: If $\hat{\theta}_n$ is the MLE and $\tilde{\theta}_n$ is any other estimator, then $ARE(\tilde{\theta}_n, \hat{\theta}_n) \le 1$. Thus, the MLE has the smallest asymptotic variance, and we say that the MLE is asymptotically efficient and asymptotically optimal.

Comments: (1) We will provide an outline of the proof of this theorem when we study the Cramér-Rao (information) inequality in Chapter 3.4. (2) The result is actually more subtle than the stated theorem because it only covers a certain class of well-behaved estimators; more details will be studied in Stat 552.

II. Uniqueness and Existence of the MLE

For a finite sample, when does the MLE exist, when is it unique, and how do we find it?

If $\Theta$ is open, $l_x(\theta)$ is differentiable in $\theta$, and $\hat{\theta}_{MLE}$ exists, then $\hat{\theta}_{MLE}$ must satisfy the estimating equation
$$\frac{\partial}{\partial \theta} l_x(\theta) = 0. \qquad (1.1)$$
This is known as the likelihood equation. But solving (1.1) does not necessarily yield the MLE, as there may be solutions of (1.1) that are not maxima, or solutions that are only local maxima.

Anomalies of maximum likelihood estimates: maximum likelihood estimates are not necessarily unique and do not even have to exist.

Nonuniqueness of MLEs example: $X_1, \ldots, X_n$ iid Uniform$(\theta - \tfrac{1}{2}, \theta + \tfrac{1}{2})$.
$$L_x(\theta) = \begin{cases} 1 & \text{if } \max_i X_i - \tfrac{1}{2} \le \theta \le \min_i X_i + \tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$
Thus any estimator $\hat{\theta}$ that satisfies $\max_i X_i - \tfrac{1}{2} \le \hat{\theta} \le \min_i X_i + \tfrac{1}{2}$ is a maximum likelihood estimator.

Nonexistence of a maximum likelihood estimator: the likelihood function can be unbounded. An important example is a mixture of normal distributions, which is frequently used in applications. Let $X_1, \ldots, X_n$ be iid with density
$$f(x) = p \, \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left\{ -\frac{(x - \mu_1)^2}{2\sigma_1^2} \right\} + (1 - p) \, \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left\{ -\frac{(x - \mu_2)^2}{2\sigma_2^2} \right\}.$$
This is a mixture of two normal distributions. The unknown parameters are $(p, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2)$. Let $\mu_1 = X_1$. Then as $\sigma_1 \to 0$, $f(X_1) \to \infty$, so the likelihood function is unbounded.

Example where the MLE exists and is unique: Normal distribution. $X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$:
$$f(x_1, \ldots, x_n; \mu, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x_i - \mu)^2}{2\sigma^2} \right\}$$
$$l(\mu, \sigma) = -n \log \sigma - \frac{n}{2} \log 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2$$
The partial derivatives with respect to $\mu$ and $\sigma$ are
$$\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (X_i - \mu), \qquad \frac{\partial l}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n (X_i - \mu)^2.$$
Setting the first partial equal to zero and solving, we obtain $\hat{\mu}_{MLE} = \bar{X}$. Setting the second partial equal to zero and substituting the MLE for $\mu$, we find that the MLE for $\sigma$ is
$$\hat{\sigma}_{MLE} = \sqrt{ \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 }.$$
To verify that this critical point is a maximum, we need to check the following second-derivative conditions:

(1) The two second-order partial derivatives are negative:
$$\left. \frac{\partial^2 l}{\partial \mu^2} \right|_{\hat{\mu}_{MLE}, \hat{\sigma}_{MLE}} < 0 \quad \text{and} \quad \left. \frac{\partial^2 l}{\partial \sigma^2} \right|_{\hat{\mu}_{MLE}, \hat{\sigma}_{MLE}} < 0.$$

(2) The determinant of the matrix of second-order partial derivatives (the Hessian) is positive:
$$\left. \left[ \frac{\partial^2 l}{\partial \mu^2} \frac{\partial^2 l}{\partial \sigma^2} - \left( \frac{\partial^2 l}{\partial \mu \, \partial \sigma} \right)^2 \right] \right|_{\hat{\mu}_{MLE}, \hat{\sigma}_{MLE}} > 0.$$

See attached notes from Casella and Berger for verification of (1) and (2) for the normal distribution.
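As a quick numerical check of this example, here is a minimal Python sketch (NumPy and SciPy assumed; the simulated dataset, true parameter values, and starting point are arbitrary illustrative choices) that maximizes the normal log likelihood directly and compares the result with the closed-form MLEs $\bar{X}$ and $\sqrt{n^{-1}\sum_i (X_i - \bar{X})^2}$:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data: 200 draws from N(2, 3^2); any dataset would do.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=200)

# Negative log likelihood of N(mu, sigma^2), dropping the constant
# (n/2) log(2*pi), which does not affect the maximizer.
def neg_log_lik(theta):
    mu, sigma = theta
    return len(x) * np.log(sigma) + np.sum((x - mu) ** 2) / (2.0 * sigma ** 2)

# Maximize the log likelihood (minimize its negative), keeping sigma > 0.
res = minimize(neg_log_lik, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
mu_hat, sigma_hat = res.x
print("numerical MLE:  ", mu_hat, sigma_hat)
print("closed-form MLE:", x.mean(), np.sqrt(np.mean((x - x.mean()) ** 2)))
```

The two lines of output should agree to several decimal places, consistent with the log likelihood having a unique maximum at the closed-form solution.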
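Returning to the ARE claim from Section I, here is a minimal Monte Carlo sketch (again NumPy; the seed, $\mu = 5$, $n = 500$, and the 20,000 replications are arbitrary choices) comparing the sampling variances of the sample mean and sample median for $N(\mu, 1)$ data; the variance ratio should be close to $2/\pi \approx 0.637$:

```python
import numpy as np

# For each of many replications, compute the sample mean and sample median
# of n iid N(mu, 1) observations; the ratio of their sampling variances
# estimates ARE(median, mean) = 2/pi ~ 0.637.
rng = np.random.default_rng(1)
mu, n, reps = 5.0, 500, 20000
samples = rng.normal(loc=mu, scale=1.0, size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)
print("var(mean)/var(median):", means.var() / medians.var())
print("2/pi:                 ", 2.0 / np.pi)
```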
Conditions for uniqueness and existence of the MLE: We now provide a general condition under which there is a unique maximum likelihood estimator that is the solution to the likelihood equation. The condition applies to many exponential families.

Boundary of a parameter space: Suppose the parameter space $\Theta \subset \mathbb{R}^p$ is an open set. Let $\partial\Theta = \bar{\Theta} \setminus \Theta$ be the boundary of $\Theta$, where $\bar{\Theta}$ denotes the closure of $\Theta$ in $[-\infty, \infty]^p$. That is, $\partial\Theta$ is the set of points outside of $\Theta$ that can be obtained as limits of points in $\Theta$, including all points with $\pm\infty$ as a coordinate. For instance, for $X \sim N(\mu, \sigma^2)$,
$$\Theta = \{(\mu, \sigma^2): -\infty < \mu < \infty, \; 0 < \sigma^2 < \infty\} = (-\infty, \infty) \times (0, \infty),$$
$$\partial\Theta = \{(a, b): a \in \{-\infty, \infty\}, \; 0 \le b \le \infty\} \cup \{(a, b): -\infty \le a \le \infty, \; b \in \{0, \infty\}\}.$$

Convergence of points to the boundary: In general, for a sequence $\{\theta_m\}$ of points from $\Theta$ open, we define $\theta_m \to \partial\Theta$ as $m \to \infty$ to mean that for any subsequence $\{\theta_{m_k}\}$, either $\theta_{m_k} \to t$ with $t \notin \Theta$, or $\theta_{m_k}$ diverges with $|\theta_{m_k}| \to \infty$ as $k \to \infty$, where $|\cdot|$ denotes the Euclidean norm.

Example: In the $N(\mu, \sigma^2)$ case, $(a, m^{-1})$, $(m, b)$, $(-m, b)$, $(a, m)$, and $(m, m^{-1})$ all tend to $\partial\Theta$ as $m \to \infty$.

Lemma 2.3.1: Suppose we are given a function $l: \Theta \to \mathbb{R}$ where $\Theta \subset \mathbb{R}^p$ is open and $l$ is continuous. Suppose also that $l(\theta) \to -\infty$ as $\theta \to \partial\Theta$. Then there exists $\hat{\theta} \in \Theta$ such that $l(\hat{\theta}) = \max\{l(\theta): \theta \in \Theta\}$.

Proof: Problem 2.3.5.

Proposition 2.3.1: Suppose our model is that $X$ has pdf or pmf $p(x \mid \theta)$, $\theta \in \Theta$, and that
(i) $l_x(\theta)$ is strictly concave;
(ii) $l_x(\theta) \to -\infty$ as $\theta \to \partial\Theta$.
Then the maximum likelihood estimator exists and is unique.

Proof: $l_x(\theta)$ is continuous because $l_x(\theta)$ is concave (see Appendix B.9). By Lemma 2.3.1, $\hat{\theta}$ exists. To prove uniqueness of the MLE, suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are distinct maximizers of the likelihood. Then
$$l_x(\hat{\theta}_1) = \frac{1}{2} l_x(\hat{\theta}_1) + \frac{1}{2} l_x(\hat{\theta}_2) < l_x\left( \frac{1}{2}\hat{\theta}_1 + \frac{1}{2}\hat{\theta}_2 \right),$$
with the inequality following from the strict concavity of $l_x(\theta)$; this contradicts $\hat{\theta}_1$ being a maximizer of the likelihood.

Corollary: If the conditions of Proposition 2.3.1 are satisfied and $l_x(\theta)$ is differentiable in $\theta$, then $\hat{\theta}_{MLE}$ is the unique solution to the estimating equation
$$\frac{\partial}{\partial \theta} l_x(\theta) = 0. \qquad (1.2)$$

Application to Exponential Families:

1. Theorem 1.6.4, Corollary 1.6.5: For a full exponential family, the log likelihood is strictly concave. Consider the exponential family
$$p(x \mid \eta) = h(x) \exp\left\{ \sum_{i=1}^k \eta_i T_i(x) - A(\eta) \right\}.$$
Note that if $A(\eta)$ is convex, then the log likelihood
$$\log p(x \mid \eta) = \log h(x) + \sum_{i=1}^k \eta_i T_i(x) - A(\eta)$$
is concave in $\eta$.

Proof that $A(\eta)$ is convex: Recall that
$$A(\eta) = \log \int h(x) \exp\left[ \sum_{i=1}^k \eta_i T_i(x) \right] dx.$$
To show that $A(\eta)$ is convex, we want to show that
$$A(\lambda \eta_1 + (1 - \lambda) \eta_2) \le \lambda A(\eta_1) + (1 - \lambda) A(\eta_2) \quad \text{for } 0 \le \lambda \le 1,$$
or equivalently
$$\exp\{ A(\lambda \eta_1 + (1 - \lambda) \eta_2) \} \le \exp\{ \lambda A(\eta_1) \} \exp\{ (1 - \lambda) A(\eta_2) \}.$$
We use Hölder's inequality to establish this. Hölder's inequality (B.9.4 on page 518 of Bickel and Doksum) states that for any two numbers $r$ and $s$ with $r, s > 1$ and $r^{-1} + s^{-1} = 1$,
$$E|XY| \le \{E|X|^r\}^{1/r} \{E|Y|^s\}^{1/s}.$$
Applying this with $r = 1/\lambda$ and $s = 1/(1-\lambda)$, we have
$$\exp\{ A(\lambda \eta_1 + (1 - \lambda) \eta_2) \} = \int \exp\left[ \sum_{i=1}^k (\lambda \eta_{1i} + (1 - \lambda) \eta_{2i}) T_i(x) \right] h(x) \, dx$$
$$= \int \exp\left[ \lambda \sum_{i=1}^k \eta_{1i} T_i(x) \right] \exp\left[ (1 - \lambda) \sum_{i=1}^k \eta_{2i} T_i(x) \right] h(x) \, dx$$
$$\le \left( \int \exp\left[ \sum_{i=1}^k \eta_{1i} T_i(x) \right] h(x) \, dx \right)^{\lambda} \left( \int \exp\left[ \sum_{i=1}^k \eta_{2i} T_i(x) \right] h(x) \, dx \right)^{1 - \lambda}$$
$$= \exp\{ \lambda A(\eta_1) \} \exp\{ (1 - \lambda) A(\eta_2) \}.$$
For a full exponential family, the log likelihood is strictly concave. For a curved exponential family, the log likelihood is concave but not strictly concave.

2. Theorem 2.3.1 and Corollary 2.3.2 spell out specific conditions under which $l_x(\theta) \to -\infty$ as $\theta \to \partial\Theta$ for exponential families.
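As a concrete illustration of the convexity of $A(\eta)$, here is a minimal Python sketch (NumPy and SciPy assumed) using the Poisson family in canonical form, where $h(x) = 1/x!$, $T(x) = x$, $\eta$ is the log of the mean, and $A(\eta) = \log \sum_{x \ge 0} e^{\eta x}/x! = e^{\eta}$. The truncation point of the infinite sum and the choice of $\eta_1, \eta_2$ are arbitrary:

```python
import numpy as np
from scipy.special import gammaln

# Poisson family in canonical form: h(x) = 1/x!, T(x) = x, so
# A(eta) = log sum_{x>=0} exp(eta * x) / x!, which equals exp(eta).
# The sum is truncated at x = 200, ample for the eta values used here.
x = np.arange(200)

def A(eta):
    return np.log(np.sum(np.exp(eta * x - gammaln(x + 1))))

# Check A(lam*eta1 + (1-lam)*eta2) <= lam*A(eta1) + (1-lam)*A(eta2).
eta1, eta2 = -1.0, 2.0   # arbitrary canonical parameter values
for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    lhs = A(lam * eta1 + (1.0 - lam) * eta2)
    rhs = lam * A(eta1) + (1.0 - lam) * A(eta2)
    print(f"lam={lam:.2f}: {lhs:.4f} <= {rhs:.4f}  ({lhs <= rhs + 1e-12})")
```

Equality holds at the endpoints $\lambda = 0, 1$, and the inequality is strict in between, matching the Hölder argument above.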
Example 1: Gamma distribution.
$$f(x; \alpha, \lambda) = \begin{cases} \dfrac{1}{\Gamma(\alpha) \lambda^{\alpha}} x^{\alpha - 1} e^{-x/\lambda}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}$$
$$l(\alpha, \lambda) = \sum_{i=1}^n \left[ -\log \Gamma(\alpha) - \alpha \log \lambda + (\alpha - 1) \log X_i - X_i/\lambda \right]$$
for the parameter space $\Theta = \{(\alpha, \lambda): \alpha > 0, \; \lambda > 0\}$. The gamma distribution is a full two-dimensional exponential family, so the likelihood function is strictly concave. The boundary of the parameter space is
$$\partial\Theta = \{(a, b): a = \infty, \; 0 \le b \le \infty\} \cup \{(a, b): a = 0, \; 0 \le b \le \infty\} \cup \{(a, b): 0 \le a \le \infty, \; b = \infty\} \cup \{(a, b): 0 \le a \le \infty, \; b = 0\}.$$
One can check that $l(\theta) \to -\infty$ as $\theta \to \partial\Theta$. Thus, by Proposition 2.3.1, the MLE is the unique solution to the likelihood equation.

The partial derivatives of the log likelihood are
$$\frac{\partial l}{\partial \alpha} = -n \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - n \log \lambda + \sum_{i=1}^n \log X_i$$
$$\frac{\partial l}{\partial \lambda} = -\frac{n\alpha}{\lambda} + \frac{\sum_{i=1}^n X_i}{\lambda^2}$$
Setting the second partial derivative equal to zero, we find
$$\hat{\lambda}_{MLE} = \frac{\sum_{i=1}^n X_i}{n \hat{\alpha}_{MLE}}.$$
When this solution is substituted into the first partial derivative, we obtain a nonlinear equation for the MLE of $\alpha$:
$$-n \frac{\Gamma'(\hat{\alpha}_{MLE})}{\Gamma(\hat{\alpha}_{MLE})} - n \log\left( \frac{\sum_{i=1}^n X_i}{n \hat{\alpha}_{MLE}} \right) + \sum_{i=1}^n \log X_i = 0.$$
This equation cannot be solved in closed form.

Next topic: Numerical methods for finding the MLE (Chapter 2.4).
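As a preview of those numerical methods, here is a minimal Python sketch of solving the gamma likelihood equation (NumPy and SciPy; the simulated data, true parameter values, and bracketing interval are arbitrary illustrative choices). It uses Brent's bracketing root finder, which is one of several possible approaches and not necessarily the method of Chapter 2.4:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

# Illustrative data with true values alpha = 3, lambda = 2.
rng = np.random.default_rng(0)
x = rng.gamma(shape=3.0, scale=2.0, size=1000)

# Substituting lambda-hat = mean(x)/alpha into dl/dalpha = 0 reduces the
# likelihood equations to a one-dimensional root-finding problem:
#     digamma(alpha) - log(alpha) = mean(log x) - log(mean(x)).
# The left side is strictly increasing in alpha, so a bracketing solver
# finds the unique root, consistent with Proposition 2.3.1.
c = np.mean(np.log(x)) - np.log(np.mean(x))   # negative by Jensen's inequality
alpha_hat = brentq(lambda a: digamma(a) - np.log(a) - c, 1e-6, 1e6)
lambda_hat = np.mean(x) / alpha_hat           # from dl/dlambda = 0
print("alpha-hat:", alpha_hat, "  lambda-hat:", lambda_hat)
```

The printed estimates should be close to the true values used in the simulation, with the discrepancy shrinking as the sample size grows.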