Statistics 550 Notes 13
Reading: Section 2.3.
Schedule:
1. Take home midterm due Wed. Oct. 25th
2. No class next Tuesday due to fall break. We will have
class on Thursday.
3. The next homework will be assigned next week and due
Friday, Nov. 3rd.
I. Asymptotic Relative Efficiency (Clarification from last
class)
Consider two estimators $T_n$ and $U_n$ and suppose that
$$\sqrt{n}(T_n - \theta) \stackrel{L}{\to} N(0, t^2)$$
and that
$$\sqrt{n}(U_n - \theta) \stackrel{L}{\to} N(0, u^2).$$
We define the asymptotic relative efficiency of $U$ to $T$ by $\mathrm{ARE}(U, T) = t^2 / u^2$. For $X_1, \ldots, X_n$ iid $N(\theta, 1)$,
$$\mathrm{ARE}(\text{sample median}, \text{sample mean}) = \frac{1}{\pi/2} \approx 0.63.$$
The interpretation is that if person A uses the sample median as her estimator of $\theta$ and person B uses the sample mean as her estimator of $\theta$, person B needs a sample size that is only 0.63 times as large as person A's to obtain the same approximate variance for the estimator.
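To check this numerically, here is a minimal simulation sketch (not part of the original notes; the seed, sample size, and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.0, 100, 20000

# Draw many N(theta, 1) samples and compare the two estimators.
samples = rng.normal(theta, 1.0, size=(reps, n))
medians = np.median(samples, axis=1)
means = samples.mean(axis=1)

# Var(mean) / Var(median) estimates ARE(median, mean) = 2/pi ~ 0.637
print(np.var(means) / np.var(medians))
```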
Theorem: If $\hat{\theta}_n$ is the MLE and $\tilde{\theta}_n$ is any other estimator, then
$$\mathrm{ARE}(\tilde{\theta}_n, \hat{\theta}_n) \le 1.$$
Thus, the MLE has the smallest asymptotic variance and
we say that the MLE is asymptotically efficient and
asymptotically optimal.
Comments: (1) We will provide an outline of the proof of this theorem when we study the Cramér-Rao (information) inequality in Chapter 3.4. (2) The result is actually more subtle than the stated theorem because it only covers a certain class of well-behaved estimators; more details will be studied in Stat 552.
II. Uniqueness and Existence of the MLE
For a finite sample, when does the MLE exist, when is it
unique and how do we find the MLE?
If $\Theta$ is open, $l_x(\theta)$ is differentiable in $\theta$, and $\hat{\theta}_{MLE}$ exists, then $\hat{\theta}_{MLE}$ must satisfy the estimating equation
$$\frac{\partial}{\partial \theta} l_x(\theta) = 0 \qquad (1.1)$$
This is known as the likelihood equation.
But solving (1.1) does not necessarily yield the MLE as
there may be solutions of (1.1) that are not maxima, or
solutions that are only local maxima.
Anomalies of maximum likelihood estimates:
Maximum likelihood estimates are not necessarily unique
and do not even have to exist.
Nonuniqueness of MLEs example: $X_1, \ldots, X_n$ are iid Uniform$(\theta - \frac{1}{2}, \theta + \frac{1}{2})$.
$$L_x(\theta) = \begin{cases} 1 & \text{if } \max_i X_i - \frac{1}{2} \le \theta \le \min_i X_i + \frac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$
Thus any estimator $\hat{\theta}$ that satisfies $\max_i X_i - \frac{1}{2} \le \hat{\theta} \le \min_i X_i + \frac{1}{2}$ is a maximum likelihood estimator.
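A small numerical illustration (a sketch, not from the notes; the seed and true $\theta$ are arbitrary) confirms the likelihood is flat on this interval:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(3.0 - 0.5, 3.0 + 0.5, size=10)  # true theta = 3

def likelihood(theta):
    # Uniform(theta - 1/2, theta + 1/2): the likelihood is 1 exactly
    # when every observation lies within 1/2 of theta, else 0.
    return float(np.max(x) - 0.5 <= theta <= np.min(x) + 0.5)

lo, hi = np.max(x) - 0.5, np.min(x) + 0.5
# Every theta in [lo, hi] attains the same maximal likelihood value 1.
for theta in np.linspace(lo, hi, 5):
    print(round(theta, 3), likelihood(theta))
```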
Nonexistence of maximum likelihood estimator: The
likelihood function can be unbounded. An important
example is a mixture of normal distributions, which is
frequently used in applications.
$X_1, \ldots, X_n$ iid with density
$$f(x) = p \frac{1}{\sqrt{2\pi}\, \sigma_1} \exp\left\{ -\frac{(x - \mu_1)^2}{2\sigma_1^2} \right\} + (1 - p) \frac{1}{\sqrt{2\pi}\, \sigma_2} \exp\left\{ -\frac{(x - \mu_2)^2}{2\sigma_2^2} \right\}.$$
This is a mixture of two normal distributions. The unknown parameters are $(p, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2)$. Let $\mu_1 = X_1$. Then as $\sigma_1 \to 0$, $f(X_1) \to \infty$, so that the likelihood function is unbounded.
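To see the unboundedness numerically, here is a minimal sketch (not from the notes; the seed and parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=20)

def log_lik(p, mu1, mu2, s1, s2):
    # Log likelihood of the two-component normal mixture.
    dens = p * norm.pdf(x, mu1, s1) + (1 - p) * norm.pdf(x, mu2, s2)
    return np.sum(np.log(dens))

# Pin mu1 at the first observation and let sigma1 shrink: the first
# component's density at X_1 blows up, and so does the log likelihood.
for s1 in [1.0, 0.1, 0.01, 0.001]:
    print(s1, log_lik(p=0.5, mu1=x[0], mu2=0.0, s1=s1, s2=1.0))
```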
Example where the MLE exists and is unique: Normal
distribution
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$:
$$f(x_1, \ldots, x_n; \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left\{ -\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2 \right\}$$
$$l(\mu, \sigma) = -n \log \sigma - \frac{n}{2} \log 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2$$
The partials with respect to $\mu$ and $\sigma$ are
$$\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu)$$
$$\frac{\partial l}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (X_i - \mu)^2$$
Setting the first partial equal to zero and solving for the MLE, we obtain
$$\hat{\mu}_{MLE} = \bar{X}.$$
Setting the second partial equal to zero and substituting the MLE for $\mu$, we find that the MLE for $\sigma$ is
$$\hat{\sigma}_{MLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}.$$
To verify that this critical point is a maximum, we need to
check the following second derivative conditions:
(1) The two second-order partial derivatives are negative:
$$\left. \frac{\partial^2 l}{\partial \mu^2} \right|_{\mu = \hat{\mu}_{MLE},\, \sigma = \hat{\sigma}_{MLE}} < 0 \quad \text{and} \quad \left. \frac{\partial^2 l}{\partial \sigma^2} \right|_{\mu = \hat{\mu}_{MLE},\, \sigma = \hat{\sigma}_{MLE}} < 0$$
(2) The Jacobian (determinant) of the matrix of second-order partial derivatives is positive:
$$\begin{vmatrix} \dfrac{\partial^2 l}{\partial \mu^2} & \dfrac{\partial^2 l}{\partial \mu\, \partial \sigma} \\ \dfrac{\partial^2 l}{\partial \sigma\, \partial \mu} & \dfrac{\partial^2 l}{\partial \sigma^2} \end{vmatrix}_{\mu = \hat{\mu}_{MLE},\, \sigma = \hat{\sigma}_{MLE}} > 0$$
See the attached notes from Casella and Berger for verification of (1) and (2) for the normal distribution.
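As a quick check of the closed-form answers (a minimal sketch, not from the notes; the simulated data and optimizer settings are arbitrary choices), one can compare them with a direct numerical maximization of the log likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(5.0, 2.0, size=200)
n = len(x)

# Closed-form MLEs derived above.
mu_hat = x.mean()
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))

# Numerical check: minimize the negative log likelihood (the constant
# (n/2) log 2*pi is dropped). Optimizing over log(sigma) keeps sigma > 0.
def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return n * log_sigma + np.sum((x - mu) ** 2) / (2 * sigma ** 2)

res = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
print(mu_hat, sigma_hat)            # closed form
print(res.x[0], np.exp(res.x[1]))   # numerical optimum, should agree
```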
Conditions for uniqueness and existence of the MLE: We
now provide a general condition under which there is a
unique maximum likelihood estimator that is the solution to
the likelihood equation. The condition applies to many
exponential families.
Boundary of a parameter space: Suppose the parameter space $\Theta \subset \mathbb{R}^p$ is an open set. Let $\partial \Theta = \bar{\Theta} \setminus \Theta$ be the boundary of $\Theta$, where $\bar{\Theta}$ denotes the closure of $\Theta$ in $[-\infty, \infty]^p$. That is, $\partial \Theta$ is the set of points outside of $\Theta$ that can be obtained as limits of points in $\Theta$, including all points with $\pm\infty$ as a coordinate. For instance, for $X \sim N(\mu, \sigma^2)$, $(\mu, \sigma^2) \in (-\infty, \infty) \times (0, \infty)$, and
$$\partial \Theta = \{(a, b) : a \in \{-\infty, \infty\},\ 0 \le b \le \infty\} \cup \{(a, b) : -\infty \le a \le \infty,\ b \in \{0, \infty\}\}.$$
Convergence of points to the boundary: In general, for a sequence $\{\theta_m\}$ of points from $\Theta$ open, we define $\theta_m \to \partial \Theta$ as $m \to \infty$ to mean that for any subsequence $\{\theta_{m_k}\}$, either $\theta_{m_k} \to t$ with $t \notin \Theta$, or $\theta_{m_k}$ diverges, with $|\theta_{m_k}| \to \infty$ as $k \to \infty$, where $|\theta|$ denotes the Euclidean norm.

Example: In the $N(\mu, \sigma^2)$ case, the sequences $(a, m^{-1})$, $(m, b)$, $(-m, b)$, $(a, m)$, and $(m, m^{-1})$ all tend to $\partial \Theta$ as $m \to \infty$.
Lemma 2.3.1: Suppose we are given a function $l : \Theta \to \mathbb{R}$ where $\Theta \subset \mathbb{R}^p$ is open and $l$ is continuous. Suppose also that
$$\lim_{\theta \to \partial \Theta} l(\theta) = -\infty.$$
Then there exists $\hat{\theta} \in \Theta$ such that
$$l(\hat{\theta}) = \max\{l(\theta) : \theta \in \Theta\}.$$
Proof: Problem 2.3.5.
Proposition 2.3.1: Suppose our model is that $X$ has pdf or pmf $p(X \mid \theta)$, $\theta \in \Theta$, and that (i) $l_x(\theta)$ is strictly concave; (ii) $l_x(\theta) \to -\infty$ as $\theta \to \partial \Theta$. Then the maximum likelihood estimator exists and is unique.
Proof: $l_x(\theta)$ is continuous because $l_x(\theta)$ is concave (see Appendix B.9). By Lemma 2.3.1, $\hat{\theta}_{MLE}$ exists. To prove uniqueness, suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are distinct maximizers of the likelihood. Then
$$l_x(\hat{\theta}_1) = \frac{1}{2} l_x(\hat{\theta}_1) + \frac{1}{2} l_x(\hat{\theta}_2) < l_x\left( \frac{1}{2}\hat{\theta}_1 + \frac{1}{2}\hat{\theta}_2 \right),$$
with the strict inequality following from the strict concavity of $l_x(\theta)$; this contradicts $\hat{\theta}_1$ being a maximizer of the likelihood. $\blacksquare$
Corollary: If the conditions of Proposition 2.3.1 are satisfied and $l_x(\theta)$ is differentiable in $\theta$, then $\hat{\theta}_{MLE}$ is the unique solution to the estimating equation
$$\frac{\partial}{\partial \theta} l_x(\theta) = 0 \qquad (1.2)$$
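As a concrete instance of the corollary (an illustrative sketch; the Poisson model and simulated data are my choice, not from the notes), the likelihood equation for a Poisson($\theta$) sample has a unique root, which is the MLE:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(4)
x = rng.poisson(3.0, size=100)

# Poisson(theta): l(theta) = sum(x) log(theta) - n*theta + const is
# strictly concave on (0, inf) and tends to -inf at the boundary
# (provided sum(x) > 0), so the likelihood equation has a unique root.
def score(theta):
    return np.sum(x) / theta - len(x)

theta_hat = brentq(score, 1e-8, 1e3)
print(theta_hat, x.mean())  # the root equals the sample mean
```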
Application to Exponential Families:
1. Theorem 1.6.4, Corollary 1.6.5: For a full exponential family, the log likelihood is strictly concave.
Consider the exponential family
$$p(x \mid \eta) = h(x) \exp\left\{ \sum_{i=1}^{k} \eta_i T_i(x) - A(\eta) \right\}.$$
Note that if $A(\eta)$ is convex, then the log likelihood
$$\log p(x \mid \eta) = \log h(x) + \sum_{i=1}^{k} \eta_i T_i(x) - A(\eta)$$
is concave in $\eta$.
Proof that $A(\eta)$ is convex: Recall that
$$A(\eta) = \log \int h(x) \exp\left[ \sum_{i=1}^{k} \eta_i T_i(x) \right] dx.$$
To show that $A(\eta)$ is convex, we want to show that
$$A(\lambda \eta_1 + (1 - \lambda) \eta_2) \le \lambda A(\eta_1) + (1 - \lambda) A(\eta_2) \quad \text{for } 0 \le \lambda \le 1,$$
or equivalently
$$\exp\{A(\lambda \eta_1 + (1 - \lambda) \eta_2)\} \le \exp\{\lambda A(\eta_1)\} \exp\{(1 - \lambda) A(\eta_2)\}.$$
We use Hölder's Inequality to establish this. Hölder's Inequality (B.9.4 on page 518 of Bickel and Doksum) states that for any two numbers $r$ and $s$ with $r, s > 1$, $r^{-1} + s^{-1} = 1$,
$$E|XY| \le \{E|X|^r\}^{1/r} \{E|Y|^s\}^{1/s}.$$
Applying this with $r = 1/\lambda$ and $s = 1/(1 - \lambda)$, we have
$$\begin{aligned}
\exp\{A(\lambda \eta_1 + (1 - \lambda) \eta_2)\} &= \int \exp\left[ \sum_{i=1}^{k} (\lambda \eta_{1i} + (1 - \lambda) \eta_{2i}) T_i(x) \right] h(x)\, dx \\
&= \int \exp\left[ \lambda \sum_{i=1}^{k} \eta_{1i} T_i(x) \right] \exp\left[ (1 - \lambda) \sum_{i=1}^{k} \eta_{2i} T_i(x) \right] h(x)\, dx \\
&\le \left( \int \left( \exp\left[ \lambda \sum_{i=1}^{k} \eta_{1i} T_i(x) \right] \right)^{1/\lambda} h(x)\, dx \right)^{\lambda} \left( \int \left( \exp\left[ (1 - \lambda) \sum_{i=1}^{k} \eta_{2i} T_i(x) \right] \right)^{1/(1-\lambda)} h(x)\, dx \right)^{1-\lambda} \\
&= \exp\{\lambda A(\eta_1)\} \exp\{(1 - \lambda) A(\eta_2)\}.
\end{aligned}$$
For a full exponential family, the log likelihood is strictly
concave.
For a curved exponential family, the log likelihood is
concave but not strictly concave.
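As a quick numerical sanity check of the convexity of $A(\eta)$ (a sketch, not from the notes; the Poisson family with $h(x) = 1/x!$, $T(x) = x$ and the truncation point are illustrative choices):

```python
import numpy as np
from math import lgamma

def A(eta, x_max=200):
    # A(eta) = log sum_x h(x) exp(eta * x) for the Poisson family
    # (h(x) = 1/x!, T(x) = x); the infinite sum is truncated at x_max
    # and evaluated with log-sum-exp for numerical stability.
    xs = np.arange(x_max)
    log_terms = xs * eta - np.array([lgamma(k + 1) for k in xs])
    m = log_terms.max()
    return m + np.log(np.exp(log_terms - m).sum())

eta1, eta2 = -1.0, 2.0
for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    lhs = A(lam * eta1 + (1 - lam) * eta2)
    rhs = lam * A(eta1) + (1 - lam) * A(eta2)
    print(lam, lhs <= rhs + 1e-12)  # the Holder bound: always True
```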
2. Theorem 2.3.1 and Corollary 2.3.2 spell out specific conditions under which $l_x(\theta) \to -\infty$ as $\theta \to \partial \Theta$ for exponential families.
Example 1: Gamma distribution.
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\, \beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}$$
$$l(\alpha, \beta) = \sum_{i=1}^{n} \left[ -\log \Gamma(\alpha) - \alpha \log \beta + (\alpha - 1) \log X_i - X_i / \beta \right]$$
for the parameter space $\alpha > 0$, $\beta > 0$.
The gamma distribution is a full two-dimensional
exponential family so that the likelihood function is strictly
concave.
The boundary of the parameter space is
$$\partial \Theta = \{(a, b) : a = \infty,\ 0 \le b \le \infty\} \cup \{(a, b) : a = 0,\ 0 \le b \le \infty\} \cup \{(a, b) : 0 \le a \le \infty,\ b = \infty\} \cup \{(a, b) : 0 \le a \le \infty,\ b = 0\}.$$
One can check that $\lim_{\theta \to \partial \Theta} l(\theta) = -\infty$.
Thus, by Proposition 2.3.1, the MLE is the unique solution
to the likelihood equation.
The partial derivatives of the log likelihood are
$$\frac{\partial l}{\partial \alpha} = \sum_{i=1}^{n} \left[ -\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \log \beta + \log X_i \right]$$
$$\frac{\partial l}{\partial \beta} = \sum_{i=1}^{n} \left[ -\frac{\alpha}{\beta} + \frac{X_i}{\beta^2} \right]$$
Setting the second partial derivative equal to zero, we find
$$\hat{\beta} = \frac{\sum_{i=1}^{n} X_i}{n\, \hat{\alpha}_{MLE}}.$$
When this solution is substituted into the first partial derivative, we obtain a nonlinear equation for the MLE of $\alpha$:
$$-n \frac{\Gamma'(\hat{\alpha}_{MLE})}{\Gamma(\hat{\alpha}_{MLE})} - n \log \frac{\sum_{i=1}^{n} X_i}{n} + n \log \hat{\alpha}_{MLE} + \sum_{i=1}^{n} \log X_i = 0.$$
This equation cannot be solved in closed form.
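It can, however, be solved numerically. Here is a minimal sketch using SciPy (the simulated data and bracketing interval are illustrative assumptions, not from the notes):

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(5)
x = rng.gamma(shape=2.5, scale=1.5, size=500)

# Likelihood equation for alpha (divided by n), after substituting
# beta_hat = mean(x) / alpha:
#   -psi(alpha) + log(alpha) - log(mean(x)) + mean(log(x)) = 0
def score(alpha):
    return (-digamma(alpha) + np.log(alpha)
            - np.log(x.mean()) + np.log(x).mean())

alpha_hat = brentq(score, 1e-6, 1e3)  # root-find on a wide bracket
beta_hat = x.mean() / alpha_hat
print(alpha_hat, beta_hat)
```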
Next topic: Numerical methods for finding the MLE
(Chapter 2.4).