Download Brief introduction to maximum likelihood estimation

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Statistics wikipedia , lookup

Foundations of statistics wikipedia , lookup

Transcript
MLE.1
Brief introduction to maximum likelihood estimation
Suppose the success or failure of a field goal in football
can be modeled with a Bernoulli() distribution. Let Y = 0
if the field goal is a failure and Y = 1 if the field goal is a
success. Then the probability distribution for Y is:
P(Y=y) = y (1  )1 y
where  denotes the probability of success.
Suppose we would like to estimate  for a 40 yard field
goal. Let y1, …, yn denote a random sample of observed
field goal results at 40 yards. Thus, these yi’s are either
0’s or 1’s. Given the resulting data (y1, …, yn), the
“likelihood function” measures the plausibility of different
values of :
L(  | y1,...,yn )
 P(Y1  y1 )  P(Y2  y2 ) 
n
  P(Yi  yi )
i1
n
  yi (1  )1yi
i1
 i1 yi (1  )ni1yi
n
n
 P(Yn  yn )
MLE.2
Suppose ni1 yi = 4 and n = 10. Given this observed
information, we would like to find the corresponding
parameter value for  that produces the largest
probability of obtaining this particular sample. The
following table can be formed to help find this
parameter value:

0.2
0.3
0.35
0.39
0.4
0.41
0.5
L(  | y1,...,yn )
0.000419
0.000953
0.001132
0.001192
0.001194
0.001192
0.000977
Calculations in R (LikelihoodFunction.r):
>
>
>
>
>
1
2
3
4
5
6
7
sum.y<-4
n<-10
pi<-c(0.2, 0.3, 0.35, 0.39, 0.4, 0.41, 0.5)
Lik<-pi^sum.y*(1-pi)^(n-sum.y)
data.frame(pi, Lik)
pi
Lik
0.20 0.0004194304
0.30 0.0009529569
0.35 0.0011317547
0.39 0.0011918935
0.40 0.0011943936
0.41 0.0011919211
0.50 0.0009765625
> #Likelihood function plot
MLE.3
0.0000 0.0002 0.0004 0.0006 0.0008 0.0010 0.0012
Likelihood function
> curve(expr = x^sum.y*(1-x)^(n-sum.y), from = 0, to =
1, xlab = expression(pi), ylab = "Likelihood
function")
0.0
0.2
0.4
0.6
0.8
1.0
Note that  = 0.4 is the “most plausible” value of  for the
observed data because this maximizes the likelihood
function. Therefore, 0.4 is the maximum likelihood
estimate (MLE).
In general, the MLE can be found as follows:
MLE.4
1. Find the natural log of the likelihood function,
logL(  | y1,...,yn )
2. Take the derivative of logL(  | y1,...,yn ) with respect
to .
3. Set the derivative equal to 0 and solve for  to find
the MLE. Note that the solution is the maximum of
L(  | y1,...,yn ) provided certain “regularity” conditions
hold (see Mood, Graybill, Boes, 1974).
For the field goal example:
log L(  | y1,...,yn )  log  i1 yi (1  )ni1 yi 
n
n
n
n
i1
i1
  yi log( )  (n   yi )log(1  )
where log means natural log.
n
n
yi n   yi
 logL(  | y1,...,yn ) 
i1
Then
 i1 
0


1 
n

 yi
i 1

n

n   yi
i 1
1 
n
1 



n   yi
i 1
n
 yi
i 1
MLE.5
n
1
 

n
n   yi   y i
i 1
n
i 1
 yi
i 1
n

 yi
i 1
n
Therefore, the maximum likelihood estimator of  is the
proportion of field goals made. To avoid confusion
between a parameter and a statistic, one denotes the
n
estimator as ̂ =  yi /n.
i1
Example: MLEs for  and  from a normal distribution
Let X1,…,Xn be a random sample from a N(,2)
distribution. Find the MLE of  and 2. Remember that
the normal distribution is:
f(xi ) 
1
2

e
( Xi  )2
2 2
Then the likelihood function is:
MLE.6
L(,  | x1,...,xn )
n

i 1
1
2

( Xi  )2
e
2 2
1
 2  ( Xi  )2
1

e 2
n/2 n
(2) 
Taking the log of the likelihood function produces:
log[L(,  | x1,...,xn )]
n
1
  log(2)  nlog()  2 (Xi  )2
2
2
To find the maximum likelihood estimate of , take the
derivative with respect to  and set equal to 0.
 log[L(,  | x1,...,xn )]
1
 2 2(Xi  )  0

2
Solving for  produces:
(Xi  )  0
1
 ˆ  Xi  X
n
MLE.7
To find the maximum likelihood estimate of , take the
derivative with respect to  and set equal to 0.
log[L(,  | x1,...,xn )]  
n 2
 3 (Xi  )2  0
 2
Solving for 2 produces:
n
1
 3 (Xi   )2
 
(Xi   )2
2
 
n
 ˆ 
(Xi  X)2
n
Likelihood Ratio Test (LRT)
The LRT is a general way to test hypotheses. The LRT
statistic, , is the ratio of two likelihood functions. The
numerator is the likelihood function maximized over the
parameter space restricted under the null hypothesis.
The denominator is the likelihood function maximized
over the unrestricted parameter space. The test statistic
is written as:
MLE.8

Max. lik. when parameters satisfy Ho
Max. lik. when parameters satisfy Ho or Ha
Wilks (1935, 1938) shows that –2log() can be
approximated by a u2 for a large sample and under Ho
where u is the difference in dimension between the
alternative and null hypothesis parameter spaces. See
Casella and Berger (2002, p. 374) for more background
on the LRT if this is of interest.
Example: Continuing the field goal example; suppose the
hypothesis test Ho:=0.5 vs. Ha:0.5 is of interest.
The numerator of  is the maximum possible value of
the likelihood function under the null hypothesis.
Because  = 0.5 is the null hypothesis, the maximum
can be found by just substituting  = 0.5 in the likelihood
function:
L(   0.5 | y1,...,yn )  0.5yi (1  0.5)nyi
Then
L(   0.5 | y1,...,yn )  0.54 (0.5)104  0.0009766
The denominator of  is the maximum possible value of
the likelihood function under the null OR alternative
MLE.9
hypotheses. Because this includes all possible values of
 here, the maximum is achieved when the MLE is
substituted for  in the likelihood function! As shown
previously, the maximum value is 0.001194.
Therefore,
Max. lik. when parameters satisfy Ho
Max. lik. when parameters satisfy Ho or Ha
0.0009766

 0.8179
0.001194

Then –2log() = -2log(0.8179) = 0.4020 is the test
2
statistic value. The critical value is 1,0.95
= 3.84 using  =
0.05:
> qchisq(p = 0.95, df = 1)
[1] 3.841459
There is not sufficient evidence to reject the hypothesis
that  = 0.5.
In general for any level , the chi-square distribution and
the critical value plotted are
MLE.10
Questions:
 Suppose the ratio is close to 0, what does this say about
Ho? Explain.
 Suppose the ratio is close to 1, what does this say about
Ho? Explain.