EC 331
The Theory and Applications of the Maximum Likelihood Method
Burak Saltoğlu
Outline

• Maximum likelihood principle
• Estimating population parameters via the ML method
• Properties of ML
• OLS vs ML
1 Maximum Likelihood

The ML method is based on the principle that parameter estimates can be obtained by maximizing the likelihood that the selected sample reflects the population. We choose the parameters so as to maximize the joint likelihood of representing the population.

Suppose we are given an i.i.d. observed sample

$$ x = (x_1, x_2, x_3, \ldots, x_n) $$

and a parameter vector (of dimension $k$)

$$ \theta' = (\theta_1, \theta_2, \theta_3, \ldots, \theta_k). $$

Then $f(x_1, x_2, x_3, \ldots, x_n \mid \theta)$ represents the joint density of the $x$'s given the parameter vector $\theta$.
Likelihood Function

The joint likelihood function can then be written as the joint probability of observing the $x$'s drawn from $f(\cdot)$:

$$ f(x_1, x_2, x_3, \ldots, x_n \mid \theta) = f(x_1 \mid \theta) \, f(x_2 \mid \theta) \cdots f(x_n \mid \theta) $$

$$ f(x_1, x_2, x_3, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) = L(\theta \mid x) $$

Maximizing this function with respect to $\theta$ yields the particular value $\hat\theta$ that maximizes the probability of obtaining the sample values actually observed:

$$ \hat\theta = \text{Maximum Likelihood Estimator of } \theta. $$

In most applications it is convenient to work with the log-likelihood function, which is
Likelihood Function

$$ \ell = \ln L(x; \theta), \qquad \ln L(x; \theta) = \sum_{i=1}^{n} \ln f(x_i \mid \theta). $$

Note that

$$ \frac{\partial \ell}{\partial \theta} = \frac{1}{L} \frac{\partial L}{\partial \theta}, $$

which is known as the score.
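As a quick numerical check of this identity, here is a minimal sketch in Python (assuming NumPy; the single-observation N(θ, 1) toy likelihood and the step size are illustrative choices, not from the slides), comparing a finite-difference derivative of ln L with (1/L)·∂L/∂θ:

```python
import numpy as np

# Toy likelihood: one observation x from N(theta, 1). Illustrative only.
x, theta, h = 1.3, 0.7, 1e-6

def L(t):
    # Normal(t, 1) density evaluated at the observation x
    return np.exp(-0.5 * (x - t) ** 2) / np.sqrt(2.0 * np.pi)

# Central finite differences for d(ln L)/d(theta) and dL/d(theta)
d_logL = (np.log(L(theta + h)) - np.log(L(theta - h))) / (2 * h)
d_L = (L(theta + h) - L(theta - h)) / (2 * h)

print(d_logL, d_L / L(theta))  # both ~ x - theta = 0.6: the score
```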
Example-1: the Poisson distribution (due to Siméon Denis Poisson)

The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time, when these events occur at a known average rate and independently of the time since the last event. Typical uses: defaults of countries, customer arrivals.

Let $x_1, x_2, x_3, \ldots, x_n$ be i.i.d., where each $x_i$ follows a Poisson distribution:

$$ f(x_i \mid \lambda) = \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} $$

Find $\hat\theta = \hat\lambda$.

Solution:

$$ f(x_1, x_2, x_3, \ldots, x_n \mid \lambda) = \prod_{i=1}^{n} f(x_i \mid \lambda) = L(\lambda \mid x) $$
Example-1

$$ L = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{x_1} e^{-\lambda}}{x_1!} \cdot \frac{\lambda^{x_2} e^{-\lambda}}{x_2!} \cdots \frac{\lambda^{x_n} e^{-\lambda}}{x_n!} = \frac{\lambda^{x_1 + x_2 + \cdots + x_n} \, e^{-n\lambda}}{x_1! \, x_2! \cdots x_n!} $$

$$ \ln L = \ln\lambda \sum_{i=1}^{n} x_i - n\lambda - \sum_{i=1}^{n} \ln x_i! $$
Example-1

$$ \ln L = \ell(\lambda) = \sum_{i=1}^{n} x_i \ln\lambda - n\lambda - \sum_{i=1}^{n} \ln x_i! $$

$$ \frac{\partial \ell}{\partial \lambda} = \frac{1}{\lambda} \sum_{i=1}^{n} x_i - n = 0 $$

$$ \hat\lambda_{MLE} = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x} $$
Numerical example

Let $x_1, x_2, x_3, \ldots, x_n$ be i.i.d., where each $x_i$ follows a Poisson distribution, and suppose we observe

$$ x_1, x_2, x_3, \ldots, x_{10} = \{5, 0, 1, 1, 0, 3, 2, 3, 4, 1\}. $$

$$ f(x_1, x_2, x_3, \ldots, x_n \mid \lambda) = \prod_{i=1}^{n} f(x_i \mid \lambda) = L(\lambda \mid x); \quad \text{find } \hat\theta = \hat\lambda. $$

$$ L = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{5} e^{-\lambda}}{5!} \cdot \frac{\lambda^{0} e^{-\lambda}}{0!} \cdots \frac{\lambda^{1} e^{-\lambda}}{1!} = \frac{\lambda^{5 + 0 + \cdots + 1} \, e^{-10\lambda}}{5! \, 0! \cdots 1!} = \frac{\lambda^{20} e^{-10\lambda}}{207{,}360} $$
Numerical example

$$ \ln L = \ln\lambda \sum_{i=1}^{n} x_i - n\lambda - \sum_{i=1}^{n} \ln x_i! $$

$$ \frac{\partial \ln L(\lambda \mid x)}{\partial \lambda} = \frac{20}{\lambda} - 10 = 0 \;\Rightarrow\; \hat\lambda_{MLE} = 2 $$

$$ \frac{\partial^2 \ln L(\lambda \mid x)}{\partial \lambda^2} = -\frac{20}{\lambda^2} < 0, $$

which implies this is a maximum.

$$ \ln L = 20\ln(2) - 10 \cdot 2 - 12.242 \approx -18.379 $$
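The slide's numbers can be reproduced with a short script. Below is a minimal sketch in Python (assuming NumPy; lgamma(x+1) = ln x! handles the factorial term):

```python
import numpy as np
from math import lgamma, log

x = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])  # observed sample

def poisson_loglik(lam, x):
    # ln L = ln(lam) * sum(x_i) - n*lam - sum(ln x_i!)
    return log(lam) * x.sum() - len(x) * lam - sum(lgamma(xi + 1) for xi in x)

lam_hat = x.mean()                  # closed-form MLE: the sample mean
print(lam_hat)                      # 2.0
print(poisson_loglik(lam_hat, x))   # ~ -18.379, matching the slide
print(-x.sum() / lam_hat ** 2)      # second derivative -20/lam^2 = -5 < 0: a maximum
```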
[Figure: log-likelihood profile for the Poisson example, rescaled by +25 ($\lambda$ on the horizontal axis).]

[Figure: likelihood profile for the Poisson example ($\lambda$ on the horizontal axis; likelihood values rescaled by a power of ten).]

[Figure: likelihood and log-likelihood for the Poisson example on a common rescaled graph.]
Example-2

The exponential distribution describes the time between events in a Poisson process.

Let $x_1, x_2, x_3, \ldots, x_n$ be i.i.d., where each $x_i$ follows an exponential distribution:

$$ f(x_i \mid \lambda) = \lambda e^{-\lambda x_i} $$

Find $\hat\theta = \hat\lambda$, where

$$ L = \prod_{i=1}^{n} f(x_i). $$
Example-2

$$ L = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda e^{-\lambda x_1} \cdot \lambda e^{-\lambda x_2} \cdots \lambda e^{-\lambda x_n} = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_i} $$

$$ \ln L = \ell(\lambda) = n \ln(\lambda) - \lambda \sum_{i=1}^{n} x_i $$
Example-2

$$ \max_{\lambda} \; \ell(\lambda) = n \ln\lambda - \lambda \sum_{i=1}^{n} x_i $$

$$ \frac{\partial \ell}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0 $$

$$ \hat\lambda = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{n}{n\bar{x}} = \frac{1}{\bar{x}} $$
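A minimal simulation check of this result (Python sketch; the true rate λ = 0.5 and the sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true = 0.5
x = rng.exponential(scale=1 / lam_true, size=10_000)  # NumPy uses scale = 1/lambda

lam_hat = 1 / x.mean()    # MLE derived above: lambda_hat = 1 / x_bar
print(lam_hat)            # ~ 0.5

# Cross-check: the same value maximizes ln L = n ln(lambda) - lambda * sum(x) on a grid
grid = np.linspace(0.1, 1.0, 901)
loglik = len(x) * np.log(grid) - grid * x.sum()
print(grid[loglik.argmax()])  # ~ 0.5 as well
```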
Example-3

Let $x_1, x_2, x_3, \ldots, x_n$ be i.i.d., where each $x_i$ follows a geometric distribution. With

$$ L = \prod_{i=1}^{n} f(x_i), \quad \text{find } \hat\theta = \hat{p}. $$
Example-3 (continued)

$$ L = \prod_{i=1}^{n} p \, (1-p)^{x_i - 1} $$

$$ \max_{p} \; \ln L = n \ln p + \sum_{i=1}^{n} (x_i - 1) \ln(1-p) $$

$$ \frac{\partial \ln L}{\partial p} = \frac{n}{p} - \frac{\sum_{i=1}^{n} (x_i - 1)}{1-p} = \frac{n}{p} - \frac{n\bar{x} - n}{1-p} = 0 $$

$$ \hat{p} = \frac{1}{\bar{x}} $$
Convergence in Probability

Definition: Let $x_n$ be a sequence of random variables, where $n$ is the sample size. The random variable $x_n$ converges in probability to a constant $c$ if

$$ \lim_{n \to \infty} \operatorname{Prob}(|x_n - c| > \varepsilon) = 0 $$

for any positive $\varepsilon$.

• Values of $x_n$ that are not close to $c$ become increasingly unlikely as $n$ increases.
• All the mass of the probability distribution concentrates around $c$.
• If $x_n$ converges in probability to $c$, we write $\operatorname{plim} x_n = c$.
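The definition can be illustrated by simulation. The sketch below (Python; the Uniform(0, 1) population and ε = 0.05 are illustrative choices) estimates Prob(|x̄_n − c| > ε) for the sample mean, whose plim is c = 0.5:

```python
import numpy as np

rng = np.random.default_rng(2)
c, eps, reps = 0.5, 0.05, 1_000

for n in (10, 100, 1_000, 5_000):
    # Monte Carlo estimate of Prob(|x_bar_n - c| > eps) over many replications
    means = rng.uniform(size=(reps, n)).mean(axis=1)
    print(n, (np.abs(means - c) > eps).mean())  # shrinks toward 0 as n grows
```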
Properties of MLE

Consistency:

$$ \operatorname{plim}(\hat\theta) = \theta $$

Asymptotic normality:

$$ \hat\theta \overset{a}{\sim} N(\theta, I^{-1}(\theta)), $$

where the information matrix is

$$ I(\theta) = E\!\left[ \left(\frac{\partial \ell}{\partial \theta}\right) \left(\frac{\partial \ell}{\partial \theta}\right)' \right] = -E\!\left[ \frac{\partial^2 \ell}{\partial \theta \, \partial \theta'} \right] = -E \begin{bmatrix} \dfrac{\partial^2 \ell}{\partial \theta_1^2} & \cdots & \dfrac{\partial^2 \ell}{\partial \theta_1 \, \partial \theta_k} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 \ell}{\partial \theta_k \, \partial \theta_1} & \cdots & \dfrac{\partial^2 \ell}{\partial \theta_k^2} \end{bmatrix}, $$

that is, minus the expectation of the Hessian of the log-likelihood function.
3.3 Properties of MLE

Asymptotic efficiency:

Assuming that we are dealing with only one parameter $\theta$,

$$ \sqrt{n}\,(\hat\theta - \theta) \overset{d}{\to} N(0, \sigma^2), $$

which states that if there is another consistent and asymptotically normal estimator of $\theta$, say $\tilde\theta$, then $\sqrt{n}\,(\tilde\theta - \theta)$ has a limiting distribution with variance greater than or equal to $\sigma^2$.

Invariance:

If $\hat\theta$ is the MLE of $\theta$ and $g(\cdot)$ is a continuous function of $\theta$, then $g(\hat\theta)$ is the MLE of $g(\theta)$.
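Invariance can be seen directly in the Poisson example: since λ̂ = x̄ = 2, the MLE of g(λ) = e^(−λ) = P(X = 0) is simply e^(−λ̂). A sketch (Python; the grid search is only there to confirm the closed-form maximizer):

```python
import numpy as np
from math import lgamma

x = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])   # data from the Poisson example

def loglik(lam):
    return np.log(lam) * x.sum() - len(x) * lam - sum(lgamma(v + 1) for v in x)

grid = np.linspace(0.5, 4.0, 3_501)
lam_hat = grid[np.argmax(loglik(grid))]   # numerical MLE of lambda, ~ 2.0

# By invariance, the MLE of g(lambda) = exp(-lambda) is g(lam_hat):
print(np.exp(-lam_hat))   # ~ exp(-2) ~ 0.135
```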
4 Estimation of the Linear Regression Model

$$ Y_i = \beta_1 + \beta_2 X_i + u_i $$

Gaussian density:

$$ f(Y_i \mid X_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left[ -\frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{2\sigma^2} \right] $$

$$ L(\beta, \sigma^2) = \prod_{i=1}^{n} f(Y_i \mid X_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left[ -\frac{(Y_1 - \beta_1 - \beta_2 X_1)^2}{2\sigma^2} \right] \cdots \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left[ -\frac{(Y_n - \beta_1 - \beta_2 X_n)^2}{2\sigma^2} \right] $$

$$ L(\beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left[ -\frac{\sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i)^2}{2\sigma^2} \right] $$

$$ \ln L(\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i)^2 $$
First-order conditions:

$$ \frac{\partial \ell}{\partial \beta_1} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i) $$

$$ \frac{\partial \ell}{\partial \beta_2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i) X_i $$

$$ \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i)^2 $$
Matrix notation

$$ Y = X\beta + u $$

$$ f(Y \mid X) = \frac{1}{(2\pi\sigma^2)^{n/2}} \, e^{-(1/2\sigma^2)(u'u)} $$

$$ \ln L = \ell = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2} (Y - X\beta)'(Y - X\beta) $$
3.4 Estimation of the Linear Regression Model

The parameter vector is

$$ \theta' = (\beta', \sigma^2) $$

$$ \frac{\partial \ell}{\partial \beta} = \frac{1}{\sigma^2} \left( X'y - X'X\beta \right) $$

and

$$ \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} u'u. $$

Setting these to zero yields

$$ \hat\beta_{MLE} = (X'X)^{-1} X'y, \qquad \hat\sigma^2_{MLE} = \frac{\hat{u}'\hat{u}}{n}. $$
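A minimal sketch of these closed-form ML estimates on simulated data (Python; the design matrix, true parameters, and seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])         # intercept + one regressor
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)  # true sigma = 0.5

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / n                 # ML estimate u'u / n

print(beta_hat)     # ~ [1.0, 2.0]
print(sigma2_hat)   # ~ 0.25
```

Note that the ML variance estimate divides by n rather than n − k, so it is biased in finite samples but consistent.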
3.4 Estimation of the Linear Regression Model

To calculate the variance matrix of the parameters, we need the Hessian of the log-likelihood. Taking the second derivatives:

$$ \frac{\partial^2 \ell}{\partial \beta \, \partial \beta'} = -\frac{X'X}{\sigma^2} $$

$$ \frac{\partial^2 \ell}{\partial \beta \, \partial \sigma^2} = -\frac{1}{\sigma^4}(X'y - X'X\beta) = -\frac{X'u}{\sigma^4} $$

$$ \frac{\partial^2 \ell}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{u'u}{\sigma^6} $$

Taking expectations:

$$ -E\!\left( \frac{\partial^2 \ell}{\partial \beta \, \partial \beta'} \right) = \frac{X'X}{\sigma^2}, \qquad -E\!\left( \frac{\partial^2 \ell}{\partial \beta \, \partial \sigma^2} \right) = E\!\left( \frac{X'u}{\sigma^4} \right) = 0, \qquad -E\!\left( \frac{\partial^2 \ell}{\partial (\sigma^2)^2} \right) = \frac{n}{2\sigma^4}. $$
3.4 Estimation of the Linear Regression Model

$$ \frac{\partial^2 \ell}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{u'u}{\sigma^6} $$

Since $\sigma^2 = E(u'u)/n$, we have $E(u'u) = \sigma^2 n$, so

$$ E\!\left( \frac{\partial^2 \ell}{\partial (\sigma^2)^2} \right) = \frac{n}{2\sigma^4} - \frac{\sigma^2 n}{\sigma^6} = \frac{n}{2\sigma^4} - \frac{n}{\sigma^4} = -\frac{n}{2\sigma^4} $$

$$ -E\!\left( \frac{\partial^2 \ell}{\partial (\sigma^2)^2} \right) = \frac{n}{2\sigma^4} $$

Collecting the results:

$$ -E\!\left( \frac{\partial^2 \ell}{\partial \beta \, \partial \beta'} \right) = \frac{X'X}{\sigma^2}, \qquad -E\!\left( \frac{\partial^2 \ell}{\partial \beta \, \partial \sigma^2} \right) = 0, \qquad -E\!\left( \frac{\partial^2 \ell}{\partial (\sigma^2)^2} \right) = \frac{n}{2\sigma^4}. $$
3.4 Estimation of the Linear Regression Model

So the information matrix is

$$ I(\theta) = -E \begin{bmatrix} \dfrac{\partial^2 \ell}{\partial \beta \, \partial \beta'} & \dfrac{\partial^2 \ell}{\partial \beta \, \partial \sigma^2} \\[4pt] \dfrac{\partial^2 \ell}{\partial \sigma^2 \, \partial \beta'} & \dfrac{\partial^2 \ell}{\partial (\sigma^2)^2} \end{bmatrix} = \begin{bmatrix} \dfrac{X'X}{\sigma^2} & 0 \\[4pt] 0 & \dfrac{n}{2\sigma^4} \end{bmatrix} $$

The inverse of the information matrix gives us the variance-covariance matrix of the ML estimators:

$$ I^{-1}(\theta) = \begin{bmatrix} \sigma^2 (X'X)^{-1} & 0 \\[4pt] 0 & \dfrac{2\sigma^4}{n} \end{bmatrix} $$
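Using the same simulated setup as the earlier regression sketch, estimated standard errors follow by plugging σ̂² into I⁻¹(θ) (Python; illustrative and self-contained):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / n

# Upper-left block of I^{-1}(theta): sigma^2 (X'X)^{-1}
var_beta = sigma2_hat * np.linalg.inv(X.T @ X)
print(np.sqrt(np.diag(var_beta)))   # asymptotic standard errors of beta_hat

# Lower-right block: 2 sigma^4 / n, the asymptotic variance of sigma2_hat
print(2 * sigma2_hat ** 2 / n)
```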
Testing in the Maximum Likelihood Framework

$$ H_0: R\beta = r $$

where $R$ is a $q \times k$ matrix of known constants with $q < k$, and $r$ is a $q$-vector of known constants.
[Figure: log-likelihood profile from the Poisson example ($\lambda$ on the horizontal axis), illustrating the Likelihood Ratio and Lagrange multiplier tests.]
Example

In our Poisson example,

$$ H_0: \lambda = 1.8, \qquad \hat\lambda = 2.0. $$

The likelihood ratio is

$$ \frac{L(\tilde\lambda)}{L(\hat\lambda)}, $$

where $\tilde\lambda$ is the restricted and $\hat\lambda$ the unrestricted estimate. This ratio is always between 0 and 1, and the less likely the null hypothesis is, the smaller the ratio.
Likelihood Ratio Test

If we want to test

$$ H_0: R\beta = r, $$

the likelihood ratio, defined as

$$ \lambda = \frac{L(\tilde\beta; \tilde\sigma^2)}{L(\hat\beta; \hat\sigma^2)} \quad \text{(restricted over unrestricted)}, $$

can be used with the decision rule: reject $H_0$ if $-2\ln\lambda > \chi^2_q$, where $q$ is the number of restrictions.
Likelihood Ratio Test

$$ LR = -2\ln\lambda = -2\ln\!\left[ \frac{L(1.8)}{L(2)} \right] = -2\ln\!\left[ \frac{0.0936}{0.104} \right] = 0.2144 $$

Since $0.2144 < 3.84$ (the 5% critical value of $\chi^2_1$), we don't reject the null.
More on the LR test in the context of Linear Regression
Likelihood Ratio Test

$$ H_0: R\beta = r, \qquad \lambda = \frac{L(\tilde\beta; \tilde\sigma^2)}{L(\hat\beta; \hat\sigma^2)} $$

Remember that the unrestricted maximized likelihood can be written as

$$ L(\hat\beta; \hat\sigma^2) = k \, (\hat{u}'\hat{u})^{-n/2}, $$

so the restricted model's likelihood can be written similarly:

$$ L(\tilde\beta; \tilde\sigma^2) = k \, (\tilde{u}'\tilde{u})^{-n/2}. $$

The likelihood ratio can then be written as

$$ \lambda = \frac{k \, (\tilde{u}'\tilde{u})^{-n/2}}{k \, (\hat{u}'\hat{u})^{-n/2}} $$

$$ LR = -2\ln\lambda = -2\left[ -\frac{n}{2}\ln(\tilde{u}'\tilde{u}) + \frac{n}{2}\ln(\hat{u}'\hat{u}) \right] $$

$$ LR = n\left( \ln(\tilde{u}'\tilde{u}) - \ln(\hat{u}'\hat{u}) \right) \sim \chi^2_q $$

Reject $H_0$ if $LR > \chi^2_q$.
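A sketch of this residual-based LR formula (Python; the simulated data and the single restriction β₂ = 0 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.3]) + rng.normal(size=n)

def ssr(X, y):
    # Sum of squared residuals from the least-squares (ML) fit
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b
    return u @ u

ssr_unres = ssr(X, y)          # unrestricted: intercept + slope
ssr_res = ssr(X[:, :1], y)     # restricted under H0: beta_2 = 0 (intercept only)

LR = n * (np.log(ssr_res) - np.log(ssr_unres))   # n[ln(u~'u~) - ln(u^'u^)]
print(LR, LR > 3.84)           # reject H0 at 5% if LR exceeds the chi2_1 critical value
```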