EC 331
The Theory and Applications of the Maximum Likelihood Method
Burak Saltoğlu
Outline

• Maximum likelihood principle
• Estimating population parameters via the ML method
• Properties of ML
• OLS vs ML
1 Maximum Likelihood

The ML method is based on the principle that parameter estimates can be obtained by maximizing the likelihood that the selected sample reflects the population. We choose the parameters so as to maximize the joint likelihood of representing the population.

Suppose we are given an i.i.d. observed sample

$$ x = (x_1, x_2, x_3, \ldots, x_n) $$

and a parameter vector (of dimension $k$)

$$ \theta' = (\theta_1, \theta_2, \theta_3, \ldots, \theta_k). $$

Then $f(x_1, x_2, x_3, \ldots, x_n \mid \theta)$ represents the joint density of the $x$'s given the parameter vector $\theta$.
Likelihood Function

The joint likelihood function can then be written as the joint probability of observing the $x$'s drawn from $f(\cdot)$:

$$ f(x_1, x_2, x_3, \ldots, x_n \mid \theta) = f(x_1 \mid \theta) \, f(x_2 \mid \theta) \cdots f(x_n \mid \theta) $$

$$ f(x_1, x_2, x_3, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) = L(\theta \mid x) $$

Maximizing this function with respect to $\theta$ yields the particular value $\hat\theta$ that maximizes the probability of obtaining the sample values actually observed:

$$ \hat\theta = \text{Maximum Likelihood Estimator of } \theta. $$

In most applications it is convenient to work with the log-likelihood function, which is
Likelihood Function

$$ \ell = \ln L(x; \theta), \qquad \ln L(x; \theta) = \sum_{i=1}^{n} \ln f(x_i \mid \theta). $$

Note that

$$ \frac{\partial \ell}{\partial \theta} = \frac{1}{L} \frac{\partial L}{\partial \theta}, $$

which is known as the score.
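As a quick numerical check of this identity, here is a minimal sketch in Python (assuming NumPy; the single-observation N(θ, 1) toy likelihood and the step size are illustrative choices, not from the slides), comparing a finite-difference derivative of ln L with (1/L)·∂L/∂θ:

```python
import numpy as np

# Toy likelihood: one observation x from N(theta, 1). Illustrative only.
x, theta, h = 1.3, 0.7, 1e-6

def L(t):
    # Normal(t, 1) density evaluated at the observation x
    return np.exp(-0.5 * (x - t) ** 2) / np.sqrt(2.0 * np.pi)

# Central finite differences for d(ln L)/d(theta) and dL/d(theta)
d_logL = (np.log(L(theta + h)) - np.log(L(theta - h))) / (2 * h)
d_L = (L(theta + h) - L(theta - h)) / (2 * h)

print(d_logL, d_L / L(theta))  # both ~ x - theta = 0.6: the score
```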
Example-1: the Poisson distribution (due to Siméon Denis Poisson)

The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time, when these events occur at a known average rate and independently of the time since the last event. Typical uses: defaults of countries, customer arrivals.

Let $x_1, x_2, x_3, \ldots, x_n$ be i.i.d., where each $x_i$ follows a Poisson distribution:

$$ f(x_i \mid \lambda) = \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} $$

Find $\hat\theta = \hat\lambda$.

Solution:

$$ f(x_1, x_2, x_3, \ldots, x_n \mid \lambda) = \prod_{i=1}^{n} f(x_i \mid \lambda) = L(\lambda \mid x) $$
Example-1

$$ L = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{x_1} e^{-\lambda}}{x_1!} \cdot \frac{\lambda^{x_2} e^{-\lambda}}{x_2!} \cdots \frac{\lambda^{x_n} e^{-\lambda}}{x_n!} = \frac{\lambda^{x_1 + x_2 + \cdots + x_n} \, e^{-n\lambda}}{x_1! \, x_2! \cdots x_n!} $$

$$ \ln L = \ln\lambda \sum_{i=1}^{n} x_i - n\lambda - \sum_{i=1}^{n} \ln x_i! $$
Example-1

$$ \ln L = \ell(\lambda) = \sum_{i=1}^{n} x_i \ln\lambda - n\lambda - \sum_{i=1}^{n} \ln x_i! $$

$$ \frac{\partial \ell}{\partial \lambda} = \frac{1}{\lambda} \sum_{i=1}^{n} x_i - n = 0 $$

$$ \hat\lambda_{MLE} = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x} $$
Numerical example

Let $x_1, x_2, x_3, \ldots, x_n$ be i.i.d., where each $x_i$ follows a Poisson distribution, and suppose we observe

$$ x_1, x_2, x_3, \ldots, x_{10} = \{5, 0, 1, 1, 0, 3, 2, 3, 4, 1\}. $$

$$ f(x_1, x_2, x_3, \ldots, x_n \mid \lambda) = \prod_{i=1}^{n} f(x_i \mid \lambda) = L(\lambda \mid x); \quad \text{find } \hat\theta = \hat\lambda. $$

$$ L = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{5} e^{-\lambda}}{5!} \cdot \frac{\lambda^{0} e^{-\lambda}}{0!} \cdots \frac{\lambda^{1} e^{-\lambda}}{1!} = \frac{\lambda^{5 + 0 + \cdots + 1} \, e^{-10\lambda}}{5! \, 0! \cdots 1!} = \frac{\lambda^{20} e^{-10\lambda}}{207{,}360} $$
Numerical example

$$ \ln L = \ln\lambda \sum_{i=1}^{n} x_i - n\lambda - \sum_{i=1}^{n} \ln x_i! $$

$$ \frac{\partial \ln L(\lambda \mid x)}{\partial \lambda} = \frac{20}{\lambda} - 10 = 0 \;\Rightarrow\; \hat\lambda_{MLE} = 2 $$

$$ \frac{\partial^2 \ln L(\lambda \mid x)}{\partial \lambda^2} = -\frac{20}{\lambda^2} < 0, $$

which implies this is a maximum.

$$ \ln L = 20\ln(2) - 10 \cdot 2 - 12.242 \approx -18.379 $$
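The slide's numbers can be reproduced with a short script. Below is a minimal sketch in Python (assuming NumPy; lgamma(x+1) = ln x! handles the factorial term):

```python
import numpy as np
from math import lgamma, log

x = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])  # observed sample

def poisson_loglik(lam, x):
    # ln L = ln(lam) * sum(x_i) - n*lam - sum(ln x_i!)
    return log(lam) * x.sum() - len(x) * lam - sum(lgamma(xi + 1) for xi in x)

lam_hat = x.mean()                  # closed-form MLE: the sample mean
print(lam_hat)                      # 2.0
print(poisson_loglik(lam_hat, x))   # ~ -18.379, matching the slide
print(-x.sum() / lam_hat ** 2)      # second derivative -20/lam^2 = -5 < 0: a maximum
```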
[Figure: log-likelihood profile for the Poisson example, rescaled by +25 ($\lambda$ on the horizontal axis).]

[Figure: likelihood profile for the Poisson example ($\lambda$ on the horizontal axis; likelihood values rescaled by a power of ten).]

[Figure: likelihood and log-likelihood for the Poisson example on a common rescaled graph.]
Example-2

The exponential distribution describes the time between events in a Poisson process.

Let $x_1, x_2, x_3, \ldots, x_n$ be i.i.d., where each $x_i$ follows an exponential distribution:

$$ f(x_i \mid \lambda) = \lambda e^{-\lambda x_i} $$

Find $\hat\theta = \hat\lambda$, where

$$ L = \prod_{i=1}^{n} f(x_i). $$
Example-2

$$ L = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda e^{-\lambda x_1} \cdot \lambda e^{-\lambda x_2} \cdots \lambda e^{-\lambda x_n} = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_i} $$

$$ \ln L = \ell(\lambda) = n \ln(\lambda) - \lambda \sum_{i=1}^{n} x_i $$
Example-2

$$ \max_{\lambda} \; \ell(\lambda) = n \ln\lambda - \lambda \sum_{i=1}^{n} x_i $$

$$ \frac{\partial \ell}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0 $$

$$ \hat\lambda = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{n}{n\bar{x}} = \frac{1}{\bar{x}} $$
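A minimal simulation check of this result (Python sketch; the true rate λ = 0.5 and the sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true = 0.5
x = rng.exponential(scale=1 / lam_true, size=10_000)  # NumPy uses scale = 1/lambda

lam_hat = 1 / x.mean()    # MLE derived above: lambda_hat = 1 / x_bar
print(lam_hat)            # ~ 0.5

# Cross-check: the same value maximizes ln L = n ln(lambda) - lambda * sum(x) on a grid
grid = np.linspace(0.1, 1.0, 901)
loglik = len(x) * np.log(grid) - grid * x.sum()
print(grid[loglik.argmax()])  # ~ 0.5 as well
```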
Example-3

Let $x_1, x_2, x_3, \ldots, x_n$ be i.i.d., where each $x_i$ follows a geometric distribution. With

$$ L = \prod_{i=1}^{n} f(x_i), \quad \text{find } \hat\theta = \hat{p}. $$
Example-3 (continued)

$$ L = \prod_{i=1}^{n} p \, (1-p)^{x_i - 1} $$

$$ \max_{p} \; \ln L = n \ln p + \sum_{i=1}^{n} (x_i - 1) \ln(1-p) $$

$$ \frac{\partial \ln L}{\partial p} = \frac{n}{p} - \frac{\sum_{i=1}^{n} (x_i - 1)}{1-p} = \frac{n}{p} - \frac{n\bar{x} - n}{1-p} = 0 $$

$$ \hat{p} = \frac{1}{\bar{x}} $$
Convergence in Probability

Definition: Let $x_n$ be a sequence of random variables, where $n$ is the sample size. The random variable $x_n$ converges in probability to a constant $c$ if

$$ \lim_{n \to \infty} \operatorname{Prob}(|x_n - c| > \varepsilon) = 0 $$

for any positive $\varepsilon$.

• Values of $x_n$ that are not close to $c$ become increasingly unlikely as $n$ increases.
• All the mass of the probability distribution concentrates around $c$.
• If $x_n$ converges in probability to $c$, we write $\operatorname{plim} x_n = c$.
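The definition can be illustrated by simulation. The sketch below (Python; the Uniform(0, 1) population and ε = 0.05 are illustrative choices) estimates Prob(|x̄_n − c| > ε) for the sample mean, whose plim is c = 0.5:

```python
import numpy as np

rng = np.random.default_rng(2)
c, eps, reps = 0.5, 0.05, 1_000

for n in (10, 100, 1_000, 5_000):
    # Monte Carlo estimate of Prob(|x_bar_n - c| > eps) over many replications
    means = rng.uniform(size=(reps, n)).mean(axis=1)
    print(n, (np.abs(means - c) > eps).mean())  # shrinks toward 0 as n grows
```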
Properties of MLE

Consistency:

$$ \operatorname{plim}(\hat\theta) = \theta $$

Asymptotic normality:

$$ \hat\theta \overset{a}{\sim} N(\theta, I^{-1}(\theta)), $$

where the information matrix is

$$ I(\theta) = E\!\left[ \left(\frac{\partial \ell}{\partial \theta}\right) \left(\frac{\partial \ell}{\partial \theta}\right)' \right] = -E\!\left[ \frac{\partial^2 \ell}{\partial \theta \, \partial \theta'} \right] = -E \begin{bmatrix} \dfrac{\partial^2 \ell}{\partial \theta_1^2} & \cdots & \dfrac{\partial^2 \ell}{\partial \theta_1 \, \partial \theta_k} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 \ell}{\partial \theta_k \, \partial \theta_1} & \cdots & \dfrac{\partial^2 \ell}{\partial \theta_k^2} \end{bmatrix}, $$

that is, minus the expectation of the Hessian of the log-likelihood function.
3.3 Properties of MLE

Asymptotic efficiency:

Assuming that we are dealing with only one parameter $\theta$,

$$ \sqrt{n}\,(\hat\theta - \theta) \overset{d}{\to} N(0, \sigma^2), $$

which states that if there is another consistent and asymptotically normal estimator of $\theta$, say $\tilde\theta$, then $\sqrt{n}\,(\tilde\theta - \theta)$ has a limiting distribution with variance greater than or equal to $\sigma^2$.

Invariance:

If $\hat\theta$ is the MLE of $\theta$ and $g(\cdot)$ is a continuous function of $\theta$, then $g(\hat\theta)$ is the MLE of $g(\theta)$.
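Invariance can be seen directly in the Poisson example: since λ̂ = x̄ = 2, the MLE of g(λ) = e^(−λ) = P(X = 0) is simply e^(−λ̂). A sketch (Python; the grid search is only there to confirm the closed-form maximizer):

```python
import numpy as np
from math import lgamma

x = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])   # data from the Poisson example

def loglik(lam):
    return np.log(lam) * x.sum() - len(x) * lam - sum(lgamma(v + 1) for v in x)

grid = np.linspace(0.5, 4.0, 3_501)
lam_hat = grid[np.argmax(loglik(grid))]   # numerical MLE of lambda, ~ 2.0

# By invariance, the MLE of g(lambda) = exp(-lambda) is g(lam_hat):
print(np.exp(-lam_hat))   # ~ exp(-2) ~ 0.135
```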
4 Estimation of the Linear Regression Model

$$ Y_i = \beta_1 + \beta_2 X_i + u_i $$

Gaussian density:

$$ f(Y_i \mid X_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left[ -\frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{2\sigma^2} \right] $$

$$ L(\beta, \sigma^2) = \prod_{i=1}^{n} f(Y_i \mid X_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left[ -\frac{(Y_1 - \beta_1 - \beta_2 X_1)^2}{2\sigma^2} \right] \cdots \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left[ -\frac{(Y_n - \beta_1 - \beta_2 X_n)^2}{2\sigma^2} \right] $$

$$ L(\beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left[ -\frac{\sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i)^2}{2\sigma^2} \right] $$

$$ \ln L(\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i)^2 $$
First-order conditions:

$$ \frac{\partial \ell}{\partial \beta_1} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i) $$

$$ \frac{\partial \ell}{\partial \beta_2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i) X_i $$

$$ \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i)^2 $$
Matrix notation

$$ Y = X\beta + u $$

$$ f(Y \mid X) = \frac{1}{(2\pi\sigma^2)^{n/2}} \, e^{-(1/2\sigma^2)(u'u)} $$

$$ \ln L = \ell = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2} (Y - X\beta)'(Y - X\beta) $$
3.4 Estimation of the Linear Regression Model

The parameter vector is

$$ \theta' = (\beta', \sigma^2) $$

$$ \frac{\partial \ell}{\partial \beta} = \frac{1}{\sigma^2} \left( X'y - X'X\beta \right) $$

and

$$ \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} u'u. $$

Setting these to zero yields

$$ \hat\beta_{MLE} = (X'X)^{-1} X'y, \qquad \hat\sigma^2_{MLE} = \frac{\hat{u}'\hat{u}}{n}. $$
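A minimal sketch of these closed-form ML estimates on simulated data (Python; the design matrix, true parameters, and seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])         # intercept + one regressor
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)  # true sigma = 0.5

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / n                 # ML estimate u'u / n

print(beta_hat)     # ~ [1.0, 2.0]
print(sigma2_hat)   # ~ 0.25
```

Note that the ML variance estimate divides by n rather than n − k, so it is biased in finite samples but consistent.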
3.4 Estimation of the Linear Regression Model

To calculate the variance matrix of the parameters, we need the Hessian of the log-likelihood. Taking the second derivatives:

$$ \frac{\partial^2 \ell}{\partial \beta \, \partial \beta'} = -\frac{X'X}{\sigma^2} $$

$$ \frac{\partial^2 \ell}{\partial \beta \, \partial \sigma^2} = -\frac{1}{\sigma^4}(X'y - X'X\beta) = -\frac{X'u}{\sigma^4} $$

$$ \frac{\partial^2 \ell}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{u'u}{\sigma^6} $$

Taking expectations:

$$ -E\!\left( \frac{\partial^2 \ell}{\partial \beta \, \partial \beta'} \right) = \frac{X'X}{\sigma^2}, \qquad -E\!\left( \frac{\partial^2 \ell}{\partial \beta \, \partial \sigma^2} \right) = E\!\left( \frac{X'u}{\sigma^4} \right) = 0, \qquad -E\!\left( \frac{\partial^2 \ell}{\partial (\sigma^2)^2} \right) = \frac{n}{2\sigma^4}. $$
3.4 Estimation of the Linear Regression Model

$$ \frac{\partial^2 \ell}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{u'u}{\sigma^6} $$

Since $\sigma^2 = E(u'u)/n$, we have $E(u'u) = \sigma^2 n$, so

$$ E\!\left( \frac{\partial^2 \ell}{\partial (\sigma^2)^2} \right) = \frac{n}{2\sigma^4} - \frac{\sigma^2 n}{\sigma^6} = \frac{n}{2\sigma^4} - \frac{n}{\sigma^4} = -\frac{n}{2\sigma^4} $$

$$ -E\!\left( \frac{\partial^2 \ell}{\partial (\sigma^2)^2} \right) = \frac{n}{2\sigma^4} $$

Collecting the results:

$$ -E\!\left( \frac{\partial^2 \ell}{\partial \beta \, \partial \beta'} \right) = \frac{X'X}{\sigma^2}, \qquad -E\!\left( \frac{\partial^2 \ell}{\partial \beta \, \partial \sigma^2} \right) = 0, \qquad -E\!\left( \frac{\partial^2 \ell}{\partial (\sigma^2)^2} \right) = \frac{n}{2\sigma^4}. $$
3.4 Estimation of the Linear Regression Model

So the information matrix is

$$ I(\theta) = -E \begin{bmatrix} \dfrac{\partial^2 \ell}{\partial \beta \, \partial \beta'} & \dfrac{\partial^2 \ell}{\partial \beta \, \partial \sigma^2} \\[4pt] \dfrac{\partial^2 \ell}{\partial \sigma^2 \, \partial \beta'} & \dfrac{\partial^2 \ell}{\partial (\sigma^2)^2} \end{bmatrix} = \begin{bmatrix} \dfrac{X'X}{\sigma^2} & 0 \\[4pt] 0 & \dfrac{n}{2\sigma^4} \end{bmatrix} $$

The inverse of the information matrix gives us the variance-covariance matrix of the ML estimators:

$$ I^{-1}(\theta) = \begin{bmatrix} \sigma^2 (X'X)^{-1} & 0 \\[4pt] 0 & \dfrac{2\sigma^4}{n} \end{bmatrix} $$
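Using the same simulated setup as the earlier regression sketch, estimated standard errors follow by plugging σ̂² into I⁻¹(θ) (Python; illustrative and self-contained):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / n

# Upper-left block of I^{-1}(theta): sigma^2 (X'X)^{-1}
var_beta = sigma2_hat * np.linalg.inv(X.T @ X)
print(np.sqrt(np.diag(var_beta)))   # asymptotic standard errors of beta_hat

# Lower-right block: 2 sigma^4 / n, the asymptotic variance of sigma2_hat
print(2 * sigma2_hat ** 2 / n)
```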
Testing in the Maximum Likelihood Framework

$$ H_0: R\beta = r $$

where $R$ is a $q \times k$ matrix of known constants with $q < k$, and $r$ is a $q$-vector of known constants.
[Figure: log-likelihood profile from the Poisson example ($\lambda$ on the horizontal axis), illustrating the Likelihood Ratio and Lagrange multiplier tests.]
Example

In our Poisson example,

$$ H_0: \lambda = 1.8, \qquad \hat\lambda = 2.0. $$

The likelihood ratio is

$$ \frac{L(\tilde\lambda)}{L(\hat\lambda)}, $$

where $\tilde\lambda$ is the restricted and $\hat\lambda$ the unrestricted estimate. This ratio is always between 0 and 1, and the less likely the null hypothesis is, the smaller the ratio.
Likelihood Ratio Test

If we want to test

$$ H_0: R\beta = r, $$

the likelihood ratio, defined as

$$ \lambda = \frac{L(\tilde\beta; \tilde\sigma^2)}{L(\hat\beta; \hat\sigma^2)} \quad \text{(restricted over unrestricted)}, $$

can be used with the decision rule: reject $H_0$ if $-2\ln\lambda > \chi^2_q$, where $q$ is the number of restrictions.
Likelihood Ratio Test

$$ LR = -2\ln\lambda = -2\ln\!\left[ \frac{L(1.8)}{L(2)} \right] = -2\ln\!\left[ \frac{0.0936}{0.104} \right] = 0.2144 $$

Since $0.2144 < 3.84$ (the 5% critical value of $\chi^2_1$), we don't reject the null.
More on the LR test in the context of Linear Regression
Likelihood Ratio Test

$$ H_0: R\beta = r, \qquad \lambda = \frac{L(\tilde\beta; \tilde\sigma^2)}{L(\hat\beta; \hat\sigma^2)} $$

Remember that the unrestricted maximized likelihood can be written as

$$ L(\hat\beta; \hat\sigma^2) = k \, (\hat{u}'\hat{u})^{-n/2}, $$

so the restricted model's likelihood can be written similarly:

$$ L(\tilde\beta; \tilde\sigma^2) = k \, (\tilde{u}'\tilde{u})^{-n/2}. $$

The likelihood ratio can then be written as

$$ \lambda = \frac{k \, (\tilde{u}'\tilde{u})^{-n/2}}{k \, (\hat{u}'\hat{u})^{-n/2}} $$

$$ LR = -2\ln\lambda = -2\left[ -\frac{n}{2}\ln(\tilde{u}'\tilde{u}) + \frac{n}{2}\ln(\hat{u}'\hat{u}) \right] $$

$$ LR = n\left( \ln(\tilde{u}'\tilde{u}) - \ln(\hat{u}'\hat{u}) \right) \sim \chi^2_q $$

Reject $H_0$ if $LR > \chi^2_q$.
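A sketch of this residual-based LR formula (Python; the simulated data and the single restriction β₂ = 0 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.3]) + rng.normal(size=n)

def ssr(X, y):
    # Sum of squared residuals from the least-squares (ML) fit
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b
    return u @ u

ssr_unres = ssr(X, y)          # unrestricted: intercept + slope
ssr_res = ssr(X[:, :1], y)     # restricted under H0: beta_2 = 0 (intercept only)

LR = n * (np.log(ssr_res) - np.log(ssr_unres))   # n[ln(u~'u~) - ln(u^'u^)]
print(LR, LR > 3.84)           # reject H0 at 5% if LR exceeds the chi2_1 critical value
```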