Econometrics I
Professor William Greene
Stern School of Business
Department of Economics
Econometrics I
Part 19 – MLE Applications and a Two Step Estimator
Model for a Binary Dependent Variable

Binary outcome: the event occurs or doesn't (e.g., the person adopts green technology, the person enters the labor force, etc.)
Model the probability of the event: P(x) = Prob(y = 1|x)
The probability responds to independent variables.
Requirements for a probability:
  0 < Probability < 1
  P(x) should be monotonic in x – it's a CDF
Central Proposition: A Behavioral Utility Based Approach

Observed outcomes partially reveal underlying preferences.
There exists an underlying preference scale defined over alternatives, U*(choices).
Revelation of preferences between two choices labeled 0 and 1 reveals the ranking of the underlying utility:
  U*(choice 1) > U*(choice 0)  =>  Choose 1
  U*(choice 1) < U*(choice 0)  =>  Choose 0
Net utility = U = U*(choice 1) - U*(choice 0).  U > 0 => choice 1.
Binary Outcome: Visit Doctor

In the 1984 wave of the GSOEP, 2,265 of 3,874 individuals visited the doctor at least once.
A Random Utility Model for the Binary Choice

Yes or no decision: visit or do not visit the doctor.
Model: net utility of visiting at least once.
Net utility depends on observables and unobservables:
  Udoctor = Net utility = U*visit - U*not visit
Random utility:
  Udoctor = α + β1 Age + β2 Income + β3 Sex + ε
Choose to visit at least once if net utility is positive.
Observed data:  x = Age, Income, Sex
                y = 1 if choose visit (Udoctor > 0), 0 if not.
Modeling the Binary Choice Between the Two Alternatives

Net utility:  Udoctor = U*visit - U*not visit
              Udoctor = α + β1 Age + β2 Income + β3 Sex + ε
Chooses to visit:  Udoctor > 0
  α + β1 Age + β2 Income + β3 Sex + ε > 0
  ε > -(α + β1 Age + β2 Income + β3 Sex)
Choosing to visit is a random outcome because ε is random.
Probability Model for Choice Between Two Alternatives

People with the same (Age, Income, Sex) will make different choices because ε is random. We can model the probability that the random event "visits the doctor" will occur. The probability is governed by ε, the random part of the utility function.
The event DOCTOR = 1 occurs if ε > -(α + β1 Age + β2 Income + β3 Sex).
We model the probability of this event.
An Application

27,326 observations in the GSOEP sample:
  1 to 7 years, panel
  7,293 households observed
  We use the 1994 wave: 3,337 household observations
An Econometric Model

Choose to visit iff Udoctor > 0
  Udoctor = α + β1 Age + β2 Income + β3 Sex + ε
  Udoctor > 0  <=>  ε > -(α + β1 Age + β2 Income + β3 Sex)
               <=>  ε < α + β1 Age + β2 Income + β3 Sex   (for symmetric ε)
Probability model: for any person observed by the analyst,
  Prob(doctor = 1) = Prob(ε < α + β1 Age + β2 Income + β3 Sex)
Note the relationship between the unobserved ε and the observed outcome DOCTOR.
Index = α + β1 Age + β2 Income + β3 Sex
Probability = a function of the Index:  P(Doctor = 1) = F(Index)
Internally consistent probabilities:
  (1) (Coherence)     0 < Probability < 1
  (2) (Monotonicity)  Probability increases with the Index.
Econometric Identification Issues

The data may reveal information about the coefficients, i.e., the effects of the observed variables on utilities.
The data may reveal information about probabilities, i.e., probabilities under certain assumptions.
Data on the choices made do not reveal information about utility itself: the data contain no information about the scale of utilities or utility differences. We only observe the sign of the net utility (ones and zeros).
The variance of ε is not estimable, so it is normalized at 1 or some other fixed (known) constant.
A Fully Parametric Model

Index function:         U = β'x + ε
Observation mechanism:  y = 1[U > 0]
Distribution:           ε ~ f(ε); normal, logistic, ...
Maximum likelihood estimation:
  maxβ  logL = Σi log Prob(Yi = yi | xi)
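As a concrete illustration of this setup (not part of the original slides), here is a minimal Python sketch that simulates data from the index function and observation mechanism and then maximizes the probit log likelihood. The data and coefficient values are hypothetical, not the GSOEP sample.

# Minimal sketch: U = x'b + e, y = 1[U > 0], e ~ N(0,1), estimated by ML.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0, -0.7])            # hypothetical values
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def neg_loglik(b):
    # logL = sum_i log F[(2y_i - 1) x_i'b], with F the standard normal CDF
    q = 2.0 * y - 1.0
    return -np.sum(norm.logcdf(q * (X @ b)))

res = minimize(neg_loglik, x0=np.zeros(X.shape[1]), method="BFGS")
print("ML probit estimates:", res.x)

The same structure works for the logit by replacing the normal CDF with the logistic CDF.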
A Logit Model
We examine the model components.
Parametric Model Estimation

How to estimate α, β1, β2, β3?  The technique of maximum likelihood:
  L = Π(y=0) Prob[y = 0 | x] × Π(y=1) Prob[y = 1 | x]
Prob[doctor = 1] = Prob[ε > -(α + β1 Age + β2 Income + β3 Sex)]
Prob[doctor = 0] = 1 - Prob[doctor = 1]
Requires a model for the probability.
Completing the Model: F(ε)

The distribution:
  Normal:    PROBIT, natural for behavior
  Logistic:  LOGIT, allows "thicker tails"
  Gompertz:  EXTREME VALUE, asymmetric
  Others...
Does it matter?
  Yes for the coefficient estimates, which can differ substantially across distributions.
  Not much for the quantities of interest, which are more stable.
Two Standard Models

Based on the normal distribution:
  Prob[y = 1|x] = Φ(β'x) = CDF of the normal distribution
  The "probit" model
Based on the logistic distribution:
  Prob[y = 1|x] = exp(β'x)/[1 + exp(β'x)] = Λ(β'x)
  The "logit" model
Log likelihood:
  P(y|x) = (1 - F)^(1-y) F^y, where F is the CDF
  logL = Σi [(1 - yi) log(1 - Fi) + yi log Fi]
       = Σi log F[(2yi - 1) β'xi],  since F(-t) = 1 - F(t) for both.
Mechanics

Log likelihood function:
  lnL = Σi=1..n ln F(qi β'xi),  where qi = +1 if yi = 1 and qi = -1 if yi = 0
  F(.) = Λ(.) for the logit, Φ(.) for the probit
Likelihood equation:
  ∂lnL/∂β = Σi=1..n ∂ln F(qi β'xi)/∂β = Σi=1..n gi = 0
  gi = (yi - Λi) xi for the logit;  gi = [qi φ(qi β'xi)/Φ(qi β'xi)] xi = ai xi for the probit
Second derivatives:
  ∂²lnL/∂β∂β' = Σi=1..n ∂²ln F(qi β'xi)/∂β∂β' = Σi=1..n Hi = -Σi=1..n hi xi xi'
  hi = Λi(1 - Λi) for the logit;  hi = ai(ai + qi β'xi) for the probit
Covariance matrix estimators:
  Conventional:  [Σi=1..n hi xi xi']^-1
  BHHH:          [Σi=1..n gi gi']^-1
  Robust:        [Σi hi xi xi']^-1 [Σi gi gi'] [Σi hi xi xi']^-1
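A compact sketch of these mechanics for the logit case, assuming the data matrix X, the 0/1 outcome y, and the ML estimate b are already in hand (illustrative Python, not from the slides):

# Logit score g_i, Hessian weights h_i, and the three covariance estimators.
import numpy as np

def logit_covariances(X, y, b):
    P = 1.0 / (1.0 + np.exp(-(X @ b)))        # Lambda_i
    g = (y - P)[:, None] * X                  # g_i = (y_i - Lambda_i) x_i
    h = P * (1.0 - P)                         # h_i = Lambda_i (1 - Lambda_i)
    neg_hessian = X.T @ (h[:, None] * X)      # sum_i h_i x_i x_i'
    conventional = np.linalg.inv(neg_hessian)
    bhhh = np.linalg.inv(g.T @ g)             # [sum_i g_i g_i']^-1
    robust = conventional @ (g.T @ g) @ conventional
    return conventional, bhhh, robust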
Estimated Binary Choice Models for Three Distributions

Log-L(0) = log likelihood for a model that has only a constant term.
Ignore the t ratios for now.
Effect on Predicted Probability of an Increase in Age

α + β1 (Age + 1) + β2 Income + β3 Sex    (β1 is positive)
Coefficients in the Binary Choice Models

E[y|x] = 0·(1 - F) + 1·F = P(y = 1|x) = F(β'x)
The coefficients are not the slopes, as usual in a nonlinear model:
  ∂E[y|x]/∂x = f(β'x) β
These will look similar for the probit and the logit.
Partial Effects in Probability Models

Prob[Outcome] = some F(α + β1 Income + ...)
"Partial effect" = ∂F(α + β1 Income + ...)/∂x  (a derivative)
  Partial effects are derivatives.
  The result varies with the model:
    Logit:          ∂F(α + β1 Income + ...)/∂x = Prob × (1 - Prob) × β
    Probit:         ∂F(α + β1 Income + ...)/∂x = Normal density × β
    Extreme Value:  ∂F(α + β1 Income + ...)/∂x = Prob × (-log Prob) × β
  Scaling usually erases the model differences.
Partial Effect for the Logit Model

Prob(doctor = 1) = exp(α + β1 Age + β2 Income + β3 Sex) / [1 + exp(α + β1 Age + β2 Income + β3 Sex)]
                 = Λ(α + β1 Age + β2 Income + β3 Sex) = Λ(β'x)
The derivative with respect to one of the variables is
  ∂Λ(β'x)/∂xk = Λ(β'x)[1 - Λ(β'x)] βk
(1) A multiple of the coefficient, not the coefficient itself.
(2) A function of all of the coefficients and variables.
(3) Evaluated using the data and the estimated model parts after the model is estimated.
Similar computations apply for other models such as the probit.
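In code, the logit partial effect is just Λ(1 - Λ) times the coefficient vector; a sketch assuming X and b come from a fitted logit:

# Logit partial effects: dProb/dx_k = Lambda(b'x)[1 - Lambda(b'x)] * b_k,
# evaluated at each observation.
import numpy as np

def logit_partial_effects(X, b):
    Lam = 1.0 / (1.0 + np.exp(-(X @ b)))
    return (Lam * (1.0 - Lam))[:, None] * b   # one row of partial effects per observation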
Estimated Partial Effects for Three Models
(Standard errors based on the delta method)
Partial Effect for a Dummy Variable Computed Using Means of Other Variables

Prob[yi = 1|xi, di] = F(β'xi + γdi), where d is a dummy variable such as Sex in our doctor model.
For the probit model, Prob[yi = 1|xi, di] = Φ(β'xi + γdi), where Φ is the normal CDF.
Partial effect of d:
  Prob[yi = 1|x̄, di = 1] - Prob[yi = 1|x̄, di = 0] = Φ(β̂'x̄ + γ̂) - Φ(β̂'x̄)
Partial Effect – Dummy Variable
Computing Partial Effects

Compute at the data means (PEA):
  Simple; inference is well defined.
  Not realistic for some variables, such as Sex.
Average the individual effects (APE):
  More appropriate.
  Asymptotic standard errors are slightly more complicated.
Partial Effects

Probability:                  Pi = F(β'xi)
Partial effect:               ∂Pi/∂xi = ∂F(β'xi)/∂xi = f(β'xi) β = di
Partial effect at the means:  f(β'x̄) β = f(β'[(1/n) Σi xi]) β
Average partial effect:       (1/n) Σi di = (1/n) Σi f(β'xi) β
Both are estimates of δ = E[di] under certain assumptions.
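A sketch of the two computations for the logit case (a probit version would substitute the normal density for the logistic one); X and b are assumed to be the data and the estimated coefficients:

# Partial Effects at the Means (PEA) vs. Average Partial Effects (APE), logit case.
import numpy as np

def logit_density(t):
    lam = 1.0 / (1.0 + np.exp(-t))
    return lam * (1.0 - lam)

def pea(X, b):
    xbar = X.mean(axis=0)
    return logit_density(xbar @ b) * b                           # f(b'xbar) * b

def ape(X, b):
    return (logit_density(X @ b)[:, None] * b).mean(axis=0)      # (1/n) sum_i f(b'x_i) * b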
The two approaches usually give similar answers, though sometimes the results differ substantially.

           Average Partial Effects    Partial Effects at Data Means
Age               0.00512                      0.00527
Income           -0.09609                     -0.09871
Female            0.13792                      0.13958
APE vs. Partial Effects at the Mean

Delta method for the average partial effect:
  Estimator of Var[(1/N) Σi=1..N PartialEffect_i] = G Var[θ̂] G'
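One way to build the G matrix is to differentiate the APE numerically with respect to the coefficients; a sketch assuming the estimate b_hat, its estimated covariance matrix V, and the ape function from the previous sketch are available:

# Delta-method standard errors for the average partial effects:
# Var[APE] ~= G Var[b_hat] G', with G = d(APE)/d(b') obtained by finite differences.
import numpy as np

def ape_delta_se(X, b_hat, V, ape_fn, eps=1e-6):
    k = len(b_hat)
    base = ape_fn(X, b_hat)
    G = np.zeros((len(base), k))
    for j in range(k):
        b_step = b_hat.copy()
        b_step[j] += eps
        G[:, j] = (ape_fn(X, b_step) - base) / eps
    cov_ape = G @ V @ G.T
    return np.sqrt(np.diag(cov_ape))

With the ape function above, ape_delta_se(X, b_hat, V, ape) returns one standard error per variable.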
I have a question. The question is as follows. We have a probit model. We used LM tests to test for heteroscedasticity in this model and found that there is heteroscedasticity in this model... How do we proceed now? What do we do to get rid of the heteroscedasticity?

Testing for heteroscedasticity in a probit model and then getting rid of it is not a common procedure. In fact, I do not remember seeing an applied (or theoretical) work which tests for heteroscedasticity and then uses a method to get rid of it.
See Econometric Analysis, 7th ed., pages 714-715.
The most common specification is the Harvey model,
  Prob(y = 1|x,z) = F[β'x / exp(γ'z)]
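A sketch of the Harvey specification's log likelihood for a probit, with β on the index variables x and γ on the variance variables z (illustrative; the arrays X, Z, and y are assumed given):

# Harvey-style heteroscedastic probit: Prob(y=1|x,z) = Phi[x'beta / exp(z'gamma)].
import numpy as np
from scipy.stats import norm

def harvey_probit_loglik(params, X, Z, y):
    kx = X.shape[1]
    beta, gamma = params[:kx], params[kx:]
    index = (X @ beta) / np.exp(Z @ gamma)    # scaled index
    q = 2.0 * y - 1.0
    return np.sum(norm.logcdf(q * index))     # maximize this over (beta, gamma)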
Odds Ratios
(This calculation is not meaningful if the model is not a binary logit model.)

Prob(y = 0|x,z) = 1 / [1 + exp(β'x + γz)]
Prob(y = 1|x,z) = exp(β'x + γz) / [1 + exp(β'x + γz)]

OR(x,z) = Prob(y = 1|x,z) / Prob(y = 0|x,z) = exp(β'x + γz) = exp(β'x) exp(γz)

OR(x,z+1) / OR(x,z) = exp(β'x) exp(γz + γ) / [exp(β'x) exp(γz)] = exp(γ)
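A quick numerical check of this algebra, with hypothetical values of β'x, γ, and z:

# exp(gamma) as the multiplicative change in the odds when z rises by one unit.
import numpy as np

beta_x, gamma, z = 0.4, 0.25, 2.0                 # hypothetical beta'x, gamma, and z
odds = lambda zz: np.exp(beta_x + gamma * zz)
print(odds(z + 1) / odds(z), np.exp(gamma))       # both equal exp(gamma) ~ 1.284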
Odds Ratio

exp(γ) = the multiplicative change in the odds ratio when z changes by 1 unit.
  dOR(x,z)/dx = OR(x,z)·β, not exp(β).
  The "odds ratio" is not a partial effect – it is not a derivative.
  It is only meaningful when the odds ratio is itself of interest and a change of the variable by a whole unit is meaningful.
  "Odds ratios" might be interesting for dummy variables.
Cautions About Reported Odds Ratios
The Linear Probability "Model"

Prob(y = 1|x) = β'x
E[y|x] = 0·Prob(y = 0|x) + 1·Prob(y = 1|x) = Prob(y = 1|x)
y = β'x + ε
The dependent variable equals zero for 99.1% of the observations. In the sample of 163,474 observations, the LHS variable equals 1 about 1,500 times.
2SLS for a binary dependent variable.
Prob(y = 1|x) = β'x
E[y|x] = 0·Prob(y = 0|x) + 1·Prob(y = 1|x) = Prob(y = 1|x)
y = β'x + ε
Predictions: nothing prevents P̂ < 0 or P̂ > 1.
Residuals: e = y - β̂'x = 1 - β̂'x if y = 1, or 0 - β̂'x if y = 0.
The standard errors make no sense because the stochastic properties of the "disturbance" are inconsistent with the observed variable.
Heteroscedasticity: the variance of y|x equals Prob(y = 0|x)·Prob(y = 1|x) = β'x(1 - β'x), so the "disturbances" are heteroscedastic. Users of the LPM always worry about clustering; they should also worry about heteroscedasticity.
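The sketch below, on simulated data rather than the application above, fits the LPM by OLS with heteroscedasticity-robust standard errors and compares its slope with the logit average partial effect:

# LPM vs. logit APE on simulated data. The LPM slope approximates the average
# partial effect; the LPM disturbances are heteroscedastic, so use robust (HC) variances.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
X = sm.add_constant(x)
p = 1.0 / (1.0 + np.exp(-(-0.3 + 0.8 * x)))            # true logit probabilities
y = (rng.uniform(size=n) < p).astype(float)

lpm = sm.OLS(y, X).fit(cov_type="HC1")                 # LPM with robust std. errors
logit = sm.Logit(y, X).fit(disp=0)
ape = logit.get_margeff(at="overall").margeff          # average partial effect

print("LPM slope:", lpm.params[1], " Logit APE:", ape[0])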
OLS approximates the partial effects "directly," without bothering with coefficients.
(Comparison shown: MLE average partial effects vs. OLS coefficients.)
GARCH Models: A Model for Time Series with Latent Heteroscedasticity
Bollerslev/Ghysels, 1974
ARCH Model
GARCH Model
Estimated GARCH Model

----------------------------------------------------------------------
GARCH MODEL
Dependent variable                        Y
Log likelihood function         -1106.60788
Restricted log likelihood       -1311.09637
Chi squared [ 2 d.f.]             408.97699
Significance level                   .00000
McFadden Pseudo R-squared          .1559676
Estimation based on N = 1974, K = 4
GARCH Model, P = 1, Q = 1
Wald statistic for GARCH =         3727.503
--------+-------------------------------------------------------------
Variable|  Coefficient   Standard Error   b/St.Er.  P[|Z|>z]  Mean of X
--------+-------------------------------------------------------------
        |Regression parameters
Constant|    -.00619         .00873         -.709     .4783
        |Unconditional Variance
Alpha(0)|    .01076***       .00312         3.445     .0006
        |Lagged Variance Terms
Delta(1)|    .80597***       .03015        26.731     .0000
        |Lagged Squared Disturbance Terms
Alpha(1)|    .15313***       .02732         5.605     .0000
        |Equilibrium variance, a0/[1-D(1)-A(1)]
EquilVar|    .26316          .59402          .443     .6577
--------+-------------------------------------------------------------
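Behind this output is the GARCH(1,1) conditional-variance recursion and a Gaussian log likelihood; a sketch (y is assumed to be the return series, and the parameter names mirror the table above):

# GARCH(1,1): sigma2_t = alpha0 + alpha1*eps_{t-1}^2 + delta1*sigma2_{t-1},
# with eps_t = y_t - mu. Gaussian log likelihood to be maximized over the parameters.
import numpy as np

def garch11_loglik(params, y):
    mu, alpha0, alpha1, delta1 = params
    eps = y - mu
    n = len(y)
    sigma2 = np.empty(n)
    sigma2[0] = eps.var()                      # start the recursion at the sample variance
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * eps[t-1]**2 + delta1 * sigma2[t-1]
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + eps**2 / sigma2)

Note that the equilibrium variance implied by the estimates above, .01076/(1 - .80597 - .15313) ≈ .263, reproduces the EquilVar entry.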
Two-Step Estimation (Murphy-Topel)

Setting: fitting a model which contains parameter estimates from another model.
Typical application: inserting a prediction from one model into another.
A. Procedures: how it's done.
B. Asymptotic results:
   1. Consistency
   2. Getting an appropriate estimator of the asymptotic covariance matrix
The Murphy-Topel result.
Application:  Equation 1: Number of children
              Equation 2: Labor force participation
Setting

Two-equation model:
  Model for y1 = f(y1 | x1, θ1)
  Model for y2 = f(y2 | x2, θ2, x1, θ1)
  (Note: not 'simultaneous' or even 'recursive.')
Procedure:
  Estimate θ1 by ML, with covariance matrix (1/n)V1.
  Estimate θ2 by ML, treating θ1 as if it were known.
  Correct the estimated asymptotic covariance matrix, (1/n)V2, for the estimator of θ2.
Murphy and Topel (1984, 2002) Results

Both MLEs are consistent.
Asy.Var[θ̂2] = (1/n)[V2 + V2 (C V1 C' - R V1 C' - C V1 R') V2]
  V1 = Asy.Var[√n (θ̂1 - θ1)]
  V2 = Asy.Var[√n (θ̂2 - θ2)] given θ1
  C = E[ (1/n) (∂logL2/∂θ2)(∂logL2/∂θ1') ]
  R = E[ (1/n) (∂logL2/∂θ2)(∂logL1/∂θ1') ]
M&T Computations

First equation:  θ̂1 = MLE
  V̂1 = [-(1/n) Σi=1..N Ĥi1]^-1   or   [(1/n) Σi=1..N ĝi1 ĝi1']^-1
Second equation:  θ̂2 = MLE given θ̂1
  V̂2 = [-(1/n) Σi=1..N Ĥi2]^-1   or   [(1/n) Σi=1..N ĝi2 ĝi2']^-1
C = (1/n) Σi=1..N [∂ln f2(yi2|xi2, θ̂2, xi1, θ̂1)/∂θ̂2] [∂ln f2(yi2|xi2, θ̂2, xi1, θ̂1)/∂θ̂1']
R = (1/n) Σi=1..N [∂ln f2(yi2|xi2, θ̂2, xi1, θ̂1)/∂θ̂2] [∂ln f1(yi1|xi1, θ̂1)/∂θ̂1']
Example
Equation 1: Number of Kids – Poisson Regression

  p(yi1|xi1, β) = exp(-λi) λi^yi1 / yi1!
  λi = exp(xi1'β)
  gi1 = xi1(yi1 - λi)
  V1 = [(1/n) Σi λi xi1 xi1']^-1   (inverse of the negative average Hessian)
Example – Continued
Equation 2: Labor Force Participation – Logit

  p(yi2|xi2, δ, α, xi1, β) = exp(di2)/[1 + exp(di2)] = Pi2
  di2 = (2yi2 - 1)[δ'xi2 + αλi],  λi = exp(xi1'β)
  Let zi2 = (xi2, λi) and θ2 = (δ, α), so di2 = (2yi2 - 1) θ2'zi2
  gi2 = (yi2 - Pi2) zi2
  V2 = [(1/n) Σi Pi2(1 - Pi2) zi2 zi2']^-1
Murphy and Topel Correction

  C = (1/N) Σi=1..N [(yi2 - Pi2) zi2] [(yi2 - Pi2) α λi xi1]'
  R = (1/N) Σi=1..N [(yi2 - Pi2) zi2] [(yi1 - λi) xi1]'
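A sketch of how these pieces assemble into the corrected covariance matrix, in Python for illustration; the per-observation terms and the uncorrected V1 and V2 (scaled as n·Var, as in the program below) are assumed to have been computed from the two fitted equations:

# Murphy-Topel correction for the two-step estimator. Assumed inputs:
#   Z2 : n x k2 regressors of equation 2 (X2 plus lambda);  X1 : n x k1 regressors of equation 1
#   g2 = (y2 - P2);  g1 = (y1 - lambda);  c2 = (y2 - P2) * alpha * lambda
#   V1, V2 : uncorrected covariance matrices scaled as n * Var(theta_hat)
import numpy as np

def murphy_topel_cov(Z2, X1, g1, g2, c2, V1, V2, n):
    C = (Z2 * g2[:, None]).T @ (X1 * c2[:, None]) / n
    R = (Z2 * g2[:, None]).T @ (X1 * g1[:, None]) / n
    A = C @ V1 @ C.T - R @ V1 @ C.T - C @ V1 @ R.T
    return (V2 + V2 @ A @ V2) / n              # corrected Asy.Var[theta2_hat]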
Two Step Estimation of LFP Model

? Data transformations. Number of kids, scale income variables.
Create   ; Kids = kl6 + k618
         ; income = faminc/10000 ; Wifeinc = ww*whrs/1000 $
? Equation 1, number of kids. Standard Poisson fertility model.
? Fit the equation, collect parameters BETA and covariance matrix V1,
? then compute fitted values and derivatives.
Namelist ; X1 = one,wa,we,income,wifeinc $
Poisson  ; Lhs = kids ; Rhs = X1 $
Matrix   ; Beta = b ; V1 = N*VARB $
Create   ; Lambda = Exp(X1'Beta) ; gi1 = Kids - Lambda $
? Set up the logit labor force participation model.
? Fit the logit model and collect results. Delta = coefficients on X2,
? Alpha = coefficient on the fitted number of kids.
Namelist ; X2 = one,wa,we,ha,he,income ; Z2 = X2,Lambda $
Logit    ; Lhs = lfp ; Rhs = Z2 $
Calc     ; alpha = b(kreg) ; K2 = Col(X2) $
Matrix   ; delta = b(1:K2) ; Theta2 = b ; V2 = N*VARB $
? The Poisson derivative with respect to beta is (kids_i - lambda_i)*X1.
Create   ; di = delta'X2 + alpha*Lambda
         ; pi2 = exp(di)/(1+exp(di))
         ; gi2 = LFP - Pi2
? These are the terms that are used to compute R and C.
         ; ci = gi2*gi2*alpha*lambda
         ; ri = gi2*gi1 $
Matrix   ; C = 1/n*Z2'[ci]X1
         ; R = 1/n*Z2'[ri]X1
         ; A = C*V1*C' - R*V1*C' - C*V1*R'
         ; V2S = V2+V2*A*V2 ; V2s = 1/N*V2S $
? Compute matrix products and report results.
Matrix   ; Stat(Theta2,V2s,Z2) $
Estimated Equation 1: E[Kids]

+---------------------------------------------+
| Poisson Regression                          |
| Dependent variable              KIDS        |
| Number of observations          753         |
| Log likelihood function         -1123.627   |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant    3.34216852      .24375192      13.711    .0000
 WA          -.06334700      .00401543     -15.776    .0000    42.5378486
 WE          -.02572915      .01449538      -1.775    .0759    12.2868526
 INCOME       .06024922      .02432043       2.477    .0132     2.30805950
 WIFEINC     -.04922310      .00856067      -5.750    .0000     2.95163126
Two Step Estimator

+---------------------------------------------+
| Multinomial Logit Model                     |
| Dependent variable              LFP         |
| Number of observations          753         |
| Log likelihood function         -351.5765   |
| Number of parameters            7           |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Characteristics in numerator of Prob[Y = 1]
 Constant   33.1506089      2.88435238      11.493    .0000
 WA          -.54875880      .05079250     -10.804    .0000    42.5378486
 WE          -.02856207      .05754362       -.496    .6196    12.2868526
 HA          -.01197824      .02528962       -.474    .6358    45.1208499
 HE          -.02290480      .04210979       -.544    .5865    12.4913679
 INCOME       .39093149      .09669418       4.043    .0001     2.30805950
 LAMBDA     -5.63267225      .46165315     -12.201    .0000     1.59096946

With Corrected Covariance Matrix
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Constant   33.1506089      5.41964589       6.117    .0000
 WA          -.54875880      .07780642      -7.053    .0000
 WE          -.02856207      .12508144       -.228    .8194
 HA          -.01197824      .02549883       -.470    .6385
 HE          -.02290480      .04862978       -.471    .6376
 INCOME       .39093149      .27444304       1.424    .1543
 LAMBDA     -5.63267225     1.07381248      -5.245    .0000