Econometrics I
Professor William Greene
Stern School of Business
Department of Economics
Econometrics I
Part 19 – MLE Applications and a Two Step Estimator
Model for a Binary Dependent Variable

Binary outcome: the event occurs or doesn't (e.g., the person adopts green technology, the person enters the labor force, etc.)
Model the probability of the event: P(x) = Prob(y = 1|x)
The probability responds to independent variables.
Requirements for a probability:
  0 < Probability < 1
  P(x) should be monotonic in x – it's a CDF
Central Proposition: A Behavioral Utility Based Approach

Observed outcomes partially reveal underlying preferences.
There exists an underlying preference scale defined over alternatives, U*(choices).
Revelation of preferences between two choices labeled 0 and 1 reveals the ranking of the underlying utility:
  U*(choice 1) > U*(choice 0)  =>  Choose 1
  U*(choice 1) < U*(choice 0)  =>  Choose 0
Net utility = U = U*(choice 1) - U*(choice 0).  U > 0 => choice 1.
Binary Outcome: Visit Doctor

In the 1984 wave of the GSOEP, 2,265 of 3,874 individuals visited the doctor at least once.
A Random Utility Model for the Binary Choice

Yes or no decision: visit or do not visit the doctor.
Model: net utility of visiting at least once.
Net utility depends on observables and unobservables:
  Udoctor = Net utility = U*visit - U*not visit
Random utility:
  Udoctor = α + β1 Age + β2 Income + β3 Sex + ε
Choose to visit at least once if net utility is positive.
Observed data:  x = Age, Income, Sex
                y = 1 if choose visit (Udoctor > 0), 0 if not.
Modeling the Binary Choice Between the Two Alternatives

Net utility:  Udoctor = U*visit - U*not visit
              Udoctor = α + β1 Age + β2 Income + β3 Sex + ε
Chooses to visit:  Udoctor > 0
  α + β1 Age + β2 Income + β3 Sex + ε > 0
  ε > -(α + β1 Age + β2 Income + β3 Sex)
Choosing to visit is a random outcome because ε is random.
Probability Model for Choice Between Two Alternatives

People with the same (Age, Income, Sex) will make different choices because ε is random. We can model the probability that the random event "visits the doctor" will occur. The probability is governed by ε, the random part of the utility function.
The event DOCTOR = 1 occurs if ε > -(α + β1 Age + β2 Income + β3 Sex).
We model the probability of this event.
An Application

27,326 observations in the GSOEP sample:
  1 to 7 years, panel
  7,293 households observed
  We use the 1994 wave: 3,337 household observations
An Econometric Model

Choose to visit iff Udoctor > 0
  Udoctor = α + β1 Age + β2 Income + β3 Sex + ε
  Udoctor > 0  <=>  ε > -(α + β1 Age + β2 Income + β3 Sex)
               <=>  ε < α + β1 Age + β2 Income + β3 Sex   (for symmetric ε)
Probability model: for any person observed by the analyst,
  Prob(doctor = 1) = Prob(ε < α + β1 Age + β2 Income + β3 Sex)
Note the relationship between the unobserved ε and the observed outcome DOCTOR.
Index = α + β1 Age + β2 Income + β3 Sex
Probability = a function of the Index:  P(Doctor = 1) = F(Index)
Internally consistent probabilities:
  (1) (Coherence)     0 < Probability < 1
  (2) (Monotonicity)  Probability increases with the Index.
Econometric Identification Issues

The data may reveal information about the coefficients, i.e., the effects of the observed variables on utilities.
The data may reveal information about probabilities, i.e., probabilities under certain assumptions.
Data on the choices made do not reveal information about utility itself: the data contain no information about the scale of utilities or utility differences. We only observe the sign of the net utility (ones and zeros).
The variance of ε is not estimable, so it is normalized at 1 or some other fixed (known) constant.
A Fully Parametric Model

Index function:         U = β'x + ε
Observation mechanism:  y = 1[U > 0]
Distribution:           ε ~ f(ε); normal, logistic, ...
Maximum likelihood estimation:
  maxβ  logL = Σi log Prob(Yi = yi | xi)
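As a concrete illustration of this setup (not part of the original slides), here is a minimal Python sketch that simulates data from the index function and observation mechanism and then maximizes the probit log likelihood. The data and coefficient values are hypothetical, not the GSOEP sample.

# Minimal sketch: U = x'b + e, y = 1[U > 0], e ~ N(0,1), estimated by ML.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0, -0.7])            # hypothetical values
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def neg_loglik(b):
    # logL = sum_i log F[(2y_i - 1) x_i'b], with F the standard normal CDF
    q = 2.0 * y - 1.0
    return -np.sum(norm.logcdf(q * (X @ b)))

res = minimize(neg_loglik, x0=np.zeros(X.shape[1]), method="BFGS")
print("ML probit estimates:", res.x)

The same structure works for the logit by replacing the normal CDF with the logistic CDF.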
A Logit Model
We examine the model components.
Parametric Model Estimation

How to estimate α, β1, β2, β3?  The technique of maximum likelihood:
  L = Π(y=0) Prob[y = 0 | x] × Π(y=1) Prob[y = 1 | x]
Prob[doctor = 1] = Prob[ε > -(α + β1 Age + β2 Income + β3 Sex)]
Prob[doctor = 0] = 1 - Prob[doctor = 1]
Requires a model for the probability.
Completing the Model: F(ε)

The distribution:
  Normal:    PROBIT, natural for behavior
  Logistic:  LOGIT, allows "thicker tails"
  Gompertz:  EXTREME VALUE, asymmetric
  Others...
Does it matter?
  Yes for the coefficient estimates, which can differ substantially across distributions.
  Not much for the quantities of interest, which are more stable.
Two Standard Models

Based on the normal distribution:
  Prob[y = 1|x] = Φ(β'x) = CDF of the normal distribution
  The "probit" model
Based on the logistic distribution:
  Prob[y = 1|x] = exp(β'x)/[1 + exp(β'x)] = Λ(β'x)
  The "logit" model
Log likelihood:
  P(y|x) = (1 - F)^(1-y) F^y, where F is the CDF
  logL = Σi [(1 - yi) log(1 - Fi) + yi log Fi]
       = Σi log F[(2yi - 1) β'xi],  since F(-t) = 1 - F(t) for both.
Mechanics

Log likelihood function:
  lnL = Σi=1..n ln F(qi β'xi),  where qi = +1 if yi = 1 and qi = -1 if yi = 0
  F(.) = Λ(.) for the logit, Φ(.) for the probit
Likelihood equation:
  ∂lnL/∂β = Σi=1..n ∂ln F(qi β'xi)/∂β = Σi=1..n gi = 0
  gi = (yi - Λi) xi for the logit;  gi = [qi φ(qi β'xi)/Φ(qi β'xi)] xi = ai xi for the probit
Second derivatives:
  ∂²lnL/∂β∂β' = Σi=1..n ∂²ln F(qi β'xi)/∂β∂β' = Σi=1..n Hi = -Σi=1..n hi xi xi'
  hi = Λi(1 - Λi) for the logit;  hi = ai(ai + qi β'xi) for the probit
Covariance matrix estimators:
  Conventional:  [Σi=1..n hi xi xi']^-1
  BHHH:          [Σi=1..n gi gi']^-1
  Robust:        [Σi hi xi xi']^-1 [Σi gi gi'] [Σi hi xi xi']^-1
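A compact sketch of these mechanics for the logit case, assuming the data matrix X, the 0/1 outcome y, and the ML estimate b are already in hand (illustrative Python, not from the slides):

# Logit score g_i, Hessian weights h_i, and the three covariance estimators.
import numpy as np

def logit_covariances(X, y, b):
    P = 1.0 / (1.0 + np.exp(-(X @ b)))        # Lambda_i
    g = (y - P)[:, None] * X                  # g_i = (y_i - Lambda_i) x_i
    h = P * (1.0 - P)                         # h_i = Lambda_i (1 - Lambda_i)
    neg_hessian = X.T @ (h[:, None] * X)      # sum_i h_i x_i x_i'
    conventional = np.linalg.inv(neg_hessian)
    bhhh = np.linalg.inv(g.T @ g)             # [sum_i g_i g_i']^-1
    robust = conventional @ (g.T @ g) @ conventional
    return conventional, bhhh, robust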
Estimated Binary Choice Models for Three Distributions

Log-L(0) = log likelihood for a model that has only a constant term.
Ignore the t ratios for now.
Effect on Predicted Probability of an Increase in Age

α + β1 (Age + 1) + β2 Income + β3 Sex    (β1 is positive)
Coefficients in the Binary Choice Models

E[y|x] = 0·(1 - F) + 1·F = P(y = 1|x) = F(β'x)
The coefficients are not the slopes, as usual in a nonlinear model:
  ∂E[y|x]/∂x = f(β'x) β
These will look similar for the probit and the logit.
Partial Effects in Probability Models

Prob[Outcome] = some F(α + β1 Income + ...)
"Partial effect" = ∂F(α + β1 Income + ...)/∂x  (a derivative)
  Partial effects are derivatives.
  The result varies with the model:
    Logit:          ∂F(α + β1 Income + ...)/∂x = Prob × (1 - Prob) × β
    Probit:         ∂F(α + β1 Income + ...)/∂x = Normal density × β
    Extreme Value:  ∂F(α + β1 Income + ...)/∂x = Prob × (-log Prob) × β
  Scaling usually erases the model differences.
Partial Effect for the Logit Model

Prob(doctor = 1) = exp(α + β1 Age + β2 Income + β3 Sex) / [1 + exp(α + β1 Age + β2 Income + β3 Sex)]
                 = Λ(α + β1 Age + β2 Income + β3 Sex) = Λ(β'x)
The derivative with respect to one of the variables is
  ∂Λ(β'x)/∂xk = Λ(β'x)[1 - Λ(β'x)] βk
(1) A multiple of the coefficient, not the coefficient itself.
(2) A function of all of the coefficients and variables.
(3) Evaluated using the data and the estimated model parts after the model is estimated.
Similar computations apply for other models such as the probit.
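In code, the logit partial effect is just Λ(1 - Λ) times the coefficient vector; a sketch assuming X and b come from a fitted logit:

# Logit partial effects: dProb/dx_k = Lambda(b'x)[1 - Lambda(b'x)] * b_k,
# evaluated at each observation.
import numpy as np

def logit_partial_effects(X, b):
    Lam = 1.0 / (1.0 + np.exp(-(X @ b)))
    return (Lam * (1.0 - Lam))[:, None] * b   # one row of partial effects per observation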
Estimated Partial Effects for Three Models
(Standard errors based on the delta method)
Partial Effect for a Dummy Variable Computed Using Means of Other Variables

Prob[yi = 1|xi, di] = F(β'xi + γdi), where d is a dummy variable such as Sex in our doctor model.
For the probit model, Prob[yi = 1|xi, di] = Φ(β'xi + γdi), where Φ is the normal CDF.
Partial effect of d:
  Prob[yi = 1|x̄, di = 1] - Prob[yi = 1|x̄, di = 0] = Φ(β̂'x̄ + γ̂) - Φ(β̂'x̄)
Partial Effect – Dummy Variable
Computing Partial Effects

Compute at the data means (PEA):
  Simple; inference is well defined.
  Not realistic for some variables, such as Sex.
Average the individual effects (APE):
  More appropriate.
  Asymptotic standard errors are slightly more complicated.
Partial Effects

Probability:                  Pi = F(β'xi)
Partial effect:               ∂Pi/∂xi = ∂F(β'xi)/∂xi = f(β'xi) β = di
Partial effect at the means:  f(β'x̄) β = f(β'[(1/n) Σi xi]) β
Average partial effect:       (1/n) Σi di = (1/n) Σi f(β'xi) β
Both are estimates of δ = E[di] under certain assumptions.
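A sketch of the two computations for the logit case (a probit version would substitute the normal density for the logistic one); X and b are assumed to be the data and the estimated coefficients:

# Partial Effects at the Means (PEA) vs. Average Partial Effects (APE), logit case.
import numpy as np

def logit_density(t):
    lam = 1.0 / (1.0 + np.exp(-t))
    return lam * (1.0 - lam)

def pea(X, b):
    xbar = X.mean(axis=0)
    return logit_density(xbar @ b) * b                           # f(b'xbar) * b

def ape(X, b):
    return (logit_density(X @ b)[:, None] * b).mean(axis=0)      # (1/n) sum_i f(b'x_i) * b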
The two approaches usually give similar answers, though sometimes the results differ substantially.

           Average Partial Effects    Partial Effects at Data Means
Age               0.00512                      0.00527
Income           -0.09609                     -0.09871
Female            0.13792                      0.13958
APE vs. Partial Effects at the Mean

Delta method for the average partial effect:
  Estimator of Var[(1/N) Σi=1..N PartialEffect_i] = G Var[θ̂] G'
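One way to build the G matrix is to differentiate the APE numerically with respect to the coefficients; a sketch assuming the estimate b_hat, its estimated covariance matrix V, and the ape function from the previous sketch are available:

# Delta-method standard errors for the average partial effects:
# Var[APE] ~= G Var[b_hat] G', with G = d(APE)/d(b') obtained by finite differences.
import numpy as np

def ape_delta_se(X, b_hat, V, ape_fn, eps=1e-6):
    k = len(b_hat)
    base = ape_fn(X, b_hat)
    G = np.zeros((len(base), k))
    for j in range(k):
        b_step = b_hat.copy()
        b_step[j] += eps
        G[:, j] = (ape_fn(X, b_step) - base) / eps
    cov_ape = G @ V @ G.T
    return np.sqrt(np.diag(cov_ape))

With the ape function above, ape_delta_se(X, b_hat, V, ape) returns one standard error per variable.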
I have a question. The question is as follows. We have a probit model. We used LM tests to test for heteroscedasticity in this model and found that there is heteroscedasticity in this model... How do we proceed now? What do we do to get rid of the heteroscedasticity?

Testing for heteroscedasticity in a probit model and then getting rid of it is not a common procedure. In fact, I do not remember seeing an applied (or theoretical) work which tests for heteroscedasticity and then uses a method to get rid of it.
See Econometric Analysis, 7th ed., pages 714-715.
The most common specification is the Harvey model,
  Prob(y = 1|x,z) = F[β'x / exp(γ'z)]
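A sketch of the Harvey specification's log likelihood for a probit, with β on the index variables x and γ on the variance variables z (illustrative; the arrays X, Z, and y are assumed given):

# Harvey-style heteroscedastic probit: Prob(y=1|x,z) = Phi[x'beta / exp(z'gamma)].
import numpy as np
from scipy.stats import norm

def harvey_probit_loglik(params, X, Z, y):
    kx = X.shape[1]
    beta, gamma = params[:kx], params[kx:]
    index = (X @ beta) / np.exp(Z @ gamma)    # scaled index
    q = 2.0 * y - 1.0
    return np.sum(norm.logcdf(q * index))     # maximize this over (beta, gamma)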
Odds Ratios
(This calculation is not meaningful if the model is not a binary logit model.)

Prob(y = 0|x,z) = 1 / [1 + exp(β'x + γz)]
Prob(y = 1|x,z) = exp(β'x + γz) / [1 + exp(β'x + γz)]

OR(x,z) = Prob(y = 1|x,z) / Prob(y = 0|x,z) = exp(β'x + γz) = exp(β'x) exp(γz)

OR(x,z+1) / OR(x,z) = exp(β'x) exp(γz + γ) / [exp(β'x) exp(γz)] = exp(γ)
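A quick numerical check of this algebra, with hypothetical values of β'x, γ, and z:

# exp(gamma) as the multiplicative change in the odds when z rises by one unit.
import numpy as np

beta_x, gamma, z = 0.4, 0.25, 2.0                 # hypothetical beta'x, gamma, and z
odds = lambda zz: np.exp(beta_x + gamma * zz)
print(odds(z + 1) / odds(z), np.exp(gamma))       # both equal exp(gamma) ~ 1.284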
Odds Ratio

exp(γ) = the multiplicative change in the odds ratio when z changes by 1 unit.
  dOR(x,z)/dx = OR(x,z)·β, not exp(β).
  The "odds ratio" is not a partial effect – it is not a derivative.
  It is only meaningful when the odds ratio is itself of interest and a change of the variable by a whole unit is meaningful.
  "Odds ratios" might be interesting for dummy variables.
Cautions About Reported Odds Ratios
The Linear Probability "Model"

Prob(y = 1|x) = β'x
E[y|x] = 0·Prob(y = 0|x) + 1·Prob(y = 1|x) = Prob(y = 1|x)
y = β'x + ε
The dependent variable equals zero for 99.1% of the observations. In the sample of 163,474 observations, the LHS variable equals 1 about 1,500 times.
2SLS for a binary dependent variable.
Prob(y = 1|x) = β'x
E[y|x] = 0·Prob(y = 0|x) + 1·Prob(y = 1|x) = Prob(y = 1|x)
y = β'x + ε
Predictions: nothing prevents P̂ < 0 or P̂ > 1.
Residuals: e = y - β̂'x = 1 - β̂'x if y = 1, or 0 - β̂'x if y = 0.
The standard errors make no sense because the stochastic properties of the "disturbance" are inconsistent with the observed variable.
Heteroscedasticity: the variance of y|x equals Prob(y = 0|x)·Prob(y = 1|x) = β'x(1 - β'x), so the "disturbances" are heteroscedastic. Users of the LPM always worry about clustering; they should also worry about heteroscedasticity.
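The sketch below, on simulated data rather than the application above, fits the LPM by OLS with heteroscedasticity-robust standard errors and compares its slope with the logit average partial effect:

# LPM vs. logit APE on simulated data. The LPM slope approximates the average
# partial effect; the LPM disturbances are heteroscedastic, so use robust (HC) variances.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
X = sm.add_constant(x)
p = 1.0 / (1.0 + np.exp(-(-0.3 + 0.8 * x)))            # true logit probabilities
y = (rng.uniform(size=n) < p).astype(float)

lpm = sm.OLS(y, X).fit(cov_type="HC1")                 # LPM with robust std. errors
logit = sm.Logit(y, X).fit(disp=0)
ape = logit.get_margeff(at="overall").margeff          # average partial effect

print("LPM slope:", lpm.params[1], " Logit APE:", ape[0])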
OLS approximates the partial effects "directly," without bothering with coefficients.
(Comparison shown: MLE average partial effects vs. OLS coefficients.)
GARCH Models: A Model for Time Series with Latent Heteroscedasticity
Bollerslev/Ghysels, 1974
ARCH Model
GARCH Model
Estimated GARCH Model

----------------------------------------------------------------------
GARCH MODEL
Dependent variable                        Y
Log likelihood function         -1106.60788
Restricted log likelihood       -1311.09637
Chi squared [ 2 d.f.]             408.97699
Significance level                   .00000
McFadden Pseudo R-squared          .1559676
Estimation based on N = 1974, K = 4
GARCH Model, P = 1, Q = 1
Wald statistic for GARCH =         3727.503
--------+-------------------------------------------------------------
Variable|  Coefficient   Standard Error   b/St.Er.  P[|Z|>z]  Mean of X
--------+-------------------------------------------------------------
        |Regression parameters
Constant|    -.00619         .00873         -.709     .4783
        |Unconditional Variance
Alpha(0)|    .01076***       .00312         3.445     .0006
        |Lagged Variance Terms
Delta(1)|    .80597***       .03015        26.731     .0000
        |Lagged Squared Disturbance Terms
Alpha(1)|    .15313***       .02732         5.605     .0000
        |Equilibrium variance, a0/[1-D(1)-A(1)]
EquilVar|    .26316          .59402          .443     .6577
--------+-------------------------------------------------------------
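Behind this output is the GARCH(1,1) conditional-variance recursion and a Gaussian log likelihood; a sketch (y is assumed to be the return series, and the parameter names mirror the table above):

# GARCH(1,1): sigma2_t = alpha0 + alpha1*eps_{t-1}^2 + delta1*sigma2_{t-1},
# with eps_t = y_t - mu. Gaussian log likelihood to be maximized over the parameters.
import numpy as np

def garch11_loglik(params, y):
    mu, alpha0, alpha1, delta1 = params
    eps = y - mu
    n = len(y)
    sigma2 = np.empty(n)
    sigma2[0] = eps.var()                      # start the recursion at the sample variance
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * eps[t-1]**2 + delta1 * sigma2[t-1]
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + eps**2 / sigma2)

Note that the equilibrium variance implied by the estimates above, .01076/(1 - .80597 - .15313) ≈ .263, reproduces the EquilVar entry.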
Two-Step Estimation (Murphy-Topel)

Setting: fitting a model which contains parameter estimates from another model.
Typical application: inserting a prediction from one model into another.
A. Procedures: how it's done.
B. Asymptotic results:
   1. Consistency
   2. Getting an appropriate estimator of the asymptotic covariance matrix
The Murphy-Topel result.
Application:  Equation 1: Number of children
              Equation 2: Labor force participation
Setting

Two-equation model:
  Model for y1 = f(y1 | x1, θ1)
  Model for y2 = f(y2 | x2, θ2, x1, θ1)
  (Note: not 'simultaneous' or even 'recursive.')
Procedure:
  Estimate θ1 by ML, with covariance matrix (1/n)V1.
  Estimate θ2 by ML, treating θ1 as if it were known.
  Correct the estimated asymptotic covariance matrix, (1/n)V2, for the estimator of θ2.
Murphy and Topel (1984, 2002) Results

Both MLEs are consistent.
Asy.Var[θ̂2] = (1/n)[V2 + V2 (C V1 C' - R V1 C' - C V1 R') V2]
  V1 = Asy.Var[√n (θ̂1 - θ1)]
  V2 = Asy.Var[√n (θ̂2 - θ2)] given θ1
  C = E[ (1/n) (∂logL2/∂θ2)(∂logL2/∂θ1') ]
  R = E[ (1/n) (∂logL2/∂θ2)(∂logL1/∂θ1') ]
M&T Computations

First equation:  θ̂1 = MLE
  V̂1 = [-(1/n) Σi=1..N Ĥi1]^-1   or   [(1/n) Σi=1..N ĝi1 ĝi1']^-1
Second equation:  θ̂2 = MLE given θ̂1
  V̂2 = [-(1/n) Σi=1..N Ĥi2]^-1   or   [(1/n) Σi=1..N ĝi2 ĝi2']^-1
C = (1/n) Σi=1..N [∂ln f2(yi2|xi2, θ̂2, xi1, θ̂1)/∂θ̂2] [∂ln f2(yi2|xi2, θ̂2, xi1, θ̂1)/∂θ̂1']
R = (1/n) Σi=1..N [∂ln f2(yi2|xi2, θ̂2, xi1, θ̂1)/∂θ̂2] [∂ln f1(yi1|xi1, θ̂1)/∂θ̂1']
Example
Equation 1: Number of Kids – Poisson Regression

  p(yi1|xi1, β) = exp(-λi) λi^yi1 / yi1!
  λi = exp(xi1'β)
  gi1 = xi1(yi1 - λi)
  V1 = [(1/n) Σi λi xi1 xi1']^-1   (inverse of the negative average Hessian)
Example – Continued
Equation 2: Labor Force Participation – Logit

  p(yi2|xi2, δ, α, xi1, β) = exp(di2)/[1 + exp(di2)] = Pi2
  di2 = (2yi2 - 1)[δ'xi2 + αλi],  λi = exp(xi1'β)
  Let zi2 = (xi2, λi) and θ2 = (δ, α), so di2 = (2yi2 - 1) θ2'zi2
  gi2 = (yi2 - Pi2) zi2
  V2 = [(1/n) Σi Pi2(1 - Pi2) zi2 zi2']^-1
Murphy and Topel Correction

  C = (1/N) Σi=1..N [(yi2 - Pi2) zi2] [(yi2 - Pi2) α λi xi1]'
  R = (1/N) Σi=1..N [(yi2 - Pi2) zi2] [(yi1 - λi) xi1]'
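A sketch of how these pieces assemble into the corrected covariance matrix, in Python for illustration; the per-observation terms and the uncorrected V1 and V2 (scaled as n·Var, as in the program below) are assumed to have been computed from the two fitted equations:

# Murphy-Topel correction for the two-step estimator. Assumed inputs:
#   Z2 : n x k2 regressors of equation 2 (X2 plus lambda);  X1 : n x k1 regressors of equation 1
#   g2 = (y2 - P2);  g1 = (y1 - lambda);  c2 = (y2 - P2) * alpha * lambda
#   V1, V2 : uncorrected covariance matrices scaled as n * Var(theta_hat)
import numpy as np

def murphy_topel_cov(Z2, X1, g1, g2, c2, V1, V2, n):
    C = (Z2 * g2[:, None]).T @ (X1 * c2[:, None]) / n
    R = (Z2 * g2[:, None]).T @ (X1 * g1[:, None]) / n
    A = C @ V1 @ C.T - R @ V1 @ C.T - C @ V1 @ R.T
    return (V2 + V2 @ A @ V2) / n              # corrected Asy.Var[theta2_hat]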
Two Step Estimation of LFP Model

? Data transformations. Number of kids, scale income variables.
Create   ; Kids = kl6 + k618
         ; income = faminc/10000 ; Wifeinc = ww*whrs/1000 $
? Equation 1, number of kids. Standard Poisson fertility model.
? Fit the equation, collect parameters BETA and covariance matrix V1,
? then compute fitted values and derivatives.
Namelist ; X1 = one,wa,we,income,wifeinc $
Poisson  ; Lhs = kids ; Rhs = X1 $
Matrix   ; Beta = b ; V1 = N*VARB $
Create   ; Lambda = Exp(X1'Beta) ; gi1 = Kids - Lambda $
? Set up the logit labor force participation model.
? Fit the logit model and collect results. Delta = coefficients on X2,
? Alpha = coefficient on the fitted number of kids.
Namelist ; X2 = one,wa,we,ha,he,income ; Z2 = X2,Lambda $
Logit    ; Lhs = lfp ; Rhs = Z2 $
Calc     ; alpha = b(kreg) ; K2 = Col(X2) $
Matrix   ; delta = b(1:K2) ; Theta2 = b ; V2 = N*VARB $
? The Poisson derivative with respect to beta is (kids_i - lambda_i)*X1.
Create   ; di = delta'X2 + alpha*Lambda
         ; pi2 = exp(di)/(1+exp(di))
         ; gi2 = LFP - Pi2
? These are the terms that are used to compute R and C.
         ; ci = gi2*gi2*alpha*lambda
         ; ri = gi2*gi1 $
Matrix   ; C = 1/n*Z2'[ci]X1
         ; R = 1/n*Z2'[ri]X1
         ; A = C*V1*C' - R*V1*C' - C*V1*R'
         ; V2S = V2+V2*A*V2 ; V2s = 1/N*V2S $
? Compute matrix products and report results.
Matrix   ; Stat(Theta2,V2s,Z2) $
Estimated Equation 1: E[Kids]

+---------------------------------------------+
| Poisson Regression                          |
| Dependent variable              KIDS        |
| Number of observations          753         |
| Log likelihood function         -1123.627   |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant    3.34216852      .24375192      13.711    .0000
 WA          -.06334700      .00401543     -15.776    .0000    42.5378486
 WE          -.02572915      .01449538      -1.775    .0759    12.2868526
 INCOME       .06024922      .02432043       2.477    .0132     2.30805950
 WIFEINC     -.04922310      .00856067      -5.750    .0000     2.95163126
Two Step Estimator

+---------------------------------------------+
| Multinomial Logit Model                     |
| Dependent variable              LFP         |
| Number of observations          753         |
| Log likelihood function         -351.5765   |
| Number of parameters            7           |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Characteristics in numerator of Prob[Y = 1]
 Constant   33.1506089      2.88435238      11.493    .0000
 WA          -.54875880      .05079250     -10.804    .0000    42.5378486
 WE          -.02856207      .05754362       -.496    .6196    12.2868526
 HA          -.01197824      .02528962       -.474    .6358    45.1208499
 HE          -.02290480      .04210979       -.544    .5865    12.4913679
 INCOME       .39093149      .09669418       4.043    .0001     2.30805950
 LAMBDA     -5.63267225      .46165315     -12.201    .0000     1.59096946

With Corrected Covariance Matrix
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Constant   33.1506089      5.41964589       6.117    .0000
 WA          -.54875880      .07780642      -7.053    .0000
 WE          -.02856207      .12508144       -.228    .8194
 HA          -.01197824      .02549883       -.470    .6385
 HE          -.02290480      .04862978       -.471    .6376
 INCOME       .39093149      .27444304       1.424    .1543
 LAMBDA     -5.63267225     1.07381248      -5.245    .0000