Econometrics
Week 8
Institute of Economic Studies
Faculty of Social Sciences
Charles University in Prague
Fall 2012
1 / 25
Recommended Reading
For today
Instrumental Variables Estimation and Two Stage Least
Squares
Chapter 15 (pp. 461 – 491).
For next week
Simultaneous Equations Models
Chapter 16 (pp. 501 – 523).
2 / 25
Today’s talk
We will further study the problem of endogenous
explanatory variables in multiple regression models.
In the presence of omitted variables, OLS is generally inconsistent.
When a suitable proxy variable is found for an unobserved
variable, the problem can be solved.
But it is often difficult to find such a proxy.
We will take more rigorous approaches to the endogeneity
problem:
Instrumental Variables (IV) estimator.
Two stage least squares (2SLS).
Today, we will show that IV can be used to obtain a
consistent estimator in the presence of omitted variables.
3 / 25
Why use Instrumental Variables?
Consider a simple regression model:
y = β0 + β1 x + u,
where we think that x and u are correlated:
Cov(x, u) ≠ 0.
To obtain consistent estimators for β0 and β1 in this case,
we need a new variable.
Instrumental Variable z
This variable has to satisfy the following properties:
(1) z is uncorrelated with u, Cov(z, u) = 0.
(2) z is correlated with x, Cov(z, x) ≠ 0.
(1) ⇒ z is exogenous in the regression equation.
(2) ⇒ z must be related to the endogenous variable x.
4 / 25
Valid Instruments
As Cov(z, u) = 0 can never be tested (u is an unobservable
error), we have to rely on economic theory ⇒ use
common sense to decide about exogeneity.
Cov(z, x) ≠ 0 can easily be tested by running a simple
regression:
x = π0 + π1 z + ν
Cov(z, x) ≠ 0 holds if and only if π1 ≠ 0.
Thus we should be able to reject the null hypothesis
H0 : π1 = 0
against the two-sided alternative HA : π1 ≠ 0.
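A minimal sketch of this first-stage relevance check in Python on simulated data (my own illustration, not part of the lecture; numpy assumed available):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    z = rng.normal(size=n)                      # candidate instrument
    x = 0.7 * z + rng.normal(size=n)            # x depends on z, so the true pi1 = 0.7

    # OLS of x on a constant and z: x = pi0 + pi1*z + v
    Z = np.column_stack([np.ones(n), z])
    pi_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ pi_hat
    sigma2 = resid @ resid / (n - 2)
    var_pi = sigma2 * np.linalg.inv(Z.T @ Z)    # homoskedastic OLS covariance matrix
    t_pi1 = pi_hat[1] / np.sqrt(var_pi[1, 1])   # t statistic for H0: pi1 = 0
    print(pi_hat[1], t_pi1)                     # a large |t| rejects H0, i.e. z is relevant
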
5 / 25
Example: College Education
Consider an equation for returns to college education
among young workers:
wages = β0 + β1 college + u
People freely choose to go to college ⇒ Cov(college, u) ≠ 0.
A good instrumental variable is thus one which:
makes going to college more likely (relevance).
does not affect wages directly (exogeneity).
6 / 25
Example: College Education
Good instrumental variables in this case might be:
Distance between pre-college residence and college
Those living in the proximity of college will be more likely
to go to college (relevance)
Pre-college residence is usually the parent’s decision
(exogeneity).
Father’s education
An educated father will tend to inform the child better
about the benefits of education (relevance).
Father’s education is the father’s decision (exogeneity).
Now suppose we have a valid instrument z. What do we do
with it?
7 / 25
IV Estimation
We use it to consistently estimate the parameters. Taking
the covariance of both sides of y = β0 + β1 x + u with z
identifies the β1 parameter:
Cov(z, y) = β1 Cov(z, x) + Cov(z, u).
Since Cov(z, u) = 0 and Cov(z, x) ≠ 0 (note that if z and x
were uncorrelated, this step would fail):
β1 = Cov(z, y) / Cov(z, x).
Cov(z, x) ≠ 0 ⇒ z is relevant.
Cov(z, u) = 0 ⇒ z is exogenous.
Hence, β1 is identified and, given a random sample, we have:
Instrumental Variable (IV) estimator
β̂1,IV = Σ(zi − z̄)(yi − ȳ) / Σ(zi − z̄)(xi − x̄),
where the sums run over i = 1, . . . , n.
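A minimal numerical sketch of this estimator on simulated data, compared with OLS (my own illustration, not part of the lecture):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000
    z = rng.normal(size=n)                        # instrument: exogenous and relevant
    u = rng.normal(size=n)                        # structural error
    x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # endogenous regressor: Cov(x, u) > 0
    y = 2.0 + 1.5 * x + u                         # true beta0 = 2, beta1 = 1.5

    zd, xd, yd = z - z.mean(), x - x.mean(), y - y.mean()
    beta1_iv = (zd * yd).sum() / (zd * xd).sum()  # IV estimator from the formula above
    beta1_ols = (xd * yd).sum() / (xd * xd).sum() # OLS for comparison
    print(beta1_iv, beta1_ols)                    # IV is close to 1.5; OLS is biased upward here
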
8 / 25
IV Estimation cont.
Intercept can be estimated as:
β̂0,IV = ȳ − β̂1,IV x̄.
NOTE
When z = x, we have the OLS estimator of β1.
In other words, when x is exogenous, the IV estimator is
identical to the OLS estimator.
The IV estimator is consistent: plim(β̂1,IV) = β1.
9 / 25
Statistical Inference with IV Estimation
IV estimates are asymptotically normal ⇒ we can use
standard errors and t statistics for inference.
Usually, we impose the homoskedasticity assumption:
E(u²|z) = σ² = Var(u)
The asymptotic variance of β̂1,IV
Under the homoskedasticity assumption, the asymptotic
variance of β̂1,IV is:
σ² / (n σx² ρ²x,z),
where σx² = Var(x) and ρ²x,z is the square of the correlation
between x and z.
10 / 25
Statistical Inference with IV Estimation cont.
This is important as it provides us with the standard errors.
Standard errors of β̂1,IV
The (asymptotic) standard error of β̂1,IV can be estimated as:
√[ σ̂² / (SSTx · R²x,z) ],
where σ̂² can be estimated from the IV residuals, SSTx is the
total sum of squares of x, and R²x,z is the simple R² from the
regression of x on z.
The resulting standard errors allow us to construct t statistics
for testing hypotheses about β1 and to build confidence
intervals for β1.
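A sketch of this standard error computed by hand on simulated data (illustration only, reusing the data-generating design of the earlier sketch; numpy assumed):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000
    z = rng.normal(size=n)
    u = rng.normal(size=n)
    x = 0.8 * z + 0.5 * u + rng.normal(size=n)
    y = 2.0 + 1.5 * x + u

    zd, xd, yd = z - z.mean(), x - x.mean(), y - y.mean()
    b1 = (zd * yd).sum() / (zd * xd).sum()
    b0 = y.mean() - b1 * x.mean()
    u_hat = y - b0 - b1 * x                      # IV residuals
    sigma2_hat = (u_hat ** 2).sum() / (n - 2)    # estimate of sigma^2
    sst_x = (xd ** 2).sum()                      # total sum of squares of x
    r2_xz = np.corrcoef(x, z)[0, 1] ** 2         # R^2 from the simple regression of x on z
    se_b1 = np.sqrt(sigma2_hat / (sst_x * r2_xz))
    print(b1, se_b1, b1 / se_b1)                 # estimate, standard error, t statistic for H0: beta1 = 0
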
11 / 25
IV versus OLS Estimation
The standard errors in the IV case differ from the OLS case
only in the R²x,z term.
Since R²x,z < 1, the IV standard errors are always larger
than the OLS ones.
The stronger the correlation between z and x, the smaller
the IV standard errors (when the correlation equals 1, IV is
equivalent to OLS).
12 / 25
The Effect of Poor Instruments
What happens if Cov(z, u) ≠ 0?
IV estimator will be inconsistent.
However, it can still be better than OLS.
Asymptotic bias of IV and OLS estimators
plim β̂1,IV = β1 + [Corr(z, u) / Corr(z, x)] · (σu / σx),
plim β̂1,OLS = β1 + Corr(x, u) · (σu / σx).
Thus the asymptotic bias in IV will be smaller than the
asymptotic bias in OLS if:
|Corr(z, u) / Corr(z, x)| < |Corr(x, u)|
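For instance, with hypothetical values Corr(z, u) = 0.02, Corr(z, x) = 0.2 and Corr(x, u) = 0.2, we get |Corr(z, u)/Corr(z, x)| = 0.1 < 0.2 = |Corr(x, u)|, so even this imperfect instrument has a smaller asymptotic bias than OLS.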
13 / 25
IV estimation in the Multiple Regression Case
We can extend the IV estimation to multiple regression.
Let’s start with the case where only one of the explanatory
variables is correlated with the error:
y1 = β0 + β1 y2 + β2 z1 + u1 .
This is called structural equation where we distinguish
between endogenous and exogenous variables.
y1 is clearly endogenous, as it is correlated with u1.
z1 is exogenous (uncorrelated with u1, Cov(z1, u1) = 0).
y2 is suspected of being endogenous, i.e., correlated with
u1: Cov(y2, u1) ≠ 0.
14 / 25
IV estimation in the Multiple Regression Case
We know that the OLS estimator will be biased and
inconsistent ⇒ we need to find a proper instrumental
variable z2 for y2, with Cov(z2, u1) = 0.
z2 also needs to be correlated with y2 :
y2 = π0 + π1 z1 + π2 z2 + ν2 .
The key identification condition is π2 ≠ 0.
This reduced form equation regresses the endogenous
variable on all exogenous variables.
15 / 25
Two Stage Least Squares
We may have multiple instruments for the endogenous
variable, say z2 and z3.
In this case, each instrument gives a different IV estimator.
BUT: none of these IV estimators would be efficient.
Since z1, z2 and z3 are all uncorrelated with u1, any linear
combination of the exogenous variables would be a valid IV.
Thus we choose the linear combination that is most highly
correlated with y2 .
This estimator is known as the two stage least squares
(2SLS).
16 / 25
Two Stage Least Squares cont.
Consider the following model:
y1 = β0 + β1 y2 + β2 z1 + u1 .
The 2SLS estimate is obtained in two stages:
Two-stage least squares (2SLS)
(1): Obtain OLS fitted values of the endogenous variable:
ŷ2 = π̂0 + π̂1 z1 + π̂2 z2 + π̂3 z3
(2): Estimate the structural equation by OLS with ŷ2 in place
of y2: y1 = β0 + β1 ŷ2 + β2 z1 + u1
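A sketch of these two stages done by hand with OLS on simulated data (my own illustration with made-up coefficients; see the caveat on standard errors below):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000
    z1, z2, z3 = rng.normal(size=(3, n))         # exogenous variables / instruments
    u1 = rng.normal(size=n)
    y2 = 1.0 + 0.5 * z1 + 0.6 * z2 + 0.4 * z3 + 0.5 * u1 + rng.normal(size=n)  # endogenous
    y1 = 1.0 + 1.5 * y2 + 0.7 * z1 + u1          # true coefficients (1.0, 1.5, 0.7)

    # Stage 1: regress y2 on all exogenous variables, keep fitted values
    Z = np.column_stack([np.ones(n), z1, z2, z3])
    y2_hat = Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]

    # Stage 2: regress y1 on the fitted values and the included exogenous z1
    X2 = np.column_stack([np.ones(n), y2_hat, z1])
    beta_2sls = np.linalg.lstsq(X2, y1, rcond=None)[0]
    print(beta_2sls)                             # roughly (1.0, 1.5, 0.7)
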
But let STATA do the estimation for you to get the correct
(robust) standard errors.
We can extend to multiple endogenous variables.
BUT, we need at least as many instruments as there are
endogenous variables
(the formal identification conditions are stated in the
Advanced Econometrics course).
17 / 25
Addressing Errors-in-Variables with IV
Estimation
IV can be used not only to solve the omitted variables
problem, but also measurement error problem.
Recall (from Ch.9) the equation:
y = β0 + β1 x∗1 + β2 x2 + u,
where y and x2 are observed but x∗1 is not.
Instead, we observe x1 = x∗1 + e1, with Cov(x∗1, e1) = 0.
Substituting x1 leaves u − β1 e1 in the error term, which is
correlated with x1 ⇒ biased and inconsistent OLS.
If there is a z such that Corr(z, u) = 0, Corr(z, e1) = 0 and
Corr(z, x1) ≠ 0, IV will remove this bias.
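A small simulated illustration of this measurement-error case (my own sketch; the x2 term from the equation above is dropped for brevity):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 2000
    x1_star = rng.normal(size=n)                   # true but unobserved regressor
    x1 = x1_star + rng.normal(size=n)              # observed with classical measurement error e1
    z = x1_star + rng.normal(size=n)               # a second noisy measure: correlated with x1, not with u or e1
    u = rng.normal(size=n)
    y = 1.0 + 2.0 * x1_star + u                    # true beta1 = 2

    x1d, zd, yd = x1 - x1.mean(), z - z.mean(), y - y.mean()
    b1_ols = (x1d * yd).sum() / (x1d ** 2).sum()   # attenuated toward zero (about 1.0 here)
    b1_iv = (zd * yd).sum() / (zd * x1d).sum()     # close to the true value 2
    print(b1_ols, b1_iv)
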
18 / 25
Testing for Endogeneity
When all explanatory variables are exogenous, both OLS
and 2SLS are consistent estimators.
BUT: 2SLS is less efficient than OLS ⇒ OLS is preferred.
If we have an endogeneity problem, only IV (2SLS) is consistent.
Thus it is good to have a test for endogeneity (to see if the
2SLS is necessary).
Hausman test for endogeneity
H0 : OLS and IV are consistent.
We simply compute both estimates and use Hausman test
for comparison.
(more about this test during the Advanced Econometrics
course.)
19 / 25
Testing for Endogeneity cont.
Another alternative is to use a regression-based test.
If y2 is endogenous, then ν2 (from the reduced form
equation) and u1 from the structural model will be
correlated.
Regression-based test for endogeneity
y1 = β0 + β1 y2 + β2 z1 + β3 z2 + u1
(1): Regress the potentially endogenous variable y2 on all
exogenous variables and obtain the residuals ν̂2:
y2 = π0 + π1 z1 + π2 z2 + π3 z3 + π4 z4 + ν2.
(2): Run the structural model including the endogenous
variable and the residual ν̂2:
y1 = β0 + β1 y2 + β2 z1 + β3 z2 + δ1 ν̂2 + u1
(3): If H0 : δ1 = 0 is rejected against HA : δ1 ≠ 0 at a small
significance level ⇒ Cov(ν2, u1) ≠ 0 ⇒ y2 is endogenous.
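A sketch of this regression-based test on simulated data (my own illustration with made-up coefficients; numpy assumed):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 1000
    z1, z2, z3, z4 = rng.normal(size=(4, n))
    u1 = rng.normal(size=n)
    y2 = 0.5 * z1 + 0.4 * z2 + 0.6 * z3 + 0.5 * z4 + 0.7 * u1 + rng.normal(size=n)  # endogenous by construction
    y1 = 1.0 + 1.5 * y2 + 0.5 * z1 + 0.3 * z2 + u1

    # Step (1): reduced form for y2 on all exogenous variables, keep residuals
    Z = np.column_stack([np.ones(n), z1, z2, z3, z4])
    v2_hat = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]

    # Step (2): structural equation augmented with the reduced-form residual
    X = np.column_stack([np.ones(n), y2, z1, z2, v2_hat])
    beta = np.linalg.lstsq(X, y1, rcond=None)[0]
    resid = y1 - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

    # Step (3): t statistic on delta1; a large |t| points to y2 being endogenous
    print(beta[-1] / se[-1])
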
20 / 25
Testing Overidentification Restrictions
If we have only one instrument for our endogenous
variable, we cannot test whether the instrument is
uncorrelated with the error.
We say that the model is just identified.
In case of multiple instruments for each endogenous
variable, it is possible to test overidentifying
restrictions to see if some of the instruments are
correlated with the error.
This is called testing the overidentifying restrictions.
21 / 25
Testing Overidentification Restrictions cont.
(1): Estimate the structural model using IV and obtain
residuals, û1 .
(2): Regress û1 on all exogenous variables and obtain the R².
Test H0: all IVs are uncorrelated with u1, using
LM = nR² ∼ χ²(q) (asymptotically),
where q is the number of instrumental variables from
outside minus the total number of endogenous explanatory
variables.
If we reject H0, at least some of the IVs are not
exogenous.
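A sketch of this LM test on simulated data, with one endogenous regressor and two outside instruments (my own illustration; numpy and scipy assumed):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n = 1000
    z1, z2, z3 = rng.normal(size=(3, n))
    u1 = rng.normal(size=n)
    y2 = 1.0 + 0.5 * z1 + 0.6 * z2 + 0.4 * z3 + 0.5 * u1 + rng.normal(size=n)
    y1 = 1.0 + 1.5 * y2 + 0.7 * z1 + u1

    # (1): 2SLS estimate of the structural equation (z2, z3 instrument y2)
    Z = np.column_stack([np.ones(n), z1, z2, z3])
    y2_hat = Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]
    X2 = np.column_stack([np.ones(n), y2_hat, z1])
    b = np.linalg.lstsq(X2, y1, rcond=None)[0]
    u1_hat = y1 - b[0] - b[1] * y2 - b[2] * z1     # residuals use the actual y2, not y2_hat

    # (2): regress the 2SLS residuals on all exogenous variables, form LM = n * R^2
    e = u1_hat - Z @ np.linalg.lstsq(Z, u1_hat, rcond=None)[0]
    r2 = 1 - (e @ e) / ((u1_hat - u1_hat.mean()) ** 2).sum()
    lm = n * r2
    q = 2 - 1                                      # outside instruments minus endogenous regressors
    print(lm, stats.chi2.sf(lm, q))                # a small p-value suggests some instrument is not exogenous
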
22 / 25
Testing for Heteroskedasticity
Heteroskedasticity in 2SLS raises the same issues as with
OLS.
We can use an adjusted Breusch-Pagan test.
First, we compute the 2SLS residuals û1.
Second, we regress the squared residuals û1² on all exogenous
variables z1, z2, . . . , zm.
Third, we use the F statistic for joint significance.
The null hypothesis of homoskedasticity is rejected if zj are
jointly significant.
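A sketch of this check on simulated data (my own illustration; the 2SLS setup from the earlier sketch is repeated so the snippet runs on its own; numpy and scipy assumed):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n = 1000
    z1, z2, z3 = rng.normal(size=(3, n))
    u1 = rng.normal(size=n)
    y2 = 1.0 + 0.5 * z1 + 0.6 * z2 + 0.4 * z3 + 0.5 * u1 + rng.normal(size=n)
    y1 = 1.0 + 1.5 * y2 + 0.7 * z1 + u1

    # 2SLS residuals, constructed as in the earlier sketches
    Z = np.column_stack([np.ones(n), z1, z2, z3])
    y2_hat = Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]
    X2 = np.column_stack([np.ones(n), y2_hat, z1])
    b = np.linalg.lstsq(X2, y1, rcond=None)[0]
    u1_hat = y1 - b[0] - b[1] * y2 - b[2] * z1

    # Regress squared residuals on the exogenous variables and test joint significance
    usq = u1_hat ** 2
    e = usq - Z @ np.linalg.lstsq(Z, usq, rcond=None)[0]
    r2 = 1 - (e @ e) / ((usq - usq.mean()) ** 2).sum()
    k = Z.shape[1] - 1                             # number of exogenous regressors (excluding the constant)
    F = (r2 / k) / ((1 - r2) / (n - k - 1))
    print(F, stats.f.sf(F, k, n - k - 1))          # a small p-value rejects homoskedasticity
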
23 / 25
Testing for Serial Correlation
Applying 2SLS to time series data brings the same
considerations as when using OLS (Lectures 2 – 4).
We need a slight adjustment to test for serial correlation.
First, we compute the 2SLS residuals ût.
Second, we re-estimate the structural model by 2SLS,
including the lagged residual ût−1 and using the same
instruments as originally; then we test the significance of
the coefficient on ût−1.
It is possible to use 2SLS on a quasi-differenced model,
using quasi-differenced instruments.
24 / 25
Thank you
Thank you very much for your attention!
25 / 25