SIMULTANEOUS EQUATIONS REGRESSION MODEL
INTRODUCTION
The classical linear regression model, the general linear regression model, and the seemingly unrelated regressions model all make the following assumption:
The error term is uncorrelated with each explanatory variable.
If this assumption is violated, then the OLS, FGLS, and SUR estimators produce biased estimates in small samples and inconsistent estimates in large samples.
SOURCES OF CORRELATION BETWEEN THE ERROR TERM AND EXPLANATORY VARIABLE
The most important sources of correlation between the error term and an explanatory variable are omitted
confounding variables and reverse causation.
Omitted Confounding Variable
Consider the following wage equation,
Y = β1 + β2X + μ,
where Y is the worker’s wage, X is the worker’s years of education, and μ is the error term. We want to analyze
the effect of education on the wage. Let Z be the worker’s innate ability. Since we omit Z from the
equation its effect is included in μ. Workers with more innate ability have higher wages, and therefore
larger errors. Also, workers with more innate ability have more education, and therefore higher values of
X. Thus, the error term and education are positively correlated. The OLS, FGLS, and SUR estimators
will include the effect of innate ability in the estimate of β2. This results in a biased estimate of the effect
of education on the wage.
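This bias can be illustrated with a small simulation. The sketch below (all coefficient values are hypothetical, chosen for illustration) generates wages from education X and omitted ability Z, then shows that the OLS slope on X overstates the true effect of education:

```python
import numpy as np

# Hypothetical data generating process: the true effect of education is 0.5,
# but ability Z (effect 0.8) is omitted and raises both X and the error.
rng = np.random.default_rng(0)
n = 100_000
Z = rng.normal(size=n)                  # innate ability (unobserved)
X = 2.0 * Z + rng.normal(size=n)        # education rises with ability
Y = 1.0 + 0.5 * X + 0.8 * Z + rng.normal(size=n)

# OLS slope of Y on X alone: Cov(X, Y) / Var(X)
b_ols = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
print(b_ols)  # about 0.82: the 0.5 education effect plus omitted-ability bias
```

The bias here is 0.8·Cov(X, Z)/Var(X) = 0.8·2/5 = 0.32, so OLS converges to about 0.82 rather than 0.5.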
Reverse Causation
Consider the following simple Keynesian model of income determination comprised of two equations: a
consumption function, and equilibrium condition
C = a + bY + ε
Y=C+I
C is aggregate consumption; Y is aggregate income; I is exogenous investment; a and b are parameters;
and ε is an error term that summarizes all factors other than Y that influence C (e.g., wealth, interest rate).
Now, suppose that ε increases. This will directly increase C in the consumption function. However, the
equilibrium condition tells us that the increase in C will increase Y. Therefore, ε and Y are positively
correlated. The OLS, FGLS, and SUR estimators will produce a biased estimate of the effect of income
on consumption because it will capture the reverse effect of consumption on income.
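A short simulation makes the direction of this bias concrete. The parameter values below are hypothetical; the model is solved for its reduced form, data are generated, and the OLS slope of C on Y is compared with the true b:

```python
import numpy as np

# Hypothetical Keynesian model: C = a + b*Y + e, Y = C + I, with true b = 0.6.
rng = np.random.default_rng(1)
n = 100_000
a, b = 10.0, 0.6
I = rng.normal(20.0, 2.0, size=n)      # exogenous investment
e = rng.normal(0.0, 1.0, size=n)       # consumption shock

# Reduced form: substitute the consumption function into Y = C + I
Y = (a + I + e) / (1.0 - b)
C = Y - I

# OLS slope of C on Y overstates b because e and Y are positively correlated
b_hat = np.cov(Y, C)[0, 1] / np.var(Y, ddof=1)
print(b_hat)  # about 0.68, not the true 0.6
```

Here Cov(Y, e) = Var(e)/(1 − b) > 0, so the OLS estimate of the marginal propensity to consume is biased upward.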
OBJECTIVE
In this section of the course, we will examine statistical models that assume the error term is correlated
with an explanatory variable. This can result from either an omitted confounding variable or reverse
causation. We will spend most of our time on the simultaneous equations model. This model assumes
that the error term is correlated with an explanatory variable because of reverse causation. However, the
estimators we develop for the simultaneous equations model can be used for any model for which the
error term is correlated with an explanatory variable because of an omitted confounding variable.
INTRODUCTION TO THE SIMULTANEOUS EQUATIONS MODEL
When a single equation is embedded in a system of simultaneous equations, at least one of the right-hand
side variables will be endogenous, and therefore the error term will be correlated with at least one of the
right-hand side variables. In this case, the true data generation process is not described by the classical
linear regression model, general linear regression model, or seemingly unrelated regression model; rather,
it is described by a simultaneous equations regression model. If you use the OLS estimator, FGLS
estimator, SUR estimator, or ISUR estimator, you will get biased and inconsistent estimates of the
population parameters.
Definitions and Basic Concepts
Endogenous variable – a variable whose value is determined within an equation system. The values of
the endogenous variables are the solution of the equation system. More generally, any variable
that is correlated with the error term.
Exogenous variable – a variable whose value is determined outside an equation system. More generally,
any variable not correlated with the error term.
Structural equation – an equation that has one or more endogenous right-hand side variables.
Reduced form equation – an equation for which all right-hand side variables are exogenous.
Structural parameters – the parameters of a structural equation.
Reduced form parameters – the parameters of a reduced form equation.
THE IDENTIFICATION PROBLEM
Before you estimate a structural equation, you must first determine if it is identified. An equation is
identified if you have enough information to get meaningful estimates of its parameters. A meaningful
estimate is one that has a useful interpretation. An equation is not identified if you don’t have enough
information to get meaningful estimates of its parameters. If an equation is not identified, then estimating
its parameters is meaningless. This is because the estimates you obtain will have no useful interpretation.
Example
You want to estimate the price elasticity of demand for a good. You collect annual data on price (P) and
quantity bought and sold (Q) for the period 1980 to 2015. You estimate the following equation,
lnQ = γ1 + γ2 lnP + μ
where ln designates the natural logarithm. The problem is that this equation has no identity: it could be a
demand equation, a supply equation, or some combination of both. Therefore, γ2 might measure the price
elasticity of demand, the price elasticity of supply, or some combination of both. A regression of lnQ on
lnP therefore has no useful interpretation.
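A simulation illustrates the problem. In the hypothetical market below, equilibrium prices and quantities are generated jointly by a demand curve with elasticity −1 and a supply curve with elasticity +1; the regression of lnQ on lnP recovers neither:

```python
import numpy as np

# Hypothetical market in logs: demand q = -1.0*p + ed, supply q = 1.0*p + es.
# Equilibrium solves the two equations for p and q in each period.
rng = np.random.default_rng(6)
n = 100_000
ed = rng.normal(size=n)   # demand shocks
es = rng.normal(size=n)   # supply shocks
p = (ed - es) / 2.0       # equilibrium log price
q = (ed + es) / 2.0       # equilibrium log quantity

g2 = np.cov(p, q)[0, 1] / np.var(p, ddof=1)
print(g2)  # near 0: a mixture of -1 (demand) and +1 (supply), neither elasticity
```

With equally variable demand and supply shocks, the fitted slope is a meaningless average of the two curves.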
Classifying Structural Equations
Every structural equation can be placed under one of three categories.
Unidentified Equation. Not enough information to get a meaningful estimate.
Exactly Identified Equation. Just enough information to get a meaningful estimate.
Overidentified Equation. More than enough information to get a meaningful estimate.
Exclusion Restrictions
The most often used way to identify a structural equation is to use prior information, provided by
economic theory, to exclude from the equation certain variables that appear elsewhere in the model. This
is called obtaining identification through exclusion restrictions. To exclude a variable from a structural
equation, we restrict the value of its coefficient to zero. This type of zero fixed-value restriction is called
an exclusion restriction because it has the effect of omitting a variable from the equation to obtain
identification.
Rank and Order Condition for Identification
Exclusion restrictions are most often used to identify a structural equation in a simultaneous equations
model. When using exclusion restrictions, you can use two general rules to check if identification is
achieved. These are the rank condition and the order condition. The order condition is a necessary but
not sufficient condition for identification. The rank condition is both a necessary and sufficient condition
for identification. Because the rank condition is more difficult to apply, many economists only check the
order condition and gamble that the rank condition is satisfied. This is usually, but not always, the case.
Order Condition
The order condition is a simple counting rule that you can use to determine if one structural equation in a
system of linear simultaneous equations is identified. Define the following:
G = total number of endogenous variables in the model (i.e., in all equations that comprise the
model).
K = total number of variables (endogenous and exogenous) excluded in the equation being
checked for identification.
The order condition is as follows.
If K = G – 1, the equation is exactly identified.
If K > G – 1, the equation is overidentified.
If K < G – 1, the equation is unidentified.
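The counting rule can be written as a small helper function (a sketch; the function name is ours):

```python
def order_condition(G: int, K: int) -> str:
    """Order condition: G = endogenous variables in the whole model,
    K = variables (endogenous and exogenous) excluded from the
    equation being checked for identification."""
    if K == G - 1:
        return "exactly identified"
    if K > G - 1:
        return "overidentified"
    return "unidentified"

# Demand/supply model: G = 2 (P and Q). A demand equation excluding one
# exogenous supply shifter has K = 1 = G - 1.
print(order_condition(2, 1))  # exactly identified
print(order_condition(2, 2))  # overidentified
print(order_condition(2, 0))  # unidentified
```

Remember that this is only the necessary order condition; the rank condition must still hold for identification.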
SPECIFICATION OF A SIMULTANEOUS EQUATIONS MODEL
A simultaneous equation regression model has two alternative specifications: reduced form and structural
form. The reduced-form specification is comprised of M reduced-form equations and a set of
assumptions about the error terms in the reduced form equations. The reduced-form specification of the
model is usually not estimated, because it provides limited information about the economic process in
which you are interested. The structural-form specification is comprised of M structural equations and a
set of assumptions about the error terms in the structural equations. The structural-form specification of
the model is the specification most often estimated. This is because it provides more information about
the economic process in which you are interested.
Specification of the Structural Form
A set of assumptions defines the specification of the structural form of a simultaneous equations
regression model. The key assumption is that the error term is correlated with one or more explanatory
variables. There are several alternative specifications of the structural form of the model depending on the
remaining assumptions we make about the error term. For example, if we assume that the error term has
non-constant variance, then we have a simultaneous equation regression model with heteroscedasticity. If
we assume the errors in one or more equations are correlated, then we have a simultaneous equation
regression model with autocorrelation.
ESTIMATION
Single Equation Vs System Estimation
Two alternative approaches can be used to estimate a simultaneous equations regression model: single
equation estimation and system estimation.
Single Equation Estimation
Single equation estimation involves estimating either one equation in the model, or two or more equations
in the model separately. For example, suppose you have a simultaneous equation regression model that
consists of two equations: a demand equation and a supply equation. Suppose your objective is to obtain
an estimate of the price elasticity of demand. In this case, you might estimate the demand equation only.
Suppose your objective is to obtain estimates of price elasticity of demand and price elasticity of supply.
In this case, you might estimate the demand equation by itself and the supply equation by itself.
System Estimation
System estimation involves estimating two or more equations in the model jointly. For instance, in the
above example you might estimate the demand and supply equations together. You might do this even if
your objective is to obtain an estimate of the price elasticity of demand only.
Advantages and Disadvantages of the Two Approaches
The major advantage of system estimation is that it uses more information, and therefore results in more
precise parameter estimates. The major disadvantages are that it requires more data and is sensitive to
model specification errors. The opposite is true for single equation estimation.
SINGLE EQUATION ESTIMATION
If the error term is correlated with an explanatory variable, then we cannot find an estimator that is
unbiased in small samples. This means we must look for an estimator that has desirable large sample
properties. We will consider 4 single equation estimators.
1. Ordinary least squares (OLS) estimator
2. Instrumental variables (IV) estimator
3. Two-stage least squares (2SLS) estimator
4. Generalized method of moments (GMM) estimator
ORDINARY LEAST SQUARES (OLS) ESTIMATOR
The OLS estimator is given by the rule: β^OLS = (XTX)-1XTy
Properties of the OLS Estimator
If the error term is correlated with an explanatory variable, then the OLS estimator is biased in small
samples, and inconsistent in large samples. It does not produce maximum likelihood estimates. Thus, it
has undesirable small and large sample properties.
Role of OLS Estimator
The OLS estimator should be used as a preliminary estimator. You should initially estimate the equation
using the OLS estimator. Then estimate the equation using a consistent estimator. Then compare the
OLS estimate and consistent estimate of a parameter to determine the possible direction of the bias. This
is because, in a sufficiently large sample, a consistent estimator will have a smaller bias than the
inconsistent OLS estimator.
INSTRUMENTAL VARIABLES (IV) ESTIMATOR
The IV estimator involves the following two-step procedure.
1. Find one instrumental variable for each right-hand side variable in the equation to be estimated. A
valid instrumental variable has two properties:
1. Instrument relevance. It is correlated with the variable for which it is to serve as an
instrument.
2. Instrument exogeneity. It is not correlated with the error term in the equation to be estimated.
2. Apply the following formula to the sample data: β^IV = (ZTX)-1ZTy, where X is the TxK data matrix
for the original right-hand side variables; Z is the TxK data matrix for the instrumental variables; and y is
the Tx1 column vector of observations on the dependent variable in the equation to be estimated.
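On simulated data (hypothetical coefficients), the formula looks like this; note how the IV slope recovers the true parameter while the OLS slope does not:

```python
import numpy as np

# Hypothetical model: y = 2 + 0.5*x + u, where x is correlated with u
# and z is a valid instrument (relevant and exogenous).
rng = np.random.default_rng(2)
n = 100_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + u + rng.normal(size=n)          # endogenous regressor
y = 2.0 + 0.5 * x + u

X = np.column_stack([np.ones(n), x])    # the constant instruments itself
Z = np.column_stack([np.ones(n), z])
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)        # (Z'X)^-1 Z'y

b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # for comparison
print(b_iv[1], b_ols)  # IV near 0.5; OLS near 0.83, biased upward
```

The exogenous constant serves as its own instrument, exactly as described above.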
Comments
Each exogenous right-hand side variable can serve as its own instrumental variable. This is because it is
perfectly correlated with itself and is not correlated with the error term by assumption of exogeneity. The
best candidates to be an instrumental variable for an endogenous right-hand side variable in the equation
to be estimated are exogenous variables that appear in other equations in the model. This is because they
are correlated with the endogenous variables in the model via the reduced-form equations, but they are
not correlated with the error term in any equation. Oftentimes there will exist more than one exogenous
variable that can serve as an instrumental variable for an endogenous variable. In this case, you can do
one of two things. 1) Use as your instrumental variable the exogenous variable that is most highly
correlated with the endogenous variable. 2) Use as your instrumental variable the linear combination of
candidate exogenous variables most highly correlated with the endogenous variable. As we will see later,
if we do this we have a more general type of IV estimator called the two-stage least squares estimator.
Relationship Between the IV Estimator and Identification
The following relationship exists between the IV estimator and identification.
• If the equation is exactly identified, then there are exactly enough exogenous variables excluded from
the equation to serve as instrumental variables for the endogenous right-hand side variable(s).
• If the equation is overidentified, then there are more than enough exogenous variables excluded from
the equation to serve as instrumental variables for the endogenous right-hand side variable(s).
• If the equation is unidentified, then there are not enough exogenous variables excluded from the
equation to serve as instrumental variables for the endogenous right-hand side variable(s). In this
case, the IV estimator cannot be used.
Properties of the IV Estimator
• It is biased in finite samples.
• It is consistent in large samples.
• It is not necessarily asymptotically efficient. This is because an endogenous variable can have more
than one instrumental variable, and each instrumental variable results in a different IV estimator. The
higher the correlation between the endogenous variable and the instrumental variable, the more
efficient the IV estimator.
• If there is heteroscedasticity, then the IV estimator is not efficient in the class of consistent
estimators, and the estimated standard errors are biased and inconsistent.
• It is not the maximum likelihood estimator.
TWO-STAGE LEAST SQUARES (2SLS) ESTIMATOR
The 2SLS estimator is a generalization of the IV estimator. It reduces to the IV estimator if the equation
is exactly identified.
2SLS Rule
The 2SLS estimator is given by the rule,
β^2SLS = (XTPX)-1XTPy
where P = Z(ZTZ)-1ZT is called the projection matrix.
Note that Z is now a TxI matrix, where I is the number of instruments (identifying and other). If the
equation is exactly identified, then I = K. If the equation is overidentified, then I > K. If the error term
has constant variance and the errors are uncorrelated, then the variance-covariance matrix of estimates is,
cov(β^2SLS) = σ2(XTPX)-1
The estimated variance-covariance matrix replaces the unknown σ2 with the estimate σ^2 = RSS/T. The
default 2SLS procedure in Stata uses RSS/(T–k) instead because this is a better approximation in finite
samples. Asymptotically, the two are equivalent.
Two-Stage Implementation of Rule
This estimator can be implemented by using two successive applications of the OLS estimator. This two-stage procedure is as follows.
Stage #1: Regress each right-hand side endogenous variable in the equation to be estimated on all
exogenous variables in the simultaneous equation model using the OLS estimator. Calculate the fitted
values for each of these endogenous variables.
Stage #2: In the equation to be estimated, replace each endogenous right-hand side variable by its fitted
value variable. Estimate the equation using the OLS estimator.
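The two stages can be sketched directly with OLS (hypothetical simulated data with two identifying instruments, so the equation is overidentified):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
z = rng.normal(size=(n, 2))                            # two exogenous instruments
u = rng.normal(size=n)
x = z @ np.array([1.0, 0.5]) + u + rng.normal(size=n)  # endogenous regressor
y = 2.0 + 0.5 * x + u                                  # true slope is 0.5

Zc = np.column_stack([np.ones(n), z])
# Stage 1: regress x on all exogenous variables, keep the fitted values
pi_hat, *_ = np.linalg.lstsq(Zc, x, rcond=None)
x_fit = Zc @ pi_hat

# Stage 2: replace x by its fitted values and run OLS
Xc = np.column_stack([np.ones(n), x_fit])
b_2sls, *_ = np.linalg.lstsq(Xc, y, rcond=None)
print(b_2sls[1])  # near the true 0.5
```

Note that the standard errors printed by a naive stage 2 regression would not be the correct 2SLS standard errors; a dedicated 2SLS routine corrects them.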
Comments
• Stage 1 is identical to estimating the reduced-form equation for each endogenous right-hand side
variable in the equation to be estimated.
• The exogenous variables in the stage 1 regression are the instruments. They fall into two categories:
1) identifying instruments and 2) other instruments. An identifying instrument is any exogenous
variable that has been excluded from an equation to identify it. Other instruments are exogenous
variables included in the equation that serve as instruments for themselves.
• The fitted value variable from the stage 1 regression is the linear combination of instruments that has
the highest correlation with the endogenous explanatory variable in the structural equation. At least
one identifying instrument must be partially correlated with the endogenous explanatory variable; if
not, the fitted value variable will be perfectly collinear with the exogenous variables included in
the stage 2 regression and the 2SLS estimator cannot be used.
• The estimated standard errors obtained from the stage 2 regression are incorrect and must be
corrected. This is because the estimate σ^2 = RSS/(T–k) computed from the stage 2 residuals is
wrong; we need the RSS from the estimated structural equation, computed with the original (not the
fitted) endogenous variables. Statistical programs that have a 2SLS procedure make this correction
automatically and report the correct standard errors.
Logic of 2SLS Estimator
Suppose Y is the dependent variable, X is the endogenous right-hand side variable, μ is the error term,
and Z is an instrumental variable. We can decompose the variation in X into two parts: one part is correlated
with μ, and the other part is uncorrelated with μ. To get a consistent estimate of the effect of X on Y, we need
to use the variation in X that is uncorrelated with μ and eliminate the variation in X that is correlated with
μ. To capture the variation in X that is uncorrelated with μ, we use an instrumental variable, Z, that is
correlated with X but uncorrelated with μ. For Z to perform this function, it must be relevant and
exogenous. If it is not relevant, then it is not correlated with X, and therefore it does not capture variation
in X. If it is not exogenous, then it is correlated with μ, and therefore it captures variation in X that is
correlated with μ. How does the 2SLS estimator capture the variation in X uncorrelated with μ and
disregard the variation in X correlated with μ? The stage 1 regression can be written as: Xt = π0 + π1Zt + νt.
This regression decomposes the variation in X into two parts. The systematic component π0 + π1Z
captures the variation in X explained by Z but not explained by μ; this is because Z is correlated with X
but uncorrelated with μ. The error term ν captures the variation in X explained by μ and any factors
other than Z. However, the true values π0 + π1Z are unknown because the parameters π0 and π1 are
unknown. Therefore, we use the fitted values X^t = π^0 + π^1Zt from an OLS regression of X on Z.
The stage 2 regression can be written as: Yt = α + βX^t + εt. OLS yields a consistent estimate of β
because X^t is not correlated with the error term. Note that εt = Yt – α – βX^t, while μt = Yt – α – βXt. To
obtain correct estimates of the standard errors, we must use the residuals μ^t = Yt – α^ – β^Xt. Statistical
programs with a 2SLS command calculate these residuals for you.
Properties of the 2SLS Estimator
• It is biased in finite samples.
• It is consistent in large samples.
• If there is no heteroscedasticity or autocorrelation, then it is asymptotically efficient.
• If there is heteroscedasticity or autocorrelation, then it is not asymptotically efficient and the
estimated standard errors are inconsistent. To get consistent standard errors, you can use White
robust standard errors.
• It is not the maximum likelihood estimator.
2SLS vs OLS
If an explanatory variable is correlated with the error term, the OLS estimator is biased and inconsistent,
but OLS has a smaller variance than 2SLS. If you compare the OLS and 2SLS standard errors and
t-statistics, OLS tends to have smaller standard errors and bigger t-statistics. The 2SLS estimator is
consistent regardless of whether or not the error term is correlated with an explanatory variable. But if
the error term is not correlated with an explanatory variable, then you should use OLS because it has a
smaller variance than 2SLS and will produce more precise estimates.
GENERALIZED METHOD OF MOMENTS (GMM) ESTIMATOR
The generalized method of moments estimator is a generalization of the 2SLS and IV estimators. If the error
term has constant variance and the errors are uncorrelated, then the GMM estimator reduces to the 2SLS
estimator if the equation is overidentified and the IV estimator if the equation is exactly identified.
Logic of GMM Estimator
Assume that the instruments (identifying and other) are not correlated with the error term. If this
assumption is valid, then Cov(Z,μ) = E[Z(Y – Xβ)] = 0 in the population. This yields I moment, or
orthogonality, conditions, one for each instrumental variable. GMM imposes these restrictions on the
sample: the expectations operator E[∙] for the population is replaced by the sample average operator
T-1 ∑T t=1. This yields a system of I equations in K unknown parameters β. If the structural equation is
exactly identified (I = K), then the number of instruments is exactly equal to the number of right-hand side
variables, the number of equations is equal to the number of unknown parameters, and there is a unique
solution for β. In this case GMM reduces to IV. However, if the equation is overidentified (I > K), then
the number of instruments is greater than the number of right-hand side variables, the number of
equations is greater than the number of unknown parameters, and there is no unique solution for β. In
this case, weights can be applied to the instruments to find a unique solution. These weights are the
elements of an IxI weighting matrix, designated M.
GMM Estimator Rule
The GMM estimator is given by the rule,
β^GMM = (XTZMZTX)-1XTZMZTy
There is a different GMM estimator for each possible weighting matrix M. The optimal weighting matrix
is the one that produces asymptotically efficient estimates. This is given by,
M = [(1/T) (ZTWZ)]-1
where T is the sample size and W is the TxT variance-covariance matrix of errors. To obtain an estimate
of M we need to estimate the elements of W. Assume the errors are uncorrelated but we have
heteroscedasticity of unknown form. The elements on the principal diagonal are the unknown variances
of the T observations. The elements off the principal diagonal are zero by the assumption of no
autocorrelation. To estimate the T unknown variances, we use the T squared residuals. This will produce
a consistent estimate of M. This is because M is an IxI matrix, where I is the number of instruments
(identifying and other). We can get a consistent estimate of an IxI matrix with T observations, because
the elements of Z are known numbers (data for the instruments). The most common way to implement
this GMM estimator is to use the following two-step procedure. This is called the two-step GMM
estimator.
Two-Step GMM Estimator
Step #1: Estimate the equation using 2SLS. Save the residuals. Square the residuals. Use the squared
residuals to obtain an estimate of W. Use the estimate of W to obtain an estimate of M.
Step #2: Apply the GMM estimator rule: β^GMM = (XTZMZTX)-1XTZMZTy
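A sketch of this two-step procedure on simulated heteroscedastic data (all coefficient values are hypothetical, and the 2SLS step is computed without forming the TxT projection matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # 3 instruments
u = rng.normal(size=n) * (1.0 + 0.5 * np.abs(Z[:, 1]))      # heteroscedastic error
x = Z[:, 1] + Z[:, 2] + u + rng.normal(size=n)              # endogenous regressor
X = np.column_stack([np.ones(n), x])
y = X @ np.array([2.0, 0.5]) + u                            # true slope is 0.5

# Step 1: 2SLS, then save the squared residuals to estimate W
# (W is diagonal under heteroscedasticity with no autocorrelation)
ZtX, Zty = Z.T @ X, Z.T @ y
A = ZtX.T @ np.linalg.inv(Z.T @ Z)
b_2sls = np.linalg.solve(A @ ZtX, A @ Zty)
e = y - X @ b_2sls

# Step 2: optimal weighting matrix M = [(1/T) Z' diag(e^2) Z]^-1, then GMM
M = np.linalg.inv((Z * e[:, None] ** 2).T @ Z / n)
b_gmm = np.linalg.solve(ZtX.T @ M @ ZtX, ZtX.T @ M @ Zty)
print(b_gmm[1])  # near the true 0.5
```

With three instruments and two parameters the equation is overidentified, so GMM does not collapse to IV here.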
Properties of the GMM Estimator
• It is biased in small samples.
• It is consistent in large samples.
• It is asymptotically efficient and produces consistent estimates of the standard errors. If there is
heteroscedasticity, it produces more efficient estimates than 2SLS.
• If there is no heteroscedasticity, then the GMM estimator reduces to the 2SLS estimator.
Testing for Heteroscedasticity
Suppose that we want to test the null hypothesis of no heteroscedasticity for one structural equation in a
system of M structural equations. If the remaining M – 1 structural equations have no heteroscedasticity,
then you can use the White test. However, if any of these other structural equations have
heteroscedasticity, then the White test is not valid. This is true even if we don’t estimate these other
structural equations. In this case, the appropriate test is the Pagan-Hall test. The test statistic has an
approximate chi-square distribution with degrees of freedom equal to the number of instruments I
(identifying and other) in the equation: PH statistic ~ χ2(I).
CHECKING FOR VALIDITY OF INSTRUMENTS
For the IV, 2SLS, and GMM estimators to have desirable properties, the instruments must be relevant and
exogenous. We can use the sample data to check instrument relevance if there is only one endogenous
explanatory variable. (There are more complicated methods if you have two or more endogenous
variables). We can also test the hypothesis of exogeneity if we have enough information in the sample.
We have enough information if the equation is overidentified.
Checking Instrument Relevance
The instruments can be either irrelevant or relevant. If they are relevant, they can vary from weak to
strong. We can think of the strength of instruments as a continuum.
Irrelevant → Weak → Strong
If the instruments are irrelevant, then they are not correlated with the endogenous explanatory variable.
This is typically not the case in practice. The higher the correlation the stronger the instruments.
Irrelevant or weak instruments cause two problems. 1) The 2SLS estimator is still consistent, but it can
have a large bias in finite samples. It can produce estimates that are worse than the OLS estimator. 2)
Hypothesis tests are not valid.
To check the strength of the identifying instruments, calculate the F-statistic for the null hypothesis that
the identifying instruments have no joint effect in the first-stage regression. The bigger (smaller) the
F-statistic, the stronger (weaker) the instruments. A larger F-statistic indicates that the instruments contain
more information about the endogenous variable. How big must the F-statistic be for the instruments to
be sufficiently strong? There is no specific answer to this question, only rules-of-thumb. Stock and
Watson show that the mean of the sampling distribution of the 2SLS estimator in large samples is
approximately:
E(β^2SLS) = β + (βOLS – β) / [E(F) – 1]
where βOLS is the OLS estimator, (βOLS – β) is the bias in the OLS estimator, and E(F) is the expected
value of the F-statistic. Note that the expression 1 / [E(F) – 1] is the bias in β^2SLS relative to βOLS. The
larger (smaller) the F-statistic, the smaller (larger) the bias in β^2SLS relative to the βOLS. For example, if F
=2 then, 1 / [E(F) – 1] = 1 / (2 – 1) = 1. In this case, the bias in β^2SLS is the same as the bias in βOLS. If F
= 3 then, 1 / [E(F) – 1] = 1 / (3 – 1) = ½. In this case, the bias in β^2SLS is one-half the bias in βOLS. If F =
11, then 1 / [E(F) – 1] = 1 / (11 – 1) = 1/10. In this case, the bias in β^2SLS is one-tenth the bias in βOLS.
Some econometricians believe that a bias of about 10% or less is small enough to be acceptable in most
applications, but this is only a rule-of-thumb.
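The relative-bias arithmetic above can be computed directly:

```python
# Relative bias of 2SLS versus OLS from the Stock-Watson approximation:
# bias(2SLS) / bias(OLS) ~ 1 / [E(F) - 1]
def relative_bias(F: float) -> float:
    return 1.0 / (F - 1.0)

print(relative_bias(2))    # 1.0: same bias as OLS
print(relative_bias(3))    # 0.5: half the OLS bias
print(relative_bias(11))   # 0.1: one-tenth the OLS bias (the ~10% rule)
```

Inverting the rule, a relative bias of at most 10% requires a first-stage F-statistic of at least about 11 (often quoted loosely as "F above 10").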
Checking Instrument Exogeneity
If any instrument is correlated with the error term, then it is not exogenous. If it is not exogenous, then
IV, 2SLS, and GMM will be inconsistent. We cannot test whether an instrument is correlated with the
error term if the equation is exactly identified because we don’t have enough information. We can test
whether the instruments are correlated with the error term if the equation is overidentified because we
have sufficient information. To test for exogeneity of instruments, we do a test of overidentifying
restrictions. This test is discussed below.
SYSTEM ESTIMATORS
A system estimator can be used to estimate two or more equations in a simultaneous equations model
together. It uses more information than a single equation estimator (e.g., contemporaneous correlation
among the error terms across equations, cross-equation restrictions, etc.), and therefore will produce more
precise estimates. We will consider 2 system estimators.
1. Three-stage least squares (3SLS) estimator
2. Iterated three-stage least squares (I3SLS) estimator
THREE-STAGE LEAST SQUARES (3SLS) ESTIMATOR
The 3SLS estimator involves the following three-stage procedure.
1. Same as stage 1 in 2SLS.
2. Same as stage 2 in 2SLS
3. Apply the SUR estimator.
ITERATED THREE-STAGE LEAST SQUARES (I3SLS) ESTIMATOR
The I3SLS estimator involves the following three-stage procedure.
1. Same as stage 1 in 2SLS
2. Same as stage 2 in 2SLS
3. Apply the ISUR estimator.
Properties of the 3SLS and I3SLS Estimators
If the error term is correlated with one or more explanatory variables, then the 3SLS and I3SLS estimators
are biased in small samples. However, if there is no heteroscedasticity, then they are both consistent and
asymptotically more efficient than single equation estimators. Even though they have the same
asymptotic properties, their estimates can differ in small samples. There is an ongoing debate about
whether I3SLS or 3SLS produces better estimates in small samples. If there is heteroscedasticity, then
both 3SLS and I3SLS produce inconsistent estimates of the parameters and they should not be used.
Major Shortcoming of the 3SLS and I3SLS Estimators
If there is heteroscedasticity, then both 3SLS and I3SLS are inconsistent. Many economists choose not to
use either of these with cross-section data, because with cross-section data the error term oftentimes has
non-constant variance. In this case, these estimators can produce poor estimates.
HYPOTHESIS TESTING
The small sample t-test and F-test cannot be used for a simultaneous equations model. This is because if
the error term is correlated with one or more explanatory variables, we don’t know the sampling
distributions of the t-statistic and F-statistic. The following large sample tests can be used: 1) Asymptotic
t-test. 2) Approximate F-test. 3) Wald test. 4) Lagrange multiplier test. Note that because the IV, 2SLS,
GMM, 3SLS, and I3SLS estimators do not produce maximum likelihood estimates, the likelihood ratio
test cannot be used to test hypotheses.
SPECIFICATION TESTING
A specification test uses the sample data to test an assumption that defines the specification of the model.
Two important specification tests for simultaneous equation regression models are:
1. Test of Exogeneity
2. Test of overidentifying restrictions
We will implement these tests using a single equation estimation procedure.
TEST OF EXOGENEITY
This is a test of whether one or more right-hand side variables are exogenous against the alternative that
they are endogenous. Equivalently, it is a test of whether the OLS estimator is unbiased against the
alternative that it is biased.
Notation
Designate the equation to be estimated and the identifying instruments as
Y = a + bY1 + cX + μ
Z = identifying instruments
Where Y is the dependent variable; Y1 is a vector of one or more right-hand side variables that you
believe may or may not be exogenous; X is a vector of right-hand side variables you believe are
exogenous; a is the intercept; b and c are vectors of slope coefficients attached to the variables in Y1 and
X, respectively; Z is a vector of exogenous variables that are excluded from this equation, and
therefore are used as identifying instruments for the endogenous variable(s) in Y1; and μ is the error term.
Hausman Test
The most often used test of exogeneity is the Hausman test. It is also used to test if the OLS estimator is
biased. The Hausman test is based on the following methodology. Let Y1 be interpreted more generally
as a vector that contains one or more variables that you believe may be correlated with the error term μ.
The null and alternative hypotheses are as follows:
H0: Y1 and μ are not correlated (Y1 is exogenous).
H1: Y1 and μ are correlated (Y1 is endogenous).
To test the null hypothesis that Y1 and μ are not correlated, we proceed as follows.
1. Compare the OLS and 2SLS estimators. OLS is a consistent estimator if the null hypothesis is true
but inconsistent if the null hypothesis is false. 2SLS is a consistent estimator if the null hypothesis is
true or false.
2. If the null hypothesis is true, then both estimators should produce similar estimates. If the null
hypothesis is false, then the two estimators should produce significantly different estimates. Thus, to
test the null hypothesis you test the equality of the estimates produced by the two estimators.
3. If the estimates produced by the two estimators are significantly different, then you reject the null
hypothesis and conclude that the sample provides evidence that Y1 is correlated with μ in the
population. If the parameter estimates produced by the two estimators are not significantly different,
then you accept the null hypothesis and conclude that Y1 is not correlated with μ in the population.
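The logic of steps 1 through 3 can be illustrated with a small simulation. This is a minimal sketch on synthetic data — the variable names, the instrument z, and the parameter values are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000
true_b = 2.0

u = rng.normal(size=T)                   # structural error
z = rng.normal(size=T)                   # instrument: moves y1 but not u
y1 = 0.5 * z + u + rng.normal(size=T)    # endogenous: cov(y1, u) = 1
y = 1.0 + true_b * y1 + u

# OLS of y on (1, y1): inconsistent because y1 is correlated with u.
X = np.column_stack([np.ones(T), y1])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

# 2SLS: stage 1 regresses y1 on the exogenous variables (1, z);
# stage 2 replaces y1 with its stage-1 fitted values.
Z = np.column_stack([np.ones(T), z])
y1_hat = Z @ np.linalg.lstsq(Z, y1, rcond=None)[0]
X_hat = np.column_stack([np.ones(T), y1_hat])
b_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0][1]

print(b_ols)    # well above 2.0: bias of about cov(y1,u)/var(y1) = 1/2.25
print(b_2sls)   # close to 2.0
```

Because the null hypothesis is false in this simulation, the two estimates differ noticeably, which is exactly the divergence the Hausman test looks for.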
If the vector Y1 contains one variable, then you are testing whether a single right-hand side variable is
exogenous. If the vector Y1 contains two or more variables, then you are testing whether two or more
right-hand side variables are jointly exogenous.
If we are testing whether the OLS estimator produces biased estimates, then we interpret the null and
alternative hypotheses as follows.
H0: OLS is unbiased
H1: OLS is biased
Interpretation of Hausman Test
If we reject the null hypothesis, then we have evidence that Y1 is correlated with μ, and therefore Y1 is
endogenous. However, we cannot conclude with certainty what causes the correlation between Y1 and μ.
It may be reverse causation, an omitted confounding variable, or both.
If we reject the null hypothesis, we have also found evidence that OLS is biased relative to 2SLS.
If we accept the null, this suggests that OLS may not be biased. This may be the case if Y1 is
uncorrelated with μ, or only weakly correlated with μ. In this case, we may want to use OLS, because
OLS is more efficient than 2SLS, and therefore may produce estimates that are closer to the true values of
the parameters.
Implementation of the Hausman Test
The easiest way to implement the Hausman test is to use Wu’s approach. This involves the following
steps.
1. Regress each variable in Y1 on all variables in X and Z (all exogenous variables in the model) using
the OLS estimator. This is the stage 1 regression(s) of 2SLS.
2. Save the residuals from each of these regressions. Denote this vector of residuals ν̂. The residuals
from each regression in step #1 form a “residual variable”.
3. Estimate the following regression equation using the OLS estimator:
Y = a + bY1 + cX + d^ + v
Where d denotes the vector of coefficients attached to the residual variables.
4. Test the following null and alternative hypotheses:
H0: d = 0 (Y1 is exogenous; OLS is unbiased)
H1: d ≠ 0 (Y1 is endogenous; OLS is biased)
5. If there is one variable in Y1, and therefore one residual variable in ν̂ and one coefficient in d, then
this hypothesis can be tested using a t-test. If there is more than one variable in Y1, and therefore
more than one residual variable in ν̂ and more than one coefficient in d, then this hypothesis can be
tested using an F-test.
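Wu's steps can be sketched as follows on simulated data (a hypothetical illustration, assuming one endogenous regressor y1, one exogenous regressor x, and one identifying instrument z; none of these names come from the text). Here y1 is endogenous by construction, so the t-test on the residual variable should reject H0:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5_000
u = rng.normal(size=T)                              # structural error
z = rng.normal(size=T)                              # identifying instrument
x = rng.normal(size=T)                              # exogenous regressor
y1 = 0.5 * z + 0.5 * x + u + rng.normal(size=T)     # endogenous by construction
y = 1.0 + 2.0 * y1 + 1.0 * x + u

def ols(y, X):
    """Return OLS coefficients and residuals."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b, y - X @ b

# Steps 1-2: regress y1 on all exogenous variables; save the residuals.
exog = np.column_stack([np.ones(T), x, z])
_, v_hat = ols(y1, exog)

# Step 3: re-estimate the structural equation with the residual variable added.
X_aug = np.column_stack([np.ones(T), y1, x, v_hat])
b, e = ols(y, X_aug)

# Steps 4-5: t-statistic for H0: d = 0 (the coefficient on v_hat).
k = X_aug.shape[1]
sigma2 = e @ e / (T - k)
XtX_inv = np.linalg.inv(X_aug.T @ X_aug)
t_d = b[3] / np.sqrt(sigma2 * XtX_inv[3, 3])
print(t_d)   # far outside +/-1.96 here, so exogeneity of y1 is rejected
```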
Logic of Wu’s Approach
The structural equation is
Y = a + bY1 + cX + μ
We want to test whether Y1 is correlated with μ. The first-stage regression is
Y1 = α1 + α2X + α3Z + ν
Because Y1 depends upon ν, Y1 is correlated with ν. Y1 is uncorrelated with μ if ν is uncorrelated with μ.
Write μ = dν + v. If d = 0 then ν and μ are uncorrelated, and therefore Y1 and μ are uncorrelated. If d ≠ 0
then they are correlated. We can substitute the expression for μ into the structural equation and rewrite it as
Y = a + bY1 + cX + dν + v
If d = 0 then there is no evidence that Y1 is correlated with μ. If d ≠ 0 then there is evidence that Y1 is
correlated with μ. To do the test, we use the first-stage residuals ν̂ as an estimate of the unknown errors ν.
TEST OF THE OVERIDENTIFYING RESTRICTIONS
It is possible to test the overidentifying restrictions for a single equation in a system of equations. When
we test the overidentifying restrictions, we are testing whether the variables that we excluded to get
identification can be validly excluded, or whether at least one of them should be included in the equation.
Therefore, we are testing the following null and alternative hypotheses,
H0: Overidentifying restrictions are valid.
H1: Overidentifying restrictions are not valid.
An alternative interpretation of the null and alternative hypotheses is,
H0: The instruments are exogenous (not correlated with the error term).
H1: At least one instrument is endogenous (correlated with the error term).
We cannot test the restrictions for an equation that is exactly identified: such an equation has no
overidentifying restrictions, and therefore there is not enough information to conduct the test.
Notation
Designate the equation to be estimated before the identifying instruments are excluded as
Y = a + bY1 + cX + dZ + μ
Where all variables and parameters have been defined previously. Note that this is the equation before it
is identified, and therefore the variables in the vector Z have not been excluded. The null and alternative
hypotheses can be expressed as follows.
H0: d = 0 (Z has no effect on Y, and therefore Z is not correlated with μ).
H1: d ≠ 0 (At least one variable in Z has an effect on Y, and therefore is correlated with μ).
If we reject the null, then at least one of the instruments in Z belongs in the equation. Because its
effect is included in the error term when it is excluded, that instrument is endogenous: it is correlated
with the error term.
Sargan Lagrange Multiplier Test
The easiest way to test the null hypothesis that the overidentifying restrictions are valid is to use a
Lagrange multiplier test. This is called a Sargan test. The test statistic and sampling distribution for this
test are
LM = TR2 ~ χ2(Z − Y1)
Where T is the sample size; R2 is the uncentered R-squared statistic from an auxiliary regression; χ2 is the
chi-square distribution with Z − Y1 degrees of freedom, where Z is the number of variables excluded from
the equation and Y1 is the number of endogenous right-hand side variables in the equation (this difference
is equal to the number of overidentifying restrictions).
Calculating the LM Test Statistic
To calculate the LM test statistic, we need to estimate the restricted model without the variables in Z. We
then use information obtained from the restricted model to run an auxiliary regression to obtain the
uncentered R2 statistic. This two-step approach is as follows.
1. Estimate the following restricted model using the 2SLS estimator
Y = a + bY1 + cX + μ
Use as instruments for Y1 all variables in the vectors X and Z.
2. Save the residuals from this regression. Denote the residual variable as μ̂.
3. Regress the residual variable, μ̂, on all the variables in X and Z using the OLS estimator; that is,
estimate the following equation using the OLS estimator
μ̂ = θ1X + θ2Z + v
where θ1 and θ2 are vectors of coefficients.
4. Use this regression to calculate the LM test statistic.
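The four steps can be sketched on simulated data (the names and the data-generating process are hypothetical, not from the text). There are two excluded instruments (z1, z2) and one endogenous regressor, so there is one overidentifying restriction; z2 is deliberately given a direct effect on Y, so the restriction is false and the test should reject:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5_000
u = rng.normal(size=T)
x = rng.normal(size=T)
z1 = rng.normal(size=T)
z2 = rng.normal(size=T)
y1 = 0.5 * z1 + 0.5 * z2 + 0.5 * x + u + rng.normal(size=T)
y = 1.0 + 2.0 * y1 + 1.0 * x + 0.5 * z2 + u     # z2 is wrongly excluded below

def ols(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b, y - X @ b

# Step 1: 2SLS of the restricted model y = a + b*y1 + c*x + mu,
# using all exogenous variables (1, x, z1, z2) as instruments.
Z = np.column_stack([np.ones(T), x, z1, z2])
y1_hat = Z @ np.linalg.lstsq(Z, y1, rcond=None)[0]
X_hat = np.column_stack([np.ones(T), y1_hat, x])
b_2sls, _ = ols(y, X_hat)

# Step 2: the 2SLS residuals are computed with the actual y1, not y1_hat.
mu_hat = y - np.column_stack([np.ones(T), y1, x]) @ b_2sls

# Step 3: auxiliary regression of the residuals on all exogenous variables.
_, aux_resid = ols(mu_hat, Z)
r2_unc = 1.0 - (aux_resid @ aux_resid) / (mu_hat @ mu_hat)

# Step 4: LM = T * R2, chi-square with 2 - 1 = 1 degree of freedom under H0.
lm = T * r2_unc
print(lm)   # far above the 5% chi-square(1) critical value of 3.84: reject
```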
Notes about the Test of Overidentifying Restrictions
1. A sufficiently large R2 statistic indicates that one or more of the variables in Z is correlated with the
residuals μ̂. What counts as sufficiently large? We compare the LM test statistic to a critical value
for a given level of significance. Rejection provides evidence that at least one variable in Z is
correlated with the error term μ and should be included in the restricted model. This variable may be
correlated with the error term either because it has a direct effect on Y, or because it is correlated
with an omitted variable in μ that has an effect on Y.
2. If you reject the null hypothesis, then you are rejecting the overidentifying restrictions. This casts
doubt on the identifying restrictions. This is because the overidentifying restrictions cannot be
separated from the identifying restrictions.
3. If you reject the overidentifying restrictions, the test gives you no guidance about what to do next. No
test exists that allows you to determine which variable or variables in Z should not be excluded from
the equation being estimated.
Heteroscedasticity
The Sargan test assumes that the error term has a constant variance. If this assumption is not valid, then
the Sargan test is not valid. The appropriate test is a Hansen test. The test statistic for the Hansen test is
called the J-statistic. It has an approximate chi-square distribution with degrees of freedom equal to
the number of overidentifying restrictions tested. The Sargan test statistic is a special case of the
J-statistic when there is no heteroscedasticity. To do a Hansen test, you must estimate the equation
using the GMM estimator.
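As a sketch of what a Hansen test computes, the following is my own numpy illustration of the standard two-step GMM recipe (consistent first-step residuals, a heteroscedasticity-robust weight matrix, then J = T·ḡ'S⁻¹ḡ), with simulated data and illustrative names, not code from the text:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5_000
x = rng.normal(size=T)
z1 = rng.normal(size=T)
z2 = rng.normal(size=T)
u = rng.normal(size=T) * np.sqrt(0.5 + x**2)       # heteroscedastic error
y1 = 0.5 * z1 + 0.5 * z2 + 0.5 * x + u + rng.normal(size=T)
y = 1.0 + 2.0 * y1 + 1.0 * x + u                   # z1, z2 are valid instruments

X = np.column_stack([np.ones(T), y1, x])           # regressors (3 parameters)
Z = np.column_stack([np.ones(T), x, z1, z2])       # instruments (4 moments)

# First step: 2SLS gives consistent initial residuals.
y1_hat = Z @ np.linalg.lstsq(Z, y1, rcond=None)[0]
X_hat = np.column_stack([np.ones(T), y1_hat, x])
b1 = np.linalg.lstsq(X_hat, y, rcond=None)[0]
e1 = y - X @ b1                                    # residuals use the actual y1

# Second step: efficient GMM with a heteroscedasticity-robust weight matrix.
S = (Z * (e1**2)[:, None]).T @ Z / T               # (1/T) * sum e_i^2 z_i z_i'
XtZ = X.T @ Z
b2 = np.linalg.solve(XtZ @ np.linalg.solve(S, XtZ.T),
                     XtZ @ np.linalg.solve(S, Z.T @ y))
e2 = y - X @ b2

# Hansen J-statistic: chi-square with 4 - 3 = 1 degree of freedom under H0.
gbar = Z.T @ e2 / T
j = T * gbar @ np.linalg.solve(S, gbar)
print(j)   # under H0 (valid instruments), typically below 3.84
```

In practice one would use a GMM routine from an econometrics package rather than hand-rolled linear algebra, but the quantity being computed is the same.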