Chapter 19: Measurement Error and the Instrumental Variables
Estimation Procedure
Chapter 19 Outline
• Introduction to Measurement Error
o What Is Measurement Error?
o Modeling Measurement Error
• The Ordinary Least Squares (OLS) Estimation Procedure and
Dependent Variable Measurement Error
• The Ordinary Least Squares (OLS) Estimation Procedure and
Explanatory Variable Measurement Error
o Summary: Explanatory Variable Measurement Error Bias
o Explanatory Variable Measurement Error: Attenuation (Dilution)
Bias
o Might the Ordinary Least Squares (OLS) Estimation Procedure Be
Consistent?
• Instrumental Variable (IV) Estimation Procedure: A Two Regression
Procedure
o Mechanics
o The “Good” Instrument Conditions
• Measurement Error Example: Annual, Permanent, and Transitory
Income
o Definitions and Theory
o Might the Ordinary Least Squares (OLS) Estimation Procedure
Suffer from a Serious Econometric Problem?
• Instrumental Variable (IV) Approach
o The Mechanics
o Comparison of the Ordinary Least Squares (OLS) and the
Instrumental Variables (IV) Approaches
o “Good” Instrument Conditions Revisited
• Justifying the Instrumental Variable (IV) Estimation Procedure
Chapter 19 Prep Questions
1. Suppose that a physics assignment requires you to measure the amount of time
it takes a one pound weight to fall six feet. You conduct twenty trials in which
you use a very accurate stop watch to measure how long it takes the weight to
fall.
a. Even though you are very careful and conscientious would you expect
the stop watch to report precisely the same amount of time on each
trial? Explain.
Suppose that the following equation describes the relationship between the
measured elapsed time and the actual elapsed time:
yMeasuredt = yActualt + vt
where yMeasuredt = Measured elapsed time
      yActualt = Actual elapsed time
vt is a random variable. vt represents the random influences that cause your
measurement of the elapsed time to deviate from the actual elapsed time. The
random influences cause you to click the stop watch a little early or a little
late.
b. Recall that you are careful and conscientious in attempting to measure
the elapsed time.
1) In approximately what portion of the trials would you
overestimate the elapsed time; that is, in approximately what
portion of the trials would you expect vt to be positive?
2) In approximately what portion of the trials would you
underestimate the elapsed time; that is, in approximately what
portion of the trials would you expect vt to be negative?
3) Approximately what would the mean (average) of vt equal?
2. Economists distinguish between permanent income and annual income.
Loosely speaking, permanent income equals what a household earns per year
“on average;” that is, permanent income can be thought of as the “average” of
annual income over an entire lifetime. In some years, annual income is more
than its permanent income, but in other years it is less. The difference between
the household’s annual income and permanent income is called transitory
income:
IncTranst = IncAnnt − IncPermt
where
IncAnnt = Household's Annual Income
IncPermt = Household's Permanent Income
IncTranst = Household's Transitory Income
or equivalently,
IncAnnt = IncPermt + IncTranst
Since permanent income equals what a household earns “on average,” the
mean of transitory income equals 0. Microeconomic theory teaches that
households base their consumption decisions on their “permanent” income.
Theory: Additional permanent income increases consumption.
Consider the following model to assess the theory:
Model: Const = βConst + βIncPerm IncPermt + et
Theory: βIncPerm > 0
When we attempt to gather data to assess this theory, we immediately
encounter a difficulty. Permanent income cannot be observed. Only annual
income data are available to assess the theory. So, while we would like to
specify permanent income as the explanatory variable, we have no choice. We
must use annual disposable income.
a. Can you interpret transitory income as measurement error? Hint: What
is the mean (average) of transitory income?
b. Now, represent transitory income, IncTranst, by ut:
IncAnnt = IncPermt + ut
Express the model in terms of annual income.
c. What is the equation for the new error term?
d. What are the ramifications of using the ordinary least squares (OLS)
estimation procedure to estimate the permanent income coefficient,
βIncPerm, using annual income as the explanatory variable?
Introduction to Measurement Error
Two types of measurement error can be present:
• Dependent variable
• Explanatory variable
We shall argue that dependent variable measurement error does not lead to bias.
On the other hand, whenever explanatory variable measurement error exists, the
explanatory variable and error term will be correlated resulting in bias. We
consider dependent variable measurement error first. Before doing so, however,
we shall describe precisely what we mean by measurement error.
What Is Measurement Error?
Suppose that a physics assignment requires you to measure the amount of time it
takes a one pound weight to fall six feet. You conduct twenty trials in which you
use a very accurate stop watch to measure how long it takes the weight to fall.
Question: Will your stop watch report the same amount of time on each trial?
Answer: No. Sometimes reported times will be lower than other reported
times. Sometimes you will be a little premature in clicking the stop watch
button. Other times you will be a little late.
It is humanly impossible to measure the actual elapsed time perfectly. No matter
how careful you are, sometimes the measured value will be a little low and other
times a little high.
Modeling Measurement Error
We can model measurement error with the following equation:
yMeasuredt = yActualt + vt
yActualt equals the actual amount of time elapsed and yMeasuredt equals the
measured amount of time. vt represents measurement error. Sometimes vt will be
positive when you are a little too slow in clicking the stop watch button; other
times vt will be negative when you click the button a little too quickly. vt is a
random variable; we cannot predict the numerical value of vt beforehand. What
can we say about vt? We can describe its distribution. Since you are conscientious
in measuring the elapsed time, the mean of vt’s probability distribution equals 0:
Mean[vt] = 0
Measurement error does not systematically increase or decrease the measured
value of yt. The measured value of yt will not systematically overestimate or
underestimate the actual value.
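The stop watch example can be mimicked with a short simulation. The sketch below is only an illustration: the actual elapsed time of 0.61 seconds, the standard deviation of 0.05 seconds for vt, the normal distribution, and the number of trials are all assumptions made for the example, not values taken from the chapter.

```python
import random
import statistics

rng = random.Random(19)   # fixed seed so the run is repeatable
ACTUAL_TIME = 0.61        # assumed actual elapsed time in seconds (hypothetical)
ERROR_SD = 0.05           # assumed standard deviation of vt in seconds (hypothetical)

def run_trials(n_trials):
    """Return the measurement errors vt and the measured times for n_trials."""
    errors = [rng.gauss(0.0, ERROR_SD) for _ in range(n_trials)]
    measured = [ACTUAL_TIME + v for v in errors]   # yMeasured = yActual + v
    return errors, measured

# Many trials are used so that the averages settle down.
errors, measured = run_trials(20_000)
share_positive = sum(v > 0 for v in errors) / len(errors)

print(f"share of trials with vt > 0: {share_positive:.3f}")
print(f"mean of vt:                  {statistics.fmean(errors):.4f}")
print(f"mean measured time:          {statistics.fmean(measured):.4f}")
```

Under these assumptions, roughly half of the trials overestimate the elapsed time, roughly half underestimate it, and the mean of vt is close to 0.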
The Ordinary Least Squares (OLS) Estimation Procedure and Dependent
Variable Measurement Error
We begin with the equation specifying the actual relationship between the
dependent and explanatory variables:
Actual Relationship: yActualt = βConst + βxActualxActualt + et
But now suppose that as a consequence of measurement error, the actual value of
the dependent variable, yActualt, is not observable. You have no choice but to use
the measured value, yMeasuredt. Recall that the measured value equals the actual
value plus the measurement error random variable, vt:
yMeasuredt = yActualt + vt
vt is a random variable with mean 0: Mean[vt] = 0
Solving for yActualt:
yActualt = yMeasuredt − vt
Let us apply this to the actual relationship:
yActualt = βConst + βxActual xActualt + et
    ↓ Substituting for yActualt
yMeasuredt − vt = βConst + βxActual xActualt + et
    ↓ Rearranging terms
yMeasuredt = βConst + βxActual xActualt + et + vt
    ↓ Letting εt = et + vt
yMeasuredt = βConst + βxActual xActualt + εt
εt represents the error term in the regression that you will actually be running.
Will this result in bias? To address this issue consider the following question:
Question: Are the explanatory variable, xActualt, and the error term, εt,
correlated?
To answer the question, suppose that the measurement error term, vt, were to
increase:
vt up → xActualt unaffected
vt up → εt up, since εt = et + vt
xActualt unaffected ↔ εt up
The value of the explanatory variable, xActualt, is unchanged while the error term,
εt, increases. Hence, the explanatory variable and error term εt are independent;
consequently, no bias should result.
Econometrics Lab 19.1: Dependent Measurement Error
Figure 19.1: Dependent Variable Measurement Error Simulation
[Link to MIT-Lab 19.1 goes here.]
We use a simulation to confirm our logic. First, we consider our base case, the no
measurement error case. The YMeas Err checkbox is cleared indicating that no
dependent variable measurement error is present. Consequently, no bias should
result. Be certain that the Pause checkbox is cleared and click Start. After many,
many repetitions, click Stop. The ordinary least squares (OLS) estimation
procedure is unbiased in this case; the average of the estimated coefficient values
and the actual coefficient value both equal 2.0. When no measurement error is
present, all is well.
Now, we shall introduce dependent variable measurement error by
checking the YMeas Err checkbox. The YMeas Var list now appears with 20.0
selected; the variance of the measurement error’s probability distribution, Var[vt],
equals 20.0. Click Start and then after many, many repetitions, click Stop. Again,
the average of the estimated coefficient values and the actual coefficient value
both equal 2.0. Next, select 50.0 and then 80.0 from the YMeas Var list,
repeating the process each time.
Sample Size = 10
Type of                Actual   Mean (Average)     Variance of
Measurement   YMeas    Coef     of the Estimated   Estimated
Error         Var      Value    Coef Values        Coef Values
None                   2.0      ≈2.0               ≈1.7
Dep Vbl       20.0     2.0      ≈2.0               ≈1.8
Dep Vbl       50.0     2.0      ≈2.0               ≈2.0
Dep Vbl       80.0     2.0      ≈2.0               ≈2.2
Table 19.1: Dependent Variable Measurement Error Simulation Results
The simulation confirms our logic. Even when dependent variable measurement
error is present, the average of the estimated coefficient values equals the actual
coefficient value. Dependent variable measurement error does not lead to bias.
What are the ramifications of dependent variable measurement error? The
last column of Table 19.1 reveals the answer. As measurement error variance
increases, the variance of the estimated coefficient values and hence the variance
of the coefficient estimate’s probability distribution increases. As the variance of
the dependent variable measurement error term increases, we introduce “more
uncertainty” into the process and hence, the ordinary least squares (OLS)
estimates become less reliable.
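For readers without access to the lab, the same experiment can be sketched in a few lines of Python. This is a minimal illustration, not the lab's code: the x values, the constant of 10, the error term standard deviation of 5, and the normal distributions are assumptions, so the variances it prints will not match Table 19.1. Only the pattern should carry over.

```python
import random
import statistics

def ols_slope(x, y):
    """Return the OLS coefficient estimate from a simple regression of y on x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_dep_error(meas_var, reps=4000, n=10, beta=2.0, seed=0):
    """Mean and variance of the coefficient estimates when y is measured with error."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, n + 1)]   # hypothetical x values, held fixed
    slopes = []
    for _ in range(reps):
        # yMeasured = yActual + v, where yActual = 10 + beta * x + e
        y_measured = [
            10.0 + beta * xi + rng.gauss(0.0, 5.0) + rng.gauss(0.0, meas_var ** 0.5)
            for xi in x
        ]
        slopes.append(ols_slope(x, y_measured))
    return statistics.fmean(slopes), statistics.variance(slopes)

for meas_var in (0.0, 20.0, 50.0, 80.0):
    mean_b, var_b = simulate_dep_error(meas_var)
    print(f"Var[v] = {meas_var:5.1f}  mean = {mean_b:.3f}  variance = {var_b:.3f}")
```

In this sketch the mean of the estimates stays near the actual coefficient value of 2.0 in every row, while the variance of the estimates grows as Var[v] grows.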
The Ordinary Least Squares (OLS) Estimation Procedure and Explanatory
Variable Measurement Error
To investigate explanatory variable measurement error we again begin with the
equation that describes the actual relationship between the dependent and
explanatory variables:
Actual Relationship: yActualt = βConst + βxActualxActualt + et
Now, suppose that we cannot observe the actual value of the explanatory variable;
we can only observe the measured value. The measured value equals the actual
value plus the measurement error random variable, ut:
xMeasuredt = xActualt + ut
ut is a random variable with mean 0: Mean[ut] = 0
Solving for xActualt:
xActualt = xMeasuredt − ut
Now, we apply this to the actual relationship:
yActualt = βConst + βxActual xActualt + et
    ↓ Substituting for xActualt
         = βConst + βxActual (xMeasuredt − ut) + et
    ↓ Multiplying
         = βConst + βxActual xMeasuredt − βxActual ut + et
    ↓ Rearranging terms
         = βConst + βxActual xMeasuredt + et − βxActual ut
    ↓ Letting εt = et − βxActual ut
yActualt = βConst + βxActual xMeasuredt + εt
εt is the error term in the regression that we will actually be running.
Recall what we learned about correlation between the explanatory variable
and error term:
Explanatory variable     Explanatory variable     Explanatory variable
and error term           and error term           and error term
positively correlated    uncorrelated             negatively correlated
        ↓                        ↓                        ↓
OLS estimation           OLS estimation           OLS estimation
procedure for            procedure for            procedure for
the coefficient value    the coefficient value    the coefficient value
is biased upward         is unbiased              is biased downward
Are the explanatory variable, xMeasuredt, and the error term, εt, correlated? The
answer to the question depends on the actual coefficient. Consider the three
possibilities:
• βxActual > 0: When the actual coefficient is positive, negative correlation
exists; consequently, the ordinary least squares (OLS) estimation
procedure for the coefficient value would be biased downward. To
understand why, suppose that ut increases:
ut up → xMeasuredt up, since xMeasuredt = xActualt + ut
ut up → εt down, since εt = et − βxActual ut and βxActual > 0
xMeasuredt up ↔ εt down
    ↓
Negative Explanatory Variable/Error Term Correlation
    ↓
OLS Biased Downward
• βxActual < 0: When the actual coefficient is negative, positive correlation
exists; consequently, the ordinary least squares (OLS) estimation
procedure for the coefficient value would be biased upward. To
understand why, suppose that ut increases:
ut up → xMeasuredt up, since xMeasuredt = xActualt + ut
ut up → εt up, since εt = et − βxActual ut and βxActual < 0
xMeasuredt up ↔ εt up
    ↓
Positive Explanatory Variable/Error Term Correlation
    ↓
OLS Biased Upward
• βxActual = 0: When the actual coefficient equals 0, no correlation exists;
consequently, no bias results. To understand why, suppose that ut
increases:
ut up → xMeasuredt up, since xMeasuredt = xActualt + ut
ut up → εt unaffected, since εt = et − βxActual ut and βxActual = 0
xMeasuredt up ↔ εt unaffected
    ↓
No Explanatory Variable/Error Term Correlation
    ↓
OLS Unbiased
Summary: Explanatory Variable Measurement Error Bias
βxActual < 0              βxActual = 0              βxActual > 0
    ↓                         ↓                         ↓
xMeasuredt and εt are     xMeasuredt and εt are     xMeasuredt and εt are
positively correlated     uncorrelated              negatively correlated
    ↓                         ↓                         ↓
OLS estimation            OLS estimation            OLS estimation
procedure                 procedure                 procedure
is biased upward          is unbiased               is biased downward
    ↓                                                   ↓
Biased toward 0                                     Biased toward 0
Econometrics Lab 19.2: Explanatory Variable Measurement Error
Figure 19.2: Explanatory Variable Measurement Error Simulation
[Link to MIT-Lab 19.2 goes here.]
We shall use a simulation to check our logic. This time we check the XMeas Err
checkbox. The XMeas Var list now appears with 20.0 selected; the variance of the
measurement error’s probability distribution, Var[ut], equals 20.0. Then, we select
various values for the actual coefficient. In each case, click Start and then after
many, many repetitions click Stop. The simulation results are reported in Table
19.2:
Sample Size = 40
Type of                Actual   Mean (Average)
Measurement   XMeas    Coef     of the Estimated   Magnitude
Error         Var      Value    Coef Values        of Bias
Exp Vbl       20.0     2.0      ≈1.11              ≈.89
Exp Vbl       20.0     1.0      ≈.56               ≈.44
Exp Vbl       20.0     −1.0     ≈−.56              ≈.44
Exp Vbl       20.0     0.0      ≈.00               ≈.00
Table 19.2: Explanatory Variable Measurement Error Simulation Results
Table 19.2: Explanatory Variable Measurement Error Simulation Results
The simulation results confirm our logic. When the actual coefficient is
positive and explanatory variable measurement error is present, the ordinary least
squares (OLS) estimation procedure for the coefficient value is biased downward.
When the actual coefficient is negative and explanatory variable measurement
error is present, upward bias results. Lastly, when the actual coefficient is zero, no
bias results even in the presence of explanatory variable measurement error.
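The experiment can also be sketched outside the lab. The code below is an illustration under assumed settings: the actual x values are drawn from a normal distribution with mean 20 and variance 25, the error term has standard deviation 5, and the constant is 10. None of these values are stated in the chapter, so the printed means need not match Table 19.2 exactly.

```python
import random
import statistics

def ols_slope(x, y):
    """Return the OLS coefficient estimate from a simple regression of y on x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_exp_error(beta, meas_var=20.0, reps=2000, n=40, seed=0):
    """Mean of the coefficient estimates when x is measured with error."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x_actual = [rng.gauss(20.0, 5.0) for _ in range(n)]           # hypothetical
        y_actual = [10.0 + beta * xi + rng.gauss(0.0, 5.0) for xi in x_actual]
        # Only xMeasured = xActual + u is available to the regression.
        x_measured = [xi + rng.gauss(0.0, meas_var ** 0.5) for xi in x_actual]
        slopes.append(ols_slope(x_measured, y_actual))
    return statistics.fmean(slopes)

for beta in (2.0, 1.0, -1.0, 0.0):
    mean_b = simulate_exp_error(beta)
    print(f"actual coef = {beta:5.1f}  mean of estimates = {mean_b:6.3f}")
```

Under these assumptions the mean of the estimates is smaller in magnitude than the actual coefficient whenever the actual coefficient is nonzero, and it keeps the same sign.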
Explanatory Variable Measurement Error: Attenuation (Dilution) Bias
The simulations reveal an interesting pattern. While explanatory variable
measurement error leads to bias, the bias never appears to be strong enough to
change the sign of the mean of the coefficient estimates. In other words,
explanatory variable measurement error biases the ordinary least squares (OLS)
estimation procedure for the coefficient value toward 0. This type of bias is called
attenuation or dilution bias.
[Figure: the OLS estimation procedure shown on a βxActual axis, with the regions βxActual < 0 and βxActual > 0 on either side of 0.]
Figure 19.3: Effect of Explanatory Variable Measurement Error
Why does explanatory variable measurement error cause attenuation bias?
Even more basic, why does explanatory variable measurement error cause bias at
all? After all, the chances that the measured value of the explanatory variable will
be too high equal the chances it will be too low. Why should this lead to bias? To
appreciate why, suppose that the actual value of the coefficient, βxActual, is
positive. When the measured value of the explanatory variable, xMeasuredt, rises
it can do so for two reasons:
• the actual value of explanatory variable, xActualt, rises
or
• the value of the measurement error term, ut, rises.
Consider what happens to yActualt in each case:
xMeasuredt = xActualt + ut and yActualt = βConst + βxActual xActualt + et
Assume that βxActual > 0
xMeasuredt up because xActualt up → yActualt up
or
xMeasuredt up because ut up → yActualt unchanged
So, we have two possibilities:
• First case: The actual value of the dependent variable rises since the actual
value of the explanatory variable has risen. In this case, the estimation
procedure will estimate the value of the coefficient “correctly.”
• Second case: The actual value of the dependent variable remains
unchanged since the actual value of the explanatory variable is unchanged.
In this case, the estimation procedure would estimate the value of the
coefficient to be 0.
Taking into account both cases, the estimation procedure will understate the effect
that the actual value of the explanatory variable has on the dependent variable.
Overall, the estimation procedure will understate the actual value of the
coefficient.
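The two-case argument can be restated algebraically. The derivation below is a sketch that relies on assumptions the chapter does not spell out: the measurement error ut is uncorrelated with both xActualt and et, and et is uncorrelated with xActualt. Here b denotes the OLS coefficient estimate from regressing yActualt on xMeasuredt, and plim denotes its large-sample limit.

```latex
\begin{aligned}
\operatorname{Cov}[xMeasured_t,\, yActual_t]
  &= \operatorname{Cov}[xActual_t + u_t,\; \beta_{Const} + \beta_{xActual}\, xActual_t + e_t]
   = \beta_{xActual} \operatorname{Var}[xActual_t] \\
\operatorname{Var}[xMeasured_t]
  &= \operatorname{Var}[xActual_t] + \operatorname{Var}[u_t] \\
\operatorname{plim}\, b
  &= \frac{\operatorname{Cov}[xMeasured_t,\, yActual_t]}{\operatorname{Var}[xMeasured_t]}
   = \beta_{xActual} \cdot
     \frac{\operatorname{Var}[xActual_t]}{\operatorname{Var}[xActual_t] + \operatorname{Var}[u_t]}
\end{aligned}
```

The fraction multiplying βxActual lies between 0 and 1 whenever Var[ut] is positive, so the estimate is pulled toward 0 without changing sign. This agrees with the pattern in Table 19.2, where the mean of the estimates is roughly .56 times the actual coefficient in each nonzero row. If the formula applies to the lab's design, a ratio of that size would correspond to a variance of about 25 for the actual explanatory variable, although the lab's settings are not reported here.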
Might the Ordinary Least Squares (OLS) Estimation Procedure Be Consistent?
Econometrics Lab 19.3: Consistency and Explanatory Variable Measurement
Error
[Link to MIT-Lab 19.3 goes here.]
We have already shown that when explanatory variable measurement error is
present and the actual coefficient is nonzero, the ordinary least squares (OLS)
estimation procedure for the coefficient value is biased. But perhaps it is
consistent. Let us see, by increasing the sample size:
Estimation   XMeas   Sample   Actual   Mean of     Magnitude   Variance of
Procedure    Var     Size     Coef     Coef Ests   of Bias     Coef Ests
OLS          20      40       2.0      ≈1.11       ≈0.89       ≈0.2
OLS          20      50       2.0      ≈1.11       ≈0.89       ≈0.2
OLS          20      60       2.0      ≈1.11       ≈0.89       ≈0.1
Table 19.3: OLS Estimation Procedure, Measurement Error, and Consistency
Table 19.3: OLS Estimation Procedure, Measurement Error, and Consistency
The bias does not lessen as the sample size is increased. Unfortunately, when
explanatory variable measurement error is present and the actual coefficient is
nonzero, the ordinary least squares (OLS) estimation procedure for the coefficient
value provides only bad news:
• Bad news: The ordinary least squares (OLS) estimation procedure is
biased.
• Bad news: The ordinary least squares (OLS) estimation procedure is not
consistent.
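The lab varies the sample size only from 40 to 60. A sketch with a wider range of sample sizes makes the same point more starkly. As before, the settings are assumptions made for illustration (normally distributed actual x values with variance 25, error term standard deviation 5, constant 10) rather than the lab's own.

```python
import random
import statistics

def ols_slope(x, y):
    """Return the OLS coefficient estimate from a simple regression of y on x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def estimates_summary(n, reps=300, beta=2.0, meas_var=20.0, seed=0):
    """Mean and variance of the OLS estimates for sample size n."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x_actual = [rng.gauss(20.0, 5.0) for _ in range(n)]           # hypothetical
        y_actual = [10.0 + beta * xi + rng.gauss(0.0, 5.0) for xi in x_actual]
        x_measured = [xi + rng.gauss(0.0, meas_var ** 0.5) for xi in x_actual]
        slopes.append(ols_slope(x_measured, y_actual))
    return statistics.fmean(slopes), statistics.variance(slopes)

for n in (40, 200, 1000):
    mean_b, var_b = estimates_summary(n)
    print(f"sample size = {n:4d}  mean = {mean_b:.3f}  variance = {var_b:.4f}")
```

In this sketch the variance of the estimates shrinks as the sample size grows, but the mean stays well below the actual value of 2.0. The estimates cluster ever more tightly around the wrong value, which is what inconsistency means.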
Instrumental Variable (IV) Estimation Procedure: A Two Regression Procedure
Recall that the instrumental variable estimation procedure addresses situations in
which the explanatory variable and the error term are correlated:
Original Model: yt = βConst + βx xt + εt
where yt = Dependent variable
      xt = Explanatory variable
      εt = Error term
      t = 1, 2, …, T
      T = Sample size
When xt and εt are correlated
    ↓
xt is the “problem” explanatory variable
Figure 19.4: The “Problem” Explanatory Variable
When an explanatory variable, xt, is correlated with the error term, εt, we shall
refer to the explanatory variable as the “problem” explanatory variable. The
correlation of the explanatory variable and the error term creates the bias problem
for the ordinary least squares (OLS) estimation procedure. The instrumental
variable estimation procedure can mitigate, but not completely remedy the
problem. Let us briefly review the procedure and motivate it.
Mechanics
• Choose a “Good” Instrument: A “good” instrument, zt, must have two
properties:
o Correlated with the “problem” explanatory variable, xt.
o Uncorrelated with the error term, εt.
• Instrumental Variables (IV) Regression 1: Use the instrument, zt, to
provide an “estimate” of the problem explanatory variable, xt.
o Dependent variable: “Problem” explanatory variable, xt.
o Explanatory variable: Instrument, zt.
o Estimate of the “problem” explanatory variable: Estxt = aConst + azzt
where aConst and az are the estimates of the constant and coefficient
in this regression, IV Regression 1.
• Instrumental Variables (IV) Regression 2: In the original model,
replace the “problem” explanatory variable, xt, with its surrogate, Estxt, the
estimate of the “problem” explanatory variable provided by the
instrument, zt, from IV Regression 1.
o Dependent variable: Original dependent variable, yt.
o Explanatory variable: Estimate of the “problem” explanatory
variable based on the results from IV Regression 1, Estxt.
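The two regressions above can be sketched in code. The example below is a minimal illustration on simulated data, not the procedure applied to any data set in this chapter. The instrument z, the way the actual x depends on z, and every numerical setting are hypothetical. The helper names ols and iv_two_regressions are introduced here for the example.

```python
import random
import statistics

def ols(x, y):
    """Return (constant, coefficient) estimates from a simple OLS regression of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def iv_two_regressions(y, x, z):
    """Two-regression IV estimates of the constant and the coefficient on x."""
    a_const, a_z = ols(z, x)                     # IV Regression 1: x on the instrument z
    est_x = [a_const + a_z * zi for zi in z]     # surrogate: Estx = aConst + az * z
    return ols(est_x, y)                         # IV Regression 2: y on Estx

# Hypothetical data: z drives the actual x but is unrelated to e and u.
rng = random.Random(1)
n, beta_x = 5000, 2.0
z = [rng.gauss(0.0, 5.0) for _ in range(n)]
x_actual = [20.0 + zi + rng.gauss(0.0, 3.0) for zi in z]
x_measured = [xi + rng.gauss(0.0, 20.0 ** 0.5) for xi in x_actual]   # problem variable
y = [10.0 + beta_x * xi + rng.gauss(0.0, 5.0) for xi in x_actual]

_, b_ols = ols(x_measured, y)
_, b_iv = iv_two_regressions(y, x_measured, z)
print(f"actual coefficient: {beta_x:.2f}")
print(f"OLS estimate:       {b_ols:.2f}")
print(f"IV estimate:        {b_iv:.2f}")
```

In this simulated setting the OLS estimate is attenuated toward 0 while the IV estimate lands close to the actual coefficient. The result depends on the instrument being constructed to satisfy both “good” instrument conditions.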
The “Good” Instrument Conditions
Let us now provide the intuition behind why a “good” instrument, zt, must satisfy
the two conditions:
• Instrument/”Problem” Explanatory Variable Correlation: The
instrument, zt, must be correlated with the “problem” explanatory variable,
xt. To understand why, focus on IV Regression 1. We are using the
instrument to create a surrogate for the “problem” explanatory variable in
IV Regression 1:
Estxt = aConst + azzt
The estimate, Estxt, will be a good surrogate only if it is a good predictor
of the “problem” explanatory variable, xt. This will occur only if the
instrument, zt, is correlated with the “problem” explanatory variable, xt.
• Instrument/Error Term Independence: The instrument, zt, must be
independent of the error term, εt. Focus on IV Regression 2. We begin
with the original model and then replace the “problem” explanatory
variable, xt, with its surrogate, Estxt:
yt = βConst + βx xt + εt
    ↓ Replace “problem” with surrogate
   = βConst + βx Estxt + εt
where Estxt = aConst + az zt from IV Regression 1
To avoid violating the explanatory variable/error term independence
premise in IV Regression 2, the surrogate for the “problem” explanatory
variable, Estxt, must be independent of the error term, εt. The surrogate,
Estxt, is derived from the instrument, zt, in IV Regression 1:
Estxt = aConst + azzt
Consequently, to avoid violating the explanatory variable/error term
independence premise the instrument, zt, and the error term, εt, must be
independent.
yt = βConst + βx Estxt + εt
Estxt and εt must be independent
    ↓ Estxt = aConst + az zt
zt and εt must be independent
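Both conditions can be read off a single expression for the coefficient estimate from IV Regression 2. The algebra below is a sketch using sample variances and covariances. The symbol b_x^IV for that coefficient estimate is introduced here for illustration and is not the chapter's notation.

```latex
\begin{aligned}
a_z &= \frac{\operatorname{Cov}[z_t, x_t]}{\operatorname{Var}[z_t]}
  && \text{(IV Regression 1)} \\
b_x^{IV} &= \frac{\operatorname{Cov}[Estx_t, y_t]}{\operatorname{Var}[Estx_t]}
   = \frac{a_z \operatorname{Cov}[z_t, y_t]}{a_z^{2} \operatorname{Var}[z_t]}
   = \frac{\operatorname{Cov}[z_t, y_t]}{\operatorname{Cov}[z_t, x_t]}
  && \text{(IV Regression 2)} \\
b_x^{IV} &= \beta_x + \frac{\operatorname{Cov}[z_t, \varepsilon_t]}{\operatorname{Cov}[z_t, x_t]}
  && \text{(substituting } y_t = \beta_{Const} + \beta_x x_t + \varepsilon_t \text{)}
\end{aligned}
```

The denominator shows why the instrument must be correlated with the “problem” explanatory variable: when Cov[zt, xt] is near 0 the ratio is unstable. The numerator shows why the instrument must be uncorrelated with the error term: only then does the second term fade away as the sample grows.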
Measurement Error Example: Annual, Permanent, and Transitory Income
Definitions and Theory
Economists distinguish between permanent income and annual income. Loosely
speaking, permanent income equals what a household earns per year “on
average;” that is, permanent income can be thought of as the “average” of annual
income over an entire lifetime. In some years, the household’s annual income is
more than its permanent income, but in other years it is less. The difference
between the household’s annual income and permanent income is called transitory
income:
IncTranst = IncAnnt − IncPermt
where
IncAnnt = Household's Annual Income
IncPermt = Household's Permanent Income
IncTranst = Household's Transitory Income
or equivalently,
IncAnnt = IncPermt + IncTranst
Since permanent income equals what the household earns “on average,” the mean
of transitory income equals 0.
Microeconomic theory teaches that households base their consumption
decisions on their “permanent” income. We are going to apply the permanent
income consumption theory to health insurance coverage:
Theory: Additional permanent per capita disposable income within a state
increases health insurance coverage within the state.
Project: Assess the effect of permanent income on health insurance coverage.
We consider a straightforward linear model:
Model: Coveredt = βConst + βIncPerm IncPermPCt + et
Theory: βIncPerm > 0
where Coveredt = Percent of individuals with health insurance in state t
      IncPermPCt = Per capita permanent disposable income in state t
When we attempt to gather data to assess this theory, we immediately encounter a
difficulty. Permanent income cannot be observed. Only annual income data are
available to assess the theory.
Health Insurance Data: Cross section data of health insurance coverage,
education, and income statistics from the 50 states and the District of Columbia in
2007.
Coveredt     Adults (25 and older) covered by health insurance in state t (percent)
IncAnnPCt    Per capita annual disposable income in state t (thousands of dollars)
HSt          Adults (25 and older) who completed high school in state t (percent)
Collt        Adults (25 and older) who completed a four year college in state t (percent)
AdvDegt      Adults (25 and older) who have an advanced degree in state t (percent)
[Link to MIT-HealthInsur-2007.wf1 goes here.]
While we would like to specify permanent income as the explanatory variable, we
have no choice. We must use annual disposable income as the explanatory
variable. Using the ordinary least squares (OLS) estimation procedure to estimate
the parameters:
Ordinary Least Squares (OLS)
Dependent Variable: Covered
Explanatory Variable(s):   Estimate    SE         t-Statistic   Prob
IncAnnPC                   0.226905    0.104784   2.165464      0.0352
Const                      78.56242    3.605818   21.78768      0.0000
Number of Observations     51
Estimated Equation: EstCovered = 78.6 + .23IncAnnPC
Interpretation of Estimates:
bIncAnnPC = .23: A $1,000 increase in annual per capita disposable income
increases the state’s health insurance coverage by .23 percentage points.
Critical Result: The IncAnnPC coefficient estimate equals .23. The positive sign
of the coefficient estimate suggests that increases in disposable
income increase health insurance coverage. This evidence
supports the theory.
Table 19.4: Health Insurance OLS Regression Results
Now, construct the null and alternative hypotheses:
H0: βIncPerm = 0 Disposable income has no effect on health insurance
coverage
H1: βIncPerm > 0 Additional disposable income increases health insurance
coverage
Since the null hypothesis is based on the premise that the actual value of the
coefficient equals 0, we can calculate the Prob[Results IF H0 True] using the tails
probability reported in the regression printout:
Prob[Results IF H0 True] = .0352 / 2 = .0176
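The arithmetic can be checked directly from the regression printout. The short sketch below uses only the IncAnnPC row of Table 19.4. Halving the tails probability is appropriate here because the alternative hypothesis is one-sided and the coefficient estimate has the sign that the theory predicts.

```python
# Values copied from the IncAnnPC row of Table 19.4.
estimate = 0.226905
standard_error = 0.104784
tails_probability = 0.0352   # two-tailed probability reported in the printout

# The t-statistic is the estimate divided by its standard error (H0: coefficient = 0).
t_statistic = estimate / standard_error

# One-tailed probability for H1: coefficient > 0.
one_tail_probability = tails_probability / 2

print(f"t-statistic:              {t_statistic:.4f}")
print(f"Prob[Results IF H0 True]: {one_tail_probability:.4f}")
```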
Might the Ordinary Least Squares (OLS) Estimation Procedure Suffer from a
Serious Econometric Problem?
Might this regression suffer from a serious econometric problem, however? Yes.
Annual income equals permanent income plus transitory income; transitory
income can be viewed as measurement error. Sometimes transitory income is
positive, sometimes it is negative, on average it is 0:
IncAnnPCt = IncPermPCt + IncTransPCt
    ↓ IncTransPCt is the measurement error, ut
IncAnnPCt = IncPermPCt + ut
where Mean[ut] = 0
or equivalently,
IncPermPCt = IncAnnPCt − ut
As a consequence of explanatory variable measurement error the ordinary
least squares (OLS) estimation procedure for the coefficient will be biased
downward. To understand why, we begin with our model and then do a little
algebra:
Coveredt = βConst + βIncPerm IncPermPCt + et     where βIncPerm > 0
    ↓ Substituting for IncPermPCt
         = βConst + βIncPerm (IncAnnPCt − ut) + et
    ↓ Multiplying
         = βConst + βIncPerm IncAnnPCt − βIncPerm ut + et
    ↓ Rearranging terms
         = βConst + βIncPerm IncAnnPCt + et − βIncPerm ut
    ↓ Letting εt = et − βIncPerm ut
Coveredt = βConst + βIncPerm IncAnnPCt + εt
Theory suggests that βIncPerm is positive; consequently, we expect the new error
term, εt, and the explanatory variable, IncAnnPCt, to be negatively correlated.
ut up → IncAnnPCt up, since IncAnnPCt = IncPermPCt + ut
ut up → εt down, since εt = et − βIncPerm ut and βIncPerm > 0
IncAnnPCt up ↔ εt down
    ↓
Negative Explanatory Variable/Error Term Correlation
    ↓
OLS Biased Downward
IncAnnPCt is the “problem” explanatory variable because it is correlated with the
error term, εt. The ordinary least squares (OLS) estimation procedure for the
coefficient value is biased toward 0. We shall now show how we can use the
instrumental variable (IV) estimation procedure to mitigate the problem.
Instrumental Variable (IV) Approach
The Mechanics
Choose an Instrument: In this example, we use percent of adults who completed
high school, HSt, as our instrument. In doing so, we believe that it satisfies the
two “good” instrument conditions. We believe that high school education, HSt,
• is positively correlated with the “problem” explanatory variable,
IncAnnPCt.
and
• is uncorrelated with the error term, εt.
Instrumental Variables (IV) Regression 1
• Dependent variable: “Problem” explanatory variable, IncAnnPC.
• Explanatory variable: Instrument, the correlated variable, HS.
We can motivate IV Regression 1 by devising a theory to explain permanent
income. Our theory is very straightforward, state per capita permanent income
depends on percent of state residents who are high school graduates:
IncPermPCt = α Const + α HS HSt + et
where HSt = Percent of adults (25 and over) who completed high school in state t
Theory: As a state has a greater percent of high school graduates, its per capita
permanent income increases; hence, αHS > 0.
But again we note that permanent income is not observable, only annual income
is. Consequently, we have no choice but to use annual per capita income as the
dependent variable.
[Link to MIT-HealthInsur-2007.wf1 goes here.]
Ordinary Least Squares (OLS)
Dependent Variable: IncAnnPC
Explanatory Variable(s):   Estimate     SE         t-Statistic   Prob
HS                         0.456948     0.194711   2.346797      0.0230
Const                      −5.274762    16.75975   −0.314728     0.7543
Number of Observations     51
Estimated Equation: EstIncAnnPC = −5.27 + .457HS
Table 19.5: Health Insurance IV Regression 1 Results
What are the ramifications of using annual per capita income as the dependent
variable? We can view annual per capita income as permanent per capita income
with measurement error. What do we know about dependent variable
measurement error? Dependent variable measurement error does not lead to bias; only explanatory
variable measurement error creates bias. Since annual income is the dependent
variable in IV Regression 1, the ordinary least squares (OLS) estimation
procedure for the regression parameters will not be biased.
Instrumental Variables (IV) Regression 2
• Dependent variable: Original dependent variable, Covered.
• Explanatory variable: Estimate of the “problem” explanatory variable
based on the results from IV Regression 1, EstIncAnnPC.
Use the estimates of IV Regression 1 to create a new variable, the estimated value
of per capita disposable income based on the completion of high school:
EstIncAnnPC = −5.27 + .457HS
Ordinary Least Squares (OLS)
Dependent Variable: Covered
Explanatory Variable(s):   Estimate    SE         t-Statistic   Prob
EstIncAnnPC                1.387791    0.282369   4.914822      0.0000
Const                      39.05305    9.620730   4.059260      0.0002
Number of Observations     51
Estimated Equation: EstCovered = 39.05 + 1.39EstIncAnnPC
Interpretation of Estimates:
bEstIncAnnPC = 1.39: A $1,000 increase in annual per capita disposable income
increases the state’s health insurance coverage by 1.39 percentage points.
Critical Result: The EstIncAnnPC coefficient estimate equals 1.39. The positive
sign of the coefficient estimate suggests that increases in
permanent disposable income increase health insurance
coverage. This evidence supports the theory.
Table 19.6: Health Insurance IV Regression 2 Results
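To see how the two estimated equations chain together, the sketch below plugs a high school completion rate into IV Regression 1 and passes the result to IV Regression 2. The coefficient estimates are copied from Tables 19.5 and 19.6. The completion rate of 85 percent is a hypothetical input chosen for illustration, not a value from the data set, and the function names are introduced here for the example.

```python
# Coefficient estimates copied from Tables 19.5 and 19.6.
A_CONST, A_HS = -5.274762, 0.456948         # IV Regression 1: IncAnnPC on HS
B_CONST, B_EST_INC = 39.05305, 1.387791     # IV Regression 2: Covered on EstIncAnnPC

def est_inc_ann_pc(hs_percent):
    """Estimated per capita annual income (thousands of dollars) from IV Regression 1."""
    return A_CONST + A_HS * hs_percent

def est_covered(hs_percent):
    """Estimated health insurance coverage (percent) from IV Regression 2."""
    return B_CONST + B_EST_INC * est_inc_ann_pc(hs_percent)

hs = 85.0   # hypothetical high school completion rate, in percent
print(f"EstIncAnnPC at HS = {hs}: {est_inc_ann_pc(hs):.2f}")
print(f"EstCovered at HS = {hs}:  {est_covered(hs):.2f}")

# Change in estimated coverage from one more percentage point of HS:
# the product of the two coefficient estimates.
print(f"effect of one more HS point: {est_covered(hs + 1) - est_covered(hs):.3f}")
```

The last line shows that the instrument affects estimated coverage only through estimated income: the effect of one more percentage point of high school completion equals the IV Regression 1 coefficient times the IV Regression 2 coefficient.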
Comparison of the Ordinary Least Squares (OLS) and the Instrumental Variables
(IV) Approaches
Now review the two approaches that we used to estimate the effect of
permanent income on health insurance coverage: the ordinary least squares (OLS)
estimation procedure and the instrumental variable (IV) estimation procedure.
• First, we used annual disposable income as the explanatory variable and
applied the ordinary least squares (OLS) estimation procedure. We
estimated that a $1,000 increase in per capita disposable income increases
health insurance coverage by .23 percentage points. But we believe that an
explanatory variable measurement error problem is present here.
• Second, we used an instrumental variable (IV) approach, which resulted in
a higher estimate for the impact of permanent income. We estimated that a
$1,000 increase in per capita disposable income increases health insurance
coverage by 1.39 percentage points.
                                βIncPerm    Standard                  Tails
                                Estimate    Error       t-Statistic   Probability
Ordinary Least Squares (OLS)    .23         .105        2.17          .0352
Instrumental Variable (IV)      1.39        .282        4.91          <.0001
Table 19.7: Comparison of OLS and IV Regression Results
These results are consistent with the notion that the ordinary least squares (OLS)
estimation procedure for the coefficient value is biased downward whenever
explanatory variable measurement error is present.
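A short simulation illustrates the downward bias. It is a sketch under assumptions that are ours, not the chapter's: the measurement error is independent of the true explanatory variable and of the equation's error term, and the variances below are hypothetical. Under those assumptions the standard result is that the OLS slope settles near the actual coefficient multiplied by Var[x] / (Var[x] + Var[u]), a factor that is less than one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000                                # large sample, so sampling noise is small
beta = 2.0                                 # actual coefficient (hypothetical)
var_x, var_u, var_e = 20.0, 20.0, 25.0     # hypothetical variances

x = rng.normal(0.0, np.sqrt(var_x), n)     # true explanatory variable
u = rng.normal(0.0, np.sqrt(var_u), n)     # measurement error
e = rng.normal(0.0, np.sqrt(var_e), n)     # error term of the model
x_meas = x + u                             # what we actually observe
y = 10.0 + beta * x + e

# OLS slope from regressing y on the mismeasured explanatory variable.
slope_ols = np.cov(x_meas, y)[0, 1] / np.var(x_meas, ddof=1)

# Attenuation factor implied by the assumed variances.
attenuation = var_x / (var_x + var_u)

print(f"Actual coefficient:        {beta:.3f}")
print(f"OLS slope with meas error: {slope_ols:.3f}")
print(f"beta * attenuation factor: {beta * attenuation:.3f}")
```

Raising var_u in this sketch shrinks the attenuation factor and pulls the OLS slope further toward zero.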
“Good” Instrument Conditions Revisited
IV Regression 1 allows us to assess the first “good” instrument condition.
• Instrument/“Problem” Explanatory Variable Correlation: The
instrument, HSt, must be correlated with the “problem” explanatory
variable, IncAnnPCt. We are using the instrument to create a surrogate for
the “problem” explanatory variable in IV Regression 1:
EstIncAnnPCt = −5.27 + .457HSt
The estimate, EstIncAnnPCt, will be a “good” surrogate only if the
instrument, HSt, is correlated with the “problem” explanatory variable,
IncAnnPCt; that is, only if the estimate is a good predictor of the
“problem” explanatory variable.
The sign of the HSt coefficient is positive, supporting our view that annual income
and high school education are positively correlated. Furthermore, the coefficient
is significant at the 5 percent level and nearly significant at the 1 percent level.
So, it is reasonable to judge that the instrument meets the first condition.
Next, focus on the second “good” instrument condition:
• Instrument/Error Term Independence: The instrument, HSt, and the
error term, εt, must be independent. Otherwise, the explanatory
variable/error term independence premise would be violated in IV
Regression 2.
Recall the model that IV Regression 2 estimates:
Coveredt = βConst + βIncPermEstIncAnnPCt + εt
Question: Are EstIncAnnPCt and εt independent?
    EstIncAnnPCt = −5.27 + .457HSt
    εt = et − βIncPermut
Answer: Only if HSt and εt are independent.
The explanatory variable/error term independence premise will be satisfied only if
the instrument, HSt, and the new error term, εt, are independent. If they are
correlated, then we have gone “from the frying pan into the fire.” It was the
violation of this premise that created the problem in the first place. There is no
obvious reason to believe that they are correlated. Unfortunately, however, there is
no way to confirm this empirically. This can be the “Achilles heel” of the
instrumental variable (IV) estimation procedure. Finding a good
instrument can be very tricky.
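The link between the estimate and the instrument can be written out in terms of covariances. The lines below are a sketch that treats the IV Regression 1 coefficient estimates, −5.27 and .457, as fixed numbers, which is a simplifying assumption. Note also that a zero covariance is a weaker statement than independence.

```latex
\begin{align*}
\operatorname{Cov}[\mathit{EstIncAnnPC}_t,\ \varepsilon_t]
  &= \operatorname{Cov}[-5.27 + .457\,\mathit{HS}_t,\ \varepsilon_t] \\
  &= .457\,\operatorname{Cov}[\mathit{HS}_t,\ \varepsilon_t]
\end{align*}
```

Since the constant drops out and .457 is not zero, the estimate and the error term are uncorrelated exactly when the instrument and the error term are uncorrelated.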
Justifying the Instrumental Variable (IV) Estimation Procedure
Claim: While the instrumental variable (IV) estimation procedure for the
coefficient value in the presence of explanatory variable measurement error is
biased, it is consistent.
Econometrics Lab 19.4: Consistency and the Instrumental Variable (IV)
Estimation Procedure
While this claim can be justified rigorously, we shall avoid the mathematics by
using a simulation.
[Link to MIT-Lab 19.4 goes here.]
Figure 19.5: Instrumental Variable Measurement Error Simulation
Focus your attention on Figure 19.5. Since we wish to investigate the properties
of the instrumental variable (IV) estimation procedure, IV is selected in the
estimation procedure box. Next, note the XMeas Var List. Explanatory variable
measurement error is present. By default, the variance of the probability
distribution for the measurement error term, Var[ut], equals 20.0. In the Corr
X&Z list .50 is selected; the correlation coefficient between the explanatory
variable and the instrument is .50.
Initially, the sample size is 40. Click Start and then after many, many
repetitions click Stop. The average of the estimated coefficient values equals 2.24.
Next, increase the sample size from 40 to 50 and repeat the process. Do the same
for a sample size of 60. As Table 19.8 reports, the average of the estimated
coefficient values never equals the actual value; consequently, the instrumental
variable (IV) estimation procedure for the coefficient value is biased. But also
note that the magnitude of the bias decreases as the sample size increases. Also,
the variance of the estimates declines as the sample size increases.
Estimation   XMeas   Sample   Actual   Mean of      Magnitude   Variance of
Procedure    Var     Size     Coef     Coef Ests    of Bias     Coef Ests
IV           20      40       2.0      ≈2.24        ≈0.24       ≈8.8
IV           20      50       2.0      ≈2.17        ≈0.17       ≈3.4
IV           20      60       2.0      ≈2.12        ≈0.12       ≈1.7
Table 19.8: Measurement Error, IV Estimation Procedure, and Consistency
Table 19.8 suggests that when explanatory variable measurement error is present,
the instrumental variable (IV) estimation procedure for the coefficient value
provides both good news and bad news:
• Bad news: The instrumental variable (IV) estimation procedure for the
coefficient value is still biased; the average of the estimated coefficient
values does not equal the actual value.
• Good news: The instrumental variable (IV) estimation procedure for the
coefficient value is consistent. As the sample size is increased,
o the magnitude of the bias diminishes.
o the variance of the estimated coefficient values decreases.
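Readers without access to the lab can run a rough analogue of the simulation in Python. This is a sketch, not a reproduction of the lab: the actual coefficient (2.0), Var[ut] (20), the X and Z correlation (.50), and the sample sizes (40, 50, 60) follow the settings described above, while var_x, var_e, the number of repetitions, and the use of normal distributions are our own assumptions, so the numbers will not match Table 19.8. The function simulate_iv is hypothetical. Because the simple IV ratio occasionally produces extreme estimates, the sketch reports the median and the interquartile range next to the mean.

```python
import numpy as np

def simulate_iv(n, reps, rng, beta=2.0, var_x=50.0, var_u=20.0,
                var_e=20.0, corr_xz=0.5):
    """Return `reps` IV slope estimates, each from a sample of size n.

    var_x and var_e are hypothetical; the lab's values are not known to us.
    """
    g1 = rng.standard_normal((reps, n))
    g2 = rng.standard_normal((reps, n))
    x = np.sqrt(var_x) * g1                               # true explanatory variable
    z = corr_xz * g1 + np.sqrt(1.0 - corr_xz**2) * g2     # instrument, Corr[x, z] = corr_xz
    u = np.sqrt(var_u) * rng.standard_normal((reps, n))   # measurement error
    e = np.sqrt(var_e) * rng.standard_normal((reps, n))   # error term of the model
    x_meas = x + u                                        # observed explanatory variable
    y = beta * x + e
    zc = z - z.mean(axis=1, keepdims=True)                # instrument, demeaned by sample
    # IV slope = sample Cov[z, y] / sample Cov[z, x_meas]; with one instrument
    # this equals the slope from the two-regression procedure.
    return (zc * y).sum(axis=1) / (zc * x_meas).sum(axis=1)

rng = np.random.default_rng(19)
results = {}
for n in (40, 50, 60):
    est = simulate_iv(n, reps=5000, rng=rng)
    q25, q50, q75 = np.percentile(est, [25, 50, 75])
    results[n] = {"mean": est.mean(), "median": q50, "iqr": q75 - q25}
    print(f"n={n}: mean={est.mean():.2f}  median={q50:.2f}  IQR={q75 - q25:.2f}")
```

In this sketch the spread of the estimates should narrow as the sample size rises, which is the pattern that consistency leads us to expect.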