The Why?, What?, and How? of Instrumental Variables Estimation

The Plan for Day Two

Practice and pitfalls
(1) Natural experiments as interesting sources of instrumental variables
(2) The consequences of “weak” instruments for causal inference
(3) Some useful IV diagnostics
(4) Walk through an empirical application

Goal = provide concrete examples of instrumental variables methods
Instrumental Variables and Natural Experiments

What is a natural experiment?
– “situations where the forces of nature or government policy have conspired to produce an environment somewhat akin to a randomized experiment” (Angrist and Krueger, 2001, p. 73)

Natural experiments can provide a useful source of exogenous variation in problematic regressors
– But they require detailed institutional knowledge
Instrumental Variables and Natural Experiments

Some natural experiments in economics
– Existing policy differences, or changes that affect some jurisdictions (or groups) but not others
  – Minimum wage rate
  – Excise taxes on consumer goods
  – Unemployment insurance, workers’ compensation
– Unexpected “shocks” to the local economy
  – Coal prices and the Middle East oil embargo (1973)
  – Agricultural production and adverse weather events
Instrumental Variables and Natural Experiments

Some potential pitfalls
– Not all policy differences/changes are exogenous
  – Political factors and past realizations of the response variable can affect existing policies or policy changes
– Generalizability of causal effect estimates
  – Results may not generalize beyond the units under study
– Heterogeneity in causal effects
  – Results may be sensitive to the natural experiment chosen in a specific study (L.A.T.E.)
Instrumental Variables and Natural Experiments

Some natural experiments of criminological interest
– Levitt (1996) = prison population → crime rate
– Levitt (1997) = police hiring → crime rate
– Apel et al. (2008) = youth employment → delinquency

Some natural experiments not of criminological interest, but interesting nonetheless
– Angrist and Evans (1998) = fertility → labor supply
Levitt (1996), Q.J.E.

Large decline in crime did not accompany the large increase in prison population (1971-1993)
– Prima facie evidence of prison ineffectiveness

But...increased prison use could mask what would have been a greater increase in crime
– Underlying determinants of crime probably worsened

And...prison population probably responded to crime increase
Levitt (1996), Q.J.E.

Prison overcrowding litigation
– Population caps, prohibition of “double celling”
– In 12 states, the entire prison system came under court control
  – AL, AK, AR, DE, FL, MS, NM, OK, RI, SC, TN, TX

Relationship between litigation and prisons
– Prior to filing, prison growth outpaced the national average by 2.3 percent
– After filing, prison growth was 5.1 percent slower
Levitt (1996), Q.J.E.

[Diagram: Prisons Under Court Control →(–) Prison Population Growth →(–) Crime Rate Growth]

Logic of the instrumental variable in this study
– Court rulings concerning prison capacity cannot be correlated with the unobserved determinants of crime rate changes
– Or...the only reason court rulings are related to crime is because they limit prison population growth
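In the covariance notation used later in this deck, the two statements of the logic can be sketched as follows. The symbols are illustrative assumptions: Z = court control of the prison system, X = prison population growth, Y = crime rate growth, and e = unobserved determinants of crime rate changes.

```latex
\begin{aligned}
&\operatorname{Cov}(Z, X) \neq 0 && \text{(relevance)} \\
&\operatorname{Cov}(Z, e) = 0 && \text{(exclusion)} \\
&b_{IV} = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, X)} && \text{(single instrument, no controls)}
\end{aligned}
```

With a single instrument and no controls, the IV estimate is just this ratio, so any path from Z to Y that bypasses X would be wrongly attributed to X.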
Levitt (1996), Q.J.E.

2SLS model yields a “prison effect” on crime at least four times as high as the LS model
– Violent crime rate
  – bLS = –.099 (s.e. = .033)
  – bIV = –.424 (s.e. = .201)
– Property crime rate
  – bLS = –.071 (s.e. = .019)
  – bIV = –.321 (s.e. = .138)
– A 10% increase in prison size produces a 4.2% decrease in violent crime and a 3.2% decrease in property crime
Levitt (1996), Q.J.E.

L.A.T.E. = effect of prison growth on crime among states under court order to slow growth

Some relevant observations
– Generalizability = predominately Southern states
  – Large prison populations, unusually fast prison growth
– T.E. heterogeneity = (slowed) prison growth due to court-ordered prison reductions may be differentially related to crime rates
  – Other IV’s could lead to different causal effect estimates
Levitt (1997), A.E.R.

Breaking the simultaneity in the police-crime connection
– When more police are hired, crime should decline
– But...more police may be hired during crime waves

Election cycles and police hiring
– Increases in size of police force disproportionately concentrated in election years
– Growth is 2.1% in mayoral election years, 2.0% in gubernatorial election years, and 0.0% in nonelection years
Levitt (1997), A.E.R.

However...can election cycles affect crime rates through other spending channels?
– Ex., education, welfare, unemployment benefits
– If so, all of these other indirect channels must be netted out

[Diagram: Election Year →(+) Growth in Police Manpower →(–) Growth in Crime Rate]
Levitt (1997), A.E.R.

[Tables not reproduced: reduced-form coefficients and first-stage coefficients]
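For intuition about how the two sets of coefficients combine: in the just-identified case (one instrument, one endogenous regressor), the IV estimate is the reduced-form coefficient divided by the first-stage coefficient. The notation below is an illustrative assumption (Z = election-year indicator, X = growth in police manpower, Y = growth in crime rate); with several instruments, 2SLS forms, roughly speaking, a weighted combination of such ratios.

```latex
\begin{aligned}
X &= \pi_{0} + \pi_{X} Z + v && \text{(first stage)} \\
Y &= \rho_{0} + \rho_{Y} Z + u && \text{(reduced form)} \\
b_{IV} &= \hat{\rho}_{Y} / \hat{\pi}_{X}
\end{aligned}
```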
Levitt (1997), A.E.R.

Comparative estimates of the effect of police manpower on city crime rates
– Violent crime rate
  – Levels: bLS = +.28 (s.e. = .05)
  – Changes: bLS = –.27 (s.e. = .06)
  – Changes: bIV = –1.39 (s.e. = .55)
– Property crime rate
  – Levels: bLS = +.21 (s.e. = .05)
  – Changes: bLS = –.23 (s.e. = .09)
  – Changes: bIV = –.38 (s.e. = .83)
Levitt (1997), A.E.R.

Follow-up instrumental variables studies of the police-crime relationship in the U.S.
– Levitt (2002) = Number of firefighters
– Klick and Tabarrok (2005) = Washington, DC, terrorism alert levels post-9/11
– Evans and Owens (2007) = Grants from the federal Office of C.O.P.S.

These findings basically replicated those from Levitt’s (1997) original study
Apel et al. (2008), J.Q.C.

What effect does working have on adolescent behavior?
– Prior research suggests the consequences of work are uniformly negative
– Focus on “work intensity” rather than work per se
  – Youth Worker Protection Act

Problem of non-random selection into youth labor market
– Especially pronounced for high-intensity workers
Apel et al. (2008), J.Q.C.

Something interesting happens at age 16
– Youth work is no longer governed by the federal Fair Labor Standards Act (F.L.S.A.)
Apel et al. (2008), J.Q.C.

F.L.S.A. governs employment of all 15 year olds during the school year
– No work past 7:00 pm
– Maximum 3 hours/day and 18 hours/week

But, F.L.S.A. expires for 16 year olds
– And...every state has its own law governing 16-year-old employment
– Thus, youth age into less restrictive regimes that vary across jurisdictions
Apel et al. (2008), J.Q.C.

[Figure: Change in work intensity at 15-16 transition among 15-year-old non-workers]

Magnitude of change is an increasing function of the number of hours allowed at age 16
Apel et al. (2008), J.Q.C.

State Child Labor Law        Model 1           Model 2           Model 3
Hours per Week               0.32 (.05)***
  No Limit                   11.43 (1.6)***
Hours per Weekday                              1.19 (.19)***
  No Limit                                     9.37 (1.3)***
Work Curfew                                                      2.19 (.27)***
  No Limit                                                       23.83 (2.7)***
R-square                     .400              .401              .409
ΔR-square with IV’s          .014              .015              .023
Partial R-square for IV’s    .023              .025              .037
F-test for IV’s              26.2              28.3              41.9
Approx. relative bias        .000              .000              .000
Apel et al. (2008), J.Q.C.

A 20-hour increase in the number of hours worked per week reduces the “variety” of delinquent behavior by 0.47 (= –.0233 × 20)
Angrist and Krueger (1991), Q.J.E.

Returns to education (Y = wages)
– Problem of omitted “ability bias”

Years of schooling vary by quarter of birth
– Compulsory schooling laws, age-at-entry rules
– Someone born in Q1 is a little older and will be able to drop out sooner than someone born in Q4

Q.O.B. can be treated as a useful source of exogeneity in schooling
Angrist and Krueger (1991), Q.J.E.

People born in Q1 do obtain less schooling
– But pay close attention to the scale of the y-axis
– Mean difference between Q1 and Q4 is only 0.124 years, or 1.5 months

So...need large N since $R^2_{X,Z}$ will be very small
– A&K had over 300k for the 1930-39 cohort

Source: Angrist and Krueger (1991), Figure I
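A back-of-the-envelope sketch of why a small first-stage R-square demands a large N. It assumes one instrument, no controls, and homoskedastic errors, in which case the large-sample standard error of bIV is roughly the LS standard error divided by the first-stage correlation. The function names and the R-square values are hypothetical, not figures from Angrist and Krueger.

```python
import math


def iv_se_inflation(r2_xz: float) -> float:
    """Approximate ratio se(b_IV) / se(b_LS) for one instrument (illustrative)."""
    if not 0.0 < r2_xz <= 1.0:
        raise ValueError("r2_xz must lie in (0, 1]")
    return 1.0 / math.sqrt(r2_xz)


def sample_size_multiplier(r2_xz: float) -> float:
    """Factor by which N must grow for IV to match the LS standard error."""
    if not 0.0 < r2_xz <= 1.0:
        raise ValueError("r2_xz must lie in (0, 1]")
    return 1.0 / r2_xz


# Hypothetical first-stage R-square values, from moderate to very weak.
for r2 in (0.10, 0.01, 0.001):
    print(r2, round(iv_se_inflation(r2), 1), round(sample_size_multiplier(r2)))
```

Under these assumptions, a first-stage R-square of .001 calls for on the order of a thousand times more observations than LS would need for the same precision.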
Angrist and Krueger (1991), Q.J.E.

Final 2SLS model interacted QOB with year of birth (30), state of birth (150)
– OLS: b = .0628 (s.e. = .0003)
– 2SLS: b = .0811 (s.e. = .0109)

Least squares estimate does not appear to be badly biased by omitted variables
– But...replication effort identified some pitfalls in this analysis that are instructive
Bound, Jaeger, and Baker (1995), J.A.S.A.

Potential problems with QOB as an IV
– Correlation between QOB and schooling is weak
  – Small Cov(X,Z) introduces finite-sample bias, which will be exacerbated with the inclusion of many IV’s
– QOB may not be completely exogenous
  – Even small Cov(Z,e) will cause inconsistency, and this will be exacerbated when Cov(X,Z) is small

QOB qualifies as a weak instrument that may be correlated with unobserved determinants of wages (e.g., family income)
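The inconsistency problem can be stated compactly. A sketch in probability limits, using the slide's Cov(X,Z) and Cov(Z,e) notation (β denotes the true coefficient and ρ a correlation; both labels are assumed here), following the argument in Bound, Jaeger, and Baker (1995):

```latex
\begin{aligned}
\operatorname{plim} b_{IV} &= \beta + \frac{\operatorname{Cov}(Z, e)}{\operatorname{Cov}(Z, X)} \\
\operatorname{plim} b_{LS} &= \beta + \frac{\operatorname{Cov}(X, e)}{\operatorname{Var}(X)} \\
\frac{\operatorname{plim} b_{IV} - \beta}{\operatorname{plim} b_{LS} - \beta}
  &= \frac{\rho_{Z,e} / \rho_{X,e}}{\rho_{X,Z}}
\end{aligned}
```

Because the first-stage correlation sits in the denominator, a weak first stage magnifies even a slight correlation between the instrument and the error.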
Bound, Jaeger, and Baker (1995), J.A.S.A.

Even if the instrument is “good,” matters can be made far worse with IV as opposed to LS
– Weak correlation between IV and endogenous regressor can pose severe finite-sample bias
  – And…really large samples won’t help, especially if there is even weak endogeneity between IV and error

First-stage diagnostics provide a sense of how good an IV is in a given setting
– F-test and partial R-square on IV’s
Useful Diagnostic Tools for IV Models

Tests of instrument relevance
– Weak IV’s → Large variance of bIV as well as potentially severe finite-sample bias

Tests of instrument exogeneity
– Endogenous IV’s → Inconsistency of bIV that makes it no better (and probably worse) than bLS

Durbin-Wu-Hausman test
– Endogeneity of the problem regressor(s)
Tests of Instrument Relevance

Diagnostics based on the F-test for the joint significance of the IV’s
– Nelson and Startz (1990); Staiger and Stock (1997)
– Bound, Jaeger, and Baker (1995)

Partial R-square for the IV’s
– Shea (1997)

There is a growing econometric literature on the “weak instrument” problem
Tests of Instrument Exogeneity

Model must be overidentified, i.e., more IV’s than endogenous X’s
– H0: All IV’s uncorrelated with structural error

Overidentification test:
1. Estimate structural model
2. Regress the residuals from the IV (2SLS) model on all exogenous variables
3. Compute N × R-square from that regression and compare to chi-square
– df = # IV’s – # endogenous X’s
Durbin-Wu-Hausman (DWH) Test

Balances the consistency of IV against the efficiency of LS
– H0: IV and LS both consistent, but LS is efficient
– H1: Only IV is consistent

DWH test for a single endogenous regressor:
– $\mathrm{DWH} = (b_{IV} - b_{LS}) / \sqrt{s^2_{b_{IV}} - s^2_{b_{LS}}} \sim N(0,1)$
– If |DWH| > 1.96, then X is endogenous and IV is the preferred estimator despite its inefficiency
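A minimal sketch of the single-regressor DWH calculation, assuming only the four summary numbers are available. The function name is made up for illustration; the inputs are the rounded estimates from the work-delinquency application later in this deck.

```python
import math


def dwh_z(b_iv: float, se_iv: float, b_ls: float, se_ls: float) -> float:
    """DWH z-statistic for a single endogenous regressor."""
    var_diff = se_iv ** 2 - se_ls ** 2
    if var_diff <= 0:
        # The formula breaks down if IV appears more precise than LS.
        raise ValueError("se_iv must exceed se_ls for this version of the test")
    return (b_iv - b_ls) / math.sqrt(var_diff)


# Rounded OLS and 2SLS estimates from the work-delinquency example.
z = dwh_z(b_iv=-0.074, se_iv=0.028, b_ls=0.026, se_ls=0.010)
reject_exogeneity = abs(z) > 1.96
print(f"DWH = {z:.2f}, reject H0: {reject_exogeneity}")
```

The guard on the variance difference matters in practice: with rounded or robust standard errors the difference can come out negative, in which case the regression-based version of the test is the safer route.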
Durbin-Wu-Hausman (DWH) Test

A roughly equivalent procedure for DWH:
1. Estimate the first-stage model
2. Include the first-stage residual in the structural model along with the endogenous X
3. Test for significance of the coefficient on residual

Note: Coefficient on endogenous X in this model is bIV (standard error is smaller, though)
– First-stage residual is a “generated regressor”
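The three steps can be written out as a pair of regressions. This is a sketch in assumed notation (z = instruments, w = exogenous controls, v-hat = first-stage residual), not a transcription of any particular routine:

```latex
\begin{aligned}
x &= \pi_0 + \pi_1 z + w'\pi_2 + v, \qquad \hat{v} = x - \hat{x} \\
y &= \beta_0 + \beta_1 x + w'\gamma + \delta \hat{v} + u, \qquad H_0\colon \delta = 0
\end{aligned}
```

Rejecting H0 indicates that x is endogenous; the estimate of the coefficient on x from the augmented regression equals bIV.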
Software Considerations

I have a strong preference for Stata
– Classic routine (-ivreg-) as well as a user-written one with a lot more diagnostic capability (-ivreg2-)
– Non-linear models: -ivprobit- and -ivtobit-
– Panel models: -xtivreg- and -xtivreg2-

Useful post-estimation routines
– Overidentification: -overid-
– Endogeneity of X in LS model: -ivendog-
– Heteroscedasticity: -ivhettest-
Software Considerations

Basic model specification in Stata
    ivreg y (x = z) w [weight = wtvar], options
      y = dependent variable
      x = endogenous variable
      z = instrumental variable
      w = control variable(s)
– Useful options: first, ffirst (an -ivreg2- option), robust, cluster(varname)
Software Considerations

For SAS users: Proc Syslin (SAS/ETS)
– Basic command:
    proc syslin data=dataset 2sls options1;
      endogenous x;
      instruments z w;
      model y = x w / options2;
      weight wtvar;
    run;
– Useful “options1”: first
– Useful “options2”: overid
Software Considerations

For SPSS users: 2SLS
– Basic command:
    2sls y with x w
      / instruments z w
      / constant.
– For point-and-click aficionados
  – Analyze → Regression → Two-Stage Least Squares
  – DEPENDENT, EXPLANATORY, and INSTRUMENTAL
Software Considerations

For Limdep users: 2SLS
– Basic command:
    2SLS ; Lhs = y
         ; Rhs = one, x, w
         ; Inst = one, z, w
         ; Wts = wtvar
         ; Dfc $
Application: Adolescent Work and Delinquent Behavior

Prior research shows a positive correlation between teenage work and delinquency
– Reasons to suspect serious endogeneity bias

2nd wave of the NLSY97 (N = 8,368)
– Y = 1 if committed delinquent act (31.9%)
– X = 1 if worked in a formal job (52.6%)
– Z1 = 1 if child labor law allows 40+ hours (14.2%)
– Z2 = 1 if no child labor restriction in place (39.6%)
Regression Model Ignoring Endogeneity

. reg pcrime work if nomiss==1 & wave==2

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  1,  8366) =    6.33
       Model |  1.37395379     1  1.37395379           Prob > F      =  0.0119
    Residual |  1815.97786  8366  .217066443           R-squared     =  0.0008
-------------+------------------------------           Adj R-squared =  0.0006
       Total |  1817.35182  8367  .217204711           Root MSE      =   .4659

------------------------------------------------------------------------------
      pcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        work |   .0256633   .0102005     2.52   0.012     .0056677    .0456588
       _cons |   .3053242   .0074009    41.26   0.000     .2908167    .3198318
------------------------------------------------------------------------------

Teenage workers significantly more delinquent
– Modest effect but consistent with prior research
First-Stage Model

. reg work law40 nolaw if nomiss==1 & wave==2

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  2,  8365) =  626.64
       Model |  271.829722     2  135.914861           Prob > F      =  0.0000
    Residual |  1814.33364  8365  .216895832           R-squared     =  0.1303
-------------+------------------------------           Adj R-squared =  0.1301
       Total |  2086.16336  8367  .249332301           Root MSE      =  .46572

------------------------------------------------------------------------------
        work |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       law40 |   .0688902   .0154383     4.46   0.000     .0386274     .099153
       nolaw |   .3818684   .0110273    34.63   0.000     .3602521    .4034847
       _cons |   .3655636   .0074883    48.82   0.000     .3508847    .3802425
------------------------------------------------------------------------------

State child labor laws affect probability of work
– This is a really strong first stage (F, R-square)
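The F statistic and R-square in this output can be recovered from the sums of squares, which is a handy check when reading first-stage results. A minimal sketch; the function name is made up for illustration, and the inputs are copied from the output above. Because this first stage has no control variables, the overall F and R-square are also the F-test and partial R-square for the IV's.

```python
def f_statistic(model_ss: float, resid_ss: float, df_model: int, df_resid: int) -> float:
    """Joint F statistic: mean square explained over mean square residual."""
    return (model_ss / df_model) / (resid_ss / df_resid)


# Sums of squares and degrees of freedom from the first-stage output.
model_ss, resid_ss = 271.829722, 1814.33364
f_stat = f_statistic(model_ss, resid_ss, df_model=2, df_resid=8365)
r_squared = model_ss / (model_ss + resid_ss)

print(f"first-stage F = {f_stat:.2f}, R-square = {r_squared:.4f}")
```

For scale, a first-stage F above roughly 10 is a commonly cited benchmark in the weak-instrument literature, so these instruments clear it by a wide margin.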
Two-Stage Least Squares Model

. ivreg pcrime (work = law40 nolaw) if nomiss==1 & wave==2

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  1,  8366) =    6.86
       Model | -19.5287923     1 -19.5287923           Prob > F      =  0.0088
    Residual |  1836.88061  8366  .219564978           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  1817.35182  8367  .217204711           Root MSE      =  .46858

------------------------------------------------------------------------------
      pcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        work |  -.0744352   .0284206    -2.62   0.009    -.1301466   -.0187238
       _cons |   .3580171   .0158135    22.64   0.000     .3270187    .3890155
------------------------------------------------------------------------------
Instrumented:  work
Instruments:   law40 nolaw
------------------------------------------------------------------------------
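The negative model sum of squares and the missing R-square are not errors. 2SLS residuals are computed with the actual (not the predicted) values of the endogenous regressor, so the residual sum of squares can exceed the total. A quick check using the figures in the output; the variable names are illustrative.

```python
import math

# Figures copied from the 2SLS output.
total_ss = 1817.35182
resid_ss = 1836.88061
df_resid = 8366

model_ss = total_ss - resid_ss               # negative when RSS exceeds TSS
implied_r2 = 1 - resid_ss / total_ss         # negative, so Stata prints "."
root_mse = math.sqrt(resid_ss / df_resid)    # still well defined

print(f"model SS = {model_ss:.4f}")
print(f"implied R-square = {implied_r2:.4f}")
print(f"root MSE = {root_mse:.5f}")
```

The takeaway is that R-square has no useful interpretation in a 2SLS model; the coefficient, its standard error, and the diagnostics are what matter.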
What Do the Models Suggest Thus Far?

Completely different conclusions!
– OLS = Teenage work is criminogenic (b = +.026)
  – Delinquency risk increases by 8.5 percent (base = .305)
– 2SLS = Teenage work is prophylactic (b = –.074)
  – Delinquency risk decreases by 20.7 percent (base = .358)

Which model should we believe?
– We still have some additional diagnostic work to do to evaluate the 2SLS model
  – Overidentification test, Hausman test
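The percentage changes quoted on this slide are each coefficient divided by the corresponding model intercept (the predicted delinquency rate for non-workers). A small sketch of that arithmetic with the rounded values; the function name is hypothetical.

```python
def relative_change_pct(coef: float, base: float) -> float:
    """Coefficient expressed as a percentage of the baseline rate."""
    return 100.0 * coef / base


ols_change = relative_change_pct(0.026, 0.305)     # OLS: work coefficient over _cons
tsls_change = relative_change_pct(-0.074, 0.358)   # 2SLS: work coefficient over _cons

print(f"OLS: {ols_change:+.1f}%   2SLS: {tsls_change:+.1f}%")
```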
Regression-Based Overidentification Test

. reg IVresid law40 nolaw if nomiss==1 & wave==2

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  2,  8365) =    0.25
       Model |  .111648085     2  .055824043           Prob > F      =  0.7755
    Residual |  1836.76895  8365  .219577878           R-squared     =  0.0001
-------------+------------------------------           Adj R-squared = -0.0002
       Total |   1836.8806  8367  .219538735           Root MSE      =  .46859

------------------------------------------------------------------------------
     IVresid |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       law40 |    .010988   .0155334     0.71   0.479    -.0194613    .0414374
       nolaw |   .0016436   .0110953     0.15   0.882     -.020106    .0233931
       _cons |  -.0022127   .0075344    -0.29   0.769    -.0169821    .0125567
------------------------------------------------------------------------------

Overidentification test = N × R-square = 8,368 × .0000608 = .509 ~ χ²(1)
– Using the R-square as rounded in the output (.0001) would give .8368, so compute R-square from the sums of squares
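The R-square printed in the output is rounded to four decimals, which is too coarse for an N × R-square calculation. A short sketch that recomputes the statistic from the sums of squares and attaches a p-value; it relies on the fact that the upper tail of a chi-square with 1 df can be written with the complementary error function. Variable names are illustrative.

```python
import math

n_obs = 8368
model_ss = 0.111648085   # explained sum of squares from the residual regression
total_ss = 1836.8806     # total sum of squares from the same output

r_squared = model_ss / total_ss     # unrounded R-square
overid_stat = n_obs * r_squared     # N x R-square

# Upper-tail probability of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(overid_stat / 2))

print(f"R-square = {r_squared:.7f}")
print(f"N x R-square = {overid_stat:.3f}, p = {p_value:.4f}")
```

Computed this way, the by-hand statistic lines up with the Sargan N*R-sq test reported by -overid-.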
Overidentification Test from the Software

. overid

Tests of overidentifying restrictions:
Sargan N*R-sq test      0.509  Chi-sq(1)    P-value = 0.4757
Basmann test            0.508  Chi-sq(1)    P-value = 0.4758

IV’s jointly pass the exogeneity requirement
– Notice that -overid- provides a global test, whereas the regression-based approach allows you to test the IV’s jointly as well as individually
Durbin-Wu-Hausman (DWH) Test Estimated by Hand

Summary coefficients
– OLS model: b = +.026, s.e. = .010
– 2SLS model: b = –.074, s.e. = .028
– Notice the size of the 2SLS standard error

$\mathrm{DWH} = (-.074 - .026) / \sqrt{.028^2 - .010^2} \approx -3.82$

CONCLUSION: Least squares estimate of the “work effect” is biased and inconsistent
– The 2SLS estimate is preferred
Regression-Based DWH Test

. reg pcrime work FSresid if nomiss==1 & wave==2

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  2,  8365) =   10.40
       Model |  4.50567523     2  2.25283761           Prob > F      =  0.0000
    Residual |  1812.84614  8365  .216718009           R-squared     =  0.0025
-------------+------------------------------           Adj R-squared =  0.0022
       Total |  1817.35182  8367  .217204711           Root MSE      =  .46553

------------------------------------------------------------------------------
      pcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        work |  -.0744352   .0282357    -2.64   0.008    -.1297842   -.0190862
     FSresid |   .1150956   .0302771     3.80   0.000     .0557449    .1744462
       _cons |   .3580171   .0157106    22.79   0.000     .3272204    .3888139
------------------------------------------------------------------------------

Coeff. on work is bIV, while t-test on FSresid is DWH
– Standard error for work is underestimated, though
Or Just Let the Software Give You the DWH Test

. ivendog

Tests of endogeneity of: work
H0: Regressor is exogenous
    Wu-Hausman F test:               14.45067  F(1,8365)   P-value = 0.00014
    Durbin-Wu-Hausman chi-sq test:   14.43093  Chi-sq(1)   P-value = 0.00015

Notice that -ivendog- provides a chi-square test for DWH, but the z-test that we computed by hand is easily recovered
– √(χ²) = z → √(14.43) = 3.80
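The conversion from the chi-square statistic to a z-statistic takes one line, and the p-value follows from the normal tail. A small sketch with the statistic reported by -ivendog-; note that the square root recovers only the magnitude of z, not its sign.

```python
import math

chi_sq = 14.43093                 # Durbin-Wu-Hausman chi-square(1) from -ivendog-
z = math.sqrt(chi_sq)             # magnitude of the equivalent z-statistic

# Two-sided normal tail probability, identical to the chi-square(1) upper tail.
p_value = math.erfc(z / math.sqrt(2))

print(f"|z| = {z:.2f}, p = {p_value:.5f}")
```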
Alternative Specifications for the Work-Delinquency Association

IV probit model
– Without IV’s: b = +.072 (s.e. = .029)
– With IV’s: b = –.207 (s.e. = .078)

Continuous work hours
– Without IV’s: b = +.0015 (s.e. = .0003)
– With IV’s: b = –.0024 (s.e. = .0009)

Indicator for “intensive” work (>20 hours)
– Without IV’s: b = +.043 (s.e. = .012)
– With IV’s: b = –.095 (s.e. = .036)
Alternative Specifications for the Work-Delinquency Association

Control variables = gender, race, child, dropout, family structure, family size, urbanicity, dwelling, school suspension, unemployment rate, mobility
– Binary work status
  – Without IV’s: b = +.013 (s.e. = .010)
  – With IV’s: b = –.061 (s.e. = .029)
– Continuous work hours
  – Without IV’s: b = +.0007 (s.e. = .0003)
  – With IV’s: b = –.0023 (s.e. = .0010)
– Intensive work indicator
  – Without IV’s: b = +.020 (s.e. = .012)
  – With IV’s: b = –.085 (s.e. = .040)
So Where Do We Stand with the Work-Delinquency Question?

Are child labor laws correlated with work?
– YES = first-stage F is large

Are child labor laws good IV’s?
– YES = overidentification test is not rejected

Is teenage work endogenous?
– YES = Hausman test is rejected

Prior research findings that teenage work is criminogenic are selection artifacts
Stata Commands for the Foregoing Example

Regression model ignoring endogeneity:
    reg y x w

First-stage regression model:
    reg x z1 z2 w
– With controls and multiple IV’s, test relevance:
    test z1 z2

2SLS regression model:
    ivreg y (x = z1 z2) w
Stata Commands for the Foregoing Example

Manual post hoc commands
– Get residuals for regression-based overid. test:
  – After 2SLS model: predict IVresid if e(sample), resid
  – Then: reg IVresid z1 z2 w
– Get residuals for regression-based DWH test:
  – After first-stage model: predict FSresid if e(sample), resid
  – Then: reg y x w FSresid

“Canned” post hoc commands
– After 2SLS model: overid and ivendog
Now…What Happens if I Throw in a Potentially Bogus Instrument?

Now there are three instrumental variables
– Z1 = 1 if child labor law allows 40+ hours (14.2%)
– Z2 = 1 if no child labor restriction in place (39.6%)
– Z3 = 1 if high unemployment rate in county (20.1%)

A little more difficult to tell a convincing story that the unemployment rate is only related to delinquency through work experience
– But let’s see what happens
First-Stage Model

. reg work law40 nolaw highun if nomiss==1 & wave==2

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  3,  8364) =  427.28
       Model |  277.229696     3  92.4098987           Prob > F      =  0.0000
    Residual |  1808.93366  8364  .216276144           R-squared     =  0.1329
-------------+------------------------------           Adj R-squared =  0.1326
       Total |  2086.16336  8367  .249332301           Root MSE      =  .46505

------------------------------------------------------------------------------
        work |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       law40 |   .0636421   .0154519     4.12   0.000     .0333525    .0939317
       nolaw |   .3775975   .0110447    34.19   0.000     .3559472    .3992479
      highun |  -.0636009   .0127283    -5.00   0.000    -.0885517   -.0386502
       _cons |   .3808061   .0080759    47.15   0.000     .3649754    .3966368
------------------------------------------------------------------------------

So far so good and consistent with expectation
Two-Stage Least Squares Model

. ivreg pcrime (work = law40 nolaw highun) if nomiss==1 & wave==2

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  1,  8366) =    5.47
       Model | -16.0635514     1 -16.0635514           Prob > F      =  0.0194
    Residual |  1833.41537  8366  .219150773           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  1817.35182  8367  .217204711           Root MSE      =  .46814

------------------------------------------------------------------------------
      pcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        work |  -.0657624   .0281159    -2.34   0.019    -.1208765   -.0106483
       _cons |   .3534516   .0156602    22.57   0.000     .3227537    .3841496
------------------------------------------------------------------------------
Instrumented:  work
Instruments:   law40 nolaw highun
------------------------------------------------------------------------------
Post-Hoc Diagnostics

. overid

Tests of overidentifying restrictions:
Sargan N*R-sq test      5.301  Chi-sq(2)    P-value = 0.0706
Basmann test            5.301  Chi-sq(2)    P-value = 0.0706

. ivendog

Tests of endogeneity of: work
H0: Regressor is exogenous
    Wu-Hausman F test:               12.32811  F(1,8365)   P-value = 0.00045
    Durbin-Wu-Hausman chi-sq test:   12.31438  Chi-sq(1)   P-value = 0.00045

Overidentification gives cause for concern
– The p-value shouldn’t be anywhere near 0.05
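With three instruments and one endogenous regressor the overidentification test has 2 degrees of freedom, and the chi-square tail then has a closed form. A small sketch that reproduces the p-value and shows how close the statistic sits to the 5% critical value; the function name is made up for illustration.

```python
import math


def chi2_sf_2df(x: float) -> float:
    """Upper-tail probability of a chi-square with 2 df, which equals exp(-x / 2)."""
    return math.exp(-x / 2)


sargan = 5.301                        # Sargan statistic reported by -overid-
p_value = chi2_sf_2df(sargan)
critical_05 = -2 * math.log(0.05)     # 5% critical value for chi-square(2)

print(f"p = {p_value:.4f}, 5% critical value = {critical_05:.2f}")
```

The statistic falls just short of the critical value, which is why the result reads as a warning sign rather than a clean pass.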
Follow-Up Overidentification Test

. reg IVresid law40 nolaw highun

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  3,  8364) =    1.77
       Model |   1.1613555     3  .387118499           Prob > F      =  0.1511
    Residual |  1832.25406  8364   .21906433           R-squared     =  0.0006
-------------+------------------------------           Adj R-squared =  0.0003
       Total |  1833.41541  8367  .219124586           Root MSE      =  .46804

------------------------------------------------------------------------------
     IVresid |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       law40 |   .0080993   .0155512     0.52   0.603    -.0223849    .0385836
       nolaw |  -.0035329   .0111156    -0.32   0.751    -.0253223    .0182565
      highun |  -.0277671   .0128101    -2.17   0.030    -.0528781   -.0026561
       _cons |   .0058369   .0081277     0.72   0.473    -.0100955    .0217693
------------------------------------------------------------------------------

Okay…unemployment rate is problematic as IV
Conclusion from Diagnostic Tests

2SLS “work effect” is similar
– Without unemployment, b = –.074 (s.e. = .028)
– With unemployment, b = –.066 (s.e. = .028)

But…the second model is invalidated because the unemployment rate is not exogenous
– It affects criminality through other channels
  – We need to control for all other indirect pathways, or…
  – It should not be used as an IV at all
Closing Comments about Instrumental Variables Studies

In general, a lagged value of the endogenous regressor is not a good instrument
– Traditional structural equation model uses lagged values of X and Y as instruments to break the simultaneity between the current values of X and Y

[Diagram: path model linking X1, X2, Y1, and Y2]

These models impose the awfully strong assumption that lagged values of X and Y only affect the outcomes through current values
Closing Comments about Instrumental Variables Studies

Good IV models are generally interesting in their own right, and should not be treated as “tack on” analyses
– Practice varies widely across disciplines
  – Some researchers write papers about their discovery and application of a “clever” IV for some problem
  – Other researchers “tack on” IV models at the end of their analysis, often poorly, as a way to convince readers that their results are robust
Rules for Good Practice with Instrumental Variables Models

IV models can be very informative, but it’s your job to convince your audience
– Show the first-stage model diagnostics
  – Even the most clever IV might not be sufficiently strongly related to X to be a useful source of identification
– Report test(s) of overidentifying restrictions
  – An invalid IV is often worse than no IV at all
– Report LS endogeneity (DWH) test
Rules for Good Practice with Instrumental Variables Models

Most importantly, TELL A STORY about why a particular IV is a “good instrument”

Something to consider when thinking about whether a particular IV is “good”
– Does the IV, for all intents and purposes, randomize the endogenous regressor?
Other Interesting IV Topics I Just Don’t Have Time to Cover

2SLS with a continuous “treatment”
Instrumental variables for sample selectivity
Generalized method of moments (IV-GMM)
Non-linear two-stage least squares (N2SLS)
Two-sample instrumental variables (TSIV)
Fixed-effects instrumental variables (FEIV)
Dynamic panel data estimators
References

Angrist. (2006). Instrumental variables methods in experimental criminology research: What, why and how. Journal of Experimental Criminology, 2, 23-44.
Angrist & Evans. (1998). Children and their parents’ labor supply: Evidence from exogenous variation in family size. American Economic Review, 88, 450-477.
Angrist & Krueger. (1991). Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics, 106, 979-1014.
Angrist & Krueger. (2001). Instrumental variables and the search for identification: From supply and demand to natural experiments. Journal of Economic Perspectives, 15, 69-85.
Apel, Bushway, Paternoster, Brame & Sweeten. (2008). Using state child labor laws to identify the causal effect of youth employment on deviant behavior and academic achievement. Journal of Quantitative Criminology, 24, 337-362.
Bound, Jaeger & Baker. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variables is weak. Journal of the American Statistical Association, 90, 443-450.
Evans & Owens. (2007). COPS and crime. Journal of Public Economics, 91, 181-201.
Imbens & Angrist. (1994). Identification and estimation of local average treatment effects. Econometrica, 62, 467-475.
Kelejian. (1971). Two-stage least squares and econometric systems linear in parameters but nonlinear in the endogenous variable. Journal of the American Statistical Association, 66, 373-374.
Klick & Tabarrok. (2005). Using terror alert levels to estimate the effect of police on crime. Journal of Law & Economics, 48, 267-279.
Levitt. (1996). The effect of prison population size on crime rates: Evidence from prison overcrowding litigation. Quarterly Journal of Economics, 111, 319-351.
Levitt. (1997). Using electoral cycles in police hiring to estimate the effect of police on crime. American Economic Review, 87, 270-290.
Levitt. (2002). Using electoral cycles in police hiring to estimate the effect of police on crime: Reply. American Economic Review, 92, 1244-1250.
Nelson & Startz. (1990). The distribution of the instrumental variables estimator and its t-ratio when the instrument is a poor one. Journal of Business, 63, S125-S140.
Permutt & Hebel. (1989). Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. Biometrics, 45, 619-622.
Sexton & Hebel. (1984). A clinical trial of change in maternal smoking and its effect on birth weight. Journal of the American Medical Association, 251, 911-915.
Shea. (1997). Instrument relevance in multivariate linear models: A simple measure. Review of Economics and Statistics, 79, 348-352.
Sherman & Berk. (1984). The specific deterrent effect of arrest for domestic assault. American Sociological Review, 49, 261-272.
Staiger & Stock. (1997). Instrumental variables regression with weak instruments. Econometrica, 65, 557-586.