Instrumental Variables: Introduction
Methods of Economic Investigation
Lecture 14
Last Time

- Review of causal effects
- Defining our types of estimates:
  - ATE (hypothetical)
  - TOT (can get this if selection bias SB = 0)
  - ITT (use this if we've got compliance problems)
- Methods:
  - Experiment (gold standard, but we can't always get it)
  - Fixed effects (assumption on within-group variation)
  - Difference-in-differences (assumption on parallel trends)
  - Propensity score matching (assumption on the relationship between observables, unobservables, and treatment)
Today's Class

- Introduction to instrumental variables:
  - What are they?
  - How do we estimate IV?
  - Tests for specification/fit
Recap of the problem

- There is some part of the error that we don't observe (maybe behavioral parameters, maybe a simultaneously determined component, etc.)
- This component might not be:
  - Fixed within a group
  - Fixed over time/space
  - Related to observables
- BUT... this component IS correlated with the treatment/variable of interest
Our Treatment Effects Model

- Consider the following model to estimate the effect of treatment S on some outcome Y:

  Y = αX + ρS + η

- Our treatment here is S
  - Think of the example of schooling: how much more will you earn if you go to college?
  - We can't observe true underlying ability, which is correlated with both the college attendance decision and future earnings
What's correlated and what's not

- The model we want to estimate:

  Yi = αXi + ρsi + γAi + vi

- We have that:
  - E[sv] = 0 (by assumption)
  - E[Av] = 0 (by construction)
- The idea: if ability A could be observed, we'd just include it in the regression and be done
The Instrument....

[Diagram: Group A is assigned to treatment (S = 1); Group B is not assigned to treatment (S = 0). Each group contains high-ability (AH = 1) and low-ability (AL = 0) individuals.]

ITT compares all of A to all of B: this mixes up the compliers (AH = S = 1; AL = S = 0) and the non-compliers (AH = 1, S = 0; AL = 0, S = 1)
Introducing Instruments

- The problem: how to estimate ρ when
  - A is not observed
  - A is related to Y
  - Cov(A, S) ≠ 0
- The solution: find something that is
  - Correlated with S ["Relevance"]
  - Uncorrelated with any other determinant of the outcome variable Y ["Exclusion Restriction"]
How does IV work

- Call our instrument z
- Our two instrument characteristics can be rewritten as:
  - E[zS] ≠ 0
  - E[zη] = 0
- Then from our equations we can write the population estimate of ρ as:

  ρ = Cov(Y, z) / Cov(s, z) = [Cov(Y, z) / V(z)] / [Cov(s, z) / V(z)]
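Why this ratio identifies ρ follows in one line from the two instrument conditions (a sketch, assuming the covariates X are either absent or uncorrelated with z):

```latex
\operatorname{Cov}(Y,z)
  = \operatorname{Cov}(\alpha X + \rho s + \eta,\ z)
  = \rho\,\operatorname{Cov}(s,z) + \underbrace{\operatorname{Cov}(\eta,z)}_{=\,0}
\quad\Longrightarrow\quad
\rho = \frac{\operatorname{Cov}(Y,z)}{\operatorname{Cov}(s,z)}
```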
The Instrument....

[Diagram: the same two groups as above, A (assigned, S = 1) and B (not assigned, S = 0), each containing AH = 1 and AL = 0 types, now partitioned by the instrument.]

Using the instrument, we can determine where the partition is: then we can compare the part of A that was randomly assigned (AH = S = 1) to the part of B that was randomly assigned (AL = S = 0)
Simplest case for IV

- Homogeneous treatment effects (the same ρ for all i)
- Dummy variable for the instrument:
  - z = 1 with probability q
  - Can break continuous instruments into sets of dummy variables, or use GLS to generalize
- For now, don't worry about covariates:
  - Simple extension: just include them in both stages
  - We'll simplify our notation later
Return to LATE

- Using z as a dummy that's 1 with probability q:
  - Cov(Y, z) = {E[Y | z = 1] − E[Y | z = 0]} q(1 − q)
  - Cov(s, z) = {E[s | z = 1] − E[s | z = 0]} q(1 − q)
- Can rewrite ρ as:

  ρ = (E[Yi | zi = 1] − E[Yi | zi = 0]) / (E[si | zi = 1] − E[si | zi = 0])

- Should look familiar: it's our LATE estimate
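As a quick check of this Wald ratio, here is a minimal simulation sketch in Python (all numbers are hypothetical; A plays the role of unobserved ability):

```python
# Minimal Wald-estimator simulation: ability A is unobserved and
# drives both treatment s and outcome y; the binary instrument z
# shifts s but is independent of A, so the Wald ratio recovers rho.
import numpy as np

rng = np.random.default_rng(0)
n, rho = 100_000, 2.0                        # true treatment effect

A = rng.normal(size=n)                       # unobserved ability
z = rng.binomial(1, 0.5, size=n)             # instrument, q = 0.5
s = (0.5 * z + 0.8 * A + rng.normal(size=n) > 0).astype(float)
y = rho * s + 1.5 * A + rng.normal(size=n)   # A sits in the error term

wald = (y[z == 1].mean() - y[z == 0].mean()) / \
       (s[z == 1].mean() - s[z == 0].mean())
ols = np.cov(y, s, ddof=0)[0, 1] / s.var()   # biased: Cov(s, A) != 0
print(f"true rho = {rho}, Wald = {wald:.2f}, OLS = {ols:.2f}")
```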
Another type of intuition

- Remember that E[η | S] ≠ 0 (that's why we're in this mess):
  - E[Y | S] ≠ ρE[S]
- We can condition on z rather than S
- By the "exclusion restriction" property of our instrument, E[η | z] = 0
- So now we can estimate ρ because E[Y | z] = ρE[S | z]
  - If z is binary, this simplifies to our Wald estimator
IV estimate intuition

- The only reason for a relationship between z and Y is the relationship between z and S
- In the dummy-variable specification, IV just rescales the reduced-form difference in means (E[Y | z = 1] − E[Y | z = 0]) by the first-stage difference in means (E[S | z = 1] − E[S | z = 0])
How does IV work: Regression Intuition

- To see why this is, think about our "structural equation":

  Yi = αXi + ρsi + ηi

  where si is the endogenous variable, Xi are exogenous covariates, and zi below is the exogenous instrument
- We can estimate ρ by taking the ratio of two different coefficients:
  - First stage:   si = π10 Xi + π11 zi + ξ1i
  - Reduced form:  yi = π20 Xi + π21 zi + ξ2i
Rewriting the Structural equation

Plug the first stage into the structural equation:

  Yi = αXi + ρsi + ηi
     = αXi + ρ[π10 Xi + π11 zi + ξ1i] + ηi
     = [α + ρπ10] Xi + ρπ11 zi + ρξ1i + ηi
     = π20 Xi + π21 zi + ξ2i

so π20 = α + ρπ10, π21 = ρπ11, and ξ2i = ρξ1i + ηi

- ρ is the coefficient in the population regression of y on s, and also on the fitted value of s (and the X's)
- The fitted value comes from the population regression of s on z (and the X's)
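Because π21 = ρπ11, the ratio of the reduced-form to the first-stage coefficient on z recovers ρ. A self-contained sketch of this "indirect least squares" logic (hypothetical numbers, no covariates X):

```python
# Indirect least squares: rho = pi21 / pi11, the reduced-form effect
# of z on y divided by the first-stage effect of z on s.
import numpy as np

rng = np.random.default_rng(1)
n, rho = 50_000, 2.0
A = rng.normal(size=n)                        # unobserved ability
z = rng.binomial(1, 0.5, size=n).astype(float)
s = 0.5 * z + 0.8 * A + rng.normal(size=n)    # endogenous treatment
y = rho * s + 1.5 * A + rng.normal(size=n)

Z = np.column_stack([np.ones(n), z])          # constant + instrument
pi11 = np.linalg.lstsq(Z, s, rcond=None)[0][1]   # first stage
pi21 = np.linalg.lstsq(Z, y, rcond=None)[0][1]   # reduced form
print(f"rho_hat = pi21 / pi11 = {pi21 / pi11:.2f}")
```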
Population vs. Estimates

- If we had the entire population, we could measure the relationship between z and S and obtain the true π's
- Using these π's, we could then obtain the true ρ
- Unfortunately, most of the time we have finite samples
Estimating 2SLS

- In practice, use the finite sample to obtain the fitted value

  ŝi = π̂10 Xi + π̂11 zi

  - OLS gives consistent estimates of the first-stage parameters
  - Use these parameters to construct the fitted value
- Then use this fitted value to construct the second-stage estimating equation:

  yi = αXi + ρŝi + [ηi + ρ(si − ŝi)]

- We get consistent estimates because the covariates and fitted values are:
  - Independent of η (by assumption)
  - Independent of (si − ŝi) (by construction)
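A minimal two-stage sketch of this procedure, reusing the hypothetical data-generating process from the previous sketch (note: naive standard errors from the second-stage regression would be wrong; this only illustrates the point estimate):

```python
# 2SLS by hand: first stage by OLS, then regress y on the fitted
# values s_hat. The instrument z purges the part of s driven by A.
import numpy as np

rng = np.random.default_rng(1)
n, rho = 50_000, 2.0
A = rng.normal(size=n)
z = rng.binomial(1, 0.5, size=n).astype(float)
s = 0.5 * z + 0.8 * A + rng.normal(size=n)
y = rho * s + 1.5 * A + rng.normal(size=n)

Z = np.column_stack([np.ones(n), z])               # first-stage regressors
s_hat = Z @ np.linalg.lstsq(Z, s, rcond=None)[0]   # fitted values

X2 = np.column_stack([np.ones(n), s_hat])          # second stage: y on s_hat
rho_hat = np.linalg.lstsq(X2, y, rcond=None)[0][1]
print(f"2SLS rho_hat = {rho_hat:.2f} (true rho = {rho})")
```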
Bias in 2SLS

- 2SLS is biased. We'll talk about this in detail next time, but the general idea is:
  - We must estimate the first stage (i.e., ŝ)
  - In practice, the first-stage estimates reflect some of the randomness in the endogenous variable S
  - This randomness generates finite-sample correlations between the first-stage fitted values and the second-stage errors:
    - The endogenous variable is correlated with the second-stage errors
    - Some of that correlation is left in the first-stage fitted value
  - Asymptotically this bias goes to zero, but in finite samples it might not
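A small Monte Carlo sketch of this finite-sample bias (hypothetical numbers: many weak instruments in a small sample pull the average 2SLS estimate away from the true ρ, toward OLS):

```python
# Monte Carlo: with k weak instruments and small n, the mean 2SLS
# estimate drifts above the true rho = 2 toward the (inconsistent)
# OLS estimate, even though 2SLS is consistent as n grows.
import numpy as np

rng = np.random.default_rng(2)
rho, n, k, reps = 2.0, 100, 20, 1_000
est = []
for _ in range(reps):
    A = rng.normal(size=n)
    Z = rng.normal(size=(n, k))                       # weak instruments
    s = Z @ np.full(k, 0.05) + 0.8 * A + rng.normal(size=n)
    y = rho * s + 1.5 * A + rng.normal(size=n)
    s_hat = Z @ np.linalg.lstsq(Z, s, rcond=None)[0]  # first stage
    est.append((s_hat @ y) / (s_hat @ s))             # 2SLS slope
    # (variables are mean zero by construction, so no constant)
print(f"mean 2SLS estimate over {reps} draws: {np.mean(est):.2f} "
      f"(true rho = {rho})")
```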
Next time:

- Issues with IV estimates:
  - Return to consistency: what about bias?
  - Weak instruments
  - Heterogeneous treatment effects