Stat 521 Notes 25
Regression Discontinuity Designs
I. Monday’s class
For Monday’s class (the final class), please prepare a poster with
9 or so slides that summarize the main economic and
econometric issues for what you are working on for your final
project. Pick up the posterboard from me today after class or on
Thursday. You can include tables and figures from the actual
paper. My plan is to have everybody say a minute or two about
what they are working on and then leave some time for
everybody to walk around and look at each other’s posters.
II. Regression Discontinuity Designs
The basic setting for a regression discontinuity design is that
assignment to a binary treatment is completely or partly
determined by the value of a predictor being on either side of a
fixed cutoff point. This predictor or covariate may itself be
correlated with the potential outcomes, but only in a continuous
way, so a discontinuity at the cutoff in the conditional
expectation of the outcome given this covariate is interpreted as
the causal effect of the treatment. The design often arises from
administrative decisions, where the allocation of units to a
program is partly limited for reasons of resource constraints, and
clear, transparent rules rather than discretion by administrators
are used for allocation.
It is useful to distinguish between two settings, the sharp and
fuzzy regression discontinuity designs. In the sharp RD design,
the assignment is a deterministic function of the covariates:
$D_i = I(X_i \geq c)$.
All units with a covariate value of at least c are assigned to the
treatment group and all units with a covariate value less than c
are assigned to the control group. This is the design we focused
on in the last class.
In the fuzzy RD design, the probability of receiving the treatment
is discontinuous at c:
$\lim_{x \downarrow c} P(D_i = 1 \mid X_i = x) \neq \lim_{x \uparrow c} P(D_i = 1 \mid X_i = x)$,
but the jump is not necessarily of size 1.
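To make the distinction concrete, here is a minimal simulated sketch in R. The sample size, the cutoff of 0, and the treatment probabilities 0.2 and 0.7 are made-up values chosen only for illustration; they are not taken from any of the papers discussed in these notes.

```r
# Simulated illustration; every number here is an assumption for the sketch.
set.seed(1)
n <- 5000
cutoff <- 0
x <- runif(n, -1, 1)                      # forcing variable X_i

# Sharp design: treatment is a deterministic function of x.
d_sharp <- as.numeric(x >= cutoff)

# Fuzzy design: the treatment probability jumps from 0.2 to 0.7 at the cutoff.
p_treat <- ifelse(x >= cutoff, 0.7, 0.2)
d_fuzzy <- rbinom(n, size = 1, prob = p_treat)

# Difference in the share treated above and below the cutoff.
jump_sharp <- mean(d_sharp[x >= cutoff]) - mean(d_sharp[x < cutoff])
jump_fuzzy <- mean(d_fuzzy[x >= cutoff]) - mean(d_fuzzy[x < cutoff])
print(c(sharp = jump_sharp, fuzzy = jump_fuzzy))
```

In the sharp case the jump in the treated share is exactly 1; in the fuzzy case it is close to the assumed 0.5.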
III. Examples
Example 1: College Aid (Wilbert Van der Klaauw, 2002). Van der
Klaauw looks at the effect of financial aid offers on acceptance
of college admission offers, i.e., attendance at the college
offering the financial aid. The regression of attendance on
financial aid could provide a biased estimate of the causal effect
of financial aid because a candidate’s attractiveness to other
colleges, which is an omitted variable, is correlated both with
the financial aid offer at the college being considered and with
attendance at that college. Van der Klaauw proposes a fuzzy RD
design. Here $X_i$ is a score assigned to each college applicant
based on all available information (SAT scores, grades, essay).
College applicants are grouped into categories based on
discretized scores: $X_i \geq c$ means a higher financial aid
category and a higher likelihood of financial aid. The design is
fuzzy because college aid is not a deterministic function of the
categories. Other things play a role, but there is a clear
discontinuity.
Example 2: Effect of School Quality on Housing Values (Sandra
Black, 1999). Black is interested in the effect of being in a
neighborhood with good schools on housing values. She
measures school quality by average test scores. She compares
houses on either side of the boundary of the school district. The
difference in average housing prices for similar houses (in terms
of characteristics, e.g., square footage, number of rooms) is
attributed to the difference in school quality. This is a sharp RD
design with $X_i$ equal to the location of the house relative to the
boundary of the school district.
Example 3: Effect of Class Size on Educational Outcomes
(Joshua Angrist and Victor Lavy, 1999). Old Jewish law
(Maimonides) says class size should not be over 40. Angrist and
Lavy look at schools where cohort sizes are close to multiples of
40. If the cohort size is just over a multiple of 40, the actual
class size is much smaller than if the cohort size is just under
that multiple.
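The size of the discontinuity can be seen with a small R sketch. It assumes, as a simplification, that a cohort is always split into the smallest number of equal-sized classes that keeps every class at or below the cap; the function name predicted_class_size is hypothetical, and actual class sizes in the data need not follow the rule exactly.

```r
# Predicted class size under a strict cap of max_size pupils per class,
# assuming the cohort is split into the fewest possible equal-sized classes.
predicted_class_size <- function(enrollment, max_size = 40) {
  n_classes <- floor((enrollment - 1) / max_size) + 1
  enrollment / n_classes
}

enrollment <- c(38, 39, 40, 41, 42, 80, 81)
rule_table <- data.frame(enrollment,
                         class_size = predicted_class_size(enrollment))
print(rule_table)
```

Under this rule a cohort of 40 forms one class of 40, while a cohort of 41 forms two classes averaging 20.5 pupils.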
Example 4: Effect of Incumbency on Elections. David Lee
(2008, Journal of Econometrics) studies the effect of party
incumbency on reelection probabilities in U.S. House of
Representatives elections. Lee is interested in whether the
Democratic candidate for a seat in the U.S. House has an
advantage if his or her party won the seat last time.
To capture the causal effect of incumbency, Lee looks at
the likelihood that a Democratic candidate wins as a function of
relative vote shares in the previous election. Specifically, he
exploits the fact that the election winner is determined by
$D_i = I(X_i \geq 0)$, where $X_i$ is the margin of victory in
vote share (the difference between the Democratic and Republican
vote shares). This is a sharp RD design.
# win is the outcome (winning this election); vote_share_previous is
# the forcing variable X_i
election_data=read.table("david_lee.txt",header=TRUE);
win=election_data$win;
vote_share_previous=election_data$vote_share_previous;
# Local Averages for David Lee Regression
grid=c(-1,-.5,-.4,-.35,-.3,-.25,-.2,-.15,-.1,-.05,0,.05,.1,.15,.2,.25,.3,.35,.4,.5,1);
avgwin=numeric(length(grid)-1);
middlegrid=numeric(length(grid)-1);
for(i in 1:(length(grid)-1)){
  # observations whose previous vote share falls in [grid[i], grid[i+1])
  inbin=(vote_share_previous>=grid[i])&(vote_share_previous<grid[i+1]);
  avgwin[i]=mean(win[inbin]);
  middlegrid[i]=(grid[i]+grid[i+1])/2;
}
plot(middlegrid,avgwin,xlim=c(-1,1),xlab="Vote Share in Previous Election",ylab="Probability of Winning This Election",main="Plot of Local Averages")
IV. Sharp Regression Discontinuity Design
Recall the potential outcomes framework:
$Y_i(1)$ = outcome unit i would have if unit i receives the treatment.
$Y_i(0)$ = outcome unit i would have if unit i does not receive the
treatment.
Causal Effect of Treatment for unit i: $Y_i(1) - Y_i(0)$.
In the sharp RD design, there are no values of $X_i$ for which
some people receive the treatment ($D_i = 1$) and some don’t
($D_i = 0$). People with $X_i \geq c$ receive the treatment and those
with $X_i < c$ don’t receive the treatment. This implies the
fundamental need for extrapolation. There are no units with
$X_i = c$ for whom we observe $Y_i(0)$. However, we observe units
with covariate values arbitrarily close to this value. In order to
make use of these units, we need a continuity assumption:
Continuity Assumption: $E[Y(0) \mid X = x]$ and $E[Y(1) \mid X = x]$ are
continuous in x.
This implies that
$E[Y(0) \mid X = c] = \lim_{x \uparrow c} E[Y(0) \mid X = x] = \lim_{x \uparrow c} E[Y \mid X = x]$
and
$E[Y(1) \mid X = c] = \lim_{x \downarrow c} E[Y(1) \mid X = x] = \lim_{x \downarrow c} E[Y \mid X = x]$.
Thus, we estimate the average treatment effect at c,
$\tau(c) = E[Y(1) - Y(0) \mid X = c]$,
through estimation of the difference of the two limits
$\lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]$.
The estimand is the difference of two regression functions at a
point. We will estimate the regression functions at the point
using nonparametric regression.
V. Estimation: Nonparametric Regression at the Boundary
Let $\mu_1(c) = \lim_{x \downarrow c} E[Y \mid X = x]$ and
$\mu_0(c) = \lim_{x \uparrow c} E[Y \mid X = x]$. We are trying to
estimate $\tau(c) = \mu_1(c) - \mu_0(c)$. To estimate $\mu_1(c)$,
we will do nonparametric regression on the subsample with
$X \geq c$. To estimate $\mu_0(c)$, we will do nonparametric regression
on the subsample with $X < c$. In doing this nonparametric
regression, it’s important to use a local linear regression that
does not suffer from boundary bias.
We estimate $\mu_0(c)$ by $\hat{\alpha}_-(c)$ where
$(\hat{\alpha}_-(c), \hat{\beta}_-(c)) = \arg\min_{\alpha, \beta} \sum_{i=1}^{N} I(X_i < c) \, (Y_i - \alpha - \beta (X_i - c))^2 \, K\left(\frac{X_i - c}{h}\right)$,
and $\mu_1(c)$ by $\hat{\alpha}_+(c)$ where
$(\hat{\alpha}_+(c), \hat{\beta}_+(c)) = \arg\min_{\alpha, \beta} \sum_{i=1}^{N} I(X_i \geq c) \, (Y_i - \alpha - \beta (X_i - c))^2 \, K\left(\frac{X_i - c}{h}\right)$,
where we use the edge kernel $K(u) = (1 - |u|) \, I(|u| \leq 1)$.
With the covariate centered at c, the intercept of each weighted
regression is the fitted value at the cutoff.
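As a rough sketch of how these two weighted regressions could be run in R, the code below uses lm() with kernel weights on each side of the cutoff. The function names (edge_kernel, local_linear_at_cutoff, rd_estimate) are hypothetical, and the data are simulated with an assumed true jump of 0.4 and an assumed bandwidth of 0.3, since the election data file is not reproduced here.

```r
# Edge (triangular) kernel K(u) = (1 - |u|) I(|u| <= 1).
edge_kernel <- function(u) (1 - abs(u)) * (abs(u) <= 1)

# Local linear estimate of E[Y | X = x] at the cutoff, using one side only.
local_linear_at_cutoff <- function(x, y, cutoff, h, side = c("plus", "minus")) {
  side <- match.arg(side)
  keep <- if (side == "plus") x >= cutoff else x < cutoff
  dat <- data.frame(yy = y, xc = x - cutoff,
                    w = edge_kernel((x - cutoff) / h) * keep)
  dat <- dat[dat$w > 0, ]                 # drop zero-weight observations
  fit <- lm(yy ~ xc, data = dat, weights = w)
  unname(coef(fit)[1])                    # intercept = fitted value at the cutoff
}

# Sharp RD estimate: difference of the two boundary estimates.
rd_estimate <- function(x, y, cutoff, h) {
  local_linear_at_cutoff(x, y, cutoff, h, "plus") -
    local_linear_at_cutoff(x, y, cutoff, h, "minus")
}

# Simulated data with an assumed jump of 0.4 at x = 0.
set.seed(2)
n <- 10000
x <- runif(n, -1, 1)
y <- 0.3 + 0.5 * x + 0.4 * (x >= 0) + rnorm(n, sd = 0.2)
est <- rd_estimate(x, y, cutoff = 0, h = 0.3)
print(est)
```

Because the covariate is centered at the cutoff inside the regression, no separate prediction step is needed: the intercept is the boundary estimate.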
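Imbens and Kalyanaraman (working paper, available at
http://www.economics.harvard.edu/faculty/imbens/files/rd_09feb3.pdf)
propose a way to choose the optimal bandwidth h.
Their optimal bandwidth is
$h_{opt} = 3.4375 \left( \frac{2 \hat{\sigma}^2(c) / \hat{f}(c)}{\left( \hat{m}_+^{(2)}(c) - \hat{m}_-^{(2)}(c) \right)^2} \right)^{1/5} N^{-1/5}$,
where $\hat{\sigma}^2(c)$ and $\hat{f}(c)$ are estimates of the
conditional variance of Y and the density of X at c, and
$\hat{m}_+^{(2)}(c)$ and $\hat{m}_-^{(2)}(c)$ are estimates of the
second derivative of the regression function just above and just
below c.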
See attached page from Imbens and Kalyanaraman paper.
# Choose optimal bandwidth using Imbens and Kalyanaraman
x=vote_share_previous;
y=win;
threshold=0;
# Step 1: pilot bandwidth h1, then estimates of the density of x and
# the conditional variance of y at the threshold
n=length(x);
h1=1.84*sd(x)*n^(-1/5);
nh1.minus=sum((x<threshold)*(x>threshold-h1));
nh1.plus=sum((x>=threshold)*(x<threshold+h1));
meany.h1.minus=sum(y*(x<threshold)*(x>threshold-h1))/nh1.minus;
meany.h1.plus=sum(y*(x>=threshold)*(x<threshold+h1))/nh1.plus;
# fc.hat: share of observations within h1 of the threshold, divided by h1
# (note the window itself has total width 2*h1)
fc.hat=(nh1.minus+nh1.plus)/(n*h1);
sigmasq.c.hat=(1/(nh1.minus+nh1.plus))*(sum((y-meany.h1.minus)^2*(x<threshold)*(x>threshold-h1))+sum((y-meany.h1.plus)^2*(x>=threshold)*(x<threshold+h1)));
# Step 2: cubic fit (with a jump at the threshold) on the middle of the
# data to estimate the third derivative m3, then pilot bandwidths
median.x.plus=median(x[x>=threshold]);
median.x.minus=median(x[x<threshold]);
middledata=as.logical((x>=median.x.minus)*(x<=median.x.plus)==1);
x.minus.threshold=x-threshold;
x.minus.threshold.sq=(x-threshold)^2;
x.minus.threshold.cubed=(x-threshold)^3;
x.gt.threshold=(x>=threshold);
polfit=lm(y~x.gt.threshold+x.minus.threshold+x.minus.threshold.sq+x.minus.threshold.cubed,subset=middledata);
m3=6*coef(polfit)[5];
h2plus=3.56*(sigmasq.c.hat/(fc.hat*max(m3^2,.01)))^(1/7)*nh1.plus^(-1/7);
h2minus=3.56*(sigmasq.c.hat/(fc.hat*max(m3^2,.01)))^(1/7)*nh1.minus^(-1/7);
# m2plus estimate: second derivative from a quadratic fit just above the threshold
m2plusdata=as.logical((x>=threshold)*(x<threshold+h2plus)==1);
polfit2=lm(y~x.minus.threshold+x.minus.threshold.sq,subset=m2plusdata);
m2plus=2*coef(polfit2)[3];
N2plus=sum(m2plusdata);
# m2minus estimate: second derivative from a quadratic fit just below the threshold
m2minusdata=as.logical((x<threshold)*(x>threshold-h2minus)==1);
polfit3=lm(y~x.minus.threshold+x.minus.threshold.sq,subset=m2minusdata);
m2minus=2*coef(polfit3)[3];
N2minus=sum(m2minusdata);
# Step 3: regularization terms (not used in the hopt line below)
rplus=720*sigmasq.c.hat/(N2plus*h2plus^4);
rminus=720*sigmasq.c.hat/(N2minus*h2minus^4);
# Optimal bandwidth
hopt=3.4375*((2*sigmasq.c.hat/fc.hat)/(m2plus-m2minus)^2)^(1/5)*n^(-1/5);
hopt
x.minus.threshold.sq
0.6331392
alphahat.c.plus
(Intercept)
0.8055923
alphahat.c.minus
(Intercept)
0.1955822
est.effect
(Intercept)
0.6100101
The estimated effect of incumbency is to increase the probability
of winning by 0.61, a huge effect.
We can use the bootstrap to estimate the standard error of the
estimate.
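One simple way to do this is a nonparametric bootstrap that resamples (x, y) pairs and recomputes the RD estimate each time. The sketch below is only illustrative: the helper names are hypothetical, the data are simulated rather than the election data, and the bandwidth is held fixed at an assumed value of 0.3 across replicates, so uncertainty from bandwidth selection is ignored.

```r
edge_kernel <- function(u) (1 - abs(u)) * (abs(u) <= 1)

# Sharp RD estimate from two one-sided local linear fits (edge kernel).
rd_estimate <- function(x, y, cutoff, h) {
  one_side <- function(keep) {
    dat <- data.frame(yy = y, xc = x - cutoff,
                      w = edge_kernel((x - cutoff) / h) * keep)
    dat <- dat[dat$w > 0, ]
    unname(coef(lm(yy ~ xc, data = dat, weights = w))[1])
  }
  one_side(x >= cutoff) - one_side(x < cutoff)
}

# Bootstrap standard error: resample pairs with replacement, B replicates.
bootstrap_se <- function(x, y, cutoff, h, B = 200) {
  n <- length(x)
  draws <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)
    rd_estimate(x[idx], y[idx], cutoff, h)
  })
  sd(draws)
}

# Simulated binary outcome with a jump in the win probability at 0.
set.seed(3)
n <- 2000
x <- runif(n, -1, 1)
y <- rbinom(n, 1, plogis(-1 + 2 * x + 2 * (x >= 0)))
est <- rd_estimate(x, y, cutoff = 0, h = 0.3)
se <- bootstrap_se(x, y, cutoff = 0, h = 0.3)
print(c(estimate = est, bootstrap_se = se))
```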
VI. Specification Testing
A main concern in using the RD design is that other things
change around the discontinuity point. These may be other
administrative rules that are based on the same cutoff, leading
to a discontinuity of $E[Y(0) \mid X]$. For example, around age 65
individuals become eligible for a host of services and so
comparing those a little younger and a little older than 65 may
not allow researchers to infer the effect of one particular
program.
There are a number of things one can do. Most fit into the
conceptual framework of using the same methodology to
estimate effects that should be zero if the model is correctly
specified. In other words, we use the RD design to estimate
whether there is an effect at the discontinuity point on an
outcome that we think shouldn’t be affected.
Consider the housing example from Black. Suppose that the
average characteristics of houses (e.g., number of rooms) are
very different on either side of the school district boundary.
This would suggest that there are other differences between the
two areas that are not related to school quality, and hence the
regression discontinuity estimates would be suspect.
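A check of this kind can be sketched by running the same RD estimator with a pre-determined covariate in place of the outcome. In the simulated example below the covariate (a stand-in for something like number of rooms) varies smoothly with the forcing variable by construction, so the estimated jump should be near zero. The helper name rd_estimate, the bandwidth of 0.3, and all data-generating values are assumptions made for this sketch.

```r
edge_kernel <- function(u) (1 - abs(u)) * (abs(u) <= 1)

# Sharp RD estimate from two one-sided local linear fits (edge kernel).
rd_estimate <- function(x, y, cutoff, h) {
  one_side <- function(keep) {
    dat <- data.frame(yy = y, xc = x - cutoff,
                      w = edge_kernel((x - cutoff) / h) * keep)
    dat <- dat[dat$w > 0, ]
    unname(coef(lm(yy ~ xc, data = dat, weights = w))[1])
  }
  one_side(x >= cutoff) - one_side(x < cutoff)
}

set.seed(4)
n <- 5000
x <- runif(n, -1, 1)                               # forcing variable
covariate <- 5 + x + rnorm(n, sd = 0.5)            # smooth in x, no jump
outcome <- 10 + 2 * x + 3 * (x >= 0) + rnorm(n)    # jumps by 3 at the cutoff

placebo_jump <- rd_estimate(x, covariate, cutoff = 0, h = 0.3)
outcome_jump <- rd_estimate(x, outcome, cutoff = 0, h = 0.3)
print(c(placebo = placebo_jump, outcome = outcome_jump))
```

A placebo jump that is large relative to its standard error would be a warning sign for the design.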
VII. Fuzzy Regression Discontinuity Design
In the fuzzy RD design, the probability $P(D = 1 \mid X = x)$ does
not jump from 0 to 1 at the discontinuity point c. Instead,
the probability is discontinuous at this value, with a jump that
may be smaller than 1.
We estimate the treatment effect by
$\tau = \frac{\lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]}{\lim_{x \downarrow c} E[D \mid X = x] - \lim_{x \uparrow c} E[D \mid X = x]}$.
We use the method for the sharp RD design to estimate the
numerator and denominator.
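We can interpret this estimate as an IV estimator. We can think
of the indicator $I(X_i \geq c)$ as the instrument for the
treatment $D_i$; the above estimator is then the ratio of the
jump in the outcome to the jump in the treatment probability at c.
See the attached page from Imbens’s course notes for a discussion
of the connection to instrumental variables.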
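The ratio can be sketched in R by applying the same boundary estimator once to the outcome and once to the treatment indicator. The data below are simulated under assumed values (treatment probability jumping from 0.2 to 0.7, a constant treatment effect of 2, bandwidth 0.3), and the helper name rd_jump is hypothetical.

```r
edge_kernel <- function(u) (1 - abs(u)) * (abs(u) <= 1)

# Jump at the cutoff in the regression of v on x (local linear, edge kernel).
rd_jump <- function(x, v, cutoff, h) {
  one_side <- function(keep) {
    dat <- data.frame(vv = v, xc = x - cutoff,
                      w = edge_kernel((x - cutoff) / h) * keep)
    dat <- dat[dat$w > 0, ]
    unname(coef(lm(vv ~ xc, data = dat, weights = w))[1])
  }
  one_side(x >= cutoff) - one_side(x < cutoff)
}

set.seed(5)
n <- 20000
x <- runif(n, -1, 1)
d <- rbinom(n, 1, ifelse(x >= 0, 0.7, 0.2))   # fuzzy assignment
y <- 1 + x + 2 * d + rnorm(n, sd = 0.5)       # assumed treatment effect of 2

jump_y <- rd_jump(x, y, cutoff = 0, h = 0.3)  # numerator: jump in outcome
jump_d <- rd_jump(x, d, cutoff = 0, h = 0.3)  # denominator: jump in treatment
tau_fuzzy <- jump_y / jump_d
print(c(jump_y = jump_y, jump_d = jump_d, tau_fuzzy = tau_fuzzy))
```

Using the same bandwidth for the numerator and the denominator is a choice made here for simplicity.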
References:
Black, S. (1999), “Do Better Schools Matter? Parental Valuation
of Elementary Education,” Quarterly Journal of Economics, 114,
577-599.
Hahn, J., Todd, P. and Van der Klaauw, W. (2001),
“Identification and Estimation of Treatment Effects with a
Regression Discontinuity Design,” Econometrica, 69, 201-209.
Shadish, W., Cook, T. and Campbell, D. (2002), Experimental
and Quasi-Experimental Designs for Generalized Causal Inference.
Van der Klaauw, W. (2002), “Estimating the Effect of Financial
Aid Offers on College Enrollment: A Regression-Discontinuity
Approach,” International Economic Review, 43(4), 1249-1287.