Stat 521 Notes 25
Regression Discontinuity Designs

I. Monday's class

For Monday's class (the final class), please prepare a poster with 9 or so slides that summarize the main economic and econometric issues for what you are working on for your final project. Pick up the posterboard from me today after class or on Thursday. You can include tables and figures from the actual paper. My plan is to have everybody say a minute or two about what they are working on and then leave some time for everybody to walk around and look at each other's posters.

II. Regression Discontinuity Designs

The basic setting for a regression discontinuity (RD) design is that assignment to a binary treatment is completely or partly determined by whether the value of a predictor falls on one side or the other of a fixed cutoff point. This predictor (or covariate) may itself be correlated with the potential outcomes, but only in a continuous way, so a discontinuity in the conditional expectation of the outcome as a function of this covariate is interpreted as the causal effect of the treatment.

The design often arises from administrative decisions, where the allocation of units to a program is limited by resource constraints, and clear, transparent rules rather than discretion by administrators are used for allocation.

It is useful to distinguish between two settings, the sharp and fuzzy regression discontinuity designs.

In the sharp RD design, the assignment is a deterministic function of the covariate:

$$D_i = I(X_i \ge c).$$

All units with a covariate value of at least $c$ are assigned to the treatment group and all units with a covariate value less than $c$ are assigned to the control group. This is the design we focused on in the last class.

In the fuzzy RD design, the probability of receiving the treatment is discontinuous at $c$:

$$\lim_{x \downarrow c} P(D_i = 1 \mid X_i = x) \ne \lim_{x \uparrow c} P(D_i = 1 \mid X_i = x),$$

but the jump is not necessarily of size 1.

III. Examples

Example 1: College Aid (Wilbert Van der Klaauw, 2002).
Van der Klaauw looks at the effect of financial aid offers on the acceptance of college admission offers, i.e., enrollment at the college offering the financial aid. The regression of enrollment on financial aid could provide a biased estimate of the causal effect of financial aid because a candidate's attractiveness to other colleges, which is an omitted variable, is correlated both with the financial aid offer at the college being considered and with enrollment at that college.
Van der Klaauw proposes a fuzzy RD design. Here $X_i$ is a score assigned to each college applicant based on all information (SAT scores, grades, essay). College applicants are grouped together based on discretized scores: $X_i \ge c$ means a higher financial aid category and a higher likelihood of financial aid. This is a fuzzy RD design: college aid is not a deterministic function of the categories. Other things play a role, but there is a clear discontinuity.

Example 2: Effect of School Quality on Housing Values (Sandra Black, 1999).
Black is interested in the effect of being in a neighborhood with good schools on housing values. She measures school quality by average test scores. She compares houses on either side of the boundary of the school district. The difference in average housing prices for similar houses (in terms of characteristics, e.g., square footage, number of rooms) is attributed to the difference in school quality. This is a sharp RD design with $X_i$ equal to the location of the house relative to the boundary of the school district.

Example 3: Effect of Class Size on Educational Outcomes (Joshua Angrist and Victor Lavy, 1999).
Old Jewish law (Maimonides) says class size should not be over 40. Angrist and Lavy look at schools where cohort sizes are close to multiples of 40. If the cohort size is just over a multiple of 40, the actual class size is much smaller than if the cohort size is just under that multiple.

Example 4: Effect of Incumbency on Elections.
David Lee (2008, Journal of Econometrics) studies the effect of party incumbency on reelection probabilities in U.S. House of Representatives elections. Lee is interested in whether the Democratic candidate for a seat in the U.S. House has an advantage if his or her party won the seat last time.

To capture the causal effect of incumbency, Lee looks at the likelihood that a Democratic candidate wins as a function of relative vote shares in the previous election. Specifically, he exploits the fact that the election winner is determined by $D_i = I(X_i \ge 0)$, where $X_i$ is the vote share margin of victory (the difference between the Democratic and Republican vote shares). This is a sharp RD design.

    election_data=read.table("david_lee.txt",header=TRUE);
    win=election_data$win;
    vote_share_previous=election_data$vote_share_previous;

    # Local Averages for David Lee Regression
    grid=c(-1,-.5,-.4,-.35,-.3,-.25,-.2,-.15,-.1,-.05,0,.05,.1,.15,.2,.25,.3,.35,.4,.5,1);
    nbins=length(grid)-1;
    avgwin=numeric(nbins);      # average of win within each bin
    middlegrid=numeric(nbins);  # midpoint of each bin
    for(i in 1:nbins){
      # observations with grid[i] <= vote_share_previous < grid[i+1]
      inbin=(vote_share_previous>=grid[i])&(vote_share_previous<grid[i+1]);
      avgwin[i]=mean(win[inbin]);
      middlegrid[i]=(grid[i]+grid[i+1])/2;
    }
    plot(middlegrid,avgwin,xlim=c(-1,1),xlab="Vote Share in Previous Election",ylab="Probability of Winning This Election",main="Plot of Local Averages")

IV. Sharp Regression Discontinuity Design

Recall the potential outcomes framework:
$Y_i(1)$ = outcome unit $i$ would have if unit $i$ receives the treatment.
$Y_i(0)$ = outcome unit $i$ would have if unit $i$ does not receive the treatment.
Causal effect of treatment for unit $i$: $Y_i(1) - Y_i(0)$.

In the sharp RD design, there are no values of $X_i$ for which some people receive the treatment ($D_i = 1$) and some don't ($D_i = 0$). People with $X_i \ge c$ receive the treatment and those with $X_i < c$ don't receive the treatment. This implies a fundamental need for extrapolation. There are no units with $X_i = c$ for whom we observe $Y_i(0)$.
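However, we observe units with covariate values arbitrarily close to this value. In order to make use of these units, we need a continuity assumption:

Continuity Assumption: $E[Y(0) \mid X = x]$ and $E[Y(1) \mid X = x]$ are continuous in $x$.

This implies that

$$E[Y(0) \mid X = c] = \lim_{x \uparrow c} E[Y(0) \mid X = x] = \lim_{x \uparrow c} E[Y \mid X = x]$$

and

$$E[Y(1) \mid X = c] = \lim_{x \downarrow c} E[Y(1) \mid X = x] = \lim_{x \downarrow c} E[Y \mid X = x].$$

Thus, we estimate the average treatment effect at $c$,

$$\tau(c) = E[Y(1) - Y(0) \mid X = c],$$

through estimation of the difference of the two limits

$$\lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x].$$

The estimand is the difference of two regression functions at a point. We will estimate the regression functions at the point using nonparametric regression.

V. Estimation: Nonparametric Regression at the Boundary

Let $\mu_1(c) = \lim_{x \downarrow c} E[Y \mid X = x]$ and $\mu_0(c) = \lim_{x \uparrow c} E[Y \mid X = x]$. We are trying to estimate $\tau(c) = \mu_1(c) - \mu_0(c)$.

To estimate $\mu_1(c)$, we will do nonparametric regression on the subsample with $X \ge c$. To estimate $\mu_0(c)$, we will do nonparametric regression on the subsample with $X < c$. In doing this nonparametric regression, it's important to use a local linear regression, which does not suffer from boundary bias.

We estimate $\mu_0(c)$ by $\hat\alpha_-(c)$, where

$$(\hat\alpha_-(c), \hat\beta_-(c)) = \arg\min_{\alpha,\beta} \sum_{i=1}^{N} I(X_i < c)\,\bigl(Y_i - \alpha - \beta (X_i - c)\bigr)^2 K\!\left(\frac{X_i - c}{h}\right),$$

and $\mu_1(c)$ by $\hat\alpha_+(c)$, where

$$(\hat\alpha_+(c), \hat\beta_+(c)) = \arg\min_{\alpha,\beta} \sum_{i=1}^{N} I(X_i \ge c)\,\bigl(Y_i - \alpha - \beta (X_i - c)\bigr)^2 K\!\left(\frac{X_i - c}{h}\right),$$

where we use the edge kernel $K(u) = (1 - |u|)\, I(|u| \le 1)$.

Imbens and Kalyanaraman (working paper, available at http://www.economics.harvard.edu/faculty/imbens/files/rd_09feb3.pdf) propose a way to choose the optimal bandwidth $h$. Their optimal bandwidth is

$$h_{opt} = 3.4375 \left( \frac{2 \hat\sigma^2(c) / \hat f(c)}{\bigl(\hat m_+^{(2)}(c) - \hat m_-^{(2)}(c)\bigr)^2} \right)^{1/5} N^{-1/5},$$

where $\hat\sigma^2(c)$ estimates the conditional variance of $Y$ at $c$, $\hat f(c)$ estimates the density of $X$ at $c$, and $\hat m_+^{(2)}(c)$ and $\hat m_-^{(2)}(c)$ estimate the second derivative of the regression function to the right and to the left of $c$.

See attached page from Imbens and Kalyanaraman paper.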
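To see why a local linear fit is preferred at the boundary, here is a small sketch in R that compares a kernel-weighted average (a local constant fit) with a local linear fit at the left edge of the data. The setup is an illustration and not from the notes: noise-free data with regression function m(x) = 1 + 2x on a grid in [0, 1], the boundary point c0 = 0, an arbitrary bandwidth h = 0.5, and a helper named edge_kernel introduced only for this example.

```r
# Edge (triangular) kernel K(u) = (1 - |u|) * I(|u| <= 1)
edge_kernel <- function(u) (1 - abs(u)) * (abs(u) <= 1)

# Illustrative noise-free data (an assumption, not from the notes):
# the true regression function is m(x) = 1 + 2x, so m(0) = 1.
x <- seq(0, 1, by = 0.01)
y <- 1 + 2 * x
c0 <- 0    # boundary point at which we want the regression function
h <- 0.5   # arbitrary bandwidth for the illustration

w <- edge_kernel((x - c0) / h)
keep <- w > 0

# Local constant fit: kernel-weighted average of y near c0.
# Every observation lies to the right of c0, so the average is pulled upward.
local_constant <- sum(w * y) / sum(w)

# Local linear fit: weighted least squares of y on (x - c0);
# the intercept is the estimate of m(c0).
fit <- lm(y ~ I(x - c0), weights = w, subset = keep)
local_linear <- unname(coef(fit)[1])

print(c(local_constant = local_constant, local_linear = local_linear))
```

Because the true function is linear, the local linear intercept recovers m(0) = 1, while the weighted average lies well above 1 (roughly 1.33 in this setup). With only one-sided data available, the local constant fit cannot correct for the slope; the local linear fit can.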
    # Choose optimal bandwidth using Imbens and Kalyanaraman
    x=vote_share_previous;
    y=win;
    threshold=0;

    # Step 1: pilot bandwidth, density and conditional variance at the threshold
    n=length(x);
    h1=1.84*sd(x)*n^(-1/5);
    nh1.minus=sum((x<threshold)*(x>threshold-h1));
    nh1.plus=sum((x>=threshold)*(x<threshold+h1));
    meany.h1.minus=sum(y*(x<threshold)*(x>threshold-h1))/nh1.minus;
    meany.h1.plus=sum(y*(x>=threshold)*(x<threshold+h1))/nh1.plus;
    # Density estimate at the threshold. Note: the window has total width 2*h1,
    # so the usual uniform-kernel estimate divides by 2*n*h1; this version,
    # which produced the output reported below, divides by n*h1.
    fc.hat=(nh1.minus+nh1.plus)/(n*h1);
    sigmasq.c.hat=(1/(nh1.minus+nh1.plus))*(sum((y-meany.h1.minus)^2*(x<threshold)*(x>threshold-h1))+sum((y-meany.h1.plus)^2*(x>=threshold)*(x<threshold+h1)));

    # Step 2: third derivative from a global cubic fit, then second derivatives
    median.x.plus=median(x[x>=threshold]);
    median.x.minus=median(x[x<threshold]);
    middledata=(x>=median.x.minus)&(x<=median.x.plus);
    x.minus.threshold=x-threshold;
    x.minus.threshold.sq=(x-threshold)^2;
    x.minus.threshold.cubed=(x-threshold)^3;
    x.gt.threshold=(x>=threshold);
    polfit=lm(y~x.gt.threshold+x.minus.threshold+x.minus.threshold.sq+x.minus.threshold.cubed,subset=middledata);
    m3=6*coef(polfit)[5];   # 5th coefficient is the cubic term
    h2plus=3.56*(sigmasq.c.hat/(fc.hat*max(m3^2,.01)))^(1/7)*nh1.plus^(-1/7);
    h2minus=3.56*(sigmasq.c.hat/(fc.hat*max(m3^2,.01)))^(1/7)*nh1.minus^(-1/7);
    # m2plus estimate: local quadratic fit to the right of the threshold
    m2plusdata=(x>=threshold)&(x<threshold+h2plus);
    polfit2=lm(y~x.minus.threshold+x.minus.threshold.sq,subset=m2plusdata);
    m2plus=2*coef(polfit2)[3];
    N2plus=sum(m2plusdata);
    # m2minus estimate: local quadratic fit to the left of the threshold
    m2minusdata=(x<threshold)&(x>threshold-h2minus);
    polfit3=lm(y~x.minus.threshold+x.minus.threshold.sq,subset=m2minusdata);
    m2minus=2*coef(polfit3)[3];
    N2minus=sum(m2minusdata);

    # Step 3: regularization terms (computed here but not used in hopt below)
    rplus=720*sigmasq.c.hat/(N2plus*h2plus^4);
    rminus=720*sigmasq.c.hat/(N2minus*h2minus^4);

    # Optimal bandwidth
    hopt=3.4375*((2*sigmasq.c.hat/fc.hat)/(m2plus-m2minus)^2)^(1/5)*n^(-1/5);

Output:

    hopt
    x.minus.threshold.sq
               0.6331392

    alphahat.c.plus
    (Intercept)
      0.8055923

    alphahat.c.minus
    (Intercept)
      0.1955822

    est.effect
    (Intercept)
      0.6100101

The estimated effect of incumbency is to increase the probability of winning by 0.61, a huge effect.
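The code that produced alphahat.c.plus, alphahat.c.minus and est.effect is not shown in these notes. The following is a minimal sketch of one way such estimates could be computed, assuming weighted least squares with the edge kernel on each side of the threshold, as in the estimator of Section V. The data are simulated stand-ins (david_lee.txt is not reproduced here), the bandwidth h = 0.3 is arbitrary rather than the hopt above, and the helper names edge_kernel and rd_local_linear are hypothetical.

```r
# Edge (triangular) kernel K(u) = (1 - |u|) * I(|u| <= 1)
edge_kernel <- function(u) (1 - abs(u)) * (abs(u) <= 1)

# Sharp RD estimate at `threshold` by local linear regression on each side.
# Hypothetical helper: returns the two boundary intercepts and their difference.
rd_local_linear <- function(x, y, threshold, h) {
  w <- edge_kernel((x - threshold) / h)
  xc <- x - threshold                    # centered covariate
  plus <- (x >= threshold) & (w > 0)     # treated side, inside the bandwidth
  minus <- (x < threshold) & (w > 0)     # control side, inside the bandwidth
  # The intercept of each weighted fit estimates the regression
  # function at the threshold from that side.
  alphahat.c.plus <- coef(lm(y ~ xc, weights = w, subset = plus))[1]
  alphahat.c.minus <- coef(lm(y ~ xc, weights = w, subset = minus))[1]
  c(plus = unname(alphahat.c.plus),
    minus = unname(alphahat.c.minus),
    effect = unname(alphahat.c.plus - alphahat.c.minus))
}

# Simulated data (an assumption): win probability is linear in the previous
# margin x, with a jump of 0.4 at the threshold 0.
set.seed(1)
n <- 2000
x <- runif(n, -1, 1)
p <- 0.3 + 0.2 * x + 0.4 * (x >= 0)
y <- rbinom(n, 1, p)

est <- rd_local_linear(x, y, threshold = 0, h = 0.3)
print(est)
```

With the objects defined in the bandwidth code above, the analogous call would be rd_local_linear(x, y, threshold, hopt). The intercepts are taken from lm fits, which is consistent with the "(Intercept)" labels in the printed output, but this sketch is not guaranteed to reproduce those numbers.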
We can use the bootstrap to estimate the standard error of the estimate.

VI. Specification Testing

A main concern in using the RD design is that other things change around the discontinuity point. There may be other administrative rules based on the same cutoff, leading to a discontinuity in $E[Y(0) \mid X = x]$ at $c$. For example, around age 65 individuals become eligible for a host of services, so comparing those a little younger and a little older than 65 may not allow researchers to infer the effect of one particular program.

There are a number of things one can do. Most fit into the conceptual framework of using the same methodology to estimate effects that should be zero if the model is correctly specified. In other words, we use the RD design to estimate whether there is an effect at the discontinuity point on an outcome that we think shouldn't be affected.

Consider the housing example from Black. Suppose that the average characteristics of houses (e.g., number of rooms) are very different on either side of the school district boundary. This would suggest that there are other differences between the two areas that are not related to school quality, and hence the regression discontinuity estimates would be suspect.

VII. Fuzzy Regression Discontinuity Design

In the fuzzy RD design, the probability $P(D = 1 \mid X = x)$ does not jump from 0 to 1 at the discontinuity point $c$. Instead, the probability is discontinuous at $c$ with a jump that may be smaller than 1. We estimate the treatment effect by

$$\frac{\lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]}{\lim_{x \downarrow c} E[D \mid X = x] - \lim_{x \uparrow c} E[D \mid X = x]}.$$

We use the method for the sharp RD design to estimate the numerator and the denominator.

We can interpret this estimate as an IV estimator. We can think of the indicator $I(X_i \ge c)$ as the instrument for the treatment $D_i$, and of the above estimator as the corresponding IV (Wald) estimator in a neighborhood of $c$: the jump in the outcome divided by the jump in the probability of treatment.

See attached page from Imbens's course notes for a discussion of the connection to instrumental variables.

References:

Black, S. (1999), "Do Better Schools Matter? Parental Valuation of Elementary Education," Quarterly Journal of Economics, 114, 577-599.

Hahn, J., Todd, P. and Van der Klaauw, W. (2001), "Identification and Estimation of Treatment Effects with a Regression Discontinuity Design," Econometrica, 69, 201-209.

Shadish, W., Cook, T. and Campbell, D. (2002), Experimental and Quasi-Experimental Designs for Generalized Causal Inference.

Van der Klaauw, W. (2002), "Estimating the Effect of Financial Aid Offers on College Enrollment: A Regression-Discontinuity Approach," International Economic Review, 43(4), 1249-1287.