Estimating Causal Effects with Experimental Data

Some Basic Terminology
• Start with example where X is binary (though simple to generalize):
  – X=0 is control group
  – X=1 is treatment group
• Causal effect sometimes called treatment effect
• Randomization implies everyone has same probability of treatment

Why is Randomization Good?
• If X is allocated at random then we know that X is independent of all pre-treatment variables in the whole wide world
• An amazing claim, but true
• Implies there cannot be a problem of omitted variables, reverse causality etc.
• On average, the only reason for a difference between treatment and control groups is different receipt of treatment

Why is this useful? An Example: Racial Discrimination
• Black men earn less than white men in US

         LOGWAGE |      Coef.   Std. Err.        t
      -----------+---------------------------------
           BLACK |  -.1673813    .0066708   -25.09
           NO_HS |  -.2138331    .0077192   -27.70
        SOMECOLL |   .1104148    .0049139    22.47
         COLLEGE |   .4660205    .0048839    95.42
             AGE |   .0704488    .0008552    82.38
      AGESQUARED |  -.0007227    .0000101   -71.41
           _cons |   1.088116    .0172715    63.00

• Could be discrimination, or other factors unobserved by the researcher but observed by the employer
• Hard to fully resolve with non-experimental data

An Experimental Design
• Bertrand/Mullainathan “Are Emily and Greg More Employable Than Lakisha and Jamal”, American Economic Review, 2004
• Create fake CVs and send them in reply to job adverts
• Allocate names at random to CVs: some given ‘black-sounding’ names, others ‘white-sounding’
• Outcome variable is call-back rates
• Interpretation: not a direct measure of racial discrimination, just the effect of having a ‘black-sounding’ name, which may have other connotations
• But name is uncorrelated by construction with other material on CV

The Treatment Effect
• Want estimate of:
  $$E(y_i \mid X_i = 1) - E(y_i \mid X_i = 0)$$

Estimating Treatment Effects: the Statistics Course Approach
• Take mean of outcome variable in treatment group
• Take mean of outcome variable in control group
• Take difference between the two
• No problems but:
  – Does not generalize to where X is not binary
  – Does not directly compute standard errors

Estimating Treatment Effects: A Regression Approach
• Run regression: yi=β0+β1Xi+εi
• Proposition 2.2: The OLS estimator of β1 is an unbiased estimator of the causal effect of X on y
• Proof: Many ways to prove this but simplest way is perhaps:
  – Proposition 1.1 says OLS estimates E(y|X)
  – E(y|X=0)=β0, so OLS estimate of intercept is consistent estimate of E(y|X=0)
  – E(y|X=1)=β0+β1, so OLS estimate of β1 is consistent estimate of E(y|X=1) − E(y|X=0)
• Hence can read off estimate of treatment effect from coefficient on X
• Approach easily generalizes to where X is not binary
• Also gives estimate of standard error

Computing Standard Errors
• Unless told otherwise, regression package will compute standard errors assuming errors are homoskedastic, i.e. $Var(\varepsilon_i \mid X_i) = \sigma^2$
• Even if we are only interested in the effect of treatment on the mean, X may affect other aspects of the distribution, e.g. the variance
• This will cause heteroskedasticity
• Heteroskedasticity does not make OLS regression coefficients inconsistent but does make OLS standard errors inconsistent

‘Robust’ Standard Errors
• Also called:
  – Huber standard errors
  – White standard errors
  – Heteroskedastic-consistent standard errors
• Simple to use in practice, e.g. in STATA:

    . reg y x, robust

• Statistics course approach:
  – Get variance of estimate of mean of treatment and control group
  – Sum to give estimate of variance of difference in means

Bertrand/Mullainathan: Basic Results

Summary So Far
• Econometrics very easy if all data comes from randomized controlled experiment
• Just need to collect data on treatment/control and outcome variables
• Just need to compare means of outcomes of treatment and control groups
• Is data on other variables of any use at all?
  – Not necessary but useful

Including Other Regressors
• Can get consistent estimate of treatment effect without worrying about other variables
• Reason is that randomization ensures no problem of omitted variables bias
• But there are reasons to include other regressors:
  – Improved efficiency
  – Check for randomization
  – Improve randomization
  – Control for conditional randomization
  – Heterogeneity in treatment effects

The Uses of Other Regressors I: Improved Efficiency
• Don’t just want consistent estimate of causal effect; also want low standard error (or high precision or efficiency)
• Standard formula for the variance of the OLS estimate of β is proportional to σ²/Var(X); the standard error is its square root
• σ² is the variance of the residual in the regression: σ² = (1−R²)·Var(y)
• Include more variables and R² rises, so σ² falls; formal proof (Proposition 2.4) a bit more complicated but this is basic idea
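The equivalence between the 'statistics course' approach and the regression approach described above can be checked numerically. The following is a minimal sketch under stated assumptions: the data are simulated (sample size, a true effect of 0.5, and the noise levels are invented for illustration and come from no study), the function names are hypothetical, and Python is used only as an illustration alongside the Stata command shown earlier. It confirms that, for a binary X, the OLS slope equals the difference in group means, and it computes the standard error by summing the variances of the two group means.

```python
import random
import statistics


def simulate_experiment(n=5000, effect=0.5, seed=0):
    """Simulate a randomized experiment (all numbers are illustrative)."""
    rng = random.Random(seed)
    # Random assignment: each unit is treated with probability 1/2.
    x = [rng.randint(0, 1) for _ in range(n)]
    # Treatment shifts the mean and also raises the error variance,
    # so the errors are heteroskedastic by construction.
    y = [1.0 + effect * xi + rng.gauss(0, 1.0 + xi) for xi in x]
    return x, y


def diff_in_means(x, y):
    """Difference in group means and its standard error."""
    treated = [yi for xi, yi in zip(x, y) if xi == 1]
    control = [yi for xi, yi in zip(x, y) if xi == 0]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    # Variance of each group mean, summed, gives the variance of the difference.
    var_diff = (statistics.variance(treated) / len(treated)
                + statistics.variance(control) / len(control))
    return diff, var_diff ** 0.5


def ols_slope(x, y):
    """Slope from regressing y on a constant and x."""
    xbar = statistics.fmean(x)
    ybar = statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


x, y = simulate_experiment()
diff, se = diff_in_means(x, y)
print("difference in means:", diff)
print("OLS slope:          ", ols_slope(x, y))
print("standard error:     ", se)
```

Because the simulated error variance differs between the two groups, this is also a setting in which standard errors computed under homoskedasticity would not be appropriate.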
The Uses of Other Regressors II: Check for Randomization
• Randomization can go wrong:
  – Poor implementation of research design
  – Bad luck
• If randomization is done well then W should be independent of X; this is testable:
  – Test for differences in W between treatment and control groups
  – Probit model for X on W

The Uses of Other Regressors III: Improve Randomization
• Can also use W at the stage of assigning treatment
• Can guarantee that in your sample X and W are independent, instead of this holding only probabilistically
• This is what Bertrand/Mullainathan do when assigning names to CVs

The Uses of Other Regressors IV: Adjust for Conditional Randomization
• This is the case where W must be included to get consistent estimates of treatment effects
• Conditional randomization is where probability of treatment is different for people with different values of W, but random conditional on W
• Why have conditional randomization?
  – May have no choice
  – May want to do it (c.f. stratification)

An Example: Project STAR
• Allocation of students to classes is random within schools
• But small number of classes per school
• This leads to following relationship between probability of treatment and number of kids in school:

[Figure: fraction in treatment group (vertical axis, .1 to .5) against number of kids in school (horizontal axis, 40 to 120)]

Controlling for Conditional Randomization
• X can now be correlated with W
• But, conditional on W, X is independent of other factors
• But must get functional form of relationship between y and W correct
  – matching procedures
• This is not the case with (unconditional) randomization; see class exercise

Heterogeneity in Treatment Effects
• So far have assumed causal (treatment) effect is the same for everyone
• No good reason to believe this
• Start with case of no other regressors: yi=β0+β1iXi+εi
• Random assignment implies X independent of β1i
• Sometimes called random coefficients model

What treatment effect to estimate?
• Would like to estimate causal effect for everyone, but this is not possible
  – Holland’s fundamental problem of causal inference
• Can only hope to estimate some average
• Average treatment effect:
  $$ATE = E(\beta_{1i}) \equiv \beta_1$$
• Proposition 2.5: OLS estimates ATE

Observable Heterogeneity
• Full outcomes notation:
  – Outcome if in control group: $y_{0i} = \gamma_0' W_i + u_{0i}$
  – Outcome if in treatment group: $y_{1i} = \gamma_1' W_i + u_{1i}$
• Treatment effect is $(y_{1i} - y_{0i})$ and can be written as:
  $$y_{1i} - y_{0i} = (\gamma_1 - \gamma_0)' W_i + u_{1i} - u_{0i}$$
• Note treatment effect has observable and unobservable component
• Can estimate as:
  – Two separate equations
  – One single equation

Combining treatment and control groups into single regression
• We can write:
  $$y_i = X_i y_{1i} + (1 - X_i) y_{0i}$$
• Combining outcomes equations leads to:
  $$y_i = X_i (\gamma_1' W_i + u_{1i}) + (1 - X_i)(\gamma_0' W_i + u_{0i}) = \gamma_0' W_i + (\gamma_1 - \gamma_0)' X_i W_i + u_{0i} + X_i (u_{1i} - u_{0i})$$
• Regression includes W and interactions of W with X; these are the observable part of the treatment effect
• Note: error likely to be heteroskedastic

Bertrand/Mullainathan
• Different treatment effect for high and low quality CVs:

Units of Measurement
• Causal effect measured in units of ‘experiment’, which is not very helpful
• Often want to convert causal effects to more meaningful units, e.g. in Project STAR what is effect of reducing class size by one child
• Simple estimator of this would be:
  $$\frac{E(y \mid X = 1) - E(y \mid X = 0)}{E(S \mid X = 1) - E(S \mid X = 0)}$$
• where S is class size
• Takes the treatment effect on outcome variable and divides by treatment effect on class size
• Not hard to compute but how to get standard error?
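The ratio estimator for converting units can be sketched on simulated data. This is an illustration under stated assumptions, not Project STAR data: the class sizes, the per-child effect of -0.02, the unobserved factor and all function names are invented for the example. The unobserved factor is built to influence both class size S and the outcome y, so a naive regression slope of y on S is distorted, while the ratio of the two treatment effects stays close to the per-child effect used in the simulation.

```python
import random
import statistics


def simulate_class_size(n=20000, per_child_effect=-0.02, seed=1):
    """Simulated experiment; every number here is an illustrative assumption."""
    rng = random.Random(seed)
    x, s, y = [], [], []
    for _ in range(n):
        xi = rng.randint(0, 1)            # 1 = randomly assigned to a small class
        unobserved = rng.gauss(0, 1)      # affects both class size and outcome
        si = 22.0 - 7.0 * xi + 1.5 * unobserved + rng.gauss(0, 1)
        yi = 0.5 + per_child_effect * si + 0.3 * unobserved + rng.gauss(0, 0.5)
        x.append(xi)
        s.append(si)
        y.append(yi)
    return x, s, y


def group_mean(values, x, group):
    """Mean of `values` among units whose assignment equals `group`."""
    return statistics.fmean([v for v, xi in zip(values, x) if xi == group])


def ratio_estimator(x, s, y):
    """Treatment effect on y divided by treatment effect on class size."""
    effect_on_y = group_mean(y, x, 1) - group_mean(y, x, 0)
    effect_on_s = group_mean(s, x, 1) - group_mean(s, x, 0)
    return effect_on_y / effect_on_s


def naive_slope(s, y):
    """Slope from regressing y on class size, ignoring how S was determined."""
    sbar = statistics.fmean(s)
    ybar = statistics.fmean(y)
    num = sum((si - sbar) * (yi - ybar) for si, yi in zip(s, y))
    den = sum((si - sbar) ** 2 for si in s)
    return num / den


x, s, y = simulate_class_size()
print("ratio estimator:", ratio_estimator(x, s, y))
print("naive slope:    ", naive_slope(s, y))
```

The sketch returns a point estimate only; it does not address the standard error of the ratio.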
IV Can Do the Job
• Can’t run regression of y on S, because S is influenced by factors other than treatment status
• But X is:
  – Correlated with S
  – Uncorrelated with unobserved stuff (because of randomization)
• Hence X can be used as an instrument for S
• IV estimator has form (just-identified case):
  $$\hat{\beta}_{IV} = (X'S)^{-1} X'y$$

The Wald Estimator
• IV will give an estimate of the standard error of the treatment effect
• Where instrument is binary and no other regressors are included, the IV estimate of the slope coefficient can be shown to be:
  $$\frac{E(y \mid X = 1) - E(y \mid X = 0)}{E(S \mid X = 1) - E(S \mid X = 0)}$$

Partial Compliance
• So far:
  – In control group implies no treatment
  – In treatment group implies get treatment
• Often things are not as clean as this:
  – Treatment is an opportunity
  – Close substitutes available to those in control group
  – Implementation not perfect, e.g. pushy parents

An Example: Moving to Opportunity
• Designed to investigate the impact of living in bad neighbourhoods on outcomes
• Gave some residents of public housing projects the chance to move out
• Two treatments:
  – Voucher for private rental housing
  – Voucher for private rental housing restricted for use in ‘good’ neighbourhoods
• No-one forced to move so imperfect compliance: 60% and 40% did use it

Some Terminology
• Z denotes whether in control or treatment group: ‘intention-to-treat’
• X denotes whether actually get treatment
• With perfect compliance:
  – Pr(X=1|Z=1)=1
  – Pr(X=1|Z=0)=0
• With imperfect compliance: 1>Pr(X=1|Z=1)>Pr(X=1|Z=0)>0

What Do We Want to Estimate?
• ‘Intention-to-Treat’: ITT=E(y|Z=1)−E(y|Z=0)
• This can be estimated in usual way
• Treatment Effect on Treated:
  $$TOT = \frac{E(y \mid Z = 1) - E(y \mid Z = 0)}{E(X \mid Z = 1) - E(X \mid Z = 0)}$$

Estimating TOT
• Can’t use simple regression of y on Z
• But should recognize TOT as Wald estimator
• Can be estimated by regressing y on X using Z as instrument
• Relationship between TOT and ITT:
  $$TOT = \frac{ITT}{\Pr(X = 1 \mid Z = 1) - \Pr(X = 1 \mid Z = 0)}$$

Most Important Results from MTO
• No effects on adult economic outcomes
• Improvements in adult mental health
• Beneficial outcomes for teenage girls
• Adverse outcomes for teenage boys

Sample results from MTO
• TOT approximately twice the size of ITT
• Consistent with 50% use of vouchers

IV with Heterogeneous Treatment Effects (More Difficult)
• If treatment effect is the same for everyone then TOT recovers this (obvious)
• But what if treatment effect is heterogeneous?
• No simple answer to this question
• Suppose model for treatment effect is:
  $$y_i = \beta_0 + \beta_{1i} X_i + \varepsilon_i$$

Proposition 2.6
• The IV estimate for the heterogeneous treatment case is a consistent estimate of:
  $$\operatorname{plim} \hat{\beta}_{1,IV} = \frac{E(\pi_i \beta_{1i})}{E(\pi_i)}$$
  where:
  $$\pi_i = \Pr(X_i = 1 \mid Z_i = 1) - \Pr(X_i = 1 \mid Z_i = 0)$$
  is the difference in the probability of treatment for individual i when in treatment and control group

Interpretation
• This is a weighted average of treatment effects
• ‘Weights’ will vary with instrument, in contrast with the earlier heterogeneous treatment case, where OLS estimates the ATE
• Some cases in which can interpret IV estimate as ATE

How will IV estimate differ from ATE
• IV is ATE if no correlation between β1i and πi
• Previous formula says the difference depends on the covariance of β1i and πi:
  $$\frac{E(\pi_i \beta_{1i})}{E(\pi_i)} = E(\beta_{1i}) + \frac{Cov(\pi_i, \beta_{1i})}{E(\pi_i)}$$
• In some situations can sign this, but not always
• Write $p_{1i} = \Pr(X_i = 1 \mid Z_i = 1)$ and $p_{0i} = \Pr(X_i = 1 \mid Z_i = 0)$
• Example 1: no-one gets treatment in the absence of the programme, so $\pi_i = p_{1i}$
• If those who get treatment when in the treatment group are those with the highest returns then:
  $$Cov(\pi_i, \beta_{1i}) = Cov(p_{1i}, \beta_{1i}) > 0$$
• IV>ATE
• Example 2: treatment is voluntary for those in the control group but compulsory for those in the treatment group
• This implies $\pi_i = 1 - p_{0i}$
• If those who get treatment in control are those with highest returns then:
  $$Cov(\pi_i, \beta_{1i}) = -Cov(p_{0i}, \beta_{1i}) < 0$$
• IV<ATE

Angrist/Imbens Monotonicity Assumption
• Case where IV estimate is not ATE
• Assume that everyone is moved in the same direction by treatment: the monotonicity assumption
• Then can show that IV is the average of the treatment effect for those whose behaviour is changed by being in treatment group
• They call this the Local Average Treatment Effect (LATE)

Problems with Experiments
• Expense
• Ethical Issues
• Threats to Internal Validity:
  – Failure to follow experiment
  – Experimental effects (Hawthorne effects)
• Threats to External Validity:
  – Non-representative programme
  – Non-representative sample
  – Scale effects

Conclusions on Experiments
• Are ‘gold standard’ of empirical research
• Are becoming more common
• Not enough of them to keep us busy
• Study of non-experimental data can deliver useful knowledge
• Some issues similar, others different
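As a closing numerical illustration of the partial-compliance material (ITT, TOT, and the Wald form of IV), here is a minimal sketch on simulated data. Everything in it is an assumption made for the example: a common treatment effect of 2, take-up of 50% among those offered treatment, no access to treatment in the control group, and takers who differ from non-takers in their baseline outcome. The function names are hypothetical. It computes the ITT, divides by the difference in treatment probabilities to obtain the TOT, checks that this matches the IV slope using Z as an instrument for X, and compares with a naive comparison of treated and untreated units.

```python
import random
import statistics


def simulate_partial_compliance(n=20000, effect=2.0, take_up=0.5, seed=2):
    """Rows of (z, x, y); every parameter value is an illustrative assumption."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.randint(0, 1)                       # random offer of treatment
        taker = 1 if rng.random() < take_up else 0  # would accept if offered
        x = z * taker                               # no treatment without an offer
        # Takers have a higher baseline outcome, so x is not random.
        y = 1.0 + effect * x + 1.0 * taker + rng.gauss(0, 1)
        data.append((z, x, y))
    return data


def mean_where(data, key_col, key_val, value_col):
    """Mean of column `value_col` over rows where column `key_col` equals `key_val`."""
    return statistics.fmean([r[value_col] for r in data if r[key_col] == key_val])


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    abar, bbar = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b)) / (len(a) - 1)


def estimates(data):
    itt = mean_where(data, 0, 1, 2) - mean_where(data, 0, 0, 2)
    compliance = mean_where(data, 0, 1, 1) - mean_where(data, 0, 0, 1)
    z = [r[0] for r in data]
    x = [r[1] for r in data]
    y = [r[2] for r in data]
    return {
        "itt": itt,
        "tot": itt / compliance,                     # Wald ratio
        "iv": covariance(z, y) / covariance(z, x),   # IV slope, Z instruments X
        "naive": mean_where(data, 1, 1, 2) - mean_where(data, 1, 0, 2),
    }


for name, value in estimates(simulate_partial_compliance()).items():
    print(name, value)
```

With take-up of one half in this simulation the TOT is roughly twice the ITT, the same pattern the MTO slides describe, while the naive comparison of treated and untreated units is pushed away from the true effect by the selection built into the simulation.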