Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Instrumental Variables Isaac Mbiti University of Virginia Basics • Goal: we want to to estimate the returns to education (for example) • We collect data on wages/earnings, years of schooling, and other individual level data. • Outline of lecture • Basic OLS (ordinary least squares regression) – We look at basic relationship btw wages and schooling • Problems with OLS- what problems do we face using OLS • How to address these problems with IV (instrumental varibles) methods Basics • Consider estimating the returns to education. Yi = α + ρS i + vi • • • • • Y = wages/earnings S = years of schooling, ρ = returns to schooling – the coefficient of interest ν is the error term (often denoted with e) For clarity purposes we will focus on simple bivariate regression o Extends to multivariate regression case 60 40 20 Earnings 80 Basics- OLS regression 6 8 10 12 yrs_schooling earnings Fitted values 14 16 Basics- OLS regression 80 What can I learn from looking at this scatter plot? 20 40 What command do you use in stata to do a scatter plot? 60 Earnings 6 8 10 12 yrs_schooling earnings Fitted values 14 16 “Error” is the diff btw prediction and the actual data point Basics Yi = α + ρS i + vi • OLS (ordinary least squares) finds the “best fit line” o Minimizes the sum of squared errors • Which letters denote the slope and intercept? ρ is the _____? And α is the _______ • In Stata we use the command: regress {dep_var} {independent}, [options] Other examples: Regress wages yrs_schooling, regress birth_weight mothers_smoking, robust regress test_score textbooks, cluster(classroomid) Basics- OLS regression . regress earnin yrs_schooling Source SS df MS Model Residual 7459.44217 2864.05109 1 98 7459.44217 29.2250111 Total 10323.4933 99 104.27771 earnings Coef. yrs_schooling _cons 4.331217 2.729021 Std. Err. .2711029 3.002237 Number of obs F( 1, 98) Prob > F R-squared Adj R-squared Root MSE t 15.98 0.91 P>|t| 0.000 0.366 How do we read this regression output table? = = = = = = 100 255.24 0.0000 0.7226 0.7197 5.406 [95% Conf. Interval] 3.793222 -3.228821 4.869212 8.686862 Basics • Consider estimating the returns to education. Yi = α + ρS i + vi • OLS finds the “best fit line” o Minimizes the sum of squared errors • In Stata we use the command: regress {dep_var} {independent}, [options] Other examples: Regress wages yrs_schooling, regress birth_weight mothers_smoking, robust regress test_score textbooks, cluster(classroomid) Basics- Assumptions of OLS • Linearity • Each person’s earnings (Yi) are a linear function of their education (Si), plus an individual-specific error term, νi • νi may be called the error term, the residual, or the deviation. It is a random variable, meaning that the value that any individual gets is a random draw from a distribution. I think of νi as representing the “luck” factor. • For a given level of schooling (S), not everyone makes the same Y because some are lucky and make more and others make less. Basics- Assumptions of OLS • Assumption 2: Error term has zero mean, E(νi)=0 • This means that the positive errors and the negative errors cancel out, so that on average, the error is zero. Basics- Assumptions of OLS • Assumption 3: Homoskedasticity• In math: var(νi |S) is constant • Intuitively this means that the variance of the error term does not depend on S (our independent variable) Basics- Assumptions of OLS • Assumption 3: Homoskedasticity• An example of a violation of this assumptions is if our data is clustered • Dataset 1: 100 standard 5 students each from different schools • Dataset 2: 100 standard 5 students, 10 students each from 10 schools • Suppose you do the following in stata with both datasets • Regress test_scores textbooks • which analysis would be problematic? Basics- Assumptions of OLS • • • • • • Assumption 3: HomoscedasticityViolations of this assumption do not affect the slope coefficients BUT: it affects the standard error of the coefficients This means our T-statistics and confidence intervals will be wrong SO we could give bad policy advice! (eg we say we should implement a program when we really shouldn’t) • So we fix this by clustering or using robust standard errors • In stata: regress birth_weight mothers_smoking, robust regress test_score textbooks, cluster(classroomid) Basics- OLS regression . regress earnin yrs_schooling Source SS df MS Model Residual 7459.44217 2864.05109 1 98 7459.44217 29.2250111 Total 10323.4933 99 104.27771 earnings Coef. yrs_schooling _cons 4.331217 2.729021 Std. Err. .2711029 3.002237 Number of obs F( 1, 98) Prob > F R-squared Adj R-squared Root MSE t 15.98 0.91 P>|t| 0.000 0.366 = = = = = = 100 255.24 0.0000 0.7226 0.7197 5.406 [95% Conf. Interval] 3.793222 -3.228821 If we violate homoscedasticity – Standard errors, T statistics, P value and Confidence Intervals will be wrong! 4.869212 8.686862 Basics- Assumptions of OLS • Assumption 4: Cov(S,νi ) =0 • Recall our regression: Yi = α + ρS i + vi • • • • • What does this assumption mean? No relationship between schooling and the error term Recall we are thinking of the error term as “luck”. So the assumption states that “lucky people” have similar education as “unlucky people” Basics- Assumptions of OLS • Assumption 4: Cov(S,νi ) =0 • Use a simulated data set where I violate this assumption and I know the true relationship between variables. Yi = 2 + 3.7 S i + vi • Ie. I am going to create a data set where the above relationship is true (what is the intercept? What is the slope?) • I also create the data such that Cov(S,νi ) ≠0 Basics- Assumptions of OLS 20 • Assumption 4: Cov(S,νi ) =0 • Use a simulated data set where I violate this assumption. • What does this look like? • Very -20 -10 0 10 “Luck” 5 10 15 yrs_schooling v Fitted values 20 important note: I can only create this graph because I am using simulated data. You cannot create such a graph with regular data So what? • Lets run the ols regression on the simulated data. Yi = α + ρS i + vi . regress earnings yrs_school, robust Linear regression Number of obs F( 1, 998) Prob > F R-squared Root MSE earnings Coef. yrs_schooling _cons 4.291932 -4.316475 Robust Std. Err. .081199 .9084094 t 52.86 -4.75 P>|t| 0.000 0.000 Did my regression give me the right answer? = 1000 = 2793.86 = 0.0000 = 0.7447 = 5.7537 [95% Conf. Interval] 4.132591 -6.099087 4.451272 -2.533864 OLS Assumptions –When All is OK S V Y Omitted Variables • Often S and V are actually correlated because of omitted variables • An omitted variable Cov(S,νi ) ≠0 • From our simulation what problem will this cause in OLS? • Recall previous lectures: • We want to know the impact of a program • But its really hard bc people who enroll in programs are more motivated (for example). • This is an example of an omitted variable problem • ALL methods we discussed are trying to address this problem • RCT, Diff in Diff , RD and now we look at one more… IV (instrumental variables) IV basics • Often S and V are actually correlated because of omitted variables • An omitted variable Cov(S,νi ) ≠0 • Classic example of omitted variable problem: o unobserved ability (which is correlated with schooling) Yi =α + ρ Si + A 'i + vi • In this case Cov(S, V) >0 ie higher ability people get more schooling IV basics • Could we solve the problem by controlling for A? • Yes BUT ONLY IF • • • • o WE CAN MEASURE A properly? (unlikely in practice) o A is the only Omitted variable? (hard to argue and very unlikely) o SO adding lots of variables to the regression is not sufficient. For exposition purposes let us suppose A is the only omitted variable If we can’t measure A then we have a problem Simply estimating this regression would lead to overestimates of ρ Why would we be overestimating? Omitted Variables S Y V We can’t measure A so A is part of V Since V and S are correlated and BOTH affect Y Is S driving Y or is it V (including A)? Could Y be driven solely by V (which includes A)? Omitted Variable BiasJust trust me on this • By knowing (or hypothesizing) about direction of the relationships between S and V; Y and V; we can actually figure out if the omitted variable problem will lead us to overestimate or underestimate the true relationship between Y and S if we use OLS • if S and V are positively correlated & Y and V are positively correlated, OLS will overestimate the true relationship • if S and V are negatively correlated & Y and V are negatively correlated, OLS will overestimate the true relationship • if S and V are positively correlated & Y and V are negatively correlated, OLS will underestimate the true relationship • if S and V are negatively correlated & Y and V are postively correlated, OLS will underestimate the true relationship Let’s Start Simple.. Yi =α + ρ Si + A 'i + vi • We can solve the problem using an instrumental variable (z) which is correlated with S but not A or v • The assumption that z is uncorrelated with A or is called the exclusion restriction. IV intuition: The exclusion restriction Z S V Y IV intuition: The Exclusion Restriction ??? Z S V Y How Do We Estimate ρ? ρ Cov(Yi , Z i ) Cov(Yi , Z i ) / V ( Z i ) = Cov( Si , Z i ) Cov( Si , Z i ) / V ( Z i ) • Note: with a simple regression Eg reg y on x, the OLS estimate is Cov(Y,X)/V(X) • So the denominator is the OLS regression between schooling and our instrument Z o We call this the FIRST STAGE o Does Z predict schooling? • First stage coefficient can’t be zero! That is the instrument has to have some predictive power How Do We Estimate ρ? ρ Cov(Yi , Z i ) Cov(Yi , Z i ) / V ( Z i ) = Cov( Si , Z i ) Cov( Si , Z i ) / V ( Z i ) • Note: with a simple regression Eg reg y on x, the OLS estimate is Cov(Y,X)/V(X) • The numerator is the OLS regression between Earnings and our instrument (Z) • We call this the reduced form relationship o (SIMILAR TO INTENT TO TREAT) • So IV works by taking the coefficients from the reduced form relationship and dividing by coefficients from the first stage • It’s the ratio of the effect of Z on earnings divided by the effect of Z on schooling Where do we get instruments from? • Economic theory • Natural/policy experiments o Eg Duflo (2001) examines impact of rapid school construction on education and wages. • Take away – sometimes the variation in diff in diff can support an IV estimation strategy Where do we get instruments from? • Natural experiments: o Angrist and Kruger (1991) use quarter of birth + compulsory schooling laws in the US as an instrument for years of schooling o How/why does this work? • School year starts Sept 1. You have to be age 6 by that date. Now compare someone born Aug 30 to someone born Sept 3- very similar in age but older kids meets criteria, younger one has to wait till next year. • School leaving laws say you have to be in school till age 16. suppose both leave at 16 who will have more schooling? • Is this a valid instrument? Exercise GPAi = β 0 + β1 PCi + ui • Want to examine effect of personal computer (PC) on gpa in college. PC is a dummy variable 1. Why might PC be correlated with u? 2. Explain why PC is likely to be related to parental income. Does this mean parental income is a good IV for PC? Why or why not? 3. Suppose the university randomly gave grants for PC purchasing to some students. How could you use this to construct an instrumental variable estimate of the equation of interest (above)? Quarter of Birth: An Example of an IV for Schooling (Angrist and Krueger, QJE 1991) • Why might this work? o States only allows kids to start school after they turn 6. o For exposition pretend school starts Sept 1. o So, kids turning 6 from Sept 2-Dec31 can’t join until the following year. But Kids turning 6 in Sept 1 and before are able to enroll in the school. o But compulsory schooling laws require kids to stay in school until their 16th birthday o So…kids with Sept2 -December birthdays end up spending less time in school than kids born in before Sept1 → quarter of birth predictive of years of schooling o Meanwhile, hard to imagine that quarter of birth affects earnings for any reason besides completed schooling (or does it?) Let’s Add Covariates to the Model • Important because maybe people born in the South are more likely to give birth later in the year than people in the North (I made this up) and region of birth is correlated with earnings. No problem: just controls for region/state of birth Si = X i'π 10 + π 11Z i + ξ1i Yi = X i'π 20 + π 21Z i + ξ 2i • The coefficient on Z in first equation is the first stage and the coefficient on Z in second equation is reduced form π 21 ρ= π 11 Two-Stage Least Squares (2SLS) First Stage: Estimate predicted schooling using the Z (and the other covariates) Second Stage: Plug that predicted schooling into equation of interest (structural equation). The estimated coefficient on the predicted schooling will be the estimated ρ Two-Stage Least Squares (2SLS) First Stage: Estimate predicted schooling using the Z (and the other covariates) S i = π 0 + π 1Z i + ei Si Second Stage: Plug that predicted schooling into equation of interest (structural equation). The estimated coefficient on the predicted schooling will be the estimated ρ Yi = α + ρ S i + ui “S-hat” How To Think About This 2SLS retains only variation in S that is generated from variation in X. This variation is not correlated with ability and so we can consistently estimate the effect of schooling on earnings If I use S in the regression of interest- many things drive S including A and other unobservables If I use S_hat (predicted from instrument)- ALL the variation in S_hat is driven by the instrument so we “break link” between schooling and unobservables The Wald Estimator • (This is the British guy who told RAF to reinforce fuselage) • Simplest IV estimator: A single dummy instrument, with one endogenous variable and no covariates • Not so useful in practice but helps to think about intuition E[Yi | Z i = 1] − E[Yi | Z i = 0] ρ= E[ Si | Z i = 1] − E[ Si | Z i = 0] • Only reason for any relationship between Z and Y is that Z affects S so if numerator nonzero, must be because of S. Denominator is just for rescaling so we can answer the question of how much S affects Y. Example with Quarter of Birth 1st Quarter 4th Quarter Difference Compute the wald estimator comparing Q1 to Q4 Example with Quarter of Birth 1st Quarter 4th Quarter Difference • 1. 2. 3. 4. Exercises Suppose you want to test whether girls who attend school a girls high school do better in math than girls in coed (mixed) schools. You have a sample from high school girls and score is a standardized math test. Girlhs = attend all girls school What other factors would you control for in the equation? (be realistic about things that are in data) write a regression equation for #1 Suppose parental support and motivation are unobserved factors in the error term in #2. Are they likely correlated with girlhs? explain Discuss the assumptions needed for the number of girls high schools within a 20 km radius of a girls home to be a valid IV for girlhs Back to the Basics • A good instrument must 1. Be correlated with the endogenous right hand side variable • We also want the First Stage to be informative and strongly statistically significant. (A good F test) 2. Uncorrelated with the error term • Good news is that we can test condition (1)…- this is the importance of the first stage • Bad news is that we cannot test (2) Very Silly and Embarrassing Mistakes • Don’t do 2SLS by hand (getting predicted values of endogenous variable, plugging in to equation of interest, doing OLS) because standard errors WILL BE WRONG • Always put all of the controls in the first and second stage equations! o First stage residual (S minus Shat) uncorrelated by construction with all covariates in first stage. But..these first stage residuals which are included in error in second stage, may be correlated with any X’s that were not in the first stage→INCONSISTENT estimates! o What’s good enough for the second stage is good enough for the first stage!! Forbidden Regressions • Imagine endogenous variable is a dummy. It is FORBIDDEN to get predicted value for this dummy using a probit model and to plug this into second stage equation o WHY? Only OLS is guaranteed to produce first stage residuals which are uncorrelated with fitted values. • If want to examine effect of schooling on earnings but believe nonlinear relationship, include S and S2. But.. o Treat S and S2 as two endogenous variables and so need two instruments (the square of the original instrument is fine) Another Example of an IV • Draft lottery numbers• During the vietnam war there was a lottery for who would be required to join the army (“the draft”). • Basically all men were given a random number based on birthday. If the number was high- not drafted. If low- you were drafted (ie you were required to do military service) • Angrist uses this to examine the relationship between military service and earnings • What does IV tell us? Notice that the IV estimator is the ratio of the change in Y due to change in Z to the change in X due to change in Z E[Yi | Z i = 1] − E[Yi | Z i = 0] ρ= E[ Si | Z i = 1] − E[ Si | Z i = 0] • • You can see that easily from Wald estimator Lets go back to date of birth and compulsory schooling example What does IV tell us? • Suppose you have two types of people in the population: o "ambitious" and "non-ambitious" people. o Distributed evenly in population and across AK birth groups. o Ambitious people get more years of education. • Who is going to respond to the "treatment" in this setting? • “Aug 31 and before" + ambitious will get educated anyway and “Aug 31 and before" + non-ambitious forced to get additional year of schooling Sept 2 and after +ambitious will get educated (so doesn’t respond to treatment) and sept 3 and after + non-ambitious will drop out asap. so really the estimates are driven by “Aug31 and before" + non-ambitious people • --> IV produces a "LOCAL AVERAGE TREATMENT EFFECT" not necessarily the same as the treatment effect on the whole population. Assumptions Need to Make to Interpret IV Estimate as LATE 1. Independence: Instrument as good as randomly assigned (eg: random draft) 2. Exclusion: Instrument only affects outcome via the endogenous right hand side variable (no other mechanism) 3. First stage: Instrument has an effect on endogenous variable 4. Monotonicity: Instrument has either no effect or same effect for everyone. • If we don’t have monotonicty some the instrument pushes some people into treatment while pushing other people out of treatment We’ll get to what a local average treatment effect (LATE) is in a second, but first, a bit more on the assumptions... Useful Notation Yi(d,z) Potential outcome of person i were this person to have treatment d and IV z. Causal effect of veteran status (serving in the army: Causal effect of draft eligibility: D1i D0i Yi (1, zi ) − Yi (0, zi ) Yi ( Di ,1) − Yi ( Di , 0) Whether join military given z=1 (draft eligible, low number) Whether join military given z=0 (draft ineligible, high number) What is Observed? Di =D0i + ( D1i − D0i ) zi =π 0 + π 1i zi + ξ π 0 ≡ E[ D0i ], π 1i ≡ D1i − D0i For any individual, only see one potential treatment: D1i or D0i (but not both)..ie, you don’t see whether person i would have joined military if had gotten different draft number (z) E[π 1i ] Average causal effect of zi on Di. LATE Theorem Suppose independence, exclusion, first stage, and monotonicity, then an instrument can be used to estimate the average causal effect on the affected group. E[Yi | zi = 1] − E[Yi | zi = 0] = E[Y1i − Y0i | D1i > D0i ] E[ Di | zi = 1] − E[ Di | zi = 0] = E[ ρi | π 1i > 0] Examples: o IV estimate of effect of military service on earnings gives us causal effect of military on earnings for men who only served because were drafted (wouldn’t have served otherwise) o IV estimate of schooling on earnings (when IV is birth month) gives us causal effect of schooling on earnings for people who stayed in school a few extra months before dropping out because started school younger LATE Theorem • Compliers: Get treated if z=1 and don’t get treated if z=0: D1i=1 and D0i=0 • Always-takers: Always get treated:D1i =D0i=1, • Never-takers: : Never get treated:D1i =D0i=0, LATE is effect of treatment on compliers. Analogy: We want to know effect of medicine on health in a randomized trial. Some people always take medicine and some never take medicine. IV will only tell us effect of medicine on compliers. Treatment Effect on the Treated • Average causal effect on compliers ≠ treatment effect on the treated o The treated consist of compliers (with z=1) + always takers but the always takers may have a different effect than compliers. • Examples: people who take medicine no matter what may be those that benefit most from medicine. People who complete 12 years of schooling regardless of whether they’re forced to be in school may benefit most from school. o Effect of treatment on the treated is weighted average of effects on compliers and always takers (people who would go to military regardless) E[Y1i − Y0i | Di = 1] Effect on treated (people who serve) = E[Y1i − Y0i | D0i = 1]P[ D0i = 1| Di = 1] + =E[Y1i − Y0i | D1i > D0i ]P[ D1i > D0i , zi =1| Di =1] Effect on always takers Effect on compliers Average Treatment Effect • Unconditional average treatment effect is weighted average of effect on compliers, always-takers, and never-takers IV in Randomized Trials • Imagine randomized trial where no one in control group has access to intervention but participation voluntary among those assigned to treatment • Can’t simply compare those who got the treatment to those who didn’t because self-selection (among those offered treatment) into who gets treated. Usually positive selection (those who take the medicine probably healthier people). • However, IV solves the compliance problem and estimates the effect of treatment (taking the drug) on the treated (those who actually take the drug) Impact of Training Program Earnings as dependent variable (men only) Comparisons by Training Status (OLS) Comparisons by Instrumental Variables Assignment Status (ITT) 3970 1117 1825 Treatment: JTPA training program Only 60% of those assigned to training actually received the training, 2% of those assigned to control group, received training ITT= Intention to treat, measures causal effect of being offered treatment. Because some of the people offered treatment didn’t receive treatment, does not measure causal effect of the treatment IV: ITT divided by difference in compliance rates (first stage) measure effect of treatment on the people who actually get treated. In general, this is LATE but b/c there are practically no always-takers, LATE=treatment effect on treated IV in Fuzzy RD • In many applications of RD, we have imperfect compliance across the discontinuity. We can use whether you were above or below the threshold as an instrument for the program take-up (in this case completing secondary school) Compliers • Different (valid) IVs for same causal relationship can estimate different things • Effect of schooling on earnings o Quarter of birth IVs and compulsory schooling IVs affect same people (potential high school dropouts) and so should have similar estimates o Proximity of college would impact different group of people. o If same results for both, might conclude homogeneous effects of schooling …suggestive of external validity • Effect of family size on children’s education o IV for family size using sex ratio o IV using twins o These should generate different compliant populations,. Since get similar results (no effect of family size), might conclude that there really is no effect for anybody (at least in Israel) Characterizing Compliers • Of course we can’t see in the data who are the compliers vs. always-takers vs. never-takers (we don’t what they would’ve done if different z) • But we can examine the characteristics of compliers o Example: Relative likelihood that complier is college grad = first stage of college grads / first stage of all others o In studies of effect of family size on kids education, • Twins compliers are more likely to be older (younger women probably would have had an additional child even without having had twins) • Twins compliers more educated while sex ratio compliers less educated