Beyond LATE: A Simple Method for Recovering Sample Average Treatment Effects Peter M. Aronow∗ Allison J. Sovey March 24, 2011 ∗ Peter M. Aronow is Doctoral Student, Department of Political Science, Yale University, 77 Prospect Street, New Haven CT, 06520. Allison J. Sovey is Doctoral Candidate, Department of Political Science, Yale University, 115 Prospect Street, Rosenkranz Hall Room 437, Yale University, New Haven CT, 06529. The authors acknowledge support from the Yale University Faculty of Arts and Sciences High Performance Computing facility and staff. Helpful comments from Lara Chausow, Ivan Fernandez-Val, Don Green, Greg Huber, Holger Kern, Malte Lierl, Mary McGrath, Joel Middleton, Cyrus Samii and the participants of the Yale American Politics and Public Policy Workshop are greatly appreciated. Special thanks to Adria Lawrence and Bethany Albertson for generous data sharing and to Dean Eckles for particularly helpful remarks on an earlier version of this paper. The usual caveat applies. Abstract Political scientists frequently use instrumental variables estimators to estimate the Local Average Treatment Effect (LATE), or the average treatment effect among those who comply with treatment assignment. However, the LATE is often not the causal estimand of interest; researchers may instead be interested in the Sample Average Treatment Effect (SATE), or the average treatment effect for the entire sample. We first introduce the compliance score, a pre-treatment covariate that reflects a unit’s probability of compliance with treatment assignment, to researchers in political science. We posit a maximum likelihood estimator for predicting compliance scores even in the presence of two-sided non-compliance. We then develop a new technique, inverse compliance score weighting, that, in conjunction with a standard IV estimator, will allow researchers to easily estimate the SATE. 
Finally, we estimate both the LATE and SATE for a randomized experiment designed to measure the effects of media exposure and reach striking substantive conclusions. Keywords: Compliance score, instrumental variables, LATE, average treatment effect, causal inference Introduction In the quest to achieve unbiased causal inference, social scientists have increasingly turned to instrumental variables (IV) estimation, whether in experimental or observational settings. The growing popularity of IV estimation, however, has provoked criticism from several influential scholars. In particular, Deaton (2009) and Heckman and Vytlacil (2009) object to the increased reliance on IV methods. Although both papers criticize IV methods for a variety of reasons, one of the main critiques concerns the estimation of the Local Average Treatment Effect (LATE). The LATE is the average treatment effect among compliers, units that receive treatment if and only if induced to do so. Both papers argue that the LATE is not usually the parameter of interest, as “we are unlikely to learn much about the process at work” by estimating the LATE (Deaton 2009, 10) and the LATE “is often very difficult to interpret as an answer to an interesting economic question” (Heckman and Vytlacil 2009, 19). Problems with the LATE are arguably so severe that Deaton (2009, 4) concludes that experiments have “no special ability to produce more credible knowledge than other methods.” Similarly, Heckman and Vytlacil (2009, 20) argues that reliance on LATE means that “problems of identification and interpretation are swept under the rug.” Deaton suggests a partial alternative to using the LATE: simply compare the average outcome of the group assigned to treatment with that of the group assigned to control. This approach, estimating the intention-to-treat (ITT) effect, may be helpful if the experimenter is not concerned with the magnitude of the effect. 
Yet ultimately Deaton concludes that this an imperfect solution and that the LATE estimate is not much help without strong theoretical underpinnings. Even Imbens (2009, 4), who joins the fray to defend the special place of randomized experiments in the social sciences, must admit that “in many cases the local average treatment effects...are not the average effects that researchers set out to estimate.” Thus, although the authors disagree on the importance and validity of experiments, they 1 seem to largely agree on the limitations of the LATE. We acknowledge the limitations of the LATE described above and address the arguments against the use of randomized experiments by offering a solution to the problem of reliance on the LATE. Following the basic logic of inverse probability weighting, we develop a simple procedure that allows for the consistent estimation of the Sample Average Treatment Effect (SATE), the average treatment effect that would be observed across the entire sample if all units were to comply with treatment assignment. We introduce the compliance score to researchers in political science, and develop a novel maximum likelihood estimator (MLE) to predict compliance scores for units, even in the presence of two-sided non-compliance. We then develop a new technique, inverse compliance score weighting (ICSW), that, in conjunction with a standard IV estimator, will allow researchers to easily estimate the SATE. Finally, we estimate both the LATE and SATE for a randomized experiment designed to measure the effects of media exposure and reach striking substantive conclusions. Potential Outcomes Framework We develop our method using the Neyman-Rubin potential outcomes framework as elucidated in Rubin (1978), focusing on the case of a binary treatment. For unit i, let Y0i be the outcome if i does not receive treatment and let Y1i be the outcome if i receives treatment. 
Ideally, we would take the difference between Y1i and Y0i to obtain the effect of the treatment. This is impossible, however, as we only observe either Y0i or Y1i for a given unit; we cannot directly observe the counterfactual. This problem may be solved if units are randomly assigned to treatment. Let Di be an indicator variable for whether a subject receives treatment so that Di = 1 when a subject is treated and Di = 0 when a subject is not treated. The treatment effect for a given unit i is the difference between this unit’s outcomes 2 in both possible states of the world, Y1i − Y0i . The average treatment effect (ATE) is the expected value of the treatment effect, or E(Y1i ) − E(Y0i ). What we observe, however, is E(Yi |Di = 1) − E(Yi |Di = 0) = E(Y1i | Di = 1) − E(Y0i | Di = 1) (average treatment effect on the treated) + E(Y0i | Di = 1) − E(Y0i | Di = 0) (selection bias) (Angrist and Pischke 2009). The ATE may thus be estimated using the difference in means between the treatment and control groups and will be unbiased when the selection bias is zero. However, when Di is systematically related to unobserved causes of Yi , these estimates will be biased. For example, in a randomized experiment with non-compliance, Di is no longer randomly assigned and this assumption is violated. Further assumptions are necessary to recover causal effects in this context. By using the treatment assignment, Z, as an instrumental variable for treatment received, we may recover the LATE. Following Angrist, Imbens and Rubin (1996), the sample may be divided into four groups: always-takers, never-takers, compliers, and defiers. Define D0i as the treatment condition of unit i when assigned to control and D1i as the treatment condition of unit i when assigned to treatment. Always-takers are units that receive treatment regardless of whether they are assigned to treatment or control, so that D0i = 1 and D1i = 1. 
Conversely, nevertakers never receive treatment regardless of their treatment assignment, so that D0i = 0 and D1i = 0. Compliers receive treatment if assigned to treatment and do not receive treatment if assigned to control, so that D0i = 0 and D1i = 1 (or D1i > D0i ). Defiers receive treatment if assigned to control and receive no treatment if assigned to treatment, so that D0i = 1 and D1i = 0 (or D0i > D1i ). In the example of a randomized clinical trial to assess the causal effect of taking a medication, always-takers will receive the medication regardless of which group they are assigned to, never-takers will not take the medication, compliers will take the medication if and only if assigned to the treatment group and defiers will take the medication if and only if assigned to the control group. Although these identifiers may be interpreted deterministically, they also have a probabilistic interpretation that we favor. For 3 any given unit i, there may exist pre-treatment (unobserved) covariates indicating unit i’s propensity to be in each of these four groups. If the study is interpreted as a sample of a larger population of studies, then, e.g., unit i could be an always-taker in one realization and a never-taker in a different realization. This distinction will become important as we discuss the logic of ICSW in later sections. Using an instrumental variables estimator, we may estimate the average treatment effect among compliers, or the local average treatment effect (LATE). To do so, we must make five additional assumptions, as explicated by Angrist, Imbens and Rubin (1996). First, we must assume that the exclusion restriction is valid, or that Z only affects Y through D. Second, we assume the instrumental variable (Z) to some degree predicts the endogenous independent variable (D). 
Third, we invoke the stable unit treatment value assumption (SUTVA), which states that the potential outcomes of D and Y are invariant with respect to the allocations of Z and D respectively. Fourth, we assume that the population contains no defiers, i.e., $\Pr(D_{0i} > D_{1i}) = 0$. Fifth, we must assume that the treatment assignment $Z_i$ is randomly assigned.

Angrist, Imbens and Rubin (1996) demonstrate that these assumptions imply that the ITT effect of Z on Y (the average effect of treatment assignment on the outcome) divided by the ITT effect of Z on D (the average effect of treatment assignment on treatment received) is equal to the average causal treatment effect for compliers. We present an abbreviated version of their proof. By exclusion and independence, $E(Y_i \mid Z_i = 1) = E(Y_{0i} + (Y_{1i} - Y_{0i}) D_{1i})$. Similarly, $E(Y_i \mid Z_i = 0) = E(Y_{0i} + (Y_{1i} - Y_{0i}) D_{0i})$. Thus, we have the ITT,

\[ E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0) = E((Y_{1i} - Y_{0i})(D_{1i} - D_{0i})) = E(Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}) \Pr(D_{1i} > D_{0i}) \tag{1} \]

and, using analogous reasoning, we find that

\[ E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0) = E(D_{1i} - D_{0i}) = \Pr(D_{1i} > D_{0i}). \tag{2} \]

Therefore, we have

\[ E(Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}) = \frac{E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)}{E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0)}, \tag{3} \]

which is the canonical Wald IV estimator (Angrist and Pischke 2009). As demonstrated, this provides consistent estimates of the local average treatment effect (LATE), or the average effect of the treatment among the compliers. Because IV estimators provide asymptotically unbiased estimates of the LATE, they have become the de facto standard for causal inference in experiments with noncompliance in the social sciences. As we show below, however, the SATE is also easily computed under reasonable assumptions. We now turn to the compliance score, the key component for the estimation of the SATE.
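As an illustration of the Wald ratio in equation 3, the following is a minimal sketch in Python. The function name wald_iv and the eight-unit toy data set are hypothetical and are not taken from the paper; the sketch simply divides the difference in mean outcomes across assignment arms by the difference in mean treatment receipt.

```python
from statistics import mean


def wald_iv(y, d, z):
    """Wald IV estimate: ITT effect of Z on Y divided by ITT effect of Z on D.

    y: observed outcomes, d: treatment received (0/1), z: treatment assigned (0/1).
    """
    y_assigned = [yi for yi, zi in zip(y, z) if zi == 1]
    y_control = [yi for yi, zi in zip(y, z) if zi == 0]
    d_assigned = [di for di, zi in zip(d, z) if zi == 1]
    d_control = [di for di, zi in zip(d, z) if zi == 0]

    itt_y = mean(y_assigned) - mean(y_control)  # numerator of equation 3
    itt_d = mean(d_assigned) - mean(d_control)  # denominator of equation 3
    if itt_d == 0:
        raise ValueError("Assignment does not shift treatment receipt.")
    return itt_y / itt_d


# Hypothetical toy data (illustration only, not from the paper).
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [5, 7, 6, 2, 4, 2, 3, 1]
late_hat = wald_iv(y, d, z)
print(late_hat)
```

In this toy example the ITT on the outcome is 2.5 and the ITT on treatment receipt is 0.5, so the ratio is 5.0. The sketch assumes a binary instrument and no covariates; with covariates a 2SLS routine would be used instead.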
The Compliance Score As pioneered by Follmann (2000) in biostatistics, the compliance score is a pretreatment covariate that identifies a unit’s probability of being a complier. By inspection, and following the above nomenclature, the probability of compliance for unit i is E(Di |Zi = 1)−E(Di |Zi = 0). With the assumptions outlined in the previous section, we know that the compliance score (and all other pretreatment covariates) is asymptotically orthogonal to Z. In conjunction with a known covariate profile, the compliance score for a given unit is simple to compute if there exist no always-takers, as E(Di |Zi = 1) − E(Di |Zi = 0) = E(Di |Zi = 1). Under the assumption of no always-takers, Follmann (2000) estimates the LATE of smoking cessation for patients in a clinical trial. Following the general logic of propensity score estimation, 5 Follmann uses a simple logistic regression of D on known covariates for units for which Zi = 1. The fitted values of D from this regression may then be extrapolated to the entire sample in order to compute a compliance score for each unit. One particularly interesting application of the compliance score may be found in Joffe and Brensinger (2003), which suggests using the compliance score to weight observations in an instrumental variables regression in order to improve efficiency. Because the ITT estimates from clinical trials are often too conservative due to noncompliance, Joffe applies this method to focus the analysis on strata with better compliance. Other researchers have also made similar recommendations, e.g., Joffe, Ten Have and Brensinger (2003) and Roy, Hogan and Marcus (2008). However, to our knowledge, none of the existing methods for computing a compliance score have thus far been robust to the inclusion of always-takers. We posit a maximum likelihood estimation technique (similar to that of Yau and Little 2001), to recover the compliance score even in the presence of two-sided non-compliance. 
Two-sided noncompliance means that there are both always-takers and never-takers in the sample; note, however, that we must still assume that there are no defiers in the sample. As defined above, D is an indicator variable for treatment received and Z is an indicator variable for treatment assigned. Further, we define $X_i$ as an exhaustive (i.e., sufficient for the model to be fully specified) vector of predictive covariates for unit i.

For convenience, we make three easily relaxed parametric assumptions. The first assumption is that the probability of being an always-taker or a complier is a function of the covariates with a known distribution:

\[ P_{A,C,i} = \Pr(D_{1i} > D_{0i} \cup D_{0i} = 1) = F(\theta_{A,C} X_i), \tag{4} \]

where $P_{A,C,i}$ is the probability that unit i is either a complier or an always-taker, $\theta_{A,C}$ is a vector of coefficients to be estimated and $F(\cdot)$ is the cumulative distribution function (CDF) for an arbitrary distribution. For the purposes of this paper, we will use a probit model, so $F(\cdot) = \Phi(\cdot)$, where $\Phi$ is the Normal CDF. However, other binary choice models, including logit and generalized additive models (GAM) (Hastie and Tibshirani 1990), could be used. Second, we similarly specify

\[ P_{A|A,C,i} = \Pr(D_{0i} = 1 \mid D_{1i} > D_{0i} \cup D_{0i} = 1) = F(\theta_{A|A,C} X_i), \tag{5} \]

where $P_{A|A,C,i}$ is the probability that unit i is an always-taker conditional on it being either an always-taker or a complier, and $\theta_{A|A,C}$ are coefficients to be estimated. Therefore, we may define the compliance score $P_{C,i} = \Pr(D_{1i} > D_{0i})$. Since we know, by definition, that compliers receive treatment if and only if assigned and that always-takers always receive treatment,

\[ \Pr(D_i = 1) = \Pr(D_{1i} > D_{0i}) Z_i + \Pr(D_{0i} = 1) = P_{A,C,i}(1 - P_{A|A,C,i}) Z_i + P_{A,C,i} P_{A|A,C,i}. \tag{6} \]

This expression represents a fully specified model for $\Pr(D_i = 1)$. Note that this value is strictly bounded within (0, 1), since $Z_i \in \{0, 1\}$ and $P_{A,C,i} \in (0, 1)$.
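To make equations 4 through 6 concrete, the following is a minimal Python sketch of how the complier, always-taker and never-taker probabilities follow from the two probit indices. The function names, the coefficient values and the covariate vector are hypothetical choices for illustration; the coefficients are not estimates from any data.

```python
import math


def probit_cdf(x):
    """Standard Normal CDF, Phi(x), computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear_index(theta, x):
    """Inner product of a coefficient vector and a covariate vector."""
    return sum(t * xi for t, xi in zip(theta, x))


def type_probabilities(theta_ac, theta_a_given_ac, x):
    """Return (complier, always-taker, never-taker) probabilities for one unit."""
    p_ac = probit_cdf(linear_index(theta_ac, x))                  # equation 4
    p_a_given_ac = probit_cdf(linear_index(theta_a_given_ac, x))  # equation 5
    complier = p_ac * (1.0 - p_a_given_ac)
    always_taker = p_ac * p_a_given_ac
    never_taker = 1.0 - p_ac
    return complier, always_taker, never_taker


def prob_treated(theta_ac, theta_a_given_ac, x, z):
    """Pr(D_i = 1) for assignment z, as in equation 6."""
    complier, always_taker, _ = type_probabilities(theta_ac, theta_a_given_ac, x)
    return complier * z + always_taker


# Hypothetical coefficients and covariates (intercept plus one covariate).
theta_ac = [0.3, 0.5]
theta_a_given_ac = [-1.0, 0.2]
x_i = [1.0, 0.4]
probs = type_probabilities(theta_ac, theta_a_given_ac, x_i)
print(probs, prob_treated(theta_ac, theta_a_given_ac, x_i, 1))
```

The three probabilities sum to one by construction, and when z = 1 the probability of treatment equals the combined always-taker and complier probability.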
This boundedness along with the binary nature of Z allows us to specify our third assumption: $D_i$ is Bernoulli distributed and observations are independent. Using equation 6, the likelihood of the model may now be specified as:

\[ L(P_{A|A,C,i}, P_{A,C,i} \mid D, Z) = \left( P_{A,C,i}(1 - P_{A|A,C,i}) Z_i + P_{A,C,i} P_{A|A,C,i} \right)^{D_i} \left( 1 - P_{A,C,i}(1 - P_{A|A,C,i}) Z_i - P_{A,C,i} P_{A|A,C,i} \right)^{1 - D_i}. \tag{7} \]

Combining equations 4, 5 and 7,

\[ L(\theta_{A,C}, \theta_{A|A,C} \mid D, Z) = \prod_{i=1}^{N} \left( F(\theta_{A,C} X_i)(1 - F(\theta_{A|A,C} X_i)) Z_i + F(\theta_{A,C} X_i) F(\theta_{A|A,C} X_i) \right)^{D_i} \left( 1 - F(\theta_{A,C} X_i)(1 - F(\theta_{A|A,C} X_i)) Z_i - F(\theta_{A,C} X_i) F(\theta_{A|A,C} X_i) \right)^{1 - D_i}. \tag{8} \]

Maximizing this likelihood function numerically may lead to false convergence when a standard optimizer (e.g., Newton-Raphson or Nelder-Mead) is used. Instead, we recommend using a global optimization technique, such as GENOUD (Mebane and Sekhon 2010), to ensure that the optimizer has converged to the global maximum.1 After estimating $\theta_{A,C}$ and $\theta_{A|A,C}$, we may compute both the compliance score for unit i, $\Pr(D_{1i} > D_{0i}) = F(\theta_{A,C} X_i)(1 - F(\theta_{A|A,C} X_i))$, and the always-taker score, $\Pr(D_{0i} = 1) = F(\theta_{A,C} X_i) F(\theta_{A|A,C} X_i)$. (The probability that unit i is a never-taker is simply $1 - \Pr(D_{0i} = 1) - \Pr(D_{1i} > D_{0i})$.) In order to prevent very small compliance scores from forming, we may constrain $\Pr(D_{1i} > D_{0i}) \geq \alpha$, where $\alpha$ is an arbitrary threshold for a minimum probability.2 For the purposes of this paper, we choose $\alpha = 0.05$.

Note that, given a compliance score profile for any given population, we may employ a permutation test to assess the efficacy of the compliance identification. Under the null hypothesis that we have not recovered meaningful compliance scores, we would expect that $SSR(P_C \mid X) = SSR(P_C \mid X_0)$, where SSR is the sum of squared residuals between the predicted probability of D and the actual value of D, and $X_0$ has had its rows randomly permuted. This would imply that, regardless of the particular configuration of covariate profiles, we would observe the same SSR for compliance scores.
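The likelihood in equation 8 and the minimum-score constraint can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, the small clamp eps is a numerical safeguard added here and is not part of the paper's model, and the sketch only evaluates the log-likelihood at given coefficients. Maximizing it would require passing the function to an optimizer, for which the text recommends a global method such as GENOUD.

```python
import math


def probit_cdf(x):
    """Standard Normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_likelihood(theta_ac, theta_a_given_ac, X, D, Z, eps=1e-12):
    """Log of equation 8 for covariate rows X, treatment D and assignment Z.

    eps clamps probabilities away from 0 and 1 so that log() stays finite
    (an assumption of this sketch, not of the paper).
    """
    total = 0.0
    for x, d, z in zip(X, D, Z):
        p_ac = probit_cdf(sum(t * xi for t, xi in zip(theta_ac, x)))
        p_a = probit_cdf(sum(t * xi for t, xi in zip(theta_a_given_ac, x)))
        p_treated = p_ac * (1.0 - p_a) * z + p_ac * p_a  # equation 6
        p_treated = min(max(p_treated, eps), 1.0 - eps)
        total += d * math.log(p_treated) + (1 - d) * math.log(1.0 - p_treated)
    return total


def trimmed_compliance_score(theta_ac, theta_a_given_ac, x, alpha=0.05):
    """Compliance score F(.)(1 - F(.)), constrained to be at least alpha."""
    p_ac = probit_cdf(sum(t * xi for t, xi in zip(theta_ac, x)))
    p_a = probit_cdf(sum(t * xi for t, xi in zip(theta_a_given_ac, x)))
    return max(alpha, p_ac * (1.0 - p_a))


# Hypothetical two-unit data set with an intercept-only covariate vector.
X = [[1.0], [1.0]]
D = [1, 0]
Z = [1, 0]
ll = log_likelihood([0.0], [0.0], X, D, Z)
print(ll, trimmed_compliance_score([-3.0], [0.0], [1.0]))
```

With both coefficient vectors set to zero, the first unit contributes log(0.5) and the second log(0.75). In the second call the raw compliance score falls well below 0.05, so the constraint returns the threshold itself.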
If we may reject this null hypothesis, we have confidence that our compliance score identification is capturing some of the true variation in compliance scores.3

1 An R package, cscore, will be made available shortly to implement all procedures described in this paper.
2 This procedure, known as trimming (or, more appropriately, Winsorizing), is performed because very low probabilities can introduce instability in the estimates resulting from inverse probability weighting (Elliott 2009).
3 Thanks to Holger Kern for this suggestion.

Inverse Compliance Score Weighting

With an identified compliance score profile for the entire sample, we may utilize Inverse Compliance Score Weighting to recover the SATE.4 ICSW follows a logic similar to that of inverse probability weighting: if a type of observation is disproportionately sampled, we may reweight the sample to reflect the distribution of types in the population. That is, the complier population may be interpreted as a non-random subsample of the sample population of interest. We may reweight the entire sample such that the distribution of covariates in the complier population is identical to the covariate distribution of the (pre-weighted) entire sample.

4 In work developed contemporaneously with our paper, Angrist and Fernandez-Val (2010) presents a similar method to reweight the LATE to target populations. Also, for an integrative framework for recovering the SATE along with other causal quantities, see Esterling, Lazer and Neblo (unpublished). Note, however, that this method is not robust to two-sided noncompliance. Additionally, see Frangakis and Rubin (1999) for a similar approach involving missing data.

In order to recover the SATE, three assumptions (in addition to the instrumental variables assumptions described above) are required. First, we assume latent ignorability (conditional on a covariate profile) of compliance with respect to heterogeneous treatment effects (Esterling, Lazer and Neblo unpublished; Frangakis and Rubin 1999). Define categorical variable $h \in H = (1, 2, 3, \ldots, M)$ as a “type” of unit, that is, a unit with a known, observed covariate profile. By definition, $\sum_h \Pr(H = h) = 1$. Similar to the logic behind inverse probability weighting for sample designs (see Hirano and Imbens 2001, for a discussion and example of inverse propensity score weighting), the assumption of latent ignorability here implies that $E(Y_1 - Y_0 \mid H = h) = E(Y_1 - Y_0 \mid D_1 > D_0 \cap H = h)$. In other words, the expected value of the treatment effect conditional on the type of unit is equal to the expected value of the treatment effect conditional on the type of unit and compliance. Note that this is a fairly weak ignorability assumption. We do not specify that both potential outcomes are independent of compliance conditional on the type of unit; rather, we only require that the difference between potential outcomes is ignorable.

Second, we assume that the compliance score has been properly identified for each type. Third, we assume that the compliance score for all units is strictly bounded within (0, 1]: each unit has a non-zero probability of compliance across the population of hypothetical (identical) experiments. This assumption has both philosophical and practical implications. If a unit can never be a complier, there exists no counterfactual with which to generate an ATE. Practically, if a unit has a zero probability of compliance, the weighting procedure will produce infinite weights, thus leading to an undefined solution.

In order to formally describe the process, we return to the derivation of the LATE; note that, as above, our proof relies on the asymptotic qualities of each of the estimated quantities. Our proof strategy is to derive expressions for the inverse compliance score weighted numerator (ITT) and denominator (probability of compliance) of equation 3, thus demonstrating the asymptotic value of the IV estimator after ICSW.
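The weighting logic of this section can be sketched numerically. The following minimal Python illustration makes two assumptions that should be read as such. First, the population-level check borrows the three equally sized types, with compliance scores 0.55, 0.70 and 0.40 and treatment effects 3, -6 and 6, from the simulation study reported later in the paper, and takes latent ignorability as given. Second, the sample-level function icsw_wald is a hypothetical name for a weighted Wald ratio with weights equal to the inverse of each unit's compliance score; it stands in for the weighted IV regression and is not the authors' implementation.

```python
def icsw_population_value(shares, compliance, effects):
    """Ratio of the reweighted ITT to the reweighted compliance probability.

    shares[h] = Pr(H = h), compliance[h] = Pr(D1 > D0 | H = h),
    effects[h] = E(Y1 - Y0 | H = h), assumed equal to the complier effect.
    """
    avg_weight = sum(p / c for p, c in zip(shares, compliance))
    itt_w = sum(p * c * tau / c
                for p, c, tau in zip(shares, compliance, effects)) / avg_weight
    compliance_w = sum(p * c / c for p, c in zip(shares, compliance)) / avg_weight
    return itt_w / compliance_w


def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def icsw_wald(y, d, z, scores, alpha=0.05):
    """Wald ratio after weighting each unit by 1 / max(score, alpha)."""
    weights = [1.0 / max(s, alpha) for s in scores]

    def arm_mean(values, arm):
        idx = [i for i, zi in enumerate(z) if zi == arm]
        return weighted_mean([values[i] for i in idx], [weights[i] for i in idx])

    itt_y = arm_mean(y, 1) - arm_mean(y, 0)
    itt_d = arm_mean(d, 1) - arm_mean(d, 0)
    return itt_y / itt_d


sate = icsw_population_value([1 / 3, 1 / 3, 1 / 3], [0.55, 0.70, 0.40], [3, -6, 6])
print(sate)

# Hypothetical toy sample with equal scores: weighting changes nothing.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [5, 7, 6, 2, 4, 2, 3, 1]
print(icsw_wald(y, d, z, [0.5] * 8))
```

The population-level value equals the share-weighted average of the type-specific effects, which is the quantity the derivation targets. When every unit has the same compliance score, the weighted ratio coincides with the unweighted Wald estimate, as expected.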
As in equation 1, $\mathit{ITT} = \Pr(D_1 > D_0) E(Y_1 - Y_0 \mid D_1 > D_0)$, which we may express as a weighted sum of the conditional ITTs:

\[ \mathit{ITT} = \sum_h \Pr(D_1 > D_0 \mid H = h) \Pr(H = h) E(Y_1 - Y_0 \mid D_1 > D_0 \cap H = h). \tag{9} \]

With the equations above, we are now able to apply ICSW: for each $h \in H$, we multiply by the weight $\frac{1}{\Pr(D_1 > D_0 \mid H = h)}$, or the inverse of the compliance score. We then divide by the average weight across all observations (to normalize the weight). We define the average weight $w_c = \sum_h \frac{\Pr(H = h)}{\Pr(D_1 > D_0 \mid H = h)}$. For equation 9, we can write the weighted ITT as follows:

\[ \mathit{ITT}_w = \frac{1}{w_c} \sum_h \frac{\Pr(D_1 > D_0 \mid H = h) \Pr(H = h) E(Y_1 - Y_0 \mid D_1 > D_0 \cap H = h)}{\Pr(D_1 > D_0 \mid H = h)}. \tag{10} \]

Since the proportion of compliers terms cancel out, equation 10 reduces to:

\[ \mathit{ITT}_w = \frac{1}{w_c} \sum_h \Pr(H = h) E(Y_1 - Y_0 \mid D_1 > D_0 \cap H = h). \tag{11} \]

Returning to the latent ignorability assumption laid out above, we can rewrite equation 11 as $\mathit{ITT}_w = \frac{1}{w_c} \sum_h \Pr(H = h) E(Y_1 - Y_0 \mid H = h)$. By the law of total probability, the reweighted numerator of the IV estimator is

\[ \mathit{ITT}_w = \frac{1}{w_c} \sum_h \Pr(H = h) E(Y_1 - Y_0 \mid H = h) = \frac{1}{w_c} E(Y_1 - Y_0). \tag{12} \]

We can now reweight the denominator of the IV estimator by expanding the definition of the complier population, weighting, and simplifying:

\[ \Pr(D_1 > D_0) = \sum_h \Pr(D_1 > D_0 \mid H = h) \Pr(H = h) \tag{13} \]

\[ \Pr(D_1 > D_0)_w = \frac{1}{w_c} \sum_h \frac{\Pr(D_1 > D_0 \mid H = h) \Pr(H = h)}{\Pr(D_1 > D_0 \mid H = h)} = \frac{1}{w_c} \sum_h \Pr(H = h) = \frac{1}{w_c}. \tag{14} \]

Dividing equation 12 by equation 14, we have the asymptotic value of the ICSW reweighted IV estimator,

\[ \frac{\mathit{ITT}_w}{\Pr(D_1 > D_0)_w} = \frac{\frac{1}{w_c} E(Y_1 - Y_0)}{\frac{1}{w_c}} = E(Y_1 - Y_0), \tag{15} \]

which is the SATE.5 In order to produce standard errors, we recommend bootstrapping the entire process, from computing the compliance score to IV estimation (Abadie 2002).

5 Even if latent ignorability is not satisfied, in the case of homogeneous treatment effects, we will still recover the SATE. We define constant treatment effect $\tau = Y_{1i} - Y_{0i}, \forall i$. Therefore,

\[ \frac{\mathit{ITT}_w}{\Pr(D_1 > D_0)_w} = \sum_h \Pr(H = h) E(Y_1 - Y_0 \mid H = h) = \sum_h \Pr(H = h) \tau = \tau. \tag{16} \]

By applying ICSW before applying the IV estimator, we have demonstrated that we may recover the SATE under two different identifying assumptions.

Simulation Studies

We present a simple simulation study in order to demonstrate the efficacy of the method. We define three types of units, each with different always-taker, never-taker and compliance scores. Type 1 has a compliance score of 0.55, an always-taker score of 0.40, and a never-taker score of 0.05. Type 2 has a compliance score of 0.70, an always-taker score of 0.15, and a never-taker score of 0.15. Type 3 has a compliance score of 0.40, an always-taker score of 0.05, and a never-taker score of 0.55. Each of the three types represents 1/3 of the simulation sample. We also define three covariates, each imperfectly measuring the type of units. For covariates $h \in \{1, 2, 3\}$, each covariate is generated as $3 I(\text{type} = h) + N(0, 1)$, where $I(\cdot)$ is the indicator function. Note that these covariates measure the types with some degree of error. The outcome variable, Y, is defined as

\[ Y = (3 I(\text{type} = 1) - 6 I(\text{type} = 2) + 6 I(\text{type} = 3)) D - 15 I(\text{type} = 1) - 15 I(\text{type} = 2) + 0 I(\text{type} = 3) + N(0, 5). \]

Therefore, both treatment effects and intercepts vary by type. 50% of units are randomly assigned to treatment (Z = 1) and the other 50% of units are randomly assigned to control (Z = 0). Thus, while the sample average treatment effect is 1.000, the local average treatment effect is -0.05.

We simulate with $N \in \{300, 600, 900, 1200, 1500\}$, each with 10,000 samples. We then perform four estimation procedures on all samples: standard OLS, ITT (OLS using treatment assignment), 2SLS and ICSW. For the ICSW, we set the minimum compliance score $\alpha = 0.05$. Assuming that the causal estimand of interest is the SATE, we then compute the bias, standard deviation and root-mean-squared-error (RMSE) for estimates from each of these procedures. The results of this simulation study are presented in Table 1. While the variance is always higher for the ICSW estimator than for OLS, ITT or 2SLS, the RMSE associated with the estimator is superior once N > 300. Furthermore, in all cases, bias is consistently smaller for the ICSW estimator than it is for OLS, ITT or 2SLS. If the SATE is the causal estimand of interest, ICSW appears to provide a superior alternative for its estimation.

[TABLE 1 ABOUT HERE]

Application

We now discuss the application of our method using data from Albertson and Lawrence (2009). The authors performed an experiment (N = 507) in which survey respondents in Orange County, California were randomly assigned to receive a treatment encouraging them to view a Fox News debate on affirmative action that was to take place the eve of the 1996 presidential election. Shortly after the election, these respondents were re-interviewed. The post-election questionnaire asked respondents whether they viewed the Fox News debate, whether they supported a California proposition (209) to eliminate affirmative action (coded 1 if the respondent supported the proposition and 0 otherwise) and whether they felt informed (coded on a scale from 1-4 from least to most informed). The authors use a standard instrumental variable design to address the fact that some who were not assigned to treatment reported viewing the debate and some who were assigned to treatment did not report viewing the debate. This noncompliance was nontrivial: only approximately 40% of the respondents complied with treatment assignment. Albertson and Lawrence’s IV regression results show a statistically insignificant but negative relationship between program viewing and support for the proposition and a nearly statistically significant result for the positive relationship between program viewing and feeling more informed about the issue among compliers.
Albertson and Lawrence’s original findings are presented in columns (1) and (2) of Table 2.6 However, this may not be the 6 Note that our replication of their results differs slightly from their original results due to the fact that 13 substantive question of interest. Rather, we may wish to know what sort of effects Fox News debate watching would have on attitudes and knowledge for the entire sample. [TABLE 2 ABOUT HERE] We first compute compliance scores for the sample. We use the eight covariates used by Albertson and Lawrence: television news-watching habits (coded on a seven point scale from never watches to watches everyday), newspaper reading habits (coded on seven point scale from never reads to reads everyday), interest in politics and national affairs (coded on a four point scale from low interest to high interest), party ID (coded on a seven point scale from Republican to Democrat), income (coded on a scale from 1 to 11 from poorest to richest), sex (coded 1 if the respondent is female and 0 otherwise), education (coded on a 13 point scale from least to most educated) and race (coded 1 if the respondent is white and 0 otherwise). Using our MLE, we obtain a mean compliance score of 0.4167 and a mean always-taker score of .0424. Note that this closely comports with our estimated mean proportion of compliers and always-takers, 0.4074 and 0.0435 respectively. Table 3 displays the covariance matrix of compliance scores and covariates. The compliance score is positively correlated with higher income, higher education, greater interest in politics, more frequent news watching and paper reading, being white, being male and identifying as a Democrat. Compliers are more likely to exhibit these qualities, which conforms with our expectations, as we intuitively expect that those who are more interested in politics, read the paper more frequently and watch more news programs would be more likely to comply with watching the program. 
[TABLE 3 ABOUT HERE] We now apply the proposed permutation test for the identification of compliance scores. Figure 1 presents the distribution of SSRs associated with the null hypothesis that SSR(PC |X) we used mean imputation for missing values in the covariate profile. 14 = SSR(PC |X0 ). Note that the observed SSR is on the outskirts of the SSR distribution, such that the probability of seeing an SSR this extreme is p < 0.001. [FIGURE 1 ABOUT HERE] Since we may feel confident that we have identified (at least) some portion of the true compliance score distribution, we may now apply ICSW. The results of an IV regression (2SLS) after ICSW, presented in columns (3) and (4) of Table 2, are striking. Although Albertson and Lawrence (2009) finds that compliers would have been .27 points more informed after viewing the program, we find that viewers in the entire subject population are .60 points more informed after viewing. If the SATE were the real parameter of interest in this study, using the LATE to approximate it would lead to gross underestimates of the would-be treatment effect. These results comport with our intuitions about the effects of watching the political debate. Recall from Table 3 that non-compliers tend to be less educated and pay less attention to politics. For example, in the control condition, units with a compliance score below the median of 0.435 have a mean knowledge score of 2.87 points and units with a compliance score above the median have a mean knowledge score of 3.48 points. Since the non-compliers (i.e., generally less educated and less informed individuals) were less knowledgeable to begin with, they would naturally learn more from watching the broadcast. Turning to our analysis of the effect of program viewing on support for the measure, we see that Albertson and Lawrence (2009)’s 2SLS estimate and our ICSW estimate are quite similar. 
While Albertson and Lawrence (2009) finds that compliers were 0.7 percentage points less likely to support the ballot measure, we find that the general sample population would have been 0.6 percentage points less likely to support the measure if assigned to treatment. This finding suggests that the effect of viewing the debate on support for compliers is very similar to the effect for the overall sample population. This result lends itself to two 15 interpretations. Since we cannot reject the null hypothesis of no treatment effect for either the LATE or the SATE, we may suspect that there was no effect on attitudes resulting from the treatment. Alternatively, if we believe there is a treatment effect, this finding suggests that this effect is relatively homogeneous with respect to the population of interest. Together, the analysis of these two measures using ICSW highlights the fact that, ex ante, it is unclear how close the LATE will be to the SATE. In cases where the SATE is the parameter of interest, this uncertainty may be highly problematic. Conclusion Although the SATE is often the real parameter of interest, scholars typically focus on the LATE because it is frequently the only available causal estimand. We have demonstrated that the LATE may not be representative of the general population, and reliance on the LATE may in fact lead to substantive conclusions that are radically different from those suggested when estimating the SATE. In this paper, we have provided a method to recover the SATE using only assumptions standard to instrumental variables estimators and inverse probability weighting for sample correction. Although we have demonstrated our method using an experimental case study, the method can be applied to virtually any research design that uses instrumental variables estimation. 
The method can be extended to applications using continuous endogenous variables and continuous instruments; as Angrist and Imbens (1995) demonstrates, 2SLS is simply a weighted average of grouped data IV estimators. ICSW thus allows researchers to estimate the SATE, a causal estimand previously considered out of reach, in a vast array of applications throughout the social sciences.

References

Abadie, Alberto. 2002. “Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models.” Journal of the American Statistical Association 97(457):284–292.
Albertson, Bethany and Adria Lawrence. 2009. “After the Credits Roll: The Long-Term Effects of Educational Television on Public Knowledge and Attitudes.” American Politics Research 37(2):275–300.
Angrist, Joshua D. and Guido W. Imbens. 1995. “Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity.” Journal of the American Statistical Association 90(430):431–442.
Angrist, Joshua D., Guido W. Imbens and Donald B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91(434):444–455.
Angrist, Joshua D. and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.
Angrist, Joshua and Ivan Fernandez-Val. 2010. “ExtrapoLATE-ing: External Validity and Overidentification in the LATE Framework.” NBER Working Paper.
Deaton, Angus. 2009. “Instruments of Development: Randomization in the Tropics, and the Search for the Elusive Keys to Economic Development.” Keynes Lecture pp. 1–54.
Elliott, Michael R. 2009. “Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models.” Journal of Official Statistics 25(1):1–20.
Esterling, Kevin M., David M.J. Lazer and Michael A. Neblo. unpublished. “Estimating Treatment Effects in the Presence of Noncompliance and Nonresponse: The Generalized Endogenous Treatment Model.”
Follmann, Dean A. 2000. “On the Effect of Treatment Among Would-Be Treatment Compliers: An Analysis of the Multiple Risk Factor Intervention Trial.” Journal of the American Statistical Association 95(452):1101–1109.
Frangakis, Constantine E. and Donald B. Rubin. 1999. “Addressing Complications of Intention-to-Treat Analysis in the Combined Presence of All-or-None Treatment-Noncompliance and Subsequent Missing Outcomes.” Biometrika 86(2):365–379.
Hastie, Trevor and Rob Tibshirani. 1990. Generalized Additive Models. Chapman and Hall.
Heckman, James J. and Edward J. Vytlacil. 2009. “Comparing IV with Structural Models: What Simple IV Can and Cannot Identify.” Working Paper.
Hirano, Keisuke and Guido W. Imbens. 2001. “Estimation of Causal Effects using Propensity Score Weighting: An Application to Data on Right Heart Catheterization.” Health Services and Outcomes Research Methodology 2(3-4):259–278.
Imbens, Guido W. 2009. “Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009).” Working Paper.
Joffe, Marshall M. and Colleen Brensinger. 2003. “Weighting in Instrumental Variables and G-Estimation.” Statistics in Medicine 22(1):1285–1303.
Joffe, Marshall M., Thomas R. Ten Have and Colleen Brensinger. 2003. “The Compliance Score as a Regressor in Randomized Trials.” Biostatistics 4(3):327–340.
Mebane, Walter R., Jr. and Jasjeet S. Sekhon. 2010. “R-GENetic Optimization Using Derivatives (RGENOUD).” R package 5.7-1.
Roy, Jason, Joseph W. Hogan and Bess H. Marcus. 2008. “Principal Stratification with Predictors of Compliance for Randomized Trials with 2 Active Treatments.” Biostatistics 9(2):277–289.
Rubin, Donald B. 1978. “Bayesian Inference for Causal Effects: the Role of Randomization.” The Annals of Statistics 6(1):34–58.
Yau, Linda H.Y. and Roderick J. Little. 2001. “Inference for the Complier-Average Causal Effect from Longitudinal Data Subject to Noncompliance and Missing Data, with Application to a Job Training Assessment for the Unemployed.” Journal of the American Statistical Association 96.

Tables

             Bias                         SD                        RMSE
N        OLS    ITT   2SLS   ICSW     OLS   ITT  2SLS  ICSW     OLS   ITT  2SLS  ICSW
300    -1.83  -1.04  -1.07  -0.05    0.86  0.79  1.45  1.69    2.02  1.31  1.80  1.69
600    -1.83  -1.06  -1.11  -0.13    0.60  0.55  1.01  1.11    1.93  1.20  1.50  1.12
900    -1.82  -1.04  -1.08  -0.12    0.49  0.45  0.82  0.90    1.88  1.14  1.35  0.90
1200   -1.82  -1.04  -1.08  -0.12    0.42  0.39  0.71  0.77    1.87  1.12  1.29  0.78
1500   -1.84  -1.05  -1.10  -0.14    0.38  0.35  0.64  0.69    1.88  1.11  1.27  0.70

Table 1: Results of Simulation Study. All reported statistics assume that the SATE is the causal estimand of interest.

                   Knowledge       Opinion         Knowledge       Opinion
                   2SLS            2SLS            ICSW            ICSW
                   (1)             (2)             (3)             (4)
Watching Debate     0.27 (0.16)    -0.07 (0.09)     0.60 (0.43)    -0.06 (0.21)
Intercept           1.80 (0.23)     1.03 (0.15)     2.14 (0.40)     0.94 (0.20)
Party ID           -0.02 (0.02)    -0.09 (0.01)    -0.03 (0.03)    -0.08 (0.01)
Political Int.      0.25 (0.05)    -0.04 (0.03)     0.24 (0.07)     0.00 (0.04)
Watch News          0.00 (0.02)     0.01 (0.01)    -0.03 (0.04)     0.00 (0.02)
Education           0.00 (0.01)    -0.01 (0.01)     0.01 (0.02)    -0.01 (0.01)
Read News           0.11 (0.02)    -0.01 (0.01)     0.10 (0.03)    -0.01 (0.01)
Female             -0.05 (0.07)    -0.03 (0.04)    -0.07 (0.11)     0.00 (0.06)
Income             -0.01 (0.01)     0.01 (0.01)    -0.03 (0.02)     0.01 (0.01)
White               0.07 (0.09)     0.18 (0.05)     0.00 (0.15)     0.15 (0.08)

Table 2: 2SLS and ICSW Estimates of Knowledge and Opinion Effects. Standard errors in parentheses. Refer to text for definitions of variable labels.

                 PC   Party  Political  Watch  Educ-   Read  Female  Income  White
                         ID   Interest   News  ation   News
PC             1.00
Party ID       0.32    1.00
Pol. Int.      0.71   -0.11       1.00
Watch News     0.29   -0.02       0.17   1.00
Education      0.39   -0.03       0.25   0.03   1.00
Read News      0.70   -0.05       0.36   0.15   0.26   1.00
Female        -0.29    0.04      -0.06  -0.07  -0.10  -0.10    1.00
Income         0.04   -0.14       0.16  -0.03   0.30   0.23   -0.11    1.00
White          0.17   -0.22       0.20   0.02   0.15   0.17    0.00    0.08   1.00

Table 3: Compliance Score and Covariate Correlation Matrix. Refer to text for definitions of variable labels.

Figures

[Density plot. Horizontal axis: SSR (50 to 250). Vertical axis: Density (0.00 to 0.06). Annotation: SSR = 70.5, p < 0.001.]

Figure 1: SSR Distribution of Compliance Scores under Null Hypothesis of No Covariate Relationship using Permutation Inference. Black line indicates observed SSR.
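
The logic behind a permutation distribution such as the one in Figure 1 can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the procedure used to produce the figure: it swaps the maximum likelihood compliance score estimator for an ordinary least squares fit of a compliance indicator on the covariates, and it treats a smaller SSR as more extreme. The function names (`ssr_linear_fit`, `permutation_ssr_test`) and the default number of permutations are hypothetical.

```python
import numpy as np


def ssr_linear_fit(d, X):
    """Sum of squared residuals from a least squares fit of d on X plus an intercept."""
    design = np.column_stack([np.ones(len(d)), X])
    beta, *_ = np.linalg.lstsq(design, d, rcond=None)
    resid = d - design @ beta
    return float(resid @ resid)


def permutation_ssr_test(d, X, n_perm=999, seed=0):
    """Permutation test of the null that the covariates X do not predict d.

    d : compliance indicator (0/1), e.g. treatment receipt among assigned units
    X : covariate matrix with one row per unit
    Returns the observed SSR, the permutation SSRs, and a one-sided p-value.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = ssr_linear_fit(d, X)
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffling covariate rows breaks any link between X and d.
        shuffled = X[rng.permutation(len(d))]
        null[b] = ssr_linear_fit(d, shuffled)
    # Share of permutation SSRs at least as small as the observed SSR.
    p_value = (1 + np.sum(null <= observed)) / (1 + n_perm)
    return observed, null, p_value
```

Plotting a histogram of the returned permutation SSRs with a vertical line at the observed value would give a display of the same general form as Figure 1.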