Beyond LATE: A Simple Method for Recovering Sample
Average Treatment Effects
Peter M. Aronow∗
Allison J. Sovey
March 24, 2011
∗ Peter M. Aronow is Doctoral Student, Department of Political Science, Yale University, 77 Prospect
Street, New Haven CT, 06520. Allison J. Sovey is Doctoral Candidate, Department of Political Science,
Yale University, 115 Prospect Street, Rosenkranz Hall Room 437, Yale University, New Haven CT, 06529.
The authors acknowledge support from the Yale University Faculty of Arts and Sciences High Performance
Computing facility and staff. Helpful comments from Lara Chausow, Ivan Fernandez-Val, Don Green, Greg
Huber, Holger Kern, Malte Lierl, Mary McGrath, Joel Middleton, Cyrus Samii and the participants of
the Yale American Politics and Public Policy Workshop are greatly appreciated. Special thanks to Adria
Lawrence and Bethany Albertson for generous data sharing and to Dean Eckles for particularly helpful
remarks on an earlier version of this paper. The usual caveat applies.
Abstract
Political scientists frequently use instrumental variables estimators to estimate the Local Average Treatment Effect (LATE), or the average treatment effect among those who comply with treatment assignment. However, the LATE is
often not the causal estimand of interest; researchers may instead be interested in
the Sample Average Treatment Effect (SATE), or the average treatment effect for
the entire sample. We first introduce the compliance score, a pre-treatment covariate that reflects a unit’s probability of compliance with treatment assignment,
to researchers in political science. We posit a maximum likelihood estimator for
predicting compliance scores even in the presence of two-sided non-compliance.
We then develop a new technique, inverse compliance score weighting, that, in
conjunction with a standard IV estimator, will allow researchers to easily estimate the SATE. Finally, we estimate both the LATE and SATE for a randomized
experiment designed to measure the effects of media exposure and reach striking
substantive conclusions.
Keywords: Compliance score, instrumental variables, LATE, average treatment effect, causal inference
Introduction
In the quest to achieve unbiased causal inference, social scientists have increasingly turned
to instrumental variables (IV) estimation, whether in experimental or observational settings.
The growing popularity of IV estimation, however, has provoked criticism from several influential scholars. In particular, Deaton (2009) and Heckman and Vytlacil (2009) object
to the increased reliance on IV methods. Although both papers criticize IV methods for a
variety of reasons, one of the main critiques concerns the estimation of the Local Average
Treatment Effect (LATE). The LATE is the average treatment effect among compliers, units
that receive treatment if and only if induced to do so. Both papers argue that the LATE is
not usually the parameter of interest, as “we are unlikely to learn much about the process
at work” by estimating the LATE (Deaton 2009, 10) and the LATE “is often very difficult
to interpret as an answer to an interesting economic question” (Heckman and Vytlacil 2009,
19). Problems with the LATE are arguably so severe that Deaton (2009, 4) concludes that
experiments have “no special ability to produce more credible knowledge than other methods.” Similarly, Heckman and Vytlacil (2009, 20) argues that reliance on LATE means that
“problems of identification and interpretation are swept under the rug.”
Deaton suggests a partial alternative to using the LATE: simply compare the average
outcome of the group assigned to treatment with that of the group assigned to control. This
approach, estimating the intention-to-treat (ITT) effect, may be helpful if the experimenter is
not concerned with the magnitude of the effect. Yet ultimately Deaton concludes that this is an
imperfect solution and that the LATE estimate is not much help without strong theoretical
underpinnings. Even Imbens (2009, 4), who joins the fray to defend the special place of
randomized experiments in the social sciences, must admit that “in many cases the local
average treatment effects...are not the average effects that researchers set out to estimate.”
Thus, although the authors disagree on the importance and validity of experiments, they
seem to largely agree on the limitations of the LATE.
We acknowledge the limitations of the LATE described above and address the arguments
against the use of randomized experiments by offering a solution to the problem of reliance
on the LATE. Following the basic logic of inverse probability weighting, we develop a simple
procedure that allows for the consistent estimation of the Sample Average Treatment Effect
(SATE), the average treatment effect that would be observed across the entire sample if
all units were to comply with treatment assignment. We introduce the compliance score to
researchers in political science, and develop a novel maximum likelihood estimator (MLE)
to predict compliance scores for units, even in the presence of two-sided non-compliance.
We then develop a new technique, inverse compliance score weighting (ICSW), that, in
conjunction with a standard IV estimator, will allow researchers to easily estimate the SATE.
Finally, we estimate both the LATE and SATE for a randomized experiment designed to
measure the effects of media exposure and reach striking substantive conclusions.
Potential Outcomes Framework
We develop our method using the Neyman-Rubin potential outcomes framework as elucidated in Rubin (1978), focusing on the case of a binary treatment. For unit i, let Y0i be
the outcome if i does not receive treatment and let Y1i be the outcome if i receives treatment. Ideally, we would take the difference between Y1i and Y0i to obtain the effect of the
treatment. This is impossible, however, as we only observe either Y0i or Y1i for a given
unit; we cannot directly observe the counterfactual. This problem may be solved if units
are randomly assigned to treatment. Let Di be an indicator variable for whether a subject
receives treatment so that Di = 1 when a subject is treated and Di = 0 when a subject is
not treated.
The treatment effect for a given unit i is the difference between this unit’s outcomes
in both possible states of the world, Y1i − Y0i . The average treatment effect (ATE) is the
expected value of the treatment effect, or E(Y1i ) − E(Y0i ). What we observe, however, is
E(Yi |Di = 1) − E(Yi |Di = 0) = E(Y1i | Di = 1) − E(Y0i | Di = 1) (average treatment
effect on the treated) + E(Y0i | Di = 1) − E(Y0i | Di = 0) (selection bias) (Angrist and
Pischke 2009). The ATE may thus be estimated using the difference in means between the
treatment and control groups and will be unbiased when the selection bias is zero. However,
when Di is systematically related to unobserved causes of Yi , these estimates will be biased.
For example, in a randomized experiment with non-compliance, Di is no longer randomly
assigned and this assumption is violated. Further assumptions are necessary to recover causal
effects in this context. By using the treatment assignment, Z, as an instrumental variable
for treatment received, we may recover the LATE.
Following Angrist, Imbens and Rubin (1996), the sample may be divided into four groups:
always-takers, never-takers, compliers, and defiers. Define D0i as the treatment condition of
unit i when assigned to control and D1i as the treatment condition of unit i when assigned
to treatment. Always-takers are units that receive treatment regardless of whether they
are assigned to treatment or control, so that D0i = 1 and D1i = 1. Conversely, never-takers never receive treatment regardless of their treatment assignment, so that D0i = 0 and
D1i = 0. Compliers receive treatment if assigned to treatment and do not receive treatment
if assigned to control, so that D0i = 0 and D1i = 1 (or D1i > D0i ). Defiers receive treatment
if assigned to control and receive no treatment if assigned to treatment, so that D0i = 1
and D1i = 0 (or D0i > D1i ). In the example of a randomized clinical trial to assess the
causal effect of taking a medication, always-takers will receive the medication regardless of
which group they are assigned to, never-takers will not take the medication, compliers will
take the medication if and only if assigned to the treatment group and defiers will take the
medication if and only if assigned to the control group. Although these identifiers may be
interpreted deterministically, they also have a probabilistic interpretation that we favor. For
any given unit i, there may exist pre-treatment (unobserved) covariates indicating unit i’s
propensity to be in each of these four groups. If the study is interpreted as a sample of a
larger population of studies, then, e.g., unit i could be an always-taker in one realization and
a never-taker in a different realization. This distinction will become important as we discuss
the logic of ICSW in later sections.
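The four-way classification above can be written down compactly. The following Python sketch is illustrative only: the function name `principal_stratum` is an assumption introduced here, and in real data only one of D0i and D1i is observed for each unit, so the classification cannot be applied directly to observed records.

```python
def principal_stratum(d0: int, d1: int) -> str:
    """Classify a unit by its potential treatment statuses.

    d0 is the treatment received if assigned to control (D0i),
    d1 is the treatment received if assigned to treatment (D1i).
    """
    strata = {
        (1, 1): "always-taker",  # treated under either assignment
        (0, 0): "never-taker",   # untreated under either assignment
        (0, 1): "complier",      # treated if and only if assigned (D1i > D0i)
        (1, 0): "defier",        # treated if and only if assigned to control
    }
    return strata[(d0, d1)]


# Hypothetical units, listed as (D0i, D1i) pairs.
for pair in [(1, 1), (0, 0), (0, 1), (1, 0)]:
    print(pair, principal_stratum(*pair))
```

Under the probabilistic interpretation favored in the text, each unit would instead carry a probability of falling into each of these four cells.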
Using an instrumental variables estimator, we may estimate the average treatment effect
among compliers, or the local average treatment effect (LATE). To do so, we must make five
additional assumptions, as explicated by Angrist, Imbens and Rubin (1996). First, we must
assume that the exclusion restriction is valid, or that Z only affects Y through D. Second, we
assume the instrumental variable (Z) to some degree predicts the endogenous independent
variable (D). Third, we invoke the stable unit treatment value assumption (SUTVA), which
states that the potential outcomes of D and Y are invariant with respect to the allocations
of Z and D respectively. Fourth, we assume that the population contains no defiers, i.e.,
Pr(D0i > D1i ) = 0. Fifth, we must assume that the treatment assignment Zi is randomly
assigned.
Angrist, Imbens and Rubin (1996) demonstrate that these assumptions imply that the
ITT effect of Z on Y (the average effect of treatment assignment on the outcome) divided by
the ITT effect of Z on D (the average effect of treatment assignment on treatment received) is
equal to the average causal treatment effect for compliers. We present an abbreviated version
of their proof: by exclusion and independence, E(Yi | Zi = 1) = E(Y0i + (Y1i − Y0i)D1i).
Similarly, E(Yi | Zi = 0) = E(Y0i + (Y1i − Y0i)D0i). Thus, we have the ITT,

$$E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0) = E\big((Y_{1i} - Y_{0i})(D_{1i} - D_{0i})\big) = E(Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}) \Pr(D_{1i} > D_{0i}) \tag{1}$$

and, using analogous reasoning, we find that

$$E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0) = E(D_{1i} - D_{0i}) = \Pr(D_{1i} > D_{0i}) \tag{2}$$

Therefore, we have

$$E(Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}) = \frac{E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)}{E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0)} \tag{3}$$
which is the canonical Wald IV estimator (Angrist and Pischke 2009). As demonstrated, this
provides consistent estimates of the local average treatment effect (LATE), or the average
effect of the treatment among the compliers. Because IV estimators provide asymptotically
unbiased estimates of the LATE, they have become the de facto standard for causal inference
in experiments with noncompliance in the social sciences. As we show below, however, the
SATE is also easily computed under reasonable assumptions. We now turn to the compliance
score, the key component for the estimation of the SATE.
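As a concrete illustration of the Wald ratio in equation 3, the following Python sketch computes the sample analogue from lists of outcomes, treatments received and assignments. The function names and the toy data are assumptions made for this example; they are not taken from the paper or from any package.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def wald_estimate(y, d, z):
    """Sample analogue of equation 3: ITT on Y divided by ITT on D."""
    y_assigned = [yi for yi, zi in zip(y, z) if zi == 1]
    y_control = [yi for yi, zi in zip(y, z) if zi == 0]
    d_assigned = [di for di, zi in zip(d, z) if zi == 1]
    d_control = [di for di, zi in zip(d, z) if zi == 0]

    itt_y = mean(y_assigned) - mean(y_control)  # effect of assignment on outcome
    itt_d = mean(d_assigned) - mean(d_control)  # effect of assignment on uptake
    if itt_d == 0:
        raise ValueError("assignment does not move treatment uptake")
    return itt_y / itt_d


# Toy data (hypothetical): four units assigned to treatment, four to control.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [5, 5, 5, 1, 5, 1, 1, 1]
print(wald_estimate(y, d, z))  # ITT_Y = 2.0, ITT_D = 0.5
```

In this toy sample the assignment raises the mean outcome by 2 and the uptake rate by 0.5, so the ratio is 4, which matches the unit-level effect of treatment built into the data.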
The Compliance Score
As pioneered by Follmann (2000) in biostatistics, the compliance score is a pretreatment
covariate that identifies a unit’s probability of being a complier. By inspection, and following
the above nomenclature, the probability of compliance for unit i is E(Di |Zi = 1)−E(Di |Zi =
0). With the assumptions outlined in the previous section, we know that the compliance score
(and all other pretreatment covariates) is asymptotically orthogonal to Z. In conjunction
with a known covariate profile, the compliance score for a given unit is simple to compute
if there exist no always-takers, as E(Di |Zi = 1) − E(Di |Zi = 0) = E(Di |Zi = 1). Under the
assumption of no always-takers, Follmann (2000) estimates the LATE of smoking cessation
for patients in a clinical trial. Following the general logic of propensity score estimation,
Follmann uses a simple logistic regression of D on known covariates for units for which
Zi = 1. The fitted values of D from this regression may then be extrapolated to the entire
sample in order to compute a compliance score for each unit.
One particularly interesting application of the compliance score may be found in Joffe and
Brensinger (2003), which suggests using the compliance score to weight observations in an
instrumental variables regression in order to improve efficiency. Because the ITT estimates
from clinical trials are often too conservative due to noncompliance, Joffe and Brensinger apply this method
to focus the analysis on strata with better compliance. Other researchers have also made
similar recommendations, e.g., Joffe, Ten Have and Brensinger (2003) and Roy, Hogan and
Marcus (2008). However, to our knowledge, none of the existing methods for computing
a compliance score have thus far been robust to the inclusion of always-takers. We posit
a maximum likelihood estimation technique (similar to that of Yau and Little 2001), to
recover the compliance score even in the presence of two-sided non-compliance. Two-sided
noncompliance means that there are both always-takers and never-takers in the sample; note,
however, that we must still assume that there are no defiers in the sample.
As defined above, D is an indicator variable for treatment received and Z is an indicator
variable for treatment assigned. Further, we define Xi as an exhaustive (i.e., sufficient for
the model to be fully specified) vector of predictive covariates for unit i. For convenience,
we make three easily relaxed parametric assumptions. The first assumption is that the
probability of being an always-taker or a complier is a function of the covariates with a
known distribution.
$$P_{A,C,i} = \Pr(D_{1i} > D_{0i} \cup D_{0i} = 1) = F(\theta_{A,C} X_i), \tag{4}$$
where PA,C,i is the probability that unit i is either a complier or an always-taker, θA,C is a
vector of coefficients to be estimated and F (·) is the cumulative distribution function (CDF)
for an arbitrary distribution. For the purposes of this paper, we will use a probit model, so
F (·) = Φ(·), where Φ is the Normal CDF. However, other binary choice models, including
logit and generalized additive models (GAM) (Hastie and Tibshirani 1990) could be used.
Second, we similarly specify
$$P_{A|A,C,i} = \Pr(D_{0i} = 1 \mid D_{1i} > D_{0i} \cup D_{0i} = 1) = F(\theta_{A|A,C} X_i), \tag{5}$$
where PA|A,C is the probability that unit i is an always-taker conditional on it being either an
always-taker or a complier, and θA|A,C are coefficients to be estimated. Therefore, we may
define compliance score PC,i = Pr(D1i > D0i ). Since we know, by definition, that compliers
receive treatment if and only if assigned and that always-takers always receive treatment,
$$\Pr(D_i = 1) = \Pr(D_{1i} > D_{0i}) Z_i + \Pr(D_{0i} = 1) = P_{A,C,i}(1 - P_{A|A,C,i}) Z_i + P_{A,C,i} P_{A|A,C,i}. \tag{6}$$
This expression represents a fully specified model for Pr(Di = 1). Note that this value
is strictly bounded within (0, 1) since Z ∈ {0, 1} and PA,C,i, PA|A,C,i ∈ (0, 1). This boundedness
along with the binary nature of Z allows us to specify our third assumption: Di is Bernoulli
distributed and observations are independent. Using equation 6, the likelihood of the model
may now be specified as:
$$L(P_{A|A,C,i}, P_{A,C,i} \mid D, Z) = \big(P_{A,C,i}(1 - P_{A|A,C,i}) Z_i + P_{A,C,i} P_{A|A,C,i}\big)^{D_i} \big(1 - P_{A,C,i}(1 - P_{A|A,C,i}) Z_i - P_{A,C,i} P_{A|A,C,i}\big)^{1 - D_i}. \tag{7}$$
Combining equations 4, 5 and 7,
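$$L(\theta_{A,C}, \theta_{A|A,C} \mid D, Z) = \prod_{i=1}^{N} \Big[ \big(F(\theta_{A,C} X_i)(1 - F(\theta_{A|A,C} X_i)) Z_i + F(\theta_{A,C} X_i) F(\theta_{A|A,C} X_i)\big)^{D_i} \big(1 - F(\theta_{A,C} X_i)(1 - F(\theta_{A|A,C} X_i)) Z_i - F(\theta_{A,C} X_i) F(\theta_{A|A,C} X_i)\big)^{1 - D_i} \Big]. \tag{8}$$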
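To make equations 6 and 8 concrete, the following Python sketch evaluates the log of the likelihood under the probit specification used in the paper. It is a minimal illustration rather than the authors' implementation: the function names are assumptions, coefficients and covariates are plain lists, and no optimizer is included. The clamp on the fitted probability is a numerical safeguard added here, not part of the model.

```python
import math


def probit_cdf(x):
    """Standard Normal CDF, i.e. F = Phi in equations 4 and 5."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dot(theta, x):
    """Linear index theta * X_i for one unit."""
    return sum(t * v for t, v in zip(theta, x))


def treatment_probability(theta_ac, theta_a_given_ac, x, z):
    """Pr(D_i = 1) from equation 6 for covariates x and assignment z."""
    p_ac = probit_cdf(dot(theta_ac, x))         # always-taker or complier
    p_a = probit_cdf(dot(theta_a_given_ac, x))  # always-taker, given the above
    return p_ac * (1.0 - p_a) * z + p_ac * p_a


def log_likelihood(theta_ac, theta_a_given_ac, X, D, Z):
    """Log of equation 8, summed over independent Bernoulli observations."""
    total = 0.0
    for x, d, z in zip(X, D, Z):
        p = treatment_probability(theta_ac, theta_a_given_ac, x, z)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0); an added safeguard
        total += d * math.log(p) + (1 - d) * math.log(1.0 - p)
    return total


# With all coefficients at zero, both probit probabilities equal 0.5.
print(treatment_probability([0.0], [0.0], [1.0], 1))  # 0.5
print(treatment_probability([0.0], [0.0], [1.0], 0))  # 0.25
```

A numerical optimizer would then search over both coefficient vectors to maximize `log_likelihood`.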
Maximizing this likelihood function numerically may lead to false convergence when
a standard optimizer (e.g., Newton-Raphson or Nelder-Mead) is used. Instead, we recommend using a global optimization technique, such as GENOUD (Mebane and Sekhon
2010), to help ensure that the optimizer has converged to the global maximum.1 After estimating
θA,C and θA|A,C , we may compute both the compliance score for unit i, Pr(D1i > D0i ) =
F (θA,C Xi )(1−F (θA|A,C Xi )), and the always-taker score, Pr(D0i = 1) = F (θA,C Xi )F (θA|A,C Xi ).
(The probability that unit i is a never-taker is simply 1 − Pr(D0i = 1) − Pr(D1i > D0i ).)
In order to prevent very small compliance scores from forming, we may constrain Pr(D1i >
D0i ) ≥ α, where α is an arbitrary threshold for a minimum probability.2 For the purposes
of this paper, we choose α = 0.05. Note that, given a compliance score profile for any
given population, we may employ a permutation test to assess the efficacy of the compliance
identification. Under the null hypothesis that we have not recovered meaningful compliance
scores, we would expect that SSR(PC |X) = SSR(PC |X0 ), where SSR is the sum of squared
residuals between the predicted probability of D and the actual value of D, and X0 has had its
rows randomly permuted. This would imply that, regardless of the particular configuration
of covariate profiles, we would observe the same SSR for compliance scores. If we may reject
this null hypothesis, we have confidence that our compliance score identification is capturing
some of the true variation in compliance scores.3
1 An R package, cscore, will be made available shortly to implement all procedures described in this paper.
2 This procedure, known as trimming (or, more appropriately, Winsorizing), is performed because very low
probabilities can introduce instability in the estimates resulting from inverse probability weighting (Elliott
2009).
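The flooring of small compliance scores and the permutation test described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, `predict` stands in for the fitted model's predicted Pr(Di = 1) given a covariate row and an assignment, and the fitted model is held fixed across permutations rather than re-estimated (the text does not specify which is intended). The add-one adjustment to the p-value is a common convention chosen here, not something the paper states.

```python
import random


def floor_compliance_score(score, alpha=0.05):
    """Raise compliance scores below the threshold alpha up to alpha."""
    return max(score, alpha)


def ssr(predicted, actual):
    """Sum of squared residuals between predicted Pr(D = 1) and observed D."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def permutation_p_value(predict, X, D, Z, n_permutations=1000, seed=0):
    """Share of row permutations of X whose SSR is no larger than the observed SSR."""
    rng = random.Random(seed)
    observed = ssr([predict(x, z) for x, z in zip(X, Z)], D)
    as_small = 0
    for _ in range(n_permutations):
        X_perm = list(X)
        rng.shuffle(X_perm)  # break the link between covariates and units
        permuted = ssr([predict(x, z) for x, z in zip(X_perm, Z)], D)
        if permuted <= observed:
            as_small += 1
    return (as_small + 1) / (n_permutations + 1)


# Hypothetical example: one covariate that predicts uptake almost perfectly.
X = [[1.0]] * 20 + [[-1.0]] * 20
D = [1] * 20 + [0] * 20
Z = [1] * 40
p = permutation_p_value(lambda x, z: 0.9 if x[0] > 0 else 0.1, X, D, Z,
                        n_permutations=200)
print(p)
```

A small p-value indicates that the observed covariate configuration fits treatment uptake better than random reassignments of covariate rows would.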
Inverse Compliance Score Weighting
With an identified compliance score profile for the entire sample, we may utilize Inverse
Compliance Score Weighting to recover the SATE.4 ICSW follows a similar logic as inverse probability weighting: if a type of observation is disproportionately sampled, we may
reweight the sample to reflect the distribution of types in the population. That is, the complier population may be interpreted as a non-random subsample of the sample population
of interest. We may reweight the entire sample such that the distribution of covariates in
the complier population is identical to the covariate distribution of the (pre-weighted) entire
sample.
In order to recover the SATE, three assumptions (in addition to the instrumental variables
assumptions described above) are required. First, we assume latent ignorability (conditional
on a covariate profile) of compliance with respect to heterogeneous treatment effects (Esterling, Lazer and Neblo unpublished; Frangakis and Rubin 1999). Define categorical variable
h ∈ H = (1, 2, 3, ..., M) as a “type” of unit, that is, a unit with a known, observed covariate
profile. By definition, Σ_h Pr(H = h) = 1. Similar to the logic behind inverse probability
weighting for sample designs (see Hirano and Imbens 2001, for a discussion and example of
inverse propensity score weighting), the assumption of latent ignorability here implies that
E(Y1 − Y0 | H = h) = E(Y1 − Y0 | D1 > D0 ∩ H = h). In other words, the expected
value of the treatment effect conditional on the type of unit is equal to the expected value
of the treatment effect conditional on the type of unit and compliance. Note that this is
a fairly weak ignorability assumption. We do not specify that both potential outcomes are
independent of compliance conditional on the type of unit; rather, we only require that the
difference between potential outcomes is ignorable. Second, we assume that the compliance
score has been properly identified for each type. Third, we assume that the compliance score
for all units lies in (0, 1]: each unit has a non-zero probability of compliance
across the population of hypothetical (identical) experiments. This assumption has both
philosophical and practical implications. If a unit can never be a complier, there exists no
counterfactual with which to generate an ATE. Practically, if a unit has a zero probability of compliance, the weighting procedure will produce infinite weights, thus leading to an
undefined solution.
3 Thanks to Holger Kern for this suggestion.
4 In work developed contemporaneously with our paper, Angrist and Fernandez-Val (2010) presents a
similar method to reweight the LATE to target populations. Also, for an integrative framework for recovering
the SATE along with other causal quantities, see Esterling, Lazer and Neblo (unpublished). Note, however,
that this method is not robust to two-sided noncompliance. Additionally, see Frangakis and Rubin (1999)
for a similar approach involving missing data.
In order to formally describe the process, we return to the derivation of the LATE;
note that, as above, our proof relies on the asymptotic qualities of each of the estimated
quantities. Our proof strategy is to derive expressions for the inverse compliance score
weighted numerator (ITT) and denominator (probability of compliance) of equation 3, thus
demonstrating the asymptotic value of the IV estimator after ICSW. As in equation 1,
ITT = Pr(D1 > D0)E(Y1 − Y0 | D1 > D0), which we may express as a weighted sum of the
conditional ITTs:

$$ITT = \sum_h \Pr(D_1 > D_0 \mid H = h) \Pr(H = h) E(Y_1 - Y_0 \mid D_1 > D_0 \cap H = h). \tag{9}$$
With the equations above, we are now able to apply ICSW: for each h ∈ H, we multiply
by the weight $1 / \Pr(D_1 > D_0 \mid H = h)$, or the inverse of the compliance score. We then divide by
the average weight across all observations (to normalize the weight). We define the average
weight

$$w_c = \sum_h \frac{\Pr(H = h)}{\Pr(D_1 > D_0 \mid H = h)}.$$
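In sample terms, the weighting step amounts to taking the reciprocal of each unit's compliance score and dividing by the mean of those reciprocals. The Python sketch below illustrates this; the function name `icsw_weights` is an assumption, and the floor `alpha` reflects the minimum compliance score of 0.05 that the paper adopts elsewhere.

```python
def icsw_weights(compliance_scores, alpha=0.05):
    """Normalized inverse compliance score weights, one per unit."""
    floored = [max(c, alpha) for c in compliance_scores]  # avoid extreme weights
    raw = [1.0 / c for c in floored]                      # inverse compliance score
    average = sum(raw) / len(raw)                         # sample analogue of w_c
    return [w / average for w in raw]                     # weights now average to 1


# Hypothetical scores: the unit less likely to comply receives the larger weight.
weights = icsw_weights([0.5, 0.25])
print(weights)
```

Because the weights are normalized to average one, the weighted sample keeps its nominal size while units with low compliance scores count for more.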
For equation 9, we can write the weighted ITT as follows:

$$ITT^w = \frac{1}{w_c} \sum_h \frac{\Pr(D_1 > D_0 \mid H = h) \Pr(H = h) E(Y_1 - Y_0 \mid D_1 > D_0 \cap H = h)}{\Pr(D_1 > D_0 \mid H = h)}. \tag{10}$$

Since the compliance score terms cancel, equation 10 reduces to:

$$ITT^w = \frac{1}{w_c} \sum_h \Pr(H = h) E(Y_1 - Y_0 \mid D_1 > D_0 \cap H = h). \tag{11}$$
Returning to the latent ignorability assumption laid out above, we can replace the conditional
expectation in equation 11 with E(Y1 − Y0 | H = h). By the law of total expectation, the reweighted
numerator of the IV estimator is

$$ITT^w = \frac{1}{w_c} \sum_h \Pr(H = h) E(Y_1 - Y_0 \mid H = h) = \frac{1}{w_c} E(Y_1 - Y_0). \tag{12}$$
We can now reweight the denominator of the IV estimator by expanding the definition
of the complier population, weighting, and simplifying:

$$\Pr(D_1 > D_0) = \sum_h \Pr(D_1 > D_0 \mid H = h) \Pr(H = h) \tag{13}$$

$$\Pr(D_1 > D_0)^w = \frac{1}{w_c} \sum_h \frac{\Pr(D_1 > D_0 \mid H = h) \Pr(H = h)}{\Pr(D_1 > D_0 \mid H = h)} = \frac{1}{w_c} \sum_h \Pr(H = h) \tag{14}$$
Dividing equation 12 by equation 14, and noting that Σ_h Pr(H = h) = 1, we have the asymptotic value of the ICSW reweighted IV
estimator,

$$\frac{ITT^w}{\Pr(D_1 > D_0)^w} = \frac{\frac{1}{w_c} E(Y_1 - Y_0)}{\frac{1}{w_c}} = E(Y_1 - Y_0), \tag{15}$$
which is the SATE.5 In order to produce standard errors, we recommend bootstrapping the
entire process – from computing the compliance score to IV estimation (Abadie 2002).
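A nonparametric bootstrap of this kind can be sketched generically in Python. In the sketch below, `estimator` is a placeholder for the full pipeline run on one resampled data set (compliance score estimation, weighting and IV estimation); the function name, the argument names and the toy example are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_standard_error(rows, estimator, n_boot=500, seed=0):
    """Standard deviation of an estimator across resamples drawn with replacement.

    rows is a list of per-unit records; estimator maps such a list to a number
    and is expected to rerun every estimation step on the resample it receives.
    """
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]  # draw n units
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


# Toy check with a sample mean standing in for the full ICSW pipeline.
rows = [0.0, 1.0] * 50
se = bootstrap_standard_error(rows, lambda r: sum(r) / len(r), n_boot=500, seed=1)
print(se)  # close to sqrt(0.25 / 100) = 0.05
```

Re-estimating the compliance scores inside each resample is what lets the resulting standard error reflect uncertainty in the weights as well as in the IV step.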
5 Even if latent ignorability is not satisfied, in the case of homogeneous treatment effects, we will still
recover the SATE. We define constant treatment effect τ = Y1i − Y0i, ∀i. Therefore,

$$\frac{ITT^w}{\Pr(D_1 > D_0)^w} = \sum_h \Pr(H = h) E(Y_1 - Y_0 \mid H = h) = \sum_h \Pr(H = h) \tau = \tau. \tag{16}$$

By applying ICSW before applying the IV estimator, we have demonstrated that we may recover the SATE
under two different identifying assumptions.
Simulation Studies
We present a simple simulation study in order to demonstrate the efficacy of the method.
We define three types of units, each with different always-taker, never-taker and compliance
scores. Type 1 has a compliance score of 0.55, an always-taker score of 0.40, and a never-taker score of 0.05. Type 2 has a compliance score of 0.70, an always-taker score of 0.15, and
a never-taker score of 0.15. Type 3 has a compliance score of 0.40, an always-taker score of
0.05, and a never-taker score of 0.55. Each of the three types represents 1/3 of the simulation
sample. We also define three covariates, each imperfectly measuring the type of units. For
covariates h ∈ {1, 2, 3}, each covariate is generated as 3I(type = h) + N(0, 1), where I(·) is the
indicator function. Note that these covariates measure the types with some degree of error.
The outcome variable, Y, is defined as (3I(type = 1) − 6I(type = 2) + 6I(type = 3))D −
15I(type = 1) − 15I(type = 2) + 0I(type = 3) + N(0, 5). Therefore, both treatment effects
and intercepts vary by type. 50% of units are randomly assigned to treatment (Z = 1) and
the other 50% of units are randomly assigned to control (Z = 0). Thus, while the sample
average treatment effect is 1.000, the local average treatment effect is -0.05.
We simulate with N ∈ {300, 600, 900, 1200, 1500}, each with 10,000 samples. We then
perform four estimation procedures on all samples: standard OLS, ITT (OLS using treatment
assignment), 2SLS and ICSW. For the ICSW, we set the minimum compliance score α = 0.05.
Assuming that the causal estimand of interest is the SATE, we then compute the bias,
standard deviation and root-mean-squared-error (RMSE) for estimates from each of these
procedures. The results of this simulation study are presented in Table 1. While the variance
is always higher for the ICSW estimator than for OLS, ITT or 2SLS, the RMSE associated
with the estimator is superior once N > 300. Furthermore, in all cases, bias is consistently
smaller for the ICSW estimator than it is for OLS, ITT or 2SLS. If the SATE is the causal
estimand of interest, ICSW appears to provide a superior alternative for its estimation.
[TABLE 1 ABOUT HERE]
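The population quantities implied by the simulation design can be checked by direct arithmetic. The Python sketch below uses only the type shares, compliance scores and treatment effects stated above; the function and variable names are assumptions. It also applies the population-level ICSW reweighting from equations 12 and 14. Under these inputs the ITT works out to -0.05 and the complier-weighted average effect to roughly -0.09, so the -0.05 quoted in the text coincides with the ITT; this is offered as an arithmetic check on the stated design, not as a rerun of the simulation.

```python
def design_quantities():
    """Population quantities implied by the three-type simulation design."""
    shares = [1 / 3, 1 / 3, 1 / 3]    # Pr(H = h)
    compliance = [0.55, 0.70, 0.40]   # Pr(D1 > D0 | H = h)
    effects = [3.0, -6.0, 6.0]        # E(Y1 - Y0 | H = h)

    sate = sum(s * t for s, t in zip(shares, effects))
    pr_complier = sum(s * c for s, c in zip(shares, compliance))
    itt = sum(s * c * t for s, c, t in zip(shares, compliance, effects))
    late = itt / pr_complier          # equation 3 at the population level

    # Inverse compliance score weighting at the population level.
    w_c = sum(s / c for s, c in zip(shares, compliance))
    itt_w = sum(s * c * t / c for s, c, t in zip(shares, compliance, effects)) / w_c
    pr_w = sum(s * c / c for s, c in zip(shares, compliance)) / w_c
    icsw = itt_w / pr_w               # equation 15

    return {"sate": sate, "itt": itt, "late": late, "icsw": icsw}


for name, value in design_quantities().items():
    print(name, round(value, 4))
```

The reweighted ratio equals the SATE exactly here because treatment effects are constant within type, which is the latent ignorability condition when types are known.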
Application
We now discuss the application of our method using data from Albertson and Lawrence
(2009). The authors performed an experiment (N = 507) in which survey respondents in
Orange County, California were randomly assigned to receive a treatment encouraging them
to view a Fox News debate on affirmative action that was to take place the eve of the 1996
presidential election. Shortly after the election, these respondents were re-interviewed. The
post-election questionnaire asked respondents whether they viewed the Fox News debate,
whether they supported a California proposition (209) to eliminate affirmative action (coded
1 if the respondent supported the proposition and 0 otherwise) and whether they felt informed
(coded on a scale from 1-4 from least to most informed). The authors use a standard
instrumental variable design to address the fact that some who were not assigned to treatment
reported viewing the debate and some who were assigned to treatment did not report viewing
the debate. This noncompliance was nontrivial: only approximately 40% of the respondents
complied with treatment assignment.
Albertson and Lawrence’s IV regression results show a statistically insignificant but negative relationship between program viewing and support for the proposition and a nearly
statistically significant result for the positive relationship between program viewing and feeling more informed about the issue among compliers. Albertson and Lawrence’s original
findings are presented in columns (1) and (2) of Table 2.6 However, this may not be the
substantive question of interest. Rather, we may wish to know what sort of effects Fox News
debate watching would have on attitudes and knowledge for the entire sample.
6 Note that our replication of their results differs slightly from their original results due to the fact that
we used mean imputation for missing values in the covariate profile.
[TABLE 2 ABOUT HERE]
We first compute compliance scores for the sample. We use the eight covariates used by
Albertson and Lawrence: television news-watching habits (coded on a seven point scale from
never watches to watches every day), newspaper reading habits (coded on a seven point scale
from never reads to reads every day), interest in politics and national affairs (coded on a four
point scale from low interest to high interest), party ID (coded on a seven point scale from
Republican to Democrat), income (coded on a scale from 1 to 11 from poorest to richest), sex
(coded 1 if the respondent is female and 0 otherwise), education (coded on a 13 point scale
from least to most educated) and race (coded 1 if the respondent is white and 0 otherwise).
Using our MLE, we obtain a mean compliance score of 0.4167 and a mean always-taker score
of 0.0424. Note that this closely comports with our estimated mean proportion of compliers
and always-takers, 0.4074 and 0.0435 respectively. Table 3 displays the covariance matrix of
compliance scores and covariates. The compliance score is positively correlated with higher
income, higher education, greater interest in politics, more frequent news watching and paper
reading, being white, being male and identifying as a Democrat. Compliers are more likely
to exhibit these qualities, which conforms with our expectations, as we intuitively expect
that those who are more interested in politics, read the paper more frequently and watch
more news programs would be more likely to comply with watching the program.
[TABLE 3 ABOUT HERE]
We now apply the proposed permutation test for the identification of compliance scores.
Figure 1 presents the distribution of SSRs associated with the null hypothesis that SSR(PC |X)
= SSR(PC |X0). Note that the observed SSR is on the outskirts of the SSR distribution, such
that the probability of seeing an SSR this extreme is p < 0.001.
[FIGURE 1 ABOUT HERE]
Since we may feel confident that we have identified (at least) some portion of the true
compliance score distribution, we may now apply ICSW. The results of an IV regression
(2SLS) after ICSW, presented in columns (3) and (4) of Table 2, are striking. Although
Albertson and Lawrence (2009) finds that compliers would have been 0.27 points more informed after viewing the program, we find that viewers in the entire subject population are
0.60 points more informed after viewing. If the SATE were the real parameter of interest
in this study, using the LATE to approximate it would lead to gross underestimates of the
would-be treatment effect. These results comport with our intuitions about the effects of
watching the political debate. Recall from Table 3 that non-compliers tend to be less educated and pay less attention to politics. For example, in the control condition, units with
a compliance score below the median of 0.435 have a mean knowledge score of 2.87 points
and units with a compliance score above the median have a mean knowledge score of 3.48
points. Since the non-compliers (i.e., generally less educated and less informed individuals)
were less knowledgeable to begin with, they would naturally learn more from watching the
broadcast.
Turning to our analysis of the effect of program viewing on support for the measure, we
see that Albertson and Lawrence (2009)’s 2SLS estimate and our ICSW estimate are quite
similar. While Albertson and Lawrence (2009) finds that compliers were 0.7 percentage
points less likely to support the ballot measure, we find that the general sample population
would have been 0.6 percentage points less likely to support the measure if assigned to treatment. This finding suggests that the effect of viewing the debate on support for compliers
is very similar to the effect for the overall sample population. This result lends itself to two
interpretations. Since we cannot reject the null hypothesis of no treatment effect for either
the LATE or the SATE, we may suspect that there was no effect on attitudes resulting from
the treatment. Alternatively, if we believe there is a treatment effect, this finding suggests
that this effect is relatively homogeneous with respect to the population of interest. Together, the analysis of these two measures using ICSW highlights the fact that, ex ante, it is
unclear how close the LATE will be to the SATE. In cases where the SATE is the parameter
of interest, this uncertainty may be highly problematic.
Conclusion
Although the SATE is often the real parameter of interest, scholars typically focus on the
LATE because it is frequently the only available causal estimand. We have demonstrated
that the LATE may not be representative of the general population, and reliance on the
LATE may in fact lead to substantive conclusions that are radically different from those
suggested when estimating the SATE.
In this paper, we have provided a method to recover the SATE using only assumptions
standard to instrumental variables estimators and inverse probability weighting for sample
correction. Although we have demonstrated our method using an experimental case study,
the method can be applied to virtually any research design that uses instrumental variables estimation. The method can be extended to applications using continuous endogenous
variables and continuous instruments; as Angrist and Imbens (1995) demonstrates, 2SLS is
simply a weighted average of grouped data IV estimators. ICSW thus allows researchers to
estimate the SATE, a causal estimand previously considered out of reach, in a vast array of
applications throughout the social sciences.
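
As a rough illustration of the weighting idea summarized above, the following is a minimal sketch rather than the estimator as implemented in the paper. It assumes a binary instrument and a binary treatment, no covariates in the outcome model, and compliance scores that have already been estimated elsewhere. The function names (`weighted_mean`, `icsw_wald`) and the `floor` argument, which guards against extreme weights, are hypothetical and are not taken from the paper.

```python
def weighted_mean(values, weights):
    """Weighted average of `values` using `weights`."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def icsw_wald(y, d, z, compliance_scores, floor=0.05):
    """Weighted Wald (IV) estimate with inverse compliance score weights.

    y: outcomes; d: treatment received (0/1); z: treatment assigned (0/1).
    compliance_scores: pre-estimated probabilities of compliance (assumed given).
    floor: hypothetical lower bound on the scores, so that a near-zero
           score does not produce an exploding weight.
    """
    w = [1.0 / max(c, floor) for c in compliance_scores]

    def arm_mean(values, assigned):
        # Weighted mean of `values` among units with z == assigned
        vals = [v for v, zi in zip(values, z) if zi == assigned]
        wts = [wi for wi, zi in zip(w, z) if zi == assigned]
        return weighted_mean(vals, wts)

    itt_y = arm_mean(y, 1) - arm_mean(y, 0)  # weighted intent-to-treat effect
    itt_d = arm_mean(d, 1) - arm_mean(d, 0)  # weighted first stage
    return itt_y / itt_d


# Toy data (invented for illustration only)
z = [1, 1, 0, 0]
d = [1, 0, 0, 0]
y = [3, 1, 1, 1]
scores = [0.5, 0.25, 0.5, 0.25]
print(icsw_wald(y, d, z, scores))
```

When all compliance scores are equal, the weights cancel and the sketch reduces to the usual unweighted Wald estimator; the weights matter only when compliance scores vary with covariates.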
References
Abadie, Alberto. 2002. “Bootstrap Tests for Distributional Treatment Effects in Instrumental
Variable Models.” Journal of the American Statistical Association 97(457):284–292.
Albertson, Bethany and Adria Lawrence. 2009. “After the Credits Roll: The Long-Term
Effects of Educational Television on Public Knowledge and Attitudes.” American Politics
Research 37(2):275–300.
Angrist, Joshua D. and Guido W. Imbens. 1995. “Two-Stage Least Squares Estimation
of Average Causal Effects in Models with Variable Treatment Intensity.” Journal of the
American Statistical Association 90(430):431–442.
Angrist, Joshua D., Guido W. Imbens and Donald B. Rubin. 1996. “Identification of Causal
Effects Using Instrumental Variables.” Journal of the American Statistical Association
91(434):444–455.
Angrist, Joshua D. and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An
Empiricist’s Companion. Princeton University Press.
Angrist, Joshua and Ivan Fernandez-Val. 2010. “ExtrapoLATE-ing: External Validity and
Overidentification in the LATE Framework.” NBER Working Paper.
Deaton, Angus. 2009. “Instruments of Development: Randomization in the Tropics, and the
Search for the Elusive Keys to Economic Development.” Keynes Lecture pp. 1–54.
Elliott, Michael R. 2009. “Model Averaging Methods for Weight Trimming in Generalized
Linear Regression Models.” Journal of Official Statistics 25(1):1–20.
Esterling, Kevin M., David M.J. Lazer and Michael A. Neblo. Unpublished. “Estimating
Treatment Effects in the Presence of Noncompliance and Nonresponse: The Generalized
Endogenous Treatment Model.”
Follmann, Dean A. 2000. “On the Effect of Treatment Among Would-Be Treatment Compliers: An Analysis of the Multiple Risk Factor Intervention Trial.” Journal of the American
Statistical Association 95(452):1101–1109.
Frangakis, Constantine E. and Donald B. Rubin. 1999. “Addressing Complications
of Intention-to-Treat Analysis in the Combined Presence of All-or-None Treatment-Noncompliance
and Subsequent Missing Outcomes.” Biometrika 86(2):365–379.
Hastie, Trevor and Rob Tibshirani. 1990. Generalized Additive Models. Chapman and Hall.
Heckman, James J. and Edward J. Vytlacil. 2009. “Comparing IV with Structural Models:
What Simple IV Can and Cannot Identify.” Working Paper.
Hirano, Keisuke and Guido W. Imbens. 2001. “Estimation of Causal Effects using Propensity
Score Weighting: An Application to Data on Right Heart Catheterization.” Health Services
and Outcomes Research Methodology 2(3-4):259–278.
Imbens, Guido W. 2009. “Better LATE Than Nothing: Some Comments on Deaton (2009)
and Heckman and Urzua (2009).” Working Paper.
Joffe, Marshall M. and Colleen Brensinger. 2003. “Weighting in Instrumental Variables and
G-Estimation.” Statistics in Medicine 22(1):1285–1303.
Joffe, Marshall M., Thomas R. Ten Have and Colleen Brensinger. 2003. “The Compliance
Score as a Regressor in Randomized Trials.” Biostatistics 4(3):327–340.
Mebane, Walter R., Jr. and Jasjeet S. Sekhon. 2010. “R-GENetic Optimization Using Derivatives (RGENOUD).” R package version 5.7-1.
Roy, Jason, Joseph W. Hogan and Bess H. Marcus. 2008. “Principal Stratification with
Predictors of Compliance for Randomized Trials with 2 Active Treatments.” Biostatistics
9(2):277–289.
Rubin, Donald B. 1978. “Bayesian Inference for Causal Effects: the Role of Randomization.”
The Annals of Statistics 6(1):34–58.
Yau, Linda H.Y. and Roderick J. Little. 2001. “Inference for the Complier-Average Causal
Effect from Longitudinal Data Subject to Noncompliance and Missing Data, with Application to a Job Training Assessment for the Unemployed.” Journal of the American
Statistical Association 96.
Tables
                 Bias                         SD                       RMSE
N        OLS    ITT   2SLS   ICSW     OLS   ITT  2SLS  ICSW     OLS   ITT  2SLS  ICSW
300    -1.83  -1.04  -1.07  -0.05    0.86  0.79  1.45  1.69    2.02  1.31  1.80  1.69
600    -1.83  -1.06  -1.11  -0.13    0.60  0.55  1.01  1.11    1.93  1.20  1.50  1.12
900    -1.82  -1.04  -1.08  -0.12    0.49  0.45  0.82  0.90    1.88  1.14  1.35  0.90
1200   -1.82  -1.04  -1.08  -0.12    0.42  0.39  0.71  0.77    1.87  1.12  1.29  0.78
1500   -1.84  -1.05  -1.10  -0.14    0.38  0.35  0.64  0.69    1.88  1.11  1.27  0.70

Table 1: Results of Simulation Study. All reported statistics assume that the SATE is the
causal estimand of interest.
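
As a reading aid for Table 1, the RMSE columns can be checked against the bias and SD columns using the standard decomposition RMSE = sqrt(bias^2 + SD^2). The short sketch below does this for the N = 300 row; the values are transcribed from the table, and the helper name `rmse` is ours rather than the paper's.

```python
import math

# (bias, sd, reported_rmse) for N = 300, transcribed from Table 1
rows = {
    "OLS": (-1.83, 0.86, 2.02),
    "ITT": (-1.04, 0.79, 1.31),
    "2SLS": (-1.07, 1.45, 1.80),
    "ICSW": (-0.05, 1.69, 1.69),
}


def rmse(bias, sd):
    """Root mean squared error implied by a given bias and standard deviation."""
    return math.sqrt(bias ** 2 + sd ** 2)


for name, (bias, sd, reported) in rows.items():
    print(f"{name}: computed {rmse(bias, sd):.2f}, reported {reported:.2f}")
```

The decomposition makes the trade-off in the table explicit: ICSW has by far the smallest bias but the largest SD, so its RMSE advantage over 2SLS and ITT emerges only as N grows and the SD shrinks.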
                   Knowledge    Opinion     Knowledge    Opinion
                     2SLS        2SLS         ICSW        ICSW
                      (1)         (2)          (3)         (4)
Watching Debate      0.27       -0.07         0.60       -0.06
                    (0.16)      (0.09)       (0.43)      (0.21)
Intercept            1.80        1.03         2.14        0.94
                    (0.23)      (0.15)       (0.40)      (0.20)
Party ID            -0.02       -0.09        -0.03       -0.08
                    (0.02)      (0.01)       (0.03)      (0.01)
Political Int.       0.25       -0.04         0.24        0.00
                    (0.05)      (0.03)       (0.07)      (0.04)
Watch News           0.00        0.01        -0.03        0.00
                    (0.02)      (0.01)       (0.04)      (0.02)
Education            0.00       -0.01         0.01       -0.01
                    (0.01)      (0.01)       (0.02)      (0.01)
Read News            0.11       -0.01         0.10       -0.01
                    (0.02)      (0.01)       (0.03)      (0.01)
Female              -0.05       -0.03        -0.07        0.00
                    (0.07)      (0.04)       (0.11)      (0.06)
Income              -0.01        0.01        -0.03        0.01
                    (0.01)      (0.01)       (0.02)      (0.01)
White                0.07        0.18         0.00        0.15
                    (0.09)      (0.05)       (0.15)      (0.08)

Table 2: 2SLS and ICSW Estimates of Knowledge and Opinion Effects. Refer to text for
definitions of variable labels.
            PC          Party ID    Pol. Int.   Watch News  Education   Read News   Female      Income      White
PC           1.00
Party ID     0.32        1.00
Pol. Int.    0.71       -0.11        1.00
Watch News   0.29       -0.02        0.17        1.00
Education    0.39       -0.03        0.25        0.03        1.00
Read News    0.70       -0.05        0.36        0.15        0.26        1.00
Female      -0.29        0.04       -0.06       -0.07       -0.10       -0.10        1.00
Income       0.04       -0.14        0.16       -0.03        0.30        0.23       -0.11        1.00
White        0.17       -0.22        0.20        0.02        0.15        0.17        0.00        0.08        1.00

Table 3: Compliance Score and Covariate Correlation Matrix. Refer to text for definitions
of variable labels.
Figures
[Figure 1 plot: density of the SSR under the null hypothesis. x-axis: SSR (50 to 250); y-axis: Density (0.00 to 0.06). Plot annotation: SSR = 70.5, p < 0.001.]
Figure 1: SSR Distribution of Compliance Scores under Null Hypothesis of No Covariate
Relationship using Permutation Inference. Black line indicates observed SSR.