Weighted Estimation for Analyses with Missing Data
Cyrus Samii
Columbia University Political Science
Motivation
Missing data plague data analyses in political science. Recent applied
statistics literature reflects renewed interest in weighting methods for
missing data problems. Three properties are stressed in this literature: (i) robustness, (ii) the ability to use post-treatment information
in causal analysis, and (iii) methods to gain efficiency. I present these
results, hoping to show the potential in using refashioned weighting
methods for political science research.
Preliminaries
Consider a generalized linear regression of Y on X for an iid sample of size n indexed by i. Use the estimating equation ∑_{i=1}^n S(Yi, Xi; β) to characterize the regression estimator (Liang and Zeger, 1986; Stefanski and Boos, 2002). Full-sample estimates, β̂f, come from solving ∑_{i=1}^n S(Yi, Xi; β̂f) = 0. Let β0 ≡ E(β̂f) define our target estimate. Define an indicator, Ri, for whether data for i are fully observed, where the probability that Ri = 1 is πi, a function of the observed data. The estimate on the observed data, β̂c, is obtained by solving ∑_{i=1}^n Ri S(Yi, Xi; β̂c) = 0. In general, the left-hand side will not be zero in expectation when β̂c = β0, indicating bias.
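This complete-case bias can be seen in a small simulation. The following sketch is illustrative and not from the poster: the particular DGP, the variable names, and the choice to let missingness depend on Y are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Full sample: Y depends linearly on X.
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)

def solve_estimating_eq(y, x, w=None):
    """Solve sum_i w_i * S(y_i, x_i; beta) = 0 for the linear-model
    score S = x_i (y_i - x_i' beta), i.e. (weighted) least squares."""
    if w is None:
        w = np.ones_like(y)
    D = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * y))

beta_full = solve_estimating_eq(Y, X)  # full-sample estimate, near (1, 2)

# Missingness depends on Y itself, so dropping incomplete cases biases
# the score: E[sum_i R_i S(Y_i, X_i; beta_0)] != 0.
R = rng.binomial(1, 1.0 / (1.0 + np.exp(-(Y - 1.0))))
obs = R == 1
beta_cc = solve_estimating_eq(Y[obs], X[obs])  # complete cases: attenuated slope
```

With selection on the outcome, the complete-case slope is noticeably attenuated relative to the full-sample slope of 2.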
Robustness
The estimating equation, S, does not necessarily define the true data
generating process (DGP). S likely defines a useful (e.g. linear) approximation. Inverse probability weighting (IPW) allows for unbiased
“approximate” inference. To see this, suppose data on Yi are missing and that conditional independence holds, Yi ⊥⊥ Ri | Xi. Then,

E[ ∑_{i=1}^n (Ri/π(Xi)) Si(Yi, Xi; β0) ] = E[ ∑_{i=1}^n E( (Ri/π(Xi)) Si(Yi, Xi; β0) | Yi, Xi ) ] = E[ ∑_{i=1}^n Si(Yi, Xi; β0) ].
Figure 1 illustrates this property, showing how IPW recovers the linear
approximation of a complex relationship between X and Y .
FIGURE 1 (left panel: “No weighting”; right panel: “IPW”): Gray points are the complete sample, and hollow points are the observed data. Hollow points on the right are scaled proportionally to their weight. The gray line is the full-sample target, and the dashed lines are the attempts to estimate it.
Estimating equations allow one to study bias-reduction in terms of influence (Tsiatis, 2006). This is intuitive in Figure 1: we tilt the regression line by increasing the influence of points at the left. This opens the
door to a variety of semi- and non-parametric methods for constructing
weights. The example above used logistic regression. Alternatives are
boosted regression (McCaffrey et al, 2004) and robit regression, which
is more robust than logistic regression (Kang and Schafer, 2007).
Using post-treatment information
The robustness result depends only on our ability to estimate πi using
whatever observed data are available. We have flexibility in modeling
πi—e.g. using post-treatment information. Consider an example due to
Hernan et al (2004). Link X, Y , and R causally with a directed acyclic
graph (Pearl, 2000), where X is a “treatment.” Add a post-treatment
variable, Z, on the path to R:
X   Y
 ↘ ↙
  Z
  ↓
  R
We want the average effect of X on Y , which is modeled as 0 here.
We only observe Y when R = 1. This missingness mechanism induces
bias in an unadjusted regression of Y on X. Including Z only adds
another biasing component (King and Zeng, 2006). But the information in X and Z can be used to compute πi. Then, IPW is unbiased. For example, suppose the DGP,
Y ∼ Bernoulli(.5), X ∼ Bernoulli(.5),
Z ∼ Bernoulli(logit−1(−2 + 2X + 2Y)), and R ∼ Bernoulli(logit−1(−4 + 3Z)).
The following table shows the results from attempts to estimate a coefficient on X with a logistic regression on 1,000 samples.
Specification                                       Mean coef.   SD of coef.
(a) β0 + β1X                                        -0.58        0.11
(b) β0 + β1X + β2Z                                  -0.87        0.13
(c) β0 + β1X, weighted by logit−1(α0 + α1X + α2Z)    0.00        0.21
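These three specifications can be reproduced in a single large-sample simulation. The sketch below is an assumption-laden illustration (reading the DGP's symbols as X, Y, and Z, and using a hand-rolled weighted Newton solver for logistic regression): conditioning on selection biases the coefficient on X, adding the collider Z worsens it, and inverse weighting by the fitted response probability removes it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# DGP from the text: Z is a collider of X and Y; R depends only on Z.
Y = rng.binomial(1, 0.5, size=n)
X = rng.binomial(1, 0.5, size=n)
Z = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-2.0 + 2.0 * X + 2.0 * Y))))
R = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-4.0 + 3.0 * Z))))

def wlogit(y, D, w=None, iters=30):
    """Weighted Newton-Raphson logistic regression; returns coefficients."""
    if w is None:
        w = np.ones(len(y))
    b = np.zeros(D.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-D @ b))
        H = D.T @ ((w * p * (1.0 - p))[:, None] * D)
        b += np.linalg.solve(H, D.T @ (w * (y - p)))
    return b

obs = R == 1
ones = np.ones(n)

# (a) unadjusted: biased, since observation selects on a descendant of Z.
coef_a = wlogit(Y[obs], np.column_stack([ones[obs], X[obs]]))[1]
# (b) adding Z: conditions on the collider directly, more biased still.
coef_b = wlogit(Y[obs], np.column_stack([ones[obs], X[obs], Z[obs]]))[1]
# (c) IPW: weight by the inverse fitted response probability from R ~ X + Z.
Dp = np.column_stack([ones, X, Z])
alpha = wlogit(R.astype(float), Dp)
pi_hat = 1.0 / (1.0 + np.exp(-Dp @ alpha))
coef_c = wlogit(Y[obs], np.column_stack([ones[obs], X[obs]]),
                w=1.0 / pi_hat[obs])[1]
```

With a large sample, the three coefficients land near the table's -0.58, -0.87, and 0.00.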
Gaining efficiency
In the previous example, the standard deviation of the IPW estimate is large, illustrating the volatility of IPW estimates. IPW also discards incomplete data, and so is inefficient relative to methods that use them. Robins and colleagues (e.g. Bang and Robins, 2005) have proposed “augmented” IPW estimators that incorporate incomplete data to recover efficiency. The estimating equations can be augmented with any function of the data that has mean zero at the target estimate without introducing bias. Thus,
∑_{i=1}^n [ (Ri/πi) Si(Yi, Xi; β̂) + (1 − Ri/πi) φi ] = 0,
estimates the target parameter if φi is a function of the fully observed data with E(φi | β0) = 0. Note that the (1 − Ri/πi) term is mean-zero if the πi are accurate. The estimator is “doubly robust” in the sense that it estimates the target parameter if either the πi are estimated well or the assumptions on φi are accurate. Optimal φi functions are available to maximize efficiency (Robins and Rotnitzky, 1995; Tsiatis, 2006).
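The double-robustness property is easiest to check numerically for the mean-estimation case, where Si = Yi − µ and φi = m(Xi) − µ for an outcome model m. The sketch below is an illustration under assumed models, not the regression example discussed next: it shows that the augmented estimator stays near the truth when either the response model or the outcome model is correct, while the naive complete-case mean does not.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000

# DGP: E[Y] = 1 is the target; missingness depends on X (MAR).
X = rng.normal(size=n)
Y = X + X**2 + rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + X)))
R = rng.binomial(1, pi_true)
obs = R == 1

def ols_predict(y, D_fit, D_all):
    """Least-squares fit on the observed rows, predictions for all rows."""
    b = np.linalg.solve(D_fit.T @ D_fit, D_fit.T @ y)
    return D_all @ b

# Outcome ("phi") models: the correct one includes X^2, the wrong one is linear.
D_corr = np.column_stack([np.ones(n), X, X**2])
D_wrong = np.column_stack([np.ones(n), X])
m_corr = ols_predict(Y[obs], D_corr[obs], D_corr)
m_wrong = ols_predict(Y[obs], D_wrong[obs], D_wrong)

# Response models: correct uses the true pi; wrong is a constant.
pi_wrong = np.full(n, R.mean())

def aipw_mean(y, r, pi, m):
    """Augmented IPW estimate of E[Y]: IPW term plus a mean-zero augmentation.
    r * y only touches observed outcomes."""
    return np.mean(r * y / pi - (r / pi - 1.0) * m)

mu_pi_ok = aipw_mean(Y, R, pi_true, m_wrong)   # response model right
mu_m_ok = aipw_mean(Y, R, pi_wrong, m_corr)    # outcome model right
mu_naive = Y[obs].mean()                       # complete-case mean: biased
```

Either correct component is enough to land near E[Y] = 1; the unadjusted complete-case mean overshoots because high-X (high-Y) units are over-observed.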
Carpenter et al (2006) present an example of linear regression with
three covariates, (X1, X2, X3), and missingness on X1. They derive an augmented estimating equation, with Si = Xi(Yi − β′Xi) and φi = E[Xi(Yi − β′Xi) | Yi, Xi2, Xi3], where conditional expected values for X1
are based on an assumption of multivariate normality. Semi-parametric
estimation is carried out by maximizing a quasi-log-likelihood (McCullagh and Nelder, 1989:323-328). The πi are estimated with logistic
regression, and so asymptotic parameter variances are derived via the
M-estimator framework. The table below shows results from a simulation with this example, focusing on a coefficient estimate that suffered
the most bias in complete-case OLS:
Method               πi model   X1 model   Avg se   Avg bias/Avg se
OLS                  —          —          0.03     -2.06
IPW                  Correct    —          0.03     -0.09
IPW                  Wrong      —          0.03     -2.06
Augmented IPW        Correct    Correct    0.02     -0.11
Augmented IPW        Wrong      Correct    0.02     -0.10
Augmented IPW        Correct    Wrong      0.03     -0.09
Multiple imputation  —          Correct    0.02     -0.08
Multiple imputation  —          Wrong      0.03     2.60
Conclusion
Weighting methods are robust, flexible, and efficient for dealing with
missing data. Space limits prevent discussion of other benefits, such as
the ready adaptability of augmented IPW for analyzing sensitivity to
conditional independence violations (Scharfstein et al, 1999). For non-monotone missingness over many variables in a dataset, augmented IPW is intractable, so multiple imputation is preferred, at the cost of substantial model dependence. Weighting is best for
primary analyses when missingness on one or two variables poses substantial threats to validity.
References
Bang H, Robins JM. 2005. “Doubly robust estimation in missing data and causal inference models.” Biometrics 61:962-972.
Hernan MA, Hernandez-Diaz S, Robins JM. 2004. “A structural approach to selection bias.” Epidemiology 15:615-625.
Kang JDY, Schafer JL. 2007. “Demystifying double robustness.” Stat. Sci. 22:523-539.
King G, Zeng L. 2006. “The dangers of extreme counterfactuals.” Pol. Anal. 14:131-159.
Liang KY, Zeger SL. 1986. “Longitudinal data analysis using generalized linear models.” Biometrika 73:13-22.
Little RA, Rubin DB. 2002. Statistical Analysis with Missing Data. New York: Wiley.
McCaffrey DF, Ridgeway G, Morral AR. 2004. “Propensity score estimation with boosted regression for evaluating causal effects in observational studies.” Psych. Meth. 9:405-425.
McCullagh P, Nelder JA. 1989. Generalized Linear Models, 2nd Ed. New York: Chapman and Hall.
Pearl J. 2000. Causality: Models, Reasoning, and Inference. New York: Cambridge.
Robins JM, Rotnitzky A. 1995. “Semiparametric efficiency in multivariate regression models with missing data.” JASA 90:122-129.
Scharfstein DO, Rotnitzky A, Robins JM. 1999. “Adjusting for non-ignorable drop-out using semiparametric nonresponse models (with discussion).” JASA 94:1096-1120.
Stefanski LA, Boos DD. 2002. “The calculus of M-estimation.” Am. Stat. 56:29-38.
Tsiatis AA. 2006. Semiparametric Theory and Missing Data. New York: Springer.