Combining Multiple Imputation
and Inverse Probability Weighting
Shaun Seaman (1), Ian White (1), Andrew Copas (2,3), Leah Li (4)
(1) MRC Biostatistics Unit, Cambridge
(2) MRC Clinical Trials Unit, London
(3) UCL Research Department of Infection and Public Health, London
(4) MRC Centre of Epidemiology for Child Health, UCL Institute of Child Health, London
Overview
• Inverse probability weighting (IPW)
• Multiple Imputation (MI)
• Comparing IPW and MI
• Why combine IPW and MI?
• Properties of IPW/MI parameter and variance estimators
• Application to 1958 British Birth Cohort
• Conclusions
Complete Case Analysis
Suppose model of interest is regression of Y on X.
Some values of Y and/or X may be missing.
Complete-case analysis valid if complete cases representative of population.
Otherwise, typically biased.
Inverse Probability Weighting (IPW)
Even when no missing data, sample may not be representative of population,
due to unequal sampling fractions.
Sampling weights used to make sample representative of population.
The sampling weight is W^S ∝ 1/P(individual sampled).
This is a form of IPW.
IPW also used for missing data: weight complete cases so representative of
whole sample.
Let H be variables that predict whether complete case.
Weight each complete case by W^C = 1/P(complete case | H).
IPW gives valid inference if
P(complete case | H, X, Y) = P(complete case | H).
P(complete case | H) is usually unknown, so a model for it is specified and estimated
using the observed data.
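A minimal sketch of this in Python (illustrative only: the simulated data, the column names y, x, h and the logistic form of the missingness model are assumptions, not part of the slides):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical example: simulate a sample in which Y is missing for some
    # individuals, with the probability of being a complete case depending on H.
    rng = np.random.default_rng(1)
    n = 2000
    h = rng.normal(size=n)                       # fully observed predictor of missingness
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    p_cc = 1.0 / (1.0 + np.exp(-(0.5 + h)))      # true P(complete case | H)
    df = pd.DataFrame({'y': np.where(rng.random(n) < p_cc, y, np.nan), 'x': x, 'h': h})
    df['complete'] = df['y'].notna().astype(int)

    # Missingness model: estimate P(complete case | H) by logistic regression
    miss_fit = smf.glm('complete ~ h', data=df, family=sm.families.Binomial()).fit()
    df['w'] = 1.0 / miss_fit.predict(df)         # W^C = 1 / P(complete case | H)

    # Weighted complete-case analysis: regression of Y on X, weighted by W^C
    cc = df[df['complete'] == 1]
    print(smf.wls('y ~ x', data=cc, weights=cc['w']).fit().params)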
Multiple Imputation (MI)
Observed data ("?" denotes a missing value):

  ID   Y    X1    X2
   1   1    10    20.1
   2   1     8    15.2
   3   0     ?     4.7
   4   0     2     3.9
   5   1     ?     ?
   6   0    -1     ?

Two imputed datasets (missing X1 and X2 values filled in by draws from the imputation model):

   Y    X1    X2         Y    X1    X2
   1    10    20.1       1    10    20.1
   1     8    15.2       1     8    15.2
   0   2.3     4.7       0   1.9     4.7
   0     2     3.9       0     2     3.9
   1   9.3    16.9       1  10.5    19.3
   0    -1    -2.3       0    -1    -1.7
Multiple Imputation (MI)
Let D = (Y, X, A), where A denotes auxiliary variables.
• Specify a Bayesian model for the joint distribution of D with non-informative priors
• Fit model to observed data
• Sample missing data from posterior predictive distribution
• Repeat M times to generate M complete datasets
• Analyse each complete dataset to obtain θ̂^(1), ..., θ̂^(M) and V̂^(1), ..., V̂^(M).
• Combine the M estimates using Rubin's Rules (sketched in code below):

θ̂M = Mean(θ̂^(1), ..., θ̂^(M))
V̂M = Mean(V̂^(1), ..., V̂^(M)) + (1 + 1/M) × Variance(θ̂^(1), ..., θ̂^(M))

MI is valid when D is missing at random (MAR).
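A small sketch of the Rubin's Rules formulas above as code (illustrative; the estimates in the example call are made up):

    import numpy as np

    def rubins_rules(theta_hats, v_hats):
        """Combine M point estimates and their variance estimates by Rubin's Rules."""
        theta_hats = np.asarray(theta_hats, dtype=float)
        v_hats = np.asarray(v_hats, dtype=float)
        m = len(theta_hats)
        theta_m = theta_hats.mean(axis=0)               # pooled estimate
        within = v_hats.mean(axis=0)                    # mean within-imputation variance
        between = theta_hats.var(axis=0, ddof=1)        # between-imputation variance
        return theta_m, within + (1.0 + 1.0 / m) * between

    # Example with M = 5 imputations of a scalar parameter
    theta_m, v_m = rubins_rules([0.52, 0.48, 0.55, 0.50, 0.47],
                                [0.010, 0.011, 0.009, 0.010, 0.012])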
Missing at Random (MAR)
D is MAR if probability that D values are missing does not depend on these
missing values given observed values.
More formally, let R denote the missingness pattern and write D = (Dobs, Dmis).
E.g. D = (Y, X1, X2, A1, A2),
with Y, X1 and A2 observed, and X2 and A1 missing.
Then R = (1, 1, 0, 0, 1), Dobs = (Y, X1, A2) and Dmis = (X2, A1).
D is MAR if p(R | Dobs, Dmis) = p(R | Dobs).
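For concreteness, R, Dobs and Dmis for a single record can be computed as in this small sketch (hypothetical values; pandas assumed):

    import numpy as np
    import pandas as pd

    # One individual's record D = (Y, X1, X2, A1, A2), with X2 and A1 missing
    d = pd.Series({'Y': 1.0, 'X1': 10.0, 'X2': np.nan, 'A1': np.nan, 'A2': 0.3})

    r = d.notna().astype(int).tolist()   # missingness pattern R = [1, 1, 0, 0, 1]
    d_obs = d[d.notna()]                 # Dobs = (Y, X1, A2)
    d_mis = d[d.isna()]                  # Dmis = (X2, A1)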
Comparing IPW and MI
• IPW only uses information on complete cases
• MI uses data on all subjects
• IPW assumes model for probability of complete case (‘missingness model’)
• MI assumes model for joint distribution of data (‘imputation model’)
• MI more efficient if some incomplete cases have some data
• IPW may be preferred if incomplete cases provide little or no information
Why Combine IPW and MI?
Have sampling weights (IPW) and want to impute missing data (MI).
Even without sampling weights, might want to use IPW/MI, because neither
pure IPW nor pure MI appeals.
• Do not want to discard incomplete case with few missing values.
• Do not want to impute data in people with lots of missing values.
IPW/MI has been used (e.g. Priebe et al., 2004; Stansfeld et al., 2008).
The IPW/MI method
Want to regress Y on X.
A is auxiliary variables (used in imputation model but not analysis model).
D = (Y, X, A).
R is missingness pattern of D.
W^S is the sampling weight.
Let I = I(R) be a binary function of R.
An individual is included in the analysis if and only if sampled and I = 1.
Let H be variables that predict I given sampled
(assume H is fully observed).
Let W^I(H) be a model for 1/P(I = 1 | H, sampled).
Let W = W(H) = W^S × W^I(H).
W corrects for the varying probabilities of being (1) sampled and (2) included in the
analysis given sampled.
Specify model for p(D | I = 1).
Fit this imputation model to individuals with I = 1 and impute missing D
values in individuals with I = 1.
Create M imputed datasets, each containing only individuals with I = 1.
Fit the analysis model to each of the M imputed datasets, using weights W.
Obtain parameter estimates θ̂^(1), ..., θ̂^(M).
Calculate θ̂M = Mean(θ̂^(1), ..., θ̂^(M)).
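A minimal end-to-end sketch of these steps in Python (illustrative only: the column names y, x, a, h, w_s are hypothetical, I is taken here to be the indicator that x and a are both observed, and the imputation step uses a crude approximation to a posterior draw rather than a full Bayesian model):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    def ipw_mi(df, M=10, seed=0):
        """IPW/MI sketch: df has one row per sampled individual, with columns y, x, a
        (possibly missing), h (fully observed predictor of inclusion) and sampling
        weight w_s."""
        rng = np.random.default_rng(seed)
        d = df.copy()

        # 1. Define I = I(R): here, keep individuals with x and a both observed
        d['incl'] = d[['x', 'a']].notna().all(axis=1).astype(int)

        # 2. Model 1/P(I = 1 | H, sampled) and form W = W^S x W^I(H)
        inc_fit = smf.glm('incl ~ h', data=d, family=sm.families.Binomial()).fit()
        d['w'] = d['w_s'] / inc_fit.predict(d)

        # 3. Impute missing y among I = 1 only, from a normal linear regression of
        #    y on (x, a, w); including w in the imputation model addresses condition 5
        kept = d[d['incl'] == 1].copy()
        obs = kept[kept['y'].notna()]
        X_obs = sm.add_constant(obs[['x', 'a', 'w']])
        imp_fit = sm.OLS(obs['y'], X_obs).fit()

        estimates = []
        for _ in range(M):
            imp = kept.copy()
            miss = imp['y'].isna()
            if miss.any():
                # Crude stand-in for a draw from the posterior predictive distribution
                beta = rng.multivariate_normal(imp_fit.params, imp_fit.cov_params())
                X_mis = sm.add_constant(imp.loc[miss, ['x', 'a', 'w']], has_constant='add')
                sigma = np.sqrt(imp_fit.mse_resid)
                imp.loc[miss, 'y'] = X_mis.to_numpy() @ beta + rng.normal(0, sigma, miss.sum())
            # 4. Fit the weighted analysis model to this imputed dataset
            estimates.append(smf.wls('y ~ x', data=imp, weights=imp['w']).fit().params)

        # 5. theta_hat_M = mean of the M parameter estimates
        return pd.concat(estimates, axis=1).mean(axis=1)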
Consistency of θ̂M
We proved that if
1. the model for P(I = 1 | H, sampled) (i.e. for W^I(H)) is correctly specified,
2. P(I = 1 | D, H, sampled) = P(I = 1 | H, sampled),
3. the model for p(D | I = 1) is correctly specified,
4. p(R | D, W, I = 1) = p(R | Dobs, W, I = 1), and
5. Dmis ⊥⊥ W | Dobs, R, I = 1,
then, when M = ∞, θ̂M consistently estimates θ.
Conditions 1 and 2 are the conditions for IPW.
Conditions 3 and 4 are the conditions for MI.
If condition 5 is violated, the distribution of D varies with W, and fitting the
imputation model gives more weight to individuals with small W.
Condition 5 can be satisfied by including W or (H, W^S) in D.
Is Rubin’s Rules Variance Estimator Valid for IPW/MI?
θ̂^(m) is the parameter estimate from the mth imputed dataset.
θ̂M = Mean(θ̂^(1), ..., θ̂^(M)) is the overall parameter estimate.
Rubin's Rules for MI give an estimator V̂M of Var(θ̂M):

V̂M = Mean(V̂^(1), ..., V̂^(M)) + (1 + 1/M) × Variance(θ̂^(1), ..., θ̂^(M))
When θ̂^(m) is a maximum likelihood estimator (MLE), V̂M is asymptotically unbiased.
Robins and Wang (2000) and Nielsen (2003) give examples of asymptotically
biased V̂M when θ̂^(m) is not an MLE.
In IPW/MI, θ̂^(m) is an IPW estimator.
Linear regression with Imputed Outcome
Analysis is linear regression of Y on X.
A is auxiliary variables.
Let I = 1 if (X, A) is complete.
Missing Y imputed in individuals with I = 1 using linear regression of Y on
(X, A) with non-informative priors.
θ̂ is weighted least squares estimator and V̂ is sandwich estimator of variance.
Assume weights W are known.
We proved that if the imputation model is correctly specified and
P(Y observed | Y, X, A, W, I = 1) = P(Y observed | X, A, W, I = 1), then
1. θ̂M is a consistent estimator of θ, and
2. if A includes W and W–X interactions, V̂M is an asymptotically unbiased
estimator of Var(θ̂M).
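As a sketch of what point 2 means in practice (hypothetical columns x, a, w, restricted to individuals with I = 1), the imputation model's design matrix would be augmented with W and a W–X interaction term:

    import pandas as pd
    import statsmodels.api as sm

    def imputation_design(kept: pd.DataFrame) -> pd.DataFrame:
        """Design matrix for imputing Y among I = 1: covariates and auxiliary
        variables augmented with W and a W*X interaction column."""
        Z = kept[['x', 'a', 'w']].copy()
        Z['w_x'] = kept['w'] * kept['x']       # W-X interaction term
        return sm.add_constant(Z)

    # Fit the normal linear imputation model for Y on this design among rows with Y
    # observed, then draw imputations for the missing Y values as in the earlier sketch.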
Other Scenarios
No further theoretical results, but simulation studies suggest the bias in Rubin's
Rules variance estimator is small for
• linear regression with an imputed covariate
• logistic regression with an imputed outcome or covariate
Application to 1958 British Birth Cohort
• 17638 individuals born in UK during one week in 1958
16334 still alive at age 45
9014 (55%) participated in biomedical survey
• Thomas et al. (2007) regressed blood glucose (high/normal) at age 45 on
BMI and waist size at age 45 and variables measured at birth
7518 had observed glucose, BMI and waist size
Thomas et al. imputed missing birth variables using MICE
• But are 7518 representative of 16334?
We used IPW/MI
• Need model for P (glucose, BMI & waist size complete)
Used predictors measured at birth, age 7 and 11 (e.g. birth weight, class)
Disadvantaged less likely to be complete for glucose, BMI & waist size
Predictor of               Thomas           IPW/MI           Full MI
High Glucose            log OR    SE     log OR    SE     log OR    SE
short gestation           0.45   0.22      0.46   0.23      0.44   0.20
pre-eclampsia             0.47   0.26      0.55   0.27      0.47   0.25
smoking in pregnancy      0.02   0.14      0.04   0.14      0.04   0.14
mother overweight         0.28   0.15      0.35   0.15      0.18   0.12
manual at birth           0.37   0.17      0.44   0.18      0.39   0.17
low birth weight         −0.30   0.09     −0.30   0.09     −0.32   0.09
BMI age 45                0.04   0.02      0.02   0.02      0.03   0.02
waist (cm) age 45         0.07   0.01      0.07   0.01      0.07   0.01
Stronger relation between glucose and pre-eclampsia/mother overweight in the
disadvantaged than in the advantaged.
When Full MI is used, the relation is similar in individuals with imputed glucose.
Ordinary IPW uses only 5673 individuals (75% of 7518) with complete birth data.
Conclusions
• IPW/MI needed if using MI with sampling weights.
• Some researchers not confident about imputing large blocks of missing
data and may prefer IPW/MI even when no sampling weights.
• Typically less efficient than full MI, but relies less on imputation model.
• IPW/MI could be used as a check for full MI.
• Rubin’s Rules variance estimator asymptotically unbiased for IPW/MI
when imputing missing outcome in linear regression.
• IPW (and so IPW/MI) not without problems: missingness models can be
misspecified (but arguably are easier to check than imputation model);
IPW has problems handling incomplete predictors of weights (H).
Seaman SR et al. Combining Multiple Imputation and Inverse-Probability
Weighting. Biometrics 68: 129-137 (2012).
Question: Why might this be useful in practice? Why not just regress Y on
X using complete cases, inversely weighting by W?
This will give consistent estimation if the analysis model is correctly specified and
P(Y observed | Y, X, W, I = 1) = P(Y observed | X, W, I = 1).
Reason 1: Perhaps missingness of Y depends on Y given X and W, but is
independent of Y given X, A and W ("making MAR more plausible by
including auxiliary information").
Reason 2: Greater efficiency if X and A are used to impute Y ("using auxiliary
information to reduce uncertainty").
Question: Why should A include W and the W–X interactions for asymptotically
unbiased / consistent variance estimation?
Meng (1994) showed that when the imputer assumes more than the analyst and this
assumption is correct, V̂M over-estimates Var(θ̂M).
Suppose X = 1 (so θ is the population mean) and there are two values of W: a and b.
θ̂ corresponds to stratifying the sample by W, calculating the mean in each stratum,
and then calculating a weighted average of these two means.
So the analysis model does not assume the population mean is the same in both strata.
If the imputation model does not include W, it assumes the population mean is the
same in both strata.
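A small numeric illustration of this estimator (made-up values): with X = 1, the weighted estimate of θ is the W-weighted mean of Y, which equals the weighted average of the two stratum means.

    import numpy as np

    # Illustrative data: two strata defined by the weight values a = 2 and b = 5
    y = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
    w = np.array([2.0, 2.0, 2.0, 5.0, 5.0])

    # Weighted mean of Y (the IPW estimate of the population mean when X = 1)
    theta_hat = np.average(y, weights=w)

    # Equivalent: stratify by W, take each stratum mean, then average the stratum
    # means with weights proportional to the total weight in each stratum
    stratum_means = np.array([y[w == 2.0].mean(), y[w == 5.0].mean()])
    stratum_totals = np.array([w[w == 2.0].sum(), w[w == 5.0].sum()])
    theta_hat_strat = np.average(stratum_means, weights=stratum_totals)

    assert np.isclose(theta_hat, theta_hat_strat)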