Combining Multiple Imputation and Inverse Probability Weighting

Shaun Seaman (1), Ian White (1), Andrew Copas (2,3), Leah Li (4)

(1) MRC Biostatistics Unit, Cambridge
(2) MRC Clinical Trials Unit, London
(3) UCL Research Department of Infection and Public Health, London
(4) MRC Centre of Epidemiology for Child Health, UCL Institute of Child Health, London

Overview

• Inverse probability weighting (IPW)
• Multiple imputation (MI)
• Comparing IPW and MI
• Why combine IPW and MI?
• Properties of IPW/MI parameter and variance estimators
• Application to the 1958 British Birth Cohort
• Conclusions

Complete Case Analysis

Suppose the model of interest is the regression of Y on X. Some values of Y and/or X may be missing.

Complete-case analysis is valid if the complete cases are representative of the population; otherwise it is typically biased.

Inverse Probability Weighting (IPW)

Even when there are no missing data, the sample may not be representative of the population, owing to unequal sampling fractions. Sampling weights are used to make the sample representative of the population: the sampling weight is W^S ∝ 1/P(individual sampled). This is a form of IPW.

IPW is also used for missing data: the complete cases are weighted so that they are representative of the whole sample. Let H be variables that predict whether an individual is a complete case, and weight each complete case by W^C = 1/P(complete case | H).

IPW gives valid inference if P(complete case | H, X, Y) = P(complete case | H).

P(complete case | H) is usually unknown, so a model for it is specified and estimated from the observed data.

Multiple Imputation (MI)

Example: a dataset with missing values ('?') and two imputed versions of it.

       Original data        Imputed dataset 1    Imputed dataset 2
       Y    X1    X2        Y    X1    X2        Y    X1    X2
  1    1    10    20.1      1    10    20.1      1    10    20.1
  2    1     8    15.2      1     8    15.2      1     8    15.2
  3    0     ?     4.7      0     2.3   4.7      0     1.9   4.7
  4    0     2     3.9      0     2     3.9      0     2     3.9
  5    1     ?      ?       1     9.3  16.9      1    10.5  19.3
  6    0    -1      ?       0    -1    -2.3      0    -1    -1.7

Let D = (Y, X, A), where A is a set of auxiliary variables.
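As a quick numerical illustration of the IPW idea for missing data: a minimal sketch with simulated data, in which a complete-case mean is biased but the inverse-probability-weighted mean is not. All variables, probabilities and values below are invented for illustration; the missingness model is estimated nonparametrically within strata of H.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data: H is fully observed and predicts both Y and completeness.
H = rng.integers(0, 2, size=n)
Y = H + rng.normal(size=n)          # population mean of Y is 0.5

# Y is observed less often when H = 1, so complete cases under-represent H = 1.
complete = rng.random(n) < np.where(H == 1, 0.4, 0.9)

# Complete-case mean: biased, because the complete cases are unrepresentative.
cc_mean = Y[complete].mean()

# Missingness model: P(complete case | H) estimated empirically within strata
# of H, giving each complete case the weight W^C = 1 / P-hat(complete | H).
w = np.empty(n)
for h in (0, 1):
    w[H == h] = 1.0 / complete[H == h].mean()

# IPW mean: complete cases re-weighted so they represent the whole sample.
ipw_mean = np.sum(w[complete] * Y[complete]) / np.sum(w[complete])
```

Here the complete-case mean lands well below 0.5, while the weighted mean recovers it.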
• Specify a Bayesian model for the joint distribution of D, with non-informative priors
• Fit the model to the observed data
• Sample the missing data from the posterior predictive distribution
• Repeat M times to generate M complete datasets
• Analyse each complete dataset, obtaining θ̂^(1), ..., θ̂^(M) and V̂^(1), ..., V̂^(M)
• Combine the M estimates using Rubin's Rules:

    θ̂_M = Mean(θ̂^(1), ..., θ̂^(M))
    V̂_M = Mean(V̂^(1), ..., V̂^(M)) + (1 + 1/M) × Variance(θ̂^(1), ..., θ̂^(M))

MI is valid when D is missing at random (MAR).

Missing at Random (MAR)

D is MAR if the probability that D values are missing does not depend on those missing values, given the observed values.

More formally, let R denote the missingness pattern and write D = (D_obs, D_mis). E.g. if D = (Y, X1, X2, A1, A2), with Y, X1 and A2 observed and X2 and A1 missing, then R = (1, 1, 0, 0, 1), D_obs = (Y, X1, A2) and D_mis = (X2, A1).

D is MAR if p(R | D_obs, D_mis) = p(R | D_obs).

Comparing IPW and MI

• IPW uses information only on the complete cases
• MI uses data on all subjects
• IPW assumes a model for the probability of being a complete case (the 'missingness model')
• MI assumes a model for the joint distribution of the data (the 'imputation model')
• MI is more efficient if some incomplete cases have some data
• IPW may be preferred if incomplete cases provide little or no information

Why Combine IPW and MI?

You may have sampling weights (IPW) and want to impute missing data (MI). Even without sampling weights, you might want to use IPW/MI, because neither pure IPW nor pure MI appeals:

• you do not want to discard incomplete cases with few missing values;
• you do not want to impute data for people with many missing values.

IPW/MI has been used in practice (e.g. Priebe et al., 2004; Stansfeld et al., 2008).

The IPW/MI Method

We want to regress Y on X. A is a set of auxiliary variables (used in the imputation model but not in the analysis model). D = (Y, X, A), R is the missingness pattern of D, and W^S is the sampling weight.

Let I = I(R) be a binary function of R.
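The Rubin's Rules combination above can be written as a small helper function. This is a sketch for a scalar parameter; the function name and the example numbers are illustrative.

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Pool M point estimates and their variances using Rubin's Rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    M = len(estimates)
    theta_M = estimates.mean()            # pooled point estimate
    within = variances.mean()             # average within-imputation variance
    between = estimates.var(ddof=1)       # between-imputation variance
    V_M = within + (1 + 1 / M) * between  # total variance
    return theta_M, V_M

# Illustrative values for M = 5 imputations.
theta, V = rubins_rules([1.1, 0.9, 1.0, 1.2, 0.8],
                        [0.04, 0.05, 0.04, 0.05, 0.04])
```

The between-imputation term (1 + 1/M) × Variance is what distinguishes V̂_M from a naive average of the V̂^(m).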
An individual is included in the analysis if and only if he or she is sampled and I = 1.

Let H be variables that predict I given sampled (assume H is fully observed). Let W^I(H) be a model for 1/P(I = 1 | H, sampled), and let W = W(H) = W^S × W^I(H). W corrects for the varying probabilities of being 1) sampled and 2) included in the analysis given sampled.

Specify a model for p(D | I = 1). Fit this imputation model to the individuals with I = 1 and impute the missing D values in those individuals, creating M imputed datasets, each containing only the individuals with I = 1.

Fit the analysis model to each of the M imputed datasets, using the weights W, and obtain parameter estimates θ̂^(1), ..., θ̂^(M). Calculate θ̂_M = Mean(θ̂^(1), ..., θ̂^(M)).

Consistency of θ̂_M

We proved that if

1. the model for W^I(H) = P(I = 1 | H, sampled) is correctly specified,
2. P(I = 1 | D, H, sampled) = P(I = 1 | H, sampled),
3. the model for p(D | I = 1) is correctly specified,
4. p(R | D, W, I = 1) = p(R | D_obs, W, I = 1), and
5. D_mis ⊥⊥ W | D_obs, R, I = 1,

then, when M = ∞, θ̂_M consistently estimates θ.

Conditions 1 and 2 are the conditions for IPW; conditions 3 and 4 are the conditions for MI.

If condition 5 is violated, the distribution of D varies with W, and the fitting of the imputation model gives more weight to individuals with small W. Condition 5 can be satisfied by including W, or (H, W^S), in D.

Is Rubin's Rules Variance Estimator Valid for IPW/MI?

θ̂^(m) is the parameter estimate from the mth imputed dataset, and θ̂_M = Mean(θ̂^(1), ..., θ̂^(M)) is the overall parameter estimate. Rubin's Rules for MI give an estimator V̂_M of Var(θ̂_M):

    V̂_M = Mean(V̂^(1), ..., V̂^(M)) + (1 + 1/M) × Variance(θ̂^(1), ..., θ̂^(M))

When θ̂^(m) is the maximum likelihood estimator, V̂_M is asymptotically unbiased. Robins and Wang (2000) and Nielsen (2003) give examples of asymptotically biased V̂_M when θ̂^(m) is not the MLE. In IPW/MI, θ̂^(m) is an IPW estimator.

Linear Regression with Imputed Outcome

The analysis is a linear regression of Y on X, and A is a set of auxiliary variables.
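The IPW/MI recipe described above can be sketched end-to-end on simulated data. Everything here is an illustrative assumption rather than the authors' code: the variable names, the data-generating mechanism, and in particular the imputation draw, which is simplified by ignoring parameter uncertainty (proper MI would also draw the imputation-model parameters from their posterior).

```python
import numpy as np

def wls(Xmat, y, w):
    """Weighted least squares via the square-root-of-weights trick."""
    sw = np.sqrt(w)[:, None]
    return np.linalg.lstsq(Xmat * sw, y * np.sqrt(w), rcond=None)[0]

rng = np.random.default_rng(1)
n, M = 4000, 20

WS = rng.choice([1.0, 2.0], size=n)   # known sampling weights W^S
H = rng.integers(0, 2, size=n)        # fully observed predictor of inclusion
X = rng.normal(size=n)
A = X + 0.5 * rng.normal(size=n)      # auxiliary variable (imputation model only)
Y = 1.0 + 2.0 * X + rng.normal(size=n)

# Inclusion indicator I depends on H only (condition 2).
I = rng.random(n) < np.where(H == 1, 0.5, 0.9)

# W = W^S * W^I(H), with W^I estimated empirically within strata of H.
WI = np.empty(n)
for h in (0, 1):
    WI[H == h] = 1.0 / I[H == h].mean()
W = WS * WI

# Among the I = 1 individuals, make some Y values missing.
sub = np.flatnonzero(I)
obs = rng.random(sub.size) >= 0.3     # True where Y is observed

# Imputation model: linear regression of Y on (X, A), fitted to observed Y.
Z = np.column_stack([np.ones(sub.size), X[sub], A[sub]])
beta_imp = np.linalg.lstsq(Z[obs], Y[sub][obs], rcond=None)[0]
sigma = (Y[sub][obs] - Z[obs] @ beta_imp).std(ddof=3)

estimates = []
for m in range(M):
    # Simplified predictive draw (parameter uncertainty ignored, see above).
    Yimp = Y[sub].copy()
    Yimp[~obs] = Z[~obs] @ beta_imp + sigma * rng.normal(size=(~obs).sum())
    # Analysis model: weighted regression of Y on X with the weights W.
    Xmat = np.column_stack([np.ones(sub.size), X[sub]])
    estimates.append(wls(Xmat, Yimp, W[sub]))

theta_M = np.mean(estimates, axis=0)  # close to the true (intercept, slope)
```

Only the individuals with I = 1 are ever imputed or analysed, and the same weights W enter every one of the M weighted fits, matching the procedure above.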
Let I = 1 if (X, A) is complete. The missing Y values are imputed in individuals with I = 1 using a linear regression of Y on (X, A) with non-informative priors. θ̂ is the weighted least squares estimator and V̂ is the sandwich estimator of its variance. Assume the weights W are known.

We proved that if the imputation model is correctly specified and

    P(Y observed | Y, X, A, W, I = 1) = P(Y observed | X, A, W, I = 1),

then

1. θ̂_M is a consistent estimator of θ;
2. if A includes W and the W–X interactions, V̂_M is an asymptotically unbiased estimator of Var(θ̂_M).

Other Scenarios

We have no further theoretical results, but simulation studies suggest that the bias of the Rubin's Rules variance estimator is small for

• linear regression with an imputed covariate;
• logistic regression with an imputed outcome or covariate.

Application to the 1958 British Birth Cohort

• 17638 individuals were born in the UK during one week in 1958; 16334 were still alive at age 45, and 9014 (55%) participated in a biomedical survey.
• Thomas et al. (2007) regressed blood glucose (high/normal) at age 45 on BMI and waist size at age 45 and on variables measured at birth. 7518 individuals had observed glucose, BMI and waist size; Thomas et al. imputed the missing birth variables using MICE.
• But are the 7518 representative of the 16334? We used IPW/MI.
• This requires a model for P(glucose, BMI & waist size complete). We used predictors measured at birth and at ages 7 and 11 (e.g.
birth weight, class). Disadvantaged individuals were less likely to be complete for glucose, BMI & waist size.

Results: predictors of high glucose.

                          Thomas           IPW/MI           Full MI
  Predictor           log OR    SE    log OR    SE    log OR    SE
  short gestation       0.45  0.22      0.46  0.23      0.44  0.20
  pre-eclampsia         0.47  0.26      0.55  0.27      0.47  0.25
  smoke in pregnancy    0.02  0.14      0.04  0.14      0.04  0.14
  mother overweight     0.28  0.15      0.35  0.15      0.18  0.12
  manual at birth       0.37  0.17      0.44  0.18      0.39  0.17
  low birth weight     -0.30  0.09     -0.30  0.09     -0.32  0.09
  BMI age 45            0.04  0.02      0.02  0.02      0.03  0.02
  waist (cm) age 45     0.07  0.01      0.07  0.01      0.07  0.01

There is a stronger relation between glucose and pre-eclampsia/mother overweight in the disadvantaged than in the advantaged. When Full MI is used, the relation is similar in the individuals with imputed glucose. Ordinary IPW uses only the 5673 individuals (75% of 7518) with complete birth data.

Conclusions

• IPW/MI is needed if using MI with sampling weights.
• Some researchers are not confident about imputing large blocks of missing data and may prefer IPW/MI even when there are no sampling weights.
• IPW/MI is typically less efficient than full MI, but relies less on the imputation model.
• IPW/MI could be used as a check on full MI.
• The Rubin's Rules variance estimator is asymptotically unbiased for IPW/MI when imputing a missing outcome in linear regression.
• IPW (and so IPW/MI) is not without problems: missingness models can be misspecified (but are arguably easier to check than an imputation model), and IPW has problems handling incomplete predictors of the weights (H).

Reference: Seaman SR et al. Combining Multiple Imputation and Inverse-Probability Weighting. Biometrics 68: 129-137 (2012).

Question: Why might this be useful in practice? Why not just regress Y on X using the complete cases, inversely weighting by W?
This gives consistent estimation if the analysis model is correctly specified and

    P(Y observed | Y, X, W, I = 1) = P(Y observed | X, W, I = 1).

Reason 1: Perhaps the missingness of Y depends on Y given X and W, but is independent of Y given X, A and W ("making MAR more plausible by including auxiliary information").

Reason 2: There is greater efficiency if X and A are used to impute Y ("using auxiliary information to reduce uncertainty").

Question: Why should A include W and the W–X interactions for asymptotically unbiased/consistent variance estimation?

Meng (1994) showed that when the imputer assumes more than the analyst and this assumption is correct, V̂_M over-estimates Var(θ̂_M).

Suppose X = 1 (so θ is the population mean) and there are two values of W: a and b. Then θ̂ corresponds to stratifying the sample by W, calculating the mean in each stratum, and calculating a weighted average of these two means. So the analysis model does not assume the population mean is the same in both strata. But if the imputation model does not include W, it assumes the population mean is the same in both strata.
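The two-stratum example can be checked numerically: with X = 1, the IPW estimator is exactly the weighted average of the two stratum means, with each stratum weighted by its total weight. The stratum sizes, means and weight values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two weight strata with different population means (illustrative numbers).
a, b = 2.0, 1.0                       # the two possible values of W
ya = rng.normal(0.0, 1.0, size=300)   # outcomes in the W = a stratum
yb = rng.normal(1.0, 1.0, size=700)   # outcomes in the W = b stratum

y = np.concatenate([ya, yb])
w = np.concatenate([np.full(300, a), np.full(700, b)])

# The IPW estimator of the mean (analysis model: X = 1) ...
theta_hat = np.sum(w * y) / np.sum(w)

# ... equals the weighted average of the two stratum means, each stratum
# weighted by its total weight. The analysis model thus allows the strata
# to have different means; an imputation model without W does not.
strat_avg = (300 * a * ya.mean() + 700 * b * yb.mean()) / (300 * a + 700 * b)
```

Because the imputation model without W pools the strata toward a common mean while the analysis keeps them apart, the imputer assumes more than the analyst, which is exactly Meng's setting for an over-estimated V̂_M.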