Introduction to Multiple Imputation
CFDR Workshop Series
Spring 2008
Outline
• Missing data mechanisms
• What is Multiple Imputation?
• SAS Proc MI, Proc MIANALYZE
• Stata ICE, MICOMBINE
• SAS IVEware
• What’s the diff?
• Problems with categorical imputation
Missing data mechanisms
• Missing Completely At Random (MCAR)
– The probability of missingness doesn't depend on anything, observed or unobserved.
• Missing At Random (MAR)
– The probability of missingness does not depend on the unobserved value of the missing variable, but it can depend on any of the other variables in your dataset.
• Not Missing At Random (NMAR)
– The probability of missingness depends on the unobserved value of the missing variable itself.
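The three mechanisms can be illustrated with a small simulation. This is only a sketch under assumed settings: the variables `x` and `y`, the missingness probabilities, and the function name `simulate_missingness` are made up for illustration and do not come from the workshop data. It compares the mean of `y` among the cases that remain observed with the mean in the complete data.

```python
import random


def simulate_missingness(n=10000, seed=2008):
    """Compare the observed-case mean of y under MCAR, MAR and NMAR.

    All settings here are illustrative assumptions, not workshop data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)       # x is always observed
        y = x + rng.gauss(0, 1)   # y is the variable that goes missing
        rows.append((x, y))

    def observed_mean(p_missing):
        # Keep y only when a uniform draw clears the missingness probability.
        kept = [y for x, y in rows if rng.random() >= p_missing(x, y)]
        return sum(kept) / len(kept)

    return {
        "complete": sum(y for _, y in rows) / n,
        # MCAR: the same probability for every case.
        "MCAR": observed_mean(lambda x, y: 0.3),
        # MAR: probability depends only on the observed variable x.
        "MAR": observed_mean(lambda x, y: 0.6 if x > 0 else 0.1),
        # NMAR: probability depends on the value of y itself.
        "NMAR": observed_mean(lambda x, y: 0.6 if y > 0 else 0.1),
    }


if __name__ == "__main__":
    for mechanism, mean_y in simulate_missingness().items():
        print(f"{mechanism:8s} mean of observed y = {mean_y:.3f}")
```

In this setup the observed-case mean stays close to the complete-data mean under MCAR and drifts away from it under MAR and NMAR. Under MAR the drift can in principle be corrected using `x`; under NMAR it cannot be corrected from the observed data alone.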
What is Multiple Imputation?
1. Imputation
• Make M = 3 to 10 copies of the incomplete data set, filling in the missing values with conditionally random values
2. Analyses
• Analyze each data set separately
3. Pooling
• Point estimates: average across the M analyses
• Standard errors: combine the within- and between-imputation variances
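The pooling step is usually written out as Rubin's rules. The notation below is an assumption introduced here, not taken from the slides: \hat{Q}_m is the estimate from imputed data set m, and U_m is its squared standard error.

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m
  && \text{pooled point estimate} \\
\bar{U} &= \frac{1}{M}\sum_{m=1}^{M}U_m
  && \text{within-imputation variance} \\
B &= \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m-\bar{Q}\bigr)^2
  && \text{between-imputation variance} \\
T &= \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B,
  \qquad \mathrm{SE}(\bar{Q}) = \sqrt{T}
  && \text{total variance and pooled standard error}
\end{aligned}
```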
1. Imputation: Multiple Copies of Dataset
Original data (missing values shown as .)

Y      X1     X2   X3
44.61  11.37  178  1
54.3   8.65   156  0
49.87  9.22   .    .
.      11.95  176  1
39.44  13.08  174  1
50.54  .      .    1
44.75  11.12  176  0
51.86  10.33  166  0
40.84  10.95  168  .
46.77  10.25  .    .

Imputed copy 1

_I_  Y      X1     X2     X3
1    44.61  11.37  178    1
1    54.3   8.65   156    0
1    49.87  9.22   181.2  0.23
1    39.97  11.95  176    1
1    39.44  13.08  174    1
1    50.54  9.117  168.2  1
1    44.75  11.12  176    0
1    51.86  10.33  166    0
1    40.84  10.95  168    0.756
1    46.77  10.25  185.9  0.632

Imputed copy 2

_I_  Y       X1      X2      X3
2    44.609  11.37   178     1
2    54.297  8.65    156     0
2    49.874  9.22    137.47  0.0666
2    39.849  11.95   176     1
2    39.442  13.08   174     1
2    50.541  9.9192  162.67  1
2    44.754  11.12   176     0
2    51.855  10.33   166     0
2    40.836  10.95   168     0.2288
2    46.774  10.25   184.83  0.0998
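To make the "analyze each copy, then average" idea concrete, the sketch below fits a simple regression of Y on X1 in each of the two imputed copies from this slide and averages the two slopes. The choice of model and the helper name `ols_slope` are assumptions made for illustration; the workshop itself runs the analysis in SAS or Stata.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Y and X1 from the two imputed copies on the slide, keyed by _I_.
copies = {
    1: {
        "Y": [44.61, 54.3, 49.87, 39.97, 39.44, 50.54, 44.75, 51.86, 40.84, 46.77],
        "X1": [11.37, 8.65, 9.22, 11.95, 13.08, 9.117, 11.12, 10.33, 10.95, 10.25],
    },
    2: {
        "Y": [44.609, 54.297, 49.874, 39.849, 39.442, 50.541, 44.754, 51.855, 40.836, 46.774],
        "X1": [11.37, 8.65, 9.22, 11.95, 13.08, 9.9192, 11.12, 10.33, 10.95, 10.25],
    },
}

# Step 2: run the same analysis separately on each copy.
slopes = {i: ols_slope(d["X1"], d["Y"]) for i, d in copies.items()}

# Step 3 (point estimate only): average the per-copy estimates.
pooled_slope = sum(slopes.values()) / len(slopes)

if __name__ == "__main__":
    for i, b in slopes.items():
        print(f"_I_ = {i}: slope of Y on X1 = {b:.4f}")
    print(f"pooled slope = {pooled_slope:.4f}")
```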
Three steps
1. Imputation
• Make M = 2 to 10 copies of the incomplete data set, filling in the missing values with conditionally random values
2. Analyses
• Analyze each data set separately
3. Pooling
• Point estimates: average across the M analyses
• Standard errors: combine variances
What is MI?
• Stata (ICE)
– based on each variable's conditional density
– chained equations
• SAS (Proc MI)
– joint distribution of all the variables
– assumed multivariate normal distribution
• SAS IVEware
– same approach as Stata, with more options
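The chained-equations idea can be sketched for two continuous variables. This is a deliberately simplified illustration, not what ICE or IVEware actually run. It assumes linear regressions only and adds random residual noise to each prediction. It skips the draw of the regression parameters that a proper implementation would include. The function names `fit_line` and `chained_impute` are hypothetical, and the RunTime and RunPulse values are taken from the SAS example in this workshop.

```python
import random


def fit_line(xs, ys):
    """Return (intercept, slope, residual_sd) of a simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (n - 2)) ** 0.5


def chained_impute(data, cycles=10, seed=1234):
    """Impute two variables by cycling one regression per variable.

    data maps a variable name to a list of values, with None for missing.
    Returns one completed copy; call with different seeds for M copies.
    """
    rng = random.Random(seed)
    names = list(data)
    missing = {v: [i for i, val in enumerate(data[v]) if val is None] for v in names}

    # Start by filling every hole with the observed mean of its variable.
    filled = {}
    for v in names:
        observed = [val for val in data[v] if val is not None]
        mean = sum(observed) / len(observed)
        filled[v] = [mean if val is None else val for val in data[v]]

    for _ in range(cycles):
        for target in names:
            if not missing[target]:
                continue
            other = names[1 - names.index(target)]
            rows = [i for i in range(len(filled[target])) if i not in missing[target]]
            a, b, sd = fit_line([filled[other][i] for i in rows],
                                [filled[target][i] for i in rows])
            # Replace each hole with a prediction plus random residual noise.
            for i in missing[target]:
                filled[target][i] = a + b * filled[other][i] + rng.gauss(0, sd)
    return filled


run_time = [11.37, 8.65, 9.22, 11.95, 13.08, None, 11.12, 10.33, 10.95, 10.25, 12.63, 9.63]
run_pulse = [178, 156, None, 176, 174, None, 176, 166, 168, None, 174, 164]

if __name__ == "__main__":
    completed = chained_impute({"RunTime": run_time, "RunPulse": run_pulse})
    for t, p in zip(completed["RunTime"], completed["RunPulse"]):
        print(f"{t:8.3f} {p:8.2f}")
```

The joint-model approach used by Proc MI instead draws all missing values at once from an assumed multivariate normal distribution, so it needs no per-variable regression cycle.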
Stata Example
• ICE to impute
– Regression commands may be logistic, mlogit, ologit, or regress.
• MICOMBINE to analyze and combine the results.
– Supported regression cmds are clogit, cnreg, glm, logistic, logit, mlogit, ologit, oprobit, poisson, probit, qreg, regress, rreg, stcox, streg, or xtgee.
• Easy to use, nice documentation
SAS example
Oxygen  RunTime  RunPulse
44.609  11.37    178
54.297  8.65     156
49.874  9.22     .
.       11.95    176
39.442  13.08    174
50.541  .        .
44.754  11.12    176
51.855  10.33    166
40.836  10.95    168
46.774  10.25    .
39.407  12.63    174
45.441  9.63     164
Step 1: Proc MI
• Typical syntax:
  proc mi data=mi_example out=outmi seed=1234;
    var Oxygen RunTime RunPulse;
  run;
Step 2: Run Models
  proc reg data=outmi outest=outreg covout noprint;
    model Oxygen = RunTime RunPulse;
    by _Imputation_;
  run;
Note that the regression output is stored as dataset “outreg”.
Supported PROCs: REG, LOGISTIC, GENMOD, MIXED, GLM
Parameter Estimates & Covariance Matrices
  proc print data=outreg(obs=8);
    var _Imputation_ _Type_ _Name_ Intercept RunTime RunPulse;
  run;

Obs  _Imputation_  _TYPE_  _NAME_     Intercept  RunTime   RunPulse
1    1             PARMS              82.9694    -2.44422  -0.06121
2    1             COV     Intercept  65.1698    0.26463   -0.39518
3    1             COV     RunTime    0.2646     0.14005   -0.0101
4    1             COV     RunPulse   -0.3952    -0.0101   0.00293
5    2             PARMS              85.1831    -3.0485   -0.03452
6    2             COV     Intercept  85.3406    -0.44671  -0.46786
7    2             COV     RunTime    -0.4467    0.13629   -0.00581
8    2             COV     RunPulse   -0.4679    -0.00581  0.00308
Step 3: Proc MIANALYZE
  proc mianalyze data=outreg;
    modeleffects Intercept RunTime RunPulse;
  run;

Multiple Imputation Parameter Estimates

Parameter  Estimate   Std Error  95% Confidence Limits  DF
Intercept  92.696519  12.780914  65.35758   120.0355    14.412
RunTime    -2.915452  0.48346    -3.90873   -1.9222     26.264
RunPulse   -0.086795  0.070425   -0.23209   0.0585      24.163

Parameter  Minimum    Maximum     Pr > |t|
Intercept  82.969385  101.288118  <.0001
RunTime    -3.146336  -2.444217   <.0001
RunPulse   -0.13547   -0.034519   0.2296
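The combining that Proc MIANALYZE performs can be approximated by hand with Rubin's rules. The sketch below pools the RunTime coefficient from only the two imputations printed in the outreg listing (estimates -2.44422 and -3.0485, variances 0.14005 and 0.13629). The MIANALYZE output above pooled more imputations than those two, so the numbers will not match it. The function name `pool` and the degrees-of-freedom formula (the classic large-sample version) are assumptions for illustration.

```python
import math


def pool(estimates, variances):
    """Pool per-imputation estimates and squared standard errors.

    Uses Rubin's rules; requires at least two imputations whose
    estimates are not all identical (otherwise df is undefined).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                               # pooled estimate
    within = sum(variances) / m                              # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between                   # total variance
    df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return {"estimate": q_bar, "std_error": math.sqrt(total), "df": df}


# RunTime coefficient and its variance from imputations 1 and 2 in outreg.
runtime = pool([-2.44422, -3.0485], [0.14005, 0.13629])

if __name__ == "__main__":
    print(runtime)
```

With only two imputations the between-imputation variance dominates and the degrees of freedom are very small, which is one reason more than two copies are normally used.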
Irritating Parameter Estimates & Covariance Matrices
• Syntax depends on which procedure you used in the previous step:
• proc mianalyze data=parmcov;
(or)
• proc mianalyze parms=parmsdat covb=covbdat;
(or)
• proc mianalyze parms=parmsdat xpxi=xpxidat;
PROCs: REG, GENMOD, LOGISTIC, MIXED, GLM.
SAS IVEware: 4 Components
1. IMPUTE -- nice options.
2. DESCRIBE estimates the population means, proportions, subgroup differences, contrasts and linear combinations of means and proportions. A Taylor Series approach is used to obtain variance estimates appropriate for a user-specified complex sample design.
3. REGRESS fits linear, logistic, polytomous, Poisson, Tobit and proportional hazards regression models for data resulting from a complex sample design.
4. SASMOD allows users to take into account complex sample design features when analyzing data with several SAS procedures. SAS PROCs that can be called: CALIS, CATMOD, GENMOD, LIFEREG, MIXED, NLIN, PHREG, and PROBIT.
IVEware Impute
IMPUTE assumes the variables in the data set are one of the following five types:
(1) continuous
(2) binary
(3) categorical (polytomous with more than two categories)
(4) counts
(5) mixed
The types of regression models used are linear, logistic, Poisson, generalized logit or mixed logistic/linear, depending on the type of variable being imputed.
A Few Issues
• Do I impute the dependent variable?
• Which model has more information? The imputation model or the analyst model?
• How many imputations do I need to do?
• Can I impute in one language and analyze in another?
• How do I get summary statistics such as R-squared?
• Can I do this in SPSS?
• Where do I go with questions?
Thanks
Next up:
“COLLATERAL CONSEQUENCES OF VIOLENCE IN DISADVANTAGED NEIGHBORHOODS”
Dr. David Harding
Wednesday, February 13, Noon - 1:00 pm
Accessing and Analyzing Add Health Data
Instructor: Dr. Meredith Porter
Monday, February 25, 12:00-1:00 pm