Missing Data Pt. 2

Guide to Handling Missing Information
• Contacting researchers
• Algebraic recalculations, conversions and approximations
• Imputation method (substituting missing data)
Imputation Method
- Used when recalculations are not possible, e.g. no standard deviation is reported for a study
- Uses available data from other studies or other meta-analyses
Imputation Method
a. Within-study imputation
b. Multiple imputations
Within-study imputation
Method 1. (Means)
$$\tilde{SD}_j = \bar{X}_j \left( \frac{\sum_{i}^{k} SD_i}{\sum_{i}^{k} \bar{X}_i} \right)$$
$\tilde{SD}_j$ = Standard deviation (SD) for missing data from study j
$\bar{X}_j$ = Mean from study with missing SD
$\sum_{i}^{k} SD_i$ = Summation of all known SD from different studies
$\sum_{i}^{k} \bar{X}_i$ = Summation of means from different studies other than j
Assumptions
$$\tilde{SD}_j = \bar{X}_j \left( \frac{\sum_{i}^{k} SD_i}{\sum_{i}^{k} \bar{X}_i} \right)$$
• Assumes the SD-to-mean ratio is on the same scale for all studies
- Experimental scales can differ tremendously between different taxonomic groups or experimental designs
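A minimal Python sketch of the Method 1 ratio imputation, assuming the known SDs and means come from the studies with complete information. The function name `impute_sd_from_means` and the example numbers are hypothetical, chosen only for illustration.

```python
def impute_sd_from_means(mean_j, known_sds, known_means):
    """Impute a missing SD as mean_j * (sum of known SDs / sum of known means)."""
    if not known_sds or len(known_sds) != len(known_means):
        raise ValueError("need matching, non-empty lists of SDs and means")
    total_mean = sum(known_means)
    if total_mean == 0:
        raise ValueError("sum of known means is zero; ratio is undefined")
    return mean_j * sum(known_sds) / total_mean


# Hypothetical data: three complete studies with an SD-to-mean ratio of 0.2.
known_sds = [2.0, 3.0, 5.0]
known_means = [10.0, 15.0, 25.0]
sd_j = impute_sd_from_means(20.0, known_sds, known_means)
print(sd_j)
```

Because the imputed value is just the study mean scaled by a shared SD-to-mean ratio, it is only as good as the same-scale assumption listed above.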
Method 2. (Sample size)
$$\tilde{s}_j = \alpha + \beta n_j$$
- Regression techniques
- For a study that reports sample size but is missing the information needed to calculate the pooled SD (required for Hedges' d).
α = intercept
β = slope of the linear regression of s on n
$n_j$ = observed sample size of the study with missing data
Assumptions
$$\tilde{s}_j = \alpha + \beta n_j$$
• Assumes $n_j$ (observed sample size of the study with missing data) is a good predictor of s.
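The regression imputation of Method 2 can be sketched as follows, assuming α and β are estimated by ordinary least squares from the studies that report both s and n. The helper names `fit_line` and `impute_s_from_n` and the data are hypothetical.

```python
def fit_line(ns, ss):
    """Ordinary least-squares fit of s = alpha + beta * n; returns (alpha, beta)."""
    k = len(ns)
    if k < 2 or k != len(ss):
        raise ValueError("need at least two (n, s) pairs of equal length")
    mean_n = sum(ns) / k
    mean_s = sum(ss) / k
    sxx = sum((n - mean_n) ** 2 for n in ns)
    if sxx == 0:
        raise ValueError("all sample sizes are identical; slope is undefined")
    sxy = sum((n - mean_n) * (s - mean_s) for n, s in zip(ns, ss))
    beta = sxy / sxx
    alpha = mean_s - beta * mean_n
    return alpha, beta


def impute_s_from_n(n_j, alpha, beta):
    """Predict the missing s for a study with sample size n_j."""
    return alpha + beta * n_j


# Hypothetical complete studies lying exactly on s = 1 + 0.2 * n.
ns = [10, 20, 30, 40]
ss = [3.0, 5.0, 7.0, 9.0]
alpha, beta = fit_line(ns, ss)
s_j = impute_s_from_n(25, alpha, beta)
print(alpha, beta, s_j)
```

The prediction is a single point estimate: the uncertainty in α and β is not carried into the imputed value.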
Method 3. (No. of studies)
$$\tilde{s}_j = \frac{\sum_{i}^{K} s_i \sqrt{n_i}}{K \sqrt{n_j}}$$
K = number of studies with complete information on s and n (sample size of individual study)
Method 4. Follmann et al. (1992), Furukawa et al. (2006)
$$\tilde{s}_j = \sqrt{\frac{\sum_{i}^{k} (n_i - 1)\,\sigma_i^2}{\sum_{i}^{k} (n_i - 1)}}$$
$\sigma_i^2$ = variance
$n_i$ = sample size of individual study
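As an illustration of Method 4, the sketch below pools the SDs of the complete studies, weighting each variance by its degrees of freedom (n - 1). The function name `pooled_sd` and the numbers are hypothetical.

```python
import math


def pooled_sd(ns, sds):
    """Square root of the (n - 1)-weighted average of the known variances."""
    if not ns or len(ns) != len(sds):
        raise ValueError("need matching, non-empty lists of n and SD")
    if any(n < 2 for n in ns):
        raise ValueError("each study needs n >= 2")
    numerator = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)


# Hypothetical complete studies: (10 * 4 + 20 * 16) / 30 = 12, so s = sqrt(12).
s_j = pooled_sd([11, 21], [2.0, 4.0])
print(s_j)
```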
Assumptions
• Some degree of homogeneity among the observed SD and $\bar{X}$ across studies
• Assumes information is missing at random and not because of reporting biases (non-random)
- Imputations retain their original units.
- Large variation among estimates will bias imputations.
Multiple imputations
• Use a random sampling approach
• Average over repeated samplings of the missing data to obtain an overall imputed synthesis
Advantage of multiple imputations
• Variability is explicitly modeled, so an imputed value is not treated as a true observation
• By contrast, a single imputation such as $\tilde{s}_j = \alpha + \beta n_j$ does not account for the error associated with α or β.
Methods: Multiple imputations
• Various methods: use maximum likelihood or Bayesian models.
• Requires specialized software
• e.g. Hot deck: used to calculate pooled s when several SD values are missing
- A random sample of s is drawn with replacement from the possible s
- The process is repeated, again sampling with replacement from the possible s
- Repeat until we get "m" complete data sets
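The hot-deck steps above can be sketched in Python as follows, assuming the "possible s" are simply the SDs observed in the other studies and that missing values are marked with `None`. The function name `hot_deck_datasets` and the seed argument are hypothetical conveniences.

```python
import random


def hot_deck_datasets(sds, m, seed=None):
    """Return m completed copies of sds, filling each None with a random
    draw (with replacement) from the observed SDs."""
    rng = random.Random(seed)
    observed = [s for s in sds if s is not None]
    if not observed:
        raise ValueError("no observed SDs to sample from")
    datasets = []
    for _ in range(m):
        completed = [s if s is not None else rng.choice(observed) for s in sds]
        datasets.append(completed)
    return datasets


# Hypothetical data: five studies, two of them missing their SD.
sds = [1.2, None, 0.8, None, 1.5]
completed_sets = hot_deck_datasets(sds, m=3, seed=1)
for completed in completed_sets:
    print(completed)
```

Each completed data set would then be analysed separately before the m results are combined.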
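Methods: Hot deck
For each of the m data sets (l = 1, ..., m):
- Calculate effect size = $\bar{\delta}_l$
- Calculate variance = $\sigma^2(\bar{\delta}_l)$
Pooled effect size:
$$\bar{\delta} = \frac{\sum_{l=1}^{m} \bar{\delta}_l}{m}$$
Variance:
$$\sigma^2(\bar{\delta}) = \frac{\sum_{l=1}^{m} \sigma^2(\bar{\delta}_l)}{m} + \left(1 + \frac{1}{m}\right) \frac{\sum_{l=1}^{m} (\bar{\delta}_l - \bar{\delta})^2}{m - 1}$$
Rubin and Schenker (1991)
If 30% data missing -> m = 3
If 50% data missing -> m = 5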
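A short Python sketch of the pooling step, assuming one effect size and one variance have already been computed for each of the m imputed data sets. The function name `pool_imputations` and the example values are hypothetical.

```python
def pool_imputations(effects, variances):
    """Combine m per-data-set effect sizes and variances into a pooled
    effect size and its total variance (within + inflated between)."""
    m = len(effects)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    pooled = sum(effects) / m
    within = sum(variances) / m
    between = sum((d - pooled) ** 2 for d in effects) / (m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled, total_variance


# Hypothetical results from m = 3 imputed data sets.
effects = [0.4, 0.5, 0.6]
variances = [0.02, 0.02, 0.02]
pooled, total_variance = pool_imputations(effects, variances)
print(pooled, total_variance)
```

The second term is what distinguishes this from a single imputation: disagreement among the m data sets widens the variance of the pooled effect.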
Non-parametric analyses and bootstrapping
• Alternative to Hedges' d
• Uses a weighting scheme
• Does not require SD
• E.g. log response ratio
$$\ln R = \ln\left(\frac{\bar{X}_T}{\bar{X}_C}\right)$$
T = treatment
C = control
If sample size is available but no SD, weight each study by
$$\frac{n_T n_C}{n_T + n_C}$$
which is the inverse of a simplified estimate of the variance $\sigma^2(\ln R)$.
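A minimal sketch of the log response ratio and its sample-size weight, assuming both group means are positive so the logarithm is defined. The function names and the example values are hypothetical.

```python
import math


def log_response_ratio(mean_t, mean_c):
    """lnR = ln(treatment mean / control mean); both means must be positive."""
    if mean_t <= 0 or mean_c <= 0:
        raise ValueError("both means must be positive")
    return math.log(mean_t / mean_c)


def sample_size_weight(n_t, n_c):
    """Weight n_T * n_C / (n_T + n_C), used when no SD is available."""
    if n_t <= 0 or n_c <= 0:
        raise ValueError("sample sizes must be positive")
    return n_t * n_c / (n_t + n_c)


# Hypothetical study: treatment mean 12, control mean 8, n_T = 10, n_C = 15.
ln_r = log_response_ratio(12.0, 8.0)
weight = sample_size_weight(10, 15)
print(ln_r, weight)
```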
Effects of Imputation
• No standardized method for imputation -> potential bias (e.g. Rubin and Schenker 1991)
• Appropriateness of imputed data can be evaluated using a sensitivity analysis
• Benefits despite potential bias
- Improved variance estimate (i.e. smaller CI) compared with excluding studies
- May potentially improve representation of null studies