Missing Data Pt. 2

Guide to Handling Missing Information
• Contacting researchers
• Algebraic recalculations, conversions and approximations
• Imputation method (substituting missing data)
Imputation Method
- When recalculations are not possible
- e.g. no standard deviation for a study
- Use available data from other studies or other meta-analyses
Imputation Method
a. Within-study imputation
b. Multiple imputations
Within-study imputation
Method 1. (Means)
$$\widetilde{SD}_j = \bar{X}_j \left( \frac{\sum_i^k SD_i}{\sum_i^k \bar{X}_i} \right)$$
$\widetilde{SD}_j$ = Standard deviation (SD) for missing data from study j
$\bar{X}_j$ = Mean from study with missing SD
$\sum_i^k SD_i$ = Summation of all known SD from different studies
$\sum_i^k \bar{X}_i$ = Summation of means from different studies other than j
Assumptions
$$\widetilde{SD}_j = \bar{X}_j \left( \frac{\sum_i^k SD_i}{\sum_i^k \bar{X}_i} \right)$$
• Assumes the SD-to-mean ratio is on the same scale for all studies
- Experimental scales can differ tremendously between different taxonomic groups or experimental designs
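The Method 1 formula is simple enough to check by hand. The following is a minimal Python sketch; the function name `impute_sd_from_means` and the numbers are hypothetical and chosen only for illustration.

```python
def impute_sd_from_means(mean_j, known_sds, known_means):
    """Impute a missing SD as mean_j * (sum of known SDs / sum of known means).

    known_sds and known_means come from the other studies (not study j)
    that report both values.
    """
    if not known_sds or len(known_sds) != len(known_means):
        raise ValueError("need matching, non-empty lists of SDs and means")
    return mean_j * sum(known_sds) / sum(known_means)


# Hypothetical studies in which the SD is always 20% of the mean.
known_means = [10.0, 20.0, 30.0]
known_sds = [2.0, 4.0, 6.0]

# Study j reports a mean of 15.0 but no SD.
sd_j = impute_sd_from_means(15.0, known_sds, known_means)
print(sd_j)  # 3.0
```

The imputed value is only as good as the constant SD-to-mean ratio it relies on, which is the assumption listed above.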
Method 2. (Sample size)
$$\tilde{s}_j = \alpha + \beta n_j$$
- Regression techniques
- For studies that report sample size but are missing the information needed to calculate the pooled SD (required for Hedges' d)
α = intercept
β = slope of the linear regression of s on n
$n_j$ = observed sample size of the study with missing data
Assumptions
$$\tilde{s}_j = \alpha + \beta n_j$$
• Assumes n (observed sample size of the study with missing data) is a good predictor of s.
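As a rough illustration of Method 2, the sketch below fits α and β by ordinary least squares on the studies that report both n and s, then predicts s for the study with the missing value. The function names and data are hypothetical, and ordinary least squares is an assumption here, since the slides only say "regression techniques".

```python
def fit_line(n_values, s_values):
    """Ordinary least-squares fit of s = alpha + beta * n."""
    k = len(n_values)
    if k < 2 or k != len(s_values):
        raise ValueError("need at least two complete (n, s) pairs")
    mean_n = sum(n_values) / k
    mean_s = sum(s_values) / k
    sxx = sum((n - mean_n) ** 2 for n in n_values)
    sxy = sum((n - mean_n) * (s - mean_s) for n, s in zip(n_values, s_values))
    beta = sxy / sxx
    alpha = mean_s - beta * mean_n
    return alpha, beta


def impute_sd_from_n(n_j, n_values, s_values):
    """Predict the missing s for a study with sample size n_j."""
    alpha, beta = fit_line(n_values, s_values)
    return alpha + beta * n_j


# Hypothetical complete studies lying exactly on s = 1 + 0.2 * n.
n_values = [10, 20, 30, 40]
s_values = [3.0, 5.0, 7.0, 9.0]

s_j = impute_sd_from_n(25, n_values, s_values)
print(round(s_j, 6))  # 6.0
```

A single prediction like this ignores the uncertainty in the fitted α and β.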
Method 3. (No. of studies)
$$\tilde{s}_j = \frac{\sum_i^K s_i \sqrt{n_i}}{K \sqrt{n_j}}$$
K = number of studies with complete information on s and n (sample size of individual study)
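A small worked sketch of Method 3 follows. It assumes the formula reads as the sum of s_i·√n_i over the K complete studies, divided by K·√n_j; the function name and values are hypothetical.

```python
import math


def impute_sd_method3(n_j, known_sds, known_ns):
    """Impute s_j as sum(s_i * sqrt(n_i)) / (K * sqrt(n_j)).

    known_sds and known_ns come from the K studies that report both
    s and n; n_j is the sample size of the study with the missing s.
    """
    k = len(known_sds)
    if k == 0 or k != len(known_ns):
        raise ValueError("need matching, non-empty lists of s and n")
    numerator = sum(s * math.sqrt(n) for s, n in zip(known_sds, known_ns))
    return numerator / (k * math.sqrt(n_j))


# Hypothetical values: two complete studies and one study with n_j = 4.
s_j = impute_sd_method3(4, [2.0, 3.0], [16, 25])
print(s_j)  # (2*4 + 3*5) / (2*2) = 5.75
```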
Method 4. Follmann et al. (1992), Furukawa et al. (2006)
$$\tilde{s}_j = \sqrt{\frac{\sum_i^k (n_i - 1)\,\sigma_i^2}{\sum_i^k (n_i - 1)}}$$
$\sigma^2$ = variance
n = sample size of individual study
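Method 4 is a pooled standard deviation weighted by degrees of freedom. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
import math


def pooled_sd(known_sds, known_ns):
    """Pooled SD: sqrt( sum((n_i - 1) * s_i^2) / sum(n_i - 1) )."""
    if not known_sds or len(known_sds) != len(known_ns):
        raise ValueError("need matching, non-empty lists of SDs and sample sizes")
    numerator = sum((n - 1) * s ** 2 for s, n in zip(known_sds, known_ns))
    denominator = sum(n - 1 for n in known_ns)
    return math.sqrt(numerator / denominator)


# Hypothetical values: two studies with n = 11 each and SDs of 2 and 4.
s_j = pooled_sd([2.0, 4.0], [11, 11])
print(round(s_j, 4))  # sqrt((10*4 + 10*16) / 20) = sqrt(10) = 3.1623
```

Every study with a missing SD receives the same imputed value under this approach, because the formula does not depend on n_j.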
Assumptions
• Some degree of homogeneity among the observed SD and $\bar{X}$ across studies
• Assumes information is missing at random and not due to reporting biases (non-random)
- Imputations retain their original units.
- Large variations among estimates will bias imputations.
Multiple imputations
• Use random sampling approach
• Average repeated sampling for missing data
Overall imputed synthesis
Advantage of multiple imputations
• Variability is explicitly modeled, so an imputed value is not treated as a true observation
• e.g. a single imputation from $\tilde{s}_j = \alpha + \beta n_j$ does not account for the error associated with α or β.
Methods: Multiple imputations
• Various methods: use maximum likelihood or Bayesian models.
• Requires specialized software
• e.g. Hot deck: to calculate pooled s when several SD values are missing
- A random sample of s is drawn with replacement from the possible s
- The process is repeated, again sampling with replacement from the possible s
- Repeat until we get "m" complete data sets
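The hot-deck steps can be sketched in a few lines of Python. This is an illustrative version only: the function name `hot_deck_datasets`, the use of `None` to mark a missing SD, and the seed are all assumptions rather than part of the original slides.

```python
import random


def hot_deck_datasets(sds, m, seed=None):
    """Build m completed copies of sds, filling each missing value (None)
    with a random draw, with replacement, from the observed values."""
    rng = random.Random(seed)
    donors = [s for s in sds if s is not None]
    if not donors:
        raise ValueError("no observed SD values to sample from")
    datasets = []
    for _ in range(m):
        # Observed values are kept; each gap gets its own independent draw.
        completed = [s if s is not None else rng.choice(donors) for s in sds]
        datasets.append(completed)
    return datasets


# Hypothetical data: two of the five studies did not report an SD.
reported = [2.0, None, 3.5, 2.8, None]
for dataset in hot_deck_datasets(reported, m=3, seed=1):
    print(dataset)
```

Each completed data set is then analysed separately, and the m results are combined afterwards.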
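Methods: Hot deck
For each of the m data sets (l = 1, ..., m):
Calculate effect size = $\delta_l$
Calculate variance = $\sigma^2(\delta_l)$
Pooled effect size:
$$\bar{\delta} = \frac{\sum_{l=1}^{m} \delta_l}{m}$$
Variance:
$$\sigma^2(\bar{\delta}) = \frac{\sum_{l=1}^{m} \sigma^2(\delta_l)}{m} + \left(1 + \frac{1}{m}\right) \frac{\sum_{l=1}^{m} (\delta_l - \bar{\delta})^2}{m - 1}$$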
Rubin and Schenker (1991)
If 30% of data missing -> m = 3
If 50% of data missing -> m = 5
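The pooling step across the m imputed data sets is a short calculation: the mean of the m effect sizes, plus a variance that adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def pool_imputations(effects, variances):
    """Combine effect sizes and variances from m imputed data sets.

    Returns (pooled effect, total variance), where
    total = mean within-imputation variance
            + (1 + 1/m) * between-imputation variance.
    """
    m = len(effects)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = sum(effects) / m
    within = sum(variances) / m
    between = sum((d - pooled) ** 2 for d in effects) / (m - 1)
    return pooled, within + (1 + 1 / m) * between


# Hypothetical results from m = 3 imputed data sets.
effects = [0.4, 0.5, 0.6]
variances = [0.04, 0.05, 0.06]
pooled, total_var = pool_imputations(effects, variances)
print(round(pooled, 4), round(total_var, 4))  # 0.5 0.0633
```

The between-imputation term is what keeps the imputed values from being treated as if they were observed.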
Non-parametric analyses and bootstrapping
• Alternative to Hedges' d
• Using weighting scheme
• Does not require SD
• E.g. log response ratio
$$\ln R = \ln\left(\frac{\bar{X}_T}{\bar{X}_C}\right)$$
T = treatment
C = control
If sample size available but no SD, weight each study by
$$\frac{n_T n_C}{n_T + n_C}$$
which is the inverse of a simplified estimate of variance, $\sigma^2(\ln R) = \frac{n_T + n_C}{n_T n_C}$
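To make the weighting scheme concrete, the sketch below computes lnR for each study and a weighted mean using n_T·n_C / (n_T + n_C) as the weight. The function names and study values are hypothetical, and the bootstrap step for confidence intervals is not shown.

```python
import math


def log_response_ratio(mean_t, mean_c):
    """lnR = ln(treatment mean / control mean); both means must be positive."""
    return math.log(mean_t / mean_c)


def sample_size_weight(n_t, n_c):
    """Weight based on sample sizes only: n_T * n_C / (n_T + n_C)."""
    return n_t * n_c / (n_t + n_c)


def weighted_mean_lnr(studies):
    """studies: list of (mean_t, mean_c, n_t, n_c) tuples."""
    weights = [sample_size_weight(n_t, n_c) for _, _, n_t, n_c in studies]
    effects = [log_response_ratio(m_t, m_c) for m_t, m_c, _, _ in studies]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


# Hypothetical studies: (treatment mean, control mean, n_T, n_C).
studies = [(20.0, 10.0, 10, 10), (30.0, 10.0, 20, 20)]
print(round(weighted_mean_lnr(studies), 4))
```

With these inputs the weights are 5 and 10, so the larger study counts twice as much as the smaller one.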
Effects of Imputation
• No standardized method for imputation -> bias, e.g. Rubin and Schenker (1991)
• Appropriateness of imputed data can be evaluated using a sensitivity analysis
• Benefits despite potential bias
- Improved variance estimate (i.e. smaller CI) over exclusion
- May potentially improve representation of null studies