Part 13
Missing Data
Measurement Error
Term 4, 2006
BIO656--Multilevel Models
PROJECTS ARE DUE
• By midnight, Friday, May 19th
• Electronic submission only to [email protected]
• Please name the file:
[myname]-project.[filetype]
or
[name1_name2]-project.[filetype]
Overview
• Missing data are inevitable
• Some missing data are “inherent”
• Prevention is better than statistical “cures”
• Too much missing information invalidates a study
• There are many methods for accommodating missing data
– Their validity depends on the missing data mechanism and the analytic approach
• Issues can be subtle
• A little data on the missingness process can be
helpful
Common types of missing data
• Survey non-response
• Missing dependent variables
• Missing covariates
• Dropouts
• Censoring
– administrative, due to competing events
or due to loss to follow-up
• Non-reporting or delayed reporting
• Noncompliance
• Measurement error
Implications of missing data
Missing data produce or induce
• Unbalanced data
• Loss of information and reduced efficiency
• Extent of information loss depends on
– Amount of missingness
– Missingness pattern
– Association between the missing and observed
data
– Parameters of interest
– Method of analysis
Care is needed to avoid biased inferences,
inferences that target a reference population other
than that intended
• e.g., those who stay in the study
Inherent missingness
Right-censoring
• We know only that the event has yet to occur
– Issue: “No news is no news” versus
“no news is good news”
Latent disease state
• Disease Free/Latent Disease/Clinical Disease
– Screen and discover latent disease
– Only known that transition DF→LD occurred before the screening time and that LD→CD has yet to occur
Missing Data Mechanisms
Little RJA, Rubin D. Statistical analysis with missing data.
Chichester, NY: John Wiley & Sons; 2002
Missing Completely at random (MCAR)
• Pr(missing) is unrelated to process under study
Missing at Random (MAR)
• Pr(missing) depends only on observed data
Not Missing at Random (NMAR)
• Pr(missing) depends on both observed
and unobserved data
These distinctions are important because
validity of an analysis depends
on the missing data mechanism
Notation
(for a missing dependent variable
in a longitudinal study)
i indexes participant (unit), i = 1,…,n
j indexes measurement (sub-unit), j = 1,…,J
• Potential response vector
Yi = (Yi1, Yi2, …, YiJ)
• Response Indicators
Ri = (Ri1, Ri2, …, RiJ)
Rij = 1 if Yij is observed and Rij = 0 if Yij is missing
• Given Ri, Yi can be partitioned into two components:
YiO observed responses
YiM missing responses
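The notation above can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that a missing measurement is coded as `None`; the function name `partition_response` and the numeric values are hypothetical and not part of the course material.

```python
from typing import List, Optional, Tuple


def partition_response(
    y: List[Optional[float]],
) -> Tuple[List[int], List[float], List[int]]:
    """Split one participant's potential response vector Yi.

    Returns the response indicators Ri, the observed responses YiO,
    and the 1-based occasions j at which the response is missing (YiM).
    """
    r = [0 if value is None else 1 for value in y]           # Rij = 1 if observed
    y_obs = [value for value in y if value is not None]      # YiO
    missing_occasions = [j + 1 for j, value in enumerate(y) if value is None]
    return r, y_obs, missing_occasions


# Hypothetical participant with J = 4 occasions; the second value is missing
r2, y2_obs, y2_missing = partition_response([61.0, None, 63.5, 64.0])
```

Here `r2` plays the role of R2 = (1, 0, 1, 1), `y2_obs` is Y2O, and `y2_missing` records that Y22 is the only component of Y2M.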
Schematic Representation of Response Vector and Response Indicators
(* denotes a missing response)

          Response vector             Response indicators
Patient   Y1    Y2    Y3   …   YJ     R1   R2   R3   …   RJ
1         y11   y12   y13  …   y1J    1    1    1    …   1
2         y21   *     y23  …   y2J    1    0    1    …   1
3         y31   y32   *    …   y3J    1    1    0    …   1
…         …     …     …    …   …      …    …    …    …   …
n         yn1   *     *    …   *      1    0    0    …   0

E.g.: Y2 = (Y21, Y22, Y23, …, Y2J)    R2 = (1, 0, 1, …, 1)
      Y2O = (Y21, Y23, …, Y2J)        Y2M = (Y22)
More general missing data
• A similar notation can be used for missing regressors
(Xij) and for missing components of an even more
general data structure
• Using “Y” to denote all of the potential data
(regressors, dependent variable, etc.), the foregoing
notation applies in general
Missing Data Mechanisms
• Some mechanisms are relatively benign and do not
complicate or bias an analysis
• Others are not benign and can induce bias
Example
• Goal is to predict weight from gender and height
• Use information from Bio656 students
• Possible reasons for missing data
– Absence from class
– Gender-associated, non-response
– Weight-associated, non-response
How would each of the above reasons affect results?
Missing Completely at Random
(MCAR)
• Missingness is a chance mechanism that does not
depend on observed or unobserved responses
– Ri is independent of both YiO and YiM
Pr(Ri | YiO , YiM ) = Pr(Ri)
• In the weight survey example, missingness due to
absence from class is unlikely to be related to the
relation between weight, height and gender
• The dataset can be regarded as a random sample
from the target population (the full class, Bio620 over
the years, ....)
• A complete-case analysis is appropriate, albeit with a
drop in efficiency relative to obtaining more data
Missing Completely at Random
(MCAR)
[Figure: Scatterplots of Weight (lb) vs Height (cm) by Gender; panels MALE and FEMALE, with observed and missing points marked]
• The probability of having a missing value for variable Y is unrelated to the value of Y or to any other variables in the data set
• A complete-case analysis is appropriate
Missing at random (MAR)
• Missingness depends on the observed responses, but does not
depend on what would have been measured, but was not collected
Pr(Ri|YiO,YiM) = Pr(Ri|YiO)
• The observed data are not a random sample from the full population
– In the weight survey example, data are MAR if Pr(missing weight)
depends on gender or height but not on weight
• Even though not a random sample, the distribution of YiM conditional
on YiO is the same as that in the reference population (the full class)
• Therefore, YiM can be validly predicted using YiO
– Of course, validity depends on having a correct model for the
mean and dependency structure for the observed data
• But, we don’t need to do these predictions to get valid inferences
Missing at random (MAR)
[Figure: Scatterplots of Weight (lb) vs Height (cm) by Gender; panels MALE and FEMALE, with observed and missing points marked]
• The probability of missing data on Y is unrelated to the value of Y, after controlling for other variables in the analysis
• Analysis using the wrong model is not valid
– e.g., uncorrelated regression, when correlation is needed
A complete-case analysis gives a valid slope when selection is on the predictors, BUT correlation will be biased.
When the mechanism is MAR
• Complete-case methods and standard regression methods based
on all the available data can produce biased estimates of mean
response or trends
• If the statistical model for the observed data is correct, likelihood-based methods using only the observed data are valid
• Requires that the joint distribution of the observed Yis is correctly
specified,
– when the mean and covariance are correct
– when using a correct GEE working model
– when using correct random effects
Ignorability
• With a correct model for the observed data, under MAR the details of
the missing data mechanism are not needed; the mechanism is
ignorable
– Ignorability is not an inherent property of the mechanism
– It depends on the mechanism and on the analytic model
Not missing at random (NMAR)
• Missingness depends on the responses that could have
been observed
Pr(Ri|YiO,YiM) does depend on YiM
• The observed data cannot be viewed as a random
sample of the complete data
• The distribution of YiM conditional on YiO is not the same
as that in the reference population (the full class)
• The distribution of YiM given the observed data depends on YiO, on Pr(Ri|YiO,YiM) and on Pr(Y)
• In the weight survey example, data are NMAR if
missingness depends on weight
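The contrast between the three mechanisms can be illustrated with a small simulation of the weight survey. This is a sketch under stated assumptions, not the course's own example: the gender coding, the weight distribution (male mean 75, female mean 60, SD 8), the response probabilities, and the function names are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def simulate(mechanism: str, n: int = 40000, seed: int = 656):
    """Simulate (gender, weight, observed) triples under one missingness mechanism."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = 1 if rng.random() < 0.5 else 0             # hypothetical coding: 1 = male
        w = 60.0 + 15.0 * g + rng.gauss(0.0, 8.0)      # true male mean weight is 75
        if mechanism == "MCAR":
            p_obs = 0.6                                # unrelated to any data
        elif mechanism == "MAR":
            p_obs = 0.8 if g == 1 else 0.4             # depends on observed gender only
        elif mechanism == "NMAR":
            p_obs = 0.3 if w > 67.5 else 0.9           # depends on the weight itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((g, w, rng.random() < p_obs))
    return rows


def complete_case_means(rows):
    """Complete-case mean weight overall and among males."""
    overall = mean(w for g, w, obs in rows if obs)
    males = mean(w for g, w, obs in rows if obs and g == 1)
    return overall, males


results = {m: complete_case_means(simulate(m)) for m in ("MCAR", "MAR", "NMAR")}
```

In this setup the true overall mean is 67.5. Under MCAR both complete-case means should be close to the truth. Under MAR the overall mean is pulled toward the male mean because males respond more often, but the mean within gender remains close to 75. Under NMAR even the within-gender mean is too low, because heavier people respond less often.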
Missing Data Mechanisms:
Not missing at random (NMAR)
[Figure: Scatterplots of Weight (lb) vs Height (cm) by Gender; panels MALE and FEMALE, with observed and missing points marked]
• Also known as
– Non-ignorable missing
• The probability of missing data on Y is related to the value of Y even if we control for other variables in the analysis.
• A complete-case analysis is NOT valid
• Any analysis that does not take dependence on Y into account is not valid
• Inferences are highly model dependent
MAR for Y vs X
NMAR for cor(X,Y)
[Figure: Scatterplots of Weight (lb) vs Height (cm) with fitted line; panels “initial analysis” and “analysis with missing”]
When the mechanism is NMAR
• Almost all standard methods of analysis are invalid
– Valid inferences require joint modeling of the
response and the missing data mechanism
Pr(Ri|YiO,YiM)
• Importantly, assumptions about Pr(Ri|YiO,YiM) cannot be
empirically verified using the data at hand
• Sensitivity analyses can be conducted
(Dan Scharfstein’s research focus)
• Obtaining values from some missing Ys can inform on
the missing data mechanism
Dropouts
(if missing, missing thereafter)
Dropout Completely at Random
• Dropout at each occasion is independent of all past, current,
and future outcomes
– Is assumed for Kaplan-Meier estimator and Cox PHM
Dropout at Random
• Dropout depends on the previously observed outcomes up to,
but not including, the current occasion
– i.e., given the observed outcomes, dropout is independent of
the current and future unobserved outcomes
Dropout Not at Random, “informative dropout”
• Dropout depends on current and future unobserved outcomes
Probability of a follow-up lung function
measurement depends on smoking status
and current lung function
Is the mechanism MAR?
We don’t know!
LUNG FUNCTION DECLINE IN ADULTS
Longitudinal dropout example
• Repeated measurements Yit
i indexes people, i=1,…,n
t indexes time, t=1,…,5
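$Y_{it} = \mu_{it} + e_{it}, \quad \mu_{it} = \beta_0 + \beta_1 t$
$\mathrm{cov}(e_{is}, e_{it}) = \sigma^2 \rho^{|s-t|}, \quad \rho \ge 0$
• $\beta_0 = 5$, $\beta_1 = 0.25$, $\sigma = 1$, $\rho = 0.7$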
Longitudinal dropout example
the dropout mechanism
• Dropout indicator, Di
• Di = k if person i drops out between the (k-1)st and
kth occasion
• Assume that
$\log\left[\dfrac{\Pr(D_i = k \mid D_i \ge k, Y_{i1}, \dots, Y_{ik})}{\Pr(D_i > k \mid D_i \ge k, Y_{i1}, \dots, Y_{ik})}\right] = \theta_1 + \theta_2 Y_{i,k-1} + \theta_3 Y_{ik}$
• Dropout is MCAR if $\theta_2 = \theta_3 = 0$
• Dropout is MAR if $\theta_3 = 0$
• Dropout is NMAR if $\theta_3 \ne 0$
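The dropout model above is a logistic regression for the conditional probability of dropping out at occasion k, so the probability can be recovered with the inverse logit. The following sketch assumes that reading; the function names `dropout_probability` and `classify_mechanism` are hypothetical, and the classifier returns the most specific label (MCAR is a special case of MAR).

```python
import math


def dropout_probability(theta1: float, theta2: float, theta3: float,
                        y_prev: float, y_curr: float) -> float:
    """Pr(Di = k | Di >= k, history), from the logit model on the slide.

    y_prev is the previous outcome Y_{i,k-1}; y_curr is the current outcome Y_{ik},
    which is unobserved if the person drops out.
    """
    eta = theta1 + theta2 * y_prev + theta3 * y_curr
    return 1.0 / (1.0 + math.exp(-eta))


def classify_mechanism(theta2: float, theta3: float) -> str:
    """Label the dropout mechanism implied by the coefficients."""
    if theta3 != 0:
        return "NMAR"   # depends on the current, possibly unobserved, outcome
    if theta2 != 0:
        return "MAR"    # depends only on the previously observed outcome
    return "MCAR"       # depends on no outcomes at all


# With theta1 = -0.5 and theta2 = theta3 = 0, dropout does not depend on the outcomes
p_mcar = dropout_probability(-0.5, 0.0, 0.0, y_prev=5.0, y_curr=5.25)
```

With these coefficients the probability of dropping out at each occasion is the inverse logit of -0.5, about 0.38, whatever the outcome values are.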
Population Regression Line vs. Observed Data Means
[Figure: Three panels plotting Y against T (occasions 1 to 5):
MCAR ($\theta_1 = -0.5$, $\theta_2 = \theta_3 = 0$);
MAR ($\theta_1 = -0.5$, $\theta_2 = 0.5$, $\theta_3 = 0$);
NMAR ($\theta_1 = -0.5$, $\theta_2 = 0$, $\theta_3 = 0.5$)]
Analysis results
The true regression parameters are
intercept = 5.0 and slope = 0.25, $\rho$ = 0.7

Dropout
Mechanism   Parameter   ML Estimate (se)   GEE/OLS Estimate (se)
MCAR        Intercept   5.015 (0.031)      5.022 (0.032)
            Slope       0.257 (0.016)      0.253 (0.018)
MAR         Intercept   5.003 (0.041)      5.062 (0.043)
            Slope       0.261 (0.016)      0.182 (0.018)
NMAR        Intercept   5.058 (0.040)      5.071 (0.043)
            Slope       0.201 (0.016)      0.162 (0.018)
Misspecified GEE
(when the truth is random intercepts and slopes)
[Figure: Y versus Time; panels “Complete Data (GEE)” and “Partial Missing Data (GEE)”]
Correctly specified Random Effects
(when the truth is random intercepts and slopes)
[Figure: Y versus Time; panels “Complete Data (REM)” and “Partial Missing Data (REM)”]
The probability
of dropping out
depends on the
observed history
One step at a time
There are 5 different “trajectories”
with relative weights 2 2 1 1 2
The OLS analysis has
regressors 0, 1, 2 and
dependent variables
0, , 2
The Indep. Increments
analysis has a constant
regressor “1” and so is just
estimating the mean. The
dependent variable is
either + or -
If the missing data process is MAR and if we
use the correct model for the observed data,
the missing data mechanism is “ignorable”
• In the foregoing example, computing first differences
(current value – previous value) and averaging them
gives an unbiased estimate (of 0), no matter
how complicated the MAR missing data process
• We don’t have to know the details of the dropout
process (it can be very complicated), as long as the
probabilities depend only on what has been observed
and not on what would have been observed
• Ignorability depends on using the correct model for
the observed data (mean and dependency structure)
• If the errors were independent (rather than the first
differences), then standard OLS would be unbiased
Analytic Approaches
Complete Case Analysis
• Global complete case analysis
• Individual model complete case analysis
• Augment with missing data indicators
– primarily for missing Xs
• Weighting
• Imputation
– Single
– Multiple
• Likelihood-based (model-based) methods
Analytic Approaches
Global complete-case Analysis
(use only data for people with fully complete data)
• Biased, unless the dropout is MCAR
• Even if MCAR is true, can be immensely inefficient
Analyze Available Data (use data for people with complete data
on the regressors in the current model)
• More efficient than complete-case methods, because it uses the
maximal data
• Biased unless the dropout is MCAR
• Can produce floating datasets, producing “illogical” conclusions
– R² relations are not monotone
Use Missing data indicators (e.g., create new covariates)
Weighting
• Stratify samples into J weighting classes
– Zip codes
– propensity score classes
• Weight the observed data inversely according to the response
rate of the stratum
– Lower response rate → higher weight
• Unbiased if observed data are a random sample in a weighting
class (a special form of the MAR assumption)
• Biased, if respondents differ from non-respondents in the class
• Difficult to estimate the appropriate standard error because
weights are estimated from the response rates
Simple example of weighting adjustment
• Estimate the average height of villagers in two villages
• Surveys sent to 10% of the population in both villages
                   Village A   Village B
# villagers        1000        1000
# surveys sent     100         100
# providing data   100         50
Avg height         1.7 m       1.4 m

• Direct, unweighted: 1.7*(2/3) + 1.4*(1/3) = 1.60 m
• Weighted: 100*1.7*0.005 + 50*1.4*0.01 = 1.55 m (= 1.7*0.5 + 1.4*0.5)
– Village B respondents receive 2 x the weight of Village A respondents
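The village arithmetic can be checked with a short Python sketch. The dictionary layout and function names are illustrative assumptions; the numbers are those in the example. Each respondent is weighted by the number of villagers he or she stands in for (population divided by respondents), which is equivalent to weighting inversely to the response rate.

```python
villages = {
    "A": {"population": 1000, "respondents": 100, "mean_height_m": 1.7},
    "B": {"population": 1000, "respondents": 50, "mean_height_m": 1.4},
}


def unweighted_mean(data: dict) -> float:
    """Pool all respondents, ignoring the different response rates."""
    total_respondents = sum(v["respondents"] for v in data.values())
    total_height = sum(v["respondents"] * v["mean_height_m"] for v in data.values())
    return total_height / total_respondents


def weighted_mean(data: dict) -> float:
    """Weight each respondent by population / respondents for that village."""
    total_population = sum(v["population"] for v in data.values())
    total_height = sum(
        v["respondents"] * (v["population"] / v["respondents"]) * v["mean_height_m"]
        for v in data.values()
    )
    return total_height / total_population


direct = unweighted_mean(villages)     # 1.7*(2/3) + 1.4*(1/3)
adjusted = weighted_mean(villages)     # 1.7*0.5 + 1.4*0.5
```

Dividing each weight by the total population of 2000 gives the per-respondent weights 0.005 for Village A and 0.01 for Village B used in the example.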
Single Imputation
• Fill in missing values with imputed values
• Once a filled-in dataset has been constructed,
standard methods for complete data can be applied
Problem
• Fails to account for the uncertainty inherent in the
imputation of the missing data
• Don’t use it!
Multiple Imputation
Rubin 1987, Little & Rubin 2002
• Multiply impute “m” pseudo-complete data sets
– Typically, a small number of imputations
(e.g., 5 ≤ m ≤10) is sufficient
• Combine the inferences from each of the m data sets
• Acknowledges the uncertainty inherent in the imputation
process
• Equivalently, the uncertainty induced by the missing data
mechanism
• Rubin DB. Multiple Imputation for Nonresponse in Surveys, Wiley, New York, 1987
• Little RJA, Rubin D. Statistical analysis with missing data. Chichester, NY: John Wiley &
Sons; 2002
Multiple Imputation
Multiple Imputation: Combining Inferences
• Combine m sets of parameter estimates to provide a single
estimate of the parameter of interest
• Combine uncertainties to obtain valid SEs
• In the following, “k” indexes imputation
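$$\bar{\beta} = \frac{1}{m}\sum_{k=1}^{m}\hat{\beta}^{(k)}$$
$$\mathrm{Var}(\bar{\beta}) = \frac{1}{m}\sum_{k=1}^{m}\mathrm{Var}\left(\hat{\beta}^{(k)}\right) + \left(1+\frac{1}{m}\right)\frac{1}{m-1}\sum_{k=1}^{m}\left(\hat{\beta}^{(k)}-\bar{\beta}\right)^{2}$$
The first term is the within-imputation variance; the second term is the between-imputation variance.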
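As a minimal sketch of the combining rules for a scalar parameter, the following Python function averages the m estimates and adds the within-imputation and between-imputation components. The function name `combine_imputations` and the example numbers are hypothetical; they are not results from any analysis in these slides.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def combine_imputations(estimates: Sequence[float],
                        variances: Sequence[float]) -> Tuple[float, float]:
    """Combine m point estimates and their variances from m imputed data sets.

    Returns (combined estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    beta_bar = mean(estimates)                 # average of the m estimates
    within = mean(variances)                   # within-imputation variance
    between = variance(estimates)              # sample variance, divisor m - 1
    total = within + (1.0 + 1.0 / m) * between
    return beta_bar, total


# Hypothetical estimates and variances from m = 3 imputed data sets
beta_bar, total_var = combine_imputations([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

For these made-up inputs the within-imputation variance is 0.5 and the between-imputation variance is 1.0, so the total variance is 0.5 + (4/3)(1.0).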
Multiple Imputation: Combining Inferences
• Combine m sets of parameter estimates to provide a single
estimate of the parameter of interest
• Combine uncertainties to obtain valid SEs
• In the following, “k” indexes imputation
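$$\bar{\beta} = \frac{1}{m}\sum_{k=1}^{m}\hat{\beta}^{(k)}$$
$$\mathrm{Cov}(\bar{\beta}) = \frac{1}{m}\sum_{k=1}^{m}\mathrm{Cov}\left(\hat{\beta}^{(k)}\right) + \left(1+\frac{1}{m}\right)\frac{1}{m-1}\sum_{k=1}^{m}\left(\hat{\beta}^{(k)}-\bar{\beta}\right)\left(\hat{\beta}^{(k)}-\bar{\beta}\right)'$$
The first term is the within-imputation covariance; the second term is the between-imputation covariance.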
Producing the Imputed Values
Last value carried forward (LVCF)
• Single Imputation (never changes)
• Assumes the responses following dropout remain
constant at the last observed value prior to dropout
• Unrealistic unless, say, due to recovery or cure
• Underestimates SEs
Hot deck
• Randomly choose a fill-in from outcomes of “similar”
units
• Distorts distribution less than imputing the mean or
LVCF
• Underestimates SEs
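To make the mechanics of last value carried forward concrete, here is a minimal sketch that again assumes missing values are coded as `None`; the function name `lvcf` and the example values are hypothetical. It is shown only to illustrate what the method does: as noted above, it is a single imputation and underestimates SEs.

```python
from typing import List, Optional


def lvcf(y: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing value with the most recent observed value.

    Values that are missing before any observation stay missing.
    """
    filled: List[Optional[float]] = []
    last_observed: Optional[float] = None
    for value in y:
        if value is not None:
            last_observed = value
        filled.append(last_observed)
    return filled


# Hypothetical participant who drops out after the second of four occasions
completed = lvcf([5.1, 5.4, None, None])
```

The filled-in series is flat after dropout, which is the "remain constant at the last observed value" assumption stated above.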
Valid Imputation
Build a model relating observed outcomes
• Means and covariances and random effects, ...
• Goal is prediction, so be liberal in including
predictors
• Don’t use P-values; don’t use step-wise
• Do use multiple R², prediction sums of squares,
cross-validation, ...
Producing Imputed Values
Sample values of YiM from pr(YiM|YiO, Xi)
• Can be straightforward or difficult
• Monotone case: draw values of YiM from pr(YiM|YiO,Xi) in a
sequential manner
• Valid when dropouts are MAR or MCAR
Propensity Score Method
• Imputed values are obtained from observations on people who
are equally likely to drop out as those lost to follow up at a given
occasion
• Requires a model for the propensity (probability) of dropping
out, e.g.,
$\log\left[\dfrac{\Pr(D_i = k \mid D_i \ge k, Y_{i1}, \dots, Y_{ik})}{\Pr(D_i > k \mid D_i \ge k, Y_{i1}, \dots, Y_{ik})}\right] = \theta_1 + \theta_2 Y_{i,k-1}$
Producing Imputed Values
Recall that “Y” is all of the data,
not just the dependent variable
Predictive Mean Matching (build a regression model!)
• A series of regression models for Yik, given Yi1, …,Yik-1, are fit using
the observed data on those who have not dropped out by the kth
occasion. For example,
$E(Y_{ik}) = \gamma_1 + \gamma_2 Y_{i1} + \dots + \gamma_k Y_{i(k-1)}$
$V(Y_{ik}) = \sigma^2$
Yields $\hat{\gamma}$ and $\hat{\sigma}^2$
1. Parameters $\gamma^*$ and $\sigma^{2*}$ are then drawn from the distribution of the
estimated parameters (to account for the uncertainty in the
estimated regression)
2. Missing values can then be predicted from
$\gamma_1^* + \gamma_2^* Y_{i1} + \dots + \gamma_k^* Y_{i(k-1)} + \sigma^* e_i$,
where $e_i$ is simulated from a standard normal distribution
3. Repeat 1 and 2
Missing, presumed at random
Cost-analysis with incomplete data*
• Estimate the difference in cost between transurethral resection
(TURP) and contact-laser vaporization of the prostate (Laser)
• 100 patients were randomized to one of the two treatments
– TURP: n = 53; Laser: n = 47
• 12 categories of medical resource usage were measured
– e.g., GP visit, transfusion, outpatient consultation, etc.
* Briggs A et al. Health Economics. 2003; 12, 377-392
Missing data
                                           TURP n = 53   Laser n = 47   Total n = 100
Patients with no missing resource counts   34 (59%)      21 (51%)       55 (55%)
Observed resource counts                   570 (90%)     510 (90%)      1080 (90%)
Complete-case analysis uses only
half of the patients in the study even
though 90% of resource usage data
were available
Comparison of inferences
Note that mean imputation understates uncertainty.
Multiple Imputation versus
likelihood analysis when data are MAR
• Both multiple imputation and use of a valid statistical
model for the observed data (likelihood analysis) are
valid
– The model-based analysis will be more efficient,
but more complicated
• Validity of each depends on correct modeling to
produce/induce ignorability
What if you doubt the MAR assumption
(you should always doubt it!)
You can never empirically rule out NMAR
• Methods for NMAR exist, but they require information
and assumptions on
pr(Missing | observed, unobserved)
• Methods depend on unverifiable assumptions
• Sensitivity analysis can assess the stability of
findings under various scenarios
– Set bounds on the form and strength of the
dependence
– Evaluate conclusions within these bounds
MEASUREMENT ERROR
If a covariate (X) is measured with error,
what is the implication for regression of Y on X?
See also “Air” and “Cervix” in
volume II of the BUGS examples
Measurement Error
Another type of missing data
• Measurement error is a special case of missing data
because we do not get to “observe the true value” of
the response or covariates
• Depending on the measurement error mechanism
and on the analysis, inferences can be
– inefficient (relative to no measurement error)
– biased
• Differential attenuation across
studies complicates “exporting”
and synthesizing
The two “Pure Forms”
relating Xt & Xo
Classical: $X_o = X_t + \epsilon$, $\epsilon \sim (0, \sigma^2)$
What you see is a random deviation from the truth
• Measured & true blood pressure
• Measured and true social attitudes
Berkson: $X_t = X_o + \epsilon$
The truth is a random deviation from what you see
• Individual SES measured by ZIP-code SES
• Personal air pollution measured by centrally
monitored value
• Actual temperature & thermostat setting
Hybrids are possible
Xt and Xo have a general joint distribution
Measurement error’s effect
on a simple regression coefficient
Classical
• The regression coefficient on Xo is attenuated towards 0
relative to the “true” regression coefficient on Xt
• Because, the spread of Xo is greater than that for Xt
Berkson
• No effect on the expected regression coefficient
• Variance inflation
Berkson
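$X_t = X_o + \epsilon$, $\epsilon \sim (0, \sigma^2)$
true: $Y = \text{int} + \beta X_t + \text{resid}$
$\quad = \text{int} + \beta (X_o + \epsilon) + \text{resid}$
observed: $Y = \text{int} + \beta^* X_o + \text{resid}$
$\mathrm{Var}(X_o) = \sigma_o^2$
No attenuation: $\beta^* = \beta$,
because $E(X_t \mid X_o) = X_o$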
Classical
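$X_o = X_t + \epsilon$, $\epsilon \sim (0, \sigma^2)$
true: $Y = \text{int} + \beta X_t + \text{resid}$
observed: $Y = \text{int} + \beta^* X_o + \text{resid}$
$\quad = \text{int} + \beta^* (X_t + \epsilon) + \text{resid}$
$\mathrm{Var}(X_o) = \sigma_t^2 + \sigma^2$ ($X_o$ is stretched out)
Attenuation (attenuation factor $\lambda$)
$\beta^* = \lambda \beta$
$\lambda = \sigma_t^2 / (\sigma_t^2 + \sigma^2)$
slope = cov(Y, X)/Var(X), but $E(X_t \mid X_o) = \lambda X_o$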
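The difference between the two pure forms can be checked by simulation. The sketch below is an illustration under assumptions that are not in the slides: the covariate is centered at 0 with standard deviation 10, the measurement error has standard deviation 10 (so the attenuation factor is 0.5), the true slope is 1.0, and the helper names are hypothetical.

```python
import random
from typing import List


def attenuation_factor(var_true: float, var_error: float) -> float:
    """Attenuation factor for classical error: var_true / (var_true + var_error)."""
    return var_true / (var_true + var_error)


def ols_slope(x: List[float], y: List[float]) -> float:
    """Simple linear regression slope, cov(Y, X) / Var(X)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2006)
n, beta = 20000, 1.0

# Classical error: what you see is a random deviation from the truth
x_true = [rng.gauss(0.0, 10.0) for _ in range(n)]
y_classical = [3.0 + beta * xt + rng.gauss(0.0, 8.0) for xt in x_true]
x_seen = [xt + rng.gauss(0.0, 10.0) for xt in x_true]
slope_classical = ols_slope(x_seen, y_classical)

# Berkson error: the truth is a random deviation from what you see
x_assigned = [rng.gauss(0.0, 10.0) for _ in range(n)]
x_actual = [xa + rng.gauss(0.0, 10.0) for xa in x_assigned]
y_berkson = [3.0 + beta * xt + rng.gauss(0.0, 8.0) for xt in x_actual]
slope_berkson = ols_slope(x_assigned, y_berkson)

lam = attenuation_factor(10.0 ** 2, 10.0 ** 2)
```

Under these assumptions the classical slope should land near the attenuation factor times the true slope (about 0.5), while the Berkson slope should stay near the true slope of 1.0, with more residual scatter.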
Y versus Xt
Y versus X0
An illustration
Back to the basic example
• W = Weight (lb)
• H = Height (cm)
• Analysis: simple linear regression
$W_i = \beta_0 + \beta_1 H_i + e_i$ where $e_i \sim N(0, \sigma^2)$
Assume the true model to be:
$W_i = 3 + 1.0 H_i + e_i$ where $e_i \sim N(0, 8^2)$
Measurement error
1. Error in W: observe $W^* = W + e_i^*$ where $e_i^* \sim N(0, 4^2)$
2. Error in H: observe $H^* = H + \epsilon_i^*$ where $\epsilon_i^* \sim N(0, 10^2)$
Scenario 1: Measurement Error in Response
[Figure: Scatterplot of Weight (lb) vs Height (cm), showing the data with no error and with error]
Results:
No error: $\hat{\beta}_1 = 1.16$, SE($\hat{\beta}_1$) = 0.15
With error: $\hat{\beta}_1 = 1.08$, SE($\hat{\beta}_1$) = 0.18
• Standard regression estimate for $\beta_1$ is unbiased, but less efficient
• The larger the measurement error, the greater the loss in efficiency
Scenario 2: measurement error in H
[Figure: Scatterplot of Weight (lb) vs Height (cm), showing the data with no error and with error]
Results:
No error: $\hat{\beta}_1 = 1.16$, SE($\hat{\beta}_1$) = 0.15
With error: $\hat{\beta}_1 = 0.69$, SE($\hat{\beta}_1$) = 0.21
• Standard regression estimate for $\beta_1$ is biased (attenuated)
• The larger the measurement error, the greater the attenuation
Multivariate Measurement Error
$X_o = X_t + \epsilon$, $\epsilon \sim (0, \Sigma)$
The Multiple Imputation Algorithm in SAS
The MIANALYZE Procedure
– Combines the m different sets of the parameter
and variance estimates from the m imputations
– Generates valid inferences about the parameters
of interest
PROC MIANALYZE <options>;
BY variables;
VAR variables;
Multiple Imputation Algorithm in SAS
PROC MI <options>;
BY variables;
FREQ variable;
MULTINORMAL <options>;
VAR variables;
• Available options in PROC MI include: NIMPU=number (default=5)
• Available options in MULTINORMAL statement:
METHOD=REGRESSION
METHOD=PROPENSITY<(NGROUPS=number)>
METHOD=MCMC<(options)>
• The default is METHOD=MCMC