Challenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Drop-out
Sandra Taylor, Ph.D.
IDDRC BBRD Core
23 April 2014
Objectives
• Baseline Adjustment
▫ Introduce approaches
▫ Guidance on when to use different approaches
• Missing Data/Drop-out
▫ Raise awareness regarding issues/challenges
caused by missing data
▫ Importance for study design and data analysis
▫ Basic understanding of approaches to handling missing data
• In longitudinal studies, subjects typically have a baseline measurement
• Interest is commonly in differences in change over time between groups
▫ Does the degree of change differ between groups?
• Differences in starting values (i.e., baseline) are important to consider when trying to assess change over time
[Figure: responses plotted against Time]
Four options for baseline adjustment
1. Retain baseline value as outcome with no
assumptions about group differences at
baseline
2. Retain baseline value as outcome and assume
group means are equal at baseline
3. Subtract baseline from post baseline responses
and analyze differences from baseline
4. Include baseline value as a covariate.
Retain baseline as outcome; no assumptions at baseline
[Figure: Group 1 and Group 2 responses plotted against Time]
Allow intercepts (baselines) to differ between groups
Retain baseline as outcome; assume equal at baseline
[Figure: Group 1 and Group 2 responses plotted against Time]
Assume same intercepts (baselines) in both groups
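To make the contrast between Approaches 1 and 2 concrete, here is a sketch that assumes, for illustration only, a linear trend in time; $G_i$ is a 0/1 group indicator for subject $i$ and $t_j$ is time since baseline (so $t_j = 0$ at baseline):

```latex
\begin{aligned}
\text{Approach 1:}\quad & E[Y_{ij}] = \beta_0 + \beta_1 G_i + \beta_2 t_j + \beta_3 G_i t_j \\
\text{Approach 2:}\quad & E[Y_{ij}] = \beta_0 + \beta_2 t_j + \beta_3 G_i t_j \qquad (\beta_1 = 0)
\end{aligned}
```

At baseline the group means are $\beta_0$ and $\beta_0 + \beta_1$ under Approach 1 but share the intercept $\beta_0$ under Approach 2; in both models $\beta_3$ is the difference between groups in the rate of change.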
Subtract baseline from post-baseline responses
• Define the difference from baseline as the new response variable
• Model as before
• Interpretation of results differs somewhat
▫ Group – Are there differences at time 2?
▫ Group·Time – Are the lines parallel from time 2
to n?
▫ Joint test of Group and Group·Time required to
evaluate whether the patterns of change are the
same over time
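A minimal sketch of the change-score step, assuming long-format data held in plain Python tuples; the records and the function names `change_from_baseline` and `mean_change_by_group_time` are hypothetical. It builds the new response (post-baseline value minus the subject's own baseline) and tabulates its mean in each group-by-time cell; in practice these change scores would then be modeled with Group, Time, and Group·Time terms.

```python
from collections import defaultdict

# Hypothetical long-format records: (subject_id, group, time, response).
# Time 1 is the baseline visit.
records = [
    ("s1", "A", 1, 10.0), ("s1", "A", 2, 12.0), ("s1", "A", 3, 13.0),
    ("s2", "A", 1, 11.0), ("s2", "A", 2, 12.5), ("s2", "A", 3, 14.0),
    ("s3", "B", 1, 9.0), ("s3", "B", 2, 9.5), ("s3", "B", 3, 10.0),
    ("s4", "B", 1, 12.0), ("s4", "B", 2, 12.0), ("s4", "B", 3, 13.0),
]


def change_from_baseline(rows, baseline_time=1):
    """Return (subject, group, time, response - baseline) for post-baseline visits.

    Subjects without a baseline value are dropped, since no difference can be formed.
    """
    baseline = {s: y for s, _, t, y in rows if t == baseline_time}
    return [
        (s, g, t, y - baseline[s])
        for s, g, t, y in rows
        if t != baseline_time and s in baseline
    ]


def mean_change_by_group_time(changes):
    """Average change from baseline in each (group, time) cell."""
    totals = defaultdict(lambda: [0.0, 0])
    for _, g, t, d in changes:
        totals[(g, t)][0] += d
        totals[(g, t)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


changes = change_from_baseline(records)
means = mean_change_by_group_time(changes)
print(means)  # {('A', 2): 1.75, ('A', 3): 3.0, ('B', 2): 0.25, ('B', 3): 1.0}
```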
Use baseline as covariate
• Outcome becomes adjusted change scores (i.e., change over time adjusted for baseline)
• Interpretation issues similar to those of Approach 3
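For the two-visit case (one baseline, one follow-up), Approach 4 reduces to an analysis of covariance. The sketch below fits it by ordinary least squares with NumPy on simulated data; the data, the effect size of 5, and the function name `ancova_group_effect` are hypothetical. Because change = follow-up − baseline, refitting with the change score as the outcome shifts only the baseline slope by 1 and leaves the group coefficient unchanged, which is why the outcome can be read as an adjusted change score.

```python
import numpy as np


def ancova_group_effect(baseline, followup, group):
    """Fit followup ~ 1 + baseline + group by ordinary least squares.

    Returns the coefficients (intercept, baseline slope, group effect);
    the group effect is the baseline-adjusted difference between groups.
    """
    X = np.column_stack([np.ones_like(baseline), baseline, group])
    coef, *_ = np.linalg.lstsq(X, followup, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 200
group = np.repeat([0.0, 1.0], n // 2)          # hypothetical 0/1 group indicator
baseline = rng.normal(50, 10, size=n)          # equal baseline means, as if randomized
# Simulated follow-up: depends on baseline plus a group effect of 5
followup = 20 + 0.6 * baseline + 5 * group + rng.normal(0, 5, size=n)

b0, b_base, b_group = ancova_group_effect(baseline, followup, group)
print(f"baseline-adjusted group effect: {b_group:.2f}")
```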
Relationship Among Approaches
• Retain baseline as outcome?
▫ YES: Assume equal means at baseline?
– YES → Approach 2
– NO → Approach 1
▫ NO:
– Analyze change from baseline → Approach 3
– Include baseline as covariate → Approach 4
Which approach to use?
• Randomized or Observational Study?
▫ If randomized, reasonable to assume equal baseline values across groups → Approach 2
▫ If observational
– Approach 2 if reasonable to assume equal baseline values across groups
– Approach 1 if baseline values differ across groups
▫ Approaches 3 and 4 applicable where Approaches
1 and 2 are applicable, respectively.
What is it?
Why does it matter?
What do we do about it?
What are missing data and drop-out?
• Missing Data
▫ Observations the researcher intended to collect but did not
▫ Many different causes for missing data
▫ Not specific to longitudinal data, but common in it
• Drop-out
▫ Subjects leave a study before the intended end
▫ Special class of missing data unique to
longitudinal data
Why does it matter?
• Potential for bias and incorrect inferences
▫ Bias can be severe
• Loss of information/power
▫ Reduced precision and efficiency of estimates
relative to complete data
• Data are unbalanced over time
▫ Problem for some analytical methods
Six Cities Study of Air Pollution and Health
Hypothetical Weight Loss Study
Muscatine Coronary Risk Factor Study
Six Cities Study of Air Pollution and Health
• Objective: Characterize lung function growth in children
▫ Enrolled in 1st/2nd grade, followed until graduation
▫ Annual lung function tests
• Wide range (1-12) of observations per child
▫ Late enrollment – moved into school district after
2nd grade
▫ Drop out – moved out of school district
• Consider reasons for moving out of district
Hypothetical Weight Loss Study
• Objective: Determine whether a coached program is more effective than an online program
▫ Randomize subjects to each program
▫ Collect weight weekly for 3 months
• Types of missing values
▫ Drop-out: missing all values after time t
▫ Missing observation: missing one or more observations in the middle of the study
• What could cause the missing values?
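A small sketch, assuming each subject's visits are stored as a list with `None` or NaN marking a missing value, that separates the two types of missing values above; the function name `classify_pattern` and the weights are hypothetical.

```python
import math


def classify_pattern(values):
    """Classify one subject's sequence of visits; None or NaN marks a missing value."""
    missing = [v is None or (isinstance(v, float) and math.isnan(v)) for v in values]
    if all(missing):
        return "no data"
    if not any(missing):
        return "complete"
    first_missing = missing.index(True)
    if all(missing[first_missing:]):
        # Monotone pattern: observed up to time t, missing at every later visit
        return "drop-out"
    # Some value is missing before a later visit that was observed
    return "intermittent"


# Hypothetical weekly weights (kg) for four subjects
subjects = {
    "s1": [80.0, 79.5, 79.0, 78.8],
    "s2": [80.0, 79.5, None, None],
    "s3": [80.0, None, 79.0, 78.8],
    "s4": [82.0, float("nan"), 81.0, None],
}
for sid, weights in subjects.items():
    print(sid, classify_pattern(weights))
```

Subject s4 shows that a subject can have both kinds: an intermittent gap followed by a final missing visit is classified as intermittent here, because the pattern is not monotone.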
Muscatine Coronary Risk Factor Study
• Objective: Examine development and
persistence of coronary disease risk factors
▫ Children aged 5-15
▫ Measured height and weight biennially; classified
children as obese or not
▫ Parental consent required for each measurement
• Less than 40% of children had complete data
• What factors contribute to missing values?
▫ No consent form
▫ Child absent from school on day of measurements
Missing Data Mechanisms
• 3 types distinguished based on relationship
between the probability of missingness and the
actual values (observed or unobserved)
▫ Missing Completely at Random (MCAR)
▫ Missing at Random (MAR)
▫ Not Missing at Random (NMAR; also called Missing Not at Random, MNAR)
• The mechanisms rest on different assumptions, and the methods that adequately handle missing values differ among them
Missing Completely at Random
• Probability of missing response is unrelated to
▫ The value of the response had it been obtained
▫ The value of observed responses
• Examples:
▫ Missed appointment due to car trouble
▫ Variables measured on a subset of subjects by
study design
• Missingness is simply a chance event unrelated to any of the data, observed or unobserved
• Observed data can be considered a random sample of the complete data
Missing at Random
• Probability of missing response
▫ may depend on the observed responses, but
▫ is unrelated to the specific missing value that would have been observed (given the observed data)
• Examples:
▫ Removal of subject from study, by design, once a pre-specified value is obtained
▫ More highly educated people don’t report income (education is recorded)
• Observed data can NOT be considered a random sample of the complete data
Not Missing at Random
• Probability of missing response is related to the
specific values that would have been obtained
• Examples
▫ Value is below the detection limit
▫ People with higher incomes don’t report income
▫ Subject skips appointment because of weight gain
• Missingness is non-ignorable
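A simulation sketch contrasting the three mechanisms, using the income examples from these slides with made-up numbers (education always observed, income sometimes missing). It compares the observed-data mean of income with the full-data mean, and also a regression-adjusted mean (fit income on education among responders, then average the predictions over everyone). That adjustment is an illustrative choice, not a method from the slides: it removes the bias under MAR, where missingness depends only on observed education, but not under MNAR, where it depends on income itself.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
educ = rng.normal(0, 1, n)                        # always observed
income = 50 + 10 * educ + rng.normal(0, 10, n)    # response; hypothetical units


def observed_mean(y, missing):
    """Mean of the values that were actually observed."""
    return y[~missing].mean()


def regression_adjusted_mean(x, y, missing):
    """Fit y ~ x among responders, then average predictions over ALL subjects."""
    slope, intercept = np.polyfit(x[~missing], y[~missing], 1)
    return (intercept + slope * x).mean()


mechanisms = {
    # MCAR: missingness is a coin flip unrelated to any data
    "MCAR": rng.random(n) < 0.3,
    # MAR: more educated people skip the question more often; education is observed
    "MAR": rng.random(n) < np.where(educ > 0, 0.6, 0.1),
    # MNAR: people with higher incomes skip the question more often
    "MNAR": rng.random(n) < np.where(income > 50, 0.6, 0.1),
}

true_mean = income.mean()
results = {}
print(f"full-data mean: {true_mean:.2f}")
for name, miss in mechanisms.items():
    results[name] = (observed_mean(income, miss),
                     regression_adjusted_mean(educ, income, miss))
    print(f"{name}: observed mean {results[name][0]:.2f}, "
          f"regression-adjusted {results[name][1]:.2f}")
```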
Revisit Examples
• Weight Loss Study
▫ Moves out of area - MCAR
▫ Achieves goal weight – MAR or MNAR
▫ Not losing weight – MAR or MNAR
• Air Pollution and Health Study
▫ Job relocation – MCAR
▫ Child developed respiratory problems – MAR
▫ Avoid developing respiratory problems – MNAR
• Coronary Risk Factor Study
▫ Forgot to sign consent - MCAR
▫ Obese child feigns illness to avoid weighing – MNAR
Approaches to Handling Missing Data
• Deletion Methods
▫ Complete-case analysis (listwise deletion)
▫ Available-data analysis (pairwise deletion)
• Single Imputation Methods
• Model-Based Methods
▫ Multiple imputation
▫ Maximum likelihood
Deletion Methods
• Complete-Case Analysis
▫ Only analyze subjects with complete data
• Available-Data Analysis
▫ Analyze all observed data
– Some analytical methods can handle partial data (e.g., random effect models)
▫ More efficient/powerful than complete-case analysis because it uses more information
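A minimal NumPy sketch of the difference between the two deletion methods, using per-visit means on a hypothetical subjects-by-visits array (NaN = missing). A random effect model uses the available data in a more sophisticated way; this only shows which observations each approach keeps.

```python
import numpy as np

# Hypothetical data: rows = subjects, columns = visits; NaN = missing
y = np.array([
    [10.0, 11.0, 12.0],
    [9.0, np.nan, 11.0],
    [12.0, 13.0, np.nan],
    [11.0, 12.0, 14.0],
])

# Complete-case (listwise) analysis: keep only subjects observed at every visit
complete_rows = ~np.isnan(y).any(axis=1)
cc_means = y[complete_rows].mean(axis=0)

# Available-data (pairwise) analysis: use every observed value at each visit
ad_means = np.nanmean(y, axis=0)
n_used = (~np.isnan(y)).sum(axis=0)

print("complete-case means:", cc_means, "using", complete_rows.sum(), "subjects")
print("available-data means:", ad_means, "using", n_used, "values per visit")
```

Here complete-case analysis discards two of the four subjects entirely, including the observed values they contributed at other visits.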
Deletion Methods
Advantages and Disadvantages
• Advantages
▫ Simple; available-data analysis is the default in many statistics programs
• Disadvantages
▫ Reduced sample size
▫ Complete-case analysis discards data
▫ Biased estimates unless data are MCAR
Single Imputation
• Substitute missing values with an imputed value
• Analyze “complete” data using standard
methods
• Many different approaches to single imputation
Single Imputation Methods
• Mean value imputation
▫ Substitute mean value for missing value
• “Last value carried forward” imputation
▫ Use last value observed
• Regression imputation
▫ Replace missing value with value predicted from a regression fitted to the observed data
• K-nearest neighbor imputation
▫ Impute value based on k most similar subjects
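Sketches of the four single imputation methods for a subjects-by-visits NumPy array (NaN = missing). The function names are hypothetical, the regression imputation uses a single predictor visit for simplicity, and the k-nearest-neighbor distance (root-mean-square difference over visits both subjects observed) is one choice among many.

```python
import numpy as np


def mean_impute(y):
    """Replace each missing value with the observed mean of its visit (column)."""
    out = y.copy()
    col_means = np.nanmean(y, axis=0)
    rows, cols = np.where(np.isnan(y))
    out[rows, cols] = col_means[cols]
    return out


def locf_impute(y):
    """Last value carried forward within each subject (row).

    A value missing before the subject's first observation stays missing.
    """
    out = y.copy()
    for i in range(out.shape[0]):
        for j in range(1, out.shape[1]):
            if np.isnan(out[i, j]):
                out[i, j] = out[i, j - 1]
    return out


def regression_impute(y, target_col, predictor_col):
    """Fill missing target values with predictions from a straight line
    fitted to subjects observed at both the target and predictor visits."""
    out = y.copy()
    both = ~np.isnan(y[:, target_col]) & ~np.isnan(y[:, predictor_col])
    slope, intercept = np.polyfit(y[both, predictor_col], y[both, target_col], 1)
    fill = np.isnan(y[:, target_col]) & ~np.isnan(y[:, predictor_col])
    out[fill, target_col] = intercept + slope * y[fill, predictor_col]
    return out


def knn_impute(y, k=2):
    """Fill each missing value with the mean of that visit among the k nearest
    subjects who observed it. Distance is the root-mean-square difference over
    the visits both subjects observed."""
    out = y.copy()
    for i, j in zip(*np.where(np.isnan(y))):
        candidates = []
        for r in range(y.shape[0]):
            if r == i or np.isnan(y[r, j]):
                continue
            shared = ~np.isnan(y[i]) & ~np.isnan(y[r])
            if shared.any():
                dist = np.sqrt(np.mean((y[i, shared] - y[r, shared]) ** 2))
                candidates.append((dist, y[r, j]))
        candidates.sort()
        if candidates:
            out[i, j] = np.mean([value for _, value in candidates[:k]])
    return out


# Hypothetical data: 4 subjects x 3 visits, NaN = missing
y = np.array([
    [10.0, 11.0, 12.0],
    [9.0, np.nan, 11.0],
    [12.0, 13.0, np.nan],
    [11.0, 12.0, 14.0],
])
print(mean_impute(y))
print(locf_impute(y))
print(regression_impute(y, target_col=2, predictor_col=1))
print(knn_impute(y, k=2))
```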
Single Imputation Methods
Advantages and Disadvantages
• Advantages
▫ Simple to implement and understand
▫ Maintains sample size
▫ Uses all available information
• Disadvantages
▫ Can reduce variability in the data
▫ Can weaken correlations/covariances
▫ Underestimates standard errors because it doesn’t reflect the uncertainty about the imputed (unknown) values
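A quick simulation sketch of these disadvantages for mean value imputation, with made-up data in which 40% of y is missing completely at random. After imputation the standard deviation of y and its correlation with x both shrink, and a naive standard error that treats the imputed values as real data comes out smaller than the complete-case standard error, even though no information was added.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(0, 1, n)
y = 0.8 * x + rng.normal(0, 0.6, n)        # Var(y) = 1 and corr(x, y) = 0.8 by construction

missing = rng.random(n) < 0.4              # 40% of y missing completely at random
y_imp = y.copy()
y_imp[missing] = y[~missing].mean()        # mean value imputation

sd_full, sd_imp = y.std(ddof=1), y_imp.std(ddof=1)
r_full = np.corrcoef(x, y)[0, 1]
r_imp = np.corrcoef(x, y_imp)[0, 1]
# Naive SE of the mean treats all n values (including imputed ones) as observed
se_naive = sd_imp / np.sqrt(n)
# Complete-case SE uses only the values actually observed
se_cc = y[~missing].std(ddof=1) / np.sqrt((~missing).sum())

print(f"SD of y:    full data {sd_full:.3f}, mean-imputed {sd_imp:.3f}")
print(f"corr(x, y): full data {r_full:.3f}, mean-imputed {r_imp:.3f}")
print(f"SE of mean: naive after imputation {se_naive:.4f}, complete-case {se_cc:.4f}")
```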
Maximum Likelihood
• Parameters estimated based on maximum
likelihood using available data
▫ Random effect models implement this approach
• Advantages
▫ Uses all available information
▫ Unbiased estimates for MCAR and MAR data
• Disadvantages
▫ Model must be correctly specified
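A minimal sketch of why maximum likelihood handles MAR data, in the simplest case where the estimate has a closed form: a bivariate normal (baseline y1 always observed, follow-up y2 sometimes missing) with drop-out that depends only on y1. This swaps in a textbook factored-likelihood calculation for the random effect model on the slide; the data and parameter values are hypothetical. The ML estimate of the follow-up mean combines the mean of y1 from all subjects with the regression of y2 on y1 from subjects who have both values.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
y1 = rng.normal(100, 15, n)                    # baseline, always observed
y2 = 20 + 0.8 * y1 + rng.normal(0, 10, n)      # follow-up; true mean is 100
# MAR drop-out: the chance of missing follow-up rises with the (observed) baseline
p_miss = 1 / (1 + np.exp(-(y1 - 100) / 10))
observed = rng.random(n) >= p_miss

# Complete-case estimate: mean of y2 among subjects who stayed
cc_mean = y2[observed].mean()

# ML estimate (factored likelihood, bivariate normal, monotone missingness):
# the mean of y1 uses ALL subjects; the regression of y2 on y1 uses complete pairs.
mu1_hat = y1.mean()
b1, b0 = np.polyfit(y1[observed], y2[observed], 1)
ml_mean = b0 + b1 * mu1_hat

print(f"full-data mean {y2.mean():.2f}, complete-case {cc_mean:.2f}, ML {ml_mean:.2f}")
```

The complete-case mean is pulled down because high-baseline subjects drop out, while the ML estimate uses the baseline values of the drop-outs to correct for that.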
Multiple Imputation
• Missing values are imputed from a model (e.g.,
regression model)
• Imputation conducted multiple times
▫ Replace each missing value with a set of plausible values
• Each imputed data set is analyzed
• Results from analysis of each imputed data set are pooled into a single estimate
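A minimal sketch of the impute, analyze, and pool cycle for the mean of a follow-up value that is missing at random given baseline. It assumes stochastic regression imputation, with parameter uncertainty approximated by a bootstrap of the complete cases, and pools with Rubin's rules; this is one common way to do multiple imputation, not necessarily what any particular software does, and the data and the function names `impute_once` and `rubin_pool` are hypothetical.

```python
import numpy as np


def impute_once(y1, y2_obs, rng):
    """One stochastic regression imputation of the missing (NaN) y2 values.

    Parameter uncertainty is approximated by refitting the regression on a
    bootstrap resample of the complete cases before drawing imputations.
    """
    observed = ~np.isnan(y2_obs)
    idx = np.flatnonzero(observed)
    boot = rng.choice(idx, size=idx.size, replace=True)
    b1, b0 = np.polyfit(y1[boot], y2_obs[boot], 1)
    resid_sd = np.std(y2_obs[boot] - (b0 + b1 * y1[boot]), ddof=2)
    completed = y2_obs.copy()
    miss = ~observed
    completed[miss] = b0 + b1 * y1[miss] + rng.normal(0.0, resid_sd, miss.sum())
    return completed


def rubin_pool(estimates, variances):
    """Combine m completed-data estimates and their squared SEs with Rubin's rules."""
    estimates = np.asarray(estimates)
    variances = np.asarray(variances)
    m = estimates.size
    q_bar = estimates.mean()            # pooled point estimate
    w_bar = variances.mean()            # average within-imputation variance
    b = estimates.var(ddof=1)           # between-imputation variance
    total = w_bar + (1 + 1 / m) * b     # total variance
    return q_bar, np.sqrt(total)


# Hypothetical two-visit study: y2 is missing at random given baseline y1
rng = np.random.default_rng(11)
n = 2_000
y1 = rng.normal(100, 15, n)
y2 = 20 + 0.8 * y1 + rng.normal(0, 10, n)              # true mean of y2 is 100
p_miss = 1 / (1 + np.exp(-(y1 - 100) / 10))            # high baseline -> more drop-out
y2_obs = np.where(rng.random(n) >= p_miss, y2, np.nan)

m = 20
estimates, variances = [], []
for _ in range(m):
    completed = impute_once(y1, y2_obs, rng)
    estimates.append(completed.mean())                  # analyze each imputed data set
    variances.append(completed.var(ddof=1) / n)         # squared SE of that mean
pooled_mean, pooled_se = rubin_pool(estimates, variances)
cc_mean = np.nanmean(y2_obs)                            # complete-case comparison
print(f"complete-case mean {cc_mean:.2f}; MI mean {pooled_mean:.2f} (SE {pooled_se:.2f})")
```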
Multiple Imputation
Advantages and Disadvantages
• Advantages
▫ Better reflects data variability
▫ Considers variability due to sampling and
imputation
• Disadvantages
▫ More time- and computation-intensive
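The pooling step is commonly done with Rubin's rules, which show where the two sources of variability enter. With $m$ imputed data sets, let $\hat{Q}_k$ be the estimate and $W_k$ its squared standard error from data set $k$:

```latex
\bar{Q} = \frac{1}{m}\sum_{k=1}^{m}\hat{Q}_k, \qquad
\bar{W} = \frac{1}{m}\sum_{k=1}^{m} W_k, \qquad
B = \frac{1}{m-1}\sum_{k=1}^{m}\left(\hat{Q}_k - \bar{Q}\right)^2, \qquad
T = \bar{W} + \left(1 + \frac{1}{m}\right) B
```

$\bar{W}$ reflects ordinary sampling variability, $B$ reflects uncertainty due to imputation, and $\sqrt{T}$ is the standard error reported for $\bar{Q}$. Single imputation effectively sets $B = 0$, which is why it understates standard errors.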
What if I have MNAR missingness?
• Selection models
• Pattern mixture models
• Random effect models
• Shared parameter models
What to do – study design?
• Carefully consider potential challenges to
obtaining complete data
▫ Duration of study, number of visits/surveys, travel
distance, participant characteristics/motivations
▫ Provide appropriate compensation/incentives
▫ Plan to enhance/support/encourage completion
• If possible, collect information about why an
observation is missing
What to do – data analysis?
• Evaluate missingness in data
▫ How much data is missing?
▫ Are there patterns to missingness?
▫ Are there differences between subjects with
complete and incomplete data?
▫ Are there differences in missingness among
experimental groups? Within experimental
groups?
• Consider and compare alternative approaches to
addressing missing data
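A minimal sketch of the first set of checks, assuming a subjects-by-visits NumPy array (NaN = missing) and a parallel array of group labels; the function name `missingness_report`, the report keys, and the data are hypothetical, and comparing baseline (visit 1) means is just one way to look for differences between subjects with complete and incomplete data.

```python
from collections import Counter

import numpy as np


def missingness_report(y, group):
    """Summarize missingness in a subjects-by-visits array (NaN = missing)."""
    miss = np.isnan(y)
    complete = ~miss.any(axis=1)
    return {
        # How much data is missing? (proportion missing at each visit)
        "prop_missing_by_visit": miss.mean(axis=0),
        # Are there patterns? One string per subject: 1 = observed, 0 = missing
        "pattern_counts": Counter(
            "".join("0" if m else "1" for m in row) for row in miss
        ),
        # Does missingness differ among groups? (proportion of incomplete subjects)
        "prop_incomplete_by_group": {
            str(g): float((~complete[group == g]).mean()) for g in np.unique(group)
        },
        # Do complete and incomplete subjects differ at baseline (visit 1)?
        # (nanmean returns NaN, with a warning, if a subset is empty)
        "baseline_mean_complete": float(np.nanmean(y[complete, 0])),
        "baseline_mean_incomplete": float(np.nanmean(y[~complete, 0])),
    }


# Hypothetical data: 4 subjects x 3 visits, two experimental groups
y = np.array([
    [10.0, 11.0, 12.0],
    [9.0, np.nan, 11.0],
    [12.0, 13.0, np.nan],
    [11.0, 12.0, 14.0],
])
group = np.array(["A", "A", "B", "B"])
report = missingness_report(y, group)
for key, value in report.items():
    print(key, value)
```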