Challenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Drop-out
Sandra Taylor, Ph.D.
IDDRC BBRD Core
23 April 2014

Objectives
• Baseline Adjustment
▫ Introduce approaches
▫ Guidance on when to use different approaches
• Missing Data/Drop-out
▫ Raise awareness regarding issues/challenges caused by missing data
▫ Importance for study design and data analysis
▫ Basic understanding of approaches to handling missing data

In longitudinal studies, subjects typically have a baseline measurement
• Differences in starting values (i.e., baseline) are important to consider when trying to assess change over time
• Interest is commonly in differences in change over time between groups
▫ Does the degree of change differ between groups?
[Figure: response plotted against Time]

Four options for baseline adjustment
1. Retain baseline value as outcome with no assumptions about group differences at baseline
2. Retain baseline value as outcome and assume group means are equal at baseline
3. Subtract baseline from post-baseline responses and analyze differences from baseline
4. Include baseline value as a covariate

Retain baseline as outcome; no assumptions at baseline
[Figure: Group 1 and Group 2 responses over Time]
• Allow intercepts (baselines) to differ between groups

Retain baseline as outcome; assume equal at baseline
[Figure: Group 1 and Group 2 responses over Time]
• Assume same intercepts (baselines) in both groups

Subtract baseline from post-baseline responses
• Define new variable as response variable
• Model as before
• Interpretation of results is a bit different
▫ Group – Are there differences at time 2?
▫ Group·Time – Are the lines parallel from time 2 to n?
▫ Joint test of Group and Group·Time required to evaluate whether the patterns of change are the same over time

Use baseline as covariate
• Outcome becomes adjusted change scores (i.e., change over time adjusted for baseline)
• Similar interpretation issues as Approach 3

Relationship Among Approaches
• Retain baseline as outcome?
▫ YES: Assume equal means at baseline? YES → Approach 2; NO → Approach 1
▫ NO: Analyze change from baseline → Approach 3; include baseline as covariate → Approach 4

Which approach to use?
• Randomized or observational study?
▫ If randomized, it is reasonable to assume equal baseline values across groups: Approach 2
▫ If observational: Approach 2 if it is reasonable to assume equal baseline values across groups; Approach 1 if baseline values differ across groups
▫ Approaches 3 and 4 are applicable where Approaches 1 and 2 are applicable, respectively

What is it? What does it matter? What do we do about it?

What are missing data and drop-out?
• Missing data
▫ Observations the researcher intended to collect but didn’t
▫ Many different causes for missing data
▫ Not specific to longitudinal data, but common in it
• Drop-out
▫ Subjects leave a study before the intended end
▫ Special class of missing data unique to longitudinal data

What does it matter?
• Potential for bias and incorrect inferences
▫ Bias can be severe
• Loss of information/power
▫ Reduced precision and efficiency of estimates relative to complete data
• Data are unbalanced over time
▫ Problem for some analytical methods

Examples
• Six Cities Study of Air Pollution and Health
• Hypothetical Weight Loss Study
• Muscatine Coronary Risk Factor Study

Six Cities Study of Air Pollution and Health
• Objective: Characterize lung function growth in children
▫ Enrolled in 1st/2nd grade, followed until graduation
▫ Annual lung function tests
• Wide range (1-12) of observations per child
▫ Late enrollment – moved into school district after 2nd grade
▫ Drop-out – moved out of school district
• Consider reasons for moving out of district

Hypothetical Weight Loss Study
• Objective: Determine if a coached program is more effective than an on-line program
▫ Randomize subjects to each program
▫ Collect weight weekly for 3 months
• Types of missing values
▫ Drop-out: missing all values after time t
▫ Missing observation: missing one or more observations in the middle of the study
• What could cause the missing values?

Muscatine Coronary Risk Factor Study
• Objective: Examine development and persistence of coronary disease risk factors
▫ Children aged 5-15
▫ Measured height and weight biennially; classified children as obese or not
▫ Parental consent required for each measurement
• Fewer than 40% of children have complete data
• What factors contribute to missing values?
▫ No consent form
▫ Child absent from school on day of measurements

Missing Data Mechanisms
• Three types, distinguished by the relationship between the probability of missingness and the actual values (observed or unobserved)
▫ Missing Completely at Random (MCAR)
▫ Missing at Random (MAR)
▫ Not Missing at Random (NMAR, also written MNAR)
• The mechanisms rest on different assumptions, and the methods that adequately handle missing values differ among them

Missing Completely at Random
• Probability of missing response is unrelated to
▫ The value of the response had it been obtained
▫ The values of observed responses
• Examples:
▫ Missed appointment due to car trouble
▫ Variables measured on a subset of subjects by study design
• Missingness is simply a chance event unrelated to any of the data, observed or unobserved
• Observed data can be considered a random sample of the complete data

Missing at Random
• Probability of missing response
▫ depends on the set of observed responses but
▫ is unrelated to the specific missing value that would have been observed
• Examples:
▫ Subject removed from the study by design once a pre-specified value is observed
▫ More highly educated people don’t report income (missingness depends on observed education, not on income itself)
• Observed data can NOT be considered a random sample of the complete data

Not Missing at Random
• Probability of missing response is related to the specific values that would have been obtained
• Examples:
▫ Value is below the detection limit
▫ People with higher incomes don’t report income
▫ Subject skips appointment because of weight gain
• Missingness is non-ignorable

Revisit Examples
• Weight Loss Study
▫ Moves out of area – MCAR
▫ Achieves goal weight – MAR or MNAR
▫ Not losing weight – MAR or MNAR
• Air Pollution and Health Study
▫ Job relocation – MCAR
▫ Child developed respiratory problems – MAR
▫ Avoid developing respiratory problems – MNAR
• Coronary Risk Factor Study
▫ Forgot to sign consent – MCAR
▫ Obese child feigns illness to avoid weighing – MNAR

Approaches to Handling Missing Data
• Deletion methods
▫ Complete-case analysis (listwise deletion)
▫ Available-data analysis (pairwise deletion)
• Single imputation methods
• Model-based methods
▫ Multiple imputation
▫ Maximum likelihood

Deletion Methods
• Complete-case analysis
▫ Only analyze subjects with complete data
• Available-data analysis
▫ Analyze all data that were observed
▫ Some analytical methods can handle partial data (e.g., random effects models)
▫ More efficient/powerful than complete-case analysis because it uses more information

Deletion Methods: Advantages and Disadvantages
• Advantages
▫ Simple; available-data analysis is the default in many statistics programs
• Disadvantages
▫ Reduced sample size
▫ Complete-case analysis discards data
▫ Biased estimates unless data are MCAR

Single Imputation
• Substitute missing values with an imputed value
• Analyze “complete” data using standard methods
• Many different approaches to single imputation

Single Imputation Methods
• Mean value imputation
▫ Substitute mean value for missing value
• “Last value carried forward” imputation
▫ Use last value observed
• Regression imputation
▫ Replace missing value with value predicted from a regression derived from observed data
• K-nearest neighbor imputation
▫ Impute value based on the k most similar subjects

Single Imputation Methods: Advantages and Disadvantages
• Advantages
▫ Simple to implement and understand
▫ Maintains sample size
▫ Uses all available information
• Disadvantages
▫ Can reduce variability in the data
▫ Can weaken correlations/covariances
▫ Underestimates standard errors because it doesn’t reflect uncertainty about the imputed values
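A small simulation can check the claim that deletion methods give biased estimates unless the data are MCAR. This is a minimal sketch in plain Python. It assumes a made-up two-visit design in which the baseline y1 is always observed and only the follow-up y2 can be missing. The data-generating model, the missingness probabilities and the function names are illustrative assumptions. They do not come from the studies described above.

```python
import random
import statistics

# Hypothetical two-visit design: y1 = baseline (always observed),
# y2 = follow-up (may be missing). The model below is an assumption.
def simulate(n, seed=1):
    rng = random.Random(seed)
    y1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y2 = [v + rng.gauss(0.0, 0.5) for v in y1]  # follow-up tracks baseline
    return y1, y2, rng

def mcar_mask(n, rng, p=0.4):
    # MCAR: each follow-up value is missing with the same probability p,
    # regardless of y1 or y2.
    return [rng.random() < p for _ in range(n)]

def mar_mask(y1, rng, p_high=0.8):
    # MAR: the chance that y2 is missing depends only on the observed
    # baseline y1 (subjects with y1 > 0 usually drop out), not on y2 itself.
    return [rng.random() < (p_high if v > 0 else 0.0) for v in y1]

def complete_case_mean(values, missing):
    # Complete-case analysis: average only the values that were observed.
    return statistics.fmean(v for v, m in zip(values, missing) if not m)

y1, y2, rng = simulate(10_000)
full_mean = statistics.fmean(y2)  # mean of the complete data (unobservable in practice)

bias_mcar = complete_case_mean(y2, mcar_mask(len(y2), rng)) - full_mean
bias_mar = complete_case_mean(y2, mar_mask(y1, rng)) - full_mean

print(f"complete-case bias under MCAR: {bias_mcar:+.3f}")
print(f"complete-case bias under MAR:  {bias_mar:+.3f}")
```

Under MCAR the complete cases are a random subsample, so their mean stays close to the full-data mean. Under MAR the retained subjects are mostly those with low baseline values, so the follow-up mean is pulled well below the full-data mean. The missingness depends only on the observed y1. A method that uses y1, such as maximum likelihood or multiple imputation with y1 in the model, can therefore in principle correct this bias. Complete-case analysis cannot.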
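A tiny example shows the variance-shrinking effect of single imputation. The sketch below implements two of the listed methods in plain Python: mean value imputation and “last value carried forward” (LOCF). The function names and the weight values are made up for illustration.

```python
import statistics

def mean_impute(values):
    """Single imputation: replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

def locf(series):
    """Last observation carried forward within one subject's visit sequence.
    Leading missing values stay None because nothing has been observed yet."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# Made-up week-12 weights (kg) for six subjects; None = missing.
week12 = [80, None, 90, None, 70, 100]
observed = [v for v in week12 if v is not None]
imputed = mean_impute(week12)

print(imputed)                        # [80, 85.0, 90, 85.0, 70, 100]
print(statistics.variance(observed))  # sample variance of the 4 observed values
print(statistics.variance(imputed))   # smaller: imputed values sit exactly at the mean

# Made-up weekly weights for one subject who misses week 3 and drops out after week 4.
print(locf([92, 91, None, 89, None, None]))  # [92, 91, 91, 89, 89, 89]
```

Filling two of six values with the observed mean leaves the sum of squared deviations unchanged at 500. That sum is now spread over more degrees of freedom, so the sample variance drops from about 166.7 to 100. Standard errors computed from the filled-in data are correspondingly too small. LOCF assumes that each subject's response stays flat after the last observed visit, which is a strong assumption in a weight-loss study.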
Maximum Likelihood
• Parameters estimated by maximum likelihood using available data
▫ Random effects models implement this approach
• Advantages
▫ Uses all available information
▫ Unbiased estimates for MCAR and MAR data
• Disadvantages
▫ Model must be correctly specified

Multiple Imputation
• Missing values are imputed from a model (e.g., regression model)
• Imputation is conducted multiple times
▫ Replaces each missing value with a set of plausible values
• Each imputed data set is analyzed
• Results from the analysis of each imputed data set are pooled into a single estimate

Multiple Imputation: Advantages and Disadvantages
• Advantages
▫ Better reflects data variability
▫ Accounts for variability due to both sampling and imputation
• Disadvantages
▫ More time- and computer-intensive

What if I have MNAR missingness?
• Selection models
• Pattern mixture models
• Random effects models
• Shared parameter models

What to do – study design?
• Carefully consider potential challenges to obtaining complete data
▫ Duration of study, number of visits/surveys, travel distance, participant characteristics/motivations
▫ Provide appropriate compensation/incentives
▫ Plan to enhance/support/encourage completion
• If possible, collect information about why an observation is missing

What to do – data analysis?
• Evaluate missingness in data
▫ How much data is missing?
▫ Are there patterns to missingness?
▫ Are there differences between subjects with complete and incomplete data?
▫ Are there differences in missingness among experimental groups? Within experimental groups?
• Consider and compare alternative approaches to addressing missing data
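In multiple imputation, the step where results from each imputed data set are pooled into a single estimate is commonly done with Rubin's rules. The sketch below assumes that the m imputed data sets have already been created and analysed, so it pools only the resulting estimates. `pool_rubin` is a hypothetical helper name and the numbers are made up. The degrees-of-freedom adjustment usually used for confidence intervals is omitted.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 divisor)
    total = u_bar + (1 + 1 / m) * b       # total variance of the pooled estimate
    return q_bar, math.sqrt(total)

# Made-up results: a group difference in weight change estimated on m = 5
# imputed data sets with the same model; variances are squared standard errors.
est = [2.0, 2.4, 2.2, 1.8, 2.6]
var = [0.25, 0.25, 0.25, 0.25, 0.25]
q, se = pool_rubin(est, var)
print(f"pooled estimate {q:.2f}, pooled SE {se:.3f}")
```

The (1 + 1/m)·B term adds the spread between imputations to the ordinary sampling variance. This is how multiple imputation accounts for variability due to both sampling and imputation. Here the pooled standard error is sqrt(0.25 + 1.2 × 0.1) ≈ 0.608, larger than the within-imputation value of 0.5.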
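The first checks under “Evaluate missingness in data” can be tabulated before any modelling: how much is missing, what patterns occur, and whether groups differ. This is a minimal sketch that assumes each subject's scheduled measurements are stored as a list, with None for missing values. The data and the names `classify` and `summarize` are made up for illustration. The sketch separates drop-out from intermittent missing observations, the two types described for the hypothetical weight loss study.

```python
from collections import Counter

# Made-up weekly weights (kg) for two programs; None marks a scheduled
# measurement that was not collected.
data = {
    "coached": [[95, 94, 93, 92], [88, None, 86, 85], [101, 100, None, None]],
    "online": [[90, 90, None, None], [97, None, None, None], [85, 84, 84, 83]],
}

def classify(series):
    """Label a subject 'complete', 'drop-out' (missing all values after some visit),
    or 'intermittent' (a missing value followed by a later observed one)."""
    if None not in series:
        return "complete"
    first_missing = series.index(None)
    if all(v is None for v in series[first_missing:]):
        return "drop-out"
    return "intermittent"

def summarize(groups):
    """Per group: fraction of scheduled values missing and counts of each pattern."""
    summary = {}
    for name, subjects in groups.items():
        cells = [v for s in subjects for v in s]
        summary[name] = {
            "fraction_missing": sum(v is None for v in cells) / len(cells),
            "patterns": Counter(classify(s) for s in subjects),
        }
    return summary

for name, info in summarize(data).items():
    print(name, round(info["fraction_missing"], 3), dict(info["patterns"]))
```

In this toy data the online group has more missing values than the coached group (5 of 12 versus 3 of 12) and more drop-outs. A difference of this kind between groups is worth investigating before choosing an analysis.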