Prognostic Model Building with Biomarkers in Pharmacogenomics Trials

Li-an Xu & Douglas Robinson
Statistical Genetics & Biomarkers Exploratory Development, Global Biometric Sciences
Bristol-Myers Squibb

2006 FDA/Industry Statistics Workshop
Theme: Statistics in the FDA and Industry: Past, Present, and Future
Washington, DC, September 27-29, 2006

Outline

• Statistical challenges in prognostic model building:
  - Data quantity and quality across multiple platforms
  - Dimension reduction in the model building process
  - Model performance measures
  - Realistic assessment of model performance
  - Handling correlated predictors when p >> n

Data Quantity and Quality Across Platforms

• Tumor samples for mRNA profiling:
  - Trial A (161 subjects): 134 usable mRNA samples of sufficient quality and quantity (85%)
  - Trial B (110 subjects): 83 usable mRNA samples (75%)
• Plasma protein profiling (liquid chromatography / mass spectrometry):
  - Trial B (110 subjects): 90 usable plasma samples (82%)
• Even when sample collection is mandatory, the usable sample size falls short of the subject sample size
• Studies therefore need to be designed around the expected usable sample size

Dimension Reduction in Prognostic Model Building

• In high-throughput biomarker studies the number of potential predictors exceeds the number of subjects (p >> n)
• Traditional model-fitting methods then have no unique solution
• Regularized methods can provide possible solutions:
  - Penalized logistic regression (PLR) with recursive feature elimination (RFE)
  - Threshold gradient descent with RFE
• Further dimension reduction may still be needed:
  - Incorporate prior information (e.g., results from preclinical studies as the starting point for p)
  - Intersect single-biomarker results from multiple statistical methods

Dimension Reduction Through Penalized Logistic Regression with Recursive Feature Elimination to Select Genes

[Figure: heat map of the training set (genes by patients) alongside a plot of average cross-validation error against the number of predictors in the model, as RFE reduces ~22,000 genes down to 1 gene.]

• Choose the model with the smallest cross-validation error and the fewest genes (a code sketch of this loop appears at the end of this section)

Dimension Reduction Through Preclinical Studies

• Goal: predict cell line sensitivity to a compound
• 18 cancer cell lines (12 sensitive, 6 resistant)
• Identified the top 200 genes associated with in vitro sensitivity/resistance

[Figure: expression levels of one example gene across the 18 cancer cell lines (MCF7, BT474, SkBr3, HCC1954, and others), grouped into sensitive and resistant lines.]

Predicting Response in Trial A

• All treated patients: N = 161, of whom 29 (18%) responded
• Patients included in the genomics analysis: N = 134, of whom 23 (17%) responded

| Model | PPV (95% CI) | NPV (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | Error |
|---|---|---|---|---|---|
| Starting with the full gene list, resulting in a 6-gene model | 0 (0-0.30) | 0.81 (0.69-0.89) | 0 (0-0.26) | 0.84 (0.72-0.91) | 0.580 |
| Starting with the preclinical top 200, resulting in a 10-gene model | 0.45 (0.21-0.72) | 0.89 (0.79-0.95) | 0.45 (0.21-0.72) | 0.89 (0.79-0.95) | 0.326 |

• Dimension reduction using prior preclinical results appeared to help in this trial

Dimension Reduction Through Intersection of Single-Biomarker Results from Multiple Statistical Methods

| Method | Resp1 | Resp2 | Resp3 | Resp4 | TTP |
|---|---|---|---|---|---|
| Logistic regression | X | X | X | X | |
| t-test | X | X | X | X | |
| Cox proportional hazards | | | | | X |

[Figure: Venn diagram of the probe sets flagged by logistic regression (297 probe sets), the t-test (396 probe sets), and Cox proportional hazards (446 probe sets); the three-way intersection contains 51 probe sets.]

• The intersection yielded 51 potential candidates
• It may be more beneficial to start model building with this set than with the complete set of potential predictors (work currently in progress); a second sketch of this screening step follows below
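The PLR-with-RFE selection loop described above can be sketched in a few lines. The talk does not specify an implementation, so this is only a minimal illustration using scikit-learn's LogisticRegression (L2 penalty) and RFE on simulated data; the elimination schedule, penalty strength, and data are all placeholder assumptions.

```python
# A minimal sketch of penalized (L2) logistic regression with recursive feature
# elimination, scored by cross-validation, keeping the model with the smallest
# CV error and the fewest genes. Illustrative only; not the talk's actual code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical data standing in for the expression matrix (patients x genes).
X, y = make_classification(n_samples=120, n_features=500, n_informative=10,
                           random_state=0)

best = None
for n_genes in [200, 100, 50, 25, 10, 5, 2, 1]:   # assumed elimination schedule
    model = make_pipeline(
        RFE(LogisticRegression(penalty="l2", C=0.1, max_iter=1000),
            n_features_to_select=n_genes, step=0.5),
        LogisticRegression(penalty="l2", C=0.1, max_iter=1000),
    )
    # Cross-validation wraps the entire procedure, gene selection included.
    err = 1 - cross_val_score(model, X, y, cv=5).mean()
    print(f"{n_genes:4d} genes: CV error = {err:.3f}")
    # Iterating from many genes to few, "<=" breaks ties toward fewer genes.
    if best is None or err <= best[0]:
        best = (err, n_genes)

print(f"chosen model: {best[1]} genes, CV error {best[0]:.3f}")
```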
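The intersection screen on the last slide above reduces to running several single-marker tests and keeping only the probe sets flagged by all of them. A minimal sketch, assuming simulated data and an arbitrary 0.01 significance cutoff; only the t-test and a univariate logistic regression are shown, since the Cox proportional hazards screen would additionally require time-to-progression data:

```python
# Intersection of single-biomarker results from multiple statistical methods.
# The data, cutoff, and effect sizes are hypothetical.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n, p, alpha = 100, 300, 0.01
y = rng.integers(0, 2, size=n)        # binary response (simulated)
X = rng.normal(size=(n, p))           # "expression" matrix (simulated)
X[y == 1, :5] += 1.0                  # make the first 5 genes informative

hits = {}
# t-test: responders vs. non-responders, one gene at a time.
t_p = stats.ttest_ind(X[y == 1], X[y == 0], axis=0).pvalue
hits["t_test"] = set(np.flatnonzero(t_p < alpha))
# Univariate logistic regression of response on each gene.
lr_p = [sm.Logit(y, sm.add_constant(X[:, j])).fit(disp=0).pvalues[1]
        for j in range(p)]
hits["logistic"] = set(np.flatnonzero(np.array(lr_p) < alpha))

# Model building would then start from this reduced candidate set.
candidates = set.intersection(*hits.values())
print({k: len(v) for k, v in hits.items()}, "intersection:", len(candidates))
```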
Model Performance Measures

• Sensitivity, specificity, and positive and negative predictive value (PPV, NPV) are common measures of model performance, but all of them depend on the choice of threshold
• The area under the ROC curve (AUC) may be a better measure for comparing models

[Figure: predicted response probability by response status (non-responder vs. responder) for three models built from simulated perfect predictors; each model completely separates the two groups, but the predicted probabilities sit at different levels relative to 0.5.]

• All three models yield complete separation between responders and non-responders
• An arbitrary threshold of 0.5 probability may lead one to believe that Model 2 is superior
• AUC correctly shows that the models are equivalent (illustrated in the first sketch after this section)

| | Sensitivity | Specificity | PPV | NPV | AUC |
|---|---|---|---|---|---|
| Model 1 | 0.73 | 1 | 1 | 0.79 | 1 |
| Model 2 | 1 | 1 | 1 | 1 | 1 |
| Model 3 | 1 | 0.77 | 0.81 | 1 | 1 |

Realistic Assessment of Model Performance

• When the sample size is reasonably large:
  - Split the sample into a training set and an independent test set
  - Build the model on the training set and assess its performance on the test set
  - Pro: one independent test of performance for the model selected on the training set
  - Cons: with a small sample, the performance estimate may have large variance, and the reduced training sample may yield a sub-optimal model
• The entire model building procedure should be cross-validated (Christophe Ambroise & Geoffrey J. McLachlan, PNAS 99(10), 2002)

Realistic Assessment of Model Performance (continued)

• When the sample size is small, the data cannot be split into training and test sets; cross-validation alone is a reasonable alternative
• Warning: the initial performance estimate may be misleading

[Figure: cross-validated AUC against the number of predictors; individual cross-validation runs scatter widely around the average AUC.]

• Cross-validation should be repeated multiple times (see the second sketch below):
  - Repetition reveals the effects of sampling variability
  - The average of the replicate estimates gives a more accurate assessment of model performance
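The threshold-dependence argument from the Model Performance Measures slide is easy to reproduce numerically. In this sketch the three "models" are hand-made probability vectors that each separate responders from non-responders perfectly but sit at different levels relative to 0.5; the specific values are illustrative assumptions, not the slide's data:

```python
# Three simulated perfect predictors: all separate responders from
# non-responders completely (AUC = 1), yet a fixed 0.5 threshold makes their
# sensitivity and specificity look different.
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0] * 5 + [1] * 5)   # response status: 5 non-responders, 5 responders
models = {
    "model 1": np.r_[np.linspace(.05, .25, 5), np.linspace(.30, .60, 5)],
    "model 2": np.r_[np.linspace(.05, .40, 5), np.linspace(.60, .95, 5)],
    "model 3": np.r_[np.linspace(.40, .70, 5), np.linspace(.75, .95, 5)],
}
for name, prob in models.items():
    pred = (prob >= 0.5).astype(int)          # arbitrary 0.5 threshold
    sens = pred[y == 1].mean()                # fraction of responders called
    spec = 1 - pred[y == 0].mean()            # fraction of non-responders spared
    auc = roc_auc_score(y, prob)              # threshold-free comparison
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.2f}")
```

Only model 2's probabilities happen to straddle 0.5, so it alone looks perfect at that threshold, while the AUC of 1.0 for all three reflects their actual equivalence.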
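The two points above (cross-validating the entire procedure, selection step included, and repeating the cross-validation) can be combined in one sketch. Here a univariate F-test filter stands in for the talk's selection step; it is placed inside a scikit-learn pipeline so each fold refits it, and the cross-validation is repeated to expose run-to-run variability. The pure-noise data are an assumption chosen to make optimism easy to spot, since the true AUC is 0.5:

```python
# Repeated cross-validation with biomarker selection inside the loop
# (per Ambroise & McLachlan, PNAS 2002). Selecting genes on the full data
# first and then cross-validating only the classifier would bias the AUC
# upward; putting selection in the pipeline refits it on each training fold.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 1000))      # pure noise: the true AUC is 0.5
y = rng.integers(0, 2, size=80)

pipe = make_pipeline(SelectKBest(f_classif, k=20),     # selection, per fold
                     LogisticRegression(max_iter=1000))
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
aucs = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")

# Individual runs scatter widely; the average over repeats is the stable estimate.
print(f"AUC: mean={aucs.mean():.2f}, run-to-run sd={aucs.std():.2f}")
```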
Handling Correlated Predictors: When p >> n

• The correlation structure is complex (taking mRNA as an example):
  - Multiple probe sets interrogate the same gene
  - Multiple genes function together in pathways, and not all pathways are known
  - Multiple response definitions are interrelated
  - False-positive genes may be correlated with true positives
• Most prognostic modeling techniques do not handle this well; recursive feature elimination may remove important predictors because of correlations
• This is an open research problem

Summary

• Design studies based on the expected usable sample size
• Dimension reduction in the model building process:
  - The overfitting problem can be mitigated by regularized methods
  - To further reduce the candidate set of predictors, preclinical information can be useful, and intersecting single-biomarker results from different statistical methods may also help
• Model performance:
  - An independent test set may be important for validation purposes; when the sample size is small, cross-validation is a viable alternative
  - Cross-validation should include the biomarker selection procedure and must be performed appropriately
  - Cross-validation should be repeated multiple times
  - Performance measures should be chosen carefully when comparing multiple models; AUC is often a good choice
• Handling correlated predictors is still an open research problem

Acknowledgments

Can Cai, Scott Chasalow, Ed Clark, Mark Curran, Ashok Dongre, Matt Farmer, Alexander Florczyk, Shirin Ford, Susan Galbraith, Ji Gao, Nancy Gustafson, Ben Huang, Tom Kelleher, Christiane Langer, Hyerim Lee, Haolan Lu, David Mauro, Shelley Mayfield, Oksana Mokliatchouk, Relekar Padmavathibai, Barry Paul, Lynn Ploughman, Amy Ronczka, Katy Simonsen, Eric Strittmatter, Dana Wheeler, Shujian Wu, Shuang Wu, Kim Zerba, Renping Zhang