SOLVING WIDE PREDICTIVE MODELING PROBLEMS WITH CLINICAL AND GENOMIC DATA
Kelci J. Miclaus, PhD
Advanced Analytics R&D Manager
JMP Life Sciences
SAS Institute, Inc.
Copyright © 2013, SAS Institute Inc. All rights reserved.

INTRODUCTION: OUTLINE
• Precision Medicine Initiative
• Predictive Models and the Impact of “Big” Data
• Tools for Model Assessment
• Subgroup Analysis
• Live Demonstration

INTRODUCTION: PRECISION MEDICINE INITIATIVE
• Biological data to drive research for tailored therapies
• Wide range of application areas, including oncology and pharmacogenomics
• Better predict…
  • Treatment outcomes
  • Responders
  • Survival or Time-to-Event
• Rich data mining environment

PREDICTIVE MODELING: METHODS AND BIG DATA
• Rich set of methodology for prediction problems
• Popular methods:
  • Continuous: GLM, PLS, Kernel methods (e.g. Ridge, Radial-Basis), Trees (e.g. Forest, Gradient Boosting), Quantile Regression
  • Discrete: Logistic, Discriminant, KNN
  • Censored: Life Regression, Cox Proportional Hazards, Buckley-James
• “Big” biological data => Wide prediction problem!
  • Serious risk of overfitting

PREDICTIVE MODELING: PREDICTOR REDUCTION
• Filtering Techniques (from simple to complex)
  • Known biology
  • Statistical testing
  • Clustering
  • Forest models or linear regression model selection
  • Optimization
• Combination of algorithms + predictor reduction = MILLIONS of potential models
• Critical to perform filtering within a cross-validated framework to prevent OVERFITTING and generalization bias in your models

MODEL ASSESSMENT: CROSS-VALIDATION MODEL COMPARISON
• Data Hold Out: K-fold, leave L-out, leave P-percent-out, etc.
• Hold Out Methods: Simple Random, Random Partition, Stratified, etc.
• Performance Metrics: RMSE, Harrell’s C, AUC, Correlation, etc.

SPECIALIZED PREDICTION PROBLEMS: SUBGROUP ANALYSIS
• Identify subjects most likely to respond to treatment
• Benefits in study design / safety / ethics
• Subgroup Guidance (CPMP, 2014)
• Classification and Regression Trees are popular models (Zink et al., 2015)
[Figure: quadrant plot of P(Improve if Treated) against P(Improve if NOT Treated), each axis running from 0 to 1, with regions labeled GET WELL ANYWAY, INCURABLE, DRUG MAKES YOU WORSE, and DRUG CURES YOU]

JMP LIFE SCIENCES: LIVE DEMONSTRATION
• JMP Genomics and JMP Clinical Predictive Modeling Reviews
• Example Data
  • Sepsis prediction in hospitals with metabolite and protein data
  • Survival prediction in prostate cancer with clinical trials data

DISCOVERY AND PREDICTION: HOSPITAL BIOMARKER UTILITY TO PREDICT SEPSIS SURVIVAL

SUBGROUP ANALYSIS: INTERACTION TREES
• Linear, Logistic or Cox Model:
  $f(y_i) = \beta_0 + \beta_1 x_i + \beta_2\,\mathrm{Treatment}_i + \beta_3\,\mathrm{Treatment}_i \cdot x_i$
• Significant interaction implies a differential treatment effect between subgroups defined by a binary covariate
• Split based on the p-value of the treatment-by-covariate interaction term
• Su et al. (2009)
[Figure: interaction tree. All Randomized Subjects are split on Biomarker 1 (Absent / Present), with further Absent / Present splits on Biomarker 2 and Biomarker 3]

SUBGROUP ANALYSIS: VIRTUAL TWINS
• Virtual Twins (Foster et al., 2011)
• Fit a forest model to the response and counterfactual data, then fit a tree model to the estimated treatment effects
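
The two Virtual Twins steps can be sketched in a few lines of standard-library Python. This is a rough illustration only, not the JMP implementation and not the exact procedure of Foster et al.: a small bagged ensemble of one-split regression stumps stands in for the forest, a separate ensemble is fitted in each treatment arm, biomarkers are assumed to be coded 0/1, and every function name and the noise-free toy data are hypothetical.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(X, y):
    """Fit a one-split regression tree on binary (0/1) features.

    Returns (feature_index, mean_if_0, mean_if_1). feature_index is None
    when no split reduces the error; both means are then the overall mean.
    """
    best = (None, mean(y), mean(y))
    best_sse = sse(y)
    for j in range(len(X[0])):
        left = [yi for xi, yi in zip(X, y) if xi[j] == 0]
        right = [yi for xi, yi in zip(X, y) if xi[j] == 1]
        if not left or not right:
            continue
        total = sse(left) + sse(right)
        if total < best_sse - 1e-12:
            best, best_sse = (j, mean(left), mean(right)), total
    return best


def predict_stump(stump, x):
    j, mean_if_0, mean_if_1 = stump
    if j is None:
        return mean_if_0
    return mean_if_1 if x[j] == 1 else mean_if_0


def fit_bagged_stumps(X, y, n_trees, rng):
    """Assumed stand-in for a forest: stumps fitted to bootstrap resamples."""
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    return mean(predict_stump(s, x) for s in stumps)


def virtual_twins(X, y, treated, n_trees=25, seed=0):
    """Return (estimated_effects, subgroup_stump)."""
    rng = random.Random(seed)
    arm1 = [i for i, t in enumerate(treated) if t == 1]
    arm0 = [i for i, t in enumerate(treated) if t == 0]
    f1 = fit_bagged_stumps([X[i] for i in arm1], [y[i] for i in arm1], n_trees, rng)
    f0 = fit_bagged_stumps([X[i] for i in arm0], [y[i] for i in arm0], n_trees, rng)
    # Step 1: predict every subject under both arms; the difference is the
    # estimated individual treatment effect (the "twin" contrast).
    z = [predict_bagged(f1, x) - predict_bagged(f0, x) for x in X]
    # Step 2: a tree fitted to the estimated effects describes the subgroup.
    return z, fit_stump(X, z)


if __name__ == "__main__":
    # Toy data: two binary biomarkers; treatment helps only when biomarker 0 is present.
    X = [[b0, b1] for b0 in (0, 1) for b1 in (0, 1) for _ in range(10)]
    treated = [i % 2 for i in range(len(X))]
    y = [1.0 + 2.0 * x[0] * t for x, t in zip(X, treated)]
    effects, subgroup = virtual_twins(X, y, treated)
    print("split on biomarker index:", subgroup[0])
    print("mean effect if absent / present:", subgroup[1], subgroup[2])
```

In practice the stump ensemble would be replaced by a real forest with out-of-bag predictions, and the second-step tree would be allowed more than one split; the sketch only shows how the counterfactual differences feed the tree.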
SUBGROUP ANALYSIS: OPTIMAL TREATMENT REGIMES
• Subgroup identification
  • “the right patients for a given drug”
• Optimal treatment regimes
  • “the best drug for a given patient”
• Zhang et al. (2011) methodology: fit a response regression model and a propensity score logistic model to create a pseudo binary response and weight (augmented inverse probability weighted estimators, or AIPWE)
• Use these as input into predictive modeling routines, including cross-validated designs (Freidlin et al., 2009)
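
As a minimal sketch of the pseudo-response step described above, the function below turns already-fitted model outputs into a binary label and a weight per subject using the standard AIPW contrast. It assumes a binary treatment coded 0/1, that larger responses are better, and that the propensity scores and arm-specific predicted responses come from models fitted elsewhere; the function and argument names are hypothetical, not taken from the slides or from JMP.

```python
def aipw_pseudo_labels(y, a, pi_hat, mu1_hat, mu0_hat):
    """Return (labels, weights) built from AIPW contrast estimates.

    y        observed responses
    a        treatment indicators (1 = treated, 0 = not treated)
    pi_hat   estimated P(treated | covariates) from a propensity model
    mu1_hat  predicted response if treated, from a regression model
    mu0_hat  predicted response if not treated, from a regression model
    """
    labels, weights = [], []
    for yi, ai, pi, m1, m0 in zip(y, a, pi_hat, mu1_hat, mu0_hat):
        # Augmented inverse-probability-weighted estimate under each arm.
        est1 = ai * yi / pi - (ai - pi) / pi * m1
        est0 = (1 - ai) * yi / (1 - pi) + (ai - pi) / (1 - pi) * m0
        contrast = est1 - est0
        # Pseudo binary response: does treatment look better for this subject?
        labels.append(1 if contrast > 0 else 0)
        # Weight: how much is gained or lost by getting that call right.
        weights.append(abs(contrast))
    return labels, weights


if __name__ == "__main__":
    labels, weights = aipw_pseudo_labels(
        y=[2.0, 3.0],
        a=[1, 0],
        pi_hat=[0.5, 0.5],
        mu1_hat=[1.0, 1.0],
        mu0_hat=[0.5, 2.0],
    )
    print(labels, weights)
```

The labels and weights can then be passed to any classifier that accepts case weights, with the model fitting and any predictor filtering repeated inside each cross-validation fold.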