JMPDiscovery_2015
SOLVING WIDE PREDICTIVE MODELING PROBLEMS WITH CLINICAL AND GENOMIC DATA
Kelci J. Miclaus, PhD
Advanced Analytics R&D Manager
JMP Life Sciences
SAS Institute, Inc.
Copyright © 2013, SAS Institute Inc. All rights reserved.
INTRODUCTION OUTLINE
• Precision Medicine Initiative
• Predictive Models and the Impact of “Big” Data
• Tools for Model Assessment
• Subgroup Analysis
• Live Demonstration
INTRODUCTION PRECISION MEDICINE INITIATIVE
• Biological data to drive research for tailored therapies
• Wide range of application areas, including oncology and pharmacogenomics
• Better predict:
  • Treatment outcomes
  • Responders
  • Survival or time-to-event
• Rich data-mining environment
PREDICTIVE MODELING METHODS AND BIG DATA
• Rich set of methods for prediction problems
• Popular methods:
  • Continuous: GLM, PLS, Kernel methods (e.g., Ridge, Radial-Basis), Trees (e.g., Forest, Gradient Boosting), Quantile Regression
  • Discrete: Logistic, Discriminant, KNN
  • Censored: Life Regression, Cox Proportional Hazards, Buckley-James
• “Big” biological data => wide prediction problem: far more predictors than subjects!
  • Serious risk of overfitting
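To make the overfitting risk concrete, here is a minimal sketch with simulated data (our own illustration, not from the talk): 500 predictors, 30 training subjects, and only 5 predictors carrying real signal. An ordinary least-squares fit (NumPy's minimum-norm `lstsq` solution) reproduces the training responses almost exactly yet predicts new subjects poorly. The sample sizes and coefficients are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_test, p = 30, 200, 500           # wide: p far exceeds n_train
beta = np.zeros(p)
beta[:5] = 1.0                              # only 5 predictors truly matter

X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
y_train = X_train @ beta + rng.normal(size=n_train)   # noise SD = 1
y_test = X_test @ beta + rng.normal(size=n_test)

# With p > n, least squares has infinitely many exact solutions;
# lstsq returns the minimum-norm one, which interpolates the training data.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

train_rmse = rmse(y_train, X_train @ coef)
test_rmse = rmse(y_test, X_test @ coef)
print(f"training RMSE: {train_rmse:.2e}   test RMSE: {test_rmse:.2f}")
```

Because the simulated noise has standard deviation 1, a test RMSE well above 1 reflects error introduced by the fit itself, while the near-zero training RMSE says nothing about generalization.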
PREDICTIVE MODELING PREDICTOR REDUCTION
• Filtering techniques, from simple to complex:
  • Known biology
  • Statistical testing
  • Clustering
  • Forest models or linear regression model selection
  • Optimization
• Combination of algorithms + predictor reduction = MILLIONS of potential models
• Critical to perform filtering within a cross-validated framework (repeated inside each training fold) to prevent OVERFITTING and optimistically biased estimates of how well your models generalize
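The cost of filtering outside cross-validation can be shown with a small simulation (our own sketch, not a JMP feature). The labels are unrelated to 2,000 random predictors, yet selecting the top 20 predictors on the full data before cross-validating makes a simple nearest-centroid classifier look accurate. Repeating the selection inside each training fold gives an honest estimate near chance (0.5). The two-sample t filter, the nearest-centroid classifier, the helper names, and all sizes are illustrative assumptions.

```python
import numpy as np

def top_k_by_t(X, y, k):
    """Indices of the k columns with the largest |two-sample t statistic|."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / se
    return np.argsort(np.abs(t))[-k:]

def nearest_centroid(X_train, y_train, X_test):
    """Assign each test row to the class whose training centroid is closer."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_accuracy(X, y, k, n_folds, filter_inside, seed):
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    # Leaky variant: one filter computed from ALL subjects, including test folds.
    fixed_keep = None if filter_inside else top_k_by_t(X, y, k)
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        if filter_inside:
            # Honest variant: the filter only ever sees the training fold.
            keep = top_k_by_t(X[train_idx], y[train_idx], k)
        else:
            keep = fixed_keep
        pred = nearest_centroid(X[train_idx][:, keep], y[train_idx], X[test_idx][:, keep])
        correct += int((pred == y[test_idx]).sum())
    return correct / n

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2000))   # 40 subjects, 2,000 pure-noise predictors
y = np.repeat([0, 1], 20)         # labels carry no real signal

leaky = cv_accuracy(X, y, k=20, n_folds=5, filter_inside=False, seed=1)
honest = cv_accuracy(X, y, k=20, n_folds=5, filter_inside=True, seed=1)
print(f"filter outside CV: {leaky:.2f}   filter inside CV: {honest:.2f}")
```

The exact accuracies depend on the random seed; the point is the gap between the two estimates on data that contain no signal at all.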
MODEL ASSESSMENT CROSS-VALIDATION MODEL COMPARISON
• Data hold-out: K-fold, leave-L-out, leave-P-percent-out, etc.
• Hold-out methods: simple random, random partition, stratified, etc.
• Performance metrics: RMSE, Harrell’s C, AUC, correlation, etc.
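Of the metrics listed, Harrell’s C is the one specific to censored outcomes. Below is a minimal sketch of the usual definition, assuming higher `risk` scores mean earlier expected failure: among pairs in which the subject with the shorter time had an observed event, it counts the fraction where that subject also has the higher risk score, with ties in risk counting one half. For brevity, this version simply skips pairs with tied times; library implementations handle ties more carefully.

```python
def harrells_c(time, event, risk):
    """Harrell's concordance index for right-censored data.

    time  : observed follow-up times
    event : 1 if the event was observed, 0 if censored
    risk  : model risk scores (higher = expected to fail sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:  # i failed while j was still under follow-up
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# The second subject is censored, so only pairs anchored by the first and third count.
times = [2.0, 4.0, 6.0]
events = [1, 0, 1]
print(harrells_c(times, events, [0.9, 0.5, 0.1]))  # 1.0: every comparable pair ordered correctly
print(harrells_c(times, events, [0.5, 0.9, 0.5]))  # 0.25: one discordant pair, one tie
```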
SPECIALIZED PREDICTION PROBLEMS SUBGROUP ANALYSIS
• Identify subjects most likely to respond to treatment
• Benefits in study design / safety / ethics
• Subgroup guidance (CPMP, 2014)
• Classification and regression trees are popular models (Zink et al., 2015)
[Figure: quadrant plot of P(Improve if Treated) (vertical axis, 0 to 1) against P(Improve if NOT Treated) (horizontal axis, 0 to 1), divided at 0.5 into four regions: DRUG CURES YOU (high if treated, low if not), GET WELL ANYWAY (high either way), INCURABLE (low either way), DRUG MAKES YOU WORSE (low if treated, high if not)]
JMP LIFE SCIENCES LIVE DEMONSTRATION
• JMP Genomics and JMP Clinical Predictive Modeling Reviews
• Example data:
  • Sepsis prediction in hospitals with metabolite and protein data
  • Survival prediction in prostate cancer with clinical trials data
DISCOVERY AND PREDICTION HOSPITAL BIOMARKER UTILITY TO PREDICT SEPSIS SURVIVAL
SUBGROUP ANALYSIS INTERACTION TREES
Linear, logistic or Cox model:
$f(y_i) = \beta_0 + \beta_1 x_i + \beta_2 \mathrm{Treatment}_i + \beta_3 \mathrm{Treatment}_i x_i$
A significant interaction ($\beta_3 \neq 0$) implies a differential treatment effect between the subgroups defined by the binary covariate $x_i$.
[Figure: example interaction tree. The root node (all randomized subjects) is split based on the p-value of the treatment-by-covariate interaction term; the first split is on Biomarker 1 (Absent/Present), and the child nodes are split further on Biomarker 2 and Biomarker 3 (Absent/Present).]
Su et al. (2009)
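The splitting rule can be sketched for the linear case, assuming binary biomarkers and a single split. For each candidate covariate, the model above is fit by ordinary least squares and the t statistic for $\beta_3$ is computed; since every candidate is fit on the same rows with the same number of parameters, the largest |t| corresponds to the smallest p-value. The simulated data and function names are hypothetical, and this omits the full recursive tree-growing procedure of Su et al. (2009).

```python
import numpy as np

def interaction_t(y, treat, x):
    """OLS t statistic for beta3 in y = b0 + b1*x + b2*treat + b3*treat*x + error."""
    X = np.column_stack([np.ones(len(y)), x, treat, treat * x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # covariance of beta-hat
    return beta[3] / np.sqrt(cov[3, 3])

def best_interaction_split(y, treat, biomarkers):
    """Index of the binary biomarker with the largest |interaction t|."""
    scores = [abs(interaction_t(y, treat, biomarkers[:, j]))
              for j in range(biomarkers.shape[1])]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(7)
n = 200
treat = rng.integers(0, 2, size=n).astype(float)
biomarkers = rng.integers(0, 2, size=(n, 3)).astype(float)
# Simulated truth: treatment helps only when Biomarker 1 (column 0) is present;
# Biomarker 2 (column 1) shifts the outcome but does not modify the effect.
y = 1.0 + 0.5 * biomarkers[:, 1] + 2.0 * treat * biomarkers[:, 0] + rng.normal(size=n)

best, scores = best_interaction_split(y, treat, biomarkers)
print("split on column", best, "|t| =", [round(float(s), 1) for s in scores])
```

Note that Biomarker 2 has a main effect but is not chosen: the criterion looks only at the treatment-by-covariate term, which is what separates predictive biomarkers from merely prognostic ones.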
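SUBGROUP ANALYSIS VIRTUAL TWINS
• Virtual Twins (Foster et al., 2011)
• Fit a forest model to the response, use it to predict each subject's outcome under both treatments (the counterfactual “virtual twin”), then fit a tree model to the resulting estimated treatment effects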
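A compact sketch of the two Virtual Twins steps, with one swap stated plainly: to stay within NumPy, an ordinary least-squares model with treatment-by-covariate interactions stands in for the random forest of Foster et al. (2011), and a single best split (a depth-one regression tree) stands in for the full tree. Each subject's estimated treatment effect Z is the model's prediction under treatment minus its prediction under control. The simulated data and function names are hypothetical.

```python
import numpy as np

def design(covariates, treat):
    """Intercept, covariates, treatment, and treatment-by-covariate interactions."""
    return np.column_stack([np.ones(len(treat)), covariates, treat,
                            treat[:, None] * covariates])

def virtual_twin_effects(covariates, treat, y):
    """Step 1: fit one outcome model, then predict every subject under both arms."""
    beta, *_ = np.linalg.lstsq(design(covariates, treat), y, rcond=None)
    n = len(y)
    y_if_treated = design(covariates, np.ones(n)) @ beta
    y_if_control = design(covariates, np.zeros(n)) @ beta   # the "virtual twin"
    return y_if_treated - y_if_control                     # estimated effect Z

def best_stump(covariates, z):
    """Step 2 (simplified): the single split on Z that minimizes within-node SSE."""
    best_sse, best_col, best_cut = np.inf, None, None
    for j in range(covariates.shape[1]):
        for cut in np.unique(covariates[:, j])[:-1]:   # keep both nodes non-empty
            left = covariates[:, j] <= cut
            sse = (((z[left] - z[left].mean()) ** 2).sum()
                   + ((z[~left] - z[~left].mean()) ** 2).sum())
            if sse < best_sse:
                best_sse, best_col, best_cut = sse, j, cut
    return best_col, best_cut

rng = np.random.default_rng(11)
n = 400
covariates = rng.normal(size=(n, 3))
treat = rng.integers(0, 2, size=n).astype(float)
# Simulated truth: treatment adds 2 only when covariate 0 is positive.
y = covariates[:, 1] + 2.0 * treat * (covariates[:, 0] > 0) + rng.normal(size=n)

z = virtual_twin_effects(covariates, treat, y)
col, cut = best_stump(covariates, z)
print(f"enhanced-effect subgroup: covariate {col} > {cut:.2f}")
```

A forest in Step 1 can capture nonlinear effects such as the step in this simulation directly; the linear stand-in only recovers its trend, which is enough here to point the split at the right covariate.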
SUBGROUP ANALYSIS OPTIMAL TREATMENT REGIMES
• Subgroup identification
  • “the right patients for a given drug”
• Optimal treatment regimes
  • “the best drug for a given patient”
• Zhang et al. (2011) methodology: fit a response regression model and a propensity score logistic model, then combine them through augmented inverse probability weighted estimators (AIPWE) to create a pseudo binary response and a weight for each subject (the sign of the subject's estimated treatment contrast gives the response; its magnitude gives the weight)
• Use these as input into predictive modeling routines, including cross-validated designs (Freidlin et al., 2009)
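A minimal NumPy sketch of the AIPWE construction, under stated assumptions: a two-arm study with treatment $A_i$ coded 0/1, a logistic propensity model fit by Newton-Raphson, and a linear response model with treatment-by-covariate interactions. With estimated propensity $\hat\pi_i$ and predicted responses $\hat\mu_1(X_i)$, $\hat\mu_0(X_i)$ under treatment and control, the contrast for subject $i$ is $C_i = \frac{A_i Y_i}{\hat\pi_i} - \frac{A_i - \hat\pi_i}{\hat\pi_i}\hat\mu_1(X_i) - \left[\frac{(1-A_i) Y_i}{1-\hat\pi_i} + \frac{A_i - \hat\pi_i}{1-\hat\pi_i}\hat\mu_0(X_i)\right]$. The pseudo binary response is $1\{C_i > 0\}$ and the weight is $|C_i|$. Function names and simulated data are hypothetical.

```python
import numpy as np

def fit_propensity(X, a, n_iter=25):
    """P(A=1 | X) from a logistic model fit by Newton-Raphson (no penalty)."""
    Xd = np.column_stack([np.ones(len(a)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (a - p)
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def fit_response(X, a, y):
    """Linear response model with interactions; returns mu1(X), mu0(X)."""
    def design(trt):
        return np.column_stack([np.ones(len(y)), X, trt, trt[:, None] * X])
    beta, *_ = np.linalg.lstsq(design(a), y, rcond=None)
    return design(np.ones(len(y))) @ beta, design(np.zeros(len(y))) @ beta

def aipw_contrast(y, a, pi, mu1, mu0):
    """Per-subject AIPWE estimate of the treatment-minus-control contrast."""
    treated = a * y / pi - (a - pi) / pi * mu1
    control = (1 - a) * y / (1 - pi) + (a - pi) / (1 - pi) * mu0
    return treated - control

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 2))
a = rng.integers(0, 2, size=n).astype(float)
# Simulated truth: treatment adds 2 when X[:, 0] > 0, so the average effect is about 1.
y = X[:, 1] + 2.0 * a * (X[:, 0] > 0) + rng.normal(size=n)

pi = fit_propensity(X, a)
mu1, mu0 = fit_response(X, a, y)
C = aipw_contrast(y, a, pi, mu1, mu0)
Z = (C > 0).astype(int)   # pseudo binary response: treatment estimated to help
W = np.abs(C)             # weight: size of the estimated benefit or harm
print(f"mean contrast {C.mean():.2f}; {Z.mean():.0%} of subjects labeled Z = 1")
```

`Z` and `W` can then be passed to any weighted classifier, such as a tree, inside a cross-validated design. Individual contrasts are noisy; the augmentation term mainly protects the average against a misspecified response model when the propensity model is correct.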