Artificial Intelligence Enables Precision Medicine
Mohammed Eslami, Ph.D. - Chief Data Scientist
June 26, 2017
Data Intelligence Conference, Capital One
Copyright 2017 Netrias, LLC™ All rights reserved.
The Multi-omics Stack - Making sense of data for research
Aspirational AI
Medicine Today
Medicine Tomorrow
Diagnose and Treat
Reality: >= $1B, 5% success
Data in Life Sciences vs Other Domains
• Life Sciences: “Cambrian explosion” of data types (Volume/Variety)
• Inconsistent and conflicting indirect measurements:
  • Variant and function annotations from literature disagree
  • Gene identifier formats vary
  • Measurements are 2 degrees separated from observed variables
• Cybersecurity: consistent, well-understood data types (Volume/Velocity)
Challenges Aligning Technology & Analytics to Research Questions
• Analytic selection bias
• Prior knowledge application
• Complex Systems
• Model Reuse
Solutions for Analytic Challenges in Biomedical Research
Multi-omics Data
Problems → Solutions
• Analytic Selection Bias → Data Driven Selection and Parameterization
• Prior Knowledge Application → Prior Knowledge Embedding
• Complex Systems → Complex Models
• Model Reuse → Transfer Learning
Identify factors that Correlate with Kennedy’s Disease
Data Driven Selection and Parameterization
Prior Knowledge Embedding
Uncovering Tissue Specificity of SBMA
Kennedy’s Disease, Spinal Bulbar Muscular Atrophy (SBMA)
Motor-neuron specific disease, only in males
NINDS Clinician collecting data from patient samples
Identify factors that Correlate with Kennedy’s Disease for NINDS
Scientific Problem
Use Prior Knowledge to Identify Protein Localization Patterns from RNA Levels of Cortical Neurons that are unaffected by Kennedy’s disease
[Diagram labels: Skin Cell, Stem Cell, Control Motor Neuron, Control Cortical Neuron, Microarray Data]
Analytic Problem
Standard dysregulation analysis: No significant patterns or biomarkers extracted
Technology Problem
R, Spreadsheets, Ingenuity Pathway Analysis
Data-driven discovery of model optimizes parameterization to ensure analytic reproducibility
Investigate Clustering Categories
• Topology Based
• Link Based
• Density Based
• Distribution Based
Optimize Score Function
• Akaike Information Criterion
• Bayesian Information Criterion
• Silhouette Score
Number of Clusters
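The score-driven choice of cluster count on this slide can be sketched roughly as follows. This is a minimal illustration, not the speaker's pipeline: it assumes plain k-means as the clustering algorithm (the talk does not name one), uses only the silhouette score from the listed score functions, and all function names and the toy points are hypothetical.

```python
import math
from statistics import mean


def kmeans(points, k, iters=50):
    """Lloyd's k-means with a deterministic start (evenly spaced sorted points)."""
    ordered = sorted(points)
    step = len(ordered) / k
    centers = [ordered[int(step * i + step / 2)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Recompute each center as the mean of its members (keep it if empty).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                tuple(mean(dim) for dim in zip(*members)) if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(math.dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def best_k(points, candidates):
    """Score every candidate cluster count and return the best one with all scores."""
    scored = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scored, key=scored.get), scored


# Toy data (illustrative only): three well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
best, scores = best_k(points, candidates=[2, 3, 4, 5])
```

Because the cluster count is picked by a fixed score rather than by eye, rerunning the selection on the same data gives the same parameterization. AIC or BIC could replace the silhouette score when the clustering model is distribution based.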
Incorporation of Prior Knowledge helped Identify Patterns in Gene Product Location from Gene Expression Data
Clusters based only on expression value + Overlaying Prior Knowledge
Distribution of locations reveals no clear discriminants in the clustering of expression data
Embedding prior knowledge in clustering algorithm as extra “constraints”
Looking into cluster of transcripts that have proteins in both cytoplasm and nucleus.
[Figure: two plots, axis label “# of transcripts”]
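One way to read "embedding prior knowledge in the clustering algorithm as extra constraints" is a must-link constraint: items that prior knowledge says belong together are forced into the same cluster. The sketch below is an assumption about the general technique (in the style of constrained k-means), not the method used in the talk. The 1-D expression values, the must-link pair, and the function name are all hypothetical.

```python
from statistics import mean


def constrained_kmeans(values, must_link_groups, k, iters=50):
    """1-D k-means in which every must-link group is assigned to a cluster as a unit."""
    # Items not named in any constraint become singleton groups.
    linked = {i for group in must_link_groups for i in group}
    groups = [list(g) for g in must_link_groups]
    groups += [[i] for i in range(len(values)) if i not in linked]

    # Deterministic start: evenly spaced values from the sorted data.
    ordered = sorted(values)
    step = len(ordered) / k
    centers = [ordered[int(step * i + step / 2)] for i in range(k)]
    labels = [0] * len(values)

    for _ in range(iters):
        for group in groups:
            # The center nearest the group mean minimizes the group's
            # total squared distance, so the whole group goes there.
            group_mean = mean(values[i] for i in group)
            nearest = min(range(k), key=lambda j: abs(group_mean - centers[j]))
            for i in group:
                labels[i] = nearest
        new_centers = [
            mean(v for v, lab in zip(values, labels) if lab == j) if j in labels else centers[j]
            for j in range(k)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Hypothetical expression values for six transcripts.
values = [1.0, 1.1, 0.9, 5.0, 5.1, 3.4]

# Expression only: transcript 5 (value 3.4) joins the high-expression cluster.
free = constrained_kmeans(values, must_link_groups=[], k=2)

# Prior knowledge (hypothetical): transcripts 0 and 5 share an annotation,
# so they must land in the same cluster.
tied = constrained_kmeans(values, must_link_groups=[(0, 5)], k=2)
```

With the constraint in place, transcript 5 moves to the cluster of transcript 0 even though its expression value alone would place it elsewhere. That is the sense in which prior knowledge can reshape clusters that expression values alone do not separate.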
Identify factors that Correlate with Colorectal Cancer Intervention
Data Driven Selection and Parameterization
Prior Knowledge Embedding
Complex Models
Eradicating Cancer with Artificial Intelligence
Identify factors that Correlate with Colorectal Cancer Intervention
Scientific Problem
Type of Response to Chemotherapy: Incomplete, Medium, Complete
DNA SNP
Microarray Data
161 Tumor Tissue
162 Control Tissue
Treatment Type + Pathological Response
Technology Problem
MATLAB and local machine took ~2.5 weeks of processing
Analytic Problem
Hill-climbing was computationally intensive and support vector machine cannot accurately model complex system
Analytic Solution: Move to Deep Learning Techniques
• Able to learn complex features/patterns
• Well-equipped to handle high dimensional, sparse, noisy data with nonlinear relationships
• Provides high generalizability for multiplatform data common in the life sciences
Deep Neural Network
Mamoshina, P., Vieira, A., Putin, E., & Zhavoronkov, A. (2016). Applications of Deep Learning in Biomedicine. Molecular Pharmaceutics, 13(5), 1445–1454. http://doi.org/10.1021/acs.molpharmaceut.5b00982
Application of Deep Learning to Identify Post-Chemo Treatment
[Diagram: neural network with inputs x1, x2, ..., xN-1, xN, Hidden Layers, and outputs y1, y2 (Incomplete Pathological Response, Complete Pathological Response)]
Challenges:
1. Many degrees of freedom in model configuration (layers x nodes / layer x activation function)
2. Long training/testing times per model configuration for refinement
Ex. 1 hour training time per model x 10 models = 10 hours
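The two challenges multiply: every combination of layer count, nodes per layer, and activation function is a separate model to train. A small sketch of that arithmetic follows. The candidate values are hypothetical (the slide names only the three dimensions), and the serial-cost function just restates the slide's "1 hour x 10 models = 10 hours" example.

```python
from itertools import product

# Hypothetical search space; only the three dimensions come from the slide.
layer_counts = [2, 3, 4]
nodes_per_layer = [64, 128]
activations = ["relu", "tanh"]

# Every combination is one model configuration to train and test.
configs = list(product(layer_counts, nodes_per_layer, activations))
n_configs = len(configs)  # 3 * 2 * 2 combinations


def serial_hours(n_models, hours_per_model=1.0):
    """Total time when models are trained one after another."""
    return n_models * hours_per_model


slide_example = serial_hours(10)         # the slide's case: 10 models at 1 hour each
full_grid = serial_hours(n_configs)      # this hypothetical grid, trained serially
```

Even this small grid already exceeds the ten models in the slide's example, which is why serial refinement on a single machine becomes slow as options are added.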
Technology Solution: Scaling NCI Cancer Analysis
• Creating a hybrid local/public compute solution
• “Burst” to cloud
[Diagram: S3 storage and groups of EC2 nodes. Steps: Spin up, Provision, Execute, Tear down. Serial: user managed single run. Parallel: N user managed, on-demand, parallel runs.]
Scaling Complex Model Refinement
Data-Driven Model Refinement
[Diagram: S3 and groups of EC2 nodes producing Model 1 Results, Model 2 Results, ..., Model N Results]
Solutions:
1. More powerful hardware to rapidly test various models (1 hr → 30 mins for training)
2. Parallelize model + parameter runs (1 instance → N instances)
Quickly iterate: 30 mins per model x 10 models / 10 instances = 30 minutes vs. 10 hours
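The "30 minutes vs. 10 hours" figure follows from running models in parallel rounds. The sketch below restates that arithmetic and shows the fan-out pattern with a local thread pool standing in for cloud instances. The thread pool, `run_config`, and its placeholder return value are assumptions for illustration; the talk's actual setup used EC2 nodes and S3.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def wall_clock_minutes(n_models, minutes_per_model, n_instances):
    """Models run in rounds of n_instances at a time; each round costs one training time."""
    rounds = math.ceil(n_models / n_instances)
    return rounds * minutes_per_model


def run_config(config):
    """Hypothetical stand-in for one training run; returns a placeholder result."""
    return config, f"results for model {config}"


def run_all(configs, n_workers):
    """Fan the configurations out to n_workers; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_config, configs))


before = wall_clock_minutes(n_models=10, minutes_per_model=60, n_instances=1)   # serial
after = wall_clock_minutes(n_models=10, minutes_per_model=30, n_instances=10)   # parallel
results = run_all(configs=[1, 2, 3], n_workers=3)
```

The two solutions on the slide act on different terms: faster hardware halves the time per model, while parallel instances reduce the number of rounds.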
Deep Learning Results with and without Prior Knowledge
[Figure: panels labeled Analysis, Result, and Transcription for Cancer Cell Line Encyclopedia and National Cancer Institute; Prior Knowledge inputs: Random Sample 1, Random Sample 2, Cancer Related Genes]
Data driven DNN models showed accuracy > random chance but high variability
Data driven + prior knowledge DNN models showed high accuracy and low variability
Conclusion
1. Data Driven analytic selection identifies best class of algorithm and appropriate tuning for research question
2. Incorporation of prior knowledge into the algorithm automatically identifies patterns in expression vs knowledge base
3. Scalable architecture allows fast exploration of model parameter space and parallelization of model runs using existing, heterogeneous data files
4. Complex system model with deep learning was better than random chance but needs to have prior knowledge incorporated appropriately for better performance
5. End to end model runs are provisioned and reproducible from ingest → analysis → results
Data Science as a Service For Biomedical Research
We are hiring!
Tailored Analysis of Your Needs and Requirements
Repeatable, Scalable Insights and Workflows
Deliver the Right Analytics and Technology to Your Data
reveal the hidden state of the system™