Artificial Intelligence Enables Precision Medicine
Mohammed Eslami, Ph.D. - Chief Data Scientist
June 26, 2017
Data Intelligence Conference, Capital One
Copyright 2017 Netrias, LLC™ All rights reserved.

The Multi-omics Stack - Making sense of data for research

Aspirational AI
Medicine today vs. medicine tomorrow: diagnose and treat.
Reality: >= $1B, 5% success

Data in Life Sciences vs Other Domains
• Life sciences: "Cambrian explosion" of data types (Volume/Variety)
  • Inconsistent and conflicting indirect measurements
  • Variant and function annotations from the literature disagree
  • Gene identifier formats vary
  • Measurements are 2 degrees separated from observed variables
• Cybersecurity: consistent, well-understood data types (Volume/Velocity)

Challenges Aligning Technology & Analytics to Research Questions
• Analytic selection bias
• Prior knowledge application
• Complex systems
• Model reuse

Solutions for Analytic Challenges in Biomedical Research
Problems and their solutions for multi-omics data:
• Analytic selection bias → data-driven selection and parameterization
• Prior knowledge application → prior knowledge embedding
• Complex systems → complex models
• Model reuse → transfer learning

Identify Factors that Correlate with Kennedy's Disease
Solutions applied: data-driven selection and parameterization; prior knowledge embedding

Uncovering Tissue Specificity of SBMA
• Kennedy's Disease, Spinal Bulbar Muscular Atrophy (SBMA)
• Motor-neuron specific disease, only in males
• NINDS clinician collecting data from patient samples
Identify Factors that Correlate with Kennedy's Disease for NINDS
Scientific problem: use prior knowledge to identify protein localization patterns from RNA levels of cortical neurons that are unaffected by Kennedy's disease.
[Figure labels: skin cell, stem cell, control motor neuron, control cortical neuron, microarray data]
Technology problem: R, spreadsheets, Ingenuity Pathway Analysis
Analytic problem: standard dysregulation analysis extracted no significant patterns or biomarkers

Data-driven discovery of model optimizes parameterization to ensure analytic reproducibility
Investigate clustering categories:
• Topology based
• Link based
• Density based
• Distribution based
Optimize score function:
• Akaike Information Criterion
• Bayesian Information Criterion
• Silhouette score
• Number of clusters

Incorporation of Prior Knowledge Helped Identify Patterns in Gene Product Location from Gene Expression Data
• Clusters based only on expression value, with prior knowledge overlaid: the distribution of locations reveals no clear discriminants in the clustering of expression data.
• Embedding prior knowledge in the clustering algorithm as extra "constraints": looking into the cluster of transcripts that have proteins in both cytoplasm and nucleus.
[Figure: two histograms, axis "# of transcripts"]

Identify Factors that Correlate with Colorectal Cancer Intervention
Solutions applied: data-driven selection and parameterization; prior knowledge embedding; complex models

Eradicating Cancer with Artificial Intelligence
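The data-driven selection idea from the clustering slides above (score several candidate parameterizations, keep the best one) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method used in the talk: the expression values are made-up toy numbers, a tiny hand-written 1-D k-means stands in for the clustering families listed on the slide, and the silhouette score is the only score function. The names `kmeans_1d`, `silhouette`, and `select_k` are hypothetical.

```python
from statistics import mean


def kmeans_1d(values, k, iters=50):
    """Tiny deterministic 1-D k-means (illustrative stand-in for a real library)."""
    ordered = sorted(values)
    n = len(ordered)
    # Start centers at evenly spaced positions in the sorted data.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign each value to its nearest center (ties go to the lower index).
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(mean(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels


def silhouette(values, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(mean(abs(v - u) for u in other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


def select_k(values, candidates):
    """Return (best_k, scores): the candidate k with the highest silhouette."""
    scores = {k: silhouette(values, kmeans_1d(values, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Toy expression values (assumed for illustration only).
toy_expression = [0.9, 1.0, 1.1, 4.8, 5.0, 5.2, 8.9, 9.0, 9.1]
best_k, scores = select_k(toy_expression, [2, 3, 4])
print(best_k, scores)
```

On this toy data the three-cluster solution scores highest. A fuller version would presumably also sweep over clustering families and compare AIC or BIC where the model defines a likelihood.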
Identify Factors that Correlate with Colorectal Cancer Intervention
Scientific problem: type of response to chemotherapy (incomplete, medium, complete)
Data: DNA SNP microarray data from 161 tumor tissue and 162 control tissue samples, with treatment type and pathological response
Technology problem: MATLAB on a local machine took ~2.5 weeks of processing
Analytic problem: hill-climbing was computationally intensive, and a support vector machine cannot accurately model a complex system

Analytic Solution: Move to Deep Learning Techniques
• Able to learn complex features/patterns
• Well-equipped to handle high-dimensional, sparse, noisy data with nonlinear relationships
• Provides high generalizability for multiplatform data common in the life sciences
[Figure: deep neural network]
Mamoshina, P., Vieira, A., Putin, E., & Zhavoronkov, A. (2016). Applications of Deep Learning in Biomedicine. Molecular Pharmaceutics, 13(5), 1445–1454. http://doi.org/10.1021/acs.molpharmaceut.5b00982

Application of Deep Learning to Identify Post-Chemo Treatment
[Figure: network with inputs x1, x2, ..., xN-1, xN, hidden layers, and outputs y1 and y2 for incomplete and complete pathological response]
Challenges:
1. Many degrees of freedom in model configuration (layers x nodes / layer x activation function)
2. Long training/testing times per model configuration for refinement
Ex. 1 hour training time per model x 10 models = 10 hours

Technology Solution: Scaling NCI Cancer Analysis
• Creating a hybrid local/public compute solution
• "Burst" to cloud
[Figure: serial path (spin up, provision, execute, tear down) for a user-managed single run; parallel path (S3 plus groups of EC2 nodes) for N user-managed, on-demand, parallel runs]

Scaling Complex Model Refinement
Data-driven model refinement
[Figure: S3 and groups of EC2 nodes producing Model 1 results, Model 2 results, ..., Model N results]
Solutions:
1. More powerful hardware to rapidly test various models (1 hr → 30 mins for training)
2. Parallelize model + parameter runs (1 instance → N instances)
Quickly iterate: 30 mins per model x 10 models / 10 instances = 30 minutes vs. 10 hours

Deep Learning Results with and without Prior Knowledge
[Figure panels: Cancer Cell Line Encyclopedia; National Cancer Institute analysis. Series: random sample 1, random sample 2, cancer-related genes (prior knowledge)]
• Data-driven DNN models showed accuracy > random chance but high variability
• Data-driven + prior knowledge DNN models showed high accuracy and low variability

Conclusion
1. Data-driven analytic selection identifies the best class of algorithm and appropriate tuning for the research question
2. Incorporation of prior knowledge into the algorithm automatically identifies patterns in expression vs knowledge base
3. Scalable architecture allows fast exploration of model parameter space and parallelization of model runs using existing, heterogeneous data files
4. Complex system model with deep learning was better than random chance but needs to have prior knowledge incorporated appropriately for better performance
5. End-to-end model runs are provisioned and reproducible from ingest → analysis → results

Data Science as a Service for Biomedical Research
We are hiring!
• Tailored analysis of your needs and requirements
• Repeatable, scalable insights and workflows
• Deliver the right analytics and technology to your data

reveal the hidden state of the system™
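The timing arithmetic on the model-refinement slides (10 models at 1 hour each in series, versus 30 minutes each across 10 instances) can be reproduced with a short sketch. It assumes every model takes the same time and that models run in batches of one per instance; the helper `wall_clock_minutes` and the example grid values (layer counts, node counts, activation names) are hypothetical and do not come from the talk.

```python
import math
from itertools import product


def wall_clock_minutes(n_models, minutes_per_model, n_instances=1):
    """Wall-clock time when models run in batches of one model per instance."""
    batches = math.ceil(n_models / n_instances)
    return batches * minutes_per_model


# Figures quoted on the slides.
serial = wall_clock_minutes(n_models=10, minutes_per_model=60)
parallel = wall_clock_minutes(n_models=10, minutes_per_model=30, n_instances=10)
print(serial / 60, "hours serial vs.", parallel, "minutes parallel")

# The "degrees of freedom" challenge: layers x nodes per layer x activation.
# These grid values are illustrative assumptions only.
layers = [2, 3, 4]
nodes_per_layer = [64, 128]
activations = ["relu", "tanh"]
configs = list(product(layers, nodes_per_layer, activations))
print(len(configs), "configurations,",
      wall_clock_minutes(len(configs), 30, n_instances=10), "minutes on 10 instances")
```

Because runs proceed in whole batches, a grid of 12 configurations on 10 instances needs two batches, so the instance count is worth matching to the grid size.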