Phenotyping from Electronic Health Records
Jimeng Sun
College of Computing, Georgia Tech
[email protected]
More info at sunlab.org

My research focuses on health analytics
[Diagram labels: Health Analytic Apps; Clinical data; Social data; Clinical Researchers; Visualization; User Behavior data; Genomic data; Privacy engine; "Heart disease predictor for $5.99"; Analytic cloud]
Research challenges (my focus):
– Big data analytics on the cloud
– Data mining and machine learning techniques
– Privacy preserving data sharing
– Visual analytic techniques

Outline
– Phenotyping from EHR
– Other work
  – PARAMO: Large scale predictive modeling pipeline
  – Patient Similarity

Phenotyping from Electronic Health Records
[Diagram: EHR data (Demographic, Procedure, Diagnosis, Medication, Medical Images, Lab Tests) → Phenotyping → Medical Concepts (phenotypes)]

Motivation: Increasing Importance of Electronic Health Records
– Explosion in interest
– EHRs have become acceptable data sources for clinical research
– EHR data can enable much more research

Challenges in Phenotyping from EHR
– Representation (this talk): How to represent heterogeneous EHR data and phenotypes?
– Speed: How to construct diverse phenotypes in an unsupervised fashion?
– Intuition: How to validate and refine the phenotypes?
– Adaptation: How to adapt phenotypes from one site to another?
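
One possible answer to the representation question above is the kind of patient by diagnosis by medication count tensor that the talk builds. The sketch below is illustrative only: the toy records, the vocabularies, and the `build_count_tensor` helper are assumptions made for this example and are not taken from the talk's pipeline.

```python
import numpy as np


def build_count_tensor(records, patients, diagnoses, medications):
    """Count how often each (patient, diagnosis, medication) triple occurs.

    `records` is an iterable of (patient, diagnosis, medication) tuples.
    Returns a dense integer array of shape
    (len(patients), len(diagnoses), len(medications)).
    """
    p_idx = {p: i for i, p in enumerate(patients)}
    d_idx = {d: i for i, d in enumerate(diagnoses)}
    m_idx = {m: i for i, m in enumerate(medications)}
    shape = (len(patients), len(diagnoses), len(medications))
    tensor = np.zeros(shape, dtype=np.int64)
    for patient, diagnosis, medication in records:
        tensor[p_idx[patient], d_idx[diagnosis], m_idx[medication]] += 1
    return tensor


# Hypothetical toy data (not from the Geisinger dataset).
patients = ["p1", "p2"]
diagnoses = ["Hypertension", "Diabetes"]
medications = ["ACE Inhibitors", "Insulin", "Biguanides"]
records = [
    ("p1", "Hypertension", "ACE Inhibitors"),
    ("p1", "Hypertension", "ACE Inhibitors"),
    ("p2", "Diabetes", "Insulin"),
    ("p2", "Diabetes", "Biguanides"),
]

X = build_count_tensor(records, patients, diagnoses, medications)
print(X.shape)
```

A dense array is used here only for readability; at the scale reported in the talk, a sparse representation would likely be the more practical choice.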

Constructing Feature Tensor
– A tensor is a generalization of a matrix (a matrix is a 2nd-order tensor)
– Tensors can better capture interactions among concepts
– Data element types: binary, count (integer), continuous (numeric)
[Figure label: Mode]

Multiple Tensors
– Lab Results
– Medication Reconciliation
– Diagnosis-Medication
– Diagnostic Sources
– Vital Symptoms

Phenotyping through Tensor Factorization
[Figure: tensor ≈ λ1 (Phenotype 1) + … + λR (Phenotype R). Each phenotype has a patients factor, a diagnosis factor, and a medication factor. λ is the phenotype importance; the elements of each factor sum to 1.]

Example Phenotype
Candidate Phenotype k (40% of patients), with weight λk
– Patients factor
– Diagnosis factor: Hypertension
– Medication factor: Beta Blockers Cardio-Selective; Thiazides and Thiazide-Like Diuretics; HMG CoA Reductase Inhibitors

Phenotyping Process using Tensor Factorization
[Figure: Count Data → Tensor Factorization → Phenotype Definitions (λ1 + … + λR); Count Data for New Patients → Projection → Phenotypes Matrix]

CP-APR Model
[Equation: CP-APR objective, annotated with the following]
– KL divergence for count data
– Element index
– Nonnegative combinations
– Stochastic constraint (elements in each factor sum to 1)
Reference: Chi, E.C. and Kolda, T.G. 2012. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications. 33, 4 (2012), 1272–1299.

Constructing the Tensor
– Medication orders from the Geisinger dataset
– Diagnosis codes aggregated into HCC codes
– Medications are defined as pharmacy subclass
– 31,816 patients x 169 diagnoses x 471 medications

Evaluation of Phenotypes: Classification
– Task: predict patients with heart failure
– Model: logistic regression with ℓ1 regularization
– 10 random even splits of the dataset (50% training)
– Features:
  1. Baseline using source independence matrix
  2. Principal Component Analysis (PCA)
  3. Nonnegative Matrix Factorization (NMF)
  4. Phenotype Tensor Factorization (PTF)

Predictive Performance (Effect)
– A small number of phenotypes outperforms 640 features
[Figure: AUC (axis from 0.65 to 0.73) versus number of factors / number of phenotypes (25, 50, 75, 100) for PCA, NMF, and PTF features, with the Baseline shown for comparison]

NMF factors are not concise, harder to interpret
[Two-column slide: Phenotype 1 and Phenotype 2, each a list of diagnosis – medication pairs; the two columns are interleaved in the order below]
Hypertension – Opioid Combinations
Disorders of the Vertebrae and Spinal Discs – Glucocorticosteroids
Disorders of the Vertebrae and Spinal Discs – Stimulant Laxatives
Disorders of the Vertebrae and Spinal Discs – Beta Blockers Cardio-Selective
Major Symptoms, Abnormalities – Stimulant Laxatives
Disorders of the Vertebrae and Spinal Discs – Sympathomimetics
Major Symptoms, Abnormalities – Beta Blockers Cardio-Selective
Disorders of the Vertebrae and Spinal Discs – Anticonvulsants - Misc
Major Symptoms, Abnormalities – Sympathomimetics
Disorders of the Vertebrae and Spinal Discs – Central Muscle Relaxants
Disorders of the Vertebrae and Spinal Discs – HMG CoA Reductase Inhibitors
Disorders of the Vertebrae and Spinal Discs – Selective Serotonin Reuptake Inhibitors
Major Symptoms, Abnormalities – Coumarin Anticoagulants
Major Symptoms, Abnormalities – Salicylates
Major Symptoms, Abnormalities – Surfactant Laxatives
Major Symptoms, Abnormalities – Insulin
Disorders of the Vertebrae and Spinal Discs – Surfactant Laxatives
Major Symptoms, Abnormalities – Proton Pump Inhibitors
Disorders of the Vertebrae and Spinal Discs – Proton Pump Inhibitors
Major Symptoms, Abnormalities – Anti-infective Agents - Misc
Disorders of the Vertebrae and Spinal Discs – Cephalosporins - 1st Generation
Major Symptoms, Abnormalities – Vasodilators
Disorders of the Vertebrae and Spinal Discs – Analgesics Other
Disorders of the Vertebrae and Spinal Discs – Non-Barbiturate Hypnotics
Disorders of the Vertebrae and Spinal Discs – Electrolyte Mixtures
Hypertension – Opioid Combinations
Other Gastrointestinal Disorders – Surfactant Laxatives
Other Gastrointestinal Disorders – Insulin
Minor Symptoms, Signs, Findings – Opioid Combinations
Diabetes with No or Unspecified Complications – Insulin
Post-Surgical States/Aftercare/Elective – Opioid Combinations
Specified Heart Arrhythmias – Beta Blockers Cardio-Selective
Post-Surgical States/Aftercare/Elective – Stimulant Laxatives
Iron Deficiency and Other/Unspecified Anemias and Blood Disease – Hematopoietic Growth Factors
Post-Surgical States/Aftercare/Elective – Beta Blockers Cardio-Selective
Urinary Tract Infection – Insulin
Post-Surgical States/Aftercare/Elective – HMG CoA Reductase Inhibitors
Other Endocrine/Metabolic/Nutritional Disorders – Insulin
Post-Surgical States/Aftercare/Elective – Proton Pump Inhibitors
Vascular Disease – Coumarin Anticoagulants
Post-Surgical States/Aftercare/Elective – Opioid Agonists
Post-Surgical States/Aftercare/Elective – Cephalosporins - 1st Generation
Post-Surgical States/Aftercare/Elective – Analgesics Other
Vascular Disease – Insulin
History of Disease – Insulin
Unspecified Renal Failure – Coumarin Anticoagulants
Diabetes with Renal Manifestation – Insulin
Post-Surgical States/Aftercare/Elective – Non-Barbiturate Hypnotics
Other Eye Disorders – Opioid Combinations
Other Eye Disorders – Stimulant Laxatives
Other Eye Disorders – Opioid Agonists
Other Eye Disorders – Cephalosporins - 1st Generation
Other Eye Disorders – Non-Barbiturate Hypnotics

PTF interpretation: Major disease phenotypes can be identified

Uncomplicated Diabetes: Phenotype 3 (17.6% of patients)
– Diabetes with No or Unspecified Complications
– Sulfonylureas
– Biguanides
– Diagnostic Tests
– Insulin Sensitizing Agents
– Diabetic Supplies
– Meglitinide Analogues
– Antidiabetic Combinations

Mild Hypertension: Phenotype 4 (31.1% of patients)
– Hypertension
– ACE Inhibitors
– Thiazides and Thiazide-Like Diuretics

Chronic Respiratory Inflammation/Infection: Phenotype 5 (36.7% of patients)
– Other Ear, Nose, Throat, and Mouth Disorders
– Viral and Unspecified Pneumonia, Pleurisy
– Significant Ear, Nose, and Throat Disorders
– Cough/Cold/Allergy Combinations
– Azithromycin
– Fluoroquinolones
– Sympathomimetics
– Penicillin Combinations
– Antitussives
– Glucocorticosteroids
– Tetracyclines
– Anti-infective Misc. - Combinations
– Clarithromycin
– Cephalosporins - 2nd Generation
– Cephalosporins - 1st Generation
– Expectorants

PTF interpretation: Disease subtypes can be automatically identified

Mild Hypertension: Phenotype 4 (31.1% of patients)
– Hypertension
– ACE Inhibitors
– Thiazides and Thiazide-Like Diuretics

Moderate Hypertension: Phenotype 2 (31.5% of patients)
– Hypertension
– Beta Blockers Cardio-Selective
– Angiotensin II Receptor Antagonists
– Loop Diuretics
– Potassium
– Nitrates
– Alpha-Beta Blockers
– Vasodilators

Severe Hypertension: Phenotype 6 (24.3% of patients)
– Hypertension
– Calcium Channel Blockers
– Antihypertensive Combinations
– Antiadrenergic Antihypertensives
– Potassium Sparing Diuretics

Over 80% of phenotype factors are clinically meaningful

Summary: Phenotyping using Tensor Factorization
[Figure: tensor ≈ λ1 (Phenotype 1) + … + λR (Phenotype R); figure label: "Few diagnosis"]
– Nonnegative tensor factorization can be used to learn phenotypes without supervision
– A small number of phenotypes outperforms a large number of features in a prediction task

System
PARAMO: PARALLEL PREDICTIVE MODELING PLATFORM

Predictive Modeling Pipeline
There are many different models that need to be built and evaluated:
– Different patient cohorts
– Different targets
– Different features
– Different algorithms
– Multiple training and testing splits in cross-validation

Running Time vs. Parallelism level
[Figure: runtime in seconds (log scale, 1,000 to 1,000,000) versus number of concurrent tasks (serial, 10, 20, 40, 80, 120, 160) for the Large, Medium, and Small patient sets. Annotations: 9 days, 3 hours, 72X speed up (9 days is 216 hours, and 216 / 3 = 72).]
Patient sets:
– Small: 5,000 patients for hypertension control prediction
– Medium: 33K for predicting heart failure onset
– Large: 319K for hypertension diagnosis prediction
Dependency graph: 1808 nodes and 3610 edges

Algorithm
PATIENT SIMILARITY

Patient Similarity Problem
[Diagram: Doctor, Similarity search, Patient]

Patient Similarity Problem
[Diagram: Patient, Doctor]

Summary on Patient Similarity
– To learn a customized distance metric for a target [1]
– Extension 1: Composite distance integration (Comdi) [2]: How to combine multiple patient similarity measures?
– Extension 2: Interactive metric update (iMet) [3]: How to update an existing distance measure?

1. Sun, J., Wang, F., Hu, J., Ebadollahi, S. 2012. Supervised patient similarity measure of heterogeneous patient records. ACM SIGKDD Explorations Newsletter 14, 16.
2. Wang, F., Sun, J., Ebadollahi, S. 2011. Integrating Distance Metrics Learned from Multiple Experts and its Application in Inter-Patient Similarity Assessment. SDM 2011: 59-70.
3. Wang, F., Sun, J., Hu, J., Ebadollahi, S. 2011. iMet: Interactive Metric Learning in Healthcare Applications. SDM 2011: 944-955.
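
The idea of learning a customized distance metric for a target, summarized above, can be illustrated with a small Mahalanobis-style sketch. This is not the method of [1]: as a stand-in, it uses the inverse of the regularized within-class covariance as the metric matrix, a deliberately simple supervised choice. The toy data and the helper names (`fit_within_class_metric`, `mahalanobis`, `most_similar`) are hypothetical and exist only for this example.

```python
import numpy as np


def fit_within_class_metric(X, y, ridge=1e-3):
    """Return M = (within-class covariance + ridge * I)^-1.

    Directions in which same-label patients vary a lot get a small weight,
    so they contribute little to the learned distance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    scatter = np.zeros((d, d))
    for label in np.unique(y):
        members = X[y == label]
        centered = members - members.mean(axis=0)
        scatter += centered.T @ centered
    cov = scatter / len(X)
    # The ridge term keeps the matrix invertible when a feature has no
    # within-class variance.
    return np.linalg.inv(cov + ridge * np.eye(d))


def mahalanobis(a, b, M):
    """Distance sqrt((a - b)^T M (a - b)) under the metric matrix M."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(diff @ M @ diff))


def most_similar(query, X, M, k=3):
    """Indices of the k rows of X closest to `query` under M."""
    dists = np.array([mahalanobis(query, x, M) for x in X])
    return np.argsort(dists)[:k]


# Hypothetical toy cohort: feature 0 tracks the target label,
# feature 1 is large-variance noise unrelated to it.
X = np.array([[0, 0], [0, 10], [0, -10],
              [1, 0], [1, 10], [1, -10]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

M = fit_within_class_metric(X, y)
print("learned metric neighbors:", most_similar(X[1], X, M, k=3))
print("Euclidean neighbors:     ", most_similar(X[1], X, np.eye(2), k=3))
```

On this toy cohort the learned metric ranks the query's two same-label patients ahead of every patient with the other label, whereas plain Euclidean distance ranks a patient with the other label as the closest non-identical neighbor. Real patient records would need a richer feature representation and a properly validated metric-learning method.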