Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
High-Throughput Phenotyping and Cohort Identification from Electronic Health Records for Clinical and Translational Research Jyotishman Pathak, PhD Assistant Professor of Biomedical Informatics Health Sciences Research Grand Rounds April 23, 2012 Background – The Problem • Patient recruitment is a huge bottleneck step in conducting successful clinical research studies • 50% of time is spent in recruitment • Low participant rates (~ 5%); studies are underpowered • Clinicians: lack resources to help patients find appropriate studies and trials • Patients: face difficultly to find appropriate studies that are locally available High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-2 Background – Use Cases • Large-scale genomics research • Linking biospecimens and genetic data to personal • health data via biorepositories Need large sample sizes for study design • Population-based epidemiological studies in understanding disease etiology • Often limited in scope or population diversity • Quality metrics and HITECH Act • Pay-for-Performance and quality-based incentives • Population management and cohort identification is nontrivial High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-3 Electronic health records (EHRs) driven phenotyping – The Proposed Solution • EHRs are becoming more and more prevalent within the U.S. healthcare system • Meaningful Use is one of the major drivers • Overarching goal • To develop techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-4 Advantages: EHR-derived phenotyping • There is a LOT of information about subjects • Demographics, labs, meds, procedures, clinical notes… • Identification of otherwise latent population differences • Minimal costs for case ascertainment, no studyspecific recruitment • Records are “retrospectively longitudinal” • Records are real world and contain many different phenotypes • Transportability and reuse of phenotype definitions across EHR enabled sites = power for clinical and research studies High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-5 Challenges: EHR-derived phenotyping • There is a LOT of information about subjects… • Non-standardized, heterogeneous, unstructured • • • data (compared to protocol-based structured data collection) Measured (e.g., demographics) vs. un-measured (e.g., socio-economic status) population differences Hospital specialization and coding practices Population/regional market landscape High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-6 The challenges can be addressed…if we • Develop techniques for standardization and normalization of clinical data and phenotypes • Develop techniques for transforming and managing unstructured clinical text into structured representations • Develop techniques for transportability of EHRdriven phenotyping algorithms • Develop a scalable, robust and flexible framework for demonstrating all of the above in a “real-world setting” High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-7 http://gwas.org High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-8 • Funded by the NHGRI/NIGMS • Goal: to assess utility of EHRs as resources for genome science • Each site includes a biorepository linked to EHRs • Each project includes informatics, biostatistics, community engagement, ELSI, genetics experts • Initial proposals included identifying a primary phenotype of interest in 3,000 subjects and conduct of a genome-wide association study at each center: Σ=18,000 • eMERGE Phase II has a target of developing ~40 phenotype algorithms by the end of 2014 • Algorithm transportability an integral component High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-9 EHR-based Phenotyping Algorithms • Typical components • • • • • • • Billing and diagnoses codes Procedure codes Labs Medications Phenotype-specific co-variates (e.g., Demographics, Vitals, Smoking Status, CASI scores) Pathology Imaging? • Organized into inclusion and exclusion criteria High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-10 EHR-based Phenotyping Algorithms • Iteratively refine case definitions through partial manual review to achieve ~PPV ≥ 95% • For controls, exclude all potentially overlapping syndromes and possible matches; iteratively refine such that ~NPV ≥ 98% High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-11 Algorithm Development Process Rules Evaluation Phenotype Algorithm Transform Mappings Visualization Transform Data NLP, SQL High-Throughput Phenotyping from EHRs [eMERGE Network] ©2012 MFMER | slide-12 Hypothyroidism: Initial Algorithm No thyroid-altering medications (e.g., Phenytoin, Lithium) 2+ non-acute visits in 3 yrs ICD-9s for Hypothyroidism Abnormal TSH/FT4 Thyroid replace. meds Antibodies for TTG or TPO (anti-thyroglobulin, anti-thyroperidase) No ICD-9s for Hypothyroidism No Abnormal TSH/FT4 No thyroid replace. meds No Antiboides for TTG/TPO No secondary causes (e.g., pregnancy, ablation) No hx of myasthenia gravis Case 1 Case 2 Control [Denny et al., 2012] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-13 Hypothyroidism: Initial Algorithm [Conway et al. 2011] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-14 Hypothyroidism: Algorithm Refinement No thyroid-altering medications (e.g., Phenytoin, Lithium) 2+ non-acute visits in 3 yrs ICD-9s for Hypothyroidism Abnormal TSH/FT4 Thyroid replace. meds Antiboides for TTG or TPO (anti-thyroglobulin, anti-thyroperidase) No ICD-9s for Hypothyroidism No Abnormal TSH/FT4 No thyroid replace. meds No Antiboides for TTG/TPO No secondary causes (e.g., pregnancy, ablation) No hx of myasthenia gravis Case 1 Case 2 Control [Denny et al., 2012] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-15 New Hypothyroidism Algorithm: Validation Positive Predictive Values (PPV) Based on Chart Review – All Sites EHR-based Cases/Controls Sampled for Chart Review Cases/Controls Old Case PPV (%) New Case PPV (%) Group Health 430/1,188 50/50 92 98 Marshfield 509/1193 50/50 88 91 Mayo Clinic 250/2,145 100/100 76 97 103/516 50/50 88 98 184/1,344 50/50 90 98 1,421/6,362 — 87 96 Site Northwestern Vanderbilt All sites [Denny et al., 2012] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-16 Data Categories used to define the EHR-driven Phenotyping Algorithms Clinical gold EHR-derived Phenotype Validation standard phenotype Definitions (PPV/NPV) Alzheimer’s Dementia Demographics, clinical examination of mental status, histopathologic examination Diagnoses, medications Demographics, laboratory tests, radiology reports Cataracts Clinical exam finding (Ophthalmologic examination) Diagnoses, procedure codes Demographics, medications 98%/98% Peripheral Arterial Disease Radiology test results (ankle-brachial index or arteriography) Diagnoses, Demographics procedure codes, medications, radiology test results 94%/99% Type 2 Diabetes Laboratory Tests Diagnoses, laboratory Demographics, tests, medications height, weight, family history 98%/100% Cardiac Conduction ECG measurements ECG report results 73% Demographics, 97% diagnoses, procedure codes, medications, [eMERGE Network] laboratory tests ©2012 MFMER | slide-17 Genotype-Phenotype Association Results disease Atrial fibrillation Crohn's disease Multiple sclerosis Rheumatoid arthritis Type 2 diabetes marker gene / region rs2200733 Chr. 4q25 rs10033464 Chr. 4q25 rs11805303 IL23R rs17234657 Chr. 5 rs1000113 Chr. 5 rs17221417 NOD2 rs2542151 PTPN22 rs3135388 DRB1*1501 rs2104286 IL2RA rs6897932 IL7RA rs6457617 Chr. 6 rs6679677 RSBN1 rs2476601 PTPN22 rs4506565 TCF7L2 rs12255372 TCF7L2 rs12243326 TCF7L2 rs10811661 CDKN2B rs8050136 FTO rs5219 KCNJ11 rs5215 KCNJ11 rs4402960 IGF2BP2 0.5 0.5 published 1.0 2.0 Odds Ratio The Linked Clinical Data (LCD) Project observed 5.0 5 et al. 2010] [Ritchie ©2012 MFMER | slide-18 Key lessons learned from eMERGE • Algorithm design and transportability • • • • Non-trivial; requires significant expert involvement Highly iterative process Time-consuming manual chart reviews Representation of “phenotype logic” for transportability is critical • Standardized data access and representation • Importance of unified vocabularies, data elements, and • • value sets Questionable reliability of ICD & CPT codes (e.g., billing the wrong code since it is easier to find) Natural Language Processing (NLP) needs High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-19 Algorithm Development Process - Modified Rules Semi-Automatic Execution Evaluation Phenotype Algorithm Transform Mappings Visualization Transform Data NLP, SQL High-Throughput Phenotyping from EHRs [eMERGE Network] ©2012 MFMER | slide-20 Strategic Health IT Advance Research Projects (SHARPn): Secondary Uses of EHR Data • Mission: To enable the use of EHR data for secondary purposes, such as clinical research and public health. Leveraging clinical and health informatics to: • generate new knowledge • improve care • address population needs http://sharpn.org [Chute et al. 2011] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-21 SHARPn: Secondary Use of EHR Data A $15M National Consortium • Agilex Technologies • Harvard University • CDISC (Clinical Data Interchange • Intermountain Healthcare Standards Consortium) • Mayo Clinic • Centerphase Solutions • Mirth Corporation, Inc. • Deloitte • MIT • Group Health, Seattle • MITRE Corp. • IBM Watson Research Labs • Regenstrief Institute, Inc. • University of Utah • SUNY, Buffalo • University of Pittsburgh • University of Colorado High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-22 Cross-integrated suite of projects High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-23 Algorithm Development Process - Modified • Standardized and structured representation of phenotype definition criteria Use the NQF Quality Data Model (QDM) • Rules • Conversion of structured phenotype criteria into executable queries Evaluation • Use JBoss® Drools (DRLs) Semi-Automatic Execution Phenotype Algorithm • Standardized representation of Visualization • Transform Mappings Transform clinical data Create new and re-use existing clinical element models (CEMs) Data NLP, SQL High-Throughput Phenotyping from EHRs [Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012] ©2012 MFMER | slide-24 The SHARPn “phenotyping funnel” CEMs Mayo Clinic EHR QDMs Intermountain EHR DRLs Phenotype specific patient cohorts [Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-25 Clinical data normalization • Data Normalization • Clinical data comes in all different forms even for • the same kind of information Comparable and consistent data is foundational to secondary use • Clinical Element Models (CEMs) • Basis for retaining computable meaning when data is exchanged between heterogeneous computer systems • Basis for shared computable meaning when clinical data is referenced in decision support logic High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-26 Clinical Element Models Higher-Order Structured Representations [Stan Huff, IHC] The Linked Clinical Data (LCD) Project ©2012 MFMER | slide-27 Pre- and Post-Coordination [Stan Huff, IHC] The Linked Clinical Data (LCD) Project ©2012 MFMER | slide-28 High-Throughput Phenotyping from EHRs [Stan Huff, IHC] Data element harmonization • Stan Huff (Intermountain Healthcare) • Clinical Information Model Initiative (CIMI) • NHS Clinical Statement • CEN TC251/OpenEHR Archetypes • HL7 Templates • ISO TC215 Detailed Clinical Models • CDISC Common Clinical Elements • Intermountain/GE CEMs High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-30 SHARPn data normalization flow - I High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-31 SHARPn data normalization flow - II CEM MySQL database with normalized patient information [Welch et| al. 2012] ©2012 MFMER slide-32 Algorithm Development Process - Modified • Standardized and structured representation of phenotype definition criteria Use the NQF Quality Data Model (QDM) • Rules Semi-Automatic Execution Evaluation Phenotype Algorithm • Standardized representation of Visualization • Transform Mappings Transform clinical data Create new and re-use existing clinical element models (CEMs) Data NLP, SQL High-Throughput Phenotyping from EHRs [Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012] ©2012 MFMER | slide-34 NQF Quality Data Model (QDM) - I • Standard of the National Quality Forum (NQF) • A standard structure and grammar to represent quality measures precisely and accurately in a standardized format that can be used across electronic patient care systems • First (and only) standard for “eMeasures” • “All patients 65 years of age or older with at least two provider visits during the measurement period receiving influenza vaccine subcutaneously” • Implemented as set of XML schemas • Links to standard terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm etc.) High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-35 NQF Quality Data Model (QDM) - II • Supports temporality & sequences • AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date" • Groups of codes in a code set (ICD9, etc.) • Can group groups • Represented by OIDs, requires lookup • "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)” • Focus on structured data • Would require extensions for NLP High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-36 116 Meaningful Use Phase I Quality Measures High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-37 Example: Diabetes & Lipid Mgmt. - I High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-38 Example: Diabetes & Lipid Mgmt. - II High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-39 NQF Measure Authoring Tool (MAT) High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-40 Our task: human readable machine computable [Thompson et al., submitted 2012] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-41 Algorithm Development Process - Modified • Standardized and structured representation of phenotype definition criteria Use the NQF Quality Data Model (QDM) • Rules • Conversion of structured phenotype criteria into executable queries Evaluation • Use JBoss® Drools (DRLs) Semi-Automatic Execution Phenotype Algorithm • Standardized representation of Visualization • Transform Mappings Transform clinical data Create new and re-use existing clinical element models (CEMs) Data NLP, SQL High-Throughput Phenotyping from EHRs [Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012] ©2012 MFMER | slide-42 JBoss® open-source Drools environment • Represents knowledge with declarative production rules • Origins in artificial intelligence expert systems • Simple when <pattern> then <action> rules specified in text files • Separation of data and logic into separate components • Forward chaining inference model (Rete algorithm) • Domain specific languages (DSL) High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-43 Drools inference architecture Inference Execution Model Define a Knowledge Base • Compiled Rules • Produces Production Memory Extract Knowledge Session from Knowledge Base Insert Facts (data) into Knowledge Session “Agenda” Fire Rules (Race Conditions/Infinite Loops) Retrieve End Results High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-44 Example Drools rule {Rule Name} rule when "Glucose <= 40, Insulin On“ {binding} {Java Class} {Class Getter Method} $msg : GlucoseMsg(glucoseFinding <= 40, currentInsulinDrip > 0 ) then {Class Setter Method} glucoseProtocolResult.setInstruction(GlucoseInstructions GLUCOSE _LESS_THAN_40_INSULIN_ON_MSG); end Parameter {Java Class} High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-46 The “obvious” slide - T2DM Drools flow High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-47 Automatic translation from NQF QDM criteria to Drools Measure Authoring Toolkit From non-executable to executable Drools Engine Measures XML-based Structured representation Data Types XML-based structured representation Value Sets Converting measures to Drools scripts Mapping data types and value sets Drools scripts Fact Models saved in XLS files [Li et al., submitted 2012] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-48 SHARPn phenotyping architecture using CEMs, QDMs, and DRLs [Welch et al. 2012] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-49 The SHARPn “phenotyping funnel” CEMs Mayo Clinic EHR QDMs Intermountain EHR DRLs Phenotype specific patient cohorts [Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-50 Phenotype library and workbench - I http://phenotypeportal.org ©2012 MFMER | slide-51 Phenotype library and workbench - I http://phenotypeportal.org 1. Converts QDM to Drools 2. Rule execution by querying the CEM database 3. Generate summary reports ©2012 MFMER | slide-52 Phenotype library and workbench - II http://phenotypeportal.org ©2012 MFMER | slide-53 Phenotype library and workbench - III http://phenotypeportal.org ©2012 MFMER | slide-54 Additional on-going research efforts • Machine learning and association rule mining • Manual creation of algorithms take time • Let computers do the “hard work” • Validate against expert developed ones [Caroll et al. 2011] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-55 Additional on-going research efforts • Machine learning and association rule mining • Manual creation of algorithms take time • Let computers do the “hard work” • Validate against expert developed ones • Just-in-time phenotyping • Current approach: retrospective, longitudinal • • and offline data processing for phenotypes Future: online, real-time phenotyping by implementing “phenotype sniffers” Applications in active syndrome surveillance for transfusion medicine [Kor et al. 2012] High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-56 What does this R&D mean to HSR? • Common, agreed-upon and well-validated phenotype definitions and criteria • Standardized clinical data retrieval and management • “One-stop place” for visualization, execution, and report generation of phenotyping algorithms • Implications for (to name a few): • Center for Science of Healthcare Delivery (SHCD) • Data Management Services (DMS/BSI) • Epidemiology & Health Care and Policy Research • (Epi./HCPR/Rochester Epi. Project) Mayo Clinic Biobank/Genome Consortia (MayoGC) High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-57 Summary • EHRs contain a wealth of phenotypes for clinical and translational research • EHRs represent real-world data, and hence has challenges with interpretation, wrong diagnoses, and compliance with medications • Handling referral patients even more so • Standardization and normalization of clinical data and phenotype definitions is critical • Phenotyping algorithms are often transportable between multiple EHR settings • Validation is an important component High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-58 Acknowledgements: eMERGE collaborators • Northwestern • NHGRI • • Rongling Li Teri Manolio • Group Health • • Eric Larson • Gail Jarvik • Chris Carlson • Wylie Burke • Gene Jart • David Carrell • Malia Fullerton • Walter Kukull • Paul Crane • Noah Weston Marshfield • Cathy McCarty • Peggy Peissig • Marilyn Ritchie • Russ Wilke • • • Rex Chisholm • Bill Lowe • Phil Greenland • Luke Rassmussen • Justin Starren • Maureen Smith • Jen Allen-Pacheco • Will Thompson Mayo Clinic • Christopher G. Chute • Iftikhar J. Kullo • Suzette Bielinski • Mariza de Andrade • John Heit • Jyoti Pathak • Matt Durski • Sean Murphy • Kevin Bruce Vanderbilt • Dan Roden • Josh Denny • Brad Malin • Ellen Wright Clayton • Dana Crawford • Melissa Basford High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-59 Acknowledgement: SHARPn collaborators • Agilex Technologies • Harvard University • CDISC (Clinical Data Interchange • Intermountain Healthcare Standards Consortium) • Mayo Clinic • Centerphase Solutions • Mirth Corporation, Inc. • Deloitte • MIT • Group Health, Seattle • MITRE Corp. • IBM Watson Research Labs • Regenstrief Institute, Inc. • University of Utah • SUNY, Buffalo • University of Pittsburgh • University of Colorado High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-60 Thank You! http://jyotishman.info High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-61