CRI-09: Cross-Institutional Systems to Support Phenotyping in Biomedical Research: Experiences from the eMERGE Network
Luke Rasmussen, Marshfield Clinic
David Carrell, PhD, Group Health Research Institute
William Thompson, PhD, Northwestern University
Hua Xu, PhD, Vanderbilt University
Jyoti Pathak, PhD, Mayo Clinic
AMIA CRI Summit 2011

eMERGE Consortium
• Principal sponsor: NHGRI with additional funding from NIGMS
• NIH-funded consortium (CTSA awardee institutions)
• DNA Biobanks linked to EHR data
• Consortium members
– Group Health of Puget Sound
– Marshfield Clinic
– Mayo Clinic
– Northwestern University
– Vanderbilt University
• Map labels: Dementia, Cataracts, Type II diabetes, Coordinating center, Peripheral vascular disease, QRS duration

Marshfield Clinic Biobank
Population
• Geographically defined cohort
• Stable population
• Minimal selection bias
• Over 95% of medical events captured in EMR
Data
• All levels of inpatient and outpatient care
• 5 decades of retrospective clinical data
• Prospective & continuous data collection via EHR
• Event, testing, treatment and outcomes represented
• High utilization of primary care to classify controls
• Clinical, financial and environment data
Health Events

eMERGE Contributors
• NHGRI: Rongling Li, Heather Junkins, Teri Manolio, Jim Ostell
• Group Health: Eric Larson, Gail Jarvik, Chris Carlson, Wylie Burke, Gene Jart, David Carrell, Malia Fullerton, Walter Kukull, Paul Crane, Noah Weston
• Northwestern: Rex Chisholm, Bill Lowe, Phil Greenland, Wendy Wolf, Maureen Smith, Geoff Hayes, Pedro Avila, Joel Humowiecki, Jen Allen-Pacheco, Amy Lemke, Will Thompson
• Marshfield: Cathy McCarty, Peggy Peissig, Luke Rasmussen, Marilyn Ritchie, Justin Starren, Russ Wilke, Dick Berg, Jim Linneman
• Mayo Clinic: Christopher G. Chute, Iftikhar J. Kullo, Barbara Koenig, Suzette Bielinski, Mariza de Andrade
• Vanderbilt: Dan Roden, Dan Masys, Josh Denny, Brad Malin, Ellen Wright Clayton, Dana Crawford, Jonathan Haines, Jonathan Schildcrout, Jill Pulley, Melissa Basford, Marilyn Ritchie

RFA HG-07-005: Genome-Wide Studies in Biorepositories with Electronic Medical Record Data
• 2007 NIH Request for Applications from the National Human Genome Research Institute
“The purpose of this funding opportunity is to provide support for investigative groups affiliated with existing biorepositories to develop necessary methods and procedures for, and then to perform, if feasible, genome-wide studies in participants with phenotypes and environmental exposures derived from electronic medical records, with the aim of widespread sharing of the resulting individual genotype-phenotype data to accelerate the discovery of genes related to complex diseases.” (Emphasis added)

Development and Growth
Idea
Develop
• Pre-existing and new systems/methods
• Applied to common (yet different) tasks
• Different locations/environments
Disseminate
Issues
More Ideas

Tools and Methods
Presenter: Topic
• Luke Rasmussen (Marshfield Clinic): Reusable phenotype algorithms
– Techniques to facilitate future reuse of phenotype algorithms.
• David Carrell (Group Health): Clinical Text Explorer Search Interface
– Facilitates exploration of EHR for rapid phenotyping and algorithm refinement.
• William Thompson (Northwestern University): clinical Text Analysis and Knowledge Extraction System (cTAKES)
– Natural language processing (NLP) system utilized for multiple phenotypes, including PAD.
• Hua Xu (Vanderbilt University): MedEx
– NLP system utilized within eMERGE with additional applications to pharmacogenomic research.
• Jyoti Pathak (Mayo Clinic): eleMAP
– Facilitates harmonization and standardization of phenotype variables across sites.

Reusable Phenotype Algorithms
Luke Rasmussen
Senior Programmer/Analyst
Marshfield Clinic Research Foundation
Biomedical Informatics Research Center
AMIA CRI Summit 2011

Phenotype Development
• Multi-disciplinary teams
• Multiple sites
• Iterative
• Intangible → Tangible

EMR-based Phenotype Algorithms
• Typical components
– Billing and diagnosis codes
– Procedure codes
– Labs
– Medications
– Phenotype-specific covariates (e.g., demographics, vitals, smoking status, CASI scores)
– Pathology
– Imaging?
• Organized into inclusion and exclusion criteria

EMR-based Phenotype Algorithms
• Iteratively refine case definitions through partial manual review to achieve ~PPV ≥ 95%
• For controls, exclude all potentially overlapping syndromes and possible matches; iteratively refine such that ~NPV ≥ 98%

Primary Phenotypes
Site: Phenotype, Validation (PPV/NPV)
• Group Health: Dementia, 73% / 92%
• Marshfield Clinic: Cataracts, 98% / 98%; Low HDL, 82% / 96%
• Mayo Clinic: PAD, 94% / 99%
• Northwestern University: Type 2 DM, 98% / 100%
• Vanderbilt University: QRS Duration, 97% / 100%

Supplemental Phenotypes
Site: Phenotype
• Group Health: WBC
• Marshfield Clinic: Diabetic Retinopathy
• Mayo Clinic: RBC
• Northwestern University: Lipids / Height
• Vanderbilt University: PheWAS
Validation (PPV/NPV) values in slide order: *, 80% / 98%, 92% / 100%, 95% / 100%, *, 98% / 94%
* - Not available at this time

Phenotype Reuse
• T2DM → Diabetic Retinopathy
– Identification of DM
– T2DM included T1DM for exclusion
• Low HDL → Lipids

Phenotype Reuse
Diagram labels: Diabetic Retinopathy, T2DM

Iterative Refinement for Reuse
Diagram labels: Condition - Subtype A, Condition - Subtype B, Subtype A, Subtype B, Condition

Formalizing Reuse
• Identified potential for reuse
• Leverage significant work
• Phenotypes available: www.gwas.org
• Limitations
– Site-specific implementations

Impressions
• Easy to do
• Fits with eMERGE goals
• Can fit retrospectively
• Prospective mindset

Thank You
Luke Rasmussen
[email protected]
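
Illustrative sketch (not part of the original slides)
The "EMR-based Phenotype Algorithms" slides describe algorithms organized into inclusion and exclusion criteria and refined against manual review until roughly PPV ≥ 95% and NPV ≥ 98%. The Python sketch below shows one way that structure could be expressed, with criteria written as small reusable functions so a second phenotype can borrow them, in the spirit of the "Phenotype Reuse" slides. Everything in it is an assumption made for illustration: the class and function names, the placeholder codes such as DX_T2DM, the toy patients, and the chart-review results are hypothetical and are not the eMERGE algorithms or data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Patient:
    patient_id: str
    # Placeholder code strings, not real billing or diagnosis codes.
    diagnosis_codes: Set[str] = field(default_factory=set)


# A criterion is any yes/no test on a patient record.
Criterion = Callable[[Patient], bool]


def has_dx(code: str) -> Criterion:
    """Build a criterion that checks for one placeholder diagnosis code."""
    return lambda p: code in p.diagnosis_codes


@dataclass
class PhenotypeAlgorithm:
    name: str
    case_inclusion: List[Criterion]     # all must hold for a case
    case_exclusion: List[Criterion]     # any one disqualifies a case
    control_exclusion: List[Criterion]  # any one disqualifies a control

    def classify(self, p: Patient) -> Optional[str]:
        """Return "case", "control", or None when the patient is neither."""
        included = all(c(p) for c in self.case_inclusion)
        if included and not any(c(p) for c in self.case_exclusion):
            return "case"
        if not any(c(p) for c in self.control_exclusion):
            return "control"
        return None


def ppv_npv(labels: Dict[str, Optional[str]],
            review: Dict[str, bool]) -> Tuple[Optional[float], Optional[float]]:
    """Compare algorithm labels with manual review (True = condition present)."""
    tp = fp = tn = fn = 0
    for pid, has_condition in review.items():
        label = labels.get(pid)
        if label == "case":
            if has_condition:
                tp += 1
            else:
                fp += 1
        elif label == "control":
            if has_condition:
                fn += 1
            else:
                tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv


def meets_targets(ppv: Optional[float], npv: Optional[float],
                  ppv_target: float = 0.95, npv_target: float = 0.98) -> bool:
    """Check the approximate targets quoted on the slide."""
    return ppv is not None and npv is not None and ppv >= ppv_target and npv >= npv_target


# Shared criteria: defined once, reused by both algorithms below.
has_t2dm = has_dx("DX_T2DM")
has_t1dm = has_dx("DX_T1DM")

t2dm = PhenotypeAlgorithm("T2DM", [has_t2dm], [has_t1dm], [has_t2dm, has_t1dm])

# Reuse: the retinopathy sketch borrows the diabetes criteria as-is.
retinopathy = PhenotypeAlgorithm(
    "Diabetic Retinopathy",
    [lambda p: has_t2dm(p) or has_t1dm(p), has_dx("DX_RETINOPATHY")],
    [],
    [has_dx("DX_RETINOPATHY")],
)

patients = [
    Patient("p1", {"DX_T2DM"}),
    Patient("p2", {"DX_T2DM", "DX_T1DM"}),
    Patient("p3"),
    Patient("p4", {"DX_T2DM"}),
    Patient("p5"),
    Patient("p6", {"DX_T2DM", "DX_RETINOPATHY"}),
]
labels = {p.patient_id: t2dm.classify(p) for p in patients}

# Hypothetical manual review of a subset of charts.
review = {"p1": True, "p3": False, "p4": False, "p5": True, "p6": True}
ppv, npv = ppv_npv(labels, review)
print(labels, ppv, npv, meets_targets(ppv, npv))
```

In this toy data the algorithm falls short of both targets, which corresponds to the point in the iterative loop where criteria would be revised and the review repeated. Keeping criteria as standalone functions is one possible way to make later reuse cheap; site-specific implementations, noted as a limitation on the "Formalizing Reuse" slide, would still need their own code mappings.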