Genomics and Health Data Standards:
Lessons from the Past and Present
for a Genome-enabled Future
Daniel Masys, MD
Professor and Chair
Department of Biomedical Informatics
Professor of Medicine
Vanderbilt University School of Medicine
Principal Investigator
electronic Medical Records and Genomics (eMERGE)
Coordination Center
Topics
- The eMERGE consortium: on the frontier of genomes and phenotypes derived from clinical sources
- Lessons for future standards learned from the last 40 years of biomedical informatics
RFA HG-07-005:
Genome-Wide Studies in Biorepositories with
Electronic Medical Record Data
- 2007 NIH Request for Applications from the National Human Genome Research Institute
- “The purpose of this funding opportunity is to provide support for investigative groups affiliated with existing biorepositories to develop necessary methods and procedures for, and then to perform, if feasible, genome-wide studies in participants with phenotypes and environmental exposures derived from electronic medical records, with the aim of widespread sharing of the resulting individual genotype-phenotype data to accelerate the discovery of genes related to complex diseases.”
eMERGE: an NHGRI-funded consortium for
Biobanks linked to EMR data
- Consortium members:
  - Group Health of Puget Sound
  - Marshfield Clinic
  - Mayo Clinic
  - Northwestern University
  - Vanderbilt University
- Condition of NIH funding: contribute genomic and EMR-derived phenotype data to the dbGaP database at NCBI
[Figure labels: Dementia; Cataracts; Type II diabetes; Peripheral vascular disease; QRS duration; Coordinating center]
- Each site includes DNA linked to electronic medical records
- Each project includes community engagement, genome science, and natural language processing capability for EMR data
- Research plans include identifying a phenotype of interest in 3,000 subjects and conducting a genome-wide association study at each center: Σ=18,000
- Supplemental funding provided for cross-network phenotypes
Supplement phenotypes using genotyped samples from primary phenotypes*

Site         RBC/WBC   Diabetic Retinopathy   Lipid Levels & Height   GFR
GHC          3,579     230                    3,114                   1,713
Marshfield   3,865     213                    3,693                   3,929
Mayo         3,346     806                    3,175                   3,340
NU           2,484     139                    2,816                   1,485
VU           2,650     1,449                  1,631                   2,679

*The benefit of data from routine clinical testing results recorded in the EHR
Informatics Issues in eMERGE
- Determination of comparability of patient populations across institutions
- Data exchange standards for phenotype data
- Representation of change over time (repeated measures) and ‘clinical uncertainty’ for EMR-derived observations (definite – probable – possible for assertion and negation)
- Re-identification potential of clinical data and associated samples: maximizing scientific value while complying with federal privacy protection policies
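The representation issue in the third point (repeated measures plus graded certainty for assertion and negation) can be sketched as a small data structure. This is only an illustration: the class names, field names, the `latest_status` helper, and the sample records are all hypothetical and do not describe any format eMERGE adopted.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Certainty(Enum):
    """Graded certainty levels named on the slide (numeric ranks are an assumption)."""
    POSSIBLE = 1
    PROBABLE = 2
    DEFINITE = 3


@dataclass(frozen=True)
class Observation:
    patient_id: str        # hypothetical de-identified study ID
    concept: str           # e.g. "cataract"
    observed_on: date      # each repeated measure keeps its own date
    asserted: bool         # True = assertion, False = negation
    certainty: Certainty


def latest_status(observations, concept, min_certainty=Certainty.PROBABLE):
    """Return the most recent observation of `concept` at or above
    `min_certainty`, or None when no observation qualifies."""
    candidates = [
        o for o in observations
        if o.concept == concept and o.certainty.value >= min_certainty.value
    ]
    return max(candidates, key=lambda o: o.observed_on, default=None)


# Made-up history for one patient: the finding changes over time.
history = [
    Observation("P001", "cataract", date(2005, 3, 1), True, Certainty.POSSIBLE),
    Observation("P001", "cataract", date(2006, 7, 15), False, Certainty.PROBABLE),
    Observation("P001", "cataract", date(2008, 1, 9), True, Certainty.DEFINITE),
]
current = latest_status(history, "cataract")
```

Keeping every dated observation, rather than overwriting a single status field, is what lets the same record answer both "what is true now" and "what was believed at the time of an earlier visit".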
Lessons learned from eMERGE
- Structured data alone (ICD9 codes, labs, meds) is not enough to accurately define clinical phenotypes in EMR data.
- Natural language processing of clinician notes improves both sensitivity and specificity of phenotype selection logic
- Once selection logic has been validated at one site, it can be successfully shared as pseudocode and implemented at other sites with widely varying EMR systems architectures
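As a rough illustration of what shareable selection logic might look like, the sketch below combines structured data (ICD9 codes, medications, a lab value) with NLP-derived mentions. It is not an eMERGE algorithm: the record layout, the medication list, the code prefix, and the lab threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative values only; a real case definition would be clinically validated.
T2D_MEDS = {"metformin", "glipizide"}
HBA1C_THRESHOLD = 6.5  # percent


@dataclass
class PatientRecord:
    icd9_codes: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    max_hba1c: Optional[float] = None                  # highest recorded HbA1c
    nlp_mentions: list = field(default_factory=list)   # (concept, asserted) pairs


def is_t2d_case(rec):
    """Hypothetical type 2 diabetes case rule mixing structured and NLP evidence."""
    has_code = any(code.startswith("250") for code in rec.icd9_codes)
    has_med = bool(rec.medications & T2D_MEDS)
    has_lab = rec.max_hba1c is not None and rec.max_hba1c >= HBA1C_THRESHOLD
    nlp_asserted = any(c == "type 2 diabetes" and a for c, a in rec.nlp_mentions)
    nlp_negated = any(c == "type 2 diabetes" and not a for c, a in rec.nlp_mentions)

    # Structured path: a billing code alone is not trusted; it needs support
    # from a medication or lab, and must not be contradicted by the notes.
    structured = has_code and (has_med or has_lab)
    # NLP path: a note asserting the diagnosis can rescue a patient with no code.
    return (structured and not nlp_negated) or (nlp_asserted and (has_med or has_lab))


coded_and_treated = PatientRecord(icd9_codes={"250.00"}, medications={"metformin"})
code_only = PatientRecord(icd9_codes={"250.00"})
uncoded_but_noted = PatientRecord(max_hba1c=7.2,
                                  nlp_mentions=[("type 2 diabetes", True)])
coded_but_negated = PatientRecord(icd9_codes={"250.00"}, medications={"metformin"},
                                  nlp_mentions=[("type 2 diabetes", False)])
```

Because the rule depends only on abstract inputs (codes, medications, labs, note mentions), each site can populate `PatientRecord` from its own EMR architecture and apply the same logic.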
The problem with ICD codes
- ICD code sets contain both false negatives and false positives
- False negatives:
  - Outpatient billing limited to 4 diagnoses/visit
  - Outpatient billing done by physicians (e.g., takes too long to find the unknown ICD9)
  - Inpatient billing done by professional coders:
    - omit codes that don’t pay well
    - can only code problems actually explicitly mentioned in documentation
- False positives:
  - Diagnoses evolve over time: physicians may initially bill for suspected diagnoses that later are determined to be incorrect
  - Billing the wrong code (perhaps it is easier to find for a busy clinician)
  - Physicians may bill for a different condition if it pays for a given treatment
    - Example: Anti-TNF biologics (e.g., infliximab) originally not covered for psoriatic arthritis, so rheumatologists would code the patient as having rheumatoid arthritis
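One way to quantify these false negatives and false positives is to compare code-based case flags against a reference standard such as manual chart review. The sketch below assumes that setup; the function name and the eight toy patients are invented for illustration and are not results from any site.

```python
def sensitivity_specificity(predicted, truth):
    """Compare predicted case flags with reference-standard labels.

    Returns (sensitivity, specificity). Assumes the reference labels contain
    at least one true case and at least one true non-case.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)          # coded and truly a case
    fn = sum(1 for p, t in pairs if not p and t)      # true case, no code
    fp = sum(1 for p, t in pairs if p and not t)      # coded, not a case
    tn = sum(1 for p, t in pairs if not p and not t)  # correctly uncoded
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: True means "flagged as a case".
icd_flag     = [True, True,  False, False, True, False, False, False]
chart_review = [True, False, True,  True,  True, False, False, False]

sens, spec = sensitivity_specificity(icd_flag, chart_review)
```

In this made-up sample the missed cases lower sensitivity and the single miscoded patient lowers specificity, mirroring the two failure modes listed above.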
Observations about clinical genomics
- Genomic data is the current poster child for complexity in healthcare
- No practitioner can absorb and remember more than a tiny fraction of the knowledge base of human variation
- Therefore, computerized clinical decision support is the only effective way to insert genomic variation-based guidance into clinical care
Need for Patient-Specific Decision Support Assistance
[Figure: facts per decision (axis ticks 10, 100, 1000) plotted against year (1990, 2000, 2010, 2020) and compared with human cognitive capacity. Labeled layers: decisions by clinical phenotype (i.e., traditional health care); structural genetics (e.g., SNPs, haplotypes); functional genetics (gene expression profiles); proteomics and other effector molecules.]
Observations about genomic data in general
- Predictably, some future uses of genomic data will be so creative that nobody could have predicted them.
- “Raw data” (e.g., primary DNA sequence, specific SNP values) should be retained when adding clinical interpretations, not discarded in the synthesis and reporting of those interpretations
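The point about retaining raw data can be sketched as a record in which interpretations are appended alongside the observed genotype and never replace it. Everything here is hypothetical: the class names, fields, the placeholder variant identifier, and the interpretation text are illustrative and are not drawn from any reporting standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interpretation:
    summary: str             # human-readable clinical interpretation
    knowledge_version: str   # which knowledge base release produced it


@dataclass
class GenotypeResult:
    variant_id: str          # placeholder identifier, not a real variant
    genotype: str            # raw observed alleles, never overwritten
    interpretations: list = field(default_factory=list)

    def interpret(self, summary, knowledge_version):
        """Append a new interpretation; the raw genotype is left untouched."""
        entry = Interpretation(summary, knowledge_version)
        self.interpretations.append(entry)
        return entry


result = GenotypeResult(variant_id="rs0000000", genotype="A/G")
result.interpret("no known clinical significance", "kb-2009")
result.interpret("possible altered drug response", "kb-2012")
```

Because the raw value survives each reinterpretation, the same stored result can be re-read as knowledge changes or put to uses nobody anticipated when it was collected.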
Lessons learned about standards
- The most successful and widely used standards are messaging and communications standards, e.g., TCP/IP, HTTP, SMTP, FTP, HL7.
- Knowledge representation standards tend to be widely used only if driven by financial processes, government mandate (e.g., ICD-9), or sustained government investment (e.g., MeSH, SNOMED, UMLS Metathesaurus)
Lessons learned about standards, cont’d
- Familiar and usable wins over theoretically correct but too complex, e.g., HL7 v.2 vs. v.3
- Lindberg’s axiom: “Systems that get used get better.”
URL: www.gwas.net
Commander Willard Decker: V'ger... expects an answer.
Captain James T. Kirk: An answer? I don't even know the question.
Star Trek: The Motion Picture (1979)