Download slides - Referent Tracking Unit

R T U New York State
Center of Excellence in
Bioinformatics & Life Sciences
International Conference on Biomedical Ontology
Providing a Realist Perspective on the
eyeGENE™ Database System
Buffalo NY July 24, 2009
Werner CEUSTERS
Center of Excellence in Bioinformatics and Life Sciences
Ontology Research Group
University at Buffalo, NY, USA
(eyeGENE™ screen shots courtesy of David Scheim, NIH/NEI)
Over the past 15 years, nearly 500 genes that contribute to
inherited eye diseases have been identified. Disease-causing mutations are associated with many ocular
diseases, including glaucoma, cataracts, strabismus,
corneal dystrophies and a number of forms of retinal
degenerations. This remarkable new genetic information
highlights the significant inroads that are being made in
understanding the medical basis of human ophthalmic
diseases. As a result, gene-based therapies are actively
being pursued to ameliorate ophthalmic genetic diseases
that were once considered untreatable.
Objectives of the Network
• provide easy access to genetic testing for patients
diagnosed with ocular diseases by screening for
these genes,
• collect and maintain relevant information in secure
databases
– to help speed the progress toward developing
treatments and
– to identify those who are most likely to benefit from
them,
• maintain a genetic specimen repository.
The eyeGENE™ Database System
• a repository of genotype and phenotype
information of patients with eye diseases,
• linked to a repository of DNA samples of patients
with inherited eye diseases,
• originally designed as a stand-alone application,
• but now moving towards a system that can be
‘integrated’ with a variety of other health care IT
systems.
eyeGENE™ system milestones
• July 2005. System Prototype complete
– The eyeGENE™ system vision is achievable
• July 2006. Version 1 deployed
– Initial usage to determine and specify what the eyeGENE™
network really wanted
• May 2007. v2 deployed
– Significant usage, validation, refinement of full-blown system
• March 2008. v3 deployed
– Images, auto-emails, polished CLIA interface
• In progress: v4 under development
– Support for 100+ monthly samples, more refinements and new
capabilities
Core medical information in eyeGENE™
• Patient profile information:
– DOB, contact info, race, sex, etc.
• Family information:
– presence of ‘the same’ disease in family members
• Phenotype information:
– One or more diagnoses – currently there are 21 potential diagnoses
– clinical findings data obtained through structured questions for each
diagnosis.
• Genetic test results:
– Result rows organized by Gene (with unique GI#), exons screened, and lab
procedures
– For each gene, exons screened, and lab procedures, results are registered as
either ‘negative’, ‘mutation’ or ‘variant’
– For mutation or variant, results consist of exon, DNA changes, protein
changes, and genotype.
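The result structure described above can be sketched as a small record type. This is only an illustration under stated assumptions: the class and field names, the `GI-0000` identifier and the example values are hypothetical placeholders, not taken from the eyeGENE™ schema. The one rule it enforces is the one stated on the slide, that details accompany ‘mutation’ or ‘variant’ results and not ‘negative’ ones.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Outcome(Enum):
    NEGATIVE = "negative"
    MUTATION = "mutation"
    VARIANT = "variant"


@dataclass(frozen=True)
class GeneticTestResult:
    gene_gi: str                     # unique GI# of the gene (placeholder format)
    exons_screened: Tuple[str, ...]
    lab_procedure: str
    outcome: Outcome
    # Detail fields: only meaningful for 'mutation' or 'variant' results.
    exon: Optional[str] = None
    dna_change: Optional[str] = None
    protein_change: Optional[str] = None
    genotype: Optional[str] = None

    def __post_init__(self) -> None:
        details = (self.exon, self.dna_change, self.protein_change, self.genotype)
        has_details = any(d is not None for d in details)
        if self.outcome is Outcome.NEGATIVE and has_details:
            raise ValueError("a 'negative' result cannot carry mutation details")
        if self.outcome is not Outcome.NEGATIVE and not has_details:
            raise ValueError("a 'mutation' or 'variant' result needs its details")


# Hypothetical rows; all values are placeholders.
negative = GeneticTestResult("GI-0000", ("1", "2"), "sequencing", Outcome.NEGATIVE)
variant = GeneticTestResult(
    "GI-0000", ("1", "2"), "sequencing", Outcome.VARIANT,
    exon="2", dna_change="placeholder DNA change",
)
```

Whether every detail field must be filled for a ‘mutation’ or ‘variant’ row is not stated in the source, so the sketch only requires that at least one is present.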
Enhancements to core system capabilities
[Diagram: eyeGENE and its components]
• Patient Information
• Clinical Findings
• Family Information
• Blood and DNA specimens
• Genetic test results
• Anonymized data for analysis
• Medical Images and other uploaded files
• Consent Forms and other administrative data
[Enhancement labels shown in the diagram]
• Dynamic Phenotype Content (v1/2)
• Can track at multiple clinics (v4)
• Unlimited blood/DNA flow (v2)
• Redesigned Interface (v3)
• Automated Email tracking (v3)
• New (v3)
• Prototype Done (v3)
• Can upload as PDFs (v3)
The desire for Semantic Interoperability
• Many standards have been developed or envisioned, and many
groups are engaged in this process (e.g. DICOM, ICD9, LOINC).
• March 2008: Began study of future options for integrating
eyeGENE with emerging medical information standards
• July 2008: Completed Phase I, inventory and description
of existing standards
• March 2009: Phase II, recommended options for
integration, draft complete
Assumption
• Adopting currently available and emerging
medical information standards now will provide in
the future an additional layer of benefits in better
– data collection,
– data sharing and
– data analysis.
• The dream of semantic interoperability.
Degrees of interoperability
• D0: no interoperability at all
• D1: technical and syntactical interoperability (no semantic
interoperability)
• D2: two orthogonal levels of partial semantic
interoperability
– D2a: unidirectional semantic interoperability
– D2b: bidirectional semantic interoperability of meaningful
fragments
• D3: full semantic interoperability, sharable context,
seamless co-operability
Stroetmann et al. Semantic Interoperability for Better Health and Safer Healthcare. SemanticHEALTH report. Jan 2009
‘Full interoperability’
• ‘Neither language nor technological differences
prevent the system to seamlessly integrate the
received information into the local record and
provide a complete picture of someone’s health as
if it would have been collected locally.’
• ‘Further, the anonymized data feeds directly into
the tools of public health authorities and
researchers.’
Stroetmann et al. Semantic Interoperability for Better Health and Safer Healthcare. SemanticHEALTH report. Jan 2009
The problems we are facing (1)
• Is ‘full interoperability’ here intentionally defined
with a restricted scope?
– ‘… seamlessly integrate the received information…’
• does not put any quality or scope restrictions on the
‘information’ itself
– ‘… data feeds directly into the tools of public health
authorities …’
• the needs and interests of such authorities are not equivalent
to the needs and interests of patients.
Is there a wish to improve beyond own needs?
• The NCI Thesaurus (in response to a critical
evaluation):
– ‘making changes to the incorrect use of the ‘all’
description logic qualifier to NCIt was not cost-effective.’
– ‘Many of the problems the review identified, if
corrected, would not materially affect the ability of
NCIt to meet the use cases that it must support.’
Sherri de Coronado. The NCI Thesaurus quality assurance life cycle. Journal of Biomedical Informatics (2009, in press)
The anti-ontological organization of MedDRA®
• Violates many principles for high quality terminology design and use [1, 2]:
– Mixing ontology with epistemology:
• HLT: Gastrointestinal infections, site unspecified
• HLT: Headaches NEC
– Obscure, non-documented classification criteria:
• LLT: Retroauricular pain classified under PT: Headache (v10, NCI Browser)
– LLTs under PT denoting distinct generic entities:
• PT: Nodal arrhythmia with LLTs: Junctional bradycardia, Junctional
tachycardia, Reciprocating tachycardia
– Inadequate versioning and change management:
• No reasons for change, deletions of HLTs, …
– No definitions for terms.
1. RL Richesson, KW Fung, JP Krischer. Heterogeneous but “standard” coding systems for adverse events: Issues
in achieving interoperability between apples and oranges. Contemporary Clinical Trials 29 (2008) 635–645.
2. Bousquet C, Lagier G, Lillo-Le Louët A, Le Beller C, Venot A, Jaulent MC. Appraisal of the MedDRA
conceptual structure for describing and grouping adverse drug reactions. Drug Saf. 2005;28(1):19-34.
MedDRA® is a registered trademark of the International Federation of Pharmaceutical Manufacturers and Associations (IFPMA)
The problems we are facing (2)
• Many Health IT standards do not acknowledge
relevant distinctions appropriately.
– suffer from reductionist views on reality which are
constrained by what can be seen through the lenses of
either information systems or terminologies and
ontologies that adhere to ‘concept representation’.
• Not being able to distinguish what is the case from
what is known hampers research.
• Therefore these Health IT standards hamper
research.
Furthermore: perfect ‘semantic’ tools are useless …
• … if data captured at the source is not of high
quality
• Prevailing HIT information models don’t allow
data to be stored at an acceptable quality level:
– No formal distinction between disorders and diagnoses
– Messy nature of the notions of ‘problem’ and ‘concern’
– No unique identification of the entities about which
data is stored
• Unique IDs for data-elements cannot serve as unique IDs for
the entities denoted by these data-elements
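The difference between identifying a data-element and identifying the entity it denotes can be sketched as follows. The names `element_id` and `entity_id` and the sample records are hypothetical, not part of eyeGENE™ or of any HIT standard. Two records with distinct record IDs can be about one and the same disorder, and only a separate entity identifier makes that visible.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class DataElement:
    element_id: int   # identifies the record (the data-element) itself
    entity_id: str    # identifies the entity in reality the record is about
    content: str


def group_by_entity(elements: Iterable[DataElement]) -> Dict[str, List[int]]:
    """Map each entity identifier to the IDs of the records that denote it."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for element in elements:
        groups[element.entity_id].append(element.element_id)
    return dict(groups)


# Hypothetical records: elements 1 and 2 are distinct rows about the same disorder.
records = [
    DataElement(1, "disorder-A", "diagnosis entered at clinic X"),
    DataElement(2, "disorder-A", "diagnosis entered at clinic Y"),
    DataElement(3, "disorder-B", "diagnosis entered at clinic X"),
]
grouped = group_by_entity(records)
```

With record IDs alone, rows 1 and 2 would look like two unrelated facts.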
The alternative
• A realist view on reality distinguishing:
– Ontologies, that should describe only what is always true about the entities
of a domain,
from
– Data collections / repositories, which provide information about concrete cases described in
terms of the ontologies,
from
– Information models (or data structures), which should describe the artifacts in which data is stored and
be such that the ‘aboutness’ of the data is not lost.
Objectives of the study
1. to understand the type of view embedded in the
eyeGENE™ database and,
2. in case this view would differ from the realist
one, to propose a migration path towards the
latter.
Materials & methods
• We studied
– the available documentation about eyeGENE™’s core medical information,
including parts of its information model and user interfaces,
– some of the clinical questions (and corresponding possible-answer sets) that
eyeGENE™ users are asked when they enter data in the system,
– system-generated reports about lab procedures performed on genes.
• We did not have access to a data-dictionary with data-definitions
and corresponding business rules
– thus we had to do some guessing about the exact meaning of data-fields
• We checked
– for design choices in the system that would cause the collected information
not to match the corresponding structure of reality;
– for structural and functional issues in eyeGENE™ that, in the absence of
sufficient background information for disambiguation, would lead to
difficulties in interpreting data once entered.
Realist framework used
• Levels of reality:
– L1-entities: such as specific patients, their relatives, the
disorders they are suffering from, the lab tests that have been
conducted, and so forth;
– L2-entities: interpretations and opinions on the side of
clinicians, including hypotheses and diagnoses;
• thus being about entities in first-order reality, although not accessible
to third parties without additional L3 references;
– L3-entities: information-elements about L1 or L2 entities,
examples being entries in information systems such as the
eyeGENE™ database.
• The (type of) relationships that obtain between entities in
each of these levels and across levels.
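A minimal sketch of the three levels and of the ‘aboutness’ relation between them, assuming a deliberately simplified rule set: L2 entities are about L1 entities, and L3 entities are about L1 or L2 entities. The class names, identifiers and descriptions are hypothetical, and the framework itself covers more relationship types than this one.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    L1 = 1  # first-order reality: patients, disorders, lab tests
    L2 = 2  # clinicians' interpretations: hypotheses, diagnoses
    L3 = 3  # information-elements: entries in an information system


@dataclass(frozen=True)
class Entity:
    entity_id: str
    level: Level
    description: str


# Simplifying assumption: only these (source, target) level pairs are allowed.
ALLOWED_ABOUT = {
    (Level.L2, Level.L1),
    (Level.L3, Level.L1),
    (Level.L3, Level.L2),
}


@dataclass(frozen=True)
class IsAbout:
    source: Entity
    target: Entity

    def __post_init__(self) -> None:
        pair = (self.source.level, self.target.level)
        if pair not in ALLOWED_ABOUT:
            raise ValueError(
                f"{self.source.level.name} entity cannot be about "
                f"{self.target.level.name} entity in this sketch"
            )


disorder = Entity("e1", Level.L1, "a disorder in a patient's eye")
diagnosis = Entity("e2", Level.L2, "a clinician's diagnosis")
db_entry = Entity("e3", Level.L3, "a diagnosis entry in a database")

links = [IsAbout(diagnosis, disorder), IsAbout(db_entry, diagnosis)]
```

Keeping the disorder, the diagnosis and the database entry as three separate entities is what prevents the level conflation discussed in the results.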
Results
• The pragmatic design approach initially followed
by eyeGENE™ exhibits several limitations:
(1) conflating the three levels of reality as described
above,
(2) not representing faithfully the relevant portions of
reality at each level,
(3) forcing ‘data’ to be entered while there is nothing
the data can be data about.
some examples …
‘Required fields’
• The user must provide data for such fields, but what is the
relation with reality?
– country: each person for sure lives in some country
– postal code, state:
• not all countries use postal codes or consist of states
– phone number:
• not everybody has a phone, or the number might not be known at the
time of data entry.
• No other option than entering fake data.
Reductionism (1)
• Forced selection from incomplete list
– 22 ocular disease types
• aren’t there any more? are the others not of interest? Just not
now or never? …
• Forced structure of data-types
– Belgian phone numbers are not structured the way US
phone numbers are structured
eyeGENE™ core medical data schema
[Diagram: schema with the tables Patient, Clinical Encounter, Patient Clinical Finding, Patient Diagnosis, Diagnosis, Clinical Finding, Diagnosis Finding Link, Clinical Finding Unit Link, Units, Specimen, Lab Result]
Reductionism (2)
• Where are the disorders?
– diagnoses are in the heads of e.g. physicians
– disorders are in the body of the patient
• L1-L3 confusions
• The way clinical findings are linked to diagnoses
does not allow one to study how findings are related to
disorders.
Ok, ok, …
• What eyeGENE™ exhibits is common practice.
– gives another meaning to ‘state of the art’.
• Should we bother because, after all, (almost) everybody
does it this way?
– erroneous assumption of inherent classification
• Parsons J, Wand Y. ACM Transactions on Database Systems.
2000;25(2):228-268.
– tyranny of the use case:
• ‘if most people wrongly believe that crocodiles are a kind of mammal,
then most people would find it easier to locate information about
crocodiles if it were located in a mammals grouping, rather than where it
factually belonged’
– Huhns MN, Stephens LM. In Enterprise Inter- and Intra-Organizational
Integration: Building International Consensus. Boston, MA: Kluwer
Academic Publishers; 2002:83 - 90
Some recommendations (1)
• For each table, data field and associated allowed values, and each
hard- or soft-coded business rule that restricts data input:
1. assess what (type of) entity in reality would be denoted by any
data instance,
– includes any ‘value’ from ‘value sets’, external terminologies, etc.
2. represent how these entities in reality relate to each other as
well as to other ontologically relevant entities that are not
explicitly addressed in the information model,
– the domain model proper,
– based on realism-based ontologies
3. describe formally how the information model has to be
interpreted in terms of the domain model.
– the ‘interpretation model’
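An interpretation model can be pictured, very roughly, as a mapping from each field of the information model to the type of entity it denotes. The sketch below makes several assumptions: the table names are read off the schema slide, while the field names, the descriptions and the level assignments are hypothetical illustrations. It also shows a simple check for fields that have no interpretation yet.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Interpretation:
    table: str
    field: str
    denotes: str   # the (type of) entity in reality denoted by a data instance
    level: str     # "L1", "L2" or "L3"


# Hypothetical entries; not taken from the eyeGENE(TM) data-dictionary.
INTERPRETATION_MODEL = [
    Interpretation("Patient", "date_of_birth",
                   "the date on which a specific patient was born", "L1"),
    Interpretation("Patient Diagnosis", "diagnosis",
                   "a clinician's opinion about a disorder", "L2"),
    Interpretation("Lab Result", "result",
                   "the outcome of a lab procedure on a specimen", "L1"),
]


def uninterpreted(
    fields: Iterable[Tuple[str, str]], model: Iterable[Interpretation]
) -> List[Tuple[str, str]]:
    """Return the (table, field) pairs that the model does not yet interpret."""
    covered = {(entry.table, entry.field) for entry in model}
    return [pair for pair in fields if pair not in covered]


schema_fields = [
    ("Patient", "date_of_birth"),
    ("Patient", "phone"),
    ("Patient Diagnosis", "diagnosis"),
]
missing = uninterpreted(schema_fields, INTERPRETATION_MODEL)
```

A formal interpretation model would refer to types in a realism-based ontology rather than to free-text descriptions; the strings here only stand in for such references.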
Some recommendations (2)
• The (relevant parts of the) interpretation model should
be part of any information exchange.
• Change user interfaces and information model only
when no ‘realist interpretation’ is possible or faithful
data entry cannot be achieved.
– certain fields should not be ‘required’,
– formatting, e.g. phone numbers, is acceptable in a user interface
when it satisfies local situations (not ‘requirements’),
but not for exchange,
– ‘unknown’ and ‘null values’ are acceptable, if suitable
interpretations are provided in the interpretation model, not just
as text in data-dictionaries.
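One way to make ‘unknown’ and ‘null’ values interpretable, instead of forcing fake data into required fields, is to store an explicit reason whenever a value is absent. This is a sketch under assumptions: the two reasons and all names are hypothetical, and a real interpretation model may need finer distinctions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Absence(Enum):
    UNKNOWN = "something exists in reality but is not known at data entry"
    NOT_APPLICABLE = "nothing exists in reality for the field to be about"


@dataclass(frozen=True)
class FieldValue:
    value: Optional[str] = None
    absence: Optional[Absence] = None

    def __post_init__(self) -> None:
        # Exactly one of the two must be given: a value or a reason for its absence.
        if (self.value is None) == (self.absence is None):
            raise ValueError("give either a value or a reason for its absence")


# Hypothetical patient record illustrating the cases from the 'Required fields' slide.
patient = {
    "country": FieldValue(value="Belgium"),
    "state": FieldValue(absence=Absence.NOT_APPLICABLE),   # country has no states
    "phone": FieldValue(absence=Absence.UNKNOWN),          # number not known yet
}
```

The reason travels with the data, so a receiving system does not have to guess whether an empty field means ‘no phone’ or ‘phone not recorded’.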
Conclusions
• eyeGENE™ is successfully in use and by now processes
over 100 samples / month.
• the NIH roadmap goal to ‘require new ways to organize
how clinical research information is recorded, new
standards for clinical research protocols, modern
information technology’, has not yet been reached. (Has any
system reached it?)
• making eyeGENE™ ‘reality-aware’ is feasible.
• the hope that at some future time relevant phenotypic data
can be automatically extracted from an electronic medical
record will remain a dream as long as these systems do
not change.