R T U New York State Center of Excellence in Bioinformatics & Life Sciences
International Conference on Biomedical Ontology
Providing a Realist Perspective on the eyeGENE™ Database System
Buffalo NY, July 24, 2009
Werner CEUSTERS
Center of Excellence in Bioinformatics and Life Sciences
Ontology Research Group
University at Buffalo, NY, USA
(eyeGENE™ screen shots courtesy of David Scheim, NIH/NEI)

Over the past 15 years, nearly 500 genes that contribute to inherited eye diseases have been identified. Disease-causing mutations are associated with many ocular diseases, including glaucoma, cataracts, strabismus, corneal dystrophies and a number of forms of retinal degeneration. This remarkable new genetic information highlights the significant inroads that are being made in understanding the medical basis of human ophthalmic diseases. As a result, gene-based therapies are actively being pursued to ameliorate ophthalmic genetic diseases that were once considered untreatable.

Objectives of the Network
• provide easy access to genetic testing for patients diagnosed with ocular diseases by screening for these genes,
• collect and maintain relevant information in secure databases
– to help speed the progress toward developing treatments and
– to identify those who are most likely to benefit from them,
• maintain a genetic specimen repository.

The eyeGENE™ Database System
• a repository of genotype and phenotype information of patients with eye diseases,
• linked to a repository of DNA samples of patients with inherited eye diseases,
• originally designed as a stand-alone application,
• but now moving towards a system that can be ‘integrated’ with a variety of other health care IT systems.

eyeGENE™ system milestones
• July 2005. System prototype complete
– The eyeGENE™ system vision is achievable
• July 2006. Version 1 deployed
– Initial usage to determine and specify what the eyeGENE™ network really wanted
• May 2007. v2 deployed
– Significant usage, validation, refinement of full-blown system
• March 2008. v3 deployed
– Images, auto-emails, polished CLIA interface
• In progress: v4 under development
– Support for 100+ monthly samples, more refinements and new capabilities

Core medical information in eyeGENE™
• Patient profile information:
– DOB, contact info, race, sex, etc.
• Family information:
– presence of ‘the same’ disease in family members
• Phenotype information:
– one or more diagnoses; currently there are 21 potential diagnoses
– clinical findings data obtained through structured questions for each diagnosis.
• Genetic test results:
– result rows organized by gene (with unique GI#), exons screened, and lab procedures
– for each gene, exons screened, and lab procedures, results are registered as either ‘negative’, ‘mutation’ or ‘variant’
– for a mutation or variant, results consist of exon, DNA change, protein change, and genotype.
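The genetic test result structure listed above can be sketched as a small record type. This is a hypothetical illustration only: the class and field names (`GeneticTestResult`, `gi_number`, `ResultKind`, etc.) are our own assumptions, not the actual eyeGENE™ schema, for which no data dictionary was available. The one rule encoded is that a ‘negative’ result carries no mutation or variant details.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResultKind(Enum):
    """The three result categories named on the slide."""
    NEGATIVE = "negative"
    MUTATION = "mutation"
    VARIANT = "variant"


@dataclass(frozen=True)
class GeneticTestResult:
    """One hypothetical result row: gene, exons screened and lab procedure, plus the outcome."""
    gene_symbol: str
    gi_number: str          # the gene's unique GI# (field name is an assumption)
    exons_screened: str
    lab_procedure: str
    kind: ResultKind
    # Details reported only for a mutation or variant; left optional so that
    # nothing has to be invented when a detail is not available.
    exon: Optional[str] = None
    dna_change: Optional[str] = None
    protein_change: Optional[str] = None
    genotype: Optional[str] = None

    def __post_init__(self) -> None:
        details = (self.exon, self.dna_change, self.protein_change, self.genotype)
        # A negative result reports no change, so it must not carry mutation details.
        if self.kind is ResultKind.NEGATIVE and any(d is not None for d in details):
            raise ValueError("a negative result cannot carry mutation/variant details")


# Usage with placeholder values only (not real eyeGENE™ data).
row = GeneticTestResult("GENE1", "GI-PLACEHOLDER", "exons 1-10", "sequencing",
                        ResultKind.VARIANT, exon="3", dna_change="c.[placeholder]",
                        protein_change="p.[placeholder]", genotype="heterozygous")
print(row.kind.value, row.exon)  # prints: variant 3
```

The mutation and variant details are deliberately not forced, in line with the talk's later critique of ‘required fields’ that leave no option but entering fake data.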

Enhancements to core system capabilities
[Diagram. Components: Patient Information; Clinical Findings; Family Information; Blood and DNA specimens; Genetic test results; Medical images and other uploaded files; Consent forms and other administrative data; Anonymized data for analysis. Enhancement labels: Dynamic phenotype content (v1/2); Can track at multiple clinics (v4); Unlimited blood/DNA flow (v2); Redesigned interface (v3); Automated e-mail tracking (v3); New (v3); Prototype done (v3); Can upload as PDFs (v3).]

The desire for Semantic Interoperability
• Many standards have been developed or envisioned, and many groups are engaged in this process (e.g. DICOM, ICD9, LOINC).
• March 2008: began study of future options for integrating eyeGENE™ with emerging medical information standards
• July 2008: completed Phase I, inventory and description of existing standards
• March 2009: Phase II, recommended options for integration, draft complete

Assumption
• Adopting currently available and emerging medical information standards now will, in the future, provide an additional layer of benefits in better
– data collection,
– data sharing and
– data analysis.
• The dream of semantic interoperability.

Degrees of interoperability
• D0: no interoperability at all
• D1: technical and syntactical interoperability (no semantic interoperability)
• D2: two orthogonal levels of partial semantic interoperability
– D2a: unidirectional semantic interoperability
– D2b: bidirectional semantic interoperability of meaningful fragments
• D3: full semantic interoperability, sharable context, seamless co-operability
Stroetmann et al. Semantic Interoperability for Better Health and Safer Healthcare. SemanticHEALTH report. Jan 2009

‘Full interoperability’
• ‘Neither language nor technological differences prevent the system to seamlessly integrate the received information into the local record and provide a complete picture of someone’s health as if it would have been collected locally.’
• ‘Further, the anonymized data feeds directly into the tools of public health authorities and researchers.’
Stroetmann et al. Semantic Interoperability for Better Health and Safer Healthcare. SemanticHEALTH report. Jan 2009

The problems we are facing (1)
• Is ‘full interoperability’ here intentionally defined with a restricted scope?
– ‘… seamlessly integrate the received information…’
• does not put any quality or scope restrictions on the ‘information’ itself
– ‘… data feeds directly into the tools of public health authorities …’
• the needs and interests of such authorities are not equivalent to the needs and interests of patients.

Is there a wish to improve beyond one’s own needs?
• The NCI Thesaurus (in response to a critical evaluation):
– ‘making changes to the incorrect use of the “all” description logic qualifier to NCIt was not cost-effective.’
– ‘Many of the problems the review identified, if corrected, would not materially affect the ability of NCIt to meet the use cases that it must support.’
Sherri de Coronado. The NCI Thesaurus quality assurance life cycle. Journal of Biomedical Informatics (2009, in press)

The anti-ontological organization of MedDRA®
• Violates many principles for high-quality terminology design and use (1, 2):
– Mixing ontology with epistemology:
• HLT: Gastrointestinal infections, site unspecified
• HLT: Headaches NEC
– Obscure, undocumented classification criteria:
• LLT: Retroauricular pain classified under PT: Headache (v10, NCI Browser)
– LLTs under a PT denoting distinct generic entities:
• PT: Nodal arrhythmia with LLTs: Junctional bradycardia, Junctional tachycardia, Reciprocating tachycardia
– Inadequate versioning and change management:
• No reasons for change, deletions of HLTs, …
– No definitions for terms.
1. RL Richesson, KW Fung, JP Krischer. Heterogeneous but “standard” coding systems for adverse events: Issues in achieving interoperability between apples and oranges. Contemporary Clinical Trials 29 (2008) 635–645.
2. Bousquet C, Lagier G, Lillo-Le Louët A, Le Beller C, Venot A, Jaulent MC. Appraisal of the MedDRA conceptual structure for describing and grouping adverse drug reactions. Drug Saf. 2005;28(1):19-34.
MedDRA® is a registered trademark of the International Federation of Pharmaceutical Manufacturers and Associations (IFPMA)

The problems we are facing (2)
• Many Health IT standards do not acknowledge relevant distinctions appropriately.
– They suffer from reductionist views on reality, constrained by what can be seen through the lenses of either information systems or terminologies and ontologies that adhere to ‘concept representation’.
• Not being able to distinguish what is the case from what is known hampers research.
• Therefore these Health IT standards hamper research.
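To make the distinction between what is the case and what is known concrete, here is a minimal sketch in which a disorder in a patient and a clinician’s diagnosis of it are kept as separate records, in the spirit of the L1/L2 levels of the realist framework used in this study. All class, field and function names (`Disorder`, `Diagnosis`, `current_belief`) and the example values are hypothetical. A revised diagnosis adds a new belief about the same disorder rather than overwriting it, so the history of what was known stays available for research.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Disorder:
    """L1 (hypothetical model): a particular disorder in a particular patient, i.e. what is the case."""
    disorder_id: str
    patient_id: str


@dataclass(frozen=True)
class Diagnosis:
    """L2 (hypothetical model): a clinician's belief about a disorder at a given date, i.e. what is known."""
    about: Disorder
    clinician_id: str
    asserted_type: str   # the type of disorder the clinician takes it to be
    asserted_on: date


def current_belief(diagnoses: List[Diagnosis], disorder: Disorder) -> Optional[str]:
    """Most recently asserted type for a disorder, or None if it was never diagnosed."""
    about_it = [d for d in diagnoses if d.about == disorder]
    if not about_it:
        return None
    return max(about_it, key=lambda d: d.asserted_on).asserted_type


# Hypothetical example: one disorder, two successive diagnoses of it.
dis = Disorder("disorder-1", "patient-1")
history = [
    Diagnosis(dis, "clinician-A", "cone dystrophy", date(2008, 3, 1)),
    Diagnosis(dis, "clinician-B", "cone-rod dystrophy", date(2009, 1, 15)),
]
print(current_belief(history, dis))  # prints: cone-rod dystrophy
print(len(history))                  # prints: 2 (the earlier belief is kept)
```

A model that stores only a single ‘diagnosis’ value per patient cannot express this: it has no record of the disorder itself, only of the latest belief about it.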

Furthermore: perfect ‘semantic’ tools are useless …
• … if data captured at the source is not of high quality
• Prevailing HIT information models don’t allow data to be stored at an acceptable quality level:
– No formal distinction between disorders and diagnoses
– Messy nature of the notions of ‘problem’ and ‘concern’
– No unique identification of the entities about which data is stored
• Unique IDs for data elements cannot serve as unique IDs for the entities denoted by these data elements

The alternative
• A realist view on reality distinguishing:
– Ontologies, which should describe only what is always true about the entities of a domain, from
– Data collections / repositories, which provide information about concrete cases described in terms of the ontologies, from
– Information models (or data structures), which should describe the artifacts in which data is stored and be such that the ‘aboutness’ of the data is not lost.

Objectives of the study
1. to understand the type of view embedded in the eyeGENE™ database, and
2. if this view differs from the realist one, to propose a migration path towards the latter.

Materials & methods
• We studied
– the available documentation about eyeGENE™’s core medical information, including parts of its information model and user interfaces,
– some of the clinical questions (and corresponding possible-answer sets) that are asked of eyeGENE™ users when they enter data in the system,
– system-generated reports about lab procedures performed on genes.
• We did not have access to a data dictionary with data definitions and corresponding business rules
– and thus had to do some guessing about the exact meaning of data fields.
• We checked
– for design choices in the system that would cause the collected information not to match the corresponding structure of reality;
– for structural and functional issues in eyeGENE™ that, in the absence of sufficient background information for disambiguation, would lead to difficulties in interpreting data once entered.

Realist framework used
• Levels of reality:
– L1-entities: such as specific patients, their relatives, the disorders they are suffering from, the lab tests that have been conducted, and so forth;
– L2-entities: interpretations and opinions on the side of clinicians, including hypotheses and diagnoses;
• thus being about entities in first-order reality, although not accessible to third parties without additional L3 references;
– L3-entities: information elements about L1 or L2 entities, examples being entries in information systems such as the eyeGENE™ database.
• The (types of) relationships that obtain between entities in each of these levels and across levels.

Results
• The pragmatic design approach initially followed by eyeGENE™ exhibits several limitations:
(1) conflating the three levels of reality described above,
(2) not faithfully representing the relevant portions of reality at each level,
(3) forcing ‘data’ to be entered while there is nothing the data can be data about.
• Some examples …

‘Required fields’
• Users must provide data for such fields, but what is the relation with reality?
– country: each person surely lives in some country
– postal code, state:
• not all countries use postal codes or consist of states
– phone number
• not everybody has a phone, or at the time of data entry the number might not be known.
• No other option than entering fake data.

Reductionism (1)
• Forced selection from an incomplete list
– 22 ocular disease types
• aren’t there any more? Are the others not of interest? Just not now, or never? …
• Forced structure of data types
– Belgian phone numbers are not structured the way US phone numbers are

eyeGENE™ core medical data schema
[Schema diagram; entity labels: Patient, Clinical Encounter, Patient Clinical Finding, Patient Diagnosis, Diagnosis, Clinical Finding, Diagnosis Finding Link, Clinical Finding Unit Link, Units, Specimen, Lab Result]

Reductionism (2)
• Where are the disorders?
– diagnoses are in the heads of, e.g., physicians
– disorders are in the body of the patient
• L1-L3 confusions
• The way clinical findings are linked to diagnoses does not allow one to study how findings are related to disorders.

Ok, ok, …
• What eyeGENE™ exhibits is common practice.
– This gives another meaning to ‘state of the art’.
• Should we bother, given that (almost) everybody does it this way?
– erroneous assumption of inherent classification
• Parsons J, Wand Y. ACM Transactions on Database Systems. 2000;25(2):228-268.
– tyranny of the use case:
• ‘if most people wrongly believe that crocodiles are a kind of mammal, then most people would find it easier to locate information about crocodiles if it were located in a mammals grouping, rather than where it factually belonged’
– Huhns MN, Stephens LM. In Enterprise Inter- and Intra-Organizational Integration: Building International Consensus. Boston, MA: Kluwer Academic Publishers; 2002:83-90

Some recommendations (1)
• For each table, data field and associated allowed values, and for each hard- or soft-coded business rule that restricts data input:
1. assess what (type of) entity in reality would be denoted by any data instance,
– including any ‘value’ from ‘value sets’, external terminologies, etc.;
2. represent how these entities in reality relate to each other as well as to other ontologically relevant entities that are not explicitly addressed in the information model,
• the domain model proper,
– based on realism-based ontologies;
3. describe formally how the information model has to be interpreted in terms of the domain model.
– the ‘interpretation model’

Some recommendations (2)
• The (relevant parts of the) interpretation model should be part of any information exchange.
• Change user interfaces and the information model only when no ‘realist interpretation’ is possible or faithful data entry cannot be achieved:
– certain fields should not be ‘required’,
– formatting, e.g. of phone numbers, is acceptable in a user interface when it satisfies local situations (not ‘requirements’), but not for exchange,
– ‘unknown’ and ‘null values’ are acceptable, if suitable interpretations are provided in the interpretation model, not just as text in data dictionaries.
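The recommended steps can be sketched as a small lookup structure. This is a hypothetical illustration only: the field names (`patient_phone`, `patient_diagnosis`), the entity-type labels and the absence reasons are our own assumptions, not eyeGENE™’s schema or any published interpretation-model format. Each information-model field is mapped to the level (L1, L2 or L3) and type of entity it denotes, and a missing value must carry an explicit reason whose interpretation lives in the model, rather than being forced into fake data or left as an unexplained null.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Level(Enum):
    L1 = "first-order reality (patients, disorders, lab tests, ...)"
    L2 = "clinicians' interpretations (diagnoses, hypotheses, ...)"
    L3 = "information elements (database entries, ...)"


class Absence(Enum):
    """Hypothetical reasons for a missing value, each with an explicit interpretation."""
    NOTHING_TO_DENOTE = "nothing exists for this field to be about (e.g. the patient has no phone)"
    UNKNOWN = "the entity exists, but the value was not known at data entry"


@dataclass(frozen=True)
class FieldInterpretation:
    level: Level
    denotes: str   # type of entity that any instance of the field is about


# Hypothetical fragment of an interpretation model, keyed by information-model field name.
INTERPRETATION_MODEL: Dict[str, FieldInterpretation] = {
    "patient_phone": FieldInterpretation(Level.L1, "telephone line through which the patient can be reached"),
    "patient_diagnosis": FieldInterpretation(Level.L2, "clinician's diagnosis of a disorder in the patient"),
}


@dataclass(frozen=True)
class DataValue:
    """A stored data value (itself an L3 entity) for one information-model field."""
    field: str
    value: Optional[str] = None
    absence: Optional[Absence] = None

    def __post_init__(self) -> None:
        if self.field not in INTERPRETATION_MODEL:
            raise ValueError(f"no interpretation for field {self.field!r}")
        # Exactly one of value/absence: neither silent nulls nor required fake data.
        if (self.value is None) == (self.absence is None):
            raise ValueError("give either a value or an explicit absence reason")


def interpret(dv: DataValue) -> str:
    """State what a stored data value says about reality, via the interpretation model."""
    fi = INTERPRETATION_MODEL[dv.field]
    if dv.absence is not None:
        return f"{fi.denotes} ({fi.level.name}): {dv.absence.value}"
    return f"{fi.denotes} ({fi.level.name}) = {dv.value}"


print(interpret(DataValue("patient_phone", absence=Absence.NOTHING_TO_DENOTE)))
```

Because the interpretation travels with the field name rather than sitting as free text in a data dictionary, a receiving system could, under these assumptions, be sent the relevant `INTERPRETATION_MODEL` entries alongside the data itself.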

Conclusions
• eyeGENE™ is in successful use and by now processes over 100 samples per month.
• The NIH Roadmap goal to ‘require new ways to organize how clinical research information is recorded, new standards for clinical research protocols, modern information technology’ has not yet been reached. (Has any system reached it?)
• Making eyeGENE™ ‘reality-aware’ is feasible.
• The hope that at some future time relevant phenotypic data can be automatically extracted from an electronic medical record will remain a dream as long as these systems do not change.