Development of the Field of Biomedical Ontology
Towards an Ontological Treatment of Disease and Diagnosis
Barry Smith
New York State Center of Excellence in Bioinformatics and Life Sciences
University at Buffalo
http://ontology.buffalo.edu/smith

Anders Grimsmo, “Patients, diagnoses and processes in general practice in the Nordic countries. An attempt to make data from computerised medical records available for comparable statistics”, Scandinavian Journal of Primary Health Care, 2001:
“The major obstacle to extracting more epidemiological data from computerised medical records is caused by information in the databases not being uniquely linked to episodes of care.”

What is to be linked with what? What is information in the databases about?
To answer this question (to assign numbers to discrete entities), we need a good ontology of the care domain, including episodes of care on the one hand and entities on the side of the patient on the other.

and we need to take account of context
– of multiple diseases
– of the patient’s style of life
– of the patient’s environment
– of specific aspects of the presentation

we do this by paying attention to natural language
but the more we succeed in this, the more difficult it is to aggregate the data
disease of UMLSitis

OBOFoundry

Buffalo Longitudinal Cancer Data
Even with the best of intentions, and even if we just use one coding system, results are not always what they seem
Problem of SNOMEDitis
with acknowledgements to NLM: 1R21LM009824-01A1

Why does SNOMED change so much?
SNOMED CT: Anaplasma marginale (organism)

infectious agent is_a navigational concept
with acknowledgements to Werner Ceusters

Why does SNOMED change so much?
• Problems with ‘concept’: no real coherence as to what SNOMED is representing
• No proper hierarchy (of more and less general)
• Confusion of disorders (continuants) with etiological and diagnostic processes (occurrents) and of both with information entities (‘findings’)
• Confusion of ‘disorders’ with ‘morphological abnormalities’

SNOMED CT
128477000 Abscess (disorder)
44132006 Abscess (morphologic abnormality)

Epistemology and Combinatorial Explosion
• Epistaxis/nosebleed
– Epistaxis (disorder)
– Nosebleed/epistaxis symptom (finding)
– On examination - epistaxis (disorder)
– Has nosebleeds - epistaxis (disorder)
– Evidence of recent epistaxis (finding)
• Rash
– Cutaneous eruption (morphologic abnormality), with synonym Rash
– Eruption of skin (disorder), with synonym Rash
– Complaining of a rash (finding)
– On examination - a rash (finding)
• Dry skin
– Dry skin (finding)
– Complaining of dry skin (finding)
– On examination - dry skin (finding)
– Dry skin dermatitis (disorder)
from Bill Hogan

An Alternative: Basic Formal Ontology

360 BC: Aristotle’s Metaphysics
1879: Invention of modern logic (Boole, Frege)
1920: The problem of the Unity of Science (Logical Positivism)
1940: Birth of computing (Turing)

Ontology Timeline
1970: AI, Robotics (J. McCarthy, P. Hayes)
1980: KIF: Knowledge Interchange Format
1990: Description Logics
2000: Semantic Web (OWL), Protégé
2007: National Center for Biomedical Ontology (NCBO) Bioportal

Uses of ‘ontology’ in PubMed abstracts
[Figure: Biomedical Ontology in PubMed; bar chart of counts per year, 2000–2009, vertical axis 0–1000]

By far the most successful: GO (Gene Ontology)

Ontology Timeline
1990: Human Genome Project
1999: The Gene Ontology (GO) – Model Organism Research
2005: The Open Biomedical Ontologies (OBO) Foundry
2010: Ontology for General Medical Science

The GO is a controlled vocabulary for use in annotating data
• multi-species, multi-disciplinary, open source
• contributing to the cumulativity of scientific results obtained by distinct research communities
• compare use of kilograms, meters, seconds … in formulating experimental results

NIH Mandates for Data Sharing
Organizations such as the NIH now require use of common standards in a way that will ensure that the results obtained through funded research are more easily accessible to external groups. ODR will be created in such a way that its use will address the new NIH mandates. It will also be designed to allow information presented in its terms to be usable in satisfying other regulatory purposes, such as submissions to FDA.

GO provides answers to three types of questions: for each gene product (protein ...)
• in what parts of the cell has it been identified? (Cell Constituent Ontology)
• exercising what types of molecular functions? (Molecular Function Ontology)
• with what types of biological processes? (Biological Process Ontology)

[Figure: Gene Product Associations; edge legend: part_of, subtype_of]

$100 mill. invested in literature curation using GO
over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO ontologies provide the basis for capturing biological theories in computable form
in contrast to terminologies and thesauri – which focus on socially diverse uses of language – the GO method focuses on commonly shared results of basic biological science

A new kind of biological research
based on analysis and comparison of the massive quantities of annotations linking ontology terms to raw data, including genomic data, clinical data, public health data
What 10 years ago took multiple groups of researchers months of data comparison effort can now be performed in milliseconds

The GO covers only generic (‘normal’) biological entities of three sorts:
– cellular components
– molecular functions
– biological processes
It does not provide representations of diseases, symptoms, genetic abnormalities …
How to extend the GO methodology to other domains of biology and medicine?

The Open Biomedical Ontologies (OBO) Foundry
RELATION TO TIME by GRANULARITY
CONTINUANT, INDEPENDENT
– ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO)
– MOLECULE: Molecule (ChEBI, SO, RnaO, PrO)
CONTINUANT, DEPENDENT
– ORGAN AND ORGANISM: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– CELL AND CELLULAR COMPONENT: Cellular Function (GO)
– MOLECULE: Molecular Function (GO)
OCCURRENT
– ORGAN AND ORGANISM, CELL AND CELLULAR COMPONENT: Biological Process (GO)
– MOLECULE: Molecular Process (GO)

OBO Foundry ontologies all follow the same principles to ensure interoperability
– GO Gene Ontology
– ChEBI Chemical Ontology
– PRO Protein Ontology
– CL Cell Ontology
– ...
– OGMS Ontology for General Medical Science

Basic Formal Ontology: GO at a high level

Basic Formal Ontology (BFO)
A simple top-level ontology to support information integration in scientific research
• No abstracta
• Nothing propositional
• Clear hierarchy
• No overlap with domain ontologies
• No confusion of ontology with epistemology
• No confusion of terms with what terms represent in reality

Basic Formal Ontology
Continuant
– Independent Continuant
– Dependent Continuant
Occurrent (Process, Event)
http://ifomis.uni-saarland.de/bfo/

BFO and the 3 Gene Ontologies (GO)
Continuant
– Independent Continuant: Cell Component
– Dependent Continuant: Molecular Function
Occurrent: Biological Process
Kumar A, Smith B, Borgelt C. Dependence relationships between Gene Ontology terms based on TIGR gene product annotations. CompuTerm 2004, 31-38.
Bada M, Hunter L. Enrichment of OBO Ontologies. J Biomed Inform. 2006 Jul 26

Users of BFO
• NCI BiomedGT
• SNOMED CT
• Ontology for General Medical Science (OGMS)
• ACGT Clinical Genomics Trials on Cancer – Master Ontology / Formbuilder (Case Report Forms for Cancer Clinical Trials)
• MediCognos / Microsoft Healthvault
• Cleveland Clinic Semantic Database in Cardiothoracic Surgery
• Major Histocompatibility Complex (MHC) Ontology (NIAID)
• Neuroscience Information Framework Standard (NIFSTD) and Constituent Ontologies
• Interdisciplinary Prostate Ontology (IPO)
• Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research
• Neural Electromagnetic Ontologies (NEMO)
• ChemAxiom – Ontology for Chemistry
• Ontology for Risks Against Patient Safety (RAPS/REMINE) (EU FP7)
• IDO Infectious Disease Ontology (NIAID)

Infectious Disease Ontology Consortium
• MITRE, Mount Sinai, UTSouthwestern – Influenza
• IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis, Staph. aureus
• Case Western Reserve – Infective Endocarditis
• University of Michigan – Brucellosis

The OBO Foundry
• GO Gene Ontology
• CL Cell Ontology
• SO Sequence Ontology
• ChEBI Chemical Ontology
• PATO Phenotype (Quality) Ontology
• FMA Foundational Model of Anatomy
• ChEBI Chemical Entities of Biological Interest
• CARO Common Anatomy Reference Ontology
• PRO Protein Ontology
• Infectious Disease Ontology
• Plant Ontology
• Environment Ontology
• Ontology for Biomedical Investigations
• RNA Ontology

rationale of OBO Foundry coverage (homesteading principle)
RELATION TO TIME by GRANULARITY
CONTINUANT, INDEPENDENT
– ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO)
– MOLECULE: Molecule (ChEBI, SO, RNAO, PRO)
CONTINUANT, DEPENDENT
– ORGAN AND ORGANISM: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– CELL AND CELLULAR COMPONENT: Cellular Function (GO)
– MOLECULE: Molecular Function (GO)
OCCURRENT
– ORGAN AND ORGANISM: Organism-Level Process (GO)
– CELL AND CELLULAR COMPONENT: Cellular Process (GO)
– MOLECULE: Molecular Process (GO)

OBO Foundry organized in terms of Basic Formal Ontology
Methodology of downward population
Each Foundry ontology can be seen as an extension of a single upper level ontology (BFO)

Example: The Cell Ontology

Ontology for General Medical Science
BFO-based ontology for clinical medicine
Continuant
– Independent Continuant: Anatomical Component + Disorder
– Dependent Continuant: Disease + Bodily Quality
Occurrent: Pathological Process + Clinical Encounter

Continuant
– Independent Continuant
– Dependent Continuant: Quality, Disposition, .....
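
The placement of OGMS terms under BFO categories can be sketched as a small parent map. This is only an illustration, assuming a single-parent is_a hierarchy; the dictionary `PARENT` and the helpers `ancestors` and `is_a` are hypothetical names and not part of BFO or OGMS.

```python
# Illustrative sketch only: PARENT, ancestors and is_a are hypothetical names.
# Each term maps to its single parent; None marks a root of the hierarchy.
PARENT = {
    "Continuant": None,
    "Occurrent": None,
    "Independent Continuant": "Continuant",
    "Dependent Continuant": "Continuant",
    # OGMS terms placed under BFO categories, as on the slide
    "Anatomical Component": "Independent Continuant",
    "Disorder": "Independent Continuant",
    "Disease": "Dependent Continuant",
    "Bodily Quality": "Dependent Continuant",
    "Pathological Process": "Occurrent",
    "Clinical Encounter": "Occurrent",
}


def ancestors(term):
    """Return the chain of parents of `term`, nearest first."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(term, category):
    """True if `term` is `category` itself or falls under it in the hierarchy."""
    return term == category or category in ancestors(term)
```

With this map, `is_a("Disease", "Continuant")` holds while `is_a("Disease", "Occurrent")` does not, which mirrors the separation of diseases (continuants) from pathological processes (occurrents).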
realization depends_on realizable
Continuant
– Independent Continuant: bearer
– Dependent Continuant: disposition
Occurrent: Process of realization

this particular case of redness (of a particular fly eye) instantiates the universal red
an instance of eye (in a particular fly) instantiates the universal eye
this particular case of redness depends_on the instance of eye

red is_a color
the particular case of redness (of a particular fly eye) instantiates red
eye is_a anatomical structure
an instance of an eye (in a particular fly) instantiates eye
the particular case of redness depends_on the instance of an eye

Phase transitions
portion of water
this portion of H2O
– instantiates portion of ice at t1
– instantiates portion of liquid water at t2
– instantiates portion of gas at t3

human
in nature, no sharp boundaries here
John (exists continuously)
– instantiates embryo at t1
– instantiates fetus at t2
– instantiates neonate at t3
– instantiates infant at t4
– instantiates child at t5
– instantiates adult at t6

temperature
in nature, no sharp boundaries here
John’s temperature (exists continuously)
– instantiates 37°C at t1
– instantiates 37.1°C at t2
– instantiates 37.2°C at t3
– instantiates 37.3°C at t4
– instantiates 37.4°C at t5
– instantiates 37.5°C at t6

coronary heart disease
John’s coronary heart disease (exists continuously over time)
– instantiates early lesions and small fibrous plaques at t1
– instantiates asymptomatic (‘silent’) infarction at t2
– instantiates surface disruption of plaque at t3
– instantiates unstable angina at t4
– instantiates stable angina at t5

OGMS Ontology for General Medical Science
http://code.google.com/p/ogms/

OGMS: The Big Picture

Disposition (potentiality)
A disposition is a realizable entity which is such that, if it ceases to exist, then its bearer is physically changed, and whose realization occurs, in virtue of the bearer’s physical make-up, when this bearer is in some special physical circumstances

Disorder
An independent continuant that is part of an organism and that deviates from the canonical anatomy of the organism in a way that gives rise to pathological processes

Disorder
serves as the bearer of a disposition to pathological processes
A part of the body that typically gets larger over time

Disease course
• the totality of all disease processes through which a given disease instance is realized.
• multiple disease courses will be associated with the same disorder type, for example in reflection of the presence or absence of pharmaceutical or other interventions, of differences in environmental influence, and so forth.

The Big Picture
A disease is a disposition rooted in a physical disorder in the organism and realized in pathological processes.
etiological process produces disorder
disorder bears disposition
disposition realized_in pathological process
pathological process produces abnormal bodily features
abnormal bodily features recognized_as signs & symptoms
signs & symptoms used_in interpretive process
interpretive process produces diagnosis

Definitions - Foundational Terms
Disorder =def. – A causally linked combination of physical components that is clinically abnormal.
Pathological Process =def. – A bodily process that is a manifestation of a disorder and is clinically abnormal.
Disease =def. – A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.
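
The Big Picture can be read as a chain of relations running from etiological process to diagnosis. The following is a minimal sketch, assuming each entity type has exactly one outgoing relation as in the simplified diagram; the names `BIG_PICTURE` and `walk` are hypothetical, and the triples paraphrase the slide rather than reproduce OGMS axioms.

```python
# Illustrative sketch: BIG_PICTURE and walk are hypothetical names.
# Each triple is (subject, relation, object), read off the Big Picture slide.
BIG_PICTURE = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition"),
    ("disposition", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]


def walk(start, triples=BIG_PICTURE):
    """Follow outgoing relations from `start` until no triple continues the chain.

    Returns a flat list alternating entity, relation, entity, ...
    Assumes the triples contain no cycle, which holds for BIG_PICTURE.
    """
    outgoing = {subj: (rel, obj) for subj, rel, obj in triples}
    path = [start]
    current = start
    while current in outgoing:
        rel, obj = outgoing[current]
        path.extend([rel, obj])
        current = obj
    return path
```

Calling `walk("etiological process")` traverses the whole chain and ends at "diagnosis"; starting from "disorder" skips the etiological step, which is a convenient way to see that the disease (disposition) sits between the disorder that bears it and the pathological processes that realize it.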
Influenza - infectious
• Etiological process - infection of airway epithelial cells with influenza virus – produces
• Disorder - viable cells with influenza virus – bears
• Disposition (disease) - flu – realized_in
• Pathological process - acute inflammation – produces
• Abnormal bodily features – recognized_as
• Symptoms - weakness, dizziness
• Signs - fever
• Symptoms & Signs – used_in
• Interpretive process – produces
• Hypothesis - rule out influenza – suggests
• Laboratory tests – produces
• Test results - elevated serum antibody titers – used_in
• Interpretive process – produces
• Result - diagnosis that patient X has a disorder that bears the disease flu
But the disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).

Huntington’s Disease - genetic
• Etiological process - inheritance of >39 CAG repeats in the HTT gene – produces
• Disorder - chromosome 4 with abnormal mHTT – bears
• Disposition (disease) - Huntington’s disease – realized_in
• Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum – produces
• Abnormal bodily features – recognized_as
• Symptoms - anxiety, depression
• Signs - difficulties in speaking and swallowing
• Symptoms & Signs – used_in
• Interpretive process – produces
• Hypothesis - rule out Huntington’s – suggests
• Laboratory tests – produces
• Test results - molecular detection of the HTT gene with >39 CAG repeats – used_in
• Interpretive process – produces
• Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease

HNPCC - genetic pre-disposition
• Etiological process - inheritance of a mutant mismatch repair gene – produces
• Disorder - chromosome 3 with abnormal hMLH1 – bears
• Disposition (disease) - Lynch syndrome – realized_in
• Pathological process - abnormal repair of DNA mismatches – produces
• Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2) – bears
• Disposition (disease) - non-polyposis colon cancer – realized_in
• Symptoms (including pain)

Dispositions and Predispositions
All diseases are dispositions; not all dispositions are diseases.
A predisposition is a disposition to acquire a disposition.
Predisposition to Disease of Type X =def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing the disease X.
HNPCC is caused by a disorder (mutation) in a DNA mismatch repair gene that disposes to the acquisition of additional mutations from defective DNA repair processes, and thus is a predisposition to the development of colon cancer.

Cirrhosis - environmental exposure
• Etiological process - phenobarbital-induced hepatic cell death – produces
• Disorder - necrotic liver – bears
• Disposition (disease) - cirrhosis – realized_in
• Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death – produces
• Abnormal bodily features – recognized_as
• Symptoms - fatigue, anorexia
• Signs - jaundice, splenomegaly
• Symptoms & Signs – used_in
• Interpretive process – produces
• Hypothesis - rule out cirrhosis – suggests
• Laboratory tests – produces
• Test results - elevated liver enzymes in serum – used_in
• Interpretive process – produces
• Result - diagnosis that patient X has a disorder that bears the disease cirrhosis

Systemic arterial hypertension
• Etiological process - abnormal reabsorption of NaCl by the kidney – produces
• Disorder - abnormally large scattered molecular aggregate of salt in the blood – bears
• Disposition (disease) - hypertension – realized_in
• Pathological process - exertion of abnormal pressure against arterial wall – produces
• Abnormal bodily features – recognized_as
• Symptoms
• Signs - elevated blood pressure
• Symptoms & Signs – used_in
• Interpretive process – produces
• Hypothesis - rule out hypertension – suggests
• Laboratory tests – produces
• Test results – used_in
• Interpretive process – produces
• Result - diagnosis that patient X has a disorder that bears the disease hypertension

Type 2 Diabetes Mellitus
• Etiological process – produces
• Disorder - abnormal pancreatic beta cells and abnormal muscle/fat cells – bears
• Disposition (disease) - diabetes mellitus – realized_in
• Pathological processes - diminished insulin production, diminished muscle/fat uptake of glucose – produces
• Abnormal bodily features – recognized_as
• Symptoms - polydipsia, polyuria, polyphagia, blurred vision
• Signs - elevated blood glucose and hemoglobin A1c
• Symptoms & Signs – used_in
• Interpretive process – produces
• Hypothesis - rule out diabetes mellitus – suggests
• Laboratory tests - fasting serum blood glucose, oral glucose challenge test, and/or blood hemoglobin A1c – produces
• Test results – used_in
• Interpretive process – produces
• Result - diagnosis that patient X has a disorder that bears the disease type 2 diabetes mellitus

Type 1 hypersensitivity to penicillin
• Etiological process - sensitizing of mast cells and basophils during exposure to penicillin-class substance – produces
• Disorder - mast cells and basophils with epitope-specific IgE bound to Fc epsilon receptor I – bears
• Disposition (disease) - type I hypersensitivity – realized_in
• Pathological process - type I hypersensitivity reaction – produces
• Abnormal bodily features – recognized_as
• Symptoms - pruritus, shortness of breath
• Signs - rash, urticaria, anaphylaxis
• Symptoms & Signs – used_in
• Interpretive process – produces
• Hypothesis – suggests
• Laboratory tests – produces
• Test results - occasionally, skin testing – used_in
• Interpretive process – produces
• Result - diagnosis that patient X has a disorder that bears the disease type 1 hypersensitivity to penicillin

Next steps in OGMS
• classification of distinct types of disease courses for instances of each disease type
– in different typical environments
– with and without treatment
– with treatment plan that is or is not realized by the patient
– where the disease exists in combination with other diseases
• modify the Big Picture to take account of differences between primary care and specialist care

The Big Picture

Definitions - Clinical Evaluation Terms
Sign =def. – A bodily feature of a patient that is observed in a physical examination and is deemed by the clinician to be of clinical significance. (Objectively observable features)
Symptom =def. – An experienced bodily feature of a patient that is observed by and observable only by the patient and is of the type that can be hypothesized by a patient to be a realization of a disease. (A restricted family of phenomena including pain, nausea, anger, drowsiness, which are of their nature experienced in the first person)
Symptoms are subjective. But this does not mean that there is no objective fact of the matter whether a given symptom exists.

Definition: Etiology
Etiological Process =def. – A process in an organism that leads to a subsequent disorder.
Example: toxic chemical exposure resulting in a mutation in the genomic DNA of a cell; infection of a human with a pathogenic virus; inheritance of two defective copies of a metabolic gene
The etiological process creates the physical basis of that disposition to pathological processes which is the disease.

Definitions - Diagnosis
Clinical Picture =def. – A representation of a clinical phenotype that is inferred from the combination of laboratory, image and clinical findings about a given patient.
Diagnosis =def. – A conclusion of an interpretive process that has as input a clinical picture of a given patient and as output an assertion to the effect that the patient has a disease of such and such a type.

Definitions - Qualities
Manifestation of a Disease =def. – A bodily feature of a patient that is (a) a deviation from clinical normality that exists in virtue of the realization of a disease and (b) is observable.
Observability includes observability through elicitation of a response or through the use of special instruments.
Preclinical Manifestation of a Disease =def. – A manifestation of a disease that exists prior to its becoming detectable in a clinical history taking or physical examination.
Clinical Manifestation of a Disease =def. – A manifestation of a disease that is detectable in a clinical history taking or physical examination.
Phenotype =def. – A (combination of) bodily feature(s) of an organism determined by the interaction of its genetic make-up and environment.
Clinical Phenotype =def. – A clinically abnormal phenotype.

For an ontology to succeed,
• potential users should be incentivized to use it
• it should be populated using the terms that they need and using definitions that conform to their understanding of these terms
• it should be easily correctable in light of new research discoveries
• it should enable the data annotated in its terms to be easily integrated with legacy data from related fields
• it should be easily extendable to new kinds of data.

A new kind of Electronic Health Record
resting on the use of the same (public domain) ontologies in mapping proprietary EHR vocabularies
to yield patient data annotated in consistent ways that support
• integrated care and continuity of care
• comparison and integration for diagnosis and meta-analysis
• secondary uses for research
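
The idea of mapping proprietary EHR vocabularies to shared ontology terms can be illustrated with a toy example. The sketch below assumes two invented vendor vocabularies and made-up term identifiers (`EX:0001`, `EX:0002`); none of these codes come from a real terminology, and the function names `annotate` and `count_by_term` are hypothetical.

```python
from collections import Counter

# Hypothetical mappings from local codes in two invented EHR vocabularies
# to shared ontology term identifiers. The EX: identifiers are made up.
VENDOR_A_TO_ONTOLOGY = {"A-FLU": "EX:0001", "A-HTN": "EX:0002"}
VENDOR_B_TO_ONTOLOGY = {"influenza_dx": "EX:0001", "hypertension_dx": "EX:0002"}

MAPPINGS = {"vendor_a": VENDOR_A_TO_ONTOLOGY, "vendor_b": VENDOR_B_TO_ONTOLOGY}


def annotate(records):
    """Attach the shared ontology term to each (source, patient_id, local_code) record.

    Records whose local code has no mapping are annotated with None, so that
    gaps in the mapping stay visible instead of being dropped silently.
    """
    annotated = []
    for source, patient_id, local_code in records:
        term = MAPPINGS.get(source, {}).get(local_code)
        annotated.append((patient_id, term))
    return annotated


def count_by_term(annotated):
    """Count annotated records per shared ontology term, ignoring unmapped ones."""
    return Counter(term for _, term in annotated if term is not None)


# Invented example records from the two hypothetical systems.
records = [
    ("vendor_a", "p1", "A-FLU"),
    ("vendor_b", "p2", "influenza_dx"),
    ("vendor_b", "p3", "hypertension_dx"),
    ("vendor_a", "p4", "A-UNKNOWN"),
]
counts = count_by_term(annotate(records))
```

Once both systems are annotated with the same term identifiers, records for the same disease type can be counted together regardless of which local code was used, which is the kind of comparison and integration the slide has in mind.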