New York State Center of Excellence in Bioinformatics & Life Sciences

Biomedical Ontology in Buffalo
Part 1: The Gene Ontology
Barry Smith and Werner Ceusters

Biomedical data is siloed
• Lab / pathology data
• Electronic Health Record data
• Clinical trial data
• Patient histories
• Medical imaging
• Microarray data
• Protein chip data
• Flow cytometry
• Genotype / SNP data

Biomedical data is siloed
• Data in Pittsburgh
• Data owned by Medicare
• Data owned by the NIH
• Data owned by HIV researchers
• Data owned by the Cleveland Clinic
• Data owned by regional health organizations
• Data owned by mouse biologists
• Data owned by Dr McFritz
NIH mandates for data reusability

Department of Philosophy, 135 Park Hall, University at Buffalo, Buffalo NY 14260

Ontology: an antidote to silos, promoting:
• information retrieval
• information consistency, and thus continuity and cumulation
• information integration
• reasoning

Uses of ‘ontology’ in PubMed abstracts

By far the most successful: GO (The Gene Ontology)

You’re interested in which genes control heart muscle development: 17,536 results

Microarray data shows changed expression of thousands of genes. How will you spot the patterns?
[Figure: clustered microarray expression profiles over time, control vs. attacked, with gene clusters labelled Defense response, Immune response, Response to stimulus, Toll regulated genes, JAK-STAT regulated genes, Puparial adhesion, Molting cycle, Hemocyanin, Amino acid catabolism, Lipid metabolism, Peptidase activity, Protein catabolism]
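One common way to answer “How will you spot the patterns?” with GO is over-representation analysis: look up the GO categories annotated to each changed gene and ask which categories turn up among the changed genes more often than chance would predict. The sketch below is a minimal illustration of that idea, not the method used to produce the clustering on the slide. The gene identifiers, their annotations and the function names are hypothetical, and the test is a one-sided hypergeometric test computed with Python's standard library.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: N genes in total, K annotated to the term, n changed."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)

def term_enrichment(changed, annotations):
    """Rank GO terms by how over-represented they are among the changed genes.

    annotations: gene -> set of GO term labels (direct annotations only).
    Returns tuples (term, changed_with_term, all_with_term, p_value), smallest p first.
    """
    population = set(annotations)
    hits = set(changed) & population
    N, n = len(population), len(hits)
    terms = {t for ts in annotations.values() for t in ts}
    results = []
    for term in terms:
        annotated = {g for g, ts in annotations.items() if term in ts}
        k = len(annotated & hits)
        results.append((term, k, len(annotated), hypergeom_sf(k, N, len(annotated), n)))
    return sorted(results, key=lambda r: (r[3], r[0]))

# Hypothetical toy data: ten genes, each annotated with one process label from the slide.
ANNOTATIONS = {
    "g1": {"immune response"}, "g2": {"immune response"},
    "g3": {"immune response"}, "g4": {"immune response"},
    "g5": {"lipid metabolism"}, "g6": {"lipid metabolism"},
    "g7": {"protein catabolism"}, "g8": {"protein catabolism"},
    "g9": {"protein catabolism"}, "g10": {"protein catabolism"},
}
CHANGED = ["g1", "g2", "g3", "g5"]  # genes whose expression changed

for term, k, K, p in term_enrichment(CHANGED, ANNOTATIONS):
    print(f"{term}: {k}/{K} changed, p = {p:.3f}")
# immune response: 3/4 changed, p = 0.119
# lipid metabolism: 1/2 changed, p = 0.667
# protein catabolism: 0/4 changed, p = 1.000
```

With ten genes nothing here is statistically meaningful; the point is only that the annotations turn a list of changed genes into a ranked list of biological categories. Real analyses use a genome-wide background and correct for testing many terms at once.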
You’re interested in which of your hospital’s patient data is relevant to understanding how genes control heart muscle development
• Lab / pathology data
• EHR data
• Clinical trial data
• Family history data
• Medical imaging
• Microarray data
• Model organism data
• Flow cytometry
• Mass spec
• Genotype / SNP data
How will you spot the patterns? How will you find the data you need?

GO provides a controlled system of 25,000 categories for use in annotating data:
• multi-species (model organism research)
• multi-disciplinary
• open source

Definitions

Gene products involved in cardiac muscle development in humans

The GO categorizations are organized in a way that provides a tool for algorithmic reasoning
[Hierarchical view representing relations between represented types]

$100 million invested in literature curation using GO:
• over 11 million annotations relating gene products described in UniProt, Ensembl and other databases to terms in the GO
• experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO

One standard method
Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers, using baseline functional information captured by GO for given gene product types, and identified 189 genes as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention. Science. 2006 Oct 13;314(5797):268-74.

Uses of GO in studies of:
• Persistent changes in spinal cord gene expression after recovery from inflammatory hyperalgesia: a preliminary study on pain memory. PMID: 18366630
• Spinal cord transcriptional profile analysis reveals protein trafficking and RNA processing as prominent processes regulated by tactile allodynia. PMID: 17069981
• Immune system involvement in abdominal aortic aneurysms. PMID: 17634102
• Biomedical discovery acceleration, with applications to craniofacial development. PMID: 19325874

Ontology in Buffalo
Part 2: Problems of Clinical Ontologies

Source of all data: Reality!

Ultimate goal: a digital copy of the world

Requirements for this digital copy
• R1: a faithful representation of reality
• R2: … of everything that is digitally registered:
  – what is generic (scientific theories)
  – what is specific (what individual entities exist and how they relate)
• R3: … which is computable, in order to:
  – allow queries over the world’s past and present
  – make predictions (diagnostic support, early warnings, …)
  – fill in gaps
  – identify mistakes
  – …

… the ultimate crystal ball

The ‘binding’ wall: a cartoon of the world

“Better Information” must cover …
Patient-specific information:
• EHR-EMR-ENR-…
• PHR
• Various modality-related databases – Lab, imaging, …
Scientific “knowledge”:
• Textbooks
• Classification systems
• Terminologies
• Ontologies

Key question
How to extend to clinical medicine the standard of quality of the GO and other ontologies based in biological science?
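The cardiac muscle development query and the hierarchical view rely on GO terms forming a hierarchy: a gene product annotated to a more specific term also counts for every more general term above it. A minimal sketch of that kind of query follows, assuming a toy hierarchy whose edges are illustrative rather than copied from GO, and hypothetical gene product names and helper functions.

```python
from collections import deque

# Toy hierarchy, child term -> parent terms (is_a or part_of).
# Illustrative only: these edges are NOT copied from the Gene Ontology.
GO_PARENTS = {
    "cardiac muscle tissue development": ["muscle tissue development", "heart development"],
    "skeletal muscle tissue development": ["muscle tissue development"],
    "cardiac muscle cell differentiation": ["cardiac muscle tissue development"],
    "cardiac myofibril assembly": ["cardiac muscle cell differentiation"],
}

# Hypothetical direct annotations: gene product -> GO terms it is annotated to.
GO_ANNOTATIONS = {
    "geneA": ["cardiac myofibril assembly"],
    "geneB": ["cardiac muscle tissue development"],
    "geneC": ["skeletal muscle tissue development"],
    "geneD": ["heart development"],
}

def descendants(term, parents):
    """All terms at or below `term`, found by breadth-first search over child links."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    seen, queue = {term}, deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def gene_products_for(term, parents, annotations):
    """Gene products annotated to `term` or to any more specific term below it."""
    covered = descendants(term, parents)
    return sorted(g for g, ts in annotations.items() if covered.intersection(ts))

print(gene_products_for("heart development", GO_PARENTS, GO_ANNOTATIONS))
# ['geneA', 'geneB', 'geneD']
print(gene_products_for("muscle tissue development", GO_PARENTS, GO_ANNOTATIONS))
# ['geneA', 'geneB', 'geneC']
```

geneA is never annotated to either broad term directly; it is found because its term lies below both, one through each parent of cardiac muscle tissue development.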
[Screenshots: NCI Thesaurus (April 2008), shown twice, the second time annotated with “?”]

MeSH: some paths from top to Wolfram Syndrome
[Diagram: paths from All MeSH Categories through Diseases Category to Wolfram Syndrome, via Nervous System Diseases; Male Urogenital Diseases; Eye Diseases; Cranial Nerve Diseases; Optic Nerve Diseases; Eye Diseases, Hereditary; Optic Atrophy; Female Urogenital Diseases and Pregnancy Complications; Female Urogenital Diseases; Neurodegenerative Diseases; Heredodegenerative Disorders, Nervous System; Urologic Diseases; Kidney Diseases; Optic Atrophies, Hereditary; Diabetes Insipidus]

What would it mean if this hierarchy were used in the context of a patient?
[Same diagram, annotated with “has” links and “???” marks]

Biomedical Ontology in Buffalo
Part 3: What we do

The GO is amazingly successful in overcoming silo problems, but it covers only generic biological entities of three sorts:
– cellular components
– molecular functions
– biological processes
and it does not provide representations of diseases, symptoms, …

The core of biomedical ontology in Buffalo:
– extending the methodology of high-quality ontologies to other domains of biology and medicine, and to EHRs and coding systems
– combining ontology with referent tracking

The Open Biomedical Ontologies (OBO) Foundry
(rows: granularity; columns: relation to time)
ORGAN AND ORGANISM
– Continuant, independent: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Continuant, dependent: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– Continuant, independent: Cell (CL); Cellular Component (FMA, GO)
– Continuant, dependent: Cellular Function (GO)
– Occurrent: Biological Process (GO)
MOLECULE
– Continuant, independent: Molecule (ChEBI, SO, RnaO, PrO)
– Continuant, dependent: Molecular Function (GO)
– Occurrent: Molecular Process (GO)

National Center for Biomedical Ontology (NCBO)
NIH Roadmap Center for Biomedical Computing
Collaboration of:
• Stanford Biomedical Informatics Research
• Mayo Clinic
• University at Buffalo

National Center for Ontological Research (NCOR)
• Army Net-Centric Data Strategy Center of Excellence
  – Biometrics Ontology
  – Command and Control Ontology
  – Universal Core Semantic Layer
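The MeSH slides turn on the fact that one heading can be reached from the top by several routes. The sketch below enumerates those routes in a small fragment built from the headings named on the slide; the parent links are an assumption made for illustration and were not checked against MeSH itself, and the function names are hypothetical. It then collects every ancestor of Wolfram Syndrome, which is what a system would do if it read each link as “a patient who has the narrower disease has the broader one”.

```python
# Toy MeSH-like fragment, heading -> parent headings, using the labels on the slide.
# Assumption: these parent links are reconstructed for illustration, not verified against MeSH.
MESH_PARENTS = {
    "Diseases Category": ["All MeSH Categories"],
    "Nervous System Diseases": ["Diseases Category"],
    "Eye Diseases": ["Diseases Category"],
    "Male Urogenital Diseases": ["Diseases Category"],
    "Female Urogenital Diseases and Pregnancy Complications": ["Diseases Category"],
    "Cranial Nerve Diseases": ["Nervous System Diseases"],
    "Neurodegenerative Diseases": ["Nervous System Diseases"],
    "Heredodegenerative Disorders, Nervous System": ["Neurodegenerative Diseases"],
    "Optic Nerve Diseases": ["Cranial Nerve Diseases", "Eye Diseases"],
    "Optic Atrophy": ["Optic Nerve Diseases"],
    "Eye Diseases, Hereditary": ["Eye Diseases"],
    "Optic Atrophies, Hereditary": ["Optic Atrophy", "Eye Diseases, Hereditary",
                                    "Heredodegenerative Disorders, Nervous System"],
    "Female Urogenital Diseases": ["Female Urogenital Diseases and Pregnancy Complications"],
    "Urologic Diseases": ["Male Urogenital Diseases", "Female Urogenital Diseases"],
    "Kidney Diseases": ["Urologic Diseases"],
    "Diabetes Insipidus": ["Kidney Diseases"],
    "Wolfram Syndrome": ["Optic Atrophies, Hereditary", "Diabetes Insipidus"],
}

def paths_from_top(term, parents):
    """Every path from a top heading down to `term`, as lists of headings (top first)."""
    ups = parents.get(term, [])
    if not ups:
        return [[term]]
    return [path + [term] for up in ups for path in paths_from_top(up, parents)]

def ancestors(term, parents):
    """All headings above `term`, reached by any route."""
    found, stack = set(), list(parents.get(term, []))
    while stack:
        heading = stack.pop()
        if heading not in found:
            found.add(heading)
            stack.extend(parents.get(heading, []))
    return found

paths = paths_from_top("Wolfram Syndrome", MESH_PARENTS)
print(len(paths))  # 6
for path in paths:
    print(" > ".join(path))

# Reading every link as "whoever has the narrower disease has the broader one":
inherited = ancestors("Wolfram Syndrome", MESH_PARENTS)
print({"Male Urogenital Diseases", "Female Urogenital Diseases"} <= inherited)  # True
```

Under that reading, a single patient recorded as having Wolfram Syndrome would be credited with both Male Urogenital Diseases and Female Urogenital Diseases, which is one concrete form of the question the slide asks.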
Current funded biomedical ontology projects
• Protein Ontology (PRO) (NIH/NIGMS)
• Infectious Disease Ontology (IDO) (NIH/NIAID)
• Realism-Based Versioning for Biomedical Ontologies (SNOMED) (NIH/NLM)
• Ontology for Risks Against Patient Safety (RAPS) (EU)
• DSM Ontology (to support work on the revision of the Diagnostic and Statistical Manual of Mental Disorders)
• Cleveland Clinic Semantic Database in Cardiothoracic Surgery

IDO Consortium
• MITRE, Mount Sinai, UT Southwestern – Influenza
• IMBB/VectorBase – Vector-borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis
• Cleveland Clinic – Infective Endocarditis
• University of Michigan – Brucellosis

“Better Information” must cover …
Patient-specific information:
• EHR-EMR-ENR-…
• PHR
• Various modality-related databases – Lab, imaging, …
Scientific “knowledge”:
• Textbooks
• Classification systems
• Terminologies
• Ontologies

Ontologies: keeping track of what is general (diabetes, malaria, nasal bone, nose, …)

Referent tracking: keeping track of what is particular (this particular nasal bone, this particular fracture, this particular swimming pool, this particular image, …)

eyeGENE

Ontology for Risks Against Patient Safety

REMINE: RT-based (referent-tracking-based) adverse event analysis

IUI (instance unique identifier) and particular description
#1    the patient who is treated
#2    #1’s treatment
#3    the physician responsible for #2
#4    #1’s arthrosis
#5    #1’s anti-inflammatory treatment
#6    #1’s physiotherapy
#7    #1’s stomach
#8    #7’s structure integrity
#9    #1’s stomach ulcer
#10   coming into existence of #9
#11   change brought about by #9
#12   noticing the presence of #9
#13   cognitive representation in #3 about #9

Properties
#1 member C1 since t2
#2 instance_of C3
#2 has_participant #1 since t2
#2 has_agent #3 since t2
#3 member C4 since t2
#4 member C5 since t1
#5 part_of #2
#5 member C2 since t3
#6 part_of #2
#7 member C6 since t2
#8 instance_of C8 since t0
#8 inheres_in #7 since t0
#9 part_of #7 since t3
#10 has_participant #9 at t3
#11 has_agent #9 since t3
#11 has_participant #8 since t3
#11 instance_of C10 at t3
#12 has_participant #9 at t3+x
#12 has_agent #3 at t3+x
#13 is_about #9 since t3+x
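The REMINE example separates the particulars themselves, each with its own IUI, from time-indexed assertions about them. A minimal sketch of such a store in Python follows, loading the Properties listed on the slide. The numeric values for t0 … t3+x are assumptions (only their order matters here), as are the readings of “since t” (holds from t onward) and “at t” (holds at t only); the class and function names are hypothetical and do not describe REMINE's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Assertion:
    subject: str                 # IUI of a particular, e.g. "#9"
    relation: str                # e.g. "part_of", "instance_of"
    obj: str                     # another IUI ("#7") or a class label ("C10")
    mode: Optional[str] = None   # "since", "at", or None when the slide gives no time
    time: Optional[float] = None

# Assumption: numeric stand-ins for the slide's time labels; only their order matters.
T = {"t0": 0.0, "t1": 1.0, "t2": 2.0, "t3": 3.0, "t3+x": 3.5}

RAW = """
#1 member C1 since t2
#2 instance_of C3
#2 has_participant #1 since t2
#2 has_agent #3 since t2
#3 member C4 since t2
#4 member C5 since t1
#5 part_of #2
#5 member C2 since t3
#6 part_of #2
#7 member C6 since t2
#8 instance_of C8 since t0
#8 inheres_in #7 since t0
#9 part_of #7 since t3
#10 has_participant #9 at t3
#11 has_agent #9 since t3
#11 has_participant #8 since t3
#11 instance_of C10 at t3
#12 has_participant #9 at t3+x
#12 has_agent #3 at t3+x
#13 is_about #9 since t3+x
"""

def parse(line):
    subject, relation, obj, *rest = line.split()
    if rest:
        mode, label = rest
        return Assertion(subject, relation, obj, mode, T[label])
    return Assertion(subject, relation, obj)

STORE = [parse(line) for line in RAW.splitlines() if line.strip()]

def holds_at(a, t):
    """Assumed semantics: untimed assertions always hold; 'since' from its time on; 'at' only then."""
    if a.mode is None:
        return True
    if a.mode == "since":
        return a.time <= t
    return a.time == t

def about(iui, t):
    """Assertions mentioning `iui` (as subject or object) that hold at time t."""
    return [a for a in STORE if iui in (a.subject, a.obj) and holds_at(a, t)]

print(about("#9", T["t2"]))  # [] : no assertion about #9 holds yet at t2
for a in about("#9", T["t3+x"]):
    print(a.subject, a.relation, a.obj)
# #9 part_of #7
# #11 has_agent #9
# #12 has_participant #9
# #13 is_about #9
```

Keeping the time index on each assertion, rather than on the particular, is what lets the same query answer differently before and after the ulcer (#9) comes into existence.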