Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
AMIA 05; CG Chute Biomedical Informatics Biomedical Informatics The Continuum Of Biomedical Informatics Bioinformatics meets Medical Informatics Reference Ontologies in Biomedicine Biology Medicine 10 9 Christopher G. Chute, MD DrPH 8 7 6 Professor and Chair, Biomedical Informatics Mayo Clinic College of Medicine Rochester, Minnesota, USA 5 4 3 2 1 G en om Pr et ot ics ab eo ol M m ic ol Pa ics e th M cul w ol a ec r M ays ul od ar el Si i m ng C ul el at lu i M la ol r M on ec od ul e a G en r A ls om ss a ys ic T B io est sp in g ec im e La ns b D is ea Tri Dat al a se s & D M Syn at a e dr Pa dic om tie al Im e nt a Re gi ng c E H or d R D St at a ru ct ur es 0 M AMIA, Washington DC, 2005 © Mayo Clinic College of Medicine 2005 Biomedical Informatics 2 Biomedical Informatics Big Science Collaboration and Semantics • Communication of findings and results Making Shared Context Explicit • Human publication • Sharing of data resources as building blocks • Foundation for incremental, big-science • Harnessing computing requires formalization • Data format and structures • Discrete terms, vocabulary, ontology • Non-ambiguous concepts, non-overlapping terms • Information models and problem architectures • Standards, conventions, and shared context © Mayo Clinic College of Medicine 2005 CONCEPT CONCEPT Symbolises Refers To Refers To Symbolises “I see a ClipArt image of a rose” Stands For Referent “Rose”, “ClipArt” Symbol “Rose”, “ClipArt” Symbol Context Stands For Context Syntax Terminologies 3 Biomedical Informatics Formal Shared Context © Mayo Clinic College of Medicine 2005 Terminologies [From Solbrig]4 Biomedical Informatics If science is communication what is its language? • Most ontologies and vocabularies are created to meet a specific application or use-case • Despair their re-use in alternative contexts • Virtually all terminologies invoke concepts “out of domain” Science Interlingua Interlocking Reference Terminologies Reference Terminology • A coherent organization of concepts about a well-characterized domain, e.g. • LOINC – drugs • SNOMED, MeSH – anatomy, drugs • Identifying common “atoms” an elusive goal • Fraught with composition • Micro-information models (sentences and ¶’s) © Mayo Clinic College of Medicine 2005 © Mayo Clinic College of Medicine 2005 5 • Units of Measure • Pharmaceuticals • Anatomy • Cellular processes • Reference concepts that underpin scientific expression • Abstract concept space that serves no application need (with or without the real world) © Mayo Clinic College of Medicine 2005 6 1 AMIA 05; CG Chute Biomedical Informatics Biomedical Informatics The VA, FDA, NLM Drug Ontology Projects Science Interlingua Babble vs Babel (and other dyslexiae) • Will an Esperanto of Science fail out of the box? [modified from Steve Brown] Unique Ingredient Identifier: UNII Chemical Structure VHA Drug Class Labeled Properties Drug Component Clinical Drug Of Action, Mechanism Pharmacokinetics Finished Dose Form Effect, Physiologic Intent, Therapeutic • Is it language for humans? • It is language for computers? • Is “the” science ontology [or interlocking plurality] • a period table? • a body of weights and measures? • a dictionary? • a thesaurus? • a language? • Whither cognition? • Inference is not understanding Active Ingredient Dosage Form Drug Product Indications Packaged Drug (NDC) Product Label © Mayo Clinic College of Medicine 2005 7 Biomedical Informatics © Mayo Clinic College of Medicine 2005 Familiar Points Along Continuum Modern Health Vocabularies • Nomenclature – Highly Detailed Descriptions (SNOMED) • Classification – Organized Aggregation of Descriptions into a Rubric (ICDs) • Groupings – High Level Categories of Rubrics (DRGs) Nomenclature Detailed 9 Biomedical Informatics Machine Abstracted, Algorithmically Derived Classifications Clinical Classifications Surveys, Scales, and Datasets Clinical Observations and Patient Record Recording Molecular, Genomic, Cellular Increasing Detail and Richness © Mayo Clinic College of Medicine 2005 © Mayo Clinic College of Medicine 2005 10 Blois, 1988 Medicine and the nature of vertical reasoning • Molecular: receptors, enzymes, vitamins, drugs • Genes, SNPs, gene regulation • Physiologic pathways, regulatory changes • Cellular metabolism, interaction, meiosis,… • Tissue function, integrity • Organ function, pathology • Organism (Human), disease • Sociology, environment, nutrition, mental health… High Level Groups, e.g. DRGs © Mayo Clinic College of Medicine 2005 Classification Groups Grouped Biomedical Informatics Abstraction Layers Narrow Boundary of Ideal, Human Recording Increasing Abstraction 8 Biomedical Informatics What are factors that erode shared context in biomedicine • Concept granularity (specificity) • Vertical scope (molecules to society) • Divergent concept content (codes) • Divergent information models • Terminology – Information model boundary • Human use of “machine” concepts • Human orneriness Humanly Recorded © Mayo Clinic College of Medicine 2005 11 © Mayo Clinic College of Medicine 2005 12 2 AMIA 05; CG Chute Biomedical Informatics Biomedical Informatics Content vs. Structure Contest or Synergy? Computer Equivalent? Domain-specific expansion of “MS” Semantics by domain context Cardiology mitral stenosis Neurology multiple sclerosis Anesthesia morphine sulfate Obstetrics magnesium sulfate Research science manuscript Physics millisecond Education Master of Science U.S. Postal Service Mississippi Computer science Microsoft Correspondence female name prefix © Mayo Clinic College of Medicine 2005 Family History of Breast Cancer Family History of Heart Disease Family History of Stroke Family History Breast Cancer Heart Disease Stroke Terminologic Model Information Model Equivalent Content [adapted from Rossi-Mori] 13 Biomedical Informatics © Mayo Clinic College of Medicine 2005 14 Biomedical Informatics Information Model (HL7) Terminology Model (SNOMED) HL7 RIM targetSiteCode(Observation) targetSiteCode(Procedure) methodCode (Observation & Procedure) approachSiteCode(Procedure) priorityCode(Act) Context mis-match Challenges for Reference concepts • Cultural aggregation and splitting SNOMED CT Attribute “finding site” • 7 words for “rice” in Thai • 17 works for “snow” in Inuktitut )Must the reference enumerate the superset? • Composition vs Terms • Pre-coordinated terms – “Colon Cancer” • Neologisms for “sentence concepts” )Should the reference include conceptual atoms “procedure site” “method” “approach,” “access” and molecules? “priority” • Composition is in the eye of the beholder [adapted from Markwell] © Mayo Clinic College of Medicine 2005 15 Biomedical Informatics © Mayo Clinic College of Medicine 2005 16 Biomedical Informatics Reference Truth: Variations in Identity On orthologs, paralogs, and SNPs • Identity in context of resolving to same concept in a reference terminology • Enzymes that share function: Sulfotransferase • Orthologs across primates • Paralogs (including pseudogenes) in humans • Polymorphisms between individuals © Mayo Clinic College of Medicine 2005 © Mayo Clinic College of Medicine 2005 SULT6B1 Sequence Differences Between Primates Location in Human SULT6B1 17 Nucleotide Sequence Nucleotide Amino Acid Human Gorilla Chimpanzee A148G G274A A314C A321G G336C C390G T400C G429C GCT(538-540)CCC C609A C636T A651C A697G LYS50GLU GLU92LYS LYS105THR THR107THR LEU112PHE PHE130LEU PHE134LEU ARG143SER ALA180PRO ALA203ALA HIS212HIS PRO217PRO SER233GLY A G A A G C T G GCT C C A A A A C G C G T C CCC A C C A G A A G C C C C GCT A T A G © Mayo Clinic College of Medicine 2005 18 3 AMIA 05; CG Chute Biomedical Informatics Biomedical Informatics Human SULT Genes SULT1A2 ~5.1 kb SULT1A3 ~7.3 kb 270 105 SULT1A1 ~4.4 kb 347 74 152 126 98 127 95 181 152 126 98 127 95 181 152 126 98 127 95 181 SULT Data Mining Pseudogenes Exon length ( bp) 337 SULT1C3 196 2 >142 113 88 113 3 129 4 98 6 95 5 127 7a 181 7b 181 8a >113 8b >113 499 ~18 kb 4998 SULT1B1 >28 kb 432 SULT1C1 ~20.1 kb >164 129 172 126 95 98 127 95 181 98 127 95 181 851 2129 3035 137 1421 4144 233 >113 SULT6B 361 1 2 106 3 113 4 90 4b 84 5 127 6 95 7 157 8 246 ~29 kb 542 SULT1C2 ~10.1 kb 97 SULT1E1 ~18 kb 154 215 SULT2A1 ~16 kb SULT2B1 ~48 kb 152 SULT4A1 ~38 kb 544 126 98 127 95 181 126 98 127 95 181 127 95 178 127 95 181 127 95 139 209 209 248 131 81 974 730 3840 407 3349 4231 3541 3368 167 955 291 1605 © Mayo Clinic College of Medicine 2005 19 Biomedical Informatics © Mayo Clinic College of Medicine 2005 20 Biomedical Informatics Discriminate Differences of Kind Reference Terminology • At what level of granularity? • What is a Sulfotransferase? • What level of detail should a reference terminology of enzymes convey? • Is the logical limit of all reference terminologies a basis on types of quark? © Mayo Clinic College of Medicine 2005 21 © Mayo Clinic College of Medicine 2005 22 Biomedical Informatics LexGrid as a Terminology Interchange • Proliferation of ontologies and vocabularies • Varieties of formats and terminology models • Various versions over time • Hard to find appropriate resource • Establish “web of terminology” to link content • Extension of Semantic Web concept • Common tools, formats, and interfaces LexGrig.org & HL7 CTS integrated into cBIO © Mayo Clinic College of Medicine 2005 © Mayo Clinic College of Medicine 2005 23 4