A Guide to Ontology for the Pathologist of the Future
Barry Smith
http://ontology.buffalo.edu/smith

Old biology data
[image]

New biology data
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSF ... [slide filled with amino-acid sequence]

How to do biology across the genome?
[slide filled with amino-acid sequence]

how to link the kinds of phenomena represented here
[slide filled with amino-acid sequence]

to this? or this?
[images]

Answer
• Create an ontology: a controlled, logically structured consensus classification of the types of entities in a domain
• All scientists in the domain use the same ontology aggressively to annotate their data

annotation using common ontologies allows navigation between databases
[Diagram: databases GlyProt, MouseEcotope, DiabetInGene, GluChem linked through the term "sphingolipid transporter activity"]

this allows discovery and integration of databases
[Diagram: databases GlyProt, MouseEcotope, DiabetInGene, GluChem linked through the term "Holliday junction helicase complex"]

[Chart: Number of abstracts mentioning "ontology" or "ontologies" in PubMed/MEDLINE, by year, 1997 to 2012]
By far the most successful: GO (Gene Ontology)

Gene Ontology
• $100 million invested in literature and database curation using the Gene Ontology (GO), based on the idea of annotation
• over 11 million annotations relating gene products (proteins) described in the UniProt, Ensembl and other databases to terms in the GO
• multiple secondary uses – because the ontology was not built to meet one specific set of requirements

GO provides a controlled system of terms for use in annotating (describing, tagging) sequence data
• multi-species, multi-disciplinary, open source
• contributes to the cumulativity of scientific results obtained by distinct research communities
• formal definitions of all terms to support computational reasoning

The Gene Ontology within the OBO Foundry
[Table. Columns (relation to time): continuant (independent, dependent); occurrent. Rows (granularity): organ and organism; cell and cellular component; molecule.]
• Organ and organism: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO)
• Cell and cellular component: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
• Molecule: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)
• Spanning more than one row: Phenotypic Quality (PaTO); Biological Process (GO)

Built on the GO
• Cell Ontology
• ChEBI Chemistry Ontology
• Human Disease Ontology
• Human Phenotype Ontology
• Infectious Disease Ontology
• Protein Ontology
• Ontology for General Medical Science
• Tissue Ontology

The goal of ontology
Demolishing data silos

SNOMED CT
Systematized Nomenclature of Medicine – Clinical Terms
built by the College of American Pathologists

http://bioportal.bioontology.org/

The Ontology for Biomedical Investigations

QIBO

QIO
Quantitative Image Ontology
Controlled structured vocabulary for describing
• Slide preparation protocols
• Image analysis protocols
• …
to work with data annotated using pathology ontologies such as SNOMED CT for describing biological and clinical data

Ontology and Imaging Informatics
Buffalo, June 23-25, 2014
• Tutorial: June 23
• Workshop: June 24
• QIO Hackathon: June 25
http://goo.gl/GTViNW
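
A small illustration of the earlier point that annotation with a common ontology allows navigation between databases. This is only a sketch under stated assumptions: the database names and the two term labels come from the slides, but the record identifiers, the annotation list and both function names are hypothetical and do not reflect any real database or the GO tooling.

```python
from collections import defaultdict

# Hypothetical annotations: (database, record id, ontology term label).
# Database names and term labels are taken from the slides; the record
# ids and the pairing of records with terms are invented for illustration.
ANNOTATIONS = [
    ("GlyProt", "GP-001", "sphingolipid transporter activity"),
    ("MouseEcotope", "ME-017", "sphingolipid transporter activity"),
    ("DiabetInGene", "DG-204", "sphingolipid transporter activity"),
    ("GluChem", "GC-090", "Holliday junction helicase complex"),
    ("GlyProt", "GP-002", "Holliday junction helicase complex"),
]


def index_by_term(annotations):
    """Group (database, record id) pairs under the term that annotates them."""
    index = defaultdict(list)
    for database, record_id, term in annotations:
        index[term].append((database, record_id))
    return dict(index)


def linked_records(index, database, record_id):
    """Return records in other databases that share a term with the given record."""
    linked = set()
    for records in index.values():
        if (database, record_id) in records:
            # Keep only records held in a different database.
            linked.update(r for r in records if r[0] != database)
    return sorted(linked)


INDEX = index_by_term(ANNOTATIONS)
print(linked_records(INDEX, "GlyProt", "GP-001"))
```

Because every database uses the same term label, a shared term acts as a join key: no pairwise mapping between the databases' own vocabularies is needed.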