A Guide to Ontology for the Pathologist of the Future
Barry Smith
http://ontology.buffalo.edu/smith
Old biology data
New biology data
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSF
YEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFV
EDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLF
YLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIV
RSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDT
ERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNF
GAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRL
RKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVA
QETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTD
YNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFN
HDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYAT
FRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYES
ATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQ
WLGLESDYHCSFSSTRNAEDVDISRIVLYSYMFLNTAKGCLVEYA
TFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYE
SATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWI
How to do biology across the genome?
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVIS
VMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLER
CHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERL
KRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVC
KLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGIS
LLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWM
DVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSR
FETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVM
KVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISV
MVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERC
HEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLK
RDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCK
LRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLL
AFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMD
VVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRF
ETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVMK
VSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVM
VGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCH
EIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKR
DLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKL
RSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLL
AFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMD
VVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRF
ETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVMK
VSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVM
VGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCH
EIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKR
How to link the kinds of phenomena represented here
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRK
RSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSL
FYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLL
HVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNF
GAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLD
IFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDY
NKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDIS
RIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESA
TSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVV
AGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQA
PPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDL
YVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEK
AIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKI
RKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKE
FVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKG
ELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVAL
PSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTN
ASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNA
TTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNT
NATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDG
NAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYF
CPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDP
VGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNL
RESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRH
HRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHW
LDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGELIGLFYNKTFRQKLE
YLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVG
ELIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDG
to this?
or this?
Answer
Create an ontology: a controlled, logically structured consensus classification of the types of entities in a domain.
All scientists in the domain use the same ontology aggressively to annotate their data.
Annotation using common ontologies allows navigation between databases
[Diagram: the databases GlyProt, MouseEcotope, DiabetInGene and GluChem, shown together with the term "sphingolipid transporter activity"]
This allows discovery and integration of databases
[Diagram: the databases GlyProt, MouseEcotope, DiabetInGene and GluChem, shown together with the term "Holliday junction helicase complex"]
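The idea on the two slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the database names and the two term labels come from the slides, but every record identifier and every record-to-term assignment below is hypothetical, as are the helper names `index_by_term` and `databases_sharing`. The point is simply that when separate databases annotate their records with the same ontology term, that term becomes a key for moving between them.

```python
from collections import defaultdict

# Hypothetical annotation records: (database, record id, ontology term).
# Database names and term labels are taken from the slides; the record
# ids and the assignments of terms to records are invented.
ANNOTATIONS = [
    ("GlyProt", "gp-001", "sphingolipid transporter activity"),
    ("MouseEcotope", "me-417", "sphingolipid transporter activity"),
    ("DiabetInGene", "dg-023", "sphingolipid transporter activity"),
    ("GluChem", "gc-105", "Holliday junction helicase complex"),
    ("GlyProt", "gp-002", "Holliday junction helicase complex"),
]


def index_by_term(annotations):
    """Group (database, record id) pairs under the term they are annotated with."""
    index = defaultdict(list)
    for database, record_id, term in annotations:
        index[term].append((database, record_id))
    return dict(index)


def databases_sharing(index, term):
    """Return the sorted names of all databases holding a record annotated with term."""
    return sorted({database for database, _ in index.get(term, [])})


INDEX = index_by_term(ANNOTATIONS)

if __name__ == "__main__":
    for term in INDEX:
        print(term, "->", databases_sharing(INDEX, term))
```

Without a shared term, each database would need its own pairwise mapping to every other one; with a common ontology, a single index over term labels is enough.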
[Chart: number of abstracts mentioning "ontology" or "ontologies" in PubMed/MEDLINE, by year, 1997 to 2012]
By far the most successful: GO (Gene Ontology)
Gene Ontology
$100 million invested in literature and database curation using the Gene Ontology (GO), based on the idea of annotation
over 11 million annotations relating gene products (proteins) described in UniProt, Ensembl and other databases to terms in the GO
multiple secondary uses – because the ontology was not built to meet one specific set of requirements
GO provides a controlled system of terms for use in annotating (describing, tagging) sequence data
• multi-species, multi-disciplinary, open source
• contributes to the cumulativity of scientific results obtained by distinct research communities
• formal definitions of all terms to support computational reasoning
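As a rough illustration of what "computational reasoning" over a controlled system of terms can mean, the sketch below propagates annotations up an is_a hierarchy: a gene product annotated with a specific term is also found when querying for a more general one. The small hierarchy, the gene names, and the helper functions are all assumptions made for this example; it is not GO's actual data or tooling.

```python
# Illustrative is_a hierarchy (child term -> list of parent terms).
# This is a hand-made fragment for the example, not an export from GO.
IS_A = {
    "sphingolipid transporter activity": ["lipid transporter activity"],
    "lipid transporter activity": ["transporter activity"],
    "transporter activity": ["molecular function"],
    "molecular function": [],
}

# Hypothetical annotations: (gene product, term it is annotated with).
GENE_ANNOTATIONS = [
    ("geneA", "sphingolipid transporter activity"),
    ("geneB", "lipid transporter activity"),
    ("geneC", "molecular function"),
]


def ancestors(term, is_a):
    """Return every term reachable from term by following is_a links."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def genes_annotated_to(annotations, term, is_a):
    """Genes annotated with term directly or with any of its descendants."""
    return sorted(
        gene
        for gene, annotated_term in annotations
        if annotated_term == term or term in ancestors(annotated_term, is_a)
    )


if __name__ == "__main__":
    print(genes_annotated_to(GENE_ANNOTATIONS, "lipid transporter activity", IS_A))
```

A query for the general term returns the gene annotated with the more specific one as well, which is only safe because the terms and their relations are defined formally rather than as free text.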
RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
GRANULARITY (rows): ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE

INDEPENDENT CONTINUANT
• Organ and organism: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• Cell and cellular component: Cell (CL); Cellular Component (FMA, GO)
• Molecule: Molecule (ChEBI, SO, RnaO, PrO)

DEPENDENT CONTINUANT
• Organ Function (FMP, CPRO)
• Cellular Function (GO)
• Molecular Function (GO)
• Phenotypic Quality (PaTO)

OCCURRENT
• Biological Process (GO)
• Molecular Process (GO)
The Gene Ontology within the OBO Foundry
Built on the GO
Cell Ontology
ChEBI Chemistry Ontology
Human Disease Ontology
Human Phenotype Ontology
Infectious Disease Ontology
Protein Ontology
Ontology for General Medical Science
Tissue Ontology
The goal of ontology
Demolishing data silos
SNOMED CT
Systematized Nomenclature of Medicine – Clinical Terms
built by the College of American Pathologists
http://bioportal.bioontology.org/
The Ontology for Biomedical Investigations
QIBO
QIO
Quantitative Image Ontology
Controlled structured vocabulary for describing
• Slide preparation protocols
• Image analysis protocols
• …
to work with data annotated using pathology ontologies such as SNOMED CT for describing biological and clinical data
Ontology and Imaging Informatics
Buffalo, June 23-25, 2014
• Tutorial: June 23
• Workshop: June 24
• QIO Hackathon: June 25
http://goo.gl/GTViNW