Common Anatomy Reference Ontology Workshop
What an Ontology is For
Barry Smith
University at Buffalo
http://ontology.buffalo.edu/smith

we are accumulating huge amounts of data
how do we know what data we have?
how do I know what data you have?
how do we know what data we don’t have?
how do we make different sorts of data combinable?

where in the cell?
what kind of process?
what kind of biological end?
we need semantic annotation of data

how to create broad-coverage semantic annotation systems for biomedicine?
Semantic Web, Moby, wikis, UMLS, etc.
let a million flowers (weeds) bloom and create integration via post hoc mappings

an alternative, for science
develop high quality annotation resources in a collaborative, community effort
create an evolutionary path towards improvement on the basis of common prospective standards based on science

for science
science works out from a validated core, and strives to isolate and resolve inconsistencies as it extends outwards
we need to create a validated core including ontologies corresponding to the basic biomedical sciences
in this core: low hanging fruit

[Figure: fragment of the Foundational Model of Anatomy (FMA) hierarchy. Node labels: Anatomical Structure, Anatomical Space, Organ, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Organ Component, Organ Subdivision, Organ Part, Pleural Sac, Pleural Cavity, Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Pleura (Wall of Sac), Mesothelium of Pleura, Tissue]

but for science we need more
where do we find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy, development, histology in different model organisms?

what makes GO so wildly successful?
The methodology of annotations
science base: trained experts curating peer-reviewed literature
create an evolving set of standardized descriptions used to annotate the entities represented in the major biochemical databases
and thereby to integrate these databases

this leads to improvements and extensions of the ontology
which in turn leads to better annotations
which leads to further improvement in the quality and reach of both future annotations and the ontology itself
RESULT: a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form

Five bangs for your GO buck
cross-species database integration
cross-granularity database integration
through links to the things which are of biomedical relevance
semantic searchability links people to software
human curated science base creates de facto gold standard (benchmark for comparison)

but now need to create a de jure standard:
improve the quality of the GO
establish common rules governing best practices for creating ontologies and for using these in annotations
apply these rules to create a complete suite of orthogonal interoperable biomedical reference ontologies

First step (2003)
a shared portal for (so far) 58 ontologies (low regimentation)
http://obo.sourceforge.net

Second step (2004)
reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality

id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
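
The osteoblast stanza above is plain tag-value text, so the links it declares to other Cell Ontology terms can be read off mechanically. The following is a minimal sketch in Python, not a full OBO format parser: `parse_stanza` is a hypothetical helper introduced here for illustration, and it assumes one `tag: value` pair per line with no escaping or trailing modifiers.

```python
# Minimal sketch: read one OBO-style stanza into a dict of tag -> list of values.
# parse_stanza is a hypothetical helper, not part of any OBO library.

STANZA = '''
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''


def parse_stanza(text):
    """Collect 'tag: value' lines; a tag may repeat, so values are kept in lists."""
    term = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so identifiers such as CL:0000062 survive.
        tag, value = line.split(":", 1)
        term.setdefault(tag.strip(), []).append(value.strip())
    return term


term = parse_stanza(STANZA)

# Each relationship value has the form "<relation> <target id>".
develops_from = [
    value.split()[1]
    for value in term.get("relationship", [])
    if value.startswith("develops_from ")
]

print(term["id"][0], term["name"][0])
print("is_a:", term["is_a"])
print("develops_from:", develops_from)
```

Keeping every tag as a list is a deliberate choice in this sketch, because tags such as `relationship` legitimately occur more than once in a stanza.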
GO + Cell type = New Definition

Third step (2006)
The OBO Foundry
http://obofoundry.org/

The OBO Foundry
a family of interoperable gold standard biomedical reference ontologies to serve the annotation of inter alia
scientific literature
model organism databases
clinical trial data

A prospective standard
designed to guarantee interoperability of ontologies from the very start (contrast to: post hoc mapping)
12 initial candidate OBO ontologies – focused primarily on basic science domains
several being constructed ab initio
by influential consortia who have the authority to impose their use on large parts of the relevant communities.

GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology
CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology
[Slide labels beside the list: "undergoing rigorous reform", "new"]

GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology
CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology
[Slide labels beside the list: "to be absorbed in new Ontology of Biomedical Investigations (OBI)", "new"]

Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland ???
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck

GRANULARITY / RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

all OBO Foundry developers have agreed to a common set of evolving principles reflecting best practice in ontology development, designed to ensure
tight connection to the biomedical basic sciences
compatibility
interoperability, common relations
formal robustness
support for logic-based reasoning

PRINCIPLES
The ontology is OPEN and available to be used by all.
The ontology is in, or can be instantiated in, a COMMON FORMAL LANGUAGE.
The developers of the ontology agree in advance to COLLABORATE with developers of other OBO Foundry ontologies where domains overlap.

PRINCIPLES
UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.

for science
orthogonality of ontologies implies additivity of annotations
if we annotate a database or body of literature with one high-quality biomedical ontology, we should be able to add annotations from a second such ontology without conflicts
science aims for consistency because science aims for correctness

PRINCIPLES
IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
VERSIONING: The ontology provider has procedures for identifying distinct successive versions to ensure BACKWARDS COMPATIBILITY with annotation resources already in common use.
The ontology includes TEXTUAL DEFINITIONS and where possible equivalent formal definitions of its terms.

PRINCIPLES
CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
DOCUMENTATION: The ontology is well-documented.
USERS: The ontology has a plurality of independent users.

PRINCIPLES
COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*
* Smith et al., Genome Biology 2005, 6:R46

OBO Relation Ontology
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent

IT WILL GET HARDER
Further principles will be added over time in light of lessons learned
BUT NOT EVERYONE NEEDS TO JOIN
The Foundry is not seeking to serve as a check on flexibility or creativity

GOALS
CREDIT for high quality ontology development work
KUDOS for early adopters of high quality ontologies / terminologies, e.g. in reporting clinical trial results

GOALS
to introduce some of the features of SCIENTIFIC PEER REVIEW into biomedical ontology development
to provide a FRAMEWORK OF RULES to counteract the current policy of ad hoc creation
if data-schemas are formulated using a single ontology system in widespread use, this supports DATA REUSABILITY

A dichotomy
universals (types, kinds, classes)
vs.
instances (particulars, individuals)

Catalog vs.
inventory
[Figure: parts catalog illustration. Labels: A, B, C; 515287, 521683, 521682; DC3300 Dust Collector Fan, Gilmer Belt, Motor Drive Belt]

An ontology is a representation of universals

An ontology is a representation of universals
We learn about universals by looking at scientific texts – which describe what is general in reality

[Figure: hierarchy of universals with instances beneath. Labels: universals, substance, organism, animal, mammal, cat, siamese, frog, leaf class, instances]

rule of single inheritance
no diamonds:
[Figure: diamond formed by nodes A, B, C, with edges labeled is_a1 and is_a2]

problems with multiple inheritance
[Figure: nodes A, B, C, with edges labeled is_a1 and is_a2]
‘is_a’ no longer univocal

‘is_a’ is pressed into service to mean a variety of different things
shortfalls from single inheritance are often clues to incorrect entry of terms and relations
the resulting ambiguities make the rules for correct entry difficult to communicate to human curators

is_a overloading
serves as obstacle to integration with neighboring ontologies
The success of ontology alignment depends crucially on the degree to which basic ontological relations such as is_a and part_of can be relied on as having the same meanings in the different ontologies to be aligned.

What single inheritance costs
In some respects harder to build ontologies
harder to use ontologies to find terms
Solutions: normalization, GUIs
Recommendation: if building from scratch use single inheritance

What single inheritance brings
Coherent hierarchies
Modularity
Statistical representativeness
Jointly exhaustive pairwise disjoint classification
Coherent methodology for definitions

Aristotelian definitions
When A is_a B, the definition of ‘A’ has the form:
an A =def. a B which ...
a human being =def. an animal which is rational
Each definition reflects the position in the hierarchy to which a defined term belongs.

FMA Examples
Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus
Plasma membrane =def.
a cell part that surrounds the cytoplasm

Canonical ontologies

The FMA is a canonical representation of types and relations between types deduced from the qualitative observations of the normal human body, which have been refined and sanctioned by successive generations of anatomists and presented in textbooks and atlases of structural anatomy.

The GO is a canonical representation
“The Gene Ontology is a computational representation of the ways in which gene products normally function in the biological realm”
Nucl. Acids Res. 2006: 34.

The Gene Ontology is a canonical ontology – it represents only what is normal in the realm of molecular functioning

The core of the OBO Foundry consists of canonical ontologies
(pathoanatomy, pathophysiology will come later)

Three canonical ontologies
CARO
+ Ontology of Functions
+ Ontology of Developmental Processes (part of GO Biological Process ontology?)

A second fundamental dichotomy
• universals vs. instances
• continuants vs. occurrents

Continuants (aka endurants)
– have continuous existence in time
– preserve their identity through change
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive phases

You are a continuant
Your life is an occurrent
You are 3-dimensional
Your life is 4-dimensional

A third fundamental dichotomy
• types vs. instances
• continuants vs. occurrents
• dependent vs. independent

Dependent entities
require independent continuants as their bearers
There is no grin without a cat
There is no quality without a bearer
There is no disease without an organism

All occurrents are dependent entities
They are dependent on those independent continuants which are their participants (agents, patients, media ...)
There is no run without a runner

Dependent vs.
independent continuants
Independent continuants (organisms, cells, molecules, environments)
Dependent continuants (qualities, shapes, roles, propensities, functions)

Top-Level Ontology
Continuant
– Independent Continuant
– Dependent Continuant
Occurrent (always dependent on one or more independent continuants)

The GO Top-Level Ontology
Continuant
– Independent Continuant: cell component
– Dependent Continuant: molecular function
Occurrent: biological process

Functions vs Functionings
the function of your heart = to pump blood in your body
this function is realized in processes of pumping blood
not all functions are realized (consider the function of this sperm ...)
not all processes are functionings

Continuant
– Independent Continuant
– Dependent Continuant (Function)
Occurrent
– Functioning
– Stochastic process
– Incidental by-product

The OBO Relation Ontology

Part_of as a relation between universals
heart part_of human being ?
human heart part_of human being ?
human testis part_of human being ?
human being has_part human testis ?

two kinds of parthood
1. between instances:
Mary’s heart part_of Mary
this nucleus part_of this cell
2. between universals:
human heart part_of human
cell nucleus part_of cell

Definition of part_of as a relation between universals
A part_of B =Def. all instances of A are instance-level parts of some instance of B
human testis part_of adult human being
but not
adult human being has_part human testis

Continuants
– have continuous existence in time
– preserve their identity through change
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive phases

part_of (for processes)
A part_of B =def.
For all x, if x instance_of A then there is some y, y instance_of B and x part_of y
where ‘part_of’ is the instance-level part relation
EVERY A IS PART OF SOME B

part_of (for continuants)
A part_of B =def.
For all x, t if x instance_of A at t then there is some y, y instance_of B at t and x part_of y at t
where ‘part_of’ is the instance-level part relation
ALL-SOME STRUCTURE

part_of (for continuants)
A part_of B =def. if an A exists at t then it is part_of some B at t
where ‘part_of’ is the instance-level part relation

has_part (for continuants)
A has_part B =def. if an A exists at t then there is some B which is part_of it at t

human testis part_of adult human being
but not
adult human being has_part human testis

is_a (for processes)
A is_a B =def.
For all x, if x instance_of A then x instance_of B
cell division is_a biological process

is_a (for continuants)
A is_a B =def.
For all x, t if x instance_of A at t then x instance_of B at t
abnormal cell is_a cell
adult human is_a human
but not: adult is_a child

A part_of B, B part_of C ...
The all-some structure of the definitions in the OBO-RO allows cascading of inferences
(i) within ontologies
(ii) between ontologies
(iii) between ontologies and EHR repositories of instance-data

OBO Relation Ontology
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent

David Sutherland
For any structure x, I should be able to answer the questions:
1. What is x (what type of thing is it)?
2. Where is x (what is it part of)?
3. What subtypes of x are there?
4. What parts does x have?

For any structure x, I should be able to answer the questions:
1. What type of thing is x? Say: A
2. What types of things are As part of?
3. What types of things are As located in?
4. What subtypes of A’s are there?
5. What parts do A’s have?
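
The all-some definitions of part_of and has_part for continuants given earlier can be checked mechanically against time-indexed instance data. The following Python sketch is an illustration only: the individuals (`john`, `mary`, `testis_1`), the tuple encoding of facts, and the function names are assumptions made here, not part of the OBO Relation Ontology.

```python
from typing import Set, Tuple

# Hypothetical encoding: (individual, type, time) and (part, whole, time).
Fact = Tuple[str, str, int]


def universal_part_of(a: str, b: str, instance_of: Set[Fact], part_of: Set[Fact]) -> bool:
    """A part_of B: every instance of A at t is part_of some instance of B at t."""
    for x, x_type, t in instance_of:
        if x_type != a:
            continue
        wholes = {y for y, y_type, t_y in instance_of if y_type == b and t_y == t}
        if not any((x, y, t) in part_of for y in wholes):
            return False
    # Note: vacuously True when A has no instances in the data.
    return True


def universal_has_part(a: str, b: str, instance_of: Set[Fact], part_of: Set[Fact]) -> bool:
    """A has_part B: every instance of A at t has some instance of B as a part at t."""
    for x, x_type, t in instance_of:
        if x_type != a:
            continue
        parts = {y for y, y_type, t_y in instance_of if y_type == b and t_y == t}
        if not any((y, x, t) in part_of for y in parts):
            return False
    return True


# Illustrative instance data at a single time t = 1.
instance_of = {
    ("john", "adult human being", 1),
    ("mary", "adult human being", 1),
    ("testis_1", "human testis", 1),
}
part_of = {("testis_1", "john", 1)}

testis_part_of_adult = universal_part_of(
    "human testis", "adult human being", instance_of, part_of
)
adult_has_part_testis = universal_has_part(
    "adult human being", "human testis", instance_of, part_of
)
print(testis_part_of_adult, adult_has_part_testis)
```

On this toy data the first check succeeds and the second fails, because one adult human being in the data has no testis as a part. This mirrors the slides' point that part_of and has_part at the level of universals are not simply inverses of each other.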
For continuants: located_in = either part_of or contained_in

David
The first 2 questions are important for navigating the ontology
The second 2 questions are crucial to grouping curations
If we are looking for phenotypes that affect hands, we need to be able to deduce that a hand has fingers and so add finger phenotypes to our hand phenotype list. I think that having 'has_part' relationships in the ontology is key to achieving this.

[Figure: fragment of the Foundational Model of Anatomy (FMA) hierarchy. Node labels: Anatomical Structure, Anatomical Space, Organ, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Organ Component, Organ Subdivision, Organ Part, Pleural Sac, Pleural Cavity, Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Pleura (Wall of Sac), Mesothelium of Pleura, Tissue]

human uterus part_of human being
but not
human body has_part human uterus

Temporal relations

transformation_of
[Figure: the same instance over time. Labels: C, c at t; C1, c at t1; time]

transformation_of
A transformation_of B =Def. Every instance of A was at some earlier time an instance of B
adult transformation_of child
heart transformation_of heart-precursor

embryological development
[Figure labels: C, c at t; C1, c at t1]

derives_from
[Figure labels: C, c at t; C', c' at t; C1, c1 at t1; time; instances]
zygote derives_from ovum
zygote derives_from sperm

fusion
two continuants fuse to form a new continuant
[Figure labels: C, c at t; C', c' at t; C1, c1 at t1]

fission
one initial continuant is replaced by two successor continuants
[Figure labels: C, c at t; C1, c1 at t1; C2, c1 at t1]

is a relation combining transformation with fusion and fission (extended from the binary cases) what we are seeking in order to capture development via CARO?
should this relation be called ‘derives_from’ or ‘develops_from’?

budding
one continuant detaches itself from an initial continuant, which itself continues to exist
[Figure labels: C, c at t, c at t1; C1, c1 at t]

capture
one continuant absorbs a second continuant while itself continuing to exist
[Figure labels: C, c at t, c at t1; C', c' at t]

Principle of low hanging fruit
often one of two reciprocal relations (e.g. part_of and has_part) will hold universally
human testis part_of human body
but not
human body has_part human testis

Principle of low hanging fruit
nucleus adjacent_to cytoplasm
but not
cytoplasm adjacent_to nucleus

Principle of low hanging fruit
seminal vesicle adjacent_to urinary bladder
but not:
urinary bladder adjacent_to seminal vesicle

Top-Level Categories in the FMA
anatomical entity
  physical anatomical entity
    material physical anatomical entity
      anatomical structure
      body substance
    non-material physical anatomical entity
      body space
      boundary
  non-physical anatomical entity
    anatomical attribute
    anatomical relationship

Fiat vs. bona fide boundaries

Layers of the body’s surface
[Figure source: kidshealth.org/kid/body/skin_noSW.html]

Top-Level Categories in the FMA
anatomical entity
  physical anatomical entity
    material physical anatomical entity
      anatomical structure
      body substance
    non-material physical anatomical entity
      body space
      boundary
  non-physical anatomical entity
    anatomical attribute
    anatomical relationship

[Figure source: www.enel.ucalgary.ca/People/Mintchev/stomach.htm]

anatomical entity
  physical anatomical entity
    material physical anatomical entity
      anatomical structure
      body substance
    non-material physical anatomical entity
      body space
      boundary
        fiat boundary
        bona fide boundary
  non-physical anatomical entity
    anatomical attribute
    anatomical relationship

fiat vs. bona fide boundaries
[Figure labels: fiat boundary in anatomical space; physical boundary]

[Figure source: www.enel.ucalgary.ca/People/Mintchev/stomach.htm]

varieties of fiat boundaries
in anatomical structures
in body spaces
spatial vs.
temporal (stages, pathways)
in instances
in the realm of universals

varieties of fiat boundaries in anatomical structures

modes of connection
– attached_to (muscle to bone)
– synapsed_with (nerve to nerve, nerve to muscle)
– continuous_with (= share a fiat boundary)

a continuous_with b = a and b are continuant instances which share a fiat boundary
This relation on the instance level is always symmetric:
if x continuous_with y, then y continuous_with x

continuous_with (relation between universals)
A continuous_with B =Def. for all x, if x instance_of A then there is some y such that y instance_of B and x continuous_with y

continuous_with as a relation between universals is not symmetric
Consider lymph node and lymphatic vessel:
– Each lymph node is continuous with some lymphatic vessel, but there are lymphatic vessels (e.g. lymphs and lymphatic trunks) which are not continuous with any lymph nodes

wherever we have fiat boundaries there is a certain indeterminacy in the location of the boundary
where does the arm begin?
where does the head begin?
where does abnormal curvature of the spine begin?

do regions have this indeterminacy?

An ontology is a representation of types
Each term in an ontology should be a singular common noun
Cell, lung, ... refer to instances in reality by referring to the types which they instantiate

Problems with mass nouns
‘blood’
‘menstrual fluid’

Problems with ‘tissue’
a specific portion of cells (instance)
a specific portion of cells (type)
a specific portion of cells of a certain type (instance)
a specific portion of cells of a certain type (type)
an arbitrary portion of cells x 4 as above
all of the above IN the body
all of the above in the form of samples OUTSIDE the body
a type of tissue, e.g.
mesothelial tissue

Brenda Tissue Ontology
contains statements like: arm is-a limb (here everything is a tissue)
Auckland Anatomy Ontology
Classifies tissue into: Connective tissue, Epithelial tissue, Glandular tissue, Muscle tissue, Nervous tissue; proceeding further down the hierarchy we find not tissues but SimpleTubularGland, SimpleAcinarGland, etc.
EndocrineGland is asserted to have two ‘instances’: EndocrineGland (!) and FollicularEndocrineGland.
ConnectiveTissue has ‘instances’: Left Humerus, Right Tibia, ...

Recommendation
avoid ‘tissue’ and all count nouns
hypothesis: in every case where one would want to use ‘portion of tissue’ in a scientific anatomy we mean: maximally connected portion of tissue, and there is already a common noun for a corresponding type (?)

CM
application (current and future) of Foundry principles in GO
stages
application aspects of multiple inheritance: pre- and post-coordination