NCBO, the OBO-Foundry, and you
GO User’s meeting, September 10th, 2006
Suzanna Lewis
GO Consortium & National Center for Biomedical Ontology
http://www.geneontology.org/
http://www.bioontology.org/

There is no requirement that ontology be done using any particular technology.

Three fundamental dichotomies
1. types vs. instances
2. continuants vs. occurrents
3. dependent vs. independent

For example, in the GO’s 3 ontologies
• independent continuant: cellular component
• dependent continuant: molecular function
• occurrent: biological process

Molecules, cell components, organisms are independent continuants which have functions (these are dependent continuants), and these functions may be realized as an

A Portion of the OBO Library
(figure)

Due Diligence is the 1st step!
• We keep reinventing the wheel
• We don’t even know what’s out there!
• We need tools to help us compare and contrast ontologies
• We need tools to keep track of ontology history and to compare versions
• We need infrastructure for connecting ontologies

Open Biomedical Ontologies: OBO Mark 1
• Initially side-project of the Gene Ontology
• http://obo.sourceforge.net
• ontology management and versioning
• website
• mailing lists
• limitations due to lack of resources
  o lacking ontology development support
  o little in the way of integration: neither ‘nuts-n-bolts’ nor semantic integration

The National Center for Biomedical Ontology
What is NCBO?
NCBO’s 7 Cores
• Core 1: Computer science
• Core 2: Bioinformatics
• Core 3: Driving biological projects
• Core 4: Infrastructure
• Core 5: Education and Training
• Core 6: Dissemination
• Core 7: Administration

Who NCBO is
• Stanford: Tools for ontology alignment, indexing, and management (Cores 1, 4–7: Mark Musen)
• Lawrence–Berkeley Labs: Tools to use ontologies for data annotation (Cores 2, 5–7: Suzanna Lewis)
• Mayo Clinic: Tools for access to large controlled terminologies (Core 1: Chris Chute)
• Victoria: Tools for ontology and data visualization (Cores 1 and 2: Margaret-Anne Storey)
• University at Buffalo: Dissemination of best practices for ontology engineering (Core 6: Barry Smith)

cBio Driving Biological Projects
• Trial Bank: UCSF, Ida Sim
• Flybase: Cambridge, Michael Ashburner
• ZFIN: Oregon, Monte Westerfield
Animal disease models
• Humans: Mutant Gene, Mutant or missing Protein, Mutant Phenotype (disease)
• Animal models: Mutant Gene, Mutant or missing Protein, Mutant Phenotype (disease model)
(figure panels: SHH-/+, SHH-/-, shh-/+, shh-/-)

Phenotype (clinical sign) = entity + quality
• P1 = eye + hypoteloric
• P2 = midface + hypoplastic
• P3 = kidney + hypertrophied
• ZFIN (entity): eye, midface, kidney
• PATO (quality): hypoteloric, hypoplastic, hypertrophied

Phenotype (clinical sign) = entity + quality
• entity: Anatomical ontology, Cell & tissue ontology, Developmental ontology, Gene ontology (biological process, cellular component)
• quality: PATO (phenotype and trait ontology)

Syndrome = P1 + P2 + P3
(disease) = holoprosencephaly
(figure panels: Human holoprosencephaly, Zebrafish shh, Zebrafish oep)

What is Phenote?
A tool for annotating Phenotypes
1. Curator reads about a phenotype in the literature related to taxonomy or genotype
2. Curator enters genotype (or taxonomy)
3. Curator enters genetic context (optional)
4. Curator searches/enters Entity (e.g. Anatomy)
5. Curator searches/enters PATO attribute/value

A Portion of the OBO Library
(figure)

OBO Mark II: Infrastructure
• Integrated access to all OBO ontologies
• Programmatic and user access
  o web interface
  o interface via tools (OBO-Edit, Protégé)
  o application programmer interfaces (APIs)
  o web services
• Advanced search facilities
  o lexgrid
• Visualization
• Ontology metadata

Return to GO (do not collect $200)

Specific Aims of the GO 2006
• We will maintain comprehensive, logically rigorous and biologically accurate ontologies.
• We will comprehensively annotate 9 reference genomes in as complete detail as possible.
• We will support annotation across all organisms.
• We will provide our annotations and tools to the research community.

Weaving and untangling the GO
• Missing relations
  o is_a completeness
• Adding new relations within single GO ontology
  o Adding “regulates” to BP
  o Distinguishing different part_of relations
• Adding Relations between GO axis
  o Linking between MF & BP & CC
• Adding relations between GO & other ontologies
  o GO+Cell
  o GO+anatomy

Implicit ontologies within the GO:
• cysteine biosynthesis (ChEBI)
• myoblast fusion (Cell Type Ontology)
• hydrogen ion transporter activity (ChEBI)
• snoRNA catabolism (Sequence Ontology)
• wing disc pattern formation (Drosophila anatomy)
• epidermal cell differentiation (Cell Type Ontology)
• regulation of flower development (Plant anatomy)
• interleukin-18 receptor complex (not yet in

Obol produces genus-differentia logical definition
(workflow figure; labels: OBO editor, go.obo, cell.obo, name parser, Ego.obo, obol config, cjm, oboedit, GO editor, reasoner, obol, go ‘fixed’, obol report)

Relations to Other Ontologies
(figure; CL: blood cell, lymphocyte, B-cell; GO: cell differentiation, lymphocyte differentiation, B-cell activation, B-cell differentiation; edges labelled is_a)

CELL Ontology
[Term]
id: CL:0000236
name: B-cell
is_a: CL:0000542 ! lymphocyte
develops_from: CL:0000231 ! B-lymphoblast

Augmented GO
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell

There are many less than perfect ontologies
• Use the power of combination and collaboration
• Ontologies are like telephones: they are valuable only to the degree that they are used and networked with other ontologies
• But to work telephones must be connected
• Like telephones, most ontologies were broken when the technology was first being developed

The OBO-Foundry is:
• foun·dry: An establishment where metal is melted and poured into molds
• OBO-foun·dry: An establishment where scientific theory is formalized and represented in ontologies
• To create the conditions for a step-by-step evolution towards robust gold standard reference ontologies in the biomedical domain.
• To introduce some of the features of scientific peer review into biomedical ontology development.
obofoundry.org

The OBO Foundry
A subset of OBO ontologies whose developers agree in advance to accept a common set of principles designed to assure
• intelligibility to biologist curators, annotators, users
• formal robustness
• stability
• compatibility
• interoperability
• support for logic-based reasoning

Building out from the original GO
(table; columns give RELATION TO TIME, rows give GRANULARITY: ORGAN AND ORGANISM, CELL AND CELLULAR COMPONENT, MOLECULE; slide annotation: December 1st & 2nd)
• INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO); Cell (CL); Cellular Component (FMA, GO); Molecule (ChEBI, SO, RnaO, PrO)
• DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Cellular Function (GO); Molecular Function (GO); Phenotypic Quality (PaTO)
• OCCURRENT: Biological Process (GO); Molecular Process (GO)

CRITERIA
• The ontology is OPEN and available to be used by all.
• The ontology is in, or can be instantiated in, a COMMON FORMAL LANGUAGE.
• The developers of the ontology agree in advance to COLLABORATE with developers of other OBO Foundry ontologies where domains overlap.
The OBO Foundry http://obofoundry.org/

CRITERIA
• UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
• ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.

Orthogonality of ontologies implies additivity of annotations
If we annotate a database or body of literature with one high-quality biomedical ontology, we should be able to add annotations from a second such ontology without conflicts.

CRITERIA
• IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
• VERSIONING: The ontology provider has procedures for identifying distinct successive versions to ensure BACKWARDS COMPATIBILITY with annotation resources already in common use.
• The ontology includes TEXTUAL DEFINITIONS and where possible equivalent formal definitions of its terms.

CRITERIA
• CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
• DOCUMENTATION: The ontology is well-documented.
• USERS: The ontology has a plurality of independent users.

CRITERIA
• AGREE ON RELATIONS: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*
The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.
* Genome Biology 6:R46, 2005.
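
To make the relations criterion concrete, the sketch below reads the Augmented GO stanza shown earlier in the talk, lists the relations it uses, and checks them against an agreed set. This is only a minimal sketch under stated assumptions: the function names and the AGREED_RELATIONS set are hypothetical and chosen for illustration, and the parser handles just the simple "tag: value ! comment" lines that appear on these slides, not the full OBO format. It is not how OBO-Edit or Obol work internally.

```python
from collections import defaultdict

# Stanza text copied from the "Augmented GO" slide.
STANZA = """\
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell
"""

# Hypothetical agreed relation set, for illustration only.
# A real check would load the relations from the OBO Relation Ontology.
AGREED_RELATIONS = {"is_a", "part_of", "has_participant"}


def parse_stanza(text):
    """Return a dict mapping each tag to the list of its values."""
    tags = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop the trailing comment
        tags[tag.strip()].append(value)
    return dict(tags)


def relations_used(tags):
    """Collect the relation names a single stanza relies on."""
    used = set()
    if tags.get("is_a"):
        used.add("is_a")
    for value in tags.get("intersection_of", []):
        parts = value.split()
        # "relation ID" names the relation; a bare ID is the genus (is_a).
        used.add(parts[0] if len(parts) == 2 else "is_a")
    return used


def unknown_relations(tags, agreed=AGREED_RELATIONS):
    """Relations used by the stanza that are not in the agreed set."""
    return relations_used(tags) - agreed


term = parse_stanza(STANZA)
print(term["id"][0], sorted(relations_used(term)), sorted(unknown_relations(term)))
```

Running the same check over stanzas from two ontologies gives a quick first signal of whether they draw on the same relations before any alignment is attempted. It says nothing about whether the relations carry the same meaning, which is the part the criterion actually requires.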
Elements for Success 1
• A Community with a common vision
• A pool of talented and motivated developers/scientists
• A mix of academic and commercial
• An organized, light weight approach to product development
• A leadership structure
• Communication
• A well-defined scope (our “business”)
Adopted from “Open Source Menu for Success”

Elements for Success 2
• Keep It Simple: lowest possible barrier to entry
• Technology independence
• “With new data, we change our minds”
  o An ontology must adapt to reflect current understanding of reality
  o Plan for and anticipate changes
• Stay close to your users: biologists and medical researchers

Ontology: A thing of beauty is a joy forever

With acknowledgement and thanks to
Berkeley
• Seth Carbon
• John Day-Richter
• Karen Eilbeck
• Mark Gibson
• Sima Misra
• Chris Mungall
• Shu Shengqiang
• Nicole Washington
GO
o Michael Ashburner
o Judith Blake
o J. Michael Cherry
o David Hill
o Midori Harris
o Rex Chisholm
o And many more…
NCBO
o Mark Musen
o Chris Chute
o Barry Smith
o Daniel Rubin
o Monte Westerfield
o Michael Ashburner
o And more…

*Without even going into our other projects: Apollo, SO, Chado, GMOD, DAS, Reactom more…
BOP