Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Microevolution wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Public health genomics wikipedia , lookup
Designer baby wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Epigenetics in stem-cell differentiation wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Mir-92 microRNA precursor family wikipedia , lookup
Neuronal ceroid lipofuscinosis wikipedia , lookup
GO, NCBO, Phenotypes, & the OBO Foundry January 29th, 2007 Ontologies for Biomedical Investigations La Jolla Institute for Allergy and Immunology Suzanna Lewis GO Consortium & National Center for Biomedical Ontology http://www.geneontology.org/ http://www.bioontology.org/ Outline Perspective on the challenge of our mutual investment of time and effort on standards, formalisms, and representation GO case study retrospective NCBO today Phenotype case study OBO-Foundry The Scientific Method A body of techniques for investigating phenomena and acquiring new knowledge, as well as for correcting and integrating previous knowledge. It is based on observable, empirical, measurable evidence, and subject to rules of reasoning. Isaac Newton (1687, 1713, 1726). "Rules for the study of natural philosophy", Philosophiae Naturalis Principia Mathematica, Book 3, The System of the World. Third edition, the 4 rules as reprinted on pages 794-796 of I. Bernard Cohen and Anne Whitman's 1999 translation, University of California Press ISBN 0-52008817-4, 974 pages. Today’s data is in electronic form Rules of reasoning are an intrinsic element of scientific investigation In our current era, data reside in electronic form Building and using computable ontologies will support rules of reasoning on our data And thereby support research in the computer age. Necessary Character of a computational environment for biological research Sustainable There must be mechanism for maintaining the environment (that is less than the initial cost). Adaptable It must work for the complete spectrum of data types, from genomics to clinical trials It must continually adapt to new knowledge and new technologies Interoperable We need the capability of easily integrating data from a variety of sources. Evolvable Mechanisms must be put in place to respond to the needs of the biomedical research community. They provide the primary selection pressure on the evolution of the technology. 10 Factors in achieving goals 1. Clarity of Vision/Goal 6. Feasible within available resources 2. Political Landscape needs to support the 7. Providing incentives goal for adoption 3. Decision-Process 8. Tactics must change needs to be with the adoption responsive and curve efficient 9. Sustaining effort over 4. Message has to be multiple years brought to the 10. Being satisfied with community highly imperfect, but 5. Accountability (i.e. no pragmatic solutions. vaporware) Credit to John Glaser Criteria for success, & signs of failure Measurable evidence of improved productivity and efficiency (time saved vs. number of users for given output) Evidence of learning from experience e.g., sustained improvements in content volume and quality) Evidence of discoverability - enables positive outputs that were unanticipated Continual, iterative process that occurs at each stage of development and growth, fully integrated into the lifecycle Evidence of community acceptance, that more data is being provided continuously Contains sufficient information to supports reproducibility of results Negotiation of meaning has occurred Stymied by barriers regarding IP, and credit attribution High relative cost for maintenance, support, and boosting interest Problems that occur between the cracks "For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled.” Richard Feynman (1962) GO case study Three fundamental dichotomies 1. types vs. instances 2. continuants vs. occurrents 3. dependent vs. independent For example, in the GO’s 3 ontologies occurent molecular function continuant dependent biological process cellular compone nt independent Molecules, cell components , organisms are independent continuants which have functions (these are dependent continuants), and these functions may be realized as an Specific Aims of the GO 2006 We will maintain comprehensive, logically rigorous and biologically accurate ontologies. We will comprehensively annotate 9 reference genomes in as complete detail as possible. We will support annotation across all organisms. We will provide our annotations and tools to the research community. Weaving and untangling the GO Missing relations is_a completeness Adding new relations within single GO ontology Adding “regulates” to BP Distinguishing different part_of relations Adding Relations between GO axis Linking between MF & BP & CC Adding relations between GO & other ontologies GO+Cell GO+anatomy Implicit ontologies within the GO: cysteine biosynthesis (ChEBI) myoblast fusion (Cell Type Ontology) hydrogen ion transporter activity (ChEBI) snoRNA catabolism (Sequence Ontology) wing disc pattern formation (Drosophila anatomy) epidermal cell differentiation (Cell Type Ontology) regulation of flower development (Plant anatomy) interleukin-18 receptor complex (not yet in Relations to Other Ontologies CL GO blood cell cell differentiation lymphocyte differentiation lymphocyte B-cell activation is_a B-cell differentiation B-cell CELL Ontology [Term] id: CL:0000236 name: B-cell is_a: CL:0000542 ! lymphocyte develops_from: CL:0000231 ! B-lymphoblast Augmented GO [Term] id: GO:0030183 name: B-cell differentiation is_a: GO:0042113 ! B-cell activation is_a: GO:0030098 ! lymphocyte differentiation intersection_of: is_a GO:0030154 ! cell differentiation intersection_of: has_participant CL:0000236 ! B-cell Correlation of mRNA decay rates with (GO) function Genome Research 13:1863-1872, 2003 Decay Rates of Human mRNAs: Correlation With Functional Characteristics and Sequence Attributes. E. Yang, E. van Nimwegen, M. Zavolan, N. Rajewsky, M. Schroeder, M Magnasco and JE Darnell, Jr How GO measures up Measurable evidence of improved productivity and efficiency Researchers simply use the GO, as judged by publications Evidence of learning from experience Formalism of the GO continues to improve Evidence of discoverability - enables positive outputs that were unanticipated Primary use of GO is cluster analysis of microarray expression data Continual, iterative process that occurs at each stage of development and growth, fully integrated into the lifecycle Quarterly updates of software Evidence of community acceptance, that more data is being provided continuously Number of species continues to increase The National Center for Biomedical Ontology BioPortal Phenotype Annotation NCBO’s 7 Cores Core 1: Computer science Core 2: Bioinformatics Core 3: Driving biological projects Core 4: Infrastructure Core 5: Education and Training Core 6: Dissemination Core 7: Administration Who NCBO is Stanford: Tools for ontology alignment, indexing, and management (Cores 1, 4–7: Mark Musen) Lawrence–Berkeley Labs: Tools to use ontologies for data annotation (Cores 2, 5–7: Suzanna Lewis) Mayo Clinic: Tools for access to large controlled terminologies (Core 1: Chris Chute) Victoria: Tools for ontology and data visualization (Cores 1 and 2: Margaret-Anne Story) NCBO Driving Biological Projects Trial Bank: UCSF, Ida Sim Qui ckTime™ and a TIFF (LZW) decompressor are needed to see thi s pi cture. Flybase: Cambridge, Michael Ashburner QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. ZFIN: Oregon, Monte Westerfield QuickT ime ™an d a TIFF ( Uncomp res sed) deco mpre ssor ar e need ed to see this pictur e. BioPortal Indexes, searches and visualizes terms in ontologies in library Uses LexGrid (Mayo) Contains ontologies that their editors have released to BioPortal The BioPortal Needs You! We need, and beg and plead for, your feedback http://www.bioontology.org/ncbo/faces/in dex.xhtml For example: Providing URIs for all ontologies and/or ontology content? Tomorrow depends on you, no request is too mundane. Phenotype work in progress Animal disease models Animal models Mutant Gene Mutant or missing Protein Mutant Phenotype Animal disease models Humans Animal models Mutant Gene Mutant Gene Mutant or missing Protein Mutant or missing Protein Mutant Phenotype (disease) Mutant Phenotype (disease model) Animal disease models Humans Animal models Mutant Gene Mutant Gene Mutant or missing Protein Mutant or missing Protein Mutant Phenotype (disease) Mutant Phenotype (disease model) Animal disease models Humans Animal models Mutant Gene Mutant Gene Mutant or missing Protein Mutant or missing Protein Mutant Phenotype (disease) Mutant Phenotype (disease model) SHH-/+ SHH-/- shh-/+ shh-/- Phenotype (clinical sign) = entity + quality Phenotype (clinical sign) = entity P1 = eye + quality + hypoteloric Phenotype (clinical sign) = entity P1 P2 + quality = eye + hypoteloric = midface + hypoplastic Phenotype (clinical sign) = entity P1 P2 P3 + = eye + = midface + = kidney + quality hypoteloric hypoplastic hypertrophied Phenotype (clinical sign) = entity P1 P2 P3 + = eye + = midface + = kidney + ZFIN: eye midface kidney + quality hypoteloric hypoplastic hypertrophied PATO: hypoteloric hypoplastic hypertrophied Phenotype (clinical sign) = entity + quality Anatomical ontology Cell & tissue ontology Developmental ontology Gene ontology biological process cellular component + PATO (phenotype and trait ontology) Phenotype (clinical sign) = entity P1 P2 P3 + = eye + = midface + = kidney + quality hypoteloric hypoplastic hypertrophied Syndrome = P1 + P2 + P3 (disease) = holoprosencephaly Human holoprosencephaly Zebrafish shh Zebrafish oep Observable Attribute Bone Value length Qualifi er short very Conditions Genetic contribution attribute Assay Units Environmental contribution Compound observabl e (disease?) obesity observable value units body mass 30 BMI obesity blood 400 ? hypertension artery triglyceride levels elasticity 140/90 mm Hg qualifie r assay conditions TIGR, December 6-7, 2002 increased elevated body fat analysis blood test normal normal PaTO upper level Unifying goal: Integrating data within and across domains (e.g. different taxa) across levels of granularity across different perspectives Requires Rigorous formal definitions in both ontologies and annotation schemas Top level PaTO division: spatial vs temporal Note: some nodes omitted for Quality brevity Quality of a continuant Quality of an occurrent A quality which inheres In a continuant A quality which inheres In a process or spatiotemporal region physical quality color density morphology shape size cellular duration quality structure arrested premature delayed Top level PaTO division: Granularity Monadic quality of a continuant … Physical quality Cellular quality A quality that exists through action of continuants at the physical level of organisation A quality that exists at the cellular level of organisation color green pink yellow temperature hot cold mass large mass small mass ploidy potency … nucleate quality diploid multipotent haploid totipotent anucleate aneuploid oligoptent binculeate Monadic vs. relational quality of a continuant … Monadic quality of a C A quality of a C that inheres solely in the bearer and does not require another entity Physical Cellular quality quality shape morphology Relational quality of a C … A quality of a C that requires another entity apart from its bearer to exist Sensitivity (to) Displacement (with) Connecte d-ness (to) size structure Relational qualities involving the environment “drought sensitivity” [TO:0000029] Directed towards an additional entity type Q= PATO:sensitivity E2= EO:drought Def: a sensitivity which is directed towards drought [ inheres_in organism ] OBO needs a good environment ontology What is Phenote? A tool for annotating Phenotypes 1. Curator reads about a phenotype in the literature related to taxonomy or genotype 2. Curator enters genotype(or taxonomy) 3. Curator enters genetic context (optional) 4. Curator searches/enters Entity (e.g. Anatomy) 5. Curator searches/enters PATO attribute/value Phenote ZFIN integration Also Phenote Other ontologies… Anatomy Cell Chemical Drug Disease Environmental context ... Qualifier Unit GO - biological process GO - molecular function GO - cellular component OBO-Foundry OBO was conceived and announced in in 2001 10 october 2001 Michael Ashburner and Suzanna Lewis with acknowledgements of others in the GO community. open biology ontologies - obo.org The success of GO leads us to propose the formation of an umbrella body which we call OBO - open biology ontologies. OBO would act as an umbrella site for GO and other ontologies within the broad domain of biology. Open Biomedical Ontology It is critical that ontologies are developed cooperatively so that their classification strategies augment one another. The OBO-Foundry is: foun·dry An establishment where metal is melted and poured into molds OBO-foun·dry An establishment where scientific theory is formalized and represented in ontologies The OBO Foundry The developers of OBO ontologies agree in advance to accept a common set of principles designed to assure intelligibility to biologist curators, annotators, users formal robustness stability compatibility interoperability support for logic-based reasoning To create the conditions for a step-bystep evolution towards robust gold standard reference ontologies in the biomedical domain. To introduce some of the features of scientific peer review into biomedical ontology development. obofoundry.org RELATION TO TIME CONTINUANT INDEPENDENT OCCURRENT DEPENDENT GRANULARITY ORGAN AND ORGANISM Organism (NCBI Taxonomy?) CELL AND CELLULAR COMPONENT Cell (CL) MOLECULE Anatomical Organ Entity Function (FMA, (FMP, CPRO) Phenotypic CARO) Quality (PaTO) Cellular Cellular Component Function (FMA, GO) (GO) Molecule (ChEBI, SO, RnaO, PrO) Molecular Function (GO) Biological Process (GO) Molecular Process (GO) Building out from the original GO Agree on relations The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned. Genome Biology 6:R46, 2005. Elements for Success 1 A Community with a common vision A pool of talented and motivated developers/scientists A mix of academic and commercial An organized, light weight approach to product development A leadership structure Communication A well-defined scope, (our “business”) Adopted from “Open Source Menu for Success” Elements for Success 2: There is no requirement that ontology be done using any particular technology. Michael Ashburner Judith Blake J. Michael Cherry David Hill Midori Harris Rex Chisholm And many more… With thanks to* Others… GO Seth Carbon John Day-Richter Karen Eilbeck Mark Gibson Chris Mungall Shu Shengqiang Nicole Washington Berkeley BIRN DictyBase FlyBase MGI NESCENT ZFIN And more… Mark Musen Barry Smith Monte Westerfield Michael Ashburner Daniel Rubin And more… NCBO *Without even going into our other projects: Apollo, SO, Chado, GMOD, DAS,