GENE ONTOLOGY FOR THE NEWBIES
Suparna Mundodi, PhD
The Arabidopsis Information Resources, Stanford, CA

The Gene Ontologies
A Common Language for Annotation of Genes from Yeast, Flies and Mice …and Plants and Worms …and Humans …and anything else!

Outline of Topics
- Introduction to the Gene Ontologies (GO)
- Annotations to GO terms
- GO Tools
- Applications of GO

Gene Ontology
- Gene annotation system
- Controlled vocabulary that can be applied to all organisms
- Used to describe gene products

What’s in a name?
What is a cell?
[Figure: five images, each labeled "Cell". Image from http://microscopy.fsu.edu]

Bud initiation?
- bud initiation sensu Metazoa
- bud initiation sensu Saccharomyces
- bud initiation sensu Viridiplantae

What’s in a name?
The same name can be used to describe different concepts

What’s in a name?
- Glucose synthesis
- Glucose biosynthesis
- Glucose formation
- Glucose anabolism
- Gluconeogenesis
All refer to the process of making glucose from simpler components

What’s in a name?
- The same name can be used to describe different concepts
- A concept can be described using different names
Comparison is difficult – in particular across species or across databases

What is the Gene Ontology?
A (part of the) solution:
- A controlled vocabulary that can be applied to all organisms
- Used to describe gene products - proteins and RNA - in any organism

How does GO work?
What information might we want to capture about a gene product?
- What does the gene product do?
- Why does it perform these activities?
- Where does it act?
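The "different names, one concept" problem from the glucose slide can be sketched in a few lines of Python. This is only an illustration of what a controlled vocabulary does: the names are taken from the slide, while the `SYNONYMS` table and the `normalize` function are hypothetical and are not part of any GO tool.

```python
# Hypothetical synonym table: the labels come from the slide, the data
# structure and function name are illustrative assumptions.
SYNONYMS = {
    "glucose synthesis": "gluconeogenesis",
    "glucose biosynthesis": "gluconeogenesis",
    "glucose formation": "gluconeogenesis",
    "glucose anabolism": "gluconeogenesis",
    "gluconeogenesis": "gluconeogenesis",
}


def normalize(label: str) -> str:
    """Map a free-text label to its single controlled-vocabulary term."""
    # Lower-case and collapse runs of whitespace before the lookup.
    key = " ".join(label.lower().split())
    if key not in SYNONYMS:
        raise KeyError(f"no controlled term for {label!r}")
    return SYNONYMS[key]


# Labels as three different databases might have written them.
labels = ["Glucose synthesis", "glucose  anabolism", "Gluconeogenesis"]
terms = {normalize(label) for label in labels}
print(terms)  # a single shared term, so the records become comparable
```

Once every database stores the controlled term rather than its own wording, comparing records across species or databases reduces to comparing identical strings.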
The 3 Gene Ontologies
- Molecular Function = elemental activity/task: the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
- Biological Process = biological goal or objective: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
- Cellular Component = location or complex: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Example: Gene Product = hammer
Function (what)              Process (why)
Drive nail (into wood)       Carpentry
Drive stake (into soil)      Gardening
Smash roach                  Pest Control
Clown’s juggling object      Entertainment

Ontology Structure
Ontologies can be represented as graphs, where the nodes are connected by edges
- Nodes = concepts in the ontology
- Edges = relationships between the concepts
[Diagram: nodes connected by edges]

Ontology Structure
- The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG)
- Terms can have more than one parent and zero, one or more children
- Terms are linked by two relationships: is-a and part-of

Directed Acyclic Graphs (DAG)
[Diagram: protein complex, organelle, mitochondrion, [other organelles], [other protein complexes] and fatty acid beta-oxidation multienzyme complex, linked by is-a and part-of edges]

Parent-Child Relationships
A child is a subset of a parent’s elements
The cell component term Nucleus has 5 children:
- Nucleoplasm
- Nuclear envelope
- Nucleolus
- Chromosome
- Perinuclear space

True Path Rule
The path from a child term all the way up to its top-level parent(s) must always be true
[Diagram: cell, cytoplasm, nucleus, chromosome, nuclear chromosome, cytoplasmic chromosome and mitochondrial chromosome, linked by is-a and part-of edges]

What’s in a GO term?
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
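The DAG structure and the True Path Rule described above can be sketched with a small parent map. The term names come from the slides, but the exact edges below are an assumed reading of the flattened diagram, and `PARENTS` and `ancestors` are hypothetical names rather than anything from the GO files or tools.

```python
from collections import deque

# child -> list of (relationship, parent).
# ASSUMPTION: these edges are one plausible reading of the slide's diagram,
# chosen to show a term with two parents; they are not copied from GO itself.
PARENTS = {
    "nucleus": [("part-of", "cell")],
    "chromosome": [("part-of", "cell")],
    "nuclear chromosome": [("is-a", "chromosome"), ("part-of", "nucleus")],
}


def ancestors(term):
    """Return every term reachable by following parent edges upward."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Under the True Path Rule, a gene product annotated to "nuclear chromosome"
# must also be validly described by every one of these ancestor terms.
print(sorted(ancestors("nuclear chromosome")))
# → ['cell', 'chromosome', 'nucleus']
```

Because a term may have more than one parent, the walk has to track visited terms; "cell" is reached through both "chromosome" and "nucleus" but is reported once.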
Annotation of gene products with GO terms
Mitochondrial P450
- Cellular component: mitochondrial inner membrane GO:0005743
- Biological process: Electron transport GO:0006118
- Molecular function: monooxygenase activity GO:0004497
[Diagram: substrate + O2 = CO2 + H2O product]

Other gene products annotated to monooxygenase activity (GO:0004497)
- monooxygenase, DBH-like 1 (mouse)
- prostaglandin I2 (prostacyclin) synthase (mouse)
- flavin-containing monooxygenase (yeast)
- ferulate-5-hydroxylase 1 (Arabidopsis)

Two types of GO Annotations:
- Electronic Annotation
- Manual Annotation
All annotations must:
• be attributed to a source
• indicate what evidence was found to support the GO term-gene/protein association

Evidence codes
IEA   Inferred from Electronic Annotation
ISS   Inferred from Sequence Similarity
IEP   Inferred from Expression Pattern
IMP   Inferred from Mutant Phenotype
IGI   Inferred from Genetic Interaction
IPI   Inferred from Physical Interaction
IDA   Inferred from Direct Assay
RCA   Inferred from Reviewed Computational Analysis
TAS   Traceable Author Statement
NAS   Non-traceable Author Statement
IC    Inferred by Curator
ND    No biological Data available

Ensuring Stability in a Dynamic Ontology
• Terms become obsolete when they are removed or redefined
• GO IDs are never deleted
• For each term, a comment is added to explain why the term is now obsolete
[Diagram: Biological Process, Molecular Function and Cellular Component, alongside Obsolete Biological Process, Obsolete Molecular Function and Obsolete Cellular Component]

Why modify the GO?
- GO reflects current knowledge of biology
- Adding new organisms can make existing term arrangements incorrect
- Not everything was perfect from the outset

What can scientists do with GO?
• Access gene product functional information
• Find how much of a proteome is involved in a process/function/component in the cell
• Map GO terms and incorporate manual annotations into own databases
• Provide a link between biological knowledge and …
  • gene expression profiles
  • proteomics data

Microarray analysis
Whole genome analysis (J. D. Munkvold et al., 2004)

http://www.geneontology.org/GO.tools

Beyond GO – Open Biomedical Ontologies
• Orthogonal to existing ontologies to facilitate combinatorial approaches
  - Share unique identifier space
  - Include definitions
• Anatomies
• Cell Types
• Sequence Attributes
• Temporal Attributes
• Phenotypes
• Diseases
• More…

http://obo.sourceforge.net
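As a rough illustration of the "how much of a proteome is involved in a process/function/component" use case listed above, the sketch below counts gene products annotated to a GO term, optionally ignoring some evidence codes. The GO IDs and evidence codes are those given in the slides; the gene products, sources, and the names `Annotation`, `ANNOTATIONS`, `PROTEOME` and `fraction_annotated` are made up for the example and do not correspond to any real annotation file or GO tool.

```python
from collections import namedtuple

# Each annotation carries a source and an evidence code, as the slides require.
Annotation = namedtuple("Annotation", "gene_product go_id evidence source")

# The twelve evidence codes listed in the slides.
EVIDENCE_CODES = {"IEA", "ISS", "IEP", "IMP", "IGI", "IPI",
                  "IDA", "RCA", "TAS", "NAS", "IC", "ND"}

# HYPOTHETICAL data: gene names and sources are placeholders.
ANNOTATIONS = [
    Annotation("geneA", "GO:0004497", "IDA", "hypothetical-paper-1"),
    Annotation("geneB", "GO:0004497", "IEA", "hypothetical-pipeline"),
    Annotation("geneC", "GO:0005743", "IMP", "hypothetical-paper-2"),
    Annotation("geneA", "GO:0005743", "TAS", "hypothetical-paper-3"),
]
PROTEOME = {"geneA", "geneB", "geneC", "geneD"}


def fraction_annotated(go_id, annotations, proteome, exclude=frozenset()):
    """Fraction of the proteome directly annotated to go_id.

    Annotations whose evidence code is in `exclude` are ignored.
    Only direct annotations are counted; child terms are not followed.
    """
    for annotation in annotations:
        if annotation.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {annotation.evidence}")
    hits = {
        a.gene_product
        for a in annotations
        if a.go_id == go_id
        and a.evidence not in exclude
        and a.gene_product in proteome
    }
    return len(hits) / len(proteome)


# Monooxygenase activity, all evidence, then without electronic annotation.
print(fraction_annotated("GO:0004497", ANNOTATIONS, PROTEOME))           # → 0.5
print(fraction_annotated("GO:0004497", ANNOTATIONS, PROTEOME, {"IEA"}))  # → 0.25
```

A fuller version would also count gene products annotated to child terms, since the True Path Rule means those annotations imply the parent term as well; that step is left out here to keep the sketch short.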