Part II
GO - Vocabulary of Genome

[Figure: four organisms: S. cerevisiae, D. melanogaster, C. elegans, M. musculus. C. elegans panel: cells that normally survive have CED-9 ON and CED-3, CED-4 OFF; cells that normally die have CED-9 OFF and CED-3, CED-4 ON.]

Comparison of sequences from 4 organisms
MCM3, MCM2, CDC46/MCM5, CDC47/MCM7, CDC54/MCM4, MCM6
These proteins form a hexamer in the species that have been examined.

The Gene Ontologies
A Common Language for Annotation of Genes from Yeast, Flies and Mice
…and Plants and Worms
…and Humans
…and anything else!

Gene Ontology - 1998
• FlyBase: Drosophila (Cambridge, EBI, Harvard, Berkeley & Bloomington)
• SGD: Saccharomyces (Stanford)
• MGI: Mus (Jackson Labs., Bar Harbor)

Gene Ontology - now
• Fruitfly - FlyBase
• Budding yeast - Saccharomyces Genome Database (SGD)
• Mouse - Mouse Genome Database (MGD & GXD)
• Rat - Rat Genome Database (RGD)
• Weed - The Arabidopsis Information Resource (TAIR)
• Worm - WormBase
• Dictyostelium discoideum - Dictybase
• InterPro/UniProt at EBI - InterPro
• Fission yeast - Pombase
• Human - UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
• Parasites - Plasmodium, Trypanosoma, Leishmania - GeneDB Sanger
• Microbes - Vibrio, Shewanella, B. anthracis, … - TIGR
• Grasses - rice & maize - Gramene database
• Zebrafish - ZFIN

To provide structured controlled vocabularies for the representation of biological knowledge in biological databases.
• Be open source
• Use open standards
• Make data & code available without constraint
• Involve your community

Outline
• Introduction to the Gene Ontologies (GO)
• Annotations to GO terms
• GO Tools
• Applications of GO

Gene Ontology Objectives
• GO represents concepts used to classify specific parts of our biological knowledge:
  – Biological Process
  – Molecular Function
  – Cellular Component
• GO develops a common language applicable to any organism
• GO terms can be used to annotate gene products from any species, allowing comparison of information across species

GO: Three ontologies
What does it do?
-> Molecular Function
What processes is it involved in?
-> Biological Process
Where does it act?
-> Cellular Component
(Each question is asked about a gene product.)

Example: Gene Product = hammer
Function (what)            Process (why)
Drive nail (into wood)     Carpentry
Drive stake (into soil)    Gardening
Smash roach                Pest Control
Clown’s juggling object    Entertainment

Biological Examples
[Figure: example images for Biological Process, Molecular Function, Cellular Component]

The 3 Gene Ontologies
• Molecular Function = elemental activity/task
  – the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
• Biological Process = biological goal or objective
  – broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
• Cellular Component = location or complex
  – subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Molecular Function
• A single reaction or activity, not a gene product
• A gene product may have several functions
• Sets of functions make up a biological process
Carbonate dehydratase activity

Biological Process
Gluconeogenesis

Cellular Component
• where a gene product acts
Mitochondrial membrane

What’s in a GO term?
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.

Content of GO (as of October 2005)
Molecular Function     7,309 terms
Biological Process    10,041 terms
Cellular Component     1,629 terms
Total                 18,975 terms
Definitions:     94.9 %
Obsolete terms:  992

What’s in a name?
• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis
• All refer to the process of making glucose from simpler components

[Figure: tree vs. directed acyclic graph]

Parent-Child Relationships
A child is a subset of a parent’s elements
Nucleus
  – Nucleoplasm
  – Nuclear envelope
  – Nucleolus
  – Chromosome
  – Perinuclear space
The cell component term Nucleus has 5 children

Ontology Relationships
Directed Acyclic Graph

Evidence Codes for GO Annotations
http://www.geneontology.org/doc/GO.evidence.html
IEA  Inferred from Electronic Annotation
ISS  Inferred from Sequence Similarity
IEP  Inferred from Expression Pattern
IMP  Inferred from Mutant Phenotype
IGI  Inferred from Genetic Interaction
IPI  Inferred from Physical Interaction
IDA  Inferred from Direct Assay
RCA  Inferred from Reviewed Computational Analysis
TAS  Traceable Author Statement
NAS  Non-traceable Author Statement
IC   Inferred by Curator
ND   No biological Data available

IEA: Inferred from Electronic Annotation
• Sequence similarity (BLAST)
• Automatic transfer from mappings (InterPro2GO, EC2GO etc.)
-> Not manually reviewed

ISS: Inferred from Sequence or Structural Similarity
• Sequence similarity
• Recognized domains
• Structural similarity
-> Use of ‘with’ column recommended

IEP: Inferred from Expression Pattern
• Transcript levels (Northerns, microarrays)
• Protein levels (Western blots)
-> Timing or localization of expression
-> Biological process annotations

IMP: Inferred from Mutant Phenotype
• Gene mutation/knockout
• Overexpression/ectopic expression
• Anti-sense experiments
• RNAi experiments
• Specific protein inhibitors

IGI: Inferred from Genetic Interaction
• Suppressors, synthetic lethals…
• Functional complementation
• Rescue experiments
-> Use of ‘with’ column recommended

IPI: Inferred from Physical Interaction
• 2-hybrid interactions
• Co-purification
• Co-immunoprecipitation
• Ion/complex/protein binding experiments
-> Use of ‘with’ column recommended

IDA: Inferred from Direct Assay
• Enzyme assays
• In vitro reconstitution (e.g. transcription)
• Immunofluorescence (for cell. comp.)
• Cell fractionation (for cell. comp.)
• Physical interaction/binding assay

RCA: Inferred from Reviewed Computational Analysis
• Non-sequence-based computational methods
• Genome-wide analyses (e.g. 2-hybrid)
• Combinations of large-scale experiments

TAS: Traceable Author Statement
• Support from review article
• Textbook ‘common knowledge’
-> Data that can be ‘traced’ back

NAS: Non-traceable Author Statement
• Database entries that don't cite a paper
-> Data that cannot be ‘traced’ back

IC: Inferred by Curator
• Not supported by any direct evidence
• Inferred from other GO annotations
-> GO term in ‘with/from’ column required

ND: No biological Data available
Curator found no information supporting any annotation
• molecular function unknown GO:0005554
• biological process unknown GO:0000004
• cellular component unknown GO:0008372

Term Hierarchy
TAS/IDA
IMP/IGI/IPI
ISS/IEP
NAS
IEA

Annotation summaries
[Figure: Meloidogyne incognita: McCarter et al. 2003]

Annotation of gene products with GO terms
Mitochondrial P450
• Cellular component: mitochondrial inner membrane GO:0005743
• Biological process: electron transport GO:0006118
• Molecular function: monooxygenase activity GO:0004497
[Figure labels: substrate + O2 = CO2 + H2O, product]

Other gene products annotated to monooxygenase activity (GO:0004497)
- monooxygenase, DBH-like 1 (mouse)
- prostaglandin I2 (prostacyclin) synthase (mouse)
- flavin-containing monooxygenase (yeast)
- ferulate-5-hydroxylase 1 (Arabidopsis)

Unknown vs. Unannotated
• “Unknown” is used when the curator has determined that there is no existing literature to support an annotation.
  – Biological process unknown GO:0000004
  – Molecular function unknown GO:0005554
  – Cellular component unknown GO:0008372
• NOT the same as having no annotation at all
  – No annotation means that no one has looked yet

Annotation of a genome
• GO annotations are always work in progress
• Part of normal curation process
  – More specific information
  – Better evidence code
• Replace obsolete terms
• “Last reviewed” date

How to access the Gene Ontology and its annotations
1. Downloads
• Ontologies
• Annotations: gene association files
• Ontologies and annotations
2. Web-based access
• AmiGO (http://www.godatabase.org)
• QuickGO (http://www.ebi.ac.uk/ego)
among others…

Gene Ontology: …analysis of high-throughput data according to GO

MicroArray data analysis
[Figure: gene tree (Pearson) of expression over time, attacked vs. control. Cluster labels: Defense response, Immune response, Response to stimulus, Toll regulated genes, JAK-STAT regulated genes, Puparial adhesion, Molting cycle, hemocyanin, Amino acid catabolism, Lipid metabolism, Peptidase activity, Protein catabolism.]
Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL, and Eugene Schuster Group, EBI.
Gene list: all genes (14010)

[Figure: Ontologies: Developmental Stage, Molecular, Disease, Metabolic, Pathway, Phenotype, Anatomy, Physiology]
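
The parent-child relationships and evidence codes described in the slides can be illustrated with a small Python sketch. It is only a toy model under stated assumptions, not how any GO tool is implemented: the Nucleus term and its five children come from the Parent-Child Relationships slide, while the "membrane" term, the second parent of "nuclear envelope", the gene product names (geneA, geneB, geneC) and their evidence codes are hypothetical and added purely for illustration.

```python
from dataclasses import dataclass

# Toy fragment of the cellular component ontology, stored as child -> parents.
# "nucleus" and its five children are taken from the slide. The "membrane"
# term and the second parent of "nuclear envelope" are ASSUMPTIONS, added so
# that one term has two parents (a directed acyclic graph, not a tree).
PARENTS = {
    "cellular component": [],
    "nucleus": ["cellular component"],
    "membrane": ["cellular component"],            # assumed term
    "nucleoplasm": ["nucleus"],
    "nuclear envelope": ["nucleus", "membrane"],   # second parent is assumed
    "nucleolus": ["nucleus"],
    "chromosome": ["nucleus"],
    "perinuclear space": ["nucleus"],
}


def ancestors(term):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


@dataclass(frozen=True)
class Annotation:
    gene_product: str   # hypothetical names, not real genes
    term: str
    evidence: str       # one of the evidence codes listed in the slides


# Hypothetical annotations; the evidence codes are chosen only for the demo.
ANNOTATIONS = [
    Annotation("geneA", "nucleolus", "IDA"),
    Annotation("geneB", "nuclear envelope", "IEA"),
    Annotation("geneC", "nucleus", "TAS"),
]


def products_annotated_to(term, exclude_evidence=()):
    """Gene products annotated to `term` or to any of its descendants.

    Because a child is a subset of its parent's elements, an annotation to a
    child term also counts for every ancestor of that term.
    """
    hits = set()
    for annotation in ANNOTATIONS:
        if annotation.evidence in exclude_evidence:
            continue
        if annotation.term == term or term in ancestors(annotation.term):
            hits.add(annotation.gene_product)
    return sorted(hits)


if __name__ == "__main__":
    print(sorted(ancestors("nuclear envelope")))
    print(products_annotated_to("nucleus"))
    # Drop annotations that were not manually reviewed (IEA).
    print(products_annotated_to("nucleus", exclude_evidence={"IEA"}))
```

Storing parents per term, rather than children per term, keeps the ancestor lookup simple and lets a term have more than one parent, which is the difference between a directed acyclic graph and a tree shown in the slides.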