Lecture Four: GO: The Gene Ontology
Infrastructure for Systems Biology

Four model organisms: S. cerevisiae, D. melanogaster, C. elegans, M. musculus
C. elegans cell death genes:
• Cells that normally survive: CED-9 ON; CED-3, CED-4 OFF
• Cells that normally die: CED-9 OFF; CED-3, CED-4 ON

Comparison of sequences from 4 organisms
MCM3, MCM2, CDC46/MCM5, CDC47/MCM7, CDC54/MCM4, MCM6: these proteins form a hexamer in the species that have been examined.

The Gene Ontologies
A common language for annotation of genes from yeast, flies and mice
…and plants and worms
…and humans
…and anything else!

Gene Ontology - 1998
• FlyBase (Drosophila): Cambridge, EBI, Harvard, Berkeley & Bloomington
• SGD (Saccharomyces): Stanford
• MGI (Mus): Jackson Labs., Bar Harbor

Gene Ontology - now
• Fruitfly - FlyBase
• Budding yeast - Saccharomyces Genome Database (SGD)
• Mouse - Mouse Genome Database (MGD & GXD)
• Rat - Rat Genome Database (RGD)
• Weed (Arabidopsis) - The Arabidopsis Information Resource (TAIR)
• Worm - WormBase
• Dictyostelium discoideum - dictyBase
• InterPro/UniProt at EBI - InterPro
• Fission yeast - PomBase
• Human - UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
• Parasites (Plasmodium, Trypanosoma, Leishmania) - GeneDB, Sanger
• Microbes (Vibrio, Shewanella, B. anthracis, …) - TIGR
• Grasses (rice & maize) - Gramene database
• Zebrafish - ZFIN

To provide structured controlled vocabularies for the representation of biological knowledge in biological databases.
• Be open source
• Use open standards
• Make data & code available without constraint
• Involve your community

Gene Ontology Objectives
• GO represents concepts used to classify specific parts of our biological knowledge:
 – Biological Process
 – Molecular Function
 – Cellular Component
• GO develops a common language applicable to any organism
• GO terms can be used to annotate gene products from any species, allowing comparison of information across species

GO: Three ontologies
What does it do?
Molecular Function
What processes is it involved in?
Biological Process
Where does it act?
Cellular Component
(Each of the three questions is asked of a gene product.)

Content of GO
• Molecular Function: 7,309 terms
• Biological Process: 10,041 terms
• Cellular Component: 1,629 terms
• Total: 18,975 terms
• Terms with definitions: 94.9%
• Obsolete terms: 992

What's in a GO term?
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.

Annotation of gene products with GO terms
Example: mitochondrial P450 (reaction: substrate + O2 → product + H2O)
• Cellular component: mitochondrial inner membrane (GO:0005743)
• Biological process: electron transport (GO:0006118)
• Molecular function: monooxygenase activity (GO:0004497)
Other gene products annotated to monooxygenase activity (GO:0004497):
 – monooxygenase, DBH-like 1 (mouse)
 – prostaglandin I2 (prostacyclin) synthase (mouse)
 – flavin-containing monooxygenase (yeast)
 – ferulate-5-hydroxylase 1 (Arabidopsis)

What's in a name?
• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis
All refer to the process of making glucose from simpler components.

Tree vs. directed acyclic graph
Unlike a tree, a directed acyclic graph (DAG) allows a child term to have more than one parent.

Parent-child relationships
A child is a subset of a parent's elements. The cellular component term Nucleus has 5 children:
• Nucleoplasm
• Nuclear envelope
• Nucleolus
• Chromosome
• Perinuclear space

Ontology relationships: directed acyclic graph (figure not reproduced)

Evidence codes for GO annotations (http://www.geneontology.org/doc/GO.evidence.html)
IEA - Inferred from Electronic Annotation
ISS - Inferred from Sequence Similarity
IEP - Inferred from Expression Pattern
IMP - Inferred from Mutant Phenotype
IGI - Inferred from Genetic Interaction
IPI - Inferred from Physical Interaction
IDA - Inferred from Direct Assay
RCA - Inferred from Reviewed Computational Analysis
TAS - Traceable Author Statement
NAS - Non-traceable Author Statement
IC - Inferred by Curator
ND - No biological Data available

Annotation summaries (figure not reproduced): Meloidogyne incognita, McCarter et al. 2003

Two types of GO annotations
• Electronic annotation
• Manual annotation
All annotations must:
• be attributed to a source
• indicate what evidence was found to support the GO term-gene/protein association

Manual annotations
• High-quality, specific gene/gene product associations, made using:
 – peer-reviewed papers
 – evidence codes to grade evidence
• BUT: very time-consuming and requires trained biologists

Manual annotations: methods
1. Extract information from published literature
2. Curators perform manual sequence similarity analyses to transfer annotations between highly similar gene products (BLAST, protein domain analysis)

Finding GO terms
"In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein… these kinases have been implicated in early stages of wound response…" (PubMed ID: 12374299)
Highlighted phrases and the GO terms they map to:
• serine/threonine kinase activity → Function: protein serine/threonine kinase activity (GO:0004674)
• integral membrane protein → Component: integral to plasma membrane (GO:0005887)
• wound response → Process: response to wounding (GO:0009611)

Electronic annotations
• Provide large coverage
• High quality
• BUT: annotations tend to use high-level GO terms and provide little detail

Electronic annotations: methods
1. Database entries
 • Manual mapping of GO terms to concepts external to GO ("translation tables")
 • Proteins are then electronically annotated with the relevant GO term(s)
2. Automatic sequence similarity analyses to transfer annotations between highly similar gene products

Electronic annotations: example
• Fatty acid biosynthesis (Swiss-Prot keyword) → GO:fatty acid biosynthesis (GO:0006633)
• EC:6.4.1.2 (EC number) → GO:acetyl-CoA carboxylase activity (GO:0003989)
• IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO:acetyl-CoA carboxylase activity (GO:0003989)

Mappings of external concepts to GO
EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022
EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038
EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617
EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745

Additional points
• A gene product can have several functions and cellular locations, and be involved in many processes
• Annotation of a gene product to one ontology is independent of its annotation to the other ontologies
• Annotations are only to terms reflecting a normal activity or location
• Usage of "unknown" GO terms

Unknown vs. unannotated
• "Unknown" is used when the curator has determined that there is no existing literature to support an annotation:
 – biological process unknown GO:0000004
 – molecular function unknown GO:0005554
 – cellular component unknown GO:0008372
• This is NOT the same as having no annotation at all
 – No annotation means that no one has looked yet

Annotation of a genome
• GO annotations are always a work in progress
• Part of the normal curation process:
 – more specific information
 – better evidence codes
 – replacing obsolete terms
 – a "last reviewed" date

How to access the Gene Ontology and its annotations
1. Downloads
 • Ontologies
 • Annotations: gene association files
 • Ontologies and annotations
2. Web-based access
 • AmiGO (http://www.godatabase.org)
 • QuickGO (http://www.ebi.ac.uk/ego)
 among others…

Groups: A, C, D, E, H, M, S
Lecture Four: paper discussion (in-class discussion, about 5 minutes)
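The "What's in a name?" slide shows why a controlled vocabulary helps: several free-text phrases all name gluconeogenesis (GO:0006094). The sketch below is a minimal, hand-built lookup table under that assumption; the table and the function names `normalize` and `to_go_id` are hypothetical illustrations, not part of any GO tool, and GO itself records synonyms alongside each term rather than in a table like this.

```python
from typing import Optional

# Hypothetical synonym table built from the "What's in a name?" slide.
# All five phrases name the same process, so they share one GO identifier.
SYNONYMS_TO_GO = {
    "gluconeogenesis": "GO:0006094",
    "glucose synthesis": "GO:0006094",
    "glucose biosynthesis": "GO:0006094",
    "glucose formation": "GO:0006094",
    "glucose anabolism": "GO:0006094",
}


def normalize(phrase: str) -> str:
    """Lower-case and collapse internal whitespace so trivial variants match."""
    return " ".join(phrase.lower().split())


def to_go_id(phrase: str) -> Optional[str]:
    """Return the GO ID for a free-text phrase, or None if it is not in the table."""
    return SYNONYMS_TO_GO.get(normalize(phrase))


if __name__ == "__main__":
    for name in ["Glucose  Synthesis", "gluconeogenesis", "glycolysis"]:
        print(name, "->", to_go_id(name))
```

Different wordings collapse onto one identifier, so two databases that describe the same process differently can still be compared by GO ID.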
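The parent-child slides state that a child is a subset of its parent, and a DAG lets a child have several parents. One consequence, often called the true path rule, is that a gene product annotated to a term is implicitly annotated to every ancestor of that term. The sketch below assumes a toy child-to-parents map: the five children of "nucleus" come from the slide, while the root "cellular component" and the second parent of "chromosome" are hypothetical, added only to show a node with two parents.

```python
from collections import deque
from typing import Dict, List, Set

# Toy child -> parents map. The children of "nucleus" are from the slide;
# the root and "hypothetical second parent" are invented for illustration.
PARENTS: Dict[str, List[str]] = {
    "nucleoplasm": ["nucleus"],
    "nuclear envelope": ["nucleus"],
    "nucleolus": ["nucleus"],
    "chromosome": ["nucleus", "hypothetical second parent"],
    "perinuclear space": ["nucleus"],
    "nucleus": ["cellular component"],
    "hypothetical second parent": ["cellular component"],
    "cellular component": [],
}


def ancestors(term: str) -> Set[str]:
    """Return every term reachable by following parent links (breadth-first)."""
    seen: Set[str] = set()
    queue = deque(PARENTS.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # a DAG can reach the same node by two paths
            seen.add(parent)
            queue.extend(PARENTS.get(parent, []))
    return seen


def propagate(annotated_terms: Set[str]) -> Set[str]:
    """Close a set of direct annotations upward: each term implies its ancestors."""
    closed = set(annotated_terms)
    for term in annotated_terms:
        closed |= ancestors(term)
    return closed


if __name__ == "__main__":
    print(sorted(ancestors("chromosome")))
    print(sorted(propagate({"nucleolus"})))
```

The `seen` set matters because, unlike in a tree, two paths can lead to the same ancestor; without it that ancestor would be visited twice.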
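The evidence-code table and the "Two types of GO annotations" slides fit together: in this lecture's framing, IEA marks an electronic annotation and the other codes come from curators. The sketch below encodes the table from the slide and filters a list of annotations down to the manually curated ones. The two-way split is an assumption taken from the lecture, and the records in the usage example are hypothetical placeholders.

```python
from typing import List, Tuple

# Evidence codes as listed on the "Evidence Codes for GO Annotations" slide.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}

# (gene product, GO id, evidence code)
Annotation = Tuple[str, str, str]


def annotation_type(code: str) -> str:
    """Classify an evidence code as 'electronic' or 'manual'.

    Assumption: following the lecture's two-way split, only IEA marks an
    electronic annotation; every other listed code is treated as manual.
    """
    if code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {code}")
    return "electronic" if code == "IEA" else "manual"


def manual_only(annotations: List[Annotation]) -> List[Annotation]:
    """Keep only the annotations whose evidence code is not IEA."""
    return [a for a in annotations if annotation_type(a[2]) == "manual"]


if __name__ == "__main__":
    # Hypothetical records; gene names are placeholders.
    records = [
        ("geneA", "GO:0004497", "IEA"),
        ("geneA", "GO:0005743", "IDA"),
        ("geneB", "GO:0006094", "TAS"),
    ]
    for gene, go_id, code in manual_only(records):
        print(gene, go_id, code, "-", EVIDENCE_CODES[code])
```

Filtering by evidence code is one way to trade the broad coverage of electronic annotation for the specificity of manual curation.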
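The "Electronic annotations: methods" and "Mappings of external concepts to GO" slides describe a two-step process: a manually built translation table links external IDs (here EC numbers) to GO terms, then proteins carrying those IDs are annotated automatically. The sketch below parses lines in the `EXTERNAL > GO:name ; GO:id` layout shown on the slide and transfers the terms with evidence code IEA. Skipping lines that start with `!` as comments is an assumption, and the protein names in the usage note are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Mapping lines copied from the "Mappings of external concepts to GO" slide.
MAPPING_TEXT = """\
EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022
EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038
EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617
EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745
"""

# external-id > GO:term name ; GO:nnnnnnn
LINE_RE = re.compile(r"^(\S+)\s*>\s*GO:(.+?)\s*;\s*(GO:\d{7})\s*$")

MappingTable = Dict[str, List[Tuple[str, str]]]


def parse_mapping(text: str) -> MappingTable:
    """Build external-id -> [(GO id, GO name), ...]; '!' lines are treated as comments."""
    table: MappingTable = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised mapping line: {line!r}")
        external_id, go_name, go_id = match.groups()
        table.setdefault(external_id, []).append((go_id, go_name))
    return table


def annotate(proteins: Dict[str, List[str]], table: MappingTable) -> List[Tuple[str, str, str]]:
    """Transfer GO terms to proteins via their external IDs, tagging each with IEA."""
    result = []
    for protein, external_ids in proteins.items():
        for ext in external_ids:
            for go_id, _name in table.get(ext, []):
                result.append((protein, go_id, "IEA"))
    return result
```

For example, `annotate({"protX": ["EC:1.1.1.1"]}, parse_mapping(MAPPING_TEXT))` yields a single IEA annotation of the hypothetical `protX` to GO:0004022. The only manual work is building the table; every later annotation is automatic, which is why coverage is high but detail depends on how specific the mapped term is.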
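The "Unknown vs. unannotated" slide distinguishes three states for a gene product in each ontology: annotated to a real term, annotated to the ontology's "unknown" term, or not annotated at all. The sketch below uses the three "unknown" IDs from the slide and assumes aspect letters P, F and C for process, function and component; the function name `status` and the example records are hypothetical.

```python
from typing import List, Tuple

# "Unknown" term IDs from the slide, one per ontology.
UNKNOWN_TERMS = {
    "P": "GO:0000004",  # biological process unknown
    "F": "GO:0005554",  # molecular function unknown
    "C": "GO:0008372",  # cellular component unknown
}


def status(annotations: List[Tuple[str, str]], aspect: str) -> str:
    """Describe one gene product's state in one ontology.

    annotations: (aspect, GO id) pairs for a single gene product,
    where aspect is "P", "F" or "C".
    Returns "unannotated", "unknown" or "annotated".
    """
    ids = [go_id for a, go_id in annotations if a == aspect]
    if not ids:
        return "unannotated"  # no one has looked yet
    if all(go_id == UNKNOWN_TERMS[aspect] for go_id in ids):
        return "unknown"  # a curator looked and found no supporting literature
    return "annotated"


if __name__ == "__main__":
    # Hypothetical record for one gene product.
    gene = [("F", "GO:0004497"), ("P", "GO:0000004")]
    for aspect in ("F", "P", "C"):
        print(aspect, status(gene, aspect))
```

Because annotation to one ontology is independent of the others, the same gene product can be "annotated" for function, "unknown" for process and still "unannotated" for component.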