Download 2005-05_Purdue_edimmer
Gene Ontology (GO)
Emily Dimmer
[email protected]
GOA group, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK

GO Tutorial Outline:
• Introduction to GO
• Description of the GO ontologies
• How groups annotate to GO
• Practical:
  • Investigating the GO and OBO web sites
  • Browsing the GO using the AmiGO Browser
• Open Biomedical Ontologies
• How GO is being used
• Available Tools
• GO slims
• Practical:
  • Creating your own GO slim

Why is GO needed?
THE PROBLEM:
• Huge body of knowledge with an extremely large vocabulary to describe it
• Vocabulary used is poorly defined
  – i.e. one word can have different meanings
  – or different names for the same concept
• Biological systems are complex and our knowledge of such systems is incomplete
RESULT: Large databases which are difficult to manage and impossible to mine computationally

What is GO?
• A (part of the) solution:
GO: “a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”

What can scientists do with GO?
• Access gene product functional information
• Provide a link between biological knowledge and …
  • gene expression profiles
  • proteomics data
• Find how much of a proteome is involved in a process/function/component in the cell
  • using a GO-Slim (a slimmed down version of GO to summarize biological attributes of a proteome)
• Map GO terms and incorporate manual GOA annotation into own databases
  • to enhance your dataset
  • or to validate automated ways of deriving information about gene function (text-mining).

Tactition, Taction, Tactile sense: ?
Tactition, Taction, Tactile sense: perception of touch ; GO:0050975

GO Three (Orthogonal) Ontologies
• Molecular Function: elemental activity or task, e.g. DNA binding, catalysis of a reaction
• Biological Process: broad objective or goal, e.g. mitosis, signal transduction, metabolism
• Cellular Component: location or complex, e.g. nucleus, ribosome

How does GO work?
• Provides a standard, species-neutral way of representing biology
• GO covers ‘normal’ functions and processes
  – No pathological processes
  – No experimental conditions

Content of GO
Molecular Function     7,493 terms
Biological Process     9,640 terms
Cellular Component     1,634 terms
Total                 18,767 terms
Definitions: 16,696 (93.9 %)

What is GO?
• NOT a system of nomenclature or a list of gene products
• GO doesn’t attempt to cover all aspects of biology or evolutionary relationships
  Open Biomedical Ontologies: http://obo.sourceforge.net
• NOT a dictated standard
• NOT a way to unify databases
http://www.geneontology.org
Reactome

Anatomy of a GO term
• GO terms are composed of:
  • Term name
  • Unique GO ID
  • Definition (93 % of GO terms are defined)
  • Synonyms (optional)
  • Database references (optional)
  • Relationships to other GO terms

I. The GO Ontologies

Ontologies
• “Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing” (Gruber 1993)

Ontology applications
Can be used to:
• Formalise the representation of biological knowledge
• Describe a common and defined vocabulary for database annotation
• Standardise database submissions
• Provide unified access to information through ontology-based querying of databases, both human and computational
• Improve management and integration of data within databases
• Facilitate data mining

Ontology Structure
• Ontologies can be represented as graphs, where the vertices (nodes and leaves) are connected by edges.
• The nodes are concepts in the ontology.
• The edges are the relationships between the concepts.
[Figure: nodes connected by edges]

Ontology Structure
• The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG).
• Terms are linked by two relationships
  – is-a
  – part-of
• Terms can have more than one parent
[Figure: Simple hierarchies (Trees) compared with Directed Acyclic Graphs]

Directed Acyclic Graph
[Figure: cell, membrane, mitochondrial membrane, chloroplast and chloroplast membrane linked by is-a and part-of relationships]

True Path Rule
• The path from a child term all the way up to its top-level parent(s) must always be true
[Figure: cell, cytoplasm, chromosome, nucleus and nuclear chromosome linked by is-a and part-of relationships]

Ensuring Stability in a Dynamic Ontology
• Terms become obsolete when they are removed or redefined
• GO IDs are never deleted
• For each term, a comment is added to explain why the term is now obsolete
[Figure: top-level terms Biological Process, Molecular Function and Cellular Component, alongside Obsolete Biological Process, Obsolete Molecular Function and Obsolete Cellular Component]

Access to the Gene Ontology
• Downloads
  • formats available: OBO, GO, XML, OWL, MySQL (http://www.geneontology.org/GO.downloads)
• Web-based tools
  • AmiGO (http://www.godatabase.org)
  • QuickGO (http://www.ebi.ac.uk/ego)

II. Annotating to GO
Use of GO terms to represent the activities and localizations of gene products.
Basic information needed:
1. Database object (e.g. a protein or gene identifier), e.g. Q9ARH1
2. Reference ID, e.g. PubMed ID: 12374299
3. GO term ID, e.g. GO:0004674
4. Evidence code, e.g. TAS

GenNav: http://etbsun2.nlm.nih.gov:8000/perl/gennav.pl
J. Clark et al. Plant Physiology 2005 (in press)

Two types of GO Annotation:
• Electronic Annotation
• Manual Annotation
All annotations must:
• be attributed to a source.
• indicate what evidence was found to support the GO term-gene/protein association.

Electronic Annotation
• Provides large coverage
• High-quality
• BUT annotations tend to use high-level GO terms and provide little detail.

Electronic Annotation
1. Assignment of GO terms to gene products using existing information within database entries
• Manual mapping of GO terms to concepts external to GO (‘translation tables’).
• Proteins are then electronically annotated with the relevant GO term(s).
2. Automatic sequence analyses to transfer annotations between highly similar gene products

Electronic Annotation
• Fatty acid biosynthesis (Swiss-Prot Keyword) → GO:Fatty acid biosynthesis (GO:0006633)
• EC:6.4.1.2 (EC number) → GO:acetyl-CoA carboxylase activity (GO:0003989)
• IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO:acetyl-CoA carboxylase activity (GO:0003989)
• MF_00527: Putative 3-methyladenine DNA glycosylase (HAMAP) → GO:DNA repair (GO:0006281)
Mappings of external concepts to GO: http://www.geneontology.org/GO.indices.shtml

Evaluation of precision of electronic annotation techniques (InterPro2GO, SPKW2GO, EC2GO)
• Compared manually-curated test set of GO annotated proteins with the electronic annotations
• InterPro2GO = most coverage
• EC2GO = 67 % of predictions exactly match the manual GO annotation.
• 91-100 % of the time, the 3 mappings predicted GO terms within the same lineage
Camon et al. BMC Bioinformatics 2005 in press

Manual Annotation
• High-quality, specific gene/gene product associations made, using:
  • Peer-reviewed papers
  • Evidence codes to grade evidence
BUT – is very time consuming and requires trained biologists

Finding GO terms
…for B. napus PERK1 protein (Q9ARH1)
In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase.
We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GTP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein … these kinases have been implicated in early stages of wound response…
PubMed ID: 12374299
Function: protein serine/threonine kinase activity GO:0004674
Component: integral to plasma membrane GO:0005887
Process: response to wounding GO:0009611

GO Evidence Codes
Code    Definition
*IEA    Inferred from Electronic Annotation
IDA     Inferred from Direct Assay
IEP     Inferred from Expression Pattern
*IGI    Inferred from Genetic Interaction
IMP     Inferred from Mutant Phenotype
*IPI    Inferred from Physical Interaction
*ISS    Inferred from Sequence Similarity
TAS     Traceable Author Statement
NAS     Non-traceable Author Statement
*IC     Inferred from Curator
RCA     Inferred from Reviewed Computational Analysis
ND      No Data
* With column required: an additional identifier is needed for annotations using certain evidence codes.
All codes except IEA are manually annotated.

IDA:
• Enzyme assays
• In vitro reconstitution (transcription)
• Immunofluorescence
• Cell fractionation
TAS:
• In the literature source the original experiments referred to are traceable (referenced).
IGI:
• a gene identifier for the "other" gene involved in the interaction
IPI:
• a gene or protein identifier for the "other" protein involved in the interaction
IC:
• GO term from another annotation used as the basis of a curator inference

…some extra things:
• Annotation of a gene product to one ontology is independent from its annotation to other ontologies.
• Only terms reflecting a normal activity or location are annotated to.
• Usage of ‘unknown’ GO terms (e.g. Molecular function unknown GO:0005554)

…some extra things: Qualifier Information
A set of ‘Qualifier’ terms is also available to curators. The Qualifier column can be used to modify the interpretation of an annotation.
Allowable values:
1. NOT
• a gene product is not associated with the GO term
• to document conflicting claims in the literature.
2. Contributes to
• distinguishes between individual subunit functions and whole complex functions
• (used with GO Function Ontology)
3. Colocalizes with
• Transiently or peripherally associated with an organelle or complex
• where the resolution of an assay is not accurate.
• (used with GO Component Ontology)

Accessing annotations to the Gene Ontology
1. Downloads
• Annotations – gene association files
• Ontologies and annotations – MySQL and XML
2.
Web-based access
• AmiGO (http://www.godatabase.org)
• QuickGO (http://www.ebi.ac.uk/ego)
…among others…

Gene Association File
Field               Row 1          Row 2           Row 3
DB                  UniProt        UniProt         UniProt
DB_Object_ID        P06703         P06703          P06703
DB_Object_Symbol    S106_HUMAN     S106_HUMAN      S106_HUMAN
Qualifier           NOT (shown for one row only)
GOid                GO:0008083     GO:0007409      GO:0005515
DB:Reference        GOA:spkw       PMID:12152788   PMID:12577318
Evidence            IEA            NAS             IPI
With                                               UniProt:P50995
Aspect              F              P               F
DB_Object_Name      Calcyclin      Calcyclin       Calcyclin
DB_Object_Synonym   IPI00027463    IPI00027463     IPI00027463
DB_Object_Type      protein        protein         protein
taxon               taxon:9606     taxon:9606      taxon:9606
Date                20040426       20030721        20030721
Assigned by         UniProt        UniProt         UniProt
• via web (GO consortium page): http://www.geneontology.org/GO.current.annotations.shtml

Summary
• GO is still being developed and updated; it requires a serious and ongoing effort.
  – the biological community is involved
• New model organism databases are joining the GO Consortium annotation effort

Practical session
1. Visit the GO website
2. Visit the OBO website
3. Browse the ontologies using the official GO Consortium Browser – AmiGO

Part 1.
GO web site: www.geneontology.org
OBO web site: http://obo.sourceforge.net
AmiGO: http://www.godatabase.org
[Screenshot: GO terms with no children]

Querying the GO
• Search for GO terms or by Gene symbol/name
• Filter queries by organism, data source or evidence
[Screenshots: GOst tool]
QuickGO browser: http://www.ebi.ac.uk/ego

OBO and Gene Ontology Uses and Tools
[Figure: Ontologies, surrounded by Developmental Stage, Molecular, Disease, Metabolic Pathway, Phenotype, Anatomy and Physiology]

Beyond GO – Open Biomedical Ontologies
• Orthogonal to existing ontologies to facilitate combinatorial approaches
  - Share unique identifier space
  - Include definitions
• Anatomies
• Cell Types
• Sequence Attributes
• Temporal Attributes
• Phenotypes
• Diseases
• More….
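As a hedged illustration of the Gene Association File layout shown earlier in this section, the sketch below reads tab-separated annotation lines into dictionaries and keeps only the manually assigned ones (everything except IEA). The column order follows the field names on the slide; the helper names (`parse_gene_association`, `manual_only`) and the sample lines are hypothetical, the sample values are adapted from the slide with the Qualifier left blank, and treating lines that start with "!" as comments is an assumption about the file format.

```python
# Field order taken from the Gene Association File slide (an assumption
# about the real file layout; check the format documentation before use).
COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence", "with_", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
]


def parse_gene_association(lines):
    """Turn tab-separated annotation lines into a list of dicts."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (assumed to start with "!").
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        # Pad short rows so every record has all fifteen keys.
        fields += [""] * (len(COLUMNS) - len(fields))
        records.append(dict(zip(COLUMNS, fields)))
    return records


def manual_only(records):
    """Keep annotations whose evidence code is not IEA (electronic)."""
    return [r for r in records if r["evidence"] != "IEA"]


# Illustrative rows adapted from the slide; Qualifier is left empty here.
SAMPLE = [
    "! hypothetical comment line",
    "\t".join(["UniProt", "P06703", "S106_HUMAN", "", "GO:0008083",
               "GOA:spkw", "IEA", "", "F", "Calcyclin", "IPI00027463",
               "protein", "taxon:9606", "20040426", "UniProt"]),
    "\t".join(["UniProt", "P06703", "S106_HUMAN", "", "GO:0007409",
               "PMID:12152788", "NAS", "", "P", "Calcyclin", "IPI00027463",
               "protein", "taxon:9606", "20030721", "UniProt"]),
    "\t".join(["UniProt", "P06703", "S106_HUMAN", "", "GO:0005515",
               "PMID:12577318", "IPI", "UniProt:P50995", "F", "Calcyclin",
               "IPI00027463", "protein", "taxon:9606", "20030721", "UniProt"]),
]

records = parse_gene_association(SAMPLE)
for record in manual_only(records):
    print(record["go_id"], record["evidence"], record["db_reference"])
```

Filtering on the evidence code is one simple way to separate the two annotation types described in the tutorial; a real pipeline would probably also inspect the Qualifier column before counting an annotation.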
http://obo.sourceforge.net
Sequence Ontology: http://song.sourceforge.net
• Ontology of ‘small molecular entities’: http://www.ebi.ac.uk/chebi
http://www.fruitfly.org/cgi-bin/ex/go.cgi

Access to GO and its annotations
How to access the Gene Ontology and its annotations
1. Downloads
• Ontologies – (various – GO, OBO, XML, OWL, MySQL)
• Annotations – gene association files
• Ontologies and Annotations – MySQL and XML
2. Web-based access
• AmiGO (http://www.godatabase.org)
• QuickGO (http://www.ebi.ac.uk/ego)
among others…
http://www.ncbi.nlm.nih.gov/entrez
www.uniprot.org/
http://www.ebi.ac.uk/intact
SRS view… http://srs.ebi.ac.uk
www.ensembl.org/

What can scientists do with GO?
• Access gene product functional information
• Provide a link between biological knowledge and …
  • gene expression profiles
  • proteomics data
• Find how much of a proteome is involved in a process/function/component in the cell
  • using a GO-Slim (a slimmed down version of GO to summarize biological attributes of a proteome)
• Map GO terms and incorporate manual GOA annotation into own databases
  • to enhance your dataset
  • or to validate automated ways of deriving information about gene function (text-mining).

…analysis of high-throughput data according to GO
MicroArray data analysis
[Figure: gene tree (Pearson) over time, attacked vs. control, all genes (14010), with clusters labelled Defense response, Immune response, Response to stimulus, Toll regulated genes, JAK-STAT regulated genes, Puparial adhesion, Molting cycle, hemocyanin, Amino acid catabolism, Lipid metabolism, Peptidase activity and Protein catabolism]
Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

…analysis of high-throughput data according to GO
Proteomics data analysis
[Figure: GO classification] Kislinger T et al, Mol Cell Proteomics, 2003

Analysis of Data: Clustering
http://www.geneontology.org/GO.tools
Color indicates up/down regulation
GoMiner Tool, John Weinstein et al, Genome Biol. 4 (R28) 2003

Example of VLAD Output
Compare annotations associated with the test set to the entire set of GO annotations….
DNA Repair seems to be a common theme.

…overview proteome with GO Slim
http://www.ebi.ac.uk/integr8

Off-the-shelf GO slims
http://go.princeton.edu/cgi-bin/GOTermMapper

map2slim.pl
• distributed as part of the go-perl package
• maps a set of annotations up to their parent GO slim terms

Summary
• The Gene Ontology project precipitated a generalized implementation for ontologies for molecular biology
• Bio-ontologies such as GO have facilitated development of systems for hypothesis generation in biological systems
• Further integration – creation of cross-products between different ontologies

Practical II – Creation of GO slims using the DAG-Edit tool.
http://sourceforge.net/projects/geneontology/
…loading the GO (ftp://ftp.geneontology.org/pub/go/ontology/gene_ontology.obo)
…browsing the GO
…viewing GO terms
…searching for GO terms
…creating a new GO slim
…creating a renderer for the GO slim
…adding terms to the GO slim
…filtering GO for terms in the GO slim
…removing filters/renderers
…saving the newly created GO slim
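To make the idea behind a GO slim concrete, here is a minimal sketch of mapping a term up the DAG to its closest slim terms. It is not a port of map2slim.pl or of anything DAG-Edit does internally: the toy graph uses term names from the True Path Rule slide, the exact edges are assumed for illustration, is-a and part-of links are treated alike, and the function names are hypothetical.

```python
# Toy ontology: each term lists its parents (is-a and part-of combined).
# The edges are assumed for illustration only.
PARENTS = {
    "nuclear chromosome": ["chromosome", "nucleus"],
    "chromosome": ["cell"],
    "nucleus": ["cell"],
    "cytoplasm": ["cell"],
    "cell": [],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(term, slim, parents):
    """Map a term to the most specific slim terms on its path to the root."""
    candidates = ({term} | ancestors(term, parents)) & set(slim)
    # Drop a candidate when another candidate sits below it in the graph.
    return {
        c for c in candidates
        if not any(c in ancestors(other, parents)
                   for other in candidates if other != c)
    }


slim = {"nucleus", "cell"}
print(map_to_slim("nuclear chromosome", slim, PARENTS))
print(map_to_slim("cytoplasm", slim, PARENTS))
```

Because a term can have more than one parent, a single annotation may map to several slim terms; the True Path Rule is what makes it safe to count the annotation under every ancestor found this way.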