Introduction to the Gene Ontology and GO Annotation Resources
EBI Bioinformatics Roadshow, 15 March 2011, Düsseldorf, Germany
Rebecca Foulger
EBI is an Outstation of the European Molecular Biology Laboratory.

OUTLINE OF TUTORIAL
PART I: Ontologies and the Gene Ontology (GO)
PART II: GO Annotations
How to access GO annotations
How scientists use GO annotations

GO and GO Annotation, EBI Bioinformatics Roadshow. Düsseldorf. March 2011

PART I: Gene Ontology

What’s in a name...?
Q: What is a cell?
A: It really depends who you ask!

Different things can be described by the same name

The same thing can be described by different names:
• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis

Inconsistency in naming of biological concepts
• Same name for different concepts
• Different names for the same concept
Comparison is difficult, in particular across species or across databases
Just one reason why the Gene Ontology (GO) is needed…

Why do we need GO?
• Inconsistency in naming of biological concepts
• Large datasets need to be interpreted quickly
• Increasing amounts of biological data available
• Increasing amounts of biological data to come

Increasing amounts of biological data available
• Search on mesoderm development…. you get 9441 results!
• Expansion of sequence information

What is an ontology?
• Dictionary:
  • A branch of metaphysics concerned with the nature and relations of being (philosophy)
  • A formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts (computer science)
• Barry Smith:
  • The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.

What is an ontology?
• More usefully:
  • An ontology is the representation of something we know about. “Ontologies” consist of a representation of things that are detectable or directly observable, and the relationships between those things.

What is an ontology?
• An ontology is more than just a list of terms (a controlled vocabulary)
  • A vocabulary of terms
  • Definitions for those terms
  • *** Defined logical relationships between the terms ***

What’s in an Ontology?

What is the Gene Ontology (GO)?
• A way to capture biological knowledge in a written and computable form
• Describes attributes of gene products (RNA and protein)

The scope of GO
What information might we want to capture about a gene product?
• What does the gene product do?
• Where does it act?
• How does it act?

Biological Process: what does a gene product do?
A commonly recognised series of events
• transcription
• cell division

Cellular Component: where is a gene product located?
• plasma membrane
• mitochondrion
• mitochondrial membrane
• mitochondrial matrix
• mitochondrial lumen
• ribosome
• large ribosomal subunit
• small ribosomal subunit

Molecular Function: how does a gene product act?
• insulin binding
• insulin receptor activity
• glucose-6-phosphate isomerase activity

Three separate ontologies or one large one?
• GO was originally three completely independent hierarchies, with no relationships between them
• As of 2009, GO has started making relationships between biological process and molecular function in the live ontology
[Figure: links between Process and Function terms]

• GO IS:
  • species independent
  • covers normal processes
• GO is NOT:
  • NO pathological/disease processes
  • NO experimental conditions
  • NO evolutionary relationships
  • NO gene products
  • NOT a nomenclature system

Aims of the GO project
• Compile the ontologies
• Annotate gene products using ontology terms
• Provide a public resource of data and tools

Anatomy of a GO term
• Unique identifier
• Term name
• Definition
• Synonyms
• Cross-references

Ontology structure
• GO is structured as a hierarchical directed acyclic graph (DAG)
• Terms can have more than one parent and zero, one or more children
• Terms are linked by relationships, which add to the meaning of the term
• Nodes = terms in the ontology
• Edges = relationships between the concepts
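The term anatomy and graph structure described above can be sketched in a few lines of Python. This is only an illustration: the class name `GOTerm`, the helper `is_acyclic`, the "EX:" identifiers and the edge labels are assumptions made for the example, not part of GO or of any GO software.

```python
from dataclasses import dataclass, field


@dataclass
class GOTerm:
    """One node of the graph, holding the parts listed under 'Anatomy of a GO term'."""
    go_id: str
    name: str
    definition: str = ""
    synonyms: list = field(default_factory=list)
    xrefs: list = field(default_factory=list)
    # Outgoing edges as (relationship label, parent identifier); a term may have several.
    parents: list = field(default_factory=list)


def is_acyclic(terms):
    """Return True if following parent edges never leads back to a term on the same path."""
    state = {}  # identifier -> "visiting" or "done"

    def visit(go_id):
        if state.get(go_id) == "done":
            return True
        if state.get(go_id) == "visiting":
            return False  # reached a term that is still on the current path
        state[go_id] = "visiting"
        for _relationship, parent_id in terms[go_id].parents:
            if not visit(parent_id):
                return False
        state[go_id] = "done"
        return True

    return all(visit(go_id) for go_id in terms)


# Toy terms with made-up "EX:" identifiers (these are NOT real GO IDs).
terms = {
    "EX:0001": GOTerm("EX:0001", "organelle"),
    "EX:0002": GOTerm("EX:0002", "membrane"),
    "EX:0003": GOTerm("EX:0003", "mitochondrion", parents=[("is_a", "EX:0001")]),
    "EX:0004": GOTerm(
        "EX:0004",
        "mitochondrial membrane",
        # Two parents: the graph is a DAG, not a tree.
        parents=[("is_a", "EX:0002"), ("part_of", "EX:0003")],
    ),
}

print(is_acyclic(terms))
```

Storing the edges on the child term keeps "more than one parent" a natural case, and the depth-first check is one simple way to confirm that a set of terms really forms a DAG.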
Relationships between GO terms
• is_a
• part_of
• regulates
• positively regulates
• negatively regulates
• has_part

is_a
• If A is a B, then A is a subtype of B
• mitotic cell cycle is a cell cycle
• lyase activity is a catalytic activity
• Transitive relationship: can infer up the graph

part_of
• Necessarily part of
• Wherever B exists, it exists as part of A; however, A does not necessarily have B as a part
• Transitive relationship (can infer up the graph)

regulates
• One process directly affects another process or quality
• Necessarily regulates: if both A and B are present, B always regulates A, but A may not always be regulated by B

has_part
• Relationships are upside down compared to is_a and part_of
• Necessarily has part

is_a complete
• For all terms in the ontology, you have to be able to reach the root through a complete path of is_a relationships:
  • we call this being is_a complete
  • important for reasoning over the ontology, and ontology development

True path rule
• Child terms inherit the meaning of all their parent terms.

How is GO maintained?
• GO editors and annotators work with experts to remodel specific areas of the ontology
  • Signaling
  • Kidney development
  • Transcription
  • Pathogenesis
  • Cell cycle
• Deal with requests from the community
  • database curators, researchers, software developers
• Some simple requests can be dealt with automatically
• GO Consortium meetings for large changes
• Mailing lists, conference calls, content workshops

Requesting changes to the ontology
• Public SourceForge (SF) tracker for term-related issues
https://sourceforge.net/projects/geneontology/

Why modify the GO?
• GO reflects current knowledge of biology
• Information from new organisms can make existing terms and arrangements incorrect
• Not everything was perfect from the outset
  • Improving definitions
  • Adding in synonyms and extra relationships

Ensuring Stability in a Dynamic Ontology
• Terms become obsolete when they are removed or redefined
• GO IDs are never deleted
• For each term, a comment is added to explain why the term is now obsolete
• Alternative GO terms are suggested to replace an obsoleted term

Searching for GO terms
• http://www.ebi.ac.uk/QuickGO/
• http://amigo.geneontology.org
• … there are more browsers available on the GO Tools page: http://www.geneontology.org/GO.tools.browsers.shtml
• The latest OBO Gene Ontology file can be downloaded from: http://www.geneontology.org/ontology/gene_ontology.obo

Exercise
Browsing the Gene Ontology using QuickGO
• Exercise 1

PART II: GO Annotation

[Figure: logos, including E. Coli hub and Reactome]
http://www.geneontology.org

A GO annotation is…
A statement that a gene product:
1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component
2. as determined by a particular method
3. as described in a particular reference

Accession   Name   GO ID        GO term name                      Reference      Evidence code
P00505      GOT2   GO:0004069   Aspartate transaminase activity   PMID:2731362   IDA

Evidence codes
http://www.geneontology.org/GO.evidence.shtml
[Figure: GO evidence code decision tree, with example labels: IDA: enzyme assay; IPI: e.g. Y2H; BLASTs, orthology comparison, HMMs; subcategories of ISS; review papers]

Gene Ontology Annotation (GOA)
The GOA database at the EBI is:
• The largest open-source contributor of annotations to GO
• Member of the GO Consortium since 2001
• Provides annotation for 321,998 species (February 2011 release)
• GOA’s priority is to annotate the human proteome
• GOA is responsible for human, chicken and bovine annotations in the GO Consortium

GOA makes annotations using two methods
• Electronic
  • Quick way of producing large numbers of annotations
  • Annotations are less detailed
• Manual
  • Time-consuming process producing lower numbers of annotations
  • Annotations are very detailed and accurate

Electronic annotation by GOA
1. Mapping of external concepts to GO terms
  • InterPro2GO (protein domains)
  • SPKW2GO (UniProt/Swiss-Prot keywords)
  • HAMAP2GO (Microbial protein annotation)
  • EC2GO (Enzyme Commission numbers)
  • SPSL2GO (Swiss-Prot subcellular locations)
2. Automatic transfer of annotations to orthologs
[Figure: Ensembl Compara; species shown: Macaque, Chimpanzee, Cow, Guinea Pig, Dog, Rat, Mouse, Chicken]

Mappings of concepts from UniProtKB files
• Aspartate transaminase activity ; GO:0004069
• lipid transport ; GO:0006869

Automatic transfer of annotations to orthologs
• Ensembl COMPARA: homologies between different species calculated
• GO terms projected from MANUAL annotation only (IDA, IEP, IGI, IMP, IPI)
• One-to-one orthologies used
• Currently provides 479,961 GO annotations for 60,515 proteins from 49 species (February 2011 release)
[Figure: species shown: Human, Mouse, Rat, Zebrafish, Xenopus, Macaque, Chimpanzee, Guinea Pig, Tetraodon, Cow, Dog, Chicken, Fugu]

Manual annotation by GOA
• High-quality, specific annotations using:
  • Peer-reviewed papers
  • A range of evidence codes to categorize the types of evidence found in a paper
www.ebi.ac.uk/GOA

Finding annotations in a paper
…for B. napus PERK1 protein (Q9ARH1)
In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…
PubMed ID: 12374299
• Function: protein serine/threonine kinase activity GO:0004674
• Component: integral to plasma membrane GO:0005887
• Process: response to wounding GO:0009611
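The pieces of an annotation (gene product, GO term, reference, evidence code) and the rule that only certain manual evidence codes are projected to orthologs can be sketched as follows. The `Annotation` class, the `projectable` helper and the second record are assumptions made up for illustration; only the P00505 row and the list of codes come from the slides.

```python
from dataclasses import dataclass

# Evidence codes the slides list as eligible for projection to orthologs.
PROJECTABLE_CODES = {"IDA", "IEP", "IGI", "IMP", "IPI"}


@dataclass(frozen=True)
class Annotation:
    accession: str   # gene product accession
    name: str        # gene product name
    go_id: str
    term_name: str
    reference: str
    evidence: str    # evidence code


def projectable(annotations):
    """Keep only annotations whose evidence code is in the projectable set."""
    return [a for a in annotations if a.evidence in PROJECTABLE_CODES]


annotations = [
    # Example row from the slides.
    Annotation("P00505", "GOT2", "GO:0004069",
               "aspartate transaminase activity", "PMID:2731362", "IDA"),
    # Hypothetical electronic annotation, made up for contrast
    # (IEA is the evidence code used for electronic annotation).
    Annotation("X00000", "EXAMPLE", "GO:0004069",
               "aspartate transaminase activity", "REF:example", "IEA"),
]

for a in projectable(annotations):
    print(a.accession, a.go_id, a.evidence)
```

Filtering on the evidence code is what keeps electronically inferred annotations from being propagated a second time through orthology.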
Additional information
• Qualifiers: modify the interpretation of an annotation
  • NOT (protein is not associated with the GO term)
  • colocalizes_with (protein associates with complex but is not a bona fide member)
  • contributes_to (describes action of a complex of proteins)
• 'With' column: can include further information on the method being referenced, e.g. the protein accession of an interacting protein

The NOT qualifier
• NOT is used to make an explicit note that the gene product is not associated with the GO term
• Also used to document conflicting claims in the literature
• NOT can be used with ALL three gene ontologies

In these cells, SIPP1 was mainly present in the nucleus, where it displayed a non-uniform, speckled distribution and appeared to be excluded from the nucleoli.

The colocalizes_with qualifier
• Gene products that are transiently or peripherally associated with an organelle or complex
• ONLY used with GO component ontology

Immunoblot analysis with anti-PSI polyclonal antibodies of U1 snRNP particles affinity purified from Drosophila embryonic nuclear extracts showed that PSI is physically associated with U1 snRNP (Figure 1A, top panel). Association of U1 snRNP with GST-PSI was detected by ethidium bromide staining of the selected snRNAs and was confirmed by blot hybridization with an antisense U1 snRNA riboprobe (Figure 1C, lane 4).
The contributes_to qualifier
• Where an individual gene product that is part of a complex can be annotated to terms that describe the action (function or process) of the whole complex
• contributes_to is not needed to annotate a catalytic subunit. Furthermore, contributes_to may be used for any non-catalytic subunit, whether the subunit is essential for the activity of the complex or not
• Annotations to contributes_to often use the IC evidence code, but others may also be used
• ONLY used with GO function ontology

..we next examined whether a complex of four proteins can be formed…. As shown in Figure 4, FLAG-tagged PIG-C was precipitated efficiently with anti-FLAG beads in four combinations with other proteins (Figure 4A, lanes 1–4)….. These results strongly suggest that all four proteins form a complex.

.. To test whether the protein complex consisting of PIG-A, PIG-H, PIG-C and hGPI1 has GlcNAc transferase activity in vitro…. …incubation of the radiolabeled donor of GlcNAc, UDP[6-3H]GlcNAc, with lysates of JY5 cells transfected with GST-tagged PIG-A resulted in synthesis of GlcNAc-PI and its subsequent deacetylation to glucosaminyl phosphatidylinositol (GlcN-PI)

Unknown vs. Unannotated
• When there is no existing data to support an annotation, the gene is annotated to the ROOT (top level) term
• NOT the same as having no annotation at all
• No annotation means that no one has looked yet
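The restrictions on where each qualifier may be used can be written down as a small lookup. This is a sketch only: the function name `qualifier_allowed` and the plain-word ontology labels are assumptions for the example, not an official validation routine.

```python
# Which of the three ontologies each qualifier may be used with, as stated on the slides.
ALLOWED_ONTOLOGIES = {
    "NOT": {"function", "process", "component"},
    "colocalizes_with": {"component"},
    "contributes_to": {"function"},
}


def qualifier_allowed(qualifier, ontology):
    """Return True if the qualifier may be combined with a term from this ontology."""
    return ontology in ALLOWED_ONTOLOGIES.get(qualifier, set())


print(qualifier_allowed("colocalizes_with", "component"))
print(qualifier_allowed("contributes_to", "process"))
```

The first call reports an allowed combination and the second a disallowed one; an unrecognised qualifier is treated as not allowed anywhere.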
WITH column
• The with column provides supporting evidence for ISS, IPI, IGI and IC evidence codes
  • ISS: the accession of the aligned protein/ortholog
  • IPI: the accession of the interacting protein
  • IGI: the accession of the interacting gene
  • IC: the GO:ID for the inferred_from term

How to access GO annotation data

Where can you find annotations?
• UniProtKB
• Ensembl
• Entrez gene

Gene Association Files
• 17 column files containing all information for each annotation
• GO Consortium website
• GOA website

GO browsers

QuickGO browser
• Search GO terms or proteins
• Find sets of GO annotations

Exercise
Searching for GO annotations in QuickGO
• Exercise 2
• Exercise 3

Exercise
Using QuickGO to create a tailored set of annotations
• Exercise 4: Filtering
• Exercise 5: Statistics

How scientists use the GO, and the tools they use for analysis

Using GO annotations
• If you wanted to find out the role of a gene product manually, you’d have to read an awful lot of papers
• But by using GO annotations, this work has already been done for you!
GO:0006915 : apoptosis

How scientists use the GO
• Access gene product functional information
• Analyse high-throughput genomic or proteomic datasets
• Validation of experimental techniques
• Get a broad overview of a proteome
• Obtain functional information for novel gene products
• Some examples…
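For working with annotations programmatically, the 17-column gene association files mentioned above can be read with a few lines of code. The sketch below assumes the tab-separated GAF 2.0 layout with "!" comment lines; the dictionary keys are labels chosen for this example, and the taxon, date and assigned-by values in the sample line are placeholders rather than real data.

```python
from collections import defaultdict

# Labels for the 17 columns, in file order (the key names are this example's own choice).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
    "annotation_extension", "gene_product_form_id",
]


def parse_gaf(lines):
    """Turn tab-separated annotation lines into dictionaries, skipping comments."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(GAF_COLUMNS):
            raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
        records.append(dict(zip(GAF_COLUMNS, fields)))
    return records


def terms_by_product(records):
    """Index GO IDs by gene product accession."""
    index = defaultdict(set)
    for record in records:
        index[record["db_object_id"]].add(record["go_id"])
    return dict(index)


# Sample line built from the GOT2 example; taxon, date and assigned_by are placeholders.
sample = "\t".join([
    "UniProtKB", "P00505", "GOT2", "", "GO:0004069", "PMID:2731362", "IDA",
    "", "F", "", "", "protein", "taxon:9606", "20110215", "EXAMPLE", "", "",
])

records = parse_gaf(["!gaf-version: 2.0", sample])
print(terms_by_product(records))
```

The `with_from` field is where the WITH column described earlier ends up, so supporting accessions for ISS, IPI, IGI and IC annotations can be read from the same record.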
MicroArray data analysis
[Figure: gene tree of expression over time in attacked and control samples (Pearson correlation, all genes: 14010), with clusters labelled by GO term: Defense response, Immune response, Response to stimulus, Toll regulated genes, JAK-STAT regulated genes, Puparial adhesion, Molting cycle, Hemocyanin, Amino acid catabolism, Lipid metabolism, Peptidase activity, Protein catabolism]
Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EB

Validation of experimental techniques
Rat liver plasma membrane isolation (Cao et al., Journal of Proteome Research 2006)

Analysis of high-throughput proteomic datasets
Characterisation of proteins interacting with ribosomal protein S19 (Orrù et al., Molecular and Cellular Proteomics 2007)

Obtain functional information for novel gene products
MPYVSQSQHIDRVRGAIEGRLPAPGNSSRLVSSWQRSYEQYRLDPGSVIGPRVLTS SELR DVQGKEEAFLRASGQCLARLHDMIRMADYCVMLTDAHGVTIDYRIDRDRRGD FKHAGLYI GSCWSEREEGTCGIASVLTDLAPITVHKTDHFRAAFTTLTCSASPIFAPTG ELIGVLDAS AVQSPDNRDSQRLVFQLVRQSAALIEDGYFLNQTAQHWMIFGHASRN FVEAQPEVLIAFD ECGNIAASNRKAQECIAGLNGPRHVDEIFDTSAVHLHDVARTDTI MPLRLRATGAVLYAR IRAPLKRVSRSACAVSPSHSGQGTHDAHNDTNLDAISRFLHS RDSRIARNAEVALRIAGK HLPILILGETGVGKEVFAQALHASGARRAKPFVAVNCGAIP DSLIESELFGYAPGAFTGA RSRGARGKIAQAHGGTLFLDEIGDMPLNLQTRLLRVLA EGEVLPLGGDAPVRVDIDVICA THRDLARMVEEGTFREDLYYRLSGATLHMPPLRER ADILDVVHAVFDEEAQSAGHVLTLD GRLAERLARFSWPGNIRQLRNVLRYACAVCDS TRVELRHVSPDVAALLAPDEAALRPALA LENDERARIVDALTRHHWRPNAAAEALGM
InterProScan
Annotating novel sequences
• Can use BLAST queries to find similar sequences with GO annotation which can be transferred to the new sequence
• Two tools currently available:
  • AmiGO BLAST (from GO Consortium) http://amigo.geneontology.org/cgi-bin/amigo/blast.cgi
    • searches the GO Consortium database
  • BLAST2GO (from Babelomics) http://www.blast2go.org/
    • searches the NCBI database

AmiGO BLAST
Exportin-T from Pongo abelii (Sumatran orangutan)

Numerous Third Party Tools
• Many tools exist that use GO to find common biological functions from a list of genes: http://www.geneontology.org/GO.tools.microarray.shtml

GO tools: enrichment analysis
• Most of these tools work in a similar way:
  • input a gene list and a subset of ‘interesting’ genes
  • tool shows which GO categories have most interesting genes associated with them, i.e. which categories are ‘enriched’ for interesting genes
  • tool provides a statistical measure to determine whether enrichment is significant
• Try exercise 7 at home

GO slims
• Many GO analysis tools use GO slims to give a broad overview of the dataset
• GO slims are cut-down versions of the GO and contain a subset of the terms in the whole GO
• GO slims usually contain less-specialised GO terms

Slimming the GO using the ‘true path rule’
Many gene products are associated with a large number of descriptive, leaf GO nodes:

Slimming the GO using the ‘true path rule’
…however annotations can be mapped up to a smaller set of parent GO terms:
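Mapping annotations up to a slim can be sketched as a walk up the graph. This is a deliberately simplified illustration, not the algorithm used by QuickGO or AmiGO: the functions `ancestors` and `map_to_slim` are hypothetical names, terms are keyed by name rather than GO ID, and the toy graph keeps only the two is_a examples from the slides plus shortcut links to the roots that are made up for the example.

```python
def ancestors(term, parents):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(term, parents, slim):
    """Return the slim terms that the term itself or any of its ancestors matches."""
    return ({term} | ancestors(term, parents)) & slim


# Toy graph: child -> parents. Links to the root terms are shortcuts made up for the example.
parents = {
    "mitotic cell cycle": ["cell cycle"],
    "cell cycle": ["biological_process"],
    "lyase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
}

# Hypothetical slim containing two less-specialised terms.
slim = {"cell cycle", "catalytic activity"}

# Hypothetical gene product annotated to two specific terms.
annotations = {"protein_A": ["mitotic cell cycle", "lyase activity"]}

slimmed = {
    product: set().union(*(map_to_slim(term, parents, slim) for term in terms))
    for product, terms in annotations.items()
}
print(slimmed)
```

Because child terms inherit the meaning of their parents, an annotation to a specific term also holds for each ancestor, which is what makes this upward mapping valid. Real slimming tools additionally track relationship types and typically avoid reporting a slim term when a more specific slim term already covers the annotation.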
GO slims
• Custom slims are available for download: http://www.geneontology.org/GO.slims.shtml
• Or you can make your own using:
  • QuickGO
    • http://www.ebi.ac.uk/QuickGO
  • AmiGO's GO slimmer
    • http://amigo.geneontology.org/cgi-bin/amigo/slimmer

Slimming with QuickGO
• Search GO terms or proteins
• Find sets of GO annotations
• Map-up annotations with GO slims
www.ebi.ac.uk/QuickGO

Exercise
Map-up annotation using a GO slim
• Exercise 6

Just some things to be aware of….
• The GO is continually changing
  • Ontology: new terms created, existing terms obsoleted, ontology re-structured
  • Annotation: new annotations being created
• ALWAYS use a current version of ontology and annotations
• If publishing your analyses, please report the versions/dates you use: http://www.geneontology.org/GO.cite.shtml
• Differences in representation of GO terms may be due to biological phenomena, but may also be due to annotation bias or experimental assays
• Often better to remove the ‘NOT’ annotations before doing any large-scale analysis, as they can skew the results

Thank you