GO Further
24th Feb 2006 Jane Lomax

GO annotations
• Where do the links between genes and GO terms come from?

GO annotations
• Contributing databases:
  – Berkeley Drosophila Genome Project (BDGP)
  – dictyBase (Dictyostelium discoideum)
  – FlyBase (Drosophila melanogaster)
  – GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
  – UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
  – Gramene (grains, including rice, Oryza)
  – Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
  – Rat Genome Database (RGD) (Rattus norvegicus)
  – Reactome
  – Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
  – The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
  – The Institute for Genomic Research (TIGR): databases on several bacterial species
  – WormBase (Caenorhabditis elegans)
  – Zebrafish Information Network (ZFIN) (Danio rerio)

Species coverage
• All major eukaryotic model organism species
• Human via GOA group at UniProt
• Several bacterial and parasite species through TIGR and GeneDB at Sanger
  – many more in pipeline

Annotation coverage

Anatomy of a GO annotation
• Three key parts:
  – gene name/id
  – GO term(s)
  – evidence for association

Example annotation
• Breast cancer type 1 susceptibility protein gene in humans

Types of GO annotation:
• Electronic Annotation
• Manual Annotation

Manual annotation
• Created by scientific curators
• High quality
• Small number

Manual annotation
In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase.
We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GTP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…

Manual annotation

Electronic Annotation
• Annotation derived without human validation
  – mappings file e.g. interpro2go, ec2go
  – Blast search ‘hits’
• Lower ‘quality’ than experimental codes

Mappings files
• Fatty acid biosynthesis (Swiss-Prot keyword) → GO:Fatty acid biosynthesis (GO:0006633)
• EC:6.4.1.2 (EC number) → GO:acetyl-CoA carboxylase activity (GO:0003989)
• IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO:acetyl-CoA carboxylase activity (GO:0003989)

Evidence types
• ISS: Inferred from Sequence/structural Similarity
• IDA: Inferred from Direct Assay
• IPI: Inferred from Physical Interaction
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No Data available
• IEA: Inferred from Electronic Annotation

GO terms
• Where do GO terms come from?
  – most GO terms are added by the GO editorial office at EBI
  – new terms are usually only added when they are asked for by annotators
  – GO editors work with experts to make major ontology developments
    • metabolism
    • pathogenesis
    • cell cycle

GO stats
• almost 20,000 GO terms
  – 10452 biological_process
  – 1687 cellular_component
  – 7393 molecular_function

Growth of GO
[Figure: GO term history 2001 - 2005. Number of terms (0 to 25000) plotted against date (Jan-01 to Mar-05); series: defined terms, undefined terms, obsoletes.]

No GO Areas
• GO covers ‘normal’ functions and processes
  – No pathological processes
  – No experimental conditions
• NO evolutionary relationships
• NO gene products
• NOT a system of nomenclature

Open Biomedical Ontologies (OBO)
• A repository for well-structured controlled vocabularies for shared use across different biological and medical domains: http://obo.sourceforge.net/

Open Biomedical Ontologies (OBO)
• Requirements for inclusion: http://obo.sourceforge.net/crit.html

AmiGO exercise

Annotation exercise
• We have provided a Nature paper (PMID: 14961121) for you to annotate with GO terms
  – This will help you to understand how the information is extracted from papers and GO terms are applied by the curators
  – It will also give you the opportunity to use another GO browser developed at EBI: QuickGO

Annotation exercise
• The gene you are annotating is VG5Q
  – To make it easier we’ve highlighted some of the most relevant passages in the text
• Use the GO browser QuickGO to look for the most appropriate GO terms:
  – http://www.ebi.ac.uk/ego/

Annotation exercise
• In QuickGO, you search for the GO terms by name: http://www.ebi.ac.uk/ego/

Annotation exercise
• Remember, as well as the GO term, you also need to assign an evidence code
  – to remind you, we’ve included a list of the evidence codes at the back of the paper

Annotation exercise
• To see how your annotations compared to those done by the GO curator, search QuickGO for Q8N302
  – This is the UniProt id for the gene VG5Q
• Click ‘show only manual’ and this will show you the annotations the curator made
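
Illustrative sketch (not part of the original slides)
The three-part anatomy of an annotation (gene id, GO term, evidence code) and the idea behind a ‘show only manual’ filter can be sketched in a few lines of Python. This assumes that ‘manual’ means any evidence code other than IEA. The Annotation class, the helper names, and the gene ids GENE_A and GENE_B are hypothetical. The pairings of genes, terms and evidence codes are made up for illustration and are not real curated annotations. It is not how QuickGO itself is implemented.

```python
from dataclasses import dataclass

# Evidence codes as listed on the "Evidence types" slide.
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record holding the three key parts of a GO annotation."""
    gene_id: str   # gene name/id
    go_id: str     # GO term identifier, e.g. "GO:0003989"
    evidence: str  # evidence code for the association

    def __post_init__(self):
        # Reject codes that are not on the slide's list.
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence}")


def is_manual(annotation):
    """Assumption: every code except IEA was assigned by a curator."""
    return annotation.evidence != "IEA"


def only_manual(annotations):
    """Keep only the annotations that pass the is_manual check."""
    return [a for a in annotations if is_manual(a)]


# Made-up example data: placeholder gene ids paired with the two GO terms
# from the "Mappings files" slide.
annotations = [
    Annotation("GENE_A", "GO:0003989", "IDA"),
    Annotation("GENE_A", "GO:0006633", "IEA"),
    Annotation("GENE_B", "GO:0006633", "TAS"),
]

manual = only_manual(annotations)
for a in manual:
    print(a.gene_id, a.go_id, a.evidence, "-", EVIDENCE_CODES[a.evidence])
```

Keeping the evidence code on every record is what makes this kind of filtering possible: the same gene and GO term pairing can be kept or dropped depending on how the association was made.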