* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download uniprotkb-goa_aug2011
Genome evolution wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
Metagenomics wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Point mutation wikipedia , lookup
Public health genomics wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Gene desert wikipedia , lookup
Genome (book) wikipedia , lookup
Gene therapy wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Gene expression programming wikipedia , lookup
Neuronal ceroid lipofuscinosis wikipedia , lookup
Gene therapy of the human retina wikipedia , lookup
Helitron (biology) wikipedia , lookup
Microevolution wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Protein moonlighting wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Designer baby wikipedia , lookup
Gene nomenclature wikipedia , lookup
Gene expression profiling wikipedia , lookup
Introduction to the Gene Ontology and GO annotation resources Rachael Huntley UniProtKB-GOA Curator EBI EBI is an Outstation of the European Molecular Biology Laboratory. Talk Overview • Intro to GO and GO terms • Annotating to GO • Accessing GO annotations • Practical use of GO • GO analysis tools • Precautions 2 Reasons for the Gene Ontology • Increasing amounts of biological data available • Large datasets need to be interpreted quickly • Inconsistency in naming of biological concepts 3 www.geneontology.org Increasing amounts of biological data available Search on ‘DNA repair’... get over 60,000 results Expansion of sequence information 4 Reasons for the Gene Ontology • Increasing amounts of biological data available • Large datasets need to be interpreted quickly • Inconsistency in naming of biological concepts 5 Large datasets need to be interpreted quickly • Need to organise the data, analyse it, share it in a timely manner to benefit other researchers • Inconsistencies in analyses make crossspecies or cross-database comparison difficult 6 Reasons for the Gene Ontology • Increasing amounts of biological data available • Large datasets need to be interpreted quickly • Inconsistency in naming of biological concepts 7 Inconsistency in naming of biological concepts English is not a very precise language • Same name for different concepts • Different names for the same concept An example … Taction Tactition Tactile sense ? Sensory perception of touch ; GO:0050975 8 The Gene Ontology Less specific concepts • A way to capture biological knowledge in a written and computable form • A set of concepts and their relationships to each other arranged as a hierarchy More specific concepts www.ebi.ac.uk/QuickGO 9 The Concepts in GO 1. Molecular Function • • An elemental activity or task or job protein kinase activity insulin receptor activity 2. Biological Process A commonly recognised series of events • cell division • mitochondrion 3. Cellular Component Where a gene product is located 10 • mitochondrial matrix • mitochondrial inner membrane Anatomy of a GO term Unique identifier Term name Synonyms Definition Cross-references 11 Ontology structure • Directed acyclic graph Terms can have more than one parent • Terms are linked by relationships is_a part_of regulates +ve regulates -ve regulates 12 www.ebi.ac.uk/QuickGO Searching for GO terms http://www.ebi.ac.uk/QuickGO http://amigo.geneontology.org/cgi-bin/amigo/go.cgi … there are more browsers available on the GO Consortium Tools page: http://www.geneontology.org/GO.tools.shtml The latest OBO format Gene Ontology file can be downloaded from: http://www.geneontology.org/ontology/gene_ontology.obo 13 The EBI's QuickGO browser Search GO terms or proteins 14 http://www.ebi.ac.uk/QuickGO GO Annotation 15 http://www.geneontology.org Reactome 16 Aims of the GO project • Compile the ontologies - currently over 34,000 terms - constantly increasing and improving • Annotate gene products using ontology terms - around 30 groups provide annotations • Provide a public resource of data and tools - regular releases of annotations - tools for browsing/querying annotations and editing the ontology 17 Gene Ontology Annotation (UniProtKB-GOA) database at the EBI • Largest open-source contributor of annotations to GO • Member of the GO Consortium since 2001 • Provides annotation for more than 360,000 species • UniProtKB-GOA’s priority is to annotate the human proteome • UniProtKB-GOA is responsible for human, chicken and bovine annotations in the GO Consortium 18 A GO annotation is … …a statement that a gene product; 1. has a particular molecular function or is involved in a particular biological process or is located within a certain cellular component 2. as determined by a particular method 3. as described in a particular reference Accession Name P00505 19 GO ID GO term name GOT2 GO:0004069 aspartate transaminase activity Reference PMID:2731362 Evidence code IDA GOA makes annotations using two methods; Electronic • Quick way of producing large numbers of annotations • Annotations use less-specific GO terms Manual • Time-consuming process producing lower numbers of annotations • Annotations are very detailed and accurate 20 Evidence Codes Some examples… Electronic Inferred from Electronic Annotation (IEA) – e.g. UniProtKB keyword mapping Manual Inferred from Direct Assay (IDA) – e.g. enzyme assay Inferred from Physical Interaction (IPI) – e.g. yeast 2-hybrid See the full list on the GO Consortium website http://www.geneontology.org/GO.evidence.shtml 21 Electronic annotation by UniProtKB-GOA 1. Mapping of external concepts to GO terms e.g. InterPro2GO, Swiss-Prot Keyword2GO, Enzyme Commission2GO 2. Automatic transfer of annotations to orthologs Ensembl compara Macaque Chimpanzee Guinea Pig Rat e.g. Human Mouse Cow Dog Chicken Annotations are high-quality and have an explanation of the method (GO_REF) 22 Mapping of concepts from UniProtKB files GO:0005856: cytoskeleton GO:0004715 ; non-membrane spanning protein tyrosine kinase activity GO:0007155 ; cell adhesion 23 Electronic annotation by UniProtKB-GOA 1. Mapping of external concepts to GO terms e.g. InterPro2GO, Swiss-Prot Keyword2GO, Enzyme Commission2GO 2. Automatic transfer of annotations to orthologs Ensembl compara Macaque Chimpanzee Guinea Pig Rat ...and more e.g. Human Mouse Cow Dog Chicken Annotations are high-quality and have an explanation of the method (GO_REF) 24 Manual annotation by UniProtKB-GOA High–quality, specific annotations made using: • Peer-reviewed papers • A range of evidence codes to categorise the types of evidence found in a paper 25 http://www.ebi.ac.uk/GOA Finding annotations in a paper …for B. napus PERK1 protein (Q9ARH1) In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, serine/threonine kinase activity, the kinase domain of PERK1 has serine/threonine In addition, the location of a PERK1-GTP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral integralmembrane membraneprotein protein…these kinases have been implicated in early stages of wound wound response… response PubMed ID: 12374299 Function: 26 protein serine/threonine kinase activity GO:0004674 Component: integral to plasma membrane GO:0005887 Process: response to wounding GO:0009611 Additional information • Qualifiers Modify the interpretation of an annotation • NOT (protein is not associated with the GO term) • colocalizes_with (protein associates with complex but is not a bona fide member) • contributes_to (describes action of a complex of proteins) • 'With' column Can include further information on the method being referenced e.g. the protein accession of an interacting protein 27 UniProtKB-GOA integrates manual annotations from external groups Human Protein Atlas GO:0005739 Mitochondrion also; LifeDB (subcellular locations) Reactome (pathways) 28 Status of UniProtKB-GOA Annotation Evidence Source Electronic annotations Manual annotations* Proteins UniProt Coverage 98,037,844 11,261,382 66% 910,536 159,630 0.9% Annotations Aug 2011 Statistics * Includes manual annotations integrated from external model organism and multi-species databases 29 How to access and use GO annotation data 30 Where can you find annotations? UniProtKB Ensembl Entrez gene 31 Gene Association Files 17 column files containing all information for each annotation GO Consortium website UniProtKB-GOA website 32 GO browsers 33 The EBI's QuickGO browser Search GO terms or proteins Find sets of GO annotations 34 http://www.ebi.ac.uk/QuickGO How scientists use the GO • Access gene product functional information • Analyse high-throughput genomic or proteomic datasets • Validation of experimental techniques • Get a broad overview of a proteome • Obtain functional information for novel gene products Some examples… 35 Analysis of high-throughput genomic datasets time Defense response Immune response Response to stimulus Toll regulated genes JAK-STAT regulated genes Puparial adhesion Molting cycle Hemocyanin MicroArray data analysis Amino acid catabolism Lipid metobolism Peptidase activity Protein catabolism Immune response Immune response Toll regulated genes 36 attacked control pears on lw n3d ... lw n3d ...Colored cted Gene Tree: pearson Coloredby: by: : color Set_LW_n3d_5p_... Gene Lis t: ch classification: Set_LW_n3d_5p_... Gene List: Copy ofofCopy of C5_RMA (Defa... Copy of Copy C5_RMA (Defa... genes allall genes (14010)(14010) Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI. Analysis of high-throughput proteomic datasets Characterisation of proteins interacting with ribosomal protein S19 (Orrù et al., Molecular and Cellular Proteomics 2007) 37 Validation of experimental techniques Rat liver plasma membrane isolation (Cao et al., Journal of Proteome Research 2006) 38 Obtain functional information for novel gene products MPYVSQSQHIDRVRGAIEGRLPAPGNSSRLVSSWQRSYEQYRLDPGSVIGPRVLTS SELR DVQGKEEAFLRASGQCLARLHDMIRMADYCVMLTDAHGVTIDYRIDRDRRGD FKHAGLYI GSCWSEREEGTCGIASVLTDLAPITVHKTDHFRAAFTTLTCSASPIFAPTG ELIGVLDAS AVQSPDNRDSQRLVFQLVRQSAALIEDGYFLNQTAQHWMIFGHASRN FVEAQPEVLIAFD ECGNIAASNRKAQECIAGLNGPRHVDEIFDTSAVHLHDVARTDTI MPLRLRATGAVLYAR IRAPLKRVSRSACAVSPSHSGQGTHDAHNDTNLDAISRFLHS RDSRIARNAEVALRIAGK HLPILILGETGVGKEVFAQALHASGARRAKPFVAVNCGAIP DSLIESELFGYAPGAFTGA RSRGARGKIAQAHGGTLFLDEIGDMPLNLQTRLLRVLA EGEVLPLGGDAPVRVDIDVICA THRDLARMVEEGTFREDLYYRLSGATLHMPPLRER ADILDVVHAVFDEEAQSAGHVLTLD GRLAERLARFSWPGNIRQLRNVLRYACAVCDS TRVELRHVSPDVAALLAPDEAALRPALA LENDERARIVDALTRHHWRPNAAAEALGM InterProScan 39 www.ebi.ac.uk/InterProScan Annotating novel sequences • Can use BLAST queries to find similar sequences with GO annotation which can be transferred to the new sequence • Two tools currently available; AmiGO BLAST (from GO Consortium) http://amigo.geneontology.org/cgi-bin/amigo/blast.cgi – searches the GO Consortium database BLAST2GO (from Babelomics) http://www.blast2go.org/ – searches the NCBI database 40 AmiGO BLAST Exportin-T from Pongo abelii (Sumatran orangutan) 41 amigo.geneontology.org/cgi-bin/amigo/blast.cgi Analysis of large datasets using GO 42 Using the GO to provide a functional overview for a large dataset • Many GO analysis tools use GO slims to give a broad overview of the dataset • GO slims are cut-down versions of the GO and contain a subset of the terms in the whole GO • GO slims usually contain less-specialised GO terms 43 Slimming the GO using the ‘true path rule’ Many gene products are associated with a large number of descriptive, leaf GO nodes: 44 Slimming the GO using the ‘true path rule’ …however annotations can be mapped up to a smaller set of parent GO terms: 45 GO slims Custom slims are available for download; http://www.geneontology.org/GO.slims.shtml or you can make your own using; • QuickGO http://www.ebi.ac.uk/QuickGO • AmiGO's GO slimmer http://amigo.geneontology.org/cgi-bin/amigo/slimmer 46 The EBI's QuickGO browser Search GO terms or proteins Find sets of GO annotations Map-up annotations with GO slims 47 www.ebi.ac.uk/QuickGO GO analysis tools 48 Numerous Third Party Tools 49 www.geneontology.org/GO.tools.shtml Term enrichment • Most popular type of GO analysis • Determines which GO terms are more often associated with a specified list of genes/proteins compared with a control list or rest of genome • Many tools available to do this analysis • User must decide which is best for their analysis 50 Precautions when using GO annotations for analysis • The Gene Ontology is always changing and GO annotations are continually being created - always use a current version of both - if publishing your analyses please report the versions/dates you used http://www.geneontology.org/GO.cite.shtml • Recommended that ‘NOT’ annotations are removed before analysis - only ~5000 out of ~100 million annotations are ‘NOT’ - can confuse the analysis 51 Precautions when using GO annotations for analysis • Unannotated is not unknown - where there is no evidence in the literature for a process, function or location the gene product is annotated to the appropriate ontology’s root node with an ‘ND’ evidence code (no biological data), thereby distinguishing between unannotated and unknown • Pay attention to under-represented GO terms - a strong under-representation of a pathway may mean that normal functioning of that pathway is necessary for the given condition 52 The UniProtKB-GOA group Curators: Emily Dimmer Rachael Huntley Yasmin Alam-Faruque Software developer: Tony Sawford Team leaders: Rolf Apweiler Claire O’Donovan Email: [email protected] 53 http://www.ebi.ac.uk/GOA Acknowledgements Members of; UniProtKB GO Consortium InterPro IntAct HAMAP Funding National Human Genome Research Institute (NHGRI) Kidney Research UK EMBL 54