Getting Started: a user’s guide to the GO
GO Workshop, 3-6 August 2010

1. Provides structural annotation for agriculturally important genomes
2. Provides functional annotation (GO)
3. Provides tools for functional modeling
4. Provides bioinformatics & modeling support for research community

Avian Gene Nomenclature

• Introduction to GO
• Anatomy of a GO term: a GO annotation example
• GO evidence codes
• Making annotations: literature biocuration & computation analysis
• ND vs no GO
• Using the GO
• GO tools
• Functional modeling considerations

Gene Ontology (GO)
• Not about genes!
• Gene products: genes, transcripts, ncRNA, proteins
• The GO describes gene product function
• Not a single ontology
    • Biological Process (BP or P)
    • Molecular Function (MF or F)
    • Cellular Component (CC or C)

What is the Gene Ontology?
“a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”
• assign functions to gene products at different levels, depending on how much is known about a gene product
• is used for a diverse range of species
• structured to be queried at different levels, e.g.:
    • find all the chicken gene products in the genome that are involved in signal transduction
    • zoom in on all the receptor tyrosine kinases
• human-readable GO function has a digital tag to allow computational analysis of large datasets
COMPUTATIONALLY AMENABLE ENCYCLOPEDIA OF GENE FUNCTIONS AND THEIR RELATIONSHIPS

Ontologies
• relationships between terms
• digital identifier (computers)
• description (humans)

As of ontology version 1.1348 (27/07/2010):
32,091 terms, 99.3% defined
• 19,169 biological process
• 2,745 cellular component
• 8,736 molecular function
1,441 obsolete terms (not included in figures above)

GO annotation example
NDUFAB1 (UniProt P52505)
Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa

Biological Process (BP or P)
GO:0006633   fatty acid biosynthetic process                       TAS
GO:0006120   mitochondrial electron transport, NADH to ubiquinone  TAS
GO:0008610   lipid biosynthetic process                            IEA

Molecular Function (MF or F)
GO:0005504   fatty acid binding                                    IDA
GO:0008137   NADH dehydrogenase (ubiquinone) activity              TAS
GO:0016491   oxidoreductase activity                               TAS
GO:0000036   acyl carrier activity                                 IEA

Cellular Component (CC or C)
GO:0005759   mitochondrial matrix                                  IDA
GO:0005747   mitochondrial respiratory chain complex I             IDA
GO:0005739   mitochondrion                                         IEA

Parts of each annotation:
• GO:ID (unique)
• GO term name
• GO evidence code
• aspect or ontology

GO EVIDENCE CODES
Guide to GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml

Direct Evidence Codes
• IDA - inferred from direct assay
• IEP - inferred from expression pattern
• IGI - inferred from genetic interaction
• IMP - inferred from mutant phenotype
• IPI - inferred from physical interaction

Indirect Evidence Codes
inferred from literature
• IGC - inferred from genomic context
• TAS - traceable author statement
• NAS - non-traceable author statement
• IC - inferred by curator
inferred by sequence analysis
• RCA - inferred from reviewed computational analysis
• IS* - inferred from sequence*
    • ISS - inferred from sequence or structural similarity
    • ISA - inferred from sequence alignment
    • ISO - inferred from sequence orthology
    • ISM - inferred from sequence model
• IEA - inferred from electronic annotation

Other
• NR - not recorded (historical)
• ND - no biological data available

Biocuration of literature
• detailed function
• “depth”
• slower (manual)

Biocuration of Literature: detailed gene function
P05147
• Find a paper about the protein. PMID: 2976880
• Read paper to get experimental evidence of function
• Use most specific term possible
• experiment assayed kinase activity: use IDA evidence code

Sequence analysis
• rapid (computational)
• “breadth” of coverage
• less detailed

Unknown Function vs No GO
• ND: no data
    • Biocurators have tried to add GO but there is no functional data available
    • Previously: “process_unknown”, “function_unknown”, “component_unknown”
    • Now: “biological process”, “molecular function”, “cellular component”
• No annotations (including no “ND”): biocurators have not annotated
    • this is important for your dataset: what % has GO?

Using the GO
• Decide on GO analysis tool
• How much GO is available for your species?
• Getting GO for your data set
• Adding GO for your data
http://www.geneontology.org/
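The question “what % has GO?” can be checked with a few lines of code once the annotations are in hand. The sketch below is one possible way to do it, assuming annotations are available as (gene product, GO ID, evidence code) triples. The function name go_coverage and the identifiers GENE_B, GENE_C and GENE_D are hypothetical; only the NDUFAB1 rows come from the annotation example. It separates gene products with at least one non-ND annotation, gene products annotated only with ND, and gene products with no annotation at all.

```python
from collections import defaultdict

# (gene product, GO ID, evidence code).
# NDUFAB1 rows are taken from the annotation example; GENE_B is a
# hypothetical gene product annotated only to the three root terms with ND.
EXAMPLE_ANNOTATIONS = [
    ("NDUFAB1", "GO:0006633", "TAS"),
    ("NDUFAB1", "GO:0005759", "IDA"),
    ("NDUFAB1", "GO:0005739", "IEA"),
    ("GENE_B", "GO:0008150", "ND"),  # biological process (root term)
    ("GENE_B", "GO:0003674", "ND"),  # molecular function (root term)
    ("GENE_B", "GO:0005575", "ND"),  # cellular component (root term)
]

# Hypothetical dataset: GENE_C and GENE_D have no annotations at all.
EXAMPLE_GENES = ["NDUFAB1", "GENE_B", "GENE_C", "GENE_D"]


def go_coverage(gene_ids, annotations):
    """Return the percentage of gene products in each annotation category."""
    codes_by_gene = defaultdict(set)
    for gene_id, _go_id, evidence in annotations:
        codes_by_gene[gene_id].add(evidence)

    genes = list(dict.fromkeys(gene_ids))  # drop duplicates, keep order
    counts = {"annotated": 0, "nd_only": 0, "no_go": 0}
    for gene_id in genes:
        codes = codes_by_gene.get(gene_id, set())
        if not codes:
            counts["no_go"] += 1      # biocurators have not annotated
        elif codes == {"ND"}:
            counts["nd_only"] += 1    # curated, but no functional data
        else:
            counts["annotated"] += 1  # at least one informative annotation

    total = len(genes)
    return {
        category: (100.0 * n / total if total else 0.0)
        for category, n in counts.items()
    }


if __name__ == "__main__":
    for category, percent in go_coverage(EXAMPLE_GENES, EXAMPLE_ANNOTATIONS).items():
        print(f"{category}: {percent:.1f}%")
```

Keeping “nd_only” apart from “no_go” preserves the distinction made above: ND means a biocurator looked and found no functional data, while no annotation means nobody has looked yet.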
GO tools
• many of these tools do not support non-model organisms
• the tools have different computing requirements
• may be difficult to determine how up-to-date the GO annotations are…
Need to evaluate tools for your system.

Evaluating GO tools
Some criteria for evaluating GO tools:
1. Does it include my species of interest (or do I have to “humanize” my list)?
2. What does it require to set up (computer usage/online)?
3. What was the source for the GO (primary or secondary) and when was it last updated?
4. Does it report the GO evidence codes (and is IEA included)?
5. Does it report which of my gene products has no GO?
6. Does it report both over/under represented GO groups and how does it evaluate this?
7. Does it allow me to add my own GO annotations?
8. Does it represent my results in a way that facilitates discovery?

Some useful expression analysis tools:
• Database for Annotation, Visualization and Integrated Discovery (DAVID)
  http://david.abcc.ncifcrf.gov/
• AgriGO: GO Analysis Toolkit and Database for Agricultural Community
  http://bioinfo.cau.edu.cn/agriGO/
    • used to be EasyGO
    • chicken, cow, pig, mouse, cereals, dicots
    • includes Plant Ontology (PO) analysis
• Onto-Express
  http://vortex.cs.wayne.edu/projects.htm#Onto-Express
    • can provide your own gene association file
• Funcassociate 2.0: The Gene Set Functionator
  http://llama.med.harvard.edu/funcassociate/
    • can provide your own gene association file

Functional Modeling Considerations
Should I add my own GO?
• use GOProfiler to see how much GO is available for your species
• use GORetriever to find existing GO for your dataset
• Does analysis tool allow me to add my own GO?
Should I do GO analysis and pathway analysis and network analysis?
• different functional modeling methods show different aspects about your data (complementary)
• is this type of data available for your species (or a close ortholog)?
What tools should I use?
• which tools have data for your species of interest?
• what type of accessions are accepted?
• availability (commercial and freely available)

Overview of Functional Modeling Strategy
[Figure: flowchart of the functional modeling strategy. Box labels: Microarray IDs; ArrayIDer; Protein/Gene identifiers; GORetriever; Genes/Proteins with GO annotations; no GO annotations; GOanna; GO Enrichment analysis (DAVID, EasyGO/AgriGO, Onto-Express, Onto-Express-to-go (OE2GO)); GOSlimViewer (summarizes GO function); GOModeler (hypothesis testing); Pathways and network analysis (Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID). Yellow boxes represent AgBase tools; green/purple boxes are non-AgBase resources.]

For more information about GO
• GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml
• gene association file information: http://www.geneontology.org/GO.format.annotation.shtml
• tools that use the GO: http://www.geneontology.org/GO.tools.shtml
• GO Consortium wiki: http://wiki.geneontology.org/index.php/Main_Page
All websites are listed on the AgBase workshop website.
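Several of the tools above accept a gene association file, and criterion 4 asks whether evidence codes are reported and whether IEA is included. The sketch below shows one way to read such a file and set IEA annotations aside. It assumes the tab-delimited GAF 2.0 layout described at the gene association file link above (object ID in column 2, symbol in column 3, GO ID in column 5, evidence code in column 7, aspect in column 9). The function names are illustrative, and the reference, date and assigning-group values in the example lines are placeholders, not real annotation records.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Annotation(NamedTuple):
    object_id: str  # GAF column 2
    symbol: str     # GAF column 3
    go_id: str      # GAF column 5
    evidence: str   # GAF column 7
    aspect: str     # GAF column 9 (P, F or C)


def parse_gaf(lines: Iterable[str]) -> Iterator[Annotation]:
    """Yield annotations from GAF lines, skipping '!' comments and blanks."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:  # GAF rows have at least 15 columns
            continue
        yield Annotation(cols[1], cols[2], cols[4], cols[6], cols[8])


def without_iea(annotations: Iterable[Annotation]) -> List[Annotation]:
    """Keep only annotations that are not electronically inferred (IEA)."""
    return [a for a in annotations if a.evidence != "IEA"]


def example_line(go_id: str, evidence: str, aspect: str) -> str:
    """Build one GAF-style row for NDUFAB1; several fields are placeholders."""
    cols = [
        "UniProtKB", "P52505", "NDUFAB1", "", go_id,
        "PMID:0000000",  # placeholder reference
        evidence, "", aspect, "", "", "protein",
        "taxon:9913",    # Bos taurus
        "20100727",      # placeholder date
        "EXAMPLE",       # placeholder assigning group
    ]
    return "\t".join(cols)


EXAMPLE_GAF = [
    "!gaf-version: 2.0",
    example_line("GO:0005759", "IDA", "C"),
    example_line("GO:0005739", "IEA", "C"),
    example_line("GO:0006633", "TAS", "P"),
]

if __name__ == "__main__":
    for annotation in without_iea(parse_gaf(EXAMPLE_GAF)):
        print(annotation.symbol, annotation.go_id, annotation.evidence)
```

Filtering is kept separate from parsing so that the same parsed annotations can be analysed both with and without IEA, which makes it easier to see how much of a dataset's GO depends on electronic annotation.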