Introduction

This presentation is designed to show the features of four ‘third-party’ GO analysis tools. These tools, and others listed at http://www.geneontology.org/GO.tools.shtml#micro, can be used in proteomics studies to view the GO terms associated with a list of proteins obtained from high-throughput experiments, together with the statistical significance of those terms compared with a reference set of proteins.*

Each presentation was prepared by the developers of the tool, using for the analysis a list of human cardiovascular-related protein accessions (or, in the case of Blast2GO, the equivalent bovine protein sequences).

*All of these tools have been created outside of the GO Consortium. The article's authors do not intend to recommend any tool, merely to demonstrate how GO analysis of proteome sets could be performed using some of these tools. We advise researchers to try several different tools to find one that suits their needs.

Contents
• Blast2GO: Slide 4
• FatiGO: Slide 13
• Onto-Express: Slide 20
• Ontologizer: Slide 27
• Accession list I: Slide 35
• Accession list II: Slide 36

Blast2GO in Babelomics
http://babelomics.bioinfo.cipf.es

Functional annotation: first, the BLAST step obtains the homologous sequences for the query sequences. Second, the actual GO annotation is performed by applying the Blast2GO method, which transfers the most confident and appropriate GO annotations to the novel sequences. Statistical charts help users understand and interpret the annotation results.

Visualization: this step gives users an overall view of the GO annotations assigned to the sequence dataset, making use of GO's graph structure.

Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF)

Conesa, A., Götz, S., García-Gómez, J.M., Terol, J., Talón, M. & Robles, M. (2005). Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research.
Bioinformatics 21: 3674-3676.

Functional Annotation with Blast2GO

Annotation is the process of assigning functional categories to genes or gene products. In Blast2GO this assignment is performed for each sequence, based on the information available for the homologous sequences retrieved by BLAST. Blast2GO annotation proceeds through a two-step strategy:

1. All GO terms for the BLAST hit sequences are collected. The BLAST results are parsed, and the identifiers of the BLAST hits are used to query the Gene Ontology database to recover the associated functional terms. The evidence code of each annotation is also recovered; evidence codes indicate how the functional assignment in the Gene Ontology database was obtained.
2. GO terms are selected from this original pool to extract the most reliable annotation. Once all this information is gathered, an annotation score is computed for each {GO term, query sequence} pair. Only the most specific GO term within a branch of the GO is assigned to the query sequence, and this assignment depends on the annotation score, whose threshold is set by the user.

The annotation score is computed as:

Annotation score{GO, Seq} = (max.sim * ECw) + (#GO - 1) * GOw

where:
• max.sim is the maximal similarity between the query sequence and the hit sequences that carry the given GO annotation;
• ECw is the weight given to the evidence code of the original annotation. Blast2GO defines default values for these weights, which the user can modify. In general, ECw = 1 for experimental evidence codes and ECw < 1 for non-experimental evidence codes;
• #GO is the number of annotated child terms;
• GOw is the weight given to the contribution of the annotated child terms to a given term.

The BLAST Step (1/2)
In this tab you can see the current status of your job; for large datasets, you can come back later to retrieve the results.
Upload your sequence file in FASTA format, choose the appropriate BLAST parameters and database (blastp for protein sequences), and press RUN.

The homology search is the first and most time-consuming step when transferring functional information from similar sequences to uncharacterized sequence data. This tool lets you run high-throughput BLAST searches against several protein databases, keep the processes running until they finish while monitoring their status, and save the generated alignments as XML files. These XML files can then be used as input data for the Blast2GO annotation method.

The BLAST Step (2/2)
Save your results as an XML file, or open the results via the link provided.

The Annotation Step
Upload and parse your BLAST results in NCBI's XML format, applying several filters. Annotation rule parameters:
• e-value cut-off, as a minimal quality criterion
• annotation rule cut-off (coverage vs. exactness)
• GO weight (more general vs. more specific terms)
• minimal alignment length allowed for function transfer
• evidence code weights, which can be set to increase or decrease the influence of different kinds of annotation evidence (e.g. automatically generated source annotations)
Then start the annotation assignment.

The Blast2GO web tool generates a multitude of statistical charts that help you understand the underlying dataset and interpret the generated annotation results:
• a result table to review, browse and export the generated annotations
• the e-value distribution of the BLAST results
• the source databases from which the transferred GO terms originally came
• how many GO terms were assigned to how many sequences
• the distribution of the different evidence codes across the GO terms, per BLAST hit
• the distribution of the different evidence codes across the GO terms, per sequence
• the most frequent GO terms throughout the dataset
• the distribution of the species from which the BLAST hits originate
• the success of the annotation process, giving the number of successfully BLASTed, GO-mapped and annotated sequences
• the number of sequences annotated at a given GO level and category
• the distribution of BLAST sequence similarities

Saving and exporting results
Blast2GO annotations are exported in a tabular format: SeqId<tab>GOterm<tab>SeqDesc. Browse the generated annotations in the result table, then open and save the results in tabular format for further use in the GO-Graph-Viewer, or download them in Blast2GO project format for direct import into Blast2GO.

Visualization: The GO-Graph-Viewer
The DAG viewer tool generates joined Gene Ontology graphs (DAGs) to give an overview of the functional context of groups of sequences. Interactive graph visualization makes it possible to navigate the large and unwieldy graphs often generated when exploring large sets of sequence annotations. Zooming and graph navigation are provided by the DAG viewer Java Web Start tool.
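Returning to the annotation step: the annotation rule parameters listed there (e-value cut-off, minimal alignment length, evidence code weights, GO weight and annotation cut-off) can be combined with the annotation score formula in a short Python sketch. This is not Blast2GO's implementation. The evidence code weights and default parameter values are illustrative assumptions, similarity is assumed to be a percentage, the first term of the score is read as the largest similarity × ECw among the hits carrying the term, and the step that keeps only the most specific term within a branch is omitted. The names `HitAnnotation`, `passes_filters`, `annotation_score` and `annotate` are hypothetical.

```python
"""Minimal sketch of a Blast2GO-style annotation rule (not Blast2GO's code).

Illustrative assumptions: the evidence-code weights and default cut-offs below,
and similarity expressed as a percentage (0-100).
"""
from dataclasses import dataclass

# Illustrative weights: 1.0 for experimental evidence codes, < 1.0 otherwise.
EC_WEIGHTS = {"IDA": 1.0, "IMP": 1.0, "ISS": 0.8, "IEA": 0.7}


@dataclass
class HitAnnotation:
    go_term: str        # GO term carried by one BLAST hit
    similarity: float   # query-hit similarity, in percent
    evalue: float       # e-value of the hit
    align_len: int      # alignment length of the hit
    evidence_code: str  # evidence code of the hit's original GO annotation


def passes_filters(hit, max_evalue, min_align_len):
    """Apply the e-value cut-off and the minimal alignment length."""
    return hit.evalue <= max_evalue and hit.align_len >= min_align_len


def annotation_score(go_term, hits, n_go, go_weight):
    """(max.sim * ECw) + (#GO - 1) * GOw, with n_go standing for #GO."""
    # Assumption: the first term is the largest similarity * ECw
    # over the hits that carry go_term.
    direct = max(
        (h.similarity * EC_WEIGHTS.get(h.evidence_code, 0.0)
         for h in hits if h.go_term == go_term),
        default=0.0,
    )
    return direct + (n_go - 1) * go_weight


def annotate(hits, n_go_by_term, max_evalue=1e-6, min_align_len=33,
             go_weight=5.0, cutoff=55.0):
    """Return {GO term: score} for the terms whose score reaches the cut-off."""
    kept = [h for h in hits if passes_filters(h, max_evalue, min_align_len)]
    candidates = {h.go_term for h in kept} | set(n_go_by_term)
    scores = {}
    for term in candidates:
        # Assumption: a term with no recorded #GO is treated as #GO = 1,
        # so it gets no contribution from annotated child terms.
        score = annotation_score(term, kept, n_go_by_term.get(term, 1), go_weight)
        if score >= cutoff:
            scores[term] = score
    return scores


if __name__ == "__main__":
    demo = [
        HitAnnotation("GO:A", 80.0, 1e-30, 120, "IDA"),
        HitAnnotation("GO:A", 90.0, 1e-40, 150, "IEA"),
        HitAnnotation("GO:B", 95.0, 1e-2, 150, "IDA"),  # fails the e-value cut-off
    ]
    print(annotate(demo, {}))  # prints {'GO:A': 80.0}
```

In the demo, the IDA hit wins for GO:A (80 × 1.0 = 80 beats 90 × 0.7 = 63), while GO:B never reaches scoring because its hit is removed by the e-value filter. Raising `go_weight` favours more general parent terms; raising `cutoff` trades coverage for exactness.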
• Upload your Blast2GO-generated annotations.
• Start the interactive graph visualization tool with Java Web Start.
• Define graph filtering parameters to obtain denser, more informative graphs.
• Save parts of your graphs as high-resolution images to better communicate your results.

FatiGO
Functional enrichment analysis
Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF)
http://www.fatigo.org
http://www.babelomics.org

Al-Shahrour, F., et al. (2005). Babelomics: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments. Nucleic Acids Research, 33, W460-W464.
Al-Shahrour, F., et al. (2004). FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics, 20, 578-580.

Select your organism, then enter your list or file of genes/proteins.* In this example, list #1 is a list of BHF-UCL annotated cardiovascular-related proteins (see Slide 35) and list #2 is the “Rest of genome”. Select the database(s) you want to query, and click Options to filter the database (optional).

*Several types of identifier are acceptable, such as UniProtKB accessions, Ensembl IDs, HGNC symbols, RefSeq and Entrez Gene IDs.

Filter Tool
Use the level of the DAG and the evidence code as filtering criteria, and select subsets of annotations based on keywords and on the size of the gene module. Babelomics allows sub-selection of the gene annotations on which gene modules are based, so that hypotheses can be tested in a more focused and sensitive manner. Removing from the analysis the modules whose testing is unnecessary increases the power of the tests in the multiple-testing adjustment step.

Results of GO analysis
Level 3 contains less granular terms; level 9 contains more granular terms.
The number of annotated proteins per GO level is displayed, and the proteins from your query set that are annotated to each GO term are listed. A lower p-value indicates greater significance.

FatiGO returns a list of GO terms that are over-represented in the list of interest, in this case the BHF-UCL list. For Biological Process terms at level 3 of the ontology, the terms over-represented in the BHF-UCL list include muscle contraction, cell cycle and anatomical structure development.

FatiGO also shows terms deeper in the ontology, at level 6, that are over-represented in the BHF-UCL list (though not necessarily significantly; compare the p-values), such as regulation of progression through cell cycle, heart development and cholesterol absorption. These are all processes in which cardiovascular-related proteins would be expected to be involved.

GO-Graph-Viewer Tool
You can upload your FatiGO results to the interactive graph visualization tool. The DAG viewer tool displays the significant GO terms as a GO graph, showing each GO term name together with its annotation score.

Onto-Express Features at a Glance
http://vortex.cs.wayne.edu/projects.htm#Onto-Express
Purvesh Khatri and Sorin Draghici
Intelligent Systems and Bioinformatics Lab, Department of Computer Science, Wayne State University

Input interface
• Select the type of IDs in the input file. Supported input types are GenBank accession numbers, UniGene cluster IDs, Entrez Gene IDs, gene symbols, Affymetrix probe IDs and any of the IDs used in the GO database.
• Select the organism.
• Choose from more than 300 microarrays as the reference. If the array of choice is not available, use your own reference.
• Choose a statistical distribution: 1. hypergeometric, 2. binomial, 3. chi-square.
• Choose a correction for multiple hypotheses: 1. Bonferroni, 2. FDR, 3. Holm, 4. Šidák.

Results – Flat view

Results – tree view
• Choose a level to expand the GO tree and click the “Expand” button.
• Only the GO terms with at least one input gene are displayed in the tree.

Results – chromosome view
• Chromosome information is supported for human, mouse and rat. The view displays the number of genes on each chromosome and their positions.
• Clicking “NCBI Genome view” links out to the NCBI Map Viewer.

Results – single gene view
Selecting “show in gene view” in the tree view displays the annotations for the selected gene within the GO hierarchy.

References
• Purvesh Khatri, Sorin Draghici, G. Charles Ostermeier, Stephen A. Krawetz. Profiling Gene Expression Using Onto-Express. Genomics, 79(2):266-270, February 2002.
• Sorin Draghici, Purvesh Khatri, Rui P. Martins, G. Charles Ostermeier and Stephen A. Krawetz. Global functional profiling of gene expression. Genomics, 81(2):98-104, February 2003.
• Purvesh Khatri and Sorin Draghici. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics, 21(18):3587-95, September 2005.
• http://vortex.cs.wayne.edu/projects.htm

Ontologizer
Ontologizer Open Source Team, Institute for Medical Genetics, Charité Universitätsmedizin Berlin
http://compbio.charite.de/ontologizer/

Grossmann S., Bauer S., Robinson P.N., Vingron M. Improved detection of overrepresentation of Gene Ontology annotations with parent child analysis. Bioinformatics. 2007 Nov 15;23(22):3024-31.
Robinson P.N., Wollstein A., Böhme U., Beattie B. Ontologizing gene-expression microarray data: characterizing clusters with Gene Ontology. Bioinformatics. 2004 Apr 12;20(6):979-81.

Ontologizer – Setting up a Project
Inputs:
• Ontology, which defines the GO structure
• Annotations, which map genes to GO terms
There are several predefined entries for various settings, or you may specify the fields manually.

Ontologizer – Editing Sets of Identifiers
Annotated identifiers are highlighted on the fly, and hovering the mouse over an identifier reveals its direct annotations (the example includes one identifier with no annotation). The induced graph of these terms can be displayed.
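The induced graph of a set of terms consists of those terms together with all of their ancestors in the GO DAG. A minimal Python sketch of computing it by breadth-first search over parent links follows; the same upward propagation lets a gene annotated to a specific term also be counted for every ancestor of that term. The parent map is a hand-made toy hierarchy for illustration, not the real GO structure, and the function names are hypothetical.

```python
from collections import Counter, deque

# Toy hierarchy for illustration only: term -> list of parent terms.
PARENTS = {
    "heart development": ["animal organ development"],
    "animal organ development": ["anatomical structure development"],
    "anatomical structure development": ["biological_process"],
    "muscle contraction": ["biological_process"],
}


def induced_graph(terms, parents):
    """Return the given terms plus all of their ancestors (breadth-first search)."""
    seen = set(terms)
    queue = deque(terms)
    while queue:
        term = queue.popleft()
        for parent in parents.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def genes_per_term(direct_annotations, parents):
    """Count genes per term after propagating each gene's direct terms upwards."""
    counts = Counter()
    for terms in direct_annotations.values():
        # Each gene contributes at most once to every term in its induced graph.
        counts.update(induced_graph(terms, parents))
    return counts


if __name__ == "__main__":
    direct = {"P1": {"heart development"}, "P2": {"muscle contraction"}}
    print(sorted(induced_graph({"heart development"}, PARENTS)))
    print(genes_per_term(direct, PARENTS))
```

Terms with a count of zero never appear in the result, which matches the idea of showing only terms that have at least one input gene.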
Ontologizer – Overview
Of interest here are two lists of identifiers: study and population.* Choose the analysis method: parent-child takes account of the ontology structure, whereas term-for-term treats each term independently. Multiple projects may reside in the workspace.

*In this example, the study list is a list of BHF-UCL annotated cardiovascular-related proteins (see Slide 35) and the population list is a random list of human UniProtKB accessions.

Ontologizer – Results
A list of terms is displayed. The shading indicates significance: darker shading is more significant. Click a term to display its position in the ontology, its definition and the proteins annotated to it and its parents.

Ontologizer – Graphical View of Results
• Yellow = Molecular Function
• Pink = Cellular Component
• Green = Biological Process
The term highlighted in the table is also highlighted in red in the graph.

Ontologizer – What Else?
• Can be easily invoked from the Web.
• Input files can be located remotely.
• Several procedures of multiple testing correction are supported.
• Results can be filtered and stored in tabular as well as graphical form.
• A command-line version is available.
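The difference between the two analysis methods can be sketched in Python, following the parent-child "union" variant described by Grossmann et al. (2007), cited above. Term-for-term compares each term against the whole population; parent-child restricts the background to the genes annotated to at least one parent of the term. Both use a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). Benjamini-Hochberg is included as one example of a multiple testing correction. This is a sketch, not Ontologizer's code: it assumes annotations are already propagated to ancestor terms, it does not cover the other parent-child variants, and it makes no claim about which corrections Ontologizer offers. All function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def term_for_term(term, study, population, annotated):
    """Classical test: the whole population is the background for every term."""
    pop_t = annotated[term] & population
    return hypergeom_upper_tail(len(pop_t & study), len(population),
                                len(pop_t), len(study))


def parent_child_union(term, study, population, annotated, parents):
    """Background restricted to genes annotated to at least one parent of term."""
    pars = parents.get(term, [])
    if pars:
        pop_pa = set().union(*(annotated[p] for p in pars)) & population
    else:
        pop_pa = set(population)  # root term: fall back to the whole population
    study_pa = study & pop_pa
    pop_t = annotated[term] & pop_pa
    return hypergeom_upper_tail(len(pop_t & study_pa), len(pop_pa),
                                len(pop_t), len(study_pa))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    population = {f"g{i}" for i in range(1, 21)}
    annotated = {"P": {f"g{i}" for i in range(1, 11)},   # parent term
                 "C": {f"g{i}" for i in range(1, 5)}}    # child term
    parents = {"C": ["P"]}
    study = {f"g{i}" for i in range(1, 7)}
    print(term_for_term("C", study, population, annotated))
    print(parent_child_union("C", study, population, annotated, parents))
```

In the demo, the population has 20 genes, the parent term covers 10 of them, the child term covers 4 of those, and the 6-gene study set lies entirely within the parent and contains all 4 child genes. Term-for-term gives p = 120/38760 ≈ 0.0031, while parent-child gives p = 15/210 ≈ 0.071: once the parent's enrichment is accounted for, the child term is much less surprising.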
Acknowledgments
The authors wish to thank the developers of the tools for preparing these presentations, as follows:
• Blast2GO: Stefan Götz
• FatiGO: Fatima Al-Shahrour
• Onto-Express: Sorin Draghici and Purvesh Khatri
• Ontologizer: Sebastian Bauer and Peter Robinson

List of human UniProtKB accessions used in FatiGO, Onto-Express and Ontologizer analyses
O00273 P04180 P12643 P35226 P55290 Q8N726
O60543 P05231 P12829 P36897 P61812 Q8TBM5
O75955 P05976 P12830 P37173 P84022 Q92673
O95477 P06727 P13501 P38936 Q00534 Q96AB3
P00519 P06741 P16519 P40337 Q00872 Q96N67
P01127 P06858 P17947 P42684 Q01449 Q9BQE4
P01137 P07203 P18510 P42771 Q13485 Q9H172
P01375 P08590 P22301 P42772 Q14114 Q9H1R3
P01584 P09493 P24385 P42773 Q14896 Q9H221
P02647 P09958 P25098 P45379 Q15796 Q9H222
P02649 P10253 P25103 P45844 Q16665 Q9HC96
P02652 P10636 P29120 P46527 Q5JRA6 Q9UKX2
P02655 P10916 P30279 P49918 Q6PGN9 Q9UNQ0
P02656 P11597 P30281 P50150 Q6Q788 Q9UPY8
P04114 P11802 P34947 P55273 Q86Y82 Q9Y5C1
Q9Y623

List of bovine UniProtKB accessions used in Blast2GO analysis
A0JNJ5 P09428 Q06599 Q2KIW4 Q4GZT4
A1A3Z1 P11151 Q08DE0 Q2KJB3 Q4TTZ1
A4FUX1 P13789 Q0P5D3 Q2KJD8 Q4ZJV8
A4FUZ9 P15497 Q0VC16 Q2KJD8 Q4ZJV9
A4IFM7 P18341 Q0VC37 Q2TBI0 Q58D48
A5PJI9 P19034 Q0VD56 Q32KX0 Q5E9I5
A5PKM2 P19035 Q1HE26 Q32KX7 Q5KR49
A6QLS3 P21146 Q1RMM7 Q32KY4 Q6R8F2
A6QP89 P21214 Q1W668 Q32PJ1 Q9BE40
A7MBB9 P26892 Q24JY8 Q32PJ2 Q9BE41
O46680 P43249 Q28193 Q3B7N0 Q9GLR0
O77482 P43480 Q29RJ9 Q3MHH5 Q9GLR1
O97919 P81644 Q29RV0 Q3SYR3 Q9MYM4
P00435 P85100 Q2KI22 Q3SZE5 Q9XTA5
P05363 Q03247 Q2KI76 Q3SZE5
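Before submitting an identifier list like the ones above to any of these tools, it can help to check it for malformed entries and repeats; as printed, the bovine list contains Q2KJD8 and Q3SZE5 twice each. A minimal Python sketch follows. The regular expression is the UniProtKB accession format published by UniProt; the function name `check_accessions` is hypothetical.

```python
import re
from collections import Counter

# UniProtKB accession format (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def check_accessions(text):
    """Split a whitespace-separated list; report unique valid, invalid and repeated IDs."""
    tokens = text.split()
    invalid = [t for t in tokens if not UNIPROT_ACCESSION.fullmatch(t)]
    repeated = sorted(t for t, n in Counter(tokens).items() if n > 1)
    # dict.fromkeys keeps first-seen order while dropping repeats.
    unique = list(dict.fromkeys(t for t in tokens if UNIPROT_ACCESSION.fullmatch(t)))
    return unique, invalid, repeated


if __name__ == "__main__":
    sample = "Q0P5D3 Q2KJD8 Q4ZJV8 Q0VC16 Q2KJD8 Q4ZJV9"
    unique, invalid, repeated = check_accessions(sample)
    print(len(unique), invalid, repeated)
```

Using `fullmatch` rather than `search` ensures that a token with extra characters is rejected instead of being accepted on a partial match.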