Download Tutorial - Processing of Prokaryotic Genome and Transcriptome data

GSEA-Pro Tutorial
Anne de Jong
University of Groningen

Introduction

- The main principle of a Gene Set Enrichment Analysis (GSEA) is to discover which biological function or functions are overrepresented in a set of genes or proteins.
- For such an analysis, GSEA-Pro uses the Genome2D database, which describes the relation between genes/proteins and functions (functional classification). For example, all genes encoding enzymes for a specific metabolic pathway belong to the same class.
- GSEA-Pro uses multiple classifications: GO, InterPro, KEGG, COG, PFAM, SMART and Superfamily.
- In GSEA-Pro, locus-tags are used as IDs for genes as well as for proteins.

Introduction: Overview of Functional Analysis of Gene Sets

-omics data (transcriptomics, proteomics, metagenomics) yield one or multiple sets of genes. The aim is to unravel the biological function of each "Gene Set".

Input

STEP 1: Select Genome

- GSEA-Pro is integrated into the Genome2D web server, which contains classifications of all 'complete' genomes of the NCBI.
- Be sure to select the correct strain (check your locus-tags).
- Preferably use the RefSeq locus-tag names. Old locus-tags are also supported if a genome is selected from the RefSeq database. The 'old' non-RefSeq NCBI genome database is also supported and still contains gene names and locus-tags that NCBI has discarded in the RefSeq database.

STEP 2: Four types of data tables can be used as input

- Single list of locus-tags: This is a bare list of genes (as locus-tags) deduced from transcriptome or proteome analysis results.
- Single list of locus-tags with ratio values: The first column contains the locus-tags, the second the ratio values generated by differential expression (DE) analysis.
- Experiments: From time series or perturbation experiments, GSEA-Pro will select the gene set of each experiment on the basis of the ratio data. The default threshold values can be changed on the web server.
- Clustering: Clustering algorithms group genes that show similar behavior over perturbation experiments or time series. GSEA-Pro handles each cluster as a gene set and shows the biological function of each cluster. The first column of the input table should contain the locus-tags, and the column with the cluster IDs should have the header "clusterID" (or change this on the web server). Value columns in a clustering table are ignored.

Input

STEP 3: Examples of input data tables

Tables can be uploaded to the web server as a tab-delimited file or by copy and paste directly from, e.g., Excel.

Single list

BSU40340
BSU40320
BSU40100
BSU40090
BSU39380
BSU38470
BSU37740
BSU37560
BSU36640
BSU35310
BSU34670
BSU33830
BSU33810
BSU33800
BSU33440
..

Single list + ratio data

locus      Null-WT
BSU32600   9.823054
BSU32610   9.171172
BSU32590   8.934336
BSU32580   8.7597
BSU09460   7.679297
BSU02390   7.497631
BSU32570   7.258288
BSU03010   6.926733
BSU32090   6.846735
BSU32550   6.438756
BSU31540   6.313128
BSU19350   6.063705
BSU10450   5.88237
BSU10460   5.857612
..

Experiments

locus      A_F71Y-WT  B_R61K-WT  C_R61H-WT  Null-WT
BSU40420   -0.18052   -0.54343   -1.15383   -1.50486
BSU40340   0.846962   0.910176   1.078578   1.139989
BSU40320   0.530724   0.939465   1.06793    1.164206
BSU40180   1.193949   1.410571   2.207017   2.447594
BSU40100   0.593872   0.649456   1.197021   1.0736
BSU40090   0.762748   0.587133   1.146103   1.017818
BSU40022   0.873289   1.049594   1.582536   1.928704
BSU40021   1.076014   1.562787   1.779806   2.252712
BSU39930   0.02815    0.289389   1.00797    2.638345
BSU39920   0.193887   0.476066   1.322628   2.777654
BSU39910   0.89087    1.137493   2.184607   4.009186
BSU39900   1.802093   2.304714   3.464422   5.355291
BSU39890   1.150499   2.53483    3.751564   5.580559
BSU39880   0.418429   1.152457   2.205843   3.864053
..
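As a rough illustration of how gene sets could be derived from an Experiments table, the Python sketch below reads a tab-delimited table and collects, per experiment, the locus-tags whose ratio passes a threshold. The function name `gene_sets_by_threshold`, the threshold of 1.0 and the use of the absolute ratio are assumptions made for this example. They are not the documented GSEA-Pro defaults, which can be inspected and changed on the web server.

```python
import csv
import io

# A few rows of the Experiments example table, tab-delimited.
TABLE = (
    "locus\tA_F71Y-WT\tB_R61K-WT\tC_R61H-WT\tNull-WT\n"
    "BSU40420\t-0.18052\t-0.54343\t-1.15383\t-1.50486\n"
    "BSU40340\t0.846962\t0.910176\t1.078578\t1.139989\n"
    "BSU40180\t1.193949\t1.410571\t2.207017\t2.447594\n"
    "BSU39930\t0.02815\t0.289389\t1.00797\t2.638345\n"
)


def gene_sets_by_threshold(text, threshold=1.0):
    """Return {experiment: [locus-tags]} for genes with |ratio| >= threshold.

    The threshold value and the absolute-value rule are assumptions of
    this sketch, not the GSEA-Pro defaults.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    experiments = reader.fieldnames[1:]  # first column holds the locus-tags
    sets = {name: [] for name in experiments}
    for row in reader:
        for name in experiments:
            if abs(float(row[name])) >= threshold:
                sets[name].append(row["locus"])
    return sets


sets = gene_sets_by_threshold(TABLE)
for name, genes in sets.items():
    print(name, genes)
```

For a clustering table, the same reader could group the locus-tags by the value in the "clusterID" column instead of applying a threshold.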
Clustering

locus      A_F71Y-WT  B_R61K-WT  C_R61H-WT  Null-WT    clusterID
BSU40420   -0.18052   -0.54343   -1.15383   -1.50486   1
BSU38810   -0.26891   -0.54053   -0.98884   -1.25233   1
BSU35920   -0.7741    -0.64672   -1.26974   -1.16388   1
BSU23560   -0.61699   -0.62814   -1.16469   -1.16779   1
BSU18120   -0.44501   -0.53468   -1.21979   -1.5308    2
BSU15560   -0.53203   -0.50688   -0.83918   -1.09821   2
BSU15550   -0.48608   -0.45498   -0.84874   -1.01399   2
BSU15540   -0.73774   -0.49082   -0.89019   -1.06595   2
BSU15530   -0.54945   -0.51365   -0.83166   -1.03166   3
BSU15520   -0.47401   -0.46463   -0.83956   -1.03379   3
BSU15510   -0.33067   -0.52      -0.82483   -1.04785   3
BSU15500   -0.30324   -0.51485   -0.80274   -1.04997   3
BSU15490   -0.34899   -0.46083   -0.83381   -1.053     3
BSU14700   -0.25436   -0.36245   -0.59358   -1.05428   3
..

Results

Normally the results are ready within seconds and consist of 4 main tables:

Table 1: All combinations of class / experiment are represented in one table. Values are only shown if the p-value is lower than the cutoff value (0.01). Within brackets: the number of genes of the class that are differentially expressed (TopHits). The light-to-dark blue coloring represents low to high significance, respectively. The intensity of the color is based on (TopHits/ClassSize) * -log2(adj-pvalue).

- Items in the ClassID column link to external databases describing the class IDs.
- Items in the Experiment columns link to the genes and gene annotations that are members of that specific class / experiment combination.
- The ClassSize column shows the total number of genes that are members of the classID in the selected organism.

Table 2: Heatmap of Class x Experiments, clickable to the 'GSEA-Pro BarGraph'.

- The GSEA-Pro BarGraph shows the overrepresented classes and their p-values (as -log).
- A detailed table links to online information on the classIDs and to the genes found for the specific class.

Table 3: Heatmap of Class x Experiments, clickable to the full class table.

Table 4: Overview of the locus-tags of each experiment or cluster used for the GSEA.

TreeMap: Global visualization and quick mining through the GSEA-Pro results.
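The color intensity formula given for Table 1, (TopHits/ClassSize) * -log2(adj-pvalue), can be sketched in a few lines of Python. The tutorial does not state which statistical test produces the p-values. The hypergeometric over-representation test below is assumed here only because it is a common choice for this kind of analysis. The function names and all numbers are hypothetical, and no multiple-testing adjustment is applied.

```python
from math import comb, log2


def hypergeom_pvalue(top_hits, set_size, class_size, genome_size):
    """P(X >= top_hits) when set_size genes are drawn from a genome of
    genome_size genes, class_size of which belong to the class.

    Assumed test for this sketch; not confirmed as the GSEA-Pro method.
    """
    upper = min(set_size, class_size)
    total = comb(genome_size, set_size)
    tail = sum(
        comb(class_size, k) * comb(genome_size - class_size, set_size - k)
        for k in range(top_hits, upper + 1)
    )
    return tail / total


def color_intensity(top_hits, class_size, adj_pvalue):
    """(TopHits / ClassSize) * -log2(adj-pvalue), as stated for Table 1."""
    return (top_hits / class_size) * -log2(adj_pvalue)


# Hypothetical numbers: 8 of the 10 genes of a class appear in a 50-gene
# set taken from a 4000-gene genome.
p = hypergeom_pvalue(top_hits=8, set_size=50, class_size=10, genome_size=4000)
print(f"p-value: {p:.3g}")

# The unadjusted p-value stands in for the adjusted one in this sketch.
if p < 0.01:  # cutoff value mentioned for Table 1
    print(f"intensity: {color_intensity(8, 10, p):.2f}")
```

The intensity rewards classes that are both largely covered by the gene set (high TopHits/ClassSize) and strongly significant (low adjusted p-value), which matches the light-to-dark blue scale described for Table 1.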
