TAGS: a tool for gene set analysis of expression time series
Ying Liu
[email protected]
MOE Key Laboratory of Bioinformatics and Bioinformatics Div, TNLIST
Department of Automation, Tsinghua University, Beijing 100084, China
Name
TAGS – Time-series Analysis for Gene Set
Description
TAGS is a tool for gene set analysis of expression time series. It can incorporate existing
knowledge and analyze the dynamic properties of a group of genes. It can be used to discover
expression regulatory relationships and to analyze a set of genes that have functional or structural
associations.
System Requirements
This software has been tested on an Intel Core 2 CPU with 2 GB of RAM under Windows XP and Vista. At
least 500 MB of hard disk space is required to install and run it. Perl must be installed; the
current edition of TAGS has been tested with ActivePerl.
Installation
Double click ‘TAGS.msi’ to install TAGS. Follow the instructions to complete the installation.
The TAGS files will be extracted to the installation folder specified during installation.
Usage
Double click the desktop shortcut to start TAGS. In Windows Vista one must find the executable
file ‘TS.exe’ in the installation directory and run it manually as administrator. Figure 1 shows the
TAGS main window.
Figure 1 The main window of TAGS

Loading files
There are five types of data files which can be loaded by TAGS (see Files and Directory
Structure):
• Expression file
• Covariate file
• Rank file
• Result file
• External file
One can load these files through the Load menu.

Discovery of regulatory relationships across a time course
Load the expression, covariate and external files, which contain the gene expression time series, the
time points, and the regulator expression time series, respectively. Click Analysis->With External Data
(Figure 2). The parameters and options are as follows:
• Permutation times: the number of gene-set permutations (default 10, intended only for a trial run).
We recommend at least 100 permutations in a real analysis. Type the number in the line edit.
• Gene set path: click browse… to specify the directory that contains the candidate target set
files.
• Q value cutoff: a number between 0 and 1 used as the Q value cutoff when there is more than one
candidate set (regulator). When only one set is tested, the Q value is equivalent
to the commonly used P value. All sets with Q values less than or equal to the cutoff are
reported by TAGS.
Click OK to run the analysis after specifying the parameters and options. TAGS stores the current
parameters and options and offers them as defaults next time. Click Cancel
to return to the main window.
Figure 2 Analyses from external data

Discovery of significant gene sets for expression time series
Gene set analysis can be run either from an expression profile or from a predefined gene rank.
• Analysis from expression profile
TAGS can calculate significant gene sets from an expression file loaded in the previous step. Run
the analysis through Analysis->From Expression Data (Figure 3). The options and parameters
are as follows:
Figure 3 Analysis from expression profile
• Ranking method: regression, variance or correlation can be used for gene ordering. If
regression or correlation is used, a covariate file is needed. If variance is used,
only gene-set permutation can be used for the significance evaluation, because
time-point permutation cannot change the variance (see Permutation method).
  ◦ Basis for regression: the basis used for the regression analysis of a single time
  series. Both natural cubic spline and polynomial spline are provided. Use the
  combo box to select one. Cubic spline (default) is recommended because of its
  flexibility.
• Permutation method: either time-point permutation or gene-set resampling can be used
to evaluate the significance of enrichment scores.
• Permutation times: the number of time-point permutations (default 10, intended only for a trial
run). We recommend at least 100 permutations in a real analysis. Type the number in the
line edit.
• Gene set path: use browse… to specify the directory that contains the candidate gene set
files.
• Q value cutoff: a number between 0 and 1 used as the Q value cutoff when there is more than
one candidate set. When only one set is tested, the Q value is equivalent to the
commonly used P value. All sets with Q values less than or equal to the cutoff are
reported by TAGS.
• Adjust rank: if selected, the q-value, variance or Pearson correlation coefficient is used in
calculating the enrichment score.
  ◦ Tie: genes whose q-values, variances or Pearson correlation coefficients differ by
  less than Threshold are treated as tied.
  ◦ Weighting: the q-value, variance or Pearson correlation coefficient is used for
  weighting.
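The remark under Ranking method, that time-point permutation cannot change the variance, can be checked with a few lines of Python. This is only an illustrative sketch, not part of TAGS: it uses the AA004795 row and the Age row from the example files in this manual, and a single fixed permutation (a reversal) chosen for the demonstration.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


expression = [582.583, 933.728, 481.011, 572.583, 641.637]  # AA004795 example row
times = [26, 26, 27, 29, 30]                                # Age example row
order = [4, 3, 2, 1, 0]  # one fixed permutation of the time points (assumption)

permuted_times = [times[i] for i in order]
permuted_expression = [expression[i] for i in order]

# The variance ignores the order of the values, so it is unchanged.
var_before = statistics.variance(expression)
var_after = statistics.variance(permuted_expression)

# A covariate-based statistic such as the correlation does change.
r_before = pearson(expression, times)
r_after = pearson(expression, permuted_times)

print(var_before, var_after)
print(r_before, r_after)
```

Because the variance-based rank is identical for every time-point permutation, such permutations carry no information about significance, which is why only gene-set permutation is offered for this ranking method.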
Click OK to run the analysis after specifying the options and parameters. TAGS stores the current
options and parameters and offers them as defaults next time. Click Cancel
to return to the main window. First, TAGS calls an EDGE function to calculate a gene rank
according to each gene’s differential expression. Next, the time points are permuted and the
corresponding ranks are calculated in the same way. Finally, gene set analysis is
performed to find the significant gene sets. The running time depends on the number of candidate sets
and, more importantly, on the number of permutations.
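The manual does not spell out how the gene set analysis step scores a set. As a rough illustration only, the sketch below implements an unweighted running-sum enrichment score in the style of the gene set enrichment analysis paper listed in the References, together with a gene-set resampling P value. It is not the TAGS implementation; the function names, the two-sided P value and the toy gene names are assumptions made for this example.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum statistic: the largest deviation from zero."""
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("the set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise; the walk ends at zero.
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_p(ranked_genes, gene_set, n_perm=100, seed=0):
    """Compare the observed score with scores of random sets of the same size."""
    rng = random.Random(seed)
    size = len(set(gene_set) & set(ranked_genes))
    observed = enrichment_score(ranked_genes, gene_set)
    null = [
        enrichment_score(ranked_genes, rng.sample(ranked_genes, size))
        for _ in range(n_perm)
    ]
    extreme = sum(1 for es in null if abs(es) >= abs(observed))
    return observed, (extreme + 1) / (n_perm + 1)


# Toy data: 20 hypothetical genes, with the candidate set at the top of the rank.
ranked = [f"g{i}" for i in range(20)]
candidate = ["g0", "g1", "g2", "g3"]
es, p_value = gene_set_permutation_p(ranked, candidate, n_perm=100)
print(es, p_value)
```

The loop over `n_perm` random sets shows why the running time grows with the number of permutations: every permutation repeats the full scoring pass for every candidate set.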
A result dialog opens automatically when the calculation is finished (Figure 4). The Significant
Gene Set(s) text browser shows the result: the order of the gene sets according to their Q
values, the significant gene sets (represented by the corresponding file names), the normalized
enrichment score, the P value and the Q value. To apply a different cutoff, enter it in the
Q value cutoff text edit and click Recalculate. The new results
appear in the same dialog in the same format. Click Save Result to open the save
dialog (Figure 5), specify the result file name and path, and click OK to save the result containing the
significant sets (TAGS automatically records all the sets, together with their information, in rec.tmp
in the installation path). Click Cancel to return to the result dialog. TAGS also automatically saves the
leading-edge subset of each significant gene set to the ‘lead’ directory (see Files and Directory
Structure). Click Done in the result dialog to return to the main window.
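The effect of the Q value cutoff and of Recalculate can be pictured as a simple filter over the full list of tested sets. The sketch below is illustrative only: the `SetResult` record, the function name and all numbers are made up for the example and do not come from TAGS.

```python
from typing import NamedTuple


class SetResult(NamedTuple):
    name: str        # gene set file name
    nes: float       # normalized enrichment score
    p_value: float
    q_value: float


def significant_sets(results, q_cutoff):
    """Keep sets with Q value <= cutoff, ordered by Q value."""
    if not 0 <= q_cutoff <= 1:
        raise ValueError("the Q value cutoff must lie between 0 and 1")
    kept = [r for r in results if r.q_value <= q_cutoff]
    return sorted(kept, key=lambda r: r.q_value)


# Made-up results, for illustration only.
results = [
    SetResult("set_b", 1.4, 0.030, 0.05),
    SetResult("set_a", 2.1, 0.001, 0.02),
    SetResult("set_c", 0.9, 0.200, 0.30),
]

names_at_005 = [r.name for r in significant_sets(results, 0.05)]
names_at_001 = [r.name for r in significant_sets(results, 0.01)]
print(names_at_005, names_at_001)
```

Note that a set whose Q value equals the cutoff is kept, matching the "less than or equal to" wording of the parameter description.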
Figure 4 The result dialog
Figure 5 Save analysis result
• Analysis from gene rank
TAGS can run an analysis against an existing gene rank file, generated either by the user or by other
software and loaded through the Load menu. This function is available through
Analysis->From Gene Rank (Figure 6). The parameters are as follows:
Figure 6 Analysis from gene rank
• Permutation method: either gene-set resampling or time-point permutation can be used
to evaluate the significance of enrichment scores.
  ◦ Permutation times: if gene-set permutation is used, the number of permutations must be
  specified.
  ◦ Permutation file path: if time-point permutation is used, specify a directory containing the
  rank files generated by time-point permutation. Because there is
  no expression file, TAGS cannot permute the time points itself to generate the
  corresponding gene lists. Users must prepare the gene ranks needed for
  the analysis and load them into TAGS.
• Gene set path: use browse… to specify the directory that contains the candidate gene set
files.
• Q value cutoff: same as the corresponding parameter in Analysis from Expression Data.
See Analysis from expression profile.
• Adjust rank: same as the corresponding parameter in Analysis from Expression Data.
See Analysis from expression profile.
Click OK to run the analysis after specifying the parameters. TAGS stores the current parameters
and offers them as defaults next time. Click Cancel to return to the main
window. The result is shown in the result dialog (see Analysis from expression profile).

Image
TAGS can plot two kinds of images to illustrate the result.
• Heatmap for significant gene set
When the analysis is finished, or a result file is loaded (see Loading files), click
Image->Heat Map for Significant Gene Set to show the heatmap dialog (Figure 7). All the
significant gene sets are listed in the Select a set combo box. Choose one and click OK to
show the heatmap (Figure 8). Click Save Heatmap under the image to save the heatmap as a bmp
file in a specified directory. Click Done to return.
Figure 7 Choosing a significant set to plot heatmap
Figure 8 Heatmap of a significant gene set
• Histogram of permuted NESs (normalized enrichment scores)
When an analysis is done (either from an expression profile or from a gene rank), click
Image->Histogram of Permuted NESs to show a histogram of all the permuted NESs
(Figure 9). Click Save Histogram to save the image as a bmp file. Click Done to return to the main
window.
Figure 9 Histogram of permuted NESs
Files and Directory Structure
• File types
• Expression file
A tab-delimited file containing the expression matrix. The first row contains a header and the
sample (time point) name of each column; every other row represents one time series. The first
column contains the gene or probe set names, which should be the same identifiers as those in the
gene set files. The following is an example:
GeneName GSM27015 GSM27016 GSM27017 GSM27018 GSM27019
AA004795 582.583 933.728 481.011 572.583 641.637
AA010078 77.757 73.5 122.316 89.047 106.645
…
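For readers who prepare expression files with a script, the following minimal sketch reads the layout described above. It is not part of TAGS; the function name is hypothetical, and the example text is the two rows shown in this manual joined with tab characters.

```python
import csv
import io


def read_expression(handle):
    """Parse a tab-delimited expression matrix into (samples, {gene: values})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first cell only labels the name column
    series = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore blank lines
        values = [float(v) for v in row[1:]]
        if len(values) != len(samples):
            raise ValueError(
                f"{row[0]}: expected {len(samples)} values, got {len(values)}"
            )
        series[row[0]] = values
    return samples, series


# The example rows from this manual, joined with tabs.
example = (
    "GeneName\tGSM27015\tGSM27016\tGSM27017\tGSM27018\tGSM27019\n"
    "AA004795\t582.583\t933.728\t481.011\t572.583\t641.637\n"
    "AA010078\t77.757\t73.5\t122.316\t89.047\t106.645\n"
)
samples, series = read_expression(io.StringIO(example))
print(samples)
print(series["AA010078"])
```

Running such a check before loading a file into TAGS catches rows with a missing or extra value early.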
• Covariate file
A tab-delimited file containing the time-point covariate of each sample. The first row contains a
header and the sample (time point) names, and the second row specifies the corresponding time points.
The following is an example:
Cov Name GSM27015 GSM27016 GSM27017 GSM27018 GSM27019
Age 26 26 27 29 30
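A covariate file can be read in the same spirit. The sketch below is an illustration rather than TAGS code: the function name is hypothetical, and the final check that every expression sample has a time point reflects an assumption that the two files are meant to share sample names, as they do in the examples of this manual.

```python
import csv
import io


def read_covariate(handle):
    """Return (covariate_name, {sample: time_point}) from a two-row file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    values = next(reader)
    if len(values) != len(header):
        raise ValueError("both rows must have the same number of columns")
    time_of = {s: float(t) for s, t in zip(header[1:], values[1:])}
    return values[0], time_of


# The example rows from this manual, joined with tabs.
example = (
    "Cov Name\tGSM27015\tGSM27016\tGSM27017\tGSM27018\tGSM27019\n"
    "Age\t26\t26\t27\t29\t30\n"
)
name, time_of = read_covariate(io.StringIO(example))

# Sample names taken from the expression file example.
expression_samples = ["GSM27015", "GSM27016", "GSM27017", "GSM27018", "GSM27019"]
missing = [s for s in expression_samples if s not in time_of]
print(name, time_of, missing)
```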
• Rank file
A tab-delimited file containing a gene rank, with or without a header row. The first column is the
order (1, 2, 3, …). The second column contains the gene or probe set names enclosed in single
quotes ('), which should be the same identifiers as those in the gene set files. The third column
contains the p-values. The fourth column contains the q-values, which may be used for weighting. In
particular, the EDGE output can be used directly for the analysis. Here is an example:
Rank Gene Name P-Value Q-Value
1 'VDAC1P' 7.963686e-07 0.0002886315
2 'RAP2A' 7.963686e-07 0.0002886315
…
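The rank file layout, including the optional header row and the single-quoted names, can be handled as in the sketch below. This is an illustration under stated assumptions, not TAGS code: the function name is hypothetical, and a row is treated as a header (and skipped) whenever its first column is not an integer.

```python
import csv
import io


def read_rank(handle):
    """Return a list of (order, gene, p_value, q_value) tuples."""
    ranked = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip().isdigit():
            continue  # header row, blank line or other non-data line
        gene = row[1].strip().strip("'")  # drop the surrounding single quotes
        ranked.append((int(row[0]), gene, float(row[2]), float(row[3])))
    return ranked


# The example rows from this manual, joined with tabs.
with_header = (
    "Rank\tGene Name\tP-Value\tQ-Value\n"
    "1\t'VDAC1P'\t7.963686e-07\t0.0002886315\n"
    "2\t'RAP2A'\t7.963686e-07\t0.0002886315\n"
)
ranked = read_rank(io.StringIO(with_header))

# The same data without the header row parses identically.
without_header = with_header.split("\n", 1)[1]
ranked_no_header = read_rank(io.StringIO(without_header))
print(ranked)
```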
• Result file
A file generated by TAGS, containing the analysis result, i.e. significant gene sets and other
relevant information. The format is the same as in the text browser in the result dialog (see
Usage).
• Gene set file
Each gene set file represents a candidate set for analysis. Each line is a gene or probe set name,
which should be the same identifier as that in the expression or rank file. Repeated rows are NOT
allowed. See the following example:
PBEF1
NT5C2
…
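Since repeated rows are not allowed, it can help to validate a directory of gene set files before pointing Gene set path at it. The following sketch is only an illustration: the function names and the file names `set_a` and `set_b` are hypothetical, and the demo writes its files to a temporary directory.

```python
import tempfile
from pathlib import Path


def read_gene_set(path):
    """Read one identifier per line and reject repeated rows."""
    lines = Path(path).read_text().splitlines()
    genes = [line.strip() for line in lines if line.strip()]
    repeated = sorted({g for g in genes if genes.count(g) > 1})
    if repeated:
        raise ValueError(f"{Path(path).name}: repeated identifiers {repeated}")
    return genes


def load_gene_sets(directory):
    """Map each file name in the directory to its list of identifiers."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return {p.name: read_gene_set(p) for p in files}


with tempfile.TemporaryDirectory() as tmp:
    # A valid set, using the two identifiers from the example above.
    (Path(tmp) / "set_a").write_text("PBEF1\nNT5C2\n")
    gene_sets = load_gene_sets(tmp)

    # A set with a repeated row is rejected.
    (Path(tmp) / "set_b").write_text("PBEF1\nPBEF1\n")
    try:
        load_gene_sets(tmp)
        duplicate_rejected = False
    except ValueError:
        duplicate_rejected = True

print(gene_sets, duplicate_rejected)
```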
• External file
A tab-delimited file containing the expression time series of regulators (e.g., TFs). The file format
is the same as that of the expression file (see Expression file). The regulator identifiers in the
first column should be the same as the gene-set file names.
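The naming requirement between regulators and gene-set files can be checked before an analysis. The sketch below is illustrative only: the function name and the identifiers `TF_A`, `TF_B` and `TF_C` are hypothetical, and it assumes an exact match between identifier and file name, since the manual does not say whether a file extension is involved.

```python
def unmatched_regulators(regulator_ids, gene_set_file_names):
    """Return the regulators that have no gene-set file of the same name."""
    names = set(gene_set_file_names)
    return [r for r in regulator_ids if r not in names]


# Hypothetical identifiers from the first column of an external file,
# and hypothetical file names from the gene set directory.
regulators = ["TF_A", "TF_B", "TF_C"]
set_files = ["TF_A", "TF_C"]

unmatched = unmatched_regulators(regulators, set_files)
print(unmatched)
```

A regulator reported here has no candidate target set under that name, so the mismatch is worth resolving before running Analysis->With External Data.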
• Directory structure
There are five directories in the installation folder.
• lead: used to store the leading-edge subset of each significant gene set for further analysis.
The file name is the order of the corresponding set in the result dialog. The file format is the
same as that of gene set files.
• permutedCovariate: used to store the permuted covariate files when analyzing from an
expression profile.
• permutedRank: used to store the ranks generated from the files in the permutedCovariate
folder.
• ranks: used to store the ranks corresponding to each candidate regulator when analyzing
regulators with their targets.
• R-2.6.2: R version 2.6.2, which is used by TAGS.
There are also some files in the installation folder. Users can double click
TS.exe there to run the software directly.
See Also
Further information about TAGS can be found at
http://bioinfo.au.tsinghua.edu.cn/member/yliu/TAGS.
Other usage tips
Do not run two instances of TAGS at the same time on one computer.
Copyright
TAGS is free for academic use. You can distribute it under the terms of the GNU General Public
License.
Acknowledgements
The author thanks Dr. Xuegong Zhang and Bo Jiang for their comments.
References
Storey JD, Xiao W, Leek JT, Tompkins RG, Davis RW (2005) Significance analysis
of time course microarray experiments. PNAS 102: 12837-12842.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, et al. (2005) Gene
set enrichment analysis: A knowledge-based approach for interpreting genome-wide
expression profiles. PNAS 102: 15545-15550.