DyVER algorithm
Quick start manual
Dissecting dynamic genetic variation that controls temporal gene response in yeast
Avital Brodt, Maya Botzman, Eyal David, Irit Gat-Viks
Quick start
The following manual describes how to use the DyVER algorithm.
DyVER's executable JAR and source code can be downloaded using any of the following browsers: Firefox version 22 (22.0 and higher), Chrome version 28 (28.0.1500.72 and higher), Safari version 6 (6.0.2 and higher) and Internet Explorer version 10 (10.0.9200 and higher).
General Overview:
The DyVER algorithm takes gene expression data and genotyping data as input and tests the dynamic (time-dependent) pattern of genetic effects on a single gene of interest.
Specifically, DyVER takes the following inputs:
1. A name of a single gene of interest.
2. A directory containing the input files.
3. A directory to which the output files are written.
4. A total expression data file listing the names of the expression data files (one per time point).
5. A genetic variants file containing the genotypes of all strains.
6. A gene position file containing the positions of all genes.
7. The number of repeats for the permutation-based P value calculation.
8. The likelihood ratio test formulation (values: 0, 1, 2). The default formulation is 0, which specifies the DyVER score.
9. An imputation strategy (values: true – imputation; false – no imputation).
All inputs are supplied by the user. Inputs nos. 4-6 should be located in the input directory (specified in input no. 2). Inputs nos. 8-9 are optional.
DyVER outputs:
1. An output file containing, for each genetic variant, the DyVER score together with its inferred temporal pattern and model parameters.
2. An output file containing only the information (from file no. 1) for the best predicted
variant.
All output files are located in the output directory (specified in input no. 3).
Running DyVER.
Step 1: Preparing an expression data file for each time point.
There are several requirements for the expression data file:
The file should be tab-delimited.
Each row holds the expression values of one gene, and each column represents the strain from which the measurement was taken.
The first row contains the strain names, and the first column contains the gene names.
o The names of each strain and each gene should be unique.
o The names of the strains should not include spaces.
Missing expression values (a gene without a measurement in a given strain) should be filled with 0.
The expression values should be log2-transformed.
Note: each time point should be represented by a separate expression data file, and all files must have the same order of rows and columns. All expression data files must be located in the input directory. An example expression data file is provided here.
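As an illustration of the Step 1 format, the following Python sketch builds the text of one expression data file from raw measurements. It is not part of DyVER: the function name `format_expression_table`, the input dictionary layout and the empty first cell of the header row are assumptions made for this example. If your values are already log2-transformed, drop the `math.log2` call.

```python
import math


def format_expression_table(strains, raw):
    """Return the text of one expression data file (a single time point).

    Hypothetical helper, not part of DyVER.
    strains: ordered list of unique strain names without spaces.
    raw: dict mapping gene name -> {strain: positive raw value, or None if missing}.
    Values are log2-transformed; missing values are written as 0 (Step 1).
    """
    if len(set(strains)) != len(strains):
        raise ValueError("strain names must be unique")
    for strain in strains:
        if " " in strain:
            raise ValueError(f"strain name contains a space: {strain!r}")
    # Header row: strain names. Leaving the first (gene column) cell empty is an assumption.
    lines = ["\t".join([""] + list(strains))]
    for gene, values in raw.items():  # dict keys keep gene names unique
        row = [gene]
        for strain in strains:
            value = values.get(strain)
            row.append("0" if value is None else f"{math.log2(value):.6g}")
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"
```

Write one such file per time point, always passing the same `strains` list and the same gene order so that rows and columns line up across files.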
Step 2: Preparing a total expression data file.
All expression data files are listed by name in a single file, the total expression data file (input no. 4). An example total expression data file is provided here.
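A minimal sketch for Step 2, assuming (the manual does not state this explicitly) that the total expression data file lists one expression file name per line, in time-point order. The helper name `write_total_expression_file` is hypothetical; it also checks that every listed file is present in the input directory, as Step 1 requires.

```python
from pathlib import Path


def write_total_expression_file(input_dir, listing_name, expression_files):
    """Write the total expression data file into input_dir and return its path.

    Hypothetical helper. Assumes one file name per line, in time-point order.
    """
    input_dir = Path(input_dir)
    # All expression data files must already be in the input directory.
    missing = [name for name in expression_files if not (input_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"not found in {input_dir}: {missing}")
    target = input_dir / listing_name
    target.write_text("\n".join(expression_files) + "\n")
    return target
```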
Step 3: Preparing the genetic variants file.
There are several requirements for the genetic variants file:
The file should be tab-delimited.
Each row represents a genetic variant, and each column represents the strain for which the
genotyping was done.
The first row contains the names of strains. The first column includes a variant index. The
second column includes a variant name. The third column includes a variant chromosome.
The fourth column includes a variant genomic position.
o The names of each strain and each variant should be unique.
o The names of the strains should not include any spaces.
Each genotype is coded as 0 or 1 according to the parental origin of the variant. Missing genotype values for a given strain should be filled with 2.
All strains that appear in the expression data files should be included in this file.
An example genetic variants file is provided here.
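The sketch below reads a genetic variants file and checks the Step 3 requirements (0/1 genotypes, 2 for missing, unique variant names, all expression strains present). The header layout, with four leading cells for the index, name, chromosome and position columns followed by the strain names, is an assumption, as is the function name `read_genetic_variants`.

```python
import csv
import io

N_META = 4  # variant index, name, chromosome, genomic position


def read_genetic_variants(text, expected_strains):
    """Parse a genetic variants file (hypothetical helper, not part of DyVER).

    Assumes the header row has N_META leading cells followed by strain names.
    Returns {variant_name: (chromosome, position, {strain: genotype})}.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    strains = rows[0][N_META:]
    absent = set(expected_strains) - set(strains)
    if absent:
        raise ValueError(f"strains missing from genotype file: {sorted(absent)}")
    variants = {}
    for row in rows[1:]:
        if not row:
            continue
        _, name, chrom, pos = row[:N_META]
        calls = [int(g) for g in row[N_META:]]
        if len(calls) != len(strains):
            raise ValueError(f"wrong number of genotypes for variant {name}")
        # 0/1 = parental origin, 2 = missing genotype
        if any(g not in (0, 1, 2) for g in calls):
            raise ValueError(f"invalid genotype in variant {name}")
        if name in variants:
            raise ValueError(f"duplicate variant name: {name}")
        variants[name] = (chrom, int(pos), dict(zip(strains, calls)))
    return variants
```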
Step 4: Preparing the gene position file.
There are several requirements for the gene position file:
The file should be tab-delimited.
Each row represents details regarding the position of a gene. The first column includes a
gene index. The second column includes a gene name. The third column includes a gene
chromosome. The fourth column includes a gene direction (can be empty). The fifth
column includes a gene start position. The sixth column includes a gene end position.
All genes that appear in the expression data files should be included in this file.
An example gene position file is provided here.
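A corresponding sketch for Step 4. It assumes the gene position file has no header row (adjust if yours does); `read_gene_positions` and `genes_without_position` are hypothetical names. The second helper reports genes that appear in the expression data but are missing from the position file.

```python
import csv
import io


def read_gene_positions(text):
    """Parse a gene position file (hypothetical helper, no header row assumed).

    Columns: index, name, chromosome, direction (may be empty), start, end.
    Returns {gene_name: (chromosome, direction or None, start, end)}.
    """
    genes = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue
        _, name, chrom, direction, start, end = row[:6]
        genes[name] = (chrom, direction or None, int(start), int(end))
    return genes


def genes_without_position(expression_genes, positions):
    """Return expression-file genes that the position file lacks."""
    return sorted(set(expression_genes) - set(positions))
```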
Step 5: Specify the number of permutations.
This parameter sets the number of permutations used to compute a permutation-based P value. When this parameter equals 0, no permutation test is performed and the P value is estimated only from the χ² distribution. To reduce run time, the χ² approximation is recommended.
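To make the Step 5 trade-off concrete, here is a generic permutation-test sketch. DyVER's own permutation scheme is not described in this manual: the sketch assumes that genotype labels are shuffled across strains and the score is recomputed, with `score_fn` standing in for the DyVER score. The +1 correction in `permutation_p_value` is a common convention, not necessarily the one DyVER uses.

```python
import random


def permutation_p_value(observed, null_scores):
    """Fraction of permuted scores >= observed, with a +1 correction."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def permutation_test(score_fn, expression, genotypes, n_repeats, seed=0):
    """Shuffle genotypes across strains n_repeats times and compare scores.

    score_fn is a placeholder for the real scoring function (assumption).
    Returns None when n_repeats is 0, mirroring "no permutation test" in Step 5.
    """
    if n_repeats == 0:
        return None
    rng = random.Random(seed)
    observed = score_fn(expression, genotypes)
    shuffled = list(genotypes)
    null_scores = []
    for _ in range(n_repeats):
        rng.shuffle(shuffled)  # break the genotype-expression link
        null_scores.append(score_fn(expression, shuffled))
    return permutation_p_value(observed, null_scores)
```

In this sketch the score is recomputed once per permutation, so run time grows linearly with the number of repeats, whereas a χ² lookup needs only the observed score.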
Step 6: Specify the scoring method.
This parameter specifies the formulation of the likelihood ratio score: 0, 1 and 2 refer to the DyVER score and alternative formulations nos. I and II, respectively. See details in Figure S14 in Brodt et al. “Dissecting Dynamic Genetic Variation that Controls Temporal Gene Response in Yeast”, submitted, 2014.
Step 7: Specify the imputation scheme.
This parameter controls whether missing expression values in the expression data files are imputed. When this option is on ("true"), each missing value is filled with the average of the expression values of the same strain at the neighbouring time points (if available).
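The Step 7 imputation rule can be sketched as follows for a single gene in a single strain. This is an illustration, not DyVER's code: the name `impute_neighbours` is hypothetical, and treating a value of exactly 0 as missing follows the Step 1 convention, which means a genuine log2 value of 0 would also be imputed. Neighbours are taken from the original series, so imputed values are not chained.

```python
def impute_neighbours(series, missing=0.0):
    """Fill missing values in one strain's expression time series.

    Hypothetical helper. series: values of one gene in one strain, ordered by
    time point. A value equal to `missing` is treated as missing (assumption
    based on Step 1). Each missing value becomes the mean of the observed
    values at the previous and next time points; if neither is observed,
    it is left unchanged.
    """
    filled = list(series)
    for t, value in enumerate(series):
        if value != missing:
            continue
        neighbours = [series[u] for u in (t - 1, t + 1)
                      if 0 <= u < len(series) and series[u] != missing]
        if neighbours:
            filled[t] = sum(neighbours) / len(neighbours)
    return filled
```

For example, `impute_neighbours([1.0, 0.0, 3.0, 0.0])` returns `[1.0, 2.0, 3.0, 3.0]`: the second value averages both neighbours, and the last value has only one observed neighbour.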
Step 8: Running the analysis.
The algorithm is provided as an executable JAR file. The command line is:
java -jar DyVER.jar <gene name> <input dir> <output dir> <total expression data file> <genetic variants file> <gene position file> <number of repeats> <likelihood ratio scoring scheme> <imputation method>
For example, you may use the example files that are already located in the ./example_input
directory as follows:
java -jar DyVER.jar Trim12 ./example_input ./required_output_dir example_total_gene_expression_data.txt example_genetic_variants.txt example_gene_positions.txt 0 0 false
When the example above is run, the output files should be identical to those in ./example_output/Trim12/.
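Running DyVER from a script may be convenient when analysing many genes. The following sketch assembles the Step 8 argument list in the documented order and runs it with Python's `subprocess` module. The wrapper names `dyver_command` and `run_dyver`, and the defaults (0 repeats, scoring scheme 0, no imputation, matching the example above), are choices made for this illustration.

```python
import subprocess


def dyver_command(gene, input_dir, output_dir, total_expression_file,
                  variants_file, positions_file, repeats=0, scoring=0,
                  imputation=False, jar="DyVER.jar"):
    """Build the DyVER argument list in the order given in Step 8 (hypothetical helper)."""
    if scoring not in (0, 1, 2):
        raise ValueError("scoring must be 0, 1 or 2")
    if repeats < 0:
        raise ValueError("repeats must be >= 0")
    return ["java", "-jar", jar, gene, str(input_dir), str(output_dir),
            total_expression_file, variants_file, positions_file,
            str(repeats), str(scoring), "true" if imputation else "false"]


def run_dyver(**kwargs):
    """Run DyVER; raises subprocess.CalledProcessError on a non-zero exit status."""
    return subprocess.run(dyver_command(**kwargs), check=True)
```

Calling `dyver_command` with the example file names reproduces the example command above, argument for argument.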
Output files format.
The output files can be found in a <gene name> directory within the <output dir>.
Two files are provided:
1. manhattan_plot_output_<gene name>.txt – A tab-delimited file containing the gene name (column 1), the best genetic variant found using DyVER (column 2), the variant chromosome (column 3), the variant genomic position (column 4), the DyVER score (column 5), and the mean and variance of the observed effect sizes of the high- and low-effect states (columns 6-9; see details in columns 6-9 of file no. 2). The remaining columns present the temporal genetic effect pattern found using DyVER (the number of columns depends on the number of time points in the input data).
2. best_variant_real_data_including_parameters.txt – A tab-delimited file containing the gene name (column 1), the gene chromosome (column 2), the best genetic variant found using DyVER (column 3), the variant chromosome (column 4), the DyVER score (column 5), the mean and variance of the observed effect sizes of the high-effect state (columns 6 and 7, respectively), and the mean and variance of the observed effect sizes of the low-effect state (columns 8 and 9, respectively). Columns 10 onward present the temporal genetic effect pattern found using DyVER (the number of columns depends on the number of time points in the input data). The next column gives the natural logarithm (ln) of the P value estimated from the χ² distribution. The last column gives the P value calculated by a permutation test (if requested).
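For downstream analysis, the column layouts above can be parsed as sketched below. The function names are hypothetical, rows are assumed to contain no header, and pattern entries are kept as strings because their encoding is not specified here. For file no. 2, the sketch assumes the permutation P value column is simply absent when no permutations were requested, and it needs the number of time points to locate the P value columns.

```python
def _effects(fields):
    """Columns 6-9: mean/variance of high- then low-effect state."""
    return {"high_mean": float(fields[0]), "high_var": float(fields[1]),
            "low_mean": float(fields[2]), "low_var": float(fields[3])}


def parse_manhattan_row(line):
    """Parse one row of manhattan_plot_output_<gene name>.txt (hypothetical helper)."""
    f = line.rstrip("\n").split("\t")
    return {"gene": f[0], "variant": f[1], "chromosome": f[2],
            "position": f[3], "dyver_score": float(f[4]),
            **_effects(f[5:9]), "temporal_pattern": f[9:]}


def parse_best_variant_row(line, n_time_points):
    """Parse one row of best_variant_real_data_including_parameters.txt."""
    f = line.rstrip("\n").split("\t")
    end = 9 + n_time_points  # first column after the temporal pattern
    row = {"gene": f[0], "gene_chromosome": f[1], "variant": f[2],
           "variant_chromosome": f[3], "dyver_score": float(f[4]),
           **_effects(f[5:9]), "temporal_pattern": f[9:end],
           "ln_p_chi2": float(f[end])}
    # Assumption: the permutation P value column is absent if not requested.
    row["p_permutation"] = float(f[end + 1]) if len(f) > end + 1 else None
    return row
```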