
DyVER algorithm
Quick start manual

Dissecting dynamic genetic variation that controls temporal gene response in yeast
Avital Brodt, Maya Botzman, Eyal David, Irit Gat-Viks

Quick start

The following manual describes how to use the DyVER algorithm. DyVER's executable JAR and source code can be downloaded using any of the following browsers: Firefox version 22 (22.0 and higher), Chrome version 28 (28.0.1500.72 and higher), Safari version 6 (6.0.2 and higher) and Explorer version 10 (10.0.9200 and higher).

General Overview:

The DyVER algorithm takes as input gene expression data and genotyping data, and tests the dynamic effect pattern of a single gene of interest. Specifically, DyVER takes the following inputs:

1. The name of a single gene of interest.
2. A directory containing the input files.
3. A directory for the output files.
4. A total expression data file containing a list of gene expression file names.
5. A genetic variants file containing the genotypes of all strains.
6. A gene position file containing the positions of all genes.
7. The number of repeats for the permutation-based P value calculations.
8. The likelihood ratio test formulation (values: 0, 1, 2). The default formulation is 0 and specifies the DyVER score.
9. An imputation strategy (values: true = imputation; false = no imputation).

All these inputs should be provided by the user. Inputs nos. 4-6 should be located in the input directory (specified in input no. 2). Inputs nos. 8-9 are optional.

DyVER outputs:

1. An output file containing the DyVER score, together with its inferred temporal pattern and model parameters, for each genetic variant.
2. An output file containing only the information (from file no. 1) for the best predicted variant.

All output files are located in the output directory (specified in input no. 3).

Running DyVER.

Step 1: Preparing an expression data file for each time point.
There are several requirements for the expression data file:
• The file should be tab-delimited.
• Each row represents the expression of a gene, and each column represents the strain from which the measurement was taken.
• The first row contains the names of strains and the first column includes the gene names.
  o The names of each strain and each gene should be unique.
  o The names of the strains should not include spaces.
• Any gene with a missing expression value for a certain strain should be filled with 0.
• The data should be provided after a log2 transformation.

Note: each time point should be represented by a separate expression data file, with the same order of rows and columns in all files. All expression data files must be located in the input directory. An example expression data file is provided here.

Step 2: Preparing a total expression data file.
The total expression data file (input no. 4) is a single file that lists the names of all expression data files. An example total expression data file is provided here.

Step 3: Preparing the genetic variants file.
There are several requirements for the genetic variants file:
• The file should be tab-delimited.
• Each row represents a genetic variant, and each column represents the strain for which the genotyping was done.
• The first row contains the names of strains. The first column includes a variant index. The second column includes a variant name. The third column includes a variant chromosome. The fourth column includes a variant genomic position.
  o The names of each strain and each variant should be unique.
  o The names of the strains should not include any spaces.
• Each variant can have a 0 or 1 genotype, according to its parental origin. Any variant with a missing genotype for a certain strain should be filled with 2.
• All strains that appear in the expression data files should be included in this file.

An example genetic variants file is provided here.

Step 4: Preparing the gene position file.
There are several requirements for the gene position file:
• The file should be tab-delimited.
• Each row represents details regarding the position of a gene. The first column includes a gene index. The second column includes a gene name. The third column includes a gene chromosome. The fourth column includes a gene direction (can be empty). The fifth column includes a gene start position. The sixth column includes a gene end position.
• All genes that appear in the expression data files should be included in this file.

An example gene position file is provided here.

Step 5: Specify the number of permutations.
This parameter is used for the P value calculation based on a permutation test. When this parameter equals 0, a permutation test is not performed and the P value is estimated based only on the χ² distribution. Notably, to accelerate the run time, the χ² approximation is recommended.

Step 6: Specify the scoring method.
This parameter specifies the formulation of the likelihood ratio score. Here, 0, 1 and 2 refer to the DyVER score and to alternative formulations nos. I and II, respectively. See details in Figure S14 in Brodt et al. “Dissecting Dynamic Genetic Variation that Controls Temporal Gene Response in Yeast”, submitted, 2014.

Step 7: Specify the imputation scheme.
This parameter is used for filling missing expression values in the expression data files. When this option is on ("true"), imputation is performed as follows: each missing value is filled based on the average expression values from neighbouring time points (if available) of the same strain.

Step 8: Running the analysis.
The algorithm is provided as a JAR file for simple usage.
The command line:

    java -jar DyVER.jar <gene name> <input dir> <output dir> <total expression data file> <genetic variants file> <gene position file> <number of repeats> <likelihood ratio scoring scheme> <imputation method>

For example, you may use the example files that are already located in the ./example_input directory as follows:

    java -jar DyVER.jar Trim12 ./example_input ./required_output_dir example_total_gene_expression_data.txt example_genetic_variants.txt example_gene_positions.txt 0 0 false

Notably, when running the above example, the output files should be identical to the files available in ./example_output/Trim12/.

Output files format.
The output files can be found in a <gene name> directory within the <output dir>. Two files are provided:

1. manhattan_plot_output_<gene name>.txt: A tab-delimited file containing the gene name (column 1), best genetic variant found using DyVER (column 2), the variant chromosome (column 3), the variant genomic position (column 4), the DyVER score (column 5), and the mean and variance of the observed effect sizes of the high- and low-effect states (columns 6-9, see details in columns 6-9 of file no. 2). The next columns present the temporal genetic effect pattern found using DyVER (the number of columns depends on the number of time points provided as input data).

2. best_variant_real_data_including_parameters.txt: A tab-delimited file containing the gene name (column 1), the gene chromosome (column 2), best genetic variant found using DyVER (column 3), the variant chromosome (column 4), the DyVER score (column 5), the mean and variance of the observed effect sizes of the high-effect state (columns 6 and 7, respectively), and the mean and variance of the observed effect sizes of the low-effect state (columns 8 and 9, respectively). Columns 10 and on present the temporal genetic effect pattern found using DyVER (the number of columns depends on the number of time points provided as input data).
The next column indicates the estimated ln P value calculated based on the χ² distribution. The last column indicates the P value calculated by a permutation test (if requested).
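
The column layout described for best_variant_real_data_including_parameters.txt can be read programmatically. The following Python sketch is illustrative only and is not part of DyVER. The function name parse_best_variant_row and all field names are hypothetical, and the sketch assumes that the file has no header row, that the number of time points is known to the caller, and that the permutation P value column is either absent or empty when no permutation test was requested. The temporal pattern values are kept as strings because their encoding is not specified here.

```python
from typing import Dict

FIXED_COLUMNS = 9  # columns 1-9 precede the temporal pattern columns


def parse_best_variant_row(line: str, n_time_points: int) -> Dict[str, object]:
    """Split one tab-delimited row into named fields (names are hypothetical)."""
    fields = line.rstrip("\r\n").split("\t")
    pattern_end = FIXED_COLUMNS + n_time_points
    # At minimum: 9 fixed columns, the pattern columns, and the ln P value column.
    if len(fields) < pattern_end + 1:
        raise ValueError(
            f"expected at least {pattern_end + 1} columns, got {len(fields)}"
        )
    # Assumption: the permutation P value column may be missing or empty.
    perm = fields[pattern_end + 1] if len(fields) > pattern_end + 1 else ""
    return {
        "gene": fields[0],
        "gene_chromosome": fields[1],
        "variant": fields[2],
        "variant_chromosome": fields[3],
        "dyver_score": float(fields[4]),
        "high_effect_mean": float(fields[5]),
        "high_effect_variance": float(fields[6]),
        "low_effect_mean": float(fields[7]),
        "low_effect_variance": float(fields[8]),
        "temporal_pattern": fields[FIXED_COLUMNS:pattern_end],
        "ln_p_chi2": float(fields[pattern_end]),
        "p_permutation": float(perm) if perm.strip() else None,
    }


if __name__ == "__main__":
    # Made-up values for illustration only; not real DyVER output.
    example = "\t".join(
        ["GENE1", "1", "var7", "2", "12.5", "1.2", "0.1", "0.3", "0.05",
         "0", "1", "1", "-8.4"]
    )
    print(parse_best_variant_row(example, n_time_points=3))
```

Passing the number of time points explicitly keeps the sketch independent of how the temporal pattern is encoded; the same count is the number of expression data files listed in the total expression data file.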
