Download SPoRE - LCQB

SPoRE

Here we provide the R scripts that let you recompute the models proposed in the article, reproduce its benchmarks, predict new hot and cold spots, and use SPoRE on another genome.

Requirements:
- R 2.14 or later (http://www.r-project.org/); R 3.0 has also been tested
- The R package seqinr
- OS:
  - Linux has been tested
  - Windows works (tested with Windows XP), but the directory containing the R binaries must be added to the PATH so that the "Rscript" command, which SPoRE needs, works

How to run the complete analysis:
You can regenerate all the models by running this command:
    Rscript scripts/SPoRE_predict_all.R
You can then compare them to S. cerevisiae experimental data, to reproduce the benchmarks of the article, like this:
    Rscript scripts/SPoRE_benchmark_all.R

Model curves:
The curves produced by the models are stored in WIG format in the WIG directory, in the files axis_model3-1500.wig (Red1 model 3) and DSB_model6-250.wig (DSB model 6). You can load them in a program such as IGV or in an online service such as the UCSC Genome Browser.

Hotspot prediction:
To predict hot and cold spots, use the following command:
    Rscript scripts/SPoRE_predict_hotspots.R spo11-spots DSB_model6-250
This classifies as hot or cold the spots listed in spots-input/spo11-spots.txt, using DSB model 6 (i.e. the curve in DSB_model6-250.wig). Note that you need to generate the curves as explained above before using hotspot prediction.

The example file provided, spo11-spots.txt, contains the hot and cold spots on chromosome IV that we used for the benchmark presented in the article, so you can reproduce our prediction, which gives an accuracy of 84%.

The "hot" column in the input file is optional. If present, it is interpreted as experimental data telling whether each spot is actually a hotspot. A benchmark is then run: the numbers of true/false positives/negatives are computed, as well as the accuracy.
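The benchmark described above amounts to a confusion-matrix count over paired experimental and predicted labels. The following is a minimal sketch in Python, not SPoRE's own R code: the function name `benchmark` and the example labels are hypothetical, and it assumes the usual definition of accuracy as (true positives + true negatives) / total.

```python
from collections import Counter


def benchmark(experimental_hot, predicted_hot):
    """Count TP/FP/TN/FN and accuracy for paired hot (True) / cold (False) labels."""
    if len(experimental_hot) != len(predicted_hot):
        raise ValueError("both label lists must have the same length")
    if not experimental_hot:
        raise ValueError("at least one spot is required")

    counts = Counter()
    for actual, predicted in zip(experimental_hot, predicted_hot):
        if predicted and actual:
            counts["TP"] += 1      # predicted hot, experimentally hot
        elif predicted and not actual:
            counts["FP"] += 1      # predicted hot, experimentally cold
        elif not predicted and actual:
            counts["FN"] += 1      # predicted cold, experimentally hot
        else:
            counts["TN"] += 1      # predicted cold, experimentally cold

    total = len(experimental_hot)
    return {
        "TP": counts["TP"],
        "FP": counts["FP"],
        "TN": counts["TN"],
        "FN": counts["FN"],
        # Assumed definition: fraction of spots classified correctly.
        "accuracy": (counts["TP"] + counts["TN"]) / total,
    }


# Made-up labels for five spots (illustration only, not data from the article).
example = benchmark(
    experimental_hot=[True, True, False, False, True],
    predicted_hot=[True, False, False, True, True],
)
print(example)
```

With real data, the two lists would come from the "hot" column and from SPoRE's hot/cold prediction for the same spots.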
The output is stored in spots-output/spo11-spots.txt (i.e. the same name as the input file, but in the spots-output directory). Compared to the input table, the output table has two new columns:
- predictedDensity: the predicted density of DSBs in the spot
- predictedAsHot: whether SPoRE predicts the spot as hot (TRUE) or cold (FALSE)

Axis site prediction:
Axis site prediction works exactly like DSB hotspot prediction, except that you should use “axis_model3-1500” as the second parameter instead of “DSB_model6-250”.

To adapt the analysis for another species, you have to change:
- genomes/yeast_genome.fasta, which contains the yeast genome in FASTA format. Each sequence is a chromosome, numbered from 1 to N (N=16 for S. cerevisiae).
- genome_info_matrix/yeast_genes_for_model.txt, which is a matrix with the genes to consider.
- data/TF.txt, which contains the TFBS positions for the genes. Change this file if and only if you want to use model 7, which takes into account the Transcription Factor Binding Sites (TFBS) to define the promoter positions instead of an automatic approximation (as in models 3 to 6).

How to format this gene matrix:
Don't change the column names; they are referenced by our program. They are:
- id: unique id for the gene (can be anything you want, as long as it is unique)
- chromosomeNumber: chromosome number from 1 to N (integer)
- strand: "FORWARD" or "REVERSE"
- positionMin: first position of the gene (included)
- positionMax: last position of the gene (included)
The positions are relative to the chromosome, with the first base numbered 1.

How to format the TF.txt matrix (only necessary for DSB model 7):
This matrix contains the transcription factor binding sites for each gene.
The columns are:
- chr: chromosome number from 1 to N (optional, unused by SPoRE)
- position: position on the chromosome where the transcription factor binds to regulate the target gene
- value: any number (optional, unused by SPoRE)
- target: the id of the target gene (an id appearing in the id column of the gene matrix)
- TF: a name for the transcription factor (optional, unused by SPoRE)

As you can see, only the “position” and “target” columns are actually used by SPoRE. The chromosome number is not used because SPoRE assumes that a TFBS of a gene lies on the same chromosome as the gene (which should be the case unless there is an error in the data). Note that the values in the target and TF columns are not unique, since a TF may regulate several genes and several TFs may regulate a single gene.

If a gene has no TFBS at all (it never appears in the “target” column), the promoter position approximation of models 3-6 is used for it, so incomplete information is not a problem. In the extreme case, if TF.txt is empty, model 7 is identical to model 6.
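The relationship between the two matrices can be sketched as follows: group the TFBS positions by target gene, and list the genes that never appear in the “target” column, i.e. those for which the models 3-6 approximation would apply. This is an illustrative Python sketch, not SPoRE's R code. It assumes whitespace-separated files with a header row (the actual delimiter is not stated here), and the gene ids, TF names, positions, and helper names (`read_table`, `tfbs_by_gene`) are all made up. Rejecting an unknown target id is a choice of this sketch, not documented SPoRE behaviour.

```python
from collections import defaultdict

# Hypothetical gene matrix with the documented column names.
GENES_TEXT = """id chromosomeNumber strand positionMin positionMax
geneA 1 FORWARD 1000 2500
geneB 1 REVERSE 4000 5200
geneC 2 FORWARD 300 900
"""

# Hypothetical TF.txt content; only "position" and "target" are used below.
TF_TEXT = """chr position value target TF
1 850 0.9 geneA tf1
1 920 0.4 geneA tf2
1 5400 0.7 geneB tf1
"""


def read_table(text):
    """Parse a whitespace-separated table with a header row into dicts."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def tfbs_by_gene(genes, tf_rows):
    """Map every gene id to the sorted list of its TFBS positions."""
    gene_ids = {gene["id"] for gene in genes}
    sites = defaultdict(list)
    for row in tf_rows:
        if row["target"] not in gene_ids:
            # Targets must be ids from the gene matrix.
            raise ValueError("unknown target gene: " + row["target"])
        sites[row["target"]].append(int(row["position"]))
    # Genes absent from the "target" column get an empty list.
    return {gene["id"]: sorted(sites.get(gene["id"], [])) for gene in genes}


genes = read_table(GENES_TEXT)
sites = tfbs_by_gene(genes, read_table(TF_TEXT))

# Genes with no TFBS: the automatic promoter approximation would be used.
fallback_genes = [gene_id for gene_id, positions in sites.items() if not positions]
print(sites)
print(fallback_genes)
```

A check of this kind, run before model 7, can show how many genes actually carry TFBS information.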
