Gene Prediction Exercise

Initial concepts to be known:
1) What are ab initio and homology-based methods, and what are the differences between them?
2) What is the structure of a prokaryotic operon?

Possible programs to be used for prediction of genes by ab initio methods:

Glimmer
Running Glimmer is divided into 2 steps:
1) First, a probability model of coding sequences, called an interpolated context model (ICM), is built from known genes or from genes of similar species.
   build-icm [ options ] output-file < input-file
2) The glimmer3 program itself is then run to analyze the sequences and make gene predictions.
   glimmer3 [ options ] sequence icm tag
The program can be downloaded from http://www.cbcb.umd.edu/software/glimmer/

Prodigal
Prodigal uses dynamic programming and performs well with genomes having high GC content. For command-line usage on a Linux system, use the following:
Usage: prodigal [-a trans_file] [-c] [-d nuc_file] [-f output_type] [-g tr_table] [-h] [-i input_file] [-m] [-n] [-o output_file] [-p mode] [-q] [-s start_file] [-t training_file] [-v]
The program can be downloaded from http://code.google.com/p/prodigal/

GeneMark
GeneMark.hmm uses a hidden Markov model to predict genes in ORFs. GeneMarkS first runs a self-training step and then runs GeneMark.hmm for the final gene prediction.
I. Prokaryotic GeneMark.hmm (found in the GeneMarkS package)
   Usage: gmhmmp [parameters ...] [sequence filename]
   The -m parameter is mandatory.
   The input sequence file must be in FASTA format and can contain multiple sequences (multi-FASTA).
II. GeneMarkS
   Usage: gmsn.pl [options] <sequence file name>
   The input sequence file must be in FASTA format.
The program can be downloaded from http://exon.gatech.edu/license_download.cgi

Possible programs to be used for prediction of RNA genes:

RNAmmer
RNAmmer uses hidden Markov models to predict ribosomal RNA genes.
Usage: perl rnammer -S bac -m lsu,ssu,tsu -gff - < [inputfile]
The input file should be in FASTA format.
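Two of the command lines quoted so far (build-icm and rnammer) read their input from standard input rather than from a file argument, which is easy to get wrong when scripting a run. The following is a minimal sketch of how the Glimmer two-step workflow and the RNAmmer call could be assembled and previewed from Python. The names Step, gene_prediction_steps, as_shell and run, the example file names, and the ".icm" suffix are assumptions made for illustration; all tool options are omitted.

```python
import shlex
import subprocess
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]            # program name followed by its arguments
    stdin_path: Optional[str]  # file redirected to standard input, if any


def gene_prediction_steps(genome: str, training: str, tag: str) -> List[Step]:
    """Assemble the Glimmer and RNAmmer command lines quoted in the exercise."""
    icm = tag + ".icm"  # hypothetical file name for the ICM model
    return [
        # Glimmer step 1: build-icm output-file < input-file
        Step(["build-icm", icm], training),
        # Glimmer step 2: glimmer3 sequence icm tag
        Step(["glimmer3", genome, icm, tag], None),
        # RNAmmer: the FASTA file is supplied on standard input
        Step(["perl", "rnammer", "-S", "bac", "-m", "lsu,ssu,tsu", "-gff", "-"],
             genome),
    ]


def as_shell(step: Step) -> str:
    """Render a step as the equivalent shell command line."""
    line = shlex.join(step.argv)
    if step.stdin_path is not None:
        line += " < " + shlex.quote(step.stdin_path)
    return line


def run(step: Step) -> None:
    """Execute one step; this only works if the tool is installed and on PATH."""
    if step.stdin_path is None:
        subprocess.run(step.argv, check=True)
    else:
        with open(step.stdin_path, "rb") as handle:
            subprocess.run(step.argv, stdin=handle, check=True)


if __name__ == "__main__":
    # Dry run: print the command lines without executing anything.
    for step in gene_prediction_steps("genome.fasta", "known_genes.fasta", "run1"):
        print(as_shell(step))
```

Printing the commands before calling run makes it easy to check that the ICM file written in step 1 is the same one passed to glimmer3 in step 2.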
RNAmmer can be downloaded from http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?rnammer

tRNAscan-SE
Identifies transfer RNA (tRNA) genes by integrating and post-processing the outputs of three independent tRNA prediction programs:
1) tRNAscan and the Pavesi algorithm are run to find candidate tRNAs.
2) Candidate tRNAs go through the covariance model search program for confirmation.
3) Predicted tRNA bounds are trimmed and run through the covariance model global structure alignment program to get a secondary structure prediction.
Usage: tRNAscan-SE [-options] <FASTA file(s)>
The program can be downloaded from http://lowelab.ucsc.edu/tRNAscan-SE/

sRNA scanner
Identifies intergenic small RNA (sRNA) transcriptional units.
Transcriptional signal data are used as positive training data.
Uses a position weight matrix (PWM) to predict sRNAs.
Usage: ./sRNAscanner.exec Input.data
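
To make the position weight matrix idea concrete, here is a minimal sketch of how a PWM can be built from aligned training sites and used to scan a sequence. It is not sRNA scanner's actual implementation: the function names, the uniform background of 0.25 per base, the pseudocount of 1, and the short training sites are all assumptions chosen for illustration, and the training sites are made up rather than taken from real signal data.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
PWM = List[Dict[str, float]]


def build_pwm(sites: List[str], pseudocount: float = 1.0) -> PWM:
    """Build a log-odds PWM from equal-length aligned sites.

    Assumes a uniform background (0.25 per base) and adds a pseudocount
    so that bases never seen at a position do not score minus infinity.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all training sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in BASES
        })
    return pwm


def score(pwm: PWM, window: str) -> float:
    """Sum the per-position log-odds scores for one window (A, C, G, T only)."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm: PWM, sequence: str) -> Tuple[int, float]:
    """Return (0-based start, score) of the highest-scoring window."""
    width = len(pwm)
    hits = [
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    top_score, top_start = max(hits)
    return top_start, top_score


if __name__ == "__main__":
    # Made-up training sites, used only to exercise the functions.
    training_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
    pwm = build_pwm(training_sites)
    start, top = best_hit(pwm, "GGCGCTATAATGCGC")
    print(start, round(top, 2))
```

In practice a score threshold would be chosen so that only windows resembling the training signals are reported; here the single best window is returned to keep the sketch short.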