Gene Prediction Exercise
Initial concepts to be known:
1) What are ab initio and homology-based methods, and how do they differ?
2) What is the structure of a prokaryotic operon?
Possible programs for predicting genes by ab initio methods:
Glimmer
Running Glimmer involves two steps:
1) First, a probability model of coding sequences, called an interpolated context model (ICM), is built from known genes of the target genome or from genes of similar species.
build-icm [ options ] output-file < input-file
2) The glimmer3 program itself is run to analyze the sequences and make gene predictions.
glimmer3 [ options ] sequence icm tag
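The two steps can be illustrated with a minimal Python sketch, assuming a much simpler model than Glimmer's: train_markov fits a single fixed-order Markov chain (order k = 2 here) on known coding sequences, standing in for build-icm, and log_odds scores a candidate sequence against a uniform background, standing in for the scoring that glimmer3 performs. Glimmer's real ICM interpolates over several context lengths and is frame-specific; the function names, order, pseudocount and toy training sequence are hypothetical.

```python
import math
from collections import defaultdict

BASES = "ACGT"

def train_markov(seqs, k=2, pseudocount=1.0):
    """Estimate P(base | previous k bases) from training sequences.

    Simplified stand-in for build-icm: one fixed-order model,
    not Glimmer's interpolated, frame-specific ICM.
    """
    counts = defaultdict(lambda: {b: pseudocount for b in BASES})
    for seq in seqs:
        seq = seq.upper()
        for i in range(k, len(seq)):
            context, base = seq[i - k:i], seq[i]
            # Skip positions involving ambiguous bases such as N.
            if base in BASES and all(c in BASES for c in context):
                counts[context][base] += 1
    model = {}
    for context, c in counts.items():
        total = sum(c.values())
        model[context] = {b: c[b] / total for b in BASES}
    return model

def log_odds(seq, model, k=2):
    """Log-likelihood ratio of seq under the coding model vs. a uniform background."""
    seq = seq.upper()
    score = 0.0
    for i in range(k, len(seq)):
        context, base = seq[i - k:i], seq[i]
        # Unseen contexts or non-ACGT bases fall back to 0.25, adding log(1) = 0.
        p = model.get(context, {}).get(base, 0.25)
        score += math.log(p / 0.25)
    return score

if __name__ == "__main__":
    # Toy training set; real input would be many known genes or long ORFs.
    model = train_markov(["ACGACGACGACG"])
    print(log_odds("ACGACG", model))  # positive: follows the training pattern
    print(log_odds("ACTACT", model))  # negative: AC -> T was never observed
```

A positive score means the candidate looks more like the training genes than like random sequence; a real training set would contain hundreds of genes rather than one toy string.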
The program can be downloaded from http://www.cbcb.umd.edu/software/glimmer/
Prodigal
Prodigal uses dynamic programming and performs well on genomes with high GC content.
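Prodigal's dynamic programming step selects a consistent set of genes from many overlapping candidates. The sketch below shows only that core idea, framed as weighted interval scheduling: given hypothetical candidate genes with precomputed scores, it picks the non-overlapping subset with the highest total score. Real Prodigal also scores start codons, ribosome binding sites and coding potential, and allows limited overlaps between genes; none of that is modelled here, and best_gene_set is a hypothetical name.

```python
from bisect import bisect_right

def best_gene_set(candidates):
    """Pick non-overlapping genes with maximal total score.

    candidates: list of (start, end, score), 1-based inclusive coordinates.
    Returns (best_total, chosen_genes) with genes ordered by end coordinate.
    """
    genes = sorted(candidates, key=lambda g: g[1])  # sort by end coordinate
    ends = [g[1] for g in genes]
    n = len(genes)
    best = [0.0] * (n + 1)  # best[i] = optimal total using only the first i genes
    for i in range(1, n + 1):
        start, end, score = genes[i - 1]
        # Count of earlier genes that end strictly before this gene starts.
        j = bisect_right(ends, start - 1, 0, i - 1)
        best[i] = max(best[i - 1], best[j] + score)
    # Trace back through the table to recover which genes were chosen.
    chosen, i = [], n
    while i > 0:
        start, end, score = genes[i - 1]
        j = bisect_right(ends, start - 1, 0, i - 1)
        if best[j] + score > best[i - 1]:
            chosen.append(genes[i - 1])
            i = j
        else:
            i -= 1
    return best[n], chosen[::-1]

if __name__ == "__main__":
    # Hypothetical candidate ORFs: (start, end, coding score)
    cands = [(1, 300, 5.0), (250, 600, 8.0), (320, 900, 6.0), (610, 1000, 4.0)]
    print(best_gene_set(cands))  # (12.0, [(250, 600, 8.0), (610, 1000, 4.0)])
```

Sorting by end coordinate lets each gene look up its last compatible predecessor with a binary search, so the whole selection runs in O(n log n).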
For command-line usage on a Linux system:
Usage: prodigal [-a trans_file] [-c] [-d nuc_file] [-f output_type] [-g tr_table] [-h] [-i input_file] [-m] [-n] [-o output_file] [-p mode] [-q] [-s start_file] [-t training_file] [-v]
The program can be downloaded from http://code.google.com/p/prodigal/
GeneMark
GeneMark.hmm uses a hidden Markov model (HMM) to predict genes within open reading frames (ORFs). GeneMarkS first runs a self-training step and then runs GeneMark.hmm for the final gene prediction.
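To show what decoding with a hidden Markov model means in practice, here is a minimal Viterbi decoder for a toy two-state HMM with a non-coding state N and a coding state C. The GC-biased emission and transition probabilities are invented for illustration; GeneMark.hmm itself uses richer, codon-position-aware models trained by GeneMarkS, so treat this only as a sketch of the decoding step.

```python
import math

def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable state path for seq under an HMM, computed in log space."""
    # V[t][s] = best log-probability of any path that ends in state s at position t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(seq)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, V[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda x: x[1],
            )
            V[t][s] = score + math.log(emit_p[s][seq[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return "".join(reversed(path))

# Hypothetical toy parameters: N is AT-rich, C (coding) is GC-rich.
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.35, "T": 0.35, "C": 0.15, "G": 0.15},
    "C": {"A": 0.15, "T": 0.15, "C": 0.35, "G": 0.35},
}

if __name__ == "__main__":
    seq = "ATATATAT" + "GCGCGCGCGC" + "ATATATAT"
    print(viterbi(seq, STATES, START, TRANS, EMIT))  # NNNNNNNNCCCCCCCCCCNNNNNNNN
```

Working with log probabilities avoids numerical underflow, which would otherwise occur quickly on genome-length sequences.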
I. Prokaryotic GeneMark.hmm (found in the GeneMarkS package)
Usage: gmhmmp [parameters ...] [sequence filename]
The -m parameter is mandatory.
The input sequence file must be in FASTA format and may contain multiple sequences (multi-FASTA).
II. GeneMarkS
Usage: gmsn.pl [options] <sequence file name>
The input sequence file must be in FASTA format.
The program can be downloaded from http://exon.gatech.edu/license_download.cgi
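Both gmhmmp and GeneMarkS read FASTA input, and gmhmmp accepts multi-FASTA files. A minimal Python reader for such files is sketched below to make the format concrete; read_fasta is a hypothetical helper, not part of either package. It upper-cases sequence lines, ignores blank lines, and rejects sequence data that appears before the first header.

```python
def read_fasta(lines):
    """Parse (multi-)FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

if __name__ == "__main__":
    text = ">contig1 test\nATGC\natgc\n>contig2\nGGTT\n"
    for name, seq in read_fasta(text.splitlines()):
        print(name, len(seq))
```

Because read_fasta accepts any iterable of lines, an open file handle (for example one opened on a hypothetical genome.fasta) can be passed to it directly.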
Possible programs for predicting RNA genes:
RNAmmer
Uses hidden Markov models (HMMs) to predict ribosomal RNA (rRNA) genes.
Usage: perl rnammer -S bac -m lsu,ssu,tsu -gff - < [inputfile]
The input file should be in fasta format.
The program can be downloaded from http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?rnammer
tRNAscan-SE
Identifies transfer RNA (tRNA) genes by integrating and post-processing the outputs of three independent tRNA prediction programs:

1) Runs tRNAscan and the Pavesi algorithm to find candidate tRNAs.

2) Confirms the candidate tRNAs with a covariance model search program.

3) Trims the predicted tRNA bounds and runs them through a covariance model global structure alignment program to obtain a secondary structure prediction.
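Covariance models score both sequence and RNA secondary structure, and a full implementation is beyond a short example. As a much simpler relative of that idea, the sketch below uses the Nussinov algorithm, a dynamic program that only maximizes the number of complementary base pairs (Watson-Crick plus G-U wobble) with a minimum hairpin loop length. This is a swapped-in teaching technique, not what tRNAscan-SE runs; the min_loop value of 3 is an assumption.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Return (max number of base pairs, dot-bracket structure) for an RNA sequence."""
    seq = seq.upper().replace("T", "U")  # accept DNA input by converting T to U
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = max pairs in seq[i..j]; spans too short for a hairpin stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # i left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    # Trace back to build a dot-bracket string.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return dp[0][n - 1], "".join(structure)

if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))  # (3, '(((...)))')
```

Real tRNA structure prediction needs stacking energies or a trained model; pair counting alone often returns biologically unrealistic folds on longer sequences.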
Usage: tRNAscan-SE [-options] <FASTA file(s)>
The program can be downloaded from http://lowelab.ucsc.edu/tRNAscan-SE/
sRNAscanner

Identifies intergenic small RNA (sRNA) transcriptional units.

Uses transcriptional signal data as positive training data.

Uses a position weight matrix (PWM) to predict sRNAs.
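A position weight matrix scores how well a short window of sequence matches a set of aligned training signals. The sketch below builds a log-odds PWM from hypothetical aligned sites (a TATAAT-like motif chosen only for illustration) and scans a sequence for the best-scoring window. The pseudocount, uniform background and example sites are assumptions; sRNAscanner's own matrices and score thresholds are not reproduced here.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        column = [s[pos].upper() for s in sites]
        total = len(sites) + 4 * pseudocount
        # Pseudocounts keep unseen bases from producing log(0).
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm

def scan(seq, pwm):
    """Score every window of seq; return (best_score, best_start), 0-based start."""
    w = len(pwm)
    seq = seq.upper()
    best = (float("-inf"), -1)
    for start in range(len(seq) - w + 1):
        window = seq[start:start + w]
        if any(b not in BASES for b in window):
            continue  # skip windows with ambiguous bases such as N
        score = sum(pwm[i][b] for i, b in enumerate(window))
        best = max(best, (score, start))  # ties go to the later start
    return best

if __name__ == "__main__":
    sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]  # hypothetical training signals
    pwm = build_pwm(sites)
    print(scan("GGCGCGTATAATGCGC", pwm))  # best window starts at index 6
```

In a real predictor, windows scoring above a chosen threshold, rather than only the single best window, would be reported as candidate signals.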
Usage: ./sRNAscanner.exec Input.data