Gene Prediction
Chengwei Luo, Amanda McCook, Nadeem Bulsara,
Phillip Lee, Neha Gupta, and Divya Anjan Kumar
Gene Prediction
• Introduction
• Protein-coding gene prediction
• RNA gene prediction
• Modification and finishing
• Project schema
Why gene prediction?
Why not the experimental way?
Exponential growth of sequences
New sequencing technology
Metagenomics: only ~1% of microbes grow in the lab
How to do it?
It is a complicated task, let’s break it into parts
Genome
How to do it?
Protein-coding gene prediction
Homology Search
Phillip Lee & Divya Anjan Kumar
ab initio approach
Nadeem Bulsara & Neha Gupta
How to do it?
RNA gene prediction
Amanda McCook & Chengwei Luo
tRNA
rRNA
sRNA
Homology Search
Strategy
Open reading frame (ORF)
How/Why find ORF?
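As a rough illustration of the ORF step, the sketch below scans all six reading frames of a DNA string for stretches that run from a start codon to the first in-frame stop codon. The function names (`reverse_complement`, `find_orfs`), the `min_len` threshold, and the default of ATG-only starts are assumptions made for this example and do not come from the slides.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_len=9, starts=("ATG",)):
    """Return (strand, frame, start, end, orf) tuples.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    that was scanned (the reverse complement for strand "-").
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in starts:
                    start = i                      # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if i + 3 - start >= min_len:   # keep ORFs of sufficient length
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None                   # look for the next ORF
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGGG")` reports a single ORF on the forward strand. Real gene finders work on much longer minimum lengths and also consider alternative start codons.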
Protein Database Searches
Domain searches
Limits of Extrinsic Prediction
ab initio Prediction
Homology Search is not Enough!
Biased and incomplete database
Sequenced genomes are not evenly distributed across the tree of life, so they do not reflect its diversity.
ab initio Gene Prediction
Features
ORFs (6 frames)
Codon Statistics
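One simple form of codon statistics is the relative frequency of each codon inside a candidate ORF. The sketch below is a minimal illustration of that idea only; the function name `codon_frequencies` is hypothetical, and the statistics used by the tools discussed later are more elaborate.

```python
from collections import Counter

def codon_frequencies(orf):
    """Relative frequency of each codon in an ORF read in frame 0.

    Trailing bases that do not form a full codon are ignored.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}
```

Comparing these frequencies between known genes and a candidate ORF gives one feature for deciding whether the ORF looks coding.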
Features (Contd.)
Probabilistic View
Supervised Techniques
Unsupervised Techniques
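The probabilistic view can be sketched as a log-odds score: train one sequence model on coding DNA and one on non-coding DNA, then ask which model explains a candidate better. The example below uses a first-order Markov chain with pseudocounts purely for illustration. The function names, the model order, and the training sequences in the usage note are assumptions; they are not the models used by GeneMark, Glimmer, or the other tools named in these slides.

```python
import math

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev)."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        s = s.upper()
        for prev, nxt in zip(s, s[1:]):
            if prev in counts and nxt in counts[prev]:   # skip N and other symbols
                counts[prev][nxt] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in ALPHABET}
    return model

def log_odds(seq, coding, noncoding):
    """Sum of log(P_coding / P_noncoding) over all transitions in seq."""
    seq = seq.upper()
    score = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        if prev in coding and nxt in coding[prev]:
            score += math.log(coding[prev][nxt] / noncoding[prev][nxt])
    return score
```

A positive score means the coding model fits better. In a supervised setting the training sequences come from annotated genes; in an unsupervised setting they have to be bootstrapped from the genome itself.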
Usually Used Tools
GeneMark
Glimmer
EasyGene
PRODIGAL
GeneMark
GeneMark.hmm
GeneMarkS
Glimmer
Glimmer Journey
Glimmer3.02
PRODIGAL
Prokaryotic Dynamic Programming Gene Finding Algorithm
Developed at Oak Ridge National Laboratory and the University of Tennessee
Features
EasyGene
Developed at University of Copenhagen
Statistical significance is the measure for gene prediction.
• A high-quality data set is extracted from the genome based on similarity to Swiss-Prot.
• The data set is used to estimate the HMM; statistical significance is then calculated from ORF score and length.
Problem:
• No standalone version available
Comparison of Different Tools
RNA Gene Prediction
Why Predict RNA?
Regulatory sRNA
sRNA Challenges
Fundamental Methodology
RFAM
What Is Covariance?
Fig: Christian Weile et al. BMC Genomics (2007) 8:244
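Covariation between two alignment columns is often quantified with mutual information: base-paired positions tend to change together so that pairing is preserved, which gives a high score. The sketch below computes that statistic for a toy alignment. It is a simplified stand-in and not the covariance-model machinery behind RFAM; the function name and the example alignment are made up for illustration.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j.

    `alignment` is a list of equal-length aligned sequences.
    """
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Toy alignment: columns 0 and 4 always form a Watson-Crick pair.
toy = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
print(mutual_information(toy, 0, 4))  # covarying columns
print(mutual_information(toy, 1, 2))  # invariant columns
```

Invariant columns score zero even when they are paired, which is one reason practical tools combine covariation with other evidence.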
Noncomparative Prediction
Fig: James A. Goodrich & Jennifer F. Kugel, Nature Rev. Mol. Cell Biol. (2006) 7:612
Noncomparative Prediction
*Rolf Backofen & Wolfgang R. Hess, RNA Biol. (2010) 7:1
Comparative+Noncomparative
• Effective sRNA prediction in V. cholerae
• Non-enterobacteria
• sRNAPredict2
• 32 novel sRNAs predicted
• 9 tested
• 6 confirmed
Jonathan Livny et al. Nucleic Acids Res. (2005) 33:4096
Software
*Rolf Backofen & Wolfgang R. Hess, RNA Biol. (2010) 7:1
Eva K. Freyhult et al. Genome Res. (2007) 17:117
Modification & finishing
• Consensus strategy to integrate ab initio results
• Broken gene recruiting
• TIS correcting
• IS calling
• Operon annotating
• Gene presence/absence analysis
Modification & finishing
Consensus strategy
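One way a consensus over several ab initio predictors could work is a simple vote. The sketch below assumes genes are identified by their stop coordinate and strand, on the grounds that predictors usually disagree about the start rather than the stop, and keeps genes called by at least `min_votes` tools. The voting rule, the function name, and the tool names in the usage note are assumptions, not the scheme defined in these slides.

```python
from collections import Counter

def consensus_genes(predictions, min_votes=2):
    """Return sorted (stop, strand) keys supported by >= min_votes tools.

    `predictions` maps a tool name to a list of (start, stop, strand).
    """
    votes = Counter()
    for genes in predictions.values():
        # A set, so one tool cannot vote twice for the same gene.
        votes.update({(stop, strand) for _start, stop, strand in genes})
    return sorted(key for key, n in votes.items() if n >= min_votes)

calls = {
    "tool_a": [(10, 100, "+"), (200, 500, "-")],
    "tool_b": [(16, 100, "+")],
    "tool_c": [(10, 100, "+"), (220, 500, "-"), (600, 900, "+")],
}
print(consensus_genes(calls))
```

Genes that miss the vote threshold are not necessarily wrong; they are the natural input for a follow-up check such as a homology search.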
Broken gene recruiting
[Figure: flowchart with the labels ab initio results, homology search, candidate fragments, pass, fail]
Modification & finishing
TIS correcting
Start codon redundancy: ATG, GTG, TTG, CTG
Leaderless genes
Markov iteration, experimentally verified data
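Start codon redundancy means a single ORF can contain several plausible translation initiation sites. The sketch below only enumerates the in-frame candidates; choosing among them needs a model or experimental data and is not shown. The function name `candidate_starts` is hypothetical.

```python
START_CODONS = ("ATG", "GTG", "TTG", "CTG")

def candidate_starts(orf, start_codons=START_CODONS):
    """Return 0-based offsets of in-frame candidate start codons.

    `orf` is read in frame 0 and should not include the stop codon.
    """
    orf = orf.upper()
    return [i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] in start_codons]

print(candidate_starts("TTGAAAATGCCCGTGAAA"))
```

Each offset corresponds to a different possible protein N-terminus, which is why predictors that agree on the stop codon can still report different gene starts.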
Modification & finishing
IS calling
IS Finder DB
Operon annotating
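The slides do not say how operons are annotated. A common heuristic, shown here only as an illustrative sketch, groups adjacent genes that lie on the same strand and are separated by a short intergenic gap. The function name, the gene tuple layout, and the `max_gap` value of 50 bp are all assumptions.

```python
def group_operons(genes, max_gap=50):
    """Group genes into putative operons.

    `genes` is a list of (name, start, end, strand). Adjacent genes are
    grouped when they share a strand and the gap between them is at
    most `max_gap` bases.
    """
    operons = []
    prev = None
    for gene in sorted(genes, key=lambda g: g[1]):
        name, start, _end, strand = gene
        if prev is not None and strand == prev[3] and start - prev[2] <= max_gap:
            operons[-1].append(name)    # extend the current operon
        else:
            operons.append([name])      # start a new operon
        prev = gene
    return operons

genes = [("a", 1, 100, "+"), ("b", 120, 300, "+"),
         ("c", 800, 900, "+"), ("d", 910, 1000, "-")]
print(group_operons(genes))
```

Distance alone misclassifies some gene pairs, so such a grouping is best treated as a first pass to be refined with other evidence.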
Modification & finishing
Gene Presence/absence analysis
Schema (proposed)
assembly group