Comparative Annotation of Viral Genomes
with Non-Conserved Gene Structure
Saskia de Groot and Jotun Hein
Department of Statistics, University of Oxford
Motivation
Viral genome annotation is a complex task:
• Overlapping and nested reading frames
• Atypical sequence evolution
• Non-conserved gene structure
⇒ Current comparative HMM methods cannot cope with these features.
Aim
• Introduce a pair hidden Markov model to annotate two aligned homologous genomes simultaneously
• Analyse HIV1 and HIV2: two related sequences with non-homologous gene structure
• Incorporate prior knowledge by annotating one sequence conditional on the other
Results
HIV2 vs. HIV2: 84–89% sensitivity, 97–99.9% specificity
HIV1 vs. HIV2: 84% sensitivity, 98.5% specificity
HIV1 | HIV2 (HIV1 annotated conditional on HIV2): 98.7% sensitivity, 99.5% specificity
Methods
Introduce a pair HMM specific to overlapping reading frames.
3 reading frames ⇒ 2³ × 2³ = 64 states
Define three different types of start transition probability (α, β and γ), depending on the coding state.
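The state count follows from giving each sequence one coding/non-coding flag per reading frame. That yields 2³ = 8 states per sequence and 8 × 8 = 64 joint states for the pair. Below is a minimal Python sketch of that state space under this assumption. The `start_type` helper and its rule for choosing α, β or γ are hypothetical. The poster says only that the three start-transition types depend on the coding state. Here the type is assumed to be indexed by how many other frames of the same sequence are already coding.

```python
from itertools import product

# Per-sequence state: one coding (1) / non-coding (0) flag for each of the
# 3 reading frames, e.g. (1, 0, 1) = frames 0 and 2 coding at the same time.
N_FRAMES = 3
single_states = list(product((0, 1), repeat=N_FRAMES))  # 2**3 = 8 states

# Pair-HMM state: the per-sequence states of the two aligned genomes together.
pair_states = list(product(single_states, repeat=2))     # 8 * 8 = 64 states

START_TYPES = ("alpha", "beta", "gamma")

def start_type(state, frame):
    """HYPOTHETICAL rule for the start-transition class when a gene opens in
    `frame` of one sequence: indexed by how many of the other frames of that
    sequence are already coding (0 -> alpha, 1 -> beta, 2 -> gamma).
    The poster does not specify how alpha, beta and gamma are assigned."""
    if state[frame] == 1:
        raise ValueError("frame is already coding; no gene start possible")
    others_coding = sum(flag for i, flag in enumerate(state) if i != frame)
    return START_TYPES[others_coding]

print(len(single_states), len(pair_states))  # 8 64
print(start_type((0, 1, 1), 0))               # gamma
```

Keeping the flags as tuples makes each joint state hashable. They can then be used directly as keys in transition and emission tables.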
Use an evolutionary model specific to overlapping reading frames: substitutions are accepted according to a selection factor f.
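One way to make "substitutions are accepted by a selection factor f" concrete is sketched below. It is a minimal Python sketch under an assumption the poster does not state. The assumption is that each coding frame in which a substitution changes the amino acid multiplies its neutral rate by f. A site shared by two overlapping genes, where both codons change, is then scaled by f². The function name `acceptance` and the frame-offset convention are illustrative only. The standard genetic code is used throughout.

```python
# Standard genetic code; codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def acceptance(seq, pos, new_base, coding_frames, f):
    """Relative acceptance of the substitution seq[pos] -> new_base.

    ASSUMPTION (not stated on the poster): every coding frame in which the
    change is non-synonymous multiplies the neutral rate by f.
    `coding_frames` holds offsets k in {0, 1, 2}; codons in frame k start
    at positions i with i % 3 == k.
    """
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    factor = 1.0
    for k in coding_frames:
        start = pos - (pos - k) % 3          # first base of the codon holding pos
        if start < 0 or start + 3 > len(seq):
            continue                         # codon runs off the sequence end
        if CODE[seq[start:start + 3]] != CODE[mutated[start:start + 3]]:
            factor *= f                      # non-synonymous in this frame
    return factor

print(acceptance("ATGAAA", 5, "G", [0], 0.1))  # 1.0 (AAA -> AAG, Lys -> Lys)
print(acceptance("ATGAAA", 3, "G", [0], 0.1))  # 0.1 (AAA -> GAA, Lys -> Glu)
# Overlapping frames 0 and 1: CAA -> GAA and TGC -> TGG both change, so f**2.
print(acceptance("ATGCAATT", 3, "G", [0, 1], 0.1))
```

The multiplicative form is one simple choice. It makes a site under two overlapping genes more constrained than a site under one, which is the behaviour an overlapping-frame model needs to capture.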
Use EM, with the Forward-Backward algorithm and Newton-Raphson, for parameter estimation, and the Viterbi algorithm to obtain the annotation.
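Below is a minimal Python sketch of the Viterbi decoding step, on a deliberately simplified model. It is a toy two-state, single-sequence HMM (non-coding "N" vs. coding "C"), not the poster's 64-state pair HMM. All probabilities are made up for illustration. Parameter estimation by EM with Forward-Backward and Newton-Raphson is not shown.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for `obs`, computed in log space."""
    log = math.log
    # V[t][s]: best log-probability of any path ending in state s at position t
    V = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor of s at position t
            prev = max(states, key=lambda r: V[t - 1][r] + log(trans_p[r][s]))
            V[t][s] = V[t - 1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Toy parameters (hypothetical): "coding" emits G/C more often than A/T.
states = ("N", "C")
start_p = {"N": 0.5, "C": 0.5}
trans_p = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
emit_p = {"N": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
          "C": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}

print("".join(viterbi("AAAAGGGGGG", states, start_p, trans_p, emit_p)))  # NNNNCCCCCC
```

Working in log space avoids numerical underflow on genome-length sequences. The same recursion carries over to a pair HMM, where the states become the joint coding configurations and the emissions become aligned nucleotide pairs.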
Conclusion
• Showed the validity of the overlapping pair HMM approach
• Demonstrated the amount of information contained in the conservation of gene structure
• Provided a successful method for annotating new viral strains
Future Work
• Improve the model by adding varying selection levels
• Incorporate a viral genome aligner for de novo gene annotation
• Build a viral genome and evolution simulator to test hypotheses