Comparative Annotation of Viral Genomes with Non-Conserved Gene Structure
Saskia de Groot and Jotun Hein
Department of Statistics, University of Oxford

Motivation
Viral genome annotation is a complex task:
• Overlapping and nested reading frames
• Atypical sequence evolution
• Non-conserved gene structure
⇒ Current comparative HMM methodologies can't cope.

Aim
• Introduce a pair hidden Markov model to annotate two aligned homologous genomes simultaneously
• Analyse HIV1 and HIV2, two sequences that are related but have non-homologous gene structure
• Incorporate prior knowledge by annotating one sequence conditional on the other

Methods
• Introduce a pair HMM specific to overlapping reading frames. 3 reading frames ⇒ 2^3 × 2^3 = 64 states
• Define three different types of start transition probability α, β, γ depending on coding state
• Use an evolutionary model specific to overlapping reading frames: substitutions are accepted by a selection factor f
• Use EM with Forward-Backward and Newton-Raphson for parameter estimation, and Viterbi to get the annotation

Results
• HIV2 vs. HIV2: 84-89% sensitivity, 97-99.9% specificity
• HIV1 vs. HIV2: 84% sensitivity, 98.5% specificity
• HIV1 | HIV2: 98.7% sensitivity, 99.5% specificity

Conclusion
• Shown validity of overlapping pair HMM approach
• Demonstrated amount of information contained in conservation of gene structure
• Provided successful method for annotating new viral strains

Future Work
• Improve model by adding varying selection levels
• Incorporate a viral genome aligner for de novo gene annotation
• Build viral genome & evolution simulator to test hypotheses
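The state count quoted under Methods can be checked with a short enumeration. The sketch below assumes that each of the 3 reading frames in a genome is independently either coding or non-coding, which gives 2^3 = 8 states per genome and 8 × 8 = 64 states for the aligned pair. The function names and the "C"/"-" labelling are illustrative assumptions, not taken from the poster, and the sketch covers only the state space, not the transition or evolutionary model.

```python
from itertools import product

FRAMES = 3  # number of reading frames, as stated on the poster


def genome_states(frames=FRAMES):
    """All coding (1) / non-coding (0) patterns over the frames of one genome."""
    return list(product((0, 1), repeat=frames))


def pair_states(frames=FRAMES):
    """All combinations of one state per genome for two aligned genomes."""
    return list(product(genome_states(frames), repeat=2))


def label(state):
    """Render a pair state, e.g. 'C--|-C-' (hypothetical notation)."""
    first, second = state
    fmt = lambda s: "".join("C" if coding else "-" for coding in s)
    return f"{fmt(first)}|{fmt(second)}"


states = pair_states()
# Pair states in which both genomes share the same coding pattern,
# i.e. gene structure is conserved at that alignment column.
conserved = [s for s in states if s[0] == s[1]]

print(len(states))     # 64
print(len(conserved))  # 8
```

Under this assumption only 8 of the 64 pair states have identical coding patterns in both genomes. The other 56 describe columns where gene structure differs, which is the situation the Motivation says existing comparative methods cannot handle.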