Comparative Genomics of Viruses: VirGen as a case study
Dr. Urmila Kulkarni-Kale
Bioinformatics Centre, University of Pune, Pune 411 007
[email protected]

Biodiversity
Data diversity
Data Diversity at various levels of Biocomplexity

Viral Comparative Genomics
• Viruses: Best represented taxa with respect to complete genome sequences
• Viral genome sequences:
– 'Entries' in primary sequence databases such as GenBank/EMBL/DDBJ
– Lack of annotations for genomic sequences
– Opportunity to develop 'Derived' database

VirGen: Comparative genomics & data mining of viral genomes
© Bioinformatics Centre, University of Pune
Browse VirGen at http://bioinfo.ernet.in/virgen/virgen.html

Public Repository Database

Issues involved in data curation
• Source of data: GenBank
• Retrieval engine: Entrez
• Queries: well-designed Perl scripts
• Consistent with ICTV nomenclature
• Annotation including strain information
• Generation of representative list of genomes
• Sequence-based ontology for protein name
• Annotation of unannotated entries using representative genomes

What is a complete or putative genome record?
• Complete genome: annotated as a 'complete genome record' by the primary sequence databases available in the public domain.
• Putative genome: not annotated as a 'complete genome record', but likely to be a complete genome because the sequence length is in the typical range of the complete genome for the respective virus.
• As the database contains multiple genomic entries for various strains/isolates of most viruses, a 'representative genomic entry' is identified for every viral species. The representative entries provide a non-redundant set of viral genome sequences, which are subsequently used for annotation and to study phylogenetic relationships.
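The complete/putative distinction described above can be sketched in Python. This is a minimal illustration, not VirGen's actual curation code (the slides state that VirGen uses Perl scripts). The `GenomeRecord` fields, the `classify` function, the "partial" and "unclassified" labels, and the example species name and length range are all hypothetical and chosen only for demonstration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class GenomeRecord:
    """Hypothetical minimal view of a genomic entry from a primary database."""
    accession: str
    species: str
    length: int               # sequence length in bases
    annotated_complete: bool  # True if the source entry says 'complete genome'


def classify(record: GenomeRecord,
             typical_ranges: Dict[str, Tuple[int, int]]) -> str:
    """Label a record 'complete', 'putative', 'partial' or 'unclassified'.

    'complete' : the primary database annotates it as a complete genome.
    'putative' : not annotated as complete, but its length falls inside the
                 typical complete-genome range for the species.
    'partial'  : not annotated as complete and outside that range
                 (this label is an assumption, not a VirGen term).
    """
    if record.annotated_complete:
        return "complete"
    bounds = typical_ranges.get(record.species)
    if bounds is None:
        # No reference range is known for this species.
        return "unclassified"
    low, high = bounds
    return "putative" if low <= record.length <= high else "partial"


# Illustrative values only; not real accessions or measured genome lengths.
ranges = {"Example virus": (10000, 11500)}
records = [
    GenomeRecord("ACC0001", "Example virus", 10800, True),
    GenomeRecord("ACC0002", "Example virus", 10650, False),
    GenomeRecord("ACC0003", "Example virus", 2300, False),
]
labels = {r.accession: classify(r, ranges) for r in records}
```

In this sketch the length range per species would have to come from curated reference data; how VirGen derives that range is not stated in the slides.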
Organisation of VirGen

Salient features of VirGen:
• Organizes genomic data in a structured fashion, navigating from the family to an isolate
• Full genomes of viruses
• Compilation of representative genome entries for every viral species (Virus Taxonomy, 7th report of ICTV)
• Complete annotation of every genomic entry
• Graphical representation of genome organization using SVG technology
• Generation of alternative names of proteins
• On-the-fly genome comparisons using BLAST2
• Multiple Sequence Alignment (MSA) of genomes, proteomes and individual proteins
• Whole genome phylogeny
• Prediction of B-cell epitopes

Design & Implementation
OS: Microsoft Windows 2000 server
DBMS: MySQL™
Data processing & Query system: CGI Perl scripts and ASP
Graphical interface: SVG
Web interface: HTML implementing VB and Java scripts

Sequence analysis programs used
Sequence similarity search: BLAST v2.2.5 (Altschul et al., 1997)
Genome comparisons: BLAST2 v2.2.5
Multiple Sequence Alignment: Parallel version of ClustalW v1.8 (Chenna et al., 2003)
Phylogeny: Parallel version of PHYLIP v3.573 (Felsenstein & SGI)
B-cell epitope prediction: Kolaskar & Tongaonkar (1990)
VirGen home
• Menu to browse viral families
• Navigation bar
• Search using Keywords & Motifs
• Genome analysis & Comparative genomics resources
• Guided tour & Help

Sample genome record in VirGen
• Tabular display of genome annotation
• Retrieve sequence in FASTA format
• 'Alternate names' of proteins

Graphical view of Genome Organization
• Viral polyprotein along with the UTRs
• Graphical view generated dynamically using Scalable Vector Graphics technology

Multiple Sequence Alignment
• MSA
• Link for batch retrieval of sequences
• Dendrogram

Browsing the module of Whole Genome Phylogenetic trees
• Most parsimonious tree of genus Flavivirus
• Input data: Whole genome
• Method: DNA parsimony
• Bootstrapping: 1000

Browsing the module of Predicted epitopes
• B-cell epitopes predicted using Kolaskar & Tongaonkar method

VirGen: Structure bin
• Links to CEP for precomputed sequential and conformational epitopes
• CEP: Conformational Epitope Prediction Server http://bioinfo.ernet.in/cep.htm
• Precomputed CEs: OCA browser (PDB) links CEP predictions

Applications of VirGen
• Representative Genome list
– A curated and annotated data set for analyses
• Genome View
– Graphical representation of genome organisation
– Insertion/Deletion analysis
– Gene order
• MSA data
– Discovery of patterns: Diagnostics
– Primer design
• Predicted epitopes
– Vaccinome at a glance: DNA/peptide vaccine
• Whole Genome Phylogeny
– Evolution of strains/viruses
– Characterisation of virus

Case study: Whole Genome Phylogeny depicts clustering of viruses w.r.t. their vectors
Family: Flaviviridae
Tree labels: Pestivirus; Flavi: Tick borne; Flavi: Mosquito borne; Hepacivirus; Unassigned ?; Pestivirus

Case Study: Insertions in Pestivirus 1
• 891-1787 bp region remains unannotated using representative strain
• What is the origin of the insert?
• BLAST with VirGen confirmed the non-viral origin of the insert
• BLAST with GenBank produced significant match with Bos taurus J-domain protein

VirGen: current statistics