Picormatics

Today's goal: Give you an overview of some recent technological bioinformatics developments that can be applied to picornaviruses. Where possible in less than a day's work, I have applied those techniques, as an example, to 'my' virus: R14.

This seminar is available (without ©) from: http://swift.cmbi.ru.nl/gv/seminars/

Some notes up front
Your community is not very WWW oriented. This is concluded from a low number of cross pointers, high numbers of dead links and incomplete sites, and a lack of update dates, contact addresses, references, etc.
Your community is not very bioinformatics oriented either. Example: www.iah.bbsrc.ac.uk holds a beautifully complete list of VP1 sequences, one-by-one....

Your 'simple' bioinformatics options
Many protein structures
Many protein sequences
This allows for structure-based sequence alignments that are very precise, and therefore allow for novel sequence analysis techniques:
1) Correlated mutation analysis
2) Sequence variability analysis

Structure-based alignment
This is the top-left corner of an alignment of ~1000 sequences of ~300 residues.

Correlated mutations
APGADSFGDFHKM
ALGADSFRDFRRL
ARGLDPFGMNHSI
AGGLDPFRMNRRV
Gray is conserved
Black is variable
Red/green are correlated mutations
Correlated mutations guarantee a function. Function is determined by the position in the structure, not by the residue type.

Correlated mutations
Pilot indicates this works for VP1,2,3 too.

Correlated mutations and drug design
Correlations between residues and ligands.

Automatic structure comparison
Rhino  Polio  FMDV  Mengo
9      2      1     1
R14 drug placed in R16
agonist / antagonist: example from nuclear hormone receptor drug design study

Back to sequences
First rule of sequence analysis: If a residue is conserved, it is important.
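
The conserved / variable / correlated distinction from the four-sequence example alignment can be made concrete with a few lines of Python. This is only a minimal sketch under a simplifying assumption: two positions count as correlated when they split the sequences into exactly the same groups, and positions where every sequence has a different residue are ignored because they carry no co-variation signal in such a tiny alignment. The function names are hypothetical, positions are 0-based, and a real correlated mutation analysis on ~1000 sequences would need a statistical score rather than this exact-match rule.

```python
from collections import defaultdict

# The four example sequences from the "Correlated mutations" slide.
ALIGNMENT = [
    "APGADSFGDFHKM",
    "ALGADSFRDFRRL",
    "ARGLDPFGMNHSI",
    "AGGLDPFRMNRRV",
]


def column_pattern(column):
    """Relabel residues by order of first appearance, e.g. 'AALL' -> (0, 0, 1, 1)."""
    labels = {}
    return tuple(labels.setdefault(res, len(labels)) for res in column)


def classify_columns(alignment):
    """Return (conserved, correlated_groups, variable) as 0-based positions."""
    n_pos = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(n_pos)]

    # Conserved: a single residue type in the whole column.
    conserved = [i for i, col in enumerate(columns) if len(set(col)) == 1]

    # Group columns that partition the sequences in the same way.
    groups = defaultdict(list)
    for i, col in enumerate(columns):
        pattern = column_pattern(col)
        n_types = len(set(pattern))
        # Assumption: skip conserved columns and columns where every
        # sequence has a different residue.
        if 1 < n_types < len(col):
            groups[pattern].append(i)

    correlated = [positions for positions in groups.values() if len(positions) > 1]
    in_group = {i for positions in correlated for i in positions}
    variable = [i for i in range(n_pos) if i not in conserved and i not in in_group]
    return conserved, correlated, variable


if __name__ == "__main__":
    conserved, correlated, variable = classify_columns(ALIGNMENT)
    print("conserved :", conserved)
    print("correlated:", correlated)
    print("variable  :", variable)
```

On this toy alignment the rule finds four conserved positions, two groups of co-varying positions, and three remaining variable positions. The exact-match rule is deliberately strict; with real data one would tolerate exceptions and weigh how many sequences support each pattern.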

Sequence analysis (continued)
Second rule of sequence analysis: If a residue is very conserved, it is very important.

But what about the variable residues
Entropy at alignment position i:
E_i = - \sum_{j=1}^{20} p_j \ln(p_j)
(p_j is the fraction of residue type j at position i; the sum runs over the 20 amino acid types)

But what about the variable residues
Sequence variability at a position is the number of residue types that are present in more than 0.5% of all sequences.

Entropy - Variability
Entropy = Information
Variability = Chaos

Entropy - Variability
Box 11: main function
Box 12: first shell around main function
Box 22: core residues (signal transduction)
Box 23: modulator
Box 33: mainly surface

Mutation information
Most information about mutations is carefully hidden in the literature. Automatic extraction of this information is no longer science fiction. More than 90% of the 2226 mutations used for the previous few slides were extracted automatically from the literature. We extracted 160 more mutations 'by hand'. Problems are mainly related to protein/gene nomenclature, residue numbering, and unclear descriptions of the effects.

Mutation data
[Bar charts, percentage axis, one bar per box (Box 11, Box 12, Box 22, Box 23, Box 33). Panels: Transcription, Diseases, Coregulator, Dimerization, No effect, Ligand binding, No mutations.]

Picorna mutation information
A PubMed search gives:
picornavirus mutation, rhinovirus mutation, poliovirus mutation, mengovirus mutation
1176, 101, 600, 30
(2), (62), (144), (29)
About 1 in 5 (in a small manually checked subset) contained identifiable mutation information in the abstract, but unfortunately often with nomenclature that 'our' software doesn't understand yet.

Now something totally different
Motion is the main ingredient for protein function.
Even if that function is as 'dumb' as being a container for the RNA. For example, all early Rhino-directed drugs were aimed at reducing the mobility of its VP1…

Protein dynamics calculation
The simulation of protein motion is normally called molecular dynamics, or MD. MD is commonly known as a very difficult technique for which you need the help of an army of mathematicians. That is no longer true. Dynamite (based on Bert de Groot's CONCOORD software) predicts protein motions via the WWW.

A short break for a word from our sponsors
Laerte Oliveira
Wilma Kuipers, Weesp
Bob Bywater, Copenhagen
Nora vd Wenden, The Hague
Mike Singer, New Haven
Ad IJzerman, Leiden
Margot Beukers, Leiden
Fabien Campagne, New York
Øyvind Edvardsen, Tromsø
Simon Folkertsma, Frisia
Henk-Jan Joosten, Wageningen
Joost van Durma, Brussels
David Lutje Hulsik, Utrecht
Tim Hulsen, Goffert
Manu Bettler, Lyon
Florence Horn
Adje, Margot
Our industrial sponsor:
David, Elmar, Tim, Fabien, Manu, Simon Folkertsma, Krieger