The Global Evolution and Adaptation of Vibrio cholerae Across Multiple Niche Dimensions
Flinders 2015, Rob Edwards

How to annotate a couple of hundred genomes: annotation of microbial genomes and comparison across differences

Outline: Cholera; Haiti; Genome sequencing; ORF calling; Annotation; Global evolution; Niche dimensions

Cholera is caused by Vibrio cholerae
* A worldwide pandemic
* About 3-5 million cases per year
* About 100,000-200,000 deaths worldwide per year
* Notable deaths: Tchaikovsky

Symptoms
* About 75% of infected people have no symptoms
* In severe cases, 25-50 pints of diarrhea per day
* Severe symptoms are caused by dehydration

Treatment
* Clean water
* Electrolytes
* Vaccine
* Not antibiotics

Multiple pandemics
* 1st: 1817 to 1823. Started at the Ganges, spread by colonialists
* 2nd: 1829 to 1849. Worldwide spread via immigrants
* 3rd: 1852 to 1859. John Snow, the first epidemiologist

First epidemiological study: John Snow
* Portrait painted in 1847, when he was 34 years old
* Cholera outbreak in Soho, London, 1854
* Plotted all cases on a map and found a big cluster around a water well
* John Snow's map of the 1854 outbreak, from On the Mode of Communication of Cholera
* Cholera is caused by bacteria

Multiple pandemics (continued)
* 4th: 1863 to 1879. Originated in Mecca
* 5th: 1881 to 1896. First cholera vaccine (1892)
* 6th: 1899 to 1923. Killed 800,000 people
* 7th: 1961 to present

Haitian outbreak
* Earthquake: January 12th, 2010
* No cholera in Haiti for more than 50 years
* First case: October 22nd, 2010
* By February 2011: 250,000 cases and about 5,000 deaths
* What was the original source?
Haitian cholera outbreaks (map: http://www.ph.ucla.edu/)
Source: Final Report of the Independent Panel of Experts on the Cholera Outbreak in Haiti
[Figures: cases by day, Mirebalais Hospital; cases by age, St Marc Hospital, on October 20th, 2010]

Haitian outbreak: two hypotheses
* Endemic: a waterborne strain that has been in Haiti but has not caused disease for 50 years
* Imported from another country

The environmental hypothesis
"They have been fortunate in Haiti that for 50 years the conditions have been such that they haven’t had an intense increase in cholera bacterial populations. ... But they’ve had an earthquake, they’ve had destruction, they’ve had a hurricane ... I think it’s very unfortunate to look for a scapegoat. It is an environmental phenomenon that is involved."
Rita Colwell, Johns Hopkins School of Public Health

The human hypothesis
"The organism that is causing the disease is very uncharacteristic of (Haiti and the Caribbean), and is quite characteristic of the region from where the soldiers in the base came. ... I don't see there is any way to avoid the conclusion that an unfortunate and presumably accidental introduction of the organism occurred."
John Mekalanos, Harvard Medical School

Conditions favor the human hypothesis
Source: Final Report of the Independent Panel of Experts on the Cholera Outbreak in Haiti

Global evolution of Vibrio
* Can we use genomics to identify the global temporal/spatial evolution of Vibrio?
* Which gene(s) are important for variation?
Prototype Vibrio cholerae sequence: TIGR, Nature 406, 477-483 (3 August 2000)

Sequenced genomes
* 2011: 32 Vibrio strains sequenced (Fabiano Thompson's lab @ UFRJ, Fundação Oswaldo Cruz)
* [Figure: Ion quality scores]
* 2011: 171 Vibrio strains sequenced

How do you analyze 250+ genomes?

The steps in genome sequencing (figure: www.sigmaaldrich.com)
* Generate genome sequence
* Assembly
* ORF calling
* tRNA identification
* rRNA identification
* Functional annotation

Terminology
* Open Reading Frame (ORF): a stretch of codons with no stop codon
* Coding Sequence (CDS), or Protein encoding gene (PEG): an ORF that could encode a protein
* Putative protein, or hypothetical protein: a protein that has not been experimentally shown
* Polypeptide: a chain of amino acids

[Figures: reads per chromosome for Chr. I and Chr. II, with the cholera toxin phage marked]

Assembly, ORF calling, annotation: Vibrio genomes annotated using RAST

Single nucleotide polymorphisms
ATCATCGATCAGCATGCATCAGCATCGATCAGC
ATCATCGATCAGCATGCATCAGCATCGATCAGC
ATCATCGATCAGCATGCATCAGCCTCGATCAGC
ATCATCGATCAGCATGCATCAGCCTCGATCAGC
ATCATCGATCAGCAAGCATCAGCCTCGATCAGC
ATCATCGATCAGCAAGCATCAGCCTCGATCAGC
ATCATCGATCAGCAAGCATCAGCCTCGATCAGC
ATCATCGATCAGCAAGCATCAGCCTCGAGCAGC
ATCATCGATCAGCAAGCATCAGCCTCGAGCAGC

Global evolution (Mutreja et al 2011)
* Waves of spread of cholera
* Different evolution for each wave

On the source of Haitian cholera
[Figure: phylogeny of V. harveyi, V. parahaemolyticus, V. mimicus and V. cholerae, including V. cholerae from Bangladesh in 1994, Bangladesh in 2002 and Haiti in 2010]

Nepalese soldiers?
* Outbreak in Kathmandu, Nepal before the soldiers left
* Outbreaks downstream (not upstream) along the river from the Nepalese UN camp
* But that could have come from river trade
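ORF calling, one of the annotation steps listed above, can be illustrated with a minimal sketch. It assumes ATG is the only start codon and TAA, TAG and TGA are the stops (bacteria also use alternative starts such as GTG and TTG), takes the first ATG in each frame after a stop, and ignores overlapping genes. The function names are hypothetical. Real annotation pipelines rely on dedicated gene callers such as Glimmer or Prodigal, which score candidate ORFs with statistical models rather than a simple length cutoff.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are left as-is)."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def _orfs_on_strand(seq, min_codons):
    """Yield (start, end) of ATG..stop ORFs in the three forward frames of seq.

    end is exclusive and includes the stop codon. Only the first ATG after
    each stop (or after the start of the frame) is used, so nested start
    codons are ignored, and ORFs that run off the end of seq are dropped.
    min_codons counts codons from the ATG up to, not including, the stop.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def find_orfs(seq, min_codons=100):
    """Find ORFs on both strands; returns (strand, start, end) in forward coordinates."""
    seq = seq.upper()
    n = len(seq)
    orfs = [("+", s, e) for s, e in _orfs_on_strand(seq, min_codons)]
    # Scan the reverse strand, then map its coordinates back onto the forward strand.
    rc = reverse_complement(seq)
    orfs += [("-", n - e, n - s) for s, e in _orfs_on_strand(rc, min_codons)]
    return sorted(orfs, key=lambda orf: (orf[1], orf[2]))


if __name__ == "__main__":
    # Toy sequence: one short ORF (ATG AAA TTT GGG TAA) on the forward strand.
    print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
```

The default of 100 codons (about 300 bp) is an illustrative assumption that filters out short, spurious ORFs; real gene callers tune this trade-off per genome.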
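The SNP slide shows the same stretch of sequence picking up one substitution at a time. Counting the positions at which two aligned sequences differ gives a pairwise SNP distance, which is the raw material for SNP-based phylogenies like the one in Mutreja et al 2011. This sketch assumes the sequences are already aligned to equal length (in a real study, SNPs are called by mapping reads to a reference genome), and the function names are illustrative.

```python
from itertools import combinations


def snp_positions(a, b):
    """0-based positions where two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def snp_distance_matrix(seqs):
    """Pairwise SNP counts for a dict of name -> aligned sequence.

    Returns a dict keyed by (name1, name2), filled in both directions.
    """
    names = list(seqs)
    dist = {(name, name): 0 for name in names}
    for p, q in combinations(names, 2):
        d = len(snp_positions(seqs[p], seqs[q]))
        dist[(p, q)] = dist[(q, p)] = d
    return dist


if __name__ == "__main__":
    # The four distinct sequences from the SNP slide.
    seqs = {
        "s1": "ATCATCGATCAGCATGCATCAGCATCGATCAGC",
        "s2": "ATCATCGATCAGCATGCATCAGCCTCGATCAGC",
        "s3": "ATCATCGATCAGCAAGCATCAGCCTCGATCAGC",
        "s4": "ATCATCGATCAGCAAGCATCAGCCTCGAGCAGC",
    }
    for pair, d in sorted(snp_distance_matrix(seqs).items()):
        print(pair, d)
```

For the slide's sequences, each step differs from the previous one by a single SNP, and the first and last sequences differ by three.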
Ships used to fly the yellow flag when they were quarantined because of cholera.

Evolution not only by SNPs
* Conservation of the ~120 kb superintegron region across 210 strains

Horizontal gene transfer versus vertical evolution
[Figure: a mother cell passes SNPs to its daughters (vertical evolution), while HGT moves genes horizontally between cells]

Niche dimensions
* 210 Vibrio genomes, reassembled and reannotated
* Find interesting genes!
* Metadata for each genome: year, continent, country, lat/lon coordinates, clinical or environmental source, serogroup, serotype

V. cholerae classification
* Cholera toxin: serogroups O1 and O139, which cause epidemics; O1 is divided into the Classical and El Tor biotypes and the Ogawa and Inaba serotypes
* Non-cholera toxin: no disease

Response variables: the niche dimensions above. Genomic variables used to explain them:
* 15,000 genes in the pangenome
* 933 subsystems (pathways) present in at least one genome
* SNPs (after Mutreja)

Analysis
* Recreate the evolution of the Vibrios
* Identify the important genes for each niche dimension: who, what, when, where!
* Use random forests to identify important variables

Random forest
[Figure: example table of O-antigen, exopolysaccharide, capsule, sialic acid and DNA recombination gene values for O1 and O139 strains, and a decision tree that splits on exopolysaccharide (<50), DNA recombination and capsule (<10) to separate O1 from O139]
* Each tree votes on the importance of each variable
* Typically, run 10,000 trees

Response variables and niche dimensions
* Genes important for who? (serogroup)
* Genes important for what? (clinical, environmental, ...)
* Genes important for where? (continent): separation of functions by continent
* Genes important for when? (year): DNA repair

DNA repair and phages
* Normal DNA repair (134 strains)
* Additional DNA repair (4 strains; not O1)
* Phage-borne DNA repair (72 strains)
[Figure: umuC/umuD gene arrangements, including a umuC carried on a prophage]

Different evolution for each wave (Mutreja et al 2011)
* Waves 2 and 3 have phage-interrupted DNA repair

Conclusions
* Unraveling the evolution and spread of new pathogens
* Mining genomes and niche dimensions
* Don't get scooped! Multi-genome projects??
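The slides describe a pangenome of about 15,000 genes across 210 genomes, with some regions (such as the superintegron) conserved and others gained or lost by horizontal gene transfer. Below is a minimal sketch of the bookkeeping involved. It assumes genes have already been clustered into families and that each genome is represented as a set of family IDs. It splits the pangenome into core families (present in every genome), accessory families (missing from at least one genome) and unique families (present in exactly one genome). It also builds the 0/1 presence/absence matrix that a random forest can take as input. The names and toy data are hypothetical.

```python
from collections import Counter


def pangenome_summary(genomes):
    """Split gene families into pan, core, accessory and unique sets.

    genomes maps a genome name to the set of gene-family IDs found in it.
    unique families are a subset of accessory ones (when there are >1 genomes).
    """
    sets = list(genomes.values())
    pan = set().union(*sets)
    core = set.intersection(*sets) if sets else set()
    counts = Counter(fam for s in sets for fam in s)
    unique = {fam for fam, c in counts.items() if c == 1} - core
    return {"pan": pan, "core": core, "accessory": pan - core, "unique": unique}


def presence_absence_matrix(genomes):
    """Return (gene_ids, {genome: [0/1 per gene]}) with gene IDs in sorted order."""
    gene_ids = sorted(set().union(*genomes.values()))
    matrix = {
        name: [1 if fam in fams else 0 for fam in gene_ids]
        for name, fams in genomes.items()
    }
    return gene_ids, matrix


if __name__ == "__main__":
    # Hypothetical toy pangenome of three strains.
    genomes = {
        "strain_1": {"famA", "famB", "famC"},
        "strain_2": {"famA", "famB", "famD"},
        "strain_3": {"famA", "famC", "famE"},
    }
    summary = pangenome_summary(genomes)
    print({k: sorted(v) for k, v in summary.items()})
    print(presence_absence_matrix(genomes))
```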
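The random forest slides say each tree votes on the importance of each variable. Below is a deliberately simplified, standard-library sketch of that idea. It is not the pipeline used in the talk. Every tree is a single split (a stump) on the presence or absence of one gene, fitted to a bootstrap sample of genomes using a random subset of genes. Each gene's importance is the total Gini impurity decrease it earns across trees, normalized to sum to 1. The toy data and function names are hypothetical. A real analysis would grow full trees, for example with scikit-learn's RandomForestClassifier and its feature_importances_ attribute, or with R's randomForest package.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(X, y, features):
    """Find the presence/absence split with the largest Gini decrease.

    X is a list of 0/1 rows (genomes x genes); y is the niche label of each row.
    Returns (feature_index, impurity_decrease), or (None, 0.0) if no split helps.
    """
    parent = gini(y)
    n = len(y)
    best_f, best_gain = None, 0.0
    for f in features:
        absent = [lab for row, lab in zip(X, y) if row[f] == 0]
        present = [lab for row, lab in zip(X, y) if row[f] == 1]
        if not absent or not present:
            continue  # gene is constant in this sample, so it cannot split
        child = len(absent) / n * gini(absent) + len(present) / n * gini(present)
        if parent - child > best_gain:
            best_f, best_gain = f, parent - child
    return best_f, best_gain


def stump_forest_importance(X, y, n_trees=500, max_features=None, seed=0):
    """Normalized Gini importance from a forest of depth-1 trees."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    if max_features is None:
        max_features = max(1, int(n_features ** 0.5))  # common default: sqrt(p)
    importance = [0.0] * n_features
    for _ in range(n_trees):
        # Bootstrap: resample genomes with replacement for this tree.
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Each tree only considers a random subset of genes.
        candidates = rng.sample(range(n_features), max_features)
        f, gain = best_stump(Xb, yb, candidates)
        if f is not None:
            importance[f] += gain  # the tree's "vote" for this gene
    total = sum(importance)
    return [v / total for v in importance] if total else importance


if __name__ == "__main__":
    # Hypothetical toy data: 8 genomes x 4 gene families; gene 0 tracks serogroup.
    X = [[1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 0, 0, 0],
         [0, 1, 1, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
    y = ["O1"] * 4 + ["O139"] * 4
    print(stump_forest_importance(X, y, n_trees=1000))
```

Raising n_trees toward the 10,000 mentioned on the slide mainly stabilizes the importance estimates. Restricting each tree to a random subset of genes is what lets weaker, correlated genes still receive votes.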
Current multi-genome projects (organism: number of genomes)
S. pyogenes: 3,615
S. pneumoniae: 3,085
Rice (Oryza sativa): 3,000
C. elegans: 2,007
Clostridium difficile: 1,250
The 1000 Genomes Project (human): 1,092
Mycobacterium tuberculosis: 1,000
Plasmodium falciparum: 825
Streptococcus pneumoniae: 616
Mycobacterium tuberculosis: 390
Salmonella in cattle and humans: 373
Vibrio: 274
Shigella sonnei: 263
Mycobacterium tuberculosis: 259
Streptococcus pneumoniae: 240
Methicillin-resistant Staphylococcus aureus: 193
Campylobacter jejuni: 192
Mycobacterium abscessus in CF: 170
Source: Nick Loman, http://lab.loman.net/