UBC, Vancouver, June 2006

THE GLOBAL MARINE VIRIOME

Rob Edwards
Dept. Biology, SDSU
Computational Sciences Research Center, SDSU
Center for Microbial Sciences, San Diego
Fellowship for Interpretation of Genomes, Chicago, IL
The Burnham Inst. for Medical Research, San Diego
IMEC, LLC, San Diego

Outline
• Forget DGGE, just sequence it
  – (Fabulous four-five-four for facile functional findings)
• Functional analysis is a blast
• Is community structure antiestablishment?
• Are there viruses in the ocean?
• Why people suck
• Why we’re screwed

Metagenomics
• 200 liters water
• 5-500 g fresh fecal matter
• Concentrate and purify viruses
• Epifluorescent Microscopy
• Extract nucleic acids (DNA/RNA)
• LASL
• Sequence
Breitbart et al., multiple papers

Pyrosequencing
• whole genome amplification
• 5-100 ng DNA
• 2-5 µg DNA
www.454.com

454 Sequence Data (Only from Rohwer Lab, in one year)
• 42 libraries
  – 22 microbial, 20 phage
• 1,028,563,420 bp total
  – 33% of the human genome
  – 95% of all complete and partial bacterial genomes
  – 10% of community sequencing of JGI per year
• 9,933,184 sequences
  – Average 236,511 per library
• Average read length 103.5 bp
  – Av. read length has not increased in 12 months

Sampling Sites
• Marine
  – Near-shore water
  – Off-shore water
  – Near- and off-shore sediments
• Freshwater
  – Aquifer
  – Glacial lake
• Metazoan associated
  – Corals
  – Fish
  – Human blood
  – Human stool
• Terrestrial/Soil
  – Amazon rainforest
  – Konza prairie
  – Joshua Tree desert
• Air
• Extreme
  – Hot springs (84 °C; 78 °C)
  – Soda lake (pH 13)
  – Solar saltern (>35% salt)

The SEED database developed by FIG
http://theseed.uchicago.edu/FIG/index.cgi
Current version:
• 580 Bacteria (342 complete)
• 38 Archaea (26 complete)
• 562 Eukarya (29 complete)
• 1335 Viruses
• 2 Environmental Genomes

Subsystems are not just for gene clusters
• genome context (virulence islands, prophages, conserved gene clusters)
• virulence mechanism
• enzymatic activity
• cellular localization
• predicted or measured co-regulation
• common phenotype
• combinations of criteria

Cyanoseed: http://cyanoseed.theFIG.info
Marine Seed: http://theseed.uchicago.edu/FIG/organisms.cgi?show=marine

Assembly of 454 sequences
• mitochondrion ca. 17 kb
• assembled fragment ca. 10 kb
Thanks: Lutz Krause

Community structure
Community structure based on frequency of finding overlapping fragments from the sequences
• 2-contig
• 3-contig

Phages in the World's Oceans
• ARC: 56 samples, 16 sites, 1 year
• BBC: 85 samples, 38 sites, 8 years
• LI: 4 sites, 1 year
• GOM: 41 samples, 13 sites, 5 years
• SAR: 1 sample, 1 site, 1 year

Most Marine Phage Sequences are Novel

Phages are specific to environments
Phage Proteomic Tree v. 5 (Edwards, Rohwer)
• ssDNA-like
• T4-like
• T7-like
Thanks: Mya Breitbart

Marine Single-Stranded DNA Viruses
• 6% of SAR sequences ssDNA phage (Chlamydia-like Microviridae)
• 40% viral particles in SAR are ssDNA phage
• Several full-genome sequences were recovered via de novo assembly of these fragments
• Confirmed by PCR and sequencing

SAR Aligned Against the Chlamydia 4
[Figure: individual sequence reads, coverage, and concatenated hits along the Chlamydia phi 4 genome, with Chl4 ORF calls]
12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome

Phages, Reefs, and Human Disturbance
The Northern Line Islands Expedition, 2005
• Kingman
• Palmyra
• Washington
• Fanning
• Christmas

16S rDNA at each island

16S genes from 454 same as from cloning
Cloned and 454 sequenced 16S are indistinguishable
[Figure legend: Black stuff; Cloned; Red]

Christmas to Kingman Bias in No. Phage Hosts
Negative numbers mean relatively more phage hosts at Kingman
• More photosynthesis at Kingman. No people at Kingman.
• More pathogens at Christmas. More people at Christmas.

Computational Challenges
• Sequence annotations and analysis
  – What is there?
  – What is it doing?
  – How is it doing it?
• Gene predictions in unknowns
  – Lutz Krause
• Sequence comparisons
  – BLAST
  – Other ways to rapidly compare short sequences
  – What happens when everyone is using 454 sequencing?
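
The slides ask for "other ways to rapidly compare short sequences" without naming one. As an illustration only, and not a method taken from this talk, one alignment-free idea is to compare the sets of k-mers (length-k substrings) two reads contain. The function names, the choice of k = 8, and the toy reads below are all assumptions made for this sketch.

```python
def kmers(seq, k=8):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=8):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        # Both reads are shorter than k, so there is nothing to compare.
        return 0.0
    return len(ka & kb) / len(ka | kb)


# Toy reads (made up): read2 overlaps read1 by 30 bases, read3 is unrelated.
read1 = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGA"
read2 = "GCGTACGTTAGCCTAGGATCCGATTACGGATTC"
read3 = "TTTTTTTTCCCCCCCCAAAAAAAAGGGGGGGGT"

print("read1 vs read2:", jaccard(read1, read2))
print("read1 vs read3:", jaccard(read1, read3))
```

A set comparison like this costs time roughly proportional to read length, which is the appeal for ~100 bp reads. It works on nucleotides only, so it is not a substitute for the translated searches (BLASTX, TBLASTX) used in the talk.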
Sequence data from 21 libraries
• 600 million bp
• 6 million sequences
• Each BLASTX search takes 1,000 CPU hours
• 42 libraries = 42,000 CPU hours or 4.8 CPU years
• Users want
  – repeat runs,
  – TBLASTX,
  – more analysis
  – more data
  – more, more, more, more

SDSU: Forest Rohwer, Beltran Rodriguez-Brito, Lutz Krause
USF: Mya Breitbart
Rohwer Lab: Linda Wegley, Florent Angly, Matt Haynes
ANL: Rick Stevens, Bob Olsen
CI Support
Also at SDSU: Anca Segall, Willow R-S, Stanley Maloy
MIT: Ed DeLong
FIG: Veronika Vonstein, Ross Overbeek, Annotators
Math Guys@SDSU: Peter Salamon, Joe Mahaffy, James Nulton, Ben Felts, David Bangor, Steve Rayhawk, Jennifer Mueller
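
As a back-of-the-envelope check, the read-length and BLASTX cost figures quoted in this talk can be reproduced with a few lines of arithmetic. The input numbers come from the slides; the variable names and the 365-day year are assumptions of this sketch.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes a 365-day year

# Figures quoted on the "454 Sequence Data" slide.
total_bp = 1_028_563_420
total_reads = 9_933_184
libraries = 42

# Figure quoted on the BLASTX cost slide.
cpu_hours_per_library = 1_000

mean_read_length = total_bp / total_reads
cpu_hours = libraries * cpu_hours_per_library
cpu_years = cpu_hours / HOURS_PER_YEAR

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"BLASTX cost: {cpu_hours:,} CPU hours = {cpu_years:.1f} CPU years")
```

Both results match the slides when rounded to one decimal place: 103.5 bp per read and 4.8 CPU years for one BLASTX pass over all 42 libraries.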