
UBC, Vancouver, June 2006
THE GLOBAL MARINE VIRIOME
Rob Edwards
Dept. Biology, SDSU
Computational Sciences Research Center, SDSU
Center for Microbial Sciences, San Diego
Fellowship for Interpretation of Genomes, Chicago, IL
The Burnham Inst. for Medical Research, San Diego
IMEC, LLC, San Diego
Outline
• Forget DGGE, just sequence it
– (Fabulous four-five-four for facile functional findings)
• Functional analysis is a blast
• Is community structure antiestablishment?
• Are there viruses in the ocean?
• Why people suck
• Why we’re screwed
Metagenomics
• 200 liters water or 5-500 g fresh fecal matter
• Concentrate and purify viruses (epifluorescent microscopy)
• Extract nucleic acids
• DNA/RNA LASL
• Sequence
Breitbart et al., multiple papers
Pyrosequencing
whole genome amplification
5-100 ng DNA
2-5 µg DNA
www.454.com
454 Sequence Data
(Only from Rohwer Lab, in one year)
• 42 libraries
– 22 microbial, 20 phage
• 1,028,563,420 bp total
– 33% of the human genome
– 95% of all complete and partial bacterial genomes
– 10% of community sequencing of JGI per year
• 9,933,184 sequences
– Average 236,511 per library
• Average read length 103.5 bp
– Av. read length has not increased in 12 months
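The averages on this slide can be rechecked from the totals it reports. The sketch below is only a sanity check using the slide's own figures; the function names are illustrative. The per-library quotient comes out a few reads below the 236,511 shown, so the slide's value was presumably computed slightly differently (for example from per-library counts), which is an assumption.

```python
# Totals copied from the slide; nothing here is new data.
TOTAL_BP = 1_028_563_420
TOTAL_READS = 9_933_184
LIBRARIES = 42


def mean_read_length(total_bp, total_reads):
    """Average read length in bp."""
    return total_bp / total_reads


def mean_reads_per_library(total_reads, libraries):
    """Average number of reads per library."""
    return total_reads / libraries


if __name__ == "__main__":
    print(f"{mean_read_length(TOTAL_BP, TOTAL_READS):.1f} bp per read")
    print(f"{mean_reads_per_library(TOTAL_READS, LIBRARIES):,.0f} reads per library")
```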
Sampling Sites
• Freshwater
– Aquifer
– Glacial lake
• Marine
– Near-shore water
– Off-shore water
– Near- and off-shore sediments
• Metazoan associated
– Corals
– Fish
– Human blood
– Human stool
• Terrestrial/Soil
– Amazon rainforest
– Konza prairie
– Joshua Tree desert
• Air
• Extreme
– Hot springs (84 °C; 78 °C)
– Soda lake (pH 13)
– Solar saltern (>35% salt)
The SEED database developed by FIG
http://theseed.uchicago.edu/FIG/index.cgi
Current version:
580 Bacteria (342 complete)
38 Archaea (26 complete)
562 Eukarya (29 complete)
1335 Viruses
2 Environmental Genomes
Subsystems are not just for gene clusters
• genome context (virulence islands, prophages, conserved gene clusters)
• virulence mechanism
• enzymatic activity
• cellular localization
• predicted or measured co-regulation
• common phenotype
• combinations of criteria
Cyanoseed:
http://cyanoseed.theFIG.info
Marine Seed:
http://theseed.uchicago.edu/FIG/organisms.cgi?show=marine
Assembly of 454 sequences
mitochondrion ca. 17 kb
assembled fragment ca. 10 kb
Thanks: Lutz Krause
Community structure
Community structure based on frequency of finding overlapping fragments from the sequences
2-contig
3-contig
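One way to tabulate "frequency of finding overlapping fragments" is a contig spectrum: the number of contigs built from 1 read, 2 reads, 3 reads, and so on. The sketch below assumes that "2-contig" and "3-contig" on the slide mean contigs made of two and three reads, and that an assembler has already reported how many reads went into each contig. The function name and input format are hypothetical, and the step from spectrum to a community-structure estimate is not covered here.

```python
from collections import Counter


def contig_spectrum(reads_per_contig, max_size=None):
    """Return [c1, c2, ...] where c_q is the number of contigs made of q reads.

    reads_per_contig: iterable of positive ints, one per contig
    (unassembled reads count as contigs of size 1).
    """
    counts = Counter(reads_per_contig)
    top = max_size or max(counts, default=0)
    return [counts.get(q, 0) for q in range(1, top + 1)]


if __name__ == "__main__":
    # Three singletons, two 2-contigs, one 3-contig.
    print(contig_spectrum([1, 1, 1, 2, 2, 3]))
```

A spectrum dominated by singletons suggests few overlapping fragments in the sample; more 2- and 3-contigs suggest that some genomes were sampled repeatedly.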
Phages In The World's Oceans
ARC: 56 samples, 16 sites, 1 year
BBC: 85 samples, 38 sites, 8 years
LI: 4 sites, 1 year
GOM: 41 samples, 13 sites, 5 years
SAR: 1 sample, 1 site, 1 year
Most Marine Phage Sequences are Novel
Phages are specific to environments
[Figure: Phage Proteomic Tree v. 5 (Edwards, Rohwer); labeled groups: ssDNA-like, T4-like, T7-like]
Thanks: Mya Breitbart
Marine Single-Stranded DNA Viruses
• 6% of SAR sequences ssDNA phage (Chlamydia-like Microviridae)
• 40% viral particles in SAR are ssDNA phage
• Several full-genome sequences were recovered via de novo assembly of these fragments
• Confirmed by PCR and sequencing
SAR Aligned Against the Chlamydia phi 4 Genome
[Figure tracks: individual sequence reads; coverage; concatenated hits; Chlamydia phi 4 genome; Chl4 ORF calls]
12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome
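A coverage track like the one on this slide can be built by counting, for each genome position, how many hits span it. The sketch below is illustrative and is not the tool used for the figure. It assumes hits have already been parsed into (start, end) genome coordinates, 1-based and inclusive; the function name and input format are hypothetical.

```python
def coverage_from_hits(hits, genome_length):
    """Per-position hit depth along a reference genome.

    hits: iterable of (start, end) pairs, 1-based inclusive; start may be
    greater than end for hits on the reverse strand.
    Returns a list of length genome_length (index 0 is position 1).
    """
    # Difference array: +1 where a hit begins, -1 just past where it ends.
    diff = [0] * (genome_length + 2)
    for start, end in hits:
        lo, hi = sorted((start, end))
        lo = max(lo, 1)
        hi = min(hi, genome_length)
        if lo > hi:
            continue  # hit lies entirely outside the genome
        diff[lo] += 1
        diff[hi + 1] -= 1

    coverage = []
    depth = 0
    for pos in range(1, genome_length + 1):
        depth += diff[pos]
        coverage.append(depth)
    return coverage


if __name__ == "__main__":
    # Toy example on a 6 bp "genome".
    print(coverage_from_hits([(1, 3), (3, 2), (5, 9)], 6))
```

The difference-array approach keeps the cost linear in the number of hits plus the genome length, which matters little for a ~4.5 kb genome but scales to larger references.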
Phages, Reefs, and Human Disturbance
The Northern Line Islands Expedition, 2005
Islands: Kingman, Palmyra, Washington, Fanning, Christmas
16S rDNA at each island
16S genes from 454 same as from cloning
Black stuff
Cloned and 454 sequenced
16S are indistinguishable
Cloned
Red
Red
Christmas to Kingman Bias in No. Phage Hosts
Negative numbers mean relatively more phage hosts at Kingman
More photosynthesis at Kingman.
No people at Kingman.
More pathogens at Christmas.
More people at Christmas.
Computational Challenges
• Sequence annotations and analysis
– What is there?
– What is it doing?
– How is it doing it?
• Gene predictions in unknowns
– Lutz Krause
• Sequence comparisons
– BLAST
– Other ways to rapidly compare short sequences
– What happens when everyone is using 454 sequencing?
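As one illustration of an "other way" to compare short sequences quickly, reads can be reduced to sets of k-mers and compared by set overlap, with no alignment step. This is a generic technique chosen for the example, not a method named in the talk. It works on nucleotides only, so unlike BLASTX or TBLASTX it will miss similarity that exists only at the protein level. The function names and the default k are arbitrary.

```python
def kmers(seq, k=8):
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=8):
    """Jaccard similarity of the k-mer sets of two sequences, from 0.0 to 1.0."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    read_1 = "ACGTACGTTAGCCGATAGCTAGCTAGGATC"
    read_2 = "ACGTACGTTAGCCGATAGCTAGCTAGGTTC"  # differs from read_1 at one base
    print(f"{jaccard(read_1, read_2):.2f}")
```
<test>
assert jaccard("ACGTACGTAC", "ACGTACGTAC", k=4) == 1.0
assert jaccard("AAAAAAAA", "CCCCCCCC", k=4) == 0.0
assert jaccard("ACG", "ACG", k=8) == 0.0
assert kmers("acgt", k=2) == {"AC", "CG", "GT"}
</testest>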
Sequence data from 21 libraries
600 million bp
6 million sequences
• Each BLASTX search takes 1,000 CPU hours
• 42 libraries = 42,000 CPU hours or 4.8 CPU years
• Users want
– repeat runs
– TBLASTX
– more analysis
– more data
– more, more, more, more
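The CPU-time estimate above is straightforward arithmetic and can be reproduced as a quick check. The sketch assumes one BLASTX search per library at the quoted 1,000 CPU hours and a 365-day year; the function and parameter names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def blast_cost(libraries, cpu_hours_per_search=1000, searches_per_library=1):
    """Return (total CPU hours, total CPU years) for searching every library."""
    cpu_hours = libraries * cpu_hours_per_search * searches_per_library
    return cpu_hours, cpu_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    hours, years = blast_cost(42)
    print(f"{hours:,} CPU hours = {years:.1f} CPU years")
```

Raising `searches_per_library` shows how quickly repeat runs or additional TBLASTX searches multiply the total.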
SDSU
Forest Rohwer
Beltran Rodriguez-Brito
Lutz Krause
USF
Mya Breitbart
Rohwer Lab
Linda Wegley
Florent Angly
Matt Haynes
ANL
Rick Stevens
Bob Olsen
CI Support
Also at SDSU
Anca Segall
Willow R-S
Stanley Maloy
MIT:
Ed DeLong
FIG
Veronika Vonstein
Ross Overbeek
Annotators
Math Guys@SDSU
Peter Salamon
Joe Mahaffy
James Nulton
Ben Felts
David Bangor
Steve Rayhawk
Jennifer Mueller