Roche Life Sciences Workshop, Sept 2008
The Metagenomics RAST server:
Annotation, Analysis, and Comparisons
Perfect for Pyrosequencing
Rob Edwards
Department of Computer Science,
San Diego State University
Mathematics and Computer Sciences
Division, Argonne National Laboratory
www.nmpdr.org
www.theseed.org
Outline
• Metagenomics
• Tools for analyzing sequences
• Computational Challenges
• Does it work?
How much has been sequenced?
[Figure: number of known sequences by year. Marked milestones: first bacterial genome; 100 bacterial genomes; environmental sequencing; 1,000 bacterial genomes]
How much will be sequenced?
[Figure labels: 100 people; everybody in San Diego; everybody in USA; all cultured Bacteria; one genome from every species; most major microbial environments]
Metagenomics
(Just sequence it)
200 liters water
5-500 g fresh fecal matter
50 g soil
Concentrate and purify bacteria,
viruses, etc
Epifluorescent
Microscopy
Extract nucleic acids
Sequence
Publish papers
Modern Metagenomics
• Marine: near-shore water (~100 samples), off-shore water (~50 samples), near- and off-shore sediments
• Metazoan associated: corals, fish, human blood, human stool
• Freshwater: aquifer, glacial lake
• Terrestrial/Soil (Terragenomics): Amazon rainforest, Konza prairie, Joshua Tree desert
• Air
• Extreme: hot springs (84°C; 78°C), soda lake (pH 13), solar saltern (>35% salt)
The Problem
How do you generate consistent and accurate
annotations for metagenomes?
The SEED Family
Annotations using subsystems
FIG developed the notion of a subsystem – a generalization of a “pathway” as a collection of functional roles jointly involved in a biological process or complex.
Subsystems were then extended into FIGfams – protein families whose members perform the same function.
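The two definitions above can be pictured as simple data structures. The following is only an illustrative sketch, not the SEED's actual data model: the class names, the `subsystems_for` helper, the example roles, and the protein identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    """A named collection of functional roles involved in one process or complex."""
    name: str
    roles: set = field(default_factory=set)


@dataclass
class ProteinFamily:
    """A FIGfam-like family: proteins believed to perform the same function."""
    function: str
    members: set = field(default_factory=set)


def subsystems_for(family, subsystems):
    """Return the names of subsystems whose roles include the family's function."""
    return [s.name for s in subsystems if family.function in s.roles]


# Illustrative data only; the protein identifiers are placeholders.
histidine = Subsystem(
    name="Histidine biosynthesis",
    roles={"Histidinol dehydrogenase", "Imidazoleglycerol-phosphate dehydratase"},
)
family = ProteinFamily(
    function="Histidinol dehydrogenase",
    members={"protein_A", "protein_B"},
)

print(subsystems_for(family, [histidine]))
```

Under this sketch, assigning a sequence to a family gives it a function, and the function in turn places it in every subsystem that lists that role.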
Annotation of Complete Genomes
http://rast.nmpdr.org/
• Automated, user-originated processing
• Takes 1-7 hours, depending on the size and complexity of the genome
• ~2,000 external submissions, including hundreds of genomes not yet publicly released
• Reannotation of >500 genomes complete
• 1,000 users, 200 organizations, 25 countries
The metagenomics RAST server
Automated Processing
Summary View
Metagenomics Tools
Annotation & Subsystems
Metagenomics Tools
Annotation & KEGG maps
Metagenomics Tools
Recruitment Plots
Metagenomics Tools
Phylogenetic Reconstruction
Metagenomics Tools
Comparative Tools
Computational Requirements
~19 hours of compute per input megabyte
[Figure: hours of compute time versus input size (MB)]
How much so far
986 metagenomes
~300 GS20
~300 FLX
~300 Sanger
79,417,238 sequences
17,306,834,870 bp (17 Gbp)
Average: ~15-20 Mbp per metagenome
Compute time (on a single CPU):
328,814 hours = 13,700 days = 38 years
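The single-CPU total can be checked against the ~19 hours per input megabyte rate quoted earlier. This is a minimal sketch of that arithmetic, assuming one megabyte of input corresponds to roughly one million bases (an assumption not stated on the slides); the function and constant names are hypothetical.

```python
HOURS_PER_MB = 19  # rate quoted on the Computational Requirements slide


def compute_estimate(total_bp, hours_per_mb=HOURS_PER_MB):
    """Return (hours, days, years) of single-CPU time for total_bp bases.

    Assumes 1 MB of input is about 1,000,000 bases (one byte per base).
    """
    megabytes = total_bp / 1_000_000
    hours = megabytes * hours_per_mb
    days = hours / 24
    years = days / 365
    return hours, days, years


# Total bases across the 986 metagenomes reported on the slide.
hours, days, years = compute_estimate(17_306_834_870)
print(f"{hours:,.0f} hours = {days:,.0f} days = {years:.0f} years")
```

Under these assumptions the estimate lands within a few tens of hours of the 328,814 hours on the slide; the small gap is consistent with the per-megabyte rate being a rounded figure.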
Lots of sequences
all pyrosequencing
Metagenomics Tools
Functional Heat Maps
From Sequences To Environments
[Figure: plot with axes CDA 60.2% and CDA 21.7%. Environment labels: mine, saltern, coral, fish, marine, microbialites, animals, freshwater. Functional labels: stress, membrane transport, sulfur, signaling, capsule, motility, phosphorus, RNA, respiration]
Dinsdale et al, Nature 200
Workshops
Free workshops on NMPDR, RAST, mg-RAST, SEED
Contact Leslie McNeil
[email protected]
or visit
http://www.nmpdr.org/
Acknowledgements
FIG
Ross Overbeek
Veronika Vonstein
Annotators
Metagenomics Annotation Server
Rick Stevens
Folker Meyer
Bob Olson
Daniel Paarman
Mark D'Souza
Argonne Sequencing
Jared Wilkening
Marc Domanus
Andreas Wilke
Areej Ammar
Statistics & Web services
Liz Dinsdale
Robert Schmieder
Dana Hall
Beltran Rodriguez-Brito
Bahador Nosrat
Environmental Genomics
Forest Rohwer
All the labs that provided sequence
Artist
Paula Morris
Artist's impression: not all machines are known to explode
Terragenomics
Differences between soil samples