Roche Life Sciences Workshop, Sept 2008

The Metagenomics RAST server: Annotation, Analysis, and Comparisons
Perfect for Pyrosequencing

Rob Edwards
Department of Computer Science, San Diego State University
Mathematics and Computer Sciences Division, Argonne National Laboratory
www.nmpdr.org www.theseed.org

Outline
• Metagenomics
• Tools for analyzing sequences
• Computational Challenges
• Does it work?

How much has been sequenced?
[Figure: number of known sequences plotted by year. Annotated milestones: first bacterial genome, 100 bacterial genomes, environmental sequencing, 1,000 bacterial genomes.]

How much will be sequenced?
[Figure labels: Everybody in USA; Everybody in San Diego; 100 people; All cultured Bacteria; One genome from every species; Most major microbial environments.]

Metagenomics (Just sequence it)
• Sample: 200 liters water; 5-500 g fresh fecal matter; 50 g soil
• Concentrate and purify bacteria, viruses, etc.
• Epifluorescent microscopy
• Extract nucleic acids
• Sequence
• Publish papers

Modern Metagenomics
• Marine
    • Near-shore water (~100 samples)
    • Off-shore water (~50 samples)
    • Near- and off-shore sediments
• Metazoan associated
    • Corals
    • Fish
    • Human blood
    • Human stool
• Freshwater
    • Aquifer
    • Glacial lake
• Extreme
    • Hot springs (84°C; 78°C)
    • Soda lake (pH 13)
    • Solar saltern (>35% salt)
• Terrestrial/Soil
    • Terragenomics
    • Amazon rainforest
    • Konza prairie
    • Joshua Tree desert
• Air

The Problem
How do you generate consistent and accurate annotations for metagenomes?

The SEED Family

Annotations using subsystems
• FIG developed the notion of a Subsystem: a generalization of “pathway” as a collection of functional roles jointly involved in a biological process or complex.
• Extended subsystems into FIGfams: protein families that perform the same functions.

Annotation of Complete Genomes
http://rast.nmpdr.org/
• Automated user originated processing
• Takes 1-7 hours depending on size and complexity of the genome
• ~2,000 external submissions, including hundreds of genomes not yet publicly released
• Reannotation of >500 genomes complete
• 1,000 users, 200 organizations, 25 countries

The metagenomics RAST server

Automated Processing

Summary View

Metagenomics Tools: Annotation & Subsystems

Metagenomics Tools: Annotation & KEGG maps

Metagenomics Tools: Recruitment Plots

Metagenomics Tools: Phylogenetic Reconstruction

Metagenomics Tools: Comparative Tools

Computational Requirements
[Figure: hours of compute time plotted against input size (MB). Annotation: ~19 hours of compute per input megabyte.]

How much so far
• 986 metagenomes
    • ~300 GS20
    • ~300 FLX
    • ~300 Sanger
• 79,417,238 sequences
• 17,306,834,870 bp (17 Gbp)
• Average: ~15-20 M bp per genome
• Compute time (on a single CPU): 328,814 hours = 13,700 days = 38 years

Lots of sequences
all pyrosequencing

Metagenomics Tools: Functional Heat Maps

From Sequences To Environments
[Figure: plot with axes CDA 60.2% and CDA 21.7%. Labels: Stress, Membrane transport, Sulfur, Signaling, Capsule, Motility, Phosphorus, RNA, Mine, Saltern, Coral, Fish, Respiration, Marine, Microbialites, Animals, Freshwater.]
Dinsdale et al, Nature 200

Workshops
Free workshops on NMPDR, RAST, mg-RAST, SEED
Contact Leslie McNeil [email protected] or visit http://www.nmpdr.org/

Acknowledgements
FIG: Ross Overbeek, Veronika Vonstein, Annotators
Metagenomics Annotation Server: Rick Stevens, Folker Meyer, Bob Olson, Daniel Paarman, Mark D'Souza
Argonne Sequencing: Jared Wilkening, Marc Domanus, Andreas Wilke, Areej Ammar
Statistics & Web services: Liz Dinsdale, Robert Schmieder, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat
Environmental Genomics: Forest Rohwer
All the labs that provided sequence
Artist: Paula Morris

Artist's impression: not all machines are known to explode

Terragenomics
Differences between soil samples
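
The compute-time figures on the "Computational Requirements" and "How much so far" slides can be cross-checked with a few lines of arithmetic. The sketch below is not part of the talk. It assumes one base occupies roughly one byte of input and that a megabyte is 10^6 bytes, neither of which the slides state; the function name `compute_estimate` is hypothetical. Under those assumptions, 19 hours per megabyte reproduces the 328,814-hour total quoted on the slide.

```python
# Figures quoted on the slides.
HOURS_PER_MB = 19              # ~19 hours of compute per input megabyte
TOTAL_BP = 17_306_834_870      # total base pairs across all metagenomes
N_METAGENOMES = 986
N_SEQUENCES = 79_417_238

# Assumptions (not stated on the slides): 1 base ~ 1 byte, 1 MB = 10**6 bytes.
BYTES_PER_MB = 1_000_000


def compute_estimate(total_bp, hours_per_mb=HOURS_PER_MB):
    """Return (input MB, CPU hours, CPU days, CPU years) for a given input size."""
    input_mb = total_bp // BYTES_PER_MB   # whole megabytes of input
    hours = input_mb * hours_per_mb
    days = hours / 24
    years = days / 365
    return input_mb, hours, days, years


input_mb, hours, days, years = compute_estimate(TOTAL_BP)
mean_bp_per_metagenome = TOTAL_BP / N_METAGENOMES
mean_read_length = TOTAL_BP / N_SEQUENCES

print(f"Input size:       {input_mb:,} MB")
print(f"Single-CPU time:  {hours:,} hours = {days:,.0f} days = {years:.1f} years")
print(f"Mean per sample:  {mean_bp_per_metagenome / 1e6:.1f} Mbp")
print(f"Mean read length: {mean_read_length:.0f} bp")
```

The mean per sample comes out inside the ~15-20 Mbp range given on the slide, and the single-CPU total of about 37.5 years agrees with the slide's rounded figure of 38 years.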