The Emerging Global Community of Microbial Metagenomics Researchers

Opening Talk, Metagenomics 2007
Calit2@UCSD, July 11, 2007

Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD

Abstract

Calit2, the J. Craig Venter Institute, and UCSD's SDSC and Scripps Institution of Oceanography are creating a metagenomic Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA), funded by the Gordon and Betty Moore Foundation. The CAMERA computational and storage cluster, which contains multiple ocean microbial metagenomic datasets as well as the full genomes of ~166 marine microbes, is actively in use. End users can access the metagenomic data either via the web or over novel dedicated 10 Gb/s light paths (termed "lambdas") through the National LambdaRail. The end-user clusters are reconfigured as "OptIPortals," providing the end user with local scalable visualization, computing, and storage. Currently, over 1000 users from over 40 countries are registered with CAMERA, and over a dozen remote OptIPortal sites are becoming active. This CAMERA-connected community sets the stage for creating a software system to support a social network of metagenomic researchers: a "MySpace" for scientists. We look forward to gathering ideas from Metagenomics 2007 participants for the functional requirements of such a system.

Calit2 Brings Computer Scientists and Engineers Together with Biomedical Researchers
• Some Areas of Concentration:
– Algorithmic and System Biology
– Bioinformatics
– Metagenomics
– Cancer Genomics
– Human Genomic Variation and Disease
– Proteomics
– Mitochondrial Evolution
– Computational Biology
– Multi-Scale Cellular Imaging
– Information Theory and Biological Systems
– Telemedicine
UC Irvine
National Biomedical Computation Resource, an NIH-supported resource center
Southern California Telemedicine Learning Center (TLC)

PI Larry Smarr
Paul Gilna, Ex. Dir.
Philip Papadopoulos, SDSC/Calit2, 2pm Friday
Announced January 17, 2006
$24.5M Over Seven Years

CAMERA 1.1 is Up and Running!

CAMERA Combines Genomic and Metagenomic Tools

Can We Create a “My Space” for Science Researchers?
Microbial Metagenomics as a Cyber-Community
Over 1000 Registered Users From 45 Countries
70 CAMERA Users Feedback Session Friday 2pm Paul Gilna
Registered users by country:
USA              583
United Kingdom    46
Canada            35
France            35
Germany           32
• Calit2 is Prototyping Social Networks for Researchers
• Research Intelligence Project
– ri.calit2.net
• Add in:
– MyProteins
– MyMicrobes
– MyEnvironments
– MyPapers
– MyGenomes

Emerging Capabilities That Tie Together Metagenomics Researchers
• Advanced Computing Techniques
• Broad Coverage of Complete Microbe Genomes
– Moore Foundation
– DOE JGI
• Proteomics of Microbes
• Cellular Network Models

Metagenomic Challenge: Enormous Biodiversity
Very Little of GOS Metagenomic Data Assembles Well
• Use Reference Genomes to Recruit Fragments
– Compared 334 Finished and 250 Draft Microbial Genomes
• Only 5 Microbial Genera Yielded Substantial and Uniform Recruitment
– Prochlorococcus, Synechococcus, Pelagibacter, Shewanella, and Burkholderia
Source: Douglas Rusch, et al. (PLOS Biology March 2007)

Use of Self Organizing Maps to Identify Species
Massive Computation on the Japanese Earth Simulator
SOM Created from an Unsupervised Neural Network Algorithm to Analyze Tetranucleotide Frequencies in a Wide Range of Genomes
10kb Moving Window
[Figure labels: C. elegans, Drosophila, Rice, Arabidopsis, Fugu, Human]
T. Abe, H. Sugawara, S. Kanaya, T. Ikemura, Journal of the Earth Simulator, Volume 6, October 2006, 17–23
www.es.jamstec.go.jp/publication/journal/jes_vol.6/pdf/JES6_22-Abe.pdf

Using SOM, Sargasso Sea Metagenomic Data Yields 92 Microbial Genera!
Input Genomes:
– 1500 Microbes
– 40 Eukaryotes
– 1065 Viruses
– 642 Mitochondria
– 42 Chloroplasts
5kb Window
[Figure labels: Eukaryotes, Mitochondria, Chloroplasts, Prokaryotes, Viruses]
T. Abe, H. Sugawara, S. Kanaya, T. Ikemura, Journal of the Earth Simulator, Volume 6, October 2006, 17–23

Moore Microbial Genome Sequencing Project
Selected Microbes Throughout the World’s Oceans
Microbes Nominated by Leading Ocean Microbial Biologists
www.moore.org/microgenome/worldmap.asp

Moore Foundation Funded the Venter Institute to Provide the Full Genome Sequence of 155 Marine Microbes
Phylogenetic Trees Created by Uli Stingl, Oregon State
Blue Means Contains One of the Moore 155 Genomes
www.moore.org/microgenome/trees.aspx

Moore 155 Marine Microbial Genomes Give Broad Coverage of Microbial “Tree of Life”
Phylogenetic Trees Created by Uli Stingl, Oregon State
www.moore.org/microgenome/alpha-proteobacteria.aspx

Joint Genome Institute is a Leading Microbial Genomic Source
JGI Metagenomics Projects (42 Projects)
• 2005
– termite hindgut (Caltech)
– planktonic archaea (MIT)
– EBPR sludge (UW/UQ)
– groundwater (ORNL)
• 2006
– AMD
– Alaskan soil (UW)
– Gutless worm (MPI)
– TA-degrading bioreactor (NUS)
– Antarctic bacterioplankton (DRI)
– hypersaline mats (UCol)
– Korarchaeota enrichment
– Farm soil (Diversa)
• 2007
– 8 new metagenomic projects
Source: Eddie Rubin, DOE JGI

Key Problem with Analysis of Microbial Metagenomic Data
At Least 40 Phyla of Bacteria, But Only a Few are Well Sampled
[Figure: bacterial phyla shown: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
Source: Eddie Rubin, DOE JGI

DOE Genomic Encyclopedia of Bacteria and Archaea (GEBA) / Bergey
Solution: Deep Sampling Across Phyla
[Figure: the same phyla as on the previous slide, with legend: Well sampled phyla; No cultured taxa]
Source: Eddie Rubin, DOE JGI

GEBA / Bergey Pilot Project at JGI
• Goal
– To Finish ~100 Bacterial and Archaeal Genomes
– Selected Based on:
– Phylogeny,
– Availability of Phenotype Information
– Community Interest
– Input / Interactions with: Community Advisory Group, ASM, Academy of Microbiology, Etc…
• Approach
– Select 200 Organisms
– Order DNA from Culture Collections (DSMZ and ATCC)
– Sequence 100 for which DNA QC is Received
• Project Lead (Jonathan Eisen JGI/UC Davis)
– Project Management (David Bruce JGI/LANL)
– Methods for Sequencing in Changing Technology Landscape (Paul Richardson JGI)
– Linking to educational project (Cheryl Kerfeld JGI)
Source: Eddie Rubin, DOE JGI

Converting Genome Sequences to Protein Fold Space
• How many folds?
• How many sequences adopt the same fold?
• How does function vary as sequences diverge within a family?
• Are there still Kingdom-specific families?
• Can we determine function from structure?
• How diverse are metabolic pathways and networks?
[Figure: 5-amino-6-(5-phosphoribosylamino) uracil reductase, JCSG: 2hxv]

Building Genome-Scale Models of Living Organisms
E. coli i2K
• E. coli
– Has 4300 Genes
– Model Has 2000!
[Figure: integrated E. coli model diagram. Panels: Transcription & Translation; Regulation; Metabolism; Monomers & Energy; Proteins; Regulatory Actions; Input Signals. Data layers: Genomics, Transcriptomics, Proteomics, Metabolomics, Interactomics, Environment. Map legend: growth/biomass precursors, extracellular metabolite, intracellular metabolite, reaction/gene name. Citations on figure: JTB 2002, JBC 2002.]
Regulatory rules shown on the figure:
– If [Carbon1] > 0, tc2 = 0
– If Rh > 0, [H] is in surplus, t6a = 0
– If R1 = 0, we say [B] is not in surplus, t2a = t5 = 0
– If Oxygen = 0, we say [O2] = 0, tres = t3b = 0
in Silico Organisms Now Available 2007:
• Escherichia coli
• Haemophilus influenzae
• Helicobacter pylori
• Homo sapiens Build 1
• Human red blood cell
• Human cardiac mitochondria
• Methanosarcina barkeri
• Mouse Cardiomyocyte
• Mycobacterium tuberculosis
• Saccharomyces cerevisiae
• Staphylococcus aureus
Source: Bernhard Palsson, UCSD Genetic Circuits Research Group
http://gcrg.ucsd.edu

Biochemically, Genetically and Genomically (BiGG) Genome-Scale Metabolic Reconstructions
Organism           Reactions   Genes
S. aureus                640     619
S. typhimurium           898     826
M. barkeri               619     692
RBC                       39
Mitoc.                   218
H. sapiens              3311    1496
E. coli                 2035    1260
M. tuberculosis          939     661
H. pylori                558     341
H. influenzae            472     376
S. cerevisiae           1402     910
Systems Biology Research Group
http://systemsbiology.ucsd.edu

Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome
Acidobacteria bacterium Ellin345, Soil Bacterium, 5.6 Mb
Source: Raj Singh, UCSD

OptIPortal: Termination Device for the Dedicated Gigabit/sec Lightpaths
Collaborative Analysis of Large Scale Images of Cancer Cells
Integration of High Definition Video Streams with Large Scale Image Display Walls
Photo Source: David Lee, Mark Ellisman, NCMIR, UCSD

An Emerging High Performance Collaboratory for Microbial Metagenomics
OptIPortals: UW, UMich, NW!, UIC EVL, MIT, UC Davis, JCVI, UCI, SIO, UCSD, SDSU, CICESE
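
Editorial illustration (not part of the talk): the SOM slides earlier in this transcript describe analyzing tetranucleotide frequencies over a 10 kb moving window (and a 5 kb window for the Sargasso Sea data). The sketch below shows one plausible way to turn a DNA sequence into per-window 256-dimensional frequency vectors of the kind a SOM could take as input. The function names, the `step` parameter, and the choice to skip 4-mers containing ambiguous bases are assumptions made for illustration; the slides do not specify how Abe et al. handled strand, normalization, or window overlap, and SOM training itself is not shown.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 256 possible tetranucleotides in a fixed order, so every window
# produces a vector with the same layout.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return relative frequencies of the 256 tetranucleotides in seq.

    Assumption: 4-mers containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if all(base in BASES for base in kmer):
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def windowed_frequencies(seq, window=10_000, step=10_000):
    """Yield (start, frequency_vector) for each full window of the sequence.

    window=10_000 mirrors the "10kb Moving Window" on the slide; the step
    size is a hypothetical parameter (step == window means no overlap).
    """
    for start in range(0, len(seq) - window + 1, step):
        yield start, tetranucleotide_frequencies(seq[start:start + window])
```

For example, `list(windowed_frequencies(genome, window=5_000, step=5_000))` would give one vector per non-overlapping 5 kb fragment, where `genome` is any string of bases supplied by the reader.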