CAMERA
A Metagenomics Resource for Marine Microbial Ecology
July 27, 2007
Paul Gilna, UCSD/Calit2
Saul A. Kravitz, J. Craig Venter Institute

Acknowledgements
• UCSD/Calit2
  - Larry Smarr, PI; Paul Gilna, Executive Director
  - Phil Papadopoulos, Technical Lead
  - Weizhong Li
• JCVI
  - Marv Frazier, co-PI
  - Leonid Kagan, Architect; Jennifer Wortman, Bioinformatics
  - Rekha Seshadri, Outreach and Training; Doug Rusch, Shibu Yooseph, Aaron Halpern, Granger Sutton
• UC Davis
  - Jonathan Eisen, co-investigator
• Gordon and Betty Moore Foundation
  - David Kingsbury and Mary Maxon

Outline
• New Discipline of Metagenomics
• Global Ocean Sampling Expedition
• Challenges of Metagenomic Data
• CAMERA Features
• CAMERA Usage to Date
• Cyberinfrastructure

Genomics vs Metagenomics
• Genomics – ‘Old School’
  - Study of an organism's genome
  - Genome sequence determined using shotgun sequencing and assembly
  - ~1300 microbes sequenced, first in 1995
  - DNA usually obtained from pure cultures
• Metagenomics
  - Application of genome sequencing methods to environmental samples (no culturing)
  - Environmental shotgun sequencing is the most widely used approach

Metagenomic Questions
• Within an environment
  - What biological functions are present (absent)?
  - What organisms are present (absent)?
• Compare data from (dis)similar environments
  - What are the fundamental rules of microbial ecology?
• Search for novel proteins and protein families

Metagenomics Applications
• Marine Ecology and Microbiology
• Alternative Energy and Industrial
  - Hypersaline ponds, Oceans
  - Termite Metabolism
• Medical Applications
  - Microbial Ecology of Human body cavities and fluids
• Agricultural
  - Disease Vector Metabolism (Glassy-winged Sharpshooter)
  - Soil Ecology
• Environmental Remediation
  - DOE: Acid Mine Drainage, Chemical and Radioactive Waste

Metadata
• Metagenomics
  - Genomics + Metadata
• Environmental Metadata
  - Time and location (lat, long, depth) of sample collection
  - Correlate w/remote sensing data
  - Physico-chemical properties (e.g. temperature, salinity)
[Figure: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003]

JCVI Global Ocean Sampling Expedition
Largest Metagenomic Study to Date

Global Ocean Sampling (GOS)
178 Total Sampling Locations
Phase 1: 41 samples, 7.7M reads, >6M proteins
Diverse Environments: open ocean, estuary, embayment, upwelling, fringing reef, atoll, warm seep, mangrove, fresh water, biofilms, sediments, soils

GOS Protein Analysis
Yooseph et al (PLoS 2007)
• Novel clustering process
  - Sequence similarity based
  - Predict proteins and group into related clusters
  - Include GOS and all known proteins
• Findings
  - GOS proteins cover ~all existing prokaryotic families
  - GOS expands diversity of known protein families
  - 1700 large novel clusters with no homology to known protein families
  - Higher than expected proportion of novel clusters are viral
  - No saturation in the rate of novel protein family discovery

Added Diversity
[Figure: two panels, Rubisco homologs and UVDE homologs. Taxon labels: H. marismortui, D. radiodurans, D. psychrophila, T. thermophilus, B. halodurans, B. anthracis. Legend: GOS eukaryotes, GOS prokaryotes, GOS viral, Known eukaryotes, Known prokaryotes, Known viral]

Rate of Protein Discovery
[Figure: Rate of discovery. x-axis: Number of sequences (millions), 0 to 7. y-axis: Number of clusters (thousands), 0 to 250. Curves for cluster size >=3, >=5, >=10, >=20]

Fragment Recruitment Viewer
Rusch et al, PLoS 3/2007
[Figure: fragment recruitment plot. x-axis: Reference Genome Coordinates. y-axis: Percent Identity (tick labels 100%, 55%, 50%). Annotations: "Sequence absent from most strains – phage/other lateral transfer?"; “core” genome, ~75% identical; Ribosomal operon]

Why CAMERA?
• Public repositories not focused on environmental metagenomics
  - Sargasso Sea data underutilized by community
• M$ invested in sequencing and analysis but only accessible to bioinformatics elite
• Release of GOS dataset in March 2007
• Comply with Convention on Biodiversity

CAMERA – http://camera.calit2.net
• “Convenient acronym for cumbersome name…” - Henry Nichols, PLoS Biology
• Mission
  - Enable Research in Marine Microbiology
• CAMERA Partners:

Challenges
• Enormous datasets with high gene density
  - Large compute resources required
  - 2 orders of magnitude jump
• Fragmentary data
  - Inadequate bioinformatics tools for assembly, annotation, analysis, visualization
• Metadata standards non-existent
  - Metadata absent from databases
  - Lack of standards impedes collection of datasets
• Diversity of User Sophistication and Needs

CAMERA Services
• Maintain searchable sequence collections
  - ALL metagenomic sequence reads, assemblies
  - Non-identical amino acid collection (extended NRAA)
  - Viral, Fungal, pico-Eukaryotes, Microbial
  - CAMERA protein clusters
• Metagenomics data easily downloadable
• Interactive and Batch Search Facility
  - Scalable parallel implementations of BLAST
  - Integrated with associated metadata

Distinctive Features Set in Progress
• Graphical Tools for Visualizing Diversity
  - Based on Rusch et al
  - Fragment recruitment viewer
• CAMERA Protein Clusters
  - Based on Yooseph et al
  - Incremental version implemented in 2007
• Annotation
  - Break through quadratic complexity via clusters
  - Phyletic Classification
• Overviews of sequence collections

Fragment Recruitment Viewer
Metagenomic Sequence vs Reference Sequence
• Highlight and Select with Associated Metadata
• View large datasets
• AJAX I/F
Based on Doug Rusch’s Viewer
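
The idea behind a fragment recruitment plot (metagenomic reads placed on a reference genome, with reference coordinate on one axis and percent identity on the other) can be sketched in a few lines. This is a minimal illustration only, not the CAMERA or Rusch et al implementation: the `Hit` record, the function names, the sample labels `GS01`/`GS02`, the 50% identity cutoff and the 1000 bp window are all assumptions made for the example, and the alignments are taken as already computed (for instance by a BLAST search).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One read-to-reference alignment (hypothetical record layout)."""
    read_id: str
    sample: str          # sample metadata label (made-up values below)
    ref_start: int       # 1-based start on the reference genome
    ref_end: int         # 1-based inclusive end on the reference genome
    pct_identity: float  # percent identity of the alignment


def recruit(hits, min_identity=50.0):
    """Keep each read's best hit at or above the identity cutoff (assumed 50%)."""
    best = {}
    for h in hits:
        if h.pct_identity < min_identity:
            continue
        cur = best.get(h.read_id)
        if cur is None or h.pct_identity > cur.pct_identity:
            best[h.read_id] = h
    return list(best.values())


def plot_points(recruited):
    """Return sorted (x, y, sample) tuples: x is the alignment midpoint on the
    reference, y is percent identity; sample lets a viewer colour by metadata."""
    return sorted(
        ((h.ref_start + h.ref_end) / 2, h.pct_identity, h.sample)
        for h in recruited
    )


def uncovered_windows(recruited, genome_length, window=1000):
    """List (start, end) reference windows that no recruited read touches.
    Such gaps are candidates for sequence absent from the sampled strains."""
    n_windows = (genome_length + window - 1) // window
    covered = [False] * n_windows
    for h in recruited:
        first = (h.ref_start - 1) // window
        last = min((h.ref_end - 1) // window, n_windows - 1)
        for i in range(first, last + 1):
            covered[i] = True
    return [
        (i * window + 1, min((i + 1) * window, genome_length))
        for i in range(n_windows)
        if not covered[i]
    ]


# Toy input: four alignments of three reads against a 4000 bp reference.
hits = [
    Hit("r1", "GS01", 100, 900, 98.0),
    Hit("r1", "GS01", 5000, 5800, 60.0),   # weaker secondary hit, dropped
    Hit("r2", "GS02", 1200, 1900, 75.0),
    Hit("r3", "GS02", 3100, 3900, 45.0),   # below the cutoff, dropped
]
recruited = recruit(hits)
points = plot_points(recruited)
gaps = uncovered_windows(recruited, genome_length=4000)
```

On the toy input, `points` holds one (midpoint, identity, sample) tuple per recruited read, and `gaps` lists the two windows from 2001 to 4000 that recruited nothing. Keeping the sample label on every point is what would let a viewer highlight or select reads by their associated metadata.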