MetaPhlAn v2 and tracking microbes at the strain level

Edoardo Pasolli
Nicola Segata's Lab, Laboratory of Computational Metagenomics
Centre for Integrative Biology, University of Trento, Italy
MetaSUB Conference, June 20th, 2015

The shotgun metagenomic workflow

[Figure: workflow diagram]

Taxonomic profiling: who's there?

[Diagram: tons of microbes -> shotgun sequencing -> tons of short reads -> taxonomic profiling -> organismal relative abundances. Reference inputs to taxonomic profiling: sequenced genomes of (some) microbes; microbial taxonomy.]

• MetaPhlAn (Segata et al., Nature Methods 2012)
• MetaPhlAn v2 (released, under review)
• http://segatalab.cibio.unitn.it/tools/metaphlan2

Taxonomic profiling: unique marker genes

[Diagram: Gene X. X is a unique marker gene for clade Y.]

IDEA
1. Pre-identify markers from reference genomes
2. Use markers as proxy for taxonomic clades in shotgun metagenomics

Method: ChocoPhlAn

THE INPUT
• ~25,000 genomes (Bacteria, Archaea, Fungi, Viruses)
• ~1/10 of genomes are final, ~9/10 draft
• ~7,100 species (excludes incomplete annotations, spp., etc.)

THE RESULTING DATABASE
• ~15M total unique marker genes
• ~1M "most representative" unique marker genes
• 180±45 markers per species (200 fixed max)
• Quasi-markers used to resolve ambiguity in postprocessing

[Figure: Number of microbial organisms in RefSeq, by year, 2003 to 2013; vertical axis from 0k to 18k]

Taxonomic profiling: MetaPhlAn's overview

[Diagram labels: reference genomes + taxonomy; marker identification, ChocoPhlAn (offline); marker database (Clade 1, Clade 2); metagenome; mapping; MetaPhlAn database; profiled metagenome]

Taxonomic profiling: MetaPhlAn's main features

• Species-level resolution
• Computational feasibility
• Organismal relative abundance rather than DNA concentrations
• Consistent detection confidence for all clades
• High accuracies for very short reads (as short as ~50nt)
• Detection of organisms without sequenced genomes

[Figure label: Prevotella copri]

Main MetaPhlAn2 additions

• Profiling not only for Bacteria and Archaea, but also for viruses, Fungi and Protozoa
• 6-fold increase in the number of considered species: >7000 species
• Introduction of the concept of quasi-markers
• Improvement of quantitative performances: higher correlation with true abundances, lower false positive and false negative rates
• Improvement of computational performances
• Addition of strain-specific barcoding for microbial strain tracking
• Profiled thousands of samples in few days
• Strain-level identification for organisms with sequenced genomes
• Integration with post-processing and visualization tools

Thanks!

The Laboratory of Computational Metagenomics
Matthias Scholz, Adrian Tett, Tin Truong, Edoardo Pasolli, Federica Armanini, Francesco Asnicar, Pamela Ferretti, Moreno Zolfo, Thomas Tolio, Serena Manara, Mattia Bolzan, Francesco Beghini, Luca Erculiani
http://cibiocm.bitbucket.org - [email protected]

Olivier Jousson, Curtis Huttenhower, Wendy Garrett, Doyle Ward, Jacques Izard, Flaminia Catteruccia, Marco Ventura, Owen R White, Dan Littman, Veronica De Sanctis, Roberto Bertorelli, Enrico Blanzieri

http://segatalab.cibio.unitn.it/tools/metaphlan2
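
Illustrative sketch (not from the slides): the unique-marker idea, "use markers as proxy for taxonomic clades", can be made concrete with a toy calculation. The Python below assumes reads have already been mapped to markers and that each marker belongs to exactly one clade. The function name, the input dictionaries and the plain per-clade mean of length-normalized read counts are all assumptions made for illustration; this is not MetaPhlAn's actual implementation, which may estimate coverage differently.

```python
from collections import defaultdict


def relative_abundances(marker_hits, marker_lengths, marker_to_clade):
    """Estimate per-clade relative abundances from marker read counts.

    All three arguments are hypothetical toy inputs:
      marker_hits:     marker id -> number of reads mapped to that marker
      marker_lengths:  marker id -> marker length in nucleotides
      marker_to_clade: marker id -> the clade the marker is unique to
    """
    # Length-normalize each marker's read count and group by clade.
    per_clade = defaultdict(list)
    for marker, clade in marker_to_clade.items():
        hits = marker_hits.get(marker, 0)
        per_clade[clade].append(hits / marker_lengths[marker])

    # Assumption: a clade's coverage is the plain mean over its markers.
    coverage = {clade: sum(v) / len(v) for clade, v in per_clade.items()}

    # Normalize so that the clade values sum to 1.
    total = sum(coverage.values())
    if total == 0:
        return {clade: 0.0 for clade in coverage}
    return {clade: c / total for clade, c in coverage.items()}


# Toy data: two markers unique to clade_A, one unique to clade_B.
toy_to_clade = {"m1": "clade_A", "m2": "clade_A", "m3": "clade_B"}
toy_lengths = {"m1": 1000, "m2": 500, "m3": 1000}
toy_hits = {"m1": 30, "m2": 15, "m3": 10}

profile = relative_abundances(toy_hits, toy_lengths, toy_to_clade)
print(profile)
```

With this toy data clade_A comes out at roughly three quarters of the profile and clade_B at roughly one quarter. Because read counts are divided by marker length before averaging, the result tracks organismal relative abundance rather than the raw amount of DNA assigned to each clade.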