Taxonomic profiling with MetaPhlAn2
Curtis Huttenhower, Galeb Abu-Ali, Eric Franzosa
Harvard T.H. Chan School of Public Health, Department of Biostatistics
08-12-16

The two big questions of microbial community analysis...
Who is there? What are they doing?

Taxonomic profiling: who’s there
http://huttenhower.sph.harvard.edu/metaphlan2

Efficient assembly-free meta’omics by leveraging isolates
[Figure (credit: Nicola Segata): NCBI isolate genomes (Archaea 300, Bacteria 12,926, Viruses 3,565, Eukaryota 112) → open reading frames, 49.0 million total genes (RepoPhlAn) → species pangenomes, 7,677 containing 18.6 million gene clusters (ChocoPhlAn) → core genes → marker genes. http://www.metaref.org]

MetaPhlAn2: metagenomic taxonomic profiling
http://huttenhower.sph.harvard.edu/metaphlan2
[Figure: "Gene X: X is a unique marker gene for clade Y"]
• ~1M most representative markers used for identification
• 184±45 markers per species (target 200)
• ~7,100 species (excludes incomplete annotations, spp., etc.)
• False positive/false negative rates of ~1 in 10^6
• Profiles all domains of life: bacteria, viruses, eukaryotes, archaea
• Strain-level profiling using marker barcodes and SNPs
• Quasi-markers used to resolve ambiguity in postprocessing

Per-species abundance by robust averaging
[Figure: coverage plotted against abundance-sorted pan-gene families; multi-copy genes sit above a plateau of genes from one metagenome’s strain, and absent genes fall to zero coverage]

Meta-analysis of metagenomic taxonomic profiles
• Waldron and Segata: meta-analysis of >2,400 gut metagenomes.
• Available as an R package.
• Allows systematic tests of phenotypes across datasets, or of health vs. disease.
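The "X is a unique marker gene for clade Y" idea on the MetaPhlAn2 slide can be illustrated with a minimal sketch: a gene qualifies as a marker for clade Y if it is carried by (nearly) all of Y's genomes and by no genome outside Y. This is not the ChocoPhlAn/MetaPhlAn2 implementation, which also uses quasi-markers and targets a fixed number of markers per species; the function name, the input dictionaries, and the `core_fraction` threshold are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_unique_markers(gene_presence, genome_clade, core_fraction=1.0):
    """Return {clade: [genes]} for genes that are core in one clade and absent elsewhere.

    gene_presence: gene ID -> set of genome IDs carrying that gene (hypothetical format)
    genome_clade:  genome ID -> clade label (every genome must be listed here)
    core_fraction: minimum fraction of a clade's genomes that must carry the gene
                   (hypothetical parameter; 1.0 means strictly core)
    """
    # Group genomes by clade so we can measure how "core" a gene is
    clade_genomes = defaultdict(set)
    for genome, clade in genome_clade.items():
        clade_genomes[clade].add(genome)

    markers = defaultdict(list)
    for gene, genomes in gene_presence.items():
        clades_hit = {genome_clade[g] for g in genomes}
        if len(clades_hit) != 1:
            continue  # shared across clades (or unobserved): not unique to one clade
        (clade,) = clades_hit
        fraction = len(genomes) / len(clade_genomes[clade])
        if fraction >= core_fraction:
            markers[clade].append(gene)
    return dict(markers)


# Toy example: genomes g1 and g2 belong to clade Y, g3 to clade Z
gene_presence = {
    "geneA": {"g1", "g2"},  # core in Y, absent from Z -> marker for Y
    "geneB": {"g1", "g3"},  # found in both clades -> rejected
    "geneC": {"g3"},        # core in Z -> marker for Z
    "geneD": {"g1"},        # only in half of Y's genomes -> rejected when strictly core
}
genome_clade = {"g1": "Y", "g2": "Y", "g3": "Z"}

print(find_unique_markers(gene_presence, genome_clade))
```

Relaxing `core_fraction` (for example to 0.5) would also admit `geneD` for clade Y, which mirrors the trade-off between marker sensitivity and how many markers each species ends up with.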
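The "per-species abundance by robust averaging" slide can be sketched as follows. The sketch assumes that per-marker coverage is reads mapped divided by marker length, and that "robust averaging" means a truncated mean that discards the lowest and highest 10% of a species' marker coverages. These choices keep a multi-copy gene (inflated coverage) or an absent gene (zero coverage) from skewing the estimate. MetaPhlAn2's actual statistic, defaults, and normalization may differ, and all names and data here are hypothetical.

```python
import math


def truncated_mean(values, q=0.1):
    """Mean after discarding the lowest and highest fraction q of the values."""
    if not values:
        return 0.0
    ordered = sorted(values)
    k = int(math.floor(len(ordered) * q))  # number of values trimmed from each end
    kept = ordered[k:len(ordered) - k] or ordered
    return sum(kept) / len(kept)


def species_abundances(marker_hits, marker_lengths, species_markers, q=0.1):
    """Relative abundance per species from robustly averaged marker coverage.

    marker_hits:     marker ID -> number of reads mapped (missing = 0 reads)
    marker_lengths:  marker ID -> marker length in bases
    species_markers: species -> list of its marker IDs
    """
    raw = {}
    for species, markers in species_markers.items():
        # Reads per base for every marker, including markers with no hits
        coverages = [marker_hits.get(m, 0) / marker_lengths[m] for m in markers]
        raw[species] = truncated_mean(coverages, q)
    total = sum(raw.values())
    # Normalize so abundances sum to 1 across detected species
    return {s: (v / total if total > 0 else 0.0) for s, v in raw.items()}


# Toy example: two species with 10 markers each, all 100 bp long
species_markers = {
    "species_A": [f"A{i}" for i in range(10)],
    "species_B": [f"B{i}" for i in range(10)],
}
marker_lengths = {m: 100 for ms in species_markers.values() for m in ms}
marker_hits = {f"A{i}": 200 for i in range(8)}  # plateau: coverage 2.0
marker_hits["A8"] = 2000                         # multi-copy gene: coverage 20.0
# "A9" has no hits: absent from this metagenome's strain
marker_hits.update({f"B{i}": 100 for i in range(10)})  # coverage 1.0

profile = species_abundances(marker_hits, marker_lengths, species_markers)
print(profile)
```

With trimming, species_A's coverage estimate equals its plateau value of 2.0, whereas a plain mean over its ten markers would be 3.6 because of the single multi-copy gene.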