TIPP: Taxonomic Identification And Phylogenetic Profiling
Nam-phuong Nguyen
Computer Science And Engineering
University Of California, San Diego

Precision Medicine
• Personalized treatment based upon the patient's phenotypes and genotypes
• Precision Medicine Initiative launched with $215M in 2016
• Many different aspects, including genomics, epigenetics, and the microbiome
Image courtesy of gurdanhealth.com

Human Microbiome
• 10 times more bacterial cells than human cells
• Important role in regulating health
• Disruption associated with risk factors for diseases
• Analysis through metagenomics
Image courtesy of humanlongevity.com

Metagenomics
• Analyzing DNA sequences from an environmental sample
• Typical datasets contain millions of reads

Fundamental Questions
• What is the identity of a read?
• What is the microbial profile of a sample?
• What genes/functions are present?

Metagenomic Taxon Identification
Objective: classify short reads in a metagenomic sample

Abundance Profiling
Objective: distribution of the species (or genera, or families, etc.) within the sample
For example, the distribution of a sample at the species level might be:
Species A: 10%
Species B: 25%
Species C: 55%
Species D: 1%
Species E: 9%

Genome-based profiling
[Figure: two cells of A, one cell of B]
Population of 2 bacteria, A and B. B has a genome twice as large as A's.
True profile: 67% A, 33% B
Profile estimated from reads: 50% A, 50% B

Single copy marker-based profiling
[Figure: two cells of A, one cell of B]
Population of 2 bacteria, A and B. B has a genome twice as large as A's.
Each has a single copy of gene C.
True profile: 67% A, 33% B
Profile estimated from reads: 67% A, 33% B

TIPP: Taxonomic Identification And Phylogenetic Profiling
• Fragmentary unknown reads for a gene: ACCG, CGAG, CGG, GGCT, …, ACCT
• Known full-length sequences for a gene, and an alignment and a tree:
  AGG...GCAT (species1)
  TAGC...CCA (species2)
  TAGA...CTT (species3)
  AGC...ACA (species4)
  ACT..TAGAA (species5)

TIPP: Taxonomic Identification And Phylogenetic Profiling
• Nguyen et al., Bioinformatics, 2014
Reads → Assign to marker genes → Marker genes → Classify reads → Compute profile

Abundance Profiling
• Objective: distribution of the species (or genera, or families, etc.) within the sample.
• Leading techniques:
  • PhymmBL (Brady & Salzberg, Nature Methods 2009)
  • NBC (Rosen, Reichenberger, and Rosenfeld, Bioinformatics 2011)
  • MetaPhyler (Liu et al., BMC Genomics 2011), from the Pop Lab at the University of Maryland
  • MetaPhlAn (Segata et al., Nature Methods 2012), from the Huttenhower Lab at Harvard
  • mOTU (Bork et al., Nature Methods 2013)
• MetaPhyler, MetaPhlAn, and mOTU are marker-based techniques (but use different marker genes).

“Hard” genome datasets (known genomes and high indel error)
Note: NBC, MetaPhlAn, and MetaPhyler cannot classify any sequences from at least one of the high indel long sequence datasets. mOTU terminates with an error message on all the high indel datasets.

“Novel” genome datasets
Note: mOTU terminates with an error message on the long fragment datasets and high indel datasets.

TIPP Compared To Other Profiling Methods
• TIPP is highly accurate, even in the presence of novel genomes and high sequencing error
• All other methods are less robust
• Accurate profiles can be estimated using only a portion of the reads

Do Individual Primates From The Same Species Have Personal Microbiomes?
Humans have personalized microbiomes
Fierer et al., PNAS 2010 showed that you can identify who had previously used a keyboard via the residual contact microbiome (three individuals in study)

Experimental Design
• Dataset (unpublished; in preparation)
  • Data collected by Patton’s Lab at U of Washington
  • Longitudinal study of the vaginal, rectal, and fecal microbiome in 39 female captive pigtailed macaques
  • Weekly matched paired samples taken over a period of a month from each individual
  • 16S rRNA amplicon sequencing
  • TIPP (Nguyen et al. 2014) used to generate profiles
• Questions
  • How do the microbiomes differ by body site and individual?
  • Can we identify an individual based upon the microbiome?

Experimental Design
[Figure: samples from Week 1, Week 2, and Week 3. Which individual?]

Identification Results

Future Directions
• Expanding the marker set, both in the number of species and genes
• Statistical approach to combining profiles from different marker genes
• Developing TIPP for virobiome

Acknowledgements
• Illinois
  • Tandy Warnow
  • Rebecca Stumpf
  • Bryan White
  • Mike Nute
  • Brenda Wilson
• UCSD
  • Siavash Mirarab
• UMD
  • Mihai Pop
  • Bo Liu
• U of Copenhagen
  • Alonzo Alfaro-Núñez
  • Tom Hansen
  • Anders Hansen
• Funding
  • NSF 09-35347
  • NSF 08-20709
  • NSF 0733029
  • University of Alberta

Questions?
• TIPP tutorial tomorrow at 10:00-11:00 in MR7
• Instructions for downloading at https://github.com/smirarab/sepp/blob/master/README.TIPP.md
• Tutorial at https://github.com/smirarab/sepp/blob/master/tutorial/tipptutorial.md
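
Worked example: genome-based vs. single copy marker-based profiling
The arithmetic behind the two profiling slides can be checked with a small calculation. This is a minimal sketch, not part of TIPP. It assumes a population of two cells of A and one cell of B, relative genome sizes of 1 and 2, one copy of marker gene C per cell, and reads sampled in proportion to the amount of DNA. All function and variable names are hypothetical.

```python
from fractions import Fraction


def normalize(weights):
    """Turn non-negative integer weights into proportions that sum to 1."""
    total = sum(weights.values())
    return {name: Fraction(value, total) for name, value in weights.items()}


# Assumed toy population matching the slides: two cells of A, one cell of B.
cells = {"A": 2, "B": 1}
# Assumed relative genome sizes: B's genome is twice as large as A's.
genome_size = {"A": 1, "B": 2}
# Each cell carries a single copy of marker gene C.
marker_copies = {"A": 1, "B": 1}

# The true profile counts cells.
true_profile = normalize(cells)

# Genome-based: reads are drawn in proportion to total DNA per species,
# so the larger genome of B is over-represented.
genome_profile = normalize({s: cells[s] * genome_size[s] for s in cells})

# Marker-based: only reads from the single copy gene are counted,
# so read counts are proportional to cell counts.
marker_profile = normalize({s: cells[s] * marker_copies[s] for s in cells})

for label, profile in [
    ("True", true_profile),
    ("Genome-based", genome_profile),
    ("Marker-based", marker_profile),
]:
    summary = ", ".join(f"{s}: {float(p):.0%}" for s, p in profile.items())
    print(f"{label} profile -> {summary}")
```

Under these assumptions the genome-based estimate is skewed toward B, while the marker-based estimate matches the true profile, which is the motivation the slides give for using single copy marker genes.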