Machine Learning and Data Mining: A Case Study with Enterotypes
Gabe Al-Ghalith
Jimmy Reeve
Chapter 28, data mining
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-047-computational-biology-genomes-networks-evolution-fall-2008/lecture-notes/MIT6_047f08_lec04_slide04.pdf

Choosing Between Clustering and Classification
– Clustering: summarize big data without a priori hypotheses
– How would you categorize people based on their:
  – Blood type?
  – Gut bacteria?
● Blood type calls for classification
  – Consensus on blood groups: A, B, AB, O
● Gut bacteria call for clustering
  – No consensus on types or even number of categories
http://www.nytimes.com/2011/04/21/science/21gut.html?_r=2&scp=2&sq=bacteria&st=cse&

Reasons to Consider Gut Bacteria
● Contribute to diseases and response to treatments
● Protective role, digestive role
● We have 100s of genes that involve handling these bacteria
● NPR.org: “Gut bacteria might guide the workings of our minds”
● Characterizing these bacteria can help us tease out these associations:
  – Personalized medicine and treatment
http://www.gutmicrobiotawatch.org/gut-microbiota-info/
http://www.npr.org/blogs/health/2013/11/18/244526773/gut-bacteria-might-guide-the-workings-of-our-minds

3 Distinct “Enterotypes” Revealed from Clustering Approach
● Bacterial populations fell into 3 groups based on population composition
● These three “enterotypes” each contain one representative member of gut bacteria (chief/first principal component)
  – Enterotype 1: Bacteroides, enriched in vitamins B5, B7, C
  – Enterotype 2: Prevotella, enriched in vitamins B1, B9
  – Enterotype 3: Blautia (Ruminococcus): H2/CO2 to acetate
● ~1500 known sequences used as filter for raw metagenomic reads. These are the “features.” A “sample” is the population composition in a subject's gut.
● 85 metagenomes from one source, 154 from another, 33 from a third. Same 3 classes emerged upon clustering each.
Enterotypes of the human gut microbiome.
Nature 473: 174–180.

Clustering Methodology Used in the Original Paper
● Karhunen–Loève transform (KLT) – PCA
● Dimensionality reduction technique
  – Parallels with SQL3: “pivot” along axis with most variance, then final “roll up” based on distance metric
  – Some metrics: Euclidean, Manhattan, vector angle, Pearson's, Jensen-Shannon...
● Ade4 package in R uses “pam” algorithm (“K-medoid”)

References
● Cluster in R (ade4 hooks this): http://cran.r-project.org/web/packages/cluster/cluster.pdf
● Ade4 primer on dimensionality reduction: cran.r-project.org/web/packages/ade4/index.html
● “The human gut microbiome: are we our enterotypes?” Microbial Biotechnology (2011) 4(5), 550–553
● “Bacteria Divide People Into 3 Types, Scientists Say.” New York Times, April 20th, 2011.
● Dan Knights. Seminar: “Diet and microbiome: Which came first, the chicken nuggets or the Eggerthella?” Sep 26, 2013
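
Sketch: distance metric, K-medoid clustering, and PCA
The methodology slide names three ingredients: a distance metric between samples (for example Jensen-Shannon), a K-medoid grouping step, and PCA for dimensionality reduction. The block below is a minimal Python/NumPy sketch of how those pieces could fit together. It is not the R code used in the paper. The data are synthetic and made up for illustration, all function names are hypothetical, and the clustering step is a simplified alternating k-medoids with farthest-first initialization rather than the full PAM build/swap procedure.

```python
import numpy as np


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence of two profiles."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize abundances to proportions
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(jsd, 0.0)))


def distance_matrix(samples):
    """Symmetric matrix of pairwise Jensen-Shannon distances."""
    n = len(samples)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = jensen_shannon_distance(samples[i], samples[j])
    return dist


def k_medoids(dist, k, max_iter=100):
    """Simplified k-medoids (assumption: not the full PAM swap search)."""
    # Farthest-first initialization: start from the most central sample,
    # then repeatedly add the sample farthest from the chosen medoids.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)

    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)  # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


def pca_project(x, n_components=2):
    """Project samples onto the leading principal components via SVD."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def make_synthetic_samples(n_per_group=20, n_features=6, seed=0):
    """Made-up data: three groups, each dominated by a different feature."""
    rng = np.random.default_rng(seed)
    groups = []
    for g in range(3):
        alpha = np.ones(n_features)
        alpha[g] = 30.0  # feature g dominates the composition of group g
        groups.append(rng.dirichlet(alpha, size=n_per_group))
    return np.vstack(groups)


samples = make_synthetic_samples()           # rows = samples, columns = features
dist = distance_matrix(samples)
medoids, labels = k_medoids(dist, k=3)
coords = pca_project(samples, n_components=2)  # 2-D coordinates for plotting
```

Here `labels` holds the cluster index for each synthetic sample and `coords` holds its position on the first two principal components, which is the kind of view typically used to plot such clusters. Working from a precomputed distance matrix keeps the clustering step independent of the metric, so any of the other metrics listed on the slide could be swapped in.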