Download Part 1 - WSU EECS

MapReduce for Machine Learning on Multicore Cheng-Tao Chu and Sang Kyun Kim et al. Stanford University, Stanford CA Presented by Inna Rytsareva 1 Outline  What is MapReduce?  Problem Description and Formalization  Statistical Query Model and Summation Form  Architecture (inspired by MapReduce)  Adopted ML Algorithms  Experiments  Future of MapReduce for Machine Learning  Discussion Map-Reduce for Machine Learning on Multicore 2 Motivation  Problem: lots of data  Example: 20+ billion web pages x 20KB = 400+ terabytes  One computer can read 30-35 MB/sec from disk  ~four months to read the web  ~1,000 hard drives just to store the web  Even more to do something with the data 3 Motivation  Solution: spread the work over many machines  Same problem with 1000 machines, < 3 hours  programming work  communication and coordination  recovering from machine failure  status reporting  debugging  optimization  locality  repeat for every problem you want to solve 4 Cluster Computing  Many racks of computers, thousands of machines per cluster  Limited bisection bandwidth between racks http://upload.wikimedia.org/wikipedia/commons/d/d3/IBM_Blue_Gene_ P_supercomputer.jpg 5 MapReduce  A simple programming model that applies to many large-scale computing problems  Hide messy details in MapReduce runtime library:  automatic parallelization  load balancing  network and disk transfer optimization  handling of machine failures  robustness 6 Programming model  Input & Output: each a set of key/value pairs  Programmer specifies two functions: map() reduce()  map (in_key, in_value) -> list(out_key, intermediate_value)  Processes input key/value pair  Produces set of intermediate pairs  reduce (out_key, list(intermediate_value)) -> list(out_value)  Combines all intermediate values for a particular key  Produces a set of merged output values (usually just one) 7 Example  Page 1: the weather is good  Page 2: today is good  Page 3: good weather is good. 8 Map Output  Worker 1:  (the 1), (weather 1), (is 1), (good 1).  Worker 2:  (today 1), (is 1), (good 1).  Worker 3:  (good 1), (weather 1), (is 1), (good 1). 9 Reduce Input  Worker 1:  (the 1)  Worker 2:  (is 1), (is 1), (is 1)  Worker 3:  (weather 1), (weather 1)  Worker 4:  (today 1)  Worker 5:  (good 1), (good 1), (good 1), (good 1) 10 Reduce Output  Worker 1:  (the 1)  Worker 2:  (is 3)  Worker 3:  (weather 2)  Worker 4:  (today 1)  Worker 5:  (good 4) 11 http://4.bp.blogspot.com/_j6mB7TMmJJY/STAYW9gC-NI/AAAAAAAAAGY/lLKo7sBp5i8/s1600-h/P1.png 12 Fault tolerance  On worker failure:  Detect failure via periodic heartbeats (ping)  Re-execute completed and in-progress map tasks  Re-execute in progress reduce tasks  Task completion committed through master  Master failure:  Could handle, but don't yet (master failure unlikely) 13 MapReduce Transparencies  Parallelization  Fault-tolerance  Locality optimization  Load balancing 14 Suitable for your task if  Have a cluster  Working with large dataset  Working with independent data (or assumed)  Can be cast into map and reduce 15 References  Original paper J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Communications of the ACM, vol. 51, 2008, pp. 107-113. (http://labs.google.com/papers/mapreduce.html)  On wikipedia (http://en.wikipedia.org/wiki/MapReduce)  Hadoop – MapReduce in Java (http://lucene.apache.org/hadoop/)  Starfish - MapReduce in Ruby (http://rufy.com/starfish/) 16 Outline  What is MapReduce?  Problem Description and Formalization  Statistical Query Model and Summation Form  Architecture (inspired by MapReduce)  Adopted ML Algorithms  Experiments  Future of MapReduce for Machine Learning  Discussion Map-Reduce for Machine Learning on Multicore 17 Motivations  Industry-wide shift to multicore  No good framework for parallelize ML algorithms  Goal: develop a general and exact technique for parallel programming of a large class of ML algorithms for multicore processors http://upload.wikimedia.org/wikipedia/commons/a/af/E6750bs8.jpg 18 Idea … … … • Statistical Query Model • Summation Form • MapReduce 19 Outline  What is MapReduce?  Problem Description and Formalization  Statistical Query Model and Summation Form  Architecture (inspired by MapReduce)  Adopted ML Algorithms  Experiments  Future of MapReduce for Machine Learning  Discussion Map-Reduce for Machine Learning on Multicore 20 Valiant Model [Valiant’84]  x is the input  y is a function of x that we want to learn  In Valiant model, the learning algorithm uses randomly drawn examples <x, y> to learn the target function 21 Statistical Query Model [Kearns’98]  A restriction on Valiant model  A learning algorithm uses some aggregates over the examples, not the individual examples  Given a function f(x,y) over instances (data points x and labels y), a statistical oracle will return an estimate of the expectation of f(x,y)  Any model that computes gradients or sufficient statistics over f(x,y) ﬁts this model  Typically this is achieved by summing over the data. 22 Summation Form  Aggregate over the data:  Divide the data set into pieces  Compute aggregates on each cores  Combine all results at the end 23 Example: Linear Regression Model: Goal: Solution: Given m examples: (x1, y1), (x2, y2), …, (xm, ym) We write a matrix X with x1, …, xm as rows, and row vector Y=(y1, y2, …ym). Then the solution is Parallel computation: • • 24 Outline  What is MapReduce?  Problem Description and Formalization  Statistical Query Model and Summation Form  Architecture (inspired by MapReduce)  Adopted ML Algorithms  Experiments  Future of MapReduce for Machine Learning  Discussion Map-Reduce for Machine Learning on Multicore 25 Lighter Weight MapReduce for Multicore 26 Outline  What is MapReduce?  Problem Description and Formalization  Statistical Query Model and Summation Form  Architecture (inspired by MapReduce)  Adopted ML Algorithms  Experiments  Future of MapReduce for Machine Learning  Conclusion and Discussion Map-Reduce for Machine Learning on Multicore 27 Locally Weighted Linear Regression (LWLR) Solve: When wi == 1, this is least squares.  Mappers: one sets compute subgroups of A, the other set compute subgroups b  Two reducers for computing A and b  Finally compute the solution 28 Naïve Bayes (NB)  Goal: estimate P(xj=k|y=1) and P(xj=k|y=0) and P(y)  Computation: count the occurrence of (xj=k, y=1) and (xj=k, y=0), count the occurrence of (y=1) and (y=0)  Mappers: count a subgroup of training samples  Reducer: aggregate the intermediate counts, and calculate the final result 29 Gaussian Discriminative Analysis (GDA)  Goal: classification of x into classes of y  assuming each class is a Gaussian Mixture model with different means but same covariance.  Computation:  Mappers: compute for a subgroup of training samples  Reducer: aggregate intermediate results 30 K-means  Computing the Euclidean distance between sample vectors and centroids  Recalculating the centroids  Divide the computation to subgroups to be handled by map-reduce 31 Neural Network (NN)  Back-propagation, 3-layer network  Input, middle, 2 output nodes  Goal: compute the weights in the NN by back propagation  Mapper: propagate its set of training data through the network, and propagate errors to calculate the partial gradient for weights  Reducer: sums the partial gradients and does a batch gradient descent to update the weights 32 Principal Components Analysis (PCA)  Compute the principle eigenvectors of the covariance matrix m 1  T    mi1 xi xi  T  Clearly, we can compute the summation form using map-reduce  Express the mean vector as a sum m 1     i1 x i   m    33 Other Algorithms  Logistic Regression  Independent Component Analysis  Support Vector Machine  Expectation Maximization (EM) 34 Time Complexity Basically: Linear speed up with increasing number of cores 35 Outline  What is MapReduce?  Problem Description and Formalization  Statistical Query Model and Summation Form  Architecture (inspired by MapReduce)  Adopted ML Algorithms  Experiments  Future of MapReduce for Machine Learning  Discussion Map-Reduce for Machine Learning on Multicore 36 Setup  Compare map-reduce version and sequential version  10 data sets  Machines:  Dual-processor Pentium-III 700MHz, 1GB RAM  16-way Sun Enterprise 6000 37 Dual-Processor SpeedUps 38 SpeedUp for 2-16 processors Bold – average Error Bars – max/min Dashed - variance 39 Multicore Simulator Results  Multicore simulator over the sensor dataset     Better results – reported for NN & LR NN  16 cores 15.5x  32 cores 29x  64 cores 54x LR  16 cores 15x  32 cores 29.5x  64 cores 53x Could be because of less communication cost 40 Conclusion  Parallelize summation forms  NO change in the underlying algorithm  NO approximation  Use map-reduce on a single machine 41 Outline  What is MapReduce?  Problem Description and Formalization  Statistical Query Model and Summation Form  Architecture (inspired by MapReduce)  Adopted ML Algorithms  Experiments  Future of MapReduce for Machine Learning  Discussion Map-Reduce for Machine Learning on Multicore 42 Apache Mahout  An Apache Software Foundation project to create scalable machine learning libraries under the Apache Software License  http://mahout.apache.org  Why Mahout?      Community Documentation and Examples Scalability the Apache License Not-specific research-oriented http://dictionary.reference.com/browse/mahout 43 Focus: Scalable  Goal: Be as fast and efficient as the possible given the intrinsic design of the algorithm  Some algorithms won’t scale to massive machine clusters  Others fit logically on a MapReduce framework like Apache Hadoop  Still others will need other distributed programming models  Most Mahout implementations are MapReduce enabled  Work in Progress 44 Sampling of Who uses Mahout? https://cwiki.apache.org/confluence/display/MAHOUT/Powered+By+Mahout 45 Focus: Machine Learning Applications Examples Genetic Freq. Pattern Mining Utilities Lucene/Vectorizer Classification Clustering Math Vectors/Matrices/ SVD Recommenders Collections (primitives) Apache Hadoop http://cwiki.apache.org/confluence/display/MAHOUT/Algorithms 46 Resources  “Mahout in Action”  Owen, Anil, Dunning and Friedman  http://awe.sm/5FyNe  “Introducing Apache Mahout”  http://www.ibm.com/developerworks/java/library/j-mahout/  “Taming Text” by Ingersoll, Morton, Farris  “Programming Collective Intelligence” by Toby Segaran  “Data Mining - Practical Machine Learning Tools and Techniques” by Ian H. Witten and Eibe Frank  “Data-Intensive Text Processing with MapReduce” by Jimmy Lin and Chris Dyer 47 Outline  What is MapReduce?  Problem Description and Formalization  Statistical Query Model and Summation Form  Architecture (inspired by MapReduce)  Adopted ML Algorithms  Experiments  Future of MapReduce for Machine Learning  Discussion Map-Reduce for Machine Learning on Multicore 48 Discussion  What are other alternatives to MapReduce?  What to do if “summation form” is not applicable?  Does the dataset quality effect implementation and performance of parallel machine learning algorithms?  Multicore processors… future? Predicting Structural and Functional Sites in Proteins by Searching for Maximum-Weight Cliques 49

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Part 1 - WSU EECS