Cache-conscious Frequent Pattern Mining on a Modern Processor

Amol Ghoting, Gregory Buehrer, and Srinivasan Parthasarathy
Data Mining Research Laboratory, CSE, The Ohio State University
Daehyun Kim, Anthony Nguyen, Yen-Kuang Chen, and Pradeep Dubey
Intel Corporation

Copyright 2005, Data Mining Research Lab, The Ohio State University

Roadmap
• Motivation and contributions
• Background
• Performance characterization
• Cache-conscious optimizations
• Related work
• Conclusions

Motivation
• Data mining applications
  – A rapidly growing segment in commerce and science
  – Interactive response time is important
  – Compute- and memory-intensive
• Modern architectures
  – Memory wall
  – Instruction-level parallelism (ILP)
[Figure: FP-Growth speedup saturates as processor speed increases (2.4x, then only 1.6x). Note: experiment conducted on specialized hardware.]

Contributions
• We characterize the performance and memory access behavior of three state-of-the-art frequent pattern mining algorithms
• We improve the performance of all three algorithms:
  – Cache-conscious prefix tree: spatial locality + hardware prefetching
  – Path tiling to improve temporal locality
  – Co-scheduling to improve ILP on a simultaneous multi-threaded (SMT) processor

Frequent pattern mining (1)
• Finds groups of items that co-occur frequently in a transactional data set
• Example (minimum support = 2):
    Customer 1: milk, bread, cereal
    Customer 2: milk, bread, eggs, sugar
    Customer 3: milk, bread, butter
    Customer 4: eggs, sugar
• Frequent patterns:
  – Size 1: (milk), (bread), (eggs), (sugar)
  – Size 2: (milk & bread), (eggs & sugar)
  – Size 3: none
• Search space traversal: breadth-first or depth-first

Frequent pattern mining (2)
• Algorithms under study:
  – FP-Growth (based on the pattern-growth method); winner of the 2003 Frequent Itemset Mining Implementations (FIMI) evaluation
  – GenMax (depth-first search space traversal); a maximal pattern miner
  – Apriori (breadth-first search space traversal)
• All three algorithms use a prefix tree as the data set representation

Setup
• Intel Xeon processor
  – 2 GHz, with 4 GB of main memory
  – 4-way 8 KB L1 data cache
  – 8-way 512 KB L2 cache on chip
  – 8-way 2 MB L3 cache
• Intel VTune Performance Analyzers used to collect performance data
• Implementations from the FIMI repository:
  – FP-Growth (Gosta Grahne and Jianfei Zhu)
  – GenMax (Gosta Grahne and Jianfei Zhu)
  – Apriori (Christian Borgelt)
  – All use custom memory managers

Execution time breakdown

  FP-Growth:  Count-FPGrowth()  61%   Project-FPGrowth()  31%   Other  8%
  GenMax:     Count-GM()        91%   Other                9%
  Apriori:    Count-Apriori()   70%   Candidate-Gen()     25%   Other  5%

• Count-GM() and Count-Apriori() also perform support counting in a prefix tree, similar to Count-FPGrowth()

Operation mix

                                          Count-FPGrowth()  Count-GM()  Count-Apriori()
  Integer ALU operations per instruction  0.65              0.64        0.34
  Memory operations per instruction       0.72              0.69        0.66

  Note: each column need not sum to 1.

Memory access behavior

                             Count-FPGrowth()  Count-GM()  Count-Apriori()
  L1 hit rate                89%               87%         86%
  L2 hit rate                43%               42%         49%
  L3 hit rate                39%               40%         27%
  L3 misses per instruction  0.03              0.03        0.04
  CPU utilization            9%                9%          8%

FP-tree
• A prefix tree indexed on the largest common prefix
  – Typically results in a compressed data set representation
• Each node stores: ITEM, COUNT, NODE POINTER, CHILD POINTERS, PARENT POINTER
• Node pointers thread together all occurrences of an item, allowing fast searching of items
[Figure: example FP-tree built with minimum support = 3, with per-item node-pointer links.]

FP-Growth
• Basic step:
  – For each item in the FP-tree, build its conditional FP-tree
  – Recursively mine the conditional FP-tree
[Figure: conditional FP-tree for item m; items p, f, c, a, and b are processed similarly.]
• Performance problems:
  – Poor spatial locality: the data structure is dynamic, and only two node fields (COUNT and PARENT POINTER) are used during the bottom-up traversal
  – Poor temporal locality: the data structure is large
  – Poor instruction-level parallelism (ILP): pointer dereferencing

Cache-conscious prefix tree
• Improves cache line utilization
  – Smaller node size (each tree node keeps only ITEM and PARENT POINTER) + depth-first (DFS) allocation
  – COUNT values move to separate count lists and NODE POINTERs to separate header lists, one per item
• Enables hardware cache line prefetching for bottom-up traversals
[Figure: DFS-allocated prefix tree with per-item header lists (node pointers) and count lists (node counts).]

Path tiling to improve temporal locality
• DFS order enables breaking the tree into tiles based on node addresses
  – Tiles can partially overlap
• Maximizes tree reuse: all tree accesses are restructured to operate on a tile-by-tile basis
[Figure: tree partitioned into Tile 1 through Tile N.]

Improving ILP using SMT
• Simultaneous multi-threading (SMT)
  – Intel Hyper-Threading (HT) maintains two hardware contexts on chip
  – Improves ILP: when one thread waits, the other thread can use CPU resources
• Simply identifying independent threads is not good enough
  – Unlikely to hide long cache miss latency
  – Can lead to cache interference (conflicts)
• Solution: restructure the multi-threaded computation so that both threads reuse the cache on a tile-by-tile basis (same data, different computation)

Speedup
[Figure: speedup for FP-Growth on a synthetic data set (supports 4000–5500) and on a real data set (supports 50000–75000).]
• For FP-Growth, the L1 hit rate improves from 89% to 94%, and the L2 hit rate improves from 43% to 98%
• GenMax: up to 4.5x speedup; Apriori: up to 3.5x speedup

Related work (1)
• Characterizations of data mining algorithms
  – Self-organizing maps – Kim et al. [WWC99]
  – C4.5 – Bradford and Fortes [WWC98]
  – Sequence mining, graph mining, outlier detection, clustering, and decision tree induction – Ghoting et al. [DAMON05]
• Memory placement techniques for association rule mining
  – Considered the effects of memory pooling and coarse-grained spatial locality on association rule mining algorithms in serial and parallel settings – Parthasarathy et al. [SIGKDD98, KAIS01]

Related work (2)
• Database algorithms
  – DBMSs on modern hardware – Ailamaki et al. [VLDB99, VLDB01]
  – Cache-sensitive search trees and B+-trees – Rao and Ross [VLDB99, SIGMOD00]
  – Prefetching for B+-trees and hash join – Chen et al. [SIGMOD00, ICDE04]

Ongoing and future work
• Algorithm redesign for next-generation architectures, e.g. graph mining on multi-core architectures
• Cache-conscious optimizations for other data mining and bioinformatics applications on modern architectures, e.g. classification algorithms and graph mining
• Out-of-core algorithm designs
• Microprocessor design targeted at data mining algorithms

Conclusions
• Characterized the performance of three popular frequent pattern mining algorithms
• Proposed a tileable cache-conscious prefix tree
  – Improves spatial locality and allows for cache line prefetching
  – Path tiling improves temporal locality
• Proposed a novel thread-based decomposition that improves ILP by utilizing SMT
• Overall, up to a 4.8-fold speedup
• Effective algorithm design in data mining needs to take modern architectural designs into account

Thanks
• We would like to acknowledge the following grants:
  – NSF: CAREER-IIS-0347662
  – NSF: NGS-CNS-0406386
  – NSF: RI-CNS-0403342
  – DOE: ECPI-DE-FG02-04ER25611
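The worked example on the "Frequent pattern mining" slide can be checked with a short breadth-first (Apriori-style) sketch. This is an illustrative toy, not one of the FIMI implementations benchmarked in the talk; the function name and the simplified candidate filtering are assumptions for the sketch.

```python
from itertools import combinations

# Transactions from the slide's example (minimum support = 2).
TRANSACTIONS = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "eggs", "sugar"},
    {"milk", "bread", "butter"},
    {"eggs", "sugar"},
]

def frequent_itemsets(transactions, min_support):
    """Breadth-first enumeration: find all frequent size-k itemsets
    before moving to size k+1. Returns {itemset_tuple: support}."""
    items = sorted(set().union(*transactions))
    frequent, k = {}, 1
    while True:
        level = {}
        for cand in combinations(items, k):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= min_support:
                level[cand] = support
        if not level:
            break
        frequent.update(level)
        # Simplified pruning: only items appearing in some frequent
        # k-itemset can appear in a frequent (k+1)-itemset.
        items = sorted({i for itemset in level for i in itemset})
        k += 1
    return frequent

result = frequent_itemsets(TRANSACTIONS, 2)
# Size-2 frequent patterns: (bread, milk) and (eggs, sugar); no size-3 patterns.
```

Note that (milk, eggs) is not frequent: milk and eggs co-occur only in Customer 2's transaction, giving support 1.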
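The cache-conscious prefix tree slide can be illustrated with a minimal struct-of-arrays sketch: re-allocate the nodes of a pointer-based prefix tree in DFS order into flat arrays, keeping only ITEM and PARENT per node and moving COUNT into a separate list. The nested-dict tree and all function names are assumptions for this sketch, not the paper's C implementation, which sizes nodes to cache lines and keeps per-item header lists as well.

```python
def build_prefix_tree(transactions):
    """Pointer-style prefix tree: nested dicts of item -> [count, children]."""
    root = {}
    for t in transactions:
        node = root
        for item in sorted(t):
            entry = node.setdefault(item, [0, {}])
            entry[0] += 1
            node = entry[1]
    return root

def flatten_dfs(root):
    """DFS allocation: node i's parent always sits at an index < i.
    Returns parallel arrays item[] and parent[] plus a separate count[]
    list, mirroring the slide's split between tree nodes and count lists."""
    item, parent, count = [None], [-1], [0]  # slot 0 is the root
    def dfs(children, parent_idx):
        for it, (cnt, sub) in children.items():
            idx = len(item)
            item.append(it)
            parent.append(parent_idx)
            count.append(cnt)
            dfs(sub, idx)
    dfs(root, 0)
    return item, parent, count

def path_to_root(item, parent, node):
    """Bottom-up traversal (as in support counting): touches only the two
    fields kept in the slimmed-down node, item[] and parent[]."""
    path = []
    node = parent[node]
    while node > 0:
        path.append(item[node])
        node = parent[node]
    return path

transactions = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "eggs", "sugar"},
    {"milk", "bread", "butter"},
    {"eggs", "sugar"},
]
items, parents, counts = flatten_dfs(build_prefix_tree(transactions))
```

Because every parent lives at a smaller index than its child, bottom-up walks over a DFS-allocated array move through memory in one direction, which is what lets hardware cache-line prefetching help.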
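Path tiling can be sketched in the same setting. Instead of walking each start node to the root one at a time (sweeping the whole tree per walk), all pending walks are advanced through one address-range tile before the next tile is touched, so each tile is loaded into cache once per batch. This sketch simplifies in two ways it is worth flagging: tiles here do not overlap (the slide notes real tiles may partially overlap), and it assumes DFS allocation, i.e. parent[i] < i for every node i.

```python
def path_to_root(parent, node):
    """Reference untiled walk: ancestor indices of `node`, excluding root."""
    path = []
    node = parent[node]
    while node > 0:
        path.append(node)
        node = parent[node]
    return path

def tiled_bottom_up(parent, starts, tile_size):
    """Advance all walks in `starts` tile by tile, highest addresses first.
    Because parents live at smaller indices, one downward sweep over the
    tiles finishes every walk. Returns the same paths as repeated
    path_to_root calls."""
    n = len(parent)
    active = list(starts)              # current position of each walk
    paths = [[] for _ in starts]
    first_tile = ((n - 1) // tile_size) * tile_size
    for tile_lo in range(first_tile, -1, -tile_size):
        for i, pos in enumerate(active):
            # Advance walk i while its current node lies in this tile.
            while pos >= tile_lo and pos > 0:
                pos = parent[pos]
                if pos > 0:
                    paths[i].append(pos)
            active[i] = pos
    return paths

# Toy tree in DFS order: two root-to-leaf chains 0-1-2-3 and 0-4-5-6.
parent = [-1, 0, 1, 2, 0, 4, 5]
tiled = tiled_bottom_up(parent, [3, 6], tile_size=2)
```

The point of the restructuring is purely access order: the tiled and untiled versions visit the same ancestors, but the tiled version groups accesses by node-address range.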