Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Speeding up k-Means by GPUs 1 YOU LI SUPERVISOR: DR. CHU XIAOWEN CO-SUPERVISOR: PROF. LIU JIMING THURSDAY, MARCH 11, 2010 Outline 2 Introduction Efficiency of data mining -> GPGPU -> k-means on GPU; Related work Method Research Plan Efficiency of Data mining 3 Face the challenge of efficiency due to the increasing data Fig.2 Fig.1 Parallel data mining GPGPU 5 A general-purpose and high performance parallel hardware; Supply another platform for parallelizing data mining algorithms. Control Tesla C870 Cache 500 GeForce 8800 GTX GFLOPS 400 ALU ALU CPU 600 Fig.3 ALU ALU DRAM Quadro FX 5600 300 G71 G70-512 G70 200 100 NV35 NV30 0 Jan 2003 NV40 3.0 GHz Core 2 Duo GPU 3.0 GHz Core 2 Quad 3.0 GHz Pentium 4 Jul 2003 Jan 2004 Jul 2004 Jan 2005 Jul 2005 Jan 2006 Jul 2006 Jan 2007 Jul 2007 DRAM k-means on GPU 6 Programming on GPU CUDA: integrated CPU+GPU , C program k-Means Widely used in statistical data analysis, pattern recognition, etc.; Easy to implement on CPU, suitable to implement on GPU; Outline 7 Introduction Related work UV_k-Means, GPUMiner and HP_k-Means; Method Research Plan Related work 8 Speed of k-Means on low dimension data, in second. 700 n k d 2 million 100 400 100 400 100 400 100 400 2 2 8 8 2 2 8 8 4 million MineBech on CPU 19.36 70.93 39.81 152.25 38.74 141.84 79.60 304.46 HP k-Means 1.45 2.16 2.48 4.53 2.88 4.38 4.95 9.03 UV k-Means 2.84 5.96 6.07 16.32 5.64 11.94 12.85 34.54 NVIDIA GTX 280 GPU; Intel(R) Core(TM) i5 CPU; GPU Miner 61.39 63.46 192.05 226.79 130.36 126.38 383.41 474.83 600 500 Series2 Series1 400 300 200 100 0 MineBech HP on CPU k-Means UV k-Means GPU Miner Outline 9 Introduction Related work Method and Results k-Means (three steps)-> step 1 -> step 2 -> step 3; Experiments; Research Plan k-Means algorithm 10 n data point; k centroid; Compute distanc (ni, ki) find the closest centroid compute new centroid Yes If centroid change? No End Step 1 O(nkd) Step 2 O(nk) Step 3 O(nd) Memory Mechanism Memory Mechanism of GPU 11 Global Memory Large size Long latency Register Small size Short latency User cannot control Shared memory Medium size Short latency User control k-Means on GPU 12 Key idea Increase the number of computing operation for each global memory access; Adopts the method from matrix multiplication and reduction. Dimension is a key parameter For low dimension: use register; For high dimension: use shared memory; k-Means on GPU 13 For low dimension Read each data from global memory once k-Means on GPU 14 For high dimension Read each data from global memory once Experiments 15 The experiments were conducted on a PC with an NVIDIA GTX280 GPU and an Intel(R) Core(TM) i5 CPU. GTX 280 has 30 SIMD multi-processors, and each one contains eight processors and performs at 1.29 GHz. The memory of the GPU is 1GB with the peak bandwidth of 141.7 GB/sec. The CPU has four cores running at 2.67 GHz. The main memory is 8 GB with the peak bandwidth of 5.6 GB/sec. We use Visual Studio 2008 to write and compile all the source code. The version of CUDA is 2.3. We calculate the time of the application after the file I/O, in order to show the speedup effect more clearly. Experiments 16 On low dimension data Compare with HP, UV and GPUMiner, the data is generated randomly 40 35 30 25 Series1 Series2 20 15 10 5 0 Our k-Means HP k-Means UV k-Means Four to ten times faster than HP Experiments 17 On high dimension data Compare with UV and GPUMiner, the data is from KDD 1999. 45 40 35 30 Series1 Series2 25 20 15 10 5 0 Our k-Means UV k-Means GPU Miner Four to eight times faster than UV Experiments 18 Compare with CPU The results illustrate that our algorithm compares very favorably with other existing algorithms. Forty to two hundred times faster than CPU version Outline 19 Introduction Related work Method Research Plan Research Plan 20 Detail analysis about k-Means on GPU GFLOPS Deal with even larger data set Other data mining algorithms on GPU K-nn SDP (widely used in protein identification ) Q&A 21 Thanks very much