STMIK AMIKOM Yogyakarta
Chapter 9: Clustering Algorithms and WEKA
Clustering: K-Means Case
Sulidar Fitri, M.Sc
Data Mining © Sulidar Fitri, M.Sc

REFERENCES
• Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. 2006. Department of Computer Science, University of Illinois at Urbana-Champaign. www.cs.uiuc.edu/~hanj
• Ian H. Witten, Eibe Frank, Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. 2011. Elsevier.
• Kusrini and Luthfi, E. Algoritma Data Mining. 2009. Penerbit Andi.
• Kusrini. Pattern Recognition.
• WEKA

Clustering Introduction
• The previous data mining task, classification, partitions data based on a preclassified training sample.
• Clustering is an automated process that groups related records together.
• Related records are grouped together on the basis of having similar values for their attributes.
• The groups are usually disjoint.

Via (Yohana, 2011)

(Larose, 2005)

Case Example: Discretizing a Continuous Class
Input:
• Initial data, either continuous or discrete
• Delta, the value used to determine the permitted difference between the centroid and the mean
Output: a mapping table containing the discrete classes and their centroid values

Process Steps:
1. Determine the number of clusters.
2. Allocate the data to the clusters at random.
3. Compute the centroid (mean) of the data in each cluster.
4. Allocate each data point to the nearest centroid (mean).
5. Return to Step 3 if any data point still moves to a different cluster, if any centroid changes by more than the specified threshold, or if the change in the objective function being used exceeds the specified threshold.

Determining the centroids: at random, or by a formula.

Input: 79, 85, 83, 90, 82, 81, 85, 87, 89 and 84
Number of target classes: 3
Delta: 0.01
Process:
Min: 79
Max: 90
Error tolerance: 0.01 * (90 - 79) = 0.11

Min: 79, max: 90
Initial centroids C2 and C3?

0.92 > error (0.11)
The mean becomes the new centroid.

WEKA PRACTICE
Clustering
• Open WEKA and load the .arff data
• Select the Cluster tab
• Choose the kMeans algorithm
• Set the desired number of clusters (groups)
• Click Start
• Read the output

GET STARTED
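The K-Means process steps and the worked input from the case example (79, 85, 83, 90, 82, 81, 85, 87, 89 and 84, with 3 target classes and delta 0.01) can be sketched in Python as follows. This is only a minimal illustration, not the procedure used on the slides: the function name `kmeans_1d`, the `max_iter` guard, and the evenly spaced initial centroids are assumptions made here so that the run is repeatable, since the slides leave the initial centroids open (random or by formula). The stopping rule uses the error tolerance delta * (max - min) from the example.

```python
def kmeans_1d(values, k, delta, max_iter=100):
    """One-dimensional K-Means sketch following the five process steps."""
    lo, hi = min(values), max(values)
    # Error tolerance as in the example: delta * (max - min)
    tolerance = delta * (hi - lo)

    # Assumption (not from the slides): start from centroids spaced
    # evenly between min and max instead of a random allocation.
    centroids = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]

    for _ in range(max_iter):
        # Step 4: allocate each value to its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)

        # Step 3: the mean of each cluster becomes the new centroid
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]

        # Step 5: repeat while any centroid moved by more than the tolerance
        shift = max(abs(n - o) for n, o in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift <= tolerance:
            break

    return centroids, clusters, tolerance


data = [79, 85, 83, 90, 82, 81, 85, 87, 89, 84]
centroids, clusters, tolerance = kmeans_1d(data, k=3, delta=0.01)

# Mapping table: discrete class -> centroid value and members
for label, (center, members) in enumerate(zip(centroids, clusters), start=1):
    print(f"C{label}: centroid={center:.2f}, members={sorted(members)}")
```

With these assumed starting centroids, the sketch settles on the groups {79, 81, 82}, {83, 84, 85, 85} and {87, 89, 90}. A different initialization, such as the random allocation in Step 2, may pass through different intermediate centroids.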