INTRODUCTION TO DATA MINING
Fakultas Informatika – Telkom University
5/25/2017

Topics
– Background of data mining
– What data mining is and why it is needed
– Tasks in data mining
– Data mining functionality
– Relationship between data mining systems and database systems, data warehouse systems, and business intelligence
– Issues in data mining

Our learning approach: Student-Centered Learning

Background of Data Mining (1)
Data abundance
– Automated tools and database technology keep generating data, so the volume recorded in databases and other storage media keeps growing.

Background of Data Mining (2)
Although data is extremely abundant, very little of it is turned into knowledge.
The solution? Data warehousing and data mining:
– Data warehouses and OLAP (online analytical processing)
– Extraction of interesting knowledge (rules, regularities, patterns, constraints, etc.) from data stored in large databases

Top 10 Largest Databases, 2012
No. | Organization | Amount of data
1 | World Data Centre for Climate | 20 terabytes of web data; 6 petabytes of additional data
2 | National Energy Research Scientific Computing Center | 2.8 petabytes of data; operated by 2,000 computational scientists
3 | AT&T | 23 terabytes of information; 1.9 trillion phone call records
4 | Google | 1 million searches per day
Source: http://www.siliconindia.com/news/enterpriseit/Top-10-Largest-Databases-in-theWorld-nid-118891-cid-7.html

Growth of Data Worldwide (1)
(Figure. Source: Tan, 2004)

Growth of Data Worldwide (2)
The amount of data stored in various media doubled in the three years from 1999 to 2002.
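As a worked example, the doubling figure can be converted into an average yearly growth rate. This minimal sketch assumes steady compound growth between 1999 and 2002, which the slide does not state (it only gives the start and end of the period); the helper names `annual_growth_rate` and `doubling_time` are hypothetical.

```python
import math

def annual_growth_rate(factor: float, years: float) -> float:
    """Constant yearly growth rate that multiplies a quantity by `factor` over `years` years."""
    return factor ** (1.0 / years) - 1.0

def doubling_time(rate: float) -> float:
    """Years needed to double at a constant yearly growth rate `rate`."""
    return math.log(2) / math.log(1.0 + rate)

# Assumption: the doubling from 1999 to 2002 happened at a steady compound rate.
rate = annual_growth_rate(2.0, 3.0)
print(f"{rate:.1%} per year")               # about 26.0% per year
print(f"{doubling_time(rate):.2f} years")   # recovers the 3-year doubling period
```

Under this assumption, doubling in three years corresponds to roughly 26% growth per year, since 2^(1/3) ≈ 1.26.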
The amount of data put into storage in 2002, five exabytes (an exabyte is one quintillion bytes), was equal to the contents of half a million new libraries, each containing a digitised version of the print collection of the entire US Library of Congress (Lyman and Varian, UC Berkeley, 2003).

Growth of Data Worldwide (3)
"It is projected that just four years from now, the world's information base will be doubling in size every 11 hours. So rapid is the growth in the global stock of digital data that the very vocabulary used to indicate quantities has had to expand to keep pace. A decade or two ago, professional computer users and managers worked in kilobytes and megabytes. Now school children have access to laptops with tens of gigabytes of storage, and network managers have to think in terms of the terabyte (1,000 gigabytes) and the petabyte (1,000 terabytes). Beyond those lie the exabyte, zettabyte and yottabyte, each a thousand times bigger than the last." (IBM Global Technical Services white paper, "The toxic terabyte: How data-dumping threatens business efficiency", July 2006)

Definitions of Data Mining
– Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. [Kantardzic, 2003]
– Data mining (DM) is the extraction of hidden predictive information from large databases (DBs).
  With the automatic discovery of knowledge implicit within DBs, DM uses sophisticated statistical analysis and modeling techniques to uncover patterns and relationships hidden in organizational DBs. [Wang, 2003]
– Data mining refers to extracting or "mining" knowledge from large amounts of data. [Han, 2005]
– Non-trivial extraction of implicit, previously unknown and potentially useful information from data. [Tan, 2003]

Origins of Data Mining
Data mining grew out of several disciplines, with the goal of improving traditional techniques so that they can handle:
– Very large volumes of data
– High-dimensional data
– Heterogeneous data with differing characteristics

So, What Is Data Mining?
Key characteristics of data mining:
– It is non-trivial and iterative
– It discovers knowledge or information from large amounts of data
Data mining is the core of the Knowledge Discovery in Databases (KDD) process.

Data Mining and the KDD Process
(Figure. Source: Han, 2004. Databases → data cleaning and data integration → data warehouse → selection → task-relevant data → data mining → pattern evaluation)

Types of Data in Data Mining
– Databases, data warehouses, transactional databases
– Data streams and sensor data
– Time-series data, temporal data, sequence data
– Structured data, graphs, social networks, and linked databases
– Object-relational databases
– Spatial and spatiotemporal data
– Multimedia databases
– Text databases
– The World Wide Web

Data Mining System Architecture
(Figure, layers from top to bottom: graphical user interface; pattern evaluation; data mining engine; database or data warehouse server; data cleaning, integration, and selection; data sources: database, data warehouse, World Wide Web, other information repositories. A knowledge base is also part of the system.)

Relationship Between DM, DB, and DW
To be used effectively, a data mining system should be coupled with database and data warehouse systems:
– No coupling (not recommended), e.g., flat-file processing
– Loose coupling, e.g., the DM system only fetches data from the DB/DW
– Semi-tight coupling: improves DM performance by implementing data mining primitives inside the DB/DW system, e.g., sorting, indexing, aggregation, histogram analysis, multiway join, etc.
– Tight coupling: DM is integrated into the DB/DW system as a single processing environment, and mining queries are optimized using mining query analysis, indexing, and query processing methods

Data Mining and Business Intelligence
Figure, from top to bottom (the potential to support business decisions increases toward the top):
– Making decisions
– Data presentation: visualization techniques
– Data mining: information discovery
– Data exploration: statistical analysis, querying and reporting
– Data warehouses / data marts: OLAP, MDA
– Data sources: paper, files, information providers, database systems, OLTP
Typical users, from top to bottom: end user, business analyst, data analyst, DBA

Tasks in Data Mining
Prediction methods
– Use some variables to predict unknown or future values of other variables
Examples: classification, regression, deviation detection
Description methods
– Find human-interpretable patterns that describe the data
Examples: clustering, association rule discovery, sequential pattern discovery

Data Mining Functionality
– Classification and prediction
– Frequent patterns, association, correlation, and causality
– Cluster analysis
– Outlier analysis
– Trend and evolution analysis
– Statistical analysis

Applications of Data Mining (1)
– Market analysis and management: target marketing, customer relationship management (CRM), market basket analysis, cross-selling, market segmentation
– Risk analysis and management: forecasting, customer retention, quality control, competitive analysis
– Fraud detection and management
– Text mining (newsgroups, email, documents) and Web analysis

Applications of Data Mining (2)
– Marketing and sales promotion
– Supermarket shelf management
– Inventory management
– Medical diagnosis
– Collaborative filtering
– Business intelligence
– Network intrusion detection
– Spam detection, etc.

Main Issues
How should a mining methodology be chosen? Because:
– Data types differ
– Expected performance in effectiveness, efficiency, and scalability may differ between methodologies
– Pattern evaluation uses different "interestingness" measures
– Missing values, noise, etc. must be handled
How should the system interact with the user? For example:
– Through data mining query languages and ad hoc mining
– By expressing and visualizing data mining results
Applications and social impact
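The KDD flow in the Han (2004) figure (cleaning and integration, selection of task-relevant data, mining, pattern evaluation) can be sketched on toy records. This is a minimal illustration under stated assumptions: the two hypothetical "stores" and their records, the frequency count used as the "mining" step, and the `min_count` threshold standing in for an interestingness measure are all invented for the example, not taken from the slides.

```python
from collections import Counter

# Hypothetical source databases (illustrative data only)
store_a = [{"id": 1, "item": "bread"}, {"id": 2, "item": None}, {"id": 3, "item": "milk"}]
store_b = [{"id": 4, "item": "bread"}, {"id": 5, "item": "Bread "}, {"id": 6, "item": "eggs"}]

def clean(records):
    """Data cleaning: drop records with a missing item and normalise spelling/whitespace."""
    return [{**r, "item": r["item"].strip().lower()} for r in records if r["item"]]

def integrate(*sources):
    """Data integration: merge several cleaned sources into one collection (the 'warehouse')."""
    merged = []
    for src in sources:
        merged.extend(src)
    return merged

def select(records, field):
    """Selection: keep only the task-relevant attribute."""
    return [r[field] for r in records]

def mine(values):
    """Data mining (toy stand-in): count how often each value occurs."""
    return Counter(values)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that meet a hypothetical interestingness threshold."""
    return {p: c for p, c in patterns.items() if c >= min_count}

warehouse = integrate(clean(store_a), clean(store_b))
relevant = select(warehouse, "item")
print(evaluate(mine(relevant), min_count=2))  # {'bread': 3}
```

Note how cleaning matters here: without normalising "Bread " to "bread", the frequent pattern would be split across two spellings and fall below the threshold.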
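The difference between loose and semi-tight coupling can be illustrated with Python's built-in `sqlite3` module. In this sketch, assuming a hypothetical `sales` table in an in-memory database, the loose-coupling path only fetches raw rows and aggregates them in the mining program, while the semi-tight path pushes aggregation and sorting (two of the primitives listed for semi-tight coupling) into the database engine through SQL.

```python
import sqlite3

# Hypothetical in-memory table standing in for an operational DB/DW
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0), ("east", 40.0)],
)

# Loose coupling: the database only serves raw rows; aggregation happens in Python.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
loose = {}
for region, amount in rows:
    loose[region] = loose.get(region, 0.0) + amount

# Semi-tight coupling: the aggregation and sorting primitives run inside the database engine.
semi_tight = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
)
conn.close()

print(loose == semi_tight)  # True: same result, computed in a different place
```

On a table this small the two paths are equivalent; the point of semi-tight coupling is that on large data the database can use its own indexes and query execution instead of shipping every row to the mining program.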
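As an example of a prediction method, the sketch below classifies a new record from labelled examples using a 1-nearest-neighbour rule. The algorithm is chosen only because it is short; the slides do not prescribe one, and the training data and features (income in thousands, age) are hypothetical.

```python
import math

# Hypothetical labelled training data: (income in thousands, age) -> bought the product?
train = [
    ((25.0, 22.0), "no"),
    ((40.0, 35.0), "yes"),
    ((60.0, 45.0), "yes"),
    ((20.0, 30.0), "no"),
]

def predict(x):
    """1-nearest-neighbour: return the label of the closest training example (Euclidean distance)."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label

print(predict((55.0, 40.0)))  # yes
print(predict((22.0, 25.0)))  # no
```

In practice the features would usually be rescaled first, because Euclidean distance lets the attribute with the largest numeric range dominate.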
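Clustering, listed among the description methods, can be illustrated with plain k-means (Lloyd's algorithm), one common clustering technique. This is a minimal sketch assuming two-dimensional numeric points, a fixed number of iterations, and hand-picked starting centroids; the data are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical points forming two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
```

The result depends on the starting centroids; real implementations typically try several initialisations and stop once assignments no longer change.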
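Association rule discovery, used for example in market basket analysis, rests on two measures: support (how often an itemset occurs) and confidence (how often the rule's right-hand side occurs given its left-hand side). The sketch below computes both on hypothetical transactions and finds frequent item pairs by brute-force enumeration, which is simpler than, and not the same as, algorithms such as Apriori.

```python
from itertools import combinations

# Hypothetical market-basket transactions (one set of items per purchase)
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of transactions containing `itemset`."""
    return count(itemset) / len(transactions)

def confidence(lhs, rhs):
    """Confidence of the rule lhs -> rhs: count(lhs union rhs) / count(lhs)."""
    return count(lhs | rhs) / count(lhs)

rule_support = support({"diapers", "beer"})    # 3 of 5 transactions -> 0.6
rule_conf = confidence({"diapers"}, {"beer"})  # 3 of the 4 baskets with diapers -> 0.75

# Brute-force search for item pairs that appear in at least 3 transactions
items = sorted(set().union(*transactions))
frequent_pairs = [p for p in combinations(items, 2) if count(set(p)) >= 3]
print(rule_support, rule_conf, frequent_pairs)
```

Brute force checks every pair, which becomes infeasible for larger itemsets over many items; that cost is what pruning-based methods are designed to avoid.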
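Outlier analysis (and deviation detection, for instance in fraud detection) can be illustrated with a simple z-score rule that flags values lying more than a chosen number of standard deviations from the mean. The amounts and the threshold of 2.5 below are hypothetical; the slides do not specify a method, and a z-score rule is only one of many options.

```python
import statistics

# Hypothetical daily transaction amounts for one customer
amounts = [52.0, 48.0, 50.0, 51.0, 49.0, 47.0, 53.0, 50.0, 250.0]

def zscore_outliers(values, threshold=2.5):
    """Flag values whose distance from the mean exceeds `threshold` population standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

print(zscore_outliers(amounts))  # [250.0]
```

A caveat of this rule: the outlier itself inflates the mean and standard deviation, so with few data points a single extreme value can only reach a limited z-score, and robust alternatives (e.g., based on the median) are often preferred.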