INTRODUCTION TO
DATA MINING
Faculty of Informatics – Telkom University
5/25/2017
Topics
Background of Data Mining
What Data Mining Is and Why It Matters
Data Mining Tasks
Data Mining Functionalities
Relationship between data mining systems and Database Systems, Data Warehouse Systems, and Business Intelligence
Issues in Data Mining
Our learning approach:
Student Centered Learning
Background of Data Mining (1)
Data abundance
–Automated tools and database technology generate data continuously, so the volume recorded in databases and other storage media keeps growing
Background of Data Mining (2)
Although data is extremely abundant, very little of it is processed into knowledge
The solution? Data warehouses and data mining
–Data warehouses and OLAP (on-line analytical processing)
–Extraction of interesting knowledge in the form of rules, regularities, patterns, constraints, etc. from data stored in a large number of databases
Top 10 Largest Databases, 2012
No | Organization | Data Volume
1 | World Data Centre for Climate | 20 terabytes of web data; 6 petabytes of additional data
2 | National Energy Research Scientific Computing Center | 2.8 petabytes of data; operated by 2,000 computational scientists
3 | AT&T | 23 terabytes of information; 1.9 trillion phone call records
4 | Google | 1 million searches per day
Source: http://www.siliconindia.com/news/enterpriseit/Top-10-Largest-Databases-in-theWorld-nid-118891-cid-7.html
Worldwide Data Growth (1)
Source: Tan, 2004
Worldwide Data Growth (2)
The amount of data stored in various media has doubled in three years, from 1999 to 2002. The amount of data put into storage in 2002, five exabytes (one quintillion bytes), was equal to the contents of half a million new libraries, each containing a digitised version of the print collection of the entire US Library of Congress
(Lyman and Varian, UC Berkeley, 2003)
Worldwide Data Growth (3)
"It is projected that just four years from now, the world’s information base will be doubling in size every 11 hours. So rapid is the growth in the global stock of digital data that the very vocabulary used to indicate quantities has had to expand to keep pace. A decade or two ago, professional computer users and managers worked in kilobytes and megabytes. Now school children have access to laptops with tens of gigabytes of storage, and network managers have to think in terms of the terabyte (1,000 gigabytes) and the petabyte (1,000 terabytes). Beyond those lie the exabyte, zettabyte and yottabyte, each a thousand times bigger than the last."
(IBM Global Technical Services white paper published in July 2006, titled "The toxic terabyte: How data-dumping threatens business efficiency.")
Data Mining?
Just a joke..
Definitions of Data Mining
Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. [Kantardzic, 2003]
Data mining (DM) is the extraction of hidden predictive information from large databases (DBs). With the automatic discovery of knowledge implicit within DBs, DM uses sophisticated statistical analysis and modeling techniques to uncover patterns and relationships hidden in organizational DBs. [Wang, 2003]
Data mining refers to extracting or "mining" knowledge from large amounts of data. [Han, 2005]
Non-trivial extraction of implicit, previously unknown and potentially useful information from data. [Tan, 2003]
Origins of Data Mining
Data mining grew out of several disciplines and aims to improve traditional techniques so that they can handle:
–Very large volumes of data
–High-dimensional data
–Data that is heterogeneous and varied in nature
So, What Is Data Mining?
Key ideas of data mining:
–It is non-trivial and iterative
–It finds knowledge or information in large amounts of data
Data mining is the core of the Knowledge Discovery in Databases (KDD) process
Data Mining & the KDD Process
Stages, in order: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation
Source: Han 2004
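The stages named in the KDD diagram can be sketched as a small pipeline. This is only an illustration under assumptions: the two record lists, the field names, and the helper functions (`clean`, `integrate`, `select`, `mine`, `evaluate`) are hypothetical, and the "mining" step is a plain frequency count standing in for a real algorithm.

```python
from collections import Counter

# Hypothetical raw records from two source databases (assumed data).
db_a = [
    {"customer": "c1", "item": "bread", "qty": 2},
    {"customer": "c2", "item": "milk", "qty": None},  # missing value
    {"customer": "c3", "item": "bread", "qty": 1},
]
db_b = [
    {"customer": "c4", "item": "milk", "qty": 3},
    {"customer": "c5", "item": "bread", "qty": 1},
]

def clean(records):
    """Data cleaning: drop records that have a missing field."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: merge several sources into one store."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged

def select(warehouse, fields):
    """Selection: keep only the task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in warehouse]

def mine(task_data):
    """Data mining (stand-in): count how often each item occurs."""
    return Counter(r["item"] for r in task_data)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns that meet a threshold."""
    return {p: n for p, n in patterns.items() if n >= min_count}

warehouse = integrate(clean(db_a), clean(db_b))
task_data = select(warehouse, ["item"])
patterns = evaluate(mine(task_data), min_count=2)
print(patterns)
```

In practice the process is iterative: a disappointing result at the evaluation step usually sends the analyst back to an earlier stage with different cleaning or selection choices.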
Types of Data in Data Mining
Databases, data warehouses, transactional databases
Data streams and sensor data
Time-series data, temporal data, sequence data
Structured data, graphs, social networks, and link databases
Object-relational databases
Spatial data
Spatiotemporal data
Multimedia databases
Text databases
The World-Wide Web
Architecture of a Data Mining System
Layers, from top to bottom:
Graphical User Interface
Pattern Evaluation
Data Mining Engine
Database or Data Warehouse Server
Data cleaning, integration, and selection
Data sources: Database, Data Warehouse, World-Wide Web, Other Info Repositories
Knowledge Base
Relationship between DM, DB, and DW
To be used to best effect, a data mining system should be coupled with database and data warehouse systems.
No coupling, for example flat file processing, is not recommended.
Loose coupling: for example, fetching data from the DB/DW.
Semi-tight coupling: improves DM performance by implementing data mining primitives inside the DB/DW system, for example sorting, indexing, aggregation, histogram analysis, multiway join, etc.
Tight coupling: a uniform processing environment in which DM is integrated with the DB/DW system; mining queries are optimized based on mining query, indexing, and query processing methods.
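The difference between loose and semi-tight coupling can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `sales` table and its rows are made up, and a SQL aggregation stands in for the kind of data mining primitive a DB/DW system might provide.

```python
import sqlite3
from collections import defaultdict

# An in-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("bread", 2), ("milk", 1), ("bread", 3), ("eggs", 4)],
)

# Loose coupling: fetch the raw rows, then aggregate in the mining program.
loose = defaultdict(int)
for item, amount in conn.execute("SELECT item, amount FROM sales"):
    loose[item] += amount

# Semi-tight coupling: push the aggregation primitive into the database.
semi_tight = dict(
    conn.execute("SELECT item, SUM(amount) FROM sales GROUP BY item")
)

print(dict(loose))
print(semi_tight)
conn.close()
```

Both routes give the same totals. The second moves less data out of the database and lets the database engine use its own indexing and optimization, which is the performance argument for tighter coupling.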
Data Mining & Business Intelligence
Layers, from top to bottom; the potential to support business decisions increases toward the top:
Making Decisions
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts: OLAP, MDA
Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles shown alongside the layers, from top to bottom: End User, Business Analyst, Data Analyst, DBA
Data Mining Tasks
Prediction Methods
–Use some variables to predict unknown or future values of other variables
Examples:
Classification
Regression
Deviation Detection
Description Methods
–Find human-interpretable patterns that describe the data
Examples:
Clustering
Association Rule Discovery
Sequential Pattern Discovery
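The two task families can be contrasted with a minimal sketch. The data, the labels, and the function names `classify` and `two_means` are assumptions made for illustration: a 1-nearest-neighbour rule stands in for classification, and a two-cluster k-means on one-dimensional values stands in for clustering.

```python
# Predictive task: classification with a 1-nearest-neighbour rule.
# Labeled training data (assumed): (height_cm, label)
train = [(150, "short"), (155, "short"), (180, "tall"), (185, "tall")]

def classify(x):
    """Predict the label of x from its closest training example."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Descriptive task: clustering unlabeled values with 2-means.
def two_means(values, iterations=10):
    """Split 1-D values into two clusters around two moving centers.

    Assumes values contains at least two distinct numbers.
    """
    low, high = min(values), max(values)  # initial centers
    for _ in range(iterations):
        a = [v for v in values if abs(v - low) <= abs(v - high)]
        b = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = sum(a) / len(a), sum(b) / len(b)
    return sorted(a), sorted(b)

print(classify(160))
print(two_means([1, 2, 3, 10, 11, 12]))
```

The classifier needs labeled examples and returns a value for a new case; the clustering routine receives no labels and only describes the structure it finds in the data.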
Data Mining Functionalities (1)
Classification and prediction
Frequent patterns, association, correlation, and causality
Cluster analysis
Outlier analysis
Trend and evolution analysis
Statistical analysis
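As one concrete instance of frequent-pattern and association analysis, the sketch below computes the two standard measures for an association rule, support and confidence. The transactions and the helper names are assumptions chosen for illustration.

```python
# Hypothetical market-basket transactions (assumed data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
```

Here the rule "bread implies milk" has support 0.5 (two of four baskets contain both items) and confidence 2/3 (two of the three bread baskets also contain milk).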
Applications of Data Mining (1)
Market analysis and management
–Target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation
Risk analysis and management
–Forecasting, customer retention, quality control, competitive analysis
Fraud detection and management
Text mining (newsgroups, email, documents) and Web analysis
Applications of Data Mining (2)
Marketing and sales promotion
Supermarket shelf management
Inventory management
Medical diagnosis
Collaborative filtering
Business intelligence
Network intrusion detection
Spam detection
etc.
Major Issues
How should the mining methodology be chosen? The choice is hard because:
Data types differ
Expected performance in terms of effectiveness, efficiency, and scalability may differ from one methodology to another
Pattern evaluation, that is, the measurement of "interestingness", differs
Missing values and noise must be handled
etc.
What form should user interaction take? For example:
–Data mining query languages and ad-hoc mining
–Data mining results presented as expressions and visualizations
Applications and social impact
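Two of the issues listed, missing values and noise, can be illustrated with a minimal sketch. The techniques shown (mean imputation and a 3-point moving median) are common simple choices picked for illustration, not methods prescribed by these slides; the function names and sample readings are assumptions.

```python
from statistics import mean, median

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def median_smooth(values):
    """Smooth noise with a 3-point moving median (end points unchanged)."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = median(values[i - 1:i + 2])
    return out

filled = impute_mean([10, None, 12, 14])
smoothed = median_smooth([10, 11, 95, 12, 13])
print(filled)
print(smoothed)
```

The spike of 95 in the second list is replaced by a neighbouring value, while the gap in the first list is filled with the average of the readings that were actually observed.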