Week#02 - Intro2DataMining

Introduction 2 Data Mining
Opas Wongtaweesap (Aj’OaT)
Intelligent Information Systems Development and
Research Laboratory Centre
Agenda
• Motivation: Why data mining?
• What is data mining?
• Data Mining: On what kind of data?
• Data mining functionality
• Classification of data mining systems
Motivation
• “Necessity is the Mother of Invention”
• Data explosion problem
• Automated data collection tools and mature
database technology lead to tremendous
amounts of data stored in databases, data
warehouses and other information repositories.
URL:: http://www.facebook.com/OaTCoM,
http://twitter.com/OaTCoM
E-mail:: [email protected], [email protected]
Data Mining & WEKA
Motivation (Cont’d)
• Rutherford D. Rogers.
• We are drowning in data, but starving for knowledge!
• Solution: Data warehousing and data mining
– Data warehousing and on-line analytical processing
– Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Motivation (Cont’d)
[Figure: Input Data (Raw Information), Process, Output (Information): Hidden Information Patterns or Knowledge]
Evolution of Database Technology
• 1960s:
– Data collection, database creation, IMS and network DBMS
• 1970s:
– Relational data model, relational DBMS implementation
• 1980s:
– RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s-2000s:
– Data mining and data warehousing, multimedia databases, and Web databases
What is Data Mining?
• Data mining (knowledge discovery in databases):
– Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases
• Alternative names:
– Knowledge discovery in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
What is not Data Mining?
• Query Processing.
• Expert Systems.
• Small Machine Learning.
• Statistical Programs.
Confluence of Multiple Disciplines
[Figure: Data Mining at the confluence of Database Technology, Statistics, Machine Learning, Information Science, Visualization, and Other Disciplines]
Why Data Mining? Potential Applications
• Database analysis and decision support
– Market analysis and management
• Target Marketing, Market Basket Analysis.
• Customer Relationship Management (CRM).
• Cross Selling, Market Segmentation
– Risk analysis and management
• Forecasting, Customer Retention.
• Improved Underwriting, Quality Control.
• Competitive Analysis
– Fraud detection and management
Potential Applications
Forecasting
• http://www.tops.co.th/crm/index-crm-th.html
Why Data Mining? Potential Applications
• Other Applications
– Text Mining (NEWS Group, Email, Documents) and Web Analysis.
– Intelligent query answering.
– Medical outcomes analysis, Astronomy.
– Sports scouting.
Data Mining: A KDD Process
Steps of a KDD Process
• Learning the application domain:
– relevant prior knowledge and goals of application
• Creating a target data set: data selection
• Data cleaning and preprocessing: (may take 60% of effort!)
• Data reduction and transformation:
– Find useful features, dimensionality/variable reduction, invariant representation.
Steps of a KDD Process
• Choosing functions of data mining
– summarization, classification, regression, association, clustering.
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation
– visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge
Data Mining and BI
[Figure: pyramid of layers, with increasing potential to support business decisions toward the top.
Layers, top to bottom:
– Making Decisions
– Data Presentation: Visualization Techniques
– Data Mining: Information Discovery
– Data Exploration: Statistical Analysis, Querying and Reporting
– Data Warehouses / Data Marts: OLAP, MDA
– Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles, top to bottom: End User, Business Analyst, Data Analyst, DBA]
DM System Architecture
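A minimal sketch of how the KDD steps (creating a target data set, cleaning, reduction and transformation, mining, pattern evaluation) might chain together in Python. The records, field names, thresholds, and function names are all hypothetical, invented for illustration. The slides do not prescribe an implementation, and counting (region, band) pairs merely stands in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw records; field names and values are invented for illustration.
RAW = [
    {"id": 1, "age": 25, "spend": 120.0, "region": "north", "note": "x"},
    {"id": 2, "age": None, "spend": 80.0, "region": "north", "note": "y"},
    {"id": 3, "age": 41, "spend": 300.0, "region": "south", "note": "z"},
    {"id": 4, "age": 38, "spend": 310.0, "region": "south", "note": ""},
    {"id": 5, "age": 29, "spend": 95.0, "region": "north", "note": ""},
]


def select_target(records, fields):
    """Creating a target data set: keep only the fields relevant to the goal."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records, threshold=200.0):
    """Data reduction and transformation: replace raw spend with a coarse band."""
    return [
        {"region": r["region"], "band": "high" if r["spend"] >= threshold else "low"}
        for r in records
    ]


def mine(records):
    """Data mining: count how often each (region, band) combination occurs."""
    return Counter((r["region"], r["band"]) for r in records)


def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep combinations seen at least min_count times."""
    return {p: c for p, c in patterns.items() if c >= min_count}


def kdd(records):
    """Run the steps in order and return the surviving patterns."""
    target = select_target(records, ["age", "spend", "region"])
    cleaned = clean(target)
    reduced = transform(cleaned)
    return evaluate(mine(reduced))


if __name__ == "__main__":
    print(kdd(RAW))
```

Running kdd(RAW) drops the one record with a missing age, bands the remaining spend values, and keeps the two (region, band) combinations that occur at least twice.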
DM: On What Kind of Data?
• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB and information repositories
– Object-oriented and object-relational databases
– Spatial databases
– Time-series data and temporal data
– Text databases and multimedia databases
– Heterogeneous and legacy databases
– WWW
Spatial databases
Time-series data
Data Mining Functionalities I
• Classification and Prediction
– Finding models (functions) that describe and distinguish classes or concepts for future prediction
– E.g., classify countries based on climate, or classify cars based on gas mileage
– Presentation: decision-tree, classification rule, neural network
– Prediction: Predict some unknown or missing numerical values
Data Mining Functionalities II
• Concept description: Characterization and discrimination
– Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
• Association (correlation and causality)
– Multi-dimensional vs. single-dimensional association
– age(X, “20..29”) ^ income(X, “20..29K”) → buys(X, “PC”) [support = 2%, confidence = 60%]
– contains(T, “computer”) → contains(T, “software”) [support = 1%, confidence = 75%]
Data Mining Functionalities III
• Cluster analysis
– Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns
– Clustering based on the principle: maximizing the intra-class similarity and minimizing the interclass similarity
Data Mining Functionalities IV
• Outlier analysis
– Outlier: a data object that does not comply with the general behavior of the data
– It can be considered as noise or exception but is quite useful in fraud detection, rare events analysis
• Trend and evolution analysis
– Trend and deviation: regression analysis
– Sequential pattern mining, periodicity analysis
– Similarity-based analysis
• Other pattern-directed or statistical analyses
Outlier Analysis
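A minimal sketch of flagging outliers, assuming a simple z-score rule: a value counts as an outlier when it lies more than a chosen number of standard deviations from the mean. The z-score rule, the threshold of 2.0, the function name, and the sample amounts are assumptions for illustration; the slides do not name a specific detection method.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts; 95 is far from the rest.
amounts = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(amounts))
```

With these sample amounts only 95 is flagged. In a fraud detection setting, that flagged value would be the case worth a closer look rather than noise to discard.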
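The bracketed figures attached to association rules, such as [support = 2%, confidence = 60%], can be computed from transaction counts. A minimal sketch, assuming the usual definitions: support is the fraction of transactions that contain every item in the rule, and confidence is that fraction divided by the fraction that contain the antecedent. The transactions and function names below are invented for illustration, chosen so that the confidence comes out at 75%.

```python
# Hypothetical market-basket transactions, one set of items per transaction.
TRANSACTIONS = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"software"},
    {"printer", "paper"},
    {"paper"},
]


def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, relative to the antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Rule: contains(T, "computer") -> contains(T, "software")
print(support(TRANSACTIONS, {"computer", "software"}))
print(confidence(TRANSACTIONS, {"computer"}, {"software"}))
```

Here three of the eight transactions contain both items and four contain "computer", so the rule has support 3/8 and confidence 3/4.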
Outlier Analysis (Cont’d)
Data Mining Output Model
• Data Mining functionalities are used to specify the kind of patterns to be found in mining tasks.
– Descriptive data mining / Unsupervised Model
• Characterize general properties of data in DB, e.g. Association rules mining.
– Predictive data mining / Supervised Model
• Perform inference on current data in order to make predictions.
– Prediction Model.
– Classification Model.
DM: Classification Schemes
• General functionality
– Descriptive data mining
– Predictive data mining
• Different views, different classifications
– Kinds of databases to be mined
– Kinds of knowledge to be discovered
– Kinds of techniques utilized
– Kinds of applications adapted
A Multi-Dimensional View of Data Mining Classification
• Databases to be mined
– Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW, etc.
• Knowledge to be mined
– Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
– Multiple/integrated functions and mining at multiple levels
A Multi-Dimensional View of Data Mining Classification
• Techniques utilized
– Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
• Applications adapted
– Retail, telecommunication, banking, fraud analysis, DNA mining (Bioinformatics), stock market analysis, Web mining, Weblog analysis, etc.
Data Mining: A KDD Process