Data Mining
Kelby Lee
3-1
Overview
• Transaction Database
• What is Data Mining
• Data Mining Primitives
• Data Mining Objectives
• Predictive Modeling
• Knowledge Discovery
• Other Objectives to Data Mining
• What Data Mining is Not
• Other Factors in Data Mining Categorization
• Conclusion
3-2
Transaction Database
• A relation consisting of transactions
• Each transaction carries a TID (transaction identifier)
• Mining looks for regularities in transaction behavior
3-3
Transaction Database
Table 1.1 Transaction Database
TID  Customer  Item          Date        Price   Quantity
---  --------  ------------  ----------  ------  --------
100  C1        chocolate     01/11/2001    1.59  2
100  C1        ice cream     01/11/2001    1.89  1
200  C2        chocolate     01/12/2001    1.59  3
200  C2        candy bar     01/12/2001    1.19  2
200  C2        jackets       01/12/2001  120.39  2
300  C3        jackets       01/14/2001  168.88  1
300  C3        color shirts  01/14/2001   27.95  2
400  C4        jackets       01/15/2001  149.49  1
3-4
Association Rules
• A customer who buys chocolate will likely also buy a candy bar
• Finding such rules is one type of data mining task
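The chocolate and candy bar rule can be checked against Table 1.1. The sketch below groups the table's items by TID and computes support and confidence, two measures commonly used for association rules. The slides do not name these measures, so the measures and the function names here are assumptions made for illustration.

```python
# Items per transaction, grouped by TID from Table 1.1.
transactions = {
    100: {"chocolate", "ice cream"},
    200: {"chocolate", "candy bar", "jackets"},
    300: {"jackets", "color shirts"},
    400: {"jackets"},
}


def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions.values() if items <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(antecedent)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent)) / base


rule_support = support({"chocolate", "candy bar"})
rule_confidence = confidence({"chocolate"}, {"candy bar"})
print(f"chocolate -> candy bar: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

On this small table, chocolate appears in two of the four transactions and candy bar appears in one of those two, so the rule holds for half of the chocolate buyers.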
3-5
Discovered Rules
Table 1.2 Discovered Rules
Rule  Bought this...  ...also bought that
----  --------------  -------------------
1     chocolate       ice cream
2     candy bar       chocolate
3     ski pants       colored shirt
4     beer            diaper
3-6
What is Data Mining
• Retrieve individual elements
  - Given the name of a product, find its price and producer
• Analysis
  - Average monthly sales amount and deviation
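The contrast between retrieval and analysis can be sketched on the rows of Table 1.1. Retrieval looks up a stored value for one named item (the table has no producer column, so only price is shown). Analysis summarizes many rows. The table covers only a few days, so this sketch uses the mean and standard deviation of per-transaction totals as a stand-in for monthly figures. That substitution and the function names are assumptions.

```python
from statistics import mean, stdev

# Rows of Table 1.1: (TID, customer, item, date, price, quantity).
rows = [
    (100, "C1", "chocolate", "01/11/2001", 1.59, 2),
    (100, "C1", "ice cream", "01/11/2001", 1.89, 1),
    (200, "C2", "chocolate", "01/12/2001", 1.59, 3),
    (200, "C2", "candy bar", "01/12/2001", 1.19, 2),
    (200, "C2", "jackets", "01/12/2001", 120.39, 2),
    (300, "C3", "jackets", "01/14/2001", 168.88, 1),
    (300, "C3", "color shirts", "01/14/2001", 27.95, 2),
    (400, "C4", "jackets", "01/15/2001", 149.49, 1),
]


def prices_of(item):
    """Retrieval: look up the distinct recorded prices for one named item."""
    return sorted({price for _, _, name, _, price, _ in rows if name == item})


def totals_by_tid():
    """Analysis, step 1: total amount spent in each transaction."""
    totals = {}
    for tid, _, _, _, price, quantity in rows:
        totals[tid] = totals.get(tid, 0.0) + price * quantity
    return totals


totals = totals_by_tid()
amounts = list(totals.values())
print("chocolate prices:", prices_of("chocolate"))
print("average transaction total:", round(mean(amounts), 2))
print("standard deviation:", round(stdev(amounts), 2))
```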
3-7
Advances Allow For
• Large amounts of data to be handled
• Aspect of analysis
• “Data rich” but “knowledge poor”
3-8
Discover Patterns
• Improve business performance
  - Exploit favorable patterns
  - Avoid problematic patterns
• Increase understanding
• Predict outcome
3-9
Answer the Key Business Questions
• Who will buy? What will they buy? How much?
  - Classification and prediction
• What are the different types of customers?
  - Segmentation of customers
3-10
Answer the Key Business Questions
• What relationship exists between customers or website visitors and the products?
  - Association
• What are the groupings hidden in the data?
  - Clustering analysis
3-11
Data Mining Definition
The nontrivial extraction of implicit, previously unknown, interesting, and potentially useful information from data
3-12
Different Types of Data Mining
• Business data mining
• Scientific data mining
• Internet data mining
3-13
Data Mining Applications
• Medical
• Control theory
• Engineering
• Public administration
• Marketing and finance
• Data mining on the Web
• Scientific databases
• Fraud detection
3-14
Data Mining Primitives
• Fundamental elements needed to define a data mining task
• Eight elements (P, D, K, B, T, M, I, U), forming an 8-tuple
3-15
Elements
• P - Problem specification
• D - Task-relevant data
• K - Kind of knowledge to be mined
• B - Background knowledge
• T - Specific algorithms or techniques
• M - Models developed or knowledge patterns extracted
• I - Interestingness
• U - User
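One way to picture the 8-tuple is as a record with one field per element. The sketch below is only an illustration: the class name, the field names, and the example values are assumptions and do not come from the slides.

```python
from typing import Any, NamedTuple


class MiningTask(NamedTuple):
    """The eight primitives (P, D, K, B, T, M, I, U) as one record."""
    problem: str            # P - problem specification
    data: Any               # D - task-relevant data
    knowledge_kind: str     # K - kind of knowledge to be mined
    background: Any         # B - background knowledge
    technique: str          # T - specific algorithm or technique
    model: Any              # M - model developed or patterns extracted
    interestingness: str    # I - interestingness criterion
    user: str               # U - user


# Hypothetical values, loosely based on the chocolate example earlier in the deck.
task = MiningTask(
    problem="Which items are bought together?",
    data="Table 1.1 transaction database",
    knowledge_kind="association rules",
    background=None,
    technique="co-occurrence counting",
    model=None,  # empty until the data miner has produced a result
    interestingness="rules the user did not already know",
    user="store analyst",
)

# M is an output of the task, so it is filled in after mining.
finished = task._replace(model=["chocolate -> candy bar"])
print(len(finished), finished.knowledge_kind, finished.model)
```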
3-16
Diagram
3-17
Relationship between Elements
• The user defines the problem (P) and specifies interestingness (I)
• The data miner works with K and T as core elements, uses D and B, and incorporates I
• The data miner produces M
3-18
Data Mining Objectives
• Discovery
  - Finding human-interpretable patterns describing the data
• Prediction
  - Using some variables or fields in the database to predict unknown or future values of other variables of interest
3-19
Data Mining Objectives
• Knowledge discovery
  - A stage somewhat prior to prediction, where information is insufficient
  - Closer to decision support
3-20
Predictive Modeling
• Predict values based on similar groups of data
• Submit records with some unknown fields and the system will predict their values
3-21
Predictive Modeling
• Pattern recognition
  - Association of an observation with past experience or knowledge
  - Interchangeable with classification
3-22
Predictive Modeling
• Classification
  - Assigning one of a finite set of labels to an observation
• Estimation
  - Assigning a numeric label, drawn from an infinite range of values, to an observation
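The difference between classification and estimation can be sketched with a toy nearest-neighbour predictor. This technique is chosen here for illustration and is not named in the slides. The past records, the field names, and the choice of k = 3 are all hypothetical.

```python
from collections import Counter

# Hypothetical past records: (age, yearly_visits, segment label, yearly_spend).
history = [
    (22, 4, "occasional", 120.0),
    (25, 6, "occasional", 150.0),
    (41, 30, "regular", 900.0),
    (38, 26, "regular", 820.0),
    (45, 35, "regular", 1000.0),
]


def nearest(age, visits, k=3):
    """Return the k past records closest to the new record (squared Euclidean distance)."""
    return sorted(history, key=lambda r: (r[0] - age) ** 2 + (r[1] - visits) ** 2)[:k]


def classify(age, visits, k=3):
    """Classification: pick one label from a finite set by majority vote."""
    labels = [r[2] for r in nearest(age, visits, k)]
    return Counter(labels).most_common(1)[0][0]


def estimate(age, visits, k=3):
    """Estimation: predict a numeric value as the average over the neighbours."""
    spends = [r[3] for r in nearest(age, visits, k)]
    return sum(spends) / len(spends)


# A new record whose segment and spend fields are unknown.
print(classify(40, 28), round(estimate(40, 28), 2))
```

Both functions look at the same similar records. Only the kind of answer differs: a label from a fixed set in one case, a number in the other.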
3-23
Knowledge Discovery
• Find patterns in the database
  - If someone buys one thing, what else will they buy?
• Interesting + Certain = Knowledge
  - The output is called discovered knowledge
• KDD - Knowledge Discovery in Databases
3-24
Data Mining
• Is about why: about hidden regularities, an important aspect related to perception, learning, and evolving
• A decision support process in which we search for patterns of information in data
  - Once found, display them in a suitable format
3-25
Four Points of KDD
• Discovered knowledge is represented in a high-level language
• It accurately portrays the contents of the database
• It is interesting to the user
• The process is efficient
3-26
Important Issues
• Human centered
  - Under the control of a human user, to meet human needs
• Incorporate interestingness
• Provide various types
• Provide visualization
3-27
Other Objectives
• Forensic analysis
  - Applying extracted patterns to find anomalous or unusual data elements; largely used in business applications
  - Find out what the norm is, then find the elements that deviate from it
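"Find the norm, then find what deviates" can be sketched with a simple rule: treat the mean as the norm and flag values that lie more than a chosen number of standard deviations away from it. The sample amounts and the threshold of 2 are assumptions for illustration, and practical systems often use more robust definitions of the norm.

```python
from statistics import mean, stdev

# Hypothetical purchase amounts; one of them is far from the rest.
amounts = [20.0, 22.5, 19.0, 21.0, 20.5, 23.0, 95.0, 21.5]


def find_deviations(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    norm = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []  # every value equals the norm, so nothing deviates
    return [v for v in values if abs(v - norm) / spread > threshold]


print(find_deviations(amounts))
```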
3-28
What Data Mining is Not
• Analysis vs. monitoring
  - Analysis: works on previously collected information
  - Monitoring: collects data as it comes in and compares it to a set of conditions
• Unexpected discovery
  - Must have a general goal in mind
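The analysis versus monitoring distinction can be sketched as two ways of touching the same records. Monitoring checks each record against fixed conditions as it arrives, while analysis runs afterwards over everything collected. The records below reuse a few rows of Table 1.1; the condition names and thresholds are hypothetical.

```python
def monitor(stream, conditions):
    """Monitoring: test each incoming record against named conditions as it arrives."""
    for record in stream:
        for name, check in conditions.items():
            if check(record):
                yield name, record


# (item, price, quantity) records, taken from Table 1.1.
collected = [
    ("chocolate", 1.59, 2),
    ("jackets", 120.39, 2),
    ("candy bar", 1.19, 2),
    ("jackets", 149.49, 1),
]

# Hypothetical conditions a monitoring system might be configured with.
conditions = {
    "high price": lambda r: r[1] > 100,
    "bulk purchase": lambda r: r[2] >= 3,
}

# Monitoring: alerts are produced one record at a time.
alerts = list(monitor(iter(collected), conditions))

# Analysis: a summary computed over the previously collected records.
average_price = sum(price for _, price, _ in collected) / len(collected)

print(alerts)
print(round(average_price, 2))
```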
3-29
Other Factors in Categorization
• Data retention
  - Data is retained for future pattern matching
• Pattern distillation
  - Analyze the data, extract the pattern, leave the data behind
3-30
Conclusion
• Transaction Database
• What is Data Mining
• Data Mining Primitives
• Data Mining Objectives
• Predictive Modeling
• Knowledge Discovery
• Other Objectives to Data Mining
• What Data Mining is Not
• Other Factors in Data Mining Categorization
3-31