Download Cluster Analysis for Data Mining

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Cluster analysis wikipedia , lookup

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
Business Intelligence:
A Managerial Approach
(2nd Edition)
Chapter 4:
Data Mining for Business
Intelligence
4.1 DATA MINING CONCEPTS
AND DEFINITIONS


It used to understanding customers, vendors,
business processes, and the extended supply
chain very well.
Although the term data mining is relatively new,
the ideas behind it are not.

2-2
Why, then, has it suddenly gained the attention
of the business world??
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
4.1 DATA MINING CONCEPTS
AND DEFINITIONS

2-3
WHY DATA MINING ?
 More intense competition at the global scale
 Recognition of the value in data sources
 Availability of quality data on customers, vendors,
transactions, Web, etc.
 Consolidation and integration of data repositories into data
warehouses
 The exponential increase in data processing and storage
capabilities; and decrease in cost
 Movement toward conversion of information resources into
nonphysical form
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Definitions, Characteristics, and
Benefits

Definitions:


2-4
is a term used to describe discovering or "mining"
knowledge from large amounts of data. “knowledge
mining”
Technically speaking, data mining is a process that
uses statistical, mathematical, and artificial intelligence
techniques to extract and identify useful information
and subsequent knowledge (or patterns) from large
sets of data.
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Definition



2-5
The nontrivial process of identifying valid, novel,
potentially useful, and ultimately understandable
patterns in data stored in structured databases
Keywords in this definition: Process, nontrivial , valid,
novel, potentially useful, understandable
Other names: knowledge extraction, pattern analysis,
knowledge discovery, information harvesting, pattern
searching, data dredging
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining
Characteristics/Objectives







2-6
Source of data for DM is often a consolidated data warehouse (not
always!).
DM environment is usually a client-server
Sophisticated new tools, including advanced visualization tools,
help to remove the information ore buried in corporate files or
archival public records
The miner is often an end user.
Striking it rich requires creative thinking.
Data mining tools are readily combined with spreadsheets and
other software development tools.
Because of the large amounts of data and massive search efforts,
it is sometimes necessary to use parallel processing for data
mining.
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining at the Intersection
of Many Disciplines
ial
e
Int
tis
tic
s
c
tifi
Ar
Pattern
Recognition
en
Sta
llig
Mathematical
Modeling
Machine
Learning
Databases
Management Science &
Information Systems
2-7
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
ce
DATA
MINING
What Does DM Do?
How Does it Work?
DM extracts patterns from data

Pattern?
A mathematical (numeric and/or symbolic) relationship among
data items

Types of patterns





2-8
Association
Prediction
Cluster (segmentation)
Sequential (or time series) relationships
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
A Taxonomy for Data Mining
Tasks.
2-9
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining Tasks.


2-10
PREDICTION :is commonly referred to as the act of
telling about the future.
prediction can be named more specifically as classification:
(where the predicted thing, such as tomorrow's forecast, is a
class label such as "rainy" or "sunny") or regression: (where the
predicted thing, such tomorrow's temperature, is a real number
such as "65°F").
 Classification, is perhaps the most common of all data
mining tasks. The objective of classification is to analyze the
historical data stored in a database and automatically
generate a model that can predict future behavior.
 (Neural networks OR Decision trees )
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining Tasks.


2-11
Clustering: partitions a collection of things
(e.g. , objects and events presented in a
structured dataset) into segments (or natural
groupings) whose members share similar
characteristics.
market segmentation with cluster analysis. OR
segmenting customers.
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining Tasks.


Associations: is a popular and wellresearched technique for discovering
interesting relationships among
variables in large databases
In the context of the retail industry ,
association rule mining is often called
market-basket analysis
2-12
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining Applications











2-13
Customer Relationship Management • Healthcare
Banking & Other Financial
• Medicine
• Entertainment industry
Retailing and Logistics
• Sports
Manufacturing and Maintenance
•Etc
Brokerage and Securities Trading
Insurance
Computer hardware and software
Science and engineering
Government and defense
Homeland security and law enforcement
Travel industry
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining Process: CRISP-DM
Cross-Indust1y
Standard
Process
proposed in the
mid-1990s by a
European
consortium of
companies
2-14
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining Process: CRISP-DM
Step 1: Business Understanding
know what the study is for like ("What are the common
characteristics of the customers we have)
Step 2: Data Understanding
identify the relevant data from many available databases.
quantitative OR qualitative
Step 3: Data Preparation (!)


2-15
(table 4.4)
called as data preprocessing . take the data identified in the
previous step and prepare them for analysis by data mining
data preprocessing consumes the most time and effort; most believe
that this step accounts for roughly 80% of the total time.
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining Process: CRISP-DM
Step 4: Model Building
use a variety of data mining methods and algorithms
Step 5: Testing and Evaluation
a critical and challenging task
Step 6: Deployment
The deployment step may also include maintenance
activities
2-16
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Preparation – A Critical DM
Task (see table 4.4)
Real-world
Data
Data Consolidation
·
·
·
Collect data
Select data
Integrate data
Data Cleaning
·
·
·
Impute missing values
Reduce noise in data
Eliminate inconsistencies
Data Transformation
·
·
·
Normalize data
Discretize/aggregate data
Construct new attributes
Data Reduction
·
·
·
Reduce number of variables
Reduce number of cases
Balance skewed data
Well-formed
Data
2-17
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining Process: SEMMA
Sample
(Generate a representative
sample of the data)
Assess
Explore
(Evaluate the accuracy and
usefulness of the models)
(Visualization and basic
description of the data)
SEMMA
2-18
Model
Modify
(Use variety of statistical and
machine learning models )
(Select variables, transform
variable representations)
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining Process
Source: KDNuggets.com, August 2007
2-19
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Data Mining Methods:
Classification





2-20
Most frequently used DM method
Part of the machine-learning family
Employ supervised learning
Learn from past data, classify new data
The output variable is categorical
(nominal or ordinal) in nature
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Classification Techniques








2-21
Decision tree analysis
Statistical analysis
Neural networks
Support vector machines
Case-based reasoning
Bayesian classifiers
Genetic algorithms
Rough sets
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Decision Trees


A general
algorithm
for
decision
tree
building
Employs the divide and conquer method
Recursively divides a training set until each
division consists of examples from one class
1.
2.
3.
4.
2-22
Create a root node and assign all of the training
data to it
Select the best splitting attribute
Add a branch to the root node for each value of
the split. Split the data into mutually exclusive
subsets along the lines of the specific split
Repeat the steps 2 and 3 for each and every leaf
node until the stopping criteria is reached
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Decision Tree
2-23
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Cluster Analysis for Data Mining






2-24
Used for automatic identification of
natural groupings of things
Part of the machine-learning family
Employ unsupervised learning
Learns the clusters of things from past
data, then assigns new instances
There is not an output variable
Also known as segmentation
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Cluster Analysis for Data Mining

Clustering results may be used to





2-25
Identify natural groupings of customers
Identify rules for assigning new cases to
classes for targeting/diagnostic purposes
Provide characterization, definition,
labeling of populations
Decrease the size and complexity of
problems for other data mining methods
Identify outliers in a specific domain (e.g.,
rare-event detection)
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Cluster Analysis for Data Mining

Analysis methods





2-26
Statistical methods (including both
hierarchical and nonhierarchical), such as
k-means, k-modes, and so on
Neural networks (adaptive resonance
theory [ART], self-organizing map [SOM])
Fuzzy logic (e.g., fuzzy c-means algorithm)
Genetic algorithms
Divisive versus Agglomerative methods
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Cluster Analysis for Data Mining

k-Means Clustering Algorithm

k : pre-determined number of clusters

Algorithm (Step 0: determine value of k)
Step 1: Randomly generate k random points as
initial cluster centers
Step 2: Assign each point to the nearest cluster
center
Step 3: Re-compute the new cluster centers
Repetition step: Repeat steps 3 and 4 until some
convergence criterion is met (usually that the
assignment of points to clusters becomes stable)
2-27
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Cluster Analysis for Data Miningk-Means Clustering Algorithm
2-28
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Association Rule Mining







2-29
A very popular DM method in business
Finds interesting relationships (affinities)
between variables (items or events)
Part of machine learning family
Employs unsupervised learning
There is no output variable
Also known as market basket analysis
Often used as an example to describe DM to
ordinary people, such as the famous
“relationship between diapers and beers!”
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Association Rule Mining




Input: the simple point-of-sale transaction data
Output: Most frequent affinities among items
Example: according to the transaction data…
“Customer who bought a laptop computer and a virus
protection software, also bought extended service plan
70 percent of the time."
How do you use such a pattern/knowledge?



2-30
Put the items next to each other for ease of finding
Promote the items as a package (do not put one on sale if the
other(s) are on sale)
Place items far apart from each other so that the customer
has to walk the aisles to search for it, and by doing so
potentially seeing and buying other items
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Association Rule Mining

A representative applications of association
rule mining include


2-31
In business: cross-marketing, cross-selling, store
design, catalog design, e-commerce site design,
optimization of online advertising, product pricing,
and sales/promotion configuration
In medicine: relationships between symptoms and
illnesses; diagnosis and patient characteristics and
treatments (to be used in medical DSS); and genes
and their functions (to be used in genomics
projects)…
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Association Rule Mining

Are all association rules interesting and useful?
A Generic Rule: X  Y [S%, C%]
X, Y: products and/or services
X: Left-hand-side (LHS)
Y: Right-hand-side (RHS)
S: Support: how often X and Y go together
C: Confidence: how often Y go together with the X
Example: {Laptop Computer, Antivirus Software} 
{Extended Service Plan} [30%, 70%]
2-32
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Association Rule Mining

Apriori Algorithm


Finds subsets that are common to at least
a minimum number of the itemsets
uses a bottom-up approach



2-33
frequent subsets are extended one item at a
time (the size of frequent subsets increases
from one-item subsets to two-item subsets,
then three-item subsets, and so on), and
groups of candidates at each level are tested
against the data for minimum support
see the figure…
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Association Rule Mining

Apriori Algorithm
Raw Transaction Data
2-34
One-item Itemsets
Two-item Itemsets
Three-item Itemsets
Transaction
No
SKUs
(Item No)
Itemset
(SKUs)
Support
Itemset
(SKUs)
Support
Itemset
(SKUs)
Support
1
1, 2, 3, 4
1
3
1, 2
3
1, 2, 4
3
1
2, 3, 4
2
6
1, 3
2
2, 3, 4
3
1
2, 3
3
4
1, 4
3
1
1, 2, 4
4
5
2, 3
4
1
1, 2, 3, 4
2, 4
5
1
2, 4
3, 4
3
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Problem Decomposition –
Example
Transaction ID Items Bought
1
Shoes, Shirt, Jacket
2
Shoes,Jacket
3
Shoes, Jeans
4
Shirt, Sweatshirt
Frequent Itemset
{Shoes}
{Shirt}
{Jacket}
{Shoes, Jacket}
Support
75%
50%
50%
50%
For min support = 50% = 2 trans,
and min confidence = 50%
For the rule Shoes  Jacket
•Support = Sup({Shoes,Jacket)}=50%
50
•Confidence =
=66.6%
75
Jacket  Shoes has 50% support and 100% confidence
2-35
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
2-36
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
2-37
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Element of ANN



2-38
PROCESSING ELEMENTS (PE) The PE of an ANN are essentially
artificial neurons. Similar to biological neurons.
INFORMATION PROCESSING The inputs received by a neuron
go through a two-step process to turn into outputs: summation
function and transformation function
NETWORK STRUCTURE Each ANN is composed of a collection of
neurons (or PE) that are grouped into layers
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall