3/24/2011
Data Mining
By Susan Miertschin
1
Data Mining – Task Types
Data mining is useful for certain types of tasks
As new algorithms are developed and refined, new task types or
extensions of existing task types may emerge
2
Data Mining - Task Types
 Classification
 Clustering
 Discovering Association Rules
 Discovering Sequential Patterns – Sequence Analysis
 Regression
 Detecting Deviations from Normal
3
Prediction versus Description
 The purpose of some tasks is to describe the status quo
 The purpose of some tasks is to be able to predict something
based on something else
4
Supervised versus Unsupervised
 The purpose of some tasks is to describe the status quo
 Techniques in this category are referred to as unsupervised
 The purpose of some tasks is to be able to predict something
based on something else
 Techniques in this category are referred to as supervised
5
Description and Examples
Data Mining - Task Types
6
Data Mining - Task Types
 Classification
 Fit items into slots – Larson
 Assign items in a collection to target categories or classes –
Oracle book (http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/classify.htm)
 Clustering
 Discovering Association Rules
 Discovering Sequential Patterns – Sequence Analysis
 Regression
 Detecting Deviations from Normal
7
Classification
 Here is data about loan applicants. Which are good risks and
which are poor risks?
 Which of our current customers are likely to increase their
current rate of spending if given an affinity card?
 Which consumers are likely to buy a new cell phone product
if I send them a direct mailing?
8
Classification
Predictive or Descriptive?
Supervised or Unsupervised?
9
Classification
Predictive
Supervised
10
Classification – Predictive - Supervised
 Predict which loan applicants are good risks and which are
poor risks
 Predict which of our current customers are likely to increase
their current rate of spending if given an affinity card
 Predict which consumers are likely to buy a new cell phone
product if I send them a direct mailing
 Predict which patients will respond better to treatment A or
treatment B
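The loan-risk question above can be sketched in a few lines. This is only an illustration under stated assumptions: the labeled applicants, the two features, and the `classify` helper are hypothetical, and 1-nearest neighbor stands in for whatever classifier a real tool would use. The point is that the known outcomes ("good" / "poor") supervise the prediction for a new applicant.

```python
import math

# Hypothetical labeled history: (annual income in $1000s, debt-to-income %) -> known outcome.
# All values are invented for illustration.
LABELED_APPLICANTS = [
    ((85.0, 10.0), "good"),
    ((62.0, 20.0), "good"),
    ((30.0, 65.0), "poor"),
    ((24.0, 80.0), "poor"),
]


def classify(applicant, labeled=LABELED_APPLICANTS):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    _, label = min(labeled, key=lambda pair: math.dist(pair[0], applicant))
    return label


print(classify((70.0, 15.0)))  # nearest labeled applicant is (62.0, 20.0)
print(classify((28.0, 70.0)))  # nearest labeled applicant is (30.0, 65.0)
```

Because the output is one of a fixed set of categories, this is classification rather than regression.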
11
Data Mining - Task Types
 Classification
 Clustering
 Divide data into groups with similar characteristics - Larson
 Find clusters of data objects similar in some way to one another
– Oracle book
(http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/clustering.htm)
 Discovering Association Rules
 Discovering Sequential Patterns – Sequence Analysis
 Regression
 Detecting Deviations from Normal
12
Clustering
 Find customers similar to each other based on geographical
distance to nearest store-front location, number of small
dogs owned, number of cats owned, and number of children
in household
 Purpose? Target niche markets, plan new stores
 Find cardiologists who are similar with respect to likelihood
of prescribing a certain class of medication for treatment of
congestive heart failure (based on hospital patient records)
and patient mix demographics
 Purpose? Target these cardiologists for a particular marketing
effort related to a new pharmaceutical product
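The customer-grouping example can be sketched with a plain k-means loop. Everything here is an assumption made for illustration: the customer records are invented, k is fixed at 2, and the centroids are seeded with the first k points (real tools usually seed more carefully and scale the features first). Note that no labels are supplied; the groups come from the data alone.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means. Returns (centroids, assignments); seeds with the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


# Hypothetical customers: (miles to nearest store, small dogs, cats, children)
customers = [
    (1.0, 2, 0, 0),
    (12.0, 0, 1, 3),
    (1.5, 1, 0, 0),
    (11.0, 0, 2, 2),
    (0.8, 2, 1, 0),
    (13.5, 0, 1, 4),
]
centroids, labels = kmeans(customers, k=2)
print(labels)  # cluster index assigned to each customer
```

With this data the nearby dog owners and the distant families with children end up in separate clusters, which is the kind of niche a marketer might then target.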
13
Clustering
Predictive or Descriptive?
Supervised or Unsupervised?
14
Clustering
Descriptive
Unsupervised
15
Data Mining - Task Types
 Classification
 Clustering
 Discovering Association Rules
 Find patterns in group membership - Larson
 Finding the probability of the co-occurrence of items in a collection
– Oracle book (http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/market_basket.htm)
 Produce dependency rules which will predict occurrence of an
item or event based on occurrences of other items
 Discovering Sequential Patterns – Sequence Analysis
 Regression
 Detecting Deviations from Normal
16
Association
 Also called market basket analysis
 Customers who bought this book also bought this other book
in the same purchase at what rate?
 Patients who were treated with drug X developed side effect
B at a particular rate. What else did the side effect B people
have in common?
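The "at what rate?" question is usually answered with two numbers, support and confidence. The sketch below assumes a handful of invented baskets and hypothetical helper names (`count`, `support`, `confidence`); it is a direct count, not the rule-mining algorithm any particular product uses.

```python
# Hypothetical purchase baskets; each set is one transaction.
baskets = [
    {"book_a", "book_b"},
    {"book_a", "book_b", "book_c"},
    {"book_a"},
    {"book_b", "book_c"},
    {"book_a", "book_b"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)


def support(itemset, transactions):
    """Fraction of all transactions that contain itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing antecedent, the fraction also containing consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


print(support({"book_a", "book_b"}, baskets))       # share of baskets with both books
print(confidence({"book_a"}, {"book_b"}, baskets))  # rate at which book_a buyers also took book_b
```

Here three of the five baskets hold both books, and three of the four baskets with book_a also hold book_b.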
17
Association
Predictive or Descriptive?
Supervised or Unsupervised?
18
Association
Predictive
Supervised
19
Association – Predictive - Supervised
 Predict which second book a customer choosing a first book
might like
 Predict which patients undergoing treatment with drug X
will develop side effect B
20
Data Mining - Task Types
 Classification
 Clustering
 Discovering Association Rules
 Discovering Sequential Patterns – Sequence Analysis
 Predict future routes based on past routes – Larson
 Given a set of objects, each associated with its
own timeline of events, find rules that predict sequential
dependencies among different events – Swaroop and Golden
(http://www.rhsmith.umd.edu/faculty/bgolden/classes_links/2009_jan_data%20mining_BUDT%20758.pdf)
 Regression
 Detecting Deviations from Normal
21
Sequence Analysis
 A user is on web page A, what page is the user most likely to
navigate to next?
 A customer buys a Kindle. What is the customer most likely
to do next?
 Does the sequential pattern of events in an event log tell us
anything about server outages? Network outages?
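The "what page next?" question can be sketched by counting observed page-to-page transitions. The sessions below are invented, the function names are hypothetical, and looking only one step back is a simplifying assumption; real sequence analysis tools may use longer histories or cluster whole sequences.

```python
from collections import Counter, defaultdict

# Hypothetical click-stream sessions, one list of page names per visit.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "C"],
    ["B", "C"],
]


def build_transitions(sequences):
    """Count how often each page is immediately followed by each other page."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            transitions[current][following] += 1
    return transitions


def most_likely_next(page, transitions):
    """Return the most frequently observed successor of page, or None if none was seen."""
    if page not in transitions:
        return None
    return transitions[page].most_common(1)[0][0]


transitions = build_transitions(sessions)
print(most_likely_next("A", transitions))  # A was followed by B twice and by C once
```

The transition table itself describes past behavior; using it to guess the next page is the predictive use of the same counts.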
22
Sequence Analysis
Predictive or Descriptive?
Supervised or Unsupervised?
23
Sequence Analysis
Predictive and Descriptive methods
Supervised and Unsupervised methods
24
Sequence Analysis – Predictive and/or Descriptive
25
Data Mining - Task Types
 Classification
 Clustering
 Discovering Association Rules
 Discovering Sequential Patterns – Sequence Analysis
 Regression
 Predict the value of a continuous variable (as opposed to a
categorical variable) – Larson
 Regression predicts a number – Oracle book
(http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/regress.htm)
 Detecting Deviations from Normal
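"Regression predicts a number" can be illustrated with a one-variable least-squares line. The income and spending figures are invented and the `fit_line` helper is hypothetical; this is a sketch of simple linear regression, not the method of any specific tool.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical customers: monthly income ($1000s) and monthly card spending ($).
incomes = [2.0, 3.0, 4.0, 5.0]
spending = [500.0, 700.0, 900.0, 1100.0]

slope, intercept = fit_line(incomes, spending)
predicted = slope * 6.0 + intercept  # predicted spending for a $6000/month income
print(slope, intercept, predicted)
```

The output is a continuous value (an amount of spending), in contrast to classification, where the output is a category such as "good risk" or "poor risk".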
26
Regression
 Here is data about loan applicants. Which are good risks and
which are poor risks?
 Which of our current customers are likely to increase their
current rate of spending if given an affinity card?
 Which consumers are likely to buy a new cell phone product
if I send them a direct mailing?
27
Regression
Predictive or Descriptive?
Supervised or Unsupervised?
28
Regression
Predictive
Supervised
29
Regression – Predictive - Supervised
 Predict which loan applicants are good risks and which are
poor risks
 Predict which of our current customers are likely to increase
their current rate of spending if given an affinity card
 Predict which consumers are likely to buy a new cell phone
product if I send them a direct mailing
 Predict which patients will respond better to treatment A or
treatment B
30
Data Mining - Task Types
 Classification
 Clustering
 Discovering Association Rules
 Discovering Sequential Patterns – Sequence Analysis
 Regression
 Detecting Deviations from Normal – Anomaly Detection
 Identify cases that are unusual within homogeneous data –
Oracle book (http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/anomalies.htm)
31
Detecting Deviations from Normal –
Anomaly Detection
 Is there anything unusual about this pattern of credit card
charges?
 Is there anything unusual about this event log that would
indicate an unauthorized intrusion?
 Is there any pattern here that indicates an unusual pattern of
accidents and treatments?
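The credit card question can be sketched as "how far is each charge from what is typical for this card?". The sketch below assumes invented charge amounts and a hypothetical `flag_anomalies` helper, and it uses a median-based modified z-score with the commonly quoted 0.6745 constant and 3.5 cutoff as rule-of-thumb assumptions. It needs no labeled fraud cases; a classifier trained on known fraud would be a different technique.

```python
import statistics


def flag_anomalies(values, threshold=3.5):
    """Return values whose modified z-score (median/MAD based) exceeds threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread around the median; this sketch flags nothing
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


# Hypothetical charges on one card, in dollars.
charges = [25.0, 30.0, 28.0, 32.0, 27.0, 29.0, 950.0]
print(flag_anomalies(charges))  # charges far from the typical amount
```

The median and the median absolute deviation are used instead of the mean and standard deviation so that a single extreme charge does not distort the baseline it is compared against.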
32
Detecting Deviations from Normal
– Anomaly Detection
Predictive or Descriptive?
Supervised or Unsupervised?
33
Detecting Deviations from Normal
– Anomaly Detection
Predictive
Supervised
34
Anomaly Detection
– Predictive - Supervised
 This pattern of credit card charges implies a stolen credit
card
 This pattern of events in the event log indicates an
unauthorized intrusion
 This pattern of accidents and treatments indicates the
likelihood of insurance fraud
35
Data Mining – Algorithms
Different algorithms are available for different data mining tasks
Different tools exist that implement different algorithms and
different versions of algorithms
36
Algorithms Available in Analysis
Services
 Decision Trees
 Linear Regression
 Naïve Bayes
 Clustering Algorithms
 Association Rules
 Sequence Clustering
 Time Series Analysis
 Neural Networks
 Logistic Regression
37
Data Mining
By Susan Miertschin
38