Download In Class Exercise Da..

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
MBA 8473 DATA MINING – IN CLASS WORKSHEET
PROBLEM DOMAIN – MARKET BASKET ANALYSIS
GOAL:
1. To understand data processing for data mining. The example that we will use
involves converting Point of Sales data into a useful “count” form so that it can be
used for further data mining.
2. To learn to locate patterns that can be expressed as rules – by simple inspection or
manual computation.
3. To understand the support level for a rule by the data set.
4. To compute the significance of a rule.
5. To learn to do above 1,2,3,4 for a larger number of data points where we need the
help of DM tools.
DO:
1. Use the Tables below. Fill in the co-incidence matrix in Table 2 by counting data
from Table 1. You are looking for how many times one item (e.g. orange juice)
occurred with another item (e.g. soda).
2. To locate patterns, look for relative frequency of co-incidence of items. Mostly
extreme frequencies (high or low) are the places to start with, and see what can
interpreted from them. Start writing them down as a rule. (While doing this
ignore the diagonal frequencies).
An example pattern can be:
Orange juice and soda are more likely to be purchased together than any other two
items.
Expressed as a rule:
If a customer purchases soda, then the customer also purchases orange juice.
3. How well is this rule supported by the data? (i.e. in how many transactions from
the total number of transactions this happens). Generally expressed a percentage.
4. How about the confidence in the rule? Confidence is defined as the ratio of the
number of the transactions supporting the rule to the number of transactions
where the conditional part (i.e. the IF part) of the rule holds. What’s the
confidence in the following rules:
CONFIDENCE
If soda, then orange juice.
If orange juice, then soda.
Additional discussion:
The above rules are, in general, of the form IF A THEN B.
One can also look for, IF A THEN NOT B type rules.
How do we find them? Start by looking at zero frequencies in the co-incidence matrix.
Did you find any?
As a rule: If a customer purchases milk, then the customer does not purchase soda.
Then there are rules that have the form:
IF A and B, then C
IF A and C, Then B
If B and C, Then A.
For example, IF diapers AND Thursday, then beer
Table 1: Grocery Point-of-Sale Transactions
Customer
Items
1
Orange juice, soda
2
Milk, orange juice, window cleaner
3
Milk, detergent
4
Orange juice, detergent, soda
5
Window cleaner, soda
Table 2: Co-Occurrence of Products
Orange Juice
Orange Juice
Window Cleaner
Milk
Soda
Detergent
Window Cleaner
Milk
Soda
Detergent