Chapter 4
Data Mining: Association
Ms. Malak Bagais
[textbook]: Chapter 4
Association Rule Mining
• The data mining process of identifying associations from a dataset.
• Searches for relationships between items in a dataset.
• Also called market-basket analysis.
Example:
90% of people who purchase bread also purchase butter
Why?
• Analyze customer buying habits
• Helps retailers develop marketing strategies
• Helps inventory management
• Sales promotion strategies
Basic Concepts
• Support
• Confidence
• Itemset
• Strong rules
• Frequent Itemset
Support
• IF A ⇒ B
• Support (A ⇒ B) = (# of tuples containing both A and B) / (total # of tuples)
The support of an association pattern is the percentage of task-relevant data transactions for which the pattern is true.
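As a minimal sketch of the support calculation, the following Python function counts the transactions that contain every item of a pattern. The function name `support` and the five toy transactions are illustrative assumptions, not data from the course.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


# Hypothetical market-basket data: one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

# Bread and butter appear together in 3 of the 5 transactions.
print(support({"bread", "butter"}, transactions))  # 0.6
```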
Confidence
• IF A ⇒ B
• Confidence (A ⇒ B) = (# of tuples containing both A and B) / (total # of tuples containing A)
Confidence is a measure of the certainty or trustworthiness associated with each discovered pattern.
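The confidence ratio can be sketched in the same way. The function name `confidence` and the toy transactions below are assumptions made for illustration only.

```python
def confidence(antecedent, consequent, transactions):
    """Share of transactions containing A that also contain B, for a rule A => B."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if both <= t)
    # A rule whose antecedent never occurs has no evidence; report 0.0.
    return n_both / n_a if n_a else 0.0


# Hypothetical market-basket data: one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

# Bread occurs in 4 transactions, 3 of which also contain butter.
print(confidence({"bread"}, {"butter"}, transactions))  # 0.75
```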
Itemset
• A set of items is referred to as an itemset.
• An itemset containing k items is called a k-itemset.
• An itemset can also be seen as a conjunction of items (or a predicate).
Frequent Itemset
• Suppose min_sup is the minimum support threshold.
• An itemset satisfies minimum support if the occurrence frequency of the itemset is greater than or equal to min_sup.
• If an itemset satisfies minimum support, then it is a frequent itemset.
Strong Rules
Rules that satisfy both a minimum support
threshold and a minimum confidence threshold are
called strong.
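A rule can be tested against both thresholds with a few lines of Python. This is only a sketch: the names `is_strong`, `min_sup`, and `min_conf`, the threshold values, and the toy data are assumptions.

```python
def is_strong(antecedent, consequent, transactions, min_sup, min_conf):
    """True if the rule A => B meets both the support and confidence thresholds."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    sup = n_both / len(transactions)
    conf = n_both / n_a if n_a else 0.0
    return sup >= min_sup and conf >= min_conf


# Hypothetical market-basket data: one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

# bread => butter: support 0.6, confidence 0.75, so both thresholds are met.
print(is_strong({"bread"}, {"butter"}, transactions, min_sup=0.4, min_conf=0.7))
# butter => milk: support 0.4 is enough, but confidence 0.5 is too low.
print(is_strong({"butter"}, {"milk"}, transactions, min_sup=0.4, min_conf=0.7))
```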
Association Rules
Algorithms that obtain association rules from data usually divide the task into two parts:
• Find the frequent itemsets
• Form the rules from them: generate strong association rules from the frequent itemsets
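The second part, forming rules from frequent itemsets that have already been found, might look like the sketch below. It assumes the support counts of all frequent itemsets are available in a dictionary; the name `generate_rules` and the counts shown are hypothetical.

```python
from itertools import combinations


def generate_rules(support_counts, min_conf):
    """Form rules A => B from frequent itemsets.

    `support_counts` maps each frequent itemset (a frozenset) to its count.
    Every subset of a frequent itemset is itself frequent, so its count
    is expected to be present in the same dictionary.
    """
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for items in combinations(sorted(itemset), r):
                antecedent = frozenset(items)
                conf = count / support_counts[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical support counts for the frequent itemsets of a small dataset.
support_counts = {
    frozenset({"bread"}): 4,
    frozenset({"butter"}): 4,
    frozenset({"milk"}): 2,
    frozenset({"jam"}): 2,
    frozenset({"bread", "butter"}): 3,
    frozenset({"bread", "jam"}): 2,
    frozenset({"butter", "milk"}): 2,
}

for a, b, conf in generate_rules(support_counts, min_conf=0.7):
    print(sorted(a), "=>", sorted(b), conf)
```

Because the itemsets passed in are already frequent, every rule formed here meets the support threshold, and only the confidence test remains.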
Apriori algorithm
• Proposed by Agrawal and Srikant in 1994
• Also called the level-wise algorithm
• It is the most widely accepted algorithm for finding all the frequent itemsets
• It makes use of the downward closure property: every subset of a frequent itemset is also frequent
• The algorithm is a bottom-up search, progressing upward level-wise in the lattice
• Before reading the database at every level, it prunes many of the candidate sets, namely those that cannot be frequent because one of their subsets is not frequent.
Apriori Algorithm
• Uses a level-wise search, where k-itemsets are used to explore (k+1)-itemsets, to mine frequent itemsets from a transactional database for Boolean association rules.
• First, the set of frequent 1-itemsets is found. This set is denoted L1. L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found.
Apriori Algorithm steps
• The first pass of the algorithm simply counts item occurrences to determine the frequent 1-itemsets.
• A subsequent pass, say pass k, consists of two phases:
  • The frequent itemsets L(k-1) found in the (k-1)th pass are used to generate the candidate itemsets Ck, using the Apriori candidate generation function.
  • The database is scanned and the support of the candidates in Ck is counted.
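The passes described above can be put together in a compact sketch. This is an illustrative, unoptimized version rather than the reference implementation: the function name `apriori`, the use of an absolute `min_count` instead of a percentage, and the toy transactions are all assumptions.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frequent itemset: support count} using a level-wise search."""
    # Pass 1: count single items to find the frequent 1-itemsets (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Phase 1: build candidates Ck from L(k-1). A union of two
        # (k-1)-itemsets is kept only if it has exactly k items and
        # every one of its (k-1)-subsets is frequent.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(s) in prev for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Phase 2: scan the database and count each candidate's support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent


# Hypothetical market-basket data: one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

for itemset, count in apriori(transactions, min_count=2).items():
    print(sorted(itemset), count)
```

With this data the search stops at level 3: both possible 3-item candidates contain an infrequent pair, so they are discarded before the database is scanned again.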
Join Step
• Assume that we know the frequent itemsets of size k-1. Given a frequent k-itemset, we can immediately conclude that dropping either of two different items leaves two frequent (k-1)-itemsets.
• From another perspective, this can be seen as a possible way to construct k-itemsets. We take two (k-1)-itemsets which differ by only one item and take their union. This step is called the join step and is used to construct POTENTIAL frequent k-itemsets.
Join Algorithm
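A minimal sketch of the join step, assuming each itemset is stored as a sorted tuple and that two (k-1)-itemsets are joined when they share their first k-2 items. That prefix convention is a common way to avoid producing the same candidate twice; it and the name `join` are assumptions of this sketch.

```python
def join(prev_frequent):
    """Join frequent (k-1)-itemsets, given as sorted tuples, into k-item candidates."""
    prev = sorted(prev_frequent)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:-1] == b[:-1]:
                # Same first k-2 items: the union has exactly k items.
                candidates.append(a + (b[-1],))
            else:
                # The list is sorted, so no later tuple shares a's prefix.
                break
    return candidates


# Hypothetical frequent 2-itemsets (L2).
L2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
print(join(L2))  # [('A', 'B', 'C'), ('B', 'C', 'D')]
```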
Pruning Algorithm
Pruning Algorithm Pseudo code
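A minimal sketch of the pruning step, based on the downward closure property: a candidate k-itemset is discarded if any of its (k-1)-subsets is not frequent. The name `prune` and the sample itemsets are assumptions, and itemsets are again assumed to be sorted tuples.

```python
from itertools import combinations


def prune(candidates, prev_frequent):
    """Keep only candidates whose (k-1)-subsets are all frequent."""
    prev = set(prev_frequent)
    kept = []
    for c in candidates:
        # combinations() preserves order, so subsets of a sorted tuple
        # are sorted tuples and can be looked up directly.
        if all(s in prev for s in combinations(c, len(c) - 1)):
            kept.append(c)
    return kept


# Hypothetical frequent 2-itemsets (L2) and the 3-item candidates built from them.
L2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
C3 = [("A", "B", "C"), ("B", "C", "D")]

# ('B', 'C', 'D') is dropped because its subset ('C', 'D') is not in L2.
print(prune(C3, L2))  # [('A', 'B', 'C')]
```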
Example
• Tuples represent transactions (15 transactions)
• Columns represent items (9 items)
• Min-sup = 20%
• An itemset should be supported by at least 3 transactions
Source: http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput499/slides/Lect10/sld054.htm
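The threshold of 3 transactions follows from taking 20% of 15 transactions. A minimal sketch of that arithmetic, using a hypothetical helper named `min_support_count` and integer math to avoid rounding surprises:

```python
def min_support_count(min_sup_percent, n_transactions):
    """Smallest number of transactions that reaches the percentage threshold."""
    # Integer ceiling of (percent * n / 100).
    return (min_sup_percent * n_transactions + 99) // 100


print(min_support_count(20, 15))  # 3
```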