Chapter 4  Data Mining: Association Rules
Ms. Malak Bagais
[textbook]: Chapter 4

Association Rule Mining
* The data mining process of identifying associations from a dataset: it searches for relationships between items in the dataset.
* Also called market-basket analysis.
* Example: 90% of people who purchase bread also purchase butter.

Why?
* Analyze customer buying habits.
* Helps the retailer develop marketing strategies.
* Helps inventory management.
* Helps sale promotion strategies.

Basic Concepts
* Support
* Confidence
* Itemset
* Frequent itemset
* Strong rules

Support
* For a rule A => B:
  Support(A => B) = (# of tuples containing both A and B) / (total # of tuples)
* The support of an association pattern is the percentage of task-relevant data transactions for which the pattern is true.

Confidence
* For a rule A => B:
  Confidence(A => B) = (# of tuples containing both A and B) / (# of tuples containing A)
* Confidence is the measure of certainty or trustworthiness associated with each discovered pattern.

Itemset
* A set of items is referred to as an itemset.
* An itemset containing k items is called a k-itemset.
* An itemset can also be seen as a conjunction of items (or a predicate).

Frequent Itemset
* Suppose min_sup is the minimum support threshold.
* An itemset satisfies minimum support if its occurrence frequency is greater than or equal to min_sup.
* An itemset that satisfies minimum support is a frequent itemset.

Strong Rules
* Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong.
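The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not production code; the transaction list and item names are invented for the example.

```python
# Support and confidence for a rule A => B over a toy set of
# market-basket transactions (each transaction is a set of items).
# These transactions are invented purely for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def support(antecedent, consequent):
    """Fraction of ALL transactions containing both itemsets."""
    both = antecedent | consequent
    hits = sum(1 for t in transactions if both <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Fraction of transactions containing A that also contain B."""
    both = antecedent | consequent
    a_hits = sum(1 for t in transactions if antecedent <= t)
    both_hits = sum(1 for t in transactions if both <= t)
    return both_hits / a_hits

print(support({"bread"}, {"butter"}))     # 3 of 5 transactions -> 0.6
print(confidence({"bread"}, {"butter"}))  # 3 of 4 bread buyers -> 0.75
```

Note the different denominators: support divides by the total number of transactions, while confidence divides only by the transactions containing the antecedent.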
Association Rules
* Algorithms that obtain association rules from data usually divide the task into two parts:
* Find the frequent itemsets, and
* Form the rules from them: generate strong association rules from the frequent itemsets.

Apriori Algorithm
* Proposed by Agrawal and Srikant in 1994.
* Also called the level-wise algorithm.
* It is the most widely accepted algorithm for finding all the frequent itemsets.
* It makes use of the downward closure property.
* The algorithm is a bottom-up search, progressing upward level-wise in the lattice.
* Before reading the database at each level, it prunes many of the candidate sets that are unlikely to be frequent.
* Uses a level-wise search, where k-itemsets are used to explore (k+1)-itemsets, to mine frequent itemsets from a transactional database for Boolean association rules.
* First, the set of frequent 1-itemsets is found; this set is denoted L1. L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on.

Apriori Algorithm Steps
* The first pass of the algorithm simply counts item occurrences to determine the frequent 1-itemsets.
* A subsequent pass, say pass k, consists of two phases:
* The frequent itemsets Lk-1 found in the (k-1)th pass are used to generate the candidate itemsets Ck, using the Apriori candidate generation function.
* The database is scanned and the support of the candidates in Ck is counted.

Join Step
* Assume that we know the frequent itemsets of size k-1.
* Considering a frequent k-itemset, we can immediately conclude that by dropping two different items we obtain two frequent (k-1)-itemsets.
* From another perspective, this can be seen as a possible way to construct k-itemsets: we take two (k-1)-itemsets that differ in only one item and take their union.
* This step is called the join step and is used to construct POTENTIAL frequent k-itemsets.
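The level-wise search, join step, and downward-closure pruning described above can be sketched as follows. This is a compact teaching sketch, not an optimized implementation; it uses an absolute support count (min_count) rather than a percentage, and the toy transactions are invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise (Apriori) search for frequent itemsets.

    transactions: list of sets of items.
    min_count: absolute minimum support count.
    Returns a dict mapping each frequent itemset (frozenset) to its count.
    """
    # Pass 1: count single items to obtain L1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets that
        # differ in exactly one item (their union has size k).
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k:
                # Prune step (downward closure): every (k-1)-subset
                # of a candidate must itself be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Scan the database and count candidate supports.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

# Toy data (invented): with min_count = 3, only {bread}, {butter},
# and {bread, butter} survive the level-wise search.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
print(apriori(transactions, 3))
```

The loop terminates when a level produces no frequent itemsets, since the join step can then generate no further candidates.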
Join Algorithm
Pruning Algorithm
Pruning Algorithm Pseudocode

Example
* Tuples (rows) represent transactions (15 transactions).
* Columns represent items (9 items).
* min_sup = 20%, so an itemset must be supported by at least 3 transactions.
* Source: http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput499/slides/Lect10/sld054.htm
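The conversion in the example above, from a percentage threshold to an absolute transaction count, can be sketched as a small helper (the 20% / 15-transaction figures come from the example):

```python
def min_count(percent, n_transactions):
    """Smallest count c with c / n_transactions >= percent / 100.

    Uses integer ceiling division to avoid floating-point surprises
    (e.g. 0.2 * 15 evaluates to 3.0000000000000004 in binary floats).
    """
    return -(-percent * n_transactions // 100)

print(min_count(20, 15))  # 3: an itemset needs at least 3 of 15 transactions
print(min_count(20, 16))  # 4: 3/16 would be only 18.75%
```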