Chapter 4  Data Mining: Association Rules
Ms. Malak Bagais
[textbook]: Chapter 4

Association Rule Mining
* The data mining process of identifying associations from a dataset: it searches for relationships between items in the dataset.
* Also called market-basket analysis.
* Example: 90% of people who purchase bread also purchase butter.

Why?
* Analyze customer buying habits.
* Helps the retailer develop marketing strategies.
* Helps inventory management.
* Helps sale promotion strategies.

Basic Concepts
* Support
* Confidence
* Itemset
* Frequent itemset
* Strong rules

Support
* For a rule A => B:
  Support(A => B) = (# of tuples containing both A and B) / (total # of tuples)
* The support of an association pattern is the percentage of task-relevant data transactions for which the pattern is true.

Confidence
* For a rule A => B:
  Confidence(A => B) = (# of tuples containing both A and B) / (# of tuples containing A)
* Confidence is the measure of certainty or trustworthiness associated with each discovered pattern.

Itemset
* A set of items is referred to as an itemset.
* An itemset containing k items is called a k-itemset.
* An itemset can also be seen as a conjunction of items (or a predicate).

Frequent Itemset
* Suppose min_sup is the minimum support threshold.
* An itemset satisfies minimum support if its occurrence frequency is greater than or equal to min_sup.
* An itemset that satisfies minimum support is a frequent itemset.

Strong Rules
* Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong.
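The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not production code; the transaction list and item names are invented for the example.

```python
# Support and confidence for a rule A => B over a toy set of
# market-basket transactions (each transaction is a set of items).
# These transactions are invented purely for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def support(antecedent, consequent):
    """Fraction of ALL transactions containing both itemsets."""
    both = antecedent | consequent
    hits = sum(1 for t in transactions if both <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Fraction of transactions containing A that also contain B."""
    both = antecedent | consequent
    a_hits = sum(1 for t in transactions if antecedent <= t)
    both_hits = sum(1 for t in transactions if both <= t)
    return both_hits / a_hits

print(support({"bread"}, {"butter"}))     # 3 of 5 transactions -> 0.6
print(confidence({"bread"}, {"butter"}))  # 3 of 4 bread buyers -> 0.75
```

Note the different denominators: support divides by the total number of transactions, while confidence divides only by the transactions containing the antecedent.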
Association Rules
* Algorithms that obtain association rules from data usually divide the task into two parts:
* Find the frequent itemsets, and
* Form the rules from them: generate strong association rules from the frequent itemsets.

Apriori Algorithm
* Proposed by Agrawal and Srikant in 1994.
* Also called the level-wise algorithm.
* It is the most widely accepted algorithm for finding all the frequent itemsets.
* It makes use of the downward closure property.
* The algorithm is a bottom-up search, progressing upward level-wise in the lattice.
* Before reading the database at each level, it prunes many of the candidate sets that are unlikely to be frequent.
* Uses a level-wise search, where k-itemsets are used to explore (k+1)-itemsets, to mine frequent itemsets from a transactional database for Boolean association rules.
* First, the set of frequent 1-itemsets is found; this set is denoted L1. L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on.

Apriori Algorithm Steps
* The first pass of the algorithm simply counts item occurrences to determine the frequent 1-itemsets.
* A subsequent pass, say pass k, consists of two phases:
* The frequent itemsets Lk-1 found in the (k-1)th pass are used to generate the candidate itemsets Ck, using the Apriori candidate generation function.
* The database is scanned and the support of the candidates in Ck is counted.

Join Step
* Assume that we know the frequent itemsets of size k-1.
* Considering a frequent k-itemset, we can immediately conclude that by dropping two different items we obtain two frequent (k-1)-itemsets.
* From another perspective, this can be seen as a possible way to construct k-itemsets: we take two (k-1)-itemsets that differ in only one item and take their union.
* This step is called the join step and is used to construct POTENTIAL frequent k-itemsets.
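The level-wise search, join step, and downward-closure pruning described above can be sketched as follows. This is a compact teaching sketch, not an optimized implementation; it uses an absolute support count (min_count) rather than a percentage, and the toy transactions are invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise (Apriori) search for frequent itemsets.

    transactions: list of sets of items.
    min_count: absolute minimum support count.
    Returns a dict mapping each frequent itemset (frozenset) to its count.
    """
    # Pass 1: count single items to obtain L1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets that
        # differ in exactly one item (their union has size k).
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k:
                # Prune step (downward closure): every (k-1)-subset
                # of a candidate must itself be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Scan the database and count candidate supports.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

# Toy data (invented): with min_count = 3, only {bread}, {butter},
# and {bread, butter} survive the level-wise search.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
print(apriori(transactions, 3))
```

The loop terminates when a level produces no frequent itemsets, since the join step can then generate no further candidates.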
Join Algorithm
Pruning Algorithm
Pruning Algorithm Pseudocode

Example
* Tuples (rows) represent transactions (15 transactions).
* Columns represent items (9 items).
* min_sup = 20%, so an itemset must be supported by at least 3 transactions.
* Source: http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput499/slides/Lect10/sld054.htm
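The conversion in the example above, from a percentage threshold to an absolute transaction count, can be sketched as a small helper (the 20% / 15-transaction figures come from the example):

```python
def min_count(percent, n_transactions):
    """Smallest count c with c / n_transactions >= percent / 100.

    Uses integer ceiling division to avoid floating-point surprises
    (e.g. 0.2 * 15 evaluates to 3.0000000000000004 in binary floats).
    """
    return -(-percent * n_transactions // 100)

print(min_count(20, 15))  # 3: an itemset needs at least 3 of 15 transactions
print(min_count(20, 16))  # 4: 3/16 would be only 18.75%
```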