Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Association Rules Presented by: Panicker Anilkumar What is Data Mining?? • Search for valuable information in large volumes of data. • A step in knowledge discovery in databases. • It enables companies to focus on customer satisfaction, corporate profits, and determining the impact of various parameters on the sales. Association Rule • Association rules are used to show the relationships between data items. • Association rules detect common usage of data items. • E.g. The purchasing of one product when another product is purchased represents an association rule. Example 1 • Grocery store. • Association rules have most direct application in the retail businesses. • Association rules used to assist in marketing, advertising, floor placements and inventory control. • From the transaction history several association rules can be derived. • E.g. 100% of the time that PeanutButter is purchased, so is bread. • 33% of the time PeanutButter is purchased, Jelly is also purchased. Example 2 • A Telephone Company. • A telephone company must ensure that all calls are completed and in acceptable period of time. • In this environment, a potential data mining problem would be to predict a failure of a node. • This can be done by finding association rules of the type XFailure. • If these types of rules occur with a high confidence, Failures can be predicted. • Even though the support might be low because the X condition does not frequently occur. Association rule • Given a set of items I = {I1,I2,….Im} and a database of transactions D = {t1,t2,….tm} where ti = { Ii1,Ii2,….Iik} and IiJ € I , an association rule is an implication of the form X Y where X,Y C I are sets of items called itemsets and X∩Y =ø. • Support (s): The support (s) for an association rule XY is the percentage of transactions in the database that contain X U Y. E.g. If bread along with peanutbutter occurs in 60% of the total transactions, then the support for breadpeanutbutter is 60% • Confidence or Strength (α): The confidence or strength (α) for an association rule XY is the ratio of the number of transactions that contain X U Y to the number of transactions that contain X. Eg.if support for breadpeanutbutter is 60% and bread occurs in 80% of total transactions then confidence for breadpeanutbutter is 75%. Selecting Association rules • The selection of association rules is based on Support and Confidence. • Confidence measures the strength of the rule, Whereas support measures how often it should occur in the database. • Typically large confidence values and a smaller support are used. • Rules that satisfy both minimum support and minimum confidence are called strong rules. Association Rule Problem • Given a set of Items I = {I1,I2,….Im} and a database of transactions D = {t1,t2,….tn} where ti = { Ii1,Ii2,….Iik} and IiJ € I . The association rule problem is to identify all association rules XY with a minimum support and confidence. These values (s,α) are given as input to the problem. Large Itemsets • A Large Itemset / frequent Itemset is an itemset whose number of occurrences is above a threshold, s (Support) • Finding large Itemsets generally is quite easy but very costly. • The naive approach would be to count all itemsets that appear in any transaction. • Given a set of items of size m, there are 2m subsets. Ignoring the empty set we are still left with 2m – 1 subsets. • For e.g. In the retail store example if have set of items of size 5, i.e the store sells 5 products. Then the possible number of itemsets is 25 – 1 = 31. • If the 5 products sold are bread,peanutbutter,milk,beer and jelly. then the 31 possible itemsets are • • • • • • • • • • • • • • • • • Bread Peanutbutter Milk Beer Jelly Bread,peanutbutter Bread,milk Bread,beer Bread,jelly Peanutbutter,milk Peanutbutter,beer Peanutbutter,jelly Milk,beer Milk,jelly Beer, jelly Bread,peanutbutter,milk Bread, Peanutbutter, beer and so on. • For m = 30 the number of potential itemsets become 1073741823. • The challenge in solving an association problem is hence to efficiently determining all large itemsets. • Most association rule algorithms are based on smart ways to reduce the number of itemsets to be counted. Large Itemsets • The most common approach to finding association rules is to breakup the problem into two parts 1. Finding large Itemsets and 2. Generating rules from these itemsets. • Subset of any large itemset is also large. • Once the large Itemsets have been found, we know that any interesting association rule, XY ,must have X U Y in this set of frequent itemsets. • When all large itemsets are found, generating the association rules is straightforward. Apriori Algorithm • Apriori algorithm is the most well known association rule algorithm. • Apriori algorithm is used to efficiently discover large itemsets. • Apriori algorithm uses the property that any subset of a large itemset must be large. • Inputs: Itemsets, Database of transactions, support and the output is large itemsets. Apriori Algorithm Example ITEM SET SUPPORT 1,3,4 {1} 2 200 2,3,5 {2} 3 300 1,2,3,5 {3} 3 400 2,5 {4} 1 {5} 3 T.I.D. Items 100 Support threshold = 2 ITEM SET SUPPORT {1} 2 {2} 3 {3} 3 {5} 3 ITEM SET {1,2} {1,3} {1,5} {2,3} {2,5} {3,5} Threshold Support = 2 ITEM SET SUPPORT ITEM SET SUPPORT {1,2} 1 {1,3} 2 {1,3} 2 {2,3} 2 {1,5} 1 {2,3} 2 {2,5} 3 {2,5} 3 {3,5} 2 {3,5} 2 ITEM SET ITEM SET {2,3,5} {2,3,5} SUPPORT 2 References • Data Mining by Margaret Dunham. • Wikipedia Q&A …… Thanks..