Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Bahria University Department of Computer Sciences Advanced Databases Lab10: Discovering Association Rules in Data Mining Objectives To understand the concept of discovering Assocations Rules in Data Mining To practice discovering Association Rules using Support and Confidence concepts Association Rules in Data Mining Association rules. These rules correlate the presence of a set of items with another range of values for another set of variables. Examples: (1) When a female retail shopper buys a handbag, she is likely to buy shoes. (2) An X-ray image containing characteristics a and b is likely to also exhibit characteristic c. Example A common example is that of market-basket data containing sets of items a consumer buys in a supermarket during one visit. Consider four such transactions in a random sample shown in Figure 28.1. An association rule is of the form: X => Y where X = {x1, x2, ..., xn}, and Y = {y1, y2,..., ym} are sets of items with xi and yj being distinct items for all i and all j. This association states that if a customer buys X, he or she is also likely to buy Y. The set X Y is called an itemset, the set of items purchased by customers. For an association rule to be of interest to a data miner, the rule should satisfy some interest measure. Two common interest measures are support and confidence: The support for a rule X => Y refers to how frequently a specific itemset occurs in the database. That is, the support is the percentage of transactions that contain all of the items in the itemset X Y. If the support is low, it implies that there is no overwhelming evidence that items in X Y occur together because the itemset occurs in only a small fraction of transactions. BU, CS Department Advanced Databases 2/3 Lab10: Discovering Association Rules in Data Mining The confidence is with regard to the implication shown in the rule. The confidence of the rule X => Y is computed as the support(X Y)/support(X). We can think of it as the probability that the items in Y will be purchased given that the items in X are purchased by a customer. Another term for confidence is strength of the rule. As an example of support and confidence, consider the following two rules: milk => juice and bread => juice. Looking at our four sample transactions in Figure 28.1, we see that the support of {milk, juice} is 50 percent and the support of {bread, juice} is only 25 percent. The confidence of milk => juice is 66.7 percent (meaning that, of three transactions in which milk occurs, two contain juice) and the confidence of bread => juice is 50 percent (meaning that one of two transactions containing bread also contains juice). As we can see, support and confidence do not necessarily go hand in hand. The goal of mining association rules, then, is to generate all possible rules that exceed some minimum user-specified support and confidence thresholds. The sets of items that have a support that exceeds the threshold are called large (or frequent) itemsets. (Note that large here means large support). BU, CS Department Advanced Databases 3/3 Lab10: Discovering Association Rules in Data Mining Exercises Exercise 1 1. Use the following data to find out the answers to questions given below: Trans_id -------101 102 103 104 105 106 107 108 109 110 Items_purchased --------------milk, bread, eggs milk, juice juice, butter milk, bread, eggs coffee, eggs coffee coffee, juice milk, bread, cookies, eggs cookies, butter milk, bread The set of items is {milk, bread, cookies, eggs, butter, coffee, juice}. a) Find out the support for following rules: bread => eggs milk => eggs milk => bread cookies => milk b) Find out the confidence for following rules: bread => eggs milk => eggs milk => bread cookies => milk milk => cookies c) Show two rules that have a large itemset containing three items from the given data. Use the threshold value of 0.25 (25%). Also write the support for each rule: d) Show two rules that have a confidence of 0.7 (70%) or greater for an itemset containing three items from the given data. Also write the confidence for each rule: