Chap 6: Association Rules

Motivation
• Recent progress in data mining and data warehousing has made it possible to collect HUGE amounts of data.
• Example: supermarket transactions — barcode scanners and websites automatically record purchase data.
• These data reveal possible interactions among items: supermarket transaction data can expose consumer buying patterns!

Association Analysis
• Association analysis is a popular data mining technique aimed at discovering novel and interesting relationships between the data objects present in a database.
• It is used to estimate the probability that a person will purchase a product given that they own a particular product or group of products.
• "Market Basket Analysis" looks at transactions to see which products get bought together.

Association Rules
• Also known as "market-basket analysis".
• Aims to find regular behaviors — sets of products that are frequently bought together!
• Rule structure: {X ^ Y} ==> Z, where {X ^ Y} is the antecedent and Z is the consequent.
• Example: "if a customer bought milk and eggs, they often bought sugar too!"
  Association rule: (milk ^ eggs) ==> {sugar}

Apriori Algorithm
• A method to find frequent patterns, associations, and causal structures among sets of items.
• Main concept: frequent itemsets — itemsets that appear often and together with other items.
• How? Using support and confidence values. Given a rule (X ^ Y) ==> Z:
  • Support (S): the probability that a transaction contains all of the items X, Y, and Z. Measures how often the rule occurs in the database.
  • Confidence (C): the conditional probability that a transaction containing X and Y also contains Z.
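The support and confidence definitions above can be sketched directly in Python. The helper names and the small basket dataset below are my own illustration (the slide only gives the (milk ^ eggs) ==> {sugar} rule), not part of the chapter:

```python
# Sketch: support and confidence for a candidate association rule.
# The transactions below are hypothetical example data.

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): support of the union
    divided by support of the antecedent."""
    joint = set(antecedent) | set(consequent)
    return support(transactions, joint) / support(transactions, antecedent)

transactions = [
    {"milk", "eggs", "sugar"},
    {"milk", "eggs", "sugar"},
    {"milk", "bread"},
    {"eggs", "sugar"},
]

print(support(transactions, {"milk", "eggs", "sugar"}))       # 0.5
print(confidence(transactions, {"milk", "eggs"}, {"sugar"}))  # 1.0
```

In this toy data, both transactions containing milk and eggs also contain sugar, so the rule's confidence is 1.0 even though its support is only 0.5.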
  Sometimes known as "accuracy"; measures the strength of the rule.

Support and Confidence Concept
• Given a set of T transactions and a rule A ==> B:
  Support = (transactions that contain every item in A and B) / (total number of transactions)
  Confidence = (transactions that contain every item in A and B) / (transactions that contain the items in A)
• Support and confidence values range between 0 and 1.
• Only rules that exceed the minimum support will be generated.

Support and Confidence Concept
• Example: rule (A ^ B) ==> D

  ID  Items
  1   A, D, E
  2   A, B, C
  3   A, B, C, D
  4   A, B, E, C
  5   A, C, B, D

  Support    s(A, B, D) = 2/5 = 40%
  Confidence c = s(A, B, D) / s(A, B) = 2/4 = 50%

Illustrating Apriori Algorithm Principles
• Collect single-item counts, then build and evaluate the candidate k-itemsets level by level until finished.

  ID  Items
  1   Bread, Milk
  2   Cheese, Diaper, Bread, Eggs
  3   Cheese, Coke, Diaper, Milk
  4   Cheese, Bread, Diaper, Milk
  5   Coke, Bread, Diaper, Milk

• Given minimum support (count), s = 3.

Illustrating Apriori Algorithm Principles (cont.)
• Count the itemsets (with s = 3):

  1-itemsets   Count        2-itemsets       Count
  Bread        4            Bread, Milk      3
  Coke         2            Bread, Cheese    2
  Cheese       3            Bread, Diaper    3
  Milk         4            Milk, Cheese     2
  Diaper       4            Milk, Diaper     3
  Eggs         1            Cheese, Diaper   3

• Prune COKE and EGGS because their counts are < 3; keep only itemsets with count >= 3 at each level.

Illustrating Apriori Algorithm Principles (cont.)
• Support and confidence (with s = 3):

  Rule               Confidence (%)  Support (%)  Lift  Transaction Count
  Milk ==> Diaper    75              60           0.9   3
  Diaper ==> Milk    75              60           0.9   3
  Milk ==> Bread     75              60           0.9   3
  Bread ==> Milk     75              60           0.9   3
  Diaper ==> Cheese  75              60           1.3   3
  Cheese ==> Diaper  100             60           1.3   3
  Diaper ==> Bread   75              60           0.9   3
  Bread ==> Diaper   75              60           0.9   3

Interpreting Support and Confidence
• Confidence measures the strength of a rule, whereas support measures how often it occurs in the database.
• For example, look at Diaper ==> Cheese. A confidence of 75% indicates that this rule holds 75% of the time it could: 3 out of 4 times that Diaper occurs, so does Cheese.
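The level-wise search above can be sketched with the standard library alone. The transactions and the minimum support count of 3 are taken from the slide; the function names are my own, and this sketch only goes up to 2-itemsets, as the slide does:

```python
# Sketch of the Apriori level-wise search on the slide's five grocery
# transactions, with minimum support count 3.
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Cheese", "Diaper", "Bread", "Eggs"},
    {"Cheese", "Coke", "Diaper", "Milk"},
    {"Cheese", "Bread", "Diaper", "Milk"},
    {"Coke", "Bread", "Diaper", "Milk"},
]
MIN_COUNT = 3

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Level 1: frequent single items (Coke and Eggs are pruned, count < 3).
items = {i for t in transactions for i in t}
L1 = {frozenset([i]) for i in items if count(frozenset([i])) >= MIN_COUNT}

# Level 2: candidates built only from frequent 1-itemsets, then pruned.
candidates = {a | b for a, b in combinations(L1, 2)}
L2 = {c for c in candidates if count(c) >= MIN_COUNT}

for s in sorted(L2, key=sorted):
    print(sorted(s), count(s))
```

Running this reproduces the slide's tables: L1 is {Bread, Cheese, Milk, Diaper}, and L2 keeps {Bread, Milk}, {Bread, Diaper}, {Milk, Diaper}, and {Cheese, Diaper}, each with count 3, while {Bread, Cheese} and {Milk, Cheese} are pruned.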
• The support value of 60% indicates that this rule appears in 60% of all transactions.

Example (Association Rules)
• Transactions: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}

  Rule          Support  Confidence
  A ==> D       2/5      2/3
  C ==> A       2/5      2/4
  A ==> C       2/5      2/3
  B & C ==> D   1/5      1/3

• Implication?

  Saving \ Checking   No      Yes     Total
  No                  500     3,500   4,000
  Yes                 1,000   5,000   6,000
  Total                               10,000

  Support(SVG ==> CK) = 5,000 / 10,000 = 50%
  Confidence(SVG ==> CK) = 5,000 / 6,000 = 83%
  Lift(SVG ==> CK) = 0.83 / 0.85 < 1

Apriori Algorithm Principles (cont.)
• Lift is equal to the confidence factor divided by the expected confidence.
• Lift is the factor by which the likelihood of the consequent increases given the antecedent.
• Expected confidence is equal to the number of consequent transactions divided by the total number of transactions.
• A credible rule has a large confidence factor, a large level of support, and a lift greater than 1.
• Rules having a high level of confidence but little support should be interpreted with caution.