Database Management Systems: Data Mining
Market Baskets
Association Rules
Jerry Post
Copyright © 2003

Association/Market Basket Examples

What items are customers likely to buy together?
What Web pages are closely related?
Others?
Classic (early) example: Analysis of convenience store data showed customers often buy diapers and beer together.
Importance: Consider putting the two together to increase cross-selling.

Association Challenges

If an item is rarely purchased, any other item bought with it seems important. So combine items into categories.

Item        Freq.     Item            Freq.
1" nails    2%        Hardware        15%
2" nails    1%        Dim. Lumber     20%
3" nails    1%        Plywood         15%
4" nails    2%        Finish lumber   15%
                      Lumber          50%

Some relationships are obvious. Burger and fries.
Some relationships are meaningless. Hardware store found that toilet rings sell well only when a new store first opens. But what does it mean?

Association Measure: Confidence

Does A → B? If a customer purchases A, will they purchase B?

confidence(A → B) = (# baskets containing both A and B) / (# baskets containing A)

Association Measure: Support

Does the existing data support the rule? What percentage of baskets contain both A and B?

Support(A → B) = (# baskets containing both A and B) / (# baskets)

Association Measure: Lift

How does the association rule compare to the null hypothesis (the A item exists without the B item)? What is the likelihood of finding the second item (B) in any random basket?
Lift(A → B) = Support(A and B) / (Support(A) * Support(B))
            = P(A ∩ B) / (P(A) P(B))
            = P(B | A) / P(B)

Association Details (two items)

Rule evaluation (A implies B)
Support for the rule is measured by the percentage of all transactions containing both items: P(A ∩ B)
Confidence of the rule is measured by the transactions with A that also contain B: P(B | A)
Lift is the potential gain attributed to the rule, the effect compared to other baskets without the effect. If it is greater than 1, the effect is positive: P(A ∩ B) / (P(A) P(B)) = P(B | A) / P(B)

Example: Diapers implies Beer
P(D) = .7    P(B) = .5
Support: P(D ∩ B) = .6
Confidence: P(B | D) = P(D ∩ B) / P(D) = .6 / .7 = .857
Lift: P(B | D) / P(B) = .857 / .5 = 1.714

Example (Marakas)

Transaction data
1. Frozen pizza, cola, milk
2. Milk, potato chips
3. Cola, frozen pizza
4. Milk, pretzels
5. Cola, pretzels
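
The three measures can be checked against the five transactions above with a short script. This is only a minimal sketch: the function names and the choice of rule (frozen pizza implies cola) are assumptions made for illustration and do not come from the slides.

```python
# Illustrative sketch: support, confidence, and lift for a two-item rule.
# Function names and the example rule are assumptions, not from the slides.

# The five transactions from the Marakas example, one set per basket.
transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]


def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, a, b):
    """P(B | A): of the baskets containing A, the fraction that also contain B.

    Raises ZeroDivisionError if A never appears in any basket.
    """
    return support(baskets, {a, b}) / support(baskets, {a})


def lift(baskets, a, b):
    """P(B | A) / P(B): values above 1 suggest a positive association."""
    return confidence(baskets, a, b) / support(baskets, {b})


a, b = "frozen pizza", "cola"
print(f"support    = {support(transactions, {a, b}):.3f}")   # 2 of 5 baskets
print(f"confidence = {confidence(transactions, a, b):.3f}")  # 2 of 2 pizza baskets
print(f"lift       = {lift(transactions, a, b):.3f}")        # 1.0 / 0.6
```

For this rule the script gives a support of 0.400, a confidence of 1.000, and a lift of 1.667. With only five baskets these figures are arithmetic illustrations rather than evidence of a real buying pattern.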