Association Rule.

Association rule mining is an important data mining model studied extensively by the database and data mining community. It assumes all data are categorical. It was initially used for Market Basket Analysis, to find how items purchased by customers are related.

Transaction data: supermarket data
Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}
… …
tn: {biscuit, eggs, milk}

Concepts:
• An item: an item/article in a basket
• I: the set of all items sold in the store
• A transaction: the items purchased in a basket; it may have a TID (transaction ID)
• A transactional dataset: a set of transactions

The model: rules
A transaction t contains X, a set of items (itemset) in I, if X ⊆ t.
An association rule is an implication of the form X → Y, where X, Y ⊂ I, and X ∩ Y = ∅.
An itemset is a set of items.
• E.g., X = {milk, bread, cereal} is an itemset.
A k-itemset is an itemset with k items.
• E.g., {milk, bread, cereal} is a 3-itemset.

Rule strength measures
Support: The rule holds with support sup in T (the transaction data set) if sup% of transactions contain X ∪ Y.
• sup = Pr(X ∪ Y) = count(X ∪ Y) / total number of transactions.
Confidence: The rule holds in T with confidence conf if conf% of the transactions that contain X also contain Y.
• conf = Pr(Y | X) = support(X ∪ Y) / support(X).
An association rule is a pattern that states that when X occurs, Y occurs with a certain probability.

Goal of Association Rule.

Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).

An Example.

Transaction data:
• t1: Beef, Chicken, Milk
• t2: Beef, Cheese
• t3: Cheese, Boots
• t4: Beef, Chicken, Cheese
• t5: Beef, Chicken, Clothes, Cheese, Milk
• t6: Chicken, Clothes, Milk
• t7: Chicken, Milk, Clothes

Assume: minsup = 30%, minconf = 80%
An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7]
Association rules from the itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
… …
Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]

Data set.
This data set relates to the retail industry. It contains the information for each transaction, along with the transaction IDs. Each row represents a single transaction, i.e. the purchases of a single customer. For example, a row such as {Bread sandwich, Milk, Egg, Butter} means that the customer bought those items in a single transaction.

Objective.

Our main objective is to find buying patterns in this large database. Discovering such association rules can help in developing marketing strategies by giving insight into which items customers frequently purchase together. We used the following parameters:
• minsup = 0.08
• minconf = 0.40
• mincorr = 0.30

Analysis.

The spreadsheet shows the frequent itemsets with their support values. From the table it is clear that Fluid milk has the highest frequency, followed by Bananas, Salad vegetables, Eggs, etc. These are the items customers most often put in their baskets.

The fifth rule has the highest confidence value, 58.83424%, which means that about 59% of customers who buy Eggs also buy Fluid milk. Similarly, 54% of customers who buy Tomatoes also buy Salad vegetables, and 52% of customers who buy Bread sandwiches also buy Fluid milk.

Rule Graph.

This graph represents all of the association rules at once, which makes it possible to take in the whole result in a single snapshot. In this graph, the support values for the Body and Head portions of each association rule are indicated by the sizes and colors of the circles. The thickness of each line indicates the confidence value (the conditional probability of Head given Body) for the respective association rule. The sizes and colors of the circles in the center, above the Implies label, indicate the joint support (for the co-occurrences) of the Body and Head components of the respective association rules.
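The four quantities the graph encodes (Body support, Head support, joint support, and confidence) can be computed directly from transaction data. The following is a minimal sketch in Python, not the software used to produce the graph. Because the retail data set is not reproduced here, it uses the seven example transactions t1 to t7 given earlier. The function names `support`, `rule_measures`, and `rules_from_itemset` are illustrative choices, and the rule enumeration is a plain brute-force split of one itemset rather than a full mining algorithm.

```python
from itertools import combinations

# The seven example transactions t1..t7 from the earlier example
# (not the retail data set, which is not reproduced in this document).
TRANSACTIONS = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_measures(body, head, transactions=TRANSACTIONS):
    """Return body, head, and joint support plus confidence for Body -> Head."""
    body, head = frozenset(body), frozenset(head)
    body_sup = support(body, transactions)
    joint = support(body | head, transactions)
    return {
        "body_support": body_sup,
        "head_support": support(head, transactions),
        "joint_support": joint,
        # conf = support(X ∪ Y) / support(X); defined as 0 if X never occurs
        "confidence": joint / body_sup if body_sup else 0.0,
    }


def rules_from_itemset(itemset, minsup, minconf, transactions=TRANSACTIONS):
    """Try every Body -> Head split of `itemset`; keep those meeting both thresholds."""
    itemset = frozenset(itemset)
    found = []
    for size in range(1, len(itemset)):
        for body in combinations(sorted(itemset), size):
            head = itemset - set(body)
            m = rule_measures(body, head, transactions)
            if m["joint_support"] >= minsup and m["confidence"] >= minconf:
                found.append((set(body), set(head), m))
    return found


if __name__ == "__main__":
    rules = rules_from_itemset({"Chicken", "Clothes", "Milk"}, 0.30, 0.80)
    for body, head, m in rules:
        print(sorted(body), "->", sorted(head), m)
```

With minsup = 30% and minconf = 80%, the three splits whose Body contains Clothes pass, each with joint support 3/7 and confidence 3/3, which agrees with the two rules listed in the example. The other three splits are rejected because their confidence is 3/5 or 3/4.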
In the graphical summary, the strongest support values were found for Fluid milk in association with Bananas, Bread sandwiches, and Eggs. The graph also makes it clear that the rule linking Fluid milk and Eggs has the highest confidence value (its line is the thickest).

3D Rule Graph.

The above graph is the 3D version of the earlier graph. It shows that Fluid milk and Eggs have the highest confidence value of any pair of items.

Conclusion.

According to the rules, Fluid milk, Bananas, Bread sandwiches, Eggs, Salad vegetables, Grapes, and Fruit juice are the items customers most frequently put in their baskets. The rules also suggest that more than 50% of customers who buy Eggs or Bread sandwiches also buy Fluid milk. All of this information can be used to build better marketing strategies. For example, a retailer can place frequently bought items close to each other in the supermarket so that customers can find them easily. New products related to these items can also be placed nearby to attract customers.

Thank You.

Krishnendu Kundu (Statistician)
StatSoft India
Email: [email protected]
Mobile: +919873119520