International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 3, Issue 6, June 2013)

A Study on Market Basket Analysis Using a Data Mining Algorithm

Phani Prasad J1, Murlidher Mourya2
1,2 Vardhaman Engineering College, Hyderabad, India

Abstract: Association rule mining is a powerful tool in data mining today. It identifies correlations between items in large databases. A typical example of association rule mining is Market Basket Analysis. This approach examines the buying habits of customers by identifying associations among the items they place in their baskets. Identifying the items that customers frequently purchase together helps to increase the sales of a particular product. This paper mainly focuses on the study of an existing data mining algorithm for market basket data.

Keywords: Association Rule Mining, Apriori Algorithm, Market Basket Analysis.

I. INTRODUCTION

Mining association rules is one of the most important and powerful aspects of data mining. One of the main goals of association rule mining (ARM) is to find the relationships among the various items in a database. An association rule is of the form A→B, where A is the antecedent and B is the consequent. Here A and B are item sets, and the rule states that customers who purchase A are likely to purchase B with probability C%, where C is known as the confidence. An example of such a rule is: "seventy per cent of people who purchase beer will also purchase diapers".

Such rules help shop managers to study the buying habits of customers in order to increase sales. Based on this study, items that customers regularly purchase together are placed in close proximity. For example, persons who purchase milk are also likely to purchase bread.

Interestingness measures such as support and confidence play a vital role in association analysis. The support is the percentage of transactions that contain all the items in the rule and is given by

Support = (# of transactions involving A and B) / (total number of transactions).

The other measure is confidence, the percentage of transactions that contain B given that they contain A:

Confidence = Probability (B if A) = P(B|A)
Confidence = (# of transactions involving A and B) / (total number of transactions that have A).

Consider the following example:

Customer   Item Purchased   Item Purchased
1          Burger           Coke
2          Puff             Mineral water
3          Burger           Mineral water
4          Puff             Tea

If A is "purchased Burger" and B is "purchased Mineral water", then only customer 3 bought both items, while customers 1 and 3 bought a Burger, so

Support = P(A and B) = 1/4
Confidence = P(B|A) = 1/2

Rules that satisfy both a minimum support and a minimum confidence are called strong association rules.

II. LITERATURE SURVEY

Trnka [1] describes how Market Basket Analysis can be implemented within the Six Sigma methodology. Data mining methods provide many opportunities in the market sector, and Market Basket Analysis is one of them. Six Sigma uses several statistical methods. By implementing Market Basket Analysis (as a part of data mining) in one of the phases of Six Sigma, the results can be improved and the Sigma performance level of the process changed. The author used the GRI (General Rule Induction) algorithm to produce association rules between products in the market basket analysis. These associations show the variety among the products. A web plot was used to show the dependence between the products.

Yanthy et al. [2] note that an important goal of data mining is to reveal hidden knowledge from data, and that various algorithms have been proposed for this purpose. The problem is that typically not all rules are interesting: only a small fraction of the generated rules is of interest to any given user. Hence, numerous measures such as confidence, support, lift and information gain have been proposed to determine the best or most interesting rules. However, some algorithms are good at generating rules that score highly on one interestingness measure but poorly on others, and the relationship between the algorithms and the interestingness measures of the generated rules is not yet clear. The authors studied this relationship using synthetic data, so that the results obtained are not limited to specific cases.

Chiu et al. [3] proposed a principal component model for market basket analysis. Market basket analysis is a well-known business problem that can be (partially) solved computationally using association rules mined from transaction data to maximize cross-selling effects. The authors model market basket analysis as a finite mixture density of human consumption behavior according to social and cultural events. This leads to the use of principal component analysis (PCA), and possibly mixture density analysis, of transaction data, which was not apparent before. They compare PCA with association rules mined from a set of benchmark transaction data to explore the similarities and differences between these two data exploration tools.

Cunningham et al. [4] present a model for library circulation data. They apply the Apriori market basket tool to the task of detecting subject classification categories that co-occur in the transaction records of books borrowed from a university library. This information can be useful in directing users to additional portions of the collection that may contain documents relevant to their information need, and in determining a library's physical layout. The results can also provide insight into the degree of "scatter" that the classification scheme induces in a particular collection of documents.

Vo et al. [5] present an application of lattices to mining traditional association rules, which greatly reduces the time needed for mining rules. Their method has two phases: (1) building a lattice of frequent item sets and (2) mining association rules from the lattice. The parent-child relationships in the lattice are used to discover the association rules quickly. Their experiments show that mining rules from the lattice is more efficient than mining directly from frequent item sets using a hash table.

Rastogi et al. [6] present a contribution on optimized association rules. Optimized association rules are permitted to contain uninstantiated attributes, and the problem is to determine instantiations such that either the support or the confidence of the rule is maximized. The authors generalize the optimized association rules problem in three ways: 1) association rules are allowed to contain disjunctions over uninstantiated attributes, 2) association rules are permitted to contain an arbitrary number of uninstantiated attributes, and 3) uninstantiated attributes can be either categorical or numeric. These generalized association rules make it possible to extract more useful information about seasonal and local patterns involving multiple attributes.
The authors also present effective techniques for pruning the search space when computing optimized association rules for both categorical and numeric data.

III. APRIORI ALGORITHM

The Apriori algorithm is a data mining algorithm used to find the frequent items/itemsets in a given data repository. The algorithm mainly involves two steps: joining and pruning. The Apriori property is the important factor to be considered before proceeding with the algorithm.

Apriori Property: if an itemset X is joined with an itemset Y, then Support(X U Y) ≤ min(Support(X), Support(Y)). Consequently, every subset of a frequent itemset must itself be frequent.

APRIORI ALGORITHM:

//find all frequent itemsets
Apriori(database D of transactions, min_support) {
    F1 = {frequent 1-itemsets}
    k = 2
    while Fk-1 ≠ EmptySet {
        Ck = AprioriGeneration(Fk-1)
        for each transaction t in the database D {
            Ct = Subset(Ck, t)
            for each candidate c in Ct {
                count c ++
            }
        }
        Fk = {c in Ck such that count c ≥ min_support}
        k++
    }
    F = U k ≥ 1 Fk
}

//prune the candidate itemsets
AprioriGeneration(Fk-1) {
    //Insert into Ck all combinations of elements in Fk-1 obtained by self-joining itemsets in Fk-1
    //self-joining means that all but the last item in the itemsets considered "overlap", i.e. join items p, q from Fk-1 to make candidate k-itemsets of the form p1 p2 … pk-1 qk-1 (the overlapping items are written only once) such that pi = qi for i = 1, 2, …, k-2 and pk-1 < qk-1.
    //Delete all itemsets c in Ck such that some (k-1)-subset of c is not in Fk-1
}

//find all subsets of candidates contained in t
Subset(Ck, t) {
    …
}

IV. PROBLEMS AND SOLUTIONS

Various existing applications of and approaches to a single data mining algorithm are discussed in this paper. All the approaches have their own merits and demerits. This section describes some of the drawbacks of the existing algorithm.

The algorithm scans the database many times: every pass rescans the entire database, which strains the memory available to hold the data. The resulting I/O load means that processing takes a long time and efficiency is low. The time complexity is also very high.

As part of a solution, we propose the following points to improve the efficiency of the Apriori algorithm: group items into higher conceptual groups, e.g. white and brown bread become "bread"; and reduce the number of scans of the entire database (Apriori needs n+1 scans, where n is the length of the longest pattern).

V. CONCLUSION

This paper surveys several case studies on association rules and on the use of existing data mining algorithms for market basket analysis. It also describes the existing Apriori algorithm, its implementation, and its problems and possible solutions. In future work the same algorithm can be modified and extended so as to decrease its time complexity.

REFERENCES

[1] Trnka, "Market Basket Analysis with Data Mining Methods", International Conference on Networking and Information Technology (ICNIT), 2010.
[2] W. Yanthy, T. Sekiya, K. Yamaguchi, "Mining Interesting Rules by Association and Classification Algorithms", FCST 09.
[3] Chiu, K.S.Y., Luk, R.W.P., Chan, K.C.C., and Chung, K.F.L., "Market-basket Analysis with Principal Component Analysis: An Exploration", IEEE International Conference on Systems, Man and Cybernetics, Vol. 3, 2002.
[4] Cunningham, S.J. and Frank, E., "Market Basket Analysis of Library Circulation Data", International Conference on Neural Information Processing, Vol. 2, 1999.
[5] Vo, B. and Le, B., "Mining traditional association rules using frequent itemsets lattice", International Conference on Computers & Industrial Engineering, 2009.
[6] Rastogi, R. and Kyuseok Shim, "Mining Optimized Association Rules with Categorical and Numeric Attributes", IEEE Transactions on Knowledge and Data Engineering, Vol. 14, 2002.
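
Illustrative sketch. The Apriori pseudocode of Section III and the support and confidence measures of Section I can be sketched in Python roughly as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names (apriori, apriori_generation, support, confidence) are hypothetical, min_support is taken to be an absolute transaction count (as in the comparison count c ≥ min_support), items are assumed to be strings, and the candidate-matching step is a plain subset test rather than an optimized Subset routine.

```python
from itertools import combinations


def apriori_generation(prev_frequent, k):
    """Build candidate k-itemsets from frequent (k-1)-itemsets (join, then prune)."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    prev_set = set(prev)
    candidates = set()
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # Join step: p and q agree on all but their last item.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Prune step: every (k-1)-subset of c must itself be frequent.
                if all(sub in prev_set for sub in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates


def apriori(transactions, min_support):
    """Return {itemset (sorted tuple): count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    current = {c: n for c, n in counts.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        candidates = apriori_generation(current.keys(), k)
        counts = {c: 0 for c in candidates}
        for t in transactions:  # one full database scan per level k
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support(A and B) divided by Support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# The four transactions from the example table in Section I.
transactions = [
    {"Burger", "Coke"},
    {"Puff", "Mineral water"},
    {"Burger", "Mineral water"},
    {"Puff", "Tea"},
]

print(support({"Burger", "Mineral water"}, transactions))       # 0.25
print(confidence({"Burger"}, {"Mineral water"}, transactions))  # 0.5
print(apriori(transactions, min_support=2))
```

On this toy data, a minimum support count of 2 leaves only the single items Burger, Puff and Mineral water as frequent, since no pair of items occurs in two or more baskets. The loop over transactions inside the while loop also makes the repeated database scans discussed in Section IV visible: one full pass is made for every itemset length tried.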