International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 3, Issue 6, June 2013)
A Study on Market Basket Analysis Using a Data Mining Algorithm
Phani Prasad J¹, Murlidher Mourya²
¹,² Vardhaman Engineering College, Hyderabad, India
Abstract— Association rule mining is a powerful tool in data mining today. It identifies correlations between items in large databases. A typical example of association rule mining is market basket analysis, which examines the buying habits of customers by identifying associations among the items they place in their baskets. Identifying the items that customers frequently purchase together helps to increase the sales of particular products. This paper focuses on the study of an existing data mining algorithm for market basket data.
Keywords— Association Rule Mining, Apriori Algorithm, Market Basket Analysis.
I. INTRODUCTION
Mining association rules is one of the most important and powerful aspects of data mining. One of the main goals of association rule mining (ARM) is to find relationships among the items in a database. An association rule has the form A→B, where A is the antecedent and B is the consequent. Both A and B are item sets, and the rule states that customers who purchase A are likely to purchase B with probability C%, where C is known as the confidence. An example of such a rule is: “seventy per cent of people who purchase beer will also purchase diapers”. Such rules help shop managers study the buying habits of customers in order to increase sales. Based on this study, items that are regularly purchased together are placed in close proximity. For example, people who purchase milk are also likely to purchase bread.
Interestingness measures such as support and confidence also play a vital role in association analysis. The support of a rule is the percentage of all transactions that contain both A and B:
Support = (# of transactions involving A and B) / (total number of transactions).
The confidence is the percentage of the transactions containing A that also contain B:
Confidence = Probability (B if A) = P(B/A)
Confidence = (# of transactions involving A and B) / (total number of transactions that have A).
Consider the following example:
Customer    Item Purchased    Item Purchased
1           Burger            Coke
2           Puff              Mineral water
3           Burger            Mineral water
4           Puff              Tea
If A is “purchased Burger” and B is “purchased mineral water”, then
Support = P(A and B) = 1/4
Confidence = P(B/A) = 1/2
Rules that satisfy both the minimum support and the minimum confidence are called strong association rules.
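The arithmetic of this example can be checked with a short Python sketch. The transaction list mirrors the four-customer table; the helper names `support` and `confidence` are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction

# The four baskets from the example table (one set of items per customer).
transactions = [
    {"Burger", "Coke"},
    {"Puff", "Mineral water"},
    {"Burger", "Mineral water"},
    {"Puff", "Tea"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Fraction of the transactions containing `antecedent` that also contain `consequent`."""
    with_a = [t for t in transactions if antecedent <= t]
    if not with_a:
        raise ValueError("antecedent never occurs in the transactions")
    both = sum(1 for t in with_a if consequent <= t)
    return Fraction(both, len(with_a))

a = {"Burger"}          # A: purchased Burger
b = {"Mineral water"}   # B: purchased mineral water
rule_support = support(a | b, transactions)
rule_confidence = confidence(a, b, transactions)
print(rule_support, rule_confidence)  # prints: 1/4 1/2
```

Exact fractions are used here only so that the output matches the hand calculation; floating-point ratios would serve equally well on real data.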
II. LITERATURE SURVEY
Trnka [1] describes how Market Basket Analysis can be implemented within the Six Sigma methodology. Data mining methods offer many opportunities in the market sector, and Market Basket Analysis is one of them. Six Sigma uses several statistical methods; by adding Market Basket Analysis (as a part of data mining) to one of the Six Sigma phases, the results can be improved and the Sigma performance level of the process can be changed. The author used the GRI (General Rule Induction) algorithm to produce association rules between the products in the market basket. The resulting associations vary from product to product, and a web plot was used to show the dependence between the products.
Yanthy et al. [2] note that an important goal of data mining is to reveal hidden knowledge from data, and that various algorithms have been proposed for this purpose. The problem is that typically not all rules are interesting: only a small fraction of the generated rules is of interest to any given user. Hence, numerous measures such as confidence, support, lift and information gain have been proposed to determine the best or most interesting rules. However, some algorithms are good at generating rules that score high on one interestingness measure but poorly on others, and the relationship between the algorithms and the interestingness measures of the generated rules is not yet clear. The authors studied this relationship using synthetic data, so that the results are not limited to specific cases.
Chiu et al. [3] proposed a principal component model for market basket analysis. Market basket analysis is a well-known business problem that can be (partially) solved computationally using association rules mined from transaction data to maximize cross-selling effects. The authors model market basket analysis as a finite mixture density of human consumption behavior according to social and cultural events. This leads to the use of principal component analysis (PCA), and possibly mixture density analysis, of transaction data, which was not apparent before. They compare PCA with association rules mined from a set of benchmark transaction data to explore the similarities and differences between these two data exploration tools.
Cunningham et al. [4] studied library circulation data. They apply the Apriori market basket tool to the task of detecting subject classification categories that co-occur in transaction records of books borrowed from a university library. This information can be useful in directing users to additional portions of the collection that may contain documents relevant to their information need, and in determining a library's physical layout. The results can also provide insight into the degree of “scatter” that the classification scheme induces in a particular collection of documents.
Vo et al. [5] present an application of lattices to mining traditional association rules that greatly reduces the time needed to mine the rules. Their method has two phases: (1) building the frequent item set lattice and (2) mining association rules from the lattice. The parent-child relationships in the lattice are used to discover the association rules quickly. Their experiments show that mining rules from the lattice is more efficient than mining directly from frequent item sets using a hash table.
Rastogi et al. [6] contributed an approach to optimized association rules. Optimized association rules are permitted to contain uninstantiated attributes, and the problem is to determine instantiations such that either the support or the confidence of the rule is maximized. The authors generalize the optimized association rules problem in three ways: 1) association rules are allowed to contain disjunctions over uninstantiated attributes, 2) association rules are permitted to contain an arbitrary number of uninstantiated attributes, and 3) uninstantiated attributes can be either categorical or numeric. These generalized association rules make it possible to extract more useful information about seasonal and local patterns involving multiple attributes. The authors present effective techniques for pruning the search space when computing optimized association rules for both categorical and numeric data.
III. APRIORI ALGORITHM
The Apriori algorithm is a data mining algorithm used to find the frequent items/itemsets in a given data repository. The algorithm mainly involves two steps: joining and pruning. The Apriori property is the key fact to consider before proceeding with the algorithm.
Apriori Property: if an itemset X is joined with an itemset Y, the result cannot be more frequent than either part:
Support(X U Y) ≤ min(Support(X), Support(Y))
Consequently, every non-empty subset of a frequent itemset must itself be frequent.
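To make the join step, the prune step and the Apriori property concrete, the following is a minimal Python sketch of level-wise frequent itemset mining. It assumes that transactions are given as sets of item names and that `min_support` is an absolute transaction count. The function names (`generate_candidates`, `apriori`) and the sample baskets are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations

def generate_candidates(prev_frequent, k):
    """Join and prune: build candidate k-itemsets from frequent (k-1)-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    prev_set = set(prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            p, q = prev[i], prev[j]
            # Join: the first k-2 items must match and the last items must differ.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Prune (Apriori property): every (k-1)-subset must be frequent.
                if all(sub in prev_set for sub in combinations(c, k - 1)):
                    candidates.add(frozenset(c))
    return candidates

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        candidates = generate_candidates(frequent.keys(), k)
        # One pass over the database per level to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical baskets, chosen only to exercise the code.
baskets = [
    {"Bread", "Milk"},
    {"Bread", "Milk", "Tea"},
    {"Bread", "Tea"},
    {"Milk", "Tea"},
    {"Bread", "Milk", "Tea"},
]
frequent_itemsets = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these baskets every single item and every pair reaches the threshold of three transactions, while the triple {Bread, Milk, Tea} occurs only twice and is discarded, so no frequent itemset is more frequent than any of its subsets. The sketch uses an absolute count for `min_support`; dividing each count by the number of transactions gives the fractional support defined in the Introduction.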
APRIORI ALGORITHM:
//find all frequent itemsets
Apriori(database D of transactions, min_support) {
    F1 = {frequent 1-itemsets}
    k = 2
    while Fk-1 ≠ EmptySet {
        Ck = AprioriGeneration(Fk-1)
        for each transaction t in the database D {
            Ct = Subset(Ck, t)
            for each candidate c in Ct {
                count[c]++
            }
        }
        Fk = {c in Ck such that count[c] ≥ min_support}
        k++
    }
    F = U k≥1 Fk
}
//generate and prune the candidate itemsets
AprioriGeneration(Fk-1) {
    //Insert into Ck all combinations of elements in Fk-1 obtained by self-joining itemsets in Fk-1
    //self-joining means that all but the last item in the itemsets considered “overlap”, i.e. join itemsets p, q from Fk-1 to make candidate k-itemsets of the form p1 p2 … pk-1 qk-1 (overlapping items are not repeated) such that pi = qi for i = 1, 2, .., k-2 and pk-1 < qk-1.
    //Delete all itemsets c in Ck such that some (k-1)-subset of c is not in Fk-1
}
//find all subsets of candidates contained in t
Subset(Ck, t) {
    …
}
IV. PROBLEMS AND SOLUTIONS
Various existing applications of and approaches to a single data mining algorithm are discussed in this paper. All the approaches have their own merits and demerits. This section describes some of the drawbacks of the existing algorithm:
• It scans the database many times: every pass rescans the entire database, which results in a shortage of memory to store the data.
• The I/O load is heavy, so processing takes a long time and efficiency is low.
• The time complexity is very high.
As part of a solution, we propose the following points to improve the efficiency of the Apriori algorithm:
• Group items into higher conceptual groups, e.g. white and brown bread become “bread.”
• Reduce the number of scans of the entire database (Apriori needs n+1 scans, where n is the length of the longest pattern).
V. CONCLUSION
This paper reviews several case studies on association rules and on the use of existing data mining algorithms for market basket analysis. It also describes the existing Apriori algorithm, its implementation, its problems, and possible solutions. In future work, the algorithm can be modified and extended to reduce its time complexity.
REFERENCES
[1] Trnka, “Market Basket Analysis with Data Mining Methods”, International Conference on Networking and Information Technology (ICNIT), 2010.
[2] W. Yanthy, T. Sekiya, K. Yamaguchi, “Mining Interesting Rules by Association and Classification Algorithms”, FCST 09.
[3] Chiu, K.S.Y., Luk, R.W.P., Chan, K.C.C., and Chung, K.F.L., “Market-basket Analysis with Principal Component Analysis: An Exploration”, IEEE International Conference on Systems, Man and Cybernetics, Vol. 3, 2002.
[4] Cunningham, S.J. and Frank, E., “Market Basket Analysis of Library Circulation Data”, International Conference on Neural Information Processing, Vol. 2, 1999.
[5] Vo, B. and Le, B., “Mining traditional association rules using frequent itemsets lattice”, International Conference on Computers & Industrial Engineering, 2009.
[6] Rastogi, R. and Kyuseok Shim, “Mining optimized association rules with categorical and numeric attributes”, IEEE Transactions on Knowledge and Data Engineering, vol. 14, 2002.