Market-Basket Analysis
Frequent itemset mining helps discover associations and correlations among items in large
transactional or relational data sets. Technological advances mean that huge volumes of data are
now generated and stored, and there is a growing demand for techniques that can analyze this data
for interesting patterns. Discovering correlations among large numbers of business transaction
records can support many business decisions, such as cross-marketing and the analysis of customer
shopping behavior. Frequent pattern mining is simple to state but difficult to solve efficiently,
and it has challenged many researchers.
A typical application of frequent pattern mining techniques is market basket analysis. Market basket
analysis is a mathematical modeling technique based on the theory that if you buy a certain group of
items, you are likely to buy another group of items. It is used to analyze customer purchase
behavior by identifying purchase patterns such as:
- which items tend to be purchased together
  o Pizza; Coke
- which items are purchased sequentially
  o Television Set; DVD Player
- which items tend to be purchased by season
  o Umbrellas and Raincoats during the rainy season
Consider an example that shows the importance of market basket analysis. Suppose you are the
manager of a food retail mart and want to know which items customers buy together when they visit
your mart. The items found through market basket analysis can guide new advertising strategies or
new catalogue designs that help promote sales. In one approach, items that are bought together are
placed close to each other to increase their joint sales: customers who buy bread are more likely
to buy jam, so if the two are placed near each other, people who buy bread are encouraged to buy
jam too. In another approach, related items are placed in different corners of the mart so that
customers pass, and may buy, other items on the way. For instance, customers who buy bread are also
likely to buy cheese and minced meat, so placing bread and jam at opposite corners can promote the
sale of cheese and minced meat too.
Several considerations apply when preparing the data for market basket analysis:
- Determine the scope of the data set, such as how many stores the data covers and the period in
  which the transactions were made.
- Generalize the items to an appropriate level. For example, rare items can be rolled up to obtain
  adequate support: if the frequency of “Cherry” falls well below the minimum support threshold,
  it can be rolled up to “Fruit”.
- Treat all the items in the store as a universal set, so that each item has a Boolean variable
  representing its presence or absence. Each market basket can then be represented by a Boolean
  vector of the values assigned to these variables. These vectors can be analyzed for buying
  patterns that reflect items frequently associated or purchased together, and the patterns can be
  expressed as association rules. For example, the observation that customers who buy bread also
  tend to buy jam is written as the association rule:
Bread → Jam [support=15%, confidence=80%]
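The roll-up and Boolean-vector steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the baskets, the item names, the ROLL_UP mapping, and the helper names roll_up and to_boolean_vectors are all hypothetical and do not come from any particular data set or library.

```python
# Hypothetical baskets; the item names are invented for illustration.
transactions = [
    {"bread", "jam", "cherry"},
    {"bread", "jam"},
    {"bread", "cheese"},
    {"milk", "apple"},
]

# Hypothetical mapping from rare items to a broader category.
ROLL_UP = {"cherry": "fruit", "apple": "fruit"}


def roll_up(basket, mapping):
    """Replace each item with its broader category when one is defined."""
    return {mapping.get(item, item) for item in basket}


def to_boolean_vectors(baskets):
    """Return the sorted item universe and one Boolean vector per basket."""
    universe = sorted(set().union(*baskets))
    vectors = [[item in basket for item in universe] for basket in baskets]
    return universe, vectors


generalized = [roll_up(basket, ROLL_UP) for basket in transactions]
universe, vectors = to_boolean_vectors(generalized)

print(universe)
for vector in vectors:
    print(vector)
```

Each position in a vector corresponds to one item of the universal set, so "cherry" and "apple" both set the same "fruit" position after the roll-up.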
Support and confidence are two measures of the interestingness of an association rule; they
reflect the usefulness and the certainty of the rule, respectively. Typically an association rule
is considered interesting when its support and confidence exceed the minimum support threshold and
the minimum confidence threshold, respectively, which are set by domain experts or users. In the
above example, a support of 15% means that 15% of all transactions contain both bread and jam. A
confidence of 80% means that 80% of the customers who bought bread also bought jam.
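As a minimal sketch of how these two measures could be computed from raw transactions, consider the following Python code. The baskets are hypothetical and the function names support and confidence are chosen here for illustration; the numbers are not the 15% and 80% of the rule above.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Fraction of transactions containing `lhs` that also contain `rhs`."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0
    return support(transactions, set(lhs) | set(rhs)) / lhs_support


# Hypothetical baskets, invented for illustration.
baskets = [
    {"bread", "jam"},
    {"bread", "jam", "milk"},
    {"bread", "cheese"},
    {"milk"},
    {"jam", "milk"},
]

rule_support = support(baskets, {"bread", "jam"})
rule_confidence = confidence(baskets, {"bread"}, {"jam"})
print(f"support={rule_support:.0%} confidence={rule_confidence:.0%}")
```

In this toy data, two of the five baskets contain both bread and jam, so the support is 40%. Three baskets contain bread and two of those also contain jam, so the confidence is about 67%.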
Support and confidence alone are not always enough to judge a rule. Lift is a measure that
complements them: it tells us how much better a rule is at predicting the result than simply
assuming the result in the first place. Suppose the rule is of the form:
LHS → RHS
Lift = P(LHS ∧ RHS) / (P(LHS) · P(RHS))
When lift > 1, the rule is better at predicting the result than guessing.
When lift = 1, LHS and RHS occur independently and the rule is no better than guessing.
When lift < 1, the rule does worse than informed guessing, and the negative rule (LHS → not RHS)
predicts better than guessing.
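The lift formula can be illustrated with a short Python sketch. The baskets and the helper names probability and lift are assumptions made for this example; probabilities are estimated as simple fractions of the transactions.

```python
def probability(transactions, itemset):
    """Estimate P(itemset) as the fraction of baskets containing all its items."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def lift(transactions, lhs, rhs):
    """Lift of the rule lhs -> rhs: P(lhs and rhs) / (P(lhs) * P(rhs))."""
    if not transactions:
        raise ValueError("lift is undefined for an empty transaction list")
    p_lhs = probability(transactions, lhs)
    p_rhs = probability(transactions, rhs)
    if p_lhs == 0 or p_rhs == 0:
        raise ValueError("lift is undefined when LHS or RHS never occurs")
    p_both = probability(transactions, set(lhs) | set(rhs))
    return p_both / (p_lhs * p_rhs)


# Hypothetical baskets, invented for illustration.
baskets = [
    {"bread", "jam"},
    {"bread", "jam", "milk"},
    {"bread", "cheese"},
    {"milk"},
    {"jam", "milk"},
]

jam_lift = lift(baskets, {"bread"}, {"jam"})    # 0.4 / (0.6 * 0.6), above 1
milk_lift = lift(baskets, {"bread"}, {"milk"})  # 0.2 / (0.6 * 0.6), below 1
print(f"bread -> jam: {jam_lift:.2f}, bread -> milk: {milk_lift:.2f}")
```

In this toy data the rule bread → jam has a lift above 1, while bread → milk has a lift below 1, so buying bread makes milk less likely than it is overall.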
Further analysis can be done to identify the correlations between associated items.