Market-Basket Analysis

Frequent itemset mining helps discover associations and correlations among items in large transactional or relational data sets. Technological advances mean that huge volumes of data are continuously generated and stored, and there is a growing demand for techniques that can analyze this data and mine it for interesting patterns. Discovering interesting correlations among huge numbers of business transaction records can support many business decision-making processes, such as cross-marketing and customer shopping behavior analysis. Frequent pattern mining appears simple, but it is difficult to solve and has challenged many researchers.

A typical application of frequent pattern mining is market basket analysis. Market basket analysis is a mathematical modeling technique based on the theory that if you buy a certain group of items, you are likely to buy another group of items. It is used to analyze customer purchase behavior by identifying purchase patterns such as:

- What items tend to be purchased together, for example pizza and Coke
- What items are purchased sequentially, for example a television set and then a DVD player
- What items tend to be purchased by season, for example umbrellas and raincoats during the rainy season

Let us look at an example showing the importance of market basket analysis. Suppose you are the manager of a food retail mart. You want to find out which items customers buy together when they visit your mart. If you can find such items using market basket analysis, they can guide new advertising strategies, new catalogue designs, and similar measures that help promote sales. For example, in one case, placing items that are bought together close to each other can increase the sales of those items. Customers who buy bread are likely to buy jam.
If these items are placed close together, people who buy bread are encouraged to buy jam too. In another case, placing associated items in different corners of the mart can lead customers to buy other items as well. Suppose customers who buy bread are also likely to buy cheese and minced meat. Placing bread and jam at opposite corners can then promote the sale of cheese and minced meat too, since customers walking between the two may pass those items.

There are many considerations when preparing the data for market basket analysis.

- We need to determine the scope of the data set, such as how many stores the data comes from and the period over which the transactions were made. Generalizing the items to an appropriate level is also an important consideration, for example by rolling up rare items to obtain adequate support. If the frequency of "Cherry" is well below the minimum support, we can roll it up to "Fruit".
- If we consider all the items in the store as a universal set, then each item has a Boolean variable representing the presence or absence of that item. Each market basket can be represented by a Boolean vector of values assigned to these variables. These vectors can then be analyzed for buying patterns that reflect items frequently associated or purchased together.

These patterns can be represented in the form of association rules. For example, the observation that customers who buy bread also tend to buy jam is written as the association rule:

Bread => Jam [support=15%, confidence=80%]

Support and confidence are two measures of the interestingness of an association rule; they reflect, respectively, the usefulness and the certainty of the rule. Typically, an association rule is considered interesting when its support and confidence exceed the minimum support threshold and the minimum confidence threshold, respectively, which are set by domain experts or users.
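The Boolean-vector representation and the two measures can be illustrated with a minimal Python sketch. The five baskets below are hypothetical toy data, and the names `transactions`, `items`, `vectors`, `count`, `support`, and `confidence` are assumptions made for this illustration only; they do not come from any particular library.

```python
# Hypothetical toy data: five baskets from an imaginary store.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "cheese"},
    {"milk", "cheese"},
    {"bread", "jam", "cheese"},
]

# The universal set of items, in a fixed order, defines the vector positions.
items = sorted(set().union(*transactions))

# One Boolean vector per basket: True where the item is present.
vectors = [[item in basket for item in items] for basket in transactions]


def count(itemset):
    """Number of baskets whose vector is True for every item in `itemset`."""
    columns = [items.index(item) for item in itemset]
    return sum(all(row[c] for c in columns) for row in vectors)


def support(itemset):
    """Fraction of all baskets that contain every item in `itemset`."""
    return count(itemset) / len(vectors)


def confidence(lhs, rhs):
    """Fraction of the baskets containing `lhs` that also contain `rhs`."""
    return count(set(lhs) | set(rhs)) / count(lhs)


rule_support = support({"bread", "jam"})
rule_confidence = confidence({"bread"}, {"jam"})
print(f"Bread => Jam [support={rule_support:.0%}, confidence={rule_confidence:.0%}]")
```

With this toy data, bread and jam appear together in three of the five baskets, and three of the four baskets containing bread also contain jam. A rule would be reported as interesting only if both values exceed the thresholds chosen by the user.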
In the Bread => Jam rule, a support of 15% means that bread and jam were bought together in 15% of all transactions. A confidence of 80% means that 80% of the customers who bought bread also bought jam.

Support and confidence are not always enough to judge a rule. We therefore also consider lift, which complements support and confidence. It tells us how much better a rule is at predicting the result than just assuming the result in the first place. Suppose the rule is of the form LHS => RHS. Then:

Lift = P(LHS ∧ RHS) / (P(LHS) · P(RHS))

When lift > 1, the rule is better at predicting the result than guessing. When lift < 1, the rule does worse than informed guessing, and the negative rule (LHS => not RHS) predicts better than guessing. Further analysis can be done to identify the correlations between associated items.
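The lift formula can be sketched in Python as follows. The eight baskets are hypothetical toy data chosen only to produce one rule with lift above 1 and one below 1, and the helper names `probability`, `lift`, and `interpret` are assumptions made for this illustration.

```python
# Hypothetical toy data, chosen only to illustrate the formula.
transactions = [
    {"bread", "jam"},
    {"bread", "jam"},
    {"bread", "jam", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"milk", "cheese"},
    {"bread", "cheese"},
    {"jam"},
]


def probability(itemset):
    """P(itemset): fraction of baskets containing every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def lift(lhs, rhs):
    """Lift = P(LHS and RHS) / (P(LHS) * P(RHS))."""
    return probability(lhs | rhs) / (probability(lhs) * probability(rhs))


def interpret(value):
    """Translate a lift value into the reading given in the text."""
    if value > 1:
        return "better than guessing"
    if value < 1:
        return "worse than guessing; consider the negative rule"
    return "no better than guessing"


bread_jam = lift({"bread"}, {"jam"})
bread_milk = lift({"bread"}, {"milk"})
print(f"Bread => Jam:  lift={bread_jam:.2f} ({interpret(bread_jam)})")
print(f"Bread => Milk: lift={bread_milk:.2f} ({interpret(bread_milk)})")
```

In this toy data, Bread => Jam has a lift of about 1.2, so knowing that a basket contains bread makes jam more likely than its overall frequency suggests. Bread => Milk has a lift of about 0.8, so the negative rule (bread buyers tend not to buy milk) is the more informative one.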