Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Corecursion wikipedia , lookup
Neuroinformatics wikipedia , lookup
Data analysis wikipedia , lookup
Multidimensional empirical mode decomposition wikipedia , lookup
K-nearest neighbors algorithm wikipedia , lookup
Expectation–maximization algorithm wikipedia , lookup
Market Basket Analysis By Sowjanya Alaparthi Topics to be discussed • • • • • Introduction to Market basket analysis Apriori Algorithm Demo-1 ( Using self created table) Demo-2 ( Using Oracle sample schema) Demo-3 ( Using OLAP analytic workspace) Introduction to Market Basket Analysis • Def: Market Basket Analysis (Association Analysis) is a mathematical modeling technique based upon the theory that if you buy a certain group of items, you are likely to buy another group of items. • It is used to analyze the customer purchasing behavior and helps in increasing the sales and maintain inventory by focusing on the point of sale transaction data. • Given a dataset, the Apriori Algorithm trains and identifies product baskets and product association rules Definitions and Terminology • Transaction is a set of items (Itemset). • Confidence : It is the measure of uncertainty or trust worthiness associated with each discovered pattern. • Support : It is the measure of how often the collection of items in an association occur together as percentage of all transactions • Frequent itemset : If an itemset satisfies minimum support,then it is a frequent itemset. • Strong Association rules: Rules that satisfy both a minimum support threshold and a minimum confidence threshold • In Association rule mining, we first find all frequent itemsets and then generate strong association rules from the frequent itemsets Definitions and Terminology- Continued • Apriori algorithm is the most established algorithm for finding frequent item sets mining. • The basic principle of Apriori is “Any subset of a frequent itemset must be frequent”. • We use these frequent itemsets to generate association rules. Apriori Algorithm Ck: Candidate itemset of size k Lk: Frequent itemset of size k L1={frequent items}; For (k=1; Lk!=0; k++) do begin Ck+1= Candidates generated from Lk; For each transaction t in the database do Increment the count of all candidates in Ck+1 that are contained in t Lk+1=candidates in Ck+1 with min_support End Return UkLk; Pictorial representation of Apriori algorithm Step 1 Scan the transaction database to get the support S of each 1-itemsets, compare S with min _sup, and get a set of frequent 1itemsets,L1 Step 3 Scan the transaction database to get the support S of each candidate k-itemset in the final set, compare S with min_sup, and get a set of frequent k-itemsets,Lk Step 2 Use Lk-1 join Lk-1 to generate a candidate k-itemsets. And use appiori propersty to prune the unfrequented k-itemssets from this set. NO Step 4 The candidate set = null YES Step 6 For every non empty subset s of 1, output the rule “s =>(1-s)” if confidence C of the rule “s=> (1-s)”(=support S of 1/support S of s )3 Step 5 For each frequent itemset 1, generate all non empty subsets of 1 Demo-1 Installations Oracle 10g enterprise edition SQL Plus Oracle Data Miner Client Demo 1 - Data Preparation • Download the sample data, which is in excel sheet. • write macro to convert data in excel sheet to insert queries • Create a table and execute these insert queries in SQLplus • As we are connected to Oracle server, this table is then found in Oracle database Demo-1 Connections Connect Oracle Data Miner Client to Oracle Database • Make sure the oracle listener is listening • Database instance ‘ora478’ is started. • The port used is 1521 • Give the hostname as oracle.itk.ilstu.edu Demo-1 • Perform the activity, after installations and connections are made. Demo-2 ( using oracle sample schema) • Download Oracle 10g on your system and install it • Select the sample schema option during the custom installation • Launch Oracle Data Miner Client • In order to use this sample scheme for our activity, we should have the system administrator privileges. • The username is SH and password is password Demo -2 • Administrator should perform some grants in sqlplusw to build this activity. They are alter user sh account unlock; alter user sh identified by password; grant create table to sh; grant create sequence to sh; grant create session to sh; grant create view to sh; grant create procedure to sh; grant create job to sh; grant create type to sh; grant create synonym to sh; grant execute on ctxsys.ctx_ddl to sh; Demo-2 The points to be noted before starting the activity are: • Make sure the oracle listener is started • Database instance ‘ORCL’ is started. • The port used is 1521 • Give the hostname as 127.0.0.1, which is a general hostname. Demo-2 • Finally, the results from the model are published to a table, and this table forms the raw source for the new OLAP product dimension. • At this point there is no information relating to revenue, costs or quantity. So, we need to extend the activity beyond association analysis to OLAP. OLAP • We have to correctly format the results obtained from Association analysis for dimension mapping in OLAP. This can be done using OLAP DML or PL/SQL. • In our activity we create a separate dimension that can hold the results from algorithm. For each dimension we can create Levels, hierarchies, attributes and mappings. OLAP- Analytic workspace • Launch Analytic workspace and give the login details as Username- sh Connection information127.0.0.1:1521:orcl This connects to Oracle sample schema SH on 1521 port and local host 127.0.0.1 and orcl database instance. Demo 3- OLAP Analytic Workspace • Perform the activity and show the mappings Conclusion • We have shown how Market basket analysis using association rules works in determining the customer buying patterns. This can be further extended using OLAP Analytic workspace as shown in demo-3, to add dimensions and cube to identify other measures like costs, revenue and quantity. References Books: • [1] Michael J. A. Berry, Gordon Linoff.”Data Mining Techniques: For Marketing, Sales, and Customer Support (Paperback)”. • [2] J.Han,M. Kamber(2001) “Data Mining”, Morgan Kaufmann publishers, San Francisco,CA Links: • [3]. http://oraclebi.blogspot.com/2007/02/using-market-basketanalysis-to-add.html • [4]. http://nymetro.chapter.informs.org/prac_cor_pubs/Ausleder-Onmarket-basket-analysis-May-04.pdf • [5]. http://www2.sas.com/proceedings/sugi28/223-28.pdf • [6] http://en.wikipedia.org/wiki/Market_basket_analysis • [7] http://www.cs.ualberta.ca/~zaine/courses/cmput499/slides/lect10/sl d053.htm • [8] http://www.icaen.uiowa.edu/~comp/Public/Apriori.pdf Questions??