
Efficient Mining of High Utility Itemsets from Large Datasets

Alva Erwin, Raj P. Gopalan, and N.R. Achuthan
Department of Computing; Department of Mathematics and Statistics
Curtin University of Technology, Kent St., Bentley, Western Australia
PAKDD08

Outline
- Introduction
- Preliminaries
- Method: Compressed Transaction Utility-Prol
- Experiments
- Conclusions

Introduction
- The goal of frequent itemset mining is to find items that co-occur in a transaction database above a user-given frequency threshold, without considering the quantity or weight (such as profit) of the items.
- Quantity and weight are significant for addressing real-world decision problems that require maximizing the utility in an organization.
- TwoPhase, based on Apriori, is suitable for sparse datasets with short patterns; CTU-Mine, based on pattern growth, is suitable for dense data.

Definition
- Example: u({3, 4}, t1) = $60, u({3, 4}, t3) = $60, u({3, 4}) = $120

Definition
- Transaction utility: tu(T_q)
- Transaction-weighted utility: $twu(X) = \sum_{T_q \in D,\; X \subseteq T_q} tu(T_q)$
- Example: tu(1) = 80, twu({3, 4}) = $190

Compressed Transaction Utility-Prol

GlobalItem index    1     2     3     4     5     -
Original item id    5     1     2     4     3     6
Profit              5     10    150   35    25    2
Quantity            60    12    5     4     2
TWU                 987   964   810   595   422   99

- Item 6 has TWU 99 < min_Utility (129.9), so it receives no global index.

Compressed Utility Pattern-Tree
- Parallel projection of transaction database

CUP-tree
- Traverse index 1 (110) from 5, 2 (310) from (2,3,4), 3 (195) from 2, and 4 (190) from (3,5)

ProCUP-tree
- Index 1 (110) from 5 is dropped, because 110 < min_Utility (129.9)
- 2 (310) from (2,3,4), 3 (195) from 2, and 4 (190) from (3,5)

ProCUP-tree

GlobalItem index    1     2     3     4     5
Original item id    5     1     2     4     3
ProItem index       --    1     2     3     --
Profit              5     10    150   35    25
Quantity            60    12    4     5     4
TWU                 987   964   810   595   422

- oriUtility * itemQuantity + proUtility * proQuantity = Utility
- 35*2 + 25*2 = 120, 150*1 + 25*1 = 175, 10*5 + 25*3 = 125
- High_Utility_Itemset = (3,2) (3,2,1)

Experiments

Conclusion
- The CTU-Pro algorithm mines the complete set of high utility itemsets from both sparse and relatively dense datasets with short or longer high utility patterns.
- The algorithm adapts to large data by constructing parallel subdivisions on disk that can be mined independently.
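
As a rough illustration of the quantities on the Definition and ProCUP-tree slides, the sketch below computes itemset utility, transaction utility, and transaction-weighted utility on a small database, then repeats the slide's utility arithmetic against min_Utility = 129.9. It assumes the usual high-utility-mining definitions (utility = quantity x profit, summed over the transactions that contain the itemset). The toy items, profits, and transactions are hypothetical and are not the database behind the slides, and the function names are invented for this example; only the three arithmetic terms and the threshold come from the slides.

```python
from typing import Dict, Iterable, List

# Hypothetical toy data (NOT the database used in the slides).
PROFIT: Dict[str, int] = {"a": 5, "b": 10, "c": 2}
DATABASE: List[Dict[str, int]] = [
    {"a": 2, "b": 1},          # t1: item -> purchased quantity
    {"b": 3, "c": 4},          # t2
    {"a": 1, "b": 2, "c": 1},  # t3
]


def contains(transaction: Dict[str, int], itemset: Iterable[str]) -> bool:
    """True if every item of the itemset occurs in the transaction."""
    return all(item in transaction for item in itemset)


def utility(itemset: Iterable[str]) -> int:
    """u(X): quantity * profit of X's items, summed over transactions containing X."""
    items = list(itemset)
    return sum(
        sum(t[i] * PROFIT[i] for i in items)
        for t in DATABASE
        if contains(t, items)
    )


def transaction_utility(transaction: Dict[str, int]) -> int:
    """tu(T): total utility of all items in one transaction."""
    return sum(qty * PROFIT[item] for item, qty in transaction.items())


def twu(itemset: Iterable[str]) -> int:
    """twu(X): sum of tu(T) over every transaction T that contains X."""
    items = list(itemset)
    return sum(transaction_utility(t) for t in DATABASE if contains(t, items))


def projected_utility(ori_profit: int, item_qty: int, pro_profit: int, pro_qty: int) -> int:
    """Slide formula: oriUtility*itemQuantity + proUtility*proQuantity."""
    return ori_profit * item_qty + pro_profit * pro_qty


MIN_UTILITY = 129.9  # threshold quoted on the slides

# The three terms shown on the ProCUP-tree slide.
candidates = [
    projected_utility(35, 2, 25, 2),
    projected_utility(150, 1, 25, 1),
    projected_utility(10, 5, 25, 3),
]
high = [u for u in candidates if u >= MIN_UTILITY]

print(utility(["a", "b"]), twu(["a", "b"]))
print(candidates, high)
```

On the toy data, twu is never smaller than the utility of the same itemset, which is why it can serve as a pruning bound: an itemset whose twu falls below the threshold cannot be a high utility itemset, and neither can any of its supersets.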

