Efficient Mining of High Utility Itemsets from Large Datasets

Alva Erwin, Raj P. Gopalan, and N.R. Achuthan
Department of Computing; Department of Mathematics and Statistics
Curtin University of Technology, Kent St., Bentley, Western Australia
PAKDD08

Outline
* Introduction
* Preliminaries
* Method: Compressed Transaction Utility-Pro (CTU-Pro)
* Experiments
* Conclusions

Introduction
The goal of frequent itemset mining is to find items that co-occur in a transaction database above a user-given frequency threshold, without considering the quantity or weight (such as profit) of the items. Quantity and weight are significant for addressing real-world decision problems that require maximizing the utility in an organization. TwoPhase, based on Apriori, is suitable for sparse data sets with short patterns; CTU-Mine, based on pattern growth, is suitable for dense data.

Definition
Utility of the itemset {3, 4}:
    u({3, 4}, t1) = $60
    u({3, 4}, t3) = $60
    u({3, 4}) = $120

Definition
Transaction utility: tu(T_q) is the sum of the utilities of all items in transaction T_q.
    tu(1) = 80
Transaction-weighted utility: twu(X) is the sum of the transaction utilities of all transactions in D that contain X.
    twu(X) = \sum_{X \subseteq T_q \in D} tu(T_q)
    twu({3, 4}) = $190

Compressed Transaction Utility-Pro

    GlobalItem index     1     2     3     4     5     -
    Original item id     5     1     2     4     3     6
    Profit               5    10   150    35    25     2
    Quantity            60    12     5     4     2
    TWU                987   964   810   595   422    99

Item 6 receives no global index because its TWU of 99 < min_Utility (129.9).

* Compressed Utility Pattern-Tree (CUP-tree)
* Parallel projection of transaction database

CUP-tree
Traverse: index 1 (110) from 5, index 2 (310) from (2,3,4), index 3 (195) from 2, and index 4 (190) from (3,5).

ProCUP-tree
Index 1 (110) from 5 is dropped because 110 < min_Utility (129.9). The remaining entries are index 2 (310) from (2,3,4), index 3 (195) from 2, and index 4 (190) from (3,5).

ProCUP-tree

    GlobalItem index     1     2     3     4     5
    Original item id     5     1     2     4     3
    ProItem index       --     1     2     3    --
    Profit               5    10   150    35    25
    Quantity            60    12     4     5     4
    TWU                987   964   810   595   422

    oriUtility * itemQuantity + proUtility * proQuantity = Utility
    35*2 + 25*2 = 120
    150*1 + 25*1 = 175
    10*5 + 25*3 = 125

High_Utility_Itemset = (3,2) (3,2,1)

Experiments

Conclusion
The CTU-Pro algorithm mines the complete set of high utility itemsets from both sparse and relatively dense datasets with short or longer high utility patterns. The algorithm adapts to large data by constructing parallel subdivisions on disk that can be mined independently.
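
The utility, transaction utility, and transaction-weighted utility from the Definition slides can be illustrated with a minimal Python sketch. The slides' own transaction table did not survive in this text, so the profit table and the three transactions below are hypothetical toy data, and the function names are illustrative rather than taken from the authors' implementation.

```python
from typing import Dict, Iterable, List

# Hypothetical toy data (not from the slides): unit profit per item,
# and each transaction as a mapping item -> purchased quantity.
PROFIT: Dict[str, float] = {"a": 5.0, "b": 10.0, "c": 25.0}
DATABASE: List[Dict[str, int]] = [
    {"a": 2, "c": 1},          # t1
    {"b": 3},                  # t2
    {"a": 1, "b": 1, "c": 2},  # t3
]


def contains(transaction: Dict[str, int], itemset: Iterable[str]) -> bool:
    """True if every item of the itemset occurs in the transaction."""
    return all(item in transaction for item in itemset)


def itemset_utility_in(itemset: Iterable[str], transaction: Dict[str, int]) -> float:
    """u(X, T): profit * quantity summed over X, or 0 if T does not contain X."""
    items = list(itemset)
    if not contains(transaction, items):
        return 0.0
    return sum(PROFIT[i] * transaction[i] for i in items)


def itemset_utility(itemset: Iterable[str]) -> float:
    """u(X): sum of u(X, T) over all transactions in the database."""
    items = list(itemset)
    return sum(itemset_utility_in(items, t) for t in DATABASE)


def transaction_utility(transaction: Dict[str, int]) -> float:
    """tu(T): total utility of all items in the transaction."""
    return sum(PROFIT[i] * q for i, q in transaction.items())


def twu(itemset: Iterable[str]) -> float:
    """twu(X): sum of tu(T) over the transactions that contain X."""
    items = list(itemset)
    return sum(transaction_utility(t) for t in DATABASE if contains(t, items))
```

On this toy data, `twu(("a", "c"))` is never smaller than `itemset_utility(("a", "c"))`, because each containing transaction contributes its full utility to the TWU. That upper-bound property is what allows an item with TWU below min_Utility to be discarded early.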
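
The arithmetic on the ProCUP-tree slide (oriUtility * itemQuantity + proUtility * proQuantity = Utility) can be checked with a short Python sketch. The three input tuples and the threshold of 129.9 are copied from the slide. The function name and the `>=` comparison against the threshold are assumptions made for illustration, not the authors' code.

```python
from typing import List, Tuple

MIN_UTILITY = 129.9  # threshold quoted on the slides


def combined_utility(ori_utility: float, item_quantity: int,
                     pro_utility: float, pro_quantity: int) -> float:
    """Utility of a two-item pattern: each item's unit profit times its quantity."""
    return ori_utility * item_quantity + pro_utility * pro_quantity


# (oriUtility, itemQuantity, proUtility, proQuantity) as listed on the slide.
CANDIDATES: List[Tuple[float, int, float, int]] = [
    (35, 2, 25, 2),
    (150, 1, 25, 1),
    (10, 5, 25, 3),
]

utilities = [combined_utility(*c) for c in CANDIDATES]

# Assumption: a pattern counts as high utility when its utility reaches the threshold.
high_utility = [u for u in utilities if u >= MIN_UTILITY]
```

Only the second candidate (utility 175) clears the threshold. The other two, at 120 and 125, fall below 129.9.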