Our New Progress on Frequent/Sequential Pattern Mining
* We develop new frequent/sequential pattern mining methods:
    Mining task                       Our new methods         Conventional methods
    Frequent pattern mining           FP-growth               Apriori, TreeProjection
    Sequential pattern mining         PrefixSpan, FreeSpan    GSP
    Frequent closed pattern mining    CLOSET                  A-close, CHARM
* Performance studies on both synthetic and real data sets show that our methods outperform the conventional ones by wide margins.

[Chart] Mining Complete Set of Frequent Patterns on T10I4D100k: runtime (seconds) of Apriori, TreeProjection, and FP-growth vs. support threshold
[Chart] Mining Complete Set of Frequent Patterns on T25I20D100k: runtime (seconds) of Apriori, TreeProjection, and FP-growth vs. support threshold
[Chart] Mining Complete Set of Frequent Patterns on Connect-4: runtime (seconds) of Apriori, TreeProjection, and FP-growth vs. support threshold
[Chart] Mining Sequential Patterns on C10T4S16I4: run time (seconds) of PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2 vs. support threshold
[Chart] Mining Sequential Patterns on C10T8S8I8: run time (seconds) of PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2 vs. support threshold
[Chart] Scalability of Mining Sequential Patterns on C10-100T8S8I8: run time (seconds) of PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2 vs. number of sequences
[Chart] Scalability of Mining Sequential Patterns on C10-100T4S16I4: run time (seconds) of PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2 vs. number of sequences
[Chart] Why Is PrefixSpan Faster Than GSP? Number of candidates per pattern in GSP and runtime per projected database in PrefixSpan vs. support threshold, on datasets C10T4S16I4 and C10T8S8I8
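The chart above contrasts candidate generation in GSP with projected-database scans in PrefixSpan. As a rough illustration of the prefix-projection idea, the Python sketch below grows patterns by recursively projecting the sequence database on each frequent prefix instead of generating and testing candidate sequences. This is a minimal sketch, not the PrefixSpan-1/PrefixSpan-2 implementations measured above: sequence elements are single items rather than itemsets, no pseudo-projection is used, and the prefixspan function name, the toy database db, and the absolute min_support parameter are illustrative assumptions.

    from collections import defaultdict

    def prefixspan(sequences, min_support):
        """Mine sequential patterns by prefix projection (simplified sketch).

        sequences   -- list of sequences, each a list of single items
        min_support -- absolute support threshold (number of sequences)
        Returns a list of (pattern, support) pairs.
        """
        results = []

        def mine(prefix, projected_db):
            # One scan of the current projected (suffix) database counts the
            # items that can extend the prefix; unlike GSP, no candidate
            # sequences are generated and tested against the whole database.
            counts = defaultdict(int)
            for suffix in projected_db:
                for item in set(suffix):
                    counts[item] += 1
            for item, support in counts.items():
                if support < min_support:
                    continue
                pattern = prefix + [item]
                results.append((pattern, support))
                # Project again: keep only what follows the first occurrence
                # of the new item in each suffix that contains it.
                new_db = [suffix[suffix.index(item) + 1:]
                          for suffix in projected_db if item in suffix]
                mine(pattern, new_db)

        mine([], sequences)
        return results

    # Toy usage (illustrative data, not one of the benchmark data sets above):
    db = [list("abcbc"), list("abbca"), list("bcac")]
    for pattern, support in sorted(prefixspan(db, min_support=2)):
        print(pattern, support)   # e.g. ['a', 'b', 'c'] 2

Each recursive call touches only the suffixes that follow the current prefix, which is the effect the chart above reports as a small runtime per projected database in PrefixSpan versus a large number of candidates per pattern in GSP.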
[Chart] Mining Frequent Closed Itemsets on T25I20D100k: runtime (seconds) of A-close, CLOSET, and CHARM vs. support threshold
[Chart] Mining Frequent Closed Itemsets on Connect-4: runtime (seconds) of A-close, CLOSET, and CHARM vs. support threshold
[Chart] Mining Frequent Closed Itemsets on Pumsb: runtime (seconds) of A-close, CLOSET, and CHARM vs. support threshold

References
R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), to appear, 2000.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases, pages 487-499, Santiago, Chile, September 1994.
J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. KDD'2000, Boston, August 2000.
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD'2000, Dallas, TX, May 2000.
J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. Submitted for publication.
R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. 5th Int. Conf. Extending Database Technology (EDBT), pages 3-17, Avignon, France, March 1996.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. ICDT'99, Israel, January 1999.
M. J. Zaki and C. Hsiao. CHARM: An efficient algorithm for closed association rule mining. In Proc. KDD'2000, Boston, August 2000.

DBMiner Version 2.5 (Beta)
DBMiner Technology Inc., B.C., Canada

What we had for DBMiner 2.0...
* Association module on data cubes
* Classification module on data cubes
* Clustering module on data cubes
* OLAP browser
* 3D cube browser

What we will do in DBMiner 2.5...
* Keep the existing association and classification modules from version 2.0
* Change the existing clustering module
* Add a new visual classification module on both SQL Server and OLAP
* Add a new sequential pattern module on SQL Server using the FP algorithm

What we have done...
* We have incorporated the existing association module and added the OLAP browser module
* We have added the visual classification module
* We have changed the existing clustering module
* We have added the sequential pattern module

We are still in the development stage for:
* Association module on data cubes
* New sequential pattern module on SQL Server
* New visual classification module on data cubes
* New clustering module on data cubes