DBMiner - SFU Computing Science

Our New Progress on Frequent/Sequential Pattern Mining

We develop new frequent/sequential pattern mining methods

Task                             Our new methods        Conventional methods
Frequent pattern mining          FP-growth              Apriori, TreeProjection
Sequential pattern mining        PrefixSpan, FreeSpan   GSP
Frequent closed pattern mining   CLOSET                 A-close, ChARM
Performance studies on both synthetic and real data sets show that our methods outperform the conventional ones by wide margins.
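Every comparison that follows is plotted against a support threshold. As a reminder of what that threshold controls, here is a minimal brute-force sketch. It is none of the algorithms named above, and the function name `frequent_itemsets` and the toy transactions are assumptions made only for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: relative support} for itemsets meeting min_support.

    Brute-force enumeration for illustration only; it is not Apriori,
    TreeProjection, or FP-growth.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                result[frozenset(candidate)] = count / n
                found = True
        if not found:
            # No frequent itemset of this size, so no larger one can be frequent.
            break
    return result

# Hypothetical toy data: four transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
patterns = frequent_itemsets(transactions, min_support=0.5)
```

Lowering `min_support` admits more itemsets, which is why runtime generally grows toward the left end of a support-threshold axis.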
Mining Complete Set of Frequent Patterns on T10I4D100k
[Figure: runtime (seconds) vs. support threshold (0.00% to 0.15%); curves for Apriori, TreeProjection, and FP-growth]
Mining Complete Set of Frequent Patterns on T25I20D100k
[Figure: runtime (seconds) vs. support threshold (0.00% to 1.50%); curves for Apriori, TreeProjection, and FP-growth]
Mining Complete Set of Frequent Patterns on Connect-4
[Figure: runtime (seconds) vs. support threshold (70% to 95%); curves for Apriori, TreeProjection, and FP-growth]
Mining Sequential Patterns on C10T4S16I4
[Figure: runtime (seconds) vs. support threshold (0.00% to 2.00%); curves for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]
Mining Sequential Patterns on C10T8S8I8
[Figure: runtime (seconds) vs. support threshold (0.00% to 2.00%); curves for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]
Scalability of Mining Sequential Patterns on C10-100T8S8I8
[Figure: runtime (seconds) vs. number of sequences (0 to 100000); curves for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]
Scalability of Mining Sequential Patterns on C10-100T4S16I4
[Figure: runtime (seconds) vs. number of sequences (0 to 100000); curves for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]
Why Is PrefixSpan Faster Than GSP?
[Figure, two panels (dataset C10T4S16I4 and dataset C10T8S8I8): "# cand/pattern in GSP" and "Runtime/proj. db in PrefixSpan" vs. support threshold (0.00% to 2.00%); vertical axis ticks from 0.001 to 100 in powers of ten]
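The two quantities plotted above contrast candidate generation in GSP with database projection in PrefixSpan. Below is a minimal sketch of prefix-projected pattern growth, under strong simplifying assumptions: each sequence element is a single item rather than an itemset, and a projected database is kept as (sequence index, offset) pairs. The function name `prefixspan` and the toy sequences are hypothetical, and this is not the implementation measured in the slides.

```python
def prefixspan(sequences, min_count):
    """Mine frequent sequential patterns from sequences of single items.

    Simplified sketch of prefix-projected pattern growth: no candidate
    sequences are generated; each frequent item found in a projected
    database extends the current prefix.
    """
    results = []

    def grow(prefix, projected):
        # projected holds (sequence index, start offset) pairs, i.e. suffixes.
        counts = {}
        for sid, start in projected:
            # Count each item at most once per suffix.
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] < min_count:
                continue
            pattern = prefix + [item]
            results.append((pattern, counts[item]))
            # Project: keep the suffix after the first occurrence of item.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            grow(pattern, new_projected)

    grow([], [(sid, 0) for sid in range(len(sequences))])
    return results

# Hypothetical toy data: four short sequences over the items a, b, c.
sequences = [list("abc"), list("acb"), list("ab"), list("bc")]
patterns = prefixspan(sequences, min_count=2)
```

Each recursive call scans only the suffixes that follow the current prefix, so the work per pattern shrinks as prefixes grow, whereas a generate-and-test method checks each candidate against the full database.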
Mining Frequent Closed Itemsets on T25I20D100k
[Figure: runtime (seconds) vs. support threshold (0.7% to 1.5%); curves for A-CLOSE, CLOSET, and ChARM]
Mining Frequent Closed Itemsets on Connect-4
[Figure: runtime (seconds, axis ticks from 1 to 10000 in powers of ten) vs. support threshold (40% to 100%); curves for A-CLOSE, CLOSET, and ChARM]
Mining Frequent Closed Itemsets on Pumsb
[Figure: runtime (seconds) vs. support threshold (75% to 95%); curves for A-CLOSE, CLOSET, and ChARM]
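For readers unfamiliar with the term, an itemset is closed when no proper superset has the same support. The sketch below shows that definition on toy data by brute force. It is not CLOSET, A-close, or ChARM, and the names `support_counts` and `closed_itemsets` are assumptions made for illustration.

```python
from itertools import combinations

def support_counts(transactions, min_count):
    """Brute-force support counts of all itemsets with count >= min_count."""
    items = sorted(set().union(*transactions))
    counts = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_count:
                counts[frozenset(candidate)] = count
    return counts

def closed_itemsets(counts):
    """Keep the itemsets that have no proper superset with equal support."""
    closed = {}
    for itemset, count in counts.items():
        # A superset with equal support is itself frequent, so it is in counts.
        absorbed = any(
            itemset < other and count == other_count
            for other, other_count in counts.items()
        )
        if not absorbed:
            closed[itemset] = count
    return closed

# Hypothetical toy data: four transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b"}, {"c"}]
closed = closed_itemsets(support_counts(transactions, min_count=1))
```

On this toy data seven itemsets are frequent but only three are closed. The closed set is a more compact result that still determines the support of every frequent itemset.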
References

R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. In Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), (to appear), 2000.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases, pages 487-499, Santiago, Chile, September 1994.
J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. KDD'2000, Boston, August 2000.
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD'2000, Dallas, TX, May 2000.
J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. Submitted for publication.
R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. 5th Int. Conf. Extending Database Technology (EDBT), pages 3-17, Avignon, France, March 1996.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. ICDT'99, Israel, January 1999.
M. J. Zaki and C. Hsiao. ChARM: An efficient algorithm for closed association rule mining. In Proc. KDD'2000, Boston, August 2000.

DBMiner Version 2.5 (Beta)
DBMiner Technology Inc.
B.C., Canada
What we had for DBMiner 2.0…
Association module on data cubes
Classification module on data cubes
Clustering module on data cubes
OLAP browser
3D Cube browser
What we will do in DBMiner 2.5…
Keep the existing association module and classification module from version 2.0
Change the existing clustering module
Add a new visual classification module on both SQL Server and OLAP
Add new sequential pattern modules on SQL Server using the FP algorithm
What we have done…
We have incorporated the existing association module and added the OLAP browser module
We have added the visual classification module
We have changed the existing clustering module
We have added the sequential pattern module
We are still in the development stage
Association module on data cubes
New sequential pattern module on SQL Server
New visual classification module on data cubes
New clustering module on data cubes