Constrained Frequent Itemset Mining from Uncertain Data Streams

Carson Kai-Sang Leung, Boyu Hao, Fan Jiang
ICDE 2010


Motivation
Method (UF-streaming+, UF-streaming*, CUF-streaming)

Experimental results

Conclusion

In many situations, users are uncertain about the contents of transactions.
Moreover, users are often interested in only some portions of the mined
frequent itemsets.
minsup = 1.2
preMinsup = 0.9
First batch (expected support of each item):
Item     a    b    c    d    e
expSup   1.8  1.6  1.9  0.9  1.4
For example:
expSup({a, e}) = (1 × 0.9 × 0.6) + (1 × 0.9 × 0.7) = 1.17 ≥ preMinsup
expSup({c, e}) = (1 × 0.7 × 0.6) + (1 × 0.8 × 0.7) = 0.98 ≥ preMinsup
expSup({d, e}) = 1 × 0.9 × 0.1 = 0.09 < preMinsup
expSup({a, c, e}) = (1 × 0.9 × 0.7 × 0.6) + (1 × 0.9 × 0.8 × 0.7) ≈ 0.88 < preMinsup
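The arithmetic above can be reproduced with a short sketch. The three transactions below are an assumption: they contain only the existential probabilities that appear in the formulas on this slide (item b is left out because its per-transaction probabilities are not shown), and exp_sup is a hypothetical helper name.

```python
from math import prod

# Assumed transactions: only the probabilities visible in the slide's formulas.
transactions = [
    {"a": 0.9, "c": 0.7, "e": 0.6},
    {"a": 0.9, "c": 0.8, "e": 0.7},
    {"d": 0.9, "e": 0.1},
]

PRE_MINSUP = 0.9


def exp_sup(itemset, transactions):
    """Sum, over transactions containing every item, of the product of item probabilities."""
    return sum(
        prod(t[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


for itemset in [("a", "e"), ("c", "e"), ("d", "e"), ("a", "c", "e")]:
    s = exp_sup(itemset, transactions)
    print(itemset, round(s, 2), "kept" if s >= PRE_MINSUP else "pruned")
```

Under these assumed transactions, {a, e} and {c, e} pass preMinsup while {d, e} and {a, c, e} are pruned, matching the slide.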
First batch:
Itemset  {a}   {a, c}  {a, e}  {b}   {c}   {c, e}  {d}   {e}
expSup   1.8   1.35    1.17    1.6   1.5   0.98    0.9   1.4
----------------------------------
Second batch:
Itemset  {a}   {a, c}  {b}   {b, d}  {c}   {d}
expSup   0.9   0.9     1.4   1.4     1.8   2.0
Second batch:
Itemset  {a}   {a, c}  {b}   {b, d}  {c}   {d}
expSup   0.9   0.9     1.4   1.4     1.8   2.0
----------------------------
Third batch:
Itemset  {a}   {a, c}  {b}   {b, d}  {c}   {d}
expSup   1.7   1.53    1.0   1.0     1.9   1.2
Post-processing step:
{a}: 2.6, {a, c}: 2.43, {b}: 2.4 and {c}: 3.7 satisfy C1.
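The post-processing step can be sketched as follows. Two things here are assumptions rather than statements from the slides: the window covers the second and third batches (inferred from the totals 2.6, 2.43, 2.4 and 3.7), and minsup is applied to the window total. The WBC values are the ones listed for constraint C1, taken to be in units of 10^3/μL; the function names are hypothetical.

```python
MINSUP = 1.2    # from the slides; applying it to the window total is an assumption
WBC_MIN = 10.0  # C1 threshold, assumed to be in units of 10^3/uL
wbc = {"e": 9.0, "d": 9.5, "c": 10.5, "b": 11.0, "a": 11.5}

second = {("a",): 0.9, ("a", "c"): 0.9, ("b",): 1.4,
          ("b", "d"): 1.4, ("c",): 1.8, ("d",): 2.0}
third = {("a",): 1.7, ("a", "c"): 1.53, ("b",): 1.0,
         ("b", "d"): 1.0, ("c",): 1.9, ("d",): 1.2}


def window_totals(batches):
    """Add up the expected support of each itemset over all batches in the window."""
    totals = {}
    for batch in batches:
        for itemset, sup in batch.items():
            totals[itemset] = totals.get(itemset, 0.0) + sup
    return totals


def satisfies_c1(itemset):
    """C1: the minimum WBC value over the itemset must reach the threshold."""
    return min(wbc[i] for i in itemset) >= WBC_MIN


totals = window_totals([second, third])
result = {x: round(s, 2) for x, s in totals.items()
          if s >= MINSUP and satisfies_c1(x)}
print(result)
```

Here {b, d} and {d} are frequent in the window but are dropped because d has a WBC value below the C1 threshold.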

The algorithm first uses the same UF-growth mining technique to find all
“frequent” itemsets. It then checks the mined itemsets against the
user-specified constraints before storing the constrained itemsets in the UF-stream structure.
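A minimal sketch of the "check, then store" idea follows. It does not reproduce the UF-stream structure itself: a deque of per-batch dictionaries stands in for it, and the class name, the window size of 2 and the sample itemsets passed in are assumptions chosen for illustration.

```python
from collections import deque


class ConstrainedWindow:
    """Hypothetical stand-in for the UF-stream structure: a sliding window of batches."""

    def __init__(self, window_size, constraint):
        # A bounded deque drops the oldest batch automatically when a new one arrives.
        self.batches = deque(maxlen=window_size)
        self.constraint = constraint

    def add_batch(self, mined):
        # Check the constraint first, then store only the itemsets that pass.
        self.batches.append(
            {x: s for x, s in mined.items() if self.constraint(x)}
        )

    def totals(self):
        # Expected support of each stored itemset over the current window.
        out = {}
        for batch in self.batches:
            for x, s in batch.items():
                out[x] = out.get(x, 0.0) + s
        return out


wbc = {"e": 9.0, "d": 9.5, "c": 10.5, "b": 11.0, "a": 11.5}
window = ConstrainedWindow(2, lambda x: min(wbc[i] for i in x) >= 10.0)
window.add_batch({("a",): 1.8, ("a", "e"): 1.17, ("d",): 0.9})
print(window.totals())
```

Filtering before storage means itemsets that violate the constraint never occupy space in the window, in contrast to filtering only at query time.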




Type 1: ANTI-MONOTONE CONSTRAINT
min(X.attr) ≥ const → R+ (Xi.attr ≤ Xi+1.attr)
max(X.attr) ≤ const → R- (Xi.attr ≥ Xi+1.attr)
Ex: C1 ≡ min(X.WBC) ≥ 10 × 10^3/μL
Item order under R+: (e, d, c, b, a)
Item   e    d    c     b     a
WBC    9.0  9.5  10.5  11.0  11.5
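One way to read the R+ ordering for C1: once items are sorted in ascending WBC order, every item that can appear in a valid itemset sits in a suffix of the list, so a single binary search finds the cut point. The sketch below assumes the WBC values are in units of 10^3/μL, so the threshold is written as 10.0; variable names are illustrative.

```python
from bisect import bisect_left

wbc = {"e": 9.0, "d": 9.5, "c": 10.5, "b": 11.0, "a": 11.5}
CONST = 10.0  # C1 threshold, assumed to be in units of 10^3/uL

# R+ order: ascending attribute value.
order = sorted(wbc, key=wbc.get)
values = [wbc[i] for i in order]

# First position whose value is >= CONST; everything before it violates C1.
cut = bisect_left(values, CONST)
valid_items = order[cut:]
print(order, valid_items)
```

Because min(X.WBC) ≥ const is anti-monotone, any itemset containing e or d can be discarded without further checks.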
Type 2: MONOTONE CONSTRAINT
max(X.attr) ≥ const → R-
min(X.attr) ≤ const → R+
Ex: C2 ≡ max(X.RBC) ≥ 6.1 × 10^6/μL
Original item order:
Item  a    b    c    d    e
RBC   8.5  3.3  7.5  6.6  5.9

Item order under R-:
Item  a    c    d    e    b
RBC   8.5  7.5  6.6  5.9  3.3
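For the monotone constraint C2, the descending order R- places the items that can satisfy the constraint on their own at the front. The sketch below splits the ordered items into two groups and checks an itemset by asking whether it contains at least one item from the first group. The group names mandatory and optional are labels chosen here, and the RBC values are assumed to be in units of 10^6/μL.

```python
rbc = {"a": 8.5, "b": 3.3, "c": 7.5, "d": 6.6, "e": 5.9}
CONST = 6.1  # C2 threshold, assumed to be in units of 10^6/uL

# R- order: descending attribute value.
order = sorted(rbc, key=rbc.get, reverse=True)

# Items that satisfy C2 individually come first in R- order.
mandatory = [i for i in order if rbc[i] >= CONST]
optional = [i for i in order if rbc[i] < CONST]


def satisfies_c2(itemset):
    """max(X.RBC) >= CONST holds exactly when X contains a mandatory item."""
    return any(i in mandatory for i in itemset)


print(order, mandatory, optional)
print(satisfies_c2({"b", "d"}), satisfies_c2({"b", "e"}))
```

Since the constraint is monotone, once an itemset contains a mandatory item, every superset of it also satisfies C2.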
Type 3: CONVERTIBLE ANTI-MONOTONE CONSTRAINT
avg(X.attr) ≥ const or sum(X−.attr) ≥ const → R-
avg(X.attr) ≤ const or sum(X+.attr) ≤ const → R+
Ex: C3 ≡ sum(X.Rainfall) ≤ 200 mm
Original item order:
Item      a    b    c    d    e
Rainfall  50   33   200  101  120

Item order under R+:
Item      b    a    d    e    c
Rainfall  33   50   101  120  200
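With items in ascending order (R+), the running sum along a prefix can only grow, so C3 behaves like an anti-monotone constraint on prefixes: once a prefix exceeds the limit, every extension of it does too. The sketch below walks the prefixes of the R+ order and stops at the first violation; it illustrates the ordering idea and is not the mining algorithm itself.

```python
from itertools import accumulate

rainfall = {"a": 50, "b": 33, "c": 200, "d": 101, "e": 120}
LIMIT = 200  # C3: sum(X.Rainfall) <= 200 mm

# R+ order: ascending attribute value.
order = sorted(rainfall, key=rainfall.get)

valid_prefixes = []
for k, total in enumerate(accumulate(rainfall[i] for i in order), start=1):
    if total > LIMIT:
        break  # every longer prefix would also violate C3
    valid_prefixes.append((tuple(order[:k]), total))

print(valid_prefixes)
```

The loop stops at the prefix (b, a, d, e), whose sum of 304 mm exceeds the limit, so the prefix ending in c is never examined.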
Type 4: CONVERTIBLE MONOTONE CONSTRAINT
sum(X+.attr) ≥ const → R-
sum(X−.attr) ≤ const → R+
Ex: C4 ≡ sum(X.Rainfall) ≥ 200 mm
Original item order:
Item      a    b    c    d    e
Rainfall  201  52   70   300  180

Item order under R-:
Item      d    a    e    c    b
Rainfall  300  201  180  70   52
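With items in descending order (R-) and non-negative values, the running sum reaches the target as early as possible, and once a prefix satisfies C4 every extension of it does as well. The sketch below finds the shortest satisfying prefix of an itemset; first_satisfying_prefix is a hypothetical helper name.

```python
rainfall = {"a": 201, "b": 52, "c": 70, "d": 300, "e": 180}
TARGET = 200  # C4: sum(X.Rainfall) >= 200 mm

# R- order: descending attribute value.
order = sorted(rainfall, key=rainfall.get, reverse=True)


def first_satisfying_prefix(itemset):
    """Length of the shortest R--ordered prefix whose sum reaches TARGET, else None."""
    ordered = sorted(itemset, key=rainfall.get, reverse=True)
    total = 0
    for k, item in enumerate(ordered, start=1):
        total += rainfall[item]
        if total >= TARGET:
            return k  # all longer prefixes satisfy C4 too
    return None


print(order)
print(first_satisfying_prefix({"e", "c", "b"}))
print(first_satisfying_prefix({"c", "b"}))
```

For {e, c, b} the constraint is met after two items (180 + 70 = 250 mm), while {c, b} sums to 122 mm and never meets it.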
C1 ≡ min(X.WBC) ≥ 10000/μL, order R+
Items: (e, d, c, b, a) → (c, b, a)
Check {a}, {a, c}, {a, e}, {b}, {c}, {c, e}, {d}, {e} against C1
→ {a}, {a, c}, {b}, {c}
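The check on this slide can be sketched as a subset test: for the anti-monotone constraint C1, an itemset is valid exactly when all of its items come from the reduced item list (c, b, a). The WBC values are those given for C1 and are assumed to be in units of 10^3/μL, so 10000/μL is written as 10.0.

```python
wbc = {"e": 9.0, "d": 9.5, "c": 10.5, "b": 11.0, "a": 11.5}
WBC_MIN = 10.0  # 10000/uL expressed in units of 10^3/uL (assumed unit)

# Items that individually satisfy C1.
valid_items = {i for i, v in wbc.items() if v >= WBC_MIN}

# Itemsets from the first batch, in the order shown on the slide.
first_batch = [("a",), ("a", "c"), ("a", "e"), ("b",),
               ("c",), ("c", "e"), ("d",), ("e",)]

# An itemset satisfies min(X.WBC) >= WBC_MIN iff every item is a valid item.
kept = [x for x in first_batch if set(x) <= valid_items]
print(kept)
```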
We proposed three tree-based algorithms (namely, UF-streaming+, UF-streaming* and CUF-streaming) that integrate:
(i) mining of uncertain data,
(ii) constrained mining, and
(iii) mining of data streams.
These algorithms effectively mine constrained frequent itemsets from uncertain data streams.
