Frequent Pattern Mining from Time-Fading Streams of Uncertain Data
Carson Kai-Sang Leung and Fan Jiang
DaWaK 2011
Outline
Motivation
Background
Method
    A Naive Algorithm: TUF-Streaming(Naive)
    A Space-Saving Algorithm: TUF-Streaming(Space)
    A Time-Saving Algorithm: TUF-Streaming(Time)
Experimental Result
Conclusion

Motivation
In the past few years, several mining algorithms have been
proposed to discover frequent patterns from uncertain
data. However, most of them mine frequent patterns from
static databases of uncertain data rather than from
dynamic streams.

Background

Mining from Static Databases of Uncertain Data

x: item
X: itemset
DB: transaction database
t_i: transaction
The expected support of X in DB can be computed by
summing, over all transactions t_1, ..., t_|DB|, the product
of the existential probabilities of the items within X:

\[ expSup(X, DB) = \sum_{i=1}^{|DB|} \prod_{x \in X} P(x, t_i) \]

where P(x, t_i) denotes the existential probability of item x in transaction t_i.

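The expected-support computation described above (a sum over transactions of the product of existential probabilities) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `exp_sup`, the dictionary representation of a transaction, and the toy database are all assumptions made for the example.

```python
from math import prod

def exp_sup(itemset, db):
    """Expected support of `itemset` in `db`.

    `db` is a list of transactions; each transaction maps an item to its
    existential probability. An item absent from a transaction is treated
    as having probability 0 (an assumption of this sketch).
    """
    return sum(prod(t.get(x, 0.0) for x in itemset) for t in db)

# Hypothetical toy database, not taken from the slides.
toy_db = [
    {"a": 0.5, "b": 0.8},
    {"a": 1.0, "b": 0.5, "c": 0.4},
    {"b": 0.9},
]

print(round(exp_sup({"a"}, toy_db), 2))       # 0.5 + 1.0 + 0
print(round(exp_sup({"a", "b"}, toy_db), 2))  # 0.5*0.8 + 1.0*0.5 + 0
```

Treating a missing item as probability 0 makes the product vanish for any transaction that does not contain the whole itemset, which matches the summation over all transactions.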
Background

Mining from Uncertain Data Streams with a Sliding Window

B_i: batch
X: itemset
T: time (index of the current batch)
DB: transaction database
The expected support of X in the current sliding window,
which contains the ω batches B_{T-ω+1}, ..., B_T inclusive,
can be computed as follows:

\[ expSup\left(X, \bigcup_{i=T-\omega+1}^{T} B_i\right) = \sum_{i=T-\omega+1}^{T} expSup(X, B_i) \]

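For the sliding-window model described above, the expected support of an itemset is the sum of its per-batch expected supports over the ω most recent batches. The sketch below illustrates this under stated assumptions: the names `batch_exp_sup` and `window_exp_sup`, the use of a bounded `deque` as the window, and the three toy batches are hypothetical and not from the slides.

```python
from collections import deque
from math import prod

def batch_exp_sup(itemset, batch):
    """Expected support of `itemset` within a single batch of transactions."""
    return sum(prod(t.get(x, 0.0) for x in itemset) for t in batch)

def window_exp_sup(itemset, window):
    """Sum of the per-batch expected supports over the batches in the window."""
    return sum(batch_exp_sup(itemset, b) for b in window)

omega = 2
window = deque(maxlen=omega)  # appending a new batch evicts the oldest one

# Hypothetical stream of three batches.
stream = [
    [{"a": 0.5}, {"a": 1.0, "b": 0.5}],  # B1
    [{"a": 0.2, "b": 0.9}],              # B2
    [{"b": 0.4}, {"a": 0.3}],            # B3
]
for batch in stream:
    window.append(batch)

# The window now holds B2 and B3 only; B1 has slid out.
print(round(window_exp_sup({"a"}, window), 2))  # 0.2 + 0.3
```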
A Naive Algorithm: TUF-Streaming(Naive)
minsup = 1.0
preMinsup = 0.8

expSup({a}, B1) = 0.7 × 1 + 1.0 × 1 = 1.7
expSup({b}, B1) = 0.9 × 1 + 0.9 × 1 = 1.8
expSup({c}, B1) = 0.8 × 1 + 0.8 × 1 = 1.6
expSup({d}, B1) = 0.1 × 1 + 0.6 × 1 + 0.6 × 1 = 1.3
expSup({b, c}, B1) = 0.9 × 0.8 × 1 + 0.9 × 0.8 × 1 = 1.44
expSup({b, d}, B1) = 0.9 × 0.6 × 1 + 0.9 × 0.6 × 1 = 1.08
expSup({b, c, d}, B1) = 0.9 × 0.8 × 0.6 × 1 + 0.9 × 0.8 × 0.6 × 1 ≈ 0.86
expSup({c, d}, B1) = 0.8 × 0.6 × 1 + 0.8 × 0.6 × 1 = 0.96

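The expected supports listed above can be checked against the two thresholds. The following sketch assumes that preMinsup acts as a relaxed threshold: patterns at or above minsup are frequent, while patterns between preMinsup and minsup are kept because they may become frequent in later batches. That reading, the function name `classify`, and the tuple encoding of patterns are assumptions of this example; the numeric values are the ones from the slide.

```python
MINSUP = 1.0
PRE_MINSUP = 0.8

# Expected supports taken from the slide.
batch1 = {
    ("a",): 1.7,
    ("b",): 1.8,
    ("c",): 1.6,
    ("d",): 1.3,
    ("b", "c"): 1.44,
    ("b", "d"): 1.08,
    ("b", "c", "d"): 0.86,
    ("c", "d"): 0.96,
}

def classify(exp_sups, minsup, pre_minsup):
    """Split patterns into those meeting minsup and those only meeting preMinsup."""
    frequent = {p for p, s in exp_sups.items() if s >= minsup}
    kept_only = {p for p, s in exp_sups.items() if pre_minsup <= s < minsup}
    return frequent, kept_only

frequent, kept_only = classify(batch1, MINSUP, PRE_MINSUP)
print(sorted(frequent))
print(sorted(kept_only))
```

Under this reading, {b, c, d} and {c, d} fall below minsup but stay in the structure because they reach preMinsup.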
(Cont.)
minsup = 1.0
preMinsup = 0.8
α = 0.9
expSup({b, c}, B1 ∪ B2) = 1.44 × 0.9 + 0.8 ≈ 2.10
expSup({b, c}, B1 ∪ B2 ∪ B3) = 1.44 × 0.9^2 + 0.8 × 0.9 + 0 ≈ 1.89

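The arithmetic above suggests that the naive approach keeps one expected-support value per batch and weights the value from batch i by α^(T-i) when batch T is the latest. A minimal sketch under that assumption follows; the function name `faded_exp_sup` is hypothetical, and the per-batch values for {b, c} are the ones from the slide.

```python
def faded_exp_sup(batch_sups, alpha):
    """Time-faded expected support from a list of per-batch values.

    batch_sups[i-1] is the expected support in batch i; the latest batch T
    gets weight 1 and batch i gets weight alpha ** (T - i).
    """
    T = len(batch_sups)
    return sum(s * alpha ** (T - i) for i, s in enumerate(batch_sups, start=1))

alpha = 0.9
bc_per_batch = [1.44, 0.8, 0.0]  # expSup({b, c}) in B1, B2, B3 (from the slide)

print(round(faded_exp_sup(bc_per_batch[:2], alpha), 2))  # B1 ∪ B2
print(round(faded_exp_sup(bc_per_batch, alpha), 2))      # B1 ∪ B2 ∪ B3
```

Storing every per-batch value makes the cost of this approach grow with the number of batches seen.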
A Space-Saving Algorithm: TUF-Streaming(Space)
minsup = 1.0
preMinsup = 0.8
α = 0.9
expSup({b, c}, B1 ∪ B2) = 1.44 × 0.9 + 0.8 ≈ 2.10
expSup({b, c}, B1 ∪ B2 ∪ B3) = (1.44 × 0.9 + 0.8) × 0.9 + 0 ≈ 1.89

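The nested form above indicates that a single stored value per pattern is enough: when a new batch arrives, the stored value is multiplied by α and the new batch's expected support is added. The sketch below illustrates that recurrence; the function name `fade_and_add` is an assumption, and the inputs are the {b, c} values from the slide.

```python
def fade_and_add(stored, batch_sup, alpha):
    """One update step: fade the running value by alpha, then add the new batch."""
    return stored * alpha + batch_sup

alpha = 0.9
stored = 0.0
history = []
for batch_sup in [1.44, 0.8, 0.0]:  # expSup({b, c}) in B1, B2, B3
    stored = fade_and_add(stored, batch_sup, alpha)
    history.append(round(stored, 2))

print(history)  # running value after B1, B2, B3
```

The result after B3 equals the per-batch weighted sum 1.44 × 0.9^2 + 0.8 × 0.9 + 0, so only one number per pattern has to be kept.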
A Time-Saving Algorithm: TUF-Streaming(Time)

Frequent patterns in B1 (with expected support):
{a} = 1.7, {b} = 1.8, {b, c} = 1.44, {b, c, d} = 0.86, {b, d} = 1.08, {c} = 1.6,
{c, d} = 0.96, {d} = 1.3
[Figure: node fields labelled "Last batch" and "Last batch's expected support"]

(Cont.)
minsup = 1.0
preMinsup = 0.8
α = 0.9

Frequent patterns in B2: {a}, {a, d}, {b}, {b, c}, {c}, {d}

expSup({b, c}, B1 ∪ B2) = 1.44 × 0.9^(2-1) + 0.8 ≈ 2.10

(Cont.)
minsup = 1.0
preMinsup = 0.8
α = 0.9

Frequent patterns in B3: {a}, {b}, {b, d}, {c}, {d}

expSup({b, d}, B1 ∪ B2 ∪ B3) = 1.08 × 0.9^(3-1) + 1.44 ≈ 2.31

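The exponents 2-1 and 3-1 above suggest a lazy update: a pattern is touched only in batches where it is frequent, and the stored value is then faded by α raised to the number of batches since its last visit. The sketch below illustrates that idea with a flat dictionary standing in for the tree used in the slides; the function name `visit` and the `(value, last_batch)` tuple layout are assumptions of this example.

```python
def visit(table, pattern, batch_sup, T, alpha):
    """Fade the stored value by alpha ** (T - last visit), add the value from
    batch T, and record T as the last visited batch."""
    stored, last = table.get(pattern, (0.0, T))
    table[pattern] = (stored * alpha ** (T - last) + batch_sup, T)

alpha = 0.9
table = {}

# Batch 1: both patterns are frequent (values from the slides).
visit(table, ("b", "c"), 1.44, 1, alpha)
visit(table, ("b", "d"), 1.08, 1, alpha)
# Batch 2: {b, c} is frequent, {b, d} is not visited.
visit(table, ("b", "c"), 0.8, 2, alpha)
# Batch 3: {b, d} is frequent again, two batches after its last visit.
visit(table, ("b", "d"), 1.44, 3, alpha)

print(round(table[("b", "c")][0], 2))  # 1.44 * 0.9**(2-1) + 0.8
print(round(table[("b", "d")][0], 2))  # 1.08 * 0.9**(3-1) + 1.44
```

Skipping patterns that are not frequent in the current batch saves the per-batch pass over every stored pattern, at the cost of one extra field per entry.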
Experimental Result
Conclusion

In this paper, we proposed tree-based mining algorithms
that can be used for mining frequent patterns from
dynamic streams of uncertain data with both time-fading
and landmark models.