Mining Multidimensional
Sequential Patterns
over Data Streams
Chedy Raïssi and Marc Plantevit
DaWaK 2008
Outline
• Introduction
• Problem Definition
• The MDSDS Approach
• Experimental Results
• Conclusions
Introduction
• We propose to consider the intrinsic multidimensionality of
the streams for the extraction of more interesting sequential
patterns.
• The search space in a multidimensional framework is huge.
• We focus only on the most specific abstraction level for items
instead of mining at all possible levels.
Problem Definition
• Multidimensional item: a = (d1, . . . , dm)
• ∗: wildcard value, interpreted as ALL
• Multidimensional itemset: i = {a1, . . . , ak}
• Multidimensional sequence: s = <i1, . . . , il>
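As a minimal sketch of these definitions, the three notions can be modelled in Python as nested immutable containers. The type aliases, the WILDCARD constant and the make_sequence helper are illustrative names that do not come from the slides, and the sample values are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

WILDCARD = "*"  # stands for the ALL value on a dimension

Item = Tuple[str, ...]          # a = (d1, ..., dm)
Itemset = FrozenSet[Item]       # i = {a1, ..., ak}
Sequence = Tuple[Itemset, ...]  # s = <i1, ..., il>


def make_sequence(*itemsets: Iterable[Item]) -> Sequence:
    """Build an ordered sequence of itemsets from iterables of items."""
    return tuple(frozenset(itemset) for itemset in itemsets)


# A sequence of two itemsets over four dimensions (hypothetical data).
s = make_sequence(
    [("LA", WILDCARD, "M", WILDCARD)],
    [(WILDCARD, WILDCARD, "M", "Wii"), ("LA", WILDCARD, WILDCARD, WILDCARD)],
)
```

Tuples and frozensets keep items and itemsets hashable, so they can later serve as dictionary keys when counting supports.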
Cont.
• We focus on the most specific frequent items to generate the
multidimensional sequential patterns.
• E.g.
▫ If items (LA, ∗, M, ∗) and (∗, ∗, M, Wii) are frequent, we do not
consider the frequent items (LA, ∗, ∗, ∗), (∗, ∗, M, ∗) and
(∗, ∗, ∗, Wii).
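The filtering in this example can be sketched as follows: an item is kept only if it does not generalize another frequent item. The function names generalizes and most_specific are illustrative assumptions, and the quadratic scan is chosen for readability rather than speed.

```python
def generalizes(a, b):
    """True if item a is equal to, or more general than, item b."""
    return all(x == "*" or x == y for x, y in zip(a, b))


def most_specific(frequent_items):
    """Drop every frequent item that generalizes another frequent item."""
    return {
        a for a in frequent_items
        if not any(b != a and generalizes(a, b) for b in frequent_items)
    }


frequent = {
    ("LA", "*", "M", "*"),
    ("*", "*", "M", "Wii"),
    ("LA", "*", "*", "*"),
    ("*", "*", "M", "*"),
    ("*", "*", "*", "Wii"),
}
kept = most_specific(frequent)
# kept holds only ("LA", "*", "M", "*") and ("*", "*", "M", "Wii")
```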
Cont.
• Data stream DS = B0, B1, . . . , Bn
• Each batch Bi = {s1, s2, s3, ..., sk} is a set of multidimensional sequences
[Figure: example data stream with batches B0, B1, B2, B3]
Cont.
• min_sup = 50%
• <{(∗, ∗, M, ∗)}>
• <{(LA, ∗, Y, ∗)}>
<{(LA, ∗, ∗, ∗)}>
<{(LA, ∗, ∗, iPod)}>
<{(∗, ∗, M, ∗)} {(∗, ∗, M, ∗)}>
<{(∗, ∗, M, Wii)}>
• specialization
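A support threshold such as min_sup = 50% can be checked with a containment test that honours wildcards. The sketch below is one plausible reading, not the authors' code: a pattern itemset is considered covered when each of its items matches some item of the data itemset, and the batch contents (including the values "NY", "Educ" and "Prof") are hypothetical.

```python
def matches(pattern_item, data_item):
    """A pattern item matches if every non-wildcard dimension agrees."""
    return all(p == "*" or p == d for p, d in zip(pattern_item, data_item))


def covered(pattern_itemset, data_itemset):
    """Every pattern item must match at least one item of the data itemset."""
    return all(any(matches(p, d) for d in data_itemset) for p in pattern_itemset)


def contains(sequence, pattern):
    """Greedy left-to-right test that pattern is a subsequence of sequence."""
    pos = 0
    for pattern_itemset in pattern:
        while pos < len(sequence) and not covered(pattern_itemset, sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def support(pattern, batch):
    """Fraction of the sequences in the batch that contain the pattern."""
    return sum(contains(s, pattern) for s in batch) / len(batch)


batch = [
    [{("LA", "Educ", "M", "Wii")}, {("NY", "Prof", "M", "Wii")}],
    [{("LA", "Prof", "Y", "iPod")}],
]
p1 = [{("*", "*", "M", "*")}, {("*", "*", "M", "*")}]
p2 = [{("LA", "*", "*", "*")}]
```

With this hypothetical batch, p1 is contained in the first sequence only, while p2 is contained in both.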
The MDSDS Approach
• MDSDS extracts the most specific multidimensional items.
• MDSDS uses a data structure consisting of a prefix-tree and
tilted-time windows tables.
• The patterns are:
(1) frequent patterns,
(2) sub-frequent patterns,
(3) infrequent patterns (not stored in the prefix-tree).
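The slides do not state the thresholds that separate the three classes. The sketch below assumes a convention common in stream mining: a minimum support min_sup and a smaller error bound epsilon, with sub-frequent patterns falling between the two. Both parameter names and the function name classify are assumptions.

```python
def classify(support, min_sup, epsilon):
    """Classify a pattern by its relative support (assumed thresholds)."""
    if not 0 <= epsilon <= min_sup:
        raise ValueError("expected 0 <= epsilon <= min_sup")
    if support >= min_sup:
        return "frequent"
    if support >= epsilon:
        return "sub-frequent"  # kept, since it may become frequent later
    return "infrequent"        # not stored in the prefix-tree


labels = [classify(s, min_sup=0.5, epsilon=0.1) for s in (0.6, 0.3, 0.05)]
```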
Cont.
• Step 1: mine the most specific multidimensional items
▫ [Figure not recovered: diagram with nodes numbered 1 to 15]
▫ Multidimensional representation: (LA, ∗, ∗, ∗), (∗, ∗, M, ∗)
▫ Detecting the specialization or generalization.
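One straightforward, and not necessarily efficient, way to find frequent items at every abstraction level is to enumerate the generalizations of each item seen in a batch and count each one once per sequence. This is a sketch under that assumption, not the procedure MDSDS uses; the function names and the batch are hypothetical.

```python
from collections import Counter
from itertools import product


def generalizations(item):
    """All items obtained by replacing any subset of dimensions with '*'."""
    choices = [(v, "*") if v != "*" else ("*",) for v in item]
    return set(product(*choices))


def frequent_items(batch, min_sup):
    """Map each frequent item to the number of sequences supporting it."""
    counts = Counter()
    for sequence in batch:
        seen = set()
        for itemset in sequence:
            for item in itemset:
                seen |= generalizations(item)
        counts.update(seen)  # a sequence supports an item at most once
    threshold = min_sup * len(batch)
    return {a: c for a, c in counts.items() if c >= threshold}


batch = [
    [{("LA", "Educ", "M", "Wii")}],
    [{("LA", "Prof", "M", "iPod")}],
]
result = frequent_items(batch, min_sup=1.0)
```

Here result also contains the all-wildcard item, which is trivially frequent; it disappears once only the most specific items are kept.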
Cont.
• Step 2:
▫ Sub-frequent sequences may become frequent in future
batches.
▫ Use the PrefixSpan algorithm to mine the multidimensional
sequences efficiently.
PrefixSpan algorithm
• min_sup = 2
• 1. Find length-1 sequential patterns, <a>:4, <b>:4, <c>:4, <d>:3,
<e>:3, <f>:3.
• 2. Divide the search space into six subsets: (1) the patterns with
prefix <a>; …; and (6) the patterns with prefix <f>.
▫ <a>-projected database: <(abc)(ac)d(cf)>, <(_d)c(bc)(ae)>,
<(_b)(df)cb>, <(_f)cbc>.
▫ Length-2 sequential patterns with prefix <a>: <aa>:2, <ab>:4,
<(ab)>:2, <ac>:4, <ad>:2, <af>:2.
▫ …
Cont.
• 3. Find subsets of sequential patterns.
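The three steps can be sketched as a compact recursive PrefixSpan over ordinary (non-multidimensional) items. The example database is missing from this transcript; the one used below is assumed to be the running example of the original PrefixSpan paper, which agrees with the <a>-projected database quoted above. Projections are stored as (sequence id, itemset position) pointers, and how MDSDS adapts the algorithm to multidimensional items is not shown here.

```python
from collections import defaultdict


def prefixspan(db, min_sup):
    """db: list of sequences, each a list of sorted tuples of items.
    Returns {pattern: support}; a pattern is a tuple of item tuples."""
    results = {}

    def mine(pattern, projected):
        # projected holds (sid, pos): earliest itemset where pattern ends.
        s_ext, i_ext = defaultdict(set), defaultdict(set)
        last = set(pattern[-1]) if pattern else None
        for sid, pos in projected:
            seq = db[sid]
            for j in range(pos + 1, len(seq)):      # sequence extensions
                for x in seq[j]:
                    s_ext[x].add(sid)
            if pattern:                              # itemset extensions
                for j in range(pos, len(seq)):
                    if last <= set(seq[j]):
                        for x in seq[j]:
                            if x > pattern[-1][-1]:
                                i_ext[x].add(sid)
        for x, sids in sorted(s_ext.items()):
            if len(sids) >= min_sup:
                new = pattern + ((x,),)
                proj = [(sid, next(j for j in range(pos + 1, len(db[sid]))
                                   if x in db[sid][j]))
                        for sid, pos in projected if sid in sids]
                results[new] = len(sids)
                mine(new, proj)
        for x, sids in sorted(i_ext.items()):
            if len(sids) >= min_sup:
                grown = pattern[-1] + (x,)
                new = pattern[:-1] + (grown,)
                need = set(grown)
                proj = [(sid, next(j for j in range(pos, len(db[sid]))
                                   if need <= set(db[sid][j])))
                        for sid, pos in projected if sid in sids]
                results[new] = len(sids)
                mine(new, proj)

    mine((), [(sid, -1) for sid in range(len(db))])
    return results


# Assumed database: <a(abc)(ac)d(cf)>, <(ad)c(bc)(ae)>,
# <(ef)(ab)(df)cb>, <eg(af)cbc>.
db = [
    [("a",), ("a", "b", "c"), ("a", "c"), ("d",), ("c", "f")],
    [("a", "d"), ("c",), ("b", "c"), ("a", "e")],
    [("e", "f"), ("a", "b"), ("d", "f"), ("c",), ("b",)],
    [("e",), ("g",), ("a", "f"), ("c",), ("b",), ("c",)],
]
patterns = prefixspan(db, min_sup=2)
```

In the result, a key such as (("a",), ("b",)) stands for <ab> and (("a", "b"),) stands for <(ab)>.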
Cont.
• Step 3:
▫ Tilted-time windows table
▫ Updating and pruning are performed each time a batch is
received from the data stream.
Tilted-time windows
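The figures for these slides did not survive extraction. As a sketch, and assuming a logarithmic tilted-time window (recent batches stored at fine granularity, older ones merged into windows of doubling width), one table could be maintained as below. The class name and its methods are hypothetical, and pruning is left out.

```python
class TiltedTimeWindow:
    """Logarithmic tilted-time window over per-batch support counts."""

    def __init__(self):
        self.main = []    # main[k] summarizes 2**k consecutive batches
        self.buffer = []  # buffer[k]: older pending window of level k, or None

    def add(self, count):
        """Insert the support count observed in the newest batch."""
        carry, level = count, 0
        while carry is not None:
            if level == len(self.main):      # open a new, coarser level
                self.main.append(carry)
                self.buffer.append(None)
                return
            old = self.main[level]
            self.main[level] = carry
            if self.buffer[level] is None:   # park the older window
                self.buffer[level] = old
                carry = None
            else:                            # merge two windows, push up
                carry = old + self.buffer[level]
                self.buffer[level] = None
            level += 1

    def windows(self):
        """Counts ordered from the most recent window to the oldest."""
        out = []
        for m, b in zip(self.main, self.buffer):
            out.append(m)
            if b is not None:
                out.append(b)
        return out


table = TiltedTimeWindow()
for batch_count in range(1, 9):  # hypothetical counts for 8 batches
    table.add(batch_count)
```

After eight batches the table holds four windows instead of eight counts, and their sum still equals the total support, which is what keeps memory logarithmic in the stream length.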
Experimental Results
Conclusions
• Experiments on real data gathered from TCP/IP network
traffic provide compelling evidence that it is possible to
obtain accurate and fast results for multidimensional
sequential pattern mining.
• We propose taking the multidimensional framework into
account in order to detect high-level changes such as trends.