Mining Multidimensional Sequential Patterns over Data Streams
Chedy Raïssi and Marc Plantevit
DaWaK 2008

Outline
• Introduction
• Problem Definition
• The MDSDS Approach
• Experimental Results
• Conclusions

Introduction
• We propose to consider the intrinsic multidimensionality of the streams in order to extract more interesting sequential patterns.
• The search space in a multidimensional framework is huge.
• We focus only on the most specific abstraction level for items instead of mining at all possible levels.

Problem Definition
• Multidimensional item: a = (d1, . . . , dm)
• ∗ : wild-card value, which can be interpreted as ALL.
• Multidimensional itemset: i = {a1, . . . , ak}
• Multidimensional sequence: s = <i1, . . . , il>

Problem Definition (cont.)
• We focus on the most specific frequent items to generate the multidimensional sequential patterns.
• E.g.
  ▫ If items (LA, ∗, M, ∗) and (∗, ∗, M, Wii) are frequent, we do not consider the frequent items (LA, ∗, ∗, ∗), (∗, ∗, M, ∗) and (∗, ∗, ∗, Wii).

Problem Definition (cont.)
• Data stream DS = B0, B1, . . . , Bn
• Bi = {B1, B2, B3, ..., Bk}
[Figure: batches B0 and B1 with blocks B1, B2, B3]

Problem Definition (cont.)
• min_sup = 50%
• <{(∗, ∗, M, ∗)}>
• <{(LA, ∗, Y, ∗)}>, <{(LA, ∗, ∗, ∗)}>, <{(LA, ∗, ∗, iPod)}>, <{(∗, ∗, M, ∗)} {(∗, ∗, M, ∗)}>, <{(∗, ∗, M, Wii)}>
• Specialization

The MDSDS Approach
• MDSDS extracts the most specific multidimensional items.
• MDSDS uses a data structure consisting of a prefix-tree and tilted-time window tables.
• Patterns fall into three classes: (1) frequent patterns, (2) sub-frequent patterns, (3) infrequent patterns (not stored in the prefix-tree).

The MDSDS Approach (cont.)
• Step 1: mine the most specific multidimensional items.
  ▫ [Figure: diagram with nodes numbered 1 to 15]
  ▫ Multidimensional representation: (LA, ∗, ∗, ∗), (∗, ∗, M, ∗)
  ▫ Detect specialization or generalization.

The MDSDS Approach (cont.)
• Step 2:
  ▫ Sub-frequent sequences may become frequent in future batches.
  ▫ The PrefixSpan algorithm is used to mine the multidimensional sequences efficiently.

PrefixSpan algorithm
• min_sup = 2
• 1. Find the length-1 sequential patterns: <a>:4, <b>:4, <c>:4, <d>:3, <e>:3, <f>:3.
• 2. Divide the search space into six subsets: (1) the patterns having prefix <a>; …; and (6) the patterns having prefix <f>.
  ▫ <a>-projected database: <(abc)(ac)d(cf)>, <(_d)c(bc)(ae)>, <(_b)(df)cb>, <(_f)cbc>.
  ▫ The length-2 sequential patterns with prefix <a>: <aa>:2, <ab>:4, <(ab)>:2, <ac>:4, <ad>:2, <af>:2.
  ▫ …

PrefixSpan algorithm (cont.)
• 3. Find subsets of sequential patterns.

The MDSDS Approach (cont.)
• Step 3:
  ▫ Tilted-time windows table
  ▫ The updating operations and pruning techniques are applied after each batch is received from the data stream.

Tilted-time windows
[Figures not recovered]

Experimental Results
[Figures not recovered]

Conclusions
• Experiments on real data gathered from TCP/IP network traffic provide compelling evidence that it is possible to obtain accurate and fast results for multidimensional sequential pattern mining.
• We propose taking the multidimensional framework into account in order to detect high-level changes such as trends.
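
Appendix: sketch of the most-specific-item filter
The rule from the Problem Definition slides (keep only the most specific frequent items) can be illustrated with a small Python sketch. It is not the authors' implementation: the function names `generalizes` and `most_specific` are hypothetical, the wild-card is written as the ASCII character "*", and the list of frequent items is assumed to be already computed. It is a quadratic pairwise check meant only to show the idea.

```python
WILDCARD = "*"  # assumption: ASCII stand-in for the slides' wild-card value


def generalizes(general, specific):
    """Return True if `general` matches `specific` on every dimension,
    where a wild-card in `general` matches any value (hypothetical helper)."""
    return len(general) == len(specific) and all(
        g == WILDCARD or g == s for g, s in zip(general, specific)
    )


def most_specific(frequent_items):
    """Keep the items that are not a strict generalization of another
    frequent item (hypothetical helper, O(n^2) comparisons)."""
    items = list(dict.fromkeys(frequent_items))  # drop duplicates, keep order
    return [
        a for a in items
        if not any(a != b and generalizes(a, b) for b in items)
    ]


# Frequent items taken from the example on the slides.
frequent = [
    ("LA", "*", "M", "*"),
    ("*", "*", "M", "Wii"),
    ("LA", "*", "*", "*"),
    ("*", "*", "M", "*"),
    ("*", "*", "*", "Wii"),
]

result = most_specific(frequent)
print(result)
```

On the slide example, only (LA, ∗, M, ∗) and (∗, ∗, M, Wii) survive, because each of the other three items is a generalization of one of them.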