Mining Multidimensional Sequential Patterns over Data Streams
Chedy Raïssi and Marc Plantevit
DaWaK 2008

Outline
• Introduction
• Problem Definition
• The MDSDS Approach
• Experimental Results
• Conclusions

Introduction
• We propose to consider the intrinsic multidimensionality of the streams in order to extract more interesting sequential patterns.
• The search space in a multidimensional framework is huge.
• We focus only on the most specific abstraction level for items instead of mining at all possible levels.

Problem Definition
• Multidimensional item: a = (d1, . . . , dm)
• ∗ : wildcard value, which can be interpreted as ALL.
• Multidimensional itemset: i = {a1, . . . , ak}
• Multidimensional sequence: s = <i1, . . . , il>
• We focus on the most specific frequent items to generate the multidimensional sequential patterns.
• E.g.
▫ If items (LA, ∗, M, ∗) and (∗, ∗, M, Wii) are frequent, we do not consider the frequent items (LA, ∗, ∗, ∗), (∗, ∗, M, ∗) and (∗, ∗, ∗, Wii), because each of them is a generalization of one of the two.
• Data stream: DS = B0, B1, . . . , Bn
• Bi = {B1, B2, B3, ..., Bk}
[Figure: example batches of the data stream]
• min_sup = 50%
• <{(∗, ∗, M, ∗)}>
• <{(LA, ∗, Y, ∗)}> <{(LA, ∗, ∗, ∗)}> <{(LA, ∗, ∗, iPod)}> <{(∗, ∗, M, ∗)} {(∗, ∗, M, ∗)}> <{(∗, ∗, M, Wii)}>
• specialization

The MDSDS Approach
• MDSDS extracts the most specific multidimensional items.
• MDSDS uses a data structure consisting of a prefix tree and tilted-time window tables.
• Patterns fall into three classes: (1) frequent patterns, (2) sub-frequent patterns, (3) infrequent patterns (not stored in the prefix tree).
• Step 1: mine the most specific multidimensional items
▫ [Figure: diagram with nodes numbered 1 to 15]
▫ Multidimensional representation: (LA, ∗, ∗, ∗), (∗, ∗, M, ∗)
▫ Detecting specialization or generalization.
• Step 2:
▫ Sub-frequent sequences may become frequent in future batches.
▫ The PrefixSpan algorithm is used to mine the multidimensional sequences efficiently.

PrefixSpan algorithm
• min_sup = 2
• 1. Find length-1 sequential patterns: <a>:4, <b>:4, <c>:4, <d>:3, <e>:3, <f>:3.
• 2. Divide the search space into (1) the patterns having prefix <a>; …; and (6) the ones having prefix <f>.
▫ <a>-projected database: <(abc)(ac)d(cf)>, <(_d)c(bc)(ae)>, <(_b)(df)cb>, <(_f)cbc>.
▫ The length-2 sequential patterns with prefix <a>: <aa>:2, <ab>:4, <(ab)>:2, <ac>:4, <ad>:2, <af>:2.
▫ …
• 3. Find subsets of sequential patterns.

The MDSDS Approach (cont.)
• Step 3:
▫ Tilted-time windows table
▫ The updating operations and pruning techniques are applied after each batch is received from the data stream.

Tilted-time windows
[Figures: tilted-time windows]

Experimental Results
[Figures: experimental results]

Conclusions
• Experiments on real data gathered from TCP/IP network traffic provide compelling evidence that it is possible to obtain accurate and fast results for multidimensional sequential pattern mining.
• We propose to take the multidimensional framework into account in order to detect high-level changes such as trends.
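The rule stated under Problem Definition (keep only the most specific frequent items) can be illustrated with a small sketch. This is not the MDSDS implementation. The names `WILDCARD`, `generalizations`, `is_specialization` and `most_specific_frequent_items` are hypothetical, the four toy sequences are invented for illustration, and the support of an item is assumed to be the fraction of sequences containing at least one item that it generalizes.

```python
from itertools import product

WILDCARD = "*"  # hypothetical stand-in for the wildcard value (ALL)

def generalizations(item):
    """Return every generalization of an item (each dimension kept or set to '*')."""
    options = [(v, WILDCARD) if v != WILDCARD else (WILDCARD,) for v in item]
    return set(product(*options))

def is_specialization(a, b):
    """True if a is strictly more specific than b."""
    return a != b and all(y == WILDCARD or x == y for x, y in zip(a, b))

def most_specific_frequent_items(sequences, min_sup):
    """sequences: list of sequences, each a list of itemsets (lists of tuples).
    min_sup: relative threshold in [0, 1] (assumed support definition, see above)."""
    counts = {}
    for seq in sequences:
        covered = set()  # generalized items supported by this sequence
        for itemset in seq:
            for item in itemset:
                covered |= generalizations(item)
        for g in covered:
            counts[g] = counts.get(g, 0) + 1
    threshold = min_sup * len(sequences)
    frequent = {g for g, c in counts.items() if c >= threshold}
    # Keep an item only if no other frequent item specializes it.
    return {a for a in frequent
            if not any(is_specialization(b, a) for b in frequent)}

# Invented toy data: four sequences of one itemset each.
toy = [
    [[("LA", "Educ", "M", "iPod")]],
    [[("LA", "Ret", "M", "DS")]],
    [[("NY", "Stud", "M", "Wii")]],
    [[("SF", "Prof", "M", "Wii")]],
]
result = most_specific_frequent_items(toy, 0.5)
print(sorted(result))
```

On this toy data, with min_sup = 50%, the items (LA, ∗, ∗, ∗), (∗, ∗, M, ∗) and (∗, ∗, ∗, Wii) are frequent but are dropped because (LA, ∗, M, ∗) and (∗, ∗, M, Wii) are frequent specializations of them, as in the example given in the slides. Enumerating all 2^m generalizations of every item is only practical for a small number of dimensions.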
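The PrefixSpan steps listed above can be sketched as follows. The input database is not shown in the slide text, so the four sequences below are an assumption: they are the running example from the original PrefixSpan paper, chosen because they are consistent with the <a>-projected database and the counts listed in the slides. The function name `prefixspan` and the pattern encoding (a tuple of tuples of sorted items) are hypothetical, and min_sup is an absolute count. Instead of copying projected databases, the sketch stores for each sequence the earliest position at which the current prefix ends.

```python
from collections import defaultdict

def prefixspan(db, min_sup):
    """db: list of sequences, each a list of sets of items.
    Returns {pattern: support}; a pattern is a tuple of tuples of sorted items."""
    db = [[frozenset(itemset) for itemset in seq] for seq in db]
    results = {}

    def grow(pattern, projected):
        # projected: (sequence id, index of the itemset where the last itemset
        # of `pattern` is matched at its earliest possible position)
        s_ext = defaultdict(list)  # item opens a new itemset after the prefix
        i_ext = defaultdict(list)  # item joins the last itemset of the prefix
        last = frozenset(pattern[-1]) if pattern else frozenset()
        for sid, pos in projected:
            seq = db[sid]
            seen_s, seen_i = set(), set()
            for j in range(pos + 1, len(seq)):
                for x in seq[j]:
                    if x not in seen_s:
                        seen_s.add(x)
                        s_ext[x].append((sid, j))
            if pattern:
                top = max(last)
                for j in range(pos, len(seq)):
                    if last <= seq[j]:
                        for x in seq[j]:
                            # x > top keeps itemsets sorted and avoids duplicates
                            if x > top and x not in seen_i:
                                seen_i.add(x)
                                i_ext[x].append((sid, j))
        for x, proj in sorted(s_ext.items()):
            if len(proj) >= min_sup:
                new = pattern + ((x,),)
                results[new] = len(proj)
                grow(new, proj)
        for x, proj in sorted(i_ext.items()):
            if len(proj) >= min_sup:
                new = pattern[:-1] + (pattern[-1] + (x,),)
                results[new] = len(proj)
                grow(new, proj)

    grow((), [(sid, -1) for sid in range(len(db))])
    return results

# Assumed input: the example database of the original PrefixSpan paper.
db = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
]
patterns = prefixspan(db, 2)
for p in [(("a",), ("a",)), (("a",), ("b",)), (("a", "b"),)]:
    print(p, patterns[p])
```

Here `(("a",), ("b",))` encodes <ab> and `(("a", "b"),)` encodes <(ab)>. In MDSDS the items would be multidimensional tuples instead of single letters; the sketch uses letters only to match the slide example.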
