DSM-TKP: Mining Top-K Path Traversal Patterns over Web Click-Streams
Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan
Proceedings of the 2005 IEEE/WIC/ACM International
Conference on Web Intelligence (WI 2005)
Advisor: Jia-Ling Koh
Speaker: Chun-Wei Hsieh
Introduction

The challenges of data streams
– Limited memory
– Processing time
– Total number of data scans

DSM-TKP
– A single-pass algorithm
– An effective data structure, TKP-forest
Problem Statement

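Click Sequence
– {A,B,C,D,C,B,E,G,H,G,W,A,O,U,O,V} (positions numbered 0 to 15; A at position 0 is the starting page)

Forward References
– positions 1,2,3,6,7,8,10,12,13,15

Backward References
– positions 4,5,9,11,14 (each returns to a page already on the current forward path)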
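A minimal sketch of how the forward/backward split above can be reproduced, assuming the convention of Chen, Park and Yu that a move is backward when it returns to a page already on the current forward path. The function name `classify_references` is hypothetical; positions follow the slide's numbering, with position 0 as the starting page.

```python
def classify_references(clicks):
    """Split the moves of one click sequence into forward and backward references.

    Position 0 is the starting page; move i lands on clicks[i]. A move is
    backward when it revisits a page on the current forward path.
    """
    if not clicks:
        return [], []
    path = [clicks[0]]            # current forward path from the starting page
    forward, backward = [], []
    for i in range(1, len(clicks)):
        page = clicks[i]
        if page in path:
            backward.append(i)
            # Going back truncates the path so it ends at the revisited page.
            del path[path.index(page) + 1:]
        else:
            forward.append(i)
            path.append(page)
    return forward, backward


fwd, bwd = classify_references(list("ABCDCBEGHGWAOUOV"))
print(fwd)  # [1, 2, 3, 6, 7, 8, 10, 12, 13, 15]
print(bwd)  # [4, 5, 9, 11, 14]
```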
Problem Statement

Maximal: a reference sequence that is not a substring of any other reference sequence.

Maximal Forward References (MFRs)
– {A,B,C,D}
– {A,B,E,G,H}
– {A,B,E,G,W}
– {A,O,U}
– {A,O,V}
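The five MFRs above follow from one left-to-right pass that emits the current forward path whenever a backward move comes directly after a forward one, plus a final flush at the end. The sketch below is our illustration of that rule, not code from the paper; the helper name is hypothetical and it assumes single-character page labels as on the slides.

```python
def maximal_forward_references(clicks):
    """Return the MFRs of one click sequence, in the order they are completed.

    The current forward path is emitted when a backward move directly follows
    a forward move, and once more at the end if the last move was forward.
    Pages are assumed to be single-character labels, as on the slides.
    """
    if not clicks:
        return []
    path = [clicks[0]]        # current forward path, starting page first
    extending = True          # was the last move forward?
    mfrs = []
    for page in clicks[1:]:
        if page in path:      # backward reference
            if extending:     # the path was just extended, so it is maximal
                mfrs.append("".join(path))
            del path[path.index(page) + 1:]
            extending = False
        else:                 # forward reference
            path.append(page)
            extending = True
    if extending:             # flush the last open forward path
        mfrs.append("".join(path))
    return mfrs


print(maximal_forward_references("ABCDCBEGHGWAOUOV"))
# ['ABCD', 'ABEGH', 'ABEGW', 'AOU', 'AOV']
```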
Problem Statement


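Web Click:
– WC = (Uid, r), where Uid identifies the user and r is the referenced page
Stream of Web Clicks:
– $S = [(Uid, r)_1, (Uid, r)_2, \dots, (Uid, r)_n]$
Example: S = [(100,a),(100,b),(200,a),(100,c),(200,b),(200,c),(100,d),(100,e),(200,a),(200,e)]
Per-user click sequences: <100,abcde> <200,abcae>
MFRs: <abcde> <abc> <ae>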
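Because clicks from different users arrive interleaved, an online variant can keep one open forward path per Uid and emit an MFR the moment that user makes a backward move, so no user's full sequence needs to be stored. This is an illustrative sketch under that assumption, not the paper's buffer implementation; `stream_mfrs` is a hypothetical name. On the example stream it yields the same three MFRs, in the order they complete:

```python
def stream_mfrs(stream):
    """Yield (uid, mfr) pairs from an interleaved stream of (uid, page) clicks.

    Each user keeps an open forward path; an MFR is yielded as soon as that
    user's backward move follows a forward move. Paths still open when the
    input ends are flushed last. Pages are single-character labels here.
    """
    paths = {}       # uid -> current forward path
    extending = {}   # uid -> was this user's last move forward?
    for uid, page in stream:
        path = paths.get(uid)
        if path is None:                 # first click seen for this user
            paths[uid] = [page]
            extending[uid] = True
        elif page in path:               # backward reference
            if extending[uid]:
                yield uid, "".join(path)
            del path[path.index(page) + 1:]
            extending[uid] = False
        else:                            # forward reference
            path.append(page)
            extending[uid] = True
    for uid, path in paths.items():      # end of input: flush open paths
        if extending[uid]:
            yield uid, "".join(path)


S = [(100, "a"), (100, "b"), (200, "a"), (100, "c"), (200, "b"),
     (200, "c"), (100, "d"), (100, "e"), (200, "a"), (200, "e")]
print(list(stream_mfrs(S)))
# [(200, 'abc'), (100, 'abcde'), (200, 'ae')]
```

In a truly unbounded stream the final flush would only run when a user's session is considered closed; here it runs when the sample list is exhausted.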
Data Structure: TKP-forest

TKP-forest consists of
1. KR-list
2. LP-tree
[Figure: KR-list entries r1, r2, r3, …, rk, each paired with an LP-tree rooted at the same reference]
Algorithm

Step 1: read an MFR from the buffer

Step 2: construct the TKP-forest

Step 3: prune and maintain the TKP-forest

Step 4: find the path traversal patterns from the TKP-forest
Example-construction
MFRs read in order: <abcde>, <acd>, <cef>
[Figure: the TKP-forest (KR-list and LP-trees) as each MFR is inserted; node labels such as a:2:1, b:1:1, c:3:1, d:2:1, e:2:1 and f:1:3]
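The KR-list labels in this example appear to follow a simple pattern: in a label such as f:1:3, the first number matches how many of the MFRs read so far contain the reference, and the second matches the MFR in which it first appeared (<cef> is the third MFR). The sketch below rebuilds only those KR-list values under that reading, which is our inference from the labels rather than a definition given on the slides; `build_kr_list` is a hypothetical name, and the LP-trees are not modelled.

```python
def build_kr_list(mfrs):
    """Hypothetical KR-list: reference -> [support, first_mfr].

    support:   number of MFRs read so far that contain the reference
    first_mfr: 1-based index of the MFR in which the reference first appeared
    Both fields are inferred from the slide labels, not taken from the paper.
    """
    kr = {}
    for n, mfr in enumerate(mfrs, start=1):
        for ref in set(mfr):             # count each reference once per MFR
            if ref in kr:
                kr[ref][0] += 1
            else:
                kr[ref] = [1, n]
    return kr


kr = build_kr_list(["abcde", "acd", "cef"])
print(" ".join(f"{ref}:{kr[ref][0]}:{kr[ref][1]}" for ref in sorted(kr)))
# a:2:1 b:1:1 c:3:1 d:2:1 e:2:1 f:1:3
```

Adding "acdf" as a fourth MFR gives c:4:1, a:3:1, d:3:1 and f:2:3, the values that appear in the pruning example.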
Example-pruning
MFRs: <abcde>, <acd>, <cef>, <acdf>; K=3
[Figure: TKP-forest for these MFRs with K=3; node labels such as c:4:1, a:3:1, d:3:1, e:2:1, b:1:1 and f:2:3]
Example-finding the Top-K patterns
The top-3 path traversal patterns (pattern:support) are: <acd:3>, <ef:3>, <df:2>, <cef:2>; df and cef tie for third place.
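The slide lists four patterns for K=3. One consistent reading, which is our assumption, is that every pattern tied with the K-th ranked support is reported: acd and ef share support 3, and df and cef tie at support 2. A small helper (hypothetical name) applying this "top-K with ties" rule to the supports printed on the slide:

```python
def top_k_with_ties(supports, k):
    """Return (pattern, support) pairs whose support reaches the k-th best.

    Patterns tied with the k-th ranked one are all kept, so the result may
    contain more than k entries. Ties are listed alphabetically.
    """
    if k <= 0 or not supports:
        return []
    ranked = sorted(supports.items(), key=lambda kv: (-kv[1], kv[0]))
    threshold = ranked[min(k, len(ranked)) - 1][1]
    return [(p, s) for p, s in ranked if s >= threshold]


# Supports as printed on the slide (pattern:support).
supports = {"acd": 3, "ef": 3, "df": 2, "cef": 2}
print(top_k_with_ties(supports, 3))
# [('acd', 3), ('ef', 3), ('cef', 2), ('df', 2)]
print(top_k_with_ties(supports, 2))
# [('acd', 3), ('ef', 3)]
```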
Memory Analysis

KR-list: $k$

LP-tree: $\sum_{j=1}^{k} \sum_{i=1}^{\lceil j/2 \rceil} C^{\lceil j/2 \rceil}_{i}$

Total space bound: $O\left(k \sum_{j=1}^{k} \sum_{i=1}^{\lceil j/2 \rceil} C^{\lceil j/2 \rceil}_{i}\right)$
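If the memory bound on this slide is read as $O\left(k \sum_{j=1}^{k} \sum_{i=1}^{\lceil j/2 \rceil} C^{\lceil j/2 \rceil}_{i}\right)$ with $C^{m}_{i}$ the binomial coefficient (our reading of the garbled formula, not confirmed by the slide), the inner sum collapses by the binomial theorem to $2^{\lceil j/2 \rceil} - 1$. The sketch evaluates the LP-tree term both ways and prints it together with the total for a few values of k; the function names are hypothetical.

```python
from math import comb


def lp_tree_term(k):
    """sum_{j=1..k} sum_{i=1..ceil(j/2)} C(ceil(j/2), i), C = binomial coefficient."""
    total = 0
    for j in range(1, k + 1):
        m = (j + 1) // 2                  # integer ceil(j / 2)
        total += sum(comb(m, i) for i in range(1, m + 1))
    return total


def lp_tree_term_closed(k):
    """Same sum using sum_{i=1..m} C(m, i) = 2**m - 1 (binomial theorem)."""
    return sum(2 ** ((j + 1) // 2) - 1 for j in range(1, k + 1))


for k in (1, 2, 4, 8):
    assert lp_tree_term(k) == lp_tree_term_closed(k)
    print(k, lp_tree_term(k), k * lp_tree_term(k))
# 1 1 1
# 2 2 4
# 4 8 32
# 8 52 416
```

The closed form makes the growth explicit: each additional pair of j values roughly doubles the LP-tree term.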
Experiments

Compared with FS (full scan) and SS (selective scan)
(M.-S. Chen, J.-S. Park, and P. S. Yu, "Efficient Data Mining for Path Traversal Patterns," 1998)
– Faster: only one scan of the data
– Smaller space: no candidate generation