DSM-TKP: Mining Top-K Path Traversal Patterns over Web Click-Streams
Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan
Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2005)
Advisor: Jia-Ling Koh
Speaker: Chun-Wei Hsieh
Introduction
The challenges of data streams
– Limited memory
– Processing time
– Total number of data scans
DSM-TKP
– A single-pass algorithm
– An effective data structure, TKP-forest
Problem Statement
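Click sequence: {A,B,C,D,C,B,E,G,H,G,W,A,O,U,O,V}
– The 15 transitions between consecutive clicks are numbered 1 to 15
Forward references
– 1, 2, 3, 6, 7, 8, 10, 12, 13, 15
Backward references (moving back to a page already on the current path)
– 4, 5, 9, 11, 14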
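The transition numbers above can be reproduced with a short Python sketch. It assumes the usual rule: a transition is backward when its target page is already on the current forward path (the path is then cut back to that page), and forward otherwise. The function name `classify_transitions` is illustrative only.

```python
def classify_transitions(clicks):
    """Split transitions t = 1..n-1 (clicks[t-1] -> clicks[t]) into forward/backward.

    Assumed rule: a transition is backward if its target page is already on the
    current forward path; otherwise it is forward and extends the path.
    """
    path = [clicks[0]]
    forward, backward = [], []
    for t in range(1, len(clicks)):
        page = clicks[t]
        if page in path:
            backward.append(t)
            del path[path.index(page) + 1:]   # return to the revisited page
        else:
            forward.append(t)
            path.append(page)
    return forward, backward


fwd, bwd = classify_transitions(list("ABCDCBEGHGWAOUOV"))
print(fwd)  # [1, 2, 3, 6, 7, 8, 10, 12, 13, 15]
print(bwd)  # [4, 5, 9, 11, 14]
```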
Problem Statement
Maximal: a forward reference sequence is maximal if it is not a substring of any other forward reference sequence.
Maximal Forward References (MFRs)
– {A,B,C,D}
– {A,B,E,G,H}
– {A,B,E,G,W}
– {A,O,U}
– {A,O,V}
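The MFR list above can be derived in one left-to-right pass. This is a minimal sketch under the same forward/backward rule: a path is emitted as an MFR at the first backward move that follows a run of forward moves, and once more if the sequence ends on a forward move. The function name is hypothetical.

```python
def maximal_forward_references(clicks):
    """Return the maximal forward references (MFRs) of one click sequence."""
    path, mfrs = [], []
    moving_forward = False
    for page in clicks:
        if page in path:                      # backward reference
            if moving_forward:                # a forward run just ended: emit it
                mfrs.append("".join(path))
            del path[path.index(page) + 1:]   # cut back to the revisited page
            moving_forward = False
        else:                                 # forward reference
            path.append(page)
            moving_forward = True
    if moving_forward:                        # sequence ended on a forward move
        mfrs.append("".join(path))
    return mfrs


print(maximal_forward_references("ABCDCBEGHGWAOUOV"))
# ['ABCD', 'ABEGH', 'ABEGW', 'AOU', 'AOV']
```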
Problem Statement
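Web click:
– WC = (Uid, r), where Uid is the user ID and r is the referenced page
Stream of web clicks:
– S = [(Uid, r)₁, (Uid, r)₂, …, (Uid, r)ₙ]
Example:
– S = [(100,a), (100,b), (200,a), (100,c), (200,b), (200,c), (100,d), (100,e), (200,a), (200,e)]
– Click sequences per user: <100, abcde>, <200, abcae>
– MFRs: <abcde>, <abc>, <ae>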
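Because the clicks of different users are interleaved, a single-pass reader has to keep one open forward path per Uid. The sketch below does this with a generator; it assumes each user's open path is flushed when the stream ends, which stands in for a real session-end rule. `stream_mfrs` is a hypothetical name.

```python
def stream_mfrs(stream):
    """Yield (uid, mfr) pairs from one pass over (Uid, r) clicks."""
    paths, forward = {}, {}                   # per-user open path / last move was forward
    for uid, ref in stream:
        path = paths.setdefault(uid, [])
        if ref in path:                       # backward reference closes an MFR
            if forward.get(uid):
                yield uid, "".join(path)
            del path[path.index(ref) + 1:]
            forward[uid] = False
        else:
            path.append(ref)
            forward[uid] = True
    for uid, path in paths.items():           # assumed: end of stream ends every session
        if forward.get(uid):
            yield uid, "".join(path)


S = [(100, "a"), (100, "b"), (200, "a"), (100, "c"), (200, "b"),
     (200, "c"), (100, "d"), (100, "e"), (200, "a"), (200, "e")]
print(list(stream_mfrs(S)))
# [(200, 'abc'), (100, 'abcde'), (200, 'ae')]
```

MFRs come out in the order they are closed, so <abc> is emitted before <abcde> even though user 100 clicked first.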
Data Structure: TKP-forest
TKP-forest consists of two parts:
– 1. KR-list
– 2. LP-tree
[Figure: a KR-list of references r1, r2, r3, …, rk, each linked to an LP-tree rooted at the same reference]
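A hypothetical Python sketch of a KR-list plus LP-trees. It rests on assumptions the slide does not state: each KR-list entry is the root of the LP-tree for that reference, an MFR is inserted by adding each of its suffixes to the LP-tree of the suffix's first reference, and the third number in a label such as f:1:3 is the index of the MFR that created the node. Under these assumptions the KR-list labels match those appearing in the construction and pruning examples; the real LP-tree layout may differ.

```python
from dataclasses import dataclass, field


@dataclass
class LPNode:
    """One LP-tree node, printed in the ref:count:id style of the examples."""
    ref: str
    first_id: int                                   # assumed: MFR index that created the node
    count: int = 0
    children: dict = field(default_factory=dict)    # ref -> LPNode

    def label(self) -> str:
        return f"{self.ref}:{self.count}:{self.first_id}"


class TKPForest:
    """Hypothetical TKP-forest: a KR-list whose entries are LP-tree roots."""

    def __init__(self) -> None:
        self.kr = {}        # KR-list: reference -> root LPNode of its LP-tree
        self.n_mfrs = 0     # number of MFRs inserted so far

    def insert(self, mfr: str) -> None:
        """Assumed rule: add every suffix of the MFR to the tree of its first reference."""
        self.n_mfrs += 1
        for start in range(len(mfr)):
            level = self.kr                         # the KR-list is the level above the roots
            for ref in mfr[start:]:
                node = level.get(ref)
                if node is None:
                    node = level[ref] = LPNode(ref, self.n_mfrs)
                node.count += 1
                level = node.children

    def kr_list(self) -> list:
        """KR-list labels ranked by count (ties: earlier first_id, then name)."""
        ranked = sorted(self.kr.values(), key=lambda n: (-n.count, n.first_id, n.ref))
        return [n.label() for n in ranked]


forest = TKPForest()
for mfr in ["abcde", "acd", "cef"]:
    forest.insert(mfr)
print(forest.kr_list())
# ['c:3:1', 'a:2:1', 'd:2:1', 'e:2:1', 'b:1:1', 'f:1:3']
```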
Algorithm
Step 1: read an MFR from the buffer
Step 2: insert it into the TKP-forest
Step 3: prune and maintain the TKP-forest
Step 4: find the path traversal patterns from the TKP-forest
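The four steps can be laid out as a loop. In this skeleton the TKP-forest is replaced by a plain `Counter` of contiguous sub-paths of length 2 or more, and Step 3 is a stand-in that evicts the lowest-count entries once more than `max_entries` are stored. Both rules are assumptions for illustration, not DSM-TKP's own; `mine_top_k` and `max_entries` are hypothetical names.

```python
import heapq
from collections import Counter, deque


def mine_top_k(buffer, k, max_entries=10_000):
    """Return the k best (pattern, count) pairs after reading every MFR in buffer."""
    counts = Counter()
    while buffer:
        mfr = buffer.popleft()                          # Step 1: read an MFR from the buffer
        for i in range(len(mfr)):                       # Step 2: update the summary
            for j in range(i + 2, len(mfr) + 1):        #   (stand-in: sub-paths of length >= 2)
                counts[mfr[i:j]] += 1
        if len(counts) > max_entries:                   # Step 3: prune (stand-in rule)
            for pattern, _ in counts.most_common()[max_entries:]:
                del counts[pattern]
    # Step 4: the k largest counts, preferring longer paths on ties
    return heapq.nlargest(k, counts.items(), key=lambda pc: (pc[1], len(pc[0])))


print(mine_top_k(deque(["abcd", "abce", "bcd"]), k=3))
# [('bc', 3), ('abc', 2), ('bcd', 2)]
```

With `max_entries=2` on the same buffer, `bc` (three occurrences) is evicted after every MFR and never reaches the result, which shows why the pruning step needs a principled rule rather than this naive one.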
Example-construction
[Figure: the TKP-forest built step by step as the MFRs <abcde>, <acd>, and <cef> arrive; node labels have the form reference:count:n, e.g. c:3:1, a:2:1, d:2:1, e:2:1, b:1:1, f:1:3]
Example-pruning
[Figure: with K = 3, the MFR <acdf> is added after <abcde>, <acd>, <cef> and the TKP-forest is pruned; labels shown include c:4:1, a:3:1, d:3:1, e:2:1, f:2:3, b:1:1]
Example-finding the Top-K patterns
The top-3 path traversal patterns are: <acd:3>, <ef:3>, <df:2>, <cef:2>
Memory Analysis
KR-list: $k$
LP-tree: $\sum_{j=1}^{k} \sum_{i=1}^{j/2} C^{j/2}_{i}$
Total space bound: $O\left(k + \sum_{j=1}^{k} \sum_{i=1}^{j/2} C^{j/2}_{i}\right)$
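To get a feel for the size of the memory bound, the sketch below evaluates its two terms for a given k. It reads the LP-tree term as Σ_{j=1..k} Σ_{i=1..⌊j/2⌋} C(⌊j/2⌋, i), treating C^{n}_{i} as the binomial coefficient "n choose i" and rounding j/2 down; both readings are assumptions, not statements from the slide.

```python
from math import comb


def space_bound(k):
    """Return (KR-list term, LP-tree term, total) for a KR-list of k references.

    Assumptions: C^{n}_{i} = comb(n, i), and the limit j/2 is rounded down.
    """
    kr_list = k
    lp_tree = sum(comb(j // 2, i)
                  for j in range(1, k + 1)
                  for i in range(1, j // 2 + 1))
    return kr_list, lp_tree, kr_list + lp_tree


print(space_bound(4))   # (4, 5, 9)
```

Under this reading the inner sum equals 2^⌊j/2⌋ − 1, so the LP-tree term grows roughly like 2^(k/2) and dominates the KR-list term for larger k.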
Experiments
Compared with FS (full scan) and SS (selective scan)
(M.-S. Chen, J.-S. Park, and P. S. Yu, "Efficient Data Mining for Path Traversal Patterns," 1998)
– Faster: only one scan of the data
– Smaller space: no candidate generation