DSM-TKP: Mining Top-K Path Traversal Patterns over Web Click-Streams
Hua-Fu Li, Suh-Yin Lee, and Man-Kwan Shan
Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2005)
Advisor: Jia-Ling Koh
Speaker: Chun-Wei Hsieh

Introduction
The challenges of data streams
– Limited memory
– Processing time
– Total number of data scans
DSM-TKP
– A single-pass algorithm
– An effective data structure, the TKP-forest

Problem Statement
Click Sequence
– {A,B,C,D,C,B,E,G,H,G,W,A,O,U,O,V}
Forward References (numbered by the move between consecutive pages)
– 1,2,3,6,7,8,10,12,13,15
Backward References
– 4,5,9,11,14

Problem Statement
Maximal: a forward reference sequence that is not a substring of any other.
Maximal Forward References (MFR)
– {A,B,C,D}
– {A,B,E,G,H}
– {A,B,E,G,W}
– {A,O,U}
– {A,O,V}

Problem Statement
Web Click:
– WC = (Uid, r)
Stream of Web Clicks:
– S = [(Uid,r)1, (Uid,r)2, …, (Uid,r)n]
Example:
– S = [(100,a),(100,b),(200,a),(100,c),(200,b),(200,c),(100,d),(100,e),(200,a),(200,e)]
– Click sequences per user: <100,abcde> <200,abcae>
– MFR: <abcde> <abc> <ae>

Data Structure: TKP-forest
The TKP-forest consists of
– 1. KR-list
– 2. LP-tree
[Figure: TKP-forest diagram with nodes labeled r1, r2, r3, …, rk]

Algorithm
Step 1: read an MFR from the buffer
Step 2: construct the TKP-forest
Step 3: prune and maintain the TKP-forest
Step 4: find the path traversal patterns from the TKP-forest

Example-construction
[Figure: TKP-forest construction as the MFRs <abcde>, <acd>, and <cef> are inserted; node labels have the form x:n:m, e.g. a:2:1]

Example-pruning
MFRs: <abcde> <acd> <cef> <acdf>, K=3
[Figure: TKP-forest before and after pruning with K=3; node labels have the form x:n:m, e.g. c:4:1]

Example-finding the Top-K patterns
The top-3 path traversal patterns are: <acd:3> <ef:3> <df:2> <cef:2>

Memory Analysis
– KR-list: $k$
– LP-tree: $\sum_{j=1}^{k} \sum_{i=1}^{j/2} C_i^{j/2} \cdot i$
– Total space bound: $O\left(k + \sum_{j=1}^{k} \sum_{i=1}^{j/2} C_i^{j/2} \cdot i\right)$

Experiments
Compared with FS and SS (M.-S. Chen, J.-S. Park, and P. S. Yu, "Efficient Data Mining for Path Traversal Patterns", 1998)
– Faster speed: one scan
– Smaller space: no candidate generation
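
The Problem Statement slides describe how a click sequence is cut into maximal forward references (MFRs) at each backward reference. The Python sketch below illustrates only that preprocessing step; it does not implement the TKP-forest or any other part of DSM-TKP. The function names `maximal_forward_references` and `mfrs_from_stream` are hypothetical. The sketch assumes that pages are identified by strings, that a backward reference is a revisit to a page already on the current path, and that the stream can be grouped per user in memory, which a true single-pass implementation would handle incrementally.

```python
def maximal_forward_references(clicks):
    """Split one user's click sequence into maximal forward references.

    Assumption: a click on a page already on the current path is a
    backward reference, and the path is cut back to that page.
    """
    path = []               # pages on the current forward path
    moving_forward = False  # True if the previous click extended the path
    mfrs = []
    for page in clicks:
        if page in path:
            # Backward reference: emit the path only if we were moving
            # forward, so consecutive backward moves emit nothing new.
            if moving_forward:
                mfrs.append("".join(path))
            del path[path.index(page) + 1:]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward:
        # The sequence ended while moving forward: emit the last path.
        mfrs.append("".join(path))
    return mfrs


def mfrs_from_stream(stream):
    """Group (uid, page) clicks by user, then extract each user's MFRs."""
    sessions = {}
    for uid, page in stream:
        sessions.setdefault(uid, []).append(page)
    result = []
    for clicks in sessions.values():  # users in first-seen order
        result.extend(maximal_forward_references(clicks))
    return result


if __name__ == "__main__":
    # Click sequence from the Problem Statement slide.
    print(maximal_forward_references("ABCDCBEGHGWAOUOV"))
    # -> ['ABCD', 'ABEGH', 'ABEGW', 'AOU', 'AOV']

    # Web click stream from the Problem Statement slide.
    stream = [(100, "a"), (100, "b"), (200, "a"), (100, "c"), (200, "b"),
              (200, "c"), (100, "d"), (100, "e"), (200, "a"), (200, "e")]
    print(mfrs_from_stream(stream))
    # -> ['abcde', 'abc', 'ae']
```

Both outputs match the MFR lists given on the slides, which suggests the revisit-based reading of "backward reference" is consistent with the examples there.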