Mining linguistic browsing patterns in the world wide web

Authors: Tzung-Pei Hong, Kuei-Ying Lin, Shyue-Liang Wang
Source: Soft Computing - A Fusion of Foundations, Methodologies and Applications, Vol. 6, No. 5, pp. 329–336, August 2002
Speaker: Hui-Lin Weng
Date: 12/13/2005

Outline
• Introduction
  – Web-content mining
  – Web-usage mining
• The proposed algorithm
• The fuzzy data mining approach
• Example of fuzzy web mining algorithm
• Conclusions
• Comments

Introduction
• Web-content mining
  – Focuses on information discovery from sources across the world wide web
  – e.g. mining page-keyword relations from web pages
• Web-usage mining
  – Focuses on the automatic discovery of user access patterns from web servers
  – e.g. mining page browsing patterns from log files

The proposed algorithm
• A novel web-mining algorithm to find linguistic browsing behaviors from log data on web servers.
• Goals:
  – Use fuzzy concepts to analyze the browsing time of a customer on each web page.
  – Focus on the most important linguistic terms to reduce time complexity.

The fuzzy data mining approach
• The approach consists of three main steps:
  – Step 1: Transform each quantitative value in the transaction data into a fuzzy set using the given membership functions.
  – Step 2: Generate large itemsets by calculating the fuzzy cardinality of each candidate itemset.
  – Step 3: Induce fuzzy association rules from the large itemsets found in Step 2.
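Step 1 of the approach can be illustrated with a minimal Python sketch. The slides do not give the membership functions, so the breakpoints below (20, 70 and 120 seconds) are assumptions, chosen only so that a 30-second visit yields the 0.8 / 0.2 split used later in the example. The function names are hypothetical as well.

```python
# Assumed piecewise-linear membership functions for browsing time (seconds).
# The breakpoints 20 / 70 / 120 are NOT from the slides; they are picked so
# that 30 s maps to 0.8 Short + 0.2 Middle, as in the worked example.

def short(t):
    """1 up to 20 s, falling linearly to 0 at 70 s."""
    if t <= 20:
        return 1.0
    if t >= 70:
        return 0.0
    return (70 - t) / 50


def middle(t):
    """Triangle: 0 at 20 s, 1 at 70 s, 0 again at 120 s."""
    if t <= 20 or t >= 120:
        return 0.0
    if t <= 70:
        return (t - 20) / 50
    return (120 - t) / 50


def high(t):
    """0 up to 70 s, rising linearly to 1 at 120 s."""
    if t <= 70:
        return 0.0
    if t >= 120:
        return 1.0
    return (t - 70) / 50


def fuzzify(page, seconds):
    """Turn one (page, duration) item into a fuzzy set of linguistic regions."""
    terms = {"Short": short(seconds), "Middle": middle(seconds), "High": high(seconds)}
    # Keep only regions with non-zero membership, keyed as "Page.Term".
    return {f"{page}.{name}": round(v, 2) for name, v in terms.items() if v > 0}


print(fuzzify("B", 30))  # {'B.Short': 0.8, 'B.Middle': 0.2}
```

With different breakpoints the same duration would map to different membership values, so the choice of membership functions directly shapes which linguistic patterns are found.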
Example of fuzzy web mining algorithm
• Input
  – Log data: includes date, time, client-ip and file name
  – Membership functions: for converting browsing durations into linguistic terms
  – Min-sup
• Output
  – Fuzzy browsing patterns
• Step 1:
  – Log entries with the following file names are selected
    • .asp, .htm, .html, .jva, .cgi and closing connection
  – The following four fields are kept
    • date, time, client-ip and file-name
• Step 2:
  – The values of the client-ip field are transformed into contiguous integers for convenience
• Step 3:
  – The log data are sorted first by encoded client ID and then by date and time
• Step 4:
  – The time durations of the web pages browsed by each encoded client ID are calculated
    • e.g. from 2001/03/01, 05:39:56 to 2001/03/01, 05:40:26 the time duration is 30 seconds
• Step 5:
  – The web pages browsed by each client are listed to form a browsing sequence
• Step 6:
  – The time durations are represented as fuzzy sets, using the given membership functions
    • e.g. the second item (B, 30) of client 1 becomes (0.8/B.Short + 0.2/B.Middle)
• Step 7:
  – The maximum membership value for each region in each sequence is found
    • e.g. client 2: (0.2/D.Short + 0.8/D.Middle) (0.8/B.Short + 0.2/B.Middle) (0.6/D.Middle + 0.4/D.High)
    • D.Middle: max(0.8, 0.0, 0.6) = 0.8
• Step 8:
  – The support value of each region is calculated by summing these maxima over all clients
    • e.g. D.Middle:
      client 1: max(0, 0, 0.6, 0) = 0.6
      client 2: max(0.8, 0, 0.6) = 0.8
      client 3: max(0, 0.8) = 0.8
      client 4: max(0, 0, 0, 0, 0) = 0.0
      client 5: max(1.0, 0, 0) = 1.0
      client 6: max(1.0, 0, 0, 0) = 1.0
      support = 0.6 + 0.8 + 0.8 + 0.0 + 1.0 + 1.0 = 4.2
• Steps 9-11:
  – Large 1-sequences are generated
    • e.g. with Min-sup = 2: B.Short, C.Middle, D.Middle
• Steps 12-15:
  – Large k-sequences are generated (candidate 2-itemsets)
    • (B.Short, C.Middle)
    • (C.Middle, B.Short)
    • (B.Short, D.Middle)
    • (D.Middle, B.Short)

Support value of composite regions
• The support value of each composite region is calculated
  – For example, client 4 and (B.Short, C.Middle):
    • Sequence: (1.0/B.Short) (0.6/C.Middle + 0.4/C.High) (0.2/E.Middle + 0.8/E.High) (1.0/B.Short) (0.6/C.Short + 0.4/C.Middle)
    • max[min(1.0, 0.6), min(1.0, 0.4)] = 0.6

  Client ID   Membership value of (B.Short, C.Middle)
  1           0.8
  2           0.0
  3           0.0
  4           0.6
  5           0.8
  6           0.0

  The support value: 0.8 + 0.0 + 0.0 + 0.6 + 0.8 + 0.0 = 2.2

• Large 2-sequences:
  – (B.Short, C.Middle)
  – (D.Middle, B.Short)
  – (D.Middle, C.Middle)
• In this example, no large 3-sequences exist.

Conclusions
• The time each client spends on a web page is calculated from the log data, and these durations are numeric.
• The authors' web-mining algorithm uses fuzzy concepts to form linguistic terms, which can reduce its time complexity.

Comments
• How should the case where the user is idle be handled?
• How should the case where the user stays on a page for only a few seconds be handled?

Thanks for listening!
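The composite-region calculation from the worked example (client 4 and the membership table for (B.Short, C.Middle)) can be sketched in Python as follows. This is an illustration, not the authors' implementation: the function name is hypothetical, and pairing a page with any later page in the sequence (not only the next one) is an assumption. For client 4 both readings give 0.6.

```python
def sequence_membership(browsing, first, second):
    """Max over ordered pairs of min(membership of `first`, membership of `second`).

    `browsing` is a list of fuzzy sets (dicts), one per visited page, in visit order.
    Assumption: `second` may occur anywhere after `first`, not only right after it.
    """
    best = 0.0
    for i, earlier in enumerate(browsing):
        for later in browsing[i + 1:]:
            pair_value = min(earlier.get(first, 0.0), later.get(second, 0.0))
            best = max(best, pair_value)
    return best


# Fuzzy browsing sequence of client 4, copied from the example.
client4 = [
    {"B.Short": 1.0},
    {"C.Middle": 0.6, "C.High": 0.4},
    {"E.Middle": 0.2, "E.High": 0.8},
    {"B.Short": 1.0},
    {"C.Short": 0.6, "C.Middle": 0.4},
]
value = sequence_membership(client4, "B.Short", "C.Middle")

# Per-client membership values of (B.Short, C.Middle), taken from the table.
per_client = [0.8, 0.0, 0.0, 0.6, 0.8, 0.0]
support = round(sum(per_client), 2)  # rounded to hide floating-point noise
min_sup = 2                          # Min-sup used in the example
is_large = support >= min_sup

print(value, support, is_large)  # 0.6 2.2 True
```

Because the support of 2.2 reaches the Min-sup of 2, (B.Short, C.Middle) is kept as a large 2-sequence, which matches the list in the example.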