Effective Prediction of Web-user
Accesses: A Data Mining Approach
Nanopoulos Alexandros
Katsaros Dimitrios
Yannis Manolopoulos
Aristotle Univ. of Thessaloniki, Greece
Presentation:
Spyros Papadimitriou, Carnegie Mellon Univ.
WebKDD 2001
Aristotle University of Thessaloniki
Introduction (1/2)
• Web Prefetching: deducing forthcoming user accesses based on log information
• Focus on:
– Predictive prefetching (use of history)
– Server-initiated (server makes predictions and piggybacks them to the clients)
Introduction (2/2)
• Within a site, users navigate by following links [5]
• Server-initiated predictive prefetching is therefore interested in access patterns that reflect this behavior
Outline
• Motivation & Related work
• Proposed method
• Comparative performance evaluation
• Conclusions
Requirements
• Site structure and contents impose:
1. The order of dependencies (first or higher) among the documents
2. The interleaving of documents belonging to patterns with random visits (noise)
• Discovered patterns should respect these factors
Related work
• Dependency graph (DG) [9]
– A graph maintains pairwise accesses
• Prediction by Partial Match (PPM) [10]
– A trie maintains sequences of consecutive accesses
• LBOT [6]
– Special form of association rules of length 2
• Others (variations of the above) [3,11]
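The difference between the two data structures above can be illustrated with a small Python sketch. It is not the implementation from [9] or [10]: the function names, the `window` and `max_order` parameters, and the nested-dict trie layout are assumptions made for illustration only.

```python
from collections import defaultdict

def build_dependency_graph(sessions, window=2):
    """Count ordered pairs (a, b) where b follows a within `window` accesses.

    `window` is an assumed lookahead parameter; the slide only states that
    the graph maintains pairwise accesses.
    """
    counts = defaultdict(int)
    for session in sessions:
        for i, a in enumerate(session):
            for b in session[i + 1 : i + 1 + window]:
                counts[(a, b)] += 1
    return dict(counts)

def build_ppm_trie(sessions, max_order=2):
    """Build a trie over runs of consecutive accesses.

    Each node is a dict {"count": int, "children": {page: node}}; every
    position in a session starts a branch of at most max_order + 1 nodes.
    """
    root = {"count": 0, "children": {}}
    for session in sessions:
        for i in range(len(session)):
            node = root
            for page in session[i : i + max_order + 1]:
                node = node["children"].setdefault(
                    page, {"count": 0, "children": {}}
                )
                node["count"] += 1
    return root

# Hypothetical sessions: X is an unrelated visit between A and B.
sessions = [["A", "X", "B"], ["A", "B"]]
dg = build_dependency_graph(sessions, window=2)
trie = build_ppm_trie(sessions, max_order=2)
```

With these two sessions, the pairwise graph counts A followed by B in both sessions, while the trie records B directly after A only once, because it stores consecutive accesses only.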
Motivation
Method      Order (1st Req.)   Noise (2nd Req.)
DG          No                 Yes
PPM         Yes                No
LBOT        No                 No
Proposed    Yes                Yes
Proposed Method (1)
• Novel Web log mining algorithm (WMo)
– Apriori-like
– Effective
  • Immune to noise
  • Considers high order dependencies
– Efficient
  • Significant reduction in the number of candidates
Proposed Method (2)
• Session (or transaction): a sequence of requests that occur within a specified time interval of each other [2]
• The containment relationship addresses the 2nd requirement (avoiding noise)
• Example:
T = <A, X, B, Y, C> (X and Y are noise)
S = <A, B, C> (the pattern)
S is contained by T
• Comment: if only contiguous subsequences are counted toward support, S (the pattern) will be missed
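The session definition and the containment relationship above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the `(timestamp, page)` log format, and the reading of the time interval as a maximum gap between consecutive requests are assumptions.

```python
def split_sessions(requests, max_gap):
    """Split a time-ordered list of (timestamp, page) pairs into sessions.

    Assumption: a new session starts whenever the gap to the previous
    request exceeds `max_gap` (same time unit as the timestamps).
    """
    sessions = []
    last_ts = None
    for ts, page in requests:
        if last_ts is None or ts - last_ts > max_gap:
            sessions.append([])
        sessions[-1].append(page)
        last_ts = ts
    return sessions

def is_contained(pattern, session):
    """True if `pattern` is an ordered, possibly non-contiguous, subsequence."""
    remaining = iter(session)
    # `page in remaining` advances the iterator, so order is respected.
    return all(page in remaining for page in pattern)

def is_contiguous(pattern, session):
    """True only if `pattern` appears as an unbroken run inside `session`."""
    k = len(pattern)
    return any(
        list(session[i : i + k]) == list(pattern)
        for i in range(len(session) - k + 1)
    )

# Hypothetical log: one session with noise (X, Y), then a later short session.
log = [(0, "A"), (10, "X"), (20, "B"), (30, "Y"), (40, "C"),
       (5000, "A"), (5010, "B")]
sessions = split_sessions(log, max_gap=1800)
T, S = sessions[0], ["A", "B", "C"]
```

Here `is_contained(S, T)` holds while `is_contiguous(S, T)` does not, which is the case described in the comment on contiguous subsequences.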
Proposed Method (3)
• Candidate generation respects the ordering of accesses in transactions
• Example: <A, B> ≠ <B, A>
• This causes a dramatic increase in the number of candidates
• WMo exploits the site structure for pruning [7,8]
Proposed Method (4)
Algorithm genCandidates(Lk, G)
// Lk: the set of large k-paths; G: the graph
begin
  foreach L = <l1, …, lk>, L ∈ Lk {
    N+(lk) = {v | ∃ arc lk → v ∈ G}
    foreach v ∈ N+(lk) {
      // apply modified apriori pruning
      if v ∉ L and L' = <l2, …, lk, v> ∈ Lk {
        C = <l1, …, lk, v>
        if (∀ S ⊂ C, S ≠ L' ⇒ S ∈ Lk)
          insert C in the candidate-trie
      }
    }
  }
end
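A runnable Python sketch of genCandidates, under several assumptions that go beyond the slide: paths are tuples of page identifiers, G is a dict mapping each node to the set of its successors, a plain set stands in for the candidate-trie, and the pruning test is read as "every sequence S obtained by deleting one node from C, other than L', must be in Lk". Sequences S that are not paths in G are skipped, which is an interpretation rather than something the slide states. The names `gen_candidates` and `is_path` are hypothetical.

```python
def is_path(seq, graph):
    """True if every consecutive pair in `seq` is an arc of `graph`."""
    return all(b in graph.get(a, set()) for a, b in zip(seq, seq[1:]))

def gen_candidates(large_k, graph):
    """Extend each large k-path by one successor node and prune.

    large_k: set of tuples (the large k-paths, Lk)
    graph:   dict node -> set of successor nodes (the site graph, G)
    """
    candidates = set()  # stands in for the candidate-trie
    for path in large_k:
        # N+(lk): successors of the last node of the path
        for v in graph.get(path[-1], set()):
            suffix = path[1:] + (v,)  # L' in the pseudocode
            if v in path or suffix not in large_k:
                continue
            cand = path + (v,)  # C in the pseudocode
            # All length-k sequences obtained by dropping one node from C
            subs = (cand[:i] + cand[i + 1:] for i in range(len(cand)))
            if all(
                s in large_k
                for s in subs
                if s != suffix and is_path(s, graph)
            ):
                candidates.add(cand)
    return candidates

# Hypothetical site graph and large 2-paths.
site = {"A": {"B", "C"}, "B": {"C"}, "C": {"D"}}
large_2 = {("A", "B"), ("B", "C"), ("A", "C")}
result = gen_candidates(large_2, site)
```

In this example only <A, B, C> survives. Extensions ending in D are rejected because <C, D> is not large, and removing <A, C> from `large_2` would prune <A, B, C> as well, since A → C is an arc of the graph.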
Discussion
• Sequential patterns [1]
– Reduction when “customer-sequence” = “user-session”
– Suffers from a large number of candidates (by not considering the site structure)
• Path Fragments [4] (containment relationship is performed with regular expressions and the “*” label)
– Focus on semantics (recommendation systems)
• Prefetching: patterns are for system and not for human consumption
• WMo focuses on efficiency/effectiveness rather than on expressiveness (semantics)
Methodology
• Synthetic (sample site with 1000 nodes)
– Synthetic data generator (see the paper)
  • Modeling site nodes, site linkage, size of documents
• Real data sets (see the paper)
• Examine the impact of:
– noise
– order
– client cache (see the paper)
– efficiency
Accuracy w.r.t. noise
[Figure: accuracy (y-axis, 0.1 to 0.4) versus mean noise (x-axis, 1.6 to 3) for DG, PPM, WM, WMo and LBOT]
Usefulness w.r.t. noise
[Figure: usefulness (y-axis, 0.04 to 0.2) versus mean noise (x-axis, 1.6 to 3) for DG, PPM, WM, WMo and LBOT]
Traffic w.r.t. noise
[Figure: traffic (y-axis, 1.25 to 1.7) versus mean noise (x-axis, 1.6 to 3) for DG, PPM, WM, WMo and LBOT]
Accuracy w.r.t. order
[Figure: accuracy (y-axis, 0.1 to 0.4) versus higher order percentage (x-axis, 0.1 to 0.9) for DG, PPM, WM, WMo and LBOT]
Usefulness w.r.t. order
[Figure: usefulness (y-axis, 0.04 to 0.18) versus higher order percentage (x-axis, 0.1 to 0.9) for DG, PPM, WM, WMo and LBOT]
Traffic w.r.t. order
[Figure: traffic (y-axis, 1.35 to 1.65) versus higher order percentage (x-axis, 0.1 to 0.9) for DG, PPM, WM, WMo and LBOT]
Efficiency (see also [7,8])
[Figure: y-axis from 0 to 1.1e+006 versus support threshold (percentage) (x-axis, 0.08 to 0.26) for WM, WMo/wp and WMo]
Conclusions
• Factors that influence Web Prefetching
– Noise
– Order
• A new algorithm, WMo, based on data mining was presented
• It compares favorably with previously proposed algorithms
• WMo is an effective and efficient Web prefetching algorithm
References
1. R. Agrawal, R. Srikant, Mining Sequential Patterns, ICDE 1995.
2. R. Cooley, B. Mobasher, J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, KAIS, 1(1), pp. 5-32, 1999.
3. M. Deshpande, G. Karypis, Selective Markov Models for Predicting Web-page Accesses, SIAM Data Mining, 2001.
4. W. Gaul, L. T. Schmidt-Thieme, Mining Web Navigation Path Fragments, WebKDD 2000.
5. B. A. Huberman, P. Pirolli, J. Pitkow, R. J. Lukose, Strong Regularities in World Wide Web Surfing, Science, 280, pp. 95-97, 1998.
6. B. Lan, S. Bressan, B. C. Ooi, Y. Tay, Making Web Servers Pushier, WebKDD 1999.
7. A. Nanopoulos, Y. Manolopoulos, Finding Generalized Path Patterns for Web Log Data Mining, ADBIS-DASFAA 2000.
8. A. Nanopoulos, Y. Manolopoulos, Mining Patterns from Graph Traversals, DKE, 37(3), pp. 243-266, 2001.
9. V. Padmanabhan, J. Mogul, Using Predictive Prefetching to Improve World Wide Web Latency, ACM SIGCOMM Computer Communication Review, 26(3), 1996.
10. T. Palpanas, A. Mendelzon, Web Prefetching Using Partial Match Prediction, WCW 1999.
11. J. Pitkow, P. Pirolli, Mining Longest Repeating Subsequences to Predict World Wide Web Surfing, USITS, 1999.
12. L. T. Schmidt-Thieme, W. Gaul, Recommender Systems Based on Navigation Path Features, WebKDD 2001.