SeUGI 19, Florence
June 1st, 2001
Web clickstream analysis to
understand customer
behaviour
Erika Blanc and Paolo Giudici
UNIVERSITY OF PAVIA
SUMMARY
AIM: to combine sequence rules with statistical association
methods to obtain valuable information on e-consumer behaviour
from logfile data, and to show how this combined analysis
can be carried out in SAS and SAS Enterprise Miner.
In the presentation we shall show:
• The available data (source: SAS Italy)
• Use of support and confidence rules
• Use of odds ratios and corresponding confidence intervals
• Symmetric models for sequences (graphical loglinear models)
• Asymmetric models for sequences (probabilistic expert systems)
THE DATA
Logfile of an e-commerce site. Below are some of the
250711 observations describing visit behaviour of 22527
visitors to the 36 pages of the site.
DERIVED DATASET
For each visitor, 36 binary variables describing visit/non visit to each
page. This dataset, where order of visits is lost, will be used to calculate
association rules and statistical measures.
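The derivation described above can be sketched in a few lines of Python. This is only an illustration, not the SAS code used for the presentation: the function name `derive_binary_dataset`, the `(visitor, page)` record layout and the toy log are all assumptions made for the example.

```python
from collections import defaultdict


def derive_binary_dataset(log_records, pages):
    """Return {visitor: {page: 0 or 1}} from (visitor, page) click records.

    The order and the number of visits are deliberately discarded:
    only visit / non visit to each page is kept.
    """
    visited = defaultdict(set)
    for visitor, page in log_records:
        visited[visitor].add(page)
    return {
        visitor: {page: int(page in seen) for page in pages}
        for visitor, seen in visited.items()
    }


# Hypothetical toy log (the real logfile has 250711 records and 36 pages).
toy_log = [
    ("v1", "login"),
    ("v1", "product"),
    ("v1", "login"),
    ("v2", "product"),
    ("v2", "addcart"),
]
toy_pages = ["login", "product", "addcart"]

binary = derive_binary_dataset(toy_log, toy_pages)
for visitor, row in binary.items():
    print(visitor, row)
```

Each visitor ends up with one row of 0/1 indicators, one per page, which is the shape needed for association rules and odds ratios.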
CLASSICAL SEQUENCE AND
ASSOCIATION RULES
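• $\mathrm{Support}(A \Rightarrow B) = \dfrac{N_{A \Rightarrow B}}{N}$
• $\mathrm{Confidence}(A \Rightarrow B) = \dfrac{N_{A \Rightarrow B}}{N_A} = \dfrac{\mathrm{support}(A \Rightarrow B)}{\mathrm{support}(A)}$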
The previous rules, implemented in SAS Enterprise Miner,
are applicable to both associations and sequences, though with
a different meaning in each case.
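A minimal Python sketch of the two measures follows, to show how the same formulas give different numbers for associations and for sequences. It assumes that, for an association, N_{A⇒B} counts visitors who visited both pages in any order, and that, for a 2-sequence, it counts visitors who visited B at some point after A. These counting conventions, the function names and the toy click paths are assumptions of the example and are not taken from Enterprise Miner.

```python
def association_count(sessions, a, b):
    """Number of visitors whose click path contains both a and b (any order)."""
    return sum(1 for s in sessions if a in s and b in s)


def sequence_count(sessions, a, b):
    """Number of visitors who visit b at some point after their first visit to a."""
    return sum(1 for s in sessions if a in s and b in s[s.index(a) + 1:])


def support(n_ab, n):
    """Support(A => B) = N_{A=>B} / N."""
    return n_ab / n


def confidence(n_ab, n_a):
    """Confidence(A => B) = N_{A=>B} / N_A."""
    return n_ab / n_a


# Hypothetical click paths, one ordered list of pages per visitor.
sessions = [
    ["login", "product", "addcart"],
    ["product", "login"],
    ["login", "cart"],
    ["product"],
]
n = len(sessions)
n_a = sum(1 for s in sessions if "login" in s)

assoc_n_ab = association_count(sessions, "login", "product")
seq_n_ab = sequence_count(sessions, "login", "product")

print("association:", support(assoc_n_ab, n), confidence(assoc_n_ab, n_a))
print("sequence:   ", support(seq_n_ab, n), confidence(seq_n_ab, n_a))
```

In the toy data the second visitor saw both pages but in the opposite order, so that visitor counts towards the association rule and not towards the sequence rule.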
HIGHEST CONFIDENCE 2-SEQUENCE
RULES FOUND IN THE DATA
HIGHEST CONFIDENCE
ASSOCIATION RULES FOUND IN THE
DATA
HIGHEST CONFIDENCE N-SEQUENCE
RULES FOUND IN THE DATA
GRAPHICAL REPRESENTATION FOR
SEQUENCE RULES
STATISTICAL ASSOCIATION MEASURES:
ODDS RATIOS
$$\theta_R = \frac{\theta_1}{\theta_0} = \frac{P(Y = 1 \mid X = 1)\,/\,P(Y = 0 \mid X = 1)}{P(Y = 1 \mid X = 0)\,/\,P(Y = 0 \mid X = 0)}$$
Interpretation:
• $\theta_R > 1$: POSITIVE ASSOCIATION
• $\theta_R = 1$: NO ASSOCIATION
• $\theta_R < 1$: NEGATIVE ASSOCIATION
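As an illustration of the definition, the sample odds ratio for two binary page indicators can be computed from the four cell counts of their 2x2 table. The sketch below is an assumption-laden example: the function name `odds_ratio`, the toy vectors and the choice to estimate each probability by a relative frequency are ours, and the code does not handle empty cells (a zero count would raise a division error).

```python
def odds_ratio(x, y):
    """Sample odds ratio theta_R = theta_1 / theta_0 for two 0/1 sequences.

    theta_1 = odds of Y = 1 among visitors with X = 1
    theta_0 = odds of Y = 1 among visitors with X = 0
    """
    pairs = list(zip(x, y))
    n11 = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)
    n10 = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)
    n01 = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)
    n00 = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)
    theta_1 = n11 / n10
    theta_0 = n01 / n00
    return theta_1 / theta_0


# Hypothetical indicators for ten visitors: x = visited page X, y = visited page Y.
x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

theta_r = odds_ratio(x, y)
print("odds ratio:", theta_r)
print("positive association" if theta_r > 1 else "no or negative association")
```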
ODDS RATIOS: AN EXAMPLE
GRAPHICAL REPRESENTATION
FOR ODDS RATIOS
For odds ratios, confidence intervals can be built, giving a more reliable
evaluation of associations.
If the confidence interval for an odds ratio contains the value 1, the
association is not significant. Therefore, in the graphical representation,
NO link will be inserted between the two corresponding nodes.
Otherwise, if the value 1 lies outside the confidence interval for the
odds ratio, a link can be inserted. However, we shall insert only links
describing positive associations.
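The link rule above can be sketched as follows. The slides do not say which interval was used, so the example assumes the common large-sample (Wald) interval on the log odds ratio, with standard error sqrt(1/n11 + 1/n10 + 1/n01 + 1/n00) and z = 1.96 for a 95% level. The function names and cell counts are hypothetical, and all four counts must be positive.

```python
import math


def odds_ratio_ci(n11, n10, n01, n00, z=1.96):
    """Return (odds ratio, lower bound, upper bound) using a Wald interval.

    n11, n10, n01, n00 are the cell counts of the 2x2 table of two pages;
    the interval is built on the log scale and then exponentiated.
    """
    theta = (n11 * n00) / (n10 * n01)
    se_log = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    lower = math.exp(math.log(theta) - z * se_log)
    upper = math.exp(math.log(theta) + z * se_log)
    return theta, lower, upper


def insert_link(n11, n10, n01, n00, z=1.96):
    """True only for a significant positive association.

    The whole interval must lie above 1: intervals containing 1 are not
    significant, and intervals entirely below 1 are negative associations.
    """
    _, lower, _ = odds_ratio_ci(n11, n10, n01, n00, z)
    return lower > 1


# Hypothetical cell counts for three pairs of pages.
print(insert_link(30, 10, 10, 50))  # strong positive association
print(insert_link(10, 10, 10, 10))  # odds ratio equal to 1
print(insert_link(10, 30, 30, 10))  # negative association
```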
GRAPHICAL REPRESENTATION FOR
ODDS RATIOS
COMPARISON:
ODDS RATIOS - SEQUENCE RULES
Odds ratios are symmetric (no order) and measure associations.
Can be easily accompanied by inferential models (e.g.
confidence intervals), giving a variability assessment.
Sequence rules can be asymmetric (order taken into account)
and measure dependencies. Cannot be related (yet) to
inferential models. We are working on this.
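The symmetry contrast can be checked numerically. The short sketch below uses hypothetical visit indicators for two pages A and B and shows that swapping the pages leaves the odds ratio unchanged, while the confidence of the rule changes with the direction. For simplicity it uses association-style confidence on unordered data; the function names and data are assumptions of the example.

```python
def cell_counts(x, y):
    """Return (n11, n10, n01, n00) for two 0/1 sequences of equal length."""
    pairs = list(zip(x, y))
    n11 = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)
    n10 = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)
    n01 = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)
    n00 = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)
    return n11, n10, n01, n00


def odds_ratio(x, y):
    """Cross-product form of the odds ratio: n11 * n00 / (n10 * n01)."""
    n11, n10, n01, n00 = cell_counts(x, y)
    return (n11 * n00) / (n10 * n01)


def confidence(x, y):
    """Confidence(X => Y) = N_{XY} / N_X."""
    n11, n10, _, _ = cell_counts(x, y)
    return n11 / (n11 + n10)


# Hypothetical indicators for ten visitors.
a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
b = [1, 1, 1, 0, 1, 1, 1, 1, 0, 0]

print("odds ratio (A,B):", odds_ratio(a, b), " (B,A):", odds_ratio(b, a))
print("confidence A=>B:", confidence(a, b), " B=>A:", confidence(b, a))
```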
COMPARISON: ODDS RATIOS - CONFIDENCE MEASURES FOR
ASSOCIATIONS
Page association (A,B)   Odds ratio   Confidence (A ⇒ B) (%)   Confidence (B ⇒ A) (%)
freeze*pay_req           2041,72      67,33                    99,56
pay_req*pay_res          1876,40      66,77                    99,22
addcart*freeze           1616,54      78,23                    99,40
download*shelf           911,53       99,27                    41,56
addcart*pay_req          686,88       52,88                    99,35
freeze*pay_res           629,30       45,12                    99,15
register*regpost         543,99       65,81                    98,57
addcart*pay_res          289,27       35,41                    98,88
p_info*product           141,14       99,71                    57,05
login*logpost            22,39        68,04                    85,14
download*pay_re          18,34        58,35                    43,25
addcart*product          13,18        97,97                    36,93
logpost*pay_req          11,73        39,88                    79,14
freeze*product           11,10        97,85                    29,03
download*logpost         10,92        81,89                    20,59
download*pay_re          10,22        60,39                    30,12
cart*pay_req             9,44         49,63                    54,87
logpost*pay_res          9,11         26,45                    78,01
freeze*logpost           8,63         70,26                    52,35
PROPOSAL: A GRAPHICAL
REPRESENTATION BASED ON ODDS
RATIOS AND CONFIDENCE RULES
PROPOSAL: SYMMETRIC
GRAPHICAL MODELS FOR
CLICKSTREAM ANALYSIS
SELECTED SYMMETRIC GRAPHICAL
MODEL
PROPOSAL: DIRECTED GRAPHICAL
MODEL (PROBABILISTIC EXPERT
SYSTEM)
ACKNOWLEDGEMENTS
This work has been carried out as an internship project of Erika
Blanc, Master’s student at the University of Pavia, jointly
supervised by Sabina Silani (SAS) and Paolo Giudici.
We also thank SAS Italy for having supported us with the
data as well as with the software Enterprise Miner.
For more details on the presentation, please see:
Applied statistical methods for data mining, lecture notes
by Paolo Giudici, [email protected]
The applied research activity of our group on data mining
and risk management can be found at:
www.baystat.it/giudici/index.htm
Web clickstream analysis to understand customer behaviour
Paolo Giudici, University of Pavia, [email protected]
Erika Blanc, University of Pavia, [email protected]
With the increased competition and decreased loyalty inherent in e-commerce, it is more
imperative than ever for companies to gain, retain and grow their Web stakeholders
(customers, prospects, partners, staff, etc.). While most companies are readily aware of the
technical processes involved in setting up and maintaining a Web site, they may not know
what their visitors love or hate about their site and can only guess how the site could be
improved. With this weak state of information, it is difficult for the companies to personalize
their relationship with their stakeholders. For this reason we believe companies should
implement an e-intelligence process which involves Web server planning, click-stream
behaviour, visitor profiling and purchase prediction.
In this paper we focus on a case study, developed with SAS Enterprise Miner, providing an
example of clickstream analysis that shows the advantageous information that can be
extracted from such a process.
The case study also gives an opportunity to emphasize the importance of mining reliable
association and sequence rules, as these can be of strong relevance for understanding customer
behaviour. From this viewpoint we illustrate recent research work of ours
concerning the comparison of classical association and sequence rules with statistical
methods for associations. In particular we introduce graphical models, as developed in the
artificial intelligence/probabilistic expert systems literature. We show how the latter can
bring very useful information on web customer behaviour, and illustrate ways to practically
implement them in SAS. We also compare them with what is currently implemented
in Enterprise Miner for clickstream analysis.
We finally remark that the case study has been developed during a Master's degree internship
project of Erika Blanc at the University of Pavia, supervised jointly by Sabina Silani (SAS) and
Paolo Giudici (University of Pavia). We acknowledge SAS Italy for both the data and the
use of Enterprise Miner.