Tallinn University of Technology
Department of Computer Engineering
Applying User Profile Ontology for Mining Web Site Adaptation Recommendations
Tarmo Robal, Ahto Kalja
Outline
• Introduction
» Web Mining & Adaptive Web Sites
» Recommender Systems
• Web Usage Data Capturing
• User Profiles Extraction
• Recommendations Generation
• Summary
Tarmo Robal, Ahto Kalja
ADBIS'07, Bulgaria, Varna 29.09-03.10.2007
Introduction
• The electronic age
» Internet – enormous source of information
» Competition over users
» Browsing affected by many factors
• System feedback
» What is actually going on within the system
» Observe users’ actions & preferences
Constant need for web improvement!
Reaching the Aim
• Make browsing easier - better user experience
• Collect usage data
» Exploit a log system
• Apply web mining techniques on the collected data to:
» Analyse & Reason
• Employ the mining results
» Construct users’ profile ontology
» Adaptive websites & Recommender systems
• Continue collecting usage data
Introduction
• Research based on the access data of our department’s website
» Dynamic website
» Run by a system kernel developed at our lab
» Contains 118 pages
» Average access rate: 250 sessions daily
» Average number of operations per session: 1.9 (4.3 in sessions with more than 1 page request)
» http://ati.ttu.ee
Web Mining
... is the use of data mining techniques to automatically discover and extract information from Web documents and services (Perkowitz and Etzioni 2001)
• Content Mining: discovery of document content patterns
• Structure Mining: discovery of hypertext/linking structure patterns
• Usage Mining: discovery of access patterns
• Profile Mining: discovery of user profiles
Adaptive Websites
... sites that automatically improve their organization and presentation by learning from visitor access patterns
• Tactical
» Adaptations triggered in real time
» Adding value to provided information
» Highlighting items
» Recommending items
» Easier browsing
• Strategic
» Adaptations triggered on the structure
» Offline & with approval
Towards enhanced web experience!
Recommender Systems
• To assist users during browsing
• Improved user experience
• More relevant information for the user
• Based on site’s usage:
» Transparent, i.e. general
» Personalized (i-banking)
• Why?
» Constant competition over rating
» Marketing, e-commerce, information portals, ...
Recommender Systems
• Users implicitly use a concept model based on their own knowledge of the domain or topic searched, even though mostly they do not know how to represent it! (Li & Zhong)
• If we are able to track down users’ actions, we are also able to produce dynamically discovered recommendations
• Step towards intelligent web
• Basis for adaptive web
Collecting Web Usage Data
• Explicit data collection
• Implicit data collection
» Transparent to end-user
» Monitor accessed pages
» Time spent on a particular page
» Discover navigational paths
• Need for a special log system
» Ability to capture distinct and recurrent user sessions
Web Server Logs NOT Suitable?
• Reasons:
» suffer from insufficiencies
» do not allow visitor sessions to be identified
» impossible to track recurrent visits
» no information about users’ screen resolution
» are not kept for a long period of time
» are of large size
» a lot of detailed information about every element accessed on the web server
The Log System
• Data collected:
» Page requested
» Client identifier (session ID)
» Request time
» IP and host
» Browser and OS
» Query method and query string
» Site referrer
» Page load time and server load
» Recurrent visit ID (session ID)
» Screen resolution
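The collected fields can be pictured as one record per page request, from which per-session navigational paths are later built. The following is a minimal sketch under assumptions: the class name `LogRecord`, its field names, and the helper `build_paths` are hypothetical and only cover a subset of the fields listed on the slide; the actual log system's schema is not shown in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LogRecord:
    """One captured page request (hypothetical field names)."""
    page_id: int              # page requested
    session_id: str           # client identifier (session ID)
    request_time: datetime    # request time
    referrer: str = ""        # site referrer
    visitor_id: str = ""      # recurrent visit ID
    screen_resolution: str = ""


def build_paths(records: Iterable[LogRecord]) -> Dict[str, List[int]]:
    """Group requests by session and order them by time to get one path per session."""
    by_session: Dict[str, List[LogRecord]] = defaultdict(list)
    for rec in records:
        by_session[rec.session_id].append(rec)
    return {
        sid: [r.page_id for r in sorted(recs, key=lambda r: r.request_time)]
        for sid, recs in by_session.items()
    }
```

Ordering by request time inside each session is what turns raw log rows into the navigational paths used in the next step.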
User Profiles Extraction
• Construct user navigational paths from session data: $s = \langle p_i, p_{i+1}, \ldots, p_n \rangle$, $p_i \in P$
» 269 782 paths
• Apply further processing
» 87 953 paths
• Apply the Locality Model onto discovered paths
» Extract localities $L = p_j, p_{j+1}, \ldots, p_m$, where $p_j \neq p_{j+1} \neq \ldots \neq p_m$
» Size of locality window w?
The Locality Model
• If a large number of users frequently access a set of pages, then these pages must be related
• The locality L is defined by the user’s nearest sequential activity history within the site during a session
• L is constructed based on navigational paths
• Users are moving from one locality L to another, which can be represented by the w latest operations (requests for pages)
• Example with w=3: a window of size w slides along the path 100 – 400 – 410 – 400 – 410 – 4110 – 410 – 460 – 430 ...
» L=CalculateLocality(st,w)
» N=FindNextItem(st,w)
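The slide names two operations, CalculateLocality and FindNextItem, without defining them. The sketch below shows one plausible reading, assuming that the locality at position t is the w most recent page requests ending at t, and that the next item is the request immediately after t. The Python names, the explicit index `t`, and the signatures are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def calculate_locality(path: Sequence[int], t: int, w: int) -> Optional[Tuple[int, ...]]:
    """Return the w latest page requests ending at index t, or None if fewer exist."""
    if w <= 0 or t < w - 1 or t >= len(path):
        return None
    return tuple(path[t - w + 1 : t + 1])


def find_next_item(path: Sequence[int], t: int) -> Optional[int]:
    """Return the page requested right after index t, or None at the end of the path."""
    return path[t + 1] if 0 <= t + 1 < len(path) else None
```

For the example path above with w=3, the locality ending at the third request is (100, 400, 410) and the next item is 400.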
The Locality Model
• What’s the size of w?
» w has to cover a reasonable amount of page requests
• Attributes observed:
» cover percentage for the number of combinations computed from the paths
» average frequency of finding these combinations in paths
» average number of possible localities in path
» the availability of next item for each locality (progress)
• The size of w is correlated to the absolute menu depth

Properties observed                  Studied window size w
                                     2      3      4      5
(1) Combination coverage [%]         31.2   35.5   20.7   12.6
(2) Combination frequency            1.1    1.0    1.0    1.0
(3) No of localities in path         6.3    6.6    6.5    5.9
(4) Availability of next item [%]    76.6   77.4   74.1   76.3
User Profiles Extraction
• User Session from DW
• Navigational path sequence construction
» 100 – 400 – 410 – 410 – 400 – 410 – 4110 – 410 – 460 – 430 – 430
• Path minimization: removal of redundant operations
» 100 – 400 – 410 – 400 – 410 – 4110 – 410 – 460 – 430
• Filtering of non-relevant paths (e.g. paths with 1 item)
» 100 – 400 – 410 – 400 – 410 – 4110 – 410 – 460 – 430
• Extracting localities L with size w
» 100 – 400 – 410
» 400 – 410 – 400
» 410 – 400 – 410
» 400 – 410 – 4110
» 410 – 4110 – 410
» 4110 – 410 – 460
» 410 – 460 – 430
• Removal of cyclic localities
» 100 – 400 – 410
» 400 – 410 – 4110
» 4110 – 410 – 460
» 410 – 460 – 430
• Extracted user profiles
• Ontology
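The extraction steps on this slide can be sketched as a small pipeline. This is an illustration under assumptions, not the authors' implementation: "redundant operations" is read as immediately repeated requests for the same page, and a "cyclic" locality as a window in which some page occurs more than once. Both readings are consistent with the slide's example. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def minimize_path(path: Sequence[int]) -> List[int]:
    """Drop immediately repeated requests for the same page."""
    minimized: List[int] = []
    for page in path:
        if not minimized or minimized[-1] != page:
            minimized.append(page)
    return minimized


def extract_localities(path: Sequence[int], w: int) -> List[Tuple[int, ...]]:
    """Slide a window of size w over the path, one request at a time."""
    return [tuple(path[i : i + w]) for i in range(len(path) - w + 1)]


def is_cyclic(locality: Sequence[int]) -> bool:
    """Assumed meaning of 'cyclic': some page occurs more than once in the window."""
    return len(set(locality)) < len(locality)


def localities_from_session(session: Sequence[int], w: int = 3) -> List[Tuple[int, ...]]:
    """Minimize the path, filter trivial paths, then keep the non-cyclic localities."""
    path = minimize_path(session)
    if len(path) < 2:  # non-relevant path, e.g. a path with 1 item
        return []
    return [loc for loc in extract_localities(path, w) if not is_cyclic(loc)]
```

Applied to the session shown on the slide with w=3, this yields the four localities that remain after the removal of cyclic localities.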
User Profiles Ontology
[Figure: user profiles ontology. Labels:]
• Frequent user profiles discovered from web usage
• Predefined user profile classes
• Mapping of Extracted Profiles onto Concepts of Web Ontology
• Concepts of Web Ontology for ati.ttu.ee
Inferred Ontology
[Figure: inferred ontology. Labels:]
• Definitions for predefined user profile classes
• Profiles inferred for predefined user profile class
• Users profiled as Students are interested in ...
Producing Recommendations
• The Recommendation Engine (RE) determines the type of user online
• RE computes recommendations from:
» User’s recent actions
» Knowledge from ontologies
» Page ranking
• Pages ranked with inverse time weighting algorithm
$$\mathrm{Rank}(p) = \sum_{i=1}^{n} \frac{\text{Interest value}(i)}{\mathrm{Age}(i)}$$
» Interest value(i): No. of hits during Age(i)
» Age(i): Days into the past
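The inverse time weighting can be illustrated with a short sketch. It assumes that the hit counts for a page are available per age period, keyed by the age in days (at least 1), so that older hits contribute less to the rank. The function name `rank_page` and the input layout are assumptions made for illustration.

```python
from typing import Mapping


def rank_page(hits_by_age: Mapping[int, float]) -> float:
    """Sum hits(i) / age(i) over all observed ages.

    hits_by_age maps an age in days (>= 1) to the number of hits in that period.
    """
    total = 0.0
    for age_days, hits in hits_by_age.items():
        if age_days <= 0:
            raise ValueError("age must be a positive number of days")
        total += hits / age_days
    return total
```

For example, a page with 10 hits one day ago, 4 hits two days ago and 8 hits four days ago gets the rank 10/1 + 4/2 + 8/4 = 14.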
Producing Recommendations
[Diagram: recommendation generation process. Components shown:]
• Web Site
• LOG SYSTEM
• Usage Data Capturing
• Data Mining
• Detection of Locality Window Size w
• Extracted User Profiles
• Web Ontology
• Web Site Ontology
• MAPPING
• Profiles Ontology
• Ranked Pages
• The Locality of User Online (recent w actions)
• Recommendation Engine (RE)
• Tactical Adaptation
• Recommended Sub-Topology
• Strategic Adaptation
• Refined Topology
Tactical Recommendations
• Raising / highlighting items during user’s online session
• Adding recommended items to existing topology
• Providing sub-topologies for targeted user groups
• Enhanced (semi-personalized) User Experience
Strategic Recommendations
• Deriving recommendations for general site improvement to adjust sites to their users’ preferences
• Long-term
• Discovering related page-sets according to users’ preferences
• Improved Site Structure
Conclusions
• Monitoring users’ actions and building concept models from them makes it possible to:
» Classify a user as an individual into one of the conceptual user groups (predefined user profiles)
» Produce recommendations that correlate to that particular individual
» Tactical recommendations
» Strategic recommendations
Summary
• Introduction
» Web Mining & Adaptive Web Sites
» Recommender Systems
• Web Usage Data Capturing
• User Profiles Extraction
• Recommendations Generation
• Summary
Tallinn University of Technology
Department of Computer Engineering
Thank you!
Questions?