Web Usage Patterns
Ryan McFadden
IST 497E
December 5, 2002
Introduction
- Web Data Mining
- Application Areas of Web Data Mining
- Problems with Web Data Mining
- Current Research
- Nielsen//NetRatings
- Other Issues: Privacy, Security, etc.
- Conclusions
Web Data Mining
- Web Data Mining is the application of data mining techniques to discover and retrieve useful information and patterns from World Wide Web documents and services
What Web data is being mined?
- Content: data from Web documents (text and graphics)
- Structure: data describing the structure of the Web, e.g. HTML or XML tags
- Usage: data from Web server logs, e.g. IP addresses and the date and time of access
- User Profile: user-specific data, e.g. registration and customer profile information
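To make the Usage category concrete, here is a minimal Python sketch that extracts the IP address, access time, and requested page from one server log line. It assumes the server writes the NCSA Common Log Format; the sample line, IP address, and page path are made up for illustration, and other log formats would need a different pattern.

```python
import re
from datetime import datetime

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Extract IP, access time, method, path, and status; return None if the line does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return {
        "ip": m.group("ip"),
        # e.g. 05/Dec/2002:10:15:32 -0500 (%b assumes English month abbreviations, the default C locale)
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": m.group("method"),
        "path": m.group("path"),
        "status": int(m.group("status")),
    }

# Hypothetical log line (documentation IP address, made-up page)
sample = '192.0.2.7 - - [05/Dec/2002:10:15:32 -0500] "GET /products.html HTTP/1.0" 200 2326'
record = parse_clf_line(sample)
print(record["ip"], record["time"].isoformat(), record["path"])
```

Lines that do not match the pattern return None, so malformed or differently formatted entries can be skipped during pre-processing.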
Web Data Mining Process
Web Data Mining Process Tasks
- Resource finding: the task of retrieving intended Web documents
- Information selection and pre-processing: automatically selecting and pre-processing specific information from retrieved Web resources
- Generalization: automatically discovering general patterns at individual Web sites as well as across multiple sites
- Analysis: validation and/or interpretation of the mined patterns
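As an illustration of the information selection and pre-processing task, the sketch below groups parsed log records into user sessions. It relies on two assumptions not stated in the slides: that an IP address identifies a single visitor, and that a gap of more than 30 minutes between requests starts a new session, a commonly used heuristic. Both are simplifications; proxies and shared machines break the first. The IP addresses and pages are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed heuristic: more than 30 idle minutes from the same IP starts a new session
SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (ip, timestamp, path) records into (ip, [paths]) sessions."""
    by_ip = defaultdict(list)
    for ip, ts, path in records:
        by_ip[ip].append((ts, path))
    sessions = []
    for ip, hits in by_ip.items():
        hits.sort()  # chronological order for this visitor
        current = [hits[0][1]]
        for (prev_ts, _), (ts, path) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((ip, current))  # idle gap too long: close the session
                current = []
            current.append(path)
        sessions.append((ip, current))
    return sessions

t0 = datetime(2002, 12, 5, 9, 0)
records = [
    ("192.0.2.7", t0, "/index.html"),
    ("192.0.2.7", t0 + timedelta(minutes=2), "/products.html"),
    ("192.0.2.7", t0 + timedelta(hours=2), "/index.html"),
    ("198.51.100.4", t0 + timedelta(minutes=5), "/faq.html"),
]
for ip, pages in sessionize(records):
    print(ip, pages)
```

The first visitor's return two hours later becomes a separate session, giving three sessions in total.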
Application Areas for Web Usage Mining
- Personalization
- System Improvement
- Site Modification
- Business Intelligence
- Usage Characterization
Personalization
- Personalizing the Web experience for a user is the holy grail of many Web-based applications
- Dynamic recommendations to a Web user based on a profile in addition to usage behavior
- Tailoring products, services, or information (including information about products or services) to the individual user
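One simple way to produce usage-based recommendations is to count which pages appear together in past sessions and suggest unseen pages that co-occur most often with the pages in the current session. The Python sketch below shows that co-occurrence technique; it uses only usage behavior (no user profile), and the page names and sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(sessions):
    """Count, for each ordered pair of distinct pages, the sessions containing both."""
    co = Counter()
    for pages in sessions:
        for a, b in combinations(sorted(set(pages)), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return co

def recommend(current_pages, co, k=2):
    """Rank pages not yet seen by their co-occurrence with the current session's pages."""
    seen = set(current_pages)
    scores = Counter()
    for (a, b), n in co.items():
        if a in seen and b not in seen:
            scores[b] += n
    return [page for page, _ in scores.most_common(k)]

sessions = [
    ["/laptops.html", "/cart.html"],
    ["/laptops.html", "/cart.html", "/warranty.html"],
    ["/laptops.html", "/warranty.html"],
    ["/laptops.html", "/cart.html"],
    ["/faq.html", "/index.html"],
]
co = build_cooccurrence(sessions)
print(recommend(["/laptops.html"], co))
```

A visitor currently on the laptops page is pointed first to the cart (3 shared sessions) and then to the warranty page (2).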
System Improvement
- Performance and other quality-of-service attributes are crucial to user satisfaction; users expect high-quality performance from a Web application
- Mining Web usage patterns is key to understanding Web traffic behavior, which can inform policies for Web caching, network transmission, load balancing, and data distribution
- Web usage mining is also useful for detecting intrusions, fraud, and attempted break-ins to the system
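As a minimal example of using log data for intrusion detection, the sketch below flags client IPs with an unusually large number of failed authentication attempts (HTTP 401 responses). The threshold of five failures is an arbitrary assumption, and a practical detector would also consider time windows and other signals; this is a simple threshold rule, not a complete intrusion detection method. The records are hypothetical (ip, status) pairs such as a log parser might produce.

```python
from collections import Counter

# Hypothetical threshold: more than this many 401 responses flags an IP
MAX_FAILED_LOGINS = 5

def suspicious_ips(records, limit=MAX_FAILED_LOGINS):
    """Return, sorted, the IPs whose count of HTTP 401 responses exceeds `limit`."""
    failures = Counter(ip for ip, status in records if status == 401)
    return sorted(ip for ip, n in failures.items() if n > limit)

records = [("203.0.113.9", 401)] * 8 + [("192.0.2.7", 401), ("192.0.2.7", 200)]
print(suspicious_ips(records))
```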
Site Modification
- This application of Web usage patterns concerns the attractiveness of a Web site in terms of its content and structure
- Web usage mining can provide detailed feedback on user behavior, giving the Web site designer information on which to base redesign decisions
- This could lead to future applications in which the structure and content of a Web site are modified based on usage patterns
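One kind of feedback a designer can draw from sessionized usage data is each page's exit rate: the fraction of its views that were the last page of a session. A page with a high exit rate is a possible (not certain) candidate for redesign. The sketch below computes this metric; the sessions are hypothetical.

```python
from collections import Counter

def exit_rates(sessions):
    """For each page, the fraction of its page views that ended a session."""
    views, exits = Counter(), Counter()
    for pages in sessions:
        views.update(pages)
        if pages:
            exits[pages[-1]] += 1  # last page viewed in this session
    return {page: exits[page] / views[page] for page in views}

sessions = [
    ["/index.html", "/shipping.html"],
    ["/index.html", "/products.html", "/shipping.html"],
    ["/index.html", "/products.html", "/cart.html"],
]
rates = exit_rates(sessions)
print(rates)
```

Here every visit to the shipping page ends the session, which might prompt a designer to look at what that page offers next.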
Business Intelligence
- Information on how customers use a Web site is critical for marketers in e-commerce businesses
- Customer relationship life cycle:
  - Customer attraction
  - Customer retention
  - Cross-selling
  - Customer departure
- Usage mining can provide information on products bought and on advertisement click-through rates
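Click-through rate is the number of clicks on an advertisement divided by the number of times it was shown (impressions). The sketch below computes it per ad from a stream of events; the event labels `impression` and `click` and the ad identifiers are hypothetical.

```python
from collections import Counter

def click_through_rates(events):
    """Compute clicks / impressions for each ad from (ad_id, event_type) pairs."""
    impressions, clicks = Counter(), Counter()
    for ad_id, event in events:
        if event == "impression":
            impressions[ad_id] += 1
        elif event == "click":
            clicks[ad_id] += 1
    # Only ads that were shown at least once get a rate
    return {ad: clicks[ad] / n for ad, n in impressions.items()}

events = (
    [("banner-a", "impression")] * 200 + [("banner-a", "click")] * 5
    + [("banner-b", "impression")] * 50 + [("banner-b", "click")] * 2
)
print(click_through_rates(events))
```

Banner A gets more clicks in absolute terms, but banner B has the higher rate (4% versus 2.5%).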
Usage Characterization
- Mining Web usage patterns can help in studying how browsers are used and how users interact with a browser interface
- Usage characterization can also examine the navigational strategies users follow when browsing a particular site
- Web usage mining focuses on techniques that can predict user behavior while the user interacts with the Web
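A simple technique for predicting where a user will go next is a first-order Markov model: count page-to-page transitions in past sessions and predict the most frequent successor of the current page. The sketch below assumes sessions have already been built from the logs; the page names are hypothetical, and this is one illustrative model rather than the method used by any particular system.

```python
from collections import Counter, defaultdict

def train_transitions(sessions):
    """Count page-to-page transitions within each session (first-order Markov model)."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, page):
    """Return the most frequent next page after `page` and its estimated probability."""
    counts = transitions.get(page)
    if not counts:
        return None, 0.0  # page never seen with a successor
    nxt, n = counts.most_common(1)[0]
    return nxt, n / sum(counts.values())

sessions = [
    ["/index.html", "/products.html", "/cart.html"],
    ["/index.html", "/products.html", "/reviews.html"],
    ["/index.html", "/products.html", "/cart.html"],
    ["/index.html", "/faq.html"],
]
model = train_transitions(sessions)
print(predict_next(model, "/index.html"))
```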
Problems with Web Data Mining
- The World Wide Web is a huge, diverse, and dynamic medium for disseminating information; there may be too much information to mine (information overload), and much of it is irrelevant and not indexed
- Other problems with Web Data Mining:
  - Finding relevant information to mine
  - Personalization and mass customization are difficult
  - E-commerce businesses have to know what their customers want
Current Research
- WebSIFT example
- Data Mining for Intelligent Web Caching
- Areas of Future Research
WebSIFT Example
- Web Site Information Filter System (WebSIFT) is a Web usage mining framework that uses content and structure information from a Web site to identify interesting results from mining usage data
- Inputs to the mining process: server logs (access, referrer, and agent), HTML files, and optional data
Prototypical Web usage mining system
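The idea of using site structure to filter mining results can be illustrated with a small sketch: find page pairs that frequently occur in the same session but are not directly linked to each other, since such pairs may reveal navigation that the site's structure does not anticipate. This sketch is in the spirit of WebSIFT's use of structure information, not a reproduction of its actual filtering algorithm; the hyperlink set, sessions, and 50% support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(sessions, min_support):
    """Page pairs that co-occur in at least `min_support` (a fraction) of the sessions."""
    counts = Counter()
    for pages in sessions:
        for pair in combinations(sorted(set(pages)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n / len(sessions) >= min_support}

def unlinked_frequent_pairs(sessions, links, min_support=0.5):
    """Frequent pairs with no hyperlink in either direction: candidate 'interesting' patterns."""
    return sorted(
        (a, b) for a, b in frequent_pairs(sessions, min_support)
        if (a, b) not in links and (b, a) not in links
    )

# Hypothetical site structure, e.g. extracted from the site's HTML files
links = {("/index.html", "/products.html"), ("/products.html", "/cart.html")}
sessions = [
    ["/index.html", "/products.html", "/cart.html"],
    ["/index.html", "/products.html", "/cart.html"],
    ["/index.html", "/cart.html"],
    ["/index.html", "/faq.html"],
]
print(unlinked_frequent_pairs(sessions, links))
```

Visitors often reach the cart in the same session as the home page even though the two are not directly linked, which a designer might treat as a hint to add a link.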
Data Mining for Intelligent Web Caching
- An application based on data warehouse technology that adapts its behavior to the access patterns of clients/users
- Uses an algorithm to maximize the hit rate: the percentage of requested Web entities that are served directly from the cache, without being requested again from the origin server
- This approach enhances least recently used (LRU) caching with data mining models built from historical data, with the aim of increasing the hit rate
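The hit rate can be measured by replaying a request stream against a cache. The sketch below simulates plain LRU caching only; the data-mining enhancement described in the slide (models built from historical access data) is not reproduced, and the request stream is hypothetical.

```python
from collections import OrderedDict

def lru_hit_rate(requests, capacity):
    """Simulate an LRU cache over a request stream and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for url in requests:
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(requests) if requests else 0.0

requests = ["/a", "/b", "/a", "/c", "/b", "/a", "/d", "/a"]
print(lru_hit_rate(requests, capacity=2), lru_hit_rate(requests, capacity=3))
```

With capacity 2 this stream yields a hit rate of 0.25; with capacity 3 it rises to 0.5. A mining-enhanced policy would aim to raise the rate without enlarging the cache, for example by choosing evictions from predicted future requests.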
Areas of Future Research
- Data mining in the following application areas:
  - Electronic Commerce
  - Bioinformatics
  - Computer security
  - Web intelligence
  - Intelligent learning
  - Database systems
  - Finance
  - Marketing
  - Healthcare
  - Telecommunications
  - Other fields
Nielsen//NetRatings
- What are they?
- What is the purpose?
- Current NetRatings for home and work
Nielsen//NetRatings: What are they?
- This service is provided through a partnership between NetRatings, Nielsen Media Research, and ACNielsen
- The service includes Internet audience measurement and reports Internet usage estimates based on a sample of households with Internet access
Nielsen//NetRatings: What is the purpose?
- The purpose of the Nielsen//NetRatings service is to provide a source of global information on consumer and business usage of the Internet
- This information helps companies make business-critical decisions
Average Web Usage at Home: October 2002, US Data
Number of Sessions per Month                  23
Number of Unique Sites Visited                49
Time Spent per Month (h:mm:ss)                12:06:56
Time Spent During Surfing Session (h:mm:ss)   0:32:03
Duration of a Page Viewed (h:mm:ss)           0:00:55
Active Internet Universe                      106,567,327
Current Internet Universe Estimate            168,366,482
Average Web Usage at Work: October 2002, US Data
Number of Sessions per Month                  56
Number of Unique Sites Visited                95
Time Spent per Month (h:mm:ss)                31:08:04
Time Spent During Surfing Session (h:mm:ss)   0:33:21
Duration of a Page Viewed (h:mm:ss)           0:01:01
Active Internet Universe                      47,844,347
Current Internet Universe Estimate            53,057,035
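The home and work figures can be sanity-checked: time spent per month should be roughly the number of sessions times the time per session. The sketch below does this, reading the per-session times as minutes and seconds (32:03 at home, 33:21 at work); read as hours, they would exceed the monthly totals.

```python
def hms_to_seconds(text):
    """Convert an 'h:mm:ss' or 'm:ss' duration string to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# October 2002 US figures; per-session times read as minutes:seconds
data = {
    "home": {"sessions": 23, "per_session": "32:03", "per_month": "12:06:56"},
    "work": {"sessions": 56, "per_session": "33:21", "per_month": "31:08:04"},
}

for place, d in data.items():
    estimated = d["sessions"] * hms_to_seconds(d["per_session"])
    reported = hms_to_seconds(d["per_month"])
    print(f"{place}: estimated {estimated / 3600:.2f} h, reported {reported / 3600:.2f} h")
```

The estimates agree with the reported monthly totals to within about 1.5% at home and much more closely at work, which supports the minutes-and-seconds reading.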
September 2002 Global Internet Index: Average Usage (*Home Internet Access)
Metric                                        September      August         % Change
Number of Sessions per Month                  19             19             1.99
Number of Unique Domains Visited              49             48             0.77
Page Views per Month                          778            785            -0.97
Page Views per Surfing Session                40             41             -2.9
Time Spent per Month (h:mm:ss)                10:17:45       10:17:44       0
Time Spent During Surfing Session (h:mm:ss)   0:31:44        0:32:22        -1.95
Duration of a Page Viewed (h:mm:ss)           0:00:48        0:00:47        0.98
Active Internet Universe                      220,444,008    218,038,452    1.1
Current Internet Universe Estimate            385,564,028    385,998,080    -0.11
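The % Change column is the month-over-month change relative to August. The sketch below recomputes it for the two Internet Universe rows, where full counts are shown, and reproduces the reported 1.1 and -0.11. Rows with rounded values (for example, 19 sessions in both months but a 1.99% change) cannot be reproduced this way, presumably because those percentages were computed from unrounded figures.

```python
def percent_change(current, previous):
    """Month-over-month change, in percent, relative to the previous month."""
    return (current - previous) / previous * 100

# September vs. August 2002 Internet Universe figures
active_universe = percent_change(220_444_008, 218_038_452)
universe_estimate = percent_change(385_564_028, 385_998_080)
print(round(active_universe, 2), round(universe_estimate, 2))  # 1.1 -0.11
```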
Other Issues
- Privacy
- Security
- Intellectual Ownership
- Visual Data Mining
- Risk Analysis
Conclusions
- Mining Web usage data to find patterns is a growing area, driven by the growth of Web-based applications
- Web usage data can be used to better understand how the Web is used and to apply this knowledge to better serve users
- Web usage patterns and data mining can be the basis for a great deal of future research
Any Questions?
References
- Francesco Bonchi, Fosca Giannotti, Giuseppe Manco, Mirco Nanni, Dino Pedreschi, Chiara Renso, Salvatore Ruggieri: Data Mining for Intelligent Web Caching
- IEEE International Conference on Data Mining: http://www.cs.uvm.edu/~xwu/icdm.html
- Nielsen//NetRatings: http://www.nielsen-netratings.com
- Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan (Dept. of CSE, University of Minnesota): Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data
- Web Mining: Pattern Discovery from World Wide Web Transactions
- Raymond Kosala, Hendrik Blockeel (Dept. of CS, Katholieke Universiteit Leuven): Web Mining Research: A Survey