Visual web mining

19-03-08
Web mining is the use of data mining
techniques to automatically discover and
extract information from Web
documents/services.
Over 1 billion HTML pages, 15 terabytes
Wealth of information
• Bookstores, restaurants, travel, malls, dictionaries, news, stock quotes, yellow & white pages, maps, markets, ...
• Diverse media types: text, images, audio, video
• Heterogeneous formats: HTML, XML, PostScript, PDF, JPEG, MPEG, MP3
Highly dynamic
• 1 million new pages each day
• Average page changes in a few weeks
Graph structure with links between pages
• Average page has 7-10 links
Hundreds of millions of queries per day

E-commerce
 generate user profiles
 targeted advertising
 fraud detection

Network Management
 performance management
 fault management

Information Retrieval
Web Mining
• Web Content Mining
• Web Structure Mining
• Web Usage Mining
Web content mining:
focuses on techniques for assisting a user in
finding documents that meet a certain criterion
(text mining)
 Web structure mining:
aims at developing techniques to take advantage
of the collective judgement of web page quality
which is available in the form of hyperlinks
 Web usage mining:
focuses on techniques to study the user
behaviour when navigating the web
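
To make the idea of hyperlinks as a "collective judgement" concrete, here is a minimal sketch of one well-known link-analysis technique, PageRank computed by power iteration. The slides do not say which algorithm is meant, so this is only an illustration; the page names, the damping factor of 0.85 and the iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are assumed, commonly used defaults.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical three-page site.
links = {
    "home": ["faculty", "courses"],
    "faculty": ["home"],
    "courses": ["home", "faculty"],
}
ranks = pagerank(links)
```

In this toy graph "home" receives links from both other pages and ends up with the highest score, which is the sense in which links act as votes for page quality.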


Visual Web Mining (VWM) is the application of
Information Visualization techniques to the
results of Web Mining in order to further
amplify the perception of extracted patterns,
rules and regularities, or to visually explore
new ones in the web domain.
Webbot
Integration Engine
Data mining suite
Link analysis suite
Database
VTK



Global techniques
Geometric techniques
Feature-based techniques
The second and third have now become the
most widely used visualization methods.
The Web Knowledge Visualization and
Discovery System (WEBKVDS) is mainly
composed of two parts:
1- FootPath: for visualizing the web structure
with the different data and pattern layers.
2- Web Graph Algebra: for manipulating and
operating on the web graph objects for visual
data mining.



Web graph
Web image
Information layers
• NumofVisit layer
• LinkUsage layer
• ViewTime layer
• ProbUsage layer
Pattern layers
• Association rules
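
The slides name the information layers but do not define them. The sketch below shows one plausible way to derive them from user sessions, under these assumptions: NumofVisit counts visits per page, LinkUsage counts traversals per link, ViewTime is the average viewing time per page, and ProbUsage is the fraction of departures from a page that follow a given link. The session format and all function and variable names are hypothetical.

```python
from collections import Counter, defaultdict


def build_layers(sessions):
    """Derive four information layers from sessions.

    Each session is a list of (page, seconds_viewed) pairs in visit order.
    The layer definitions are assumptions, not taken from the slides.
    """
    num_of_visit = Counter()
    link_usage = Counter()
    total_time = defaultdict(float)
    for session in sessions:
        for page, seconds in session:
            num_of_visit[page] += 1
            total_time[page] += seconds
        # Consecutive pages in a session form one traversed link.
        for (src, _), (dst, _) in zip(session, session[1:]):
            link_usage[(src, dst)] += 1
    view_time = {p: total_time[p] / num_of_visit[p] for p in num_of_visit}
    departures = Counter()
    for (src, _dst), count in link_usage.items():
        departures[src] += count
    prob_usage = {
        link: count / departures[link[0]] for link, count in link_usage.items()
    }
    return {
        "NumofVisit": dict(num_of_visit),
        "LinkUsage": dict(link_usage),
        "ViewTime": view_time,
        "ProbUsage": prob_usage,
    }


# Hypothetical sessions on a small university site.
sessions = [
    [("home", 10), ("courses", 30), ("cs101", 60)],
    [("home", 20), ("faculty", 40)],
    [("home", 30), ("courses", 10)],
]
layers = build_layers(sessions)
```

Page-level layers (NumofVisit, ViewTime) are keyed by page and link-level layers (LinkUsage, ProbUsage) by (source, target) pairs, so each layer can be attached to the nodes or edges of the web graph.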
FootPath is the rendering engine of the
visualization and discovery system. A web
graph is displayed by first rendering the web
image and then attributing visual
characteristics such as colour and thickness
to nodes and edges, to represent data from
the information layers.
 Web image rendering
 Dynamic layout
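
As a rough illustration of attributing visual characteristics to nodes and edges, the sketch below maps a page-level layer to node colour and a link-level layer to edge thickness by linear scaling. It is not FootPath's actual rendering code; the colour ramp (blue for rarely visited, red for frequently visited), the width range and all names are assumptions.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return out_lo
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)


def style_graph(num_of_visit, link_usage, min_width=1.0, max_width=8.0):
    """Return (node colours, edge widths) derived from two layers."""
    v_lo, v_hi = min(num_of_visit.values()), max(num_of_visit.values())
    node_colour = {}
    for page, visits in num_of_visit.items():
        # More visits means more red and less blue.
        red = round(scale(visits, v_lo, v_hi, 0, 255))
        node_colour[page] = "#{:02x}00{:02x}".format(red, 255 - red)
    u_lo, u_hi = min(link_usage.values()), max(link_usage.values())
    edge_width = {
        link: scale(count, u_lo, u_hi, min_width, max_width)
        for link, count in link_usage.items()
    }
    return node_colour, edge_width


# Hypothetical layer values.
num_of_visit = {"home": 3, "courses": 2, "faculty": 1}
link_usage = {("home", "courses"): 2, ("home", "faculty"): 1}
node_colour, edge_width = style_graph(num_of_visit, link_usage)
```

The resulting dictionaries can be handed to any graph-drawing toolkit; the point is only that each layer becomes one visual channel.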
Web Graph Algebra is used to manipulate and produce web graphs.
Variables in our algebra are web graphs.
Operator FILTER: θ = FLT_{Layer,threshold}(α)
Operator ADD: θ = α + β
Operator MINUS: θ = α − β
Operator COMMON: θ = α :: β
Operator MINUS IN: θ = α −. β
Operator MINUS OUT: θ = α .− β
Operator EXCEPT: θ = α _ β
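
The slides list the operators without defining their semantics. The sketch below gives one plausible reading of four of them, treating a web graph as a mapping from links to a single layer value. All semantics here are assumptions: FILTER keeps links whose value reaches the threshold, ADD takes the union and sums values, MINUS keeps links of α that are absent from β, and COMMON keeps links present in both (with α's value). MINUS IN, MINUS OUT and EXCEPT are left out because the slides give no basis for guessing them.

```python
def filter_graph(alpha, threshold):
    """FILTER (assumed): keep links whose layer value is >= threshold."""
    return {link: w for link, w in alpha.items() if w >= threshold}


def add(alpha, beta):
    """ADD (assumed): union of links, summing values of shared links."""
    theta = dict(alpha)
    for link, w in beta.items():
        theta[link] = theta.get(link, 0) + w
    return theta


def minus(alpha, beta):
    """MINUS (assumed): links of alpha that do not occur in beta."""
    return {link: w for link, w in alpha.items() if link not in beta}


def common(alpha, beta):
    """COMMON (assumed): links occurring in both, keeping alpha's value."""
    return {link: w for link, w in alpha.items() if link in beta}


# Hypothetical link-usage graphs for two days.
monday = {("home", "courses"): 5, ("home", "faculty"): 1}
tuesday = {("home", "courses"): 3, ("courses", "cs101"): 4}

both_days = add(monday, tuesday)
only_monday = minus(monday, tuesday)
shared = common(monday, tuesday)
busy = filter_graph(both_days, 4)
```

Because every operator takes web graphs and returns a web graph, results can be chained, for example filtering the sum of two days to keep only heavily used links.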


VISUALIZATION
DIAGRAMS
The figure shows a 2D visualization with Strahler coloring. It shows user access paths scattering from the first page of the website (the node in the center) to clusters of web pages corresponding to faculty pages, course home pages, etc.
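
Strahler coloring relies on the Strahler number of each node in a tree: a leaf has number 1, and an inner node takes the largest number among its children, plus one if that largest number occurs in at least two children. A minimal sketch of the computation follows; the example tree is hypothetical, and how the numbers are then mapped to colors in the figure is not stated in the slides.

```python
def strahler(tree, node):
    """Return the Strahler number of node.

    tree maps each node to the list of its children; leaves may be absent.
    """
    children = tree.get(node, [])
    if not children:
        return 1
    values = sorted((strahler(tree, child) for child in children), reverse=True)
    # Two or more children sharing the maximum raise the number by one.
    if len(values) > 1 and values[0] == values[1]:
        return values[0] + 1
    return values[0]


# Hypothetical access tree rooted at the site's first page.
tree = {
    "home": ["faculty", "courses"],
    "faculty": ["smith"],
    "courses": ["cs101", "cs102"],
}
numbers = {node: strahler(tree, node) for node in ["home", "faculty", "courses", "cs101"]}
```

Nodes with higher numbers sit above more heavily branching subtrees, so coloring by this value highlights the main branches of the access tree.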
VISUALIZATION
DIAGRAM 2
A 3D visualization of web usage for a site. The cylinder-like part of the figure is a visualization of the web usage of surfers as they browse a long HTML document.
VISUALIZATION
DIAGRAM 3
Right: One can observe long user sessions as strings falling off clusters. These are a special type of long session in which the user navigates a sequence of web pages that come one after the other under a cluster, e.g., sections of a long document. In many cases we found web pages with many nodes connected by Next/Up/Previous hyperlinks.
VISUALIZATION
DIAGRAM 4
The user's browsing access pattern is amplified by a different coloring. Depending on the link structure of the underlying pages, we can see vertical access patterns of a user drilling down the cluster, making a cylinder shape. Users following links down a hierarchy of web pages make a cone shape, and users going up hierarchies, e.g., back to the main page of the website, make a funnel shape.
VISUALIZATION
DIAGRAM 5
Frequent access patterns extracted by the web mining process are visualized as a white graph on top of the embedded, colorful graph of web usage.
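
The slides do not say how the frequent access patterns are mined. As a simple stand-in, the sketch below counts contiguous page sequences of a fixed length and keeps those that occur in at least a minimum number of sessions. The function name, the parameters and the sample sessions are assumptions.

```python
from collections import Counter


def frequent_paths(sessions, length=2, min_support=2):
    """Return contiguous paths of the given length seen in >= min_support sessions."""
    support = Counter()
    for session in sessions:
        # Use a set so each path counts once per session.
        seen = {
            tuple(session[i:i + length])
            for i in range(len(session) - length + 1)
        }
        support.update(seen)
    return {path: count for path, count in support.items() if count >= min_support}


# Hypothetical sessions, each a list of pages in visit order.
sessions = [
    ["home", "courses", "cs101"],
    ["home", "courses", "cs102"],
    ["home", "faculty"],
]
patterns = frequent_paths(sessions)
```

The surviving paths are the kind of result that could be drawn as an overlay on top of the full usage graph.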
VISUALIZATION
DIAGRAM 6
Superimposition of web usage on top of web structure with span tree layout. One can easily see which parts of the web site were visited by users and which parts are not frequently used. Coloring gives a visual cue of the entry and exit points of access paths.
The Web Knowledge Visualization and Discovery
System (WEBKVDS) visualizes multi-tier web graphs and,
with the help of the web graph algebra,
provides a powerful means for interactive
visual web mining. We have yet to
study interesting properties such as
commutativity, associativity, or distributivity
of the operators if coefficients are introduced
later in the algebra.
www.cs.rpi.edu
 www.cs.arizona.edu
 [email protected]
