Prasanna K. Desikan
E-mail: [email protected]
CSCI 8701
ID# 1916156
Date: 02/07/2002
Grouping Web Page References into Transactions for Mining World
Wide Web Browsing Patterns
This paper identifies a model of user behavior that separates web page references into
those made for navigational purposes and those for information content purposes. It
presents a general model for identifying transactions for data mining from WWW log
data. The contributions of the paper are: defining a generic transaction identification
module, defining a user browsing behavior model, developing specific transaction
identification modules, and evaluating the different transaction identification modules.
The user browsing behavior model assumes that a given user uses each page either for
navigation, to find links, or for its actual information content. The
paper also assumes that the page references in a server log can be readily sorted by user
identification. Each reference can then be classified as a navigation or a content reference.
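The sorting assumption above can be sketched in a few lines of Python. This is only an illustration, not the paper's code: the `Reference` record, its field names, and the `group_by_user` helper are hypothetical, and the sketch assumes each log entry already carries a usable user identifier.

```python
from collections import defaultdict
from typing import NamedTuple


class Reference(NamedTuple):
    """One page reference from a server log (hypothetical record layout)."""
    user: str         # assumed user identifier, e.g. an IP address
    timestamp: float  # request time in seconds
    page: str         # requested URL


def group_by_user(log):
    """Return {user: [Reference, ...]}, each list sorted by request time."""
    by_user = defaultdict(list)
    for ref in log:
        by_user[ref.user].append(ref)
    for refs in by_user.values():
        refs.sort(key=lambda r: r.timestamp)
    return dict(by_user)
```

Each per-user list is then the input on which a reference can be classified as navigation or content.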
The paper first introduces a general model for transaction identification based on dividing
a large transaction into multiple smaller ones or merging small transactions into fewer
large ones. The discussion moves on to specific transaction identification modules. The
first, the reference length module, is based on the assumption that the amount of time
a user spends on a page correlates with whether the page should be classified as a navigation
or content page for that user. In the Maximal Forward Reference module, each transaction
is defined to be the set of pages in the path from the first page in the log for a user up to
the page before a backward reference is made. The time window module simply divides
a user's log into time intervals no larger than a specified parameter. It assumes
that meaningful transactions have an overall average length associated with them. Next,
the WEBMINER system is explained in brief. It has two main parts. The first is the
domain-dependent process of transforming the Web data into a suitable
"transaction" form. The second is the largely domain-independent
application of generic data mining techniques.
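The three transaction identification modules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names and the `cutoff` and `window` parameters are hypothetical, time spent on a page is estimated as the gap to the user's next request, the last page in a log is treated as a content page because its viewing time is unknown, and a backward reference is taken to be any page already on the current forward path.

```python
def reference_length_transactions(refs, cutoff):
    """Split one user's time-sorted (timestamp, page) list into transactions.

    A page viewed for longer than `cutoff` seconds is treated as a content
    page and closes the current transaction; shorter views are navigation.
    """
    transactions, current = [], []
    for i, (t, page) in enumerate(refs):
        current.append(page)
        is_last = i == len(refs) - 1
        # Assumption: the final page has no known viewing time, so it is
        # treated as a content page.
        if is_last or refs[i + 1][0] - t > cutoff:
            transactions.append(current)
            current = []
    return transactions


def maximal_forward_references(pages):
    """Emit the forward path each time the user first moves backward."""
    transactions, path, moving_forward = [], [], False
    for page in pages:
        if page in path:
            # Backward reference: record the path once, then back up to it.
            if moving_forward:
                transactions.append(list(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        transactions.append(list(path))
    return transactions


def time_window_transactions(refs, window):
    """Cut the log into chunks spanning at most `window` seconds each."""
    transactions, current, start = [], [], None
    for t, page in refs:
        if start is None or t - start > window:
            if current:
                transactions.append(current)
            current, start = [], t
        current.append(page)
    if current:
        transactions.append(current)
    return transactions


# Usage on a small made-up log for one user.
refs = [(0, "A"), (5, "B"), (100, "C"), (105, "D"), (300, "A")]
print(reference_length_transactions(refs, cutoff=30))
# [['A', 'B'], ['C', 'D'], ['A']]
print(maximal_forward_references(["A", "B", "C", "D", "C", "B", "E"]))
# [['A', 'B', 'C', 'D'], ['A', 'B', 'E']]
print(time_window_transactions(refs, window=120))
# [['A', 'B', 'C', 'D'], ['A']]
```

In this sketch the reference length variant produces navigation-content style transactions, where each transaction is the run of navigation pages leading up to a content page. Keeping only the last page of each transaction would give a content-only view instead.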
Test server log data were created, and three different types of web sites were modeled for
evaluation: a sparsely connected graph, a densely connected graph, and a graph with a
medium amount of connectivity. The experimental evaluation used both the created
data and real data. In both cases the reference length module performed better, though the
maximal forward reference module did fairly well on the sparsely connected graphs of the
created data and when the association rule algorithm was run with navigation-content
transactions on real data.
An important area of research is to develop methods of clustering log entries into user
transactions using criteria such as time differential among entries, time spent on a page
relative to the page size, and user profile information collected during registration.