Prasanna K. Desikan
E-mail: [email protected]
CSCI 8701
ID# 1916156
Date: 02/07/2002
Web Usage Mining: Discovery and Applications of Usage Patterns from
Web Data
The paper divides Web usage mining into three phases: preprocessing, pattern
discovery, and pattern analysis, and describes each phase in detail. Web usage mining is
defined as the process of applying data mining techniques to the discovery of usage
patterns from Web data, targeted towards various applications. Web mining can be
divided broadly into three classes, i.e., content mining, usage mining, and structure mining.
In Web mining, data can be collected at the server side, at the client side, at proxy servers, or
obtained from an organization's database. The data that can be used in Web mining can be
classified into four kinds: content, structure, usage, and user profile. Server-side
collection of data reflects the access of a website by multiple users. The data can be
collected with packet-sniffer technology or by using cookies and query data. Client-side data
collection can be implemented by using a remote agent (such as JavaScript or Java
applets) or by modifying the source code of an existing browser to enhance its data
collection capabilities. These methods collect only single-site, single-user browsing
behavior. A web proxy that acts as an intermediate level of caching between client
browsers and web servers can also be used for data collection. The performance of proxy
caches depends on their ability to predict future page requests correctly. The information
provided by the data sources described above can all be used to construct/identify several
data abstractions like users, server sessions, episodes, click streams and page views.
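
As an illustration of server-side collection, the following is a minimal sketch (not taken from the paper) of turning raw server log lines into per-request records, from which users and page views could later be identified. It assumes the log uses the Common Log Format; the names LOG_PATTERN and parse_log_line are hypothetical and chosen only for this example.

```python
import re
from datetime import datetime

# Assumed input: Common Log Format,
#   host ident authuser [timestamp] "method url protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_log_line(line):
    """Return one request record as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        # The host is only a rough proxy for a user (proxies and shared machines blur it).
        "host": match.group("host"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": match.group("url"),
        "status": int(match.group("status")),
    }


# Usage with a made-up log line (illustrative data, not from the paper).
sample = '192.0.2.1 - - [07/Feb/2002:10:15:32 -0600] "GET /index.html HTTP/1.0" 200 2326'
record = parse_log_line(sample)
print(record["host"], record["url"], record["status"])
```

Records of this kind are only the raw usage data; identifying users and sessions from them is part of the preprocessing phase.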
The first main task in performing Web usage mining is preprocessing. It consists of
converting the usage, content, and structure information contained in various available
data sources into the data abstractions necessary for pattern discovery. Usage
preprocessing is difficult because the available data are incomplete. Assuming each user
has been identified, the click stream for each user must be divided into sessions using
timeouts. Content preprocessing often consists of performing classification or clustering.
Dynamic page views present a challenge. The structure of a site is created by the hypertext
links between page views. For pattern discovery, some of the techniques discussed are:
Statistical Analysis, Association Rules, Clustering, Classification, Sequential Patterns and
Dependency Modeling. The motivation for pattern analysis is to filter out uninteresting
rules or patterns from the set found in pattern discovery.
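
The timeout-based division of a click stream into sessions mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the user has already been identified, that each click is a (timestamp, url) pair, and that a 30-minute gap ends a session. The function name sessionize and the 30-minute default are assumptions made for this example.

```python
from datetime import datetime, timedelta


def sessionize(clicks, timeout=timedelta(minutes=30)):
    """Split one user's clicks into sessions.

    clicks: list of (timestamp, url) tuples for a single, already identified user.
    A new session starts whenever the gap since the previous click exceeds timeout.
    Returns a list of sessions, each a list of URLs in time order.
    """
    sessions = []
    current = []
    previous_time = None
    for timestamp, url in sorted(clicks):
        if previous_time is not None and timestamp - previous_time > timeout:
            sessions.append(current)
            current = []
        current.append(url)
        previous_time = timestamp
    if current:
        sessions.append(current)
    return sessions


# Usage with made-up clicks: the 55-minute gap before /c starts a second session.
start = datetime(2002, 2, 7, 10, 0)
clicks = [
    (start, "/a"),
    (start + timedelta(minutes=5), "/b"),
    (start + timedelta(minutes=60), "/c"),
]
print(sessionize(clicks))  # [['/a', '/b'], ['/c']]
```

The timeout value is a tuning choice: a shorter timeout splits long visits into several sessions, while a longer one may merge separate visits.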
A section is devoted to describing the dimensions and application areas that can be used to
classify Web usage mining projects. An overview of the WebSIFT system, designed to
perform Web usage mining from server logs, is presented. Finally, the associated privacy
issues are also addressed.
The paper attempts to provide an up-to-date survey of the rapidly growing area of
Web usage mining. It also underscores the need to analyze Web usage data in order to
understand Web usage and to apply that knowledge to better serve users.