Prasanna K. Desikan
E-mail: [email protected]
CSCI 8701, ID# 1916156
Date: 02/07/2002

Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data

The paper divides Web usage mining into three phases (preprocessing, pattern discovery, and pattern analysis) and describes each phase in detail. Web usage mining is defined as the application of data mining techniques to discover usage patterns from Web data, with the discovered patterns targeted toward various applications. Web mining as a whole can be divided broadly into three classes: content mining, usage mining, and structure mining.

In Web mining, data can be collected at the server side, at the client side, or at proxy servers, or it can be obtained from an organization's databases. The data used in Web mining can be classified into four kinds: content, structure, usage, and user profile. Data collected at the server side reflects how multiple users access a website. Besides server log files, such data can be captured with packet-sniffing technology or gathered from cookies and query data. Client-side data collection can be implemented with a remote agent (such as JavaScript code or Java applets) or by modifying the source code of an existing browser to enhance its data collection capabilities. These client-side methods capture the behavior of a single user: a remote agent records browsing on a single site only, whereas a modified browser can follow the user across multiple sites. A Web proxy, which acts as an intermediate level of caching between client browsers and Web servers, can also be used for data collection; the performance of proxy caches depends on how accurately they predict future page requests.

The information provided by these data sources can be used to construct several data abstractions, such as users, server sessions, episodes, click streams, and page views. The first main task in Web usage mining is preprocessing: converting the usage, content, and structure information contained in the available data sources into the data abstractions needed for pattern discovery.
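Server-side usage data usually starts out as a Web server access log. The following Python sketch shows one way to turn raw log lines into per-user click streams. It assumes the log is in the Common Log Format, treats each client host (IP address) as one user, and drops failed requests as well as embedded image, stylesheet, and script requests. All three are simplifying assumptions, and the helper names `parse_line` and `click_streams` are hypothetical. Identifying users by host is unreliable when many users share a proxy, which is one reason usage preprocessing is hard.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format (an assumption about the log layout):
# host ident authuser [day/month/year:hh:mm:ss zone] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Embedded resources dropped during cleaning (an assumed, non-exhaustive list).
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def parse_line(line):
    """Return (host, timestamp, url) for a successful page request, else None."""
    m = CLF_PATTERN.match(line)
    if m is None or not m.group("status").startswith("2"):
        return None  # malformed line or unsuccessful request
    url = m.group("url")
    if url.lower().endswith(IGNORED_SUFFIXES):
        return None  # image, stylesheet, or script rather than a page view
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), ts, url


def click_streams(lines):
    """Group page requests into time-ordered click streams keyed by host.

    Treating each host as one user is a simplification: users behind a
    shared proxy are merged, and one user with several addresses is split.
    """
    streams = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            host, ts, url = entry
            streams[host].append((ts, url))
    for events in streams.values():
        events.sort()  # order each user's requests by timestamp
    return dict(streams)
```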
Usage preprocessing is difficult because the available data is incomplete; for example, pages served from a client or proxy cache never appear in the server log. Assuming each user has been identified, each user's click stream must then be divided into sessions using a timeout: a gap between consecutive requests longer than the timeout marks the start of a new session. Content preprocessing often consists of classifying or clustering the site's content. Dynamically generated page views present a particular challenge, since a server that personalizes pages or builds them from a database may be able to produce more distinct page views than can practically be preprocessed. The structure of a site is created by the hypertext links between page views.

The pattern discovery techniques discussed include statistical analysis, association rules, clustering, classification, sequential patterns, and dependency modeling. The motivation for pattern analysis is to filter out uninteresting rules or patterns from the set found during pattern discovery.

One section describes the dimensions and application areas along which Web usage mining projects can be classified. The paper also presents an overview of the WebSIFT system, which performs Web usage mining on server logs, and finally addresses the privacy issues involved. Overall, the paper attempts to provide an up-to-date survey of the rapidly growing area of Web usage mining and highlights the need to analyze Web usage data in order to understand how the Web is used and to apply that knowledge to better serve users.
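The timeout-based splitting of a click stream into sessions can be sketched in Python as follows. The function name `sessionize` and the 30-minute default timeout are assumptions for illustration (30 minutes is a commonly used value, not one prescribed by the paper); the input is a hypothetical list of `(timestamp, url)` pairs for one already-identified user, sorted by time.

```python
from datetime import datetime, timedelta


def sessionize(clicks, timeout=timedelta(minutes=30)):
    """Split one user's time-ordered (timestamp, url) click stream into sessions.

    A new session starts whenever the gap since the previous request
    exceeds `timeout` (the 30-minute default is an assumed value).
    """
    sessions = []
    last_ts = None
    for ts, url in clicks:
        if last_ts is None or ts - last_ts > timeout:
            sessions.append([])  # gap too long: open a new session
        sessions[-1].append(url)
        last_ts = ts
    return sessions
```

A shorter timeout yields more and shorter sessions; a suitable value depends on how users typically browse the site.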
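Statistical analysis is the simplest of the pattern discovery techniques listed in the summary: descriptive statistics such as how often each page is viewed and how long users stay on it. A minimal Python sketch is shown below, assuming sessions are lists of hypothetical `(timestamp, url)` pairs. Estimating view time as the gap until the next request is an approximation, and the last page of each session gets no view time at all; the name `page_statistics` is hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def page_statistics(sessions):
    """Descriptive statistics over sessions of (timestamp, url) pairs.

    Returns (hits, mean_view_seconds). A page's view time is estimated as the
    gap until the next request in the same session; the last page of each
    session has no successor, so it contributes a hit but no view time.
    """
    hits = Counter()
    total_seconds = defaultdict(float)
    timed_views = Counter()
    for session in sessions:
        for i, (ts, url) in enumerate(session):
            hits[url] += 1
            if i + 1 < len(session):
                total_seconds[url] += (session[i + 1][0] - ts).total_seconds()
                timed_views[url] += 1
    mean_view = {url: total_seconds[url] / timed_views[url] for url in timed_views}
    return hits, mean_view
```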
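Association rules in this setting relate pages that are frequently accessed together in the same session. The sketch below is a deliberately simple illustration rather than Apriori or any algorithm from the paper: it enumerates every page pair exhaustively and keeps rules `A -> B` whose support and confidence clear hypothetical thresholds. Sessions are assumed to be lists of URLs, and the name `pair_rules` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.1, min_confidence=0.5):
    """Mine rules A -> B between pages that co-occur in sessions.

    support(A -> B)    = fraction of sessions containing both A and B
    confidence(A -> B) = sessions with A and B / sessions with A
    Returns sorted (lhs, rhs, support, confidence) tuples.
    """
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = sorted(set(session))  # count each page once per session
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # try the rule in both directions
            confidence = both / page_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```

This brute-force pair counting grows quadratically with session length; Apriori instead prunes candidate itemsets using the fact that every subset of a frequent itemset must itself be frequent.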
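Sequential patterns add time order to the picture. As a simplified stand-in for full sequential pattern mining (which also allows gaps between items and can relate items across a time-ordered set of sessions), the sketch below counts contiguous page paths of a fixed length and reports those that occur in at least a hypothetical minimum fraction of sessions. Sessions are assumed to be time-ordered lists of URLs, and the name `frequent_paths` is hypothetical.

```python
from collections import Counter


def frequent_paths(sessions, length=2, min_support=0.5):
    """Return {path: support} for contiguous page paths of the given length.

    Support is the fraction of sessions containing the path; each session
    counts a given path at most once.
    """
    counts = Counter()
    for session in sessions:
        paths = {tuple(session[i:i + length]) for i in range(len(session) - length + 1)}
        counts.update(paths)
    n = len(sessions)
    return {path: c / n for path, c in counts.items() if c / n >= min_support}
```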
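Pattern analysis can use knowledge about the site to discard patterns that are unsurprising. The following sketch illustrates one such filter under assumed representations: rules are hypothetical `(lhs, rhs, support, confidence)` tuples, `links` maps each page to the set of pages it links to, and a rule is dropped when its confidence is low or when `lhs` already links directly to `rhs`, since users following an existing link is expected from the site structure alone. This illustrates the idea only and is not a description of how WebSIFT implements its filtering; the name `filter_rules` is hypothetical.

```python
def filter_rules(rules, links, min_confidence=0.5):
    """Keep (lhs, rhs, support, confidence) rules that are potentially interesting.

    A rule is discarded if its confidence is below `min_confidence` or if the
    site already has a hyperlink from lhs to rhs (an assumed notion of
    "expected" behavior derived from site structure).
    """
    return [
        (lhs, rhs, support, confidence)
        for lhs, rhs, support, confidence in rules
        if confidence >= min_confidence and rhs not in links.get(lhs, set())
    ]
```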