A PROPOSAL
TO
RESEARCH
ON
“USER NAVIGATIONAL PATTERNS FOR WEB
PERSONALIZATION AND RECOMMENDATION”
PREPARED
BY
CH. MUTYALA RAO
Research Scholar (Part Time)-CSE D
OSMANIA UNIVERSITY, HYDERABAD
ABSTRACT
With the never-ending growth of Web services and Web-based information systems, the
volume of click stream and user data collected by Web-based organizations in their daily
operations has reached enormous proportions. Analysing such huge data can help to evaluate
the effectiveness of promotional campaigns, optimize the functionality of Web-based
applications, and provide more personalized content to visitors. The Semantic Web is an
extension of the current web in which information is given well-defined meaning, better
enabling computers and people to work in cooperation.
With millions of pages available on the Web, it has become difficult to access relevant
information. One possible approach to solve this problem is web personalization. Web
personalization is defined as any action that customizes the information or services provided
by a web site to an individual. Web usage mining is used to discover interesting user
navigation patterns and can be applied to many real-world problems, such as improving Web
sites, making additional topic or product recommendations, user behavior studies, etc. A Web
usage mining system performs five major tasks: i) data gathering, ii) data
preparation, iii) navigation pattern discovery, iv) pattern analysis and visualization, and v)
pattern applications.
Having a clear and well-organized website has become one of the primary objectives of
enterprises and organizations. Website administrators may want to know how they can attract
visitors, which pages are being accessed most/least frequently, which parts of the website are
most/least popular and need enhancement, etc. Of late, the rapid growth in the use of the Internet
has made automatic knowledge extraction from server log files a necessity. Analysis of server
log data can provide significant and useful information. This can improve the effectiveness of
the Web sites by adapting the information structure to the users’ behavior. Most of the Web
Usage Mining techniques use Server log files as raw data to produce the user navigation
patterns. Along with the server access log file, we incorporate Website knowledge into the
web usage mining phases. This incorporation can lead to superior patterns. These patterns can
be used to provide a set of recommendations for the web site, which can be deployed by the web
site administrator for website enhancement.
Keywords: Semantic Web, Web Personalization, Usage Mining, Navigational Patterns,
Pattern Analysis and Ontology
LIST OF CONTENTS
ABSTRACT
I. INTRODUCTION
II. LITERATURE SURVEY
III. PROBLEM STATEMENT
IV. OBJECTIVES
V. PLAN OF ACTION
VI. CONCLUSION
VII. REFERENCES
I. INTRODUCTION
The Semantic Web provides a common framework that allows data to be shared and reused
across application, enterprise, and community boundaries. A layered structure has been
suggested for the Semantic Web: (1) Unicode/URI (Uniform Resource Identifiers); (2)
XML/Namespaces/XML Schema; (3) RDF (Resource Description Framework)/RDF Schema; (4)
Ontology vocabulary; (5) Logic; (6) Proof; and (7) Trust.
Uniform Resource Identifiers (URIs) are a fundamental component of the current Web and
are a foundation of the Semantic Web. The Extensible Markup Language (XML) is also a
fundamental component for supporting the Semantic Web. XML provides an interoperable
syntactical foundation upon which the more important issue of representing relationships and
meaning can be built. URIs provide the ability for uniquely identifying resources as well as
relationships among resources. The Resource Description Framework (RDF) family of
standards leverages URIs and XML to provide a stepwise set of functionality to represent
these relationships and meaning. An ontology is a specification of a conceptualization, which
is an abstract, simplified view of the world that we wish to represent for some purposes. For
the Semantic Web, an ontology is a conceptualization of a domain into a human-understandable
but machine-readable format consisting of entities, attributes, relationships, and axioms.
Proof and trust follow the understanding that it is important to be able to check the validity of
statements made in the Semantic Web. These two layers are rarely tackled today, but are
interesting topics for future research.
With the Semantic Web, the large amount of information on the Web can be shared, reused
and managed effectively. Since machines can understand the content on the Semantic Web, it
enables more advanced automated processing on the Web. Intelligent search engines can be
developed to help people find relevant information by using semantic query language. New
knowledge can be derived from existing information efficiently. Many advanced applications
and services such as e-business, e-government, and e-learning become possible.
With the never-ending growth of Web services and Web-based information systems, the
volume of click stream and user data collected by Web-based organizations in their daily
operations has reached enormous proportions. Analysing such huge data can help to evaluate
the effectiveness of promotional campaigns, optimize the functionality of Web-based
applications, and provide more personalized content to visitors.
Web Usage Mining is the process of applying data mining techniques to the discovery of
usage patterns from data extracted from Web log files. It mines the secondary data (web
logs) derived from the users' interaction with the web pages during a certain period of Web
sessions. Web usage mining consists of three phases, namely pre-processing, pattern
discovery, and pattern analysis. Web Usage Mining analyses the usage patterns of web sites
in order to get an improved understanding of the users’ interests and requirements. This
information is especially valuable for E-Business sites in order to achieve improved customer
satisfaction. Web personalization is the process of customizing a Web site to the needs of
specific users, taking advantage of the knowledge acquired from the analysis of the user's
navigational behaviour (usage data) in correlation with other information collected in the
Web context, namely, structure, content, and user profile data. Due to the explosive growth of
the Web, the domain of Web personalization has gained great momentum both in the research
and commercial areas.
Web personalization is the process of customizing the content and structure of a website to
the specific and individual needs of users. The website is personalized through the
highlighting of the existing hyperlinks, dynamically inserting new hyperlinks that seem to be
interesting to the current user, or even creating new index pages. The content on the Web in
various fields is rapidly increasing, and identifying and retrieving exactly the content that
users need is becoming essential. Therefore, predicting user needs in order to improve the
usability of a Web site is now a pressing requirement.
II. LITERATURE SURVEY
A) PERSONALIZATION AND WEB USAGE MINING
The aim of personalization based on Web usage mining is to recommend a set of objects to
the current user as determined by matching usage patterns. This task is accomplished by
matching the active user session with the usage patterns discovered through Web usage
mining. This process is performed by the recommendation engine, which is the online
component of the personalization system. The process of Web personalization based on Web
usage mining consists of three phases: data preparation, pattern discovery, and
recommendation.
The data preparation phase transforms unprocessed Web log files into transaction data which
can be then processed by data mining tasks. Various data mining techniques can be applied to
this transaction data in the pattern discovery phase, such as clustering, association rule
mining, and sequential pattern discovery. The results of the mining phase are transformed
into aggregate usage profiles. These aggregate usage profiles are suitable for use in the
recommendation phase. The recommendation engine takes into account the active user
session in conjunction with the discovered patterns to provide personalized content.
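The recommendation phase described above can be sketched as follows. This is only an illustration under stated assumptions: each aggregate usage profile is represented as a dictionary of page weights, the active session is a set of visited pages, and the two are matched by cosine similarity. The function names, page URLs and weights are hypothetical and do not come from any particular system.

```python
from math import sqrt

def match_score(session, profile):
    """Cosine similarity between a binary session vector and a weighted profile."""
    if not session or not profile:
        return 0.0
    dot = sum(w for page, w in profile.items() if page in session)
    norm = sqrt(len(session)) * sqrt(sum(w * w for w in profile.values()))
    return dot / norm if norm else 0.0

def recommend(session, profiles, top_n=3):
    """Rank unvisited pages by (profile match score) x (page weight in profile)."""
    scores = {}
    for profile in profiles:
        m = match_score(session, profile)
        for page, w in profile.items():
            if page not in session:
                scores[page] = max(scores.get(page, 0.0), m * w)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, score in ranked[:top_n] if score > 0]

# Hypothetical aggregate usage profiles produced by the pattern discovery phase.
profiles = [
    {"/home": 1.0, "/products": 0.8, "/cart": 0.6},
    {"/home": 0.9, "/blog": 0.7, "/about": 0.4},
]
active_session = {"/home", "/products"}
print(recommend(active_session, profiles))
```

Pages the user has already visited are excluded, so the engine only suggests new objects; an empty session matches no profile and yields no recommendations.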
B) PROCESS OF PERSONALIZATION
The personalization process consists of data collection, data analysis and personalized
output.
1) Data Collection
Web personalization is based on three general types of data: data about the user, data about
the Website usage, and data about the software and hardware available on the user’s side.
Data about the user:
This category denotes information about the personal characteristics of the user, such as:
Demographics (name, phone number, geographic information, age, sex, education, etc.)
Skills and capabilities
Interests and preferences
Goals and plans
2) Data Analysis
Data analysis involves the following phases: data preparation and preprocessing, pattern
discovery, and pattern analysis.
C) PATTERN DISCOVERY
Pattern discovery aims to detect interesting patterns in the preprocessed Web usage data by
deploying statistical and data mining methods that include:
I. Association rule mining
II. Clustering
III. Classification
IV. Sequential pattern discovery
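As an illustration of the first of these methods, the sketch below derives simple page-to-page association rules from a handful of sessions. It is a simplified stand-in that only considers pairs of pages, not a full Apriori implementation; the session data, thresholds and the function name `pair_rules` are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) over pairs of pages."""
    n = len(sessions)
    page_sets = [set(s) for s in sessions]
    # Number of sessions containing each page and each unordered page pair.
    item_count = Counter(p for s in page_sets for p in s)
    pair_count = Counter(
        pair for s in page_sets for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical preprocessed sessions (one list of visited pages per session).
sessions = [
    ["home", "products", "cart"],
    ["home", "products"],
    ["home", "blog"],
    ["products", "cart"],
]
for lhs, rhs, support, confidence in pair_rules(sessions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Here the support of a rule is the fraction of sessions containing both pages, and its confidence is that count divided by the number of sessions containing the left-hand page.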
D) PATTERN ANALYSIS
In this final phase the objective is to convert discovered rules, patterns and statistics into
knowledge or insight involving the Website being analyzed. Knowledge here is an abstract
notion that in essence describes the transformation from information to understanding; it is
thus highly dependent on the human performing the analysis and reaching conclusions.
III. PROBLEM STATEMENT
The current focus is on the application of web usage mining for automatically determining Web
recommendations. Recommendations help users to quickly find the information they want or
find interesting. On the other hand, they allow website owners to optimize the website,
increase web user satisfaction and save on the costs of content management.
Recommendations are dynamically determined either based on manually specified rules or
automatically determined by different recommendation algorithms. Data Warehouse
technology is used to effectively manage large amounts of usage data and support various
recommender algorithms.
Web usage mining serves as an enabling mechanism to overcome the problems associated with
more traditional Web personalization techniques such as collaborative or content-based
filtering. These problems include lack of scalability, reliance on subjective user ratings or
static profiles, and the inability to capture a richer set of semantic relationships among
objects. Usage-based personalization can be problematic when little usage data is available
pertaining to some objects or when the site content changes regularly. For more effective
personalization, both usage and content attributes of a site must be integrated into a Web
mining framework and used by the recommendation engine in a uniform manner.
Web personalization is the process of customizing the content and structure of a website to
the specific and individual needs of users. The website is personalized through the
highlighting of the existing hyperlinks, dynamically inserting new hyperlinks that seem to be
interesting to the current user, or even creating new index pages. Therefore, further research
needs to be carried out to identify new intelligent techniques and services for web users.
Generally, web logs can be regarded as a collection of sequences of access events from one
user or session in ascending timestamp order. Preprocessing tasks, including data cleaning,
user identification, session identification and transaction identification, can be applied to the
original web log files to obtain the web access transactions. Let $E$ be a set of unique access
events, which represent web resources accessed by users, i.e. web pages, URLs, or topics. A
web access sequence $S = e_1 e_2 \ldots e_n$ ($e_i \in E$ for $1 \le i \le n$) is a sequence of
access events, and $|S| = n$ is called the length of $S$. Note that it is not necessary that
$e_i \ne e_j$ for $i \ne j$ in $S$; that is, repetition of items is allowed. A web access
transaction, denoted as $WAT = (t, S)$, consists of a transaction time $t$ and a web access
sequence $S$.
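A minimal sketch of the session and transaction identification steps is given below. It assumes the log has already been cleaned into (user, timestamp in seconds, access event) records, that a session ends after 30 minutes of inactivity (a common heuristic, not something fixed by this proposal), and that the transaction time t is the time of the first event in the session. The function name and sample records are hypothetical.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that closes a session (assumed)

def build_transactions(records, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, event) records into (t, S) web access transactions."""
    by_user = defaultdict(list)
    for user, ts, event in records:
        by_user[user].append((ts, event))

    transactions = []
    for user in sorted(by_user):
        start, sequence, last_ts = None, [], None
        for ts, event in sorted(by_user[user]):  # ascending timestamp order
            if last_ts is not None and ts - last_ts > timeout:
                # Gap too long: close the current session and start a new one.
                transactions.append((start, tuple(sequence)))
                sequence = []
            if not sequence:
                start = ts
            sequence.append(event)
            last_ts = ts
        if sequence:
            transactions.append((start, tuple(sequence)))
    return transactions

# Hypothetical cleaned log records: (user id, timestamp, access event).
records = [
    ("u1", 0, "a"), ("u1", 60, "b"), ("u1", 4000, "a"),
    ("u2", 10, "c"), ("u2", 20, "a"),
]
print(build_transactions(records))
```

Repeated events are kept as they occur, in line with the definition that a web access sequence may contain the same event more than once.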
All the web access transactions in a database can belong to either a single user (for client-side
logs) or multiple users (for server-side and proxy logs). The proposed algorithm does not
depend on the type of web log that contains the web access transactions. Suppose we have a
set of web access transactions with the access event set E = {a, b, c, d, e, f}.
In $S = e_1 e_2 \ldots e_k e_{k+1} \ldots e_n$, $S_{prefix} = e_1 e_2 \ldots e_k$ is called a
prefix sequence of $S$, or a prefix sequence of $e_{k+1}$ in $S$, and
$S_{suffix} = e_{k+1} e_{k+2} \ldots e_n$ is called a suffix sequence of $S$, or a suffix
sequence of $e_k$ in $S$. A web access sequence can be denoted as $S = S_{prefix} + S_{suffix}$.
For example, $S = abdac$ can be denoted as $S = a + bdac = ab + dac = \ldots = abda + c$. Let
$S_1$ and $S_2$ be two suffix sequences of $e_i$ in $S$, where $S_1$ is also a suffix sequence
of $e_i$ in $S_2$. Then $S_1$ is called the sub-suffix sequence of $S_2$, and $S_2$ is the
super-suffix sequence of $S_1$. The suffix sequence of $e_i$ in $S$ without any super-suffix
sequence is called the long suffix sequence of $e_i$ in $S$. For example, if $S = abdacb$, then
$S_1 = cb$ is the sub-suffix sequence of $S_2 = bdacb$, and $S_2$ is the super-suffix sequence
of $S_1$. $S_2$ is also the long suffix sequence of $a$ in $S$.
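These definitions can be checked with a short sketch. It assumes sequences are written as strings of single-letter events, as in the examples above; the function names are hypothetical.

```python
def splits(s):
    """All decompositions S = S_prefix + S_suffix with both parts non-empty."""
    return [(s[:k], s[k:]) for k in range(1, len(s))]

def suffix_sequences(s, event):
    """Suffix sequences of `event` in `s`: whatever follows each occurrence."""
    return [s[i + 1:] for i, e in enumerate(s) if e == event and i + 1 < len(s)]

def long_suffix_sequence(s, event):
    """The suffix sequence of `event` that has no super-suffix sequence.

    This is the suffix after the first occurrence, i.e. the longest one.
    """
    suffixes = suffix_sequences(s, event)
    return max(suffixes, key=len) if suffixes else None

print(splits("abdac"))
print(suffix_sequences("abdacb", "a"))
print(long_suffix_sequence("abdacb", "a"))
```

For S = abdacb and the event a, the two suffix sequences are bdacb and cb, and bdacb is the long suffix sequence, matching the example in the text.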
Consider a web access transaction database
$WATDB = \{(t_1, S_1), (t_2, S_2), \ldots, (t_m, S_m)\}$, in which $S_i$ ($1 \le i \le m$) is a
web access sequence and $t_i$ is a transaction time. Given a calendar-based periodic time
constraint $C$, $WATDB(C) = \{(t_i, S_i) \mid t_i \text{ is covered by } C,\ 1 \le i \le m\}$ is
the subset of $WATDB$ under $C$, and $|WATDB(C)|$ is called the length of $WATDB$ under $C$.
The support of $S$ in $WATDB$ under $C$ is defined as
$$\mathrm{sup}(S, C) = \frac{|\{(t_i, S_i) \in WATDB(C) : S \text{ is contained in } S_i\}|}{|WATDB(C)|}$$
A web access sequence $S$ is called a periodic sequential access pattern
if $\mathrm{sup}(S, C) \ge MinSup$, where $MinSup$ is a given support threshold.
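The support computation can be sketched as follows. Several assumptions are made here that the text does not fix: support is taken to be the fraction of transactions covered by C that contain S, "contain" is interpreted as S being a (not necessarily contiguous) subsequence, and the calendar constraint C is modelled as a predicate over transaction times. The sample database and the constraint `weekday_mornings` are invented for illustration.

```python
from datetime import datetime

def is_subsequence(pattern, sequence):
    """True if the events of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    return all(event in it for event in pattern)

def support(pattern, watdb, covered_by):
    """Fraction of transactions under the constraint that contain `pattern`."""
    under_c = [(t, s) for t, s in watdb if covered_by(t)]
    if not under_c:
        return 0.0
    hits = sum(1 for _, s in under_c if is_subsequence(pattern, s))
    return hits / len(under_c)

def weekday_mornings(t):
    """Hypothetical calendar constraint C: Monday to Friday, 08:00 to 11:59."""
    return t.weekday() < 5 and 8 <= t.hour < 12

# Hypothetical transaction database of (transaction time, access sequence).
watdb = [
    (datetime(2024, 1, 1, 9, 0), "abdac"),      # Monday morning: covered
    (datetime(2024, 1, 2, 10, 30), "eaebcac"),  # Tuesday morning: covered
    (datetime(2024, 1, 3, 15, 0), "babfaec"),   # Wednesday afternoon: not covered
    (datetime(2024, 1, 6, 9, 0), "afbacfc"),    # Saturday: not covered
]

MIN_SUP = 0.75  # assumed threshold
for pattern in ("abc", "dc"):
    sup = support(pattern, watdb, weekday_mornings)
    print(pattern, sup, "periodic pattern" if sup >= MIN_SUP else "below MinSup")
```

Only the two weekday-morning transactions count towards the denominator, so a sequence that is frequent overall can still fall below the threshold within a given period, and vice versa.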
IV. OBJECTIVES
The primary objective of this research is to investigate different web usage mining techniques
that can discover the knowledge hidden in web usage logs. In particular, this research focuses
on mining web server logs and client-side logs. The discovered knowledge will be applied for
some practical advanced web applications such as web recommendation, web personalization
and information retrieval from web usage data.
To achieve the objective, we will investigate the following:
Develop new techniques for discovering knowledge from web logs. In particular, we will
investigate data mining techniques for mining sequential access patterns and association
access patterns from web logs. The purpose is to develop efficient and effective mining
algorithms to discover access patterns from usage logs. Besides common sequential patterns
and traditional Apriori-based association rules, we are also interested in mining some special
access patterns.
Investigate techniques for applying the discovered knowledge for practical advanced web
applications. The web applications can also serve the purpose to evaluate the quality and
effectiveness of the discovered access patterns. The applications will be developed for
providing effective and practical web services. In particular, we focus on applications such as
web recommendation and web user personalization and profiling.
Investigate new techniques for semantic web usage mining. We will investigate some
potential new techniques for extracting semantics from web logs and mining web usage data
based on the Semantic Web. In particular, we will focus on ontology learning and extraction.
V. PLAN OF ACTION
Web usage mining techniques can be applied to many practical applications, including the
following:
Personalization: Web personalization is the process of customizing the content and
structure of a website to the specific and individual needs of users. The website is
personalized through the highlighting of the existing hyperlinks, dynamically inserting new
hyperlinks that seem to be interesting to the current user, or even creating new index pages.
System Improvement: By analyzing web traffic behavior, the frequent access patterns can
be discovered and applied for developing policies of web caching, document pre-fetching,
and data distribution. As such, the performance of the system can be improved. Web usage
mining is also useful for detecting intrusions and frauds by discovering frequent unexpected
access patterns and outliers.
Site Modification: The ways in which users access a website are restricted by the website’s
link structure. The organization of web pages within the website has great influence on the
quality of the web service provided. Since web logs store user access behaviors, web usage
mining can be applied to provide insight on the organization of the website in order to
improve user browsing activities.
E-Business Intelligence: Web logs of e-commerce websites can provide information on
how customers purchase products online. Web usage mining can be used to gather e-business
intelligence and identify potential customers to improve sales and advertisements.
VI. CONCLUSION
Although the World Wide Web is the largest source of electronic information, it lacks
effective methods for retrieving, filtering, and displaying the information that is exactly
needed by each user. With the advent of the Internet, there has been a dramatic growth in the
data available on the World Wide Web. Hence the task of retrieving only the required information
keeps becoming more difficult and time consuming. To reduce information
overload and create customer loyalty, Web personalization, a significant tool that provides
users with important competitive advantages, is required. A personalized information
retrieval approach that is mainly based on end-user modeling increases user satisfaction.
Personalizing web search results has also been shown to greatly improve the search
experience. This paper reviews the various research activities carried out to improve the
performance of the personalization process and of information retrieval systems.
VII. REFERENCES
[1] P. Markellou, Maria Rigou, Spiros S., Mining for Web Personalization. Web Mining:
Applications and Techniques.
[2] Honghua Dai, Bamshad Mobasher, Integrating Semantic Knowledge with Web Usage
Mining for Personalization. Web Mining: Applications and Techniques.
[3] C. Ramesh, Dr. K. V. Chalapati Rao, Dr. A. Goverdhan, A Semantically Enriched Web
Usage Based Recommendation Model. International Journal of Computer Science and
Information Technology (IJCSIT), Vol 3, No 5, Oct 2011.
[4] Eirinaki, M. and Vazirgiannis, M., Web Mining for Web Personalization [Electronic
version]. ACM Transactions on Internet Technology, Volume 3, Issue 1, Pages 1-27, 2003.
[5] Mobasher, B., Cooley, R. and Srivastava, J., Automatic Personalization Based on Web
Usage Mining [Electronic version]. Communications of the ACM, Vol 43, No. 8, 2000.