A PROPOSAL TO RESEARCH ON “USER NAVIGATIONAL PATTERNS FOR WEB PERSONALIZATION AND RECOMMENDATION”

PREPARED BY
CH. MUTYALA RAO
Research Scholar (Part Time)-CSE D
OSMANIA UNIVERSITY, HYDERABAD

ABSTRACT

With the never-ending growth of Web services and Web-based information systems, the volume of clickstream and user data collected by Web-based organizations in their daily operations has reached enormous proportions. Analysing such data can help to evaluate the effectiveness of promotional campaigns, optimize the functionality of Web-based applications, and provide more personalized content to visitors. The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. With millions of pages available on the Web, it has become difficult to access relevant information. One possible approach to this problem is Web personalization, defined as any action that customizes the information or services provided by a website to an individual. Web usage mining is used to discover interesting user navigation patterns and can be applied to many real-world problems, such as improving websites, making additional topic or product recommendations, and studying user behavior. A Web usage mining system performs five major tasks: i) data gathering, ii) data preparation, iii) navigation pattern discovery, iv) pattern analysis and visualization, and v) pattern applications. Having a clear and well-organized website has become one of the primary objectives of enterprises and organizations. Website administrators may want to know how they can attract visitors, which pages are accessed most or least frequently, and which parts of the website are most or least popular and need enhancement. Of late, the rapid growth in the use of the Internet has made automatic knowledge extraction from server log files a necessity.
Analysis of server log data can provide significant and useful information, which can improve the effectiveness of websites by adapting their information structure to the users’ behavior. Most Web usage mining techniques use server log files as raw data to produce user navigation patterns. Along with the server access log file, we incorporate website knowledge into the Web usage mining phases. This incorporation can lead to superior patterns. These patterns can be used to provide a set of recommendations that the website administrator can deploy to enhance the website.

Keywords: Semantic Web, Web Personalization, Usage Mining, Navigational Patterns, Pattern Analysis, Ontology

LIST OF CONTENTS

ABSTRACT
I. INTRODUCTION
II. LITERATURE SURVEY
III. PROBLEM STATEMENT
IV. OBJECTIVES
V. PLAN OF ACTION
VI. CONCLUSION
VII. REFERENCES

I. INTRODUCTION

The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. A layered structure has been suggested for the Semantic Web: (1) Unicode/URI (Uniform Resource Identifiers); (2) XML/Namespaces/XML Schema; (3) RDF (Resource Description Framework)/RDF Schema; (4) Ontology vocabulary; (5) Logic; (6) Proof; and (7) Trust. Uniform Resource Identifiers (URIs) are a fundamental component of the current Web and a foundation of the Semantic Web. URIs provide the ability to uniquely identify resources as well as relationships among resources. The Extensible Markup Language (XML) is also a fundamental component for supporting the Semantic Web. XML provides an interoperable syntactic foundation upon which the more important issue of representing relationships and meaning can be built. The Resource Description Framework (RDF) family of standards leverages URIs and XML to provide a stepwise set of functionality for representing these relationships and meaning.
An ontology is a specification of a conceptualization, which is an abstract, simplified view of the world that we wish to represent for some purpose. For the Semantic Web, an ontology is a conceptualization of a domain in a human-understandable but machine-readable format consisting of entities, attributes, relationships, and axioms. The Proof and Trust layers follow from the understanding that it is important to be able to check the validity of statements made in the Semantic Web. These two layers are rarely tackled today, but are interesting topics for future research.

With the Semantic Web, the large amount of information on the Web can be shared, reused, and managed effectively. Since machines can interpret content on the Semantic Web, more advanced automated processing on the Web becomes possible. Intelligent search engines can be developed to help people find relevant information by using a semantic query language. New knowledge can be derived from existing information efficiently. Many advanced applications and services, such as e-business, e-government, and e-learning, become possible.

With the never-ending growth of Web services and Web-based information systems, the volume of clickstream and user data collected by Web-based organizations in their daily operations has reached enormous proportions. Analysing such data can help to evaluate the effectiveness of promotional campaigns, optimize the functionality of Web-based applications, and provide more personalized content to visitors. Web usage mining is the process of applying data mining techniques to discover usage patterns from data extracted from Web log files. It mines the secondary data (Web logs) derived from the users' interaction with Web pages over a period of Web sessions. Web usage mining consists of three phases, namely pre-processing, pattern discovery, and pattern analysis.
Web usage mining analyses the usage patterns of websites in order to gain an improved understanding of the users’ interests and requirements. This information is especially valuable for e-business sites seeking to improve customer satisfaction. Web personalization is the process of customizing a website to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the users' navigational behaviour (usage data) in correlation with other information collected in the Web context, namely structure, content, and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum in both research and commercial areas. In practice, a website is personalized by highlighting existing hyperlinks, dynamically inserting new hyperlinks that seem to be of interest to the current user, or even creating new index pages. Content on the Web is increasing rapidly in every field, and so is the need to identify and retrieve exactly the content that matches the needs of each user. Predicting user needs in order to improve the usability of a website has therefore become essential.

II. LITERATURE SURVEY

A) PERSONALIZATION AND WEB USAGE MINING

The aim of personalization based on Web usage mining is to recommend a set of objects to the current user, as determined by matching usage patterns. This task is accomplished by matching the active user session with the usage patterns discovered through Web usage mining. This process is performed by the recommendation engine, which is the online component of the personalization system.
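The matching step performed by the recommendation engine can be illustrated with a minimal sketch. It assumes, purely for illustration, that the discovered usage patterns are sequences of pages with an associated support value, and that a pattern matches when its beginning coincides with the most recent pages of the active session. The function name `recommend`, the page names, and the scoring rule are hypothetical and are not taken from the surveyed work.

```python
def recommend(active_session, patterns, top_n=3):
    """Suggest next pages for the active session.

    active_session: list of pages visited so far, oldest first.
    patterns: dict mapping a tuple of pages (a discovered usage pattern)
              to its support value. This representation is an assumption.
    """
    scores = {}
    for pattern, sup in patterns.items():
        # Try the longest pattern prefix first; a prefix matches when it
        # equals the most recent pages of the active session.
        for k in range(len(pattern) - 1, 0, -1):
            if tuple(active_session[-k:]) == pattern[:k]:
                candidate = pattern[k]  # the page that follows the matched prefix
                if candidate not in active_session:
                    scores[candidate] = max(scores.get(candidate, 0.0), sup)
                break
    # Highest support first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda page: (-scores[page], page))[:top_n]


# Hypothetical patterns and session, for illustration only.
patterns = {
    ("home", "products", "cart"): 0.4,
    ("products", "reviews"): 0.3,
    ("home", "about"): 0.2,
}
suggested = recommend(["home", "products"], patterns)
print(suggested)  # ['cart', 'reviews']
```

Ranking by support is only one possible choice; a real engine could also weight patterns by how much of the session they match.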
The process of Web personalization based on Web usage mining consists of three phases: data preparation, pattern discovery, and recommendation. The data preparation phase transforms unprocessed Web log files into transaction data that can then be processed by data mining tasks. In the pattern discovery phase, various data mining techniques can be applied to this transaction data, such as clustering, association rule mining, and sequential pattern discovery. The results of the mining phase are transformed into aggregate usage profiles, which are suitable for use in the recommendation phase. The recommendation engine takes into account the active user session in conjunction with the discovered patterns to provide personalized content.

B) PROCESS OF PERSONALIZATION

The personalization process consists of data collection, data analysis, and personalized output.

1) Data Collection

Web personalization is based on three general types of data: data about the user, data about the website usage, and data about the software and hardware available on the user’s side.

Data about the user: This category denotes information about the personal characteristics of the user, such as:
Demographics (name, phone number, geographic information, age, sex, education, etc.)
Skills and capabilities
Interests and preferences
Goals and plans

2) Data Analysis

Data analysis involves the following phases: data preparation and preprocessing, pattern discovery, and pattern analysis.

C) PATTERN DISCOVERY

Pattern discovery aims to detect interesting patterns in the preprocessed Web usage data by deploying statistical and data mining methods that include:
I. Association rule mining
II. Clustering
III. Classification
IV. Sequential pattern discovery

D) PATTERN ANALYSIS

In this final phase, the objective is to convert discovered rules, patterns, and statistics into knowledge or insight about the website being analyzed.
Knowledge here is an abstract notion that in essence describes the transformation from information to understanding; it is thus highly dependent on the human performing the analysis and reaching conclusions.

III. PROBLEM STATEMENT

The current focus is on the application of Web usage mining for automatically determining Web recommendations. Recommendations help users to quickly find the information they want or find interesting. They also allow website owners to optimize the website, increase user satisfaction, and save on the costs of content management. Recommendations are determined dynamically, either from manually specified rules or automatically by different recommendation algorithms. Data warehouse technology is used to manage large amounts of usage data effectively and to support various recommender algorithms.

Web usage mining serves as an enabling mechanism to overcome the problems associated with more traditional Web personalization techniques such as collaborative or content-based filtering. These problems include lack of scalability, reliance on subjective user ratings or static profiles, and the inability to capture a richer set of semantic relationships among objects. However, usage-based personalization can itself be problematic when little usage data is available for some objects or when the site content changes regularly. For more effective personalization, both usage and content attributes of a site must be integrated into a Web mining framework and used by the recommendation engine in a uniform manner.

Web personalization is the process of customizing the content and structure of a website to the specific and individual needs of users. The website is personalized by highlighting existing hyperlinks, dynamically inserting new hyperlinks that seem to be of interest to the current user, or even creating new index pages.
Therefore, further research needs to be carried out to identify new intelligent techniques and services for Web users.

Generally, Web logs can be regarded as a collection of sequences of access events from one user or session in ascending timestamp order. Preprocessing tasks, including data cleaning, user identification, session identification, and transaction identification, can be applied to the original Web log files to obtain the Web access transactions.

Let E be a set of unique access events, which represent Web resources accessed by users, i.e. Web pages, URLs, or topics. A Web access sequence S = e1e2…en (ei ∈ E for 1 ≤ i ≤ n) is a sequence of access events, and |S| = n is called the length of S. Note that it is not necessary that ei ≠ ej for i ≠ j in S; that is, repetition of items is allowed. A Web access transaction, denoted as WAT = (t, S), consists of a transaction time t and a Web access sequence S. All the Web access transactions in a database can belong to either a single user (for client-side logs) or multiple users (for server-side and proxy logs). The proposed algorithm does not depend on the type of Web log that contains the Web access transactions.

Suppose we have a set of Web access transactions with the access event set E = {a, b, c, d, e, f}. In S = e1e2…ek ek+1…en, Sprefix = e1e2…ek is called a prefix sequence of S, or a prefix sequence of ek+1 in S, and Ssuffix = ek+1ek+2…en is called a suffix sequence of S, or a suffix sequence of ek in S. A Web access sequence can be denoted as S = Sprefix + Ssuffix. For example, S = abdac can be denoted as S = a+bdac = ab+dac = … = abda+c. Let S1 and S2 be two suffix sequences of ei in S, where S1 is also a suffix sequence of ei in S2. Then S1 is called a sub-suffix sequence of S2, and S2 is a super-suffix sequence of S1. The suffix sequence of ei in S that has no super-suffix sequence is called the long suffix sequence of ei in S.
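The prefix and suffix notions defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a Web access sequence is represented as a string with one character per access event, and the function names `splits`, `suffix_sequences`, and `long_suffix_sequence` are hypothetical helpers, not part of the proposed algorithm.

```python
def splits(sequence):
    """Return every way to write `sequence` as Sprefix + Ssuffix
    with both parts non-empty."""
    return [(sequence[:k], sequence[k:]) for k in range(1, len(sequence))]


def suffix_sequences(event, sequence):
    """Return the suffix sequences of `event` in `sequence`: for each
    occurrence of the event, the non-empty part of the sequence after it."""
    return [
        sequence[k + 1:]
        for k, e in enumerate(sequence)
        if e == event and k + 1 < len(sequence)
    ]


def long_suffix_sequence(event, sequence):
    """Return the suffix sequence of `event` that has no super-suffix
    sequence, i.e. the one following its first occurrence."""
    suffixes = suffix_sequences(event, sequence)
    return suffixes[0] if suffixes else None


S = "abdac"
print(splits(S))                     # [('a', 'bdac'), ('ab', 'dac'), ('abd', 'ac'), ('abda', 'c')]
print(suffix_sequences("a", S))      # ['bdac', 'c']
print(long_suffix_sequence("a", S))  # bdac
```

Because every later suffix sequence of an event is contained in the one that follows its first occurrence, the long suffix sequence is simply the first entry of the list.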
For example, if S = abdacb, then S1 = cb is a sub-suffix sequence of S2 = bdacb, and S2 is a super-suffix sequence of S1. S2 is also the long suffix sequence of a in S.

Let WATDB = {(t1, S1), (t2, S2), …, (tm, Sm)} be a Web access transaction database in which Si (1 ≤ i ≤ m) is a Web access sequence and ti is a transaction time. Given a calendar-based periodic time constraint C, WATDB(C) = {(ti, Si) | ti is covered by C, 1 ≤ i ≤ m} is the subset of WATDB under C, and |WATDB(C)| is called the length of WATDB under C. The support of S in WATDB under C is defined as

sup(S, C) = |{(ti, Si) ∈ WATDB(C) | S is a subsequence of Si}| / |WATDB(C)|

A Web access sequence S is called a periodic sequential access pattern if sup(S, C) ≥ MinSup, where MinSup is a given support threshold.

IV. OBJECTIVES

The primary objective of this research is to investigate different Web usage mining techniques that can discover the knowledge hidden in Web usage logs. In particular, this research focuses on mining Web server logs and client-side logs. The discovered knowledge will be applied to practical advanced Web applications such as Web recommendation, Web personalization, and information retrieval from Web usage data. To achieve this objective, we will investigate the following:

Develop new techniques for discovering knowledge from Web logs. In particular, we will investigate data mining techniques for mining sequential access patterns and association access patterns from Web logs. The purpose is to develop efficient and effective mining algorithms to discover access patterns from usage logs. Besides common sequential patterns and traditional Apriori-based association rules, we are also interested in mining some special access patterns.

Investigate techniques for applying the discovered knowledge to practical advanced Web applications. The Web applications can also serve to evaluate the quality and effectiveness of the discovered access patterns.
The applications will be developed to provide effective and practical Web services. In particular, we focus on applications such as Web recommendation and Web user personalization and profiling.

Investigate new techniques for semantic Web usage mining. We will investigate potential new techniques for extracting semantics from Web logs and mining Web usage data based on the Semantic Web. In particular, we will focus on ontology learning and extraction.

V. PLAN OF ACTION

Web usage mining techniques are to be applied to many practical applications, including the following:

Personalization: Web personalization is the process of customizing the content and structure of a website to the specific and individual needs of users. The website is personalized by highlighting existing hyperlinks, dynamically inserting new hyperlinks that seem to be of interest to the current user, or even creating new index pages.

System Improvement: By analyzing Web traffic behavior, frequent access patterns can be discovered and applied to develop policies for Web caching, document pre-fetching, and data distribution, thereby improving the performance of the system. Web usage mining is also useful for detecting intrusions and fraud by discovering frequent unexpected access patterns and outliers.

Site Modification: The ways in which users access a website are constrained by the website’s link structure. The organization of Web pages within the website has great influence on the quality of the Web service provided. Since Web logs store user access behavior, Web usage mining can provide insight into the organization of the website in order to improve user browsing.

E-Business Intelligence: Web logs of e-commerce websites can provide information on how customers purchase products online. Web usage mining can be used to gather e-business intelligence and identify potential customers in order to improve sales and advertising.
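As a minimal sketch of how frequent access patterns might be counted for the applications above, the following Python code computes the support of an access sequence under a time constraint. It assumes that support is the fraction of transactions covered by the constraint whose access sequence contains the pattern as a subsequence (events in order, gaps allowed). The transactions, the `weekday_morning` constraint, and the function names are hypothetical and chosen only for illustration.

```python
from datetime import datetime


def is_subsequence(pattern, sequence):
    """True if the events of `pattern` occur in `sequence` in the same
    order, not necessarily contiguously."""
    remaining = iter(sequence)
    return all(event in remaining for event in pattern)


def support(pattern, transactions, covered_by):
    """Fraction of the transactions whose time satisfies `covered_by`
    and whose access sequence contains `pattern`."""
    selected = [seq for t, seq in transactions if covered_by(t)]
    if not selected:
        return 0.0
    return sum(is_subsequence(pattern, seq) for seq in selected) / len(selected)


def weekday_morning(t):
    """Hypothetical periodic constraint: Monday to Friday, 09:00 to 11:59."""
    return t.weekday() < 5 and 9 <= t.hour < 12


# Hypothetical Web access transactions (time, access sequence).
transactions = [
    (datetime(2024, 1, 1, 9, 30), "abdac"),     # Monday morning
    (datetime(2024, 1, 1, 14, 0), "eaebcac"),   # Monday afternoon
    (datetime(2024, 1, 2, 10, 15), "babfaec"),  # Tuesday morning
    (datetime(2024, 1, 6, 11, 0), "afbacfc"),   # Saturday morning
]

print(support("ac", transactions, weekday_morning))  # 1.0
print(support("dc", transactions, weekday_morning))  # 0.5
```

A sequence would then be reported as a periodic sequential access pattern when its support reaches a chosen threshold, for example `support(p, transactions, weekday_morning) >= 0.5`.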
VI. CONCLUSION

Although the World Wide Web is the largest source of electronic information, it lacks effective methods for retrieving, filtering, and displaying the information that is needed by each user. With the advent of the Internet, the data available on the World Wide Web has grown dramatically, so the task of retrieving only the required information keeps becoming more difficult and time consuming. To reduce information overload and create customer loyalty, Web personalization, a significant tool that provides users with important competitive advantages, is required. A personalized information retrieval approach based mainly on end-user modeling increases user satisfaction. Personalizing Web search results has also been shown to greatly improve the search experience. This proposal reviews the various research activities carried out to improve the performance of the personalization process and of information retrieval systems.

VII. REFERENCES

[1] P. Markellou, Maria Rigou, Spiros S., Mining for Web Personalization. Web Mining: Applications and Techniques.
[2] Honghua Dai, Bamshad Mobasher, Integrating Semantic Knowledge with Web Usage Mining for Personalization. Web Mining: Applications and Techniques.
[3] C. Ramesh, Dr. K. V. Chalapati Rao, Dr. A. Goverdhan, A Semantically Enriched Web Usage Based Recommendation Model. International Journal of Computer Science and Information Technology (IJCSIT), Vol. 3, No. 5, Oct 2011.
[4] Eirinaki, M. and Vazirgiannis, M., Web Mining for Web Personalization [Electronic version]. ACM Transactions on Internet Technology, Vol. 3, No. 1, pp. 1-27, 2003.
[5] Mobasher, B., Cooley, R. and Srivastava, J., Automatic Personalization Based on Web Usage Mining [Electronic version]. Communications of the ACM, Vol. 43, No. 8, 2000.