International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 3, Issue 2, February 2013)

Web Usage Mining – Its Application in E-Services

Anupama Prasanth
Research Scholar, Karpagam University, India
Lecturer, AMA International University, Bahrain

Abstract: Retrieving knowledge from the World Wide Web is a tedious task because the number of information resources available on it keeps growing. This increases the need for intelligent systems that can retrieve knowledge from the Web. Extracting information from the Web with web mining tools boosts the performance of Web information retrieval and Web-based data warehousing. Web usage mining is one of the fastest developing areas of web mining. Its focus on analyzing users' behavior on the Web by exploring access logs has made it rapidly popular, especially in e-services. Most e-service providers have realized that they can apply this tool to retain their customers. This paper tries to provide an insight into web mining and its different areas. It then focuses on Web usage mining and its application and impact in e-services.

Keywords: E-Commerce, E-Governance, E-Learning, E-Services, Pattern Analysis, Pattern Discovery, Pre-processing, Web Content Mining, Web Structure Mining, Web Usage Mining.

I. INTRODUCTION
The World Wide Web hosts a variety of information service centers, such as news sites, encyclopedias, education sites and e-commerce sites, so the information on the WWW is spread across these centers globally. Retrieving information from these distributed storage areas is quite difficult and requires an efficient tool to find the desired information. Only an intelligent system that effectively mines for knowledge can resolve these problems [6]. The following factors make effective data warehousing and data mining on the Web difficult [6]:
1. The huge size of the Web.
2. The lack of a proper structure for Web documents.
3. The dynamic nature of the information sources.
4. The diversity in usage and user communities.
However, these challenges are also the driving force for research into the efficient and effective discovery and use of resources on the Internet. The fundamental characteristics of the Web force us to reshape and extend traditional methodologies accordingly.

To mine the Web, the huge volume of distributed data must first be gathered. Because these data are semi-structured, the next task is to extract them and convert them into a standard format for easy processing [19]. Because of the diverse nature of the information on the Web, the sample data set is rather large. Despite these difficulties, the Web also offers other ways to support mining; for example, the links among Web pages are an important resource [7]. Besides the challenge of finding relevant information, users face other difficulties when interacting with the Web, such as judging the quality of the information found and creating new knowledge out of the information available [7]. These problems can be addressed with web mining tools, an application of data mining. Such tools tackle challenging problems such as searching Web links, ranking the importance of Web content, discovering the regularities and dynamics of Web content, and mining Web access patterns.

Web mining is divided into three categories according to the area of application. Web usage mining is one of the fastest developing of them [17]. Its focus on analyzing users' behavior on the Web by exploring access logs has made it rapidly popular, especially in e-services. Its direct applications in these areas have increased its appeal and made it an essential part of computer and information science [18]. Web servers maintain details such as user log files and requests for resources, which form the core data of web usage mining. Analyzing these data reveals users' browsing patterns, which can be used for targeted advertising, improving web design, increasing customer satisfaction and market analysis. Most e-service providers have realized that they can apply this tool to retain their customers [20].

This paper focuses on web usage mining and is structured as follows. Section II provides an overview of the Web mining categories. Section III discusses Web usage mining, and Section IV its application in e-services. Section V concludes the paper.

II. WEB MINING OVERVIEW
The discovery and exploration of useful information from the WWW is termed Web mining. This can be understood through the characterization by Kosala and Blockeel [1], who decompose Web mining into the following tasks:
1. Resource finding: retrieve matching documents from the Web.
2. Information selection and pre-processing: select the relevant documents from the retrieved list and pre-process their data.
3. Generalization: analyze the documents and automatically discover general trends.
4. Analysis: use the general trends to draw conclusions.

There are three categories of Web mining [1, 2]: Web usage mining, the process of extracting useful information from user access patterns; Web content mining, the automatic search of information resources available online; and Web structure mining, a tool for identifying authoritative pages. Each category aims at discovering hidden and potentially useful knowledge from the Web, but each concentrates on different kinds of Web data [16]. A brief description of each category follows.

Web pages are semi-structured documents whose structure is represented as a DOM (Document Object Model) tree. Unfortunately, the HTML environment is so flexible that the majority of web pages do not follow a standard structure, which may result in errors in the DOM tree. This is the context in which Web content mining is applied. Its techniques are equivalent to the data mining techniques used for text mining, since similar types of information can be found in the unstructured data residing in Web documents. A Web document usually contains several types of data, such as text, images, audio, video, metadata and hyperlinks. The unstructured character of Web data forces Web content mining towards more complicated approaches. Web content mining can be explained in two different contexts [6]: information retrieval and databases. In information retrieval, the role of Web content mining is mainly to support information finding or to improve information filtering based on user queries; the results can be applied to web search engines and web personalization systems. In the database context, content mining can help integrate data on the Web so that queries more sophisticated than keyword-based search can be performed. The mining results can be used to build a web warehouse or web database and to apply warehousing and database techniques to the data.

Web structure mining tries to identify authoritative web pages. The Web is a complex data store that contains not only pages but also hyperlinks pointing from one page to another. By linking to another page, an author in effect endorses it. This tremendous amount of linkage information is a rich source for web mining: it offers valuable information about the relevance, quality and structure of web content. The result of this category is the hyperlink architecture underlying a website, and appropriate handling of this information can improve the accuracy of web page retrieval.

Web usage mining tries to extract useful information from users' web access histories, such as what they are interested in on the Internet, whether textual or multimedia data. It collects data from Web log records to discover users' access patterns on Web pages. These patterns show which web pages are accessed, which is vital information for companies and their Internet- and intranet-based applications. Companies use the analyzed reports of these patterns for different purposes, and the resulting applications can be classified as personalization, system improvement, site modification, business intelligence and usage characterization [2]. Web usage mining depends on users' cooperation in allowing access to the Web log records. Because of this dependence, privacy is becoming a new issue for Web usage mining, since users should be made aware of privacy policies before they decide to reveal their personal data [8].

In many cases, Web content, structure and usage information is present in the same data file, so there is no clear boundary between these categories.

III. WEB USAGE MINING
Web usage mining provides a better understanding of how to serve the needs of Web-based applications [2]. It is the automatic discovery of user access patterns from Web servers. Companies collect huge quantities of data in their day-to-day operations; these data are usually generated by the web servers and saved in the server access logs. Web usage mining offers many benefits that attract businesses and government agencies. Government agencies use the classification and prediction capabilities of this technology to fight terrorism and identify criminal activities.
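The raw material for web usage mining is the server access log mentioned above. As a minimal sketch, assuming the server writes entries in the NCSA Common Log Format (the paper does not specify a log format), a single log line can be split into fields with Python's standard library. The function name `parse_log_line` and the sample entry are hypothetical and for illustration only.

```python
import re
from datetime import datetime

# Assumed input format: NCSA Common Log Format
#   host ident authuser [day/month/year:hh:mm:ss zone] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Parse one access-log line into a dict; return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None  # malformed or unsupported line, left for the cleaning step
    entry = match.groupdict()
    # Timestamps look like "10/Oct/2000:13:55:36 -0700".
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # "-" means the server sent no response body.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


sample = ('192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /products/index.html HTTP/1.0" 200 2326')
record = parse_log_line(sample)
print(record["host"], record["path"], record["status"])
```

Because `re.match` anchors only at the start of the line, entries in the Combined Log Format (which append referrer and user-agent fields) also parse, but those extra fields are ignored here.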
Business sectors benefit through personalized marketing, customer retention and customer relationship management, and they even gain the opportunity to provide promotional offers to specific customers in order to retain them. These activities are carried out by the three major phases of Web usage mining shown in Figure I [21]: preprocessing, pattern discovery and pattern analysis [9].

A. Preprocessing
Preprocessing is the conversion of raw user information into the data abstractions that are essential for pattern discovery. According to the type of data processed, it is divided into three categories: usage preprocessing, content preprocessing and structure preprocessing. Usage preprocessing is one of the most difficult tasks in web usage mining. Its data are gathered from IP addresses, user agents and server-side click streams, and because of the nature of these data they are almost always incomplete. Content preprocessing handles text, images, scripts and multimedia files. Structure preprocessing involves processing the hyperlinks between page views.

B. Pattern Discovery
Pattern discovery tools are derived from several fields, such as statistics, data mining, machine learning and pattern recognition. The pattern discovery process can start only after the data have been cleaned and the user transactions and sessions have been identified from the access logs. Statistical techniques are used to abstract knowledge about the website's visitors. From this abstracted knowledge, association rules reveal associations between frequently referenced pages, and sequential pattern tools help predict future visit patterns. Clustering tools group items with similar characteristics together; the groups of most interest in web usage mining are usage clusters and page clusters. Classification tools perform generalization by mapping each item into one of several predefined classes.

C. Pattern Analysis
Pattern analysis is the last phase of Web usage mining. It filters out the uninteresting patterns from the set found in the pattern discovery phase. A knowledge query mechanism such as SQL is the most common form of pattern analysis. Such mechanisms can also use content and structure information to filter out patterns that contain pages of certain usage types or content types, or pages that match a certain hyperlink structure.

FIGURE I: WEB USAGE MINING PROCESS

IV. APPLICATION IN E-SERVICES
The World Wide Web has become a widespread medium for the circulation of information. Advances in technology mean that the volume of data on the Web and the complexity of its structure are increasing day by day. In this scenario, the application of web usage mining has its own significance. The rapid growth of online services, called e-service applications, such as e-commerce, e-governance, e-markets, e-finance, e-learning and e-banking, has put the business community and customers in a new situation. Adopting intelligent marketing strategies is the only way for the business community to meet the challenges posed by business competition and by customers' freedom to choose from several alternatives. This paper focuses on the application of web usage mining in three main e-services: e-commerce, e-governance and e-learning.

A. E-Commerce
E-commerce means that two trading parties, using the Internet and following certain rules or standards, carry out the whole traditional business activity in digital network mode [10]. Because products and services are bought and sold over the Internet, e-commerce generates a huge volume of interactions. The tremendous growth of e-commerce enterprises has led to a product surplus. These enterprises also face a common question: how to know customer satisfaction and purchase trends. Competition in the field has raised the need to serve customers better and has also created many issues. To provide better service to users, an efficient marketing strategy is required for analyzing their satisfaction and usage.

Discovering usage patterns from web data is the technique adopted in web usage mining, and it has become an important technology for understanding users' behavior on the Web. The customer click streams, buying patterns and traversal patterns it collects are vital information for e-commerce enterprises. They help in analyzing demographic data and in proposing cross-marketing policies across products and services. Web usage mining also helps e-commerce sites retain their most profitable customers [6], improve the functionality of web-based applications and provide more customized content to visitors. In addition, with web usage mining techniques e-commerce companies can improve product quality or sales by anticipating problems before they occur. These techniques also reveal previously unknown buying patterns and behavior of online customers. More importantly, the fast feedback that companies obtain through web usage mining is very helpful in increasing their profits [7].

B. E-Governance
E-governance provides a single web portal that integrates all services, including those of government, nonprofit and private-sector entities [13]. In such a service system, which provides ready access to information, the quality of the user interface is an important factor. It is one of the most challenging user-centric parameters, since the system has to provide information to a large and diverse set of users [15]. A presentation subsystem that adapts to the individual preferences of each user would ensure wide participation in e-governance systems. Web usage mining techniques can discover patterns in users' online behavior. These patterns reveal users' interests and can be used to fine-tune user interfaces and suggest the most appropriate browsing paths. Users' requirements are also reflected in their navigation behavior. The analyzed results can serve as knowledge for intelligent online applications, refined website maps and web-based personalized systems. This technology also uses the experience of users in past sessions to provide recommendations to users in the current session [14].

C. E-Learning
E-learning is a form of electronically supported learning that allows people to learn any subject at any time and anywhere. The simplicity of browsing resources on the Web and the ease of deploying and maintaining resources have made the Web an excellent tool for delivering courses. The Web has become the major choice for managing and maintaining learning resources and one of the leading choices of modern distance education systems. As education becomes more technologically advanced, the complexity of the available learning resources increases accordingly. It is difficult to evaluate the structure of the course content and its effectiveness in the learning process, and tracking and judging all the activities performed by learners is difficult and time consuming. This is where web usage mining can contribute. The pattern analysis capability of web usage mining plays an important role in web-based learning systems: it can analyze the behavior of students and instructors [11] and improve the educational experience.
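The analysis of learner behavior described above can start from very simple per-learner statistics. The following is a minimal sketch, assuming hypothetical (learner, page) access records already extracted from a course website's log; the record format and the function `summarize_learners` are illustrative assumptions, not a method taken from the cited work. It counts each learner's accesses, the number of distinct pages viewed and the most visited page, which an instructor could use to compare learners.

```python
from collections import Counter, defaultdict

# Hypothetical access records from a course website: (learner_id, page).
# In practice these would come from the preprocessed server access log.
records = [
    ("alice", "/course/week1/lecture"),
    ("alice", "/course/week1/quiz"),
    ("alice", "/course/week1/lecture"),
    ("bob", "/course/week1/lecture"),
    ("bob", "/course/forum"),
    ("bob", "/course/forum"),
    ("bob", "/course/forum"),
]


def summarize_learners(records):
    """Return, per learner, total accesses, distinct pages and most-visited page."""
    pages_by_learner = defaultdict(Counter)
    for learner, page in records:
        pages_by_learner[learner][page] += 1

    summary = {}
    for learner, counts in pages_by_learner.items():
        top_page, _ = counts.most_common(1)[0]
        summary[learner] = {
            "accesses": sum(counts.values()),
            "distinct_pages": len(counts),
            "top_page": top_page,
        }
    return summary


for learner, stats in sorted(summarize_learners(records).items()):
    print(learner, stats)
```

Such counts are descriptive only; the pattern discovery techniques of Section III (association rules, sequential patterns, clustering) would be applied to the same cleaned records to go further.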
Tracking the activities on the course website and mining patterns from them is also beneficial for improving or adapting the course content. This allows instructors to appraise access behavior, assess learning activities and compare learners. Another advantage of Web usage mining is that the arrangement of the course content can be improved by analyzing the traversal paths through the course content web pages [12].

V. CONCLUSION
Web usage mining is becoming an active and interesting field of research because of its prospective commercial benefits. Visitors' behavior can be analyzed further by linking Web logs with cookies and forms, which could help e-service sites address several business questions. Its focus on analyzing users' behavior on the Web by exploring access logs has made it rapidly popular, especially in e-services. Web servers maintain details such as user log files and requests for resources, which form the core data of web usage mining. Analyzing these data reveals users' browsing patterns, which can be used for targeted advertising, improving web design, increasing customer satisfaction and market analysis. Most e-service providers have realized that they can apply this tool to retain their customers.

REFERENCES
[1] E. Kim, W. Kim, Y. Lee. Purchase propensity prediction of EC customer by combining multiple classifiers based on GA. International Conference on Electronic Commerce, 2000: 274-280.
[2] J. B. Schafer, J. A. Konstan, J. Riedl. E-commerce recommendation applications. Data Mining and Knowledge Discovery, 2001(5): 115-153.
[3] S. W. Changchien, T. Lu. Mining association rules procedure to support on-line recommendation by customers and products fragmentation. Expert Systems with Applications, 2001(20): 325-335.
[4] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Analysis of recommendation algorithms for e-commerce. Proceedings of the ACM E-Commerce 2000 Conference, 2001: 158-167.
[5] S. Yuan, W. Chang. Mixed-initiative synthesized learning approach for Web-based CRM. Expert Systems with Applications, 2001(20): 187-200.
[6] http://revistaie.ase.ro/content/51/104%20-20 SIVA RAMA KRISHNAN, %20BALAKRISHNAN.pdf
[7] J. G. Liu, H. H. Huang. Web Mining for Electronic Business Application. Proceedings of the Fourth International Conference on Parallel and Distributed Computing, Applications and Technologies, Chengdu, China, 2003: 872-876.
[8] http://www.ceng.metu.edu.tr/~nihan/ceng553/StudentPapers/016351 56
[9] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan. Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. ACM SIGKDD Explorations, Volume 1, Issue 2, January 2000.
[10] Ning Bin, Lei Yuan. Research on Application of Web Mining in E-commerce. Advanced Materials Research, Scientific.Net, Volume 403-408, Pages 1830-1833, November 2011.
[11] C. Romero, S. Ventura, M. Pechenizkiy, R. S. Baker. Handbook of Educational Data Mining. CRC Press, 2010.
[12] Bart C. Palmer. Web Usage Mining: Application to an Online Educational Digital Library Service. Digital Commons@USU, 2012.
[13] Zakareya Ebrahim, Zahir Irani. E-government adoption: architecture and barriers. Business Process Management Journal (Emerald), Vol. 11, No. 5, 2005, pp. 589-611.
[14] S. Chakraverty, G. Rani, B. Singla, D. Anand. Experience based recommendations system for e-governance. 2012.
[15] G. Rani, S. Chakraverty. Boosting Interactivity of E-Governance. International Conference on Communication Languages and Signal Processing - with Preference to 4G Technologies (ICCLSP 4G), January 2012.
[16] O. Nasraoui, M. Soliman, E. Saka, A. Badia, R. Germain. A Web Usage Mining Framework for Mining Evolving User Profiles in Dynamic Web Sites. IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 2, pp. 202-215, February 2008.
[17] R. Cooley, B. Mobasher, J. Srivastava. Web Mining: Information and Pattern Discovery on the World Wide Web. Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), November 1997.
[18] H. Lieberman. Letizia: An agent that assists web browsing. Proceedings of the 1995 International Joint Conference on Artificial Intelligence, Montreal, Canada, 1995.
[19] Tao Huachuca, Jiang Lingyan. Web-based Data Mining Behavior Analysis and Research. Fujian Computer, 2004, No. 3.
[20] Jin Fengrong. Study of Web Usage Mining and Discovery of Browse Interest. Master's Degree Thesis, Beijing Science and Technology University, February 2004.
[21] Chu Hue Lee, Yo Lung Lo, Yu Hsiang Fu. A novel prediction model based on hierarchical characteristic of web site. Elsevier, Volume 38, Issue 4, April 2011, Pages 3422-3430.