International Journal of Electronics Communication and Computer Technology (IJECCT)
Volume 1 Issue 1 | September 2011

A Web Usage Mining Framework for Business Intelligence

Sonal Tiwari
Department of MCA
NRI Institute of Information Science & Technology
Bhopal, India

Abstract—In this paper, we introduce a web mining solution for business intelligence that discovers hidden patterns and business strategies from customer and web data. We propose a new framework based on web mining technology. Web mining attempts to extract useful knowledge from secondary data obtained from users' interactions with the web.

Keywords- Data mining, Web Mining, Web Usage Mining, XML

I. INTRODUCTION

Web mining has become vital for effective web site management, creating adaptive web sites, business and support services, personalization, network traffic flow analysis, and so on [1]. The WWW continues to grow at a remarkable rate, both as an information gateway and as a medium for conducting business. Web mining is the extraction of interesting and useful knowledge and implicit information from artifacts or activity related to the WWW [2]. Web mining has been widely used in the past for analyzing huge collections of data, and is currently being applied to a variety of domains [3]. Based on several research studies, web mining can be broadly classified into three domains: content, structure, and usage mining. Web content mining is the process of extracting knowledge from the content of actual web documents (text, multimedia, etc.). Web structure mining targets useful knowledge in the structure of the web, such as hyperlink references. Web usage mining attempts to discover useful knowledge from the secondary data obtained from users' interactions with the web. Web usage mining has become critical for effective web site management, creating adaptive web sites, business and support services, personalization, and network traffic flow analysis.
In this paper, we describe a framework that helps businesses discover hidden insights in their business and web data. We demonstrate how web mining technology can be effectively applied in business intelligence. The proposed framework takes the results of the web mining process as input and converts them into actionable knowledge by enriching them with information that a business analyst can readily interpret.

ISSN: 2249-7838

Figure 1: Web Usage Mining Framework

II. WEB USAGE MINING AND BUSINESS INTELLIGENCE

Rapid business growth has put both the business community and customers in a new situation. Because of intense competition on the one hand and customers' ability to choose among many alternatives on the other, the business community has realized the importance of intelligent marketing strategies and customer relationship management. Web servers record data about user interactions whenever requests for resources are received. Analyzing Web access logs can help in understanding user behavior and the web structure. From the business and applications point of view, knowledge obtained from web usage patterns [5] could be directly applied to efficiently manage activities related to e-business, e-services, and e-education. Accurate web usage information could help attract new customers, retain current customers, improve cross-marketing and sales and the effectiveness of promotional campaigns, and track departing customers. Usage information can also be exploited to improve Web server performance by developing proper prefetching and caching strategies that reduce server response time. User profiles could be built by combining users' navigation paths with other data features, such as page viewing time, hyperlink structure, and page content [5].
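The prefetching strategies mentioned above can be illustrated with a minimal sketch that predicts the next page from past navigation paths. It assumes a first-order Markov model of navigation (the next page depends only on the current one); the paper does not prescribe a prediction model, and the helper names `build_transition_counts` and `pages_to_prefetch` as well as the sample sessions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical helpers: a first-order Markov model of navigation is an
# assumption made for illustration, not a model specified in the paper.

def build_transition_counts(sessions):
    """Count how often each page is followed directly by each other page."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return counts

def pages_to_prefetch(counts, current_page, k=2):
    """Return up to k pages most often requested right after current_page."""
    followers = counts.get(current_page, Counter())
    return [page for page, _ in followers.most_common(k)]

# Hypothetical navigation paths reconstructed from past Web logs.
past_sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/product/42"],
    ["/home", "/about"],
    ["/products", "/cart", "/checkout"],
]
counts = build_transition_counts(past_sessions)
print(pages_to_prefetch(counts, "/products"))  # ['/cart', '/product/42']
```

A server or proxy could prefetch or cache the returned pages while the user reads the current one. A higher-order model that conditions on more of the current navigation path would capture more context, at the cost of sparser counts.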
IJECCT | www.ijecct.org

Web usage mining techniques can be used to anticipate user behavior in real time by comparing the current navigation pattern with typical patterns extracted from past Web logs. Recommendation systems could be developed to recommend links to products likely to interest users. One of the major issues in web log mining is grouping each user's page requests so as to clearly identify the paths that users followed while navigating the web site. The most common approach is to use cookies to track the sequence of a user's page requests, or else to apply heuristic methods. Session reconstruction from proxy server log data is also difficult, and sometimes not all users' navigation paths can be identified.

A. Data Sources

The usage data collected at different sources represent the navigation patterns of different segments of the overall web traffic, ranging from single-user, single-site browsing behavior to multi-user, multi-site access patterns. Web server logs may not contain sufficient information for inferring behavior at the client side, since they relate only to the pages served by the web server. Data may be collected from (a) Web servers, (b) proxy servers, and (c) Web clients. Web servers collect large amounts of information in their log files; databases can be used instead of simple log files to store this information and make querying of massive log repositories more efficient [6]. Internet service providers use proxy servers to improve navigation speed through caching. Collecting navigation data at the proxy level is essentially the same as collecting it at the server level, except that proxy servers collect data on groups of users accessing groups of web servers. Usage data can also be tracked on the client side by using JavaScript, Java applets, or even modified browsers [8].
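The session-grouping problem described above can be sketched for the case where no cookies are available. This is an illustration under stated assumptions, not the paper's method: it parses Common Log Format lines, treats the client host (IP address) as the user identifier, and starts a new session after 30 minutes of inactivity, a commonly used heuristic value assumed here. The helper names `parse_line` and `sessionize` and the sample log lines are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [time] "method url protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def parse_line(line):
    """Parse one CLF line into (host, timestamp, url); return None if malformed."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("host"), timestamp, match.group("url")

def sessionize(records, timeout=timedelta(minutes=30)):
    """Group requests by host; a gap longer than timeout starts a new session.

    Using the host as the user key and a 30-minute timeout are heuristic
    assumptions for this sketch (no cookies are available in the log).
    """
    sessions = []   # list of (host, [url, ...]) in order of session start
    last_seen = {}  # host -> (index into sessions, time of last request)
    for host, timestamp, url in sorted(records, key=lambda r: r[1]):
        previous = last_seen.get(host)
        if previous is not None and timestamp - previous[1] <= timeout:
            index = previous[0]
        else:
            sessions.append((host, []))
            index = len(sessions) - 1
        sessions[index][1].append(url)
        last_seen[host] = (index, timestamp)
    return sessions

log_lines = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:58:02 -0700] "GET /products.html HTTP/1.0" 200 1024',
    '198.51.100.7 - - [10/Oct/2000:14:01:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:15:10:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
]
records = [r for r in map(parse_line, log_lines) if r is not None]
for host, pages in sessionize(records):
    print(host, pages)
```

The first host produces two sessions because more than an hour passes between its second and third requests. Keying users by host is also exactly where proxy data causes trouble: many users behind one proxy share an address, which is one reason session reconstruction from proxy logs is difficult.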
B. Data Pre-Processing

After pre-processing and cleaning, raw web log data can be used for pattern discovery, pattern analysis, web usage statistics, and generating association and sequential rules. Much work has been done on extracting various kinds of pattern information from web logs, and applications of the discovered knowledge range from improving the design and structure of a web site to enabling business organizations to function more efficiently. Data pre-processing involves mundane tasks such as merging multiple server logs into a central location and parsing the log into data fields. Pre-processing comprises (a) data cleaning, (b) identification and reconstruction of users' sessions, and (c) data formatting. Data cleaning consists of removing all data tracked in Web logs that is useless for mining purposes. Requests for graphic files, agent/spider crawling, etc. can easily be removed by keeping only requests for HTML files. Normalization of URLs is often required to make the requests consistent.

III. DATA MINING TECHNIQUES

The term data mining [8] refers to a broad spectrum of mathematical modeling techniques and software tools used to find patterns in data and to use these patterns to build models. In the context of recommender applications, the term data mining describes the collection of analysis techniques used to infer recommendation rules or build recommendation models from large data sets. Recommender systems that incorporate data mining techniques make their recommendations using knowledge learned from the actions and attributes of users. Classical data mining techniques include classification of users, finding associations between different product items or customer behaviors, and clustering of users [9].

A. Clustering

Clustering techniques work by identifying groups of consumers who appear to have similar preferences.
Once the clusters are created, predictions for an individual can be made by averaging the opinions of the other consumers in that individual's cluster. Some clustering techniques represent each user as partially participating in several clusters; the prediction is then an average across the clusters, weighted by the degree of participation.

B. Classification

Classifiers are general computational models for assigning a category to an input. The inputs may be vectors of features for the items being classified or data about relationships among the items. The categories are a domain-specific classification such as malignant/benign for tumor classification, approve/reject for credit requests, or intruder/authorized for security checks. One way to build a recommender system using a classifier is to use information about a product and a customer as the input, and to have the output category indicate how strongly the product should be recommended to the customer.

C. Association Rules Mining

Association rule mining searches for interesting relationships between items by finding items that frequently appear together in a transaction database. If item B appears frequently when item A appears, an association rule is denoted as A ⇒ B. Support and confidence are two measures of rule interestingness that reflect the usefulness and certainty of a rule, respectively [10]. Support, the usefulness of a rule, is the proportion of transactions that contain both items A and B; confidence, the validity of a rule, is the proportion of transactions containing item B among the transactions containing item A. Association rules that satisfy a user-specified minimum support threshold (minSup) and minimum confidence threshold (minCon) are called strong association rules. One of the best-known examples of web mining in recommender systems is the discovery of association rules, or item-to-item correlations [11].
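Support, confidence, strong rules, and the confidence-summing recommendation idea attributed to [9] can be illustrated with a small sketch. It assumes each transaction is a set of items (for example, the products viewed in one session), considers only single-item rules A ⇒ B, and enumerates item pairs by brute force rather than using an efficient algorithm such as Apriori. The function names `count`, `strong_rules`, and `recommend` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

def count(transactions, itemset):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def strong_rules(transactions, min_sup, min_con):
    """Single-item rules A => B with support >= min_sup and confidence >= min_con.

    support(A => B)    = |transactions containing A and B| / |transactions|
    confidence(A => B) = |transactions containing A and B| / |transactions containing A|
    Brute-force pair enumeration (not Apriori) keeps the sketch short.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = {}
    for a, b in permutations(items, 2):
        n_ab = count(transactions, {a, b})
        n_a = count(transactions, {a})
        if n_a and n_ab / n >= min_sup:
            confidence = n_ab / n_a
            if confidence >= min_con:
                rules[(a, b)] = confidence
    return rules

def recommend(rules, preferred):
    """Score each unseen item k by summing confidences of rules x => k, x in preferred."""
    scores = defaultdict(float)
    for (a, b), confidence in rules.items():
        if a in preferred and b not in preferred:
            scores[b] += confidence
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical transactions, e.g. the products viewed in each session.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C", "D"}, {"A", "B", "D"}]
rules = strong_rules(transactions, min_sup=0.4, min_con=0.5)
print(rules[("A", "B")])             # 0.75
print(recommend(rules, {"A", "C"}))  # only B is recommended, score 0.75 + 2/3
```

With i = A and j = C preferred, item B is scored by adding confidence(A ⇒ B) = 0.75 and confidence(C ⇒ B) = 2/3, which mirrors summing the rules that have k in the result part and i or j in the condition part. A production system would mine multi-item antecedents with a level-wise algorithm instead of enumerating pairs.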
Association rules have been used for many years in merchandising, both to analyze patterns of preference across products and to recommend products to consumers based on other products they have selected. To recommend with association rules, the system predicts a preference for item k, when the user has preferred items i and j, by adding the confidences of the association rules that have k in the result part and i or j in the condition part [9].

IV. WEB MINING FRAMEWORK FOR BUSINESS INTELLIGENCE

A. A Visual Web Log Mining Architecture

In this section, we present a visual Web log mining architecture [7] for e-commerce recommender systems, named V-Web Log Miner, which relies on mining and visualizing Web services log data captured in a business intelligence environment. As shown in Figure 2, V-Web Log Miner is a multi-layered architecture capable of handling both XML-based Web services logs and traditional Web server logs as input data.

Figure 2: Visual Web Log Mining Architecture

1) The Integration Layer: The integration layer is a set of programs used to prepare data for further processing, for instance extraction, cleaning, transformation, and loading. This layer uses XQuery, XSLT, and XML Schemas to feed the data repository, i.e., a relational or native XML [15] database. The Web log parser component parses the plain ASCII files produced by a Web server and transforms them into a standard database format. This component is important for making the architecture independent of the Web server vendor.

2) The Sessionization Layer: The sessionization layer ties instances of Web services and Web pages to sessions and to users. This layer is important for investigating how Web services compositions are used across user sessions.

3) The Data Layer: The data layer is a repository of input/output business intelligence data. It also stores pre-processed logs, business intelligence sessions, and information about Web services execution.

4) The Recommendations Engine Layer: The recommendations engine layer is a data mining engine responsible for bulk-loading XML data from the database, executing SQL commands against it, and running the mining algorithms. This layer integrates BI tools such as OLAP and data mining. Using user profiles and content profiles, businesses apply data mining techniques [12] to identify appropriate business rules. These rules could involve a simple classification of users based on their profiles and web site click-streams, associations between content profiles and user behavior, or associations between different products. Knowledge of customers' behavior helps businesses improve customer relationships and formulate business strategies.

5) The Visualization Tools: Visualization tools should be used to present the implicit and useful knowledge produced by the recommendations engine, as well as Web services usage and composition. Data can be viewed at different levels of granularity and abstraction as parallel coordinates graphs [10, 11]. This visual model clearly shows the interrelationships and dependencies between different components. Used interactively, the model can help discover sensitivities and perform approximate optimization.

V. CONCLUSION AND FUTURE WORK

This paper introduces the basic ideas of recommender systems and the importance of web usage mining in business intelligence. Recommender systems have emerged as powerful tools for helping customers find items of interest. The research presented in this paper makes several contributions to recommender-systems research. First, we propose a new framework based on web mining technology for building a Web-page recommender system.
Additionally, we demonstrate how web mining technology can be effectively applied in a business intelligence environment. There are several possible extensions to this work. Analyzing customers' past purchasing patterns may make it possible to discover an appropriate [...]. It would also be interesting to conduct a real marketing promotion targeting customers with our approach and then evaluate its performance. In summary, we have applied web mining to business intelligence by developing a framework that uses association rules, web mining, and visualization of Web services log data captured in a business intelligence environment.

REFERENCES

[1] P. Pirolli, J. Pitkow, and R. Rao, "Silk from a Sow's Ear: Extracting Usable Structures from the Web", Proceedings of Human Factors in Computing Systems (CHI'96), ACM Press, 1996.
[2] R. Cooley, "Web Usage Mining: Discovery and Application of Interesting Patterns from Web Data", Ph.D. Thesis, University of Minnesota, Department of Computer Science, 2000.
[3] D.J. Hand, H. Mannila, and P. Smyth, "Principles of Data Mining", MIT Press, 2000.
[4] M. Spiliopoulou and L.C. Faulstich, "WUM: A Web Utilization Miner", Proceedings of EDBT Workshop on the Web and Data Bases (WebDB'98), Springer Verlag, pp. 109-115, 1999.
[5] F. Masseglia, P. Poncelet, and R. Cicchetti, "An Efficient Algorithm for Web Usage Mining", Networking and Information Systems Journal (NIS), vol. 2, no. 5-6, pp. 571-603, 1999.
[6] K.P. Joshi, A. Joshi, and Y. Yesha, "On Using a Warehouse to Analyze Web Logs", Distributed and Parallel Databases, vol. 13, no. 2, pp. 161-180, 2003.
[7] J. Srivastava, R. Cooley, M. Deshpande, and P.N. Tan, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data", SIGKDD Explorations, vol. 1, no. 2, pp. 12-23, 2000.
[8] F.M. Facca and P.L. Lanzi, "Mining Interesting Knowledge from Weblogs: A Survey", Data & Knowledge Engineering, vol. 53, no. 3, pp. 225-241, 2005.
[9] C. Kim and J. Kim, "A Recommendation Algorithm Using Multi-Level Association Rules", Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence, p. 524, October 13-17, 2003.
[10] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2000.
[11] B. Sarwar, G. Karypis, J.A. Konstan, and J. Riedl, "Item-based Collaborative Filtering Recommendation Algorithms", Proceedings of the Tenth International Conference on World Wide Web, pp. 285-295, 2001.
[12] I. Press, "IBM Intelligent Miner", IBM Documentation, 2001.
[13] O. Press, "Oracle Personalization", Oracle Documentation, 2001.
[14] O. Press, "Oracle Personalization", Oracle Documentation, 2001.
[15] A. Inselberg, "Multidimensional Detective", IEEE Symposium on Information Visualization, pp. 100-110, 1997.