International Journal of Emerging Technology in Computer Science & Electronics (IJETCSE) ISSN: 0976-1353 Volume 23 Issue 5 – SEPTEMBER 2016.

Web Usage Mining Based Analysis of Web Site Using Web Log Expert Tool

J.Umarani#1, A.Silambarasi*2 and G.Thangaraju Author*3
# Assistant Professor, Department of Computer Applications, Thanthai Hans Roever College, Perambalur, Tamilnadu, India.
*2 Research Scholar, Department of Computer Science, Thanthai Hans Roever College, Perambalur, Tamilnadu, India.
*3 Research Scholar, Department of Computer Science, Karpagam University, Coimbatore, Tamilnadu, India.

Abstract – The elementary responsibility of Web usage mining is to capture, analyze, and model Web server logs, and so to discover automatically the usage behavior of a Website's users. Web usage mining is the application of data mining techniques to discover and extract the information hidden in the web server log file. The extracted information consists of user access patterns, which are used to analyze user behavior. Understanding the frequent access patterns of users allows website owners to manage and improve the website and its web-based applications accordingly. By analyzing web usage log data, web mining systems can discover knowledge about users' interests and system usage characteristics. In this paper, we used the Web Log Expert tool to analyze the Web server log files of a Website. The tool reports important information about visitors, top errors, and the web browsers and platforms most used by the Website's users. The information obtained can be used to increase the effectiveness of the Website.

Index Terms – Web Usage Mining, Server log, Web Log Expert Tool.

I. INTRODUCTION

A revolution has taken place in the way people work on the internet. People use this important tool to disseminate their ideas, conduct business and, most importantly, entertain themselves. Data on the web is increasing rapidly day by day [1]. The Web is an open medium [2], and because of this openness it has become really tough for users to plough through the abundant information. The term Web mining was coined to provide a solution to this problem.

Analyzing the behavior of web users is known as Web Usage Mining (WUM). WUM is an active research area that adapts mining methods to the records of web access log files. These web log files collect numerous types of data, including the host IP address, the URL requested, the date, and other information about the user's navigation of the web. The techniques of Web Usage Mining extract relationships from the recorded data and so provide interesting knowledge about web users' behavior. A user's navigation pattern can be extracted from the server log [3]. Whenever a user requests a resource on the web, the activity is automatically logged by the server into a special file called the server log file, which is generally maintained by the web site administrator. Web usage mining is a class of Web mining used to mine these logs to extract useful information.

The Web usage mining process has three steps: Data Preprocessing, Pattern Discovery and Pattern Evaluation [4]. Data preprocessing is an important step because of the unstructured nature of log data. Log files must be preprocessed before basic data mining techniques are applied in the pattern discovery phase; preprocessing improves the efficiency and scalability of the later phases of Web Usage Mining. It involves several main steps, such as data fusion, data extraction, data cleaning, user identification, session identification, data reduction and data transformation [5, 6]. Fig. 1 shows the stages of the Web mining process.

Fig. 1: Types of Web mining

II. WEB LOG ANALYSIS

Web server logs store click stream data, which can be useful for mining purposes [7]. They are plain text (ASCII) files that contain information such as User Name, IP Address, Time Stamp, Access Request, Referring URL and error codes (if any), and they generally reside on the web servers. Traditionally there are four types of server logs: Transfer Log, Agent Log, Error Log and Referrer Log [8]. The Transfer Log and the Agent Log are said to be standard, whereas the Error Log and Referrer Log are considered optional because they may not be turned on. Every log entry records the traversal from one page to another, storing the user's IP number and all the related information [9].

Fig. 2: Taxonomy of web server logs

The access log file records all the information that the server provides to clients. The error log file contains a list of any server errors. These two log files are the most common, and they are important sources of information for assessing user behavior when investigating a suspected user. The agent and referrer log files are not always enabled on the server side. The agent log file provides information about the user's browser, browser version and operating system. The referrer log file [10] allows websites and web servers to identify where visitors are coming from, for promotional or security purposes.

III. RELATED WORKS

Many researchers have done a lot of work in web usage mining using various tools available online.
• Liu, H., et al. introduced a new approach for classifying user navigation patterns and predicting users' future requests [11].
• Arya, S., et al. proposed a methodology that uses web log data to improve marketing activities [12].
• A survey of research on mining interesting knowledge from web logs is presented in [13].
• Ramya et al. proposed a methodology for discovering patterns in usage mining that improves the quality of data by reducing its quantity [14].
• Maheswara Rao et al. identified a research framework capable of preprocessing web log data completely and efficiently. This framework helps to mine the usage behavior of users [15].
• One research work specifies a recommender system capable of online personalization based on user patterns [16].
• Another research work proposed a methodology for mining interesting knowledge from web access logs [17].

IV. PROBLEM IDENTIFICATION

Website design is currently based on systematic investigations of the interests of website visitors and on tested assumptions about their exact behavior. Today, understanding the interests of users is becoming a fundamental need for Website owners, so that they can better serve their users by adapting the content, usage and structure of the website to user preferences. The analysis of web log data makes it possible to identify useful patterns in the browsing behavior of users, which are exploited in the analysis of navigational behavior. Web log data captures the web-browsing behavior of the users of a Website.

Academic institutions are good examples of organizations that develop websites. One such institution from the education sector has been considered in our work. This paper presents a visitor pattern analysis performed on the web log data of an educational institution. We performed different analyses on a sample of Web log data to determine the usability of the Website, including:
• Visitor access pattern analysis
• Page view analysis
• Time analysis
• Source of the Website visitors
• Pages of the Website that are accessed
• Number of documents downloaded (both hits and accesses)

In this work, the Web Log Expert reports undergo a time analysis and a page view analysis. The time analysis looks at the times of day, days of the week and days of the month at which the Website receives the most visitors. The page view analysis shows which pages of the website are most viewed by visitors. The combination of these statistics helps us to predict the attributes of the Website's users and the usability of the Website.

In this study, the user access web log data was collected from the Educational Institution Website. The web log normally serves as a secondary data source, since it keeps a record of every user activity during a visit to the Website. The web log data covers a one-month period from 310 June 2016 to 31 July 2016. During this period, 1.04 GB of data was transferred. In Web Usage Mining, there are three main kinds of data origin: server-side data, client-side data, and proxy-side (intermediate) data. In this work, we use server-side data from the Web server.

A. Web Log Data

Web log data is a listing of page reference data; it is sometimes referred to as click stream data [18]. The web plays an important role as a medium for extracting useful information. The web server log data contains several attributes, as follows:

Number of Hits – This number signifies the number of times any resource is accessed on a Website. A hit is a request to a web server for a file, i.e. a web page, image, JavaScript file, Cascading Style Sheet, etc.

Number of Visitors – A visitor is exactly what it sounds like. It is a human who navigates to the website and browses one or more pages on it.

Visitor Referring Website – The referring website gives the URL of the website that referred visitors to the website under consideration.

Visitor Referral Website – The referral website gives the URL of the website that is referred to by the website under consideration.

Time and Duration – This information in the web server logs gives the time at which, and the duration for which, the website was accessed by a particular user.

Path Analysis – Path analysis examines the path that a particular user has followed in accessing the contents of a website.

Visitor IP Address – This gives the IP address of each visitor who visited the website.

Browser Type – This gives the type of web browser that was used to access the website.

Platform – This gives the type of operating system or platform that was used to access the website.

Cookies – A cookie is a message given to a web browser by a web server. The browser stores the message in a text file called a cookie. The message is then sent back to the server each time the browser requests a page from the server. The main purpose of cookies is to identify users and possibly prepare customized web pages for them.

V. WEB LOG EXPERT TOOL

Various commercial and freely available tools exist for web mining purposes. Web Log Expert Lite 7.8 is a fast and powerful Web log analyzer [19]. This tool helps to reveal important statistics regarding a web site's usage, such as the activity of visitors, access statistics, paths through the website and visitors' browsers. It supports the W3C extended log format, which is the default log format of Microsoft IIS 4.0/5.0/6.0/7.0, and also the combined and common log formats of the Apache web server. It reads compressed log files (.gz, .bz2 and .zip) and can automatically detect the log file format. If necessary, log files can also be downloaded via FTP or HTTP.

We used Web Log Expert Lite 7.8 as our web mining tool. It produces highly detailed, easily configurable usage reports in Hypertext Markup Language (HTML) format, for viewing with a standard web browser [19]. Using this tool we identified the following statistics of the Website on a monthly and day-of-the-week basis:
• Hits: Total Hits, Visitor Hits, Average Hits per Day, Average Hits per Visitor
• Page views: Total Page Views, Average Page Views per Day, Average Page Views per Visitor
• Visitors: Total Visitors, Average Visitors per Day, Total Unique IPs
• Bandwidth: Total Bandwidth, Visitor Bandwidth, Average Bandwidth per Day, Average Bandwidth per Hit, Average Bandwidth per Visitor

VI. RESULTS AND DISCUSSIONS

A. Summary of Result

This section gives overall information pertaining to the website, such as how many times the website was hit, the average number of hits in a day, page views and bandwidth. It lists all the general information that one should know about a website.

B. Activity by Hour of Day

C. Activity by Day of Week

VII. CONCLUSION

The Web is one of the most used interfaces for accessing remote data and commercial and non-commercial services. Web mining is a growing area, driven by the growth of web-based applications, and is used to find web usage patterns. By using web mining we can discover the interests and behavior of website users, and use this knowledge to make a website more valuable and more easily accessible. This work was accomplished by analyzing the web log data of an educational institution for a one-month period. Our experimental results help to predict and identify the number of visitors to the Website and to improve the usability of the Website.

REFERENCES

[1] Brijendra Singh, Hemant Kumar Singh, "Web Data Mining Research: A Survey", IEEE, 2010.
[2] G.K. Gupta, "Introduction to Data Mining with Case Studies: Web Data Mining", PHI Learning Private Limited, pp. 231-233, 2011.
[3] Ankita Kusmakar, Sadhna Mishra, "Web Usage Mining: A Survey on Pattern Extraction from Web Logs", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 9, September 2013, ISSN: 2277 128X, pp. 834-838.
[4] Robert Cooley, Bamshad Mobasher, and Jaideep Srivastava, "Data preparation for mining world wide web browsing patterns", Knowledge and Information Systems, 1.1 (1999), pp. 5-32.
[5] V. Sujatha and Punithavalli, "Improved User Navigation Pattern Prediction Technique From Web Log Data", Elsevier, 2012.
[6] K. Sudheer Reddy, M. Kantha Reddy and V. Sitaramulu, "An Effective Data preprocessing Method for Web Usage Mining", IEEE, Feb. 2013.
[7] Navin kr Tyagi, A.K. Solanki, Manoj Wadhwa, "Analysis of Server Log by Web Usage Mining for Website Improvement", IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 4, No 8, pp. 17-21, 2010.
[8] L.K. Joshila Grace, V. Maheswari, Dhinaharan Nagamalai, "Analysis of Web Logs and Web User In Web Mining", International Journal of Network Security & Its Applications (IJNSA), Vol. 3, No. 1, January 2011.
[9] Theint Aye, "Web Log Cleaning for Mining of Web Usage Patterns", IEEE, 2011.
[10] L. K. Joshila Grace, V. Maheswari and D. Nagamalai, "Analysis of Web Logs and Web User in Web Mining", International Journal of Network Security & Its Application (IJNSA), Vol. 3, No. 1, 2011, pp. 99-110.
[11] Liu, H., et al., "Combined mining of Web server logs and web contents for classifying user navigation patterns and predicting user's future requests", Data and Knowledge Engineering, 2007, Vol. 61, Issue 2, pp. 304-330.
[12] Arya, S., et al., "A methodology for web usage mining and its applications to target group identification", Fuzzy Sets and Systems, 2004, pp. 139-152.
[13] F.M. Facca and P.L. Lanzi, "Mining interesting Knowledge from Web logs: a survey", Elsevier Science, Data and Knowledge Engineering, 2005, 53, pp. 225-241.
[14] G. R.C. et al., "An Efficient Preprocessing Methodology for Discovering Patterns and Clustering of Web Users using a Dynamic ART1 Neural Network", Fifth International Conference on Information Processing, 2011, Springer-Verlag.
[15] Maheswara Rao.V.V.R and Valli Kumari.V, "An Enhanced Pre-Processing Research Framework for web Log Data Using a Learning Algorithm", Computer Science and Information Technology, pp. 1-15, 2011. DOI: 10.5121/csit.2011.1101.
[16] Mehrdad Jalali et al., "A Recommender System for Online Personalization in the WUM Applications", Proceedings of the World Congress on Engineering and Computer Science 2009, Vol. II, WCECS 2009, October 20-22, 2009, San Francisco, USA.
[17] K Sudheer Reddy et al., "An Effective Methodology for Pattern Discovery in Web Usage Mining", International Journal of Computer Science and Information Technologies, Vol. 3 (2), 2012, pp. 3664-3667.
[18] Castellano.G et al., "Log Data Preparation for Mining Web Usage Patterns", International Conference Applied Computing, 2007, pp. 371-378.
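
The summary statistics described for the Web Log Expert reports (total hits, page views, unique IPs, bandwidth, errors, and activity by hour of day) can also be approximated directly from a raw access log. The following Python sketch is only an illustration under stated assumptions; it is not how Web Log Expert computes its reports. It assumes the Apache combined log format. The names LOG_PATTERN, parse_line, summarize and SAMPLE_LINES, the sample entries, and the rule that stylesheet, script and image requests count as hits but not as page views are all hypothetical choices made for this example.

```python
import re
from collections import Counter
from datetime import datetime

# Apache combined log format (assumed):
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Assumption: requests for these file types are hits but not page views.
NON_PAGE_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")

# Hypothetical sample entries, invented for illustration only.
SAMPLE_LINES = [
    '192.0.2.10 - - [01/Jul/2016:10:15:32 +0530] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [01/Jul/2016:10:15:33 +0530] "GET /style.css HTTP/1.1" '
    '200 1024 "http://example.org/index.html" "Mozilla/5.0"',
    '198.51.100.7 - - [01/Jul/2016:14:02:01 +0530] "GET /missing.html HTTP/1.1" '
    '404 - "-" "Mozilla/5.0"',
]


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the size field means no response body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    parts = entry["request"].split()
    entry["path"] = parts[1] if len(parts) >= 2 else ""
    return entry


def summarize(lines):
    """Compute simple usage statistics over an iterable of log lines."""
    hits = page_views = bandwidth = 0
    unique_ips = set()
    hits_by_hour = Counter()
    errors = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip malformed lines (a basic data-cleaning step)
        hits += 1
        bandwidth += entry["size"]
        unique_ips.add(entry["ip"])
        hits_by_hour[entry["time"].hour] += 1
        if entry["status"] >= 400:
            errors[entry["status"]] += 1
        path = entry["path"].split("?", 1)[0].lower()
        if entry["status"] < 400 and not path.endswith(NON_PAGE_EXTENSIONS):
            page_views += 1
    return {
        "hits": hits,
        "page_views": page_views,
        "unique_ips": len(unique_ips),
        "bandwidth_bytes": bandwidth,
        "hits_by_hour": dict(hits_by_hour),
        "errors": dict(errors),
    }
```

Calling summarize(SAMPLE_LINES), or passing an open log file object instead of the sample list, returns a dictionary of these counts. Unique IPs are used here only as a rough stand-in for visitors; identifying real visitors and sessions requires the user and session identification steps described in the preprocessing stage.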