International Journal of Emerging Technology in Computer Science & Electronics (IJETCSE)
ISSN: 0976-1353 Volume 23 Issue 5 –SEPTEMBER 2016.
Web Usage Mining Based Analysis of Web Site
Using Web Log Expert Tool
J.Umarani#1, A.Silambarasi*2 and G.Thangaraju Author*3
# Assistant Professor, Department of Computer Applications, Thanthai Hans Roever College, Perambalur, Tamilnadu, India.
*2 Research Scholar, Department of Computer Science, Thanthai Hans Roever College, Perambalur, Tamilnadu, India.
*3 Research Scholar, Department of Computer Science, Karpagam University, Coimbatore, Tamilnadu, India.
server log file which is generally maintained by the web site
administrator.
Web usage mining is a class of Web mining that mines these
logs to extract useful information. The Web usage mining
process has three steps: Data Preprocessing, Pattern Discovery
and Pattern Evaluation [4]. Data preprocessing is an important
step because log data is unstructured: log files must be
preprocessed before basic data mining techniques can be
applied in the pattern discovery phase, and doing so improves
the efficiency and scalability of the later phases of Web Usage
Mining. Data preprocessing involves several main steps such
as data fusion, data extraction, data cleaning, user
identification, session identification, data reduction and data
transformation [5, 6]. Fig. 1 shows the stages of the Web
mining process.
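Three of the preprocessing steps named above (data cleaning, user identification and session identification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this paper: the record layout, the list of ignored file suffixes, the use of (IP address, user agent) as a user key and the 30-minute session timeout are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed cleaning rule: drop embedded resources and failed requests.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed threshold, not from the paper


def clean(records):
    """Data cleaning: keep successful requests for pages only."""
    return [r for r in records
            if 200 <= r["status"] < 300
            and not r["url"].lower().endswith(IGNORED_SUFFIXES)]


def identify_users(records):
    """User identification: group requests by (IP address, user agent)."""
    users = defaultdict(list)
    for r in records:
        users[(r["ip"], r["agent"])].append(r)
    return users


def identify_sessions(user_records):
    """Session identification: start a new session after a long gap."""
    sessions = []
    for r in sorted(user_records, key=lambda rec: rec["time"]):
        if sessions and r["time"] - sessions[-1][-1]["time"] <= SESSION_TIMEOUT:
            sessions[-1].append(r)
        else:
            sessions.append([r])
    return sessions


def rec(ip, agent, hhmm, url, status=200):
    """Build one hypothetical log record for the demonstration."""
    time = datetime.strptime("2016-07-10 " + hhmm, "%Y-%m-%d %H:%M")
    return {"ip": ip, "agent": agent, "time": time, "url": url, "status": status}


records = [
    rec("192.0.2.10", "Firefox", "09:00", "/index.html"),
    rec("192.0.2.10", "Firefox", "09:00", "/logo.png"),
    rec("192.0.2.10", "Firefox", "09:05", "/courses.html"),
    rec("192.0.2.10", "Firefox", "11:30", "/contact.html"),
    rec("192.0.2.20", "Chrome", "09:10", "/missing.html", status=404),
    rec("192.0.2.20", "Chrome", "09:12", "/index.html"),
]

for user, user_records in identify_users(clean(records)).items():
    paths = [[r["url"] for r in s] for s in identify_sessions(user_records)]
    print(user, paths)
```

In this sketch the image request and the failed request are removed during cleaning, and the first user's requests split into two sessions because of the gap between 09:05 and 11:30.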
Abstract— The basic task of Web usage mining is to capture,
analyze, and model Web server logs, usually to discover the
usage behavior of Website users automatically. Web usage
mining applies data mining techniques to discover and extract
the information hidden in the web server log file. The extracted
information consists of user access patterns, which are used to
analyze users' behavior. Understanding the frequent access
patterns of users allows website owners to manage and improve
the website, and the web-based applications it offers,
accordingly. By analyzing web usage log data, web mining
systems can discover knowledge about users' interests and
system usage characteristics. In this paper, we use the Web Log
Expert tool to analyze the Web server log files of a Website. The
tool reports important information about visitors, the top
errors, and the web browsers and platforms most used by the
Website's users. The information obtained can help increase the
effectiveness of the Website.
Index Terms— Web Usage Mining, Server log, Web Log
Expert Tool.
I. INTRODUCTION
A revolution has taken place in the way people work on the
internet. People use this important tool for disseminating
their ideas, conducting business and, most importantly,
entertaining themselves. Data on the web is increasing rapidly
day by day [1]. The Web is an open medium [2], and because of
this openness it has become difficult for users to plough
through the abundant information. The term Web mining was
coined for techniques that address this problem.
Analyzing the behavior of web users is known as Web Usage
Mining (WUM). WUM is an active research area which involves
adapting mining methods to the records of web access log
files. These web log files collect numerous types of data,
including the host IP address, the URL requested, the date and
other information about the user's navigation of the web. Web
Usage Mining techniques extract relationships from the
recorded data and so provide interesting knowledge about the
behavior of web users. A user's navigation pattern can be
extracted from the server log [3].
Whenever a user requests a resource on web his activity is
automatically logged by the server into a special file called
Fig: 1 Types of Web mining
II. WEB LOG ANALYSIS
Web server logs store clickstream data which can be useful
for mining purposes [7]. They are plain text (ASCII) files
which contain information such as User Name, IP Address,
Time Stamp, Access Request, Referring URL and error codes
(if any), and they generally reside on the web servers.
Traditionally there are four types of server logs: Transfer Log,
Agent Log, Error Log and Referrer Log [8]. The Transfer Log
and the Agent Log are said to be standard, whereas the Error
Log and Referrer Log are considered optional as they may not
be turned on. Every log entry records the traversal from one
page to another, storing the user's IP address and all the
related information [9].
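To make the fields of a log entry concrete, the following minimal sketch parses one line in the Apache combined log format using Python's standard `re` module. The sample line, the host address and the helper name `parse_line` are hypothetical and serve only as an illustration; real logs may contain escaped quotes or other variations that this pattern does not handle.

```python
import re
from datetime import datetime

# Apache "combined" format:
# host ident user [time] "request" status bytes "referrer" "user agent"
# The last two fields are optional here so "common" format lines also match.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 10/Jul/2016:13:55:36 +0530
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # The request field is "METHOD URL PROTOCOL"; keep the URL if present.
    parts = entry["request"].split()
    entry["url"] = parts[1] if len(parts) >= 2 else ""
    return entry


sample = ('192.0.2.10 - - [10/Jul/2016:13:55:36 +0530] '
          '"GET /courses/index.html HTTP/1.1" 200 5120 '
          '"http://www.example.com/" "Mozilla/5.0 (Windows NT 6.1)"')
entry = parse_line(sample)
print(entry["host"], entry["url"], entry["status"], entry["size"])
```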
IV. PROBLEM IDENTIFICATION
Website design is currently based on systematic
investigations of the interests of website visitors and on
investigated assumptions about their exact behavior. Today,
understanding the interests of users is becoming a
fundamental need for Website owners, who can serve their
users better by adapting the content, usage and structure of
the website to their preferences. The analysis of web log data
makes it possible to identify useful patterns in the browsing
behavior of users, which can be exploited when modeling
navigational behavior. Web log data captures the web-browsing
behavior of the users of a Website. Academic institutions are
good examples of organizations that develop websites. One such
institution in the education sector has been considered in our
work. This paper presents a visitor pattern analysis performed
on the web log data of an educational institution. We performed
different analyses on a sample of Web log data to determine
the usability of the Website, including:
Fig: 2 Taxonomy of web server logs
The access log file records all the information that the
server provides to clients. The error log file contains a list
of any server errors. These two log files are the most common,
and they are important for retrieving the information needed
to assess user behavior when investigating a suspected user.
The agent and referrer log files are not always enabled on the
server side. The agent log file provides information about the
user's browser, browser version and operating system. The
referrer log file [10] allows websites and web servers to
identify where visitors are coming from, for promotional or
security purposes.
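The kind of information carried by the referrer and agent fields can be illustrated with a short sketch. The substring rules below are a deliberately rough assumption made for this example (real user agent strings need a much larger rule set), and the sample entries are hypothetical; the sketch only shows how referring hosts, browsers and platforms might be tallied.

```python
from collections import Counter
from urllib.parse import urlparse


def referrer_host(referrer):
    """Return the host part of a referrer URL, or 'direct' when it is empty."""
    if not referrer or referrer == "-":
        return "direct"
    return urlparse(referrer).netloc.lower() or "unknown"


# Rough substring rules (an assumption). Order matters: Chrome's user agent
# also contains "Safari", and Android's also contains "Linux".
BROWSER_RULES = [("Firefox", "Firefox"), ("Chrome", "Chrome"),
                 ("MSIE", "Internet Explorer"), ("Trident", "Internet Explorer"),
                 ("Safari", "Safari")]
PLATFORM_RULES = [("Windows", "Windows"), ("Android", "Android"),
                  ("iPhone", "iOS"), ("Mac OS X", "Mac OS X"),
                  ("Linux", "Linux")]


def classify(agent, rules):
    """Return the label of the first rule whose substring occurs in the agent."""
    for needle, label in rules:
        if needle in agent:
            return label
    return "Other"


# Hypothetical (referrer, user agent) pairs taken from parsed log entries.
entries = [
    ("http://www.google.com/search?q=college",
     "Mozilla/5.0 (Windows NT 6.1; rv:47.0) Gecko/20100101 Firefox/47.0"),
    ("-",
     "Mozilla/5.0 (Linux; Android 6.0) AppleWebKit/537.36 "
     "(KHTML, like Gecko) Chrome/51.0 Mobile Safari/537.36"),
    ("http://www.google.com/",
     "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 "
     "(KHTML, like Gecko) Chrome/51.0 Safari/537.36"),
]

referrers = Counter(referrer_host(ref) for ref, _ in entries)
browsers = Counter(classify(agent, BROWSER_RULES) for _, agent in entries)
platforms = Counter(classify(agent, PLATFORM_RULES) for _, agent in entries)
print(referrers.most_common())
print(browsers.most_common())
print(platforms.most_common())
```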
• Visitor access pattern analysis
• Page view analysis
• Time analysis
• Source of the Website visitors
• Pages of the Website that are accessed
• Number of documents downloaded (both hits & accesses)
In this work, the Web Log Expert reports are subjected to a
time analysis and a page view analysis. The time analysis looks
at the times of day, days of the week, and days of the month at
which the Website receives the most visitors. The page view
analysis identifies which pages of the website are viewed most
by visitors. Combining these statistics helps us to
characterize the users of the Website and its usability.
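The time analysis and page view analysis described above amount to counting requests by hour, by weekday and by page. A minimal sketch, assuming the log has already been reduced to (timestamp, URL) pairs for page requests; the sample data is hypothetical and this is not the internal method of Web Log Expert:

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def time_and_page_analysis(requests):
    """Count page requests by hour of day, by day of week and by URL."""
    by_hour, by_weekday, by_page = Counter(), Counter(), Counter()
    for timestamp, url in requests:
        by_hour[timestamp.hour] += 1
        by_weekday[WEEKDAYS[timestamp.weekday()]] += 1
        by_page[url] += 1
    return by_hour, by_weekday, by_page


# Hypothetical page requests.
requests = [
    (datetime(2016, 7, 4, 10, 15), "/index.html"),
    (datetime(2016, 7, 4, 10, 40), "/admissions.html"),
    (datetime(2016, 7, 5, 15, 5), "/index.html"),
    (datetime(2016, 7, 9, 10, 30), "/index.html"),
]

by_hour, by_weekday, by_page = time_and_page_analysis(requests)
print("Busiest hour:", by_hour.most_common(1))
print("Busiest day:", by_weekday.most_common(1))
print("Most viewed page:", by_page.most_common(1))
```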
In this study, the user access web log data was collected
from the Website of an educational institution. The web log is
normally a secondary data source, since it records every
activity of a user during a visit to the Website. The web log
data covers a one-month period, from
310 June 2016 to 31 July 2016. During
this period, 1.04 GB of data was transferred. In Web Usage
Mining, data comes from three main sources: server-side data,
client-side data, and proxy-side (intermediate) data. In this
work, we use server-side data from the Web server.
III. RELATED WORKS
Many researchers have worked on web usage mining using
various tools available online.
• Liu, H., et al. introduced a new approach for
classifying user navigation patterns and predicting
users' future requests [11].
• Arya, S., et al. proposed a methodology that uses web
log data to improve marketing activities [12].
• A survey of mining interesting knowledge from web
logs is presented in [13].
• Ramya et al. proposed a methodology for
discovering patterns in usage mining that improves the
quality of data by reducing the quantity of data [14].
• Maheswara Rao et al. identified a research framework
capable of preprocessing web log data
completely and efficiently. This framework helps to
mine the usage behavior of users [15].
• A recommender system capable of online
personalization based on user patterns is described in
[16].
• A methodology for mining interesting knowledge
from web access logs was proposed in [17].
A. Web Log Data
Web log data is a listing of page reference data; it is
sometimes referred to as clickstream data [18]. The web is an
important medium for extracting useful information. The web
server log data contains several attributes. These attributes
are as follows:
Number of Hits– The number of times any resource on a
Website is accessed. A hit is a request to a web server for a
file, e.g. a web page, image, JavaScript file or Cascading
Style Sheet.
Number of Visitors– A visitor is exactly what it sounds
like: a human who navigates to the website and browses
one or more pages on it.
average of hits in a day, page views, bandwidth, etc. It lists
all the general information one should know about a
website.
Visitor Referring Website– The URL of the website that
referred the visitor to the website under consideration.
Visitor Referral Website– The URL of the website that is
being referred to by the website under consideration.
Time and Duration– The time at which, and the duration for
which, the website was accessed by a particular user.
Path Analysis– The path that a particular user followed
while accessing the contents of a website.
Visitor IP Address– The IP address of each visitor who
visited the website.
Browser Type– The type of web browser that was used to
access the website.
Platform– The type of operating system or platform that
was used to access the website.
Cookies– A cookie is a message given to a web browser by a
web server. The browser stores the message in a text file
called a cookie. The message is then sent back to the server
each time the browser requests a page from the server. The
main purpose of cookies is to identify users and possibly
prepare customized web pages for them.
A. Summary of Result
V. WEB LOG EXPERT TOOL
Various commercial and freely available tools exist for
web mining purposes. Web Log Expert Lite 7.8 is a fast and
powerful Web log analyzer [19]. This tool helps to reveal
important statistics about a web site's usage, such as the
activity of visitors, access statistics, paths through the
website and visitors' browsers. It supports the W3C extended
log format, which is the default log format of Microsoft
IIS 4.0/5.0/6.0/7.0, and also the combined and common log
formats of the Apache web server. It reads compressed log files
(.gz, .bz2 and .zip) and can automatically detect the log file
format. If necessary, log files can also be downloaded via FTP
or HTTP. We used Web Log Expert Lite 7.8 as our web mining
tool. It produces highly detailed, easily configurable usage
reports in Hypertext Markup Language (HTML) format, for
viewing with a standard web browser [19]. Using this tool we
obtained the following statistics for the Website, on a monthly
and day-of-the-week basis:
• Hits: Total Hits, Visitor Hits, Average Hits per Day and
Average Hits per Visitor.
• Page views: Total Page Views, Average Page Views per Day
and Average Page Views per Visitor.
• Visitors: Total Visitors, Average Visitors per Day and
Total Unique IPs.
• Bandwidth: Total Bandwidth, Visitor Bandwidth, Average
Bandwidth per Day, Average Bandwidth per Hit and Average
Bandwidth per Visitor.
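The summary statistics listed above are simple totals and ratios over the log entries. The sketch below shows one way such figures could be computed; it is an illustration under assumptions, not the definition used by Web Log Expert. In particular, it approximates visitors by unique IP addresses and treats a URL ending in one of the assumed suffixes as a page view; the sample hits are hypothetical.

```python
from datetime import date

# Assumed definition of a "page" (as opposed to an image, script, etc.).
PAGE_SUFFIXES = (".html", ".htm", ".php", "/")


def summarize(hits):
    """hits: list of (day, visitor_ip, url, bytes_sent) tuples."""
    if not hits:
        raise ValueError("no log entries to summarize")
    days = {day for day, _, _, _ in hits}
    visitors = {ip for _, ip, _, _ in hits}  # assumption: one visitor per IP
    total_hits = len(hits)
    page_views = sum(1 for _, _, url, _ in hits
                     if url.lower().endswith(PAGE_SUFFIXES))
    bandwidth = sum(size for _, _, _, size in hits)
    return {
        "total_hits": total_hits,
        "total_page_views": page_views,
        "total_unique_ips": len(visitors),
        "total_bandwidth": bandwidth,
        "avg_hits_per_day": total_hits / len(days),
        "avg_hits_per_visitor": total_hits / len(visitors),
        "avg_page_views_per_day": page_views / len(days),
        "avg_bandwidth_per_day": bandwidth / len(days),
        "avg_bandwidth_per_hit": bandwidth / total_hits,
        "avg_bandwidth_per_visitor": bandwidth / len(visitors),
    }


# Hypothetical hits: (day, IP address, URL, bytes sent).
hits = [
    (date(2016, 7, 1), "192.0.2.10", "/index.html", 4000),
    (date(2016, 7, 1), "192.0.2.10", "/logo.png", 1000),
    (date(2016, 7, 1), "192.0.2.20", "/index.html", 4000),
    (date(2016, 7, 2), "192.0.2.10", "/courses.html", 3000),
]

summary = summarize(hits)
for name, value in summary.items():
    print(name, value)
```

Note that the image request counts as a hit but not as a page view, which is why the two totals differ.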
B. Activity by Hour of Day
VI. RESULTS AND DISCUSSIONS
In this section we get overall information pertaining to
the website like how many times the website was hit an
[10] L. K. Joshila Grace, V. Maheswari and D. Nagamalai, “Analysis of
Web Logs and Web User in Web Mining”, International Journal of
Network Security & Its Application (IJNSA), Vol. 3, No. 1, 2011, pp.
99-110
[11] Liu, H., et al., “Combined mining of Web server logs and web contents
for classifying user navigation patterns and predicting user’s future
requests”, Data and Knowledge Engineering, 2007,Vol 61, Issue 2,
pp.304-330.
[12] Arya, S., et al., “A methodology for web usage mining and its
applications to target group identification”, Fuzzy sets and systems,
2004, pp.139-152.
[13] F.M. Facca, and P.L. Lanzi, “Mining interesting Knowledge from Web
logs: a survey”, Elsevier Science, Data and Knowledge Engineering,
2005, 53, pp.225-241.
[14] G. R.C. et al., “An Efficient Preprocessing Methodology for
Discovering Patterns and Clustering of Web Users using a Dynamic
ART1 Neural Network”, Fifth International Conference on Information
Processing, 2011. Springer-Verlag.
[15] Maheswara Rao.V.V.R and Valli Kumari.V, “An Enhanced
Pre-Processing Research Framework for web Log Data Using a
Learning Algorithm”, Computer Science and Information Technology,
pp. 1-15, 2011. DOI: 10.5121/csit.2011.1101.
[16] Mehrdad Jalali et al., “A Recommender System for Online
Personalization in the WUM Applications”, Proceedings of the World
Congress on Engineering and Computer Science 2009 Vol. II, WCECS
2009, October 20-22, 2009, San Francisco, USA.
[17] K Sudheer Reddy et al, “An Effective Methodology for Pattern
Discovery in Web Usage Mining”, International Journal of Computer
Science and Information Technologies, Vol. 3 (2) , 2012, 3664-3667.
[18] Castellano.G et al., “Log Data Preparation for Mining Web Usage
Patterns”, International Conference Applied Computing, 2007,
pp.371-378.
C. Activity by Day of Week
VII. CONCLUSION
The Web is one of the most used interfaces for accessing
remote data and commercial and non-commercial services. Web
mining for usage patterns is a growing area, driven by the
growth of web-based applications. Using web mining, we can
discover the interests and behavior of website users, and use
this to make a website more valuable and more easily
accessible. This work was carried out by analyzing one month
of web log data from an educational institution. Our
experimental results help to identify and predict the number of
visitors to the Website and to improve its usability.
REFERENCES
[1] Brijendra Singh, Hemant Kumar Singh: Web Data Mining Research: A
Survey, IEEE, 2010.
[2] G.K. Gupta, Introduction to Data Mining with Case Studies: Web Data
Mining, PHI Learning Private Limited, pp. 231-233, 2011.
[3] Ankita Kusmakar, Sadhna Mishra, Web Usage Mining: A Survey on
Pattern Extraction from Web Logs, International Journal of Advanced
Research in Computer Science and Software Engineering, Volume 3,
Issue 9, September 2013, ISSN: 2277 128X, pp. 834-838.
[4] Cooley, Robert, Bamshad Mobasher, and Jaideep Srivastava. "Data
preparation for mining world wide web browsing patterns." Knowledge
and information systems 1.1 (1999): 5-32.
[5] V. Sujatha and Punithavalli, “Improved User Navigation Pattern
Prediction Technique From Web Log Data”, ELSEVIER-2012.
[6] K. Sudheer Reddy, M. Kantha Reddy & V. Sitaramulu, “An Effective
Data preprocessing Method for Web Usage Mining”, Feb-2013, IEEE.
[7] Navin kr Tyagi, A.K. Solanki, Manoj Wadhwa: Analysis of Server Log
by Web Usage Mining for Website Improvement, IJCSI International
Journal of Computer Science Issues, Vol. 7, Issue 4, No 8, pp.
17-21, 2010.
[8] L.K. Joshila Grace, V. Maheswari, Dhinaharan Nagamalai: Analysis of
Web Logs and Web User In Web Mining, International Journal of
Network Security & Its Applications (IJNSA), Vol. 3, No. 1, January
2011.
[9] Theint Aye: Web Log Cleaning for Mining of Web Usage Patterns,
IEEE, 2011.