International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 3, Issue 2, February 2013)
Web Usage Mining – Its Application in E-Services
Anupama Prasanth
Research Scholar, Karpagam University, India
Lecturer, AMA International University, Bahrain
To do so, the large volume of distributed data must first be
gathered from the web. Because these data are semi-structured, the next
task is to extract them and convert them into a standard format
for easy processing [19]. Because of the diversity
of information on the web, the sample data set tends to be
large. Despite these difficulties, the Web
also offers resources that support mining; for example,
the links among Web pages are an important resource
[7]. Besides the challenge of finding relevant
information, users may face other difficulties when
interacting with the Web, such as judging the quality of
the information found and creating new knowledge out
of the information available [7]. These difficulties can be addressed
by applying web mining tools, an application of data
mining. Such tools help with challenging
problems such as searching web links, ranking the importance
of Web content, discovering the regularity and dynamics of
web content, and mining web access patterns. Web mining
is divided into three categories according to application
area.
Web usage mining is one of the fastest developing areas
of web mining [17]. Its focus on analyzing users'
behavior on the web through access logs has made it
popular very quickly, especially in e-services. Its
direct applications in these areas have added to its appeal and
made it an indispensable part of computer and information
science [18]. Details such as user log files and requests for
resources are maintained on web servers, which are the
core data source for web usage mining. Analysis of these data reveals
user browsing patterns, which can be used for
targeted advertising, improvement of web design,
customer satisfaction and market analysis. Most
e-service providers have realized that they can
apply this tool to retain their customers [20].
This paper focuses on web usage mining and is
structured as follows. Section 2 provides an overview
of Web mining categories. Section 3 discusses Web
usage mining, and Section 4 its application in e-services.
Section 5 concludes the paper.
Abstract: Retrieving knowledge from the World Wide Web is
a tedious task because of the growth in the
information resources available on it. This increases the need
for an intelligent system to retrieve knowledge from the
World Wide Web. The performance of Web information
retrieval and Web-based data warehousing is improved by
extracting information from the Web using web mining
tools. Web usage mining is one of the fastest developing areas
of web mining. Its focus on analyzing users' behavior on the
web through access logs has made it popular very
quickly, especially in e-services. Most e-service
providers have realized that they can apply this tool to
retain their customers. This paper provides an overview
of web mining and its different areas. It then
focuses on Web usage mining and its application and impact in
e-services.
Keywords: E-Commerce, E-Governance, E-Learning, E-Services, Pattern Analysis, Pattern Discovery, Pre-processing,
Web Content Mining, Web Structure Mining, Web Usage
Mining.
I. INTRODUCTION
The World Wide Web hosts a variety of information services,
such as news sites, encyclopedias, education sites and e-commerce sites, so the information on the WWW is spread across
these services globally. Retrieving information from these
distributed sources is quite difficult and
requires an efficient tool to find the desired information.
Only an intelligent system that effectively mines for
knowledge can resolve these problems [6].
The following factors make effective
data warehousing and data mining difficult [6]:
• The huge size of the web.
• The lack of a proper structure for web documents.
• The dynamic nature of the information sources.
• The diversity in usage and user communities.
These challenges are nonetheless the
driving force for research into efficient and effective
discovery and use of resources on the Internet. The
fundamental characteristics of the web require us to
adapt and extend traditional methodologies
accordingly.
Web structure mining tries to identify authoritative web
pages. The Web is a complex data store that contains not only
pages but also hyperlinks pointing from one page to
another. By linking to another page, an author
expresses an endorsement of that page. This
vast linkage information is a rich source for web
mining. It offers valuable information about the relevance,
quality and structure of web content. The output of this
category is the architecture of the hyperlinks underlying a website,
and appropriate handling of this information can
improve the accuracy of web page
retrieval.
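The idea that links act as endorsements can be sketched with a small hub/authority iteration in the style of the HITS algorithm. The paper does not name a specific algorithm, so the choice of HITS, the function name `hits` and the toy link graph below are assumptions made only for illustration.

```python
from math import sqrt

def hits(links, iterations=20):
    """Return (authority, hub) scores for a link graph.

    links maps each page to the list of pages it links to.
    This is an illustrative sketch, not the paper's method.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score of a page: sum of the authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return auth, hub

# Hypothetical four-page site: pages a and b both link to c, and c links to d.
links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
auth, hub = hits(links)
most_authoritative = max(auth, key=auth.get)
```

In this toy graph the page with the most incoming endorsements, `c`, ends up with the highest authority score.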
Web usage mining tries to extract useful information
from users' web access history, such as what they are interested in on
the Internet, whether textual or multimedia data.
Web usage mining collects data from Web log records
to discover user access patterns of Web pages. These
patterns show which web pages are accessed and how. This is vital
information for companies and their internet/intranet-based
applications, which use the analyzed reports of those
patterns for different purposes. The applications derived
from this analysis can be classified as personalization,
system improvement, site modification, business
intelligence and usage characterization [2]. Web usage
mining depends on the cooperation of users in allowing
access to the Web log records. Because of this dependence,
privacy is becoming a new issue for Web usage mining,
since users should be made aware of privacy policies
before they decide to reveal their personal data
[8].
In many cases, Web content, structure and usage
information are co-present in the same data file, so there is no
clear boundary between these categories.
II. WEB MINING OVERVIEW
The discovery and exploration of useful information
from the WWW is termed Web mining. It can be
understood through the decomposition suggested by Kosala and Blockeel [1].
According to them, Web mining is composed of the
following tasks:
1. Resource finding: Find matching documents on the
web.
2. Information selection and pre-processing: From the
retrieved list, select the relevant documents and pre-process the data.
3. Generalization: Analyze the documents and
automatically discover general trends.
4. Analysis: Use the general trends to draw
conclusions.
There are three categories of Web mining [1, 2]: Web
usage mining, which extracts useful
information from user access patterns; Web content mining,
the automatic search of information resources available online; and Web structure mining, which identifies
authoritative pages. Each category focuses on discovering
hidden and potentially useful information from
the Web, but each
concentrates on a different kind of Web data [16]. A
brief description of each of these categories follows.
Web pages are semi-structured and can be represented as a DOM (Document Object Model) tree.
Unfortunately, because HTML is so flexible, the
majority of web pages do not follow the standard structure,
which can lead to errors in the DOM tree.
This is the context in which Web content mining is
applied. Its techniques are the equivalent of data
mining techniques for text, since it is possible to
find similar types of information in the unstructured data
residing in Web documents. A Web document usually
contains several types of data, such as text, image, audio,
video, metadata and hyperlinks. The unstructured
nature of Web data forces Web content mining
towards a more complicated approach. Web content mining
can be viewed from two perspectives [6]: information
retrieval and database. In information retrieval, web content mining
mainly supports information
finding or improves information filtering based on user
queries. The results can be applied to web search engines
and web personalization systems. In the database view,
content mining helps to integrate data on the web, so
that more sophisticated queries than keyword-based
search can be performed. The mining results can be
used to build web warehouses and web databases, to which
warehousing and database techniques are then applied.
III. WEB USAGE MINING
Web usage mining provides a better understanding of how to
serve the needs of Web-based applications [2]. It is the
automatic discovery of user access patterns from
web servers. Companies collect huge quantities
of data from their day-to-day operations; these data are usually
generated by the web servers and saved in the server
access logs.
Web usage mining has many benefits that
attract businesses and government agencies.
Government agencies have used the classification and
prediction capabilities of this technology to fight
terrorism and identify criminal activities.
Businesses benefit from personalized
marketing, customer retention and better customer relationships,
and they also gain the opportunity to make promotional
offers to specific customers in order to retain them.
These activities are carried out in the three major phases
of Web usage mining, shown in Figure I [21]:
preprocessing, pattern discovery, and pattern analysis [9].
A. Preprocessing
Preprocessing is the conversion of user information into the data
abstractions needed for pattern discovery.
According to the data being preprocessed, it is
divided into three types: usage preprocessing, content
preprocessing and structure preprocessing.
Usage preprocessing is one of the most difficult tasks in web
usage mining. It gathers data from IP addresses, user agents and
server-side click streams, and because of the nature of these data
they are always incomplete. Content preprocessing handles text, images,
scripts and multimedia files.
Structure preprocessing involves
processing the hyperlinks between page views.
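Usage preprocessing can be illustrated with a minimal sketch that parses server log lines, removes noise and groups requests into sessions. Several details here are assumptions that the paper does not specify: the Combined Log Format, the use of the IP address plus user agent as a visitor key, the 30-minute inactivity timeout, and all function names and sample log lines.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: logs are in Combined Log Format; the paper names no format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)
STATIC_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js", ".ico")

def parse_line(line):
    """Turn one log line into a dict, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec

def clean(records):
    """Drop unparsable lines, failed requests and embedded static resources."""
    return [r for r in records
            if r is not None
            and r["status"].startswith("2")
            and not r["url"].lower().endswith(STATIC_SUFFIXES)]

def sessionize(records, timeout=timedelta(minutes=30)):
    """Group page views per (IP, agent) and split on long inactivity gaps."""
    by_user = defaultdict(list)
    for r in records:
        by_user[(r["ip"], r["agent"])].append(r)
    sessions = []
    for recs in by_user.values():
        recs.sort(key=lambda r: r["time"])
        current = [recs[0]["url"]]
        for prev, cur in zip(recs, recs[1:]):
            if cur["time"] - prev["time"] > timeout:
                sessions.append(current)
                current = []
            current.append(cur["url"])
        sessions.append(current)
    return sessions

# Hypothetical log lines, invented for this example.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Feb/2013:10:00:00 +0000] "GET /home HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Feb/2013:10:00:01 +0000] "GET /logo.png HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Feb/2013:10:05:00 +0000] "GET /products HTTP/1.1" 200 700 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Feb/2013:11:30:00 +0000] "GET /home HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
sessions = sessionize(clean(parse_line(line) for line in SAMPLE_LOG))
```

The image request is discarded during cleaning, and the 85-minute gap before the last request starts a second session. Identifying visitors by IP address and agent is only a heuristic, which is one reason the usage data are described as incomplete.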
FIGURE I: WEB USAGE MINING PROCESS
IV. APPLICATION IN E-SERVICES
The World Wide Web has become a widespread medium for
the circulation of information. The volume of data on the
web and the complexity of its
structure are increasing day by day. In this scenario, the
application of web usage mining has its own significance.
Rapid growth in online services, called e-service
applications, such as e-commerce, e-governance, e-market, e-finance, e-learning and e-banking, has confronted the business
community and customers with a new situation. Adopting
intelligent marketing strategies is essential
for the business community to face the challenges of business
competition and of customers who can choose from
several alternatives. This paper focuses on the application of
web usage mining in three main e-services: e-commerce, e-learning and e-governance.
B. Pattern Discovery
Pattern discovery tools are drawn from several fields,
such as statistics, data mining, machine learning and
pattern recognition. Pattern discovery can start only after the
data have been cleaned and user transactions and sessions have been identified from the access
logs.
Statistical techniques are used to extract knowledge
about website visitors. From this
knowledge, association rules capture the associations
between frequently referenced pages, and sequential
pattern tools help to predict future visit patterns.
Clustering tools group items with similar characteristics
together (the groups of most interest in web usage mining
are image groups, image clusters, page groups and page
clusters), and classification tools generalize
items and assign them to predefined classes.
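As a minimal sketch of how association rules between frequently referenced pages can be derived, the example below counts how often pairs of pages occur in the same session and keeps the rules that pass support and confidence thresholds. The function name `page_rules`, the threshold values and the sample sessions are hypothetical; practical systems typically use a full frequent-itemset algorithm such as Apriori rather than pairs only.

```python
from collections import Counter
from itertools import combinations

def page_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return rules (x, y, support, confidence) meaning 'visits x => visits y'."""
    n = len(sessions)
    page_count = Counter()
    pair_count = Counter()
    for s in sessions:
        pages = sorted(set(s))          # each page counts once per session
        page_count.update(pages)
        pair_count.update(combinations(pages, 2))
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n                 # share of sessions containing both pages
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = c / page_count[x]   # P(y in session | x in session)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

# Hypothetical sessions, as produced by a preprocessing step.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = page_rules(sessions)
```

Here every session that contains `/cart` also contains `/products`, so the rule from `/cart` to `/products` has confidence 1.0 at support 0.5.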
A. E-Commerce:
E-commerce means that two trading parties carry out the
whole of a traditional business activity in digital network
mode over the Internet, according to certain rules or
standards [10]. By buying and selling products or services
over the Internet, e-commerce generates a huge volume of
interactions. The tremendous growth of e-commerce
enterprises has led to a product surplus. These enterprises also face the
common question of how to gauge customer
satisfaction and purchase trends. Competition in
the field has raised the need to serve customers
better and has also raised many issues. To provide better
service to users, an
efficient marketing strategy is required for analyzing their satisfaction
and usage.
C. Pattern Analysis
Pattern analysis is the last phase of Web usage mining.
This phase filters out uninteresting patterns from the
set found during pattern discovery. A knowledge query
mechanism, such as SQL, is the most common form of
pattern analysis. Such queries can also use content and structure
information to filter out patterns containing pages
of certain usage types or content types, or pages that match a
certain hyperlink structure.
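A query-based filter of this kind can be sketched with Python's built-in sqlite3 module. The table layout (`rules`, `pages`), the column names, the content-type labels and the sample rows are all assumptions for illustration; the paper only states that an SQL-like mechanism is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: discovered rules plus a content type per page.
conn.execute(
    "CREATE TABLE rules (antecedent TEXT, consequent TEXT, "
    "support REAL, confidence REAL)"
)
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, content_type TEXT)")
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?)",
    [
        ("/home", "/products", 0.5, 0.67),
        ("/cart", "/products", 0.5, 1.0),
        ("/home", "/about", 0.25, 0.33),
        ("/products", "/cart", 0.5, 0.67),
    ],
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("/home", "navigation"),
        ("/products", "product"),
        ("/cart", "transaction"),
        ("/about", "info"),
    ],
)

# Keep only confident rules that lead to a page of content type 'product'.
interesting = conn.execute(
    """
    SELECT r.antecedent, r.consequent, r.confidence
    FROM rules AS r
    JOIN pages AS p ON p.url = r.consequent
    WHERE p.content_type = 'product' AND r.confidence >= 0.6
    ORDER BY r.confidence DESC
    """
).fetchall()
conn.close()
```

Joining the rules with page metadata is what lets the analyst filter by content type rather than by URL alone.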
Discovering usage patterns from web data is the
technique adopted in web usage mining. It has been
an important technology for understanding users' behavior
on the Web. The customer click
streams and buying and traversal patterns it collects are vital
information for e-commerce enterprises. They assist in
analyzing demographic data and help enterprises to devise
cross-marketing policies across products and services. Web usage mining
also helps e-commerce sites to retain their most profitable
customers [6], improve the functionality of web-based
applications and provide more customized content to
visitors. In addition, with Web usage mining
techniques, e-commerce companies can improve product
quality or sales by anticipating problems before they occur.
These techniques also reveal previously unknown
buying patterns and behavior of online customers.
More importantly, the fast feedback that companies obtain
from Web usage mining is very helpful in increasing
the company's benefit [7].
C. E-Governance
E-governance provides a single web portal that
integrates all services, including those of government, nonprofit
and private-sector entities [13]. In a service
system of this type, which provides ready access to information, the
quality of the user interface is an important factor. It is one of
the most challenging user-centric parameters, since the system has to
provide information to a large and varied user base [15]. A
presentation subsystem that adapts to the
individual preferences of each user helps to ensure wide
participation in e-governance systems.
The patterns in users' online behavior can be
discovered using Web usage mining techniques. These
patterns reveal user interests, which can be used to
fine-tune user interfaces and suggest the most appropriate
browsing paths. User requirements are also reflected in
navigation behavior. The analyzed results can be treated as
knowledge for use in intelligent online applications,
in refining website maps and in web-based personalization systems.
This technology also uses the experience of users in past
sessions to provide recommendations to users in the current
session [14].
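One simple way to turn past sessions into recommendations for the current session is a co-occurrence score, sketched below. This is not the method of [14]; the function name `recommend`, the scoring rule and the portal paths are hypothetical and serve only to make the idea concrete.

```python
from collections import Counter

def recommend(past_sessions, current_session, k=2):
    """Suggest up to k unseen pages that co-occurred with the current pages."""
    seen = set(current_session)
    scores = Counter()
    for session in past_sessions:
        pages = set(session)
        overlap = len(pages & seen)
        if overlap == 0:
            continue                     # this past session is unrelated
        for page in pages - seen:
            scores[page] += overlap      # weight by similarity to current session
    # Sort by descending score, then by URL so that ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:k]]

# Hypothetical past sessions on an e-governance portal.
past_sessions = [
    ["/tax", "/tax/forms", "/tax/pay"],
    ["/tax", "/tax/forms"],
    ["/licence", "/licence/renew"],
    ["/tax", "/tax/pay", "/contact"],
]
suggestions = recommend(past_sessions, ["/tax"])
```

A user who has just opened `/tax` is pointed to the pages that earlier visitors of `/tax` went on to use most often.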
B. E-Learning
E-learning is a form of electronically supported learning
that allows people to learn any subject at any time and
anywhere. The simplicity of the tools for browsing
resources on the web, and the ease of deploying and
maintaining resources, have made the web an excellent tool for
delivering courses. The web is a major choice for
managing and maintaining learning resources and has become
one of the leading choices for modern distance
education systems. As education becomes more
technologically advanced, the complexity of the available
learning resources increases accordingly. It is difficult
to evaluate the structure of course content and its
effectiveness in the learning process. Tracking and assessing all
the activities performed by learners is also difficult
and time-consuming.
This is where web usage mining can
contribute. The pattern analysis capability of web usage
mining plays an important role in web-based learning systems.
It can be used to analyze the behavior of students and instructors [11]
and improve the educational experience. Tracking the
activities on a course website and mining
patterns from them is also useful for improving or adapting the course
content. This allows instructors to appraise access
behavior, assess learning activities and compare
learners. Another advantage of Web usage mining is that the
arrangement of course content can be
improved by analyzing the traversal paths through the course
content pages [12].
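Traversal-path analysis can be approximated by counting how often learners move directly from one course page to another. The sketch below assumes sessions are already available as ordered lists of page URLs; the function name `transition_counts` and the course paths are hypothetical.

```python
from collections import Counter

def transition_counts(sessions):
    """Count direct page-to-page moves across all sessions."""
    counts = Counter()
    for session in sessions:
        counts.update(zip(session, session[1:]))   # consecutive page pairs
    return counts

# Hypothetical learner sessions on a course website.
course_sessions = [
    ["/intro", "/lesson1", "/quiz1"],
    ["/intro", "/lesson1", "/lesson2"],
    ["/intro", "/quiz1"],
]
transitions = transition_counts(course_sessions)
most_common_move = transitions.most_common(1)[0]
```

Frequent moves suggest pages that could be linked more directly, while rarely followed moves may indicate content that learners skip.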
V. CONCLUSION
Web usage mining is becoming an active
field of research because of its potential commercial
benefits. Visitor behavior can be analyzed further
by linking the Web logs with cookies and forms,
which could help e-service sites to address several
business questions. Its focus on analyzing users' behavior
on the web through access logs has made it popular
very quickly, especially in e-services. Details such as user
log files and requests for resources are maintained on web
servers, which are the core data source for web usage mining.
Analysis of these data reveals user browsing patterns, which
can be used for targeted advertising, improvement of
web design, customer satisfaction and market
analysis. Most e-service providers have realized
that they can apply this tool to retain their customers.
REFERENCES
[1 ] E. Kim, W. Kim, Y. Lee. Purchase propensity prediction of EC
customer by combining multiple classifiers based on GA.
International Conference on Electronic Commerce 2000: 274~280.
[2 ] J. B. Schafer, J. A. Konstan, J. Riedl. E-commerce recommendation
applications. Data Mining and Knowledge Discovery,
2001(5):115~153.
[3 ] S. W. Changchien, T. Lu. Mining association rules procedure to
support on-line recommendation by customers and products
fragmentation. Expert Systems with Applications, 2001(20):
325~335.
[4 ] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Analysis of
recommendation algorithms for e-commerce. Proceedings of ACM
Ecommerce 2000 Conference, 2001:158~167.
[5 ] S. Yuan, W. Chang. Mixed-initiative synthesized learning approach
for Web-based CRM. Expert Systems with Applications,
2001(20):187~200.
[6 ] http://revistaie.ase.ro/content/51/104%20-20
SIVA
RAMA
KRISHNAN, %20BALAKRISHNAN.pdf
[7 ] J. G. Liu, H. H. Huang. Web Mining for Electronic Business
Application, Proceedings of the Fourth International Conference on
Parallel and Distributed Computing, Applications and Technologies,
Chengdu, China, 2003:872~876.
[8 ] http://www.ceng.metu.edu.tr/~nihan/ceng553/StudentPapers/016351
56
[9 ] Jaydeep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning
Tan; Web Usage mining: Discovery and Applications of Usage
Patterns from Web Data; ACM SIGKDD; Jan 2000; Volume 1; Issue
2.
[10 ] Ning Bin, Lei Yuan; Research on Application of Web Mining in E-commerce; Advanced Materials Research - Scientific.Net; Volume
403 – 408; Pages 1830 – 1833; Nov 2011.
[11 ] Romero C., Ventura S., Pechenizkiy M., Baker R. S.; Handbook of
educational data mining; 2010; CRC Press.
[12 ] Bart C Palmer; Web Usage Mining: Application to an online
educational digital library service; Digital Commons@USU; 2012
[13 ] Zakareya Ebrahim and Zahir Irani; "E-government adoption:
architecture and barriers", Emerald Business Process Management
Journal, vol. 11, no. 5, pp. 589-611, 2005.
[14 ] A. S.Chakraverty 1, B. G.Rani, C. B.Singla and D. D.Anand;
Experience based recommendations system for e-governance; 2012.
[15 ] G. Rani; S. Chakraverty, "Boosting Interactivity of E-Governance",
International Conference on Communication Languages and Signal
Processing - with Preference to 4G Technologies, ICCLSP 4G,
January 2012.
[16 ] Nasraoui, O.; Soliman, M.; Saka, E.; Badia, A.; Germain, R.; "A
Web Usage Mining Framework for Mining Evolving User Profiles
in Dynamic Web Sites,"Knowledge and Data Engineering, IEEE
Transactions on , vol.20, no.2, pp.202-215, Feb. 2008
[17 ] Cooley, R., Mobasher, B., Srivastava, J., "Web Mining: Information
and Pattern Discovery on the World Wide Web", Proceedings of the
9th IEEE International Conference on Tools with Artificial
Intelligence (ICTAI'97), November 1997.
[18 ] H. Lieberman. Letizia: An agent that assists web browsing. In Proc.
of the 1995 International Joint Conference on Artificial Intelligence,
Montreal, Canada, 1995.
[19 ] Tao Huachuca, Jiang Lingyan. Web-based Data Mining Behavior
Analysis and Research. Fujian Computer. 2004 No. 3
[20 ] Jin Fengrong. Study of Web Usage Mining and Discovery of Browse
Interest. Master's Degree Thesis of Beijing Science and Technology
University. February 2004
[21 ] Chu Hue Lee, Yo Lung Lo, Yu Hsiang Fu; A novel prediction model
based on hierarchical characteristic of web site; Elsevier; Volume 38
Issue 4 , April 2011, Pages 3422 – 3430