International Journal of Electronics Communication and Computer Technology (IJECCT)
Volume 1 Issue 1 | September 2011
A Web Usage Mining Framework for Business Intelligence
Sonal Tiwari
Department of MCA
NRI Institute of Information Science & Technology
Bhopal, India
[email protected]
Abstract—In this paper, we introduce a web mining solution to
business intelligence that helps organizations discover hidden patterns and business
strategies from their customer and web data. We propose a new
framework based on web mining technology. Web mining
attempts to determine useful knowledge from secondary data
obtained from the interactions of the users with the web.
Keywords- Data Mining, Web Mining, Web Usage Mining, XML.
I. INTRODUCTION
Web mining has become vital for effective web site
management, creating adaptive web sites, business and support
services, personalization, network traffic flow analysis and so
on [1]. The WWW continues to grow at a remarkable rate as an
information gateway and as a medium for conducting business.
Web mining is the extraction of interesting and useful
knowledge and implicit information from artifacts or activity
related to the WWW [2]. Web mining has been widely used
in the past for analyzing huge collections of data, and is
currently being applied to a variety of domains [3]. Based on
several research studies, we can broadly classify web mining
into three domains: content, structure and usage mining. Web
content mining is the process of extracting knowledge from the
content of the actual web documents (text content, multimedia
etc.). Web structure mining targets useful knowledge from
the web structure, hyperlink references and so on. Web usage
mining attempts to discover useful knowledge from the
secondary data obtained from the interactions of the users with
the web.
Web usage mining has become critical for effective
web site management, creating adaptive web sites, business
and support services, personalization and network traffic flow
analysis. In this paper, we describe a framework that offers a
business intelligence solution for discovering the hidden insights
in an organization's business and web data. We demonstrate how web
mining technology can be effectively applied in business
intelligence. The framework we propose takes the results of the
web mining process as input, and converts these results into
actionable knowledge, by enriching them with information that
can be readily interpreted by the business analyst.
ISSN: 2249-7838
Figure 1: Web Usage Mining Framework
II. WEB USAGE MINING AND BUSINESS INTELLIGENCE
Fast business growth has put both the business community
and customers in a new situation. Because of intense
competition on the one hand and the customer's option to
choose from a number of alternatives on the other, the business
community has realized the necessity of intelligent marketing
strategies and relationship management.
Web servers record and accumulate data about user
interactions whenever requests for resources are received.
Analyzing the Web access logs can help in understanding user
behavior and the web structure. From the business and
applications point of view, knowledge obtained from web
usage patterns [5] could be directly applied to efficiently
manage activities related to e-business, e-services and
e-education. Accurate web usage information could help to
attract new customers, retain current customers, improve cross
marketing/sales, measure the effectiveness of promotional
campaigns, track leaving customers etc. The usage information can be
exploited to improve the performance of Web servers by
developing proper prefetching and caching strategies so as to
decrease the server response time. User profiles could be built
by combining users' navigation paths with other data features,
such as page viewing time, hyperlink structure, and page
content [5].
IJECCT | www.ijecct.org
Web usage mining techniques can be used to anticipate
user behavior in real time by comparing the current navigation
pattern with typical patterns extracted from past
Web logs. Recommendation systems could be developed to
recommend links to products that are likely to be
interesting to users.
One of the major issues in web log mining is to group all
the users' page requests so as to clearly identify the paths that
users followed during navigation through the web site. The
most common approach is to use cookies to track the
sequence of users' page requests, or to use heuristic
methods. Session reconstruction is also difficult from proxy
server log file data, and sometimes not all users' navigation
paths can be identified.
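The heuristic grouping of page requests into sessions can be illustrated with a minimal Python sketch. It assumes each request has already been reduced to a (user key, timestamp, URL) triple, where the user key is a cookie ID when one is available. The 30-minute inactivity timeout and the names `sessionize` and `SESSION_TIMEOUT` are illustrative assumptions, not values or identifiers taken from this paper.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a 30-minute inactivity gap ends a session. This is a common
# heuristic, not a value given in the paper.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (user_key, timestamp, url) requests into per-user sessions.

    user_key is whatever identifies a visitor: a cookie ID when available,
    otherwise a fallback such as IP address plus user agent.
    """
    by_user = defaultdict(list)
    for user_key, timestamp, url in requests:
        by_user[user_key].append((timestamp, url))

    sessions = []
    for user_key, hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])  # chronological order per user
        current = [hits[0][1]]
        last_seen = hits[0][0]
        for timestamp, url in hits[1:]:
            if timestamp - last_seen > timeout:
                # Gap too long: close the current session, start a new one.
                sessions.append((user_key, current))
                current = []
            current.append(url)
            last_seen = timestamp
        sessions.append((user_key, current))
    return sessions


example_requests = [
    ("cookie-a", datetime(2011, 9, 1, 10, 0), "/index.html"),
    ("cookie-a", datetime(2011, 9, 1, 10, 5), "/products.html"),
    ("cookie-b", datetime(2011, 9, 1, 10, 6), "/index.html"),
    ("cookie-a", datetime(2011, 9, 1, 11, 30), "/index.html"),
]
example_sessions = sessionize(example_requests)
```

In this example the visitor `cookie-a` yields two sessions, because the third request arrives more than 30 minutes after the second. Behind a proxy, several visitors may share one user key, which is one reason why not all navigation paths can be recovered.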
A. Data Sources
The usage data collected at different sources represent the
navigation patterns of different segments of the overall web
traffic, ranging from single-user, single-site browsing behavior
to multi-user, multi-site access patterns. Web server logs do
not contain sufficient information for inferring
behavior at the client side, as they relate only to the pages served by
the web server. Data may be collected from (a) Web servers,
(b) proxy servers, and (c) Web clients. Web servers collect
large amounts of information in their log files.
Databases are used instead of simple log files to store
information so as to improve querying of massive log repositories
[6]. Internet service providers use proxy server services to
improve navigation speed through caching. Collecting
navigation data at the proxy level is basically the same as
collecting data at the server level, but proxy servers collect
data on groups of users accessing groups of web servers. Usage
data can also be tracked on the client side by using JavaScript,
Java applets, or even modified browsers [8].
B. Data Pre-Processing
The raw web log data, after pre-processing and cleaning,
can be used for pattern discovery, pattern analysis, web usage
statistics, and generating association/sequential rules. Much
work has been performed on extracting various pattern
information from web logs, and the applications of the
discovered knowledge range from improving the design and
structure of a web site to enabling business organizations to
function more efficiently. Data pre-processing involves
mundane tasks such as merging multiple server logs into a
central location and parsing the log into data fields. The
pre-processing comprises (a) data cleaning, (b) the
identification and reconstruction of users' sessions, and (c)
data formatting. Data cleaning consists of removing all the
data tracked in Web logs that are useless for mining purposes.
Requests for graphic files, agent/spider crawling etc. can be
removed by keeping only requests for HTML files.
Normalization of URLs is often required to make the requests
consistent.
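A minimal Python sketch of the parsing, cleaning and URL normalization steps follows. It assumes the logs use the NCSA Common Log Format, which this paper does not specify. It also drops requests by file suffix rather than keeping only HTML requests, so that extension-less pages survive. The names `LOG_PATTERN`, `IGNORED_SUFFIXES`, `normalize_url` and `clean_log` are hypothetical, and robot/spider filtering is not covered.

```python
import re
from urllib.parse import urlsplit

# Assumption: logs are in the NCSA Common Log Format; the paper does not
# name a specific log format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumption: these suffixes mark embedded resources rather than page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def normalize_url(url):
    """Lower-case the path, drop the query string, strip a trailing slash.

    Lower-casing is itself an assumption: it is only safe on sites whose
    paths are case-insensitive.
    """
    path = urlsplit(url).path.lower()
    return path.rstrip("/") or "/"


def clean_log(lines):
    """Parse raw log lines and keep only successful page requests."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines
        url = normalize_url(match.group("url"))
        if url.endswith(IGNORED_SUFFIXES):
            continue  # graphic and other embedded-file requests
        if not match.group("status").startswith("2"):
            continue  # failed or redirected requests
        records.append({
            "host": match.group("host"),
            "time": match.group("time"),
            "url": url,
        })
    return records


sample = [
    '10.0.0.1 - - [01/Sep/2011:10:00:00 +0000] "GET /Index.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [01/Sep/2011:10:00:01 +0000] "GET /logo.gif HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Sep/2011:10:00:05 +0000] "GET /products/?id=7 HTTP/1.1" 200 2048',
    '10.0.0.2 - - [01/Sep/2011:10:00:09 +0000] "GET /missing.html HTTP/1.1" 404 0',
]
cleaned = clean_log(sample)
```

Of the four sample lines, only the requests for `/Index.html` and `/products/?id=7` are kept, normalized to `/index.html` and `/products`.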
III. DATA MINING TECHNIQUE
The term data mining [8] refers to a broad spectrum of
mathematical modeling techniques and software tools that are
used to find patterns in data and use these to build models. In
the context of recommender applications, the term data mining
is used to describe the collection of analysis techniques used to
infer recommendation rules or build recommendation models
from large data sets. Recommender systems that incorporate
data mining techniques make their recommendations using
knowledge learned from the actions and attributes of users.
Classical data mining techniques include classification of users,
finding associations between different product items or
customer behavior, and clustering of users [9].
A. Clustering
Clustering techniques work by identifying groups of
consumers who appear to have similar preferences. Once the
clusters are created, predictions for an individual can be made
by averaging the opinions of the other consumers in that
individual's cluster. Some clustering techniques represent each user
with partial participation in several clusters. The prediction is
then an average across the clusters, weighted by degree of
participation.
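The weighted prediction described above can be written out as a short Python sketch. The cluster names, membership degrees and cluster averages are made-up illustration data, and `predict_rating` is a hypothetical helper rather than part of any system described in this paper.

```python
def predict_rating(memberships, cluster_means):
    """Predict a rating as a membership-weighted average of cluster means.

    memberships:   {cluster_id: degree of participation of the user}
    cluster_means: {cluster_id: average opinion on the item in that cluster}
    """
    total_weight = sum(memberships.values())
    if total_weight == 0:
        raise ValueError("user has no cluster membership")
    weighted_sum = sum(
        weight * cluster_means[cluster]
        for cluster, weight in memberships.items()
    )
    return weighted_sum / total_weight


# Illustrative data: a user who belongs mostly to one cluster.
memberships = {"bargain_hunters": 0.75, "gadget_fans": 0.25}
cluster_means = {"bargain_hunters": 4.0, "gadget_fans": 2.0}
prediction = predict_rating(memberships, cluster_means)  # (0.75*4 + 0.25*2) / 1.0
```

With hard clustering, each user has degree 1 in exactly one cluster, and the function reduces to the plain cluster average.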
B. Classification
Classifiers are general computational models for assigning
a category to an input. The inputs may be vectors of features
for the items being classified or data about relationships among
the items. The categories are a domain-specific classification
such as malignant/benign for tumor classification,
approve/reject for credit requests or intruder/authorized for
security checks. One way to build a recommender system using
a classifier is to use information about a product and a
customer as the input, and to have the output .
C. Association Rules Mining
Association rule mining searches for interesting
relationships between items by finding items that frequently
appear together in the transaction database. If item B
appears frequently when item A appears, then an association
rule is denoted as A ⇒ B. Support and confidence are two measures
of rule interestingness that reflect the
usefulness and certainty of a rule, respectively [10]. Support, as the
usefulness of a rule, describes the proportion of transactions
that contain both items A and B, and confidence, as the validity of
a rule, describes the proportion of transactions containing item
B among the transactions containing item A. The association
rules that satisfy a user-specified minimum support threshold
(minSup) and minimum confidence threshold (minCon) are
called strong association rules.
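The two measures can be made concrete with a minimal Python sketch that computes support and confidence directly from a list of transactions. The transactions are toy data, and the function names `support`, `confidence` and `is_strong` are illustrative; only the thresholds minSup and minCon come from the text.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


def is_strong(transactions, antecedent, consequent, min_sup, min_con):
    """True if the rule antecedent => consequent meets both thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_sup
            and confidence(transactions, antecedent, consequent) >= min_con)


# Toy transaction database: each transaction is a set of items.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
]
sup_ab = support(transactions, {"A", "B"})        # 2 of the 4 transactions
con_ab = confidence(transactions, {"A"}, {"B"})   # 2 of the 3 containing A
```

Here the rule A ⇒ B has support 0.5 and confidence 2/3, so it is strong for minSup = 0.5 and minCon = 0.6 but not for minCon = 0.7.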
One of the best-known examples of web mining in
recommender systems is the discovery of association rules, or
item-to-item correlations [11]. Association rules have been
used for many years in merchandising, both to analyze patterns
of preference across products, and to recommend products to
consumers based on other products they have selected.
Recommendation using association rules is to predict
the preference for item k when the user preferred items i and j, by
adding the confidences of the association rules that have k in the
result part and i or j in the condition part [9].
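The scoring step just described can be sketched in Python as follows. The sketch assumes that rules have already been mined and are given as single-item (condition, result, confidence) triples. The rule list is illustrative data, and `recommend` is a hypothetical helper, not the algorithm of [9] in full.

```python
from collections import defaultdict


def recommend(rules, preferred_items, top_n=3):
    """Score candidate items by summing the confidence of matching rules.

    rules: iterable of (condition_item, result_item, confidence) triples,
           i.e. single-item rules of the form condition => result.
    """
    preferred = set(preferred_items)
    scores = defaultdict(float)
    for condition, result, conf in rules:
        # A rule contributes only if the user preferred its condition item
        # and has not already preferred its result item.
        if condition in preferred and result not in preferred:
            scores[result] += conf
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


# Illustrative rules with made-up confidences.
rules = [
    ("i", "k", 0.5),
    ("j", "k", 0.25),
    ("i", "m", 0.5),
    ("x", "m", 0.75),  # ignored: the user has not preferred x
]
ranking = recommend(rules, ["i", "j"])
```

For a user who preferred items i and j, item k scores 0.5 + 0.25 = 0.75 and is ranked above item m, which scores 0.5.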
IV. WEB MINING FRAMEWORK FOR BUSINESS INTELLIGENCE
A. A Visual Web Log Mining Architecture
In this section, we present a visual Web log mining
architecture [7] for e-commerce recommender systems, named
V-Web Log Miner, which relies on mining and on visualization
of Web services log data captured in a business intelligence
environment. As shown in Figure 2, V-Web Log Miner is a
multi-layered architecture capable of dealing with both Web
services XML-based logs and traditional Web server logs as
input data.
Figure 2: Visual Web Log Mining Architecture
1) The Integration Layer:
The integration layer is a set of programs used to prepare data
for further processing, for instance extraction, cleaning,
transformation and loading. This layer uses XQuery, XSLT
and XML Schemas to feed the data repository, i.e., a relational or
native XML [15] database. The Web log parser component is
used to parse and transform plain ASCII files produced by a
Web server into a standard database format. This component is
important to make the architecture independent of the Web
server supplier.
2) The Sessionization Layer:
The sessionization layer is used to tie the instances of Web
services and Web pages to sessions and to users. This layer is
important for investigating the usage of the Web services
compositions used throughout users' sessions.
3) The Data Layer:
The data layer is a repository of input/output business
intelligence data. It also stores pre-processed logs, business
intelligence sessions, and information about the Web services
execution.
4) The Recommendations Engine Layer:
The recommendations engine layer is a data mining engine
and is in charge of bulk loading XML data from the database,
executing SQL commands against it and executing the mining
algorithms. This layer integrates BI tools, e.g. OLAP and data
mining. Using the user profiles and content profiles, the
businesses apply data mining techniques [12] to identify
appropriate business rules. These rules could involve a simple
classification of the users using their profiles and the website
click-streams, association between content profiles and user
behavior, or association between different products. The
knowledge of customers' behavior will help to improve
customer relationships and make business strategies.
5) The Visualization Tools:
Visualization tools should be used to present implicit and
useful knowledge from the recommendations engine, Web services
usage and composition. Data can be viewed at different levels
of granularity and abstraction as parallel coordinates graphs
[10, 11]. This visual model easily shows the interrelationships
and dependencies between different components. Interactively,
the model can be used to discover sensitivities and to do
approximate optimization, etc.
V. CONCLUSION AND FUTURE WORK
This paper introduces the basic ideas of recommender
systems and the importance of web usage mining in business
intelligence. Recommender systems have emerged as powerful
tools for helping customers find items of interest. The research
work presented in this paper makes several contributions to
research on recommender systems. First of
all, we propose a new framework based on web mining
technology for building a Web-page recommender system.
Additionally, we demonstrate how web mining technology can
be effectively applied in a business intelligence environment.
There are some possible extensions to this work. Research on
analyzing customers' past purchasing patterns will enable to
discover an appropriate. Also, it will be an interesting research
area to conduct a real marketing promotion to target customers
using our approach and then to evaluate its performance.
We have developed a web mining approach for business
intelligence. For this we have developed a framework that uses
association rules, web mining and visualization of web
services log data captured in a business intelligence environment.
REFERENCES
[1] P. Pirolli, J. Pitkow, and R. Rao, "Silk From a Sow's Ear: Extracting Usable Structures from the Web", Proceedings on Human Factors in Computing Systems (CHI'96), ACM Press, 1996.
[2] R. Cooley, "Web Usage Mining: Discovery and Application of Interesting Patterns from Web Data", Ph.D. Thesis, University of Minnesota, Department of Computer Science, 2000.
[3] D.J. Hand, H. Mannila, and P. Smyth, "Principles of Data Mining", MIT Press, 2000.
[4] M. Spiliopoulou and L.C. Faulstich, "WUM: A Web Utilization Miner", Proceedings of EDBT Workshop on the Web and Data Bases (WebDB'98), Springer Verlag, pp. 109-115, 1999.
[5] F. Masseglia, P. Poncelet, and R. Cicchetti, "An Efficient Algorithm for Web Usage Mining", Networking and Information Systems Journal (NIS), vol. 2, no. 5-6, pp. 571-603, 1999.
[6] K.P. Joshi, A. Joshi, and Y. Yesha, "On using a warehouse to analyze web logs", Distributed and Parallel Databases, vol. 13, no. 2, pp. 161-180, 2003.
[7] J. Srivastava, R. Cooley, M. Deshpande, and P.N. Tan, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data", SIGKDD Explorations, vol. 1, no. 2, pp. 12-23, 2000.
[8] F.M. Facca and P.L. Lanzi, "Mining interesting knowledge from weblogs: a survey", Data & Knowledge Engineering, vol. 53, no. 3, pp. 225-241, 2005.
[9] C. Kim and J. Kim, "A Recommendation Algorithm Using Multi-Level Association Rules", Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence, p. 524, October 13-17, 2003.
[10] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2000.
[11] B. Sarwar, G. Karypis, J.A. Konstan, and J. Riedl, "Item-based Collaborative Filtering Recommendation Algorithms", Proceedings of the Tenth International Conference on World Wide Web, pp. 285-295, 2001.
[12] I. Press, "IBM intelligent miner", IBM Documentation, 2001.
[13] O. Press, "Oracle personalization", Oracle Documentation, 2001.
[14] O. Press, "Oracle personalization", Oracle Documentation, 2001.
[15] A. Inselberg, "Multidimensional detective", IEEE Symposium on Information Visualization, 1997, vol. 00, pp. 100-110.