Enhancing e-Business Through
Web Data Mining
Amy Shi1, Allen Long2, and David Newcomb3
1 Accurate Business Solutions, Courtyard, Denmark Street,
Wokingham, RG40 2AZ, U.K.
[email protected]
2 School of SCISM, South Bank University,
London, SE1 0AA, U.K.
[email protected]
3 BigSoft Limited, 40 Belmond Road, Reading, RG30 2UU, U.K.
[email protected]
Abstract. Today, there is more interest than ever in “e.” The Internet,
e-commerce and e-business undoubtedly hold an important key to every
organisation’s future. This paper introduces a web data mining solution that
helps e-businesses discover hidden patterns and business strategies from their
customer and web data. A three-layer virtual e-business framework is proposed,
together with web mining techniques to personalise e-services, increase
cross-selling, and improve customer relationship management. The specific
characteristics of web data, compared with the data handled by general data
mining algorithms, are also discussed.
1. Introduction
Today, there is more interest, more discussion and more hype than ever
about “e.” The Internet, e-commerce and e-business undoubtedly hold an
important key to every organisation’s future and success, offering
tremendous opportunities and worldwide markets. Nobody can afford to let
the competition pass by, yet a project started in haste is bound to fail.
E-business projects and dot-com companies unfortunately have the highest
rate of failure, owing to a poor understanding of the new rules of the
e-economy environment.
To differentiate themselves in the Internet economy, winning enterprises
are realizing that e-business is much more than a simple buy/sell
transaction; the right e-strategies are the key to increasing
competitiveness in the marketplace. However, even though the principles that
made organisations successful yesterday are still the best foundation on
which to start today, implementing an e-strategy is not as easy as simply
adding an “e” in front of the current business strategy.
This paper aims to introduce a data mining solution that helps e-businesses
discover the hidden insights in their business and web data. This will help
e-organisations to make intelligent business strategies and improve their
customer relationship management. A three-layer virtual e-business
framework is proposed in Section 2. Section 3 discusses how to enhance
e-business through web data mining.
2. A Virtual Framework of e-Business
2.1 Data Involved in e-business
Generally, the business data involved in e-businesses is massive. It mostly
contains customer information, purchase information, product/service
information, supplier information, security and priority information, and
management reports, including standing and statistical analyses of
production, sales, finance etc., as well as online web access data. Fig. 1
shows an example of the basic kinds of data involved in e-business; the
content may vary with different types of e-business. In the figure, tables
circled together, e.g. Customer, Contact, Product, Purchase, Payment and
Web_log, are connected to each other. The management database contains all
the information, reports and knowledge generated by an organisation for
business management.
Fig. 1 An example of e-business data
This online e-business data is growing constantly. Effectively organising and
managing the data is a fundamental task for all e-businesses. Advanced
database/warehouse technology is undoubtedly required to handle data that is
likely to be in different formats and distributed environments, while
providing reasonably quick responses to customer queries and
intra-processing. Additionally, the data must be shared by the whole
organisation under specific priority control policies in the e-business
environment, leading to a common data resource and processing platform.
2.2 Three-Layer Architecture
The new virtual e-business framework is structured as a three-layer
architecture, i.e. customer service, data manager and business intelligence
(BI) support, as shown in Fig. 2.
Fig. 2: A virtual e-business framework, the three-layer architecture
• Customer Service - External and Internal Navigation Platforms:
This layer is essential to any kind of e-business. The external web platform
provides the major part of the communications and services to visitors or
customers. Well-designed web pages should be easy to use, respond quickly,
offer good-quality information in the right amount, give convenient access
to customer-related data without returning etc., and guarantee security.
The back offices of the e-organisation, like customers, also work on the net
via the intra-navigation platform. This is a common platform on which all
functional departments share the same data resource and deliver processed
results. For example, when a customer order is received and entered into the
database by the sales department, it also involves the financial department
to deal with the payment, the delivery department to arrange the shipping,
and the customer service department to confirm the order. Every department
delivers the relevant records and modifies the data once a process is
completed. Business reports and internal documents are also shareable
through the platform, so that a marketing manager can quote figures from the
finance manager’s reports to support the performance of a promotion
campaign. The intra-platform improves the overall performance of the
e-organisation by providing up-to-date and accurate information to every
element.
• Data Manager
This layer is very important to the effectiveness of e-business, as it is in
charge of managing all the e-business data discussed in Section 2.1. In
fact, this layer acts as a bridge that links visitors/customers and the
organisation together via data exchange. It requires advanced
database/warehouse technology and a well-designed data structure.
This layer can significantly strengthen an e-organisation’s intra-process
by using automatic function-oriented agents. For instance, the standard
order-processing example discussed above can be carried out by a sales agent
that looks after all the relevant data records. Fig. 3 shows how the sales
agent (the black cartoon) can handle a standard sales procedure and modify
the related data records when a sub-task, such as credit confirmation or
inventory check, is completed. In addition, the sales agent can also deal
with some special cases. For example, if the publication of a new book is
delayed, the agent is able to find the customers who have already ordered
the book from the Purchase table, and forward the information (customer’s
name, IP address, order items etc.) to every customer touch point, e.g. back
offices and call center. Therefore, when the customer visits the website or
telephones the call center, the specific information is put forward
immediately (through the IP address) to explain the situation to the
customer. This strengthens the intra-processing of the e-organisation,
speeds up the response time and improves customer relationships.
Fig. 3 An example of the sales agent
• Business Intelligence (BI) Support:
This layer does not have to exist, but it may well separate the winners from
the losers. Integrating BI tools, e.g. OLAP and data mining, into
e-businesses has become more widely accepted by e-organisations as a way to
reveal hidden facts in business data. Knowledge of customers’ behavior helps
to improve customer relationships and to make business strategies. These
techniques are discussed in Section 3.
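The delayed-publication case handled by the sales agent in the Data Manager
layer can be sketched roughly as follows. This is only an illustration: the
Purchase fields, the touch-point names and the functions notify_delay and
message_for are hypothetical and are not taken from the framework itself.

```python
from dataclasses import dataclass


@dataclass
class Purchase:
    """One row of a hypothetical Purchase table."""
    customer_name: str
    ip_address: str
    item: str


def notify_delay(purchases, delayed_item, touch_points):
    """Find the orders for a delayed item and copy them to every touch point."""
    affected = [p for p in purchases if p.item == delayed_item]
    for inbox in touch_points.values():
        inbox.extend(affected)
    return affected


def message_for(ip_address, inbox):
    """Return the delay notices for a visitor identified by IP address."""
    return [
        f"Dear {p.customer_name}, '{p.item}' is delayed."
        for p in inbox
        if p.ip_address == ip_address
    ]


# Illustrative data only.
purchases = [
    Purchase("A. Reader", "172.16.100.232", "New Book"),
    Purchase("B. Buyer", "184.63.2.4", "Old Book"),
]
touch_points = {"back_office": [], "call_center": []}

affected = notify_delay(purchases, "New Book", touch_points)
notices = message_for("172.16.100.232", touch_points["call_center"])
```

Keeping one inbox per touch point mirrors the idea that the website, the back
offices and the call center all see the same notice for the same customer.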
3. Enhance e-business Through Web Data Mining
It has been a challenge for e-organisations to uncover patterns that reveal
the hidden insights of their massive e-business data effectively, as the data
is constantly growing. Data mining techniques are becoming more popular
as a powerful BI tool to fill the increasing gap between data collection and
exploration, helping e-businesses of all sizes to sift through the data in
search of useful patterns.
3.1 A Glance at Web Data
Web data is the information that is recorded by the website server when a
user visits. As an example, a file named access_log, as shown in Fig. 4,
contains all the website “hit” information, such as the visitor’s IP address,
date and time (GMT + time difference), requested pages, and a status code
indicating whether the request completed or failed. Similarly, a file called
error_log records the error details, web server problems and possible
intrusions.
Fig. 4 Example access_log and error_log files
The web data in access_log and error_log must be converted into a database
format so that data mining algorithms can be applied to it. Two tables,
Web_log and Error_log, are used to collect the relevant data from the two
files. Fig. 5 shows the table Web_log, which contains part of the data in
the access_log file.
ID  IP              Date       Time     Adaptation  Page                             Status
1   172.16.100.232  11-Aug-00  9:24:25  +0100       GET /vnvi/ HTTP/1.1              304
2   172.16.100.232  11-Aug-00  9:24:25  +0100       GET /vnvi/md5.js HTTP/1.1        304
3   172.16.100.232  11-Aug-00  9:24:25  +0100       GET /image/coimage.gif HTTP/1.1  304
4   172.16.100.232  11-Aug-00  9:24:25  +0100       GET /vnvi/md5.js HTTP/1.1        404 298
5   172.16.100.232  11-Aug-00  9:24:25  +0100       GET /vnvi/image/coimage.gif      404 303
…   …               …          …        …           …                                …
Fig. 5 The table Web_log contains part of the web data in the file of access_log.
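The conversion from access_log lines to Web_log rows might look like the
sketch below. It assumes the server writes the widely used Common Log Format;
the sample line is made up for illustration and the function name
parse_access_line is hypothetical. The column names follow Fig. 5.

```python
import re

# Assumption: lines follow the Common Log Format, e.g.
# host ident user [day/month/year:HH:MM:SS +zone] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<date>[^:]+):(?P<time>\d{2}:\d{2}:\d{2}) (?P<offset>[+-]\d{4})\] '
    r'"(?P<page>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_access_line(line, row_id):
    """Turn one access_log line into a Web_log row, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ID": row_id,
        "IP": match["ip"],
        "Date": match["date"],
        "Time": match["time"],
        "Adaptation": match["offset"],  # time difference from GMT
        "Page": match["page"],
        "Status": int(match["status"]),
    }


# Illustrative line only, not copied from a real log.
sample = ('172.16.100.232 - - [11/Aug/2000:09:24:25 +0100] '
          '"GET /vnvi/ HTTP/1.1" 304 -')
row = parse_access_line(sample, 1)
```

Returning None for lines that do not match keeps malformed records out of the
table, which is one simple form of the pre-processing that web data needs.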
The web data is massive, since every click a visitor makes in the website
leaves several records in the tables. This also allows the website owner to
track the details of visitors’ behavior and discover valuable patterns. For
example, the visitor’s IP address and time difference indicate the
organisation and geographic location; the requested pages indicate the
visitor’s interests and search topics; frequency statistics and time
duration separate the regular users from casual visitors; and the status
codes show the stability of the website, i.e. success or failure in getting
the requested pages, and quick or slow response depending on whether a page
is loaded from cache or hard disk etc. These useful patterns can be
uncovered by web data mining techniques, helping e-organisations to enhance
their e-services.
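Two of the simple statistics mentioned above, visiting frequency and request
failures, can be computed per IP address as in this sketch. The threshold
that separates regular from casual visitors, the function name
visitor_summary and the sample rows are assumptions chosen for illustration.

```python
from collections import Counter


def visitor_summary(rows, regular_threshold=3):
    """Summarise hits, visitor kind and failure rate for each IP address."""
    hits = Counter(row["IP"] for row in rows)
    # Status codes of 400 and above signal a failed request.
    failures = Counter(row["IP"] for row in rows if row["Status"] >= 400)
    summary = {}
    for ip, count in hits.items():
        summary[ip] = {
            "hits": count,
            "kind": "regular" if count >= regular_threshold else "casual",
            "failure_rate": failures[ip] / count,
        }
    return summary


# Illustrative Web_log rows, reduced to the two attributes used here.
rows = [
    {"IP": "172.16.100.232", "Status": 304},
    {"IP": "172.16.100.232", "Status": 304},
    {"IP": "172.16.100.232", "Status": 404},
    {"IP": "172.16.100.232", "Status": 404},
    {"IP": "184.63.2.4", "Status": 200},
]
summary = visitor_summary(rows)
```

A high failure rate for many visitors would point to a stability problem in
the website rather than to anything about the visitors themselves.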
3.2 Personalised e-Services
Organisations are seeking effective and low-cost personalised services for
their customers. For example, an e-banking organisation wants to give
frequent customers quick access to the functions they are most likely to
want, e.g. statement enquiry, payment, or stock market quotation etc.,
instead of the bank’s general front page.
This can be achieved by applying data mining to the web data. Two
attributes of Web_log, i.e. “IP” and “Page” in Fig. 5, are relevant. Since
every web page is designed for a specific function, a concept hierarchy for
web pages can be built up based on the assigned function. Fig. 6 gives an
example hierarchy of the pages shown in Fig. 4.
Fig. 6 Concept hierarchy of web pages
Based on this page hierarchy, every web page recorded in Fig. 5 is replaced
by its higher-level concept in the hierarchy, resulting in the generalised
table in Fig. 7, with more abstract concepts of visiting purposes, as well
as the visiting frequency indicated by the counts.
IP              Login  Statement  Transfer  Stock  Currency  …
172.16.100.232  231    38         2         200    12        …
184.63.2.4      123    100        45        0      0         …
195.58.1.101    94     90         12        0      0         …
202.27.2.121    140    112        1         100    67        …
…               …      …          …         …      …         …
Fig. 7 The generalised table of IP and Pages with statistical information
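The generalisation step can be sketched as a lookup followed by a count. The
page paths in PAGE_CONCEPT are hypothetical stand-ins for a concept hierarchy
such as the one in Fig. 6, and the rows are assumed to hold the requested
path already extracted from the Page attribute.

```python
from collections import Counter, defaultdict

# Hypothetical concept hierarchy: page path -> service concept.
PAGE_CONCEPT = {
    "/login.html": "Login",
    "/account/statement.html": "Statement",
    "/account/transfer.html": "Transfer",
    "/market/stock.html": "Stock",
    "/market/currency.html": "Currency",
}


def generalise(rows):
    """Replace each page by its concept and count the visits per IP address."""
    table = defaultdict(Counter)
    for row in rows:
        concept = PAGE_CONCEPT.get(row["Page"])
        if concept is not None:  # skip images, scripts and unmapped pages
            table[row["IP"]][concept] += 1
    return table


# Illustrative rows only.
rows = [
    {"IP": "172.16.100.232", "Page": "/login.html"},
    {"IP": "172.16.100.232", "Page": "/market/stock.html"},
    {"IP": "172.16.100.232", "Page": "/market/stock.html"},
    {"IP": "172.16.100.232", "Page": "/image/coimage.gif"},
    {"IP": "184.63.2.4", "Page": "/account/statement.html"},
]
table = generalise(rows)
```

Requests for images and scripts drop out here because they carry no visiting
purpose of their own, which is also why the generalised table is so much
smaller than the raw log.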
Customers’ visiting purposes are revealed clearly in the resulting table by
items such as “Transfer” or “Statement” instead of web pages. With this
knowledge, the organisation can build up a hot mapping from IP addresses to
services. When a specific customer who checks share prices every day logs in
to the e-bank, the website will jump first to the stock market quotation
page, even with the relevant stock numbers. As can be seen, without the data
generalisation process it is almost impossible to understand and analyse the
huge amount of web data at its original level.
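One simple way to derive such a hot mapping is to pick the most frequently
used service for each IP address, as sketched below with the counts from
Fig. 7. Excluding Login from the candidates is an assumption made here (every
visit passes through it), and the function name hot_mapping is hypothetical.

```python
def hot_mapping(table, ignore=("Login",)):
    """Map each IP address to its most frequently used service."""
    mapping = {}
    for ip, counts in table.items():
        candidates = {c: n for c, n in counts.items() if c not in ignore}
        if candidates:
            mapping[ip] = max(candidates, key=candidates.get)
    return mapping


# Counts taken from the generalised table in Fig. 7.
fig7 = {
    "172.16.100.232": {"Login": 231, "Statement": 38, "Transfer": 2,
                       "Stock": 200, "Currency": 12},
    "184.63.2.4": {"Login": 123, "Statement": 100, "Transfer": 45,
                   "Stock": 0, "Currency": 0},
    "195.58.1.101": {"Login": 94, "Statement": 90, "Transfer": 12,
                     "Stock": 0, "Currency": 0},
    "202.27.2.121": {"Login": 140, "Statement": 112, "Transfer": 1,
                     "Stock": 100, "Currency": 67},
}
landing = hot_mapping(fig7)
```

With these counts the first address is sent to the Stock service and the
other three to Statement. A production rule would probably also require a
minimum share of visits before overriding the general front page.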
Applying other data mining algorithms to the generalised table above can
uncover more complicated patterns. For example, classification algorithms
can segment visitors based on the interests shown by their frequently
visited pages. Another example is to reveal casual visitors’ browsing
patterns, such as the typical time that visitors spend in the website, the
most visited pages and the pages at which visitors are likely to check out,
as well as the reasons etc., from the attributes Date, Time, Page and Status
in Web_log. These patterns are very helpful to the website owner in
improving the attractiveness and stability of the website.
Note that the visitor segmentations and browsing patterns discovered from
web data are different from the customer segmentations and shopping
behaviors that can be discovered from a conventional business such as a
supermarket. This is because web visitors are not necessarily customers if
they have no previous purchase record, or simply because the website only
provides a service and does not sell any product. Therefore, web visitors’
personal details or spending potential cannot be collected directly as in
conventional businesses, unless specific forms are required to be filled in,
which unfortunately is a major reason why visitors give up on a website.
However, e-business data has other advantages. As every click in the website
creates many web data records, significant patterns can also be found in the
massive data that records every movement in the website. This characteristic
of web data requires more pre-processing and specific procedures based on
web techniques to filter out irrelevant or noisy records before a normal
mining algorithm is applied. Therefore, compared with those discovered from
conventional business data, patterns generated from e-business and web data
might have slightly lower accuracy but more natural meanings.
3.3 Increasing Cross-Selling by Basket Analysis
All businesses constantly try to increase cross-selling opportunities. The
secret of successful cross-selling is to understand customers’ interests and
recommend the right products to the right customer. The knowledge that
reveals customers’ consuming behaviors is hidden in the large amount of
historical purchase records maintained by the data manager. The relevant
tables, including Customer, Purchase, Payment, Contact and Web_log, are
shown in Fig. 8, together with their relationships. Based on this set of
data, interesting patterns can be developed to increase cross-selling.
Fig. 8 Relevant tables and their relationships
An association rule algorithm, or market basket analysis algorithm, can be
applied to the dataset of all the items that have been bought together,
resulting in patterns like “this percentage of customers who bought XX also
bought YY”. These associations can be used to recommend YY to the
customers who are going to buy XX, or to launch a package promotion for
goods like XX and YY that are likely to be bought together.
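The “percentage of customers who bought XX also bought YY” is usually called
the confidence of the rule XX -> YY, and the share of all baskets containing
both items is its support. The sketch below computes both for item pairs
only. It is a simplified illustration rather than a full association rule
miner such as Apriori, and the baskets, thresholds and the function name
pair_rules are assumptions.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Share of the baskets with lhs that also contain rhs.
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Illustrative baskets only.
baskets = [
    ["XX", "YY"],
    ["XX", "YY", "ZZ"],
    ["XX"],
    ["YY", "ZZ"],
]
rules = pair_rules(baskets)
```

In this toy data two of the three baskets containing XX also contain YY, so
the rule XX -> YY has a confidence of about 67% and a support of 50%.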
These e-association patterns should be as good as the ones generated from
conventional business data, provided the e-company has a reasonable number
of customers. In fact, for e-businesses, not only the purchase records but
also the customer’s potential interests, indicated by the frequently visited
pages, can be taken into account to increase the accuracy of the association
rules. At the same time, the Web_log shows whether a new order is made as
the result of specific recommendations, which allows the e-organisation to
monitor the performance of the rules and adapt its marketing strategies
accordingly. Knowledge and strategies with measurable results can better
target potential markets, maximise the chance of success and minimise the
marketing cost.
4. Conclusion
Successful e-business needs cutting-edge BI and CRM technology. This
paper touches upon how web data mining can help e-businesses to improve
their customer relationships, make intelligent business strategies, and
sharpen their competitive edge; yet it reveals only the tip of the iceberg.
The experience of many e-business winners has shown the tremendous benefits
of applying even a single piece of mining technology, which has set them
apart from their peers.