ISSN No: 2309-4893
International Journal of Advanced Engineering and Global Technology
Vol-2, Issue-1, January 2014
Web Service mining and its techniques in Web Mining
B.Meena, I.S.L.Sarwani, S.V.S.S.Lakshmi
ANITS, Visakhapatnam, India
Web mining is the integration of information gathered by
traditional data mining methodologies and techniques with
information gathered over the World Wide Web. Mining
means extracting something useful or valuable from a
baser substance, such as mining gold from the earth. Web
mining is used to understand customer behavior, evaluate
the effectiveness of a particular Web site, and help quantify
the success of a marketing campaign. Web service mining
and its techniques are discussed in this paper to identify
the future trends of mining on the Web.
Web mining is the use of data mining techniques to
automatically discover and extract information from web
Keywords: web service, web mining, web usage, web log
1. Introduction to web mining
Web mining makes it possible to look for patterns in data
through content mining, structure mining, and usage
mining. Content mining is used to examine data
collected by search engines and Web spiders.
Structure mining is used to examine data related to
the structure of a particular Web site, and usage
mining is used to examine data related to a particular
user's browser as well as data gathered by forms the
user may have submitted during Web transactions.
Figure 1 : classification of web mining
documents and services (content, structure, and usage). Two
different approaches were taken in initially defining web mining:
• Process-centric view – web mining as a sequence of tasks.
• Data-centric view – web mining defined in terms of the web data that
was being used in the mining process.
The information gathered through Web mining is
evaluated (sometimes with the aid of software
graphing applications) by using traditional data
mining parameters such as clustering and
classification, association, and examination of
sequential patterns.
The important data mining techniques applied in the web
domain include association rules, sequential pattern
discovery, clustering, path analysis, classification and outlier
discovery. Web mining refers to the overall process of
discovering potentially useful and previously unknown
information or knowledge from web data. It
aims at finding and extracting relevant information that is
hidden in web-related data, in particular in text documents that
are published on the web. Like data mining, it is a multidisciplinary
effort that draws techniques from fields like
information retrieval, statistics, machine learning, natural
language processing and others. Web mining can be a
promising tool to address the shortcomings of search engines, such as
incomplete indexing, retrieval of irrelevant
information, and unverified reliability of retrieved information. It
is essential to have a system that helps the user find relevant
and reliable information easily and quickly on the web. Web
mining not only discovers information from mounds of data on the
WWW, but also monitors and predicts user visit patterns.
This gives designers more reliable information in structuring
2. Web Mining – classification
The web contains a collection of pages that includes
countless hyperlinks and huge volumes of access and
usage information. Because of the ever-increasing
amount of information in cyberspace, knowledge
discovery and web mining are becoming critical for
successfully conducting business in the cyber world.
Web mining is the discovery and analysis of useful
information from the web.
and designing a web site. Given the rate of growth of the web,
scalability of search engines is a key issue, as the amount of
hardware and network resources needed is large and
expensive. In addition, search engines are popular tools, so
they have heavy constraints on query answer time. So, the
efficient use of resources can improve both scalability and
answer time. One tool to achieve these goals is web mining.
Web mining can be categorized into three areas of interest
based on which part of the web to mine (Web mining research
3. Web content mining
3.1 Agent based approaches:
These involve AI systems that can “act autonomously or semi-autonomously
on behalf of a particular user, to discover and
organize web based information”. Agent based approaches
focus on intelligent and autonomous web mining tools based
on agent technology.
i) Some intelligent web agents use a
user profile to search for relevant information, then organize
and interpret the discovered information. Example: Harvest.
ii) Some use various information retrieval techniques and the
characteristics of open hypertext documents to organize and
filter retrieved information. Example: HyPursuit.
iii) Some learn
user preferences and use those preferences to discover
information sources for those particular users. Example: XpertRule
Miner.
3.2 Database approach:
It focuses on “integrating and organizing the heterogeneous
and semi-structured data on the web into more structured and
high level collections of resources”. These organized
resources can then be accessed and analyzed. Metadata or
generalizations are then organized into structured
collections and can be analyzed.
Web content mining is the discovery of useful information from web
contents, data, and documents; that is, the application of data mining
techniques to content published on the Internet. The web
contains many kinds and types of data. Basically, web
content consists of several types of data such as plain text
(unstructured), image, audio, video, and metadata, as well as
HTML (semi-structured) or XML (structured) documents,
dynamic documents, and multimedia documents. Recent research
on mining multiple types of data is termed multimedia data
mining. Thus we could consider multimedia data mining as an
instance of web content mining. The research around
applying data mining techniques to unstructured text is termed
knowledge discovery in texts, text data mining, or text mining.
Hence we could consider text mining as an
instance of web content mining. Research issues addressed in
text mining are: topic discovery, extracting association
patterns, clustering of web documents and classification of
web pages. The issues in web content mining include developing
intelligent tools for information retrieval, finding keywords
and key phrases, discovering grammatical rules collections,
4. Web Structure Mining
Web structure mining operates on the web’s hyperlink structure. The graph
structure can provide information about page ranking or
authoritativeness and enhance search results through filtering;
i.e., it tries to discover the model underlying the link structures
of the web. This model is used to analyze the similarity and
relationship between different web sites. It uses the hyperlink
structure of the web as an additional information source. This
type of mining can be further divided into two kinds based on the
kind of structural data used.
a) Hyperlinks: A hyperlink is a
structural unit that connects a web page to a different location,
either within the same web page (intra-document hyperlink)
or to a different web page (inter-document hyperlink).
b) Document structure: In addition, the content within a web
page can also be organized in a tree-structured format, based
on various HTML and XML tags within the page. Mining
efforts here have focused on automatically extracting
document object model (DOM) structures out of documents.
Web link analysis is used for: ordering documents matching a
user query (ranking), deciding what pages to add to a
collection, page categorization, finding related pages,
finding duplicated web sites, and finding the similarity
between them.
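The distinction between intra-document and inter-document hyperlinks can be illustrated with a minimal Python sketch. It is not taken from the paper: the page URLs and HTML snippets are hypothetical, and counting in-links is used only as a very rough stand-in for the ranking and authoritativeness measures mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(page_url, html):
    """Split a page's hyperlinks into intra-document and inter-document links."""
    parser = LinkExtractor()
    parser.feed(html)
    base, _ = urldefrag(page_url)
    intra, inter = [], []
    for href in parser.hrefs:
        # Resolve relative links, then drop the fragment to compare documents.
        target, _fragment = urldefrag(urljoin(page_url, href))
        if target == base:
            intra.append(href)
        else:
            inter.append(target)
    return intra, inter


def inlink_counts(pages):
    """Count, for each URL, how many distinct pages link to it."""
    counts = {}
    for url, html in pages.items():
        _, inter = classify_links(url, html)
        for target in set(inter):
            counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical three-page site used only for illustration.
pages = {
    "http://example.org/a.html": '<a href="#top">Top</a> <a href="b.html">B</a>',
    "http://example.org/b.html": '<a href="a.html">A</a> <a href="c.html">C</a>',
    "http://example.org/c.html": '<a href="b.html#intro">B</a>',
}
intra, inter = classify_links(
    "http://example.org/a.html", pages["http://example.org/a.html"]
)
counts = inlink_counts(pages)
```

In this toy site, page b.html receives links from two other pages and would therefore be ordered first by an in-link count. Real link analysis methods weight links far more carefully.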
Web Usage Mining: Web usage mining is the application of
data mining techniques to discover interesting usage patterns
from web data, in order to understand and better serve the
needs of web-based applications. It tries to make sense of the
data generated by the web surfer’s sessions/behaviors. While
web content and structure mining utilize the primary data
on the web, web usage mining mines the secondary data
derived from users’ interactions
with the web. The web usage data includes the data from web
Figure 2 : Iterative Query refinement process in content
hypertext classification/categorization, extracting key phrases
from text documents, learning extraction rules, hierarchical
clustering, predicting relationships. The approaches of web
content mining are: agent based and database approaches.
server logs, proxy server logs, browser logs, and user profiles.
The usage data can also be split into three different kinds on the
basis of the source of its collection: the server side (where there is
an aggregate picture of the usage of a service by all users), the
client side (where there is a complete picture of
usage of all services by a particular client), and the proxy side
(which is somewhere in the middle). It also includes
registration data, user sessions, cookies, user queries, mouse
clicks, and any other data resulting from interactions. Web
usage mining analyzes results of user interactions with a web
server, including web logs, click streams, and database
transactions at a web site or a group of related sites. Web
usage mining is also known as web log mining.
The web usage mining process can be regarded as a three-phase
process consisting of:
After discovering patterns from usage data, a further
analysis has to be conducted. The most common
ways of analyzing such patterns are either by using
a query mechanism or by loading the results into a data cube and
then performing OLAP operations.
Visualization techniques are used for interpreting the
results. The discovered rules and patterns can
then be used for improving the system performance or
for making modifications to the web site. The
purpose of web usage mining is to apply statistical
and data mining techniques to the preprocessed web
log data, in order to discover useful patterns. Usage
mining tools discover and predict user behavior in
order to help the designer to improve the web site, to
attract visitors, or to give regular users a personalized
and adaptive service. The applications are:
• Extract statistical information and discover
interesting user patterns.
• Cluster the users into groups according to
their navigational behavior.
• Discover potential correlations between web
pages and user groups
• Identification of potential customers for
• Enhance the quality and delivery of Internet
information services to the end user.
• Improve web server system performance
and site design.
• Facilitate personalization
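As an illustration of grouping users by navigational behavior, the following Python sketch compares the sets of pages that users visited. The Jaccard similarity measure, the greedy "leader" clustering scheme, the 0.5 threshold and the sample data are all assumptions made for this example; the paper does not prescribe a particular clustering method.

```python
def jaccard(a, b):
    """Jaccard similarity of two page sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_users(visits, threshold=0.5):
    """Greedy leader clustering: a user joins the first cluster whose
    leader's page set is similar enough, otherwise starts a new cluster."""
    clusters = []  # list of (leader page set, [user names])
    for user, pages in visits.items():
        for leader_pages, members in clusters:
            if jaccard(pages, leader_pages) >= threshold:
                members.append(user)
                break
        else:
            clusters.append((pages, [user]))
    return [members for _, members in clusters]


# Hypothetical users and the pages they visited.
visits = {
    "u1": {"/index.html", "/products.html", "/cart.html"},
    "u2": {"/index.html", "/products.html"},
    "u3": {"/docs/api.html", "/docs/faq.html"},
    "u4": {"/docs/api.html", "/docs/faq.html", "/index.html"},
}
groups = cluster_users(visits)
```

With this data the shoppers (u1, u2) and the documentation readers (u3, u4) end up in separate groups. The result of leader clustering depends on the order in which users are processed, so it is only a starting point.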
Web usage mining itself can be classified further
depending on the kind of usage data considered.
Web server data: They correspond to the user logs
that are collected at the web server. Some of the typical
data collected at a web server include IP addresses,
page references, and access time of the users.
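A minimal Python sketch of extracting these fields from one server log entry is given below. It assumes the widely used Common Log Format; the paper does not name a log format, and the sample line and the function name parse_log_line are invented for illustration.

```python
import re
from datetime import datetime

# Assumed layout: Common Log Format, e.g.
# host ident authuser [date] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Return IP address, page reference, access time and status, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # line does not follow the assumed format
    return {
        "ip": match.group("ip"),
        "page": match.group("page"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "status": int(match.group("status")),
    }


# Invented sample entry.
sample = '192.0.2.7 - - [10/Jan/2014:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_log_line(sample)
```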
Application server data: Commercial application servers (for example, WebLogic,
BroadVision, etc.) have significant features in
the framework to enable E-Commerce applications to
be built on top of them with little effort. A key feature
is the ability to track various kinds of business events
and log them in application server logs.
Application level data: Finally, new kinds of events
can always be defined in an application, and logging
can be turned on for them, generating histories of
these specially defined events.
Knowledge of user access patterns is useful in
numerous applications:
• Supporting website design decisions such as
content and structure justifications
• Optimizing systems by enhancing caching
schemes and load balancing
• Making websites adaptive
Figure 3 : Data preprocessing steps
• Preprocessing/data preparation – web log data
are preprocessed in order to clean the data
(removing log entries that are not needed for the
mining process), integrate data, and identify users,
sessions, and so on.
• Pattern discovery – statistical methods as well as
data mining methods (path analysis, association
and classification rules) are applied in order to detect
interesting patterns.
• Pattern analysis – discovered patterns are
analyzed here using OLAP tools, knowledge
query management mechanisms and intelligent
agents to filter out the uninteresting rules/patterns.
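The preprocessing phase can be sketched in Python as follows. This is a simplified illustration under stated assumptions: users are approximated by IP address, image and style files are treated as the "not needed" entries, and a session ends after 30 minutes of inactivity. None of these choices comes from the paper, and the log entries are invented.

```python
from datetime import datetime, timedelta

IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed noise
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def clean(entries):
    """Drop failed requests and embedded resources such as images."""
    return [
        e for e in entries
        if e["status"] == 200 and not e["page"].lower().endswith(IGNORED_SUFFIXES)
    ]


def sessionize(entries):
    """Group page views per user (approximated by IP) into sessions."""
    sessions = []
    last_seen = {}  # ip -> (time of last request, index of open session)
    for e in sorted(entries, key=lambda e: e["time"]):
        previous = last_seen.get(e["ip"])
        if previous is None or e["time"] - previous[0] > SESSION_TIMEOUT:
            sessions.append({"user": e["ip"], "pages": []})
            index = len(sessions) - 1
        else:
            index = previous[1]
        sessions[index]["pages"].append(e["page"])
        last_seen[e["ip"]] = (e["time"], index)
    return sessions


# Invented, already parsed log entries.
t0 = datetime(2014, 1, 10, 13, 0)
log = [
    {"ip": "192.0.2.7", "page": "/index.html", "time": t0, "status": 200},
    {"ip": "192.0.2.7", "page": "/logo.gif", "time": t0 + timedelta(seconds=1), "status": 200},
    {"ip": "192.0.2.7", "page": "/products.html", "time": t0 + timedelta(minutes=5), "status": 200},
    {"ip": "192.0.2.9", "page": "/index.html", "time": t0 + timedelta(minutes=6), "status": 200},
    {"ip": "192.0.2.7", "page": "/missing.html", "time": t0 + timedelta(minutes=7), "status": 404},
    {"ip": "192.0.2.7", "page": "/index.html", "time": t0 + timedelta(hours=2), "status": 200},
]
sessions = sessionize(clean(log))
```

The resulting sessions are the input on which the pattern discovery phase operates.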
has been provided. The service pattern is
identified by locating associated services commonly
used by different applications and understanding the
control flow among the set of associated services.
Top-Down: The business processes from different
organizations are reviewed to identify patterns.
Bottom-Up: Execution logs of the applications are
analyzed to mine the business patterns.
The execution logs from multiple applications can
be mined for frequently executed service patterns.
The pattern mining task from execution logs is
broken into three subtasks:
5.1.2. Pre-processing execution logs: A service-oriented
application executes as multiple
instances alongside instances from other applications.
Instances are identified by a unique identifier. Events
occurring in these instances are logged. Three different
types of events can occur in the system: resource
adaptor events, business rule events and service
invocation events. Only service invocation events are
considered in this context. The entry point, exit point
and process are logged in the application logs with the
instance identifier and time stamp. Logs are
processed to filter out
events of other types.
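A minimal Python sketch of this filtering step is shown below. The tuple layout of a log record, the event type labels and the service names are hypothetical, chosen only to mirror the description above.

```python
from collections import defaultdict

# Hypothetical records: (instance id, time stamp, event type, name).
log = [
    ("i1", 1, "service_invocation", "CheckStock"),
    ("i2", 2, "service_invocation", "CheckStock"),
    ("i1", 3, "business_rule", "DiscountRule"),
    ("i1", 4, "service_invocation", "Payment"),
    ("i2", 5, "resource_adaptor", "DbPool"),
    ("i2", 6, "service_invocation", "Payment"),
    ("i1", 7, "service_invocation", "Shipping"),
]


def invocation_traces(events):
    """Keep only service invocation events and group them per instance,
    ordered by time stamp."""
    traces = defaultdict(list)
    for instance, _timestamp, kind, service in sorted(events, key=lambda e: e[1]):
        if kind == "service_invocation":
            traces[instance].append(service)
    return dict(traces)


traces = invocation_traces(log)
```

Each resulting trace is the ordered list of services invoked by one application instance.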
5.1.3. Identifying frequently associated web services:
Services which most frequently occur together
are considered for a service pattern. A predefined
number of services in a service pattern can be
considered. Usually, four services in a pattern give an
optimum result, whereas two services are considered
5.1.4. Recovering the control flow: The control flow
of the services in the service pattern makes it
reusable. An execution instance of a service in the
service pattern is considered and its execution flow is
extracted. Similar execution flows are extracted for all
services in the service pattern.
The common execution flows among these services
are considered as the control flow.
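One possible reading of this step is sketched below in Python. It assumes that an execution flow can be approximated by the "directly follows" relation between the services of a pattern within one trace, and that the control flow is the set of such relations shared by all traces containing the whole pattern. This interpretation, the function names and the sample traces are assumptions, not the authors' algorithm.

```python
def follows_pairs(trace, pattern):
    """Ordered pairs of pattern services that directly follow each other
    once all other services are removed from the trace."""
    projected = [s for s in trace if s in pattern]
    return set(zip(projected, projected[1:]))


def common_control_flow(traces, pattern):
    """Intersect the follows-relations of all traces that contain every
    service of the pattern."""
    flows = [follows_pairs(t, pattern) for t in traces if pattern.issubset(t)]
    if not flows:
        return set()
    return set.intersection(*flows)


# Hypothetical execution traces and service pattern.
traces = [
    ["Login", "CheckStock", "Payment", "Shipping"],
    ["CheckStock", "Audit", "Payment", "Shipping"],
    ["CheckStock", "Payment", "Notify", "Shipping", "Payment"],
    ["CheckStock", "Payment"],  # lacks Shipping, so it is ignored
]
pattern = {"CheckStock", "Payment", "Shipping"}
flow = common_control_flow(traces, pattern)
```

Here the recovered flow is CheckStock followed by Payment, followed by Shipping. The extra Shipping-to-Payment step in the third trace is dropped because the other traces do not share it.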
5.1.5. Web Service Interaction Mining
Business Process Execution Language (BPEL) is a
way to standardize web service composition into
business processes.
BPEL is used not only to define workflows but also
to monitor the execution of workflows. In BPEL,
the owner is able to monitor only those web services
that it owns.
5.1.6. Log Based Web Service Mining: Web
services are becoming more and more complex,
involving numerous interacting business objects
within considerable processes. Web services mining
makes use of concepts from data mining and process
• Supporting business intelligence and
marketing decisions
• Testing user interfaces, monitoring for
security purposes, and more importantly, in
web personalization applications.
A typical Web usage mining system consists of 2
tiers: i. Tracking, in which user interactions are
captured and acquired ii. Analysis, in which user
access patterns are discovered and interpreted by
applying typical data mining techniques to the
acquired data.
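As a sketch of the analysis tier, the Python example below derives simple association rules between pairs of pages from a set of user sessions. Restricting rules to page pairs, the support and confidence thresholds and the sample sessions are assumptions made to keep the example short. Full association rule mining algorithms handle larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return rules (antecedent, consequent, support, confidence) over page pairs."""
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for pages in sessions:
        unique = sorted(set(pages))  # count each page once per session
        page_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of sessions containing both pages
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / page_counts[x]  # estimate of P(y | x)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Invented sessions, one list of visited pages per session.
sessions = [
    ["/index.html", "/products.html", "/cart.html"],
    ["/index.html", "/products.html"],
    ["/index.html", "/about.html"],
    ["/products.html", "/cart.html"],
]
rules = pair_rules(sessions)
```

For this data, every session that contains /cart.html also contains /products.html, so the rule from /cart.html to /products.html has confidence 1.0.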
5. Web Services Mining
Web service mining is a bottom-up search process,
which is targeted at proactively discovering
potentially interesting and useful web services from
existing ones. The web services paradigm promises
to enable flexible, rich and dynamic operation of
heterogeneous and highly distributed network-enabled
services. Similar to the concept of data
mining, web service mining is also evolving to
provide better services as per the business
requirement. Web services are not data but a source of
data; hence there are subtle differences between mining
web services and mining data. Different aspects of
service mining, its challenges and solutions have been researched.
There are various algorithms to extract process trace
data from the process logs to develop a meta model.
These models can be used to improve the business
process and in business process mining. Service
mining in WSDL is used to find the optimal web service
best suited for the action to be performed.
The challenges in web service mining are related to
data collection, data preparation, and process and
data changes. Extraction of service patterns from
individual web service applications deployed in a
cloud environment requires scanning of logs. Cloud
services could provide integrated infrastructure for
data mining where complete systems could be
analyzed. The collected logs need to be mined for the
5.1. Web Service Mining Techniques
5.1.1. Extraction of Composite Patterns from
Execution Logs
The reuse of service composition patterns in
service composition provides an efficient way to
improve the quality of new applications. To provide
documentation of service composition patterns, an
automatic service pattern recognition technique
mining and applies them to web services and service-oriented
architecture. A web service is an application
on the internet, supplied by the supplier and
accessible by customers through standard internet
Web mining consists of three major parts: collecting
the data, preprocessing the data and extracting and
analyzing patterns in the data. This paper focuses
primarily on web usage data mining.
Using web mining when designing and maintaining
websites is extremely useful for making sure that the
website conforms to the actual usage of the site. The
area of web mining was invented with respect to the
needs of web shops, which wanted to be more
adaptive to customers. A set of web mining
techniques has been listed which significantly
speed up the process of mining data on the web.
The different techniques can be compared in order to
determine the technique of choice depending on the
size of the data.