ISSN No: 2309-4893
International Journal of Advanced Engineering and Global Technology
Vol-2, Issue-1, January 2014

Web Service Mining and its Techniques in Web Mining

B. Meena, I.S.L. Sarwani, S.V.S.S. Lakshmi
ANITS, Visakhapatnam, India

Abstract
Web mining is the integration of information gathered by traditional data mining methodologies and techniques with information gathered over the World Wide Web. Mining means extracting something useful or valuable from a baser substance, such as mining gold from the earth. Web mining is used to understand customer behavior, evaluate the effectiveness of a particular Web site, and help quantify the success of a marketing campaign. This paper discusses web service mining and its techniques in order to identify future trends of mining on the Web. Web mining is the use of data mining techniques to automatically discover and extract information from web documents and services (content, structure, and usage).

Keywords: web service, web mining, web usage, web log

1. Introduction to Web Mining
Web mining looks for patterns in data through content mining, structure mining, and usage mining. Content mining examines data collected by search engines and Web spiders. Structure mining examines data related to the structure of a particular Web site, and usage mining examines data related to a particular user's browser as well as data gathered by forms the user may have submitted during Web transactions.

Figure 1: Classification of web mining

Two different approaches were taken in initially defining web mining:
• Process-centric view: web mining as a sequence of tasks.
• Data-centric view: web mining in terms of the web data used in the mining process.

The information gathered through Web mining is evaluated (sometimes with the aid of software graphing applications) using traditional data mining techniques such as clustering and classification, association, and the examination of sequential patterns.
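The association technique named above can be made concrete with a small example. The following Python sketch is an illustration only, not the authors' method: it treats each user session as a set of visited pages (the sessions and page names are hypothetical) and reports rules of the form "page A ⇒ page B" whose support (the fraction of sessions containing both pages) and confidence (the support of the pair divided by the support of A) meet assumed thresholds.

```python
from itertools import combinations


def association_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return single-page rules as (lhs, rhs, support, confidence) tuples.

    A brute-force sketch over page pairs; practical miners such as
    Apriori prune the search space instead of enumerating every pair.
    """
    baskets = [set(s) for s in sessions]
    n = len(baskets)
    pages = sorted(set().union(*baskets))

    def support(items):
        # Fraction of sessions that contain every page in `items`.
        return sum(1 for b in baskets if items <= b) / n

    rules = []
    for a, b in combinations(pages, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        # Check the rule in both directions: a => b and b => a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical sessions: each is the set of pages one visitor requested.
sessions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "about"},
    {"products", "cart"},
]

for lhs, rhs, sup, conf in association_rules(sessions):
    print(f"{lhs} => {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

The thresholds are arbitrary; raising `min_support` or `min_confidence` shrinks the rule set.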
The important data mining techniques applied in the web domain include association rules, sequential pattern discovery, clustering, path analysis, classification, and outlier discovery. Web mining refers to the overall process of discovering potentially useful and previously unknown information or knowledge from web data. It aims at finding and extracting relevant information that is hidden in web-related data, in particular in text documents published on the web. Like data mining, web mining is a multidisciplinary effort that draws techniques from fields such as information retrieval, statistics, machine learning, and natural language processing.

Web mining can be a promising tool for addressing the shortcomings of search engines, which produce incomplete indexing, retrieve irrelevant information, and return information of unverified reliability. It is essential to have a system that helps the user find relevant and reliable information easily and quickly on the web. Web mining not only discovers information from the mounds of data on the WWW, but also monitors and predicts user visit patterns. This gives designers more reliable information for structuring and designing a web site.

2. Web Mining – Classification
The web contains a collection of pages that includes countless hyperlinks and huge volumes of access and usage information. Because of the ever-increasing amount of information in cyberspace, knowledge discovery and web mining are becoming critical for successfully conducting business in the cyber world. Web mining is the discovery and analysis of useful information from the web. Given the rate of growth of the web, scalability of search engines is a key issue, as the hardware and network resources needed are large and expensive. In addition, search engines are popular tools, so they have heavy constraints on query answer time.
So, the efficient use of resources can improve both scalability and answer time. One tool for achieving these goals is web mining. Web mining can be categorized into three areas of interest based on which part of the web is mined: web content mining, web structure mining, and web usage mining.

3. Web Content Mining
Web content mining is the discovery of useful information from web contents, data, and documents, that is, the application of data mining techniques to content published on the Internet. The web contains many kinds of data: plain text (unstructured), images, audio, video, and metadata, as well as HTML (semi-structured) and XML (structured) documents, dynamic documents, and multimedia documents. Recent research on mining multiple types of data is termed multimedia data mining, so multimedia data mining can be considered an instance of web content mining.

3.1 Agent-Based Approach
The agent-based approach involves AI systems that can "act autonomously or semi-autonomously on behalf of a particular user, to discover and organize web-based information". It focuses on intelligent and autonomous web mining tools based on agent technology:
i) Some intelligent web agents use a user profile to search for relevant information, then organize and interpret the discovered information. Example: Harvest.
ii) Some use various information retrieval techniques and the characteristics of open hypertext documents to organize and filter retrieved information. Example: HyPursuit.
iii) Some learn user preferences and use those preferences to discover information sources for that particular user. Example: XpertRule Miner.

3.2 Database Approach
The database approach focuses on "integrating and organizing the heterogeneous and semi-structured data on the web into more structured and high-level collections of resources". These metadata, or generalizations, are organized into structured collections that can then be accessed and analyzed.
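The database approach described above turns semi-structured web content into structured collections that can be queried. As a minimal sketch (an illustration under stated assumptions, not a method taken from this paper), the Python code below uses the standard library's `html.parser.HTMLParser` to convert an HTML table into a list of records. The sample page, its column names, and the helper names `TableExtractor` and `html_table_to_records` are hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table row (<tr>) as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_records(html):
    """Use the first row as the header and map every later row to a dict."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical semi-structured page.
page = """
<html><body><table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>Laptop</td><td>750</td></tr>
  <tr><td>Phone</td><td>300</td></tr>
</table></body></html>
"""

records = html_table_to_records(page)
# Once structured, the data can be queried like a database table.
expensive = [r["product"] for r in records if int(r["price"]) > 500]
print(records)
print(expensive)
```

This sketch assumes rows and cells have explicit closing tags; HTML that omits `</td>` or `</tr>` would need a more tolerant parser.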
The research around applying data mining techniques to unstructured text is termed knowledge discovery in texts, text data mining, or text mining. Hence, text mining can be considered an instance of web content mining. Research issues addressed in text mining include topic discovery, extracting association patterns, clustering of web documents, and classification of web pages. The issues in web content mining include developing intelligent tools for information retrieval, finding keywords and key phrases, discovering grammatical rules and collections, hypertext classification and categorization, extracting key phrases from text documents, learning extraction rules, hierarchical clustering, and predicting relationships. As described in Sections 3.1 and 3.2, the approaches to web content mining are the agent-based and database approaches.

Figure 2: Iterative query refinement process in content mining

4. Web Structure Mining
Web structure mining operates on the web's hyperlink structure, using it as an additional information source. It tries to discover the model underlying the link structures of the web, and this model is used to analyze the similarity and relationships between different web sites. The graph structure can provide information about page ranking or authoritativeness and enhance search results through filtering. Web structure mining can be further divided into two kinds based on the kind of structural data used:
a) Hyperlinks: A hyperlink is a structural unit that connects a location in a web page to a different location, either within the same web page (intra-document hyperlink) or in a different web page (inter-document hyperlink).
b) Document structure: The content within a web page can also be organized in a tree-structured format, based on the various HTML and XML tags within the page. Mining efforts here have focused on automatically extracting document object model (DOM) structures out of documents.

Web link analysis is used for ordering documents matching a user query (ranking), deciding which pages to add to a collection, page categorization, finding related pages, and finding duplicated web sites and the similarity between them.

Web Usage Mining
Web usage mining is the application of data mining techniques to discover interesting usage patterns from web data, in order to understand and better serve the needs of web-based applications. It tries to make sense of the data generated by web surfers' sessions and behaviors. While web content and structure mining use the primary data on the web, web usage mining mines the secondary data derived from users' interactions with the web. The web usage data includes data from web server logs, proxy server logs, browser logs, and user profiles, as well as registration data, user sessions, cookies, user queries, mouse clicks, and any other data resulting from interactions. The usage data can also be split into three kinds on the basis of where it is collected: the server side, which gives an aggregate picture of the usage of a service by all users; the client side, which gives a complete picture of the usage of all services by a particular client; and the proxy side, which lies somewhere in between. Web usage mining analyzes the results of user interactions with a web server, including web logs, click streams, and database transactions at a web site or a group of related sites.
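Server-side usage data is typically stored as web server access logs. As a hedged sketch of how such logs can be turned into per-user sessions before pattern discovery, the Python code below assumes entries in the Common Log Format and treats each IP address as one user (an approximation, since proxies and shared machines break it). It drops unsuccessful requests and embedded resources such as images, and starts a new session after 30 minutes of inactivity. The timeout, the suffix filter `IGNORED_SUFFIXES`, the helpers `parse_line` and `sessionize`, and the sample log lines are all assumptions for illustration.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [timestamp] "METHOD URL PROTOCOL" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

# Assumed list of embedded resources that are not page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def parse_line(line):
    """Return (ip, time, url) for a successful page GET, or None to drop the entry."""
    match = CLF.match(line)
    if not match:
        return None
    ip, stamp, method, url, status = match.groups()
    if method != "GET" or not status.startswith("2") or url.endswith(IGNORED_SUFFIXES):
        return None
    return ip, datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"), url


def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group page views by IP, splitting a visitor's views after `timeout` of inactivity."""
    entries = sorted(e for e in map(parse_line, lines) if e is not None)
    sessions, last_ip, last_time = [], None, None
    for ip, when, url in entries:
        if ip != last_ip or when - last_time > timeout:
            sessions.append([])
        sessions[-1].append(url)
        last_ip, last_time = ip, when
    return sessions


# Hypothetical access-log lines.
log = [
    '10.0.0.1 - - [10/Jan/2014:10:00:00 +0530] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Jan/2014:10:00:02 +0530] "GET /logo.png HTTP/1.1" 200 900',
    '10.0.0.1 - - [10/Jan/2014:10:05:00 +0530] "GET /products.html HTTP/1.1" 200 734',
    '10.0.0.2 - - [10/Jan/2014:10:06:00 +0530] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Jan/2014:10:07:00 +0530] "GET /old.html HTTP/1.1" 404 0',
    '10.0.0.1 - - [10/Jan/2014:11:00:00 +0530] "GET /cart.html HTTP/1.1" 200 300',
]

for session in sessionize(log):
    print(session)
```

Here the first visitor's request for /cart.html arrives 55 minutes after the previous page view, so it opens a second session. The resulting page sequences are the kind of input that association and sequential-pattern techniques operate on.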
Web usage mining is also known as web log mining. The web usage mining process can be regarded as a three-phase process consisting of preprocessing, pattern discovery, and pattern analysis. After patterns have been discovered from usage data, further analysis has to be conducted. The most common ways of analyzing such patterns are either to use a query mechanism or to load the results into a data cube and then perform OLAP operations. Visualization techniques are then used to interpret the results. The discovered rules and patterns can then be used to improve system performance or to modify the web site.

The purpose of web usage mining is to apply statistical and data mining techniques to the preprocessed web log data in order to discover useful patterns. Usage mining tools discover and predict user behavior in order to help the designer improve the web site, attract visitors, or give regular users a personalized and adaptive service. The applications are:
• Extract statistical information and discover interesting user patterns.
• Cluster users into groups according to their navigational behavior.
• Discover potential correlations between web pages and user groups.
• Identify potential customers for e-commerce.
• Enhance the quality and delivery of Internet information services to the end user.
• Improve web server system performance and site design.
• Facilitate personalization.

Web usage mining itself can be classified further depending on the kind of usage data considered:

Web server data: These correspond to the user logs that are collected at the web server. Some of the typical data collected at a web server include IP addresses, page references, and access times of the users.

Application server data: Commercial application servers (for example, WebLogic and BroadVision) have significant features in their frameworks that enable e-commerce applications to be built on top of them with little effort. A key feature is the ability to track various kinds of business events and log them in application server logs.
Application level data: Finally, new kinds of events can always be defined in an application, and logging can be turned on for them, generating histories of these specially defined events.

Knowledge of user access patterns is useful in numerous applications:
• Supporting website design decisions such as content and structure justifications
• Optimizing systems by enhancing caching schemes and load balancing
• Making websites adaptive
• Supporting business intelligence and marketing decisions
• Testing user interfaces, monitoring for security purposes, and, more importantly, web personalization applications

Figure 3: Data preprocessing steps

The three phases of the web usage mining process are:
• Preprocessing (data preparation): web log data are preprocessed in order to clean the data (removing log entries that are not needed for the mining process), integrate data, and identify users, sessions, and so on.
• Pattern discovery: statistical methods as well as data mining methods (path analysis, association rules, sequential patterns, clustering, and classification rules) are applied in order to detect interesting patterns.
• Pattern analysis: the discovered patterns are analyzed using OLAP tools, knowledge query management mechanisms, and intelligent agents to filter out uninteresting rules and patterns.

A typical web usage mining system consists of two tiers:
i) Tracking, in which user interactions are captured and acquired.
ii) Analysis, in which user access patterns are discovered and interpreted by applying typical data mining techniques to the acquired data.

5. Web Services Mining
Web service mining is a bottom-up search process targeted at proactively discovering potentially interesting and useful web services from existing ones. The web services paradigm promises to enable flexible, rich, and dynamic operation of heterogeneous and highly distributed network-enabled services. Similar to data mining, web service mining is evolving to provide better services according to business requirements. Web services are not data but sources of data; hence there are subtle differences between mining web services and mining data. Different aspects of service mining, along with its challenges and solutions, have been researched. There are various algorithms that extract process trace data from process logs to develop a meta-model; these models can be used to improve the business process and in business process mining. Service mining can also be applied to WSDL descriptions to find the web service best suited to the action to be performed.

Challenges
The challenges in web service mining are related to data collection, data preparation, and process and data changes. Extracting service patterns from individual web service applications deployed in a cloud environment requires scanning their logs, and the collected logs need to be mined for associations. Cloud services could provide an integrated infrastructure for data mining in which complete systems could be analyzed.

5.1 Web Service Mining Techniques

5.1.1 Extraction of Composite Patterns from Execution Logs
The reuse of service composition patterns provides an efficient way to improve the quality of new applications. To document service composition patterns, an automatic service pattern recognition technique has been provided. A service pattern is identified by locating associated services commonly used by different applications and by understanding the control flow among the set of associated services. Patterns can be identified in two ways:
Top-down: business processes from different organizations are reviewed to identify patterns.
Bottom-up: execution logs of the applications are analyzed to mine the business for patterns. The execution logs from multiple applications can be mined for frequently executed service patterns.
The task of mining patterns from execution logs has been broken into the following three subtasks.

5.1.2 Pre-processing Execution Logs
A service-oriented application executes as multiple instances, alongside instances from other applications. Each instance is identified by a unique identifier, and the events that occur in an instance are logged. Different types of events can occur in the system, such as resource adaptor events, business rule events, and service invocation events; only service invocation events are considered in this context. The entry point, exit point, and process of each invocation are logged in the application logs with the instance identifier and a time stamp. The logs are processed to filter out events of other types.

5.1.3 Identifying Frequently Associated Web Services
Services that most frequently occur together are considered for a service pattern. A predefined number of services can be considered in a service pattern. Usually, four services in a pattern give the optimum result, whereas two services are considered incomplete.

5.1.4 Recovering the Control Flow
The control flow of the services in a service pattern makes it reusable. The execution instances of each service in a service pattern are considered and their execution flow is extracted. A similar execution flow is extracted for all services in the service pattern, and the common execution flows among these services are taken as the control flow.

5.1.5 Web Service Interaction Mining
Business Process Execution Language (BPEL) is a way to standardize web service composition into business processes. BPEL is used not only to define workflows but also to monitor their execution. However, in BPEL the owner is able to monitor only the web services that it owns.

5.1.6 Log-Based Web Service Mining
Web services are becoming more and more complex, involving numerous interacting business objects within considerable processes. Web services mining makes use of concepts from data mining and process mining and applies them to web services and service-oriented architecture. A web service is an application on the internet that is supplied by a provider and is accessible to customers through standard internet protocols.

6. Conclusion
Web mining consists of three major parts: collecting the data, preprocessing the data, and extracting and analyzing patterns in the data. This paper focuses primarily on web usage data mining. Using web mining when designing and maintaining websites is extremely useful for making sure that a website conforms to its actual usage. The area of web mining was developed with respect to the needs of web shops, which wanted to be more adaptive to customers. A set of web mining techniques has been listed which significantly speeds up the process of mining data on the web. The choice among the different techniques depends on the size of the data.