Survey
Advanced Review

A survey of fuzzy web mining

Chun-Wei Lin1 and Tzung-Pei Hong2,3∗

The Internet has become an unlimited resource of knowledge, and is thus widely used in many applications. Web mining plays an important role in discovering such knowledge. This mining can be roughly divided into three categories: Web usage mining, Web content mining, and Web structure mining. Data and knowledge on the Web may, however, be imprecise, incomplete, and uncertain. Because fuzzy-set theory is often used to handle such data, several fuzzy Web-mining techniques have been proposed to reveal fuzzy and linguistic knowledge. This paper reviews these techniques according to the three Web-mining categories above: fuzzy Web usage mining, fuzzy Web content mining, and fuzzy Web structure mining. Some representative approaches in each category are introduced and compared. © 2013 Wiley Periodicals, Inc.

How to cite this article: WIREs Data Mining Knowl Discov 2013, 3: 190–199. doi: 10.1002/widm.1091

∗ Correspondence to: [email protected]
1 Innovative Information Industry Research Center (IIIRC), School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, People’s Republic of China
2 Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, R.O.C.
3 Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, R.O.C.
The authors have declared no conflicts of interest in relation to this article.

INTRODUCTION

The number and variety of databases have increased with the growth of digital information, so mining meaningful information from large databases has become increasingly important. Many data-mining techniques have been developed to derive useful knowledge or rules for making efficient decisions from large databases. The Internet has also become an essential resource of information, and Web mining plays a key role in discovering relevant knowledge from it. Web mining is the application of data-mining techniques to discover target information and knowledge from Web documents and services.1,2 Generally, Web mining can be divided into three categories, namely Web usage mining, Web content mining, and Web structure mining.3

Web usage mining is aimed at mining usage behavior from Web access logs, user profiles, user queries, and clickstreams. The datasets are generated by the interactions between users and the Web, and can be used for discovering user access patterns on servers. Web content mining is used to mine knowledge from multimedia documents, including text, images, audio, videos, metadata, and hyperlinks, for extracting relations across the Internet. Web content mining can also be considered as information retrieval (IR) from unstructured and semi-structured Web data.4 Web structure mining focuses on interrelations between data, providing a linking graph among Websites. Its two main approaches are the analysis of hyperlink patterns on connected Web pages and the analysis of document structure through HTML or XML tag usage.

Because Web data are usually unstructured, distributed, and heterogeneous, it is necessary to design efficient approaches for extracting, filtering, and evaluating the required information. Some strategies in IR, knowledge discovery (KDD), machine learning, and artificial intelligence are used for handling Web databases to generate human-like decisions.5 Soft computing tools, including fuzzy logic,6 are widely used in Web mining for processing uncertain, incomplete, and imprecise information because of their simplicity and ability to model human reasoning.7 Fuzzy set theory (FST) was first proposed by Zadeh in 1965.6 Fuzzy sets (also called fuzzy clumps8) can be thought of as an extension of set theory.
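To make the idea of a fuzzy set as an extension of a crisp set concrete, the following minimal sketch assigns each value a membership degree in [0, 1] instead of a yes/no answer. The triangular shape, the linguistic terms "short", "medium", and "long", and their breakpoints (in seconds of page-visit duration) are illustrative assumptions, not values taken from the surveyed papers.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set (feet a and c, peak b).

    Setting a == b or b == c turns the set into a left or right shoulder.
    """
    if x <= a:
        return 1.0 if a == b else 0.0
    if x >= c:
        return 1.0 if b == c else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for a page-visit duration in seconds.
TERMS = {
    "short": (20, 20, 60),     # left shoulder: fully "short" up to 20 s
    "medium": (20, 60, 100),
    "long": (60, 100, 100),    # right shoulder: fully "long" from 100 s
}


def fuzzify(duration):
    """Map a crisp duration to its membership degree in every linguistic term."""
    return {term: triangular(duration, *abc) for term, abc in TERMS.items()}


# Standard (Zadeh) operators: intersection = min, union = max, complement = 1 - mu.
def fuzzy_and(mu1, mu2):
    return min(mu1, mu2)


def fuzzy_or(mu1, mu2):
    return max(mu1, mu2)


def fuzzy_not(mu):
    return 1.0 - mu


degrees = fuzzify(40)  # {'short': 0.5, 'medium': 0.5, 'long': 0.0}
```

A crisp set is the special case in which every degree is exactly 0 or 1; a 40-second visit, by contrast, belongs partly to "short" and partly to "medium".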
FST is primarily concerned with quantifying and reasoning using natural language, in which words can have ambiguous meanings,7,9 thus providing useful tools for decision making.

In this paper, the application of FST to various aspects of Web mining is surveyed. Web usage mining, Web content mining, and Web structure mining conducted using fuzzy sets are reviewed in the sections Fuzzy Web Usage Mining, Fuzzy Web Content Mining, and Fuzzy Web Structure Mining, respectively. The conclusions are given in the last section.

FUZZY WEB USAGE MINING

WebMiner was the first system developed for Web usage mining.10 Joshi and Krishnapuram11 found that the information extracted from Web data for association rules, clustering, and sequential patterns does not have crisp boundaries, which makes Web mining a nontrivial task compared with traditional data mining. FST was adopted for handling uncertain, vague, incomplete, and noisy datasets. Web usage mining was then used to derive usage patterns from Web logs.12 General cases of Web usage mining conducted using fuzzy concepts are described below.

Rule Extraction

Because of its simplicity and similarity to human reasoning, FST has been applied to mine rules. Hong et al.13,14 used FST to efficiently mine the relationships among the items of Web databases. Web logs provide useful information for discovering user access records on a Website. The records can be used to derive useful patterns for constructing more personalized Websites. Wong et al.15 mined fuzzy association rules from Web logs and user profiles by integrating a case-based reasoning approach for Web access prediction and recommendation.
Krishnapuram et al.16 developed the fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd) algorithms for clustering relational data, using fuzzy dissimilarity for Web documents, snippets, and user sessions. Zhou et al.17 proposed an approach to discover and visualize the association behavior patterns of individual users. Wu18 proposed a generalized method for fuzzy association rule mining from Web logs, in which Web page visits and the duration of visits were used to reflect user interests and preferences. Hong et al.19 proposed a fuzzy Web-mining algorithm for discovering useful user browsing behaviors based on the durations of Web page visits acquired from Web logs. The importance of Web pages was evaluated using linguistic terms, which were then transformed and averaged as fuzzy sets of weights. Each linguistic term was weighted according to its importance for its page. In this approach, only the linguistic term with the maximum cardinality for a page was used in the subsequent mining process, thus reducing the time complexity. Hong et al.20 also developed a fuzzy object-oriented Web-mining algorithm for discovering fuzzy knowledge from object data logs on a Web server. Each Web page was treated as a class, and each Web page browsed by a client was treated as an instance. With this framework, both intrapage linguistic association rules and interpage linguistic browsing patterns can be derived at the same time.

Personalization

Web personalization refers to customizing a Website to the needs or interests of users, which can be achieved by collecting user navigational behaviors and browsing (access) logs from the Web server.21 The personalization of Web services is an important step toward building friendly and individual interfaces, thus enhancing the long-term engagement and loyalty of users.22 Nasraoui et al.23 defined a ‘user session’ as a temporally compact sequence of Web accesses by a user.
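As an illustration of what a "temporally compact sequence of Web accesses" might look like in practice, the sketch below groups one user's log entries into sessions by starting a new session whenever the gap between consecutive accesses exceeds a timeout. The 30-minute timeout and the `(timestamp, url)` record layout are assumptions made for this example; they are a common heuristic rather than the definition used by Nasraoui et al.

```python
def split_sessions(accesses, timeout=1800):
    """Group one user's accesses into sessions.

    accesses: iterable of (timestamp_in_seconds, url) pairs, in any order.
    timeout:  maximum gap in seconds between consecutive accesses of one
              session (hypothetical default of 30 minutes).
    """
    sessions = []
    for ts, url in sorted(accesses):
        last_ts = sessions[-1][-1][0] if sessions else None
        if last_ts is not None and ts - last_ts <= timeout:
            sessions[-1].append((ts, url))   # continue the current session
        else:
            sessions.append([(ts, url)])     # gap too large: start a new one
    return sessions


# Hypothetical log fragment for a single user.
log = [(0, "/index"), (100, "/products"), (5000, "/index"), (5200, "/contact")]
sessions = split_sessions(log)
urls_per_session = [[url for _, url in s] for s in sessions]
```

Each resulting session can then be compared with others using a session distance measure, or fuzzified before clustering.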
A distance measure between two Web sessions was also defined to capture the organization of a Website. The proposed algorithm automatically clustered data into the optimal number of components to analyze server access logs and obtain typical session profiles of users. Bae et al.24 developed a system for mining the Web log files of customers to recommend suitable ads to users. The system first clustered the customers using a self-organizing map (SOM) to divide them into segments based on similar preferences. Expert advice was used to help determine suitable ads according to the mined patterns. The patterns and ads were then used to generate fuzzy rules, which were applied through fuzzy inference for recommendation. Zhou et al.25 proposed a periodic personalization system that analyzes periodic access patterns to recommend the most relevant information to users. The system first constructed a personal Web usage lattice using a fuzzy formal concept analysis (FCA) technique to efficiently determine the resources of interest to the user during a specific period. Kim and Cho26 asserted that a personalized search engine is an important tool for finding Web documents. They proposed a system that yielded more personalized results based on link information.26 Relevance was determined by fuzzy logic using a Web concept network constructed from a user profile. Joshi and Krishnapuram27 asserted that the interactions between a Website and its users should be analyzed to design more personalized Websites. They proposed a framework for automatically discovering user session profiles in Web logs. Grouping similar sessions together yielded better session profiles than those obtained using traditional association rules. Santhisree and Damodaram28 proposed the CLIQUE (CLUstering in QUEst) algorithm for clustering Web sessions for Web personalization.
Various fuzzy similarity measures were used to measure the similarity of Web sessions using sequence alignment to determine learning behaviors.

Recommendation Systems

A recommendation system uses users’ specific interests to automatically recommend the desired information based on Web usage mining. Nasraoui and Petenes stated that approximate reasoning29 can offer a general framework for the recommendation process.30 They developed a fast and intuitive Web recommendation approach that used a fuzzy inference engine to automatically derive rules from the discovered user profiles. Their framework reduced the memory requirements of fuzzy recommendation systems and lowered the cost of collaborative filtering. Fong et al.31 found that customer emotions affect purchase activities. Thus, a semantic mining approach for periodic Web access patterns was designed through self-reporting and behavior tracking. A personal Web usage ontology was generated for personal Website recommendation according to emotion. Porcel et al.32 proposed a hybrid fuzzy linguistic recommender system to aid the staff of a Technology Transfer Office in disseminating research resources of interest to users. The proposed system automatically derives appropriate recommendations about both specialized and complementary research resources and delivers them to users. It also helps discover potential collaboration possibilities to form multidisciplinary working groups.

Other Applications

KDD from Web usage patterns can be directly applied to many applications, such as e-business, e-services, and e-learning.33,34 Abraham35 proposed an intelligent miner (i-Miner) that optimized Web data clusters using the Takagi-Sugeno fuzzy inference system. i-Miner analyzes the trends of Website visitors to optimally segregate similar user interests. On the basis of the proposed framework, visitor behavior and profiles were discovered to enhance the business model of e-commerce Websites.
Wang et al.36 proposed a concurrent neuro-fuzzy model for deriving useful knowledge from Web logs. A fuzzy inference system and a self-organizing map (SOM) were used to generate cluster information for both short-term and long-term Web traffic trend predictions.36 A summary of fuzzy Web usage mining methods is given in Table 1.

FUZZY WEB CONTENT MINING

Web content mining focuses on deriving useful information or knowledge from Web page content. It can be divided into two parts, namely the direct mining of Web content (documents or pages) and the improvement of content search, as performed by search engines.37 Data-mining techniques38–46 such as association rule mining, clustering, and sequential pattern mining can be applied to mine Web content. FST was used to create a fuzzy IR model for Web search.47 The search engine included indexing mechanisms and query languages, fuzzy document clustering, fuzzy data mining, fuzzy approaches for distributed IR, and fuzzy recommender systems. Several Web content mining approaches are reviewed below.

Rule Extraction

Association rule mining is used for discovering associations within datasets.38,44 Martín-Bautista et al.48 proposed a framework for query refinement based on retrieved association rules. The system first retrieved Web documents to construct text transactions and derive association rules. Fuzzy-set theory was then applied to the text transactions and association rules to determine the presence of items in the transactions, which provided additional query terms for guiding the search and improving retrieval. Questionnaire mining is a Web content mining approach for analyzing open questionnaire data. Chen and Weng49 created seven sets of questionnaire data and defined the patterns to be extracted from them. Fuzzy association rules were then discovered from the questionnaire data to evaluate the performance of the proposed approach.
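The fuzzy association rules mentioned in this section are typically scored with fuzzy counterparts of support and confidence. The sketch below uses one common formulation, in which the degree to which a transaction contains an itemset is the minimum of its item memberships; this choice of t-norm, the function names, and the questionnaire-style data are assumptions for illustration and are not taken from any of the surveyed papers.

```python
def fuzzy_support(transactions, itemset):
    """Average degree to which the transactions contain every item in itemset.

    transactions: list of dicts mapping a fuzzy item (e.g. "q1.high") to a
                  membership degree in [0, 1]; missing items count as 0.
    """
    if not transactions:
        return 0.0
    total = sum(min(t.get(item, 0.0) for item in itemset) for t in transactions)
    return total / len(transactions)


def fuzzy_confidence(transactions, antecedent, consequent):
    """Fuzzy confidence of the rule antecedent -> consequent."""
    sup_a = fuzzy_support(transactions, antecedent)
    if sup_a == 0.0:
        return 0.0
    return fuzzy_support(transactions, set(antecedent) | set(consequent)) / sup_a


# Hypothetical fuzzified answers of three respondents to two questions.
answers = [
    {"q1.high": 0.8, "q2.low": 0.6},
    {"q1.high": 0.4, "q2.low": 0.9},
    {"q1.high": 0.0, "q2.low": 0.5},
]

support = fuzzy_support(answers, {"q1.high", "q2.low"})          # about 0.333
confidence = fuzzy_confidence(answers, {"q1.high"}, {"q2.low"})  # about 0.833
```

A rule such as "q1 is high implies q2 is low" would be kept only if both values exceed user-chosen minimum thresholds, exactly as in crisp association rule mining.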
Fard et al.50 proposed a text and image retrieval architecture for processing dynamic Web content taxonomy using a fuzzy adaptive resonance theory neural network. This architecture handled the dynamic clustering of incremental information. Their approach is helpful for mining multimedia content without metadata. Schockaert et al.51 designed heuristic techniques to extract temporal information from Web documents. These techniques help improve the reliability of the extracted information and deal with conflicts that arise because of the vagueness of events. The obtained fuzzy temporal relations can thus be used to target temporally constrained retrieval tasks effectively. The growth of Web 2.0 has provided Web reviews and comments for Web content mining.52,53 Nadali et al.54 proposed a fuzzy logic model for semantically classifying customer reviews into five linguistic terms, resulting in more human-oriented querying. Si and Wang55 presented an approach for extracting Web forum content based on templates. Web pages were translated into a DOM tree to determine whether they match the templates.

TABLE 1 | Summary of Fuzzy Web Usage Mining Methods

Authors | Year | Content | Category
Hong et al.13,14 | 1996, 2002 | Fuzzy association rules and fuzzy sequential patterns | Rule extraction
Wong et al.15 | 2001 | Fuzzy association rules | Rule extraction
Krishnapuram et al.16 | 2001 | Two clustering approaches (FCMdd and RFCMdd) | Rule extraction
Zhou et al.17 | 2005 | Association behaviors in visualization | Rule extraction
Hong et al.19,20 | 2008 | Fuzzy object-oriented Web mining | Rule extraction
Wu18 | 2010 | Generalized association rules | Rule extraction
Nasraoui et al.23 | 1999 | Clustering for analyzing user sessions | Personalization
Eirinaki and Vazirgiannis21 | 2003 | Analyzing navigational behaviors and browsing logs | Personalization
Pierrakos et al.22 | 2003 | A tool for enhancing customer loyalty | Personalization
Bae et al.24 | 2003 | An ad selector system for clustering customers by SOM | Personalization
Zhou et al.25 | 2006 | Periodic personalization system | Personalization
Kim and Cho26 | 2007 | Based on link information for personalization | Personalization
Joshi and Krishnapuram27 | 2008 | Analysis of interactions for personalization | Personalization
Santhisree and Damodaram28 | 2011 | CLIQUE algorithm for clustering Web sessions | Personalization
Nasraoui and Petenes30 | 2003 | Fuzzy approximate reasoning for recommendation | Recommendation
Fong et al.31 | 2011 | Generating personal Web usage ontology for recommendation | Recommendation
Porcel et al.32 | 2012 | A hybrid fuzzy linguistic recommender system | Recommendation
Abraham35 | 2003 | i-Miner for enhancing e-commerce | Application
Wang et al.36 | 2005 | Clustering Web traffic for prediction | Application

Semantic Web and Ontology

Subasic and Huettner56 proposed a system that combines natural language processing (NLP) and fuzzy logic to handle Web content with unstructured data. The proposed system can analyze and visualize Web content, thus helping managers make decisions. Ontology is an efficient conceptual structure used in the semantic Web. A fuzzy ontology generation framework (FOGA) was proposed for automatically generating a fuzzy ontology based on uncertain information.57 An approximate reasoning approach was also designed to allow the generated fuzzy ontology to evolve incrementally with new instances. Trappey et al.58 presented a hierarchical clustering approach for the self-organization of knowledge documents, which was especially useful for patent analysis. The proposed method automatically interpreted and clustered documents into an ontology schema. Fuzzy logic was used to find the appropriate document clusters for specific patents based on their derived ontological semantic Webs.

Other Applications

Web service discovery plays an important role in distributed computing environments.
Gholamzadeh and Taghiyareh59 proposed a fuzzy semantic clustering algorithm for efficiently discovering Web services. It automatically found the semantic similarity among Web services through an individual query for semantic clustering. The proposed algorithm could run in a reasonable time by adopting a mechanism for reducing the search space. Kim and Cho60 designed an ensemble structure-adaptive SOM (SASOM) that integrated a fuzzy interval approach to classify Web documents based on user preference. The proposed SASOM can efficiently classify documents for pattern recognition and visualization and efficiently predict users’ preferences. In IR systems, precision and recall are two commonly used criteria for evaluating performance. Zadrożny et al.61 designed a bipolar information model and used database queries to collect related textual documents in IR. The proposed bipolar queries combine fuzzy logic with a sophisticated representation of user preferences and intentions to make the search of vast resources of textual documents intelligent and flexible. In fuzzy ordinal linguistic IR systems (FOLIRSs), users’ queries have traditionally been defined manually as extended Boolean queries. López-Herrera et al.62 then presented an analysis of two well-known general-purpose multiobjective evolutionary algorithms for automatically learning extended Boolean queries in FOLIRSs. A summary of fuzzy Web content mining methods is given in Table 2.

TABLE 2 | Summary of Fuzzy Web Content and Web Structure Mining Methods

Authors | Year | Content | Category
Web Content Mining
Martín-Bautista et al.48 | 2004 | Association rules for query refinement | Rule extraction
Fard et al.50 | 2006 | Dynamic clustering for incremental information | Rule extraction
Chen and Weng49 | 2009 | Questionnaire data mining for evaluating system performance | Rule extraction
Schockaert et al.51 | 2010 | Extracting temporal information from Web documents | Rule extraction
Nadali et al.54 | 2010 | Semantic classification for customized query | Rule extraction
Si et al.55 | 2010 | Extracting Web forum information for DOM tree | Rule extraction
Subasic and Huettner56 | 2001 | Analyzing Web content for visualization by NLP | Semantic web
Tho et al.57 | 2006 | Automatic fuzzy ontology generation (FOGA) | Ontology
Trappey et al.58 | 2009 | Hierarchical clustering for SOM | Ontology
Kim et al.60 | 2004 | Classify documents by SASOM for visualization | Application
Gholamzadeh and Taghiyareh59 | 2010 | Semantic clustering for Web services | Application
Zadrożny et al.61 | 2012 | Bipolar information modeling and database queries to collect the textual documents | Information retrieval
López-Herrera et al.62 | 2009 | Multiobjective evolutionary algorithms to automatically learn extended Boolean queries | Information retrieval
Web Structure Mining
Saremi et al.63 | 2006 | Modeling Web pages and content in linguistic terms | XML/HTML document
Furnadzhiev64 | 2004 | Classifying Websites by their external features | XML/HTML document
Herrera-Viedma et al.65 | 2007 | Measuring the quality of XML documents | XML/HTML document
Leitao et al.66 | 2007 | Bayesian theory and probabilities for detecting duplication | XML/HTML document
Herrera-Viedma and Peis67 | 2003 | Fuzzy evaluation method of SGML documents | XML/HTML document
Herrera-Viedma et al.68 | 2006 | Analyzing the information quality of Websites to generate the linguistic recommendations | XML/HTML document
Zhang et al.69 | 2009 | Constructing fuzzy ontologies from fuzzy UML models | Ontology
Maio et al.70 | 2012 | Ontology-based retrieval approach for data organization and visualization by FCA | Ontology

FUZZY WEB STRUCTURE MINING

Websites play a major role in e-business success.71 A better hyperlink structure makes it easier for users to find information, thus enhancing Website navigation. It is thus important to design a Website systematically, including its architecture, route paths, and page content. Web structure mining provides the hyperlink structure of Web pages as an additional information resource for analysis.72 It usually identifies the relationships between linked Web pages or their connections. The information retrieved from Web structure mining is used to improve hyperlinks to internal or external Web pages. Thus, Web pages are clustered to enhance site navigation. Two algorithms for Web structure mining are PageRank73 and Hypertext Induced Topic Selection (HITS).74 PageRank uses hyperlink weight normalization and the equilibrium distribution of random surfers as the citation score. HITS distinguishes between hubs (sites that link to informative sites) and authorities (informative sites that are linked to by hubs) and calculates their scores in a mutually reinforcing way. Moreno et al.75 proposed a qualitative and user-oriented methodology for assessing the quality of health-related Websites based on a 2-tuple fuzzy linguistic approach. In their approach, the 2-tuple linguistic weighted average operator is applied without a loss of information.
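The PageRank idea summarized in this section, in which a page's score is the equilibrium probability of a random surfer being on that page, can be sketched with a simple power iteration. This is a minimal illustration and not the implementation from the cited work: the damping factor of 0.85, the fixed number of iterations, the uniform treatment of pages without outgoing links, and the three-page example graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Each page splits its score equally among its outgoing links
    (hyperlink weight normalization); with probability 1 - damping the
    random surfer jumps to a page chosen uniformly at random.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Page without outgoing links: spread its score over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical three-page site: A links to B and C, B links to C, C links to A.
site = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(site)
```

In this small graph, page C receives links from both A and B and therefore ends up with the highest score, followed by A and then B.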
XML/HTML Documents

In Web structure mining, Web pages and Web content are considered significant factors for Website navigation. Changes to the link structure can influence Website navigation through these two factors, thus optimizing the Website architecture. FST has been used to model the relationship between these two factors using the linguistic terms ‘weak’, ‘medium’, or ‘strong’.63 Furnadzhiev64 proposed an FST approach for classifying Websites into five categories according to their external features. The method determined the relevant text and structural features of Websites at an early stage, without using any preliminary knowledge in the development process. Herrera-Viedma et al.65 presented a fuzzy evaluation model for measuring the quality of XML documents on Websites. It was a user-centered model that evaluated information quality based on user preference. An evaluation scheme and a computing method for quality rating were designed in the proposed model. The evaluation scheme relied on the characteristics of a Website and the content of its XML documents. The quality rating was used to measure the ability of Websites. The proposed model can use XML schema language to improve the representation of Website documents. Website quality rating helped users find the highest-quality XML resources they required. Leitao et al.66 designed an algorithm for detecting duplicates among hierarchical, semi-structured XML documents. The proposed approach considered the duplicate status of children and the probability of descendants being duplicated. A Bayesian network was used to calculate the probabilities for descendants and ascendants for detecting duplications. Herrera-Viedma and Peis67 proposed a fuzzy evaluation method for SGML documents based on the concept of computing with words.
The proposed method can easily be extended to evaluate both HyperText Markup Language (HTML) and eXtensible Markup Language (XML) documents. They then proposed a method to generate linguistic recommendations from the information quality of content-based Websites based on users’ perceptions.68 Two main components and an evaluation scheme were proposed to analyze the information quality of Websites. A measurement method and two new linguistic aggregation operators, called Majority guided Linguistic Induced Ordered Weighted Averaging (MLIOWA) and weighted MLIOWA, were designed to generate linguistic recommendations according to the majority of the evaluation judgments provided by different visitors.

Ontology

Zhang et al.69 stated that imprecise and uncertain information cannot be represented using traditional Web ontology methodology. Thus, fuzzy ontologies were proposed, with fuzzy ontology structures and instances developed from fuzzy unified modeling language (UML) models. Three steps were described: investigating fuzzy UML models, proposing formal definitions of fuzzy UML models, and introducing the fuzzy ontology. The fuzzy UML model and its instantiations were translated into the fuzzy ontology structure and fuzzy ontology instances, respectively. Thus, the proposed approach acted as a bridge between fuzzy ontologies and existing applications of fuzzy UML models. Maio et al.70 implemented an ontology-based approach for retrieving and visualizing information, providing a better navigation interface through a multifacet view of the built ontology. It used FCA theory to obtain the conceptualizations from datasets and to generate the hierarchical information. A summary of fuzzy Web structure mining methods is given in Table 2.

DISCUSSION AND FUTURE WORKS

As a result of the explosive growth of information resources, social networking systems such as blogs, wikis, Facebook, Twitter, and the like have rapidly emerged on the Internet.
These systems provide two-way communication, unstructured and dynamic content, active collaboration, and a crowdsourcing architecture, and are collectively called Web 2.0. O’Reilly formulated and pointed out the differences between Web 1.0 and Web 2.0 in many aspects.76 The information requested by users on Web 2.0 has become complicated, massive, and heterogeneous. Thus, Deep Web77,78 and Big Data79 mining will emerge as the trend of the next technological revolution in Web mining. How to efficiently derive and discover useful knowledge, and track and analyze usage patterns, from the enormous Web resources will be a critical research topic in the future.

Information technology has now become part of our daily life. Many devices and objects may themselves contain tiny computers, embedded systems, or a series of intelligent functions that allow them to be fully integrated into the Web. Examples include RFID, smartphones, GPS, and smart TVs, among others. Devices and objects can communicate with one another and with human beings via wireless networks, an arrangement referred to as the Internet of Things (IoT).80–82 Haller et al.83 defined the IoT as a world where physical objects are seamlessly integrated into the information network, and where the physical objects can become active participants in business processes. They also stated that these smart objects can provide interactive services and information over the Internet, and that security and privacy issues must be taken into account. In addition, advanced research on information collection, information transfer, and intelligent processing through cloud architecture and soft computing technology is worth studying.
Because the information can be collected and integrated from a variety of resources, the privacy-preserving issue becomes a critical issue in this research field.84 More algorithms and techniques may then be designed and proposed to secure or sanitize the protected information. CONCLUSIONS The Internet has become an unlimited resource for discovering useful information. Users may retrieve information they need from different Web resources. Web datasets, however, consist of imprecise, incomplete, and uncertain data. Thus, efficient approaches are needed to identify useful, meaningful, and interesting patterns to users. Fuzzy-set theory has been applied to handle these datasets and provide better solutions than traditional approaches. It is commonly and easily used to present users’ information requests in an interpretable way. This survey reviews studies on fuzzy Web usage mining, fuzzy Web content mining, and fuzzy Web structure mining. It is obvious to see that the research papers in fuzzy Web content mining and fuzzy Web structure mining are not as many as those in fuzzy usage Web mining. We look forward to seeing more research papers in these two directions in the future. Furthermore, efficiently and effectively handling big data and heterogeneous data has been an inevitable trend for web mining. Incorporating cloud computing, Computational Intelligence, and IoT in extracting and integrating useful knowledge from big data for helping the convenience of decision makers and general users will be worthy of future development. REFERENCES 1. Etzioni O. The world-wide web: Quagmire or gold mine? Commun ACM 1996, 39:65–68. 2. Cooley R, Mobasher B, Srivastava J. Web mining: information and pattern discovery on the world wide web. IEEE International Conference on Tools with Artificial Intelligence. Newport Beach, California, 3–8 November 1997. 3. Cooley R, Mobasher B, Srivastava J. 
Grouping web page references into transactions for mining world wide web browsing patterns. IEEE Knowledge and Data Engineering Exchange Workshop. Newport Beach, California, 4 November 1997, 2–9. 4. Kosala R, Blockeel H. Web mining research: a survey. SIGKDD Explor 2000, 2:1–15. 5. Pal SK, Talwar V, Mitra P. Web mining in soft computing framework: relevance, state of the art and future directions. IEEE Trans Neural Netw 2002, 13:1163– 1177. 6. Zadeh LA. Fuzzy sets. Inf Control 1965, 8:338–353. 7. Kandel A. Fuzzy Expert Systems. Clermont, FL: CRC Press Inc 1991. 8. Lin TY. Granular computing: fuzzy logic and rough sets. Comput Words Inf/Intell Syst 1999, 1:183–200. 9. Famili A, Shen WM, Weber R, Simoudis E. Data preprocessing and intelligent data analysis. Intell Data Anal 1997, 1:3–23. 10. Mobasher B, Jain N, Han EH, Srivastava J. Web mining: pattern discovery from world wide web transactions. Technical Report TR96–050; 1996, 558–567. Available at: http://citeseerx.ist.psu.edu/viewdoc/ download?doi=10.1.1.57.4087&rep=rep1&type= pdf. (Accessed April 5, 2013). 196 C 11. Joshi A, Krishnapuram R. Robust fuzzy clustering methods to support web mining. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. Seattle, Washington, 2–4 June 1998, 1–8. 12. Robert JS, Cooley R, Deshp M, Tan PN. Web usage mining: discovery and applications of usage patterns from web data. SIGKDD Explor 2000, 1:11–23. 13. Hong TP, Lee CY. Induction of fuzzy rules and membership functions from training examples. Fuzzy Sets Syst 1996, 84:33–47. 14. Hong TP, Chiang MJ, Wang SL. Mining weighted browsing patterns with linguistic minimum supports. IEEE International Conference on Systems, Man and Cybernetics. Yasmine Hammamet, Tunisia, 6–9 October 2002, 635–639. 15. Wong C, Shiu S, Pal S. Mining fuzzy association rules for web access case adaptation. The International Conference on Case Based Reasoning, Workshop. Vancouver, BC, Canada, 30 July–2 August 2001. 16. 
Krishnapuram R, Joshi A, Nasraoui O, Yi L. Low-complexity fuzzy relational clustering algorithms for web mining. IEEE Trans Fuzzy Syst 2001, 9:595–607.
17. Zhou B, Hui SC, Fong ACM. Discovering and visualizing temporal-based web access behavior. The International Conference on Web Intelligence. Compiègne University of Technology, France, 19–22 September 2005.
18. Wu R. Mining generalized fuzzy association rules from web logs. The International Conference on Fuzzy Systems and Knowledge Discovery. Yantai, Shandong, China, 10–12 August 2010.
19. Hong TP, Chiang MJ, Wang SL. Mining fuzzy weighted browsing patterns from time duration and with linguistic thresholds. Am J Appl Sci 2008, 5:1611–1621.
20. Hong TP, Huang CM, Horng SJ. Linguistic object-oriented web-usage mining. Int J Approx Reason 2008, 48:47–61.
21. Eirinaki M, Vazirgiannis M. Web mining for web personalization. ACM Trans Internet Technol 2003, 3:1–27.
22. Pierrakos D, Paliouras G, Papatheodorou C, Spyropoulos CD. Web usage mining as a tool for personalization: a survey. User Model User Adapt Interact 2003, 13:311–372.
23. Nasraoui Olfa FH, Joshi A, Krishnapuram R. Mining web access logs using relational competitive fuzzy clustering. The International Fuzzy Systems Association World Congress. Taipei, Taiwan, 17–20 August 1999, 195–204.
24. Bae SM, Ha SH, Park SC. Fuzzy web ad selector based on web usage mining. IEEE Intell Syst 2003, 18:62–69.
25. Zhou B, Hui SC, Fong ACM. An effective approach for periodic web personalization. The International Conference on Web Intelligence. Hong Kong, China, 18–22 December 2006.
26. Kim KJ, Cho SB. Personalized mining of web documents using link structures and fuzzy concept networks. Appl Soft Comput 2007, 7:398–410.
27. Joshi A, Krishnapuram R. On mining web access logs.
ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. Vancouver, Canada, 9–12 June 2008, 63–69.
28. Santhisree K, Damodaram A. Clique: clustering based on density on web usage data: experiments and test results. The International Conference on Electronics Computer Technology. Kanyakumari, India, 8–10 April 2011.
29. Klir GJ, Yuan B. Fuzzy Sets and Fuzzy Logic: Theory and Applications. Upper Saddle River, NJ: Prentice Hall Inc; 1995.
30. Nasraoui O, Petenes C. Combining web usage mining and fuzzy inference for website personalization. WebKDD. Washington DC, 27 August 2003.
31. Fong ACM, Zhou B, Hui SC, Tang J, Hong GY. Generation of personalized ontology based on consumer emotion and behavior analysis. IEEE Trans Affec Comput 2012, 3:152–164.
32. Porcel C, Tejeda-Lorente A, Martínez MA, Herrera-Viedma E. A hybrid recommender system for the selective dissemination of research resources in a technology transfer office. Inf Sci 2012, 184:1–19.
33. Chen PM, Kuo FC. An information retrieval system based on a user profile. J Syst Softw 2000, 54:3–8.
34. Cheung DW, Kao B, Lee J. Discovering user access patterns on the world wide web. Knowledge Based Syst 1998, 10:463–470.
35. Abraham A. Business intelligence from web usage mining. J Inf Knowledge Manage 2003, 2:375–390.
36. Wang X, Abraham A, Smith KA. Intelligent web traffic mining and analysis. J Netw Comput Appl 2005, 28:147–165.
37. Pol K, Patil N, Shreya P, Chhaya D. A survey on web content mining and extraction of structured and semistructured data. The International Conference on Emerging Trends in Engineering and Technology. Nagpur, Maharashtra, 16–18 July 2008, 543–546.
38. Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases. The International Conference on Very Large Data Bases. Santiago de Chile, 12–15 September 1994, 487–499.
39. Hong TP, Lee YC. An overview of mining fuzzy association rules.
Fuzzy Sets Their Extens Represent Aggreg Models 2008, 220:397–410.
40. Lent B, Swami A, Widom J. Clustering association rules. The International Conference on Data Engineering. Birmingham, 7–11 April 1997.
41. Liu F, Lu Z, Lu S. Mining association rules using clustering. Intell Data Anal 2001, 5:309–326.
42. Agrawal R, Srikant R. Mining sequential patterns. The International Conference on Data Engineering. Taipei, Taiwan, 6–10 March 1995.
43. Pei J, Han J, Behzad MA, Wang J, Helen P, Chen Q, Umeshwar D, Hsu MC. Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans Knowledge Data Eng 2004, 16:1424–1440.
44. Hong TP, Lin CW, Wu YL. Incrementally fast updated frequent pattern trees. Expert Syst Appl 2008, 34:2424–2435.
45. Hong TP, Wu CH. An improved weighted clustering algorithm for determination of application nodes in heterogeneous sensor networks. J Inf Hiding Multimedia Signal Process 2011, 2:173–184.
46. Lin CW, Hong TP, Lu WH. The pre-fufp algorithm for incremental mining. Expert Syst Appl 2009, 36:9498–9505.
47. Herrera-Viedma E, Pasi G. Fuzzy approaches to access information on the web: recent developments and research trends. The Conference of the European Society for Fuzzy Logic and Technology. Zittau, Germany, 10–12 September 2003, 25–31.
48. Martín-Bautista MJ, Sánchez D, Chamorro-Martínez J, Serrano JM, Vila MA. Mining web documents to find additional query terms using fuzzy association rules. Fuzzy Sets Syst 2004, 148:85–104.
49. Chen YL, Weng CH. Mining fuzzy association rules from questionnaire data. Knowledge Based Syst 2009, 22:46–56.
50. Fard AM, Akbari H, Mohammad R, Akbarzadeh T. Fuzzy adaptive resonance theory for content-based data retrieval. Innovations in Information Technology. Dubai, 19–21 November 2006.
51. Schockaert S, De Cock M, Kerre E. Reasoning about fuzzy temporal information from the web: towards retrieval of historical events.
Soft Comput 2010, 14:869–886.
52. Liu W, Yan H, Xiao J. Automatically mining review records from forum web sites. The International Conference on Fuzzy Systems and Knowledge Discovery. Yantai, Shandong, 10–12 August 2010.
53. Cheng LC, Ke ZH, Shiue BM. Detecting changes of opinion from customer reviews. The International Conference on Fuzzy Systems and Knowledge Discovery. Shanghai, China, 26–28 July 2011.
54. Nadali S, Murad MAA, Kadir RA. Sentiment classification of customer reviews based on fuzzy logic. The International Symposium in Information Technology. Kuala Lumpur, Malaysia, 15–17 June 2010.
55. Si J, Wang W. A template-based forum posts content extraction method. The International Conference on Electrical and Control Engineering. Yichang, China, 16–18 September 2011.
56. Subasic P, Huettner A. Affect analysis of text using fuzzy semantic typing. IEEE Trans Fuzzy Syst 2001, 9:483–496.
57. Tho QT, Hui SC, Fong ACM, Tru Hoang C. Automatic fuzzy ontology generation for semantic web. IEEE Trans Knowledge Data Eng 2006, 18:842–856.
58. Trappey A, Trappey CV, Fu Chiang H, Hsiao DW. A fuzzy ontological knowledge document clustering methodology. IEEE Trans Syst Man Cybernet B 2009, 39:806–814.
59. Gholamzadeh N, Taghiyareh F. Ontology-based fuzzy web services clustering. The International Symposium on Telecommunications. Kish Island, Iran, 4–6 December 2010, 721–725.
60. Kim KJ, Cho S-B. Fuzzy integration of structure adaptive soms for web content mining. Fuzzy Sets Syst 2004, 148:43–60.
61. Zadrożny S, Kacprzyk J, Tré GD. Bipolar queries in textual information retrieval: a new perspective. Inf Process Manage 2012, 48:390–398.
62. López-Herrera AG, Herrera-Viedma E, Herrera F. Applying multi-objective evolutionary algorithms to the automatic learning of extended boolean queries in fuzzy ordinal linguistic information retrieval systems. Fuzzy Sets Syst 2009, 160:2192–2205.
63. Saremi HQ, Montazer GA.
Web usability: a fuzzy approach to the navigation structure enhancement in a website system, case of Iranian civil aviation organization website. Int J Appl Math Comput Sci 2005, 2:131–136.
64. Furnadzhiev G. Using web sites external views for fuzzy classification. Int J Inf Theor Appl 2004, 11:194–199.
65. Herrera-Viedma E, Peis E, Morales del Castillo JM, Alonso S, Anaya K. A fuzzy linguistic model to evaluate the quality of web sites that store xml documents. Int J Approx Reason 2007, 46:226–253.
66. Leitao L, Calado P, Weis M. Structure-based inference of xml similarity for fuzzy duplicate detection. ACM Conference on Information and Knowledge Management. Lisbon, Portugal, 6–10 November 2007, 293–302.
67. Herrera-Viedma E, Peis E. Evaluating the informative quality of documents in sgml format from judgements by means of fuzzy linguistic techniques based on computing with words. Inf Process Manage 2003, 39:233–249.
68. Herrera-Viedma E, Pasi G, Lopez-Herrera AG, Porcel C. Evaluating the information quality of web sites: a methodology based on fuzzy computing with words: special topic section on soft approaches to information retrieval and information access on the web. J Am Soc Inf Sci Technol 2006, 57:538–549.
69. Zhang F, Ma ZM, Cheng J, Meng X. Fuzzy semantic web ontology learning from fuzzy uml model. ACM Conference on Information and Knowledge Management. Hong Kong, China, 2–6 November 2009, 1007–1016.
70. Maio CD, Fenza G, Loia V, Senatore S. Hierarchical web resources retrieval by exploiting fuzzy formal concept analysis. Inf Process Manage 2012, 48:399–418.
71. Kim W, Song YU, Hong JS. Web enabled expert systems using hyperlink-based inference. Expert Syst Appl 2005, 28:79–91.
72. Fürnkranz J. Web structure mining exploiting the graph structure of the world-wide web. ÖGAI J 2002, 21:17–26.
73. Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems 1998, 30:107–117.
74. Kleinberg JM.
Authoritative sources in a hyperlinked environment. J ACM 1999, 46:604–632.
75. Moreno J, Morales del Castillo J, Porcel C, Herrera-Viedma E. A quality evaluation methodology for health-related websites based on a 2-tuple fuzzy linguistic approach. Soft Comput 2010, 14:887–897.
76. O'Reilly T. What is web 2.0: design patterns and business models for the next generation of software. Commun Strategies 2007, 1:17.
77. Shestakov D, Bhowmick SS, Lim E-P. Deque: querying the deep web. Data Knowledge Eng 2005, 52:273–311.
78. Chang KCC, Cho J. Accessing the web: from search to integration. ACM SIGMOD. Chicago, Illinois, 27–29 June 2006, 804–805.
79. Madden S. From databases to big data. IEEE Internet Comput 2012, 16:4–6.
80. Atzori L, Iera A, Morabito G. The internet of things: a survey. Comput Netw 2010, 54:2787–2805.
81. Miorandi D, Sicari S, De Pellegrini F, Chlamtac I. Internet of things: vision, applications and research challenges. Ad Hoc Netw 2012, 10:1497–1516.
82. Sarma S, Brock DL, Ashton K. The networked physical world. TR MIT-AUTOID-WH-001, MIT Auto-ID Center; 2000. Available at: http://www.autoidlabs.org/uploads/media/MIT-AUTOID-WH-001.pdf. (Accessed April 5, 2013).
83. Haller S, Karnouskos S, Schroth C. The internet of things in an enterprise context future internet. Future Internet 2009, 5468:14–28.
84. Weber RH. Internet of things-new security and privacy challenges. Comput Law Security Rev 2010, 26:23–30.