Advanced Review
A survey of fuzzy web mining
Chun-Wei Lin1 and Tzung-Pei Hong2,3 ∗
The Internet has become an unlimited resource of knowledge, and is thus widely
used in many applications. Web mining plays an important role in discovering
such knowledge. Web mining can be roughly divided into three categories: Web usage mining, Web content mining, and Web structure mining.
Data and knowledge on the Web may, however, consist of imprecise, incomplete, and uncertain data. Because fuzzy-set theory is often used to handle such
data, several fuzzy Web-mining techniques have been proposed to reveal fuzzy
and linguistic knowledge. This paper reviews these techniques according to the
three Web-mining categories above: fuzzy Web usage mining, fuzzy Web content mining, and fuzzy Web structure mining. Some representative approaches in each category are introduced and compared. © 2013 Wiley Periodicals, Inc.
How to cite this article:
WIREs Data Mining Knowl Discov 2013, 3: 190–199 doi: 10.1002/widm.1091
INTRODUCTION
The number and variety of databases have increased with the growth of digital information. Mining meaningful information from large databases has thus become increasingly important, and many data-mining techniques have been developed to derive useful knowledge or rules from large databases for making efficient decisions. In addition, the Internet has become an essential resource of information, and Web mining plays a key role in discovering relevant knowledge from it. Web mining is the application of data-mining techniques to discover target information and knowledge from Web documents and services.1, 2
Generally, Web mining can be divided into three categories, namely Web usage mining, Web content mining, and Web structure mining.3 Web usage mining
is aimed at mining usage behavior from Web access
logs, user profiles, user queries, and clickstreams. The datasets are generated by the interactions between users and the Web, and can be used for discovering user access patterns on servers. Web content mining is used to mine knowledge from multimedia documents, including text, images, audio, videos, metadata, and hyperlinks for extracting relations across the Internet. Web content mining can also be considered as information retrieval (IR) from unstructured and semi-structured Web data.4 Web structure mining focuses on interrelations between data, providing a linking graph among Websites. The patterns of hyperlinks on connected Web pages and the document structure analysis of HTML or XML tag usage are the two main approaches of Web structure mining.

The authors have declared no conflicts of interest in relation to this article.
∗ Correspondence to: [email protected]
1 Innovative Information Industry Research Center (IIIRC), School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, People’s Republic of China
2 Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, R.O.C.
3 Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, R.O.C.

Because Web data are usually unstructured, distributed, and heterogeneous, it is necessary to design
efficient approaches for extracting, filtering, and evaluating the required information. Some strategies in
IR, knowledge discovery (KDD), machine learning,
and artificial intelligence are used for handling Web
databases to generate human-like decisions.5 Soft
computing tools, including fuzzy logic,6 are widely
used in Web mining for processing uncertain, incomplete, and imprecise information because of their
simplicity and ability to model human reasoning.7
Fuzzy set theory (FST) was first proposed by Zadeh
in 1965.6 Fuzzy sets (also called fuzzy clumps8)
can be thought of as an extension of classical set theory. FST
is primarily concerned with quantifying and reasoning using natural language, in which words can have
2013 John Wiley & Sons, Inc.
Volume 3, May/June 2013
WIREs Data Mining and Knowledge Discovery
ambiguous meanings,7, 9 thus providing useful tools
for decision making.
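As a minimal illustration of how a fuzzy set extends a crisp set, the sketch below assigns a page-visit duration a degree of membership in three linguistic terms instead of a yes/no answer. The term names, the breakpoints (in seconds), and all function names are assumptions chosen for illustration; they are not taken from any of the surveyed papers.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def left_shoulder(x, b, c):
    """Full membership up to b, then a linear fall to zero at c."""
    if x <= b:
        return 1.0
    if x >= c:
        return 0.0
    return (c - x) / (c - b)


def right_shoulder(x, a, b):
    """Zero membership up to a, then a linear rise to full membership at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def fuzzify_duration(seconds):
    """Map a visit duration to degrees for the hypothetical terms short/medium/long."""
    return {
        "short": left_shoulder(seconds, 20, 60),
        "medium": triangular(seconds, 20, 60, 100),
        "long": right_shoulder(seconds, 60, 100),
    }
```

With these assumed breakpoints, a 40-second visit is partly ‘short’ and partly ‘medium’, which a crisp partition of the time axis cannot express.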
In this paper, the application of FST to various
aspects of Web mining is surveyed. Web usage mining, Web content mining, and Web structure mining
conducted using fuzzy sets are reviewed in sections
Fuzzy Web Usage Mining, Fuzzy Web Content Mining, and Fuzzy Web Structure Mining, respectively.
The conclusions are given in the last section.
FUZZY WEB USAGE MINING
WebMiner was the first system developed for Web usage mining.10 Joshi and Krishnapuram11 found that the association rules, clusters, and sequential patterns extracted from Web data do not have crisp boundaries, which makes Web mining nontrivial compared with traditional data mining. FST was therefore adopted for handling uncertain, vague, incomplete, and noisy datasets, and Web usage mining was used to derive usage patterns from Web logs.12 General cases of Web usage mining conducted using fuzzy concepts are described below.
Rule Extraction
Because of its simplicity and similarity to human reasoning, FST has been applied to mine rules. Hong
et al.13, 14 used FST to efficiently mine the relationships among the items of Web databases. Web
logs provide useful information for discovering user
access records on a Website. The records can be
used to derive useful patterns for constructing more
personalized Websites. Fuzzy association rules have
been mined by integrating a case-based reasoning approach,15 which found fuzzy association rules from Web logs and user profiles for Web access prediction and recommendation. Krishnapuram et al.16 developed the fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd) algorithms for clustering relational data, using fuzzy dissimilarity for Web documents, snippets, and user sessions. Zhou et al.17 proposed an approach to discover and visualize the association behavior patterns of individual users. Wu18 proposed a generalized method for fuzzy association rule mining from Web logs, in which Web page visits and the durations of visits were used to reflect user interests and preferences.
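The relational clustering idea behind fuzzy c-medoids can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it takes a precomputed dissimilarity matrix, uses the usual fuzzy c-means style membership update with fuzzifier `m`, and searches all objects for each new medoid. The function names and the toy data are hypothetical.

```python
def memberships(dissim, medoids, m=2.0):
    """Fuzzy membership of every object in every cluster, given the current medoids."""
    n = len(dissim)
    u = [[0.0] * n for _ in medoids]
    for j in range(n):
        zero = [i for i, v in enumerate(medoids) if dissim[j][v] == 0]
        if zero:
            # The object coincides with a medoid: give it full (shared) membership there.
            for i in zero:
                u[i][j] = 1.0 / len(zero)
            continue
        inv = [(1.0 / dissim[j][v]) ** (1.0 / (m - 1.0)) for v in medoids]
        total = sum(inv)
        for i in range(len(medoids)):
            u[i][j] = inv[i] / total
    return u


def fuzzy_c_medoids(dissim, initial_medoids, m=2.0, max_iter=100):
    """Alternate membership updates and medoid updates until the medoids stop changing."""
    medoids = list(initial_medoids)
    n = len(dissim)
    for _ in range(max_iter):
        u = memberships(dissim, medoids, m)
        new_medoids = []
        for i in range(len(medoids)):
            # The new medoid minimises the membership-weighted dissimilarity to all objects.
            def cost(q, i=i):
                return sum((u[i][j] ** m) * dissim[q][j] for j in range(n))
            new_medoids.append(min(range(n), key=cost))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, memberships(dissim, medoids, m)


# Toy relational data: pairwise distances between six points on a line.
points = [0, 1, 2, 10, 11, 12]
dissim = [[abs(a - b) for b in points] for a in points]
medoids, u = fuzzy_c_medoids(dissim, initial_medoids=[0, 3])
```

Because only pairwise dissimilarities are needed, the same loop applies to objects such as user sessions that have no natural vector representation.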
Hong et al.19 proposed a fuzzy Web-mining
algorithm for discovering useful user browsing behaviors based on durations of Web page visits acquired from Web logs. The importance of Web pages
was evaluated using linguistic terms, which were then
transformed and averaged as fuzzy sets of weights.
Each linguistic term was weighted according to its
importance for its page. In this approach, the linguistic term with the maximum cardinality for a page was
chosen in subsequent mining processes, thus reducing
the time complexity. Hong et al.20 also developed a fuzzy object-oriented Web-mining algorithm for discovering fuzzy knowledge from object data logs on a Web server. Each Web page was treated as a class, and each Web page browsed by a client was treated as an instance. With this framework, both intrapage linguistic association rules and interpage linguistic browsing patterns can be derived at the same time.
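The duration-based mining step described above can be illustrated with a small sketch. It fuzzifies browsing durations, keeps for each page the linguistic term with the largest scalar cardinality, and computes a fuzzy support using the minimum operator. The membership functions, the session data, and the function names are assumptions made for illustration, and the page-importance weighting of the original algorithm is deliberately left out.

```python
def fuzzify(seconds):
    """Hypothetical membership functions for a visit duration given in seconds."""
    short = max(0.0, min(1.0, (60 - seconds) / 40))
    long_ = max(0.0, min(1.0, (seconds - 60) / 40))
    middle = max(0.0, 1.0 - abs(seconds - 60) / 40)
    return {"short": short, "middle": middle, "long": long_}


def representative_terms(sessions):
    """For each page, keep the linguistic term with the largest scalar cardinality."""
    counts = {}
    for session in sessions:
        for page, seconds in session.items():
            page_counts = counts.setdefault(page, {})
            for term, degree in fuzzify(seconds).items():
                page_counts[term] = page_counts.get(term, 0.0) + degree
    return {page: max(terms, key=terms.get) for page, terms in counts.items()}


def fuzzy_support(sessions, items):
    """Average over sessions of the minimum membership among the (page, term) items."""
    total = 0.0
    for session in sessions:
        degrees = [
            fuzzify(session[page])[term] if page in session else 0.0
            for page, term in items
        ]
        total += min(degrees)
    return total / len(sessions)


# Each session maps a page to the time (in seconds) spent on it; the data are invented.
sessions = [
    {"A": 30, "B": 90},
    {"A": 20, "B": 100},
    {"A": 40, "C": 60},
    {"B": 80},
]
terms = representative_terms(sessions)
support = fuzzy_support(sessions, [("A", terms["A"]), ("B", terms["B"])])
```

Keeping one term per page shrinks the candidate set before itemsets are combined, which is where the reduction in time complexity comes from.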
Personalization
Web personalization refers to customizing a Website to the needs or interests of users, which can be
achieved by collecting user navigational behaviors
and browsing (access) logs from the Web server.21
The personalization of Web services is an important
step toward building friendly and individual interfaces, thus enhancing the long-term engagement and
loyalty of users.22 Nasraoui et al.23 defined a ‘user
session’ as a temporally compact sequence of web
accesses by a user. A distance measure between two
Web sessions was also defined to capture the organization of a Website. The proposed algorithm automatically clustered data into the optimal number of
components to analyze server access logs and obtain
typical session profiles of users. Bae et al.24 developed
a system for mining the Web log files of customers
to recommend suitable ads to users. The system first
clustered the customers using a self-organizing map
(SOM) to divide them into segments based on similar
preferences. Expert advice was used to help determine suitable ads according to the mined patterns. The patterns and ads were then used to generate fuzzy rules, which drove recommendation through fuzzy inference. Zhou et al.25 proposed a period personalization system that analyzes periodic access patterns to recommend the most relevant information to users. The system first constructed a personal Web usage lattice using a fuzzy formal concept analysis (FCA) technique to efficiently determine the resources of interest to the user during a specific period.
Kim and Cho26 asserted that a personalized search engine is an important tool for finding Web documents. They proposed a system that yielded more personalized results based on link information.26 Relevance was determined with fuzzy logic using a Web concept network, which
wires.wiley.com/widm
was constructed from a user profile. Joshi and Krishnapuram27 asserted that the interactions between a Website and its users should be analyzed to design more personalized Websites. They proposed a framework for automatically discovering user session profiles in Web logs. By grouping similar sessions together, their approach obtained better session profiles than those obtained using traditional association rules. Santhisree and Damodaram28 applied the CLIQUE (CLUstering in QUEst) algorithm to cluster Web sessions for Web personalization. Various fuzzy similarity measures were used to measure the similarity of Web sessions using sequence alignment to determine learning behaviors.
Recommendation Systems
A recommendation system uses users’ specific interests to automatically recommend the desired information based on Web usage mining. Nasraoui and
Petenes stated that approximate reasoning29 can offer a general framework for the recommendation
process.30 They developed a fast and intuitive Web
recommendation approach that used a fuzzy inference engine to automatically derive rules from the
discovered user profiles. Their framework reduced the
memory requirements of fuzzy recommendation systems and lowered the cost of collaborative filtering.
Fong et al.31 found that customer emotions affect purchase activities. Thus, a semantic mining approach for
periodic Web access patterns was designed through
self-reporting and behavior tracking. A personal Web
usage ontology was generated for personal Website
recommendation according to emotion. Porcel et al.32
proposed a hybrid fuzzy linguistic recommender system to aid Technology Transfer Office staff in disseminating research resources of interest to users. The proposed system automatically derives appropriate recommendations about both specialized and complementary research resources and outputs them to users. It also helps discover potential collaboration possibilities to form multidisciplinary working groups.
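A recommendation step driven by approximate reasoning over discovered profiles might look like the sketch below. It assumes that each profile is a mapping from URL to a relevance weight in [0, 1], matches the current session against each profile, and scores unseen URLs with a max-min composition. The profile format, the matching formula, and the function names are illustrative assumptions rather than the method of any surveyed system.

```python
def match_degree(session, profile):
    """Weighted share of a profile's URLs that the session has already visited."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return sum(w for url, w in profile.items() if url in session) / total


def recommend(session, profiles, top_n=3):
    """Rank unseen URLs by max-min composition of match degree and URL relevance."""
    scores = {}
    for profile in profiles:
        degree = match_degree(session, profile)
        for url, weight in profile.items():
            if url in session:
                continue
            # The strength of a recommendation is capped by how well the profile matches.
            scores[url] = max(scores.get(url, 0.0), min(degree, weight))
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [url for url, score in ranked[:top_n] if score > 0]


# Invented profiles and session, for illustration only.
profiles = [
    {"/news": 0.9, "/sports": 0.8, "/scores": 0.6},
    {"/shop": 0.9, "/cart": 0.7},
]
suggestions = recommend({"/news", "/sports"}, profiles)
```

Only the profiles need to be kept in memory at recommendation time, not the full set of past sessions.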
Other Applications
KDD from Web usage patterns can be directly applied
to many applications, such as e-business, e-services,
and e-learning.33, 34 Abraham35 proposed an intelligent miner (i-Miner) that optimized Web data clusters using the Takagi-Sugeno fuzzy inference system.
i-Miner analyses the trends of the Website visitors to
optimally segregate similar user interests. On the basis
of the proposed framework, visitor behavior and profiles were discovered to enhance the business model of
e-commerce Websites. Wang et al.36 proposed a concurrent neuro-fuzzy model for deriving useful knowledge from Web logs. A fuzzy inference system and a self-organizing map (SOM) were used to generate cluster information for both short-term and long-term
Web traffic trend predictions.36 A summary of fuzzy
Web usage mining methods is given in Table 1.
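For readers unfamiliar with Takagi-Sugeno inference, the sketch below shows the basic computation for a single input: each rule fires to a degree given by a membership function, and the output is the firing-strength-weighted average of the rules' linear consequents. The Gaussian membership functions and the rule parameters are assumptions chosen for illustration and are unrelated to the parameters learned by i-Miner.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership function."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def takagi_sugeno(x, rules):
    """First-order Takagi-Sugeno inference for one input.

    Each rule is (center, width, slope, intercept) and reads:
    IF x is about `center` THEN y = slope * x + intercept.
    """
    strengths = [gaussian(x, c, w) for c, w, _, _ in rules]
    outputs = [a * x + b for _, _, a, b in rules]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(s * y for s, y in zip(strengths, outputs)) / total


# Two hypothetical rules with constant consequents (slope 0).
RULES = [
    (0.0, 1.0, 0.0, 10.0),
    (4.0, 1.0, 0.0, 30.0),
]
```

Halfway between the two rule centers both rules fire equally, so `takagi_sugeno(2.0, RULES)` returns the midpoint of the two consequents.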
FUZZY WEB CONTENT MINING
Web content mining focuses on deriving useful information or knowledge from Web page content. It can
be divided into two parts, namely the direct mining
of Web content (documents or pages) and the improvement of content search, such as search engines.37
Data-mining techniques38–46 such as association rule
mining, clustering, and sequential patterns can be applied to mine Web content. FST was used to create a fuzzy IR model for Web search.47 The search
engine included indexing mechanisms and query languages, fuzzy document clustering, fuzzy data mining, fuzzy approaches for distributed IR, and fuzzy
recommender systems. Several Web content mining
approaches are reviewed below.
Rule Extraction
Association rule mining is used for discovering associations within datasets.38, 44 Martín-Bautista et al.48
proposed a framework based on the retrieved association rules for query refinement. The system first
retrieved Web documents to construct text transactions and derive association rules. Fuzzy-set theory
was then applied to text transactions and association rules for determining the presence of the items
in the transactions, which provided additional terms
for the query for guiding the search and improving
retrieval. Questionnaire mining is a Web content mining approach for analyzing open questionnaire data.
Chen and Weng49 created seven questionnaire datasets and defined the patterns to be extracted from them. Fuzzy association rules were then discovered from the questionnaire data to evaluate the performance of the proposed approach. Fard
et al.50 proposed a text and image retrieval architecture for processing dynamic Web content taxonomy
using a fuzzy adaptive resonance theory neural network. This architecture handled the dynamic clustering of incremental information. Their approach
is helpful for mining multimedia content without
metadata. Schockaert et al.51 designed heuristic techniques to extract temporal information from Web
2013 John Wiley & Sons, Inc.
Volume 3, May/June 2013
WIREs Data Mining and Knowledge Discovery
A survey of fuzzy web mining
T A B L E 1 Summary of Fuzzy Web Usage Mining Methods
Authors
13, 14
Hong et al.
Wong et al.15
Krishnapuram et al.16
Zhou et al.17
Hong et al.19, 20
Wu18
Nasraoui et al.23
Eirinaki and Vazirgiannis21
Pierrakos et al.22
Bae et al.24
Zhou et al.25
Kim and Cho26
Joshi and Krishnapuram27
Santhisree and Damodaram28
Nasraoui and Petenes30
Fong et al.31
Porcel et al.32
Abraham35
Wang et al.36
Year
Content
Category
1996, 2002
2001
2001
2005
2008
2010
1999
2003
2003
2003
2006
2007
2008
2011
2003
2011
2012
2003
2005
Fuzzy association rules and fuzzy sequential patterns
Fuzzy association rules
Two clustering approaches (FCMdd and RFCMdd)
Association behaviors in visualization
Fuzzy object-oriented Web mining
Generalized association rules
Clustering for analyzing user sessions
Analyzing navigational behaviors and browsing logs
A tool for enhancing customer loyalty
An ad selector system for clustering customers by SOM
Period personalization system
Based on link information for personalization
Analysis of interactions for personalization
CLIQUE algorithm for clustering Web sessions
Fuzzy approximate reasoning for recommendation
Generating personal Web usage ontology for recommendation
A hybrid fuzzy linguistic recommender system
i-Miner for enhancing e-commerce
Clustering Web traffic for predication
Rule extraction
Rule extraction
Rule extraction
Rule extraction
Rule extraction
Rule extraction
Personalization
Personalization
Personalization
Personalization
Personalization
Personalization
Personalization
Personalization
Recommendation
Recommendation
Recommendation
Application
Application
documents. It helps improve the reliability of the extracted information and deal with conflicts that arise
because of the vagueness of events. The obtained
fuzzy temporal relations can thus be used to target
temporally constrained retrieval tasks effectively. The
growth of Web 2.0 has provided Web reviews and
comments for Web content mining.52, 53 Nadali et
al.54 proposed a fuzzy logic model for semantically
classifying customer reviews into five linguistic terms,
resulting in more human oriented querying. Si and
Wang55 presented an approach for extracting Web
forum content based on templates. Web pages were
translated into a DOM tree for determining whether
they match the templates.
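The query-refinement idea discussed in this section, in which fuzzy text transactions yield association rules that suggest extra query terms, can be sketched as follows. The sketch assumes that a term's membership in a document is its frequency normalized by the most frequent term, and that rule confidence uses the minimum operator; the tokenization, the threshold, and the function names are illustrative assumptions.

```python
from collections import Counter


def fuzzy_transactions(documents):
    """One fuzzy transaction per document: term -> frequency / highest frequency."""
    transactions = []
    for text in documents:
        counts = Counter(text.lower().split())
        if not counts:
            continue  # skip empty documents
        top = max(counts.values())
        transactions.append({term: n / top for term, n in counts.items()})
    return transactions


def confidence(transactions, antecedent, consequent):
    """Fuzzy confidence of the rule antecedent -> consequent."""
    both = sum(
        min(t.get(antecedent, 0.0), t.get(consequent, 0.0)) for t in transactions
    )
    alone = sum(t.get(antecedent, 0.0) for t in transactions)
    return both / alone if alone else 0.0


def refine(query_term, transactions, min_conf=0.6):
    """Terms that co-occur strongly enough with the query term to be suggested."""
    vocabulary = {term for t in transactions for term in t}
    return sorted(
        term
        for term in vocabulary
        if term != query_term
        and confidence(transactions, query_term, term) >= min_conf
    )


# Invented mini-corpus standing in for retrieved Web documents.
docs = ["fuzzy logic fuzzy sets", "fuzzy sets theory", "web mining logs"]
transactions = fuzzy_transactions(docs)
suggested = refine("fuzzy", transactions)
```

In this toy corpus only ‘sets’ clears the assumed 0.6 confidence threshold for the query term ‘fuzzy’, so it is the single suggested refinement.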
Semantic Web and Ontology
Subasic and Huettner56 proposed a system that
combines natural language processing (NLP) and
fuzzy logic to handle Web content with unstructured
data. The proposed system can analyze and visualize Web content, thus helping managers make decisions. Ontology is an efficient conceptual structure
used in the semantic Web. A fuzzy ontology generation framework (FOGA) was proposed for automatically generating a fuzzy ontology based on uncertain information.57 An approximate reasoning approach was also designed to allow the generated fuzzy ontology to evolve incrementally with new instances. Trappey et al.58 presented a hierarchical clustering approach for knowledge document self-organization, which was especially useful for patent analysis. The proposed method automatically interpreted and clustered documents into an ontology schema. Fuzzy logic was used to find the appropriate document clusters for specific patents based on their derived ontological semantic Webs.
Other Applications
Web service discovery plays an important role in
distributed computing environments. Gholamzadeh
and Taghiyareh59 proposed a fuzzy semantic clustering algorithm for efficiently discovering Web services. It automatically found the semantic similarity among Web services through an individual query for semantic clustering. The proposed algorithm could run in a reasonable time by adapting a reduction mechanism for the search space. Kim and Cho60 designed an ensemble structure-adaptive SOM (SASOM) that integrated a fuzzy interval approach to classify Web documents based on user preference. The proposed SASOM can efficiently classify documents for pattern recognition and visualization and predict users’ preferences. In IR systems, precision and recall are two commonly used criteria for evaluating performance. Zadrożny
et al.61 designed a bipolar information model and
used database queries to collect related textual documents in IR. The proposed bipolar queries combine
fuzzy logic with a sophisticated representation of user
preferences and intentions to make the search from
vast resources of textual documents intelligent and flexible. In the past, traditional Boolean queries were extended manually to define users’ queries in fuzzy ordinal linguistic IRSs (FOLIRSs). López-Herrera et al.62 then presented an analysis of two well-known general-purpose multiobjective evolutionary algorithms to automatically learn extended Boolean queries in FOLIRSs. A summary of fuzzy Web content mining methods is given in Table 2.

TABLE 2 Summary of Fuzzy Web Content and Web Structure Mining Methods
Authors | Year | Content | Category
Web Content Mining
Martín-Bautista et al.48 | 2004 | Association rules for query refinement | Rule extraction
Fard et al.50 | 2006 | Dynamic clustering for incremental information | Rule extraction
Chen and Weng49 | 2009 | Questionnaire data mining for evaluating system performance | Rule extraction
Schockaert et al.51 | 2010 | Extracting temporal information from web documents | Rule extraction
Nadali et al.54 | 2010 | Semantic classification for customized query | Rule extraction
Si et al.55 | 2010 | Extracting Web forum information for DOM tree | Rule extraction
Subasic and Huettner56 | 2001 | Analyzing Web content for visualization by NLP | Semantic web
Tho et al.57 | 2006 | Automatic fuzzy ontology generation (FOGA) | Ontology
Trappey et al.58 | 2009 | Hierarchical clustering for SOM | Ontology
Kim et al.60 | 2004 | Classify documents by SASOM for visualization | Application
Gholamzadeh and Taghiyareh59 | 2010 | Semantic clustering for Web services | Application
Zadrożny et al.61 | 2012 | Bipolar information modeling and database queries to collect the textual documents | Information retrieval
López-Herrera et al.62 | 2009 | Multiobjective evolutionary algorithms to automatically learn extended Boolean queries | Information retrieval
Web Structure Mining
Saremi et al.63 | 2006 | Modeling Web pages and content in linguistic terms | XML/HTML document
Furnadzhiev64 | 2004 | Classifying Websites by their external features | XML/HTML document
Herrera-Viedma et al.65 | 2007 | Measuring the quality of XML documents | XML/HTML document
Leitao et al.66 | 2007 | Bayesian theory and probabilities for detecting duplication | XML/HTML document
Herrera-Viedma and Peis67 | 2003 | Fuzzy evaluation method of SGML documents | XML/HTML document
Herrera-Viedma et al.68 | 2006 | Analyzing the information quality of Websites to generate the linguistic recommendations | XML/HTML document
Zhang et al.69 | 2009 | Constructing fuzzy ontologies from fuzzy UML models | Ontology
Maio et al.70 | 2012 | Ontology-based retrieval approach for data organization and visualization by FCA | Ontology

FUZZY WEB STRUCTURE MINING
Websites play a major role in e-business success.71 A better hyperlink structure makes it easier for users to find information, thus enhancing Website navigation. It is thus important to design a Website systematically, including its architecture, route paths, and page content.
Web structure mining provides the hyperlink structure of Web pages as an additional information resource for analysis.72 It usually identifies the relationships between linked Web pages or their connections. The information retrieved from Web structure
mining is used to improve hyperlinks to internal or
external Web pages. Thus, Web pages are clustered
to enhance site navigation. Two algorithms for Web
structure mining are PageRank73 and Hypertext Induced Topic Selection (HITS).74 PageRank uses hyperlink weight normalization and the equilibrium distribution of random surfers as the citation score. HITS makes a distinction between hubs (sites that link to informative sites) and authorities (informative sites that are linked to by hubs) and calculates their scores in a mutually reinforcing way. Moreno et al.75 proposed
a qualitative and user-oriented methodology for assessing quality of health-related websites based on a
2-tuple fuzzy linguistic approach. In their approach,
the 2-tuple linguistic weighted average operator is
successfully applied without a loss of information.
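The two link-analysis algorithms mentioned above can be illustrated with a compact sketch on a toy graph. It assumes the graph is given as a dictionary from each page to the pages it links to, uses a damping factor of 0.85 and a fixed number of iterations, and spreads the rank of pages without out-links uniformly; these are common textbook choices made here for illustration, not details taken from the cited papers.

```python
import math


def all_pages(links):
    """Every page that appears as a source or as a target."""
    return sorted(set(links) | {q for targets in links.values() for q in targets})


def pagerank(links, damping=0.85, iterations=50):
    """Power iteration for the equilibrium distribution of a random surfer."""
    pages = all_pages(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


def hits(links, iterations=50):
    """Mutually reinforcing hub and authority scores, normalised each round."""
    pages = all_pages(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Toy graph: pages a and b both link to c, and c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
hub, auth = hits(links)
```

In the toy graph, page c receives links from both other pages, so it obtains the highest PageRank and the highest authority score, while a and b act as its hubs.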
XML/HTML Documents
In Web structure mining, Web pages and Web content are considered as significant factors for Website
navigation. Some useful changes to the link structure can influence Website navigation by the above
factors, thus optimizing the Website architecture. FST has been used to model the relationship of these two factors using the linguistic terms ‘weak’, ‘medium’, or ‘strong’.63 Furnadzhiev64 proposed an FST approach for classifying Websites into five categories according to their external features. The method determined the relevant textual and structural features of Websites early, without using any preliminary knowledge from the development process. Herrera-Viedma et al.65 presented a fuzzy evaluation model for measuring the quality of XML documents on Websites. It was a user-centered model that evaluated information quality based on user preference. An evaluation scheme and a computing method for quality rating were designed in the proposed model. The evaluation scheme relied on the characteristics of a Website and the content of XML documents. The quality rating was used to measure the ability of Websites. The proposed model can use XML schema language to improve the representation of Website documents. Website quality rating helped users find the required XML resources of the highest quality. Leitao et al.66 designed a duplicate detection algorithm for hierarchical, semi-structured XML documents. The proposed approach considered the duplicate status of children and the probability of descendants being duplicated. A Bayesian network was used to calculate the probabilities for descendants and ascendants for detecting duplications. Herrera-Viedma and Peis67 proposed a fuzzy evaluation method for SGML documents based on the concept of computing with words. The proposed method can be easily extended to evaluate both HyperText Markup Language and eXtensible Markup Language documents. Herrera-Viedma et al.68 then proposed a method to generate linguistic recommendations from the information quality of content-based Websites based on users’ perceptions. Two main components and an evaluation scheme were proposed to analyze the information quality of Websites. A measurement method and two new linguistic aggregation operators, called Majority guided Linguistic Induced Ordered Weighted Averaging (MLIOWA) and weighted MLIOWA, were designed to generate linguistic recommendations according to the majority of the evaluation judgments provided by different visitors.
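As background for the linguistic aggregation operators mentioned above, the sketch below implements a plain ordered weighted averaging (OWA) operator over ordinal linguistic labels, with weights derived from the fuzzy quantifier ‘most’. It is not the MLIOWA operator itself, which additionally orders the judgments by a majority-based inducing variable; the seven-label scale, the quantifier parameters (0.3, 0.8), and the function names are assumptions made for illustration.

```python
# Hypothetical ordinal scale of seven linguistic labels.
LABELS = ["none", "very low", "low", "medium", "high", "very high", "total"]


def quantifier_most(r, a=0.3, b=0.8):
    """Non-decreasing fuzzy quantifier 'most' on the unit interval."""
    if r <= a:
        return 0.0
    if r >= b:
        return 1.0
    return (r - a) / (b - a)


def owa_weights(n, quantifier=quantifier_most):
    """Position weights w_i = Q(i/n) - Q((i-1)/n), which sum to 1."""
    return [quantifier(i / n) - quantifier((i - 1) / n) for i in range(1, n + 1)]


def linguistic_owa(judgments, labels=LABELS):
    """Aggregate linguistic judgments: sort descending, weight by position, round."""
    indices = sorted((labels.index(j) for j in judgments), reverse=True)
    weights = owa_weights(len(indices))
    value = sum(w * i for w, i in zip(weights, indices))
    return labels[int(round(value))]


# Four invented visitor judgments about one Website.
overall = linguistic_owa(["high", "high", "low", "very high"])
```

Because the weights attach to positions in the sorted list rather than to particular visitors, a single extreme judgment has limited influence on the aggregated label.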
Ontology
Zhang et al.69 stated that imprecise and uncertain information cannot be modeled using traditional Web ontology methodology. Thus, fuzzy ontology structures and instances were constructed from fuzzy unified modeling language (UML) models. Three steps were described: investigating fuzzy UML models, proposing formal definitions of fuzzy UML models, and introducing fuzzy ontologies. The fuzzy UML model and its instantiations were correspondingly translated into the fuzzy ontology structure and fuzzy ontology instances. Thus, the proposed approach acted as a bridge between the existing fuzzy applications of UML models.
Maio et al.70 implemented an ontology-based approach for retrieving and visualizing information, providing a better navigation interface with a multifacet view of the built ontology. It used FCA theory to obtain conceptualizations from datasets and to generate hierarchical information. A summary of fuzzy Web structure mining methods is given in Table 2.
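The way FCA derives conceptualizations from data can be sketched on a small fuzzy context. The sketch assumes one common simplification, in which membership degrees are cut at a threshold to obtain a crisp context, and then enumerates formal concepts by brute force over object subsets, which is only practical for tiny examples. The context, the threshold, and the function names are hypothetical.

```python
from itertools import combinations


def crisp_context(fuzzy_context, threshold):
    """Keep an attribute for an object only if its membership reaches the threshold."""
    return {
        obj: {attr for attr, degree in attrs.items() if degree >= threshold}
        for obj, attrs in fuzzy_context.items()
    }


def concepts(context):
    """All formal concepts (extent, intent) of a crisp context, by exhaustive closure."""
    objects = list(context)
    attributes = set().union(*context.values()) if context else set()
    found = set()
    for size in range(len(objects) + 1):
        for group in combinations(objects, size):
            # Intent: attributes shared by every object in the group.
            intent = set(attributes)
            for obj in group:
                intent &= context[obj]
            # Extent: every object that has all attributes of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            found.add((extent, frozenset(intent)))
    return found


# Invented document-term memberships.
fuzzy_context = {
    "doc1": {"fuzzy": 0.9, "web": 0.8},
    "doc2": {"fuzzy": 0.7, "web": 0.2},
    "doc3": {"fuzzy": 0.1, "web": 0.9},
}
lattice = concepts(crisp_context(fuzzy_context, threshold=0.5))
```

Ordering the resulting concepts by inclusion of their extents gives the hierarchy that a navigation interface can expose as a concept lattice.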
DISCUSSION AND FUTURE WORKS
As a result of the explosive growth of information resources, social networking systems such as blogs, wikis, Facebook, Twitter, and the like have rapidly emerged on the Internet. These systems provide two-way communication, unstructured and dynamic content, active collaboration, and a crowdsourcing architecture, which together are called Web 2.0. O’Reilly formulated and pointed out the differences between Web 1.0 and Web 2.0 in many aspects.76 The information requested by users on Web 2.0 has become complicated, massive, and heterogeneous. Thus, Deep Web77, 78 and Big Data79 mining are expected to become the trend of the next technological revolution in Web mining. How to efficiently derive and discover useful knowledge, and to track and analyze usage patterns, from the enormous Web resources will be a critical research issue in the future.
Information technology has now become part of our daily life. Many devices and objects may themselves contain tiny computers, embedded systems, or a series of intelligent activities so that they can be fully integrated into the Web. Some examples include RFID, smart phones, GPS, and smart TVs, among others. Devices and objects can communicate among themselves and with human beings via wireless networks, which is referred to as the Internet of Things (IoT).80–82 Haller et al.83 defined IoT as a world where physical objects are seamlessly integrated into the information network, and where the physical objects can become active participants in business processes. They also stated that these smart objects can provide interactive services and information over the Internet, and that security and privacy issues should be considered. In addition, some advanced research about information collection, information transfer, and intelligent
processing through the cloud architecture and soft computing technology is worth studying. Because information can be collected and integrated from a variety of resources, privacy preservation becomes a critical issue in this research field.84 More algorithms and techniques may then be designed and proposed to secure or sanitize the protected information.
CONCLUSIONS
The Internet has become an unlimited resource for
discovering useful information. Users may retrieve information they need from different Web resources.
Web datasets, however, consist of imprecise, incomplete, and uncertain data. Thus, efficient approaches
are needed to identify useful, meaningful, and interesting patterns to users.
Fuzzy-set theory has been applied to handle
these datasets and provide better solutions than traditional approaches. It is commonly and easily used to
present users’ information requests in an interpretable
way. This survey reviews studies on fuzzy Web usage
mining, fuzzy Web content mining, and fuzzy Web
structure mining. The research papers on fuzzy Web content mining and fuzzy Web structure mining are clearly not as numerous as those on fuzzy Web usage mining. We look forward to seeing more research papers in these two directions in the future. Furthermore, efficiently and effectively handling big data and heterogeneous data has become an inevitable trend for Web mining. Incorporating cloud computing, computational intelligence, and IoT to extract and integrate useful knowledge from big data for the convenience of decision makers and general users will be worthy of future development.
REFERENCES
1. Etzioni O. The world-wide web: Quagmire or gold
mine? Commun ACM 1996, 39:65–68.
2. Cooley R, Mobasher B, Srivastava J. Web mining: information and pattern discovery on the world wide
web. IEEE International Conference on Tools with
Artificial Intelligence. Newport Beach, California, 3–8
November 1997.
3. Cooley R, Mobasher B, Srivastava J. Grouping web
page references into transactions for mining world
wide web browsing patterns. IEEE Knowledge and
Data Engineering Exchange Workshop. Newport
Beach, California, 4 November 1997, 2–9.
4. Kosala R, Blockeel H. Web mining research: a survey.
SIGKDD Explor 2000, 2:1–15.
5. Pal SK, Talwar V, Mitra P. Web mining in soft computing framework: relevance, state of the art and future
directions. IEEE Trans Neural Netw 2002, 13:1163–
1177.
6. Zadeh LA. Fuzzy sets. Inf Control 1965, 8:338–353.
7. Kandel A. Fuzzy Expert Systems. Clermont, FL: CRC
Press Inc 1991.
8. Lin TY. Granular computing: fuzzy logic and rough
sets. Comput Words Inf/Intell Syst 1999, 1:183–200.
9. Famili A, Shen WM, Weber R, Simoudis E. Data preprocessing and intelligent data analysis. Intell Data
Anal 1997, 1:3–23.
10. Mobasher B, Jain N, Han EH, Srivastava J. Web
mining: pattern discovery from world wide web transactions. Technical Report TR96–050; 1996, 558–567.
Available at: http://citeseerx.ist.psu.edu/viewdoc/
download?doi=10.1.1.57.4087&rep=rep1&type=
pdf. (Accessed April 5, 2013).
11. Joshi A, Krishnapuram R. Robust fuzzy clustering
methods to support web mining. ACM SIGMOD
Workshop on Research Issues in Data Mining and
Knowledge Discovery. Seattle, Washington, 2–4 June
1998, 1–8.
12. Robert JS, Cooley R, Deshp M, Tan PN. Web usage
mining: discovery and applications of usage patterns
from web data. SIGKDD Explor 2000, 1:11–23.
13. Hong TP, Lee CY. Induction of fuzzy rules and membership functions from training examples. Fuzzy Sets
Syst 1996, 84:33–47.
14. Hong TP, Chiang MJ, Wang SL. Mining weighted
browsing patterns with linguistic minimum supports.
IEEE International Conference on Systems, Man and
Cybernetics. Yasmine Hammamet, Tunisia, 6–9 October 2002, 635–639.
15. Wong C, Shiu S, Pal S. Mining fuzzy association rules
for web access case adaptation. The International Conference on Case Based Reasoning, Workshop. Vancouver, BC, Canada, 30 July–2 August 2001.
16. Krishnapuram R, Joshi A, Nasraoui O, Yi L. Low-complexity fuzzy relational clustering algorithms for
web mining. IEEE Trans Fuzzy Syst 2001, 9:595–607.
17. Zhou B, Hui SC, Fong ACM. Discovering and visualizing temporal-based web access behavior. The International Conference on Web Intelligence. Compiègne
University of Technology, France, 19–22 September
2005.
18. Wu R. Mining generalized fuzzy association rules from
web logs. The International Conference on Fuzzy Systems and Knowledge Discovery. Yantai, Shandong,
China, 10–12 August 2010.
19. Hong TP, Chiang MJ, Wang SL. Mining fuzzy
weighted browsing patterns from time duration and
with linguistic thresholds. Am J Appl Sci 2008,
5:1611–1621.
20. Hong TP, Huang CM, Horng SJ. Linguistic object-oriented web-usage mining. Int J Approx Reason 2008,
48:47–61.
21. Eirinaki M, Vazirgiannis M. Web mining for web personalization. ACM Trans Internet Technol 2003, 3:1–27.
22. Pierrakos D, Paliouras G, Papatheodorou C, Spyropoulos CD. Web usage mining as a tool for personalization: a survey. User Model User Adapt Interact
2003, 13:311–372.
23. Nasraoui O, Frigui H, Joshi A, Krishnapuram R. Mining web access logs using relational competitive fuzzy
clustering. The International Fuzzy Systems Association World Congress. Taipei, Taiwan, 17–20 August
1999, 195–204.
24. Bae SM, Ha SH, Park SC. Fuzzy web ad selector based
on web usage mining. IEEE Intell Syst 2003, 18:62–69.
25. Zhou B, Hui SC, Fong ACM. An effective approach for
periodic web personalization. The International Conference on Web Intelligence. Hong Kong, China, 18–
22 December 2006.
26. Kim KJ, Cho SB. Personalized mining of web documents using link structures and fuzzy concept networks. Appl Soft Comput 2007, 7:398–410.
27. Joshi A, Krishnapuram R. On mining web access
logs. ACM SIGMOD Workshop on Research Issues
in Data Mining and Knowledge Discovery. Vancouver, Canada, 9–12 June 2008, 63–69.
28. Santhisree K, Damodaram A. Clique: clustering based
on density on web usage data: experiments and test
results. The International Conference on Electronics Computer Technology. Kanyakumari, India, 8–10
April 2011.
29. Klir GJ, Yuan B. Fuzzy Sets and Fuzzy Logic: Theory and Applications. Upper Saddle River, NJ: Prentice
Hall Inc; 1995.
30. Nasraoui O, Petenes C. Combining web usage mining
and fuzzy inference for website personalization. WebKDD. Washington DC, 27 August 2003.
31. Fong ACM, Zhou B, Hui SC, Tang J, Hong GY.
Generation of personalized ontology based on consumer emotion and behavior analysis. IEEE Trans Affect Comput 2012, 3:152–164.
32. Porcel C, Tejeda-Lorente A, Martínez MA, Herrera-Viedma E. A hybrid recommender system for the
selective dissemination of research resources in a technology transfer office. Inf Sci 2012, 184:1–19.
33. Chen PM, Kuo FC. An information retrieval system
based on a user profile. J Syst Softw 2000, 54:3–8.
34. Cheung DW, Kao B, Lee J. Discovering user access
patterns on the world wide web. Knowledge Based
Syst 1998, 10:463–470.
35. Abraham A. Business intelligence from web usage mining. J Inf Knowledge Manage 2003, 2:375–390.
36. Wang X, Abraham A, Smith KA. Intelligent web traffic mining and analysis. J Netw Comput Appl 2005,
28:147–165.
37. Pol K, Patil N, Shreya P, Chhaya D. A survey on
web content mining and extraction of structured and
semistructured data. The International Conference on
Emerging Trends in Engineering and Technology.
Nagpur, Maharashtra, 16–18 July 2008, 543–546.
38. Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases. The International
Conference on Very Large Data Bases. Santiago de
Chile, 12–15 September 1994, 487–499.
39. Hong TP, Lee YC. An overview of mining fuzzy association rules. Fuzzy Sets Their Extens Represent Aggreg
Models 2008, 220:397–410.
40. Lent B, Swami A, Widom J. Clustering association
rules. The International Conference on Data Engineering. Birmingham, 7–11 April 1997.
41. Liu F, Lu Z, Lu S. Mining association rules using clustering. Intell Data Anal 2001, 5:309–326.
42. Agrawal R, Srikant R. Mining sequential patterns. The
International Conference on Data Engineering. Taipei,
Taiwan, 6–10 March 1995.
43. Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q,
Dayal U, Hsu MC. Mining sequential patterns by
pattern-growth: the PrefixSpan approach. IEEE Trans
Knowledge Data Eng 2004, 16:1424–1440.
44. Hong TP, Lin CW, Wu YL. Incrementally fast updated frequent pattern trees. Expert Syst Appl 2008,
34:2424–2435.
45. Hong TP, Wu CH. An improved weighted clustering
algorithm for determination of application nodes in
heterogeneous sensor networks. J Inf Hiding Multimedia Signal Process 2011, 2:173–184.
46. Lin CW, Hong TP, Lu WH. The pre-FUFP algorithm for
incremental mining. Expert Syst Appl 2009, 36:9498–9505.
47. Herrera-Viedma E, Pasi G. Fuzzy approaches to access information on the web: recent developments and
research trends. The Conference of the European Society for Fuzzy Logic and Technology. Zittau, Germany,
10–12 September 2003, 25–31.
48. Martín-Bautista MJ, Sánchez D, Chamorro-Martínez
J, Serrano JM, Vila MA. Mining web documents to find
additional query terms using fuzzy association rules.
Fuzzy Sets Syst 2004, 148:85–104.
49. Chen YL, Weng CH. Mining fuzzy association rules
from questionnaire data. Knowledge Based Syst 2009,
22:46–56.
50. Fard AM, Akbari H, Mohammad R, Akbarzadeh T.
Fuzzy adaptive resonance theory for content-based
data retrieval. Innovations in Information Technology.
Dubai, 19–21 November 2006.
51. Schockaert S, De Cock M, Kerre E. Reasoning about
fuzzy temporal information from the web: Towards retrieval of historical events. Soft Comput 2010, 14:869–886.
52. Liu W, Yan H, Xiao J. Automatically mining review
records from forum web sites. The International Conference on Fuzzy Systems and Knowledge Discovery.
Yantai, Shandong, 10–12 August 2010.
53. Cheng LC, Ke ZH, Shiue BM. Detecting changes
of opinion from customer reviews. The International
Conference on Fuzzy Systems and Knowledge Discovery. Shanghai, China, 26–28 July 2011.
54. Nadali S, Murad MAA, Kadir RA. Sentiment classification of customer reviews based on fuzzy logic. The
International Symposium in Information Technology.
Kuala Lumpur, Malaysia, 15–17 June 2010.
55. Si J, Wang W. A template-based forum posts content
extraction method. The International Conference on
Electrical and Control Engineering. Yichang, China,
16–18 September 2011.
56. Subasic P, Huettner A. Affect analysis of text using
fuzzy semantic typing. IEEE Trans Fuzzy Syst 2001,
9:483–496.
57. Tho QT, Hui SC, Fong ACM, Tru Hoang C.
Automatic fuzzy ontology generation for semantic
web. IEEE Trans Knowledge Data Eng 2006, 18:842–856.
58. Trappey A, Trappey CV, Fu Chiang H, Hsiao DW.
A fuzzy ontological knowledge document clustering
methodology. IEEE Trans Syst Man Cybernet B 2009,
39:806–814.
59. Gholamzadeh N, Taghiyareh F. Ontology-based fuzzy
web services clustering. The International Symposium
on Telecommunications. Kish Island, Iran, 4–6 December 2010, 721–725.
60. Kim KJ, Cho S-B. Fuzzy integration of structure adaptive SOMs for web content mining. Fuzzy Sets Syst 2004,
148:43–60.
61. Zadrożny S, Kacprzyk J, Tré GD. Bipolar queries in
textual information retrieval: a new perspective. Inf
Process Manage 2012, 48:390–398.
62. López-Herrera AG, Herrera-Viedma E, Herrera F. Applying multi-objective evolutionary algorithms to the
automatic learning of extended boolean queries in
fuzzy ordinal linguistic information retrieval systems.
Fuzzy Sets Syst 2009, 160:2192–2205.
63. Saremi HQ, Montazer GA. Web usability: a fuzzy approach to the navigation structure enhancement in a
website system, case of Iranian civil aviation organization website. Int J Appl Math Comput Sci 2005,
2:131–136.
64. Furnadzhiev G. Using web sites external views for
fuzzy classification. Int J Inf Theor Appl 2004, 11:194–199.
65. Herrera-Viedma E, Peis E, Morales del Castillo JM,
Alonso S, Anaya K. A fuzzy linguistic model to evaluate
the quality of web sites that store XML documents. Int
J Approx Reason 2007, 46:226–253.
66. Leitao L, Calado P, Weis M. Structure-based inference
of XML similarity for fuzzy duplicate detection. ACM
Conference on Information and Knowledge Management. Lisbon, Portugal, 6–10 November 2007, 293–302.
67. Herrera-Viedma E, Peis E. Evaluating the informative
quality of documents in SGML format from judgements
by means of fuzzy linguistic techniques based on computing with words. Inf Process Manage 2003, 39:233–249.
68. Herrera-Viedma E, Pasi G, Lopez-Herrera AG, Porcel
C. Evaluating the information quality of web sites: a
methodology based on fuzzy computing with words:
special topic section on soft approaches to information
retrieval and information access on the web. J Am Soc
Inf Sci Technol 2006, 57:538–549.
69. Zhang F, Ma ZM, Cheng J, Meng X. Fuzzy semantic
web ontology learning from fuzzy UML model. ACM
Conference on Information and Knowledge Management. Hong Kong, China, 2–6 November 2009, 1007–1016.
70. Maio CD, Fenza G, Loia V, Senatore S. Hierarchical
web resources retrieval by exploiting fuzzy formal concept analysis. Inf Process Manage 2012, 48:399–418.
71. Kim W, Song YU, Hong JS. Web enabled expert systems using hyperlink-based inference. Expert Syst Appl
2005, 28:79–91.
72. Fürnkranz J. Web structure mining exploiting the
graph structure of the world-wide web. ÖGAI J 2002,
21:17–26.
73. Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Computer Networks and
ISDN Systems 1998, 30:107–117.
74. Kleinberg JM. Authoritative sources in a hyperlinked
environment. J ACM 1999, 46:604–632.
75. Moreno J, Morales del Castillo J, Porcel C, Herrera-Viedma E. A quality evaluation methodology for
health-related websites based on a 2-tuple fuzzy linguistic approach. Soft Comput 2010, 14:887–897.
76. O'Reilly T. What is web 2.0: design patterns and business models for the next generation of software. Commun Strategies 2007, 1:17.
77. Shestakov D, Bhowmick SS, Lim E-P. Deque: querying
the deep web. Data Knowledge Eng 2005, 52:273–311.
78. Chang KCC, Cho J. Accessing the web: from search to
integration. ACM SIGMOD. Chicago, Illinois, 27–29
June 2006, 804–805.
79. Madden S. From databases to big data. IEEE Internet
Comput 2012, 16:4–6.
80. Atzori L, Iera A, Morabito G. The internet of things: a
survey. Comput Netw 2010, 54:2787–2805.
81. Miorandi D, Sicari S, De Pellegrini F, Chlamtac I. Internet of things: vision, applications and research challenges. Ad Hoc Netw 2012, 10:1497–1516.
82. Sarma S, Brock DL, Ashton K. The networked physical world. TR MIT-AUTOID-WH-001, MIT Auto-ID Center; 2000. Available at: http://www.autoidlabs.org/uploads/media/MIT-AUTOID-WH-001.pdf. (Accessed April 5, 2013).
83. Haller S, Karnouskos S, Schroth C. The internet of
things in an enterprise context future internet. Future
Internet 2009, 5468:14–28.
84. Weber RH. Internet of things-new security and privacy
challenges. Comput Law Security Rev 2010, 26:23–30.