IGI PUBLISHING
ITB13885
701 E. Chocolate Avenue, Suite 200, Hershey PA 17033-1240, USA
Tel: 717/533-8845; Fax 717/533-8661; URL-http://www.igi-pub.com
This paper appears in the publication, Research and Trends in Data Mining Technologies and Applications
edited by D. Taniar © 2007, IGI Global
Chapter IV
Pattern Comparison in Data Mining:
A Survey
Irene Ntoutsi, University of Piraeus, Greece
Nikos Pelekis, University of Piraeus, Greece
Yannis Theodoridis, University of Piraeus, Greece
Abstract
Many patterns are available nowadays due to the widespread use of knowledge
discovery in databases (KDD), as a result of the overwhelming amount of data. This
“flood” of patterns imposes new challenges regarding their management. Pattern
comparison, which aims at evaluating how close to each other two patterns are,
is one of these challenges resulting in a variety of applications. In this chapter we
investigate issues regarding the pattern comparison problem and present an overview
of the work performed so far in this domain. Due to heterogeneity of data mining
patterns, we focus on the most popular pattern types, namely frequent itemsets and
association rules, clusters and clusterings, and decision trees.
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
Introduction
Nowadays a large quantity of raw data is collected from different application domains (business, science, telecommunication, health care systems, etc.). According to Lyman and Varian (2003), “The world produces between 1 and 2 exabytes
of unique information per year, which is roughly 250 megabytes for every man,
woman, and child on earth”. Due to their quantity and complexity, it is impossible
for humans to thoroughly investigate these data collections directly. Knowledge
discovery in databases (KDD) and data mining (DM) provide a solution to this
problem by generating compact and semantically rich representations of raw data,
called patterns (Rizzi et al., 2003). With roots in machine learning, statistics, and
pattern recognition, KDD aims at extracting valid, novel, potentially useful, and
ultimately understandable patterns from data (Fayyad, Piatetsky-Shapiro, & Smyth,
1996). Several pattern types exist in the literature mainly due to the wide heterogeneity of data and the different techniques for pattern extraction as a result of the
different goals that a mining process tries to achieve (i.e., what data characteristics
the mining process highlights). Frequent itemsets (and their extension, association
rules), clusters (and their grouping, clusterings), and decision trees are among the
most well-known pattern types in data mining.
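As a rough illustration of what a frequent-itemset pattern is, and of what comparing two sets of such patterns could look like, the following minimal Python sketch mines frequent itemsets from two toy transaction collections by brute force and measures their overlap. It is not taken from the chapter: the function names, the toy transactions, the 0.5 support threshold, and the use of Jaccard overlap as the similarity measure are all assumptions made for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support.

    Brute-force enumeration over all item combinations; suitable for toy
    data only, not a substitute for a real mining algorithm.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = fraction of transactions that contain the itemset.
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
                found = True
        if not found:
            # No frequent itemset of this size, so no larger one can exist.
            break
    return result


def jaccard(patterns_a, patterns_b):
    """Jaccard overlap of two collections of itemsets (supports ignored)."""
    a, b = set(patterns_a), set(patterns_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical transactions from two time periods (assumed data).
first_period = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
second_period = [{"bread", "milk"}, {"beer", "chips"},
                 {"bread", "beer"}, {"milk", "beer"}]

p1 = frequent_itemsets(first_period, min_support=0.5)
p2 = frequent_itemsets(second_period, min_support=0.5)
similarity = jaccard(p1, p2)
print(len(p1), len(p2), round(similarity, 3))
```

Jaccard overlap ignores the support values and treats each itemset as either present or absent; it is used here only because it is simple, and a measure that also accounts for differences in support would be an equally plausible choice.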
Due to the current spread of DM technology, even the number of patterns extracted
from heterogeneous data sources is large and hard for humans to manage. Of
course, patterns do not arise from the DM field only; signal processing, information retrieval, and mathematics are among the fields that also “yield” patterns. The
new reality imposes new challenges and requirements regarding the management
of patterns, analogous to those of managing traditional raw data. These requirements have been recognized by both the academic and the industrial communities, which
try to deal with the problem of efficient and effective pattern management (Catania
& Maddalena, 2006), including, among others, modeling, querying, indexing, and
visualization issues.
Among the several interesting operations on patterns, one of the most important is
that of comparison, that is, evaluating how similar two patterns are. As an application example, consider a supermarket that is interested in discovering changes in
its customers’ behavior over the last two months. For the supermarket owner, it is
probably more important to discover what has changed over time in its customers’
behavior than to review yet more association rules on this topic. This
is the case in general: the more familiar an expert becomes with data mining, the
more interesting it becomes for her to discover changes rather than already-known
patterns. A similar example holds in the case of a distributed data mining
environment, where one might be interested in discovering what differentiates the
distributed branches from each other, or in grouping together branches
with similar patterns. From the latter, another application of similarity arises, that
of exploiting similarity between patterns for meta-pattern management, that is,