Download Mining.Frequent.Patterns. Using.Self

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Cluster analysis wikipedia , lookup

Transcript
IGI PUBLISHING
ITB13886
701 E. Chocolate Avenue, Suite 200, Hershey PA 17033-1240, USA
Mnng Frequent Patterns Usng Self-Organzng Map Tel: 717/533-8845; Fax 717/533-8661; URL-http://www.igi-pub.com
This paper appears in the publication, Research and Trends in Data Mining Technologies and Applications
edited by D. Taniar © 2007, IGI Global
Chapter.V
Mining.Frequent.Patterns.
Using.Self-Organizing.Map
Fedja Hadzc, Unversty of Technology Sydney, Australa
Tharam Dllon, Unversty of Technology Sydney, Australa
Henry Tan, Unversty of Technology Sydney, Australa
Lng Feng, Unversty of Twente, The Netherlands
Elzabeth Chang, Curtn Unversty of Technology, Australa
Abstract
Association rule mining is one of the most popular pattern discovery methods used
in data mining. Frequent pattern extraction is an essential step in association rule
mining. Most of the proposed algorithms for extracting frequent patterns are based on
the downward closure lemma concept utilizing the support and confidence framework.
In this chapter we investigate an alternative method for mining frequent patterns in
a transactional database. Self-Organizing Map (SOM) is an unsupervised neural
network that effectively creates spatially organized internal representations of the
features and abstractions detected in the input space. It is one of the most popular
clustering techniques, and it reveals existing similarities in the input space by perCopyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
Hadzc, Dllon, Tan, Feng, & Chang
forming a topology-preserving mapping. These promising properties indicate that
such a clustering technique can be used to detect frequent patterns in a top-down
manner as opposed to the traditional approach that employs a bottom-up lattice
search. Issues that are frequently raised when using clustering technique for the
purpose of finding association rules are: (i) the completeness of association rule set,
(ii) the support level for the rules generated, and (iii) the confidence level for the
rules generated. We present some case studies analyzing the relationships between
the SOM approach and the traditional association rule framework, and propose a
way to constrain the clustering technique so that the traditional support constraint
can be approximated. Throughout our experiments, we have demonstrated how a
clustering approach can be used for discovering frequent patterns.
Introduction
With large amounts of data continuously collected and stored, organizations are
interested in discovering associations within their databases for different industrial,
commercial, or scientific purposes. The discovery of interesting associations can
aid the decision-making process and provide the domain with new and invaluable
knowledge. Association rule mining is one of the most popular data mining techniques
used to discover interesting relationships among data objects present in a database.
A large amount of research has gone toward the development of efficient algorithms
for association rule mining (Agrawal, Imielinski, & Swami, 1993; Agrawal, Mannila,
Srikant, Toivonen, & Verkamo, 1996; Feng, Dillon, Weigana, & Chang, 2003;
Park, Chen, & Yu, 1997; Tan, Dillon, Feng, Chang, & Hadzic, 2005; Zaki, 2001;
Zaki, Pathasarthy, Ogihara, & Li, 1997). The algorithms developed have different
advantages and disadvantages when applied to different domains with varying
complexity and data characteristics. Frequent pattern discovery is an essential step
in association rule mining. Most of the developed algorithms are a variant of Apriori
(Agrawal et al., 1996) and are based on the downward closure lemma concept with
the support framework. Besides the fact that the problem is approached differently,
commonality that remains in most of the current algorithms is that all possible
candidate combinations are first enumerated, and then their frequency is determined
by scanning of the transactional database. These two steps, candidate enumeration
and testing, are known to be the major bottlenecks in the Apriori-like approaches,
which inspired some work towards methods that avoid this performance bottleneck
(Han, Pei, & Yin, 2000; Park, Chen, & Yu, 1995). Depending on the aim of the
application, rules may be generated from the frequent patterns discovered, and the
support and confidence of each rule can be indicated. Rules with high confidence
often cover only small fractions of the total number of rules generated, which makes
their isolation more challenging and costly. In general, the task of frequent patterns
discovery can have problems with complex real-world data, as for a transaction
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
20 more pages are available in the full version of this document,
which may be purchased using the "Add to Cart" button on the
publisher's webpage:
www.igi-global.com/chapter/mining-frequent-patterns-usingself/28423
Related Content
Visualizing the Bug Distribution Information Available in Software Bug
Repositories
N. K. Nagwani and S. Verma (2014). Data Mining and Analysis in the Engineering Field
(pp. 48-66).
www.irma-international.org/chapter/visualizing-the-bug-distribution-informationavailable-in-software-bug-repositories/109975/
Periodic Streaming Data Reduction Using Flexible Adjustment of Time Section
Size
Jaehoon Kim and Seog Park (2005). International Journal of Data Warehousing and
Mining (pp. 37-56).
www.irma-international.org/article/periodic-streaming-data-reduction-using/1747/
Elasticity in Cloud Databases and Their Query Processing
Goetz Graefe, Anisoara Nica, Knut Stolze, Thomas Neumann, Todd Eavis, Ilia Petrov,
Elaheh Pourabbas and David Fekete (2013). International Journal of Data Warehousing
and Mining (pp. 1-20).
www.irma-international.org/article/elasticity-cloud-databases-their-query/78284/
A Review of RDF Storage in NoSQL Databases
Zongmin Ma and Li Yan (2016). Big Data: Concepts, Methodologies, Tools, and
Applications (pp. 85-104).
www.irma-international.org/chapter/a-review-of-rdf-storage-in-nosqldatabases/150160/
Ant Programming Algorithms for Classification
Juan Luis Olmo, José Raúl Romero and Sebastián Ventura (2014). Biologically-Inspired
Techniques for Knowledge Discovery and Data Mining (pp. 107-128).
www.irma-international.org/chapter/ant-programming-algorithms-forclassification/110456/