Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
IGI PUBLISHING ITB13886 701 E. Chocolate Avenue, Suite 200, Hershey PA 17033-1240, USA Mnng Frequent Patterns Usng Self-Organzng Map Tel: 717/533-8845; Fax 717/533-8661; URL-http://www.igi-pub.com This paper appears in the publication, Research and Trends in Data Mining Technologies and Applications edited by D. Taniar © 2007, IGI Global Chapter.V Mining.Frequent.Patterns. Using.Self-Organizing.Map Fedja Hadzc, Unversty of Technology Sydney, Australa Tharam Dllon, Unversty of Technology Sydney, Australa Henry Tan, Unversty of Technology Sydney, Australa Lng Feng, Unversty of Twente, The Netherlands Elzabeth Chang, Curtn Unversty of Technology, Australa Abstract Association rule mining is one of the most popular pattern discovery methods used in data mining. Frequent pattern extraction is an essential step in association rule mining. Most of the proposed algorithms for extracting frequent patterns are based on the downward closure lemma concept utilizing the support and confidence framework. In this chapter we investigate an alternative method for mining frequent patterns in a transactional database. Self-Organizing Map (SOM) is an unsupervised neural network that effectively creates spatially organized internal representations of the features and abstractions detected in the input space. It is one of the most popular clustering techniques, and it reveals existing similarities in the input space by perCopyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited. Hadzc, Dllon, Tan, Feng, & Chang forming a topology-preserving mapping. These promising properties indicate that such a clustering technique can be used to detect frequent patterns in a top-down manner as opposed to the traditional approach that employs a bottom-up lattice search. Issues that are frequently raised when using clustering technique for the purpose of finding association rules are: (i) the completeness of association rule set, (ii) the support level for the rules generated, and (iii) the confidence level for the rules generated. We present some case studies analyzing the relationships between the SOM approach and the traditional association rule framework, and propose a way to constrain the clustering technique so that the traditional support constraint can be approximated. Throughout our experiments, we have demonstrated how a clustering approach can be used for discovering frequent patterns. Introduction With large amounts of data continuously collected and stored, organizations are interested in discovering associations within their databases for different industrial, commercial, or scientific purposes. The discovery of interesting associations can aid the decision-making process and provide the domain with new and invaluable knowledge. Association rule mining is one of the most popular data mining techniques used to discover interesting relationships among data objects present in a database. A large amount of research has gone toward the development of efficient algorithms for association rule mining (Agrawal, Imielinski, & Swami, 1993; Agrawal, Mannila, Srikant, Toivonen, & Verkamo, 1996; Feng, Dillon, Weigana, & Chang, 2003; Park, Chen, & Yu, 1997; Tan, Dillon, Feng, Chang, & Hadzic, 2005; Zaki, 2001; Zaki, Pathasarthy, Ogihara, & Li, 1997). The algorithms developed have different advantages and disadvantages when applied to different domains with varying complexity and data characteristics. Frequent pattern discovery is an essential step in association rule mining. Most of the developed algorithms are a variant of Apriori (Agrawal et al., 1996) and are based on the downward closure lemma concept with the support framework. Besides the fact that the problem is approached differently, commonality that remains in most of the current algorithms is that all possible candidate combinations are first enumerated, and then their frequency is determined by scanning of the transactional database. These two steps, candidate enumeration and testing, are known to be the major bottlenecks in the Apriori-like approaches, which inspired some work towards methods that avoid this performance bottleneck (Han, Pei, & Yin, 2000; Park, Chen, & Yu, 1995). Depending on the aim of the application, rules may be generated from the frequent patterns discovered, and the support and confidence of each rule can be indicated. Rules with high confidence often cover only small fractions of the total number of rules generated, which makes their isolation more challenging and costly. In general, the task of frequent patterns discovery can have problems with complex real-world data, as for a transaction Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited. 20 more pages are available in the full version of this document, which may be purchased using the "Add to Cart" button on the publisher's webpage: www.igi-global.com/chapter/mining-frequent-patterns-usingself/28423 Related Content Visualizing the Bug Distribution Information Available in Software Bug Repositories N. K. Nagwani and S. Verma (2014). Data Mining and Analysis in the Engineering Field (pp. 48-66). www.irma-international.org/chapter/visualizing-the-bug-distribution-informationavailable-in-software-bug-repositories/109975/ Periodic Streaming Data Reduction Using Flexible Adjustment of Time Section Size Jaehoon Kim and Seog Park (2005). International Journal of Data Warehousing and Mining (pp. 37-56). www.irma-international.org/article/periodic-streaming-data-reduction-using/1747/ Elasticity in Cloud Databases and Their Query Processing Goetz Graefe, Anisoara Nica, Knut Stolze, Thomas Neumann, Todd Eavis, Ilia Petrov, Elaheh Pourabbas and David Fekete (2013). International Journal of Data Warehousing and Mining (pp. 1-20). www.irma-international.org/article/elasticity-cloud-databases-their-query/78284/ A Review of RDF Storage in NoSQL Databases Zongmin Ma and Li Yan (2016). Big Data: Concepts, Methodologies, Tools, and Applications (pp. 85-104). www.irma-international.org/chapter/a-review-of-rdf-storage-in-nosqldatabases/150160/ Ant Programming Algorithms for Classification Juan Luis Olmo, José Raúl Romero and Sebastián Ventura (2014). Biologically-Inspired Techniques for Knowledge Discovery and Data Mining (pp. 107-128). www.irma-international.org/chapter/ant-programming-algorithms-forclassification/110456/