Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
國立雲林科技大學 National Yunlin University of Science and Technology A Fuzzy k-Modes Algorithm for Clustering Categorical Data Advisor:Dr. Hsu Graduate:Chien-Ming Hsiao Author:Zhexue Huang and Michael K. Ng Outline Motivation Objective Introduction Notation Hard and fuzzy k-means algorithms Hard and fuzzy k-Modes algorithms Experimental Results Conclusions Personal Opinion Intelligent Database Systems Lab Motivation Working only on numeric data limits the use of these k-means-type algorithms in data mining. Most algorithms for clustering categorical data suffer from a common efficiency problem when applied to massive categorical-only data sets. Intelligent Database Systems Lab Objective To tackle the problem of clustering large categorical data sets in data mining Intelligent Database Systems Lab Introduction Fuzzy versions of k-means algorithm Each pattern is allowed to have membership functions to all clusters. Working only on numeric data limits the use of these kmeans-type algorithms in such areas data mining. Intelligent Database Systems Lab Introduction To cluster categorical data methods the k-means algorithm [Ralambondrainy, 1995] hierarchical clustering methods [Gower, 1991] the PAM algorithm [Kaufman et al, 1990] the fuzzy-statistical algorithms [Woodbury, 1974] The conceptual clustering methods [Michalski, 1983] Intelligent Database Systems Lab Notation The set of objects to be clustered is stored in a database table T defined by a set of attributes A1, A2,…, Am. Let X X 1,X 2 , ,X n be a set of n objects. Object X i is represente d as xi,1 , xi,2 ,, xi,m . Intelligent Database Systems Lab Hard and fuzzy k-means algorithms Let X be a set of n objects described by m numeric attributes. Intelligent Database Systems Lab Hard and fuzzy k-means algorithms The usual method toward optimization of F is to use partial optimization for Z and W fix Z and find necessary conditions on W to minimize F Fix W and minimize F with respect to Z Intelligent Database Systems Lab Hard and fuzzy k-means algorithms Theorem 1 Let be fixed and consider Problem (P1) Intelligent Database Systems Lab Hard and fuzzy k-means algorithms Theorem 2 Let be fixed and consider Problem (P2) Intelligent Database Systems Lab Hard and fuzzy k-means algorithms The complexity of the algorithm O(tkmn) The space of the algorithm O(n(m+k) + km) Intelligent Database Systems Lab Hard and fuzzy k-Modes algorithms Using a simple matching dissimilarity measure for categorical objects Replacing the means of clusters with the modes Using a frequency-based method to find the modes Intelligent Database Systems Lab Hard and fuzzy k-Modes algorithms Let X and Y be two categorical objects X= Y= The simple matching dissimilarity measure between X and Y is defined as follows: Intelligent Database Systems Lab Hard and fuzzy k-Modes algorithms Using a frequency-based method to update Z The Hard k-modes Update Method The Fuzzy k-modes Update Method Intelligent Database Systems Lab Hard and fuzzy k-Modes algorithms Theorem 3 : The Hard k-modes Update Method The category of attribute Aj of the cluster mode Zl is determined by the mode of categories of attribute Aj in the set of objects belonging to cluster l the quantity Intelligent Database Systems Lab Hard and fuzzy k-Modes algorithms Theorem 4 : The Fuzzy k-modes Update Method The category of attribute Aj of the cluster mode Zl is given by the category that achieves the maximum of the summation of wli to cluster l over all categories. the quantity Intelligent Database Systems Lab Hard and fuzzy k-Modes algorithms Theorem 5 Let 1. The fuzzy k - modes algorithm converges in a finite number of iterations . Intelligent Database Systems Lab Hard and fuzzy k-Modes algorithms Intelligent Database Systems Lab Experimental Results To evaluate the performance and efficiency of the fuzzy k-modes algorithm To compare the fuzzy k-modes algorithm with the conceptual k-means algorithm and the hard kmodes algorithm Use real and artificial data Soybean disease data set. Intelligent Database Systems Lab Experimental Results Intelligent Database Systems Lab Experimental Results Intelligent Database Systems Lab Experimental Results Intelligent Database Systems Lab Experimental Results Intelligent Database Systems Lab Experimental Results Intelligent Database Systems Lab Conclusions Introduced the fuzzy k-modes algorithm for clustering categorical objects based on extensions to the fuzzy k-means algorithm. The consequence of Theorem 4 that allows the kmeans paradigm to be used in generating the fuzzy partition matrix from categorical data Intelligent Database Systems Lab Personal Opinion The fuzzy partition matrix provides more information to help the user to determine the final clustering and to identify the boundary objects Intelligent Database Systems Lab