Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Automatic Subspace Clustering of High Dimensional Data for Data Mining Applicatyions Background The Curse Of Dimensionality Some solutions Data Projection, Dimension Reduction, signature encoding PCA, Wavelet, NN (SOM) Feature Selection CLIQUE need not do that Li Cheng 10/12/00 CLIQUE clustering algorithm 1 10/12/00 The Contribution of CLIQUE CLIQUE clustering algorithm 2 Some Definitions: A cluster is a maximal set of connected dense units in K-dimensions. Two K-dimensional units u1, u2 are connected if they have a common face, or if there exists other K-dim unit ui, such that u1, ui and u2 are connected consequently. A region in K dimensions is an axisparallel rectangular K-dimensional set. Automatically find subspaces with high-density clusters in high dimensional attribute spaces 10/12/00 CLIQUE clustering algorithm 3 10/12/00 CLIQUE clustering algorithm 4 What is CLIQUE Flow Chart of CLIQUE The basic idea is similar to Bottom-up to find dense units Further Prune subspaces using APRIORI, the association rule algorithm. MDL principle Generating Minimal number of Regions, each region cover one cluster A bottom-up scheme. The Monotonicity Lemma Prune to eliminate some outlines that their “support” is too small. The threshold here called “optimal cut point i” Next 10/12/00 CLIQUE clustering algorithm Firstly, greedily find a number of maximal rectangles Generate a minimal cover 5 10/12/00 Apriori algorithm 10/12/00 6 Basic Idea of CLIQUE Monotonicity: CLIQUE clustering algorithm If a collection of points S is a cluster in a Kdimensional space, then S is also part of a cluster in any (k-1) dimensional projections of this space. Reproduced from http://www.scs.carleton.ca/~kimasaki/DataMining/summary/ CLIQUE clustering algorithm 7 10/12/00 CLIQUE clustering algorithm 8 Prune subspaces using MDL principle Flow Chart of CLIQUE Bottom-up to find dense units Further Prune subspaces using MDL principle Generating Minimal number of Regions, each region cover one cluster Firstly, greedily find a number of maximal rectangles Generate a minimal cover 10/12/00 CLIQUE clustering algorithm Partitioning of the subspaces into selected and prune sets 9 10/12/00 CLIQUE clustering algorithm 10 Flow Chart of CLIQUE (Cont.) Flow Chart of CLIQUE Bottom-up to find dense units Further Prune subspaces using An Example: MDL principle Generating Minimal number of Regions, each region cover one cluster Firstly, greedily find a number of maximal rectangles Generate a minimal cover 10/12/00 CLIQUE clustering algorithm 11 10/12/00 CLIQUE clustering algorithm 12 10/12/00 Pro. And Con. Pro. And Con. (Cont.) Pro. Order Insensitive Arbitrary Shape of Clusters Tolerant of missing values Doesn’t presume some canonical distribution Scalability O(n) Insensitive to noise Cons. Some parameters that hard to preselect: ξ (partition threshold) and τ (density threshold, i.e. support threshold) Prone to higher dimensional clusters Some potential clusters will lost in the density-units or subspaceprune procedures CLIQUE clustering algorithm 13 Comparison with Birch, DBScan and PCA (SVD) Concludes that CLIQUE performs better than Birch, DBScan and SVD 10/12/00 CLIQUE clustering algorithm 15 10/12/00 CLIQUE clustering algorithm 14