Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications
Li Cheng
10/12/00
CLIQUE clustering algorithm

Background
• The curse of dimensionality
• Some solutions
  - Data projection, dimension reduction, signature encoding: PCA, wavelets, neural networks (SOM)
  - Feature selection
• CLIQUE does not need any of these steps

The Contribution of CLIQUE
• Automatically finds subspaces with high-density clusters in high dimensional attribute spaces

Some Definitions
• A cluster is a maximal set of connected dense units in K dimensions.
• Two K-dimensional units u1, u2 are connected if they have a common face, or if there exists another K-dimensional unit ui such that u1 is connected to ui and ui is connected to u2.
• A region in K dimensions is an axis-parallel rectangular K-dimensional set.
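
The definitions above can be made concrete with a small sketch. It assumes a grid with xi equal-width intervals per dimension and treats a unit as dense when its fraction of the points exceeds tau; both choices, and the names `unit_of`, `dense_units` and `clusters`, are assumptions of this illustration rather than something taken from the slides. It works in the full space only, without any subspace search.

```python
from collections import Counter, deque


def unit_of(point, lows, highs, xi):
    """Map a point to its grid unit: one interval index per dimension."""
    idx = []
    for x, lo, hi in zip(point, lows, highs):
        width = (hi - lo) or 1.0          # avoid division by zero on constant dimensions
        i = int((x - lo) / width * xi)
        idx.append(min(i, xi - 1))        # the upper boundary falls in the last interval
    return tuple(idx)


def dense_units(points, xi, tau):
    """Return the units whose fraction of all points exceeds tau (assumed density rule)."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    counts = Counter(unit_of(p, lows, highs, xi) for p in points)
    return {u for u, c in counts.items() if c / len(points) > tau}


def clusters(dense):
    """Group dense units into maximal connected sets.

    Two units share a common face when their indices differ by 1 in
    exactly one dimension; connectivity is the transitive closure.
    """
    seen, result = set(), []
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            component.add(u)
            for d in range(len(u)):
                for step in (-1, 1):
                    v = u[:d] + (u[d] + step,) + u[d + 1:]
                    if v in dense and v not in seen:
                        seen.add(v)
                        queue.append(v)
        result.append(component)
    return result
```

Calling `clusters(dense_units(points, xi, tau))` on a list of equal-length numeric tuples returns one set of grid units per cluster.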

What is CLIQUE
• The basic idea is similar to APRIORI, the association rule algorithm.
  - A bottom-up scheme
  - The Monotonicity Lemma
  - Prune subspaces whose "support" is too small; the threshold here is called the "optimal cut point i"

Flow Chart of CLIQUE
• Bottom-up search for dense units
• Further prune subspaces using the MDL principle
• Generate a minimal number of regions that cover each cluster
  - First, greedily find a number of maximal rectangles
  - Then generate a minimal cover
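
The last step of the flow chart (greedy maximal rectangles, then a minimal cover) can be sketched as follows. This is an illustration under assumptions, not the original implementation: a cluster is given as a set of integer grid units, a region is a pair of corner units `(lo, hi)`, and the names `units_of`, `grow_region` and `greedy_cover` are hypothetical. The removal rule used here (drop the smallest regions that are fully covered by the others) is one simple heuristic for reducing the cover.

```python
from itertools import product


def units_of(region):
    """All grid units inside an axis-parallel region given by its (lo, hi) corners."""
    lo, hi = region
    return set(product(*(range(a, b + 1) for a, b in zip(lo, hi))))


def grow_region(seed, cluster):
    """Greedily grow a rectangle around seed, one dimension at a time."""
    lo, hi = list(seed), list(seed)
    for d in range(len(seed)):
        for bound, step in ((lo, -1), (hi, +1)):
            while True:
                bound[d] += step
                if not units_of((lo, hi)) <= cluster:
                    bound[d] -= step      # undo the step that left the cluster
                    break
    return tuple(lo), tuple(hi)


def greedy_cover(cluster):
    """Cover a cluster with greedily grown rectangles, then drop redundant ones."""
    cluster = set(cluster)
    uncovered, regions = set(cluster), []
    while uncovered:
        seed = min(uncovered)             # deterministic choice of the next seed
        region = grow_region(seed, cluster)
        regions.append(region)
        uncovered -= units_of(region)
    # Removal heuristic: visit regions from smallest to largest and discard
    # any region whose units are all covered by the regions still kept.
    kept = list(regions)
    for r in sorted(regions, key=lambda reg: len(units_of(reg))):
        others = set().union(*(units_of(o) for o in kept if o != r))
        if units_of(r) <= others:
            kept.remove(r)
    return kept
```

For an L-shaped cluster such as `{(0, 0), (1, 0), (2, 0), (0, 1)}` the sketch returns two rectangles, one for each arm of the L.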

Apriori algorithm
• Figure reproduced from http://www.scs.carleton.ca/~kimasaki/DataMining/summary/ (image not preserved in this transcript)

Basic Idea of CLIQUE
• Monotonicity: if a collection of points S is a cluster in a K-dimensional space, then S is also part of a cluster in any (K-1)-dimensional projection of this space.
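
The monotonicity property is what makes an Apriori-style bottom-up search possible: a K-dimensional unit can only be dense if all of its (K-1)-dimensional projections are dense. The sketch below shows one way to generate candidates from that rule. The representation (a unit as a tuple of `(dimension, interval)` pairs sorted by dimension) and the name `candidates` are assumptions made for this illustration.

```python
from itertools import combinations


def candidates(dense_prev):
    """Build K-dimensional candidate units from (K-1)-dimensional dense units.

    Each unit is a tuple of (dimension, interval) pairs sorted by dimension.
    """
    dense_prev = set(dense_prev)
    out = set()
    for a in dense_prev:
        for b in dense_prev:
            # Join step: same leading pairs, and the last dimension of a
            # comes strictly before the last dimension of b.
            if a[:-1] == b[:-1] and a[-1][0] < b[-1][0]:
                cand = a + (b[-1],)
                # Prune step (monotonicity): every (K-1)-dimensional
                # projection of the candidate must itself be dense.
                if all(sub in dense_prev
                       for sub in combinations(cand, len(cand) - 1)):
                    out.add(cand)
    return out
```

The returned units are only candidates; a further pass over the data would still be needed to count their points and decide which of them are actually dense.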

Prune subspaces using MDL principle
• Partition the subspaces into a selected set and a pruned set
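
One way to picture the MDL-based partition is as a search for a cut point in the list of subspaces sorted by coverage. The sketch below is an assumption-laden illustration: the slides do not give the code-length formula, so the two-part cost used here (bits for each group mean plus bits for each deviation from it), the zero-cost treatment of a zero value, and the names `bits`, `code_length` and `optimal_cut` are all choices made for this example.

```python
import math


def bits(x):
    """Bits to encode a non-negative integer; 0 costs nothing (assumption of this sketch)."""
    return math.log2(x) if x > 0 else 0.0


def code_length(cov, i):
    """Cost of splitting the sorted coverages into cov[:i] (selected) and cov[i:] (pruned)."""
    total = 0.0
    for part in (cov[:i], cov[i:]):
        mean = math.ceil(sum(part) / len(part))
        total += bits(mean) + sum(bits(abs(x - mean)) for x in part)
    return total


def optimal_cut(coverage):
    """coverage maps a subspace name to the number of points its dense units cover.

    Returns (selected, pruned) lists of subspace names.
    """
    ranked = sorted(coverage, key=coverage.get, reverse=True)
    cov = [coverage[s] for s in ranked]
    if len(cov) < 2:
        return ranked, []                 # nothing to partition
    best = min(range(1, len(cov)), key=lambda i: code_length(cov, i))
    return ranked[:best], ranked[best:]
```

With coverages such as `{"ab": 100, "ac": 95, "bc": 10, "ad": 8}` the cheapest split keeps the two high-coverage subspaces and prunes the other two.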

Flow Chart of CLIQUE (Cont.)
• An example (figure not preserved in this transcript)

Pros and Cons
• Pros
  - Order insensitive
  - Finds clusters of arbitrary shape
  - Tolerant of missing values
  - Does not presume any canonical distribution
  - Scalability: O(n)
  - Insensitive to noise

Pros and Cons (Cont.)
• Cons
  - Some parameters are hard to preselect: ξ (partition threshold) and τ (density threshold, i.e. support threshold)
  - Prone to higher dimensional clusters
  - Some potential clusters will be lost in the dense-unit or subspace-pruning procedures

Comparison with BIRCH, DBSCAN and PCA (SVD)
• The comparison concludes that CLIQUE performs better than BIRCH, DBSCAN and SVD