Clustering Methods
 Hierarchical Agglomerative methods

The hierarchical agglomerative clustering methods are the most commonly used.
The construction of a hierarchical agglomerative classification can be achieved by
the following general algorithm.
1. Find the two closest objects and merge them into a cluster.
2. Find and merge the next two closest points, where a point is either an
individual object or a cluster of objects.
3. If more than one cluster remains, return to step 2.
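A minimal sketch of this loop in Python, assuming Euclidean distance and single-link merging; the function and variable names are illustrative, not from the source:

# Naive hierarchical agglomerative clustering (single link).
# Assumes: points is a list of numeric tuples; distance is Euclidean.
from itertools import combinations
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def agglomerate(points):
    # Start with every object in its own cluster (step 1 is the first merge).
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Steps 1-2: find the two closest "points" (objects or clusters),
        # here using the single shortest link between clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(euclidean(p, q)
                               for p in clusters[ij[0]]
                               for q in clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        merges.append((clusters[i], clusters[j]))
        # Step 3: replace the merged pair by the new cluster and repeat.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Example: five 1-D objects, in the spirit of the a..e example that follows.
print(agglomerate([(1.0,), (1.5,), (5.0,), (6.0,), (6.2,)]))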

General example
[Figure: agglomerative merging of objects a, b, c, d, e over Steps 0-4 (a and b merge into ab, d and e into de, then cde, then abcde); read in the reverse direction (Step 4 to Step 0) the same diagram illustrates divisive clustering.]
Hierarchical Agglomerative methods - Implementation
Individual methods are characterized by the definition used to identify the closest
pair of points, and by the means used to describe the new cluster when two clusters
are merged.
There are two general approaches to implementation of this algorithm:
o stored matrix
o stored data
Stored Matrix
o In the stored matrix approach, an N×N matrix containing all pairwise distance
values is first created and then updated as new clusters are formed. This approach
has at least an O(N²) time requirement, rising to O(N³) if a simple serial scan of
the dissimilarity matrix is used to identify the points which need to be fused in
each agglomeration, a serious limitation for large N.
Stored Data
o The stored data approach required the recalculation of pairwise dissimilarity
values for each of the N-1 agglomerations, and the O(N) space requirement is
therefore achieved at the expense of an O(N3) time requirement.
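As an illustration of the stored matrix approach, here is a sketch of how the N×N distance matrix can be updated when two clusters are merged, assuming a single-link (minimum distance) update rule; the helper name is made up:

import numpy as np

def merge_clusters(D, i, j):
    """Given an N x N distance matrix D, merge clusters i and j and return
    the reduced (N-1) x (N-1) matrix, using the single-link update rule
    new_dist(k, {i,j}) = min(D[k, i], D[k, j])."""
    keep = [k for k in range(D.shape[0]) if k not in (i, j)]
    merged_row = np.minimum(D[keep, i], D[keep, j])
    reduced = D[np.ix_(keep, keep)]
    # Append the merged cluster as the last row/column.
    reduced = np.vstack([reduced, merged_row])
    new_col = np.append(merged_row, 0.0)
    reduced = np.column_stack([reduced, new_col])
    return reduced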
The Single Link Method (SLINK)
o The single link method is probably the best known of the hierarchical methods
and operates by joining, at each step, the two most similar objects which are
not yet in the same cluster. The name single link thus refers to the joining of
pairs of clusters by the single shortest link between them.
The Complete Link Method (CLINK)
o The complete link method is similar to the single link method except that it
uses the least similar pair between two clusters to determine the inter-cluster
similarity (so that every cluster member is more like the furthest member of
its own cluster than the furthest item in any other cluster). This method is
characterized by small, tightly bound clusters.
The Group Average Method
o The group average method relies on the average value of the pairwise similarities
within a cluster, rather than the maximum or minimum similarity as with the single
link or the complete link methods. Since all objects in a cluster contribute to
the inter-cluster similarity, each object is, on average, more like every other
member of its own cluster than the objects in any other cluster.
Hierarchical Clustering: Summary
o Single Link: Distance between two clusters is the distance between their
closest points. Also called "neighbor joining."
o Average Link: Distance between two clusters is the average pairwise distance
between their members (often approximated by the distance between the cluster
centroids).
o Complete Link: Distance between two clusters is the distance between their
farthest pair of points.
Dendrograms
o A dendrogram shows how the clusters are merged hierarchically
o Ordered dendrograms
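A small sketch of the three linkage definitions and the resulting dendrograms, using SciPy's hierarchical clustering routines (assumes SciPy and Matplotlib are installed; the sample data is made up):

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Made-up 2-D data with two obvious groups.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (10, 2)),
                  rng.normal(3, 0.3, (10, 2))])

# 'single', 'complete' and 'average' correspond to the three
# linkage definitions summarized above.
for method in ("single", "complete", "average"):
    Z = linkage(data, method=method)
    plt.figure()
    plt.title(f"{method} link dendrogram")
    dendrogram(Z)
plt.show()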
Principal Component Analysis
o Problem: many types of data have too many attributes to be visualized or
manipulated conveniently.
o For example, a single microarray experiment may have 6,000-8,000 genes.
o PCA is a method for reducing the number of attributes (dimensions) of
numerical data while attempting to preserve the cluster structure.
o After PCA, we hopefully get the same clusters as we would if we clustered the
data before PCA.
o After PCA, plots of the data should still have the clusters falling into obvious
groups.
o By using PCA to reduce the data to 2 or 3 dimensions, off-the-shelf geometry
viewers can be used to visualize data.
o PCA: The Algorithm
 Form the m×m covariance matrix of the data, in which cell (i, j) is the
covariance between attributes i and j.
 The eigenvectors corresponding to the d largest eigenvalues of this
matrix are the "principal components".
 By projecting the data onto these vectors, one obtains d-dimensional
points.
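A sketch of this procedure with NumPy, assuming the data is an n-samples by m-attributes array and d = 2; all names are illustrative:

import numpy as np

def pca_project(data, d=2):
    """Project an (n_samples x m_attributes) array onto its top d
    principal components."""
    centered = data - data.mean(axis=0)
    # m x m covariance matrix: cell (i, j) is the covariance of attributes i and j.
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:d]]
    # Projection onto the d principal components gives d-dimensional points.
    return centered @ top

# Example: reduce made-up 5-dimensional data to 2 dimensions for plotting.
points = np.random.rand(100, 5)
print(pca_project(points, d=2).shape)   # (100, 2)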
MST method - Minimal Spanning Trees - Graph Representation of Data
o Representation of a set of k n-dimensional points as a graph
o Each data point is represented as a node V (a vertex)
o The edge between the i-th and j-th points is weighted by the "distance"
between the two points V(i) and V(j)
o d(i,j) - the matrix of pairwise distances
Spanning Tree of a Graph
o SPANNING TREE = a subgraph with all vertices connected and without cycles
(there are no two different paths between any two points).
o Minimal spanning tree = a spanning tree with minimal total sum of edge weights.
o In the MST, points are connected to their closest neighbors.
Examples of spanning trees
[Figure: a spanning tree, a forest (not a tree), an MST, and a subgraph containing a cycle (so it is not a tree).]
 Prim algorithm
o Input: {d(i,j)}, i,j = 1,2,…,n, the distances between data points
o i0 is the index of the root (V(i0) is the starting point for the algorithm)
o k = 1: A(1) = {V(i0)}, B(1) = {data points} - {V(i0)}
o After k steps we have two sets:
o A(k) - data points already included in the MST, |A(k)| = k
o B(k) - data points not yet in the MST, |B(k)| = n - k
o ik is the index of the data point from B(k) that is closest to A(k)
o A(k+1) = A(k) ∪ {V(ik)}
o B(k+1) = B(k) - {V(ik)}
o Repeat until k = n
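A direct sketch of these steps in Python, taking the full distance matrix {d(i,j)} as input; function and variable names are illustrative:

def prim_mst(d, i0=0):
    """d is an n x n symmetric matrix of distances; i0 is the root index.
    Returns the MST as a list of edges (parent, child, distance)."""
    n = len(d)
    A = {i0}                      # points already in the MST
    B = set(range(n)) - {i0}      # points not yet in the MST
    edges = []
    while B:                      # after k steps |A| = k, |B| = n - k
        # Find the point in B closest to A, with its attachment point in A.
        a, b = min(((a, b) for a in A for b in B), key=lambda ab: d[ab[0]][ab[1]])
        edges.append((a, b, d[a][b]))
        A.add(b)
        B.remove(b)
    return edges

# Example on four points (distance matrix made up for illustration).
dist = [[0, 1, 4, 3],
        [1, 0, 2, 5],
        [4, 2, 0, 6],
        [3, 5, 6, 0]]
print(prim_mst(dist))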
 Prim algorithm graphical example
 Clustering of MST
 If a cluster exists and is partitioned, the point closest to the partition will still be a
member of that cluster (a sketch of MST-based partitioning follows below).
 Graph partition – Graphical example
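The source does not spell out how the MST is partitioned into clusters; one common rule, assumed here, is to cut the longest MST edges so that the remaining connected components form the clusters. This sketch reuses prim_mst and dist from the previous sketch:

def mst_clusters(points_count, mst_edges, k):
    """Cut the k-1 longest MST edges; the remaining connected components
    are the clusters (an assumed, common MST-clustering rule)."""
    kept = sorted(mst_edges, key=lambda e: e[2])[: points_count - k]
    # Union-find over the kept edges to recover the components.
    parent = list(range(points_count))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b, _ in kept:
        parent[find(a)] = find(b)
    return [find(i) for i in range(points_count)]

# Example: 4 points, the MST from the previous sketch, 2 clusters.
print(mst_clusters(4, prim_mst(dist), k=2))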
Advanced hierarchical clustering
 Two-way clustering
 First cluster one dimension (genes)
 Then cluster the second dimension (conditions)
 Use the clusters of the first dimension to cluster the second dimension, and back
A different way to look at clustering
Given a data set consisting of a set of objects, discover and report natural groups in this
set.
Formally: a partition of the objects into subsets. There are three major theoretical issues:
1. How to score partitions?
2. How to search the (vast) space of all possible partitions?
3. How to decide which clusters are statistically significant?
How to score partitions?
 Score: Sum of the squared distances from each data point to the center of its
cluster. (This score is used in the popular K-means clustering.) Note: the number
of clusters has to be fixed externally.
 Score: For each point, the distance to the nearest point outside its own cluster is
calculated; the minimum of this distance over all points is the partition score
(equivalent to single linkage). Note: the number of clusters needs to be
fixed externally here as well.
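A sketch of the first score (the sum of squared distances from each point to its cluster center) with NumPy; the labels and data are made up:

import numpy as np

def kmeans_score(data, labels):
    """Sum of squared distances from each point to the center of its cluster."""
    score = 0.0
    for c in np.unique(labels):
        members = data[labels == c]
        center = members.mean(axis=0)
        score += ((members - center) ** 2).sum()
    return score

# Example: two made-up clusters; the second labeling should score worse.
data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(kmeans_score(data, np.array([0, 0, 1, 1])))   # small
print(kmeans_score(data, np.array([0, 1, 0, 1])))   # large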
How to search the (vast) space of all possible partitions?
 Standard Monte-Carlo Markov chain moves:
o Propose a random move from C to C'.
o Accept when Score(C) > Score(C'), or with probability [Score(C)/Score(C')]^b when
Score(C) < Score(C').
 Advantage: searches among all partitions.
 Disadvantage: computationally expensive.
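A rough sketch of such a search, assuming the sum-of-squared-distances score above (lower is better), a proposal move that reassigns one random point, and an arbitrary exponent b; it reuses kmeans_score and data from the previous sketch:

import random
import numpy as np

def mcmc_partition_search(data, k, steps=10000, b=5.0, seed=0):
    """Metropolis-style search over partitions using the rule above:
    always accept an improving move, otherwise accept with
    probability [Score(C)/Score(C')]**b (lower score is better)."""
    rng = random.Random(seed)
    labels = np.array([rng.randrange(k) for _ in range(len(data))])
    score = kmeans_score(data, labels)          # from the previous sketch
    for _ in range(steps):
        proposal = labels.copy()
        proposal[rng.randrange(len(data))] = rng.randrange(k)   # random move C -> C'
        new_score = kmeans_score(data, proposal)
        if new_score < score or rng.random() < (score / new_score) ** b:
            labels, score = proposal, new_score
    return labels, score

print(mcmc_partition_search(data, k=2))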
How to decide which clusters are statistically significant?
 Significant clusters are those that occur in every partition with a reasonably high
score, i.e. disturbing those clusters always significantly lowers the score.
Another interesting problem: Assignment clustering
 Given M 0-1-N vectors of length L each, find a resolution which results in the
least number of distinct vectors
 A vector is called resolved if there are no Ns in it.
 This is also called Binary Clustering with Missing Values, where p is the
maximum number of Ns
 Identify clusters of mutually compatible vectors.
 Compatibility: If the vectors differ only at Ns they are called compatible
 Example:
o 110N0NN110
o N10100N1N0
 The problem is NP-complete
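A small sketch of the compatibility test and of resolving two compatible vectors, treating 'N' as a missing value as in the example above:

def compatible(u, v):
    """Two 0/1/N vectors are compatible if they differ only at Ns."""
    return all(a == b or 'N' in (a, b) for a, b in zip(u, v))

def resolve(u, v):
    """Merge two compatible vectors, filling Ns where the other vector is known."""
    return ''.join(b if a == 'N' else a for a, b in zip(u, v))

u, v = "110N0NN110", "N10100N1N0"
print(compatible(u, v))   # True
print(resolve(u, v))      # 110100N110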
Spectral Clustering
 Spectral Clustering Algorithm - Ng, Jordan, Weiss
 Given a set of points S = {s1,…,sn}
 Form the affinity matrix A with Aij = exp(-||si - sj||² / (2σ²))
 Define the diagonal matrix D with Dii = Σk Aik
 Form the matrix L = D^(-1/2) A D^(-1/2)
 Stack the k largest eigenvectors of L, x1, x2, …, xk, as the columns of the new matrix X
 Form the matrix Y by renormalizing each row of X to unit length:
Yij = Xij / (Σj Xij²)^(1/2)
 Treat each row of Y as a point in R^k and cluster the rows into k clusters via K-means
 Assign the original point si to cluster j if row i of Y was assigned to cluster j
 Note: the representation has been reduced from an n×n matrix to an n×k matrix
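A compact sketch of these steps with NumPy and SciPy's k-means; σ, the zeroed diagonal (taken from the Ng-Jordan-Weiss paper), and the toy data are choices made for illustration:

import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(S, k, sigma=1.0):
    """Ng-Jordan-Weiss style spectral clustering sketch."""
    # Affinity matrix A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)).
    sq_dists = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                 # zero diagonal, as in the NJW paper
    # L = D^(-1/2) A D^(-1/2) with D_ii = sum_k A_ik.
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ A @ D_inv_sqrt
    # Columns of X are the k largest eigenvectors of L.
    eigvals, eigvecs = np.linalg.eigh(L)
    X = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Y: rows of X renormalized to unit length; each row is a point in R^k.
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Cluster rows of Y with K-means; point s_i gets the cluster of row i.
    _, labels = kmeans2(Y, k, minit='++')
    return labels

# Two well-separated made-up blobs.
rng = np.random.default_rng(1)
S = np.vstack([rng.normal(0, 0.2, (15, 2)), rng.normal(4, 0.2, (15, 2))])
print(spectral_clustering(S, k=2))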