Clustering Methods

Hierarchical Agglomerative Methods
The hierarchical agglomerative clustering methods are the most commonly used. A hierarchical agglomerative classification can be constructed by the following general algorithm (a Python sketch of it follows the summary below):
1. Find the two closest objects and merge them into a cluster.
2. Find and merge the next two closest points, where a point is either an individual object or a cluster of objects.
3. If more than one cluster remains, return to step 2.

General Example
[Figure: five objects a, b, c, d, e are merged stepwise into ab, de, cde, and finally abcde; read left to right (steps 0-4) for agglomerative clustering, right to left for divisive clustering.]

Hierarchical Agglomerative Methods: Implementation
Individual methods are characterized by the definition used to identify the closest pair of points, and by the means used to describe the new cluster when two clusters are merged. There are two general approaches to implementing this algorithm:
o stored matrix
o stored data

Stored Matrix
o In the stored matrix approach, an N x N matrix containing all pairwise distance values is first created, then updated as new clusters are formed. This approach has at least an O(N^2) time requirement, rising to O(N^3) if a simple serial scan of the dissimilarity matrix is used to identify the points to be fused in each agglomeration, a serious limitation for large N.

Stored Data
o The stored data approach requires recalculating the pairwise dissimilarity values for each of the N-1 agglomerations; its O(N) space requirement is therefore achieved at the expense of an O(N^3) time requirement.

The Single Link Method (SLINK)
o The single link method is probably the best known of the hierarchical methods. It operates by joining, at each step, the two most similar objects that are not yet in the same cluster. The name "single link" refers to the joining of pairs of clusters by the single shortest link between them.

The Complete Link Method (CLINK)
o The complete link method is similar to the single link method, except that it uses the least similar pair between two clusters to determine the inter-cluster similarity (so that every cluster member is more like the furthest member of its own cluster than the furthest item in any other cluster). This method is characterized by small, tightly bound clusters.

The Group Average Method
o The group average method relies on the average value of the pairwise similarities within a cluster, rather than the maximum or minimum similarity used by the single link and complete link methods. Since all objects in a cluster contribute to the inter-cluster similarity, each object is, on average, more like every other member of its own cluster than the objects in any other cluster.

Hierarchical Clustering: Summary
o Single link: the distance between two clusters is the distance between their closest pair of points. Also called "neighbor joining."
o Average link: the distance between two clusters is the average pairwise distance between their points (often approximated by the distance between the cluster centroids).
o Complete link: the distance between two clusters is the distance between their farthest pair of points.

Dendrograms
o A dendrogram shows how the clusters are merged hierarchically.
o Ordered dendrograms.
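To make the general algorithm and the stored-matrix discussion concrete, here is the minimal single-link sketch in Python referenced above. It is illustrative only: the names euclidean and single_link_clusters are ours, points are assumed to be coordinate tuples, and the serial scan over the stored distances shows exactly the O(N^3) behaviour described, so practical implementations such as SLINK do much better.

import math

def euclidean(p, q):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_link_clusters(points, num_clusters):
    # Stored-matrix approach: precompute all pairwise distances (O(N^2) space).
    n = len(points)
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]   # start from N singleton clusters
    while len(clusters) > num_clusters:
        # Serial scan for the closest pair of clusters; single link scores a
        # pair by the minimum distance over all cross-cluster point pairs.
        best = (0, 1, float("inf"))
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a].extend(clusters[b])  # merge the two closest clusters
        del clusters[b]
    return clusters

# Example: two obvious groups on the line.
print(single_link_clusters([(0,), (1,), (2,), (10,), (11,)], 2))
# -> [[0, 1, 2], [3, 4]]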
Principal Component Analysis
o Problem: many types of data have too many attributes to be visualized or manipulated conveniently. For example, a single microarray experiment may involve 6,000-8,000 genes.
o PCA is a method for reducing the number of attributes (dimensions) of numerical data while attempting to preserve the cluster structure.
o After PCA, we hopefully get the same clusters as we would if we clustered the data before PCA.
o After PCA, plots of the data should still show the clusters falling into obvious groups.
o By using PCA to reduce the data to 2 or 3 dimensions, off-the-shelf geometry viewers can be used to visualize it.

PCA: The Algorithm
o Consider the data as an m x m matrix in which cell (i, j) is the covariance between attributes i and j.
o The eigenvectors corresponding to the d largest eigenvalues of this matrix are the "principal components."
o By projecting the data onto these vectors, one obtains d-dimensional points.

MST Method: Minimal Spanning Trees

Graph Representation of Data
o A set of k n-dimensional points is represented as a graph.
o Each data point is represented as a node V (a vertex).
o The edge between the i-th and j-th points is weighted by the "distance" between the two points V(i) and V(j).
o d_ij is the matrix of pairwise distances.

Spanning Tree of a Graph
o Spanning tree: a subgraph in which all vertices are connected and there are no cycles, so there are no two different paths between any two points.
o Minimal spanning tree (MST): the spanning tree with the minimal total sum of edge weights.
o In an MST, points are connected to their closest neighbors.

[Figure: examples of a spanning tree, a forest (not a tree), an MST, and a subgraph that contains a cycle.]

Prim Algorithm
o Input: {d_ij}, i, j = 1, 2, ..., n, the distances between the data points.
o i_0 is the index of the root (V(i_0) is the starting point for the algorithm).
o k = 1: A(1) = {V(i_0)}, B(1) = {data points} - {V(i_0)}.
o After k steps we have two sets:
  A(k): the data points already included in the MST, |A(k)| = k;
  B(k): the data points not yet in the MST, |B(k)| = n - k.
o i_k is the index of the data point from B(k) that is closest to A(k).
o A(k+1) = A(k) ∪ {V(i_k)}, B(k+1) = B(k) - {V(i_k)}; repeat until k = n.

[Figure: Prim algorithm, graphical example.]

Clustering with the MST
o If a cluster exists and the tree is partitioned there, the point closest to the partition will be a cluster member. In practice, removing the longest edges splits the MST into connected components, which are reported as the clusters. A Python sketch of the Prim construction and of this clustering step follows.

[Figure: graph partition, graphical example.]
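The following is a minimal Python sketch of the Prim construction and the MST clustering step above. It assumes a full distance matrix {d_ij} as input; the names prim_mst and mst_clusters are ours, and the quadratic scan mirrors the A(k)/B(k) description rather than aiming for efficiency.

def prim_mst(dist):
    # dist: full n x n distance matrix {d_ij}. Grows the tree from the root
    # V(0): in_tree plays the role of A(k), its complement the role of B(k).
    n = len(dist)
    in_tree = {0}
    edges = []                           # MST edges as (i, j, d_ij)
    while len(in_tree) < n:
        # Find the point of B(k) closest to A(k), and the edge realizing it.
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: dist[e[0]][e[1]])
        edges.append((i, j, dist[i][j]))
        in_tree.add(j)                   # A(k+1) = A(k) ∪ {V(i_k)}
    return edges

def mst_clusters(dist, num_clusters):
    # Cut the (num_clusters - 1) longest MST edges; the connected components
    # that remain are the clusters.
    kept = sorted(prim_mst(dist), key=lambda e: e[2])[:len(dist) - num_clusters]
    parent = list(range(len(dist)))      # union-find over the kept edges
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for i, j, _ in kept:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(len(dist)):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

# Example: points on a line; cutting the single longest MST edge
# separates the two natural groups.
pts = [0.0, 1.0, 2.0, 10.0, 11.0]
dist = [[abs(a - b) for b in pts] for a in pts]
print(mst_clusters(dist, 2))             # -> [[0, 1, 2], [3, 4]]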
Advanced Hierarchical Clustering: Two-Way Clustering
o First cluster one dimension (genes).
o Then cluster the second dimension (conditions).
o Use the clusters of the first dimension to cluster the second dimension, and back.

A Different Way to Look at Clustering
Given a data set consisting of a set of objects, discover and report the natural groups in this set. Formally: a partition of the objects into subsets. There are three major theoretical issues:
1. How to score partitions?
2. How to search the (vast) space of all possible partitions?
3. How to decide which clusters are statistically significant?

How to Score Partitions?
o Score: the sum of the squared distances from each data point to the center of its cluster. (This score is used in the popular K-means clustering.) Note: the number of clusters has to be fixed externally.
o Score: for each point, compute the distance to the nearest point outside its own cluster; the minimum of this distance over all points is the partition score (equivalent to single linkage). Note: the number of clusters needs to be fixed externally here as well.

How to Search the (Vast) Space of All Possible Partitions?
Standard Markov chain Monte Carlo moves:
o Propose a random move from the current partition C to a new partition C'.
o Accept the move when Score(C) > Score(C'), or with probability [Score(C)/Score(C')]^b when Score(C) < Score(C').
o Advantage: searches among all partitions. Disadvantage: computationally expensive.

How to Decide Which Clusters Are Statistically Significant?
Significant clusters are those that occur in any partition with a reasonably high score, i.e. disturbing those clusters always significantly lowers the score.

Another Interesting Problem: Assignment Clustering
o Given M vectors over {0, 1, N} of length L each, find a resolution which results in the least number of distinct vectors. A vector is called resolved if there are no Ns in it.
o This problem is also called Binary Clustering with Missing Values, where p denotes the maximum number of Ns per vector.
o The task is to identify clusters of mutually compatible vectors. Compatibility: if two vectors differ only at Ns, they are called compatible.
o Example of two compatible vectors:
  110N0NN110
  N10100N1N0
o The problem is NP-complete.

Spectral Clustering

Spectral Clustering Algorithm (Ng, Jordan, Weiss)
Given a set of points S = {s_1, ..., s_n}:
1. Form the affinity matrix A with A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)) for i != j, and A_ii = 0.
2. Define the diagonal matrix D with D_ii = sum_k A_ik.
3. Form the matrix L = D^(-1/2) A D^(-1/2).
4. Stack the k largest eigenvectors of L as the columns of a new matrix X = [x_1 x_2 ... x_k].
5. Form the matrix Y by renormalizing each row of X to unit length: Y_ij = X_ij / (sum_j X_ij^2)^(1/2).
6. Treat each row of Y as a point in R^k and cluster the rows into k clusters via K-means.
7. Assign the original point s_i to cluster j if row i of Y was assigned to cluster j.
o Note: the dimension has been reduced from n x n to n x k.
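Here is a compact Python sketch of the Ng-Jordan-Weiss steps above, using NumPy for the eigendecomposition and SciPy's kmeans2 for the final clustering. The function name spectral_clusters is ours, sigma is left as a free parameter, and the example data are made up; this illustrates the listed steps rather than providing a tuned implementation.

import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clusters(S, k, sigma=1.0):
    # S: (n, d) array of points; k: number of clusters; sigma: affinity scale.
    # Step 1: A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)), with A_ii = 0.
    sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Steps 2-3: D_ii = sum_k A_ik, then L = D^(-1/2) A D^(-1/2).
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Step 4: stack the eigenvectors of the k largest eigenvalues as columns
    # (np.linalg.eigh returns eigenvalues in ascending order, L is symmetric).
    vals, vecs = np.linalg.eigh(L)
    X = vecs[:, -k:]
    # Step 5: renormalize each row of X to unit length to obtain Y.
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Steps 6-7: K-means on the rows of Y; row i's label is s_i's cluster.
    _, labels = kmeans2(Y, k, minit="++", seed=0)
    return labels

# Example: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
S = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
print(spectral_clusters(S, 2))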