EECS 800 Research Seminar
Mining Biological Data
Instructor: Luke Huan
Fall, 2006
The UNIVERSITY of Kansas
Administrative
Project assignments have been distributed
Schedule a meeting this week with the instructor to start working on your projects.
Overview
1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. Types of Clusters
4. A Categorization of Major Clustering Methods
5. Partitioning Methods
6. Hierarchical Methods
What is Cluster Analysis?
Finding groups of objects such that the objects in a group will be
similar (or related) to one another and different from (or unrelated
to) the objects in other groups
[Figure: a clustering in which intra-cluster distances are minimized and inter-cluster distances are maximized]
Applications of Cluster Analysis
Understanding
Group related documents for browsing, group genes and
proteins that have similar functionality, or group stocks with
similar price fluctuations
Summarization
Reduce the size of large data sets
[Figure: clustering precipitation in Australia]
Multidisciplinary Efforts of Clustering
Pattern Recognition
Spatial Data Analysis
Create thematic maps in GIS by clustering feature spaces
Detect spatial clusters and support other spatial mining tasks
Image Processing
Economic Science (especially market research)
WWW
Document classification
Cluster Weblog data to discover groups of similar access patterns
Bioinformatics:
Phylogenetic tree
Microarray analysis
What is not Cluster Analysis?
Supervised classification
Have class label information
Simple segmentation
Dividing students into different registration groups alphabetically, by
last name
Results of a query
Groupings are a result of an external specification
Graph partitioning
Some mutual relevance and synergy, but areas are not identical
Terms in Cluster Analysis
Cluster: a collection of data objects
Similar to one another within the same cluster
Dissimilar to the objects in other clusters
Cluster analysis
Finding similarities between data according to the characteristics found in the
data and grouping similar data objects into clusters
Unsupervised learning: no predefined classes
So what?
Clustering can be used as a stand-alone tool to get insight into data distribution
Clustering can be used as a preprocessing step for other algorithms such as
discretization
Quality: What Is Good Clustering?
A good clustering method will produce high quality clusters with
high intra-class similarity
low inter-class similarity
The quality of a clustering result depends on both the similarity
measure used by the method and its implementation
The quality of a clustering method is also measured by its ability to
discover some or all of the hidden patterns
Measure the Quality of Clustering
Dissimilarity/Similarity metric: Similarity is expressed in terms of a
distance function, typically metric: d(i, j)
There is a separate “quality” function that measures the “goodness”
of a cluster.
The definitions of distance functions are usually very different for
boolean, categorical, ordinal, interval, ratio, and vector variables.
Weights should be associated with different variables based on
applications and data semantics.
It is hard to define “similar enough” or “good enough”
the answer is typically highly subjective.
Requirements of Clustering in Data Mining
Scalability
Ability to deal with different types of attributes
Ability to handle dynamic data
Discovery of clusters with arbitrary shape
Minimal requirements for domain knowledge to determine input
parameters
Ability to deal with noise and outliers
Insensitivity to the order of input records
Ability to handle high dimensionality
Incorporation of user-specified constraints
Interpretability and usability
Data Matrix
Data matrix
Also called object-by-variable structure

$$
\begin{bmatrix}
x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\
\vdots &        & \vdots &        & \vdots \\
x_{n1} & \cdots & x_{nf} & \cdots & x_{np}
\end{bmatrix}
$$

$X_f = (x_{1f}, x_{2f}, \ldots, x_{nf})'$ is the f-th variable
Data Structure
Dissimilarity matrix
Also called object-by-object structure
d(i,j): dissimilarity between objects i and j
Nonnegative
Close to 0: similar

$$
\begin{bmatrix}
0      &        &        &        \\
d(2,1) & 0      &        &        \\
d(3,1) & d(3,2) & 0      &        \\
\vdots & \vdots & \vdots & \ddots \\
d(n,1) & d(n,2) & \cdots & \cdots & 0
\end{bmatrix}
$$
Data Structures
Variance-covariance matrix

$$
\begin{bmatrix}
v(X_1)      &             &        &        \\
v(X_2, X_1) & v(X_2)      &        &        \\
v(X_3, X_1) & v(X_3, X_2) & v(X_3) &        \\
\vdots      & \vdots      &        & \ddots \\
v(X_n, X_1) & v(X_n, X_2) & \cdots & \cdots & v(X_n)
\end{bmatrix}
$$
Similarity and Dissimilarity
Similarity
Numerical measure of how alike two data objects are (we are in
the business of comparing apples and oranges!).
Is higher when objects are more alike.
Often falls in the range [0,1]
Dissimilarity
Numerical measure of how different two data objects are
Lower when objects are more alike
Minimum dissimilarity is often 0
Upper limit varies
Proximity refers to a similarity or dissimilarity
Similarity (Dissimilarity) is Measured at Different Levels
Similarity between a single measurement for two objects
Similarity between two objects (across a group of
measurements)
Similarity between an object and a cluster
Similarity between two clusters
Type of data in clustering analysis
Binary variables
Nominal
Ordinal
Interval
Ratio variables
Variables of mixed types
Similarity/Dissimilarity for Simple Attributes
p and q are the attribute values for two data objects.
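As a rough illustration (an assumption, not taken verbatim from the slides), typical per-attribute-type proximity definitions in Python:

```python
# A minimal sketch (an assumption): common similarity/dissimilarity
# definitions for a single attribute of each type.

def proximity(p, q, kind, n_values=None):
    """Return (dissimilarity, similarity) for one pair of attribute values."""
    if kind == "nominal":
        d = 0.0 if p == q else 1.0
        return d, 1.0 - d
    if kind == "ordinal":                  # values mapped to ranks 0 .. n_values-1
        d = abs(p - q) / (n_values - 1)
        return d, 1.0 - d
    if kind == "interval":                 # also used for ratio attributes
        d = abs(p - q)
        return d, 1.0 / (1.0 + d)          # one common similarity transform
    raise ValueError(kind)

print(proximity("red", "blue", "nominal"))
print(proximity(1, 3, "ordinal", n_values=5))
print(proximity(2.0, 7.5, "interval"))
```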
Binary Variables
A binary variable is symmetric if both of its states are
equally valuable and carry the same weight (e.g. gender).
The dissimilarity between objects i and j (using a group of
symmetric variables) is defined with the simple matching
coefficient (SMC):
For two objects i and j, the 2x2 contingency table is:

                Object j
                 1       0      sum
  Object i  1    a       b      a + b
            0    c       d      c + d
          sum  a + c   b + d      p

$$d(i, j) = \frac{b + c}{a + b + c + d}$$
Binary Variables
A binary variable is asymmetric if the outcomes of the
states are not equally important.
The distance between two objects on asymmetric binary
variables is defined using Jaccard distance
$$d(i, j) = \frac{b + c}{a + b + c}$$
Simple Matching versus Jaccard
p = 1 0 0 0 0 0 0 0 0 0
q = 0 0 0 0 0 0 1 0 0 1

M01 = 2 (the number of attributes where p was 0 and q was 1)
M10 = 1 (the number of attributes where p was 1 and q was 0)
M00 = 7 (the number of attributes where p was 0 and q was 0)
M11 = 0 (the number of attributes where p was 1 and q was 1)

Simple match distance = (M10 + M01) / (M01 + M10 + M11 + M00) = (1 + 2) / (2 + 1 + 0 + 7) = 0.3
Jaccard distance J = (M10 + M01) / (M01 + M10 + M11) = (1 + 2) / (2 + 1 + 0) = 1
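A minimal Python sketch (not from the slides) that reproduces this example:

```python
# Simple matching vs. Jaccard distance for binary vectors.

def binary_counts(p, q):
    """Count the four match/mismatch cases for two binary vectors."""
    m11 = sum(1 for a, b in zip(p, q) if a == 1 and b == 1)
    m10 = sum(1 for a, b in zip(p, q) if a == 1 and b == 0)
    m01 = sum(1 for a, b in zip(p, q) if a == 0 and b == 1)
    m00 = sum(1 for a, b in zip(p, q) if a == 0 and b == 0)
    return m11, m10, m01, m00

def smc_distance(p, q):
    m11, m10, m01, m00 = binary_counts(p, q)
    return (m10 + m01) / (m11 + m10 + m01 + m00)

def jaccard_distance(p, q):
    m11, m10, m01, m00 = binary_counts(p, q)
    return (m10 + m01) / (m11 + m10 + m01)   # m00 is ignored (asymmetric case)

p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(smc_distance(p, q))       # 0.3
print(jaccard_distance(p, q))   # 1.0
```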
Nominal Variables
A generalization of the binary variable in that it can take
more than 2 states, e.g., red, yellow, blue, green
Method 1: Simple matching
m: # of matches, p: total # of variables
$$d(i, j) = \frac{p - m}{p}$$
Method 2: use a large number of binary variables
creating a new binary variable for each of the M nominal states
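A minimal Python sketch (illustrative, not from the slides) of Method 1, simple matching for nominal variables:

```python
# Simple-matching dissimilarity for nominal variables: d(i, j) = (p - m) / p.

def nominal_dissimilarity(obj_i, obj_j):
    """obj_i, obj_j: equal-length sequences of nominal values."""
    p = len(obj_i)                                       # total number of variables
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)   # number of matches
    return (p - m) / p

print(nominal_dissimilarity(["red", "small", "round"],
                            ["red", "large", "round"]))  # 1/3
```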
Ordinal Variables
An ordinal variable can be discrete or continuous
Order is important, e.g., rank
Can be treated like interval-scaled
replace $x_{if}$ by its rank $r_{if} \in \{1, \ldots, M_f\}$
map the range of each variable onto [0, 1] by replacing the i-th object
in the f-th variable by
$$z_{if} = \frac{r_{if} - 1}{M_f - 1}$$
compute the dissimilarity using methods for interval variables
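A small Python sketch (illustrative, not from the slides) of the rank-to-[0, 1] mapping:

```python
# Map ordinal values to ranks, then onto [0, 1], so that interval-based
# distance measures can be applied.

levels = ["freshman", "sophomore", "junior", "senior"]   # ordered states
rank = {state: r for r, state in enumerate(levels, start=1)}
M = len(levels)

def ordinal_to_unit(value):
    """z = (r - 1) / (M - 1), as on the slide."""
    return (rank[value] - 1) / (M - 1)

print(ordinal_to_unit("freshman"), ordinal_to_unit("senior"))  # 0.0 1.0
```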
Interval variables
Standardize data
Calculate the mean absolute deviation:
$$s_f = \tfrac{1}{n}\left(|x_{1f} - m_f| + |x_{2f} - m_f| + \cdots + |x_{nf} - m_f|\right)$$
where
$$m_f = \tfrac{1}{n}\left(x_{1f} + x_{2f} + \cdots + x_{nf}\right)$$
Calculate the standardized measurement (z-score)
$$z_{if} = \frac{x_{if} - m_f}{s_f}$$
Using mean absolute deviation is more robust than using
standard deviation
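A short Python sketch (illustrative, not from the slides) of this standardization:

```python
# Standardize an interval-scaled variable using the mean absolute deviation.

def standardize(values):
    n = len(values)
    m = sum(values) / n                         # mean m_f
    s = sum(abs(x - m) for x in values) / n     # mean absolute deviation s_f
    return [(x - m) / s for x in values]        # z-scores

print(standardize([2.0, 4.0, 4.0, 6.0]))        # [-2.0, 0.0, 0.0, 2.0]
```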
Similarity and Dissimilarity Between Objects
Some popular ones include: Minkowski distance:
$$d(i, j) = \sqrt[q]{\,|x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \cdots + |x_{ip} - x_{jp}|^q\,}$$
where $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$ are two p-dimensional data objects, and q is a positive integer
Minkowski Distance: Examples
q = 1: City block (Manhattan, taxicab, L1 norm) distance.
A common example of this is the Hamming distance, which is
just the number of bits that differ between two binary vectors
q = 2: Euclidean distance
q → ∞: "supremum" (Lmax norm, L∞ norm) distance.
This is the maximum difference between any component of the
vectors
Do not confuse the parameter q with the number of dimensions p; all these
distances are defined for any number of dimensions.
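A small Python sketch (illustrative, not from the slides) of the three special cases:

```python
# Minkowski distance for q = 1, q = 2, and the supremum (q -> infinity) limit.

def minkowski(x, y, q):
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

def supremum(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1))   # 7.0  (city block)
print(minkowski(x, y, 2))   # 5.0  (Euclidean)
print(supremum(x, y))       # 4    (L-max)
```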
Distance Defines our Perception of the World
What is a circle if we use Manhattan distance?
The definition of a circle is that all the points on a circle
have equal distance to a fixed point (the center); under the Manhattan
distance, this set of points is a diamond, i.e., a square rotated 45 degrees
Ratio-Scaled Variables
Ratio-scaled variable: a positive measurement on a nonlinear scale,
approximately exponential, such as $Ae^{Bt}$ or $Ae^{-Bt}$, where
A is the original measurement
B is a constant
t is a parameter of the nonlinear scale
Methods:
treat them like interval-scaled variables—not a good choice! (why?—the scale
can be distorted)
apply logarithmic transformation
$y_{if} = \log(x_{if})$
treat them as continuous ordinal data and treat their ranks as interval-scaled
Variables of Mixed Types
A database may contain all six types of variables:
symmetric binary, asymmetric binary, nominal, ordinal, interval,
and ratio
One may use a weighted formula to combine their effects
$$d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)}\, d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}$$
where $d_{ij}^{(f)}$ is the distance on the f-th variable for objects i and j,
and $\delta_{ij}^{(f)}$ is the weight of the f-th variable
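A Gower-style Python sketch of this weighted combination; the per-type distance choices below are assumptions, not taken from the slides:

```python
# Combined distance over mixed variable types:
# d(i, j) = sum_f delta_f * d_f / sum_f delta_f.

def mixed_distance(obj_i, obj_j, types, ranges):
    num = den = 0.0
    for f, t in enumerate(types):
        a, b = obj_i[f], obj_j[f]
        if a is None or b is None:          # missing value: delta_f = 0
            continue
        if t == "nominal":
            d_f = 0.0 if a == b else 1.0
        elif t == "interval":               # normalize by the variable's range
            d_f = abs(a - b) / ranges[f]
        elif t == "asymmetric":             # skip 0-0 matches entirely
            if a == 0 and b == 0:
                continue
            d_f = 0.0 if a == b else 1.0
        num += d_f                          # delta_f = 1 for every used variable
        den += 1.0
    return num / den

obj_i = ["red", 3.0, 1]
obj_j = ["blue", 5.0, 0]
print(mixed_distance(obj_i, obj_j,
                     ["nominal", "interval", "asymmetric"],
                     ranges=[None, 10.0, None]))
```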
Vector Objects
Vector objects: keywords in documents, gene features in
micro-arrays, etc.
Broad applications: information retrieval, biological
taxonomy, etc.
Cosine measure
A variant: Tanimoto coefficient
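A minimal Python sketch (illustrative, not from the slides) of the cosine measure and the Tanimoto coefficient:

```python
# Cosine similarity and the Tanimoto coefficient for two vectors.
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) *
                  math.sqrt(sum(b * b for b in y)))

def tanimoto(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)

x, y = [3, 2, 0, 5], [1, 0, 0, 0]
print(cosine(x, y), tanimoto(x, y))
```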
Types of Clusters
Well-separated clusters
Center-based clusters
Contiguous clusters
Density-based clusters
Model based
Types of Clusters: Well-Separated
Well-Separated Clusters:
A cluster is a set of points such that any point in a cluster is closer (or more
similar) to every other point in the cluster than to any point not in the
cluster.
[Figure: 3 well-separated clusters]
Types of Clusters: Center-Based
Center-based
A cluster is a set of objects such that an object in a cluster is
closer (more similar) to the “center” of a cluster, than to the
center of any other cluster
The center of a cluster is often a centroid, the average of all
the points in the cluster, or a medoid, the most
“representative” point of a cluster
[Figure: 4 center-based clusters]
Types of Clusters: Contiguity-Based
Contiguous Cluster (Nearest neighbor or Transitive)
A cluster is a set of points such that a point in a cluster is
closer (or more similar) to one or more other points in the
cluster than to any point not in the cluster.
[Figure: 8 contiguous clusters]
Types of Clusters: Density-Based
Density-based
A cluster is a dense region of points, separated from other
high-density regions by regions of low density.
Used when the clusters are irregular or intertwined, and when
noise and outliers are present.
[Figure: 6 density-based clusters]
Types of Clusters: Model Based
Shared Property or Conceptual Clusters
Finds clusters that share some common property or represent a
particular model.
[Figure: ratings of movies 1-7 by viewers 1-5; clusters group viewers with a shared rating pattern (2 overlapping circles)]
Major Clustering Approaches (I)
Partitioning approach:
Construct various partitions and then evaluate them by some criterion, e.g.,
minimizing the sum of square errors
Typical methods: k-means, k-medoids, CLARANS
Hierarchical approach:
Create a hierarchical decomposition of the set of data (or objects) using some
criterion
Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON
Partitional Clustering
[Figure: original points and a partitional clustering of them]
Hierarchical Clustering
[Figure: a traditional hierarchical clustering of points p1, p2, p3, p4 with its dendrogram, and a non-traditional hierarchical clustering of the same points with its dendrogram]
Major Clustering Approaches (II)
Density-based approach:
Based on connectivity and density functions
Typical methods: DBSCAN, OPTICS, DenClue
Grid-based approach:
based on a multiple-level granularity structure
Typical methods: STING, WaveCluster, CLIQUE
Model-based:
A model is hypothesized for each of the clusters, and the method tries to
find the best fit of the data to the given model
Typical methods: EM, SOM, COBWEB
Typical Alternatives to Calculate the Distance between Clusters
Single link: smallest distance between an element in one cluster and an
element in the other, i.e., $\mathrm{dis}(K_i, K_j) = \min_{t_{ip} \in K_i,\, t_{jq} \in K_j} d(t_{ip}, t_{jq})$
Complete link: largest distance between an element in one cluster and an
element in the other, i.e., $\mathrm{dis}(K_i, K_j) = \max_{t_{ip} \in K_i,\, t_{jq} \in K_j} d(t_{ip}, t_{jq})$
Average: average distance between an element in one cluster and an element in the
other, i.e., $\mathrm{dis}(K_i, K_j) = \mathrm{avg}_{t_{ip} \in K_i,\, t_{jq} \in K_j}\, d(t_{ip}, t_{jq})$
Centroid: distance between the centroids of two clusters, i.e., $\mathrm{dis}(K_i, K_j) = d(C_i, C_j)$
Medoid: distance between the medoids of two clusters, i.e., $\mathrm{dis}(K_i, K_j) = d(M_i, M_j)$
Medoid: one chosen, centrally located object in the cluster
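A small Python sketch (illustrative, not from the slides) of the first three alternatives:

```python
# Single, complete, and average link distances between two clusters.
import math

def single_link(ki, kj):
    return min(math.dist(p, q) for p in ki for q in kj)

def complete_link(ki, kj):
    return max(math.dist(p, q) for p in ki for q in kj)

def average_link(ki, kj):
    return sum(math.dist(p, q) for p in ki for q in kj) / (len(ki) * len(kj))

ki = [(0, 0), (1, 0)]
kj = [(4, 0), (5, 0)]
print(single_link(ki, kj), complete_link(ki, kj), average_link(ki, kj))  # 3 5 4
```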
Centroid, Radius and Diameter of a Cluster (for numerical data sets)
Centroid: the “mean” point of a cluster
$$C_m = \frac{\sum_{i=1}^{N} t_{ip}}{N}$$
Radius: square root of the average distance from any point of the cluster to its
centroid
$$R_m = \sqrt{\frac{\sum_{i=1}^{N} (t_{ip} - c_m)^2}{N}}$$
Diameter: square root of the average mean squared distance between all pairs of
points in the cluster
$$D_m = \sqrt{\frac{\sum_{i=1}^{N} \sum_{j=1}^{N} (t_{ip} - t_{jq})^2}{N (N - 1)}}$$
Partitioning Algorithms: Basic Concept
Partitioning method: construct a partition of a database D of n
objects into a set of k clusters that minimizes the sum of squared distances
$$\sum_{m=1}^{k} \sum_{t_{mi} \in K_m} (C_m - t_{mi})^2$$
Given a k, find a partition of k clusters that optimizes the chosen
partitioning criterion
Global optimal: exhaustively enumerate all partitions
Heuristic methods: k-means and k-medoids algorithms
k-means (MacQueen’67): Each cluster is represented by the center of the cluster
k-medoids or PAM (Partition around medoids) (Kaufman & Rousseeuw’87):
Each cluster is represented by one of the objects in the cluster
The K-Means Clustering Method
Given k, the k-means algorithm is implemented in four steps:
Partition objects into k nonempty subsets
Compute seed points as the centroids of the clusters of the current
partition (the centroid is the mean point, of the cluster)
Assign each object to the cluster with the nearest seed point
Go back to Step 2; stop when no new assignments are made
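A compact Python sketch (illustrative, not from the slides) of this four-step loop for 2-D points:

```python
# The k-means loop: choose initial centers, assign, update means, repeat.
import math
import random

def kmeans(points, k, max_iter=100):
    # Step 1: arbitrarily choose k objects as the initial centroids.
    centroids = random.sample(points, k)
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Step 3: assign each object to the cluster with the nearest centroid.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:     # stop when nothing changes
            break
        assignment = new_assignment
        # Step 2: recompute each centroid as the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment

pts = [(1, 1), (1.5, 2), (8, 8), (9, 9), (1, 0.5)]
print(kmeans(pts, k=2))
```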
The K-Means Clustering Method
[Figure: the k-means iterations with K = 2; arbitrarily choose K objects as initial cluster centers, assign each object to the most similar center, update the cluster means, and repeat until no assignment changes]
Comments on the K-Means Method
Strength
Relatively efficient: O(tkn), where n is # objects, k is # clusters,
and t is # iterations. Normally, k, t << n.
Comparing: PAM: O(k(n-k)²), CLARA: O(ks² + k(n-k))
Comment: Often terminates at a local optimum.
The global optimum may be found using techniques such as
genetic algorithms
Comments on the K-Means Method
Weakness
Applicable only when mean is defined, then what about
categorical data?
Need to specify k, the number of clusters, in advance
Unable to handle noisy data and outliers
Not suitable to discover clusters with non-convex shapes
Variations of the K-Means Method
A few variants of k-means differ in
Selection of the initial k means
Dissimilarity calculations
Strategies to calculate cluster means
Handling categorical data: k-modes (Huang’98)
Replacing means of clusters with modes
Using new dissimilarity measures to deal with categorical objects
Using a frequency-based method to update modes of clusters
A mixture of categorical and numerical data: k-prototype method
A Problem of K-means
Sensitive to outliers
Outlier: objects with extremely large values
May substantially distort the distribution of the data
K-medoids: the most centrally located object in a cluster
[Figure: two scatter plots showing how a single outlier drags the k-means centroid (+) away from the bulk of the cluster]
A Problem of K-means: Differing Density
[Figure: original points and the k-means result (3 clusters) on data with differing density]
A Problem of K-means: Non-globular Shapes
[Figure: original points and the k-means result (2 clusters) on data with non-globular shapes]
The K-Medoids Clustering Method
Find representative objects, called medoids, in clusters
PAM (Partitioning Around Medoids, 1987)
starts from an initial set of medoids and iteratively replaces one of the
medoids by one of the non-medoids if it improves the total distance of the
resulting clustering
PAM works effectively for small data sets, but does not scale well for large
data sets
CLARA (Kaufmann & Rousseeuw, 1990)
CLARANS (Ng & Han, 1994): Randomized sampling
A Typical K-Medoids Algorithm (PAM)
[Figure: a run of PAM with K = 2; arbitrarily choose k objects as initial medoids, assign each remaining object to the nearest medoid, randomly select a non-medoid object O_random, compute the total cost of swapping, swap if quality is improved, and loop until no change (total costs of 20 and 26 are shown for two configurations)]
PAM (Partitioning Around Medoids) (1987)
PAM (Kaufman and Rousseeuw, 1987), built into S-PLUS
Use real objects to represent the clusters
Select k representative objects arbitrarily
For each pair of non-selected object h and selected object i, calculate the total
swapping cost TCih
For each pair of i and h,
If TCih < 0, i is replaced by h
Then assign each non-selected object to the most similar
representative object
repeat steps 2-3 until there is no change
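A minimal Python sketch (illustrative, not from the slides) of this swap-based search:

```python
# PAM's swap step: try replacing a medoid with a non-medoid object and
# keep the swap when the total swapping cost TC_ih is negative.
import math

def total_cost(points, medoids):
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    medoids = points[:k]                      # arbitrary initial medoids
    improved = True
    while improved:
        improved = False
        for i in range(k):                    # each selected object i
            for h in points:                  # each non-selected object h
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                tc = total_cost(points, candidate) - total_cost(points, medoids)
                if tc < 0:                    # TC_ih < 0: i is replaced by h
                    medoids, improved = candidate, True
    return medoids

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
print(pam(pts, k=2))
```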
What Is the Problem with PAM?
PAM is more robust than k-means in the presence of noise and
outliers because a medoid is less influenced by outliers or other
extreme values than a mean
PAM works efficiently for small data sets but does not scale well to
large data sets:
O(k(n-k)²) per iteration,
where n is the number of data points and k is the number of clusters
Sampling-based method:
CLARA (Clustering LARge Applications)
CLARA (Clustering Large Applications) (1990)
CLARA (Kaufmann and Rousseeuw in 1990)
Built into statistical analysis packages, such as S+
It draws multiple samples of the data set, applies PAM on each
sample, and gives the best clustering as the output
Strength: deals with larger data sets than PAM
Weakness:
Efficiency depends on the sample size
A good clustering based on samples will not necessarily represent a good
clustering of the whole data set if the sample is biased
CLARANS
Clustering Large Applications based upon RANdomized Search
The problem space: a graph of clusterings
A vertex is a choice of k medoids out of the n objects, so there are
$\binom{n}{k}$ vertices in total
PAM searches the whole graph, examining every neighbor of the current clustering
CLARA searches some random sub-graphs
CLARANS climbs mountains (randomized hill climbing)
Randomly sample a set and select k medoids
Consider neighbors of medoids as candidate for new medoids
Use the sample set to verify
Repeat multiple times to avoid bad samples
Hierarchical Clustering
Use distance matrix as clustering criteria. This method does not
require the number of clusters k as an input, but needs a termination
condition
[Figure: agglomerative clustering (AGNES) proceeds in steps 0-4, merging a, b, c, d, e into ab, de, cde, and finally abcde; divisive clustering (DIANA) runs the same steps in reverse]
AGNES (Agglomerative Nesting)
Introduced in Kaufmann and Rousseeuw (1990)
Implemented in statistical analysis packages, e.g., Splus
Use the Single-Link method and the dissimilarity matrix.
Merge nodes that have the least dissimilarity
Go on in a non-descending fashion
Eventually all nodes belong to the same cluster
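A short Python sketch (illustrative, not from the slides) of single-link agglomerative merging:

```python
# Single-link agglomerative clustering: repeatedly merge the two clusters
# with the least dissimilarity until one cluster remains.
import math

def single_link(ki, kj):
    return min(math.dist(p, q) for p in ki for q in kj)

def agnes(points):
    clusters = [[p] for p in points]          # start: every point is a cluster
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest single-link distance
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges                              # the merge order defines the dendrogram

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b in agnes(pts):
    print(a, "+", b)
```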
[Figure: three snapshots of AGNES merging nearby points into progressively larger clusters]