
Clustering census data: comparing the performance of
... The main objective of this paper is to evaluate the performance of the SOM and k-means in the clustering problem, under specific conditions. Especially relevant is the possibility of providing empirical evidence to support the claims that SOM can be a more effective tool in census-based data clu ...
Data Mining: Concepts and Techniques
... Not the most effective or accurate clustering algorithm that exists, but it is efficient, as it has a complexity of O(n), where n is the number of data objects [Portnoy01]. 1) Initialize the set of clusters, S, to the empty set. 2) Obtain an object d from the data set. If S is empty, then create a cl ...
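The snippet cuts off after step 2, so the following is only a minimal sketch of single-pass clustering of this kind. It assumes that each object joins the nearest existing cluster when its distance to that cluster's centroid is at most a fixed width `w`, and otherwise starts a new cluster. The function name `single_pass_cluster` and the parameter `w` are hypothetical. Because each object is compared with every current cluster, the cost in this sketch grows with the number of clusters as well as with n.

```python
import math


def single_pass_cluster(data, w):
    """Cluster objects in one pass over the data (hypothetical sketch).

    An object joins the nearest cluster if its distance to that cluster's
    centroid is at most w; otherwise it becomes the first member of a new
    cluster. Returns (labels, centroids).
    """
    centroids = []  # running centroid of each cluster (S starts empty)
    counts = []     # number of members in each cluster
    labels = []
    for d in data:
        best = None
        if centroids:
            dists = [math.dist(d, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > w:
                best = None  # no cluster is close enough
        if best is None:
            # S is empty or every cluster is too far away: create a new cluster.
            centroids.append([float(x) for x in d])
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Add d to the nearest cluster and update its centroid incrementally.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], d)]
            labels.append(best)
    return labels, centroids


labels, centroids = single_pass_cluster([(0, 0), (0.5, 0), (10, 10), (10.5, 10)], w=2)
print(labels)  # [0, 0, 1, 1]
```

Note that, unlike the order-independent method described in the next entry, the result of a single-pass scheme like this can depend on the order in which objects are read.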
a subspace clustering of high dimensional data
... overlapping problem but also limits the information loss to cope with the data coverage problem. High-dimensional data is inherently more complex in clustering, classification, and similarity search. It produces identical results irrespective of the order in which input records are presented and ...
Multiple Clustering Views via Constrained Projections ∗
... (under some notion of similarity) into the same cluster whilst separating dissimilar ones into different clusters. Toward this goal, many algorithms have been developed by which some clustering objective function is proposed along with an optimization mechanism such as k-means, mixture models, hiera ...
view - dline
... is in an arbitrary d-dimensional space is not dense. Such subspace clustering algorithms combine density-based and grid-based cluster analysis: each dimension is divided into a number of grid intervals. In this strategy the most prominent problem is easy t ...
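As a rough illustration of the grid step, the sketch below divides each dimension into `xi` equal-width intervals and reports, per dimension, the intervals whose point count reaches a density threshold `tau`. This follows a CLIQUE-style bottom-up approach and is an assumption, not necessarily the method of the paper quoted above; `dense_units_1d`, `xi`, and `tau` are hypothetical names, and coordinates are assumed to be normalized to [0, 1].

```python
from collections import Counter


def dense_units_1d(data, xi, tau):
    """Return {dimension: sorted dense interval indices} (hypothetical sketch).

    data: non-empty list of equal-length tuples with coordinates in [0, 1].
    xi:   number of equal-width grid intervals per dimension.
    tau:  minimum number of points for an interval to count as dense.
    """
    dims = len(data[0])
    result = {}
    for j in range(dims):
        # Map each coordinate to its interval index; clamp 1.0 into the last cell.
        counts = Counter(min(int(p[j] * xi), xi - 1) for p in data)
        result[j] = sorted(cell for cell, c in counts.items() if c >= tau)
    return result


points = [(0.1, 0.9), (0.15, 0.2), (0.12, 0.5), (0.8, 0.55)]
print(dense_units_1d(points, xi=4, tau=2))  # {0: [0], 1: [2]}
```

In a bottom-up scheme, candidate dense units in two or more dimensions would then be formed only from combinations of these one-dimensional dense intervals.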
Permission to make digital or hard copies of all or part of this work
... When the data has one dimension, we display its estimated probability distribution. When the data has two dimensions, it can be displayed using scatterplots [10]. When the data has more than three dimensions, we need to apply visualization techniques. There are several multivariate-data visualizatio ...
UNIT V CLUSTERING, APPLICATIONS AND TRENDS IN DATA
... based on Euclidean or Manhattan distance measures. Algorithms based on such distance measures tend to find spherical clusters with similar size and density. However, a cluster could be of any shape. It is important to develop algorithms that can detect clusters of arbitrary shape. Minimal requiremen ...
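For reference, the two distance measures named above can be written directly. For points x and y in d dimensions, Euclidean distance is the square root of the sum of (x_i - y_i)^2, and Manhattan distance is the sum of |x_i - y_i|. A minimal sketch:

```python
import math


def euclidean(x, y):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    # Sum of absolute coordinate differences (city-block distance).
    return sum(abs(a - b) for a, b in zip(x, y))


print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

One way to see the bias toward compact clusters: the set of points within a fixed radius of a center is a ball under Euclidean distance and a diamond-shaped region under Manhattan distance, so methods that assign points by distance to a center tend to produce compact, convex groups rather than arbitrarily shaped ones.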
Multi-Step Density-Based Clustering
... Abstract. Data mining in large databases of complex objects from scientific, engineering or multimedia applications is becoming increasingly important. In many areas, complex distance measures are the first choice, but simpler distance functions are also available which can be computed much more efficien ...
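The snippet is truncated before it says how the cheaper distance is used. A common multi-step pattern, assumed here rather than taken from the paper, is filter-and-refine: if the cheap distance never exceeds the expensive one (a lower bound), any object whose cheap distance to the query already exceeds `eps` can be discarded without computing the expensive distance. Such eps-range queries are the core operation of density-based clustering. The names `range_query`, `cheap`, and `exact` are hypothetical; the example uses Chebyshev distance as a lower bound for Euclidean distance.

```python
import math


def range_query(objects, q, eps, cheap, exact):
    """Return (objects o with exact(q, o) <= eps, number of exact calls).

    Assumes cheap(q, o) <= exact(q, o) for every pair, so pruning on the
    cheap distance never discards a true result.
    """
    result = []
    exact_calls = 0
    for o in objects:
        if cheap(q, o) > eps:
            continue  # filter step: pruned without the expensive distance
        exact_calls += 1
        if exact(q, o) <= eps:  # refinement step on the surviving candidates
            result.append(o)
    return result, exact_calls


def chebyshev(p, q):
    # Largest coordinate difference; never larger than the Euclidean distance.
    return max(abs(a - b) for a, b in zip(p, q))


pts = [(0, 0), (1, 1), (3, 0), (1.2, 1.2)]
hits, calls = range_query(pts, (0, 0), 1.5, chebyshev, math.dist)
print(hits, calls)  # [(0, 0), (1, 1)] 3
```

Here (3, 0) is pruned by the filter alone, while (1.2, 1.2) passes the filter but is rejected in the refinement step.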
Nearest-neighbor chain algorithm

In the theory of cluster analysis, the nearest-neighbor chain algorithm is a method that can be used to perform several types of agglomerative hierarchical clustering, using an amount of memory that is linear in the number of points to be clustered and an amount of time linear in the number of distinct distances between pairs of points. The main idea of the algorithm is to find pairs of clusters to merge by following paths in the nearest neighbor graph of the clusters until the paths terminate in pairs of mutual nearest neighbors. The algorithm was developed and implemented in 1982 by J. P. Benzécri and J. Juan, based on earlier methods that constructed hierarchical clusterings using mutual nearest neighbor pairs without taking advantage of nearest neighbor chains.
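A minimal Python sketch of the chain idea follows. It uses Ward's method as the linkage, which is an assumed choice: the algorithm also applies to other linkages with the reducibility property, such as single, complete, and average linkage. Each cluster is stored only as its size and centroid, so memory stays linear in the number of points, and each nearest-neighbor search scans the active clusters, giving O(n²) time overall for this version. Ties are broken in favor of the previous chain element so that the chain cannot cycle. The function names are hypothetical.

```python
def ward_distance(a, b):
    """Ward merge cost between clusters a and b, each a (size, centroid) pair."""
    na, ca = a
    nb, cb = b
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq


def nn_chain_ward(points):
    """Ward agglomerative clustering via a nearest-neighbor chain (sketch).

    Input points get ids 0..n-1; each merge creates a new id n, n+1, ...
    Returns a list of (id_a, id_b, cost, new_id) tuples in merge order.
    """
    active = {i: (1, tuple(p)) for i, p in enumerate(points)}  # id -> (size, centroid)
    next_id = len(points)
    chain = []
    merges = []
    while len(active) > 1:
        if not chain:
            chain.append(next(iter(active)))  # start a new chain anywhere
        c = chain[-1]
        # Seed the search with the previous chain element so it wins ties.
        if len(chain) >= 2:
            best = chain[-2]
            best_d = ward_distance(active[c], active[best])
        else:
            best, best_d = None, float("inf")
        for k, cl in active.items():
            if k != c:
                d = ward_distance(active[c], cl)
                if d < best_d:
                    best, best_d = k, d
        if len(chain) >= 2 and best == chain[-2]:
            # c and best are mutual nearest neighbors: merge them.
            chain.pop()
            chain.pop()
            (na, ca), (nb, cb) = active.pop(c), active.pop(best)
            n = na + nb
            centroid = tuple((na * x + nb * y) / n for x, y in zip(ca, cb))
            active[next_id] = (n, centroid)
            merges.append((c, best, best_d, next_id))
            next_id += 1
        else:
            chain.append(best)  # extend the chain toward the nearest neighbor
    return merges


merges = nn_chain_ward([(0, 0), (0, 1), (10, 0), (10, 1)])
print(merges)  # [(1, 0, 0.5, 4), (3, 2, 0.5, 5), (5, 4, 100.0, 6)]
```

After a merge, the remaining chain is still a valid chain for reducible linkages, which is why the loop can continue from the new top of the chain instead of starting over.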