
Nearest-neighbor chain algorithm

In the theory of cluster analysis, the nearest-neighbor chain algorithm is a method for performing several types of agglomerative hierarchical clustering using an amount of memory that is linear in the number of points to be clustered and an amount of time that is linear in the number of distinct distances between pairs of points. The main idea of the algorithm is to find pairs of clusters to merge by following paths in the nearest-neighbor graph of the clusters until the paths terminate in pairs of mutual nearest neighbors. The algorithm was developed and implemented in 1982 by J. P. Benzécri and J. Juan, based on earlier methods that constructed hierarchical clusterings from mutual nearest-neighbor pairs without taking advantage of nearest-neighbor chains.
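The chain-following idea above can be sketched in a few dozen lines. The following is a minimal illustration, not from the article: it uses complete linkage (a reducible distance, which is what guarantees that merging mutual nearest neighbors reproduces the standard agglomerative hierarchy) with a dense distance matrix, and all function and variable names are the sketch's own.

```python
import numpy as np

def nn_chain(points):
    """Complete-linkage agglomerative clustering via nearest-neighbor chains.

    Maintains a stack (the "chain") of clusters; repeatedly extends it with
    the tip's nearest active neighbor until the last two entries are mutual
    nearest neighbors, then merges that pair. Returns the list of merges as
    (cluster_a, cluster_b, distance); merged clusters get new ids n, n+1, ...
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    m = 2 * n - 1                       # room for original + merged clusters
    d = np.full((m, m), np.inf)
    d[:n, :n] = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)         # a cluster is never its own neighbor

    active = set(range(n))
    chain, merges, nxt = [], [], n
    while len(active) > 1:
        if not chain:
            chain.append(min(active))   # start a new chain anywhere
        while True:
            a = chain[-1]
            # nearest active neighbor of the chain's tip
            b = min((c for c in active if c != a), key=lambda c: d[a, c])
            if len(chain) >= 2 and b == chain[-2]:
                break                   # a and b are mutual nearest neighbors
            chain.append(b)
        a, b = chain.pop(), chain.pop()
        merges.append((min(a, b), max(a, b), float(d[a, b])))
        # Lance-Williams update for complete linkage: take the max distance.
        for c in active:
            if c not in (a, b):
                d[nxt, c] = d[c, nxt] = max(d[a, c], d[b, c])
        active -= {a, b}
        active.add(nxt)
        nxt += 1
    return merges
```

Because the linkage is reducible, any clusters left on the chain after a merge are still valid nearest-neighbor steps, so the chain never has to be rebuilt from scratch; that is what keeps the total work proportional to the number of pairwise distances rather than the cubic cost of repeatedly rescanning for the globally closest pair.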