
Mining mass-spectra for diagnosis and biomarker - (CUI)
... to different masses. Second, if two clusters have been identified as possible candidates for merging, i.e., d(C1 , C2 ) ≤ 2 × merr , merging will only take place if the two farthest elements of the clusters have a distance which is smaller than 2 × merr (the second condition of the while loop). This ...
Data clustering: 50 years beyond K-means
... There are several books published on data clustering; classic ones are by Sokal and Sneath (1963), Anderberg (1973), Hartigan (1975), Jain and Dubes (1988), and Duda et al. (2001). Clustering algorithms have also been extensively studied in data mining (see books by Han and Kamber (2000) and Tan et ...
Decision Tree Construction
... Sequential Patterns • Considering the purchase from data in sample Table, each group of tuples with same custID can be viewed as a sequence of transactions ordered by date. • This allows us to identify frequently ...
A Framework for Grouping High Dimensional Data
... are two of the mostly used strategies. We referred to them in the following sections as PCAKM. Although we can initially reduce the dimensionality by any approach and then use clustering approaches to group high dimensional data, the performance can also be improved since these two techniques are co ...
Paper
... are typically small, atom-, bond- or ring-centred fragment substructures that are algorithmically generated from a connection table when a molecule is added to the database that is to be searched. One common approach to screening involves listing the fragments that have been chosen for use as screen ...
A Novel Data Mining Methodology for Narrative Text Mining and Its
... with well defined contents and formats, and nonstructural data in the form of narrative texts to provide background information with regard to each incident recorded. Most existing data mining methods were initially devised to work with structural data, and a lot of efforts have been devoted to maki ...
Nearest-neighbor chain algorithm

In the theory of cluster analysis, the nearest-neighbor chain algorithm is a method for performing several types of agglomerative hierarchical clustering. It uses an amount of memory that is linear in the number of points to be clustered and an amount of time that is linear in the number of distinct distances between pairs of points, that is, quadratic in the number of points. The main idea of the algorithm is to find pairs of clusters to merge by following paths in the nearest neighbor graph of the clusters until the paths terminate in pairs of mutual nearest neighbors. The algorithm was developed and implemented in 1982 by J. P. Benzécri and J. Juan, based on earlier methods that constructed hierarchical clusterings using mutual nearest neighbor pairs without taking advantage of nearest neighbor chains.
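
The chain-following idea can be illustrated with a minimal Python sketch. It is not the original 1982 implementation: the choice of Ward's linkage (computed from cluster sizes and centroids so that memory stays linear), the function names `ward_distance` and `nn_chain`, and the tie-breaking rule are all assumptions made for illustration.

```python
def ward_distance(a, b):
    """Ward merge cost between two clusters given as (size, centroid) pairs."""
    (na, ca), (nb, cb) = a, b
    squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * squared


def nn_chain(points):
    """Return a list of merges (id_a, id_b, distance) found by following
    nearest-neighbor chains. Input points get ids 0..n-1; every merged
    cluster gets the next unused id. Merges are NOT sorted by distance."""
    # Active clusters: id -> (size, centroid). Linear memory in len(points).
    clusters = {i: (1, tuple(p)) for i, p in enumerate(points)}
    next_id = len(points)
    merges = []
    chain = []  # stack of cluster ids, each the nearest neighbor of the previous

    while len(clusters) > 1:
        if not chain:
            # Start a new chain from an arbitrary active cluster.
            chain.append(next(iter(clusters)))
        top = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None

        # Find the nearest neighbor of the top cluster.
        # Ties are broken in favor of the predecessor so the chain terminates.
        best, best_d = None, float("inf")
        for cid, cluster in clusters.items():
            if cid == top:
                continue
            d = ward_distance(clusters[top], cluster)
            if d < best_d or (d == best_d and cid == prev):
                best, best_d = cid, d

        if best == prev:
            # top and prev are mutual nearest neighbors: merge them.
            chain.pop()
            chain.pop()
            na, ca = clusters.pop(top)
            nb, cb = clusters.pop(prev)
            n = na + nb
            centroid = tuple((na * x + nb * y) / n for x, y in zip(ca, cb))
            clusters[next_id] = (n, centroid)
            merges.append((prev, top, best_d))
            next_id += 1
        else:
            # Otherwise extend the chain and keep following it.
            chain.append(best)
    return merges


if __name__ == "__main__":
    for merge in nn_chain([(0.0,), (1.0,), (10.0,), (11.0,)]):
        print(merge)
```

Because the merges come out in the order the chains happen to find them, a caller that wants a standard dendrogram would sort them by distance afterwards. The sketch relies on Ward's linkage being reducible, meaning that merging two mutual nearest neighbors never brings the merged cluster closer to the clusters remaining on the chain; with a non-reducible linkage such as centroid linkage the same loop could produce a different hierarchy than the greedy algorithm.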