
MIS2502: Jing Gong
... using Pivot table is not data mining • Sum, average, min, max, time trend… ...
... speaking, there are two ways to analyze the data: central analysis and distributed analysis. A typical example of central analysis is DIDS, developed by the Division of Computer Science at the University of California, Davis, in the 1990s [2]. It was the first intrusion detection system that agg ...
Cluster Analysis Research Design model, problems, issues
... Structure of the database: Real-life data may not always contain clearly identifiable clusters. Also, the order in which the tuples are arranged may affect the results when an algorithm is executed if the distance measure used is not perfect. With structureless data (e.g., having lots of missing val ...
An Entropy-Based Subspace Clustering Algorithm for - Inf
... In subspace clustering, objects are grouped into clusters according to subsets of dimensions (or attributes) of a data set [9]. These approaches involve two main tasks: identification of the subsets of dimensions where clusters can be found, and discovery of the clusters from different subsets of dim ...
A Highly-usable Projected Clustering Algorithm for Gene Expression
... quality. However, the traditional functions used in evaluating cluster quality may not be applicable in the projected case. For example, if the average within-cluster distance to centroid is used within the selected subspace, then the fewer attributes are selected, the better the evaluation score will be r ...
Ensembles of Partitions via Data Resampling
... uncertainty from a set of different k-means partitions. The key idea of this approach is to integrate multiple partitions produced by clustering of pseudo-samples of a data set. Two issues, specific to the clustering combination, must be addressed: 1) The generative mechanism for individual partitio ...
Improved Multi Threshold Birch Clustering Algorithm
... parameter is larger than the optimal value, then the number of points put into sets is increased, which requires a continuously increasing extra cost while the leaf nodes representing sets are being clustered [21]. Also, beginning with a good initial value for the threshold would save about 10% of the time [9]. ...
Efficient Mining of web log for improving the website using Density
... depends upon the websites. There are two types of logs: 1. server logs and 2. client logs. The server log records all the activities on the server. The client log is not used much. The server log contains the following information: IP address, session, port, date, and time. By using the IP address, eac ...
Mining Quantitative Association Rules on Overlapped Intervals
... Clustering can be considered the most important unsupervised learning technique, which deals with finding a structure in a collection of unlabeled data. A cluster is therefore a collection of objects which are “similar” to each other and are “dissimilar” to the objects belonging to other clusters [8 ...
Nearest-neighbor chain algorithm

In the theory of cluster analysis, the nearest-neighbor chain algorithm is a method that can be used to perform several types of agglomerative hierarchical clustering, using an amount of memory that is linear in the number of points to be clustered and an amount of time linear in the number of distinct distances between pairs of points. The main idea of the algorithm is to find pairs of clusters to merge by following paths in the nearest neighbor graph of the clusters until the paths terminate in pairs of mutual nearest neighbors. The algorithm was developed and implemented in 1982 by J. P. Benzécri and J. Juan, based on earlier methods that constructed hierarchical clusterings using mutual nearest neighbor pairs without taking advantage of nearest neighbor chains.
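To make the chain-following idea concrete, here is a minimal Python sketch of the merging loop described above. It is an illustration under stated assumptions, not the 1982 Benzécri/Juan implementation: it uses complete linkage (a reducible dissimilarity, so merging mutual nearest neighbors yields the same hierarchy as ordinary greedy agglomeration), recomputes cluster dissimilarities from the raw points at each step rather than maintaining them incrementally, and the names `complete_linkage` and `nn_chain_clustering` are invented for this example.

```python
import numpy as np

def complete_linkage(points, a, b):
    """Complete-linkage dissimilarity: the maximum pairwise Euclidean
    distance between members of clusters a and b (tuples of point indices)."""
    pa, pb = points[list(a)], points[list(b)]
    return np.max(np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1))

def nn_chain_clustering(points):
    """Agglomerative clustering via nearest-neighbor chains: follow
    nearest-neighbor links from an arbitrary cluster until the chain ends in
    a pair of mutual nearest neighbors, then merge that pair.  Returns the
    merges as a list of (cluster_a, cluster_b) tuples of point indices."""
    active = [(i,) for i in range(len(points))]   # every point starts as its own cluster
    chain = []                                    # the current nearest-neighbor chain (a stack)
    merges = []

    while len(active) > 1:
        if not chain:
            chain.append(active[0])               # start a new chain from any active cluster
        top = chain[-1]
        # nearest active neighbor of the cluster at the top of the chain
        # (assumes no exact ties in dissimilarities, as in the textbook version)
        nearest = min((c for c in active if c != top),
                      key=lambda c: complete_linkage(points, top, c))
        if len(chain) >= 2 and nearest == chain[-2]:
            # top and nearest are mutual nearest neighbors: pop and merge them;
            # the rest of the chain remains valid because the linkage is reducible
            chain.pop(); chain.pop()
            active.remove(top); active.remove(nearest)
            active.append(tuple(sorted(top + nearest)))
            merges.append((top, nearest))
        else:
            chain.append(nearest)                 # extend the chain and keep following links
    return merges
```

For example, `nn_chain_clustering(np.random.rand(20, 2))` returns 19 merges; they may be discovered in a different order than a greedy agglomerative run would produce, but for a reducible linkage such as complete linkage they describe the same hierarchy. Recomputing dissimilarities from the points keeps this sketch short; the memory and time bounds quoted above rely instead on storing compact cluster summaries and updating dissimilarities as clusters merge.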