
Preprocessing data sets for association rules using community
... in ARcl or ARcd (the same rule can be extracted in different groups). The aim is to analyze the amount of knowledge that was repeatedly generated in different groups; the lower the value the better the result. • MN-RSP: ratio of new rules in ARcl or ARcd . A rule is new if it is not in the AR set. T ...
Environmental Data Exploration with Data
... In addition, monitoring certain areas involves often obtaining the on–line data. Thus, how long should we wait for accumulating the data before starting building the clusters? Can we simply somehow create the clusters “dynamically” as the data come? These above–mentioned problems represent typically ...
Educational Data Mining using Improved Apriori Algorithm
... uses the intersection operation to generate frequent item sets. It is different from the existing algorithm as it scans the database only one time and then uses the database to mine association rules. The proposed technique has been implemented in a teaching evaluation system, to enhance the foundat ...
Cluster By: A New SQL Extension for Spatial Data Aggregation*
... creasingly finer resolution. These ever growing datasets necessitate the wide application of spatial databases [15, 17, 13]. Queries on these geo-referenced data often require the aggregation of isolated data points to form spatial clusters and obtain properties of the clusters. However, current SQL ...
Internet Traffic Identification using Machine Learning
... information necessary for the tests, the flows must be identified within the traces. These flows, also known as connections, are a bidirectional exchange of packets between two nodes. These two nodes can be identified based on their IP addresses and transport layer port numbers which stay constant d ...
An Overview of Web Data Clustering Practices
... The algorithms for users’ sessions clustering may be classified into two approaches: similarity-based and model-based (or probabilistic). 2.1 Similarity-based clustering approach Similarity measures have been proposed towards capturing Web users’ common practices whereas effective Web users’ logs pro ...
LNCS 3268 - An Overview of Web Data Clustering
... correspond to communities with a definite topic of interest. In this framework, several approaches have been proposed (e.g. Maximum Flow and Minimal cuts, graph cuts and partitions, PageRank algorithm etc.) in order to identify them [12]. – Compound Documents are represented as Web graphs, which are ...
14 Resampling Methods for Unsupervised Learning from Sample Data Ulrich Möller
... It has been shown that for increasing values of N, the percentage of original data which are not contained in a bootstrap sample converges to about 37%. If this information loss is considered to be too large for an adequate recognition of the data structure, the bootstrap scheme could be applied to ...
Adaptive Product Normalization: Using Online Learning for Record
... collective approaches have been shown to be more accurate than the pairwise approach on certain domains, the simultaneous inference process makes these methods more computationally intensive. ...
Low-rank Kernel Matrix Factorization for Large Scale Evolutionary
... version for static data) and Colibri-D (for dynamic data), compute the low-rank approximation to a matrix with a non-redundant subspace, and are proved to lose no accuracy compared to the best competitors, e.g., CUR [45] and CMD [46], while achieving significant savings in space and time. Colibri-D i ...
Data Mining: Mining Association Rules Definitions
... If X is a frequent itemset in T , then its every non-empty subset is also a frequent itemset in T . Why is this useful? Any frequent itemset discovery algorithm is essentially a specialized search algorithm over the space of all itemsets. Apriori principle allows us to prune potentially a lot of ite ...
Nearest-neighbor chain algorithm

In the theory of cluster analysis, the nearest-neighbor chain algorithm is a method that can be used to perform several types of agglomerative hierarchical clustering. It uses an amount of memory that is linear in the number of points to be clustered and an amount of time that is linear in the number of distinct distances between pairs of points (that is, quadratic in the number of points). The main idea of the algorithm is to find pairs of clusters to merge by following paths in the nearest neighbor graph of the clusters until the paths terminate in pairs of mutual nearest neighbors. The algorithm was developed and implemented in 1982 by J. P. Benzécri and J. Juan, based on earlier methods that constructed hierarchical clusterings using mutual nearest neighbor pairs without taking advantage of nearest neighbor chains.
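The chain-following idea can be illustrated with a minimal Python sketch. Several choices here are assumptions made for illustration and do not come from the text above: the function name nn_chain, the use of complete linkage as the cluster distance, and the tie-breaking rule that prefers the previous chain element. For simplicity the sketch keeps a full table of cluster distances, so it does not achieve the linear memory bound described above.

```python
def nn_chain(points, dist):
    """Agglomerative clustering by following nearest-neighbor chains.

    Illustrative sketch only: uses complete linkage and stores all
    pairwise cluster distances (quadratic memory).
    Returns a list of merges (members_a, members_b, distance).
    """
    n = len(points)
    clusters = {i: [i] for i in range(n)}  # cluster id -> point indices
    # d[a][b] is the complete-linkage distance between active clusters a and b.
    d = {i: {j: dist(points[i], points[j]) for j in range(n) if j != i}
         for i in range(n)}
    merges, chain, next_id = [], [], n

    while len(clusters) > 1:
        if not chain:
            chain.append(min(clusters))  # arbitrary starting cluster
        # Extend the chain until its last two elements are mutual nearest neighbors.
        while True:
            top = chain[-1]
            prev = chain[-2] if len(chain) > 1 else None
            # On ties, prefer the previous chain element so the chain terminates.
            best = min(d[top], key=lambda c: (d[top][c], c != prev, c))
            if best == prev:
                break
            chain.append(best)

        # Merge the mutual nearest neighbors at the end of the chain.
        b, a = chain.pop(), chain.pop()
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d[a][b]))
        new, next_id = next_id, next_id + 1
        d[new] = {}
        for c in clusters:
            if c in (a, b):
                continue
            # Complete-linkage update: distance to the farther of the two parts.
            d[new][c] = d[c][new] = max(d[a][c], d[b][c])
            del d[c][a], d[c][b]
        del d[a], d[b]
        clusters[new] = clusters.pop(a) + clusters.pop(b)
        # The rest of the chain is kept and reused in the next iteration.
    return merges


if __name__ == "__main__":
    for merge in nn_chain([0.0, 4.0, 6.0, 7.0], lambda x, y: abs(x - y)):
        print(merge)
```

The merges are produced in the order the chain discovers them, which is not necessarily sorted by distance; sorting them by distance afterwards gives the usual bottom-up dendrogram order.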