An Efficient k-Means Clustering Algorithm Using Simple Partitioning

... to cluster a large dataset becomes an important operational objective. To solve this and other related performance problems, Alsabti et al. [1] proposed an algorithm based on the data structure of the k-d tree and used a pruning function on the candidate centroid of a cluster. While this method can ...
Aggregated Probabilistic Fuzzy Relational

... similarity to human reasoning. The theory has been successfully applied to many fields such as manufacturing, engineering, diagnosis, economics, and others (Höppner, 1999). In this context, a generalization of the previous methods in order to be used in clustering of fuzzy data (or fuzzy numbers) ...
A Mutual Subspace Clustering Algorithm for High Dimensional

... information in the clustering spaces is used to form the mutual subspace clusters. In the cluster assignment, if the signature subspaces in the clustering spaces agree with each other, then that cluster can become stable. That is, in the clustering spaces the centers attract approximately the same se ...
cougar^2: an open source machine learning and data mining

... extends existing work, which leads to interoperability problems, and significant effort is essentially wasted porting one algorithm into a different learning framework. (3) Former developers leave code without adequate documentation, which makes it very difficult to reuse and improve. These problems ...
Boosting for Real-Time Multivariate Time Series Classification

CLOPE: A Fast and Effective Clustering Algorithm for - Inf

... The LargeItem [13] algorithm groups large categorical databases by iterative optimization of a global criterion function. The criterion function is based on the notion of a large item, that is, an item in a cluster whose occurrence rate is larger than a user-defined parameter, minimum support. Computing ...
O - National Yunlin University of Science and Technology

Explaining clusters with inductive logic programming and linked data

... Knowledge Discovery in Databases (KDD) is the process of detecting hidden patterns in large amounts of data [2]. In many real-world contexts, the explanation of such patterns is provided by experts, whose work is to analyse, visualise and interpret the results obtained out of a data mining process i ...
Genetic Algorithms for Multi-Criterion Classification and Clustering

... string of n integers where the ith integer signifies the group number of the ith object. When there are two clusters this can be reduced to a binary encoding scheme by using 0 and 1 as the group identifiers. Bezdek et al. [30] used a k × n matrix to represent a clustering, with each row corresponding to ...
Supervised Learning for Automatic Classification of Documents

slide

... 1. Exhaustive Recursive Search (ERS): the input network is represented by an adjacency matrix M. (motif size <= 4) 2. ESU: starting with individual nodes and adding one node at a time until the required size k is reached. (motif size <=14) ...
Soil data clustering by using K-means and fuzzy K

Streaming-Data Algorithms For High

... medical or marketing data, for example, the volume of data stored on disk is so large that it is only possible to make a small number of passes over the data. In the data stream model [13], the data points can only be accessed in the order in which they arrive. Random access to the data is not allo ...
Using support vector machines in predicting and classifying factors

... dimensional space (feature space) are written as the input sample space. By increasing dimensions, it is generally possible to increase linear separability. SVM finds the optimal decision boundary in the feature space. It is determined by mapping the hyper-plane into the inp ...
Slide 1 - Department of Computer Science

Paper Title (use style: paper title)

... introduces a novel hierarchical data structure, the CF-tree, for compressing the data into many small sub-clusters and then performs clustering with these summaries rather than the raw data. A Clustering Feature Tree (CF-tree) is a hierarchical data structure for multiphase clustering. For each successi ...
Scaling Clustering Algorithms to Large Databases

network traffic clustering and geographic visualization

... To get around these obstacles, one proposal is to characterize network traffic based on features of the transport-layer statistics irrespective of port-based identification or payload content. The idea here is that different applications on the network will exhibit different patterns of behavior wh ...
International Journal of Computational Intelligence Volume 2

... The intersection of two sets is the set of elements that belong to both sets simultaneously. Clustering is realised using intersections. An intersection describes a pattern. All objects meeting the description form a cluster. The purpose is, when needed, to find all existing intersections of attrib ...
Survey of Streaming Data Algorithms

Grid-based Supervised Clustering Algorithm using Greedy and

... grid-based methods, and model-based methods. Unlike the goal of traditional unsupervised clustering, the goal of supervised clustering is to identify class-uniform clusters that have high data densities [11],[24]. According to them, not only data attribute variables, but also a class variable, take ...
Parallel K-Means Algorithm for Shared Memory Multiprocessors

APPLYING PARALLEL ASSOCIATION RULE MINING TO

CS685 : Special Topics in Data Mining, UKY

Clustering - Network Protocols Lab


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These heuristics are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions in that both use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters; this is known as the nearest centroid classifier or Rocchio algorithm.
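The standard heuristic for k-means (often called Lloyd's algorithm) alternates two steps until the assignments stop changing: assign each observation to its nearest center, then recompute each center as the mean of the observations assigned to it. The sketch below, written in Python with NumPy, illustrates this iterative refinement together with nearest-centroid classification of new points; the function names and the toy data are illustrative assumptions, not code taken from any of the documents listed above.

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    # Lloyd's heuristic: alternate assignment and mean-update steps.
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each observation joins the cluster with the nearest mean.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned observations.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments are stable; a local optimum has been reached
        centers = new_centers
    return centers, labels

def nearest_centroid(points, centers):
    # Nearest centroid classifier: label each new point by its closest k-means center.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Toy example (hypothetical data): two well-separated blobs in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
centers, labels = kmeans(X, k=2)
print(nearest_centroid(np.array([[0.2, 0.1], [4.8, 5.1]]), centers))

Because the heuristic converges only to a local optimum, it is common in practice to run it from several random initializations and keep the result with the smallest within-cluster sum of squared distances.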