Behavior of proximity measures in high dimensions

... two points. For low to medium dimensional data, density-based algorithms such as DBSCAN [EKSX96], CLIQUE [AGGR98], MAFIA [GHC99], and DENCLUE [HK98] have been shown to find clusters of different sizes and shapes, although not of different densities. However, in high dimensions, the notion of density is p ...

Secure Mining of the Outsourced Transaction Databases

Speeding up k-Means by GPUs

Cluster Center Initialization for Categorical Data Using Multiple

... of clusters in the data that may mislead the interpretation of the results. It also runs into problems when clusters have differing sizes, densities, or non-globular shapes. K-means does not guarantee a unique clustering, because the random choice of initial cluster centers may yield different groupin ...

A Two-Staged Clustering Algorithm for Multiple Scales

... Most clustering algorithms treat different fields of data with equal weights and calculate the “distance” using the same method. They ignore the fact that different fields of data have different scales; therefore, the “distance” should be calculated differently. This study incorporated a traditional ...

A Mixture Model of Clustering Ensembles

A Study of Network Intrusion Detection by Applying

... where E is the sum of the squared error for all objects in the data set; p is the point in space representing a given object; and m_i is the mean of cluster C_i (p and m_i are multidimensional) [21]. 2. K-MEDOIDS: K-Medoids attempts to minimize the distance between points and its centroid. This clustering alg ...
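The quantity E described in this excerpt can be sketched in a few lines. This is a minimal illustration, not code from the cited paper: the function name `sum_squared_error` and the list-of-tuples data layout are assumptions made here, and the excerpt's own formula is truncated, so the sketch follows only the verbal definition (squared distance from each point p to the mean m_i of its cluster C_i, summed over all clusters).

```python
def sum_squared_error(clusters):
    """Return E for a list of clusters, each a non-empty list of points (tuples).

    Hypothetical helper: for every cluster C_i, compute its mean m_i and add
    the squared Euclidean distance from each point p in C_i to m_i.
    """
    total = 0.0
    for points in clusters:
        if not points:
            continue  # an empty cluster contributes nothing
        dim = len(points[0])
        # m_i: coordinate-wise mean of the cluster
        mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        for p in points:
            total += sum((p[d] - mean[d]) ** 2 for d in range(dim))
    return total


# Two small illustrative clusters in the plane.
example_clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
error = sum_squared_error(example_clusters)
```

A lower E means points sit closer to their cluster means, which is the criterion k-means tries to reduce.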

Image Clustering For Feature Detection

... Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, ...

A Comprehensive Study of Challenges and Approaches

... Clustering has proven to be one of the most effective methods for analyzing datasets containing a large number of objects with plentiful attributes. Clustering groups objects with similar attributes into clusters. A cluster is defined as a subset of objects with similar attributes and objects whi ...

Abstract - International Cartographic Association

... The automatic derivation of unknown information from databases is also known under the term Data Mining or Knowledge Discovery [Frawley et al. 1991]. Data mining techniques are used to derive, from huge data sets, unknown information that is not visible to a human. This applies only partly t ...

Sequence Clustering in Data Streams

College Recommendation System

... Vishwakarma Institute of Information Technology, Pune. 411 048 Abstract: Educational organizations are an important part of our society and play a vital role in the growth and development of any nation. Getting into an appropriate college is therefore of foremost importance. We are proposing a system w ...

Study of Hybrid Genetic algorithm using Artificial Neural Network in

Data mining process - Department of Computer Science

LeaDen-Stream - Scientific Research Publishing

Spectral Clustering Using Optimized Gaussian Kernel

The machine learning in the prediction of elections

... to the global objective function minimum. The algorithm is also highly sensitive to the initial randomly selected cluster centres. The k-means algorithm can be run multiple times to reduce this effect. K-means is a simple algorithm that has been adapted to many problem domains; it is a good c ...

A Density-Based Algorithm for Discovering Clusters in Large Spatial

Security Applications for Malicious Code Detection Using

An Incremental Hierarchical Data Clustering Algorithm Based on

... design of modern clustering algorithms is that, in many applications, new data sets are continuously added into an already huge database. As a result, it is impractical to carry out data clustering from scratch whenever there are new data instances added into the database. One way to tackle this cha ...

StreamDM: Advanced Data Mining in Spark Streaming

ICARUS, arxiv:0812:2373 - IDS-NF

... LArsoft: ArgoNeuT, MicroBooNE, data analysis code A. Rubbia's group on data analysis (travel grant) ...

A Study of Various Clustering Algorithms on Retail Sales

... and the selected attributes or features. Research areas include data mining, statistics, machine learning, biology, spatial database technology and marketing. Clustering is a form of unsupervised learning. Unlike classification, it does not rely on predefined classes and class-labeled training examp ...

CS2075964

... METHODS: Given a set of clusterings, find a single clustering that agrees as much as possible with the input clusterings. An important point in combining clusterings is that doing so is particularly useful when they differ. This can be achieved by using different feature sets as well as by different training se ...

Literature Review on Data Mining Techniques

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, since both algorithms use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
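The iterative refinement described above can be sketched as follows. This is a minimal illustration of one common heuristic (often called Lloyd's algorithm), assuming points are given as tuples of numbers; the function names `kmeans` and `nearest_center`, the `max_iter` and `seed` parameters, and the choice of random data points as initial centers are all assumptions made for this sketch, not part of the description above.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point` (1-nearest neighbor on the centers)."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, max_iter=100, seed=0):
    """Alternate assignment and mean-update steps until the labels stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization: k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:
            break  # converged, possibly only to a local optimum
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Two well-separated groups of illustrative points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)

# Nearest centroid classification of a new observation.
new_label = nearest_center((9, 9), centers)
```

Because the result depends on the initial centers, a practical run would typically repeat the procedure with several seeds and keep the partition with the smallest within-cluster sum of squares.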

Top subcategories
