View PDF - International Journal of Computer Science and Mobile

... Today, as computer technology develops and the degree of informatization rises, people recognize that the data they need is massive. Data mining is the process of extracting important information and knowledge from the large ...

Locally adaptive metrics for clustering high dimensional data

... each object to the cluster with the closest representative, so that the sum of the squared differences between the objects and their representatives is minimized. Finding a set of representative vectors for clouds of multidimensional data is an important issue in data compression, signal coding, pat ...

COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 2

... CLIQUE: The Major Steps Partition the data space and find the number of points that lie inside each cell of the partition. Identify the subspaces that contain clusters using the ...

Analysis of Sequential Pattern Mining

... surveillance is used for this task. By monitoring ocean currents, weather patterns can be predicted in advance, warning populated areas under risk of hurricanes and tornados. However, these short-term warnings are effective only if relief programs are planned and efficiently carried out. We are focu ...

Cluster Ensembles for Big Data Mining Problems

... mensionality reduction method. Once a sample is available, clusterers φi obtain a data partition πi,j from it. A clusterer refers to a clustering algorithm with a fixed set of parameters (for example, a k-means instance with k = 7, or a selforganizing map with size 30x30). They can be run several ti ...

(PD) Algorithm for Finding All Frequent Patterns in Large Datasets

... The PD algorithm shrinks the dataset each time infrequent itemsets are discovered. More specifically, it finds frequent sets by employing a bottom-up search. For a given transaction dataset D1, the first pass has two phases: 1) the algorithm counts item occurrences to determine the frequent 1- ...

Decision Tree Construction

R Reference Card for Data Mining

... APRIORI Algorithm ...

Chapter 9 Part 1

Chapter11

A Method to Improve the Accuracy of K

Permission to make digital or hard copies of all or part of this work

... When the data has one dimension, we display its estimated probability distribution. When the data has two dimensions, it can be displayed using scatterplots [10]. When the data has more than three dimensions, we need to apply visualization techniques. There are several multivariate-data visualizatio ...

Agricultural Recommender Using Data Mining Techniques

... pattern from it. In this we need to make inferences from immense data so that we can make decisions driven by knowledge. Various factors which affect the production of crops like soil type, crop price and other factors are taken into consideration. Data mining is the process of knowledge discovery i ...

View/Download-PDF - International Journal of Computer Science

... distance are minimized and inter-class are maximized. In this paper we review the k-means clustering technique. Selecting suitable cluster centres, which serve as centroids, is the main aim of k-means clustering. Outlier detection: existing data objects that do not obey the general performance or model ...

Applying data mining in the context of Industrial Internet

Simulating Price Interactions by Mining Multivariate Financial Time Series

... Each time series is considered a feature of the training data, i.e., observations are the prices of the financial products for consecutive days. After performing a logarithmic normalization of the values, so that the specific range of each asset price is disregarded, the historical data forms the tr ...

An Efficient Mechanism for Data Mining with Clustering

... Today, information technology plays a very important role in every aspect of human life. It is extremely necessary to collect data from dissimilar sources. This data can be stored and maintained to create information and knowledge. Data mining is the non-trivial procedure of identifying suit ...

Visual Data Mining: Framework and Algorithm Development

... candidate such that it meets user goal specifications. • Model Candidate Generator transforms the current model candidate into a new model candidate by selecting one model atom to expand from the expandable leaf model atoms. • Model Constraints (used by Model Candidate Generator) provide controls an ...

an unsupervised neural network and point pattern analysis approach

... map. A set of input vectors is used to train the SOM. For each input vector the neuron with the shortest Euclidean distance of its prototype vector is determined, which is also referred to as the best matching unit (BMU). Then, the BMU’s prototype vector and the prototype vectors in a certain vicini ...

Clustering and Outlier Analysis For Data Mining (COADM)

... Hence, if factors uncontrollable by the Blue Force, such as Red Force tactics and behavior, resulted in the circumstances becoming unfavourable (e.g. falling into Cluster 3 outcomes), Blue force must attempt to exploit outlier case 5921 by moving swiftly and stealthily, and engaging more aggressivel ...

A Topological-Based Spatial Data Clustering

... algorithm starts by clustering the data points using a small initial value of r (e.g., 0.1). After that, the algorithm iteratively increases the value of r and clusters the objects. This process continues until the accuracy value (true ratio) starts to degrade or reaches 100% (whichever occurs first). ...

Revealing True Subspace Clusters in High Dimensions

... between the intersection area and the cluster boundary, as well as the percentage of data points falling in the intersection region. In each dimension, the adhesion strength h of two clusters is captured in terms of both data points and physical space. Two clusters can adhere to each other only if t ...

OPTICS: Ordering Points To Identify the Clustering Structure

Apriori Algorithm

... worker nodes. This fragmentation keeps the possibility to increase the efficiency of the data processing. Due to the asynchronous communication the worker nodes can process a data fragment while a new one is arriving through the network communication channel. In the current implementation the size o ...

A Survey On feature Selection Methods For High Dimensional Data

... suffers from two weakness that is it is hard to interpret the Result validation: Feature selection method must be validating resultant features when using all dimensions for embedding by carrying out different tests and comparison with previously and the original data inevitably contains noisy featu ...

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, in that both algorithms use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
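The iterative refinement described above can be illustrated with a minimal sketch of one common heuristic, usually called Lloyd's algorithm. It is not taken from any of the documents listed on this page; the function names (`squared_distance`, `nearest_center`, `kmeans`), the random initialization from the data points, and the toy data are all assumptions made for illustration. The last lines show the nearest centroid idea: a new point is classified by a 1-nearest-neighbor lookup on the learned cluster centers.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point` (1-nearest neighbor on centers)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical helper: Lloyd-style k-means on a list of tuples.

    Returns (centers, labels). Only a local optimum is guaranteed,
    and the result depends on the random initialization.
    """
    rng = random.Random(seed)
    # Assumption: initialize centers with k distinct data points.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the means will not move either
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Toy data (made up): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)

# Nearest centroid classification of a new observation.
new_label = nearest_center((9, 9), centers)
```

Each assignment step implicitly partitions the space into the Voronoi cells of the current centers. In practice a library implementation with a more careful initialization and several restarts is usually preferable to a hand-written loop like this one.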
