Accelerating BIRCH for Clustering Large Scale Streaming Data

Data Mining Methods for Recommender Systems

... For instance, in stratified sampling the data is split into several partitions based on a particular feature, followed by random sampling on each partition independently. The most common approach to sampling consists of using sampling without replacement: When an item is selected, it is removed from ...
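A minimal sketch of the stratified sampling described above, using only the Python standard library. The function name `stratified_sample`, the `fraction` parameter, and the toy `rows` data are assumptions made for illustration; they do not come from the quoted source.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Split rows into partitions by key(row), then sample each one independently."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    sample = []
    for members in strata.values():
        # Keep at least one item per partition (an illustrative choice).
        k = max(1, round(fraction * len(members)))
        # random.sample draws without replacement: a selected item cannot recur.
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data: 10 "south" rows and 30 "north" rows.
rows = [{"user": i, "region": "north" if i % 4 else "south"} for i in range(40)]
picked = stratified_sample(rows, key=lambda r: r["region"], fraction=0.2)
```

Because each partition is sampled separately, the 1:3 ratio between the two regions is preserved in `picked`.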

An Efficient Algorithm for Mining Frequent Items in Data Streams

... a technique called dynamic tree restructuring to handle stream data. The main disadvantage of this algorithm is that it reconstructs the tree every time a new item arrives, which costs additional memory as well as time. Pauray S.M. Tsai [13] proposed a new technique called the weighted s ...

A Hybrid Fuzzy Firefly Based Evolutionary Radial Basis Functional

... To compare the performance of our proposed method we have considered two other evolutionary methods which are FA-RBFN and PSO-RBFN. In case of FA-RBFN method all the parameters of the RBF network are optimized by means of the firefly algorithm simultaneously. So, a firefly is encoded as a combinatio ...

A Multi-Resolution Clustering Approach for Very Large Spatial

... Grid-based method (STING) for spatial data mining [WYM97]. They divide the spatial area into rectangular cells using a hierarchical structure. They store the statistical parameters (such as mean, variance, minimum, maximum, and type of distribution) of each numerical feature of the objects within ce ...

Extended Naive Bayes classifier for mixed data

... where n_ac is the number of instances in the training set that have the class value c and the attribute value a, while n_a is the number of instances that simply have the attribute value a. Due to horizontal partitioning of data, each party has partial information about every attribute. Each party ...

Data mining on graphics processors

... The results are shown in Fig. 6, Fig. 7 and Fig. 8. From the experiments, we can see that TBI-GPU performs better than TBI-CPU on the Retail and Chess data sets, and PBI-GPU performs better than TBI-GPU in most cases. PBI-GPU, TBI-GPU and TBI-CPU, as implemented in this paper, perform better than the ori ...

An Efficient Approach for Test Suite Reduction using Density based

... specification. [2] Classification rules are first generated with the help of the Weka classifier, which classifies the SRS into functional and non-functional requirements. A state diagram is derived from these and then transformed into test cases, to which clustering techniques are applied to mine test case ...

Extensions to the k-Means Algorithm for Clustering Large Data Sets

No Slide Title - CSE, IIT Bombay

... Repeat until stabilization: (1) assign each point to the closest cluster center; (2) generate new cluster centers; (3) adjust clusters by merging/splitting ...

Association Rule Generation in Streams

IOSR Journal of Computer Engineering (IOSR-JCE)

Mining Frequent Patterns Without Candidate Generation

... Some popular ones include: Minkowski distance: $d(i, j) = \sqrt[q]{|x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \cdots + |x_{ip} - x_{jp}|^q}$, where $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$ are two p-dimensional data objects, and q is a positive integer ...
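The Minkowski distance can be sketched in a few lines of Python. The function name `minkowski` and the sample points are assumptions for illustration; q = 1 gives the Manhattan distance and q = 2 the Euclidean distance.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two p-dimensional points."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    if q < 1:
        raise ValueError("q must be a positive integer")
    # Sum |x_k - y_k|^q over all p coordinates, then take the q-th root.
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


manhattan = minkowski((0, 0), (3, 4), q=1)
euclidean = minkowski((0, 0), (3, 4), q=2)
```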

Robust Outlier Detection Technique in Data Mining- A

Preface - Society for Industrial and Applied Mathematics

... Cluster analysis is an unsupervised process that divides a set of objects into homogeneous groups. There have been many clustering algorithms scattered in publications in very diversified areas such as pattern recognition, artificial intelligence, information technology, image processing, biology, p ...

Mining Quantitative Maximal Hyperclique Patterns: A

... attributes. An intuitive and simple solution is to partition quantitative attributes into binary attributes. However, there is potential information loss due to partitioning. Instead, our approach is based on a normalization scheme and can directly work on quantitative attributes. In addition, we ad ...

Chapter 10

... Typical methods: COD (obstacles), constrained clustering. Link-based clustering: objects are often linked together in various ways; massive links can be used to cluster objects (SimRank, LinkClus) ...

Anomaly Detection Using Mixture Modeling

Characterization of unsupervised clusters with the simplest

... examples into clusters such that examples within a cluster are similar. Recently, an important research effort has been devoted to the integration of cluster characterization into such methods. In conceptual clustering, examples are given by attribute-value pairs (e.g., the definition of medical sym ...

Unsupervised learning

A Robust Data Scaling Algorithm for Gene Expression Classification

... gene selection [5], and dynamic modeling [6]. Before performing any machine learning and data mining tasks, a preprocessing step is always recommended to smooth, generalize, and scale the data [7]. Data scaling is particularly important for models that utilize distance measures; e.g., nearest neighb ...

Branko Kavšek, Nada Lavrač - ailab

... data and decided to examine only the accidents that happened in 10 districts (called Local Authorities (LAs)) across Great Britain. We have chosen the 5 areas with the most increasing trend of accidents and 5 areas with the most decreasing trend according to the results of regression analysis of the ...

Consensus Clustering

... to find K cluster centers that are furthest from each other. The algorithm starts by finding the pair of data points that are furthest apart and assigning them as cluster centers. It then repeatedly picks as the next cluster center the point that is furthest from the previously found centers. At the end, all data ...
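The seeding procedure described in this excerpt might look roughly like the sketch below. It assumes that "furthest from the found centers" means the point whose distance to its nearest existing center is largest (the usual farthest-first rule); the excerpt does not spell this out, and the function name and sample points are hypothetical.

```python
import math
from itertools import combinations


def farthest_first_centers(points, k):
    """Pick k mutually distant centers from a list of coordinate tuples."""
    if k < 2 or k > len(points):
        raise ValueError("k must be between 2 and the number of points")
    # Step 1: the two points that are furthest apart become the first centers.
    a, b = max(combinations(points, 2), key=lambda pair: math.dist(*pair))
    centers = [a, b]
    # Step 2: repeatedly add the point furthest from its nearest chosen center.
    while len(centers) < k:
        candidates = [p for p in points if p not in centers]
        nxt = max(candidates, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


points = [(0, 0), (10, 0), (5, 8), (1, 1), (9, 1)]
centers = farthest_first_centers(points, k=3)
```

The all-pairs search in step 1 is quadratic in the number of points, so this form is only practical for small data sets.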

PPT

Developing High Risk Clusters for Chronic Disease Events with

... The following section of this paper presents some general definitions for understanding of our approach. In section 3, we introduce our approach in general, including data description and staple data preprocessing. That is followed by section 4 and 5 ...


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier on the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as nearest centroid classifier or Rocchio algorithm.
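As a rough illustration, the sketch below implements one common iterative-refinement heuristic (Lloyd's algorithm) and then reuses the resulting centers as a nearest centroid classifier. The random initialization, the iteration cap, and all names and sample data are assumptions made for this example, not details given in the text above.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to point (1-nearest neighbor on the centers)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's heuristic: alternate assignment and mean update until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization: k random points
    for _ in range(iterations):
        # Assignment step: each observation joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # no change: a local optimum was reached
            break
        centers = new_centers
    return centers


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers = kmeans(points, k=2)
label = nearest((8.2, 8.8), centers)  # classify a new point into an existing cluster
```

The result is only a local optimum and can depend on the initial centers, which is why practical implementations usually run several restarts.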
