rough set theory and fuzzy logic based warehousing of

initialization of optimized k-means centroids using

... utilizes the echolocation behavior of bats. This algorithm does not require the user to specify the number of centroids in advance. However, this KMBA does not guarantee a unique clustering, because different randomly chosen initial clusters give different results. The final cluster centroids may not be the o ...

Survey on Density Based Clustering for Spatial Data

An adaptive rough fuzzy single pass algorithm for clustering large

... divides the data set into a set of overlapping clusters. To define the clusters it employs the Rough set theory and here each cluster is represented by a leader, a Lower Bound and an Upper Bound. The Lower Bound of a cluster contains all the patterns that definitely belong to the cluster. There can be ...

KACU: K-means with Hardware Centroid

... in particular the commercial FPGA [1][9], it creates a new scope for the design space, changes the view on algorithmic problem solving and has the advantage of being extremely powerful for many applications. In this paper we design a specific hardware solution to accelerate the processing speed of K ...

A Survey on Mining Actionable Clusters from High Dimensional

... Two novel algorithms to mine FCCs from 3D datasets are introduced. The first scheme is a Representative Slice Mining (RSM) framework that can be used to extend existing 2D FCP mining algorithms for FCC mining. The second technique, called CubeMiner, is a novel algorithm that operates on the 3D space ...

Statistical Computing

mt13-req

Analysis of Optimized Association Rule Mining Algorithm using

... version of the Apriori algorithm was tested with a real data set. The dataset comprised 1000 entries and 5000 entries [5]. The data set is that of bakery sales, which consists of entries in the form of a sparse vector representation: Receipt# followed by the item #'s that are on that receipt. This dataset ...

ii. requirements and applications of clustering

ppt - CIS @ Temple University

... then condense attribute lists by discarding examples that correspond to the pure node SLIQ is able to scale for large datasets with no loss in accuracy – the splits evaluated with or without pre-sorting are identical ...

a comparative study of different clustering technique

Text Documents Clustering

... Abstract— Big amounts of textual information are generated every day, and existing techniques can hardly deal with such information flow. However, users expect fast and exact information management and retrieval tools. Clustering is a well known technique for grouping similar data and in such a way ...

85. analysis of outlier detection in categorical dataset

... threshold value to find frequent item sets from the dataset, then these techniques can become very slow [11]. The Attribute Value Frequency (AVF) algorithm is a simple and faster approach to detecting outliers in a categorical dataset, which minimizes the number of scans over the data. It does not create more space ...

What is CLIQUE - ugweb.cs.ualberta.ca

... Two K-dimensional units u1, u2 are connected if they have a common face, or if there exists another K-dimensional unit ui, such that u1, ui and u2 are connected consequently. A region in K dimensions is an axis-parallel rectangular K-dimensional set. ...

Improving Clustering Performance on High Dimensional Data using

... Hubness is viewed as a local centrality measure and can be used for clustering high dimensional data in various ways. There are two types of hubness, namely global hubness and local hubness [2]. Local hubness can be defined as a restriction of global hubness on any given cluster of the cur ...

Time Series Analysis of VLE Activity Data

Clustering - IDA.LiU.se

... Create a workflow diagram with an Input Data Source node and a Clustering node. Import and assign the data in ‘lakesurvey.xls’ to the Input Data Source node. This Excel document ‘lakesurvey.xls’ contains water quality data from a survey of 2782 Swedish lakes that was carried out in 2005. Further inf ...

lecture notes

... • Being able to deal with high-dimensionality • Minimal input parameters (if any) • Interpretability and usability • Reasonably fast (computationally efficient) ...

Clustering

Data Mining Techniques using in Medical Science

... process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. The Cluster tab is also supported, which shows the list of machine learning tools. These tools in general operate on a clustering algorithm and run it multiple times to manipulating algori ...

Towards a Collaborative Platform for Advanced Meta-Learning in Healthcare Predictive Analytics

... OpenML is not fully distributed but can be installed on local instances which can communicate with the main OpenML database using mirroring techniques. The downside of this approach is that code (machine learning workflows), datasets, experiments (models and evaluations) are physically kept on local ...

Dimensionality Reduction Using CLIQUE and Genetic

a survey: fuzzy based clustering algorithms for big data

... K. Vidhya(Assist.Prof.(Sr.G)) has completed B.E(Computer Science and Engineering) from Muthayammal Engineering College, Namakkal and M.E from Government College of Technology , Coimbatore. She is pursuing research in the domain of Cloud based Data Analytics. She is presently working as an Assistant ...


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier on the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as nearest centroid classifier or Rocchio algorithm.
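
The iterative refinement described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes squared Euclidean distance, picks the initial centers at random from the data points, and uses hypothetical helper names (`squared_distance`, `nearest_center`, `k_means`) that do not come from any library. Like the heuristics mentioned in the text, it only reaches a local optimum, so a different `seed` may give a different result on harder data.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point` (1-nearest-neighbor on centers)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def k_means(points, k, max_iter=100, seed=0):
    """Iterative refinement: assign points to the nearest mean, then recompute means."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the nearest mean.
        labels = [nearest_center(p, centers) for p in points]
        # Update step: each center becomes the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)))
            else:
                # Keep the old center if a cluster ends up empty.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # assignments are stable: a local optimum was reached
        centers = new_centers
    # Recompute labels so they always match the returned centers.
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels
```

Calling `k_means(points, 2)` on a list of coordinate tuples returns the two cluster means and one label per point. Classifying a new observation into the existing clusters, as in the nearest centroid classifier mentioned above, is then just `nearest_center(new_point, centers)`.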
