Training RBF neural networks on unbalanced data

... the overlaps between different classes and the overlaps between clusters of the same class. The overlaps between different classes have been considered in RBF training algorithms. For example, overlapped receptive fields of different clusters can improve the performance of the RBF classifier when dea ...

Outlier Detection for High Dimensional Data

... Many algorithms have been proposed in recent years for outlier detection [7, 8, 10, 22, 23, 25, 26], but they are not methods which are specifically designed in order to deal with the curse of high dimensionality. The statistics community has studied the concept of outliers quite extensively [8]. In ...

Comparison of information retrieval techniques: Latent

... SVD like in LSI Concept decomposition was introduced in 2001 ...

Introduction

... Data mining is often associated with quantitative methods but it differs from standard statistical approaches. ...

Retail Marketing Segmentation and Customer Profiling for

... X ⇒ Y is the percentage of transactions in the database that contain X ∪ Y. That is, support (X ⇒ Y ) = P (X ∪ Y ), P is the probability. Definition 3: The confidence or strength ( Φ ) for an association rule (X ⇒ Y) is the ratio of the number of transactions that contain X ∪ Y to the number of tran ...

Dimensionality Reduction for Spectral Clustering

Outlier Detection Algorithms in Data Mining Systems

Data Dashboard-Integrating Data Mining with Data Deduplication

Improving Efficiency of Apriori Algorithm Using Transaction Reduction

WSARE: What's Strange About Recent Events

... traditional anomaly detection systems, shortcomings in these systems, which we will illustrate, limit their usefulness in early disease outbreak detection. In our database of emergency department (ED) cases from several hospitals in a city, each record contains information about the individual who w ...

Association Rule Mining using Apriori Algorithm: A Survey

... multiple processors and databases to speed up the execution of data mining and enable data distribution. The main aim of grid computing is to give organizations and application developers the ability to create distributed computing environments that can utilize computing resources on demand. Therefo ...

Effective framework for prediction of disease outcome using medical

... Cardiac disorders diagnosis is based on SPECT (Single Photon Emission Computed Tomography) images. Bakirci and Yildirim (2004) used feed-forward ANN and achieved an accuracy of 90.04%. Polat et al. (2007c) proposed a method ensemble classifier system based on different feature subsets and AIRS class ...

View/Download-PDF - International Journal of Computer Science

... instance to a particular class with the aim of achieving least classification error. It is used to extract models that correctly define important data classes within the given dataset. It is a two-step process. In first step the model is created by applying classification algorithm on training data ...

Institutionen för datavetenskap An Evaluation of Clustering and Classification Algorithms in

"Efficient Kernel Clustering using Random Fourier Features"

Deductive and inductive reasoning on spatio-temporal data

... According to the local interpolation method, although there is not a global function describing the whole trajectory, objects are assumed to move between the observed points following some rule. For instance, a linear interpolation function models a straight movement with constant speed, while other ...

Web Search Result Optimization using Association Rule Mining

... higher than support values. Step 4: In next pass, algorithm creates item sets of three members. Repeat this process until all frequent item sets are accounted. Step 5: These item sets are then used to generate association rules which have threshold values less than or equal to confidence values. Ste ...

04_cikm_vert_outlier_clust_byproduct

A new method for the discovery of the best threshold value for

Research of Dr. Eick's Subgroup

Class cover catch digraphs for latent class discovery in gene

Analysis of Hepatitis Dataset using Multirelational Association Rules

... period when measurements were made of the degree of fibrosis for the same patient. To properly analyze the time period involved, the exam date was divided into two attributes: year and month. The period of time considered for the analysis was one month. The exam results of patients with more than on ...

www.1000projects.com

A Parallel Clustering Method Study Based on MapReduce

... large scale data is an important issue. It is the development intention of big data science. Many scholars have done a lot of work on this topic. Some clustering methods based on MapReduce were proposed, such as k-means, EM, Dirichlet Process Clustering and so on. Though the clustering method based on I ...

Ranking Interesting Subspaces for Clustering High Dimensional Data*

... the whole feature space onto a lower-dimensional subspace of relevant attributes, using e.g. principal component analysis (PCA) and singular value decomposition (SVD). However, the transformed attributes often have no intuitive meaning any more and thus the resulting clusters are hard to interpret. ...


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, since both algorithms take an iterative refinement approach. Additionally, both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
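The iterative refinement described above can be sketched in a few lines of plain Python. This is a minimal illustration of one common heuristic (often called Lloyd's algorithm), not a reference implementation: the function names `k_means`, `nearest_center`, and `squared_distance`, the random initialization from the data points, and the toy data are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point` (1-nearest neighbor on centers)."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def k_means(points, k, max_iter=100, seed=0):
    """Heuristic k-means: alternate assignment and mean-update steps.

    Converges to a local optimum only; the result depends on the
    initial centers, which are sampled here from the data (an assumption).
    """
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [nearest_center(p, centers) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # no center moved: local optimum reached
            break
        centers = new_centers
    # Final labels are always consistent with the returned centers.
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


# Toy data (hypothetical): two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(points, k=2)

# Classifying a new observation with the nearest centroid rule.
new_label = nearest_center((0.2, 0.1), centers)
```

Reusing `nearest_center` for new data is the nearest centroid idea from the last paragraph: the learned centers act as prototypes, and a new point takes the label of the closest one.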
