Rheinisch-Westfälische Technische Hochschule Aachen

... Hall. The name Weka is an acronym for Waikato Environment for Knowledge Analysis. The weka, a bird domiciled in New Zealand, is its symbol. It is a collection of different data mining tools and provides the right response for most real world data set problems. Researchers who work with data sets have ...

Gaussian Mixture Density Modeling, Decomposition, and Applications

TrajectoryPatternMining - Georgia Institute of Technology

...  A simple preprocessing step can alleviate this ...

The K-Medoids Clustering Method A Typical K

... Assign each object to a cluster according to a weight (prob. distribution) ...

Combining Multiple Clusterings by Soft Correspondence

... they appear in the same cluster from an ensemble. Kellam et al. [13] used the co-association matrix to find a set of so-called robust clusters with the highest value of support based on object co-occurrences. Fred [9] applied a voting-type algorithm to the co-association matrix to find the final clus ...

Slides - Network Protocols Lab

... Cheng and Church • Handling missing values and masking discovered biclusters: replace by random numbers so that no recognizable structures will be introduced. • Data preprocessing: – Yeast: $x \to 100\log(10^5 x)$ – Lymphoma: $x \to 100x$ (original data is already log-transformed) ...

Foundations of Perturbation Robust Clustering

... the clustering $C = \{C_1, \ldots, C_k\}$ that minimizes $\sum_{i=1}^{k} \sum_{x \in C_i} d(x, c_i)^2$, where $c_i$ is the center of mass of cluster $C_i$. Many different notions of clusterability have been proposed in prior work [1, 13]. Although they all aim to quantify the same tendency, it has been proven that notions of cl ...

Fuzzy adaptive resonance theory: Applications and

... fuzzy logic, genetic and evolutionary computing, and artificial immune systems). Biologically-inspired machine learning methods have seen success in linear and nonlinear function approximations, data processing, and classification. Applications include filtering, adaptive control, pattern recognitio ...

Novel Intrusion Detection System Using Hybrid Approach

... The goal of classification is to categorize data into distinct classes. Classification is a two-step process. The first step is the learning process, in which training data are analysed by a classifier algorithm. In the second phase, classification is done. Test data are used to estimate the accuracy of the cla ...

Semi-supervised Clustering using Combinatorial MRFs

... clusterings of this set is, say, {{x1 , x3 }, {x2 }}. We can define a random variable X̃ over this set of clusters, so that it can take two values: x̃1 = {x1 , x3 } and x̃2 = {x2 }. There are five possible clusterings of {xi }: x̃c1 = {{x1 , x2 , x3 }}, x̃c2 = {{x1 }, {x2 , x3 }}, x̃c3 = {{x1 , x2 } ...

Educational Data Mining by Using Neural Network

... introduced by Breiman in 1984. It builds both classification and regression trees. The classification tree construction by CART is based on binary splitting of the attributes. It is also based on Hunt’s algorithm and can be implemented serially. It uses the Gini index splitting measure in selecting the ...

evaluating the performance of association rule mining algorithms

... Abstract: Association rule mining is one of the most popular data mining methods. However, mining association rules often results in a very large number of found rules, leaving the analyst with the task to go through all the rules and discover interesting ones. In this paper, we present the performa ...

Extracting Diagnostic Rules from Support Vector Machine

view - dline

... number of units. Each unit stores the statistical parameters of its objects, such as maximum, minimum, distribution type, variance, and mean. All clustering operations can then be performed on this quantized space. The execution time of a grid-based algorithm does not depend on the number of data objects. ...

Clustering daily patterns of human activities in the city

... demographic and economic characteristics of the studied subjects. While the new datasets allow us to study massive aggregated travel behavior and social interactions, they have limited capacity in revealing the underlying reasons driving human behavior (Nature Editorial 2008). In order to have detai ...

Detecting Subdimensional Motifs: An Efficient Algorithm for

... (1) adding increasingly large amounts of noise to a single distracting noise dimension and (2) adding additional irrelevant dimensions each with a moderate amount of noise. The non-synthetic data set was captured during an exercise regime made up of six different dumbbell exercises. A three-axis acc ...

Chapter 1

Data Mining for the Discovery of Ocean Climate Indices

... Nino/La Nina events. An ecosystem model for predicting NPP, CASA (the Carnegie Ames Stanford Approach [PKB99]), has been used for over a decade to produce a detailed view of terrestrial productivity. Our goal in the investigations of OCIs is to use an improved understanding of the effect of OCIs on ...

A Literature Review on Data Mining and its Techniques

as a PDF

... LM team, other studies might require more elaborate tools. For example, the LM team only used five of the eight dimensions of the standard ODC scheme (the four shown in Figure 1 plus “impact”, which is implicit in the selection by the LM team of only high-criticality anomalies). The other three dime ...

A Streaming Parallel Decision Tree Algorithm

... a distributed environment, using only one pass on the data. We refer to the new algorithm as the Streaming Parallel Decision Tree (SPDT). Decision trees are simple yet effective classification algorithms. One of their main advantages is that they provide human-readable rules of classification. Decis ...

Clustering (1)

... Cluster analysis (or clustering, data segmentation, …): finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters. Unsupervised learning: no predefined classes (i.e., learning by observations vs. learning by examples: supervi ...

Cluster Subspace Identification Via Conditional Entrophy Calculation

Text Mining: Finding Nuggets in Mountains of Textual Data

Mining Trajectory Data

... and retrieve features based on their geographic location over the time; such features include Stay Points (SP) and Points of Interest (POI) which can be useful to understand users’ interaction and similarity, and both understand individuals’ movement patterns and find interesting places in a certain ...


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, since both use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
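The iterative refinement described above can be illustrated with a minimal sketch of one common heuristic (Lloyd-style alternation between assignment and mean update). The function names `k_means`, `nearest_center`, and `squared_distance`, the random initialization from data points, and the `max_iter` and `seed` parameters are assumptions made for illustration; they are not taken from the text, and the sketch only finds a local optimum.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the closest center: 1-nearest neighbor on the cluster centers."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def k_means(points, k, max_iter=100, seed=0):
    """Return (centers, labels) after iterative refinement (local optimum only)."""
    rng = random.Random(seed)
    # Assumption: initialize the centers with k randomly chosen data points.
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [nearest_center(p, centers) for p in points]
        # Update step: each center becomes the mean of its assigned points.
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[i])
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    # Recompute labels so they match the returned centers.
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = k_means(data, k=2)
    print(centers, labels)
    # Classify a new observation into an existing cluster (nearest centroid).
    print(nearest_center((9.0, 9.0), centers))
```

Calling `nearest_center` on a new observation after clustering corresponds to the nearest centroid classification mentioned above: the new point is simply assigned to the cluster whose center is closest.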
