Frequency-aware Similarity Measures - Hasso-Plattner

... their work, we partition data according to frequencies and not based on different sources of the data. Moreover, we employ a set of similar matchers, i.e., we learn one similarity function for each of the partitions – but all of them with the same machine learning technique. Another idea is to use a ...

Mining Periodic Patterns in Sequence Data

Using Constraints During Set Mining: Should We Prune or not?

... In this paper, we consider inductive queries that return sets S ⊆ Items, where Items denotes a collection of attributes. Example 1. A simple mining task: assume one wants to find all frequent itemsets that contain attribute A (where Items = {A, B, C, D, E}). Much work has been done on mining con ...

GRID-BASED SUPERVISED CLUSTERING ALGORITHM USING

... clustering, subspace clustering and supervised clustering are provided. Reviews on clustering algorithms relevant to the proposed algorithm are also given. ...

LG3120522064

... equivalence class. If it does, the database does not have to be accessed to determine the support of the itemset. This way the expensive database passes and support counts can be constrained to the case of generators only. From some level on, all generators can be found, thus all remaining frequent ...

A Framework for Trajectory Data Preprocessing for Data Mining

... parts of one single trajectory are identified, using a spatiotemporal clustering method that is a variation of the DBSCAN [3] algorithm considering one-dimensional line (trajectories) and speed. In the second step, the algorithm identifies where these potential stops (clusters) are located, consider ...

... problem, in which external content information can be used to produce predictions for new users or new items. In Ziegler et al. [28], a hybrid collaborative filtering approach was proposed to exploit bulk taxonomic information designed for exact product classification to address the data sparsity pr ...

Subgroup Discovery with Evolutionary Fuzzy Systems in R: The

... 6. Go to step 2 until a stopping criterion is reached. Normally this criterion is a number of evaluations or generations. These algorithms efficiently perform a global stochastic search through a huge search space. However, it is possible that these algorithms cannot find an optimal solution (a glo ...

Faster Online Matrix-Vector Multiplication

... A truly subquadratic-time cell probe data structure. Unlike other popular algorithmic hardness conjectures like SETH, the 3SUM conjecture, the APSP conjecture etc., the OMV conjecture asserts a lower bound on data structures instead of traditional algorithms. Given the state-of-the-art in proving da ...

Visualizing Variable-Length Time Series Motifs

TopCat: Data Mining for Topic Identification in a Text

... multiple names may be used for a single entity. This gives us a high correlation between different variants of a name (e.g., Rios and Marcelo Rios) that add no useful information. We want to capture that these all refer to the same entity, mapping multiple instances to the same variant of the name, ...

Learning a Taxonomy of Predefined and Discovered Activity Patterns

... because of inherent differences between labeling techniques. In this paper we investigate a data-driven approach to creating an activity taxonomy from sensor data found in disparate smart home datasets. We investigate how the resulting taxonomy can help analyze the relationship between classes of ac ...

WEKA Overview

... Then choose the “ClassAssigner” from the “Evaluation” tab. This icon allows us to select which class is to be predicted. ...

Efficiently Mining Asynchronous Periodic Patterns

... Huang proposed a novel SMCA algorithm which, unlike the previous technique [6], requires no candidate pattern generation. Their algorithm allows the mining of all asynchronous periodic patterns, not only in a sequence of events, but also in a temporal dataset with multiple event sets. In [3] they ...

Movement Data Anonymity through Generalization

... that this simple operation is insufficient to protect privacy. They proposed k-anonymity to make each record indistinguishable from at least k − 1 other records. In recent years many algorithms for k-anonymity have been developed [14, 11, 8, 15]. Although it has been shown that finding an optimal k- ...

Cluster Analysis for Large, High

... number of algorithms, originating from both statistics and computer science, have been proposed over the years (Jain et al., 1999; Berkhin, 2006). Following the recent technological progress, it is possible to produce ever-increasing amounts of data of high complexity (e.g. sales histories or molecul ...

COMPARATIVE STATISTICAL ANALYSES OF AUTOMATED BOOLEANIZATION METHODS FOR DATA MINING PROGRAMS

Enhanced ID3 algorithm based on the weightage of the Attribute

... ABSTRACT - The ID3 algorithm, a decision tree classification algorithm, is very popular due to its speed and simplicity of construction, but it has its own snags: when classifying, ID3 tends to choose attributes with many values, and practical complexities arise from this. To solve ...

Mining Building Energy Management System Data

... thorough understanding of the system is required to operate them. Advanced CI based techniques have been previously used for ...

Paper Title (use style: paper title) - International Journal of Advanced

A Dynamic Indexing Technique for Multidimensional Non-Ordered Discrete Data Spaces, ACM Transactions on Database Systems, Vol. 31, No. 2, 2006, Gang Qian, Qiang Zhu, Qiang Xue and Sakti Pramanik.

... Ωd ) into/from/in the tree are needed. In this paper, our discussion is focused on the insertion issues and its related algorithms. However, for completeness, a deletion algorithm for the ND-tree is also described. The update operation can be implemented by a deletion followed by an insertion. 3.2.1 ...

Speculative Markov Blanket Discovery for Optimal Feature Selection

Unveiling the complexity of human mobility by querying and mining

Chapter 2 Knowledge Discovery and Data Mining


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, since both algorithms use an iterative refinement approach. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
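The iterative refinement described above can be illustrated with a minimal sketch of one common heuristic (Lloyd's algorithm), followed by nearest-centroid classification of a new point. The function names, the toy data, the random initialization from the data points, and the fixed seed are all assumptions made for illustration; this is not a reference implementation, and it only finds a local optimum.

```python
import random

def squared_distance(p, q):
    # Squared Euclidean distance between two points given as tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_center(point, centers):
    # Index of the center closest to `point` (1-nearest neighbor on the centers).
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))

def mean(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    # Assumption: initial centers are k distinct data points chosen at random.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # no change: a local optimum was reached
            break
        centers = new_centers
    return centers

# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers = k_means(points, k=2)

# Nearest centroid classification of a new observation.
label = nearest_center((8.2, 8.4), centers)
```

Here `label` is the index into `centers` of the cluster the new point falls into. In practice the result depends on the initialization, so the algorithm is often run several times with different seeds and the partition with the lowest total within-cluster squared distance is kept.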
