Data-Oblivious Data Structures
... We will consider algorithms executed on a word RAM, with a word size of Θ(log n), where n is the size of the input (or the capacity of the data structure, as appropriate), and the entire memory consists of poly(n) words. The RAM has a constant number of public and secret registers, and can perform ...
Paper - George Karypis
... hand corner of the matrix will be idle for more time as the size of the matrix increases. But the two-dimensional cyclic mapping solves this problem by localizing the pipeline to a fixed portion of the matrix, independent of the size of the matrix. This is a very desirable property for achieving sca ...
Using Reinforcement Learning to Spider the Web Efficiently
... naive Bayes text classifiers, performing the mapping by casting this regression problem as classification [Torgo and Gama, 1997]. We discretize the discounted sum of future reward values of our training data into bins, place the text in the neighborhood of the hyperlinks into the bin corresponding t ...
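The binning step described in the excerpt can be sketched as follows. This is a minimal illustration with hypothetical reward values, not the paper's actual data or bin count: the discounted sums of future reward are discretized into a fixed number of bins, and each bin index then serves as the class label for a standard naive Bayes text classifier.

```python
import numpy as np

# Hypothetical discounted sums of future reward for a set of training hyperlinks.
rewards = np.array([0.05, 0.30, 0.72, 0.18, 0.95, 0.51])

# Discretize into equal-width bins over the observed range; the bin index
# becomes the class label handed to the text classifier.
n_bins = 3
edges = np.linspace(rewards.min(), rewards.max(), n_bins + 1)
labels = np.digitize(rewards, edges[1:-1])  # interior edges only

print(labels.tolist())  # → [0, 0, 2, 0, 2, 1]
```

Equal-width binning is the simplest choice; quantile-based bins would balance the class frequencies instead.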
LD4KD2014 Linked Data for Knowledge Discovery - CEUR
... such as reliability, heterogeneity, provenance or completeness. Many areas of research have adopted these principles both for the management and dissemination of their own data and for the combined reuse of external data sources. However, the way in which Linked Data can be applicable and beneficial ...
Slides
... Same project – Real World Datasets Run more instances of the experiments Control over parameters ...
Trajectory Clustering: A Partition-and-Group Framework
... The MDL cost consists of two components [9]: L(H) and L(D|H). Here, H means the hypothesis, and D the data. The two components are informally stated as follows [9]: “L(H) is the length, in bits, of the description of the hypothesis; and L(D|H) is the length, in bits, of the description of the data w ...
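In symbols, and following the two-part MDL convention the excerpt quotes from [9], the preferred hypothesis is the one minimizing the total description length:

```latex
\min_{H}\; \big[\, L(H) + L(D \mid H) \,\big]
```

Here $L(H)$ charges for the complexity of the hypothesis itself, while $L(D \mid H)$ charges for how much of the data is left unexplained when encoded with the help of $H$; a more detailed hypothesis shrinks the second term at the cost of growing the first.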
Chapter 9 Part 1
... The result of minimization is a partition matrix and a collection of prototypes The methods in this class are conceptually and algorithmically appealing © 2007 Cios / Pedrycz / Swiniarski / Kurgan ...
Demand Forecast for Short Life Cycle Products
... the information available and to organize this according to natural clusters (Wu et al. , 2009; Li et al. , 2012). The forecasts may be carried out according to the results of that cluster analysis. Therefore we also consider the following research hypothesis. Forecasts based on data obtained by mea ...
Clustering Ensembles: Models of Consensus and Weak Partitions
... ensembles as a new branch in the conventional taxonomy of clustering algorithms [26, 27]. Please see the Appendix for detailed review of the related work, including [7, 11, 16, 19, 28, 31, 35]. The problem of clustering combination can be defined generally as follows: given multiple clusterings of t ...
Soil data clustering by using K-means and fuzzy K
... Clustering is a process of grouping similar sets of data. This grouping is unsupervised; it is done without using known structures in the data. Clustering aims to make clusters with data samples which are more similar to each other than to data samples that belong to the other clusters. Each cluster ...
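The hard K-means variant mentioned in the title can be sketched with a few lines of Lloyd's algorithm. This is a minimal illustration on made-up two-dimensional "soil measurement" points, not the paper's method or data; the fuzzy variant would replace the hard assignments with per-cluster membership weights.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal Lloyd's algorithm: alternate nearest-centroid assignment
    and centroid update until the iteration budget is spent."""
    centers = X[:k].copy()  # naive init: first k samples (real code uses k-means++)
    for _ in range(iters):
        # Assign each sample to its nearest centroid (Euclidean distance).
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # Move each centroid to the mean of its assigned samples.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated blobs of hypothetical measurements.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centers = kmeans(X, k=2)
print(labels.tolist())  # → [0, 0, 0, 1, 1, 1]
```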
paper manuscript submitted to the computer journal
... for detection of auction fraudsters, respectively. Ku et al. attempted to detect Internet auction fraud using social network analysis (SNA) and decision trees [6]. They demonstrated that their approach could provide a feasible way of monitoring and protecting buyers from auction fraudsters. Ochaeta ...
Impact of Evaluation Methods on Decision Tree Accuracy Batuhan
... machine learning (ML) and knowledge discovery in databases (KDD) are the processes that enable turning data into useful knowledge. Application of these processes has become more common in recent years and is becoming even more frequent. Data mining is one of the most widely applied processes to make u ...
AN EFFICIENT HILBERT CURVE
... such that the data points within a cluster are more similar to each other than data points in different clusters. Cluster analysis has been widely applied to many areas such as medicine, social studies, bioinformatics, map regions and GIS, etc. In recent years, many researchers have focused on finding ...
Ensemble Learning Techniques for Structured
... classification models such as decision trees, artificial neural networks, Naïve Bayes, as well as many other classifiers (Kim, 2009). Ensemble learning, based on aggregating the results from multiple models, is a more sophisticated approach for increasing model accuracy as compared to the traditiona ...
Automatic Subspace Clustering of High Dimensional Data
... threshold τ , are only used to approximate the density of the space. We do not presume specific mathematical forms for data distribution; instead, data points are separated according to the valleys of the density function. Related work. A similar approach to clustering high dimensional data has been ...
K-nearest neighbors algorithm
In pattern recognition, the k-Nearest Neighbors algorithm (k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

In k-NN regression, the output is the property value for the object: the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.

For both classification and regression, it can be useful to weight the contributions of the neighbors, so that nearer neighbors contribute more to the average than more distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor.

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm should not be confused with k-means, another popular machine learning technique with which it shares nothing beyond the name.
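Both modes described above fit in a few lines. The sketch below is a minimal illustration on made-up training points: majority-vote classification, and 1/d-weighted regression as in the weighting scheme mentioned earlier.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training examples."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

def knn_regress(X_train, y_train, x, k=3):
    """Predict a value as the 1/d-weighted average of the k nearest neighbors."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / np.maximum(d[nearest], 1e-12)  # guard against zero distance
    return float(np.dot(w, y_train[nearest]) / w.sum())

# Hypothetical training set: two classes in two well-separated regions.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                    [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]])
y_class = np.array(["a", "a", "a", "b", "b", "b"])
y_value = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

print(knn_predict(X_train, y_class, np.array([0.1, 0.1])))  # → a
```

Because all computation happens inside the predict call, there is no training step to show, which is exactly the "lazy learning" property the text describes.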