Movement Data Anonymity through Generalization
... represent typical or unexpected customer and user behavior. The collection and disclosure of personal, often sensitive, information increase the risk of violating a citizen’s privacy. Much research has thus focused on privacy-preserving data mining [2, 25, 11, 15]. These approaches enable knowledge ...
rastogi02[2]. - Computer Science and Engineering
... – Throw away coefficients that give a small increase in error (Garofalakis, Gehrke, Rastogi, KDD’02) ...
- D-Scholarship@Pitt
... help explain its underlying structure. To be practically useful, the discovered patterns should be novel (unexpected) and easy for humans to understand. In this thesis, we study the problem of mining patterns (defining subpopulations of data instances) that are important for predicting and explain ...
towards outlier detection for high-dimensional data
... Recently, outlier detection for high-dimensional stream data has emerged as a new research problem. A key observation motivating this research is that outliers in high-dimensional data are projected outliers, i.e., they are embedded in lower-dimensional subspaces. Detecting projected outliers fr ...
7_Mini
... o No (none, nada, zippo) training required o All computation deferred to scoring phase In ...
Prototype-based Classification and Clustering
... partitioning approaches are not always appropriate for the task at hand, especially if the groups of data points are not well separated, but rather form more densely populated regions, which are separated by less densely populated ones. In such cases the boundary between clusters can only be drawn w ...
Discovering Colocation Patterns from Spatial Data Sets: A General
... may explore such strategies in future work. Geometric Approach. The geometric approach can be implemented by neighborhood relationship-based spatial joins of table instances of prevalent colocations of size k with table instance sets of prevalent colocations of size 1. In practice, spatial join oper ...
On Monotone Data Mining Languages
... It is natural to ask a related question about the relationship between the logical form of sentences and the applicability of a given data mining technique, like the a-priori technique: “What is the relationship between the logical form of sentences to be discovered and the applicability of a given ...
K-nearest neighbors algorithm
In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its single nearest neighbor.

In k-NN regression, the output is the property value for the object: the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.

For both classification and regression, it can be useful to weight the contributions of the neighbors, so that nearer neighbors contribute more to the average than more distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor.

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm is not to be confused with k-means, another popular machine learning technique.
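The classification procedure described above (and the optional 1/d weighting scheme) can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the function name `knn_classify` and the toy data set are invented for the example, and Euclidean distance is assumed as the metric.

```python
from collections import defaultdict
import math

def knn_classify(train, query, k=3, weighted=False):
    """Classify `query` by a (optionally 1/d-weighted) majority vote
    of its k nearest training examples.
    `train` is a list of (point, label) pairs."""
    # Sort training examples by Euclidean distance to the query point.
    dists = sorted((math.dist(p, query), label) for p, label in train)
    votes = defaultdict(float)
    for d, label in dists[:k]:
        if weighted:
            # Nearer neighbors contribute more; an exact match
            # (d == 0) dominates, so return its label outright.
            if d == 0:
                return label
            votes[label] += 1.0 / d
        else:
            votes[label] += 1.0
    return max(votes, key=votes.get)

# Toy "training set": no training step is needed, the data is simply stored.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]

print(knn_classify(train, (1, 1), k=3))                  # all 3 nearest are "a"
print(knn_classify(train, (5, 5), k=3, weighted=True))   # exact match with "b"
```

Note how the code mirrors the "lazy learning" point in the text: there is no fitting phase at all, and every distance computation happens at classification time.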