Customer Segmentation and Customer Profiling
Data Mining Introduction-I
... predicts categorical class labels: classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data ...
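The two-step process described, building a model from a labeled training set and then using it to classify new data, can be sketched with a deliberately trivial one-attribute model; all names here are illustrative, not taken from the source:

```python
from collections import Counter, defaultdict

def build_model(training_set):
    # training_set: list of (attribute_value, class_label) pairs.
    # The "model" maps each attribute value to its most common class label.
    by_value = defaultdict(list)
    for value, label in training_set:
        by_value[value].append(label)
    return {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}

def classify(model, value, default=None):
    # Classify a new, previously unseen record by looking up its attribute value.
    return model.get(value, default)
```

A real classifier would of course combine many attributes; the point here is only the construct-then-apply workflow.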
An Overview of Data Mining Techniques
... databases of equal numbers of records. Mode - the most common value for the predictor. Variance - the measure of how spread out the values are from the average value. When there are many values for a given predictor the histogram begins to look smoother and smoother (compare the difference between t ...
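The mode and variance statistics defined above can be computed directly; a minimal sketch on illustrative data:

```python
from collections import Counter

def mode(values):
    # Mode: the most common value for the predictor.
    return Counter(values).most_common(1)[0][0]

def variance(values):
    # Variance: mean squared deviation from the average value.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

ages = [23, 25, 25, 30, 42]   # illustrative predictor values
most_common_age = mode(ages)
age_spread = variance(ages)
```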
CPSC445_term_projects_2008-v2
... (5) Explore techniques for handling problems with training sets that have missing data; see http://bioinformatics.oupjournals.org/cgi/reprint/17/6/520.pdf and http://binf.gmu.edu/%7Ejweller/pages/BINF733_s2005_pdf/Kim_mvLocalLSQ_Bioinformatics2005.pdf. Try out these techniques on some training sets of ...
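As a baseline for comparison with the techniques in the linked papers (which are not reproduced here), a simple mean-imputation sketch for a numeric table with missing entries:

```python
def mean_impute(rows):
    # Replace None entries in each column with that column's mean
    # over the observed (non-missing) values.
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]
```

The referenced methods (e.g., local least-squares imputation) exploit correlations between columns instead of using a single column mean.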
A Topological-Based Spatial Data Clustering
... The algorithm initially determines an appropriate value of the radius r experimentally, then iteratively selects a point whose flag field equals zero from the point table and scans the database to compare the current point with the other points. The points that are related by overlap and meet r ...
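A rough sketch of the scan loop described, with plain Euclidean distance standing in for the excerpt's topological overlap/meet tests, which are not fully specified here:

```python
import math

def radius_scan(points, r):
    # flag 0 = unprocessed; group each unprocessed point with all
    # points within radius r, then mark the whole group processed.
    flags = [0] * len(points)
    groups = []
    for i, p in enumerate(points):
        if flags[i]:
            continue
        group = [j for j, q in enumerate(points) if math.dist(p, q) <= r]
        for j in group:
            flags[j] = 1
        groups.append(group)
    return groups
```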
YADING: Fast Clustering of Large-Scale Time Series Data
... conducting clustering on the sampled dataset, and assigning the rest of the input data to the clusters generated on the sampled dataset. In particular, we provide a theoretical proof of the lower and upper bounds on the sample size, which not only guarantees YADING's high performance but also ensures ...
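The sample-then-assign pipeline described above can be sketched as follows; the pluggable `cluster_fn` hook and all names are illustrative stand-ins, since the excerpt does not specify YADING's internal clustering step:

```python
import random

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def sample_then_assign(data, sample_size, cluster_fn):
    # 1. Draw a random sample of the input.
    sample = random.sample(data, sample_size)
    # 2. Cluster only the sample; cluster_fn returns a list of centroids.
    centroids = cluster_fn(sample)
    # 3. Assign every input point to its nearest centroid.
    labels = [min(range(len(centroids)),
                  key=lambda i: euclidean(p, centroids[i])) for p in data]
    return centroids, labels
```

The bounds the paper proves concern how large `sample_size` must be for step 3 to reproduce the clustering of the full dataset.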
A Practical Differentially Private Random Decision Tree Classifier
... Privacy has been a concern since the early days of database technology, and many privacy approaches to data analysis have been considered over the years, including output perturbation (e.g., [1]), data perturbation (e.g., [2]), secure multiparty computation (e.g., [19, 23]), and anonymization-based ...
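Output perturbation, the first approach listed, is commonly realized by adding calibrated noise to an exact query answer; a minimal Laplace-mechanism sketch (the sensitivity and epsilon parameters here are illustrative, not the scheme of [1]):

```python
import math
import random

def laplace_mechanism(true_answer, sensitivity, epsilon):
    # Release the query answer plus Laplace noise of scale
    # sensitivity / epsilon, sampled via the inverse CDF.
    scale = sensitivity / epsilon
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return true_answer - scale * sign * math.log(1.0 - 2.0 * abs(u))
```

Smaller epsilon means stronger privacy and larger noise; the sensitivity is the maximum amount one record can change the query answer.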
Reinforcement Learning for Neural Networks using Swarm Intelligence
... candidate solution for a given problem. Second, when an ant follows a path, the amount of pheromone deposited on that path is proportional to the quality of the corresponding candidate solution for the target problem. Third, when an ant has to choose between two or more paths, the path(s) with a large ...
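The second and third rules above can be sketched in a few lines: quality-proportional pheromone deposit, and pheromone-proportional (roulette-wheel) path choice; all names are illustrative:

```python
import random

def choose_path(pheromones):
    # Roulette-wheel selection: a path's chance of being chosen is
    # proportional to its pheromone level.
    total = sum(pheromones)
    r = random.uniform(0, total)
    cumulative = 0.0
    for i, tau in enumerate(pheromones):
        cumulative += tau
        if r <= cumulative:
            return i
    return len(pheromones) - 1

def deposit(pheromones, path, quality, rate=1.0):
    # Pheromone deposited is proportional to solution quality.
    pheromones[path] += rate * quality
```

A full ACO implementation would also evaporate pheromone each iteration so that poor paths are gradually forgotten.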
Abstract
... responsibilities and availabilities of newly arriving objects should be assigned by referring to their nearest neighbors. NA is proposed based on the fact that if two objects are similar, they should not only be clustered into the same group, but also have the same relationships (responsibilities a ...
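A minimal sketch of the nearest-neighbor assignment idea: a newly arriving object inherits the cluster of its most similar existing object (the full method would also copy its responsibilities and availabilities; function names are illustrative):

```python
import math

def assign_new_object(new_obj, existing, labels):
    # existing: list of points already clustered; labels: their cluster labels.
    # The new object takes the label of its nearest (most similar) neighbor.
    nearest = min(range(len(existing)),
                  key=lambda i: math.dist(existing[i], new_obj))
    return labels[nearest]
```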
Clustering of the self-organizing map
... • In addition, several such growing variants of the SOM have been proposed in which the new nodes do have a well-defined place on a low-dimensional grid, and thus the visualization would not be very problematic [23]–[27]. The SOM variants were not used in this study because we wanted to select the most c ...
A Local Discretization of Continuous Data for Lattices: Technical Aspects
... This is global discretization. It can also be run during model construction considering, at each step, only a part of the training set. This is local discretization. In [7], Quinlan shows that local discretization improves supervised classification with decision trees (DTs) as compared with global d ...
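The global-versus-local distinction can be illustrated with a simple equal-width binning helper (Quinlan's method in [7] uses entropy-based cut points; equal-width cuts are used here only to keep the sketch short):

```python
def equal_width_cuts(values, n_bins):
    # Compute n_bins - 1 cut points spanning the observed range.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def discretize(value, cuts):
    # Map a continuous value to the index of its bin.
    return sum(value > c for c in cuts)

# Global discretization: cuts computed once, over the whole training set.
ages = [18, 22, 25, 31, 40, 58, 63]
global_cuts = equal_width_cuts(ages, 3)

# Local discretization: cuts recomputed on the subset reaching one tree node.
subset = [18, 22, 25, 31]
local_cuts = equal_width_cuts(subset, 3)
```

The local cuts adapt to the narrower value range of the node's subset, which is why local discretization can produce better splits.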
A Class Imbalance Learning Approach to Fraud
... Due to the limitations of mobile phone networks, new methods are needed to evaluate the fraudulent behavior of ad publishers. New features have to be created from existing parameters to capture the behavior of the publishers. Previous studies [35, 43] show that these fraudulent publishers try to act rat ...
Discrimination Aware Decision Tree Learning*
... Many anti-discrimination laws, e.g., the Australian Sex Discrimination Act 1984, the US Equal Pay Act of 1963, and the US Equal Credit Opportunity Act, have been enacted to eradicate discrimination and prejudice. It is quite intuitive that if some discriminatory practice is banned by law, nobody ...
Locally Scaled Density Based Clustering
... In Ester et al. [1], density-based clustering (DBSCAN) was presented as a clustering technique that can discover clusters of arbitrary shape. Hinneburg and Keim [8] introduced another density-based clustering technique, DENCLUE, which sums the density impact of a data point within its neighborhood. ...
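A minimal sketch of summed density impact in the DENCLUE style, using a Gaussian influence function with an illustrative bandwidth parameter h:

```python
import math

def gaussian_influence(x, y, h):
    # Influence of data point y on location x, decaying with squared distance.
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * h * h))

def density(x, data, h):
    # DENCLUE-style density at x: sum of the influences of all data points.
    return sum(gaussian_influence(x, p, h) for p in data)
```

Clusters in DENCLUE correspond to local maxima of this density function; points are attracted uphill to their nearest maximum.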
Spatial associative classification: propositional vs structural approach
... Wu, & Chawla, 2002). Some well-known formalizations, such as the 9-intersection model for topological relationships (Egenhofer, 1991), are unsatisfactory in many applications, since the end-user of a data mining solution is often interested in human-interpretable properties and relations between spa ...
lecture1428550844
... In most cases, data-mining models should help in decision making. Hence, such models need to be interpretable in order to be useful because humans are not likely to base their decisions on complex "black-box" models. Note that the goals of accuracy of the model and accuracy of its interpretation are ...
K-nearest neighbors algorithm
In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.

Both for classification and regression, it can be useful to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm has nothing to do with, and is not to be confused with, k-means, another popular machine learning technique.
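The description above can be condensed into a short sketch covering both majority-vote classification and optionally 1/d-weighted regression:

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    # train: list of (point, label) pairs; majority vote among k nearest.
    neighbors = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def knn_regress(train, query, k, weighted=False):
    # train: list of (point, value) pairs; plain or 1/d-weighted average.
    neighbors = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    if not weighted:
        return sum(v for _, v in neighbors) / len(neighbors)
    # Guard against division by zero when a neighbor coincides with the query.
    weights = [1 / (math.dist(p, query) or 1e-12) for p, _ in neighbors]
    return sum(w * v for w, (_, v) in zip(weights, neighbors)) / sum(weights)
```

Sorting the whole training set makes each query O(n log n); practical implementations use spatial indexes such as k-d trees to find the k nearest neighbors faster.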