
Classification rules + time = Temporal Rules
... information, to find/extract rules of the form “If a person contracted influenza necessitating more than five days of hospitalization, then with confidence p the same person will contract, within the next three months, a serious lung infection”. The size of the database (several gigabytes) made the ...
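To make the rule form concrete, here is a minimal Python sketch of how the confidence p of such a temporal rule could be estimated from patient event records. The record layout, field names, and 90-day window are assumptions for illustration, not the paper's data model:

    from datetime import timedelta

    # Hypothetical records: (patient_id, event_type, date, duration_days).
    # This schema is assumed for illustration only.
    def temporal_rule_confidence(events, window=timedelta(days=90)):
        """Estimate confidence of: influenza with > 5 days of hospitalization
        => serious lung infection within `window`."""
        flu = {(e[0], e[2]) for e in events
               if e[1] == "influenza" and e[3] > 5}
        lung = [(e[0], e[2]) for e in events if e[1] == "lung_infection"]
        hits = sum(
            1 for pid, flu_date in flu
            if any(p == pid and flu_date < d <= flu_date + window
                   for p, d in lung)
        )
        return hits / len(flu) if flu else 0.0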
Scalable and interpretable data representation for high
... corpus (Salton 1991). It is often used as a weighting factor in information retrieval and text mining. The Tf-idf value increases proportionally with the number of times a word appears in a document, but is offset by the frequency of that word in the corpus, which helps to control for the fact that ...
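As a minimal sketch of the weighting described above (this is the common tf × idf formulation with idf = log(N / df); the exact variant used in the cited work may differ):

    import math
    from collections import Counter

    def tf_idf(docs):
        """Weight terms in tokenized documents: raw term frequency in the
        document, offset by log(N / df), where df counts the documents
        containing the term."""
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
                for doc in docs]

    docs = [["data", "mining", "rules"], ["data", "warehouse"], ["rules", "engine"]]
    print(tf_idf(docs)[0])  # rarer terms ("mining") get higher weight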
Support vector machines based on K-means clustering for real
... Abstract: Support vector machines (SVM) have been applied to build classifiers, which can help users make well-informed business decisions. Despite their high generalisation accuracy, the response time of SVM classifiers is still a concern when applied to real-time business intelligence systems, s ...
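In the spirit of the abstract above, a common way to cut SVM response time is to shrink the training set, and hence the number of support vectors, before fitting. The sketch below (scikit-learn; the paper's actual method may differ) replaces the training samples with k-means centroids labelled by majority vote:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def compress_then_fit(X, y, k=50, random_state=0):
        """Fit an SVM on k-means centroids instead of all samples.
        Assumes integer class labels in y."""
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        # Label each centroid with the majority class of its cluster.
        labels = np.array([np.bincount(y[km.labels_ == c]).argmax()
                           for c in range(k)])
        return SVC(kernel="rbf").fit(km.cluster_centers_, labels)

Fewer training points generally means fewer support vectors, so each prediction costs fewer kernel evaluations, at some cost in accuracy.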
Lecture slides - Maastricht University
... • John is quicker than Mary and Mary is quicker than John have the same vectors • This is called the bag of words model. • Other well-known problems: - Breaks multi-words (e.g. data mining) - Does not consider synonymy, hyponymy, etc. - Main problem is that it’s sparse! ...
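A quick illustration of the order-insensitivity noted on the slide, in plain Python:

    from collections import Counter

    a = "john is quicker than mary".split()
    b = "mary is quicker than john".split()

    # Bag-of-words keeps only counts, so word order is discarded:
    print(Counter(a) == Counter(b))  # True -- identical vectors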
Comparative analysis of different methods and obtained results
... structures in unlabeled data. Some approaches in unsupervised learning are clustering, neural networks, principal component analysis, independent component analysis, etc. In this paper, two methods of unsupervised learning will be used: k-means clustering and self-organizing maps. ...
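For reference, a minimal NumPy sketch of Lloyd's k-means iteration as usually presented in textbooks (the paper's exact configuration is not shown in the snippet); a SOM sketch follows the next excerpt:

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        """Plain Lloyd's algorithm: alternate nearest-centroid assignment
        and centroid update until the centroids stop moving."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            # Assign each point to its nearest centroid.
            labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
            # Move each centroid to the mean of its assigned points.
            new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        return centers, labels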
Clustering
... Self-Organizing Feature Map (SOM) • SOMs, also called topologically ordered maps or Kohonen Self-Organizing Feature Maps (KSOMs) • It maps all points in a high-dimensional source space into a 2- to 3-d target space, such that distance and proximity relationships (i.e., topology) are preserved as mu ...
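A compact NumPy sketch of the core Kohonen update on a 2-d grid (the Gaussian neighborhood and the linear decay schedules are illustrative choices, not taken from the slide):

    import numpy as np

    def train_som(X, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
        """Fit a 2-d Kohonen map: find the best-matching unit (BMU) for each
        sample and pull grid-nearby units toward the sample."""
        rng = np.random.default_rng(seed)
        h, w = grid
        weights = rng.random((h, w, X.shape[1]))
        coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                      indexing="ij"), axis=-1)
        steps, t = epochs * len(X), 0
        for _ in range(epochs):
            for x in rng.permutation(X):
                lr = lr0 * (1 - t / steps)               # decaying step size
                sigma = sigma0 * (1 - t / steps) + 1e-3  # shrinking radius
                # BMU: grid unit whose weight vector is closest to x.
                bmu = np.unravel_index(
                    np.linalg.norm(weights - x, axis=2).argmin(), (h, w))
                # Gaussian neighborhood on the grid preserves topology:
                # units near the BMU move more than distant ones.
                g = np.exp(-((coords - bmu) ** 2).sum(-1) / (2 * sigma ** 2))
                weights += lr * g[..., None] * (x - weights)
                t += 1
        return weights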
IREP++, a Faster Rule Learning Algorithm - Oliver Dain, Robert K. Cunningham
... high classification accuracy [11]. The most common ways to produce such rule sets from data are to first learn a decision tree and then extract the rules from the tree [13]. IREP++ and IREP are designed for boolean-valued targets only, and this application shall be the focus of the following discussion ...
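As a concrete illustration of the first approach mentioned above (learning a decision tree, then reading one rule off each root-to-leaf path), a short scikit-learn sketch; note this is not IREP/IREP++ itself, which learns rules directly:

    from sklearn.tree import DecisionTreeClassifier, _tree

    def tree_to_rules(tree: DecisionTreeClassifier, feature_names):
        """Walk a fitted decision tree and print one if-then rule per leaf."""
        t = tree.tree_
        def walk(node, conditions):
            if t.feature[node] == _tree.TREE_UNDEFINED:  # leaf node
                label = t.value[node].argmax()
                print(f"IF {' AND '.join(conditions) or 'TRUE'} THEN class={label}")
                return
            name, thr = feature_names[t.feature[node]], t.threshold[node]
            walk(t.children_left[node], conditions + [f"{name} <= {thr:.3g}"])
            walk(t.children_right[node], conditions + [f"{name} > {thr:.3g}"])
        walk(0, [])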
Implementation of C4.5 and PAPI Kostick to Predict Students
... course will be a problem if identifying potential leaders runs into difficulties, which makes it necessary to classify leadership within an organization so that potential leaders can be identified more easily. The leadership crisis was influenced by two factors: first, the loss of the main c ...
TimeClassifier - Department of Computer Science
... We consider classification as an extended form of search in the time domain. For example, given a set of labelled data, classification operates by searching for matches to each labelled item and assigning the results to the corresponding classification group. A user-oriented approach invo ...
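A minimal sketch of "classification as search in the time domain" as described above: slide each labelled template over the signal and assign matching regions to that template's class. The Euclidean distance and fixed threshold are illustrative choices, not the TimeClassifier implementation:

    import numpy as np

    def match_template(signal, template, threshold):
        """Return start indices where a sliding window of the signal lies
        within `threshold` (Euclidean distance) of the template."""
        signal, template = np.asarray(signal, float), np.asarray(template, float)
        m = len(template)
        return [i for i in range(len(signal) - m + 1)
                if np.linalg.norm(signal[i:i + m] - template) <= threshold]

    def classify(signal, labelled_templates, threshold):
        """Assign matching regions the label of the template they matched."""
        return {label: match_template(signal, t, threshold)
                for label, t in labelled_templates.items()}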
Angle-Based Outlier Detection in High-dimensional Data
... and considers those objects as outliers that deviate considerably from the general characteristics of the groups. This approach has been pursued e.g. in [4, 27]. Forming the groups at random is rather arbitrary, and the results accordingly depend on the selected groups. Forming groups at random, how ...
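For contrast with such group-based approaches, the angle-based idea in this paper's title can be sketched directly: an outlier sees most other points within a narrow range of angles, so the variance of the (distance-weighted) angles to all pairs of other points is small. A naive O(n^3) illustration, not the paper's optimized variant:

    import numpy as np
    from itertools import combinations

    def abof(X):
        """Naive angle-based outlier factor: per point, the variance of the
        distance-weighted cosine over all pairs of other points.
        Low scores indicate outliers."""
        n = len(X)
        scores = np.empty(n)
        for i in range(n):
            others = np.delete(X, i, axis=0)
            vals = []
            for a, b in combinations(others, 2):
                pa, pb = a - X[i], b - X[i]
                na, nb = pa @ pa, pb @ pb  # squared norms
                if na > 0 and nb > 0:
                    vals.append((pa @ pb) / (na * nb))
            scores[i] = np.var(vals)
        return scores  # smallest scores = strongest outlier candidates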
K-nearest neighbors algorithm
In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression: in k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor. In k-NN regression, the output is the property value for the object; this value is the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.

Both for classification and regression, it can be useful to weight the contributions of the neighbors, so that nearer neighbors contribute more to the average than more distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor.

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm has nothing to do with, and is not to be confused with, k-means, another popular machine learning technique.
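A minimal NumPy sketch of weighted k-NN classification as just described, using the common 1/d weighting:

    import numpy as np

    def knn_predict(X_train, y_train, x, k=5):
        """Classify x by a distance-weighted vote of its k nearest
        neighbors: each neighbor votes with weight 1/d."""
        d = np.linalg.norm(np.asarray(X_train) - x, axis=1)
        votes = {}
        for i in np.argsort(d)[:k]:
            w = 1.0 / (d[i] + 1e-12)  # guard against zero distance
            votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
        return max(votes, key=votes.get)

With k = 1 this reduces to assigning the class of the single nearest neighbor, as in the text.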