Clustering Sentence-Level Text Using a Novel Fuzzy Relational
... Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm Abstract—In comparison with hard clustering methods, in which a pattern belongs to a single cluster, fuzzy clustering algorithms allow patterns to belong to all clusters with ...
... data. The decision tree is grown using a depth-first strategy. The decision trees generated by C4.5 can be used for classification. The algorithm considers all the possible tests that can split the data set and selects the test that gives the best information gain. In every internal node the condition of some ...
full paper - Frontiers in Artificial Intelligence and Applications (FAIA)
... We hereafter present our approach on Distributed Stacking of multiple classifiers that attempts to counter the problems of large-scale distributed data mining explained in the previous section. It can be broken down into the following phases: 1. Local Learning. Suppose that there are N distributed d ...
Data Mining with Cellular Discrete Event Modeling and
... communities over the past several decades [3]. In this task, data is classified into different classes according to any criteria, and not only in terms of their relative importance or frequency of use. Classification, as a form of data analysis, is a well-known method used for extracting models desc ...
Improved Comprehensibility and Reliability of Explanations via Restricted Halfspace Discretization
... minimum description length principle ([22, 23, 45]) and other schemes using information gain were the most widely used methods, with strong performance with regard to prediction accuracy; see for example [3, 4, 21, 35]. Recent research has produced a number of new discretization algorithms [12, 13, ...
Evaluating data mining algorithms using molecular dynamics
... well-known data mining toolkit Weka alone offers 65 different classification algorithms, each equipped with different configuration options (Hall et al., 2009). Facing the challenge of selecting a few algorithms with the potential for yielding good results, we decided to conduct a comprehensive set ...
Unsupervised naive Bayes for data clustering with mixtures of
... by means of Gaussian distributions. In this implementation the user can specify, beforehand, the number of clusters or the EM can decide how many clusters to create by cross validation. In the last case, the number of clusters is initially set to 1, the EM algorithm is performed by using a 10 folds ...
automatic discretization in preprocessing for data analysis in
... requirements on the data presented to it. Preprocessing of the data is an essential part of any data mining process. Some of the methods require the input data to be discrete; take rough sets and association rules, for example [4]. That is why a new task in preprocessing is needed: discretization. ...
evaluation of data mining classification and clustering - MJoC
... by the data mining tool WEKA. According to the analysis results, the C4.5 decision tree algorithm achieved the best classification accuracy, at 91% (Vijayarani and Sudha, 2013). In this study, 10 different data mining algorithms have been used in the ...
String Edit Analysis for Merging Databases
... use a modification of the genetic algorithm to avoid the convergence to local minima. The algorithm uses a set of genes (20 in the example below), each of which represents an individual cost set including all of the edit rules. First, each gene is initialized with some random set of costs. Next, eac ...
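The initialization step described in the snippet can be sketched as follows. This is a minimal sketch under stated assumptions: the edit-rule names, cost range, and mutation scheme are illustrative, since the excerpt is truncated before the fitness-evaluation and selection steps.

```python
import random

def init_population(rules, pop_size=20, lo=0.0, hi=1.0):
    """Initialize a population of 'genes'. Each gene is an individual
    cost set assigning a random cost to every edit rule, as in the
    initialization step described above. Rule names are hypothetical."""
    return [{rule: random.uniform(lo, hi) for rule in rules}
            for _ in range(pop_size)]

def mutate(gene, rate=0.1, scale=0.05):
    """Perturb a fraction of the costs (an assumed mutation scheme),
    letting the search escape local minima by exploring nearby cost sets."""
    return {rule: cost + random.gauss(0, scale) if random.random() < rate else cost
            for rule, cost in gene.items()}

rules = ["insert", "delete", "substitute"]
population = init_population(rules)        # 20 genes, one cost per rule
child = mutate(population[0])              # slightly perturbed copy
```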
Object Recognition Using Discriminative Features and Linear Classifiers Karishma Agrawal Soumya Shyamasundar
... ensemble methods was done by taking the majority vote of all the predicted labels for each image. This gave us a new set of labels which achieved an increase in the classification accuracy. Our best accuracy, 56.876%, was obtained via majority voting. ...
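The majority-vote ensembling described above can be sketched in a few lines; the classifier outputs and label names here are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists by per-image majority vote.

    `predictions` is a list of label lists, one per classifier,
    all the same length (one predicted label per image).
    """
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions)]

# Three hypothetical classifiers voting on four images
clf_a = ["cat", "dog", "cat", "bird"]
clf_b = ["cat", "cat", "cat", "bird"]
clf_c = ["dog", "dog", "bird", "bird"]
print(majority_vote([clf_a, clf_b, clf_c]))  # ['cat', 'dog', 'cat', 'bird']
```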
K-nearest neighbors algorithm
In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.

For both classification and regression, it can be useful to weight the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor.

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm has nothing to do with, and is not to be confused with, k-means, another popular machine learning technique.
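Both modes described above, majority-vote classification and 1/d-weighted regression, can be sketched in a few lines of Python; the data and function names are illustrative, not a reference implementation.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; points are numeric tuples.
    """
    # Sort training points by Euclidean distance to the query, keep the k nearest.
    neighbors = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def knn_regress_weighted(train, query, k=3):
    """Predict a value as the 1/d-weighted average of the k nearest neighbors."""
    neighbors = sorted(train, key=lambda pv: math.dist(pv[0], query))[:k]
    # Guard against division by zero when a training point coincides with the query.
    weights = [1.0 / (math.dist(p, query) or 1e-12) for p, _ in neighbors]
    return sum(w * v for w, (_, v) in zip(weights, neighbors)) / sum(weights)

# Example: two well-separated clusters on a line
data = [((0.0,), "a"), ((0.5,), "a"), ((1.0,), "a"),
        ((5.0,), "b"), ((5.5,), "b"), ((6.0,), "b")]
print(knn_classify(data, (0.8,), k=3))  # "a"
print(knn_classify(data, (5.2,), k=3))  # "b"
```

Note that no work happens until a query arrives, which is exactly the "lazy learning" property: the training set is simply stored and scanned at prediction time.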