
New Trends in E-Science: Machine Learning and Knowledge
... even higher cost. From an application development point of view, this will require a fundamental paradigm shift from the currently sequential or parallel programming approach in scientific applications to a mix of parallel and distributed programming that builds programs that exploit low latency in ...
Optimal Ensemble Construction via Meta-Evolutionary
... Recently many researchers have combined the predictions of multiple classifiers to produce a better classifier, an ensemble, and often reported improved performance [1–3]. Bagging [4] and Boosting [5,6] are the most popular methods for creating accurate ensembles. Bagging is a bootstrap ensemble met ...
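The snippet above describes bagging as a bootstrap ensemble method. A minimal, self-contained sketch of the idea follows (the `OneRuleStump` weak learner and all function names are illustrative, not taken from the cited papers): each base learner is trained on a bootstrap sample drawn with replacement, and their predictions are combined by majority vote.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) points from data with replacement."""
    return [rng.choice(data) for _ in data]

def majority_label(labels):
    """Most common label, breaking ties by first occurrence."""
    return Counter(labels).most_common(1)[0][0]

class OneRuleStump:
    """A trivially weak learner: predicts the majority label of its sample."""
    def fit(self, sample):
        self.label = majority_label([y for _, y in sample])
        return self
    def predict(self, x):
        return self.label

def bagging_predict(data, x, n_estimators=11, seed=0):
    """Train n_estimators weak learners on bootstrap samples, then vote."""
    rng = random.Random(seed)
    votes = [OneRuleStump().fit(bootstrap_sample(data, rng)).predict(x)
             for _ in range(n_estimators)]
    return majority_label(votes)
```

In practice the stump would be replaced by a stronger base learner such as a decision tree; the point of the sketch is only the bootstrap-then-vote structure that bagging adds around any base classifier.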
A Parallel Spatial Co-location Mining Algorithm
... So-called “Big data” is a fact of today’s world and brings not only large amounts of data but also various data types that previously would not have been considered together. Richer data with geolocation information and date and time stamps is collected from numerous sources including mobile phones, ...
PRACTICAL K-ANONYMITY ON LARGE DATASETS By Benjamin
... As we spend more of our time online in information-rich and personalized environments, it becomes increasingly easy for details from our offline lives to meld with our online presence. Through Facebook and other social networks, our preferences in friends, food, and games become visible to other ...
An Effective Determination of Initial Centroids in K-Means
... using the available spatial information. A semivariogram-based grid clustering technique is used in this approach. It utilizes the spatial correlation for obtaining the bin size. The author combines this approach with a conventional k-means clustering technique as the bins are constrained to regular ...
fulltext - Simple search
... 2 Clustering in Social Web Dividing people into different groups is part of human nature. Previously, people used clustering in order to study phenomena and compare them with other phenomena based on a certain set of rules. Clustering refers to grouping similar things together. It is a division of da ...
Handling the Class Imbalance Problem in Binary Classification
... Natural processes often generate some observations more frequently than others. These processes result in unbalanced distributions, which cause classifiers to be biased toward the majority class, especially because most classifiers assume a normal distribution. The quantity and the diversity of imba ...
clustering sentence level text using a hierarchical fuzzy
... large amount of data is a difficult problem. One of the most important methods for helping users work efficiently is document clustering, a powerful way to organize text documents and to extract meaningful groups from a large collection. Document clust ...
Ontology-based Distance Measure for Text Clustering
... Combining this mutual information matrix and the traditional vector space model (VSM), we design a new data model (considering the correlation between terms) on which the Euclidean distance measure can be used. Two k-means-type clustering algorithms, standard k-means [11] and FW-KMeans [20], are imp ...
Unifying Instance-Based and Rule
... hyperplanes into hyperquadrics (Cost & Salzberg, 1993). Another extension to the basic IBL paradigm consists in using the k nearest neighbors for classification, instead of just the nearest one (Duda & Hart, 1973). The class assigned is then that of the majority of those k neighbors, or the class re ...
A fuzzy decision tree approach to start a genetic
... induction threshold. The same reason may be used to justify the generation of less accurate fuzzy trees. It is relevant to reaffirm that the objective is not to produce the best decision trees. They are only used to initiate the genetic algorithm. The fuzzy trees with up to 5 leaves are the ones wit ...
Data Mining Algorithms In R/Frequent Pattern Mining
... of this method is the usage of a special data structure named frequent-pattern tree (FP-tree), which retains the itemset association information. In simple words, this algorithm works as follows: first it compresses the input database, creating an FP-tree instance to represent frequent items. After t ...
Never Walk Alone: Uncertainty for Anonymity in Moving
... location based quasi-identifier, i.e., a spatio-temporal pattern that can uniquely identify one individual. How to exploit this interesting concept in the case of data publishing is a serious, challenging, open problem not addressed in [1] nor in other work. In our framework we do not take in consid ...
Genetics-Based Machine Learning for Rule Induction: Taxonomy
... overlapped, or as a decision list. Also, the inference type (the classification process itself) is very dependent on the type of rule used in the algorithm. The details are as follows: • Non-ordered overlapping rule sets, also called “IF-THEN” rules. Since the rules can be overlapping, the whole rule ...
Efficient Classification and Prediction Algorithms for Biomedical
... each transaction: date, customer identification code, goods bought and their amount, total money spent, and so forth. This requires storage space on the order of gigabytes on a daily basis. The problem here is how those supermarket branches can use this huge amount of raw data to predict which custome ...
K-nearest neighbors algorithm
In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.

Both for classification and regression, it can be useful to assign weight to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm has nothing to do with, and is not to be confused with, k-means, another popular machine learning technique.
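The 1/d weighted-vote scheme described above can be written in a few lines. This is an illustrative sketch (the `knn_classify` name is ours, not a standard API): neighbors are ranked by Euclidean distance, and each of the k closest casts a vote for its label weighted by the inverse of its distance, so an exact match dominates the vote.

```python
import math
from collections import defaultdict

def knn_classify(train, query, k=3):
    """Weighted k-NN classification.

    train: list of (point, label) pairs, where each point is a coordinate
    tuple; query: a coordinate tuple.  Votes are weighted by 1/d.
    """
    # Take the k training points closest to the query.
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        d = math.dist(point, query)
        # A zero-distance neighbor (exact match) should dominate the vote.
        votes[label] += 1.0 / d if d > 0 else float('inf')
    return max(votes, key=votes.get)
```

Setting all weights to 1 instead of 1/d recovers the plain majority vote; this lazy-learning structure also shows why no explicit training step is needed, since all work happens at query time.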