New results for a Hybrid Decision Tree/Genetic Algorithm for Data

... of small disjuncts. Overall, it obtained results considerably better than both standard C4.5 and double C4.5. Another advantage of the hybrid C4.5/GA is that in general it discovers a rule set considerably simpler (smaller) than the rule set discovered by standard C4.5. Hence, C4.5/GA seems a good c ...

Machine learning of functional class from phenotype data

... We aimed to learn rules for predicting functional classes which could be interpreted biologically. To this end we evaluated splitting the data set into 3 parts: training data, validation data to select the best rules from (rules were chosen that had an accuracy of at least 50% and correctly covered ...

Metalearning for Data Mining and KDD

... that is processed by such systems, it is impossible to store the data in a conventional manner. These so-called big data (more on this phenomenon in [3]) are often stored in distributed data storages across many storage units. It is obvious that all operations performed over such data need to be opti ...

Data

Distance-Based Outlier Detection: Consolidation and Renewed

... the object to its neighbors, the more likely it is an outlier. Distance-based approaches have been the subject of much research within the database and data analytics communities [3, 5, 10, 2]. It is a relatively non-parametric approach that has been shown to scale quite nicely to large datasets of ...
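As a rough illustration of the distance-based idea in the excerpt above, the sketch below scores each point by the distance to its k-th nearest neighbour, so that points far from their neighbours receive larger scores. The function name `knn_outlier_scores`, the choice of k, and the sample points are assumptions made for this example; the excerpt does not specify which scoring rule the paper uses.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbour.

    Hypothetical helper for illustration: a larger score means the point
    lies farther from its neighbours and is more likely to be an outlier.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four points on a unit square plus one point far away (assumed sample data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(points, k=2)
most_outlying = max(range(len(points)), key=scores.__getitem__)
```

This brute-force version compares every pair of points, so it is only practical for small inputs; scaling to large datasets requires pruning or indexing that is not shown here.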

Outlier detection in spatial data using the m

... methods this technique does not assume an underlying probability distribution model for the data. m-SNN can also be regarded as a variant of nearest neighbor method. In this method, we consider the ratio between the summation of Euclidean distances to shared nearest neighbors and total number of sha ...

The Data Mining and Data Usability Challenge

... ITSC and Simpson Weather Associates are applying data mining frameworks for the analysis and extraction of information from numerical model output data generated or archived at the GMAO. The team is conducting experiments focusing on the automated detection and mining of atmospheric phenomena relati ...

An Analysis of Particle Swarm Optimization with

... Data mining has been called exploratory data analysis, among other things. Masses of data generated from cash registers, from scanning, from topic specific databases throughout the company, are explored, analyzed, reduced, and reused. Searches are performed across different models proposed for predi ...

API Standardization Efforts for Data Mining

... oriented specification for a set of data access interfaces designed for record-oriented data stores. It employs SQL commands as arguments of interface operations. The approach in defining OLE DB for DM was not to extend OLE DB interfaces but to expose data mining interfaces in a language-based API. ...

Effective Classification of 3D Image Data using

... A lot of research has been done in the field of content-based retrieval and classification for general types of images (see [1, 2] for comparative surveys). In most cases the extracted features (usually color-based [3-5]) characterize the entire image rather than image regions and there is no distin ...

Semi-Final Proceedin..

... approaches. RadViz™ has this machine learning feature embedded in it and is responsible for the selections carried out here. The advantage of RadViz™ is that one immediately sees a “visual” clustering of the results of the t-statistic selection. Generally, the amount of visual class separation cor ...

Challenges for Information Retrieval in Big data

... class (e.g., thumbs-up and thumbs-down, or some other quantitative or binary ratings), they designed and experimented with a number of methods for building sentiment classifiers. They have shown that such classifiers perform quite well with test reviews. They also used their classifiers to classify sent ...

IEEE Paper Template in A4 (V1) - International Journal of Computer

... hard to choose (as discussed below) when not given by external constraints. In contrast to other algorithms, k-means also cannot be used with arbitrary distance functions or be used on non-numerical data. For these use cases, many other algorithms have been developed since. Feature learning: k-means ...

Chapter 1 MINING TIME SERIES DATA

Clustering Algorithms Applied in Educational Data Mining

... In another study, researchers have shown how educational institutions can benefit from the data collected by an LMS. They proposed the “Course Classification Algorithm” [45], which, when applied in the LMS that the institution uses (the Open e-Class platform), can be used to determine and genera ...

Special Topics: Advanced Classification Neural Nets, Support

... – Assign "blame" for the local error to neurons at the previous level, giving greater responsibility to neurons connected by stronger weights. – Repeat on the neurons at the previous level, using each one's "blame" as its error. ...
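The blame-assignment step in the excerpt above can be sketched as a weighted sum: each neuron at the previous level receives the errors of the neurons it feeds, scaled by the connecting weights. This is a simplified illustration only. The function name `propagate_blame`, the weight layout, and the numbers are assumptions, and the activation-derivative factor used in full backpropagation is deliberately left out.

```python
def propagate_blame(weights, errors):
    """Distribute current-level errors back to the previous level.

    weights[i][j] is the (assumed) weight from previous-level neuron i to
    current-level neuron j; errors[j] is the local error at neuron j.
    Stronger weights pass back a larger share of the error.
    """
    return [
        sum(w_ij * e_j for w_ij, e_j in zip(row, errors))
        for row in weights
    ]


# Neuron 0 is strongly connected to the neuron with the largest error,
# so it should receive more blame than neuron 1 (assumed sample values).
weights = [[0.9, 0.1],
           [0.1, 0.1]]
errors = [1.0, 0.5]
blame = propagate_blame(weights, errors)
```

Repeating the same call with `blame` as the new `errors` and the weights of the next-earlier layer corresponds to the "repeat on the neurons at the previous level" step.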

Software Engineering: Analysis and Design

ECLT5810 E-Commerce Data Mining Techniques Overview of SAS

... 4 every nth may contain sample with part of structure, especially when data set is sorted - Stratified 1 Specify class variables to form strata (subsets) 2 Preserve the strata proportions of the original data set - First N 1 Select first N observations - Cluster 1 Cluster variable: class va ...

Optimal Grid-Clustering: Towards Breaking the Curse of

... no good partitioning plane exists in some dimensions, we do not partition the data set in those dimensions. Our strategy of using a data-dependent partitioning of the data avoids the effectiveness problems of the existing approaches and guarantees that all clusters are found by the algorithm (even for ...

MS PowerPoint 97 format - Kansas State University

... – Set whose entities are alike and are different from entities in other clusters – Aggregation of points in the instance space such that distance between any two points in the cluster is less than the distance between any point in the cluster and ...

“Secure” Logistic Regression of Horizontally and Vertically

... depends on how the database is partitioned. When the parties (government agencies or competing business establishments) have exactly the same variables but for different data subjects, we call the situation (pure) horizontally partitioned data. At the other extreme, when the parties hold disjoint se ...

View PDF - International Journal of Computer Science and Mobile

... achieves security only for a passive adversary setting, without the possibility to enhance it to active adversary settings. Due to their simplicity, the described protocols are well-suited for educational purposes, which is a main goal of this paper. Another advantage of the protocol used in this pa ...

effectiveness prediction of memory based classifiers for the

... instance closest to the given test instance, and predicts the same class as this training instance. If several instances have the smallest distance to the test instance, the first one obtained is used. The nearest neighbour method is one of the simplest learning/classification algori ...
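The rule described in the excerpt above, including its tie-breaking behaviour, can be sketched in a few lines. The function name `nearest_neighbour_predict`, the use of Euclidean distance, and the sample data are assumptions for illustration; the excerpt does not say which distance measure the paper uses.

```python
import math


def nearest_neighbour_predict(train, test_point):
    """Predict the class of the closest training instance.

    train is a list of (features, label) pairs. Python's min() returns the
    first minimal element, which matches the "first one obtained" tie rule.
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], test_point))
    return label


# Assumed sample data: the test point (1, 0) is equally far from "a" and "b".
train = [((0.0, 0.0), "a"), ((2.0, 0.0), "b"), ((5.0, 5.0), "c")]
tied_prediction = nearest_neighbour_predict(train, (1.0, 0.0))
far_prediction = nearest_neighbour_predict(train, (4.0, 4.0))
```

Because ties are resolved by position, the order of the training list affects the prediction whenever two instances are equally close.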

IJESRT

... as [4]: In subgroup discovery, we assume we are given a so-called population of people (objects, clients) and a property of those people we are interested in. The aim of subgroup discovery is then to find the subgroups of the population that are measurably "most interesting", i.e. are as large as co ...

Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data – that is, distance measurements.
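To make the manifold assumption concrete, the sketch below places points on a semicircle, a one-dimensional curve embedded in two dimensions, and compares the straight-line distance between its endpoints with the distance measured along the curve. The semicircle, the number of samples, and the variable names are assumptions chosen for this example; it illustrates the underlying idea and is not an implementation of any particular NLDR algorithm.

```python
import math

# Sample a semicircle: every point is fully described by one parameter t,
# even though it is stored with two coordinates.
n = 101
ts = [math.pi * i / (n - 1) for i in range(n)]
points = [(math.cos(t), math.sin(t)) for t in ts]

# Distance through the surrounding 2-D space (a chord of the circle).
straight_line = math.dist(points[0], points[-1])

# Distance along the curve, approximated by summing short hops
# between consecutive samples.
along_curve = sum(math.dist(points[i], points[i + 1]) for i in range(n - 1))
```

The straight-line distance is close to 2, while the distance along the curve approaches π as more samples are used. Methods that work from proximity data have to decide which of these two notions of distance the low-dimensional picture should preserve.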

Top subcategories

Nonlinear dimensionality reduction