Applying Data Mining Techniques to Discover Patterns in Context
... Still, it is not always feasible to fully use the possibilities given. This is mainly because the efficiency of how one uses mobile learning platform hardly depends on each individual’s environment (e.g. location, time of the day), his mental state and possibilities of devices that he or she uses. A ...
... Still, it is not always feasible to fully use the possibilities given. This is mainly because the efficiency of how one uses mobile learning platform hardly depends on each individual’s environment (e.g. location, time of the day), his mental state and possibilities of devices that he or she uses. A ...
Parallel Outlier Detection on Uncertain Data for GPUs
... A considerable portion of data in the real world contains some degree of uncertainty [3], due to factors such as limitations in measuring equipment, partial responses or interpolation [5] [13]. An example of this could be a remote sensor network or location-based tracking system [42]. There have bee ...
... A considerable portion of data in the real world contains some degree of uncertainty [3], due to factors such as limitations in measuring equipment, partial responses or interpolation [5] [13]. An example of this could be a remote sensor network or location-based tracking system [42]. There have bee ...
Drug Similarity
... Guillermo Palma, Maria-Esther Vidal, Eric Haag, Louiqa Raschid and Andreas Thor. Measuring Relatedness Between Scientific Entities in Annotation Datasets. ACM BCB. 2013. Joseph Benik, Caren Chang, Louiqa Raschid, Maria-Esther Vidal, Guillermo Palma and Andreas Thor. Finding Cross Genome Patterns i ...
... Guillermo Palma, Maria-Esther Vidal, Eric Haag, Louiqa Raschid and Andreas Thor. Measuring Relatedness Between Scientific Entities in Annotation Datasets. ACM BCB. 2013. Joseph Benik, Caren Chang, Louiqa Raschid, Maria-Esther Vidal, Guillermo Palma and Andreas Thor. Finding Cross Genome Patterns i ...
Mining Data Bases and Data Streams
... too massive. In these situations, the information must be processed and mined as the data arrives. While this new research area, namely data stream mining, shares many similarities with database mining, it also poses unique challenges and requirements discussed in Section 5.4. The progress made by D ...
... too massive. In these situations, the information must be processed and mined as the data arrives. While this new research area, namely data stream mining, shares many similarities with database mining, it also poses unique challenges and requirements discussed in Section 5.4. The progress made by D ...
A tutorial on using the rminer R package for data mining tasks*
... A data.frame can be easily manipulated as a matrix using standard R operations (e.g., cut, tt sample, cbind, nrow). The rminer package includes some useful preprocessing functions, such as: delevels ...
... A data.frame can be easily manipulated as a matrix using standard R operations (e.g., cut, tt sample, cbind, nrow). The rminer package includes some useful preprocessing functions, such as: delevels ...
CD: A Coupled Discretization Algorithm
... [13]; decision tree algorithms cannot handle numeric features in tolerable time directly, and only carry out a selection of nominal attributes [9]; and attribute reduction algorithms in rough set theory can only apply to the categorical values [10]. However, real-world data sets predominantly consis ...
... [13]; decision tree algorithms cannot handle numeric features in tolerable time directly, and only carry out a selection of nominal attributes [9]; and attribute reduction algorithms in rough set theory can only apply to the categorical values [10]. However, real-world data sets predominantly consis ...
FROM DATA MINING TO SENTIMENT ANALYSIS Classifying documents through existing opinion mining methods
... even impossible, for example, handheld devices and smart phones. Web-browsers like Firefox, Chrome and Opera utilize SQLite to store data easily and without extra burden for the user. Self-containment makes it also convenient for testing purposes where new databases need to be created and removed wi ...
... even impossible, for example, handheld devices and smart phones. Web-browsers like Firefox, Chrome and Opera utilize SQLite to store data easily and without extra burden for the user. Self-containment makes it also convenient for testing purposes where new databases need to be created and removed wi ...
Discovering frequent patterns in sensitive data
... 3. (Accuracy) For every pattern in the output list, the noisy frequency reported differs by no more than η from the corresponding true frequency. ...
... 3. (Accuracy) For every pattern in the output list, the noisy frequency reported differs by no more than η from the corresponding true frequency. ...
Recurring and Novel Class Detection using Class-Based Ensemble
... complete classification model. Next, we briefly discuss the construction and operation of the micro-classifiers. We train r micro-classifiers from a data chunk, where r is the number of classes in the chunk. Each micro-classifier is trained using only the positive instances of a class. During training (s ...
... complete classification model. Next, we briefly discuss the construction and operation of the micro-classifiers. We train r micro-classifiers from a data chunk, where r is the number of classes in the chunk. Each micro-classifier is trained using only the positive instances of a class. During training (s ...
Preference Learning with Gaussian Processes
... Grid search can be used to find the optimal hyperparameter values θ , but such an approach is very expensive when a large number of hyperparameters are involved. For example, automatic relevance determination (ARD) parameters 3 could be embedded into the covariance function (2) as a means of featur ...
... Grid search can be used to find the optimal hyperparameter values θ , but such an approach is very expensive when a large number of hyperparameters are involved. For example, automatic relevance determination (ARD) parameters 3 could be embedded into the covariance function (2) as a means of featur ...
K-nearest neighbors algorithm
In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression: In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor. In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.Both for classification and regression, it can be useful to assign weight to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm has nothing to do with and is not to be confused with k-means, another popular machine learning technique.