
Technical report MSU-CSE-04-35
... algorithms to build a classification or regression model. Despite its importance, pattern ordering is a challenging task due to the wide range of metrics and expert’s opinions available for ranking patterns. As shown in [22], many existing metrics such as support, confidence, lift, correlation, χ2, ...
Feature Discovery in the Context of Educational Data Mining: An
... training data, test on held out data • 38% improvement in R2 of discovered features over baseline regression on initial features ...
An Introduction to Cluster Analysis for Data Mining
... or efficiently finding the nearest neighbors of points. Whether for understanding or utility, cluster analysis has long been used in a wide variety of fields: psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining. The ...
Binary Matrix Factorization with Applications
... • Step 2. for the element X(i, j) > p, X(i, j) = 1, otherwise X(i, j) = 0, where p is a pre-assigned parameter that controls the sparsity of X. Table 1 shows the numerical results where the size of the input binary matrix X is 200 × 400. In Table 1, the density parameter P is selected from {0.2, 0.5 ...
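The thresholding in Step 2 can be sketched in a few lines of Python. This is only an illustration: Step 1 is not shown in the excerpt, so the real-valued input matrix below is a made-up example, and the function name `binarize` is hypothetical.

```python
def binarize(matrix, p):
    """Return a 0/1 copy of `matrix`: entries strictly greater than p become 1."""
    return [[1 if value > p else 0 for value in row] for row in matrix]


# Hypothetical real-valued input; the excerpt does not say how it is produced.
real_valued = [
    [0.1, 0.7, 0.5],
    [0.9, 0.2, 0.6],
]

# A larger p turns fewer entries into 1, so the result is sparser.
binary = binarize(real_valued, 0.5)
print(binary)
```

Because the comparison is strict, an entry exactly equal to p maps to 0, which matches the "X(i, j) > p" wording of the excerpt.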
Extraction of Significant Patterns from Heart Disease Warehouses
... rule. Experiments illustrated that the constraints reduced the number of discovered rules remarkably besides decreasing the running time. Two groups of rules envisaged the presence or absence of heart disease in four specific heart arteries. Data mining methods may aid the clinicians in the predicat ...
Mining association rules for the quality improvement of the
... mining association rules is to find all association rules in a database having a support no less than a user-defined threshold minsup and a confidence no less than a user-defined threshold minconf. The problem of rule mining can be decomposed in two steps: Step 1 is to determine all frequent itemset ...
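As a rough illustration of the two thresholds named in the excerpt, the sketch below computes support and confidence for a single rule and checks it against minsup and minconf. The transactions and the helper names (`support`, `confidence`, `rule_passes`) are assumptions made for this example, not taken from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


def rule_passes(transactions, antecedent, consequent, minsup, minconf):
    """True if the rule meets both user-defined thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= minsup
            and confidence(transactions, antecedent, consequent) >= minconf)


# Made-up toy database of four transactions.
transactions = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper"},
    {"beer", "milk"},
    {"diaper", "milk"},
]

print(support(transactions, {"beer", "diaper"}))
print(confidence(transactions, {"beer"}, {"diaper"}))
print(rule_passes(transactions, {"beer"}, {"diaper"}, minsup=0.5, minconf=0.6))
```

A full miner would first enumerate the frequent itemsets (Step 1 in the excerpt) and only then generate rules from them; this sketch only evaluates one candidate rule.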
Conceptual Grouping of Object Behaviour in
... The importance of the qualitative reasoning in making conclusions and predictions on the system behaviour, even without complete data, makes it suitable for many real world problems. The proposed system uses qualitative spatiotemporal representation and reasoning as the base in laboratory animal beh ...
ADWICE - Anomaly Detection with Real
... a real network is used, the problem of producing good normal data is reduced, but then the data may be too sensitive to be released in public. For learning based methods, good data is not only necessary for evaluation and testing, but also for training. Thus applying a learning based method in the r ...
Abstract - Logic Systems
... different impact, by reexamining the notion of reverse nearest neighbors in the unsupervised outlier-detection context. Namely, it was recently observed that the distribution of points’ reverse-neighbor counts becomes skewed in high dimensions, resulting in the phenomenon known as hubness. We provid ...
Cluster Analysis
... The clustering process can be presented as searching a graph where every node is a potential solution, that is, a set of k medoids. If the local optimum is found, CLARANS starts with a new randomly selected node in search of a new local optimum. It is more efficient and scalable than both PAM and CLARA ...
Fast Parallel Mining of Frequent Itemsets - MSU CSE
... growth of the available data demands more and more computational power. To address this issue, it is necessary to study parallel implementations of such algorithms. In this paper, we propose a parallel approach to the Frequent Pattern Tree (FP-Tree) algorithm, which is a fast and popular tree projec ...
Global Discretization of Continuous Attributes as Preprocessing for
... where α_b = α_c = ½, β = −¼, and γ = 0 for the median cluster analysis method. At any point during the clustering process the clusters formed induce a partition on the set of examples U. Examples that belong to the same cluster are indiscernible by the subset of continuous attributes. Therefore, we ...
A Novel Approach towards Tourism Recommendation System with
... attributes like age, gender and race as well as travel group types like family, friends and couple. They had exploited the detected people attributes and travel group types in photo contents. They had used probabilistic Bayesian learning framework which is used as a part of mobile recommendation on ...
ASSOCIATION RULE MINING IN COOPERATIVE RESEARCH A
... The survey is a joint project between UPI and the University of Missouri’s Graduate Institute of Cooperative Leadership. The objective of the survey was to understand what types of services their members desire, the relative emphasis they place on these services, and how well the cooperative is curr ...
Visual Quality Assessment of Subspace Clusterings
... parameter settings, the criticism to this evaluation method is manifold: The main problem of external quality measures lies in the use of a ground truth clustering itself. In most (real-world) applications and datasets with unknown data a ground truth is not available. Even if a ground truth labelin ...
FP-Outlier: Frequent Pattern Based Outlier Detection
... approaches are not appropriate for discovering outliers in a high dimensional space. Furthermore, they failed to find outliers in the subsets of dimensions. The method proposed by Aggarwal and Yu [4] considers data points in a local region of abnormally low density as outliers to conquer the curse o ...
Distributed Higher Order Association Rule Mining Using
... algorithms discussed assume that the databases are horizontally distributed. This limits the applicability of these algorithms. To address this issue, distributed mining of vertically fragmented data has received a growing amount of attention, especially in the context of privacy preserving data min ...
Data Mining - TIGP Bioinformatics Program
... patterns • Any subset of a frequent itemset must be frequent • If {beer, diaper, nuts} is frequent, so is {beer, diaper} • i.e., every transaction having {beer, diaper, nuts} also ...
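The subset property stated in the bullets can be checked directly on a small example. The sketch below is only an illustration: the transactions are invented, and `count_support` and `subsets_at_least_as_frequent` are hypothetical helper names.

```python
from itertools import combinations


def count_support(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def subsets_at_least_as_frequent(transactions, itemset):
    """Check that every non-empty proper subset occurs at least as often as `itemset`."""
    items = sorted(itemset)
    full_count = count_support(transactions, items)
    for size in range(1, len(items)):
        for subset in combinations(items, size):
            if count_support(transactions, subset) < full_count:
                return False
    return True


# Made-up toy database.
transactions = [
    frozenset({"beer", "diaper", "nuts"}),
    frozenset({"beer", "diaper", "nuts", "milk"}),
    frozenset({"beer", "diaper"}),
    frozenset({"milk"}),
]

print(count_support(transactions, {"beer", "diaper", "nuts"}))
print(count_support(transactions, {"beer", "diaper"}))
print(subsets_at_least_as_frequent(transactions, {"beer", "diaper", "nuts"}))
```

The check holds for any database, because a transaction that contains the larger itemset necessarily contains each of its subsets. Apriori-style algorithms use the contrapositive: an itemset with an infrequent subset cannot be frequent and need not be counted.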
data mining
... Moreover, they have evaluated the classifier performance using ROC values and Kappa statistics in the Weka data mining tool. Soni et al. (March 2011) have performed an analysis of data mining techniques using Tanagra. But they have made a complete study of only the heart disease dataset and have provide ...
A MapReduce Algorithm for Polygon Retrieval
... retrieval involves retrieval of all terrain data within a given polygon’s boundary [4], [5] to access the spatial data within a specific area of interest for further analysis. We note that terrain data is usually represented using one of the common data structures to approximate surface, for example ...
Cluster Analysis
... clusters of the current partition. The centroid is the center (mean point) of the cluster. Assign each object to the cluster with the nearest seed point. Go back to Step 2; stop when there are no new assignments. ...
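The loop described in this excerpt (recompute centroids, reassign each object to the nearest seed point, repeat until nothing changes) might look roughly like the following. This is a minimal sketch, not the slides' own code: the first step is cut off in the excerpt, so the initial partition is simply passed in as an assumed input, and the function names and the `max_iter` safeguard are hypothetical.

```python
def centroid(points):
    """Mean point of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, assignment, k, max_iter=100):
    """Iterate centroid update and reassignment until assignments stop changing.

    `assignment` is an assumed initial partition: one cluster index per point.
    """
    assignment = list(assignment)
    seeds = []
    for _ in range(max_iter):
        # Seed points are the centroids of the clusters of the current partition.
        seeds = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            seeds.append(centroid(members) if members else None)
        # Assign each object to the cluster with the nearest seed point
        # (clusters that became empty are skipped).
        new_assignment = [
            min((c for c in range(k) if seeds[c] is not None),
                key=lambda c: squared_distance(p, seeds[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no new assignments: stop
        assignment = new_assignment
    return assignment, seeds


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, seeds = k_means(points, assignment=[0, 1, 0, 1], k=2)
print(labels, seeds)
```

The result depends on the initial partition, so in practice the procedure is often run from several starting points.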