www.cs.gmu.edu - George Mason University Department of

Technical report MSU-CSE-04-35

... algorithms to build a classification or regression model. Despite its importance, pattern ordering is a challenging task due to the wide range of metrics and experts’ opinions available for ranking patterns. As shown in [22], many existing metrics such as support, confidence, lift, correlation, χ², ...
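To make a few of the metrics named in this excerpt concrete, here is a minimal sketch that scores one rule A → B by support, confidence and lift over a toy transaction list. The helper `rule_metrics` and the transactions are hypothetical illustrations, not taken from the cited report; correlation and χ² are left out.

```python
from fractions import Fraction


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent.

    Hypothetical helper for illustration; exact fractions avoid rounding noise.
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)
    n_c = sum(1 for t in transactions if c <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    support = Fraction(n_ac, n)
    confidence = Fraction(n_ac, n_a) if n_a else Fraction(0)
    # Lift compares the rule's confidence with the consequent's baseline frequency.
    lift = confidence / Fraction(n_c, n) if n_c else Fraction(0)
    return support, confidence, lift


transactions = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper"},
    {"beer", "cola"},
    {"diaper", "milk"},
]
print(rule_metrics(transactions, {"beer"}, {"diaper"}))
```

A lift below 1, as for beer → diaper here (8/9), means the two items co-occur slightly less often than independence would predict, which is one reason rankings by confidence and by lift can disagree.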

Feature Discovery in the Context of Educational Data Mining: An

... training data, test on held-out data • 38% improvement in R² of discovered features over baseline regression on initial features ...

An Introduction to Cluster Analysis for Data Mining

... or efficiently finding the nearest neighbors of points. Whether for understanding or utility, cluster analysis has long been used in a wide variety of fields: psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining. The ...

Binary Matrix Factorization with Applications

... • Step 2. For each element X(i, j), set X(i, j) = 1 if X(i, j) > p, otherwise X(i, j) = 0, where p is a pre-assigned parameter that controls the sparsity of X. Table 1 shows the numerical results where the size of the input binary matrix X is 200 × 400. In Table 1, the density parameter P is selected from {0.2, 0.5 ...
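Step 2 can be sketched as follows. The excerpt elides Step 1, so this sketch assumes (unconfirmed by the source) that X starts as a matrix of values drawn uniformly from [0, 1); the function name `random_binary_matrix` is hypothetical, and plain lists keep it dependency-free.

```python
import random


def random_binary_matrix(rows, cols, p, seed=None):
    """Threshold a real-valued matrix at p (Step 2 of the excerpt).

    Assumption: Step 1, which the excerpt elides, fills X with values drawn
    uniformly from [0, 1); the source does not confirm this.
    """
    rng = random.Random(seed)
    # Step 1 (assumed): real-valued matrix with uniform entries.
    x = [[rng.random() for _ in range(cols)] for _ in range(rows)]
    # Step 2: X(i, j) = 1 if X(i, j) > p, otherwise 0.
    return [[1 if v > p else 0 for v in row] for row in x]


X = random_binary_matrix(200, 400, p=0.5, seed=0)
density = sum(map(sum, X)) / (200 * 400)
print(len(X), len(X[0]), round(density, 3))
```

Under the uniform assumption the expected fraction of ones is 1 − p, so a larger p yields a sparser X.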

hipc_presentation - web.iiit.ac.in

Extraction of Significant Patterns from Heart Disease Warehouses

... rule. Experiments illustrated that the constraints reduced the number of discovered rules remarkably besides decreasing the running time. Two groups of rules envisaged the presence or absence of heart disease in four specific heart arteries. Data mining methods may aid the clinicians in the predicat ...

Mining association rules for the quality improvement of the

... mining association rules is to find all association rules in a database having a support no less than a user-defined threshold minsup and a confidence no less than a user-defined threshold minconf. The problem of rule mining can be decomposed into two steps: Step 1 is to determine all frequent itemset ...
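The two-step decomposition can be sketched by brute force: Step 1 enumerates every itemset with support ≥ minsup, and Step 2 splits each frequent itemset into rules X → Y with confidence ≥ minconf. Enumerating all candidates is exponential in the number of items, so this is only a toy-scale illustration; practical miners such as Apriori or FP-growth prune the search. The function names and data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Step 1: every itemset whose support (fraction of transactions) >= minsup.

    Brute force over all candidate itemsets; suitable for toy data only.
    """
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            support = sum(1 for t in transactions if set(cand) <= t) / n
            if support >= minsup:
                result[frozenset(cand)] = support
    return result


def association_rules(transactions, minsup, minconf):
    """Step 2: split each frequent itemset into lhs -> rhs with confidence >= minconf."""
    freq = frequent_itemsets(transactions, minsup)
    rules = []
    for itemset, sup in freq.items():
        for size in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), size):
                lhs = frozenset(lhs_items)
                # lhs is frequent too, since every subset of a frequent itemset is.
                conf = sup / freq[lhs]
                if conf >= minconf:
                    rules.append((set(lhs), set(itemset - lhs), sup, conf))
    return rules


transactions = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper"},
    {"beer", "cola"},
    {"diaper", "milk"},
]
rules = association_rules(transactions, minsup=0.5, minconf=0.6)
for lhs, rhs, sup, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f}", f"confidence={conf:.2f}")
```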

review clustering mechanisms of distributed denial of service attacks

Conceptual Grouping of Object Behaviour in

... The importance of qualitative reasoning in drawing conclusions and making predictions about system behaviour, even without complete data, makes it suitable for many real-world problems. The proposed system uses qualitative spatiotemporal representation and reasoning as the basis for laboratory animal beh ...

ADWICE - Anomaly Detection with Real

... a real network is used, the problem of producing good normal data is reduced, but then the data may be too sensitive to be released publicly. For learning-based methods, good data is necessary not only for evaluation and testing but also for training. Thus applying a learning-based method in the r ...

Abstract - Logic Systems

... different impact, by reexamining the notion of reverse nearest neighbors in the unsupervised outlier-detection context. Namely, it was recently observed that the distribution of points’ reverse-neighbor counts becomes skewed in high dimensions, resulting in the phenomenon known as hubness. We provid ...
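Reverse-neighbor counts, often written N_k(x), record how many points include x among their k nearest neighbors. Hubs have large counts, and points that almost never appear in others' neighbor lists are natural outlier candidates. A brute-force sketch, assuming Euclidean distance (the function name is hypothetical):

```python
import math


def reverse_neighbor_counts(points, k):
    """N_k(x) for every point: how many other points list x among their k nearest neighbors.

    Brute-force O(n^2) Euclidean distances; ties are broken by index order.
    """
    n = len(points)
    counts = [0] * n
    for i in range(n):
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in others[:k]:
            counts[j] += 1
    return counts


print(reverse_neighbor_counts([(0,), (1,), (2,), (10,)], k=1))  # → [1, 2, 1, 0]
```

In this usage example the isolated point at 10 is never anyone's nearest neighbor, so its count is 0; the counts always sum to n·k when k < n.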

Cluster Analysis

... The clustering process can be presented as searching a graph where every node is a potential solution, that is, a set of k medoids. If a local optimum is found, CLARANS starts with a new randomly selected node in search of a new local optimum. It is more efficient and scalable than both PAM and CLARA ...
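The randomized graph search can be sketched as below. The parameter names `numlocal` (number of local searches) and `maxneighbor` (random neighbors tried before a node is accepted as a local optimum) follow common descriptions of CLARANS; the Euclidean cost, the default values and the function names are assumptions for illustration.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, numlocal=2, maxneighbor=20, seed=None):
    """Randomized search over sets of k medoid indices; needs len(points) > k.

    Each node of the search graph is a set of k medoids; two nodes are
    neighbors when they differ in exactly one medoid.
    """
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, math.inf
    for _ in range(numlocal):
        # Start each local search from a new randomly selected node.
        current = rng.sample(range(n), k)
        current_cost = total_cost(points, current)
        j = 1
        while j <= maxneighbor:
            # Random neighbor: swap one medoid for one non-medoid.
            neighbor = list(current)
            non_medoids = [i for i in range(n) if i not in current]
            neighbor[rng.randrange(k)] = rng.choice(non_medoids)
            cost = total_cost(points, neighbor)
            if cost < current_cost:
                current, current_cost, j = neighbor, cost, 1
            else:
                j += 1
        # No improving neighbor in maxneighbor tries: treat current as a local optimum.
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return sorted(best), best_cost


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = clarans(points, k=2, numlocal=5, maxneighbor=50, seed=1)
print(medoids, cost)
```

The sketch recomputes the full cost for every neighbor, which keeps it short but is slower than computing the swap cost incrementally.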

Fast Parallel Mining of Frequent Itemsets - MSU CSE

... growth of the available data demands more and more computational power. To address this issue, it is necessary to study parallel implementations of such algorithms. In this paper, we propose a parallel approach to the Frequent Pattern Tree (FP-Tree) algorithm, which is a fast and popular tree projec ...


Global Discretization of Continuous Attributes as Preprocessing for

... where $\alpha_b = \alpha_c = \tfrac{1}{2}$, $\beta = -\tfrac{1}{4}$, and $\gamma = 0$ for the median cluster analysis method. At any point during the clustering process the clusters formed induce a partition on the set of examples U. Examples that belong to the same cluster are indiscernible by the subset of continuous attributes. Therefore, we ...
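These coefficients plug into the Lance-Williams recurrence, which gives the dissimilarity between a cluster a and the cluster formed by merging b and c: d(a, b∪c) = α_b·d(a, b) + α_c·d(a, c) + β·d(b, c) + γ·|d(a, b) − d(a, c)|. A minimal sketch with the median-method values α_b = α_c = 1/2, β = −1/4, γ = 0 as defaults (the function name is hypothetical):

```python
def lance_williams(d_ab, d_ac, d_bc, alpha_b=0.5, alpha_c=0.5, beta=-0.25, gamma=0.0):
    """Dissimilarity between cluster a and the merged cluster (b ∪ c).

    Defaults are the median-method coefficients; other agglomerative methods
    differ only in these four values.
    """
    return alpha_b * d_ab + alpha_c * d_ac + beta * d_bc + gamma * abs(d_ab - d_ac)


# Points a = 0, b = 2, c = 4 on a line, with squared Euclidean distances.
print(lance_williams(d_ab=4.0, d_ac=16.0, d_bc=4.0))  # → 9.0
```

With squared Euclidean distances, the median-method value equals the squared distance from a to the midpoint of the centers of b and c: here 2 + 8 − 1 = 9 = (0 − 3)².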

A Novel Approach towards Tourism Recommendation System with

... attributes like age, gender and race as well as travel group types like family, friends and couple. They exploited the detected people attributes and travel group types in photo contents. They used a probabilistic Bayesian learning framework, which is used as part of mobile recommendation on ...

ASSOCIATION RULE MINING IN COOPERATIVE RESEARCH A

... The survey is a joint project between UPI and the University of Missouri’s Graduate Institute of Cooperative Leadership. The objective of the survey was to understand what types of services their members desire, the relative emphasis they place on these services, and how well the cooperative is curr ...

Visual Quality Assessment of Subspace Clusterings

... parameter settings, the criticism of this evaluation method is manifold: The main problem of external quality measures lies in the use of a ground truth clustering itself. In most (real-world) applications and datasets with unknown data, a ground truth is not available. Even if a ground truth labelin ...

FP-Outlier: Frequent Pattern Based Outlier Detection

... approaches are not appropriate for discovering outliers in a high-dimensional space. Furthermore, they fail to find outliers in subsets of the dimensions. The method proposed by Aggarwal and Yu [4] considers data points in a local region of abnormally low density as outliers to conquer the curse o ...
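The "abnormally low density" criterion attributed to Aggarwal and Yu [4] is commonly stated as a sparsity coefficient for a k-dimensional grid cube D, where each attribute is split into φ equi-depth ranges (f = 1/φ) and N is the number of points: S(D) = (n(D) − N·f^k) / sqrt(N·f^k·(1 − f^k)). This formulation is recalled from the literature rather than from the excerpt, and the argument names are hypothetical.

```python
import math


def sparsity_coefficient(n_cube, n_total, phi, k):
    """Sparsity coefficient of a k-dimensional grid cube (assumed formulation).

    Each attribute is split into phi equi-depth ranges, so under independence a
    cube is expected to hold n_total * f**k points, with f = 1 / phi.
    Strongly negative values flag cubes that are abnormally sparse.
    """
    f_k = (1.0 / phi) ** k
    expected = n_total * f_k
    return (n_cube - expected) / math.sqrt(n_total * f_k * (1.0 - f_k))


print(sparsity_coefficient(n_cube=0, n_total=10_000, phi=10, k=2))
```

An empty 2-dimensional cube in a 10 × 10 grid over 10,000 points scores about −10, that is, roughly ten standard deviations sparser than expected under independence.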

Distributed Higher Order Association Rule Mining Using

... algorithms discussed assume that the databases are horizontally distributed. This limits the applicability of these algorithms. To address this issue, distributed mining of vertically fragmented data has received a growing amount of attention, especially in the context of privacy preserving data min ...

Data Mining - TIGP Bioinformatics Program

... patterns • Any subset of a frequent itemset must be frequent • If {beer, diaper, nuts} is frequent, so is {beer, diaper} • i.e., every transaction having {beer, diaper, nuts} also ...
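The property stated in this excerpt is what lets Apriori discard candidates without scanning the data: a size-k candidate with any infrequent (k−1)-subset cannot itself be frequent. A minimal sketch of that pruning step (names and data are hypothetical):

```python
from itertools import combinations


def prune_candidates(candidates, frequent_prev):
    """Drop any size-k candidate that has an infrequent (k-1)-subset.

    Every subset of a frequent itemset is frequent, so a single infrequent
    subset rules the candidate out without counting it in the transactions.
    """
    kept = []
    for cand in candidates:
        if all(frozenset(sub) in frequent_prev for sub in combinations(cand, len(cand) - 1)):
            kept.append(cand)
    return kept


frequent_2 = {
    frozenset({"beer", "diaper"}),
    frozenset({"beer", "nuts"}),
    frozenset({"diaper", "nuts"}),
    frozenset({"beer", "cola"}),
}
candidates_3 = [
    frozenset({"beer", "diaper", "nuts"}),
    frozenset({"beer", "diaper", "cola"}),
]
print(prune_candidates(candidates_3, frequent_2))
```

Here {beer, diaper, cola} is dropped because its subset {diaper, cola} is not frequent, while {beer, diaper, nuts} survives to be counted.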

data mining

... Moreover, they have evaluated the classifier performance using ROC values and Kappa statistics in the Weka data mining tool. Soni et al. (March 2011) have performed an analysis of data mining techniques using Tanagra. But they have made a complete study of only the heart disease dataset and have provide ...

A MapReduce Algorithm for Polygon Retrieval

... retrieval involves retrieval of all terrain data within a given polygon’s boundary [4], [5] to access the spatial data within a specific area of interest for further analysis. We note that terrain data is usually represented using one of the common data structures to approximate the surface, for example ...
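Whatever structure stores the terrain, the core test in polygon retrieval is whether a sample lies inside the polygon. The sketch below uses the standard ray-casting (even-odd) rule and assumes terrain given as (x, y, elevation) points; it shows only the geometric filter, not the MapReduce algorithm the paper proposes, and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; points exactly on an edge
    may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def retrieve(samples, polygon):
    """Keep the terrain samples (x, y, elevation) whose (x, y) lies inside the polygon."""
    return [s for s in samples if point_in_polygon(s[0], s[1], polygon)]


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
samples = [(1, 1, 10), (5, 5, 20), (2, 3, 30)]
print(retrieve(samples, square))  # → [(1, 1, 10), (2, 3, 30)]
```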

Cluster Analysis

... clusters of the current partition. The centroid is the center (mean point) of the cluster. • Assign each object to the cluster with the nearest seed point. • Go back to Step 2; stop when there are no more new assignments. ...
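The iterative steps in this excerpt can be sketched as follows. The excerpt starts mid-way, so Step 1 is assumed here to be a random initial partition into k nonempty subsets; keeping an empty cluster's previous centroid is also a choice of this sketch, not of the source, and the function name is hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=None):
    """k-means on a list of equal-length numeric tuples; needs len(points) >= k.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    # Step 1 (assumed): random initial partition into k nonempty subsets.
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for pos, idx in enumerate(order):
        labels[idx] = pos % k
    centroids = []
    for _ in range(max_iter):
        # Step 2: seed points are the centroids (mean points) of the current clusters.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its old centroid
        centroids = new_centroids
        # Step 3: assign each object to the cluster with the nearest seed point.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Step 4: go back to Step 2, stopping when no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2, seed=0)
print(centroids, labels)
```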


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, as both use an iterative refinement approach. Additionally, both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
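The nearest centroid classifier amounts to a 1-nearest-neighbor lookup over the k-means centers. A minimal sketch; the centers are made-up values standing in for the output of a k-means run, and the function name is hypothetical:

```python
import math


def nearest_centroid(x, centroids):
    """1-nearest-neighbor rule over cluster centers: index of the closest center."""
    return min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))


# Centers as they might come out of a k-means run (made-up values).
centroids = [(0.33, 0.33), (10.33, 10.33)]
print(nearest_centroid((1.0, 2.0), centroids))   # → 0
print(nearest_centroid((9.0, 12.0), centroids))  # → 1
```

Because only the k centers are stored, classifying a new point costs k distance computations rather than one per training point.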
