The effect of data pre-processing on the performance of Artificial

EZ36937941

... Dichotomized 3) and C4.5. These algorithms are at variance in selection of splits, when to stop a node from splitting, and assignment of class to a non-split node [14]. CART uses Gini index to measure the impurity of a partition or set of training tuples [5]. It can handle high dimensional categoric ...
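As a rough illustration of the impurity measure mentioned in this excerpt, the sketch below computes the Gini index of a set of class labels and the size-weighted impurity of a binary split. The function names `gini_impurity` and `gini_of_split` are hypothetical and are not taken from the cited paper or from any CART implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index of a collection of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_of_split(left, right):
    """Impurity of a binary split, weighting each side by its share of the tuples."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


mixed = ["yes", "yes", "no", "no"]
print(gini_impurity(mixed))                          # evenly mixed two-class node
print(gini_of_split(["yes", "yes"], ["no", "no"]))   # split that separates the classes
```

A split is usually preferred when it lowers the weighted impurity relative to the parent node; how ties and stopping are handled is left out of this sketch.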

An Efficient Density-based Approach for Data Mining Tasks

... density of the dataset contains useful information for both the classification and clustering tasks. For classification, the main point is that, given a query, the values of the class density functions over the space around it quantify the contribution of the correspondent class within the neighbour ...

VIT-PLA: Visual Interactive Tool for Process Log Analysis

... prototype (Figure 4). Because it is not practical to visualize all the data objects on a single computer screen, a substantial reduction in the data size is needed. The deployment of cluster prototypes helps compress the dataset. Several candidates can be considered as cluster prototype, such as the ...

Clustering Game Behavior Data - Game Analytics Resources

... However, analyzing behavioral data from games can be challenging. Consider, for example, Massively MultiPlayer Online Games such as World of Warcraft, Tera, or Eve Online. Each of these games features up to hundreds of thousands of simultaneously active users spread across hundreds of instances of t ...

Efficient Pattern Mining from Temporal Data through

... Due to the increasing computerization in many applications ranging from finance to bioinformatics, vast amounts of data are routinely collected. To unearth useful knowledge from such databases there is need for a different framework. One such framework is provided by Periodicity Mining, a subfield o ...

Full Text - Bonfring International Journals

Discovering frequent patterns in sensitive data

... O(K′ + K log K′ + nK) to produce the final output. Since K and K′ are typically much smaller than n, the non-private itemset mining is the efficiency bottleneck. This observation was borne out by our experiments. Techniques. The main difference between our two algorithms is technique. Our first a ...

Multithreaded Implementation of the Slope One

Comparison of Feature Selection Techniques in

Mining Predictive Redescriptions with Trees

... Furthermore, redescriptions should also be statistically significant. To evaluate the significance of results, we use p-values as in [3]. Our algorithms incorporate parameters to account for these preferences. In short, given two data matrices, redescription mining is the task of searching for the ...

A Data Mining Methodology for Evaluating Maintainability according

PDF file - Stanford InfoLab

... support greater than a minimum threshold κ (called minimum support or minsup) [RG99]. Note that for a single transaction T to contribute to the support of a given itemset, it must contain the entire itemset. We relax this exact matching criterion to yield a more flexible definition of support and co ...
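The exact-matching definition of support described in this excerpt can be sketched as follows. This is only an illustration: it assumes support is an absolute count of transactions, and the names `support` and `is_frequent` are hypothetical. It does not implement the relaxed definition that the excerpt goes on to introduce.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item of the itemset (exact matching)."""
    wanted = frozenset(itemset)
    return sum(1 for transaction in transactions if wanted <= set(transaction))


def is_frequent(itemset, transactions, minsup):
    """An itemset is frequent when its support is greater than the threshold minsup."""
    return support(itemset, transactions) > minsup


transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread"},
]
print(support({"bread", "milk"}, transactions))
print(is_frequent({"bread", "milk"}, transactions, minsup=1))
```

A transaction that holds only part of the itemset, such as the last one above, contributes nothing to the count.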

ijecec/v3-i2-06

... methodology is that the density around an outlier remarkably varies from that around its neighbors [14]. The density of an object’s neighborhood is correlated with that of its neighbor’s neighborhood. If there is a significant anomaly between the densities, the object can be considered as an outlier ...

Overview of overlapping partitional clustering methods

... algorithms need to detect overlapping clusters where an actor can belong to multiple communities [Tang and Liu, 2009, Wang et al., 2010, Fellows et al., 2011]. In video classification, overlapping clustering is a necessary requirement where videos have potentially multiple genres [Snoek et al., 2006 ...

Global Discretization of Continuous Attributes as Preprocessing for

... the outcome of the discretization process. We can, however, abide by the following guidelines that intuitively insure successful discretization: Complete discretization. We are seldom interested in discretization of just one continuous attribute (unless there is only one such attribute in a data s ...

The Research of Data Mining Algorithm Based on Association Rules

... Of the above two steps, the second is relatively easy, because it only needs to list all possible association rules based on the frequent item sets that have been found, and then use the support threshold and confidence threshold to measure them, and the association rules both met the support thresh ...

Analysing frequent sequential patterns of collaborative learning

A biologically-inspired validity measure for comparison - FICH-UNL

Data Mining In EDA - Basic Principles, Promises, and Constraints

A Survey on Frequent Pattern Mining Methods Apriori, Eclat, FP growth

... frequently, it is called a frequent pattern. Finding such frequent patterns plays an essential role in mining associations, correlations, and many other ... In frequent pattern mining, to check whether an itemset occurs frequently or not we have a parameter called support of an itemset. An in ...

Appendix: The WEKA Data Mining Software

... Explorer, Experimenter and Knowledge Flow. The easiest way to use WEKA is through Explorer, the main graphical user interface. Data can be loaded from various sources, including files, URLs and databases. Database access is ...

4. A Data Mining Methodology for Evaluating Maintainability according to ISO/IEC-9126 Software Engineering-Product Quality Standard - P. Antonellis, D. Antoniou, Y. Kanellopoulos, C. Makris, E. Theodoridis, C. Tjortjis, N. Tsirakis

... algorithms. This method was applied to Mozilla, a large open source software system with more than four million lines of C/C++. All these approaches employ data mining techniques only to recover the structure of a software system. On the other hand [14] is employing clustering for predicting softwar ...

Classification Algorithms of Data Mining

Effective Classification of 3D Image Data using


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These heuristics are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, in that both use an iterative refinement approach and both use cluster centers to model the data. However, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
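The iterative refinement described above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard assign-then-update heuristic (often called Lloyd's algorithm), not a reference implementation: the function names, the random initialization from data points, the fixed seed, and the toy data are all assumptions made for the example. The last lines apply the 1-nearest neighbor rule to the resulting centers, which is the nearest centroid classification mentioned in the text.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to the point (1-nearest neighbor on the centers)."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Alternate assignment and update steps until the centers stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k randomly chosen data points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged to a local optimum
        centers = new_centers
    return centers


# Toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = kmeans(points, k=2)

# Nearest centroid classification of a new observation.
label = nearest_center((9.0, 9.0), centers)
print(centers, label)
```

Because the result is only a local optimum, practical use typically involves several runs with different initializations and keeping the solution with the lowest within-cluster sum of squares.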

