A comparative study of some classification algorithms using WEKA

View PDF - CiteSeerX

... corresponding clustering problem. Alternatively, the reader may think about using conventional clustering algorithms for fixed k, such as the k-means [101][72], EM (Expectation Maximization) [34][61], and SOM (Self-Organizing Maps) [17][62] algorithms. However, these prototype-based algorithms are quite ...
References

... The system is very simple in design and easy to implement. It requires very low system resources and will work in almost all configurations. To account for the changing legal behaviour caused by external events, SD (Spike Detection) strengthens CD (Communal Detection) by providing attr ...
Dot Plots for Time Series Analysis

... The “challenge problem” that Pevzner and Sze proposed was the planted (15, 4)-motif problem with t=20 sequences of length n=600 over the four-letter DNA alphabet. This problem turns out to be computationally very hard, as one planted occurrence corresponds to several potential motifs. As a consequence, e ...
Distance based fast hierarchical clustering method for large datasets

... of the data points is unequal (i.e., the number of data points of the abnormal/attack type is very small compared to the number of normal data points). These low-proportion data points (abnormal/attack data points) look like outliers in the feature space. These abnormal data points are likely to get merged with the ...
Subspace Selection for Clustering High-Dimensional Data

... 1% of additional points is sufficient to achieve the desired results. Let us note that this strategy is only required if the subspaces contain a clear clustering structure without noise. In most real-world data sets the subspaces do not show a clear cluster structure and often have much more than 1 ...
Measuring Constraint-Set Utility for Partitional Clustering Algorithms

... SRI International, Menlo Park, CA 94025, [email protected] ...
Using Subgroup Discovery to Analyze the UK Traffic Data

... records of all accidents that happened over the given period of time (1979–1999), the VEHICLE data includes data about all the vehicles involved in those accidents, and the CASUALTY data includes the data about all the casualties involved in the accidents. Consider the following example: ‘Two vehicles cr ...
G44083642

Adaptive Scaling of Cluster Boundaries for Large

... isolation and outlierness, are used to select the best cluster structure in the hierarchy. In [28], an agglomerative clustering algorithm is proposed based on an intracluster dissimilarity measure, and a merge dissimilarity index (MDI) is presented to find the optimal number of clusters. 2) Genetic ...
A Comparative Study of Frequent and Maximal Periodic Pattern

... candidate patterns in the occurrence of huge and complex databases. In this work, two novel algorithms are proposed and a comparative examination is performed with respect to scalability and performance parameters. The first algorithm, EFPMA (Extended Regular Model Detection Algorithm), is used to fin ...
Knowledge Management in CRM using Data mining Technique

... algorithms. This can save considerable time in the case of large databases. This key idea could open a new gateway for the ...
Cyclic Repeated Patterns in Sequential Pattern Mining

... Initially, the input dataset is fed to the clustering process, in which the fuzzy c-means clustering algorithm is used to cluster the available data based on similar sequential patterns. This approach is able to mine the patterns with the help of association rule mining; here two major tasks are prese ...
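As a rough illustration of the clustering step this excerpt describes, here is a minimal NumPy sketch of the standard fuzzy c-means update equations; the fuzzifier m=2, the tolerance, and the function name are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-5, seed=0):
    """Alternate the standard FCM updates: fuzzy centroids from the
    membership matrix, then memberships from point-center distances."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(iters):
        W = U ** m                               # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = (d + 1e-12) ** (-2.0 / (m - 1.0))  # u_ij proportional to d_ij^(-2/(m-1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:        # memberships stabilized
            return centers, U_new
        U = U_new
    return centers, U
```

Hard cluster assignments, if needed for a subsequent association-rule step, can be read off as U.argmax(axis=1).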
Efficient Discovery of Unusual Patterns in Time Series | SpringerLink

Enhanced SPRINT Algorithm based on SLIQ to Improve Attribute

... all necessary information to compute the gini index. The candidate split points for each attribute are evaluated in a single sequential scan of the corresponding attribute list. During scanning, if a winning split point is found, it is automatically saved and the Cbelow and Cabove histograms are de-all ...
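For illustration, here is a minimal Python sketch of the single-scan split evaluation this excerpt describes, assuming one numeric attribute; the Counter-based Cbelow/Cabove histograms and the function names are simplifications, not SPRINT's actual attribute-list structures.

```python
from collections import Counter

def gini(hist, total):
    """Gini impurity 1 - sum(p_c^2) of a class-count histogram."""
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in hist.values())

def best_split(values, labels):
    """Evaluate every candidate split point of one attribute in a single
    sequential scan, maintaining Cbelow/Cabove class histograms."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    below, above = Counter(), Counter(labels)
    n, best_gini, best_point = len(values), float("inf"), None
    for rank, i in enumerate(order[:-1], start=1):
        below[labels[i]] += 1              # record i moves below the split
        above[labels[i]] -= 1
        nxt = order[rank]
        if values[nxt] == values[i]:       # no split point between ties
            continue
        g = ((rank / n) * gini(below, rank)
             + ((n - rank) / n) * gini(above, n - rank))
        if g < best_gini:                  # save the winning split point
            best_gini, best_point = g, (values[i] + values[nxt]) / 2
    return best_point, best_gini

# e.g. best_split([2.0, 1.0, 3.0, 4.0], ["a", "a", "b", "b"]) -> (2.5, 0.0)
```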
Improving Decision Tree Performance by Exception Handling

... into one of several predefined classes based on the values of the record's attributes [2]. The decision tree is an important classification tool, and various improvements over the original decision tree algorithm, such as ID3 [3, 4], ID4 [5], ID5 [6], ITI [7], C4.5 [8], and CART [9], have been pro ...
MEX Vocabulary: A Lightweight Interchange Format

... algorithm. Each execution carries the associated values for each existing variable. • :ExperimentConfiguration: Each set of executions should be grouped under one configuration, i.e., for one experiment we could have many executions in different hardware environments or over different algorithm configura ...
DWM - Vidyalankar

1 - University of Illinois Urbana

... of the methods or compares them. This information either is not available in the document content, is not thorough enough, or simply cannot be trusted scientifically. The quality of a survey paper is highly dependent on how authoritative its writer(s) are. There are some approaches in the community tow ...
Cluster Ensembles for High Dimensional Clustering

... However, because the true structure of the data is unknown, it is inherently ambiguous what constitutes a good low dimensional representation. This makes it difficult to define a proper “interestingness” criterion. An alternative approach to this problem that we explore in this paper is to use multi ...
DAta guided approach_article

... clusters based on a similarity measure. For the purpose of generating clusters, we use only the numeric variables in order to get optimal clustering results, as clustering algorithms perform well on numeric data. In the second step, we generate the binary tree from the hierarchical dendrogram (tree) ...
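A minimal SciPy sketch of the two steps described here; the Ward linkage and the pandas-based selection of numeric columns are illustrative assumptions, not the article's actual setup.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, to_tree

def cluster_numeric(df: pd.DataFrame):
    """Step 1: hierarchical clustering on the numeric variables only.
    Step 2: convert the resulting dendrogram into a binary tree."""
    numeric = df.select_dtypes(include=[np.number])  # drop non-numeric columns
    Z = linkage(numeric.to_numpy(), method="ward")   # hierarchical dendrogram
    root = to_tree(Z)                                # binary ClusterNode tree
    return Z, root
```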
Visual mining of moving flock patterns in large

A Proposed Data Mining Framework for Higher Education System

... system using several techniques and algorithms. This section presents a survey of previous work in e-learning using data mining techniques. As stated in the introduction, the study aims at organizing the findings of the survey from different views that might correspond to the diverse academic ...
LH3120652069

DATA MINING LAB MANUAL


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These heuristics are similar to the expectation-maximization (EM) algorithm for mixtures of Gaussian distributions, in that both use an iterative refinement approach and both use cluster centers to model the data. However, k-means clustering tends to find clusters of comparable spatial extent, while the EM mechanism allows clusters to have different shapes.

The algorithm has only a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters; this is known as the nearest centroid classifier or Rocchio algorithm.
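As a concrete illustration, here is a minimal NumPy sketch of the usual heuristic (Lloyd's algorithm) together with the nearest-centroid classification step mentioned above; the random initialization, the stopping rule, and the function names are illustrative choices rather than a canonical implementation.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: repeatedly assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if (labels == j).any() else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):  # local optimum reached
            break
        centers = new_centers
    return centers, labels

def nearest_centroid(X_new, centers):
    """Classify new points by their nearest k-means center (the
    nearest centroid / Rocchio classification described above)."""
    dists = np.linalg.norm(X_new[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

Because the heuristic only reaches a local optimum, practical implementations typically run it several times from different random initializations and keep the best result.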