
Mining Trajectory Data
... and retrieve features based on their geographic location over time; such features include Stay Points (SP) and Points of Interest (POI), which can be useful for understanding users’ interaction and similarity, and for both understanding individuals’ movement patterns and finding interesting places in a certain ...
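To make the Stay Point idea concrete, here is a minimal detection sketch: a stay point is declared when consecutive trajectory points remain within a distance threshold for at least a minimum duration. The thresholds and the haversine distance are illustrative assumptions, not the paper's exact parameters.

```python
# Minimal stay-point detection sketch (assumed thresholds and a simple
# haversine distance; not necessarily the cited paper's exact algorithm).
from math import radians, sin, cos, asin, sqrt

def haversine_m(p, q):
    """Great-circle distance in metres between (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def detect_stay_points(points, dist_thresh_m=200, time_thresh_s=20 * 60):
    """points: list of (lat, lon, unix_time) ordered by time.
    Returns (lat, lon, arrival, departure) tuples for detected stay points."""
    stay_points, i, n = [], 0, len(points)
    while i < n:
        j = i + 1
        while j < n and haversine_m(points[i], points[j]) <= dist_thresh_m:
            j += 1
        # points[i..j-1] stayed within the distance threshold of points[i]
        if points[j - 1][2] - points[i][2] >= time_thresh_s:
            lat = sum(p[0] for p in points[i:j]) / (j - i)
            lon = sum(p[1] for p in points[i:j]) / (j - i)
            stay_points.append((lat, lon, points[i][2], points[j - 1][2]))
        i = j
    return stay_points
```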
Support vector machines based on K-means clustering for real
... Two elements affecting the response time of SVM classifiers are the number of input variables and the number of support vectors. While Viaene et al. (2001) improve response time by selecting a subset of the input variables, this paper tries to improve the response time of SVM classifiers by reducing the number of support ...
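One way to illustrate the idea of reducing support vectors is to cluster each class with K-means and train the SVM on the cluster centres, which caps the number of candidate support vectors. This is a sketch of the general approach under my assumptions (number of clusters per class, scikit-learn estimators), not necessarily the paper's procedure.

```python
# Sketch: reduce SVM support vectors by training on K-means centres
# (an illustration of the general idea, not the cited paper's exact method).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def kmeans_reduced_svm(X, y, clusters_per_class=50, **svc_kwargs):
    """Cluster each class separately and fit an SVM on the cluster centres."""
    centres, labels = [], []
    for cls in np.unique(y):
        Xc = X[y == cls]
        k = min(clusters_per_class, len(Xc))
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Xc)
        centres.append(km.cluster_centers_)
        labels.append(np.full(k, cls))
    # The SVM now has at most sum(k) candidate support vectors, so prediction is faster.
    return SVC(**svc_kwargs).fit(np.vstack(centres), np.concatenate(labels))
```

The trade-off is the usual one: fewer support vectors mean faster prediction at some cost in accuracy, controlled here by `clusters_per_class`.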
Data Mining and Exploration
... • Important, since for most astronomical studies you want either stars (~ quasars) or galaxies; the depth to which a reliable classification can be done is the effective limiting depth of your catalog, not the detection depth – There is generally more to measure for a non-PSF object • You’d like ...
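As a rough illustration of PSF-based star/galaxy separation, the sketch below trains a classifier on morphological features such as the PSF-minus-model magnitude (a concentration measure); the feature names and the random-forest choice are assumptions for illustration only, not the lecture's actual recipe.

```python
# Hypothetical star/galaxy separation sketch: the features used here
# (psf_mag - model_mag concentration, FWHM) are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_star_galaxy_classifier(psf_mag, model_mag, fwhm, labels):
    """labels: 0 = point source (star/quasar), 1 = extended (galaxy)."""
    # Extended sources have more light outside the PSF, so psf - model grows.
    X = np.column_stack([psf_mag - model_mag, fwhm])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return clf
```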
Data Mining and Exploration (a quick and very superficial
... How many statistically distinct kinds of things are there in my data, and which data object belongs to which class? Are there anomalies/outliers (e.g., extremely rare classes)? I know the classes present in the data, but would like to efficiently classify all of my data objects ...
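The three questions in this excerpt map naturally onto standard tools; the sketch below is one possible mapping (the specific estimators are my assumptions): model selection over Gaussian mixtures for "how many kinds", an isolation forest for outliers, and a supervised classifier once the classes are known.

```python
# Sketch mapping the three exploratory questions onto standard tools
# (the tool choices are assumptions, not the lecture's recommendations).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.ensemble import IsolationForest, RandomForestClassifier

def how_many_kinds(X, max_k=10):
    """Pick the number of mixture components by BIC."""
    bics = [GaussianMixture(k, random_state=0).fit(X).bic(X) for k in range(1, max_k + 1)]
    return int(np.argmin(bics)) + 1

def find_outliers(X, contamination=0.01):
    """Flag rare/anomalous objects (True = outlier)."""
    return IsolationForest(contamination=contamination, random_state=0).fit_predict(X) == -1

def classify_rest(X_labelled, y_labelled, X_unlabelled):
    """Known classes: train on the labelled subset, label everything else."""
    clf = RandomForestClassifier(random_state=0).fit(X_labelled, y_labelled)
    return clf.predict(X_unlabelled)
```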
Pattern Recognition and Classification for Multivariate - DAI
... that the reconstruction error of the employed SVD model does not exceed a predefined threshold. In the case of two observed parameters, the correlation structure of a segment can be expressed as a position vector (two-dimensional hyperplane) which gives an approximation of the original segment data. The ...
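A small sketch of the reconstruction-error test described here: fit a rank-r SVD approximation to a segment and accept the segment if the relative residual stays below a threshold. The rank and threshold values are placeholders, not the paper's settings.

```python
# Sketch of an SVD reconstruction-error check for a multivariate segment
# (rank and threshold are assumed values).
import numpy as np

def svd_reconstruction_error(segment, rank=1):
    """segment: (n_samples, n_params) array. Relative Frobenius reconstruction error."""
    centred = segment - segment.mean(axis=0)
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    approx = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]
    return np.linalg.norm(centred - approx) / max(np.linalg.norm(centred), 1e-12)

def segment_is_homogeneous(segment, rank=1, threshold=0.1):
    """Accept the segment if the low-rank model explains it well enough."""
    return svd_reconstruction_error(segment, rank) <= threshold
```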
Decision Tree Induction in High Dimensional, Hierarchically
... Previous work on distributed decision tree induction usually focused on tight clusters of computers, or even on shared memory machines [4–6, 10, 11]. When a wide-area distributed scenario is considered, all these algorithms become impractical because they use too much communication and synchronizat ...
Application of Data Mining Techniques to Olea - CEUR
... it produces very simple rules for classification and can be considered the baseline for classification performance. It was found to perform as well as more sophisticated algorithms when applied to many of the standard machine learning test datasets (Holte, 1993). OneR can parsimoniously discover and ...
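Since OneR (Holte, 1993) is so compact, a sketch helps: for each attribute, map every attribute value to its majority class and keep the attribute whose single rule makes the fewest training errors. The version below assumes nominal attributes; numeric attributes would need discretisation first.

```python
# Compact OneR sketch for nominal attributes (numeric attributes would need
# discretisation, which is omitted here).
from collections import Counter, defaultdict

def one_r(rows, target):
    """rows: list of dicts; target: name of the class attribute.
    Returns (best_attribute, {value: predicted_class}, training_errors)."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        counts = defaultdict(Counter)
        for r in rows:
            counts[r[attr]][r[target]] += 1
        # One rule: each attribute value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(1 for r in rows if rule[r[attr]] != r[target])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best
```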
Cluster Description and Related Problems
... R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD 1998. D. Angluin. Queries and concept learning. Machine Learning, 2(4): 319-342, 1988. P. Berman and B. Dasgupta. Approximating rectilinear polygon cover ...
Multiple Non-Redundant Spectral Clustering Views
... Clustering is often a first step in the analysis of complex multivariate data, particularly when a data analyst wishes to engage in a preliminary exploration of the data. Most clustering algorithms find one partitioning of the data (Jain et al., 1999), but this is overly rigid. In the exploratory data ...
Web People Search via Connection Analysis
... Disambiguation algorithm: Correlation Clustering (1/3) • CC has been applied in the past to group documents on the same topic and to other problems. • It assumes that there is a similarity function s(u, v) learned from past data. • Each (u, v) edge is assigned a “+” (similar) or “-” (different) la ...
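A minimal sketch of correlation clustering in this spirit uses the randomized pivot heuristic: repeatedly pick a pivot and group it with all remaining nodes joined to it by a "+" edge. The threshold used here to turn s(u, v) into +/- labels is an assumption, not the paper's learned labelling.

```python
# Sketch of a simple pivot-style correlation clustering heuristic
# (the similarity threshold that defines "+" edges is an assumption).
import random

def correlation_clustering(nodes, similarity, threshold=0.5, seed=0):
    """similarity(u, v) -> score in [0, 1]; edges >= threshold count as '+'."""
    rng = random.Random(seed)
    remaining = list(nodes)
    clusters = []
    while remaining:
        pivot = remaining.pop(rng.randrange(len(remaining)))
        # The pivot plus every unclustered node with a "+" edge to it form one cluster.
        cluster = [pivot] + [v for v in remaining if similarity(pivot, v) >= threshold]
        remaining = [v for v in remaining if v not in cluster]
        clusters.append(cluster)
    return clusters
```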
Density Clustering Method for Gene Expression Data
... successfully discovered the tumor classes based on the simultaneous expression profiles of thousands of genes from acute leukemia patients’ test samples using a self-organizing map clustering approach [8]. Some other clustering approaches, such as k-means [21], fuzzy k-means [1], CAST [3], etc., als ...
Subspace Clustering of High-Dimensional Data: An Evolutionary
... ORCLUS finds projected clusters as a set of data points C together with a set of orthogonal vectors such that these data points are closely clustered in the defined subspace. A limitation of these two approaches is that the process of forming the locality is based on the full dimensionality of the s ...
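One building block of such projected clustering can be sketched directly: given a candidate cluster, take the eigenvectors of its covariance matrix with the smallest eigenvalues as the orthogonal vectors defining the subspace in which the points are most tightly clustered. This is a single step under my assumptions, not the full iterative ORCLUS algorithm.

```python
# Sketch of one ORCLUS-style building block: the l orthogonal directions in
# which a candidate cluster is most tightly packed (smallest-eigenvalue
# eigenvectors of its covariance). Not the full iterative algorithm.
import numpy as np

def cluster_subspace(points, l):
    """points: (n, d) array of one cluster's members; returns a (d, l) basis."""
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, :l]                    # directions of least spread

def projected_energy(points, basis):
    """Mean squared spread of the cluster inside its chosen subspace."""
    centred = points - points.mean(axis=0)
    return float(np.mean(np.sum((centred @ basis) ** 2, axis=1)))
```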
Designing Parallel and Distributed Algorithms for Data Mining and
... operates on huge deposits of data drawn from massive data resources measured in terabytes or zettabytes; such volumes are now common in data mining and tend to make data mining tasks and applications too slow and too large to be executed on a single process ...
frequent patterns for mining association rule in improved
... International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 3, Issue 3, March 2014 ...
Chapter 5. Cluster Analysis
... There is a separate “quality” function that measures the “goodness” of a cluster. The definitions of distance functions are usually very different for interval-scaled, boolean, categorical, ordinal and ratio variables. Weights should be associated with different variables based on applications and d ...
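A Gower-style sketch makes the point about type-dependent distances and weights concrete: interval-scaled variables are range-normalised, categorical/boolean variables contribute a simple mismatch, and per-variable weights reflect the application. The particular weighting scheme shown is an assumption.

```python
# Gower-style weighted dissimilarity sketch for mixed variable types
# (the per-variable weights are application-dependent assumptions).
def mixed_dissimilarity(x, y, types, weights, ranges):
    """x, y: records as dicts; types[f] in {'interval', 'categorical', 'boolean'};
    ranges[f]: max - min of interval variable f, used for normalisation."""
    num, den = 0.0, 0.0
    for f, w in weights.items():
        if types[f] == 'interval':
            d = abs(x[f] - y[f]) / ranges[f] if ranges[f] else 0.0
        else:  # categorical / boolean: simple mismatch
            d = 0.0 if x[f] == y[f] else 1.0
        num += w * d
        den += w
    return num / den if den else 0.0
```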
Iterative Projected Clustering by Subspace Mining
... Therefore, a new class of projected clustering methods (also called subspace clustering methods) [1], [2], [3], [12] has emerged, whose task is to find 1) a set of clusters C, and 2) for each cluster Ci ∈ C, the set of dimensions Di that are relevant to Ci. For instance, the projected clusters in ...
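One simple way to illustrate finding the relevant dimensions Di of a cluster Ci is to call a dimension relevant when the cluster's spread along it is much smaller than the global spread; the ratio threshold below is an assumption, not the criterion of any specific cited method.

```python
# Sketch of one simple relevance criterion for projected clustering: dimension j
# is relevant to cluster Ci if the cluster's standard deviation along j is much
# smaller than the global one (the 0.5 ratio is an assumption).
import numpy as np

def relevant_dimensions(X, member_idx, ratio=0.5):
    """X: (n, d) data; member_idx: indices of one cluster's points.
    Returns the dimension indices deemed relevant to that cluster."""
    global_std = X.std(axis=0) + 1e-12
    cluster_std = X[member_idx].std(axis=0)
    return [j for j in range(X.shape[1]) if cluster_std[j] / global_std[j] < ratio]
```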
CLINCH: Clustering Incomplete High-Dimensional Data
... mean more information; if we deal with dimensions with larger entropies, the prediction of the missing attributes will be more precise. After all the complete dimensions have been processed, characteristics of clusters on this complete subspace are built through entropies and will be employed for the predicti ...
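A short sketch of the entropy ordering described here: discretise each fully observed dimension, compute its entropy, and process dimensions from highest to lowest entropy so that more informative dimensions drive the prediction of missing attributes. The histogram binning is an assumed discretisation, not the paper's.

```python
# Sketch of entropy-based ordering of complete dimensions
# (the 10-bin histogram is an assumed discretisation).
import numpy as np

def dimension_entropy(column, bins=10):
    """Shannon entropy (bits) of a discretised numeric column."""
    counts, _ = np.histogram(column, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def entropy_order(X, complete_dims, bins=10):
    """complete_dims: indices of columns with no missing values.
    Returns those indices sorted by decreasing entropy."""
    return sorted(complete_dims, key=lambda j: dimension_entropy(X[:, j], bins), reverse=True)
```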
Opening the Black Box: Interactive Hierarchical Clustering for
... Clustering is one of the most important tasks for geographic knowledge discovery. However, existing clustering methods have two severe drawbacks for this purpose. First, spatial clustering methods have so far focused mainly on searching for patterns within the spatial dimensions (usually 2D or ...