Summarizing categorical data by clustering attributes

Mobility Data Warehousing and Mining

... objects and considers unconstrained environments. Also, relevant to our work is the work on change detection. For example, the MONIC framework has been proposed [20] for modeling and detecting transitions between clusters discovered at ...
Steven F. Ashby Center for Applied Scientific Computing Month DD

... height of a person may vary from 1.5 m to 1.8 m; weight of a person may vary from 90 lb to 300 lb; income of a person may vary from $10K to $1M ...
Improving the Quality of Association Rules by Preprocessing

... Other methods used mainly in classification problems consider entropy measures. These are often embedded within machine learning algorithms such as those of decision trees induction. In these cases, a supervised discretization is usually carried out. It considers class information for generating the ...
Determining the Existence of Quantitative Association Rule Hiding in

... itemsets are formed by extending existing itemsets. These algorithms turn out to be ineffective because they generate and count too many candidate itemsets that turn out to be small (infrequent) [4]. To remedy the problem, Apriori, AprioriTid, and AprioriHybrid were proposed in [4]. Apriori and Apri ...
E-Learning Platform Usage Analysis

... virtual classroom, and digital collaboration. The usage of web applications can be measured with the use of indexes and metrics. However, in e-Learning platforms there are no appropriate indexes and metrics that would facilitate their qualitative and quantitative measurement. The purpose of this pap ...
Iterative Discovery of Multiple Alternative Clustering Views

... These solutions are then “meta-clustered” using an agglomerative clustering based on Rand index for measuring similarity between pairwise clustering solutions. Jain et al. [12] find two disparate clusterings simultaneously by minimizing a sum-of-squared objective for the two clustering solutions whi ...
Outlier detection in spatial data using the m

... classes, which is very rarely found. Nearest Neighbor based methods [24, 26, 32] involve a distance or similarity measure which is defined between data points. The main advantage of the Nearest Neighbor based method is that it does not make any assumptions about the distribution of data. Therefore h ...
4. Conclusions Acknowledgments 5. References

... generalized notion of a cluster, i.e. a density-connected set. In the current context they state the following. Given the parameters NPred and MinWeight, we can discover a density-connected set in a two-step approach. First, choose an arbitrary object from the database satisfying the core object cond ...
paper - AET Papers Repository

... analyzing traffic accidents as these methods are able to identify groups of road users, vehicles and road segments which would be suitable targets for countermeasures. More specifically, cluster analysis is a statistical technique that groups items together on the basis of similarities or dissimilar ...
Demand Forecast for Short Life Cycle Products

... 8.1 Forecasting results using multiple linear regression
    8.1.1 Multiple linear regression results with clustering
    8.1.2 Conclusions of the MLR case
8.2 Forecasting results using support vector regression
    8.2.1 Support vector regression results with clustering
    8.2.2 Co ...
04_CAINE-clustering

An R Package for Determining the Relevant Number of Clusters in a

... clusters, the density of clusters or, at least, the number of points in a cluster. Nonhierarchical procedures usually require the user to specify the number of clusters before any clustering is accomplished and hierarchical methods routinely produce a series of solutions ranging from n clusters to a ...
A Hybrid Data Mining Technique for Improving the Classification

... A Hybrid Data Mining Technique for Improving the Classification Accuracy of Microarray Data Set ...
Pattern Classification using Artificial Neural Networks

... Classification is a data mining (machine learning) technique used to predict group membership for data instances. Pattern Classification involves building a function that maps the input feature space to an output space of two or more than two classes. Neural Networks (NN) are an effective tool in th ...
Data Mining: Concepts and Techniques

... Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster) Assign each object to the cluster with the nearest seed point ...
View/Open - ScholarWorks

... issues and challenges that come up when mining human reviews [14]. ...
Applying Supervised Opinion Mining Techniques

... training set with pre-marked examples. It is important to train the model used for classification with domain-specific data. Marking is done by expressing the subjectivity and polarity of the training sets. c) Aggregation and presentation of results. At this stage, the result of the opinion classification process obta ...
A modified Apriori algorithm to generate rules for inference system

... A_{n(r)}^{j(r)} – the j(r)-th linguistic value of the n(r)-th input variable, that is, the r-th element in the itemsets; D – empirical data concerning the examined system, in data mining terminology often referred to as transaction data; D_i – the i-th set of empirical values of the model variables {x_1^i, ..., x_N^i, y^i} ...
Research of Association Rules Algorithm in Data Mining

... In the big data era, whether it is trading information produced by stock exchanges or the large volume of log information generated by Internet servers, such data cannot be used or understood by humans directly; the problem facing people is how to extract useful knowledge for humans from huge amounts of ...
Improving Efficiency of Apriori Algorithm

jonyer01a - Journal of Machine Learning Research

Privacy-Preserving Clustering of Data Streams

Automatic Clustering & Classification

... membership for data instances. For example, you may wish to use classification to predict whether the weather on a particular day will be “sunny”, “rainy” or “cloudy”. ...
Data clustering: 50 years beyond K-means

... Minimizing this objective function is known to be an NP-hard problem (even for K = 2) (Drineas et al., 1999). Thus K-means, which is a greedy algorithm, can only converge to a local minimum, even though recent study has shown with a large probability K-means could converge to the global optimum when ...

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, as both follow an iterative refinement approach. Both also use cluster centers to model the data; however, k-means tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters; this is known as the nearest centroid classifier, or Rocchio algorithm.
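The iterative refinement described above (often called Lloyd's algorithm) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a production implementation: the function name `kmeans`, the random initialization from k distinct observations, and the fixed iteration cap are all choices made here for the sketch.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal sketch of Lloyd's algorithm for k-means.

    Alternates two steps until the centers stop moving:
    assignment (each point joins its nearest center) and
    update (each center moves to the mean of its points).
    """
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct observations at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every center,
        # shape (n, k); each point takes the label of its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points;
        # an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged to a local optimum
        centers = new_centers
    return centers, labels
```

Classifying a new point by the index of its nearest returned center is exactly the nearest centroid (Rocchio) rule mentioned above. Note that, as the text says, this only reaches a local optimum, so results can depend on the random initialization; running with several seeds and keeping the lowest-inertia result is a common workaround.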
  • studyres.com © 2025