Comprehensibility of Data Mining Algorithms

Improving classification Accuracy of Neural Network through

an improved framework for outlier periodic pattern detection

... One thing to note: it is realistic for a user to be predicted as being outside the set of significant places (e.g., while transitioning from one to another), and our technique is also able to predict this state. ...

Data Cleaning Using Clustering Based Data Mining Technique

... from B.V.B. College of Engineering and Technology, Hubli in 1995 and M.Tech. degree in Computer Science and Engineering from M.S.Ramaiah Institute of Technology, Bangalore in 2007. Currently working as Assistant Professor in the Department of Computer Science and Engineering at Nitte Meenakshi Insti ...

A Survey on Educational Data Mining in Field of Education

... A. Analysis and Visualization of Data. It is used to highlight meaningful information and support decision making. In the educational sector, for example, it can help course administrators and educators analyze usage information and students' activities during the course to get a brief ...

Data mining for activity extraction in video data

Scalable Clustering Algorithms with Balancing Constraints

... database scans involved. For example, Bradley et al. (1998a, b) propose out-of-core methods that scan the database once to form a summarized model (for instance, the size, sum and sum-squared values of potential clusters, as well as a small number of unallocated data points) in main memory. Subseque ...

A Flexible Framework for Consensus Clustering

... We believe that these methods are bound to suffer because each clustering in the ensemble is given equal importance. Suppose we had 4 perfect clusterings and 1 terribly inaccurate clustering. These methods would not take into account the fact that the majority of the algorithms share 100% agreement ...

Solutions for analyzing CRM systems

... tree branch and can be used when the data for the desired predictors is missing. C4.5 works in three main steps. First, the root node at the top of the tree considers all samples and passes the sample information to the second node, called the ‘branch node’. The branch node generates rules ...

Streaming Submodular Maximization: Massive Data Summarization

Crime vs. demographic factors revisited: Application of data mining

... Since SVMs were meant only for binary classification problems, various multi-class extensions (cases where the number of classes is greater than 2) have been presented. We applied four of them in this paper. These were one-vs.-one (OVO), one-vs.-all (OVA), binary complete (BIN) and ordinal (ORD) ...

International Journal of Computer Science and Mobile

... K-means clustering is a data mining/machine learning algorithm used to cluster observations into groups of related observations without any prior knowledge of those relationships. The k-means algorithm is one of the simplest clustering techniques and it is commonly used in medical imaging, biometric ...

Introduction to Database Systems

Algorithm for Discovering Patterns in Sequences

... - Pros: Works well for small datasets - Cons: Expensive ...

On Subspace Clustering with Density Consciousness

... Clearly, the trade-off between precision and recall in previous subspace clustering, which is incurred by the "density divergence problem," solely depends on the determination of the density threshold. However, it is quite subtle to set an appropriate density threshold, and the parameter determinatio ...

Customizing Computational Methods for Visual

Meta Mining Architecture for Supervised Learning

Frequent Term-Based Text Clustering

... algorithm. Many variants of the k-means algorithm have been proposed for the purpose of text clustering, e.g. [7], in particular to determine a good initial clustering. A recent study [4] has compared partitioning and hierarchical methods of text clustering on a broad variety of test datasets. It co ...

C4.5 Versus Other Decision Trees: A Review

Data Stream Clustering: Challenges and Issues

... suitable for very large databases, so it has been applied to data stream mining. This method introduces two new concepts: micro clustering and macro clustering. Based on these two concepts, it could overcome two main difficulties of agglomerative clustering methods: scalability and the inability to ...

K044055762

... The IRCCC approach consists of applying clustering algorithms to the rows and columns of the data matrix independently, and then combining the results using some sort of iterative process. The algorithms based on the DC approach begin with the entire data in one block (bi-cluster) and identify bi-clusters at ea ...

Text Mining: Finding Nuggets in Mountains of Textual Data

... amounts of text. How does this decision affect the ability of the program to interpret the semantics of text? - Does not perform in-depth syntactic or semantic ...

Advanced_time_series

... 6,250,000 calls to the Euclidean distance function. ...

Cortina: a web image search engine

material - Dr. Fei Hu

... algorithms that are most frequently used in WSN applications. A. Decision trees Decision trees are characterized by fast execution time, ease in the interpretation of the rules, and scalability for large multi-dimensional datasets (Cabena, et al. 1998), (Han 2005). The goal of decision tree learning ...

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These heuristics are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, in that both use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
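
The iterative refinement described above can be sketched in a few lines of Python. This is only a minimal illustration of one common heuristic (often called Lloyd's algorithm), not a reference implementation. The function names `kmeans` and `nearest_center`, the `init_centers` parameter, and the sample points are assumptions made for this example. The same `nearest_center` helper also plays the role of the 1-nearest neighbor (nearest centroid) classifier for new data.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point` (1-nearest neighbor on the centers)."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, init_centers=None, max_iter=100, seed=0):
    """Heuristic k-means: alternate assignment and mean-update steps.

    Returns (centers, labels). The result is a local optimum that depends
    on the initial centers (assumption: random sampling when none are given).
    """
    if init_centers is None:
        centers = [tuple(p) for p in random.Random(seed).sample(points, k)]
    else:
        centers = [tuple(c) for c in init_centers]

    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the means would not change either
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Hypothetical sample data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = kmeans(points, k=2, init_centers=[(0, 0), (10, 10)])

# Classify a new observation into the existing clusters (nearest centroid).
new_label = nearest_center((9, 9), centers)
```

Because the heuristic only reaches a local optimum, a different choice of initial centers can give a different partition; in practice the procedure is often rerun from several starting points and the best result kept.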

Top subcategories

K-means clustering