Distributed Scalable Collaborative Filtering Algorithm

IOSR Journal of Computer Engineering (IOSRJCE)

... collection are dissimilar. Most clustering algorithms are developed for numerical data and may be easy to use in normal conditions, but not when it comes to categorical data [1, 3]. Clustering is a challenging issue in the categorical domain, where the distance between data points is undefined [4] ...

Data Driven Modeling for System-Level Condition - CEUR

... field of machine learning reduce effort of time for generating a system model caused by the complex sensor interdependencies. Additionally, a WPP is influenced by seasonal components and a normal state of work cannot be declared as precise as for a machine that works in a homogeneous environment of ...

PDF

... over 400 million tweets per day has emerged as an invaluable source of news, blogs, opinions and more. Our proposed work consists of three components: tweet stream clustering, which clusters tweets using the k-means clustering algorithm; second, a tweet cluster vector technique to generate rank summarization using g ...

View PDF - CiteSeerX

... three largest databases all belong to telecommunication companies, with France Telecom, AT&T, and SBC having databases with 29, 26, and 25 Terabytes, respectively. Thus, the scalability of data mining methods is a key concern. A second issue is that telecommunication data is often in the form of tra ...

Evaluating Subspace Clustering Algorithms

... 3.3 MAFIA The MAFIA [10, 17, 18] algorithm extends CLIQUE by using an adaptive grid based on the distribution of data to improve efficiency and cluster quality. MAFIA also introduces parallelism to improve scalability. MAFIA initially creates a histogram to determine the minimum number of bins for a ...

K-Means Clustering with Distributed Dimensions

Cluster Analysis on High-Dimensional Data: A Comparison of

... harder as the dimensionality of the data increases. For clustering, the definition of density and the distance between points, which are critical for clustering would often become meaningless (Tan, et al., 2006). This problem indicates that the complexity of clustering the data grows exponentially w ...

IOSR Journal of Computer Engineering (IOSR-JCE)

... analysis, privacy preserving, and it is also a favourite theme for researchers. Substantial work has been devoted to this research and tremendous progress has been made in this field so far. Frequent/periodic itemset mining is used to search for and recover relationships in a given data set. ...

Clustering by Pattern Similarity

... Clustering in high dimensional spaces is often problematic as theoretical results[8] questioned the meaning of closest matching in high dimensional spaces. Recent research work[9−13,17] has focused on discovering clusters embedded in the subspaces of high dimensional data sets. This problem is known ...

Making Time-series Classification More Accurate Using learned

Analysis of Neural Network Algorithms in Data Mining

An Unsupervised Pattern Clustering Approach for Identifying

... activities was discovered using the k-means clustering technique. It then uses the temporal association rule to find the order of the events. The drawback of the k-means clustering algorithm is that it has problems dealing with outliers. In paper [6], the EM algorithm was used to form groups of similar objects. ...

Topic6-Clustering

... • EM (Expectation / Maximization) is a widely used technique that converges to a solution for finding mixture models. • Assume multivariate normal components. To apply EM: – take an initial solution – calculate the probability that each point comes from each component and assign it (E-step) – re-est ...

Text Mining: Finding Nuggets in Mountains of Textual Data

SNN Clustering Algorithm

... Adapt to the characteristics of the data set to find the natural clusters Use a dynamic model to measure the similarity between clusters – Main property is the relative closeness and relative interconnectivity of the cluster – Two clusters are combined if the resulting cluster shares certain propert ...

A Review on Density based Clustering Algorithms for Very

... palakrishnan et al., 1995) in order to eliminate patterns that motivate slowness in the learning of the mp. On the other hand, (Barandela and Gasca, 2000) demonstrates the benefits to use a methodology based on the nnr to work with samples imperfectly supervised, producing a cleaning adapted of the ...

thesis paper

A Parallel Attribute Reduction Algorithm based on Affinity

... application fields and cross-cutting features with other research direction. As an unsupervised machine learning method, cluster analysis has been widely used in natural and social science. It classifies some objects into several clusters, making the differences of the objects in distinct classes as ...

A Fuzzy System Modeling Algorithm for Data Analysis and

data-mining-concepts

What is data mining?

... The storing of data in data warehouses. The availability of increased access to data from Web navigation and intranet. We have to find a more effective way to use these data in the decision support process than ...

Using data mining technology to provide a recommendation service

Sentence Clustering via Projection over Term Clusters


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, in that both use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
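The iterative refinement heuristic described above can be illustrated with a minimal sketch. The version below is one common variant (often called Lloyd's algorithm), written in plain Python under some assumptions that are not in the text: points are tuples of numbers, distance is squared Euclidean, initial centers are sampled at random from the data, and the helper names `squared_distance`, `nearest_center`, and `k_means` are hypothetical. It is not a tuned implementation, and like any such heuristic it only reaches a local optimum.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to `point` (1-nearest neighbor on the centers)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def k_means(points, k, max_iter=100, seed=0):
    """Iterative refinement sketch; returns (centers, labels).

    Assumption: initial centers are k distinct data points chosen at random.
    """
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so a local optimum is reached
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old center if a cluster becomes empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = k_means(data, k=2)
    print(centers, labels)
    # Classifying a new point with the nearest centroid rule:
    print(nearest_center((9, 9), centers))
```

The `nearest_center` helper doubles as the nearest centroid classifier mentioned above: once the centers are fixed, a new observation is assigned to whichever center is closest.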
