ENHANCED PREDICTION OF STUDENT DROPOUTS USING

... imbalance and multidimensionality, which can affect the low performance of students. In this paper, we collected different databases from various colleges; among these, the 500 best real attributes are identified in order to identify the factors affecting student dropout using neural based clas ...

Scaling Up Classifiers to Cloud Computers

... the rest of the computation. However, as we will show, this places a significant I/O burden on the source node in both the partitioning and classifying stages. The technique may be appropriate for a cluster with a large central file server, but is not likely to scale to a cloud of any significant si ...

Cross Level Frequent Pattern Mining Using Dynamic

... analysis was the first frequent pattern mining concept proposed [2]. It is about finding associations among items bought in a market. This concept used transactional databases and other data repositories in order to find associations, causal structures, interesting correlations or frequent pat ...

Dimension Reduction for Visual Data Mining

... techniques. This information must be appropriately communicated to us in order to make the best use of it. According to [Ware, 2000], in order to be visualized, data are passed through four basic stages : independently of any visualization technique, the first step of visualization is data collectio ...

A Review of Feature Selection Algorithms for Data Mining Techniques

A Novel Data Mining Methodology for Narrative Text Mining and Its

... MSHA AII database is a typical industrial incident database. It contains structural data with well defined contents and formats, and nonstructural data in the form of narrative texts to provide background information with regard to each incident recorded. Most existing data mining methods were initi ...

UH-DMML Research Overview - Department of Computer Science

Outlier Ensembles - Outlier Definition, Detection, and Description

... Independent vs Sequential Ensembles • In independent ensembles, independent models are constructed from the data, and combination is used. – Most common approach for ensemble analysis. – Simple approach in terms of implementation. ...

FP3111131118

... number of levels, and finally it clusters the 2k clusters into k clusters. The exponential histogram (EH) data structure and another k-median algorithm that overcomes the problem of increasing approximation factors in the Guha et al [7] algorithm. Another algorithm that captured the attention of man ...

Spatio-Temporal Patterns of Passengers` Interests at

... 2.3. Generate hot topics from Twitter data using LDA A script using the R package named “tm” (Feinerer, 2014) was used to remove the “noise” from the Twitter data, which includes the process of removing the whitespaces, numbers, punctuations and stopwords, also converting all the upper case to lower ...

Cluster By: A New SQL Extension for Spatial Data Aggregation*

... In traditional SQL[6]-compliant database, Group By is the main aggregation mechanism to group individual tuples with the same grouping attribute(s) values together and form one tuple. Spatial database systems build on traditional database systems as cartridges and support spatial data types and pred ...

Exploring Cell Tower Data Dumps for Supervised Learning

... visiting rate. For example, the downtown is usually the most popular and busiest area in a city, so it records the highest user visiting rate, and thus needs more cell towers. In comparison, relatively fewer people visit the suburb in a day, thus the density of cell towers is lower in such area. Hav ...

pdf - ijesrt

... system which is used to classify the data [6]. Consider there are various objects. It would surely be beneficial for us if we know the characteristic features of one of the objects in order to predict it for its nearest neighbors because nearest neighbor objects have similar characteristics. The ma ...

Survey on Spatio-Temporal Clustering

... composed of a set of fixed geographical coordinates, each corresponding to one or more time series. Georeferenced variables data form a special case of georeferenced time series where only the most recent point of time series is available. Clustering this type of data aims to group objects based on ...

Understanding the indoor environment through mining sensory data

... SAS system (http://www.sas.com/) is used to implement the clustering process. There are a dozen of clustering algorithms in the SAS system. Among the clustering algorithms, the K-means algorithm is usually used for large datasets. As our dataset is so big, we select the K-means clustering algorithm ...

Discovering Regular Groups of Mobile Objects

... and by using knowledge about collisions between the MMCs (splitting or merging MMCs when those events occur). In experiments conducted on synthetic data with the K-Means as the generic algorithm used in micro-clustering, MMCs showed improvement in running times compared to NC (normal clustering), th ...

2.1 UNIT-2 material

... horizontal data format. Alternatively, data can be represented in a table with the item name and the set of transactions containing the item, called vertical data format. Optimization – techniques used to improve the performance of the algorithm for a given data distribution. Architecture – sequential, parallel a ...

A Survey Report on RFM Pattern Matching Using Efficient

... formula that can estimate the probability that one customer will buy at the next time, and the expected value of the total number of times that the customer will buy in the future. It introduced a comprehensive methodology to discover the knowledge for selecting targets for direct marketing from a d ...

Incremental Clustering for the Classification of Concept

... because of a concept change (drift) that occurred at some time point. After noticing this problem, we propose a new probabilistic representation for data streams suitable for problems with concept drift. More specifically, we map batches of data into what we name “Conceptual Vectors”. These vectors c ...

Topic Models over Text Streams: A Study of

4 - Department of Knowledge Technologies

... Step 5. Let m=m+1 and go to Step 1 if m

PDF

... Association rules in data mining are useful for the analysis and prediction of an individual user’s behavior which facilitates the data analysis on a regular basis for market basket data, clustering of products, designing catalogs and playing an immense role for store layout setting. This paper pres ...

marked - Kansas State University

... iterations. Normally, k, t << n.  Often terminates at a local optimum. The global optimum may be found using techniques such as: deterministic annealing and genetic algorithms ...

CIS732-Lecture-22

Clustering for High Dimensional Data: Density based Subspace


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier on the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as nearest centroid classifier or Rocchio algorithm.
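
The iterative refinement described above can be illustrated with a minimal sketch of one common heuristic (often called Lloyd's algorithm), written in plain Python. The function names `kmeans` and `nearest_center`, the optional `init_centers` parameter, and the toy data in the usage note are assumptions made for this illustration and do not come from the text. The sketch alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points, stopping when the assignments no longer change. Like the heuristics mentioned above, it is only expected to reach a local optimum.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point` (ties go to the lowest index).

    Applied to a new point after clustering, this is the 1-nearest-neighbor
    rule on the cluster centers, i.e. a nearest centroid classifier.
    """
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def kmeans(points, k, init_centers=None, max_iter=100, seed=0):
    """Partition `points` into `k` clusters by iterative refinement.

    `init_centers` is a hypothetical convenience parameter: when omitted,
    k distinct input points are sampled as the starting centers.
    Returns (centers, labels).
    """
    if init_centers is None:
        rng = random.Random(seed)
        centers = [tuple(p) for p in rng.sample(points, k)]
    else:
        centers = [tuple(c) for c in init_centers]

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels

        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return centers, labels
```

As a usage example, calling `kmeans(points, 2, init_centers=[(0, 0), (10, 10)])` on the six points `(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)` separates the two groups, and `nearest_center((9, 9), centers)` then assigns a new point to the second cluster. Because the result depends on the starting centers, practical use typically repeats the procedure from several initializations and keeps the best partition.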
