Toward a Framework for Learner Segmentation
... financial boundaries, representing and reflecting the importance of the successful use of technology for instructing the current and future generations [Hines et al. 2009]. Accompanying the online environments themselves, the data generated by these resources can include learner profile data as wel ...
CSE591 Data Mining
... frequent k-itemsets in any given scan • Remove from transactions those items that are not members of any candidate k-itemset – e.g., if 12, 24, 14 are the only candidate itemsets contained in transaction 1234, then item 3 can be removed – if 12, 24 are the only candidate itemsets contained in 1234, then can remove the t ...
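The item-trimming rule in the excerpt above can be sketched as follows (a minimal illustration; the candidate itemsets and transaction are the ones from the excerpt's example):

```python
# Transaction reduction in Apriori: an item that occurs in no candidate
# k-itemset cannot contribute to any frequent (k+1)-itemset either, so it
# can be dropped from the transaction before the next scan.
def trim_transaction(transaction, candidates):
    """Keep only items that occur in at least one candidate itemset."""
    useful = set().union(*candidates) if candidates else set()
    return {item for item in transaction if item in useful}

# Candidates {1,2}, {2,4}, {1,4} in transaction {1,2,3,4}:
# item 3 is in no candidate, so it is removed.
print(trim_transaction({1, 2, 3, 4}, [{1, 2}, {2, 4}, {1, 4}]))  # → {1, 2, 4}
```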
A Study on Clustering Techniques on Matlab
... the given cluster as long as the density (number of objects or data points) in the neighbourhood exceeds some threshold; namely, the neighbourhood of a given radius has to contain at least a minimum number of objects. When each cluster is characterized by a local mode or maximum of the density function ...
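The density threshold described above is the core-point test used in DBSCAN-style clustering. A minimal sketch (the `eps`, `min_pts` values and the toy points are illustrative choices, not from the excerpt):

```python
import math

# Density check at the heart of density-based clustering: a point may grow
# a cluster only if its eps-neighbourhood contains at least min_pts objects
# (the point itself included).
def is_core_point(point, data, eps, min_pts):
    """True if at least min_pts points lie within distance eps of `point`."""
    neighbours = sum(1 for q in data if math.dist(point, q) <= eps)
    return neighbours >= min_pts

data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(is_core_point((0, 0), data, eps=1.5, min_pts=4))    # → True (dense region)
print(is_core_point((10, 10), data, eps=1.5, min_pts=4))  # → False (isolated point)
```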
R package: mlbench: Machine Learning Benchmark Problems
... Principal Components Analysis (PCA) • If X = (x1, x2, …, xp) is a random vector with mean vector μ and covariance matrix Σ, then the principal component transformation is X → Y = Γᵀ(X − μ), where Γ is orthogonal, ΓᵀΣΓ = Λ is diagonal, and λ1 ≥ λ2 ≥ … ≥ λp ≥ 0. – Linear orthogonal transform of original data to new coordinate ...
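The transformation in the excerpt above can be sketched numerically: estimate the mean and covariance from data, eigendecompose the covariance to obtain the orthogonal matrix and the sorted eigenvalues, then project the centred data (the synthetic sample below is an illustrative assumption):

```python
import numpy as np

# PCA as the orthogonal transform described above: Y = (X - mu) @ Gamma,
# where Gamma diagonalises the sample covariance and the eigenvalues are
# sorted in decreasing order.
def pca(X):
    """X: (n_samples, n_features). Returns (scores Y, eigenvalues)."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort descending
    eigvals, gamma = eigvals[order], eigvecs[:, order]
    return (X - mu) @ gamma, eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.1])
Y, lam = pca(X)
print(lam)  # eigenvalues in decreasing order: lam[0] >= lam[1] >= lam[2]
```

The covariance of the scores `Y` is diagonal with the eigenvalues on the diagonal, matching the ΓᵀΣΓ = Λ condition above.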
pptx - University of Hawaii
... • A data set of N records each given as a d-dimensional data feature vector. Output: • Determine a natural, useful “partitioning” of the data set into a number of (k) clusters and noise such that we have: – High similarity of records within each cluster (intra-cluster similarity) – Low similarity of ...
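The intra-/inter-cluster criterion stated above can be made concrete with a toy check (the point sets, and the use of Euclidean distance as the inverse notion of similarity, are illustrative assumptions):

```python
import math
from itertools import combinations

# A good partition has small distances within each cluster (high
# intra-cluster similarity) and large distances between clusters
# (low inter-cluster similarity).
def mean_intra_inter(clusters):
    intra = [math.dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    inter = [math.dist(p, q)
             for a, b in combinations(clusters, 2) for p in a for q in b]
    return sum(intra) / len(intra), sum(inter) / len(inter)

clusters = [[(0, 0), (0, 1), (1, 0)], [(9, 9), (9, 10), (10, 9)]]
intra, inter = mean_intra_inter(clusters)
print(intra < inter)  # → True for a well-separated partition
```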
A Framework for Categorize Feature Selection Algorithms
... with the outside world, and reflects the simplicity of the feature selection procedure in this approach. The algorithm itself also serves as the evaluation criterion for feature subsets. The main disadvantage of this method is its high complexity, which makes the problem-solving process hard to control. In other word ...
Data Warehouse
... Sorting, hashing, and grouping operations are applied to the dimension attributes in order to reorder and cluster related tuples ...
Mining Frequent Patterns Without Candidate Generation
... Presentation: decision-tree, classification rule, neural ...
Sparse Additive Subspace Clustering
... Theoretical justification of SSC has received significant interest from computer vision researchers as well as statisticians. It is shown in [7] that when subspaces are disjoint, i.e., non-overlapping, the block structure of the affinity matrix can be exactly recovered. Similar block structure guar ...
Lecture8 - The University of Texas at Dallas
... I may want my location to be private, but does it make sense if a camera can capture a picture of me? - If there are sensors all over the place, is it meaningful to have privacy-preserving surveillance? This suggests that we need application-specific privacy. It is not meaningful to examine PPD ...
Data Mining Engineering
... Query languages like SQL are standardized and powerful, but they are too difficult for unskilled users. OLAP tools allow flexible multidimensional queries; their methods are query-centric. ...
PDF
... unique words and 19420 sentences are extracted. The algorithm then constructs the sentences-vs-words matrix, with the 19420 sentences as rows and 1228 words as columns, to test our TF-ISF statistical approach. A cell value is 1 if the word occurs in the sentence and 0 otherwise. To weight the ...
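The matrix construction described above can be sketched on toy data. Note that the ISF weighting formula below, log(N / n_w), is a common TF-IDF-style choice assumed here for illustration; the excerpt does not state the exact formula used:

```python
import math

# Build the binary sentences-vs-words matrix: rows are sentences, columns
# are unique words, cell = 1 if the word occurs in the sentence, else 0.
sentences = [["data", "mining", "is", "fun"],
             ["mining", "text", "data"],
             ["clustering", "text"]]
vocab = sorted({w for s in sentences for w in s})
matrix = [[1 if w in s else 0 for w in vocab] for s in sentences]

# Inverse sentence frequency (assumed form): rarer words weigh more.
n = len(sentences)
isf = {w: math.log(n / sum(row[j] for row in matrix))
       for j, w in enumerate(vocab)}
print(isf["clustering"] > isf["data"])  # → True: the rarer word scores higher
```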
Web Mining and Link Analysis I
... – Recording of all browser-related actions by a user (including visits to multiple websites) – More-reliable identification of individual users (e.g. by login ID for multiple users on a single computer) • Preferred mode of data collection for studies of navigation behavior on the Web • Companies lik ...
12 On-board Mining of Data Streams in Sensor Networks
... Keogh et al. [36] have shown empirically that the most-cited algorithms for clustering time-series data streams proposed so far in the literature produce meaningless results in subsequence clustering. They have proposed a solution approach using k-motifs to choose the subsequences that the algorithm c ...
IOSR Journal of Computer Engineering (IOSR-JCE)
... three conceptually different types of crawling methods, each of which is suitable for collecting data for a different type of analysis and related problem statement. The first and, we might say, the simplest is when the data we are interested in can be found on one site at a well-defined place or technically re ...
Data Warehouse
... • Sorting, hashing, and grouping operations are applied to the dimension attributes in order to reorder and cluster related tuples • Aggregates may be computed from previously computed aggregates, rather than from the base fact table – Smallest-child: computing a cuboid from the smallest, previously ...
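The smallest-child idea in the excerpt above (computing a coarser cuboid from an already-computed finer cuboid instead of re-scanning the base fact table) can be sketched as follows; the dimensions and the SUM measure are illustrative, and the optimization is valid because SUM is distributive:

```python
from collections import defaultdict

# Roll a cuboid up to a coarser one by grouping on a subset of its
# dimensions and summing the measure.
def roll_up(cuboid, keep):
    """Aggregate a {dim_tuple: measure} cuboid onto the dims in `keep`."""
    out = defaultdict(float)
    for dims, value in cuboid.items():
        out[tuple(dims[i] for i in keep)] += value
    return dict(out)

# (city, product) -> sales, previously computed from the fact table;
# the (city,) cuboid is derived from it directly.
city_product = {("NY", "pen"): 10.0, ("NY", "ink"): 5.0, ("LA", "pen"): 7.0}
print(roll_up(city_product, keep=[0]))  # → {('NY',): 15.0, ('LA',): 7.0}
```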
Navigation pattern discovery using grammatical inference
... kept in the log file of a web server. The research work on Web Usage Mining is very extensive and covers aspects both related to traditional data mining issues as well as difficulties inherent in this specific area, such as the reliability of the usage data. Two thorough surveys on the field of Web ...
ppt
... high minimizes training error, but leads to poor generalization (smaller separation, thus higher risk) IRDM WS 2005 ...
Nonlinear dimensionality reduction
High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data – that is, distance measurements.
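The proximity-based methods mentioned at the end start from a matrix of pairwise distances rather than raw coordinates. Classical (metric) MDS is the simplest such embedding; it is itself linear, but it is the template many NLDR methods (e.g. Isomap) extend. A minimal sketch, with three collinear toy points as an illustrative input:

```python
import numpy as np

# Classical MDS: double-centre the squared distance matrix to recover a
# Gram matrix, then embed via its top eigenvectors.
def classical_mds(D, dim=2):
    """D: (n, n) pairwise distance matrix. Returns (n, dim) embedding."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dim]    # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))

# Distances between three points at positions 0, 3, 5 on a line.
pts = np.array([[0.0], [3.0], [5.0]])
D = np.abs(pts - pts.T)
Y = classical_mds(D, dim=1)
print(abs(Y[1, 0] - Y[0, 0]))  # recovers the original spacing of 3
```

For exact Euclidean distances, classical MDS reconstructs the configuration up to translation, rotation, and reflection, which is why the recovered spacing matches the input.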