Document

search engine optimization using data mining approach

... and retrieval. With Vector Space Indexing, all the documents in the database will be indexed against terms in each document. The problem arises here as it takes a great amount of time in order to process all the terms step-by-step. In order to reduce terms for indexing, we have used the Stemmer tech ...

[Mamoulis 2004] Mining, indexing, and querying historical

... querying periodic spatiotemporal data. The problem of discovering periodic patterns from historical object movements is very challenging. Usually, the patterns are not explicitly specified, but have to be mined from the data. The patterns can be thought of as (possibly noncontiguous) sequences of ob ...

Mining, Indexing, and Querying Historical Spatiotemporal Data

... querying periodic spatiotemporal data. The problem of discovering periodic patterns from historical object movements is very challenging. Usually, the patterns are not explicitly specified, but have to be mined from the data. The patterns can be thought of as (possibly noncontiguous) sequences of ob ...

Neural Networks Demystified - Francis Analytics Actuarial Data Mining

... Warner and Misra (Warner and Misra, 1996) point out that neural network analysis is in many ways like linear regression, which can be used to fit a curve to data. Regression coefficients are solved for by minimizing the squared deviations between actual observations on a target variable and the fitt ...

Using semi-parametric clustering applied to electronic health record

... due to the size of the dataset, but other problems result from processing sequences that are non-uniformly sampled, variable in length, highly heterogeneous or incomplete. ...

On Clustering Validation Techniques

A Simple Constraint-Based Algorithm for Efficiently Mining

Detection of financial statement fraud and feature selection using

... compared multi-criteria decision aids with statistical techniques such as logit and discriminant analysis in detecting fraudulent financial statements. A novel financial kernel for the detection of management fraud is developed using support vector machines on financial data by Cecchini et al. [9]. Hua ...

Cosmology in the Era of Large Surveys Ryan Scranton Google 13 March 2007

... that were largely unanticipated prior to the beginning of these surveys. • The next generation of surveys has the potential to tell us a great deal about the nature of dark energy, but the unavoidable size & complexity of these surveys will be a problem for outside users. • By unifying the data and ...

Data Mining Classification: Basic Concepts, Decision Trees, and

Mining Strong Affinity Association Patterns in Data Sets

... Related Work: Support-based pruning does not work well with dense data sets, nor is it effective at finding low support patterns. The concepts of maximal [3, 5] and closed itemsets [11, 16] were proposed to address these limitations. Although these concepts can identify a smaller set of representati ...

The 2009 Knowledge Discovery in Data Competition (KDD Cup

... switch provider (churn), buy new products or services (appetency), or buy upgrades or addons proposed to them to make the sale more profitable (up-selling). The most practical way to build knowledge on customers in a CRM system is to produce scores. A score (the output of a model) is an evaluation f ...

Selecting the right objective measure for association analysis

... information about the interestingness of a pattern. Data mining practitioners also tend to apply an objective measure without realizing that there may be better alternatives available for their application. In this paper, we describe several key properties one should examine in order to select the r ...

Slides - dimacs

... – The uncertainty model may attach a probability to each world – Queries conceptually range over all possible worlds ...

1.5. Frequent sequence mining in data streams

Partition Incremental Discretization

... also define this number. The input for this layer is the set of intervals of the first layer. A. Initialization of the layers: First layer The number of intervals in this layer should be much higher than required. It can be initialized in two modes: • Without seeing any previous data. We use an EWD ...

A METHODOLOGY FOR FINDING UNIFORM REGIONS IN SPATIAL

A Binary Decision Diagram Based Approach for Mining Frequent

Aggregating Time Partitions

A Generic Framework for Rule-Based Classification

... classifier consists of a set of rules, used in a given order during the prediction process, to classify unlabeled objects. Definition 1. (Classifier) Let C be the set of all classifiers. A classifier is a tuple ⟨R, ...

Crowd-Based Mining of Reusable Process Model Patterns

... from one language into another, or designing a logo. A crowdsourcing platform is an online software infrastructure that provides access to a crowd of workers and can be used by crowdsourcers to crowdsource work. Multiple CS platforms exist, which all implement a specific CS model: The marketplace m ...

4FT Miner

... Pudil, P., Novovičová J.: Novel Methods for Subset Selection with Respect to Problem Knowledge, IEEE Transactions on Intelligent Systems - Special Issue on Feature Transformation and Subset ...

Spatial autocorrelation

... correlation with spatial database. Spatial databases are used for spatial data mining, which includes statistical techniques and more specialised DM techniques such as association rules. In this case the data mining algorithms need to have a spatial context. We must explicitly include location inf ...

Semantically-grounded construction of centroids for datasets with

Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data – that is, distance measurements.
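
As a rough illustration of a method driven by proximity data, the sketch below unrolls points lying on a curved one-dimensional manifold (an arc in the plane) into a single coordinate. It follows the general Isomap recipe (nearest-neighbour graph, shortest-path distances, classical multidimensional scaling), chosen here only as one plausible example; the summary above does not single out any particular algorithm. The function names, the neighbour count `k=2`, and the arc data are all assumptions made for the example.

```python
import math

def pairwise_distances(points):
    """Plain Euclidean distances between all pairs of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def geodesic_distances(dist, k):
    """Approximate along-manifold distances: build a k-nearest-neighbour
    graph, then run Floyd-Warshall to get shortest paths through it."""
    n = len(dist)
    inf = float("inf")
    graph = [[inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        # Index 0 of the sorted list is the point itself, so skip it.
        neighbours = sorted(range(n), key=lambda j: dist[i][j])[1:k + 1]
        for j in neighbours:
            graph[i][j] = graph[j][i] = dist[i][j]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph

def embed_1d(geo, iterations=200):
    """Classical MDS restricted to one output dimension: double-centre the
    squared distances, then find the leading eigenvector by power iteration."""
    n = len(geo)
    sq = [[d * d for d in row] for row in geo]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(eigenvalue) * x for x in v]

# Hypothetical data: 30 points on three quarters of a unit circle.
points = [(math.cos(i * 1.5 * math.pi / 29), math.sin(i * 1.5 * math.pi / 29))
          for i in range(30)]
embedding = embed_1d(geodesic_distances(pairwise_distances(points), k=2))
```

The resulting coordinates order the points along the arc and spread them over roughly the arc's length rather than the much shorter straight-line gap between its endpoints, which is the sense in which the manifold has been "unrolled". The sketch only produces coordinates for the points it was given; it does not provide a mapping for new points.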
