Entity-based Data Source Contextualization for Searching the Web of Data

... graphs, i.e., $D_i = \{G_1^i, \dots, G_n^i\}$, with $D$ as set of all sources. Notice, the above definition abstracts from the data access, e.g., via HTTP GET requests. In particular, Def. 2 also covers Linked Data sources, see Fig. 1. Entity Model. Given a data source $D_i$, an entity e is a subject that is ...

Document

... 10. FEATURE EXTRACTION In machine learning, pattern recognition and in image processing, feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative, non-redundant, facilitating the subsequent learning and generalization steps, in so ...

Frequent Pattern Mining using Parallel Architecture of

... mining, where more than one node can contain a single item, which causes repetition of the same item and needs more space to store the many copies. Page faults are also one of the causes of it. PGMiner, a novel graph-based algorithm proposed by H.D.K. Moonesinghe et al. for mining frequent closed it ...

A Gene Expression Programming Algorithm for Multi

... to generate one or several single label datasets from one multi-label dataset before applying a classical classification technique. The simple transformation methods are classified in [45] as copy methods, selection methods and ignore method. Copy methods transform each multi-label pattern into patt ...

Mining concepts from large SAGE gene expression - LIRIS

... database. The user can select data from r and s remains the same. The user can also select subsets s, and r is not modified. Let us just consider simple typical queries, i.e., selections. A data selection example is $\sigma_C(r_0, s_0) = (r_1, s_1)$ where $r_1 = \sigma_C(r_0)$ and $s_1 = s_0$. For instance, we wil ...

Massimo Poesio: Text Categorization and

... Wednesday 25th May and the details are as follows Who: Dave Robertson Title: Formal Reasoning Gets Social Abstract: For much of its history, formal knowledge representation has aimed to describe knowledge independently of the personal and social context in which it is used, with the advantage that w ...

SDMOQL: An OQL-based Data Mining Query Language for Map

A Survey on Data Mining Techniques for Customer

... Sequence discovery: Sequence discovery is the identification of associations or patterns over time. Sequential pattern mining has become a challenging task in data mining due to its complexity. The most common tools are statistics and set theory. Visualization: Visualization refers to the presentat ...

Using hierarchical data mining to characterize performance

... conditional on the (discrete) distribution of the configurations in C. The values $\{p_k\}_{k=1}^{n}$ are what the statisticians call prior probabilities. For most purposes of this paper, we simply estimate $\{p_k\}_{k=1}^{n}$ from available data. These values are explicitly or implicitly constructed during experiment d ...

Dimension Reconstruction for Visual Exploration of Subspace

... clusters is a cyclic process. Discoveries in interesting subspaces may lead to other exploration goals and motivate users to start a new round of subspace analysis. One of the potential goals is to construct new subspaces that combine diverse cluster structures observed in different original subspac ...

Lecture Slides - School of Computing and Information Sciences

... OLAP Technology: An Overview ...

intro - EECS Department

Classification, clustering, similarity

... the independence hypothesis
• … makes computation possible
• … yields optimal classifiers when satisfied
• … but is seldom satisfied in practice, as attributes (variables) are often correlated.
• Attempts to overcome this limitation:
  o Bayesian networks, that combine Bayesian reasoning with causal r ...

Unsupervised and Semi-supervised Clustering: a

... For fuzzy partitional methods, internal validity indices should take into account both the data items and the membership degrees resulting from clustering. The average partition density in [21] is obtained as the mean ratio between the “sum of central members” (sum of membership degrees for the item ...

Validation of Document Clustering based on Purity and Entropy

... objects in each group are indistinguishable under some criterion of similarity. Clustering is an unsupervised classification process fundamental to data mining (one of the most important tasks in data analysis). It has applications in several fields like bioinformatics, web data analysis, text minin ...

Online Mining in Sensor Networks | SpringerLink

a scrutiny of association rule mining algorithms

... Computational experiments are performed to test the VNS algorithm against a benchmark problem set. The results show that the VNS algorithm is an effective approach for solving the MTFWS problem, capable of discovering many large-one frequent itemset with time-windows (FITW) with a larger timecoverag ...

4. Variograms

... Here the fluctuation behavior of corks should be essentially the same over time at each location. Moreover, any dependencies among cork heights due to the smoothness of wave actions should depend only on the spacing between their positions in Figure 4.3. Hence the homogeneity and isotropy assumption ...

Machine learning in bioinformatics

... and the value of a fitness or evaluation function—in an optimization problem. In a modelling problem, the ‘learning’ term refers to running a computer program to induce a model by using training data or past experience. Machine learning uses statistical theory when building computational models sinc ...

Hybrid intelligent systems in petroleum reservoir

... handle datasets of high dimensionality and fast in execution, while others are limited in their ability to handle uncertainties, difficult to learn, and could not deal with datasets of high or low dimensionality. The “no free lunch” theorem also gives credence to this problem as it postulates that ...

Clustering based Two-Stage Text Classification Requiring Minimal

... Clustering has been applied in many sub-domains of the problem of text classification, including feature compression or extraction [10], semisupervised learning [11], and clustering in large-scale classification problems [12,13]. The following will review several related work about clustering aiding ...

A decision-theoretic approach to data mining

... as one of three classes: “low,” “medium,” or “high” payment risks. The bank has applied a data mining algorithm to its database of previous loan applications and their payment results and has induced a classifier that is used to classify future loan applications. The set of actions that can be taken ...

Opening the Black Box: Interactive Hierarchical Clustering for

... However, on one hand, existing spatial clustering methods can only deal with low-dimensional spaces (usually 2-D or 3-D space: 2 spatial dimensions and a non-spatial dimension). On the other hand, general-purpose clustering methods mainly deal with non-spatial feature spaces and have very limited po ...

VISUALIZATION OF MODIS DATA IN THE BLENDER ENVIRONMENT: by Jonathan D. Wilson

Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space. Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data – that is, distance measurements.
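
To make the manifold assumption and the role of proximity data concrete, here is a minimal sketch in plain Python. It is an illustration only: the helix dataset, the function names, and the choice of a k-nearest-neighbour graph with shortest paths (the distance step used by Isomap) are assumptions made for this example, not something the text above prescribes. It shows that two points can be close in the ambient 3-D space while being far apart along the 1-D manifold they lie on.

```python
import math

def helix(n, turns=2.0):
    """Sample n points on a 3-D helix, a 1-D manifold embedded in 3-D (toy data)."""
    pts = []
    for i in range(n):
        t = 2 * math.pi * turns * i / (n - 1)
        pts.append((math.cos(t), math.sin(t), 0.1 * t))
    return pts

def euclidean_matrix(pts):
    """Pairwise straight-line distances in the ambient space (the proximity data)."""
    return [[math.dist(p, q) for q in pts] for p in pts]

def graph_geodesics(dist, k):
    """Approximate along-manifold distances: link each point to its k nearest
    neighbours, then run Floyd-Warshall shortest paths over that graph."""
    n = len(dist)
    inf = float("inf")
    geo = [[inf] * n for _ in range(n)]
    for i in range(n):
        geo[i][i] = 0.0
        # Index 0 of the sorted list is the point itself (distance 0), so skip it.
        nearest = sorted(range(n), key=lambda j: dist[i][j])[1:k + 1]
        for j in nearest:
            geo[i][j] = geo[j][i] = dist[i][j]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = geo[i][m] + geo[m][j]
                if via < geo[i][j]:
                    geo[i][j] = via
    return geo

points = helix(60)
euclid = euclidean_matrix(points)
geodesic = graph_geodesics(euclid, k=2)

# The two ends of the helix sit almost directly above each other in 3-D,
# yet are far apart when distance is measured along the curve.
print("Euclidean distance between ends:", round(euclid[0][-1], 3))
print("Graph-geodesic distance between ends:", round(geodesic[0][-1], 3))
```

A method that works from proximity data would take a matrix such as `geodesic` as its input and look for low-dimensional coordinates whose pairwise distances reproduce it; that embedding step is deliberately left out of this sketch.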
