Data clustering with size constraints

... The goal of cluster analysis is to divide data objects into groups so that objects within a group are similar to one another and different from objects in other groups. Traditionally, clustering is viewed as an unsupervised learning method which groups data objects based only on the information pres ...
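The goal described above, grouping objects so that members of a group are similar to one another, can be illustrated with plain k-means (Lloyd's algorithm). This is a minimal sketch only: it does not implement the size constraints named in the title, and the function names and the naive initialization (the first k points as starting centroids) are illustrative assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_labels(points, k, iterations=20):
    """Plain k-means (no size constraints); returns one cluster label per point.

    Assumption: the first k points serve as initial centroids, which is naive
    and only suitable for small illustrative inputs.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels
```

For example, `kmeans_labels([(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)], 2)` returns `[0, 1, 0, 0, 1, 1]`: the three points near the origin share one label and the three near (10, 10) share the other.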

The Application of Data Mining in Crime Prevention: The Case of

... Results of the experiments have shown that the decision tree classified crime records with an accuracy of 94 percent when the attribute CrimeLabel was used as the basis for classification, whereas in the same experiment the accuracy of the neural network was 92.5 percent. On the other hand, in t ...

disi.unitn

Data clustering: 50 years beyond K-means

... descriptive, meaning that the investigator does not have pre-specified models or hypotheses but wants to understand the general characteristics or structure of the high-dimensional data, and (ii) confirmatory or inferential, meaning that the investigator wants to confirm the validity of a hypothesis/mo ...

VIT-PLA: Visual Interactive Tool for Process Log Analysis

... other efforts have also been made to produce different visualizations for process executions or workflow data [3][4][5][6][7][8][9]. Although these systems have been shown to work well with focused processes and relatively small event logs, little work has been done with large process logs with many ...

Descriptive Data Mining Approach to Visualize Diabetes

... resistance because in this type the body can produce insulin, but either it is not sufficient or the body cannot respond to its effect, leaving glucose circulating in the blood. This type also includes LADA (Latent Autoimmune Diabetes in Adults), describing a small number of people with diabetes ...

Multi-threaded Implementation of Association Rule Mining with

Streaming Random Forests Hanady Abdulsalam

... Many recent applications such as telecommunication data management, financial applications, sensor data analysis, and web logs, deal with data streams, conceptually endless sequences of data records, often arriving at high rates. Analyzing or mining data streams raises several issues different from ...

1 - News

... • Find a model for class attribute as a function of the values of other attributes. • Goal: previously unseen records should be assigned a class as accurately as possible. – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, ...

Big Data Analytics with Oracle Advanced Analytics In

... The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The ...

An Algorithm To Discover Time-Interval Sequential Patterns In

... databases [5, 9]. The discovered information and knowledge is useful for various applications, such as market analysis, decision support, flaw detection and business management. Because of its importance, many approaches have been proposed to extract information, and mining sequential patterns is on ...

Travelling the world of gene^gene interactions

... according to their predictive power. Embedded methods perform variable selection during a training step and are usually specific to the chosen learning machine. Notably, in contrast to dimensionality reduction techniques like those based on projection (e.g. principal components analysis [23]), featu ...

Solving Complex Machine Learning Problems with Ensemble Methods

Decision Tree Induction

... Informally, this can be viewed as posterior = likelihood × prior / evidence ...
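For a discrete set of classes, Bayes' rule can be applied directly: the evidence is the sum of likelihood × prior over all classes, which makes the posteriors sum to 1. A minimal sketch; the function name, class names, and probabilities are hypothetical.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: P(c | x) = P(x | c) * P(c) / P(x) for every class c.

    likelihoods[c] is P(x | c) and priors[c] is P(c); the evidence P(x)
    is the sum of likelihood * prior over all classes.
    """
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}
```

With priors `{"yes": 0.2, "no": 0.8}` and likelihoods `{"yes": 0.6, "no": 0.1}`, the evidence is 0.12 + 0.08 = 0.2, so the posteriors are 0.6 for "yes" and 0.4 for "no": a class with a small prior can still win if the observation is much more likely under it.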

Supervised Local Pattern Discovery

... global counterpart to a pattern is a model, which explains data formation. Practical use of algorithms in this area is motivated through their successful application in a variety of contexts. For instance, supervised local pattern applications have been used in marketing. Lavrač et al. (2004a) used ...

Introduction to Database Systems

... – Find all of the images that are similar to the given image sample – Compare the feature vector (signature) extracted from the sample with the feature vectors of images that have already been extracted and indexed in the image database ...
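Comparing a sample's feature vector (signature) against the indexed signatures amounts to a similarity ranking. This sketch assumes cosine similarity as the measure and a plain dictionary as the "index"; the source names neither, and the image ids and vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero signature as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query, index, k=3):
    """Return ids of the k indexed images whose signatures best match the query."""
    ranked = sorted(
        index,
        key=lambda image_id: cosine_similarity(query, index[image_id]),
        reverse=True,
    )
    return ranked[:k]
```

A linear scan like this is fine for a handful of images; a real image database would typically use a dedicated index structure to avoid comparing against every stored signature.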

MCAIM: Modified CAIM Discretization Algorithm for Classification

... attribute values into a finite number of intervals in order to generate attribute with a small number of distinct values. Discretization methods have been developed along different approaches due to different needs: supervised versus unsupervised, static versus dynamic, global versus local, top-down ...
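For contrast with supervised methods such as CAIM/MCAIM, the simplest unsupervised, static, global approach is equal-width binning: the attribute's range is cut into a fixed number of intervals of equal size. This is not the MCAIM algorithm; the function name is hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Unsupervised equal-width discretization: map each value to a bin index 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant attribute falls into a single interval
    width = (hi - lo) / n_bins
    # min(..., n_bins - 1) keeps the maximum value in the last bin rather than a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

For example, the values 0 through 10 with five bins map to `[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 4]`. Because the class labels are ignored, interval boundaries may split values that a supervised method would keep together.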

Fast and Scalable Subspace Clustering of High Dimensional Data

... Some clustering algorithms look for a fixed number of clusters in predefined subspaces. Such algorithms undermine the whole idea of discovering previously unknown, hidden clusters. We cannot have prior information about the relevant subspaces or the number of clusters. The iterative process o ...

Peng Gong School of Electrical Engineering Thesis supervisor: Prof

... case studies previously conducted on software companies act as the foundation for the analysis. This study uses current data analysis methods and tools, with a focus on evaluation, together with company-specific information gained from the company representative, who also serves as the instructor for this research. We ...

Automatic Detection of Cluster Structure Changes using Relative

... In ReDSOM, changes of clustering structure are identified from changes of density estimations at the same location over time. The plot of density estimation can provide useful characteristics in the data, such as skewness, multi-modality, and clustering structure. Density estimation is the construct ...
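As a generic illustration of density estimation (not the ReDSOM method itself), a one-dimensional Gaussian kernel density estimate places a small Gaussian bump on every sample and averages them; peaks in the result reveal multi-modality and clustering structure. The function name and the bandwidth value are assumptions for the sketch.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the probability density at x from 1-D samples."""
    # Normalization makes the averaged Gaussian kernels integrate to 1.
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density
```

With samples clustered around 0 and around 5 and a bandwidth of 0.5, the estimate is high at 0 and 5 and low at 2.5, exposing the two modes. The bandwidth controls smoothing: too small gives spiky estimates, too large merges the modes.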

Subspace clustering for high dimensional datasets

... domains. Feature extraction and attribute selection are two popularly used methods for dimensionality reduction. Feature extraction methods like Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) create new attributes which are linear combinations of the original attributes. T ...

Incremental, Online, and Merge Mining of Partial Periodic Patterns in

... Partial periodic patterns, which are the patterns of interest in this paper, specify the behavior of the time series at some, but not all the points in time [17]. For example, a pattern disclosing that the prices of a specific stock are high every Friday and low every Tuesday is a partial periodic p ...
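The stock example can be encoded as a weekly pattern that constrains only Tuesday and Friday and leaves the other days as "don't care". This sketch counts the fraction of complete weeks in which the pattern holds; the function name, the wildcard symbol, and the price-level symbols are hypothetical, and it is not the incremental or merge mining algorithm of the paper.

```python
def period_match_fraction(series, pattern, wildcard="*"):
    """Fraction of complete periods in which a partial periodic pattern holds.

    series: one symbol per time point (e.g. one price level per day).
    pattern: one entry per position in the period; the wildcard means
    "don't care", so only the non-wildcard positions are constrained.
    """
    period = len(pattern)
    n_periods = len(series) // period  # a trailing partial period is ignored
    if n_periods == 0:
        return 0.0
    matches = 0
    for p in range(n_periods):
        segment = series[p * period:(p + 1) * period]
        if all(want == wildcard or got == want for got, want in zip(segment, pattern)):
            matches += 1
    return matches / n_periods
```

With weeks starting on Monday, the pattern `["*", "low", "*", "*", "high", "*", "*"]` means "low on Tuesday, high on Friday". Over three weeks where Tuesday is low in only two of them, the function returns 2/3.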

Discrete wavelet transform-based time series analysis and mining

... A time series is a sequence of data that represent recorded values of a phenomenon over time. Time series data constitutes a large portion of the data stored in real world databases [Agrawal et al. 1993]. Time series data appear in many application domains, such as in financial, meteorological, medi ...

Using Pattern Decomposition Methods for Finding All Frequent

... A fundamental component in data mining tasks is finding frequent patterns in a given dataset. Frequent patterns are ones that occur at least a user-given number of times (minimum support) in the dataset. They allow us to perform essential tasks such as discovering association relationships among ite ...
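The definition of a frequent pattern can be checked with a brute-force count: enumerate every itemset in each transaction and keep those that occur at least the user-given minimum number of times. This naive enumeration is exponential in transaction length and is not the pattern decomposition method of the paper; the function name and the `max_size` cap are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count every itemset up to max_size and keep those meeting min_support.

    min_support is an absolute count: an itemset is frequent if it occurs in
    at least that many transactions.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # sort so each itemset has one canonical tuple form
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}
```

For the transactions `[["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]` with a minimum support of 2, the frequent itemsets are bread (3), milk (3), butter (2), {bread, milk} (2), and {bread, butter} (2); {butter, milk} occurs only once and is dropped.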

Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space. Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that only give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Methods that only give a visualisation are typically based on proximity data, that is, distance measurements.
