Recent issues in data mining

... • Due to the increasing use of record-based databases, recent important applications call for incremental mining; such applications include Web log records, stock market data, grocery sales data, electronic commerce transactions, and daily weather/traffic records, to name a ...
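
Incremental mining can be illustrated with a minimal sketch, assuming the goal is to keep itemset support counts current as new records arrive without rescanning older ones. The class `IncrementalItemsetCounter` and its methods are hypothetical. Enumerating every itemset up to `max_size` is only practical for small transactions; practical incremental algorithms typically prune candidates.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Hypothetical helper: keeps support counts for all itemsets up to
    `max_size` items and updates them one batch at a time."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.n_transactions = 0

    def add_batch(self, transactions):
        # Only the new batch is scanned; counts from earlier batches are reused.
        for transaction in transactions:
            items = sorted(set(transaction))
            for size in range(1, self.max_size + 1):
                for itemset in combinations(items, size):
                    self.counts[itemset] += 1
        self.n_transactions += len(transactions)

    def frequent(self, min_support):
        # Relative support = count / total transactions seen so far.
        return {
            itemset: count / self.n_transactions
            for itemset, count in self.counts.items()
            if count / self.n_transactions >= min_support
        }


counter = IncrementalItemsetCounter()
counter.add_batch([["milk", "bread"], ["milk", "eggs"]])
counter.add_batch([["milk", "bread"]])  # a new day's records: old ones are not rescanned
print(counter.frequent(0.6))
```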

Data mining usage in health care management: literature survey

... is used when a confirmation or rejection of an already defined hypothesis is needed. The other style is knowledge discovery (relevant for this article). It is a bottom-up approach, used when we want to find something we do not yet know by searching the available data. It can be directed or undi ...

Computational Intelligence Techniques for Predicting Earthquakes

... in [2]. The fitness function was formed by four different objectives: support, confidence, and comprehensibility of the rule (to be maximized), and the amplitude of the intervals that form the rule (to be minimized). The work published in [17] presented a new approach based on three novel ...
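
A minimal sketch of how three of these objectives might be computed for a single interval rule. The rule representation, the attribute names `depth` and `mag`, and the data are assumptions made for illustration. Comprehensibility is omitted, and how [2] combines the objectives into one fitness value is not stated in this excerpt, so no weighting is shown.

```python
from dataclasses import dataclass


@dataclass
class IntervalRule:
    """Hypothetical quantitative rule:
    IF lo_a <= record[a] <= hi_a THEN lo_b <= record[b] <= hi_b."""
    a: str
    lo_a: float
    hi_a: float
    b: str
    lo_b: float
    hi_b: float


def rule_measures(rule, records):
    """Return (support, confidence, amplitude) of `rule` over `records`."""
    covered = [r for r in records if rule.lo_a <= r[rule.a] <= rule.hi_a]
    matched = [r for r in covered if rule.lo_b <= r[rule.b] <= rule.hi_b]
    support = len(matched) / len(records) if records else 0.0
    confidence = len(matched) / len(covered) if covered else 0.0
    # Amplitude: total width of the rule's intervals (smaller means more specific).
    amplitude = (rule.hi_a - rule.lo_a) + (rule.hi_b - rule.lo_b)
    return support, confidence, amplitude


records = [  # made-up illustrative data, not taken from [2]
    {"depth": 10, "mag": 4.5},
    {"depth": 12, "mag": 5.1},
    {"depth": 40, "mag": 3.2},
    {"depth": 15, "mag": 3.0},
]
rule = IntervalRule("depth", 0, 20, "mag", 4.0, 6.0)
print(rule_measures(rule, records))
```

A search procedure would push support and confidence up while keeping the amplitude small, so that rules do not trivially cover everything with very wide intervals.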

CV PDF - Hui Xiong - Rutgers University

... Instead, we designed a server-based agent to capture user sessions explicitly at the server end and construct a new web log, which is more suitable for web usage mining tasks. A summary of the preliminary work has been published in the First SPIE International Conference on Data Mining and Knowledge ...

Data-integration and catalogues - Department of Information and

... • Not magic, still need to understand data and statistics ...

jpcap, winpcap used for network intrusion detection system

Retail Marketing Segmentation and Customer Profiling for

... location. A novel methodology and model are proposed for accomplishing the task efficiently. The methodology integrates popular data mining approaches such as clustering and association rule mining. It focuses on the discovery of rules that vary according to the economic and ...

A Survey on Data Mining and Text Categorization Technique

... classification. The goal of this process is to determine the features that are most relevant to classification, because some words are much more likely than others to be correlated with the class distribution. Therefore, a wide variety of methods has been proposed in the literat ...
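
One common way to score how strongly a word is tied to a class is the chi-square statistic over a 2x2 contingency table. The survey's own list of methods is cut off in this excerpt, so this is an illustrative choice. The function names and toy documents are hypothetical.

```python
def chi_square(term, target_class, docs):
    """Chi-square statistic for the 2x2 table (term present/absent x in/out of class).
    `docs` is a list of (set_of_words, label) pairs."""
    a = b = c = d = 0
    for words, label in docs:
        has_term, in_class = term in words, label == target_class
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denominator if denominator else 0.0


def select_features(docs, target_class, k):
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocabulary = set().union(*(words for words, _ in docs))
    return sorted(vocabulary, key=lambda t: (-chi_square(t, target_class, docs), t))[:k]


docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"stock", "market"}, "finance"),
    ({"market", "team"}, "finance"),
]
print(select_features(docs, "sport", 2))
```

Chi-square also rewards negative association: `market` scores as high as `goal` because it never appears in sport documents, while `team`, which appears equally in both classes, scores zero.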

Horizontal Aggregation in SQL for Data Mining Analysis to Prepare

... aggregation returns a set of numbers instead of a single number for each group, resembling a multi-dimensional vector. We proposed an abstract but minimal extension to SQL standard aggregate functions to compute horizontal aggregations, which only requires specifying subgrouping columns inside the ...
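
The proposed SQL syntax is cut off in this excerpt, so the sketch below instead emulates a horizontal SUM in standard SQL with one `CASE` expression per subgroup value, run through Python's built-in `sqlite3` module. The `sales` table, its columns, and `horizontal_sum` are hypothetical.

```python
import sqlite3


def horizontal_sum(conn, table, group_col, subgroup_col, value_col):
    """One row per group, one SUM column per distinct subgroup value.
    Identifiers are interpolated into the SQL, so they must come from trusted
    code; only the subgroup values are passed as bound parameters."""
    subgroups = [row[0] for row in conn.execute(
        f"SELECT DISTINCT {subgroup_col} FROM {table} ORDER BY {subgroup_col}")]
    if not subgroups:
        return [], []
    cases = ", ".join(
        f"SUM(CASE WHEN {subgroup_col} = ? THEN {value_col} ELSE 0 END)"
        for _ in subgroups)
    sql = (f"SELECT {group_col}, {cases} FROM {table} "
           f"GROUP BY {group_col} ORDER BY {group_col}")
    return subgroups, conn.execute(sql, subgroups).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, month TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("A", "Jan", 10), ("A", "Feb", 5), ("A", "Jan", 2), ("B", "Feb", 7)])
columns, rows = horizontal_sum(conn, "sales", "store", "month", "amount")
print(columns, rows)
```

Each output row behaves like the multi-dimensional vector described above: store `A` becomes `(5, 12)` across the `Feb` and `Jan` columns, and store `B` gets `0` for the month it has no sales in.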

Lecture7

In [2] the author showcases a systematic data mining approach to

... reduce the data. After that, we group the various parameters into a small number of manageable factors based on shared conceptual bases. The stratified or classified data can then be used for business analysis. Based on existing literature, correlations between various factors can be pr ...

Testing - Stony Brook University

Full Text - Research Publications

... reduction are usually effective but imperfect. The most common step in data dimension reduction is to examine the attributes and consider their predictive potential. Some attributes can usually be discarded, either because they are poor predictors or because they are redundant relative to some other good ...
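
A minimal sketch of that screening step, assuming numeric attributes and using absolute Pearson correlation as the measure of both predictive potential and redundancy. The correlation measure, thresholds, and names are assumptions, not a prescribed method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def select_attributes(columns, target, min_relevance=0.3, max_redundancy=0.9):
    """Drop poor predictors, then drop attributes nearly collinear with one already kept.
    `columns` maps attribute name -> list of numeric values; thresholds are arbitrary."""
    relevance = {name: abs(pearson(values, target)) for name, values in columns.items()}
    kept = []
    for name in sorted(columns, key=relevance.get, reverse=True):
        if relevance[name] < min_relevance:
            continue  # poor predictor of the target
        if any(abs(pearson(columns[name], columns[k])) > max_redundancy for k in kept):
            continue  # redundant relative to a better attribute already kept
        kept.append(name)
    return kept


target = [1, 2, 3, 4, 5]
columns = {
    "x": [1, 2, 3, 4, 5],
    "x_scaled": [2, 4, 6, 8, 10],  # same information as "x"
    "noise": [2, 1, 2, 1, 2],      # uncorrelated with the target
}
print(select_attributes(columns, target))
```

Only one of `x` and `x_scaled` survives, since the second adds no information once the first is kept, and `noise` is dropped as a poor predictor. Correlation only captures linear relationships, which is one reason such screening is effective but imperfect.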

Clustering Data Streams: Theory and Practice

... Data stream algorithms must not have large space requirements, so our first goal will be to show that clustering can be carried out in small space (n^ε for n data points, and 0 < ε < 1), without being concerned with the number of passes. Subsequently we will develop a one-pass algorithm. We first ...
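
The divide-and-conquer idea behind small-space clustering can be sketched as follows: cluster each chunk of the stream, keep only weighted centers, then cluster those centers. This is a simplified sketch under stated assumptions. It uses weighted k-means (Lloyd's iterations) as the per-chunk clustering step, which may differ from the subroutine used in the paper. It also works on 2-D points, performs a single level of summarisation, and uses hypothetical function names throughout.

```python
def dist2(p, q):
    """Squared Euclidean distance between 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(p, centers):
    return min(range(len(centers)), key=lambda j: dist2(p, centers[j]))


def weighted_kmeans(points, weights, k, iters=20):
    """Lloyd's k-means on weighted points with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        sums = [[0.0, 0.0, 0.0] for _ in centers]  # weighted x, weighted y, total weight
        for p, w in zip(points, weights):
            s = sums[nearest(p, centers)]
            s[0] += w * p[0]
            s[1] += w * p[1]
            s[2] += w
        centers = [(sx / sw, sy / sw) if sw else c
                   for (sx, sy, sw), c in zip(sums, centers)]
    return centers


def summarise(chunk, k):
    """Cluster one chunk; keep only its centers, each weighted by its point count."""
    centers = weighted_kmeans(chunk, [1.0] * len(chunk), k)
    counts = [0.0] * len(centers)
    for p in chunk:
        counts[nearest(p, centers)] += 1.0
    return [(c, w) for c, w in zip(centers, counts) if w > 0]


def small_space_cluster(stream, k, chunk_size):
    """Divide and conquer: only one chunk of raw points is held at a time."""
    summary, chunk = [], []
    for p in stream:
        chunk.append(p)
        if len(chunk) == chunk_size:
            summary.extend(summarise(chunk, k))
            chunk = []
    if chunk:
        summary.extend(summarise(chunk, k))
    points = [c for c, _ in summary]
    weights = [w for _, w in summary]
    return weighted_kmeans(points, weights, k)


stream = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10), (1, 1), (11, 11)]
print(sorted(small_space_cluster(stream, k=2, chunk_size=4)))
```

Memory holds at most `chunk_size` raw points plus up to `k` weighted centers per chunk. The number of stored centers still grows with the stream length, so a full algorithm would also need to re-summarise the centers once they become too numerous.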

Unsupervised Spatio-Temporal Mining of Satellite Image Time Series

Hierarchical Clustering with Simple Matching and Joint Entropy

CP21586589

... e. Integration and consolidation. The data we need may reside in a single database or in multiple databases. The source databases may be transaction databases used by the operational systems of our company. Other data may be in data warehouses or data marts built for specific purposes. Data integrat ...

What is Data Warehouse

... – A cube's structure is defined by its measures and dimensions. – They are derived from tables in the cube's data source. – The set of tables from which a cube's measures and dimensions are derived is called the cube's schema. – Every cube schema consists of a single fact table and one or more dimen ...
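
A minimal in-memory sketch of such a schema, assuming a single `sales_fact` table with a `revenue` measure and two dimension tables (`product_dim`, `store_dim`). All table names, keys, and data are hypothetical. Grouping the measure by dimension attributes produces the kind of summary a cube exposes.

```python
from collections import defaultdict

# Hypothetical dimension tables: surrogate key -> descriptive attributes.
product_dim = {
    1: {"name": "Milk", "category": "Dairy"},
    2: {"name": "Cheese", "category": "Dairy"},
    3: {"name": "Bread", "category": "Bakery"},
}
store_dim = {10: {"city": "Austin"}, 20: {"city": "Dallas"}}

# Hypothetical fact table: one foreign key per dimension plus a numeric measure.
sales_fact = [
    {"product_id": 1, "store_id": 10, "revenue": 4.0},
    {"product_id": 2, "store_id": 10, "revenue": 6.0},
    {"product_id": 3, "store_id": 20, "revenue": 3.0},
    {"product_id": 1, "store_id": 20, "revenue": 2.0},
]


def aggregate(fact, measure, dims):
    """Sum `measure` over the fact table, grouped by dimension attributes.
    `dims` is a list of (foreign_key_column, dimension_table, attribute) triples."""
    totals = defaultdict(float)
    for row in fact:
        key = tuple(table[row[fk]][attr] for fk, table, attr in dims)
        totals[key] += row[measure]
    return dict(totals)


print(aggregate(sales_fact, "revenue", [("product_id", product_dim, "category")]))
print(aggregate(sales_fact, "revenue", [("product_id", product_dim, "category"),
                                         ("store_id", store_dim, "city")]))
```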

Lecture 3b

SHIFTING FROM LEGACY SYSTEMS TO A DATA MART AND

... information using Natural Language or field based queries. Our CAIRN system is a general tool that has focused on medical information covering the needs of physicians. Today, concepts related to Data Mining and Data Marts have to be incorporated into such a framework. In this paper a CAIRN-DAMM (Com ...

Data Mining

... understand language — we learn it — and it makes sense to try to have computers learn language instead of trying to program it all it ...

Document

S4904131136

... package C4.5, Olcay and Onur [31] show how to parallelize the C4.5 algorithm in three ways: (i) feature-based, (ii) node-based, and (iii) data-based. To sum up, one of the most useful characteristics of decision trees is their comprehensibility. Decision trees tend to perform better when dealing with ...
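
One plausible reading of the data-based strategy is sketched below. It assumes that records are split across workers, that each worker counts (attribute value, class) pairs locally, and that the merged counts drive the split criterion. C4.5 ranks splits by gain ratio, which is computed here for a single categorical attribute. The helper names and toy records are hypothetical, and the workers are simulated with a plain loop.

```python
import math
from collections import Counter


def local_counts(partition, attribute, label="class"):
    """Work done on each worker: count (attribute value, class) pairs in its slice."""
    return Counter((row[attribute], row[label]) for row in partition)


def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def gain_and_ratio(merged):
    """Information gain and C4.5-style gain ratio from global (value, class) counts."""
    class_totals, value_totals, by_value = Counter(), Counter(), {}
    for (value, cls), count in merged.items():
        class_totals[cls] += count
        value_totals[value] += count
        by_value.setdefault(value, Counter())[cls] += count
    n = sum(value_totals.values())
    remainder = sum(value_totals[v] / n * entropy(list(by_value[v].values()))
                    for v in by_value)
    gain = entropy(list(class_totals.values())) - remainder
    split_info = entropy(list(value_totals.values()))
    return gain, (gain / split_info if split_info else 0.0)


def data_parallel_gain(partitions, attribute):
    # Counts are additive, so summing per-partition Counters yields the global
    # counts; each local_counts call could run on a separate process or machine.
    merged = Counter()
    for part in partitions:
        merged.update(local_counts(part, attribute))
    return gain_and_ratio(merged)


p1 = [{"outlook": "sunny", "class": "no"}, {"outlook": "sunny", "class": "no"}]
p2 = [{"outlook": "rain", "class": "yes"}, {"outlook": "rain", "class": "yes"}]
print(data_parallel_gain([p1, p2], "outlook"))
```

Because only small count tables travel between workers, the raw records never need to be gathered in one place, and the result matches a single-machine computation over the same data.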

Lecture5 - The University of Texas at Dallas

Analysis of Mass Based and Density Based Clustering

... Mass estimation is another technique for finding clusters in data of arbitrary shape. In clustering, mass estimation is unique because it uses neither distance nor density [20]. The DEMassDBSCAN clustering algorithm uses the mass estimation technique (it is an alternative to density-based clust ...


Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Many algorithms have been developed over the history of manifold learning and nonlinear dimensionality reduction (NLDR), and many of them are related to linear dimensionality reduction methods. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Methods that just give a visualisation are typically based on proximity data, that is, distance measurements.
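
Proximity-based methods start from distance measurements. As one example, Isomap (a well-known NLDR method not described in this excerpt) first estimates distances along the manifold by building a k-nearest-neighbour graph and taking shortest paths through it. The sketch below covers only that first step, not the final embedding, and its function names and parameters are hypothetical.

```python
import heapq
import math


def knn_graph(points, k):
    """Link each point to its k nearest neighbours (made symmetric), weighted by distance."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        neighbours = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in neighbours[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, source):
    """Dijkstra's algorithm: shortest path lengths through the neighbourhood graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# Points on a half circle: a 1-D manifold embedded in 2-D.
points = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)]
geo = geodesic_distances(knn_graph(points, k=2), source=0)
print(math.dist(points[0], points[20]), geo[20])
```

Between the endpoints of the half circle the straight-line distance is 2, while the graph distance comes out close to the arc length π. Preserving distances measured along the manifold, rather than straight through the ambient space, is what lets such methods unfold curved data.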
