An Overview of Data Mining Techniques

... Median - the value for a given predictor that divides the database as nearly as possible into two databases of equal numbers of records. Mode - the most common value for the predictor. Variance - the measure of how spread out the values are from the average value. When there are many values for a gi ...
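As a minimal sketch of the three summary statistics named in this excerpt, the following uses Python's standard `statistics` module. The `ages` list is a made-up predictor column, not data from the source, and the population variance is used here as one reasonable reading of "how spread out the values are from the average value".

```python
import statistics

# Hypothetical values of a single predictor (illustrative only).
ages = [23, 25, 25, 31, 38, 42, 57]

# Median: the value that splits the records into two halves of equal size.
median = statistics.median(ages)

# Mode: the most common value of the predictor.
mode = statistics.mode(ages)

# Variance: the mean squared deviation from the average value
# (population variance; use statistics.variance for the sample version).
variance = statistics.pvariance(ages)

print(median, mode, variance)
```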

Integrating Novel Class Detection with Classification for Concept-Drifting Data Streams

Finding REMO - Detecting Relative Motion Patterns in Geospatial

Inferring taxonomic hierarchies from 0

... taxonomic hierarchy. A taxonomy itself, however, need not be a hierarchy, but can be organized in network-like structures as well. In this thesis we are concerned with the inference of taxonomic hierarchies. The motivation for the work comes from a practical starting point: we have been introduced ...

Big data preprocessing: Methods and Prospects

... Distributed computing was widely used by data scientists before the advent of the Big Data phenomenon. Many standard and time-consuming algorithms were replaced by their distributed versions with the aim of speeding up the learning process. However, for most current massive problems, a distributed ...

The Role of Domain Knowledge in Data Mining

... rule mass functions. Based on the three rule mass functions defined above we can define three ar.filter ...

Information Retrieval in SAS®: The Power of Combining Perl Regular Expressions and Hash Objects

... In this section, we apply Perl regular expressions and hash objects to information retrieval for healthcare research. In medical fields, doctors often document discharge summaries and clinical notes on surgery progress, radiology exams, patient history, etc. These notes are usually free text ...

Efficient Mining of Frequent Itemsets on Large Uncertain Databases

... While these algorithms work well for databases with precise values, it is not clear how they can be used to mine probabilistic data. Here we develop algorithms for extracting frequent itemsets from uncertain databases. Although our algorithms are developed based on the Apriori framework, they can be ...

Mining and Summarizing Customer Reviews - UIC

... It is necessary to detect when a wrapper stops working properly. Any change may make existing extraction rules invalid. Re-learning is needed, and most likely manual relabeling as well. ...

No Slide Title

The following paper was originally published in the

... Our research aims to eliminate, as much as possible, the manual and ad-hoc elements from the process of building an intrusion detection system. We take a data-centric point of view and consider intrusion detection as a data analysis process. Anomaly detection is about finding the normal usage patter ...

Density-Based Clustering for Real-Time Stream Data

Advancing the discovery of unique column combinations

... Bottom-Up Apriori: Bottom-up unique discovery is very similar to the Apriori algorithm for mining association rules. Bottom-up indicates here that the powerset lattice of the schema R is traversed beginning with all 1-combinations to the top of the lattice, which is the |R|-combination. The prefixed n ...
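The level-wise traversal described in this excerpt can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it enumerates every combination at each level and skips supersets of already-found uniques, whereas a full Apriori-style implementation would generate candidates only from the non-unique combinations of the previous level. The function name `minimal_uniques` and the sample table are hypothetical.

```python
from itertools import combinations

def minimal_uniques(rows, columns):
    """Return the minimal column combinations whose values identify every row."""
    found = []  # index tuples of minimal uniques discovered so far
    # Walk the powerset lattice bottom-up: 1-combinations first, then 2, ...
    for size in range(1, len(columns) + 1):
        for combo in combinations(range(len(columns)), size):
            # Pruning: any superset of a known unique is unique but not minimal.
            if any(set(u) <= set(combo) for u in found):
                continue
            projected = [tuple(row[i] for i in combo) for row in rows]
            if len(set(projected)) == len(rows):  # no duplicate projections
                found.append(combo)
    return [tuple(columns[i] for i in combo) for combo in found]

# Hypothetical table: no single column is unique, but (first, last) is.
columns = ["first", "last", "city"]
rows = [
    ("Ann", "Lee", "Oslo"),
    ("Ann", "Kim", "Oslo"),
    ("Bob", "Lee", "Rome"),
    ("Bob", "Kim", "Oslo"),
]
uniques = minimal_uniques(rows, columns)
print(uniques)
```

The pruning step is what makes the traversal Apriori-like: once a combination is known to be unique, nothing above it in the lattice needs to be checked.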

C-SWF Incremental Mining Algorithm for Firewall Policy Management

... traffic rules from preprocessing log data. After the log analysis step, a collection of traffic rules would be generated, and then the system would perform rule generalization to generalize a set of novel rules reflecting the current environment from previous results [18]. Above all, the essential intention of ...

Consensus Guided Unsupervised Feature Selection

Chapter 1: Knowledge Management, Data Mining, and Text mining in Medical Informatics

... Probabilistic and statistical analysis techniques and models have the longest history and strongest theoretical foundation for data analysis. Although it is not rooted in artificial intelligence research, statistical analysis achieves data analysis and knowledge discovery objectives similar to machi ...

An Evolutionary Clustering Algorithm for Gene Expression

... number of nearest neighbors as input parameter ahead of time. With this representation, an evolutionary algorithm is used to find clusters which are represented as connected subgraphs. This approach is again not very scalable. Other than the length of the chromosomes being again the same as the size ...

Data Mining for the Internet of Things: Literature Review and

... association analysis, time series analysis, and outlier analysis. (i) Classification is the process of finding a set of models or functions that describe and distinguish data classes or concepts, for the purpose of predicting the class of objects whose class label is unknown. (ii) Clustering analyze ...

Delta Boosting Machine and its Application in Actuarial Modeling

... (τ = 1). The trade-off is that a small shrinkage factor requires a higher number of iterations and more computational time. A strategy for model selection often used in practice is to set the value of τ as small as possible (i.e. between 0.01 and 0.001), which then leads to a relatively large T . ...

Data Mining Applications/Research/Future

Computational intelligence in sports: Challenges and

... divided into query independent and query dependent. The former visualize datasets directly, i.e., without any assertion, while the latter according to a query specified by a user. A novel visualization technique using glyphs [4,17,21] can be used for illustrating sequential patterns or time sequences ...

IOSR Journal of Electronics and Communication Engineering (IOSR-JECE)

... there is always a scope for improvement and still several techniques are being developed to overcome the limitations of the existing techniques. This paper presents an analytical study on the existing techniques available for diabetes mellitus. The characteristic features of the approaches are inves ...

COMPARATIVE STUDY ID3, CART AND C4.5 DECISION TREE

... a decision tree for the given data by recursively splitting that data. The decision tree grows using a depth-first strategy. The C4.5 algorithm considers all the possible tests that can split the data and selects the test that gives the highest gain ratio. This test removes ...
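To make the split criterion in this excerpt concrete, here is a small sketch of the gain ratio for a categorical attribute: the information gain of the split divided by the entropy of the split itself. The helper names and the toy `outlook`, `windy`, and `play` columns are illustrative assumptions, and the sketch leaves out the continuous attributes and missing values that C4.5 also handles.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the value distribution in items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Gain ratio of splitting labels by the categorical attribute values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: outlook separates the classes perfectly, windy does not.
play = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rain", "rain"]
windy = ["y", "n", "y", "n"]

best = max(["outlook", "windy"],
           key=lambda name: gain_ratio({"outlook": outlook, "windy": windy}[name], play))
print(best)
```

A tree builder would apply this choice recursively to each resulting subset, which matches the depth-first growth the excerpt describes.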

Amir Hossein Akhavan Rahnama Real-time Sentiment

... 1.4 Proposed Solution; 1.5 Objectives of the solution; 1.6 Overview of this thesis ...

Discovering Decision Trees


Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data, that is, distance measurements.
