Profiler: Integrated Statistical Analysis and Visualization for Data

... This is because discords only require a single parameter, and as we have seen above, we can typically double or halve this parameter without affecting the results. In contrast, most other anomaly detection schemes require 3 to 7 parameters, including some parameters for which we may have poor ...
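The snippet does not name the single parameter. Under the usual definition of a time series discord, it is the subsequence length m. A discord is then the length-m window whose distance to its nearest non-overlapping match is largest. The sketch below is a minimal brute-force version under that assumption (O(n²m), Euclidean distance, "non-overlapping" taken as |i − j| ≥ m). It is not the paper's own search procedure, and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    # Plain Euclidean distance between two equal-length windows.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_discord(series, m):
    """Return (start_index, nn_distance) of the top discord of length m.

    Hypothetical helper: for every window, find the distance to its nearest
    non-self match (windows that do not overlap it), then keep the window
    whose nearest match is farthest away.
    """
    n = len(series) - m + 1
    if n < 2:
        raise ValueError("series too short for window length m")
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        a = series[i:i + m]
        nearest = math.inf
        for j in range(n):
            if abs(i - j) < m:  # skip trivial (overlapping) matches
                continue
            nearest = min(nearest, euclidean(a, series[j:j + m]))
        if nearest != math.inf and nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist
```

To probe the robustness claim on your own data, re-run with m halved and doubled and check whether the same region is flagged.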

Deloitte Consulting, 2005

Data Mining and Health Care: Techniques of Application

... absence of disease but confers on a person or group freedom from illness and the ability to realize one’s potential. Health is therefore best understood as the indispensable basis for defining a person’s sense of well-being. The delivery of health care services thus takes on greater importance, and ...

The Nearest Sub-class Classifier: a Compromise between the

Modeling of Human Movement Behavioral Knowledge from GPS

... stay points or activities performed at stay points are considered. Motivated by these potential merits, a recent research trend is to extract semantic information from, or capture the inherent meaning of, this huge volume of human movement data. This semantic enrichment of raw GPS logs bridges the gap between ...

Chapter 1 Introduction: Data-Analytic Thinking

... Data Science, Engineering, and Data-Driven Decision Making. Our churn example illustrates a type 2 DDD problem. MegaTelCo has hundreds of millions of customers, each a candidate for defection. Tens of millions of customers have contracts expiring each month, so each one of them has an increased likelih ...

Sampling from scarcely defined distributions

Predictive Analysis with SQL Server 2008

... found BI solutions too expensive or complex to implement are now taking advantage of the comprehensive report authoring, rendering, and delivery capabilities of SQL Server Reporting Services and the powerful online analytical processing (OLAP) services provided by SQL Server Analysis Services. The c ...

Data mining workflow templates for intelligent discovery assistance

... user to maintain the relationship in the light of changes. Another weak point is that workflows are not checked for correctness before execution: it frequently happens that the execution of the workflow stops with an error after several hours of runtime because of small syntactic incompatibilities betw ...

Global alignment of multiple protein interaction networks with

... significant attention in recent years. Toward this goal, high-throughput experimental techniques [e.g., yeast two-hybrid (1, 2) and coimmunoprecipitation (3)] have been invented to discover protein–protein interactions (PPIs). The data from these techniques, which are still being perfected, are bein ...

Data Mining - ETH Zürich

Analysis of Various Periodicity Detection Algorithms in Time Series

... time series data. For example, periodicity mining allows an energy company to analyze power consumption patterns and predict periods of high and low usage so that proper planning may take place. Data mining, also called knowledge discovery from data, is the process of searching enormous ...

Referral Traffic Analysis: A Case Study of the Iranian Students’ News

... Web traffic analysis is a well-known e-marketing activity. Today most news agencies have entered the web, providing a variety of online services to their customers. The number of online news consumers is also increasing dramatically all over the world. A news website usually benefits from diff ...

A Conditional Random Field for Discriminatively-trained Finite

... string edit distance, and a conditional-probability parameter estimation method that exploits both matching and non-matching sequence pairs. Based on conditional random fields (CRFs), the approach not only provides powerful capabilities long sought in many application domains, but also demonstrates ...

A Visual Technique for Internet Anomaly Detection

... 3. the user interacts with the data, possibly going back to 2. The visual anomaly detection method is an iterative process: it usually has to be run several times with different parameters before it succeeds. Interactive visualization provides an efficient means of trying out diff ...

Classification and knowledge discovery in protein databases

... redundancy, it is a common practice to impose a threshold on sequence identity or to introduce some other measure of sequence similarity such that the analysis is performed on non-redundant sequences. The same was done in this study. 2.2. Feature selection for high-dimensional data. High-dimensional ...

Benchmarking Attribute Selection Techniques for Discrete Class

... Kononenko [9] notes that the higher the value of m (the number of instances sampled), the more reliable ReliefF’s estimates are—though of course increasing m increases the running time. For all experiments reported in this paper, we set m = 250 and k = 10 as suggested in [9], [10]. C. Principal Comp ...
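The roles of m (instances sampled) and k (nearest hits and misses per class) can be seen in the compact ReliefF sketch below, written in plain Python. Several details are assumptions, and the implementation benchmarked in the paper may differ in any of them:

* features are numeric;
* per-feature differences are scaled by the feature's range;
* neighbours are found with Manhattan distance;
* instances are sampled without replacement, so m is capped at the dataset size.

```python
import random

def relieff(X, y, m=250, k=10, seed=0):
    """Return one ReliefF weight per feature (higher = more relevant).

    X: list of rows of numeric features; y: list of class labels.
    Sketch only: sampling without replacement, range-scaled differences.
    """
    n, d = len(X), len(X[0])
    # Feature ranges scale every per-feature difference into [0, 1].
    span = []
    for a in range(d):
        col = [row[a] for row in X]
        span.append((max(col) - min(col)) or 1.0)

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        # Manhattan distance on range-scaled features.
        return sum(diff(a, r1, r2) for a in range(d))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    m = min(m, n)  # assumption: sample without replacement, so cap m at n
    w = [0.0] * d
    rng = random.Random(seed)
    for i in rng.sample(range(n), m):
        r, cr = X[i], y[i]
        for c in classes:
            # k nearest neighbours of r inside class c, excluding r itself.
            pool = [j for j in range(n) if y[j] == c and j != i]
            neigh = sorted(pool, key=lambda j: dist(r, X[j]))[:k]
            if not neigh:
                continue
            # Hits lower the weights; misses raise them, weighted by class prior.
            factor = -1.0 if c == cr else prior[c] / (1.0 - prior[cr])
            for j in neigh:
                for a in range(d):
                    w[a] += factor * diff(a, r, X[j]) / (m * len(neigh))
    return w
```

Each sampled instance costs a neighbour search over the whole dataset. Running time therefore grows linearly with m, which matches the trade-off noted above.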

Finding Sequential Patterns from Large Sequence Data

... Given a sequence S, the parameters min-rep and max-dist, and the maximum period length Lmax, we can discover, in three phases, the valid subsequences that have the most repetitions for each valid pattern whose period length does not exceed Lmax. When parameters are not set properly, noise may be qualif ...

Accelerating Data Mining Workloads: Current Approaches and

... during execution using profiling tools (such as the Intel VTune analyzer [19]) for every application, and analytically studied their individual characteristics. A k-Means based clustering algorithm [16] was applied to the performance characteristics of these applications. The goal of this clustering is to ...
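As an illustration of the clustering step, here is plain Lloyd's k-means in Python, applied to one feature vector per application. Two things are assumptions: the feature vectors themselves (for example, per-application performance counters) and the decision to use the textbook algorithm. The paper only says "k-Means based", and its variant [16] may differ. The function names are hypothetical.

```python
import random

def _sqdist(p, q):
    # Squared Euclidean distance between two feature vectors.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centres
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _sqdist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In practice the features would usually be normalised first, so that no single counter with a large scale dominates the distance.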

Mayo_tutorial_July14

... – Keep M’ if improvement, Δ(M,M’) > α • Often, γ is chosen from a set of alternative components, Γ = {γ1, γ2, …, γk} • If many alternatives are available, one may inadvertently add irrelevant components to the model, resulting in model overfitting July 15, 2015 Mining Big Data ...
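The greedy strategy described in this snippet can be sketched in a few lines. Here Δ(M, M′) denotes the improvement of M′ over M, and α is the acceptance threshold. The sketch assumes a model is just a set of components and that quality comes from a caller-supplied `score` function; both the representation and the function names are hypothetical.

```python
def greedy_build(candidates, score, alpha):
    """Add the best remaining component while its improvement exceeds alpha.

    candidates: the alternative components Γ = {γ1, ..., γk}
    score: function mapping a frozenset of components to a quality value
    alpha: minimum improvement Δ(M, M') required to keep M'
    """
    model = frozenset()
    remaining = sorted(candidates)  # sorted only to make tie-breaking deterministic
    while remaining:
        # Try every alternative γ and keep the one giving the best M' = M ∪ {γ}.
        best = max(remaining, key=lambda g: score(model | {g}))
        delta = score(model | {best}) - score(model)
        if delta <= alpha:
            break  # no alternative improves the model by more than alpha
        model = model | {best}
        remaining.remove(best)
    return model
```

The overfitting risk comes from the `max` over many alternatives. When `score` is measured on training data, some irrelevant component can clear α by chance. Scoring on held-out data, or raising α as the number of alternatives grows, are common ways to guard against this.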

Discovery of Spatio-Temporal Patterns from Location

... Different geographical discretizations can be proposed to allow the extraction of patterns at different resolutions. One possibility is to divide the area using a regular grid. This would also allow the events to be studied at different levels of granularity. Controlling the size and shape of the cells it ...
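A regular-grid discretization amounts to flooring coordinates divided by the cell size. The minimal sketch below assumes square cells measured in degrees and aligned to a chosen origin. Treating latitude and longitude as planar is a simplification, since degree-based cells are not equal-area. The function names are hypothetical.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_deg, origin=(0.0, 0.0)):
    """Return the (row, col) index of the regular grid cell containing a point.

    Cells are cell_deg degrees on a side and aligned to origin.
    """
    return (math.floor((lat - origin[0]) / cell_deg),
            math.floor((lon - origin[1]) / cell_deg))

def count_events(points, cell_deg, origin=(0.0, 0.0)):
    """Count (lat, lon) events per cell; a larger cell_deg gives coarser granularity."""
    return Counter(grid_cell(lat, lon, cell_deg, origin) for lat, lon in points)
```

Calling `count_events` with several cell sizes yields the same events at several resolutions. Patterns can then be mined from each level separately.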

document

... • So far we have not asked anything that statistics would not have asked. So is data mining just another word for statistics? • We hope that the response will be a resounding NO • The major difference is that statistical methods work with random data samples, whereas the data in databases is not necessarily random ...

Mining Frequent Spatio-Temporal Patterns from

... patterns inside a geographical area. These data are available from different sources, like GPS traces extracted from these devices or from internet sites where users voluntarily share their location among other information. Different knowledge can be extracted from these data depending on the analys ...

A Clustering-based Approach for Discovering Interesting Places in

... over trajectories, which allows powerful semantic analysis, called stops and moves. A stop is a semantically important part of a trajectory that is relevant for an application, and where the object has stayed for a minimal amount of time. For instance, in a tourism application, a stop could be a tou ...
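The stop definition above (a part of the trajectory where the object has stayed for a minimal amount of time) can be illustrated with a simple threshold-based stay-point detector. This is not the clustering-based method the paper proposes. The 200 m radius, the 20-minute minimum duration and the function names are all hypothetical choices for illustration.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))

def detect_stops(track, dist_m=200, min_secs=20 * 60):
    """Return (start_idx, end_idx) spans where the object stayed in one place.

    track: list of (timestamp_seconds, lat, lon), sorted by time.
    A span is a stop if all its points lie within dist_m of its first point
    and it lasts at least min_secs; everything else is treated as movement.
    """
    stops, i, n = [], 0, len(track)
    while i < n:
        j = i + 1
        # Extend the window while points stay within dist_m of the anchor.
        while j < n and haversine_m(track[i][1:], track[j][1:]) <= dist_m:
            j += 1
        if track[j - 1][0] - track[i][0] >= min_secs:
            stops.append((i, j - 1))
            i = j  # resume after the stop
        else:
            i += 1
    return stops
```

The spans between detected stops correspond to moves. An application would then attach semantics to each stop, for example a tourist attraction.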

Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space. Many algorithms have been proposed over the history of manifold learning and nonlinear dimensionality reduction (NLDR), and many of them are related to linear dimensionality reduction methods. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically, those that just give a visualisation are based on proximity data, that is, distance measurements.
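Isomap is one well-known NLDR method that works purely from distance measurements. It builds a k-nearest-neighbour graph, approximates geodesic (along-the-manifold) distances by shortest paths, and then applies classical multidimensional scaling (MDS) to those distances. The minimal pure-Python sketch below makes several assumptions:

* it embeds into one dimension only;
* it uses O(n³) Floyd-Warshall, so it suits small n only;
* it extracts the top eigenvector by power iteration, which presumes the leading eigenvalue is positive and dominant.

The function name is hypothetical.

```python
import math
import random

def isomap_1d(points, k=2, iters=500, seed=0):
    """Embed points into 1-D with Isomap (kNN graph + shortest paths + classical MDS)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    inf = float("inf")
    g = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Connect each point to its k nearest neighbours, symmetrically.
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for j in nearest:
            g[i][j] = g[j][i] = d[i][j]
    # Floyd-Warshall: graph shortest paths approximate geodesic distances.
    for via in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][via] + g[via][j] < g[i][j]:
                    g[i][j] = g[i][via] + g[via][j]
    if any(inf in row for row in g):
        raise ValueError("neighbourhood graph is disconnected; increase k")
    # Classical MDS: double-centre the squared geodesic distance matrix.
    sq = [[x * x for x in row] for row in g]
    mean = [sum(row) / n for row in sq]  # row means equal column means (symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    # Power iteration for the leading eigenpair (assumes it is positive).
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return [math.sqrt(lam) * x for x in v]
```

Points sampled along a three-quarter circle illustrate the difference from a linear projection. Any straight-line projection of that arc folds some points on top of others. The Isomap coordinate instead increases (or decreases) steadily along the arc, roughly tracking arc length.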
