Social Big Data: Recent achievements and new challenges (PDF)

... techniques that are used in social big data. Section 4 describes a number of applications related to marketing, crime analysis, epidemic intelligence, and user experiences. Finally, Section 5 describes some of the current problems and challenges in social big data; this section also provides some co ...

Ph.D. Thesis Proposal Towards a spatio

... 4.2 An Object-Oriented Implementation of the 2W Model ...

Slide 1

... • Both require K to be specified in the input • K-medoids is less influenced by outliers in the data • K-medoids is computationally more expensive • Both methods assign each instance to exactly one cluster ...
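The outlier and cost points in this slide can be illustrated with a minimal sketch. It assumes a single one-dimensional cluster and absolute difference as the distance; the function names `mean` and `medoid` and the sample values are illustrative and do not come from the slide.

```python
def mean(points):
    # Centre used by K-means: the arithmetic mean, computed in O(n).
    return sum(points) / len(points)


def medoid(points):
    # Centre used by K-medoids: the cluster member with the smallest
    # total distance to all other members. This naive search is O(n^2),
    # which is where the extra computational cost comes from.
    return min(points, key=lambda c: sum(abs(c - p) for p in points))


# Hypothetical cluster with one extreme value (100.0).
cluster = [1.0, 2.0, 3.0, 4.0, 100.0]

print(mean(cluster))    # 22.0, pulled towards the outlier
print(medoid(cluster))  # 3.0, stays inside the bulk of the data
```

The mean moves far from the four typical values, while the medoid must be an actual data point and therefore remains among them.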

Evaluation of Credit Scoring Methods using Data Mining

... A significant number of classification techniques have been implemented for credit scoring. The techniques include the following (Baesens et al. 2003): 1) Traditional statistical methods such as discriminant analysis and logistic regression. 2) Non-parametric statistical models such as k-nearest nei ...

Scientific Data Mining, Integration, and Visualization - LIGO

... Many definitions of data mining exist. Hand, Mannila and Smyth[4] defined it as “the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner”, while Han[5] called it “[the] e ...

Data mining and visualization

Predicting Missing Attribute Values Using k

D43062127

... that are used to extract sequential patterns: one includes methods based on association rule mining; the other one includes methods based on the use of tree structures and Markov chains to represent navigation patterns. Some well-known algorithms for mining association rules have been modified to ex ...

A Brief Review of Alternative Uses of Data Mining

... considerable insight into students’ learning behaviors. However, the study also mentioned some of the shortcomings of data mining in education [9]. Data mining in education is a relatively new application of data mining and therefore does not work as seamlessly as it does in many business applicatio ...

Proposal to integrate Reactome usage in Pathway updates.

... But it does store extra information about interactions, reactions, metabolites, localizations, etc. in BioPAX format. New GenMapp ...

Mining Sequential Patterns with Time Constraints

... science. For instance, in [AS95], the problem has been refined considering a database storing behavioral facts which occur over time to individuals of the studied population. Thus facts are provided with a time stamp. The concept of sequential pattern is introduced to capture typical behaviors over ...

Data Mining

... In these methods, a distance measure is used to quantify the similarity between objects. Objects that are far from others can be regarded as outliers. These methods assume that the proximity of an outlier object to its nearest neighbors significantly deviates from the proximity of the object to most of ...
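One common way to turn this proximity idea into a procedure is to score each object by the distance to its k-th nearest neighbour. The sketch below is one such variant under stated assumptions: Euclidean distance, a threshold expressed as a multiple of the median score, and illustrative names (`knn_distance`, `distance_outliers`) and data that are not taken from the snippet.

```python
import math


def knn_distance(i, data, k):
    # Distance from data[i] to its k-th nearest neighbour (Euclidean).
    dists = sorted(
        math.dist(data[i], data[j]) for j in range(len(data)) if j != i
    )
    return dists[k - 1]


def distance_outliers(data, k=2, threshold=3.0):
    # Flag objects whose k-NN distance is much larger than the median
    # k-NN distance. The factor `threshold` is an assumed tuning knob.
    scores = [knn_distance(i, data, k) for i in range(len(data))]
    median = sorted(scores)[len(scores) // 2]
    return [p for p, s in zip(data, scores) if s > threshold * median]


# Four points on a unit square plus one distant point (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(distance_outliers(points))  # [(10.0, 10.0)]
```

The choice of k and of the threshold strongly affects which objects are flagged, so both would need tuning on real data.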

slide

... 1. Exhaustive Recursive Search (ERS): the input network is represented by an adjacency matrix M. (motif size <= 4) 2. ESU: starting with individual nodes and adding one node at a time until the required size k is reached. (motif size <=14) ...

Data Mining Strategies

... But How Do You Decide on k? • A key question to ask is “how many clusters is the right number?” • Try a bunch of different values, and map distance ...

Cost-Sensitive Classification with Genetic Programming

Proposed Analytics Track for B.S. in Business

... New Courses Emphasis – Team Projects • Courses focus on the communication & presentation of business insight to an audience of peers. • In the later part of each half of the course, students will assemble in teams of about four, and work on a case that is designed to evaluate knowledge and understa ...

[Title unreadable in source]

... the values must be kept private from whoever is doing the mining operation. We instead assume that each site is allowed to see its local data, but not the data from the other sites. In return, we are able to get exact, rather than approximate results. In [7] the authors presented a privacy-preservin ...

Clustering

... • Example: They are currently the best-known classifier on a well-studied hand-written character recognition benchmark • Another Example: Andrew knows several reliable people doing practical real-world work who claim that SVMs have saved them when their other favorite classifiers did poorly. • There ...

Towards a Systematic Approach to Big Data Benchmarking

pdf (preprint)

Scientific Data Mining, Integration, and Visualization

Chapter I: Introduction to Data Mining

Is .1+.2 equal to .3?

Frequency-aware Similarity Measures - Hasso-Plattner

... problem. Suitable similarity measures help to find duplicates and thus cleanse a data set, or they can help finding nearest neighbors to answer search queries. The problem comprises two main difficulties: First, the representations of same real-world objects might differ due to typos, outdated value ...
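As a point of reference for the typo case mentioned here, a plain edit-distance similarity can be sketched as follows. This is the standard Levenshtein distance, not the frequency-aware measure the paper proposes; the function names and the normalisation by the longer string length are assumptions made for illustration.

```python
def levenshtein(a, b):
    # Minimum number of single-character insertions, deletions, and
    # substitutions needed to turn string a into string b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    # Normalised similarity in [0, 1]; 1.0 means identical strings.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("Jon Smith", "John Smith"))  # 0.9
```

A duplicate-detection step would typically compare this score against a cut-off chosen for the data set at hand.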

Data Mining: Concepts and Techniques

... data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails) ...


Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data – that is, distance measurements.
