ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs

... The problem of random access to a disk-resident edge file has been addressed in [15]. They find that it is possible to define good storage layouts for undirected graphs but that the storage blowup can be very large. Given that we are interested only in very large graphs and graphs with directed edg ...

Data Mining - Data Warehousing and Data Mining by Gopinath N

... Presentation: decision-tree, classification rule, neural network ...

EZ33917920

... by proactively triggering the end points before congestion collapse occurs. Fair queuing is mostly useful in choke-point routers that have a small number of connections passing through them. A larger router mainly depends on RED. Some protocols such as TCP behave better und ...

Uncovering the Plot: Detecting Surprising Coalitions of Entities in

... chaining these connections to create stories that either serve as end hypotheses or as templates of reasoning that can then be prototyped. Our work here focuses exclusively on multi-relational datasets, either available in native form or obtained through straightforward ‘relationalization’ of unstru ...

Anomaly Detection and Preprocessing

... unusual, and the ability to distinguish between the two. Another serious difficulty is that the definition of normal can change. Sensor nodes in wireless sensor networks have limited energy resources and this hinders the dissemination of the gathered data to a central location. This stimulated our r ...

DY33753757

... appear frequently in transactions and some of them appear rarely. The rare itemsets may also be interesting. A rare itemset is an itemset consisting of rare items. It may be found by setting a low support threshold, but this leads to a large number of association rules consisting of both interesting and uninte ...

ICS 278: Data Mining Lecture 1: Introduction to Data Mining

... University of California, Irvine ...

ppt

... When a transaction is fetched, the count is incremented for each set of items that is contained in the transaction. Large itemsets: sets with a high count at the end of the pass. Many itemsets: if memory is not enough to hold all counts for all itemsets use ...
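The counting pass described in this snippet can be sketched as follows. This is a minimal illustration, not the method from the slides: the function names `count_itemsets` and `large_itemsets`, the fixed itemset size, and the `min_count` threshold are all assumptions introduced here, and the sketch keeps every count in memory, so it does not cover the low-memory case the snippet alludes to.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, size):
    """One pass over the data: count every itemset of `size` items
    that is contained in each fetched transaction."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for itemset in combinations(items, size):
            counts[itemset] += 1
    return counts


def large_itemsets(counts, min_count):
    """Keep itemsets whose count at the end of the pass reaches
    the (assumed) threshold `min_count`."""
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Hypothetical toy transactions, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
pair_counts = count_itemsets(transactions, 2)
frequent_pairs = large_itemsets(pair_counts, 2)
```

Raising `min_count` shrinks the result, while lowering it admits rarer itemsets at the cost of more counts to store.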

WaveCluster: a wavelet-based clustering approach for spatial data

... The aim of data-clustering methods is to group the objects in spatial databases into meaningful subclasses. Due to the huge amount of spatial data, an important challenge for clustering algorithms is to achieve good time efficiency. Also, due to the diverse nature and characteristics of the sources ...

Research on Data Mining Models for the Internet of Things

... same time, the data sources of IoT are heterogeneous, and the resources of nodes are limited. These characteristics bring several problems to centralized data mining architecture. At first, mass data of IoT is stored in different sites. Therefore, it is difficult for us to mine distributed data by ...

Towards a Comprehensive Set of Big Data Benchmarks

... 3. Particular Benchmarks as instances of Ogres Our approach suggests choosing benchmarks from Ogre instances that cover a diverse range of facets. Rather than trying to be comprehensive at this stage, we give some examples. Note that kernel benchmarks are instances of Ogre Processing facets classifi ...

Slides

... can you see how the triangle inequality is used for the vantage-point pruning rules? problem in metric spaces becomes more difficult than in vector spaces ...
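One way the triangle inequality can yield a pruning rule is sketched below, for a range query against a single vantage point. This is an illustrative simplification rather than the rules from the slides: the names `build_index` and `range_search`, the flat (non-tree) index, and the Euclidean metric are assumptions made here. For a vantage point v, query q and candidate x, the inequality gives d(q, x) >= |d(q, v) - d(v, x)|, so any x whose lower bound already exceeds the search radius can be skipped without computing d(q, x).

```python
import math


def build_index(points, vantage, dist):
    """Precompute the distance from the vantage point to every point."""
    return [(p, dist(vantage, p)) for p in points]


def range_search(index, vantage, query, radius, dist):
    """Return points within `radius` of `query`, plus the number of
    query-to-point distances that actually had to be computed."""
    d_qv = dist(query, vantage)
    hits, evaluated = [], 0
    for point, d_vp in index:
        # Triangle inequality lower bound: d(q, x) >= |d(q, v) - d(v, x)|.
        if abs(d_qv - d_vp) > radius:
            continue  # pruned without evaluating dist(query, point)
        evaluated += 1
        if dist(query, point) <= radius:
            hits.append(point)
    return hits, evaluated


# Hypothetical toy data; any metric obeying the triangle inequality works.
points = [(0, 0), (1, 0), (5, 0), (9, 0), (10, 0)]
vantage = (0, 0)
index = build_index(points, vantage, math.dist)
hits, evaluated = range_search(index, vantage, (9.5, 0), 1.0, math.dist)
```

In this toy run three of the five candidates are pruned by the bound alone. The bound relies only on the metric axioms, which is why the same rule applies in general metric spaces where coordinates are not available.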

Condensed representations for data mining - LIRIS

... predicate q can be defined in terms of a Boolean expression over some primitive constraints (e.g., a minimal frequency constraint used in conjunction with a syntactic constraint which enforces the presence or the absence of some sub-patterns). Some of the primitive constraints generally refer to th ...

Symbolic data analysis of complex data

... f(i, j, j’) = Copula(f(i, j’). f(i, j’)) Aim of Copula model in SDA: find the Copula which minimises the difference with the joint. In order to avoid the restriction to independency hypotheses and to reduce the cost of f(i, j, j’) computing. ...

Data Mining and Cluster Organisations

... The widespread interest in the economics of industrial location and, particularly, in the issue of industrial clusters as strategic entities in global industries (Tallman et al., 2004), follows as it became widely recognized that they can positively contribute to spur economic growth (Porter, 2003), ...

Open Access - Lund University Publications

... clustering approaches is included and the relevant practical work is introduced as well. An evaluation is made based on several common methods at the end of this chapter. ...

IT6702 - DATA WAREHOUSING AND DATA MINING TWO MARKS

Cross-domain Text Classification using Wikipedia

Online Batch Weighted Ensemble for Mining Data Streams with

... occurred. BDDM distinguishes between two levels of change: warning and drift. If the value of the slope a is less than 0, then default change level is warning. In the end, it is checked whether drift level was obtained. The threshold for the drift was inspired by the DDM [6] and is established using ...

big data and high dimensional data analysis

Research Proposal - University of South Australia

Mining@home: toward a public

... architectures, therefore using shared memory and having a limited degree of parallelism. Section 3 discusses the problem of mining closed frequent itemsets, and proposes a new distributed algorithm. A very recent effort toward the distributed mining of ‘terabyte-sized data sets’ is [5]. The authors ...

Orthogonal Range Searching on the RAM, Revisited

DATA MINING CONCEPTS AND TECHNIQUES

... readers, they can be highly unstructured and lack a predefined schema, type, or pattern. Thus it is difficult for computers to understand the semantic meaning of diverse Web pages and structure them in an organized way for systematic information retrieval and data mining. Automated Web page clusteri ...

Exploiting Data Mining Techniques for Improving the Efficiency of

... considering features of each class, the new data object is allocated to them; its label and kind become determinable. In classification, the established model is obtained based on some training data (data objects whose class labels are already determined and identified). The obtained model can be pres ...


Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data – that is, distance measurements.
