Neighborhood rough sets for dynamic data mining

Advancing the discovery of unique column combinations

... These are discussed in detail in Sec. 2. In the broader area of metadata discovery, however, there is much work related to the discovery of functional dependencies (FDs). In fact, the discovery of FDs is very similar to the problem of discovering uniques, as uniques functionally determine all other i ...

Concept Decompositions for Large Sparse Text Data using Clustering by Inderjit S. Dhillon and Dharmendra S. Modha

... insights are a key step towards our second focus, which is to explore intimate connections between clustering using the spherical k-means algorithm and the problem of matrix approximation for the word-by-document matrices. Generally speaking, matrix approximations attempt to retain the “signal” pres ...

Analysis and comparison of methods and algorithms for data mining

... names relations in DB, such that the Horn rule (MQ) (obtained by applying to MQ) encodes a dependency between the atoms in its head and body. The Horn rule is supposed to hold in DB with a certain degree of plausibility. The plausibility is defined in terms of indexes which we will formally define ...

A Hash based Mining Algorithm for Maximal Frequent Item Sets

... sequence of log data into a set of maximal forward references. Second step is to derive an algorithm to determine frequent traversal patterns from Maximum ... In open addressing, all item records are stored in the hash table itself. When a new item has to be inserted, to find the place that item has to ...

Finding Highly Correlated Pairs Efficiently with Powerful Pruning

... the pairs. Although one can turn to external-memory computations, the performance deteriorates to an unacceptable level. Hence, in these situations, it is critical for the memory requirement of an algorithm to be much smaller than the size of the input data set. This is possible because it is often ...

IOSR Journal of Computer Engineering (IOSR-JCE)

... efficient algorithm named Fuzzy Cluster-Based Association Rules (FCBAR). The FCBAR method is to create cluster tables by scanning the database once, and then clustering the transaction records to the k_th cluster table, where the length of a record is k. Moreover, the fuzzy large itemsets are generated by ...

1 Aggregating and visualizing a single feature: 1D analysis

... things. Structurally, knowledge can be thought of as a set of categories and statements of relation between them. Categories are aggregations of similar entities such as apples or plums or more general categories such as fruit comprising apples, plums, etc. When created over data objects or features ...

P-N-RMiner: A Generic Framework for Mining Interesting Structured

Adattarhaz

... partial materialization: materialize only some of the cuboids, chosen based on query frequency, size, etc. ...

Dimension Reconstruction for Visual Exploration of Subspace

Machine learning in bioinformatics

... optimal solutions when convergence is achieved. However, they do not necessarily converge for every ...

CF33497503

CD: A Coupled Discretization Algorithm

... quantitative attributes. One solution to this problem is to partition numeric domains into a number of intervals with corresponding breakpoints. As we know, the number of different ways to discretize a continuous feature is huge [6], including binning-based, chi-based, fuzzy-based [2], and entropy-b ...

A Complete Survey on application of Frequent Pattern Mining and

... Crime analysis can occur at various levels, including tactical, operational, and strategic. Crime analysts study crime reports, arrests reports, and police calls for service to identify emerging patterns, series, and trends as quickly as possible. They analyze these phenomena for all relevant factor ...

... Real world entry may be proposed work. This work is implemented with real time domain like web and medical datasets. ...

Clustering

... find clusters such that – Data points in one cluster are more similar to one another. – Data points in separate clusters are less similar to one another. ...

Steven F. Ashby Center for Applied Scientific Computing

... find clusters such that – Data points in one cluster are more similar to one another. – Data points in separate clusters are less similar to one another. ...

Combined Association Rule Mining - University of Technology Sydney

... characterized itemsets. Employing the concept of “share measures”, their algorithm may present more information in terms of financial analysis. Different from Hilderman et al.’s algorithm, each single rule in this paper is associated with a target class to provide ordered action list. Ras et al. [7,8] ...

Direct Local Pattern Sampling by Efficient Two

... algorithm. More precisely, it is used for the internal randomization of an algorithm with an otherwise deterministic output (all maximal frequent and minimal infrequent sets of a given input database). When applied for the final pattern discovery, however, this random process has the weakness that ...

Jacques Lacan's Registers of the Psychoanalytic Field, Applied

1.2 What is data mining?

Data Stream Mining: an Evolutionary Approach

Outlier Detection using Semi-supervised and Unsupervised Learning on High Dimensional Data

... [2] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “LOF: Identifying density-based local outliers,” SIGMOD Rec, vol. 29, no. 2, pp. 93–104, 2000. [3] W. Jin, A. K. H. Tung, J. Han, and W. Wang, “Ranking outliers using symmetric neighborhood relationship,” in Proc 10th Pacific-Asia Conf on Ad ...

Finding Association Rules From Quantitative Data Using Data Booleanization

... Srikant, R., and Agrawal, R. (1996) called the problem of finding association rules from quantitative data the "Quantitative Association Rules" problem. They pointed out that if too many intervals are defined for a variable, rules based on this variable might not hit minimum support thresholds. On th ...

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, in that both use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
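The iterative refinement described above can be illustrated with a minimal sketch. It assumes the standard Lloyd-style heuristic (assign each point to its nearest center, then move each center to the mean of its points), which is one common choice and not necessarily the variant any particular source has in mind. The function names `kmeans` and `classify`, the random initialization from the data, and the toy points are all illustrative assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def classify(point, centers):
    """Nearest centroid rule: index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means (illustrative sketch, converges to a local optimum).

    Assumption: initial centers are k distinct data points drawn at random.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [classify(p, centers) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean(members)
    return centers, labels


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)

# Classifying a new point against the learned centers is the
# 1-nearest-neighbor rule applied to the cluster centers.
new_label = classify((9.0, 9.5), centers)
```

Because the result depends on the initial centers, implementations of this kind are usually run several times with different seeds, keeping the run with the lowest total within-cluster squared distance.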

Top subcategories

K-means clustering