![Parallel K-Means Algorithm for Shared Memory Multiprocessors](http://s1.studyres.com/store/data/017885331_1-30d4849c248231d4104e67f751dbc278-300x300.png)
Efficient clustering techniques for managing large datasets
... group (= a cluster) consists of objects that are similar between themselves and dissimilar to objects of other groups. From the machine learning perspective, Clustering can be viewed as unsupervised learning of concepts [5]. A simple, formal, mathematical definition of clustering, as stated in [6] i ...
Revisiting interestingness of strong symmetric association rules in educational data
... association rules from data. The selection of items and transactions within the data remains intuitive. In comparison with a classification task for example, there are many classifiers that, with the same set of data, can give different results. The data preparation and most importantly the definiti ...
A RESEARCH SUPPORT SYSTEM FRAMEWORK FOR WEB DATA
... web crawlers to create entries for indexing. They can also be used in other possible applications such as page validation, structural analysis and visualization, update notification, mirroring and personal web assistants/agents etc. [2]. Search engines are not adequate for web mining for a research ...
Efficient Computation of Iceberg Cubes by Bounding Aggregate
... following the BUC framework, and only one group is aggregated at a time in the recursive partitioning process, which limits the amount of information readily available for pruning. In this paper, we propose a novel technique, called bound prune cubing (BP-Cubing), for efficiently computing iceberg c ...
An Effective Analysis of Spatial Data Mining Methods Using Range
... values using the contiguity matrix. The first technique performs a smoothing by replacing each attribute value by the average value of its neighbors. This highlights the general characteristics of the data. The other contrasts data by subtracting this average from each value. Each attribute (called ...
Weka4WS: Enabling Distributed Data Mining on Grids
... computing and data processing. By exploiting a service-oriented approach, knowledge discovery applications can be developed on Grids to deliver high performance and manage data and knowledge distribution. The goal of this work is to extend the Weka toolkit for supporting distributed data mining th ...
Mining Gene Expression Data using Domain Knowledge∗
... Eisen et al. [15] showed clusters whose significance was demonstrated by common functional categorization, other works concluded that further statistical analyses were required. Indeed, a microarray dataset contains numerous groups of co-expressed genes. Then, a typical strategy for a biologist is to ...
Density Clustering Method for Gene Expression Data
... thousands of genes from acute leukemia patients’ testing samples using a self-organizing maps clustering approach [8]. Some other clustering approaches, such as k-means [21], fuzzy k-means [1], CAST [3], etc., have also proven to be valuable clustering methods for gene expression data analysis. Howe ...
Towards Effective and Efficient Distributed Clustering
... sites are combined and analyzed. The result of the central analysis may be returned to the local sites, so that the local sites are able to put their data into a global context. The requirement to extract knowledge out of distributed data, without a prior unification of the data, created the rather ...
OPTIMIZATION-BASED MACHINE LEARNING AND DATA MINING
... method. Ten linear programs were randomly generated for each number of variables and constraints, and the average solution time in seconds is given with the standard deviation in parentheses for each algorithm. Primal methods were used for problems with more variables than constraints, and dual meth ...
3.3 ANOVA analysis of Brucella vaccine protection
... protection efficacy assessment. For those pathogens that kill a model animal (e.g., mouse), survival assessment is used for assessing vaccine protection efficacy (Brinkman et al., 2010). Since virulent Brucella does not kill mice, the survival of pathogen-challenged mice is not a useful method to ass ...
Government Data Mining and the Fourth Amendment
... data mining is undertaken by the government, does it implicate the Fourth Amendment? Second, does the analysis change when data mining is undertaken by private entities that then make the data or data analysis available to the government? Third, if the Fourth Amendment does impose some restrictions ...
Mining Associations on the Warsaw Stock Exchange
... also purchase item B, then we can say that there is a relationship between item A and item B, and such information can be useful for decision making. Therefore, the aim of applying the association rules algorithm is to find interesting relationships by analyzing data and use them to support a decisio ...
Nonlinear dimensionality reduction
![](https://commons.wikimedia.org/wiki/Special:FilePath/Lle_hlle_swissroll.png?width=300)
High-dimensional data, meaning data that require more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Methods that just give a visualisation are typically based on proximity data – that is, distance measurements.
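
To make the manifold assumption concrete, here is a minimal Isomap-style sketch in Python with NumPy. The choice of Isomap, the function names, the toy arc dataset and the neighbourhood size `k=2` are all assumptions made for this illustration and do not come from the text above. The sketch works purely from proximity data: it approximates distances along the manifold with shortest paths in a nearest-neighbour graph, then applies classical multidimensional scaling (a linear method) to those distances.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def geodesic_distances(D, k):
    """Approximate along-manifold distances from a k-nearest-neighbour graph."""
    n = D.shape[0]
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    # Column 0 of the argsort is the point itself (distance 0), so skip it.
    nearest = np.argsort(D, axis=1)[:, 1:k + 1]
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]
        G[nearest[i], i] = D[i, nearest[i]]  # keep the graph symmetric
    # Floyd-Warshall: allow paths through each intermediate node m in turn.
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def classical_mds(D, n_components):
    """Embed points so that Euclidean distances approximate the entries of D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# Toy data (assumed for illustration): 30 points on a circular arc,
# i.e. a one-dimensional manifold embedded in two dimensions.
theta = np.linspace(0.0, 1.5 * np.pi, 30)
X = np.column_stack([np.cos(theta), np.sin(theta)])

G = geodesic_distances(pairwise_distances(X), k=2)
embedding = classical_mds(G, n_components=1)

# The single recovered coordinate should track position along the arc
# (up to sign, since eigenvectors have no preferred direction).
print(abs(np.corrcoef(embedding[:, 0], theta)[0, 1]))
```

Because the embedding is computed only from a distance matrix, this sketch falls in the second group described above: it yields low-dimensional coordinates for the given points but no explicit mapping for new data.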