
A Probabilistic Framework for Semi
... semi-supervised clustering that was experimentally shown to produce more accurate clusters than other methods on several data sets [8]. However, this approach is restricted to using Euclidean distance as the clustering distortion measure. In this paper, we show how to generalize that model to handle ...
Using an Ontology-based Approach for Geospatial Clustering
... result including appropriate clustering methods and datasets together with possible explanation data. The Spatial Data Viewer will allow users to effectively explore and select data relevant to the user’s task. The basic functions of the Spatial Data Viewer will include visualizing data in differe ...
Subspace Clustering for Complex Data
... trated in Figure 2 (right). The subspaces individually assigned to each group provide the reasoning why such multiple solutions are meaningful. Thus, in the example of Figure 2, each of the four clusters {C1 , . . . , C4 } is useful and should be provided to the user. In this work we describe novel ...
Intelligence Based Intrusion Detection System (IBIDS) Senior Project
... mostly through computers, and especially the internet, that most of global communication takes place. This global communication does not just contain normal conversations and publicly available data, but also consists of the transactions and transfers of private and confidential data which are kept at ...
jonyer01a - Journal of Machine Learning Research
... instances to contain minor differences from the substructure definition. This feature is optional and the user must enable it as well as specify the degree of maximum allowable dissimilarity. The command line argument to be specified is –threshold Number, where Number is between 0 and 1 inclusive - ...
Detection and Visualization of Subspace Cluster Hierarchies
... In addition, cluster C is embedded within both 2D clusters A and B. Detecting such relationships of subspace clusters is obviously a hierarchical problem. The resulting hierarchy is different from the result of a conventional hierarchical clustering algorithm (e.g. a dendrogram). In a dendrogram, ea ...
Graph-Based Hierarchical Conceptual Clustering
... instances to contain minor differences from the substructure definition. This feature is optional and the user must enable it as well as specify the degree of maximum allowable dissimilarity. The command line argument to be specified is –threshold Number, where Number is between 0 and 1 inclusive - ...
Foundations of Perturbation Robust Clustering
... This work follows a line of research on theoretical foundations of clustering. Efforts in the field began as early as the 1970s with the pioneering work of Wright [25] on axioms of clustering, as well as analysis of clustering properties by Fisher et al. [17] and Jardine et al. [20], among others. This f ...
Swarm Intelligence Algorithms for Data Clustering
... Jain, 1995, Pal et al., 1993, Kohonen, 1995), evolutionary computing (Falkenauer, 1998, Paterlini and Minerva, 2003) and so on. Researchers all over the globe are coming up with new algorithms, on a regular basis, to meet the increasing complexity of vast real-world datasets. A comprehensive review ...
A novel algorithm for fast and scalable subspace clustering of high
... knowledge helps to prune the non-dense neighbourhoods of the data points in the lower dimensional subspaces as they will never lead to the dense neighbourhoods in the higher dimensional subspaces. Thus, only the dense set of points (clusters) starting from the 1-dimensional subspaces, are chosen as ...
Determining the number of clusters using information entropy for
... clusters characterized by distances to the centers of the clusters. Leung et al. [29] proposed an interesting hierarchical clustering algorithm based on human visual system research, in which each data point is regarded as a light point in an image, and a cluster is represented as a blob. As the rea ...
Clustering Documents with Active Learning using Wikipedia
... fa (p) is the number of Wikipedia articles in which it is used as an anchor, and ft (p) is the number of articles in which it appears in any form. Phrases with low probabilities are discarded. The same feature is used to resolve overlaps. For example, the term “South Africa” matches to three anchors ...
Self-Tuning Clustering: An Adaptive Clustering Method for
... communities due to its wide applicability to improving marketing strategies [3]. Among others, data clustering is an important technique for exploratory data analysis [6]. In essence, clustering is meant to divide a set of data items into some proper groups in such a way that items in the same group ...
When Pattern met Subspace Cluster
... rectangle in the data, or a tile. In pattern mining, the notion of a tile has become very important in recent years [17, 21, 23, 33]. Originally the definition of a pattern was very much along the lines of an SQL query, posing selection criteria on which objects in the data are considered to support ...
Human genetic clustering

Human genetic clustering applies mathematical cluster analysis to the degree of similarity of genetic data between individuals and groups in order to infer population structures and assign individuals to groups. These groupings in turn often, but not always, correspond with the individuals' self-identified geographical ancestry. A similar analysis can be done using principal components analysis, which was a popular method in earlier research and which many studies in the past few years have continued to use.
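
To make "assigning individuals to groups by genetic similarity" concrete, here is a minimal sketch in Python. It is illustrative only: the genotype values are invented toy data, the distance (mean absolute difference in allele counts) and the average-linkage agglomerative procedure are assumptions chosen for simplicity, and the names `GENOTYPES`, `distance`, `average_linkage`, and `cluster` are hypothetical. Published studies generally rely on model-based or more elaborate methods.

```python
from itertools import combinations

# Hypothetical toy data: each individual is a vector of allele counts (0, 1, or 2)
# at six made-up marker sites. These values are illustrative, not real genotypes.
GENOTYPES = {
    "ind_a": [0, 0, 1, 2, 2, 0],
    "ind_b": [0, 1, 1, 2, 2, 0],
    "ind_c": [0, 0, 0, 2, 1, 0],
    "ind_d": [2, 2, 1, 0, 0, 1],
    "ind_e": [2, 1, 2, 0, 0, 2],
    "ind_f": [2, 2, 2, 0, 1, 2],
}


def distance(x, y):
    """Mean absolute difference in allele counts across sites (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def average_linkage(c1, c2, data):
    """Average pairwise distance between the members of two clusters."""
    total = sum(distance(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(data, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[name] for name in sorted(data)]
    while len(clusters) > k:
        # Pick the pair of cluster indices (i < j) with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], data),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


groups = cluster(GENOTYPES, k=2)
print(groups)
```

On this toy input the two recovered groups separate the first three individuals from the last three. The number of groups `k` is supplied by hand here; choosing it from the data is a separate problem that this sketch does not address.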