
Privacy-Preserving Decision Tree Mining Based on
... To perturb a set of data records O = {o1 , . . . , on } on an attribute A, we create a perturbation matrix M for the attribute domain U = {u1 , . . . , uN }. For each uk ∈ U, p(k → h) = Pr(uk → uh ) denotes the (transition) probability that uk is replaced by uh ∈ U. The perturbation matrix is then d ...
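The excerpt above defines a perturbation matrix M whose entry p(k → h) gives the probability that domain value u_k is replaced by u_h. A minimal sketch of one such matrix, assuming a uniform randomized-response scheme (the retention probability `gamma` and the toy domain below are illustrative, not from the paper):

```python
import numpy as np

def make_perturbation_matrix(N, gamma=0.7):
    """M[k, h] = Pr(u_k -> u_h): keep u_k with probability gamma,
    otherwise replace it with a uniformly chosen other domain value."""
    off = (1.0 - gamma) / (N - 1)
    M = np.full((N, N), off)
    np.fill_diagonal(M, gamma)
    return M

def perturb(values, domain, M, rng):
    """Replace each value u_k by a u_h drawn with probability M[k, h]."""
    index = {u: k for k, u in enumerate(domain)}
    return [domain[rng.choice(len(domain), p=M[index[v]])] for v in values]

rng = np.random.default_rng(0)
domain = ["low", "mid", "high"]
M = make_perturbation_matrix(N=3)
out = perturb(["low", "low", "high"], domain, M, rng)
print(M.sum(axis=1))  # each row of M sums to 1
```

Each row of M is a probability distribution over the domain, which is what lets the miner later invert the perturbation in expectation.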
Clustering - Politecnico di Milano
... using a mixture of distributions • Each cluster is represented by one distribution • The distribution governs the probabilities of attribute values in the corresponding cluster • They are called finite mixtures because there is only a finite number of clusters being represented • Usually individual ...
DECODE: a new method for discovering clusters of different
... Density-based cluster methods are characterized by aggregating mechanisms based on density (Han et al. 2001). It is believed that density-based cluster methods have the potential to reveal the structure of a spatial data set in which different point processes overlap. Ester et al. (1996) and Sander ...
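The excerpt above refers to density-based aggregation in the style of Ester et al. (1996). A minimal DBSCAN-like sketch, where a core point (one with at least `min_pts` neighbors within `eps`) grows a cluster by density-reachability; the parameters and toy points are illustrative:

```python
import numpy as np

def dbscan(points, eps=0.5, min_pts=3):
    n = len(points)
    labels = [-1] * n  # -1 = noise / not yet assigned
    # Precompute the eps-neighborhood of every point (a point is its own neighbor).
    neighbors = [
        [j for j in range(n) if np.linalg.norm(points[i] - points[j]) <= eps]
        for i in range(n)
    ]
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue  # already assigned, or not a core point
        # Grow a new cluster from core point i by density-reachability.
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])
        cluster += 1
    return labels

pts = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
print(dbscan(pts, eps=0.5, min_pts=3))  # two dense groups, two clusters
```

Because membership depends on local density rather than distance to a centroid, clusters of different shapes and overlapping point processes can be separated, which is the property the excerpt highlights.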
Discovering Correlated Subspace Clusters in 3D
... Axis-parallel 3D subspace clusters are extensions of the 2D subspace clusters with time/location as the third dimension. Tricluster [1] is the pioneer work on 3D subspace clusters. Similar to 2D subspace clusters, triclusters fulfill certain similarity-based functions and thresholds have to be set o ...
Inducing Decision Trees with an Ant Colony Optimization Algorithm
... 2.1. Top-down Induction of Decision Trees Decision trees provide a comprehensible graphical representation of a classification model, where the internal nodes correspond to attribute tests (decision nodes) and leaf nodes correspond to the predicted class labels—illustrated in Fig. 1. In order to cla ...
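The excerpt above describes classification with a decision tree: internal nodes test an attribute, and leaves hold the predicted class label. A minimal sketch of that traversal; the tree (outlook/humidity) and record fields are illustrative, not from the paper:

```python
def classify(node, record):
    """Walk from the root, following the branch chosen by each
    attribute test, until a leaf (a class label) is reached."""
    while isinstance(node, dict):  # decision node: attribute test
        node = node["branches"][record[node["attribute"]]]
    return node                    # leaf node: predicted class label

tree = {"attribute": "outlook",
        "branches": {"sunny": {"attribute": "humidity",
                               "branches": {"high": "no", "normal": "yes"}},
                     "overcast": "yes",
                     "rain": "yes"}}

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```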
Parallel Outlier Detection on Uncertain Data for GPUs
... 2.2 Parallel and GPU-accelerated data mining Data mining applications such as outlier detection are good candidates for parallelization [16] [18] as a large amount of data is processed by a small number of routines. Several outlier detection algorithms have been parallelized for acceleration with GP ...
Clustering Non-Ordered Discrete Data, JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, Vol 30, PP. 1-23, 2014, Alok Watve, Sakti Pramanik, Sungwon Jung, Bumjoon Jo, Sunil Kumar, Shamik Sural
... idea behind this is that even though two data points may not be directly similar to each other, if they share many common neighbors then they should be put in the same cluster. Based on this new measure, they propose a hierarchical clustering method that recursively merges clusters with maximum ...
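The shared-neighbor idea in the excerpt above can be sketched directly: similarity between two points is the size of the overlap of their neighbor sets, so two points that never appear in each other's neighborhoods can still score high. The toy neighbor lists are illustrative:

```python
def shared_neighbor_similarity(neighbors_a, neighbors_b):
    """Similarity = number of neighbors the two points have in common."""
    return len(set(neighbors_a) & set(neighbors_b))

# a and b do not list each other as neighbors, yet share three neighbors,
# so a shared-neighbor method would still place them in the same cluster.
print(shared_neighbor_similarity(["c", "d", "e", "f"], ["c", "d", "e", "g"]))  # 3
```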
Expectation–maximization algorithm

In statistics, an expectation–maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate of the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found in the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step.
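The E/M alternation described above can be sketched for a two-component, one-dimensional Gaussian mixture (the finite-mixture model mentioned earlier); the synthetic data, initial guesses, and iteration count are illustrative assumptions:

```python
import numpy as np

def em_gmm(x, n_iter=50):
    """EM for a 2-component 1-D Gaussian mixture."""
    mu = np.array([x.min(), x.max()])     # initial component means
    sigma = np.array([x.std(), x.std()])  # initial standard deviations
    pi = np.array([0.5, 0.5])             # mixing weights
    for _ in range(n_iter):
        # E step: responsibilities r[i, j] = Pr(component j | x_i)
        # under the current parameter estimates.
        d = (x[:, None] - mu) / sigma
        lik = pi * np.exp(-0.5 * d**2) / (sigma * np.sqrt(2 * np.pi))
        r = lik / lik.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters to maximize the expected
        # log-likelihood found in the E step.
        nj = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nj
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nj)
        pi = nj / len(x)
    return mu, sigma, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
mu, sigma, pi = em_gmm(x)
print(np.sort(mu))  # recovered means, near 0 and 5
```

The updated `mu`, `sigma`, and `pi` from each M step feed the responsibilities of the next E step, which is exactly the alternation the paragraph describes.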