Grid-Based Mode Seeking Procedure

Damir Krstinić*, Ivan Slapničar
University of Split, Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture, R. Boškovića b.b., 21000 Split, Croatia
* [email protected], tel: +38521305895, fax: +38521305856

November 7, 2011

Abstract

We propose a novel mode seeking and clustering procedure based on the estimation of the discrete probability density function of the data set. The discrete density is estimated in two steps. An initial density estimate is acquired by counting the data samples which populate each cell of the discretized domain. Based on the initial density estimate, each cell of the discretized domain is assigned a variable bandwidth kernel, which is afterwards used to compute the final discrete density estimate. Modes of the estimated density, corresponding to the patterns in the input data, are obtained by a hill-climbing procedure. The proposed technique is highly efficient, running in time linear in the number of input data samples, with low constant factors. It has no embedded assumptions about the structure of the data or the feature space, such as the number of clusters or their shape, so arbitrarily structured data sets can be analyzed.

Keywords: clustering, mode seeking, density estimation, density based clustering, variable kernel

1 Introduction

A broad category of feature space analysis techniques relies on density estimation, the construction of the unknown density function from the observed data. The estimated density reveals statistical trends and hidden patterns in the data distribution, where dense regions correspond to clusters of the data separated by regions of low density and noise [1]. Unlike the classification model, where data samples are assigned to one of several predefined categories, the number of clusters and their characteristics are generally not known in advance.
Therefore, methods which rely upon a priori knowledge of the structure of the feature space and of the data set being analyzed, such as the number of clusters or their shape, are often incapable of providing a reliable representation of the true underlying distribution of real data sets. To provide an accurate description of an arbitrarily structured feature space, methods without such embedded assumptions should be used [2].

Nonparametric methods for density estimation are based on the concept that the probability density function (PDF) at a continuity point can be estimated using the sample observations that fall within a small region around that point [3]. Clustering is based on determining local maxima (modes) of the estimated density. The mode seeking procedure is started from different locations of the feature space to find points of convergence corresponding to the modes of the density function. The set of all locations that converge to the same mode delineates the basin of attraction of that mode and represents a pattern in the input data. Data samples which are in the same basin of attraction are associated with the same cluster. The mode seeking procedure is guided only by the estimated density and has no embedded assumptions about the feature space or the data set. These techniques can find arbitrarily shaped clusters with robustness to outliers and noise. Density based techniques are application independent and can be used to develop algorithms for a wide variety of computer vision tasks [4-7], earth sciences and spatial databases [8-11], and other data mining problems where processing of large data sets is required.

The main motivation of this work is to provide a computationally efficient feature space analysis technique. Our approach is based on the discretization of the data domain and estimation of the probability density function in two steps. The initial estimate is obtained by counting the data samples populating each cell of the discretized domain.
Based on the initial estimate, a variable bandwidth is assigned to each cell of the discretized domain, which is afterwards used to compute the final density estimate. Patterns in the input data are obtained by a hill-climbing procedure started from the populated cells of the discretized domain.

2 Related work

The Mean Shift algorithm, widely adopted in the computer vision community [4, 5], is based on kernel density estimation and a gradient-guided mode seeking procedure [12, 13]. This technique is application independent and is not parameterized with a priori knowledge about the number of clusters or their shape. The scale of observation is defined by the kernel bandwidth. For data sets exhibiting local scale variations, better results can be obtained using the variable bandwidth Mean Shift [14]. However, despite the desirable characteristics of the Mean Shift algorithm, its convergence rate can be too slow for real-time problems where fast processing of a large number of data samples is required.

Grid-based clustering techniques [15, p. 243] discretize the data domain into a finite number of cells, forming a multidimensional grid structure on which the operations are performed. These techniques are typically fast since their complexity depends on the number of cells in each dimension of the quantized feature space rather than on the number of actual data samples. Moreover, they are easy to implement and generally produce satisfactory results. In the DENCLUE algorithm [16], an approximation of the PDF is obtained using the average shifted histogram technique [17]. The GCA algorithm [18] uses a multidimensional grid to find initial centers and reduce the number of iterations in the k-means algorithm. Discretization of the feature space is also used in [19] to accelerate the well known DBSCAN [20] algorithm, which is based on the concept of density reachability. Instead of using a single resolution grid, some techniques adaptively refine the grid resolution based on the regional density [21, 22].
Adaptive refinement of the grid resolution is also used in [23], with a clustering result that is sensitive to the order of the input data. A different approach to the multiscale cluster detection problem is used in the WaveCluster algorithm [24], where the wavelet transform is repeatedly applied to the discretized domain, revealing clusters at different scales.

3 Density based clustering

Let $D = \{x_1, \ldots, x_N\}$, $x_i \in \mathbb{R}^d$, be a set of $N$ data samples in a $d$-dimensional vector space, and let $f : \mathbb{R}^d \to \mathbb{R}$ be the inherent distribution of the data set $D$. The main concept of density based clustering is the density attractor notion [16].

3.1 Density attractors

Definition 1 (Density attractor) A point $x^* \in \mathbb{R}^d$ is a density attractor of the probability density function $f : \mathbb{R}^d \to \mathbb{R}$ if $x^*$ is a local maximum of $f$. More formally,

  $\exists \epsilon \in \mathbb{R} : d(x^*, x) < \epsilon \wedge x \neq x^* \Rightarrow f(x) < f(x^*)$,  (1)

where $d(x, x^*)$ is the distance in $\mathbb{R}^d$.

Each density attractor of a PDF delineates an associated basin of attraction, the set of points $x \in \mathbb{R}^d$ for which a hill-climbing procedure started at $x$ converges to $x^*$. The hill-climbing procedure can be guided by the gradient of the PDF [3, 13] or be step-wise [16].

Definition 2 (Strong density attractor) A strong density attractor $x^*$ is a density attractor satisfying

  $f(x^*) \geq \xi$,  (2)

where $\xi$ is a noise level.

Definition 2 singles out the significant density attractors with regard to noise and outliers. Due to noise and outliers, the PDF can exhibit local variations whose density is low compared to the modes corresponding to significant clusters. The implicitly defined number of clusters corresponds to the number of modes above the noise level.

3.2 Kernel density estimation

Let $K : \mathbb{R}^d \to \mathbb{R}$, $K(x) \geq 0$, be a radially symmetric kernel satisfying

  $\int_{\mathbb{R}^d} K(x)\,dx = 1$.  (3)

The inherent distribution $f(x)$ of the data set $D$ at $x \in \mathbb{R}^d$ can be estimated as the average of identically scaled kernels centered at each sample:

  $f^D(x) = \frac{1}{N h^d} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)$,  (4)

where the bandwidth $h$ is the scaling factor.
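For concreteness, estimator (4) can be sketched in a few lines of Python (a hypothetical illustration, not part of the paper's implementation; here $d = 1$, so the 1-D Epanechnikov kernel $K(u) = \frac{3}{4}(1 - u^2)$ on $|u| \leq 1$ is used):

```python
import numpy as np

def kde_fixed(x, samples, h):
    """Fixed-bandwidth estimate of eq. (4) for d = 1: the average of
    identically scaled kernels centered at each sample."""
    u = (x - samples[:, None]) / h                            # shape (N, len(x))
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov kernel
    return k.sum(axis=0) / (len(samples) * h)

# Two well-separated sample groups yield a bimodal estimate.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.3, 200), rng.normal(2, 0.3, 200)])
grid = np.linspace(-4, 4, 81)
f = kde_fixed(grid, data, h=0.5)
```

The resulting estimate integrates to approximately one and peaks near the two sample concentrations; a larger h yields a smoother estimate.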
The shape of the PDF estimate is strongly influenced by the bandwidth $h$, which defines the scale of observation. Larger values of $h$ result in a smoother density estimate, while for smaller values the contribution of each sample to the overall density has an emphasized local character, resulting in a density function revealing details on a finer scale. A fixed bandwidth $h$, constant across $x \in \mathbb{R}^d$, affects the estimation performance when the data exhibit local scale variations. Frequently, more smoothing is needed in the tails of the distribution where data are sparse, while less smoothing is needed in the dense regions. By selecting a different bandwidth $h = h(x_i)$ for each sample $x_i$, the sample point density estimator [14, 25] is defined:

  $f_s^D(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h(x_i)^d} K\!\left(\frac{x - x_i}{h(x_i)}\right)$.  (5)

Since the variable bandwidth is itself a function of the probability density, an auxiliary technique should be used to obtain $h(x_i)$. A simple and efficient way to obtain $h(x_i)$, used in [5], is the nearest neighbors technique. Significant improvements can be obtained [26, 27] by setting the bandwidth $h(x_i)$ to be reciprocal to the square root of the density $f^D(x_i)$:

  $h(x_i) = h_0 \sqrt{\frac{\lambda}{f^D(x_i)}}$,  (6)

where $h_0$ is a fixed bandwidth and $\lambda$ is a proportionality constant. Since $f^D(x)$ is the unknown density to be estimated, the practical approach [14] is to use an initial pilot estimate $\tilde{f}^D(x)$ and to take the proportionality constant $\lambda$ as the geometric mean of $\{\tilde{f}^D(x_i)\}_{i=1,\ldots,N}$.

Several techniques proposed in the literature can be used to select the fixed kernel bandwidth $h_0$. Statistically motivated techniques define the optimal bandwidth as the one that minimizes the Asymptotic Mean Integrated Squared Error (AMISE) [28, 29], i.e. the bandwidth that achieves the best compromise between the bias and the variance of the estimator [2]. Cross-validation techniques optimize an eligibility criterion [30].
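As an illustration of (5) and (6), the following hypothetical Python fragment (illustrative names, $d = 1$; not the paper's implementation) derives per-sample bandwidths from a pilot estimate, here simply a fixed-bandwidth estimate evaluated at the sample points, with $\lambda$ taken as the geometric mean of the pilot values:

```python
import numpy as np

def epan(u):
    """1-D Epanechnikov kernel, zero outside |u| <= 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde_sample_point(x, samples, h):
    """Sample point estimator of eq. (5) for d = 1: each kernel is
    scaled by its own bandwidth h(x_i)."""
    u = (x - samples[:, None]) / h[:, None]
    return (epan(u) / h[:, None]).sum(axis=0) / len(samples)

def variable_bandwidths(pilot, h0):
    """Eq. (6): h(x_i) = h0 * sqrt(lambda / f~(x_i)), with lambda the
    geometric mean of the pilot density values."""
    lam = np.exp(np.mean(np.log(pilot)))
    return h0 * np.sqrt(lam / pilot)

rng = np.random.default_rng(1)
# A dense, narrow cluster plus a sparse, wide one: local scale variation.
data = np.concatenate([rng.normal(0, 0.2, 300), rng.normal(4, 1.0, 100)])
pilot = kde_sample_point(data, data, np.full(len(data), 0.5))  # pilot estimate
h = variable_bandwidths(pilot, h0=0.5)
f = kde_sample_point(np.linspace(-1, 7, 161), data, h)
```

Samples in the dense cluster receive smaller bandwidths than those in the sparse one, matching the square-root law of [26].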
The originally proposed cross-validation criterion [31] selects the bandwidth that maximizes

  $LCV(h) = \prod_{i=1}^{n} \hat{f}_{h,-i}(x_i)$,  (7)

where $\hat{f}_{h,-i}(x_i)$ denotes the density estimate with $x_i$ removed. Other techniques are guided by the stability of the decomposition, and the bandwidth is taken as the center of the largest interval over which the number of density attractors remains constant [16]. However, most of these techniques rely on iterative optimization and are computationally expensive. Since in most cases the decomposition is task-dependent, top-down information provided by the user or by an upper-level processing module can be used to control the kernel bandwidth [2].

4 Grid based mode seeking

We propose a new two-step technique for discrete density estimation. The initial density is estimated by counting the input data samples which populate each cell of the discretized domain. Afterwards, the sample point density estimator (5) is applied to obtain the final density estimate, where the initial density estimate is used to compute the variable bandwidth (6). The proposed technique can be summarized in the following steps:

1. quantize the data domain
2. estimate the initial density by counting the data samples which populate each cell of the discretized domain
3. assign a variable bandwidth to each cell of the discretized domain
4. acquire the final density estimate using the sample point density estimator
5. detect modes by a step-wise hill-climbing procedure
6. assign the input data to the clusters defined by the detected modes

4.1 Discrete density estimation

The estimation of the discrete density starts with domain discretization. The bounding $d$-dimensional hyperrectangle of the data set $D = \{x_1, \ldots, x_N\}$, $x_i \in \mathbb{R}^d$, is partitioned into $d$-dimensional hypercubes with side length $\sigma$. The density at some point $x$ is strongly influenced by a limited set of samples near that point, so the density can be approximated with a local density function $\hat{f}^D(x)$.
We adopt the sample point density estimation technique (5), where each sample is assigned the variable bandwidth (6), resulting in the adaptive local density function

  $\hat{f}^D(x) = \frac{1}{N} \sum_{x_l \in N_l(x)} \frac{1}{h(x_l)^d} K\!\left(\frac{x - x_l}{h(x_l)}\right)$,  (8)

where $N_l(x)$ is the adaptive neighborhood. The variable bandwidth is defined by

  $h(x_l) = h_0 \sqrt{\frac{\lambda}{\tilde{f}^D(x_l)}}$,  (9)

where $\tilde{f}^D$ is the initial density estimate obtained by counting the data samples which populate each cell of the discretized domain. The proportionality constant $\lambda$ is set to the geometric mean of $\tilde{f}^D$ [14]. The discrete density $\hat{f}^D(x)$ is estimated using the Epanechnikov kernel (10):

  $K_E(x) = \begin{cases} \frac{1}{2}\, s_d^{-1} (d+2)\,(1 - \|x\|^2) & \text{for } \|x\| \leq 1 \\ 0 & \text{for } \|x\| > 1 \end{cases}$  (10)

where $s_d$ is the volume of the unit $d$-dimensional hypersphere. As the influence of the Epanechnikov kernel centered at $x_i$ vanishes for $d(x, x_i) > h(x_i)$, the adaptive neighborhood is defined by

  $N_l(x) = \{x_l : d(x_l, x) \leq h(x_l)\}$.  (11)

We define the cell width $\sigma$ in terms of the fixed bandwidth $h_0$ as

  $\sigma = \frac{h_0}{\rho}$,  (12)

where $\rho$ is the discretization parameter. Setting the bandwidth to be a multiple of the cell width intuitively prescribes a set of neighboring cells in which the contribution of each sample should be accounted for. The factor $\rho$ determines the discretization granularity of the data domain and defines the complexity of the density estimation procedure. For higher $\rho$, details on a finer scale are revealed, at an increase in computational complexity. This parameter is connected with the bandwidth $h_0$ through (12), which is effectively used to control the scale of observation. We therefore fix the discretization granularity to $\rho = 3$, as this value represents a compromise between the efficiency and the accuracy of the provided density estimate.

The introduced technique requires two passes through the data to compute the initial and the final density estimate. Its efficiency can be increased so that it requires one pass while preserving accuracy.
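As a concrete illustration of steps 1-4, the following Python sketch (hypothetical names, 2-D data; a simplification, not the paper's implementation) quantizes the bounding box into cells of side $\sigma = h_0/\rho$, counts the samples per cell to obtain the initial estimate $\tilde{f}^D$, assigns each populated cell the bandwidth of (9), and evaluates the density at the populated cell centers with the Epanechnikov kernel (10), approximating each sample by the center of its cell:

```python
import numpy as np

def grid_density(data, h0, rho=3):
    """Two-step discrete density estimate on a regular grid (2-D sketch).
    Returns cell centers, per-cell counts (the initial estimate) and the
    final density estimate at the populated cell centers."""
    d = data.shape[1]
    sigma = h0 / rho                                     # cell width, eq. (12)
    lo = data.min(axis=0)
    idx = np.floor((data - lo) / sigma).astype(int)      # cell index per sample
    cells, counts = np.unique(idx, axis=0, return_counts=True)
    centers = lo + (cells + 0.5) * sigma                 # cell-center coordinates
    lam = np.exp(np.mean(np.log(counts)))                # geometric mean of f~
    h = h0 * np.sqrt(lam / counts)                       # variable bandwidth, eq. (9)
    s_d = np.pi                                          # volume of the unit disk (d = 2)
    dens = np.zeros(len(cells))
    for v in range(len(cells)):                          # kernel of each populated cell
        u = np.linalg.norm(centers - centers[v], axis=1) / h[v]
        k = np.where(u <= 1.0, 0.5 / s_d * (d + 2) * (1 - u**2), 0.0)  # eq. (10)
        dens += counts[v] * k / h[v]**d                  # samples lumped at the center
    return centers, counts, dens / len(data)

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-2, 0.4, (500, 2)), rng.normal(2, 0.4, (500, 2))])
centers, counts, dens = grid_density(data, h0=0.6)
```

In this sketch the highest-density cell lands near one of the two true cluster centers, and cells with larger counts receive smaller bandwidths, as prescribed by (9).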
Let

  $Z(x, x_j) = \frac{1}{h(x_j)^d} K\!\left(\frac{x - x_j}{h(x_j)}\right)$  (13)

be the contribution of the sample $x_j$ at the point $x$. Introducing (13) and (11) into (8) yields

  $\hat{f}^D(x) = \frac{1}{N} \sum_{x_l : d(x_l, x) \leq h(x_l)} Z(x, x_l)$.  (14)

Let $c_u$ denote the spatial coordinates of the center of the $u$-th cell of the discretized domain. The contribution of all samples which populate the $v$-th cell to the density at the center of the $u$-th cell is approximately

  $\hat{Z}(c_u, c_v) = \tilde{f}^D(c_v)\, \frac{1}{h(c_v)^d} K\!\left(\frac{c_u - c_v}{h(c_v)}\right)$.  (15)

The contribution is accounted for as if all samples were located at the center of the cell: the adaptively scaled kernel centered at the $v$-th cell is multiplied by the number of samples populating that cell. Inserting (15) into (14), we obtain

  $\hat{f}^D(c_u) \approx \frac{1}{N} \sum_{c_l : d(c_u, c_l) \leq h(c_l)} \hat{Z}(c_u, c_l)$.  (16)

The proposed technique requires one pass through the input data and an additional pass through the populated cells, yielding a computational complexity of $O(N + rM)$, where $N$ is the cardinality of the data set, $M$ is the number of populated cells, and the constant $r$, proportional to $\rho^d$, is the average number of neighboring cells in which the kernel contribution is accounted for.

4.2 Clustering and noise level

Clustering is based on determining the modes of the estimated discrete density and the associated basins of attraction. The subset of input samples pertaining to the basin of attraction of the same strong density attractor is assigned a unique cluster label. Samples pertaining to basins of attraction of modes with density below the noise level $\xi$ are labeled as noise. The step-wise hill-climbing procedure, started from the populated cells (those satisfying $\tilde{f}^D > 0$), has two stopping criteria:

1. Local maximum detected. If the density $\hat{f}^D$ of the detected mode is below the noise level $\xi$, all cells on the path of the hill-climbing procedure are labeled as noise. Otherwise, a strong attractor is detected and a new label denoting a new cluster is assigned to all cells on the path of the procedure.

2.
Cell with an already assigned label encountered. Continuation of the procedure from this point leads to an already detected maximum. The encountered label is assigned to all cells on the path of the hill-climbing procedure, regardless of whether it represents noise or clustered data.

The clustering procedure stops when all populated cells are labeled.

The noise level $\xi$ separates statistically important clusters from noise and outliers. As prior knowledge would be required to anticipate an absolute noise level, we define $\xi$ relative to the estimated PDF by

  $\xi = \varepsilon \Lambda$,  (17)

where $\Lambda$ is the geometric mean of $\hat{f}^D$ and $\varepsilon$ is a task driven algorithm parameter. The value $\Lambda$ differs from the proportionality constant $\lambda$, which is defined as the geometric mean of the initial density estimate. By defining the noise level relative to the estimated density, we utilize the fact that the density of uniformly distributed noise $f^{D_{Noise}}$ is nearly constant and that the probability that the density attractors remain identical tends to 1 for $|D_{Noise}| \to \infty$, where $|D_{Noise}|$ is the cardinality of the noise data set [16]. Note that only the modes of the density must be above the noise level $\xi$; the associated basins of attraction are delineated by the hill-climbing procedure. Therefore, a reasonably wide range of values of $\varepsilon$, starting with $\varepsilon = 2$, should efficiently separate noise from structured data.

The computational complexity of the hill-climbing procedure is $O(M)$. The output of the mode seeking procedure is the data domain divided into the basins of attraction of the significant attractors and the region of noise. If the input data samples are to be assigned to clusters representing the revealed patterns in the data distribution, an additional pass through the input data is required to assign each sample a cluster label. Thus the overall complexity of the clustering algorithm is $O[2N + (r + 1)M]$.

4.3 Postprocessing modes

Local maxima at a minor scale can also appear in the regions of higher density.
These can mislead the mode seeking procedure, and multiple modes corresponding to the same cluster can be detected, particularly on flat plateaus of the PDF. In [2] these artifacts were removed by joining mode candidates at a distance less than the kernel bandwidth. However, on larger clusters, spatially distant modes representing the same pattern in the input data can exist that cannot be joined based solely on the distance criterion. The joining strategy presented in [16] fuses modes if a path between them exists with density above a predefined level. We adopt a similar strategy with an adaptive joining threshold: two modes are fused if a path between them exists with density above 75% of the density of the lower mode.

5 Experimental results

The results of the proposed mode seeking procedure applied to a 2D data set are shown in Fig. 1. The original data set is shown in Fig. 1(a) and the same data set in the presence of noise is shown in Fig. 1(b). The number of noise data samples added is approximately 60% of the number of samples in the original data set. The estimated discrete density in the presence of noise is shown in Fig. 1(c). The effect of noise is visible as a raised overall density and small variations in the areas where no clusters are present. However, these variations are minor compared with the density of the clustered data and are easily removed with the adaptive noise level (17). The significant density attractors approximately retain their original positions [16]. Clusters detected with the proposed grid-based algorithm are shown in Fig. 1(d). The detected clusters are shown in three colors, while noise data samples are shown dim.

The number of modes detected on the original data set (Fig. 1(a)) for different values of the bandwidth $h_0$ is given in Table 1. The bandwidth $h_0$ is given in the first column, followed by the number of strong attractors detected in the second column and the final number of modes after the joining procedure (see subsection 4.3) in the third column.
In the next three columns of Table 1, results achieved with the fixed bandwidth Mean Shift are shown in the same format. Results achieved with the variable bandwidth Mean Shift using the nearest neighbors technique [5] are presented in the last three columns, where $k$ is the number of nearest neighbors used to assess the variable bandwidth. Both versions of the Mean Shift algorithm use the spatial distance based joining strategy. The results demonstrate the ability of all three algorithms to reveal the correct number of modes. The proposed technique detects the correct number of modes for a wide range of the parameter $h_0$. For small values of $h_0$, details on a finer scale are revealed, resulting in a higher number of detected modes representing small variations in cluster densities. These modes are joined together in the postprocessing procedure.

The results achieved in the presence of noise (Fig. 1(b)) are presented in Table 2. The presence of noise slightly narrows the bandwidth range for which the correct number of modes is detected by the proposed technique. The fixed bandwidth Mean Shift detects the correct number of modes after the joining procedure only for a narrow bandwidth range. The variable bandwidth Mean Shift cannot reveal the correct number of significant modes, as the presence of noise strongly influences the nearest neighbors technique used to acquire the variable bandwidth.

The performance of the proposed approach applied to overlapping clusters is illustrated in Fig. 2. The input data are shown in Fig. 2(a), with labels from the original distribution shown in Fig. 2(b). The estimated discrete density is shown in Fig. 2(c) and the output of the clustering algorithm in Fig. 2(d). The compliance between the clusters from the original distribution and the clusters detected with the proposed algorithm is 96%. The nonparametric procedure correctly detects the peaks of the density, with the boundary between clusters formed from the neighboring points from which hill-climbing leads to different density attractors.
However, data samples generated from the tails of the distribution can be misclassified if two or more clusters are close. A different example of the performance of the proposed algorithm, applied to complexly structured data, is given in Fig. 3. The grid-based mode seeking procedure is applied to 3D data samples organized in the form of three interlocked rings. The detected clusters are shown in different colors, with each ring recognized as a separate cluster.

The proposed mode seeking procedure is also applied as a color reduction algorithm to image pixels represented by their 3D color coordinates. The discrete density is estimated in the discretized color space and the modes of the density are detected. Each pixel color is replaced by the color of the mode of the density function to which the hill-climbing procedure started at the color coordinates of the pixel converges. Color reduction is a low level preprocessing step which can be used in a wide range of computer vision problems. The example in Fig. 4 shows the original image 4(a) and the output image 4(b), with each pixel color replaced by the color of the corresponding mode. In Fig. 5, results are shown for three different values of the bandwidth $h_0$. For a smaller bandwidth, details on a finer scale are preserved (as can be seen on light reflections, clothes, hair, etc.), while for higher $h_0$ more reduction is carried out and the most important features of the image are isolated (clothes, faces). The bandwidth $h_0$ is therefore used to set the level of abstraction. However, the variable bandwidth kernel ensures that details at the different scales present in different areas of the image are still correctly isolated.

Several validation methods have been proposed in the literature for the objective evaluation of clustering methods [15, p. 257]. Typical evaluation objective functions formalize the goal of attaining high intra-cluster similarity and low inter-cluster similarity, which is in accordance with the intuitive concept of clustering.
However, good scores on an internal criterion do not necessarily translate into good effectiveness in a particular application [32, p. 356]. An alternative to this approach is evaluation in the application of interest. The proposed mode seeking procedure can be implemented in a wide range of clustering and data mining problems, like the given example of color reduction, where dedicated evaluation techniques should be used to assess the effectiveness of the method in the particular application [33, 34].

6 Conclusion

The basic feature of all grid-based clustering techniques is low computational complexity. However, discretization unavoidably introduces error and lowers the performance in disclosing clusters of the different scales which are often present in real data. Some techniques attempt to solve this problem by allowing arbitrary mesh refinement in densely populated areas [21-23]. The resulting grid structure has complex neighboring relations between adjacent cells, which may introduce computational overhead in the mode seeking and clustering procedure. Moreover, an additional pass through the data is often required to construct the adaptive grid, or the result is sensitive to the order of the input data.

The proposed technique is based on a single resolution grid, so a simple hill-climbing procedure can efficiently detect the modes and the associated basins of attraction. The two-step density estimation ensures that the estimator is self-tuned to the local scale variations in the data set without a considerable increase in computational complexity. Adaptability to the noise in the input scene is attained using a relative noise level. Experiments confirm the robustness of the algorithm in the presence of noise as well as its ability to analyze complex data sets.
The proposed technique does not scale well with the dimension of the feature space, due to the time and space complexity of multidimensional grid processing and the curse of dimensionality inherent to all density-based approaches. However, the technique should be applicable to a wide range of problems where efficient processing of a large number of low-dimensional data samples is required.

References

[1] B. W. Silverman, Density Estimation for Statistics and Data Analysis (Monographs on Statistics and Applied Probability). London: Chapman and Hall, 1986.
[2] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, 2002.
[3] K. Fukunaga and L. D. Hostetler, "The estimation of the gradient of a density function, with applications in pattern recognition," IEEE Transactions on Information Theory, vol. 21, no. 1, pp. 32-40, 1975.
[4] D. Comaniciu and P. Meer, "Robust analysis of feature spaces: color image segmentation," in Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97), (Washington, DC, USA), pp. 750-756, IEEE Computer Society, 1997.
[5] B. Georgescu, I. Shimshoni, and P. Meer, "Mean shift based clustering in high dimensions: A texture classification example," in Proceedings of the 9th IEEE International Conference on Computer Vision, (Washington, DC, USA), pp. 456-463, IEEE Computer Society, 2003.
[6] T. Gang, H. Rui-Min, W. Zhong-Yuan, and Z. Li, "Object tracking algorithm based on mean shift algorithm combining with motion vector analysis," in ETCS '09, First International Workshop on Education Technology and Computer Science, (Wuhan, Hubei, China), pp. 987-990, 2009.
[7] J. Wang, B. Thiesson, Y. Xu, and M. Cohen, "Image and video segmentation by anisotropic kernel mean shift," in Proceedings of the European Conference on Computer Vision ECCV 2004, pp. 238-249, Springer-Verlag, 2004.
[8] A. M. Fahim, G. Saake, A.-B. M. Salem, F. A. Torkey, and M. A. Ramadan, "An enhanced density based spatial clustering of applications with noise," in DMIN (R. Stahlbock, S. F. Crone, and S. Lessmann, eds.), pp. 517-523, CSREA Press, 2009.
[9] J. Sander, M. Ester, H. Kriegel, and X. Xu, "Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 169-194, 1998.
[10] O. Uncu, W. Gruver, D. Kotak, D. Sabaz, Z. Alibhai, and C. Ng, "GRIDBSCAN: Grid density-based spatial clustering of applications with noise," in IEEE International Conference on Systems, Man and Cybernetics, 2006 (SMC '06), 2006.
[11] D. Ma and A. Zhang, "An adaptive density-based clustering algorithm for spatial database with noise," in Fourth IEEE International Conference on Data Mining (ICDM '04), pp. 467-470, 2004.
[12] K. Fukunaga, "Statistical pattern recognition," in Handbook of Pattern Recognition and Computer Vision (C. H. Chen, L. F. Pau, and P. S. P. Wang, eds.), pp. 33-60, Singapore: World Scientific Publishing Co., 1993.
[13] Y. Cheng, "Mean shift, mode seeking, and clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790-799, 1995.
[14] D. Comaniciu, V. Ramesh, and P. Meer, "The variable bandwidth mean shift and data-driven scale selection," in Proceedings of the 8th IEEE International Conference on Computer Vision, vol. 1, (Vancouver, Canada), pp. 438-445, 2001.
[15] S. Mitra and T. Acharya, Data Mining: Multimedia, Soft Computing, and Bioinformatics. Hoboken, New Jersey: John Wiley & Sons, Inc., 2003.
[16] A. Hinneburg and D. A. Keim, "A general approach to clustering in large databases with noise," Knowledge and Information Systems, vol. 5, no. 4, pp. 387-415, 2003.
[17] D. W. Scott, "Averaged shifted histograms: Effective nonparametric density estimators in several dimensions," The Annals of Statistics, vol. 13, no. 3, pp. 1024-1040, 1985.
[18] Z. Yu and H. S. Wong, "GCA: A real-time grid-based clustering algorithm for large data sets," in Proceedings of the 2006 International Conference on Pattern Recognition, (Hong Kong, China), pp. 740-743, 2006.
[19] S. Mahran and K. Mahar, "Using grid for accelerating density-based clustering," in 8th IEEE International Conference on Computer and Information Technology (CIT 2008), (Sydney, NSW, Australia), 2008.
[20] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp. 226-231, AAAI Press, 1996.
[21] Z. A. Aghbari and R. Al-Haj, "Hill-manipulation: An effective algorithm for color image segmentation," Image and Vision Computing, vol. 24, no. 8, pp. 894-903, 2006.
[22] W. Liao, Y. Liu, and A. Choudhary, "A grid-based clustering algorithm using adaptive mesh refinement," in 7th Workshop on Mining Scientific and Engineering Datasets, (Lake Buena Vista, Florida, USA), 2004.
[23] N. H. Park and W. S. Lee, "Statistical grid-based clustering over data streams," ACM SIGMOD Record, vol. 33, no. 1, pp. 32-37, 2004.
[24] G. Sheikholeslami, S. Chatterjee, and A. Zhang, "WaveCluster: a wavelet-based clustering approach for spatial data in very large databases," The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 289-304, 2000.
[25] D. W. Scott and S. R. Sain, "Multi-dimensional density estimation," in Handbook of Statistics: Data Mining and Computational Statistics (C. R. Rao and E. J. Wegman, eds.), vol. 23, pp. 229-263, Amsterdam: Elsevier, 2004.
[26] I. S. Abramson, "On bandwidth variation in kernel estimates - a square root law," The Annals of Statistics, vol. 10, no. 4, pp. 1217-1223, 1982.
[27] P. Hall, T. C. Hu, and J. S. Marron, "Improved variable window kernel estimates of probability densities," The Annals of Statistics, vol. 23, no. 1, pp. 1-10, 1995.
[28] B. A. Turlach, "Bandwidth selection in kernel density estimation: A review," Discussion Paper 9317, Institut de Statistique, 1993.
[29] M. P. Wand and M. C. Jones, Kernel Smoothing (Monographs on Statistics and Applied Probability). London: Chapman and Hall, 1994.
[30] C. R. Loader, "Bandwidth selection: classical or plug-in?," The Annals of Statistics, vol. 27, no. 2, pp. 415-438, 1999.
[31] R. P. W. Duin, "On the choice of smoothing parameters for Parzen estimators of probability density functions," IEEE Transactions on Computers, vol. 25, no. 11, pp. 1175-1179, 1976.
[32] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge, England: Cambridge University Press, online edition, 2009.
[33] H. Zhang, J. E. Fritts, and S. A. Goldman, "Image segmentation evaluation: A survey of unsupervised methods," Computer Vision and Image Understanding, vol. 110, no. 2, pp. 260-280, 2008.
[34] R. Unnikrishnan, C. Pantofaru, and M. Hebert, "A measure for objective evaluation of image segmentation algorithms," in Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition, Workshop on Empirical Evaluation Methods in Computer Vision, (San Diego, CA, USA), pp. 394-400, 2005.
Table 1: 2D data - number of modes detected. The three column groups give the number of detected strong attractors ("modes") and the final number after the joining procedure for the proposed two-step grid-based (GB) mode seeking, the fixed bandwidth Mean Shift (MS), and the variable bandwidth Mean Shift with k nearest neighbors.

  Two-step GB mode seeking |  MS, fixed bandwidth  |  MS, variable bandwidth
   h0    modes  final      |   h    modes  final   |    k    modes  final
  0.02    133    56        |  0.02   408    12     |    50    303    16
  0.04     32     3        |  0.04   115     3     |   100    247     7
  0.06      9     3        |  0.06    36     3     |   150     68     5
  0.08     10     3        |  0.08    25     3     |   200    344     5
  0.10      6     3        |  0.10    16     3     |   250    360     3
  0.12      3     3        |  0.12    12     3     |   300     64     3
  0.14      3     3        |  0.14    10     3     |   350     34     3
  0.16      3     3        |  0.16     8     3     |   400     38     3
  0.18      3     3        |  0.18     7     3     |   450     28     3
  0.20      3     3        |  0.20     9     3     |   500     25     3
  0.22      3     3        |  0.22     3     3     |   550     19     3
  0.24      3     3        |  0.24     7     3     |   600     20     3
  0.26      3     3        |  0.26     5     3     |   650     19     3
  0.28      3     3        |  0.28     6     2     |   700     16     3
  0.30      3     3        |  0.30     2     1     |   750     18     3
  0.32      3     3        |  0.32     2     1     |   800     16     3
  0.34      3     3        |  0.34     3     1     |   850     16     3
  0.36      3     3        |  0.36     3     1     |   900     20     3
  0.38      3     3        |  0.38     3     1     |   950     20     3
  0.40      3     3        |  0.40     3     1     |  1000     20     3
  0.42      3     3        |  0.42     3     1     |  1050     20     3
  0.44      3     3        |  0.44     4     1     |  1100     20     3
  0.46      3     3        |  0.46     2     1     |  1150     20     3
  0.48      3     2        |  0.48     2     1     |  1200     20     3
  0.50      3     2        |  0.50     2     1     |  1250     20     3

Table 2: 2D data with noise - number of modes detected, in the same format as Table 1.

  Two-step GB mode seeking |  MS, fixed bandwidth  |  MS, variable bandwidth
   h0    modes  final      |   h    modes  final   |    k    modes  final
  0.02    323   194        |  0.02   901    17     |    50    426    29
  0.04     86    27        |  0.04   655     3     |   100    312    11
  0.06     28     5        |  0.06   111     6     |   150    422     9
  0.08     23     3        |  0.08    87     6     |   200    595    10
  0.10     11     3        |  0.10   147     6     |   250    698     8
  0.12      4     3        |  0.12   106     5     |   300    566     7
  0.14      4     3        |  0.14    31     4     |   350    293     6
  0.16      5     3        |  0.16    25     4     |   400    165     6
  0.18      5     3        |  0.18   105     5     |   450    147     6
  0.20      3     3        |  0.20    10     3     |   500    148     6
  0.22      3     3        |  0.22     9     3     |   550    143     6
  0.24      3     3        |  0.24     8     3     |   600    141     6
  0.26      3     3        |  0.26     9     3     |   650    138     6
  0.28      3     3        |  0.28    76     2     |   700    137     6
  0.30      3     3        |  0.30     4     1     |   750    135     6
  0.32      3     3        |  0.32     3     1     |   800    135     6
  0.34      3     3        |  0.34     2     1     |   850    134     6
  0.36      3     3        |  0.36     4     1     |   900    136     6
  0.38      3     3        |  0.38     3     1     |   950    136     6
  0.40      3     3        |  0.40     4     1     |  1000    136     6
  0.42      3     3        |  0.42     3     1     |  1050    136     6
  0.44      3     3        |  0.44     4     1     |  1100    136     6
  0.46      3     2        |  0.46     4     1     |  1150    136     6
  0.48      3     2        |  0.48     4     1     |  1200    136     6
  0.50      3     2        |  0.50     4     1     |  1250    136     6

Figure 1: Density estimation and clustering in the presence of noise. (a) Original data set; (b) data set with noise; (c) estimated discrete density; (d) clusters detected, noise samples shown dim.

Figure 2: Density estimation and clustering on a data set with overlapping clusters. (a) Overlapping clusters; (b) cluster labels from the original distribution; (c) estimated discrete density; (d) detected cluster labels.

Figure 3: Clustered 3D data.

Figure 4: Mode seeking procedure implemented as a color reduction algorithm. (a) Original image; (b) colors reduced.

Figure 5: Color reduction for different values of the fixed bandwidth h0. (a) Original image; (b) colors reduced, h0 = 2; (c) colors reduced, h0 = 8; (d) colors reduced, h0 = 16.