Grid-Based Mode Seeking Procedure
Damir Krstinić⋆, Ivan Slapničar
University of Split,
Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture,
R. Boškovića b.b, 21000 Split, Croatia
⋆ [email protected], tel: +38521305895, fax: +38521305856
November 7, 2011
Abstract
We propose a novel mode seeking and clustering procedure based on the estimation of the discrete
probability density function of the data set. The discrete density is estimated in two steps. An initial density
estimate is obtained by counting the data samples which populate each cell of the discretized domain. Based on
the initial density estimate, each cell of the discretized domain is assigned a variable bandwidth kernel, which
is afterwards used to compute the final discrete density estimate. Modes of the estimated density, corresponding
to the patterns in the input data, are obtained by a hill-climbing procedure. The proposed technique is
highly efficient, running in time linear in the number of input data samples, with low constant factors. It has
no embedded assumptions about the structure of the data or the feature space, such as the number of clusters
or their shape, so arbitrarily structured data sets can be analyzed.
Keywords: clustering, mode seeking, density estimation, density based clustering, variable kernel
1 Introduction
A broad category of feature space analysis techniques relies on density estimation, the construction of the
unknown density function from the observed data. The estimated density reveals statistical trends and hidden
patterns in the data distribution, where dense regions correspond to clusters of the data separated by regions of low
density and noise [1]. In contrast to the classification model, where data samples are classified into one of the predefined
categories, the number of clusters and their characteristics are generally not known in advance. Therefore, methods
which rely upon a priori knowledge of the structure of the feature space and the data set being analyzed, such as the
number of clusters or their shape, are often incapable of providing a reliable representation of the underlying true
distribution of real data sets. To provide an accurate description of an arbitrarily structured feature space,
methods should be used that do not have embedded assumptions [2].
Nonparametric methods for density estimation are based on the concept that the probability density function
(PDF) at a continuity point can be estimated using the sample observations that fall within a small region around
that point [3]. Clustering is based on determining local maxima (modes) of the estimated density. The mode seeking
procedure is started from different locations of the feature space to find points of convergence corresponding
to the modes of the density function. The set of all locations that converge to the same mode delineates the
basin of attraction of that mode and represents a pattern in the input data. Data samples which are in the
same basin of attraction are associated with the same cluster. The mode seeking procedure is guided only by the
estimated density and has no embedded assumptions about the feature space or the data set. These techniques can
find arbitrarily shaped clusters with robustness to outliers and noise. Density based techniques are application
independent and can be used to develop algorithms for a wide variety of computer vision tasks [4–7], earth sciences
and spatial databases [8–11], and other data mining problems where processing of large data sets is required.
The main motivation of this work is to provide a computationally efficient feature space analysis technique.
Our approach is based on discretization of the data domain and estimation of the probability density function
in two steps. An initial estimate is obtained by counting the data samples populating each cell of the discretized domain.
Based on the initial estimate, a variable bandwidth is assigned to each cell of the discretized domain, which is
afterwards used to compute the final density estimate. Patterns in the input data are obtained by a hill-climbing
procedure started from the populated cells of the discretized domain.
2 Related work
The Mean Shift algorithm, widely adopted in the computer vision community [4, 5], is based on kernel density estimation
and a gradient-guided mode seeking procedure [12, 13]. The technique is application independent and is not
parameterized with a priori knowledge about the number of clusters or their shape. The scale of observation is
defined by the kernel bandwidth. For data sets exhibiting local scale variations, better results can be
obtained using the variable bandwidth Mean Shift [14]. However, despite the desirable characteristics of the
Mean Shift algorithm, its convergence rate can be too slow for real-time problems where fast processing of
a large number of data samples is required.
Grid-based clustering techniques [15, p. 243] discretize the data domain into a finite number of cells, forming
a multidimensional grid structure on which the operations are performed. These techniques are typically fast,
since they depend on the number of cells in each dimension of the quantized feature space rather than on the
number of actual data samples. Moreover, they are easy to implement and generally produce satisfactory
results. In the DENCLUE algorithm [16], an approximation of the PDF is obtained using the averaged shifted
histogram technique [17]. The GCA algorithm [18] uses a multidimensional grid to find initial centers and reduce
the number of iterations in the k-means algorithm. Discretization of the feature space is also used in [19] to
accelerate the well-known DBSCAN algorithm [20], based on the concept of density reachability. Instead of using
a single resolution grid, some techniques adaptively refine the grid resolution based on the regional density [21, 22].
Adaptive refinement of the grid resolution is also used in [23], with the clustering result sensitive to the order of
the input data. A different approach to the multiscale cluster detection problem is used in the WaveCluster
algorithm [24], where the wavelet transform is repeatedly applied to the discretized domain, revealing clusters at different
scales.
3 Density-based clustering
Let D = {x_1, ..., x_N}, x_i ∈ R^d, be a set of N data samples in a d-dimensional vector space, and let f: R^d → R be
the inherent distribution of the data set D. The main concept of density-based clustering is the
density attractor notion [16].
3.1 Density attractors
Definition 1 (Density attractor) A point x^* ∈ R^d is a density attractor of the probability density function
f: R^d → R if x^* is a local maximum of f. More formally,

\exists \epsilon \in R : d(x^*, x) < \epsilon \wedge x \neq x^* \Rightarrow f(x) < f(x^*),    (1)

where d(x, x^*) is the distance in R^d.
Each density attractor of the PDF delineates an associated basin of attraction, the set of points x ∈ R^d for which
a hill-climbing procedure started at x converges to x^*. The hill-climbing procedure can be guided by the gradient of
the PDF [3, 13] or be step-wise [16].
Definition 2 (Strong density attractor) A strong density attractor x^* is a density attractor satisfying

f(x^*) \geq \xi,    (2)

where ξ is the noise level.
Definition 2 singles out significant density attractors with regard to noise and outliers. Due to noise and outliers, the PDF can exhibit local variations whose density is low compared to the modes corresponding to significant
clusters. The implicitly defined number of clusters corresponds to the number of modes above the noise level.
3.2 Kernel density estimation
Let K: R^d → R, K(x) ≥ 0, be a radially symmetric kernel satisfying

\int_{R^d} K(x)\, dx = 1.    (3)
The inherent distribution f(x) of the data set D at x ∈ R^d can be estimated as the average of identically
scaled kernels centered at each sample:

f^D(x) = \frac{1}{N h^d} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right),    (4)

where the bandwidth h is the scaling factor.
The shape of the PDF estimate is strongly influenced by the bandwidth h, which defines the scale of
observation. Larger values of h result in a smoother density estimate, while for smaller values the contribution of
each sample to the overall density has an emphasized local character, resulting in a density function that reveals details
on a finer scale. A fixed bandwidth h, constant across x ∈ R^d, degrades the estimation performance when the
data exhibit local scale variations: more smoothing is frequently needed in the tails of the distribution, where
data are sparse, while less smoothing is needed in dense regions.
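As a plain illustration of estimator (4) (not part of the original paper), a minimal fixed-bandwidth KDE can be sketched in Python, with a Gaussian kernel standing in for the generic kernel K:

```python
import numpy as np

def gaussian_kernel(u):
    """Radially symmetric Gaussian kernel; u holds the scaled differences (x - x_i)/h."""
    d = u.shape[1]
    return np.exp(-0.5 * (u ** 2).sum(axis=1)) / (2 * np.pi) ** (d / 2)

def kde_fixed(x, samples, h, kernel=gaussian_kernel):
    """Fixed-bandwidth estimate (4): f(x) = (1 / (N h^d)) * sum_i K((x - x_i) / h)."""
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    u = (np.asarray(x, dtype=float) - samples) / h  # (N, d) scaled differences
    return kernel(u).sum() / (n * h ** d)
```

The estimate is high near concentrations of samples and decays in sparse regions; the single parameter h controls how strongly it is smoothed.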
By selecting a different bandwidth h = h(x_i) for each sample x_i, the sample point density estimator [14, 25]
is defined:

f_s^D(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h(x_i)^d} K\!\left(\frac{x - x_i}{h(x_i)}\right).    (5)
Since the variable bandwidth is itself a function of the probability density, an auxiliary technique should be used to
obtain h(x_i). A simple and efficient way to obtain h(x_i), used in [5], is the nearest neighbors technique. Significant
improvements can be obtained [26, 27] by setting the bandwidth h(x_i) to be reciprocal to the square root of the
density f^D(x_i):

h(x_i) = h_0 \sqrt{\frac{\lambda}{f^D(x_i)}},    (6)

where h_0 is the fixed bandwidth and λ is a proportionality constant. Since f^D(x) is the unknown density to be
estimated, the practical approach [14] is to use an initial pilot estimate \tilde{f}^D(x) and to take the proportionality
constant λ as the geometric mean of \{\tilde{f}^D(x_i)\}_{i=1,...,N}.
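A hedged sketch of the sample point estimator (5) with the square-root rule (6): a fixed-bandwidth estimate plays the role of the pilot \tilde{f}^D, and λ is taken as the geometric mean of the pilot values at the samples, as in [14]. The Gaussian kernel and the function names are illustrative choices, not prescribed by the paper:

```python
import numpy as np

def gaussian_kernel(u):
    d = u.shape[1]
    return np.exp(-0.5 * (u ** 2).sum(axis=1)) / (2 * np.pi) ** (d / 2)

def pilot_density(samples, h0):
    """Fixed-bandwidth pilot estimate (4) evaluated at every sample."""
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    diff = (x[:, None, :] - x[None, :, :]) / h0            # (N, N, d)
    k = np.exp(-0.5 * (diff ** 2).sum(-1)) / (2 * np.pi) ** (d / 2)
    return k.sum(axis=1) / (n * h0 ** d)

def sample_point_kde(x, samples, h0):
    """Variable-bandwidth estimate (5) with h(x_i) = h0 * sqrt(lambda / pilot(x_i)), eq. (6)."""
    s = np.asarray(samples, dtype=float)
    n, d = s.shape
    p = pilot_density(s, h0)
    lam = np.exp(np.log(p).mean())         # geometric mean of the pilot values
    h = h0 * np.sqrt(lam / p)              # per-sample bandwidths
    u = (np.asarray(x, dtype=float) - s) / h[:, None]
    return (gaussian_kernel(u) / h ** d).mean()
```

Samples in sparse regions receive a pilot value below λ and thus a bandwidth larger than h_0, which smooths the tails more than the dense cores.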
Several techniques proposed in the literature can be used to select the fixed kernel bandwidth h_0. Statistically
motivated techniques define the optimal bandwidth as the one that minimizes the Asymptotic Mean Integrated
Squared Error (AMISE) [28, 29], i.e. the bandwidth that achieves the best compromise between the bias and
the variance of the estimator [2]. Cross-validation techniques optimize eligibility criteria [30]. The originally
proposed cross-validation criterion [31] selects the bandwidth that maximizes

LCV(h) = \prod_{i=1}^{N} \hat{f}_{h,-i}(x_i),    (7)

where \hat{f}_{h,-i}(x_i) denotes the density estimate with x_i removed. Other techniques are guided by the stability of
the decomposition, and the bandwidth is taken as the center of the largest interval over which the number of
density attractors remains constant [16]. However, most of these techniques rely on iterative optimization and
are computationally expensive. Since in most cases the decomposition is task-dependent, top-down information
provided by the user or by an upper-level processing module can be used to control the kernel bandwidth [2].
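Criterion (7) is usually maximized in log form for numerical stability. A hedged sketch over a candidate grid, with a Gaussian kernel (the paper does not fix the kernel for this step):

```python
import numpy as np

def loo_log_likelihood(samples, h):
    """Log of criterion (7): sum_i log f_{h,-i}(x_i), leaving x_i out of its own estimate."""
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1) / h ** 2
    k = np.exp(-0.5 * d2) / (2 * np.pi) ** (d / 2)
    np.fill_diagonal(k, 0.0)               # remove x_i from its own estimate
    f = k.sum(axis=1) / ((n - 1) * h ** d)
    return np.log(f).sum()

def select_bandwidth(samples, candidates):
    """Pick the candidate bandwidth maximizing the leave-one-out likelihood."""
    return max(candidates, key=lambda h: loo_log_likelihood(samples, h))
```

Like the other selectors discussed above, this grid search evaluates the full estimator repeatedly, which illustrates why such bandwidth selection is computationally expensive.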
4 Grid-based mode seeking
We propose a new two-step technique for discrete density estimation. The initial density is estimated by counting
the input data samples which populate each cell of the discretized domain. Afterwards, the sample point density
estimator (5) is applied to obtain the final density estimate, with the initial density estimate used to compute the variable
bandwidth (6). The proposed technique can be summarized in the following steps:
1. quantize the data domain
2. estimate initial density by counting data samples which populate each cell of the discretized domain
3. assign variable bandwidth to each cell of the discretized domain
4. acquire final density estimate using the sample point density estimator
5. detect modes by step-wise hill-climbing procedure
6. assign input data to clusters defined by the detected modes
4.1 Discrete density estimation
The estimation of the discrete density starts with domain discretization. The bounding d-dimensional
hyperrectangle of the data set D = {x_1, ..., x_N}, x_i ∈ R^d, is partitioned into d-dimensional
hypercubes with side length σ. The density at a point x is strongly influenced by a limited set of samples
near that point, so the density can be approximated by a local density function \hat{f}^D(x). We adopt the sample
point density estimation technique (5), where each sample is assigned the variable bandwidth (6), resulting in
the adaptive local density function:
\hat{f}^D(x) = \frac{1}{N} \sum_{x_l \in N_l(x)} \frac{1}{h(x_l)^d} K\!\left(\frac{x - x_l}{h(x_l)}\right),    (8)
where N_l(x) is the adaptive neighborhood. The variable bandwidth is defined by

h(x_l) = h_0 \sqrt{\frac{\lambda}{\tilde{f}^D(x_l)}},    (9)

where \tilde{f}^D is the initial density estimate obtained by counting the data samples which populate each cell of the
discretized domain. The proportionality constant λ is set to the geometric mean of \tilde{f}^D [14]. The discrete density
\hat{f}^D(x) is estimated using the Epanechnikov kernel (10):

K_E(x) = \begin{cases} \frac{1}{2} s_d^{-1} (d+2)\left(1 - \|x\|^2\right) & \text{for } \|x\| \le 1 \\ 0 & \text{for } \|x\| > 1 \end{cases}    (10)

where s_d is the volume of the d-dimensional unit hypersphere. As the influence of the Epanechnikov kernel
centered at x_i vanishes for d(x, x_i) > h(x_i), the adaptive neighborhood is defined by

N_l(x) = \{ x_l : d(x_l, x) \le h(x_l) \}.    (11)
We define the cell width σ in terms of the fixed bandwidth h_0 as

\sigma = \frac{h_0}{\rho},    (12)

where ρ is the discretization parameter. Setting the bandwidth to be a multiple of the cell width intuitively
prescribes the set of neighboring cells in which the contribution of each sample should be accounted for. The factor ρ
determines the discretization granularity of the data domain and defines the complexity of the density estimation
procedure. For higher ρ, details on a finer scale are revealed at the cost of increased computational complexity. This
parameter is connected with the bandwidth h_0 through (12), and h_0 is effectively used to control the scale of observation.
Therefore, we fix the discretization granularity to ρ = 3, as this value represents a compromise between the
efficiency and the accuracy of the provided density estimate.
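Steps 1 and 2 of the procedure (domain quantization with σ = h_0/ρ and the counting-based initial estimate) can be sketched as follows; the dictionary-of-populated-cells representation is our implementation choice, not prescribed by the paper:

```python
import numpy as np

def quantize_and_count(data, h0, rho=3):
    """Partition the bounding hyperrectangle into cells of side sigma = h0 / rho
    (eq. (12)) and count the samples populating each cell (initial estimate)."""
    data = np.asarray(data, dtype=float)
    sigma = h0 / rho
    lo = data.min(axis=0)                               # corner of the bounding box
    idx = np.floor((data - lo) / sigma).astype(int)     # cell index of every sample
    counts = {}
    for cell in map(tuple, idx):
        counts[cell] = counts.get(cell, 0) + 1
    return counts, sigma, lo
```

Only populated cells are stored, so memory grows with the number of occupied cells M rather than with the size of the full grid.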
The introduced technique requires two passes through the data to compute the initial and the final density
estimates. The efficiency of this procedure can be increased such that it requires one pass, while preserving the
accuracy. Let

Z(x, x_j) = \frac{1}{h(x_j)^d} K\!\left(\frac{x - x_j}{h(x_j)}\right)    (13)

be the contribution of the sample x_j at the point x. Introducing (13) and (11) into (8) yields

\hat{f}^D(x) = \frac{1}{N} \sum_{x_l : d(x_l, x) \le h(x_l)} Z(x, x_l).    (14)
Let c_u denote the spatial coordinates of the center of the u-th cell of the discretized domain. The contribution
of all samples which populate the v-th cell to the density at the center of the u-th cell is approximately

\hat{Z}(c_u, c_v) = \tilde{f}^D(c_v) \frac{1}{h(c_v)^d} K\!\left(\frac{c_u - c_v}{h(c_v)}\right).    (15)

The contribution is accounted for as if all samples were located in the center of the cell: the adaptively
scaled kernel centered at the v-th cell is weighted by the number of samples populating that cell. Inserting
(15) into (14) we obtain

\hat{f}^D(c_u) \approx \frac{1}{N} \sum_{c_l : d(c_u, c_l) \le h(c_l)} \hat{Z}(c_u, c_l).    (16)
The proposed technique requires one pass through the input data and an additional pass through the populated
cells, yielding computational complexity O(N + rM), where N is the cardinality of the data set, M is the number
of populated cells, and the constant r, proportional to ρ^d, is the average number of neighboring cells in which the kernel
contribution is accounted for.
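Steps 3 and 4 can be sketched as the cell-center accumulation (15)-(16): each populated cell v scatters its count of adaptively scaled Epanechnikov kernel (10) contributions to the cells within h(c_v). This is an illustrative reimplementation, assuming the counts dictionary produced by the quantization step:

```python
import math
from itertools import product
import numpy as np

def epanechnikov(u, d):
    """Epanechnikov kernel (10); s_d is the volume of the unit d-ball."""
    s_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    r2 = float(np.dot(u, u))
    return 0.5 * (d + 2) * (1.0 - r2) / s_d if r2 <= 1.0 else 0.0

def cell_density(counts, sigma, h0, n_samples, d):
    """One-pass discrete estimate (16) over populated cells, with the variable
    bandwidth (9) derived from the raw counts (the pilot estimate)."""
    lam = math.exp(sum(math.log(c) for c in counts.values()) / len(counts))
    density = {}
    for v, cnt in counts.items():
        h = h0 * math.sqrt(lam / cnt)              # variable bandwidth, eq. (9)
        reach = int(h / sigma) + 1                 # neighbouring cells the kernel can touch
        for off in product(range(-reach, reach + 1), repeat=d):
            u = np.array(off, dtype=float) * sigma / h   # (c_u - c_v) / h(c_v)
            k = epanechnikov(u, d)
            if k > 0.0:
                cell = tuple(v[i] + off[i] for i in range(d))
                density[cell] = density.get(cell, 0.0) + cnt * k / (h ** d * n_samples)
    return density
```

Since the kernel integrates to 1, the discrete mass Σ_u f̂(c_u) σ^d stays close to 1; a coarser grid trades a little of this accuracy for speed, which is why ρ = 3 is a workable compromise.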
4.2 Clustering and noise level
Clustering is based on determining the modes of the estimated discrete density and the associated basins of attraction.
The subset of input samples pertaining to the basin of attraction of the same strong density attractor is assigned a
unique cluster label. Samples pertaining to basins of attraction of modes with density below the noise level ξ
are labeled as noise. The step-wise hill-climbing procedure, started from the populated cells (satisfying \tilde{f}^D > 0), has two
stopping criteria:

1. Local maximum detected
If the density \hat{f}^D of the detected mode is below the noise level ξ, all cells on the path of the hill-climbing
procedure are labeled as noise. Otherwise, a strong attractor is detected and a new label denoting a new
cluster is assigned to all cells on the path of the procedure.

2. Cell with an already assigned label encountered
Continuation of the procedure from this point leads to an already detected maximum. The encountered
label is assigned to all cells on the path of the hill-climbing procedure, regardless of whether it represents noise or
clustered data.

The clustering procedure stops when all populated cells are labeled.
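The step-wise hill climbing with the two stopping criteria can be sketched as follows; cells are integer index tuples, `density` maps cells to \hat{f}^D, and label 0 is reserved for noise (these representational choices are ours, not the paper's):

```python
from itertools import product

NOISE = 0

def hill_climb_labels(density, populated, xi, d):
    """Climb from every populated cell to the steepest-ascent neighbour until a
    mode or an already labelled cell is reached; weak modes (below xi) are noise."""
    offsets = [o for o in product((-1, 0, 1), repeat=d) if any(o)]
    labels, next_label = {}, 1
    for start in populated:
        if start in labels:
            continue
        path, cell = [], start
        while True:
            if cell in labels:                        # criterion 2: merge into existing basin
                lab = labels[cell]
                break
            path.append(cell)
            nbrs = [tuple(cell[i] + o[i] for i in range(d)) for o in offsets]
            best = max(nbrs, key=lambda c: density.get(c, 0.0))
            if density.get(best, 0.0) <= density.get(cell, 0.0):
                # criterion 1: local maximum found
                if density.get(cell, 0.0) >= xi:
                    lab, next_label = next_label, next_label + 1
                else:
                    lab = NOISE
                break
            cell = best
        for c in path:
            labels[c] = lab
    return labels
```

Every cell is climbed from at most once, so the labelling cost stays proportional to the number of populated cells.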
The noise level ξ separates statistically important clusters from noise and outliers. As prior knowledge would be
required to anticipate an absolute noise level, we define ξ relative to the estimated PDF by

\xi = \varepsilon \Lambda,    (17)

where Λ is the geometric mean of \hat{f}^D and ε is a task-driven algorithm parameter. The value Λ differs from
the proportionality constant λ, which is defined as the geometric mean of the initial density estimate. By defining the noise
level relative to the estimated density, we exploit the fact that the density of uniformly distributed noise
f^{D_{Noise}} is nearly constant, and the probability that the density attractors remain identical tends to 1 for
|D_{Noise}| → ∞, where |D_{Noise}| is the cardinality of the noise data set [16]. Note that only the modes of the density
must be above the noise level ξ; the associated basins of attraction are delineated by the hill-climbing procedure.
Therefore, a reasonably wide range of values ε, starting with ε = 2, efficiently separates noise from structured
data. The computational complexity of the hill-climbing procedure is O(M).
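The adaptive noise level (17), with Λ the geometric mean of the estimated density over the populated cells, is a one-liner in practice; a minimal sketch:

```python
import math

def noise_level(density, populated, eps=2.0):
    """xi = eps * Lambda, eq. (17); Lambda is the geometric mean of the estimated
    density over the populated cells (zero-density cells are skipped)."""
    vals = [density[c] for c in populated if density.get(c, 0.0) > 0.0]
    lam = math.exp(sum(math.log(v) for v in vals) / len(vals))
    return eps * lam
```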
The output of the mode seeking procedure is the data domain divided into the basins of attraction of significant
attractors and a region of noise. If the input data samples are to be assigned to clusters representing the revealed
patterns in the data distribution, an additional pass through the input data is required to assign each sample a
cluster label. Thus the overall complexity of the clustering algorithm is O(2N + (r + 1)M).
4.3 Postprocessing modes
Local maxima on a minor scale can also appear in regions of higher density. These can mislead the mode
seeking procedure, and multiple modes corresponding to the same cluster can be detected, particularly on flat
plateaus of the PDF. In [2] these artifacts were removed by joining mode candidates at a distance less than
the kernel bandwidth. However, on larger clusters spatially distant modes representing the same pattern in the
input data can exist that cannot be joined based solely on the distance criterion. The joining strategy presented
in [16] fuses modes if a path between them exists with density above a predefined level. We adopt a similar strategy
with an adaptive joining threshold: two modes are fused if a path between them exists with density above 75% of
the density of the lower mode.
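One simple instantiation of the adaptive joining rule (our sketch, not the paper's exact procedure): treat two labelled basins as connected by a sufficiently dense path when a pair of adjacent boundary cells both stay above 75% of the lower mode's density, and merge labels with union-find:

```python
from itertools import product

def join_modes(labels, density, mode_density, d, frac=0.75):
    """Fuse modes joined by a dense path: adjacent cells of different clusters,
    both above frac * (density of the lower mode), trigger a merge."""
    parent = {l: l for l in mode_density}

    def find(l):
        while parent[l] != l:
            parent[l] = parent[parent[l]]   # path compression
            l = parent[l]
        return l

    offsets = [o for o in product((-1, 0, 1), repeat=d) if any(o)]
    for cell, la in labels.items():
        if la == 0:
            continue                         # noise cells never join clusters
        for o in offsets:
            nb = tuple(cell[i] + o[i] for i in range(d))
            lb = labels.get(nb, 0)
            if lb == 0 or lb == la:
                continue
            thresh = frac * min(mode_density[la], mode_density[lb])
            if density[cell] >= thresh and density[nb] >= thresh:
                parent[find(la)] = find(lb)
    return {cell: (find(l) if l else 0) for cell, l in labels.items()}
```

A full implementation would trace whole paths between modes; checking adjacent boundary cells is a cheap approximation that already removes plateau artifacts.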
5 Experimental results
The results of the proposed mode seeking procedure applied to a 2D data set are shown in Fig. 1. The original
data set is shown in Fig. 1(a) and the same data set in the presence of noise in Fig. 1(b). The number
of noise samples added is approximately 60% of the number of samples in the original data set. The estimated
discrete density in the presence of noise is shown in Fig. 1(c). The effect of noise is visible as a raised overall
density and small variations in the areas where no clusters are present. However, these variations are minor
compared with the density of the clustered data and are easily removed with the adaptive noise level
(17). Significant density attractors approximately retain their original positions [16]. Clusters detected with
the proposed grid-based algorithm are shown in Fig. 1(d). The detected clusters are shown in three colors, while
noise samples are shown dimmed.
The number of modes detected on the original data set (Fig. 1(a)) for different values of the bandwidth h_0 is
given in Table 1. The bandwidth h_0 is given in the first column, followed by the number of strong attractors detected
in the second column and the final number of modes after the joining procedure (see subsection 4.3) in the third
column. The next three columns of Table 1 show the results achieved with the fixed bandwidth Mean Shift,
in the same format. The results achieved with the variable bandwidth Mean Shift using the nearest neighbors
technique [5] are presented in the last three columns, where k is the number of nearest neighbors used to
assess the variable bandwidth. Both versions of the Mean Shift algorithm use the spatial-distance-based joining
strategy. The results demonstrate the ability of all three algorithms to reveal the correct number of modes. The
proposed technique detects the correct number of modes for a wide range of the parameter h_0. For small values of
h_0, details on a finer scale are revealed, resulting in a higher number of detected modes representing small variations
in cluster densities; these modes are joined together in the postprocessing procedure.
The results achieved in the presence of noise (Fig. 1(b)) are presented in Table 2. The presence of noise slightly
narrows the bandwidth range for which the correct number of modes is detected by the proposed technique. Fixed
bandwidth Mean Shift detects the correct number of modes after the joining procedure only for a narrow bandwidth range.
Variable bandwidth Mean Shift cannot reveal the correct number of significant modes, as the presence of noise
strongly influences the nearest neighbors technique used to acquire the variable bandwidth.
The performance of the proposed approach applied to overlapping clusters is illustrated in Fig. 2. The input
data are shown in Fig. 2(a), with labels from the original distribution shown in Fig. 2(b). The estimated discrete
density is shown in Fig. 2(c) and the output of the clustering algorithm in Fig. 2(d). The agreement between
clusters from the original distribution and the clusters detected with the proposed algorithm is
96%. The non-parametric procedure correctly detects the peaks of the density, with the boundary between clusters formed
from neighboring points from which hill-climbing leads to different density attractors. However, data samples
generated from the tails of the distribution can be misclassified if two or more clusters are close. A different
example of the performance of the proposed algorithm, applied to complexly structured data, is given in Fig.
3. The grid-based mode seeking procedure is applied to 3D data samples organized in the form of three interlocked
rings. The detected clusters are shown in different colors, with each ring recognized as a separate cluster.
The proposed mode seeking procedure is applied as a color reduction algorithm to image pixels represented by
their 3D color coordinates. The discrete density is estimated in the discretized color space and the modes of the density
are detected. Each pixel color is replaced by the color of the mode of the density function to which the hill-climbing
procedure, started at the color coordinates of the pixel, converges. Color reduction is a low-level preprocessing
step which can be used in a wide range of computer vision problems. The example in Fig. 4 shows the original image,
4(a), and the output image, 4(b), with each pixel color replaced by the color of the corresponding mode. In Fig. 5,
results are shown for three different values of the bandwidth h_0. For a smaller bandwidth, details on a finer scale are
preserved (visible on light reflections, clothes, hair, etc.), while for a higher h_0 more reduction is carried out
and the most important features of the image are isolated (clothes, faces). The bandwidth h_0 is therefore used to set the level
of abstraction. However, the variable bandwidth kernel ensures that details at different scales, present in different
areas of the image, are still correctly isolated.
Several validation methods have been proposed in the literature for the objective evaluation of clustering
methods [15, p. 257]. Typical evaluation objective functions formalize the goal of attaining high intra-cluster
similarity and low inter-cluster similarity, which is in accordance with the intuitive concept of clustering. However, good scores on an internal criterion do not necessarily translate into good effectiveness in a particular
application [32, p. 356]. An alternative to this approach is evaluation in the application of interest. The proposed
mode seeking procedure can be applied in a wide range of clustering and data mining problems, like the given
example of color reduction, where dedicated evaluation techniques should be used to assess the effectiveness of
the method in the particular application [33, 34].
6 Conclusion
The basic feature of all grid-based clustering techniques is low computational complexity. However, discretization unavoidably introduces error and lowers the performance in disclosing clusters of different scales, which are
often present in real data. Some techniques attempt to solve this problem by allowing arbitrary mesh
refinement in densely populated areas [21–23]. The resulting grid structure has complex neighboring relations
between adjacent cells, which may introduce computational overhead in the mode seeking and clustering procedure. Moreover, an additional pass through the data is often required to construct the adaptive grid, or the
result is sensitive to the order of the input data.

The proposed technique is based on a single resolution grid, thus a simple hill-climbing procedure can efficiently detect modes and the associated basins of attraction. The two-step density estimation ensures that the estimator
is self-tuned to the local scale variations in the data set without a considerable increase in computational complexity. Adaptability to the noise in the input scene is attained using the relative noise level. Experiments confirm
the robustness of the algorithm in the presence of noise as well as its ability to analyze complex data sets.

The proposed technique does not scale well with the dimension of the feature space, due to the time and
space complexity of multidimensional grid processing and the curse of dimensionality inherent to all
density-based approaches. However, the technique should be applicable to a wide range of problems where
efficient processing of a large number of low-dimensional data samples is required.
References
[1] B. W. Silverman, Density Estimation for Statistics and Data Analysis (Monographs on Statistics and
Applied Probability). London: Chapman and Hall, 1986.
[2] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, 2002.
[3] K. Fukunaga and L. D. Hostetler, “The estimation of the gradient of a density function, with applications
in pattern recognition,” IEEE Transactions on Information Theory, vol. 21, no. 1, pp. 32–40, 1975.
[4] D. Comaniciu and P. Meer, “Robust analysis of feature spaces: color image segmentation,” in Proceedings of
the 1997 Conference on Computer Vision and Pattern Recognition (CVPR ’97), (Washington, DC, USA),
pp. 750–756, IEEE Computer Society, 1997.
[5] B. Georgescu, I. Shimshoni, and P. Meer, “Mean shift based clustering in high dimensions: A texture
classification example,” in Proceedings of the 9th IEEE International Conference on Computer Vision,
(Washington, DC, USA), pp. 456–463, IEEE Computer Society, 2003.
[6] T. Gang, H. Rui-Min, W. Zhong-Yuan, and Z. Li, “Object tracking algorithm based on meanshift algorithm combining with motion vector analysis,” in ETCS ’09. First International Workshop on Education
Technology and Computer Science, (Wuhan, Hubei, China), pp. 987–990, 2009.
[7] J. Wang, B. Thiesson, Y. Xu, and M. Cohen, “Image and video segmentation by anisotropic kernel mean
shift,” in Proceedings of European Conference on Computer Vision ECCV 2004, pp. 238–249, Springer-Verlag, 2004.
[8] A. M. Fahim, G. Saake, A.-B. M. Salem, F. A. Torkey, and M. A. Ramadan, “An enhanced density based
spatial clustering of applications with noise,” in DMIN (R. Stahlbock, S. F. Crone, and S. Lessmann, eds.),
pp. 517–523, CSREA Press, 2009.
[9] J. Sander, M. Ester, H. Kriegel, and X. Xu, “Density-based clustering in spatial databases: The algorithm
gdbscan and its applications,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 169 – 194, 1998.
[10] O. Uncu, W. Gruver, D. Kotak, D. Sabaz, Z. Alibhai, and C. Ng, “Gridbscan: Grid density-based spatial
clustering of applications with noise,” in IEEE International Conference on Systems, Man and Cybernetics,
2006. SMC ’06, 2006.
[11] D. Ma and A. Zhang, “An adaptive density-based clustering algorithm for spatial database with noise,” in
Fourth IEEE International Conference on Data Mining (ICDM’04), pp. 467–470, 2004.
[12] K. Fukunaga, “Statistical pattern recognition,” in Handbook of Pattern Recognition and Computer Vision
(C. H. Chen, L. F. Pau, and P. S. P. Wang, eds.), pp. 33–60, Singapore: World Scientific Publishing Co., 1993.
[13] Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790–799, 1995.
[14] D. Comaniciu, V. Ramesh, and P. Meer, “The variable bandwidth mean shift and data-driven scale selection,” in Proceedings of the 8th IEEE International Conference on Computer Vision, vol. 1, (Vancouver,
Canada), pp. 438–445, 2001.
[15] S. Mitra and T. Acharya, Data Mining: Multimedia, Soft Computing, and Bioinformatics. Hoboken, New
Jersey: John Wiley & Sons, Inc., 2003.
[16] A. Hinneburg and D. A. Keim, “A general approach to clustering in large databases with noise,” An
International Journal of Knowledge and Information Systems, vol. 5, no. 4, pp. 387–415, 2003.
[17] D. W. Scott, “Averaged shifted histograms: Effective nonparametric density estimators in several dimensions,” The Annals of Statistics, vol. 13, no. 3, pp. 1024–1040, 1985.
[18] Z. Yu and H. S. Wong, “Gca: A real-time grid-based clustering algorithm for large data set,” in Proceedings
of the 2006. International Conference on Pattern Recognition, (Hong Kong, China), pp. 740–743, 2006.
[19] S. Mahran and K. Mahar, “Using grid for accelerating density-based clustering,” in 8th IEEE International
Conference on Computer and Information Technology (CIT 2008), (Sydney, NSW, Australia), 2008.
[20] M. Ester, H. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery
and Data Mining (KDD-96), pp. 226–231, AAAI Press, 1996.
[21] Z. A. Aghbari and R. Al-Haj, “Hill-manipulation: An effective algorithm for color image segmentation,”
Image and Vision Computing, vol. 24, no. 8, pp. 894–903, 2006.
[22] W. Liao, Y. Liu, and A. Choudhary, “A grid-based clustering algorithm using adaptive mesh refinement,”
in 7th Workshop on Mining Scientific and Engineering Datasets, (Lake Buena Vista, Florida, USA), 2004.
[23] N. H. Park and W. S. Lee, “Statistical grid-based clustering over data streams,” ACM SIGMOD Record,
vol. 33, no. 1, pp. 32–37, 2004.
[24] G. Sheikholeslami, S. Chatterjee, and A. Zhang, “Wavecluster: a wavelet-based clustering approach for
spatial data in very large databases,” The International Journal on Very Large Data Bases, vol. 8, no. 3-4,
pp. 289–304, 2000.
[25] D. W. Scott and S. R. Sain, “Multi-dimensional density estimation,” in Handbook of Statistics: Data Mining
and Computational Statistics (C. R. Rao and E. J. Wegman, eds.), vol. 23, pp. 229–263, Amsterdam:
Elsevier, 2004.
[26] I. S. Abramson, “On bandwidth variation in kernel estimates-a square root law,” The Annals of Statistics,
vol. 10, no. 4, pp. 1217–1223, 1982.
[27] P. Hall, T. C. Hu, and J. S. Marron, “Improved variable window kernel estimates of probability densities,”
The Annals of Statistics, vol. 23, no. 1, pp. 1–10, 1995.
[28] B. A. Turlach, “Bandwidth selection in kernel density estimation: A review,” Discussion Paper 9317,
Institut de Statistique, 1993.
[29] M. P. Wand and M. C. Jones, Kernel Smoothing (Monographs on Statistics and Applied Probability).
London: Chapman and Hall, 1994.
[30] C. R. Loader, “Bandwidth selection: classical or plug-in?,” Annals of Statistics, vol. 27, no. 2, pp. 415–438,
1999.
[31] R. P. W. Duin, “On the choice of smoothing parameters for parzen estimators of probability density
functions,” IEEE Transactions on Computers, vol. 25, no. 11, pp. 1175–1179, 1976.
[32] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge,
England: Cambridge University Press, online edition, 2009.
[33] H. Zhang, J. E. Fritts, and S. A. Goldman, “Image segmentation evaluation: A survey of unsupervised
methods,” Computer Vision and Image Understanding, vol. 110, no. 2, pp. 260–280, 2008.
[34] R. Unnikrishnan, C. Pantofaru, and M. Hebert, “A measure for objective evaluation of image segmentation
algorithms,” in Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition,
Workshop on Empirical Evaluation Methods in Computer Vision, (San Diego, CA, USA), pp. 394–400,
2005.
Table 1: 2D data - Number of modes detected

Two Step GB mode seeking    MS, fixed bandwidth       MS, variable bandwidth
h0     modes  final         h      modes  final       k      modes  final
0.02   133    56            0.02   408    12          50     303    16
0.04   32     3             0.04   115    3           100    247    7
0.06   9      3             0.06   36     3           150    68     5
0.08   10     3             0.08   25     3           200    344    5
0.10   6      3             0.10   16     3           250    360    3
0.12   3      3             0.12   12     3           300    64     3
0.14   3      3             0.14   10     3           350    34     3
0.16   3      3             0.16   8      3           400    38     3
0.18   3      3             0.18   7      3           450    28     3
0.20   3      3             0.20   9      3           500    25     3
0.22   3      3             0.22   3      3           550    19     3
0.24   3      3             0.24   7      3           600    20     3
0.26   3      3             0.26   5      3           650    19     3
0.28   3      3             0.28   6      2           700    16     3
0.30   3      3             0.30   2      1           750    18     3
0.32   3      3             0.32   2      1           800    16     3
0.34   3      3             0.34   3      1           850    16     3
0.36   3      3             0.36   3      1           900    20     3
0.38   3      3             0.38   3      1           950    20     3
0.40   3      3             0.40   3      1           1000   20     3
0.42   3      3             0.42   3      1           1050   20     3
0.44   3      3             0.44   4      1           1100   20     3
0.46   3      3             0.46   2      1           1150   20     3
0.48   3      2             0.48   2      1           1200   20     3
0.50   3      2             0.50   2      1           1250   20     3
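The "Two Step GB mode seeking" columns report modes found by hill climbing on a grid-based discrete density. As an illustrative sketch only, not the paper's two-step variable-kernel implementation, a minimal grid mode seeker can be built from the first step alone: count samples per cell of width h0, then climb from each occupied cell to the densest neighboring cell until a local maximum is reached. The function name `grid_mode_seek` and the assumed unit-hypercube domain are hypothetical choices for this sketch.

```python
import numpy as np

def grid_mode_seek(points, h0):
    """Bin samples on a grid of cell width h0 (domain assumed [0, 1)^d),
    then hill-climb each occupied cell to a local maximum of the
    cell-count density.  Returns the set of mode cells."""
    d = points.shape[1]
    bins = int(np.ceil(1.0 / h0))
    # Initial discrete density estimate: samples counted per cell.
    idx = np.clip((points / h0).astype(int), 0, bins - 1)
    density = np.zeros((bins,) * d)
    for cell in idx:
        density[tuple(cell)] += 1
    # Hill climbing: from each occupied cell, repeatedly step to the
    # densest of its 3^d - 1 neighbors until no neighbor is denser.
    offsets = [np.array(o) - 1 for o in np.ndindex(*(3,) * d)]
    modes = set()
    for start in {tuple(c) for c in idx}:
        cur = np.array(start)
        while True:
            best = cur
            for off in offsets:
                nb = cur + off
                if (nb >= 0).all() and (nb < bins).all() \
                        and density[tuple(nb)] > density[tuple(best)]:
                    best = nb
            if np.array_equal(best, cur):
                break  # local maximum of the discrete density reached
            cur = best
        modes.add(tuple(cur))
    return modes
```

Each returned mode cell stands for one detected pattern; in a full clustering procedure every sample would be labeled by the mode its cell climbs to.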
Table 2: 2D data with noise - Number of modes detected

Two Step GB mode seeking    MS, fixed bandwidth       MS, variable bandwidth
h0     modes  final         h      modes  final       k      modes  final
0.02   323    194           0.02   901    17          50     426    29
0.04   86     27            0.04   655    3           100    312    11
0.06   28     5             0.06   111    6           150    422    9
0.08   23     3             0.08   87     6           200    595    10
0.10   11     3             0.10   147    6           250    698    8
0.12   4      3             0.12   106    5           300    566    7
0.14   4      3             0.14   31     4           350    293    6
0.16   5      3             0.16   25     4           400    165    6
0.18   5      3             0.18   105    5           450    147    6
0.20   3      3             0.20   10     3           500    148    6
0.22   3      3             0.22   9      3           550    143    6
0.24   3      3             0.24   8      3           600    141    6
0.26   3      3             0.26   9      3           650    138    6
0.28   3      3             0.28   76     2           700    137    6
0.30   3      3             0.30   4      1           750    135    6
0.32   3      3             0.32   3      1           800    135    6
0.34   3      3             0.34   2      1           850    134    6
0.36   3      3             0.36   4      1           900    136    6
0.38   3      3             0.38   3      1           950    136    6
0.40   3      3             0.40   4      1           1000   136    6
0.42   3      3             0.42   3      1           1050   136    6
0.44   3      3             0.44   4      1           1100   136    6
0.46   3      2             0.46   4      1           1150   136    6
0.48   3      2             0.48   4      1           1200   136    6
0.50   3      2             0.50   4      1           1250   136    6
List of Figures
1  Density estimation and clustering in the presence of noise
2  Density estimation and clustering on a data set with overlapping clusters
3  Clustered 3D data
4  Mode seeking procedure implemented for a color reduction algorithm
5  Color reduction for different values of fixed bandwidth h0
Figure 1: Density estimation and clustering in the presence of noise. (a) Original data set; (b) data set with noise; (c) estimated discrete density; (d) clusters detected, noise samples shown dim.
Figure 2: Density estimation and clustering on a data set with overlapping clusters. (a) Overlapping clusters; (b) cluster labels from the original distribution; (c) estimated discrete density; (d) detected cluster labels.
Figure 3: Clustered 3D data
Figure 4: Mode seeking procedure implemented for a color reduction algorithm. (a) Original image; (b) colors reduced.
Figure 5: Color reduction for different values of fixed bandwidth h0. (a) Original image; (b) colors reduced, h0 = 2; (c) colors reduced, h0 = 8; (d) colors reduced, h0 = 16.
23