Kernel Methods for Weakly Supervised Mean Shift Clustering
Oncel Tuzel and Fatih Porikli, Mitsubishi Electric Research Labs, Cambridge, Massachusetts
Peter Meer, Rutgers University

Outline
• Motivation
• Mean Shift
• Method Overview
• Kernel Mean Shift
• Constrained Kernel Mean Shift
• Experiments
• Conclusion

Motivation
• Clustering is an ambiguous task
• In many cases the initially designed similarity metric fails to resolve the ambiguities
• Simple supervision can guide clustering toward the desired structure
• We present a semi-supervised mean shift clustering algorithm based on pair-wise similarity constraints

Mean Shift
• Given n data points x_i in R^d with associated bandwidths h_i, the sample point density estimator is
  \hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i^d} \, k\!\left( \left\| \frac{x - x_i}{h_i} \right\|^2 \right)
  where k(\cdot) is the kernel profile
• Stationary points of the density are found via the mean shift procedure, which iterates
  \bar{x} = \frac{ \sum_{i=1}^{n} \frac{x_i}{h_i^{d+2}} \, g\!\left( \left\| \frac{x - x_i}{h_i} \right\|^2 \right) }{ \sum_{i=1}^{n} \frac{1}{h_i^{d+2}} \, g\!\left( \left\| \frac{x - x_i}{h_i} \right\|^2 \right) }
  where g(\cdot) = -k'(\cdot)

Mean Shift Clustering
• Mean shift iterations are initialized at the data points
• The cluster centers are located by the mean shift procedure
• The data points associated with the same local maximum of the density function produce a partitioning of the space
• There is no systematic semi-supervised mean shift algorithm in the literature
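As a concrete illustration of the procedure above, here is a minimal sketch of Euclidean mean shift clustering, assuming a Gaussian profile k(u) proportional to exp(-u/2) (so g = -k' has the same form and all constants cancel in the ratio) and a single fixed bandwidth h instead of per-point bandwidths; the function names are illustrative, not from the paper.

```python
import numpy as np

def mean_shift(X, h=1.0, n_iter=50, tol=1e-6):
    """Run the mean shift procedure starting from every data point.

    X : (n, d) array of data points; h : fixed bandwidth.
    Assumes the Gaussian profile, for which g(u) = -k'(u) is
    proportional to exp(-u / 2), so constants cancel in the ratio.
    """
    Y = X.copy()                                     # one trajectory per data point
    for _ in range(n_iter):
        Y_new = np.empty_like(Y)
        for j, y in enumerate(Y):
            u = np.sum(((y - X) / h) ** 2, axis=1)   # ||(y - x_i)/h||^2
            w = np.exp(-u / 2.0)                     # g(u), up to a constant
            Y_new[j] = w @ X / w.sum()               # weighted mean = new estimate
        if np.max(np.abs(Y_new - Y)) < tol:
            Y = Y_new
            break
        Y = Y_new
    return Y                                         # converged mode for each point

def label_modes(modes, eps=1e-3):
    """Points whose trajectories end at (numerically) the same mode
    belong to the same cluster."""
    labels, centers = np.full(len(modes), -1), []
    for i, m in enumerate(modes):
        for c, ctr in enumerate(centers):
            if np.linalg.norm(m - ctr) < eps:
                labels[i] = c
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels
```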
Method Overview
• The supervision is given in the form of a few pair-wise similarity constraints
• We embed the input space into a space where the constraint pairs are associated with the same mode
• Mode seeking is performed in the embedded space
• The method preserves all the advantages of mean shift clustering
[Figure: data in the input space and in the embedded space, where constrained point pairs collapse onto the same mode]

Pair-wise Constraints in the Input Space
• Data points are projected to the null space of the constraint matrix
• Since the constraint point pairs overlap after projection, they are clustered together
• The method fails if the clusters are not linearly separable
• At most d - 1 constraints can be defined
[Figure: input points, the constraint vector c_2 - c_1, and the clustering after projection]

Pair-wise Constraints in the Feature Space
• The method can be extended to handle a larger number of constraints, or the linearly inseparable case, using a mapping function \phi
• The mapping embeds the input space into an enlarged feature space
• The projection is performed in the feature space
• Defining the mapping explicitly is not practical; solution: the kernel trick
[Figure: mapped constraint points \phi(c_1), \phi(c_2), the constraint vector \phi(c_2) - \phi(c_1), and the clustering after projection in the feature space]

Kernel Mean Shift (Explicit Form)
• Given a mapping \phi : R^d \to R^{d_\phi} and a p.s.d. kernel satisfying K(x, x') = \phi(x)^\top \phi(x')
• The density estimator at y = \phi(x) is given by
  \hat{f}(y) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i^{d_\phi}} \, k\!\left( \left\| \frac{y - \phi(x_i)}{h_i} \right\|^2 \right)
• The stationary points can again be found via the mean shift procedure

Kernel Mean Shift (Implicit Form)
• Let \Phi = [\phi(x_1) \; \dots \; \phi(x_n)] be the d_\phi \times n feature matrix and K = \Phi^\top \Phi the n \times n kernel matrix
• At each iteration the estimate \bar{y} lies in the column space of \Phi, and any point in this subspace can be written as y = \Phi \alpha_y
• The distance between two points y = \Phi \alpha_y and y' = \Phi \alpha_{y'} is given by
  \| y - y' \|^2 = \alpha_y^\top K \alpha_y + \alpha_{y'}^\top K \alpha_{y'} - 2 \alpha_y^\top K \alpha_{y'}
• The implicit form of mean shift updates the weighting vectors
  \bar{\alpha} = \frac{ \sum_{i=1}^{n} \frac{e_i}{h_i^{d_\phi+2}} \, g\!\left( \| \Phi\alpha - \phi(x_i) \|^2 / h_i^2 \right) }{ \sum_{i=1}^{n} \frac{1}{h_i^{d_\phi+2}} \, g\!\left( \| \Phi\alpha - \phi(x_i) \|^2 / h_i^2 \right) }
  where e_i denotes the i-th canonical basis vector of R^n

Kernel Mean Shift Clustering
• The clustering algorithm starts at the data points, \alpha_i = e_i
• Upon convergence the mode can be expressed via \bar{y} = \Phi \bar{\alpha}
• When the rank of the kernel matrix K is smaller than n, the columns of \Phi form an overcomplete basis and the modes can be identified only up to an equivalence relationship
• The procedure is restricted to the subspace spanned by the feature points; therefore all modes lie in the column space of \Phi
• The convergence of the procedure follows from the original proof

Constrained Kernel Mean Shift
• Let \{(c_{j,1}, c_{j,2})\}_{j=1}^{m} be the set of point pairs to be clustered together
• The constraint matrix is given by
  A = \begin{bmatrix} (\phi(c_{1,1}) - \phi(c_{1,2}))^\top \\ \vdots \\ (\phi(c_{m,1}) - \phi(c_{m,2}))^\top \end{bmatrix}
• The null space of A is the set of vectors v satisfying A v = 0, and the matrix
  P = I - A^\top (A A^\top)^{+} A
  projects to null(A)
• Under the projection the constraint point pairs overlap
[Figure: feature-space constraint vector \phi(c_2) - \phi(c_1) and the data after projection onto the null space]

Constrained Kernel Mean Shift
• The constrained mean shift algorithm implicitly maps the data points to the null space of the constraint matrix and performs mean shift in the embedded space
• This process is equivalent to applying the kernel mean shift algorithm with the projected kernel function \hat{K}(x, x') = \phi(x)^\top P \, \phi(x')
• The projected kernel matrix involves the mapping only through the kernel function and can be expressed in terms of the original kernel matrix:
  \hat{K} = K - K_A^\top S^{+} K_A
  where K_A = A\Phi, the part of the kernel matrix involving the constraint set, has entries (K_A)_{j,i} = K(c_{j,1}, x_i) - K(c_{j,2}, x_i), and S = A A^\top is the scaling matrix with entries S_{j,j'} = K(c_{j,1}, c_{j',1}) - K(c_{j,1}, c_{j',2}) - K(c_{j,2}, c_{j',1}) + K(c_{j,2}, c_{j',2})
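The two pieces above reduce to a few lines of linear algebra. Below is a minimal sketch, assuming a precomputed kernel matrix K, a fixed bandwidth h (so the 1/h_i^{d_\phi+2} factors cancel), a Gaussian profile, and a pseudo-inverse for the scaling matrix S; function and variable names are illustrative, not from the paper.

```python
import numpy as np

def projected_kernel(K, pairs):
    """Project the kernel matrix onto the null space of the constraint
    matrix A, whose rows are phi(c_j1) - phi(c_j2).

    K     : (n, n) original kernel matrix
    pairs : list of (j1, j2) index pairs to be clustered together
    Returns K_hat = K - K_A^T S^+ K_A with K_A = A Phi and S = A A^T,
    both expressible through K alone.
    """
    j1 = np.array([p[0] for p in pairs])
    j2 = np.array([p[1] for p in pairs])
    K_A = K[j1, :] - K[j2, :]                        # (m, n): A Phi
    S = (K[np.ix_(j1, j1)] - K[np.ix_(j1, j2)]
         - K[np.ix_(j2, j1)] + K[np.ix_(j2, j2)])    # (m, m): A A^T
    return K - K_A.T @ np.linalg.pinv(S) @ K_A

def kernel_mean_shift(K, h=1.0, n_iter=50):
    """Implicit-form mean shift: each estimate y = Phi alpha is carried
    by its weighting vector alpha; distances use only the kernel matrix."""
    n = K.shape[0]
    A = np.eye(n)                        # row i: alpha of trajectory i, init e_i
    for _ in range(n_iter):
        # squared feature-space distances ||Phi alpha_i - phi(x_l)||^2
        aKa = np.einsum('ij,jk,ik->i', A, K, A)      # alpha^T K alpha per row
        D = aKa[:, None] + np.diag(K)[None, :] - 2.0 * A @ K
        W = np.exp(-np.maximum(D, 0.0) / (2.0 * h * h))  # Gaussian profile g
        A = W / W.sum(axis=1, keepdims=True)         # new weighting vectors
    return A                             # converged alpha vectors, one per row
```

Constrained clustering is then simply `kernel_mean_shift(projected_kernel(K, pairs))`; rows of the returned matrix that (numerically) coincide correspond to points attracted to the same mode.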
Experiments
• We conduct experiments on three datasets:
  – Synthetic experiments
  – Clustering faces across illumination on the CMU PIE dataset
  – Clustering object categories on the Caltech-4 dataset
• For the first two experiments we utilize the Gaussian kernel function
• For the last experiment we utilize a kernel function defined on histogram features
• We use adaptive bandwidth mean shift, where the bandwidth for each point is selected as the k-th smallest distance from the point to all the data points in the feature space (a small sketch of this rule appears at the end of the deck)

Clustering Linear Structure
• We generated 240 data points originating from six different lines
• The data are corrupted with normally distributed noise with standard deviation 0.1
• Three pair-wise constraints are given
[Figure: data points, mean shift result, and constrained mean shift result]

Clustering Circular Structure
• We generated 200 data points originating from five concentric circles
• The data are corrupted with normally distributed noise with standard deviation 0.1
• 80 outlier points are added
• Four pair-wise constraints are enforced, each within the same circle
[Figure: data points, data points with outliers, mean shift result, and constrained mean shift result]

Clustering Faces Across Illumination
• The dataset contains 441 images of 21 subjects under 21 different illumination conditions
• Images are coarsely registered and scaled to the same size, 128x128
• Each image is represented as a 16384-dimensional vector
• Two pair-wise similarity constraints are given per subject
• Approximately 1/10 of the dataset is labeled
[Figure: samples from the CMU PIE dataset and the constraint set]

Clustering Faces with Mean Shift
• Mean shift finds 5 clusters corresponding partly to illumination conditions and partly to subject labels
[Figure: pair-wise distance matrix and mean shift clustering result]

Clustering Faces with Constrained Mean Shift
• Constrained mean shift recovers all 21 subjects perfectly
[Figure: pair-wise distance matrix after embedding and constrained mean shift clustering result]

Clustering Object Categories
• The dataset contains 400 images from four object categories: cars, motorcycles, faces, and airplanes
• Each image is represented as a 500-bin feature histogram
• Pair-wise constraints are randomly selected within classes
• The experiment is repeated with a varying number of constraints (1 to 20 constraints per object class)
[Figure: samples from the Caltech-4 dataset]

Clustering Object Categories with Mean Shift
• Some of the samples from the airplanes class and half of the motorcycles class are incorrectly identified as cars
• The overall clustering accuracy is 74.25%
[Figure: pair-wise distance matrix and mean shift clustering result]

Clustering Object Categories with Constrained Mean Shift
• Clustering example after enforcing 10 constraints per class
• Only a single example among the 400 is misclustered
[Figure: pair-wise distance matrix after embedding and constrained mean shift clustering result]

Clustering Performance vs. Number of Constraints
• The results are averaged over 20 runs, where at each run a different constraint set is selected
• Clustering accuracy is over 99% for more than 7 constraints per class

Conclusion
• We presented a novel constrained mean shift clustering method that can incorporate pair-wise must-link priors
• The method preserves all the advantages of the original mean shift clustering algorithm
• The approach extends to inner product spaces and is therefore applicable to a wide range of problems
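Finally, the adaptive bandwidth rule referenced in the experiments, sketched for the kernel setting: each point's bandwidth is its k-th smallest distance to the other points, with all feature-space distances obtained from the kernel matrix alone. A minimal sketch, assuming a precomputed kernel matrix; the function name is illustrative, not from the paper.

```python
import numpy as np

def adaptive_bandwidths(K, k=10):
    """Bandwidth h_i = k-th smallest feature-space distance from point i
    to the data points, derived from the kernel matrix via
    ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij."""
    d = np.diag(K)
    D2 = d[:, None] + d[None, :] - 2.0 * K       # squared pairwise distances
    D = np.sqrt(np.maximum(D2, 0.0))             # guard tiny negative values
    D_sorted = np.sort(D, axis=1)                # column 0 is the self-distance 0
    return D_sorted[:, k]                        # k-th smallest nonzero distance
```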