Download Small Codes and Large Image Databases for Recognition

Small Codes and Large Image Databases for Recognition CVPR 2008 Antonio Torralba, MIT Rob Fergus, NYU Yair Weiss, Hebrew University Outline • • • • Introduction Methods Experiment Conclusion Outline • • • • Introduction Methods Experiment Conclusion Summary • Goal – efficient image search(real time on web-sized) and fast, just require little memory, enable on standard hardware or handheld devices • Approach – Use machine learning to convert Gist descriptor to a compact binary code with a few hundred bits per image Gist descriptor • Global image representation • Describe the shapes occurring in an image with one descriptor – Subdivide image in 4×4 sub images – Calculate Gabor responses in each of these – Create histograms of Gabor responses in each sub image Slide by James Hays and Alexei Efros Gist descriptor Slide by James Hays and Alexei Efros Gist descriptor • In this paper – 8 orientations ,4 frequency = 4×8×16 = 512 dimensional vector. – For smaller images (32×32 pixels), use 3 frequency = 3×8×16 = 384 dimensions. Binary Code • Three reason – compression, it’s possible to represent images with a very small number of bits and still maintain the information for recognition Binary Code – scaling up to web-size databases requires doing the calculations in memory. Fitting hundreds of millions of images into a few GB of memory means we have a budget of very few bytes per image. – short binary codes allow very fast querying in standard hardware, either using hash tables or efficient bit-count operations Locality Sensitive Hashing (LSH) • high dimensional Euclidean space – finds nearest neighbors in constant time • a number of random projections of that point into R1 – each projection contributes a few bits • when the number of bits is fixed and small – LSH can perform quite poorly • In this paper – N = 30 bits Outline • • • • Introduction Methods Experiment Conclusion Learning binary codes • • • • • A database of images {xi} a distance function D(i, j) a binary feature vector yi = f(xi) Hamming distance N100(xi) - the 100 nearest neighbors of xi according to the distance function D(i, j) • N100(yi) - the 100 descriptors yj that are closest to yi in terms of Hamming distance • we would like N100(xi) = N100(yi) for all examples in our training set BoostSSC • Boosting similarity sensitive coding • Learn original input space into a new space – distances between images can be computed using a weighted Hamming distance. • Binary feature(M bits) – – • weighted Hamming distance – BoostSSC • positive examples – pairs of images xi, xj , j ∈ N(xi). • Negative examples – pairs of images that are not neighbors • regression stump – BoostSSC • Minimize the square loss – – K is the number of training pairs – Zk = 1, if the two images are neighbors; = −1, otherwise – • In this paper – – M around 30 bits Restricted Boltzmann Machines • Network of binary stochastic units • – weights W, bias b Hidden units: h Symmetric weights: w Visible units: v Restricted Boltzmann Machines • A probability can be assigned to a binary vector at the visible units – • Convenient conditional distributions – – Learn weights and biases using Contrastive Divergence Multi‐Layer RBM architecture Training RBM models • Pre‐training – Unsupervised – Use Contrastive Divergence to learn weights and biases – Gets parameters to right ballpark • Fine‐tuning – – – – Supervised No longer stochastic Backpropagate error to update parameters Moves parameters to local minimum Outline • • • • Introduction Methods Experiment Conclusion Two test datasets • LabelMe – 22,000 images – Ground truth segmentations for all – Can define distance between images using these segmentations • Web data[28] – 12.9 million images 32 × 32 colorimages – Subset of 80 million images – No labels, so use L2 distance between GIST vectors as ground truth [28] A. Torralba, R. Fergus, and W. T. Freeman. Tiny images. Technical Report MIT-CSAIL-TR-2007024, Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 2007. LabelMe retrieval LabelMe retrieval • what ground truth semantic similarity is – spatial pyramid matching over object labels LabelMe retrieval LabelMe retrieval • On 2000 test images, N = 50 • • • • • Web images retrieval Web images retrieval Retrieval speed evaluation • Using multi-threading (M/T) on a quad-core Pixel label • On 2000 test images Web images recognition • On 2000 test images Outline • • • • Introduction Methods Experiment Conclusion Conclusion • Possible to build compact codes for retrieval – Fast and small on standard PC – Suitable for use on large database – Much room for improvement

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Small Codes and Large Image Databases for Recognition