Download Small Codes and Large Image Databases for Recognition

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

M-Theory (learning framework) wikipedia , lookup

Convolutional neural network wikipedia , lookup

Computer vision wikipedia , lookup

Visual Turing Test wikipedia , lookup

Scale-invariant feature transform wikipedia , lookup

Histogram of oriented gradients wikipedia , lookup

Stereo display wikipedia , lookup

Transcript
Small Codes and Large Image
Databases for Recognition
CVPR 2008
Antonio Torralba, MIT
Rob Fergus, NYU
Yair Weiss, Hebrew University
Outline
•
•
•
•
Introduction
Methods
Experiment
Conclusion
Outline
•
•
•
•
Introduction
Methods
Experiment
Conclusion
Summary
• Goal
– efficient image search(real time on web-sized)
and fast, just require little memory, enable on
standard hardware or handheld devices
• Approach
– Use machine learning to convert Gist descriptor
to a compact binary code with a few hundred bits
per image
Gist descriptor
• Global image representation
• Describe the shapes occurring in an image
with one descriptor
– Subdivide image in 4×4 sub images
– Calculate Gabor responses in each of these
– Create histograms of Gabor responses in each sub
image
Slide by James Hays and Alexei Efros
Gist descriptor
Slide by James Hays and Alexei Efros
Gist descriptor
• In this paper
– 8 orientations ,4 frequency = 4×8×16 = 512
dimensional vector.
– For smaller images (32×32 pixels), use 3 frequency
= 3×8×16 = 384 dimensions.
Binary Code
• Three reason
– compression, it’s possible to represent images
with a very small number of bits and still maintain
the information for recognition
Binary Code
– scaling up to web-size databases requires doing
the calculations in memory. Fitting hundreds of
millions of images into a few GB of memory
means we have a budget of very few bytes per
image.
– short binary codes allow very fast querying in
standard hardware, either using hash tables or
efficient bit-count operations
Locality Sensitive Hashing (LSH)
• high dimensional Euclidean space
– finds nearest neighbors in constant time
• a number of random projections of that point
into R1
– each projection contributes a few bits
• when the number of bits is fixed and small
– LSH can perform quite poorly
• In this paper
– N = 30 bits
Outline
•
•
•
•
Introduction
Methods
Experiment
Conclusion
Learning binary codes
•
•
•
•
•
A database of images {xi}
a distance function D(i, j)
a binary feature vector yi = f(xi)
Hamming distance
N100(xi) - the 100 nearest neighbors of xi
according to the distance function D(i, j)
• N100(yi) - the 100 descriptors yj that are closest
to yi in terms of Hamming distance
• we would like N100(xi) = N100(yi) for all examples
in our training set
BoostSSC
• Boosting similarity sensitive coding
• Learn original input space into a new space
– distances between images can be computed using
a weighted Hamming distance.
• Binary feature(M bits)
–
–
• weighted Hamming distance
–
BoostSSC
• positive examples
– pairs of images xi, xj , j ∈ N(xi).
• Negative examples
– pairs of images that are not neighbors
• regression stump
–
BoostSSC
• Minimize the square loss
–
– K is the number of training pairs
– Zk = 1, if the two images are neighbors; = −1,
otherwise
–
• In this paper
–
– M around 30 bits
Restricted Boltzmann Machines
• Network of binary stochastic units
•
– weights W, bias b
Hidden units: h
Symmetric weights: w
Visible units: v
Restricted Boltzmann Machines
• A probability can be assigned to a binary
vector at the visible units
–
• Convenient conditional distributions
–
–
Learn weights and biases using
Contrastive Divergence
Multi‐Layer RBM architecture
Training RBM models
• Pre‐training
– Unsupervised
– Use Contrastive Divergence to learn weights and
biases
– Gets parameters to right ballpark
• Fine‐tuning
–
–
–
–
Supervised
No longer stochastic
Backpropagate error to update parameters
Moves parameters to local minimum
Outline
•
•
•
•
Introduction
Methods
Experiment
Conclusion
Two test datasets
• LabelMe
– 22,000 images
– Ground truth segmentations for all
– Can define distance between images using these
segmentations
• Web data[28]
– 12.9 million images 32 × 32 colorimages
– Subset of 80 million images
– No labels, so use L2 distance between GIST vectors as
ground truth
[28] A. Torralba, R. Fergus, and W. T. Freeman. Tiny images. Technical Report MIT-CSAIL-TR-2007024, Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology,
2007.
LabelMe retrieval
LabelMe retrieval
• what ground truth semantic similarity is
– spatial pyramid matching over object labels
LabelMe retrieval
LabelMe retrieval
• On 2000 test images, N = 50
•
•
•
•
•
Web images retrieval
Web images retrieval
Retrieval speed evaluation
• Using multi-threading (M/T) on a quad-core
Pixel label
• On 2000 test images
Web images recognition
• On 2000 test images
Outline
•
•
•
•
Introduction
Methods
Experiment
Conclusion
Conclusion
• Possible to build compact codes for retrieval
– Fast and small on standard PC
– Suitable for use on large database
– Much room for improvement