SUMMARY
OPEN PROBLEMS
The slides come from several sources via James Hays (Brown),
including his own slides.
Fundamentals of Computer Vision
• Geometry
– What we can find out from mathematics.
• Matching
– How to measure the similarity of two regions.
• Alignment
– How to align points/patches.
– How to recover transformation parameters based on
matched points.
• Grouping
– What points/regions/lines belong together.
• Categorization/Recognition
– What similarities are more important.
around year 2010...
Geometry
• x = K [R t] X
– Maps 3D point X to 2D point x.
– Rotation R and translation t map X into 3D camera coordinates.
– Intrinsic matrix K calibrates a 2D image.
• Parallel lines in 3D converge at the vanishing point in
the 2D image.
– A 3D plane has a vanishing line in the 2D image.
• x'^T F x = 0,
or vice versa x^T F^T x' = 0 (F => F^T)
– Points in two views that correspond to the
same 3D point are related by the fundamental matrix F.
From uncalibrated 2D images, the two camera matrices (3D => 2D) can be
recovered only up to a 3D (4x4) projective transformation!
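The two relations above can be checked numerically. The sketch below is only an illustration under stated assumptions: the first camera sits at the origin (R = I, t = 0), both views share one intrinsic matrix K with zero skew, and F is built from the known cameras with the standard identity F = K^-T [t]_x R K^-1. All function names and numbers are made up for this example.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def project(K, R, t, X):
    """x = K [R t] X for a 3D point X; returns a homogeneous 3-vector."""
    Xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    return [sum(K[i][k] * Xc[k] for k in range(3)) for i in range(3)]

def skew(t):
    """Cross-product matrix [t]_x, so that [t]_x v = t x v."""
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]

def intrinsics_inverse(K):
    """Inverse of K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (zero skew assumed)."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return [[1 / fx, 0.0, -cx / fx], [0.0, 1 / fy, -cy / fy], [0.0, 0.0, 1.0]]

def fundamental_from_cameras(K, R, t):
    """F for cameras K [I 0] and K [R t]: F = K^-T [t]_x R K^-1."""
    K_inv = intrinsics_inverse(K)
    E = matmul(skew(t), R)
    return matmul(transpose(K_inv), matmul(E, K_inv))

def epipolar_residual(F, x, x_prime):
    """Value of x'^T F x; zero for a true correspondence."""
    Fx = [sum(F[i][k] * x[k] for k in range(3)) for i in range(3)]
    return sum(x_prime[i] * Fx[i] for i in range(3))

# Illustrative numbers only.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
a = 0.1  # rotation about the y axis, in radians
R = [[math.cos(a), 0.0, math.sin(a)], [0.0, 1.0, 0.0], [-math.sin(a), 0.0, math.cos(a)]]
t = [1.0, 0.0, 0.2]
X = [0.5, -0.3, 4.0]

x = project(K, I3, [0.0, 0.0, 0.0], X)   # first view
x_prime = project(K, R, t, X)            # second view
F = fundamental_from_cameras(K, R, t)
pixel = (x[0] / x[2], x[1] / x[2])       # divide by the third coordinate
```

Swapping the roles of the two views corresponds to using transpose(F), which is the "F => F^T" remark on the slide.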
Matching
• Does this patch match that patch...
– in two simultaneous views -- stereo
– in two successive frames -- tracking, flow, SFM
– in two pictures of the same object -- recognition
Matching
Representations invariant/robust to expected deformations.
• It is often assumed that the shape is locally constant.
• Robustness to changes in viewpoint can mostly be achieved.
• Changes in lighting or camera gain are compensated by local averages.
For example,
SIFT is a relatively good descriptor,
mainly for 2D components, which
provides robustness to in-plane orientation,
lighting, contrast, and translation.
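As a small illustration of compensating lighting and gain with local averages, the sketch below normalizes each patch by its own mean and standard deviation before comparing with SSD. This is not SIFT; it is a much simpler stand-in, and the function names and patch values are made up for the example.

```python
import math

def normalize_patch(patch):
    """Subtract the patch mean and divide by the patch standard deviation."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # flat patch: avoid division by zero
    return [(v - mean) / std for v in values]

def ssd(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalized_ssd(patch_a, patch_b):
    """SSD after per-patch normalization; unchanged by gain and offset."""
    return ssd(normalize_patch(patch_a), normalize_patch(patch_b))

patch = [[10, 20, 30], [40, 50, 60], [70, 80, 95]]
# Same content under a different gain (x2) and offset (+15).
brighter = [[2 * v + 15 for v in row] for row in patch]
# Same intensities, shuffled: different content.
different = [[95, 10, 70], [20, 60, 30], [50, 80, 40]]

same_score = normalized_ssd(patch, brighter)
other_score = normalized_ssd(patch, different)
```

Here same_score is essentially zero although the raw intensities differ a lot, while other_score stays large.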
Alignment of points
Search efficiently to align matching patches.
• Interest points: find repeatable, distinctive points.
– long-range matching -- wide baseline stereo, object recognition
– Harris detector -- 2D corners are often repeatable
– difference of Gaussian -- used in Laplacian image pyramid
• Local search
– short range matching -- tracking, optical flow
– gradient descent on SSD -- often with image pyramid
• Windowed search
– long-range matching -- recognition, stereo with scanline
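A windowed search along a scanline can be sketched as follows. The assumptions are a rectified stereo pair (so the match lies on the same row), a fixed window size, and SSD as the matching cost; the function name, parameters, and pixel rows are invented for illustration.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def scanline_search(left_row, right_row, x, half_width, max_disparity):
    """Return (disparity, cost) minimizing SSD between the window centred at x
    in left_row and the window centred at x - disparity in right_row.
    The caller must keep the left window inside the row."""
    window = left_row[x - half_width: x + half_width + 1]
    best_d, best_cost = None, float("inf")
    for d in range(max_disparity + 1):
        start = x - d - half_width
        if start < 0:          # candidate window would leave the image
            break
        candidate = right_row[start: start + 2 * half_width + 1]
        cost = ssd(window, candidate)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost

# One image row per view; the bright blob is shifted 3 pixels to the left
# in the right view.
left_row = [5, 5, 5, 5, 9, 40, 90, 40, 9, 5, 5, 5]
right_row = [5, 9, 40, 90, 40, 9, 5, 5, 5, 5, 5, 5]

result = scanline_search(left_row, right_row, x=6, half_width=2, max_disparity=5)
```

For short-range matching (tracking, flow) the same cost is usually minimized by local gradient descent instead of an exhaustive scan.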
Alignment of sets
Find transformation to align matching sets of points.
• Geometric transformation
– least squares fit (with SVD), if all matches can be trusted
– RANSAC: works if fraction of inliers is high
– more advanced robust estimators already exist
• Other cases
– thin plate spline for more general distortions
– one-to-one correspondence with graph algorithms
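A minimal RANSAC sketch, under a strong simplifying assumption: the transformation is a pure 2D translation, so one match is a minimal sample and the least-squares refit is just the mean offset of the inliers. A similarity, affine, or homography fit would need a larger minimal sample and an SVD-based solver. Names, tolerance, and data are illustrative only.

```python
import random

def fit_translation(matches):
    """Least-squares translation for matches [(p, q), ...]: mean of q - p."""
    n = len(matches)
    tx = sum(q[0] - p[0] for p, q in matches) / n
    ty = sum(q[1] - p[1] for p, q in matches) / n
    return tx, ty

def inliers_for(matches, t, tol):
    """Matches whose residual under translation t is at most tol."""
    tx, ty = t
    return [(p, q) for p, q in matches
            if (p[0] + tx - q[0]) ** 2 + (p[1] + ty - q[1]) ** 2 <= tol ** 2]

def ransac_translation(matches, iterations=50, tol=1.0, seed=0):
    """Hypothesize from one random match, keep the largest consensus set,
    then refit by least squares on that set."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        sample = [rng.choice(matches)]      # minimal sample: one match
        candidate = inliers_for(matches, fit_translation(sample), tol)
        if len(candidate) > len(best):
            best = candidate
    return fit_translation(best), best

# Eight matches following the translation (5, -3) with small noise in x,
# plus four gross outliers.
points = [(0, 0), (1, 2), (3, 1), (4, 4), (6, 2), (7, 5), (2, 6), (5, 0)]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.0, 0.0]
matches = [(p, (p[0] + 5 + e, p[1] - 3)) for p, e in zip(points, noise)]
matches += [((1, 1), (20, 20)), ((2, 3), (-9, 4)),
            ((5, 5), (0, 30)), ((6, 1), (15, -20))]

translation, inliers = ransac_translation(matches)
```

With two thirds of the matches being inliers, almost every iteration hits a good hypothesis; as the inlier fraction drops, more iterations are needed.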
Grouping
• Clustering: group items (patches, pixels, lines, etc.)
that have similar appearance.
– unknown regions can often be assigned to the existing clusters
• Segmentation: group pixels into regions of coherent color,
texture, motion, and/or label.
• Probabilistic grouping: for parametric distributions, such as
Gaussians, the parameters can often be estimated, but
nonparametric distributions should not be neglected either.
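Clustering by similar appearance can be illustrated with plain k-means on pixel colors. This is a sketch, not the method of any particular system: the initial centers are supplied by the caller, distances are squared Euclidean, and the RGB values are made up.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return distances.index(min(distances))

def kmeans(points, centers, iterations=20):
    """Plain k-means; `centers` is the initial guess. Returns (centers, labels)."""
    centers = [tuple(float(v) for v in c) for c in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:   # mean of the assigned points, per coordinate
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:         # empty cluster: keep the old center
                new_centers.append(c)
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Three reddish and three bluish pixels (RGB).
pixels = [(250, 10, 10), (240, 20, 15), (245, 5, 20),
          (10, 10, 250), (20, 15, 240), (15, 20, 245)]

centers, labels = kmeans(pixels, centers=[pixels[0], pixels[-1]])
```

A new, unlabeled pixel can then be assigned to a group with nearest(pixel, centers).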
Categorization
Match objects, parts, or scenes that may vary in
appearance.
• Categories are typically defined by humans and may be
related by function, location, or other non-visual
attributes.
• Which similarities are important? This can be learned only from
training examples.
[Diagram: Training Images -> Image Features -> Classifier Training (with Training Labels) -> Trained Classifier]
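The training pipeline (image features plus labels in, trained classifier out) can be sketched with a nearest-centroid classifier. This is chosen only because it is short; the slides do not name a specific classifier, and the two-number feature vectors and labels below are hypothetical.

```python
from collections import defaultdict

def train_nearest_centroid(features, labels):
    """'Classifier training': store the mean feature vector of each label."""
    groups = defaultdict(list)
    for f, y in zip(features, labels):
        groups[y].append(f)
    return {y: tuple(sum(col) / len(fs) for col in zip(*fs))
            for y, fs in groups.items()}

def classify(model, feature):
    """'Trained classifier': return the label with the closest centroid."""
    return min(model, key=lambda y: sum((a - b) ** 2
                                        for a, b in zip(feature, model[y])))

# Hypothetical 2D features extracted from training images.
train_features = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.25),
                  (0.1, 0.7), (0.2, 0.8), (0.15, 0.75)]
train_labels = ["cat", "cat", "cat",
                "background", "background", "background"]

model = train_nearest_centroid(train_features, train_labels)
prediction = classify(model, (0.7, 0.3))
```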
Object categorization
Search by sliding window detector.
• May work well for rigid objects in isolation and given pose.
• Simple alignment for simple deformations works, but more
flexible alignment for articulated objects
(part-based models) is difficult.
[Figure: each window is classified as object or background]
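A sliding window detector can be sketched as a loop that scores every window position and keeps the positions above a threshold. The score function here (mean brightness) is only a hypothetical stand-in for a trained object/background classifier, and the image is a toy array.

```python
def sliding_window_detect(image, window, stride, score_fn, threshold):
    """Score each window position; return (row, col, score) for every
    window whose score reaches the threshold."""
    height, width = len(image), len(image[0])
    win_h, win_w = window
    detections = []
    for r in range(0, height - win_h + 1, stride):
        for c in range(0, width - win_w + 1, stride):
            crop = [row[c:c + win_w] for row in image[r:r + win_h]]
            score = score_fn(crop)
            if score >= threshold:
                detections.append((r, c, score))
    return detections

def mean_brightness(crop):
    """Hypothetical stand-in for a trained classifier's score."""
    return sum(sum(row) for row in crop) / (len(crop) * len(crop[0]))

image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

detections = sliding_window_detect(image, window=(2, 2), stride=1,
                                   score_fn=mean_brightness, threshold=8.0)
```

A fixed window shape is what ties this approach to rigid objects in a given pose; handling scale normally means repeating the scan over an image pyramid.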
Real object categorization
is a difficult problem
for the moment...
Vision as part of an intelligent system
[Diagram: 3D Scene -> Feature Extraction -> Grouping -> Interpretation -> Action
Feature Extraction: texture, color, optical flow, stereo disparity
Grouping: surfaces, bits of objects, sense of depth, motion patterns
Interpretation: objects, agents and goals, shapes and properties, open paths, words
Action: walk, touch, contemplate, smile, evade, read on, pick up, …]
Computer vision is potentially worth major $$$, but
there are major challenges to overcome first.
These were some successful examples...
• Driver assistance
Mobileye received >$100M in funding from Goldman Sachs.
• Entertainment (Kinect, movies, etc.)
Intel is spending $100M on visual computing over the next five years.
• Security
Potential for billions of deployed cameras.
• Robot workers
Machines instead of people.
• many more applications...
There are many open problems due to the differences
between human and computer vision. While human
vision (which works completely differently and is largely not understood)
can process many objects simultaneously and robustly,
today's best computer vision algorithms do not handle more than
four or five objects in not very difficult poses.
Even low-level algorithms, such as edge or corner
detection, should be reexamined, since we now have
many more pixels per image. Maybe better algorithms
will come. Top-down ("in the memory") and bottom-up
("what we see now") processing should interact more closely, and
from a much earlier stage in the processing.
Here, we will examine only the open problems in object
recognition and how they are reflected in the low-level solutions.
Open problems
Object category recognition: where is the cat?
Important questions:
• How can we better align two object instances?
• How do we identify the important similarities of objects
within a category?
• How do we tell if two patches depict similar shapes?
Open problems
Spatial understanding: what will she/he do
here, if she/he has to?
Important questions:
• What are good representations of space for navigation
and interaction? What kinds of details are important?
• How can we combine single-image cues with multi-view
cues?
Open problems
Object representation: what is it?
Important questions:
• How can we pose recognition so that it lets us deal with
new objects?
• What do we want to predict or infer, and to what extent
does that rely on categorization?
• How do we transfer knowledge of one type of object to
another?
It is far from a commercial system... yet
example
A. Farhadi, I. Endres, D. Hoiem, D.A. Forsyth, “Describing Objects by their Attributes”, CVPR 2009
Hays and Efros, SIGGRAPH 2007