CS 558
Computer Vision
John Oliensis
Today’s class
• What is vision
• What is computer vision
• How we can solve vision problems
– Important tools
– Overall approaches
Why is Vision Interesting?
• Psychology
– ~ 50% of cerebral cortex is for vision.
– Vision is how we experience the world.
• Engineering
– Want machines to interact with world.
– Digital images are everywhere.
Vision is inferential
Inferring Surface “Lightness”
How do we determine the “true” surface color at A and B?
Discount slow changes from lighting, keep quick paint changes?
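The "discount slow changes, keep quick changes" idea can be sketched in one dimension. This is a minimal illustration, not the method used in the lecture: the function name `recover_lightness`, the threshold value, and the sample data are all assumptions made here. It takes differences of log-brightness, drops the small ones (attributed to lighting), and re-integrates the large ones (attributed to paint).

```python
import math

def recover_lightness(intensity, threshold=0.1):
    """1D sketch: discount slow (lighting) changes, keep quick (paint) changes."""
    logs = [math.log(v) for v in intensity]
    # Differences of log-brightness between neighbouring samples.
    steps = [b - a for a, b in zip(logs, logs[1:])]
    # Small steps are attributed to lighting and dropped; large ones to paint.
    kept = [s if abs(s) > threshold else 0.0 for s in steps]
    # Re-integrate the kept steps: log-lightness up to an unknown constant.
    out = [0.0]
    for s in kept:
        out.append(out[-1] + s)
    return [math.exp(v) for v in out]

# Hypothetical data: two paints (reflectance 0.2 and 0.8) under a smooth lighting ramp.
reflectance = [0.2] * 5 + [0.8] * 5
lighting = [1.0 + 0.05 * i for i in range(10)]
image = [r * l for r, l in zip(reflectance, lighting)]
lightness = recover_lightness(image)
```

In this toy case the result is flat on each side of the paint edge, and the ratio across the edge is close to the true 4:1 reflectance ratio. It is only recovered approximately, because the lighting change across the edge itself cannot be separated from the paint change.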
Inferring Surface Color
We perceive true surface color despite unknown or changing light!
Vision is Inferential
(surface brightness)
(Demos: plaid movie, haze movie)
Vision is inferential:
Shape from light
Shape from Motion
Vision is Inferential:
Prior Knowledge
Computer Vision
• Inference → Computation
• Building machines that see
• Modeling biological perception
So what do humans care about?
slide by Fei-Fei, Fergus & Torralba
Verification: is that a bus?
slide by Fei-Fei, Fergus & Torralba
Detection: are there cars?
slide by Fei-Fei, Fergus & Torralba
Identification: is that a picture of Mao?
slide by Fei-Fei, Fergus & Torralba
Object categorization
sky
building
flag
banner
face
wall
street lamp
bus
bus
cars
slide by Fei-Fei, Fergus & Torralba
Scene and context categorization
• outdoor
• city
• traffic
•…
slide by Fei-Fei, Fergus & Torralba
Rough 3D layout, depth ordering
slide by Fei-Fei, Fergus & Torralba
Challenges 1: view point variation
Michelangelo 1475-1564
slide by Fei-Fei, Fergus & Torralba
Challenges 2: illumination
slide credit: S. Ullman
Challenges 3: occlusion
Magritte, 1957
slide by Fei-Fei, Fergus & Torralba
Challenges 4: scale
slide by Fei-Fei, Fergus & Torralba
Challenges 5: deformation
slide by Fei-Fei, Fergus & Torralba
Xu, Beihong 1943
Challenges 6: background clutter
Klimt, 1913
slide by Fei-Fei, Fergus & Torralba
Challenges 7: object intra-class variation
slide by Fei-Fei, Fergus & Torralba
Challenges 8: local ambiguity
slide by Fei-Fei, Fergus & Torralba
Summary:
Same object can appear very different!
How can you isolate what’s the same in these two pictures (the horse)
given the huge differences?
Quick Tour of Computer Vision
Approach: local cues
• The entire image is too complex.
• Try to find distinctive small patches which may
help to interpret it
• Example: brightness boundaries
• Maybe part of object’s outline?
• May help in inferring object shapes.
• Build larger interpretations from these small “clues”
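A brightness boundary can be found by looking for large local changes in brightness. The sketch below is one simple way to do that, assuming a central-difference gradient and a hand-picked threshold; the function name `brightness_boundaries` and the tiny test image are made up for illustration.

```python
def brightness_boundaries(image, threshold):
    """Mark pixels whose brightness gradient magnitude exceeds `threshold`."""
    rows, cols = len(image), len(image[0])
    edges = [[False] * cols for _ in range(rows)]
    # Border pixels are skipped because they lack a neighbour on one side.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0  # horizontal change
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0  # vertical change
            edges[r][c] = (gx * gx + gy * gy) ** 0.5 > threshold
    return edges

# Hypothetical 5x6 image: dark left half (0), bright right half (100).
image = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
edges = brightness_boundaries(image, threshold=25)
```

A detector like this responds to any sharp brightness change, so it cannot by itself tell an object outline from a highlight. That is why the local cue is only a "clue" to be combined with others.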
Local cue: Brightness Boundary
Could this be part of the outline
of something?
Local cue: Brightness Boundary
Part of
the leaf
outline
Local cue: Brightness Boundary
Could this be part of
the outline of something?
Local cue: Brightness Boundary
Not an outline,
Just a highlight
Where’s the squirrel outline?
Integrating information over larger regions
• Finding outlines
• Finding regions that might correspond to objects
Boundary Detection
http://www.robots.ox.ac.uk/~vdg/dynamics.html
Boundary Detection
Finding the Corpus Callosum
(G. Hamarneh, T. McInerney, D. Terzopoulos)
Segmentation (foreground versus background)
(Sharon, Galun, Brandt, Basri)
Segmentation (foreground versus background)
JO
Different approach
A Classical View of Vision
High-level
Object and Scene
Recognition
Figure/Ground
Organization
Mid-level
Grouping /
Segmentation
Low-level
pixels,
boundaries,
small windows…
A Contemporary View of Vision
High-level
Object and Scene
Recognition
Figure/Ground
Organization
Mid-level
Grouping /
Segmentation
But where do we
draw this line?
Low-level
pixels,
boundaries,
small windows…
• Boundaries and regions → Shape
• Texture → appearance
Original
Texture
• Learn the statistics of a texture to recognize it
• Synthesize texture based on learned model
Repetition
Synthesis
Original
Texture
• Textures over time
(Smoke, flame, waterfall...)
Repetition
Synthesis
Tracking (JO+HZ)
Understanding Action
Tracking pedestrians → surveillance
Tracking face features → emotions
Tracking office workers
Stereo
Public Library, Stereoscopic Looking Room, Chicago, by Phillips, 1923 (Slide courtesy Steve Seitz)
Stereo
[Figure: a scene point P projected into Image 1 (Camera 1) and Image 2 (Camera 2)]
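The two-camera figure suggests how depth can be recovered: the same point P lands at different positions in the two images. As a hedged sketch, assume the special case of two identical cameras placed side by side with parallel optical axes (a rectified pair, which the slide does not state). Then depth is focal length times baseline divided by the horizontal shift (disparity). The function name and the numbers are illustrative assumptions.

```python
def depth_from_disparity(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth of a point seen by a rectified stereo pair (assumed setup).

    focal_px   : focal length in pixels (same for both cameras)
    baseline_m : distance between the two camera centres, in metres
    x_*_px     : horizontal image position of the point in each image
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("point must have positive disparity")
    return focal_px * baseline_m / disparity

# Hypothetical numbers: 800 px focal length, 10 cm baseline, 20 px disparity.
depth = depth_from_disparity(800.0, 0.1, 420.0, 400.0)
```

Nearby points shift more between the two images than distant ones, so a larger disparity means a smaller depth.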
Stereo
http://www.magiceye.com/
Stereo
http://www.magiceye.com/
Structure from Motion
Motion and shape
from movies
Estimated
Camera Motion
movie to shape
Estimated
3D shape
Movie shape
• Important for humans
Motion – Application
Inserting virtual objects into video
(www.realviz.com)
Motion Application
Aligning virtual & real objects despite camera motion
Visually guided surgery
Recognition
(despite appearance change)
Lighting affects appearance
Classification
(Funkhouser, Min, Kazhdan, Chen, Halderman, Dobkin, Jacobs)
Viola and Jones:
Real time
Face Detection
Approaches to Vision
Approach 1: Toy Models + Algorithms
1) Start with a simple, idealized model of the world and of images.
Find good algorithms.
2) Experiment on the real world.
3) Update the model and algorithms.
The real problem is going beyond idealizations!
Example:
3D shape from shading (JO)
How does Shading determine Shape?
Bright
Dark
Shading (image brightness) indicates how much light falls on
each surface patch
→ gives surface patch orientation → overall shape
Very Idealized!
• Uniformly bright surface (no paint!)
(else brightness doesn’t indicate orientation)
• Other idealizations as well
– no shadows
– smooth surface
– no objects in front of others
– no glossiness or mirror reflections
– known light source
– light from one direction only
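Under these idealizations, the link between shading and orientation is often modeled with a Lambertian (matte) surface: brightness is proportional to the cosine of the angle between the surface normal and the light direction. The sketch below assumes that model, which the slides do not name explicitly; `lambertian_brightness` and its arguments are illustrative.

```python
import math

def lambertian_brightness(normal, light, albedo=1.0):
    """Brightness of a matte patch lit from a single known direction.

    normal, light : 3-vectors (need not be unit length)
    albedo        : surface "paint"; constant under the uniform-surface idealization
    """
    n_len = math.sqrt(sum(c * c for c in normal))
    l_len = math.sqrt(sum(c * c for c in light))
    cos_angle = sum(n * l for n, l in zip(normal, light)) / (n_len * l_len)
    # Patches facing away from the light receive none.
    return albedo * max(0.0, cos_angle)

light = (0.0, 0.0, 1.0)  # assumed known, single light direction
facing = lambertian_brightness((0.0, 0.0, 1.0), light)
tilted = lambertian_brightness(
    (math.sin(math.radians(60)), 0.0, math.cos(math.radians(60))), light)
away = lambertian_brightness((0.0, 0.0, -1.0), light)
```

Shape from shading runs this relation backwards. Note that one brightness value fixes only the angle between the normal and the light, not the full orientation, which is one reason extra assumptions such as a smooth surface are needed.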
Approach 2: Psychology/Neuroscience
• Derive insights from human/animal vision
• Example: processing at multiple scales
• True for people; useful for computers
(Try squinching your eyes from far away)
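One common way to process an image at multiple scales is an image pyramid: a stack of progressively smaller, blurrier copies. The sketch below assumes the simplest version, where each level averages 2x2 blocks of the previous one; the function names and the test image are made up for illustration.

```python
def downsample(image):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def build_pyramid(image, levels):
    """Return [full resolution, half, quarter, ...] with at most `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        current = pyramid[-1]
        if len(current) < 2 or len(current[0]) < 2:
            break  # cannot halve any further
        pyramid.append(downsample(current))
    return pyramid

# Hypothetical 4x4 image with fine checkerboard detail (values 0 and 8).
image = [[8 * ((r + c) % 2) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(image, levels=3)
```

At the coarser levels the fine checkerboard detail averages out to a uniform gray, much as fine detail disappears when squinting from a distance.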
Approach 3: Engineering
• Limited goals, application-oriented.
• Exploit domain constraints!
Problem: May not generalize to other tasks
Example: Image Mosaics
• [Figure: image + image + … + image = mosaic]
• Goal: Stitch together images into a composite image
The composite has to look real, as if taken from one place: may have to warp
the original images
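The warp is often modeled as a homography, a 3x3 matrix acting on pixel coordinates. This is a common choice when all images are taken from one viewpoint, and is an assumption here rather than something the slide states. The sketch shows only how one pixel position is mapped; the matrices are made-up examples, and estimating the matrix from matching points is not shown.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (list of 3 rows)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Divide by the third coordinate to return to ordinary pixel coordinates.
    return u / w, v / w

# Hypothetical warp 1: pure shift of 50 px right and 10 px down.
shift = [[1.0, 0.0, 50.0],
         [0.0, 1.0, 10.0],
         [0.0, 0.0, 1.0]]
shifted = apply_homography(shift, 0.0, 0.0)

# Hypothetical warp 2: a projective term that compresses points far to the right.
projective = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.001, 0.0, 1.0]]
squeezed = apply_homography(projective, 1000.0, 500.0)
```

Applying the same mapping to every pixel of an image places it in the coordinate frame of the composite, where overlapping images can then be blended.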
Approach 4: Bayesian inference + Learning
• Given the image, what 3D scene produced it?
Impossible!
The image is 2D and has too little information about the scene, which is 3D.
• Bayesian solution:
Learn: accumulate experience about what types of 3D
scenes and images are likely to occur.
Use this experience to help in interpreting new images
(i.e., tune the algorithm based on experience).
Approach 4: Bayesian inference + Learning
• Usually based on probabilities
– How likely is this object to appear?
– How likely is it that this image patch shows the object?
• Finding the probability for all possibilities is often very hard
and can lead to huge computations.
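For a small, fixed set of possibilities, the probabilities above can be combined with Bayes' rule. In this sketch the prior answers "how likely is this object to appear?", the likelihood says how probable the observed patch is under each possibility, and the posterior answers "how likely is it that this patch shows the object?". The function name and all numbers are hypothetical.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of hypotheses.

    priors      : {hypothesis: P(hypothesis)}
    likelihoods : {hypothesis: P(observed patch | hypothesis)}
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(observed patch)
    return {h: p / evidence for h, p in joint.items()}

# Hypothetical numbers: faces are rare, but this patch looks very face-like.
priors = {"face": 0.01, "background": 0.99}
likelihoods = {"face": 0.9, "background": 0.05}
post = posterior(priors, likelihoods)
```

With these numbers the posterior probability of "face" is only about 0.15, because the low prior outweighs the face-like appearance. The sum in `evidence` runs over every hypothesis, which hints at why enumerating all possibilities becomes expensive when the set of possible scenes is large.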
Recognize objects
(Bayesian learning)
• Recognize parts (eyes, nose,…) and their spatial arrangement.
• Learning: Automatically tune algorithm from its success on trial runs
Approach 4A:
Learning from millions of pictures
The State of Computer Vision
• Technology
– Applications
• Surveillance
• Road monitoring
• Computer-driven cars
• Football
• Movies
• Medicine
• Face Recognition/Biometrics
• Space
• HCI (Human Computer Interface); sign language recognition
• Remote Sensing
– Successful companies
• Largest ~100-200 million in revenues. In-house applications.
The State of Computer Vision
• Science
– More progress in engineering
– Interesting theory for specific problems
(e.g., estimating 3D shape of objects from images)
– Beginnings of progress on “intelligent” vision
(i.e., recognizing objects)
The State of Computer Vision
• Sociology
– Engineers (dominant group)
– Applied math
– Computer science
– Visual Psychology, neuroscience
Related Fields
• Learning (can computers teach themselves to see?)
+ Artificial Intelligence (AI)
• Graphics. “Vision is inverse graphics”
• Visual perception + Neuroscience
• Math (e.g., geometry, statistics/probability) + Physics
• Operations research, optimization
History (very rough)
“Those who cannot remember the past are condemned to repeat it”
• 1985-1990
– Toy models/algorithms (line drawings of blocks)
– AI recognition systems
– Segmentation. Break images up into regions that could be objects
– Low-level vision. Detecting brightness boundaries, estimating 3D shapes of objects
– Neural nets. David Marr.
• 1990s
– Estimating camera motion from movies. Projective geometry.
– Model-based recognition. Use specific object models to find them in images
– Represent 2D shapes by their “skeletons”
– Tracking
– Classifying pixels from appearance (blue → sky or water, green → leaf, …)
• 2000s
– Learning: internet-scale data
– More reliable appearance descriptors → better recognition of objects
– More math: graph theory, Monte Carlo, level sets
– Robust statistics: recovering from mistakes of low-level modules
Tools Needed for Course
• Math
– Linear Algebra (to be taught)
– Signal Processing (to be taught)
– Calculus
– Some geometry
– Probability
• Computer Science
– Algorithms
– Programming (MATLAB)