Download Computer Vision Systems for the Blind and Visually Disabled.

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Optical illusion wikipedia , lookup

Transcript
What is Vision?
Why is it Hard?
Alan L. Yuille.
UCLA.
The Purpose of Vision.
“To Know What is Where by Looking”.
Aristotle. (384-322 BC).
 Information Processing: receive a signal by
light rays and decode its information.
 Vision appears deceptively simple, but this
is highly misleading.

What can humans see?

Rich description: fox, tree trunk, grass and
background twigs.
 Also: shape of fox’s legs and head, its fur, what
it is doing? old or young? and more.
 Local regions (B,C,D) are highly ambiguous.
Local regions are ambiguous
Airplane
Car
Boat
Sign
Building
Why can Humans see so well?

Humans appear understand images effortlessly.
But this is only because of the enormous amount
of our brains that we devote to this task.
 It is estimated that 40-50% of neurons in the
cortex are involved in doing vision.
 Humans are very visual. We get much more
information from our eyes than other animals.
Vision and the Brain
Half the Cortex does Vision
But human vision is not perfect

Human perception is often incorrect.
 Visual illusions suggest that perception is
often a reconstruction, or even a controlled
hallucination.
 Here are some illusions.
Ames Room
Perspective Cues
Are these lines horizontal?
Which square is brighter?
Visual Illusions

The perception of brightness of a surface,
 or the length of a line depends on context.
 Not on basic measurements like:
 the no. of photons that reach the eye
 or the length of line in the image.
Perception is Inference

Helmholtz. 1821-1894.
 “Perception as Unconscious Inference”.
So why is vision hard?

Vision is an inverse inference problem.
There is a highly complex process that
generates the image.
 This process is studied in Computer
Graphics. It models objects, light sources,
and how they interact.
 Vision must invert this image formation
process to estimate the “causal factors” –
objects, lighting, and so on.
Vision as Inverse Inference:
Inverting image formation.
Inverse Problems are often hard

There are an infinite number of ways that
images can be formed.
 Which ways are most plausible?
.
Vision and Bayes
Bayes’ Theorem gives a procedure to solve
inverse inference problems.
 It states that we can infer the
state S of the world from the
observed image I by,

P(S|I) = P(I|S)P(S)/P(I).
 But doesn’t tell us how to
 Specify these distributions.

Complexity:
The Fundamental Problem?

The set of all images is almost infinite
(Kersten 1987).
 Estimate that only a tiny fraction of all
possible images have been seen by humans
(even small images 10x10).
 The no. of objects is large – 30,000?
 The no. of scene types is large – 1,000?
 Hence I and S are extremely complex.
Variability and Ambiguity

Variability: Images of similar objects
(bicycle) can be very different (left, center).
 Ambiguity: Images of different objects can
be identical (right).
Complexity, Variability, Ambiguity

What is the space of all possible images?
 It is only now becoming possible to explore
this questions – (technological advances).
 Large datasets – 20,000 images (Pascal),
1,000,000 (ImageNet).
 But are these datasets big enough for results
on them to apply to other images?
Datasets: Promise and Peril

Vision evaluates performance of algorithms
on benchmarked datasets.
 But results on these datasets may fail to
generalize to other images.
 Datasets need to be representative of the
complexity of the real world.
 Tyranny of Datasets. Unrepresentative
datasets. Limited visual tasks.
Dataset Examples
Pascal (top). 20,000 images. 20 objects.
ImageNet (bottom) 1,000,000 images. 1,000 objects.
Brief History of Vision

The difficulty of vision became clear in the
1960’s,70’s when Artificial Intelligence
researchers tried to get computer programs to
interpret images.
 Initially AI researchers thought vision was easy.
“Solve vision in a summer”.
 Researchers gradually started realizing that vision
was extremely hard. Much harder than Chess.
.
Big Picture Theories of Vision.
Work in the 1980’s in vision was largely
influenced by big picture theories.
e.g., David Marr’s Vision (1982).
 The theories were interesting, but practical
results were limited.
 Progress started being made by breaking up
vision into subproblems. Inventing or
borrowing mathematical tools for these
problems.

Techniques used in Vision

Multi-Disciplinary set of tools.
 Linear and Non-linear filtering.
 Variational Methods.
 Probabilities on Graphs.
 Perspective Geometry.
 Differential Geometry.
 And many more. We will introduce many of
these at this summer school.
Vision as a Bag of Tricks?

Dividing vision up into sub-problems
enabled progress to be made.
 But lead to a situation where vision was
seen as an enormous bag of tricks.
 “What papers should I read in computer
vision? There are so many, and they are so
different.” Chinese Graduate student.
 Attempts are being made – summer schools,
Wikivision project – to give a core and
foundation for vision,
The mathematical richness of
vision

The complexity and ambiguity of vision is a
challenge.
 Vision problems (broadly defined) have
attracted the attention of learning
mathematicians (e.g., Field’s medalists).
 David Mumford, Shing-Tung Yau, Maxim
Kontsevich, Stephen Smale, Terrence Tao.
Relationships to other
Disciplines

Computer vision relates closely to
 Image Processing
 Machine Learning.
 Partially covered in this school.
 Artificial Intelligence.
 Robotics.
 Study of Biological Vision Systems –
Psychology and Neuroscience.
Imaging Devices

In this school we will be dealing with
“natural images” of everyday scenes.
 But the methods used can be applied to
other imaging modalities.
 Medical images: fMRI, EEC,
 Astronomial Images:…
 Ever-increasing number of imaging devices.
Need a science of visual images.
Structure of the School

Vision can be broken down into low-, mid-,
and high-level vision (very roughly).
 Low-level vision – local image operations
which have limited knowledge of the world.
 Mid-level vision – non-local operations
which know about surfaces and geometry.
 High-level vision – operations which know
about objects and scene structures.
Low-level vision

Image processing.
 Filtering, denoising,
 enhancement.
 Edge detection.
 Image segmentation.
 Right: ideal segmentation
 (followed by labeling).
Mid-Level Vision

Estimation of 3D surfaces:
 Binocular stereo,
 structure from motion,
 other depth cues
 (e.g., perspective, shading,
 texture, focus).
 Fig: Image, Depth (blue to red),
 Segmentation.
Mid-Level Vision: Grouping

Kanisza. Humans have a strong tendency to
group image structures as surfaces.
High-Level Vision

Object detection and Scene Understanding.
 Example: detect objects in Pascal.
High-Level Vision: parts

Detect parts of objects:
High-Level Vision – AI.

Rich Explicit Representations enable:
 Understanding of objects, scenes, and events.
 Reasoning about functions and roles of objects, goals
and intentions of agents, predicting the outcomes of
events. SC Zhu – MURI.
Visual Turing tests

Vision algorithms that have similar
properties to those of humans:
 (i) flexible, adaptive, robust
 (ii) capable of learning from limited data,
ability to transfer,
 (iii) able to perform multiple tasks,
 (iv) closely coupled to reasoning, language,
and other cognitive abilities.
Grammars, Compositionality,
and Pattern Theory.

How to put everything together?
 How to address the complexity problem of
vision.
 Pattern theory offers suggestions:
 Mumford and Desolneux 2010.
 Describe the class of patterns that can
happen in images using grammars.
Grammars and Pattern Theory

Parse images by decomposing them into
elementary image patterns.
 Inverse inference on a grammar for
generating images.
The next few lectures

The next few lectures will concentrate on
basic low-level vision tasks:
 Filtering – linear and non-linear.
 Segmentation. Variational and probabilistic
methods.
Conclusion





Vision is difficult and challenging. Humans
devote half of their cortex to doing it.
It is difficult because it is inverse inference in
an extremely complex domain.
It involves a great variety of mathematical
and computational techniques.
Low-, Mid-, High-Level taxonomy.
Pattern Theory and Grammars.