THE INFLUENCE OF COMPLEXITY ON THE DETECTION
OF CONTOURS
by
JOHN WILDER
A dissertation submitted to the
Graduate School—New Brunswick
Rutgers, The State University of New Jersey
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
Graduate Program in Psychology
Written under the direction of
Dr. Jacob Feldman
and approved by
New Brunswick, New Jersey
October, 2013
ABSTRACT OF THE DISSERTATION
The influence of complexity on the detection of contours
By JOHN WILDER
Dissertation Director:
Dr. Jacob Feldman
Detecting objects in visual scenes is an important function of the visual system. Studies
of contour detection and contour integration have answered many questions about the
human visual system, but the role of contour geometry is not well understood. This
thesis considers the problem of contour detection as a Bayesian decision problem. I
begin by describing a generative model for natural contours. Bayesian arguments
predict that simple contours (high probability under the generative model) should
be easier to detect than more complex (lower probability) ones. In the case of open
contours (Experiments 1 and 2), a complexity measure follows from a well-established
contour-generating model. For closed contours, which have been far less studied,
complexity measures require a novel model that involves the shape of the region
enclosed by the contour. The results of closed contours (Experiments 3 and 4) show
that the complexity of the contour and the complexity of the shape of the bounded
region jointly affect the ability of human observers to detect the contour in a noise field.
In summary, contour integration has been treated mainly as a local grouping problem, but these results suggest that there is an important role for global factors in
detection. Additionally, while the mathematical framework for measuring complexity
was used here to study contour detection, it is also general enough to be useful in all
areas of pattern detection where an explicit generative model is defined.
Acknowledgements
While working on this dissertation I have received help and support from many people.
I wish to say thank you to them all.
My fellow graduate students and lab members have supported and encouraged
me, and provided opportunities for fun times and relaxation. There are too many
people to name here, but a special thanks to everyone who attended CAST meetings
and my team mates on Psy Jung. Thanks to Min Zhao for all her help with the kids.
A heartfelt thank you to the staff of the Psychology Department and the Center
for Cognitive Science. Without the help of Anne Sokolowski, Donna Tomaselli, John
Dixon, Sue Cosentino, and JoAnn Meli I would not have been able to navigate the
Rutgers bureaucracy; they made sure I was registered for classes, had insurance
coverage, and was getting paid.
Thank you to all my committee members: Jacob Feldman, Manish Singh, Eileen
Kowler, and Ahmed Elgammal. Their input was useful in making this a dissertation
of which I am very proud.
Thanks to both Jacob and Eileen for their extensive work improving my writing
skills. The difference between my writing when I began my graduate career and now is
astonishing. Thank you for helping develop this skill that is necessary for succeeding
in a research career.
Thank you to Manish Singh, for all the valuable help debugging research ideas
and deciding the appropriate statistical methods to use in my many research projects.
Thank you to Eileen Kowler, who has not only spent an abundance of time
helping with my intellectual development, but also helped me develop presentation
skills, introduced me to many important people in the field, and helped make it easy
to balance work and family.
Thank you to Jacob for all of the assistance in developing research ideas, designing
experiments, finding relevant literature, encouraging me along the way, and discussing
what movies and TV shows I should be watching.
Thank you to my family, especially Kristina. It is for you I have worked so hard.
Dedication
This dissertation is dedicated to Kristina, Brigid, and Baby Stu.
Kristina, thank you for your help and support throughout graduate school, for
listening to research ideas, and participating in experiments.
Brigid, thank you for all of the fun and games, reading stories, and thank you for
behaving nicely when I was busy writing and couldn’t play.
Baby Stu, thank you for waiting to be born until after I completed this dissertation,
and thank you for all the fun I know we will have as you grow up.
Table of Contents
Abstract
Acknowledgements
Dedication
List of Figures
1. General Background
1.1. Contour Integration
1.2. Natural Images
1.3. Open Questions
1.4. Dissertation Overview
2. Open Contours
2.0.1. Smooth contours
2.1. Bayesian model
2.2. Experiment 1
2.2.1. Method (Subjects; Procedure and Stimuli)
2.2.2. Results and Discussion
2.3. Experiment 2
2.3.1. Method (Subjects; Procedure and Stimuli)
2.3.2. Results and Discussion
2.4. General discussion
2.5. Conclusion
2.6. Open vs Closed Contours
3. Closed Contours
3.1. Experiment 3: Natural Shapes
3.1.1. Method (Subjects; Stimuli; Design and Procedure)
3.1.2. Results
3.2. Modeling
3.2.1. Contour Complexity
3.2.2. Shape Complexity Measures
3.3. Experiment 4: Experimentally Manipulated Shapes
3.3.1. Methods (Subjects; Stimuli; Design and Procedure)
3.4. Results
3.5. Discussion
4. General Conclusions
4.0.1. Conclusion
Appendix A. Modeling Approaches
A.1. Spatial Frequency
A.2. Theoretical Decision Model
A.3. Realistic Front End
List of Figures
2.1. Sample stimulus and blown-up patch from the stimulus image
2.2. Example stimulus in Experiments 1 and 2
2.3. Individual subjects’ performance as a function of complexity in Experiment 1
2.4. Experiment 1 data, combined across subjects; performance as a function of complexity
2.5. Individual subjects’ performance as a function of complexity in Experiment 2
2.6. Experiment 2 data, combined across subjects; performance as a function of complexity
2.7. Individual subjects’ performance as a function of the location of the complexity in Experiment 2
2.8. Experiment 2 data, combined across subjects; performance as a function of the location of the complexity
3.1. Examples of natural shapes, from which contours would be extracted and embedded in noisy images
3.2. Example natural closed contour embedded in noise
3.3. Combined subjects’ performance as a function of DL(CONTOUR); Experiment 3 (Natural Shapes) data
3.4. Individual subjects’ performance as a function of DL(CONTOUR); Experiment 3 (Natural Shapes) data
3.5. Three bears example of the MAP Skeleton
3.6. A graphical depiction of DL(SKELETON) and DL(MISFIT)
3.7. Combined subjects’ performance as a function of DL(MISFIT); Experiment 3 (Natural Shapes) data
3.8. Individual subjects’ performance as a function of DL(MISFIT); Experiment 3 (Natural Shapes) data
3.9. Combined subjects’ performance as a function of DL(SKELETON); Experiment 3 (Natural Shapes) data
3.10. Individual subjects’ performance as a function of DL(SKELETON); Experiment 3 (Natural Shapes) data
3.11. Examples of shapes used in Experiment 4 (experimentally manipulated shapes)
3.12. Combined subjects’ performance as a function of the manipulated contour noise variable; Experiment 4 (Manipulated Shapes) data
3.13. Combined subjects’ performance as a function of the manipulated axial structure variable; Experiment 4 (Manipulated Shapes) data
3.14. Combined subjects’ performance as a function of DL(CONTOUR); Experiment 4 (Manipulated Shapes) data
3.15. Combined subjects’ performance as a function of DL(MISFIT); Experiment 4 (Manipulated Shapes) data
3.16. Combined subjects’ performance as a function of DL(SKELETON); Experiment 4 (Manipulated Shapes) data
A.1. Power spectrum of stimulus image
A.2. Distributions of the number of straight continuations
A.3. The response of a vertical Gabor filter to an image from Experiment 1
A.4. The response of a vertical surround-suppression filter to an image from Experiment 1
A.5. The response of a vertical filter that ignores the surround to an image from Experiment 1
1. General Background
In order to properly function in the environment we must be able to move around
without running into objects and without being run over by other moving objects. We
need to decide which objects in the environment are okay to eat, and which ones are
not okay to eat. In order to eat the objects that are edible, we must be able to grasp the
objects so they can be moved to our mouth. Additionally, we need to decide whether
or not an object in our environment is likely to eat us. All of these decisions rely on
accurately perceiving the properties of the relevant objects. The human visual system,
however, does not receive input in a form where all of the objects are segmented,
localized, and labeled. The visual system is faced with the problem of extracting
objects from cluttered scenes and correctly representing the properties of the extracted
objects.
The input of the visual system is the image on the retina: a set of responses
from individual photoreceptors that are completely ungrouped. Finding the correct
grouping of the individual elements is a difficult problem. There has been a long
history of attempts to describe how the visual system groups individual elements and
combines them in order to detect the objects in the environment. Unfortunately, this
problem remains unsolved for both the human visual system and computer vision.
The grouping of elements in the visual field has been well studied since the
time of the Gestalt psychologists in the early to mid 1900s. The Gestalt Psychologists
described relationships between individual elements that result in those elements being
perceived as a whole object instead of simply individual elements (Wertheimer, 1958),
famously noting that the whole object is different from the sum of the individual parts
(Koffka, 1935). Their theories were descriptive in nature, explaining what contributes
to grouping, but were not quantitative, lacking rigorous detail about the mathematics
behind the different grouping cues.
Fast-forwarding to the latter half of the century, Field, Hayes, & Hess (1993)
specifically looked at how individual elements of the visual field are integrated into
a single contour and they proposed a model of the brain mechanisms to perform the
integration, allowing for concrete predictions about which elements will be grouped.
1.1 Contour Integration
When attempting to detect the presence of an object, the observer must group the
individual components of the visual field into a representation of that object. In order
to decide what is (or is not) an object the brain must find the boundary between the
object and other objects or the background. The first step in this process is finding
candidate edge elements, followed by the grouping of the individual elements that
make up the bounding contour of the object. If individual elements at the boundary
are strongly grouped the visual system can infer that they are all part of the same
contour, and detection will succeed, otherwise it will not.
In order to discover how the brain detects contours we must understand both
how the individual elements are encoded and how the brain integrates those elements.
De Valois, Albrecht, & Thorell (1982a,b), Field & Tolhurst (1986) and Movshon, Thompson, & Tolhurst (1987) all studied the properties of the primary visual cortex of both
cats and monkeys. The cells from which they recorded tended to be orientation selective, and localized in both space and frequency. Jones & Palmer (1987), motivated
by their findings, suggested that a good approximation of the response properties of
these cells is a sinusoidal grating under a Gaussian envelope (a Gabor filter). Results
such as these are one of the justifications for using Gabor stimuli in studies of contour
integration.
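For concreteness, such a patch can be generated directly from its definition. The sketch below is illustrative only; the size, wavelength, and envelope parameters are arbitrary choices, not values used in any of the studies cited above.

```python
import numpy as np

def gabor(size=31, wavelength=8.0, theta=0.0, sigma=4.0, phase=0.0):
    """A Gabor patch: a sinusoidal grating under a Gaussian envelope.
    All parameter defaults are illustrative, not taken from the
    experiments discussed in the text."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate the coordinate frame so the grating has orientation theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    grating = np.cos(2.0 * np.pi * xr / wavelength + phase)
    return envelope * grating

patch = gabor()
print(patch.shape)   # (31, 31)
```

Varying theta orients the patch, which is what lets an array of such patches implicitly trace out a contour.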
Field et al. (1993) showed subjects an image containing many Gabor patches
(i.e. images of 2D Gabor filters). Some of the Gabor patches were positioned in such a
way that the visual system would group them into a single contour. Since Field et al.
(1993), there have been numerous follow-up studies of human contour integration,
which will be described below.
In order to detect a contour in a field of Gabor patches, the visual system must
process the individual Gabor patches and group them into a chain of elements. Field
et al. (1993) proposed a model that could perform this grouping task, the association
field model. Their model groups elements that are parallel to the contour, where the
change in the path angle is less than 60 degrees, and where the separation between
the elements is less than 7 times their width. The association field model, like the
ideas of the Gestalt psychologists, suggests that certain properties are necessary for
the grouping of similar elements. Unlike the work of the Gestalt psychologists, the
association field model is able to make concrete, quantitative predictions about which
Gabor patches in a larger field will be grouped. In order to be grouped, elements must
have similar orientations (like the Gestalt law of similarity), the elements must be near
to each other (the law of proximity), and the turning angle between elements must
be small (similar to the laws of continuity and common fate) (Field et al., 1993; Hess,
Hayes, & Field, 2003).
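These constraints can be summarized in a toy grouping predicate. This is only a sketch of the qualitative thresholds described above (60 degrees, 7 element widths); it is not the association field model of Field et al. (1993) itself, and the function and parameter names are mine.

```python
import math

def groupable(p1, theta1, p2, theta2,
              max_turn_deg=60.0, max_sep_factor=7.0, width=1.0):
    """Do two oriented elements at positions p1, p2 (orientations theta1,
    theta2 in radians) satisfy the proximity, similarity, and
    good-continuation constraints sketched in the text?"""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    if math.hypot(dx, dy) > max_sep_factor * width:   # proximity rule
        return False
    path_angle = math.atan2(dy, dx)

    def angle_diff(a, b):
        # orientations are undirected, so compare modulo 180 degrees
        d = abs(a - b) % math.pi
        return min(d, math.pi - d)

    limit = math.radians(max_turn_deg)
    # both elements must lie roughly along the path joining them
    return (angle_diff(theta1, path_angle) < limit and
            angle_diff(theta2, path_angle) < limit)

print(groupable((0, 0), 0.0, (3, 0.5), 0.1))   # True: near and nearly collinear
print(groupable((0, 0), 0.0, (20, 0), 0.0))    # False: too far apart
```

A binary predicate like this makes concrete the limitation discussed later in this thesis: it says which pairs group, but nothing about how detectable the resulting contour is.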
The polarity or color of the individual Gabor patches and the phase of those
patches need not be the same in subsequent elements in order to be integrated in a
single chain of elements (McIlhagga & Mullen, 1996; Hess & Dakin, 1997; Dakin & Hess,
1998; Field, Hayes, & Hess, 2000). Additionally, while identical orientations produce
the strongest grouping, elements whose orientations are perpendicular to one another
still group more strongly than elements with random orientations (Bex, Simmers, &
Dakin, 2001; Marotti, Pavan, & Casco, 2012).
All of the contour integration studies discussed above use open contours, that is,
a chain of elements that never crosses itself. There has been less work on the detection
of contours that touch at their ends resulting in a bounded interior region, in other
words, closed contours (Kovacs & Julesz, 1993; Elder & Zucker, 1993; Pettet, McKee, &
Grzywacz, 1998; Braun, 1999; Tversky, Geisler, & Perry, 2004; Mathes & Fahle, 2007).
This is surprising because closed contours are likely more behaviorally relevant than
open contours. The visual system’s preference for closed contours is so strong that even
when the visual system encounters an open contour it seeks a perceptual explanation
that allows it to infer a closed contour (Kellman & Shipley, 1991). With the exception
of Tversky et al. (2004), the closed contour integration literature supports the Gestalt
law of closure. Gabor patches that connect to form a closed shape are
more easily detected (Kovacs & Julesz, 1993; Pettet et al., 1998; Braun, 1999). Contrast
sensitivity on the interior of a shape is enhanced at certain locations (Kovacs & Julesz,
1994). Also, closed shapes are more easily discriminated (Elder & Zucker, 1993) and
remembered (Garrigan, 2012) than non-closed shapes. The mechanisms that result in
this closure benefit are still not understood.
Field et al. (1993) was motivated by a desire to understand the mechanisms
in the early visual system that integrate individual elements into a whole. Models
of contour integration explain what the visual system does. These models do not
answer questions about why the visual system might have evolved to work in this
way, nor do they tell us how efficiently the visual system solves these problems, that
is, how close the system comes to optimal contour detection. Field (1987) measured
statistical regularities in natural images, and spawned an area of research that looks at
how the properties of our environment relate to the design of the visual system and if
the visual system efficiently represents those regularities.
1.2 Natural Images
Both Brunswik & Kamiya (1953) and Gibson (1966) stressed the importance of understanding the environment before we are able to fully understand human cognition and
perception. Understanding the environment and its relationship to the visual system
was the motivating idea for the study of natural image statistics. By understanding the
structure of the environment we can measure how efficient our visual systems are at
representing that environment. Additionally, knowledge of the statistical regularities
in the environment can lead to a better understanding of the driving forces in the
evolution of the visual system.
Field (1987) showed that it is possible to understand the structure of natural
images, and that the Gabor filter model of early visual cortex is an efficient way to
represent natural images. In other words, the structure of the human visual system is
well matched to the structure of natural images. Olshausen & Field (1996a,b) trained
a system to represent natural images with constraints that the representation must be
sparse and accurate. The filters the system learned to use were very similar to Gabor
filters, again showing that the structure of natural images lends itself nicely to a
representation based on a bank of Gabor filters. A set of Gabor filters is an efficient
way to represent natural images, suggesting a possible reason why the visual system
developed to have receptive fields with similar properties to Gabor filters. Is there
similar evidence to support the association field type models of integrating individual
Gabor elements into a perceived whole?
Elder & Goldberg (1998) also measured the statistics of images; specifically
statistics that relate to grouping rules based on proximity, good continuation, and
brightness similarity. These three grouping rules were used to define a probability
distribution over contours. Contours that follow all of the grouping rules have high
likelihoods, and contours that do not meet the grouping rules have low likelihoods.
Elder & Goldberg (1998) also found that the contours identified by human observers
tended to have high likelihood under the model based on the three grouping rules.
Geisler, Perry, Super, & Gallogly (2001) continued this line of work, finding similar
results. They showed that contours in natural images tend to be smooth (the turning
angle between elements is small), the edge elements tend to be close in proximity,
and the elements tend to be co-circular (so that elements lie parallel to the contour
and have orientations similar to their neighbors). Geisler et al. (2001) also
tested human observers’ abilities to detect contours in a field of random edge elements.
The human observers were more likely to detect the contours that were more likely to
occur in natural images. Taken together, the results of Elder & Goldberg (1998) and
Geisler et al. (2001) show that the edge elements belonging to the same contours in
natural images are also integrated into a single contour by human observers, whose
performance was shown, by Field et al. (1993), to be well explained by the association
field model. In other words, the association field model explains human contour
detection because the visual system is well matched to the environment in which it
evolved.
1.3 Open Questions
Historically, the work on contour integration in psychology and neuroscience primarily
focuses on explaining what the visual system does. In contrast, natural image statistics
are measured in order to answer questions about why the system works this way, under
an assumption that the system is near optimally evolved to match the properties of the
environment. The work on natural image statistics has revealed important information
about the structure of the environment. There is, however, a weak point in the studies
of natural image statistics that were discussed above. In both Elder & Goldberg (1998)
and Geisler et al. (2001) some of the statistics were collected from contours traced by
observers. This weakens the overall conclusions, which reduce to saying that human
subjects are good at detecting the contours in natural images that other human subjects
detected.
These findings could be strengthened by giving an additional reason why the human
visual system solves the contour detection problem using the grouping rules found in
the literature described above, which is one goal of this thesis.
Open contour detection has been extensively studied, but it is not clear how
frequently we encounter open contours in our environment. Objects can be segmented
from the background for further analysis only after the bounding contour of the object
is detected. This bounding contour will be a closed contour. Closed contours are the
contours that define the objects in our environment and as such are the contours most
relevant for human behavior. Because closed contour detection is less well studied it
is also less well understood. Specifically, it is not known if local interactions between
elements can account for closed contour detection, or if there are more global properties
involved. A second goal of this thesis is to investigate the role of local and global effects
in contour detection.
The bounded region inside a contour has a shape. Perhaps one of the reasons
why closed contour detection is not well understood is because the detection may not
only be affected by properties of the contour but by properties of the shape as well.
In order to account for the effect of shape, the theory would need a model of shape
in addition to the model of contour integration. Is there a representation of shape
that contains the psychologically meaningful information that is useful for contour
detection? A third goal of this thesis is to investigate how shape might be represented
in a meaningful way for a detection task.
1.4 Dissertation Overview
The approach in this thesis is to explain why the system functions the way it does by
first seeking to understand the nature of the decision problem facing the system. I
will show that the design of the system is not simply due to the structure of natural
contours, but due to the mathematics of the decision problem.
Chapter 2 investigates the detection of open contours (i.e. contours that do not
cross or close on themselves). A generative model of natural contours is proposed.
This generative model is used as a basis for a complexity measure, which is in turn
shown to be related to the detectability of the contours. The complexity of the entire
contour is important, not simply local points of high complexity. Using the complexity
measure I show that by setting up the detection problem as a decision between two
alternatives (seeing noise or seeing a contour) a decrease in performance with an
increase in curvature is expected.
Chapter 3 investigates the detection of closed contours. Through two experiments, one using natural contours and the other using experimentally manipulated
contours, we show that the detectability of the contours cannot be completely explained
by the contour complexity measure used for open contours. Additional complexity
measures are introduced that involve the complexity of the bounded interior region,
as opposed to simply the bounding contour. I explain why the MAP skeleton representation of shape (Feldman & Singh, 2006) is a principled choice for the basis of the
shape complexity measure. In order to explain our subjects’ detection results, both the
complexity of the contour and the complexity of the shape of the bounded region must
be considered.
These results suggest that, when performing a contour detection task, the visual
system integrates information across a larger region of space, rather than considering
only a small neighborhood near each contour point. Additionally, the mathematical
structure of the detection problem reveals a principled reason why our visual system
evolved grouping mechanisms that prefer less complex contours.
2. Open Contours
The detection of coherent objects amid noisy backgrounds is an essential function of
perceptual organization, allowing the visual system to distinguish discrete whole forms
from random clutter. Psychophysical aspects of this process have been extensively
studied using contour detection tasks, in which subjects are asked to detect coherent
chains of oriented elements amid background fields containing randomly oriented
elements (Field et al., 1993; Kovacs & Julesz, 1993; Hess & Field, 1995; Pettet et al.,
1998; Hess & Field, 1999; Field et al., 2000; Geisler et al., 2001; Yuille, Fang, Schrater,
& Kersten, 2004). From this literature we now know that cells in the visual cortex are
able to transmit the information contained in natural images efficiently (Field, 1987),
that closed contours are more easily detected than open contours (Kovacs & Julesz,
1993), that contours can be integrated across depths (Hess & Field, 1995), that large
curvature discontinuities disrupt detection (Pettet et al., 1998), that the tokens that are
integrated do not need to appear identical (Field et al., 2000), and that human detection
performance is close to ideal for contours whose properties are similar to those found
in natural images (Yuille et al., 2004).
But despite this extensive literature, the computational processes underlying
contour detection are still poorly understood. Field et al. (1993) showed that contours
are more difficult to detect when the turning angles, α, between consecutive elements
are large, an effect widely interpreted as implicating lateral connections between oriented receptive fields in visual cortex, the “association field.” This preference for small
turning angles is sometimes expressed as a preference for smooth curves, because turning angle can be thought of as a discretization of the curvature of an implied underlying
contour. (Curvature κ is the derivative of the tangent direction with respect to arclength s;
the turning angle α is the angle by which the tangent direction changes between samples
separated by Δs, i.e. α ≈ κΔs.) The association field is useful in describing which elements are
integrated into a contour. However, it is a qualitative model that can be thought of as
simply separating contours into two groups, those that are integrated and those that
are not. It says nothing about the differences between two contours that
both have been integrated, nor about the total integrated complexity of the entire contour,
since it considers only the relationships between neighboring elements. A number of more recent
studies have confirmed that, broadly speaking, detection performance declines with
larger turning angles (Ernst, Mandon, Schinkel-Bielefeld, Neitzel, Kreiter, & Pawelzik,
2012; Geisler et al., 2001; Hess et al., 2003; Pettet et al., 1998). The more the curve “zigs
and zags,” the less detectable it is.
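The turning angles themselves are easy to compute from a discretely sampled contour. The sketch below is illustrative only (the coordinates are made up, and the function name is mine, not from the literature); it shows how a jagged contour accumulates more total turning than a smooth one:

```python
import numpy as np

def turning_angles(points):
    """Turning angle at each interior sample of a polyline contour:
    the change in tangent direction between successive segments,
    wrapped to [-pi, pi)."""
    pts = np.asarray(points, dtype=float)
    seg = np.diff(pts, axis=0)                    # chord (tangent) vectors
    direction = np.arctan2(seg[:, 1], seg[:, 0])  # tangent direction at each segment
    turn = np.diff(direction)
    return (turn + np.pi) % (2.0 * np.pi) - np.pi

# a gently curving contour "zigs" far less than a jagged one
smooth = [(0, 0), (1, 0.1), (2, 0.3), (3, 0.6)]
jagged = [(0, 0), (1, 1), (2, -0.5), (3, 1)]
print(np.abs(turning_angles(smooth)).sum() < np.abs(turning_angles(jagged)).sum())
```

Summing the absolute turning angles is only one crude summary of how much a contour "zigs and zags"; the probabilistic complexity measure developed below weighs each angle by its likelihood instead.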
Nevertheless a principled quantitative model of the decline in performance as
a function of turning angle, predicting exactly how performance is affected by contour
geometry, and why, is still elusive. In part this reflects the lack of an integrated
probabilistic account of the decision problem inherent in contour detection. Below
I aim to take steps towards such a model by casting the contour detection problem
as a Bayesian decision problem, in which the observer must decide based on a given
arrangement of visual elements whether a “smooth” contour is actually present. To
this end I introduce a simple probabilistic model of smooth contours, derive a Bayesian
model for detecting them amid noisy backgrounds, and compare the predictions of the
model to the performance of human subjects in a simple contour detection task.
2.0.1 Smooth contours
To model contour detection as a decision problem, we need an explicit probabilistic
generative model of contours, i.e. a model that assigns a probability to each potential
sequence of turning angles. The association field and similar models assume simply
that smooth contours consist of chains of small (i.e., relatively straight) turning angles,
without any explicit probabilistic model. A more specific probabilistic model of turning
angle has been proposed and developed in Feldman (1995, 1997); Feldman & Singh
(2005); Singh & Fulvio (2005, 2007). In this model, I assume that turning angles along a smooth contour deviate only slightly from zero (collinearity), with random deviations from collinearity that are distributed according to a von Mises distribution of turning angles,

p(α|C) ∝ exp(β cos α).    (2.1)
This equation defines the likelihood model for angles under the contour (C) hypothesis,
which indicates the probability of each particular turning angle if this hypothesis
holds. The von Mises distribution is similar to a Gaussian (normal) distribution,
except suited to angular measurements, with the parameter β acting like the inverse
of the variance. In practice it is often more mathematically convenient to work with a Gaussian distribution of turning angle,

p(α|C) = (1/(σ√(2π))) exp[−(1/2)(α/σ)^2],    (2.2)

which is nearly identical numerically to the von Mises for small angles. (The two distributions' Taylor series agree up to the first two terms; see Feldman & Singh, 2005.) This distribution (whether von Mises or Gaussian) captures the idea that
at each point along a smooth contour, the contour is most likely to continue straight
(zero turning angle), with larger turning angles increasingly unlikely.
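The two densities can be compared numerically. A minimal sketch (function names are mine, not from the dissertation), using the standard identification σ^2 = 1/β that makes the two distributions agree near zero turning angle:

```python
import math

def vonmises_pdf(alpha, beta):
    """Von Mises density over turning angle alpha (radians), centered on
    straight continuation (alpha = 0): p(alpha) is proportional to
    exp(beta * cos(alpha)), as in Eq. 2.1."""
    # Normalizer is 2*pi*I0(beta); I0 computed from its power series.
    i0 = sum(beta ** (2 * k) / (4.0 ** k * math.factorial(k) ** 2)
             for k in range(40))
    return math.exp(beta * math.cos(alpha)) / (2 * math.pi * i0)

def gaussian_pdf(alpha, sigma):
    """Gaussian approximation (Eq. 2.2); nearly identical for small alpha."""
    return math.exp(-0.5 * (alpha / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
```

With β = 4 and σ = 1/√β = 0.5, for example, both densities peak at zero turning angle and differ by only a few percent near the peak.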
I then extend this local model to create a likelihood model for an entire contour by assuming that successive turning angles are identically and independently distributed (i.i.d.) from the same distribution. Thus a sequence [α_i] = [α_1, α_2, . . . , α_N] of N turning angles has probability given by the product of the individual angle probabilities,

L_C = p([α_i]|C) = p(α_1|C) p(α_2|C) · · · p(α_N|C),    (2.3)
which, under the Gaussian approximation, equals

L_C = (1/(σ√(2π)))^N exp[−(1/(2σ^2)) Σ_{i=1}^N α_i^2].    (2.4)
Technically, the assumption that successive turning angles are i.i.d. means that the sequence of orientations (local tangents) along the contour forms a Markov chain (non-adjacent orientations are independent conditioned on intervening ones). Contours generated from this model undulate smoothly but erratically, turning left and right equally often, though always most likely straight ahead. This generative model can be augmented, for example to include a bias towards cocircularity (continuity of curvature; see Singh & Feldman, 2013), though in what follows we use the simpler collinear model.
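As a concrete illustration of this generative model, the following sketch (all names hypothetical) draws i.i.d. turning angles from the Gaussian approximation and accumulates them into tangent orientations, so that each orientation depends only on the previous one, as in a Markov chain:

```python
import math
import random

def sample_contour(n_angles, sigma, seed=0):
    """Sample a contour from the collinear generative model: i.i.d. turning
    angles (Gaussian approximation, Eq. 2.2) integrated into unit steps."""
    rng = random.Random(seed)
    angles = [rng.gauss(0.0, sigma) for _ in range(n_angles)]
    theta = 0.0                      # current tangent orientation
    points = [(0.0, 0.0)]
    for a in angles:
        theta += a                   # orientations form a Markov chain
        x, y = points[-1]
        points.append((x + math.cos(theta), y + math.sin(theta)))
    return angles, points
```

Small σ yields nearly straight contours; larger σ yields contours that wander erratically while still tending, at each step, to continue straight ahead.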
2.1 Bayesian model
With this simple contour model in hand, I next derive a simple model of how the
observer can distinguish samples drawn from it (that is, smooth contours) from noise.
In the experiments below, we embed dark contours generated via the above model
in fields of random pixel noise (Fig. 2.1a). Most studies of contour detection starting
with Field (1987) have used displays constructed from Gabor elements arranged in a
spatial grid, in part to avoid element density cues, and also to optimize the response
of V1 cells. However such displays are extremely constrained in geometric form,
and partly for this reason Geisler et al. (2001) used simple line segments that can be
arranged more freely, and Yuille et al. (2004) used matrices of pixels. My aim was to
achieve complete flexibility of target contour shape, so that I could study the effects of contour geometry as comprehensively as possible. So, like Yuille et al. (2004), I constructed each display by embedding a monochromatic contour (a chain of pixels of equal luminance) in pixel noise (a grid of pixels of random luminance; Fig. 2.1a). This
construction allows considerable freedom in the shape of the target contour, while still
presenting the observer with a challenging task.
I begin by deriving a simple model of how an observer can distinguish an
image that contains a target generated from the above smooth contour model from
an image that does not (pure pixel noise). The target curve consists of a chain of
dark pixels passing through a field of pixels of random luminances. Within each local image patch, a simple strategy to distinguish a target from a distractor is to consider the longest path of uniform luminance through the patch, and decide whether
the path is more likely to have been generated by the smooth curve model C, with von
Mises distributed turning angles, or is simply part of a patch of random background
pixels (Fig. 2.1b), which I will refer to as the “null” model H0 . We can use Bayes’ rule
Figure 2.1: (a) Sample image (with contour target) from experiments. (b) Blowup of
a patch of the image, showing a chain of locally darkest pixels. The observer must
decide if this path is a smooth path drawn from the von Mises model, or a random
walk resulting from the pixel noise.
to assess the relative probability of the two models, and then extend the decision to a
series of image patches extending the length of the potential contour.
Under the smooth contour model, the observed sequence of turning angles has likelihood given in Eq. 2.4, the product of an i.i.d. sequence of von Mises (or normal) angles. Conversely, under the null model, the contour actually consists of pixel noise, and each direction is equally likely to be the continuation of the path, yielding a uniform distribution over turning angle, p(α|H_0) = ε. (In the experimental displays, ε = 1/3 because the curve could continue left, straight, or right, all with equal probability, but here I state the theory in more general terms.) In the null model, as in the contour model, I assume that turning angles are all independent conditioned on the model, so the likelihood of the angle sequence [α_i] is just the product of the individual angle probabilities,

L_0([α_i]) = ε^N,    (2.5)

reflecting a sequence of N "accidental" turning angles each with probability ε.
By Bayes' rule, the relative probability of the two interpretations C and H_0 is given by the posterior ratio p(C|[α_i])/p(H_0|[α_i]). Assuming equal priors (which is reasonable in our experiments, as targets and distractors appear equally often), this posterior ratio is simply the likelihood ratio L_C/L_0. As is conventional, I take the logarithm of this ratio to give an additive measure of the evidence in favor of the contour model relative to the null model (sometimes called the Weight of Evidence, WOE):

log(L_C/L_0) = log L_C − log L_0.    (2.6)
The first term in this expression, the log likelihood under the contour model,
is familiar as the (negative of the) Description Length (DL), defined in general as the
negative logarithm of the probability:
DL(M) = −log p(M).    (2.7)
As Shannon (1948) showed, the DL is the length of the representation of a message
M having probability p(M) in an optimal code, making it a measure of complexity of
the message M. (Once the coding language is optimized, less probable messages take
more bits to express.) In this case, plugging in the expression for the likelihood of the
contour model (Eq. 2.4) (and using natural logarithms for convenience), the DL of the
contour [α_i] is just

DL([α_i]) = −ln L_C = N ln(σ√(2π)) + (1/(2σ^2)) Σ_i α_i^2,    (2.8)
and the weight of evidence in favor of the contour model is

ln(L_C/L_0) = −DL − N ln ε,    (2.9)

the negative of the DL minus a constant.
For a contour of a given length N, the only term in Eq. 2.8 that depends on the shape of the contour is Σ_i α_i^2, the sum of the squared turning angles along its length. (The N ln ε term is a constant that does not depend on the observed turning angles.) The larger this sum, the more the contour undulates, and the higher its DL. The larger the DL, the less evidence in favor of the smooth contour model; the smaller the DL, the smoother the contour, and the more evidence in favor of the smooth model.
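Equations 2.8 and 2.9 are straightforward to compute. A minimal sketch (function names are mine, not from the dissertation):

```python
import math

def description_length(angles, sigma):
    """DL of a turning-angle sequence under the smooth contour model (Eq. 2.8):
    DL = N*ln(sigma*sqrt(2*pi)) + sum(alpha_i^2) / (2*sigma^2)."""
    n = len(angles)
    return (n * math.log(sigma * math.sqrt(2 * math.pi))
            + sum(a * a for a in angles) / (2 * sigma ** 2))

def weight_of_evidence(angles, sigma, eps=1.0 / 3.0):
    """Weight of evidence ln(L_C / L_0) = -DL - N*ln(eps) (Eq. 2.9)."""
    return -description_length(angles, sigma) - len(angles) * math.log(eps)
```

For contours of equal length, only the summed squared turning angle varies, so a straighter contour has lower DL and correspondingly higher weight of evidence.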
Hence the Bayesian model leads very directly to a simple prediction about
contour detection performance: it should decline as contour DL increases. The larger
the DL, the less intrinsically detectable the contour is, and the more the contour is
intrinsically indistinguishable from pixel noise. Because this prediction follows so
directly from a simple formulation of the decision problem, in what follows I treat it as
a central empirical issue. In the following experiments I manipulate the geometry of
the target contour, specifically its constituent turning angles, and evaluate observers’
detection performance as a function of contour DL. More specifically, the data can
be regarded as a test of the specific contour likelihood model I have adopted, which
quantifies the probability of each contour under the model and thus, via Shannon’s
formula, its complexity.
As suggested by Attneave (1954), contour curvature conveys information; the
more the contour curves, the more “surprise” under the smooth model, and thus the
more information required to encode it (Feldman & Singh, 2005). But DL, Shannon’s
measure of information, is also a measure of the contour’s intrinsic distinguishability
from noise. The DL is in fact a sufficient statistic for this decision, meaning that it
conveys all the information available to the observer about the contour’s likelihood
under the smooth contour model.
Of course, as discussed above, it is well known that contour curvature decreases
detectability, and the derived dependence on the summed squared turning angles may
simply be regarded as one way of quantifying that dependence. But the above derivation is novel because (a) it shows how the specific choice of contour curvature measure,
the summed squared angle, derives from a standard contour generating model, and (b)
it shows how the predicted decrease in detectability relates to the Description Length,
which is a more fundamental relationship that generalizes far beyond the current situation. From this point of view, contour detection is simply a special case of a more
general problem, the detection of structure in noise. Generalizing the argument above,
whenever the target class can be formulated as a stochastic generative model with an
associated likelihood function, detection performance should decline as DL under the
model increases. In the experiments below, I test this prediction for the case of open
contours, that is, contours that do not cross over themselves.
2.2 Experiment 1

2.2.1 Method

Subjects
Ten naive subjects participated in the experiment for course credit in an introductory psychology course.
Procedure and Stimuli
Stimuli were displayed on a gamma corrected iMac computer running Mac OS X 10.4
and Matlab 7.5, using the Psychophysics toolbox (Brainard, 1997; Pelli, 1997).
The task was a two-interval forced-choice (2IFC) task: subjects decided whether a contour was present in the first or the second of two target images. Three masks were used: a pre-mask presented prior to the first target image, a post-mask presented after the second target image, and a mask presented between the two target images. Each
target image and mask was displayed for 500 ms, and there was a 100 ms blank period
between an image and mask or mask and image. All images subtended about 6.5 degrees of visual angle. Each image was 15 cm × 15 cm on the display. The head
position was not constrained, but was initially located at 66 cm from the display and
subjects were instructed to remain as still as possible. If necessary, during three breaks
which were evenly spaced throughout the experiment, the head was repositioned to
be 66 cm from the display.
The target images were created by randomly sampling a 225 by 225 matrix of
intensity values. Each pixel was drawn from a uniform distribution between 0 and 1.
There were two target images, in one of which a contour would be embedded. The
contour was either a long contour (220 pixels) or a short contour (110 pixels). The contour was generated by sampling sequential turning angles (θ) using a discretized approximation to a von Mises distribution. Continuing straight (θ = 0) was more common than turning right or left (θ = π/4 or θ = −π/4). Once the contour was generated it was dilated using the dilation mask ([0 1 0; 1 1 0; 0 0 0]). The dilation was performed so that a perfectly straight contour oriented horizontally or vertically was detected as easily as one oriented at an arbitrary angle, as established in a pilot study involving the experimenter. Then, so that the contour would not appear at a different scale from the random pixels in the image, the image was enlarged to 550 by 550 pixels using nearest-neighbor interpolation, so that each pixel in the smaller image became a two-by-two pixel area in the final image. Finally, the contour was embedded in one of the target images. The contour was randomly rotated and then placed at a random location in the image, with the restriction that the contour could not extend beyond the edge of the image. An example image, containing a contour, can be seen in Figure 2.2.
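The stimulus construction can be sketched roughly as follows. All names and the contrast-to-luminance mapping are my assumptions, not the dissertation's code; note also that pure two-by-two replication of a 225-pixel image gives 450 pixels, not the 550 reported above, so the sketch simply doubles whatever base size it is given:

```python
import random

def make_target_image(contour_pixels, size=225, contrast=0.85, seed=0):
    """Sketch: embed a dilated, fixed-luminance contour in uniform pixel
    noise, then upscale by nearest-neighbor (each pixel -> 2x2 block)."""
    rng = random.Random(seed)
    img = [[rng.random() for _ in range(size)] for _ in range(size)]
    # Dilation mask [0 1 0; 1 1 0; 0 0 0]: each contour pixel also turns on
    # the pixel above it and the pixel to its left.
    dilated = set()
    for r, c in contour_pixels:
        for dr, dc in ((0, 0), (-1, 0), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < size and 0 <= cc < size:
                dilated.add((rr, cc))
    for r, c in dilated:
        img[r][c] = 1.0 - contrast  # one plausible "85% contrast" dark value
    # Nearest-neighbor upscaling: 2x in each dimension.
    return [[img[r // 2][c // 2] for c in range(2 * size)]
            for r in range(2 * size)]
```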
The masks were random noise images consisting of black and white pixels. The
images were created by randomly sampling a binary matrix of size 225 by 225, and
then scaling the matrix to 550 by 550 using nearest-neighbor interpolation, just as in
the target images.
The contours embedded in one of the target images were of one of five different complexity levels and one of two lengths (110 pixels or 220 pixels). Each contour was displayed at 85 percent contrast. There were 40 images of each length and each complexity, resulting in a total of 40 × 2 × 5 = 400 trials. The luminance of the contour, image size, and contour lengths were chosen so that in a pilot study subjects performed near 75 percent correct.
Figure 2.2: An example of an image containing a contour. Here the contour is shown at maximum contrast (black) for ease of viewing; however, in the experiment the contour was at 85 percent contrast. This image is best viewed or printed at a zoom of 100 percent, as otherwise the image may be distorted by pixel averaging or interpolation.
2.2.2 Results and Discussion
Every subject showed some decrease in detection as the complexity of the stimuli increased (see Figure 2.3). Using a linear regression of the log odds of a correct answer on complexity, five of ten subjects showed a significant (p < 0.05) detection decrease in the short line condition, and five of ten showed a decrease in the long line condition. All but two of the subjects showed a significant decrease in at least one of the length conditions. Since this effect was present in each subject, the data from all subjects were combined for analysis (Figure 2.4). There was a main effect of complexity, F(4, 89) = 18.55, p = 4.07 × 10^−11; there was no main effect of line length and no interaction at the p = 0.05 level. The slopes of the best-fitting regression lines to the combined data were 0.027 for the short line and 0.02 for the long line.
These results show that our subjects are sensitive to the variability in the contours created using our generative model, as predicted by a Bayesian detection model. This model is indeed useful as the basis of a perceptually relevant measure of complexity.
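The per-subject analysis can be sketched as follows: convert accuracy in each complexity bin to log odds, then fit a least-squares line against complexity. Function names and the small-sample correction are my assumptions, not the dissertation's analysis code:

```python
import math

def log_odds(n_correct, n_total):
    """Empirical log odds of a correct answer; the +0.5/+1 correction
    (an assumed convention) avoids infinities at 0% or 100% correct."""
    p = (n_correct + 0.5) / (n_total + 1.0)
    return math.log(p / (1.0 - p))

def ols_slope(xs, ys):
    """Ordinary least-squares slope of log odds against complexity."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)
```

A negative fitted slope over complexity bins corresponds to the per-subject detection decrease reported above.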
2.3 Experiment 2
The complexity measure used in Experiment 1 depends upon integration across the entire contour, but it is possible that the detectability of a contour is based only on the local complexity at an individual contour point. The results from the first experiment could be explained by a model of detection that only looks for straight segments, in which the contour is not integrated after a deviation from straight. Under this model, the results show an effect of complexity because the probability of containing a long straight segment is higher in simple contours than in contours with a larger number of bends. To control for this possibility, Experiment 2 varies the spatial distribution of the complexity, placing 90 percent of the curvature in the first eleventh (the tip), third eleventh (between the tip and the middle), or fifth eleventh (the middle) of the contour. Because of this control, the longest straight segments will be almost the same length in contours of differing complexity. When the bending points, or complexity, are located at the tip, the contour will have the longest possible straight segment, while the shortest
Figure 2.3: For each subject, performance (log odds of a correct answer) is shown as a function of the complexity of the stimulus. The red points show detection when subjects were shown a long contour (220 pixels) and the blue points a short contour (110 pixels). A decrease in performance with increasing complexity was found for each individual and for each line length. The R^2 of the best-fitting regression line for the short contour (blue R^2) and the long contour (red R^2) are reported. Of the 20 linear fits, 10 have slopes significantly different from 0 at the p = 0.05 level.
Figure 2.4: Performance (log odds of a correct answer) is shown as a function of the complexity of the stimulus for the subjects' combined data. The red points show performance when subjects were shown a long contour (220 pixels) and the blue points a short contour (110 pixels). Error bars show one standard error. For both contour lengths there is a decrease in performance with increasing complexity.
segments would result if the complexity were located at the middle. The longest-straight-segment detection model would therefore predict better detection when the bend location is at the tip than when it is at the middle.
2.3.1 Method

Subjects
Ten naive subjects participated for course credit in an introductory psychology course. The data of two subjects were excluded because they were at chance performance for all conditions; self-reports at the end of the experiment suggested that these two subjects did not understand the task.
Procedure and Stimuli
Each trial was conducted in the same manner as in Experiment 1. The only difference in the stimulus was that the location of 90 percent of the surprisal was controlled to be at the first eleventh of the contour (the tip), the third eleventh of the contour (between the tip and the middle), or the fifth eleventh of the contour (the middle).
There were 396 trials (4 different complexities and three locations, with 36 trials per crossed condition).
2.3.2 Results and Discussion
Complexity decreased detectability, just as in Experiment 1, in all conditions for all subjects except one (Figure 2.5). Combining subject data (Figure 2.6) shows a significant main effect of complexity, F(3, 42) = 18.99, p = 6.18 × 10^−8, and no main effect of bend location, F(2, 42) = 0.94, p = 0.397.
The manipulation of the location of the complexity in the contour can be seen to have had no consistent effect in the individual subjects when looking at percent correct versus the bend location (Figure 2.7). Combining subjects' data (Figure 2.8) also shows no trend. The failure of the ANOVA to find a significant main effect of bend location
Figure 2.5: For each subject, performance (log odds of a correct answer) is shown as a function of the complexity of the stimulus. The colors denote the different bend location conditions. Dashed lines show the best-fitting regression lines. The decrease in detectability with increasing complexity found in Experiment 1 was found in all conditions, for all subjects, except for one bend location condition in one subject.
Figure 2.6: Performance (log odds of a correct answer) is shown as a function of the complexity of the stimulus for the subjects' combined data. The effect found in Experiment 1, a decrease in performance with increasing complexity, is again seen in these data.
Figure 2.7: For each subject, performance (log odds of a correct answer) is shown as a function of the bend location (the location of the majority of the curvature). The different bend locations are denoted by "T" (tip), "B" (between the tip and middle), and "M" (middle). There was no consistent decrease in detectability across subjects as the location was varied.
was not a result of finding a weak trend in the correct direction with too little data to reach significance; rather, there is no trend in the predicted direction.
These data suggest that the effect of complexity found in Experiment 1, and replicated in Experiment 2, is not due solely to the spatial distribution of the complexity throughout the contour. Humans are not simply straight-segment detectors; the complexity of the entire contour plays a role in the detectability of that contour.
2.4 General discussion
Contour curvature is well known to influence human detection of contours (Field
et al., 1993; Hess & Field, 1995; Pettet et al., 1998; Hess & Field, 1999; Field et al., 2000;
Figure 2.8: Performance (log odds of a correct answer) is shown as a function of the location of complexity of the stimulus for all of the subjects' data combined. There appears to be no effect of the location of the complexity.
Geisler et al., 2001; Yuille et al., 2004). But exactly why curvature has this effect is still poorly understood, as reflected in the lack of mathematical models that can adequately capture it. In this paper we have modeled contour detection as a statistical decision problem, showing that a few simple assumptions about the statistical properties of smooth contours make fairly strong predictions about the performance of a rational contour detection mechanism, specifically that its performance will be impaired by contour complexity. Contour complexity, in turn, involves contour curvature as a critical term (the only term that reflects the shape of the contour), thus giving a very concrete quantification of the effects of contour curvature on detection performance. The data strongly corroborate the effect of complexity on contour detection, showing strong complexity effects in almost every subject. Moreover, the spatial distribution of the complexity does not seem to determine performance, suggesting that a purely local complexity measure (such as the information/complexity defined at each contour point) will not suffice to explain performance, and that a holistic measure of contour complexity (such as contour description length) is required.
A more detailed consideration of the statistical decision problem suggests that our observers are far from optimal. To see why, consider the contour detection problem reduced to a simple Bernoulli problem. The target contour consists of a series of turns through the pixel grid, each of which can be classified as either a straight continuation (a success or "head") or a non-straight turn (a failure or "tail"). Smooth contours have a strong bias towards straight continuations (successes), translating to a high success probability, the exact value of which depends on the spread parameter β in the von Mises distribution that generates the turning angles (see Eq. 2.1), which in turn depends on the condition. In contrast, each distractor path consists of a series of random turns in which the probability of straight continuation is ε (= 1/3, as explained above; see Fig. 2.1). The observer's goal is to determine, based on the sample of turns observed, whether the sample was generated from the smooth process (i.e., a target display) or the null process (a distractor display). Because successive turning angles are assumed independent, this means that a smooth curve consists, in effect, of a sample of N (= 220 or 110 on long and short trials respectively) draws from a Bernoulli process with high success probability (0.985 or 0.964 for short and long contours respectively), while noise consists of a sample with success probability 1/3.
To make a decision in these experiments, the observer must consider the number of straight continuations observed in the smoothest contour in each display, and decide which stimulus interval was more likely to contain a sample generated by a process with high success probability (again, with the level dependent on condition). This is a (forced-choice version of a) standard Bernoulli Bayesian decision problem (see, for example, Lee, 2004). Given the values of N, β, and ε used in the experiment, this is actually an easy decision; an optimal observer with full information about the pixel content of the images in both intervals would be at ceiling in all conditions, since even the highest DL conditions have a success probability much higher than 1/3. The perfect performance predicted by this optimal observer is similar to the performance expected under association-field-style models. If the association field had complete access to the pixels in the image, it would integrate the contour on every trial. A decision procedure that simply chooses the image containing the longest of the contours found by the association field would perform perfectly, with no effect of complexity.
Obviously, our observers do not have full access to the pixel content of both images, which contain far more information than can be fully acquired under our speeded and masked experimental conditions. But no simple assumption about the restricted availability of image content in our displays predicts the particular combination of sub-ceiling performance and complexity effect that we observed. (For example, one might assume that subjects only apprehend a part of the target contour, which would account for sub-ceiling performance, but this model would then predict a smaller complexity effect than is actually observed.) That is, the observed complexity effect, though it is predicted in a general way by Bayesian arguments as explained in the introduction, is actually even stronger than simple optimal models would predict. For a more detailed discussion of the different observer models that failed to explain these results, see the appendix. I conclude, in short, that our observers are suboptimal
due to some combination of performance limitations that will need to be explored in
future studies.
It is important to understand that many conventional models of object detection from the computer vision literature cannot locate the objects in our displays at all. Many standard edge detection algorithms rely on finding strong luminance discontinuities in the image. For example, the Marr-Hildreth algorithm uses Laplacian of Gaussian (LoG) filters, due to their similarity to the response properties of cells in the LGN (Marr & Hildreth, 1980). These filters fail to respond more strongly near the contours in both experiments because the filters require a difference between the luminance in the center and the luminance in the surround. In our stimuli there is noise everywhere, and our contours have the same luminance as many pixels that are actually part of the noise. Other edge detection algorithms, such as the Canny edge detector, the Hough transform, and the Burns line finder, also look for high-magnitude image gradients, which are not present in our images (Canny, 1986; Hough, 1962; Duda & Hart, 1972; Burns, Hanson, & Riseman, 1986). These methods also frequently smooth the input image prior to detecting edges. This causes the contour to blend into the noise, making any image gradient substantially weaker.
Edges in images produce a strong component in the power spectrum perpendicular to the edge in the image. One possible explanation for our effect is that the power spectrum of images containing straighter contours is more distinguishable from the power spectrum of noise images than is the power spectrum of images with curved contours. This turns out not to be the case. The power spectrum of the images used in the experiments reveals that there is energy at all frequencies, not predominantly at low spatial frequencies as is common in natural images. This suggests that the reason standard edge-detection algorithms are unable to detect the contours is that our images have a different structure than the natural images for which these algorithms were developed. Our subjects, however, were able to accurately detect the contours, suggesting that the human visual system is doing something different from these algorithms.
Hence at the one extreme, conventional object detectors cannot perform the
task, while at the other extreme, probabilistic models with access to the full pixel
content, as well as the association field model, would be at ceiling (and not show any
complexity effects). Human observers are in between these extremes in that they are
capable of finding the target objects, but their performance diminishes with increasing
contour complexity.
2.5 Conclusion
Here I have framed the contour detection problem as a Bayesian inference problem. I
defined a simple generative model for contours, in which successive turning angles are
generated from a straight-centered distribution. The observer’s task then reduces to a
probabilistic inference problem in which the goal is to decide which of the two stimulus
images was more likely to contain a contour sample drawn from this generative model.
This framing of the problem predicts and explains key aspects of the empirical results,
in particular the influence of contour complexity on detection. Moreover, we found that the spatial distribution of complexity appears not to matter substantially; the deciding factor is the smoothness of the entire target taken as a whole. Neither of these results is predicted by standard theories of contour integration (Field et al., 1993; Geisler & Super, 2000; Geisler et al., 2001), which would always correctly detect the contour.
More fundamentally, the Bayesian model makes it clear why a diminution in
performance as curvature increases “makes sense”—it is a direct consequence of a
very general contour model and a rational detection mechanism. As argued in the
Introduction of this chapter, the problem of contour detection is really just a special
case of the more general problem of the detection of pattern structure in noise. As
shown above, a very general model of the probabilistic detection of regular structure
in noise entails an influence of target complexity, defined from an information-theoretic
point of view as the negative log of the stimulus probability under the generative
model, that is the Shannon complexity or Description Length (DL) of the pattern.
The ubiquitous influence of simplicity biases and complexity effects in perception and
cognition more generally (Chater, 2000; Feldman, 2000) can be seen as a consequence of
the general tendency for simple patterns to be more readily distinguishable from noise
than complex ones (Feldman, 2004). In past studies, contour detection has usually been
studied as a special problem unto itself whose properties derived from characteristics
of visual cortex. Instead I hope that in the future the problem of contour detection
can be treated as a special case of a broader class of pattern detection problems, all
of which can be studied in a common mathematical framework in which they differ
only in details of the entailed generative models. This observation opens the door to a
much broader investigation of perceptual detection problems encompassing detection
of patterns and processes well beyond simple contours, such as closed contours and
whole objects.
2.6 Open vs Closed Contours
This chapter showed that a Bayesian model of contour detection links contour complexity
to performance, giving a principled reason why the more curved a contour is, the more
difficult it will be to detect. The experimental data also suggested that our ability
to detect a contour depends on the entire contour, rather than on a small local region
around contour points.
In Chapter 3, I will show that the contour complexity measure defined in this
chapter easily extends to closed contours. Simpler models of contour complexity, for
example the sum of total curvature, do not easily extend to closed contours.
In addition to looking at the complexity of the closed contours, a closed contour
detection task allows for the examination of other global effects that are not present in
open contour stimuli. Specifically, I will consider the effect of global shape representations on detection.
Chapter 3 will introduce two models of complexity based on the MAP Skeleton
(Feldman & Singh, 2006). The MAP skeleton is an attractive choice due to its success
in modeling multiple perceptual phenomena. The MAP skeleton representation has
been shown to be useful when modeling human shape classification (Wilder, Feldman,
& Singh, 2011), shape similarity (Briscoe, 2008), figure/ground interpretation (Froyen,
Feldman, & Singh, 2010), and part segmentation (Feldman, Singh, Briscoe, Froyen,
Kim, & Wilder, 2013). The experiments on closed contour detection will measure whether
the global aspects of shape captured by the MAP skeleton also influence detection.
3. Closed Contours
Detecting objects in our environments is an important function of the human visual
system. In order to navigate around obstacles or to categorize an object, we must
first detect it. How humans detect edges in an image, how those edge elements are
integrated into contours, and the structure of contours in natural images have all been
extensively studied (Field et al., 1993; Geisler & Super, 2000).
From this literature we know that the natural open contours humans easily perceive
tend to be straight, and as curvature increases the probability of the contour decreases
(Elder & Goldberg, 2002; Geisler et al., 2001; Ren & Malik, 2002). We also know that
the visual system is sensitive to the structure of natural contours (Field, 1987). Natural
contours tend to be straight, so if the visual system is really tuned to the structure of
natural contours, large curvature discontinuities should disrupt detection and human
detection of natural contours should be close to ideal, as shown in Pettet et al. (1998),
Yuille et al. (2004), and in Chapter 2 of this thesis.
The detection of closed contours (contours that meet at their ends) is less well
studied. It is unknown how similar the processes for the detection of closed contours
are to the processes for detecting open contours. Finding the statistical properties
of natural open contours was motivated by the idea that the evolution of our visual
system was driven by those statistical properties (Geisler et al., 2001). Our evolution is
likely driven not just by the statistics of the natural environment, but by the statistics
of the environment that are relevant for our survival and everyday behavior. Unlike the detection of open contours, the detection of closed contours is a problem the
visual system encounters continuously, because closed contours constitute the boundaries of objects. Detecting an object is a necessary first step for important visual tasks like object
recognition and navigating around objects.
Previous work suggests that closure has a profound effect on the perception of a
contour. A contour that closes not only defines the contour, but also defines a bounded
interior region (Koffka, 1935). When separate elements of a sequence of Gabor patches
have weaker grouping signals (based on distance, curvature, orientation, etc.) they are
less likely to be integrated into a contour (Field et al., 1993). However, when the contour
closes on itself, even with weaker grouping signals, the elements are integrated into
a whole (Kovacs & Julesz, 1993; Braun, 1999; Mathes & Fahle, 2007). Tversky et al.
(2004) suggest that the improvement in the detection of closed contours can simply
be explained by the transitivity rule of Geisler & Super (2000), but it is not clear how
transitivity would explain other benefits conferred on closed contours. Judgments
about the aspect ratio of a contour are more accurate when the contour is closed than
when it is open (Saarinen & Levi, 1999). A closed contour is processed more efficiently
than an open one (Elder & Zucker, 1993). Closed contours are encoded more accurately
than open contours, which leads to an improvement in remembering and recognizing
closed contours (Garrigan, 2012). Also, Altmann, Bülthoff, & Kourtzi (2003) found that
the visual system gave special treatment to a chain of elements that could be perceived
as a shape (i.e., a closed contour) over individual elements or an open contour. Findings
similar to those of Altmann et al. (2003) suggest that the visual system's preference for closed
contours is driven by a preference for explanations of visual data that involve fewer
objects instead of an explanation where each element is a single object (Murray, Kersten,
Olshausen, Schrater, & Woods, 2002; Murray, Schrater, & Kersten, 2004; Fang, Kersten,
& Murray, 2008; He, Kersten, & Fang, 2012).
The human visual system's detection of closed contours has not been studied as extensively as its detection of open contours,
but the detection of closed contours by artificial systems has received considerable attention. This work has been predominantly
motivated by applications, for example in medical imaging analysis where it can be
used as a way to analyze the shape of the interior of arteries and the heart ventricles
(Guttman, McVeigh, & Prince, 1992; Guttman, Prince, & McVeigh, 1994), or detect the
boundaries of blood vessels (Yuan, Lin, Millard, & Hwang, 1999). Kass, Witkin, &
Terzopoulos (1988) created the active contour model, also called the snake algorithm,
which has been successfully used to detect closed contours. While active contours
often work with an edge map output by standard edge detection algorithms, newer
algorithms for active contours work directly on the original image without an edge detection step. For example, the edgeless active contour algorithm of Chan & Vese (1999,
2001) has found success at finding contours on a variety of images, even very noisy
images. Additionally, this algorithm can be used to find the boundaries between fuzzy
objects without edges, or where the boundaries are not defined by image gradients.
This is potentially useful, as the experiments described below use images of random
pixel noise with a closed contour embedded in the noise, similar to the images in open
contour experiments in Chapter 2. Standard edge detection algorithms failed to detect
the contours in these images, while human observers easily detected the contours.
This study investigates the detection of closed contours in noise. Subjects
will be shown a sequence of pixel noise images. One image will contain a closed
contour, and the other will not. The subject must decide which image contained the
contour. In Experiment 3, subjects will be shown natural contours in order to test if
the visual system is sensitive to variations in complexity that are found in the natural
environment. Experiment 4 will use experimentally manipulated contours to ensure
that our results from the first experiment generalize to shapes outside of the databases
used in Experiment 3.
The results of these experiments will help uncover the shape and contour properties that lead some contours to be easily detected and others to go
unnoticed. In order to predict how detectable a shape is, I define three complexity
measures which arise naturally from probabilistic models of contour and shape generation. The first measure is contour complexity, defined similarly to the complexity
measure in Chapter 2, with a slight modification to make it suitable for closed contours. The second and third measures reflect the complexity of the bounded interior
region: shape skeleton complexity, which is the negative log of the prior probability of the
skeleton, and shape/skeleton misfit complexity, which is the negative log of the likelihood of the shape
given the skeleton. Natural shapes are used in Experiment 3 in order to measure the
sensitivity of human observers to the contour/shape complexities that are present in
the natural environment. Natural shapes, however, may not sample the complexity
space evenly. Experiment 4 uses experimentally manipulated closed contours in order
to evenly sample the space of complexity. Standard models of contour integration
(Field et al., 1993; Geisler & Super, 2000) would integrate the contours in these stimuli
with high probability, resulting in ceiling performance with no effect of any type of
complexity; they do not explicitly integrate the probability of the entire contour and
transform this integrated probability into a complexity measure. These experiments will
investigate changes in the detectability of contours, all of which have a high probability
of being integrated at each point, and how the detection of these contours relates to
their complexity. Focusing on the three types of complexity will allow for concrete
recommendations about the types of information that should be included in contour
integration models if they are to make quantitative predictions about contour detection
performance.
3.1 Experiment 3: Natural Shapes
Previously, human observers have been shown to be sensitive to the structure of natural
shapes in a categorization task (Wilder et al., 2011) and the structure of natural-like
open contours in a detection task (Chapter 2). This study investigates the sensitivity
of human observers to the structure of natural shapes and contours in a detection task.
In this experiment subjects will detect the closed contours of natural shapes that are
embedded in a noise field. Closed contours can appear in an image displayed in one of
two intervals and the subject will respond, with a key-press, which interval contained
the image with the contour.
3.1.1 Method
Subjects
Fifteen naive subjects participated in the experiment for course credit in an
introductory psychology course.
Figure 3.1: Examples of the natural shapes, from which bounding contours would
be extracted and embedded in noisy images. The top row (a) shows a sample of the
animal shapes, and the bottom row (b) shows a sample of leaf shapes.
Stimuli
Stimuli were displayed on a gamma corrected iMac computer running Mac OS X 10.4
and Matlab 7.5, using the Psychophysics toolbox (Brainard, 1997; Pelli, 1997).
There were two types of experimental images: a noise image, and an image
with a contour embedded in noise. The contours were the bounding contours of
natural leaf or animal shapes. Example shapes are shown in figure 3.1.
Each image was a 225 by 225 matrix of intensity values. The intensities were
between 0 and 1 and were sampled from a uniform distribution. Next the image was
increased in size to 550 by 550 pixels using nearest-neighbor interpolation, so that
each pixel in the original image now occupies a two-by-two pixel area.
In one of the two images, a contour was embedded at a random location and at a
random orientation. The contour was taken from a database of
animal shapes or a database of leaf shapes. Each database had an equal probability
of being sampled, and each image within those databases had an equal probability of
being sampled. The bounding contour of the sampled shape was extracted and scaled
so that the contour would be 500 ± 25 pixels long. This was done so that embedding
different shapes in the noise fields would not result in images with a different mean
luminance. Once the contour was properly sized it was dilated using the dilation
mask ([0 1 0; 1 1 0; 0 0 0]). The dilation was performed so that a perfectly straight contour
oriented horizontally or vertically would be detected as easily as one oriented at an arbitrary
angle. The dilation mask was tested in a pilot study involving the experimenter. Finally,
the contour was embedded in the image with an intensity of 0.22, slightly darker than
mean luminance, 0.5. Because the noise images were doubled in size the dilated
contours match the scale of the noise in the images.
A sample image containing a leaf shape can be seen in figure 3.2. The noise
images are identical except that they do not contain a contour.
The experimental images were interleaved with masking images. The masks
were noise images consisting of random black and white pixels. The masks were
created by randomly sampling a binary matrix of size 225 by 225, and then scaling the
matrix to 550 by 550 using nearest-neighbor interpolation, just as in the display images.
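The noise and mask generation described above can be sketched in a few lines. This is a hypothetical Python reimplementation, not the original Matlab code; it uses the described two-by-two nearest-neighbor doubling (note that doubling a 225-pixel base image yields 450 pixels, so the exact 550-pixel figure would require a different base size or factor):

```python
import numpy as np

def make_noise_image(rng, size=225, scale=2):
    """Uniform [0, 1] intensity noise, upscaled by nearest neighbor so each
    original pixel becomes a scale-by-scale (here, two-by-two) block."""
    base = rng.uniform(0.0, 1.0, (size, size))
    return np.kron(base, np.ones((scale, scale)))

def make_mask_image(rng, size=225, scale=2):
    """Binary black/white mask image, upscaled the same way."""
    base = rng.integers(0, 2, (size, size)).astype(float)
    return np.kron(base, np.ones((scale, scale)))

rng = np.random.default_rng(0)
noise = make_noise_image(rng)
mask = make_mask_image(rng)
```

Embedding the contour would then amount to setting the dilated contour pixels of one noise image to the target intensity of 0.22.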
Design and Procedure
Each subject participated in 300 trials, with breaks after trials 100 and 200.
Each trial began with a central fixation mark. After fixating the subject pressed
a key to begin the trial. A sequence of five images was rapidly presented to the subject.
The first, third, and fifth images were the masks. The second and fourth images were
the experimental images: one noise image, and one containing a contour. The task was
a 2IFC task: subjects decided whether a contour was present in the first or second of
the two experimental images, and responded with a key press. Each mask
and experimental image was presented for 500 ms, with a 100 ms gap between images.
Images were presented at about 6.5 degrees of visual angle. Each image measured 15 cm x 15 cm on the display. Head position was not constrained, but the head
was initially placed at 66 cm from the display and subjects were instructed to remain
Figure 3.2: An example stimulus image containing a contour. A leaf has been embedded in the upper middle of the image. This example shows the contour at the lowest
luminance (0 out of 1) for the ease of the reader, though in the actual experiment the
contour was much closer to the mean luminance (0.22 out of 1). Also, the pixel noise in
this image is restricted in range: dark noise pixels were not sampled. In the
actual experiment the luminance of each noise pixel was sampled uniformly between
an example image with the full range of luminances for the noise pixels. This image is
best viewed or printed at a zoom of 100 percent, otherwise the image will be distorted
due to pixel averaging or interpolation.
as still as possible. If necessary, during the breaks, the head was repositioned to be
66 cm from the display.
3.1.2 Results
The majority of the subjects were able to perform the task very well. Two of the
15 subjects were excluded from analysis because they performed below 55 percent
correct. The remaining 13 subjects performed between 65 and 90.3 percent correct, with
a mean of 77.7 percent correct. Subjects were significantly better (t = 2.96, p=0.0068)
at detecting leaf shapes (82.8 percent correct) than animal shapes (73 percent correct).
All 13 subjects had a higher proportion of correct answers for leaves than for animals;
this trend was significant for 8 of the 13 subjects using a two-proportion z-test with
α = 0.05.
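The per-subject comparison uses a standard pooled two-proportion z-test; a minimal sketch follows (the trial counts in the usage example are hypothetical, not the actual per-subject data):

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z statistic for H0: p1 == p2,
    where k1/n1 and k2/n2 are the observed proportions correct."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical subject: 124/150 leaf trials correct vs. 110/150 animal trials
z = two_proportion_z(124, 150, 110, 150)
```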
Beyond these basic analyses we would like to know what properties of leaf
shapes made them more detectable than animal shapes when embedded in pixel noise.
Are there properties of their contours and shapes that affect how easily each contour
is detected? In order to fully understand the effect of the contour and shape, the next
section will define a measure of contour complexity (the same as the contour complexity
measure in Chapter 2 but for closed contours), and the following section will define two
measures of shape complexity. Following the explanation of the complexity measures
the contribution of each of the di↵erent complexities will be analyzed.
3.2 Modeling

3.2.1 Contour Complexity
In Chapter 2, I showed that a subject's ability to detect a contour in a noise field is
influenced by a measure of contour complexity based upon the integrated information
along the contour (Feldman & Singh, 2005).
Here I extend this model to closed contours. Again I express the detection
problem as a decision problem in which the observer must decide whether the image data are
likely due to the existence of a contour or the existence of complete noise.
First, let us begin with the generative model for contours used in Chapter 2. In
this context contours are thought of as a sequence of turning angles. The generative
model assumes that each turning angle in the sequence is an independent sample
from a von Mises distribution. A von Mises distribution is a neutral choice because,
like a Gaussian, it is a maximum entropy distribution, but, unlike a Gaussian, it
is only defined from −π to π, making it a natural choice for angles. The distribution
has two parameters, one representing the center of the distribution and one the
spread. For the derivations that follow, it will be convenient to use a Gaussian
distribution, which is justifiable because the two are mathematically very similar
(the initial two terms of their Taylor series expansions are equivalent).
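The claimed similarity is easy to check numerically: a concentrated von Mises density (large κ) is close to a Gaussian with σ² = 1/κ. A small Python check (the value κ = 20 is an arbitrary illustrative choice):

```python
import numpy as np

def vonmises_pdf(x, mu, kappa):
    # von Mises density on (-pi, pi]; np.i0 is the modified Bessel function I0
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

def gauss_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-np.pi, np.pi, 2001)
kappa = 20.0                                  # concentrated: small angular spread
vm = vonmises_pdf(x, 0.0, kappa)
g = gauss_pdf(x, 0.0, 1.0 / np.sqrt(kappa))  # matched Gaussian, sigma^2 = 1/kappa
max_diff = np.max(np.abs(vm - g))            # small relative to the peak (~1.78)
```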
Under this generative model the probability of each angle is

p(\alpha \mid \mathrm{CONTOUR}) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(\mu - \alpha)^2}{2\sigma^2} \right]    (3.1)
Since each contour point is independently sampled, the probability of the entire
sequence of turning angles (of length N) under the contour hypothesis is the product
of the probability of each angle:
L_c = p([\alpha_i]_{i=1}^{N} \mid \mathrm{CONTOUR})
    = \prod_{i=1}^{N} p(\alpha_i \mid \mathrm{CONTOUR})
    = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^{N} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (\mu - \alpha_i)^2 \right]    (3.2)
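In log form, Eqn. 3.2 is straightforward to compute. The sketch below (with μ = 0, the open-contour case, and an arbitrary illustrative σ) shows that a nearly straight sequence of turning angles is far more probable under the contour model than a wiggly one:

```python
import numpy as np

def log_likelihood_contour(angles, mu=0.0, sigma=0.2):
    """log L_c for a sequence of turning angles under the Gaussian
    approximation of the generative model (Eqn. 3.2)."""
    angles = np.asarray(angles, dtype=float)
    n = angles.size
    return (-n * np.log(sigma * np.sqrt(2 * np.pi))
            - np.sum((mu - angles) ** 2) / (2 * sigma ** 2))

# A nearly straight contour is far more probable than a wiggly one
straight = log_likelihood_contour([0.01, -0.02, 0.015, 0.0])
wiggly = log_likelihood_contour([0.5, -0.6, 0.55, -0.4])
```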
Now consider seeing a sequence of turning angles under the hypothesis that
this sequence is simply noise. Under the noise assumption there is no reason to
expect one turning angle over another, so the probability of each turning angle is set
uniformly. Since each turning angle is sampled independently, the probability of the
entire sequence is the product of the individual probabilities. In this case, since all
angles have the same probability, the result is a constant probability to the Nth power.
L_0 = p([\alpha_i]_{i=1}^{N} \mid \mathrm{NOISE}) = \epsilon^{N}    (3.3)
We can now ask whether

p(\mathrm{CONTOUR} \mid [\alpha_i]_{i=1}^{N}) > p(\mathrm{NOISE} \mid [\alpha_i]_{i=1}^{N})    (3.4)
Since the prior probability of a contour and the prior probability of noise are equal in our experiment, this simply becomes a comparison of the likelihoods:

L_c \, P(\mathrm{CONTOUR}) > L_0 \, P(\mathrm{NOISE})    (3.5)

L_c > L_0    (3.6)
An observer will decide "Contour" if L_c is greater than L_0. By moving the
likelihoods to one side of the inequality to form the weight of evidence (WOE) and then
taking logs, the decision rule changes so that the observer will decide "Contour"
if the log likelihood ratio (w) is larger than 0, and otherwise decide "Noise".

w = \log(L_c / L_0) = \log L_c - \log L_0    (3.7)
Shannon (1948) defines Description Length (DL) as

\mathrm{DL} = -\log p(M)    (3.8)

where M is a quantity being measured, and p(M) is the probability of obtaining that
measurement. In our decision problem the measurements are sequences of turning
angles, so M = [\alpha_i]_{i=1}^{N}. Using our generative model for contours (Eqn. 3.2) we see that
the complexity, or DL, of a contour is

\mathrm{DL}(\mathrm{CONTOUR}) = -\log p([\alpha_i]_{i=1}^{N} \mid \mathrm{CONTOUR}) = -\log L_c    (3.9)
Note the notation used for this complexity measure, DL(CONTOUR), which is
different than in the previous chapter. This notation will be used to distinguish this
complexity measure from the shape complexity measures defined later in this chapter.
Our decision variable can then be expressed as
w = \log L_c - \log L_0 = -\mathrm{DL}(\mathrm{CONTOUR}) - \log L_0    (3.10)
As the DL of the contour increases, w will decrease, which means an observer
will not perceive the sequence of angles as a contour, but instead will perceive it as
noise.
To understand the effect of curvature more explicitly, plug L_c (Eqn. 3.2) into the
DL equation (Eqn. 3.9):

\mathrm{DL}(\mathrm{CONTOUR}) = -\log L_c = N \ln\left( \sigma\sqrt{2\pi} \right) + \frac{1}{2\sigma^2} \sum_{i=1}^{N} (\mu - \alpha_i)^2    (3.11)
Plugging Eqn. 3.11 into Eqn. 3.10 leads to a more understandable form of the
decision variable:

w = -\mathrm{DL}(\mathrm{CONTOUR}) - \log L_0    (3.12)
  = -N \ln\left( \sigma\sqrt{2\pi} \right) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (\mu - \alpha_i)^2 - N \ln \epsilon    (3.13)
  = -c_1 - c_2 \sum_{i=1}^{N} (\mu - \alpha_i)^2 - c_3    (3.14)
  \propto -\sum_{i=1}^{N} (\mu - \alpha_i)^2 + c_4    (3.15)
The decision variable is proportional to the negative of the summed squared differences
between each observed angle and the most likely angle, plus a constant, c_4, where c_1, c_2, c_3,
and c_4 are simply constants that do not depend on the turning angles.
For open contours the most likely turning angle is 0, so the decision variable
is a function of the total squared curvature, as was shown in Chapter 2. As a contour
deviates from straight, it will be more difficult to detect.
For closed contours, the expected turning angle is µ = 2π/N, where N is the
number of angles in the sequence. This is because the sum of the exterior angles of any
closed figure that does not cross over itself must equal 2π radians (360 degrees). It is
intuitive that the expected turning angle must not be zero, as in the open contour case,
because a closed contour must close. Having a non-zero center for the distribution of
turning angles biases the contour towards closure. The decision variable will decrease
as the total deviation from the expected turning angle increases. This leads to the
prediction that a closed contour shaped like a circle will be easily detected, but as the
contour becomes more dissimilar to a circle it will become more difficult to detect.
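This prediction can be illustrated directly from Eqn. 3.11 with µ = 2π/N: a regular, circle-like turning-angle sequence attains the minimum possible DL(CONTOUR), and any deviation from the expected angle adds to it. A sketch with an arbitrary σ (the perturbed sample ignores exact closure, which does not affect the comparison):

```python
import numpy as np

def dl_contour_closed(angles, sigma=0.2):
    """DL(CONTOUR) for a closed contour: negative log-likelihood of the
    turning angles with expected angle mu = 2*pi/N (Eqn. 3.11)."""
    angles = np.asarray(angles, dtype=float)
    n = angles.size
    mu = 2 * np.pi / n
    return (n * np.log(sigma * np.sqrt(2 * np.pi))
            + np.sum((mu - angles) ** 2) / (2 * sigma ** 2))

n = 60
rng = np.random.default_rng(1)
circle = np.full(n, 2 * np.pi / n)        # regular 60-gon: every angle at mu
bumpy = circle + rng.normal(0.0, 0.3, n)  # irregular, less circle-like outline
```

Since the squared-deviation term is zero only for the circle-like sequence, `dl_contour_closed(circle)` is strictly less than `dl_contour_closed(bumpy)`.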
As in the open contour experiments, performance is compared to DL(CONTOUR).
Figure 3.3 shows a weak trend in the predicted direction. A 2-AFC logistic regression reveals that DL(CONTOUR) is not a significant predictor of performance,
p = 0.487. This trend was present in about half of the subjects (see Fig. 3.4). The
complexity of the contour conveys at most a small influence on detection performance.
In what follows, I propose two shape complexity factors that will involve more global
information than is present in contour complexity. The complexity measures will then
be compared to detection performance.
3.2.2 Shape Complexity Measures
As discussed earlier, a closed contour is much more than a contour with closure. The
visual system treats closed contours differently than open ones.
Previous work also shows that the visual system is sensitive to the statistical
properties of natural shapes, and specifically to the statistical properties of the shape
skeleton (Wilder et al., 2011). Wilder et al. showed that the classification of novel shapes was
well predicted by a classifier trained on the statistical properties of the MAP skeletons
Figure 3.3: Performance (log odds of a correct answer) is shown as a function of
DL(CONTOUR) for the combined subjects’ data. Error bars show one standard error.
The data is noisy, but shows the trend in the predicted direction; detection was best for
the simplest contours and worse for higher complexities.
Figure 3.4: Performance (log odds of a correct answer) is shown as a function of
DL(CONTOUR) for each individual subject. Again the data are noisy, with half of the
subjects showing a trend in the predicted direction.
of natural shapes. In a detection task, is the visual system sensitive to some of those
same properties?
The MAP skeleton is a Bayesian estimate of the shape skeleton: the skeleton that is most
likely given a shape (Feldman & Singh, 2006). The MAP skeleton builds upon a long line
of work on skeletal representations of shape, and on the visual system's sensitivity to
points of symmetry in a shape.
The Medial-Axis Transform (MAT) was proposed in Blum (1973), as a way to
represent biological shapes. Since then the symmetric axis has been shown to have
psychological relevance (Kovacs & Julesz, 1994; Psotka, 1978; Lescroart & Biederman,
2012; Hung, Carlson, & Connor, 2012) and several variations on the representation have
been proposed (Siddiqi, Shokoufandeh, Dickinson, & Zucker, 1998; August, Siddiqi,
& Zucker, 1999; Leymarie, Kimia, & Giblin, 2004) to solve di↵erent problems in vision.
The MAP skeleton representation captures the type of shape information that
is perceptually relevant and can be used to solve multiple problems in vision (Feldman
et al., 2013). The MAP skeleton can be used for simultaneous image segmentation and
figure/ground organization (Froyen et al., 2010), it can predict where figure/ground
reversals occur (Feldman et al., 2013), it can be used to segment objects into parts
(Feldman et al., 2013), it can predict human shape similarity judgments (Briscoe, 2008),
and it can be used for object classification (Wilder et al., 2011).
In addition to its functional uses in solving vision problems, the MAP skeleton
is less sensitive to contour noise than the MAT skeleton. The MAP skeleton achieves
this by balancing how likely a shape is given its skeleton (the likelihood) and the
complexity of the skeleton (the prior). If we think of the points along the contour as
being generated by points on the skeleton, the MAP skeleton finds the simplest skeleton
that would be likely to generate that set of contour points. A graphical depiction of
candidate skeletons and the MAP skeleton for a shape can be seen in figure 3.5. The
details of the algorithm can be found in Feldman & Singh (2006).
Just as with our measure of contour complexity, DL(CONTOUR), Shannon
complexity, DL(M) = − log p(M), will be used as a complexity measure for shapes. The
Figure 3.5: On the left is a bear shape with a simple skeleton (high prior probability).
This skeleton would not be likely to generate the bear shape (low likelihood of the
shape given the skeleton). On the right there is a bear shape with a complex skeleton
(low prior probability) that is very likely to generate the bear shape (high likelihood of
the shape given the skeleton). In the middle is a bear shape with the MAP skeleton. The
skeleton is a balance of skeletal complexity (medium prior probability) and likelihood
of generating the shape (medium likelihood) that fits “just right”. This figure is a
modified version of figure 17 from Wilder et al. (2011).
shape complexity models simply use a different probability model for p(M). Two description lengths will be investigated: DL(SKELETON), which measures the complexity
of the MAP skeleton, and DL(MISFIT), which measures how well a skeleton fits a shape.
The description length of the skeleton is captured by the prior probability in
the MAP skeleton computation.
DL(SKELETON) = − log p(SKELETON)
Skeleton DL increases with the number of skeletal axes and the curvature of the
axes. Intuitively this can be thought of as the number of parts in the shape and the
curvature of those parts.
The second type of shape complexity considered is “Misfit” complexity. Misfit
complexity is a measure of how well the shape and skeleton fit together. Specifically, it
reflects how likely the contour is as the result of a process where the MAP skeleton is
inflated to grow a shape; this is the likelihood term of the MAP skeleton computation.
DL(MISFIT) = − log p(SHAPE|SKELETON)
If the shape is thought of as being grown from the skeleton, each contour point
is generated by a point on the skeleton. For each skeletal branch there is a probability
distribution over the distance between the contour points generated by that skeletal
branch and the skeleton itself. If many of the contour points are close to the mean
of that distribution, then that part of the shape is very likely given the skeleton. If
the contour is very jagged around the skeleton, and many contour points are far from
the mean, then the shape has a low probability given the skeleton. Additionally, the
angle between the skeleton and the line connecting a contour point to the point on
the skeleton that generated it (called a "rib") affects the DL. The angle between the
skeleton and the rib is distributed as a von Mises distribution centered at 90 degrees,
so deviations from 90 degrees increase DL(MISFIT).
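To make the two ingredients of DL(MISFIT) concrete, the sketch below scores a set of ribs with a Gaussian term on rib-length error and a von Mises term on rib direction centered at 90 degrees. The specific densities and parameter values (`mean_len`, `sigma_len`, `kappa`) are illustrative placeholders, not those of Feldman & Singh (2006):

```python
import numpy as np

def dl_misfit(rib_lengths, rib_angles, mean_len=1.0, sigma_len=0.1, kappa=10.0):
    """Illustrative DL(MISFIT) = -log p(SHAPE|SKELETON): a Gaussian negative
    log-likelihood over rib-length errors plus a von Mises negative
    log-likelihood over rib directions centered at 90 degrees (pi/2)."""
    L = np.asarray(rib_lengths, dtype=float)
    A = np.asarray(rib_angles, dtype=float)
    len_nll = np.sum(np.log(sigma_len * np.sqrt(2 * np.pi))
                     + (L - mean_len) ** 2 / (2 * sigma_len ** 2))
    ang_nll = np.sum(np.log(2 * np.pi * np.i0(kappa))
                     - kappa * np.cos(A - np.pi / 2))
    return len_nll + ang_nll

# Smooth part: ribs near the mean length, sprouting near 90 degrees
smooth = dl_misfit([1.0, 1.02, 0.98],
                   [np.pi/2, np.pi/2 + 0.05, np.pi/2 - 0.05])
# Jagged part: scattered rib lengths and directions -> higher DL(MISFIT)
jagged = dl_misfit([0.6, 1.5, 1.1],
                   [np.pi/2 + 0.8, np.pi/2 - 0.7, np.pi/2 + 0.5])
```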
A graphical depiction of both shape complexity measures, DL(SKELETON)
and DL(MISFIT), can be seen in figure 3.6.
Now that the two shape complexities are defined, they can be compared to
subjects’ detection performance.
The independent effect of DL(MISFIT) on detection is similar to the independent effect of DL(CONTOUR): weak, but in the predicted direction. Figure 3.7 shows
the relationship between detection and DL(MISFIT). Performance tended to decrease
with an increase in DL(MISFIT), but the trend is non-significant (in a 2-AFC logistic
regression, DL(MISFIT) is not a significant predictor of performance, p = 0.403). The
decreasing trend was present in about half of the subjects (see Fig. 3.8).
A slightly clearer effect on detection is found when looking at DL(SKELETON).
Figure 3.9 shows a slightly larger decrease in detection with an increase in complexity.
The 2-AFC logistic regression reveals that DL(SKELETON) is a significant predictor of
performance (t = −4.555, p = 0.0000054). All but one of the individual subjects showed
Figure 3.6: A graphical depiction of DL(SKELETON) (top) and DL(MISFIT) (bottom).
Straight, single axes skeletons have low DL(SKELETON), while multi-axes skeletons
with high curvature have high DL(SKELETON). DL(MISFIT) is affected by the probability
of each rib length, as well as the angle at which the rib is sprouted from the skeleton.
This figure is a modified version of a figure (fig. 17) originally appearing in Wilder
et al. (2011).
Figure 3.7: Performance (log odds of a correct answer) is shown as a function of
DL(MISFIT) for the combined subjects' data. Error bars show one standard error. The
data are noisy but show a trend in the predicted direction; detection was best for the
lowest DL(MISFIT) and worse for higher complexities.
Figure 3.8: Performance (log odds of a correct answer) is shown as a function of
DL(MISFIT) for each individual subject. Again the data are noisy; as with
DL(CONTOUR), roughly half of the subjects show a trend in the predicted direction.
Figure 3.9: Performance (log odds of a correct answer) is shown as a function of
DL(SKELETON) for the combined subjects’ data. Error bars show one standard error.
The data are noisy but show a trend in the predicted direction; detection was best
for the simplest skeletons and worse for higher-complexity skeletons. The magnitude
of the performance decrement is larger than for either DL(CONTOUR) (Fig. 3.3) or
DL(MISFIT) (Fig. 3.7).
a decrease in detection as DL(SKELETON) increased.
The above analyses test for the effect of the three different types of complexity
in isolation. An increase in either DL(CONTOUR) or DL(MISFIT) tended to cause
a decrease in performance, though not at a significant level. Performance did significantly decrease as DL(SKELETON) increased. These findings reflect the effects of each
factor separately. To understand how these three factors might interact, I next entered
all three factors into a 2-AFC logistic regression model using a stepwise procedure that
includes the factors and interactions that increase the proportion of explained variance,
while penalizing model complexity using the Akaike Information Criterion (AIC).
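The stepwise trade-off can be sketched directly from the definition AIC = 2k − 2 log L, where k is the number of fitted parameters and log L the maximized log-likelihood: a candidate term survives only if its likelihood gain outweighs the penalty for the extra parameter. The log-likelihood values below are hypothetical placeholders, not fitted values from this experiment:

```python
def aic(log_lik, n_params):
    # Akaike Information Criterion: lower is better.
    return 2 * n_params - 2 * log_lik

# Hypothetical nested models for illustration only:
# main effects vs. main effects plus an interaction term.
aic_main = aic(log_lik=-512.0, n_params=3)   # intercept + 2 slopes
aic_full = aic(log_lik=-506.0, n_params=4)   # ... + interaction

# The interaction term is retained only when its likelihood gain
# exceeds the 2-point penalty for the extra parameter.
print(aic_full < aic_main)   # True: 1020 < 1030
```

Here the six-unit log-likelihood improvement more than pays for the added parameter, so the interaction would be kept, mirroring the winning model reported below.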
Figure 3.10: Performance (log odds of a correct answer) is shown as a function of
DL(SKELETON) for each individual subject. These data, while still noisy, are much
more consistent than for either DL(CONTOUR) (Fig. 3.4) or DL(MISFIT) (Fig. 3.8).
All but one individual showed a decreasing trend, with detection decreasing as
DL(SKELETON) increased.

The analysis revealed that both DL(CONTOUR) and DL(SKELETON), along with their
interaction, were part of the winning (lowest-AIC) model, meaning that both factors
and their interaction are needed to explain the data. The complexity of the MAP skeleton (DL(SKELETON)) significantly decreased detection (z = −3.207, p = 0.001). The
complexity of the contour (DL(CONTOUR)) also decreased detection, but the effect did not
reach significance (z = −1.77, p = 0.077). There was a significant interaction between
DL(SKELETON) and DL(CONTOUR) (z = 3.037, p = 0.002). For both DL(CONTOUR)
and DL(SKELETON), an increase in complexity led to a decrease in detection. The
interaction coefficient was positive, the opposite direction of both main effects, meaning that the two factors combined sub-additively. In other words, the decrease in
detectability for a given shape is less than predicted by a model without the interaction
term, in which the decrease is simply the sum of the magnitudes of the two
complexity effects. These results suggest that detection of the closed contours cannot
be explained by local contour effects alone; the subjects' responses are due in part to
the global shape as well.
The results of Experiment 3 confirm that closed contour detection is influenced
by global shape complexity as captured by DL(SKELETON), at least in the context of a
natural contour detection task. However, the shapes constitute a “natural experiment”
in which the shapes are largely uncontrolled and shape parameters are correlated
with each other. Experiment 4 uses experimentally designed closed contours that are
not drawn from a database of natural shapes. These artificial shapes are generated
systematically so that the complexity measures are as independent as possible.
3.3 Experiment 4: Experimentally Manipulated Shapes
Experiment 4 follows the same basic methodology as Experiment 3, but this time
with experimentally manipulated shapes. Natural shapes do not uniformly cover the
space of complexity, leading to a sample of shapes with similar complexities where
performance is similar. Additionally, the correlation between the different complexity
measures is uncontrolled, making it difficult to tease apart the effects of the separate
factors. In Experiment 4, shapes are generated by independently manipulating the axial
structure and the noise in the contour. This will result in shapes that are more evenly
spread in complexity space and that have a low correlation between the complexities.
Additionally, it will ensure that the results from Experiment 3 are not simply due to
some familiarity with natural shapes or other unknown artifacts in the shape database.
As in Experiment 3, subjects will be required to decide which of two intervals
contained the image containing the closed contour, and which contained the noise
image. The effect of the three types of complexity - DL(CONTOUR), DL(MISFIT), and
DL(SKELETON) - will first be analyzed in isolation. Then they will be considered in
combination, to find the factors that best describe the subjects’ data without over-fitting.
3.3.1 Methods
Subjects
Twenty undergraduate subjects from the general psychology subject pool participated
in the experiment for credit in an undergraduate course.
Stimuli
As in Experiment 3, there were two noise images; a contour was embedded in one of
them. The contours in this experiment were experimentally generated so that the
structure of the skeleton could be manipulated independently of the contour noise.
The shape skeleton was generated first and then a shape was “grown” outward
from the skeleton. There were four different shape skeleton conditions: single part
shapes, two part shapes, three part shapes where two parts are on opposite sides of a
main part, and three part shapes where two parts extend off the same side of a main
part.
The first condition of the skeletal axis variable used a single straight axis.
The second condition used a straight main axis with a second skeletal axis added,
perpendicular to the main axis, at a random location, yielding a skeleton with a higher
complexity (lower prior probability under the generative model). The third condition
used a straight main axis with a second axis emanating from one side of the main axis,
and a third axis emanating from the other side of the main axis. Both the second and
third axis were perpendicular to the main axis and were at a random location. This
results in a skeleton that has even higher complexity. The fourth axis condition has a
straight main axis with a second and third axis emanating from the same side of the
main axis. Both are perpendicular to the main axis and at a random location, with
the constraint that the third axis must be more than 2 × 15 = 30 units away from the
second axis, where 15 is the length of the ribs for the second and third axis (explained
in more detail below). This condition results in skeletons with similar complexities
as in the third axis condition, but higher complexities than skeletons in the first or
second conditions. The overall process yields a variety of skeletal structures varying
in complexity (DL(SKELETON)).
After the skeleton was generated, a shape was grown from the skeleton. The
growth process is a result of sprouting “ribs” laterally from the skeleton. Beginning
at one end of the first axis a rib was grown perpendicularly left and right from the
skeleton. A contour point was placed at the end of the rib. The length of the ribs was
sampled from a Gaussian distribution centered at 25 units away from the skeleton. The
contour noise condition set the standard deviation of the Gaussian; the contour noise
conditions ranged from lowest noise to highest noise, and had standard deviations of
1, 3, 5, or 7.
After obtaining the first contour points, the second point on the skeletal axis is
considered. Again we sample a rib length from a Gaussian; however, now the Gaussian
is centered at the previous rib length. The contour was sampled in this
Markovian fashion around the entire axis. Once a child axis is reached, the ribs are
generated in the same manner, except that the Gaussian from which rib lengths are
sampled is initially centered at 15 instead of 25. This process yields a variety of shapes
for each skeletal structure, differing in likelihood (and thus DL(MISFIT)).
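The Markovian rib-sampling procedure described above can be sketched as follows for one side of a single straight axis; the number of skeleton points and the random seed are arbitrary illustrative choices:

```python
import numpy as np

def grow_contour(n_points=40, mean_len=25.0, noise_sd=3.0, seed=0):
    """Grow one side of a shape from a straight skeletal axis.
    The first rib length is drawn from a Gaussian centered at mean_len;
    each subsequent Gaussian is centered at the previous rib length
    (a Markov chain), so noise_sd plays the role of the contour noise
    condition (standard deviations of 1, 3, 5, or 7)."""
    rng = np.random.default_rng(seed)
    lengths = np.empty(n_points)
    lengths[0] = rng.normal(mean_len, noise_sd)
    for i in range(1, n_points):
        lengths[i] = rng.normal(lengths[i - 1], noise_sd)
    # Contour points sit at the rib tips, perpendicular to the axis.
    xs = np.arange(n_points)
    return np.column_stack([xs, lengths])

contour = grow_contour()
print(contour.shape)   # (40, 2)
```

Larger values of noise_sd let successive rib lengths wander further from their neighbors, producing bumpier contours and correspondingly higher DL(MISFIT).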
To see examples from each combination of the factors, see figure 3.11.
Figure 3.11: Moving right to left shows the effect of the axial structure manipulation.
The rightmost shapes all have a single axis, followed by two-axis shapes, and finally
three-axis shapes, first with the two added axes on opposite sides of the shape and then
with both on the same side. Moving from top to bottom illustrates the effect of
increasing the contour noise: along the top, the shapes have little variability in the
lengths of the ribs that generated the contour, and moving down increases this
variability.
Design and Procedure
The design was a 4x4 factorial (four skeletal axis conditions and four contour
noise conditions). Subjects saw 25 trials from each condition, in a random order, for
a total of 400 trials.
Each trial was identical to Experiment 3, except that the stimuli were generated
as described above instead of being the boundaries of natural shapes.
3.4 Results
The results from Experiment 4 were noisier than those from Experiment 3, perhaps because
the unnatural shapes induced more uncertainty in the observers. No significant effect
of the manipulated variables was found. There appears to be no effect of the
manipulated contour noise variable. An increase in the number of skeletal axes tended to
decrease detection, and the effect of the skeletal axis manipulation was close to reaching
significance (p = 0.053).
In addition to the analysis of the manipulated factors, I again looked at the individual effects of the three complexity measures on detection (which are different from,
though related to, the manipulated factors). The trends were all in the predicted
direction (see Figs. 3.14, 3.15, and 3.16). Both DL(CONTOUR) and DL(MISFIT) failed
to have a significant effect on detection when examined in isolation (p = 0.750 and
p = 0.243), as in Experiment 3. The data in this experiment also failed to show an
isolated effect of DL(SKELETON), but of the three complexities, skeleton complexity
was closest to having a significant effect (p = 0.068).
As in Experiment 3, in addition to looking for effects of the three complexities in
isolation, a stepwise minimization procedure was used to find the 2-AFC logistic regression model with the lowest AIC (the model that fits the data best while remaining simplest).
Just as in Experiment 3, the winning model includes DL(CONTOUR), DL(SKELETON),
and the interaction between the two factors. An increase in DL(SKELETON) significantly decreased detection (z = −3.56, p = 0.0004). There was a non-significant decrease
in detection as DL(CONTOUR) increased (z = 1.44, p = 0.15), but the factor must be
Figure 3.12: Performance (log odds of a correct answer) is shown as a function of the
manipulated contour noise variable. The contour noise manipulation appears not to
have affected detection performance.
included in the regression due to a significant interaction between DL(SKELETON)
and DL(CONTOUR) (z = 3.34, p = 0.0008). The interaction between the two complexities was positive, the opposite direction of the two main effects, meaning that the
two complexities combined sub-additively. In other words, increases in both
DL(SKELETON) and DL(CONTOUR) decrease detection, though the magnitude of
the decrease when both factors are increased is less than predicted by simply summing
the effects of the two components.
This best-fitting model for experimentally manipulated shapes (Experiment
4) was identical to the best-fitting model for natural shapes (Experiment 3), and all
the factors had effects in consistent directions. This provides strong converging evidence about the effect of each type of complexity. Shape complexity (DL(SKELETON))
exerts the strongest influence on performance. The complexity of the contour
(DL(CONTOUR)) also tended to decrease performance. Finally, there is a sub-additive
interaction between DL(SKELETON) and DL(CONTOUR).
Figure 3.13: Performance (log odds of a correct answer) is shown as a function of the
manipulated axial structure variable. Performance appears to decline slightly as the
number of branches increases, but the trend is not significant (p = 0.053).
3.5 Discussion
The detection of closed contours in noise could be performed by a completely local process, since detection could occur if the observer noticed only a small contour segment.
This detection model would suggest that a holistic or global representation of shape
is not necessary for closed contour detection. If our subjects used a decision process
based solely on a segment of the contour, there would be no effect of the complexity of
the shape (as measured by either DL(SKELETON) or DL(MISFIT)). The results, however, showed a significant effect of the complexity of the shape skeleton, suggesting
that the structure of the region bounded by the closed contour is important for detection. Additionally, the complexity of the contour interacted with the complexity of the
shape, suggesting that the complexity of the contour and shape need to be considered
together.
While many of the closed contour studies mentioned earlier suggested that
Figure 3.14: Performance (log odds of a correct answer) is shown as a function of
DL(CONTOUR) for the combined subjects’ data from Experiment 4. Error bars show
one standard error. There is a weak trend in the predicted direction; detection was best
for the simplest contours and worse for contours with higher complexity.
Figure 3.15: Performance (log odds of a correct answer) is shown as a function of
DL(MISFIT) for the combined subjects’ data from Experiment 4. Error bars show one
standard error. The data are noisy; the best performance occurred at low DL(MISFIT),
as predicted, although the worst performance occurred at a moderate level of
complexity.
Figure 3.16: Performance (log odds of a correct answer) is shown as a function of
DL(SKELETON) for the subjects’ combined data from Experiment 4. Error bars show
one standard error. As with the other data, these data are noisy, but there is a weak
trend in the predicted direction; detection was best for the simpler skeletons and worse
for skeletons with higher complexity.
closed contours are treated differently than open contours, Tversky et al. (2004) was
unable to find an e↵ect of closure on detection that could not be explained by a
simple grouping rule. The transitivity rule (Geisler & Super, 2000) simply says that if
elements A and B are grouped, and B is grouped to a third element C, then A will be
grouped with C as well. Subjects in Tversky et al. (2004) were able to detect a sequence
of edge elements that closed to form a circle using fewer edge elements than were
necessary to detect an open sequence of edge elements. The performance difference
was predicted by an application of the transitivity rule, which increased the grouping
strength between all of the elements in the circle. No closure mechanism needed to be
introduced.
Our results show that the complexity of the shape is a necessary component
in the explanation of contour detection. Our results, however, are not necessarily in
contradiction with Geisler & Super (2000) or Tversky et al. (2004). The goal of the
studies of contour integration is to learn how separate elements might be grouped
with neighboring elements. Contour grouping depends on the turning angle between
elements, the proximity of the elements, and the similarity of the orientation of the
individual elements (Field et al., 1993; Geisler et al., 2001). The elements that make
up our contours are pixels. The contour elements have no separation, as they are
neighboring pixels. Pixels have no orientation or phase and the pixels of a contour
were all the same luminance; each element of the contour is maximally similar to one
another. The only property of the elements that affects their grouping is the angle
between neighboring elements. The contours and noise were placed on a pixel grid,
so turning angles are always multiples of 45 degrees. In Experiment 3, using natural
shapes, the large majority of the turning angles were 45 degrees, which would result
in strong integration signals under the standard models (Field et al., 1993; Geisler &
Super, 2000; Geisler et al., 2001; Hess et al., 2003; Tversky et al., 2004), yet it was in this
experiment that the effect of the complexity of the shape was strongest. The stimuli in
both the natural contours and manipulated contours experiments would be grouped
with high probability by other contour integration models. This study looks at shapes
which have a high probability of being grouped, which is a different set of contours
than the set considered in Tversky et al. (2004), which considered both contours that
are likely to be grouped (edge elements generated by the same object) and contours that
are not (elements generated by different objects/generative procedures). While a benefit for
closure may not be useful for separating edge elements into those that would, or would
not, be grouped by an association field type model, shape complexity is a necessary
component to understand the detectability of contours whose edge elements are all
generated by the same object. For the closed contours used in both Experiments 3 and
4, the effect of shape cannot be ignored.
Standard computer vision techniques were not able to detect the contours in
these experiments, though subjects generally were. A more complete understanding of
the nature of the complexities affecting human subjects, and of the global
shape representation processes that these complexity effects uncover, would take us
closer to computational models that achieve human-like performance.
In summary, when a contour closes, a shape is formed. The complexity of
this shape, which is not present in open contours, is an important factor in detection
performance. This result is not predicted by conventional shape representation models
or contour integration models, but is a natural consequence of a complexity measure
based on a Bayesian model of shape representation.
4. General Conclusions
By framing the problem of contour detection as one of Bayesian inference, I have
shown why a decrease in performance is expected with an increase in the complexity of
a contour, measured as the Description Length (DL) of the contour. Through a series
of experiments on contour detection I have shown both that DL(CONTOUR) does
capture the complexity relevant for contour detection, and that contour integration is
not a purely local process.
Using closed contours, I was able to support the finding from my experiments
on open contours: that the mechanisms responsible for contour detection do not
consider only local information. By introducing two additional complexity measures
that are related to the shape of the bounded region, DL(SKELETON) and DL(MISFIT),
I was able to show that the shape skeleton captures the information about global shape
that affects detection. Specifically, the data revealed that the effect of DL(CONTOUR)
does not fully explain closed contour detection and that the addition of a global factor,
such as DL(SKELETON), is necessary.
Global shape information is important for tasks other than detection as well.
The shape skeleton has been shown to be important for shape categorization (Wilder
et al., 2011). The orientation of a shape’s skeletal axis affects the perceived orientation
of local texture information on the interior of a shape (Harrison & Feldman, 2009). Lee,
Mumford, Romero, & Lamme (1998) found V4 cells that fire more strongly when their
receptive fields are near the shape skeleton than when their receptive fields are at other
locations in the interior of a bounded region. While these global shape effects have
been found, to my knowledge this is the first time global shape information has been
shown to influence contour detection.
It is not obvious why global information is involved in the detection process.
Each closed contour could be detected by locating a small segment of the contour,
letting the whole go unnoticed. Nestor, Vettel, & Tarr (2008) found that different brain
areas were active if the subject was instructed to identify a face than if the instructions
were to detect a face. The areas active during the detection task are believed to represent
local facial features, while the active areas during the identification task are believed to
process the face as a whole. The results of the contour detection experiments presented
here appear to be inconsistent with the findings of Nestor et al. (2008). Nestor et al.
(2008) were measuring BOLD responses from visual areas that are fairly high in the
visual hierarchy, specifically areas believed to be involved in face processing. Standard
theories of contour integration, such as the association field model (Field et al., 1993),
view contour integration as happening earlier in the visual pathway. Even if closed
contour integration involves brain areas further up the visual stream, it is unlikely to
involve areas like the fusiform gyrus, where Nestor et al. (2008) found their effect.
Based upon the data from the experiments in chapters 2 and 3, it appears that the
visual system is not easily able to choose between using local or global information;
the visual system takes both into account.
Nestor et al. (2008) is only one study suggesting that global information is
represented in separate locations from local information. Murray et al. (2002, 2004)
and Fang et al. (2008) found that identical stimuli can produce activation in different
brain areas depending on whether the observer is currently perceiving a set of individual
elements or a grouped whole. A similar effect was found by He et al. (2012), where
perceiving a stimulus as ungrouped elements led to one type of adaptation effect
(the tilt aftereffect) while perceiving the same stimulus as a whole led to a different
type of adaptation (the shape aftereffect).
does make use of global representations. They also suggest, as did Nestor et al. (2008),
that the visual system is flexible in the type of representation used. The stimuli from
the current study’s experiments are not easily perceived as individual elements; the
individual pixels are too small to be easily distinguished. The grouping signals for our
stimuli are so strong that the visual system cannot help but group them together and
use a global representation.
The results presented here support neural evidence for a global shape representation. Recently, there has been evidence suggesting that cells in IT simultaneously
represent axial structure and bounding contour information (Hung et al., 2012) and
that areas as early as V3 are already switching from the local representation of earlier
areas, which are sensitive to object properties such as orientation, to a global representation
that highlights axial structure (Lescroart & Biederman, 2012).
Zhou, Friedman, & von der Heydt (2000) found that many cells in V2 and V4
(and some cells in V1) were sensitive to what side of a boundary contained the figure,
even for figures that extend far beyond the classical receptive fields of those cells.
Zhaoping (2005) proposed a model of lateral connections between V2 cells to explain
this effect. Craft, Schutze, Niebur, & von der Heydt (2007) showed how models
using solely lateral connections cannot account for some of the neurophysiological
data (such as the speed at which the direction of figure information appears), and
proposed a model with feedback connections from higher levels. This model was
further developed in Mihalas, Dong, von der Heydt, & Niebur (2011). In the model,
at a low level, such as LGN and V1, there are cells that are selective for edges of
certain orientations. At the next level, V2, there are cells that are selective for the
side of the edge that owns the border (usually this is the direction of the figure). At
some higher level there are grouping cells. Each grouping cell gets input from cells at
different locations in the visual field that are selective for different directions of border
ownership. These grouping cells are similar to the grassfire algorithm for computing
the medial axis (Blum, 1973). In the grassfire, a “fire” is set at the boundary of a shape
and the flames proceed inward normal to the boundary. When two flames meet, the
flames are extinguished and a pile of ash is left. The set of all the ash piles is the medial
axis. The grouping cells of Craft et al. (2007) look for border ownership cells that are
selective for different directions, meaning that at the midpoint between the two cells
the normal vectors from the edge elements would intersect, so this point is on the
medial axis.
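The grassfire computation that these grouping cells approximate can be sketched with a breadth-first distance transform: the fire starts at the background, spreads inward one step at a time, and cells whose distance is at least that of every neighbor mark where opposing fronts meet. This toy grid implementation is illustrative only, not a model of the neural circuit:

```python
from collections import deque

def medial_axis(grid):
    """grid: list of lists, 1 = inside the shape, 0 = background.
    Returns the set of approximate medial-axis cells."""
    rows, cols = len(grid), len(grid[0])
    dist = [[0 if grid[r][c] == 0 else None for c in range(cols)]
            for r in range(rows)]
    # Multi-source BFS: the "fire front" starts at all background cells.
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if grid[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    # Ash piles: interior cells where opposing fronts meet
    # (local maxima of the distance map).
    axis = set()
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if grid[r][c] and all(dist[r][c] >= dist[r + dr][c + dc]
                                  for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))):
                axis.add((r, c))
    return axis

# A 5x9 rectangle embedded in background: the axis runs along the middle row.
shape = [[1 if 1 <= r <= 5 and 1 <= c <= 9 else 0 for c in range(11)]
         for r in range(7)]
print((3, 5) in medial_axis(shape))   # True
```

For the rectangle, the deepest cells lie along the central row, which is exactly where the extinguished fronts of Blum's grassfire leave their "ash".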
The current study showed that the complexity of the MAP skeleton, DL(SKELETON),
is related to the detectability of closed contours. Even though local information is sufficient for detection, the visual system was unable to avoid representing the global
information. Global information, such as the direction of figure, appears very early in
visual processing (Zhou et al., 2000), possibly as a result of feedback connections from
grouping cells (Craft et al., 2007). Grouping cells also explain our ability to attend to
specific objects (Mihalas et al., 2011), which is a possible explanation for why there is
a benefit for perceiving closed contours over open contours. Grouping cells are still a
theoretical construct, but if they are related to the MAP skeleton or another representation
similar to the medial axis, possible places to search for these cells are in V3 or IT (Lescroart & Biederman, 2012; Hung et al., 2012). The current study suggests that global
information, specifically axial structure, is an integral part of the object representation
system, so further exploration to find cells with similar properties to grouping cells
would be worthwhile.
The findings reported here raise several additional questions. The simplest
open contour our model generates is a completely straight segment. The simplest
closed contour is a perfect circle. Removing a small portion of the closed contour
will result in an open contour that is fairly complex as measured by DL(CONTOUR).
Intuition suggests that this contour might be more easily detected than a wavy line
that does not reach closure, even if the total curvature is the same in both contours.
There are two lines of research suggested by this example.
The first line of research considers that the visual system might take into account
information beyond the changes in direction (turning angles) of a contour. The
system might also care about the rate of change of direction, preferring to keep the
rate of change constant (similar to the results of Pettet, 1999). In future work I hope to
look at the effect of the rate of change of curvature on contour detection, and to enhance my
model to include a bias for constant curvature in addition to constant direction.
The second line of research suggested by the example of the circle with a
missing segment is that global shape effects could be found for open contours that are
"shape-like" or "part-like". By estimating skeletons for open contours, we can test whether
DL(SKELETON) also affects their detectability, which would suggest that the global
effects found for closed contours are also present for open contours.
4.0.1 Conclusion
Human observers were fairly accurate at detecting which image contained a contour
and which contained noise. Their performance was modulated by the complexity of
the entire contour, as well as the complexity of the shape contained inside a closed
contour. This has important implications about the nature of object detection and
shape representations. The Gestalt psychologists had the correct intuition that a whole
is special. In addition to simply being a collection of contour points that closes, closed
contours have a shape. These results suggest that models of contour integration (Field
et al., 1993; Geisler & Super, 2000), while considering the importance of the relationship
of neighboring edge elements, could be improved by including information about the
geometry of the entire contour. A more complete model of human contour detection
must accommodate the global e↵ect of contour complexity and the e↵ect of shape
complexity.
By deriving a shape complexity measure, this work provided one key component for quantifying the effect of closure. By deriving three complexity measures using
Shannon's (1948) definition of description length, I hope to influence the broader
field of pattern detection, which can be studied within this common framework while
allowing flexibility in the choice of generative models relevant to each specific
class of stimuli.
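The common thread among the three measures is Shannon's correspondence between probability and code length, DL(x) = −log2 p(x): outcomes that are improbable under a generative model require longer descriptions. A minimal sketch:

```python
import numpy as np

def description_length(p):
    # Shannon code length, in bits, of an outcome with probability p.
    return -np.log2(p)

# A high-probability (simple) structure is cheap to encode;
# a low-probability (complex) one is expensive.
print(description_length(0.5))     # 1.0
print(description_length(0.125))   # 3.0
```

Swapping in a different generative model changes only how p is computed; the mapping from probability to complexity stays the same, which is what makes the framework portable across stimulus classes.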
A. Modeling Approaches
This appendix contains several different attempts at modeling the detection or decision
processes of the observer, and explanations of why these approaches fail.
A.1 Spatial Frequency
One possible way the subjects could have performed the task, if the noise and
contour images contained different spatial frequencies, is to decide based simply on
these differences. When looking at the power spectra of images, most natural scenes
have a lot of low spatial frequency information and very little high spatial frequency
content. Where there is an edge in an image there will be a spike in the medium to high
spatial frequencies. The power spectra of the images in our experiment reveal content
at all spatial frequencies, due to the images being white noise. The power spectrum
of a noise image is indistinguishable from that of an image containing a contour (see
Figure A.1).
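This comparison can be sketched with a two-dimensional FFT; the image size and the embedded contour below are arbitrary stand-ins for the experimental stimuli, not the actual images:

```python
import numpy as np

rng = np.random.default_rng(0)
size = 128

# Binary white-noise image, and a copy with a short contour embedded.
noise = rng.integers(0, 2, (size, size)).astype(float)
with_contour = noise.copy()
with_contour[64, 30:90] = 1.0          # a straight 60-pixel "contour"

def power_spectrum(img):
    # Power at each spatial frequency, with the DC component centered.
    return np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2

p_noise = power_spectrum(noise)
p_contour = power_spectrum(with_contour)

# Because the background is white noise, both spectra are essentially
# flat, so a decision rule based on spectral power has little to use.
print(p_noise.shape == p_contour.shape)   # True
```

Since white noise already has power at all spatial frequencies, the contour's small contribution is buried, which is why this strategy fails for these stimuli.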
A.2 Theoretical Decision Model
This decision model looks at the statistical differences between contours and noise,
and does not consider the process of extracting the information from an image. Let us
begin by assuming that the observer has extracted a chain of pixels from each image,
and that the chain from each image is of the appropriate length (determined by the
length condition - either 110 or 220 pixels). Using equation 2.9, the observer is able to
decide if the chain from each image is a contour or noise. If only one of the chains is
determined to be a contour, the observer will respond by choosing the interval from
which that chain was extracted. If the observer decides that neither chain is a contour
the decision about which interval contained the contour is simply a Bernoulli trial,
or coin flip. Under this model if the observer does not detect the contour in either
interval, each response would be random, resulting in a performance of 50 percent. If
both chains are determined to be contours the observer can pick an interval at random,
Figure A.1: The top left is an image containing a contour (in the lower right). The top
right image is the power spectrum of the image in the top left. The image in the bottom
left only contains noise. The bottom right is the power spectrum for the image in the
lower left. The power spectra of the two images are not noticeably different.
[Figure A.2 appears here: a histogram with x-axis "Straight Continuations" (0 to 250) and y-axis "Proportion" (0 to 0.16).]
Figure A.2: This figure shows the distributions of possible contours and possible noise
chains in the long contour condition (220 pixels). The x-axis shows the number of
straight continuations (out of a possible 219). The y-axis shows the proportion of pixel
chains with this number of straight continuations. The distributions do not overlap,
so any decision will always be correct.
pick the interval with the largest log-likelihood ratio, or probability match (use a coin
flip to guess which interval, but the coin’s weight is set based upon the relative size of
the two log-likelihood ratios).
Perfect performance will occur if only one interval is believed to have contained
a contour. In order for the model to result in less than perfect performance the contours
would need to be misclassified as noise, or the noise would need to be misclassified
as a contour. This would happen when the distribution of possible contours overlaps
with the distribution of possible noise chains. In order to find a complexity effect, the
distribution of contours in one condition would need to overlap to a greater or lesser
extent than in the other conditions. Figure A.2 shows that the distribution of contours
used in the experiment does not overlap with the noise distribution. This means, under
the assumption that the contour is fully integrated on each trial, that the contour will
never be misclassified as noise and noise will never be misclassified as a contour.
I have tested several modifications to this model in an attempt to increase
the overlap between the contour distribution and the noise distribution. The first
modification assumes that each complexity level yields a different distribution of
possible contours. Each complexity level corresponds to a particular number of
straight continuations, so I found the binomial distribution whose parameter maximized
the likelihood of obtaining a contour of that complexity. This yields one contour
distribution for each complexity. The most complex contour's distribution is closest
to the noise distribution, and the simplest contour's distribution is furthest from the
noise distribution. None of the contour distributions, however, overlap with the noise
distribution. This modification does not by itself produce a complexity effect, but
the ordering of the distributions is consistent with one, so this modification
is retained in the other versions of the model.
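The fitting step can be sketched as follows; the particular counts of straight continuations and the chance level for noise chains are illustrative assumptions, not values from the experiments:

```python
from math import comb

N_STEPS = 219  # possible continuations in a 220-pixel chain

def binom_pmf(n, p, k):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def fit_p(n_straight):
    """MLE of the per-step probability of a straight continuation."""
    return n_straight / N_STEPS

def overlap(p_a, p_b, n=N_STEPS):
    """Bhattacharyya coefficient between two binomial(n, .) distributions:
    0 means no overlap, 1 means identical."""
    return sum((binom_pmf(n, p_a, k) * binom_pmf(n, p_b, k)) ** 0.5
               for k in range(n + 1))

p_noise = 1 / 3                 # assumed chance level for noise chains
p_simple = fit_p(200)           # a simple contour: mostly straight steps
p_complex = fit_p(150)          # a more complex contour
# The complex contour's distribution lies closer to the noise
# distribution, but neither overlaps with it appreciably.
```

Computing the overlaps confirms the ordering in the text: the complex contour's distribution is nearer the noise distribution, yet both overlaps are essentially zero.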
A second version of the model contains an additional parameter, L: the probability
that the observer will find the contour if one exists. The assumption is that the
observer can only attend to a certain number of small regions in the image. If the
contour falls in one of the attended locations, the observer will detect it; otherwise
they will not. L is directly related to the number of locations that the observer is
able to attend: as the number of locations increases, the probability of detection
increases. The parameter gives control over how well the observer performs. With a
larger L parameter the observer will perform better, but this does not produce a
complexity effect, because it does not create a larger difference between the
distributions of straight continuations for contours of differing complexities. This
parameter would be useful in fitting the actual data, allowing us to increase or
decrease the average performance to match an individual subject's performance, but it
does not explain the decreasing trend with complexity found in the data.
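The effect of the L parameter on overall accuracy can be verified with a quick simulation (a sketch with arbitrary L values, not a fit to any subject's data):

```python
import random

def simulate_accuracy(L, n_trials=100_000, seed=0):
    """2IFC accuracy when the contour is found with probability L
    (at an attended location) and the response is a guess otherwise."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        contour_interval = rng.randrange(2)
        if rng.random() < L:
            response = contour_interval      # contour detected
        else:
            response = rng.randrange(2)      # not found: pure guess
        correct += response == contour_interval
    return correct / n_trials

# Expected accuracy is L + (1 - L) / 2, independent of contour
# complexity, which is exactly why L alone cannot produce a
# complexity effect.
```

For example, L = 0.6 gives an expected accuracy of 0.6 + 0.4 / 2 = 0.8; the parameter scales overall performance but applies identically at every complexity level.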
In order to find a complexity effect, the contour distributions need to be shifted
closer to the noise distribution, so that there is overlap and so that the amount of
overlap differs across the complexity levels. To do this I introduce a parameter, D, that
can be fit for each subject. The D parameter controls the number of noise pixels that
are included in the chain in addition to the actual signal pixels (or, in the noise case,
in addition to the noise pixels). As the D parameter is increased, the distributions
overlap increasingly. This fails to work because as D increases, so does the variance
of the distributions. In order to get the contour and noise distributions to overlap,
D needs to be so large that there is no difference in performance across the levels of
complexity, because the contour distributions are no longer sufficiently distinguishable
from one another. In summary, this model either results in perfect performance, as in
the original version of the model, or in a fixed level of performance regardless of complexity.
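This failure can be seen analytically. If a chain mixes S signal steps (straight with probability p_signal) with D noise steps (straight with probability p_noise), the count of straight continuations is a sum of two binomials, and a d′-like separation between the contour-chain and noise-chain distributions can be computed directly (the per-step probabilities below are illustrative assumptions):

```python
def separation(S, D, p_signal, p_noise=1/3):
    """d'-like distance between the straight-continuation distributions
    of a diluted contour chain (S signal + D noise steps) and a pure
    noise chain of the same total length."""
    mean_c = S * p_signal + D * p_noise
    var_c = S * p_signal * (1 - p_signal) + D * p_noise * (1 - p_noise)
    n = S + D
    mean_n = n * p_noise
    var_n = n * p_noise * (1 - p_noise)
    return (mean_c - mean_n) / (((var_c + var_n) / 2) ** 0.5)

# The numerator S * (p_signal - p_noise) does not depend on D, while the
# variances grow with D: dilution shrinks the separation for simple and
# complex contours alike, rather than differentially.
simple = [separation(110, D, 0.9) for D in (0, 100, 1000)]
complex_ = [separation(110, D, 0.7) for D in (0, 100, 1000)]
```

Both sequences decrease toward zero together as D grows, so increasing D degrades all conditions in parallel rather than producing a graded complexity effect.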
Since increasing the noise in the contour does not give a model whose performance
decreases as complexity increases, the next modification is to decrease the signal in
the integrated contour. A third parameter, S, is introduced, controlling how many of
the actual pixels from the contour are integrated into the proposed contour chain, so
that D + S is the length of the hypothesized contour. This model suffers from the same
problem as the previous one: for the noise distribution and the contour distributions
to overlap, the D parameter must be large relative to the S parameter, and the contour
distributions are then no longer distinguishable from one another.
One modification to this model did result in a complexity effect similar to that
found in our subjects. This model, however, was rejected because its assumptions are
not reasonable. In order to find the complexity effect, I needed to assume that all of
the deviations from straight are contained in the S signal pixels. This assumption
might be justifiable in Experiment 2, where the majority of the non-straight turning
angles were grouped together, but it is not valid for the other experiments. The model
essentially assumes that the observer detects the segment of the contour containing all
of the curvature, but then does not continue to integrate the rest of the contour, instead
integrating noise pixels. Because this assumption seems unreasonable, and runs counter
to existing models of contour integration (which suggest that the straight portion
should be easily integrated), this version of the model is also rejected.
A.3 Realistic Front End
The theoretical decision model described above fails because noise was not introduced
in an effective way. That model assumes that the observer has complete access to the
individual pixels, so one way to introduce noise is to begin with input that is more
similar to what the visual system receives. In this model the images are first blurred
with a Gaussian filter.
A bank of oriented filters is applied to the blurred image. I tested both a set of
Gabor filters and a set of square-wave filters in which each filter contained 1.5
phases (an "on" region in the center and an "off" region in the surround, which I call
the "straight surround-suppression filter"). The bank contains a vertically oriented
filter plus additional filters at every 30 degrees. Filter responses below a threshold
are ignored. The responses of all of the filters are combined into a single image, and
the several largest connected chains of locations with high filter responses are
isolated. Finally, the chain with the highest probability under the von Mises model is
selected as the candidate contour, and the interval whose candidate contour has the
higher probability is chosen for the response.
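A minimal version of this front end can be sketched in a few lines. The filter parameters, the FFT-based circular convolution, and the noiseless demonstration image are illustrative assumptions; the sketch demonstrates only the oriented filter bank, not the chain extraction or the von Mises scoring:

```python
import numpy as np

def gabor(size, theta, sigma=3.0, wavelength=8.0):
    """Oriented Gabor filter; theta = 0 gives the vertical filter."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = (np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
         * np.cos(2 * np.pi * xr / wavelength))
    return g - g.mean()          # zero mean: no response to flat regions

def convolve2d(image, kernel):
    """Same-size circular convolution via the FFT (adequate for a sketch)."""
    k = np.zeros(image.shape)
    kh, kw = kernel.shape
    k[:kh, :kw] = kernel
    k = np.roll(k, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # centre kernel
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(k)))

def bank_response(image, n_orientations=6, threshold=0.0):
    """Combine a vertical filter and filters every 30 degrees into one map."""
    bank = [gabor(15, np.pi * i / n_orientations)
            for i in range(n_orientations)]
    combined = np.max([np.abs(convolve2d(image, f)) for f in bank], axis=0)
    combined[combined < threshold] = 0.0
    return combined

# On a noiseless image with a vertical line, the response map peaks on
# the line; in the white-noise stimuli it does not, which is the failure
# described in the text.
img = np.zeros((64, 64))
img[10:54, 32] = 1.0
resp = bank_response(img)
```

On the clean image the peak of the combined response falls on the line's column; embedding the same line in white noise swamps the filter outputs, as Figures A.3 and A.4 show.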
For this model to succeed, the filters need to have their highest (or possibly lowest)
responses near the contour. The outputs of the filters (both the Gabor filter and the
straight surround-suppression filter) do not have a high (or low) response near the
location of the contour (see Figures A.3 and A.4). This is likely because the large
amount of noise surrounding the contour corrupts the filter response.
Because the problems with these filters appear to be driven by the noise, the next
approach is to use a filter that does not suppress the surround, but simply gives no
response to it. This is not standard practice with image filters, which generally must
sum to one: if the sum of the filter is not 1, there will be an overall increase or
decrease in the luminance of the image. I am currently developing this model, and
preliminary results appear promising. Figure A.5 shows the response of a vertical
filter that ignores the surround. The straight portion of the contour is highlighted
by this filter, but there are still many spurious responses due to the noise in the
image. Hopefully, as this model is developed further, these spurious
Figure A.3: The image on the left is an image containing a contour (just left of the
center of the image). On the right is the response to a vertical Gabor filter convolved
with the image. The filter should highlight the vertical portion of the contour, but
due to the integration of the noise surrounding the contour, there is not significantly
more activity near the straight portion of the contour than at other locations.
Figure A.4: The image on the left is an image containing a contour (just left of the
center of the image). On the right is the response to a vertical surround-suppression
filter convolved with the image. Just as with the Gabor filter, this filter should
highlight the vertical portion of the contour, but even with the surround noise being
suppressed, there is not significantly more activity near the straight portion of the
contour than at other locations.
Figure A.5: The image on the left is an image containing a contour (just left of the
center of the image). On the right is the response to a vertical filter that ignores noise
surrounding the middle of the filter. This filter does a much better job at identifying
the location of the contour, suggesting that a model based on this filter should be
developed further.
responses will not be strong enough to prevent the detection of the contours.
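The surround-ignoring filter itself is easy to specify: its surround weights are simply zero, so flanking noise contributes nothing to the response, and normalizing the weights to sum to 1 avoids the luminance shift mentioned above. The image size, strip width, and contour placement here are illustrative assumptions:

```python
import numpy as np

def center_only_filter(size=25, width=3):
    """Vertical filter with an 'on' centre strip and a zero (ignored)
    surround, unlike a surround-suppression filter's negative surround."""
    f = np.zeros((size, size))
    half = size // 2
    f[:, half - width // 2: half + width // 2 + 1] = 1.0
    return f / f.sum()           # sums to 1: no overall luminance change

rng = np.random.default_rng(1)
img = rng.random((64, 64))       # white-noise background
img[:, 32] = 1.0                 # vertical contour through the image

f = center_only_filter()
on_patch = img[20:45, 20:45]     # patch whose centre strip covers the contour
off_patch = img[20:45, 0:25]     # patch of pure noise
resp_on = float((f * on_patch).sum())
resp_off = float((f * off_patch).sum())
# resp_on is reliably larger: the contour pixels fall inside the centre
# strip, while the flanking noise is simply ignored.
```

The gap between the on-contour and off-contour responses is what makes the straight portion of the contour visible in Figure A.5, though spurious responses in noise regions remain.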
Bibliography
Altmann, C. F., Bülthoff, H. H., & Kourtzi, Z. (2003). Perceptual organization of local
elements into global shapes in the human visual cortex. Current Biology, 13(4), 342–349.
Attneave, F. (1954). Some informational aspects of visual perception. Psychological
Review, 61(3), 183–193.
August, J., Siddiqi, K., & Zucker, S. (1999). Ligature instabilities in the perceptual
organization of shape. Computer Vision and Image Understanding, 76(3), 231–243.
Bex, P. J., Simmers, A. J., & Dakin, S. C. (2001). Snakes and ladders: the role of temporal
modulation in visual contour integration. Vision Research, 41(27), 3775–3782.
Blum, H. (1973). Biological shape and visual science (part 1). Journal of Theoretical
Biology, 38, 205–287.
Brainard, D. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436.
Braun, J. (1999). On the detection of salient contours. Spatial Vision, 12, 211–225.
Briscoe, E. (2008). Shape skeletons and shape similarity. Ph.D. thesis, Rutgers University.
Brunswik, E., & Kamiya (1953). Ecological cue-validity of ’proximity’ and of other
gestalt factors. The American Journal of Psychology, 66(1), 20–32.
Burns, J. B., Hanson, A. R., & Riseman, E. M. (1986). Extracting straight lines. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 8(4), 425–455.
Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 8(6), 679–698.
Chan, T., & Vese, L. (1999). An active contour model without edges. In M. Nielsen,
P. Johansen, O. Olsen, & J. Weickert (Eds.) Scale-Space Theories in Computer Vision, vol.
1682 of Lecture Notes in Computer Science, (pp. 141–151). Springer Berlin Heidelberg.
Chan, T., & Vese, L. (2001). Active contours without edges. IEEE Transactions on
Image Processing, 10(2), 266–277.
Chater, N. (2000). Cognitive science: The logic of human learning. Nature, 407, 572–573.
Craft, E., Schutze, H., Niebur, E., & von der Heydt, R. (2007). A neural model of
figure-ground organization. Journal of Neurophysiology.
URL http://jn.physiology.org/content/early/2007/04/18/jn.00203.2007.short
Dakin, S., & Hess, R. (1998). Spatial-frequency tuning of visual contour integration.
Journal of the Optical Society of America, A, 15, 1486–1499.
De Valois, R., Albrecht, D., & Thorell, L. (1982a). The orientation and direction selectivity of cells in macaque visual cortex. Vision Research, 22, 531–544.
De Valois, R., Albrecht, D., & Thorell, L. (1982b). Spatial frequency selectivity of cells in
macaque visual cortex. Vision Research, 22, 545–599.
81
Duda, R. O., & Hart, P. E. (1972). Use of the Hough transformation to detect lines and
curves in pictures. Communications of the ACM, 15(1), 11–15.
Elder, J., & Goldberg, R. (1998). The statistics of natural image contours. Proceedings of
the IEEE Workshop on Perceptual Organisation in Computer Vision.
Elder, J., & Goldberg, R. (2002). Ecological statistics of Gestalt laws for the perceptual
organization of contours. Journal of Vision, 2, 324–353.
Elder, J., & Zucker, S. (1993). The effect of contour closure on the rapid discrimination
of two-dimensional shapes. Vision Research, 33, 981–991.
Ernst, U. A., Mandon, S., Schinkel-Bielefeld, N., Neitzel, S. D., Kreiter, A. K., &
Pawelzik, K. R. (2012). Optimality of human contour integration. PLoS Computational Biology, 8(5), e1002520.
Fang, F., Kersten, D., & Murray, S. O. (2008). Perceptual grouping and inverse fMRI
activity patterns in human visual cortex. Journal of Vision, 8(7).
Feldman, J. (1995). Perceptual models of small dot clusters. In I. J. Cox, P. Hansen, &
B. Julesz (Eds.) Partitioning Data Sets, (pp. 331–357). American Mathematical Society.
DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 19.
Feldman, J. (1997). Curvilinearity, covariance, and regularity in perceptual groups.
Vision Research, 37(20), 2835–2848.
Feldman, J. (2000). Minimization of boolean complexity in human concept learning.
Nature, 407, 630–633.
Feldman, J. (2004). How surprising is a simple pattern? Quantifying "eureka!". Cognition,
93, 199–224.
Feldman, J., & Singh, M. (2005). Information along contours and object boundaries.
Psychological Review, 112(1), 243–252.
Feldman, J., & Singh, M. (2006). Bayesian estimation of the shape skeleton. Proceedings
of the National Academy of Sciences, 103(47), 18014–18019.
Feldman, J., Singh, M., Briscoe, E., Froyen, V., Kim, S., & Wilder, J. (2013). An integrated
Bayesian approach to shape representation and perceptual organization. In
S. Dickinson, & Z. Pizlo (Eds.) Shape Perception in Human and Computer Vision: An
Interdisciplinary Perspective. Springer Verlag.
Field, D. (1987). Relations between the statistics of natural images and the response
properties of cortical cells. Journal of the Optical Society of America, 4, 2379–2394.
Field, D., Hayes, A., & Hess, R. (1993). Contour integration by the human visual
system: Evidence for a "local association field". Vision Research, 33(2), 173–193.
Field, D., Hayes, A., & Hess, R. (2000). The roles of polarity and symmetry in the
perceptual grouping of contour fragments. Spatial Vision, 13(1), 51–66.
Field, D., & Tolhurst, D. (1986). The structure and symmetry of simple cell receptive-field profiles in the cat's visual cortex. Proceedings of the Royal Society B, 228, 379–400.
82
Froyen, V., Feldman, J., & Singh, M. (2010). A bayesian framework for figure-ground
interpretation. In NIPS’10, (pp. 631–639).
Garrigan, P. (2012). The effect of contour closure on shape recognition. Perception, 41(2),
221–235.
Geisler, W., Perry, J., Super, B., & Gallogly, D. (2001). Edge co-occurrence in natural
images predicts contour grouping performance. Vision Research, 41, 711–724.
Geisler, W., & Super, B. (2000). Perceptual organization of two-dimensional patterns.
Psychological Review, 107(4), 677–708.
Gibson, J. (1966). The Senses Considered as Perceptual Systems. Boston, MA: Houghton
Mifflin.
Guttman, M., Prince, J., & McVeigh, E. (1994). Tag and contour detection in tagged MR
images of the left ventricle. IEEE Transactions on Medical Imaging, 13(1), 74–88.
Guttman, M. A., McVeigh, E. R., & Prince, J. (1992). Contour estimation in tagged
cardiac magnetic resonance images. In Engineering in Medicine and Biology Society,
1992 14th Annual International Conference of the IEEE, vol. 5, (pp. 1856–1857).
Harrison, S. J., & Feldman, J. (2009). The influence of shape and skeletal axis structure
on texture perception. Journal of Vision, 9(6).
He, D., Kersten, D., & Fang, F. (2012). Opposite modulation of high- and low-level
visual aftereffects by perceptual grouping. Current Biology, 22(1), 1040–1045.
Hess, R., & Dakin, S. (1997). Absence of contour linking in peripheral vision. Nature,
390, 602–604.
Hess, R., & Field, D. (1995). Contour integration across depth. Vision Research, 35(12),
1699–1711.
Hess, R., & Field, D. (1999). Integration of contours: new insights. Trends in Cognitive
Sciences, 3(12), 480–486.
Hess, R., Hayes, A., & Field, D. (2003). Contour integration and cortical processing.
Journal of Physiology-Paris, 97, 105–119.
Hough, P. (1962). Method and means for recognizing complex patterns. U.S. Patent
3069654.
Hung, C.-C., Carlson, E. T., & Connor, C. E. (2012). Medial axis shape coding in
macaque inferotemporal cortex. Neuron, 74(6), 1099–1113.
Jones, J., & Palmer, L. (1987). An evaluation of the two-dimensional Gabor filter model
of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6), 1233–
1258.
Kass, M., Witkin, A., & Terzopoulos, D. (1988). Snakes: Active contour models.
International Journal of Computer Vision, 1(4), 321–331.
Kellman, P., & Shipley, T. (1991). A theory of visual interpolation in object perception.
Cognitive Psychology, 23, 141–221.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace, & World.
Kovacs, I., & Julesz, B. (1993). A closed curve is much more than an incomplete one:
effect of closure in figure-ground segmentation. Proceedings of the National Academy
of Sciences, 90(16), 7495–7497.
Kovacs, I., & Julesz, B. (1994). Perceptual sensitivity maps within globally defined
visual shapes. Nature, 370, 644–646.
Lee, P. M. (2004). Bayesian Statistics: An introduction. West Sussex, UK: Wiley, third ed.
Lee, T. S., Mumford, D., Romero, R., & Lamme, V. A. (1998). The role of the primary
visual cortex in higher level vision. Vision Research, 38, 2429–2454.
Lescroart, M., & Biederman, I. (2012). Cortical representation of medial axis structure.
Cerebral Cortex, 23(3), 629–637.
Leymarie, F., Kimia, B., & Giblin, P. (2004). Towards surface regularization via medial
axis transitions. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th
International Conference on, vol. 3, (pp. 123–126).
Marotti, R. B., Pavan, A., & Casco, C. (2012). The integration of straight contours
(snakes and ladders): The role of spatial arrangement, spatial frequency and spatial
phase. Vision Research, 71(0), 44–52.
Marr, D., & Hildreth, E. (1980). Theory of edge detection. Proceedings of the Royal Society
of London. Series B. Biological Sciences, 207(1167), 187–217.
Mathes, B., & Fahle, M. (2007). Closure facilitates contour integration. Vision Research,
47(6), 818–827.
McIlhagga, W. H., & Mullen, K. T. (1996). Contour integration with colour and luminance contrast. Vision Research, 36(9), 1265–1279.
Mihalas, S., Dong, Y., von der Heydt, R., & Niebur, E. (2011). Mechanisms of perceptual organization provide auto-zoom and auto-localization for attention to objects.
Proceedings of the National Academy of Sciences, 108(18), 7583–7588.
Movshon, J., Thompson, I., & Tolhurst, D. (1987). Receptive field organization of
complex cells in the cat’s striate cortex. Journal of Physiology (London), 283, 53–77.
Murray, S. O., Kersten, D., Olshausen, B. A., Schrater, P., & Woods, D. L. (2002).
Shape perception reduces activity in human primary visual cortex. Proceedings of the
National Academy of Sciences, 99(23), 15164–15169.
Murray, S. O., Schrater, P. R., & Kersten, D. (2004). Perceptual grouping and the
interactions between visual cortical areas. Neural Networks, 17(5-6), 695–705.
Nestor, A., Vettel, J. M., & Tarr, M. J. (2008). Task-specific codes for face recognition:
How they shape the neural representation of features for detection and individuation. PLoS ONE, 3(12), e3978.
Olshausen, B., & Field, D. (1996a). Emergence of simple-cell receptive field properties
by learning a sparse code for natural images. Nature, 381, 607–609.
Olshausen, B., & Field, D. (1996b). Natural image statistics and efficient coding. Network: Computation in Neural Systems, 7, 333–339.
Pelli, D. (1997). The VideoToolbox software for visual psychophysics: Transforming
numbers into movies. Spatial Vision, 10, 437–442.
Pettet, M. W. (1999). Shape and contour detection. Vision Research, 39(3), 551–557.
Pettet, M. W., McKee, S. P., & Grzywacz, N. M. (1998). Constraints on long range
interactions mediating contour detection. Vision Research, 38(6), 865–879.
Psotka, J. (1978). Perceptual processes that may create stick figures and balance. Journal
of Experimental Psychology, 4(1), 101–111.
Ren, X., & Malik, J. (2002). A probabilistic multi-scale model for contour completion
based on image statistics. In A. Heyden, G. Sparr, M. Nielsen, & P. Johansen (Eds.)
Computer Vision - ECCV 2002, vol. 2350 of Lecture Notes in Computer Science, (pp.
312–327). Springer Berlin Heidelberg.
Saarinen, J., & Levi, D. M. (1999). The effect of contour closure on shape perception.
Spatial Vision, 12(2), 227–238.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical
Journal, 27, 379–423 and 623–656.
Siddiqi, K., Shokoufandeh, A., Dickinson, S., & Zucker, S. (1998). Shock graphs and
shape matching. In Computer Vision, 1998. Sixth International Conference on, (pp.
222–229).
Singh, M., & Feldman, J. (2013). Principles of contour information: a response to Lim
and Leek (2012). Psychological Review, 119, 678–683.
Singh, M., & Fulvio, J. M. (2005). Visual extrapolation of contour geometry. Proceedings
of the National Academy of Sciences of the United States of America, 102(3), 939–944.
Singh, M., & Fulvio, J. M. (2007). Bayesian contour extrapolation: Geometric determinants of good continuation. Vision Research, 47(6), 783–798.
Tversky, T., Geisler, W., & Perry, J. (2004). Contour grouping: Closure effects are
explained by good continuation and proximity. Vision Research, 44, 2769–2777.
Wertheimer, M. (1958). Principles of perceptual organization. In D. Beardslee, &
M. Wertheimer (Eds.) Readings in perception, (pp. 103–123). Princeton: Van Nostrand
(translated from original published in 1923).
Wilder, J., Feldman, J., & Singh, M. (2011). Superordinate shape classification using
natural shape statistics. Cognition, 119(3), 325–340.
Yuan, C., Lin, E., Millard, J., & Hwang, J.-N. (1999). Closed contour edge detection of
blood vessel lumen and outer wall boundaries in black-blood MR images. Magnetic
Resonance Imaging, 17(2), 257–266.
Yuille, A. L., Fang, F., Schrater, P. R., & Kersten, D. (2004). Human and ideal observers
for detecting image curves. In Neural Information Processing Systems (NIPS).
Zhaoping, L. (2005). Border ownership from intracortical interactions in visual area
V2. Neuron, 47(1), 143–153.
Zhou, H., Friedman, H. S., & von der Heydt, R. (2000). Coding of border ownership in
monkey visual cortex. The Journal of Neuroscience, 20(17), 6594–6611.