THE INFLUENCE OF COMPLEXITY ON THE DETECTION OF CONTOURS

by JOHN WILDER

A dissertation submitted to the Graduate School—New Brunswick, Rutgers, The State University of New Jersey, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Graduate Program in Psychology. Written under the direction of Dr. Jacob Feldman.
New Brunswick, New Jersey
October, 2013

ABSTRACT OF THE DISSERTATION

The Influence of Complexity on the Detection of Contours
By JOHN WILDER
Dissertation Director: Dr. Jacob Feldman

Detecting objects in visual scenes is an important function of the visual system. Studies of contour detection and contour integration have answered many questions about the human visual system, but the role of contour geometry is not well understood. This thesis treats the problem of contour detection as a Bayesian decision problem. I begin by describing a generative model for natural contours. Bayesian arguments predict that simple contours (high probability under the generative model) should be easier to detect than more complex (lower-probability) ones. In the case of open contours (Experiments 1 and 2), a complexity measure follows from a well-established contour-generating model. For closed contours, which have been far less studied, complexity measures require a novel model that involves the shape of the region enclosed by the contour. The results for closed contours (Experiments 3 and 4) show that the complexity of the contour and the complexity of the shape of the bounded region jointly affect the ability of human observers to detect the contour in a noise field. In summary, contour integration has been treated mainly as a local grouping problem, but these results suggest that there is an important role for global factors in detection.
Additionally, while the mathematical framework for measuring complexity was used here to study contour detection, it is also general enough to be useful in all areas of pattern detection where an explicit generative model is defined.

Acknowledgements

While working on this dissertation I have received help and support from many people. I wish to say thank you to them all. My fellow graduate students and lab members have supported and encouraged me, and provided opportunities for fun times and relaxation. There are too many people to name here, but a special thanks to everyone who attended CAST meetings and my teammates on Psy Jung. Thanks to Min Zhao for all her help with the kids. A heartfelt thank you to the staff of the Psychology Department and the Center for Cognitive Science. Without the help of Anne Sokolowski, Donna Tomaselli, John Dixon, Sue Cosentino, and JoAnn Meli I would not have been able to navigate the Rutgers bureaucracy; they made sure I was registered for classes, had insurance coverage, and was getting paid. Thank you to all my committee members: Jacob Feldman, Manish Singh, Eileen Kowler, and Ahmed Elgammal. Their input was useful in making this a dissertation of which I am very proud. Thanks to both Jacob and Eileen for their extensive work improving my writing skills. The difference between my writing when I began my graduate career and now is astonishing. Thank you for helping develop this skill that is necessary for succeeding in a research career. Thank you to Manish Singh, for all the valuable help debugging research ideas and deciding the appropriate statistical methods to use in my many research projects. Thank you to Eileen Kowler, who has not only spent an abundance of time helping with my intellectual development, but also helped me develop presentation skills, introduced me to many important people in the field, and helped make it easy to balance work and family.
Thank you to Jacob for all of the assistance in developing research ideas, designing experiments, finding relevant literature, encouraging me along the way, and discussing what movies and TV shows I should be watching. Thank you to my family, especially Kristina. It is for you I have worked so hard.

Dedication

This dissertation is dedicated to Kristina, Brigid, and Baby Stu. Kristina, thank you for your help and support throughout graduate school, for listening to research ideas, and for participating in experiments. Brigid, thank you for all of the fun and games, reading stories, and thank you for behaving nicely when I was busy writing and couldn't play. Baby Stu, thank you for waiting to be born until after I completed this dissertation, and thank you for all the fun I know we will have as you grow up.

Table of Contents

Abstract
Acknowledgements
Dedication
List of Figures
1. General Background
   1.1. Contour Integration
   1.2. Natural Images
   1.3. Open Questions
   1.4. Dissertation Overview
2. Open Contours
   2.0.1. Smooth contours
   2.1. Bayesian model
   2.2. Experiment 1
      2.2.1. Method (Subjects; Procedure and Stimuli)
      2.2.2. Results and Discussion
   2.3. Experiment 2
      2.3.1. Method (Subjects; Procedure and Stimuli)
      2.3.2. Results and Discussion
   2.4. General discussion
   2.5. Conclusion
   2.6. Open vs Closed Contours
3. Closed Contours
   3.1. Experiment 3: Natural Shapes
      3.1.1. Method (Subjects; Stimuli; Design and Procedure)
      3.1.2. Results
   3.2. Modeling
      3.2.1. Contour Complexity
      3.2.2. Shape Complexity Measures
   3.3. Experiment 4: Experimentally Manipulated Shapes
      3.3.1. Methods (Subjects; Stimuli; Design and Procedure)
   3.4. Results
   3.5. Discussion
4. General Conclusions
   4.0.1. Conclusion
Appendix A. Modeling Approaches
   A.1. Spatial Frequency
   A.2. Theoretical Decision Model
   A.3. Realistic Front End

List of Figures

2.1. Sample stimulus and blown-up patch from the stimulus image
2.2. Example stimulus in Experiments 1 and 2
2.3. Individual subjects' performance as a function of complexity in Experiment 1
2.4. Experiment 1 data, combined across subjects; performance as a function of complexity
2.5. Individual subjects' performance as a function of complexity in Experiment 2
2.6. Experiment 2 data, combined across subjects; performance as a function of complexity
2.7. Individual subjects' performance as a function of the location of the complexity in Experiment 2
2.8. Experiment 2 data, combined across subjects; performance as a function of the location of the complexity
3.1. Examples of natural shapes, from which contours would be extracted and embedded in noisy images
3.2. Example natural closed contour embedded in noise
3.3. Combined subjects' performance as a function of DL(CONTOUR); Experiment 3 (Natural Shapes) data
3.4. Individual subjects' performance as a function of DL(CONTOUR); Experiment 3 (Natural Shapes) data
3.5. Three bears example of the MAP skeleton
3.6. A graphical depiction of DL(SKELETON) and DL(MISFIT)
3.7. Combined subjects' performance as a function of DL(MISFIT); Experiment 3 (Natural Shapes) data
3.8. Individual subjects' performance as a function of DL(MISFIT); Experiment 3 (Natural Shapes) data
3.9. Combined subjects' performance as a function of DL(SKELETON); Experiment 3 (Natural Shapes) data
3.10. Individual subjects' performance as a function of DL(SKELETON); Experiment 3 (Natural Shapes) data
3.11. Examples of shapes used in Experiment 4 (experimentally manipulated shapes)
3.12. Combined subjects' performance as a function of the manipulated contour noise variable; Experiment 4 (Manipulated Shapes) data
3.13. Combined subjects' performance as a function of the manipulated axial structure variable; Experiment 4 (Manipulated Shapes) data
3.14. Combined subjects' performance as a function of DL(CONTOUR); Experiment 4 (Manipulated Shapes) data
3.15. Combined subjects' performance as a function of DL(MISFIT); Experiment 4 (Manipulated Shapes) data
3.16. Combined subjects' performance as a function of DL(SKELETON); Experiment 4 (Manipulated Shapes) data
A.1. Power spectrum of stimulus image
A.2. Distributions of the number of straight continuations
A.3. The response of a vertical Gabor filter to an image from Experiment 1
A.4. The response of a vertical surround-suppression filter to an image from Experiment 1
A.5. The response of a vertical filter that ignores the surround to an image from Experiment 1

1. General Background

In order to properly function in the environment we must be able to move around without running into objects and without being run over by other moving objects. We need to decide which objects in the environment are okay to eat, and which ones are not. In order to eat the objects that are edible, we must be able to grasp them so they can be moved to our mouth. Additionally, we need to decide whether or not an object in our environment is likely to eat us. All of these decisions rely on accurately perceiving the properties of the relevant objects. The human visual system, however, does not receive input in a form where all of the objects are segmented, localized, and labeled. The visual system is faced with the problem of extracting objects from cluttered scenes and correctly representing the properties of the extracted objects. The input of the visual system is the image on the retina: a set of responses from individual photoreceptors that are completely ungrouped.
Finding the correct grouping of the individual elements is a difficult problem. There has been a long history of attempts to describe how the visual system groups individual elements and combines them in order to detect the objects in the environment. Unfortunately, this problem remains unsolved for both the human visual system and computer vision. The grouping of elements in the visual field has been well studied since the time of the Gestalt psychologists in the early to mid 1900s. The Gestalt psychologists described relationships between individual elements that result in those elements being perceived as a whole object instead of simply individual elements (Wertheimer, 1958), famously noting that the whole object is different than the sum of the individual parts (Koffka, 1935). Their theories were descriptive in nature, explaining what contributes to grouping, but were not quantitative, lacking rigorous detail about the mathematics behind the different grouping cues. Fast-forwarding to the latter half of the century, Field, Hayes, & Hess (1993) specifically looked at how individual elements of the visual field are integrated into a single contour, and they proposed a model of the brain mechanisms that perform the integration, allowing for concrete predictions about which elements will be grouped.

1.1 Contour Integration

When attempting to detect the presence of an object, the observer must group the individual components of the visual field into a representation of that object. In order to decide what is (or is not) an object, the brain must find the boundary between the object and other objects or the background. The first step in this process is finding candidate edge elements, followed by the grouping of the individual elements that make up the bounding contour of the object. If individual elements at the boundary are strongly grouped, the visual system can infer that they are all part of the same contour and detection will succeed; otherwise it will not.
In order to discover how the brain detects contours we must understand both how the individual elements are encoded and how the brain integrates those elements. De Valois, Albrecht, & Thorell (1982a,b), Field & Tolhurst (1986), and Movshon, Thompson, & Tolhurst (1987) all studied the properties of the primary visual cortex of both cats and monkeys. The cells from which they recorded tended to be orientation selective, and localized in both space and frequency. Jones & Palmer (1987), motivated by these findings, suggested that a good approximation of the response properties of these cells is a sinusoidal grating under a Gaussian envelope (a Gabor filter). Results such as these are one of the justifications for using Gabor stimuli in studies of contour integration. Field et al. (1993) showed subjects an image containing many Gabor patches (i.e., images of 2D Gabor filters). Some of the Gabor patches were positioned in such a way that the visual system would group them into a single contour. Since Field et al. (1993), there have been numerous follow-up studies of human contour integration, which will be described below. In order to detect a contour in a field of Gabor patches, the visual system must process the individual Gabor patches and group them into a chain of elements. Field et al. (1993) proposed a model that could perform this grouping task: the association field model. Their model groups elements that are parallel to the contour, where the change in the path angle is less than 60 degrees, and where the separation between the elements is less than 7 times their width. The association field model, like the ideas of the Gestalt psychologists, suggests that certain properties are necessary for the grouping of similar elements. Unlike the work of the Gestalt psychologists, the association field model is able to make concrete, quantitative predictions about which Gabor patches in a larger field will be grouped.
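The quantitative character of this kind of rule can be illustrated with a small sketch. The function below is mine, not code from the dissertation, and the names are illustrative; it implements only the two numeric criteria stated above (separation under 7 element widths, and a turn of less than 60 degrees between each element's orientation and the path joining the pair).

```python
import math

def may_group(p1, theta1, p2, theta2, width,
              max_turn_deg=60.0, max_sep_factor=7.0):
    """Sketch of an association-field-style pairwise grouping rule.

    p1, p2: (x, y) element centers; theta1, theta2: element orientations
    in radians (orientations are defined modulo pi). The pair is a
    grouping candidate only if the elements are close enough and the
    path between them turns by less than max_turn_deg at each element.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    sep = math.hypot(dx, dy)
    if sep > max_sep_factor * width:        # proximity criterion
        return False
    path_angle = math.atan2(dy, dx)         # direction of the joining path

    def turn(theta):
        # smallest angle between an orientation (mod pi) and the path
        d = abs((theta - path_angle) % math.pi)
        return min(d, math.pi - d)

    max_turn = math.radians(max_turn_deg)
    return turn(theta1) < max_turn and turn(theta2) < max_turn
```

Two nearby collinear elements pass the test; elements that are too far apart, or oriented at right angles to the path, do not.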
In order to be grouped, elements must have similar orientations (like the Gestalt law of similarity), the elements must be near to each other (the law of proximity), and the turning angle between elements must be small (similar to the laws of continuity and common fate) (Field et al., 1993; Hess, Hayes, & Field, 2003). The polarity or color of the individual Gabor patches and the phase of those patches need not be the same in subsequent elements in order to be integrated into a single chain of elements (McIlhagga & Mullen, 1996; Hess & Dakin, 1997; Dakin & Hess, 1998; Field, Hayes, & Hess, 2000). Additionally, having the same orientation greatly increases the strength of grouping; however, if the orientations of the elements are perpendicular to the contour, grouping will be stronger than for elements with random orientations (Bex, Simmers, & Dakin, 2001; Marotti, Pavan, & Casco, 2012). All of the contour integration studies discussed above use open contours, that is, chains of elements that never cross themselves. There has been less work on the detection of contours that touch at their ends, resulting in a bounded interior region, in other words, closed contours (Kovacs & Julesz, 1993; Elder & Zucker, 1993; Pettet, McKee, & Grzywacz, 1998; Braun, 1999; Tversky, Geisler, & Perry, 2004; Mathes & Fahle, 2007). This is surprising because closed contours are likely more behaviorally relevant than open contours. The visual system's preference for closed contours is so strong that even when it encounters an open contour it seeks a perceptual explanation that allows it to infer a closed contour (Kellman & Shipley, 1991). With the exception of Tversky et al. (2004), the closed contour integration literature supports the Gestalt law of closure. A group of Gabor patches that connect to form a closed shape is more easily detected (Kovacs & Julesz, 1993; Pettet et al., 1998; Braun, 1999).
Contrast sensitivity on the interior of a shape is enhanced at certain locations (Kovacs & Julesz, 1994). Also, closed shapes are more easily discriminated (Elder & Zucker, 1993) and remembered (Garrigan, 2012) than non-closed shapes. The mechanisms that result in this closure benefit are still not understood. Field et al. (1993) was motivated by a desire to understand the mechanisms in the early visual system that integrate individual elements into a whole. Models of contour integration explain what the visual system does. These models do not answer questions about why the visual system might have evolved to work in this way, nor do they tell us how efficiently the visual system solves these problems, that is, how close the system comes to optimal contour detection. Field (1987) measured statistical regularities in natural images, and spawned an area of research that looks at how the properties of our environment relate to the design of the visual system and whether the visual system efficiently represents those regularities.

1.2 Natural Images

Both Brunswik & Kamiya (1953) and Gibson (1966) stressed the importance of understanding the environment before we are able to fully understand human cognition and perception. Understanding the environment and its relationship to the visual system was the motivating idea for the study of natural image statistics. By understanding the structure of the environment we can measure how efficient our visual systems are at representing that environment. Additionally, knowledge of the statistical regularities in the environment can lead to a better understanding of the driving forces in the evolution of the visual system. Field (1987) showed that it is possible to understand the structure of natural images, and that the Gabor filter model of early visual cortex is an efficient way to represent natural images. In other words, the structure of the human visual system is well matched to the structure of natural images.
Olshausen & Field (1996a,b) trained a system to represent natural images under the constraints that the representation must be sparse and accurate. The filters the system learned were very similar to Gabor filters, again showing that the structure of natural images lends itself nicely to a representation based on a bank of Gabor filters. A set of Gabor filters is an efficient way to represent natural images, suggesting a possible reason why the visual system developed receptive fields with properties similar to Gabor filters. Is there similar evidence to support association-field-type models of integrating individual Gabor elements into a perceived whole? Elder & Goldberg (1998) also measured the statistics of images; specifically, statistics that relate to grouping rules based on proximity, good continuation, and brightness similarity. These three grouping rules were used to define a probability distribution over contours. Contours that follow all of the grouping rules have high likelihoods, and contours that do not meet the grouping rules have low likelihoods. Elder & Goldberg (1998) also found that the contours identified by human observers tended to have high likelihood under the model based on the three grouping rules. Geisler, Perry, Super, & Gallogly (2001) continued this line of work, finding similar results. They showed that contours in natural images tend to be smooth (the turning angle between elements is small), the edge elements tend to be close in proximity, and the elements tend to be co-circular (so that the elements are parallel to the contour and have orientations similar to their neighbors'). Geisler et al. (2001) also tested human observers' ability to detect contours in a field of random edge elements. The human observers were more likely to detect the contours that were more likely to occur in natural images. Taken together, the results of Elder & Goldberg (1998) and Geisler et al.
(2001) show that the edge elements belonging to the same contours in natural images are also integrated into a single contour by human observers, whose performance was shown, by Field et al. (1993), to be well explained by the association field model. In other words, the association field model explains human contour detection because the visual system is well matched to the environment in which it evolved.

1.3 Open Questions

Historically, the work on contour integration in psychology and neuroscience primarily focuses on explaining what the visual system does. In contrast, natural image statistics are measured in order to answer questions about why the system works this way, under the assumption that the system is near-optimally evolved to match the properties of the environment. The work on natural image statistics has revealed important information about the structure of the environment. There is, however, a weak point in the studies of natural image statistics discussed above. In both Elder & Goldberg (1998) and Geisler et al. (2001), some of the statistics were collected from contours traced by observers. This weakens the overall conclusions: in effect, they show that human subjects are good at detecting the contours in natural images that were detected by other human subjects. These findings could be strengthened by giving an additional reason why the human visual system solves the contour detection problem using the grouping rules found in the literature described above, which is one goal of this thesis. Open contour detection has been extensively studied, but it is not clear how frequently we encounter open contours in our environment. Objects can be segmented from the background for further analysis only after the bounding contour of the object is detected. This bounding contour will be a closed contour. Closed contours are the contours that define the objects in our environment and as such are the contours most relevant for human behavior.
Because closed contour detection is less well studied it is also less well understood. Specifically, it is not known whether local interactions between elements can account for closed contour detection, or whether more global properties are involved. A second goal of this thesis is to investigate the role of local and global effects in contour detection. The bounded region inside a contour has a shape. Perhaps one of the reasons why closed contour detection is not well understood is that detection may be affected not only by properties of the contour but by properties of the shape as well. In order to account for the effect of shape, the theory would need a model of shape in addition to the model of contour integration. Is there a representation of shape that contains the psychologically meaningful information that is useful for contour detection? A third goal of this thesis is to investigate how shape might be represented in a meaningful way for a detection task.

1.4 Dissertation Overview

The approach in this thesis is to explain why the system functions the way it does by first seeking to understand the nature of the decision problem facing the system. I will show that the design of the system is not simply due to the structure of natural contours, but due to the mathematics of the decision problem. Chapter 2 investigates the detection of open contours (i.e., contours that do not cross or close on themselves). A generative model of natural contours is proposed. This generative model is used as the basis for a complexity measure, which is in turn shown to be related to the detectability of the contours. The complexity of the entire contour is important, not simply local points of high complexity. Using the complexity measure, I show that by setting up the detection problem as a decision between two alternatives (seeing noise or seeing a contour), a decrease in performance with an increase in curvature is expected.
Chapter 3 investigates the detection of closed contours. Through two experiments, one using natural contours and the other using experimentally manipulated contours, we show that the detectability of the contours cannot be completely explained by the contour complexity measure used for open contours. Additional complexity measures are introduced that involve the complexity of the bounded interior region, as opposed to simply the bounding contour. I explain why the MAP skeleton representation of shape (Feldman & Singh, 2006) is a principled choice for the basis of the shape complexity measure. In order to explain our subjects' detection results, both the complexity of the contour and the complexity of the shape of the bounded region must be considered. These results suggest that, when performing a contour detection task, the visual system integrates information across a larger region of space, rather than considering only a small neighborhood near each contour point. Additionally, the mathematical structure of the detection problem reveals a principled reason why our visual system evolved grouping mechanisms that prefer less complex contours.

2. Open Contours

The detection of coherent objects amid noisy backgrounds is an essential function of perceptual organization, allowing the visual system to distinguish discrete whole forms from random clutter. Psychophysical aspects of this process have been extensively studied using contour detection tasks, in which subjects are asked to detect coherent chains of oriented elements amid background fields containing randomly oriented elements (Field et al., 1993; Kovacs & Julesz, 1993; Hess & Field, 1995; Pettet et al., 1998; Hess & Field, 1999; Field et al., 2000; Geisler et al., 2001; Yuille, Fang, Schrater, & Kersten, 2004).
From this literature we now know that cells in the visual cortex are able to transmit the information contained in natural images efficiently (Field, 1987), that closed contours are more easily detected than open contours (Kovacs & Julesz, 1993), that contours can be integrated across depths (Hess & Field, 1995), that large curvature discontinuities disrupt detection (Pettet et al., 1998), that the tokens that are integrated do not need to appear identical (Field et al., 2000), and that human detection performance is close to ideal for contours whose properties are similar to those found in natural images (Yuille et al., 2004). But despite this extensive literature, the computational processes underlying contour detection are still poorly understood. Field et al. (1993) showed that contours are more difficult to detect when the turning angles, α, between consecutive elements are large, an effect widely interpreted as implicating lateral connections between oriented receptive fields in visual cortex, the "association field." This preference for small turning angles is sometimes expressed as a preference for smooth curves, because turning angle can be thought of as a discretization of the curvature of an implied underlying contour. (Curvature κ is the derivative of the tangent direction with respect to arclength s. The turning angle α is the angle by which the tangent changes between samples separated by ∆s, i.e. α ≈ κ∆s.) The association field is useful in describing which elements are integrated into a contour. However, it is a qualitative model that can be thought of as simply separating contours into two groups: those that are integrated and those that are not. It has nothing to say about the differences between two contours that have both been integrated, or about the total integrated complexity of the entire contour, since it considers only the relationship between neighboring elements.
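The discretization above is easy to make concrete: given a sampled contour, the turning angle at each interior vertex is just the change in tangent direction between the incoming and outgoing segments. This small helper is my own illustration, not code from the dissertation.

```python
import math

def turning_angles(points):
    """Signed turning angle at each interior vertex of a polyline.

    points: list of (x, y) samples along a contour. For a densely
    sampled smooth curve, each turning angle approximates
    curvature times the arclength step (alpha ~ kappa * ds).
    """
    angles = []
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        t_in = math.atan2(y1 - y0, x1 - x0)    # incoming tangent direction
        t_out = math.atan2(y2 - y1, x2 - x1)   # outgoing tangent direction
        a = t_out - t_in
        # wrap the difference into (-pi, pi]
        while a <= -math.pi:
            a += 2 * math.pi
        while a > math.pi:
            a -= 2 * math.pi
        angles.append(a)
    return angles
```

A straight polyline yields turning angles of zero; a right-angle bend yields π/2.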
A number of more recent studies have confirmed that, broadly speaking, detection performance declines with larger turning angles (Ernst, Mandon, Schinkel-Bielefeld, Neitzel, Kreiter, & Pawelzik, 2012; Geisler et al., 2001; Hess et al., 2003; Pettet et al., 1998). The more the curve “zigs and zags,” the less detectable it is. Nevertheless a principled quantitative model of the decline in performance as a function of turning angle, predicting exactly how performance is a↵ected by contour geometry, and why, is still elusive. In part this reflects the lack of an integrated probabilistic account of the decision problem inherent in contour detection. Below I aim to take steps towards such a model by casting the contour detection problem as a Bayesian decision problem, in which the observer must decide based on a given arrangement of visual elements whether a “smooth” contour is actually present. To this end I introduce a simple probabilistic model of smooth contours, derive a Bayesian model for detecting them amid noisy backgrounds, and compare the predictions of the model to the performance of human subjects in a simple contour detection task. 2.0.1 Smooth contours To model contour detection as a decision problem, we need an explicit probabilistic generative model of contours, i.e. a model that assigns a probability to each potential sequence of turning angles. The association field and similar models assume simply that smooth contours consist of chains of small (i.e., relatively straight) turning angles, without any explicit probabilistic model. A more specific probabilistic model of turning angle has been proposed and developed in Feldman (1995, 1997); Feldman & Singh (2005); Singh & Fulvio (2005, 2007). In this model, I assume that turning angles along a smooth contour are approximately straight (collinear), with random deviations from collinearity that are distributed according to a von Mises distribution of turning angles, p(↵|C) / exp β cos ↵. 
(2.1)

This equation defines the likelihood model for angles under the contour (C) hypothesis, which indicates the probability of each particular turning angle if this hypothesis holds. The von Mises distribution is similar to a Gaussian (normal) distribution, except suited to angular measurements, with the parameter β acting like the inverse of the variance. In practice it is often more mathematically convenient to work with a Gaussian distribution of turning angle,

p(α|C) = (1/(σ√(2π))) exp[−(1/2)(α/σ)²],     (2.2)

which is nearly identical numerically to the von Mises for small angles. (The two distributions' Taylor series agree up to the first two terms; see Feldman & Singh, 2005.) This distribution (whether von Mises or Gaussian) captures the idea that at each point along a smooth contour, the contour is most likely to continue straight (zero turning angle), with larger turning angles increasingly unlikely. I then extend this local model to create a likelihood model for an entire contour by assuming that successive turning angles are independently and identically distributed (i.i.d.) from the same distribution. Thus a sequence [α_i] = [α_1, α_2, . . . , α_N] of N turning angles has probability given by the product of the individual angle probabilities,

L_C = p([α_i]|C) = p(α_1|C) p(α_2|C) . . . p(α_N|C),     (2.3)

which, under the Gaussian approximation, equals

L_C = (1/(σ√(2π)))^N exp[−(1/(2σ²)) Σ_{i=1}^{N} α_i²].     (2.4)

Technically, the assumption that successive turning angles are i.i.d. means that the sequence of orientations (local tangents) along the contour forms a Markov chain (nonadjacent orientations are independent conditioned on intervening ones). Contours generated from this model undulate smoothly but erratically, turning left and right equally often, though always most likely straight ahead.
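To make the generative model concrete, here is a minimal sketch in Python with NumPy (the function names are mine, not the dissertation's) that samples a contour under the Gaussian approximation of Eq. 2.2 and evaluates the log of the likelihood in Eq. 2.4:

```python
import numpy as np

def sample_contour(n_angles, sigma, seed=None):
    """Sample turning angles i.i.d. from the straight-centered Gaussian
    (Eq. 2.2), then integrate headings to get contour vertex positions."""
    rng = np.random.default_rng(seed)
    alphas = rng.normal(0.0, sigma, n_angles)      # turning angles
    headings = np.cumsum(alphas)                   # local tangent directions
    steps = np.stack([np.cos(headings), np.sin(headings)], axis=1)
    points = np.vstack([[0.0, 0.0], np.cumsum(steps, axis=0)])
    return alphas, points

def log_likelihood(alphas, sigma):
    """Log of Eq. 2.4: ln L_C = -N ln(sigma sqrt(2 pi)) - sum(alpha^2)/(2 sigma^2)."""
    n = len(alphas)
    return (-n * np.log(sigma * np.sqrt(2 * np.pi))
            - np.sum(np.asarray(alphas) ** 2) / (2 * sigma ** 2))

alphas, points = sample_contour(100, sigma=0.2, seed=0)
```

A perfectly straight contour (all turning angles zero) maximizes this likelihood; any undulation lowers it, which is the quantitative content of the model.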
This generative model can be augmented, such as to include a bias towards cocircularity (continuity of curvature; see Singh & Feldman, 2013), though in what follows we use the simpler collinear model.

2.1 Bayesian model

With this simple contour model in hand, I next derive a simple model of how the observer can distinguish samples drawn from it (that is, smooth contours) from noise. In the experiments below, we embed dark contours generated via the above model in fields of random pixel noise (Fig. 2.1a). Most studies of contour detection starting with Field (1987) have used displays constructed from Gabor elements arranged in a spatial grid, in part to avoid element density cues, and also to optimize the response of V1 cells. However such displays are extremely constrained in geometric form, and partly for this reason Geisler et al. (2001) used simple line segments that can be arranged more freely, and Yuille et al. (2004) used matrices of pixels. My aim was to achieve complete flexibility of target contour shape, so that I could study the effects of contour geometry as comprehensively as possible. So, like Yuille et al. (2004), I constructed each display by embedding a monochromatic contour (a chain of pixels of equal luminance) in pixel noise (a grid of pixels of random luminance; Fig. 2.1a). This construction allows considerable freedom in the shape of the target contour, while still presenting the observer with a challenging task. I begin by deriving a simple model of how an observer can distinguish an image that contains a target generated from the above smooth contour model from an image that does not (pure pixel noise). The target curve consists of a chain of dark pixels passing through a field of pixels of random luminances.
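The display construction just described can be sketched in a few lines. This is an illustrative simplification only (function name and coordinates are mine; the dilation and 2x nearest-neighbor upscaling used in the actual experiments are omitted):

```python
import numpy as np

def make_display(contour_rc, size=225, contrast=0.85, seed=None):
    """Embed a chain of equal-luminance dark pixels (the contour) in a
    field of uniform [0, 1] pixel noise. `contour_rc` is an (N, 2) array
    of integer row/column pixel coordinates."""
    rng = np.random.default_rng(seed)
    img = rng.uniform(0.0, 1.0, (size, size))
    rows, cols = contour_rc[:, 0] % size, contour_rc[:, 1] % size
    img[rows, cols] = 1.0 - contrast   # dark pixels at 85% contrast
    return img

# A simple horizontal chain of 110 pixels through the middle of the field:
chain = np.stack([np.full(110, 112), np.arange(50, 160)], axis=1)
display = make_display(chain, seed=0)
```

Because the contour pixels share the luminance of many noise pixels, there is no luminance boundary to exploit; only the spatial coherence of the chain distinguishes target from noise.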
Within each local image patch, a simple strategy to distinguish a target from a distractor is to consider the longest path of uniform luminance through the patch, and decide whether the path is more likely to have been generated by the smooth curve model C, with von Mises distributed turning angles, or is simply part of a patch of random background pixels (Fig. 2.1b), which I will refer to as the "null" model H_0.

Figure 2.1: (a) Sample image (with contour target) from experiments. (b) Blowup of a patch of the image, showing a chain of locally darkest pixels. The observer must decide if this path is a smooth path drawn from the von Mises model, or a random walk resulting from the pixel noise.

We can use Bayes' rule to assess the relative probability of the two models, and then extend the decision to a series of image patches extending the length of the potential contour. Under the smooth contour model, the observed sequence of turning angles has likelihood given in Eq. 2.4, the product of an i.i.d. sequence of von Mises (or normal) angles. Conversely, under the null model, the contour actually consists of pixel noise, and each direction is equally likely to be the continuation of the path, yielding a uniform distribution over turning angle p(α|H_0) = ε. (In experimental displays, ε = 1/3 because the curve could continue left, straight, or right, all with equal probability, but here I state the theory in more general terms.) In the null model, as in the contour model, I assume that turning angles are all independent conditioned on the model, so the likelihood of the angle sequence [α_i] is just the product of the individual angle probabilities,

L_0([α_i]) = ε^N,     (2.5)

reflecting a sequence of N "accidental" turning angles each with probability ε. By Bayes' rule, the relative probability of the two interpretations C and H_0 is given by the posterior ratio p(C|[α_i])/p(H_0|[α_i]).
Assuming equal priors (which is reasonable in our experiments as targets and distractors appear equally often), this posterior ratio is simply the likelihood ratio L_C/L_0. As is conventional, I take the logarithm of this ratio to give an additive measure of the evidence in favor of the contour model relative to the null model (sometimes called the Weight of Evidence, WOE):

log(L_C/L_0) = log L_C − log L_0.     (2.6)

The first term in this expression, the log likelihood under the contour model, is familiar as the (negative of the) Description Length (DL), defined in general as the negative logarithm of the probability:

DL(M) = − log p(M).     (2.7)

As Shannon (1948) showed, the DL is the length of the representation of a message M having probability p(M) in an optimal code, making it a measure of complexity of the message M. (Once the coding language is optimized, less probable messages take more bits to express.) In this case, plugging in the expression for the likelihood of the contour model (Eq. 2.4) (and using natural logarithms for convenience), the DL of the contour [α_i] is just

DL([α_i]) = − ln L_C = N ln(σ√(2π)) + (1/(2σ²)) Σ_i α_i²,     (2.8)

and the weight of evidence in favor of the contour model is

ln(L_C/L_0) = −DL − N ln ε,     (2.9)

the negative of the DL minus a constant. For a contour of a given length N, the only term in Eq. 2.8 that depends on the shape of the contour is Σ_i α_i², the sum of the squared turning angles along its length. (The N ln ε term is a constant that does not depend on the observed turning angles.) The larger this sum, the more the contour undulates, and the higher its DL. The larger the DL, the less evidence in favor of the smooth contour model; the smaller the DL, the smoother the contour, and the more evidence in favor of the smooth model. Hence the Bayesian model leads very directly to a simple prediction about contour detection performance: it should decline as contour DL increases.
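Eqs. 2.8 and 2.9 translate directly into code. The sketch below (helper names are mine; ε defaults to the experimental value of 1/3) computes the DL and weight of evidence for a turning-angle sequence:

```python
import numpy as np

def description_length(alphas, sigma):
    """Eq. 2.8 (in nats): DL = N ln(sigma sqrt(2 pi)) + sum(alpha^2)/(2 sigma^2)."""
    alphas = np.asarray(alphas)
    return (len(alphas) * np.log(sigma * np.sqrt(2 * np.pi))
            + np.sum(alphas ** 2) / (2 * sigma ** 2))

def weight_of_evidence(alphas, sigma, eps=1/3):
    """Eq. 2.9: ln(L_C / L_0) = -DL - N ln(eps)."""
    return -description_length(alphas, sigma) - len(alphas) * np.log(eps)

smooth = np.full(50, 0.05)   # gently undulating: small turning angles
wiggly = np.full(50, 0.60)   # sharply undulating: large turning angles
```

For two sequences of equal length, only the summed squared angle differs, so the smoother sequence necessarily has the lower DL and the higher weight of evidence for the contour model.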
The larger the DL, the less intrinsically detectable the contour is, and the more the contour is intrinsically indistinguishable from pixel noise. Because this prediction follows so directly from a simple formulation of the decision problem, in what follows I treat it as a central empirical issue. In the following experiments I manipulate the geometry of the target contour, specifically its constituent turning angles, and evaluate observers' detection performance as a function of contour DL. More specifically, the data can be regarded as a test of the specific contour likelihood model I have adopted, which quantifies the probability of each contour under the model and thus, via Shannon's formula, its complexity. As suggested by Attneave (1954), contour curvature conveys information; the more the contour curves, the more "surprise" under the smooth model, and thus the more information required to encode it (Feldman & Singh, 2005). But DL, Shannon's measure of information, is also a measure of the contour's intrinsic distinguishability from noise. The DL is in fact a sufficient statistic for this decision, meaning that it conveys all the information available to the observer about the contour's likelihood under the smooth contour model. Of course, as discussed above, it is well known that contour curvature decreases detectability, and the derived dependence on the summed squared turning angles may simply be regarded as one way of quantifying that dependence. But the above derivation is novel because (a) it shows how the specific choice of contour curvature measure, the summed squared angle, derives from a standard contour generating model, and (b) it shows how the predicted decrease in detectability relates to the Description Length, which is a more fundamental relationship that generalizes far beyond the current situation. From this point of view, contour detection is simply a special case of a more general problem, the detection of structure in noise.
Generalizing the argument above, whenever the target class can be formulated as a stochastic generative model with an associated likelihood function, detection performance should decline as DL under the model increases. In the experiments below, I test this prediction for the case of open contours, that is, contours that do not cross over themselves.

2.2 Experiment 1

2.2.1 Method

Subjects

Ten naive subjects participated in the experiment for course credit in an introductory psychology course.

Procedure and Stimuli

Stimuli were displayed on a gamma-corrected iMac computer running Mac OS X 10.4 and Matlab 7.5, using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). The task was a 2IFC task: subjects had to decide whether a contour was present in the first or the second of two target images. Three masks were used: a premask presented before the first target image, a postmask presented after the second target image, and a mask presented between the two target images. Each target image and mask was displayed for 500 ms, and there was a 100 ms blank period between an image and mask or mask and image. All images subtended about 6.5 degrees of visual angle; each image was 15 cm × 15 cm on the display. Head position was not constrained, but was initially located 66 cm from the display, and subjects were instructed to remain as still as possible. If necessary, during three breaks evenly spaced throughout the experiment, the head was repositioned to 66 cm from the display. The target images were created by randomly sampling a 225 by 225 matrix of intensity values, with each pixel drawn from a uniform distribution between 0 and 1. There were two target images, in one of which a contour would be embedded. The contour was either long (220 pixels) or short (110 pixels).
The contour was generated by sampling sequential turning angles (θ) using a discretized approximation to a von Mises distribution. Continuing straight (θ = 0) is more common than turning right or left (θ = π/4 or θ = −π/4). Once the contour was generated it was dilated using the dilation mask ([0 1 0; 1 1 0; 0 0 0]). The dilation was performed so that, in a pilot study involving the experimenter, a perfectly straight contour oriented horizontally or vertically was detected as easily as one oriented at an arbitrary angle. Then, so that the contour would not appear at a different scale from the random pixels in the image, the image was increased in size to 550 by 550 pixels using nearest-neighbor interpolation, so that each pixel in the smaller image became a two-by-two pixel area in the larger image. Finally the contour was embedded in one of the target images. The contour was randomly rotated, and then placed at a random location in the image, with the restriction that the contour could not extend beyond the edge of the image. An example image, containing a contour, can be seen in Figure 2.2. The masks were random noise images consisting of black and white pixels, created by randomly sampling a binary matrix of size 225 by 225 and then scaling the matrix to 550 by 550 using nearest-neighbor interpolation, just as in the target images. The contours embedded in one of the target images were of one of five different complexity levels and one of two lengths (110 pixels or 220 pixels). Each contour was displayed at 85 percent contrast. There were 40 images of each length and each complexity, resulting in a total of 40 × 2 × 5 = 400 trials. The contour luminance, image size, and contour lengths were chosen so that in a pilot study subjects performed near 75 percent correct.

Figure 2.2: An example of an image containing a contour.
Here the contour is shown at maximum contrast (black) for the ease of the reader; in the experiment the contour was at 85 percent contrast. This image is best viewed or printed at a zoom of 100 percent, otherwise the image may be distorted by pixel averaging or interpolation.

2.2.2 Results and Discussion

Every subject showed some decrease in detection as the complexity of the stimuli increased (see Figure 2.3). Using a linear regression of the log odds of a correct answer against complexity, five of ten subjects showed a significant (p < 0.05) detection decrease in the short line condition, and five of ten showed a decrease in the long line condition. All but two of the subjects showed a significant decrease in at least one of the length conditions. Since this effect was present in each subject, the data from all subjects were combined for analysis (Figure 2.4). There was a main effect of complexity, F(4, 89) = 18.55, p = 4.07 × 10^−11; there was no main effect of line length and no interaction at the p = 0.05 level. The slope of the best-fitting regression line to the combined subjects' data was 0.027 for the short line and 0.02 for the long line. These results show that our subjects are sensitive to the variability in the contours created using our generative model, as predicted by a Bayesian detection model. This model is indeed useful as the basis of a perceptually relevant complexity model.

2.3 Experiment 2

The complexity measure used in Experiment 1 depends upon integration across the entire contour, but it is possible that the detectability of a contour is based only on the local complexity at an individual contour point. The results from the first experiment could be explained by a model of detection that only looks for straight segments, in which the contour is not integrated after a deviation from straight.
Under this model, the results show an effect of complexity because the probability of having a long straight segment is higher in simple contours than in contours with a larger number of bends. To control for this possibility, Experiment 2 varies the spatial distribution of the complexity, placing 90 percent of the curvature in the first eleventh (the tip), third eleventh (between the tip and the middle), or fifth eleventh (the middle) of the contour. Because of this control the longest straight segments will be almost the same length in contours of differing complexity. When the bending points, or complexity, are located at the tip, the contour will have the longest possible straight segment, while the shortest segments result when the complexity is located at the middle. The longest straight segment detection model would therefore predict better detection when the bend location is at the tip than when the bend location is at the middle.

Figure 2.3: For each subject the performance (log odds of a correct answer) is shown as a function of the complexity of the stimulus. The red points show detection when subjects were shown a long contour (220 pixels) and the blue points a short contour (110 pixels). A decrease in performance with increasing complexity was found for each individual and for each line length. The R² of the best-fitting regression line for the short contour (blue R²) and the long contour (red R²) are reported. Of the 20 linear fits, 10 have slopes significantly different from 0 at the p = 0.05 level.

Figure 2.4: The performance (log odds of a correct answer) is shown as a function of the complexity of the stimulus for the subjects' combined data. The red points show detection when subjects were shown a long contour (220 pixels) and the blue points when shown a short contour (110 pixels). Error bars show one standard error. For both contour lengths there is a decrease in performance with increasing complexity.

2.3.1 Method

Subjects

Ten naive subjects participated for course credit as part of an introductory psychology course. The data of two subjects were excluded because they were at chance performance in all conditions; self-reports at the end of the experiment suggested that these two subjects did not understand the task.

Procedure and Stimuli

Each trial was conducted in the same manner as in Experiment 1. The only difference in the stimulus was that the location of 90 percent of the surprisal was controlled to be in the first eleventh of the contour (the tip), the third eleventh of the contour (between the tip and the middle), or the fifth eleventh of the contour (the middle). There were 396 trials (4 different complexities and three bend locations, with 36 trials per crossed condition).

2.3.2 Results and Discussion

Complexity decreased detectability, just as in Experiment 1, in all conditions of all subjects except one (Figure 2.5).
Combining subject data (Figure 2.6) shows a significant main effect of complexity, F(3, 42) = 18.99, p = 6.18 × 10^−8, and no main effect of bend location, F(2, 42) = 0.94, p = 0.397. The manipulation of the location of the complexity in the contour can be seen to have had no consistent effect in the individual subjects when looking at percent correct versus the bend location (Figure 2.7). Combining subjects' data (Figure 2.8) also shows no trend. The failure of the ANOVA to find a significant main effect of bend location was not a result of finding a weak trend in the correct direction with too little data to reach significance; there is simply no trend in the predicted direction. These data suggest that the effect of complexity found in Experiment 1, and replicated in Experiment 2, is not due solely to the spatial distribution of the complexity throughout the contour. Humans are not simply straight segment detectors; the complexity of the entire contour plays a role in the detectability of that contour.

Figure 2.5: For each subject the performance (log odds of a correct answer) is shown as a function of the complexity of the stimulus. The colors denote the different bend location conditions. Dashed lines show the best-fitting regression lines. The decrease in detectability with increasing complexity found in Experiment 1 was found in all conditions, for all subjects, except for one bend location condition in one subject.

Figure 2.6: The performance (log odds of a correct answer) is shown as a function of the complexity of the stimulus for the subjects' combined data. The effect found in Experiment 1, a decrease in detection performance with increasing complexity, is again seen in these data.

Figure 2.7: For each subject the performance (log odds of a correct answer) is shown as a function of the bend location (location of the majority of the curvature). The different bend locations are denoted by "T" (tip), "B" (between the tip and middle), and "M" (middle). There was no consistent decrease in detectability as the location was varied across subjects.

Figure 2.8: The performance (log odds of a correct answer) is shown as a function of the location of complexity of the stimulus for all of the subjects' data combined. There appears to be no effect of the location of the complexity.

2.4 General discussion

Contour curvature is well known to influence human detection of contours (Field et al., 1993; Hess & Field, 1995; Pettet et al., 1998; Hess & Field, 1999; Field et al., 2000; Geisler et al., 2001; Yuille et al., 2004). But exactly why curvature has this effect is still poorly understood, as reflected in the lack of mathematical models that can adequately capture it.
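The per-subject analyses in both experiments regressed the log odds of a correct response on stimulus complexity. That computation can be sketched as follows; the numbers here are purely illustrative, not the experimental data:

```python
import numpy as np

# Hypothetical proportions correct at five complexity levels (the x axis
# runs from less complex, -100, to more complex, -40, as in the figures).
complexity = np.array([-100.0, -85.0, -70.0, -55.0, -40.0])
p_correct = np.array([0.92, 0.88, 0.85, 0.79, 0.74])

# Log odds of a correct response, then an ordinary least-squares line.
log_odds = np.log(p_correct / (1 - p_correct))
slope, intercept = np.polyfit(complexity, log_odds, 1)
```

With performance falling as complexity grows, the fitted slope relating log odds to complexity is negative for these illustrative numbers; the per-subject R² and p values reported in Figure 2.3 come from exactly this kind of fit.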
In this chapter I have modeled contour detection as a statistical decision problem, showing that a few simple assumptions about the statistical properties of smooth contours make fairly strong predictions about the performance of a rational contour detection mechanism, specifically that its performance will be impaired by contour complexity. Contour complexity, in turn, involves contour curvature as a critical term (the only term that reflects the shape of the contour), thus giving a very concrete quantification of the effects of contour curvature on detection performance. The data strongly corroborate the effect of complexity on contour detection, showing strong complexity effects in almost every subject. Moreover, the spatial distribution of the complexity does not seem to determine performance, suggesting that a purely local complexity measure (such as the information/complexity defined at each contour point) will not suffice to explain performance, and a holistic measure of contour complexity (such as contour description length) is required. A more detailed consideration of the statistical decision problem suggests that our observers are far from optimal. To see why, consider the contour detection problem reduced to a simple Bernoulli problem. The target contour consists of a series of turns through the pixel grid, each of which can be classified as either a straight continuation (a success or "head") or a non-straight turn (a failure or "tail"). Smooth contours have a strong bias towards straight continuations (successes), translating to a high success probability, the exact value of which depends on the spread parameter β in the von Mises distribution that generates the turning angles (see Eq. 2.1), which in turn depends on the condition. In contrast, each distractor path consists of a series of random turns in which the probability of straight continuation is ε (= 1/3, as explained above; see Fig. 2.1).
The observer's goal is to determine, based on the sample of turns observed, whether the sample was generated from the smooth process (i.e., a target display) or the null process (a distractor display). Because successive turning angles are assumed independent, this means that a smooth curve consists, in effect, of a sample of N (= 220 or 110 on long and short trials respectively) draws from a Bernoulli process with high success probability (0.985 or 0.964 for short and long contours respectively), while noise consists of a sample with success probability 1/3. To make a decision in these experiments, the observer must consider the number of straight continuations observed in the smoothest contour in each display, and decide which stimulus interval was more likely to contain a sample generated by a process with high success probability (again, with the level dependent on condition). This is a (forced choice version of a) standard Bernoulli Bayesian decision problem (see for example Lee, 2004). Given the values of N, β and ε used in the experiment, this is actually an easy decision; an optimal observer with full information about the pixel content of the images in both intervals would be at ceiling in all conditions, since even the highest DL conditions have a success probability much higher than 1/3. The perfect performance predicted by this optimal observer is similar to the performance expected under association field style models. When the association field has complete access to the pixels in the image it would integrate the contour on every trial. A decision procedure that simply chooses the image containing the longest of the contours found by the association field would perform perfectly, with no effect of complexity. Obviously, our observers do not have full access to the pixel content of both images, which contain far more information than can be fully acquired under our speeded and masked experimental conditions.
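The ceiling prediction for the full-information ideal observer is easy to check by simulation. The sketch below is mine (a Monte Carlo count comparison rather than a full Bayesian computation, which suffices here because the decision reduces to comparing Bernoulli counts); the values N = 110 and success probability 0.985 are the short-contour parameters given above:

```python
import numpy as np

def ideal_2ifc_accuracy(n, p_target, eps=1/3, trials=20000, seed=0):
    """Monte Carlo sketch of the Bernoulli ideal observer: on each trial,
    count straight continuations along the target path (success prob.
    p_target) and along the distractor path (success prob. eps), and
    choose the interval with more (ties split at random, counted 0.5)."""
    rng = np.random.default_rng(seed)
    target = rng.binomial(n, p_target, trials)
    noise = rng.binomial(n, eps, trials)
    correct = (target > noise) + 0.5 * (target == noise)
    return correct.mean()

acc = ideal_2ifc_accuracy(110, 0.985)
```

With these parameters the target count (mean about 108) and distractor count (mean about 37) essentially never overlap, so simulated accuracy sits at ceiling; sub-ceiling human performance therefore cannot be blamed on the decision rule itself.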
But no simple assumptions about the restricted availability of image content in our displays predict the particular combination of sub-ceiling performance and complexity effect that we observed. (For example, one might assume that subjects only apprehend part of the target contour, which would account for sub-ceiling performance, but such a model would predict a smaller complexity effect than is actually observed.) That is, the observed complexity effect, though it is predicted in a general way by Bayesian arguments as explained in the introduction, is actually even stronger than simple optimal models would predict. For a more detailed discussion of the different observer models that failed to explain these results see the appendix. I conclude, in short, that our observers are suboptimal due to some combination of performance limitations that will need to be explored in future studies. It is important to understand that many conventional models of object detection from the computer vision literature cannot locate the objects in our displays at all. Many of the standard edge detection algorithms rely on finding strong luminance discontinuities in the image. For example, the Marr-Hildreth algorithm uses Laplacian of Gaussian (LoG) filters, due to their similarity to response properties of cells in the LGN (Marr & Hildreth, 1980). These filters fail to respond more strongly near the contours in both experiments because they need a difference between the luminance in the center and the luminance in the surround; in our stimuli there is noise everywhere, and our contours have the same luminance as many pixels that are actually part of the noise. Other edge detection algorithms, such as the Canny edge detector, the Hough transform, and the Burns line finder, also look for high magnitude image gradients, which are not present in our images (Canny, 1986; Hough, 1962; Duda & Hart, 1972; Burns, Hanson, & Riseman, 1986).
These methods also frequently smooth the input image prior to detecting edges, which causes the contour to blend into the noise, making any image gradient substantially weaker. Edges in images produce a strong component in the power spectrum perpendicular to the edge in the image. One possible explanation for our effect is that the power spectrum of images containing straighter contours is more distinguishable from the power spectrum of noise images than is the power spectrum of images with curved contours. This turns out not to be the case. The power spectrum of the images used in the experiments reveals energy at all frequencies, rather than predominantly at low spatial frequencies as is common in natural images. This suggests that the reason standard edge-detection algorithms are unable to detect the contours is that our images have a different structure from the natural images for which these algorithms were developed. However, our subjects were able to accurately detect the contours, suggesting that the human visual system is doing something different from these algorithms. Hence at the one extreme, conventional object detectors cannot perform the task, while at the other extreme, probabilistic models with access to the full pixel content, as well as the association field model, would be at ceiling (and not show any complexity effects). Human observers are in between these extremes in that they are capable of finding the target objects, but their performance diminishes with increasing contour complexity.

2.5 Conclusion

Here I have framed the contour detection problem as a Bayesian inference problem. I defined a simple generative model for contours, in which successive turning angles are generated from a straight-centered distribution.
The observer's task then reduces to a probabilistic inference problem in which the goal is to decide which of the two stimulus images was more likely to contain a contour sample drawn from this generative model. This framing of the problem predicts and explains key aspects of the empirical results, in particular the influence of contour complexity on detection. Moreover, we found that the spatial distribution of complexity appears not to matter substantially; the deciding factor is the smoothness of the entire target taken as a whole. Neither of these results is predicted by standard theories of contour integration (Field et al., 1993; Geisler & Super, 2000; Geisler et al., 2001), which would always correctly detect the contour. More fundamentally, the Bayesian model makes it clear why a diminution in performance as curvature increases "makes sense": it is a direct consequence of a very general contour model and a rational detection mechanism. As argued in the introduction of this chapter, the problem of contour detection is really just a special case of the more general problem of the detection of pattern structure in noise. As shown above, a very general model of the probabilistic detection of regular structure in noise entails an influence of target complexity, defined from an information-theoretic point of view as the negative log of the stimulus probability under the generative model, that is, the Shannon complexity or Description Length (DL) of the pattern. The ubiquitous influence of simplicity biases and complexity effects in perception and cognition more generally (Chater, 2000; Feldman, 2000) can be seen as a consequence of the general tendency for simple patterns to be more readily distinguishable from noise than complex ones (Feldman, 2004). In past studies, contour detection has usually been studied as a special problem unto itself whose properties derive from characteristics of visual cortex.
Instead I hope that in the future the problem of contour detection can be treated as a special case of a broader class of pattern detection problems, all of which can be studied in a common mathematical framework in which they differ only in the details of the entailed generative models. This observation opens the door to a much broader investigation of perceptual detection problems, encompassing detection of patterns and processes well beyond simple contours, such as closed contours and whole objects.

2.6 Open vs Closed Contours

This chapter showed that a Bayesian model of contour detection links contour complexity to performance, giving a principled reason why the more curved a contour is, the more difficult it will be to detect. The experimental data also suggested that our ability to detect a contour depends on the entire contour, rather than on a small local region around contour points. In Chapter 3, I will show that the contour complexity measure defined in this chapter extends easily to closed contours. Simpler models of contour complexity, for example the sum of total curvature, do not extend easily to closed contours. In addition to looking at the complexity of the closed contours, a closed contour detection task allows for the examination of other global effects that are not present in open contour stimuli. Specifically, I will consider the effect of global shape representations on detection. Chapter 3 will introduce two models of complexity based on the MAP Skeleton (Feldman & Singh, 2006). The MAP skeleton is an attractive choice due to its success in modeling multiple perceptual phenomena. The MAP skeleton representation has been shown to be useful in modeling human shape classification (Wilder, Feldman, & Singh, 2011), shape similarity (Briscoe, 2008), figure/ground interpretation (Froyen, Feldman, & Singh, 2010), and part segmentation (Feldman, Singh, Briscoe, Froyen, Kim, & Wilder, 2013).
The experiments on closed contour detection will measure whether the global aspects of shape captured by the MAP skeleton also influence detection.

3. Closed Contours

Detecting objects in our environments is an important function of the human visual system. In order to navigate around obstacles or to categorize an object, we must first detect it. How humans detect edges in an image, how those edge elements are integrated into contours, and the structure of contours in natural images have all been extensively studied (Field et al., 1993; Geisler & Super, 2000). From this literature we know that the natural open contours humans easily perceive tend to be straight, and that as curvature increases the probability of the contour decreases (Elder & Goldberg, 2002; Geisler et al., 2001; Ren & Malik, 2002). We also know that the visual system is sensitive to the structure of natural contours (Field, 1987). Natural contours tend to be straight, so if the visual system is really sensitive to the structure of natural contours, large curvature discontinuities should disrupt detection and human detection of natural contours should be close to ideal, as shown in Pettet et al. (1998), Yuille et al. (2004), and in Chapter 2 of this thesis. The detection of closed contours (contours that meet at their ends) is less well studied. It is unknown how similar the processes for the detection of closed contours are to those for detecting open contours. The search for the statistical properties of natural open contours was motivated by the idea that the evolution of our visual system was driven by those statistical properties (Geisler et al., 2001). Our evolution is likely driven not just by the statistics of the natural environment, but by the statistics of the environment that are relevant for our survival and everyday behavior.
Unlike the detection of open contours, the detection of closed contours is a problem the visual system encounters continuously, because closed contours constitute the boundaries of objects. Detecting an object is a necessary first step for important visual tasks like object recognition and navigating around objects. Previous work suggests that closure has a profound effect on the perception of a contour. A contour that closes not only defines the contour itself, but also defines a bounded interior region (Koffka, 1935). When separate elements of a sequence of Gabor patches have weaker grouping signals (based on distance, curvature, orientation, etc.), they are less likely to be integrated into a contour (Field et al., 1993). However, when the contour closes on itself, the elements are integrated into a whole even with weaker grouping signals (Kovacs & Julesz, 1993; Braun, 1999; Mathes & Fahle, 2007). Tversky et al. (2004) suggest that the improvement in the detection of closed contours can be explained simply by the transitivity rule of Geisler & Super (2000), but it is not clear how transitivity would explain other benefits conferred on closed contours. Judgments about the aspect ratio of a contour are more accurate when the contour is closed than when it is open (Saarinen & Levi, 1999). A closed contour is processed more efficiently than an open one (Elder & Zucker, 1993). Closed contours are encoded more accurately than open contours, which leads to an improvement in remembering and recognizing closed contours (Garrigan, 2012). Also, Altmann, Bülthoff, & Kourtzi (2003) found that the visual system gave special treatment to a chain of elements that could be perceived as a shape (i.e., a closed contour) over individual elements or an open contour. Similar findings to Altmann et al.
(2003) suggest that the visual system’s preference for closed contours is driven by a preference for explanations of visual data that involve fewer objects over explanations in which each element is a separate object (Murray, Kersten, Olshausen, Schrater, & Woods, 2002; Murray, Schrater, & Kersten, 2004; Fang, Kersten, & Murray, 2008; He, Kersten, & Fang, 2012). The human visual system’s detection of closed contours has not been studied as extensively as its detection of open contours, but the detection of closed contours by artificial systems has been studied extensively. This work has been predominantly motivated by applications, for example in medical image analysis, where it can be used to analyze the shape of the interior of arteries and heart ventricles (Guttman, McVeigh, & Prince, 1992; Guttman, Prince, & McVeigh, 1994), or to detect the boundaries of blood vessels (Yuan, Lin, Millard, & Hwang, 1999). Kass, Witkin, & Terzopoulos (1988) created the active contour model, also called the snake algorithm, which has been successfully used to detect closed contours. While active contours often work with an edge map output by standard edge-detection algorithms, newer active contour algorithms work directly on the original image without an edge-detection step. For example, the edgeless active contour algorithm of Chan & Vese (1999, 2001) has succeeded in finding contours in a variety of images, even very noisy ones. Additionally, this algorithm can be used to find the boundaries of fuzzy objects without edges, or where the boundaries are not defined by image gradients. This is potentially useful, as the experiments described below use images of random pixel noise with a closed contour embedded in the noise, similar to the images in the open contour experiments of Chapter 2. Standard edge-detection algorithms failed to detect the contours in these images, while human observers detected them easily.
This study investigates the detection of closed contours in noise. Subjects will be shown a sequence of pixel noise images. One image will contain a closed contour, and the other will not. The subject must decide which image contained the contour. In Experiment 3, subjects will be shown natural contours in order to test whether the visual system is sensitive to the variations in complexity that are found in the natural environment. Experiment 4 will use experimentally manipulated contours to ensure that the results of the first experiment generalize to shapes outside the databases used in Experiment 3. The results of these experiments will help uncover the shape and contour properties that led to some contours being easily detected and others going unnoticed. In order to predict how detectable a shape is, I define three complexity measures that arise naturally from probabilistic models of contour and shape generation. The first measure is contour complexity, defined similarly to the complexity measure in Chapter 2 with a slight modification to make it suitable for closed contours. The second and third measures reflect the complexity of the bounded interior region: shape skeleton complexity, the negative log of the prior probability of the skeleton, and shape/skeleton misfit complexity, the negative log-likelihood of the shape given the skeleton. Natural shapes are used in Experiment 3 in order to measure the sensitivity of human observers to the contour and shape complexities that are present in the natural environment. Natural shapes, however, may not sample the complexity space evenly. Experiment 4 uses experimentally manipulated closed contours in order to sample the space of complexity evenly.
Standard models of contour integration (Field et al., 1993; Geisler & Super, 2000) would integrate the contours in these stimuli with high probability, resulting in ceiling performance with no effect of any type of complexity; moreover, they do not explicitly integrate the probability of the entire contour and transform that integrated probability into a complexity measure. These experiments will investigate changes in the detectability of contours, all of which have a high probability of being integrated at each point, and how the detection of these contours relates to their complexity. Focusing on the three types of complexity will allow for concrete recommendations about the types of information that should be included in contour integration models if they are to make quantitative predictions about contour detection performance.

3.1 Experiment 3: Natural Shapes

Human observers have previously been shown to be sensitive to the structure of natural shapes in a categorization task (Wilder et al., 2011) and to the structure of natural-like open contours in a detection task (Chapter 2). This study investigates the sensitivity of human observers to the structure of natural shapes and contours in a detection task. In this experiment subjects will detect the closed contours of natural shapes embedded in a noise field. A closed contour can appear in an image displayed in one of two intervals, and the subject responds with a key press indicating which interval contained the image with the contour.

3.1.1 Method

Subjects

Fifteen naive subjects participated in the experiment for course credit in an introductory Psychology course.

Figure 3.1: Examples of the natural shapes, from which bounding contours would be extracted and embedded in noisy images. The top row (a) shows a sample of the animal shapes, and the bottom row (b) shows a sample of leaf shapes.
Stimuli

Stimuli were displayed on a gamma-corrected iMac computer running Mac OS X 10.4 and Matlab 7.5, using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). There were two types of experimental images: a noise image, and an image with a contour embedded in noise. The contours were the bounding contours of natural leaf or animal shapes. Example shapes are shown in figure 3.1. Each image was a 225 by 225 matrix of intensity values. The intensities were between 0 and 1 and were sampled from a uniform distribution. Next the image was increased in size, to 550 by 550 pixels, using nearest-neighbor interpolation, so that each pixel in the original image now occupies a two-by-two pixel area. A contour was embedded at a random location and a random orientation in one of the images. The contour was taken from a database of animal shapes or a database of leaf shapes. Each database had an equal probability of being sampled, and each image within those databases had an equal probability of being sampled. The bounding contour of the sampled shape was extracted and scaled so that the contour would be 500 ± 25 pixels long. This was done so that embedding different shapes in the noise fields would not result in images with different mean luminance. Once the contour was properly sized, it was dilated using the dilation mask ([010; 110; 000]). The dilation was performed so that a perfectly straight contour oriented horizontally or vertically was detected as easily as one oriented at an arbitrary angle. The dilation mask was tested in a pilot study involving the experimenter. Finally, the contour was embedded in the image with an intensity of 0.22, slightly darker than the mean luminance of 0.5. Because the noise images were doubled in size, the dilated contours match the scale of the noise in the images. A sample image containing a leaf shape can be seen in figure 3.2. The noise images are constructed identically but contain no contour.
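The stimulus construction above can be sketched in a few lines. This is a hedged reconstruction in NumPy, not the original Matlab code: the circular "contour", the embedding location, and the reading of the dilation mask are illustrative assumptions, and since the quoted 225 → 550 resize is not an integer factor, this sketch simply doubles the base grid.

```python
import numpy as np

rng = np.random.default_rng(1)

# Base noise field of uniform intensities; each base pixel is then
# expanded into a 2x2 block (nearest-neighbour upscaling by a factor of 2).
base = rng.uniform(0.0, 1.0, (225, 225))
noise_img = np.kron(base, np.ones((2, 2)))  # 450 x 450 here (illustrative)

# Hypothetical closed contour: a circle traced by ~500 pixels (the
# experiments scaled real animal/leaf outlines to 500 +/- 25 pixels).
mask = np.zeros(noise_img.shape, dtype=bool)
t = np.linspace(0.0, 2.0 * np.pi, 500, endpoint=False)
rows = np.round(225 + 80 * np.sin(t)).astype(int)
cols = np.round(225 + 80 * np.cos(t)).astype(int)
mask[rows, cols] = True

# One plausible reading of the dilation mask [010; 110; 000]: each
# contour pixel also switches on its upper and left neighbours.
dilated = mask | np.roll(mask, -1, axis=0) | np.roll(mask, -1, axis=1)

stimulus = noise_img.copy()
stimulus[dilated] = 0.22  # contour slightly darker than mean luminance 0.5
```

The matching noise image is simply `noise_img` itself, drawn from a fresh seed, with no contour embedded.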
The experimental images were interleaved with masking images. The masks were noise images consisting of random black and white pixels. The masks were created by randomly sampling a binary matrix of size 225 by 225, and then scaling the matrix to 550 by 550 using nearest-neighbor interpolation, just as for the display images.

Design and Procedure

Each subject participated in 300 trials, with breaks after trials 100 and 200. Each trial began with a central fixation mark. After fixating, the subject pressed a key to begin the trial. A sequence of five images was rapidly presented to the subject. The first, third, and fifth images were masks. The second and fourth images were the experimental images: one noise image, and one containing a contour. The task was a 2IFC task: subjects decided whether the contour was present in the first or the second of the two experimental images, and responded with a key press. Each mask and experimental image was presented for 500 ms, with a 100 ms gap between images. Images were presented at about 6.5 degrees of visual angle. Each image measured 15 cm x 15 cm on the display. Head position was not constrained, but the head was initially placed at 66 cm from the display and subjects were instructed to remain

Figure 3.2: An example stimulus image containing a contour. A leaf has been embedded in the upper middle of the image. This example shows the contour at the lowest luminance (0 out of 1) for the ease of the reader, though in the actual experiment the images were much closer to the mean luminance (0.22 out of 1). Also, the pixel noise in this image is restricted in range by not sampling dark-luminance noise pixels. In the actual experiment the luminance of the noise pixels was sampled uniformly between black (0) and white (1), making the detection task more difficult. See Figure 2.2 for an example image with the full range of noise-pixel luminances.
This image is best viewed or printed at a zoom of 100 percent; otherwise the image will be distorted by pixel averaging or interpolation.

as still as possible. If necessary, during the three breaks, the head was repositioned to 66 cm from the display.

3.1.2 Results

The majority of the subjects were able to perform the task very well. Two of the 15 subjects were excluded from analysis because they performed below 55 percent correct. The remaining 13 subjects performed between 65 and 90.3 percent correct, with a mean of 77.7 percent correct. Subjects were significantly better (t = 2.96, p = 0.0068) at detecting leaf shapes (82.8 percent correct) than animal shapes (73 percent correct). All 13 subjects had a higher proportion of correct answers for leaves than for animals; this trend was significant for 8 of the 13 subjects using a two-proportion z-test with α = 0.05. Beyond these basic analyses, we would like to know what properties of leaf shapes made them more detectable than animal shapes when embedded in pixel noise. Are there properties of their contours and shapes that affect how easily each contour is detected? In order to fully understand the effects of the contour and the shape, the next section will define a measure of contour complexity (the same as the contour complexity measure in Chapter 2, but adapted for closed contours), and the following section will define two measures of shape complexity. Following the explanation of the complexity measures, the contribution of each of the different complexities will be analyzed.

3.2 Modeling

3.2.1 Contour Complexity

In Chapter 2, I showed that a subject’s ability to detect a contour in a noise field is influenced by a measure of contour complexity based upon the integrated information along the contour (Feldman & Singh, 2005). Here I extend this model to closed contours.
Again I express the detection problem as a decision problem in which the observer must decide whether the image data are more likely due to the existence of a contour or to pure noise. First, let us begin with the generative model for contours used in Chapter 2. In this context contours are thought of as a sequence of turning angles. The generative model assumes that each turning angle in the sequence is an independent sample from a von Mises distribution. The von Mises is a neutral choice because, like the Gaussian, it is a maximum-entropy distribution, but, unlike the Gaussian, it is defined only from −π to π, making it a natural choice for angles. The distribution has two parameters, one representing the center of the distribution and one representing its spread. For the derivations that follow, it will be convenient to use a Gaussian distribution, which is justifiable because the two are mathematically very similar (the initial terms of their Taylor series expansions are equivalent). Under this generative model the probability of each angle is

p(α | CONTOUR) = (1/(σ√(2π))) exp[−(µ − α)²/(2σ²)]   (3.1)

Since each contour point is independently sampled, the probability of the entire sequence of turning angles (of length N) under the contour hypothesis is the product of the probabilities of the individual angles:

L_c = p([α_i]_{i=1..N} | CONTOUR) = ∏_{i=1}^{N} p(α_i | CONTOUR) = (1/(σ√(2π)))^N exp[−(1/(2σ²)) Σ_{i=1}^{N} (µ − α_i)²]   (3.2)

Now consider seeing a sequence of turning angles under the hypothesis that this sequence is simply noise. Under the noise assumption there is no reason to expect one turning angle over another, so the probability of each turning angle is set uniformly. Since each turning angle is sampled independently, the probability of the entire sequence is the product of the individual probabilities. In this case, since all angles have the same probability, the result is a constant probability raised to the Nth power.
L_0 = p([α_i]_{i=1..N} | NOISE) = ε^N   (3.3)

We can now ask whether

p(CONTOUR | [α_i]_{i=1..N}) > p(NOISE | [α_i]_{i=1..N})   (3.4)

Since the prior probability of a contour and the prior probability of noise are equal in our experiment, this simply becomes a comparison of the likelihoods:

L_c P(CONTOUR) > L_0 P(NOISE)   (3.5)

L_c > L_0   (3.6)

An observer will decide “Contour” if L_c is greater than L_0. By moving the likelihoods to one side of the equation to get the weight of evidence (WOE) and then taking logs, the decision rule changes so that the observer will decide “Contour” if the log likelihood ratio (w) is larger than 0, and otherwise decide “Noise”.

w = log(L_c/L_0) = log L_c − log L_0   (3.7)

Shannon (1948) defines Description Length (DL) as

DL = −log p(M)   (3.8)

where M is a quantity being measured and p(M) is the probability of obtaining that measurement. In our decision problem the measurements are sequences of turning angles, so M = [α_i]_{i=1..N}. Using our generative model for contours (Eqn. 3.2), we see that the complexity, or DL, of a contour is

DL(CONTOUR) = −log p([α_i]_{i=1..N} | CONTOUR) = −log L_c   (3.9)

Note the notation used for this complexity measure, DL(CONTOUR), which differs from that in the previous chapter. This notation will be used to distinguish this complexity measure from the shape complexity measures defined later in this chapter. Our decision variable can then be expressed as

w = log L_c − log L_0 = −DL(CONTOUR) − log L_0   (3.10)

As the DL of the contour increases, w will decrease, which means an observer will not perceive the sequence of angles as a contour, but instead will perceive it as noise. To understand the effect of curvature more explicitly, substitute L_c (Eqn. 3.2) into the DL equation (Eqn. 3.9):

DL(CONTOUR) = −log L_c = N ln(σ√(2π)) + (1/(2σ²)) Σ_{i=1}^{N} (µ − α_i)²   (3.11)

Plugging Eqn. 3.11 into Eqn.
3.10 leads to a more interpretable form of the decision variable:

w = −DL(CONTOUR) − log L_0   (3.12)
  = −N ln(σ√(2π)) − (1/(2σ²)) Σ_{i=1}^{N} (µ − α_i)² − N ln ε   (3.13)
  = −c_1 − c_2 Σ_{i=1}^{N} (µ − α_i)² − c_3   (3.14)
  ∝ −Σ_{i=1}^{N} (µ − α_i)² + c_4   (3.15)

The decision variable is proportional to the negative of the squared deviation of each observed angle from the most likely angle, plus a constant c_4, where c_1, c_2, c_3, and c_4 are simply constants that do not depend on the turning angles. For open contours the most likely turning angle is 0, so the decision variable is a function of the total squared curvature, as was shown in Chapter 2. As a contour deviates from straight, it will be more difficult to detect. For closed contours, the expected turning angle is µ = 2π/N, where N is the number of angles in the sequence. This is because the sum of the exterior angles of any closed figure that does not cross over itself must equal 2π radians (360 degrees). It is intuitive that the expected turning angle must not be zero, as in the open contour case, because a closed contour must close. Having a non-zero center for the distribution of turning angles biases the contour towards closure. The decision variable will decrease as the total deviation from the expected turning angle increases. This leads to the prediction that a closed contour shaped like a circle will be easily detected, but as the contour becomes more dissimilar to a circle it will become more difficult to detect. As in the open contour experiments, performance is compared to DL(CONTOUR). Figure 3.3 shows a weak trend in the predicted direction. A 2-AFC logistic regression reveals that DL(CONTOUR) is not a significant predictor of performance, p = 0.487. This trend was present in about half of the subjects (see Fig. 3.4). The complexity of the contour conveys at most a small influence on detection performance.
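The decision rule of Eqns. 3.10–3.15, with µ = 2π/N for closed contours, can be illustrated in a few lines; the values of σ and ε here are hypothetical, not those used in the modeling.

```python
import math

def dl_contour(angles, mu, sigma=0.3):
    """DL(CONTOUR) = -log L_c under the Gaussian turning-angle model
    (Eqn. 3.11); sigma is an illustrative spread, not a fitted value."""
    n = len(angles)
    return (n * math.log(sigma * math.sqrt(2 * math.pi))
            + sum((mu - a) ** 2 for a in angles) / (2 * sigma ** 2))

def decide_contour(angles, mu, eps=1 / (2 * math.pi), sigma=0.3):
    """Decide "Contour" iff w = -DL(CONTOUR) - N log(eps) > 0 (Eqn. 3.10),
    taking the noise model to assign each angle a uniform density eps."""
    w = -dl_contour(angles, mu, sigma) - len(angles) * math.log(eps)
    return w > 0

# Closed contour: the expected turning angle is mu = 2*pi/N, since the
# exterior angles of a simple closed curve sum to 2*pi.
n = 12
mu = 2 * math.pi / n
circle_like = [mu] * n                              # regular 12-gon
jagged = [mu + (-1) ** i * 1.2 for i in range(n)]   # large deviations

print(decide_contour(circle_like, mu))  # True: near-circular, low DL
print(decide_contour(jagged, mu))       # False: high DL, read as noise
```

The circle-like sequence incurs no squared-deviation penalty and is accepted, while the jagged sequence accumulates a large DL and is classified as noise, which is exactly the complexity effect predicted above.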
In what follows, I propose two shape complexity factors that involve more global information than is present in contour complexity. These complexity measures will then be compared to detection performance.

3.2.2 Shape Complexity Measures

As discussed earlier, a closed contour is much more than a contour with closure. The visual system treats closed contours differently than open ones. Previous work also shows that the visual system is sensitive to the statistical properties of natural shapes, and specifically to the statistical properties of the shape skeleton (Wilder et al., 2011). They showed that the classification of novel shapes was well predicted by a classifier trained on the statistical properties of the MAP skeletons of natural shapes. In a detection task, is the visual system sensitive to some of those same properties?

Figure 3.3: Performance (log odds of a correct answer) is shown as a function of DL(CONTOUR) for the combined subjects’ data. Error bars show one standard error. The data is noisy, but shows the trend in the predicted direction; detection was best for the simplest contours and worse for higher complexities.

Figure 3.4: Performance (log odds of a correct answer) is shown as a function of DL(CONTOUR) for each individual subject. Again the data is noisy, with half of the subjects showing a trend in the predicted direction.
The MAP skeleton is a Bayesian estimate of the shape skeleton: it seeks the skeleton that is most probable given a shape (Feldman & Singh, 2006). The MAP skeleton builds upon a long line of work on skeletal representations of shape and on the visual system’s sensitivity to points of symmetry in a shape. The Medial Axis Transform (MAT) was proposed by Blum (1973) as a way to represent biological shapes. Since then the symmetric axis has been shown to have psychological relevance (Kovacs & Julesz, 1994; Psotka, 1978; Lescroart & Biederman, 2012; Hung, Carlson, & Connor, 2012) and several variations on the representation have been proposed (Siddiqi, Shokoufandeh, Dickinson, & Zucker, 1998; August, Siddiqi, & Zucker, 1999; Leymarie, Kimia, & Giblin, 2004) to solve different problems in vision. The MAP skeleton representation captures the type of shape information that is perceptually relevant and can be used to solve multiple problems in vision (Feldman et al., 2013). The MAP skeleton can be used for simultaneous image segmentation and figure/ground organization (Froyen et al., 2010), it can predict where figure/ground reversals occur (Feldman et al., 2013), it can be used to segment objects into parts (Feldman et al., 2013), it can predict human shape similarity judgments (Briscoe, 2008), and it can be used for object classification (Wilder et al., 2011). In addition to its functional uses in solving vision problems, the MAP skeleton is less sensitive to contour noise than the MAT skeleton. The MAP skeleton achieves this by balancing how likely a shape is given its skeleton (the likelihood) against the complexity of the skeleton (the prior). If we think of the points along the contour as being generated by points on the skeleton, the MAP skeleton finds the simplest skeleton that would be likely to generate that set of contour points. A graphical depiction of candidate skeletons and the MAP skeleton for a shape can be seen in figure 3.5.
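The prior/likelihood balance described above can be caricatured in a few lines. This is a toy sketch with invented scores, not the actual model computation of Feldman & Singh (2006): among candidate skeletons, the MAP estimate maximizes prior × likelihood.

```python
# Toy MAP-skeleton selection: pick the candidate maximizing prior x
# likelihood. The (prior, likelihood) scores below are invented for
# illustration only.
candidates = {
    "too simple":  (0.60, 0.001),  # high prior, poor fit to the contour
    "map":         (0.20, 0.200),  # balanced prior and fit
    "too complex": (0.01, 0.900),  # low prior, near-perfect fit
}
posterior = {name: p * l for name, (p, l) in candidates.items()}
best = max(posterior, key=posterior.get)
print(best)  # "map": the balanced skeleton wins the trade-off
```

An overly simple skeleton is cheap to describe but explains the contour poorly, while an overly complex one fits perfectly but pays a heavy prior cost; the product favors the "just right" middle candidate, mirroring figure 3.5.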
The details of the algorithm can be found in Feldman & Singh (2006). Just as with our measure of contour complexity, DL(CONTOUR), the Shannon complexity, DL(M) = −log p(M), will be used as a complexity measure for shapes.

Figure 3.5: On the left is a bear shape with a simple skeleton (high prior probability). This skeleton would not be likely to generate the bear shape (low likelihood of the shape given the skeleton). On the right is a bear shape with a complex skeleton (low prior probability) that is very likely to generate the bear shape (high likelihood of the shape given the skeleton). In the middle is a bear shape with the MAP skeleton. The skeleton is a balance of skeletal complexity (medium prior probability) and likelihood of generating the shape (medium likelihood) that fits “just right”. This figure is a modified version of figure 17 from Wilder et al. (2011).

The shape complexity models simply use a different probability model for p(M). Two description lengths will be investigated: DL(SKELETON), which captures the complexity of the MAP skeleton, and DL(MISFIT), which captures how well a skeleton fits a shape. The description length of the skeleton is given by the prior probability in the MAP skeleton computation:

DL(SKELETON) = −log p(SKELETON)

Skeleton DL increases with the number of skeletal axes and the curvature of the axes. Intuitively this can be thought of as the number of parts in the shape and the curvature of those parts. The second type of shape complexity considered is “misfit” complexity. Misfit complexity is a measure of how well the shape and skeleton fit together. Specifically, it reflects how likely the contour is as the result of a process in which the MAP skeleton is inflated to grow a shape; this is the likelihood term of the MAP skeleton computation:

DL(MISFIT) = −log p(SHAPE|SKELETON)

If the shape is thought of as being grown from the skeleton, each contour point is generated by a point on the skeleton.
For each skeletal branch there is a probability distribution over the distance between the contour points generated by that skeletal branch and the skeleton itself. If many of the contour points are close to the mean of that distribution, then that part of the shape is very likely given the skeleton. If the contour is very jagged around the skeleton, and many contour points are far from the mean, then the shape has a low probability given the skeleton. Additionally, the angle between the skeleton and the line connecting a contour point to the point on the skeleton that generated it (called a “rib”) affects the DL. The angle between the skeleton and the rib is distributed as a von Mises distribution centered at 90 degrees, so deviations from 90 degrees increase DL(MISFIT). A graphical depiction of both shape complexity measures, DL(SKELETON) and DL(MISFIT), can be seen in figure 3.6. Now that the two shape complexities are defined, they can be compared to subjects’ detection performance. The independent effect of DL(MISFIT) on detection is similar to the independent effect of DL(CONTOUR): weak but in the predicted direction. Figure 3.7 shows the relationship between detection and DL(MISFIT). Performance tended to decrease with an increase in DL(MISFIT), but the trend is non-significant (in a 2-AFC logistic regression, DL(MISFIT) is not a significant predictor of performance, p = 0.403). The decreasing trend was present in about half of the subjects (see Fig. 3.8). A slightly clearer effect on detection is found when looking at DL(SKELETON). Figure 3.9 shows a somewhat larger decrease in detection with an increase in complexity. The 2-AFC logistic regression reveals that DL(SKELETON) is a significant predictor of performance (t = −4.555, p = 0.0000054).
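The rib-based misfit term described above can be sketched numerically. This is a toy version assuming a Gaussian distribution over rib lengths and a von Mises distribution over rib directions centered at 90 degrees; every parameter value (mean length, spread, concentration) is invented for illustration.

```python
import math

def i0(x, terms=25):
    """Series approximation of the modified Bessel function I0(x)."""
    return sum((x / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))

def dl_misfit(rib_lengths, rib_angles, mean_len=1.0, sd_len=0.1, kappa=8.0):
    """Toy DL(MISFIT) = -log p(SHAPE | SKELETON): a Gaussian term over rib
    length plus a von Mises term over rib direction, centred on 90 degrees.
    All parameter values here are illustrative, not the fitted ones."""
    dl = 0.0
    for length, angle in zip(rib_lengths, rib_angles):
        # -log Gaussian density of the rib length
        dl += (math.log(sd_len * math.sqrt(2 * math.pi))
               + (length - mean_len) ** 2 / (2 * sd_len ** 2))
        # -log von Mises density of the rib direction (mean pi/2)
        dl += math.log(2 * math.pi * i0(kappa)) - kappa * math.cos(angle - math.pi / 2)
    return dl

smooth = dl_misfit([1.00, 1.02, 0.98], [math.pi / 2] * 3)
jagged = dl_misfit([0.60, 1.45, 0.85],
                   [math.pi / 2 - 0.8, math.pi / 2 + 0.9, math.pi / 2])
print(jagged > smooth)  # True: a jagged outline has higher misfit complexity
```

A jagged outline, whose ribs vary widely in length and sprout at oblique angles, accrues a larger DL(MISFIT) than a smooth one, matching the intuition in the text.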
All but one of the individual subjects showed a decrease in detection as DL(SKELETON) increased.

Figure 3.6: A graphical depiction of DL(SKELETON) (top) and DL(MISFIT) (bottom). Straight, single-axis skeletons have low DL(SKELETON), while multi-axis skeletons with high curvature have high DL(SKELETON). DL(MISFIT) is affected by the probability of each rib length, as well as the angle at which the rib is sprouted from the skeleton. This figure is a modified version of a figure (Fig. 17) originally appearing in Wilder et al. (2011).

Figure 3.7: Performance (log odds of a correct answer) is shown as a function of DL(MISFIT) for the combined subjects’ data. Error bars show one standard error. The data is noisy, but shows a trend in the predicted direction; detection was best for the lowest DL(MISFIT) and worse for higher complexities.

Figure 3.8: Performance (log odds of a correct answer) is shown as a function of DL(MISFIT) for each individual subject. Again the data is noisy; and, as with DL(CONTOUR), roughly half of the subjects show a trend in the predicted direction.

Figure 3.9: Performance (log odds of a correct answer) is shown as a function of DL(SKELETON) for the combined subjects’ data. Error bars show one standard error. The data is noisy, but shows a trend in the predicted direction; detection was best for the simplest skeletons and worse for higher-complexity skeletons. The magnitude of the performance decrement is larger than for either DL(CONTOUR) (Fig. 3.3) or DL(MISFIT) (Fig. 3.7).

The above analyses test for the effect of the three different types of complexity in isolation. An increase in either DL(CONTOUR) or DL(MISFIT) tended to cause a decrease in performance, though not at a significant level. Performance did significantly decrease as DL(SKELETON) increased. These findings reflect the effects of each factor separately. To understand how these three factors might interact, I next entered all three factors into a 2-AFC logistic regression model using a stepwise procedure that includes the factors and interactions that increase the proportion of explained variance, while penalizing for model complexity using the Akaike Information Criterion (AIC). The analysis revealed that DL(CONTOUR), DL(SKELETON), and their interaction were all part of the winning (lowest AIC) model, meaning that both factors and their interaction are needed to explain the data.

Figure 3.10: Performance (log odds of a correct answer) is shown as a function of DL(SKELETON) for each individual subject. This data, while still noisy, is much more consistent than for either DL(CONTOUR) (Fig. 3.4) or DL(MISFIT) (Fig. 3.8). All but one individual showed a decreasing trend, where detection decreased as DL(SKELETON) increased.
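The model-selection logic can be sketched as follows: fit nested logistic regression models by maximum likelihood, compute AIC = 2k − 2 ln L for each, and keep the model with the lowest AIC. The data here are synthetic, a crude grid search stands in for a real fitting routine, and the single-predictor setup is a simplified stand-in for the full stepwise procedure with interactions.

```python
import math, random

random.seed(1)

# Hypothetical synthetic data: accuracy falls as a single "complexity"
# predictor x increases (the true coefficients 1.5 and -2.0 are arbitrary).
n = 200
x = [random.uniform(0.0, 1.0) for _ in range(n)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

y = [1 if random.random() < sigmoid(1.5 - 2.0 * xi) else 0 for xi in x]

def log_lik(b0, b1):
    ll = 0.0
    for xi, yi in zip(x, y):
        p = min(max(sigmoid(b0 + b1 * xi), 1e-12), 1.0 - 1e-12)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll

def fit(use_slope):
    # crude grid search stands in for a real maximum-likelihood fitter
    best_ll, best_coefs = -1e18, None
    for b0 in [i * 0.25 for i in range(-12, 13)]:
        for b1 in ([i * 0.25 for i in range(-16, 17)] if use_slope else [0.0]):
            ll = log_lik(b0, b1)
            if ll > best_ll:
                best_ll, best_coefs = ll, (b0, b1)
    k = 2 if use_slope else 1                # number of free parameters
    return 2 * k - 2 * best_ll, best_coefs   # AIC = 2k - 2 ln L

aic_null, _ = fit(False)
aic_full, coefs = fit(True)
# The predictor survives selection only if it improves the fit by more
# than the AIC penalty for its extra parameter.
print(aic_full < aic_null, coefs[1] < 0)
```

In the real analysis the candidate terms were DL(CONTOUR), DL(MISFIT), DL(SKELETON), and their interactions, added or removed stepwise under the same AIC criterion.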
The complexity of the MAP skeleton (DL(SKELETON)) significantly decreased detection (z = −3.207, p = 0.001). The complexity of the contour (DL(CONTOUR)) decreased detection, but the effect did not reach significance (z = −1.77, p = 0.077). There was a significant interaction between DL(SKELETON) and DL(CONTOUR) (z = 3.037, p = 0.002). For both DL(CONTOUR) and DL(SKELETON), an increase in complexity led to a decrease in detection. The interaction coefficient was positive, the opposite direction of both main effects, meaning that the two factors combined sub-additively. In other words, the decrease in detectability for a given shape is less than predicted by a model without the interaction term, in which the decrease in detectability is simply the sum of the magnitudes of the two complexity effects. These results suggest that detection of the closed contours cannot be explained simply by local contour effects; the subjects’ responses are due in part to the global shape as well. The results of Experiment 3 confirm that closed contour detection is influenced by global shape complexity as captured by DL(SKELETON), at least in the context of a natural contour detection task. However, the shapes constitute a “natural experiment” in which the shapes are largely uncontrolled and shape parameters are correlated with each other. Experiment 4 uses experimentally designed closed contours that are not drawn from a database of natural shapes. These artificial shapes are generated systematically so that the complexity measures are as independent as possible.

3.3 Experiment 4: Experimentally Manipulated Shapes

Experiment 4 follows the same basic methodology as Experiment 3, but this time with experimentally manipulated shapes. Natural shapes do not uniformly cover the space of complexity, leading to a sample of shapes with similar complexities where performance is similar.
Additionally, the correlation between the different complexity measures is uncontrolled, making it difficult to tease apart the effects of the separate factors. In Experiment 4 shapes are generated by independently manipulating the axial structure and the noise in the contour. This will result in shapes that are more evenly spread in complexity space and that have a low correlation between the complexities. Additionally, it will ensure that the results from Experiment 3 are not simply due to some familiarity with natural shapes or other unknown artifacts in the shape database. As in Experiment 3, subjects will be required to decide which of two intervals contained the image containing the closed contour, and which contained the noise image. The effects of the three types of complexity (DL(CONTOUR), DL(MISFIT), and DL(SKELETON)) will first be analyzed in isolation. Then they will be considered in combination, to find the factors that best describe the subjects’ data without over-fitting.

3.3.1 Methods

Subjects

Twenty undergraduate subjects from the general psychology subject pool participated in the experiment for credit in an undergraduate course.

Stimuli

As in Experiment 3, there were two noise images. In one image a contour would be embedded. The contours in this experiment were experimentally generated so that the structure of the skeleton could be manipulated independently of the contour noise. The shape skeleton was generated first and then a shape was “grown” outward from the skeleton. There were four different shape skeleton conditions: single-part shapes, two-part shapes, three-part shapes where two parts are on opposite sides of a main part, and three-part shapes where two parts extend off the same side of a main part. The first condition of the skeletal axis variable used a single straight axis.
The second condition used a straight main axis with a second skeletal axis added, perpendicular to the main axis, at a random location, yielding a skeleton with a higher complexity (lower prior probability under the generative model). The third condition uses a straight main axis with a second axis emanating from one side of the main axis, and a third axis emanating from the other side of the main axis. Both the second and third axes were perpendicular to the main axis and placed at random locations. This results in a skeleton that has even higher complexity. The fourth axis condition has a straight main axis with a second and third axis emanating from the same side of the main axis. Both are perpendicular to the main axis and at a random location, with the constraint that the third axis must be more than 2 × 15 = 30 units away from the second axis, where 15 is the length of the ribs for the second and third axes (explained in more detail below). This condition results in skeletons with complexities similar to those in the third axis condition, but higher than skeletons in the first or second conditions. The overall process yields a variety of skeletal structures varying in complexity (DL(SKELETON)). After the skeleton was generated, a shape was grown from the skeleton. The growth process is a result of sprouting “ribs” laterally from the skeleton. Beginning at one end of the first axis, a rib was grown perpendicularly left and right from the skeleton. A contour point was placed at the end of each rib. The length of the ribs was sampled from a Gaussian distribution centered at 25 units away from the skeleton. The contour noise condition set the standard deviation of the Gaussian; the contour noise conditions ranged from lowest noise to highest noise, with standard deviations of 1, 3, 5, or 7. After obtaining the first contour points, the second point on the skeletal axis is considered.
Again we sample a rib length from a Gaussian; however, now the Gaussian is centered at the length of the previous rib. The contour was sampled in this Markovian fashion around the entire axis. Once a child axis is reached, the ribs are generated in the same manner, except that the Gaussian from which rib lengths are sampled is initially centered at 15 instead of 25. This process yields a variety of shapes for each skeletal structure, differing in likelihood (and thus DL(MISFIT)). Examples from each combination of the factors are shown in Figure 3.11.

Figure 3.11: Moving from right to left shows the effect of the manipulation of axial structure (which affects both region misfit and contour complexity). The rightmost shapes all have a single axis, then come two-axis shapes, and finally three-axis shapes, first with two axes on opposite sides of the shape and then with two axes on the same side of the shape. Moving from top to bottom illustrates the effect of increasing the contour noise (which affects contour complexity). Along the top, the shapes have a low amount of variability in the lengths of the ribs that generated the contour, and moving down increases this variability.

Design and Procedure

The design was a 4 × 4 factorial design (four skeletal axis conditions and four contour noise conditions). Subjects saw 25 trials from each condition, in a random order, for a total of 400 trials. Each trial was identical to Experiment 3, except that the stimuli were generated as described above instead of being the boundaries of natural shapes.

3.4 Results

The results from Experiment 4 were noisier than those from Experiment 3, perhaps because the unnatural shapes induced more uncertainty in the observers. No significant effect of our manipulated variables was found. There appears to be no effect of the manipulated contour noise variable.
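The stimulus-generation procedure described in the Stimuli section can be sketched as a Markov chain over rib lengths: the first rib on an axis is drawn from a Gaussian centered at the part’s base width, and each subsequent rib’s Gaussian is centered at the previous rib’s length. The function below is an illustrative reconstruction; axis layout and any details beyond those stated in the text are assumptions.

```python
import math, random

random.seed(0)

def grow_part(n_points, base_width, noise_sd):
    """Sample rib lengths along one skeletal axis.  The first rib is drawn
    from a Gaussian centered at base_width (25 for the main axis, 15 for
    child axes); each later rib's Gaussian is centered at the previous
    rib's length, giving the Markovian contour described in the text."""
    ribs = [random.gauss(base_width, noise_sd)]
    for _ in range(n_points - 1):
        ribs.append(random.gauss(ribs[-1], noise_sd))
    return ribs

# Main axis ribs start near 25 units; child-axis ribs start near 15 units.
# The number of sample points per axis is an assumption for illustration.
main_ribs = grow_part(60, 25.0, 3.0)
child_ribs = grow_part(30, 15.0, 3.0)

# The contour-noise conditions used standard deviations of 1, 3, 5, or 7;
# larger values produce more jagged contours and higher DL(MISFIT).
noisy_ribs = grow_part(60, 25.0, 7.0)
print(len(main_ribs), len(child_ribs))
```

Placing a contour point at the tip of each rib, on both sides of every axis, and joining the points yields the closed contour for a trial.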
An increase in the number of skeletal axes tended to decrease detection. The effect of the skeletal axis manipulation was close to reaching significance (p = 0.053). In addition to the analysis of the manipulated factors, I again looked at the individual effects of the three complexity measures on detection (which are different from, though related to, the manipulated factors). The trends were all in the predicted direction (see Figs. 3.14, 3.15, and 3.16). Both DL(CONTOUR) and DL(MISFIT) failed to have a significant effect on detection when looked at in isolation (p = 0.750 and p = 0.243), as in Experiment 3. The data in this experiment also failed to show an isolated effect of DL(SKELETON), but of all three complexities, skeleton complexity was the closest to having a significant effect (p = 0.068). As in Experiment 3, in addition to looking for effects of the three complexities in isolation, a stepwise minimization procedure was used to find the 2-AFC logistic regression model that resulted in the lowest AIC (fit the data best with the simplest model). Just as in Experiment 3, the winning model includes DL(CONTOUR), DL(SKELETON), and the interaction between the two factors. An increase in DL(SKELETON) significantly decreased detection (z = −3.56, p = 0.0004). There was a non-significant decrease in detection as DL(CONTOUR) increased (z = 1.44, p = 0.15), but the factor must be included in the regression due to a significant interaction between DL(SKELETON) and DL(CONTOUR) (z = 3.34, p = 0.0008). The interaction between the two complexities was positive, the opposite direction of the two main effects. This means that there was a sub-additive effect of the two complexities.

Figure 3.12: Performance (log odds of a correct answer) is shown as a function of the manipulated contour noise variable. The contour noise manipulation appears not to have affected detection performance.
In other words, increases in both DL(SKELETON) and DL(CONTOUR) decrease detection, though the magnitude of the decrease when both factors are increased is less than predicted by simply summing the effects of the two components. This best-fitting model for experimentally manipulated shapes (Experiment 4) was identical to the best-fitting model for natural shapes (Experiment 3), and all the factors had effects in consistent directions. This provides strong converging evidence about the effect of each type of complexity. Shape complexity (DL(SKELETON)) provides the strongest influence on performance. The complexity of the contour (DL(CONTOUR)) also tended to decrease performance. Finally, there is a sub-additive interaction between DL(SKELETON) and DL(CONTOUR).

Figure 3.13: Performance (log odds of a correct answer) is shown as a function of the manipulated axial structure variable. Performance appears to decline slightly as the number of branches increases, but the trend is not significant (p = 0.053).

3.5 Discussion

The detection of closed contours in noise could be performed by a completely local process, since detection could occur if the observer noticed only a small contour segment. This detection model would suggest that a holistic or global representation of shape is not necessary for closed contour detection. If our subjects used a decision process based solely on a segment of the contour, there would be no effect of the complexity of the shape (as measured by either DL(SKELETON) or DL(MISFIT)). The results, however, showed a significant effect of the complexity of the shape skeleton, suggesting that the structure of the region bounded by the closed contour is important for detection. Additionally, the complexity of the contour interacted with the complexity of the shape, suggesting that the complexity of the contour and shape need to be considered together.
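The sub-additive interaction found in both experiments can be illustrated with hypothetical regression coefficients (chosen only for illustration; they are not the fitted values): with negative main effects and a positive interaction term, the joint drop in log odds is smaller than the sum of the individual drops.

```python
# Hypothetical coefficients, chosen only to illustrate sub-additivity.
b0, b_skel, b_cont, b_inter = 1.5, -0.4, -0.2, 0.1

def log_odds(s, c):
    # 2-AFC logistic model with skeleton complexity s, contour complexity c
    return b0 + b_skel * s + b_cont * c + b_inter * s * c

baseline = log_odds(0, 0)
drop_skel = baseline - log_odds(1, 0)   # drop from skeleton complexity alone
drop_cont = baseline - log_odds(0, 1)   # drop from contour complexity alone
drop_both = baseline - log_odds(1, 1)   # joint drop: 0.5, not 0.4 + 0.2
print(drop_both < drop_skel + drop_cont)
```

The positive interaction coefficient partially offsets the two negative main effects, which is exactly the pattern reported for both experiments.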
Figure 3.14: Performance (log odds of a correct answer) is shown as a function of DL(CONTOUR) for the combined subjects’ data from Experiment 4. Error bars show one standard error. There is a weak trend in the predicted direction; detection was best for the simple contours and worse for contours with higher complexity.

Figure 3.15: Performance (log odds of a correct answer) is shown as a function of DL(MISFIT) for the combined subjects’ data from Experiment 4. Error bars show one standard error. The data is noisy, but the best performance was in the cases of low DL(MISFIT), as predicted; however, the worst performance was at a moderate level of complexity.

Figure 3.16: Performance (log odds of a correct answer) is shown as a function of DL(SKELETON) for the subjects’ combined data from Experiment 4. Error bars show one standard error. As with all of the other data, this data is noisy, but there is a weak trend in the predicted direction; detection was best for the simpler skeletons and worse for skeletons with higher complexity.

While many of the closed contour studies mentioned earlier suggested that closed contours are treated differently than open contours, Tversky et al. (2004) was unable to find an effect of closure on detection that could not be explained by a simple grouping rule. The transitivity rule (Geisler & Super, 2000) simply says that if elements A and B are grouped, and B is grouped with a third element C, then A will be grouped with C as well. Subjects in Tversky et al. (2004) were able to detect a sequence of edge elements that closed to form a circle using fewer edge elements than were necessary to detect an open sequence of edge elements.
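The transitivity rule can be sketched with a union-find pass over pairwise grouping decisions. This is a simplified stand-in for the grouping process, not the actual model of Geisler & Super (2000): once each neighboring pair in a closed ring is grouped, transitivity puts every element in a single cluster.

```python
def find(parent, x):
    # root lookup with path halving
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def group(n, pairs):
    """If A groups with B and B with C, then A, B, and C end up in one
    cluster; union-find stands in for the transitive grouping process."""
    parent = list(range(n))
    for a, b in pairs:
        parent[find(parent, a)] = find(parent, b)
    return [find(parent, x) for x in range(n)]

# Pairwise links around a closed ring of 6 edge elements: one cluster.
ring = [(i, (i + 1) % 6) for i in range(6)]
print(len(set(group(6, ring))))  # one group

# Two disconnected pairs stay in two separate groups.
print(len(set(group(4, [(0, 1), (2, 3)]))))  # two groups
```

Under such a rule the elements of a closed circle reinforce one another through transitive links, with no dedicated closure mechanism required.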
The performance difference was predicted by an application of the transitivity rule, which increased the grouping strength between all of the elements in the circle. No closure mechanism needed to be introduced. Our results show that the complexity of the shape is a necessary component in the explanation of contour detection. Our results, however, are not necessarily in contradiction with Geisler & Super (2000) or Tversky et al. (2004). The goal of studies of contour integration is to learn how separate elements might be grouped with neighboring elements. Contour grouping depends on the turning angle between elements, the proximity of the elements, and the similarity of the orientation of the individual elements (Field et al., 1993; Geisler et al., 2001). The elements that make up our contours are pixels. The contour elements have no separation, as they are neighboring pixels. Pixels have no orientation or phase, and the pixels of a contour were all the same luminance; each element of the contour is maximally similar to every other. The only property of the elements that affects their grouping is the angle between neighboring elements. The contours and noise were placed on a pixel grid, so turning angles are always multiples of 45 degrees. In Experiment 3, using natural shapes, the large majority of the turning angles were 45 degrees, which would result in strong integration signals under the standard models (Field et al., 1993; Geisler & Super, 2000; Geisler et al., 2001; Hess et al., 2003; Tversky et al., 2004), yet it was in this experiment that the effect of the complexity of the shape was strongest. The stimuli in both the natural contours and manipulated contours experiments would be grouped with high probability by other contour integration models. This study looks at shapes which have a high probability of being grouped, which is a different set of contours than the set considered in Tversky et al.
(2004), which considered both contours that are likely to be grouped (edge elements generated by the same object) and ungrouped (elements generated by different objects/generative procedures). While a benefit for closure may not be useful for separating edge elements into those that would, or would not, be grouped by an association-field-type model, shape complexity is a necessary component in understanding the detectability of contours whose edge elements are all generated by the same object. For the closed contours used in both Experiments 3 and 4, the effect of shape cannot be ignored. Standard computer vision techniques were not able to detect the contours in these experiments, though subjects generally were. A more complete understanding of the nature of the complexities affecting human subjects, and of the global shape representation processes that these complexity effects uncover, would take us closer to achieving computational models that can achieve human-like performance. In summary, when a contour closes, a shape is formed. The complexity of this shape, which is not present in open contours, is an important factor in detection performance. This result is not predicted by conventional shape representation models or contour integration models, but is a natural consequence of a complexity measure based on a Bayesian model of shape representation.

4. General Conclusions

By framing the problem of contour detection as one of Bayesian inference, I have shown why a decrease in performance is expected with an increase in the complexity of a contour, measured as the Description Length (DL) of the contour. Through a series of experiments on contour detection I have shown both that DL(CONTOUR) does capture the complexity relevant for contour detection, and that contour integration is not a purely local process.
Using closed contours, I was able to support the finding from my experiments on open contours: that the mechanisms responsible for contour detection do not consider only local information. By introducing two additional complexity measures that are related to the shape of the bounded region, DL(SKELETON) and DL(MISFIT), I was able to show that the shape skeleton captures the information about global shape that affects detection. Specifically, the data revealed that the effect of DL(CONTOUR) does not fully explain closed contour detection and that the addition of a global factor, such as DL(SKELETON), is necessary. Global shape information is important for tasks other than detection as well. The shape skeleton has been shown to be important for shape categorization (Wilder et al., 2011). The orientation of a shape’s skeletal axis affects the perceived orientation of local texture information on the interior of a shape (Harrison & Feldman, 2009). Lee, Mumford, Romero, & Lamme (1998) found V4 cells that fire more strongly when their receptive fields are near the shape skeleton than when their receptive fields are at other locations in the interior of a bounded region. While these global shape effects have been found, to my knowledge this is the first time global shape information has been shown to influence contour detection. It is not obvious why global information is involved in the detection process. Each closed contour could be detected by locating a small segment of the contour, letting the whole go unnoticed. Nestor, Vettel, & Tarr (2008) found that different brain areas were active if the subject was instructed to identify a face than if the instructions were to detect a face. The areas active during the detection task are believed to represent local facial features, while the areas active during the identification task are believed to process the face as a whole.
The results of the contour detection experiments presented here appear to be inconsistent with the findings of Nestor et al. (2008). Nestor et al. (2008) were measuring BOLD responses from visual areas that are fairly high in the visual hierarchy, specifically areas believed to be involved in face processing. Standard theories of contour integration, such as the association field model (Field et al., 1993), view contour integration as happening earlier in the visual pathway. Even if closed contour integration involves brain areas further up the visual stream, it is unlikely to involve areas like the fusiform gyrus, where Nestor et al. (2008) found their effect. Based upon the data from the experiments in chapters 2 and 3, it appears that the visual system does not simply choose between using local or global information; it takes both into account. Nestor et al. (2008) is only one study suggesting that global information is represented in separate locations from local information. Murray et al. (2002, 2004) and Fang et al. (2008) found that identical stimuli can produce activation in different brain areas depending on whether the observer is currently perceiving a set of individual elements or a grouped whole. A similar effect was found by He et al. (2012), where perceiving a stimulus as un-grouped elements led to one type of adaptation effect (the tilt aftereffect) while perceiving the same stimulus as a whole led to a different type of adaptation (the shape aftereffect). Those studies show that the visual system does make use of global representations. They also suggest, as did Nestor et al. (2008), that the visual system is flexible in the type of representation used. The stimuli in the current study’s experiments are not easily perceived as individual elements; the individual pixels are too small to be easily distinguished.
The grouping signals for our stimuli are so strong that the visual system cannot help but group them together and use a global representation. The results presented here support neural evidence for a global shape representation. Recently, there has been evidence suggesting that cells in IT simultaneously represent axial structure and bounding contour information (Hung et al., 2012), and that areas as early as V3 are already switching from the local representation of earlier areas, which are sensitive to object properties such as orientation, to a global representation that highlights axial structure (Lescroart & Biederman, 2012). Zhou, Friedman, & von der Heydt (2000) found that many cells in V2 and V4 (and some cells in V1) were sensitive to which side of a boundary contained the figure, even for figures that extend far beyond the classical receptive fields of those cells. Zhaoping (2005) proposed a model of lateral connections between V2 cells to explain this effect. Craft, Schutze, Niebur, & von der Heydt (2007) showed that models using solely lateral connections cannot account for some of the neurophysiological data (such as the speed at which the direction-of-figure information appears), and proposed a model with feedback connections from higher levels. This model was further developed in Mihalas, Dong, von der Heydt, & Niebur (2011). In the model, at a low level, such as LGN and V1, there are cells that are selective for edges of certain orientations. At the next level, V2, there are cells that are selective for the side of the edge that owns the border (usually the direction of the figure). At some higher level there are grouping cells. Each grouping cell gets input from cells at different locations in the visual field that are selective for different directions of border ownership. These grouping cells behave much like the grassfire algorithm for computing the medial axis (Blum, 1973).
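A minimal sketch of the grassfire idea on a binary grid (4-connectivity and a simple local-maximum test are simplifying assumptions): a breadth-first search from the boundary gives each interior cell its burn time, and cells where the flames meet, i.e. local maxima of that distance, approximate the medial axis.

```python
from collections import deque

def medial_axis(shape):
    """Grassfire sketch: BFS from the boundary inward computes each cell's
    arrival time; cells whose distance is a local maximum approximate the
    medial axis (the "ash piles")."""
    h, w = len(shape), len(shape[0])
    dist = [[None] * w for _ in range(h)]
    q = deque()
    for r in range(h):
        for c in range(w):
            if shape[r][c]:
                nbrs = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                # boundary cells: interior cells touching the background
                if any(not (0 <= rr < h and 0 <= cc < w and shape[rr][cc])
                       for rr, cc in nbrs):
                    dist[r][c] = 0
                    q.append((r, c))
    while q:                                    # the fire burns inward
        r, c = q.popleft()
        for rr, cc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
            if 0 <= rr < h and 0 <= cc < w and shape[rr][cc] and dist[rr][cc] is None:
                dist[rr][cc] = dist[r][c] + 1
                q.append((rr, cc))
    axis = set()
    for r in range(h):
        for c in range(w):
            if shape[r][c]:
                nb = [dist[rr][cc]
                      for rr, cc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                      if 0 <= rr < h and 0 <= cc < w and shape[rr][cc]]
                if nb and dist[r][c] >= max(nb):
                    axis.add((r, c))
    return dist, axis

# A 5-row-tall horizontal bar: the deepest ash piles lie on the middle row.
bar = [[1] * 11 for _ in range(5)]
dist, axis = medial_axis(bar)
print((2, 5) in axis, (1, 5) in axis)
```

The local-maximum test is deliberately crude (it admits some boundary ties), but it conveys how opposed inward-moving fronts single out the axis, just as grouping cells pool opposed border-ownership signals.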
In the grassfire, a “fire” is set at the boundary of a shape and the flames proceed inward, normal to the boundary. When two flames meet, they are extinguished and a pile of ash is left. The set of all the ash piles is the medial axis. The grouping cells of Craft et al. (2007) look for border ownership cells that are selective for different directions, meaning that at the midpoint between the two cells the normal vectors from the edge elements would intersect, so this point is on the medial axis. The current study showed that the complexity of the MAP skeleton, DL(SKELETON), is related to the detectability of closed contours. Even though local information is sufficient for detection, the visual system was unable to avoid representing the global information. Global information, such as the direction of figure, appears very early in visual processing (Zhou et al., 2000), possibly as a result of feedback connections from grouping cells (Craft et al., 2007). Grouping cells also explain our ability to attend to specific objects (Mihalas et al., 2011), which is a possible explanation for why there is a benefit for perceiving closed contours over open contours. Grouping cells are still a theoretical construct, but if they are related to the MAP skeleton or another representation similar to the medial axis, possible places to search for these cells are V3 or IT (Lescroart & Biederman, 2012; Hung et al., 2012). The current study suggests that global information, specifically axial structure, is an integral part of the object representation system, so further exploration to find cells with properties similar to grouping cells would be worthwhile. The findings reported here raise several additional questions. The simplest open contour our model generates is a completely straight segment. The simplest closed contour is a perfect circle. Removing a small portion of the closed contour will result in an open contour that is fairly complex as measured by DL(CONTOUR).
Intuition suggests that this contour might be more easily detected than a wavy line that does not reach closure, even if the total curvature is the same in both contours. There are two lines of research suggested by this example. The first considers that the visual system might take into account information beyond the changes in the direction (turning angles) of a contour. The system might also care about the rate of change of direction, preferring to keep the rate of change constant (similar to the results of Pettet, 1999). In future work I hope to look at the effect of the rate of change of curvature on contour detection, and to enhance my model to include a bias for constant curvature in addition to constant direction. The second line of research suggested by the example of the circle with a missing segment is that global shape effects could be found for open contours that are “shape-like” or “part-like”. By estimating skeletons for open contours we can see if DL(SKELETON) also affects their detectability, which would suggest that the global effects found in closed contours are also present for open contours.

4.0.1 Conclusion

Human observers were fairly accurate at detecting which image contained a contour and which contained noise. Their performance was modulated by the complexity of the entire contour, as well as by the complexity of the shape contained inside a closed contour. This has important implications for the nature of object detection and shape representations. The Gestalt psychologists had the correct intuition that a whole is special. In addition to simply being a collection of contour points that closes, closed contours have a shape. These results suggest that models of contour integration (Field et al., 1993; Geisler & Super, 2000), while considering the importance of the relationship of neighboring edge elements, could be improved by including information about the geometry of the entire contour.
A more complete model of human contour detection must accommodate both the global effect of contour complexity and the effect of shape complexity. By deriving a shape complexity measure, this work provided one key component for quantifying the effect of closure. By deriving three complexity measures using the definition of description length of Shannon (1948), I hope to influence the broader field of pattern detection, which can be studied within this common framework while allowing flexibility in the choice of generative models relevant to each specific class of stimuli.

A. Modeling Approaches

This appendix contains several different attempts at modeling the detection or decision processes of the observer, and explanations of why these approaches fail.

A.1 Spatial Frequency

One possible way the subjects could have performed the task, if the noise and contour images contained different spatial frequencies, is simply to decide based upon these differences. Looking at the power spectra of images, most natural scenes have a lot of low spatial frequency content and very little high spatial frequency content. Where there is an edge in an image there will be a spike in the medium to high spatial frequencies. The power spectra of the images in our experiment reveal content at all spatial frequencies, because the images are white noise. The power spectrum of a noise image is indistinguishable from that of an image containing a contour (see Figure A.1).

A.2 Theoretical Decision Model

This decision model looks at the statistical differences between contours and noise, and does not consider the process of extracting the information from an image. Let us begin by assuming that the observer has extracted a chain of pixels from each image, and that the chain from each image is of the appropriate length (determined by the length condition: either 110 or 220 pixels). Using equation 2.9, the observer is able to decide whether the chain from each image is a contour or noise.
If only one of the chains is determined to be a contour, the observer will respond by choosing the interval from which that chain was extracted. If the observer decides that neither chain is a contour, the decision about which interval contained the contour is simply a Bernoulli trial, or coin flip. Under this model, if the observer does not detect the contour in either interval, each response would be random, resulting in a performance of 50 percent. If both chains are determined to be contours, the observer can pick an interval at random, pick the interval with the larger log-likelihood ratio, or probability match (use a coin flip to guess the interval, with the coin’s weight set by the relative size of the two log-likelihood ratios). Perfect performance will occur if only one interval is believed to have contained a contour. In order for the model to result in less than perfect performance, contours would need to be misclassified as noise, or noise would need to be misclassified as a contour. This would happen when the distribution of possible contours overlaps with the distribution of possible noise chains.

Figure A.1: The top left is an image containing a contour (in the lower right). The top right image is the power spectrum of the image in the top left. The image in the bottom left contains only noise. The bottom right is the power spectrum for the image in the lower left. The power spectra of the two images are not noticeably different.

Figure A.2: This figure shows the distributions of possible contours and possible noise chains in the long contour condition (220 pixels). The x-axis shows the number of straight continuations (out of a possible 219). The y-axis shows the proportion of pixel chains with this number of straight continuations. The distributions do not overlap, so any decision will always be correct.
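The decision rule just described can be written out explicitly. The threshold of zero and the pick-the-larger tie-breaking rule are illustrative assumptions standing in for equation 2.9 and the probability-matching variants.

```python
import random

random.seed(2)

def decide(llr_a, llr_b, threshold=0.0):
    """2-AFC decision sketch: each interval's pixel chain gets a
    log-likelihood ratio (contour vs. noise).  Respond with the interval
    classified as a contour; otherwise fall back on the larger ratio."""
    a_is_contour = llr_a > threshold
    b_is_contour = llr_b > threshold
    if a_is_contour and not b_is_contour:
        return "A"
    if b_is_contour and not a_is_contour:
        return "B"
    # neither (or both) classified as a contour: pick the larger ratio,
    # which reduces to a coin flip when the ratios are tied
    if llr_a == llr_b:
        return random.choice(["A", "B"])
    return "A" if llr_a > llr_b else "B"

print(decide(3.2, -1.0))   # only A looks like a contour
print(decide(-2.0, -0.5))  # neither does; B has the larger ratio
print(decide(1.0, 2.0))    # both do; B has the larger ratio
```

Because the contour and noise distributions in Figure A.2 do not overlap, an ideal observer of this kind never misclassifies a chain, which is why the unmodified model predicts perfect performance.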
To produce a complexity effect, the distribution of contours in one condition would need to overlap the noise distribution to a greater or lesser extent than in the other conditions. Figure A.2 shows that the distribution of contours used in the experiment does not overlap with the noise distribution at all. This means that, under the assumption that the contour is fully integrated on each trial, the contour will never be misclassified as noise and noise will never be misclassified as a contour.

I tested several modifications to this model in an attempt to increase the overlap between the contour distribution and the noise distribution. The first modification assumes a different distribution of possible contours for each complexity level. Each complexity level corresponds to a number of straight continuations; I found the binomial distribution whose parameter maximized the likelihood of obtaining a contour of that complexity, yielding one contour distribution per complexity level. The most complex contour's distribution lies closest to the noise distribution, and the simplest contour's distribution lies furthest from it. None of the contour distributions, however, overlaps the noise distribution. This modification alone does not produce a complexity effect, but the ordering of the distributions is consistent with one, so it is retained in the other versions of the model.

A second version of the model contains an additional parameter, L: the probability that the observer will find the contour if one exists. The assumption is that the observer can attend to only a limited number of small regions in the image. If the contour falls in one of the attended locations, the observer detects it; otherwise they do not.
L is directly related to the number of locations the observer can attend: as the number of locations increases, so does the probability of detection. The parameter thus controls overall performance, and with a larger L the observer performs better, but this does not produce a complexity effect, because it does not enlarge the differences between the distributions of straight continuations for contours of differing complexities. The parameter would be useful for fitting the actual data, allowing average performance to be raised or lowered to match an individual subject's performance, but it does not explain the decreasing trend with complexity found in the data.

To produce a complexity effect, the contour distributions need to be shifted toward the noise distribution, so that they overlap and the amount of overlap differs across the complexity levels. To this end I introduce a parameter, D, that can be fit for each subject. D controls the number of noise pixels included in the chain in addition to the actual signal pixels (or, in the noise case, in addition to the noise pixels). As D increases, the distributions overlap more. This fails because increasing D also increases the variance of the distributions: by the time D is large enough for the contour and noise distributions to overlap, the contour distributions for the different complexity levels are no longer sufficiently distinguishable to produce different levels of performance. In summary, this model yields either perfect performance, as in the original version, or a fixed level of performance. Since adding noise to the contour does not give a model whose performance decreases as complexity increases, the next modification is to decrease the signal in the integrated contour.
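The way increasing D both shifts the contour distribution toward the noise distribution and inflates its variance can be illustrated with a small numerical sketch. Treating the straight-continuation count of a mixed chain (S signal steps plus D noise steps) as a sum of two binomials, the separation from a pure noise chain can be measured in pooled-standard-deviation units. The straight probabilities below are hypothetical, not the fitted values from the experiments.

```python
def mixed_chain_stats(S, D, p_sig, p_noise):
    """Mean and variance of the straight-continuation count when a chain
    mixes S signal steps (straight with prob p_sig) with D noise steps
    (straight with prob p_noise): a sum of two binomial counts."""
    mean = S * p_sig + D * p_noise
    var = S * p_sig * (1 - p_sig) + D * p_noise * (1 - p_noise)
    return mean, var

def separation(S, D, p_sig, p_noise, chain_len):
    """d'-like separation between the mixed contour chain and a pure
    noise chain of the same total length, in pooled-SD units."""
    m_c, v_c = mixed_chain_stats(S, D, p_sig, p_noise)
    m_n = chain_len * p_noise
    v_n = chain_len * p_noise * (1 - p_noise)
    return (m_c - m_n) / ((v_c + v_n) / 2) ** 0.5

# As D grows (and S shrinks), each contour distribution slides toward
# the noise distribution -- but its variance grows too, so the
# separations for different complexity levels collapse together.
for D in (0, 50, 100, 150):
    S = 219 - D
    simple = separation(S, D, p_sig=0.9, p_noise=0.5, chain_len=219)
    complex_ = separation(S, D, p_sig=0.7, p_noise=0.5, chain_len=219)
    print(D, round(simple, 2), round(complex_, 2), round(simple - complex_, 2))
```

With these hypothetical numbers, the gap between the "simple" and "complex" separations shrinks as D grows, matching the observation that large D washes out the differences between complexity levels.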
A third parameter, S, is introduced. S controls how many of the actual contour pixels are integrated into the proposed contour chain, so that D + S is the length of the hypothesized contour. This model suffers from the same problem as the previous one: for the noise and contour distributions to overlap, D must be large relative to S, and the contour distributions are then no longer distinguishable from one another.

One modification of this model did produce a complexity effect similar to that found in our subjects. It was rejected, however, because its assumptions are not reasonable. To obtain the complexity effect, I needed to assume that all of the deviations from straightness are contained in the S signal pixels. This assumption might actually be justifiable in Experiment 2, where the majority of the non-straight turning angles were grouped together, but it is not valid for the other experiments. The model essentially assumes that the observer detects the segment of the contour containing all of the curvature but then stops integrating the rest of the contour, integrating noise pixels instead. Because this assumption seems unreasonable, and runs counter to existing models of contour integration (which suggest that the straight portion should be the easiest to integrate), this version of the model is also rejected.

A.3 Realistic Front End

The theoretical decision model described above fails because noise was not introduced in an effective way. That model assumes that the observer has complete access to the individual pixels, so one way to introduce noise is to begin with input that is more similar to what the visual system receives. In this model the images are blurred with a Gaussian filter, and a bank of oriented filters is applied to the blurred image.
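A minimal NumPy sketch of this kind of front end is given below, assuming six orientations spaced 30 degrees apart; the filter sizes, Gabor parameters, and threshold are hypothetical placeholders, not the values used in the actual model.

```python
import numpy as np

def conv_same(image, kernel):
    """'Same'-size linear convolution via zero-padded FFTs."""
    H, W = image.shape
    kh, kw = kernel.shape
    fh, fw = H + kh - 1, W + kw - 1
    full = np.real(np.fft.ifft2(np.fft.fft2(image, (fh, fw)) *
                                np.fft.fft2(kernel, (fh, fw))))
    top, left = kh // 2, kw // 2
    return full[top:top + H, left:left + W]

def gaussian_kernel(sigma=1.0):
    """Isotropic Gaussian blur kernel, normalized to sum to one."""
    size = int(6 * sigma) | 1                   # odd width, roughly 6 sigma
    ax = np.arange(size) - size // 2
    g = np.exp(-ax**2 / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def gabor_kernel(size=21, sigma=3.0, wavelength=8.0, theta=0.0):
    """Cosine-phase Gabor patch oriented at angle theta (radians)."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    yr = -x * np.sin(theta) + y * np.cos(theta)  # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    g = envelope * np.cos(2 * np.pi * yr / wavelength)
    return g - g.mean()                          # zero-mean filter

def filter_bank_response(image, n_orient=6, threshold=0.0):
    """Blur the image, filter at n_orient orientations (every 30 degrees
    when n_orient=6), and keep the strongest suprathreshold response
    at each pixel."""
    blurred = conv_same(image, gaussian_kernel(sigma=1.0))
    stack = [np.abs(conv_same(blurred, gabor_kernel(theta=k * np.pi / n_orient)))
             for k in range(n_orient)]
    combined = np.max(stack, axis=0)
    combined[combined < threshold] = 0.0         # ignore subthreshold responses
    return combined
```

The remaining stages described below (isolating connected chains of high responses and scoring them under the von Mises model) would operate on the combined response map this sketch returns.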
I tested both a set of Gabor filters and square-wave filters in which each filter contained 1.5 phases (an "on" region in the center and an "off" region in the surround; the "straight surround-suppression filter"). There is a vertically oriented filter, and then filters at every 30 degrees of orientation. Filter responses below a threshold are ignored. The responses of all of the filters are combined into a single image, and the largest connected chains of locations with high filter responses are isolated. Finally, the chain with the highest probability under the von Mises model is selected as the candidate contour, and the interval whose candidate contour has the higher probability is chosen for the response.

For this model to succeed, the filters need to have their highest (or possibly lowest) responses near the contour. The outputs of the filters (whether the Gabor filter or the straight surround-suppression filter) do not have a high (or low) response near the location of the contour (see Figures A.3 and A.4). This is likely due to the large amount of noise surrounding the contour affecting the filter response. Because the problems with these filters appear to be driven by the noise, the next approach is a filter that does not suppress the surround but simply gives no response there. This is nonstandard for image filters, which generally must sum to one; if the sum of the filter is not 1, there will be an overall increase or decrease in the luminance of the image. I am currently developing this model, and preliminary results appear promising. Figure A.5 shows the response of a vertical filter that ignores the surround. The straight portion of the contour is highlighted by this filter, but there are still many spurious responses due to the noise in the image. Hopefully, as this model is developed further, these spurious responses will not be strong enough to prevent the detection of the contours.

Figure A.3: The image on the left is an image containing a contour (just left of the center of the image). On the right is the response to a vertical Gabor filter convolved with the image. The filter should highlight the vertical portion of the contour, but due to the integration of the noise surrounding the contour there is not significantly more activity near the straight portion of the contour than at other locations.

Figure A.4: The image on the left is an image containing a contour (just left of the center of the image). On the right is the response to a vertical surround-suppression filter convolved with the image. Just as with the Gabor filter, this filter should highlight the vertical portion of the contour, but even with the surround noise being suppressed there is not significantly more activity near the straight portion of the contour than at other locations.

Figure A.5: The image on the left is an image containing a contour (just left of the center of the image). On the right is the response of a vertical filter that ignores noise surrounding the middle of the filter. This filter does a much better job of identifying the location of the contour, suggesting that a model based on this filter should be developed further.

Bibliography

Altmann, C. F., Bülthoff, H. H., & Kourtzi, Z. (2003). Perceptual organization of local elements into global shapes in the human visual cortex. Current Biology, 13(4), 342–349.
Attneave, F. (1954). Some informational aspects of visual perception. Psychological Review, 61(3), 183–193.
August, J., Siddiqi, K., & Zucker, S. (1999). Ligature instabilities in the perceptual organization of shape. Computer Vision and Image Understanding, 76(3), 231–243.
Bex, P. J., Simmers, A. J., & Dakin, S. C. (2001). Snakes and ladders: the role of temporal modulation in visual contour integration. Vision Research, 41(27), 3775–3782.
Blum, H. (1973). Biological shape and visual science (part 1). Journal of Theoretical Biology, 38, 205–287.
Brainard, D. (1997).
The psychophysics toolbox. Spatial Vision, 10, 433–436.
Braun, J. (1999). On the detection of salient contours. Spatial Vision, 12, 211–225.
Briscoe, E. (2008). Shape skeletons and shape similarity. Ph.D. thesis, Rutgers University.
Brunswik, E., & Kamiya (1953). Ecological cue-validity of 'proximity' and of other Gestalt factors. The American Journal of Psychology, 66(1), 20–32.
Burns, J. B., Hanson, A. R., & Riseman, E. M. (1986). Extracting straight lines. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(4), 425–455.
Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6), 679–698.
Chan, T., & Vese, L. (1999). An active contour model without edges. In M. Nielsen, P. Johansen, O. Olsen, & J. Weickert (Eds.) Scale-Space Theories in Computer Vision, vol. 1682 of Lecture Notes in Computer Science, (pp. 141–151). Springer Berlin Heidelberg.
Chan, T., & Vese, L. (2001). Active contours without edges. IEEE Transactions on Image Processing, 10(2), 266–277.
Chater, N. (2000). Cognitive science: The logic of human learning. Nature, 407, 572–573.
Craft, E., Schutze, H., Niebur, E., & von der Heydt, R. (2007). A neural model of figure-ground organization. Journal of Neurophysiology. URL http://jn.physiology.org/content/early/2007/04/18/jn.00203.2007.short
Dakin, S., & Hess, R. (1998). Spatial-frequency tuning of visual contour integration. Journal of the Optical Society of America, A, 15, 1486–1499.
De Valois, R., Albrecht, D., & Thorell, L. (1982a). The orientation and direction selectivity of cells in macaque visual cortex. Vision Research, 22, 531–544.
De Valois, R., Albrecht, D., & Thorell, L. (1982b). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22, 545–599.
Duda, R. O., & Hart, P. E. (1972). Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15(1), 11–15.
Elder, J., & Goldberg, R. (1998).
The statistics of natural image contours. Proceedings of the IEEE Workshop on Perceptual Organisation in Computer Vision.
Elder, J., & Goldberg, R. (2002). Ecological statistics of Gestalt laws for the perceptual organization of contours. Journal of Vision, 2, 324–353.
Elder, J., & Zucker, S. (1993). The effect of contour closure on the rapid discrimination of two-dimensional shapes. Vision Research, 33, 981–991.
Ernst, U. A., Mandon, S., Schinkel-Bielefeld, N., Neitzel, S. D., Kreiter, A. K., & Pawelzik, K. R. (2012). Optimality of human contour integration. PLoS Computational Biology, 8(5), e1002520.
Fang, F., Kersten, D., & Murray, S. O. (2008). Perceptual grouping and inverse fMRI activity patterns in human visual cortex. Journal of Vision, 8(7).
Feldman, J. (1995). Perceptual models of small dot clusters. In I. J. Cox, P. Hansen, & B. Julesz (Eds.) Partitioning Data Sets, (pp. 331–357). American Mathematical Society. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 19.
Feldman, J. (1997). Curvilinearity, covariance, and regularity in perceptual groups. Vision Research, 37(20), 2835–2848.
Feldman, J. (2000). Minimization of Boolean complexity in human concept learning. Nature, 407, 630–633.
Feldman, J. (2004). How surprising is a simple pattern? Quantifying "eureka!". Cognition, 93, 199–224.
Feldman, J., & Singh, M. (2005). Information along contours and object boundaries. Psychological Review, 112(1), 243–252.
Feldman, J., & Singh, M. (2006). Bayesian estimation of the shape skeleton. Proceedings of the National Academy of Sciences, 103(47), 18014–18019.
Feldman, J., Singh, M., Briscoe, E., Froyen, V., Kim, S., & Wilder, J. (2013). An integrated Bayesian approach to shape representation and perceptual organization. In S. Dickinson, & Z. Pizlo (Eds.) Shape Perception in Human and Computer Vision: An Interdisciplinary Perspective. Springer Verlag.
Field, D. (1987).
Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America, 4, 2379–2394.
Field, D., Hayes, A., & Hess, R. (1993). Contour integration by the human visual system: Evidence for a "local association field". Vision Research, 33(2), 173–193.
Field, D., Hayes, A., & Hess, R. (2000). The roles of polarity and symmetry in the perceptual grouping of contour fragments. Spatial Vision, 13(1), 51–66.
Field, D., & Tolhurst, D. (1986). The structure and symmetry of simple cell receptive-field profiles in the cat's visual cortex. Proceedings of the Royal Society B, 228, 379–400.
Froyen, V., Feldman, J., & Singh, M. (2010). A Bayesian framework for figure-ground interpretation. In NIPS'10, (pp. 631–639).
Garrigan, P. (2012). The effect of contour closure on shape recognition. Perception, 41(2), 221–235.
Geisler, W., Perry, J., Super, B., & Gallogly, D. (2001). Edge co-occurrence in natural images predicts contour grouping performance. Vision Research, 41, 711–724.
Geisler, W., & Super, B. (2000). Perceptual organization of two-dimensional patterns. Psychological Review, 107(4), 677–708.
Gibson, J. (1966). The Senses Considered as Perceptual Systems. Boston, MA: Houghton Mifflin.
Guttman, M., Prince, J., & McVeigh, E. (1994). Tag and contour detection in tagged MR images of the left ventricle. IEEE Transactions on Medical Imaging, 13(1), 74–88.
Guttman, M. A., McVeigh, E. R., & Prince, J. (1992). Contour estimation in tagged cardiac magnetic resonance images. In Engineering in Medicine and Biology Society, 1992 14th Annual International Conference of the IEEE, vol. 5, (pp. 1856–1857).
Harrison, S. J., & Feldman, J. (2009). The influence of shape and skeletal axis structure on texture perception. Journal of Vision, 9(6).
He, D., Kersten, D., & Fang, F. (2012). Opposite modulation of high- and low-level visual aftereffects by perceptual grouping. Current Biology, 22(1), 1040–1045.
Hess, R., & Dakin, S.
(1997). Absence of contour linking in peripheral vision. Nature, 390, 602–604.
Hess, R., & Field, D. (1995). Contour integration across depth. Vision Research, 35(12), 1699–1711.
Hess, R., & Field, D. (1999). Integration of contours: new insights. Trends in Cognitive Sciences, 3(12), 480–486.
Hess, R., Hayes, A., & Field, D. (2003). Contour integration and cortical processing. Journal of Physiology-Paris, 97, 105–119.
Hough, P. (1962). Method and means for recognizing complex patterns. U.S. Patent 3069654.
Hung, C.-C., Carlson, E. T., & Connor, C. E. (2012). Medial axis shape coding in macaque inferotemporal cortex. Neuron, 74(6), 1099–1113.
Jones, J., & Palmer, L. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6), 1233–1258.
Kass, M., Witkin, A., & Terzopoulos, D. (1988). Snakes: Active contour models. International Journal of Computer Vision, 1(4), 321–331.
Kellman, P., & Shipley, T. (1991). A theory of visual interpolation in object perception. Cognitive Psychology, 23, 141–221.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace, & World.
Kovacs, I., & Julesz, B. (1993). A closed curve is much more than an incomplete one: effect of closure in figure-ground segmentation. Proceedings of the National Academy of Sciences, 90(16), 7495–7497.
Kovacs, I., & Julesz, B. (1994). Perceptual sensitivity maps within globally defined visual shapes. Nature, 370, 644–646.
Lee, P. M. (2004). Bayesian Statistics: An Introduction. West Sussex, UK: Wiley, third ed.
Lee, T. S., Mumford, D., Romero, R., & Lamme, V. A. (1998). The role of the primary visual cortex in higher level vision. Vision Research, 38, 2429–2454.
Lescroart, M., & Biederman, I. (2012). Cortical representation of medial axis structure. Cerebral Cortex, 23(3), 629–637.
Leymarie, F., Kimia, B., & Giblin, P. (2004). Towards surface regularization via medial axis transitions.
In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol. 3, (pp. 123–126).
Marotti, R. B., Pavan, A., & Casco, C. (2012). The integration of straight contours (snakes and ladders): The role of spatial arrangement, spatial frequency and spatial phase. Vision Research, 71(0), 44–52.
Marr, D., & Hildreth, E. (1980). Theory of edge detection. Proceedings of the Royal Society of London. Series B. Biological Sciences, 207(1167), 187–217.
Mathes, B., & Fahle, M. (2007). Closure facilitates contour integration. Vision Research, 47(6), 818–827.
McIlhagga, W. H., & Mullen, K. T. (1996). Contour integration with colour and luminance contrast. Vision Research, 36(9), 1265–1279.
Mihalas, S., Dong, Y., von der Heydt, R., & Niebur, E. (2011). Mechanisms of perceptual organization provide auto-zoom and auto-localization for attention to objects. Proceedings of the National Academy of Sciences, 108(18), 7583–7588.
Movshon, J., Thompson, I., & Tolhurst, D. (1987). Receptive field organization of complex cells in the cat's striate cortex. Journal of Physiology (London), 283, 53–77.
Murray, S. O., Kersten, D., Olshausen, B. A., Schrater, P., & Woods, D. L. (2002). Shape perception reduces activity in human primary visual cortex. Proceedings of the National Academy of Sciences, 99(23), 15164–15169.
Murray, S. O., Schrater, P. R., & Kersten, D. (2004). Perceptual grouping and the interactions between visual cortical areas. Neural Networks, 17(5-6), 695–705.
Nestor, A., Vettel, J. M., & Tarr, M. J. (2008). Task-specific codes for face recognition: How they shape the neural representation of features for detection and individuation. PLoS ONE, 3(12), e3978.
Olshausen, B., & Field, D. (1996a). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.
Olshausen, B., & Field, D. (1996b). Natural image statistics and efficient coding.
Network: Computation in Neural Systems, 7, 333–339.
Pelli, D. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442.
Pettet, M. W. (1999). Shape and contour detection. Vision Research, 39(3), 551–557.
Pettet, M. W., McKee, S. P., & Grzywacz, N. M. (1998). Constraints on long range interactions mediating contour detection. Vision Research, 38(6), 865–879.
Psotka, J. (1978). Perceptual processes that may create stick figures and balance. Journal of Experimental Psychology, 4(1), 101–111.
Ren, X., & Malik, J. (2002). A probabilistic multi-scale model for contour completion based on image statistics. In A. Heyden, G. Sparr, M. Nielsen, & P. Johansen (Eds.) Computer Vision - ECCV 2002, vol. 2350 of Lecture Notes in Computer Science, (pp. 312–327). Springer Berlin Heidelberg.
Saarinen, J., & Levi, D. M. (1999). The effect of contour closure on shape perception. Spatial Vision, 12(2), 227–238.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423 and 623–656.
Siddiqi, K., Shokoufandeh, A., Dickinson, S., & Zucker, S. (1998). Shock graphs and shape matching. In Computer Vision, 1998. Sixth International Conference on, (pp. 222–229).
Singh, M., & Feldman, J. (2013). Principles of contour information: a response to Lim and Leek (2012). Psychological Review, 119, 678–683.
Singh, M., & Fulvio, J. M. (2005). Visual extrapolation of contour geometry. Proceedings of the National Academy of Sciences of the United States of America, 102(3), 939–944.
Singh, M., & Fulvio, J. M. (2007). Bayesian contour extrapolation: Geometric determinants of good continuation. Vision Research, 47(6), 783–798.
Tversky, T., Geisler, W., & Perry, J. (2004). Contour grouping: Closure effects are explained by good continuation and proximity. Vision Research, 44, 2769–2777.
Wertheimer, M. (1958). Principles of perceptual organization. In D. Beardslee, & M. Wertheimer (Eds.)
Readings in perception, (pp. 103–123). Princeton: Van Nostrand (translated from original published in 1923).
Wilder, J., Feldman, J., & Singh, M. (2011). Superordinate shape classification using natural shape statistics. Cognition, 119(3), 325–340.
Yuan, C., Lin, E., Millard, J., & Hwang, J.-N. (1999). Closed contour edge detection of blood vessel lumen and outer wall boundaries in black-blood MR images. Magnetic Resonance Imaging, 17(2), 257–266.
Yuille, A. L., Fang, F., Schrater, P. R., & Kersten, D. (2004). Human and ideal observers for detecting image curves. In Neural Information Processing Systems (NIPS).
Zhaoping, L. (2005). Border ownership from intracortical interactions in visual area V2. Neuron, 47(1), 143–153.
Zhou, H., Friedman, H. S., & von der Heydt, R. (2000). Coding of border ownership in monkey visual cortex. The Journal of Neuroscience, 20(17), 6594–6611.