Object Recognition in Clutter: Selectivity and Invariance Properties in Monkey Inferotemporal Cortex

Davide Zoccolan and James DiCarlo

The problem: A major challenge for current theories of vision is to understand how the visual system performs object recognition in cluttered conditions, typical of natural visual scenes, where objects of interest appear not in isolation but together with background objects. Object recognition in primates is thought to depend on neuronal activity in the inferotemporal cortex (IT) [1], which is the last stage of the ventral visual stream. Neurons in monkey IT fulfill two essential requirements for visual recognition: invariance and selectivity. They are selectively tuned to views of complex objects such as faces, and their responses show significant invariance to stimulus transformations such as changes in scale and position [2, 3]. Previous studies report a reduction of an IT neuron’s response to its preferred stimulus when an additional “clutter” stimulus is simultaneously present in its receptive field [4, 5]. However, the relationship between the position, shape, and clutter sensitivity of IT neurons has not yet been systematically assessed.

Motivation: Understanding how single and multiple objects are represented in the higher cortical areas of primates is one of the major objectives of computational and systems neuroscience. Such a challenge requires a highly multidisciplinary approach that combines electrophysiology and psychophysics with computational modeling. The hierarchical model of object recognition, recently developed in Poggio’s lab, accounts for both object identification and object categorization. It also provides a plausible circuitry to explain their neural basis and the origin of the invariance and selectivity properties of higher visual cells [3].
More generally, the model can play a key role in analyzing electrophysiological data, planning experiments, and interpreting their results in the light of a coherent theoretical framework. New experimental tests are therefore continually needed to verify the model’s predictions and to improve its computational architecture.

Previous work: Preliminary model simulations predict a complex but testable pattern of interaction between multiple objects simultaneously present in an IT neuron’s receptive field [6]. When the preferred shape and a nonoptimal (clutter) shape are both present in the receptive field, the response to the pair of shapes is predicted to be reduced. The amount of reduction depends on the similarity between the clutter shape and the preferred shape. In particular, the model predicts a U-shaped dependence of clutter interference (i.e., reduced neuronal response) on the similarity between the clutter shape and the preferred shape [6]. These predictions call for a systematic experimental test.

Approach: The first step in our experimental design was to train monkey subjects to be experts at detecting specific objects. One monkey has been trained in a sequential object recognition task that requires the detection of a specific shape (the target shape) embedded in a temporal sequence of shapes drawn from the same parameterized shape space (the distractors). Shapes are ~2 deg wide. To ensure the generality of our results, the monkey has been trained to detect a target object in each of three different parameterized shape spaces (cars, faces, and abstract silhouettes). Training in each shape space produced a consistent improvement in performance (more than doubling) over the first 7-10 days, after which performance reached an asymptote and remained constant for the remaining 8-10 training sessions.
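The U-shaped prediction described under Previous work can be made concrete with a toy simulation. The sketch below is only an illustration under stated assumptions and is not the model of [6]: shapes live on a hypothetical one-dimensional axis, afferents are Gaussian-tuned, each afferent reports the larger of its responses to the two shapes (MAX pooling), and the unit is wired only to the few afferents that its preferred shape drives most strongly. All names and parameter values (`n_afferents`, `sigma`, `sigma_unit`) are invented for the example.

```python
import numpy as np


def afferent_responses(shape, centers, sigma=1.0):
    """Gaussian-tuned afferent responses to one shape on a 1-D shape axis."""
    return np.exp(-(shape - centers) ** 2 / (2.0 * sigma ** 2))


def unit_response(preferred, clutter, centers,
                  n_afferents=5, sigma=1.0, sigma_unit=0.3):
    """Response of a toy shape-tuned unit to its preferred shape plus clutter."""
    alone = afferent_responses(preferred, centers, sigma)
    # Assumption: the unit reads only the afferents that its preferred
    # shape drives most strongly.
    wired = np.argsort(alone)[-n_afferents:]
    # Assumption: MAX pooling, so each afferent reports the stronger of
    # its responses to the two shapes.
    paired = np.maximum(alone, afferent_responses(clutter, centers, sigma))
    # Gaussian tuning of the unit around its afferent pattern for the
    # preferred shape presented alone.
    dist2 = np.sum((paired[wired] - alone[wired]) ** 2)
    return float(np.exp(-dist2 / (2.0 * sigma_unit ** 2)))


centers = np.linspace(-10.0, 10.0, 41)   # afferent preferred shapes
distances = np.linspace(0.0, 4.0, 17)    # clutter distance from preferred shape
responses = np.array([unit_response(0.0, d, centers) for d in distances])

for d, r in zip(distances, responses):
    print(f"distance {d:4.2f}  response {r:5.3f}")
```

In this toy setting the response equals the response to the preferred shape alone when the clutter shape is identical to it, dips at intermediate distances, and recovers once the clutter shape is so dissimilar that it no longer drives the afferents wired to the unit. Whether real IT neurons behave this way is exactly the empirical question addressed by the recordings.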
Once the behavioral training was completed, we began single-unit recordings from IT cortex. Each isolated neuron is tested for: 1) responsiveness to stimuli sampled from the trained spaces; 2) selectivity of the neural response across the optimal stimulus space; 3) position tolerance of the shape selectivity; 4) impact of clutter on the shape selectivity. All recordings are performed during passive viewing with rapid sequential presentation (5 stimuli per second). The stimuli presented to the monkey during the recordings include subsets of shapes, belonging to each space, that were not used during the training phase. Our preliminary recordings show that it is possible to find IT neurons sharply tuned across subsets of shapes belonging to our stimulus spaces. The response of these neurons is maximal for a specific shape (the optimal stimulus) and decreases smoothly for stimuli that are increasingly dissimilar from the optimal one. Since our stimulus spaces are parameterized, it is possible to build a tuning curve of the neuronal response as a function of the distance, in the shape space, from the optimal stimulus. For those neurons whose tuning curves were invariant across the top and bottom retinal positions used in training, we were able to test the interference produced by clutter, i.e., by pairs of stimuli of controlled similarity simultaneously present in the neuron’s receptive field. These first recordings confirm at least part of the model predictions, in that the neuronal response decreases smoothly as a function of the distance, in the shape space, of the flanker stimulus from the optimal one. However, we have not yet found any IT neuron whose response increased when the flanker and the optimal stimulus were very dissimilar, as predicted by the U-shaped dependence of clutter interference in the model units.

Difficulty: The main problems we encountered in trying to assess the impact of clutter on IT neuron responses were the following.
First, it is very hard to find neurons with sharp selectivity across some subset of the stimulus space. Second, when such sharply tuned neurons were found, they often had small receptive fields (~3-4 deg), in which the two stimuli needed to test clutter interference could barely fit. Third, the selectivity of these neurons was often depressed during the protocol for the clutter interference test. We believe that this effect could be accounted for by adaptation. Because of these problems, we have so far been able to test the response to clutter in only a small fraction of the recorded IT neurons.

Impact: The aim of this project is to combine physiological, psychophysical, and computational studies to investigate how the robustness of IT neurons to clutter depends on their shape and position sensitivity. This will help us understand how complex shapes are represented in IT and how multiple objects interact or compete for IT representation. These are fundamental problems of contemporary visual science that require a strong interaction between experimental investigation and computational modeling.

Future Work: We will keep testing IT neuron shape selectivity and clutter tolerance while trying to obtain a better characterization of the position tolerance, receptive field size, and adaptation properties of IT neurons.

Research support: This report describes research done at the Center for Biological & Computational Learning, which is in the McGovern Institute for Brain Research at MIT, as well as in the Dept. of Brain & Cognitive Sciences, and which is affiliated with the Computer Science & Artificial Intelligence Laboratory (CSAIL). This research was sponsored by grants from: Office of Naval Research (DARPA) Contract No. MDA97204-1-0037, Office of Naval Research (DARPA) Contract No. N00014-02-1-0915, National Science Foundation (ITR/IM) Contract No. IIS-0085836, National Science Foundation (ITR/SYS) Contract No.
IIS-0112991, National Science Foundation (ITR) Contract No. IIS-0209289, National Science Foundation-NIH (CRCNS) Contract No. EIA-0218693, National Science Foundation-NIH (CRCNS) Contract No. EIA-0218506, and National Institutes of Health (Conte) Contract No. 1 P20 MH66239-01A1. Additional support was provided by: Central Research Institute of Electric Power Industry, Center for eBusiness (MIT), Daimler-Chrysler AG, Compaq/Digital Equipment Corporation, Eastman Kodak Company, Honda R&D Co., Ltd., ITRI, Komatsu Ltd., Eugene McDermott Foundation, Merrill-Lynch, Mitsubishi Corporation, NEC Fund, Nippon Telegraph & Telephone, Oxygen, Siemens Corporate Research, Inc., Sony MOU, Sumitomo Metal Industries, Toyota Motor Corporation, and WatchVision Co., Ltd. Davide Zoccolan is supported by the International Human Frontier Science Program Organization.

References:
[1] K. Tanaka. Inferotemporal cortex and object vision. Annu. Rev. Neurosci. 19, 109-139 (1996).
[2] N. K. Logothetis, J. Pauls, T. Poggio. Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 5, 552-563 (1995).
[3] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019-1025 (1999).
[4] T. Sato. Interactions of visual stimuli in the receptive fields of inferior temporal neurons in awake macaques. Exp. Brain Res. 77, 23-30 (1989).
[5] E. Rolls and M. Tovee. The responses of single neurons in the temporal visual cortical areas of the macaque when more than one stimulus is present in the receptive field. Exp. Brain Res. 103, 409-420 (1995).
[6] M. Riesenhuber and T. Poggio. Are cortical models really bound by the "binding problem"? Neuron 24, 87-93, 111-125 (1999).
