Binding Mechanism in Visual Perception

Zhiyi Zhou
Department of Biomedical Engineering, Vanderbilt University

Our visual system provides us with an effective perceptual pathway. It processes the huge amount of visual information that we receive every second. In a complex visual scene, different objects are normally mixed together, and parts of an object may even be occluded by other objects, which makes visual perception difficult. Moreover, only a small fraction of the visual components in the visual field are actually useful for our cognition and behavior; the other components can either degrade or enhance perception. Binding is an internal mechanism of the visual perception process that helps to segregate specific visual cues and integrate them into perceptual objects.

Gestalt Principles of Perception

Each visual object has a specific combination of features, such as color, brightness, contrast, orientation, spatial location, and direction of motion. The visual system distinguishes the objects in the visual field from one another by detecting their distinct features, and it also binds related elements together based on the relations among them. The classic Gestalt principles, first proposed in the early 20th century, define several primary laws that form the basic framework of object perception (Tovée 1996).

1. Proximity and similarity: Objects that are located close to each other, or elements that look similar, tend to be grouped as a coherent unit (Figure 1).
Figure 1: The blue squares and pink circles tend to be perceived as two groups based on their color and shape.

2. Closure: The visual system tends to follow and close contours.
Figure 2: Even though the circles in the picture do not have intact contours because they overlap one another, we still recognize three circles.

3. Common fate: Objects that are moving in the same direction tend to be seen as a unit.
Figure 3: The arrows in the left figure that move toward the upper right tend to be perceived as one group, and the arrows moving toward the upper left are grouped as another.

4. Familiarity: Elements that look familiar or meaningful tend to be grouped together.
Figure 4: Most people can see several cats in the abstract picture on the left, even though they do not look exactly like real cats. We recognize the distorted objects based on our perceptual experience.

Neural Representation in Visual Perception

When visual information is transmitted through the visual pathway, each object activates a population of neurons (Ghose and Maunsell, 1999). Each neuron in this population is activated by certain object features, such as color, brightness, orientation, motion, or spatial location. For example, some retinal photoreceptors are selective for long-wavelength light, while others are selective for short-wavelength light. Neurons in the P-B visual pathway are selective for color and brightness, whereas neurons in the M pathway are selective for the direction of object motion. It is therefore natural to suppose that each specific feature of an object is represented by one neuron, or a group of neurons, selective for that preferred stimulus. However, if every neuron that responds to a visual object were activated independently, the number of neurons involved in object perception would increase exponentially, because the visual system would have to recruit a huge number of neurons to cover all possible combinatorial representations (Ghose and Maunsell, 1999). Since the objects perceived by the visual system usually have many distinct features, such an independent representation would reduce the efficiency of the visual system, and the visual system might never have enough neurons to represent complicated visual scenes.
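The exponential growth mentioned above can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each object is described by k features and that each feature takes one of n discrete values. Giving every combination its own dedicated unit then needs n^k units, whereas coding each feature value once needs only n·k. The function names and the value n = 10 are hypothetical and are not taken from Ghose and Maunsell (1999).

```python
# Hypothetical counting model: every object is described by n_features
# features, and each feature takes one of values_per_feature discrete values.


def conjunction_units(values_per_feature: int, n_features: int) -> int:
    """Units needed if every combination of feature values has its own dedicated neuron."""
    return values_per_feature ** n_features


def feature_units(values_per_feature: int, n_features: int) -> int:
    """Units needed if each feature value is coded once and shared across objects."""
    return values_per_feature * n_features


# Assumed for illustration: 10 distinguishable values per feature.
for k in range(1, 7):
    print(f"{k} features: {feature_units(10, k):>3} feature units, "
          f"{conjunction_units(10, k):>9,} conjunction units")
```

With six features the dedicated-unit scheme already needs a million units, against sixty for separate feature coding. Separate coding is cheap, but when several objects are present it no longer tells which features belong to the same object. That ambiguity is the binding problem addressed in the following sections.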
On the other hand, neurons in higher-level visual cortex tend to have larger receptive fields because they receive convergent inputs from lower-level neurons. Higher-level neurons are therefore more selective, respond to more complex stimuli, and represent more complicated features of perceived objects (Singer and Gray, 1995). This raises the possibility that higher-order visual cortex contains a small group of highly specialized neurons selective for very complex stimuli (Ghose and Maunsell, 1999). But this mechanism implies that every very complicated feature, or every distinct object, needs at least one highly specialized neuron at the top of the visual system. Such an architecture would limit the ability of the visual system to recognize new objects because of its inflexibility (Singer and Gray, 1995).

Temporal Correlation Hypothesis

The visual system processes information that represents a virtually unlimited number of combinations of object features, which implies that it must also be able to represent a virtually unlimited number of feature combinations. Neurons at higher levels of the visual system have larger receptive fields, and they are more sensitive to complex features than to elementary components. Research has shown that convergence of information is an important property of the visual system: for example, the response to complex pattern motion found in the middle temporal visual area (MT) is not found in the lower-level area V1 (Ghose and Maunsell, 1999), and binocularity does not exist in the LGN but can be found in area V1 and higher structures. However, convergence of distributed local features onto higher-order cortical areas is not the only phenomenon in the binding process (Singer and Gray, 1995).
Singer and Gray proposed the “temporal correlation hypothesis” (Singer and Gray, 1995), which predicts that, instead of converging information onto a single cell at the next stage, neurons activate different groups of complex neurons across different cortical areas. The architecture of the visual system is therefore a many-to-many neural network rather than a many-to-one converging pyramid. Once activated by their preferred stimuli, neuron populations across different cortical areas show synchronous activity with a certain pattern, such that each object feature is coded by distinct phases and frequencies of modulation (Singer and Gray, 1995; Ghose and Maunsell, 1999). After the elementary features of the objects have been processed, the neural populations activated by different features of the same object are grouped together by their synchronous activity, and these populations are segregated from the populations that respond to other objects present in the visual field at the same time. Thus, visual information processing for a perceptual object is not focused on a single cortical location; instead, it is carried out by a labeled assembly of neurons whose members are distributed across several or many cortical locations, depending on the complexity of the object's characteristics. In this structure, the relations among neurons within the same population, and among different populations, are dynamic: the same neuron can change its relationship with other neurons at any time to participate in representing different perceptual features. This kind of dynamic association not only extends the range of representation with a relatively small neural population but also makes the visual system more adaptive to new object features.
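The idea of assemblies labeled by synchrony can be illustrated with a toy sketch. It is not a model from Singer and Gray (1995). Each unit's firing is summarized by a single phase relative to a shared oscillation, and units whose phases agree within an arbitrary tolerance are treated as one assembly. The unit names, phase values, tolerance, and the function group_by_phase are all invented for illustration.

```python
from __future__ import annotations

import math


def circular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two phases, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def group_by_phase(phases: dict[str, float], tol: float = 0.3) -> list[list[str]]:
    """Greedily group units whose firing phases agree within `tol` radians.

    Each assembly is anchored on the phase of its first member. A unit joins
    the first assembly whose anchor phase is close enough; otherwise it starts
    a new assembly.
    """
    assemblies: list[tuple[float, list[str]]] = []
    for unit, phase in phases.items():
        for anchor, members in assemblies:
            if circular_distance(phase, anchor) <= tol:
                members.append(unit)
                break
        else:
            assemblies.append((phase, [unit]))
    return [members for _, members in assemblies]


# Invented example: units in different areas responding to object A fire near
# phase 0; units responding to object B fire near phase pi.
example_phases = {
    "V1_orientation_A": 0.10,
    "V4_color_A": 0.15,
    "MT_motion_A": 0.05,
    "V1_orientation_B": 3.00,
    "V4_color_B": 3.10,
}

for i, assembly in enumerate(group_by_phase(example_phases), start=1):
    print(f"assembly {i}: {assembly}")
```

The point of the sketch is that no single "object A" neuron exists. Membership is carried only by shared timing, so the same unit could join a different assembly if its phase changed.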
Binding with Synchrony

When photoreceptors in the retina are stimulated by photons, they modulate their release of chemical transmitter according to the frequency and intensity of the light. This changes the electrical activity of the downstream bipolar cells, ganglion cells, and more complex neurons of the visual system (McIlwain 1996). How do these neural signals enable the visual system to distinguish a specific object from its background? Synchronous neural activity has been found extensively among different functional areas of the brain, and it is also an important mechanism in visual perception. Gray et al. (1989) recorded neural signals in cat primary visual cortex (V1), using moving light bars of different orientations and directions of motion as stimuli. Oscillatory responses in the 40-60 Hz range were observed across separate recording sites. When two neurons were activated individually by two light bars moving in opposite directions, their activities were not correlated. When the two light bars moved in the same direction, the two neurons showed weak synchronous activity, and this synchrony increased significantly when the neurons were activated simultaneously by a single long light bar. Eckhorn et al. (1988) also found that distinct object features, such as position, orientation, and motion, caused specific stimulus-evoked (SE) resonances in the 35-85 Hz range within and outside the visual cortex. The “temporal correlation hypothesis” (Singer and Gray, 1995) predicts that (1) local and global features are coded individually by coherent neural activity within the same or different cortical columns, (2) features across different categories and spatial locations are linked by synchronous neural firing in different cortical areas, and (3) coherent activity in motor and sensory areas contributes to sensorimotor integration.
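Synchrony between two recorded neurons, as in Gray et al. (1989), is commonly quantified with a cross-correlogram, which counts how often the two cells fire together at each time lag. The sketch below is a simplified illustration rather than the authors' analysis. It simulates binary spike trains in 1 ms bins, where a "synchronous" pair shares some spike events and an "independent" pair shares none. The rates, durations, and function names are assumptions. Synchrony is modeled as shared spikes with no oscillatory structure, so only the central peak appears, not the periodic side peaks of an oscillatory correlogram.

```python
import numpy as np


def cross_correlogram(a: np.ndarray, b: np.ndarray, max_lag: int):
    """Count coincident spikes in binary trains a and b at each lag (in bins).

    A positive lag means b fires after a.
    """
    lags = np.arange(-max_lag, max_lag + 1)
    counts = np.empty(len(lags), dtype=int)
    for i, lag in enumerate(lags):
        if lag >= 0:
            counts[i] = np.sum(a[: len(a) - lag] * b[lag:])
        else:
            counts[i] = np.sum(a[-lag:] * b[: len(b) + lag])
    return lags, counts


def simulate_pair(rng, n_bins: int, shared_rate: float, private_rate: float):
    """Two binary spike trains; a fraction `shared_rate` of bins carries a spike in both."""
    common = rng.random(n_bins) < shared_rate
    a = common | (rng.random(n_bins) < private_rate)
    b = common | (rng.random(n_bins) < private_rate)
    return a.astype(int), b.astype(int)


rng = np.random.default_rng(0)
# Hypothetical 20 s recordings in 1 ms bins, roughly 30 spikes/s per cell.
sync_a, sync_b = simulate_pair(rng, 20_000, shared_rate=0.02, private_rate=0.01)
ind_a, ind_b = simulate_pair(rng, 20_000, shared_rate=0.0, private_rate=0.03)

lags, sync_counts = cross_correlogram(sync_a, sync_b, max_lag=10)
_, ind_counts = cross_correlogram(ind_a, ind_b, max_lag=10)
r_sync = np.corrcoef(sync_a, sync_b)[0, 1]
r_ind = np.corrcoef(ind_a, ind_b)[0, 1]

zero = int(np.where(lags == 0)[0][0])
print(f"zero-lag coincidences: synchronous={sync_counts[zero]}, independent={ind_counts[zero]}")
print(f"correlation: synchronous={r_sync:.2f}, independent={r_ind:.2f}")
```

Both pairs fire at similar rates, so a rate code alone would not tell them apart. Only the timing relation, visible as a peak at zero lag, distinguishes the pair that is "labeled" as belonging together.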
Therefore, different neurons or neural populations are labeled as correlated entities through synchronous neural activity, which turns spontaneous neural firing into a meaningful pattern.

Local and Global Coherence

The receptive fields of visual neurons at lower levels of the visual system are smaller than those of more central neurons, so the early stages of visual perception focus primarily on the local characteristics of perceptual objects (Alais et al., 1998). These local features are processed in the primary visual cortex (A17) within the same vertical column (Singer and Gray, 1995; Eckhorn et al., 1988), and coherent elements are bound together to generate more complex patterns. Many neurophysiological experiments have provided evidence that perceptual grouping is implemented by synchronous neural firing. Alais et al. (1998) demonstrated that distributed local features that vary coherently can be grouped into a global unit, following the “common fate” law of Gestalt psychology. In their experiments, the contrast of spatially distributed gratings was modulated so that the gratings could be perceived either as independent drifting elements or as a bound unit whose direction of motion corresponded to the vector sum of the motions of the individual local drifting components. The results showed that the incidence of coherence increased with temporally correlated contrast modulations and decreased with uncorrelated modulations. In a neural synchrony experiment (Castelo-Branco et al., 2000), when a new unitary pattern was perceived coherently (pattern motion), the original motions of the two superimposed gratings could not be perceived, even though each was drifting in the neuron's preferred direction. Instead, synchronous neural activity in area V2 and PMLS bound the two motion features and created the perceived new direction of the unitary pattern (Figure 5).

Figure 5: (a, b) The original lines move in different directions, each preferred by a neuron; (c) the unitary pattern moves in a new direction intermediate between the neurons' preferences.

During visual perception, elements with more salient features are normally easier to detect. Several studies have provided evidence that neurons in area V1 respond more strongly to a texture-defined figure than to its background. Supèr et al. (2001) studied neural activity in monkeys during a figure-ground task. The stimulus consisted of a texture background of oriented lines containing a small patch of orthogonally oriented lines as the figure. The patch was made more or less salient by changing the length of the lines in both the figure and the ground. Two monkeys were trained to respond when they detected the figure, and neural activity in area V1 was recorded during the task. The results showed two sensory processing modes in the monkeys' primary visual cortex: one (mode 1) responded to the figure and the other (mode 2) to the texture background. When the monkeys detected the figure, there was a significant amplitude difference (contextual modulation) between the two modes, but in the “not seen” condition this modulation was absent. When the figure had intermediate salience, the modulation was also lower than for the high-salience stimulus. The visual system therefore appears to implement figure-ground segregation through distinct processing modes that differ in their activity modulation. To further examine how neurons in primary visual cortex segregate figures from ground, Rossi et al. (2001) compared cell responses to figures of varying size. A round texture figure with a controlled line orientation was presented against a background of orthogonally oriented lines.
While the size of the figure was varied, neural responses in area V1 were measured. The results showed that when the border of the figure was close to the border of the neurons' receptive fields, those neurons exhibited enhanced responses. As the circular boundary between figure and ground grew beyond the size of the receptive fields, this enhancement was suppressed.

Visual Coherence in Motion Perception

The “common fate” law of Gestalt theory states that the visual system tends to link components that move in the same direction into a unitary group. For example, if many small uniform dots move randomly within the visual field, the visual system cannot detect any meaningful pattern. However, if some of the dots suddenly stop their random motion and move together in the same direction, the dots sharing this common motion form a moving pattern that pops out from its noisy background. What is the underlying mechanism that lets the visual system perceive motion and segregate moving figures from their background?

Castelo-Branco et al. (2000) studied neural activity correlated with different moving patterns. In their experiment, two gratings moving in orthogonal directions were presented to cats as visual stimuli. When the contrast of the two gratings was adjusted so that they looked like a single pattern moving in one direction (pattern motion), strong synchronized neural activity was found in area V2 and PMLS (the posteromedial bank of the lateral suprasylvian sulcus). However, when the contrast was adjusted so that one grating appeared transparent and moved on top of the other (component motion), the two gratings were not bound into one pattern, and no synchrony was found in these cortical areas.
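The "intermediate" direction of a coherent plaid can be computed in more than one way. The sketch below is an illustration, not the analysis used by Castelo-Branco et al. (2000) or Alais et al. (1998). It compares the vector sum of the two component velocities, the rule mentioned earlier for Alais et al., with the intersection-of-constraints (IOC) rule associated with Adelson and Movshon (1982). Each grating's motion is taken to be its drift perpendicular to its stripes. Directions are in degrees, speeds are in arbitrary units, and the function names are hypothetical.

```python
import numpy as np


def unit_vector(direction_deg: float) -> np.ndarray:
    """Unit vector pointing in the given direction (degrees, counterclockwise from rightward)."""
    r = np.deg2rad(direction_deg)
    return np.array([np.cos(r), np.sin(r)])


def ioc_velocity(dir1_deg, speed1, dir2_deg, speed2) -> np.ndarray:
    """Intersection of constraints: the single 2-D velocity whose component along
    each grating's drift direction equals that grating's drift speed.
    Undefined (singular system) if the gratings drift along the same axis."""
    normals = np.vstack([unit_vector(dir1_deg), unit_vector(dir2_deg)])
    return np.linalg.solve(normals, np.array([speed1, speed2], dtype=float))


def vector_sum_velocity(dir1_deg, speed1, dir2_deg, speed2) -> np.ndarray:
    """Vector sum of the two component (drift) velocities."""
    return speed1 * unit_vector(dir1_deg) + speed2 * unit_vector(dir2_deg)


def direction_deg(v: np.ndarray) -> float:
    """Direction of a 2-D velocity in degrees, in [0, 360)."""
    return float(np.degrees(np.arctan2(v[1], v[0])) % 360.0)


for case in [(0.0, 1.0, 90.0, 1.0), (0.0, 1.0, 60.0, 2.0)]:
    ioc, vsum = ioc_velocity(*case), vector_sum_velocity(*case)
    print(f"gratings {case}: IOC {direction_deg(ioc):.1f} deg, "
          f"vector sum {direction_deg(vsum):.1f} deg")
```

For orthogonal gratings of equal speed, as assumed here for the Castelo-Branco et al. design, the two rules agree and both point midway between the component directions. They diverge when the component speeds differ, as in the second (0°/60°) case, which is one reason plaid experiments can distinguish between them.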
Another study (Adelson and Movshon 1982) also suggested that contrast changes affect the coherent perception of two superimposed gratings moving in different directions. In the studies above, each grating is coded by a distinct population of neurons. When the contrast was adjusted so as to induce synchronous activity between the two populations, the two gratings were perceived as one moving pattern even though they were moving in different directions. When synchrony failed to occur, the two gratings were perceived as two independently moving components.

Motion perception is essentially the discrimination of local contrast changes caused by the spatial displacement of moving objects (Lappin et al., 2002), and this process is mainly carried out by ganglion cells and cortical neurons in the M pathway. Motion coherence theory (Yuille, 1988) proposes that the visual system computes motion in two stages: the velocity field of the image is first estimated locally in a measurement stage, and a coherent velocity field is then constructed over the entire visual field.

Sekuler (2001) studied the role of luminance change in solving binding by common fate. In the experiment, the luminance of both the figure and the ground was modulated sinusoidally around a mean value; the two modulations had the same frequency but a different relative phase in each task, and human subjects were asked to report the perceived orientation of the figure. The results showed that performance depended on both the modulation frequency and the relative phase of figure and ground. Sekuler (2001) therefore suggested that a common direction of luminance change might help the visual system segregate a moving object from its background. However, local contrast change is not the only cue the visual system uses in motion detection.
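The phase manipulation in Sekuler's (2001) experiment can be illustrated with a small calculation. It assumes sinusoidal modulation of equal frequency for figure and ground, as described above, and asks how much of each cycle the two regions spend changing in opposite directions, one brightening while the other darkens. For a phase difference φ between 0 and π this fraction is φ/π. The function name and sampling scheme are assumptions, and this is an illustration of the cue, not Sekuler's analysis. Because the result does not depend on frequency, it cannot account for the frequency dependence reported in the study.

```python
import numpy as np


def opposite_change_fraction(phase_diff_rad: float, n_samples: int = 100_000) -> float:
    """Fraction of one modulation cycle during which figure and ground luminance
    change in opposite directions (one brightening while the other darkens).

    Luminance is assumed to be L0 + A*sin(w*t + phase) with A > 0 and w > 0;
    the mean L0, depth A and frequency w do not change the sign of the slope,
    so only the phase difference matters.
    """
    t = np.linspace(0.0, 2 * np.pi, n_samples, endpoint=False)
    figure_slope = np.cos(t)                   # d/dt of sin(t)
    ground_slope = np.cos(t + phase_diff_rad)  # d/dt of sin(t + phase_diff)
    return float(np.mean(np.sign(figure_slope) != np.sign(ground_slope)))


for deg in (0, 45, 90, 135, 180):
    frac = opposite_change_fraction(np.deg2rad(deg))
    print(f"phase difference {deg:3d} deg: opposite-direction change {frac:.3f} of the cycle")
```

In-phase modulation leaves figure and ground changing together at every moment, so the direction of change offers no segregation cue. Counterphase modulation makes them change in opposite directions throughout the cycle.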
Lappin et al. (2002) examined the sensitivity for detecting contrast changes caused by a moving object and by stationary oscillation. The asymmetrical contrast change and the shifting local contrast dipole made object motion more detectable than stationary oscillation, even though the contrast changes had the same strength in the two conditions. This result shows that changes in local contrast and changes in spatial position both act as cues in motion perception.

Figure-ground Mechanism in Visual Perception

Since most LGN neurons project to the primary visual cortex (V1, or striate cortex), area V1 is considered the first critical information processing station along the visual pathway. Area V1 is believed to focus mainly on processing local object features, such as orientation, color, and contrast (Tovée 1996). However, experimental evidence suggests that area V1 is also the first stage of binding local features in visual perception. Supèr et al. (2001) observed that area V1 has two distinct processing modes, corresponding to figure and ground processing, and that there is significant contextual modulation between these two modes when the figure is salient against its background. Another experiment was conducted by Lamme (1995). The background of the stimuli contained either randomly moving dots or oriented line segments, and the figure was a square patch within the background whose dots moved in a particular direction or whose lines had a particular orientation. Awake monkeys were trained to identify the figure patch by making a saccadic eye movement toward its location. Neural recordings showed that most of the recorded V1 neurons responded more strongly to the figure than to similar features in the background, in accordance with the results of Supèr et al. (2001).
This enhancement appeared 30-40 ms after the onset of the neural response: the response was enhanced when the figure patch covered the receptive field and reduced when the figure patch and the receptive field did not overlap. The result suggests that surround inhibition is not the only regulatory mechanism in primary visual cortex. Lateral interactions among V1 neurons occur extensively within and beyond the receptive field, producing an asymmetry in the processing of figure and ground features (Lamme, 1995). Supèr et al. (2003) further showed that strong contextual modulation of neural activity in area V1 leads to fast saccades, whereas weak modulation leads to slow responses.

Area V2 (A18) receives axonal projections from area V1, including both the M and P pathways, and distributes visual information to different higher-level cortical areas. Research evidence suggests that area V2 also participates in local feature grouping and figure-ground segregation. Woelbern et al. (2002) recorded neural signals in area V2 of an awake monkey trained to discriminate a figure (uniform blobs arranged in parallel) from its background (randomly distributed blobs). They found that feature binding and figure-ground segregation were correlated with neural synchrony in the γ band (35-80 Hz). No perception-related differences were found either in the low-frequency ranges or in the amplitudes of multiple-unit activity and local field potentials, which suggests that transient phase locking might support figure-ground segregation without modulating spike rates.

Feedback and Interaction Influence in Visual Perception

The visual system is a layered perceptual circuit. Neurons in each layer receive input from lower-level neurons and project their output to the next layer.
Visual information is processed and interpreted at each step, so that neurons toward the top of the system tend to be responsible for more complex perceptual features. At the same time, horizontal and top-down interactions exist extensively within and between the LGN, primary visual cortex, and extrastriate cortical areas, and these modulations make visual perception more accurate and efficient. Since the perception of important visual features is normally disturbed by irrelevant neighboring objects, the visual system must be able to suppress the responses induced by relatively unimportant components. Hupé et al. (1998) reported that cortical feedback improves discrimination during figure-ground segregation. They found that when area V5, which sends extensive feedback to area V3, was inactivated by cooling, the response of area V3 to low-salience stimuli increased significantly, whereas no such change was found in the response to high-salience stimuli.

Neurophysiological research shows that attention can significantly increase the responses of area V4 to attended visual stimuli. Reynolds and Desimone (2003) recorded neural responses in area V4 of awake monkeys presented with a pair of stimuli in the receptive field. The “reference stimulus” had the neuron's preferred orientation and spatial frequency and was held at a fixed contrast, while the “probe stimulus”, whose contrast was varied, was chosen to have a nonpreferred orientation and spatial frequency. Although the nonpreferred stimulus generally could not elicit strong responses on its own, it could suppress the response to the preferred stimulus, and this suppression grew as the contrast of the nonpreferred stimulus increased.
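The suppression by a weak, nonpreferred probe can be reproduced qualitatively by a divisive-normalization (weighted-average) response rule. The sketch below assumes such a rule. Its parameter names and values (ref_weight, probe_weight, sigma) and the attention gain factors are hypothetical, and it is not presented as the model fitted by Reynolds and Desimone (2003). The response is a contrast-weighted average of the drives of the two stimuli. Raising the probe contrast therefore pulls the response toward the weaker probe drive, while boosting the effective contrast of the attended stimulus pulls it back.

```python
def pair_response(ref_contrast: float, probe_contrast: float,
                  ref_weight: float = 1.0, probe_weight: float = 0.2,
                  sigma: float = 0.1, ref_gain: float = 1.0,
                  probe_gain: float = 1.0) -> float:
    """Hypothetical divisive-normalization (weighted-average) response of one V4 cell.

    ref_weight / probe_weight: how strongly each stimulus drives the cell on its
    own (preferred reference > nonpreferred probe).
    ref_gain / probe_gain: hypothetical multiplicative attention factors applied
    to each stimulus's contrast.
    sigma: semi-saturation constant that keeps the denominator positive.
    """
    ref = ref_gain * ref_contrast
    probe = probe_gain * probe_contrast
    return (ref * ref_weight + probe * probe_weight) / (ref + probe + sigma)


# Reference held at a fixed contrast while the probe contrast is varied.
probe_contrasts = [0.0, 0.1, 0.2, 0.4, 0.8]
responses = [pair_response(0.5, c) for c in probe_contrasts]
for c, r in zip(probe_contrasts, responses):
    print(f"probe contrast {c:.1f}: response {r:.3f}")

# Attending to the reference (ref_gain > 1) restores part of the suppressed response;
# attending to the probe (probe_gain > 1) deepens the suppression.
print(f"probe 0.4, attend reference: {pair_response(0.5, 0.4, ref_gain=2.0):.3f}")
print(f"probe 0.4, attend probe:     {pair_response(0.5, 0.4, probe_gain=2.0):.3f}")
```

In this rule the probe never adds much excitation, but it always adds to the normalizing denominator. That asymmetry is what lets a stimulus that barely drives the cell on its own reduce the response to the preferred stimulus.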
Thus Reynolds and Desimone (2003) proposed that, as visual information is transmitted along the visual pathway, the signals representing salient object components such as the figure are amplified, while the responses to less salient features such as components of the ground are reduced. Through this modulation, neural responses in higher-order cortical areas mainly reflect the attended components, and the signals induced by unattended components are filtered out in lower-level visual cortex, so that the perceptual effect of important information is boosted.

Summary

The visual system is a powerful and efficient perceptual system. It has a layered functional hierarchy in which information is transmitted along a bottom-up pathway that includes the neural retina, the LGN, primary visual cortex, and higher-level cortical areas. When the visual system processes information reflecting the structure and characteristics of a complex visual scene, it can segregate the important features from the background and link those components to create perceptual objects in our brain. Existing research evidence suggests that synchronous neural activity across the visual pathway may be the main mechanism the visual system uses to bind object features. This synchrony exists not only within local cortical columns but also between different cortical areas and even across the two cerebral hemispheres, so that local and global visual features can be defined accurately and effectively. The visual system is also an interactive system. Visual information mainly converges from lower-order structures to higher-level cortical areas, but top-down feedback and horizontal influences modulate the transmission so that important components are preserved and irrelevant information is filtered out.

References:
1. Adelson E, Movshon JA, Phenomenal coherence of moving visual patterns, Nature 300, 523-525 (1982)
2. Alais D, Blake R, Lee SH, Visual features that vary together over time group together over space, Nature Neuroscience 1, 160-164 (1998)
3. Castelo-Branco M, Goebel R, et al., Neural synchrony correlates with surface segregation rules, Nature 405, 685-689 (2000)
4. Eckhorn R, Bauer R, Jordan W, et al., Coherent oscillations: A mechanism of feature linking in the visual cortex? Biological Cybernetics 60, 121-130 (1988)
5. Ghose GM, Maunsell J, Specialized representations in visual cortex: A role for binding? Neuron 24, 79-85 (1999)
6. Gray C, König P, et al., Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties, Nature 338, 334-338 (1989)
7. Gray C, The temporal correlation hypothesis of visual feature integration: Still alive and well, Neuron 24, 31-47 (1999)
8. Hupé JM, James AC, et al., Cortical feedback improves discrimination between figure and background by V1, V2 and V3 neurons, Nature 394, 784-787 (1998)
9. Lappin JS, Tadin D, Whittier EJ, Visual coherence of moving and stationary image changes, Vision Research 42, 1523-1534 (2002)
10. Lamme V, The neurophysiology of figure-ground segregation in primary visual cortex, Journal of Neuroscience 15, 1605-1615 (1995)
11. McIlwain J, An Introduction to the Biology of Vision, pp. 75-99, Cambridge University Press (1996)
12. Reynolds JH, Desimone R, Interacting roles of attention and visual salience in V4, Neuron 37, 853-863 (2003)
13. Rossi A, et al., Contextual modulation in primary visual cortex of macaques, Journal of Neuroscience 21, 1698-1709 (2001)
14. Sekuler A, Generalized common fate: Grouping by common luminance changes, Psychological Science 12, 437-444 (2001)
15. Singer W, Gray C, Visual feature integration and the temporal correlation hypothesis, Annual Review of Neuroscience 18, 555-586 (1995)
16. Supèr H, Spekreijse H, Two distinct modes of sensory processing observed in monkey primary visual cortex (V1), Nature Neuroscience 4, 304-310 (2001)
17. Supèr H, Spekreijse H, Lamme V, Neuroscience Letters 344, 75-78 (2003)
18. Tovée M, An Introduction to the Visual System, pp. 112-131, Cambridge University Press (1996)
19. Tovée M, An Introduction to the Visual System, pp. 59-76, Cambridge University Press (1996)
20. Yuille A, A computational theory for the perception of coherent visual motion, Nature 333, 71-74 (1988)
21. Woelbern T, Eckhorn R, et al., Perceptual grouping correlates with short synchronization in monkey prestriate cortex, NeuroReport 13, 1881-1886 (2002)
