Binding Mechanism in Visual Perception
Zhiyi Zhou
Department of Biomedical Engineering, Vanderbilt University
Our visual system provides us with an effective perceptual pathway. It processes the huge
amount of visual information that we receive every second. In a complex visual scene,
different objects are normally mixed together, and some parts of an object may even be
occluded by other objects, which makes visual perception difficult. On the other hand,
only a small fraction of the visual components that exist in the visual field are actually
useful for our cognition and behavior. The other components can either reduce or enhance
perception. Binding is an internal mechanism of the visual perception process that helps
to segregate specific visual cues and integrate them to create perceptual objects.
Gestalt Principles of Perception
Each visual object has a specific combination of different features, such as color,
brightness, contrast, orientation, spatial location, moving direction, etc. The visual system
differentiates objects in the visual field from one another by detecting their distinct
features, and also binds correlated objects together based on the relations among them. The
classic Gestalt principles, first proposed in the early 20th century, define several
primary laws that construct the basic framework of object perception (Tovée 1996).
1. Proximity and Similarity: Objects that are located close to each other or elements
that look similar tend to be grouped as a coherent unit (figure 1).
Figure 1: The blue squares and pink
circles tend to be perceived as two
groups based on color and shape.
2. Closure: The visual system tends to follow and close contours.
Figure 2: Even though two circles
in the picture lack intact contours
because the circles overlap one
another, we still recognize three
circles.
3. Common fate: Objects that are moving in the same direction tend to be seen as a
unit.
Figure 3: The arrows in the figure that
move toward the upper right tend to be
perceived as one group, and the arrows
moving toward the upper left are
united as another group.
4. Familiarity: Elements that look familiar or meaningful tend to be grouped
together.
Figure 4: Most people can see
some cats in the abstract picture, even
though they do not look exactly like real
cats. We recognize the distorted objects
based on our perceptual experience.
Neural Representation in Visual Perception
When visual information is transmitted through the visual pathway, each object
activates a population of neurons (Ghose and Maunsell, 1999). Every neuron in this
population is activated by certain object features, such as color, brightness, orientation,
motion, spatial location, and so on. For example, some retinal photoreceptors are selective
for long-wavelength light while others are selective for short-wavelength light. Neurons
in the P-B visual pathway are selective for color and brightness, whereas neurons in the
M pathway are selective for the direction of object motion. It is not hard to imagine
that each specific feature of an object is represented by one neuron or a group of neurons
selective for their preferred stimuli. However, if all the neurons that respond to a
visual object were activated independently, the number of neurons involved in object
perception would increase exponentially, because the visual system would have to recruit
a huge number of neurons to cover all possible combinatorial representations (Ghose and
Maunsell, 1999). Since the visual objects perceived by the visual system usually have many
distinct features, independent neural representation would reduce the efficiency of the
visual system, or the visual system might simply never have enough neurons to represent
complicated visual scenes.
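The combinatorial growth described above can be made concrete with a short calculation.
The following is a minimal sketch and is not taken from Ghose and Maunsell: it assumes a
hypothetical scheme with one dedicated neuron per feature combination and compares it with
one unit per individual feature value. The number of features and of values per feature
are illustrative assumptions.

```python
from math import prod

def dedicated_units(values_per_feature):
    """Units needed if every feature combination gets its own neuron."""
    return prod(values_per_feature)

def distributed_units(values_per_feature):
    """Units needed if each feature value is coded once and combined on demand."""
    return sum(values_per_feature)

# Hypothetical example: 5 feature dimensions (color, brightness, orientation,
# motion direction, location), each resolved into 8 distinguishable values.
features = [8, 8, 8, 8, 8]
print(dedicated_units(features))    # grows as 8**5
print(distributed_units(features))  # grows as 5 * 8
```

Adding one more feature dimension multiplies the first count by 8 but only adds 8 to the
second, which is the scaling problem the paragraph points to.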
On the other hand, neurons in the higher-level visual cortex tend to have larger
receptive fields because they receive convergent inputs from lower-level neurons. Neurons
at the higher levels are therefore more selective and respond to more complex stimuli,
so they represent more complicated features of the perceived objects (Singer and
Gray, 1995). This raises the possibility that the higher-order visual cortex contains a
small group of highly specialized neurons that are selective for very
complex stimuli (Ghose and Maunsell, 1999). But this mechanism implies that every
very complicated feature or every distinct object needs at least one highly specialized
neuron sitting at the top of the visual system. This kind of architecture would limit the
ability of the visual system to recognize new objects because of its inflexibility (Singer
and Gray, 1995).
Temporal Correlation Hypothesis
The visual system processes information that represents infinite combinations
of object features, which implies that the visual system must also have an infinite ability
to represent those feature combinations. The neurons in the higher levels of the visual
system have bigger receptive fields, and they are more sensitive to complex features than
to elementary components. Research has shown that convergence of information is an
important feature of the visual system. For example, the response induced by complex
pattern motion in the middle temporal visual area (MT) was not found in lower-level V1
(Ghose and Maunsell, 1999), and perceptual binocularity does not exist in the LGN but can
be found in area V1 and higher structures. However, convergence of distributed local
features onto higher-order cortical areas is not the only phenomenon in the binding
process (Singer and Gray, 1995).
Singer and Gray proposed the “temporal correlation hypothesis” (Singer and Gray, 1995),
which predicts that instead of converging information onto a single cell at the next stage,
neurons activate different groups of complex neurons across different cortical areas.
The architecture of the visual system is therefore a many-to-many neural
network rather than a many-to-one converging pyramid. Once activated by their preferred
stimuli, neuron populations across different cortical areas show synchronous activity
with a certain pattern, such that each object feature is coded by distinct phases and
frequencies of modulation (Singer and Gray, 1995; Ghose and Maunsell, 1999). After the
elementary features of the objects have been processed, the neural populations activated
by different features of the same object are grouped together by their
synchronous activity, and these populations are segregated from the neural
populations that respond to other objects present in the visual field at the same time.
Thus visual information processing for a perceptual object is not focused on a single
cortical location; instead, it is implemented by a labeled assembly of neural populations
whose members are distributed across several or many cortical locations, depending on the
complexity of the object’s characteristics. In this visual perception structure, the
relations of neurons within the same population and the relations among different
populations are dynamically correlated, i.e., the same neuron can modify its relationship
with other neurons at any time to participate in representing different perceptual
features. This kind of dynamic association not only extends the representational range of
a relatively small neural population but also makes the visual system more adaptive to new
object features.
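The idea that populations are labeled as one assembly by synchronous activity can be
illustrated with a toy simulation. This is a minimal sketch under strong assumptions and
not a model from Singer and Gray: each population is reduced to a 40 Hz sinusoid, features
of the same object share a phase, and the function names, the population names, and the
0.9 correlation threshold are all hypothetical.

```python
import math

def oscillation(freq_hz, phase, duration_s=0.5, rate_hz=1000):
    """Sampled sinusoid standing in for the rhythmic activity of one population."""
    n = int(duration_s * rate_hz)
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz + phase) for t in range(n)]

def correlation(x, y):
    """Zero-lag Pearson correlation between two equally long signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def group_by_synchrony(signals, threshold=0.9):
    """Put populations whose activity is strongly correlated into one assembly."""
    groups = []
    for name, signal in signals.items():
        for group in groups:
            # Compare against the first member of each existing assembly.
            if correlation(signal, signals[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Two hypothetical objects: a red vertical bar and a green horizontal bar.
signals = {
    "red": oscillation(40, 0.0),
    "vertical": oscillation(40, 0.0),
    "green": oscillation(40, math.pi / 2),
    "horizontal": oscillation(40, math.pi / 2),
}
groups = group_by_synchrony(signals)
print(groups)
```

No unit in this sketch is dedicated to "red vertical bar"; the conjunction exists only as
a shared phase, so the same four populations could be regrouped for a different scene.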
Binding with Synchrony
When photoreceptors in the retina are stimulated by photons, these cells modulate
their release of chemical transmitter based on light frequency and intensity, which
changes the electrical activity of the downstream bipolar cells, ganglion
cells, and more complex visual neurons in the visual system (McIlwain 1996). How do
neural signals enable the visual system to distinguish a specific object from its
background? Synchronous neural activity has been found extensively among different
functional areas of the brain, and it is also an important mechanism in visual perception.
Gray et al. (1989) recorded neural signals in cat primary visual cortex (V1) using moving
light bars with different orientations and moving directions as stimuli. Oscillatory
responses in the frequency range of 40-60 Hz were observed across separated recording
sites. When two neurons were activated individually by two light bars moving in
opposite directions, the activities of the two neurons were not correlated. If
the two light bars moved in the same direction, the two neurons showed weak
synchronous activity, and this synchrony increased significantly when the neurons were
activated simultaneously by a single long light bar. Eckhorn et al. (1988) also found that
distinct object features, such as position, orientation, and motion, cause specific
stimulus-evoked (SE) resonances in the frequency range of 35-85 Hz within and outside the
visual cortex. The “temporal correlation hypothesis” (Singer and Gray, 1995)
predicts that (1) local and global features are coded individually by coherent neural
activity within the same or different cortical columns, (2) linking of features across
different categories and spatial locations is implemented by synchronous neural firing in
different cortical areas, and (3) coherent activity in motor and sensory areas contributes
to sensorimotor integration. Therefore, different neurons or neural populations are labeled
as correlated entities through synchronous neural activity, which changes spontaneous
neural firing into a meaningful pattern.
Local and Global Coherence
The receptive fields of visual neurons in the lower levels of the visual
system are smaller than those of more central neurons. This functional difference
means that the early stages of visual perception are primarily focused on local
characteristics of perceptual objects (Alais et al., 1998). These local features are
processed in the primary visual cortex (A17) within the same vertical column (Singer and
Gray, 1995; Eckhorn et al., 1988), and coherent elements are bound to generate more
complex patterns. Many neurophysiological experiments have shown that perceptual
grouping is implemented by synchronous neural firing. Alais et al. (1998) demonstrated
that distributed local features moving coherently can be grouped to create a global unit,
which follows the “common fate” law of Gestalt psychology. In their experiments,
the contrast of spatially distributed gratings was adjusted such that the gratings could
be perceived either as independent drifting elements or as a bound unit whose motion
direction corresponded to the vector sum of the individual local drifting components.
The results show that the incidence of coherence increased with temporally correlated
contrast modulations and decreased with uncorrelated modulations. In the neural synchrony
experiment (Castelo-Branco et al., 2000), when coherent perception of a new unitary
pattern (pattern motion) occurred, the original motion of the two superimposed gratings
could not be perceived even though both were drifting in the neurons’ preferred
directions. Instead, the synchronous neural activity in area V2 and PMLS bound the two
motion features and created the observed new moving direction of the unitary pattern
(figure 5).
Figure 5: (a, b) The original lines move in different neurons’ preferred directions; (c) the
unitary pattern moves in a new direction intermediate to the neurons’ preferences.
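The vector-sum rule mentioned above for a bound group of drifting components can be
written out in a few lines. This is a minimal sketch of plain vector addition only; the
function name and the example speeds and directions are hypothetical and are not stimulus
parameters from Alais et al. or Castelo-Branco et al.

```python
import math

def vector_sum_motion(components):
    """Combine (speed, direction_in_degrees) pairs into one motion vector.

    Returns the speed and direction (degrees, 0-360) of the vector sum.
    """
    vx = sum(speed * math.cos(math.radians(deg)) for speed, deg in components)
    vy = sum(speed * math.sin(math.radians(deg)) for speed, deg in components)
    speed = math.hypot(vx, vy)
    direction = math.degrees(math.atan2(vy, vx)) % 360
    return speed, direction

# Two components of equal speed drifting up-right (45 deg) and up-left (135 deg).
speed, direction = vector_sum_motion([(1.0, 45.0), (1.0, 135.0)])
print(round(speed, 3), round(direction, 1))
```

With equal speeds the summed direction lies midway between the two component directions,
which matches the intermediate direction described for the unitary pattern.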
During visual perception, elements with more salient features are normally
easier to detect. Some studies have provided evidence that neurons in area V1
respond more strongly to a contextual figure than to the background. Supèr et al.
(2001) studied neural activity in monkeys during a figure-ground experiment. The stimulus
used in the experiment consisted of a texture background made of oriented lines, in
which a small patch of orthogonally oriented lines served as the figure. The small patch
in the texture ground was made more or less salient by changing the length of the lines in
both figure and ground. Two monkeys were trained to respond once they detected the
figure, and neural activity in area V1 was recorded during the experimental tasks.
The results showed that there were two sensory processing modes in the monkey
primary visual cortex. One (mode 1) responded to the figure and the other (mode 2)
responded to the texture background. In the conditions in which the monkeys detected the
patch figure on the ground, there was a significant amplitude difference (contextual
modulation) between these two modes, but in the “not seen” condition, this
modulation was absent. When the figure patch had intermediate salience, the modulation
was also lower than for stimuli with higher salience. Therefore, the visual system
implements figure-ground segregation by constructing different processing procedures
with activity modulation. To further examine how neurons in the primary visual cortex
segregate figures from ground, Rossi et al. (2001) compared cell
responses to figures of varying sizes. A round texture figure with controlled line
orientation was presented against a background of orthogonal lines. Neural responses
in area V1 were measured while the size of the figure was changed. The results
showed that when the border of the presented figure was close to the border of the
neurons’ receptive fields, those neurons exhibited enhanced responses. As the
circular boundary between the figure and ground grew beyond the size of the
neurons’ receptive fields, this enhancement was suppressed.
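The amplitude difference between figure and ground responses described above for Supèr
et al. can be expressed as a simple subtraction. The sketch below only illustrates that
arithmetic: the function name is hypothetical, and the firing rates are invented for
illustration and are not data from any of the cited studies.

```python
def contextual_modulation(figure_rates, ground_rates):
    """Mean response to the figure minus mean response to the ground."""
    mean_figure = sum(figure_rates) / len(figure_rates)
    mean_ground = sum(ground_rates) / len(ground_rates)
    return mean_figure - mean_ground

# Invented firing rates (spikes/s) in four successive time bins.
ground = [30.0, 33.0, 32.0, 31.0]
figure_seen = [30.0, 42.0, 45.0, 44.0]      # figure detected by the animal
figure_not_seen = [30.0, 32.0, 31.0, 33.0]  # figure missed

modulation_seen = contextual_modulation(figure_seen, ground)
modulation_not_seen = contextual_modulation(figure_not_seen, ground)
print(modulation_seen, modulation_not_seen)
```

In this toy example the modulation is positive when the figure is detected and zero when
it is missed, mirroring the qualitative pattern reported in the text.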
Visual Coherence in Motion Perception
The “common fate” law of Gestalt theory states that the visual system tends to link
objects or components that are moving in the same direction into a unitary group. For
example, if many small uniform dots move randomly within the visual
field, the visual system is not able to detect any meaningful pattern. However, if some
dots suddenly cease their random motion and move together in the same direction,
we see the dots with common motion form a moving pattern that pops out
from its noisy background. What is the underlying mechanism that allows visual perception
to perceive motion and segregate moving figures from their background?
Castelo-Branco et al. (2000) studied neural activity correlated with different
moving patterns. In their experiment, two gratings moving in orthogonal directions
were presented to cats as visual stimuli. When the contrast of the two gratings was
adjusted to make them look like a single pattern moving in one direction (pattern motion),
strong synchronized neural activity was found in area V2 and PMLS (postero-medial bank of
the lateral suprasylvian sulcus). However, if the contrast of the two gratings was adjusted
such that one grating appeared transparent and moving on top of the other (component
motion), then the two gratings were not bound as one pattern, and no synchrony was
found in the above visual cortical areas. Another study (Adelson and Movshon 1982)
also suggested that contrast changes affect the coherent perception of two superimposed
gratings moving in two different directions.
In the above studies, the contrast of each grating is coded by a distinct population
of neurons. When the contrast was adjusted to induce synchronous activity
between the two populations of neurons, the two gratings presented in the task were
perceived as one moving pattern even though they were moving in different directions.
However, when synchrony failed to occur, the two gratings were perceived as two
independently moving components.
Motion perception is essentially the discrimination of local contrast changes caused
by the spatial position shift of moving objects (Lappin, 2002), and this process is mainly
implemented by the ganglion cells and cortical neurons in the M pathway. Motion
coherence theory (Yuille, 1988) suggests that the computation of motion in the visual
system has two stages: the velocity field of the perceptual image is first
estimated in the measurement stage, and then constructed over the entire visual field. Sekuler
(2001) studied the role of luminance change in solving binding by common fate.
In this experiment, the luminance of both the figure and the ground in the stimuli
was modulated sinusoidally around the mean value. The modulations were controlled to
have the same frequency but a different relative phase in each task, and
human subjects were asked to report the figure orientation they perceived. Results
showed that performance depended on both the modulation frequency and the relative
phase difference between figure and ground. Thus Sekuler (2001) suggested that
a common change in the direction of luminance modulation might help the visual system
segregate the moving object from the background. However, local contrast change is not
the only cue used by the visual system in motion detection. Lappin et al. (2002)
examined detection sensitivity to contrast changes caused by moving objects and by
stationary oscillation. Asymmetrical contrast changes and shifts of the local contrast
dipole make object motion more detectable than stationary oscillation, even though the
contrast changes have the same strength in the two conditions. This result shows that
changes in local contrast and in spatial position both act as information cues in motion
perception.
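The sinusoidal luminance modulation with a figure-ground phase difference described above
for Sekuler (2001) can be sketched numerically. This is an illustration under assumptions
and not the stimulus code or analysis of that study: the mean, amplitude, frequency, and
the "common change fraction" measure are all hypothetical choices made here to show how
relative phase controls how often figure and ground brighten or dim together.

```python
import math

def luminance(t, freq_hz, phase, mean=0.5, amplitude=0.2):
    """Sinusoidal luminance modulation around a mean value."""
    return mean + amplitude * math.sin(2 * math.pi * freq_hz * t + phase)

def common_change_fraction(freq_hz, phase_difference, steps=1000):
    """Fraction of one modulation period in which figure and ground luminance
    change in the same direction (both brightening or both dimming)."""
    period = 1.0 / freq_hz
    same = 0
    for k in range(steps):
        t0 = k * period / steps
        t1 = (k + 1) * period / steps
        d_figure = luminance(t1, freq_hz, 0.0) - luminance(t0, freq_hz, 0.0)
        d_ground = (luminance(t1, freq_hz, phase_difference)
                    - luminance(t0, freq_hz, phase_difference))
        if d_figure * d_ground > 0:
            same += 1
    return same / steps

in_phase = common_change_fraction(5.0, 0.0)
quarter_cycle = common_change_fraction(5.0, math.pi / 2)
anti_phase = common_change_fraction(5.0, math.pi)
print(in_phase, quarter_cycle, anti_phase)
```

The fraction falls from 1 when figure and ground are modulated in phase to 0 when they are
in anti-phase, so relative phase alone determines how much "common fate" in luminance the
two regions share.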
Figure-ground Mechanism in Visual Perception
Since most LGN neurons send their projections to the primary visual cortex (V1 or
striate cortex), area V1 is considered the first critical visual information processing
station along the visual pathway. Area V1 is believed to be mainly
focused on processing local object features, such as orientation, color, contrast, etc.
(Tovée 1996). However, experiments have provided evidence that area V1 is
also the first stage of binding local features in visual perception.
Supèr et al. (2001) observed that area V1 has two different processing modes
corresponding to figure and ground information processing, and that there exists
significant contextual modulation between these two modes if the figure is more salient
against its background. Another experiment was conducted by Lamme (1995). In this
experiment, the background of the stimuli contained either randomly moving dots or
oriented line segments, and the figure was a square patch in the background
with dots moving in a certain direction or lines distributed in a certain orientation.
Awake monkeys were trained to identify the figure patch against the background by
making a saccadic eye movement toward the location of the figure. Neural signal
recordings showed that most V1 neurons recorded in the experiment responded more strongly
to the figure than to similar features in the background, which is in
accordance with the results of Supèr’s experiment (2001). However, the responses of
the monkeys’ eye movements, which had a 30-40 msec delay after the onset of the neural
response, were enhanced when the figure patch covered the receptive field and reduced
when the figure patch and receptive field did not overlap. This result
suggests that surround inhibition is not the only regulation mechanism in the
primary visual cortex. The lateral interactions among visual neurons in area V1 occur
extensively across and also beyond the receptive field, which produces an asymmetry in
perceiving features of figure and ground (Lamme, 1995). Supèr et al. (2003) further
showed that strong contextual modulation of neural activity in area V1 leads to fast
saccades and weak modulation leads to slow responses.
Area V2 (A18) receives axon projections from area V1, including both the M and P
pathways, and visual information is sent to different higher-level cortical areas through
the distributed processing in area V2. Research evidence suggests that area V2 also
participates in local feature grouping and figure-ground segregation. Woelbern et al.
(2002) recorded neural signals in area V2 of an awake monkey that was trained to
discriminate a figure (parallel distributed uniform blobs) from its background (randomly
distributed blobs). They found that feature binding and figure-ground segregation are
correlated with neural synchrony at γ (35-80 Hz) frequencies. No perception-related
differences were found in either the low frequency ranges or the amplitude measures of
multiple unit activity and local field potential, which suggests that transient phase
locking might support figure-ground segregation without modulating spike rates.
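The distinction drawn above between phase locking and spike rate can be illustrated with
vector strength, a standard measure of how tightly spikes cluster at one phase of a
rhythm. This is a sketch under assumptions and not the analysis used by Woelbern et al.:
the 50 Hz rhythm and both spike trains are hypothetical and were constructed so that the
two trains contain the same number of spikes within one second.

```python
import cmath
import math

def vector_strength(spike_times, freq_hz):
    """Length of the mean phase vector of spikes relative to an oscillation.

    1.0 means every spike falls at the same phase; 0.0 means no phase preference.
    """
    phases = [2 * math.pi * freq_hz * t for t in spike_times]
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)

GAMMA_HZ = 50.0  # assumed rhythm inside the 35-80 Hz band mentioned in the text

# Locked train: one spike at the same phase of a gamma cycle, every fifth cycle skipped.
locked = [k / GAMMA_HZ for k in range(50) if k % 5 != 4]
# Unlocked train: regular 40 Hz firing that drifts across the gamma phase.
unlocked = [k * 0.025 for k in range(40)]

print(len(locked), len(unlocked))            # same spike count in one second
print(vector_strength(locked, GAMMA_HZ))     # close to 1
print(vector_strength(unlocked, GAMMA_HZ))   # close to 0
```

Both trains have the same mean rate, so a rate measure cannot tell them apart, while the
phase-locking measure separates them clearly.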
Feedback and Interaction Influence in Visual Perception
The visual system is a layered perceptual circuit. Neurons in each layer receive
input from lower-level neurons and project their output to the next layer. Visual
information is processed and interpreted at each step such that the neurons located toward
the top of the system tend to be responsible for more complex perceptual features. On the
other hand, horizontal and top-down interactions also exist extensively within and
between the LGN, primary visual cortex, and extrastriate cortical areas. Visual perception
is made more accurate and efficient through these modulations. Since the perception of
important visual features is normally interfered with by irrelevant neighboring objects,
the visual system must be able to inhibit the reactions induced by relatively
unimportant components.
Hupé et al. (1998) reported that cortical feedback improves discrimination during
figure-ground segregation. They found that when area V5, which has extensive feedback
impact on area V3, was inactivated by cooling, the neural response of area V3 to
low-salience stimuli significantly increased, which was not found in the response to
high-salience stimuli.
Neurophysiological research shows that attention can significantly increase the responses
of area V4 to attended visual stimuli. Reynolds and Desimone (2003) recorded the neural
responses of area V4 in awake monkeys presented with a pair of
stimuli in the receptive field. The “reference stimulus” had the preferred orientation and
spatial frequency of the neuron and was held at a fixed contrast, and the “probe
stimulus”, with varied contrast, was chosen to have a nonpreferred orientation and spatial
frequency. Though the nonpreferred stimuli generally could not elicit strong responses
compared with the preferred stimulus, they could suppress the response to the
preferred stimulus, and this inhibitory effect increased as the contrast of the
nonpreferred stimulus increased. Thus Reynolds and Desimone (2003) predicted that
as visual information is transmitted along the visual pathway, the signals that represent
salient object components such as the figure are magnified while the responses to
nonsalient features such as components of the ground are reduced. Through this
modulation, neural responses in higher-order cortical areas mainly reflect the attended
components, and the signals induced by unattended components are filtered out at the
lower-level visual cortex, such that the perceptual effect of important information is
boosted.
Summary
The visual system is a powerful and efficient perceptual system. It has a layered
functional hierarchy in which information is transmitted along a bottom-up pathway that
includes the neural retina, the LGN, the primary visual cortex, and higher-level cortical
areas. When the visual system processes information that reflects the structure and
characteristics of a complex visual scene, it is able to segregate the important
features from the background and link those components to create perceptual objects
in our brain. Existing research evidence suggests that synchronous neural activity across
the visual pathway is possibly the main mechanism employed by the visual system in
binding object features. The synchrony not only exists in local cortical columns but
also occurs between different cortical areas or even across the two brain hemispheres,
such that local and global visual features can be defined accurately and effectively. The
visual system is an interactive system. Visual information mainly converges from
lower-order structures to higher-level cortical areas, but top-down feedback and
horizontal influences modulate the transmission such that important components are
preserved and irrelevant information is filtered out.
References:
1. Adelson E, Movshon JA, Phenomenal coherence of moving visual patterns, Nature 300,
523-525 (1982)
2. Alais D, Blake R, and Lee SH, Visual features that vary together over time group
together over space, Nature Neuroscience 1, 160-164 (1998)
3. Castelo-Branco M, Goebel R, Neural synchrony correlates with surface segregation rules,
Nature 405, 685-689 (2000)
4. Eckhorn R, Bauer R, and Jordan W, Coherent oscillations: A mechanism of feature
linking in the visual cortex? Biological Cybernetics 60, 121-130 (1988).
5. Ghose GM, Maunsell J, Specialized representations in visual cortex: A role for binding?
Neuron 24, 79-85 (1999)
6. Gray C, König P et al, Oscillatory responses in cat visual cortex exhibit inter-columnar
synchronization which reflects global stimulus properties, Nature 338, 334-338 (1989)
7. Gray C, The temporal correlation hypothesis of visual feature integration: Still alive and
well, Neuron 24, 31-47 (1999)
8. Hupé JM, James AC et al, Cortical feedback improves discrimination between figure and
background by V1, V2 and V3 neurons, Nature 394, 784-787 (1998)
9. Lappin JS, Tadin D, and Whittier EJ, Visual coherence of moving and stationary image
changes, Vision Research 42, 1523-1534 (2002)
10. Lamme V, The neurophysiology of figure-ground segregation in primary visual cortex, J
Neurosci 15, 1605-1615 (1995)
11. McIlwain J, An introduction to the biology of vision, pp.75-99, Cambridge University
Press.
12. Reynolds JH, Desimone R, Interacting Roles of Attention and visual salience in V4,
Neuron 37, 853-863 (2003)
13. Rossi A, Contextual modulation in primary visual cortex of macaques, J Neurosci 21,
1698-1709 (2001).
14. Sekuler A, Generalized common fate: Grouping by common luminance changes,
Psychological Science 12, 437-444 (2001)
15. Singer W, Gray C, Visual feature integration and the temporal correlation hypothesis,
Annu. Rev. Neurosci. 18, 555-586 (1995)
16. Supèr H, Spekreijse H, Two distinct modes of sensory processing observed in monkey
primary visual cortex (V1), Nature Neuroscience 4, 304-310 (2001)
17. Supèr H, Spekreijse H, Lamme V, Neuroscience Letters 344, 75-78 (2003)
18. Tovée M, An introduction to the visual system, pp.112-131, Cambridge University Press.
19. Tovée M, An introduction to the visual system, pp.59-76, Cambridge University Press.
20. Yuille A, A computational theory for the perception of coherent visual motion, Nature
333, 71-74 (1988).
21. Woelbern T, Eckhorn R et al, Perceptual grouping correlates with short synchronization in
monkey prestriate cortex, NeuroReport 13, 1881-1886 (2002)