A self-organizing model of disparity
maps in the primary visual cortex
TIKESH RAMTOHUL
(S0566072)
MASTER OF SCIENCE
SCHOOL OF INFORMATICS
UNIVERSITY OF EDINBURGH
2006
ABSTRACT
Current models of primary visual cortex (V1) development show how visual features
such as orientation and eye preference can emerge from spontaneous and visually
evoked neural activity, but it is not yet known whether spatially organized maps for binocular disparity are present in V1, and if so how they develop. This
report documents a computational approach based on the LISSOM model that was
adopted to study the potential self-organizing aspect of binocular disparity. It is among
the first studies making use of computational modelling to investigate the topographical
organization of cortical neurons based on disparity preferences.
The simulation results show that neurons develop phase disparities as a result of the
self-organizing process, but that there is no apparent orderly grouping based on
disparity preferences. However, there seems to be a strong correlation between disparity selectivity and orientation preference: neurons exhibiting relatively large phase disparities tend to prefer vertical orientations. This suggests that cortical regions
grouped by orientation preferences might be subdivided into compartments that are in
turn organised based on disparity selectivity.
ACKNOWLEDGEMENTS
I would like to thank Jim Bednar for his help and support throughout this endeavour.
Thank you for your insightful comments and your patience. My sincere thanks also goes
to Chris Ball who has contributed a lot to my understanding of the Python language and
the Topographica simulator. A big thank you also to all the friends I’ve made during
my stay in Edinburgh. The camaraderie has been most soothing, especially during
stressful situations. And finally, thank you Mom and Dad for always being there for
your son.
DECLARATION
I declare that this thesis was composed by myself, that the work contained herein is my
own except where explicitly stated otherwise in the text, and that this work has not been
submitted for any other degree or professional qualification except as specified.
(Tikesh RAMTOHUL)
TABLE OF CONTENTS
CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 TASK DECOMPOSITION
1.3 THESIS OUTLINE
CHAPTER 2 BACKGROUND
2.1 BASICS OF VISUAL SYSTEM
2.1.1 EYE
2.1.2 VISUAL PATHWAY
2.1.3 RETINA
2.1.4 LGN
2.1.5 PRIMARY VISUAL CORTEX
2.2 DISPARITY
2.2.1 GEOMETRY OF BINOCULAR VIEWING
2.2.2 ENCODING OF BINOCULAR DISPARITY
2.2.3 DISPARITY SENSITIVITY OF CORTICAL CELLS
2.2.4 CHRONOLOGICAL REVIEW
2.2.5 Ohzawa-DeAngelis-Freeman (ODF) ENERGY MODEL (1990)
2.2.6 ENERGY MODEL: READ et al (2002)
2.3 TOPOGRAPHY
2.3.1 TOPOGRAPHY AND DISPARITY
2.3.2 TS’O, WANG ROE and GILBERT (2001)
2.4 SELF-ORGANIZATION
2.5 COMPUTATIONAL MODELS
2.5.1 KOHONEN SOM
2.6 LISSOM
2.6.1 LISSOM ARCHITECTURE
2.6.2 SELF-ORGANIZATION IN LISSOM
2.7 TOPOGRAPHICA
2.7.1 MAP MEASUREMENT IN TOPOGRAPHICA
2.8 MODEL OF DISPARITY SELF-ORGANIZATION
2.8.1 WIEMER ET AL (2000)
CHAPTER 3 METHODOLOGY
3.1 SELF-ORGANIZATION OF DISPARITY SELECTIVITY
3.1.2 TWO-EYE MODEL FOR DISPARITY SELECTIVITY
3.1.3 DISPARITY MAP MEASUREMENT
3.1.4 TYPE OF INPUT
3.2 PHASE INVARIANCE
3.2.1 TEST CASES FOR ODF MODEL
CHAPTER 4 RESULTS
4.1 GAUSSIAN
4.1.1 DETERMINATION OF DISPARITY THRESHOLD
4.1.2 RECEPTIVE FIELD PROPERTIES AND DISPARITY MAPS
4.2 PLUS/MINUS
4.2.1 DETERMINATION OF DISPARITY THRESHOLD
4.2.2 RECEPTIVE FIELD PROPERTIES AND DISPARITY MAPS
4.3 NATURAL
4.4 PHASE INVARIANCE
CHAPTER 5 DISCUSSION
5.1 RECEPTIVE FIELD STRUCTURE
5.2 DISPARITY AND ORIENTATION PREFERENCE
5.3 ORIENTATION PREFERENCE AND PHASE DISPARITY
5.4 VALIDATION AGAINST BIOLOGICAL DATA
5.4.1 DISTRIBUTION OF PHASE DISPARITIES
5.4.2 DISPARITY AND ORIENTATION PREFERENCE
5.5 SUMMARY
CHAPTER 6 CONCLUSION AND FUTURE WORK
6.1 CONCLUSION
6.2 FUTURE WORK
6.2.1 PHASE INVARIANCE
6.2.2 DISPARITY SELECTIVITY AND OCULAR DOMINANCE
6.2.3 VERTICAL DISPARITY
6.2.4 PRENATAL AND POSTNATAL SELF-ORGANIZATION
BIBLIOGRAPHY
CHAPTER 1
INTRODUCTION
1.1 MOTIVATION
A remarkable function of the visual system is the perception of the world in three dimensions, although the image cast on the retinas is only two-dimensional. It is believed that the third dimension, depth, is inferred from visual cues in the retinal images, the most prominent one being binocular disparity (Qian, 1997). Humans and other animals possess two eyes whose fields of view overlap. Objects within the
overlapping region project slightly different images on the two retinas. This is referred
to as disparity and it is believed to be one of the cues that the brain uses for assessing
depth.
Although recent studies of binocular disparity at the physiological level have brought
much insight to the understanding of the role of stereopsis in depth perception, much
experimental work remains to be done to eventually yield a unified account of how the
actual mechanism operates. Trying to unlock the mysteries of the brain by relying solely
on biological data is practically impossible. This is where computational models of the
brain come into the picture. They can provide concrete, testable explanations for
mechanisms of long-term development that are difficult to observe directly, making it
possible to link a large number of observations into a coherent theory. It is essential that
the models are based on real physiological data in order to understand brain functions
(Qian, 1997).
One such computational model is LISSOM (Laterally Interconnected Synergetically
Self-Organizing Map), a self-organizing map model of the visual cortex. It models the
structure, development, and function of the visual cortex at the level of maps and their
connections. Current models of primary visual cortex (V1) development show how
visual features such as orientation and eye preference can emerge from spontaneous and
visually-evoked activity, i.e. there are groups of neurons across the surface of the cortex
which respond selectively to different types of orientation and ocular dominance. But it
is not known if spatially organized maps for disparity are present in V1, and if so, how
they develop. The main aim of the project was to investigate the potential self-organizing feature of disparity selectivity.
1.2 TASK DECOMPOSITION
The project was broken down into two main stages. The first one dealt with the
investigation of the input-driven self-organization process when disparate retinal images
are used as input. This consisted of developing an appropriate LISSOM model for
disparity and the implementation of suitable map measuring techniques to illustrate
differences in disparity selectivity. The second stage was concerned with the
investigation of possible ways to integrate or simulate complex cells in LISSOM so that
the self-organizing process becomes phase-independent and hence better suited for
disparity selectivity. Current versions of LISSOM develop phase maps when
anticorrelated inputs are used, e.g. when positive and negative Gaussian inputs are
fed onto the retina. This is quite a predictable outcome since LISSOM develops
behaviour reminiscent of simple cells, and these cells are sensitive to the phase of the
input stimulus. But real animals do not have such maps in V1. One reason might be that most cells in real animals are complex, and hence insensitive to phase. Thus, integration of complex cells in the LISSOM structure could lead to the formation of self-organized maps that are independent of phase.
1.3 THESIS OUTLINE
The thesis has been divided into 6 chapters, starting with this introductory chapter. The
remaining chapters are organised as follows:
• Chapter 2 gives an overview of the visual system, describes the key points about research on binocular disparity, talks about the self-organizing process and presents related work on computational modelling of disparity.
• Chapter 3 describes the methodology used for investigating the self-organizing process of disparity preferences in LISSOM.
• Chapter 4 presents the simulation results, giving a brief account of the direct observations.
• Chapter 5 provides a discussion based on the simulations, validating the computational results with biological findings where necessary.
• Chapter 6 highlights the major outcomes from the study on disparity, and provides a guide to possible extensions to this research work.
CHAPTER 2
BACKGROUND
This chapter presents background material required for a good understanding of the
neural processes involved in disparity encoding. Moreover, the self-organizing process
is discussed thoroughly, together with the importance of computational modelling in
neuroscience. The main components of LISSOM are highlighted to provide a clear
image of how topographic maps form using this model. The chapter also includes
related work on disparity models.
2.1 BASICS OF VISUAL SYSTEM
The human visual system is a biological masterpiece that has been shaped and refined by millions of years of evolution. Its efficiency is unmatched by any piece of apparatus ever invented. It interprets the information carried by visible light to build
a three-dimensional representation of the world. This section briefly describes the
prominent constituents of the visual system and outlines the neural mechanisms
involved during visual processing.
2.1.1 EYE
Light entering the eye is refracted as it passes through the cornea. It then passes through
the pupil, whose size is regulated by the dilation and constriction of the iris muscles, in
order to control the amount of light entering the eye. The lens is responsible for
focusing light onto the retina by proper adjustment of its shape.
Figure 2.1: Anatomy of the human eye (reprinted from [26])
2.1.2 VISUAL PATHWAY
The main structures involved in early visual processing are the retina, the lateral
geniculate nucleus of the thalamus (LGN), and the primary visual cortex (V1). The
initial step of the processing is carried out in the retina of the eye. Output from the
retina of each eye is fed to the LGN, at the base of each side of the brain. The processed
signals from the LGN are then sent to V1. V1 outputs are then fed to higher cortical
areas where further processing takes place. The diagram below gives a schematic
overview of the visual pathway.
Figure 2.2: Visual Pathway (reprinted from [40])
2.1.3 RETINA
The retina is part of the central nervous system. It is located on the inside of the rear
surface of the eye. It consists of an array of photoreceptors and other related cells to
convert the incident light into neural signals. The retinal output takes the form of action
potentials in retinal ganglion cells whose axons collect in a bundle to constitute the
optic nerve.
The light receptors are of two kinds, rods and cones, being responsible for vision in dim
light and bright light respectively. Rods are more numerous than cones but are
conspicuously absent at the centre of the retina. This region is known as the fovea and
represents the centre of fixation. It contains a high concentration of cones, thereby
making it well-suited for fine-detailed vision.
As mentioned earlier, the output of the retina is represented by the activity of the retinal
ganglion cells. An interesting feature of these cells, which is shared by other neurons
higher up in the visual pathway, is their selective responsiveness to stimuli on specific
spots on the retina. The term ‘receptive field’ is used to explain this phenomenon.
Stephen Kuffler was the first to record the responses of retinal ganglion cells to spots of
light in a cat in 1953 (Hubel, 1995). He observed that he could influence the firing rate of a retinal ganglion cell by focusing a spot of light on a specific region of the retina. This
region was the receptive field (RF) of the cell. Levine and Shefner (1991) define a
receptive field as an “area in which stimulation leads to a response of a particular
sensory neuron”. For a retinal ganglion cell or any other neuron concerned with vision,
the receptive field is that part of the visual world that influences the firing of that
particular cell; in other words, it is that region of the retina which receives stimulation,
consequently altering the firing rate of the cell being studied.
Most retinal ganglion cells have concentric (or centre-surround) RFs. The latter are of
two types, ON-centre cells and OFF-centre cells. These RFs are divided into 2 parts
(centre/surround), one of which is excitatory ("ON"), the other inhibitory ("OFF"). For
an ON-centre cell, a spot of light incident on the inside (centre) of the receptive field
will increase the discharge rate, while light falling on the outside ring (surround) will
suppress firing. The opposite effect is observed for OFF-centre cells. Other cells may
have receptive fields of different shapes. For example, the RFs of most simple cells in
V1 can have a two-lobe arrangement, favouring a 45-degree edge with dark in the upper left and light in the lower right, or a three-lobe pattern, favouring a 135-degree white line against a dark background (Miikkulainen et al, 2005).
Figure 2.3: Receptive Fields (reprinted from [40])
2.1.4 LGN
The LGN receives neural signals from the retina, and sends projections directly to the
primary visual cortex, thereby acting as a relay. Its role in the central nervous system is
not very clear, but it consists of neurons that are very similar to the retinal ganglion
cells. These neurons are arranged retinotopically. Retinotopy or topographic
representation implies that as we move along the retina from one point to another, the
corresponding points in the LGN trace a continuous path (Hubel, 1995). The ON-centre
cells in the retina connect to the ON cells in the LGN and the OFF cells in the retina
connect to the OFF cells in the LGN. Both groups of cells share a common
functionality, namely that of performing some sort of edge detection on the input
signals.
2.1.5 PRIMARY VISUAL CORTEX
ARCHITECTURE
The primary visual cortex, situated at the rear of the brain, is the first cortical site of
visual processing. Just like the LGN, V1 neurons also exhibit retinotopy, but they have
altogether different characteristics and functionalities as compared to their geniculate
counterparts. To begin with, most V1 neurons are binocular, displaying a strong
response to stimuli from either eye. They also respond selectively to certain features
such as spatial frequency, orientation and direction of movement of the stimulus.
Interestingly, disparity has also been identified as one of the visual cues that cause
selective discharge in V1 cells. The architecture of V1 is such that at a given location, a
vertical section through the cortical sheet consists of cells that have more or less similar
feature preferences. In this columnar model, nearby columns tend to have somewhat
matching preferences while more distant columns show a greater degree of
dissimilarity. Moreover, preferences repeat at regular intervals in every direction, thus
giving rise to a smoothly varying map for each feature (Miikkulainen et al, 2005). For
example, as we move parallel to the surface of V1, there are alternating columns of
cells, known as ocular dominance columns, which are driven predominantly by inputs
to a single eye. Another type of feature map is the orientation map, which describes the
orientation preference of cells changing gradually from horizontal to vertical and back again as we move parallel to the cortical surface.
TYPE OF CELLS
Another important point about V1 which is relevant to this project concerns the type of
cells that can be found. Hubel and Wiesel (1962) subdivided the cortical cells into two main groups, simple and complex, based on their RFs. Simple cells often have a two-lobe or three-lobe RF (shown in figure 2.3). Consider a simple cell with a three-lobe
RF, with an ON region flanked by OFF regions. If a bar of light, with the correct
orientation, is incident on the middle region, the firing rate of the cell will increase, but
if the image is incident on the OFF regions, the firing rate will be suppressed. On the
other hand if a dark bar is incident on the ON region, suppression will take place,
whereas excitation will occur if it falls into the OFF regions of the RF. Thus, the
response of simple cells is dependent on the phase of the stimulus. In contrast, the
response of complex cells does not depend on the phase of the stimulus; spikes will be
elicited if the bar, whether dark or bright, is incident on any region within its receptive
field as long as it is properly oriented.
2.2 DISPARITY
We are capable of three-dimensional vision despite having only a 2-D projection of the
world on the retina. This remarkable ability might be just a mundane task for the visual
system but it has baffled many researchers for decades. No wonder then that much
effort has been put in by the scientific community to understand the processes taking
place in the brain during depth perception. It is now known that the sensation of depth is
based upon many visual cues (Qian, 1997), for example occlusion, relative size,
perspective, motion parallax, shading, blur, and relative motion (DeAngelis, 2000;
Gonzales and Perez, 1998). Such cues are monocular, but species having frontally
located eyes are additionally subjected to binocular cues, an example of which is
binocular disparity. It refers to differences between the retinal images of a given object
in space, and arises because the two eyes are laterally separated. Three-dimensional
vision based on binocular disparity is commonly referred to as stereoscopic vision.
Although monocular cues are sufficient to provide the sensation of depth, it is the
contribution of stereopsis that makes this process so effective in humans (Gonzales and
Perez, 1998).
2.2.1 GEOMETRY OF BINOCULAR VIEWING
Suppose that an observer fixes his gaze on the white sphere Q (refer to figure 2.4);
fixation by default causes images of the object to fall on the fovea. We say that Q_R and Q_L are corresponding points on the retinas. The black sphere S is closer to the observer,
and as can be deduced easily by geometry, its images fall at non-corresponding points
in the retinas. Similarly, a point further away from the point of fixation will give images
closer to each other compared to corresponding points. Any such lack of
correspondence is known as disparity.
Figure 2.4: Geometry of Stereopsis (adapted from [57])
The distance z from the fixation point, which basically represents the difference in
depth, can be deduced from the retinal disparity δ = r-l and the interocular distance I
(Read, 2005). Such a disparity which is directly related to the location of an object in
depth is known as horizontal disparity.
All points that are seen as the same distance away as the fixation point (Q in this case)
are said to lie on the horopter, a surface whose exact shape depends on our estimations
on distance, and hence on our brains (Hubel, 1995). Points in front and behind the
horopter induce negative and positive disparities respectively. The projection of the
horopter in the horizontal plane across the fovea is the Vieth-Muller circle, which
represents the locus of all points with zero disparity (Gonzales and Perez, 1998).
Figure 2.5: Horizontal disparity (reprinted from [14])
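To make this geometry concrete, the small-angle relation between depth and horizontal disparity can be sketched in a few lines of Python. This is an illustrative approximation only: the function name and the numerical values below are not taken from the thesis, and the sign convention is chosen to match the one described above (negative disparity for points in front of the fixation point).

```python
def angular_disparity(fixation_dist, object_dist, interocular_dist):
    """Small-angle estimate of the horizontal (angular) disparity, in radians,
    of a point at object_dist while the eyes fixate a point at fixation_dist
    (both measured along the midline).  With this sign convention, points in
    front of the fixation point give negative disparity and points behind it
    give positive disparity."""
    # The convergence angle at distance D is roughly I / D; disparity is the
    # difference between the convergence angles at fixation and at the object.
    return interocular_dist / fixation_dist - interocular_dist / object_dist

# Example: a point 5 cm in front of a fixation point 100 cm away, with a
# 6.5 cm interocular distance, gives about -0.0034 rad (roughly -12 arcmin).
print(angular_disparity(fixation_dist=100.0, object_dist=95.0,
                        interocular_dist=6.5))
```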
Another type of disparity, much less studied but generally accepted to play some role in
depth perception, is vertical disparity. When an object is located closer to one eye than
the other, its image is slightly larger on the retina of that eye. This gives rise to vertical
disparities. Bishop (1989) points out that such disparities occur when objects are viewed
at relatively near distances above or below the visual plane, but which do not lie on the
median plane (a vertical plane through the midline of the body that divides the body into
right and left halves). This can be best explained by an illustration, given in figure 2.6.
Suppose we have a point P which is above the visual plane and to the right of the
median plane, such that it is nearer to the right eye. Simple geometrical intuition shows
that the angles β1 and β2 subtended by P are different and that β2 > β1. The vertical
disparity v is given by the difference in the two vertical visual angles, such that v = β2 − β1 (Bishop, 1989).
Figure 2.6: Vertical disparity (reprinted from [6])
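For illustration, the same kind of back-of-the-envelope calculation applies to vertical disparity; the helper function and the distances below are hypothetical example values, not measurements taken from the text.

```python
import math

def vertical_disparity(height, dist_to_left_eye, dist_to_right_eye):
    """Vertical disparity v = beta2 - beta1: the difference between the
    vertical visual angles subtended at the two eyes by a point lying
    `height` above the visual plane but off the median plane, so that it
    is nearer to one eye than to the other."""
    beta_left = math.atan2(height, dist_to_left_eye)
    beta_right = math.atan2(height, dist_to_right_eye)
    return beta_right - beta_left

# A point 20 cm above the visual plane, 62 cm from the right eye and 66 cm
# from the left eye, subtends a slightly larger angle at the nearer (right)
# eye, so v is positive (about 0.018 rad here).
print(vertical_disparity(height=20.0, dist_to_left_eye=66.0,
                         dist_to_right_eye=62.0))
```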
2.2.2 ENCODING OF BINOCULAR DISPARITY
We have seen that disparity arises from the difference between the two retinal projections of an object, but how do cortical cells encode this information? Two models have been put forward, the position difference model and the phase difference model. Let
us assume a Gabor-shaped RF for a binocular simple cell in V1. In the position
difference model, the cell has the same RF profile on each eye but with an overall
position shift between the right and left RFs, i.e. the RF profiles have identical shape in
both eyes but are centred at non-corresponding points on the 2 retinas. In the phase
difference model, the RFs are centred at corresponding retinal points but have different
shapes or phases. Figure 2.7 illustrates the differences between position and phase
encoding.
(a) Position Difference Model
(b) Phase Difference Model
Figure 2.7: Disparity encoding (reprinted from [2])
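As a concrete, purely illustrative rendering of the two encoding schemes, the sketch below builds a 1-D Gabor receptive-field profile and produces a left/right RF pair either by shifting the envelope centre (position difference) or by shifting the carrier phase (phase difference). The parameter values are arbitrary choices, not values used in this project.

```python
import numpy as np

def gabor_rf(x, centre=0.0, sigma=0.5, freq=2.0, phase=0.0):
    """1-D Gabor receptive-field profile: a Gaussian envelope multiplied by
    a sinusoidal carrier."""
    return (np.exp(-(x - centre) ** 2 / (2 * sigma ** 2))
            * np.cos(2 * np.pi * freq * (x - centre) + phase))

x = np.linspace(-2.0, 2.0, 401)

# Position-difference encoding: identical RF profiles centred at
# non-corresponding retinal points (here the right RF is shifted by 0.2).
left_rf_position = gabor_rf(x, centre=0.0)
right_rf_position = gabor_rf(x, centre=0.2)

# Phase-difference encoding: RFs centred at corresponding points, but with
# the carrier shifted by 90 degrees in one eye.
left_rf_phase = gabor_rf(x, phase=0.0)
right_rf_phase = gabor_rf(x, phase=np.pi / 2)
```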
There is evidence that supports each of these two encoding schemes. Position disparity
was demonstrated first by Nikara et al (1968), and later by Joshua and Bishop (1970),
von der Heydt et al (1978) and Maske et al (1984). Evidence of the phase disparities has
also been shown by various studies (DeAngelis et al., 1991, 1995; DeValois and
DeValois, 1988; Fleet et al., 1996; Freeman and Ohzawa, 1990; Nomura et al., 1990;
Ohzawa et al., 1996; Qian, 1994; Qian and Zhu, 1997; Zhu and Qian, 1996).
2.2.3 DISPARITY SENSITIVITY OF CORTICAL CELLS
The two main groups of cortical cells, namely simple and complex, differ in the
complexity of their behaviour. Hubel and Wiesel (1962) proposed a hierarchical
organisation in which complex cells receive input from simple cells, which in turn
receive input from the LGN. In this model, the simple cells have the same orientation
preference and their RFs are arranged in an overlapping fashion over the entire RF of
the complex cell (Hubel, 1995). They suggested that the complex cells would only fire
if the connections between the simple cells and the complex cell are excitatory and the
input stimuli are incident on specific regions of the RF. This model is not unanimously accepted among neurophysiologists, since some studies have shown that some complex cells have
direct neural connections with the LGN (Qian, 1997). Research in this area has been
very active and there is a much better understanding of the properties of these cells
nowadays, especially in the role they might play in disparity encoding. It is generally
accepted that most simple and complex cells are binocular, i.e. they have receptive
fields in both retinas and show a strong response when either eye is stimulated.
Furthermore, researchers are unanimous over the disparity selectivity of these cells; they
respond differently to different disparity stimuli. These two properties are essential for
disparity computation.
It is therefore tempting to conclude that both types of cell are suitable disparity
detectors, but this is not the case since they have quite distinct characteristics. Receptive
fields of simple cells consist of excitatory (ON) and inhibitory (OFF) subregions that
respond to light and dark stimuli respectively. Complex cells, on the other hand,
respond to stimuli anywhere within their RFs for both bright and dark bars because of a
lack of separate excitatory and inhibitory subregions in the receptive fields (Skottun et
al, 1991). The following diagram illustrates typical RF types for complex and simple
cells. Complex cells generally have larger RFs and respond to targets even if the
contrast polarity is altered (for example, if we replace the black dots with the white dots
and vice-versa). Simple cells have discrete RF subregions and respond when the correct
input configuration is incident on their RFs.
Figure 2.8: 1-D RF for simple and complex cells (reprinted from [47])
Ohzawa and collaborators (1990) describe simple cells as “sensors for a multitude of
stimulus parameters” because besides responding to disparity, they also respond
selectively to stimulus position, contrast polarity, spatial frequency, and orientation. On
the contrary, disparity encoding in complex cells is independent of stimulus position
and contrast polarity; changes in irrelevant parameters would therefore not affect the
disparity encoding features of such cells (Ohzawa et al, 1990). The diagram below
shows how the binocular RF of a simple cell changes with lateral displacement of the
stimulus, whereas the complex cell has an elongated RF along the position axis, making
it a better disparity detector. Note that a binocular RF is generated by plotting the
response of the cell as a function of the position of the stimulus in each eye. The
stimulus used is typically a long bar at the preferred orientation of the cell.
Figure 2.9: Binocular RF for simple and complex cells (reprinted from [57])
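The measurement just described can be mimicked computationally by sweeping a bar over every combination of left-eye and right-eye positions and recording the model cell's response. The toy cell below is hypothetical and purely illustrative: a phase-insensitive unit whose response depends only on the left/right position difference produces exactly the elongated, diagonal binocular RF attributed to complex cells above.

```python
import numpy as np

def binocular_rf_map(cell_response, positions):
    """Binocular RF of a model cell: its response for every combination of
    left-eye and right-eye bar positions (rows index the left-eye position,
    columns the right-eye position)."""
    return np.array([[cell_response(l, r) for r in positions]
                     for l in positions])

def toy_complex_cell(left_pos, right_pos, preferred_disparity=0.0):
    """Disparity-tuned, position-insensitive toy cell: the response depends
    only on the left/right position difference."""
    return np.exp(-((left_pos - right_pos) - preferred_disparity) ** 2 / 0.1)

rf = binocular_rf_map(toy_complex_cell, np.linspace(-1.0, 1.0, 41))
```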
2.2.4 CHRONOLOGICAL REVIEW
The first major contribution to the study of neural mechanisms in binocular vision came
from Barlow et al in the 1960s. They found that neurons fire selectively to objects
placed at different stereoscopic depths in the cat striate cortex (Barlow et al, 1967). A
few years later, Poggio and Fischer (1977) confirmed these findings when they
investigated awake behaving macaque monkeys. The visual system of these animals
resembles that of humans (Cowey and Ellis, 1967; Farrer and Graham, 1967; Devalois
and Jacobs, 1968; Harweth et al., 1995), and several studies have led to the conclusion
that they have characteristics of stereopsis similar to humans (Gonzales and Perez,
1998). Using solid bars as stimuli, Poggio and Fischer identified 4 basic types of
neurons, namely tuned-excitatory, tuned-inhibitory, near and far. Tuned-excitatory
neurons discharge best to objects at or very near to the horopter, i.e. zero disparity;
tuned-inhibitory cells respond at all disparities except those near zero; near cells are more responsive to objects that lie in front of the horopter, i.e. to negative disparities, and finally far cells prefer objects that are behind the horopter, i.e. positive disparities (DeAngelis, 2000).
The invention of the random dot stereogram (RDS) by Julesz (1971) also contributed
largely to this field. A random dot stereogram is a pair of images of randomly distributed dots that are identical except that a portion of one image is displaced
horizontally (figure 2.10).
Figure 2.10: RDS (reprinted from [57])
It looks unstructured when viewed individually, but under a condition of binocular
fusion, the shifted region jumps out vividly at a different depth (Qian, 1997). Several
experiments based on solid bars and RDS provided results that strengthened the notion
of four basic categories of disparity-selective neurons being present (Gonzales and Perez, 1998). A few reports also mention the presence of two additional categories, namely tuned
near and tuned far (Poggio et al, 1988; Gonzales et al, 1993a), combining the properties
of tuned excitatory cells with those of near and far cells respectively. Figure 2.11
illustrates the disparity tuning curves for the 6 categories of cells obtained from
experiments on the visual cortex of monkeys carried out by Poggio et al (1988). TN
refers to tuned near cell, TE to tuned excitatory, TF to tuned far, NE to near, TI to
tuned inhibitory, and FA to far.
Figure 2.11: Disparity Tuning Curves (reprinted from [24])
Amidst all these exciting findings, a proper mathematical approach to simulate
disparity-selectivity of cortical cells failed to emerge (Qian, 1997). In 1990 however,
Ohzawa, DeAngelis and Freeman proposed the disparity energy model to simulate the
response of complex cells. This model has been widely accepted as a very good model of
the behaviour of disparity-selective cells in V1. The next section gives an in-depth
description of this interesting approach. However, results obtained from some
physiological studies have not been consistent with some of its predictions. In an
attempt to account for these discrepancies, Read et al (2002) came up with a modified
version of the original energy model. The basic features of Read’s model are also
highlighted in this chapter.
2.2.5 Ohzawa-DeAngelis-Freeman (ODF) ENERGY MODEL (1990)
In an earlier section, certain properties of complex cells were highlighted that
make them well-suited for disparity encoding. To recap, the interesting features of these
cells include selectivity to different stimulus disparities, an indiscriminate response to
contrast polarity and an apparent insensitivity to stimulus position. Ohzawa and
colleagues postulate that these are not enough to create a suitable disparity detector.
They outline 3 additional properties that need to be taken into account to develop a
suitable algorithm. These are:
1. The disparity selectivity of complex cells must be much finer than that predicted
by the size of the RFs, as reported by Nikara et al (1968)
2. The preferred disparity must be constant for all stimulus positions within the RF
3. Incorrect contrast polarity combinations should be ineffective if presented at the
optimal disparity for the matched polarity pair, i.e. a bright bar to one eye and a
dark bar to the other should not give rise to a response at the preferred disparity
The authors explain the purpose of the first two requirements with the illustration shown in figure 2.12. Figure 2.12(a) shows the RF of a cortical neuron in image space on left
and right eye retinas. The hatched diamond-shaped zone represents the region viewed
by both eyes. Any stimulus within this zone should elicit a response from the neuron in
question. But this implies that the neuron is limited to crude disparity selectivity since
this region encompasses many different disparities (Ohzawa et al, 1990). A disparity
detector should be more specific, i.e. it should respond to a restricted range of visual
space. In this example, if we assume that the 2 eyes are fixated on an object (meaning
zero disparity), then the dark-shaded oval region represents a suitable zone. A graphical
representation of this region is shown in figure 2.12(b). The diagonal slope represents a
plane of zero disparity. The 2 axes represent stimulus positions along the left eye and
right eye. For nonzero disparity, the sensitive region for a detector must be located
parallel to and off the diagonal.
The third requirement deals with “mismatched contrast polarity” (Ohzawa et al, 1990).
Recall that complex cells exhibit phase independence since they are insensitive to
contrast polarity. The question that arises is whether a disparity detector should respond
if anti-correlated stimuli are presented to the eyes at the correct disparity (for e.g., a
bright bar to one eye and a dark one to the other). This is a theoretically implausible
scenario because images on the retinas are from the same object, so there is no question of getting a bright spot in one part of one retina and a dark spot in the
corresponding part of the other retina (Ohzawa et al, 1990), but in a computational
framework, this is very likely to happen, especially with RDS as inputs. Ohzawa and
colleagues classify this requirement as a “non-trivial” one and suggest that it is
“counterintuitive to expect the detector to reject anti-correlated stimuli at the correct
disparity on the basis of mismatched contrast polarity”.
Figure 2.12: Desired characteristics of a disparity detector (reprinted from [45])
Taking into account all the aforementioned requirements, Ohzawa and collaborators
devised the disparity energy model. It consists of four binocular simple-cell subunits that can be combined to produce the output of a complex cell. The inputs from the two eyes are combined to give the output of each subunit. The resulting signal is then subjected to half-wave rectification followed by a squaring non-linearity. The response of the
complex cell is given by combining the contribution from each subunit.
The authors postulate that the simple cell subunits must meet certain requirements to
produce a smooth binocular profile. These are as follows:
• They must have similar monocular properties like spatial frequency, orientation, size and position of the RF envelopes
• They must share a common preferred disparity
• The phases of the four simple cells must differ from each other by multiples of 90° (quadrature phase)
• The RFs of the simple cells must be Gabor-shaped
• The simple cells must be organised into “push-pull” pairs, i.e. the RFs in one simple cell are the inverses of those in the other; in other words, the ON region of one cell corresponds to an OFF region in the other.
The schematic diagram in figure 2.13 shows the ODF model for a tuned-excitatory
neuron. S1 and S2 form one push-pull pair while S3 and S4 form the other. Members in a
push-pull pair are mutually inhibitory (Ohzawa et al, 1990).
Figure 2.13: ODF Disparity Energy Model (reprinted from [47])
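A minimal numerical sketch of this scheme is given below, assuming 1-D Gabor RFs and a phase-difference encoding of the preferred disparity. It illustrates the structure just described (four subunits in quadrature, half-wave rectification followed by squaring, with the 0°/180° and 90°/270° subunits forming the push-pull pairs); it is not the exact implementation used in the original paper or in this project, and all parameter values are arbitrary.

```python
import numpy as np

def gabor(x, sigma=0.5, freq=2.0, phase=0.0):
    """1-D Gabor RF: Gaussian envelope times a cosine carrier."""
    return np.exp(-x ** 2 / (2 * sigma ** 2)) * np.cos(2 * np.pi * freq * x + phase)

def halfwave_square(u):
    """Half-wave rectification followed by a squaring non-linearity."""
    return np.maximum(u, 0.0) ** 2

def odf_complex_response(left_img, right_img, x, pref_phase_disparity=0.0):
    """Model complex-cell response built from four binocular simple-cell
    subunits whose RF phases differ in 90-degree steps (quadrature); subunits
    180 degrees apart have inverted RFs and form the push-pull pairs.
    `pref_phase_disparity` is the left/right phase difference shared by all
    subunits (0 gives a tuned-excitatory unit)."""
    response = 0.0
    for base_phase in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2):
        left_drive = np.dot(gabor(x, phase=base_phase), left_img)
        right_drive = np.dot(gabor(x, phase=base_phase + pref_phase_disparity),
                             right_img)
        response += halfwave_square(left_drive + right_drive)
    return response

# Usage: a bright bar presented to both eyes at the RF centre (zero disparity)
# drives a zero-disparity-tuned cell strongly.
x = np.linspace(-2.0, 2.0, 401)
bar = np.exp(-x ** 2 / (2 * 0.1 ** 2))
print(odf_complex_response(bar, bar, x))
```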
2.2.6 ENERGY MODEL: READ et al (2002)
This model is very similar to the previous model except that the half-wave rectification
is performed on the inputs from each eye before they converge on a binocular cell. This thresholding is achieved by introducing monocular cells, as shown in figure 2.14, and
the authors claim that this alteration gives results close to real neuronal behaviour when
anticorrelated inputs are used. The model also emphasizes the type of synapses the
monocular cells make with the binocular neurons to account for inhibitory effects of
visual stimuli observed during biological experiments.
Figure 2.14: Modified Energy Model (reprinted from [58])
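The essential difference from the ODF subunit can be stated in a few illustrative lines of code: the half-wave rectification is applied to each monocular drive before, rather than after, binocular summation. This is only a sketch of the idea, not Read's actual implementation.

```python
import numpy as np

def halfwave(u):
    """Half-wave rectification (thresholding at zero)."""
    return np.maximum(u, 0.0)

def read_style_subunit(left_drive, right_drive):
    """One binocular subunit in the spirit of Read et al (2002): each
    monocular drive passes through a rectifying monocular cell before the
    two signals converge on the binocular cell, whose output is squared.
    Compare with the ODF subunit, which rectifies only the summed binocular
    signal."""
    return (halfwave(left_drive) + halfwave(right_drive)) ** 2
```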
2.3 TOPOGRAPHY
The topographic arrangement of the primary visual cortex and the presence of cortical
feature maps were highlighted in a previous section. The maps are superimposed to
form a hierarchical representation of the input features. This type of representation was
suggested by Hubel and Wiesel (1977). They postulated that orientation and OD maps
are overlaid in a fashion so as to optimize the uniform representation of all possible
permutations of orientation, eye preference and retinal position (Hubel and Wiesel,
1977). Subsequent studies making use of optical imaging techniques have helped to
justify this claim of superimposed, spatially periodic maps being present in the cortex,
and also to strengthen the views formulated by Hubel and Wiesel about the geometrical
relations between feature maps (Swindale, 2000).
In addition to orientation and OD columns in the cortex, maps for direction of motion and spatial frequency have also been observed (Hubener et al, 1997; Shmuel and Grinvald, 1996; Welliky et al, 1996). The following question comes to one’s mind: Do spatially periodic maps for other features that influence neuronal behaviour exist? It is a well-known fact that cortical cells respond to a multitude of visual feature attributes, so corresponding feature maps may also be present in the cortex. It is only through delicate biological experiments using state-of-the-art imaging techniques that clear-cut results can be obtained.
2.3.1 TOPOGRAPHY AND DISPARITY
Studies on stereoscopic vision have so far focused more on the physiology of simple
and complex cells and their role in encoding disparity. Very few studies have
concentrated on the topographic arrangement of disparity-sensitive neurons in V1.
Blakemore (1970) initially proposed that ‘constant depth’ columns can be found in the
cat V1 (DeAngelis, 2000). He advocated that such columns consist of neurons having
similar receptive-field position disparities. But due to the unsophisticated experimental
setup and lack of substantial experimental data, his findings were not deemed good
enough to comprehensively resolve this issue.
Another study conducted by LeVay and Voigt (1988) showed that “nearby V1 neurons
have slightly more similar disparity preferences than would be expected by chance”
(DeAngelis, 2000), although, as DeAngelis (2000) points out, the clustering was not as
definite as in the case of orientation selectivity or ocular dominance. Moreover, most of
the neurons that were investigated were near the V1/V2 border, thereby weakening the
claim of V1 having disparity-selective columns (DeAngelis, 2000).
Most work in this area has shifted to looking for these maps in higher areas of the
cortex, and recent research conducted by DeAngelis and Newsome (1999) has provided
some evidence for a map of binocular disparity in the middle temporal (MT, also known
as V5) visual area (DeAngelis and Newsome, 1999) of macaque monkeys by using
electrode penetrations. They found discrete patches of disparity-selective neurons
interspersed among other patches showing no disparity sensitivity. They claim that the
preferred disparity varies within the disparity-tuned patches and that disparity columns
exist in the MT region, and possibly in V2, since MT receives input connections from
V2. This is an interesting finding but yet not conclusive enough to prove the existence
of spatially periodic disparity maps. Reliable results can only be obtained if a large
population of neurons can be measured at once, for example by using optical imaging
techniques. Experiments using such a procedure have actually been conducted by Ts’o
et al (2001). They combined optical imaging, single unit electrophysiology and
cytochrome oxidase (CO) histology to investigate the structural organization for colour,
form (orientation) and disparity of area V2 in monkeys. The findings from this
endeavour might be of great relevance to the investigation of disparity maps in the
brain, hence the need for a separate subsection to describe the prominent features of this
research work.
2.3.2 TS’O, WANG ROE and GILBERT (2001)
Previous work on primates using CO histology has shown that V2 has a hierarchical
organization reminiscent of that found in V1, except that cells are compartmentalised
based on orientation-selectivity, colour preference and disparity selectivity. Two main
types of alternating CO-rich stripes have been observed, the thick and thin stripes that
are often separated by a third type commonly referred to as the pale stripe (Tootell and
Hamilton, 1989). The thick stripe is believed to contain cells selective to disparity and
motion, the thin stripe seems to be responsive to colour preferences, and the pale stripe
is associated with orientation selectivity. Ts’o and colleagues performed various
experiments by using a combination of techniques that included optical imaging,
electrode penetrations and CO histology for increased effectiveness and reliability. The
first set of tests provided results that seemed to justify the tripartite model of visual
processing in V2 that was put forward by earlier research work. They found unoriented,
colour-selective cells in the thin stripes, and of greater interest to this project, cells that
are selective to retinal disparity in the thick stripes. These cells were unselective to
colour, and had complex, oriented RFs (Ts’o et al, 2001). They also observed that these
cells showed little or no response to monocular stimulation but responded vigorously to
binocular stimuli over a small range of disparities. In order to get a better distinction
between the arrangement of the colour and disparity stripes, they subjected the primates
to monocular stimulation, the idea being that disparity-selective regions would not be
responding. The expected result of clear-cut delineation between the disparity and
colour stripes was not observed, leading the authors to suggest the existence of
subcompartments within the V2 stripes. Based on this type of hierarchical organisation
in V2, they draw a parallel with that of V1, which also contains subcompartments,
although mention is not made of the presence of disparity stripes in V1. Other relevant
observations along disparity stripes concern the orientation preference and type of
disparity-selective cells. It was seen that most of the cells had vertical or near vertical
preferred orientation. They found that within a disparity stripe, most of the columns contained cells of the tuned-excitatory type, but that columns populated with the other three
types of cells were also present. The authors conclude that “one key functional role of
area V2 lies in the establishment of a definitive functional organisation and cortical map
for retinal disparity”.
2.4 SELF-ORGANIZATION
How does the brain develop such distinctive characteristics? Researchers believe that
these maps form by the self-organization of the afferent connections to the cortex and
are shaped by visual experience (Miikkulainen et al, 2005), based on cooperation and
competition between neurons as a result of correlations in the input activity (Sirosh and
Miikkulainen, 1993). As research in this field intensified, it became clear that self-organization is not influenced by these afferent connections alone, but also by lateral
connections parallel to the cortical surface (Miikkulainen et al, 2005). Based on various
observations, Miikkulainen and collaborators believe that the wiring of the lateral
connections is not static but rather develops “synergetically and simultaneously” with
the afferent connections, based on visual experience. These researchers describe the
adult visual cortex as a “continuously adapting recurrent structure in a dynamic
equilibrium, capable of rapid changes in response to altered visual environments”
(Miikkulainen et al, 2005). This implies that the functional role of the afferent and
lateral connections is to suppress redundant information in the input stimuli, while being
able to learn correlations found in novel visual features. This type of organization that
relies heavily on visual features is termed Input-Driven Self-Organization.
If self-organization depends heavily on the types of inputs presented, then what about
the influence of genetic factors in the arrangement of neurons in the brain? The previous
discussion might mislead the reader in thinking that cortical maps develop solely as a
result of visual activity after birth. Experimental findings tend to show that this is not
the case. Indeed, it is believed that both genetic and environmental factors affect the
topography of the cortex. In the prenatal stage, internally generated activity such as
retinal waves and Ponto-Geniculo-Occipital (PGO) waves are thought to be genetically
specified training patterns that initiate the self-organizing process. Studies have shown
that animals have brain regions showing orientation selectivity even before birth
(Miikkulainen et al, 2005), thereby strengthening the hypothesis of genetically determined internal signals initiating the self-organizing process. Thus, an organism
already has a basic topographic framework at birth, and this organization is constantly
refined by visually-evoked activity after birth.
2.5 COMPUTATIONAL MODELS
Neuroscience has been the focus of extensive research for a very long time, but
researchers are not even close to a unified account of on-going neural
mechanisms and processes. The sheer complexity of the brain has so far proved to be an
overwhelming hurdle although, to be fair, considerable headway has been made in
many research areas.
The setting up of biological experiments requires lots of time and effort, consequently
making progress in neuroscience slow and painful. For quite some time now, scientists
have adopted a new approach to research, namely computational modelling. This has
provided a new dimension to studies related to the brain and has been embraced by
many researchers. Computational models can be used instead of biology, as concisely
described by Miikkulainen et al (2005), “to test ideas that are difficult to establish
experimentally, and to direct experiment to areas that are not understood well”. Ohzawa and colleagues (1990) point out that these models
play an important role in neuroscience since they can provide quantitative predictions
that may be compared with results from biological experiments (Ohzawa et al, 1990).
Because of these attractive features, it is unsurprising that computational neuroscience
has been used to investigate the self-organizing process.
Ever since von der Malsburg (1973) pioneered this area of research by using a two-dimensional network of neural units to model the cortex, several other models have
been proposed (Miikkulainen et al, 2005). However most of these models did not cater
for the dynamic nature of the lateral connections (Miikkulainen et al, 2005), and
therefore might not be ideal to simulate the self-organizing process. More recent models
have inevitably focused more on the dynamic nature of the visual cortex, with increased
emphasis on the interaction between the afferent and lateral connections (Miikkulainen
et al, 2005). The model of interest for this project is LISSOM, proposed by
Miikkulainen and colleagues. Section 2.6 describes the hierarchical and functional
properties of this model. It is based on the simple but effective self-organizing feature
map (SOM) model proposed by Kohonen (1982b). The next section gives a brief
overview of this famous algorithm.
2.5.1 KOHONEN SOM
SOM maps a high-dimensional input data space onto a two-dimensional array of
neurons. In the context of our study, the latter represents the cortical surface whereas
the former refers to a receptor surface such as the retina. Every unit in the neural sheet
is connected to all the units on the receptor surface, such that all the cortical units
receive the same input stimuli. So if we have a retina of N units, each neuron will have
an input vector of length N. Each connection has a positive synaptic weight. Since each
neuron is connected to N inputs, it will have a weight vector of length N associated with
it. The neuron computes its initial response as a function of the Euclidean distance
between the input and the weight vectors. A winner-take-all process usually operates
whereby the cortical neuron with the highest activation affects the activity of the nearby
neurons based on a neighbourhood function. The weight vector is modified using the
Euclidean difference between the input and the weight vectors. Initially the connection
weights are random, such that each neuron responds randomly to activity on the retina.
During learning, the weights adapt, slowly making each neuron more specific to
particular input patterns. Consequently, the weight vectors become better
approximations of the input vectors, and neighbouring weight vectors become more
similar. After many iterations, the weight vectors become an ordered map of the input
space, thereby leading to retinotopy.
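A compact sketch of one SOM training step is shown below. The array shapes, parameter names and numerical values are illustrative choices, not those of Kohonen's original formulation or of the simulations in this project.

```python
import numpy as np

def som_train_step(weights, x, learning_rate=0.1, sigma=2.0):
    """One SOM update.  `weights` has shape (rows, cols, N) for an N-pixel
    retina.  The unit whose weight vector is closest (Euclidean distance) to
    the input wins, and every unit's weights are pulled towards the input by
    an amount that falls off with map distance from the winner."""
    rows, cols, _ = weights.shape
    dists = np.linalg.norm(weights - x, axis=2)                     # responses
    win_i, win_j = np.unravel_index(np.argmin(dists), dists.shape)  # winner-take-all
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    neighbourhood = np.exp(-((ii - win_i) ** 2 + (jj - win_j) ** 2) / (2 * sigma ** 2))
    weights += learning_rate * neighbourhood[..., None] * (x - weights)
    return weights

# Usage: initially random weights gradually become an ordered map of the
# input distribution (here just uniform noise on a 5x5 "retina").
rng = np.random.default_rng(0)
W = rng.random((10, 10, 25))
for _ in range(1000):
    W = som_train_step(W, rng.random(25))
```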
2.6 LISSOM
LISSOM stands for Laterally Interconnected Synergetically Self-Organizing Map. It is
a learning algorithm designed to capture the essence of the self-organizing process in
the visual cortex by concentrating on certain processes that have been overlooked by
most computational models, more specifically the influence of the lateral connections.
Miikkulainen et al (2005) perceive LISSOM as a model that can give concrete, testable
answers to the following viewpoints:
1. input-driven self-organization is responsible for shaping cortical structures
2. self-organization is influenced by inputs that are internally generated, as a result
of the genetic blueprint of an organism, as well as visually-evoked activity from
the environment
3. perceptual grouping, i.e. the process of finding correlations in the input stimuli
to successfully and coherently identify an object in the visual scene, is a
consequence of the interaction between afferent and lateral connections
LISSOM is inspired by various studies that have been conducted on the self-organizing process and from data collected about the structure of the cortex. It was
developed in an attempt to model neurobiological phenomena and yield results that may
inspire new research directions. The salient features of LISSOM, as described by
Miikkulainen and collaborators (2005) are given below:
1. the neural sheet is a two-dimensional array of computational units, each unit
corresponding to a vertical column in the cortex
2. each cortical unit receives input from a local anatomical receptive field in the
retina, usually with the ON-centre and OFF-centre channels of the LGN as
intermediate sheets between the input and output sheets
3. interactions between afferent and lateral connections govern the input-driven
self-organizing process
4. each cortical unit has a weight vector whose length is determined by the number
of connections; it responds by computing a weighted sum of its inputs
5. learning is based on Hebbian adaptation with divisive normalization, which is
the computational equivalent of the biological learning procedure
The following sections describe the LISSOM model in more detail. Most of the material
is taken from Chapter 3 in ‘Computational Maps in the Visual Cortex’ (CMVC) book
by Miikkulainen and colleagues (2005).
2.6.1 LISSOM ARCHITECTURE
The architecture of the basic LISSOM model is shown in figure 2.15. Each V1 neuron is
connected to a local group of neurons in the LGN-ON and LGN-OFF sheets. In LISSOM
terminology, the term connection field refers to the region in a lower-level sheet that
projects directly to a given neuron in the sheet above it.
Thus each cortical neuron is connected to specific regions in the LGN layer. This is
unlike the SOM model, wherein each cortical neuron is fully connected to the lower
layer. The LGN neurons in turn have connection fields onto the retina. Each neuron
develops an initial response as a weighted sum of the activation in its afferent input
connections. The lateral connections translate the initial activation pattern into a
localized response on the map. After a settling period, the connection weights of cortical
neurons are modified through Hebbian learning. As the self-organizing process
progresses, activity bubbles are produced that become increasingly focused and
localized. The result is a self-organized structure in a dynamic equilibrium with the
input.
Figure 2.15: Architecture of the basic LISSOM model (reprinted from [40])
RETINA
The retinal sheet is basically an array of photoreceptors that can be activated by the
presentation of input patterns. The activity χxy for a photoreceptor cell (x,y) is calculated
according to
where (xc,k , yc,k) specifies the centre of Gaussian k and σu its width
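The equation referred to above can be reconstructed from these definitions; assuming circular Gaussians and the usual LISSOM convention of taking the maximum over overlapping patterns, it takes the form

\[
\chi_{xy} = \max_{k} \exp\!\left( -\frac{(x - x_{c,k})^{2} + (y - y_{c,k})^{2}}{\sigma_{u}^{2}} \right)
\]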
LGN
The connection weights of the LGN neurons are set to fixed strengths using a
difference-of-Gaussians model (DoG). There is a retinotopic mapping between the LGN
and the retina. The weights are calculated from the difference of two normalized
Gaussians; weights for an OFF-centre cell are the negative of the ON-centre weights,
i.e. they are calculated as the surround minus the centre. The weight Lxy,ab from receptor
(x, y) in the connection field of an ON-centre cell (a, b) with centre (xc, yc) is given by
where σc determines the width of the central Gaussian and σs the width of the surround Gaussian.
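A reconstruction consistent with this description, with each Gaussian normalized over the connection field before the surround is subtracted from the centre, is

\[
L_{xy,ab} = \frac{\exp\!\left(-\frac{(x - x_{c})^{2} + (y - y_{c})^{2}}{\sigma_{c}^{2}}\right)}{\sum_{uv} \exp\!\left(-\frac{(u - x_{c})^{2} + (v - y_{c})^{2}}{\sigma_{c}^{2}}\right)} \;-\; \frac{\exp\!\left(-\frac{(x - x_{c})^{2} + (y - y_{c})^{2}}{\sigma_{s}^{2}}\right)}{\sum_{uv} \exp\!\left(-\frac{(u - x_{c})^{2} + (v - y_{c})^{2}}{\sigma_{s}^{2}}\right)}
\]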
The cells in the ON and OFF channels of the LGN compute their responses as a
squashed weighted sum of the activity in their receptive fields. Mathematically, the response ξab of
an LGN cell (a,b) is computed by
where χxy is the activation of cell (x, y) in the connection field of (a, b), Lxy,ab is the
afferent weight from (x, y) to (a, b), and γL is a constant scaling factor. The squashing
function σ is a piecewise linear approximation of a sigmoid activation function
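In the notation just defined, the ON/OFF response described here can be written (a reconstruction from the definitions above) as

\[
\xi_{ab} = \sigma\!\left( \gamma_{L} \sum_{xy} \chi_{xy}\, L_{xy,ab} \right)
\]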
CORTEX
The total activation is obtained by taking both the afferent and lateral connections into
account. First, the afferent stimulation sij of V1 neuron (i, j) is calculated as a weighted
sum of activations in its connection fields on the LGN:
where ξab is the activation of neuron (a, b) in the receptive field of neuron (i, j) in the
ON or OFF channels, Aab,ij is the corresponding afferent weight, and γA is a constant
scaling factor. The afferent stimulation is squashed using the sigmoid activation
function. The neuron’s initial response is given as
where σ (·) is a piecewise linear sigmoid
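Putting the two statements above into symbols (a reconstruction; the sum runs over the neuron's connection fields in both the ON and OFF channels):

\[
s_{ij} = \gamma_{A} \sum_{ab \in \mathrm{ON}} \xi_{ab}\, A_{ab,ij} \;+\; \gamma_{A} \sum_{ab \in \mathrm{OFF}} \xi_{ab}\, A_{ab,ij}, \qquad \eta_{ij}(0) = \sigma\!\left(s_{ij}\right)
\]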
After the initial response, lateral connections influence cortical activity over discretized
time steps. At each of these time steps, the neuron combines the afferent stimulation s
with lateral excitation and inhibition:
where ηkl(t-1) is the activity of another cortical neuron (k, l) during the previous time
step, Ekl,ij is the excitatory lateral connection weight on the connection from that neuron
to neuron (i, j), and Ikl,ij is the inhibitory connection weight. The scaling factors γE and γI
represent the relative strengths of excitatory and inhibitory lateral interactions.
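A reconstruction of the settling equation from these definitions:

\[
\eta_{ij}(t) = \sigma\!\left( s_{ij} + \gamma_{E} \sum_{kl} \eta_{kl}(t-1)\, E_{kl,ij} - \gamma_{I} \sum_{kl} \eta_{kl}(t-1)\, I_{kl,ij} \right)
\]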
Unlike the LGN connections, the connections to the cortex are not set to fixed strengths.
Weight adaptation of afferent and lateral connections is based on Hebbian
learning with divisive postsynaptic normalization. The equation is as given below
where wpq,ij is the current afferent or lateral connection weight from (p, q) to (i, j),
w’pq,ij is the new weight to be used until the end of the next settling process, α is the
learning rate for each type of connection, Xpq is the presynaptic activity after settling,
and ηij stands for the activity of neuron (i, j) after settling
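With these definitions, the Hebbian rule with divisive postsynaptic normalization can be written as

\[
w'_{pq,ij} = \frac{ w_{pq,ij} + \alpha\, X_{pq}\, \eta_{ij} }{ \sum_{uv} \left( w_{uv,ij} + \alpha\, X_{uv}\, \eta_{ij} \right) }
\]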
2.6.2 SELF-ORGANIZATION IN LISSOM
Orientation maps have been widely investigated using computational modelling, and
LISSOM work has also focused extensively on the topographic arrangement of cortical
neurons based on orientation preference. The figure below shows the organization of
orientation preferences before and after the self-organizing process.
Figure 2.16: Orientation map in LISSOM before (iteration 0) and after (iteration 10,000) self-organization (reprinted from [40])
2.7 TOPOGRAPHICA
The Topographica simulator has been developed by Miikkulainen and collaborators to
investigate topographic maps in the brain. It is perpetually being refined and extended
in an attempt to make it as generic as possible and to promote the investigation of
biological phenomena that are not very well understood. This study on disparity
necessitated the addition of some new functionalities in Topographica, especially for
the generation of disparity feature maps. The next section highlights the map
measurement techniques currently used in Topographica.
2.7.1 MAP MEASUREMENT IN TOPOGRAPHICA
There are various algorithms that can be used for feature map measurement. Most of
these techniques produce similar results, especially in cases where neurons are strongly
selective for the feature being investigated, but might yield different results for units
that are less selective (Miikkulainen et al, 2005). There are typically two types of
methods, namely direct and indirect, that can be used to compute preference maps. In
direct methods, maps can be calculated directly from the weight values of each neuron,
while indirect methods involve presenting a set of input patterns and analyzing the
responses of each neuron. The choice of map measurement technique usually implies a
tradeoff between efficiency and accuracy, since direct methods are more efficient while
indirect methods are more accurate (Miikkulainen et al, 2005). In Topographica both
methods have been used to calculate feature maps. For example, the map of preferred
position is obtained by computing the centre of gravity of each neuron’s afferent
weights, whereas orientation maps are calculated by an indirect method called the
weighted average method, introduced by Blasdel and Salama (1986). The next section
gives a detailed account of this method. It is based almost entirely on material in Appendix G.1.3 of the CMVC book.
WEIGHTED AVERAGE METHOD
In the weighted average method, inputs are presented in such a way that the whole range
of parameter-value combinations is covered. For each value of the map parameter, the
maximal response of the neuron is recorded, and the preference of a neuron corresponds
to the weighted average of the peak responses to all map parameter values. This is
clarified below by describing the method mathematically and then giving an example of
its application.
Consider the weighted average method applied to computing orientation preferences.
For each orientation φ, parameters such as spatial frequency and phase are varied
systematically, and the peak response η̂φ is recorded. A vector of length η̂φ and
orientation 2φ is used to encode the response to each orientation φ, and the vector
V = (Vx, Vy) is formed by summing these vectors over all orientation values.
The preferred orientation of the neuron, θ, is estimated as half the orientation of V:
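In symbols (one consistent reconstruction, writing the quadrant-corrected arctangent as atan2):

\[
\theta = \tfrac{1}{2}\, \operatorname{atan2}\!\left( V_{y},\, V_{x} \right)
\]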
where the two-argument arctangent chooses the quadrant of the result based on the signs
of Vx and Vy. Orientation selectivity can be obtained by taking the magnitude of V; this
is normalized for easier comparison and analysis.
Normalized orientation selectivity, denoted by S, is given by:
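A reconstruction consistent with the description above:

\[
S = \frac{\lvert V \rvert}{\sum_{\phi} \hat{\eta}_{\phi}} = \frac{\sqrt{V_{x}^{2} + V_{y}^{2}}}{\sum_{\phi} \hat{\eta}_{\phi}}
\]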
As an illustration, suppose that input patterns were presented at orientations 0°, 60°
and 120°, and phases 0, π/8, …, 7π/8, for a total of 24 patterns. Assume that the peak
responses of a given neuron over all the phases were 0.1 for 0°, 0.4 for 60° and
0.8 for 120°. The preferred orientation and selectivity of this neuron then come out to
approximately 107° and 0.47 respectively.
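The following short Python calculation (illustrative only) traces the vector summation for these numbers:

    import math

    # Peak responses over all phases, per orientation (degrees)
    responses = {0: 0.1, 60: 0.4, 120: 0.8}

    # Sum vectors of length eta_hat at angle 2*phi
    Vx = sum(r * math.cos(math.radians(2 * phi)) for phi, r in responses.items())
    Vy = sum(r * math.sin(math.radians(2 * phi)) for phi, r in responses.items())

    theta = (math.degrees(math.atan2(Vy, Vx)) / 2) % 180  # preferred orientation
    S = math.hypot(Vx, Vy) / sum(responses.values())       # normalized selectivity

    print(round(theta, 1), round(S, 2))  # approximately 107.4 and 0.47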
The interesting point to note here concerns the selectivity value for this neuron. It has a
relatively low selectivity because it responds quite well to patterns with different
orientations, namely 60° and 120°. High selectivity would therefore indicate a strong
bias towards one particular value of a feature parameter.
2.8 MODEL OF DISPARITY SELF-ORGANIZATION
Although numerous models for self-organization have been put forward, very few have
actually been customized to investigate the topographic arrangement of neurons based
on disparity preferences. This is somewhat surprising, since the existence of disparity
maps in V1 has not yet been confirmed biologically, and computational modelling would
therefore be well placed to guide experimental work through its predictions. One such
study was undertaken by Wiemer et al (2000).
2.8.1 WIEMER ET AL (2000)
The aim of their research was to investigate the representation of orientation, ocular
dominance and disparity in the cortex. Since orientation and OD maps have been widely
studied using both computational and traditional neuroscience techniques, the emphasis
of their work was more on the potential presence of disparity map in the cortex. They
used the SOM algorithm for learning; what makes their work distinctive is the method
employed for presenting the input. In an earlier section, the importance of the input
stimuli in the self-organization process was made clear. Wiemer and colleagues stress
the significance of the types of binocular stimuli that must be presented in order to
obtain clumps of disparity-selective neurons. They generate such stimuli by first creating
a three-dimensional scene and then taking stereo pictures of it. According to the authors, this
kind of processing preserves the correlations between stimuli features such as
orientation and disparity. Further processing includes cutting the pictures in ocular
stripes, and aligning them in alternating order to produce a fused projection. The whole
fused image is not presented as the input stimulus. Instead, chunks of it that contain
features from both the left and right ocular stripes are selected based on the amount of
correlation between them. The algorithm rejects any chunk that contains very little
correlation between the part from the left ocular stripe and that from the right ocular
stripe. This is done to ensure that the left-eyed and right-eyed parts correspond to the same
three-dimensional object (Wiemer et al, 2000). The selected chunks represent the
binocular stimuli that will be presented to the network. They are normalized to ensure
that dissimilarities in brightness of different pairs of stereo images do not affect the final
results.
Figure 2.17 (reprinted from [69]): A and B represent the stereo images obtained from
the three-dimensional scene. These images are divided into stripes, which are arranged
in alternating order to form the image shown in C. Chunks of this image are then
selected based on the amount of correlation to yield the pool of stimuli, shown in D, to
be presented to the network.
The results obtained by Wiemer and collaborators are shown in figures 2.18 and 2.19
Figure 2.18 shows maps of left-eyed and right-eyed receptive fields. It can be seen that
RFs of different shapes and asymmetries are present with pools of neurons preferring
similar patterns varying gradually along the two-dimensional grid (Wiemer et al, 2000).
Fig 2.18 : Maps of left-eyed and right-eyed receptive fields(reprinted from [69])
Figure 2.19 shows the resulting binocular feature representation obtained by the fusion
of the left- and right-eyed receptive fields. The corresponding orientation, ocular
dominance and disparity maps are illustrated. One interesting point that can be clearly
distinguished when analyzing maps B and D in figure 2.19 is that disparity differences
are greatest in regions of vertical and oblique orientations (Wiemer et al, 2000). The
white patches in D represent positive disparities; black patches indicate negative
disparities, while gray is for zero or very small disparities. Based on these results, the
authors suggest that subcompartments corresponding to different disparities might exist
in regions of constant orientation, such that maps of orientation, ocular dominance and
disparity might be geometrically related.
The study by Wiemer and colleagues is an intriguing one. Their conclusion that disparity
patches lie within regions of relatively uniform orientation preference is supported by
the experiments conducted by Ts’o et al on the V2 region of the macaque monkey.
However, there are certain reservations about the validity of their model. First, the
authors do not specify which region of the visual cortex they are simulating, casting
doubt over the specificity of their experiment. Moreover, they do not use the concept of
overlapping receptive fields, a distinctive feature of cortical neurons that might be
important for self-organization.
In addition, the manner in which binocular stimuli are presented is quite debatable
although the algorithm used to extract such stimuli from a three-dimensional scene is
impressive. It seems that they use the notion of a ‘cyclopean’ eye to process the
binocular stimuli instead of having two eyes with slightly disparate images incident on
them. Next, when the maps are analyzed, serious discrepancies appear in comparison
with those found in biology: the neurons do not have the usual centre-surround, two-lobed
or three-lobed RFs that are so characteristic of cortical neurons, and the orientation map
is far from showing any kind of periodicity. Despite these apparent
imperfections, this model is one of the first to probe into the existence of disparity
maps, and therefore the effort by Wiemer et al is worth some consideration.
Figure 2.19 : Binocular Feature Representation(reprinted from [69])
CHAPTER 3
METHODOLOGY
This chapter is divided into two main sections. The first highlights the methods used
to investigate the self-organization of disparity maps in V1. The second describes the work
done in an attempt to solve the problem of phase-dependent behaviour of current
LISSOM models.
3.1 SELF-ORGANIZATION OF DISPARITY SELECTIVITY
At the onset of the project, the Topographica software did not have the desired
functionality to investigate the potential self-organizing feature of disparity selectivity.
Therefore the first major task was to understand the intricacies of the simulator to be
able to provide the appropriate experimental setup for the inspection and measurement
of disparity preferences and selectivities. More specifically, the first requirement
consisted of developing a network based on two retinas, with suitable input patterns on
each one, to simulate the input-driven self-organizing process that is believed to
culminate in disparity selectivity in the primary visual cortex of primates. The second
major task was the development of a map measurement mechanism to compute the
disparity preferences of the cortical units. The following sections give a detailed
account of how these two tasks were tackled.
3.1.2 TWO-EYE MODEL FOR DISPARITY SELECTIVITY
Most studies using LISSOM have been based on a single eye, although two retinas have
been used before to investigate ocular dominance. However the experiments on eye
preferences were conducted in the C++ version of LISSOM. There was no tailor-made
Topographica script that dealt with two eyes. So the first and perhaps the most
important aspect of the project was to come up with a model that could reliably simulate
the self-organization process. The starting point and inspiration was the lissom_oo_or.ty
script that can be found in the examples folder of the topographica directory. It is used
to investigate orientation selectivity based on a single retina and LGN ON/OFF
channels. The model for this simulation is illustrated in figure 3.1.
Figure 3.1: Model used in lissom_oo_or.ty
As can be seen from the illustration, the model consists of an input sheet (‘Retina’), an
LGN layer (‘LGNOff’ and ‘LGNOn’), and an output sheet (‘V1’). The connections from
the retina to the LGN (‘Surround’ and ‘Centre’) are based on the difference-of-Gaussians (DoG) model. The afferent connections, namely ‘LGNOffAfferent’ and
‘LGNOnAfferent’, regulate the activation of neurons in the neural sheet. Lateral
connections are also included; they are represented as the dotted yellow circles in ‘V1’.
Another important point concerns the type of input pattern. This model makes use of
elongated Gaussians. They are generated randomly across the input space, and their
orientations are also determined by a random process. The idea is to ensure that each
receptor in the input sheet is subjected to all possible position-orientation permutations
when the simulation is run for a large number of iterations, so that a smooth orientation
map can develop. It is to be noted that the values used for the parameters in this model
are based on those that were used in the LISSOM C++ experiments for orientation
selectivity. Interested readers can consult Appendix A of the CMVC book for a general
idea of how the correct values can be chosen for the simulation parameters.
Based on the single-eye model, it was relatively straightforward to design the skeleton
for a two-eye model. It consists of two input sheets, two LGN ON/OFF channels and
one output sheet. The number of afferent connections is doubled, as expected. The
diagram below summarizes the architecture. Initially, the values used for the parameters
were the same as in the single-retina model, but some of them later had to be adjusted,
as will be explained in subsequent sections.
Figure 3.2: two-eye model
The next step consisted of generating input patterns that would be slightly offset in each
eye to simulate horizontal retinal disparity. This was not as easy as setting up the
framework since it required knowledge of some distinct features in Topographica.
Eventually this was successfully achieved. The figure below shows a simulation
scenario in which the input patterns are offset by 12 units (the retinal size is 54 units
along the horizontal axis).
Figure 3.3: elongated Gaussians as input
A prototypical model to investigate disparity selectivity now appeared to be ready, but
one issue remained. The main criterion for judging whether the parameters in the model
had been set correctly was to compare the orientation map obtained with the
lissom_oo_or.ty script against that obtained with the disparity model in the zero-disparity
case; the two should be identical. This was not the case, which clearly indicated that
something was wrong. After much probing, it became obvious that something was
amiss with the activation of neurons in the output sheet. Let us consider the one-eye
case. For a particular input pattern I, let us assume that the activity of a neuron, say N,
due to the afferent connections is x. Then for the case of zero disparity, for pattern I in
each eye, the activity of N due to the afferent connections would be greater than x.
Recall that the activity of a cortical neuron due to its afferent connections is simply a
linear sum of the contribution of all these connections. There are many ways to solve
the problem of this increased level of activity. The simplest one includes using the
strength parameter associated with the connections that is provided by Topographica.
The strength value in the one-eye model was 0.5. By simply halving that value for each
of the afferent connections in the zero disparity models, the desired orientation map was
obtained.
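A schematic way to see why halving works (using the afferent-sum form described in section 2.6.1, and assuming identical activity in the two eyes at zero disparity):

\[
s_{ij}^{\,\text{two-eye}} = \gamma_{\text{left}} \sum_{ab} \xi_{ab} A_{ab,ij} + \gamma_{\text{right}} \sum_{ab} \xi_{ab} A_{ab,ij} = \left( \gamma_{\text{left}} + \gamma_{\text{right}} \right) \sum_{ab} \xi_{ab} A_{ab,ij}
\]

so setting each eye's strength to 0.25 reproduces the total afferent drive of the single-eye model with strength 0.5.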
The model was ready for experimentation, except for one last tweak. In the one-eye
model, the weights were initialized randomly. For example, the LGNOff and LGNOn
connections would have different sets of initial weights. This is not a problem per se,
but for the experiments on disparity, one would want to compare the learned weight
values between the left-eye connections and the right-eye connections. To provide a
convenient platform of comparison, the weights of all the connections were initialized
to the same value. This has very little influence on the final topographic organization.
3.1.3 DISPARITY MAP MEASUREMENT
At the beginning of the project, the necessary functionality to generate orientation maps
was already present in Topographica. The first step in setting up disparity map
measurement therefore consisted of understanding the different classes and functions
that were involved in computing orientation preferences. Once the whole mechanism
became clear, the basic function for measuring disparity was implemented. There are
certain useful points that are worth highlighting. The first concerns the type of
input used. Sine gratings were chosen ahead of other input types for reasons that are
listed below:
•
One input presentation covers the whole input space, unlike inputs such as elongated
Gaussians, which cover only a small portion of the retina and are usually used for
training rather than map computation. This implies that fewer input presentations are
required when sine gratings are used, thus making the map measurement process faster
•
Sine gratings have a phase component that is very important for disparity
measurement, as discussed below
The second point concerns the type of disparity being measured. In Chapter 2, allusion
was made to two main types of disparity encoding, namely position difference and
phase difference. In the current LISSOM model, it is easier to investigate phase
disparity as a result of the way in which connection fields have been implemented.
Consider a scenario with one input sheet and one output sheet, without any LGN layer.
If a cortical neuron is at position (i,j) in the two-dimensional output sheet, then its
connection field will also be centred at position (i,j) in the retina, thereby maintaining
retinotopy. Now if two input sheets are used, our neuron will have its connection field
centred at corresponding points in each retina. Recall that in the position difference
model, a cortical neuron has its receptive fields at non-corresponding points on each
retina whereas for the phase model, the RFs are at corresponding points. Thus LISSOM
is better suited to investigate phase disparity, although the current implementation can
be altered to make it suitable for the investigation of position disparity.
Since phase disparity is to be measured, one important factor, namely periodicity, has to
be taken into account when using the weighted average method. Phase is a periodic
parameter, and therefore averaging must be done in the vector domain, as described in
the previous section. Sine gratings are 2π-periodic, therefore phases just above and
below zero(for e.g. 10° and 350°) should average to 0°; for non-periodic parameters
such as ocular dominance, the arithmetic weighted average is used (Miikkulainen et al,
2005). This was already implemented in Topographica, so it was only a matter of using
the right functions.
Yet another point concerns the presentation of the input patterns to two retinas. Some
modifications had to be made to the existing Topographica code to ensure that the
patterns are input properly to both eyes. Taking all this into account, functionality to
compute disparity maps was implemented. The following parameters were varied
systematically: the phase and orientation of the sine grating, and the amount of phase
disparity between the two eyes. As an example, consider an input presentation
where the phase of the sine grating is 0°, the orientation is 90°, and the amount of phase
disparity is 180°. The function computes the new phase of the pattern on the left retina
to be -90° (0° - 180°/2), and that for the right retina as 90° (0° + 180°/2), thereby
maintaining a phase difference of 180° between the two patterns. Note that since phase
is a cyclic property, only values between 0° and 360° are considered, such that values
such as -90° are processed to fall into that range (-90° is the same as 270° for a sine
grating). So, for the phase of the input pattern, the range was between 0° and 360°. For
disparity however, the range should be between -180° and 180° instead of 0° to 360°.
This is because disparity has been treated as a signed property in this project and the
convention has been maintained for map measurement purposes as well. A positive
value of disparity would imply the phase of the input pattern in the right eye is greater
than that in the left eye, while a negative value would indicate the opposite. But
eventually the range 0° to 360° was used because of the colour code that was used for
proper visualization of selective regions in the disparity map. Negative values are
clipped to zero during the plotting process, and therefore the range -180° to 180° does
not yield a proper map. When using the range 0° to 360°, it is imperative that a
distinction is made between values from 0° to 180°, and 180° to 360°. Any value, say
x°, in the former range indicates that the phase of the input pattern in the right eye is
greater than that in the left eye by x°. Any value, say y°, in the range 180° to 360°,
indicates that the phase in the left eye is greater than that in the right eye by (360° - y°).
So if we have a phase difference of 300° (phase in right eye is greater than phase in left
eye by 300°), it is the same as a phase difference of -60° (phase in left eye is greater
than phase in right eye by 60°).
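The convention just described can be summarized by a small helper function (a sketch with hypothetical names, not the actual Topographica code):

    def eye_phases(base_phase, disparity):
        """Split a base grating phase into left/right phases separated by
        'disparity' degrees, wrapped into the range [0, 360)."""
        left = (base_phase - disparity / 2.0) % 360
        right = (base_phase + disparity / 2.0) % 360
        return left, right

    def signed_disparity(measured):
        """Map a measured disparity in [0, 360) to the signed convention:
        values up to 180 mean the right-eye phase leads, values above 180
        mean the left-eye phase leads by (360 - measured) degrees."""
        return measured if measured <= 180 else measured - 360

    print(eye_phases(0, 180))      # (270.0, 90.0): left at -90 deg, right at +90 deg
    print(signed_disparity(300))   # -60: left-eye phase leads by 60 deg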
Figure 3.4: Sine gratings used for map measurement.
A: Or = 90°, Freq = 1.2 Hz, LPhase = 0°, RPhase = 0°
B: Or = 90°, Freq = 1.2 Hz, LPhase = -90°, RPhase = 90°
C: Or = 45°, Freq = 2.4 Hz, LPhase = 45°, RPhase = 135°
D: Or = 0°, Freq = 2.4 Hz, LPhase = 45°, RPhase = 90°
Note that the PNG file used for the colour code for disparity maps was originally
used for orientation. Readers should not take the orientations of the colour bars into
consideration while analyzing the disparity maps.
Figure 3.5: Colour Code
3.1.4 TYPE OF INPUT
The self-organizing process depends strongly on the type of inputs being presented, so it
was imperative to carry out sets of experiments with different types of input. The
first set of experiments (referred to as Gaussian) uses the usual oriented Gaussians as
input, with all the patterns brighter than the background. The second set (referred
to as Plus/Minus) deals with oriented Gaussians that can be either brighter or darker
than their surround. This required some modifications to the original disparity script
file in order to present bright or dark Gaussian patterns at random. The final set of
experiments (referred to as Natural) is concerned with the presentation of stereo image
pairs as input. This necessitated the collection of suitable stereo images from the
internet to set up an image database. The script file randomly chooses a pair of stereo
images from the database, applies a window function to select corresponding chunks
from each image, and presents them as input (see figure 3.6).
Figure 3.6: Natural input. (a) Pair of stereo images (left and right parts). (b) Chunks of the stereo images presented as input.
3.2 PHASE INVARIANCE
The second stage of the project consisted of finding a means to resolve the phase
invariance issue in LISSOM. The RF profiles developed during LISSOM simulation
correspond to those of simple cells, and this means that they are sensitive to the phase of
their inputs. For example, if the RF of one neuron has an ON-centre flanked by two
OFF regions, it would respond to a bright bar that is correctly oriented, but would not
respond to a dark bar even if it is optimally oriented. It has been mentioned earlier that
complex cells differ from simple cells in this respect: complex cells are largely invariant
to the absolute phase of the stimulus and are therefore ideally suited for disparity
selectivity, as discussed in Chapter 2. There was thus a need to probe into this matter in
an attempt to integrate complex cells into LISSOM. Because of its popularity and
simplicity, the ODF model was chosen as a potential inspiration for how phase
invariance can be achieved in LISSOM. This section describes the methodology used to
implement and test the ODF model using the functionality in Topographica.
Based on the schematic diagram in section 2.2.5 of Chapter 2, the basic framework for
the LISSOM version of the ODF model was implemented. A skeletal view is given in
figure 3.7. It consists of two input sheets (LeftRetina and RightRetina), four intermediate
sheets (S1, S2, S3 and S4) representing the simple cell sub-units, and one output sheet (C).
Figure 3.7 : Skeletal view for ODF model in Topographica
Since the aim was to investigate the output of a single complex cell, a one-to-one
connectivity had to be present between the intermediate sheets and the output sheet.
This means that the each of the intermediate sheets and the output sheet consists of only
one neuron. The intermediate sheets thus correspond to the simple cell sub-units in the
ODF model, and the output sheet corresponds to the complex cell. The next step was to
apply the half-wave rectification and the squaring non-linearity to the output of the
simple cell sub-units. A couple of simple output functions were implemented and
applied to the intermediate sheets. The last step consisted of simulating the RF profiles
of the simple cells. Since receptive field structure is determined by the weight values of
the afferent connections in Topographica, Gabor-shaped weight patterns with the
required 90° phase differences were applied to connections from the input sheets to the
intermediate sheets. The LISSOM-based ODF model was ready for experimentation.
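The computation performed by this arrangement can be sketched as follows (a simplified, stand-alone version of the energy-model calculation rather than the Topographica implementation; the Gabor wavelength and width are illustrative assumptions):

    import numpy as np

    def gabor(size, phase_deg, wavelength=8.0, sigma=3.0):
        """Vertically oriented Gabor patch used as a simple-cell RF."""
        y, x = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]
        carrier = np.cos(2 * np.pi * x / wavelength + np.radians(phase_deg))
        envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
        return carrier * envelope

    def complex_cell(left_img, right_img, left_phases, right_phases):
        """ODF-style complex cell: each binocular simple sub-unit sums its
        left- and right-eye RF responses, is half-wave rectified and squared,
        and the complex cell sums the four sub-unit outputs."""
        total = 0.0
        size = left_img.shape[0]
        for pl, pr in zip(left_phases, right_phases):
            s = np.sum(gabor(size, pl) * left_img) + np.sum(gabor(size, pr) * right_img)
            total += max(s, 0.0) ** 2
        return total

    # Tuned-excitatory configuration (cf. Table 3.1): same phases in both eyes.
    left_phases = (0, 180, 90, 270)
    right_phases = (0, 180, 90, 270)

    rng = np.random.default_rng(1)
    img = rng.standard_normal((32, 32))
    print(complex_cell(img, img, left_phases, right_phases))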
3.2.1 TEST CASES FOR ODF MODEL
Once the ODF model was implemented, the next task consisted of setting up some test
cases to understand its behaviour. There are quite a few studies that have been
conducted in the past to investigate this, one of which involved the use of Random Dot
Stereograms as input stimuli. This was carried out by Read et al (2002), and their work
on the energy model is the main inspiration for the test cases in this section. The
interesting point about RDS, as Gonzales and Perez (1998) advocate, is that “they do
not have any monocularly recognizable figure and depth can be perceived only under
strict binocular vision” (Gonzales et al, 1998). It was important to implement this type
of stimulus in Topographica. There are many ways to generate RDS, but since this
work was inspired by the experiments carried out by Read et al, the techniques used
in their study were adopted. The Matlab code for generating RDS was provided by
Jenny Read, and the task became one of implementing a Topographica-friendly version
of that code, which was eventually achieved.
The Topographica version allows many parameters to be varied such that a large range
of RDS patterns could be generated. The parameters include horizontal disparity,
vertical disparity, dot density, dot size and a random parameter, which determines the
position of each dot on the retina. The following diagrams
illustrate scenarios for zero, negative, and positive horizontal disparities with a retinal
size of 100*100 units, dot size of 5*5 units, dot density of 50%, and random parameter
set to 500.
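A minimal sketch of how such a stereogram pair can be generated (an illustrative reimplementation rather than the code adapted from Read et al; parameter names follow the description above, and the sign convention for near versus far depends on which eye is shifted in which direction):

    import numpy as np

    def make_rds(size=100, dot_size=5, density=0.5, disparity=0, seed=500):
        """Random dot stereogram pair: identical dot fields except for a
        central square patch shifted horizontally in opposite directions."""
        rng = np.random.default_rng(seed)
        base = np.zeros((size, size))
        n_dots = int(density * (size // dot_size) ** 2)
        for _ in range(n_dots):
            r = rng.integers(0, size - dot_size)
            c = rng.integers(0, size - dot_size)
            base[r:r + dot_size, c:c + dot_size] = 1.0

        left, right = base.copy(), base.copy()
        lo, hi = size // 4, 3 * size // 4
        patch = base[lo:hi, lo:hi]
        left[lo:hi, lo:hi] = np.roll(patch, -disparity // 2, axis=1)
        right[lo:hi, lo:hi] = np.roll(patch, disparity // 2, axis=1)
        return left, right

    left, right = make_rds(disparity=10)  # 10-pixel horizontal disparity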
(a):Zero Disparity
(b):Negative disparity(Near)
Dots move outwards
(c):Positive disparity(Far)
Dots move inwards
Figure 3.8: Random Dot Stereograms
Using the RDS as stimuli, it was hoped that the LISSOM-based ODF model would
yield disparity tuning curves characteristic of the 4 main categories of cells observed in
V1, namely tuned excitatory, tuned inhibitory, near and far. Previous studies have
shown that the ODF model can simulate the response of these 4 types of cells by setting
the correct RF profiles for each simple cell sub-unit. Tables 3.1 to 3.4 give the phases
used for the Gabor-shaped weight patterns associated with the connections between the
retinas and the simple cells.
                S1      S2      S3      S4
LeftRetina      0°     180°     90°    270°
RightRetina     0°     180°     90°    270°
Table 3.1: RF profiles for generating tuned excitatory cell
                S1      S2      S3      S4
LeftRetina      0°     180°     90°    270°
RightRetina   180°      0°     270°     90°
Table 3.2: RF profiles for generating tuned inhibitory cell
                S1      S2      S3      S4
LeftRetina      0°     180°     90°    270°
RightRetina    90°     270°    180°      0°
Table 3.3: RF profiles for generating near cell
                S1      S2      S3      S4
LeftRetina     90°     270°    180°      0°
RightRetina     0°     180°      0°    270°
Table 3.4: RF profiles for generating far cell
CHAPTER 4
RESULTS
This chapter is divided into 4 main sections, the first three illustrating the results
obtained while investigating the self-organizing process using different types of input.
The last section presents the results obtained with the Topographica-based ODF model.
For the Gaussian and Plus/Minus case, it was important to determine a suitable
threshold for the maximum amount of retinal disparity that should be applied to get
reliable results. Since orientation maps have been studied extensively both by using
computational modelling and by optical imaging techniques on animals, it was
reasonable to investigate the effects of disparity on these maps in an attempt to estimate
the aforementioned disparity threshold. It was hoped that within a certain range of
disparities, the OR maps would be more or less similar, and that disparity levels outside
that range would produce some major degradation to the arrangement of the orientationselective patches. The idea was to have the orientation map for the zero disparity case as
a reference, so that it can be compared to the orientation maps obtained when the
amount of disparity is incrementally increased. Once the threshold was determined,
other features like receptive field properties and disparity maps could be investigated
for different levels of retinal disparity. A good way to understand the response of
cortical neurons in LISSOM is to look at the afferent weights after the learning process.
The afferent weights for one neuron represent the retinal pattern that would produce
maximal excitation; therefore these weights are analogous to the receptive fields of the
cortical unit. It is to be noted that a disparity of x retinal units here implies that the
network has been trained by presenting patterns that are offset in each eye by a
maximum value of x; it does not refer to input patterns having a constant retinal
disparity of x. The retina is 54 units wide, and the diameter of the circular connection
field of an LGN neuron is 18 retinal units.
4.1 GAUSSIAN
4.1.1 DETERMINATION OF DISPARITY THRESHOLD
(a) : OR maps for zero disparity
(b) : OR maps for disparity of 1.5 units
(c) : OR maps for disparity of 2.0 units
(d) : OR maps for disparity of 2.5 units
(e) : OR maps for disparity of 3.0 units
(f) : OR maps for disparity of 4.5 units
Figure 4.1: OR maps for different levels of disparity for Gaussian
COMMENTS
On analyzing these maps, it can be seen that as the amount of disparity is increased,
there is a gradual degradation of the OR maps with respect to the reference map, as
initially predicted, together with an increased preference for horizontal orientations.
By visual inspection, the threshold for the maximum retinal disparity can be estimated
at 2.0 units.
4.1.2 RECEPTIVE FIELD PROPERTIES AND DISPARITY MAPS
CASE 1: ZERO DISPARITY
LGNOffLeftAfferent Weights after 20000
iterations, Plotting Density 10.0
LGNOffRightAfferent Weights after 20000
iterations, Plotting Density 10.0
Figure 4.2(a): Receptive fields of left and right eye (Off Afferent)
LGNOnLeftAfferent Weights after 20000
iterations, Plotting Density 10.0
LGNOnRightAfferent Weights after 20000
iterations,Plotting Density 10.0
Figure 4.2(b): Receptive fields of left and right eye (On Afferent)
ON
OFF
ON-OFF
Figure 4.2(c): Left-Eye RF of typical neuron (On Afferent-Off Afferent of unit[5][5])
ON
OFF
ON-OFF
Figure 4.2(d): Right-Eye RF of typical neuron (On Afferent-Off Afferent of unit[5][5])
LGNOffRightAfferent Weights - LGNOffLeftAfferent Weights
Plotting Density 10.0
Figure 4.2(e): Difference between left and right RFs (Off Afferent)
Disparity Map with original colour code
Map with processed colours
Figure 4.2(f): Disparity Map for zero disparity case
COMMENTS
Figures 4.2(a) and 4.2(b) show the afferent weights from the LGN sheets to the output
sheet for every fifth neuron in each direction. It can be seen that most of the neurons are
selective for orientation, with preferences varying for each of them. For example, in
figure 4.2(b), the second neuron from top left corner has a preferred orientation of
around 90° (vertical), while the fifth one has a preferred orientation of around 0°
(horizontal). The final RF (On-Off) of a typical neuron for either eye is shown in figures
4.2(c) and 4.2(d). It has an ON-centre with two flanking OFF lobes, and most of the
neurons seem to have the same three-lobed structure. Figure 4.2(e) shows the difference
between the right-eye and the left-eye weights. It clearly indicates that the weights are
essentially identical, showing that the phase disparity between right-eye and left-eye RFs
is zero. This is confirmed by the disparity map shown in figure 4.2(f). It is obvious from
these maps that the cortical neurons have a disparity preference of zero. Note that a
version of the disparity map with inverted colours has also been included for clarity.
Upon careful observation, it can be seen that there are very faint patches randomly
scattered. This is an artifact of the weighted average method and the coarse granularity
of the colour coding.
CASE 2: DISPARITY of 2.0 UNITS
LGNOffLeftAfferent Weights after 20000
iterations, Plotting Density=24.0
LGNOffRightAfferent Weights after 20000
iterations, Plotting Density=24.0
Figure 4.3(a): Receptive fields of left and right eye(Off Afferent)
LGNOffRightAfferent Weights - LGNOffLeftAfferent Weights
Plotting Density=24.0
Figure 4.3(b): Difference between left and right RFs(Off Afferent)
Disparity Map with original colour code
Map with inverted colours
Figure 4.3(c): Disparity Map for disparity of 2.0 units
COMMENTS
For better visualization purposes, the afferent weights from the LGN sheets to the
output sheet are plotted for every second neuron in each direction. From figure 4.3(a), it
can be seen that orientation preferences are fairly evenly distributed in the range 0° to
180°. A notable distinction from the zero disparity case is in the difference between the
right-eye and left-eye afferent weights, shown in figure 4.3(b). It indicates that when
some disparity is introduced, phase differences start to appear in the monocular RFs of
the neurons. The disparity map illustrates these differences as well. Faint bluish and
greenish patches are scattered across the map, hinting that the phase disparities might
range from about -30° to 30°. These patches also seem to correspond to regions where
neurons with a preference for vertical or near-vertical orientations are found. A more
detailed account of the properties at this amount of disparity will be given in the next
chapter.
CASE 3: DISPARITY of 4.5 UNITS
LGNOffLeftAfferent Weights after 20000
iterations, Plotting Density=24.0
LGNOffRightAfferent Weights after 20000
iterations, Plotting Density=24.0
Figure 4.4(a): Receptive fields of left and right eye(Off Afferent)
LGNOffRightAfferent Weights - LGNOffLeftAfferent Weights
Plotting Density=24.0
Figure 4.4(b): Difference between left and right RFs(Off Afferent)
A:Disparity Map with original colour code
Map with inverted colours
Figure 4.4(c): Disparity Map for disparity of 4.5 units
COMMENTS
Figure 4.4(a) depicts the preference for horizontal or near-horizontal orientations by
most of the neurons, as illustrated by the orientation map in figure 4.1(f). The
dissimilarity between the right- and left-eye RFs also seems to be more pronounced, as
shown in figure 4.4(b), with more clear-cut phase differences. The disparity map
corroborates these observations: both negative and positive phase disparities are well
represented, with magnitudes of 90° and above also present.
4.2 PLUS/MINUS
4.2.1 DETERMINATION OF DISPARITY THRESHOLD
(a) : OR maps for zero disparity
(b) : OR maps for disparity of 1.5 units
(c) : OR maps for disparity of 2.0 units
(d) : OR maps for disparity of 2.5 units
(e) : OR maps for disparity of 3.0 units
(f) : OR maps for disparity of 4.5 units
Figure 4.5: OR maps for different levels of disparity for Plus/Minus
COMMENTS
The first point to note in the Plus/Minus case is that the reference OR map is different
from its Gaussian counterpart: the arrangement of the orientation-selective clumps is
seemingly less smooth. However, the increased bias towards horizontal orientations as
the amount of disparity is increased is also present here. A threshold of 2.0 retinal units
seems to be a reasonable choice for this set of inputs as well.
4.2.2 RECEPTIVE FIELD PROPERTIES AND DISPARITY MAPS
CASE 1: ZERO DISPARITY
LGNOffLeftAfferent Weights after 20000
iterations, Plotting Density 10.0
LGNOffRightAfferent Weights after 20000
iterations, Plotting Density 10.0
Figure 4.6(a): Receptive fields of left and right eye(Off Afferent)
LGNOnLeftAfferent Weights after 20000
iterations, Plotting Density 10.0
LGNOnRightAfferent Weights after 20000
iterations, Plotting Density 10.0
Figure 4.6(b): Receptive fields of left and right eye(On Afferent)
ON
OFF
ON-OFF
Figure 4.6(c): Left-Eye, OFF-Centre RF of neuron [0][3]
ON
OFF
ON-OFF
Figure 4.6(d): Left-Eye, ON-Centre RF of neuron [0][21]
ON
OFF
ON-OFF
Figure 4.6(e): Left-Eye, two-lobed RF of neuron [9][10]
LGNOffRightAfferent Weights - LGNOffLeftAfferent Weights
Plotting Density 10.0
Figure 4.6(f): Difference between left and right RFs(Off Afferent)
Disparity Map with original colour code
Map with inverted colours
Figure 4.6(g): Disparity Map for zero disparity case
COMMENTS
The striking feature here concerns the type of RF profiles. In contrast to the Gaussian
case where most RFs had the ON-centre three-lobed structure, in the Plus/Minus case,
there are both ON-centre and OFF-centre RFs, as well as two-lobed RFs. This can be
clearly seen in figures 4.6(a)-(e). Otherwise, the disparity characteristics seem to be
similar to the Gaussian zero-disparity case.
CASE 2: DISPARITY of 2.0 UNITS
LGNOffLeftAfferent Weights after 20000
iterations,Plotting Density=24.0
LGNOffRightAfferent Weights after 20000
iterations, Plotting Density=24.0
Figure 4.7(a): Receptive fields of left and right eye(Off Afferent)
LGNOffRightAfferent Weights - LGNOffLeftAfferent Weights
Plotting Density=24.0
Figure 4.7(b): Difference between left and right RFs(Off Afferent)
Disparity Map with original colour code
Map with inverted colours
Figure 4.7(c): Disparity Map for disparity of 2.0 units
COMMENTS
There seem to be larger and more vivid patches of disparity-selective neurons here than
in the Gaussian case, suggesting a greater range of phase disparities, though probably not
by much. A more detailed analysis will be undertaken in the next chapter.
CASE 3: DISPARITY of 4.5 UNITS
LGNOffLeftAfferent Weights after 20000
iterations, Plotting Density=24.0
LGNOffRightAfferent Weights after 20000
iterations, Plotting Density=24.0
Figure 4.8(a): Receptive fields of left and right eye(Off Afferent)
LGNOffRightAfferent Weights - LGNOffLeftAfferent Weights
Plotting Density=24.0
Figure 4.8(b): Difference between left and right RFs(Off Afferent)
Disparity Map with original colour code
Map with processed colours
Figure 4.8(c): Disparity Map for disparity of 4.5 units
COMMENTS
The afferent weights show a strong bias towards horizontal orientations, just as in the
Gaussian experiment at this level of retinal disparity. The disparity map is quite
different, however, in that it spans a much larger range of phase disparities. The
coloured patches suggest that disparities between -180° and 180° are encoded.
4.3 NATURAL
Figure 4.9(a): OR maps for stereo images
LGNOffLeftAfferent Weights after 10000
iterations, Plotting Density 10.0
LGNOffRightAfferent Weights after 10000
iterations, Plotting Density 10.0
Figure 4.9(b): Receptive fields of left and right eye(Off Afferent)
LGNOnLeftAfferent Weights after 10000
iterations, Plotting Density 10.0
LGNOnRightAfferent Weights after 10000
iterations, Plotting Density 10.0
Figure 4.9(c): Receptive fields of left and right eye (On Afferent)
Disparity Map with original colour code
Map with processed colours
Figure 4.9(d): Disparity Map for Stereo Images
COMMENTS
The OR map obtained when chunks of stereo image pairs are used is highly disordered.
This haphazard organization can be understood when the receptive fields are analyzed:
most of them do not have well-developed profiles, and the familiar two-lobed and
three-lobed RFs are not present. The disparity map is also different from the disparity
maps investigated previously. There is relatively little preference for zero disparity, even
when compared with the Gaussian and Plus/Minus experiments with a retinal disparity
of 4.5 units, and there is apparently a preference for large phase disparities, typically
those with magnitudes above 90°.
4.4 PHASE INVARIANCE
The simulations for the LISSOM-based model were performed using a retinal size of
100*100 units, a dot density of 50% and a dot size of 5*5 units. For each experiment, 500
different patterns were presented. The disparity tuning curves obtained for the tuned
excitatory, tuned inhibitory, near and far scenarios are given below. The x-axis
represents the amount of horizontal disparity, where 0.1 units correspond to 10 retinal
units. The vertical axis gives the cumulative sum of the activity of the cell after
presentation of the 500 input patterns.
(a) Tuned Excitatory
(b) Tuned Inhibitory
(c) Near
(d) Far
Figure 4.10: Disparity Tuning Curves for the 4 different RF configurations
The tuning curves obtained are similar to those observed in biological experiments for
the four main types of cells, namely tuned excitatory, tuned inhibitory, near and far (cf.
figure 2.11 in Chapter 2), suggesting that the ODF model was successfully implemented
in Topographica. However, due to time constraints, work on the phase invariance issue
had to be curtailed. There is probably a way to integrate the ODF model into LISSOM
to ultimately simulate complex cells, but implementing such a scheme would require
considerable time. Instead, a phase-invariant model based on the trace rule, inspired by
previous work by Sullivan and de Sa (2004), is proposed in Chapter 6. It might serve as
a rough guide to how a phase-invariant response could be achieved.
CHAPTER 5
DISCUSSION
In this project, the effects of retinal disparity on input-driven self-organization have
been investigated using the LISSOM model. This chapter is about the analysis of results
from the different sets of experiments conducted in an attempt to assess the validity of
the model used. Emphasis is laid not only on the computational aspects of the study but
also on the relevance of the results to biological findings in the field of binocular
disparity.
The discussion will focus on the Gaussian and Plus/Minus experiments; the simulation
results based on stereo images will not be analysed in detail. This is because natural
images contain a multitude of features (for example, several different orientations and
different levels of disparity may be present in a single image), and a systematic analysis
of the results is therefore practically impossible.
The phase invariance issue could not be investigated properly, and therefore it is not
addressed in this chapter. However, a model that can potentially achieve phase invariant
response is outlined in Chapter 6.
5.1 RECEPTIVE FIELD STRUCTURE
This section deals with the effect of input types on the receptive field structure. The
results from the experiments have shown that the final receptive field profile depends
strongly on the type of input used for training. The experiments on Gaussians yielded
three-lobed ON-centre receptive fields, whereas the Plus/Minus experiments produced
both ON- and OFF-centre RFs as well as receptive fields with only two lobes.
In the Gaussian experiments, the input patterns are always brighter than the
background. This means that the activity on the LGN ON and OFF sheets will always
have the configuration shown in figure 5.1, i.e. the type of activity shown for the
LGN ON sheets will never be present on the LGN OFF sheets, and vice versa. Thus,
over a large number of learning iterations, the afferent weights to V1 organize in such a
way so as to best match the activity on the LGN sheets, leading to the formation of
three-lobed ON-centre receptive fields.
Figure 5.1: Gaussian Inputs (always brighter than background)
In the Plus/Minus experiments, the input patterns can be brighter or darker than the
background, and therefore the LGN activity can have either of the configurations shown
in figure 5.2(a) and 5.2(b). This explains why V1 neurons tend to have both ON-centre
and OFF-centre Gabor-shaped RFs. The interaction between the afferent connections
and the lateral connections also lead to the formation of other types of RF profiles, such
as the two-lobed ones.
(a):Gaussian patterns brighter than background
(b):Gaussian patterns darker than background
Figure 5.2 : Plus/Minus Inputs
5.2 DISPARITY AND ORIENTATION PREFERENCE
On the basis of the results obtained for the Gaussian and Plus/Minus experiments, it was
seen that as the amount of disparity was gradually increased, the preference for
horizontal or near-horizontal orientations conspicuously increased. Conversely, the
preference for vertical or near-vertical orientations decreased markedly, especially at
large retinal disparities. The OR maps clearly show this trend, and there is an
explanation for it. Consider the diagrams shown in figures 5.3(a)-(d)
below. In figure 5.3(a) for instance, each illustration refers to a square section of the
retina with an elongated Gaussian input. The dashed turquoise circle corresponds to the
receptive field of a hypothetical cortical neuron. The receptive fields in both retinas are
at corresponding points. Figure (iii) is obtained when figures (i) and (ii) are superposed.
This can be interpreted as the stimulus seen by the neuron when the monocular stimuli
are fused together, as if there is a cyclopean eye. Figures 5.3(a) and 5.3(b) consider a
scenario where the inputs are horizontally oriented while vertical stimuli are
investigated figures 5.3(c) and 5.3(d). In the example, two different retinal disparities
are considered, x and y, where y is greater than x. It can be seen in figures 5.3(a) and
5.3(b), than the fused stimulus falling within the binocular receptive field of the neuron
is more or less the same even when different retinal disparities are used. The binocular
stimulus is a ‘reinforced’ version of the monocular stimuli and thus has considerable
influence on the activation of the neuron and hence on the learning process.
For the vertical case, when the retinal disparity is x units, both input stimuli fall within
the binocular receptive field of the neuron and hence there is considerable activation.
But at a retinal disparity of y, this is no longer the case and the activity of the neuron
due to this set of stimuli is less than before, thereby contributing less towards weight
adaptation. Thus, for the same retinal disparity of y units, this neuron is activated more
if the input patterns have an orientation of 0° than if they have an orientation of 90°.
This can explain the presence of more horizontally-oriented and fewer vertically-oriented
receptive fields as the amount of disparity is increased.
(i)Left Retina
(ii)Right Retina
(iii)Fused image
Figure 5.3(a): Retinal disparity of x units for a horizontally-oriented input pattern
(i)Left Retina
(ii)Right Retina
(iii)Fused image
Figure 5.3(b): Retinal disparity of y units for a horizontally-oriented input pattern
(i)Left Retina
(ii)Right Retina
(iii)Fused image
Figure 5.3(c): Retinal disparity of x units for a vertically-oriented input pattern
(i)Left Retina
(ii)Right Retina
(iii)Fused image
Figure 5.3(d): Retinal disparity of y units for a vertically-oriented input pattern
5.3 ORIENTATION PREFERENCE AND PHASE DISPARITY
The disparity maps in the previous chapter showed the presence of phase disparities
between the left-eye and right-eye receptive fields. This section covers these findings in
more detail. The simulations for the Gaussian and Plus/Minus cases with a maximum
retinal disparity of 2.0 units will be considered, with emphasis on the relationship
between orientation preference and disparity, and on the magnitude of these phase
disparities.
The first step of the analysis will focus on the frequency distribution of the phase
disparities and orientations for both the Gaussian and Plus/Minus simulations. The
corresponding histograms are shown in figures 5.4(a)-(d). It can be seen that in both
cases, the modal value is in the region of zero, i.e. there is an overwhelming number of
neurons that prefer phase disparities as close as possible to zero. For the Gaussian
experiment, the maximum negative disparity is -35°, and the maximum positive
disparity is around 29°, while for the Plus/Minus case, the maximum values are -86°
and 41° respectively. The orientation histograms also seem to indicate a preference for
orientations having a significant horizontal component; orientations in the range 70° to
130° are much less frequent.
Figure 5.4(a): Histogram for phase disparity (Gaussian, retinal disparity of 2.0 units)
Figure 5.4(b): Histogram for orientation (Gaussian, retinal disparity of 2.0 units)
Figure 5.4(c): Histogram for phase disparity (Plus/Minus, retinal disparity of 2.0 units)
Figure 5.4(d): Histogram for orientation (Plus/Minus, retinal disparity of 2.0 units)
It was observed that phase disparities seemed to be associated with cells preferring
orientations that have a significant vertical component, although this observation must
be treated with caution since no quantitative analysis was performed. This section deals
with such an analysis. A small portion of the cortical sheet will be considered: a square
region of 100 units (10 × 10 neurons), corresponding approximately to the enclosed
parts shown below in the disparity maps.
Disparity Map(Gaussian,2.0 units)
Disparity Map(Plus/Minus,2.0 units)
Figure 5.5: Disparity Maps
The phase disparities and orientation preferences of this group of neurons, for both the
Gaussian and Plus/Minus simulations with a maximum retinal disparity of 2.0 units, are
given in the tables below. The dashed red and blue rectangles enclose clumps of neurons
with a preference for negative and positive disparities respectively; their corresponding
orientation preferences are also enclosed. It can be seen that, in general, neurons having
quite different left-eye and right-eye RF profiles tend to prefer orientations with a
significant vertical component. Scatter plots of phase disparity as a function of
orientation for this group of neurons are shown in figures 5.6(e) and 5.6(f). Note that the
absolute value of the disparities is taken, and similar processing is applied to the
orientation values; for example, an orientation of 150° is the same as an orientation of
-30°, and therefore has the same magnitude as an orientation of 30°.
 -23  -19   -5    4    8   11    6    0   -2   -1
 -22  -16    0    9   12   14    6   -6  -13  -13
  -9   -1   -1   14   16   16    3   -2   -3  -15
  -1   -1    0    0   16   18   -3   -3   -4  -16
  -1   -1    0    0    0    3    1   -4   -5  -13
  -1   -1    0    0    1    0    0    0   -3   -5
   4    6    6   10   11    6    3    2    0   -2
   8    9    8   10   10    8    4    3    2   -2
   5    4    4    9   11    5    3    2    0    0
  -1   -1    3    9    2    0    0    0    0    0
Figure 5.6(a): Disparity preference (in degrees) for a bunch of neurons(Gaussian)
 125  112   11  109  100   90   87   75   70   66
 105  103  106  103   97   92   94  127  125   97
  46   30   34   90   91   91  104  137  135  119
  35   17   17   28   66   78  118  136  135  122
  34   20   21   26   36   49  179  133  134  117
  40   28   27   31   38   35   22   28   60   48
  40   44   46   53   54   42   48   56   55   45
  46   53   69   77   67   49   52   53   51   36
  50   55   67   64   53   46   47   47   42    8
  43   40   39   42   40   33   32   34   29    3
Figure 5.6(b): Orientation preference (in degrees) for a bunch of neurons(Gaussian)
 -13  -17  -12   13   25   24   15    6    3    6
 -15  -12    6   18   22   25   10    0    2    3
 -18    4   10   16   17   18   12    0    2    2
 -10   10   12   14   17   15   -3   -9    1    1
  -1   11   11   11   12   14  -16  -24  -11    1
   3    7    8    8    9    6   -9  -19  -14    2
   3    0    6    6    7    4    1    0   -1   -1
   1    1    5    7    8    8    5    4    2   -1
   2   -6    3    2    2    9    5    4    1   -1
   0    2    2    2    3    0    5    2    0   -3
Figure 5.6(c): Disparity preference (in degrees) for the selected 10 × 10 group of neurons (Plus/Minus)
  61   67   77   79   82   80   68   36   25   18
  72   72   72   86   90   89   75   24   23   18
  97   71   78   89   93   93   88   21   16   12
  80   63   74   82   88   93   97  163   14    7
  55   54   64   69   72   74   96  108  134    1
  47  111   53   55   54   52   80   96   95    8
  40   32   48   48   48   47   67   71   67    7
  27   11   57   51   51   53   62   60   54   23
  20  172   23   21   39   59   58   55   51   41
   0   20   25   21    0  158   59   53   50   50
Figure 5.6(d): Orientation preference (in degrees) for the selected 10 × 10 group of neurons (Plus/Minus)
Figure 5.6(e): Scatter Plot for Gaussian simulation
Figure 5.6(f): Scatter Plot for Plus/Minus simulation
Why does LISSOM exhibit this type of behaviour? This can probably be explained by
considering figure 5.7, which depicts the left-eye and right-eye receptive fields of a
hypothetical neuron. Let us assume that each RF encloses 100 retinal receptors, so that
the neuron has 100 connections to each retina, each connection having an associated
weight. Let us also assume that the input pattern covers 40 receptors in each eye,
namely RL1 to RL40 in the left retina and RR1 to RR40 in the right retina. In this
example, large patterns have been used deliberately to emphasize how phase differences
might develop. During this particular input presentation, the weights associated with
these receptors are strengthened to a greater extent than the others. Since RL1…RL40 do
not correspond to RR1…RR40, and because the pattern is vertically oriented, a phase
difference creeps in between the left-eye and right-eye RFs. Intuitively, no such phase
difference can arise for horizontal orientations unless vertical retinal disparity is also
present; however, if small horizontally-oriented patterns are used, a few cases of
position disparity would most probably be observed.
Figure 5.7: Vertically-oriented patterns presented to (i) the left retina and (ii) the right retina
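The intuition can be illustrated with a small toy simulation, given below as a plain Hebbian sketch in NumPy; it is deliberately much simpler than the full LISSOM update, and the sizes, learning rate and normalization are illustrative choices. A single binocular unit receives one horizontal row of receptors from each eye; a vertically-oriented bar appears at horizontally shifted positions in the two eyes, and repeated Hebbian updates imprint that shift as an offset between the left-eye and right-eye weight profiles. For a horizontally-oriented bar, the same horizontal shift would leave this cross-section unchanged, so no such offset would develop.

import numpy as np

rng = np.random.default_rng(0)
n = 100                               # receptors per eye (one horizontal row through the RF)
w_left = rng.uniform(0.0, 0.01, n)    # initial afferent weights, left eye
w_right = rng.uniform(0.0, 0.01, n)   # initial afferent weights, right eye

def bar(centre, width=20, n=n):
    """1-D horizontal cross-section of a vertically-oriented bar covering `width` receptors."""
    x = np.arange(n)
    return ((x >= centre - width // 2) & (x < centre + width // 2)).astype(float)

disparity = 5      # horizontal shift (in receptors) between the two eyes
alpha = 0.01       # Hebbian learning rate

for _ in range(2000):
    centre = rng.integers(20, n - 20)
    left = bar(centre)                          # pattern on the left retina
    right = bar(centre + disparity)             # same pattern, shifted on the right retina
    y = w_left @ left + w_right @ right         # postsynaptic response
    w_left += alpha * y * left                  # Hebbian strengthening of the stimulated connections
    w_right += alpha * y * right
    w_left /= w_left.sum()                      # divisive normalization keeps weights bounded
    w_right /= w_right.sum()

# The centres of mass of the two monocular weight profiles end up offset by roughly
# the imposed disparity, i.e. a phase/position difference between the left and right RFs.
x = np.arange(n)
print((w_right @ x) - (w_left @ x))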
5.4 VALIDATION AGAINST BIOLOGICAL DATA
The main topics of discussion in this chapter have been the distribution of phase
disparity preferences among cortical neurons and the relationship between phase
disparity preference and orientation preference. The LISSOM-based model developed a
topographic map in which an overwhelming majority of neurons prefer phase disparities
in the region of 0°, but the map also included patches of neurons preferring larger
magnitudes of phase disparity. It has been seen that these neurons tend to prefer
orientations that have a significant vertical component. These two findings need to be
validated against biological data to assess the plausibility of the model.
5.4.1 DISTRIBUTION OF PHASE DISPARITIES
The work done by Anzai et al (1999a) on neural mechanisms involved in the encoding
of binocular disparity provides a good comparison platform. Anzai and colleagues
investigated the role of position and phase in the encoding of disparity in cats. They
inspected 97 simple cells in 14 adult cats and compiled a frequency distribution of the
observed phase disparities. They found that phase disparities are distributed around
zero, indicating that cells with similar RF profiles are most numerous (Anzai et al,
1999a). It was also observed that the disparities were mostly confined to the range
between -90° and 90°, although some disparities beyond that range were also recorded (Anzai et al,
1999a).
Figure 5.8: Histogram of phase disparity compiled by Anzai et al from results obtained
in cells from cats (reprinted from [3])
5.4.2 DISPARITY AND ORIENTATION PREFERENCE
Studies have shown that RF profiles for the left and right eyes are more or less the same
for cells tuned to horizontal orientations, whereas those for cells tuned to vertical
orientations show a certain degree of dissimilarity (DeAngelis et al, 1991, 1995;
Ohzawa et al, 1996). The study by Anzai et al also confirmed this. Because the eyes are
displaced laterally, binocular parallax produces a larger range of binocular disparities
along the horizontal direction than along the vertical direction; consequently, phase
disparity is expected to be larger for cells preferring vertical orientations (Anzai et al,
1999a).
5.5 SUMMARY
The LISSOM-based experiments have shown that disparity selectivity develops as a
result of input-driven self-organization. Cortical neurons were found to develop left-eye
and right-eye RF profiles that differ in phase when trained with inputs presented at
non-corresponding points on the two retinas. Most of the neurons prefer phase disparities
in the region of zero, as has been shown to be the case in biology. Moreover, the
magnitude of the phase disparities was observed to remain below 90° in experiments
with a reasonable amount of retinal disparity (2.0 retinal units). This corroborates the
work of Blake and Wilson (1991) and Marr and Poggio (1979), who remarked that phase
disparity must be limited to 90° “in order for band-pass filters to unambiguously encode
binocular disparity” (Anzai et al, 1999a). This is also consistent with the findings of
Anzai and colleagues, although phase disparities larger than 90° were also recorded
during their study of simple cells in the cat’s striate cortex. Since the self-organizing
process depends strongly on the type of input, it would be interesting to investigate the
effects of input patterns other than the elongated Gaussians used in this project. The
Gaussian and Plus/Minus experiments yielded disparity maps that did not differ a great
deal from each other, despite the fact that the RF structures in the two cases were quite
dissimilar.
The other notable observation concerns the relationship between phase disparity and
orientation preference. The simulation results have shown that, in general, neurons with
dissimilar left-eye and right-eye RF profiles tend to prefer vertical orientations. This is
backed by the biological recordings of Anzai et al (1999a) and also by the observations
made by Ts’o et al (2001) in their study of region V2 of monkeys. The LISSOM results
suggest that regions with a preferred orientation in the vicinity of 90° contain
subcompartments that are selective for relatively large phase disparities, while regions
containing neurons that prefer orientations with a significant horizontal component
might contain substructures with preferred phase disparities around zero.
CHAPTER 6
CONCLUSION AND FUTURE WORK
6.1 CONCLUSION
The topographic representation of features in the primary visual cortex has been the
focus of many studies; it is believed that an understanding of the neural mechanisms
involved in topography can provide the foundations to formulate general theories about
learning, memory and knowledge representation in the brain (Swindale, 1996). The
ordered arrangement of neurons in V1 is a consequence of a self-organizing process that
is believed to take place in both pre-natal and post-natal stages. Computational
modelling is increasingly being used to simulate this process in an attempt to help
neurobiologists in their quest for answers. Many models have successfully simulated the
self-organizing process for features like orientation and ocular dominance, and results
have been consistent with experimental data. But little or no work has been done to
investigate binocular disparity, believed to be a major visual cue for the perception of
depth. This study is the first of its kind to investigate disparity selectivity in LISSOM,
and is one among few others that have used some self-organizing algorithm to
investigate this feature. The main aim of this project was to investigate the input-driven
self-organizing process for disparity selectivity in LISSOM. Simulation results with
simple elongated Gaussians as input have shown that cortical neurons develop left and
right-eye receptive fields that differ in phase. It was found that an overwhelming
majority of the neurons preferred phase disparities close to zero, with the rest having a
preferred disparity of less than 40° for experiments with relatively small retinal
disparities. The type of input used for training seems to affect the overall disparity map,
but not by much; when Gaussians brighter than the background were used as inputs,
almost all of the neurons developed phase disparities of magnitude less than 30°, while
in the case of random bright and dark Gaussian input patterns, the threshold was around
40° for the vast majority. An interesting observation in either case concerns the
relationship between disparity selectivity and orientation preference. The results show
that neurons that prefer vertical or near-vertical orientations tend to develop relatively
large phase differences between their monocular receptive fields, while neurons that
prefer orientations around 0° tend to have similar monocular RFs. This suggests that
cortical regions grouped by orientation preference might be subdivided into
compartments that are in turn organised based on disparity selectivity. Before drawing
firmer conclusions, however, more in-depth computational experiments have to be
carried out. Several disparity-related issues could not be investigated due to time
constraints, the most important one probably being the implementation of complex
cells. The next section describes some of the key points that need to be investigated to
eventually provide a concrete, testable explanation for the self-organization mechanism
of disparity selectivity, so as to assist theoretical neuroscientists in better understanding
the functioning of the brain.
6.2 FUTURE WORK
6.2.1 PHASE INVARIANCE
First and foremost, the phase invariance issue has to be resolved. Although the ODF
model simulates complex cells quite nicely and has proved to produce biologically
plausible results, it lacks developmental capabilities, i.e. it is difficult to integrate it
within a self-organizing model. It is a very specific model that follows certain fixed
rules. There is a need for a more flexible approach that can be integrated within the
self-organizing framework of LISSOM. Owing to a lack of time, such a model could not
be implemented, but a potential approach is proposed in this section to serve as a guide
for extending the work done on the phase invariance issue in LISSOM. It is based on
the temporal trace rule developed by Foldiak (1991) and is largely inspired by a
self-organizing model proposed by Sullivan and R. de Sa (2004).
The temporal trace rule is a modified Hebbian rule in which weight adaptation is based
on the presynaptic activity (x) and on a trace, or average value, of the postsynaptic
activity (ỹ). A trace is a running average of the activation of a neuron, such that activity
of the unit at a particular moment influences learning at a later moment (Foldiak, 1991).
The main idea behind this rule is that input stimuli that are close together on a temporal
scale are likely to have been generated by the same object (Sullivan and R. de Sa, 2004).
The rule is characterized by the following equations, as formulated by Foldiak:

ỹ(t) = (1 - δ) y(t) + δ ỹ(t-1)
Δw(t) ∝ ỹ(t) x(t)

where
ỹ(t) is the trace value for a complex cell at time step t,
ỹ(t-1) is the previous trace value, i.e. the trace value at the previous time step,
y(t) is the instantaneous activation based on the visual stimulus (computed using the dot-product rule),
x(t) is the presynaptic activity, and
δ is a decay term that sets the relative weighting of the current activation and the previous trace.
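As an illustration, one update step of this rule might look as follows. This is a minimal NumPy sketch rather than the Topographica implementation; the learning rate alpha, the default value of delta and the divisive normalization step are illustrative choices.

import numpy as np

def trace_rule_step(w, x, y_trace_prev, delta=0.8, alpha=0.01):
    """One update of a complex cell's afferent weights using the temporal trace rule.

    w            : weight vector from the simple-cell layer to this complex cell
    x            : current presynaptic (simple-cell) activity vector
    y_trace_prev : trace value from the previous time step
    """
    y = float(w @ x)                                     # instantaneous activation y(t) (dot-product rule)
    y_trace = (1.0 - delta) * y + delta * y_trace_prev   # trace update
    w = w + alpha * y_trace * x                          # Hebbian update driven by the trace
    w = w / np.linalg.norm(w)                            # keep weights bounded (divisive normalization)
    return w, y_trace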
The model by Sullivan and R. de Sa comprises three learning rules, namely Hebbian
learning, the temporal trace rule and SOM. It consists of a simple cell layer and a
complex cell layer. The simple cell layer consists of identical groups of neurons with
similar orientation preferences, while the complex cell layer is initially unorganized.
The basic assumption is therefore that self-organization has already taken place among
the simple cells before they contribute to the learning process of the complex cells. This
is not an unrealistic assumption since, as the authors point out, work by Albus and
Wolf (1984) indicated that simple cells in cats are orientation selective before the input
reaches the complex cells in layers 2/3 of the striate cortex. Another assumption made
by Sullivan and R. de Sa is that the complex cells all respond to the same receptive
field, which is very unrealistic, as the authors themselves acknowledge.
For the proposed model, the first assumption will be maintained but the second one will
be dropped. Instead of a fully-connected network between the simple cell layer and the
complex cell layer, each unit in the output layer will have a connection field enclosing a
suitably-sized region in the simple cell layer. If only one eye is considered, there will
therefore be an input sheet, the customary LGN ON and OFF sheets, and two further
sheets corresponding to the simple and complex layers. This is shown in figure
6.1.
Figure 6.1: Proposed model for phase-invariant response
There will be two learning phases, the first one directed towards the topographic
organization of the units in the Simple sheet, and the second one for the learning process
of the complex cells. During the first phase, the connections between the Simple and
Complex sheets would be inactive. One possible type of input for this stage could be
Plus/Minus elongated Gaussians to allow the development of different kinds of RF
profiles in the Simple layer. During the second stage, learning should be switched off
(Topographica provides this option) for the afferent connections to the Simple layer.
The connections from this layer to the Complex layer can then be activated. The inputs
for this phase of training could be in the form of a series of activity wave line stimuli
that are swept across the retina in discretized time steps (Sullivan and R. de Sa, 2004;
Foldiak, 1991). The relative brightness of the line with respect to the background, and
its orientation, can be randomized for each sweep. This kind of activity can be a model
of the pre-natal retinal waves (Sullivan and R. de Sa, 2004; Foldiak, 1991). By
sweeping the line stimulus across the eye, simple units of the appropriate orientation
and phase-preference will be activated in different positions at different moments of
time (Foldiak, 1991). The activation of these simple units would be the input to the
complex cell layer. If activation of these simple units excites a complex cell, then the
trace of this cell is enhanced for a certain period of time (preferably the number of time
steps it would take to traverse the receptive field of the complex cell, although this is not
a trivial matter since the receptive field of a cortical neuron is not explicitly defined in
LISSOM when multiple sheet layers are used). All the connections from the simple
units that get activated during that time period get strengthened according to the trace
rule (Foldiak, 1991). Only those simple cells that have a preferred orientation similar to
the orientation of the bar stimulus would be activated, thus making the complex cell
strongly selective for that orientation. Note that phase independence would also
potentially develop after training since the brightness of the bar stimuli relative to the
background is randomized for each sweep. So, for the region of the retina that
corresponds to the receptive field of a particular complex cell, an optimally-oriented bar
would in theory excite that cell irrespective of its brightness and position, after many
permutations of the input stimuli have been presented. The presence of lateral
connections in the complex cell layer would possibly lead to the formation of an
ordered map for orientation, just like in the current simple-cell versions of LISSOM.
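To make the proposed second phase concrete, the training schedule could be sketched as follows. The sketch is plain NumPy rather than Topographica: simple_response() is a hypothetical stand-in for the frozen, already self-organized Simple sheet, full connectivity replaces the restricted connection fields for brevity, the layer sizes and number of sweeps are illustrative, and trace_rule_step() is the function sketched above. Resetting the traces between sweeps is one possible design choice.

import numpy as np

rng = np.random.default_rng(1)
RETINA = 24                       # retina width in pixels (illustrative)
N_SIMPLE, N_COMPLEX = 200, 50     # illustrative layer sizes

def line_stimulus(position, orientation, contrast, size=RETINA):
    """A thin oriented line, brighter or darker than the mid-grey background."""
    y, x = np.mgrid[0:size, 0:size] - size / 2.0
    d = x * np.cos(orientation) + y * np.sin(orientation) - position
    return 0.5 + contrast * np.exp(-d**2 / 2.0)

W_SIMPLE = rng.standard_normal((N_SIMPLE, RETINA * RETINA)) / RETINA   # frozen simple RFs (placeholder)
W_COMPLEX = rng.uniform(0.0, 0.01, (N_COMPLEX, N_SIMPLE))              # plastic simple-to-complex weights
traces = np.zeros(N_COMPLEX)

def simple_response(image):
    # Placeholder for the settled activity of the frozen Simple sheet.
    return np.abs(W_SIMPLE @ image.ravel())

for sweep in range(500):
    orientation = rng.uniform(0, np.pi)        # orientation randomized for each sweep
    contrast = rng.choice([-0.5, 0.5])         # bright or dark line, randomized for each sweep
    traces[:] = 0.0                            # traces persist within a sweep, reset between sweeps
    for position in np.linspace(-RETINA / 2, RETINA / 2, 12):   # sweep the line across the retina
        x = simple_response(line_stimulus(position, orientation, contrast))
        for i in range(N_COMPLEX):
            W_COMPLEX[i], traces[i] = trace_rule_step(W_COMPLEX[i], x, traces[i])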
The proposed model is just a rough guide to what might be done to get phase invariance
in LISSOM. Its implementation might necessitate a look at some other intricacies that
have been overlooked. It is somewhat different from the model proposed by Sullivan
and R. de Sa. The latter mention that their model captures phase invariance, but refer to
this property as the response of a complex cell to an optimally-oriented stimulus
anywhere within the receptive field of the cell. This is contradictory since it points
towards position-invariant rather than phase-invariant response. The proposed model
tries to deal with both position invariance and phase invariance. If the implementation
of the one-eye model yields reliable results, it can be extended to a two-eye model and
disparity selectivity can be investigated.
6.2.2 DISPARITY SELECTIVITY AND OCULAR DOMINANCE
Many studies have focused on both disparity selectivity and ocular dominance, but it is
still unclear if there is some kind of relationship between these two features. Certain
studies have shown that manipulations of OD columns also affect stereopsis, and that
these two features appear and mature with similar time-courses (Gonzalez and Perez,
1998). Gonzalez and colleagues suggest that “binocularity may not be an obligate
feature of a cell to be involved in the stereoscopic process” (Gonzalez and Perez, 1998).
Some studies have shown that disparity sensitivity is strongly associated with binocular
neurons, as one would expect, but other investigations have surprisingly indicated that
disparity-selective cells are strongly dominated by one eye (Gonzalez and Perez, 1998).
This might all be tied to the type of cell and to the influence of excitatory and inhibitory
connections from each eye. It would be worthwhile to investigate the relationship, if
any, between disparity selectivity and ocular dominance in LISSOM to shed some light
on this issue. The script for the self-organizing process leading to OD, and the OD
map-measuring functionality have been implemented in Topographica during the
course of this project. Unfortunately, proper experiments could not be performed
because one particular functionality in Topographica, namely Joint Normalization, was
not working properly. As its name suggests, it takes all the afferent connections into
consideration for normalization, instead of independently normalizing each connection.
Joint Normalization is important while investigating ocular dominance since input
patterns differ in brightness level on each retina, and therefore there is a need to reflect
these differences among the connection weights during learning.
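To make the distinction concrete, the two normalization schemes can be sketched as follows. This is plain NumPy, not the Topographica code; the projection names, weight values and unit-sum constraint are illustrative.

import numpy as np

def normalize_independently(afferent_weights):
    """Each afferent projection (e.g. left-eye, right-eye) is scaled to unit sum on its own."""
    return {name: w / w.sum() for name, w in afferent_weights.items()}

def normalize_jointly(afferent_weights):
    """All afferent projections share one normalization constant, so a projection that was
    strengthened more during learning keeps a larger share of the total weight."""
    total = sum(w.sum() for w in afferent_weights.values())
    return {name: w / total for name, w in afferent_weights.items()}

# Example: the left-eye weights have grown more than the right-eye ones.
weights = {"LeftRetina": np.array([0.6, 0.6]), "RightRetina": np.array([0.2, 0.2])}
print(normalize_independently(weights))   # both eyes end up with equal total weight (0.5 each)
print(normalize_jointly(weights))         # left eye keeps 0.75 of the total, right eye 0.25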
6.2.3 VERTICAL DISPARITY
Horizontal disparity represents the major visual cue for stereopsis, but it is not enough
to compute stereo depth on its own (Gonzalez and Perez, 1998). Certain reports have
suggested that vertical disparity might play a role in calibrating horizontal disparities in
the depth perception process while other studies deny this (Gonzalez and Perez, 1998).
Once again, computational modelling can help to disambiguate this matter. It is
straightforward to include vertical retinal disparity in the current scripts, but caution
must be observed when setting its value, since physiological studies suggest that
vertical disparities are very small compared to horizontal disparities.
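As a sketch of how such an offset might be parameterized (the variable names and sign convention are hypothetical; in practice the pattern positions are set through the training script's pattern generators):

import numpy as np

rng = np.random.default_rng(2)

max_h_disparity = 2.0    # horizontal retinal disparity, in retinal units (as used in this project)
max_v_disparity = 0.2    # vertical disparity kept much smaller, as physiology suggests

x, y = rng.uniform(-0.5, 0.5, 2)         # nominal pattern centre
dx = rng.uniform(0, max_h_disparity)     # horizontal offset between the two eyes
dy = rng.uniform(0, max_v_disparity)     # vertical offset between the two eyes

left_centre = (x - dx / 2, y - dy / 2)   # pattern centre on the left retina
right_centre = (x + dx / 2, y + dy / 2)  # pattern centre on the right retina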
6.2.4 PRENATAL AND POSTNATAL SELF-ORGANIZATION
Very little is known about the effects of prenatal activity on the development of
disparity selectivity in V1; prior work in this area is almost non-existent. However,
Berns et al (1993) attempted to probe this question, and their experiment yielded some
interesting results. They used a model consisting of two retinas fully connected to a
cortical layer by synaptic weights adapted with a Hebbian rule; the model also included
lateral connections in the cortical layer. They found that with no correlations present
between the eyes, the model developed only monocular cells, whereas with correlation
the cortical neurons formed were completely binocular. With an initial phase of
same-eye correlations followed by a second phase with both same-eye and between-eye
correlations, a mixture of disparity-selective monocular and binocular cells was
observed to form (Berns et al, 1993). Although the model is not a robust representation
of biological systems, the results indicate that disparity selectivity is strongly influenced
by both prenatal and postnatal activity. An extension to the work done in this project
could be the investigation of both spontaneous and visually-evoked activity in the
self-organizing process. Spontaneous activity can be simulated using a noisy-disk model
of retinal waves, and postnatal training can be achieved using stereo images. The
procedure would be similar to the one already used to study the influence of genetic and
environmental factors on the development of orientation maps in V1 using LISSOM. It
would be interesting to observe the impact of stereo images presented after the
noisy-disk phase, especially since the use of stereo images alone did not yield
conclusive results during this study.
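A minimal sketch of the kind of spontaneous-activity input this would involve is given below, written directly in NumPy; the sheet size, disk radius and noise level are illustrative. Drawing the disks at the same (or nearby) positions on the two retinas would model between-eye correlation, while drawing them at independent positions would model same-eye-only correlation, in the spirit of the two phases described by Berns et al.

import numpy as np

def noisy_disk(size=36, centre=(0.0, 0.0), radius=8.0, noise=0.1, rng=None):
    """A roughly circular activity blob with additive noise, mimicking a retinal wave front."""
    rng = rng or np.random.default_rng()
    y, x = np.mgrid[0:size, 0:size] - size / 2.0
    disk = ((x - centre[0])**2 + (y - centre[1])**2 <= radius**2).astype(float)
    return np.clip(disk + noise * rng.standard_normal((size, size)), 0.0, 1.0)

rng = np.random.default_rng(3)
c = rng.uniform(-10, 10, 2)
left_input = noisy_disk(centre=c, rng=rng)                          # disk on the left retina
right_input = noisy_disk(centre=c + rng.normal(0, 1, 2), rng=rng)   # correlated disk, slightly offset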
BIBLIOGRAPHY
[1]. Albus, K., and Wolf, W. (1984). Early postnatal development of neuronal function in the kitten’s visual cortex: A laminar analysis. The Journal of Physiology, 348:153–185.
[2]. Anzai, A., Ohzawa, I., and Freeman, R. D. (1997). Neural mechanisms underlying binocular fusion and stereopsis: position vs. phase. Proc. Natl. Acad. Sci. USA 94, 5438–5443.
[3]. Anzai, A., Ohzawa, I., and Freeman, R. D. (1999a). Neural mechanisms for encoding binocular disparity: position versus phase. J. Neurophysiol. 82: 874–890.
[4]. Barlow, H. B., Blakemore, C., and Pettigrew, J. D. (1967). The neural mechanisms of binocular depth discrimination. J. Physiol. 193, 327–342.
[5]. Berns, G. S., Dayan, P., and Sejnowski, T. J. (1993). A correlational model for the development of disparity selectivity in visual cortex that depends on prenatal and postnatal phases. Proc. Natl. Acad. Sci. USA 90(17), 8277–81.
[6]. Bishop, P. O. (1989). Vertical disparity, egocentric distance and stereoscopic depth constancy: a new interpretation. Proc. R. Soc. London Ser. B 237, 445–469.
[7]. Blake, R., and Wilson, H. R. (1991). Neural models of stereoscopic vision. Trends Neurosci. 14: 445–452.
[8]. Blakemore, C. (1970). The representation of three-dimensional visual space in the cat’s striate cortex. J. Physiol. (Lond.) 209, 155–178.
[9]. Blasdel, G. G., and Salama, G. (1986). Voltage-sensitive dyes reveal a modular organization in monkey striate cortex. Nature, 321:579–585.
[10]. Cowey, A., and Ellis, C. M. (1967). Visual acuity of rhesus and squirrel monkeys. J. Comp. Physiol. Psychol. 64, 80–84.
[11]. DeAngelis, G. C., Ohzawa, I., and Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature 352: 156–159.
[12]. DeAngelis, G. C., Ohzawa, I., and Freeman, R. D. (1995). Neuronal mechanisms underlying stereopsis: how do simple cells in the visual cortex encode binocular disparity? Perception 24: 3–31.
[13]. DeAngelis, G. C., and Newsome, W. T. (1999). Organization of disparity-selective neurons in macaque area MT. J. Neurosci. 19, 1398–1415.
[14]. DeAngelis, G. C. (2000). Seeing in three dimensions: the neurophysiology of stereopsis. Trends in Cognitive Sciences, Vol. 4, No. 3.
[15]. DeValois, R., and Jacobs, G. H. (1968). Primate color vision. Science 162, 533–540.
[16]. DeValois, R. L., and DeValois, K. K. (1988). Spatial Vision. New York: Oxford.
[17]. Farrer, D. N., and Graham, E. S. (1967). Visual acuity in monkeys: a monocular and binocular subjective technique. Vision Res. 7, 743–747.
[18]. Fischer, B., and Krueger, J. (1979). Disparity tuning and binocularity of single neurons in cat visual cortex. Exp. Brain Res. 35: 1–8.
[19]. Fischer, B., and Poggio, G. F. (1979). Depth sensitivity of binocular cortical neurons of behaving monkeys. Proc. R. Soc. Lond. B Biol. Sci. 204: 409–414.
[20]. Fleet, D. J., Wagner, H., and Heeger, D. J. (1996). Neural encoding of binocular disparity: energy models, position shifts and phase shifts. Vision Res. 36: 1839–1857.
[21]. Foldiak, P. (1991). Learning invariance from transformation sequences. Neural Computation, 3:194–200.
[22]. Freeman, R. D., and Ohzawa, I. (1990). On the neurophysiological organization of binocular vision. Vision Res. 30: 1661–1676.
[23]. Gonzalez, F., Krause, F., Perez, R., Alonso, J. M., and Acuna, C. (1993a). Binocular matching in monkey visual cortex: single cell responses to correlated and uncorrelated dynamic random dot stereograms. Neuroscience 52, 933–939.
[24]. Gonzalez, F., and Perez, R. (1998). Neural mechanisms underlying stereoscopic vision. Progress in Neurobiology 55, 191–224.
[25]. Gonzalez, F., Perez, R., Justo, M. S., and Ulibarrena, C. (2001). Binocular interaction and sensitivity to horizontal disparity in visual cortex in the awake monkey. Int. J. Neurosci. 107: 147–160.
[26]. http://www.nei.nih.gov/photo/eyean/ (first visited 13/08/06).
[27]. Harweth, R. S., Smith, E. T., and Siderov, J. (1995). Behavioral studies of local stereopsis and disparity vergence in monkeys. Vision Res. 35, 1755–1770.
[28]. Hubel, D. H., and Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154.
[29]. Hubel, D. H., and Wiesel, T. N. (1977). Functional architecture of macaque monkey visual cortex. Proc. R. Soc. Lond. B 198: 1–59.
[30]. Hubel, D. (1995). Eye, Brain, and Vision. Scientific American Library Series.
[31]. Hübener, M., Shoham, D., Grinvald, A., and Bonhoeffer, T. (1997). Spatial relationships among three columnar systems in cat area 17. J. Neurosci. 17, 9270–9284.
[32]. Joshua, D. E., and Bishop, P. O. (1970). Binocular single vision and depth discrimination: receptive field disparities for central and peripheral vision and binocular interaction on peripheral single units in cat striate cortex. Exp. Brain Res. 10, 389–416.
[33]. Julesz, B. (1971). Foundations of Cyclopean Perception. Chicago: University of Chicago Press.
[34]. Kohonen, T. (1982b). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59–69.
[35]. LeVay, S., and Voigt, T. (1988). Ocular dominance and disparity coding in cat visual cortex. Visual Neurosci. 1, 395–414.
[36]. Levine, M. W., and Shefner, J. M. (1991). Fundamentals of Sensation and Perception, 2nd ed. Pacific Grove, CA: Brooks/Cole.
[37]. Marr, D., and Poggio, T. (1979). A computational theory of human stereo vision. Proc. R. Soc. Lond. B Biol. Sci. 204: 301–328.
[38]. Maske, R., Yamane, S., and Bishop, P. O. (1984). Binocular simple cells for local stereopsis: comparison of receptive field organizations for the two eyes. Vision Res. 24: 1921–1929.
[39]. Miikkulainen, R., Bednar, J. A., Choe, Y., and Sirosh, J. (1997). Self-organization, plasticity, and low-level visual phenomena in a laterally connected map model of the primary visual cortex. In Goldstone, R. L., Schyns, P. G., and Medin, D. L., editors, Perceptual Learning, volume 36 of Psychology of Learning and Motivation, 257–308. San Diego, CA: Academic Press.
[40]. Miikkulainen, R., Bednar, J. A., Choe, Y., and Sirosh, J. (2005). Computational Maps in the Visual Cortex. New York: Springer.
[41]. Nikara, T., Bishop, P. O., and Pettigrew, J. D. (1968). Analysis of retinal correspondence by studying receptive fields of binocular single units in cat striate cortex. Exp. Brain Res. 6: 353–372.
[42]. Nomura, M., Matsumoto, G., and Fujiwara, S. (1990). A binocular model for the simple cell. Biol. Cybern. 63: 237–242.
[43]. Ohzawa, I., and Freeman, R. D. (1986a). The binocular organization of complex cells in the cat’s visual cortex. J. Neurophysiol. 56, 243–259.
[44]. Ohzawa, I., and Freeman, R. D. (1986b). The binocular organization of simple cells in the cat’s visual cortex. J. Neurophysiol. 56, 221–24.
[45]. Ohzawa, I., DeAngelis, G. C., and Freeman, R. D. (1990). Stereoscopic depth discrimination in the visual cortex: neurons ideally suited as disparity detectors. Science 249, 1037–1041.
[46]. Ohzawa, I., DeAngelis, G. C., and Freeman, R. D. (1996). Encoding of binocular disparity by simple cells in the cat’s visual cortex. J. Neurophysiol. 75: 1779–1805.
[47]. Ohzawa, I., DeAngelis, G. C., and Freeman, R. D. (1997). Encoding of binocular disparity by complex cells in the cat’s visual cortex. J. Neurophysiol. 77: 2879–2909.
[48]. Poggio, G. F., and Fischer, B. (1977). Binocular interaction and depth sensitivity of striate and prestriate cortex of behaving rhesus monkey. J. Neurophysiol. 40: 1392–1405.
[49]. Poggio, G. F., and Talbot, W. H. (1981). Mechanisms of static and dynamic stereopsis in foveal cortex of the rhesus monkey. J. Physiol. Lond. 315, 469–492.
[50]. Poggio, G. F., and Poggio, T. (1984). The analysis of stereopsis. Ann. Rev. Neurosci. 7, 379–412.
[51]. Poggio, G. F., Motter, B. C., Squatrito, S., and Trotter, Y. (1985). Responses of neurons in visual cortex (V1 and V2) of the alert macaque to dynamic random-dot stereograms. Vision Res. 25, 397–406.
[52]. Poggio, G. F., Gonzalez, F., and Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: binocular correlation and disparity selectivity. J. Neurosci. 8: 4531–4550.
[53]. Prince, S. J., Pointon, A. D., Cumming, B. G., and Parker, A. J. (2002b). Quantitative analysis of the responses of V1 neurons to horizontal disparity in dynamic random-dot stereograms. J. Neurophysiol. 87: 191–208.
[54]. Qian, N. (1994). Computing stereo disparity and motion with known binocular cell properties. Neural Comp. 6, 390–404.
[55]. Qian, N. (1997). Binocular disparity and the perception of depth. Neuron 18, 359–368.
[56]. Qian, N., and Zhu, Y. (1997). Physiological computation of binocular disparity. Vision Res. 37: 1811–1827.
[57]. Read, J. C. A. (2005). Early computational processing in binocular vision and depth perception. Progress in Biophysics and Molecular Biology 87, 77–108.
[58]. Read, J. C. A., Parker, A. J., and Cumming, B. G. (2002). A simple model accounts for the reduced response of disparity-tuned V1 neurons to anti-correlated images. Vis. Neurosci. 19, 735–753.
[59]. Shmuel, A., and Grinvald, A. (1996). Functional organization for direction of motion and its relationship to orientation maps in area 18. J. Neurosci. 16, 6945–6964.
[60]. Skottun, B. C., DeValois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., and Bonds, A. B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Res. 31, 1079–1086.
[61]. Sirosh, J., and Miikkulainen, R. (1993). How lateral interaction develops in a self-organizing feature map. In Proceedings of the IEEE International Conference on Neural Networks (San Francisco, CA), 1360–1365. Piscataway, NJ: IEEE.
[62]. Sullivan, T. J., and R. de Sa, V. (2004). A temporal trace and SOM-based model of complex cell development. Neurocomputing, 58-60, 827–833.
[63]. Swindale, N. V. (1996). The development of topography in the visual cortex: a review of models. Network: Computation in Neural Systems 7, 161–247.
[64]. Swindale, N. V. (2000). How many maps are there in visual cortex? Cerebral Cortex, Vol. 10, No. 7, 633–643.
[65]. Tootell, R. B. H., and Hamilton, S. L. (1989). Functional anatomy of the second visual area (V2) in the macaque. Journal of Neuroscience, 9, 2620–2644.
[66]. Ts’o, D. Y., Roe, A. W., and Gilbert, C. D. (2001). A hierarchy of the functional organization for color, form and disparity in primate visual area V2. Vision Research, 41, 1333–1349.
[67]. Von der Heydt, R., Adorjani, C. S., Hanny, P., and Baumgartner, G. (1978). Disparity sensitivity and receptive field incongruity of units in the cat striate cortex. Exp. Brain Res. 31: 523–545.
[68]. Von der Malsburg, C. (1973). Self-organization of orientation-sensitive cells in the striate cortex. Kybernetik, 15:85–100. Reprinted in Anderson and Rosenfeld (1988), 212–227.
[69]. Wiemer, J., Burwick, T., and von Seelen, W. (2000). Self-organizing maps for feature representation based on natural binocular stimuli. Biol. Cybern. 82, 97–110.
[70]. Weliky, M., Bosking, W. H., and Fitzpatrick, D. (1996). A systematic map of direction preference in primary visual cortex. Nature 379, 725–728.
[71]. Zhu, Y., and Qian, N. (1996). Binocular receptive field models, disparity tuning, and characteristic disparity. Neural Comput. 8: 1611–1641.