November 2000 Volume 3 Number Supp p 1168

Models of motion detection
Alexander Borst
The author is at the ESPM-Division of Insect Biology, 201 Wellman
Hall, Univ. of California Berkeley, Berkeley, California 94720, USA.
Visual motion detection is one of the most active areas in systems
neuroscience today1, 2, and the cellular mechanisms of directional
selectivity may soon be understood in unprecedented biophysical
detail. Alongside undeniable technical advances such as whole-cell
patch-clamp recording and the retinal slice preparation, a major
determinant of this recent progress is the conceptual foundation laid
almost half a century ago.
Curiously, the story began with two young soldiers during World War
II. A biology student, Bernhard Hassenstein, then 21, met a
19-year-old aspiring physicist, Werner Reichardt. In the craziness of wartime,
they promised each other that, if they survived, they would do
something great together: start the first institute of physics and
biology. In 1958 they founded the Research Group of Cybernetics at
the Max-Planck-Institute of Biology in Tübingen, Germany. In a
congenial collaboration, which still sounds like the goal of every
summer school in computational neuroscience, they did a series of
elegant experiments, using the optomotor response of the beetle
Chlorophanus as a behavioral measure. This response is the animal's
tendency to follow the movement of the visual surround to
compensate for its mistaken perception of self-motion in the opposite
direction. The beetle was glued to a rod so it could not move its body,
head or eyes relative to the surround, but could express its behavior
at decision points by rotating a 'Y-maze globe' under its feet (Fig. 1).
Their results3 led to the development of a model for
motion detection that became known as the
'correlation-type motion detector', the
'Hassenstein-Reichardt model' or, briefly (omitting half the
original team), the 'Reichardt detector' (Fig. 2). The core
computation in this model is a delay-and-compare
mechanism: delaying the brightness signal as
measured by one photoreceptor by a low-pass filter
and comparing it by multiplication with the
instantaneous signal derived from a neighboring
location. Doing this twice in a mirror-symmetrical
fashion and subtracting the output signals of both
subunits leads to a response that is fully directionally
selective. The strict mathematical treatment of this
model4 led to many counterintuitive predictions, which
nevertheless were experimentally verified in many
species' behavior and in many types of neurons (for
review, see ref. 5). For example, the model predicted
that the response, unlike a speedometer, should not
increase continuously with increasing velocity; instead,
going beyond an optimum velocity should decrease the
response. The model also predicted that the optimum
velocity should vary with the pattern's spatial
wavelength so that their ratio remains constant.
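The delay-and-compare scheme and its two predictions can be checked numerically. The following is a minimal sketch, not the original authors' formulation: it assumes a first-order low-pass filter as the delay, a drifting sine grating as the stimulus, and illustrative values for the receptor spacing, the filter time constant and the time step. The function names are hypothetical.

```python
import math


def low_pass(signal, tau, dt):
    """First-order low-pass filter; this plays the role of the delay stage."""
    alpha = dt / (tau + dt)
    y, out = 0.0, []
    for s in signal:
        y += alpha * (s - y)
        out.append(y)
    return out


def reichardt_response(velocity, wavelength, spacing=2.0, tau=0.05,
                       dt=0.001, duration=2.0):
    """Steady-state mean output for a sine grating drifting at `velocity`.

    Units (deg, deg/s, s) and all default values are illustrative assumptions.
    """
    n = int(duration / dt)
    k = 2.0 * math.pi / wavelength
    # Brightness at two neighboring sampling points, `spacing` apart.
    left = [math.sin(k * (0.0 - velocity * i * dt)) for i in range(n)]
    right = [math.sin(k * (spacing - velocity * i * dt)) for i in range(n)]
    left_delayed = low_pass(left, tau, dt)
    right_delayed = low_pass(right, tau, dt)
    # Two mirror-symmetric subunits, each multiplying a delayed signal with
    # the undelayed neighbor; their difference is the detector output.
    output = [ld * r - rd * l
              for ld, r, rd, l in zip(left_delayed, right, right_delayed, left)]
    settled = output[n // 2:]  # discard the filter's start-up transient
    return sum(settled) / len(settled)


def optimum_velocity(wavelength, velocities):
    """Velocity in `velocities` that gives the largest response."""
    return max(velocities, key=lambda v: reichardt_response(v, wavelength))


if __name__ == "__main__":
    velocities = range(10, 310, 10)
    for wavelength in (20.0, 40.0):
        v_opt = optimum_velocity(wavelength, velocities)
        print(wavelength, v_opt, v_opt / wavelength)
```

Under these assumptions the output changes sign when the direction of motion is reversed, rises and then falls again as velocity increases, and peaks at roughly twice the velocity when the wavelength is doubled. The ratio of velocity to wavelength is the temporal frequency of the stimulus, so the detector in this sketch is tuned to temporal frequency, not to velocity as such.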
The theory's influence can hardly be overestimated. It
inspired work on motion vision in many animals,
including humans. In some cases, filters and
parameters of the original model were modified to fit
experimental observations6. In others, researchers
approaching the problem from a different angle arrived
at similar solutions, such as the 'motion energy model',
which despite a different internal architecture is
identical to the original model at its output7. Some of
these studies became famous under their own name,
like the 'Barlow-Levick-model' of motion detection,
arising from experiments on rabbit retinal ganglion
cells8 that were stimulated, not by smooth motion, but
by a sequence of discrete illumination steps in two
neighboring locations, in either the preferred or null
direction for the cell. Barlow and Levick found that the
response to the null direction sequence was
significantly reduced compared to the sum of the
individual responses, whereas the response to the
preferred direction sequence was roughly equal to the
sum of individual responses. The authors proposed a
veto-mechanism or 'null-direction inhibition' as the
basis for direction selectivity. From this study, the
historical thread leads to the proposal that a shunting
inhibition is the cellular implementation of the veto
operation9, and from there directly to the current 'pre
or post' debate over directionally selective ganglion cells.
Thus, the Hassenstein-Reichardt model set the
standard for how researchers thought about visual
motion detection and how they designed experiments.
In a more general sense, it introduced mathematical
techniques and quantitative modeling to biology,
clearly demonstrating that our intuition does not reach
very far; instead we soon reach the point where the
'pen starts getting smarter than the person holding it'.
Far beyond the question of whether the particular
Hassenstein-Reichardt model is correct or not, this has
probably been its most significant contribution to
Figure 1: Tethered Chlorophanus walking on the
Y-maze globe (from ref. 10).
Figure 2: Correlation-type motion detector (from ref.
November 2000 Volume 3 Number Supp p 1170
Computation by neural networks
Geoffrey E. Hinton
The author is in the Gatsby Computational
Neuroscience Unit, University College London, 17
Queen Square, London WC1N 3AR, UK.
Networks of neurons can perform computations that
have proved very difficult to emulate in conventional
computers. In trying to understand how real nervous
systems achieve their remarkable computational
abilities, researchers have been confronted with three
major theoretical issues. How can we characterize the
dynamics of neural networks with recurrent
connections? How do the time-varying activities of
populations of neurons represent things? How are
synapse strengths adjusted to learn these
representations? To gain insight into these difficult
theoretical issues, it has proved necessary to study
grossly idealized models that are as different from real
biological neural networks as apples are from planets.
The 1980s saw major progress on all three fronts. In a
classic 1982 paper1, Hopfield showed that
asynchronous networks with symmetrically connected
neurons would settle to locally stable states, known as
'point attractors', which could be viewed as
content-addressable memories. Although these networks were
both computationally inefficient and biologically
unrealistic, Hopfield's work inspired a new generation
of recurrent network models; one early example was a
learning algorithm that could automatically construct
efficient and robust population codes in 'hidden'
neurons whose activities were never explicitly specified
by the training environment2.
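The settling behavior described above can be illustrated with a very small network. This is a sketch under stated assumptions, not a reconstruction of the 1982 paper: units take the values +1 or -1, the weights are set with a Hebbian outer-product rule (an assumption, since the text does not say how memories are stored), and the two stored patterns are chosen by hand to be orthogonal. All names are hypothetical.

```python
import random


def hebbian_weights(patterns):
    """Symmetric weights with zero self-connections (Hebbian storage rule)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def energy(w, state):
    """Quantity that asynchronous updates can never increase."""
    n = len(state)
    return -0.5 * sum(w[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def settle(w, state, rng, max_sweeps=100):
    """Update units one at a time, in random order, until none changes."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.sample(range(n), n):
            total_input = sum(w[i][j] * s[j] for j in range(n))
            new_value = 1 if total_input >= 0 else -1
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:
            break
    return s


stored_a = [1] * 10 + [-1] * 10
stored_b = [1, -1] * 10
weights = hebbian_weights([stored_a, stored_b])

# Corrupt three units of the first memory and let the network settle.
cue = list(stored_a)
for index in (0, 5, 12):
    cue[index] = -cue[index]

recalled = settle(weights, cue, random.Random(0))
print(recalled == stored_a)  # prints True
```

Starting from the corrupted cue, the network falls into the nearest stored pattern, which is the sense in which a point attractor acts as a content-addressable memory: part of the content is enough to retrieve the rest.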
The 1980s also saw the widespread use of the
backpropagation algorithm for training the synaptic
weights in both feedforward and recurrent neural
networks. Backpropagation is simply an efficient
method for computing how changing the weight of any
given synapse would affect the difference between the
way the network actually behaves in response to a
particular training input and the way a teacher desires
it to behave3. Backpropagation is not a plausible model
of how real synapses learn, because it requires a
teacher to specify the desired behavior of the network,
it uses connections backward, and it is very slow in
large networks. However, backpropagation did
demonstrate the impressive power of adjusting
synapses to optimize a performance measure. It also
allowed psychologists to design neural networks that
could perform interesting computations in unexpected
ways. For example, a recurrent network that is trained
to derive the meaning of words from their spelling
makes very surprising errors when damaged, and
these errors are remarkably similar to those made by
adults with dyslexia4.
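What backpropagation computes can be made concrete on a toy network. The sketch below assumes a network with two inputs, two tanh hidden units and one linear output, a squared-error measure, and arbitrary example weights; none of these choices comes from the text. It compares the derivative obtained in a single backward pass with the brute-force alternative of nudging one weight and measuring the change in error.

```python
import math


def forward(w_hidden, w_out, x):
    """Two-layer network: tanh hidden units, one linear output unit."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
              for row in w_hidden]
    y = sum(w * h for w, h in zip(w_out, hidden))
    return hidden, y


def error(w_hidden, w_out, x, target):
    """Squared difference between actual and desired output."""
    _, y = forward(w_hidden, w_out, x)
    return 0.5 * (y - target) ** 2


def backprop(w_hidden, w_out, x, target):
    """Derivative of the error for every weight, from one backward pass."""
    hidden, y = forward(w_hidden, w_out, x)
    delta_out = y - target
    grad_out = [delta_out * h for h in hidden]
    # Send the output error backward through the output weights, then through
    # the tanh nonlinearity (d tanh(a)/da = 1 - tanh(a)^2).
    delta_hidden = [delta_out * w * (1.0 - h * h)
                    for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_hidden, grad_out


def finite_difference(w_hidden, w_out, x, target, i, j, eps=1e-6):
    """Brute-force check: nudge one hidden weight and re-measure the error."""
    up = [row[:] for row in w_hidden]
    down = [row[:] for row in w_hidden]
    up[i][j] += eps
    down[i][j] -= eps
    return (error(up, w_out, x, target)
            - error(down, w_out, x, target)) / (2.0 * eps)


# Arbitrary example weights, input and teacher signal.
w_hidden = [[0.5, -0.3], [0.8, 0.2]]
w_out = [1.0, -1.5]
x, target = [1.0, 2.0], 0.5

grad_hidden, grad_out = backprop(w_hidden, w_out, x, target)
print(grad_hidden[0][1], finite_difference(w_hidden, w_out, x, target, 0, 1))
```

The two numbers agree to within rounding error. The point of the algorithm is efficiency: the brute-force check needs extra forward passes for every weight, whereas the backward pass delivers all the derivatives at once. Note that the backward pass reuses the output weights in the reverse direction, which is one of the reasons given above for doubting its biological plausibility.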
The practical success of backpropagation led
researchers to look for an alternative performance
measure that did not involve a teacher and that could
easily be optimized using information that was locally
available at a synapse. A measure with all the right
properties emerges from thinking about perception in a
peculiar way: the widespread existence of top-down
connections in the brain, coupled with our ability to
generate mental images, suggests that the perceptual
system may literally contain a generative model of
sensory data. A generative model stands in the same
relationship to perception as do computer graphics to
computer vision. It allows the sensory data to be
generated from a high-level description of the scene.
Perception can be seen as the process of inverting the
generative model—inferring a high-level description
from sensory data under the assumption that the data
were produced by the generative model. Learning then
is the process of updating the parameters of the
generative model so as to maximize the likelihood that
it would generate the observed sensory data.
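The division of labor between perception and learning can be sketched in the simplest possible case. The example below assumes a linear generative model with a single Gaussian hidden cause z and Gaussian sensory noise of known variance, so that the data are generated as x = w z + noise. Learning is done here with expectation-maximization, which is one standard way of increasing the likelihood and is chosen for illustration; the text does not prescribe a particular fitting procedure. The function names, the synthetic data and all numerical values are hypothetical.

```python
import math
import random


def infer(w, sigma2, x):
    """Perception: posterior mean and variance of the hidden cause z given x."""
    ww = sum(wi * wi for wi in w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / (ww + sigma2)
    variance = sigma2 / (ww + sigma2)
    return mean, variance


def log_likelihood(w, sigma2, data):
    """Log probability of the data under x = w*z + noise, z ~ N(0, 1)."""
    d = len(w)
    ww = sum(wi * wi for wi in w)
    log_det = (d - 1) * math.log(sigma2) + math.log(sigma2 + ww)
    total = 0.0
    for x in data:
        xx = sum(xi * xi for xi in x)
        wx = sum(wi * xi for wi, xi in zip(w, x))
        quad = (xx - wx * wx / (sigma2 + ww)) / sigma2
        total += -0.5 * (d * math.log(2.0 * math.pi) + log_det + quad)
    return total


def learn(data, sigma2, w_init, n_iter=30):
    """Learning: EM updates of w, each of which cannot lower the likelihood."""
    w = list(w_init)
    history = [log_likelihood(w, sigma2, data)]
    for _ in range(n_iter):
        numerator = [0.0] * len(w)
        denominator = 0.0
        for x in data:
            mean, variance = infer(w, sigma2, x)      # E-step (perception)
            for k, xk in enumerate(x):
                numerator[k] += xk * mean
            denominator += mean * mean + variance
        w = [nk / denominator for nk in numerator]    # M-step (learning)
        history.append(log_likelihood(w, sigma2, data))
    return w, history


# Synthetic "sensory data" produced by a known generative model.
rng = random.Random(0)
true_w, sigma2 = [2.0, 1.0, -1.0], 0.25
data = []
for _ in range(500):
    z = rng.gauss(0.0, 1.0)
    data.append([wk * z + rng.gauss(0.0, math.sqrt(sigma2)) for wk in true_w])

w, history = learn(data, sigma2, w_init=[0.1, 0.1, 0.1])
print(w, history[0], history[-1])
```

In this sketch `infer` inverts the generative model for one input, and `learn` adjusts the generative weights so that the observed data become more probable. The recovered weights match the true ones only up to an overall sign, because z and -z are equally likely causes.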
Many neuroscientists find this way of thinking
unappealing because the obvious function of the
perceptual system is to go from the sensory data to a
high-level representation, not vice versa. But to
understand how we extract the causes from a
particular image sequence, or how we learn the classes
of things that might be causes, it is very helpful to
think in terms of a top-down, stochastic, generative
model. This is exactly the approach that statisticians
take to modeling data, and recent advances in the
complexity of such statistical models5 provide a rich
source of ideas for understanding neural computation.
All the best speech recognition programs now work by
fitting a probabilistic generative model.
If the generative model is linear, the fitting is relatively
straightforward but can nevertheless lead to impressive
results6, 7. There is good empirical evidence that the
brain uses generative models with temporal dynamics
for motor control8 (see also ref. 9, this issue). If the
generative model is nonlinear and allows multiple
causes, it can be very difficult to compute the likely
causes of a pattern of sensory inputs. When exact
inference is unfeasible, it is possible to use bottom-up,
feedforward connections to activate approximately the
right causes, and this leads to a learning algorithm for
fitting hierarchical nonlinear models that requires only
information that is locally available at synapses10. So
far, theoretical neuroscientists have considered only a
few simple types of nonlinear generative model.
Although these have produced impressive results, it
seems likely that more sophisticated models and better
fitting techniques will be required to make detailed
contact with neural reality.
