Proceedings of 1993 International Joint Conference on Neural Networks
Learning invariant responses to the natural
transformations of objects¹
Guy Wallis,² Edmund Rolls and Peter Földiák.
Oxford University, Department of Experimental Psychology,
South Parks Road, Oxford OX1 3UD, England.
Abstract
The primate visual system builds representations of objects which are invariant with respect to transforms
such as translation, size, and eventually view, in a series of hierarchical cortical areas. To clarify how such
a system might learn to recognise ‘naturally’ transformed objects, we are investigating a model of cortical
visual processing which incorporates a number of features of the primate visual system. The model has a
series of layers with convergence from a limited region of the preceding layer, and mutual inhibition over a
short range within a layer. The feedforward connections between layers provide the inputs to competitive
networks, each utilising a modified Hebb-like learning rule which incorporates a temporal trace of the
preceding neuronal activity. The trace learning rule is aimed at enabling the neurons to learn transform
invariant responses via experience of the real world, with its inherent spatio-temporal constraints. We
show that the model can learn to produce translation-invariant responses.
1 Introduction
There is evidence that over a series of cortical processing stages, the visual system of primates
produces a representation of objects which shows invariance with respect to, for example, translation, size, and view, as shown by recordings from single neurons in the temporal lobe (see Rolls
1992; Tanaka 1988). In a recent paper, Rolls (1992) reviews much of this work, with specific
regard to those cells responsive to faces, and goes on to advance a theory for how these neurons
could acquire their transform independent selectivity. It is this analysis which forms the basis
for the network described here.
Fundamental elements of Rolls' (1992) hypothesis are:
• A series of competitive networks, organised in hierarchical layers, exhibiting mutual inhibition over a short range within each layer.
• A convergent series of connections from a localised population of cells in preceding layers
to each cell of the following layer, thus allowing the receptive field size of cells to increase
through the visual processing areas or layers.
• A modified Hebb-like learning rule incorporating a temporal trace of each cell's previous
activity, which, it is suggested, will enable the neurons to learn transform invariances.
To clarify the reasoning behind the third point, consider the situation in which a single neuron
is strongly activated by a stimulus within a real world object. The trace of this neuron's activation will then gradually decay over a time period of the order of 0.5 s, say. If, during this
¹The research was supported by a grant from the Human Frontier Science Program, and by the Oxford IRC
in Brain and Behaviour.
²G. Wallis received a SERC studentship during the course of this work.
Authorized licensed use limited to: BOSTON UNIVERSITY. Downloaded on August 6, 2009 at 01:27 from IEEE Xplore. Restrictions apply.
limited time window, the net is presented with a transformed version of the original stimulus,
then not only will the initially active afferent synapses modify, but so also will the synapses
activated by this transformed version of the stimulus. In this way the cell will learn to respond
to either appearance of the original stimulus. The cell will not, however, tend to make spurious
links across stimuli that are part of different objects because of the unlikelihood of one object
consistently following another. A possible biological basis for this temporal trace could lie in the
persistent firing of neurons for 300-500 ms observed after presentations of stimuli for as little as
16 ms (Rolls et al., 1993), or alternatively, in the fact that NMDA receptor activated channels
remain activated for periods of up to several hundred milliseconds (Rolls, 1992). What follows
is a summary of the work carried out on a network architecture which has been simulated to
investigate these hypotheses.
2 Theory of Learning
The idea that incorporating a trace of cell activity could aid the learning of natural transforms
of objects was first discussed in detail by Földiák (Földiák 1991). The learning rule used here is
similar to Földiák's, and can be summarised as follows:
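$$\Delta w_{ij} \propto \bar{y}_i(t)\, x_j$$
and
$$\bar{y}_i(t) = (1 - \eta)\, y_i(t) + \eta\, \bar{y}_i(t-1)$$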
where $x_j$ is the jth input to the neuron, $y_i$ is the output of the ith neuron, $w_{ij}$ is the jth weight
on the ith neuron, $\eta$ governs the relative influence of the trace and the new input (typically
0.4 - 0.6), and $\bar{y}_i(t)$ represents the value of the ith cell's trace at time t. Note that in this
simulation neuronal learning is bounded by normalisation of each cell’s dendritic weight vector.
An alternative, more biologically relevant implementation, using a local weight bounding operation, has in part been explored using a version of the Oja update rule (Oja 1982; Kohonen 1984).
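A minimal sketch of one step of a trace learning rule with weight-vector normalisation may help make the mechanism concrete. It assumes that the trace is a running average in which η weights the previous trace and (1 - η) the current output, and that the weight change is proportional to the product of the trace and the input. The function name `trace_update`, the learning rate `rate`, and the choice of unit-length normalisation are illustrative assumptions, not details taken from the simulation described here.

```python
import math

def trace_update(weights, x, y, trace, eta=0.5, rate=0.1):
    """One learning step for a single neuron (illustrative sketch).

    weights, x : lists of equal length (synaptic weights, inputs)
    y          : the neuron's current output
    trace      : the neuron's trace from the previous time step
    """
    # Assumed form: eta weights the old trace, (1 - eta) the new output.
    new_trace = (1.0 - eta) * y + eta * trace
    # Hebb-like change driven by the trace rather than the instantaneous output.
    updated = [w + rate * new_trace * xj for w, xj in zip(weights, x)]
    # Bound learning by renormalising the weight vector to unit length (assumed norm).
    norm = math.sqrt(sum(w * w for w in updated))
    if norm > 0.0:
        updated = [w / norm for w in updated]
    return updated, new_trace

# A cell driven by one view of a stimulus, then shown a transformed view.
w, tr = trace_update([1.0, 0.0], [1.0, 0.0], y=1.0, trace=0.0)
w, tr = trace_update(w, [0.0, 1.0], y=0.0, trace=tr)
print(w, tr)
```

In the usage lines the cell does not respond to the second input at all, yet the residual trace still strengthens the synapse carrying it, which is the effect the rule is meant to exploit.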
3 The Network
3.1 Architecture
The forward connections to a cell in one layer are derived from a topologically corresponding
region of the preceding layer, using a gaussian distribution of connection probabilities to determine the exact neurons in the preceding layer to which connections are made. This schema is
constrained to preclude the repeated connection of any cells. Each cell receives 50 connections
from the 32x32 cells of the preceding layer, with a 67% probability that a connection comes
from within 4 cells of the distribution centre. Fig. 1 shows the general convergent network architecture used, and fig. 2 Rolls' proposal for convergence in the primate visual system. Within
each layer, lateral inhibition between neurons has a radius of effect just greater than the radius
of feedforward convergence just defined. The lateral inhibition is simulated via a linear local
contrast enhancing filter active on each neuron. (Note that this differs from the global ‘winner-take-all’ paradigm implemented by Földiák, 1991.) The cell activation is then passed through a
non-linear cell output activation function.
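The connection scheme can be sketched as follows. This is only an illustration under stated assumptions: the gaussian is taken to be isotropic in two dimensions, so that the 67% figure fixes its width through P(r < R) = 1 - exp(-R²/2σ²), and draws that fall outside the layer are simply rejected. Neither choice is specified in the text, and the name `sample_afferents` is hypothetical.

```python
import math
import random

def sample_afferents(centre, n_conn=50, size=32, radius=4.0, p_within=0.67, rng=None):
    """Pick n_conn distinct cells of a size x size layer around `centre`."""
    rng = rng or random.Random()
    # Assumption: isotropic 2-D gaussian, so P(r < R) = 1 - exp(-R^2 / (2 sigma^2)).
    sigma = radius / math.sqrt(-2.0 * math.log(1.0 - p_within))
    chosen = set()
    while len(chosen) < n_conn:
        row = round(rng.gauss(centre[0], sigma))
        col = round(rng.gauss(centre[1], sigma))
        # Assumption: draws falling off the layer are rejected and redrawn.
        if 0 <= row < size and 0 <= col < size:
            chosen.add((row, col))  # a set rules out repeated connections
    return sorted(chosen)

connections = sample_afferents((16, 16), rng=random.Random(0))
print(len(connections))
```

Collecting the draws in a set is one simple way to satisfy the constraint that no cell is connected twice.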
Figure 1: Hierarchical network structure (Layer 1 to Layer 4).
Figure 2: Convergence in the visual system (stages LGN, V1, V2, V4, TEO, TE, with labels: larger receptive fields; combinations of features; configuration sensitive; view dependent; view independence. Axis: eccentricity / deg).
3.2 Network Input
In order that the results of the simulation might be made more relevant to understanding processing in higher cortical visual areas, the inputs to layer 1 come from a separate input layer
which provides an approximation to the encoding found in visual area 1 (V1) of the primate
visual system. These response characteristics are provided by a series of spatially tuned filters
with image contrast sensitivities chosen to accord with the general tuning profiles observed in
the simple cells of V1. Currently, only even symmetric - bar detecting - filter shapes are used.
The precise filter shapes were computed by weighting the difference of two gaussians by a third
orthogonal gaussian according to the following formula:³
where f is the filter spatial frequency (in the range 0.0625 to 0.25 pixels⁻¹ over four octaves),
θ is the filter orientation (0° to 135° over four orientations), and ρ is the sign of the filter, i.e.
±1. Cells of layer 1 receive a topologically consistent, localised, random selection of the filter
responses in the input layer, under the constraint that each cell samples every filter spatial frequency and receives a constant number of inputs.
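As a rough illustration of this kind of filter, the sketch below builds an even symmetric kernel as a difference of two gaussians across the bar, weighted by a third gaussian along it. The specific constants are assumptions chosen for the example and are not taken from the text: a base width of √2/f, a width ratio of 1.6 between the two gaussians, and an elongation factor of 3 for the orthogonal gaussian. The name `bar_filter` is hypothetical.

```python
import math

def bar_filter(size, f, theta_deg, rho=1, ratio=1.6, elong=3.0):
    """Return an even symmetric bar-detecting kernel as a list of rows.

    size is assumed odd so that the kernel has a central pixel.
    """
    theta = math.radians(theta_deg)
    sigma = math.sqrt(2.0) / f  # assumed: width scales inversely with frequency
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            u = x * math.cos(theta) + y * math.sin(theta)  # across the bar
            v = x * math.sin(theta) - y * math.cos(theta)  # along the bar
            # Difference of two gaussians (assumed width ratio and amplitude).
            dog = math.exp(-(u / sigma) ** 2) - math.exp(-(u / (ratio * sigma)) ** 2) / ratio
            # Weight by a third, orthogonal gaussian (assumed elongation).
            row.append(rho * dog * math.exp(-(v / (elong * sigma)) ** 2))
        kernel.append(row)
    return kernel

# One filter per orientation at a single spatial frequency.
bank = [bar_filter(17, 0.25, angle) for angle in (0, 45, 90, 135)]
print(len(bank), len(bank[0]), len(bank[0][0]))
```

Flipping `rho` to -1 inverts the kernel, which is how a single shape can serve both signs of contrast.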
4 Analysis of learning
In order to test the network, a set of three non-orthogonal stimuli, based upon probable 3-D
edge cues (such as a ‘T’ shape), was constructed. During training these stimuli were chosen
in random sequence to be swept across the ‘retina’ of the network, a total of 1000 times. In
order to assess the characteristics of the cells within the net, a two-way analysis of variance
was performed on the set of responses of each cell, with one factor being the stimulus type
and the other the position of the stimulus on the ‘retina’. A high F ratio for stimulus type
($F_s$), and low F ratio for stimulus position ($F_p$), would imply that a cell had learned a position invariant representation of the stimuli. The discrimination factor of a particular cell was
then simply gauged as the ratio $F_s/F_p$ (a measure for ranking at least the most invariant cells⁴).
³We thank Dr. R. Watt, of Stirling University, for assistance with the implementation of this filter scheme.
⁴We are grateful to Dr. F. Marriott for his statistical advice.
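The discrimination measure can be sketched for a single cell as follows. The sketch assumes one response per stimulus and position (a two-way analysis of variance without replication, with the interaction used as the error term), which may differ from the design actually used. The name `discrimination_factor` and the example responses are hypothetical.

```python
def discrimination_factor(resp):
    """resp[s][p]: response of one cell to stimulus s at position p."""
    n_s, n_p = len(resp), len(resp[0])
    grand = sum(map(sum, resp)) / (n_s * n_p)
    stim_means = [sum(row) / n_p for row in resp]
    pos_means = [sum(resp[s][p] for s in range(n_s)) / n_s for p in range(n_p)]
    # Sums of squares for the two factors and the residual.
    ss_stim = n_p * sum((m - grand) ** 2 for m in stim_means)
    ss_pos = n_s * sum((m - grand) ** 2 for m in pos_means)
    ss_total = sum((resp[s][p] - grand) ** 2 for s in range(n_s) for p in range(n_p))
    ss_err = ss_total - ss_stim - ss_pos
    ms_err = ss_err / ((n_s - 1) * (n_p - 1))
    f_s = (ss_stim / (n_s - 1)) / ms_err  # F ratio for stimulus type
    f_p = (ss_pos / (n_p - 1)) / ms_err   # F ratio for stimulus position
    return f_s, f_p, f_s / f_p

# Made-up responses: the cell prefers stimulus 0 at every position.
example = [[1.0, 0.9, 1.1],
           [0.1, 0.2, 0.0],
           [0.0, 0.1, 0.2]]
print(discrimination_factor(example))
```

A cell whose response depends strongly on stimulus identity and hardly at all on position yields a large ratio, so sorting cells by this value ranks the most invariant ones first.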
Figure 3: Comparison of network discrimination (Layer 4 nets: trace, no trace, random; horizontal axis: cell rank).
Figure 4: Cell showing stimulus selectivity (horizontal axis: location).
To assess the utility of the trace learning rule, nets trained with the trace rule were compared with
nets trained without the trace rule and with untrained nets (with the initial random weights).
The results of the simulations, illustrated in fig. 3, show that networks trained with the trace
learning rule do have neurons with much higher values of $F_s/F_p$. An example of the responses of
one such cell is illustrated in fig. 4. Similar position invariant encoding has been demonstrated
for a stimulus set consisting of faces.
5 Conclusions
The results described in this paper show that the proposed learning mechanism and neural architecture can produce cells with responses selective for stimulus type with considerable position
invariance. Although only translation invariance with a limited number of stimuli has been investigated here, in future investigations it will be important to include the use of a much larger
stimulus set. It would also be of interest to investigate invariance learning for other 'natural'
object image transforms. The ability of the network to be trained with natural scenes may also
help to advance our understanding of encoding in the visual system.
References
Földiák, P. (1991). Learning invariance from transformation sequences. Neural Computation 3(2), 194-200.
Kohonen, T. (1984). Self-organization and associative memory. Pub. Springer-Verlag.
Oja, E. (1982). A simplified neuron model as a principal component analyser. Jnl. Math. Biol. 15, 267-279.
Rolls, E. (1992). Neurophysiological mechanisms underlying face processing within and beyond the temporal cortical areas. Phil. Trans. Roy. Society London Ser. B 335, 11-21.
Rolls, E. and Tovee, M. (1993). Processing speed in the cerebral cortex, and the neurophysiology of visual masking. In preparation.
Tanaka, K., Saito, H., Fukada, Y. and Moriya, M. (1991). Coding visual images of objects in the inferotemporal cortex of the macaque monkey. Jnl. of Neurophysiology 66(1), 170-189.