Feedback on CS 564 Project Proposals - October 30, 2001
The proposals received last week had many excellent pieces, but almost all proposals had something missing, or
failed to draw the pieces together into a coherent work plan. In some cases the literature review was too brief to be
of use; in other cases, the review was not properly factored into the project design. I could rarely get a sense of what
individual group members were going to contribute to the group's efforts. I am thus asking you to submit an
improved version of your proposal by November 13 (email to [email protected]), using both the general
feedback in this message and the specific comments which I will supply on each draft on November 1. You should
not regard this as extra work - instead you should see it as a necessary part of your ongoing efforts to successfully
complete your project in early December.
To avoid misunderstanding, let me be even more explicit about the structure of the proposal for this second
round:
1.
Something I did not ask you for before but which seems necessary: (a) Insert a copy of the material
describing the Specific Aims for your Project in my NSF Proposal. (b) Then provide an abstract (at most
one page) for your Project stating clearly which parts of these Specific Aims you will cover in your Project,
and the general approach you plan to take.
2.
Review the neuroscience literature. Start by stating the general criteria you are using to search for papers
relevant to your Project. Then survey at least 6 to 8 papers, giving the reader (and yourselves!) an in-depth
understanding of what contribution each paper makes and how it makes it. This section should end with a
careful analysis of how these data relate to the proposed modeling. Which data will be used to define the
basic structure of the model? Do any of the data change our previous assumptions? Which data should the
model explain? Which data seem to be noteworthy as challenges for future modeling, but beyond the scope
of the current semester's effort? (As a model for this, I include the review Larry Kite wrote as a solo effort - surely setting a base level for what each group should achieve.)
3.
Review the modeling literature. Start by stating the general criteria you are using to search for modeling
and neural net papers relevant to your Project. Then survey at least 6 to 8 papers, giving the reader (and
yourselves!) an in-depth understanding of what contribution each paper makes and how it makes it. This
section should end with a clear summary of what methods from these papers you plan to use and why; what
methods you will not use and why; and any new ideas you will bring to the modeling. (As a model for this,
I include the review Ryan Mukai wrote as a solo effort - surely setting a base level for what each group
should achieve.)
4.
Now comes the meat of the Proposal: The careful analysis of what you propose to do by early December. If
it is a modeling effort give us the details - building on the conclusions you made in your neuroscience
review {2} and your modeling review {3} to spell out explicitly what model you will build, how you will
characterize inputs and outputs and training rules, and what your criteria are for the success of the model. A
similarly careful plan is required if you choose instead to develop a general architecture for your specific
aim, rather than model one portion in detail.
5.
Following the detailed outline of your model, provide a one-page "Statement of Work": what will be the
role of each group member, and what will be your strategy for coordinating these individual efforts to
produce an integrated product in December.
6.
Gather all the references in your report into a full bibliography at the end. I prefer that you follow Larry's
format: (Author, Year) citation in the body of the report; citations unnumbered but arranged alphabetically
by surname of first author in the Reference list.
Important Notes: If you decide at this stage that one of you wants to pursue a project on his own; or if your group
decides that it would be better to split in two groups with rather different aims, that is fine with me. In that case,
each November 13 report should come from its own (possibly reduced) group. However, whatever the group, all
sections must be genuinely co-authored. (Even if you review papers individually you still need to discuss together
how you will use material from these papers, and then edit the result into a single format.) It is not acceptable to
staple together separate reports by group members which show no signs of integration.
An exemplary review of the neuroscience literature: Larry Kite
[MAA: The survey provides an excellent set of recent papers relevant to the problem to be solved. This needs to be
followed by a careful analysis of how these data relate to the proposed modeling. Which data will be used to
define the basic structure of the model? Do any of the data change our previous assumptions? Which data should
the model explain? Which data seem to be noteworthy as challenges for future modeling, but beyond the scope of
the current semester's effort?]
Literature Review
The following is by no means a complete review of the neuroscience literature related to visual control
of grasping movements. I hope to delve much deeper into the literature for the final presentation. Here, however, is a
representative sample of current results in the control of grasping and reaching and the role of vision therein.
In Boussaoud et al. (1999), the authors examined gaze effects and their relation to the transformation from a retina-centered frame of reference to body-centric coordinates. They posit that the distributed nature of eye position signals across cortical areas suggests that the transformation from retinal to body-centric coordinates does not proceed in a serial fashion through the pathways linking visual and motor cortical areas. Rather, “various stages of
the visuomotor pathways, such as the posterior parietal cortex and the dorsal pre-motor cortex, contain the necessary
signals for an implicit representation of targets using eye position and retinal information.” This is not to suggest
that there are not neurons that explicitly code target information in a head-centered reference frame. It does suggest
that neuronal populations at all levels of the transformation might create a distributed, implicit coding of target
location and movement direction. In conclusion, the authors write: “A comprehensive theory of visuomotor
transformations must take into account the distributed nature of gaze modulation of the discharge rates of individual
neurons across the cerebral cortex.” Implicit in their findings, the authors state, is that interactions between parietal
areas and PMd might play a role in building flexible, task-dependent reference frames for coding target location and
coordinated gaze and arm movements. Further, mixtures of coordinate systems may emerge at a behavioral level
from the distributed neuronal representations using multiple reference frames. The authors also note that there have
been no published models where gaze and retinal signals are combined to code for movement kinematics.
In Rosenbaum et al. (1999), the authors present a computational model for solving the inverse kinematics
problem for reaching and grasping movements. The essence of their idea is that movements are specified in order to
satisfy a hierarchy of cost constraints. Once a suitable goal position is found, a straight-line interpolation in joint
space is calculated to go from the starting posture to the goal posture. Note that goal postures are planned before
movements. If this were not the case, one would have to run through the movement in one’s head to find an
appropriate goal posture. The model also takes into account obstacle avoidance, calculating “via postures”,
intermediate postures through which an obstacle can be avoided. It is interesting to note that their model accurately
predicted certain behavior seen in human reaching and grasping. For instance, grasping for larger objects results in a
larger hand aperture during the reach. Furthermore, the maximum aperture comes later in the movement toward
larger objects.
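To make the interpolation step concrete, here is a minimal sketch of straight-line interpolation in joint space; it illustrates the idea only and is not Rosenbaum et al.'s implementation - the postures and step count are arbitrary choices of ours.

```python
# Minimal sketch of straight-line interpolation in joint space between a
# starting posture and a goal posture. Illustrative only; not Rosenbaum
# et al.'s model. Postures are vectors of joint angles (radians).
import numpy as np

def joint_space_trajectory(start_posture, goal_posture, steps=50):
    """Linearly interpolate each joint angle from start to goal."""
    start = np.asarray(start_posture, dtype=float)
    goal = np.asarray(goal_posture, dtype=float)
    return [(1.0 - t) * start + t * goal for t in np.linspace(0.0, 1.0, steps)]

# Example: a hypothetical three-joint arm moving between two postures.
trajectory = joint_space_trajectory([0.0, 0.5, 1.0], [1.2, 0.1, 0.4])
```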
In Brochier et al. (1999), a muscimol inactivation study in the monkey, the authors concluded that cutaneous
feedback to SI is essential for fine control of grip forces and that there is a close relationship between SI and MI in
controlling the precision grip. With injections into SI, finger movements could not be coordinated. However,
performance was improved when the monkey had access to visual cues for control.
In Connolly & Goodale (1999), the authors noted a tight coupling between transport and grip components in a
grasping task. When visual feedback of a limb was prevented in human subjects, reach duration was longer with
proportionate increases in both the acceleration and deceleration phases. However, maximum grip aperture was the
same for both visually augmented reaching/grasping and the case in which visual feedback was removed. Thus, the
authors report, the posture of the hand can be programmed without visual feedback. The conclusion that there exists
a tight coupling between transport and grip was based on the fact that the relative timing of acceleration and
deceleration was unchanged between open-loop (reaching without vision) and closed-loop (reaching with vision)
tasks. The authors do note, however, that visual feedback of the hand is used selectively to guide the closing phase
of the hand movement to an object as the hand becomes more foveated. It is important to note that grasp and
transport are temporally coupled, but functionally distinct.
In Neggers & Bekkering (2000), the authors demonstrated that when ocular gaze is fixated on the target of a
pointing movement that has already started, a second saccade cannot be started until the pointing movement is
completed. This is the case even though the human subject is aware that the second target is presented and wants to
saccade to the second target. The authors conclude that there is an active saccadic inhibition process, which keeps
ocular fixation on a target.
In Santello & Soechting (1998) the authors demonstrate that in a grasping task, the precise configuration of each finger need not be specified before the object is grasped. Instead, tactile feedback can be used to mold the
hand to the object’s precise contours. The only requirement is that the grip be wide enough. However, at a point past
the half-way point in the movement, the shape of the object being grasped (e.g., concave vs. convex) can be
determined through analysis of the finger positions using discriminant functions.
In Jenmalm et al. (2000), the authors show that human subjects use visual information to identify the grip-force
requirements of a grasp well before somatosensory information is available. Visual information is also used to
access stored memory information of previous experiences in grasping a given object. Such information can be used
to “set” motor command parameters in advance of the grasp.
In Inoue et al. (1998), a PET study of pointing with visual feedback of the hands in humans, the authors attempt
to locate the brain regions where movements are processed to allow accurate pointing. They conclude that the
supramarginal cortex, the posterior cingulate cortex of the left hemisphere, and the cerebellum are involved in the
integration of visual feedback of hand movements and accurate pointing.
In Ferraina et al. (2001), the authors purport to show that parietal region PEc is a visuomotor region, rather than
a somatosensory region, as widely believed. They show that PEc is “an early node of the parietal system underlying
eye-hand coordination during reaching.” The authors also note the influence of eye position signals on reach-related
activity in the superior parietal lobule.
In Johansson et al. (2001), the authors studied human gaze behavior, hypothesizing that “the brain uses gaze
fixations to obtain spatial information for controlling manipulatory actions.” An important finding was that the
human subjects never fixated or tracked their own hand in the designated task. Instead, they directed their gaze
almost exclusively to the objects presented in the task. Further, in the manipulation task, the subjects mainly directed
their gaze at locations that were critical for the control of the task, rather than at intrinsic features of the objects. The
kinematics of the task determined when gaze was shifted between landmarks. The authors also note that the gaze
shift processes are phasically coupled to the neural programs controlling the hand. They propose that the anchoring
of the gaze at certain points acts as spatiotemporal checkpoints for the development of correlations between
somatosensory and visual information and the signals required for predictions of motor commands in manipulatory
tasks.
In Churchill et al. (2000), a human study of prehension in the presence of visual cues, the authors find that “visual
contact with the hand and the environment does not influence the transport component until the hand nears the
object.” In the absence of environmental cues, however, vision of the hand becomes more important. Further, their
experiments showed that the moving hand opens wider when it cannot be seen, increasing the chance of the object
being contacted by one finger. Additionally, peak aperture was wider in reaching with vision, but without
environmental cues, than it was in grasping in the light, which leads the authors to the conclusion that the visual
environment plays a role in the control of grip formation.
Finally, in Battaglia-Mayer et al. (2001), a single-neuron recording study in rhesus monkeys with relevance to
insights into optic ataxia, the authors find that the visual properties of neurons in regions V6A and PEc in the
superior parietal lobule are implicated in the process of visually perceiving moving objects, including the hand, in
the visual field. Accordingly, populations of neurons in these areas may play a major role in visually monitoring
hand position and the movement of the hand in the visual field. It is particularly these neurons’ sensitivity to optic
flow that suggests that they play a role in the analysis of self-motion. They develop the idea of a “global tuning
field” of parietal neurons, analogous to the receptive fields of, for example, visual neurons in V1, in which parietal
neurons respond to movement in a particular direction. The existence of global tuning fields of parietal neurons has
several implications. First, the directional properties of information implicitly coded in parietal neurons could
facilitate the combination of signals on the basis of spatial congruence. In other words, perhaps eye and hand signals
can be dynamically recombined to encode spatial information. Second, since the global tuning property is found in
2/3 of parietal neurons, the activities of parietal neurons are in some sense context-dependent. This allows for
flexible combinations of signals, but means that no permanent assignment of coding schemes can be made to
parietal neurons.
References
Arbib, M., 2001, Brain Theory and Artificial Intelligence, Lecture Notes.
Batista, A.P., Newsome, W.T., 2000, Visuo-Motor Control: Giving the brain a hand, Current Biology, 10: R145-R148
Battaglia-Mayer, A., Ferraina, S., Genovesio, A., Marconi, B., Squatrito, S., Molinari, M., Lacquaniti, F., Caminiti,
R., 2001, Eye-Hand Coordination during Reaching. II. An Analysis of the Relationships between Visuomanual
Signals in Parietal Cortex and Parieto-frontal Association Projections, Cerebral Cortex, 11: 528-544
Biggs, J., Horch, K., Clark, F.J., 1999, Extrinsic muscles of the hand signal fingertip location more precisely than
they signal the angles of the individual finger joints, Exp Brain Res, 125: 221-230
Boussaoud, D., Bremmer, F., 1999, Gaze Effects in the Cerebral Cortex: Reference Frames for Space Coding and
Action, Exp Brain Res, 128: 170-180
Brochier, T., Boudreau, M., Pare, M., Smith, A.M., 1999, The effects of muscimol inactivation of small regions of
motor and somatosensory cortex on independent finger movements and force control in the precision grip. Exp
Brain Res, 128: 31-40
Carey, D.P., 2000, Eye to hand or hand to eye?, Current Biology, 10: R416-R419
Churchill, A., Hopkins, B., Rönnqvist, L., Vogt, S., 2000, Vision of the hand and environmental context in human
prehension, Exp Brain Res, 134: 81-89
Connolly, J.D., Goodale, M.A., 1999, The role of visual feedback of hand position in the control of manual
prehension, Exp Brain Res, 128: 281-286
Ellis, R.R, Flanagan, J.R., Lederman, S.J., 1999, The Influence of Visual Illusions on Grasp Position, Exp Brain
Res, 125: 109-114
Fagg, A.H., Arbib, M.A., 1998, Modeling parietal-premotor interactions in primate control of grasping, Neural Networks, 11(7-8): 1277-1303
Ferraina, S., Battaglia-Mayer, A., Genovesio, A., Marconi, B., Onorati, P., Caminiti, R., 2001, Early Coding of
Visuomanual Coordination During Reaching in Parietal Area PEc, J. Neurophysiol., 85: 462-467
Ferraina, S., Johnson, P.B., Garasto, M.R., Battaglia-Mayer, A., Ercolani, L., Bianchi, L., Lacquaniti, F., Caminiti,
R., 1997, Combination of Hand and Gaze Signals During Reaching: Activity in Parietal Area 7m of the
Monkey, J. Neurophysiol., 77: 1034-1038
Fogassi, L., Gallese, V., Buccino, G., Craighero, L., Fadiga, L., Rizzolatti, G., 2001, Cortical mechanism for the
visual guidance of hand grasping movements in the monkey: A reversible inactivation study, Brain, 124, 571-586
Gallese, V., Craighero, L., Fadiga, L., Fogassi, L., 1999, Perception Through Action,
http://psyche.cs.monash.edu.au/v5/psyche-5-21-gallese.html
Gallese, V., The acting brain: reviewing the neuroscientific evidence,
http://www.uniroma3.it/kant/field/bermudezsymp_gallese.htm
Husain, M., Jackson, R.J., 2001, Vision: Visual space is not what it appears to be, Current Biology, 11:R1-R4
Illert, M., Kummel, H., 1999, Reflex pathways from large muscle spindle afferents and recurrent axon collaterals to
motoneurones of wrist and digit muscles: a comparison in cats, monkeys and humans. Exp Brain Res, 128: 13-19
Inoue, K., Kawashima, R., Satoh, K., Kinomura, S., Goto, R., Koyama, M., Sugiura, M., Ito, M., Fukuda, H., 1998,
PET Study of Pointing With Visual Feedback of Moving Hands, J. Neurophysiol., 79: 117-125
Jenmalm, P., Dahlstedt, S., Johansson, R.S., 2000, Visual and Tactile Information About Object-Curvature Control
Fingertip Forces and Grasp Kinematics in Human Dextrous Manipulation, J. Neurophysiol., 84: 2984-2997
Johansson, R.S., Westling, G., Backstrom, A., Flanagan, J.R., 2001, Eye-Hand Coordination in Object
Manipulation, Journal of Neuroscience, 21(17): 6917-6932
Neggers, S.F.W, Bekkering, H., 2000, Ocular Gaze is Anchored to the Target of an Ongoing Pointing Movement, J.
Neurophysiol., 83: 639-651
Rosenbaum, D.A., Meulenbroek, R.G.J., Vaughan, J., Jansen, C., 1999, Coordination of Reaching and Grasping by
Capitalizing on Obstacle Avoidance and Other Constraints, Exp Brain Res, 128: 92-100
Santello, M., Soechting, J.F., 1998, Gradual Molding of the Hand to Object Contours, J. Neurophysiol., 79: 1307-1320
Simoes, C., Mertens, M., Forss, N., Jousmaki, V., Lutkenhoner, B., Hari, R., 2001, Functional Overlap of Finger
Representations in Human SI and SII Cortices, J. Neurophysiol., 86: 1661-1665
von Donkelaar, P., Lee, J., Drew, A.S., 2000, Transcranial Magnetic Stimulation Disrupts Eye-Hand Interactions in
the Posterior Parietal Cortex, J. Neurophysiol., 84: 1677-1680
Wolpert, D.M., 1998, Multiple paired forward and inverse models for motor control, Neural Networks, 11: 1317-1329.
An exemplary review of the neural net literature: Ryan Mukai
[MAA: The survey is rooted in a general view of the problem to be solved; this grounds a search for papers which
let one proceed from known approaches to the discovery of papers which seem to provide techniques needed for
development of the new model.]
The Use of Self-Organizing Maps
The brain is described by Arbib [7] as a “layered, somatotopic, distributed computer”, a reference to several key
facts about the brain pointed out by Arbib, Kohonen, and Haykin [8]. A primate’s cerebral cortex is organized into
many distinct processing regions. For example, areas 17, 18, and 19 located in the occipital lobe of the human
cerebral cortex are responsible for visual processing [7,8]. Area 46 in the macaque monkey is believed to play a role
in short-term task-related memory [5], while area F5, the area of our primary interest, has been shown to play a key
role in both grasping behavior and in facial control [5]. Within areas 17, 18, and 19, Hubel and Wiesel [7,8]
discovered highly ordered sensory maps, with cells exhibiting such features as ocular dominance, spatial sensitivity
in the case of simple cells, and orientation sensitivity in the case of both simple and complex cells. The very
organization of the simple cells forms a retinotopic map, and both simple and complex cells are organized into
arrays with smooth, continuous variations in their orientation sensitivity. The organization of the feline visual
cortex clearly follows a layered, retinotopic pattern at least in its earlier processing stages. Furthermore, studies of
the frog by Lettvin indicate the presence of four retinotopic layers corresponding to four classes of ganglion cells in
the frog’s tectum. In both frog and cat cases, we see that the earlier stages of visual processing are based on highly organized retinotopic maps which perform low-level feature extraction, although the frog’s visual system is clearly
designed to respond to highly specific stimuli while that of the cat is designed to yield a broader and more general
picture of the world. Nonetheless, the visual cortex provides an excellent example of a layered, somatotopic
(retinotopic) processing array.
The SOM, or self-organizing map, was developed by Kohonen in an effort to model the somatotopic maps
found in the cerebral cortex, and it has produced very good results [8]. Like a section of an animal’s cerebral cortex,
the SOM will form regions on its two-dimensional surface (although other dimensionalities are possible, we
won’t discuss them here) in which similar features are mapped relatively close to each other while features that are
more distant are mapped relatively far away from each other. This indeed bears a strong resemblance to the sort of
map organization found in the visual cortex, and since other brain regions, including motor control regions, are
believed to exhibit similar organization, the Kohonen map is a very good first-order model of organization of a small
patch of the brain. In our case, studies by Rizzolatti and Sakata suggest a similar organized map of the F5 and AIP
regions of the brain of the macaque monkey, with various regions corresponding to various types of grasps or
various stages of execution [5]. This makes the Kohonen map a good starting point for studying self-organization
and development of F5 canonical neurons.
It is often pointed out in the literature [1,2,3,4] that the SOM is not designed for time-domain processing. Yet it
is clear from both the FARS [5] and MNS [6] models of grasping behavior in the monkey that grasping is a
temporally organized task. Hence, it is necessary that some form of temporal learning and temporal control occur in
the grasping system of a monkey, and this certainly applies to the population of F5 canonical neurons modeled in the
FARS model. In FARS, the temporal sequencing of various grasping stages is modeled within F5 itself, although it
is presently believed that the F6 area handles actual temporal control and sequencing (reflex sequencing versus
externally controlled sequencing) [5]. In any event, the weakness of the basic SOM, its lack of inherent temporal
processing capability, certainly needs to be addressed if we are to apply it to the problem of modeling development
of the F5 canonical population.
The Basic SOM
Having argued that the SOM provides a way to model development of the F5 canonical neurons, based on the
previous successes of SOM at modeling the development of other sections of cerebral cortex, we now present the
SOM in its basic form. A very detailed discussion of the SOM may be found in either Kohonen or Haykin [8].
If we arrange artificial neurons in a two-dimensional array, we can assign each neuron a random initial weight vector w_i that is of the same dimensionality as the input vector x. The learning procedure proceeds as follows:
1.
When x comes in at the input, each neuron in the lattice checks the Euclidean distance from its own weight vector to the input vector: ||x − w_i||. The closest neuron “wins” the competition (this is how lateral inhibition is implemented).
2.
Let i be the index of the winning neuron. Then each neuron k updates its weight according to w_k(n+1) = w_k(n) + η(i,k) [x − w_k(n)], where the learning parameter η(i,k) includes a neighborhood function, which decays as separation between neurons in the lattice increases in order to simulate lateral inhibition.
3.
The output of the network is usually the simple firing of the winning neuron while the others remain silent. However, it is also possible to compute internal activity levels and use those as output.
The radius of the Gaussian neighborhood function included in the learning parameter above usually starts very
wide during the “ordering phase” of learning when the network is assigning different regions of input space to
different regions on its own two-dimensional surface. This neighborhood decreases until it is essentially down to
one neuron when we get to the “tuning phase”, at which time each neuron begins to specialize on certain regions of
the input space. Neighboring neurons specialize in neighboring regions of the input space at the end, so the resulting
network can be thought of as a somatotopic map relating the output space to the input space.
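To make the procedure concrete, here is a minimal sketch of the training loop just described, with a Gaussian neighborhood whose radius decays from the ordering phase toward the tuning phase. The lattice size, input dimensionality, and schedules are illustrative choices of ours, not values from Kohonen or Haykin.

```python
# Minimal sketch of the basic SOM training loop described above.
# All sizes and schedules are illustrative.
import numpy as np

rng = np.random.default_rng(0)

GRID = 10            # 10 x 10 lattice of neurons
DIM = 3              # dimensionality of input vectors x
W = rng.random((GRID, GRID, DIM))   # random initial weight vectors w_i

# Lattice coordinates, used to evaluate the neighborhood function.
coords = np.stack(np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij"), axis=-1)

def train_step(x, lr, sigma):
    # Step 1: each neuron computes ||x - w_i||; the closest neuron wins.
    dists = np.linalg.norm(W - x, axis=-1)
    winner = np.unravel_index(np.argmin(dists), dists.shape)
    # Step 2: Gaussian neighborhood eta(i, k) decays with lattice distance
    # from the winner i, simulating lateral inhibition.
    lattice_d2 = np.sum((coords - np.array(winner)) ** 2, axis=-1)
    eta = lr * np.exp(-lattice_d2 / (2.0 * sigma ** 2))
    # w_k(n+1) = w_k(n) + eta(i, k) * (x - w_k(n))
    W[...] += eta[..., None] * (x - W)
    return winner

# Ordering phase: wide neighborhood; tuning phase: sigma shrinks toward one neuron.
for n in range(2000):
    sigma = max(0.5, 5.0 * (1.0 - n / 2000))   # decaying neighborhood radius
    lr = 0.5 * (1.0 - n / 2000) + 0.01
    train_step(rng.random(DIM), lr, sigma)
```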
The above network is clearly designed to process in a vector-by-vector fashion, and, except for the learning
rule, there is little notion of time-domain operation or of time-sequencing, items that are crucial to a temporally
sequenced behavior such as grasping. Accordingly, we begin examining some approaches found in the literature
that attempt to extend the basic SOM to deal with time-series data.
Temporal Processing with the SOM
In 1992, Kangas et al. examined the problem of using an SOM as part of a speech recognition system. Many
speech recognition systems use hidden Markov models in conjunction with a vector classification algorithm to
segment an utterance into phonemes and to recognize the individual phonemes. A Kohonen map could be used in
the vector classification stage, replacing the more traditional k-means algorithm. However, temporal sequencing
issues are still handled by the hidden Markov model [4].
Kangas et al. propose the use of a time trace over the two-dimensional surface of the SOM [4]. The idea is
this: a traditional SOM is trained on phonemes until it forms a topological map with similar phonemes close together
on the map. When utterances are fed into the SOM, a time-domain trace of the evolution of the map’s output can be
sent to a higher-level processor. Utterances, such as individual words spoken in isolation, can be recognized by
examination of the traces drawn on the surface of the SOM, with each utterance having a characteristic trace.
Kangas et al. propose two separate methods (a sketch of the first follows the list):
1.
Traditional SOM firing. Only the winning neuron fires. The trace of neurons that fire during an utterance
delineates a curve over the surface of the map. This time-domain trace is processed in order to recognize
the word.
2.
Use of neural activities. A time-record of all neural activities is kept, leaving a “fuzzier” trace. These data
are sent to a higher level for processing. Leaky integrators are used at the neural outputs.
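As a rough illustration of method 1, the following sketch records the winner trace over a sequence of input frames and compares traces by summed lattice distance. The `som_winner` callable and the comparison rule are stand-ins of our own, not Kangas et al.'s actual recognizer.

```python
# Sketch of method 1 above: record the trace of winning units over an
# utterance. `som_winner` is assumed to return the lattice coordinates of
# the winning neuron for one input frame (e.g., from the SOM sketched earlier).
import numpy as np

def winner_trace(frames, som_winner):
    """Map a sequence of feature frames to a curve on the SOM surface."""
    return [som_winner(x) for x in frames]

# A downstream recognizer might compare traces, e.g. by summed lattice
# distance to stored template traces (a simple stand-in for the higher-level
# processing Kangas et al. leave to a separate stage).
def trace_distance(trace_a, trace_b):
    a, b = np.array(trace_a, float), np.array(trace_b, float)
    n = min(len(a), len(b))
    return float(np.linalg.norm(a[:n] - b[:n], axis=1).sum())
```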
The two methods have yielded word recognition accuracy ranging from 89.9% to 92.0%. This does not require
modification of the SOM itself, and it does illustrate something interesting. In the FARS model [5], different
subpopulations of the F5 canonical neurons will fire during different stages of the grasp (i.e. some fire more during
pre-shaping, some fire more during the closing phase, some during the holding phase, etc.). One may thus
hypothesize that similar time traces may occur over the surface of the F5 area during the time-course of a grasp.
Furthermore, since F6 is believed to be responsible for sequencing, it is even possible that a simpler, non-time-domain model of the SOM similar to that of Kangas may prove useful in understanding the development and operation of F5, but we will discuss other time-domain approaches, along with our reasons for pursuing them.
In 1998, Koskela et al. proposed the RSOM, or Recurrent Self-Organizing Map, in their paper [2]. They described a previous algorithm, the TKN (Temporal Kohonen Map), and compared it analytically and experimentally to the RSOM procedure. We begin by noting that the TKN is very similar to the traditional SOM. The main difference lies in the outputs. A traditional SOM simply fires its winning neuron while leaving the others silent. By contrast, a TKN will fire all of its neurons to varying degrees, depending on the degree of excitation. The output
of each neuron is a leaky integrator unit, so there is a time-decay involved. Hence, this network yields time-domain
behavior. Leaky integrators are influenced by past outputs as well as by present outputs, resulting in an overall
network output that is a time-domain response to a time-domain signal.
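A minimal sketch of such a leaky-integrator output stage follows; the decay constant is an illustrative value of ours, not one taken from the TKN papers.

```python
# Minimal leaky-integrator output stage, as used in the TKN: each neuron's
# output decays over time while accumulating its current excitation.
import numpy as np

def leaky_outputs(excitations, decay=0.8):
    """excitations: iterable of per-neuron excitation vectors over time."""
    y = None
    history = []
    for e in excitations:
        e = np.asarray(e, float)
        y = e if y is None else decay * y + e   # past outputs leak into the present
        history.append(y.copy())
    return history
```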
However, it can also be shown that using a traditional SOM and modifying only its output typically yields
suboptimal results. Koskela [2] demonstrates this suboptimality and proposes the RSOM, illustrating its superiority, which is due to a difference in the manner of training. We summarize Koskela’s training ideas below:
1.
At each iteration, a “temporally leaked difference vector”, in the words of Koskela [2], is computed for each neuron. The equation for the ith neuron is y_i(n) = (1 − α) y_i(n−1) + α [x(n) − w_i(n)]. Here, α is the memory parameter. A larger value of α corresponds to short memory (emphasizes current differences over past differences), while a smaller value of α corresponds to long memory (emphasizes past differences more than current differences).
2.
Instead of directly applying x(n) − w_i(n) as the “correction” to the neuron’s existing weight, as we would in the traditional SOM, we apply y_i(n). This yields a weight update that depends both on the present weight difference and on past weight differences. The result is that the network is really trained on a weighted temporal average of the input vectors rather than on the raw vectors themselves.
3.
The resulting weight update equation is w_k(n+1) = w_k(n) + η(i,k) y_i(n).
Here we see the use of weighted time-averages in the training process, and [2] shows a significant improvement
in performance over TKN.
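The following sketch puts the three steps together, reusing the lattice conventions of the earlier SOM sketch; all parameter values are illustrative, not Koskela's.

```python
# Sketch of the RSOM update summarized above. The leaked difference vector
# y_i is maintained per neuron; alpha is the memory parameter (small alpha =
# long memory). Grid size and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
GRID, DIM = 10, 3
W = rng.random((GRID, GRID, DIM))
Y = np.zeros_like(W)                      # temporally leaked difference vectors
coords = np.stack(np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij"), axis=-1)

def rsom_step(x, alpha=0.3, lr=0.1, sigma=2.0):
    global Y
    # y_i(n) = (1 - alpha) * y_i(n-1) + alpha * (x(n) - w_i(n))
    Y = (1.0 - alpha) * Y + alpha * (x - W)
    # The winner is the neuron whose leaked difference is smallest.
    winner = np.unravel_index(np.argmin(np.linalg.norm(Y, axis=-1)), (GRID, GRID))
    lattice_d2 = np.sum((coords - np.array(winner)) ** 2, axis=-1)
    eta = lr * np.exp(-lattice_d2 / (2.0 * sigma ** 2))
    # w_k(n+1) = w_k(n) + eta(i, k) * y_i(n): the correction is the leaked
    # difference, so training follows a weighted temporal average of inputs.
    W[...] += eta[..., None] * Y
    return winner
```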
We find the TKN and RSOM concepts interesting for two reasons. First, the F5 canonical neurons are believed
to fire in a population-coded style, like many or most other neurons in the brain [5,6]. The use of leaky-integrator
outputs in the TKN [2] and by Kangas et al. [4] yields a more realistic model, from a biological perspective. Second,
RSOM captures the idea of using a form of time averaging on the inputs prior to presentation to the raw SOM for
training purposes, thus teaching the neurons to respond to a weighted time average of the input. This will have
implications for our model of how F5 canonical neurons may learn and self-organize during developmental stages.
It should be borne in mind that for small values of α, corresponding to a long memory, the network’s weight vectors will tend toward those inputs which have a great deal of temporal reinforcement.
The CSOM (Contextual Self-Organizing Map) model is another improved SOM, proposed by Voegtlin in 2000
[3]. Here, the objective is to have the network actually learn to recognize context in a temporal sequence of input
data. Voegtlin defines a context as the series of input data from the beginning up to the present. In general, the length
of this sequence approaches infinity, so it is not feasible to remember an entire context. However, it would be useful
to train a self-organizing map which can fire its units in response to a context seen so far, remembering the most
recent portion of that context only since infinite memory is impossible. This would give the network the ability to
recognize time-domain sequences even as they are coming in, and such a system would provide a more realistic
model of a biological network responding to a temporally ordered sequence of events.
Voegtlin’s network could fire in response to the most commonly occurring series of contexts. For example, if
the bit sequence “0110” was generated by the bit-stream generator, a neuron in one portion of the map would fire as
soon as the last bit came in. Another neuron may fire when the sequence “01001” arrives, firing when the fifth bit
comes in, and so on. This requires the network to actually have a form of memory, since the bits are fed in serially.
The Voegtlin network was able to come up with a nearly optimal scheme for context recognition. In a few
cases, the same sequence could trigger firing of more than one neuron. For example, a neuron may respond to
“0110” while its neighbor responds to “01101”. The first neuron, upon receiving the first four bits, would fire while
its neighbor, upon receiving the fifth bit, would fire immediately thereafter. This represents some suboptimality in
the subsequences selected by the self-organizing map, but, nonetheless, performance is much closer to optimal than
it would be for either TKN or RSOM.
Voegtlin’s map is a recurrent self-organizing map. The internal activity of a cell is given by V_i(t) = α ||x(t) − w_i^x||² + β ||y(t−1) − w_i^y||². We see that this differs from the traditional SOM in that we are now including the network’s entire previous output as part of its input. The weights α and β can be used to determine the relative importance of external input versus self-feedback. The recurrent connections are the key to the network’s implicit memory and ability to learn to recognize temporal sequences.
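A minimal sketch of this winner-selection rule, under our own assumptions about data layout (flat arrays of per-unit weights), might look as follows.

```python
# Sketch of the CSOM activity computation V_i(t) given above. Each unit
# carries two weight sets: w_x against the current input and w_y against the
# network's previous output vector. Parameter values are illustrative.
import numpy as np

def csom_winner(x, y_prev, Wx, Wy, alpha=1.0, beta=0.5):
    """Return the index of the unit minimizing
    V_i(t) = alpha * ||x(t) - w_i^x||^2 + beta * ||y(t-1) - w_i^y||^2."""
    V = (alpha * np.sum((Wx - x) ** 2, axis=-1)
         + beta * np.sum((Wy - y_prev) ** 2, axis=-1))
    return int(np.argmin(V))

# The new output vector (e.g., one-hot on the winner) is fed back at the
# next time step, which is what gives the map its implicit memory.
```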
Summarizing our discussion so far, TKN, RSOM, and CSOM provide three ways of generating time-domain
self-organizing maps. CSOM, through the use of temporally recurrent connections beyond the simple lateral
excitation and inhibition used in all versions of SOM, actually achieves a form of memory of previous inputs,
allowing it to actually perform classification and mapping of time-domain sequences. While the excitatory and
inhibitory connections implied in even simple SOM models represent a form of spatial interconnection, Voegtlin
introduces the temporal recurrence idea to the neural interconnections themselves. By contrast, TKN and RSOM
perform their temporal processing at the output of the SOM (TKN) or at the input of the SOM (RSOM). Given that
the F5 canonical system necessarily operates in the time-domain, such a sequencing capability may prove very
useful in modeling it, especially since neural interconnections probably provide a more biologically plausible model
of temporal learning than do IIR filters at the input or output of the SOM. Furthermore, the temporal averaging and
reinforcement concept from the RSOM model may play a role in F5 neural development as we will argue.
Before proceeding to plans to apply these models to the development of F5 canonical neurons, we would like to
point out one more time-domain model, which may be the most biologically realistic. Until this point, the SOM
models we have discussed still use average firing rates as their outputs, and some even fire only a “winning neuron”.
The choice of a winning neuron, especially during the learning phase, is based on global competition. From a
biological perspective, it seems more plausible that the winner should be selected in a more localized fashion, and
time-domain spiking networks provide a plausible way for this type of learning to take place [1].
Ruf and Schmitt [1] describe a network whose self-organizing properties are based more on local interactions.
The Hodgkin-Huxley equations tell us that a neuron with a higher internal activity level will tend to fire a spike
somewhat earlier than a neuron with a lower (but still above threshold) level of excitation. Since very strong
membrane depolarization can produce earlier and more frequent firing, Ruf and Schmitt argue that it is reasonable to
use the timing of output spikes to pick a winner. Accordingly, the neuron that fires earliest is chosen. When that
neuron fires first, it is firing first because it is receiving the strongest excitation in response to the input. That initial
spike gives it the competitive edge, allowing it to excite its immediate neighbors while inhibiting more distant
neurons in the network before they can fire. By including this temporal behavior component in the network, Ruf
and Schmitt provide a more biologically plausible way for self-organization of a cerebral map to occur.
Furthermore, since their spiking neural network operates in the time-domain, with a clear temporal component in its
behavior, it is naturally adapted to firing in response to temporal signals as opposed to firing in response to certain
fixed vectors like a standard Kohonen SOM. A modification of this network may provide an even more biologically
plausible model for the operation of the F5 canonical neurons.
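The earliest-spike competition can be sketched as follows; the mapping from excitation level to spike time is a simplifying assumption of ours, standing in for the Hodgkin-Huxley dynamics rather than reproducing Ruf and Schmitt's formulation.

```python
# Sketch of earliest-spike winner selection in the spirit of Ruf and Schmitt:
# stronger excitation yields an earlier spike, and the first spike wins,
# inhibiting more distant neurons before they can fire.
import numpy as np

def earliest_spike_winner(excitation, threshold=0.2):
    """excitation: per-neuron drive; spike time ~ 1 / (drive - threshold)."""
    e = np.asarray(excitation, float)
    above = e > threshold
    if not above.any():
        return None                       # no neuron reaches threshold
    spike_times = np.full_like(e, np.inf)
    spike_times[above] = 1.0 / (e[above] - threshold)
    return int(np.argmin(spike_times))    # first spike wins the competition
```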
Having summarized some of the most current approaches to temporal self-organization in artificial neural
networks, we can now proceed to examine their applicability to the development of F5 canonical neurons, which are
the F5 neurons modeled in the FARS model [5].
The F5 canonical neurons receive a combination of tactile, proprioceptive, and visual feedback data during the
course of operation (visual feedback being indirect through AIP and tactile through SII). If a human or monkey
infant is still learning to grasp, his or her grasp will tend to be somewhat random at the beginning. However, we
would like to point out a key feature of this learning phase, which may be crucial to the development of successful
grasps: temporal reinforcement. If an infant makes a failed attempt at grasping, he or she will usually keep trying.
The feedback signals from the failed grasp may tend to be rather brief. However, a successful grasp results in the
ability to hold the object. This results in longer temporal reinforcement of both the command (output) signals sent
from F5 to the motor control areas (chiefly F1) and in longer temporal reinforcement of the visual, tactile, and
proprioceptive feedback signals sent to F5. Hence, there may be a tendency for both the AIP and the F5 neurons to
self-organize under the influence of temporal reinforcement, with greater temporal reinforcement given to successful
grasps and lesser reinforcement given to failed grasps. This concept may play a key role in the self-organization of
the F5 canonical subpopulation modeled by Fagg and Arbib in the FARS model, and it is indeed used by
Schlesinger [9,10] in his model of how infants learn to reach for objects. However, we prefer to use self-organizing
maps as opposed to Schlesinger’s large population of Econets optimized by genetic algorithms, since a self-organizing learning scheme is far more plausible from a biological point of view.
The RSOM model presented earlier certainly has a form of temporal averaging at its input, and it could be
thought of as a system that simply uses temporal averaging to reinforce learning of a weighted average by the
underlying conventional SOM. For the case of relatively long “memory” in the weighted averaging units, this
would result in greater reinforcement of those behaviors associated with longer times and lesser reinforcement of
behaviors associated with shorter times.
The CSOM model, presented as a superior alternative to RSOM by Voegtlin [3], also contains a useful idea:
recurrent, temporal feedback in addition to the purely spatial recurrent excitation/inhibition used in the conventional
SOM. Given the clear need for a monkey to engage in a series of time-ordered actions while executing a grasp,
there is a need for working memory and state, a simple model of which is provided in FARS. CSOM demonstrates
the ability to maintain state and memory based on past input while engaging in self-organizing learning and context
recognition, making it a strong candidate for modeling learning processes in the F6 area of the monkey’s cerebral
cortex since F6 is thought to be involved in sequencing (according to current research data). Given the high level of
interconnection between F6 and F5, as well as the heavy interconnection between AIP and F5, it is reasonable to say
that these areas must co-develop along with the rest of the monkey’s cerebral cortex during infancy. The self-organizing processes that take place in F6 and in AIP will certainly have an impact on the self-organized learning
that takes place in area F5. Having a self-organizing model which learns to maintain a simple memory and which
can handle simple state-sequencing tasks would be crucial to an overall developmental model of the FARS system
and certainly to the F5 neurons contained within such a system as well.
We can therefore choose to model a subset of FARS with a system of three self-organizing maps, one for each of three regions: AIP, F5 canonical, and F6. Alternatively, we could exclude F6 from a very simple model and have
F5 learn the sequencing process, although this seems somewhat unrealistic given the evidence that F6 is responsible
for time-domain sequencing control [5]. With either approach, it would be necessary to model other brain regions
using simpler, non-neural models that provide realistic inputs to these three self-organizing maps. AIP would need
to learn, based on input from the dorsal visual pathway, how to compute useful affordances. This learning would most likely involve receiving useful feedback from F5 in determining the success or failure of a given grasp. F5,
in turn, would need to learn to generate appropriate grasping signals to F1. Through interconnections with F6,
which could be modeled using Voegtlin’s CSOM model in order to develop a concept of state and temporal
sequencing, F5 and F6 could learn together how to compute sequences of actions that constitute successful grasps.
For F6 (or F5 in an even simpler but less realistic model) to learn sequencing, it should have both excitatory and
inhibitory connections to F5. Likewise, F5 needs to have projections into F6 in order to provide current state
information.
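To fix ideas, a skeleton of this three-map arrangement might look as follows; every interface name here is our own assumption, introduced only to show the intended data flow, and is not part of FARS or of any of the cited models.

```python
# Skeleton of the proposed three-map arrangement (AIP, F5 canonical, F6),
# purely to fix the data flow described above. The `respond` and
# `current_output` methods are hypothetical interfaces, not FARS components.
class GraspModel:
    def __init__(self, aip_som, f5_som, f6_csom):
        self.aip = aip_som      # learns affordances from dorsal-stream input
        self.f5 = f5_som        # learns grasp commands sent toward F1
        self.f6 = f6_csom       # CSOM-style map maintaining state/sequencing

    def step(self, dorsal_input):
        affordance = self.aip.respond(dorsal_input)
        state = self.f6.respond(self.f5.current_output())  # F5 -> F6 state
        command = self.f5.respond(affordance, state)       # F6 gates F5
        return command                                     # toward F1
```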
Bibliography
1.
Berthold Ruf and Michael Schmitt. “Self-Organization of Spiking Neurons Using Action Potential Timing,” IEEE Transactions on Neural Networks, Vol. 9, No. 3, pp. 575-578, May 1998
2.
Timo Koskela, Markus Varsta, Jukka Heikkonen, and Kimmo Kaski. “Temporal Sequence Processing using Recurrent SOM,” in Proceedings of the Second International Conference on Knowledge-Based Intelligent Electronic Systems, Vol. 1, pp. 290-297, 1998
3.
Thomas Voegtlin. “Context Quantization and Contextual Self-Organizing Maps,” in Proceedings of the
IEEE-INNS-ENNS International Joint Conference on Neural Networks, Vol. 6, pp. 20-25, 2000
4.
Jari Kangas, Kari Torkkola, Mikko Kokkonen. “Using SOMs as Feature Extractors for Speech Recognition,” in IEEE International Conference on Signal Processing, Vol. 2, pp. 341-344, 1992
5.
Andrew H. Fagg and Michael A. Arbib. “Modeling parietal-premotor interactions in primate control of
grasping,” Neural Networks, Vol. 11, pp. 1277-1303, 1998
6.
Oztop, E., and Arbib, M.A., to appear, “Schema design and implementation of the grasp-related Mirror
Neuron System,” Biological Cybernetics
7.
Michael Arbib. The Metaphorical Brain 2: Neural Networks and Beyond. John Wiley and Sons, New
York, 1989
8.
Simon Haykin. Neural Networks: A Comprehensive Foundation. Macmillan College Publishing Company, New York, 1994.
9.
Schlesinger, M., & Parisi, D. “Multimodal control of reaching: The role of tactile feedback.” IEEE
Transactions on Evolutionary Computation: Special Section on Evolutionary Computation and Cognitive
Science, 5, 122-128, 2001
10. Schlesinger, M., Parisi, D., & Langer, J. “Learning to reach by constraining the movement search space.”
Developmental Science, 3, 67-80, 2000