Feedback on CS 564 Project Proposals - October 30, 2001
The proposals received last week had many excellent pieces, but almost all proposals had something missing, or
failed to draw the pieces together into a coherent work plan. In some cases the literature review was too brief to be
of use; in other cases, the review was not properly factored into the project design. I could rarely get a sense of what
individual group members were going to contribute to the group's efforts. I am thus asking you to submit an
improved version of your proposal by November 13 (email to [email protected]), using both the general
feedback in this message and the specific comments which I will supply on each draft on November 1. You should
not regard this as extra work - instead you should see it as a necessary part of your ongoing efforts to successfully
complete your project in early December.
To avoid misunderstanding, let me be even more explicit about the structure of the proposal for this second
round:
1.
Something I did not ask you for before but which seems necessary: (a) Insert a copy of the material
describing the Specific Aims for your Project in my NSF Proposal. (b) Then provide an abstract (at most
one page) for your Project stating clearly which parts of these Specific Aims you will cover in your Project,
and the general approach you plan to take.
2.
Review the neuroscience literature. Start by stating the general criteria you are using to search for papers
relevant to your Project. Then survey at least 6 to 8 papers, giving the reader (and yourselves!) an in-depth
understanding of what contribution each paper makes and how it makes it. This section should end with a
careful analysis of how these data relate to the proposed modeling. Which data will be used to define the
basic structure of the model? Do any of the data change our previous assumptions? Which data should the
model explain? Which data seem to be noteworthy as challenges for future modeling, but beyond the scope
of the current semester's effort? (As a model for this, I include the review Larry Kite wrote as a solo effort - surely setting a base level for what each group should achieve.)
3.
Review the modeling literature. Start by stating the general criteria you are using to search for modeling
and neural net papers relevant to your Project. Then survey at least 6 to 8 papers, giving the reader (and
yourselves!) an in-depth understanding of what contribution each paper makes and how it makes it. This
section should end with a clear summary of what methods from these papers you plan to use and why; what
methods you will not use and why; and any new ideas you will bring to the modeling. (As a model for this,
I include the review Ryan Mukai wrote as a solo effort - surely setting a base level for what each group
should achieve.)
4.
Now comes the meat of the Proposal: The careful analysis of what you propose to do by early December. If
it is a modeling effort give us the details - building on the conclusions you made in your neuroscience
review {2} and your modeling review {3} to spell out explicitly what model you will build, how you will
characterize inputs and outputs and training rules, and what your criteria are for the success of the model. A
similarly careful plan is required if you choose instead to develop a general architecture for your specific
aim, rather than model one portion in detail.
5.
Following the detailed outline of your model, provide a one-page "Statement of Work": what will be the
role of each group member, and what will be your strategy for coordinating these individual efforts to
produce an integrated product in December.
6.
Gather all the references in your report into a full bibliography at the end. I prefer that you follow Larry's
format: (Author, Year) citation in the body of the report; citations unnumbered but arranged alphabetically
by surname of first author in the Reference list.
Important Notes: If you decide at this stage that one of you wants to pursue a project on his own; or if your group
decides that it would be better to split in two groups with rather different aims, that is fine with me. In that case,
each November 13 report should come from its own (possibly reduced) group. However, whatever the group, all
sections must be genuinely co-authored. (Even if you review papers individually you still need to discuss together
how you will use material from these papers, and then edit the result into a single format.) It is not acceptable to
staple together separate reports by group members which show no signs of integration.
An exemplary review of the neuroscience literature: Larry Kite
[MAA: The survey provides an excellent set of recent papers relevant to the problem to be solved. This needs to be
followed by a careful analysis of how these data relate to the proposed modeling. Which data will be used to
define the basic structure of the model? Do any of the data change our previous assumptions? Which data should
the model explain? Which data seem to be noteworthy as challenges for future modeling, but beyond the scope of
the current semester's effort?]
Literature Review
The following is by no means a complete review of the neuroscience literature related to visual control
of grasping movements. I hope to delve much deeper into the literature for the final presentation. Here, however, is a
representative sample of current results in the control of grasping and reaching and the role of vision therein.
In Boussaoud et al. (1999), the authors examined gaze effects and their relation to the transformation from a retina-centered frame of reference to body-centric coordinates. They posit that the distributed nature of eye position signals across cortical areas suggests that the transformation from retinal to body-centric coordinates does not proceed in a serial fashion through the pathways linking visual and motor cortical areas. Rather, “various stages of
the visuomotor pathways, such as the posterior parietal cortex and the dorsal pre-motor cortex, contain the necessary
signals for an implicit representation of targets using eye position and retinal information.” This is not to suggest
that there are not neurons that explicitly code target information in a head-centered reference frame. It does suggest
that neuronal populations at all levels of the transformation might create a distributed, implicit coding of target
location and movement direction. In conclusion, the authors write: “A comprehensive theory of visuomotor
transformations must take into account the distributed nature of gaze modulation of the discharge rates of individual
neurons across the cerebral cortex.” Implicit in their findings, the authors state, is that interactions between parietal
areas and PMd might play a role in building flexible, task-dependent reference frames for coding target location and
coordinated gaze and arm movements. Further, mixtures of coordinate systems may emerge at a behavioral level
from the distributed neuronal representations using multiple reference frames. The authors also note that there have
been no published models where gaze and retinal signals are combined to code for movement kinematics.
In Rosenbaum et al. (1999), the authors present a computational model for solving the inverse kinematics
problem for reaching and grasping movements. The essence of their idea is that movements are specified in order to
satisfy a hierarchy of cost constraints. Once a suitable goal position is found, a straight-line interpolation in joint
space is calculated to go from the starting posture to the goal posture. Note that goal postures are planned before
movements. If this were not the case, one would have to run through the movement in one’s head to find an
appropriate goal posture. The model also takes into account obstacle avoidance, calculating “via postures”,
intermediate postures through which an obstacle can be avoided. It is interesting to note that their model accurately
predicted certain behavior seen in human reaching and grasping. For instance, grasping for larger objects results in a
larger hand aperture during the reach. Furthermore, the maximum aperture comes later in the movement toward
larger objects.
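To make the interpolation step concrete, here is a minimal sketch of straight-line interpolation in joint space; it illustrates the idea only and is not Rosenbaum et al.'s implementation - the postures and step count are arbitrary choices of ours.

```python
# Minimal sketch of straight-line interpolation in joint space between a
# starting posture and a goal posture. Illustrative only; not Rosenbaum
# et al.'s model. Postures are vectors of joint angles (radians).
import numpy as np

def joint_space_trajectory(start_posture, goal_posture, steps=50):
    """Linearly interpolate each joint angle from start to goal."""
    start = np.asarray(start_posture, dtype=float)
    goal = np.asarray(goal_posture, dtype=float)
    return [(1.0 - t) * start + t * goal for t in np.linspace(0.0, 1.0, steps)]

# Example: a hypothetical three-joint arm moving between two postures.
trajectory = joint_space_trajectory([0.0, 0.5, 1.0], [1.2, 0.1, 0.4])
```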
In Brochier et al. (1999), a muscimol inactivation study in the monkey, the authors concluded that cutaneous
feedback to SI is essential for fine control of grip forces and that there is a close relationship between SI and MI in
controlling the precision grip. With injections into SI, finger movements could not be coordinated. However,
performance was improved when the monkey had access to visual cues for control.
In Connolly & Goodale (1999), the authors noted a tight coupling between transport and grip components in a
grasping task. When visual feedback of a limb was prevented in human subjects, reach duration was longer with
proportionate increases in both the acceleration and deceleration phases. However, maximum grip aperture was the
same for both visually augmented reaching/grasping and the case in which visual feedback was removed. Thus, the
authors report, the posture of the hand can be programmed without visual feedback. The conclusion that there exists
a tight coupling between transport and grip was based on the fact that the relative timing of acceleration and
deceleration was unchanged between open-loop (reaching without vision) and closed-loop (reaching with vision)
tasks. The authors do note, however, that visual feedback of the hand is used selectively to guide the closing phase
of the hand movement to an object as the hand becomes more foveated. It is important to note that grasp and
transport are temporally coupled, but functionally distinct.
In Neggers & Bekkering (2000), the authors demonstrated that when ocular gaze is fixated on the target of a
pointing movement that has already started, a second saccade cannot be started until the pointing movement is
completed. This is the case even though the human subject is aware that the second target is presented and wants to
saccade to the second target. The authors conclude that there is an active saccadic inhibition process, which keeps
ocular fixation on a target.
In Santello & Soechting (1998) the authors demonstrate that in a grasping task, the precise configuration of each finger need not be specified before the object is grasped. Instead, tactile feedback can be used to mold the
hand to the object’s precise contours. The only requirement is that the grip be wide enough. However, at a point past
the half-way point in the movement, the shape of the object being grasped (e.g., concave vs. convex) can be
determined through analysis of the finger positions using discriminant functions.
In Jenmalm et al. (2000), the authors show that human subjects use visual information to identify the grip-force
requirements of a grasp well before somatosensory information is available. Visual information is also used to
access stored memory information of previous experiences in grasping a given object. Such information can be used
to “set” motor command parameters in advance of the grasp.
In Inoue et al. (1998), a PET study of pointing with visual feedback of the hands in humans, the authors attempt
to locate the brain regions where movements are processed to allow accurate pointing. They conclude that the
supramarginal cortex, the posterior cingulate cortex of the left hemisphere, and the cerebellum are involved in the
integration of visual feedback of hand movements and accurate pointing.
In Ferraina et al. (2001), the authors purport to show that parietal region PEc is a visuomotor region, rather than
a somatosensory region, as widely believed. They show that PEc is “an early node of the parietal system underlying
eye-hand coordination during reaching.” The authors also note the influence of eye position signals on reach-related
activity in the superior parietal lobule.
In Johansson et al. (2001), the authors studied human gaze behavior, hypothesizing that “the brain uses gaze
fixations to obtain spatial information for controlling manipulatory actions.” An important finding was that the
human subjects never fixated or tracked their own hand in the designated task. Instead, they directed their gaze
almost exclusively to the objects presented in the task. Further, in the manipulation task, the subjects mainly directed
their gaze at locations that were critical for the control of the task, rather than at intrinsic features of the objects. The
kinematics of the task determined when gaze was shifted between landmarks. The authors also note that the gaze
shift processes are phasically coupled to the neural programs controlling the hand. They propose that the anchoring
of the gaze at certain points acts as spatiotemporal checkpoints for the development of correlations between
somatosensory and visual information and the signals required for predictions of motor commands in manipulatory
tasks.
In Churchill et al. (2000), a human study of prehension in the presence of visual cues, the authors find that “visual
contact with the hand and the environment does not influence the transport component until the hand nears the
object.” In the absence of environmental cues, however, vision of the hand becomes more important. Further, their
experiments showed that the moving hand opens wider when it cannot be seen, increasing the chance of the object
being contacted by one finger. Additionally, peak aperture was wider in reaching with vision, but without
environmental cues, than it was in grasping in the light, which leads the authors to the conclusion that the visual
environment plays a role in the control of grip formation.
Finally, in Battaglia-Mayer et al. (2001), a single-neuron recording study in rhesus monkeys with relevance to
insights into optic ataxia, the authors find that the visual properties of neurons in regions V6A and PEc in the
superior parietal lobule are implicated in the process of visually perceiving moving objects, including the hand, in
the visual field. Accordingly, populations of neurons in these areas may play a major role in visually monitoring
hand position and the movement of the hand in the visual field. It is particularly these neurons’ sensitivity to optic
flow that suggests that they play a role in the analysis of self-motion. They develop the idea of a “global tuning
field” of parietal neurons, analogous to the receptive fields of, for example, visual neurons in V1, in which parietal
neurons respond to movement in a particular direction. The existence of global tuning fields of parietal neurons has
several implications. First, the directional properties of information implicitly coded in parietal neurons could
facilitate the combination of signals on the basis of spatial congruence. In other words, perhaps eye and hand signals
can be dynamically recombined to encode spatial information. Second, since the global tuning property is found in
2/3 of parietal neurons, the activities of parietal neurons are in some sense context-dependent. This allows for
flexible combinations of signals, but means that no permanent assignment of coding schemes can be made to
parietal neurons.
References
Arbib, M., 2001, Brain Theory and Artificial Intelligence, Lecture Notes.
Batista, A.P., Newsome, W.T., 2000, Visuo-Motor Control: Giving the brain a hand, Current Biology, 10: R145-R148
Battaglia-Mayer, A., Ferraina, S., Genovesio, A., Marconi, B., Squatrito, S., Molinari, M., Lacquaniti, F., Caminiti,
R., 2001, Eye-Hand Coordination during Reaching. II. An Analysis of the Relationships between Visuomanual
Signals in Parietal Cortex and Parieto-frontal Association Projections, Cerebral Cortex, 11: 528-544
Biggs, J., Horch, K., Clark, F.J., 1999, Extrinsic muscles of the hand signal fingertip location more precisely than
they signal the angles of the individual finger joints, Exp Brain Res, 125: 221-230
Boussaoud, D., Bremmer, F., 1999, Gaze Effects in the Cerebral Cortex: Reference Frames for Space Coding and
Action, Exp Brain Res, 128: 170-180
Brochier, T., Boudreau, M., Pare, M., Smith, A.M., 1999, The effects of muscimol inactivation of small regions of
motor and somatosensory cortex on independent finger movements and force control in the precision grip. Exp
Brain Res, 128: 31-40
Carey, D.P., 2000, Eye to hand or hand to eye?, Current Biology, 10: R416-R419
Churchill, A., Hopkins, B., Rönnqvist, L., Vogt, S., 2000, Vision of the hand and environmental context in human
prehension, Exp Brain Res, 134: 81-89
Connolly, J.D., Goodale, M.A., 1999, The role of visual feedback of hand position in the control of manual
prehension, Exp Brain Res, 128: 281-286
Ellis, R.R, Flanagan, J.R., Lederman, S.J., 1999, The Influence of Visual Illusions on Grasp Position, Exp Brain
Res, 125: 109-114
Fagg, A.H., Arbib, M.A., 1998, Modeling parietal-premotor interactions in primate control of grasping, Neural Networks, 11(7-8): 1277-1303
Ferraina, S., Battaglia-Mayer, A., Genovesio, A., Marconi, B., Onorati, P., Caminiti, R., 2001, Early Coding of
Visuomanual Coordination During Reaching in Parietal Area PEc, J. Neurophysiol., 85: 462-467
Ferraina, S., Johnson, P.B., Garasto, M.R., Battaglia-Mayer, A., Ercolani, L., Bianchi, L., Lacquaniti, F., Caminiti,
R., 1997, Combination of Hand and Gaze Signals During Reaching: Activity in Parietal Area 7m of the
Monkey, J. Neurophysiol., 77: 1034-1038
Fogassi, L., Gallese, V., Buccino, G., Craighero, L., Fadiga, L., Rizzolatti, G., 2001, Cortical mechanism for the
visual guidance of hand grasping movements in the monkey: A reversible inactivation study, Brain, 124, 571-586
Gallese, V., Craighero, L., Fadiga, L., Fogassi, L., 1999, Perception Through Action,
http://psyche.cs.monash.edu.au/v5/psyche-5-21-gallese.html
Gallese, V., The acting brain: reviewing the neuroscientific evidence,
http://www.uniroma3.it/kant/field/bermudezsymp_gallese.htm
Husain, M., Jackson, R.J., 2001, Vision: Visual space is not what it appears to be, Current Biology, 11:R1-R4
Illert, M., Kummel, H., 1999, Reflex pathways from large muscle spindle afferents and recurrent axon collaterals to
motoneurones of wrist and digit muscles: a comparison in cats, monkeys and humans. Exp Brain Res, 128: 13-19
Inoue, K., Kawashima, R., Satoh, K., Kinomura, S., Goto, R., Koyama, M., Sugiura, M., Ito, M., Fukuda, H., 1998,
PET Study of Pointing With Visual Feedback of Moving Hands, J. Neurophysiol., 79: 117-125
Jenmalm, P., Dahlstedt, S., Johansson, R.S., 2000, Visual and Tactile Information About Object-Curvature Control
Fingertip Forces and Grasp Kinematics in Human Dextrous Manipulation, J. Neurophysiol., 84: 2984-2997
Johansson, R.S., Westling, G., Backstrom, A., Flanagan, J.R., 2001, Eye-Hand Coordination in Object
Manipulation, Journal of Neuroscience, 21(17): 6917-6932
Neggers, S.F.W, Bekkering, H., 2000, Ocular Gaze is Anchored to the Target of an Ongoing Pointing Movement, J.
Neurophysiol., 83: 639-651
Rosenbaum, D.A., Meulenbroek, R.G.J., Vaughan, J., Jansen, C., 1999, Coordination of Reaching and Grasping by
Capitalizing on Obstacle Avoidance and Other Constraints, Exp Brain Res, 128: 92-100
Santello, M., Soechting, J.F., 1998, Gradual Molding of the Hand to Object Contours, J. Neurophysiol., 79: 1307-1320
Simoes, C., Mertens, M., Forss, N., Jousmaki, V., Lutkenhoner, B., Hari, R., 2001, Functional Overlap of Finger
Representations in Human SI and SII Cortices, J. Neurophysiol., 86: 1661-1665
von Donkelaar, P., Lee, J., Drew, A.S., 2000, Transcranial Magnetic Stimulation Disrupts Eye-Hand Interactions in
the Posterior Parietal Cortex, J. Neurophysiol., 84: 1677-1680
Wolpert, D.M., 1998, Multiple paired forward and inverse models for motor control, Neural Networks, 11: 1317-1329.
An exemplary review of the neural net literature: Ryan Mukai
[MAA: The survey is rooted in a general view of the problem to be solved; this grounds a search for papers which
let one proceed from known approaches to the discovery of papers which seem to provide techniques needed for
development of the new model.]
The Use of Self-Organizing Maps
The brain is described by Arbib [7] as a “layered, somatotopic, distributed computer”, a reference to several key
facts about the brain pointed out by Arbib, Kohonen, and Haykin [8]. A primate’s cerebral cortex is organized into
many distinct processing regions. For example, areas 17, 18, and 19 located in the occipital lobe of the human
cerebral cortex are responsible for visual processing [7,8]. Area 46 in the macaque monkey is believed to play a role
in short-term task-related memory [5], while area F5, the area of our primary interest, has been shown to play a key
role in both grasping behavior and in facial control [5]. Within areas 17, 18, and 19, Hubel and Wiesel [7,8]
discovered highly ordered sensory maps, with cells exhibiting such features as ocular dominance, spatial sensitivity
in the case of simple cells, and orientation sensitivity in the case of both simple and complex cells. The very
organization of the simple cells forms a retinotopic map, and both simple and complex cells are organized into
arrays with smooth, continuous variations in their orientation sensitivity. The organization of the feline visual
cortex clearly follows a layered, retinotopic pattern at least in its earlier processing stages. Furthermore, studies of
the frog by Lettvin indicate the presence of four retinotopic layers corresponding to four classes of ganglion cells in
the frog’s tectum. In both frog and cat cases, we see that the earlier stages of visual processing are based on highly organized retinotopic maps which perform low-level feature extraction, although the frog’s visual system is clearly
designed to respond to highly specific stimuli while that of the cat is designed to yield a broader and more general
picture of the world. Nonetheless, the visual cortex provides an excellent example of a layered, somatotopic
(retinotopic) processing array.
The SOM, or self-organizing map, was developed by Kohonen in an effort to model the somatotopic maps
found in the cerebral cortex, and it has produced very good results [8]. Like a section of an animal’s cerebral cortex,
the SOM will form regions on its two-dimensional surface (although other dimensionalities are possible, we
won’t discuss them here) in which similar features are mapped relatively close to each other while features that are
more distant are mapped relatively far away from each other. This indeed bears a strong resemblance to the sort of
map organization found in the visual cortex, and since other brain regions, including motor control regions, are
believed to exhibit similar organization, the Kohonen map is a very good first-order model of organization of a small
patch of the brain. In our case, studies by Rizzolatti and Sakata suggest a similar organized map of the F5 and AIP
regions of the brain of the macaque monkey, with various regions corresponding to various types of grasps or
various stages of execution [5]. This makes the Kohonen map a good starting point for studying self-organization
and development of F5 canonical neurons.
It is often pointed out in the literature [1,2,3,4] that the SOM is not designed for time-domain processing. Yet it
is clear from both the FARS [5] and MNS [6] models of grasping behavior in the monkey that grasping is a
temporally organized task. Hence, it is necessary that some form of temporal learning and temporal control occur in
the grasping system of a monkey, and this certainly applies to the population of F5 canonical neurons modeled in the
FARS model. In FARS, the temporal sequencing of various grasping stages is modeled within F5 itself, although it
is presently believed that the F6 area handles actual temporal control and sequencing (reflex sequencing versus
externally controlled sequencing) [5]. In any event, the weakness of the basic SOM, its lack of inherent temporal
processing capability, certainly needs to be addressed if we are to apply it to the problem of modeling development
of the F5 canonical population.
The Basic SOM
Having argued that the SOM provides a way to model development of the F5 canonical neurons, based on the
previous successes of SOM at modeling the development of other sections of cerebral cortex, we now present the
SOM in its basic form. A very detailed discussion of the SOM may be found in either Kohonen or Haykin [8].
If we arrange artificial neurons in a two-dimensional array, we can assign each neuron a random initial weight vector w_i that is of the same dimensionality as the input vector x. The learning procedure proceeds as follows:
1.
When x comes in at the input, each neuron in the lattice checks the Euclidean distance from its own weight vector to the input vector: ||x − w_i||. The closest neuron “wins” the competition (this is how lateral inhibition is implemented).
2.
Let i be the index of the winning neuron. Then each neuron k updates its weight according to w_k(n+1) = w_k(n) + η(i,k) [x − w_k(n)], where the learning parameter η(i,k) includes a neighborhood function, which decays as separation between neurons in the lattice increases in order to simulate lateral inhibition.
3.
The output of the network is usually the simple firing of the winning neuron while the others remain silent. However, it is also possible to compute internal activity levels and use those as output.
The radius of the Gaussian neighborhood function included in the learning parameter above usually starts very
wide during the “ordering phase” of learning when the network is assigning different regions of input space to
different regions on its own two-dimensional surface. This neighborhood decreases until it is essentially down to
one neuron when we get to the “tuning phase”, at which time each neuron begins to specialize on certain regions of
the input space. Neighboring neurons specialize in neighboring regions of the input space at the end, so the resulting
network can be thought of as a somatotopic map relating the output space to the input space.
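To make the procedure concrete, here is a minimal sketch of the training loop just described, with a Gaussian neighborhood whose radius decays from the ordering phase toward the tuning phase. The lattice size, input dimensionality, and schedules are illustrative choices of ours, not values from Kohonen or Haykin.

```python
# Minimal sketch of the basic SOM training loop described above.
# All sizes and schedules are illustrative.
import numpy as np

rng = np.random.default_rng(0)

GRID = 10            # 10 x 10 lattice of neurons
DIM = 3              # dimensionality of input vectors x
W = rng.random((GRID, GRID, DIM))   # random initial weight vectors w_i

# Lattice coordinates, used to evaluate the neighborhood function.
coords = np.stack(np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij"), axis=-1)

def train_step(x, lr, sigma):
    # Step 1: each neuron computes ||x - w_i||; the closest neuron wins.
    dists = np.linalg.norm(W - x, axis=-1)
    winner = np.unravel_index(np.argmin(dists), dists.shape)
    # Step 2: Gaussian neighborhood eta(i, k) decays with lattice distance
    # from the winner i, simulating lateral inhibition.
    lattice_d2 = np.sum((coords - np.array(winner)) ** 2, axis=-1)
    eta = lr * np.exp(-lattice_d2 / (2.0 * sigma ** 2))
    # w_k(n+1) = w_k(n) + eta(i, k) * (x - w_k(n))
    W[...] += eta[..., None] * (x - W)
    return winner

# Ordering phase: wide neighborhood; tuning phase: sigma shrinks toward one neuron.
for n in range(2000):
    sigma = max(0.5, 5.0 * (1.0 - n / 2000))   # decaying neighborhood radius
    lr = 0.5 * (1.0 - n / 2000) + 0.01
    train_step(rng.random(DIM), lr, sigma)
```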
The above network is clearly designed to process in a vector-by-vector fashion, and, except for the learning
rule, there is little notion of time-domain operation or of time-sequencing, items that are crucial to a temporally
sequenced behavior such as grasping. Accordingly, we begin examining some approaches found in the literature
that attempt to extend the basic SOM to deal with time-series data.
Temporal Processing with the SOM
In 1992, Kangas et al. examined the problem of using an SOM as part of a speech recognition system. Many
speech recognition systems use hidden Markov models in conjunction with a vector classification algorithm to
segment an utterance into phonemes and to recognize the individual phonemes. A Kohonen map could be used in
the vector classification stage, replacing the more traditional k-means algorithm. However, temporal sequencing
issues are still handled by the hidden Markov model [4].
Kangas et al. propose the use of a time trace over the two-dimensional surface of the SOM [4]. The idea is
this: a traditional SOM is trained on phonemes until it forms a topological map with similar phonemes close together
on the map. When utterances are fed into the SOM, a time-domain trace of the evolution of the map’s output can be
sent to a higher-level processor. Utterances, such as individual words spoken in isolation, can be recognized by
examination of the traces drawn on the surface of the SOM, with each utterance having a characteristic trace.
Kangas et al. propose two separate methods (a sketch of the first follows the list):
1.
Traditional SOM firing. Only the winning neuron fires. The trace of neurons that fire during an utterance
delineates a curve over the surface of the map. This time-domain trace is processed in order to recognize
the word.
2.
Use of neural activities. A time-record of all neural activities is kept, leaving a “fuzzier” trace. These data
are sent to a higher level for processing. Leaky integrators are used at the neural outputs.
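As a rough illustration of method 1, the following sketch records the winner trace over a sequence of input frames and compares traces by summed lattice distance. The `som_winner` callable and the comparison rule are stand-ins of our own, not Kangas et al.'s actual recognizer.

```python
# Sketch of method 1 above: record the trace of winning units over an
# utterance. `som_winner` is assumed to return the lattice coordinates of
# the winning neuron for one input frame (e.g., from the SOM sketched earlier).
import numpy as np

def winner_trace(frames, som_winner):
    """Map a sequence of feature frames to a curve on the SOM surface."""
    return [som_winner(x) for x in frames]

# A downstream recognizer might compare traces, e.g. by summed lattice
# distance to stored template traces (a simple stand-in for the higher-level
# processing Kangas et al. leave to a separate stage).
def trace_distance(trace_a, trace_b):
    a, b = np.array(trace_a, float), np.array(trace_b, float)
    n = min(len(a), len(b))
    return float(np.linalg.norm(a[:n] - b[:n], axis=1).sum())
```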
The two methods have yielded word recognition accuracy ranging from 89.9% to 92.0%. This does not require
modification of the SOM itself, and it does illustrate something interesting. In the FARS model [5], different
subpopulations of the F5 canonical neurons will fire during different stages of the grasp (i.e. some fire more during
pre-shaping, some fire more during the closing phase, some during the holding phase, etc.). One may thus
hypothesize that similar time traces may occur over the surface of the F5 area during the time-course of a grasp.
Furthermore, since F6 is believed to be responsible for sequencing, it is even possible that a simpler, non-time-domain model of the SOM similar to that of Kangas may prove useful in understanding the development and operation of F5, but we will discuss other time-domain approaches, along with our reasons for pursuing them.
In 1998, Koskela et al. proposed the RSOM, or Recurrent Self-Organizing Map, in their paper [2]. They described a previous algorithm, the TKN (Temporal Kohonen Map), and compared it analytically and experimentally to the RSOM procedure. We begin by noting that the TKN is very similar to the traditional SOM. The main difference lies in the outputs. A traditional SOM simply fires its winning neuron while leaving the others silent. By contrast, a TKN will fire all of its neurons to varying degrees, depending on the degree of excitation. The output
of each neuron is a leaky integrator unit, so there is a time-decay involved. Hence, this network yields time-domain
behavior. Leaky integrators are influenced by past outputs as well as by present outputs, resulting in an overall
network output that is a time-domain response to a time-domain signal.
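A minimal sketch of such a leaky-integrator output stage follows; the decay constant is an illustrative value of ours, not one taken from the TKN papers.

```python
# Minimal leaky-integrator output stage, as used in the TKN: each neuron's
# output decays over time while accumulating its current excitation.
import numpy as np

def leaky_outputs(excitations, decay=0.8):
    """excitations: iterable of per-neuron excitation vectors over time."""
    y = None
    history = []
    for e in excitations:
        e = np.asarray(e, float)
        y = e if y is None else decay * y + e   # past outputs leak into the present
        history.append(y.copy())
    return history
```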
However, it can also be shown that using a traditional SOM and modifying only its output typically yields
suboptimal results. Koskela [2] demonstrates this suboptimality and proposes the RSOM, illustrating its superiority, which is due to a difference in the manner of training. We summarize Koskela’s training ideas below:
1.
At each iteration, a “temporally leaked difference vector”, in the words of Koskela [2], is computed for each neuron. The equation for the ith neuron is y_i(n) = (1 − α) y_i(n−1) + α [x(n) − w_i(n)]. Here, α is the memory parameter. A larger value of α corresponds to short memory (emphasizes current differences over past differences), while a smaller value of α corresponds to long memory (emphasizes past differences more than current differences).
2.
Instead of directly applying x(n) − w_i(n) as the “correction” to the neuron’s existing weight, as we would in the traditional SOM, we apply y_i(n). This yields a weight update that depends both on the present weight difference and on past weight differences. The result is that the network is really trained on a weighted temporal average of the input vectors rather than on the raw vectors themselves.
3.
The resulting weight update equation is w_k(n+1) = w_k(n) + η(i,k) y_i(n).
Here we see the use of weighted time-averages in the training process, and [2] shows a significant improvement
in performance over TKN.
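The following sketch puts the three steps together, reusing the lattice conventions of the earlier SOM sketch; all parameter values are illustrative, not Koskela's.

```python
# Sketch of the RSOM update summarized above. The leaked difference vector
# y_i is maintained per neuron; alpha is the memory parameter (small alpha =
# long memory). Grid size and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
GRID, DIM = 10, 3
W = rng.random((GRID, GRID, DIM))
Y = np.zeros_like(W)                      # temporally leaked difference vectors
coords = np.stack(np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij"), axis=-1)

def rsom_step(x, alpha=0.3, lr=0.1, sigma=2.0):
    global Y
    # y_i(n) = (1 - alpha) * y_i(n-1) + alpha * (x(n) - w_i(n))
    Y = (1.0 - alpha) * Y + alpha * (x - W)
    # The winner is the neuron whose leaked difference is smallest.
    winner = np.unravel_index(np.argmin(np.linalg.norm(Y, axis=-1)), (GRID, GRID))
    lattice_d2 = np.sum((coords - np.array(winner)) ** 2, axis=-1)
    eta = lr * np.exp(-lattice_d2 / (2.0 * sigma ** 2))
    # w_k(n+1) = w_k(n) + eta(i, k) * y_i(n): the correction is the leaked
    # difference, so training follows a weighted temporal average of inputs.
    W[...] += eta[..., None] * Y
    return winner
```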
We find the TKN and RSOM concepts interesting for two reasons. First, the F5 canonical neurons are believed
to fire in a population-coded style, like many or most other neurons in the brain [5,6]. The use of leaky-integrator
outputs in the TKN [2] and by Kangas et al. [4] yields a more realistic model, from a biological perspective. Second,
RSOM captures the idea of using a form of time averaging on the inputs prior to presentation to the raw SOM for
training purposes, thus teaching the neurons to respond to a weighted time average of the input. This will have
implications for our model of how F5 canonical neurons may learn and self-organize during developmental stages.
It should be borne in mind that for small values of α, corresponding to a long memory, the network’s weight vectors will tend toward those inputs which have a great deal of temporal reinforcement.
The CSOM (Contextual Self-Organizing Map) model is another improved SOM, proposed by Voegtlin in 2000
[3]. Here, the objective is to have the network actually learn to recognize context in a temporal sequence of input
data. Voegtlin defines a context as the series of input data from the beginning up to the present. In general, the length
of this sequence approaches infinity, so it is not feasible to remember an entire context. However, it would be useful
to train a self-organizing map which can fire its units in response to a context seen so far, remembering the most
recent portion of that context only since infinite memory is impossible. This would give the network the ability to
recognize time-domain sequences even as they are coming in, and such a system would provide a more realistic
model of a biological network responding to a temporally ordered sequence of events.
Voegtlin’s network could fire in response to the most commonly occurring series of contexts. For example, if
the bit sequence “0110” was generated by the bit-stream generator, a neuron in one portion of the map would fire as
soon as the last bit came in. Another neuron may fire when the sequence “01001” arrives, firing when the fifth bit
comes in, and so on. This requires the network to actually have a form of memory, since the bits are fed in serially.
The Voegtlin network was able to come up with a nearly optimal scheme for context recognition. In a few
cases, the same sequence could trigger firing of more than one neuron. For example, a neuron may respond to
“0110” while its neighbor responds to “01101”. The first neuron, upon receiving the first four bits, would fire while
its neighbor, upon receiving the fifth bit, would fire immediately thereafter. This represents some suboptimality in
the subsequences selected by the self-organizing map, but, nonetheless, performance is much closer to optimal than
it would be for either TKN or RSOM.
Voegtlin’s map is a recurrent self-organizing map. The internal activity of a cell is given by V_i(t) = α ||x(t) − w_i^x||² + β ||y(t−1) − w_i^y||². We see that this differs from the traditional SOM in that we are now including the network’s entire previous output as part of its input. The weights α and β can be used to determine the relative importance of external input versus self-feedback. The recurrent connections are the key to the network’s implicit memory and ability to learn to recognize temporal sequences.
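A minimal sketch of this winner-selection rule, under our own assumptions about data layout (flat arrays of per-unit weights), might look as follows.

```python
# Sketch of the CSOM activity computation V_i(t) given above. Each unit
# carries two weight sets: w_x against the current input and w_y against the
# network's previous output vector. Parameter values are illustrative.
import numpy as np

def csom_winner(x, y_prev, Wx, Wy, alpha=1.0, beta=0.5):
    """Return the index of the unit minimizing
    V_i(t) = alpha * ||x(t) - w_i^x||^2 + beta * ||y(t-1) - w_i^y||^2."""
    V = (alpha * np.sum((Wx - x) ** 2, axis=-1)
         + beta * np.sum((Wy - y_prev) ** 2, axis=-1))
    return int(np.argmin(V))

# The new output vector (e.g., one-hot on the winner) is fed back at the
# next time step, which is what gives the map its implicit memory.
```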
Summarizing our discussion so far, TKN, RSOM, and CSOM provide three ways of generating time-domain
self-organizing maps. CSOM, through the use of temporally recurrent connections beyond the simple lateral
excitation and inhibition used in all versions of SOM, actually achieves a form of memory of previous inputs,
allowing it to actually perform classification and mapping of time-domain sequences. While the excitatory and
inhibitory connections implied in even simple SOM models represent a form of spatial interconnection, Voegtlin
introduces the temporal recurrence idea to the neural interconnections themselves. By contrast, TKN and RSOM
perform their temporal processing at the output of the SOM (TKN) or at the input of the SOM (RSOM). Given that
the F5 canonical system necessarily operates in the time-domain, such a sequencing capability may prove very
useful in modeling it, especially since neural interconnections probably provide a more biologically plausible model
of temporal learning than do IIR filters at the input or output of the SOM. Furthermore, the temporal averaging and
reinforcement concept from the RSOM model may play a role in F5 neural development as we will argue.
Before proceeding to plans to apply these models to the development of F5 canonical neurons, we would like to
point out one more time-domain model, which may be the most biologically realistic. Until this point, the SOM
models we have discussed still use average firing rates as their outputs, and some even fire only a “winning neuron”.
The choice of a winning neuron, especially during the learning phase, is based on global competition. From a
biological perspective, it seems more plausible that the winner should be selected in a more localized fashion, and
time-domain spiking networks provide a plausible way for this type of learning to take place [1].
Ruf and Schmitt [1] describe a network whose self-organizing properties are based more on local interactions.
The Hodgkin-Huxley equations tell us that a neuron with a higher internal activity level will tend to fire a spike
somewhat earlier than a neuron with a lower (but still above threshold) level of excitation. Since very strong
membrane depolarization can produce earlier and more frequent firing, Ruf and Schmitt argue that it is reasonable to
use the timing of output spikes to pick a winner. Accordingly, the neuron that fires earliest is chosen. When that
neuron fires first, it is firing first because it is receiving the strongest excitation in response to the input. That initial
spike gives it the competitive edge, allowing it to excite its immediate neighbors while inhibiting more distant
neurons in the network before they can fire. By including this temporal behavior component in the network, Ruf
and Schmitt provide a more biologically plausible way for self-organization of a cerebral map to occur.
Furthermore, since their spiking neural network operates in the time-domain, with a clear temporal component in its
behavior, it is naturally adapted to firing in response to temporal signals as opposed to firing in response to certain
fixed vectors like a standard Kohonen SOM. A modification of this network may provide an even more biologically
plausible model for the operation of the F5 canonical neurons.
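The earliest-spike competition can be sketched as follows; the mapping from excitation level to spike time is a simplifying assumption of ours, standing in for the Hodgkin-Huxley dynamics rather than reproducing Ruf and Schmitt's formulation.

```python
# Sketch of earliest-spike winner selection in the spirit of Ruf and Schmitt:
# stronger excitation yields an earlier spike, and the first spike wins,
# inhibiting more distant neurons before they can fire.
import numpy as np

def earliest_spike_winner(excitation, threshold=0.2):
    """excitation: per-neuron drive; spike time ~ 1 / (drive - threshold)."""
    e = np.asarray(excitation, float)
    above = e > threshold
    if not above.any():
        return None                       # no neuron reaches threshold
    spike_times = np.full_like(e, np.inf)
    spike_times[above] = 1.0 / (e[above] - threshold)
    return int(np.argmin(spike_times))    # first spike wins the competition
```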
Having summarized some of the most current approaches to temporal self-organization in artificial neural
networks, we can now proceed to examine their applicability to the development of F5 canonical neurons, which are
the F5 neurons modeled in the FARS model [5].
The F5 canonical neurons receive a combination of tactile, proprioceptive, and visual feedback data during the
course of operation (visual feedback being indirect through AIP and tactile through SII). If a human or monkey
infant is still learning to grasp, his or her grasp will tend to be somewhat random at the beginning. However, we
would like to point out a key feature of this learning phase, which may be crucial to the development of successful
grasps: temporal reinforcement. If an infant makes a failed attempt at grasping, he or she will usually keep trying.
The feedback signals from the failed grasp may tend to be rather brief. However, a successful grasp results in the
ability to hold the object. This results in longer temporal reinforcement of both the command (output) signals sent
from F5 to the motor control areas (chiefly F1) and in longer temporal reinforcement of the visual, tactile, and
proprioceptive feedback signals sent to F5. Hence, there may be a tendency for both the AIP and the F5 neurons to
self-organize under the influence of temporal reinforcement, with greater temporal reinforcement given to successful
grasps and lesser reinforcement given to failed grasps. This concept may play a key role in the self-organization of
the F5 canonical subpopulation modeled by Fagg and Arbib in the FARS model, and it is indeed used by
Schlesinger [9,10] in his model of how infants learn to reach for objects. However, we prefer to use self-organizing
maps as opposed to Schlesinger’s large population of Econets optimized by genetic algorithms, since a self-organizing learning scheme is far more plausible from a biological point of view.
The RSOM model presented earlier certainly has a form of temporal averaging at its input, and it could be
thought of as a system that simply uses temporal averaging to reinforce learning of a weighted average by the
underlying conventional SOM. For the case of relatively long “memory” in the weighted averaging units, this
would result in greater reinforcement of those behaviors associated with longer times and lesser reinforcement of
behaviors associated with shorter times.
The CSOM model, presented as a superior alternative to RSOM by Voegtlin [3], also contains a useful idea:
recurrent, temporal feedback in addition to the purely spatial recurrent excitation/inhibition used in the conventional
SOM. Given the clear need for a monkey to engage in a series of time-ordered actions while executing a grasp,
there is a need for working memory and state, a simple model of which is provided in FARS. CSOM demonstrates
the ability to maintain state and memory based on past input while engaging in self-organizing learning and context
recognition, making it a strong candidate for modeling learning processes in the F6 area of the monkey’s cerebral
cortex since F6 is thought to be involved in sequencing (according to current research data). Given the high level of
interconnection between F6 and F5, as well as the heavy interconnection between AIP and F5, it is reasonable to say
that these areas must co-develop along with the rest of the monkey’s cerebral cortex during infancy. The self-organizing processes that take place in F6 and in AIP will certainly have an impact on the self-organized learning
that takes place in area F5. Having a self-organizing model which learns to maintain a simple memory and which
can handle simple state-sequencing tasks would be crucial to an overall developmental model of the FARS system
and certainly to the F5 neurons contained within such a system as well.
We can therefore choose to model a subset of FARS with a system of three self-organizing maps, one for each of three regions: AIP, F5 canonical, and F6. Alternatively, we could exclude F6 from a very simple model and have
F5 learn the sequencing process, although this seems somewhat unrealistic given the evidence that F6 is responsible
for time-domain sequencing control [5]. With either approach, it would be necessary to model other brain regions
using simpler, non-neural models that provide realistic inputs to these three self-organizing maps. AIP would need
to learn, based on input from the dorsal visual pathway, how to compute useful affordances. This learning would most likely involve receiving useful feedback from F5 in determining the success or failure of a given grasp. F5,
in turn, would need to learn to generate appropriate grasping signals to F1. Through interconnections with F6,
which could be modeled using Voegtlin’s CSOM model in order to develop a concept of state and temporal
sequencing, F5 and F6 could learn together how to compute sequences of actions that constitute successful grasps.
For F6 (or F5 in an even simpler but less realistic model) to learn sequencing, it should have both excitatory and
inhibitory connections to F5. Likewise, F5 needs to have projections into F6 in order to provide current state
information.
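To fix ideas, a skeleton of this three-map arrangement might look as follows; every interface name here is our own assumption, introduced only to show the intended data flow, and is not part of FARS or of any of the cited models.

```python
# Skeleton of the proposed three-map arrangement (AIP, F5 canonical, F6),
# purely to fix the data flow described above. The `respond` and
# `current_output` methods are hypothetical interfaces, not FARS components.
class GraspModel:
    def __init__(self, aip_som, f5_som, f6_csom):
        self.aip = aip_som      # learns affordances from dorsal-stream input
        self.f5 = f5_som        # learns grasp commands sent toward F1
        self.f6 = f6_csom       # CSOM-style map maintaining state/sequencing

    def step(self, dorsal_input):
        affordance = self.aip.respond(dorsal_input)
        state = self.f6.respond(self.f5.current_output())  # F5 -> F6 state
        command = self.f5.respond(affordance, state)       # F6 gates F5
        return command                                     # toward F1
```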
Bibliography
1.
Berthold Ruf and Michael Schmitt. “Self-Organization of Spiking Neurons Using Action Potential Timing,” IEEE Transactions on Neural Networks, Vol. 9, No. 3, pp. 575-578, May 1998
2.
Timo Koskela, Markus Varsta, Jukka Heikkonen, and Kimmo Kaski. “Temporal Sequence Processing using Recurrent SOM,” in Proceedings of the Second International Conference on Knowledge-Based Intelligent Electronic Systems, Vol. 1, pp. 290-297, 1998
3.
Thomas Voegtlin. “Context Quantization and Contextual Self-Organizing Maps,” in Proceedings of the
IEEE-INNS-ENNS International Joint Conference on Neural Networks, Vol. 6, pp. 20-25, 2000
4.
Jari Kangas, Kari Torkkola, Mikko Kokkonen. “Using SOMs as Feature Extractors for Speech Recognition,” in IEEE International Conference on Signal Processing, Vol. 2, pp. 341-344, 1992
5.
Andrew H. Fagg and Michael A. Arbib. “Modeling parietal-premotor interactions in primate control of
grasping,” Neural Networks, Vol. 11, pp. 1277-1303, 1998
6.
Oztop, E., and Arbib, M.A., to appear, “Schema design and implementation of the grasp-related Mirror
Neuron System,” Biological Cybernetics
7.
Michael Arbib. The Metaphorical Brain 2: Neural Networks and Beyond. John Wiley and Sons, New
York, 1989
8.
Simon Haykin. Neural Networks: A Comprehensive Foundation. Macmillan College Publishing Company, New York, 1994.
9.
Schlesinger, M., & Parisi, D. “Multimodal control of reaching: The role of tactile feedback.” IEEE
Transactions on Evolutionary Computation: Special Section on Evolutionary Computation and Cognitive
Science, 5, 122-128, 2001
10. Schlesinger, M., Parisi, D., & Langer, J. “Learning to reach by constraining the movement search space.”
Developmental Science, 3, 67-80, 2000