Politecnico di Milano
Dipartimento di Elettronica e Informazione
PhD program in Information Technology

Towards the integration of neural mechanisms and cognition in biologically inspired robots

Doctoral Dissertation of: Flavio Mutti
Advisor: Prof. Giuseppina Gini
Tutor: Prof. Andrea Bonarini
Supervisor of the Doctoral Program: Prof. Carlo Fiorini
2012 - XXV edition

Politecnico di Milano, Dipartimento di Elettronica e Informazione, Piazza Leonardo da Vinci 32, I 20133 Milano

To Fernanda

Acknowledgements

Here we are. If I am writing these words, it means I have reached the end of this work. Come to think of it, this is the third time I write acknowledgements for a thesis, but perhaps these are the most special ones, since I will hardly want to write another. Getting to the end was not easy; I committed myself and pushed hard, but I must admit that many people contributed to the success of this endeavour. First of all, I would like to thank Prof. Gini, who guided and taught me throughout the doctoral program. A heartfelt thank you. I also thank Prof. Zanero and Federico Maggi for guiding me in the development of one of my favourite research topics. Second, but by no means secondary, I would like to thank my family, who in almost ten years of university never let me lack anything, supporting me in my choices. Luisella, Giorgio, Sante, Gino: thank you. Melania, a special thank you to you, who stood and still stand by my side at this important milestone.
To Riccardo, with whom I shared and still share the passion for science and research (I expect a Nobel prize from both of us). To Prof. Pfeifer, Cristiano, Hugo, Naveen, JP, Mat, for the wonderful period spent in Zurich. I learned a lot working with you. To Ale and Paolo, with whom I share a new working adventure. Thank you for the support, the encouragement, and the office cactus you gave me. To Nicola, my companion of adventures in the hostile lands of PhD students. Zurich, Dubrovnik, Vienna, to name a few. To Alessio, for the meals, the laughs, and the Milanese evenings. By now you too know the joys and sorrows of the PhD. To Camilla, who helped me write this thesis in an English that is not Anglo-Italian. Thank goodness I am writing the acknowledgements in my mother tongue. To Vittorio, il Prazzo, Ago, and Filippo, (ex-)inhabitants of the Milan shack. To the guys of the first-floor laboratory: Michele, Giampaolo, Gerardo, Alessandro, Ettore. You shared lunches with me for three years; do not think you will get rid of me so easily. To the friends of the DEIB administration, for the coffee breaks, the friendliness, and the bets on the Mutti-Vitucci team. To my friends from Piacenza: I cannot list you all, but know that I mean precisely you. To my teammates of ZeroNove, fuorza!

Abstract

How intelligence arises in humans is far from being completely unveiled. Understanding the brain mechanisms that make it possible is one of the most interesting and debated topics in neuroscience. However, recent advances suggest that this is only half of the story. Intelligent behaviour in humans could emerge from a good balance among several factors, namely the brain, the body, sensors, actuators, and the environment. Even though no conclusive evidence for this theory is available, it is very promising. Beyond its great relevance for science, the natural application of these studies is the robotics field.
In the last decades, several approaches have been proposed to design intelligent machines on the basis of scientific findings, especially from neuroscience. The underlying idea of these approaches is to transfer knowledge from neuroscience and biomechanics to the design of biologically inspired robots. Although designing a mechanical structure that mimics its biological counterpart is by now an achievable task, the same cannot be said for the design of the underlying neural mechanisms, both for controlling the body and for the emergence of cognition. Biologically inspired robotics is a very active research field, and several solutions have been proposed, from evolutionary approaches to developmental robotics. However, a solution that encodes on the same neural lattice both low-level computational mechanisms and the emergence of cognition is still missing. In this thesis a comprehensive study of this topic is presented. First, I design several low-level computational models composing the visual dorsal pathway, which is devoted to one of the most relevant functionalities provided by the brain: the reaching task. A comparison with the state of the art is presented. Second, a comparison among the previously developed models inspires the proposal of a common computational framework that should embody brain computational principles, such as population coding. Third, a biologically inspired cognitive architecture is proposed. The architecture develops a middle level of cognition, filling the gap between the low-level computational mechanisms and symbolic reasoning. This cognitive architecture is able to generate new goals and behaviours from previous ones, exploiting the synergy among the thalamus, the amygdala, and the cortex. Finally, a proposal for a roadmap towards a fully integrated biologically inspired architecture is presented. This architecture exploits the synergy between the low-level computational mechanisms and the proposed cognitive architecture.
Sommario

How intelligence emerges from the human brain is still an open problem for the scientific community. Understanding how the neural mechanisms of the brain allow cognitive aspects to arise is one of the most interesting and debated topics in neuroscience. Intelligent behaviours could emerge as the right balance among several factors: the brain, the body, the sensors, the actuation mechanisms, and the environment in which the subject is immersed. Although there is no conclusive evidence that the interaction among the aforementioned components is a necessary and sufficient condition for the emergence of intelligence, the proposed theory is promising. Beyond the great scientific relevance of this topic, the natural application of these studies is robotics. In recent decades, numerous research projects have been carried out with the goal of developing intelligent machines inspired by neuroscientific studies. It follows that the design may concern both the mechanical part of the robot and the information-processing part. While designing nature-inspired mechanical parts is a reasonably solvable task, it becomes more complicated when one wants to model the brain, both for controlling the robot and for the emergence of cognitive aspects. Biologically inspired robotics is a very active research field, and several solutions have been proposed, from evolutionary robotics to developmental robotics. However, a solution that incorporates in the same neural model both the cognitive aspects and the low-level aspects for controlling the body is still missing. In this thesis, a comprehensive study of several neural mechanisms is proposed, eliciting their strengths and weaknesses.
First, I propose several neural architectures that model cortical areas belonging to the dorsal visual pathway, whose main task is to solve the problem of reaching objects perceived through the visual system. Second, I compare the aforementioned models in order to propose a common computational framework that incorporates the computational principles shared across brain areas; an example of such principles is the so-called population coding. Third, I develop a biologically inspired cognitive architecture. The architecture develops an intermediate cognitive level, bridging the gap between low-level computational mechanisms and symbolic reasoning. This architecture is able to learn new goals and behaviours based on the interaction of several brain areas: the thalamus, the cortex, and the amygdala. Finally, a roadmap is proposed for the development of a biologically inspired architecture that takes into account both the low-level computational aspects and the cognitive aspects.

Contents

1 Introduction
  1.1 Problem definition
  1.2 Biologically inspired solutions
  1.3 The aim of this work
  1.4 Thesis organization

2 Related works
  2.1 Overview
  2.2 Neuroscience
  2.3 Biomimetics
  2.4 Robotics
    2.4.1 Industrial Robots
    2.4.2 Autonomous Robots
  2.5 Closing remarks
3 Motivations
  3.1 How the mammal brain executes reaching
    3.1.1 Background
    3.1.2 The primary visual cortex
    3.1.3 The posterior parietal cortex
    3.1.4 The primary motor cortex
  3.2 Modelling a biological architecture
  3.3 Proposed neural models and their role
  3.4 The proposal of a roadmap for developing bioinspired architectures

4 A Primary Visual Cortex Model for Depth Perception
  4.1 Related works
  4.2 Neural Model
    4.2.1 Image preprocessing
    4.2.2 Disparity energy neurons
    4.2.3 Neural Architecture
    4.2.4 Disparity direction
  4.3 Experimental Results
  4.4 Conclusions

5 A Posterior Parietal Cortex Model to Solve the Coordinate Transformations Problem
  5.1 Related works
  5.2 Neural Architecture
    5.2.1 Sensory representation
    5.2.2 Posterior Parietal Cortex model
    5.2.3 Head-centered network layer
  5.3 Experimental Results
    5.3.1 Experiment with retinal position in degrees
    5.3.2 Experiment with a simplified camera model
  5.4 Conclusions

6 A Visuomotor Mapping Using an Active Stereo Head Controller Based on the Hering's Law
  6.1 Related works
  6.2 Neural Architecture
    6.2.1 Hering-based Control system
    6.2.2 Extending the Hering-based Control system
    6.2.3 Visuomotor mapping for a 3 DoF arm
  6.3 Experimental Results
    6.3.1 Hering-based results
    6.3.2 Extended Hering-based results
    6.3.3 Visuomotor mapping results
  6.4 Conclusions

7 A model of a middle level of cognition
  7.1 Introduction
  7.2 Biological model
  7.3 Implementation Model
    7.3.1 Intentional Distributed Robot Architecture
    7.3.2 Motor system: movement generations
  7.4 Experimental results
    7.4.1 Setup
    7.4.2 Goal generation
    7.4.3 Movement generation
  7.5 Conclusions

8 Conclusions

Bibliography

List of Figures

2.1 The sets show research fields that are involved in the development of this work. The robotics field represents the classical approach to robotics; the neuroscience field is related to the neuroscientific advances in the comprehension of brain functionalities; biomimetics is the field related to mimicking biological solutions to solve specific problems.
2.2 The classical robot model with a closed loop of sensing and actuation. The information flow is processed in a serial way.
3.1 The anatomy of the dorsal pathway.
3.2 The anatomy of the primary visual cortex area (V1).
3.3 The anatomy of the posterior parietal cortex area (PPC).
3.4 The anatomy of the primary motor cortex area (M1).
3.5 This sketch represents the 3-layer architecture that supports this work. The architecture is composed of 3 layers. The mechanics and sensors layer is the physical robot (or its model, in case of simulation). The electrical/actuators layer is the bridge between the low-level neural circuits and the robot; it is the control interface and it implements how the neural activity is translated into actuation. The neural lattice layer is the brain model, and it is composed of at least two sublayers: the neural circuits and the cognition. The neural circuits layer contains the biologically inspired models of the brain functional areas; their main functionalities are information processing, sensorimotor mapping, and motor representation. The cognitive layer contains other neural circuits that elaborate information at a higher level, taking into account motivations and goals. In my study, each layer can communicate either with the lower or the higher one.
4.1 Proposed neural architecture.
4.2 Cones estimation.
4.3 Teddy estimation.
4.4 Venus estimation.
4.5 Tsukuba estimation.
4.6 Comparison among the proposed neural architecture for disparity estimation and some state-of-the-art algorithms [106]; the table is extracted from the online evaluation page of the Middlebury Database.
5.1 (Left pane) Body definition composed of an eye, a head, and an arm with the same origin. (Right pane) Neural network model. The first layer encodes the sensory information into a neural code, the second layer models the posterior parietal cortex and performs the multisensory fusion, and the third layer encodes the arm position with respect to the head frame of reference.
5.2 Experimental results with rx in degrees. (Top left) Responses of the trained network representing the arm position ax with respect to the head frame of reference for -20°, 0°, and 20°, respectively. (Top right) The error distribution (in degrees) of the estimated ax with respect to the arm position, the eye position, and the retinal position, respectively. The solid lines represent the mean error and the dashed lines the standard-deviation limits. (Bottom left) A receptive field after the training phase of the PPC layer. (Bottom right) Contours at half the maximum response strength for the 64 PPC neurons.
5.3 Experimental results with rx in pixels. (Top left) Responses of the trained network representing the arm position ax with respect to the head frame of reference for -20°, 0°, and 20°, respectively. (Top right) The error distribution (in degrees) of the estimated ax with respect to the arm position, the eye position, and the retinal position, respectively. The solid lines represent the mean error and the dashed lines the standard-deviation limits. (Bottom left) A receptive field after the training phase of the PPC layer. (Bottom right) Contours at half the maximum response strength for the 64 PPC neurons.
6.1 Frames of reference of the active stereo system with 3 DOF. The tilt movement is executed along the x-axis of the world frame, and it rotates the frames of both eyes by θT [rad]. Ideally, I define a virtual neck that performs the tilt movement.
6.2 System architecture. (Left pane) The schematic model of the working environment with the active stereo system and the arm initial position. The aim is to detect the target position in space through stereo cameras, compute the head joint angles to foveate the target, and directly compute the final joint configuration of the arm to reach the target location. The sensorimotor map is learned using the end-effector itself as a target for the vision system. (Right pane) The schematic of the arm. It has 3 DOF with link lengths compatible with the human counterparts. The range of θ1 is [-π/2, π/2], the range of θ2 is [-π/2, π/2], and the range of θ3 is [0, 3π/4].
6.3 Error maps computed for the left eye; I observed very similar error values for the right eye. Top row: testing sets with the error associated to each foveated 3D point. Bottom row: the error distribution in pixels for the testing set. The red line represents the mean error. As expected, the error distribution along the Z direction is lower than along the other directions.
6.4 Original system. The left pane shows the mean error associated to each 3D point in the testing set. The mean error is computed considering each plausible initial joint configuration of the head; for each configuration I compute the foveation error. The right pane shows the mean error distribution.
6.5 Error maps computed for the left eye of the extended system with the fifth neck configuration.
6.6 Extended system. The left pane shows the mean error to foveate each 3D point in the testing set. The mean error is computed considering each plausible initial joint configuration of the head. The right pane shows the mean error distribution.
6.7 The trajectories of the cameras performed by the trained extended system. The blue cross represents the 3D feature in space at position [200 0 40]. For graphical reasons the image is scaled, but it clearly shows that the system first moves the neck, and only when the neck is in a steady position do the eyes perform the vergence movement.
6.8 (Left pane) Complete dataset. The points in space represent the end-effector positions used as targets for the active stereo head. In the dataset, each end-effector position is associated with the corresponding arm joint configuration and the foveating joint angles of the head, with the Euclidean error between the foveation point and the end-effector position. This dataset is used for the cross-folding validation. (Right pane) The Euclidean error between the real end-effector position and the one estimated by the radial basis network; the error is quite low except for those 3D points that are very near to the head and the shoulder.
6.9 Error directions projected on different planes of the world frame of reference. The blue dots represent the targets and the red lines are the distances in space between the target and the arm position computed by the network. For visualization reasons, I do not plot the estimated end-effector position. (Left pane) Error projection onto the X-Z plane. (Right pane) Error projection onto the Y-Z plane.
6.10 Radial basis centers distribution in the input space. The red circle represents a basis center, the blue dots are the testing values, and the cyan dots are the values used for the training phase.
7.1 The overall IDRA architecture. It is composed of a set of Intentional Modules (IM) and a Global Phylogenetic Module (PM). It receives as input a set of sensory information and produces as output the motor commands for the controlled robot.
7.2 A comparison between a (very) sketchy outline of the thalamo-cortical system as commonly described [84] and the suggested architecture. It is worth emphasizing the similarity between various key structures: A. Category Modules vs. cortical areas, B. Global phylogenetic module vs. amygdala, C. Ontogenetic modules vs. thalamus, D. Control signals (high and low bandwidth), E. High-bandwidth data bus vs. intracortical connections.
7.3 Intentional Module (IM).
7.4 Ontogenetic Module (OM).
7.5 State-action table.
7.6 The 2 boards for the experiment.
7.7 The architecture used in the goal-generation experiment.
7.8 The gaze is concentrated on the star object.
7.9 The red line represents the global relevance signal whereas the blue line represents the ontogenetic signal. After training, the cognitive architecture is able to produce a signal of relevance also for the star-shaped objects.
7.10 The robot NAO in the movement experiment.
7.11 The architecture in the second experiment.
7.12 The most relevant hand positions.

List of Tables

3.1 A proposal of classification of the experiments. The first three entries are experiments at the neural circuits level (see Figure 3.5), where the neural architectures are developed focusing on the computational mechanisms found in the brain. The last entry is an experiment that proposes a cognitive architecture, focused on the interaction of the cortex with the other brain areas. It is worth noting that the computational network, once learned, does not need another learning phase, whereas the cognitive model has a variable neural network that learns during the interaction with the environment.
4.1 The angle deviation from optimality.
6.1 Possible configurations.

1 Introduction

The problem of designing an intelligent robot is far from being solved. An interesting challenge, both for engineers and neuroscientists, is to develop a robot that acts like a human, thinking like a human. Even though this problem is not new and researchers have worked in this field since the XX century, in the last decades we have seen the rise of biologically inspired approaches. Biologically inspired solutions mimic what is known about the brain in order to import into robots some capabilities of thinking. Nevertheless, the declared aim is to develop a complete conceptual and computational framework, describing both how the brain might work and how cognition arises. In this thesis, I review the latest findings in this wide research area, with the clear intention of unveiling some features of the brain's computational framework.

1.1 Problem definition

Reaching a target in the environment with an arm is one of the most relevant capabilities of mammals. The reaching task involves several computations that transform the perception of the target into a complete movement of the arm to reach it. First, the target must be perceived with at least one external sensory system, such as vision, and filtered in order to localize its position in the sensory frame of reference (FoR); second, the information coming from the sensory system(s) must be integrated with the proprioceptive information of the body, such as the position of the arm with respect to the body; third, the target position must be computed with respect to the arm FoR, performing a coordinate transformation between the sensory FoR and the arm FoR; fourth, the arm movement trajectory must be computed and executed.
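The third step, the coordinate transformation between the sensory FoR and the arm FoR, can be illustrated with standard homogeneous transforms. The frame layout, offsets, and gaze angle below are illustrative assumptions, not the neural solution developed in the later chapters, which addresses the same problem with learned neural representations.

```python
import numpy as np

def make_transform(rotation_z_rad, translation_xyz):
    """Homogeneous 4x4 transform: rotation about the z-axis, then translation."""
    c, s = np.cos(rotation_z_rad), np.sin(rotation_z_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation_xyz
    return T

# Illustrative offsets (metres) and gaze angle: NOT values from this thesis.
head_from_eye = make_transform(0.3, [0.0, 0.03, 0.0])    # eye rotated 0.3 rad in the head frame
arm_from_head = make_transform(0.0, [-0.2, -0.25, 0.0])  # shoulder offset from the head frame

# A target seen 0.5 m along the eye's z-axis, expressed in the eye FoR.
target_in_eye = np.array([0.0, 0.0, 0.5, 1.0])

# Chain the transforms to express the same point in the arm FoR.
target_in_arm = arm_from_head @ head_from_eye @ target_in_eye
print(target_in_arm[:3])
```

Chaining transforms this way is the geometric core of the problem; the biological interest lies in how cortical circuits approximate it without explicit matrices.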
Typically, an intelligent behaviour is identified by an observer in a subject able to react to unexpected events, and to modify and manipulate the surrounding environment. This implies a strong correlation between the capability of a smart subject to perform actions and high-level cognitive reasoning. Despite recent findings in neuroscience, the underlying mechanisms of the human brain are far from being completely understood. How the brain processes information to solve the reaching task is one of the key features that place mammals at the top of the evolutionary chain.

It is well known that the underlying neural circuitry is organized in several functional areas, each responsible for solving a specific subtask of the whole cortical information processing [59]. This implies that a high level of synchronization among different areas is needed. Moreover, this functional organization follows the well-known divide et impera paradigm. These anatomical and functional characteristics are interesting from a neuroscientific point of view, since almost the whole brain is activated to solve a reaching task [108]. Another interesting feature of the brain is that the underlying computational mechanisms of the different brain functional areas are widespread among mammals [12][29][61][87]. Beyond the scientific implications of this fact, from an engineering point of view the capability of these computational mechanisms to self-adapt to different bodies, exploiting their morphology, is particularly interesting. For these reasons, I conclude that the most interesting characteristics of the basic motion tasks, as they emerge from neuroscience, are: first, the indication that the reaching task is the expression of a complex and intelligent behaviour; second, the high level of synchronization and organization of the different brain areas; third, the adaptability of the computational mechanisms to different body shapes.
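One concrete example of such a widespread computational mechanism, which this thesis later discusses under the name population coding, is the representation of a scalar quantity by the graded activity of many broadly tuned neurons. The sketch below is a minimal illustration; the neuron count, tuning width, and center-of-mass readout are illustrative assumptions, not the models developed in the following chapters.

```python
import numpy as np

# Population coding sketch: a scalar stimulus is represented by the graded
# activity of many broadly tuned neurons and recovered by a population average.
preferred = np.linspace(-40.0, 40.0, 64)  # preferred stimulus of each neuron (e.g. degrees)
sigma = 8.0                               # Gaussian tuning width (illustrative)

def encode(x):
    """Activity of each neuron for stimulus x (Gaussian tuning curves)."""
    return np.exp(-0.5 * ((x - preferred) / sigma) ** 2)

def decode(activity):
    """Center-of-mass readout of the population activity."""
    return np.sum(activity * preferred) / np.sum(activity)

estimate = decode(encode(12.3))
print(estimate)
```

The redundancy of the code is what makes it robust: corrupting a few units barely moves the population average, which is one reason such schemes are attractive for noisy sensors and actuators.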
On the other hand, the robotics community has addressed the reaching task since the early era of robotics [113]. Despite many efforts, a robust and generic solution to the reaching task is still missing. However, it is commonly accepted that the capability to solve the reaching task is a very important characteristic of a robotic system, especially for humanoids. Interacting with the environment is the main goal of any robot, regardless of the specific task performed. Several approaches are available, such as optimal feedback control [108][120][119], visual servoing [113], and adaptive control [50][113]. However, these methods need to know some robot characteristics that may not be available, such as kinematics, dynamics, and controller parameters. Of course, the robot interacts with the environment pursuing a task that must be accomplished. Typically, the knowledge is intrinsically coded by the designer, who programs the robot to perform specific trajectories under several constraints, such as execution time, velocities, and accelerations. Moreover, in the industrial context, robots perform repetitive tasks that need high precision and few autonomous decisions. On the other hand, autonomous robots must solve high-level problems without any explicit definition of them. They typically work in hostile, highly dynamic environments, and they must take decisions with partial knowledge of both the surrounding environment and the robot state. In this case, taking smart decisions is crucial for achieving the goals. An autonomous robot working with these constraints needs the capability to think autonomously and to take actions, pursuing its own goals. For this reason, a cognitive architecture seems the obvious answer to those problems that can be solved by robots and that need high autonomy in the decision-making phase.
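As a concrete instance of the classical methods mentioned above, image-based visual servoing for a single point feature can be sketched as follows. Note that the control law needs the feature depth Z, an example of the robot and scene knowledge these methods require; the gain and feature values are illustrative assumptions, and this textbook form is not a method proposed in this thesis.

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Classical 2x6 interaction matrix of a normalized image point (x, y)
    at depth Z, relating feature velocity to the camera twist."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_step(s, s_star, Z, gain=0.5):
    """One image-based visual servoing step: the camera twist that drives the
    feature error e = s - s* toward zero (v = -gain * pinv(L) @ e)."""
    e = s - s_star
    L = interaction_matrix(s[0], s[1], Z)
    return -gain * np.linalg.pinv(L) @ e

# Illustrative values: feature at (0.1, -0.05), desired at the image center.
v = ibvs_step(np.array([0.1, -0.05]), np.zeros(2), Z=0.5)
print(v)  # 6-vector camera twist: (vx, vy, vz, wx, wy, wz)
```

With this law the feature error decays exponentially in the image, but only as long as the assumed depth Z and camera calibration are accurate, which illustrates the dependence on model knowledge discussed above.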
Classical approaches to developing cognitive architectures are based, among others, on symbolic processing, rules, and statistical learning (for a review see [126]). Going a step further, some scenarios may need a robot that is not only able to take decisions with respect to its past experience, but that is also able to develop new goals and behaviours. This goal-generation phase is grounded on some innate criteria that bootstrap the following behaviours. These new goals should represent a higher level of abstraction with respect to the basic goals, moving towards an artificial consciousness [23][24]. This processing is quite similar in humans. For example, let us suppose that a primary need for a human is eating. Generally, the human will act to reach this objective. However, his actions will differ depending on whether his surrounding environment is the jungle or the metropolis. In the first case, he will hunt animals or collect vegetables, whereas, in the second case, he will go to the market. But, definitely, in both cases he is pursuing his basic goal of eating. Both behaviours (or goals), going to the market and hunting, are higher-level abstractions of the need for eating. These concepts are particularly relevant if the aim is the design of a complex cognitive architecture that is also able to adapt its own behaviour through the interaction with the environment. However, a cognitive agent cannot interact with the environment without a computational framework able to process the incoming sensory information, to automatically estimate the environmental state, and to interact with the surrounding objects. In the same way, an architecture able to reach objects in space, given their position perceived through the sensory system, is useless without a plan, which can be generated by a further level of processing: the cognition.
The synergy among a cognitive architecture, the way in which an agent perceives and interacts, and the working environment can drive towards a new generation of autonomous robots [92][93].

1.2 Biologically inspired solutions

A biologically inspired control strategy (also known as biomimetic or bioinspired) is a computational model inspired by the study of the animal brain. These computational models can solve different problems, from estimating the distance of an object perceived by the visual or auditory system to computing the arm trajectory for reaching it. In the last decades, these models have been applied in the robotics field, following the intuition that a robot that acts like a human could use the same strategies to take a decision. Generally, the implicit assumptions of biologically inspired solutions are that they are robust to noise, able to adapt to different robot morphologies, and represent the best solution simply because evolution has selected them. There are several reasons to design a biologically inspired controller for a generic robot, from humanoids to wheel-driven ones. First, the design methodology is deeply different. Classical solutions are typically based on control system theory [113] and are widely used in factories, where robots are mainly employed to improve productivity. Roughly speaking, a classical design process is composed of four steps: task definition, feasibility study, robot modelling and technology, and controller design. Using control theory principles, it is possible to design open/closed-loop controllers that properly drive the robot in its task-specific activities. Once a robotic setup is dismissed and a new one is implemented, it may be necessary to design a completely different controller to solve the same task.
The constraints of classical solutions are explicitly taken into account by the designer, and the parameters of the controllers must be properly chosen. On the other hand, biologically inspired solutions are derived from the study of animal and human brain circuitry, meaning that the designer observes a solution that already exists in nature and that has been highly optimized by evolution [125]. Second, classical control strategies are ad-hoc solutions that take into account the specific robotic setup; an ad-hoc solution cannot simply be applied to a different robotic setup, whereas biologically inspired solutions have internal parameters that are estimated during a training phase and that are related to the computational strategies rather than to the robot morphology, although bioinspired solutions do exploit the robot shape. Third, classical controllers can deal only with those tasks that are encoded in their programs, because they usually work in structured environments. Biologically inspired solutions are more oriented to autonomous robots and, as will become clear in the following chapters, cognition can emerge in the same neural layer encoding the low-level computational mechanisms. Fourth, in the last twenty years classical solutions have shown their limits in autonomous robotics, whereas biologically inspired strategies have been developed with particular efficiency. Fifth, other approaches to autonomous robotics, even biomimetic ones, usually deal with only a part of the big problem of designing an intelligent machine. Techniques such as Simultaneous Localization and Mapping (SLAM) [72] or visuomotor mapping [21][22] often solve specific tasks, and composing them into a single controller is truly difficult. Moreover, in the author's opinion, other biomimetic approaches, such as evolutionary robotics [37][83], only partially cover the big picture of biologically inspired solutions.
In fact, Webb pointed out that there are several ways to design biologically plausible models, and she derived seven paradigms to evaluate them [130]. A biologically inspired controller based on neuroscientific models should overcome this integration problem, merging computation and cognition on the same neural lattice. Despite the above considerations, designing biologically inspired solutions is not a simple task, for several reasons. First, human and robot mechanics are very different, and the low-level control strategies will differ; human joints are actuated through muscles, whereas robot joints can be directly actuated through servomotors. Second, there is a technological limit in the computational capability of the hardware, and the computational load required by a bioinspired method may prevent real-time operation. Third, mimicking only specific brain areas that solve specific tasks could yield worse performance than classical methods, whereas a complete brain model could truly outperform a complete classical strategy. In the following, I will refer to biologically inspired (or bioinspired) solutions assuming that the bioinspiration is drawn from neuroscientific findings.

1.3 The aim of this work

This thesis pursues several objectives, at different levels of complexity. First, I investigate several biologically inspired models of different cortical areas that are functionally grouped in the visual dorsal pathway. The comparison between each single model and its own state of the art gives insight into the neuroscientific findings related to the computational mechanisms of the cortical areas. Second, a qualitative comparison among previously developed models allows me to propose a common computational framework for those computations that require a minimum level of cognition and that our brain is able to compute automatically. Third, a biologically inspired cognitive architecture is investigated.
It is based on the interaction among the cortex and other areas of the brain (e.g. the thalamus and the amygdala) that typically are not well investigated. Fourth, a roadmap for the development of biologically inspired architectures is proposed. These architectures must exploit the synergy between low-level computation and cognitive development. The analysis involves a comparison among different types of learning for both the low-level computational models and the cognitive architecture.

1.4 Thesis organization

Chapter 2 presents the related works, introducing the common background needed to understand how the scientific and engineering fields interact. Chapter 3 introduces relevant aspects of this thesis, focusing on the visual dorsal pathway, on how to model biologically inspired architectures, and on the motivations driving this work. Chapter 4 introduces the model of the primary visual cortex for the disparity map computation; quantitative measurements are provided and compared with state-of-the-art algorithms for dense disparity map estimation. Chapter 5 introduces a model of the posterior parietal cortex able to compute the visuomotor mapping between the eye frame of reference and the arm frame of reference; the map is learned through unsupervised learning, and quantitative results are provided to evaluate performance with respect to previous results. Chapter 6 introduces a model of Hering's law of equal innervation to control an active stereo head and to compute the arm trajectory to reach the foveated target; quantitative results are provided to verify the robustness of the system. Chapter 7 introduces a cognitive architecture for goal generation based on the interaction among the thalamus, the amygdala, and the cortex; qualitative results are provided to verify the generation of new goals. Chapter 8 draws the conclusions of this thesis, considering the experimental results of the previous chapters.
2 Related works

Biologically inspired solutions for robotics involve at least three main research fields: robotics, biomimetics, and neuroscience. Since the amount of information about these fields is too large to be described in a single chapter, I will focus specifically on those topics that are strictly related to this thesis. The aim is to introduce the concepts that will form the common background for the following chapters. Section 2.1 presents how these research fields overlap and why; Section 2.2 presents the works related to neuroscience, describing the mammalian brain; Section 2.3 describes the classical approaches to mimicking biological solutions; Section 2.4 shows the recent advances in robotics and the classical approaches.

2.1 Overview

Biologically inspired controllers, mimicking brain functionalities in terms of both cognitive capabilities and neural computation, are a fairly new and promising topic. Although the performances are still questionable, recent advances show the potential of this new approach. However, this topic requires knowledge of at least three broad research fields, namely neuroscience, robotics, and biomimetics. Figure 2.1 shows a sketch diagram highlighting several research topics that need a shared knowledge among these research areas. Robotics is the research field dealing with the study and design of robots, which must substitute humans in repetitive or dangerous tasks; biomimetics is the research field that studies how to find both novel solutions and technologies by observing natural evolution; neuroscience is the research field that studies how the brain might work, building models and inferring new theories. The intersection between robotics and biomimetics deals with, among other things, the design of engineering technologies that mimic natural solutions, such as spider webs inspiring new buildings and materials.
For instance, in robotics, the human skeleton and muscles are investigated for the design of new robot mechanics and actuators. Moreover, researchers with a background in both robotics and neuroscience are usually involved in Human-Computer Interface research.

Figure 2.1: The sets show the research fields involved in the development of this work. The robotics field represents the classical approach to robotics; the neuroscience field is related to the neuroscientific advances in the comprehension of brain functionalities; the biomimetics field is related to mimicking biological solutions to solve specific problems.

For example, they study solutions for driving devices with EEG signals; these devices can be wheelchairs, prostheses, or computers. Furthermore, the overlap between neuroscience and biomimetics is quite fuzzy because, from a classical point of view, the description of brain functionalities in terms of computational models is part of the scientific approach; in the neuroscience domain, it is called computational neuroscience. However, the difference consists in the application of these computational models: if they are used for solving engineering problems, such as disparity estimation, they cannot be considered purely scientific models but sit in the common region between neuroscience and biomimetics (for a discussion, refer to [94][130]). Finally, the synergy among these three broad research areas permits the development of novel solutions for the design of biologically inspired controllers that mimic the brain both in cognition and in low-level computational mechanisms. In the following paragraphs, I will present an overview of the recent advances in these fields, also taking into consideration the overlapping topics.
Even though I deal with a huge amount of literature, and the overlapping research areas are not always well defined, I will introduce a simple taxonomy of the most relevant works in these main research fields. The overlapping topics will be presented once, focusing on particular applications.

2.2 Neuroscience

Neuroscience deals with the study of both the central and the peripheral nervous system. This research field covers several topics, from the chemical explanation of the activation level of neurotransmitters to the observed behaviours driven by neural activity. Among these topics, there are at least three research subareas that are relevant to my thesis, namely cognitive neuroscience, evolutionary neuroscience, and computational neuroscience. These fields provide worthy contributions to the development of robotic architectures based on neuroscientific principles. For a very complete introduction to the principles of neuroscience, please refer to [59]. Cognitive neuroscience focuses on the development of a theoretical framework that could fill the gap between neural activity and complex behavioural traits of the brain, such as memory, learning, high-level vision processing, emotion, and higher cognitive functions. The underlying feature, widespread among these brain functionalities, is information processing: how the brain encodes and propagates information [2]. Despite recent advances, the neural mechanisms driving the interaction with the environment to select a proper action are still under discussion [14][26]. According to the classical view, the brain workflow is composed of at least three phases: perception, cognition, and action. Following this classification, the cognitive functions are separated from the sensorimotor system, but recent works show that the cognitive functions are not localized in highly specialized brain areas; instead, they are managed by the same sensorimotor neural populations.
For a complete discussion about recent advances in cognitive neuroscience, please refer to [26]. Evolutionary neuroscience deals with the understanding of human brain evolution through time. In particular, the classical approach for studying the development of the human brain is to compare it with the brains of other mammals. By evaluating the main genetic differences, it is possible to infer how natural evolution has influenced the emergence of humans. Given the aim of my work, I will not focus on this neuroscientific branch, but it is worth noting that this topic perfectly matches the concepts driving the studies in evolutionary robotics; for a complete review, please refer to [99]. Computational neuroscience is the research field dealing with the understanding of how neural populations encode and process information [107]. Typically, these studies propose mathematical models explaining some relevant features of their biological counterparts. According to the taxonomy of Sejnowski et al. [107], three classes of brain models exist: realistic brain models, simplifying brain models, and technology for brain modelling. Realistic brain models encapsulate as many details as possible about the biological object under investigation. These models are characterized by huge computational times on large parallel computers (or clusters of computers) for simulations; an example of this kind of model is the Hodgkin-Huxley neuron model [47]. Simplifying brain models overcome the computational infeasibility of realistic models, providing higher-level characteristics without taking into account the physical dynamics of the chemical processes underlying neural communication; an example of this modelling strategy is the artificial neural network of perceptrons. Finally, technology for brain modelling deals with the development of dedicated hardware mimicking the computational parallelism of the biological brain.
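The contrast between realistic and simplifying models can be illustrated with a leaky integrate-and-fire neuron, a classic simplifying model that keeps only the membrane-potential dynamics and discards the ionic-channel physics captured by the Hodgkin-Huxley equations. This is a minimal sketch; all parameter values are illustrative and not taken from any specific study.

```python
# Leaky integrate-and-fire neuron: a "simplifying" model that keeps only
# the membrane-potential dynamics and discards the ionic-channel physics
# modelled by Hodgkin-Huxley. All parameter values are illustrative.

def simulate_lif(i_input, dt=0.1, tau=10.0, v_rest=-65.0,
                 v_threshold=-50.0, v_reset=-70.0, r_m=10.0):
    """Integrate dV/dt = (-(V - v_rest) + R*I) / tau; return spike times (ms)."""
    v = v_rest
    spikes = []
    for step, i in enumerate(i_input):
        dv = (-(v - v_rest) + r_m * i) * dt / tau
        v += dv
        if v >= v_threshold:          # threshold crossing -> spike
            spikes.append(step * dt)  # record spike time in ms
            v = v_reset               # instantaneous reset after the spike
    return spikes

# A constant suprathreshold input drives the neuron to fire regularly.
spike_times = simulate_lif([2.0] * 1000)
```

A single equation per neuron makes it feasible to simulate large populations in real time, which is exactly the trade-off between realistic and simplifying models discussed above.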
According to Webb's paradigms, realistic models have biological relevance and a fine level of detail, whereas neural networks have a very low degree of biological relevance but a higher level of abstraction [130]. It is worth noting that these models can be described at the level of a single neuron or at the level of a brain area. Single-neuron models try to infer both the firing properties of the biological neuron and the information encoding (at the level of the single neuron); for a review, please refer to [55]. On the other hand, a brain-area model tries to infer the organization of the population of neurons in order to produce a neural activity with the same properties as the biological counterpart. These populations have computational mechanisms that cannot be inferred from single-neuron activities; examples of this approach are models of the primary visual cortex [76][124], the posterior parietal cortex [82][135], and the amygdala-thalamo-cortical interaction [69][70][81]. Among others, an interesting application of neuroscience advances is the prosthesis research field. Even though the design of a prosthesis is well defined in the literature, there are several problems to deal with: designing the artificial limb (more related to robotics issues), controlling the prosthetic implants through neural signals, reducing the reaction of human tissues to the prosthesis material, eliminating noise, and making the prosthesis psychologically accepted. For a complete review, please refer to [63].

2.3 Biomimetics

Biomimetics focuses on the comparison between nature's problem-solving techniques and their application in engineering technologies. Although there is a large amount of literature regarding bioinspired solutions, a widely accepted taxonomy is still missing [130][94].
Recently, a taxonomy proposing three main categories of bioinspiration has been introduced: the comparative approach, the natural via artificial approach, and the natural pro artificial approach [125]. The comparative approach is based on comparing the performances of the natural and the artificial system in order to infer properties; the underlying idea is to perform the same experiments with both systems to investigate similarities and dissimilarities. In the natural via artificial approach, models of biological systems are used to infer properties of the natural counterpart. In the natural pro artificial (pro) approach, engineers study natural systems to infer novel solutions to either novel or well-known problems. The distinction among these three approaches can be debated, but for the purposes of this work I will always refer to the natural pro artificial approach. Following the above-mentioned taxonomy, the pro approach deals, among other things, with the development of natural computation mechanisms to be integrated into engineering systems for problem solving.

Figure 2.2: The classical robot model with a closed loop of sensing and actuation. The information flow is processed in a serial way.

For example, in the computer networks field, a taxonomy of biologically inspired algorithms related to several issues in computer networks, such as routing, anomaly detection, and the spreading of computer viruses, has recently been proposed [71]. In robotics, several models inspired by the (non-)human brain have been proposed. They model several aspects of brain functionality, such as depth estimation [76], localization [72], sensorimotor mapping [21][79], neuroevolution [36], and motor skills [54]. On the other hand, the pro approach also deals with the study and development of new technologies starting from the observation of nature. For example, several sensors have been developed following the study of animals such as reptiles and mammals [9].
2.4 Robotics

Robotics deals with the study of artificial agents able to substitute humans in several tasks. A robotic system is composed of at least four subsystems, namely the mechanical, actuation, sensory, and control subsystems (see Figure 2.2). The mechanical subsystem represents the mechanical structure of the robot, such as wheels and legs for locomotion and arms for manipulation. The actuation subsystem deals with the control laws that rule the actuation of the mechanical parts. The sensory subsystem processes the stimuli coming from both proprioceptive (e.g. encoders) and exteroceptive (e.g. cameras, microphones, or touch) sensors. The control subsystem is responsible for taking decisions considering the task planning and the environmental constraints. Among several taxonomies, robotic systems can be classified with respect to either their working environment or their mechanical structure. Generally speaking, robots classified by their mechanical structure can be divided into two classes: manipulators and mobile robots. Manipulators are characterized by one or more arms whose links are connected by joints. Typically, manipulator tasks deal with the manipulation of objects in space, such as the pick-and-place tasks of classical industrial robotics. Mobile robots are characterized by the capability to move inside their working environment; the movement is achieved through wheels or legs, and the control laws vary with the type of actuation. Moreover, robotic systems can be classified into two classes with respect to the working environment in which they operate, namely industrial robots and autonomous robots. Industrial robots work inside a structured environment, meaning that the environment contains landmarks and static features well recognizable by the robots. Moreover, the environment is typically static, meaning that the robot knows its structure a priori.
On the other hand, autonomous robots deal with unstructured environments in hostile conditions. The environments are usually dynamic, and the robot must take its decisions with respect to the perceived environment state. Typical applications of autonomous robots are security, defence, exploration, and service robotics (e.g. autonomous cars). For a complete review of the previous topics, please refer to [113].

2.4.1 Industrial Robots

Classical industrial robots can be designed and controlled following a well-known design pattern composed of three steps: modelling, planning, and control. Modelling a robot means defining a proper model of the mechanical structure, taking into account geometrical, differential, static, and dynamical constraints. Planning deals with the definition of the robot trajectories with respect to the specific task. In the case of manipulators, trajectories can be known in advance, whereas in the case of mobile robots it may be necessary to take a dynamic environment into account. Knowing the trajectory generated by the planning phase and the sensory information over time, the control subsystem must generate the joint torques that guarantee the robot trajectory.

Modelling

The first step in the design process of a robot controller is to describe its mechanical structure in a formal way. The kinematic model correlates the joint angles with the final position and orientation of the end-effector. Solving the direct kinematic problem means computing the position and orientation of the end-effector knowing the joint angles, whereas solving the inverse kinematic problem means finding the joint angles knowing the end-effector position and orientation. The differential kinematics relates the joint velocities to the angular and linear velocity of the end-effector. The static analysis models the relationship between the force and torque at the end-effector and the joint torques.
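The direct and inverse kinematic problems just defined can be written in closed form for the textbook case of a planar two-link arm. This is a minimal sketch; the link lengths and joint angles are illustrative, not parameters of any robot discussed in this thesis.

```python
import math

def direct_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """Direct kinematics of a planar 2-link arm:
    joint angles (rad) -> end-effector position (x, y)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, l1=1.0, l2=1.0):
    """Inverse kinematics (one of the two solutions):
    end-effector position -> joint angles (rad)."""
    c2 = (x**2 + y**2 - l1**2 - l2**2) / (2 * l1 * l2)
    theta2 = math.acos(c2)                              # elbow angle
    k1, k2 = l1 + l2 * math.cos(theta2), l2 * math.sin(theta2)
    theta1 = math.atan2(y, x) - math.atan2(k2, k1)      # shoulder angle
    return theta1, theta2

# Round trip: angles -> position -> angles recovers the configuration.
x, y = direct_kinematics(0.3, 0.8)
t1, t2 = inverse_kinematics(x, y)
```

Note that the inverse problem admits two solutions (elbow "up" or "down"); the sketch returns only one of them, which already hints at why inverse kinematics is harder than the direct problem for redundant structures.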
The dynamic model of the mechanical structure plays an important role in the design of a robot controller, especially in those cases where dynamical aspects, such as inertia, are relevant. It describes the relationship between the joint torques and the motion of the structure, typically using the Lagrange formulation. Solving the direct dynamic problem means computing the accelerations, velocities, and positions of the joints knowing their torques, whereas solving the inverse dynamic problem means computing the torques starting from the accelerations, velocities, and positions of the joints.

Planning

The planning phase deals with the generation of the motion laws in either the joint space or the working space. Dealing with manipulators and mobile robots implies different planning strategies: the former needs a final joint/position configuration with respect to an initial configuration, whereas the latter needs to know a final position in space. If the environment contains obstacles, it is necessary to formalize the concept of obstacle, defining the regions in space that cannot be entered by the robot, both for manipulators and for mobile robots. Trajectory generation in the presence of obstacles is referred to as the motion planning problem.

Control

Once the trajectory is known, the robot needs a controller able to translate the desired movement into motor commands. Moreover, it may be necessary to take into account the complex kinematics and dynamics of the robot, making the design of a proper controller necessary. The controller receives as input the trajectory given by the planning phase and generates the joint torques required to maintain the desired trajectory. Controllers are generally based on closed-loop feedback, in order to stabilize the trajectories. There are mainly four types of control: motion control, force control, visual servoing, and optimal feedback control. Motion control works with the joint positions or the end-effector position.
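The closed-loop motion control just described can be sketched as a joint-space PD law. This is a minimal sketch on a single joint with unit inertia; the gains, time step, and simulation horizon are illustrative and not tuned for any specific robot.

```python
def pd_motion_control(q_desired, q0=0.0, kp=20.0, kd=5.0, dt=0.01, steps=1000):
    """Closed-loop PD motion control of a single joint with unit inertia:
    torque = kp * position_error - kd * velocity, integrated with Euler."""
    q, dq = q0, 0.0
    for _ in range(steps):
        error = q_desired - q          # feedback: desired minus measured position
        tau = kp * error - kd * dq     # control law -> joint torque
        ddq = tau                      # unit inertia: acceleration = torque
        dq += ddq * dt                 # semi-implicit Euler integration
        q += dq * dt
    return q

# The joint converges close to the desired position of 1.0 rad.
final_q = pd_motion_control(1.0)
```

The proportional term drives the joint towards the reference and the derivative term damps the motion, which is the essence of the feedback stabilization mentioned above; force control, visual servoing, and optimal feedback control replace or augment this inner loop with richer feedback signals.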
Force control models the interaction of the end-effector with the environment, in order to avoid high forces on the end-effector. Visual servoing controllers include in the closed loop a visual feedback given by a set of cameras. Optimal feedback control has a prediction and an update phase, in order to minimize the error between the desired trajectory and the actual trajectory.

Closing remarks

Industrial robots are particularly suitable for applications where the environment is highly structured and static; these properties imply that the robot explicitly knows the structure of its working environment and that the objects in the scene are static, except for the robot itself. These robots are able to reach an extremely high precision in positioning the end-effector, but the classical techniques, derived from classical control system theory, are not well suited to situations where the environment is dynamic and unstructured, and where several robots may have to cooperate.

2.4.2 Autonomous Robots

Despite the advances in the industrial robotics field, classical techniques fail when dealing with unstructured and dynamic environments. This lack of flexibility makes the classical approaches unsuitable for controlling robots with a certain degree of autonomy. Autonomous robots are robots able to take decisions with partial or no information about the surrounding environment, basing their choices on both previous experience and the objective. Typically, the task is defined at the high level but not at the low level of execution, due to the complexity of the environment. The problem is to design (or estimate) a controller able to solve a task in an unstructured, hostile, dynamic environment without full a priori knowledge of the working environment. The development of new techniques has therefore become increasingly necessary, and both machine-learning and human-driven approaches have been proposed.
Evolutionary Robotics

Evolutionary robotics (ER) is a research field where statistical techniques are used to evolve both robot controllers and mechanical structures, emulating natural evolution [37]. The ER process is composed of several cycles in which a population of controllers is evolved in order to estimate the best controller able to solve a specific task in a complex environment. At the beginning, a population of different controllers is generated. Each controller is implemented on a robot (either real or simulated) and, given a task, its performance is evaluated using a fitness function. The controllers are ranked by performance, and only a small subset of the best controllers is used for the next evolution step. Using a classical genetic algorithm (GA) (for a review see [43]), the mutation and crossover operators are applied to the best controllers. Given the new population, evolved from the previous one, the evaluation of the controllers is performed again, repeating the cycle. Typically, these cycles are repeated until the desired performance is reached. An advantage of this approach is that the designer may not need to know much about the working environment, due to its complexity, and a dynamic model of both robot and environment may not be available. Moreover, the controllers can be of different types while the ER process remains the same, except for the GA, which needs to know how to apply the operators. In particular, ER can evolve neural networks, the parameters of previously defined controllers, and more. Furthermore, the ER process can be applied to different kinds of robotic setups without conceptual changes in the process. However, one of the main disadvantages is related to the definition of both the fitness function and the genetic operators. In fact, it is well known in the literature that these operators are the key components for the successful estimation of a robot controller.
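The evolutionary cycle described above (evaluate, rank, select, cross over, mutate, repeat) can be sketched as follows. This is a minimal sketch: the controller is reduced to a flat parameter vector, and the fitness function is a toy target-matching score; both are purely illustrative stand-ins for a real controller evaluated on a robot.

```python
import random

def evolve(fitness, genome_len=8, pop_size=20, elite=4,
           generations=100, sigma=0.3, seed=0):
    """Minimal evolutionary-robotics loop: each genome stands for the
    parameters of a controller; the best genomes are selected, then
    crossover and mutation produce the next generation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)        # 1. evaluate and rank
        parents = pop[:elite]                      # 2. select the best subset
        children = list(parents)                   # elitism: keep the parents
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)     # 3. one-point crossover...
            child = p1[:cut] + p2[cut:]
            child = [g + rng.gauss(0, sigma) for g in child]  # ...and mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy fitness: how close the genome is to an (illustrative) target vector.
target = [0.5] * 8
best = evolve(lambda g: -sum((a - b) ** 2 for a, b in zip(g, target)))
```

In a real ER setup, the fitness call would run the candidate controller on the (real or simulated) robot, which is where most of the computational cost lies.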
For a complete discussion about the choice of the fitness function and a complete list of surveys regarding the different specific topics of ER, please refer to [83].

Developmental Robotics

Developmental robotics (DR) is a recent approach to biologically inspired robotics, focusing both on the acquisition of new skills and on the role of morphological development in robot efficiency. A key idea is the concept of embodiment: it claims that the body mediates sensory inputs and actuation, making the body itself an important part of the emergence of cognition. Thus, DR encloses the idea that body, environment, and brain are coupled and that intelligent behaviours arise from the interaction among them, in such a way that complex behaviours arise from previously developed simple ones. For a review of the theoretical, philosophical, and robotic experiments regarding developmental robotics, please refer to [66][93]. Developmental robotics is a general methodology for developing controllers for autonomous robots. Among the different theoretical frameworks, the most interesting are cognitive developmental robotics (CDR) [8] and autonomous mental development (AMD) [131]. Cognitive developmental robotics is focused on the autonomous development of knowledge through the interaction with the environment, whereas autonomous mental development is more focused on the autonomous understanding of the tasks to be accomplished, where humans provide the reinforcement information about the robot's interaction with the environment.

Learning from demonstration

Learning from demonstration (LfD) is a methodology that enables a robot to correctly choose the proper action with respect to the current state. The robot is trained through the interaction with a human teacher, following a supervised learning paradigm.
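The supervised state-to-action mapping at the heart of LfD can be sketched as a nearest-neighbour policy over teacher-demonstrated pairs. This is a minimal sketch; the states and actions are illustrative scalars and labels, not data from any real teaching session.

```python
def nearest_neighbour_policy(demonstrations):
    """LfD as a supervised mapping: store teacher-demonstrated
    (state, action) pairs and, at run time, return the action of the
    closest known state."""
    def policy(state):
        closest_state, action = min(
            demonstrations,
            key=lambda pair: abs(pair[0] - state))  # 1-D state distance
        return action
    return policy

# Illustrative dataset: the teacher demonstrated which action fits each state.
demos = [(0.0, "stay"), (1.0, "move_left"), (2.0, "move_right")]
policy = nearest_neighbour_policy(demos)
print(policy(1.2))   # closest demonstrated state is 1.0 -> "move_left"
```

The sketch also makes the main limitation of LfD tangible: for a state far from anything the teacher demonstrated, the policy still returns the nearest stored action, however inappropriate it may be.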
LfD is composed of two fundamental phases: first, collecting the dataset of movements shown by the teacher and, second, inferring the controller from the gathered data. The data acquisition can be performed in two ways: by demonstration or by imitation. In the former, the teacher directly demonstrates the state-action pairs on the real robot, whereas in the latter the demonstration is not performed directly on the robot but on a different platform. An example of demonstration is teleoperation, whereas an example of imitation is putting sensors on the teacher. There are three main approaches to learning a control policy: approximating a mapping function, learning a dynamic model of the robot-environment interaction, and representing the policy as a sequence of actions to be chosen by a planner. An advantage of LfD is the capability to train the robot to execute a specific task without an explicit knowledge of the dynamic system. A main disadvantage is that the robot can choose its action only for those states that were encountered during the training phase with the human operator. For a further discussion about the details of the algorithms and a more focused taxonomy of the different LfD strategies, please refer to [7].

2.5 Closing remarks

To reach the goals of this thesis, I need to keep in mind some of the concepts and methodologies developed in the three main research fields described above. Neuroscience provides mathematical models at different levels: first, the receptive fields of specific neuron populations and their parametrization; second, the connections and underlying principles of the network architectures that implement specific functionalities, such as the lateral connections in the posterior parietal cortex [102]; third, the intercortical connections, describing the emergence of cognitive functions. However, these models are generally parametric, and a learning phase is needed.
On the other hand, robotics provides the mathematical framework to design and simulate kinematic chains, useful to test the neurocomputational models. Finally, biomimetics drives, from a theoretical point of view, the study and the adaptation of the neurocomputational models to the robotic framework, suggesting assumptions and simplifications.

3 Motivations

3.1 How the mammal brain executes reaching

Neuroscience investigates how human motor control is distributed among different brain areas and the spinal cord. However, motor control is only part of the system needed to successfully reach a target; sensory information processing, one of the most relevant features of the human brain, is also involved. There is strong evidence that a specific type of sensory information is processed by a small portion of the brain (called a functional area). These areas show different functionalities and are able to extract features, filter the incoming data, and reduce the noise. On the other hand, other brain areas must be able to interpret the neural responses of the sensory areas to correctly compute the position of the target and the trajectory to reach it. The main areas known to be involved in the reaching task are the Primary Visual Cortex, the Posterior Parietal Cortex, and the Motor Cortex. Setting aside the other sensory areas involved in the reaching task, and the fact that the posterior parietal cortex integrates different sensory modalities, I will deal only with the visual system, which is well known to be the major source of sensory information in humans.

3.1.1 Background

In the following paragraphs, I make several assumptions and simplifications, due to the complexity of the topic and the objective difficulty of describing the whole interaction of brain areas solving the reaching task. As already indicated, the brain can be subdivided into different areas, each with specific functionalities.
This divide et impera anatomical separation of the brain areas leads to a hierarchical organization of the brain functionalities, from raw sensorimotor perception to high cognitive capabilities. Given a sensory source, the incoming information is filtered along different brain areas, mixed with other sensory information, and used to make an action decision. In the rest of the text, I call a pathway the information flow through different areas focused on achieving a specific objective, such as a reaching task through visual perception. The dorsal pathway is commonly associated with achieving reaching tasks: from the perception of the target to the execution of the arm motor commands. Although the dorsal pathway receives sensory information from different sources, such as audio, somatosensory, and visual, this dissertation covers only the visual information processing. Following the literature definitions, I call this group of brain areas the visual dorsal pathway.

Widespread computational mechanisms

Despite the huge number of functional areas in the brain that solve different computational problems, such as sensorimotor mapping, depth perception, object classification, and motor control, some computational mechanisms are shared among them. These mechanisms arise from the common computational layer underlying each functional area: the neural network. There are at least six widespread computational mechanisms that should be taken into account, namely population coding [29][39][61], gain modulation [5][102][135], normalization [15], statistical coding [87], feedback connections, and neural plasticity. Population coding is the mechanism used in the brain to represent sensory information. The responses of an ensemble of neurons encode sensory or motor variables in such a way that they can be further processed by downstream cortical areas, e.g. the motor cortex.
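A minimal sketch of population coding and of its robustness to noise is the following; the Gaussian tuning curves, the population-vector readout, and all parameters are illustrative choices, not a model of any specific area:

```python
import numpy as np

# A population of N neurons with Gaussian tuning curves encodes a scalar
# stimulus (e.g. a reach direction); a population-vector readout decodes it.
N = 64
preferred = np.linspace(-np.pi, np.pi, N, endpoint=False)  # preferred values
sigma = 0.5                                                # tuning width

def encode(stimulus, noise_std=0.05):
    rng = np.random.default_rng(0)
    diff = np.angle(np.exp(1j * (stimulus - preferred)))   # wrapped difference
    rates = np.exp(-diff**2 / (2 * sigma**2))
    return rates + rng.normal(0.0, noise_std, N)           # noisy responses

def decode(rates):
    # population vector: activity-weighted circular mean of preferred values
    return float(np.angle(np.sum(rates * np.exp(1j * preferred))))

estimate = decode(encode(0.8))
# although every single neuron is noisy, the distributed code
# recovers a value close to the encoded stimulus 0.8
```

The point of the sketch is precisely the robustness property cited in the text: no single neuron is reliable, but the readout over the whole ensemble is.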
One of the main advantages of using a population of neurons to represent a single variable is its robustness to neural noise [29][61]. Gain modulation is an encoding strategy for populations of neurons where the response amplitude of a single neuron varies without a change in the neuron's selectivity. This modulation, also known as a gain field, can arise from either multiplicative or nonlinear additive responses and is considered a key mechanism in coordinate transformations [5][12]. Normalization is a widespread mechanism in several sensory systems whereby neural responses are divided by the summed activity of a population of neurons to decode a distributed neural representation [15]. Statistical coding is a kind of population coding, especially used for sensory data [87]. It seems to be widespread in the brain areas devoted to preprocessing the incoming sensory data [114], and it offers at least two advantages: it reduces the dimensionality of the input space [53] and it gives an interpretation to the topological organization and the emergence of the neuron receptive fields [52]. Different approaches have been proposed that take into account the statistical properties of the sensory input, such as Independent Component Analysis (ICA) [51][53] and sparse coding [87]. Feedback connections are a mechanism implemented both within and between brain areas, involved in the neural implementation of optimal feedback control [120], in refining visuospatial estimation, and in filtering [59]. Feedback plays an important role in brain performance, but it is typically neglected in modelling due to the intrinsic mathematical complexity of dealing with it. On the other hand, a key feature that plays a very important role is neural plasticity, also known as learning. It works at different levels, from the single neuron to whole brain areas. Hebbian learning is the commonly accepted learning principle at the network level.
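Two of the mechanisms above, gain modulation and divisive normalization, can be sketched in a few lines; the tuning curve, the sigmoidal gain, and every numeric parameter below are made up for clarity:

```python
import numpy as np

# Gain modulation: eye position scales the amplitude of a retinal tuning
# curve without shifting its peak. Divisive normalization: a population
# response is rescaled by its own summed activity.
retinal_pref = np.linspace(-30.0, 30.0, 21)      # preferred retinal positions

def gain_modulated(retinal_pos, eye_pos):
    tuning = np.exp(-(retinal_pos - retinal_pref)**2 / (2 * 8.0**2))
    gain = 1.0 / (1.0 + np.exp(-eye_pos / 10.0)) # sigmoidal eye-position gain
    return gain * tuning                         # multiplicative gain field

def normalize(responses, sigma=0.1):
    return responses / (sigma + responses.sum()) # divisive normalization

low = gain_modulated(5.0, eye_pos=-20.0)
high = gain_modulated(5.0, eye_pos=+20.0)
# the gain changes the amplitude of the population response but not the
# neurons' selectivity: both responses peak at the same preferred position
```

This multiplicative interaction between a sensory tuning curve and a postural signal is the ingredient that the coordinate-transformation models of Section 3.1.3 build upon.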
The visual dorsal pathway

Figure 3.1: The anatomy of the dorsal pathway

The visual dorsal pathway is known to play an important role in vision-for-action tasks (Figure 3.1). The raw sensory information, gathered from the environment, is filtered and processed to achieve a high-level representation of the surrounding environment. As already noted, the main areas involved in the visual dorsal pathway are:

1. the Primary Visual Cortex
2. the Posterior Parietal Cortex
3. the Motor Cortex

The primary visual cortex (V1 area) receives as input the binocular visual signal, previously filtered in the retina and in the Lateral Geniculate Nuclei (LGN), and produces as output several neural activities related to high-level information extracted from the binocular signals, such as depth, motion, segmentation, and target detection [27][59]. The V1 area is the first one where the visual signals coming from the two eyes are combined to compute binocular information, useful for depth estimation; in the following pages, I will focus deeply on depth estimation. The posterior parietal cortex (PPC) receives as input the neural activities of the main sensory areas, such as the primary visual cortex, the auditory cortex, and the somatosensory cortex. Its main tasks are related to the visuospatial localization of the body with respect to the surrounding environment, language, attention, and sensory fusion [59]. The PPC projects to the premotor cortex to coordinate the movements. The motor cortex (M1 area) is related to movement generation, where each subarea of the motor cortex is related to the activity of a specific group of muscles. The motor cortex receives information from two main sources, the PPC and the somatosensory cortex (for muscle proprioception), and directly projects onto the motoneurons.
For a reaching task, the motor cortex computes the movement of the arm to reach a target previously perceived by the visual system and localized by the posterior parietal cortex.

3.1.2 The primary visual cortex

Figure 3.2: The anatomy of the primary visual cortex area (V1)

Anatomy

The Primary Visual Cortex (V1 area) is located in the occipital lobe and is part of the visual cortex (see Figure 3.2). In the Brodmann classification, V1 corresponds to area 17. It covers both hemispheres of the brain, where the left hemisphere receives visual information from the right side of the visual field, and vice versa, through the Lateral Geniculate Nuclei. The Primary Visual Cortex can be subdivided into six different layers, numbered 1 through 6, each with its specific functionality.

Functionalities

Several studies demonstrate that the primary visual cortex deals with the spatiotemporal representation of the visual stimuli. It is the first brain area in the visual pathway dealing with binocular fusion. In the literature, the primary visual cortex is known to solve the problems of depth perception and motion detection. Depth perception is the capability to estimate, from the retinal images, the distance of the objects in the scene with respect to the eyes' frame of reference. The depth of the environment is represented through the disparity map, which is the difference between the two retinal stimuli. Motion detection is the capability to estimate the direction and the velocity of a target moving in the visual field. For a further discussion of the fundamentals of these functionalities, please refer to [27].

Neurocomputational models

The primary visual cortex solves several tasks but, for the purpose of this work, only the model for depth perception is presented.
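The core of the energy-model approach described next can be sketched in one dimension; the stimulus, the Gabor parameters, and the phase-shift encoding of disparity below are illustrative, not the parametrization used in chapter 4:

```python
import numpy as np

# 1-D sketch of the disparity energy model: binocular simple cells filter
# left/right image patches with phase-shifted Gabor receptive fields; a
# complex cell sums the squared responses of a quadrature pair, and its
# energy peaks when the stimulus disparity matches d = -dphase / omega.
x = np.linspace(-8.0, 8.0, 161)
omega, sigma = 1.0, 2.0                       # carrier frequency, envelope width

def gabor(phase):
    return np.exp(-x**2 / (2 * sigma**2)) * np.cos(omega * x + phase)

def complex_cell(left, right, dphase):
    # quadrature pair of binocular simple cells (phases 0 and pi/2)
    s1 = left @ gabor(0.0)      + right @ gabor(0.0 + dphase)
    s2 = left @ gabor(np.pi/2)  + right @ gabor(np.pi/2 + dphase)
    return s1**2 + s2**2                      # disparity energy

def shifted_stimulus(disparity):
    left = np.exp(-x**2 / 2.0)                # a bright spot...
    right = np.exp(-(x - disparity)**2 / 2.0) # ...horizontally shifted
    return left, right

# the cell tuned to disparity d responds most to a stimulus shifted by d:
left, right = shifted_stimulus(2.0)
energies = {d: complex_cell(left, right, -omega * d) for d in (-2.0, 0.0, 2.0)}
best = max(energies, key=energies.get)
```

Reading out, over a bank of such cells, the preferred disparity with the highest energy at each image location is what produces a disparity map.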
The disparity map computation seems to exploit at least three widespread properties of the brain, namely population coding [86], normalization [15], and statistical coding [49][114]. Several computational models address the problem of depth estimation through the computation of the disparity map, using either the energy model [19][76][124] or a template matching approach [121]. The energy models are based on the study of the primary visual cortex performed by Ohzawa et al. [86]. They found the existence of two types of neurons, namely simple cells and complex cells. The simple cells directly filter, with their receptive fields, the binocular stimuli coming from the retinas, whereas the complex cells gather the simple cell responses to effectively estimate the disparity. Both simple and complex cells have a preferred disparity. This means that their response is maximal when they filter visual stimuli with the preferred disparity. Other works found that simple cell receptive fields fit, with a certain degree of confidence, Gabor filters with a proper parametrization [96]. The energy-based models differ in the neural network complexity and in the internal mechanisms used to make the estimation robust, and can be roughly divided into multi-scale [19][76] and single-scale [97][124] approaches. The multi-scale approach improves the robustness of the estimation, with respect to other approaches, by combining the estimations at different levels of granularity [76]. The simple cell receptive fields have different dimensions on the image planes, to filter the visual stimuli at different scales. The single-scale approach uses other mechanisms, such as the combination of both phase- and position-shifts, to improve its performance [97].

3.1.3 The posterior parietal cortex

Figure 3.3: The anatomy of the posterior parietal cortex area (PPC)

Anatomy

The Posterior Parietal Cortex (PPC) is situated in the parietal lobe of the brain (see Figure 3.3).
In particular, it lies posterior to the portion of the Parietal Lobe called the Primary Somatosensory Cortex. The PPC can be subdivided into the Inferior Parietal Lobule and the Superior Parietal Lobule, which are anatomically divided by the Intraparietal sulcus.

Functionalities

Several studies show that the PPC is involved in integrating sensory information [134], manipulation of objects, sensorimotor mapping [13], and coordinate transformations of different body parts [10][82][135]. These functionalities can be found in several subregions of the PPC: the medial intraparietal area (MIP), the lateral intraparietal area (LIP), the ventral intraparietal area (VIP), and the anterior intraparietal area (AIP). For a further discussion of the fundamentals of these functionalities, please refer to [59].

Neurocomputational models

For the purpose of this work, among the several functionalities provided by the PPC, coordinate transformation (CT) is the most interesting. The computation of CTs seems to exploit three widespread properties of the brain, namely population coding [61], gain modulation [5][102][135], and normalization [15]. Several computational models of the PPC address the problem of CTs using three-layer feed-forward neural networks (FNNs) [134], recurrent neural networks (RNNs) [102], or basis functions (BFs) [95]. The FNN and BF models are trained with supervised learning techniques, whereas the RNN model uses a mix of supervised and unsupervised approaches to train the neural connections. However, only the BF model exhibits the capability to encode multiple FoR transformations in its output responses. This result comes from using a complete set of basis functions and an intermediate frame of reference encoding [134].
It is worth noting that gain modulation plays an important role in the computation of coordinate transformations, but it is still unclear whether this property arises in the cortex as a result of the statistical representation of the incoming information. Previous models show that the multiplicative behaviour of gain modulation can arise from supervised learning on a feed-forward neural network [134][135], or they assume gain modulation from the beginning and evaluate under which conditions it can compute coordinate transformations [95][102]. Recently, De Meyer showed evidence that gain fields can arise through the self-organization of an underlying cortical model called Predictive Coding/Biased Competition (PC/BC) [28]. The PC/BC model is composed of an ensemble of neurons with feed-forward and feedback connections, and it is based on the minimization of the residual error between the internal representation of the sensor value and the incoming sensory information. De Meyer demonstrated that the gain modulation mechanism arises through the competition of the neurons inside the PC/BC model, and commented on the feasibility of such a system to compute CTs. Further experiments on the PC/BC model demonstrate its feasibility for solving coordinate transformations [82]. For a complete discussion, please refer to chapter 5.

3.1.4 The primary motor cortex

Figure 3.4: The anatomy of the primary motor cortex area (M1)

Anatomy

The Primary Motor Cortex (M1 area) is located in the frontal lobe and is part of the motor cortex (see Figure 3.4). In the Brodmann classification, the M1 area corresponds to area 4. In humans, the primary motor cortex lies along the central sulcus and the precentral gyrus. The primary motor cortex is anteriorly surrounded by several subregions of the precentral gyrus that are part of the premotor cortex, and posteriorly by the primary somatosensory cortex.
Functionalities

The aim of the primary motor cortex is to control voluntary movements of the body [59]. Several experiments support this observation. Electrical stimulation of the M1 area causes movements in the subjects [90], and there is a temporal correlation between the activity of primary motor cortex neurons and the intention of movement [38]. Moreover, recent studies show that the neural activity is modulated by different sensory modalities, such as vision and somatosensation. This kind of modulation implies a rich heterogeneity in the response properties of the primary motor cortex neurons [45]. For a further discussion of the fundamentals of these functionalities, please refer to [59].

Neurocomputational models

The most interesting functionality of the primary motor cortex is its ability to encode a movement in neural activity. The encoding of movements seems to exploit at least three widespread computational mechanisms, namely population coding [29][39], feedback connections, and normalization [15]. Several computational models of the primary motor cortex address movement encoding using linear models [39], multiple regression models [88], and Kalman filters [132][133]. However, these models consider the firing rate as a continuous statistical distribution, whereas spike trains are generally discrete; to take this fact into account, another class of models has been proposed [64]. On the other hand, the primary motor cortex can be viewed as part of a dynamical system that controls and generates movements; in this case, the feedback connections play an important role in driving arm movements to the target [108][120]. Interestingly, Churchland et al. show that the population activity contains a strong oscillatory component even for non-periodic behaviours [25]. Finally, movement generation can also be viewed as the combination of motor primitives [105].
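The motor-primitive view can be sketched as a linear combination of a few fixed velocity profiles; the bell-shaped primitives and the weights below are illustrative, not taken from [105]:

```python
import numpy as np

# A small set of fixed, bell-shaped velocity profiles (the "primitives")
# is linearly combined to produce the velocity profile of a movement;
# integrating the velocity yields the arm trajectory.
t = np.linspace(0.0, 1.0, 101)

def primitive(centre, width=0.15):
    # a bell-shaped velocity profile, a common choice for motor primitives
    return np.exp(-(t - centre)**2 / (2 * width**2))

primitives = np.array([primitive(c) for c in (0.2, 0.5, 0.8)])

def generate_movement(weights):
    velocity = weights @ primitives           # linear combination
    position = np.cumsum(velocity) * (t[1] - t[0])
    return position

reach = generate_movement(np.array([0.4, 1.0, 0.4]))
# changing only the weight vector reshapes the whole movement
```

Under this view, selecting a movement reduces to selecting a weight vector over a fixed repertoire, which is the choice attributed to the cortico-cerebellar interaction discussed next.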
Following this view, the primary motor cortex interacts with the cerebellum to choose which motor primitives should be combined in order to generate the desired movement [118]. This topic will be fully discussed in chapter 7.

3.2 Modelling a biological architecture

Even though the proposal of a taxonomy for designing a biologically inspired architecture is beyond the scope of this work, I specifically introduce a generic three-layer architecture representing the interaction among several levels of computation, from the basic mechanics to the cognitive aspects (see Figure 3.5). The architecture is a conceptual sketch of how the different robot parts could interact.

Figure 3.5: This sketch represents the three-layer architecture that supports this work. The architecture is composed of three layers. The mechanics and sensors layer is the physical robot (or its model in case of simulation). The electrical/actuators layer is the bridge between the low-level neural circuits and the robot; it is the control interface and it implements how the neural activity is translated into actuation. The neural lattice layer is the brain model and it comprises at least two sublayers: the neural circuits and the cognition. The neural circuits sublayer contains the biologically inspired models of the brain functional areas; their main functionalities are information processing, sensorimotor mapping, and motor representation. The cognitive sublayer contains other neural circuits that elaborate information at a higher level, taking into account motivations and goals. In my study, each layer can communicate either with the lower or the higher one.

According to the principle of ecological balance, the architecture provides a flexible network that should adapt with respect to the robot morphology, the sensory system, the working environment, and the way of actuation [92].
The mechanics and sensors layer represents the physical body of the robot, including, but not limited to, sensors, motors, link technology, materials, and robot morphology. This layer is as general as possible because a strong assumption of biologically inspired controllers is the capability to implement common bioinspired computational principles and, at the same time, to exploit different robot morphologies, given a learning period [91][92]. The Electrical/Actuators layer is a bridge between the neural circuitry of the layer above and the physical robot actuation of the layer below. It contains neural circuitry, as the layer above does, and it implements the actuation circuitry. Its main role is to translate neural population activities into proper motor commands for the physical robot. To clearly outline the matter, let us suppose we have a neural population encoding, in degrees, the movement of a robotic arm with 1 degree of freedom (DOF). Now suppose that the neural population has an activity representing a relative movement, in degrees, with respect to the actual arm position. The neural circuit implicitly has knowledge about its body shape (as in the embodiment paradigm [93]), but it may know nothing about the actuation of the physical arm, which can be actuated through servomotors, DC motors, McKibben muscles, etcetera. The role of the Electrical/Actuators layer is exactly to fill the gap between the representation of the movement and its actuation. For our purposes, the motoneurons are an example of a neural mechanism belonging to this layer. The neural lattice is a layer that coarsely models the brain. Among the others, I focus on two broad functions: the low-level information processing, for both robot and environment state estimation, and the cognition. Both share the same neural lattice, although these functionalities reside in anatomically separated regions of the brain.
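Returning to the 1-DOF example above, the role of the Electrical/Actuators layer can be sketched as follows; the population code, the centre-of-mass readout, and the hypothetical 0–180 degree servo are all illustrative assumptions:

```python
import numpy as np

# A neural population encodes a relative movement in degrees; the
# Electrical/Actuators layer decodes it and translates it into a command
# for whatever actuator the robot happens to have (here, a hypothetical
# position-controlled servo with a 0-180 degree range).
preferred_moves = np.linspace(-90.0, 90.0, 37)   # deg, one per neuron

def decode_relative_move(activity):
    # centre-of-mass readout of the population code
    return float(np.sum(activity * preferred_moves) / np.sum(activity))

def to_servo_command(current_angle, activity):
    # the actuation detail (a position servo) is hidden from the neural
    # circuit, which only knows about relative movements of its body
    target = current_angle + decode_relative_move(activity)
    return float(np.clip(target, 0.0, 180.0))

activity = np.exp(-(preferred_moves - 15.0)**2 / (2 * 10.0**2))  # "move +15 deg"
cmd = to_servo_command(90.0, activity)           # servo setpoint near 105 deg
```

Swapping the servo for a DC motor or a McKibben muscle would change only `to_servo_command`, leaving the neural representation untouched, which is exactly the decoupling this layer provides.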
I expect that the interaction between the cognitive circuits and the neural circuits (for low-level computations) exploits the morphology of the robot, even though some mechanisms could be independent of the morphology, whereas ad-hoc mechanisms could exploit specific body shapes. The sketch diagram shows an overlapping region between the two sublayers, representing the fact that cognition influences the low-level computation, whereas the low-level computation is part of cognitive decisions. The neural circuits sublayer contains the neural populations essential to perform low-level computations, sensory processing and filtering, and information fusion. From a neuroscientific point of view, this sublayer contains those cortical neural populations associated with the functional areas of the brain, such as the primary visual cortex for depth estimation, the posterior parietal cortex for sensorimotor mapping, the motor cortex for movement representation, the cortical areas associated with long-term memory, and so on. Other non-cortical areas reside in this sublayer too, such as the thalamus, the amygdala, and the cerebellum. These circuits have knowledge about the physical body's sensory inputs and the body shape in terms of degrees of freedom, but their role is to produce a distributed representation of both the environment and the body state, without having knowledge of the ultimate goal. The cognition sublayer represents the cognitive aspect of the robot controller. It drives the behaviour with respect to both innate goals and goals developed, through learning, during the robot's existence. This sublayer is likewise implemented by neural populations, but it is worth noting that it specifically focuses on the interaction among the neural populations of the neural circuits sublayer.
In this sublayer, the cortical information coming from the neural circuits sublayer is integrated with that of ancient brain areas, such as the thalamus and the amygdala (encoding the developed and the innate goals, respectively), in order to take a decision. The decision is influenced by the current environment and robot state, and by the current goal. The neural responses drive the neural circuits sublayer towards behaviour that appears intelligent to an external observer. It is worth noting that the main characteristic of this sublayer is the focus on the intercortical connections, which make it possible to build an abstract representation for decision-making. This architecture is a proposal to organize the experiments performed in this thesis, pointing out the relevant aspects of each proposed neural model. Moreover, it serves as a common lattice to clearly identify each neurocomputational model for robotics with respect to its role and its output. On the other hand, it is important to define an approach for modelling biologically inspired systems [94][130]. Roughly speaking, I have chosen a modelling process composed of three steps:

1. investigate neurocomputational findings and select a relevant model of the region of interest;
2. extend the experimental results of the selected model in order to speculate about undiscovered properties;
3. make hypotheses about new properties, extend the model, and investigate them through simulations.

I have applied this modelling process to each experiment that I will describe in the following chapters. This process is fairly independent of the level of abstraction of the investigated model [130]. However, the kind of hypotheses that will be proposed in the third step depends on the level of detail of the model itself. This process is in accordance with the modelling process proposed by Webb [130]; the main difference is that I have used a previously proposed model as a source of information.
3.3 Proposed neural models and their role

This thesis focuses mainly on the top layer, namely the cognition and the neural circuits. Experiments will be presented in the following chapters using a bottom-up approach: first, I will present the experience with the neurocomputational models for robot control, and then I will introduce a cognitive architecture based on biological evidence of the interaction among the cortex, the amygdala, and the thalamus. A synoptic description of the performed experiments is shown in Table 3.1 and, in the following chapters, I will focus on each experiment, specifically showing its main features and aims. A model of the primary visual cortex and its experimentation are proposed in chapter 4. The aim of this experiment is to develop a biologically inspired neural network that is able to compute the disparity map of the environment perceived through a stereo camera. This experiment makes some assumptions: the underlying neural network has a fixed architecture, where each layer has a computational meaning (such as the spatial pooling layer); the receptive fields of the binocular neurons, on the retinal image planes, are constant in time and represented as Gabor filters with a proper parametrization, which implies that the receptive fields are not learned; the number of neurons in the architecture is fixed, and the neuron model follows the disparity energy model proposed by Ohzawa [85]. Continuing along the visual dorsal pathway, a model of the posterior parietal cortex is proposed in chapter 5. The aim of this model is to develop biologically inspired neural populations able to learn their visuomotor mapping through the interaction with the environment.
I make some assumptions in this case as well: the underlying neural network has a fixed architecture and each layer has a population activity that can be decoded to obtain a human-readable format; the receptive fields are learned through a learning phase that uses an unsupervised learning technique. It will become clear that there are two distinct learning phases, both based on classical Hebbian learning. The number of neurons in the architecture is fixed, and there are two different neuron models: the PC/BC model [28] and the classical perceptron model. Moreover, a model of Hering's law of equal innervation is proposed in chapter 6. The aim of this model is to develop a system able to compute the arm motor commands for reaching tasks, given a learned visuomotor mapping. The model estimates the motor commands of a 3-DOF arm able to reach a target in space, knowing only its position in the two stereo image planes. The system is composed of two layers: the first layer is a dynamical system modelling Hering's law of equal innervation, which outputs the head joint angle commands to foveate the target, whereas the second layer is a radial basis function (RBF) network that computes the arm joint commands to reach the target, knowing the head foveation angles. The experimental hypotheses are: the RBF network has a fixed architecture and the number of neurons is fixed; the network training phase is based on a classical supervised technique, namely gradient descent. Even though the Hering-based model partially covers some capabilities of the PPC model, there are several reasons to take it into account. First, the PPC model is a pure computational model whereas the Hering model provides an actuation; second, the PPC model provides a computational mechanism that, in principle, can be replicated in different robot morphologies, whereas the Hering model suggests a control strategy for stereoscopic robots.
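The second layer just described can be sketched as follows; the RBF centres, the learning rate, and the "arm solution" used as supervision are illustrative stand-ins, not the robot's real kinematics:

```python
import numpy as np

# An RBF network with a fixed architecture maps head foveation angles
# (pan, tilt) to 3 arm joint commands; only the linear output weights are
# trained, by gradient descent (delta rule) on a supervised dataset.
g = np.linspace(-1.0, 1.0, 5)
centres = np.array([[a, b] for a in g for b in g])   # fixed 5x5 grid of centres

def features(head_angles):
    d2 = np.sum((head_angles - centres)**2, axis=1)
    return np.exp(-d2 / (2 * 0.4**2))                # Gaussian RBF activations

def toy_arm_solution(head_angles):
    pan, tilt = head_angles                          # hypothetical supervision
    return np.array([pan + tilt, pan - tilt, 0.5 * pan * tilt])

rng = np.random.default_rng(3)
samples = rng.uniform(-0.8, 0.8, size=(300, 2))
W = np.zeros((3, len(centres)))
for _ in range(200):                                 # epochs of gradient descent
    for h in samples:
        phi = features(h)
        err = W @ phi - toy_arm_solution(h)
        W -= 0.05 * np.outer(err, phi)               # delta rule update

joints = W @ features(np.array([0.3, -0.2]))         # predicted arm command
```

Because the architecture is fixed and only the output weights are plastic, training reduces to a convex problem, which is one reason the RBF choice is convenient for this layer.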
Finally, a cognitive architecture (also known as Intentional Architecture, IA, or Intentional Distributed Robot Architecture, IDRA), mimicking the interaction among the thalamus, the cortex, and the amygdala, is proposed in chapter 7. The cognitive architecture is based on recent findings in neuroscience and, to the best of my knowledge, it is the first proposal to model a middle level of cognition. The aim of this last model is to design a biologically inspired architecture that can develop new goals, starting from innate goals, during the interaction with the environment. The focus is on the interaction among different brain areas instead of trying to model every single detail of each area. This experimental model is based on some hypotheses, in order to focus on the emergence of complex behaviours: the neurons are classical perceptrons; the learning phase is a mixed approach, as it uses both unsupervised learning and self-generating reinforcement learning; the neural architecture is variable, with the possibility to recruit new neurons and add other layers. The goals drive the generation of motor commands that improve the goal reaching.

A comparative overview of the performed experiments can be found in Table 3.1. The first three entries represent the experiments related to neurocomputational models at the Neural circuits sublayer, whereas the last entry contains the features of the cognitive experiment.

Model   Layer (Fig. 3.5)  References        Learning method  Type of neurons  Number of neurons  Number of layers  Output
V1      Neural circuits   [76][77][79]      supervised       energy model     fixed              fixed             disparity map
PPC     Neural circuits   [75][78][80][81]  unsupervised     PC/BC [28]       fixed              fixed             sensorimotor mapping
Hering  Neural circuits   [68][69]          supervised       RBF              fixed              fixed             motor commands
IA      Cognition         [82]              reinforcement    mix              variable           variable          goals generation

Table 3.1: A proposed classification of the experiments. The first three entries are experiments at the Neural circuits level (see Figure 3.5), where the neural architectures are developed focusing on the computational mechanisms found in the brain. The last entry is an experiment that proposes a cognitive architecture, focused on the interaction of the cortex with the other brain areas. It is worth noting that the computational networks, once trained, do not need a further learning phase, whereas the cognitive model has a variable neural network that learns during the interaction with the environment.

Interestingly, the neurocomputational models have more or less similar characteristics, except for the learning method. However, it is worth noting that, for these models, the learning phase is performed a priori, before evaluating the performances. On the other hand, the cognitive model has a learning phase that is performed in real time, modelling continuous learning during the interaction with the environment. These facts are reflected in the architecture plasticity. Furthermore, each neurocomputational model can be implemented using different types of neurons, even if all the models share some of the neural computational principles previously described, such as the population coding mechanism. The cognitive model implements the population coding mechanism while also allowing the neural populations to grow during the interaction with the environment. In other words, the cognitive architecture is more focused on the interaction among different neural networks (encoding also ancient areas of the brain, not only the cortex), whereas the neurocomputational models are focused on intracortical mechanisms. Despite the huge amount of literature regarding biologically inspired controllers and neurocomputational models, a common methodology for designing biological architectures is still far from being established.
In particular, there are at least three key features to take into account: the type of learning, the neural architecture, and the encoding strategy. The learning phase can occur at different stages of the design process and can be performed either in real time or offline. Moreover, there is no common way to design a neural network that encodes the underlying mechanisms of the cortex. Finally, while the literature is fairly clear on how information propagates inside a single functional area of the brain, it is difficult to integrate different cortical models, because there is no common encoding strategy that allows them to share the same information. In the following pages, I will introduce why these models are relevant and why the performed experiments constitute a further step in the investigation of biologically inspired models.

3.4 The proposal of a roadmap for developing bioinspired architectures

The aim of this work is the proposal of a roadmap for designing a biologically inspired architecture. The design phase should be flexible enough to adapt to different morphologies, using the same principles to exploit the properties of different body shapes; at the same time, it should provide a certain degree of adaptability, so that the architecture can learn how to control the specific robot. This roadmap is a proposal for investigating potentially relevant models, for further integration into more complex biologically inspired architectures. In Section 3.1 I introduced the brain areas involved in reaching a target. As already discussed, these areas share the same representation mechanisms and are involved in the fusion and filtering of different sensory sources. Moreover, the visual dorsal pathway is able to compute the arm trajectory to reach the perceived target. Even though the visual dorsal pathway is modelled only through merely computational stages, it is worth developing it for robotics.
The primary objective of robots is the capability to substitute humans in their tasks, and most of those tasks frequently require reaching a target (e.g. an object). Reaching a target, or identifying it, is thus an essential task that a robot must accomplish. For this reason, this work specifically focuses on the reaching problem, related both to neuroscience and to robotics (see Sections 3.2 and 3.3). Although the reaching task is a key computational feature of a robot, it should also be driven by a motivation. The capability of computing the arm trajectory to reach a target is a merely computational mechanism that does not need a motivation, or cognition. For this reason, this work takes into account the cognitive aspects that drive the pursuit of a goal. As discussed in Sections 3.2 and 3.3, recent advances provide the theoretical and scientific background to propose a cognitive architecture, based on the interaction of different brain areas, that is able to develop goals through the interaction with the environment. The experiments presented in this thesis should draw the roadmap for a completely new generation of biologically inspired architectures. The experiments can roughly be divided into two categories: those proposing a computational mechanism of the cortical areas (Chapters 4, 5, 6) and those proposing a biologically inspired cognitive architecture (Chapter 7). The first set of experiments, as depicted in Section 3.3, models different cortical areas; it is worth noting that they share the same computational principles, regardless of both the learning mechanism and the type of neurons. This set of experiments clearly shows that the computational mechanisms can be implemented either following a previously developed neuroscientific study (such as V1 or Hering) or through unsupervised learning. This points out that the computational mechanisms could emerge through the interaction with the environment, exactly as in the second set of experiments.
On the other hand, the second set of experiments points out that only the interaction among models of different brain areas can drive the pursuit of a goal and the generation of new ones. A biologically inspired architecture has at least two advantages: it is not task-specific and, using the same underlying neural substrate, it is possible to obtain both computational mechanisms and cognitive functions. First, it departs from the classical approach to designing robotic applications, based on mathematical tools that explicitly construct the desired trajectory in position, velocity, and acceleration. Those classical approaches need to know the dynamic model of the robot exactly, whereas a biologically inspired mechanism can adapt to different robot bodies, exploiting their morphology without explicitly considering the dynamic model. The second advantage is related to the design process itself. In fact, in this roadmap, cognitive functions and computational mechanisms share the same underlying neural substrate, and the training of the neural substrate can also be obtained through the interaction between these two layers. Considering that this is a proposal of a roadmap for developing a biologically inspired architecture, encoding both cognitive functions and low-level computational mechanisms, I recognize that further efforts are needed to reach the whole goal.

4 A Primary Visual Cortex Model for Depth Perception 1

The human primary visual cortex estimates the depth of the environment starting from a pair of stereo images. The visual signal is captured by the photoreceptors of the retina; after a brightness pre-processing, the signal is sent through the Lateral Geniculate Nucleus (LGN) to the primary visual cortex (V1). From the photoreceptors to the LGN the visual signal is strictly monocular, so there is no way to estimate the depth of the environment using stereo image cues (e.g. the retinal disparity).
It is known that depth perception depends primarily on information about retinal disparity, and not on other cues coming from high-level decision areas (like the prefrontal cortex) [58]. So there exists an "automatic process" to estimate disparity that does not imply reasoning. Therefore, if it were possible to simulate the mechanisms that underlie depth perception, I could design a system that estimates retinal disparity in a bio-inspired way. From the point of view of humanoid robotics, which involves different disciplines from biology to engineering, the ultimate goal is to build a humanoid that can interact with humans "like" a human. One of the main issues in designing a controller for humanoid robots is dealing with the vastness of the information available in the surrounding environment; it is not feasible to simply copy a biological system "as is" (and this is true in general, even for biological systems much simpler than humans). Rather, the goal is to discover the principles that underlie biological control and try to transfer those to humanoid robotics [93]. Besides, the field of bionics seeks to design robots that mimic biological structures, and recent works show that a successful design relies on embodiment [93][127]. It follows that the design of the controller (the central nervous system) is inseparable from the morphology of the robot, because both affect the efficiency of the robot [93][67]. So, if I intend to design a complete controller for an anthropomorphic humanoid robot that must interact with humans in hostile environments, and in potentially infinite situations, it could be suitable to exploit the intrinsic structure of the brain for information processing.

1 adapted from [76][77].

The idea is to develop a vision system that can be easily integrated into a more complex architecture, which must also take into account the structure and the embodiment of the humanoid.
Section 4.1 introduces the recent literature on neural approaches to stereo systems based on neuroscientific evidence, Section 4.2 presents the neural architecture modelling the primary visual cortex, Section 4.3 introduces the experimental results, and Section 4.4 draws the conclusions.

4.1 Related works

Generally, stereo vision is the research field that studies how to extract interesting features from two images; one of the main features of interest is the perception of depth. Generally speaking, then, I am searching for algorithms or methods that allow depth to be perceived with adequate reliability. In this section, I introduce the main algorithms and methods published in the recent literature. The first model that explains the intrinsic function of neurons in the primary visual cortex is the disparity energy model [86][85]. This model proposes the existence of two different types of neurons, called simple and complex cells respectively, and explains how they communicate in order to respond maximally in the presence of the disparity to which they are tuned. It also shows that the receptive fields of binocular simple cells can be approximated as Gabor filters, with proper parametrization (see Equation 4.1). The Gabor filter is typically used in signal processing and has several interesting properties (for a detailed description see [65]). However, this model does not explain how a reliable disparity map can be produced. A first systematic study explaining some properties of the disparity energy model, starting from its mathematical definition, is developed in [96][97]. They introduce topics such as spatial pooling, scale pooling, and orientation pooling; each simulation presented there is done with synthetic random dot stereogram (RDS) images [58].
Using synthetic images is reasonable for explaining properties, but in general they are not meaningful for evaluating the performance of the system, because synthetic images have a simplified spatial structure and luminance intensity compared with natural scenes; in my experience, the results of experiments with RDS are misleading. In [35], the firing rate of complex neurons is taken as an estimate of disparity. So, given different complex neurons with different preferred disparities, the estimated disparity is equal to the preferred disparity of the most responsive neuron of the population. A template-matching approach is proposed in [121]; it is based on a scalar measure of the mismatch between the neural responses and the templates of responses, given a specific disparity. In [19], a model that successfully integrates spatial pooling, orientation pooling, and scale pooling is proposed; it presents a coarse-to-fine mechanism with phase- and position-shift integration. Thanks to its neural architecture, the system is robust to image acquisition noise, but the disadvantage is that it was designed to work with small disparities, while in real scenes the disparity range is typically quite wide. Another approach to designing a bio-inspired vision system is discussed in [123], where the cooperation between phase- and position-shift mechanisms is interpreted in a way quite different from [19]. Both methods are based on the disparity energy model, but in [123] the system is intrinsically mono-scale and mono-orientation, and it provides a mechanism to evaluate a large range of disparities. Besides, the authors suggest using a normalized feature in order to assess whether the estimation is reliable and whether an image point belongs to occluded regions.
Their results, compared with [19], seem to indicate a better ability to estimate the disparity map; however, this architecture does not include orientation and scale pooling, which should improve the disparity estimation. In [124], the previous model is extended with orientation pooling in order to accumulate "evidence" supporting a disparity hypothesis. This model is mainly based on a Bayes filter, and it uses a Bayes factor to select the hypothesis with the maximum support. Moreover, the proposed model identifies the occluded pixels. In [124], the authors publish the results of the model tested on Middlebury stereo images [106]. Another approach to disparity map estimation, proposed in [20], is essentially a coarse-to-fine algorithm with orientation and spatial pooling. To obtain a robust disparity estimation, a weighted sum of the complex responses for each orientation is computed. They then define a vector disparity as the vector difference between corresponding points in the left and right images, which permits the evaluation of disparities that are not only horizontal. In fact, the model can estimate disparities that also have a vertical component (this component must be computed when the principal axes of the stereo cameras are not parallel). My model combines the technique proposed in [124] for the computation of large disparities with the capability of the neural architecture of [20] to estimate the vertical component of the disparity. Moreover, in order to improve robustness, I introduce a weighted coarse-to-fine mechanism similar to [19].

4.2 Neural Model

The primary visual cortex (or V1 area) is the first area that integrates information afferent from both eyes to produce a three-dimensional representation of the environment based on the two-dimensional retinal images; this process is also called binocular fusion.
The perception of the environment depth is closely related to the estimation of retinal disparity; the retinal images are not strictly equal because of the physical distance between the two eyes. Computing the disparity between the two retinal images allows the depth of the environment to be estimated, relative to the fixation plane determined by the convergence of the eyes. For my purposes, the problem of depth perception can therefore be reduced to the computation of retinal disparity.

4.2.1 Image preprocessing

Generally the image sizes are not known a priori, so I need to develop a system able to deal with stereo pairs of different dimensions. Obviously, this approach requires care with some system parameters, as I will explain later. From now on I consider only aspects that are independent of the size of the images. The acquired images are, in general, color images, so it is necessary to convert them to luminance data in order to preserve only the intensity component. After this, the mean luminance is subtracted from each image of the stereo pair in order to enhance the edges, like the human retina does [85].

4.2.2 Disparity energy neurons

The disparity energy model explains the response properties of the binocular neurons in V1 [86]. This model uses two types of neurons, the simple and complex cells, which are tuned to specific disparities [85][96]. The left and right images coming from the preprocessing stage are filtered with Gabor filters of different orientation, scale, and shape, according to the disparity energy model and the coarse-to-fine technique with both the phase- and position-shift mechanisms [19]. Let rs and rq be the responses of the simple and complex neurons respectively, and let g(x, y, θ, φ, ∆φ, ω) be the Gabor filter. Then,

g(x, y, θ, φ, ∆φ, ω) = s(x, y, θ, φ, ∆φ, ω) w(x, y, θ)    (4.1)

where s(x, y, θ, φ, ∆φ, ω) is a cosinusoid and w(x, y, θ) is a 2D Gaussian-shaped function (known as the envelope).
The cosinusoid is defined as follows,

s(x, y, θ, φ, ∆φ, ω) = cos(ω[x cos θ + y sin θ] + φ + ∆φ)    (4.2)

where ω is the preferred spatial frequency, θ is the filter orientation, φ is the phase parameter that will be used in the complex response mechanism (to define a quadrature pair of simple responses), and ∆φ is the phase difference between a pair of receptive fields (RFs). The envelope is defined as follows,

w(x, y, θ) = k exp(−[x cos θ + y sin θ]² / (2σx²) − [x sin θ − y cos θ]² / (2σy²))    (4.3)

where σx and σy define the envelope dimensions (and the RF extension) and k, involved in the filter gain, is defined as

k = 1 / (2π σx σy)    (4.4)

Therefore the receptive fields, based on biological evidence [86][85], can be modelled as

gl(x, y) = g(x, y, θ, φ, ∆φ/2, ω)    (4.5)

gr(x, y) = g(x − d, y, θ, φ, −∆φ/2, ω)    (4.6)

where the subscripts l, r represent the left and the right RF, respectively; d is the position-shift parameter and ∆φ is the phase-shift parameter. The simple cell response is then written as

rs = { ∫∫ [gl(x, y) Il(x, y) + gr(x, y) Ir(x, y)] dx dy }²    (4.7)

where Il and Ir are the input images coming from the preprocessing stage and the integrals run over the whole image plane. Finally, the complex cell response is defined as

rq = rs,1 + rs,2    (4.8)

where rs,1 and rs,2 are simple responses in quadrature phase, i.e. φ1 = 0, φ2 = π/2 and ∆φ1 = ∆φ2. The preferred disparity of the complex cell is given by

Dpref = ∆φ / (ω sin θ) + d    (4.9)

which means that the complex cell will respond maximally when the RFs of the complex neuron contain the preferred disparity.

Figure 4.1: Proposed neural architecture

4.2.3 Neural Architecture

In this section I explain the neural architecture of my system (see Figure 4.1). Motivated by the previous section, I integrate some interesting features of the models discussed above. For each pixel in the left image I want to estimate the corresponding pixel in the right image, so as to produce the disparity map.
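The receptive fields and responses of Equations 4.1-4.8 can be sketched in a few lines of NumPy. This is a patch-wise illustration, not the full implementation used in the experiments; the grid construction, the default σx = 2 with σy = 4 (aspect ratio 2, the axis assignment being my assumption), and the function names are mine.

```python
import numpy as np

def gabor(x, y, theta, phi, dphi, omega, sx=2.0, sy=4.0):
    """Binocular simple-cell RF (Eqs. 4.1-4.4): an oriented cosine
    carrier multiplied by a 2D Gaussian envelope."""
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    k = 1.0 / (2.0 * np.pi * sx * sy)            # gain, Eq. 4.4
    env = k * np.exp(-xr ** 2 / (2 * sx ** 2) - yr ** 2 / (2 * sy ** 2))
    return env * np.cos(omega * xr + phi + dphi)

def complex_response(Il, Ir, theta, dphi, omega, d=0):
    """Disparity-energy complex cell (Eqs. 4.5-4.8) on one square
    patch: the sum of two squared simple responses in quadrature
    (phi = 0 and phi = pi/2), each combining the left RF with the
    phase/position-shifted right RF."""
    n = Il.shape[0]
    ax = np.arange(n) - n // 2
    X, Y = np.meshgrid(ax, ax)
    rq = 0.0
    for phi in (0.0, np.pi / 2):                 # quadrature pair
        gl = gabor(X, Y, theta, phi, dphi / 2, omega)
        gr = gabor(X - d, Y, theta, phi, -dphi / 2, omega)
        rq += (np.sum(gl * Il) + np.sum(gr * Ir)) ** 2   # Eq. 4.7
    return rq
```

Sweeping ∆φ and d over a population of such cells and reading out the most responsive one is the decoding strategy discussed in Section 4.1.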
The spatial frequency is ω = π/2 and σx = 2. The position shifts across the population are ∆C = {0, 1, ..., 55}. The aspect ratio is 2.

Spatial pooling

To improve the response of the complex cell, it is possible to take into account the physiological fact that the RF size of the biological complex cell is larger than that of the modelled complex cell [96]. Moreover, the response of a given complex cell is improved by the responses of the nearby complex cells. This observation is included in the model by averaging several pairs of complex cells with overlapping RFs. Spatial pooling can be mathematically defined as

rc(x0, y0) = 1/(a + 1)² Σ_{i = x0 − a/2}^{x0 + a/2} Σ_{j = y0 − a/2}^{y0 + a/2} rq(i, j) w(i, j)    (4.10)

where rq is the complex cell response at the (i, j) stereo image location (see Equation 4.8), w(i, j) is a spatial weighting function, and rc is the spatially pooled response of the complex cell with RFs centered at (x0, y0) over the stereo images. In this system the chosen weighting function is a symmetric two-dimensional Gaussian with σpooling = 2σx.

Normalized response

In [124], the authors propose to evaluate the population responses at different locations in order to estimate the position-shift component, and then to refine the estimation within the chosen population via the phase-shift mechanism. To evaluate the position-shift component they suggest using a normalized feature R defined as

R = (P − M) / M    (4.11)

where P and M are respectively the peak and the mean of the population response curve. It can be demonstrated that the feature R takes values between 0 and 1 whenever M ≥ (P − M). In order to choose the most probable position shift, they choose the population response curve (each localized at a different horizontal location) that maximizes the feature R∆C, where ∆C denotes the current position disparity. Due to the good properties of this normalized feature, I use it to estimate the disparity.
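The normalized feature and the selection of the position shift that maximizes it can be sketched as follows; the function and variable names are mine, not from [124].

```python
import numpy as np

def normalized_feature(responses):
    """Reliability feature R = (P - M) / M (Eq. 4.11): P is the peak
    and M the mean of a population response curve."""
    P = float(np.max(responses))
    M = float(np.mean(responses))
    return (P - M) / M

def select_position_shift(curves):
    """Choose the candidate position shift whose response curve
    maximizes R, i.e. the most sharply peaked (most reliable) one.
    `curves` maps each candidate position shift to its curve."""
    return max(curves, key=lambda dc: normalized_feature(curves[dc]))
```

A flat curve gives R = 0 (no preferred disparity stands out), while a curve with a clear peak gives a larger R, which is why R serves as a reliability measure.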
Orientation pooling

Following [124], I implement a similar orientation pooling mechanism:

R̂ = Σ_θ wθ R∆C,θ    (4.12)

where R̂ formally depends on the position shift ∆C and on the scale, R∆C,θ is the normalized response of Equation 4.11, and the weights wθ are estimated through an exhaustive search in the problem space using a set of Middlebury stereo images. According to previous results [19], I use 5 orientations ranging from −60° to 60° in 30° steps.

Scale pooling

Differently from [124], I propose to introduce a scale pooling phase in order to make the estimated disparity more robust. However, unlike [19] and [20], I introduce the scale pooling as a weighted average across scales. Formally,

R = Σ_s ws R̂    (4.13)

where the weights ws are estimated through an exhaustive search. Based on empirical results, I choose to use only two scales, because the overall performance seems to increase with two scales only. Then, taking the maximum R at each position-shift location, I estimate the most probable disparity.

4.2.4 Disparity direction

For each pixel it is possible to determine the direction of the estimated disparity. I take the preferred direction of the filter (i.e. the direction normal to the principal axis of the filter) as the direction of the two-dimensional preferred disparity associated with the corresponding complex neuron. In [20], the authors propose a weighted sum of the complex cell responses for each orientation (i.e. a center of gravity). Here, I propose a weighted average of the estimated disparity over the orientations, with optimized weights (again estimated with an exhaustive search through the orientations). Formally,

V(x, y) = Σ_i wθi dθi    (4.14)

where wθi is the estimated weight, which depends on the orientation θi (again, it is estimated through an exhaustive search), and dθi is the vector whose modulus equals the estimated disparity at the given orientation θi.
The resultant vector V(x, y) will have the direction of the estimated disparity (in my case I expect always horizontal vectors, because my test images contain only horizontal disparities). My simulations show the effectiveness of the proposed formula, see Table 4.1; for each pixel in the images I compute the estimated disparity direction and then extract the mean square error.

Stereo images   Mean Square Error [rad²]
Venus           0.043
Cones           0.036
Teddy           0.094
Tsukuba         0.180

Table 4.1: The angle deviation from optimality.

Figure 4.2: Cones estimation. (a) Cones ground truth disparity map; (b) Cones estimated disparity map.

4.3 Experimental Results

In this section I present the results obtained through simulation. The bio-inspired model was first coded in Matlab to prove its correctness; afterwards, in order to minimize the computational time, some key functions (e.g. 2D convolution) were implemented in CUDA. Qualitatively, I can compute a dense stereo estimation map of 383x434 pixels (the size of some Middlebury images) in about 12 seconds. The performed simulations concern the disparity estimation and the estimated disparity direction. The estimated disparity maps were submitted to the Middlebury evaluation system and the results are presented in Figure 4.6. The evaluated stereo images are Cones, Teddy, Venus and Tsukuba (see Figures 4.2, 4.3, 4.4, 4.5). It is worth noting that the proposed algorithm is biologically plausible, and this should be taken into account when comparing my approach to the other algorithms. I have obtained an improvement in performance with respect to [124].
Figure 4.3: Teddy estimation. (a) Teddy ground truth disparity map; (b) Teddy estimated disparity map.

Figure 4.4: Venus estimation. (a) Venus ground truth disparity map; (b) Venus estimated disparity map.

Figure 4.5: Tsukuba estimation. (a) Tsukuba ground truth disparity map; (b) Tsukuba estimated disparity map.

Figure 4.6: Comparison between the proposed neural architecture for disparity estimation and some state-of-the-art algorithms [106]; the table is extracted from the online evaluation page of the Middlebury database.

The simulations also show that the architecture, with the weighted sum of the directions of the oriented Gabor filters, is able to correctly identify the horizontal directions of the disparity (recall that the stereo images from the Middlebury database have only horizontal disparities). Further simulations should be performed to validate the model for non-horizontal disparities as well. With respect to the performance reported in [20], I have obtained comparable results in terms of bad-pixel errors.

4.4 Conclusions

In this chapter I have presented a bio-mimetic system that computes a disparity map starting from a pair of stereo images. Previous works show the possibility of developing a bio-inspired system, and here I have proposed a different bio-inspired mechanism in order to improve on the performance reported in [20][124]. In fact, experimental evidence shows that the system is more reliable for small disparities than for large ones [19]. However, as previously shown, natural images exhibit a wide range of possible disparities even within the same scene. One way to overcome this issue is to use coarse Gabor filters large enough to cover large disparities, but empirical evidence seems to indicate that the resulting estimation is not reliable.
Besides, the computational cost is too high, and further research is needed to adapt the simulator for a real-time implementation, for example by implementing the network on dedicated hardware to exploit its parallelism. Another approach, implemented in my system, is to use a smaller coarse Gabor filter covering small but reliable disparities [19], combined with an initial position-shift mechanism [124] at the coarse scale. Experimental evidence and the results presented in [124] seem to confirm the reliability of this type of approach. I have proposed a strategy to integrate the pooling mechanisms proposed in [19] and [20] with the position-shift selection at the coarse scale, based on the bio-mimetic feature analysis proposed in [124]. The obtained results point out that this model of the primary visual cortex is a possible candidate for a physical implementation in hardware. Other works show this possibility, but in general these implementations do not include the pooling mechanisms over orientations and scales [111][122], whose contribution has been pointed out to be fundamental in natural scenes. The output of the system is a disparity map with associated disparity directions (in the current implementation, only horizontal directions); with these decoded features it is possible to correctly estimate the depth of the environment.

5 A Posterior Parietal Cortex Model to Solve the Coordinate Transformations Problem 1

In humans, the problem of coordinate transformations is far from being completely understood. The problem is often addressed using a mix of supervised and unsupervised learning techniques. In this chapter, I propose a novel learning framework which requires only unsupervised learning. I design a neural architecture that models the visual dorsal pathway and learns coordinate transformations in a computer simulation comprising an eye, a head and an arm (each with one degree of freedom). The learning is carried out in two stages.
First, I train a posterior parietal cortex (PPC) model to learn transformations between different frames of reference. Second, I train a head-centered neural layer to compute the position of the arm with respect to the head. My results show the self-organization of the receptive fields (gain fields) in the PPC model and the self-tuning of the response of the head-centered population of neurons. This chapter is organized as follows. In Section 5.1 I present the related works, in Section 5.2 I design the neural network model that performs the implicit sensorimotor mapping, in Section 5.3 I present the performed experiments, and in Section 5.4 I draw my conclusions.

5.1 Related works

A coordinate transformation (CT) is the capability to compute the position of a point in space with respect to a specific frame of reference (FoR), given the position of the same point in another FoR. The way the mammalian brain solves the problem of CTs has been studied extensively. It is now fairly well known from lesion studies [108] that the main area involved in this type of computation is the posterior parietal cortex [4][48]. The computation of CTs seems to exploit two widespread properties of the brain, namely population coding [61] and gain modulation [5][102].

1 adapted from [82].

Population coding is a general mechanism used by the brain to represent information, both to encode sensory stimuli and to drive the body actuators. The responses of an ensemble of neurons encode sensory or motor variables in such a way that they can be further processed by the next cortical areas, e.g. the motor cortex. There are at least two main advantages of using a population of neurons to encode information: robustness to noise [61] and the capability to approximate nonlinear transformations [95].
Gain modulation is an encoding strategy in which the amplitude of the response of a single neuron can be scaled without changing the response selectivity of the neuron. This modulation, also known as a gain field, can arise from either multiplicative or nonlinear additive responses [5][12]. Several computational models of the PPC address the problem of CTs using three-layer feed-forward neural networks (FNNs) [134], recurrent neural networks (RNNs) [102], or basis functions (BFs) [95]. The FNN and BF models are trained with supervised learning techniques, whereas the RNN model uses a mix of supervised and unsupervised approaches to train the neural connections, encoding multiple FoR transformations in the output responses. It is worth noting that gain modulation plays an important role in the computation of coordinate transformations, but it is still unclear whether this property emerges in the cortex from the statistical properties of the afferent (visual) information. Recently, De Meyer showed evidence supporting the idea that gain fields can arise through the self-organization of an underlying cortical model called Predictive Coding/Biased Competition (PC/BC) [28]. That work demonstrates that the gain modulation mechanism arises through the competition of the neurons inside the PC/BC model, and comments on the feasibility of such a system for computing CTs. These computational models of the PPC could be particularly suitable for the robotics community to solve the well-known problem of CTs. In the recent past, an architecture was proposed that explicitly includes a PPC model composed of a set of radial basis functions trained with supervised learning techniques [21]. However, most of the approaches in robotics address the problem of FoR transformations within the more general sensorimotor mapping framework, without explicitly exploiting the features of PPC models [48]. Following these ideas, I present a biologically inspired model for CTs.
First, I describe the training of a PPC model with an unsupervised learning approach; second, I introduce the computation of the arm position with respect to the head position. I hypothesise that gain modulation mechanisms can emerge in the PPC neurons, and that basis functions, encoding parallel CTs, can emerge after the training phase.

Figure 5.1: (Left pane) Body definition, composed of an eye, a head and an arm with the same origin. (Right pane) Neural network model. The first layer encodes the sensory information into a neural code, the second layer models the posterior parietal cortex and performs the multisensory fusion, and the third layer encodes the arm position with respect to the head frame of reference.

The main contributions of this work are: first, to show an unsupervised approach to the learning of sensorimotor mappings; second, to exploit the synergy between a biologically inspired neural network and the population coding paradigm; and third, to introduce a quantitative evaluation of the sensorimotor mapping performance.

5.2 Neural Architecture

In this section I present the neural model used for computing CTs between the arm and the head FoR. I define a simple mechanical structure composed of an eye, a head and an arm with the same origin. I assume the same origin because the fixed translations among these FoRs can be neglected, due to their known contribution in the computation of the CTs (Figure 5.1, left pane). The eye position is defined by the angle ex with respect to the head FoR, the retinal position of the arm stimulus is defined by the angle rx with respect to the eye FoR, and the head-centered position of the arm is defined by the angle ax = rx + ex (see Figure 5.1, left pane).
The neural architecture is divided into three layers: the first is composed of two populations of neurons which represent the retinal position of the arm, rx, and the eye position with respect to the head, ex. The second is composed of a PPC population of neurons that encodes the position of the arm in different FoRs. The third is a population of neurons that encodes the arm position with respect to the head FoR.

5 A Posterior Parietal Cortex Model to Solve the Coordinate Transformations Problem

5.2.1 Sensory representation

The first layer of the network model receives as input the analog eye position with respect to the head FoR (ex) as well as the arm position with respect to the retinal FoR (rx). For my purposes, I defined the eye angle ex in degrees and the retinal position of the target rx both in degrees and in pixels; see Section 5.3. These numeric values are encoded in a population coding paradigm, where a given sensor value is represented by a population of neural responses: each neuron is centered at a particular value and fires more strongly as the sensor value approaches the neuron's preferred sensor value. The response of a population neuron is defined as:

n_i = A exp(−(v − µ_i)² / (2σ²))   (5.1)

where n_i is the response of the population neuron i, µ_i is the neuron's preferred sensor value, v is the input analog angle (in degrees) and σ is the standard deviation of the Gaussian. For example, suppose that the eye is at a certain angle ex in degrees; the representation of the eye position with respect to the head FoR is then given by a set of population responses as follows:

p_e = [n_0, …, n_M]ᵀ   (5.2)

where p_e ∈ Rᴹ is the vector that contains the population responses, M is the number of neurons in the population, and n_i is the single-neuron response given by Equation 5.1. It is worth noting that ex ranges between a minimum and a maximum value and that the distribution of the neuron preferred sensor values µ_i can be arbitrary.
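The sensory encoding just described can be sketched as follows. This is a minimal illustration: the amplitude A = 1 and the 61-neuron layout come from the text, while the function and variable names are mine.

```python
import numpy as np

def encode_population(v, mu, sigma=6.0, A=1.0):
    # Gaussian population code (Eq. 5.1): neuron i fires more strongly
    # the closer the analog value v is to its preferred value mu[i].
    return A * np.exp(-(v - mu) ** 2 / (2.0 * sigma ** 2))

# Preferred values distributed linearly over the sensor range, so that
# no region of the space is over-represented (as argued in the text).
mu_eye = np.linspace(-30.0, 30.0, 61)    # 61 neurons for e_x
p_e = encode_population(10.0, mu_eye)    # population vector for e_x = 10 deg
```

The resulting vector p_e peaks at the neuron whose preferred value is closest to the input angle, which is exactly the property the decoding stage later relies on.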
I choose to distribute these values linearly in the sensor space because there should not be a preferred region of space where the sensor values are over-represented. Similar considerations also hold for the representation of rx, defining the corresponding population responses vector p_r with dimension L. I define the overall sensory representation as:

x = [p_rᵀ p_eᵀ]ᵀ   (5.3)

where x ∈ R^(L+M) is composed of the responses of the two populations representing the eye position with respect to the head FoR and the retinal position of the arm with respect to the eye FoR.

5.2.2 Posterior Parietal Cortex model

The PPC layer is based on the Predictive Coding/Biased Competition (PC/BC) model proposed in [28]. The model is trained with an unsupervised approach based on Hebbian learning. The system equations are:

s = x ⊘ (ε₂ + Ŵᵀ y)
y = (ε₁ + y) ⊗ W s   (5.4)

where s is the internal state of the PC/BC model, x = [n_0, …, n_(L+M)] is the neural population input vector defined by L retinal neurons and M neurons encoding the eye position, W is the weight matrix, Ŵ is the normalized W, y is the output vector of the PPC layer, ε₁ and ε₂ are constant parameters, and ⊘ and ⊗ indicate element-wise division and multiplication, respectively. These equations are evaluated iteratively for a certain number of time steps; after a certain period of time, y and s reach a steady state. The internal state s is self-tuned and represents the similarity between the input vector x and the reconstruction of the input Ŵᵀ y (s ≈ 1 indicates an almost perfect reconstruction). The unsupervised training rule is given by:

W = W ⊗ {1 + β y (sᵀ − 1)}   (5.5)

where β is the learning rate. This training rule minimizes the difference between the population responses x and the input reconstruction Wᵀ y; the weights increase where s > 1 and decrease where s < 1. Let us consider the output vector y = [y_0, …, y_T] as the population responses of the PPC model.
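The iterative PC/BC dynamics (Eq. 5.4) and the Hebbian rule (Eq. 5.5) can be sketched as below. The row normalisation used for Ŵ and the values of ε₁, ε₂ and β are my assumptions for illustration; [28] should be consulted for the exact parameterisation.

```python
import numpy as np

def pcbc_steady_state(x, W, n_steps=50, eps1=1e-4, eps2=1e-3):
    # Iterate Eq. 5.4 until y and s settle: s compares the input x with
    # the reconstruction W_hat^T y, and y implements the biased
    # competition between PPC neurons (element-wise ops throughout).
    W_hat = W / (W.sum(axis=1, keepdims=True) + 1e-12)  # assumed normalisation
    y = np.zeros(W.shape[0])
    for _ in range(n_steps):
        s = x / (eps2 + W_hat.T @ y)
        y = (eps1 + y) * (W @ s)
    return y, s

def hebbian_step(W, y, s, beta=0.01):
    # Eq. 5.5: weights grow where the input is under-reconstructed
    # (s > 1) and shrink where it is over-reconstructed (s < 1).
    return W * (1.0 + beta * np.outer(y, s - 1.0))
```

With a perfectly matched weight row, s converges close to 1 and the weights stop changing, which is the fixed point the training rule aims for.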
Each neuron response y_i should be compatible with the gain modulation paradigm, according to the experimental results of [28], in such a way that the response exhibits a multiplicative behaviour as a function of both eye and retinal positions. The weight matrix, which encodes the response properties, is internal to the PPC model, and its training phase is independent of the subsequent training phase that involves the head-centered network layer.

5.2.3 Head-centered network layer

The population of neurons associated with the head-centered frame of reference deals with the estimation of the arm position ax given the eye angle ex and the projection of the arm in the retina rx. The synapses between the PPC layer and the head-centered frame are trained with Hebbian learning, taking into account the arm position ax. Estimating ax means identifying the maximum response inside a population of neurons that encodes ax with the population coding paradigm.

Figure 5.2: Experimental results with rx in degrees. (Top left) Responses of the trained network representing the arm position ax with respect to the head frame of reference for −20°, 0° and 20°, respectively. (Top right) Error distribution (in degrees) of the estimated ax with respect to the arm position, the eye position and the retinal position, respectively; the solid lines represent the mean error and the dashed lines the standard deviation limits. (Bottom left) A receptive field after the training phase of the PPC layer. (Bottom right) Contours at half the maximum response strength for the 64 PPC neurons.

The head-centered population responses are given by h = K y, where y is the output vector of the PPC model and K is the weight matrix representing the
fully-connected synapses between the PPC model and the head-centered layer, and h is a vector that contains the population responses encoding the estimated ax. The dimension of h depends on the granularity of the ax encoding. The training phase is performed using Hebbian learning:

K = K + δ h p_aᵀ,  δ = 1/N   (5.6)

where p_a is the vector that contains the proprioceptive population responses encoding ax, and δ is the learning rate, depending on N, the number of samples.

Figure 5.3: Experimental results with rx in pixels. (Top left) Responses of the trained network representing the arm position ax with respect to the head frame of reference for −20°, 0° and 20°, respectively. (Top right) Error distribution (in degrees) of the estimated ax with respect to the arm position, the eye position and the retinal position, respectively; the solid lines represent the mean error and the dashed lines the standard deviation limits. (Bottom left) A receptive field after the training phase of the PPC layer. (Bottom right) Contours at half the maximum response strength for the 64 PPC neurons.

5.3 Experimental Results

In this section I present the results obtained in two experiments: in the first experiment I train and analyse the network where both the eye angle and the retinal position are encoded in degrees; in the second experiment I introduce a simple camera model to encode the retinal information in pixels. The training phase is carried out in two steps: (1) train the PPC layer and (2) train the head-centered layer. The PPC layer is trained following the method described in Section 5.2.2 (Equation 5.5) and the synapses between the PPC and head-centered layers are trained using Hebbian learning as described in Section 5.2.3 (Equation 5.6).
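A sketch of the head-centered training and readout is given below. Note that, as printed, Equation 5.6 pairs h with p_aᵀ, which only matches the layer dimensions if the head-centered responses are driven by the proprioceptive teacher during learning; the sketch therefore correlates p_a with the PPC output y, a teacher-forced Hebbian reading that is my interpretation, not a statement of the thesis.

```python
import numpy as np

def train_head_centered(Y, Pa):
    # Hebbian learning of K with delta = 1/N: accumulate the
    # co-activation of the proprioceptive teacher p_a (rows of Pa)
    # and the PPC output y (rows of Y) over the N training samples.
    N = Y.shape[0]
    K = np.zeros((Pa.shape[1], Y.shape[1]))
    for y, p_a in zip(Y, Pa):
        K += (1.0 / N) * np.outer(p_a, y)
    return K

def decode_ax(K, y, mu_a):
    # Estimate a_x as the preferred value of the maximally responding
    # head-centered neuron (h = K y, population-coding readout).
    h = K @ y
    return mu_a[int(np.argmax(h))]
```

The decoding step implements the "identify the maximum response" readout described in Section 5.2.3.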
5.3.1 Experiment with retinal position in degrees

In the first experiment, I encode both rx and ex in degrees and, for the PPC layer, I use the same parameter values as in [28]. The output y is a 64-element vector, with the sensor value ranges defined as follows: rx ∈ [−30°, 30°], ex ∈ [−30°, 30°], ax ∈ [−60°, 60°]. I encode the sensory input with a population of 61 neurons with a Gaussian response and a standard deviation σ = 6°. The σ value is chosen taking into account the experiment described in [28], whereas the neuron preferred values are equally distributed inside the value range. After the training of the PPC layer, I train the head-centered layer with a population of 121 neurons, defining h as a 121-element vector. With 121 neurons representing ax over a 120° range, the coding resolution (1°) follows analytically. The standard deviation of the neuron responses associated with the arm position ax is equal to 6°. The population of neurons encoding the proprioceptive position of the arm has the same number of neurons as the head-centered layer (121), and each neuron has the same standard deviation (6°). The proprioceptive responses vector p_a drives the Hebbian learning for the head-centered neural layer (Equation 5.6). Figure 5.2 shows the analysis of the trained network: the top left pane shows the responses of the trained network representing the arm position ax with respect to the head frame of reference. The red solid line represents the response for ax = 20°, the green dash-dot line the response for ax = 0° and the blue dashed line the response for ax = −20°. The top right pane shows the error distribution (in degrees) of the estimated ax with respect to the arm position, the eye position and the retinal position, respectively. The solid lines represent the mean error and the dashed lines represent the standard deviation limits.
The error distributions are quite similar and, in general, the error is quite low, with a global mean error of 1.93° and a global standard deviation of 1.89°. The bottom left pane shows a receptive field after the training phase of the PPC layer, illustrating the global shape of the gain modulation. As expected, the curve shapes are compatible with the gain modulation paradigm, supporting the evidence that an unsupervised method can effectively learn a multiplicative behaviour. The bottom right pane shows the contours at half the maximum response strength for the 64 PPC neurons: it is worth noting the different colors of the contours, which represent different levels of activation. A qualitative analysis points out that the population responses are stronger where the corresponding neuron receptive fields are slightly overlapped. Moreover, the PPC neuron receptive fields almost cover the whole subspace in the ex-rx plane, indicating that there is at least one neuron firing for each combination of ex and rx.

5.3.2 Experiment with a simplified camera model

In the second experiment I investigate a more realistic scenario where the retinal position is a pixel position in the image plane. I consider only the horizontal component of the image position of the arm. To compute the real ax value I exploit the geometrical constraints given by the camera model; specifically:

ax = ex + tan⁻¹(rx / f)  [°]   (5.7)

where rx is the retinal position of the arm in pixels and f is the focal length of the camera. For my purposes, I choose a focal length of 120 pixels, which represents a camera with a lens aperture of about 140°. The PPC layer contains 64 neurons, but the input ranges are rx ∈ [−320, 320], ex ∈ [−25°, 25°], ax ∈ [−94°, 94°], where rx is defined in pixels; it follows that I assume an image plane whose horizontal size is 641 pixels. The range of ax follows from the maximum value that ax can reach.
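Equation 5.7 and the quoted ranges can be checked numerically with a small sketch (the function name is mine):

```python
import math

def head_centered_angle(e_x_deg, r_x_px, f_px=120.0):
    # Ground-truth head-centered arm angle (Eq. 5.7), in degrees:
    # a_x = e_x + atan(r_x / f), with r_x in pixels and f the focal length.
    return e_x_deg + math.degrees(math.atan(r_x_px / f_px))

# Extremes of the ranges used in the text: e_x = 25 deg and r_x = 320 px.
a_max = head_centered_angle(25.0, 320.0)
```

With f = 120 px, the half image width of 320 px subtends about 69°, so |ax| peaks at roughly 94°, which matches the range ax ∈ [−94°, 94°] stated above.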
I use 101 and 51 neurons to represent rx and ex, respectively. The standard deviation σ of the Gaussians representing rx is equal to 60 pixels. Also in this case, the standard deviation of the proprioceptive neurons encoding ax is equal to 6°. Figure 5.3 shows the results from the analysis of the trained network. The overall performance is lower than that obtained in the previous experiment: the top right pane shows the error distribution with respect to the arm, eye and retinal position, respectively. In this set of experiments, during the PPC learning, the system is able to learn PPC receptive fields that are qualitatively compatible with the gain modulation principle (see Figure 5.3, bottom left pane). The bottom right pane shows the receptive field distribution in the rx-ex space, with the same qualitative features as in the previous experiment. The estimation of ax has a global mean error of 3.36° with a global standard deviation of 2.90°.

5.4 Conclusions

This chapter described an unsupervised approach to learn coordinate transformations. The results show how the system is able to correctly compute the position of a target with respect to the stable head frame of reference, knowing only the projection of the target onto the image plane and the eye position with respect to the head. Further experiments are foreseen to validate the model in more realistic scenarios, trying the method on a real robotic system and extending the model to complex physical architectures.

6 A Visuomotor Mapping Using an Active Stereo Head Controller Based on the Hering's Law

Solving the reaching task means computing the final hand position in space after a sensory stimulus (usually vision) has indicated the position of a target.
Before computing the trajectory of the arm, it is necessary to estimate the target position with respect to the arm frame of reference (FoR). More specifically, the controller has to be able to compute the chain of coordinate transformations from the sensory input to the actuation, considering also that the dimensionality of the input and output spaces often differs. A way to define these transformations is to compute a sensorimotor map that correlates the sensory input space with the actuator space. According to neuroscience findings, in particular the theory of neural modelling, the system can learn the sensorimotor mapping over a radial basis framework [95]. In principle, it is possible to compute sensorimotor maps among several sensory and actuator systems, but for my purposes I always refer to the visual sensory system; the mapping between the feature space of the visual stimuli and the joint space of the actuator is called visuomotor mapping. In my study an active stereo head, able to triangulate targets in space, processes the incoming visual information. Given a target, the active stereo head is able to foveate it in a specific joint configuration. If the target is the arm's end-effector, the learning strategy correlates the arm joint configuration with the head joint configuration that foveates the target. The learning of the visuomotor mapping is then obtained with an active stereo head and an arm. Three degrees of freedom are enough in my study, since the system only needs to reach a point, regardless of its orientation. Typically, this mapping is learned after foveating several random arm movements: this technique is named motor babbling [101]. Motor babbling is a learning schema (or system identification) whereby the robot autonomously develops an internal model of its body in the environment

1. This chapter is adapted from [75][78][79].
either randomly or systematically exploring different configurations. In this work I present a bioinspired approach to reaching; I show how data from a redundant stereo camera structure, driven by a controller fitting Hering's law of equal innervation, are used to build a visuomotor map in the radial basis framework. The main contributions of this work are briefly summarized. The first result is to successfully exploit the synergy between an active stereo vision system based on Hering's law and a radial basis network that performs the visuomotor mapping. The second contribution is to show how a redundant stereo camera controller can be effectively used to train a sensorimotor map. The third novel contribution is to investigate how robust the Hering-based head controller is in computing the foveation joint angles and interpreting them as input features for the radial basis network (in the visuomotor mapping). Section 6.1 presents the related works, Section 6.2 introduces the proposed neural model, both for vergence and for visuomotor mapping, Section 6.3 presents the performed experiments, and in Section 6.4 I derive my conclusions.

6.1 Related works

Even though the problem of sensorimotor representation is widely covered in the literature (for a review, see [48]), for the purposes of my work I consider only those papers approaching the visuomotor mapping through motor babbling [101]. First, I review recent works on active stereo systems; second, I present some works approaching the sensorimotor mapping problem. Several approaches and methods to effectively employ active vision techniques are surveyed in [18]. The authors describe problems arising from many applications, e.g. object recognition, tracking, robotic manipulation, localization and mapping. Many techniques have been proposed to deal with the low-level control strategies that drive the active stereo head.
Among the different surveyed approaches, it is worth noting that most of the bioinspired architectures are based on the disparity energy model [86], directly controlling vergence and version. Wang et al. show the autonomous development of vergence control, maximizing neural responses through reinforcement learning [128][129]. Gibaldi et al. show a model that directly extracts the disparity-vergence response without an explicit calculation of the disparity [41]. Moreover, the same authors implement the control strategy for the iCub head to foveate steady or moving objects along the depth direction, considering only some fixed configurations in the tilt direction [40]. Shimonomura et al. propose a hardware stereo head built with an FPGA and silicon retinas; the vergence system is able to foveate a point by processing the disparity computed with the energy model [112]. Tsang et al. [122] show a gaze and vergence control system using the disparity energy model with a vergence-version control and a virtual vergence component. Qu et al. [98] propose a neural model based on the energy model, introducing orientation and scale pooling; they show how the novel features improve the learning curve. Sun et al. [117] demonstrate that the vergence command can be learned starting from a sparse coding paradigm. Other recent approaches addressing the vergence problem are based on more classical algorithms, either fuzzy [62] or SIFT-based [6]. Typically the experimental data are collected only along the depth direction; in my research, instead, I have addressed the problem of producing statistics related to a wider space along the three directions in space. Moreover, I have introduced the neck redundancy in order to improve the capability of the control system. On the other hand, the visuomotor mapping is the correlation between the visual representation of the target and the arm joint configuration needed to reach it. Chinellato et al.
show a bidirectional visuomotor mapping of the environment built on a radial basis function framework that is trained through exploratory actions (gazing and reaching) and implemented on a real humanoid [21]. Saegusa et al. propose a method to infer the body schema based on stochastic motor babbling, where the babbling is driven by the visuomotor experience [100]. In another work, Saegusa et al. propose a new method to produce a motor behaviour which improves the reliability of the state prediction [101]. Hemion et al. report a competitive learning mechanism to infer the way the robot actuators can influence its sensory input, without a preprocessing step of self-detection [46]. Gläser et al. claim the first implementation of a framework that includes a population coding layer for the representation of schemata in a neural map and a basis function representation for the sensorimotor transformations; here schemata refers to cognitive structures describing regularities within experiences, similar to the motor primitives reported for vertebrates [42]. Further references on the radial basis networks used for sensorimotor mapping can be found in [21].

6.2 Neural Architecture

6.2.1 Hering-based Control system

In this section I introduce the bio-inspired active stereo vision system previously proposed in [103]. The fundamental equations are based on Hering's law of equal innervation, which states that eye movements are generated by combining the movements of vergence and version [60]. The system is a proportional model which needs to be trained to learn the proportional parameters. The controller drives a 3-degree-of-freedom (DOF) structure, with 2 DOF for the pan commands of both eyes and 1 DOF for the tilt, as in Figure 6.1.
The fundamental equations are:

θ̇_version = K₁ (x_L + x_R)   (6.1)
θ̇_vergence = K₂ δ   (6.2)
θ̇_tilt = K₃ (y_L + y_R)   (6.3)

where x_L and x_R are the feature x-positions on the left and right image planes and y_L and y_R are the feature y-positions on the left and right image planes. The disparity of the projected feature is represented by δ, and [K₁, K₂, K₃] are the parameters that must be estimated. I can compute the pan and tilt angles as follows:

θ̇_r = θ̇_version − θ̇_vergence   (6.4)
θ̇_l = θ̇_version + θ̇_vergence   (6.5)
θ̇_t = −θ̇_tilt   (6.6)

Setup

To be as consistent as possible with reality, I use a camera model with the same calibration matrix for both eyes:

K = | 200   0  320 |
    |   0 200  240 |   (6.7)
    |   0   0    1 |

with focal length equal to 200 pixels and an image plane of 640 × 480 pixels. This calibration matrix leads to a lens angle of about 100°. It is worth noting that I use undistorted, non-rectified matrices, taking into account that I deal with an active system and considering the consistency of the camera model. I define the origin of the neck frame coincident with the origin of the world frame of reference; the only movement of the neck is the tilt. The camera positions are at a distance of 0.2 m from each other along the x-axis, and at 0.2 m along the y-axis of the world frame of reference (see Figure 6.1). The unit of measure of the world frame of reference is the meter. To evaluate the performance of the system I use the following error measure:

e_L/R = √(x_L/R² + y_L/R²)   (6.8)

that is, the Euclidean distance in the image plane between the final feature position and the centre of the image plane (in this case I have defined the origin of the image-plane frame of reference exactly at the centre of the image plane itself). The subscripts L/R refer to the left and right camera, respectively.
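One control step of Equations 6.1-6.6 can be sketched as follows. The definition of the disparity δ as x_L − x_R is my assumption, since the text leaves δ unspecified:

```python
def hering_step(xL, xR, yL, yR, K1, K2, K3):
    # Hering-based controller: version, vergence and tilt commands
    # (Eqs. 6.1-6.3) combined into pan/tilt velocities (Eqs. 6.4-6.6).
    version = K1 * (xL + xR)
    vergence = K2 * (xL - xR)    # delta = xL - xR (assumed disparity)
    tilt = K3 * (yL + yR)
    return (version - vergence,  # right-eye pan, Eq. 6.4
            version + vergence,  # left-eye pan,  Eq. 6.5
            -tilt)               # tilt,          Eq. 6.6
```

A feature already foveated in both images (x_L = x_R = y_L = y_R = 0) produces zero velocities, so foveation is a fixed point of the controller.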
I choose to evaluate the error for the left and the right eye separately, to understand whether the foveation error varies between the two eyes.

Learning phase

In this section I propose a method to learn the parameters K_i that guarantee a minimum error e_L/R for any desired 3D point to be foveated, independently of the starting position of the stereo camera. The parameters can be learned by performing the following minimisation:

c(X, Y, Z) = e_L² + e_R² + Σⱼ |θ̇_l|ⱼ + Σⱼ |θ̇_r|ⱼ + Σⱼ |θ̇_t|ⱼ   (6.9)
K = argmin over K₁, K₂, K₃ of Σₓ Σ_y Σ_z c(x, y, z)

The Euclidean distances in the objective function evaluate the performance of the system in foveating the desired point; the sum terms minimize the lengths of the performed trajectories (thereby avoiding oscillations around the desired final position). The objective function is minimised numerically using the gradient descent method; the points used as the training set cover most of the view field and can be described as follows:

x ∈ [−100, 100] m, y ∈ [−100, 100] m, z ∈ [1, 201] m, with a step of 50 m.

Figure 6.1: Frames of reference of the active stereo system with 3 DOF. The tilt movement is executed along the x-axis of the world frame, and it rotates the frames of both eyes by θ_T [rad]. Ideally, I define a virtual neck that performs the tilt movement.

6.2.2 Extending the Hering-based Control system

The model presented so far takes into account only 3 DOF to foveate a generic target in 3D space. In this section, I extend the model by adding a further degree of freedom (the neck) to improve the performance of the head in the pan activity.
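The added neck joint enters a chain of roto-translations (introduced below as Equation 6.11). Ignoring the translations for brevity, the rotation part of that chain can be sketched as follows; the axis assignment (pan and neck about Y, tilt about X) is my reading of the frames in Figure 6.1:

```python
import numpy as np

def rot_x(t):
    # Rotation about the X axis (tilt direction).
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(t):
    # Rotation about the Y axis (pan / neck direction).
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def camera_from_world(theta_eye, theta_neck, theta_tilt):
    # Rotation part of the chain R_W^{L/R} = R_N^{L/R} R_H^N R_W^H:
    # eye pan, then neck pan, then head tilt (translations omitted).
    return rot_y(theta_eye) @ rot_y(theta_neck) @ rot_x(theta_tilt)
```

Since eye and neck rotate about the same axis here, their angles simply add, which is why the redundancy gives the controller freedom in how to split a pan movement between them.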
Moreover, I investigate whether, from a biological point of view, it is possible to infer some similarities between the obtained head trajectories and the stereotypical trajectories performed by primates (and eventually humans). In order to add the additional neck joint, I have investigated different augmented versions of the control system presented in Section 6.2.1, and for each of them I have evaluated the performance. First of all, I have introduced the neck component in accordance with Equations 6.1-6.3:

θ̇_neck = K₄ (x_L + x_R)   (6.10)

This implies that neck motions depend on the position of the feature in the image planes. Neck movements consist only of rotations along the Y axis and are independent of the tilting command.

Setup

Introducing a new degree of freedom for the neck, making the system redundant, requires defining a chain of roto-translations from the neck to the world frame of reference. The position of a 3D feature (initially defined in the world frame of reference) in the camera frame of reference can be computed as follows:

R_W^(L/R) = R_N^(L/R)(θ_L/R) R_H^N(θ_N) R_W^H(θ_T)   (6.11)

where R_W^(L/R) is the roto-translation between the world frame of reference and the camera frame of reference (left or right), R_N^(L/R)(θ_L/R) is the roto-translation between the neck and the camera frame of reference, R_H^N(θ_N) is the roto-translation between the head and the neck (defined as the movement along the pan direction), and R_W^H(θ_T) is the tilting command defined as a rotation of the head frame of reference with respect to the world frame. The camera model and the other parameters are defined as in Section 6.2.1.

Neck configurations

In order to compute the angle movements for pan, tilt, and rotation, Equations 6.1-6.3 and 6.10 have to be combined appropriately. I call the different ways to obtain these angle movements configurations.
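The candidate combinations listed in Table 6.1 below can be transcribed compactly as a single dispatch function (a direct transcription of the table; the function name is mine):

```python
def joint_velocities(version, vergence, tilt, neck, config):
    # Table 6.1: the six ways to combine Eqs. 6.1-6.3 and 6.10 into
    # right/left eye pan, tilt and neck velocities.
    if config in (1, 2, 3, 4):       # eye pans mediated by the neck term
        tr = version - vergence + neck
        tl = version + vergence + neck
    else:                            # configs 5-6: eyes without neck term
        tr = version - vergence
        tl = version + vergence
    tn = {1: tr, 2: tl,
          3: neck - version, 4: neck,
          5: neck, 6: neck - version}[config]
    return tr, tl, -tilt, tn
```

Writing the configurations this way makes the structural differences explicit: 1-4 differ only in how the neck velocity is chosen, while 5 and 6 decouple the eyes from the neck.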
These configurations are summarised in Table 6.1, and reflect the following ideas:

• The eye movements (pan) could be mediated by the neck component (Configurations 1-4)
• The neck movements (pan direction) could be mediated by vergence and version (Configurations 1, 2, 3, 6)
• The eyes and the neck could be independent of each other (Configuration 5)

Table 6.1: Possible configurations

Configuration 1: θ̇_r = θ̇_version − θ̇_vergence + θ̇_neck; θ̇_l = θ̇_version + θ̇_vergence + θ̇_neck; θ̇_t = −θ̇_tilt; θ̇_n = θ̇_r
Configuration 2: as Configuration 1, but θ̇_n = θ̇_l
Configuration 3: as Configuration 1, but θ̇_n = θ̇_neck − θ̇_version
Configuration 4: as Configuration 1, but θ̇_n = θ̇_neck
Configuration 5: θ̇_r = θ̇_version − θ̇_vergence; θ̇_l = θ̇_version + θ̇_vergence; θ̇_t = −θ̇_tilt; θ̇_n = θ̇_neck
Configuration 6: as Configuration 5, but θ̇_n = θ̇_neck − θ̇_version

Learning phase

I have adapted the learning procedure used for the 3 DOF system (see Equation 6.9) to the new 4 DOF system:

c(X, Y, Z) = e_L² + e_R² + Σⱼ |θ̇_l|ⱼ + Σⱼ |θ̇_r|ⱼ + Σⱼ |θ̇_t|ⱼ + Σⱼ |θ̇_n|ⱼ   (6.12)
K = argmin over K₁, K₂, K₃, K₄ of Σₓ Σ_y Σ_z c(x, y, z)

The minimisation is performed with the same algorithm and on the same training set as in Section 6.2.1.

6.2.3 Visuomotor mapping for a 3 DoF arm

The simulated robotic system is composed of the previously described head and an arm. The head is a 4 DOF structure, with 2 DOF for the pan commands of both eyes, 1 DOF for tilting, and 1 DOF for the neck component; the arm is composed of 2 DOF for the shoulder and 1 DOF for the elbow, as in Figure 6.2.
I define the properties of the head and the arm to be as compatible as possible with human characteristics (see Figure 6.2, right pane; for the head, see Section 6.2.2).

Figure 6.2: System architecture. (Left pane) The schematic model of the working environment with the active stereo system and the arm initial position. The aim is to detect the target position in space through the stereo cameras, compute the head joint angles to foveate the target, and directly compute the final joint configuration of the arm to reach the target location. The sensorimotor map is learned using the end-effector itself as a target for the vision system. (Right pane) The schematic of the arm. It has 3 DOF with link lengths compatible with the human counterparts. The range of θ₁ is [−π/2, π/2], the range of θ₂ is [−π/2, π/2] and the range of θ₃ is [0, 3π/4].

I evaluate the system on a reaching task: given a target feature in space, the system must be able to perceive it with the stereo cameras, compute the joint angles of the head to foveate that 3D point and, without physically foveating it, use these head angles to map the head joint space into the arm joint space so as to reach the target position with the end-effector. The control architecture for reaching can be functionally subdivided into two main modules: the first module is the vergence control system that controls the stereo camera system, and the second module is the RBF network that computes the final arm joint configuration for reaching, knowing only the joint configuration of the stereo system that foveates the target. Conceptually, the training is equivalent to the motor babbling schema; given a set of random movements of the arm, it is possible to correlate the arm joint space and the foveating joint angles of the head.
After an initial training phase, where the system explores the environment and gathers data, the radial basis network is trained and the performance is verified on a test set of 3D points (knowing the ideal arm joint configurations to reach them). Given a target in 3D space, the motor system should be able to compute the final joint configuration of the arm to reach the target. This computational task is performed by the sensorimotor mapping. The subsystem receives as input only the joint angles of the head needed to foveate the target, and it is able to compute the final configuration of the arm without physically moving the head. The basis functions can be combined linearly to approximate any nonlinear function, such as the mapping of the peripersonal space (or the working space of the robot). According to [95], basis functions are suitable to model the computational framework of the posterior parietal cortex, where the sensorimotor transformations are performed. The radial basis network is composed of a nonlinear hidden layer and a linear output layer. Each nonlinear hidden neuron is a radial basis function. I use a Gaussian radial function:

h(x) = exp(−(x − c)² / r²)   (6.13)

where h(·) is the basis function, x is the head joints vector, c is the basis center and r is the spread. The output neuron is defined as:

f(x) = Σⱼ₌₁..ₘ wⱼ hⱼ(x)   (6.14)

where m is the number of hidden neurons, wⱼ are the weights and f(x) is the estimated arm joints vector. In the following section, I will discuss the dataset generation in more detail. On the dataset I perform a 10-fold cross-validation, selecting samples from the populated dataset with a uniform distribution [31]. With 9 folds I train the network and with the last fold I evaluate the performance, repeating this procedure for each possible combination of folds.
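A minimal version of the radial basis network of Equations 6.13-6.14 is sketched below. Fitting the output weights by regularised least squares is one common choice; the thesis does not state the exact solver, so that part is an assumption.

```python
import numpy as np

def rbf_design(X, centers, spread):
    # Hidden-layer activations (Eq. 6.13) for each head-joint vector in X:
    # squared distance to every center, passed through a Gaussian.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / spread ** 2)

def rbf_fit(X, Y, centers, spread, ridge=1e-8):
    # Linear output layer (Eq. 6.14): solve for the weights mapping
    # hidden activations to the arm-joint targets Y (assumed solver).
    H = rbf_design(X, centers, spread)
    return np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ Y)

def rbf_predict(X, centers, spread, W):
    return rbf_design(X, centers, spread) @ W
```

In the visuomotor setting, X would hold the head joint angles recorded while foveating the end-effector during babbling and Y the corresponding arm joint angles.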
This procedure is equivalent to motor babbling, since the folds are populated by randomly choosing samples from the whole dataset. Moreover, to improve the accuracy of the visuomotor mapping, I perform a "meta-learning" over all the possible combinations of the folds, varying the spread of the basis to infer the best spread value. Once I have selected the best spread value, I perform the cross-validation to evaluate the radial basis network. The controlled 3 DOF arm has human-like characteristics (see Figure 6.2, right pane). The lengths of the links are compatible with those of the human counterpart and the ranges of the joints are similar to the human ones.

6.3 Experimental Results

6.3.1 Hering-based results

The gradient descent minimization of the cost function on the training set leads to the following parameters:

K1 = 0.3286, K2 = 0.0859, K3 = 0.1837   (6.15)

It is worth noting that the cost function has many local minima but, in my experience, the overall performance of the system is not affected. To test the performance of the learned control system, I have conducted the following experiments:

• Exploring the 3D space, which investigates the capability of the active stereo system to foveate points that are not contained in the training set.

• Testing the initial position, which investigates the capability of the system to foveate a feature in 3D space regardless of the initial joint configuration of the stereo cameras. The aim is to investigate the robustness of the system in foveating a feature starting from a generic position.

Exploring 3D space

As a first experiment, I investigate the capability of the system to foveate a large set of features (i.e. 3D points) in the 3D space starting from a defined initial position.
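The "meta-learning" over the spread can be sketched as a grid search wrapped around the cross-validation. The `fit`/`predict` callables and the candidate spread values are placeholders, not the thesis implementation; only the structure (full k-fold run per candidate, keep the lowest mean error) follows the text.

```python
import numpy as np

def best_spread(X, Y, spreads, fit, predict, k=10, seed=0):
    # Grid search ("meta-learning") over candidate spread values: for
    # each value, run a full k-fold cross-validation and keep the spread
    # with the lowest mean test error. `fit` and `predict` stand in for
    # training and querying the radial basis network.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)

    def cv_error(spread):
        errs = []
        for i, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            model = fit(X[train], Y[train], spread)
            err = np.linalg.norm(predict(model, X[test]) - Y[test], axis=1)
            errs.append(err.mean())
        return float(np.mean(errs))

    return min(spreads, key=cv_error)
```

The random permutation of the sample indices mirrors the uniform, babbling-like population of the folds.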
Based on their 3D positions, the evaluated points (testing sets) can be grouped into three regions adjacent to the training set:

Along the Z direction: [−100, 100] × [−100, 100] × [201, 401]
Along the Y direction: [−100, 100] × [100, 200] × [1, 201]
Along the X direction: [−200, −100] × [−100, 100] × [1, 201]

Each of these portions of space is discretised with a step of 10 mm in each direction. I do not consider the points that are not projected onto both image planes.

Figure 6.3 shows the errors associated with each point in the 3D space (top row), and the overall error distributions (bottom row). The mean error associated with the testing set along the Z direction is 1.42 pixels with a variance of 0.33. This result is expected, mainly because the projections of the 3D points get closer to the image centre as their distances from the image plane increase. Along the X direction, the error increases as the X component increases. Since these points are close to the image planes, their projections fall at the border of the images and, consequently, the task of foveating them is more challenging. However, as can be seen from the bottom pane, the errors are distributed in an acceptable error interval, i.e. [2.5, 5.5] pixels, with an average of 4.33 pixels and a variance of 0.488. Similar considerations apply to the testing set along the Y direction, where qualitatively the error increases as the Y component of the 3D points increases. The mean error is 3.96 pixels with a variance equal to 0.296.

Figure 6.3: Error maps computed for the left eye; I have experienced very similar error values for the right eye. Top row: testing sets with the error associated with each foveated 3D point. Bottom row: the error distribution in pixels for the testing sets. The red line represents the mean of the error. As expected, the error distribution along the Z direction is lower than along the other directions.
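The three testing regions listed above can be generated, for instance, as regular grids with the stated coordinates and step (the unit convention follows the document's):

```python
import numpy as np

def grid(xr, yr, zr, step=10):
    # Regular 3D grid over [xr] x [yr] x [zr], endpoints included.
    xs = np.arange(xr[0], xr[1] + 1, step)
    ys = np.arange(yr[0], yr[1] + 1, step)
    zs = np.arange(zr[0], zr[1] + 1, step)
    pts = np.array(np.meshgrid(xs, ys, zs, indexing="ij"))
    return pts.reshape(3, -1).T  # one 3D point per row

along_z = grid((-100, 100), (-100, 100), (201, 401))
along_y = grid((-100, 100), (100, 200), (1, 201))
along_x = grid((-200, -100), (-100, 100), (1, 201))
```

Points whose projections fall outside either image plane would then be filtered out before testing, as described in the text.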
Test initial position

The initial position test aims at assessing the robustness of the system in foveating a 3D point starting from a generic joint configuration (i.e. θl, θr and θt). Vergence and version affect the panning command competitively (see Equation 6.4). To check whether the system is able to perform panning accurately, I have evaluated the most problematic region of the 3D space. Indeed, the Z region represents an "easy" case where the points are always projected close to the centre of the image, and the Y region does not affect the panning but the tilting. The testing subspace along the X direction, used for the experiments, is [−200, −100] × [−100, 100] × [1, 201], discretised with a step of 10 mm in each direction. I let the system foveate each of the testing points starting from each possible joint configuration in the joint space. I have defined a range of values for each joint, i.e. [−60°, 60°] with a step of 30°. In total, I have 125 different joint configurations. Then, I compute the mean error associated with each 3D point; the results are shown in Figure 6.4. Qualitatively, the error increases as the Z component of the 3D points decreases (see left pane). Since these points are close to the image planes, their projections fall at the border of the images and, consequently, the task of foveating them is more challenging. However, as can be seen from the right pane, the errors are distributed in an acceptable error interval, i.e. [1, 35] pixels, with an average of 15 pixels.

6.3.2 Extended Hering-based results

The experiments presented in this section aim to:

• select the neck configuration that has the best performance in terms of error in the exploration of the 3D testing space;

• compare the performance with the results collected with the original system.

To select the best configuration I have compared the results obtained in the "exploring the 3D space" experiment.
The best configuration was then used to run the "test initial position" experiment.

Figure 6.4: Original system. The left pane shows the mean error associated with each 3D point in the testing set. The mean error is computed considering each plausible initial joint configuration of the head; for each configuration I compute the foveation error. The right pane shows the mean error distribution.

Exploring 3D space

I have run the experiments for each neck configuration and, comparing mean and variance, I have found that the best configuration is number 5, with decoupled control between eyes and neck (see Table 6.1). The testing sets are the same as defined in Section 6.2. The parameters K obtained after the training phase of Configuration 5 are:

K1 = 0.0167, K2 = 0.5543, K3 = 0.1584, K4 = 0.3542   (6.16)

Figure 6.5 presents the error maps related to Configuration 5. The results are compatible with the performance obtained with the 3 DOF system (see Section 6.2 and Figure 6.3); i.e. the mean errors are 4.33, 3.93 and 1.41 pixels, and the variances are 0.65, 0.32 and 0.34, respectively, for the testing sets along the X, Y and Z directions.

Figure 6.5: Error maps computed for the left eye of the extended system with the fifth neck configuration.

Test initial position

The initial position experiment results are presented in Figure 6.6. The errors are distributed in the interval [5, 20] pixels. Compared to the performance of the 3 DOF system (see Section 6.2.1 and Figure 6.4), the error presents a lower mean and standard deviation. I can therefore conclude that the additional neck joint provides robustness to the system and, specifically, reduces the influence of the initial configuration of the head on the performance of the system in foveating a point in space.
Head trajectories

I have investigated different possible control laws for the extended model to take into account the redundancy introduced by the neck; what emerges, comparing the errors in foveating 3D points, is that the best performance is obtained when the eye and neck controls are decoupled (Configuration 5). Comparing the errors illustrated in Figures 6.3 and 6.5, it emerges that the mean error and variance associated with the extended system are in general similar to the original ones. Figures 6.4 and 6.6 present the experimental results of the initial position test. In this case the error of the extended system presents a lower mean and standard deviation. I can therefore conclude that the additional neck joint provides robustness to the system. Furthermore, a qualitative analysis of the trajectories of the extended model with decoupled control (see Figure 6.7), i.e. Configuration 5, seems to be compatible with some biological results [17].

Figure 6.6: Extended system. The left pane shows the mean error in foveating each 3D point in the testing set. The mean error is computed considering each plausible initial joint configuration of the head. The right pane shows the mean error distribution.

6.3.3 Visuomotor mapping results

Method

Training the active stereo head means estimating its parameters [75]. I have discretized the arm joint space in fine steps of 12° along each axis, for a total number of 1062 samples. For each position in the joint space the direct kinematics is computed. Knowing the position of the end-effector and its projection onto the image planes of the stereo cameras, I compute the vergence-version angles to foveate the end-effector itself.
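The babbling-style dataset generation can be sketched as below. The forward kinematics and link lengths are stand-ins (the thesis uses the real 3 DOF arm model), and the raw grid size depends on the joint ranges before invalid samples are filtered out, so it need not match the 1062 valid samples reported in the text.

```python
import numpy as np
from itertools import product

# Joint ranges from Figure 6.2, sampled every 12 degrees.
step = np.deg2rad(12)
t1 = np.arange(-np.pi / 2, np.pi / 2 + 1e-9, step)   # shoulder azimuth
t2 = np.arange(-np.pi / 2, np.pi / 2 + 1e-9, step)   # shoulder elevation
t3 = np.arange(0, 3 * np.pi / 4 + 1e-9, step)        # elbow flexion

def unit(az, el):
    # Unit vector at a given azimuth/elevation.
    return np.array([np.cos(el) * np.sin(az), np.sin(el), np.cos(el) * np.cos(az)])

def forward_kinematics(q, l1=300.0, l2=250.0):
    # Stand-in kinematics: upper arm plus forearm in the azimuth plane
    # (link lengths in mm are assumptions, not the thesis's values).
    q1, q2, q3 = q
    return l1 * unit(q1, q2) + l2 * unit(q1, q2 - q3)

dataset = [(q, forward_kinematics(q)) for q in product(t1, t2, t3)]
```

Each record pairs an arm joint configuration with the resulting end-effector position; the head's foveation angles for that position are then added by the vision controller.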
With the calibration parameters of the stereo cameras I compute the foveated point in 3D space, in order to compute the Euclidean error between the 3D position of the end-effector and the foveated point in space. This is an intrinsic error of the active visual controller and it does not depend on the arm controller. The trained network should be able to manage it, estimating the end-effector position regardless of the foveation error. Figure 6.8 (left pane) shows the generated dataset; each point represents a valid end-effector position. Selectively moving the arm in space, I build a dataset where each sample is composed of:

• the 3D position of the end-effector;

• the arm joint configuration;

• the head joint configuration foveating the 3D position of the end-effector (projected onto the image planes of the cameras);

• the Euclidean error in 3D space between the foveated point and the position of the arm (due to the intrinsic error of the vision system).

Figure 6.7: The trajectories of the cameras performed by the trained extended system. The blue cross represents the 3D feature in space at position [200 0 40]. For graphical reasons the image is scaled, but it clearly shows that the system first moves the neck and, only when the neck is in a steady position, the eyes perform the vergence movement.

I split the dataset into 10 folds to perform the cross-validation in the training phase of the radial basis network, where each testing fold is composed of 118 samples. Cross-validation is widely used to evaluate the performance of this kind of network; I use the mean square error (MSE) as the evaluation criterion to control the stopping of the training phase. In order to improve the accuracy of the network as much as possible, I have implemented an optimization loop to detect the best spread value for the basis functions, i.e. the value that minimizes the mean Euclidean error between the estimated arm joint configuration and the real arm configuration.
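A sketch of one dataset record and of the 10-fold split; the field names are illustrative, not from the original code.

```python
import numpy as np
from typing import NamedTuple

class Sample(NamedTuple):
    # One record of the babbling dataset, matching the four items above.
    end_effector_xyz: np.ndarray  # 3D position of the end-effector
    arm_joints: np.ndarray        # arm joint configuration
    head_joints: np.ndarray       # head angles foveating the end-effector
    foveation_error: float        # intrinsic 3D error of the vision system

def split_folds(samples, k=10, seed=0):
    # Shuffle the sample indices, then split them into k folds.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    return [[samples[i] for i in fold] for fold in np.array_split(idx, k)]
```

During cross-validation, each fold in turn serves as the test set while the other nine are used for training.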
The range of the evaluated spread values is [0.5, 1.3] rad.

Results

The evaluation of the radial basis network is mainly composed of two phases:

1. choose the best spread value;

2. evaluate the network performance with the cross-validation technique (using the optimal spread value).

Figure 6.8: (left pane) Complete dataset. The points in space represent the end-effector positions used as targets for the active stereo head. In the dataset, each end-effector position is associated with the corresponding arm joint configuration, the foveating joint angles of the head and the Euclidean error between the foveated point and the end-effector position. This dataset is used for the cross-validation. (right pane) The Euclidean error between the real end-effector position and the one estimated by the radial basis network; the error is quite low except for those 3D points that are very near to the head and to the shoulder.

Figure 6.9: Error directions projected on different planes of the world frame of reference. The blue dots represent the targets and the red lines are the distances in space between the targets and the arm positions computed by the network. For visualization reasons, I do not plot the estimated end-effector positions. (left pane) Error projection onto the X-Z plane. (right pane) Error projection onto the Y-Z plane.

Figure 6.10: Radial basis centers distribution in the input space. The red circles represent the basis centers, the blue dots are the testing values and the cyan dots are the values used for the training phase.

In the first phase, I execute an optimization loop over the 10-fold cross-validation, varying the spread value of the basis functions. In this optimization phase, I have found that the best spread value is equal to 1.25 rad.
After the estimation of the optimal value, I use it to perform the 10-fold cross-validation over the dataset, evaluating the overall performance on the reaching task. Figure 6.8 (right pane) shows the scatter of the Euclidean error between the real position of the end-effector and the estimated one for the test set. As previously said, I use 10-fold cross-validation, so the dataset is split into 10 folds and the error is given for each testing fold (each fold is used as the testing set and the other 9 are used as the training set). The error generated from the testing of each fold is plotted. The figure clearly shows that the error is very low in the whole workspace, with the exception of points that are very near to the head and shoulder. Knowing the foveation angles of the head, the network is able to correctly compute the arm joint angles to reach the targets. The mean error is 0.0320 m and the standard deviation is 0.0591 m, within an error range of [0.0001, 0.9603] m. I notice that the maximum error of 0.9603 m is associated with a point at the border of the workspace, very near to the head. This error is due to the constraints that I have imposed on the arm joint ranges: for this point the forearm is at 132° of its 135° limit.

Figure 6.9 shows the Euclidean distance between the estimated arm position and the desired target. Blue dots represent the targets and red lines are the distances in space between the targets and the arm positions computed by the network. The left pane shows the projection onto the X-Z plane of the world frame of reference; the error is distributed in the whole space but it decreases as the Z value increases. This is due to the constraints imposed on the arm range.
The right pane shows the projection onto the Y-Z plane and clearly indicates that the error is not due to the shoulder component in the tilt direction, since the red lines lie along the same shoulder position. It is due to the fact that the network receives only one tilting component of the head, as opposed to the horizontal head movements, which are generated by the left, right and neck components. Figure 6.10 shows the radial basis centers distribution in the input space; it refers to a single training run of the cross-validation. The training phase has selected 82 neurons. The red circles represent the basis centers, the blue dots are the testing values, and the cyan dots are the values used for the training phase. The left pane shows the centers distribution projected onto the θL-θR plane, the two angles that command the pan movements of the cameras. The centers are distributed along a straight line, in agreement with the strong cooperation-competition between the two pan angles. Furthermore, the θL and θR values used as input for the network are well distributed along the same line. In the middle pane I project the same centers onto the θN-θT plane; the centers distribution is quite uniform, mainly because the two angles are conceptually independent. Finally, in the right pane I project the input data onto the θN-θL plane and also in this case I observe a correlation between the neck component and the left camera. It is clearly shown that the neck performs the coarse movements and the camera only executes small movements to correct the vergence. Also in this case, the centers are mainly distributed in the region containing the common movements, except for the outliers, which have ad hoc centers created during the training phase.

6.4 Conclusions

In this work I have presented a vergence-version control system for an active stereo head based on Hering's law, able to drive the learning of a visuomotor mapping.
First, I have quantitatively evaluated the performance of the original system previously presented in [103]. I have defined a cost function and I have trained the system with a classical technique; the obtained results show the robustness and the effectiveness of the controller. Second, I have extended the controller by adding a neck component that makes the system redundant. I have defined different possible configurations of the neck control, including coupled and decoupled controls. I have extended the cost function and trained the new controller for each neck configuration. I have compared the different neck configurations and chosen the best in terms of obtained performance; the best performance is achieved with a decoupled eye-neck control. The trajectories generated by this controller are compatible with human head trajectories in foveating tasks. Moreover, comparing the performance with that of [103], I have found that the extended controller resolves the redundancy, improving the performance and the robustness of the system. Furthermore, I have presented a novel computational system that is able to compute the visuomotor map between a target (perceived through the redundant active stereo head) and a 3 DOF arm, designed with human-like mechanical constraints. The 3 DOF arm can move around its peripersonal space through a radial basis network that computes the visuomotor mapping. Through an optimization phase, I select the best spread value for the basis functions of the neural network. I have confirmed the robustness of the mapping through a 10-fold cross-validation. After the training phase, the overall controller is able to detect a target in space and, without moving the head but just computing the foveation angles, reach it with the arm. The results confirm the robustness and the accuracy of the system in reaching targets in peripersonal space.
Moreover, the Hering-based stereo controller generates joint angles that are robust features for driving the network that learns the visuomotor mapping. The next step will be the validation of the model on a real robot with the same characteristics described above, adding new degrees of freedom to the arm to make it redundant.

7 A model of a middle level of cognition

Cognitive development concerns the evolution of human mental capabilities through experience gained during life. Many researchers have developed agents that can develop autonomously through experience, interacting with the environment and adapting to it. An important feature needed to accomplish this objective is the self-generation of motivations and goals, as well as the development of complex behaviours consistent with those goals. My target is to realize a bio-inspired cognitive architecture, based on an amygdala-thalamo-cortex model, capable of autonomously developing new goals and behaviours. Experimental results show the development of new goals and movements.

7.1 Introduction

During their life, humans develop their mental capabilities: this process is called cognitive development, and it concerns how a person perceives, thinks, and gains understanding of the world through the interaction of genetic and learned factors [11]. A fundamental aspect of cognitive development is the autonomous generation of new goals and behaviours, which allows the individual to adapt to the various situations they face every day. How humans autonomously develop new goals during their existence is not completely understood. In order to realize agents capable of interacting effectively with humans and integrating into their lives, robotics should study the processes of the human brain that allow the cognitive development of the individual, as well as the modalities underlying the generation of new goals and behaviours.
My work contributes to the achievement of this objective: its purpose is to create a bio-inspired robotic model based on human brain processes, which should make the agent able to autonomously develop new goals as well as new behaviours consistent with those goals. (This chapter is adapted from [68][69][80][81].)

Figure 7.1: The overall IDRA architecture. It is composed of a set of Intentional Modules (IM) and a global Phylogenetic Module (PM). It receives as input a set of sensory information and produces as output the motor commands for the controlled robot.

There are different approaches to adaptation in robotics. Behaviour-based robotics allows the agent to adapt its behaviour to changes in the environment, in order to accomplish its goals. In this approach goals are hard-coded into the robot, which cannot develop new ones. Developmental robotics aims at modelling the development of cognition in natural and artificial systems [66]. Developmental robotics leads to the cognitive development of the agent, making it able to adapt to the environment and autonomously develop new motivations and goals that were not present at design time. Here I address an intermediate level of cognition that allows mammals and humans to be aware of the surrounding environment and then interact with it. This capability is an essential precondition for enabling the robot to fit into humans' everyday life. A robot must be able not just to act consistently with the changes in the surrounding environment, but also to develop goals that can emerge from them, in order to interact effectively with people. Such a robot would also develop a unique personality, depending on the experiences that contributed to the creation of its new goals and behaviours. These features would make it the perfect robot for advanced applications. I present a system that allows the robot to develop new goals, in
addition to the hard-coded ones, and to adapt its own behaviour to these objectives. This system is inspired by the human brain, in particular by the communication among three areas of the brain: cortex, thalamus and amygdala, considered a key element in human cognitive development. The Intentional Distributed Robotic Architecture (IDRA) is a network of elementary units, called Intentional Modules (IM), that enables the development of new goals (see Figure 7.1). The network also contains a Phylogenetic Module (PM) holding the hard-coded objectives, i.e. the "innate instincts", as in the amygdala. Through the action of the PM, the more the current state of the robot meets the objectives, the higher the output signal. Each IM consists of two internal modules: the Categorization Module (CM) and the Ontogenetic Module (OM) (see Figure 7.3). The Categorization Module, like the cerebral cortex, returns a vector that encodes the neural activation of the cortex in response to the input. The neural activity represents the similarity of the current input with respect to previous relevant incoming signals. The Ontogenetic Module is the basis of the development of new objectives; it receives the vector of neural activations from the Categorization Module and, through Hebbian learning, develops new goals. It returns a signal indicating whether the current state meets the new goals. As depicted in Figure 7.3, the PM and OM signals are compared, and the more relevant of the two is returned. The result is called the relevant signal and it drives the self-learning phase of each IM. Therefore, the execution flow starts when the sensory input is acquired, filtered and sent to the network of IMs; each module returns a vector containing information about the state, and a signal indicating how well the current state satisfies the current goals.
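The data flow through one Intentional Module can be sketched as below. Everything beyond the flow described in the text is an assumption: the Categorization Module is stood in for by a similarity to stored prototypes, the Ontogenetic Module by a fixed weight vector, and the PM/OM comparison by a simple maximum.

```python
import numpy as np

def categorize(x, prototypes):
    # Stand-in Categorization Module: neural activation as similarity of
    # the current input to previously stored relevant signals.
    d2 = ((prototypes - x) ** 2).sum(axis=1)
    return np.exp(-d2)

def intentional_module(x, prototypes, om_weights, pm_signal):
    # One IM step: categorize the input, compute the OM (learned-goal)
    # signal, and return the more relevant of the PM and OM signals.
    activation = categorize(x, prototypes)
    om_signal = float(om_weights @ activation)
    relevant = max(pm_signal, om_signal)
    return activation, relevant
```

In the full architecture the relevant signal would then drive the Hebbian self-learning of the module; that step is omitted here.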
The network can be composed of several layers, which can be connected in feedforward or feedback mode. The vectors of neural activations are then used by a Motor System (MS) to generate movements consistent with the goals of the agent. Each movement is composed of a series of elementary components, called motor primitives, which represent the muscle activations over time; their composition (i.e. muscle synergy) leads to the execution of complex movements [44][74][116]. Information coming from IDRA is also used to shape the behaviour of the agent under new goals and situations. Each Dynamic Behaviour analyzes the input to suggest the best movement, that is, the action that will lead to the fulfilment of the goals, either learned or innate. The Dynamic Behaviour returns a set of muscle activations (a composition of motor primitives), each one referring to a joint connected to that specific Behaviour. These muscle activations are then used to perform the correct movement. IDRA therefore supports the cognitive development of the robot, while the MS allows the robot to move and take actions in order to accomplish its goals. The system has been tested with two main experiments, to verify the goal-development skills as well as the ability of the agent to adapt its behaviour to its goals. In the first experiment the agent (a NAO robot, a humanoid produced by Aldebaran Robotics) learns a new goal from a hard-coded one: starting from an innate instinct related to figures with highly saturated colours, the agent autonomously develops an interest in a particular shape of the figures. In the second experiment the motor capabilities are tested: the NAO robot moves to maximize the relevant signal coming from the network of IMs; movements are generated by linear combination of motor primitives.
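Movement generation by linear combination of motor primitives can be sketched as follows; the waveforms and weights are illustrative, not the primitives actually used by the Motor System.

```python
import numpy as np

# Each primitive is a basic activation waveform over time; a movement
# (muscle synergy) is a weighted sum of these waveforms.
t = np.linspace(0.0, 1.0, 100)
primitives = np.stack([
    np.sin(np.pi * t),        # single smooth burst
    np.sin(2 * np.pi * t),    # biphasic component
    4 * t * (1 - t),          # bell-shaped component
])

def muscle_activation(weights):
    # One joint's activation profile as a linear combination
    # of the motor primitives.
    return weights @ primitives

profile = muscle_activation(np.array([0.5, 0.2, 0.3]))
```

Changing only the weight vector yields different complex movements from the same small set of basic patterns, which is the point of the synergy model.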
The main contributions of this work are:

• the design and full implementation of a cognitive architecture based on an amygdala-thalamo-cortical model, as already described in general terms in [70];

• the validation of the architecture, by testing the goal generation with the replication of the experiment performed in [70];

• the extension of IDRA with the Motor System, which allows the creation of movements by linear composition of motor primitives.

In Section 7.2 I present the biological aspects related to my system. In Section 7.3 I discuss the cognitive architecture model and its implementation. In Section 7.4 I describe the experiments that test the development of goals and the generation of movements. Section 7.5 contains the conclusions.

7.2 Biological model

Several studies have shown the importance of the amygdala-thalamo-cortical interactions in cognitive development [115]. The cortex, the outer part of the brain, is divided into several sectors and receives signals from the sensory organs. Although most of the sectors receive input from a specific source, different studies have proven that different areas can properly react to different stimulus sources. Indeed, the whole cortex is composed of the same kind of cells, and it is able to respond, store and adapt to different kinds of stimuli. The cortex acts as a memory bank that the brain uses to find analogies, patterns and invariant representations in the incoming data, and to act accordingly. The thalamus, the largest component of the diencephalon, is the primary relay site for all of the sensory pathways, except olfaction, on their way to the cortex. The thalamus plays a central role for mammals in the development of new motivations, as well as in the choice of which goal to pursue. The thalamus is "a central, convergent, compact miniature map of the cortex" [110]. The thalamus is partitioned into about fifty segments, which do not communicate directly with each other.
Instead, each segment is in synchronized projection to a specific segment of the cortex, and receives a projection from the same segment. Therefore, while the cortex is concerned with data processing, storage and distribution, the thalamus determines which goals have to be pursued. Furthermore, each pair of cortex and thalamus sections communicates intensively. The amygdala is an almond-shaped group of nuclei located deep within the medial temporal lobes. It is heavily connected to the cortex; it is involved in the generation of the somatosensory response on the basis of both innate and previously developed goals, and of sensory information [1]. The amygdala seems to be essential in social and environmental cognition, guiding social behaviours; it has a key role in the recognition of emotions and in the generation of an adequate response [32]. Thus one of the principal tasks of the amygdala is to generate new goals taking advantage of hard-wired criteria. The cerebellum is supposed to have functionalities and structures similar to the classical perceptron. The cerebellum has an extended network of various types of neurons, providing different abilities including motor learning and motor coordination [3]. Many studies have shown the importance of the basal ganglia in the generation of movement. The basal ganglia and the cerebellum seem to form two different sets of loop circuits with the cortex, both dedicated to different features of motor learning, independent of each other and operating in different coordinates. Furthermore, a functional dissociation between the basal ganglia and the cerebellum has recently been proposed: the former is implicated in the optimal control of movements, while the latter seems to be able to predict the sensory consequences of a specific movement [120].
The spinal cord is the lower caudal part of the nervous system; it receives and processes sensory information from the various parts of the body, and controls the movement of the muscles. It acts as a bridge between the body and the mind [30]. Several studies have shown how complex movements are generated from the combination of a limited number of waveform modules, which are independent of the considered muscle, its speed, and its gravitational load. It has been suggested that the nervous system does not need to generate all the muscle activity patterns, but only a few basic patterns, combining them to generate specific muscle activations. This model can be represented by an oscillator that produces the output frequency from the input signal [89]. A motor primitive is a specific neural network found in the spinal cord. Dynamic Movement Primitives, a formulation of movement primitives via autonomous nonlinear differential equations, have been successfully used in robotic applications, and can be employed with supervised learning and reinforcement learning [104]. However, I distance myself from this kind of experiment, which uses learning in task-based robots. In my architecture I do not focus on high-level motor skills, nor on high levels of reasoning and planning. Instead, I focus on an intermediate level of cognition that allows mammals and humans to be aware of the surrounding environment. Consciousness has already been suggested to be a product of an intermediate level of cognition. Awareness is supposed to be neither a direct product of sensations, the "phenomenological mind", nor a product of high-level conceptual thoughts, the "computational mind", but a product of several intermediate levels of representation [56][57].
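For reference, the Dynamic Movement Primitives mentioned above can be sketched as a goal-directed spring-damper system modulated by a learnable forcing term (after Ijspeert et al.). The gains and the zero forcing term below are illustrative defaults, not a system used in this thesis; with zero forcing the primitive simply converges smoothly to the goal.

```python
import numpy as np

def dmp_rollout(y0, g, tau=1.0, dt=0.01, alpha_z=25.0, beta_z=6.25,
                alpha_x=3.0, forcing=lambda x: 0.0):
    # Discrete DMP, Euler-integrated: transformation system pulled toward
    # goal g, plus a phase-dependent forcing term f(x) that would be
    # learned from a demonstration (zero here).
    y, z, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(round(tau / dt))):
        zdot = (alpha_z * (beta_z * (g - y) - z) + forcing(x)) / tau
        ydot = z / tau
        x += (-alpha_x * x / tau) * dt   # canonical system (phase decay)
        z += zdot * dt
        y += ydot * dt
        traj.append(y)
    return np.array(traj)

traj = dmp_rollout(y0=0.0, g=1.0)
```

Learning a non-zero forcing term shapes the trajectory between start and goal while preserving convergence, which is what makes DMPs usable with supervised and reinforcement learning.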
This middle level has some interesting features related to consciousness: it underlines how an agent can interpret the surrounding environment and react to this awareness without the need for either high-level conceptualizations or complex motor controls, thereby addressing the grounding problem of giving a formal symbol system a semantic interpretation that is intrinsic to the system itself. In particular, I will deal primarily with categorical representation, that is, the learned and innate capability to pick out the invariant features of object and event categories from their sensory projections. While behaviour-based robots explore the environment in order to optimize their actions towards a predefined goal, I aim at generating robots able to explore the environment in order to find new goals on their own. They must be curious to explore their environment and, once it has been explored, they must be able to do something new according to their abilities as well as to their experience. Developmental robots must be able to perform actions following goals that were not present at design time [66][70]. 7.3 Implementation Model 7.3.1 Intentional Distributed Robot Architecture IDRA deals with the cognitive development of the agent, analyzing inputs and allowing it to develop new goals (a comparison with the biological counterpart is shown in Figure 7.2). Figure 7.2: A comparison between a (very) sketchy outline of the thalamo-cortical system as commonly described [84] and the suggested architecture. It is worth emphasizing the similarity between various key structures: A. Category Modules vs. cortical areas, B. Global phylogenetic module vs. amygdala, C. Ontogenetic modules vs. thalamus, D. Control signals (high and low bandwidth), E. High-bandwidth data bus vs. intracortical connections. The architecture is basically a
net of linked modules, simulating the connections and interactions between the cerebral cortex, the thalamus and the amygdala. The modules composing the net are the PM (amygdala) and a layered set of IMs, each composed of a Categorization Module (cerebral cortex) and an Ontogenetic Module (thalamus). The IMs are linked in various ways, including feedback connections, while the PM broadcasts its signal to all the IMs without receiving data back. This structure simulates the interaction of the three brain areas: each portion of the thalamus can communicate with its corresponding area of the cortex, which collects all the information coming from the thalamus, but different areas of the thalamus cannot communicate with each other. The amygdala sends information both to the thalamus and to the cortex (see Figure 7.2). The input of the net arrives from various sensors (video, audio, tactile, etc.), is filtered, and is sent to the IMs of the first layer. The output of the net is a vector, representing the neural activation generated by the sensory input, and a signal, representing how much the current input satisfies both hard-coded goals and newly developed goals. IDRA can autonomously develop new goals (through the Ontogenetic Module) starting from hard-coded ones (provided by the PM), which represent the "innate instincts" of the agent. Innate instincts Phylogenetic processes contribute to the adaptation of organism behaviours to the environment through the production of instincts [73]. An instinct is the inherent inclination of a living organism toward a particular behaviour, i.e. an impulse or powerful motivation from a subconscious source. The part of the brain associated with these types of reactions is the amygdala [16][33]. The amygdala has its own set of "receivers" for sensory intake; it can retrieve information from the environment and take a decision before a person could consciously think about it.
I have implemented instincts in the system as hard-coded goals in the PM, which tells the agent what is relevant according to its hard-coded instinctive functions. The input of this module comes from the sensors; each piece of sensory information is processed by instinctive functions, each of which processes only a certain input type, according to the instincts associated with that specific type. The output of this module (normalized between zero and one) is the phylogenetic signal, which tells how important the incoming stimulus is according to the a priori stored criteria. Intentional module and Neuroplasticity Neuroplasticity is the lifelong ability of the brain to reorganize neural pathways based on new experiences. It can occur at different levels, ranging from cellular changes (e.g. during learning) to large-scale changes involved in cortical remapping (e.g. in response to injury). Scientific research has demonstrated that substantial changes can occur in the lowest neocortical processing areas, and that these changes can profoundly alter the pattern of neuronal activation in response to experience [16]. Neuroplasticity has replaced the formerly held view that the brain is a physiologically static organ. In order to learn or memorize a fact or skill, there must be persistent functional changes in the brain, and these changes represent the new knowledge. The IM (Figure 7.3) can easily adapt to changes in the sensory input. If I feed an IM with input from a video sensor, it will specialize to that type of input; however, if I then feed this specialized IM a different type of input, e.g. an audio input, it will gradually adapt. The IM, as the basic unit, must be able to self-develop new goals and motivations. This idea resembles the one developed in [70].
Figure 7.3: Intentional Module (IM) The main objectives of the module can be summarized as: • adapt to any kind of input; • learn to categorize incoming stimuli; • use the acquired categories to develop new criteria for categorization; • interface smoothly with similar modules and give rise to a hierarchical structure. The IM is composed of two structures: the Categorization Module, which performs categorization, and the Ontogenetic Module, which can develop new goals. At the beginning, both modules are empty. Incoming data are sent to the Categorization Module. Once the categories have been created, a distance measure, computed between the incoming sensory signal and the developed categories, is sent to the Ontogenetic Module. The Ontogenetic Module performs Hebbian learning to develop new goals and returns a signal based on how much these new goals are satisfied. This signal is called the ontogenetic signal; a high value corresponds to a high satisfaction of the developed goals (see Figure 7.4). Figure 7.4: Ontogenetic Module (OM) The IM also receives as input the signal from the PM, and returns the maximum between the PM signal and the ontogenetic signal, called the relevant signal. The IM also outputs the categories created by the Categorization Module. Categorization Module: cognitive autonomy The cerebral cortex is divided into lobes, each having a specific function. The parts of the cortex that receive sensory inputs from the thalamus are called primary sensory areas. One of the main features of the cerebral cortex is its ability to adapt to stimuli, whatever their nature, so each cortical area is capable of processing any type of data [109]. The Categorization Module represents the cerebral cortex of IDRA. It receives input from sensors or from other IMs and performs categorization.
The input is elaborated in two stages: first with Independent Component Analysis (ICA) [53], then with a clustering algorithm such as K-Means. ICA allows the module to generalize the input representation regardless of the type of incoming stimuli. In an early development stage, independent components are extracted from a series of inputs through the ICA algorithm. After this training stage, each input is projected onto the basis space (i.e. the previously extracted independent components), in order to reduce the dimensionality of the data and to obtain a general representation: W = IC × I (7.1) where W is the resulting vector of weights, IC is the matrix of independent components and I is the input vector. Clustering is then performed on this vector of weights, using the K-Means algorithm. Clustering is a simple way to obtain a neural code, i.e. the translation of a stimulus into a neural activation. Considering that any information is represented within the brain by networks of neurons, it is supposed that neurons can encode any type of information [11][32]. During clustering, each vector is assigned to an existing cluster if its distance from that cluster is below a previously set threshold; otherwise a new cluster is created from the newly acquired vector. The output of the Categorization Module is a vector containing the activations of the clusters, which depend on the distances of the input data from the centre of each cluster (category). This vector corresponds to the activation of a neuron centred in each cluster: y_i = ρ(x, C_i) (7.2) where y_i is the distance of the current input from the centre of cluster i, x is the input and C_i is the centre of cluster i. For my purposes, the values are normalized between zero and one. A new category is created by the Categorization Module depending on the value of the relevant signal computed by the IM; only relevant inputs are categorized, so that the module stores only meaningful information.
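The two-stage processing just described (projection onto the independent components, then threshold-based clustering) can be sketched as follows. This is a minimal illustration in Python, assuming the independent components have already been extracted; all class and method names are my own, not taken from the original C# implementation.

```python
import numpy as np

class CategorizationModule:
    """Minimal sketch of the Categorization Module (Eqs. 7.1-7.2).
    Assumes the ICA basis is already available; names are illustrative."""

    def __init__(self, ic_matrix, threshold=0.6):
        self.IC = np.asarray(ic_matrix)  # matrix of independent components
        self.threshold = threshold       # distance threshold for new clusters
        self.centres = []                # cluster centres, i.e. categories

    def project(self, x):
        # Eq. 7.1: W = IC x I, projection onto the basis space
        return self.IC @ np.asarray(x)

    def activations(self, x):
        # Eq. 7.2: y_i is the distance of the projected input from
        # centre i, normalized between zero and one
        w = self.project(x)
        d = np.array([np.linalg.norm(w - c) for c in self.centres])
        return d / (d.max() + 1e-12) if d.size else d

    def categorize(self, x, relevant):
        # New categories are created only for relevant inputs
        w = self.project(x)
        d = [np.linalg.norm(w - c) for c in self.centres]
        if relevant and (not d or min(d) > self.threshold):
            self.centres.append(w)
        return self.activations(x)
```

The `relevant` flag corresponds to the relevant signal computed by the IM: an input far from every existing category creates a new one only when that signal is present.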
Ontogenetic Module: goals generation The functionalities of the thalamus include the elaboration of sensory and motor signals, as well as the regulation of consciousness, sleep, and alertness [109]. Furthermore, the thalamus is a "miniature map" of the cerebral cortex [115]. Each portion of the thalamus receives a reciprocal connection from the corresponding portion of the cerebral cortex, whereby the cortex can modify thalamic functions. These connections are more data-intensive from the cerebral cortex to the thalamus, while the backward connection from the thalamus to the cortex is weaker. This close connection between thalamus and cortex and their interplay led to the idea that goal generation is spread throughout the cerebral cortex and is obtained through the interaction between thalamus and cortex. The Ontogenetic Module represents the thalamus of my system. It is closely connected to the Categorization Module and uses the categories computed by the latter, together with a Hebbian learning function, to develop new goals. The neural activations provided by the Categorization Module are evaluated using a vector of weights, and the resulting ontogenetic signal is the maximum of the weighted neural activations: os = max_i(y_i w_i) (7.3) where os is the resulting ontogenetic signal, y_i is the activation of neuron i and w_i is the weight associated with neuron i, normalized between zero and one. The ontogenetic signal strongly depends on the weights used to evaluate the input. These weights are updated at each iteration, using a Hebbian learning function: w_i = w_i + η(hs · y_i − w_i · y_i²) (7.4) where η is the learning rate and hs is the Hebbian signal, a control signal coming from the IM. A threshold is fixed: if a weight reaches or exceeds the threshold, its value is set to one.
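Equations (7.3) and (7.4) can be illustrated with a short sketch; the function name, the clipping to [0, 1], and the exact saturation rule for weights reaching the threshold are my assumptions.

```python
import numpy as np

def ontogenetic_step(y, w, hs, eta=0.1, threshold=0.8):
    """One iteration of the Ontogenetic Module (Eqs. 7.3-7.4).
    y: neural activations from the Categorization Module;
    w: goal weights; hs: Hebbian control signal from the IM."""
    os_signal = float(np.max(y * w))      # Eq. 7.3: os = max_i(y_i w_i)
    w = w + eta * (hs * y - w * y ** 2)   # Eq. 7.4: Hebbian update
    w = np.clip(w, 0.0, 1.0)              # keep weights in [0, 1]
    w[w >= threshold] = 1.0               # weights at the threshold saturate
    return os_signal, w
```

The parameter values (η = 0.1, threshold = 0.8) follow the experiments reported later in the chapter.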
The output of the Ontogenetic Module is the ontogenetic signal, whose value represents how much the current input state satisfies the new goals developed through the Hebbian learning process. Signal propagation The signal propagation through the internal sub-modules of the IM can be summarized in the following steps (see Figure 7.3): 1. the sensory input, the global relevant signal, and the external relevant signal of the previous time step are received; 2. the CM computes its output signal, which encodes a proper representation of the current sensory signal x(t); 3. the OM computes its ontogenetic signal, given the current representation of the sensory input and an internal representation of the developed goals in the input space; 4. the IM computes the relevant signal, which represents the relevance of the sensory input at time t; 5. training phase: the relevant signal generated in the previous step drives the development of new categories in the CM and the updating of the gates in the OM. 7.3.2 Motor system: movement generation The problem of moving and acting in a smart way, according not only to hard-coded goals but also to newly developed ones, is not trivial. I suggest a solution based on Dynamic Behaviours for movement evaluation and on the concept of motor primitives for movement generation. The input to the motor part comes from the network of IMs and is composed of a vector of neural activations and a relevant signal. The vector, representing the state of the environment at a high level of abstraction, is clustered using the K-Means algorithm. The output of the clustering is the cluster corresponding to the current state; using a State-Action table, the best movement to perform, according to the relevant signal, is then selected. The movement is composed of a linear combination of primitives and is sent to the agent, which uses it to compute the joint values.
Different lines of evidence have led to the idea that motor actions and movements are composed of elementary building blocks, called motor primitives. Motor primitives might be equivalent to "motor schemas", "prototypes" or "control modules" [105]. Motor primitives can be transformed with a set of operations and combined in different ways, according to well-defined syntactic rules, in order to obtain the entire repertoire of motor actions of an organism. At the neuronal level, a primitive corresponds to a neuron assembly, for example of spinal or cortical neurons [44]. Studies on the motor system suggest that voluntary actions are composed of movement primitives that are bound to each other either simultaneously or serially in time [44][74][116]. Following this idea, I use the concept of motor primitives to create muscular activations. A motor primitive can be seen as the activation of a muscle over time: the higher the value of the primitive, the stronger the muscle activation, which leads to a faster execution of the movement. By activating different muscles over time, a complex movement can be performed. I implement primitives as Gaussian functions delayed in time: p = e^(−(x−c)²/(2σ²)) (7.5) where c is the centre of the muscular activation of the primitive p. I have chosen bell-shaped Gaussian profiles for the primitives according to biological evidence: when humans move their limbs from one position to another, they generally change joint angles in a smooth manner, such that the angular velocity follows a symmetrical, bell-shaped profile [34]. To generate a complex movement, primitives are linearly combined, producing a muscular synergy: m = Σ_i w_i p_i (7.6) Each weight w_i is initially randomly generated. In the human brain, the motor cortex is involved in the planning, control and execution of movements. Each neuron in the primary motor cortex contributes to the force in a muscle [33].
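A minimal sketch of Equations (7.5) and (7.6), assuming, as in the movement experiment described later, five equally spaced primitives on a 0-100 time scale; function and variable names are illustrative.

```python
import numpy as np

def primitive(t, c, sigma=6.7):
    # Eq. 7.5: Gaussian activation profile centred at time c
    return np.exp(-(t - c) ** 2 / (2 * sigma ** 2))

def synergy(t, weights, centres, sigma=6.7):
    # Eq. 7.6: m = sum_i w_i p_i, a linear combination of primitives
    return sum(w * primitive(t, c, sigma) for w, c in zip(weights, centres))

t = np.linspace(0.0, 100.0, 101)      # time axis of one movement
centres = np.linspace(0.0, 100.0, 5)  # 5 equally spaced primitives
weights = np.random.rand(5)           # initially random, later learned
m = synergy(t, weights, centres)      # resulting muscular activation
```

The resulting profile `m` is what would be sent to the agent to compute the joint values.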
Furthermore, the primary motor cortex is somatotopically organized, which means that the stimulation of a specific part of the primary motor cortex elicits a response from a specific body region. Given the fundamental role of the motor cortex in movement generation, the approach should be similar to the one used for categorization: I need a neural code of the information, like the one computed by the Categorization Module. The first step in performing a movement is therefore clustering, carried out by the K-Means algorithm. However, in this case I do not need to create new categories, so the clusters are defined in a preliminary training phase. I need the cluster representing the current state, namely the part of the primary motor cortex that is stimulated by the current state. This approach respects the idea that the same neural activation produces the same muscular response. I also need a mechanism able to select the best movement according to the current state of the environment, depending on the goals. Dynamic Behaviours allow the agent to perform the best movement, given the state of the environment and the relevant signal, as well as to learn the best movement to execute in an unknown situation. The relevant signal depends on the ontogenetic signal, coming from the thalamus, and on the phylogenetic signal, computed by the amygdala; its use in motor development is based on scientific evidence [30]. Each Dynamic Behaviour is composed of a list of actuators to move in order to satisfy a goal. If I want the agent to look at a ball, for example, I create a Behaviour linked to the actuators that control the head yaw and pitch angles. The Dynamic Behaviour selects the set of movements to be executed in order to obtain the best relevant signal as a response from the network of IMs. The computation of the best movement is based on a State-Action table (Figure 7.5); the table associates a state and a movement with a relevant signal.
When the system is in a certain state and performs a movement, the relevant signal generated by this state-movement combination is stored in the table. Figure 7.5: State-Action table The policy for movement selection, for each input state, follows these rules: • if there is a movement associated with a relevant signal above a defined threshold, that movement is selected; • otherwise, if there is a movement not yet performed, that movement is selected; • otherwise, if all movements have already been performed at least once and none is associated with a sufficient relevant signal, a new random movement is added to the list. Each movement is a set of weights to apply to the primitives to obtain a muscular activation. Once a movement is selected, it is used to linearly combine the primitives: the resulting vector is sent to the agent, which computes the correct joint values and moves. 7.4 Experimental results 7.4.1 Setup The IDRA implementation was specifically designed to be as modular as possible, in order to allow the further addition of innate abilities in the PM, of new kinds of sensor inputs, and of new kinds of actuator behaviours. Moreover, it is designed to be virtually able to adapt to every robot the user would like to run it on. The project was developed in C# using Microsoft Visual Studio 2010, making extensive use of libraries and data standards that I specifically developed for this project. In the current state of the work, the project is provided with a C# implementation and XML configuration files for an Aldebaran NAO robot, and with a dummy robot useful for preliminary testing. The architecture runs on a Windows PC with: • Processor: Intel Core i7 920; • Video card: Nvidia GeForce GTS 250; • Memory: 4 GB DDR3 RAM; • Hard drive: 1 TB, 7200 rpm.
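The three selection rules above can be sketched as a small Python function; the table layout (a dictionary mapping state-movement pairs to stored relevant signals) and the representation of a movement as a tuple of primitive weights are my assumptions, not the original C# data structures.

```python
import random

def select_movement(table, state, movements, threshold=0.8, n_weights=5):
    """Movement-selection policy over the State-Action table (sketch)."""
    # 1. prefer a movement whose stored relevant signal is high enough
    for mv in movements:
        if table.get((state, mv), 0.0) >= threshold:
            return mv
    # 2. otherwise try a movement not yet performed in this state
    for mv in movements:
        if (state, mv) not in table:
            return mv
    # 3. otherwise add a new random movement (a set of primitive weights)
    new_mv = tuple(random.random() for _ in range(n_weights))
    movements.append(new_mv)
    return new_mv
```

The 0.8 threshold matches the value used for the choice of the best movement in the second experiment.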
7.4.2 Goal generation To test goal generation starting from hard-coded goals, I performed a simple experiment with a network composed of a single IM and with input coming from a video camera. The agent should be able to extract information about the colour saturation of the image, according to a hard-wired instinct, and then learn the shape of the observed figure, thus developing a new interest in that particular shape. The experiment uses a NAO robot, in particular only its frontal camera and the two actuators controlling the head movement, namely HeadPitch and HeadYaw. This experiment is similar to the one performed in [70], where the employed agent was a two-degree-of-freedom camera; here I have used a far more complex agent and performed the test with the full architecture. The robot has a single innate ability, i.e. the "attraction" to coloured objects; the test consists in showing how a new interest, i.e. the "attraction" to specific shapes, can emerge without the need to hardcode it. The test environment is limited to two boards containing geometrical figures (Figure 7.6). The first board presents a series of black shapes, among which there is a black star. The second board presents some stars in highly saturated colours. The boards are placed on a wall in front of the NAO robot, at a distance that allows the camera to see the entire board while moving. Figure 7.6: The two boards used in the experiment The network (Figure 7.7) is composed of only one layer including a single IM, and it receives as input just the video signal coming from the frontal camera. The input is filtered in three different ways: • logPolarBW filter: takes an RGB image and returns it in log-polar coordinates, in a single-channel colour space; • logPolarSat filter: takes an RGB image, converts it to HSV (Hue, Saturation, Value), and returns the saturation channel in log-polar coordinates;
• cartesianRGB filter: takes an RGB image and returns the same image in Cartesian coordinates in a three-channel colour space; this input is sent to the interface and is used by humans to understand what the robot is looking at. The IM receives the data from logPolarBW in array form, while the data from logPolarSat are received by the Phylogenetic Module. The phylogenetic signal here is the percentage of highly saturated pixels in the image. The output of the net is the output of the single IM. Figure 7.7: The architecture used in the goal-generation experiment Head movements are randomly generated, using a uniform distribution for the angle and a bell-shaped Gaussian function for the amplitude of the movement. The probability density of the amplitude is: p(r, λ) = e^(−λr²) / ∫_{−rmax}^{rmax} e^(−λρ²) dρ (7.7) where r is a random variable and λ is the relevant signal output by the IM. The variance of the Gaussian function depends on the relevant signal computed by the IDRA architecture: when NAO is interested in what it sees, the head movements are small, while non-attractive images lead to wide head movements. In order to project the input into the basis space, independent components are extracted through ICA on images coming from the camera, using the following parameters: • number of samples: 2000; • max number of iterations: 200; • convergence threshold: 10^-5; • max number of independent components: 32; • eigenvalue threshold: 10^-4. Let me now discuss the results. Initially, the board contains only black figures. The interest of NAO is roughly equal for every part of the board; it aimlessly points at every part of its visual field, since it cannot find anything interesting. Then the second board is shown, and the interest of NAO focuses on the three star-shaped figures. Afterwards, the board is switched again with the first one.
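Equation (7.7) defines a density that can be sampled, for instance, by rejection sampling, since e^(−λr²) ≤ 1 for λ ≥ 0; the following sketch is my own illustration, not the original implementation.

```python
import math
import random

def sample_amplitude(lam, r_max=1.0):
    """Draw a head-movement amplitude from Eq. 7.7, i.e. p(r, lam)
    proportional to exp(-lam * r^2) on [-r_max, r_max], via rejection
    sampling (names and r_max value are illustrative)."""
    while True:
        r = random.uniform(-r_max, r_max)        # uniform proposal
        if random.random() <= math.exp(-lam * r * r):
            return r                             # accept
```

With a large relevant signal λ the samples concentrate near zero (small head movements), while λ close to zero yields an almost uniform amplitude, matching the behaviour described above.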
Unlike before, the interest of the NAO robot is now focused on the star-shaped figure, which is black: the learning process of the Ontogenetic Module has developed a new interest in the shape of the figure, in addition to the previous interest in its colour (Figures 7.8 and 7.9). During this test I used the following parameterization: a 0.6 threshold for the creation of new categories, 0.6 for determining the correlation of a signal with a category, 128 clusters, and a 0.8 threshold and 0.1 learning rate for Hebbian learning. Figure 7.8: The gaze is concentrated on the star object 7.4.3 Movement generation A limitation of the first experiment is that it relegates the agent to a passive role. Once the agent has learned from experience, it must be able to take some action in order to interact with the environment, and thus change its own perceptions towards a more satisfying condition. The objective of this experiment is to test the Behaviour and OutputSynthesis parts of the architecture. Starting from a simple hard-coded instinct, i.e. the "attraction" to coloured objects, and the ability to move only one of its arms, the robot learns which movements allow it to see a coloured object held in its hand, moving it near the eyes, and thus to increase its reward signal (Figure 7.10). The experiment uses the NAO robot, in particular its frontal camera and the four actuators controlling the movement of the right arm. During this test I used a 0.5 threshold for the creation of new categories, a 0.28 threshold for the K-Means algorithm, a 0.3 threshold for the minimum distance between cluster centroids, a limit of 1000 points per cluster, a limit of 10 for the number of clusters and of categories, a 0.8 threshold and a 0.1 learning rate for Hebbian learning, and a 0.8 threshold for the choice of the best possible movement in the Behaviours.
For the motor primitives, I used 5 Gaussians, each with a standard deviation of 6.7 and a mean chosen so as to distribute the functions equally on a scale from 0 to 100. Figure 7.9: The red line represents the global relevant signal, whereas the blue line represents the ontogenetic signal. After training, the cognitive architecture is able to produce a relevance signal also for the star-shaped objects. Figure 7.10: The NAO robot in the movement experiment With respect to the previous experiment, I drastically reduced the number of categories and clusters. In fact, the head was fixed, and therefore kept looking at the same point, unless the arm itself passed in front of the camera. The arm has a very limited input dimension (4, the number of joints) with respect to the video input (a 160x120 image, i.e. 19200 pixels). Limiting the number of categories allows a good representation of the almost static environment of the robot. The network is slightly more complex (Figure 7.11). It has two input signals, filtered in four different ways. Besides the three image filters described before, I add the rightArmPosition filter, which takes the proprioceptive information about the four joints of the right arm and returns a vector containing the joint angles in radians. Figure 7.11: The architecture used in the second experiment IDRA is composed of two layers; the first contains two IMs. The first IM receives log-polar grey images and records the shapes. The second receives the proprioceptive data about the position of the right arm. Both send their output vectors and relevant signals to a third module, situated in layer 2. Their output therefore represents the state of the known world: what the robot can see, and where its arm is. The PM receives the logPolarSat-filtered input and broadcasts its signal to the net.
Therefore, if the robot is in a state where it can see a coloured object, the third IM will output a high relevant signal. In order to project the input into the basis space, independent components are extracted through ICA. The first IM computes ICA on the video images, the second on the joint values of the right arm, and the third on the combined output of the other two. The parameters used are the same as in the first experiment. Once the net is trained, the clusters for the Motor System have to be computed: while the net is running, samples of the output are collected, then the K-Means algorithm is executed to create the clusters, using 2000 samples and 100 clusters. The experiment starts with the robot in a random position. At the beginning the State-Action table is empty, and movements are chosen and executed randomly. After a number of steps the table starts filling up, and movements begin to be coherent with the maximization of the relevant signal received from the architecture. When the table contains a good number of entries, movements start to be frequently repeated. Several arm positions (rows in the State-Action table) are associated with a movement (column) that brings the hand to a position with a high relevant signal; the positions with a high relevant signal are associated with many movements, but none of them brings the hand to another position with a high relevant signal. Therefore, I observe the robot starting from a random position, moving towards a good one, then moving towards a random position again, and so on. Figure 7.12 shows, coloured in red, the clusters of positions manifesting the highest relevant signals. For each cluster, I report a 3D representation of the NAO, showing its position, and what it sees through its top camera.
Figure 7.12: The most relevant hand positions The implemented Motor System is simple and has obvious limitations: the movements that maximize the relevant signal are performed only if the state is known and the State-Action table has the corresponding entry. Accordingly, the table requires a lot of entries to produce an effective movement-selection system. Furthermore, the motor training ran for a relatively short period of time; as a consequence, the State-Action table had a rather limited extension in comparison with the high dimensionality of the input representing all the possible states of the environment. Moreover, although the brain has been shown to have an associative memory of sequences of patterns [11], the system presented here has no memory of previous actions. My results are coherent with the objective of the experiment: the robot moves using the linear combination of primitives, and it is able to learn which movements to perform to go from a known state to a state with a high reward. In addition, the experiment led to the creation of a sensorimotor map through the cognitive architecture. For all these reasons, the implemented Motor System is an excellent starting point for the development of an effective system that allows the agent to move according to its goals. 7.5 Conclusions The aim of this work is the creation of a bio-inspired software architecture based on the processes that take place in the human brain; this architecture must be able to learn new goals, as well as new actions to achieve such goals. The architecture has been successfully designed and implemented. My experiments have shown that the agent is able to keep memory of past situations and to act towards the achievement of its goals, whether innate or acquired.
Besides this, the intrinsic dynamicity of the architecture allows the agent to acknowledge changes in the environment that are independent of its own actions, and then to recalibrate its actions for the particular situation. 8 Conclusions In this thesis I have explored the design of biologically inspired controllers, focusing on two broad aspects: low-level computational mechanisms and the cognitive decision-making process. The common background shared by these two aspects is the exploitation of biological principles associated with the internal mechanisms of the brain. In general, I have focused on mechanisms widespread in mammalian brains, with particular attention to the human brain. For these reasons, I have investigated two aspects of the human brain, namely the visual dorsal pathway and the interaction among the thalamus, the amygdala, and the cortex. The visual dorsal pathway is interesting for several reasons. First, it contains a set of basic functionalities (spread over several brain areas) that are mandatory for both perceiving and acting, such as visual processing, object position estimation, sensory fusion, visuomotor mapping, and trajectory generation; second, it is tightly coupled with the arising of cognition, as pointed out in [92]; third, this pathway is widely studied in both the neuroscience and engineering fields, providing qualitative and quantitative results for comparison. On the other hand, the interaction among the thalamus, the amygdala, and the cortex is a proposal for modelling the emergence of goals and behaviours, entirely inspired by neuroscientific evidence. Moreover, this interaction implies that cognition is widespread in the brain, because it emerges through the interaction of different brain areas. I have investigated the low-level computational models of the cortex, looking at those computational principles that are widespread in the different functional areas.
Specifically, I have focused on two fundamental cortical areas: the primary visual cortex (V1) and the posterior parietal cortex (PPC). The primary motor cortex (M1) is modelled without focusing on the particular underlying neural network. I have defined three sets of experiments to investigate each of the above-mentioned areas. For each of them, I have imposed several constraints in order to have a proper testing environment for the generation of quantitative results. It is worth noting that one of the main objectives of these experiments is to investigate several strategies of bio-inspiration, comparing them in terms of underlying neural network, learning method, neural plasticity, and neuron recruitment. For this reason, models of several functional areas are proposed: the V1 model exploits the interaction among binocular neurons to compute the disparity map; the PPC model computes the coordinate transformation between different frames of reference; and the model based on Hering's law computes the visuomotor mapping and the arm trajectory to reach the perceived target.

The PPC and Hering-based models have partially overlapping functionalities, because both compute the visuomotor mapping and both follow a motor babbling schema for their learning. However, the PPC model builds its mapping through an unsupervised learning schema, whereas the Hering-based model is trained through a classical supervised technique. Furthermore, the internal parameters that the PPC model must tune are not related to the controlled body, whereas the Hering-based model must estimate different parameters for different body shapes; this means the PPC model has a certain degree of independence from the robot's shape. However, the Hering-based model also computes the arm trajectory in 3D space for target reaching, handling the problem's complexity by minimizing a cost function.
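Hering's law, on which the third model is based, states that each eye receives the sum of a shared conjugate (version) command and half of a disjunctive (vergence) command. A minimal sketch of this decomposition, with a geometry, baseline, and sign conventions that are my own illustrative assumptions rather than the thesis implementation:

```python
import math

# Sketch of Hering's law of equal innervation: each eye's rotation is the
# sum of a shared version command and half of a vergence command.
# The baseline, units, and sign conventions are illustrative assumptions.

def version_vergence(target_x, target_z, baseline=0.065):
    """Version/vergence (rad) for a target at lateral offset x, depth z (m)."""
    theta_l = math.atan2(target_x + baseline / 2.0, target_z)  # left eye
    theta_r = math.atan2(target_x - baseline / 2.0, target_z)  # right eye
    version = (theta_l + theta_r) / 2.0   # conjugate component
    vergence = theta_l - theta_r          # disjunctive component
    return version, vergence

def eye_angles(version, vergence):
    """Recompose the individual eye angles from the two shared commands."""
    return version + vergence / 2.0, version - vergence / 2.0

v, g = version_vergence(0.0, 0.5)
left, right = eye_angles(v, g)   # symmetric angles for a centred target
```

For a target straight ahead the version is zero and only the vergence varies with depth; driving version and vergence as two shared commands, rather than each eye independently, is the essence of the Hering-based control schema.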
On the other hand, the V1 model deals with a widely studied brain area that processes the visual information coming from both eyes. With respect to both the PPC and the Hering-based model, the V1 model has a predefined underlying neural network, and the focus is on the mechanisms for filtering the population responses in order to obtain a robust estimation.

The second aspect deeply investigated in this work is a cognitive architecture. Its main difference from the several proposals in the literature is the emergence of a certain degree of cognition from modelling the interaction among several brain areas. The involved areas are the thalamus, which deals with the developed goals and behaviours; the amygdala, which deals with the innate goals and with emotion; and the cortex, which is a massive network for information processing. Even though the experiments on low-level computational mechanisms deal with cortical areas, in the cognitive experiment I have not focused on the specific low-level mechanisms but have implemented a simplified version of the cortical areas. This reduces the experiment's complexity and allows a focus on the generation of new goals and behaviours due to the interaction among the thalamus, the amygdala, and the cortex.

In discussing the obtained results, they should be split into two main categories: the results obtained on each single experiment compared with the state of the art, and the results obtained by comparing the experiments at the low level and at the cognitive level. The V1 model explicitly computes the disparity map, including several well-known mechanisms of the primary visual cortex and improving on previously published results. Even though the experiments on real data show good performance, it is still far from that of classical techniques based on computer vision.
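The disparity-energy mechanism underlying the V1 model can be illustrated with a toy one-dimensional binocular energy neuron: a quadrature pair of Gabor filters is applied to the left and right image rows, the right-eye filters carry an interocular phase shift, and the summed responses are squared. The Gabor parameters and the impulse stimulus below are illustrative assumptions, not the thesis configuration:

```python
import math

# Toy 1-D binocular energy neuron (phase-shift variant). The cell's
# parameters and the stimulus are illustrative, not the thesis setup.

def gabor(size, freq, phase, sigma=4.0):
    c = size // 2
    return [math.exp(-((i - c) ** 2) / (2 * sigma ** 2))
            * math.cos(2 * math.pi * freq * (i - c) + phase)
            for i in range(size)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def energy(left_patch, right_patch, freq=0.125, dphase=0.0):
    """Binocular energy of one phase-disparity-tuned cell."""
    n = len(left_patch)
    even_l, odd_l = gabor(n, freq, 0.0), gabor(n, freq, math.pi / 2)
    even_r = gabor(n, freq, dphase)              # phase-shifted right eye
    odd_r = gabor(n, freq, math.pi / 2 + dphase)
    e = dot(left_patch, even_l) + dot(right_patch, even_r)
    o = dot(left_patch, odd_l) + dot(right_patch, odd_r)
    return e * e + o * o

n = 33
left = [0.0] * n; left[n // 2] = 1.0          # bright spot in the left eye
right = [0.0] * n; right[n // 2 + 2] = 1.0    # same spot, shifted by 2 px
matched = energy(left, right, dphase=-math.pi / 2)   # tuned to 2 px
unmatched = energy(left, right, dphase=math.pi / 2)  # tuned to -2 px
```

Sweeping the interocular phase shift over a population of such cells yields a response curve whose peak encodes the stimulus disparity, which is essentially what the disparity map computation reads out.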
The PPC model introduces, for the first time to the best of my knowledge, a neural architecture trained with an unsupervised method based on Hebbian learning. It is able to learn from scratch the visuomotor mapping between the eye frame of reference and the arm frame of reference. Quantitative results are provided, but the comparison with the literature is only qualitative, due to the lack of previous quantitative results. On the other hand, the Hering-based model provides a possible strategy for learning a visuomotor mapping and reaching a perceived target knowing only the retinal position of the target. This work is compared with other works, and quantitative data and statistics are provided. A qualitative analysis with respect to the other works shows that this method is particularly suitable due to its robustness in the exploration of the surrounding space.

Last, the cognitive architecture has been tested in two scenarios in which it should be able to develop new behaviours starting from an innate criterion. In the first scenario, a qualitative analysis, supported by quantitative results, points out the capability of the system both to generate new goals and to improve its neural representation of the sensory information through interaction with the environment. In the second scenario, the architecture has been able both to learn a sensorimotor mapping through interaction with the environment while pursuing its own goals, and to learn motor patterns.

On the other side, the comparative analysis of the developed models, both cortical and cognitive, gives interesting results. First, the low-level computational models are implemented over computational frameworks that differ from each other, yet their results, compared with the state of the art, show a performance improvement. Second, these models can be applied, up to a training phase, to different technological solutions.
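Both the PPC and the Hering-based model acquire their visuomotor mapping through motor babbling: random commands are executed and the observed sensory outcomes are stored. A toy version of the idea, with a hypothetical two-link planar arm whose geometry and sample counts are my assumptions, not the thesis setup:

```python
import math
import random

# Motor-babbling sketch: execute random joint commands, remember the
# (command, observed outcome) pairs, then answer inverse queries by
# nearest neighbour. The 2-link arm geometry is purely illustrative.

def forward(q1, q2, l1=0.3, l2=0.25):
    """Toy 2-link planar arm: joint angles to hand position."""
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))

def babble(n=3000, seed=0):
    """Random commands paired with their observed outcomes."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        q = (rng.uniform(0.0, math.pi), rng.uniform(0.0, math.pi))
        samples.append((q, forward(*q)))
    return samples

def inverse_lookup(target, samples):
    """Nearest-neighbour inverse model over the babbled experience."""
    return min(samples,
               key=lambda s: (s[1][0] - target[0]) ** 2
                             + (s[1][1] - target[1]) ** 2)[0]

samples = babble()
q = inverse_lookup((0.2, 0.3), samples)
x, y = forward(*q)   # lands near the (0.2, 0.3) target
```

The stored pairs play the role of the sensorimotor map: the inverse lookup recovers a command for a visual target without ever inverting the arm's kinematics analytically.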
For example, the camera calibration parameters are not needed by the V1 model, and the type of actuation is irrelevant for both the PPC model and the Hering-based model. Third, the cognitive architecture adapts to the specific body shape, exploiting its morphology to produce intelligent behaviours without having specific knowledge of the cortical representation of the incoming sensory information flow. Fourth, a comparison among the low-level computational mechanisms points out the flexibility of unsupervised techniques, because both the neurons' receptive fields and the network properties that emerge from them are similar to their biological counterparts. Fifth, the cognitive architecture is totally biologically inspired, and its cognition emerges over a common neural lattice.

Previous results show only part of the problem, focusing on specific brain areas without investigating the big picture. How to merge different levels of computation, both cognitive and low-level, is one of the big challenges in biologically inspired robotics, and a qualitative comparison among several approaches is the first step of this roadmap proposal. Even though several approaches exist for developing the neural mechanisms, either supervised or not, it is worth noting that the learning strategies are different methods of exploiting the same underlying architecture (see, for example, the similarities between [76] and [51]). In fact, all the proposed models share the same computational principles, such as population coding and feedback connections. Of course, this is not a conclusive list of principles, and further efforts are required to develop a unique, homogeneous computational framework of the brain; the proposed roadmap is the basis for further experiments investigating both the computational mechanisms and the motivations for using them.
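Population coding, the first of the shared principles listed above, can be made concrete with the classic population-vector readout [39]: neurons with cosine tuning around preferred directions respond to a stimulus, and summing the preferred directions weighted by the firing rates recovers it. The tuning model and the cell count below are illustrative assumptions:

```python
import math

# Population-vector readout sketch: cosine-tuned cells with uniformly
# spaced preferred directions; the parameters are illustrative.

def rates(stimulus_angle, preferred, baseline=1.0, gain=1.0):
    """Rectified cosine tuning around each cell's preferred direction."""
    return [max(0.0, baseline + gain * math.cos(stimulus_angle - p))
            for p in preferred]

def population_vector(r, preferred):
    """Sum preferred-direction unit vectors weighted by firing rate."""
    x = sum(ri * math.cos(p) for ri, p in zip(r, preferred))
    y = sum(ri * math.sin(p) for ri, p in zip(r, preferred))
    return math.atan2(y, x)

preferred = [2 * math.pi * i / 16 for i in range(16)]
stim = 1.0
decoded = population_vector(rates(stim, preferred), preferred)
# decoded recovers the 1.0 rad stimulus direction
```

With uniformly spaced preferred directions the baseline contributions cancel, so the decoded angle matches the stimulus; this distributed, redundant representation is what the models above share.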
From the life sciences point of view, these models can give insight into the mechanisms underlying both low-level computation and the emergence of cognition. These models are based on the most recent advances in computational neuroscience, even though they are highly speculative. Each model implements well-known neuroscientific mechanisms, with a specific focus on both improving the performance and making hypotheses on the neural organization. For each of them I have introduced several improvements, in terms of both new computational mechanisms and neural architecture: the V1 model introduces the scale-pooling mechanism; the PPC model is based on an unsupervised learning method; the Hering-based model introduces decoupled control of the eyes and the neck; and the IDRA model proposes a way for several brain areas to interact, speculating on both the functionality and the timing of the interaction. All of these properties, even though based on previously developed brain models, should be validated through comparison with the biological counterpart. In fact, scientists should confirm or refute the hypotheses I have introduced in my models by analysing real data.

From an engineering point of view, it could be possible to integrate the proposed low-level computational models into a unique architecture. However, this is beyond the aim of this thesis, which has investigated how bio-inspiration is approached by engineers and how the comparison among different techniques could be carried out within a more plausible shared framework (see Table 3.1). In my opinion, the unsupervised approach with a common neural lattice is the way to approach at least the low-level problem. In Chapter 5 I have shown how a neural architecture that resembles the biological counterpart can emerge, with emerging computational mechanisms such as gain fields.
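The unsupervised, Hebbian-style learning advocated here can be sketched with Oja's normalised Hebbian rule, under which a single unit's weights align with the dominant correlation in its input. The two-dimensional input statistics below are an illustrative assumption, not the Chapter 5 setup:

```python
import random

# Oja's normalised Hebbian rule: weights grow with correlated pre- and
# post-synaptic activity, with a decay term keeping their norm near 1.
# The input distribution and learning rate are illustrative.

def oja_step(w, x, lr=0.01):
    y = sum(wi * xi for wi, xi in zip(w, x))      # postsynaptic activity
    # Hebbian growth (y * x) plus decay (y^2 * w) for normalisation.
    return [wi + lr * (y * xi - y * y * wi) for wi, xi in zip(w, x)]

rng = random.Random(1)
w = [0.5, 0.5]
for _ in range(5000):
    s = rng.gauss(0.0, 1.0)
    x = [s, 0.5 * s + 0.1 * rng.gauss(0.0, 1.0)]  # correlated input pair
    w = oja_step(w, x)
# w now points along the principal axis of the input correlations,
# with a norm close to 1.
```

The same principle, applied to populations of units over combined retinal and posture signals, is what allows receptive-field structure such as gain fields to emerge without a teacher.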
Moreover, the comparison between [76] and [51] points out the suitability of designing a biologically inspired model of the primary visual cortex trained in an unsupervised fashion. Admittedly, there is no conclusive evidence that the unsupervised approach is the actual technique implemented in the brain for the organization of the network architecture, but it is quite clear that the unsupervised approach works for the definition of the receptive fields. On the other hand, the cognitive architecture is built on the same neural lattice as the cortical models; it uses unsupervised learning to develop new goals, and a self-generating reinforcement learning to represent the innate criteria. This indicates that cognition is highly correlated with low-level computation, such as sensorimotor mapping, and it could overcome high-level cognitive architectures based, for example, on Bayesian networks and purely probabilistic methods. A Bayesian brain could be considered biologically inspired too, since it performs computations that produce an output similar to the biological counterpart, especially in coordination tasks [61]. However, it is quite clear that a Bayesian brain does not model the complexity of a cognitive agent built on the same neural lattice, and it does not take into account how cognition arises in the network from the interaction with the environment.

Despite the results presented in this thesis, the roadmap indicates further steps in the development of a cognitive architecture that merges the cognitive aspects, the low-level computations, and the morphology of the robot. First, a common neural lattice should be provided for those computational models that mimic functional brain areas, given a common learning strategy, e.g. unsupervised. Second, these computational models should be integrated into the cognitive neural architecture. Third, experiments on different robotic setups must be performed to validate a truly biologically inspired architecture.
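As a point of comparison, the Bayesian-brain account discussed above [61] combines noisy cues by weighting each with its reliability (inverse variance); for two Gaussian estimates this reduces to a closed form. The numbers below are illustrative:

```python
# Precision-weighted fusion of two Gaussian estimates, the closed form
# behind Bayesian cue integration. The example values are illustrative.

def fuse(mu1, var1, mu2, var2):
    """Fuse two Gaussian estimates by inverse-variance weighting."""
    w1 = (1.0 / var1) / (1.0 / var1 + 1.0 / var2)
    mu = w1 * mu1 + (1.0 - w1) * mu2
    var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return mu, var

# Vision says the target is at 10 cm (variance 1), proprioception says
# 14 cm (variance 4): the fused estimate leans toward the reliable cue.
mu, var = fuse(10.0, 1.0, 14.0, 4.0)   # approximately (10.8, 0.8)
```

Such a scheme is biologically inspired at the level of behaviour, but, as argued above, it says nothing about how the computation is laid out on a neural lattice.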
Bibliography

[1] R. Adolphs and M. Spezio. Role of the amygdala in processing visual social stimuli. Progress in Brain Research, 156:363–378, 2006.
[2] T. D. Albright, E. R. Kandel, and M. I. Posner. Cognitive neuroscience. Current Opinion in Neurobiology, 10:612–624, 2000.
[3] J. S. Albus. A theory of cerebellar function. Mathematical Biosciences, 10:25–61, 1971.
[4] R. A. Andersen and H. Cui. Intention, action planning, and decision making in parietal-frontal circuits. Neuron, 63:568–583, 2009.
[5] R. A. Andersen, G. K. Essick, and R. M. Siegel. Encoding of spatial location by posterior parietal neurons. Science, 230:456–458, 1985.
[6] G. Aragon-Camarasa, H. Fattah, and J. P. Siebert. Towards a unified visual framework in a binocular active robot vision system. Robotics and Autonomous Systems, 58(3):276–286, 2010.
[7] B. D. Argall, S. Chernova, M. Veloso, and B. Browning. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57:469–483, 2009.
[8] M. Asada, K. Hosoda, Y. Kuniyoshi, H. Ishiguro, T. Inui, Y. Yoshikawa, M. Ogino, and C. Yoshida. Cognitive developmental robotics: A survey. IEEE Transactions on Autonomous Mental Development, 1(1):12–34, 2009.
[9] Y. Bar-Cohen. Biological senses as inspiring model for biomimetic sensors. IEEE Sensors Journal, 11(12):3194–3201, 2011.
[10] P. Bernier and S. T. Grafton. Human posterior parietal cortex flexibly determines reference frames for reaching based on sensory context. Neuron, 68:776–788, 2010.
[11] D. F. Bjorklund. Children's Thinking: Cognitive Development and Individual Differences. 2004.
[12] M. Brozovic, L. F. Abbott, and R. A. Andersen. Mechanism of gain modulation at single neuron and network levels. Journal of Computational Neuroscience, 25:158–168, 2008.
[13] C. A. Buneo and R. A. Andersen. The posterior parietal cortex: Sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia, 44:2594–2606, 2006.
[14] S. A. Bunge. How we use rules to select actions: A review of evidence from cognitive neuroscience. Cognitive, Affective, and Behavioural Neuroscience, 4(4):564–579, 2004.
[15] M. Carandini and D. J. Heeger. Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13:51–62, 2012.
[16] W. Chaney. Dynamic Mind. 2007.
[17] L. L. Chen. Head movements evoked by electrical stimulation in the frontal eye field of the monkey: evidence for independent eye and head control. Journal of Neurophysiology, 95:3528–3542, 2006.
[18] S. Chen, Y. Li, and N. M. Kwok. Active vision in robotic systems: A survey of recent developments. The International Journal of Robotics Research, 2011.
[19] Y. Chen and N. Qian. Coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, 16:1545–1577, 2004.
[20] M. Chessa, S. P. Sabatini, and F. Solari. A fast joint bioinspired algorithm for optic flow and two-dimensional disparity estimation. Computer Vision Systems, pages 184–193, 2009.
[21] E. Chinellato, M. Antonelli, B. J. Grzyb, and A. P. del Pobil. Implicit sensorimotor mapping of the peripersonal space by gazing and reaching. IEEE Transactions on Autonomous Mental Development, 3(1):43–53, 2011.
[22] E. Chinellato, B. J. Grzyb, N. Marzocchi, A. Bosco, P. Fattori, and A. P. del Pobil. The dorso-medial visual stream: From neural activation to sensorimotor interaction. Neurocomputing, 74:1203–1212, 2011.
[23] R. Chrisley. Embodied artificial intelligence. Artificial Intelligence, 2003.
[24] R. Chrisley. Philosophical foundations of artificial consciousness. Artificial Intelligence in Medicine, 44:119–137, 2008.
[25] M. M. Churchland, J. P. Cunningham, M. T. Kaufman, J. D. Foster, P. Nuyujukian, S. I. Ryu, and K. V. Shenoy. Neural population dynamics during reaching. Nature, 487:51–56, 2012.
[26] P. Cisek and J. F. Kalaska. Neural mechanisms for interacting with a world full of action choices. Annual Review of Neuroscience, 33:269–298, 2010.
[27] B. G. Cumming and G. C. DeAngelis. The physiology of stereopsis. Annual Review of Neuroscience, 24:203–238, 2001.
[28] K. De Meyer and M. W. Spratling. Multiplicative gain modulation arises through unsupervised learning in a predictive coding model of cortical function. Neural Computation, 23:1536–1567, 2011.
[29] S. Deneve, P. E. Latham, and A. Pouget. Efficient computation and cue integration with noisy population codes. Nature Neuroscience, 4:826–831, 2001.
[30] J. Doyon et al. Contributions of the basal ganglia and functionally related brain structures to motor learning. Behavioural Brain Research, 199(1):61–75, 2009.
[31] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley and Sons, 2001.
[32] S. Duncan and L. F. Barrett. The role of the amygdala in visual awareness. Trends in Cognitive Sciences, 11(5):190–192, 2007.
[33] E. V. Evarts. Relation of pyramidal tract activity to force exerted during voluntary movement. Journal of Neurophysiology, 31(1):14–27, 1968.
[34] J. R. Flanagan and D. J. Ostry. Trajectories of human multi-joint arm movements: Evidence of joint level planning. Experimental Robotics I, pages 594–613, 1990.
[35] D. Fleet, H. Wagner, and T. Sejnowski. Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36:1839–1857, 1996.
[36] D. Floreano, P. Durr, and C. Mattiussi. Neuroevolution: from architectures to learning. Evolutionary Intelligence, 1:47–62, 2008.
[37] D. Floreano and L. Keller. Evolution of adaptive behaviour in robots by means of Darwinian selection. PLoS Biology, 8:1–8, 2010.
[38] A. Georgopoulos, J. Kalaska, R. Caminiti, and J. Massey. On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. Journal of Neuroscience, 2:1527–1537, 1982.
[39] A. P. Georgopoulos, A. B. Schwartz, and R. E. Kettner. Neuronal population coding of movement direction. Science, 233:1416–1419, 1986.
[40] A. Gibaldi, A. Canessa, M. Chessa, S. P. Sabatini, and F. Solari. A neuromorphic control module for real-time vergence eye movements on the iCub robot head. In Proc. 11th IEEE-RAS Int. Humanoid Robots (Humanoids) Conf., pages 543–550, 2011.
[41] A. Gibaldi, M. Chessa, A. Canessa, S. P. Sabatini, and F. Solari. A cortical model for binocular vergence control without explicit calculation of disparity. Neurocomputing, 73:1065–1073, 2010.
[42] C. Glaser, F. Joublin, and C. Goerick. Learning and use of sensorimotor schemata maps. In Proc. IEEE 8th Int. Conf. Development and Learning ICDL 2009, pages 1–8, 2009.
[43] D. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley: Reading, 1989.
[44] C. B. Hart and S. F. Giszter. Modular premotor drives and unit bursts as primitives for frog motor behaviors. The Journal of Neuroscience, 24:5269–5282, 2004.
[45] N. G. Hatsopoulos and A. J. Suminski. Sensing with the motor cortex. Neuron, 72:477–487, 2011.
[46] N. J. Hemion, F. Joublin, and K. J. Rohlfing. A competitive mechanism for self-organized learning of sensorimotor mappings. In Proc. IEEE Int. Development and Learning (ICDL) Conf., volume 2, pages 1–6, 2011.
[47] A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117:500–544, 1952.
[48] M. Hoffmann, H. Marques, A. Arieta, H. Sumioka, M. Lungarella, and R. Pfeifer. Body schema in robotics: A review. IEEE Transactions on Autonomous Mental Development, 2(4):304–324, 2010.
[49] P. O. Hoyer and A. Hyvärinen. Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11:191–210, 2000.
[50] T. C. Hsia. Adaptive control of robot manipulators: a review. In International Conference on Robotics and Automation, 1986.
[51] A. Hyvärinen and P. O. Hoyer. A two-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images. Vision Research, 41:2413–2423, 2001.
[52] A. Hyvärinen, P. O. Hoyer, and M. Inki. Topographic independent component analysis. Neural Computation, 13:1527–1558, 2001.
[53] A. Hyvärinen and E. Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13:411–430, 2000.
[54] A. J. Ijspeert, J. Nakanishi, and S. Schaal. Movement imitation with nonlinear dynamical systems in humanoid robots. In IEEE International Conference on Robotics and Automation, 2002.
[55] E. M. Izhikevich. Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15(5), 2004.
[56] R. S. Jackendoff. Consciousness and the Computational Mind, volume 100. 1987.
[57] R. S. Jackendoff and F. Lerdahl. The capacity for music: what is it, and what's special about it? Cognition, 100:33–72, 2006.
[58] B. Julesz. Foundations of Cyclopean Perception. University of Chicago Press, 1971.
[59] E. R. Kandel, J. Schwartz, and T. M. Jessell. Principles of Neural Science. 2000.
[60] W. M. King. Binocular coordination of eye movements: Hering's law of equal innervation or uniocular control? European Journal of Neuroscience, 33:2139–2146, 2011.
[61] D. C. Knill and A. Pouget. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences, 27(12):712–719, 2004.
[62] N. Kyriakoulis, A. Gasteratos, and S. G. Mouroutsos. Fuzzy vergence control for an active binocular vision system. In Proc. 7th IEEE Int. Conf. Cybernetic Intelligent Systems CIS 2008, pages 1–5, 2008.
[63] J. C. K. Lai, M. P. Schoen, A. Perez Gracia, D. S. Naidu, and S. W. Leung. Prosthetic devices: Challenges and implications of robotic implants and biological interfaces. Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine, 221(2):173–183, 2007.
[64] V. Lawhern, W. Wu, N. G. Hatsopoulos, and L. Paninski. Population decoding of motor cortical activity using a generalized linear model with hidden states. Journal of Neuroscience Methods, 189:267–280, 2010.
[65] T. S. Lee. Image representation using 2D Gabor wavelets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(10), 1996.
[66] M. Lungarella, G. Metta, R. Pfeifer, and G. Sandini. Developmental robotics: a survey. Connection Science, 15:151–190, 2003.
[67] M. Lungarella and O. Sporns. Mapping information flow in sensorimotor networks. PLoS Computational Biology, 2, 2006.
[68] R. Manzotti and F. Mutti. Machine consciousness through goal generation. In IEEE Symposium Series on Computational Intelligence, 2013. Accepted.
[69] R. Manzotti, F. Mutti, S. Y. Lee, and G. Gini. A model of a middle level of cognition based on the interaction among the thalamus, amygdala, and the cortex. In IEEE International Conference on Systems, Man, and Cybernetics, 2012.
[70] R. Manzotti and V. Tagliasco. From behavior-based robots to motivation-based robots. Robotics and Autonomous Systems, 51:175–190, 2005.
[71] M. Meisel, V. Pappas, and L. Zhang. A taxonomy of biologically inspired research in computer networking. Computer Networks, 54:901–916, 2010.
[72] M. Milford, G. Wyeth, and D. Prasser. RatSLAM: a hippocampal model for simultaneous localization and mapping. In IEEE International Conference on Robotics and Automation, 2004.
[73] E. A. Murray and S. P. Wise. Interactions between orbital prefrontal cortex and amygdala: advanced cognition, learned responses and instinctive behaviors. Current Opinion in Neurobiology, 20(2):212–220, 2010.
[74] F. A. Mussa-Ivaldi and E. Bizzi. Motor learning through the combination of primitives. Philosophical Transactions of the Royal Society Lond. B Biological Sciences, 355:1755–1769, 2000.
[75] F. Mutti, C. Alessandro, M. Angioletti, A. Bianchi, and G. Gini. Learning and evaluation of a vergence control system inspired by Hering's law. In IEEE Int. Conf. on Biomedical Robotics and Biomechatronics (BIOROB), 2012.
[76] F. Mutti and G. Gini. Bio-inspired disparity estimation system from energy neurons. In ICABB, 2010.
[77] F. Mutti and G. Gini. Bio-inspired vision system for depth perception in humanoids. In CogSys, 2010. Poster.
[78] F. Mutti and G. Gini. Bioinspired vergence control system: learning and quantitative evaluation. In CogSys, 2012. Poster.
[79] F. Mutti and G. Gini. Visuomotor mapping based on Hering's law for a redundant active stereo head and a 3 DOF arm. In BIONETICS, 2012.
[80] F. Mutti, G. Gini, M. Burrafato, L. Florio, and R. Manzotti. Developing new sensor-motor goals in a bioinspired architecture for evolutionary agents. Unpublished, 2013.
[81] F. Mutti, R. Manzotti, G. Gini, and S. Y. Lee. Implementation and evaluation of a goal-generating agent through unsupervised learning on the Nao robot. In Biologically Inspired Cognitive Architectures, 2012.
[82] F. Mutti, H. Marques, and G. Gini. A model of the visual dorsal pathway for computing coordinate transformations: an unsupervised approach. In Biologically Inspired Cognitive Architectures, 2012.
[83] A. L. Nelson, G. J. Barlow, and L. Doitsidis. Fitness functions in evolutionary robotics: A survey and analysis. Robotics and Autonomous Systems, 57:345–370, 2009.
[84] R. Nieuwenhuys, J. Voogd, and C. van Huijzen. The Human Central Nervous System: A Synopsis and Atlas. Steinkopff, Amsterdam, 2007.
[85] I. Ohzawa. Mechanisms of stereoscopic vision: the disparity energy model. Current Opinion in Neurobiology, 1998.
[86] I. Ohzawa, G. C. DeAngelis, and R. D. Freeman. Stereoscopic depth discrimination in the visual cortex: neurons ideally suited as disparity detectors. Science, 249:1037–1041, 1990.
[87] B. A. Olshausen and D. J. Field. Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14:481–487, 2004.
[88] L. Paninski, M. R. Fellows, N. G. Hatsopoulos, and J. P. Donoghue. Spatiotemporal tuning of motor cortical neurons for hand position and velocity. Journal of Neurophysiology, 91:515–532, 2004.
[89] A. E. Patla, T. W. Calvert, and R. B. Stein. Model of a pattern generator for locomotion in mammals. American Journal of Physiology: Regulatory, Integrative and Comparative Physiology, 248:484–494, 1985.
[90] W. Penfield and E. Boldrey. Somatic motor and sensory representation in the cerebral cortex of man as studied by electrical stimulation. Brain, 60:389–443, 1937.
[91] R. Pfeifer and F. Iida. Morphological computation: connecting body, brain, and environment. Japanese Scientific Monthly, 2005.
[92] R. Pfeifer, F. Iida, and J. Bongard. New robotics: Design principles for intelligent systems. Artificial Life, 11:99–120, 2005.
[93] R. Pfeifer, M. Lungarella, and F. Iida. Self-organization, embodiment, and biologically inspired robotics. Science, 318:1088–1093, 2007.
[94] R. Pfeifer and C. Scheier. Understanding Intelligence. The MIT Press, 2001.
[95] A. Pouget and T. J. Sejnowski. Spatial transformations in parietal cortex using basis functions. Journal of Cognitive Neuroscience, 9(2):222–237, 1997.
[96] N. Qian. Computing stereo disparity and motion with known binocular cell properties. Neural Computation, 6:390–404, 1994.
[97] N. Qian. Binocular disparity and the perception of depth. Neuron, 18:359–368, 1997.
[98] C. Qu and B. E. Shi. The role of orientation diversity in binocular vergence control. In Proc. Int. Neural Networks (IJCNN) Joint Conf., pages 2266–2272, 2011.
[99] P. Rakic. Evolution of the neocortex: a perspective from developmental biology. Nature Reviews Neuroscience, 10:724–735, 2009.
[100] R. Saegusa, G. Metta, and G. Sandini. Own body perception based on visuomotor correlation. In Proc. IEEE/RSJ Int. Intelligent Robots and Systems (IROS) Conf., pages 1044–1051, 2010.
[101] R. Saegusa, G. Metta, G. Sandini, and S. Sakka. Active motor babbling for sensorimotor learning. In Proc. IEEE Int. Conf. Robotics and Biomimetics ROBIO 2008, pages 794–799, 2009.
[102] E. Salinas and L. F. Abbott. Coordinate transformations in the visual system: How to generate gain fields and what to compute with them. Progress in Brain Research, 130:175–190, 2001.
[103] J. G. Samarawickrama and S. P. Sabatini. Version and vergence control of a stereo camera head by fitting the movement into the Hering's law. In Proc. Fourth Canadian Conf. Computer and Robot Vision CRV '07, pages 363–370, 2007.
[104] S. Schaal, A. Ijspeert, and A. Billard. Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society Lond. B Biological Sciences, 358:537–547, 2003.
[105] S. Schaal, J. Peters, J. Nakanishi, and A. J. Ijspeert. Learning movement primitives, pages 561–572. 2005.
[106] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47:7–42, 2002.
[107] T. Sejnowski, C. Koch, and P. S. Churchland. Computational neuroscience. Science, 241:1299–1306, 1988.
[108] R. Shadmehr and J. W. Krakauer. A computational neuroanatomy for motor control. Experimental Brain Research, 185:359–381, 2008.
[109] J. Sharma, A. Angelucci, and M. Sur. Induction of visual orientation modules in auditory cortex. Nature, 404:841–849, 2000.
[110] S. Sherman and R. Guillery. Exploring the Thalamus. 2000.
[111] K. Shimonomura, T. Kushima, and T. Yagi. Binocular robot vision emulating disparity computation in the primary visual cortex. Neural Networks, 21:331–340, 2008.
[112] K. Shimonomura and T. Yagi. Neuromorphic vergence eye movement control of binocular robot vision. In Proc. IEEE Int. Robotics and Biomimetics (ROBIO) Conf., pages 1774–1779, 2010.
[113] B. Siciliano, L. Sciavicco, L. Villani, and G. Oriolo. Robotics: Modelling, Planning and Control. 2011.
[114] E. P. Simoncelli. Vision and the statistics of the visual environment. Current Opinion in Neurobiology, 13:144–149, 2003.
[115] O. Sporns. Networks of the Brain. 2010.
[116] P. S. G. Stein. Neuronal control of turtle hindlimb motor rhythms. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 191:213–229, 2005.
[117] W. Sun and B. E. Shi. Joint development of disparity tuning and vergence control. In Proc. IEEE Int. Development and Learning (ICDL) Conf., volume 2, pages 1–6, 2011.
[118] K. A. Thoroughman and R. Shadmehr. Learning of action through adaptive combination of motor primitives. Nature, 407:742–747, 2000.
[119] E. Todorov. Bayesian Brain, chapter Optimal Control Theory, pages 269–298. 2006.
[120] E. Todorov and M. I. Jordan. Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5(11):1226–1235, 2002.
[121] J. Tsai and J. D. Victor. Reading a population code: A multiscale neural model for representing binocular disparity. Vision Research, 43:445–466, 2003.
[122] E. K. C. Tsang, S. Y. M. Lam, Y. Meng, and B. E. Shi. Neuromorphic implementation of active gaze and vergence control. In Proceedings of the IEEE International Symposium on Circuits and Systems, pages 1076–1079, 2008.
[123] E. K. C. Tsang and B. E. Shi. Estimating disparity with confidence from energy neurons. Advances in Neural Information Processing Systems, 20, 2007.
[124] E. K. C. Tsang and B. E. Shi. Disparity estimation by pooling evidence from energy neurons. IEEE Transactions on Neural Networks, 20(11), 2009.
[125] K. Vassie and G. Morlino. Natural and Artificial Systems: Compare, Model or Engineer? 2012.
[126] D. Vernon, G. Metta, and G. Sandini. A survey of artificial cognitive systems: Implications for the autonomous development of mental capabilities in computational agents. IEEE Transactions on Evolutionary Computation, 11(2):151–180, 2007.
[127] J. F. V. Vincent, O. A. Bogatyreva, N. R. Bogatyrev, A. Bowyer, and A. K. Pahl. Biomimetics: its practice and theory. Journal of the Royal Society Interface, 3, 2006.
[128] Y. Wang and B. E. Shi. Autonomous development of vergence control driven by disparity energy neuron populations. Neural Computation, 22:730–751, 2010.
[129] Y. Wang and B. E. Shi. Improved binocular vergence control via a neural network that maximizes an internally defined reward. IEEE Transactions on Autonomous Mental Development, 3(3):247–256, 2011.
[130] B. Webb. Can robots make good models of biological behaviour? Behavioural and Brain Sciences, 24(6):1033–1050, 2001.
[131] J. Weng, J. McClelland, A. Pentland, O. Sporns, I. Stockman, M. Sur, and E. Thelen. Autonomous mental development by robots and animals. Science, 291:599–600, 2001.
[132] W. Wu, Y. Gao, E. Bienenstock, J. P. Donoghue, and M. J. Black. Bayesian population decoding of motor cortical activity using a Kalman filter. Neural Computation, 18(1):80–118, 2006.
[133] W. Wu, J. E. Kulkarni, N. G. Hatsopoulos, and L. Paninski. Neural decoding of hand motion using a linear state-space model with hidden states. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 17(4):370–378, 2009.
[134] J. Xing and R. A. Andersen. Models of the posterior parietal cortex which perform multimodal integration and represent space in several coordinate frames. Journal of Cognitive Neuroscience, 12(4):601–614, 2000.
[135] D. Zipser and R. A. Andersen. A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature, 331:679–684, 1988.