Politecnico di Milano
Dipartimento di Elettronica e Informazione
PhD program in Information Technology
Towards the integration of neural
mechanisms and cognition in
biologically inspired robots
Doctoral Dissertation of:
Flavio Mutti
Advisor:
Prof. Giuseppina Gini
Tutor:
Prof. Andrea Bonarini
Supervisor of the Doctoral Program:
Prof. Carlo Fiorini
2012 - XXV edition
Politecnico di Milano
Dipartimento di Elettronica e Informazione
Piazza Leonardo da Vinci 32
I 20133 — Milano
To Fernanda
Acknowledgements
Here we are. If I am writing these words, it means that I have reached the end of this work. Come to think of it, this is the third time I write acknowledgements for a thesis, but these are perhaps the most special ones, since I will hardly feel like writing any more. Getting to the end was not easy; I worked hard and pushed myself, but I must admit that many people contributed to the success of this endeavour.
First of all, I would like to thank Prof. Gini, who guided and taught me throughout the doctoral program. A heartfelt thank you. I also thank Prof. Zanero and Federico Maggi for guiding me in the development of one of my favourite research topics.
Second, but not least, I would like to thank my family, who in almost ten years of university never let me want for anything and always supported my choices. Luisella, Giorgio, Sante, Gino: thank you. Melania, a special thanks to you, who have been and still are close to me at this important milestone.
To Riccardo, with whom I shared and still share the passion for science and research (I expect a Nobel prize from both of us).
To Prof. Pfeifer, Cristiano, Hugo, Naveen, JP, and Mat, for the wonderful time spent in Zurich. I learned a lot working with you.
To Ale and Paolo, with whom I am sharing a new working adventure. Thank you for the support, the encouragement, and the office cactus you gave me.
To Nicola, my companion in adventures through the hostile lands of PhD students. Zurich, Dubrovnik, Vienna, to name a few.
To Alessio, for the meals, the laughs, and the Milanese evenings. By now you too know the joys and sorrows of a PhD.
To Camilla, who helped me write this thesis in an English that is not Anglo-Italian. Luckily, I am writing these acknowledgements in my mother tongue.
To Vittorio, il Prazzo, Ago, and Filippo, (ex-)inhabitants of the Milan shack.
To the guys in the first-floor laboratory: Michele, Giampaolo, Gerardo, Alessandro, Ettore. You shared lunches with me for three years; do not think you will get rid of me so easily.
To the friends in the DEIB administration, for the coffee breaks, the good company, and the bets on the Mutti-Vitucci team.
To my friends from Piacenza: I cannot list you all, but know that I mean precisely you.
To my teammates of ZeroNove, fuorza!
Abstract
How intelligence arises in humans is far from being completely unveiled. Understanding the brain mechanisms that make it possible is one of the most interesting and debated topics in neuroscience. However, recent advances suggest that this is only half of the story. Intelligent behaviour in humans could emerge from a good balance among several factors, namely the brain, the body, sensors, actuators, and the environment. Even though no conclusive evidence for this theory is available, it is very promising.
Beyond its great relevance for science, the natural application of these studies is the robotics field. In the last decades, several approaches have been proposed to design intelligent machines on the basis of scientific findings, especially from neuroscience. The underlying idea of these approaches is to transfer knowledge from neuroscience and biomechanics to the design of biologically inspired robots. Although designing a mechanical structure that mimics its biological counterpart is still a feasible task, the same is far less true for the design of the underlying neural mechanisms, both for controlling the body and for the emergence of cognition.
Even though biologically inspired robotics is a very active research field and several solutions have been proposed, from evolutionary approaches to developmental robotics, a solution that encodes on the same neural lattice both the low-level computational mechanisms and the emergence of cognition is still missing.
In this thesis a comprehensive study of this topic is presented. First, I design several low-level computational models composing the visual dorsal pathway, which is devoted to one of the most relevant functionalities provided by the brain: the reaching task. A comparison with the state of the art is presented. Second, a comparison among the previously developed models inspires the proposal of a common computational framework that should embody brain computational principles, such as population coding. Third, a biologically inspired cognitive architecture is proposed. The architecture develops a middle level of cognition, filling the gap between the low-level computational mechanisms and symbolic reasoning. This cognitive architecture is able to generate new goals and behaviours from previous ones, exploiting the synergy among the thalamus, the amygdala, and the cortex. Finally, a proposal for a roadmap towards a fully integrated biologically inspired architecture is presented. This architecture exploits the synergy between the low-level computational mechanisms and the proposed cognitive architecture.
Sommario
How intelligence emerges from the human brain is still an open problem for the scientific community. Understanding how the neural mechanisms of the brain allow cognitive aspects to arise is one of the most interesting and debated topics in neuroscience. Intelligent behaviours could emerge from the right balance among several factors: the brain, the body, the sensors, the actuation mechanisms, and the environment in which the subject is immersed.
Although there is no conclusive evidence that the interaction among the above components is a necessary and sufficient condition for the emergence of intelligence, the proposed theory is promising. Beyond the great scientific relevance of this topic, the natural application of these studies is robotics. In the last decades, many research projects have aimed at developing intelligent machines inspired by neuroscientific studies. The design can therefore concern both the mechanical part of the robot and the information-processing part. While the design of nature-inspired mechanical parts is still a reasonably tractable task, the problem becomes more complicated when the goal is to model the brain, both for the control of the robot and for the emergence of cognitive aspects.
Biologically inspired robotics is a very active research field, and several solutions have been proposed, from evolutionary robotics to developmental robotics. However, a solution that incorporates in the same neural model both the cognitive aspects and the low-level aspects of body control is still missing.
This thesis proposes a comprehensive study of several neural mechanisms, eliciting their strengths and weaknesses. First, I propose several neural architectures that model cortical areas belonging to the visual dorsal pathway, whose main task is to solve the problem of reaching objects perceived through the visual system. Second, I compare the above models in order to propose a common computational framework that incorporates the computational principles shared by all brain areas; an example of these principles is the so-called population coding. Third, I develop a biologically inspired cognitive architecture. The architecture develops an intermediate cognitive level, bridging the low-level computational mechanisms and symbolic reasoning. This architecture is able to learn new goals and behaviours based on the interaction of some brain areas: the thalamus, the cortex, and the amygdala. Finally, a roadmap is proposed for the development of a biologically inspired architecture that accounts for both the low-level computational aspects and the cognitive aspects.
Contents

1 Introduction
   1.1 Problem definition
   1.2 Biologically inspired solutions
   1.3 The aim of this work
   1.4 Thesis organization

2 Related works
   2.1 Overview
   2.2 Neuroscience
   2.3 Biomimetics
   2.4 Robotics
       2.4.1 Industrial Robots
       2.4.2 Autonomous Robots
   2.5 Closing remarks

3 Motivations
   3.1 How the mammal brain executes reaching
       3.1.1 Background
       3.1.2 The primary visual cortex
       3.1.3 The posterior parietal cortex
       3.1.4 The primary motor cortex
   3.2 Modelling a biological architecture
   3.3 Proposed neural models and their role
   3.4 The proposal of a roadmap for developing bioinspired architectures

4 A Primary Visual Cortex Model for Depth Perception
   4.1 Related works
   4.2 Neural Model
       4.2.1 Image preprocessing
       4.2.2 Disparity energy neurons
       4.2.3 Neural Architecture
       4.2.4 Disparity direction
   4.3 Experimental Results
   4.4 Conclusions

5 A Posterior Parietal Cortex Model to Solve the Coordinate Transformations Problem
   5.1 Related works
   5.2 Neural Architecture
       5.2.1 Sensory representation
       5.2.2 Posterior Parietal Cortex model
       5.2.3 Head-centered network layer
   5.3 Experimental Results
       5.3.1 Experiment with retinal position in degrees
       5.3.2 Experiment with a simplified camera model
   5.4 Conclusions

6 A Visuomotor Mapping Using an Active Stereo Head Controller Based on the Hering's Law
   6.1 Related works
   6.2 Neural Architecture
       6.2.1 Hering-based Control system
       6.2.2 Extending the Hering-based Control system
       6.2.3 Visuomotor mapping for a 3 DoF arm
   6.3 Experimental Results
       6.3.1 Hering-based results
       6.3.2 Extended Hering-based results
       6.3.3 Visuomotor mapping results
   6.4 Conclusions

7 A model of a middle level of cognition
   7.1 Introduction
   7.2 Biological model
   7.3 Implementation Model
       7.3.1 Intentional Distributed Robot Architecture
       7.3.2 Motor system: movement generations
   7.4 Experimental results
       7.4.1 Setup
       7.4.2 Goal generation
       7.4.3 Movement generation
   7.5 Conclusions

8 Conclusions

Bibliography
List of Figures

2.1  The sets show the research fields involved in the development of this work. The robotics field represents the classical approach to robotics; the neuroscience field is related to the neuroscientific advances in the comprehension of brain functionalities; biomimetics is the field related to mimicking biological solutions to solve specific problems.
2.2  The classical robot model with a closed loop of sensing and actuation. The information flow is processed in a serial way.
3.1  The anatomy of the dorsal pathway.
3.2  The anatomy of the primary visual cortex area (V1).
3.3  The anatomy of the posterior parietal cortex area (PPC).
3.4  The anatomy of the primary motor cortex area (M1).
3.5  This sketch represents the three-layer architecture that supports this work. The mechanics and sensors layer is the physical robot (or its model, in case of simulation). The electrical/actuators layer is the bridge between the low-level neural circuits and the robot; it is the control interface and it implements how the neural activity is translated into actuation. The neural lattice layer is the brain model and is composed of at least two sublayers: the neural circuits and the cognition. The neural circuits layer contains the biologically inspired models of the brain functional areas; their main functionalities are information processing, sensorimotor mapping, and motor representation. The cognitive layer contains other neural circuits that elaborate information at a higher level, taking into account motivations and goals. In my study, each layer can communicate either with the lower or with the higher one.
4.1  Proposed neural architecture.
4.2  Cones estimation.
4.3  Teddy estimation.
4.4  Venus estimation.
4.5  Tsukuba estimation.
4.6  Comparison between the proposed neural architecture for disparity estimation and some state-of-the-art algorithms [106]; the table is extracted from the online evaluation page of the Middlebury Database.
5.1  (Left pane) Body definition composed of an eye, a head, and an arm with the same origin. (Right pane) Neural network model. The first layer encodes the sensory information into a neural code, the second layer models the posterior parietal cortex and performs the multisensory fusion, and the third layer encodes the arm position with respect to the head frame of reference.
5.2  Experimental results with rx in degrees. (Top left) The responses of the trained network representing the arm position ax with respect to the head frame of reference for -20°, 0°, and 20°, respectively. (Top right) The error distribution (in degrees) of the estimated ax with respect to the arm position, the eye position, and the retinal position, respectively. The solid lines represent the mean error and the dashed lines the standard deviation limits. (Bottom left) A receptive field after the training phase of the PPC layer. (Bottom right) Contours at half the maximum response strength for the 64 PPC neurons.
5.3  Experimental results with rx in pixels. (Top left) The responses of the trained network representing the arm position ax with respect to the head frame of reference for -20°, 0°, and 20°, respectively. (Top right) The error distribution (in degrees) of the estimated ax with respect to the arm position, the eye position, and the retinal position, respectively. The solid lines represent the mean error and the dashed lines the standard deviation limits. (Bottom left) A receptive field after the training phase of the PPC layer. (Bottom right) Contours at half the maximum response strength for the 64 PPC neurons.
6.1  Frames of reference of the active stereo system with 3 DOF. The tilt movement is executed along the x-axis of the world frame, and it rotates the frames of both eyes by θT [rad]. Ideally, I define a virtual neck that performs the tilt movement.
6.2  System architecture. (Left pane) The schematic model of the working environment with the active stereo system and the initial arm position. The aim is to detect the target position in space through the stereo cameras, compute the head joint angles to foveate the target, and directly compute the final joint configuration of the arm to reach the target location. The sensorimotor map is learned using the end-effector itself as a target for the vision system. (Right pane) The schematic of the arm. It has 3 DOF with link lengths compatible with the human counterparts. The range of θ1 is [-π/2, π/2], the range of θ2 is [-π/2, π/2], and the range of θ3 is [0, (4/3)π].
6.3  Error maps computed for the left eye; I have observed very similar error values also for the right eye. Top row: testing sets with the error associated with each foveated 3D point. Bottom row: the error distribution in pixels for the testing set. The red line represents the mean of the error. As expected, the error distribution along the Z direction is lower than along the other directions.
6.4  Original system. The left pane shows the mean error associated with each 3D point in the testing set. The mean error is computed considering each plausible initial joint configuration of the head; for each configuration I compute the foveation error. The right pane shows the mean error distribution.
6.5  Error maps computed for the left eye of the extended system with the fifth neck configuration.
6.6  Extended system. The left pane shows the mean error to foveate each 3D point in the testing set. The mean error is computed considering each plausible initial joint configuration of the head. The right pane shows the mean error distribution.
6.7  The camera trajectories performed by the trained extended system. The blue cross represents the 3D feature in space at position [200 0 40]. For graphical reasons the image is scaled, but it clearly shows that the system first moves the neck, and only when the neck is in a steady position do the eyes perform the vergence movement.
6.8  (Left pane) Complete dataset. The points in space represent the end-effector positions used as targets for the active stereo head. In the dataset, each end-effector position is associated with the corresponding arm joint configuration and the foveating joint angles of the head, together with the Euclidean error between the foveation point and the end-effector position. This dataset is used for the cross-folding validation. (Right pane) The Euclidean error between the real end-effector position and the one estimated by the radial basis network; the error is quite low except for those 3D points that are very near to the head and to the shoulder.
6.9  Error directions projected on different planes of the world frame of reference. The blue dots represent the targets and the red lines are the distances in space between the target and the arm position computed by the network. For visualization reasons, I do not plot the estimated end-effector position. (Left pane) Error projection onto the X-Z plane. (Right pane) Error projection onto the Y-Z plane.
6.10 Distribution of the radial basis centers in the input space. The red circle represents a basis center, the blue dots are the testing values, and the cyan dots are the values used for the training phase.
7.1  The overall IDRA architecture. It is composed of a set of Intentional Modules (IM) and a Global Phylogenetic Module (PM). It receives in input a set of sensory information and produces in output the motor commands for the controlled robot.
7.2  A comparison between a (very) sketchy outline of the thalamo-cortical system as commonly described [84] and the suggested architecture. It is worth emphasizing the similarity between various key structures: A. Category Modules vs. cortical areas; B. Global phylogenetic module vs. amygdala; C. Ontogenetic modules vs. thalamus; D. Control signals (high and low bandwidth); E. High-bandwidth data bus vs. intracortical connections.
7.3  Intentional Module (IM).
7.4  Ontogenetic Module (OM).
7.5  State-action Table.
7.6  The two boards for the experiment.
7.7  The architecture used in the goal-generation experiment.
7.8  The gaze is concentrated on the star object.
7.9  The red line represents the global relevance signal, whereas the blue line represents the ontogenetic signal. After training, the cognitive architecture is able to produce a relevance signal also for the star-shaped objects.
7.10 The robot NAO in the movement experiment.
7.11 The architecture in the second experiment.
7.12 The most relevant hand positions.
List of Tables

3.1  A proposed classification of the experiments. The first three entries are experiments at the Neural circuits level (see Figure 3.5), where the neural architectures are developed focusing on the computational mechanisms found in the brain. The last entry is an experiment that proposes a cognitive architecture, focused on the interaction of the cortex with the other brain areas. It is worth noting that the computational network, once trained, does not need any further learning phase, whereas the cognitive model has a variable neural network that learns during the interaction with the environment.
4.1  The angle deviation from optimality.
6.1  Possible configurations.
1 Introduction
The problem of designing an intelligent robot is far from being solved. An interesting challenge, both for engineers and neuroscientists, is to develop a robot that acts like a human and thinks like a human. Even though this problem is not new and researchers have worked in this field since the twentieth century, in the last decades we have seen the rise of biologically inspired approaches. Biologically inspired solutions mimic what is known about the brain in order to give robots some thinking capabilities. Moreover, the declared aim is to develop a complete conceptual and computational framework, describing both how the brain might work and how cognition arises. In this thesis, I will review the latest findings in this wide research area, with the clear intention of unveiling some features of the brain computational framework.
1.1 Problem definition
Reaching a target in the environment with an arm is one of the most relevant capabilities of mammals. The reaching task involves several computations that transform the perception of the target into a complete movement of the arm to reach it. First, the target must be perceived with at least one external sensory system, such as vision, and filtered in order to localize its position in the sensory frame of reference (FoR); second, the information coming from the sensory system(s) must be integrated with the proprioceptive information of the body, such as the position of the arm with respect to the body; third, the target position must be computed with respect to the arm FoR, performing a coordinate transformation between the sensory FoR and the arm FoR; fourth, the arm movement trajectory must be computed and executed.
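Read as a processing pipeline, these four steps can be sketched in a few lines of code. The following Python fragment is only an illustrative sketch for a planar (2D) setting: the frame offsets, the pure-translation sensor model, and the two-link arm with lengths l1 and l2 are assumptions made here for illustration, not a model developed in this thesis.

```python
import numpy as np

def perceive_target(image_xy):
    """Step 1: localize the target in the sensory (eye) frame of reference."""
    return np.asarray(image_xy, dtype=float)

def eye_to_body(p_eye, eye_position_in_body):
    """Step 2: integrate proprioception (where the eye sits on the body)."""
    return p_eye + np.asarray(eye_position_in_body)

def body_to_arm(p_body, shoulder_position_in_body):
    """Step 3: coordinate transformation from the body FoR to the arm FoR."""
    return p_body - np.asarray(shoulder_position_in_body)

def plan_reach(p_arm, l1=0.3, l2=0.25):
    """Step 4: final joint configuration via two-link planar inverse kinematics."""
    x, y = p_arm
    c2 = np.clip((x**2 + y**2 - l1**2 - l2**2) / (2 * l1 * l2), -1.0, 1.0)
    q2 = np.arccos(c2)
    q1 = np.arctan2(y, x) - np.arctan2(l2 * np.sin(q2), l1 + l2 * np.cos(q2))
    return q1, q2  # a trajectory generator would then interpolate towards (q1, q2)

# Usage: a target seen 0.2 m to the right and 0.4 m ahead of the eye.
p_eye = perceive_target([0.2, 0.4])
p_body = eye_to_body(p_eye, eye_position_in_body=[0.0, 0.1])
p_arm = body_to_arm(p_body, shoulder_position_in_body=[0.15, 0.0])
q1, q2 = plan_reach(p_arm)
```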
Typically, an intelligent behaviour is identified by an observer in a subject that is able to react to unexpected events and to modify and manipulate the surrounding environment. This implies a strong correlation between the capability of a smart subject to perform actions and its high-level cognitive reasoning. Despite recent findings in neuroscience, the underlying mechanisms of the human brain are far from being completely understood. How the brain processes information to solve the reaching task is one of the key features that place mammals at the top of the evolutionary chain.
It is well known that the underlying neural circuitry is organized in several functional areas, each responsible for solving specific subtasks of the whole cortical information processing [59]. This implies that a high level of synchronization among the different areas is needed. Moreover, this functional organization follows the well-known divide et impera paradigm. These anatomical and functional characteristics are interesting from a neuroscientific point of view, since almost the whole brain is activated to solve a reaching task [108]. Another interesting feature of the brain is that the underlying computational mechanisms of the different brain functional areas are widespread in mammals [12][29][61][87]. Besides the scientific implication of this fact, from an engineering point of view the capability of these computational mechanisms to self-adapt to different bodies, exploiting their morphology, is interesting. For these reasons, I conclude that the most interesting characterizations of the basic motion tasks, as they emerge from neuroscience, are: first, the indication that the reaching task is the expression of a complex and intelligent behaviour; second, the high level of synchronization and organization of the different brain areas; third, the adaptability of the computational mechanisms to different body shapes.
On the other hand, the robotics community has addressed the reaching task since the early era of robotics [113]. Despite many efforts, a robust and generic solution to the reaching problem is still missing. However, it is commonly accepted that the capability to solve the reaching task is a very important characteristic of a robotic system, especially for humanoids. Interacting with the environment is the main goal of any robot, regardless of the specific task performed. Several approaches are available, such as optimal feedback control [108][120][119], visual servoing [113], and adaptive control [50][113]. However, these methods need to know some robot characteristics that may not be available, such as kinematics, dynamics, and controller parameters.
Of course, the robot interacts with the environment pursuing a task that must be accomplished. Typically, the knowledge is intrinsically coded by the designer, who programs the robot to perform specific trajectories under several constraints, such as execution time, velocities, and accelerations. Moreover, in the industrial context, robots perform repetitive tasks that need high precision and few autonomous decisions. On the other hand, autonomous robots must solve high-level problems without any explicit definition of them. They typically work in hostile, highly dynamic environments, and they must take decisions with only partial knowledge of both the surrounding environment and the robot state. In this case, taking smart decisions is crucial for achieving the goals. An autonomous robot, working under these constraints, needs the capability to think autonomously and to take actions, pursuing its own goals. For this reason, a cognitive architecture seems the obvious answer to those problems that can be solved by robots and that need high autonomy in the decision-making phase. Classical approaches to the development of cognitive architectures are based, among others, on symbolic processing, rules, and statistical learning (for a review see [126]).
Taking a further step, some scenarios could need a robot that is not only able to take decisions based on its past experience but is also able to develop new goals and behaviours. This goal-generation phase is grounded on some innate criteria that bootstrap the subsequent behaviours. These new goals should represent a higher level of abstraction with respect to basic goals, moving towards an artificial consciousness [23][24]. This processing is quite similar in humans. For example, let us suppose that a primary need of a human is eating. Generally, the human will act to reach this objective. However, his actions will differ depending on whether the surrounding environment is the jungle or a metropolis. In the first case, he will hunt animals or collect vegetables, whereas, in the second case, he will go to the market. But, definitely, in both cases he is pursuing his basic goal of eating. Both behaviours (or goals), going to the market and hunting, are higher-level abstractions of the need to eat. These concepts are particularly relevant if the aim is the design of a complex cognitive architecture that is also able to adapt its own behaviour through the interaction with the environment.
However, a cognitive agent cannot interact with the environment without a computational framework able to process the incoming sensory information, to automatically estimate the environmental state, and to interact with the surrounding objects. In the same way, an architecture able to reach objects in space, given their position perceived through the sensory system, is useless without a plan that can be generated by a further level of processing, the cognition. The synergy among a cognitive architecture, the way in which an agent perceives and interacts, and the working environment can drive towards a new generation of autonomous robots [92][93].
1.2 Biologically inspired solutions
A biologically inspired control strategy (also known as biomimetic or bioinspired) is a computational model inspired by the study of the animal brain. These computational models can solve different problems, from the estimation of the distance of an object perceived by the visual or auditory system to the computation of the arm trajectory for reaching it. In the last decades, these models have been applied in the robotics field, following the intuition that a robot that acts like a human could use the same strategies to take decisions. Generally, the implicit assumptions of biologically inspired solutions are that they are robust to noise, that they can adapt to different robot morphologies, and that they are the best solution simply because evolution has selected them.
There are several reasons to design a biologically inspired controller for a generic robot, from humanoids to wheel-driven ones. First, the design methodology is deeply different. Classical solutions are typically based on control system theory [113]. Classical strategies are widely used in factories, where robots are mainly used to improve productivity. Roughly speaking, classical design processes are composed of four steps: task definition, feasibility study, robot modelling and technology, and controller design. Using control theory principles, it is possible to design open- or closed-loop controllers that properly drive the robot in its task-specific activities. Once a robotic setup is dismissed and a new one is implemented, it may be necessary to design a completely different controller to solve the same task. The constraints of the classical solutions are explicitly taken into account by the designer, and the parameters of the controllers must be properly chosen. On the other hand, biologically inspired solutions are derived from the study of animal and human brain circuitry, meaning that the designer observes a solution that already exists in nature and that has been highly optimized by evolution [125].
Second, the classical control strategies are ad-hoc solutions that take into account the specific robotic setup; in fact, an ad-hoc solution cannot simply be applied to a different robotic setup, whereas biologically inspired solutions have internal parameters that must be estimated during a training phase and that are related to the computational strategies rather than to the robot morphology, although bioinspired solutions do exploit the robot shape.
Third, classical controllers can deal only with those tasks that are encoded in their programs, because they usually work in structured environments. Biologically inspired solutions are more oriented to autonomous robots and, as will become clear in the following chapters, cognition could emerge in the same neural layer that encodes the low-level computational mechanisms.
Fourth, in the last twenty years the classical solutions have shown their limits in autonomous robotics, whereas the biologically inspired strategies have developed with particular efficiency.
Fifth, other approaches to autonomous robotics, even biomimetic ones, usually deal with only a part of the big problem of designing an intelligent machine. Techniques such as Simultaneous Localization and Mapping (SLAM) [72] or visuomotor mapping [21][22] often solve specific tasks, and it is genuinely difficult to compose them into a single controller. Moreover, it is the author's opinion that other biomimetic approaches, such as evolutionary robotics [37][83], only partially cover the big picture of biologically inspired solutions. In fact, Webb pointed out that there are several ways to design biologically plausible models, and she derived seven paradigms to evaluate them [130]. A biologically inspired controller based on neuroscientific models should overcome this integration problem, merging computation and cognition on the same neural lattice.
Despite the above considerations, designing biologically inspired solutions is not a simple task, for several reasons. First, human and robot mechanics are very different, and the low-level control strategies will differ; in fact, the human joints are actuated through muscles, whereas the robot joints can be directly actuated through servomotors. Second, there is a technological limit in the computational capability of the hardware, and the computational load needed by a bioinspired method may prevent it from running in real time. Third, mimicking only specific brain areas that solve specific tasks could yield worse performance than classical methods, whereas a complete brain model could really outperform a complete classical strategy.
In the following, I will refer to biologically inspired (or bioinspired) solutions assuming that the bioinspiration is drawn from neuroscientific findings.
1.3 The aim of this work
This thesis pursues several objectives, at different levels of complexity. First, I want to investigate several biologically inspired models of different cortical areas that are functionally grouped in the visual dorsal pathway. The comparison between each single model and its own state of the art gives insight into the neuroscientific findings related to the computational mechanisms of the cortical areas.
Second, a qualitative comparison among the previously developed models permits the proposal of a common computational framework for those computations that require a minimal level of cognition and that our brain is able to perform automatically.
Third, a biologically inspired cognitive architecture is investigated. It is based on the interaction between the cortex and other areas of the brain (e.g. the thalamus and the amygdala) that typically are not well investigated.
Fourth, a proposal of a roadmap for the development of biologically inspired architectures is presented. These architectures must exploit the synergy between low-level computation and cognitive development. The analysis involves a comparison among the different types of learning of both the low-level computational models and the cognitive architecture, from which a roadmap for their integration is derived.
1.4 Thesis organization
Chapter 2 presents the related works, introducing the common background needed to understand how the scientific and engineering fields involved interact.
Chapter 3 introduces the relevant aspects of this thesis, focusing on the visual dorsal pathway, on how to model biologically inspired architectures, and on the motivations driving this work.
Chapter 4 introduces the model of the primary visual cortex for the disparity map computation. Quantitative measurements are provided and compared with state-of-the-art algorithms for dense disparity map estimation.
Chapter 5 introduces a model of the posterior parietal cortex able to compute the visuomotor mapping between the eye frame of reference and the arm frame of reference. The map is learned through unsupervised learning. Quantitative results are provided to evaluate the performance with respect to previous results.
Chapter 6 introduces a model of Hering's law of equal innervation to control an active stereo head and to compute the arm trajectory to reach the foveated target. Quantitative results are provided to verify the robustness of the system.
Chapter 7 introduces a cognitive architecture for goal generation based on the interaction among the thalamus, the amygdala, and the cortex. Qualitative results are provided to verify the generation of new goals.
Chapter 8 draws the conclusions of this thesis, considering the experimental results of the previous chapters.
2 Related works
The biologically inspired solutions for robotics involve at least three main research fields: robotics, biomimetics, and neuroscience. Since the amount of information about these fields is too large to be described in a single chapter, I will specifically focus on those topics that are strictly related to this thesis. The aim is to introduce the concepts that will be the common background for the following chapters.
Section 2.1 presents how and why these research fields overlap; Section 2.2 presents the works related to neuroscience, describing the mammal brain; Section 2.3 describes the classical approach to mimicking biological solutions; Section 2.4 presents the recent advances in robotics and the classical approaches.
2.1 Overview
Biologically inspired controllers, mimicking the brain functionalities in terms of both cognitive capabilities and neural computation, are a fairly new and promising topic. Although their performance is still questionable, recent advances show the potential of this new approach. However, this topic requires knowledge of at least three broad research fields, namely neuroscience, robotics, and biomimetics. Figure 2.1 shows a sketch diagram highlighting several research topics that need shared knowledge among these research areas. Robotics is the research field dealing with the study and design of robots that must substitute for humans in repetitive or dangerous tasks; biomimetics is the research field that studies how to find both novel solutions and technologies by observing natural evolution; neuroscience is the research field that studies how the brain might work, building models and inferring new theories.
On the other side, the intersection between robotics and biomimetics deals with, among other things, the design of engineering technologies that mimic natural solutions, such as the spider web for new buildings and materials. For instance, in robotics, the human skeleton and muscles are investigated for the design of new robot mechanics and actuators. Moreover, researchers with a background in both robotics and neuroscience are usually involved in Human-Computer Interface research.
Figure 2.1: The sets show the research fields involved in the development of this work. The robotics field represents the classical approach to robotics; the neuroscience field is related to the neuroscientific advances in the comprehension of brain functionalities; biomimetics is the field related to mimicking biological solutions to solve specific problems.
For example, they study solutions for driving devices with EEG signals; these devices could be wheelchairs, prostheses, or computers.
Furthermore, the overlapping region between neuroscience and biomimetics is quite fuzzy because, from a classical point of view, the description of brain functionalities in terms of computational models is part of the scientific approach; in the neuroscience domain, it is called computational neuroscience. However, the difference consists in the application of these computational models. In fact, if they are used for solving engineering problems, such as disparity estimation, they cannot be considered purely scientific models but sit in the common region between neuroscience and biomimetics (for a discussion, refer to [94][130]).
Finally, the synergy among these three broad research areas permits the development of novel solutions for the design of biologically inspired controllers that mimic the brain both for cognition and for the low-level computational mechanisms.
In the following paragraphs, I will present an overview of the recent advances in these fields, also taking into consideration the overlapping topics. Even though I deal with a huge amount of literature and the overlapping research areas are not always well defined, I will introduce a simple taxonomy of the most relevant works in these main research fields. The overlapping topics will be presented once, focusing on particular applications.
2.2 Neuroscience
Neuroscience deals with the study of both the central and the peripheral nervous system. This research field covers several topics, from the chemical explanation of the activation level of neurotransmitters to the observed behaviours driven by the neural activity. Among these topics, there are at least three research subareas that are relevant for my thesis, namely cognitive neuroscience, evolutionary neuroscience, and computational neuroscience. These fields provide valuable contributions to the development of robotic architectures based on neuroscientific principles. For a very complete introduction to the principles of neuroscience, please refer to [59].
Cognitive neuroscience focuses on the development of a theoretical framework that could fill the gap between the neural activity and complex behavioural traits of the brain such as memory, learning, high-level visual processing, emotion, and higher cognitive functions. The underlying feature, widespread among these brain functionalities, is information processing: how the brain encodes and propagates information [2]. Despite recent advances, the neural mechanisms driving the interaction with the environment to select a proper action are still under discussion [14][26]. According to the classical view, the brain workflow is composed of at least three phases: perception, cognition, and action. Following this classification, the cognitive functions are separated from the sensorimotor system, but recent works show that the cognitive functions are not localized in highly specialized brain areas; instead, they are managed by the same sensorimotor neural populations. For a complete discussion of recent advances in cognitive neuroscience, please refer to [26].
Evolutionary neuroscience deals with the understanding of the evolution of the human brain through time. In particular, the classical approach for studying the development of the human brain is to compare it with the brains of the other mammals. In fact, by evaluating the main genetic differences, it is possible to infer how natural evolution has influenced the emergence of humans. Considering the aim of my work, I will not focus on this neuroscientific branch, but it is worth noting that this topic perfectly matches the concepts driving the studies in evolutionary robotics; for a complete review, please refer to [99].
Computational neuroscience is the research field dealing with the understanding of how neural populations encode and process information [107]. Typically, these studies propose mathematical models explaining some relevant features of the biological counterparts.
According to the taxonomy of Sejnowski et al. [107], three classes of brain models exist: the realistic brain model, the simplifying brain model, and the technology for brain modelling. Realistic brain models encapsulate as many details as possible regarding the biological object under investigation. These models are characterized by a huge computational time on large parallel computers (or clusters of computers) for simulations. An example of this kind of model is the Hodgkin-Huxley neuron model [47]. The simplifying brain model overcomes the computational infeasibility of realistic models, providing higher-level characteristics without taking into account the physical dynamics of the chemical processes underlying neural communication. An example of this modelling strategy is the artificial neural network of perceptrons. Typically, the technology for brain modelling deals with the development of dedicated hardware mimicking the same computational parallelism of the biological brain. According to Webb's paradigms, realistic models have biological relevance and a fine level of detail, whereas neural networks have a very low degree of biological relevance but a higher level of abstraction [130].
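To make the contrast concrete, the sketch below integrates the classic Hodgkin-Huxley point-neuron equations, the prototypical realistic single-neuron model mentioned above, with a plain forward-Euler scheme. The parameter values are the standard squid-axon constants; the step size and the injected current are arbitrary choices made here for illustration.

```python
import numpy as np

# Standard squid-axon constants (mV, mS/cm^2, uF/cm^2).
C_m, g_Na, g_K, g_L = 1.0, 120.0, 36.0, 0.3
E_Na, E_K, E_L = 50.0, -77.0, -54.4

# Voltage-dependent opening/closing rates of the gating variables m, h, n.
a_m = lambda V: 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
b_m = lambda V: 4.0 * np.exp(-(V + 65.0) / 18.0)
a_h = lambda V: 0.07 * np.exp(-(V + 65.0) / 20.0)
b_h = lambda V: 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
a_n = lambda V: 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
b_n = lambda V: 0.125 * np.exp(-(V + 65.0) / 80.0)

def simulate(I_ext=10.0, T=50.0, dt=0.01):
    """Forward-Euler integration of the membrane and gating equations."""
    V, m, h, n = -65.0, 0.05, 0.6, 0.32          # approximate resting state
    trace = []
    for _ in range(int(T / dt)):
        I_Na = g_Na * m**3 * h * (V - E_Na)      # sodium current
        I_K = g_K * n**4 * (V - E_K)             # potassium current
        I_L = g_L * (V - E_L)                    # leak current
        V += dt * (I_ext - I_Na - I_K - I_L) / C_m
        m += dt * (a_m(V) * (1 - m) - b_m(V) * m)
        h += dt * (a_h(V) * (1 - h) - b_h(V) * h)
        n += dt * (a_n(V) * (1 - n) - b_n(V) * n)
        trace.append(V)
    return np.array(trace)                        # membrane potential over time

voltage = simulate()  # with a 10 uA/cm^2 step input the model fires a spike train
```

Even this single point neuron requires a sub-millisecond integration step; a simplified rate-based unit, by contrast, reduces the whole dynamics to a weighted sum passed through a nonlinearity.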
It is worth noting that these models can be described at the level of a single neuron or at the level of a brain area. The single-neuron models try to infer both the firing properties of the biological neuron and the information encoding (at the level of the single neuron); for a review, please refer to [55]. On the other hand, a brain area model tries to infer the organization of a population of neurons in order to produce a neural activity that has the same properties as the biological counterpart. These populations have computational mechanisms that cannot be inferred from single-neuron activities; examples of these approaches are models of the primary visual cortex [76][124], the posterior parietal cortex [82][135], and the amygdala-thalamo-cortex interaction [69][70][81].
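A much-simplified instance of such a population-level mechanism is population coding, in which a scalar quantity is represented by the graded activity of many broadly tuned units and read out from the population as a whole. The Gaussian tuning curves, the population size, and the centre-of-mass decoder in the sketch below are illustrative assumptions and are not taken from the cited models.

```python
import numpy as np

preferred = np.linspace(-60.0, 60.0, 32)   # preferred stimulus of each neuron (deg)
sigma = 15.0                               # tuning-curve width (deg)

def encode(stimulus, noise_std=0.05):
    """Population response: one Gaussian tuning curve per neuron, plus noise."""
    rates = np.exp(-0.5 * ((stimulus - preferred) / sigma) ** 2)
    return np.maximum(rates + noise_std * np.random.randn(preferred.size), 0.0)

def decode(rates):
    """Centre-of-mass readout of the whole population activity."""
    return np.sum(rates * preferred) / np.sum(rates)

activity = encode(12.0)
estimate = decode(activity)  # close to 12 despite the noisy single-neuron rates
```

The point of the example is that the estimate is a property of the population, not of any single unit, which is exactly why such mechanisms cannot be inferred from single-neuron activity alone.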
Among the others, an interesting application of neuroscience advances is the prosthesis research field. Even though the design of a prosthesis is well defined in the literature, there are several problems to deal with: designing the artificial limb (more related to robotics issues), controlling the prosthetic implants through neural signals, reducing the reaction of human tissues to the prosthesis material, eliminating noise, and making the prosthesis psychologically accepted. For a complete review, please refer to [63].
2.3 Biomimetics
Biomimetics focuses on the comparison between nature's problem-solving techniques and their application in engineering technologies. Although there is a large amount of literature regarding bioinspired solutions, a largely accepted taxonomy is still missing [130][94]. Recently, a taxonomy has proposed three main categories of bioinspiration: the comparative approach, the natural via artificial approach, and the natural pro artificial approach [125].
The comparative approach is based on the comparison of the performances of both systems in order to infer properties; the underlying idea is to perform the same experiments with both the natural and the artificial system to investigate similarities and dissimilarities. In the natural via artificial approach, models of biological systems are used to infer properties of the natural counterpart. In the natural pro artificial (pro) approach, engineers study natural systems to infer novel solutions to either novel or well-known problems. The distinction among these three approaches could be debated, but for the purposes of this work I will always refer to the natural pro artificial approach.
Following the above-mentioned taxonomy, the pro approach deals, among other things, with the development of natural computation mechanisms that are then integrated in engineering systems for problem-solving.
Figure 2.2: The classical robot model with a closed loop of sensing and
actuation. The information flow is processed in a serial way.
For example, in the computer networks field, a taxonomy of biologically inspired algorithms related to several issues in computer networks, such as routing, anomaly detection, and the spreading of computer viruses, has recently been proposed [71]. In robotics, several models inspired by the (non-)human brain have been proposed. They model several aspects of the brain functionalities, such as depth estimation [76], localization [72], sensorimotor mapping [21][79], neuroevolution [36], and motor skills [54].
On the other hand, the pro approach also deals with the study and development of new technologies starting from the observation of nature. For example, several sensors are developed following the study of animals such as reptiles and mammals [9].
2.4 Robotics
Robotics deals with the study of artificial agents able to substitute for humans in several tasks. A robotic system is composed of at least four subsystems, namely the mechanical, actuation, sensory, and control subsystems (see Figure 2.2). The mechanical subsystem represents the mechanical structure of the robot, such as wheels and legs for locomotion, and arms for manipulation. The actuation subsystem deals with the control laws that rule the actuation of the mechanical parts. The sensory subsystem processes the stimuli coming from both proprioceptive (e.g. encoders) and exteroceptive (e.g. cameras, audio, or touch) sensory systems. The control subsystem is responsible for taking decisions considering the task planning and the environmental constraints.
Among several taxonomies, robotic systems can be classified with respect to either their working environment or their mechanical structure. Generally speaking, robots classified by their mechanical structure can be divided into two classes: manipulators and mobile robots.
Manipulators are characterized by one or more arms whose links are connected by joints. Typically, manipulator tasks deal with the manipulation of objects in space, such as the pick-and-place task in classical industrial robotics. Mobile robots are characterized by the capability to move inside their working environment. The movement is possible thanks to wheels or legs, and the control laws will vary with the type of actuation.
Moreover, robotic systems can be classified into two classes with respect to the working environment in which they operate, namely industrial robots and autonomous robots.
Industrial robots work inside a structured environment, meaning that the environment contains landmarks and static features that are well recognizable by the robots. Moreover, the environment is typically static, meaning that the robot knows its structure a priori. On the other hand, autonomous robots deal with unstructured environments in hostile conditions. The environments are usually dynamic, and the robot must take its decisions with respect to the perceived environment state. Typical applications of autonomous robots are security, defence, exploration, and service robotics (e.g. autonomous cars).
For a complete review of the previous topics, please refer to [113].
2.4.1 Industrial Robots
Classical industrial robots can be designed and controlled following a well-known design pattern composed of three steps: modelling, planning, and control. Modelling a robot means defining a proper model of the mechanical structure, taking into account geometrical, differential, static, and dynamical constraints. Planning deals with the definition of the trajectories of the robot with respect to the specific task. In the case of manipulators, the trajectories can be known in advance, whereas in the case of mobile robots it could be necessary to take into account a dynamic environment. Knowing the trajectory generated by the planning phase and the sensory information over time, the control subsystem must generate the joint torques that guarantee the robot trajectory.
Modelling
The first step in the design process of a robot controller is to describe its mechanical structure in a formal way. The kinematic model relates the joint angles to the position and orientation of the end-effector. Solving the direct kinematic problem means computing the position and orientation of the end-effector knowing the joint angles, whereas solving the inverse kinematic problem means finding the joint angles knowing the end-effector position and orientation. The differential kinematics relates the joint velocities to the angular and linear velocities of the end-effector. The static analysis models the relationship between the forces and torques at the end-effector and the joint torques. The dynamic model of the mechanical structure plays an important role in the design of a robot controller, especially in those cases where dynamical aspects, such as inertia, are relevant. It describes the relationship between the joint torques and the motion of the structure, using the Lagrange formulation. Solving the direct dynamic problem means computing the accelerations, velocities, and positions of the joints knowing their torques, whereas solving the inverse dynamic problem means computing the torques starting from the accelerations, velocities, and positions of the joints.
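As a worked illustration of the direct and inverse kinematic problems just described, the following sketch computes both for a planar 2-link arm; the link lengths and the chosen elbow configuration are arbitrary assumptions, not values taken from any robot discussed in this thesis.

```python
import math

def direct_kinematics(theta1, theta2, l1=0.3, l2=0.25):
    """End-effector position (x, y) given the joint angles of a planar 2-link arm."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, l1=0.3, l2=0.25):
    """One of the two solutions of the inverse kinematic problem (positive elbow angle)."""
    c2 = (x ** 2 + y ** 2 - l1 ** 2 - l2 ** 2) / (2 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))                 # clamp numerical noise
    theta2 = math.acos(c2)
    k1 = l1 + l2 * math.cos(theta2)
    k2 = l2 * math.sin(theta2)
    theta1 = math.atan2(y, x) - math.atan2(k2, k1)
    return theta1, theta2

# round-trip check: the inverse solution recovers the joint angles
print(inverse_kinematics(*direct_kinematics(0.4, 0.8)))   # ~ (0.4, 0.8)
```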
Planning
The planning phase deals with the generation of the motion laws in either the joint space or the working space. Dealing with manipulators and mobile robots implies different planning strategies: the former need the final joint or end-effector configuration with respect to an initial configuration, whereas the latter need to know a final position in space. If the environment contains obstacles, it is necessary to formalize the concept of obstacle by defining those regions of space that cannot be entered by the robot, both for manipulators and for mobile robots. Trajectory generation in the presence of obstacles is referred to as the motion planning problem.
Control
Once the trajectory is known, the robot needs a controller that is able to translate the desired movement into motor commands. Moreover, the complex kinematics and dynamics of the robot may have to be taken into account, which makes the design of a proper controller a nontrivial task. The controller receives as input the trajectory given by the planning phase and generates the joint torques needed to maintain the desired trajectory.
The controllers are generally based on closed-loop feedback, in order to stabilize the trajectories. There are mainly four types of control: motion control, force control, visual servoing, and optimal feedback control. Motion control works on the joint positions or on the end-effector position. Force control models the interaction of the end-effector with the environment in order to avoid excessive forces at the end-effector. The visual servoing controller includes visual feedback, provided by a set of cameras, in the closed loop. Optimal feedback control has a prediction and an update phase that minimize the error between the desired trajectory and the actual trajectory.
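As a minimal, purely illustrative sketch of closed-loop motion control (not a controller actually used in this thesis), the snippet below applies a proportional-derivative law to a single joint with unit inertia; the gains and the time step are arbitrary assumptions.

```python
def pd_torque(q_des, q, q_dot, kp=50.0, kd=5.0):
    """Proportional-derivative feedback: torque from position and velocity errors."""
    return kp * (q_des - q) - kd * q_dot

# simulate one joint with unit inertia tracking a step reference of 1 rad
q, q_dot, dt = 0.0, 0.0, 0.001
for _ in range(3000):
    tau = pd_torque(1.0, q, q_dot)
    q_dot += tau * dt        # unit inertia: acceleration equals torque
    q += q_dot * dt
print(round(q, 3))           # settles close to 1.0
```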
Closing remarks
Industrial robots are particularly suitable for applications where the environment is highly structured and static; these properties imply that the robot explicitly knows the structure of its working environment and that the objects in the scene are static, except for the robot itself. These robots are able to reach extremely high precision in positioning the end-effector, but the classical techniques, derived from classical control system theory, are not well suited to situations where the environment is dynamic and unstructured, or where several robots must cooperate.
2.4.2 Autonomous Robots
Despite the advances in the industrial robotics field, the classical techniques fail when dealing with unstructured and dynamic environments. This lack of flexibility makes the classical approaches unsuitable for controlling robots with a certain degree of autonomy. Autonomous robots are robots able to take decisions with partial or no information about the surrounding environment, basing their choices on both previous experience and the objective. Typically, the task is defined at the high level but not at the low level of execution, due to the complexity of the environment.
The problem is to design (or estimate) a controller that is able to solve a task in an unstructured, hostile, dynamic environment without full a priori knowledge of the working environment. The development of new techniques became increasingly necessary, and both machine learning and human-driven approaches were proposed.
Evolutionary Robotics
Evolutionary robotics (ER) is a research field in which statistical techniques are used to evolve both robot controllers and mechanical structures, emulating natural evolution [37]. The ER process is composed of several cycles in which a population of controllers is evolved in order to find the controller best able to solve a specific task in a complex environment.
At the beginning, a population of different controllers is generated. Each controller is implemented on a robot (either real or simulated) and, given a task, its performance is evaluated using a fitness function. The controllers' performances are ranked and only a small subset of the best controllers is used for the next evolution step. Using a classical genetic algorithm (GA) (for a review see [43]), the mutation and crossover operators are applied to the best controllers. Given the new population, evolved from the previous one, the evaluation of the controllers is performed again, repeating the cycle. Typically, these cycles are repeated until the desired performance is reached.
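The cycle just described can be summarized in a few lines of code. The sketch below is a toy instance under stated assumptions: the controller is a flat parameter vector, the fitness function is a placeholder for running the (real or simulated) robot on the task, and mutation and crossover are textbook GA operators; none of these choices are taken from the works cited above.

```python
import random

def fitness(params):
    """Placeholder: in ER this would run the robot on the task and score its behaviour."""
    return -sum(p * p for p in params)        # toy objective: drive parameters to zero

def evolve(pop_size=20, n_params=8, generations=50, elite_frac=0.25, mutation_std=0.1):
    population = [[random.uniform(-1, 1) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:int(pop_size * elite_frac)]        # keep the best controllers
        children = []
        while len(children) < pop_size - len(elite):
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, n_params)             # one-point crossover
            child = [g + random.gauss(0, mutation_std)      # Gaussian mutation
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

best = evolve()
print(round(fitness(best), 4))   # best fitness improves towards 0 over the generations
```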
An advantage of this approach is that the designer does not need to know much about the working environment, which may be too complex to describe, and a dynamic model of both robot and environment may not be available. Moreover, the controllers can be of different types while the ER process remains the same, except that the GA must know how to apply its operators to the chosen representation. In particular, ER can evolve neural networks, the parameters of previously defined controllers, and other representations. Furthermore, the ER process can be applied to different kinds of robotic setups without conceptual changes in the process.
However, one of the main disadvantages is related to the definition of both the fitness function and the genetic operators. In fact, it is well known in the literature that these components are the key to the successful estimation of a robot controller.
For a complete discussion about the choice of the fitness function and
a complete list of surveys regarding the different specific topics of the
ER, please refer to [83].
Developmental Robotics
Developmental robotics (DR) is a recent approach to biologically inspired robotics, focusing both on the acquisition of new skills and on the role of morphological development in robot efficiency. A key idea is the concept of embodiment; it claims that the body mediates sensory inputs and actuation, making the body itself an important part of the emergence of cognition. Thus, DR embraces the idea that body, environment, and brain are coupled and that intelligent behaviours arise from the interaction among them, in such a way that complex behaviours build on previously developed simple ones. For a review of the theoretical, philosophical, and robotic experiments regarding developmental robotics, please refer to [66][93].
Developmental robotics is a general methodology for developing controllers for autonomous robots. Among the different theoretical frameworks, the most interesting are cognitive developmental robotics (CDR) [8] and autonomous mental development (AMD) [131]. Cognitive developmental robotics is focused on the autonomous development of knowledge through interaction with the environment, whereas autonomous mental development is more focused on the autonomous understanding of the tasks to be accomplished, with humans providing reinforcement information about the robot's interaction with the environment.
Learning from demonstration
Learning from demonstration (LfD) is a methodology that enables a robot to choose the proper action for the current state. The robot training is performed through interaction with a human teacher, following a supervised learning paradigm.
LfD is composed of two fundamental phases: first, collecting the dataset of movements shown by the teacher, and second, inferring the controller from the gathered data. The data acquisition can be performed in two ways: by demonstration or by imitation. In the former, the teacher directly demonstrates the state-action pairs on the real robot, whereas in the latter the demonstration is not performed directly on the robot but on a different platform. An example of demonstration is teleoperation, whereas an example of imitation is placing sensors on the teacher.
There are three main approaches to learning a control policy: approximating a mapping function, learning a dynamic model of the robot-environment interaction, and representing the policy as a sequence of actions chosen by a planner.
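As a toy sketch of the first strategy (approximating the state-to-action mapping), the snippet below builds a nearest-neighbour policy from teacher demonstrations; the states, actions, and distance metric are invented for illustration and do not come from [7].

```python
def nearest_neighbour_policy(demonstrations):
    """demonstrations: list of (state, action) pairs collected from the teacher."""
    def policy(state):
        def dist(pair):
            s, _ = pair
            return sum((a - b) ** 2 for a, b in zip(s, state))
        _, action = min(demonstrations, key=dist)   # copy the closest demonstrated action
        return action
    return policy

# toy demonstrations: 2D state -> discrete action
demos = [((0.0, 0.0), "stay"), ((1.0, 0.0), "right"), ((0.0, 1.0), "up")]
policy = nearest_neighbour_policy(demos)
print(policy((0.9, 0.1)))   # "right"
```

The same sketch also makes the stated disadvantage visible: the policy is only reliable for states close to those demonstrated by the teacher.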
An advantage of LfD is the capability to train the robot to execute a specific task without explicit knowledge of the dynamic system. A main disadvantage is that the robot can choose proper actions only for those states that are encountered during the training phase with the human operator.
For a further discussion about the details of the algorithms and a more
focused taxonomy about the different LfD strategies, please refer to [7].
2.5 Closing remarks
To reach the goals of this thesis, I need to keep in mind some of the concepts and methodologies developed in the three main research fields described above. Neuroscience provides mathematical models at different levels: first, the receptive fields of specific neuron populations and their parametrization; second, the connections and underlying principles of the network architectures that implement specific functionalities, such as the lateral connections in the posterior parietal cortex [102]; third, the intercortical connections, describing the emergence of cognitive functions. However, these models are generally parametric and a learning phase is needed.
On the other hand, robotics provides the mathematical framework to design and simulate kinematic chains, useful for testing the neurocomputational models. Finally, biomimetics drives, from a theoretical point of view, the study and the adaptation of the neurocomputational models to the robotic framework, suggesting assumptions and simplifications.
3 Motivations
3.1 How the mammal brain executes reaching
Neuroscience investigates how human motor control is distributed among different brain areas and the spinal cord. However, motor control is only part of the system needed to successfully reach a target; sensory information processing, one of the most relevant features of the human brain, is also involved. There is strong evidence that each specific kind of sensory information is processed by a small portion of the brain (called a functional area). These areas show different functionalities and are able to extract features, filter the incoming data, and reduce noise. On the other hand, other brain areas must be able to interpret the neural responses of the sensory areas to correctly compute the position of the target and the trajectory to reach it.
The main areas known to be involved in the reaching task are the Primary Visual Cortex, the Posterior Parietal Cortex, and the Motor Cortex. Ignoring the other sensory areas involved in the reaching task, and the fact that the posterior parietal cortex integrates different sensory modalities, I will deal only with the visual system, which is well known to be the major source of sensory information in humans.
3.1.1 Background
In the following paragraphs, I make several assumptions and simplifications due to the complexity of the topic and the objective difficulty of describing the whole interaction among brain areas that solves the reaching task. As already indicated, the brain can be subdivided into different areas, each with specific functionalities. This divide et impera anatomical separation of the brain areas leads to a hierarchical organization of the brain functionalities, from raw sensorimotor perception to high cognitive capabilities.
Given a sensory source, the incoming information is filtered along different brain areas, mixed with other sensory information, and used to make an action decision. In the rest of the text, I call a pathway the information flow through different areas aimed at achieving a specific objective, such as a reaching task through visual perception.
The dorsal pathway is commonly associated with reaching tasks: from the perception of the target to the execution of the arm motor commands. Although the dorsal pathway receives sensory information from different sources, such as auditory, somatosensory, and visual inputs, this dissertation covers only the visual information processing. Following the literature definitions, I call this group of brain areas the visual dorsal pathway.
Widespread computational mechanisms
Despite the huge number of functional areas in the brain that solve
different computational problems such as sensorimotor mapping, depth
perception, object classification, and motor control, there are some computational mechanisms that are shared among them. These mechanisms
arise from the common computational layer underlying each functional
area: the neural network. There are, at least, six widespread computational mechanisms that should be taken into account, namely population coding [29][39][61], gain modulation [5][102][135], normalization
[15], statistical coding [87], feedback connections, and neural plasticity.
Population coding is the mechanism used in the brain to represent sensory information. The responses of an ensemble of neurons encode sensory or motor variables in such a way that they can be further processed by subsequent cortical areas, e.g. the motor cortex. One of the main advantages of using a population of neurons to represent a single variable is its robustness to neural noise [29][61].
Gain modulation is an encoding strategy for populations of neurons in which the response amplitude of a single neuron varies without a change in the neuron's selectivity. This modulation, also known as a gain field, can arise from either multiplicative or nonlinear additive responses and is considered a key mechanism in coordinate transformations [5][12].
Normalization is a widespread mechanism in several sensory systems
where neural responses are divided by the summed activity of a population of neurons to decode a distributed neural representation [15].
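A small numerical example may clarify how these first three mechanisms fit together. In the sketch below (the tuning widths, ranges, and gain value are purely illustrative and not taken from any of the cited models), a scalar variable is encoded by a population of Gaussian-tuned neurons, every response is multiplicatively gain-modulated by a second signal, and the population is read out with divisive normalization; note that the decoded value is unaffected by the gain, which changes the response amplitudes but not the selectivity.

```python
import math

def population_responses(x, gain=1.0, n=16, sigma=10.0, lo=-40.0, hi=40.0):
    """Gaussian tuning curves with preferred values spread over [lo, hi];
    'gain' multiplicatively modulates every neuron (a gain field)."""
    preferred = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    rates = [gain * math.exp(-(x - p) ** 2 / (2 * sigma ** 2)) for p in preferred]
    return preferred, rates

def decode(preferred, rates):
    """Divisive normalization: responses weighted by preferred values, divided by the summed activity."""
    return sum(p * r for p, r in zip(preferred, rates)) / sum(rates)

preferred, rates = population_responses(x=12.0, gain=0.7)
print(round(decode(preferred, rates), 2))   # ~12.0, independent of the gain value
```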
Statistical coding is a kind of population coding, especially used for sensory data [87]. It seems to be widespread in the brain areas devoted to preprocessing the incoming sensory data [114] and it offers at least two advantages: it reduces the dimensionality of the input space [53] and it gives an interpretation of the topological organization and emergence of the neuron receptive fields [52]. Different approaches have been proposed that take into account the statistical properties of the sensory input, such as Independent Component Analysis (ICA) [51][53] and sparse coding [87].
Feedback connections are a mechanism implemented both within and between brain areas, involved in the neural implementation of optimal feedback control [120], in refining visuospatial estimation, and in filtering [59]. They play an important role in brain performance but are typically neglected in modelling, due to the intrinsic mathematical complexity of dealing with feedback.
Finally, a key feature that plays a very important role is neural plasticity, also known as learning. It works at different levels, from the single neuron to whole brain areas. Hebbian learning is the commonly accepted learning principle at the network level.
The visual dorsal pathway
Figure 3.1: The anatomy of the dorsal pathway
The visual dorsal pathway is known to play an important role in vision-for-action tasks (Figure 3.1). The raw sensory information, gathered from the environment, is filtered and processed to achieve a high-level representation of the surrounding environment. As already noted, the main areas involved in the visual dorsal pathway are:
1. the Primary Visual Cortex
2. the Posterior Parietal Cortex
3. the Motor Cortex
The primary visual cortex (V1 area) receives as input the binocular visual signal previously filtered by the retina and the Lateral Geniculate Nuclei (LGN), and produces as output several neural activities related to high-level information extracted from the binocular signals, such as depth, motion, segmentation, and target detection [27][59]. The V1 area is the first one in which visual signals coming from the two eyes are combined to compute binocular information, useful for depth estimation; in the following pages, I will focus mainly on depth estimation.
The posterior parietal cortex (PPC) receives as input the neural activities of the main sensory areas, such as the primary visual cortex, the auditory cortex, and the somatosensory cortex. Its main tasks are related to the visuospatial localization of the body with respect to the surrounding environment, language, attention, and sensory fusion [59]. The PPC projects to the premotor cortex to coordinate movements.
The motor cortex (M1 area) is related to movement generation, and each subarea of the motor cortex is related to the activity of a specific group of muscles. The motor cortex receives information from two main sources, the PPC and the somatosensory cortex (for muscle proprioception), and it projects directly onto the motoneurons. For a reaching task, the motor cortex computes the movement of the arm towards a target previously perceived by the visual system and localized by the posterior parietal cortex.
3.1.2 The primary visual cortex
Figure 3.2: The anatomy of the primary visual cortex area (V1)
Anatomy
The Primary Visual Cortex (V1 area) is located in the occipital lobe and is part of the visual cortex (see Figure 3.2). In the Brodmann classification, V1 is anatomically situated in area 17. It covers both hemispheres of the brain: the left hemisphere receives, through the Lateral Geniculate Nuclei, visual information from the right side of the visual field, and vice versa. The primary visual cortex can be subdivided into 6 different layers, numbered 1 through 6, each with its specific functionality.
Functionalities
Several studies demonstrate that the primary visual cortex deals with the spatiotemporal representation of the visual stimuli. It is the first brain area in the visual pathway dealing with binocular fusion. In the literature, the primary visual cortex is known to solve the problems of depth perception and motion detection.
Depth perception is the capability to estimate, from the retinal images, the distance of the objects in the scene with respect to the eyes' frame of reference. The depth of the environment is represented through the disparity map, which is the difference between the two retinal stimuli. Motion detection is the capability to estimate the direction and the velocity of a target moving in front of the visual field. For a further discussion about the fundamentals of these functionalities, please refer to [27].
Neurocomputational models
The primary visual cortex solves several tasks but, for the purpose of this work, only the model for depth perception is presented. The disparity map computation seems to exploit at least three widespread properties of the brain, namely population coding [86], normalization [15], and statistical coding [49][114].
Several computational models address the problem of depth estimation through the computation of the disparity map, using either energy models [19][76][124] or a template matching approach [121]. The energy models are based on the study of the primary visual cortex performed by Ohzawa et al. [86]. They found the existence of two types of neurons, namely simple cells and complex cells. The simple cells directly filter, with their receptive fields, the binocular stimuli coming from the retinas, whereas the complex cells gather the simple cell responses to effectively estimate the disparity. Both simple and complex cells have a preferred disparity, which means that their response is maximal when they filter visual stimuli containing the preferred disparity. Other works found that simple cell receptive fields can be fitted, with a certain degree of confidence, by Gabor filters with a proper parametrization [96].
The energy-based models differ in the complexity of the neural network and in the internal mechanisms used to make the estimation robust, and can be roughly divided into multi-scale [19][76] and single-scale [97][124] approaches. The multi-scale approach improves the robustness of the estimation with respect to other approaches by combining the estimates at different levels of granularity [76]: the simple cell receptive fields have different dimensions on the image planes so as to filter the visual stimuli at different scales. The single-scale approach uses other mechanisms, such as the combination of both phase and position shifts, to improve its performance [97].
3.1.3 The posterior parietal cortex
Figure 3.3: The anatomy of the posterior parietal cortex area (PPC)
Anatomy
The Posterior Parietal Cortex (PPC) is situated in the parietal lobe of the brain (see Figure 3.3). In particular, it lies posterior to the portion of the parietal lobe called the Primary Somatosensory Cortex. The PPC can be subdivided into the Inferior Parietal Lobule and the Superior Parietal Lobule, which are anatomically separated by the intraparietal sulcus.
Functionalities
Several studies show that the PPC is involved in integrating sensory information [134], in the manipulation of objects, in sensorimotor mapping [13], and in coordinate transformations between the reference frames of different body parts [10][82][135]. These functionalities can be found in several subregions of the PPC: the medial intraparietal area (MIP), the lateral intraparietal area (LIP), the ventral intraparietal area (VIP), and the anterior intraparietal area (AIP). For a further discussion about the fundamentals of these functionalities, please refer to [59].
Neurocomputational models
For the purpose of this work, among the several functionalities provided by the PPC, the coordinate transformation (CT) is the most interesting. The computation of CTs seems to exploit three widespread properties of the brain, namely population coding [61], gain modulation [5][102][135], and normalization [15].
Several computational models of the PPC address the problem of CTs using three-layer feed-forward neural networks (FNNs) [134], recurrent neural networks (RNNs) [102], or basis functions (BFs) [95]. The FNN and BF models are trained with supervised learning techniques, whereas the RNN model uses a mix of supervised and unsupervised approaches to train the neural connections. However, only the BF model exhibits the capability to encode transformations between multiple frames of reference (FoRs) in the output responses. This result comes from using a complete set of basis functions and an intermediate frame-of-reference encoding [134].
It is worth noting that gain modulation plays an important role in the computation of the coordinate transformations, but it is still unclear whether this property arises in the cortex as a result of the statistical representation of the incoming information. Previous models show that the multiplicative behaviour of gain modulation can arise using supervised learning on a feed-forward neural network [134][135], or by assuming gain modulation from the beginning and evaluating under which conditions coordinate transformations can be computed [95][102]. Recently, De Meyer showed evidence supporting the view that gain fields can arise through the self-organization of an underlying cortical model called Predictive Coding/Biased Competition (PC/BC) [28]. The PC/BC model is composed of an ensemble of neurons with feed-forward and feedback connections and is based on the minimization of the residual error between the internal representation of the sensor value and the incoming sensory information. It demonstrates that the gain modulation mechanism arises through the competition of the neurons inside the PC/BC model, and comments on the feasibility of such a system to compute CTs. Further experiments on the PC/BC model demonstrate its feasibility for solving coordinate transformations [82]. For a complete discussion, please refer to chapter 5.
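To make the basis-function idea concrete, the sketch below shows gain-field-like basis units combining retinal and eye position, with a linear readout trained to recover the head-centred position. It is only a toy 1D case under my own simplifying assumptions (the Gaussian/sigmoid parametrization, the number of basis units, and the least-squares readout), not the exact setup of [95] or [134].

```python
import numpy as np

rng = np.random.default_rng(0)

def basis_layer(retinal, eye, centres=np.linspace(-30, 30, 9),
                thresholds=np.linspace(-30, 30, 9), sigma=8.0):
    """Gain-field-like basis: Gaussian tuning to retinal position multiplied by a
    sigmoid of eye position, for every (centre, threshold) pair."""
    g = np.exp(-(retinal[:, None] - centres[None, :]) ** 2 / (2 * sigma ** 2))
    s = 1.0 / (1.0 + np.exp(-(eye[:, None] - thresholds[None, :]) / 8.0))
    return (g[:, :, None] * s[:, None, :]).reshape(len(retinal), -1)

# training data: head-centred position = retinal position + eye position (1D toy case)
retinal = rng.uniform(-20, 20, 500)
eye = rng.uniform(-20, 20, 500)
target = retinal + eye

B = basis_layer(retinal, eye)
w, *_ = np.linalg.lstsq(B, target, rcond=None)     # linear readout

# test on new samples: residual errors should be small
rt, et = rng.uniform(-20, 20, 5), rng.uniform(-20, 20, 5)
print(np.round(basis_layer(rt, et) @ w - (rt + et), 2))
```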
3.1.4 The primary motor cortex
Figure 3.4: The anatomy of the primary motor cortex area (M1)
Anatomy
The Primary Motor Cortex (M1 area) is located in the frontal lobe and is part of the motor cortex (see Figure 3.4). In the Brodmann classification, the M1 area is anatomically situated in area 4. In humans, the primary motor cortex lies along the central sulcus and the precentral gyrus. The primary motor cortex is bounded anteriorly by several subregions of the precentral gyrus that are part of the premotor cortex, and posteriorly by the primary somatosensory cortex.
Functionalities
The aim of the primary motor cortex is to control voluntary movements of the body [59]. Several experiments support this observation: electrical stimulation of the M1 area causes movements in the subjects [90], and there is a temporal correlation between the activity of primary motor cortex neurons and the intention of movement [38].
Moreover, recent studies show that the neurons' activity is modulated by different sensory modalities, such as vision and somatosensation. This kind of modulation implies a rich heterogeneity in the response properties of the primary motor cortex neurons [45]. For a further discussion about the fundamentals of these functionalities, please refer to [59].
Neurocomputational models
The most interesting functionality of the primary motor cortex is the ability to encode a movement in the neurons' activity. The encoding of movements seems to exploit at least three widespread computational mechanisms, namely population coding [29][39], feedback connections, and normalization [15].
Several computational models of the primary motor cortex address movement encoding using linear models [39], multiple regression models [88], and Kalman filters [132][133]. However, these models treat the firing rate as a continuous statistical quantity whereas spike trains are generally discrete; to take this fact into account, another class of models has been proposed [64].
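As a concrete (and deliberately simplified) illustration of the linear-model view, the sketch below decodes movement direction from a population of cosine-tuned cells with the classical population-vector scheme; the baseline and modulation values are arbitrary numbers chosen only for this example.

```python
import math

def cosine_tuning(theta, preferred, baseline=10.0, modulation=8.0):
    """Firing rate of an M1-like cell with cosine tuning to movement direction."""
    return baseline + modulation * math.cos(theta - preferred)

def population_vector(rates, preferred_dirs, baseline=10.0):
    """Sum of preferred-direction unit vectors weighted by rate deviations from baseline."""
    x = sum((r - baseline) * math.cos(p) for r, p in zip(rates, preferred_dirs))
    y = sum((r - baseline) * math.sin(p) for r, p in zip(rates, preferred_dirs))
    return math.atan2(y, x)

n = 32
preferred = [2 * math.pi * i / n for i in range(n)]
movement = 1.1                                        # true movement direction (radians)
rates = [cosine_tuning(movement, p) for p in preferred]
print(round(population_vector(rates, preferred), 3))  # ~1.1, the encoded direction
```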
On the other hand, the primary motor cortex can be viewed as part of a dynamical system that controls and generates movements; in this case the feedback connections play an important role in driving arm movements to the target [108][120]. Interestingly, Churchland et al. show that the population activity contains a strong oscillatory component even for non-periodic behaviours [25].
Alternatively, movement generation can also be viewed as the combination of motor primitives [105]. Following this view, the primary motor cortex interacts with the cerebellum to choose which motor primitives should be combined in order to generate the desired movement [118]. This topic will be fully discussed in chapter 7.
3.2 Modelling a biological architecture
Even though the proposal of a taxonomy for designing biologically inspired architectures is outside the scope of this work, I specifically introduce a generic 3-layer architecture, representing the interaction among several levels of computation, from the basic mechanics to the cognitive aspects (see Figure 3.5). The architecture is a conceptual sketch of how the different robot parts could interact. According to the principle of ecological balance, the architecture provides a flexible network that should adapt with respect to the robot morphology, the sensory system, the working environment, and the way of actuation [92].
Figure 3.5: This sketch represents the 3-layer architecture which supports this work. The architecture is composed of 3 layers. The mechanics and sensors layer is the physical robot (or its model in case of simulation). The electrical/actuators layer is the bridge between the low-level neural circuits and the robot; it is the control interface and it implements how the neural activity is translated into actuation. The neural lattice layer is the brain model and is composed of at least two sublayers: the neural circuits and the cognition. The neural circuits sublayer contains the biologically inspired models of the brain functional areas; their main functionalities are information processing, sensorimotor mapping, and motor representation. The cognitive sublayer contains other neural circuits that elaborate information at a higher level, taking into account motivations and goals. In my study, each layer can communicate either with the lower or the higher one.
The mechanics and sensors layer represents the physical body of the robot, including, but not limited to, sensors, motors, link technology, materials, and robot morphology. This layer is kept as general as possible because a strong assumption of biologically inspired controllers is the capability to implement common bioinspired computational principles and, at the same time, to exploit different robot morphologies, up to a learning period [91][92].
The Electrical/Actuators layer is a bridge between the neural circuitry of the layer above and the physical robot actuation of the layer below. Like the layer above, it contains neural circuitry, and it also implements the actuation circuitry. Its main role is to translate neural population activities into proper motor commands for the physical robot. To clearly outline the matter, let us suppose we have a neural population encoding, in degrees, the movement of a robotic arm with 1 degree of freedom (DOF), and that the population activity represents a relative movement in degrees with respect to the current arm position. The neural circuit implicitly has knowledge about its body shape (as in the embodiment paradigm [93]), but it may know nothing about the actuation of the physical arm, which can be driven by servomotors, DC motors, McKibben muscles, et cetera. The role of the Electrical/Actuators layer is exactly to fill the gap between the representation of the movement and its actuation. For our purposes, the motoneurons are an example of a neural mechanism belonging to this layer.
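A toy sketch of this gap-filling role is given below: the neural-lattice side provides a population activity encoding a relative movement in degrees, and the Electrical/Actuators layer converts the decoded command into a servo pulse width. The servo calibration numbers are hypothetical and serve only to illustrate the separation between representation and actuation.

```python
def decode_relative_movement(rates, preferred_degrees):
    """Population decoding of a relative joint movement (in degrees)."""
    total = sum(rates)
    return sum(r * p for r, p in zip(rates, preferred_degrees)) / total

def to_servo_pulse(current_angle, delta, pulse_us_per_degree=10.0, pulse_centre_us=1500.0):
    """Electrical/Actuators layer: map the new joint angle to a servo pulse width
    (hypothetical calibration for a hypothetical 0-180 degree servo)."""
    angle = max(0.0, min(180.0, current_angle + delta))
    return pulse_centre_us + (angle - 90.0) * pulse_us_per_degree

preferred = [-20, -10, 0, 10, 20]
rates = [0.0, 0.1, 0.2, 0.9, 0.3]            # population activity favouring roughly +10 degrees
delta = decode_relative_movement(rates, preferred)
print(round(delta, 1), to_servo_pulse(90.0, delta))
```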
The neural lattice is the layer that broadly models the brain. Among its functions, I focus on two broad ones: the low-level information processing, for both robot and environment state estimation, and the cognition. Both share the same neural lattice, although these functionalities reside in anatomically separated regions of the brain. I expect that the interaction between the cognitive circuits and the neural circuits (for low-level computations) exploits the morphology of the robot, even though some mechanisms could be independent of the morphology whereas ad-hoc mechanisms could exploit specific body shapes. The sketch diagram shows an overlapping region between the two sublayers, representing the fact that cognition influences the low-level computation, whereas the low-level computation takes part in cognitive decisions.
The neural circuits sublayer contains the neural populations essential to perform low-level computations, sensory processing and filtering, and information fusion. From a neuroscientific point of view, this sublayer contains those cortical neural populations associated with the functional areas of the brain, such as the primary visual cortex for depth estimation, the posterior parietal cortex for sensorimotor mapping, the motor cortex for movement representation, the cortical areas associated with long-term memory, and so on. Other non-cortical areas reside in this sublayer too, such as the thalamus, the amygdala, and the cerebellum. These circuits have knowledge about the physical body sensory inputs and the body shape in terms of degrees of freedom, but their role is to produce a distributed representation of both the environment and the body state, without having knowledge of the ultimate goal.
The cognition sublayer represents the cognitive aspect of the robot controller. It drives the behaviour with respect to both innate goals and goals developed through the robot's existence. This sublayer is, as usual, implemented by neural populations, but it is worth noting that it specifically focuses on the interaction among the neural populations of the neural circuits sublayer. In this sublayer, the cortical information coming from the neural circuits sublayer is integrated with ancient brain areas, such as the thalamus and the amygdala (encoding the developed and the innate goals, respectively), in order to take a decision. The decision is influenced by the current environment and robot state, and by the current goal. The neural responses drive the neural circuits sublayer towards behaviour that appears intelligent to an external observer. It is worth noting that the main characteristic of this sublayer is the focus on the inter-cortical connections, which make it possible to build an abstract representation for decision-making.
This architecture is a proposal to organize the experiments performed in this thesis, pointing out the relevant aspects of each proposed neural model. Moreover, it provides a common lattice to clearly identify each neurocomputational model for robotics with respect to its role and its output.
On the other hand, it is important to define an approach for modelling biologically inspired systems [94][130]. Roughly speaking, I have chosen a modelling process composed of three steps:
1. investigate neurocomputational findings and select a relevant model
of the region of interest;
2. extend the experimental results of the selected model in order to
speculate about undiscovered properties;
3. make hypotheses about new properties, extend the model, and
investigate them through simulations.
I have applied this modelling process to each experiment that I will describe in the following chapters. This process is fairly independent of the level of abstraction of the investigated model [130]. However, the kind of hypotheses that will be proposed in the third step depends on the level of detail of the model itself. This process is in accordance with the modelling process proposed by Webb [130]; the main difference is that I have used a previously proposed model as a source of information.
3.3 Proposed neural models and their role
This thesis is focused mainly on the top layer, namely the cognition and the neural circuits. Experiments will be presented in the following chapters using a bottom-up approach: first, I will present the experiments with neurocomputational models for robot control, and then I will introduce a cognitive architecture based on biological evidence of the interaction among the cortex, the amygdala, and the thalamus.
A synoptic description of the performed experiments is shown in Table 3.1; in the following chapters, I will focus on each experiment, specifically showing its main features and what it aims at.
A model and the experimentation of the primary visual cortex are proposed in chapter 4. The aim of this experiment is to develop a biologically inspired neural network that is able to compute the disparity map of the environment perceived through a stereo camera. This experiment makes some assumptions: the underlying neural network has a fixed architecture, where each layer has a computational meaning (such as the spatial pooling layer); the receptive fields of the binocular neurons, on the retinal image planes, are constant in time and represented as Gabor filters with a proper parametrization, which implies that the receptive fields are not learned; the number of neurons in the architecture is fixed; and the neuron model follows the disparity energy model proposed by Ohzawa [85].
Going on with the visual dorsal pathway, a model of the posterior parietal cortex is proposed in chapter 5. The aim of this model is to develop biologically inspired neural populations that are able to learn their visuomotor mapping through interaction with the environment. I make some assumptions also in this case: the underlying neural network has a fixed architecture and each layer has a population activity that can be decoded to obtain a human-readable format; the receptive fields are learned through a learning phase that uses an unsupervised learning technique (it will become clear that there are two distinct learning phases, both based on classical Hebbian learning); the number of neurons in the architecture is fixed; and there are two different neuron models, the PC/BC model [28] and the classical perceptron model.
Moreover, a model of Hering's law of equal innervation is proposed in chapter 6. The aim of this model is to develop a system able to compute the arm motor commands for reaching tasks, given a learned visuomotor mapping. The model estimates the motor commands of a 3 DOF arm able to reach a target in space knowing only its position in the two stereo image planes. The system is composed of two layers: the first layer is a dynamical system modelling Hering's law of equal innervation, which produces as output the head joint angle commands to foveate the target, whereas the second layer is a radial basis function (RBF) network that computes the arm joint commands to reach the target, knowing the head foveation angles. The experimental hypotheses are the following: the RBF network has a fixed architecture and the number of neurons is fixed; the network training phase is based on a classical supervised technique, namely gradient descent. Even though the Hering-based model partially covers some capabilities of the PPC model, there are several reasons to take it into account. First, the PPC model is a pure computational model whereas the Hering model provides an actuation; second, the PPC model provides a computational mechanism that, in principle, can be replicated in different robot morphologies, whereas the Hering model suggests a control strategy for stereoscopic robots.
Finally, a cognitive architecture (also known as the Intentional Architecture, IA, or IDRA, Intentional Distributed Robot Architecture), mimicking the interaction among the thalamus, the cortex, and the amygdala, is proposed in chapter 7. The cognitive architecture is based on recent findings in neuroscience and, to the best of my knowledge, it is the first proposal to model a middle level of cognition. The aim of this last model is to design a biologically inspired architecture that can develop new goals, starting from innate goals, during the interaction with the environment. The focus is on the interaction among different brain areas instead of trying to model every single detail of each area. This experimental model is based on some hypotheses, in order to focus on the emergence of complex behaviours: the neurons are classical perceptrons; the learning phase is a mixed approach because it uses both unsupervised learning and self-generating reinforcement learning; and the neural architecture is variable, with the possibility to recruit new neurons and add other layers. The goals drive the generation of motor commands that improve goal reaching.
A comparative overview of the performed experiments can be found in Table 3.1. The first three entries represent the experiments related to neurocomputational models at the Neural circuits sublayer, whereas the last entry contains the features of the cognitive experiment.

Model  | Layers (Fig. 3.5) | References       | Learning method | Type of neurons | Number of neurons | Number of layers | Output
V1     | Neural circuits   | [77][76]         | supervised      | energy model    | fixed             | fixed            | disparity map
PPC    | Neural circuits   | [78][75][79]     | unsupervised    | PC/BC [28]      | fixed             | fixed            | sensorimotor mapping
Hering | Neural circuits   | [68][69][80][81] | supervised      | RBF             | fixed             | fixed            | motor commands
IA     | Cognition         | [82]             | reinforcement   | mix             | variable          | variable         | goals generation

Table 3.1: A proposal of classification of the experiments. The first three entries are experiments at the Neural circuits level (see Figure 3.5), where the neural architectures are developed focusing on the computational mechanisms found in the brain. The last entry is an experiment that proposes a cognitive architecture, focused on the interaction of the cortex with the other brain areas. It is worth noting that the computational networks, once learned, do not need further learning phases, whereas the cognitive model has a variable neural network that learns during the interaction with the environment.

Interestingly, the neurocomputational models have more or less similar
characteristics except for the learning method. However, it is worth noting that, for these models, the learning phase is performed a priori, before evaluating the performances. On the other hand, the cognitive model has a learning phase that is performed in real time and models a continuous learning during the interaction with the environment. These facts are reflected in the plasticity of the architectures.
Furthermore, each neurocomputational model can be implemented using different types of neurons, even if all models share some of the neural computational principles previously described, such as the population coding mechanism. The cognitive model implements the population coding mechanism as well, also considering that the neural populations can grow during the interaction with the environment. In other words, the cognitive architecture is more focused on the interaction among different neural networks (encoding also ancient areas of the brain, not only the cortex), whereas the neurocomputational models are focused on intra-cortical mechanisms.
Despite the huge amount of literature regarding biologically inspired controllers and neurocomputational models, it is far from completely understood what a common methodology for designing biological architectures should be. In particular, there are at least three key features to take into account: the type of learning, the neural architecture, and the encoding strategy. The learning phase can occur at different stages of the design process and can be performed either in real time or offline. Moreover, there is no common way to design a neural network that can encode the underlying mechanisms of the cortex. Finally, it is quite clear in the literature how the information is propagated inside the same functional area of the brain, but it is difficult to integrate different cortical models because there is no common encoding strategy that allows them to share the same information.
In the following pages, I will introduce why these models are relevant
and why the performed experiments constitute a further step in the
investigation of the biologically inspired models.
3.4 The proposal of a roadmap for developing
bioinspired architectures
The aim of this work is to propose a roadmap for designing a biologically inspired architecture. The design phase should be flexible enough to adapt to different morphologies, using the same principles to exploit the properties of different body shapes; at the same time, this phase should provide a certain degree of adaptability to learn how to control the specific robot. This roadmap is a proposal for investigating potentially relevant models, for further integration into more complex biologically inspired architectures.
In Section 3.1 I have introduced the brain areas involved in reaching a target. As I have already discussed, these areas share the same representation mechanisms and are involved in the fusion and filtering of different sensory sources. Moreover, the visual dorsal pathway is able to compute the arm trajectory to reach the perceived target. Even though the visual dorsal pathway is modelled here only through merely computational phases, it is worth developing it for robotics. The primary objective of robots is the capability to substitute for humans in their tasks, and most of those tasks frequently require reaching a target (e.g. an object). So, reaching a target, or identifying it, is an essential task that a robot must accomplish. For this reason, this work specifically focuses on the reaching problem related both to neuroscience and to robotics (see Sections 3.2 and 3.3).
Although the reaching task is a key computational feature of a robot, it should also be driven by a motivation. The capability of computing the arm trajectory in order to reach a target is a merely computational mechanism that does not need a motivation, or a cognition. For this reason, this work takes into account those cognitive aspects that drive the pursuit of a goal. As discussed in Sections 3.2 and 3.3, recent advances provide the theoretical and scientific background to propose a cognitive architecture based on the interaction of different brain areas, able to develop goals through the interaction with the environment.
The experiments presented in this thesis should draw the roadmap for a completely new generation of biologically inspired architectures. The experiments can roughly be divided into two categories: those proposing a computational mechanism of the cortical areas (Chapters 4, 5, 6) and those proposing a biologically inspired cognitive architecture (Chapter 7). The first set of experiments, as depicted in Section 3.3, models different cortical areas and it is worth noting that they share the same computational principles, regardless of both the learning mechanism and the type of neurons. This set of experiments clearly shows that the computational mechanisms can be implemented either by following a previously developed neuroscientific study (as for V1 or Hering) or through unsupervised learning. This points out that the computational mechanisms could emerge through the interaction with the environment, exactly as in the second set of experiments. On the other hand, the second set of experiments points out that only the interaction among models of different brain areas can drive the pursuit of a goal and the generation of new ones.
A biologically inspired architecture has at least two advantages: it is not task-specific and, using the same underlying neural substrate, it is possible to obtain both computational mechanisms and cognitive functions. First, it departs from the classical approach to designing robotic applications, based on mathematical tools to specifically construct the desired trajectory in position, velocity, and acceleration. Those classical approaches need to know exactly the dynamic model of the robot, whereas a biologically inspired mechanism can adapt to different robot bodies, exploiting their morphology and without explicitly considering the dynamic model. The second advantage is related to the design process itself. In fact, in this roadmap, cognitive functions and computational mechanisms share the same underlying neural substrate, and the training of the neural substrate can also be obtained through the interaction between these two layers. Considering that this is a proposal of a roadmap for developing a biologically inspired architecture, encoding both cognitive functions and low-level computational mechanisms, I recognize that further efforts are needed to reach the whole goal.
4 A Primary Visual Cortex Model for Depth Perception
This chapter is adapted from [76][77].
The human primary visual cortex estimates the environment depth starting from a pair of stereo images. The visual signal is captured by the photoreceptors of the retina; after a brightness pre-processing, the signal is sent through the Lateral Geniculate Nucleus (LGN) to the primary visual cortex (V1). From the photoreceptors to the LGN the visual signal is strictly monocular, so there is no way to estimate the depth of the environment using stereo image cues (e.g. the retinal disparity). It is known that depth perception depends primarily on information about retinal disparity and not on other information cues coming from high-level decision areas (like the prefrontal cortex) [58]. So, there exists an "automatic process" to estimate disparity that does not imply reasoning. Therefore, if it were possible to simulate the mechanisms that underlie depth perception, I could design a system to estimate retinal disparity in a bio-inspired way.
From the point of view of humanoid robotics, which involves different disciplines from biology to engineering, the ultimate goal is to build a humanoid that can interact with humans "like" a human. One of the main issues in designing a controller for humanoid robots is dealing with the vastness of the information available in the surrounding environment; it is not feasible to simply copy a biological system "as is" (and this fact is true in general, even for biological systems much simpler than humans). Rather, the goal is to discover the principles that underlie biological control and try to transfer those to humanoid robotics [93].
Besides, the field of bionics seeks to design robots that mimic biological structures, and recent works show that a successful design relies on embodiment [93][127]. It follows that the design of the controller (the central nervous system) is inseparable from the morphology of the robot, because both affect the efficiency of the robot [93][67]. So, if I intend to design a complete controller for a humanoid anthropomorphic robot that must interact with humans in hostile environments, and in potentially infinite situations, it could be suitable to try to exploit the intrinsic structure of the brain for information processing.
The idea is to develop a vision system that can be easily integrated into a more complex architecture, which must also take into account the structure and the embodiment of the humanoid.
Section 4.1 introduces the recent literature on neural approaches to stereo systems based on neuroscientific evidence, section 4.2 presents the neural architecture modelling the primary visual cortex, section 4.3 introduces the experimental results, and section 4.4 draws the conclusions.
4.1 Related works
Generally, stereo vision is the research field that studies how to extract interesting features from a pair of images; one of the main features of interest is the perception of depth. Generally speaking, then, I am searching for algorithms or methods that allow depth perception to be performed with adequate reliability. In this section, I introduce the main algorithms and methods published in the recent literature.
The first model that explains the intrinsic function of neurons in the primary visual cortex is the disparity energy model [86][85]. This model proposes the existence of two different types of neurons, called respectively simple and complex cells, and explains how they communicate in order to respond maximally in the presence of the disparity to which they are tuned. It also shows that the receptive fields of binocular simple cells can be approximated by Gabor filters, with a proper parametrization (see equation 4.1). The Gabor filter is typically used in signal processing and has several interesting properties (for a detailed description see [65]). However, this model does not explain how it is possible to produce a reliable disparity map.
A first systematic study explaining some properties of the disparity energy model, starting from its mathematical definition, is developed in [96][97]. They introduce topics such as spatial pooling, scale pooling, and orientation pooling; each simulation presented there is performed with synthetic random dot stereogram (RDS) images [58]. The usage of synthetic images is reasonable for explaining properties, but in general it is not meaningful to evaluate the performance of the system with them, because synthetic images have a simplified spatial structure and luminance intensity with respect to natural scenes, and in my experience the results of experiments with RDS are misleading.
In [35], the firing rate of complex neurons is taken as evidence for the disparity estimation. So, given different complex neurons with different preferred disparities, the estimated disparity is equal to the preferred disparity of the most responsive neuron of the population.
A template-matching approach is proposed in [121]; it is based on a scalar measure of the mismatch between the neural responses and the templates of responses, given a specific disparity.
In [19], a model that successfully integrates spatial pooling, orientation pooling, and scale pooling is proposed; they present a coarse-to-fine mechanism with phase- and position-shift integration. Due to the neural architecture, the system is robust to image acquisition noise, but the disadvantage is that the system was designed to work with small disparities, while in real scenes the disparity range is typically quite wide.
Another approach to designing a bio-inspired vision system is discussed in [123], where the cooperation between phase- and position-shift mechanisms is interpreted in a way quite different with respect to [19]. Both methods are based on the disparity energy model, but in [123] the system is intrinsically mono-scale and mono-orientation and it provides a mechanism to evaluate a large range of disparities. Besides, the authors suggest using a normalized feature to assess whether the estimation is reliable and whether an image point belongs to occluded regions. Their results, compared to [19], seem to denote a better ability to estimate the disparity map; however, this architecture does not include orientation and scale pooling, which should improve the disparity estimation. In [124], the previous model is extended with orientation pooling in order to accumulate "evidence" supporting a disparity hypothesis. This model is mainly based on a Bayes filter and it uses a Bayes factor to test the hypothesis with the maximum support. Moreover, the proposed model identifies the occluded pixels. In [124], the results of the model tested on the Middlebury stereo images [106] are published.
Another approach to disparity map estimation, proposed in [20], is essentially a coarse-to-fine algorithm with orientation and spatial pooling. To obtain a robust disparity estimation, a weighted sum of the complex responses for each orientation is computed. They then define a vector disparity as the vector difference between corresponding points in the left and right images, which permits the evaluation of disparities that are not only horizontal. In fact, the model can estimate disparities that also have a vertical component (it is necessary to compute this component when the principal axes of the stereo cameras are not parallel).
My model combines the technique proposed in [124] for the computation of large disparities with the capability of the neural architecture of [20] to estimate the vertical component of the disparity. Moreover, in order to improve the robustness, I introduce a weighted coarse-to-fine mechanism in a way similar to [19].
4.2 Neural Model
The primary visual cortex (or V1 area) is the first area that integrates information afferent from both eyes to produce a three-dimensional representation of the environment based on the two-dimensional retinal images; this process is also called binocular fusion.
The perception of the environment depth is closely related to the estimation of retinal disparity: the retinal images are not strictly equal because of the physical distance between the two eyes. Computing the disparity between the two retinal images allows the environment depth to be estimated, relative to the fixation plane determined by the eyes' convergence. For my purposes, the problem of depth perception can be reduced to the computation of retinal disparity.
4.2.1 Image preprocessing
Generally the image sizes are not known a priori, so I need to develop a system able to deal with stereo pairs of different dimensions. Obviously, this approach requires taking care of some system parameters, as I will explain later. From now on I consider only aspects that are independent of the size of the images. The acquired images are, in general, color images, so it is necessary to convert them to luminance data in order to preserve only the intensity component. After this, the mean luminance is subtracted from each of the two images in order to improve the edge enhancement, as the human retina does [85].
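A minimal sketch of this preprocessing stage is given below; the RGB-to-luminance weights are the common Rec. 601 coefficients, which are an assumption of mine since the text does not specify a particular conversion.

```python
import numpy as np

def preprocess(rgb_image):
    """Convert an RGB image to luminance and subtract its mean luminance."""
    rgb = np.asarray(rgb_image, dtype=np.float64)
    luminance = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return luminance - luminance.mean()

left = np.random.randint(0, 256, (120, 160, 3))
print(round(float(preprocess(left).mean()), 6))   # ~0.0 after mean removal
```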
4.2.2 Disparity energy neurons
The disparity energy model explains the response properties of the binocular neurons in V1 [86]. This model uses two types of neurons, the simple and complex cells, that are tuned to specific disparities [85][96]. The
left and the right images coming from the preprocessing stage are then
filtered with Gabor filters of different orientation, scale, and shape, according to the disparity energy model and the coarse-to-fine technique
with both the phase and position shift mechanisms [19]. Let rs and rq be the responses of the simple and the complex neurons, respectively, and let g(x, y, θ, φ, ∆φ, ω) be the Gabor filter.
Then,
g(x, y, θ, φ, ∆φ, ω) = s(x, y, θ, φ, ∆φ, ω) w(x, y, θ)    (4.1)
where s(x, y, θ, φ, ∆φ, ω) is a cosinusoid and w(x, y, θ) is a 2D Gaussianshaped function (known as envelope). The cosinusoid is defined as
follows,
s(x, y, θ, φ, ∆φ, ω) = cos(ω[x cos θ + y sin θ] + φ + ∆φ)
(4.2)
where ω is the preferred spatial frequency, θ is the filter orientation,
φ is the phase parameter that will be used in the complex response
mechanism (to define a quadrature pair of simple responses) and ∆φ
is the phase difference between a pair of receptive fileds (RFs). The
envelope is defined as follows,
w(x, y, θ) = k exp( −[x cos θ + y sin θ]² / (2σx²) − [−x sin θ + y cos θ]² / (2σy²) )    (4.3)
where σx and σy define the envelope dimensions (and the RF extension) and k, involved in the filter gain, is defined as
k = 1 / (2π σx σy)    (4.4)
Therefore the receptive fields (RFs), based on biological evidence [86][85], can be modelled as,

gl(x, y) = g(x, y, θ, φ, ∆φ/2, ω)    (4.5)

gr(x, y) = g(x − d, y, θ, φ, −∆φ/2, ω)    (4.6)

where the l, r subscripts denote the left and the right RF, respectively; d is the position-shift parameter and ∆φ is the phase-shift parameter. Then the simple cell response is written as,
rs = { ∫∫ [gl(x, y) Il(x, y) + gr(x, y) Ir(x, y)] dx dy }²    (4.7)

where the integrals run over (−∞, ∞) in both x and y,
where Il and Ir are the input images coming from the preprocessing
stage. Finally, the complex response cell is defined as,
rq = rs,1 + rs,2    (4.8)
where rs,1 and rs,2 are simple responses in quadrature phase, i.e. φ1 = 0, φ2 = π/2 and ∆φ1 = ∆φ2. The preferred disparity of the complex cell response is given by,

Dpref = ∆φ / (ω sin θ) + d    (4.9)

which means that the complex cell will respond maximally when the RFs of the complex neuron contain the preferred disparity.

Figure 4.1: Proposed neural architecture
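To make the disparity energy stage concrete, the following sketch builds a quadrature pair of Gabor receptive fields with phase shift ±∆φ/2 and position shift d and evaluates the complex response on a pair of image patches. It is only an illustration of Equations 4.1–4.8 (the Matlab/CUDA implementation used in the thesis is not reproduced here), and all function names, the aspect ratio and the default parameters are my own choices.

import numpy as np

def gabor(shape, theta, phi, dphi, omega, sigma_x, sigma_y, shift=0.0):
    """2D Gabor receptive field centred in a patch of the given shape.

    Cosine carrier with orientation theta, frequency omega and phase
    phi + dphi, multiplied by a Gaussian envelope (Equations 4.1-4.3);
    `shift` implements the position shift d.
    """
    h, w = shape
    ys, xs = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    xs = xs - shift
    xr = xs * np.cos(theta) + ys * np.sin(theta)
    yr = -xs * np.sin(theta) + ys * np.cos(theta)
    k = 1.0 / (2.0 * np.pi * sigma_x * sigma_y)
    envelope = k * np.exp(-xr**2 / (2 * sigma_x**2) - yr**2 / (2 * sigma_y**2))
    carrier = np.cos(omega * xr + phi + dphi)
    return carrier * envelope

def complex_response(patch_l, patch_r, theta, dphi, d, omega=np.pi / 2,
                     sigma_x=2.0, sigma_y=4.0):
    """Disparity-energy complex cell tuned to Dpref = dphi/(omega*sin(theta)) + d.

    Two simple cells in quadrature (phi = 0 and phi = pi/2) are squared
    and summed, as in Equations 4.7-4.8.
    """
    rq = 0.0
    for phi in (0.0, np.pi / 2):
        gl = gabor(patch_l.shape, theta, phi, +dphi / 2, omega, sigma_x, sigma_y)
        gr = gabor(patch_r.shape, theta, phi, -dphi / 2, omega, sigma_x, sigma_y, shift=d)
        rs = (np.sum(gl * patch_l) + np.sum(gr * patch_r)) ** 2
        rq += rs
    return rq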
4.2.3 Neural Architecture
In this section I explain the neural architecture of my system (see Figure
4.1). Motivated by the previous section I integrate some interesting
features of other proposed models. For each pixel in the left image I
want to estimate the corresponding pixel in the right image to produce
the disparity map. The spatial frequency is ω = π/2 and σx = 2. The
position shift across the population is ∆C = {0, 1, ..., 55}. The aspect
ratio is 2.
Spatial pooling
To improve the response of the complex cell it is possible to take into account the physiological fact that the RF size of the biological complex cell is larger than that of the complex cell model [96]. Moreover, a given complex cell response is improved by the responses of the complex cells of neighbouring neurons. This observation is included in the model by averaging several pairs of complex cells with overlapping RFs. Spatial pooling can be mathematically defined as,
rc(x0, y0) = 1/(a + 1)²  Σ_{i = x0 − a/2}^{x0 + a/2}  Σ_{j = y0 − a/2}^{y0 + a/2}  rq(i, j) w(i, j)    (4.10)
where rq is the complex cell response at the stereo image location (i, j) (see Equation 4.8), w(i, j) is a spatial weighting function and rc is the spatially pooled response of the complex cell with RFs centered at (x0, y0) over the stereo images. In this system the chosen weighting function is a symmetric two-dimensional Gaussian with σpooling = 2σx.
Normalized response
In [124], they propose to evaluate the population responses at different locations in order to estimate the position-shift component and to refine the estimation with the chosen population via the phase-shift mechanism. To evaluate the position-shift component they suggest using a normalized feature R defined as,
R = (P − M) / M    (4.11)
where P and M are respectively the peak and the mean of the population response curve. It can be shown that the feature R takes values between 0 and 1, since M ≥ (P − M). In order to choose the most probable position shift, they select the population response curve (each one localized at a different horizontal location) that maximizes the feature R∆C, where ∆C denotes the current position disparity. Due to the good properties of this normalized feature, I use it to estimate the disparity.
Orientation pooling
According to [124], I implement a similar orientation pooling mechanism.
R̂ = Σ_θ wθ R∆C,θ    (4.12)
where R̂ formally depends on the position shift ∆C and the scale, R∆C,θ is the normalized response of Equation 4.11, and the weights wθ are estimated through an exhaustive search in the problem space using a set of Middlebury stereo images.
According to previous results [19], I use 5 orientations ranging from −60° to 60° in 30° steps.
Scale pooling
Differently from [124], I propose to introduce a scale pooling phase in order to make the estimated disparity more robust. However, unlike [19] and [20], I introduce the scale pooling as a weighted average across scales. Formally,
R = Σ_s ws R̂s    (4.13)
where R̂s is the orientation-pooled response at scale s and ws are the weights, again estimated through an exhaustive search. Based on empirical results, I choose to use only two scales, because the overall performance does not seem to increase beyond two scales. Now, taking the maximum R across the position shift locations, I estimate the most probable disparity.
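The following sketch shows how the normalized feature and the two pooling steps can be combined to select the most probable position shift for one pixel (Equations 4.11–4.13). The array layout, the function names and the assumption that the pooling weights are already available are mine; this is an illustration, not the thesis implementation.

import numpy as np

def normalized_feature(responses):
    """R = (P - M) / M for one population response curve (Equation 4.11)."""
    peak = responses.max()
    mean = responses.mean()
    return (peak - mean) / mean

def select_position_shift(population, w_theta, w_scale):
    """Pick the most probable position shift from pooled normalized features.

    population: array of shape (n_shifts, n_scales, n_orientations, n_phases)
        holding the complex-cell responses of the phase-shift population
        attached to each candidate position shift, scale and orientation.
    w_theta, w_scale: pooling weights (Equations 4.12 and 4.13), assumed
        to have been found beforehand by exhaustive search.
    """
    n_shifts, n_scales, n_orient, _ = population.shape
    R = np.zeros(n_shifts)
    for c in range(n_shifts):
        R_hat = np.zeros(n_scales)
        for s in range(n_scales):
            # orientation pooling of the normalized features (Equation 4.12)
            R_hat[s] = sum(w_theta[o] * normalized_feature(population[c, s, o])
                           for o in range(n_orient))
        # scale pooling (Equation 4.13)
        R[c] = np.dot(w_scale, R_hat)
    return int(np.argmax(R))  # index of the most probable position shift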
4.2.4 Disparity direction
For each pixel it is possible to determine the direction of the estimated disparity. I take the preferred direction of the filter (i.e. the direction normal to the principal axis of the filter), which is the direction of the two-dimensional preferred disparity associated to the corresponding
complex neuron.
In [20], the authors propose a weighted sum of the complex cell responses for each orientation (i.e. center of gravity). Here, I propose
a weighted average of the estimated disparity for each orientation with
optimized weights (estimated with an exhaustive search through the orientations). Formally,
V(x, y) = Σ_i wθi dθi    (4.14)
where wθi is the estimated weight that depends on the orientation θi (again, it is estimated through an exhaustive search) and dθi is the vector whose magnitude equals the estimated disparity at the given orientation θi. The resultant vector V(x, y) has the direction of the estimated disparity (in my case I expect always horizontal vectors because my test images have only horizontal disparities). My simulations show the effectiveness of the proposed formula, see Table 4.1; for each pixel in the images I compute the estimated disparity direction and then I extract the mean square error.
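A minimal sketch of the weighted vector average of Equation 4.14 is given below. It assumes that each orientation's preferred disparity direction lies along (cos θ, sin θ); the names and this convention are my own and should be adapted to the actual filter definition.

import numpy as np

def disparity_vector(disparities, thetas, w_theta):
    """Weighted vector average of the per-orientation disparities (Equation 4.14).

    disparities: estimated disparity magnitude for each orientation (pixels).
    thetas: orientations in radians, assumed measured along each filter's
        preferred-disparity direction.
    w_theta: weights found by exhaustive search (assumed given here).
    Returns the 2D vector V(x, y) whose direction is the estimated
    disparity direction for the pixel.
    """
    vx = sum(w * d * np.cos(t) for w, d, t in zip(w_theta, disparities, thetas))
    vy = sum(w * d * np.sin(t) for w, d, t in zip(w_theta, disparities, thetas))
    return np.array([vx, vy])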
Stereo images    Mean Square Error [rad²]
Venus            0.043
Cones            0.036
Teddy            0.094
Tsukuba          0.180

Table 4.1: The angle deviation from optimality
Figure 4.2: Cones estimation. (a) Cones ground truth disparity map; (b) Cones estimated disparity map.
4.3 Experimental Results
In this section I present the results obtained through simulation. The
bio-inspired model was first coded in Matlab to prove its correctness
and afterward, in order to minimize the computational time, some key
functions (e.g. 2D convolution) were implemented in CUDA.
The system computes a dense disparity map of 383×434 pixels (the size of some Middlebury images) in about 12 seconds. The performed simulations concern the disparity estimation and the estimated disparity direction. The estimated disparity maps are then submitted to the Middlebury evaluation system and the results are presented (see Figure 4.6).
The evaluated stereo images are Cones, Teddy, Venus and Tsukuba
(see Figures 4.2, 4.3, 4.4, 4.5).
It is worth noting that the proposed algorithm is biologically plausible
and this should be taken into account when comparing my approach to
the other algorithms. I have obtained an improvement of the performance with respect to [124].
Figure 4.3: Teddy estimation. (a) Teddy ground truth disparity map; (b) Teddy estimated disparity map.
Figure 4.4: Venus estimation. (a) Venus ground truth disparity map; (b) Venus estimated disparity map.
Figure 4.5: Tsukuba estimation. (a) Tsukuba ground truth disparity map; (b) Tsukuba estimated disparity map.
Figure 4.6: Comparison between the proposed neural architecture for disparity estimation and some state-of-the-art algorithms [106]; the table is extracted from the online evaluation page of the Middlebury database.
The simulations also show that the architecture, with the weighted sum of the directions of the oriented Gabor filters, is able to correctly identify the horizontal direction of the disparity (recall that the stereo images from the Middlebury database have only horizontal disparities). Further simulations should be performed to validate the model for non-horizontal disparities as well.
With respect to the performance reported in [20], I have obtained
comparable results, in terms of bad pixel errors.
4.4 Conclusions
In this chapter I have presented a bio-mimetic system that computes a
disparity map starting from a pair of stereo images. Previous works show
the possibility to develop a bio-inspired system and here I have proposed
a different bio-inspired mechanism in order to improve the performance
reported in [20][124].
In fact, experimental evidence shows that the system is more reliable for small disparities than for large disparities [19]. However, as previously shown, natural images have a wide range of possible disparities even in the same scene. One way to overcome this issue is to use coarse Gabor filters large enough to cover large disparities, but empirical evidence seems to indicate that the estimation is then not reliable. Besides, the computational cost is too high and further research is needed to adapt the simulator for a real-time implementation, for example by implementing the network on dedicated hardware to exploit parallelism. Another
approach, implemented in my system, is to use a smaller coarse Gabor
filter in order to cover small but reliable disparities [19] with an initial
position-shift [124] mechanism at the coarse scale.
Experimental evidence and the results presented in [124] seem to confirm the reliability of this type of approach. I have proposed a strategy
in order to integrate the pooling mechanisms, proposed in [19] and [20],
with the position-shift selection at coarse scale based on a bio-mimetic
feature analysis proposed in [124].
The obtained results point out that the model of the primary visual cortex is a possible candidate for a physical implementation in hardware. Other works show this possibility, but in general these implementations do not include the pooling mechanisms over orientations and scales [111][122], even though it has been pointed out that their contribution is fundamental in natural scenes.
The output of the system is a disparity map with associated disparity
directions (in the current implementation only horizontal directions);
with these decoded features it is possible to correctly estimate the depth
of the environment.
5 A Posterior Parietal Cortex Model to Solve the Coordinate Transformations Problem ¹

¹ Adapted from [82].
In humans, the problem of coordinate transformations is far from being completely understood. The problem is often addressed using a mix
of supervised and unsupervised learning techniques. In this chapter, I
propose a novel learning framework which requires only unsupervised
learning. I design a neural architecture that models the visual dorsal
pathway and learns coordinate transformations in a computer simulation comprising an eye, a head and an arm (each entailing one degree
of freedom). The learning is carried out in two stages. First, I train a
posterior parietal cortex (PPC) model to learn transformations between different frames of reference. Second, I train a head-centered neural layer
to compute the position of an arm with respect to the head. My results show the self-organization of the receptive fields (gain fields) in
the PPC model and the self-tuning of the response of the head-centered
population of neurons.
This chapter is organized as follows. In Section 5.1 I present the
related works, in Section 5.2 I design the neural network model that
performs the implicit sensorimotor mapping, in Section 5.3 I present the
performed experiments and in Section 5.4 I derive my conclusions.
5.1 Related works
A coordinate transformation (CT) is the capability to compute the position of a point in space with respect to a specific frame of reference
(FoR), given the position of the same point in another FoR. The way
the mammalian brain solves the problem of CTs has been widely studied. Nowadays it is fairly well established from lesion studies [108] that the main
area involved in this type of computation is the Posterior Parietal Cortex
[4][48].
The computation of CT seems to exploit two widespread properties of
the brain, namely, population coding [61], and gain modulation [5][102].
Population coding is a general mechanism used by the brain to represent
information, both to encode sensory stimuli and to drive the body actuators. The responses of an ensemble of neurons encode sensory or motor variables in such a way that they can be further processed by downstream cortical areas, e.g. the motor cortex. There are at least two main advantages
of using a population of neurons to encode information: robustness to
noise [61] and the capability to approximate nonlinear transformations
[95]. Gain modulation is an encoding strategy in which the amplitude of the response of a single neuron can be scaled without changing the response selectivity of the neuron. This modulation, also known as a gain field, can arise from either multiplicative or nonlinear additive responses
[5][12].
Several computational models of the PPC address the problem of
CTs using three-layer feed-forward neural networks (FNNs) [134], recurrent neural networks (RNNs) [102], or basis functions (BFs) [95].
The FNN and BF models are trained with supervised learning techniques, whereas the RNN model uses a mix of supervised and unsupervised approaches to train the neural connections, encoding multiple FoR transformations in the output responses.
It is worth noting that gain modulation plays an important role in
the computation of the coordinate transformations but it is still unclear
if this property emerges in the cortex from statistical properties of the
afferent (visual) information. Recently, De Meyer has shown evidence supporting that gain fields can arise through the self-organization of an underlying cortical model called Predictive Coding/Biased Competition (PC/BC) [28]. That work demonstrates that the gain modulation mechanism
arises through the competition of the neurons inside the PC/BC model,
and comments on the feasibility of such system to compute CTs.
These computational models of the PPC could be particularly suitable for the robotics community to solve the well-known problem of CT. In the recent past, an architecture was proposed that explicitly includes a PPC model composed of a set of radial basis functions trained with supervised learning techniques [21]. However, most of the approaches in robotics address the problem of FoR transformation within the more general sensorimotor mapping approach, without explicitly exploiting the features of PPC models [48].
Following these ideas, I present a biologically inspired model for CTs.
First I describe the training of a PPC model with an unsupervised learning approach; and second I introduce the computation of the arm position with respect to the head position. I hypothesise that gain modulation mechanisms can emerge in the PPC neurons, and that basis
functions, encoding parallel CTs, can emerge after the training phase.
Figure 5.1: (Left pane) Body definition composed of an eye, a head and an arm with the same origin. (Right pane) Neural network model. The first layer encodes the sensory information into a neural code, the second layer models the posterior parietal cortex and performs the multisensory fusion, and the third layer encodes the arm position with respect to the head frame of reference.
The main contributions of this work are: first to show an unsupervised
approach to the learning of sensorimotor mapping; second to exploit the
synergy between a biologically inspired neural network and the population coding paradigm; and third to introduce quantitative evaluation of
the sensorimotor mapping performance.
5.2 Neural Architecture
In this section I present the neural model used for computing CTs between an arm and the head FoR. I define a simple mechanical structure
composed by an eye, a head and an arm with the same origin. I assume
the same origin because the fixed translations among these FoRs can
be neglected due to their known contribution in the computation of the
CTs (Figure 5.1, left pane). The eye position is defined by the angle ex
with respect to the head FoR, the retinal stimuli position of the arm is
defined by the angle rx with respect to the eye FoR; the head-centered
position of the arm is defined by ax = rx + ex angle (see Figure 5.1,
left pane). The neural architecture is divided in three layers: the first
is composed by two populations of neurons which represent the information of the retinal position of the arm, rx , and the eye position with
respect to the head, ex . The second is composed by PPC population of
neurons that encode the position of the arm in different FoRs. The third
is a population of neurons that encodes the arm position with respect
to the head FoR.
51
5 A Posterior Parietal Cortex Model to Solve the Coordinate Transformations Problem
5.2.1 Sensory representation
The first layer of the network model receives as input the analog eye position with respect to the head FoR (ex) as well as the arm position with respect to the retinal FoR (rx). For my purposes, I defined the eye angle ex in degrees and the retinal position of the target rx both in degrees and in pixels, see Section 5.3. These numeric values are encoded in a population coding paradigm, where a given sensor value is represented by a population of neural responses: each neuron is centered at a particular value and fires more strongly as the sensor value gets closer to the neuron's preferred sensor value. The response of a population neuron is defined as:
ni = A exp( −(v − µi)² / (2σ²) )    (5.1)
where ni is the response of the population neuron i, µi is the neuron
preferred sensor value, v is the input analog angle (in degrees) and σ is
the standard deviation of the gaussian.
For example, suppose that ex is equal to a certain angle ex in degrees;
the representation of eye position with respect to the head FoR is given
by a set of population responses as follows:
pe = [n0, . . . , nM]ᵀ    (5.2)
where pe ∈ RM is the vector that contains the population responses, M
is the number of neurons in the population, and ni is the single neuron
response given by Equation 5.1. It is worth noting that ex ranges between a minimum and a maximum value and that the distribution of the neuron preferred sensor values µi can be arbitrary. I choose to distribute these values linearly in the sensor space because there should not be a preferred region in space where the sensor values are over-represented.
Similar considerations are also valid for the representation of rx , defining
the correspondent population responses vector pr with dimension L.
I define the overall sensory representation as:
x = [pr pe]ᵀ    (5.3)
where x ∈ R^(L+M) is composed of the responses of two populations representing both the eye position with respect to the head FoR and the retinal
position of the arm with respect to the eye FoR.
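For illustration, a minimal sketch of this Gaussian population encoding (Equations 5.1–5.3) is given below; the function names and the example values are mine, and the 61-neuron populations with σ = 6° simply echo the parameters used later in Section 5.3.1.

import numpy as np

def encode_population(value, preferred, sigma=6.0, amplitude=1.0):
    """Gaussian population code for a scalar sensor value (Equations 5.1-5.2).

    value: analog sensor reading (e.g. the eye angle ex in degrees).
    preferred: array of preferred values, linearly spaced over the range.
    Returns the vector of neuron responses n_i.
    """
    return amplitude * np.exp(-((value - preferred) ** 2) / (2.0 * sigma ** 2))

# Overall sensory representation x = [p_r, p_e] (Equation 5.3),
# assuming 61-neuron populations over [-30, 30] degrees.
mu_r = np.linspace(-30.0, 30.0, 61)   # preferred retinal positions [deg]
mu_e = np.linspace(-30.0, 30.0, 61)   # preferred eye positions [deg]
p_r = encode_population(12.0, mu_r)   # arm projected at 12 deg on the retina
p_e = encode_population(-5.0, mu_e)   # eye rotated by -5 deg
x = np.concatenate([p_r, p_e])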
5.2.2 Posterior Parietal Cortex model
The PPC layer is based on the Predictive Coding/Biased Competition model (PC/BC) proposed in [28]. The model is trained with an unsupervised approach that is based on Hebbian learning. The system equations are:

s = x ⊘ (ε2 + Ŵᵀy)
y = (ε1 + y) ⊗ W s    (5.4)

where s is the internal state of the PC/BC model, x = [n0, . . . , nL+M] is the neural population input vector defined by M retinal neurons and L neurons encoding the eye position, W is the weight matrix, Ŵ is the normalized W, y is the output vector of the PPC layer, ε1 and ε2 are constant parameters, and ⊘ and ⊗ indicate element-wise division and multiplication, respectively. These equations are evaluated iteratively for a
certain number of time steps; after a certain period of time, y and s
values reach a steady state. The internal state s is self-tuned and represents the similarity between the input vector x and the reconstruction
of the input Ŵ T y (s ≈ 1 indicates an almost perfect reconstruction).
The unsupervised training rule is given by:
W = W ⊗ {1 + β y(sᵀ − 1)}    (5.5)
where β is the learning rate. This training rule minimizes the difference
between the population responses x and the input reconstruction W T y;
the weights increase for s > 1 and decrease for s < 1.
Let us consider the output vector y = [y0, . . . , yT] as the population responses of the PPC model. Each neuron response yi should be compatible with the gain modulation paradigm, according to the experimental results of [28], in such a way that the response exhibits a multiplicative behaviour as a function of both the eye and the retinal positions. The weight matrix, which encodes the response properties, is internal to the PPC model, and its training phase is independent of the training phase that will involve the head-centered network layer.
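To make the update rules concrete, here is a minimal sketch of the PC/BC iteration and of the Hebbian weight update (Equations 5.4 and 5.5). The parameter values, the row-wise normalization of Ŵ and the helper names are my own assumptions, not taken from [28] or from the thesis implementation.

import numpy as np

def pcbc_response(x, W, eps1=1e-6, eps2=1e-3, n_steps=50):
    """Iterate the PC/BC equations (5.4) until the responses settle.

    x: input population vector; W: weight matrix (rows = PPC neurons).
    Returns the steady-state internal state s and the PPC output y.
    """
    W_hat = W / (W.max(axis=1, keepdims=True) + 1e-12)  # normalized W (assumption)
    y = np.zeros(W.shape[0])
    for _ in range(n_steps):
        s = x / (eps2 + W_hat.T @ y)      # element-wise division
        y = (eps1 + y) * (W @ s)          # element-wise multiplication
    return s, y

def hebbian_update(W, s, y, beta=0.01):
    """Unsupervised training rule of Equation 5.5: W <- W * (1 + beta * y (s - 1)^T)."""
    return W * (1.0 + beta * np.outer(y, s - 1.0))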
5.2.3 Head-centered network layer
The population of neurons associated to the head-centered frame of reference deals with the estimation of the arm position ax given the eye
angle ex and the projection of the arm in the retina rx . The synapses
between the PPC layer and the head-centered frame are trained with Hebbian learning, taking into account the arm position ax. Estimating ax means identifying the maximum response inside a population of neurons that encodes ax with the population coding paradigm. The head-centered population responses are given by h = K y, where y is the output vector of the PPC model, K is the weight matrix representing the fully-connected synapses between the PPC model and the head-centered layer, and h is a vector that contains the population responses encoding the estimated ax. The dimension of h depends on the granularity of the ax encoding.

Figure 5.2: Experimental results with rx in degrees. (Top left) Responses of the trained network representing the arm position ax with respect to the head frame of reference, for −20°, 0° and 20°, respectively. (Top right) Error distribution (in degrees) of the estimated ax with respect to the arm position, the eye position and the retinal position, respectively. The solid lines represent the mean error and the dashed lines the standard deviation limits. (Bottom left) A receptive field after the training phase of the PPC layer. (Bottom right) Contours at half the maximum response strength for the 64 PPC neurons.

The training phase is performed using Hebbian learning:

K = K + δ h paᵀ,   δ = 1/N    (5.6)

where pa is the vector that contains the proprioceptive population responses encoding ax, and δ is the learning rate depending on N, the number of samples.
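As a rough illustration of this stage, the sketch below pairs the Hebbian association with the population read-out of ax. The dimension convention is my own assumption: the proprioceptive code pa is treated as the post-synaptic teaching signal and the PPC output y as the pre-synaptic activity, so that K keeps the shape used in h = K y. It is not meant to reproduce the exact thesis implementation.

import numpy as np

def train_head_centered(K, ppc_outputs, proprio_codes):
    """Hebbian association between PPC outputs and proprioceptive codes.

    K: weight matrix of shape (n_head_centered, n_ppc), used as h = K @ y.
    ppc_outputs: iterable of PPC output vectors y (steady-state responses).
    proprio_codes: matching proprioceptive population codes p_a encoding ax.
    Assumption: p_a acts as the post-synaptic signal, y as the pre-synaptic one.
    """
    samples = list(zip(ppc_outputs, proprio_codes))
    delta = 1.0 / len(samples)                 # learning rate delta = 1/N
    for y, p_a in samples:
        K = K + delta * np.outer(p_a, y)       # Hebbian co-activation update
    return K

def decode_ax(K, y, preferred_ax):
    """Estimate ax as the preferred value of the maximally firing neuron."""
    h = K @ y                                  # head-centered population response
    return preferred_ax[int(np.argmax(h))]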
Figure 5.3: Experimental results with rx in pixels. (Top left) Responses of the trained network representing the arm position ax with respect to the head frame of reference, for −20°, 0° and 20°, respectively. (Top right) Error distribution (in degrees) of the estimated ax with respect to the arm position, the eye position and the retinal position, respectively. The solid lines represent the mean error and the dashed lines the standard deviation limits. (Bottom left) A receptive field after the training phase of the PPC layer. (Bottom right) Contours at half the maximum response strength for the 64 PPC neurons.
5.3 Experimental Results
In this section I present the results obtained in two experiments; in the first experiment I train and analyse the network where both the eye angle and the retinal position are encoded in degrees, and in the second experiment I introduce a simple camera model to encode the retinal information in pixels. The training phase is carried out in two steps:
(1) train the PPC layer and (2) train the head-centered layer. The
PPC layer is trained following the method described in Section 5.2.2
(Equation 5.5) and the synapses between the PPC and head-centered
layer are trained using Hebbian learning as described in Section 5.2.3
(Equation 5.6).
5.3.1 Experiment with retinal position in degrees
In the first experiment, I encode both rx and ex in degrees and, for the PPC layer, I use the same parameter values as in [28]. The output y is a 64-element vector, and the ranges of the sensor values are defined as follows: rx ∈ [−30°, 30°], ex ∈ [−30°, 30°], ax ∈ [−60°, 60°]. I encode the
sensory input with a population of 61 neurons with a gaussian response
and with a standard deviation σ = 6◦ . The σ value is chosen taking into
account the experiment described in [28] whereas the neuron preferred
values are equally distributed inside the range value.
After the training of the PPC layer, I train the head-centered layer
with a population of 121 neurons, defining h as a 121-elements vector.
With 121 neurons representing ax the coding resolution (1◦ ) can be
analytically derived. The standard deviation of the neuron responses
associated to the arm position ax is equal to 6◦ . The population of
neurons, encoding the proprioceptive position of the arm, has the same
number of neurons of the head-centered layer (121) and each neuron
has the same standard deviation (6◦ ). The proprioceptive responses
vector pa drives the Hebbian learning for the head-centered neural layer
(Equation 5.6).
Figure 5.2 shows the analysis of the trained network: the top left pane shows the responses of the trained network that represent the arm position ax with respect to the head frame of reference. The red solid line represents the response for ax = 20°, the green dash-dotted line represents the response for ax = 0° and the blue dashed line represents the response for ax = −20°. The top right pane shows the error distribution (in degrees) of the estimated ax with respect to the arm position, the eye position and the retinal position, respectively. The solid lines represent the mean error and the dashed lines represent the standard deviation limits. The error distributions are quite similar and, in general, the error is quite low, with a global mean error equal to 1.93° and a global standard deviation equal to 1.89°. The bottom left pane shows a receptive field after the training phase of the PPC layer, illustrating the global shape of the gain modulation. As expected, the curve shapes are compatible with the gain modulation paradigm, supporting the claim that an unsupervised method can effectively learn a multiplicative behaviour. The bottom right pane shows the contours at half the maximum response strength for the 64 PPC neurons: it is worth noting the different colors of the contours, which represent different levels of activation. A qualitative analysis points out that the population responses are stronger where the corresponding neuron receptive fields are slightly overlapped. Moreover, the PPC neuron receptive fields almost cover the whole subspace in the ex-rx plane,
indicating that there is at least a neuron firing for each combination of
ex and rx .
5.3.2 Experiment with a simplified camera model
In the second experiment I investigate a more realistic scenario where the retinal position is a pixel position in the image plane. I consider only the horizontal component of the image position of the arm. To compute the real ax value I exploit some geometrical constraints, given by the camera model. Specifically:

ax = ex + tan⁻¹(rx / f)  [°]    (5.7)
where rx is the retinal position of the arm in pixels and f is the focal length of the camera. For my purposes, I choose a focal length equal to 120 pixels, which represents a camera with a field of view of about 140°.
The PPC layer contains 64 neurons, but the input ranges are rx ∈ [−320, 320], ex ∈ [−25°, 25°], ax ∈ [−94°, 94°], where rx is defined in pixels; it follows that the image plane is assumed to have a horizontal size of 641 pixels. The range of ax follows from the maximum value that ax can reach. I use 101 and 51 neurons to represent rx and ex, respectively. The standard deviation σ of the Gaussians representing rx is equal to 60 pixels. Also in this case the standard deviation of the proprioceptive neurons encoding ax is equal to 6°.
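For reference, a minimal sketch of how the ground-truth ax used to drive the learning can be derived from the camera geometry of Equation 5.7 (function and variable names are illustrative):

import numpy as np

def head_centered_angle(ex_deg, rx_pixels, focal_length=120.0):
    """Ground-truth ax from eye angle and retinal pixel position (Equation 5.7)."""
    return ex_deg + np.degrees(np.arctan2(rx_pixels, focal_length))

# Example: eye at -5 deg, arm projected 160 pixels right of the image centre.
ax = head_centered_angle(-5.0, 160.0)   # about 48.1 deg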
Figure 5.3 shows the results from the analysis of the trained network.
The overall performance is lower than that obtained in the previous
experiments: the top right pane shows the error distribution with respect to the arm, eye and retinal position, respectively. In this set
of experiments, during the PPC learning, the system is able to learn
PPC receptive fields that are qualitatively compatible with the gain modulation principle (see Figure 5.3, bottom left pane). The bottom right pane shows the receptive field distribution in the rx-ex space, which exhibits the same qualitative features as in the previous experiment.
The estimation of ax has a global mean error equal to 3.36◦ with a global
standard deviation equal to 2.90◦ .
5.4 Conclusions
This chapter described an unsupervised approach to learn coordinate
transformations. The results show how the system is able to correctly
compute the position of a target with respect to the stable head frame of
reference knowing only the projection of the target onto the image plane
and the eye position with respect to the head. Further experiments are
foreseen to validate the model for more realistic scenarios, trying the
method on a real robotic system and extending the model for complex
physical architectures.
6 A Visuomotor Mapping Using an Active Stereo Head Controller Based on the Hering's Law ¹

¹ Adapted from [75][78][79].
Solving the reaching task problem means computing the final hand position in space once a sensory stimulus (usually vision) has indicated the position of a target. Before computing the trajectory of the arm, it is necessary to estimate the target position with respect to the
arm frame of reference (FoR). More specifically, the controller has to
be able to compute the chain of coordinate transformations from the
sensory input to the actuation, considering also that the dimensionality
of the input and output spaces often differs.
A way to define these transformations is to compute a sensorimotor
map that correlates the sensory input space with the actuators space.
According to neuroscience findings, in particular to the theory of neural
modelling, the system can learn the sensorimotor mapping over a radial basis framework [95]. In principle, it is possible to compute sensorimotor maps among several sensory and actuator systems, but for my purposes I always refer to the visual sensory system; the mapping
between the feature space of the visual stimuli and the joints space of
the actuator is called visuomotor mapping.
In my study an active stereo head, able to triangulate targets in space,
processes the incoming visual information. Given a target, the active
stereo head is able to foveate it in a specific joints configuration. If the
target is the arm’s end-effector the learning strategy correlates the arm
joints configuration with the head joints configuration to foveate the
target. The learning of visuomotor mapping is then obtained with an
active stereo head and an arm. Three degrees of freedom are enough in my study since the system only needs to reach a point, regardless of its orientation.
Typically, this mapping is learned after foveating several random arm movements: this technique is named motor babbling [101]. Motor babbling is a learning schema (or system identification) where the robot autonomously develops an internal model of its body in the environment by either randomly or systematically exploring different configurations.
In this work I present a bioinspired approach for reaching; I show how data from a redundant stereo camera structure, driven by a controller based on Hering's law of equal innervation, are used to build a visuomotor map in the radial basis framework. The main contributions
of this work are briefly summarized. The first result is to successfully
exploit the synergy between an active stereo vision system based on the
Hering’s law and a radial basis network that performs the visuomotor
mapping. The second contribution is to show how a redundant stereo
camera controller can be effectively used to train a sensorimotor map.
The third contribution is to investigate how robust the Hering-based head controller is in computing the foveation joint angles and interpreting
them as input features for the radial basis network (in the visuomotor
mapping).
Section 6.1 presents the related works, Section 6.2 introduces the proposed neural model, both for vergence and for visuomotor mapping, Section 6.3 presents the performed experiments, and in Section 6.4 I derive
my conclusions.
6.1 Related works
Even though the problem of sensorimotor representation is widely covered in the literature (for a review, see [48]), for the purposes of my work I
consider only those papers approaching the visuomotor mapping through
motor babbling [101]. First, I review recent works on active stereo systems and second I present some works approaching the sensorimotor
mapping problem.
Several approaches and methods to effectively employ active vision
techniques are surveyed in [18]. The authors describe problems arising
from many applications, e.g. object recognition, tracking, robotic manipulation, localization and mapping. Many techniques have been proposed to deal with the low-level control strategies that drive the active stereo head. Among the different surveyed approaches, it is worth noting that most of the bioinspired architectures are based on the disparity energy model [86], directly controlling vergence and version.
Wang et al. show the autonomous development of the vergence control, maximizing neural responses through reinforcement learning [128][129].
Gibaldi et al. show a model that directly extracts the disparity-vergence
response without an explicit calculation of the disparity [41]. Moreover,
the same authors implement the control strategy for the iCub head to
foveate steady or moving object along the depth direction considering
only some fixed configurations in the tilt direction [40]. Shimonomura et al. propose a hardware stereo head built with an FPGA and silicon retinas; the vergence system is able to foveate a point by processing the disparity computed with the energy model [112]. Tsang et al. [122] show a
gaze and vergence control system using the disparity energy model with
a vergence-version control with a virtual vergence component. Qu et al.
[98] propose a neural model based on the energy model introducing the
orientation and scale pooling; they show how the novel features improve
the learning curve. Sun et al. [117] demonstrate that the vergence command can be learned starting from a sparse coding paradigm. Other
recent approaches addressing the problem of the vergence are based on
more classical algorithms, either fuzzy [62] or SIFT-based [6]. Typically the experimental data are collected only along the depth direction; in my research, instead, I have addressed the problem of producing statistics over a wider volume along the three directions in space. Moreover, I have introduced the neck redundancy in order to improve the capability of the control system.
On the other hand, the visuomotor mapping is the correlation between
the visual representation of the target and the arm joints configuration
to reach it. Chinellato et al. show a bidirectional visuomotor mapping
of the environment built on a radial basis function framework that is
trained through exploratory actions (gazing and reaching) and implemented on a real humanoid [21]. Saegusa et al. propose a method to
infer the body schema based on stochastic motor babbling. The babbling is driven by the visuomotor experience [100]. In another work,
Saegusa et al. propose a new method to produce a motor behaviour
which improves the reliability of the state prediction [101]. Hemion et
al. report a competitive learning mechanism to infer the way the robot actuators can influence its sensory input without a preprocessing step of self-detection [46]. Gläser et al. claim the first implementation of a framework that includes a population coding layer for the representation of the schemata in a neural map and a basis function representation for the sensorimotor transformations; here, a schema refers to a cognitive
structure describing regularities within experiences that is similar to the
motor primitives reported for vertebrates [42]. Further references on the
radial basis networks used for sensorimotor mapping can be found in
[21].
6.2 Neural Architecture
6.2.1 Hering-based Control system
In this section I introduce the bio-inspired active stereo vision system, previously proposed in [103]. The fundamental equations are based on Hering's law of equal innervation, which states that eye movements are generated by combining vergence and version movements [60].
The system is a proportional model which needs to be trained to
learn the proportional parameters. The controller drives a 3 degrees of
freedom (DOF) structure with 2 DOF for the pan command for both
eyes and 1 DOF for the tilt, as in Figure 6.1. The fundamental equations
are:
θ̇version = K1 (xL + xR)    (6.1)
θ̇vergence = K2 δ    (6.2)
θ̇tilt = K3 (yL + yR)    (6.3)
where xL and xR are the feature x-position on the left or right image
plane and yL and yR are the feature y-position on the left or right image
plane. The disparity of the projected feature is represented by δ, and
[K1 , K2 , K3 ] are the parameters that must be estimated.
I can compute the pan and tilt angles as follows:
θ̇r = θ̇version − θ̇vergence    (6.4)
θ̇l = θ̇version + θ̇vergence    (6.5)
θ̇t = −θ̇tilt    (6.6)
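To make the control law concrete, here is a minimal sketch of one update step of this proportional controller (Equations 6.1–6.6). The gains, the time step, the definition of the disparity δ as xL − xR and the function names are illustrative placeholders of mine, not the values learned later in this chapter.

import numpy as np

def hering_step(angles, xL, yL, xR, yR, K, dt=0.01):
    """One proportional update of the 3 DOF Hering-based controller.

    angles: current joint angles [theta_l, theta_r, theta_t] in radians.
    (xL, yL), (xR, yR): feature position on the left/right image planes,
        expressed with respect to the image centres.
    K: gains [K1, K2, K3] (to be learned, Equation 6.9).
    """
    disparity = xL - xR                       # horizontal disparity delta (assumption)
    d_version = K[0] * (xL + xR)              # Equation 6.1
    d_vergence = K[1] * disparity             # Equation 6.2
    d_tilt = K[2] * (yL + yR)                 # Equation 6.3
    theta_l, theta_r, theta_t = angles
    theta_r += dt * (d_version - d_vergence)  # Equation 6.4
    theta_l += dt * (d_version + d_vergence)  # Equation 6.5
    theta_t += dt * (-d_tilt)                 # Equation 6.6
    return np.array([theta_l, theta_r, theta_t])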
Setup
To be as consistent as possible with reality I use the camera model with
the same calibration matrix for both eyes:

K = [ 200    0  320
        0  200  240
        0    0    1 ]    (6.7)
with focal length equal to 200 pixels and with an image plane of
640 × 480 pixels. This calibration matrix leads to a lens angle of about
100◦ .
It is worth noting that I use undistorted non-rectified matrices, taking into account that I deal with an active system and considering the
consistency of the camera model.
I define the origin of the neck-frame coincident with the origin of the
world frame of reference; the only movement of the neck is given by the
tilt activity. The camera positions are defined at 0.2 m of distance to
each other along the x-axis, and at 0.2 m along the y-axis of the world
frame of reference (see Figure 6.1). The unit of measure of the world frame of reference is the meter.
To evaluate the performance of the system I use the following error
measure:
eL/R = √( x²L/R + y²L/R )    (6.8)
that is the Euclidean distance computed in the image plane between
the final feature position in the image plane and the centre of the image
plane (in this case I have defined the origin of the frame of reference of
the image plane exactly at the centre of the image plane itself). The
subscripts L/R refer to the left and right camera, respectively. I choose to evaluate the error for the left and the right eye separately to understand whether the foveation error varies between the two eyes.
Learning phase
In this section I propose a method to learn the parameters Ki that guarantee a minimum error eL/R for any desired 3D point to be foveated, independently of the starting position of the stereo camera. The parameters can be learned by performing the following minimisation:
c(X, Y, Z) = e²L + e²R + Σj |θ̇l|j + Σj |θ̇r|j + Σj |θ̇t|j    (6.9)

K = argmin_{K1,K2,K3} Σx Σy Σz c(x, y, z)
The Euclidean distances in the objective function are needed to evaluate the performance of the system in foveating the desired point; the sum terms are necessary to minimize the lengths of the performed trajectories (and therefore to avoid oscillations around the desired final position).
The objective function is minimised numerically using the gradient
descent method; the points used as training set cover most of the view
field and can be described as follows:
x ∈ [−100, 100] m,  y ∈ [−100, 100] m,  z ∈ [1, 201] m

with a step of 50 m.

Figure 6.1: Frames of reference of the active stereo system with 3 DOF. The tilt movement is executed along the x-axis of the world frame, and it rotates the frames of both eyes by θT [rad]. Ideally, I define a virtual neck that performs the tilt movement.
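As an illustration of how the gains can be learned on this training grid, the sketch below performs a numerical gradient descent on the cost of Equation 6.9. The finite-difference gradient, the learning rate and the assumption that the cost is wrapped by a simulator function are my own; the thesis only states that the minimisation is carried out numerically with gradient descent.

import numpy as np

def train_gains(cost, K0, lr=1e-3, eps=1e-4, n_iters=200):
    """Numerical gradient descent on the foveation cost of Equation 6.9.

    cost: function K -> scalar, summing c(x, y, z) over the training grid
          (assumed to wrap the simulated foveation trials).
    K0: initial guess for the gains [K1, K2, K3].
    The gradient is estimated by central finite differences, since the
    cost is only available through simulation.
    """
    K = np.asarray(K0, dtype=float)
    for _ in range(n_iters):
        grad = np.zeros_like(K)
        for i in range(K.size):
            dK = np.zeros_like(K)
            dK[i] = eps
            grad[i] = (cost(K + dK) - cost(K - dK)) / (2.0 * eps)
        K -= lr * grad
    return K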
6.2.2 Extending the Hering-based Control system
The model presented so far takes into account only 3 DOF to foveate
a generic target in the 3D space. In this section, I extend the model
adding a further degree of freedom (i.e. the neck) to improve the performance of the head in the pan activity. Moreover I investigate whether,
from a biological point of view, it is possible to infer some similarities
between the obtained head trajectories and the stereotypical trajectories
performed by primates (possibly humans).
In order to add the additional neck joint, I have investigated different augmented versions of the control system presented in Section 6.2.1, and for each of them I have evaluated the performance.
First of all, I have introduced the neck component in accordance with Equations 6.1–6.3:
θ̇neck = K4 (xL + xR)    (6.10)
This implies that neck motions depend on the position of the feature in the image planes. Neck movements consist only of rotations about the Y axis and are independent of the tilting command.
Setup
Introducing a new degree of freedom for the neck to make the system redundant requires defining a chain of roto-translations from the neck to the world frame of reference. The position of a 3D feature (initially defined in the world frame of reference) in the camera frame of reference
can be computed as follows:
L/R
RW
=
L/R
N
RN (θL/R ) RH
(θN )
H
RW
(θT )
(6.11)
L/R
where RW is the roto-translation between the world frame of referL/R
ence and the camera frame of reference (left or right), RN (θL/R ) is the
roto-translation between the neck and the camera frame of reference,
N (θ ) is the roto-translation between the head and the neck (defined
RH
N
H (θ ) is the tilting
as the movement along the pan direction) and RW
T
command defined as a rotation of the head frame of reference with respect to the world frame. The camera model and the other parameters
are defined as in Section 6.2.1.
Neck configurations
In order to compute the angle movements for pan, tilt, and rotation,
equations 6.1-6.3, and 6.10 have to be combined appropriately. I call
configurations the different ways to obtain these angle movements.
These configurations are summarised in Table 6.1, and reflect the following ideas:
• The eye movements (pan) could be mediated by the neck component (Configurations 1-4)
• The neck movements (pan direction) could be mediated by vergence and version (Configurations 1, 2, 3, 6)
• The eye and the neck could be independent of each other (Configuration 5)
Configuration 1:  θ̇r = θ̇version − θ̇vergence + θ̇neck;  θ̇l = θ̇version + θ̇vergence + θ̇neck;  θ̇t = −θ̇tilt;  θ̇n = θ̇r
Configuration 2:  θ̇r = θ̇version − θ̇vergence + θ̇neck;  θ̇l = θ̇version + θ̇vergence + θ̇neck;  θ̇t = −θ̇tilt;  θ̇n = θ̇l
Configuration 3:  θ̇r = θ̇version − θ̇vergence + θ̇neck;  θ̇l = θ̇version + θ̇vergence + θ̇neck;  θ̇t = −θ̇tilt;  θ̇n = θ̇neck − θ̇version
Configuration 4:  θ̇r = θ̇version − θ̇vergence + θ̇neck;  θ̇l = θ̇version + θ̇vergence + θ̇neck;  θ̇t = −θ̇tilt;  θ̇n = θ̇neck
Configuration 5:  θ̇r = θ̇version − θ̇vergence;  θ̇l = θ̇version + θ̇vergence;  θ̇t = −θ̇tilt;  θ̇n = θ̇neck
Configuration 6:  θ̇r = θ̇version − θ̇vergence;  θ̇l = θ̇version + θ̇vergence;  θ̇t = −θ̇tilt;  θ̇n = θ̇neck − θ̇version

Table 6.1: Possible configurations
Learning phase
I have adapted the learning procedure that is used for the 3 DOF system
(see Equation 6.9) to the new 4 DOF system:
c(X, Y, Z) = e²L + e²R + Σj |θ̇l|j + Σj |θ̇r|j + Σj |θ̇t|j + Σj |θ̇n|j    (6.12)

K = argmin_{K1,K2,K3,K4} Σx Σy Σz c(x, y, z)
The minimisation is performed with the same algorithm and on the
same training set as in Section 6.2.1.
6.2.3 Visuomotor mapping for a 3 DoF arm
The simulated robotic system is composed of the previously described head and an arm. The head is a 4 DOF structure with 2 DOF for the pan command of the two eyes, 1 DOF for tilting, and 1 DOF for the neck component; the arm is composed of 2 DOF for the shoulder and 1 DOF for the elbow, as in Figure 6.2. I define the properties of the head and the arm to be as compatible as possible with human characteristics (see Figure 6.2 right pane; for the head, see Section 6.2.2).
[Figure 6.2, left pane: block diagram — target retinal position → stereo vision control system → head joint angles → arm controller (RBF network) → arm joint angles.]
Figure 6.2: System architecture. (left pane) The schematic model of the
working environment with the active stereo system and the
arm initial position. The aim is to detect the target position
in space through stereo cameras, compute head joint angles
to foveate the target and directly compute the final joint
configuration of the arm to reach the target location. The
sensorimotor map is learned using the end-effector itself as a
target for the vision system. (right pane) The schematic of
the arm. It has 3 DOF with link lengths compatible with the human counterparts. The range of θ1 is [−π/2, π/2], the θ2 range is [−π/2, π/2] and the θ3 range is [0, (3/4)π].
I evaluate the system for a reaching task: given a target feature in
space the system must be able to perceive it with the stereo cameras,
compute the joint angles of the head to foveate that 3D point and,
without physically foveating it, use these head angles to map the head
joints space into the arm joints space for reaching the target position
with the end-effector.
The control architecture for reaching can be functionally subdivided in
two main modules: the first module is the vergence control system that
controls the stereo cameras system, and the second module is the RBF
network that computes the final arm joint configuration for reaching,
only knowing the joint configuration of the stereo system to foveate the
target. Conceptually, the training is equivalent to the motor babbling
schema; given a set of random movements of the arm it is possible
to correlate the arm joints space and the foveating joint angles of the
head. After an initial training phase, where the system explores the
environment and gathers data, the radial basis network is trained and
the performance is verified on a test set of 3D points (knowing the ideal
arm joints configurations to reach them).
Given a target in 3D space, the motor system should be able to compute the final joint configuration of the arm to reach the target. This
computational task is performed by the sensorimotor mapping. The
subsystem receives as input only the joint angles of the head needed to foveate the target and it is able to compute the final configuration of the arm without physically moving the head. The basis functions can be combined linearly to approximate any nonlinear function, such as the mapping of the peripersonal space (or the working space of the robot). According to
[95], basis functions are suitable to model the computational framework
of the posterior parietal cortex where the sensorimotor transformations
are performed. The radial basis network is composed by a nonlinear
hidden layer and a linear output layer. Each nonlinear hidden neuron is
a radial basis function. I use a gaussian radial function:
(x − c)2
h(x) = exp −
r2
(6.13)
where h(·) is the basis function, x is the head joints vector, c is the basis
center and r is the spread. The output neuron is defined as:
f(x) = Σ_{j=1}^{m} wj hj(x)    (6.14)
where m is the number of hidden neurons, wj is the weight and f (x) is
the estimated arm joints vector.
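For illustration, a minimal sketch of such a radial basis mapping from head joint angles to arm joint angles is given below. The centre placement, the least-squares fit of the output weights and all names are my own choices, used only to make the structure of Equations 6.13–6.14 concrete; they are not the training scheme of the thesis implementation.

import numpy as np

class RBFVisuomotorMap:
    """Maps foveation head-joint vectors to arm-joint vectors (Eq. 6.13-6.14)."""

    def __init__(self, centers, spread):
        self.centers = np.asarray(centers)   # basis centres c_j (head-joint space)
        self.spread = float(spread)          # spread r of the Gaussian bases
        self.W = None                        # linear output weights w_j

    def _hidden(self, X):
        # Gaussian radial responses h_j(x) = exp(-||x - c_j||^2 / r^2)
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / self.spread ** 2)

    def fit(self, head_angles, arm_angles):
        # Least-squares fit of the linear output layer on the babbling data.
        H = self._hidden(np.asarray(head_angles))
        self.W, *_ = np.linalg.lstsq(H, np.asarray(arm_angles), rcond=None)
        return self

    def predict(self, head_angles):
        # f(x) = sum_j w_j h_j(x), one output per arm joint.
        return self._hidden(np.asarray(head_angles)) @ self.W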
In the following section, I will discuss the dataset generation in more detail. On the dataset I perform a 10-fold cross-validation, selecting the samples of each fold from the populated dataset with a uniform distribution [31]. With 9 folds I train the network and with the last fold I evaluate the performance, repeating this procedure for each possible combination of folds. This procedure is equivalent to motor babbling, since the folds are populated by randomly choosing samples from the whole dataset. Moreover, to improve the accuracy of the visuomotor mapping, I perform a "meta-learning" over all the possible combinations of the folds, varying the spread of the basis functions to infer the best spread value. Once I have selected the best spread value, I perform the cross-fold validation to evaluate the radial basis network. The controlled 3 DOF arm has human-like characteristics (see Figure 6.2 right pane). The lengths of the links are compatible with those of the human counterpart and the ranges of the joints are similar to those of humans.
6.3 Experimental Results
6.3.1 Hering-based results
Experimental results
The gradient descent minimization of the cost function on the training
set leads to the following parameters:
K1 = 0.3286,  K2 = 0.0859,  K3 = 0.1837    (6.15)
It is worth noting that the cost function has many local minima but, in my experience, the overall performance of the system is not affected.
To test the performance of the learned control system, I have conducted the following experiments:
• Exploring the 3D space, which investigates the capability of the active stereo system to foveate points that are not contained in the training set.
• Testing the initial position, which investigates the capability of the system to foveate a feature in 3D space, regardless of the initial joint configuration of the stereo camera. The aim is to investigate the robustness of the system in foveating a feature starting from a generic position.
Exploring 3D space
As a first experiment, I investigate the capability of the system to foveate a large set of features (i.e. 3D points) in the 3D space starting from a defined initial position. Based on their 3D positions, the evaluated points (testing sets) can be grouped in three cubes adjacent to the training set:
Along Z direction:  [−100, 100] × [−100, 100] × [201, 401]
Along Y direction:  [−100, 100] × [100, 200] × [1, 201]
Along X direction:  [−200, −100] × [−100, 100] × [1, 201]
Each of these portions of space is discretised with a step of 10 m in each direction. I do not consider the points that are not projected onto both image planes.
Figure 6.3 shows the errors associated to each point in the 3D space
(top row), and the overall error distributions (bottom row).
The mean error associated to the testing set along the Z direction
is 1.42 pixel with a variance of 0.33. This result is expected mainly
because the projections of the 3D points are closer to the image centre
as their distances from the image plane increase. Along the X direction
the error increases as the X component increases. Since these points
are close to the image planes, their projections are in the border of the
images and, consequently, the task of foveating them is more challenging.
However, as can be seen from the bottom pane, the errors are distributed
in an acceptable error interval; i.e. [2.5;5.5] pixels with an average of
4.33 pixels and a variance of 0.488. Similar considerations can be done
for the testing set along the Y direction, where qualitatively the error
increases as the Y component of the 3D points increases. The mean
error is 3.96 pixels with a variance equal to 0.296.
Figure 6.3: Error maps computed for the left eye; very similar error values were obtained for the right eye. Top row: testing sets with the error associated to each foveated 3D point. Bottom row: the error distribution in pixels for each testing set. The red line represents the mean of the error. As expected, the error distribution along the Z direction is lower than along the other directions.
Test initial position
The initial position test aims at understanding the robustness of the
system to foveate a 3D point starting from a generic joint configuration
(i.e. θl , θr and θt ).
Vergence and version affect the panning command competitively (see
Equation 6.4). To check whether the system is able to perform panning
accurately, I have evaluated the most problematic region of the 3D space.
Indeed, the Z region represents an "easy" case where the points are
always projected to the centre of the image, and the Y region does
not affect the panning but the tilting. The testing subspace along X
direction, used for the experiments, is:
[−200, −100] × [−100, 100] × [1, 201]
discretised with a step of 10 m in each direction. The system foveates each of the testing points starting from each possible joint configuration in the joint space. I have defined a range of values for each joint, i.e. [−60°; 60°] with a step of 30°. In total, I have 125 different joint configurations. Then, I compute the mean error associated to each 3D point; the results are shown in Figure 6.4. Qualitatively, the error
increases as the Z component of the 3D points decreases (see left pane).
Since these points are close to the image planes, their projections are in
the border of the images and, consequently, the task of foveating them
is more challenging. However, as can be seen from the right pane, the
errors are distributed in an acceptable error interval; i.e. [1;35] pixels
with an average of 15 pixels.
6.3.2 Extended Hering-based results
The experiments presented in this section aim to:
• select the neck configuration that has the best performance in terms of error in the exploration of the 3D testing space
• compare the performance with the results collected with the original system
To select the best configuration I have compared the results obtained in the experiment "exploring the 3D space". The best configuration was then used to run the experiment "test initial position".
Figure 6.4: Original system. The left pane shows the mean error associated to each 3D point in the testing set. The mean error is computed considering each plausible initial joint configuration of the head; for each configuration I compute the foveation error. The right pane shows the mean error distribution.
Exploring 3D space
I have run the experiments for each neck configuration and, comparing mean and variance, I have found that the best configuration is number 5, with decoupled control between eyes and neck (see Table 6.1).
The testing sets are the same as defined in Section 6.2. The obtained
parameters K after the training phase of Configuration 5 are:
K1 = 0.0167,  K2 = 0.5543,  K3 = 0.1584,  K4 = 0.3542    (6.16)
Figure 6.5 presents the error maps related to Configuration 5. Results seem to be compatible with the performance obtained with the
3 DOF system (see Section 6.2 and Figure 6.3); i.e. the mean errors
are 4.33, 3.93 and 1.41 pixel, and the variances are 0.65, 0.32 and 0.34,
respectively for the testing sets along the X, Y and Z directions.
Test initial position
The initial position experiment results are presented in Figure 6.6. The
errors are distributed in the interval [5; 20] pixels. Compared to the
Figure 6.5: Error maps computed for the left eye of the extended system
with the fifth neck configuration.
performance of the 3 DOF system (see Section 6.2.1 and Figure 6.4),
the error presents a lower mean and standard deviation. I can therefore
conclude that the additional neck-joint provides robustness to the system
and, specifically, it reduces the influence of the initial configuration of
the head on the performance of the system in foveating a point in space.
Head trajectories
I have investigated different possible control laws for the extended model
to take into account the redundancy introduced by the neck; what
emerges, comparing the error in foveating 3D points, is that the best
performance is obtained when both eye and neck controls are decoupled (Configuration 5). Comparing the errors illustrated in Figures 6.3 and 6.5, it emerges that the mean error and variance associated with the extended system are in general similar to the original ones. Figures
6.4 and 6.6 present the experimental results of the initial position of
the system. In this case the error of the extended system presents a
lower mean and standard deviation. I can therefore conclude that the
additional neck-joint provides robustness to the system.
Furthermore, a qualitative analysis of the trajectories of the extended
model with decoupled control (see Figure 6.7), i.e. Configuration 5,
seems to be compatible with some biological results [17].
Figure 6.6: Extended system. The left pane shows the mean error to foveate each 3D point in the testing set. The mean error is computed considering each plausible initial joint configuration of the head. The right pane shows the mean error distribution.
6.3.3 Visuomotor mapping results
Method
Training the active stereo head means estimating its parameters [75]. I have discretized the arm joint space in steps of 12◦ along each axis, for a total of 1062 samples. For each position in the joint space the direct kinematics is computed. Knowing the position of the end-effector and its projection onto the image planes of the stereo cameras, I compute the vergence-version angles to foveate the end-effector itself. With the calibration parameters of the stereo cameras I compute the foveated point in 3D space, in order to compute the Euclidean error between the 3D position of the end-effector and the foveated point in space. This is an intrinsic error of the active visual controller and it does not depend on the arm controller. The trained network should be able to manage it, estimating the end-effector position regardless of the foveation error. Figure 6.8 (left pane) shows the generated dataset; each point represents a valid end-effector position. By selectively moving the arm in space I build a dataset where each sample is composed of (a sketch of the sample construction follows this list):
• 3D position of the end-effector
• arm joint configuration
Figure 6.7: The trajectories of the cameras performed by the trained
extended system. The blue cross represents the 3D feature
in space in position [200 0 40]. For graphical reasons, the image is scaled, but it clearly shows that the system first moves the neck and only when the neck is in a steady position do the eyes perform the vergence movement.
• head joint configuration foveating the 3D position of the end-effector (projected onto the image planes of the cameras)
• Euclidean error in 3D space between the foveated point and the
position of the arm (due to the vision system intrinsic error)
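The following Python fragment is a minimal sketch (with assumed function names; the original code is not reproduced here) of how one such sample is assembled during the motor babbling: the arm is placed at a joint configuration, the end-effector position follows from the direct kinematics, the head foveates its projection, and the intrinsic foveation error is stored with the sample.

import numpy as np

def build_sample(arm_joints, forward_kinematics, foveate, triangulate):
    # The three callables are placeholders for the components described in the text.
    p_ee = forward_kinematics(arm_joints)      # 3D end-effector position
    head_joints = foveate(p_ee)                # head joint angles foveating the end-effector
    p_fov = triangulate(head_joints)           # 3D point actually foveated by the stereo head
    fov_error = np.linalg.norm(p_ee - p_fov)   # intrinsic error of the visual controller
    return {"p_ee": p_ee, "arm_joints": arm_joints,
            "head_joints": head_joints, "fov_error": fov_error}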
I split the dataset into 10 folds to perform cross-validation in the training phase of the radial basis network, where each testing fold is composed of 118 samples. Cross-validation is widely used to evaluate the performance of the radial basis network, using the mean square error (MSE) as the evaluation criterion to control the stopping of the training phase. In order to improve the accuracy of the network as much as possible, I have implemented an optimization loop to detect the best spread value for the basis functions, i.e. the one that minimizes the mean Euclidean error between the estimated arm joint configuration and the real arm configuration. The range of the evaluated spread values is [0.5, 1.3] rad (a schematic version of this optimization loop is sketched below).
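The sketch below shows the spread-selection scheme in Python. The thesis uses a radial basis network with incremental neuron recruitment; here a Gaussian kernel ridge regressor stands in for it, so the code illustrates the optimization and cross-validation structure rather than the exact model. X holds the head foveation angles and Y the corresponding arm joint configurations.

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import KFold

def mean_cv_error(X, Y, spread, n_folds=10):
    # Mean Euclidean error over a 10-fold cross-validation for a given spread value.
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    errors = []
    for train_idx, test_idx in kf.split(X):
        model = KernelRidge(kernel="rbf", gamma=1.0 / (2.0 * spread ** 2))
        model.fit(X[train_idx], Y[train_idx])
        pred = model.predict(X[test_idx])
        errors.append(np.mean(np.linalg.norm(pred - Y[test_idx], axis=1)))
    return float(np.mean(errors))

def best_spread(X, Y, spreads=np.arange(0.5, 1.301, 0.05)):
    # Scan the [0.5, 1.3] rad range and keep the spread with the lowest error.
    scores = {float(s): mean_cv_error(X, Y, s) for s in spreads}
    return min(scores, key=scores.get)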
Results
The evaluation of the radial basis network is mainly composed by two
phases:
1. choose the best spread value
2. evaluate the network performance with the cross-folding technique
(using the optimal spread value)
Figure 6.8: (left pane) Complete dataset. The points in space represent the end-effector positions used as targets for the active stereo head. In the dataset, each end-effector position is associated with the corresponding arm joint configuration, the foveating joint angles of the head, and the Euclidean error between the foveation point and the end-effector position. This dataset is used for the cross-folding validation. (right pane) The Euclidean error between the real end-effector position and the one estimated by the radial basis network; the error is quite low except for those 3D points that are very near to the head and to the shoulder.
Figure 6.9: Error directions projected on different planes of the world
frame of reference. The blue dots represent the targets and
the red lines are the distance in space between the target and
the arm position computed by the network. For visualization
reasons, I do not plot the estimated end-effector position.
(left pane) Error projection into the plane X-Z. (right pane)
Error projection into the plane Y-Z.
Figure 6.10: Radial basis centers distribution in the input space. The
red circle represents a basis center, the blue dots are the
testing values and the cyan dots are the values used for the
training phase.
In the first phase, I execute an optimization loop over the 10-fold cross-validation, varying the spread value of the basis functions. In the optimization phase, I have found that the best spread value is equal to
1.25 [rad]. After the estimation of the optimal value, I use it to perform the 10-fold cross-validation over the dataset, evaluating the overall
performance related to the reaching task.
Figure 6.8 (right pane) shows the scatter of the Euclidean error between the real position of the end-effector and the estimated one for the
test set. As previously said, I use the 10-fold cross-validation so the
dataset is split in 10 folds and the error is given for each testing fold
(each fold is used as testing set and the other 9 are used as training set).
Here the error, generated from the testing of each fold, is plotted. The
figure clearly shows that the error is very low in the whole workspace
with the exception of points that are very near to the head and shoulder. Knowing the foveation angles of the head, the network is able to
correctly compute the arm joint angles to reach the targets. The mean
error is 0.0320 m and the standard deviation is 0.0591 m, within an error range of [0.0001, 0.9603] m. I notice that the maximum error of 0.9603 m is associated with a point at the border of the workspace, very near to the head. This error is due to the constraints that I have imposed on the arm joint ranges, when the forearm is at 132◦ out of its 135◦ range.
Figure 6.9 shows the Euclidean distance between the estimated arm position and the desired target. Blue dots represent the targets and red lines are the distance in space between the target and the arm position computed by the network. The left pane shows the projection onto the X-Z plane of the world frame of reference; the error is distributed over the whole space but it decreases as the Z value increases, due to the constraints imposed on the arm range. The right pane shows the projection onto the Y-Z plane and clearly indicates that the error is not due to the shoulder component in the tilt direction, since the red lines lie along the same shoulder position. This is due to the fact that the network receives only one tilting component of the head, in contrast to the horizontal head movements, which are generated by the left, right and neck components.
Figure 6.10 shows the distribution of the radial basis centers in the input space, and refers to a single training run of the cross-folding validation. The training phase has selected 82 neurons. The red circles represent the basis centers, blue dots are the testing values, and cyan dots are the values used for the training phase. The left pane shows the centers distribution projected onto the θL-θR plane, whose angles command the pan movements of the cameras. The centers are distributed along a straight line, in agreement with the strong cooperation-competition between the two pan angles. Furthermore, the θL and θR values used as input for the network are well distributed along the same line. In the middle pane I project the same centers onto the θN-θT plane; the centers distribution is quite uniform, mainly because the two angles are conceptually independent. Finally, in the right pane I project the input data onto the θN-θL plane, and also in this case I observe a correlation between the neck component and the left camera. It is clearly shown that the neck performs the coarse movements and the camera only executes small movements to correct the vergence. Also in this case, the centers are mainly distributed in the region containing the common movements, except for the outliers, which have ad-hoc centers created during the training phase.
6.4 Conclusions
In this work I have presented a vergence-version control system for an
active stereo head based on the Hering’s law, able to drive the learning of a visuomotor mapping. First, I have quantitatively evaluated the
performance of the original system previously presented in [103]. I have
defined a cost function and I have trained the system with a classical
technique; the obtained results show the robustness and the effectiveness
of the controller. Second, I have extended the controller adding a neck
component that makes the system redundant. I have defined different
possible configurations of the neck control including coupled/decoupled
controls. I have extended the cost function and trained the new controller for each neck configuration. I have compared the different neck
configurations and chosen the best in terms of obtained performance. I
have found the best performance with a decoupled eye-neck control. The trajectories generated by this controller are compatible with human head trajectories in foveation tasks. Moreover, comparing the performance with that of [103], I have found that the extended controller
solves the redundancy improving the performance and the robustness of
the system.
Furthermore, I have presented a novel computational system that
is able to compute the visuomotor map between a target (perceived
through the redundant active stereo head) and a 3 DOF arm, designed
with human-like mechanical constraints. The 3 DOF arm can move around its peripersonal space through a radial basis network that computes the visuomotor mapping. Through an optimization phase, I select the best spread value for the basis functions that are part of the neural network. I have confirmed the robustness of the mapping through a 10-fold cross-validation. After the training phase, the overall controller is able to detect a target in space and, without moving the head but just computing the foveation angles, reach it with the arm.
The results confirm the robustness and the accuracy of the system in reaching targets in peripersonal space. Moreover, the Hering-based stereo controller generates joint angles that are robust features to drive the network for the visuomotor mapping. The next step will be the validation of the model with a real robot with the same characteristics described above, adding new degrees of freedom to the arm to make it redundant.
7 A model of a middle level of
cognition
Adapted from [68][69][80][81].
Cognitive development concerns the evolution of human mental capabilities through experience gained during life. Many researchers have
interacting with the environment and adapting to it. An important
feature needed to accomplish this objective is the self-generation of motivations and goals, as well as the development of complex behaviours
consistent with those goals. My target is to realize a bio-inspired cognitive architecture, based on an amygdala-thalamo-cortex model, capable
of autonomously developing new goals and behaviours. Experimental
results show the development of new goals and movements.
7.1 Introduction
During their life, humans develop their mental capabilities: this process
is called cognitive development, and concerns how a person perceives,
thinks, and gains understanding of the world through the interaction of
genetic and learned factors [11]. A fundamental aspect in the cognitive
development is the autonomous generation of new goals and behaviours,
which allows the individual to adapt to various situations he faces every
day. How humans can autonomously develop new goals during their existence is not completely understood. In order to realize agents capable of interacting effectively with humans and integrating into their life, robotics should study the processes of the human brain which allow the cognitive development of the individual, as well as the modalities
underlying the generation of new goals and behaviours.
My work gives a contribution to the achievement of this objective: its
purpose is to create a bio-inspired robotic model based on human brain
processes, that should make the agent able to autonomously develop
new goals as well as new behaviours that could be consistent with these
goals.
Figure 7.1: The overall IDRA architecture. It is composed of a set of Intentional Modules (IM) and a Global Phylogenetic Module (PM). It receives as input a set of sensory information and produces as output the motor commands for the controlled robot.
There are different approaches to adaptation in robotics. Behaviour-based robotics allows the agent to adapt its behaviour to changes in the environment, in order to accomplish its goals. In this approach
goals are hard-coded into the robot, which cannot develop new ones.
Developmental robotics aims at modelling the development of cognition
in natural and artificial systems [66]. Developmental robotics leads to
the cognitive development of the agent, making it able to adapt to the
environment and autonomously develop new motivations and goals, that
were not present at design time.
Here I address an intermediate level of cognition that allows mammals and humans to be aware of the surrounding environment and then
interact with it. This capability is an essential precondition to enable
the robot to fit into the human’s everyday life. A robot must be able
not just to act in a consistent manner to the changes in the surrounding
environment, but also to develop goals that can emerge from that, in
order to interact effectively with people. Such robot would also develop
a unique personality, depending on the experiences that contributed to
the creation of its new goals and behaviours. These features would make
it the perfect robot for advanced applications.
I present a system that allows the robot to develop new goals, in
addition to the hard-coded ones, and to adapt its own behaviour to these
objectives. This system is inspired by the human brain, in particular
by the communication of three areas of the brain: cortex, thalamus and
amygdala, considered as a key element in human cognitive development.
The Intentional Distributed Robotic Architecture (IDRA) is a network
of elementary units, called Intentional Modules (IM), that enables the
development of new goals (see Figure 7.1). The network also contains a Phylogenetic Module (PM), holding the hard-coded objectives, i.e. the “innate instincts”, as in the amygdala. Through the action of the PM, the more the current state of the robot meets the objectives, the higher the output signal.
Each IM consists of two internal modules: Categorization (CM) and
Ontogenetic (OM) (see Figure 7.3). The Categorization Module, like
the cerebral cortex, returns a vector that encodes the neural activation
of the cortex in response to the input. The neural activity represents
the similarity of the current input with respect to previous relevant incoming signals. The Ontogenetic Module is the basis of the development
of new objectives; it receives the vector of neural activations from the
Categorization Module, and through Hebbian learning develops new
goals. It returns a signal indicating whether the current state meets the
new goals. As depicted in Figure 7.3 PM and OM signals are compared,
returning the more relevant of the two. The result is called relevant
signal and it drives the self-learning phase of each IM. Therefore, the
execution flow starts when the sensory input is acquired, filtered and
sent to the network of IMs; each module returns a vector containing information about the state, and a signal indicating how much the current state satisfies the current goals. The network can be composed of several layers, which can be connected in forward or feedback mode.
The vectors of neural activations are then used by a Motor System (MS) to generate movements consistent with the goals of the agent. Each movement is composed of a series of elementary components, called motor primitives, which represent the muscle activations over time; their composition (i.e. a muscular synergy) leads to the execution of complex movements [44][74][116].
Information coming from IDRA is also used to create the behaviour of the agent in new goals and situations. Each Dynamic Behaviour analyzes the input to suggest the best movement, that is, which action will lead
to the fulfilment of the goals, either learned or innate. The Dynamic
Behaviour returns a set of muscle activations (a composition of motor
primitives), each one referring to a joint connected to that specific Behaviour. These muscle activations are then used to perform the correct
movement. IDRA therefore supports the cognitive development of the
robot while the MS allows the robot to move and take actions in order
to accomplish these goals.
The system has been tested with two main experiments to verify the
goal development skills as well as the ability to adapt its behaviour to
its goals. In a first experiment the agent (a NAO robot, a humanoid produced by Aldebaran Robotics) learns a new goal from a hard-coded one;
starting from an innate instinct related to figures with high-saturated
colours, the agent autonomously develops an interest to a particular
shape of the figures. In a second experiment the motor capabilities are
tested. The NAO robot moves to maximize the relevant signal coming
from the network of IMs; movements are generated by a linear combination of motor primitives. The main contributions of this work are:
• the design and full implementation of a cognitive architecture
based on an amygdala-thalamo-cortical model, as already described
in general terms in [70];
• the validation of the architecture, by testing the goals generation
with the replication of the experiment performed in [70];
• the extension of IDRA with the Motor System, which allows the
creation of movements by linear composition of motor primitives.
In Section 7.2 I present the biological aspect related to my system.
In Section 7.3 I discuss the Cognitive Architecture model and implementation. In Section 7.4 I describe the experiments to test the goals
development and movements generation. Section 7.5 contains the conclusion.
7.2 Biological model
Several studies have shown the importance of the amygdala-thalamo-cortical interactions in cognitive development [115].
The cortex, the outer part of the brain, is divided into several sectors, and receives signals from the sensory organs. Although most of
the sectors receive input from a specific source, different studies have
proven that different areas can properly react to different stimuli sources.
Therefore the whole cortex is composed of the same kind of cells, and it
is able to respond, store and adapt to different kinds of stimuli. The cortex acts as a memory bank that the brain uses to compare the incoming data to find analogies, patterns, and invariant representations, and to act accordingly.
The thalamus, the largest component of the diencephalon, is the primary site of relay for all of the sensory pathways, except olfaction, on
their way to the cortex. The thalamus plays a central role in mammals in the development of new motivations, as well as in the choice of which goal to pursue. The thalamus is “a central, convergent, compact miniature map of the cortex” [110]. The thalamus is partitioned
into about fifty segments, which do not communicate directly with each
other. Instead, each one is in synchronized projection to a specific segment of the cortex, and receives a projection from the same segment.
Therefore, while the cortex is concerned with data processing, storing
and distribution, the thalamus determines which goals have to be pursued. Furthermore, each pair of cortex and thalamus sections intensively
communicates.
The amygdala is an almond-shaped group of nuclei located deep within
the medial temporal lobes. It is heavily connected to the cortex area; it
is involved in the generation of somatosensory response on the basis of
both innate and previously developed goals, and of sensory information
[1]. The amygdala seems to be an essential part of social and environmental cognition, guiding social behaviours. The amygdala has a
key role in the recognition of the emotions and in the generation of an
adequate response [32]. Thus one of the principal tasks of the amygdala
is to generate new goals taking advantage of hardwired criteria.
The cerebellum is supposed to have functionalities and structures similar to the classical perceptron. The cerebellum has an extended
network of various types of neurons, giving different abilities including
motor learning and motor coordination [3]. Many studies have shown
the importance of the basal ganglia in the motor generation of the movement. Basal ganglia and cerebellum seem to create two different sets of
loop circuits with the cortex, both dedicated to different features of motor learning, independent and in different coordinates. Furthermore, a functional dissociation between the basal ganglia and the cerebellum has recently been proposed: the former is implicated in optimal control
of movements, and the latter seems to be able to predict the sensory
consequences of a specific movement [120].
The spinal cord is the lower caudal part of the nervous system; it
receives and processes sensory information from the various parts of the
body, and controls the movement of the muscles. It acts as a bridge
between the body and the mind [30].
Several studies have shown how complex movements are generated
from the combination of a limited number of waveform modules, which
are independent of the considered muscle, its speed, and its gravitational load. It was suggested that the nervous system does not need to
generate all the muscle activity patterns, but only a few basic patterns, combining them to generate specific muscle activations.
This model can be represented by an oscillator that produces the output frequency from the input signal [89]. A motor primitive is a specific
neural network, found in the spinal cord. The Dynamic Movement Primitives - a formulation of movement primitives via autonomous nonlinear
differential equations - have been successfully used in robotic applications, and can be employed with supervised learning and reinforcement
learning [104]. However, I distance myself from this kind of experiment, which uses learning in task-based robots. In my architecture I do
not focus on high-level motor skills, nor on high level of reasoning and
planning. Instead, I focus on an intermediate level of cognition that allows mammals and humans to be aware of the surrounding environment.
Consciousness has already been suggested to be a product of an intermediate level of cognition. Awareness is supposed to be neither a direct product of sensations, the “phenomenological mind”, nor a product of high-level conceptual thoughts, the “computational mind”, but a product of several intermediate levels of representation [56][57]. This
middle level has some interesting features related to consciousness: it underlines how I can interpret the surrounding environment and react to
this awareness without the need for either high-level conceptualizations or complex motor controls, therefore solving the grounding problem of
a semantic interpretation of a formal symbol system that is intrinsic to
the system itself. In particular I will deal primarily with the categorical
representation that is the learned and innate capability to pick out the
invariant features of object and event categories from their sensory projections. While the behaviour-based robots explore the environment in
order to optimize their actions for reaching a predefined goal, I aim at
generating robots able to explore the environment in order to find new
goals on their own. They must be curious to explore their environment,
and once explored, they must be able to do something new according
to their abilities as well as to their experience. Developmental robots
must be able to perform actions following goals that were not present at
design time [66][70].
7.3 Implementation Model
7.3.1 Intentional Distributed Robot Architecture
IDRA deals with the cognitive development of the agent, analyzing inputs and allowing it to develop new goals (a comparison with the biological counterpart is shown in Figure 7.2). The architecture is basically a
Figure 7.2: A comparison between a (very) sketchy outline of the
thalamo-cortical system as commonly described [84] and the
suggested architecture. It is worth emphasizing the similarity between various key structures: A. Category Modules vs
cortical areas, B. Global phylogenetic module vs. amygdala,
C. Ontogenetic modules vs. thalamus, D. Control signals
(high and low bandwidth), E. high bandwidth data bus vs
intracortical connections.
net of linked modules, simulating connections and interactions between
the cerebral cortex, the thalamus and the amygdala. The modules composing the net are the PM (amygdala) and a layered set of IMs, each
composed of the Categorization Module (cerebral cortex) and the Ontogenetic Module (thalamus). The IMs are linked in various ways, also
with feedback connections, while the PM broadcasts its signal to all
the IMs, without receiving data back. This kind of structure simulates
the interaction of the three brain areas: the thalamus can communicate
with some respective areas of the cortex, which collects all information
coming from the thalamus, but different areas of the thalamus cannot
communicate with each other. The amygdala sends information both to
the thalamus and the cortex (see Figure 7.2).
The input of the net arrives from various sensors (video, audio, tactile,
etc.), is filtered and sent to the IMs of the first layer. The output of the
net is a vector, representing the neural activation generated by sensory
input, and a signal, representing how much the actual input satisfies both
hard-coded goals and new developed goals. IDRA can autonomously
develop new goals (through the Ontogenetic Module) starting from hard-coded ones (provided by the PM), which represent the “innate instincts” of
the agent.
Innate instincts
Phylogenetic processes contribute to the adaptation of organism behaviors to the environment through the production of instincts [73]. An
instinct is the inherent inclination of a living organism toward a particular behaviour, i.e. an impulse or powerful motivation from a subconscious source. The part of the brain that is associated with these
types of reactions is the amygdala [16][33]. The amygdala has its own
set of “receivers” for sensory intake, and can retrieve information from
the environment and take a decision before a person could consciously
think about it.
I have implemented instincts in the system as hard-coded goals in
the PM. It tells the agent what is relevant according to its hard-coded
instinctive functions. The input of this module comes from sensors; each piece of sensory information is processed by instinctive functions, and each function processes only a certain input type, according to the instincts associated with that specific type. The output of this module (normalized between zero
and one) is the phylogenetic signal, which tells how much the incoming
stimulus is important according to the a priori stored criteria.
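As a concrete (and deliberately simplified) illustration, the Python sketch below shows a Phylogenetic Module holding one instinct function per input type; the saturation instinct mirrors the one used later in the experiments (the fraction of highly saturated pixels), while the specific threshold value and the class layout are assumptions of this sketch, not the original C# implementation.

import numpy as np

def saturation_instinct(saturation_image, threshold=0.7):
    # Fraction of pixels whose saturation exceeds the threshold (value in [0, 1]).
    return float(np.mean(saturation_image > threshold))

class PhylogeneticModule:
    def __init__(self, instinct_functions):
        # Mapping from input type (e.g. "saturation") to its hard-coded instinct function.
        self.instinct_functions = instinct_functions

    def phylogenetic_signal(self, inputs):
        # Each instinct scores only the input type it understands; the highest score,
        # clipped to [0, 1], is broadcast as the phylogenetic signal.
        scores = [f(inputs[kind]) for kind, f in self.instinct_functions.items()
                  if kind in inputs]
        return float(np.clip(max(scores, default=0.0), 0.0, 1.0))

pm = PhylogeneticModule({"saturation": saturation_instinct})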
Intentional module and Neuroplasticity
Neuroplasticity is the lifelong ability of the brain to reorganize neural
pathways based on new experiences. Neuroplasticity can occur at different levels, ranging from cellular changes (e.g. during learning) to large-scale changes involved in cortical remapping (e.g. as a consequence
of response to injury). Scientific research has demonstrated how substantial changes can occur in the lowest neocortical processing areas,
and how these changes can profoundly alter the pattern of neuronal activation in response to experience [16]. Neuroplasticity has replaced the
formerly-held position that the brain is a physiologically static organ.
In order to learn or memorize a fact or skill, there must be persistent
functional changes in the brain, and these changes represent the new
knowledge.
The IM (Figure 7.3) can easily adapt to changes in sensory input. If
I send an IM input from a video sensor, the IM will specialize to that
type of input; however, if I send to this specialized IM a different type
of input, e.g. an audio input, it will gradually adapt. The IM, as the
basic unit, must be able to self-develop new goals and motivations. This
idea resembles what was developed in [70].
Figure 7.3: Intentional Module (IM)
I can summarize the main objectives of the module as:
• adapt to any kind of input;
• learn to categorize incoming stimuli;
• use acquired categories to develop new criteria to categorize;
• interface smoothly with similar modules and give rise to a hierarchical structure.
The IM is composed of two structures: the Categorization Module,
which performs categorization, and the Ontogenetic Module, which can
develop new goals. At the beginning, both modules are empty. Incoming
data are sent to the Categorization Module. Once the categories have
been created, a distance measure is sent to the Ontogenetic Module. The
distance measure is computed between the incoming sensory signal and
the developed categories. The Ontogenetic module performs Hebbian
learning to develop new goals, and returns a signal based on how much
these new goals are satisfied. This signal is called ontogenetic signal; a
high value of ontogenetic signal corresponds to high satisfaction of the
developed goals (see Figure 7.4).
Figure 7.4: Ontogenetic Module (OM)
The IM also receives as input the signal from the PM, and returns the maximum between the signals of the PM and the Ontogenetic Module, called the relevant signal. The IM also sends out the categories created by the Categorization Module.
Categorization Module: cognitive autonomy
The cerebral cortex is divided into lobes, each having a specific function.
Parts of the cortex that receive sensory inputs from the thalamus are
called primary sensory areas. One of the main features of the cerebral cortex is the ability to adapt to stimuli, whatever their nature, so each cortical area is capable of processing any type of data [109]. The Categorization Module represents the cerebral cortex of IDRA. It receives input from sensors or from other IMs and performs categorization. The input is processed in two stages: first with Independent Component Analysis (ICA) [53], then with a clustering algorithm, such as K-Means. ICA allows the module to generalize the input representation regardless of the type of incoming stimuli. In an early development stage, independent components are extracted from a series of inputs through the ICA algorithm. After this training stage, the input is projected into the basis space (i.e.
the previously extracted independent components), in order to reduce
the dimension of the data and to get a general representation:
W = IC × I
(7.1)
where W is the resulting vector of weights, IC is the matrix of independent components and I is the input vector. This produces a vector of weights on which clustering is performed, using the K-Means algorithm. Clustering is a simple way to get the neural code, i.e. the translation of a
stimulus into a neural activation. Considering that any information is
represented within the brain by networks of neurons, it is supposed that
neurons can encode any type of information [11][32]. During clustering, each vector is assigned to an existing cluster, if the distance from
existing clusters is below a previously-set threshold; otherwise a new
cluster is created, using the newly acquired vector. The output of the
Categorization Module is a vector containing the activations of clusters,
which depend on the distances of the input data from the centre of each
cluster (category). This vector corresponds to the activation of a neuron
centred in each cluster:
yi = ρ(x, Ci )
(7.2)
where yi is the distance of the actual input from the centre of the
cluster i, x is the input and Ci is the centre of the cluster i. For my purposes, the values are normalized between zero and one. A new category
is created by the categorization module depending on the value of the
relevant signal computed by the IM; only relevant inputs are categorized,
so that the module saves only meaningful information.
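The following Python fragment is a schematic sketch of the Categorization Module (the original is part of the C# IDRA implementation). The ICA basis is assumed to have been estimated in a previous training stage; the activation of Equation 7.2 is rendered here as a Gaussian of the distance to each cluster centre, which is one possible choice of the function and of the thresholds, not necessarily the one used in the thesis.

import numpy as np

class CategorizationModule:
    def __init__(self, ic_basis, distance_threshold=0.6, sigma=1.0):
        self.IC = ic_basis            # independent components, shape (n_bases, input_dim)
        self.threshold = distance_threshold
        self.sigma = sigma
        self.centres = []             # cluster centres (categories) in the basis space

    def project(self, x):
        return self.IC @ x            # Equation 7.1: W = IC x I

    def activations(self, x, relevant):
        w = self.project(x)
        if self.centres:
            dists = np.array([np.linalg.norm(w - c) for c in self.centres])
            y = np.exp(-dists ** 2 / (2.0 * self.sigma ** 2))   # activations in [0, 1]
        else:
            dists, y = np.array([]), np.array([])
        # only relevant inputs that match no existing category create a new one
        if relevant and (dists.size == 0 or dists.min() > self.threshold):
            self.centres.append(w.copy())
            y = np.append(y, 1.0)
        return y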
Ontogenetic Module: goals generation
Functionalities of the thalamus include elaboration of input and motor
signals, as well as regulation of consciousness, sleep, and alertness [109].
Furthermore, the thalamus is a “miniature map” of the cerebral cortex [115]. Additionally, each portion of the thalamus receives a reciprocal connection from the same portion of the cerebral cortex, whereby the cortex can modify thalamic functions. These connections are more data intensive from the cerebral cortex to the thalamus, while the backward connection from the thalamus to the cortex is weaker. This close connection between thalamus and cortex and their interplay led to the idea that goal generation is spread throughout the cerebral cortex and is obtained by the interaction between thalamus and cortex.
The Ontogenetic Module represents the thalamus of my system. It is
closely connected to the Categorization Module and uses the categories
computed by the Categorization Module and a Hebbian learning function to develop new goals. The values of neural activations, provided
by the Categorization Module, are evaluated using a vector of weights,
and the resulting ontogenetic signal is the maximum value among the
evaluated neural activations:
os = max_i (yi wi)    (7.3)
where os is the resulting ontogenetic signal, yi is the activation of
neuron i and wi is the weight associated with neuron i, normalized between zero and one. The ontogenetic signal strongly depends
on the weights used to evaluate the input. These weights are updated
at each iteration, using a Hebbian learning function:
wi = wi + η(hs yi − wi yi²)    (7.4)
where η is the learning rate and hs stands for the Hebbian signal, which is a control signal coming from the IM. A threshold is fixed, so if a weight reaches or exceeds the threshold, its value is set to one. The output
of the Ontogenetic Module is the ontogenetic signal, whose value represents how much the actual input state satisfies the new goals developed
through the Hebbian learning process.
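A minimal Python sketch of the Ontogenetic Module, following Equations 7.3 and 7.4, is given below; the learning rate and threshold are the values reported for the experiments, while the fixed number of neurons and the clipping of the weights are assumptions of this sketch, and the real module is part of the C# implementation.

import numpy as np

class OntogeneticModule:
    def __init__(self, n_neurons, eta=0.1, weight_threshold=0.8):
        self.w = np.zeros(n_neurons)      # one weight per neuron (category)
        self.eta = eta
        self.weight_threshold = weight_threshold

    def ontogenetic_signal(self, y):
        # Equation 7.3: os = max_i (y_i * w_i)
        n = len(y)
        return float(np.max(y * self.w[:n])) if n else 0.0

    def hebbian_update(self, y, hebbian_signal):
        # Equation 7.4: w_i <- w_i + eta * (hs * y_i - w_i * y_i^2)
        n = len(y)
        w = self.w[:n] + self.eta * (hebbian_signal * y - self.w[:n] * y ** 2)
        w[w >= self.weight_threshold] = 1.0   # weights at or above the threshold are set to one
        self.w[:n] = np.clip(w, 0.0, 1.0)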
Signal propagation
The signal propagation through the internal sub modules of the IM can
be summarized in the following steps (see Figure 7.3):
1. The sensory input, the global relevant signal, and the external
relevant signal of the previous time step are received;
2. the CM computes its output signal which encodes a proper representation of the actual sensory signal x(t);
3. the OM computes its ontogenetic signal, knowing the current representation of the sensory input and having an internal representation of the developed goals in the input space;
4. the IM computes the relevant signal that represents the relevance
of the sensory input at time t;
5. training phase: the internally generated relevant signal (from the previous step of this algorithm) drives the development of new categories in the CM and the updating of the gates in the OM (a condensed sketch of this flow follows).
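Assuming the module sketches given above, one IM iteration can be condensed as follows; the 0.6 relevance threshold used to gate the creation of new categories is taken from the parameterization of the experiments and is otherwise an assumption of this sketch.

def intentional_module_step(cm, om, x, phylogenetic_signal, prev_relevant_signal):
    # steps 1-2: categorize the sensory input (new categories only for relevant inputs)
    y = cm.activations(x, relevant=prev_relevant_signal >= 0.6)
    # step 3: ontogenetic signal from the developed goals
    os = om.ontogenetic_signal(y)
    # step 4: the relevant signal is the more relevant of the two signals
    relevant_signal = max(phylogenetic_signal, os)
    # step 5: the relevant signal drives the Hebbian update of the OM
    om.hebbian_update(y, relevant_signal)
    return y, relevant_signal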
7.3.2 Motor system: movement generation
The problem of moving and acting in a smart way, according not only to hard-coded goals but also to newly developed goals, is not trivial. I suggest a solution based on Dynamic Behaviours for movement evaluation, and on the concept of Motor Primitives for movement generation. The input to the motor part comes from the network of IMs, and it is composed of a vector of neural activations and a relevant signal. The vector, representing the state of the environment at a high level of abstraction, is clustered using the K-Means algorithm. The output of the clustering is the cluster corresponding to the current state; with a State-Action table the best movement to perform, according to the relevant signal, is selected. The movement is composed of a linear combination of primitives, and it is sent to the agent, which uses it to compute the joint values.
Different lines of evidence have led to the idea that motor actions and
movements are composed of elementary building blocks, called motor
primitives. Motor primitives might be equivalent to “motor schemas”,
“prototypes” or “control modules” [105]. Motor primitives could be
transformed with a set of operations and combined in different ways, according to well-defined syntactic rules, in order to obtain the entire repertoire of motor actions of an organism. At the neuronal level, a primitive corresponds to a neuron assembly, for example, of spinal or cortical neurons [44]. Studies on the motor system suggest that voluntary actions are composed of movement primitives, which are bound to
each other either simultaneously or serially in time [44][74][116].
Following this idea, I use the concept of motor primitives to create muscular activations. A motor primitive can be seen as the activation of a muscle over time. The higher the value of the primitive, the stronger the muscle activation, which leads to a faster execution of the movement. By activating different muscles over time, a complex movement can be performed. I implement primitives as Gaussian functions delayed in time:
p = exp(−(x − c)² / (2σ²))    (7.5)
where c is the centre of the muscular activation of the primitive p. I
have chosen the bell-shaped profile of the Gaussian function for primitives according to biological evidence: when humans move their limbs from
one position to another they generally change joint angles in a smooth
manner, such that angular velocity follows a symmetrical, bell-shaped
profile [34]. To generate a complex movement, primitives are linearly
combined, producing a muscular synergy:
m = Σi wi pi    (7.6)
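A short Python sketch of Equations 7.5 and 7.6 follows. The five Gaussian primitives with standard deviation 6.7, equally spaced on a 0-100 time scale, are the values reported for the experiments of Section 7.4; the random initial weights mirror the description below.

import numpy as np

def primitive(t, centre, sigma=6.7):
    # Equation 7.5: bell-shaped activation of one primitive over time
    return np.exp(-(t - centre) ** 2 / (2.0 * sigma ** 2))

t = np.linspace(0, 100, 101)
centres = np.linspace(0, 100, 5)                  # 5 primitives equally distributed on [0, 100]
P = np.stack([primitive(t, c) for c in centres])  # shape (5, 101)

rng = np.random.default_rng(0)
w = rng.uniform(0.0, 1.0, size=5)                 # weights are initially randomly generated
m = w @ P                                         # Equation 7.6: muscular synergy m = sum_i w_i p_i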
Each weight wi is initially randomly generated. In the human brain, the motor cortex is involved in the planning, control and execution of movements. Each neuron in the primary motor cortex contributes to the
force in a muscle [33]. Furthermore, the primary motor cortex is somatotopically organized, which means that stimulation of a specific part of
the primary motor cortex elicits a response from a specific body region.
According to the fundamental role of the motor cortex in movement
generation, the approach should be similar to the one used for categorization: I need a neural code of information like the one computed
by the categorization module. So the first step in order to perform a
movement is clustering, which is performed by the K-Means algorithm.
However, in this case I do not need to create new categories, so clusters
are defined in a previous training phase. I need the cluster representing
the current state, namely the part of the primary motor cortex that is
stimulated by the current state. This approach respects the idea that
the same neural activation produces the same muscular response.
I need something able to select the best movement according to the
current state of the environment, depending on the goals. Dynamic Behaviours allow the agent to perform the best movement, given the state
of the environment and the relevant signal, as well as the ability to learn
the best movement to execute in an unknown situation. The relevant
signal depends on the ontogenetic signal, coming from the thalamus,
and the phylogenetic signal, computed by the amygdala; its use in the
motor development is based on scientific evidence [30].
Each Dynamic Behaviour is composed of a list of actuators to move in order to satisfy a goal. If I want the agent to look at a ball, for example,
I create a Behaviour linked to the actuators which control the head yaw
and pitch angles. The Dynamic Behaviour selects the set of movements
to be executed in order to get the best relevant signal as a response
from the networks of IMs. The computation of the best movement is
based on a State-Action table (Figure 7.5); the table associates a state
and a movement to a relevant signal. When the system is in a certain
state and performs a movement, the relevant signal generated by this
state-performed movement combination is stored in the table.
Figure 7.5: State-action Table
The policy for movement selection for each input state follows these
rules:
• if there is a movement associated with a relevant signal above a
defined threshold, that movement is selected;
• otherwise, if there is a movement not yet performed, that movement is selected;
• otherwise, if all movements have already been performed at least once, and none is associated with a sufficiently high relevant signal, a new random movement is added to the list (a sketch of this policy follows).
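A Python sketch of this selection policy is shown below; the 0.8 threshold is the value reported in Section 7.4.3, while the table layout (a state identifier mapped to the relevant signals of the movement indices already tried) and the new-movement generator are assumptions of this sketch.

import random

def select_movement(table, state, movements, new_random_movement, threshold=0.8):
    # `movements` is a list of weight vectors; `table[state]` maps the index of
    # each movement already tried in that state to the relevant signal it produced.
    entries = table.setdefault(state, {})
    # 1) prefer a movement whose stored relevant signal exceeds the threshold
    for idx, signal in entries.items():
        if signal >= threshold:
            return idx
    # 2) otherwise pick a movement not yet performed in this state
    untried = [i for i in range(len(movements)) if i not in entries]
    if untried:
        return random.choice(untried)
    # 3) everything tried and nothing good enough: add a new random movement
    movements.append(new_random_movement())
    return len(movements) - 1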
Each movement is a set of weights to apply to the primitives to obtain
a muscular activation. Once a movement is selected, it is used to linearly combine primitives: the resulting vector is sent to the agent, which computes the correct joint values and moves.
7.4 Experimental results
7.4.1 Setup
The IDRA implementation was specifically designed to be as modular as possible, in order to allow the further addition of innate abilities in the PM, new kinds of sensor inputs, and new kinds of actuator behaviours.
Moreover, it is designed to be virtually able to adapt to every robot the
user would like to use it with.
The project was developed in C# using Microsoft Visual Studio 2010,
making extensive use of libraries and data standards that I specifically
developed for this project. In the current state of the work the project
is provided with a C# implementation and XML configuration files for an Aldebaran NAO robot, and with a dummy robot useful for preliminary testing. The architecture runs on a Windows PC with:
• Processor: Intel Core i7 920;
• Video Card: Nvidia GeForce GTS 250;
• Memory: 4 GB DDR3 Ram;
• Hard-drive: 1 TB 7200 rpm.
7.4.2 Goal generation
To test goal generation starting from hard-coded goals, I have performed
a simple experiment, with a network composed of a single IM, and with
input coming from a video camera. The agent should be able to extract
information about the colour saturation of the image, according to a
hard-wired instinct, and then learn the shape of the observed figure, thus
developing a new interest for that particular shape. The experiment uses
a NAO robot, in particular only the frontal camera, and two actuators
controlling the head movement, namely HeadPitch and HeadYaw. This
experiment is similar to the one performed in [70], where the employed
agent was a two degree of freedom camera. Here I have used an agent far
more complex and I have performed the test with the full architecture.
The robot has a single innate ability, i.e. the “attraction” to coloured
objects; the test consists in showing how a new interest, i.e. the “attraction” to specific shapes, could show up without the need to hardcode
it. The test environment is limited to two boards containing geometrical
figures (Figure 7.6). The first board presents a series of black shapes,
among which there is a black star. The second board presents some
stars in highly saturated colours. The boards are put on a wall in front
of the NAO robot, at an adequate distance to allow the camera to see
the entire board while moving.
Figure 7.6: The 2 boards for the experiment
The network (Figure 7.7) is composed by only one layer including a
single IM, and it receives as input just the video signal coming from the
frontal camera. The input is filtered in three different ways:
• logPolarBW filter: retrieves an RGB image and returns it in log-polar coordinates, in a single-channel colour space;
• logPolarSat filter: retrieves an RGB image and converts it in HSV
(Hue Saturation Value), then returns the saturation channel in
log-polar coordinates.
• cartesianRGB filter: retrieves an RGB image and returns the same
image in Cartesian coordinates in a three-channel colour space.
This input is sent to the interface and is used by humans to understand what the robot is looking at.
The IM receives data from logPolarBW in array form, while data from
logPolarSat is received by the Phylogenetic Module. The phylogenetic
signal here is the percentage of high-saturated pixels in the image. The
output is the output of the single IM in the net. Head movements are
randomly generated, using a uniform distribution for the angle and a
bell-shaped Gaussian function for the amplitude of the movement. The
Figure 7.7: The architecture used in the goal-generation experiment
probability function of the amplitude is:
e−λ r2
p(r, λ) = R rmax −λρ2
dρ
−rmax e
(7.7)
where r is a random variable and λ is the relevant signal output from
the IM. The variance of the Gaussian function depends on the relevant
signal computed by the IDRA architecture. According to this, when
NAO is interested in what it sees, the movement of the head is small,
while non-attractive images lead to wide head movements. In order to project the input into the basis space, independent components are
extracted through ICA on images coming from the camera, using the
parameters:
• number of samples: 2000;
• max number of iterations: 200;
• convergence threshold: 10⁻⁵;
• max number of independent components: 32;
• eigenvalue threshold: 10⁻⁴.
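To make the head-movement policy of Equation 7.7 concrete, the sketch below draws an amplitude by rejection sampling from a zero-mean Gaussian truncated to [−rmax, rmax], so that a high relevant signal concentrates the movements around zero; the bound rmax and the numerical floor on λ are assumptions of this illustration, not values from the original implementation.

import numpy as np

def sample_amplitude(relevant_signal, r_max=1.0, rng=None):
    rng = rng or np.random.default_rng()
    lam = max(relevant_signal, 1e-6)   # avoid a degenerate flat case when the signal is zero
    while True:
        r = rng.uniform(-r_max, r_max)
        # accept with probability proportional to e^{-lambda r^2}, as in Equation 7.7
        if rng.uniform() <= np.exp(-lam * r ** 2):
            return r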
Let me now discuss the results. Initially the board contains only black figures. The interest of NAO is more or less equal for every part of the board; it aimlessly points at every part of its visual field, since it cannot find anything interesting. Then the second board is shown, and the interest of NAO focuses on the three star-shaped figures. Afterwards the board is switched back to the first one. Unlike before, the interest of the NAO robot is now focused on the star-shaped figure, which is black, meaning that the learning process of the Ontogenetic Module has developed a new interest in the shape of the figure, in addition to the previous interest in its colour (Figures 7.8 and 7.9).
During this test I have used the following parameterization: 0.6 threshold for the creation of new categories, 0.6 for determining the correlation
of a signal to a category, 128 for the number of clusters, 0.8 threshold
and 0.1 learning rate for Hebbian learning.
Figure 7.8: The gaze is concentrated on the star object
7.4.3 Movement generation
A limitation of the first experiment is that it relegates the agent to a
passive role. Once the agent has learned from the experience, I need
it to be able to take some action in order to interact with the environment, and so change its own perceptions towards a more satisfying
condition. The objective of this experiment is to test the Behaviour and
OutputSynthesis parts of the architecture.
Starting from a simple hard-coded instinct, i.e. the “attraction” to
coloured objects, and the ability to move only one of its arms, the robot
learns which movements would allow it to see a coloured object held in its hand, moving it near the eyes, and thus to increase its reward
signal (Figure 7.10).
The experiment uses NAO, in particular its frontal camera, and four
actuators controlling the movement of the right arm. During this test I
used a 0.5 threshold for the creation of new categories, a 0.28 threshold
for the K-means algorithm, a 0.3 threshold for the minimum distance
between centroids of clusters, a 1000 limit for number of points per
cluster, a 10 limit for the number of clusters and number of categories, a
0.8 threshold and a 0.1 learning rate for Hebbian learning, a 0.8 threshold
for the choice of the best possible movement in the Behaviours. For the
motor primitives part, I used 5 Gaussians as primitives, each one with a
standard deviation of 6.7 and a mean value calculated in order to equally
distribute the functions on a scale from 0 to 100.
Figure 7.9: The red line represents the global relevant signal whereas the blue line represents the ontogenetic signal. After training, the
cognitive architecture is able to produce a signal of relevance
also for the star-shaped objects.
Figure 7.10: The robot NAO in the movement experiment
With respect to the previous experiment, I have drastically reduced
the number of categories and clusters. In fact, the head was fixed,
therefore it kept looking at the same point, unless the arm itself passed in front of the camera. The arm has a very limited input dimension (4, the number of joints), with respect to the video input (a 160x120 image, 19200 pixels). Limiting the number of categories allows a good representation of the almost static environment of the robot.
The network is slightly more complex (Figure 7.11). It has two input
signals, filtered in four different ways. Besides the 3 filters for the images described before, I add the rightArmPosition filter, which retrieves the proprioceptive information about the four joints of the right arm and returns a vector containing the joint angle values in radians.
Figure 7.11: The architecture in the second experiment
IDRA is composed of two layers; the first contains two IMs. The first
IM receives logpolar gray images and records the shapes. The second,
instead, receives the proprioceptive data about the position of the right
arm. Both send their output vector and relevant signals to a third
module, situated in layer 2. Their output therefore represents the state
of the known world: what it can see, and where its arm is.
The PM receives the logPolarSat filtered input, and broadcasts its
signal to the net. Therefore, if the robot is in a state where it can see a coloured object, the third IM will have a high outgoing relevant signal.
In order to project input in the basis space, independent components
are extracted through ICA. The first IM computes ICA on video images,
the second one on the joints values of the right arm, and the third on
the combined output of the others. The parameters used are the same
as in the first experiment.
Once the net is trained, clusters for the Motor System should be computed. While the net is running, samples of the output are collected,
then the K-Means algorithm is executed to create the clusters. The parameters used are: number of samples: 2000; number of clusters: 100. The experiment starts with the robot in a random position. At the beginning, the
State-Action table is empty, and movements are chosen and executed
randomly. After a number of steps, the table starts filling, and movements begin to be coherent with the maximization of the relevant signal
received from the architecture. When the table contains a good number
of entries, movements start to be frequently repeated.
Several positions of the arm (rows in the State-Action table) hold a reference to a movement (column) that brings the hand to a position
with a high relevant signal; several positions with a high relevant signal
know many movements, but none of them brings the hand to a position
with a high relevant signal. Therefore, I observe the robot starting from
a random position, going towards a good one, and then going towards a
random position, and so on. Figure 7.12 shows the clusters of positions
manifesting the higher relevant signals, colored in red. For each cluster,
I report a 3D representation of the NAO, showing its position, and what
it sees through its top camera.
Figure 7.12: The most relevant hand positions
The implemented Motor System is simple, with obvious limitations;
the movements that maximize the relevant signal are performed only if
the state is known, and if the State-Action table has the corresponding
entry. According to this, the table requires a lot of entries to produce
an effective movement selection system.
Furthermore, the motor training has run for a relatively short period
of time. As a consequence, the State-Action table presented a rather
limited extension, in comparison with the high dimension of the input
representing all the possible states of the environment. Moreover, although the brain has been proven to have an associative memory of
sequences of patterns [11], the system here presented has no memory of
previous actions.
My results are coherent with the objective of the experiment: the
robot moves using the linear combination of primitives, and it is able to
learn what movements to perform to go from a known state to a state
with a high reward. In addition, the experiment led to the creation of
a sensor-motor map through the cognitive architecture. For all these
reasons, the implemented Motor System is an excellent starting point for
the development of an effective system which allows the agent to move
according to its goals.
7.5 Conclusions
The aim of this work is the creation of a bio-inspired software architecture based on the processes that take place in the human brain; this
architecture must be able to learn new goals, as well as to learn new
actions to achieve such goals. The architecture has been successfully
designed and implemented. My experiments have shown how the agent
is able to keep memory of past situations, and act towards the achievement of its goals, whether innate or acquired. Besides this, the intrinsic dynamicity of the architecture allows the agent to acknowledge the changes in the environment that are independent of its own
actions, and then recalibrate its actions for a particular situation.
8 Conclusions
In this thesis I have explored the design of biologically inspired controllers, focusing on two broad aspects: the low-level computational mechanisms and the cognitive decision-making process. The common background shared by these two broad aspects is the exploitation of biological principles associated with the internal mechanisms of the brain. Generally, I have focused on the mechanisms widespread in mammalian brains, with particular attention to the human brain. For these reasons, I have investigated two aspects of the human brain, namely the visual dorsal pathway and the interaction among the thalamus, the amygdala, and the cortex.
The visual dorsal pathway is interesting for several reasons. First, it
contains a set of basic functionalities (spread across several brain areas), mandatory for both perceiving and acting, such as visual processing, object position estimation, sensory fusion, visuomotor mapping, and trajectory generation; second, it is tightly coupled with the arising of cognition, as pointed out in [92]; third, this pathway is widely studied in both the neuroscience and engineering fields, providing qualitative and quantitative results for comparison.
On the other hand, the interaction among the thalamus, the amygdala,
and the cortex is a proposal for modelling the emergence of goals and
behaviours that is entirely inspired by neuroscientific evidence. Moreover, this
interaction implies that cognition is distributed across the brain, because
it emerges through the interaction of different brain areas.
I have investigated low-level computational models of the cortex,
looking at the computational principles that are widespread across its different
functional areas. Specifically, I have focused on two fundamental cortical
areas: the primary visual cortex (V1) and the posterior parietal cortex
(PPC). The primary motor cortex (M1) is modelled without focusing
on the particular underlying neural network. I have defined three sets
of experiments to investigate each of the above-mentioned areas. For
each of them, I have imposed several constraints in order to have a
proper testing environment for the generation of quantitative results.
It is worth noting that one of the main objectives of these experiments
is to investigate several strategies of bioinspiration, comparing them in
terms of underlying neural network, learning method, neural plasticity,
and neuron recruitment. For this reason, models of several functional
areas are proposed.
The V1 model exploits the interaction among binocular neurons to
compute the disparity map, the PPC model computes the coordinate transformation between different frames of reference, and the
model based on Hering's law computes the visuomotor mapping and
the arm trajectory to reach the perceived target. The PPC and the
Hering-based models have partially overlapping functionalities, because
both compute a visuomotor mapping and follow a motor babbling
schema for their learning. However, the PPC model builds its mapping
following an unsupervised learning schema, whereas the Hering-based
model is trained with a classical supervised technique. Furthermore,
the PPC model has internal parameters to be tuned that are not related
to the controlled body, whereas the Hering-based model must estimate
different parameters for different body shapes, meaning that the PPC
model has a certain degree of independence from the robot shape. Nevertheless, the Hering-based model also computes the arm trajectory in
3D space for target reaching, handling the problem's complexity through
the minimization of a cost function.
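As a purely illustrative example of the kind of coordinate transformation the PPC model deals with, the following one-dimensional sketch maps a retinal target position into a head-centred position using the eye-in-head position; gain-field models (e.g. [95, 102]) compute this kind of transformation implicitly through eye-position-modulated retinal responses. The numbers below are arbitrary.

import numpy as np

r = np.deg2rad(10.0)    # retinal eccentricity of the target (illustrative)
e = np.deg2rad(-15.0)   # eye-in-head position (illustrative)
h = r + e               # head-centred direction of the target (1D simplification)
print(np.rad2deg(h))    # approximately -5 degrees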
On the other hand, the V1 model deals with a widely studied brain
area that processes the visual information coming from both eyes. With
respect to both the PPC and the Hering-based models, the V1 model has a predefined underlying neural network, and the focus is on the mechanisms
that filter the population responses in order to obtain a robust
estimate.
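For readers unfamiliar with the disparity energy model underlying this kind of V1 network (e.g. [85, 86, 35]), the following toy sketch computes the response of a phase-shift binocular energy neuron to a pair of one-dimensional image patches; filter parameters and stimuli are illustrative and do not correspond to the thesis model.

import numpy as np

def gabor(x, sigma=2.0, freq=0.25, phase=0.0):
    # 1D Gabor receptive field
    return np.exp(-x**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * x + phase)

def energy_response(left_patch, right_patch, x, dphi):
    # quadrature pair of binocular simple cells; the right receptive field
    # is phase-shifted by dphi with respect to the left one
    l = np.array([np.sum(left_patch  * gabor(x, phase=p)) for p in (0.0, np.pi / 2)])
    r = np.array([np.sum(right_patch * gabor(x, phase=p + dphi)) for p in (0.0, np.pi / 2)])
    s = l + r                      # binocular simple-cell responses
    return float(np.sum(s ** 2))   # complex-cell (energy) response

x = np.arange(-8.0, 9.0)
left  = gabor(x, sigma=3.0)         # toy luminance profile seen by the left eye
right = gabor(x - 1.0, sigma=3.0)   # same profile shifted by one pixel (true disparity)

# a small population of neurons tuned to different phase disparities
for dphi in np.linspace(-np.pi, np.pi, 9):
    print(f"dphi={dphi:+.2f}  energy={energy_response(left, right, x, dphi):.3f}")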
The second aspect investigated in depth in this work is a cognitive architecture. The main difference with respect to several proposals in the literature
is that a certain degree of cognition emerges from modelling the
interaction among several brain areas. The involved areas are the thalamus,
which deals with the developed goals and behaviours, the amygdala,
which deals with the innate goals and with emotion, and the cortex,
which is a massive network for information processing. Even though
the experiments on the low-level computational mechanisms deal with
cortical areas, in the cognitive experiment I have not focused on the specific low-level mechanisms but have implemented a simplified version
of the cortical areas. This simplifies the experiment's complexity
and allows focusing on the generation of new goals and behaviours arising from the
interaction among the thalamus, the amygdala, and the cortex.
In order to discuss the obtained results, they should be split
into two main categories: the results obtained in each single experiment
compared with the state of the art, and the results obtained from the
comparison among experiments at the low level and at the cognitive level.
The V1 model explicitly computes the disparity map, including several
well-known mechanisms of the primary visual cortex and improving on previously published results. Even though experiments on real data show
good performance, it remains far from that of classical techniques based on
computer vision. The PPC model introduces, for the first time to
the best of my knowledge, a neural architecture that is trained with an
unsupervised method based on Hebbian learning. It is able to learn
from scratch the visuomotor mapping between the eye frame of reference
and the arm frame of reference. Quantitative results are provided, but the
comparison with the literature is only qualitative, due to the lack of previous quantitative results. On the other hand, the Hering-based model
provides a possible strategy to learn a visuomotor mapping and to reach
a perceived target knowing only its retinal position. This
work is compared with other works, and quantitative data and statistics
are provided. A qualitative analysis with respect to the other works
shows that this method is particularly suitable due to its robustness in
the exploration of the surrounding space. Last, the cognitive architecture has been tested in two scenarios in which it should be able to develop
new behaviours, starting from an innate criterion. In the first scenario,
a qualitative analysis, supported by quantitative results, points out the
capability of the system to both generate new goals and improve its
neural representation of the sensory information through the interaction
with the environment. In the second scenario, the architecture has been
able both to learn a sensorimotor mapping through the interaction with
the environment while pursuing its own goals and to learn motor patterns.
On the other hand, the comparative analysis of the developed models,
both cortical and cognitive, gives interesting results. First, the low-level
computational models are implemented over computational frameworks
that differ from each other, yet the results, compared with those of
the state of the art, show a performance improvement. Second, these
models can be applied, up to a training phase, to different technological solutions; for example, the camera calibration parameters are not
needed by the V1 model, and the type of actuation is irrelevant for both
the PPC and the Hering-based models. Third, the cognitive architecture
adapts to the specific body shape, exploiting its morphology to produce
intelligent behaviours, without requiring specific knowledge of the cortical representation of the incoming sensory information flow. Fourth, the
comparison among low-level computational mechanisms points out the
flexibility of unsupervised techniques, because they give rise to both
neuron receptive fields and network properties that are similar to
the biological counterpart. Fifth, the cognitive architecture is a fully
biologically inspired architecture, and cognition emerges
over a common neural lattice.
The previous results address only part of the problem, focusing on specific
brain areas without investigating the big picture. How to merge different levels of computation, both cognitive and low-level, is one of the big
challenges in biologically inspired robotics, and a qualitative comparison
among several approaches is the first step of this roadmap proposal.
Even though several approaches, either supervised or not, exist for developing the neural mechanisms, it is worth noting that the learning
strategies are different methods to exploit the same underlying architecture (see, for example, the similarities between [76] and [51]). In fact,
all the proposed models share the same computational principles, such
as population coding, feedback connections, and so on. Of course, this
is not an exhaustive list of principles, and further efforts are required to
develop a unique, homogeneous computational framework of the brain;
the proposed roadmap is the basis for further experiments to investigate
both the computational mechanisms and the motivations for using them.
From the life sciences point of view, these models can help to give
an insight into the mechanisms underlying both low-level computation and the emergence of cognition. These models are based on the most
recent advances in computational neuroscience, even though they are
highly speculative. Each model implements mechanisms that are well known in
neuroscience, with a specific focus on both improving the performance
and making hypotheses on the neural organization. For each of them I
have introduced several improvements, in terms of both new computational mechanisms and neural architecture. The V1 model introduces
the scale pooling mechanism, the PPC model is based on an unsupervised learning method, the Hering-based model introduces the decoupled
control of the eyes and the neck, and the IDRA model proposes a way
of interaction among several brain areas, speculating on both the functionality and the timing of the interaction. All of these properties, even
though based on previously developed brain models, should be validated
through a comparison with the biological counterpart. In fact, scientists
should confirm or reject the hypotheses that I have introduced in my
models by means of the analysis of real data.
From an engineering point of view, it could be possible to integrate
the proposed low-level computational models into a single architecture.
However, this is beyond the aim of this thesis, which has investigated how bioinspiration is approached by engineers and how a
comparison among different techniques could emerge from a more plausible shared framework (see Table 3.1). In my opinion, the unsupervised
approach with a common neural lattice is the way to approach at least
the low-level problem. In Chapter 5 I have shown how
a neural architecture resembling the biological counterpart can emerge,
with emergent computational mechanisms such as gain fields. Moreover,
the comparison between [76] and [51] points out the suitability of designing a biologically inspired model of the primary visual cortex trained
in an unsupervised fashion. There is currently no conclusive evidence
that the unsupervised approach is the actual technique implemented
in the brain for the organization of the network architecture, but it is
quite clear that the unsupervised approach works for the definition of
the receptive fields.
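As a toy illustration of how an unsupervised, Hebbian-like rule can shape receptive fields, the sketch below applies Oja's rule to random two-dimensional inputs: the weight vector converges towards the leading principal direction of the input statistics. This is a textbook example, not the learning rule used in the thesis models.

import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[3.0, 1.0],
                [1.0, 1.0]])                       # toy input covariance
data = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

w = rng.normal(size=2)                             # random initial weights
eta = 0.01
for x in data:
    y = w @ x                                      # neuron output
    w += eta * y * (x - y * w)                     # Oja's rule: Hebbian term with decay

print(w / np.linalg.norm(w))                       # close to the leading eigenvector, up to sign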
On the other hand, the cognitive architecture is built on the same
neural lattice as the cortical models; it uses unsupervised learning to
develop new goals, and self-generating reinforcement learning to
represent the innate criteria. This indicates that cognition is highly correlated with low-level computation, such as sensorimotor mapping, and that it
could overcome high-level cognitive architectures based, for example, on
Bayesian networks and purely probabilistic methods. A Bayesian brain
could be considered biologically inspired too, since it performs computations whose output is similar to that of the biological counterpart,
especially in coordination tasks [61]. However, it is quite clear that a
Bayesian brain does not model the complexity of a cognitive agent built
on the same neural lattice, and it does not take into account how
cognition arises in the network from the interaction with the environment.
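For comparison, the toy calculation below shows the kind of computation a Bayesian-brain account [61] assumes for cue integration: two noisy estimates of the same quantity are combined with weights proportional to their precisions. The numbers are arbitrary.

mu_v, sigma_v = 10.0, 2.0   # e.g. a visual estimate and its standard deviation
mu_p, sigma_p = 14.0, 1.0   # e.g. a proprioceptive estimate

w_v = sigma_v**-2 / (sigma_v**-2 + sigma_p**-2)    # precision weight of the visual cue
mu_hat = w_v * mu_v + (1.0 - w_v) * mu_p           # combined estimate: 13.2
sigma_hat = (sigma_v**-2 + sigma_p**-2) ** -0.5    # combined uncertainty: about 0.89
print(mu_hat, sigma_hat)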
Beyond the results presented in this thesis, the roadmap indicates
further steps in the development of a cognitive architecture that merges
the cognitive aspects, the low-level computations, and the morphology of
the robot. First, a common neural lattice should be provided for the
computational models that mimic functional brain areas, given a common learning strategy, e.g. unsupervised. Second, these computational
models should be integrated into the cognitive neural architecture. Third,
experiments on different robotic setups must be performed to validate a
truly biologically inspired architecture.
Bibliography
[1] R. Adolphs and M. Spezio. Role of the amygdala in processing
visual social stimuli. Progress in Brain Research, 156:363–378,
2006.
[2] T. D. Albright, E. C. Kandel, and M. I. Posner. Cognitive neuroscience. Current Opinion in Neurobiology, 10:612–624, 2000.
[3] J. S. Albus. A theory of cerebellar function. 1971.
[4] R. A. Andersen and H. Cui. Intention, action planning, and decision making in parietal-frontal circuits. Neuron, 63:568–583, 2009.
[5] R. A. Andersen, G. K. Essick, and R. M. Siegel. Encoding of
spatial location by posterior parietal neurons. Science, 230:456–
458, 1985.
[6] G. Aragon-Camarasa, H. Fattah, and J. P. Siebert. Towards a
unified visual framework in a binocular active robot vision system.
Robotics and Autonomous Systems, 58(3):276–286, 2010.
[7] B. D. Argall, S. Chernova, M. Veloso, and B. Browning. A survey
of robot learning from demonstration. Robotics and Autonomous
Systems, 57:469–483, 2009.
[8] M. Asada, K. Hosoda, Y. Kuniyoshi, H. Ishiguro, T. Inui,
Y. Yoshikawa, M. Ogino, and C. Yoshida. Cognitive developmental robotics: A survey. IEEE Transactions on Autonomous Mental
Development, 1(1):12–34, 2009.
[9] Y. Bar-Cohen. Biological senses as inspiring model for biomimetic
sensors. IEEE Sensors Journal, 11(12):3194–3201, 2011.
[10] P. Bernier and G. T. Grafton. Human posterior parietal cortex
flexibly determines reference frames for reaching based on sensory
contex. Neuron, 68:776–788, 2010.
[11] D. F. Bjorklund. Children’s Thinking: Cognitive Development and
Individual Differences. 2004.
[12] M. Brozovic, L. F. Abbott, and R. A. Andersen. Mechanism of
gain modulation at single neuron and network levels. Journal of
Computational Neuroscience, 25:158–168, 2008.
[13] C. A. Buneo and R. A. Andersen. The posterior parietal cortex:
Sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia, 44:2594–2606, 2006.
[14] S. A. Bunge. How we use rules to select actions: A review of
evidence from cognitive neuroscience. Cognitive, Affective, and
Behavioural Neuroscience, 4(4):564–579, 2004.
[15] M. Carandini and D. J. Heeger. Normalization as a canonical
neural computation. Nature Reviews Neuroscience, 13:51–62, 2012.
[16] W. Chaney. Dynamic Mind. 2007.
[17] L. L. Chen. Head movements evoked by electrical stimulation in
the frontal eye field of the monkey: evidence for independent eye
and head control. Journal of Neurophysiology, 95:3528–3542, 2006.
[18] S. Chen, Y. Li, and N. M. Kwok. Active vision in robotic systems:
A survey of recent developments. The International Journal of
Robotics Research, 2011.
[19] Y. Chen and N. Qian. Coarse-to-fine disparity energy model with
both phase-shift and position-shift receptive field mechanisms.
Neural Computation, 16:1545–1577, 2004.
[20] M. Chessa, S. P. Sabatini, and F. Solari. A fast joint bioinspired
algorithm for optic flow and two-dimensional disparity estimation.
Computer Vision System, pages 184–193, 2009.
[21] E. Chinellato, M. Antonelli, B. J. Grzyb, and A. P. del Pobil. Implicit sensorimotor mapping of the peripersonal space by gazing
and reaching. IEEE Transactions on Autonomous Mental Development, 3(1):43–53, 2011.
[22] E. Chinellato, B. J. Grzyb, N. Marzocchi, A. Bosco, P. Fattori,
and A. P. del Pobil. The dorso-medial visual stream: From neural
activation to sensorimotor interaction. Neurocomputing, 74:1203–
1212, 2011.
[23] R. Chrisley. Embodied artificial intelligence. Artificial Intelligence,
2003.
[24] R. Chrisley. Philosophical foundations of artificial consciousness.
Artificial Intelligence in Medicine, 44:119–137, 2008.
[25] M. M. Churchland, J. P. Cunningham, M. T. Kaufman, J. D. Foster,
P. Nuyujukian, S. I. Ryu, and K. V. Shenoy. Neural population
dynamics during reaching. Nature, 487:51–56, 2012.
[26] P. Cisek and J. F. Kalaska. Neural mechanisms for interacting
with a world full of action choices. Annual Review of Neuroscience,
33:269–298, 2010.
[27] B. G. Cumming and G. C. DeAngelis. The physiology of stereopsis.
Annual Review of Neuroscience, 24:203–238, 2001.
[28] K. De Meyer and M. W. Spratling. Multiplicative gain modulation
arises through unsupervised learning in a predictive coding model
of cortical function. Neural Computation, 23:1536–1567, 2011.
[29] S. Deneve, P. E. Latham, and A. Pouget. Efficient computation
and cue integration with noisy population codes. Nature Neuroscience, 4:826–831, 2001.
[30] J. Doyon et al. Contributions of the basal ganglia and functionally related brain structures to motor learning. Behavioral Brain
Research, 199(1):61–75, 2009.
[31] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification.
John Wiley and Sons, 2001.
[32] S. Duncan and L. F. Barrett. The role of the amygdala in visual
awareness. Trends in Cognitive Sciences, 11(5):190–192, 2007.
[33] E. V. Evarts. Relation of pyramidal tract activity to force exerted during voluntary movement. Journal of Neurophysiology,
31(1):14–27, 1968.
[34] J. R. Flanagan and D. J. Ostry. Trajectories of human multi-joint
arm movements: Evidence of joint level planning. Experimental
Robotics I, page 594–613, 1990.
[35] D. Fleet, H. Wagner, and T. Sejnowski. Neural encoding of binocular disparity: Energy models, position shifts and phase shifts.
Vision Research, 36:1839–1857, 1996.
[36] D. Floreano, P. Durr, and C. Mattiussi. Neuroevolution: from
architectures to learning. Evolutionary Intelligence, 1:47–62, 2008.
[37] D. Floreano and L. Keller. Evolution of adaptive behaviour in
robots by means of darwinian selection. PLoS Biology, 8:1–8, 2010.
[38] A. Georgopoulos, J. Kalaska, R. Caminiti, and J. Massey. On
the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. Journal of Neuroscience, 2:1527–1537, 1982.
[39] A. P. Georgopoulos, A. B. Schwartz, and R. E. Kettner. Neuronal
population coding of movement direction. Science, 233:1416–1419,
1986.
[40] A. Gibaldi, A. Canessa, M. Chessa, S. P. Sabatini, and F. Solari. A neuromorphic control module for real-time vergence eye
movements on the icub robot head. In Proc. 11th IEEE-RAS Int
Humanoid Robots (Humanoids) Conf, pages 543–550, 2011.
[41] A. Gibaldi, M. Chessa, A. Canessa, S. P. Sabatini, and F. Solari.
A cortical model for binocular vergence control without explicit
calculation of disparity. Neurocomputing, 73:1065–1073, 2010.
[42] C. Glaser, F. Joublin, and C. Goerick. Learning and use of sensorimotor schemata maps. In Proc. IEEE 8th Int. Conf. Development
and Learning ICDL 2009, pages 1–8, 2009.
[43] D. Goldberg. Genetic Algorithms in Search, Optimization, and
Machine Learning. Addison-Wesley: Reading, 1989.
[44] C. B. Hart and S. F. Giszter. Modular premotor drives and unit
bursts as primitives for frog motor behaviors. The Journal of
Neuroscience, 24:5269–5282, 2004.
[45] N. G. Hatsopoulos and A. J. Suminski. Sensing with the motor
cortex. Neuron, 72:477–487, 2011.
[46] N. J. Hemion, F. Joublin, and K. J. Rohlfing. A competitive
mechanism for self-organized learning of sensorimotor mappings.
In Proc. IEEE Int Development and Learning (ICDL) Conf, volume 2, pages 1–6, 2011.
[47] A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117:500–544, 1952.
[48] M. Hoffmann, H. Marques, A. Arieta, H. Sumioka, M. Lungarella,
and R. Pfeifer. Body schema in robotics: A review. IEEE Transactions on Autonomous Mental Development, 2(4):304–324, 2010.
[49] P. O. Hoyer and A. Hyvarinen. Independent component analysis applied to feature extraction from colour and stereo images.
Network: Computation in Neural Systems, 11:191–210, 2000.
[50] T. C. Hsia. Adaptive control of robot manipulators - a review. In
International Conference on Robotics and Automation, 1986.
[51] A. Hyvarinen and P. O. Hoyer. A two-layer sparse coding model
learns simple and complex cell receptive fields and topography
from natural images. Vision Research, 41:2413–2423, 2001.
[52] A. Hyvarinen, P. O. Hoyer, and M. Inki. Topographic independent
component analysis. Neural Computation, 13:1527–1558, 2001.
[53] A. Hyvärinen and E. Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13:411–430, 2000.
[54] A. J. Ijspeert, J. Nakanishi, and S. Schaal. Movement imitation with
nonlinear dynamical systems in humanoid robots. In IEEE International Conference on Robotics and Automation, 2002.
[55] E. M. Izhikevich. Which model to use for cortical spiking neurons?
IEEE Transactions On Neural Networks, 15(5), 2004.
[56] R. S. Jackendoff. Consciousness and the Computational Mind. 1987.
[57] R. S. Jackendoff and F. Lerdahl. The capacity for music: what is
it, and what’ s special about it? Cognition, 100:33–72, 2006.
[58] B. Julesz. Foundations of Cyclopean Perception. University of
Chicago Press, 1971.
[59] E. R. Kandel, J. Schwartz, and T. M. Jessell. Principles of Neural
Science. 2000.
[60] W. M. King. Binocular coordination of eye movements - Hering's
law of equal innervation or uniocular control? European Journal
of Neuroscience, 33:2139–2146, 2011.
[61] D. C. Knill and A. Pouget. The bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences,
27(12):712–719, 2004.
[62] N. Kyriakoulis, A. Gasteratos, and S. G. Mouroutsos. Fuzzy vergence control for an active binocular vision system. In Proc. 7th
IEEE Int. Conf. Cybernetic Intelligent Systems CIS 2008, pages
1–5, 2008.
[63] J. C. K. Lai, M. P. Schoen, A. Perez Gracia, D. S. Naidu, and
S. W. Leung. Prosthetic devices: Challenges and implications of
robotic implants and biological interfaces. Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering
in Medicine, 221(2):173–183, 2007.
[64] V. Lawhern, W. Wu, N. G. Hatsopoulos, and L. Paninski. Population decoding of motor cortical activity using a generalized linear model with hidden states. Journal of Neuroscience Methods,
189:267–280, 2010.
[65] T. S. Lee. Image representation using 2D Gabor wavelets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(10), 1996.
[66] M. Lungarella, G. Metta, R. Pfeifer, and G. Sandini. Developmental robotics: a survey. Connection Science, 15:151–190, 2003.
[67] M. Lungarella and O. Sporns. Mapping information flow in sensorimotor networks. PLos Computational Biology, 2, 2006.
[68] R. Manzotti and F. Mutti. Machine consciousness through goal
generation. In IEEE Symposium Series on Computational Intelligence, 2013. Accepted.
[69] R. Manzotti, F. Mutti, S. Y. Lee, and G. Gini. A model of a middle
level of cognition based on the interaction among the thalamus,
amygdala, and the cortex. In IEEE International Conference on
Systems, Man, and Cybernetics, 2012.
[70] R. Manzotti and V. Tagliasco. From behavior-based robots to
motivation-based robots. Robotics and Autonomous Systems,
51:175–190, 2005.
[71] M. Meisel, V. Pappas, and L. Zhang. A taxonomy of biologically
inspired research in computer networking. Computer Networks,
54:901–916, 2010.
[72] M. Milford, G. Wyeth, and D. Prasser. Ratslam: a hippocampal model for simultaneous localization and mapping. In IEEE
International Conference on Robotics and Automation, 2004.
[73] E. A. Murray and S. P. Wise. Interactions between orbital prefrontal cortex and amygdala: advanced cognition, learned responses and instinctive behaviors. Current Opinion in Neurobiology, 20(2):212–220, 2010.
[74] F. A. Mussa-Ivaldi and E. Bizzi. Motor learning through the combination of primitives. Philosophical Transactions of the Royal
Society Lond. B Biological Sciences, 355:1755–1769, 2000.
[75] F. Mutti, C. Alessandro, M. Angioletti, A. Bianchi, and G. Gini.
Learning and evaluation of a vergence control system inspired by
hering’s law. In IEEE Int. Conf. on Biomedical Robotics and
Biomechatronics (BIOROB), 2012.
[76] F. Mutti and G. Gini. Bio-inspired disparity estimation system
from energy neurons. In ICABB, 2010.
[77] F. Mutti and G. Gini. Bio-inspired vision system for depth perception in humanoids. In CogSys, 2010. poster.
[78] F. Mutti and G. Gini. Bioinspired vergence control system: learning and quantitative evaluation. In CogSys, 2012. poster.
[79] F. Mutti and G. Gini. Visuomotor mapping based on hering’s law
for a redundant active stereo head and a 3 dof arm. In BIONETICS, 2012.
[80] F. Mutti, G. Gini, M. Burrafato, L. Florio, and R. Manzotti. Developing new sensor-motor goals in a bioinspired architecture for
evolutionary agents. Unpublished, 2013.
[81] F. Mutti, R. Manzotti, G. Gini, and S. Y. Lee. Implementation
and evaluation of a goal-generating agent through unsupervised
learning on nao robot. In Biologically Inspired Cognitive Architectures, 2012.
[82] F. Mutti, H. Marques, and G. Gini. A model of the visual dorsal
pathway for computing coordinate transformations: an unsupervised approach. In Biologically Inspired Cognitive Architectures,
2012.
[83] A. L. Nelson, G. J. Barlow, and L. Doitsidis. Fitness functions
in evolutionary robotics: A survey and analysis. Robotics and
Autonomous Systems, 57:345–370, 2009.
[84] R. Nieuwenhuys, J. Voogd, and C. van Huijzen. The Human Central Nervous System: A Synopsis and Atlas. Steinkopff, Amsterdam, 2007.
[85] I. Ohzawa. Mechanisms of stereoscopic vision: the disparity energy
model. Current Opinion in Neurobiology, 1998.
[86] I. Ohzawa, G. C. De Angelis, and R. D. Freeman. Stereoscopic
depth discrimination in the visual cortex: neurons ideally suited
as disparity detectors. Science, 249:1037–1041, 1990.
[87] B. A. Olshausen and D. J. Field. Sparse coding of sensory inputs.
Current Opinion in Neurobiology, 14:481–487, 2004.
[88] L. Paninski, M. R. Fellows, N. G. Hatsopoulos, and J. P.
Donoghue. Spatiotemporal tuning of motor cortical neurons for
hand position and velocity. Journal of Neurophysiology, 91:515–
532, 2004.
[89] A. E. Patla, T. W. Calvert, and R. B. Stein. Model of a pattern
generator for locomotion in mammals. AJP - Regulatory Integrative and comparative Physiology, 248:484–494, 1985.
[90] W. Penfield and E. Boldrey. Somatic motor and sensory representation in the cerebral cortex of man as studied by electrical
stimulation. Brain, 60:389–443, 1937.
[91] R. Pfeifer and F. Iida. Morphological computation: connecting
body, brain, and environment. Japanese Scientific Monthly, 2005.
[92] R. Pfeifer, F. Iida, and J. Bongard. New robotics: Design principles for intelligent systems. Artificial Life, 11:99–120, 2005.
[93] R. Pfeifer, M. Lungarella, and F. Iida. Self-organization, embodiment, and biologically inspired robotics. Science, 318:1088–1093,
2007.
[94] R. Pfeifer and C. Scheier. Understanding intelligence. The MIT
Press, 2001.
[95] A. Pouget and T. J. Sejnowski. Spatial transformations in parietal
cortex using basis functions. Journal of Cognitive Neuroscience,
9(2):222–237, 1997.
[96] N. Qian. Computing stereo disparity and motion with known
binocular cell properties. Neural Computation, 6:390–404, 1994.
[97] N. Qian. Binocular disparity and the perception of depth. Neuron,
18:359–368, 1997.
[98] C. Qu and B. E. Shi. The role of orientation diversity in binocular
vergence control. In Proc. Int Neural Networks (IJCNN) Joint
Conf, pages 2266–2272, 2011.
[99] P. Rakic. Evolution of the neocortex: a perspective from developmental biology. Nature Reviews Neuroscience, 10:724–735, 2009.
[100] R. Saegusa, G. Metta, and G. Sandini. Own body perception
based on visuomotor correlation. In Proc. IEEE/RSJ Int Intelligent Robots and Systems (IROS) Conf, pages 1044–1051, 2010.
[101] R. Saegusa, G. Metta, G. Sandini, and S. Sakka. Active motor babbling for sensorimotor learning. In Proc. IEEE Int. Conf. Robotics
and Biomimetics ROBIO 2008, pages 794–799, 2009.
[102] E. Salinas and L. F. Abbott. Coordinate transformations in the
visual system: How to generate gain fields and what to compute
with them. Progress Brain Research, 130:175–190, 2001.
[103] J. G. Samarawickrama and S. P. Sabatini. Version and vergence
control of a stereo camera head by fitting the movement into the
Hering’s law. In Proc. Fourth Canadian Conf. Computer and Robot
Vision CRV ’07, pages 363–370, 2007.
[104] S. Schaal, A. Ijspeert, and A. Billard. Computational approaches
to motor learning by imitation. Philosophical Transactions of the
Royal Society Lond. B Biological Sciences, 358:537–547, 2003.
[105] S. Schaal, J. Peters, J. Nakanishi, and A. J. Ijspeert. Learning
movement primitives, page 561–572. 2005.
[106] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense
two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47:7–42, 2002.
[107] T. Sejnowski, C. Koch, and P. S. Churchland. Computational
neuroscience. Science, 241:1299–1306, 1988.
[108] R. Shadmehr and J. W. Krakauer. A computational neuroanatomy
for motor control. Experimental Brain Research, 185:359–381,
2008.
[109] J. Sharma, A. Angelucci, and M. Sur. Induction of visual orientation modules in auditory cortex. Nature, 404:841–849, 2000.
[110] S. Sherman and R. Guillery. Exploring the Thalamus. 2000.
[111] K. Shimonomura, T. Kushima, and T. Yagi. Binocular robot vision emulating disparity computation in the primary visual cortex.
Neural Networks, 21:331–340, 2008.
[112] K. Shimonomura and T. Yagi. Neuromorphic vergence eye movement control of binocular robot vision. In Proc. IEEE Int Robotics
and Biomimetics (ROBIO) Conf, pages 1774–1779, 2010.
[113] B. Siciliano, L. Sciavicco, L. Villani, and G. Oriolo. Robotics:
modelling, planning and control. 2011.
[114] E. P. Simoncelli. Vision and the statistics of the visual environment. Current Opinion in Neurobiology, 13:144–149, 2003.
[115] O. Sporns. Networks of the Brain. 2010.
[116] P. S. G. Stein. Neuronal control of turtle hindlimb motor rhythms.
Physiology A: Neuroethology, Sensory, Neural, and Behavioral
Physiology, 191:213–229, 2005.
[117] W. Sun and B. E. Shi. Joint development of disparity tuning and
vergence control. In Proc. IEEE Int Development and Learning
(ICDL) Conf, volume 2, pages 1–6, 2011.
[118] K. A. Thoroughman and R. Shadmehr. Learning of action through
adaptive combination of motor primitives. Nature, 407:742–747,
2000.
[119] E. Todorov. Bayesian Brain, chapter Optimal Control Theory,
pages 269–298. 2006.
[120] E. Todorov and M. I. Jordan. Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5(11):1226–1235,
2002.
[121] J. Tsai and J. D. Victor. Reading a population code: A multiscale neural model for representing binocular disparity. Vision
Research, 43:445–466, 2003.
[122] E. K. C. Tsang, S. Y. M. Lam, Y. Meng, and B. E. Shi. Neuromorphic implementation of active gaze and vergence control. In
Proceedings of the IEEE International Symposium on Circuits and
Systems, pages 1076–1079, 2008.
[123] E. K. C. Tsang and B. E. Shi. Estimating disparity with confidence
from energy neurons. Advances in Neural Information Processing
Systems, 20, 2007.
[124] E. K. C. Tsang and B. E. Shi. Disparity estimation by pooling
evidence from energy neurons. IEEE Transactions On Neural Network, 20(11), 2009.
[125] K. Vassie and G. Morlino. Natural and Artificial Systems: Compare, Model or Engineer? 2012.
[126] D. Vernon, G. Metta, and G. Sandini. A survey of artificial cognitive systems: Implications for the autonomous development of
mental capabilities in computational agents. IEEE Transactions
On Evolutionary Computation, 11(2):151–180, 2007.
[127] J. F. V. Vincent, O. A. Bogatyreva, N. R. Bogatyrev, A. Bowyer,
and A. K. Pahl. Biomimetics: its practice and theory. Journal of
Royal Society Interface, 3, 2006.
[128] Y. Wang and B. E. Shi. Autonomous development of vergence
control driven by disparity energy neuron populations. Neural
Computation, 22:730–751, 2010.
[129] Y. Wang and B. E. Shi. Improved binocular vergence control via a
neural network that maximizes an internally defined reward. IEEE
Transactions on Autonomous Mental Development, 3(3):247–256,
2011.
[130] B. Webb. Can robots make good models of biological behaviour?
Behavioural and Brain Sciences, 24(6):1033–1050, 2001.
[131] J. Weng, J. McClelland, A. Pentland, O. Sporns, I. Stockman,
M. Sur, and E. Thelen. Autonomous mental development by
robots and animals. Science, 291:599–600, 2001.
[132] W. Wu, Y. Gao, E. Bienenstock, J. P. Donoghue, and M. J. Black.
Bayesian population decoding of motor cortical activity using a
kalman filter. Neural Computation, 18(1):80–118, 2006.
[133] W. Wu, J. E. Kulkarni, N. G. Hatsopoulos, and L. Paninski. Neural decoding of hand motion using a linear state-space model with
hidden states. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 17(4):370–378, 2009.
[134] J. Xing and R. A. Andersen. Models of the posterior parietal
cortex which perform multimodal integration and represent space
in several coordinate frames. Journal of Cognitive Neuroscience,
12(4):601–614, 2000.
[135] D. Zipser and R. A. Andersen. A back-propagation programmed
network that simulates response properties of a subset of posterior
parietal neurons. Nature, 331:679–684, 1988.