Cogn Comput
1
A Basal Ganglia Network Centric Autonomous Learning
Model and Its Application in Unmanned Aerial Vehicle
Conflict of Interest: Yi Zeng declares that he has no conflict of
interest. Guixiang Wang declares that she has no conflict of
interest. Bo Xu declares that he has no conflict of interest.
Yi Zeng · Guixiang Wang · Bo Xu
Abstract Autonomous learning paradigms bring flexibility and
generality to machine learning, but most of them are driven by
mathematical optimization and lack cognitive evidence. In order
to provide a more cognitively grounded foundation, in this paper
we develop a basal ganglia network centric autonomous learning
model. Compared to existing work on modeling the basal ganglia,
the work in this paper is unique in the following respects:
(1) Our work takes the orbitofrontal cortex (OFC) into
consideration. The orbitofrontal cortex is critical in decision
making because of its responsibility for reward representation,
and is critical in control of the learning process, yet most
basal ganglia models do not include it. (2) To compensate for
the inaccurate memory of numeric values, a method called precise
encoding is proposed in this paper to help the working memory
retain the most important values during the learning process.
The method combines vector convolution with the idea of
digit-by-digit storage and is efficient for accurate value
storage. (3) In the information coding process, the
Hodgkin-Huxley model is used to obtain a more biologically
plausible description of the action potential, with rich ionic
dynamics. To validate the effectiveness of the proposed model,
we apply our basal ganglia network centric autonomous learning
model to the Unmanned Aerial Vehicle (UAV) autonomous learning
process in a 3D environment. We build the state, action and
reward spaces for the UAV in the environment. Experimental
results show that our model gives the UAV the ability of free
exploration in the environment after an average of 41 training
trials.
Keywords
autonomous learning model · basal ganglia
network · precise encoding · UAV autonomous learning ·
reinforcement learning · interactive environment.
Yi Zeng(*) · Bo Xu
Institute of Automation, Chinese Academy of Sciences, Beijing,
100190, China, and CAS Center for Excellence in Brain Science
and Intelligence Technology, Chinese Academy of Sciences,
Shanghai, 200031, China
email: [email protected]
Guixiang Wang
Institute of Automation, Chinese Academy of Sciences, Beijing,
100190, China
email: [email protected]
The first and the second authors contributed equally to this work,
and serve as co-first authors.
I. Introduction
Human brains are general information processing systems with
high intelligence and creativity, and can deal with various
complex cognitive tasks. This is what traditional artificial
intelligence lacks and should draw inspiration from.
Brain-inspired neural networks have attracted more and more
attention in recent years, since they provide new opportunities
to achieve the goal of general intelligence. The neural network
architecture of the brain supports the realization of cognitive
behaviors at multiple scales. Although spiking neural networks
have been adopted for cognitive behavior simulation and for
creating intelligent applications [7, 11, 20], they are
generally considered to take inspiration from the brain at the
microscopic scale. In most models, macroscopic inspiration,
which emphasizes the coordination mechanisms of different brain
regions, is missing, especially for computational learning
models. In this paper, we propose a basal ganglia centric
computational model for autonomous learning, and we apply the
proposed model to the autonomous learning process of the
Unmanned Aerial Vehicle (UAV).
Among the hundreds of brain areas, the basal ganglia (BG) play a
central role in cognition and are related to the realization of
various cognitive functions such as action selection and
reinforcement learning [1-4, 6]. The basal ganglia do not
perform these cognitive tasks alone. When selecting actions or
performing a reinforcement learning task, the basal ganglia work
together with their main input from the cortex and their main
output to the thalamus [34]. In fact, the basal ganglia are
associated with wide areas of the cortex [11], which receive the
output of the basal ganglia via the thalamus [5, 8, 9, 11],
forming the cortex-basal ganglia-thalamus loop. In this paper,
we name this loop the basal ganglia circuitry.
The basal ganglia are a set of highly interconnected subcortical
nuclei located in the midbrain, around the thalamus [4, 6]. The
basal ganglia are implicated in diverse functions [5], among
which action selection and reward-based learning have received
more attention in recent years [1-4, 6, 17]. These two processes
are closely related to reinforcement learning. In the learning
process, the basal ganglia form a modulatory system, modulated
by dopamine (DA). Dopamine dynamically modulates activity in the
basal ganglia and controls the learning process [4, 5]. Damage
to the basal ganglia reduces dopamine release, which leads to
cognitive deficits such as Parkinson's disease (PD) [8, 10].
This indicates the key influence of the basal ganglia on human
cognition.
Cognition studies on the basal ganglia circuitry have produced a
variety of models. The "box and arrow" model [18, 19] is an
early and plain description of the basal ganglia based on
anatomical data. Following this previous study, more complicated
basal ganglia models [8, 10, 12, 17, 19, 20] have been developed
by adding additional connections among nuclei or more areas to
the basal ganglia circuitry. They are built using artificial
neural networks [8, 12, 19], conductance-based biological neuron
models [17] or spiking neural networks [20].
However, these models still leave room for improvement. Most of
them are built using artificial neural networks, which cannot
give a good description of neural activity; ionic-level neuron
models are not used; they usually do not take all the important
brain regions into account, especially the orbitofrontal cortex
(OFC); and they do not have a real memory for precise value
storage. Besides, the problems these models can handle are
simpler than real situations.
Based on the unsolved issues above, we propose our basal ganglia
model with several major contributions, listed below.
Firstly, the Hodgkin-Huxley (H-H) model [28, 29] is applied in
our model to build the basal ganglia network. It has richer
ionic dynamics than other models such as the leaky
integrate-and-fire (LIF) model and can make better predictions
of neural activity [30]. Changes in the ionic conductances of
sodium (Na+) and potassium (K+) influence the cognition process.
Simulations of the ionic channels in the H-H model can provide a
better explanation of this influence on cognition.
Secondly, the orbitofrontal cortex (OFC) is taken into account,
while most basal ganglia models do not consider it. The
orbitofrontal cortex (OFC) is a part of the prefrontal cortex
(PFC) [34] and is associated with reward information
representation [8, 21, 22] in decision making and reinforcement
learning. A more comprehensive basal ganglia model including the
OFC and the amygdala is developed in this paper. The OFC in our
model receives the relative magnitude of reinforcement values
from the amygdala [22] and represents gain-loss information in
the memory, including positive reward and punishment. Like the
modulatory effect of dopamine (DA), the OFC modulates both the
basal ganglia and the PFC [22] and is a critical brain area in
reinforcement learning.
Thirdly, a precise encoding mechanism based on working memory is
proposed in this paper. The PFC undertakes the task of working
memory to store values and parameters in the reinforcement
learning process [3, 4, 8]. The working memory achieved in
[4, 14] cannot remember precise values, while many values need
to be remembered precisely, as humans do. Our method of precise
encoding handles this problem well. Q-values, rewards, and other
important values associated with the reinforcement learning
process are stored in memory using the precise encoding
mechanism. Q-values in this paper form the state-action matrix
adopted from reinforcement learning [2, 34, 39]. The
state-action pairs are used for action selection. An action is
chosen when it has a higher Q-value than the others in a given
state.
Last but not least, unlike most cognitive experiments, which
deal with relatively simple decision-making tasks [1, 4, 8], our
model's application to UAV autonomous learning in a 3D
environment is more complicated and full of realistic
significance. Intelligent agents are expected to help people
with repetitive or dangerous tasks [43, 44]. The latter is what
we are more concerned about. In recent years, brain-inspired
models have been applied to decision-making processes and many
are associated with the basal ganglia [4, 8, 17, 39, 40]. The
study in [17] developed a basal ganglia model to control a
robot's action selection on the ground. Another robotic basal
ganglia model [41] for the simulation of rat food search was
built, also on 2D ground. Different from simpler action
selection in 2D space, we make our application in a 3D
environment with a more complicated realistic task, and the
environment remains unknown to the UAV during the learning
process.
An unknown 3D environment's complexity lies in several aspects.
(1) Agents in a 3D environment have more freedom and also more
action possibilities; it is harder to train an agent in these
scenes. (2) That the environment is unknown means the UAV knows
nothing about the terrain, the mountains, or when and where the
enemies come out. It only knows the distances to other nearby
objects. (3) To get a better learning result, the number of
states for the UAV to sense nearby objects is large (hundreds of
states), which means complex situations and a long training
process.
The rest of this paper is organized as follows. Section II gives
a detailed description of the basal ganglia, from the anatomical
perspective to the mathematical model perspective. Section III
presents our methods, including the Hodgkin-Huxley model for
information coding, the precise encoding algorithm we propose,
and our basal ganglia network based autonomous learning model
with the consideration of the orbitofrontal cortex and the
working memory. The application of UAV autonomous learning under
the basal ganglia network is presented in Section IV. Section V
presents our experimental validations. Finally, Section VI
concludes the paper and discusses future work.
II. Problem Statement and Previous Work
A. The Problem
What we are concerned with is building a brain-inspired
cognitive model, which achieves intelligent behavior using
important brain structures and connections.
A brain-inspired model is another way to reach human-like
intelligence. With the brain as an anatomical instructor, this
kind of model has several benefits for intelligence research.
(1) The brain is the source that generates intelligence, and
studying brain-structure-based models helps to explore the
origin of intelligence. One can develop the function of specific
brain structures and validate them using computing algorithms or
observed data from the brain. This will benefit research on
brain-related diseases and also brain-machine interfaces. (2) It
will contribute to the study of biological mechanisms. There are
many neural evolution and communication rules in the brain, such
as spike timing-dependent plasticity (STDP) [2]. Experimental
results from brain-inspired models are much easier to observe
than those from real brains, which helps in analyzing the
relations between neuron activities and biological rules. (3)
Brain-inspired models are built using spiking neural networks,
in which the spike is a strong information-coding unit.
Information is also encoded by spikes within brains. This coding
scheme enables comparability at the neural-activity level
between brains and brain-inspired models, and will facilitate
the fusion of information obtained from different sense organs
to form a whole-brain model.
The biological brain is so remarkable and intelligent that we
can draw inspiration from its biological structures and
mechanisms. The basal ganglia in the brain are critical in
cognitive behaviors such as action selection and reinforcement
learning. Many computational models have been put forward based
on the structure and function of the basal ganglia as well as
their associated brain regions [8, 10-13, 17, 19, 20]. Most of
these models are not comprehensive in brain regions and can only
deal with simple action-selection tasks. Besides, they pay less
attention to more realistic, ionic-level neuron models such as
the Hodgkin-Huxley model.
In this paper, we build a more brain-like basal ganglia model,
with more brain regions and a more realistic neuron model. We
explain our basic idea in the following sections and finally use
this model in an unknown 3D environment to train an unmanned
aerial vehicle (UAV).
Our work is related to a basal ganglia model with a mathematical
description for action selection [13] and its expanded rate
neuron model [1, 11]. We develop our own basal ganglia model
based on this rate model because of several advantages. (1) The
rate neuron model has low computational complexity and keeps the
main biological characteristics of the basal ganglia; different
biological neural models are available for it. (2) The model's
activities usually correlate well with the activities in the
rat's basal ganglia. (3) Besides, the model can deal with a wide
range of inputs with hundreds of dimensions [1]. These
advantages facilitate the implementation of a complex
reinforcement learning process with spiking neural networks. The
application to UAV behavioral learning in this article needs
hundreds of states, and the Q-values may vary over a wide range.
Before describing our model, we introduce some basic models we
use.
B. The Neuroanatomy of the Basal Ganglia
Fig. 1. The neuroanatomy of the basal ganglia [9]. Note that the
thalamus and the cortex (more specifically the prefrontal cortex
(PFC)) are not parts of the basal ganglia. D1 and D2 are two
dopamine receptors in the striatum. GPe = the globus pallidus
external, GPi = the internal globus pallidus, STN = the
subthalamic nucleus, SNc = the substantia nigra pars compacta,
SNr = the substantia nigra pars reticulata.

Fig. 2. Mathematical computing model of the basal ganglia (the
cortex is not shown) [1]. The input vector of the basal ganglia
is from the cortex. x, y, z, u and v denote the vector outputs
from each nucleus respectively. The values in the equations are
from the mathematical model of the basal ganglia [1]. These
values make sure the basal ganglia's behavior is consistent with
biological function.
The basic components of the basal ganglia include the striatum,
the subthalamic nucleus (STN), the globus pallidus external
(GPe), and two output nuclei (the substantia nigra pars
reticulata (SNr) and the internal globus pallidus (GPi))
[4, 8-11]. Other nuclei, namely the substantia nigra pars
compacta (SNc) and the ventral tegmental area (VTA), are also
seen as part of the basal ganglia. The SNc and VTA release an
important modulatory signal named dopamine (DA), which is
critical for several cognitive behaviors [5, 8, 10]. Major
anatomical structures and components of the basal ganglia are
provided in Fig. 1.
The striatum is comprised of two nuclei, the caudate and the
putamen, which are functionally similar and often seen as two
independent regions. Anatomically and functionally, these two
components have connections with different basal ganglia
regions, forming the direct pathway and the indirect pathway.
The striatum receives direct input from cortical areas and
modulatory afferents (DA) from the SNc. There are two types of
DA receptors in the striatum, D1 and D2. The D1 receptor may
enhance the response of striatal neurons in the direct pathway,
while D2 has the opposite function in the indirect pathway
[8, 34].
The STN is also an input nucleus, as it receives input from the
cortex. It receives inhibitory afferents from the GPe and
projects excitatory efferents back to it. In addition, the STN
also projects to the GPi and the SNr, and excites both of them.
The globus pallidus (GP) is composed of two nuclei, GPe and GPi.
The GPe receives inhibitory afferents from the striatum,
excitatory afferents from the STN, and has inhibitory
connections with the STN and the GPi. The GPi is the output
nucleus of the basal ganglia. It receives inhibitory input from
the striatum and the GPe, and excitatory input from the STN. The
GPi provides inhibitory output to the thalamus and the brainstem
[10].
The SNc and VTA are regions that release dopamine (DA) to
control the response intensity of the striatum. As mentioned
before, the D1 and D2 receptors receive DA and have opposite
effects on striatal neurons.
The basal ganglia are associated with wide areas of the cortex
[11]. In the cognition process, the basal ganglia circuitry
includes the cortex and the thalamus. The basal ganglia receive
input from the prefrontal cortex and send their output to the
thalamus [34], which projects the output of the basal ganglia
back to the cortex [5, 8, 9, 11].
The basal ganglia circuitry has two different pathways, the
direct pathway and the indirect pathway. In the direct pathway,
excitatory input goes from the cortex to the D1 receptor and is
projected to the GPi/SNr directly, while the indirect pathway
goes from the D2 receptor to the GPi/SNr via relays in the GPe
and the STN [4, 10, 11].
C. The Rate Coding Model of the Basal Ganglia
Before describing our basal ganglia model, we first introduce
the mathematical model of the basal ganglia proposed by Gurney
et al. [1, 4, 13].
The selection mechanism of each nucleus in the basal ganglia
can be described as a piecewise linear equation [13]:

f(x_i) = 0,              x_i < e_i
f(x_i) = m(x_i − e_i),   e_i ≤ x_i < 1/m + e_i       (1)
f(x_i) = 1,              x_i ≥ 1/m + e_i

where f(x_i) is the output of a nucleus with input x_i, and
0 ≤ x_i ≤ 1 is a suitable value interval. The equation takes
excitatory and inhibitory connections into account and
distinguishes them with positive and negative moduli m
respectively.
The exact equations for all five nuclei are shown in Fig. 2. The
D1 and D2 receptors in the striatum receive inputs from the
cortex and scale them with respective coefficients. Since D1
excites input from the cortex and D2 has an inhibitory effect on
it, D1 has a scaling factor greater than 1 and D2 has a scaling
factor less than 1.
The input of the basal ganglia is usually a vector. Each value
in the vector represents an action, and the basal ganglia should
select one action according to these values. Every region in the
basal ganglia is a group of neurons, and each group of neurons
represents a vector. x, y, z, u and v denote the vector outputs
from each nucleus respectively. Note that these vectors have the
same dimension as the input vector. The mathematical equations
in Fig. 2 are from the model proposed by Gurney [13].
The basic mission of the basal ganglia is to select an action
according to a set of input values. The output of the basal
ganglia should be close to zero for the selected action and
positive for other actions [1, 11, 13]. Here, a larger value is
more likely to be selected. Given an input of [0.4, 0.6, 0.1]
(the value interval is [0, 1]), the second value is much larger
than the others and will be picked by the basal ganglia. Since
the output of the basal ganglia is inhibitory, the selected
action will have a value close to zero, and an output of
[0.2, 0, 0.4] may be the response of the basal ganglia.

Fig. 3. Firing rate curve of the Hodgkin-Huxley model under
various current inputs. In our experiments, the neuron starts to
have stable regular spikes when the current input is more than
7.5 μA/cm². The resting potential is set to 0 and the initial
values of V, m, n, and h are set to 0. The time constant is
0.01 ms. The simulation period is 1 s.
In the following, we show how this mathematical model can be
simulated by a neuron spike coding mechanism.
It is generally believed that biological neurons carry
information by producing complex spike sequences [1, 14, 26].
These spikes encode the input stimulus as firing rates, which
give a glance at neuron activities [23, 26]. The firing
frequency increases as the input stimulus is enhanced. With its
robustness against noise, especially inter-spike interval (ISI)
noise, firing rate coding is applied in many biological neural
studies [1, 4, 8, 11, 24, 26].
Suppose that J(x) is the current input, a function of x(t), and
G(x) is a neuron model (more specifically, we use here the H-H
model). The membrane action potential of neuron i can be written
as:

v_i(t) = G_i(J_i(x(t)))       (2)

where v_i(t) is the action potential of the neuron. The action
potential is pushed higher by the current input over time, and
the neuron fires when it reaches the threshold. The spiking
output of neuron i is denoted by:

δ_i(t) = G_i(J_i(x(t)))       (3)

and its firing rate is written as [14, 16]:

r_i(x) = h_i(t) * Σ_j δ_i(t − t_j)       (4)

where h_i(t) is the post-synaptic current function with a time
constant denoted by τ_PSC, written as [1, 16]:

h_i(t) = e^(−t/τ_PSC)       (5)
Equation (5) indicates that the effect of a neuron decays over
time.
After the encoding process, we need a decoding operation to get
what we estimate, according to the given function f(x(t)). We
can estimate the decoding output using the least-squared-error
method [1, 11, 16] or other linear decoders such as the Kalman
filter [25]. Here we use the least-squared-error calculation.
The estimated output can be expressed as:
f̂(x) = Σ_{i=1}^{N} r_i(x) d_i       (6)
where d_i denotes the linear decoders and N denotes the number
of neurons used to encode the input stimulus. The deviation
between the estimated value and the input can be written as:

E = ∫ [f(x) − f̂(x)]² dx       (7)

Through minimizing the deviation E, we obtain the least-squared
estimation of the decoders d_i:

argmin_{d_i} ∫ [f(x) − Σ_i r_i(x) d_i]² dx       (8)

For a continuous input x, the solution is given by [1, 11]:

d = Γ⁻¹Υ,  Γ_ij = ∫ r_i r_j dx,  Υ_i = ∫ r_i f(x) dx       (9)
In our basal ganglia computational model, the Hodgkin-Huxley
model is used for information coding in the algorithm provided
above. The decoders d and the firing rates r_i compose the
weights between two groups of neurons. Learning is a process
that updates these synaptic weights using an error signal, which
will be explained in Section IV.
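The decoder computation of Equations (6)-(9) can be sketched numerically. The rectified-linear tuning curves below are a stand-in assumption for the actual firing rates r_i(x) (the paper uses H-H neurons), and the small ridge term added to the diagonal of Γ is a common regularization, not part of the paper's formulation:

```python
import math
import random

def gauss_solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting
    (adequate for the small, regularized Gamma matrix used here)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical rectified-linear tuning curves standing in for r_i(x);
# gains, thresholds and preferred directions are random.
random.seed(0)
N = 20
gains = [random.uniform(1, 3) for _ in range(N)]
thresholds = [random.uniform(-1, 1) for _ in range(N)]
signs = [random.choice([-1, 1]) for _ in range(N)]

def rate(i, x):
    return max(0.0, gains[i] * (signs[i] * x - thresholds[i]))

# Gamma_ij = ∫ r_i r_j dx and Upsilon_i = ∫ r_i f(x) dx, approximated
# by sampling x; a small ridge term keeps Gamma invertible.
xs = [k / 50.0 - 1.0 for k in range(101)]
Gamma = [[sum(rate(i, x) * rate(j, x) for x in xs) + (0.01 if i == j else 0.0)
          for j in range(N)] for i in range(N)]
Upsilon = [sum(rate(i, x) * x for x in xs) for i in range(N)]  # f(x) = x
d = gauss_solve(Gamma, Upsilon)  # d = Gamma^{-1} Upsilon, Eq. (9)

def f_hat(x):
    """Estimated output from rates and decoders, Eq. (6)."""
    return sum(rate(i, x) * d[i] for i in range(N))

mae = sum(abs(f_hat(x) - x) for x in xs) / len(xs)
print("mean decoding error:", mae)
```

With a few dozen neurons the decoded estimate tracks the identity function closely, which is the sense in which the decoders and rates compose the connection weights between two neuron groups.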
III. Methods
A. The Hodgkin-Huxley Model
The Hodgkin-Huxley (H-H) model provides a good description of
the action potential of neurons at the ionic level [28, 29]. It
is used to substitute G(x) in Equation (2). There are three
reasons for using the H-H model [28, 29] in this paper.
Firstly, the H-H equations give a more detailed ionic-level
description of the action potential using a coupled set of
differential equations, which explains experimental results
accurately and enables quantitative voltage analysis of the
nerve cell.
Secondly, sodium and potassium are associated with human
cognition. Changes in the ionic conductance of the sodium or
potassium channels may affect the decision-making or learning
process, which gives us an opportunity to investigate
intelligence down to the ionic level.
Thirdly, the H-H model may give a more biologically plausible
simulation and prediction of neuron activity due to its
predictive power and its reproduction of all the critical
biophysical properties of the action potential.
The H-H model is represented as four equations [29]:

C dV/dt = I − g_Na m³h (V − E_Na) − g_K n⁴ (V − E_K) − g_L (V − E_L)
dn/dt = α_n(V)(1 − n) − β_n(V) n
dm/dt = α_m(V)(1 − m) − β_m(V) m
dh/dt = α_h(V)(1 − h) − β_h(V) h       (10)

In the H-H model, V is the membrane potential, C is the membrane
capacitance with a value C = 1 μF/cm², and I is the external
current input.
The ionic current consists of three components: the sodium (Na+)
current with three activation gates and one inactivation gate,
the potassium (K+) current with four activation gates, and the
leak current carried primarily by chloride (Cl-) [27, 30]. n, m
and h in the equations represent the open probabilities of the
different ionic gates [30], with n for potassium (K+), and m and
h for sodium (Na+).
The transition rates of the gates between the open and closed
states are denoted by α(V) and β(V), which are voltage-dependent
[27, 30] and named rate constants. The transition rate functions
are fitted to voltage clamp experiments [29, 30]:
α_n(V) = 0.01 (10 − V) / (exp((10 − V)/10) − 1)
α_m(V) = 0.1 (25 − V) / (exp((25 − V)/10) − 1)
α_h(V) = 0.07 exp(−V/20)
β_n(V) = 0.125 exp(−V/80)
β_m(V) = 4 exp(−V/18)
β_h(V) = 1 / (exp((30 − V)/10) + 1)       (11)

Note that the coefficients (0.01, 0.1, 0.07, etc.) in Equation
(11) are obtained by data fitting according to the H-H model
experiments [29, 30]. They take these values because they give
the best-fit curves.
In order to set the resting potential V = V_rest = 0, the other
potential and conductance values have been shifted and given the
values [27, 31, 33]:

E_K = −12 mV,    g_K = 36 mS/cm²
E_Na = 120 mV,   g_Na = 120 mS/cm²
E_L = 10.6 mV,   g_L = 0.3 mS/cm²
where the three potentials are the equilibrium potentials when
the applied current is I = 0 μA/cm², resulting in the resting
potential V = V_rest = 0.
When the input current is larger than a specific value I_θ,
which is about 6-7 μA/cm² in our simulations, regular spiking
activity is observed [31]. If the spike interval is T, the
average firing rate can be expressed as f = 1/T; it increases as
the stimulus is enhanced [31-33], but nonlinearly, as shown in
Fig. 3.
Neurons in a group may have different firing or refractory
times. Because the H-H model generates spikes regularly when the
input current is large enough, we add an action potential reset
to control its firing process. The peak potential is about
80-100 mV when the H-H neuron fires, so we count a spike if the
potential exceeds 60 mV and set the voltage to zero, letting the
neuron restart immediately or wait until the refractory period
finishes.
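A minimal forward-Euler integration of Equations (10) and (11), using the shifted constants above and the 60 mV spike-counting rule described in the text. The function name, step size and simulation length are illustrative choices, not the paper's exact settings:

```python
import math

# Rate functions from Equation (11) (resting potential shifted to 0 mV).
def alpha_n(V): return 0.01 * (10 - V) / (math.exp((10 - V) / 10) - 1)
def beta_n(V):  return 0.125 * math.exp(-V / 80)
def alpha_m(V): return 0.1 * (25 - V) / (math.exp((25 - V) / 10) - 1)
def beta_m(V):  return 4 * math.exp(-V / 18)
def alpha_h(V): return 0.07 * math.exp(-V / 20)
def beta_h(V):  return 1 / (math.exp((30 - V) / 10) + 1)

def count_spikes(I, T=200.0, dt=0.01):
    """Integrate the four H-H equations (Eq. (10)) with forward Euler
    and count upward crossings of the 60 mV spike threshold.
    Constants as in the text: C = 1 uF/cm^2; g_Na = 120, g_K = 36,
    g_L = 0.3 mS/cm^2; E_Na = 120, E_K = -12, E_L = 10.6 mV."""
    C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3
    ENa, EK, EL = 120.0, -12.0, 10.6
    V = m = n = h = 0.0          # initial values as in Fig. 3
    spikes, above = 0, False
    for _ in range(int(T / dt)):
        I_ion = (gNa * m**3 * h * (V - ENa)
                 + gK * n**4 * (V - EK)
                 + gL * (V - EL))
        m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)
        n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
        h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
        V += dt * (I - I_ion) / C
        if V > 60 and not above:
            spikes, above = spikes + 1, True
        elif V <= 60:
            above = False
    return spikes

# Stronger inputs drive higher, but nonlinearly increasing, firing rates.
print(count_spikes(10.0), count_spikes(50.0))
```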
B. The Orbitofrontal Cortex in the Autonomous
Learning Model
The orbitofrontal cortex (OFC) is part of the prefrontal cortex
(PFC) (see Fig. 4) and also participates in the reinforcement
learning process [8, 21, 22]. It represents the reward
information received from the amygdala [22] and exerts control
over the basal ganglia model in the learning process.
The OFC has connections with both the prefrontal cortex (PFC)
and the striatum [22, 42] of the basal ganglia. The OFC
modulates the striatum as dopamine (DA) does. We suggest that
this kind of modulation can be modeled by the synaptic weights
between the PFC and the striatum.
The OFC represents positive and negative rewards in two separate
sub-areas with a functional distinction: the medial and the
lateral OFC respectively [21, 22]. Studies in [21, 22] show that
the medial OFC tends to respond to positive rewards of
reinforcement values, whereas the lateral OFC is more active
when representing negative rewards. That means, in our basal
ganglia model, the medial OFC outputs a large response value
when a positive reward is received, and the lateral OFC gives a
strong response when it gets a negative reward. Note that a
strong response means a large output value.
Before modeling the contribution of the OFC, we need to explain
the general learning process in the basal ganglia network.
Learning in our basal ganglia model is achieved by updating the
synaptic connections between the basal ganglia and the
prefrontal cortex [1, 2]. As discussed above, the SNc/VTA area
modulates the striatum (both StrD1 and StrD2) by releasing
dopamine (DA), controlling the learning process. The response of
the striatum to the prefrontal cortex is modeled by the synaptic
connections between them and is modulated by the learning errors
(DA). Driven by the learning errors, the synaptic weight change
is written as [16]:

Δw_ij = κ α_j d_i e_j E_err       (12)

where κ is the learning rate.

Fig. 4. The basal ganglia circuitry after the consideration of
the orbitofrontal cortex (OFC) and the amygdala. The prefrontal
cortex (PFC) includes the OFC, which receives reward information
from the amygdala. The thalamus here projects the output result
of the basal ganglia back to the PFC, which is the working
memory. Both the OFC and the SNc have the function of learning
control for the basal ganglia.
The influence of the OFC can be modeled in the synaptic learning
rule (Equation (12)). Rewriting the rule after taking the effect
of the OFC into account:

Δw_ij = κ α_j d_i e_j E_err (1 + (μ_p − μ_n))       (13)

where μ_p is the medial OFC response and μ_n is the lateral OFC
response, with the value interval 0 ≤ μ_p, μ_n ≤ 1. E_err is the
error obtained from the Q-learning process and is given by
[38, 40]:

E_err = R_t(s_t, a_t) + γ max_a Q_{t−1}(s_t, a) − Q_{t−1}(s_{t−1}, a_{t−1})       (14)

where γ is the discount rate and R_t(s_t, a_t) is the reward.
The Q value function is updated by [38]:

Q_t(s_t, a_t) = Q_{t−1}(s_t, a_t) + ρ E_err       (15)

where ρ is the learning rate in reinforcement learning.
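The temporal-difference updates of Equations (14) and (15), together with the OFC modulation factor of Equation (13), can be sketched as follows. The dictionary-based Q table, the state and action names, and the function names are hypothetical; following standard Q-learning, the update is applied to the previously visited state-action pair, whose value appears in Eq. (14):

```python
# Hypothetical toy setup: two actions; the Q table is a dict mapping
# (state, action) -> value, with unseen pairs reading 0.
ACTIONS = ["left", "right"]

def q_update(Q, s_prev, a_prev, s_t, reward, gamma=0.9, rho=0.1):
    """One temporal-difference step following Eqs. (14) and (15)."""
    q = lambda s, a: Q.get((s, a), 0.0)
    best_next = max(q(s_t, a) for a in ACTIONS)             # max_a Q_{t-1}(s_t, a)
    e_err = reward + gamma * best_next - q(s_prev, a_prev)  # Eq. (14)
    Q[(s_prev, a_prev)] = q(s_prev, a_prev) + rho * e_err   # Eq. (15)
    return e_err

def ofc_gain(mu_p, mu_n):
    """OFC modulation factor from Eq. (13): the medial response mu_p
    amplifies and the lateral response mu_n damps the weight change."""
    return 1 + (mu_p - mu_n)

Q = {}
e_err = q_update(Q, "s0", "left", "s1", reward=1.0)
print(round(Q[("s0", "left")], 3))   # → 0.1 (rho * E_err, with E_err = 1.0)
print(ofc_gain(1.0, 0.0))            # → 2.0 (a positive reward amplifies learning)
```

In the full model, e_err also drives the OFC-modulated synaptic update of Eq. (13) rather than updating a table directly.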
The incorporation of the OFC and the amygdala produces a more
comprehensive basal ganglia circuitry. Also, the PFC is used as
the working memory. When performing reinforcement learning, the
basal ganglia take the Q-values as input from the prefrontal
cortex (PFC), and then output a selected action. A new precise
coding method based on the working memory is proposed in the
following to store accurate numeric values.
Fig. 5. The target vector list and the key vector list for
precise encoding. The target vectors represent elements
including the 10 Arabic numerals and one decimal point. There
are 7 key vectors, representing 7 positions. The x-axis denotes
the serial number of every value in a vector. The y-axis
represents the value range (0, 1).
C. Precise Encoding for the Working Memory
The problem
In the reinforcement learning process, there are
state-action values ( Q(s, a) ), reward scores, parameters and
errors that have to be stored and updated when necessary. It is
widely recognized that the prefrontal cortex (PFC) works as a
working memory [3, 4, 8, 14, 15]. It stores values and
parameters during the learning process and updates them when
necessary. When the environment's current state is determined,
the Q values of that state are extracted from the working
memory and sent into the basal ganglia.
The working memory mechanism can be achieved by
integration, which connects the output of a population of
neurons back to itself to keep the information. Details of
the integration implementation are not described here.
With the help of the bind and unbind processes [4, 14], which
encode the cognition process into vector-based descriptions using
convolution, information at different times or with different
meanings can be stored together in one group of neurons.
If we want to remember a target in the memory, we need to
assign a key to that target and remember the convolution of the
target and the key. When extracting the target from the memory,
the key is used to distinguish which target is the right one, also
by convolution.
Binding (denoted by ⊗) is the storing process and unbinding
(also ⊗) is the extracting process; both are convolution
processes. Given two vectors A and B, the binding can be
written as [4, 14]:
A ⊗ B = F⁻¹(FA ⊙ FB)   (16)
Fig. 8. The basal ganglia network built in this paper. When performing
reinforcement learning, both the SNc and the OFC receive the reward
information and control the updating of the synaptic weights between the
striatum and the PFC. The OFC is also used as a working memory for value
and reward storing.
Fig. 6. An example of the precise coding process for the value '3.12'. The
final binding result is sent into the working memory (the PFC). When given the
position sequence in order, the number for every position is extracted.
where FA ⊙ FB is the element-wise product of the two
vectors and F is the discrete Fourier transform matrix.
The unbinding process extracts the target input vector from the
output (it is an approximate inverse of binding [14]).
Given vector B, unbinding recovers A by the equation [4, 14]:
A ≈ (A ⊗ B) ⊗ B′   (17)
where B′_i = B_((N−i) mod N) is a permutation of B; note that the
unbinding result is only an approximation of the original input.
Note also that the dimensions of the vectors for both binding and
unbinding must be the same.
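The binding, unbinding and cleanup steps of equations (16), (17) and (20) can be sketched in plain NumPy as follows. This is only an abstraction for illustration: the paper implements the operations in spiking neurons, and the vector dimension, seed and element names here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def bind(a, b):
    """Binding (eq. 16): circular convolution via the discrete Fourier transform."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def inverse(b):
    """Approximate inverse (eq. 17): B'_i = B_((N-i) mod N), a permutation of B."""
    n = len(b)
    return b[(n - np.arange(n)) % n]

def unit(n):
    """Random unit vector, as used for the target and key vectors."""
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)

n = 100                                   # vector dimension, as in the experiments
targets = {d: unit(n) for d in "0123456789."}
keys = [unit(n) for _ in range(7)]        # one key vector per digit position

# store '2' at position 0 together with another bound pair
memory = bind(targets["2"], keys[0]) + bind(targets["1"], keys[1])

# unbind position 0, then clean up with the normalized dot product (eq. 20)
probe = bind(memory, inverse(keys[0]))
scores = {d: np.dot(probe, t) / (np.linalg.norm(probe) * np.linalg.norm(t))
          for d, t in targets.items()}
best = max(scores, key=scores.get)        # recovers the stored digit '2'
```

The cleanup step is what makes the approximate unbinding usable: the noisy probe vector is matched against the finite list of target vectors, and the highest score identifies the stored element.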
Because of the inaccuracy of the unbinding process, we
cannot extract a value from the working memory without any
deviation.
This kind of memory is therefore not good for accurate values (such as
3.12): the coding and decoding by convolution produce an
inexact reconstruction of the input, and this error makes an
accurate representation of a value impossible. Humans, however,
can usually remember a value exactly, without any deviation, and
we do need this ability in many cases.
Basic idea
To improve this inaccurate memory, we develop an
information encoding method called precise encoding, which is
able to store accurate values in the working memory. The main
idea of our method is to divide a value into units and
remember every unit.
In the memory, if we want to get the target vector A, we
need the key vector B. However, it is not possible to assign a
key vector to every numerical value. Fortunately, the Arabic
numerals are finite: there are only ten, or eleven if the decimal
point is included.
that’s eleven. We assign a random target vector for every
Cogn Comput
9
product [14] give matching scores between the extracted vector
(take two for example) and those target vectors, and tell us
what target (here include 10 numbers and one decimal point) it
actually represents. Note that the dot product of two vectors is
normalized by their length, written as:
(a,b)
(20)
sv =
ab
And the larger score is the better.
D. The Basal Ganglia Autonomous Learning Model
Fig. 7. The function of working memory in the OFC. the working memory
supports all the parameter storing during the learning process of the basal
ganglia network. The working memory is achieved using the precise encoding
method proposed in this paper.
Arabic numbers including the decimal point, making sure that
the dimension of the vector should not be too low (low
dimension has less distinction and will cause confusion when
calculate the similarity, for example, 32 is an available choice).
Six significant digits will be reserved in our calculation, and
that means seven positions (include the position of the decimal
point) will be used. The target vector and key vector list is
shown in Fig. 5. Note that the position numbers are in
ascending order, from 0 for the least significant digit to 6 for the
most significant digit. Seven randomly generated key vectors
represent these positions respectively. That means for any
given numeric value, we keep it in the working memory using
six digits and one decimal point (if necessary). If a value has
more than six digits, it will be cut off and kept the six top digits.
While if less than six digits, the top digits will be zeros until a
non-zero digit appears.
Fig. 6 shows the encoding process of our method. Let’s take
the value of 3.12 for example. It only has three significant digits
and changing it into 0003.12 is the first step. After that, its
binding result can be expressed as:
M = two Ä P0 + one Ä P1 + dot Ä P2 + three Ä P3
+ zero Ä P4 + zero Ä P5 + zero Ä P6
(18)
and is sent into the working memory. When extracting the value
from the memory, we should calculate the unbinding result
from position P0 to P6 . For example, if we want the value in
position P0 , we execute the unbinding process by the equation:
(19)
where P0¢ is the permutation of P0 , and P0 Ä P0¢ » 1 . After
extracting the target vector, the similarities calculated by dot
Our basal ganglia network model is shown in Fig. 8, and it is
more comprehensive than previous models. (1) Our model includes
the orbitofrontal cortex (OFC) for the reward response. The OFC is
considered in our model to describe the reinforcement influence on
the learning process: it receives the reward information from the
amygdala and at the same time controls basal ganglia activity.
(2) The model stores values and parameters using a working
memory built with the precise encoding algorithm we proposed.
During the learning process, the basal ganglia and other brain
regions interact with the PFC to store or retrieve values when
needed, as shown in Fig. 7.
Every part of the model has its own responsibility. The PFC
is the working memory, storing all the parameters, function
values, errors and rewards using the precise coding algorithm.
At every step, the PFC sends the value function for the current
state to the basal ganglia. The basal ganglia are responsible for
action selection and send the selection to the thalamus. The
thalamus is a relay for signals in the decision-making task and
sends the basal ganglia's output back to the PFC. The SNc/VTA
receives the error and reward information and modulates the
synaptic weights between the basal ganglia and the PFC, and
learning begins, as shown in Fig. 8.
Our basal ganglia network is built using the H-H model. The
whole model is a learning loop: learning does not finish until
the basal ganglia produce good action selections. There are two
pathways in the circuitry (the direct and indirect pathways).
IV. Model Application: UAV Autonomous Learning in an Unknown 3D Environment
A. The UAV's Coordinate System and Distance Measurement
The UAV (unmanned aerial vehicle) learns and behaves in
an unknown 3D environment, so a brief and efficient description
of the environment is very important. Distance to other objects is
one of the most important pieces of information: the UAV
should avoid collisions without manual control when flying
among mountains or buildings. With the help of distance
sensors, which measure the distances between the UAV and
obstacles in different directions, the UAV has the ability to
perform automatic avoidance.
Precise distance is not a critical requirement for obstacle
avoidance, so distance sensors such as lasers or ultrasonic
sensors are suitable for the UAV [35, 37]. Distances within a
few hundred meters are what we care about. We also need to
know the UAV's altitude, position and posture angles, which
can be obtained using an altimeter, GPS and a gyroscope
respectively.
Two coordinate systems are considered in a 3D environment:
the world coordinate system (wcs) and the UAV body
coordinate system, as shown in Fig. 9. The world coordinate
system is a global coordinate system and defines the
coordinates of every object in the environment, including the
centroid position of the UAV. The body coordinate system is
fixed on the UAV with its origin at the centroid and is used to
represent the orientation (posture angles) of the UAV body.
Fig. 9. The world coordinate system and the UAV body coordinate system.
Note that the moving direction is opposite to the z axis.
In the body coordinate system, the vertical axis is y, with the
positive direction going upward; the longitudinal axis is z
(positive direction from head to tail); and the lateral axis is x,
with the positive direction pointing to the right, as shown in Fig. 9.
The UAV is free to rotate in three dimensions: pitch (head up or
down about axis x), yaw (body left or right about axis y) and
roll (body rotation about axis z). The z-axis passes through the
UAV from head to tail and is in the opposite direction of flight
(see Fig. 9).
The horizontal plane determined by the x and z axes is
divided into different directions by detecting rays every φ
degrees around the UAV body (φ = 10° is a typical value; the
same holds in the vertical plane, as shown in Fig. 10 and Fig. 11).
These detecting rays are used to obtain the distances from the
UAV to the environment in different directions. The starting
points of the rays are at the edges of the UAV body, shown as
red dots in Fig. 10 and Fig. 11.
Fig. 10. The starting points (red dots) and the probing directions in the
horizontal plane. Here 10 degrees is just a typical value.
Fig. 11. A: Starting points for collision detection. B: The starting points (red
dots) and probing directions in the vertical plane.
Fig. 12. The process of state generation. A: horizontal and vertical state
division. B: state examples for the UAV in a 3D environment.
The positions of the starting points (the positions of the sensors)
change as the UAV moves, and there are new probing directions
as the posture of the UAV changes. All these changes have to be
updated in order to get correct distance detection. Equations (21)
and (22) perform this adaptation whenever there is a change in
the position or posture of the UAV.
(x′_p, z′_p, y′_p) = (c_x, c_z, c_y) + (x, z, y) [ cosθ, −sinθ, 0 ; sinθ, cosθ, 0 ; 0, 0, 1 ]   (21)
(x′_d, z′_d, y′_d) = (x, z, y) [ cosθ, −sinθ, 0 ; sinθ, cosθ, 0 ; 0, 0, 1 ]   (22)
Equation (21) updates the position of a starting point and
equation (22) updates its probing direction, so that every time
the UAV changes its position or orientation, the new positions
and probing directions of the starting points are calculated.
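Equations (21) and (22) can be sketched as follows, assuming the paper's (x, z, y) row-vector ordering and yaw-only rotation; the function names and the example values are illustrative:

```python
import numpy as np

def yaw_matrix(theta):
    """Rotation about the vertical axis, as used in equations (21) and (22)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def update_start_point(centroid, offset, theta):
    """Equation (21): new sensor position from the centroid and the body offset."""
    return centroid + offset @ yaw_matrix(theta)

def update_direction(direction, theta):
    """Equation (22): new probing direction after a change in yaw."""
    return direction @ yaw_matrix(theta)

# example: a ray starting 1 unit to the right of the centroid, after a 90-degree yaw
p = update_start_point(np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.pi / 2)
d = update_direction(np.array([0.0, 1.0, 0.0]), np.pi / 2)
```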
B. The State and Action Space
Taking φ = 10° as an example, the horizontal and vertical
planes are each divided into 36 different directions by detecting rays.
Distances obtained from these directions have different importance
according to the angles between the detecting rays and
the flight direction. The state space, including both horizontal
and vertical states, is obtained according to the angle of the
detecting ray in the horizontal and vertical plane respectively.
Supposing the numbers of states for the horizontal and vertical
planes are N_h and N_v respectively, the horizontal plane is
divided into N_h states, each covering a range of 360/N_h
degrees, and the same holds for the vertical plane. This kind of
state generation is shown in Fig. 12.
TABLE 1. TYPICAL REWARD VALUES FOR STATE CONVERSIONS
  State Conversion                                         Reward Value
  'NoDanger' to 'avoiding'                                 -0.5
  'NoDanger' to 'NoDanger'                                  0.0
  'avoiding' to 'NoDanger'                                  1.0
  'avoiding' (less danger) to 'avoiding' (more danger)     -0.5
  'avoiding' (more danger) to 'avoiding' (less danger)      0.5
  'avoiding' to 'avoiding' (same state)                    -0.5
  'avoiding' to 'collision'                                -10
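The conversions in Table 1 can be captured as a simple lookup; the tuple labels below are illustrative names, not identifiers from the paper:

```python
# Typical reward values from Table 1, keyed by (previous state, current state).
REWARDS = {
    ("no_danger", "avoiding"): -0.5,
    ("no_danger", "no_danger"): 0.0,
    ("avoiding", "no_danger"): 1.0,
    ("avoiding_less", "avoiding_more"): -0.5,   # toward more danger
    ("avoiding_more", "avoiding_less"): 0.5,    # toward less danger
    ("avoiding", "avoiding_same"): -0.5,        # stuck in the same avoiding state
    ("avoiding", "collision"): -10.0,
}

def reward(prev, curr):
    """Look up the reward for one state conversion."""
    return REWARDS[(prev, curr)]
```

As the text notes, continuous values reflecting the same ordering (for example, linear functions of these discrete values) would also be appropriate.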
If the UAV is in an 'avoiding' state, it means that there are
obstacles in the corresponding direction. Two thresholds, D_L
and D_S, are given for the distance. Taking the horizontal plane
as an example: when the distances reported by all states are
above D_L, the UAV is in the 'NoDanger' state and is given the
state s_h = 0; when the distances reported by one or several states
are below D_L and none is below D_S, the UAV is in an
'avoiding' state and needs to take actions to avoid collision;
when the distances from one or several states are below D_S, the
UAV is in the 'collision' state and is given s_h = −1. The
vertical plane is handled in the same way.
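The threshold rule for one plane can be sketched as below. The paper does not specify which 'avoiding' index is reported when several directions are below D_L; returning the first such direction is an assumption made here for illustration:

```python
def plane_state(distances, d_large, d_small):
    """Classify one plane's state from the distances its states report.

    Returns 0 for 'NoDanger', -1 for 'collision', and otherwise a 1-based
    'avoiding' index (the first direction whose distance is below d_large;
    this tie-breaking choice is an illustrative assumption).
    """
    if any(d < d_small for d in distances):
        return -1                        # 'collision': something closer than D_S
    below = [i for i, d in enumerate(distances) if d < d_large]
    if not below:
        return 0                         # 'NoDanger': everything farther than D_L
    return below[0] + 1                  # 'avoiding': an obstacle in that direction
```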
Knowing that there is one 'NoDanger' state for the horizontal
and the vertical plane respectively, and that the UAV is in the
'collision' state if s_h = −1 or s_v = −1, we have in total
N_S = (N_h + 1) × (N_v + 1) + 1 states.
Every time, each state reports a distance from the UAV to
the environment. The state generation process (Fig. 12) may
group several detecting rays into one state if the angular range
of a state (such as 36°) is larger than that of a detecting ray
(such as 10°). The distance reported by a state is the smallest
one obtained from all the detecting rays belonging to that state.
There is no kinematic or dynamic model in the UAV's
training process; controlling a mathematical model of the UAV
is not the purpose of this paper. Therefore, the action space of
the UAV is simplified into 5 actions: (1) forward, (2) left and
forward, (3) right and forward, (4) upward, (5) downward. We
define l_s as a linear moving step (including forward, upward
and downward) and θ_s as a rotating angle step (such as turning
left or right, usually 30 degrees). Every time, the UAV chooses
an action before moving a step, and detects the environment in
order to get the reward and update the basal ganglia model.
C. The Reward Space
The reward space is determined by a set of discrete values
(typical values are shown in Table 1). Since there is no target
point to reach, the purpose of UAV training is to avoid
collisions while wandering randomly in the environment with a
random heading direction. Staying in the 'NoDanger' state does
not yield additional positive reward, while a conversion from
the 'avoiding' state to the 'NoDanger' state does. The
'NoDanger' state is harmless to the UAV, and staying in it
yields a reward value of 0. An action that changes the state of
the UAV from 'NoDanger' to 'avoiding' gets a negative value,
and one from 'avoiding' to 'collision' gets a larger negative
reward. The numbers of 'avoiding' states for the horizontal and
vertical planes are N_h and N_v respectively. These states have
different degrees of danger: a state whose direction is closer to
the flight direction has a higher risk of collision and is more
dangerous. This means that the conversion from a state with
lower risk to one with higher risk gets a negative reward and,
on the contrary, a positive reward. Table 1 gives typical reward
values used in our training process. We give discrete reward
values for state conversions in Table 1, while values that reflect
the differences between state conversions are also appropriate
(for example, linear functions of the discrete values in Table 1).
The conversion from 'avoiding' to 'avoiding' is divided into
three different cases. Going from an 'avoiding' state with less
danger to one with more danger gets a negative reward value
and, in the opposite case, a positive value. If the UAV stays in
the same 'avoiding' state, a negative reward is given (see Table
1). Whenever the UAV collides with an obstacle, it gets a much
larger negative reward value, because collision is much worse
than the other states.
V. Experiments
Based on the unknown 3D environment built in section IV,
we apply our basal ganglia network to the UAV training
process. In the environment, the UAV wanders around (with
random moving directions) and learns to avoid both static and
moving obstacles. Important values (such as Q-values and
rewards) for the learning process are stored in the working
memory using the precise encoding. At every step, the
environment is evaluated and its state is identified. The basal
ganglia circuitry takes Q-values as input and selects one action
as its output, and the UAV performs the action just selected.
According to the reward obtained from the OFC and the
state-action pairs just visited, the error and reward information
are sent into the SNc. Dopamine then controls the learning
process and helps update the synaptic weights. This is the
learning process.
Fig. 13. The experiment of the precise encoding process. A: the input vector and memory result. The vector in the memory is a temporal addition result. B: the output
vector and extracted value. The key vector comes at 180 ms and the extraction begins at that time. Note that the extracted target vector is not exactly the
former input vector but an approximate result. The decimal point is represented by the value −1 here.
Fig. 14. The accuracy experiments of precise encoding. 20 experiments were
carried out to test the accuracy of the method. In every experiment, 100
values were generated randomly in the range (0, 100000), and were stored and
extracted using the precise encoding method. The results show that the
accuracies are above 0.9 and the average accuracy is around 0.95.
A. The Precise Encoding
Fig. 13 shows the precise encoding result for the value 3.12;
the simulation lasts 400 milliseconds (ms). The
dimension of the target and key vectors is 100 here. In the
experiments, vectors with 100 dimensions give good
distinction among the vectors that represent different elements
(Arabic numerals or positions), as shown in Fig. 13.
As explained in the description of precise encoding in Section
III, '3.12' is first changed into '0003.12'. Remember that the encoding
Fig. 15. The action selection experiment on the basal ganglia. The top figure is
the input action list. There are five actions, and their values change every 100
ms. The basal ganglia need to select the action with the largest value. Note that
the output of the basal ganglia is negative; the basal ganglia select an action by
outputting a value close to zero for it.
begins with the least significant digit ('2') and proceeds to the most
significant digit ('0'), and there are 7 positions in total, including
the decimal point. The memory remembers a number at the
right position within 20 ms. Part A of Fig. 13 shows the target
and key vector inputs and the memory result when
remembering '3.12'. After the value encoding is finished at
140 ms, the memory retains the value for a long time, waiting
for extraction.
Part B of Fig. 13 shows the extraction of '3.12' from the
memory. The extracting signal comes at 180 ms, and the
extraction also begins with the least significant digit ('2'). In
order to identify the decimal point, we assign it the value −1.
As shown in part B of Fig. 13, the extraction goes position by
position. As a result of the similarity calculation (equation (20)),
Fig. 18. Attempt counts before learning succeeds. 21 experiments are
included; 1000 steps without collision indicates the completion of learning.
Fig. 16. The 3D environment for the UAV autonomous learning. The
environment is unknown to the UAV.
Fig. 17. The simulation of the enemy UAV. The enemy can emerge from any
direction at a certain distance from the UAV.
the value '3.12' is recognized correctly ('−1' for the decimal
point). Note that the extraction should begin only after the
memory process has finished.
B. H-H Based Basal Ganglia Action Selection
In this paper, we use the Hodgkin-Huxley (H-H) model for
the basal ganglia network implementation. Fig. 15 shows a
simulation of basal ganglia action selection using the H-H
model. It should be noted that the output of the basal ganglia is
inhibitory: the basal ganglia select one action by outputting a
nearly zero response for it. In Fig. 15, there are five actions. In
each time period (about 100 milliseconds), the basal ganglia
should select one of them. As explained in Section II, an action
with a larger value is better. So the basal ganglia take five
actions with different values as inputs and select the one with
the largest value by outputting a nearly zero value for it.
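Functionally, this inhibitory selection scheme can be abstracted as below. This is only a sketch of the input-output behavior: the actual network is a spiking H-H circuit, and the tonic inhibition level of −1 is an arbitrary illustrative value:

```python
import numpy as np

def bg_select(action_values, tonic_inhibition=-1.0):
    """Abstract basal ganglia output: every action channel is tonically
    inhibited (negative output) except the winner, whose inhibition is
    released toward zero."""
    out = np.full(len(action_values), tonic_inhibition)
    out[int(np.argmax(action_values))] = 0.0
    return out

out = bg_select([0.2, 0.9, 0.1, 0.4, 0.3])   # five candidate action values
selected = int(np.argmax(out))               # the nearly zero channel wins
```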
C. UAV Autonomous Learning
We develop our 3D UAV training environment using
jMonkeyEngine [36], a 3D game development kit
written in Java. After the basic scene construction (the
3D environment, the UAV model, and so on), we define the
available actions for the UAV and abstract states according to
the distances between the UAV and environment objects, as
analyzed in Section IV. With proper reward scores (Table 1),
under the control of the basal ganglia network, the UAV can
learn how to wander in the environment without colliding with
other objects. The learning environment we developed is shown
in Fig. 16, with a first-person main view and four small views
from other perspectives (third-person view, left view, right view
and rear view). The mountains and terrain are unknown to the
UAV; what it gets is the distance from the UAV to the
environment in many directions. Five actions are available for
the UAV and, every time, if there are no obstacles, the UAV
chooses its moving direction randomly. There are also moving
obstacles in the environment, acting as enemy UAVs. Fig. 17
shows the emergence of the enemy from different directions; in
fact, the moving direction of the enemy can be any one on the
circle around the UAV.
In this kind of unknown 3D environment with static and
moving obstacles, the UAV's 'inner brain' learns how to
interact with the environment to avoid collisions. The
connection weights of the neurons update as learning goes on.
After dozens of attempts, the UAV knows which action to take
and seldom collides with the obstacles. If the UAV is able to fly
in the environment without collisions for more than 1000 steps,
we consider the training done and the UAV able to wander in
an environment. Fig. 18 shows the experiment over 21 trials:
the attempt counts before training succeeded, and the average
attempt count (41). This means that, in most cases, the UAV
finishes its learning within 50 collisions.
VI. Conclusion
This paper presents a basal ganglia network centric
computational model for autonomous learning (more
specifically, reinforcement learning). Compared to other related
work, our model includes more cognitive evidence and
biological details. For example, the orbitofrontal cortex (OFC)
is considered as a learning control brain area, and the
Hodgkin-Huxley (H-H) model is used to perform the
information coding process. In order to store accurate values in
the memory, we propose a precise encoding method that sends
important values arising in the learning process into the working
memory. In the experimental validations, we apply our basal
ganglia network based model to UAV learning in an unknown
3D environment. The simulation results indicate a good
autonomous learning ability of the UAV.
We suggest that a UAV trained in one environment could
perform free exploration in another unknown environment
without re-training. That is because the environment model
takes the distances between the UAV and the obstacles as the
main information for environment evaluation, and these
distances are generally applicable in most kinds of
environments. The training of the UAV can therefore be done
once and for all.
Although, from the application perspective, the UAV has a
'built-in brain' and performs well on autonomous-learning-based
free exploration, there is future work that could be done for
improvement. Firstly, the premotor cortex (PMC) is associated
with action selection [4, 8] and should be included in the next
version of our model. Secondly, the action space of the UAV is
currently simple, and the UAV's posture angles are not fully
considered in the experiments (just the yaw angle, indicating
the turning degree in the horizontal plane). Including more
posture angles will make the UAV more flexible but harder to
control.
COMPLIANCE WITH ETHICAL STANDARDS
Funding: This study was funded by the Strategic Priority
Research Program of the Chinese Academy of Sciences, and
Beijing Municipal Science and Technology Commission.
Ethical approval: This article does not contain any studies with
human participants performed by any of the authors.
REFERENCES
[1] Stewart TC, Bekolay T, Eliasmith C. Learning to select actions with spiking neurons in the basal ganglia. Frontiers in Neuroscience. 2012 Jan; 6(2): 1-14.
[2] Bekolay T, Eliasmith C. A general error-modulated STDP learning rule applied to reinforcement learning in the basal ganglia. Computational and Systems Neuroscience conference. 2011 Feb 24-27; Salt Lake City, Utah.
[3] Eliasmith C, Stewart TC, Choo X, Bekolay T, DeWolf T, Tang Y, et al. A large-scale model of the functioning brain. Science. 2012 Dec; 338(6111): 1202-5.
[4] Eliasmith C. How to build a brain. Reprint edition. New York: Oxford; 2013: 121-171.
[5] Chakravarthy VS, Joseph D, Bapi RS. What do the basal ganglia do? A modeling perspective. Biological Cybernetics. 2010 Sep; 103(3): 237-253.
[6] Stocco A, Lebiere C, Anderson JR. Conditional routing of information to the cortex: a model of the basal ganglia's role in cognitive coordination. Psychological Review. 2010 Apr; 117(2): 541-574.
[7] Maass W. Networks of spiking neurons: the third generation of neural network models. Neural Networks. 1997 Dec; 10(9): 1659-1671.
[8] Frank MJ. Dynamic dopamine modulation in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated parkinsonism. Journal of Cognitive Neuroscience. 2005 Jan; 17(1): 51-72.
[9] Utter AA, Basso MA. The basal ganglia: an overview of circuits and function. Neuroscience & Biobehavioral Reviews. 2008 Jan; 32(3): 333-342.
[10] Redgrave P, Rodriguez M, Smith Y, Rodriguez-Oroz MC, Lehericy S, Bergman H, et al. Goal-directed and habitual control in the basal ganglia: implications for Parkinson's disease. Nature Reviews Neuroscience. 2010 Nov; 11: 760-772.
[11] Stewart TC, Choo X, Eliasmith C. Dynamic behavior of a spiking model of action selection in the basal ganglia. Proceedings of the 10th International Conference on Cognitive Modeling. 2010 Aug 5-8; Philadelphia, PA.
[12] Frank MJ. Hold your horses: a dynamic computational role for the subthalamic nucleus in decision making. Neural Networks. 2006 Oct; 19(8): 1120-1136.
[13] Gurney K, Prescott TJ, Redgrave P. A computational model of action selection in the basal ganglia. Biological Cybernetics. 2001 Jun; 84(6): 401-410.
[14] Stewart TC, Eliasmith C. Large-scale synthesis of functional spiking neural circuits. Proceedings of the IEEE. 2014 May; 102(5): 881-898.
[15] Frith C, Dolan R. The role of the prefrontal cortex in higher cognitive functions. Cognitive Brain Research. 1996 Dec; 5(1): 175-181.
[16] MacNeil D, Eliasmith C. Fine-tuning and the stability of recurrent neural networks. PLoS ONE. 2011 Sep; 6(9): 1-16.
[17] Gurney K, Prescott TJ, Wickens JR, Redgrave P. Computational models of the basal ganglia: from robots to membranes. Trends in Neurosciences. 2004 Aug; 27(8): 453-459.
[18] Albin RL, Young AB, Penney JB. The functional anatomy of basal ganglia disorders. Trends in Neurosciences. 1989 Oct; 12(10): 366-375.
[19] Bar-Gad I, Bergman H. Stepping out of the box: information processing in the neural networks of the basal ganglia. Current Opinion in Neurobiology. 2001 Dec; 11(6): 689-695.
[20] Igarashi J, Shouno O, Fukai T, Tsujino H. Real-time simulation of a spiking neural network model of the basal ganglia circuitry using general purpose computing on graphics processing units. Neural Networks. 2011 Nov; 24(9): 950-960.
[21] Kringelbach ML. The human orbitofrontal cortex: linking reward to hedonic experience. Nature Reviews Neuroscience. 2005 Sep; 6(9): 691-702.
[22] Frank MJ, Claus ED. Anatomy of a decision: striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychological Review. 2006 Apr; 113(2): 300-326.
[23] Cessac B, Paugam-Moisy H, Viéville T. Overview of facts and issues about neural coding by spikes. Journal of Physiology-Paris. Nov; 104(1): 5-18.
[24] Eliasmith C, Anderson CH. Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems. Cambridge, MA: The MIT Press; 2003.
[25] Dethier J, Gilja V, Nuyujukian P, Elassaad SA, Shenoy KV. Spiking neural network decoder for brain-machine interfaces. IEEE Conference on Neural Engineering. 2011 Apr 27-May 1; Cancun.
[26] Dayan P, Abbott LF. Computational and mathematical modeling of neural systems: Model Neurons I: Neuroelectronics. Cambridge: MIT Press; 2003.
[27] Izhikevich EM. Dynamical systems in neuroscience: the geometry of excitability and bursting. Cambridge, MA: MIT Press; 2004 Dec.
[28] Wang J, Chen LQ, Fei XY. Analysis and control of the bifurcation of Hodgkin-Huxley model. Chaos, Solitons & Fractals. 2007 Jan; 31(1): 247-256.
[29] Hodgkin AL, Huxley AF. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology. 1952 Aug; 117(4): 500-544.
[30] Nelson ME. Electrophysiological models. In: Databasing the Brain: From Data to Knowledge. New York: Wiley; 2004.
[31] Gerstner W, Kistler WM. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge: Cambridge University Press; 2002.
[32] Wells RB. Introduction to Biological Signal Processing and Computational Neuroscience. Moscow, ID, USA; 2010.
[33] Long LN, Fang GL. A review of biologically plausible neuron models for spiking neural networks. AIAA InfoTech Aerospace Conference. 2010 Apr 20-22; Atlanta, GA.
[34] Weber C, Elshaw M, Wermter S, Triesch J, Willmot C. Reinforcement Learning: Theory and Applications: Reinforcement learning embedded in brains and robots. Vienna, Austria; 2008 Jan.
[35] Chee KY, Zhong ZW. Control, navigation and collision avoidance for an unmanned aerial vehicle. Sensors and Actuators. 2013 Feb; 190: 66-76.
[36] Kusterer R. jMonkeyEngine 3.0: Beginner's Guide [Internet]. 2013 June [cited 2015 June 29]; Available from: https://www.packtpub.com/sites/default/files/9781849516464_Chapter_02_0.pdf.
[37] Vásárhelyi G, Virágh C, Somorjai G, Tarcai N, Szörényi T, Nepusz T, et al. Outdoor flocking and formation flight with autonomous aerial robots. IEEE Conference on Intelligent Robots and Systems. 2014 Sep 14-18; Chicago, IL, USA.
[38] Szepesvári C. Algorithms for Reinforcement Learning. 1st ed. Morgan and Claypool; 2010.
[39] Ito M, Doya K. Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Current Opinion in Neurobiology. 2011 June; 21(3): 368-373.
[40] Maia TV. Reinforcement learning, conditioning, and the brain: successes and challenges. Cognitive, Affective, & Behavioral Neuroscience. 2009 Dec; 9(4): 343-364.
[41] Prescott TJ, Montes González FM, Gurney K, Humphries MD, Redgrave P. A robot model of the basal ganglia: behavior and intrinsic processing. Neural Networks. 2006 Jan; 19(1): 31-61.
[42] Takahashi YK, Roesch MR, Stalnaker TA, Haney RZ, Calu DJ, Taylor AR, et al. The orbitofrontal cortex and ventral tegmental area are necessary for learning from unexpected outcomes. Neuron. 2009 Apr; 62(2): 269-280.
[43] Yang S, Yu Y, Zhou Z, Ying JH, Liu WX, Wang MX. HKUST IARC Team Progress Report [Internet]. International Aerial Robotics Competition. 2014 [cited 2015 Jun 30]; Available from: http://www.aerialroboticscompetition.org/2014SymposiumPapers/HongKongUniversityofScienceandTechnology.pdf.
[44] IARC. Official Rules for the International Aerial Robotics Competition [Internet]. International Aerial Robotics Competition. 2015 [cited 2015 June 30]; Available from: http://www.aerialroboticscompetition.org/rules.php.