The role of the basal ganglia in
reinforcement learning
Thesis submitted for the degree of
“Doctor of Philosophy”
by
Mati Joshua
Submitted to the Senate of the Hebrew University of
Jerusalem
March 2009
This work was carried out under the supervision of
Prof. Hagai Bergman
Table of Contents
Abstract
Introduction
    I. Formalism of the reinforcement learning problem
    II. Basal ganglia anatomy
    III. The basal ganglia as a reinforcement learning agent
    IV. The research goals and thesis outline
Methods
Results
    I. Value encoding by basal ganglia neuromodulators
    II. Value encoding by basal ganglia high frequency GABAergic neurons
    III. Value encoding by basal ganglia low frequency GABAergic neurons
    IV. Value encoding by correlated activity of the basal ganglia
    V. Quantifying quality of extracellular recording
Discussion
    I. Asymmetry in the encoding of values in the basal ganglia
    II. Encoding of dopaminergic neurons
    III. Comparing basal ganglia subpopulations
    IV. The basal ganglia in control of motor behavior
Bibliography
Appendix
    An algorithm for detection of eye state
Abstract
The basal ganglia are neural structures within the motor, cognitive and limbic control
circuits of the mammalian forebrain. Recent experimental and theoretical studies
depict the basal ganglia as a reinforcement learning system. This model suggests that
basal ganglia activity enables maximization of future reward by controlling the
environment.
Research has indicated that midbrain dopaminergic neurons respond with an increase
in their firing rate when the situation is better than expected (positive surprise). This
signal is in accordance with a reinforcement error signal. However, the low tonic
discharge rate of the dopaminergic neurons suggests that their capability to encode
negative events by suppressing firing rate is limited.
This limitation of the dopaminergic signal suggests two possibilities. The first is that
activity in the basal ganglia encodes both positive and negative values. The second is
that activity in the basal ganglia encodes only positive values and negative values are
encoded by other neural structures.
To dissociate these possibilities I trained two monkeys on a probabilistic
conditioning task with food, neutral and airpuff outcomes. I recorded the activity of
single neurons in six distinct areas of the basal ganglia of awake behaving monkeys
from both basal ganglia neuro-modulators (midbrain dopaminergic neurons and
cholinergic interneurons of the striatum - TANs) and from the GABAergic neurons of
the main axis of the basal ganglia (medium spiny neurons, external and internal
segments of the globus pallidus and substantia nigra reticulata).
The licking and blinking behavior during cue presentation indicated that the monkeys
expected the different probabilistic appetitive, neutral and aversive outcomes.
Nevertheless, the activity of all five basal ganglia nuclei following the cues was
strongly modulated by expectation of reward but not by expectation of the aversive
event. Furthermore, this neural activity better reflected the probability of future
reward than the probability of future aversive outcome. A comparison of the
properties of responses of the modulators and GABAergic neurons showed that
modulators had phasic and homogeneous responses whereas responses of the
GABAergic neurons were sustained and diverse including coincident increases and
decreases of discharge rate.
Analysis of the correlation between cells revealed that the synchronization between
dopaminergic neurons transiently increased following rewarding but not aversive
events. The dynamics of the increase in synchronization did not mirror the dynamics
of rate modulations. A simulation suggests that the changes in dopaminergic
synchronization could provide an additional mechanism for controlling their
concentrations in the striatum, beyond firing rate and pattern.
Thus, the difference between the response properties of the basal ganglia subsystems
suggests distinct function of these populations where the modulators provide a scalar
signal to the main axis of the basal ganglia network. The neural-behavioral
asymmetry shows that aversive events and rewards are represented in segregated
neuronal systems. This might be the physiological basis for aversive-appetitive
asymmetric human behavior.
Introduction
In an attempt to understand neuronal information processing, David Marr (1)
identified three levels of analysis: the problems which must be overcome
(computational level), the strategy that can be used (algorithmic level) and how it
actually occurs in neural activity (implementation level).
In their influential book, Sutton and Barto deal with the first two levels for the
reinforcement learning problem (2). They state: "Reinforcement learning (RL) is
learning what to do…how to map a situation to actions…-so to maximize a numerical
reward signal". They investigated the different classes of algorithms that can solve
the RL problem. In the last ten years neuroscientists have started dealing with the RL
problem at the third level of neural implementation. None of these levels stands alone;
it is the interactions between disciplines and questions that may lead to a better grasp
of the nature of RL.
In the following sections of the introduction I present a formalization of the
reinforcement learning problem and its solutions. I then review the main anatomical
components of the basal ganglia and discuss the physiological evidence that connects
RL theory and basal ganglia activity. Finally I outline the major goals of this research,
which is aimed at reducing the gap between computational theoretical RL models and
current knowledge of basal ganglia activity.
Formalism of the reinforcement learning problem
A reinforcement learning system can be divided into four sub-elements:
Policy – the policy defines the agent's actions in the environment. Given a state of the
environment, a policy is a mapping from this state to an action.
Reward function – After executing an action in a given state the agent receives a
single value reward. This reward defines the goal of the learning agent; i.e., to
maximize this value in the long run.
Value function – given a policy, the value function of a state is the total amount of
reward an agent can expect to receive in the future.
Environment Model – given a state and an action, the environment model provides
the statistics of the next state and the expected reward.
Mathematically, let $S$ be the set of states and $A$ the set of all actions; then a policy is a
conditional probability function
\[ \Pi(a \in A \mid s \in S) \]
that gives the probability of taking an action in a given state.
The environment model contains a probabilistic function
\[ P(a \in A,\ s_t \in S,\ s_{t+1} \in S) \]
that specifies the transition probability of moving from state $s_t$ to $s_{t+1}$ by taking action $a$.
The reward function is a probability function
\[ R(r \in \mathbb{R} \mid a \in A,\ s_t \in S,\ s_{t+1} \in S) \]
that gives the probability of receiving a reward $r$ given an action that was taken in state $s_t$ and has led to state $s_{t+1}$.
The value function $V^{\Pi}(s \in S)$ gives the expected future reward given a policy.
Rewards in the distant future may be worth less than near-future rewards. One of the
ways to model this "present preference" is by defining the value function as a
discounted sum of the future reward:
∞

V Π (st ) = E ∑ γ n ⋅ r (t + n)
 n=0
 , where st is the state at time t, r(t+n) is the reward n
time steps after time t, and γ is a discount parameter between 0 and 1.
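As a minimal sketch of the discounted sum, the following computes it for a single, finite sequence of rewards. The function name and the example rewards are illustrative assumptions; the value function itself is the expectation of this quantity over all reward sequences the policy can generate, which the sketch does not attempt to estimate.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**n * r(t+n) over one observed reward sequence.

    `rewards[n]` is the reward received n time steps after time t.
    This is one sample of the quantity whose expectation defines the value.
    """
    total = 0.0
    for n, reward in enumerate(rewards):
        total += (gamma ** n) * reward
    return total


# A reward delivered two steps in the future is worth gamma**2 now.
immediate = discounted_return([1.0, 0.0, 0.0], gamma=0.5)
delayed = discounted_return([0.0, 0.0, 1.0], gamma=0.5)
```

With a discount parameter of 0.5 the delayed reward is worth a quarter of the immediate one, which is the "present preference" described above.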
The computational problem of RL is to follow a policy that maximizes the future
reward; i.e., to search for a policy $\Pi^*$ that maximizes the value function $V(s)$ (it can
be proven that such a policy exists).
Solution of the RL problem - algorithms
Finding the optimal policy $\Pi^*$ relies on two processes. The first is evaluating the
quality of a policy $\Pi$; i.e., calculating the value function of all states given a policy, $V^{\Pi}(s)$. The second is improving a policy; i.e., finding a policy $\Pi'$ such that $V^{\Pi'}(s) \geq V^{\Pi}(s)$.
These two processes have been combined in many ways in different situations.
When a full description of the environment has been established, one can use dynamic
programming. This method uses the statistics of the environment and combines value
iterations to evaluate the value function and policy iteration to improve the policy.
However this approach cannot be implemented in the most common case where a full
and reliable model of the environment cannot be found. Even when such a model
exists it may be impractical to use it. The state of the art algorithm for solving the RL
problem with no prior knowledge of the environment statistics is TD (λ). At each step the
algorithm improves the estimation of the value function by generating a prediction
error δt:
\[ \delta_t = r_{t+1} + \gamma \cdot V(s_{t+1}) - V(s_t), \]
where $r_{t+1}$ is the reward given at time $t+1$, $s_t$ and $s_{t+1}$ are the states of the agent at times
$t$ and $t+1$ respectively, $V$ is the estimate of the value function at time $t$, and $\gamma$ is the
discount factor defined above. The error is then used to update the estimates of
the policy and the value function. The update is also applied to states that were visited in
the past, and the parameter $\lambda$ sets the amount of credit each past step should
receive. When $\lambda = 0$, the algorithm is the one-step TD algorithm, TD(0); in this case only the
value function of the last state is updated. When $\lambda = 1$ the algorithm is the Monte
Carlo algorithm; in this case there is no decay in the update rate of past states.
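The update can be sketched as a tabular TD(λ) value estimator with accumulating eligibility traces. This is an illustrative sketch under stated assumptions, not the procedure used in the thesis: the function name, the two-state "cue" and "delay" episode, and the learning rate `alpha` are all hypothetical. In this formulation each trace decays by γ·λ per step, so with λ = 1 the only remaining decay is the discount factor.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=0.9, lam=0.8):
    """Update `values` in place from one episode.

    `episode` is a list of (state, reward, next_state) steps, where `reward`
    is r_{t+1} and `next_state` is None when the episode terminates.
    """
    traces = {state: 0.0 for state in values}  # eligibility of each state
    for state, reward, next_state in episode:
        v_next = 0.0 if next_state is None else values[next_state]
        # Prediction error: delta_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)
        delta = reward + gamma * v_next - values[state]
        # Decay the credit of previously visited states, then mark this one.
        for s in traces:
            traces[s] *= gamma * lam
        traces[state] += 1.0
        # Every state is updated in proportion to its current eligibility.
        for s in values:
            values[s] += alpha * delta * traces[s]
    return values


# A cue is followed by a delay state and then by a reward of 1.
episode = [("cue", 0.0, "delay"), ("delay", 1.0, None)]
one_step = td_lambda_episode({"cue": 0.0, "delay": 0.0}, episode, lam=0.0)
full_credit = td_lambda_episode({"cue": 0.0, "delay": 0.0}, episode, lam=1.0)
```

After this single episode, λ = 0 changes only the value of the state that immediately preceded the reward, whereas λ = 1 also assigns credit to the earlier cue state.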
A very important subclass of methods that uses the δ error for solving the RL problem
is known as actor-critic methods. These methods use a separate memory to represent
the policy and the value function. The actor stores the policy and the critic stores the
value and generates the TD error when there is a mismatch between predictions and
actual outcomes. The error is then used to update the policy and the value function.
This sub-class is important because of
its similarity to the biological structure of
basal ganglia networks.
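A minimal sketch of the actor-critic separation follows, assuming the simplest possible setting: a single state, two actions with fixed rewards, and one-step episodes, so that the prediction error reduces to δ = r − V. The function names, the softmax action selection, and the learning rates are illustrative assumptions and are not taken from the thesis.

```python
import math
import random


def softmax(preferences):
    """Convert action preferences into selection probabilities."""
    top = max(preferences)
    exps = [math.exp(p - top) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(rewards, steps=500, alpha=0.1, beta=0.1, seed=0):
    """Single-state actor-critic; `rewards[a]` is the fixed reward of action a."""
    rng = random.Random(seed)
    preferences = [0.0] * len(rewards)  # actor memory: the policy
    value = 0.0                         # critic memory: the state value
    for _ in range(steps):
        probs = softmax(preferences)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        # The critic generates the error; episodes end after one step,
        # so the value of the next state is zero.
        delta = rewards[action] - value
        value += alpha * delta                # critic update
        preferences[action] += beta * delta   # actor update uses the same error
    return preferences, value


preferences, value = run_actor_critic([1.0, 0.0])
```

The point of the sketch is that a single scalar error, produced by the critic, trains both memories; the actor never sees the reward directly.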
Basal Ganglia anatomy
The basal ganglia are neural structures within the motor, cognitive and limbic control
circuits in the mammalian forebrain. The neural network of the basal ganglia is
commonly viewed as two functionally related subsystems, the main axis and the
neuromodulators (3-5).
The main axis subsystem includes connections between all neocortical areas, the
amygdala and the hippocampus and the basal ganglia input structures; i.e., the
striatum (caudate, putamen and ventral striatum) and the subthalamic nucleus. These
project both directly and indirectly through the external segment of the globus
pallidus (GPe) to the basal ganglia output structures - the internal segment of the
globus pallidus (GPi) and the substantia nigra reticulata (SNr). The GPi and SNr
modify behavior through their projections to the frontal cortex (via the thalamus) and
brain stem premotor nuclei (5-7).
The major population of neurons in the striatum is made up of the medium spiny
neurons (MSN). These GABAergic neurons, which constitute >90% of the striatum
cells, receive their major excitatory input from the cortex and the thalamus and project
to both segments of the globus pallidus and SNr. In addition their axons give rise to a
local collateral arborization, which contact other spiny neurons (8). Other striatal
neurons are the small GABAergic interneurons (1% of the population) and the large
cholinergic interneurons (2%). These cholinergic neurons are thought to correspond to
the physiologically defined (by extracellular recording) tonically active neurons (TANs) (9, 10). Other types of striatal interneurons have also been observed (11).
In the classic view of the basal ganglia (7, 12) transmission of information within the
basal ganglia occurs both directly from the striatum to the GPi/SNr and indirectly
through the GPe and STN. The striatal origins of the direct and indirect pathways are
oppositely affected by D1 and D2 dopamine receptors (13-15). Recently, single axon
tracing anatomical studies have revealed an even more complex map of basal ganglia
connectivity. Striatal neurons projecting to the GPi and SNr send collaterals to the
GPe (16, 17). The physiological evidence for the importance of direct projections
from the motor cortex to the STN (the ‘hyper-direct pathway’) indicates that like the
striatum, the STN is an input stage of the basal ganglia (18, 19). In addition the
recently described feedback projections from the GPe to the striatum (8, 20)
demonstrate the additional complexity of the network compared to the classical view.
The basal ganglia neuro-modulators adjust activity along the main axis by regulation
of plasticity at the corticostriatal synapses (21, 22). The primary basal ganglia neuromodulators are dopamine (from midbrain dopaminergic neurons, 23) and
acetylcholine (from striatal cholinergic interneurons, TANs, 22). Parkinson's
disease, in which the dopaminergic system is the most seriously damaged but the
noradrenergic, serotonergic and cholinergic systems are also affected (24),
demonstrates the importance of neuromodulator input to the basal ganglia main axis.
The basal ganglia as a reinforcement learning agent
The pioneering studies of Schultz et al. (25) showed that dopaminergic neurons
increase their discharge rate when conditions are better than expected. These studies
(26) indicated that a dopaminergic cell that initially responds to delivery of food shifts
its response to an external cue that predicts the delivery of food and stops responding
to food delivery.
Based on these results it was suggested that dopaminergic neurons encode the
temporal difference prediction error (25). Other studies extended these
groundbreaking findings and showed that the dopaminergic signal resembles the TD
error signal (26-37).
It has been shown at the cellular level that dopamine contributes to plasticity in the
striatum (14, 21, 38). Based on the response properties of dopamine neurons and the
plasticity effects of the dopamine on the striatal neurons, reinforcement learning
models of the basal ganglia assume that the teaching message is transmitted to striatal
territories and reshapes the behavioral policy.
Reinforcement learning models have influenced basal ganglia research for the last
decade, yet there are still many fundamental questions which have not been
addressed. One of the major issues that still needs to be investigated in detail is: what
are the other neural correlates for reinforcement learning besides the dopaminergic
activity?
An important neuromodulator subpopulation is the cholinergic neurons of the
striatum. Consistent with the classical concept of dopamine-acetylcholine balance
(39), the dopaminergic neurons and the TANs have opposite responses. Dopaminergic
neurons typically increase their discharge rate in response to appetitive predictive
cues and outcomes, whereas TANs suppress their tonic discharge (40). Thus, it has
been suggested that some of the dopamine influence on striatal projection neurons is
mediated through inhibition of the TANs (41). The typical TAN response has led to
the conclusion that they may not encode the prediction error themselves but may
condition the dopaminergic signal (28).
Another fundamental issue is the neural correlates of RL with main axis activity.
Reward modulation of the main axis has mainly been studied at the level of the
striatum (42-45). Several studies have revealed discharge modulation of pallidal and
SNr neurons by reward (46-49) and even by the probability of future reward (50).
Nevertheless, unlike the dopaminergic studies, these studies did not find a simple and
coherent relation with RL models.
The lack of a negative teacher
Most studies of dopaminergic neurons have focused on the mismatch in the positive
domain of reinforcement; i.e., when conditions are better than expected (25).
Dopaminergic neurons typically increase their discharge rate in response to appetitive
predictive cues and outcomes. In line with the predictions of reinforcement learning
theories, dopaminergic neuron discharge decreases with omission of predicted
rewards (29, 51, 52). However, this discharge suppression is limited since the
neuronal firing rate is truncated at zero. In fact, several groups (27, 28) have reported
that the instantaneous firing of dopaminergic neurons does not demonstrate
incremental encoding of reward omission, and it was suggested that omission is
encoded by the duration of the discharge decrease (53).
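The truncation argument can be made concrete with a minimal sketch. It assumes, purely for illustration, a linear rate code with a hypothetical baseline of 5 spikes/s and a hypothetical gain of 20 spikes/s per unit of prediction error; neither number is taken from the recordings, and the function name is invented for the example.

```python
def firing_rate(prediction_error, baseline=5.0, gain=20.0):
    """Rate (spikes/s) of a hypothetical linear encoder, truncated at zero.

    `baseline` and `gain` are illustrative assumptions, not measured values.
    """
    return max(0.0, baseline + gain * prediction_error)


better_than_expected = firing_rate(0.5)     # room to increase above baseline
worse_than_expected = firing_rate(-0.5)     # already clipped at zero
much_worse_than_expected = firing_rate(-1.0)
```

Under these assumptions the two negative errors produce the same rate of zero, so their magnitude cannot be read out from the instantaneous rate, whereas positive errors of different sizes remain distinguishable.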
There are even fewer studies and less agreement on basal ganglia responses to
aversive events. There is no consensus regarding the responses of dopaminergic
neurons to aversive events. Some studies suggest that at least some of the
dopaminergic neurons increase their firing rate following an aversive outcome (see
54 for review). Other classical and instrumental conditioning studies suggest that
some dopamine neurons increase their firing rate following a cue that predicts
aversive outcomes (55, 56). Studies on anesthetized rats have shown that
dopaminergic neurons mainly decrease their discharge rate following an aversive
stimulus (57, 58), but a recent study by this group showed that the decrease is limited
to VTA dopaminergic neurons (59). There are reports that TAN activity
differentiates appetitive and aversive stimuli (60, 61), but it remains unclear whether
and how TANs respond to expectation of aversion. There are no studies on the
responses of the primate basal ganglia main axis high frequency neurons to
expectation of deterministic or probabilistic aversive events.
The research goals and thesis outline
The main goal of my research was to compare aversive and reward related activity in
the basal ganglia. The main research question was whether activity in the basal
ganglia encodes positive and/or negative values.
Another goal was to test whether the anatomical division of the basal ganglia systems
into neuromodulators and main axis is also reflected in the activity of these
populations. Furthermore, does this division also reflect functional differences
between these subpopulations? Specifically does neuromodulator activity resemble
activity expected from a RL teacher (e.g., a critic) and is activity in the main axis
consistent with it being the executor of the system (e.g., the actor)?
To test these issues I trained two monkeys on a probabilistic conditioning task with
food, neutral and airpuff outcomes and recorded single cell activity in the basal
ganglia.
The first chapter describing my work was published in the Journal of Neuroscience. In
this paper I analyzed the activity of midbrain dopaminergic neurons and striatal
cholinergic interneurons (neuromodulators). I found that both dopaminergic and
cholinergic neurons were more strongly modulated by reward than by aversive related
events and better reflected the probability of reward than aversive outcome. I also
found that these populations encode the difference between reward and aversive
events at different epochs of probabilistic classical conditioning trials.
The second chapter was published in the Journal of Neurophysiology. In this paper I
analyzed the activity of main axis neurons. Like neuromodulators, the cells in the
GPe, GPi and SNr also showed preferential activation to reward. I compared these
populations and found differences between the output structures of the basal ganglia.
The third chapter of the results analyzes the activity of the two GABAergic
subpopulations of the basal ganglia: the low frequency discharge neurons of the
globus pallidus and the phasically active neurons of the striatum. I found that although
these populations have different physiological properties (low vs. high frequency of
the other GABAergic populations) the low frequency discharge neurons show
asymmetry in value encoding.
The fourth chapter was published in Neuron. In this paper I conducted a correlation
analysis and found that responses of the neuromodulators were homogeneous whereas
the main axis responses were diverse. In addition I found that pairs of neuromodulator
cells dynamically modulate correlation. These changes in correlations may provide an
additional mechanism for controlling their concentrations in the striatum, beyond
firing rate and pattern.
The fifth chapter was published in the Journal of Neuroscience Methods. In this paper
I describe methods which I developed to quantify the quality of the isolation of
extracellular recordings. These methods were used in the first four chapters of the
thesis.
Methods
A full description of the methods used in this research can be found in the
methods sections of the articles. In this chapter I briefly summarize the behavioral task
and the recording technique.
Behavioral task, recording and data acquisition
Two monkeys (L and S, Macaca fascicularis, female 4 kg and male 5 kg) were
engaged in a probabilistic delay classical-conditioning task. Seven different fractal
cues, filling the entire screen, were introduced to the monkey, each predicting the
outcome in a probabilistic manner. Three cues (reward cues) predicted a liquid food
outcome with a delivery probability of 1/3, 2/3 and 1. Three other cues (aversive cues)
predicted an airpuff outcome with a delivery probability of 1/3, 2/3 and 1. The 7th cue
(the neutral cue) was never followed by a food or airpuff outcome. Cues were
presented for two seconds and were immediately followed by a result epoch which
could include an outcome (food, airpuff) or no outcome according to the probabilities
associated with the cue. All trials were followed by a variable inter-trial interval.
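The cue-outcome contingencies described above can be summarized in a short sketch. The cue labels, the function name and the use of a seeded random generator are illustrative assumptions; cue timing and the inter-trial interval are deliberately left out.

```python
import random

# Seven cues: three reward cues, three aversive cues and one neutral cue.
# Each entry maps a (hypothetical) cue label to (outcome, delivery probability).
CUES = {
    "reward_1/3": ("food", 1 / 3),
    "reward_2/3": ("food", 2 / 3),
    "reward_1": ("food", 1.0),
    "aversive_1/3": ("airpuff", 1 / 3),
    "aversive_2/3": ("airpuff", 2 / 3),
    "aversive_1": ("airpuff", 1.0),
    "neutral": (None, 0.0),
}


def run_trial(cue, rng):
    """Return the outcome delivered in the result epoch, or None if omitted."""
    outcome, probability = CUES[cue]
    if outcome is not None and rng.random() < probability:
        return outcome
    return None


rng = random.Random(0)
outcomes = [run_trial("reward_2/3", rng) for _ in range(300)]
delivered_fraction = outcomes.count("food") / len(outcomes)
```

Over many trials the delivered fraction for each cue should approach its nominal probability, while the neutral cue is never followed by an outcome.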
Following the training period (L: 6, S: 2 months), I recorded the behavior and the
basal ganglia neural activity while the monkeys were engaged in the behavioral task.
Both monkeys had reached a steady state in their behavior before recording; monkey
L was trained for a longer period since during training I was preparing the data
acquisition setup for recording.
After the training period an MRI-compatible head holder and a recording chamber were
attached to the monkey’s head. The head holder enabled the immobilization of the
head during recording. The recording chamber was attached to the skull tilted 40°
laterally in the coronal plane with its center targeted at the stereotaxic coordinates of
the GPe (62, A15, L7, H1; 63).
In each recording session I recorded extracellular activity from 8 glass-coated
tungsten microelectrodes which were advanced separately into the targets in the basal
ganglia. Spike activity was sorted and classified online using a template-matching
algorithm.
During recording, units were classified according to anatomical location, extracellular
waveform, firing rate and pattern, background activity and in some cases response to
free reward and to injection of dopamine agonists.
In addition to spiking data I monitored mouth movements by an infrared reflection
detector and three computerized digital video cameras recorded the monkey's face and
upper limbs at 50 Hz. Video analysis was carried out with custom-made software
to identify periods when the monkeys closed their eyes (see appendix).
RESULTS
Chapter details:
I. Midbrain Dopaminergic Neurons and Striatal Cholinergic Interneurons
Encode the Difference between Reward and Aversive Events at Different
Epochs of Probabilistic Classical Conditioning Trials.
Mati Joshua, Avital Adler, Rea Mitelman, Eilon Vaadia and Hagai Bergman.
Journal of Neuroscience. 2008 28(45): 11673-11684.
II. Encoding of probabilistic rewarding and aversive events by pallidal and nigral
neurons. Mati Joshua, Avital Adler, Boris Rosin, Eilon Vaadia and Hagai
Bergman. J Neurophysiol. 2009 Feb;101(2):758-72
III. Asymmetric Encoding of Positive and Negative Expectations by
Low-Frequency Discharge Basal Ganglia Neurons.
Mati Joshua, Avital Adler and Hagai Bergman.
IV. Synchronization of midbrain dopaminergic neurons is enhanced by rewarding
events. Mati Joshua, Avital Adler, Yifat Prut, Eilon Vaadia, Jeffery R.
Wickens and Hagai Bergman. Neuron, 2009 June 11; 62(5): 695–704
V. Quantifying the isolation quality of extracellularly recorded action potentials.
Mati Joshua, Shlomo Elias, Odeya Levine and Hagai Bergman. The Journal of
Neuroscience Methods, 2007 Jul 30;163(2):267-82.
The Journal of Neuroscience, November 5, 2008 • 28(45):11673–11684 • 11673
Behavioral/Systems/Cognitive
Midbrain Dopaminergic Neurons and Striatal Cholinergic
Interneurons Encode the Difference between Reward and
Aversive Events at Different Epochs of Probabilistic Classical
Conditioning Trials
Mati Joshua,1,2 Avital Adler,1,2 Rea Mitelman,1,2 Eilon Vaadia,1,2 and Hagai Bergman1,2,3
1Department of Physiology, The Hebrew University–Hadassah Medical School, Jerusalem 91120, Israel, and 2The Interdisciplinary Center for Neural
Computation and 3Eric Roland Center for Neurodegenerative Diseases, The Hebrew University, Jerusalem 91904, Israel
Midbrain dopaminergic neurons (DANs) typically increase their discharge rate in response to appetitive predictive cues and outcomes,
whereas striatal cholinergic tonically active interneurons (TANs) decrease their rate. This may indicate that the activity of TANs and
DANs is negatively correlated and that TANs can broaden the basal ganglia reinforcement teaching signal, for instance by encoding worse
than predicted events. We studied the activity of 106 DANs and 180 TANs of two monkeys recorded during the performance of a classical
conditioning task with cues predicting the probability of food, neutral, and air puff outcomes. DANs responded to all cues with elevations
of discharge rate, whereas TANs depressed their discharge rate. Nevertheless, although dopaminergic responses to appetitive cues were
larger than their responses to neutral or aversive cues, the TAN responses were more similar. Both TANs and DANs responded faster to
an air puff than to a food outcome; however, DANs responded with a discharge elevation, whereas the TAN responses included major
negative and positive deflections. Finally, food versus air puff omission was better encoded by TANs. In terms of the activity of single
neurons with distinct responses to the different behavioral events, both DANs and TANs were more strongly modulated by reward than
by aversive related events and better reflected the probability of reward than aversive outcome. Thus, TANs and DANs encode the task
episodes differentially. The DANs encode mainly the cue and outcome delivery, whereas the TANs mainly encode outcome delivery and
omission at termination of the behavioral trial episode.
Key words: primate; basal ganglia; spike train; reinforcement; substantia nigra; striatum
Introduction
The neural network of the basal ganglia (Bar-Gad and Bergman,
2001; Gurney et al., 2004) is commonly viewed as two functionally related subsystems. The main axis includes fast neurotransmissions (glutamate and GABA) between the cortex, striatum,
and the basal ganglia output structures. The second subsystem is
composed of neuromodulators that adjust the activity along the
main axis by regulation of plasticity at the corticostriatal synapse
(Calabresi et al., 2000; Reynolds et al., 2001). The primary basal
ganglia neuromodulators are dopamine [from midbrain dopaminergic neurons (DANs) (Arbuthnott and Wickens, 2007)] and
acetylcholine [from striatal cholinergic tonically active interneurons (TANs) (Calabresi et al., 2000)].
Received Aug. 13, 2008; accepted Sept. 16, 2008.
This work was partly supported by the “Fighting against Parkinson” grant from the Hebrew University Netherlands Association. We thank Dr. Bryon Gomberg for MRI; Michael Levi and Michal Rivlin for help in preparing the
experimental setup; Yael Renernt and Inna Finkes for monkey training and general assistance; and Geoffrey Schoenbaum, Yavin Shaham, and Genela Morris for critical reading of early versions of this manuscript.
Correspondence should be addressed to Mati Joshua, Department of Physiology, The Hebrew University–Hadassah Medical School, P.O. Box 12272, Jerusalem 91120, Israel. E-mail: [email protected].
DOI:10.1523/JNEUROSCI.3839-08.2008
Copyright © 2008 Society for Neuroscience 0270-6474/08/2811673-12$15.00/0
Previous studies have shown that DANs encode the prediction
error in the positive domain (i.e., they respond when conditions
are better than expected) (Schultz et al., 1997). Consistent with
the classical concept of dopamine–acetylcholine balance (Barbeau, 1962), the DANs and the TANs have opposite responses.
DANs typically increase their discharge rate in response to appetitive predictive cues and outcomes, whereas TANs suppress their
tonic discharge (Graybiel et al., 1994). Thus, it has been suggested
that some of the dopamine influence on striatal projection neurons is mediated through inhibition of the TANs (Wang et al.,
2006).
In contrast to the extensive research on reward-related activity, only a few studies have explored whether basal ganglia neurons encode the negative domain (e.g., aversive outcome or
omission of rewards, which might not be identically encoded by
the nervous system). Dopamine neurons decrease their firing rate
in response to reward omission (Schultz et al., 1997). However,
this suppression is limited because firing rate is truncated at zero.
Other groups (Morris et al., 2004; Bayer and Glimcher, 2005)
have reported that the discharge rate of dopaminergic neurons
does not demonstrate instantaneous incremental encoding of reward omission, and an alternative encoding scheme, based on
response duration, has been proposed (Bayer et al., 2007). There
are even fewer studies and less agreement on basal ganglia responses to aversive events.
Results I
Joshua et al. • Value Encoding by Basal Ganglia Critics
11674 • J. Neurosci., November 5, 2008 • 28(45):11673–11684
Classical and instrumental conditioning studies suggest that some of the dopamine neurons increase
their firing rate after a cue that predicts aversive outcomes (Mirenowicz and Schultz, 1996; Guarraci and Kapp, 1999). However,
that increase in firing rate may be a result of reward generalization. Studies on anesthetized rats have shown that DANs mainly
decrease their discharge rate after aversive stimuli (Ungless et
al., 2004; Coizet et al., 2006). There are reports that TAN activity
differentiates appetitive and aversive stimuli (Ravel et al., 2003;
Yamada et al., 2004), but it remains unclear whether and how
TANs respond to expectation of aversion.
Here, we designed a classical conditioning paradigm with
aversive and rewarding probabilistic outcomes. Symmetric manipulations of expectation of food (rewarding event) or an air
puff (aversive event) enable the comparison of neural responses
to expectation of positive and negative outcomes. To provide
additional controls for sensory, arousal-related, and generalization responses, our behavioral task included neutral trials, which
had the same structure as the rewarding and the aversive trials but
never yielded positive or negative outcomes.
side were attached to the monkey’s head. The head holder enabled the
immobilization of the head during recording. The recording chamber
was attached to the skull tilted 40° laterally in the coronal plane with its
center targeted at the stereotaxic coordinates of the GPe (A15, L7, H1)
(Szabo and Cowan, 1984; Martin and Bowden, 2000). Analgesia and
antibiotics were administered during surgery and continued for 2 d postoperatively. Recording began after a postoperative recovery period of 5 d.
We estimated the stereotaxic coordinates of the physiological recordings within the basal ganglia nuclei with MRI scans (see Fig. 1a). The MRI
scan (General Electric 1.5 Tesla system; fast spin echo inversion recovery
sequence; dual surface coil; repetition time, 3 s; echo time, 0.044 s; inversion time, 0.3 s; echo train length, 8; coronal slices, 2 mm wide) (Matsui
et al., 2007) was performed with five tungsten electrodes at accurate
coordinates of the chamber [Y,X = (6,0), (0,−6), (0,0), (0,6), and (−6,0)
in mm from the chamber center]. We then aligned the two-dimensional
MRI images with the sections of the atlas of Macaca fascicularis (Martin
and Bowden, 2000). We performed an additional MRI scan at the final
stage of the recording period of monkey L to verify our coordinate system. At the end of the experiment, the chamber and head holder of both
monkeys were removed, the skin was sutured, and after a recovery period
the monkeys were sent to a primate sanctuary (http://monkeypark.co.il).
All surgical procedures were performed under aseptic conditions and
general isoflurane and N2O deep anesthesia. MRI procedure was performed under Dormitor and ketamine light anesthesia.
Recording and data acquisition. During recording sessions, the monkey’s head was immobilized and eight glass-coated tungsten microelectrodes (impedance, 0.2–0.8 MΩ at 1000 Hz), confined within a cylindrical guide (1.65 mm inner diameter), were advanced separately (EPS;
Alpha Omega Engineering) into the targets in the basal ganglia. The
electrical activity was amplified with a gain of 5K and bandpass filtered
with a 1–6000 Hz four-pole Butterworth filter and continuously sampled
at 25 kHz by a 12-bit ±5 V analog-to-digital (A/D) converter. Spike activity was
sorted and classified on-line using a template-matching algorithm (ASD;
Alpha Omega Engineering). Spike detection pulses and behavioral events
were sampled at 25 kHz (AlphaLab; Alpha Omega Engineering).
Mouth movements were monitored by an infrared reflection detector
(see Fig. 2a) (Dr. Bouis Devices). The infrared signal was filtered between
1 and 100 Hz by a bandpass four-pole Butterworth filter, and sampled at
1.56 kHz. In addition, three computerized digital video cameras recorded
the monkey’s face and upper limbs at 50 Hz. Video analysis was performed on home-made custom software to identify periods when the
monkeys closed their eyes (see Fig. 2b). Briefly, monkey eye location was
identified by a human observer (once for a daily recording session in
which the monkey’s head was immobilized by connecting the head
holder to an external metal frame), and a classification of eye states (open
or closed) was made based on the number of dark pixels in the eye area.
The algorithm was tested by random samples from several recording days
and found to be consistent with the judgments of a human observer for
>99% of the images. In representative recording sessions, we recorded the
monkeys’ spontaneous vocalizations, their arm movements with an
accelerometer, eye position using infrared reflection, and heartbeat by
electrocardiogram (ECG) (veterinary ECG 5 leads system; Palco
Laboratories).
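The eye-state classification described above can be sketched as a simple dark-pixel count within the manually marked eye area. This is a minimal illustration only, not the custom software used in the study: the function names, the gray-level cutoff, and the pixel-count threshold are hypothetical, and the assumption that an open eye yields more dark pixels (pupil and iris) than a closed eyelid is ours.

```python
def count_dark_pixels(eye_region, dark_level):
    """Count pixels darker than dark_level in a 2-D list of gray values (0-255)."""
    return sum(1 for row in eye_region for px in row if px < dark_level)


def classify_eye_state(eye_region, dark_level=60, min_dark_pixels=4):
    """Label one video frame as 'open' or 'closed'.

    Assumption (not stated in the text): an open eye exposes the dark
    pupil/iris, so it produces at least min_dark_pixels dark pixels.
    Both thresholds are hypothetical and would need per-session tuning.
    """
    n_dark = count_dark_pixels(eye_region, dark_level)
    return "open" if n_dark >= min_dark_pixels else "closed"
```

In practice the thresholds would be validated against a human observer on random frames, as was done for the algorithm reported here.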
During the acquisition of the neuronal data, two experimenters (M.J.,
A.A.) controlled the position (2–50 µm steps) of the eight electrodes and
the on-line spike sorting (ASD; Alpha Omega). Quality of detection and
spike sorting was estimated and graded on-line every 3 min. The on-line
quality estimation was based on the superimposed analog traces of the
recently (20–100) sorted spikes, the waveforms of events that crossed an
amplitude threshold set by the experimenter above the noise level of each
electrode, the cumulative distribution of the distances from the detected
events to the detection template, and the stability of the discharge rate.
The first step in the neuronal data analysis targeted verification of the
real-time isolation quality (Joshua et al., 2007) and stability of the discharge rate (Gourévitch and Eggermont, 2007). Recorded units were
subjected to off-line quality analysis that included tests for rate stability,
refractory period, waveform isolation, and recording time. First, firing
rate as a function of time during the recording session was graphically
displayed, and the largest continuous segment of stable data was selected
Materials and Methods
All experimental protocols were performed in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and with Hebrew University guidelines for the use and care of
laboratory animals in research, supervised by the institutional animal
care and use committee.
Behavioral task. Two monkeys (L and S; Macaca fascicularis; female, 4
kg; male, 5 kg) were engaged in a probabilistic delay classical conditioning task (see Fig. 1b). The monkeys were seated in a primate
chair facing a 17 inch computer screen placed at a distance of 50 cm.
Seven different fractal cues (Chaos Pro 3.2 program; www.chaospro.de),
stretched on the entire screen, were introduced to the monkey, each
predicting the outcome in a probabilistic manner. Three cues (reward
cues) predicted a liquid food outcome (L, 0.4 ml, 100 ms duration; S, 0.6
ml, 150 ms) with a delivery probability of 1/3, 2/3, and 1. Three other cues
(aversive cues) predicted an air puff outcome (L, 100 ms duration; S, 150
ms; 50–70 psi; split and directed 2 cm from each eye; Airstim; San Diego
Instruments) with a delivery probability of 1/3, 2/3, and 1. The seventh
cue (the neutral cue) was never followed by a food or air puff outcome.
Cues were presented for 2 s and were immediately followed by a result
epoch, which could include an outcome (food, air puff) or no outcome
according to the probabilities associated with the cue. The beginning of
the result epoch was signaled by one of three sounds that discriminated
the three possible events: a drop of food, an air puff, or no outcome (see
Fig. 1b). Sounds were normalized to the same intensity and duration.
These sounds were additional to the background device sounds (air puff
solenoid and food pump). All trials were followed by a variable intertrial
interval (ITI) (monkey S, 3–7 s; monkey L, 4–8 s). Because of the probabilistic structure of the behavioral task and to equalize the average occurrence of each outcome, the nondeterministic cues (p ≠ 1 for reward or
aversive outcome) were introduced three times more often than the deterministic ones. With this occurrence ratio, all trials were randomly
interleaved.
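The trial structure described above can be sketched as a weighted random draw of cues followed by a probabilistic outcome. This is an illustrative sketch under stated assumptions: the cue labels, the function names, the weight of 1 given to the neutral cue, and the uniform distribution of the ITI are hypothetical choices, not details taken from the experimental software.

```python
import random

# Outcome type and delivery probability per cue, following the task description.
# The cue labels are hypothetical names introduced for this sketch.
CUES = {
    "R1/3": ("food", 1 / 3), "R2/3": ("food", 2 / 3), "R1": ("food", 1.0),
    "A1/3": ("air_puff", 1 / 3), "A2/3": ("air_puff", 2 / 3), "A1": ("air_puff", 1.0),
    "N": (None, 0.0),
}


def cue_weight(name):
    """Nondeterministic cues (0 < p < 1) appear three times as often.

    Assumption: the neutral cue is treated like a deterministic cue (weight 1).
    """
    _, p = CUES[name]
    return 3 if 0.0 < p < 1.0 else 1


def make_session(n_trials, iti_range=(4.0, 8.0), seed=0):
    """Return a list of (cue, delivered_outcome_or_None, iti_seconds) tuples."""
    rng = random.Random(seed)
    names = list(CUES)
    weights = [cue_weight(n) for n in names]
    trials = []
    for cue in rng.choices(names, weights=weights, k=n_trials):
        outcome, p = CUES[cue]
        delivered = outcome if (outcome is not None and rng.random() < p) else None
        # Assumption: the ITI is drawn uniformly from the stated range.
        trials.append((cue, delivered, rng.uniform(*iti_range)))
    return trials
```

The default ITI range corresponds to monkey L (4–8 s); passing `iti_range=(3.0, 7.0)` would correspond to monkey S.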
During a behavioral session (usually five sessions per week), the monkeys performed 900–1300 trials/d before losing their motivation for
food. Over the weekend, the monkeys were given ad libitum access to
food. Water was available ad libitum during all training and recording
periods. After the training period (L, 6; S, 2 months), we recorded the
behavior and the basal ganglia neural activity while the monkeys were
engaged in the behavioral task. The same images and sounds were used
both for training and for the recording periods (L, 6; S, 5 months);
however, the visual and the auditory stimuli were shuffled between
monkeys.
Surgery, magnetic resonance imaging, and rehabilitation. After the
training period, a magnetic resonance imaging (MRI)-compatible Cilux
head holder and a square Cilux recording chamber with a 27 mm (inner)
Table 1. The neural database
Measure                          TAN, monkey L                   TAN, monkey S                   DAN, monkey L                   DAN, monkey S
No. of cells                     71                              109                             41                              65
Isolation score                  0.92 ± 0.06 [0.8–0.99]          0.91 ± 0.06 [0.8–0.99]          0.76 ± 0.16 [0.5–0.99]          0.8 ± 0.13 [0.5–0.99]
Fraction ISI <2 ms               0.0002 ± 0.0003 [0–0.0016]      0.0002 ± 0.0003 [0–0.0016]      0.0008 ± 0.0009 [0–0.0039]      0.0007 ± 0.001 [0–0.005]
Discharge rate (spike/s)         6.8 ± 1.4 [3.1–9.4]             5.0 ± 1.4 [1.93–9.2]            4.5 ± 1.9 [0.69–9.4]            3.7 ± 1.6 [0.77–9.4]
Recorded time (s)                3647 ± 1891 [1210–9901]         3414 ± 1920 [1260–10,801]       2992 ± 1372 [1260–7195]         3984 ± 2037 [1260–11,161]
No. of recorded trials           309 ± 168 [111–828]             378 ± 204 [123–1221]            254 ± 118 [110–621]             393 ± 200 [126–1098]
No. of spikes/recorded cell      25,112 ± 14,270 [5026–66,410]   17,147 ± 10,948 [2442–63,364]   13,211 ± 10,817 [3952–67,895]   15,070 ± 10,327 [2324–46,617]
The recording statistics were calculated separately for each neural population. Each cell in the table contains the mean and SD and in brackets the range of the scores. The range of the isolation score is 0 to 1. "Fraction ISI <2 ms" is the fraction
of interspike intervals (ISIs) <2 ms of all ISIs of a cell. Recording time and number of recorded trials represent only the part of the recording satisfying the inclusion criteria and included in the analysis database.
Figure 1. MRI and task. a, MRI identification of recording coordinates. Coronal MRI images numbered with respect to distance
(in millimeters) from anterior commissure. Tungsten microelectrodes are inserted at known chamber coordinates. Identification
of the brain structures is based on alignment of the MRI images with the monkey atlas. Abbreviations: AC, Anterior commissure;
C, caudate; Chm, recording chamber (filled with 3% agar); Elc, electrode; G, globus pallidus; P, putamen; S, substantia nigra; T,
thalamus. b, Behavioral task. Top, Reward trials; middle, neutral trials; bottom, aversive trials. Cues are shown for monkey L.
Different speaker colors represent different sounds.
for additional analysis. Second, cells in which >0.02 of the total interspike intervals were <2 ms were excluded from the database. Third, only
TANs with an isolation score (Joshua et al., 2007) >0.8 and DANs with
an isolation score >0.5 were included in the database. The lower threshold used for the DANs is attributable to the highly dense cellular structure
of the substantia nigra pars compacta (SNc), which makes single-cell
isolation difficult. We tested the subgroup of DANs with an isolation
score >0.8 (N = 52) and found the same qualitative population results as
reported for the larger DAN population. Finally, only cells that met the
above inclusion criteria for >20 min during
performance of the behavioral task were included in the neural database (average, 59 min
and 346 trials). Table 1 provides the statistical
details of the cells that were included in the
analysis database.
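The numerical inclusion criteria listed above (refractory-period violations, isolation score, and stable recording time) can be expressed compactly. The following is a minimal sketch, not the analysis code used in the study: the function names are hypothetical, the isolation score is taken as a precomputed input (Joshua et al., 2007), and the rate-stability step, which was done by graphical inspection, is represented only by the duration of the stable segment.

```python
def fraction_short_isi(spike_times, limit=0.002):
    """Fraction of interspike intervals shorter than limit (seconds)."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    if not isis:
        return 0.0
    return sum(1 for isi in isis if isi < limit) / len(isis)


def passes_inclusion(cell_type, isolation_score, spike_times, stable_seconds):
    """Apply the thresholds given in the text to one recorded unit.

    cell_type is 'TAN' or 'DAN'; spike_times are sorted spike times in seconds;
    stable_seconds is the length of the stable segment chosen by the analyst.
    """
    min_score = {"TAN": 0.8, "DAN": 0.5}[cell_type]
    return (
        fraction_short_isi(spike_times) <= 0.02   # excluded if >0.02 of ISIs are <2 ms
        and isolation_score > min_score           # >0.8 for TANs, >0.5 for DANs
        and stable_seconds > 20 * 60              # criteria met for >20 min
    )
```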
During recording, units were classified according to anatomical location, extracellular
waveform, firing rate and pattern, background
activity, and in some cases response to free reward and to injection of dopamine agonists. To
validate classification, we performed off-line
analyses of the extracellular waveform shape,
firing rate, and firing pattern of the neurons
(see Figs. 3b, 4b). Waveform shape was quantified as the duration from the first negative peak
to the next positive peak; rate was defined as the
average of the overall firing rate; firing pattern
was quantified by the coefficient of variation
(SD/mean) of the interspike intervals. To further validate the DAN population response, we
repeated the population analysis on a subset of
DANs with a firing rate <8 Hz and peak-to-peak duration of >0.5 ms. The results of this
analysis were similar to those of the whole recorded population (data not shown). Finally,
apomorphine (0.1 mg/kg) was injected in a few
cases (see Fig. 4c) to test for suppression of DAN
activity (Aebischer and Schultz, 1984). We
quantified this suppression as the root mean
square (RMS) of the high-pass-filtered signal
(300–6000 Hz). We used the RMS and not the
spike rate to avoid possible errors and biases
induced by spike detection and sorting (Moran
et al., 2006), which are enhanced after apomorphine intramuscular injection because of monkey movements.
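Two of the quantities used above, the coefficient of variation of the interspike intervals and the RMS of a signal segment, can be sketched as follows. This is an illustration only: the function names are hypothetical, the population form of the SD is an assumption (the text specifies only SD/mean), and the RMS function expects samples that have already been high-pass filtered.

```python
import math


def isi_cv(spike_times):
    """Coefficient of variation (SD/mean) of the interspike intervals.

    Assumption: the SD is the population SD (divisor n).
    """
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    mean = sum(isis) / len(isis)
    sd = math.sqrt(sum((x - mean) ** 2 for x in isis) / len(isis))
    return sd / mean


def rms(samples):
    """Root mean square of an (already filtered) signal segment."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))
```

A perfectly regular spike train gives a CV of 0, whereas irregular or bursty firing gives larger values, which is what separates the clusters in the firing-pattern axis of the off-line classification.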
Statistical analysis. Neuronal responses to behavioral events were first characterized by their
poststimulus time histogram (PSTH). The histograms were calculated in 1 ms bins and smoothed
with a Gaussian window with a SD of 20 ms. The
baseline firing rate was calculated by averaging the
firing rate in the last 3 s of the variable (4–8, 3–7 s;
monkey L and S, respectively) ITI and was denoted as baselineFR as follows:
$$\mathrm{baselineFR} \equiv \operatorname*{mean}_{0 \le t \le 3}\left(\mathrm{psth}_{\mathrm{ITI\_END}}(t)\right).$$
To determine significant responses in the single PSTH analysis, we
calculated the SD of the PSTH of the last 3 s of the ITI using the same
number of trials as in the studied PSTH and identified time segments
in which the deviation from baselineFR exceeded three times the
ITI-SD (3σ rule). A response was considered significant only if the
duration of the deviant segment was >60 ms
(three times the SD of the smoothing filter). A
cell was considered to have a significant response on a trial epoch if at least one of its
PSTHs (e.g., one of the three PSTHs after reward, aversive, or neutral cue) had a significant response. To check that this analysis was
not biased by multiple comparison confounds, we performed the same analysis on
the ITI epoch and found that none of the cells was
significantly modulated at this epoch.
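The single-PSTH significance procedure described above (1 ms bins, Gaussian smoothing with an SD of 20 ms, the 3σ rule relative to the ITI baseline, and a minimal duration of 60 ms) can be sketched as follows. This is a simplified illustration, not the MATLAB code used in the study: the function names are hypothetical, and the truncation of the kernel at ±3 SD with renormalization at the edges is our assumption.

```python
import math


def psth(trials, t_start, t_stop, bin_s=0.001):
    """Trial-averaged firing rate (spikes/s) in bins of bin_s seconds."""
    n_bins = int(round((t_stop - t_start) / bin_s))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            i = math.floor((t - t_start) / bin_s)
            if 0 <= i < n_bins:
                counts[i] += 1
    return [c / (len(trials) * bin_s) for c in counts]


def gaussian_smooth(rate, sd_bins=20):
    """Smooth with a Gaussian window; kernel truncated at 3 SD (assumption)."""
    half = 3 * sd_bins
    kernel = [math.exp(-0.5 * (k / sd_bins) ** 2) for k in range(-half, half + 1)]
    n = len(rate)
    out = []
    for i in range(n):
        acc = norm = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:
                acc += w * rate[idx]
                norm += w
        out.append(acc / norm)  # renormalize where the kernel overlaps the edge
    return out


def significant_segments(response, baseline, n_sd=3.0, min_bins=60):
    """Return (start, stop) bin indices where |response - baselineFR| > n_sd * SD
    for more than min_bins consecutive bins (60 ms with 1 ms bins)."""
    base_mean = sum(baseline) / len(baseline)
    base_sd = math.sqrt(sum((b - base_mean) ** 2 for b in baseline) / len(baseline))
    threshold = n_sd * base_sd
    segments, start = [], None
    # A trailing sentinel equal to the baseline closes a run that reaches the end.
    for i, r in enumerate(list(response) + [base_mean]):
        deviant = abs(r - base_mean) > threshold
        if deviant and start is None:
            start = i
        elif not deviant and start is not None:
            if i - start > min_bins:
                segments.append((start, i))
            start = None
    return segments
```

Here `baseline` stands for the smoothed PSTH of the last 3 s of the ITI, computed with the same number of trials as the studied PSTH.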
We defined the difference index between two
events as the mean absolute difference between
their PSTHs, i.e., the following:
$$\text{difference index}(\mathrm{event}_1, \mathrm{event}_2) \equiv \operatorname*{mean}_{t}\left(\left|\mathrm{psth}_1(t) - \mathrm{psth}_2(t)\right|\right).$$
This index is a mean difference between rate
functions and hence has units of spike per second. To test the significance of this index, we
used resampling (bootstrap) methods. Single-trial responses were shuffled and resampled repeatedly into two groups, and the difference index was then calculated between them. This
process was repeated 500 times. A difference
index was considered significant if it was larger
than the difference indices of a given fraction of
these surrogates (1 − p, where p is the test confidence level). To cross-check the difference index results, we performed MANOVAs (p <
0.05) using 50 and 100 ms time bins. We also
bootstrapped the MANOVA statistics and
found that all these analyses yielded similar results. In this manuscript, we elected to show the
difference index because it gives an intuitive
range of difference [i.e., the average difference
(in spike rate) between the responses to two
events].
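The difference index and its resampling test can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the function names are hypothetical, each trial is assumed to be already converted to a binned rate vector of equal length, and the surrogate groups are assumed to keep the original group sizes.

```python
import random


def mean_psth(trials):
    """Average equal-length single-trial rate vectors bin by bin."""
    n = len(trials)
    return [sum(col) / n for col in zip(*trials)]


def difference_index(psth1, psth2):
    """Mean absolute difference between two PSTHs (spikes/s)."""
    return sum(abs(a - b) for a, b in zip(psth1, psth2)) / len(psth1)


def shuffle_test(trials1, trials2, n_surrogates=500, seed=0):
    """Return the observed index and the fraction of surrogates it exceeds.

    The index would be called significant if this fraction is at least 1 - p.
    """
    rng = random.Random(seed)
    observed = difference_index(mean_psth(trials1), mean_psth(trials2))
    pooled = list(trials1) + list(trials2)
    n1 = len(trials1)
    exceeded = 0
    for _ in range(n_surrogates):
        rng.shuffle(pooled)  # reassign single trials to two surrogate groups
        surrogate = difference_index(mean_psth(pooled[:n1]), mean_psth(pooled[n1:]))
        if observed > surrogate:
            exceeded += 1
    return observed, exceeded / n_surrogates
```

With the response index defined below, `trials2` would simply hold the trials of the neutral event.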
We derived two indices from the difference
index; the first was the response index that was
defined as the difference index when one of the
events was the neutral event, i.e. the following:
$$\text{response index}(\mathrm{event}) \equiv \text{difference index}(\mathrm{event}, \text{neutral event}).$$
Figure 2. Behavioral monitoring and results. a, Mouth signal: example from the reward cue epoch of the licking signal,
monitored by an infrared reflection detector. The black arrow indicates time of cue presentation, and the gray arrow indicates cue
offset and reward tone onset. b, Image of monkey’s eyes. Video signal was processed and each frame was classified according to
the state of the eyes [i.e., open (top) or closed (bottom)]. c, Behavioral results. Top, Licking (average ± SEM) as recorded by an
infrared reflection detector directed at the monkey’s mouth. The voltage output of the detector was sampled by A/D converter and
the y-scale is given in arbitrary A/D units. Bottom, Fraction of trials with eyes closed (average ± SEM) as recorded by computerized
video processing. Columns correspond to trial epoch (cue; outcome, food or air puff; no outcome, sound only) aligned to event
onset (time = 0). Note the overlap of 0.5 s between the start of the outcome and the no-outcome epochs and the last 0.5 s of the
cue epoch. Data were averaged for each session and then across sessions (N, number of recording sessions). Color coding of trial
types is given at bottom right (A, aversive; N, neutral; R, reward; the number is the outcome probability). d, Normalized behavioral
response. Licking (blue) and blinking (red) response (average ± SEM, number of sessions as in c) in a time window around the
behavioral event (cue, 500–0 ms before cue end; outcome and no outcome, 0–500 ms after cue end for blinking response and
500–1000 ms for licking response). The responses are normalized between 0 and 1 [i.e., in each epoch a response (X) is transformed by (X − min)/(max − min), where min and max are the minimal and maximal values of the response in this epoch].
Abscissa, Different behavioral conditions (A, aversive; N, neutral; R, reward; the number is the outcome probability).
The second was the probability coding index.
This index was defined as the difference index
between the events with a high probability
(p = 2/3 and 1) of receiving an outcome and
the event with a low probability (p = 1/3) of
receiving the same outcome. The clustering
of the events into high and low probability followed the behavioral
responses of the monkey (see Results) and allowed us to generate a
simple graphic representation of our results. A MANOVA of the responses to all the three different probabilities yielded similar results.
In addition to the single-cell analysis, we performed population analyses. The responses of striatal TANs and DANs are very stereotypic
(Graybiel et al., 1994; Schultz, 1998). Hence, the average population
response was estimated by averaging the PSTH deviation from baselineFR
across the whole population. To determine whether the population response was significant, we first constructed the single-cell PSTH at bins of
20 ms, and then averaged across the population to obtain the population
PSTH. Finally, we performed a t test to check bin by bin whether the
population response was significantly different from zero (p < 0.01). If
the population PSTH was significant for more than three consecutive
bins, it was considered a significant population response.
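The population criterion described above (a bin-by-bin t test followed by a requirement of more than three consecutive significant bins) can be sketched as follows. This is an illustration only: the function names are hypothetical, and instead of computing p values the sketch compares the one-sample t statistic with a critical value `t_crit` that the caller must look up for the chosen significance level and n − 1 degrees of freedom.

```python
import math


def t_statistic(values):
    """One-sample t statistic of values against a mean of zero."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    if var == 0.0:
        if mean == 0.0:
            return 0.0
        return math.inf if mean > 0 else -math.inf
    return mean / math.sqrt(var / n)


def population_response_is_significant(deviations, t_crit, min_run=4):
    """deviations: one list per cell holding PSTH minus baselineFR in 20 ms bins.

    Returns True if at least min_run consecutive bins are significant
    (min_run=4 encodes 'more than three consecutive bins').
    """
    run = 0
    for bin_values in zip(*deviations):  # iterate over bins across cells
        if abs(t_statistic(bin_values)) > t_crit:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False
```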
The data of the two monkeys were grouped unless a significant difference between the individual monkeys was detected. Data analysis was
performed on custom software using MATLAB V7 (Mathworks).
Results
We recorded the neuronal activity of TANs and DANs (Fig. 1a,
Table 1) in parallel with the monitoring of the monkeys’ behavior
(Fig. 2). During recordings, the monkeys performed a probabilistic classical conditioning task (Fig. 1b) with food or air puff as
the rewarding and aversive outcomes, respectively. This task design provides a symmetric expectation of a rewarding or aversive
event after cue presentation and therefore served to test the following three hypotheses.
First, DAN and TAN activity reflects expectation, delivery,
and omission of reward and of aversive events. The alternative is
that only reward-related events are represented by the activity of one or both basal ganglia neuromodulators.
Second, DAN and TAN activity encode an error in the temporal prediction (TD) of reward and aversive events (Sutton and Barto, 1998). The TD hypothesis suggests opposite modulations for positive (i.e., delivery and expectation of reward and omission of aversive events) and negative errors (i.e., expectation and delivery of aversive events and omission of predicted rewarding event).
Third, TAN activity mirrors DAN activity. In a previous study, it was shown that the pause response of TANs was coincident with the increase in DAN activity; however, DANs but not TANs incrementally encoded reward probability (Morris et al., 2004). Here, we test whether this simultaneous opposite response also appears in a task that includes expectation and delivery of aversive events. We also examine whether TANs and DANs discriminate between reward and aversive events during the same parts of the task episode.
Figure 3. An example of neural activity of a single striatal TAN and identification of striatal cell types. a, Rasters and PSTHs of a single TAN of monkey L aligned to the trial behavioral events. The rows are separated according to the expected outcome. First row, Trials with cues that predict the delivery of food. Second row, Trials with the neutral cue (a cue always followed by no outcome). Third row, Trials with cues that predict an air puff. Columns are aligned according to the trial epoch. First column, Cue presentation epoch (−0.2 to 1 s after cue onset). Second column, Outcome epoch (−0.2 to 1 s after delivery of food or air puff). Third column, Trials in which no outcome was delivered; outcome omission was signaled to the monkey by the no-outcome sound (−0.2 to 1 s after sound onset). Color codes are marked at the left side of the cue rasters (A, aversive; N, neutral; R, reward; the number is the outcome probability). For the graphic presentation, rasters were randomly pruned and adjusted to contain the same number of trials. The total number of trials (before pruning) was 708. PSTHs were constructed by summing activity across trials in 1 ms resolution and then smoothing with a Gaussian window (SD, 20 ms). Examples from three 500 ms segments of the analog signal (from first, second, and last third of the recording session) are shown in the middle plot. Examples of spike waveforms are shown next to the 500 ms analog segment. The spike waveform plot includes 100 superimposed waveforms selected randomly around the time of the corresponding analog trace. Isolation score was 0.98; the fraction of spikes in first 2 ms of the interspike interval (ISI) histogram was 0.00002. b, Off-line analysis of striatal cell identification based on firing pattern (abscissa) and spike peak-to-peak duration (ordinate). Color code: Black, TANs; gray, phasic active neurons (PANs). Off-line analysis of neuron shape and coefficient of variance (CV) of the time of interspike interval shows that striatal neurons are separated into two clusters in which the PANs have a large CV with comparably narrow waveforms and TANs have a small CV with very wide waveforms. The cell in a is plotted in a large black circle and marked with an arrow.
Monkey behavior reflects expectation of rewarding and aversive events
We recorded the monkeys’ behavior during performance of a probabilistic classical conditioning task (Fig. 1b) with food or air puff as the rewarding and aversive outcomes, respectively. We tested how extensive (several months; 5 d/week; ~1000 trials/d) conditioning affected the monkeys’ behavior by monitoring licking and blinking responses during neural recordings (Fig. 2a,b).
The monkeys increased their licking in response to cues predicting food but only slightly to the aversive and neutral cues (Fig. 2c, top row). Similarly, the monkeys’ frequency of blinking increased to cues predicting air puff but only slightly to reward and neutral cues (Fig. 2c, bottom row). The increase in blinking and licking during the cue epoch was maximal in trials in which the probability of outcome was 2/3 or 1 and smaller in trials in which the probability was 1/3 (0–500 ms before cue ending; p < 0.01, Tukey’s HSD post hoc).
The behavioral responses to food or air puff delivery (and their corresponding sounds) were not dependent on their previous predictions (Fig. 2c, outcome column). Food and air puff omission, as well as the final (no outcome) event of the neutral trials were indicated to the monkeys by an additional “no outcome” sound. When expected food or air puff were not delivered (no outcome on the p = 1/3 or p = 2/3 trials), licking and blinking increased, respectively; this increase was in accordance with the previously instructed probability. The increase in the licking and blinking behavior was smaller and shorter than the increase after food or air puff outcomes (Fig. 2c, no outcome). Licking and blinking increased slightly to the neutral trials (Fig. 2c, no outcome, green line).
Normalization of the behavioral responses (Fig. 2d) reflects the opposite trends of the response to aversive versus rewarding events. It suggests that the monkeys mainly categorized the
high-probability (p = 2/3 and 1) versus
the low-probability (p = 1/3) cues.
Heart rate analysis can discriminate between high- and low-arousal states
(Berntson et al., 1997). However, analysis of the heart rate and its variability did
not reveal significant differences between the epochs after aversive versus
reward predicting cues, suggesting a
symmetric effect on monkey arousal.
In sum, the analysis of the behavioral
responses indicates the monkeys could
distinguish between aversive, reward, and
neutral cues and between the cues with
high- and low-outcome probabilities. According to these behavioral findings, we
grouped the events with high probability
(p = 2/3 and p = 1) for the neural activity
analysis.
The neuronal database
We recorded 191 DANs from the SNc and
313 TANs from the putamen; of these, 106
DANs and 180 TANs passed the quality
criteria (see Materials and Methods) and
their response was further analyzed (Table
1). Figures 3a and 4a show examples of the
activity of a TAN and a DAN, respectively,
recorded during the performance of the
behavioral task. The TANs and DANs were
identified on-line (see Materials and
Methods) and identity was verified by off-line clustering of the spike waveforms,
spike train pattern (Figs. 3b, 4b), and occasionally by analysis of the responses to
apomorphine injection (Fig. 4c).
We found that, in each trial epoch (cue,
outcome, and no outcome), most of the
cells had a significant response to at least
one event (Fig. 5). Below, we provide additional analysis both of the population
and the single-cell responses at each epoch
and compare the responses to the aversive,
neutral, and reward-related events and between the DANs and TANs.
Figure 4. An example of neural activity from a single DAN and identification of substantia nigra cell types. a, Same conventions
as in Figure 3a. Total number of trials was 271; isolation score was 0.67; fraction of spike in first 2 ms of the ISI histogram was
0.0001. b, Off-line analysis of substantia nigra (SN) cell identification based on firing rate (abscissa) and spike peak-to-peak
duration (ordinate). Color code: Black, DANs; gray, substantia nigra pars reticulata (SNr) neurons; light gray, unclassified SN
neurons. Off-line analysis of the spike shape and firing rate shows that nigral neurons are separated into two clusters in which the
SNr cells have a high firing rate with narrow waveforms and DANs have a low firing rate with wide waveforms. Cells that were not
classified as DAN or SNr tended to be between clusters. The cell in a is plotted in a large black circle and marked with an arrow. c,
Example of neuronal responses to apomorphine injection in a single recording day. The continuous line is the RMS of the bandpass-filtered analog signal (300–6000 Hz) in bins of 10 s. Color code: Black, Electrodes in which a DAN was identified; dotted gray,
electrode with a SNr neuron.
TAN and DAN activity is asymmetrically modulated by
expectation of aversive events and reward in the cue epoch
Population analysis of the neuronal activity in the cue epoch
shows that, whereas TAN average responses to the aversive, neutral, and reward predicting cues tended to overlap (Fig. 6a, top),
the population average activity of the DANs was highly discriminative both between reward and aversive events and between
cues with high (p = 2/3 and p = 1) and low (p = 1/3) prediction
probability of reward delivery. The DANs’ positive response to
aversive cues was smaller than the response to the reward cues
(Fig. 6a, bottom). The suppression of TAN activity after high-probability reward cues tended to be longer than after low-probability cues (Fig. 6a, compare blue with light blue lines) (for
similar trends, see Shimo and Hikosaka, 2001; Ravel et al., 2003).
As previously reported (Morris et al., 2004), comparison of the
time of reward cue modulation showed that the DAN increase
and TAN decrease of discharge rate were coincident (Fig. 6b, top)
Figure 5. Percentage of TANs and DANs with significant responses to the different behavioral events. The percentage of neurons with significant responses to the cue, outcome, and
no-outcome tone events of the total number of studied neurons (n = 180 TANs and 106 DANs).
Color code: Black, TAN; white, DAN. For each epoch, we grouped trials according to trial type
(aversive, reward, and neutral). A cell was considered to be significantly modulated in an epoch
if at least one of the responses in that epoch was significant.
with slightly shorter lags for the DAN responses. Finally, unlike the responses to the reward cue, the TAN and DAN responses to aversive cues had a significant second phase in which the TANs increased their activity and DANs decreased their activity (Fig. 6b, bottom).
Figure 6. TAN and DAN population response at cue epoch. a, Population average response to behavioral cues. Only the first 0.8 s after the cue is shown to highlight the short duration of the responses. Top, TANs (n = 180 neurons). Bottom, DANs (n = 106). Color coding: Dark blue, Responses to high-probability (p = 1 and p = 2/3) reward cues; light blue, reward low (p = 1/3)-probability cues; green, neutral cue; orange, aversive low-probability cues; red, aversive high-probability cues. b, TAN versus DAN population response. The populations and timescale are the same as in a. The population response was considered significant if it passed the significance criteria (t test, p < 0.01) for at least three consecutive 20 ms bins. For this analysis, all trials of the same type (aversive or reward) were grouped. Top, TAN versus DAN in the reward trials. Bottom, TAN versus DAN in the aversive trials. Color coding: Orange, TAN significant bins; white, TAN nonsignificant bins; purple, DAN significant bins; gray, DAN nonsignificant bins.
The population average PSTH can be biased by a few neurons with an extreme response or opposite effects may be averaged out. We therefore formulated the difference index (see Materials and Methods) as a measure of the modulations of a single neuron to different events. We grouped responses across probabilities and tested whether the single-cell responses to reward and aversive cues were different from the response to the neutral cue. We found that, in both TAN and DAN populations, the response index (absolute deviation from the neutral response) for the reward trials was larger than the response index for aversive trials (Fig. 7a). A substantial fraction of TANs and DANs showed a significant response index to reward cues, whereas only a small number of cells had a significant response index to aversive cues (Fig. 7a, inset).
When separating the DAN and the TAN responses into high-probability (p = 2/3 and 1) and low-probability (p = 1/3) cues, coding of the reward probability was larger and more frequent than coding of the aversive probability (Fig. 7b). A multivariate ANOVA in which we did not group the high probability (p = 2/3 and 1) cues yielded similar results (data not shown). The difference between TAN single-cell responses and the TAN population results suggests that single-cell responses had opposite trends and were averaged out in the population analysis.
To summarize, we found larger and more frequent single-cell modulation of TAN and DAN discharge after reward than after aversive predicting cues (hypothesis 1). As expected from the TD hypothesis, DANs code reward probability both at level of the population and the single-unit responses; however, the TAN population did not robustly encode the outcome probability. Disconfirming the TD hypothesis, DANs also increased their activity in response to aversive cues and there was a small (but significant) difference between the DAN responses to aversive cues with different predictions of a future aversive event (hypothesis 2). Finally, the first phase of the responses of the TANs to the visual cues generally mirrored the response of the DANs. DANs but not the TAN population discriminated robustly between reward and aversive cues (hypothesis 3).
Both TANs and DANs respond to aversive and reward outcome
Population analysis of the neural activity at the time of outcome delivery and coincident sounds showed that both TANs and DANs respond to food and air puff delivery, but with a faster response to the air puff (Fig. 8a). Whereas the cumulative responses of the TANs to aversive and reward outcome were similar in magnitude, the response of the DANs was larger for the reward events. However, although smaller in magnitude, the DANs respond with an excitation to the aversive outcome (Fig. 8a). TANs and DANs activity at reward delivery was larger for the low-probability trials than for the high-probability trials (Fig. 8a). Comparison of the modulation time showed that the TAN and the DAN responses at the outcome epoch did not mirror each other. The large significant increase in the second phase of the TAN response to the reward outcome was coincident with only a small nonsignificant decrease in DAN activity (Fig. 8b, top). Furthermore, in the aversive outcome, the second phase of the TAN response overlapped the end of the first phase of the DAN response (i.e., the increase in the discharge rate of the TAN overlapped the increase in DAN activity) (Fig. 8b, bottom).
Single-cell analysis shows that a large fraction of the cells responded to reward or aversive events (Fig. 9a). However, as in the cue epoch, reward probability was better encoded than air puff probability in the outcome epoch as well (Fig. 9b), further showing that TAN and DAN activity was more strongly modulated by expectation of reward.
To summarize, we found a large modulation of TAN and DAN discharge rate after delivery of food or air puff, but with probability coding only for reward (hypothesis 1). We found larger TAN and DAN activity at reward delivery for the low-probability trials than for the high-probability trials. This trend was opposite to the trend found in the cue epoch (large response to the large probability) as expected according to the TD hypothesis (hypothesis 2). However, contrary to the naive TD hypothesis (which predicts that activity moves from outcome to cue during conditioning), we found that many TANs and DANs encoded the aversive outcome, whereas only very few encoded the aversive cue (compare Figs. 8, 9 with Figs. 6, 7). Furthermore, as opposed to the TD hypothesis, which predicts a decrease, DANs increased activity in response to the aversive outcome. In addition, the multiphasic response of the TANs, and the major changes observed in the second excitatory phase of the TAN responses, does
not enable a straightforward comparison
with the TD predictions. Finally, the TAN
and DAN activity were not completely coincident, and both populations discriminated between reward and aversive trials
(hypothesis 3).
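The TD predictions invoked in this summary can be made concrete with a minimal sketch. It assumes a fully trained critic, a single cue followed by a single outcome, no temporal discounting, a baseline value of zero, and outcome values of +1 for food and -1 for air puff. These values and the function name `td_errors` are illustrative assumptions, not parameters taken from the study.

```python
from fractions import Fraction

def td_errors(p, value):
    """Prediction errors for a fully learned cue that predicts an outcome
    of the given value with probability p (no discounting, baseline 0)."""
    expected = p * value              # learned value of the cue
    return {
        "cue": expected - 0,          # cue onset: value jumps from baseline
        "outcome": value - expected,  # outcome delivered
        "omission": 0 - expected,     # outcome withheld
    }

# Assumed outcome values: +1 for food, -1 for air puff.
for label, value in (("reward", 1), ("aversive", -1)):
    for p in (Fraction(1, 3), Fraction(2, 3), Fraction(1)):
        errs = td_errors(p, value)
        print(label, f"p={p}", {k: str(v) for k, v in errs.items()})
```

Under these assumptions the reward cue error grows with p while the reward delivery error shrinks with p, which matches the opposite trends reported for the cue and outcome epochs. Every aversive cue and air puff delivery yields a negative or zero error, so such a naive critic would predict a decrease rather than the DAN increase described above.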
TAN but not DAN populations robustly
differentiate between omission of
rewards and omission of aversive events
In contrast to our previous study (Morris
et al., 2004), outcome omission was explicitly notified to the monkeys by a typical
sound (Fig. 1b). In the 2004 study, the responses of both DANs and TANs to the
reward omission were small. Population
analysis of the no-outcome events in the
current study showed that TANs, but not
DANs, had large modulations of their discharge rates. During this epoch, the TAN
population response (Fig. 10a) differentiated between the reward trials (food omission), the aversive trials (air puff omission), and the neutral trials (expected no
outcome). Furthermore, the suppression
of TAN activity was slightly longer after
omission of the high-probability reward
(Fig. 10a). Analysis of the population
PSTHs shows that the average response of the TANs, but not that of the DANs, to
the outcome omission events was significant (Fig. 10b). The multiphase TAN response was not coincident with the phases
of the nonsignificant DAN modulation (Fig.
10b). Finally, we did not find a significant
difference between the duration of the
DAN responses (Bayer et al., 2007) to the
outcome omission after high- and low-probability cues (data not shown).
Single-cell analysis shows that, as in
the cue epoch, the response index of
both TANs and DANs for the reward trials was larger than the response index for
aversive trials (Fig. 11a). A substantial
fraction of TANs and DANs showed a
significant response index to reward
omission, whereas only a smaller number of cells had a significant response index to aversive omission (Fig. 11a, inset). Coding of reward probability was
larger and more frequent than coding of
the probability of the aversive events
(Fig. 11b). The difference between DAN
single-cell responses and the population
results suggests that some single-cell responses had opposite trends and were
averaged out in the population analysis.
To summarize, TAN and DAN single-cell modulations were larger for reward
than for aversive omission, with probability coding only for reward omission (hypothesis 1). As shown previously, after
outcome omission (no outcome) the
Figure 7. TAN and DAN single-cell response at cue epoch. a, Scatter plots comparing the response index of individual neurons
to reward and aversive cues. Response index was calculated for each cell (n = 180 TANs and 106 DANs) as the absolute difference
between the aversive or reward cue-aligned PSTH and the PSTH of the neutral cue. The black line is the identity (Y = X) line. Points
below this line represent cells with a response index that is larger for the reward cues than for aversive cues. Top, TAN. Bottom,
DAN. Color code: Blue, Response index significant only for reward cues; red, response index significant only for aversive cues;
green, both response indices were significant; gray, neither response index was significant. Significance level was p < 0.05. The
time window used for this analysis was 0–1000 ms from cue presentation. Inset, Pie chart of the fraction of cells with a significant
index for reward (blue), aversive (red), and both (green) cues of all cells with significant response index (the number of responding cells out of the
total number of cells is given in the text at the top of the inset). b, Scatter plot comparing the probability coding of individual TAN and DAN
neurons. The index was calculated as the difference between the grouped response to the high-probability (p = 2/3 and p = 1)
and the low-probability (p = 1/3) events. The format and color code are the same as in a. Points below the identity line represent
cells with a probability-coding index that is larger for the reward cues than for aversive cues.
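The response index described in this caption can be sketched as follows. The caption specifies an absolute difference between an event-aligned PSTH and the neutral-cue PSTH over 0–1000 ms; the exact summary statistic is defined in Materials and Methods, so the mean over 20 ms bins used here is an assumption, as are the function names and the toy spike times.

```python
import numpy as np

def psth(spike_times_per_trial, window=(0.0, 1.0), bin_width=0.02):
    """Trial-averaged firing rate (spikes/s) in fixed bins after the event."""
    n_bins = int(round((window[1] - window[0]) / bin_width))
    edges = np.linspace(window[0], window[1], n_bins + 1)
    counts = np.zeros(n_bins)
    for spikes in spike_times_per_trial:
        counts += np.histogram(spikes, bins=edges)[0]
    return counts / (len(spike_times_per_trial) * bin_width)

def difference_index(psth_a, psth_b):
    """Mean absolute difference between two PSTHs (assumed summary statistic)."""
    return float(np.mean(np.abs(np.asarray(psth_a) - np.asarray(psth_b))))

# Hypothetical spike times in seconds from cue onset, two trials per cue type.
reward_trials = [[0.11, 0.31], [0.11, 0.51]]
neutral_trials = [[0.11], [0.91]]
response_index = difference_index(psth(reward_trials), psth(neutral_trials))
print(response_index)
```

The same `difference_index` call applied to the grouped high-probability and low-probability PSTHs would give a probability-coding index in the sense of panel b, again under the assumption that the mean absolute difference is the statistic used.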
Figure 8. TAN and DAN population response at outcome delivery. a, Population responses at the time of outcome (food or air
puff) and the corresponding sounds delivery. b, Comparison between the responses of TANs and DANs. The conventions are the
same as in Figure 6.
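The significance convention inherited from Figure 6 (t test, p < 0.01, for at least three consecutive 20 ms bins) involves a run-length step that can be sketched as below. The sketch takes per-bin p values as given, because the comparison underlying each t test is not restated here; the function name and the example p values are hypothetical.

```python
import numpy as np

def significant_bins(p_values, alpha=0.01, min_run=3):
    """Mark bins that belong to a run of at least `min_run` consecutive
    bins with p < alpha; shorter runs are discarded."""
    below = np.asarray(p_values) < alpha
    keep = np.zeros(below.shape, dtype=bool)
    run_start = None
    # A trailing False acts as a sentinel that closes a run ending at the last bin.
    for i, flag in enumerate(np.append(below, False)):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_run:
                keep[run_start:i] = True
            run_start = None
    return keep

# Hypothetical per-bin p values: a run of two bins, then a run of four bins.
p = [0.5, 0.001, 0.002, 0.2, 0.003, 0.004, 0.005, 0.009, 0.3]
mask = significant_bins(p)
print(mask)
```

Requiring consecutive bins guards against isolated bins that cross the threshold by chance, which is why the two-bin run in the example is dropped while the four-bin run is kept.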
DANs encode the TD error weakly (hypothesis 2). The TAN and DAN activity were not coincident, and the TAN, but not the DAN, population coded the difference between reward, aversive, and neutral trials robustly (hypothesis 3).
Figure 9. TAN and DAN single-cell response at outcome epoch. a, Scatter plots comparing the response index of individual neurons to reward and aversive outcomes. b, Scatter plot comparing the probability-coding index of the single neuron response to reward and aversive outcome. The conventions are the same as in Figure 7.
Figure 10. TAN and DAN population response at no outcome. a, Population responses in trials with no food or air puff delivery. The same no-outcome tone is given at time = 0. b, Comparison between the responses of TANs and DANs. The conventions are the same as in Figure 6.
Discussion
In this manuscript, we have shown that DAN and TAN encoding is not limited to encoding of reward prediction error but also reflects other psychological and behavioral processes. We found that rate modulations of TANs and DANs to expectation of reward were larger than the modulation, which followed predictions of aversive events. Furthermore, these neurons encode the expectation level (or the previous probability) of reward better than the expectation of aversive events. Finally, TAN responses were not coincident with DAN responses in all trial epochs. DANs encode the difference between reward and aversive trials in the cue and outcome epoch, whereas the TAN population encodes this difference in the outcome and no-outcome epochs. Therefore, complementary coding of TANs and DANs expands the encoding scope of the basal ganglia neuromodulators.
TANs and DANs strongly encode aversive outcome but not aversive expectations
There is no consensus regarding the responses of DANs to aversive events. Some studies suggest that at least some of the DANs increase their firing rate after aversive outcome (for review, see Horvitz, 2000), whereas others have evidence of a decrease (Ungless et al., 2004; Coizet et al., 2006). The reported increase in the firing rate of the DANs and striatal dopamine levels to negative events might be attributable to reward generalization (Mirenowicz and Schultz, 1996; Kakade and Dayan, 2002; Day et al., 2007). However, the blinking and licking behavior observed here indicate that the monkeys were able to reliably discriminate between reward, neutral, and aversive cues. Second, the calculation of the response index as the difference between the responses to appetitive/aversive and the neutral events overcomes the confounding effects of generalization. Finally, we found that the neuronal response to the aversive outcome was faster than responses to reward trials.
Thus, DAN responses to aversive events may reflect multiple sources of modulation (see below for error prediction encoding). This may explain some of the inconsistencies between previous experiments. Whereas in the behaving animal there can be positive modulations of DAN discharge in response to aversive events because of attention/arousal processes, when the animal is anesthetized, only discomfort-related activity can be demonstrated (Ungless et al., 2004; Coizet et al., 2006).
As for the TANs, our study confirms the pioneering works showing fast and robust TAN responses to an aversive outcome (Ravel et al., 1999, 2003). However, our study extends our previous work (Morris et al., 2004) showing minimal differences between the TAN population responses to cues predicting future
rewards, to cues predicting future aversive
events, and neutral cues. However, we
found that unlike the population results
that probably represent the average of opposing effects, many single TANs differentiate between reward and neutral trials and
encode reward probability (Fig. 7). We
further found that the population TANs
encode the difference in the reward probability at outcome delivery (Fig. 8a) and
that these cells have a large response to
outcome omission (Fig. 10a). In the previous work (Morris et al., 2004), we found
no discriminative response at outcome delivery and only a low response to reward
omission. Future studies should explore
whether this lack of concordance is attributable to differences in behavioral paradigms (for example, explicit vs implicit
notification of trial termination, operant
vs classical conditioning, and only rewarding outcomes vs rewarding and aversive
outcomes) or to the animals’ behavioral
strategy and confidence in the prediction
of future outcome.
Figure 11. TAN and DAN single-cell response at no outcome. a, Scatter plots comparing the response index of individual neurons in trials in which food and air puff were not delivered. b, Scatter plot comparing the probability-coding index. The conventions are the same as in Figure 7.
DANs encode more than reward prediction errors
Recent studies have shown that DAN activity encodes the mismatch between prediction and reality. Most of these studies have focused on the mismatch in the positive domain (i.e., when conditions are better than expected) (Schultz et al., 1997). DANs typically increase their discharge rate in response to appetitive predictive cues and outcomes. In line with the predictions of reinforcement learning theories, the DAN discharge decreases with omission of predicted rewards (Schultz et al., 1993; Fiorillo et al., 2003; Matsumoto and Hikosaka, 2007). However, this discharge suppression is limited because the neuronal firing rate is truncated at zero. Indeed, several groups (Morris et al., 2004; Bayer and Glimcher, 2005) have reported that the instantaneous firing of DANs does not demonstrate incremental encoding of reward omission, and it was suggested that omission is encoded by duration of the discharge decrease (Bayer et al., 2007). In this experiment, however, we failed to find any significant coding of reward omission by response amplitude or duration.
Naive reinforcement learning models categorize events as having positive or negative errors and would suggest opposite sign modulation to reward and aversive trials (Schultz et al., 1997). However, we found similar trends for DAN responses to predictions, outcomes, and omission of reward and aversive-related events (Figs. 6a, 8a, 10a). In particular, we found a substantial increase to both reward and aversive outcome. Furthermore, responses of the DANs to reward omission and aversive outcome (Figs. 8a, 10a, respectively) were very different (decrease vs increase), although in both cases there was a negative reinforcement error.
To summarize, our results reveal an increase in the complexity of the encoding by the DANs of value. This does not rule out their role in the temporal difference hypothesis. On the contrary, our working hypothesis holds that the discharge rate of DANs and TANs reflects changes in reward prediction as well as changes in attention/arousal levels (Horvitz, 2000; Ravel and Richmond, 2006; Redgrave and Gurney, 2006).
Asymmetric encoding of positive and negative expectations by the basal ganglia
Previous primate instrumental conditioning experiments in which DANs and TANs were recorded did not include expectation of aversive outcome because the air puff could be avoided by a correct response (Mirenowicz and Schultz, 1996; Yamada et al., 2004). The symmetric classical conditioning paradigm of this study, which included reward predicting, aversive predicting, and a neutral cue, enabled us to explore whether there was symmetry in the encoding of expectation of rewarding versus aversive events by the TANs and the DANs.
Single-cell analysis revealed that TAN and DAN encoding of reward expectation and omission was larger and more frequent than encoding of expectation and omission of aversive events (Figs. 7a, 11a). Furthermore, we found that TAN and DAN encoding of the reward probability was larger and more frequent than their encoding of the probability of the air puff-related events (Figs. 7b, 9b, 11b). The preferential activation to reward was also apparent in the population response of the DANs at the cue and outcome epoch in which the activity of these cells was larger and coded the probabilities better (Figs. 6a, 8a). Thus, in line with previous studies (Mirenowicz and Schultz, 1996; Yamada et al., 2007), we show that even in a classical conditioning task in which the air puff is unavoidable, expectation of aversive events is weakly represented in the basal ganglia activity.
TANs do not mirror the DAN responses
The anatomical demonstration of dopaminergic innervations of striatal cholinergic interneurons (Lehmann and Langer, 1983)
and the suppression of acetylcholine efflux from striatal slice by dopamine (Stoof et al., 1992) suggest that DANs directly inhibit the TANs (Wang et al., 2006). TANs might mediate the dopaminergic message to the D1 and D2 dopamine receptor containing striatal projection neurons.
The opposite and coincident responses of the TANs and DANs to predictive cues (Fig. 6) support direct inhibition. However, TAN responses at the terminal stage of the trial (Figs. 8b, 10b) include major positive deflections that do not mirror any phase of the dopaminergic response. Notably, after outcome omission, DANs respond similarly to the neutral outcome, reward, and air puff omissions, whereas the TANs robustly discriminate between the three events (Fig. 10). Thus, DANs may better encode the cue predicting events and the TANs may provide more information at the completion of the trial. This is consistent with the findings of subpopulations of striatal projection neurons with selective evaluative encoding of trial results (Lau and Glimcher, 2007, 2008). In any case, these differential responses indicate that the TAN discharge is not totally governed by its dopaminergic inputs; neither are the TANs and DANs driven by a common source (Matsumoto et al., 2001) with opposite effects on the two systems.
Concluding remarks
In this study, we showed that the dopaminergic and the cholinergic neuromodulators of the basal ganglia encode the positive domain of behavior in a nonredundant manner. This asymmetric encoding of behavior suggests that the basal ganglia collaborate with other neuronal systems to shape the animal’s response to diverse environmental events. The characteristics and interactions of these different neuronal systems may provide the basis for asymmetric, irrational human attitudes toward rewarding and aversive events (Tversky and Kahneman, 1981). Finally, the stronger involvement of the basal ganglia in positive reinforcement learning is congruent with the findings that parkinsonian patients are better at learning to avoid choices that lead to negative outcomes than they are at learning from positive outcomes (Frank et al., 2004).
References
Aebischer P, Schultz W (1984) The activity of pars compacta neurons of the monkey substantia nigra is depressed by apomorphine. Neurosci Lett 50:25–29.
Arbuthnott GW, Wickens J (2007) Space, time and dopamine. Trends Neurosci 30:62–69.
Barbeau A (1962) The pathogenesis of Parkinson’s disease: a new hypothesis. Can Med Assoc J 87:802–807.
Bar-Gad I, Bergman H (2001) Stepping out of the box: information processing in the neural networks of the basal ganglia. Curr Opin Neurobiol 11:689–695.
Bayer HM, Glimcher PW (2005) Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47:129–141.
Bayer HM, Lau B, Glimcher PW (2007) Statistics of midbrain dopamine neuron spike trains in the awake primate. J Neurophysiol 98:1428–1439.
Berntson GG, Bigger JT Jr, Eckberg DL, Grossman P, Kaufmann PG, Malik M, Nagaraja HN, Porges SW, Saul JP, Stone PH, van der Molen MW (1997) Heart rate variability: origins, methods, and interpretive caveats. Psychophysiology 34:623–648.
Calabresi P, Centonze D, Gubellini P, Pisani A, Bernardi G (2000) Acetylcholine-mediated modulation of striatal function. Trends Neurosci 23:120–126.
Coizet V, Dommett EJ, Redgrave P, Overton PG (2006) Nociceptive responses of midbrain dopaminergic neurones are modulated by the superior colliculus in the rat. Neuroscience 139:1479–1493.
Day JJ, Roitman MF, Wightman RM, Carelli RM (2007) Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat Neurosci 10:1020–1028.
Fiorillo CD, Tobler PN, Schultz W (2003) Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299:1898–1902.
Frank MJ, Seeberger LC, O’reilly RC (2004) By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science 306:1940–1943.
Gourévitch B, Eggermont JJ (2007) A simple indicator of nonstationarity of firing rate in spike trains. J Neurosci Methods 163:181–187.
Graybiel AM, Aosaki T, Flaherty AW, Kimura M (1994) The basal ganglia and adaptive motor control. Science 265:1826–1831.
Guarraci FA, Kapp BS (1999) An electrophysiological characterization of ventral tegmental area dopaminergic neurons during differential pavlovian fear conditioning in the awake rabbit. Behav Brain Res 99:169–179.
Gurney K, Prescott TJ, Wickens JR, Redgrave P (2004) Computational models of the basal ganglia: from robots to membranes. Trends Neurosci 27:453–459.
Horvitz JC (2000) Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience 96:651–656.
Joshua M, Elias S, Levine O, Bergman H (2007) Quantifying the isolation quality of extracellularly recorded action potentials. J Neurosci Methods 163:267–282.
Kakade S, Dayan P (2002) Dopamine: generalization and bonuses. Neural Netw 15:549–559.
Lau B, Glimcher PW (2007) Action and outcome encoding in the primate caudate nucleus. J Neurosci 27:14502–14514.
Lau B, Glimcher PW (2008) Value representations in the primate striatum during matching behavior. Neuron 58:451–463.
Lehmann J, Langer SZ (1983) The striatal cholinergic interneuron: synaptic target of dopaminergic terminals? Neuroscience 10:1105–1120.
Martin RF, Bowden DM (2000) Primate brain maps: structure of the macaque brain. Amsterdam: Elsevier Science.
Matsui T, Koyano KW, Koyama M, Nakahara K, Takeda M, Ohashi Y, Naya Y, Miyashita Y (2007) MRI-based localization of electrophysiological recording sites within the cerebral cortex at single-voxel accuracy. Nat Methods 4:161–168.
Matsumoto M, Hikosaka O (2007) Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447:1111–1115.
Matsumoto N, Minamimoto T, Graybiel AM, Kimura M (2001) Neurons in the thalamic CM-Pf complex supply striatal neurons with information about behaviorally significant sensory events. J Neurophysiol 85:960–976.
Mirenowicz J, Schultz W (1996) Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379:449–451.
Moran A, Bar-Gad I, Bergman H, Israel Z (2006) Real-time refinement of subthalamic nucleus targeting using Bayesian decision-making on the root mean square measure. Mov Disord 21:1425–1431.
Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H (2004) Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43:133–143.
Ravel S, Richmond BJ (2006) Dopamine neuronal responses in monkeys performing visually cued reward schedules. Eur J Neurosci 24:277–290.
Ravel S, Legallet E, Apicella P (1999) Tonically active neurons in the monkey striatum do not preferentially respond to appetitive stimuli. Exp Brain Res 128:531–534.
Ravel S, Legallet E, Apicella P (2003) Responses of tonically active neurons in the monkey striatum discriminate between motivationally opposing stimuli. J Neurosci 23:8489–8497.
Redgrave P, Gurney K (2006) The short-latency dopamine signal: a role in discovering novel actions? Nat Rev Neurosci 7:967–975.
Reynolds JN, Hyland BI, Wickens JR (2001) A cellular mechanism of reward-related learning. Nature 413:67–70.
Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80:1–27.
Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13:900–913.
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599.
Shimo Y, Hikosaka O (2001) Role of tonically active neurons in primate caudate in reward-oriented saccadic eye movement. J Neurosci 21:7804–7814.
Stoof JC, Drukarch B, de Boer P, Westerink BH, Groenewegen HJ (1992) Regulation of the activity of striatal cholinergic neurons by dopamine. Neuroscience 47:755–770.
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. Cambridge, MA: MIT.
Szabo J, Cowan WM (1984) A stereotaxic atlas of the brain of the cynomolgus monkey (Macaca fascicularis). J Comp Neurol 222:265–300.
Tversky A, Kahneman D (1981) The framing of decisions and the psychology of choice. Science 211:453–458.
Ungless MA, Magill PJ, Bolam JP (2004) Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science 303:2040–2042.
Wang Z, Kai L, Day M, Ronesi J, Yin HH, Ding J, Tkatch T, Lovinger DM, Surmeier DJ (2006) Dopaminergic control of corticostriatal long-term synaptic depression in medium spiny neurons is mediated by cholinergic interneurons. Neuron 50:443–452.
Yamada H, Matsumoto N, Kimura M (2004) Tonically active neurons in the primate caudate nucleus and putamen differentially encode instructed motivational outcomes of action. J Neurosci 24:3500–3510.
Yamada H, Matsumoto N, Kimura M (2007) History- and current instruction-based coding of forthcoming behavioral outcomes in the striatum. J Neurophysiol 98:3557–3567.
25
Results II
J Neurophysiol 101: 758 –772, 2009.
First published December 3, 2008; doi:10.1152/jn.90764.2008.
Encoding of Probabilistic Rewarding and Aversive Events by Pallidal
and Nigral Neurons
Mati Joshua,1,2 Avital Adler,1,2 Boris Rosin,1 Eilon Vaadia,1,2 and Hagai Bergman1,2,3
1Department of Physiology, The Hebrew University–Hadassah Medical School; and 2The Interdisciplinary Center for Neural Computation
and 3Eric Roland Center for Neurodegenerative Diseases, The Hebrew University, Jerusalem, Israel
Submitted 14 July 2008; accepted in final form 1 December 2008
INTRODUCTION
The neural network of the basal ganglia (BG) is commonly
viewed as two functionally related subsystems (e.g., Bar-Gad
and Bergman 2001; Gurney et al. 2004): the neuromodulator
subsystem and the main-axis subsystem. The neuromodulators
(e.g., midbrain dopaminergic neurons and cholinergic tonically
active interneurons of the striatum, DANs and TANs, respectively) control plasticity of the corticostriatal synapse (Calabresi et al. 2000; Reynolds et al. 2001). The main-axis subsystem includes connections between all neocortical areas, the
amygdala and the hippocampus and the BG input structures,
i.e., the striatum (caudate, putamen, and ventral striatum) and
the subthalamic nucleus. These project both directly and indirectly through the external segment of the globus pallidus
(GPe) to the BG output structures: the internal segment of the
globus pallidus (GPi) and the substantia nigra pars reticulata
(SNr). The GPi and SNr modify behavior through their projection to the frontal cortex (via the thalamus) and brain stem
premotor nuclei (Haber and Gdowski 2004).
Address for reprint requests and other correspondence: M. Joshua, Department of Physiology, The Hebrew University–Hadassah Medical School, POB 12272, Jerusalem 91120, Israel (E-mail: [email protected]).
Previous studies on primates have shown that BG neuromodulator activity is modulated by expectation, delivery, and
omission of rewards (Morris et al. 2004; Nakahara et al. 2004;
Ravel et al. 2001; Schultz 1998). These data have been modeled in a reinforcement framework in which the dopamine
neurons could signal prediction error (Schultz et al. 1997).
Reward modulation of the main axis has mainly been studied
at the level of the striatum (Apicella et al. 1992; Lau and
Glimcher 2007; Lauwereyns et al. 2002; Samejima et al. 2005).
Several studies have revealed discharge modulation of pallidal
and SNr neurons by reward (Gdowski et al. 2001; Handel and
Glimcher 2000; Pasquereau et al. 2007; Turner and Anderson
2005) and even by the probability of future reward (Arkadir
et al. 2004). Nevertheless, understanding the full domain of
value encoding by a neural network calls for study of neuronal
responses to expectation, delivery, and omission of predicted
aversive events as well. We recently reported that the responses of DANs and TANs of monkeys engaged in a probabilistic conditioning task involving both aversive and appetitive outcomes are biased toward the encoding of the rewarding
events (Joshua et al. 2008). The BG main axis may be affected
by other neuromodulator systems, e.g., serotonin (Daw et al.
2002; Parent et al. 1995), and thus may have a broader
encoding domain than that of the TANs and the DANs.
However, there are no studies on the responses of the primate
BG main-axis high-frequency discharge (HFD) neurons to
expectation of deterministic or probabilistic aversive events.
We therefore used the same classical conditioning paradigm
with aversive and rewarding probabilistic outcomes used in a
previous study (Joshua et al. 2008) and recorded the activity of
GPe, GPi, and SNr neurons in the same two monkeys that
served as subjects for the recording of DANs and TANs
activity. This enabled us to compare the different structures of
the main axis with one another and with the main BG neuromodulators. We limited this study to the major neuronal
population of these BG structures: the HFD neurons (DeLong
1971; Elias et al. 2007; Schultz 1986).
METHODS
All experimental protocols were performed in accordance with the
National Institutes of Health Guide for the Care and Use of Laboratory Animals and with the Hebrew University guidelines for the use
and care of laboratory animals in research, supervised by the institu
The costs of publication of this article were defrayed in part by the payment
of page charges. The article must therefore be hereby marked “advertisement”
in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
0022-3077/09 $8.00 Copyright © 2009 The American Physiological Society
Joshua M, Adler A, Rosin B, Vaadia E, Bergman H. Encoding of
probabilistic rewarding and aversive events by pallidal and nigral
neurons. J Neurophysiol 101: 758 –772, 2009. First published December 3, 2008; doi:10.1152/jn.90764.2008. Previous studies have rarely
tested whether the activity of high-frequency discharge (HFD) neurons of the basal ganglia (BG) is modulated by expectation, delivery,
and omission of aversive events. Therefore the full value domain
encoded by the BG network is still unknown. We studied the activity
of HFD neurons of the globus pallidus external segment (GPe, n ⫽
310), internal segment (GPi, n ⫽ 149), and substantia nigra pars
reticulata (SNr, n ⫽ 145) in two monkeys during a classical conditioning task with cues predicting the probability of food, neutral, or
airpuff outcomes. The responses of BG HFD neurons were longlasting and diverse with coincident increases and decreases in discharge rate. The population responses to reward-related events were
larger than the responses to aversive and neutral-related events. The
latter responses were similar, except for the responses to actual airpuff
delivery. The fraction of responding cells was larger for rewardrelated events, with better discrimination between rewarding and
aversive trials in the responses with an increase rather than a decrease
in discharge rate. GPe and GPi single units were more strongly
modulated and better reflected the probability of reward- than aversive-related events. SNr neurons were less biased toward the encoding
of the rewarding events, especially during the outcome epoch. Finally,
the latency of SNr responses to all predictive cues was shorter than the
latency of pallidal responses. These results suggest preferential activation of the BG HFD neurons by rewarding compared with aversive
events.
Results II
VALUE ENCODING IN THE BASAL GANGLIA MAIN AXIS
tional animal care and use committee. Methods are explained in detail
in a previous study (Joshua et al. 2008). Here we present a brief
summary of these methods but describe in detail methods not used in
the previous study.
Behavioral task
Recording and data acquisition
During the acquisition of the neuronal data, two experimenters (MJ
and AA) controlled the position of eight coated tungsten microelectrodes (impedance 0.2–0.8 MΩ at 1,000 Hz), and the real-time spike
sorting (AlphaMap, Alpha Spike Detector, Alpha-Omega Engineering) of the eight electrodes. Recorded units were subjected to off-line
quality analysis that included tests for rate stability, refractory period,
waveform isolation, and recording time. First, firing rate as a function
of time during the recording session was graphically displayed and the
largest continuous segments of stable data were selected for further
analysis. Second, cells in which >0.02 of the total interspike intervals
were <2 ms were excluded from the database. Third, only units with
an isolation score (Joshua et al. 2007) >0.8 were included in the
database. Finally, only cells that met the above-cited inclusion criteria for
>20 min during the performance of the behavioral task were included in
the neural database (average 56 min and 307 trials). Table 1 provides the
statistics for the cells that were included in the analysis database.
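The numerical inclusion criteria above can be illustrated with a minimal Python sketch. It is not the authors' code: the `Unit` container and all function names are hypothetical, and the first criterion (selection of stable-rate segments from a graphical display) is assumed to have been applied already, so that only the refractory-period, isolation-score, and duration tests are coded.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MAX_SHORT_ISI_FRACTION = 0.02   # units with a larger fraction are excluded
SHORT_ISI_MS = 2.0              # ISIs shorter than this count as "short"
MIN_ISOLATION_SCORE = 0.8       # isolation score must exceed this value
MIN_STABLE_MINUTES = 20.0       # stable task recording must exceed this value


@dataclass
class Unit:
    """Hypothetical container for one recorded unit (illustrative names)."""
    spike_times_ms: list      # spike times within the stable segment, in ms
    isolation_score: float    # 0..1, in the sense of Joshua et al. 2007
    stable_minutes: float     # duration of the stable segment during the task


def short_isi_fraction(spike_times_ms, limit_ms=SHORT_ISI_MS):
    """Fraction of interspike intervals shorter than limit_ms."""
    isis = [b - a for a, b in zip(spike_times_ms, spike_times_ms[1:])]
    if not isis:
        return 0.0
    return sum(1 for isi in isis if isi < limit_ms) / len(isis)


def passes_inclusion(unit):
    """Apply the refractory, isolation, and duration criteria in turn."""
    return (
        short_isi_fraction(unit.spike_times_ms) <= MAX_SHORT_ISI_FRACTION
        and unit.isolation_score > MIN_ISOLATION_SCORE
        and unit.stable_minutes > MIN_STABLE_MINUTES
    )
```

A unit would enter the database only if `passes_inclusion` returns `True`; the thresholds are kept as named constants so that each one can be traced back to the sentence that states it.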
GPe neurons were identified according to their stereotaxic coordinates (based on magnetic resonance imaging [MRI] and primate atlas
data) and their real-time physiological identification. These physiological parameters included the characteristic symmetric, narrow, and
high-amplitude spike shape; the typical firing rate and pattern (DeLong 1971); and the neuronal activity of the striatum obtained earlier
[FIG. 1 graphic not reproduced. Panel A: task flow (Cue, Outcome, ITI) for two example cues with P = 1/3 and P = 2/3. Panel B: Blinking (fraction eye closed) vs. Time (s), N = 105; B1: enlargement. Panel C: Licking (fraction of licking) vs. Time (s), N = 126. Columns: Cue, Outcome, No outcome. Legend: A1, A2/3, A1/3, N1, R1/3, R2/3, R1.]
FIG. 1. Behavioral task and results. A: flow of the behavioral task. Two representative trials are shown. The outcome delivery on each trial was randomized
according to a probability associated with the cue. Cue duration = 2 s; Outcome = 0.1–0.15 s; Intertrial Interval (ITI) = 3–8 s. Two of 7 possible cues with
a different probability for food and airpuff are shown. Trial order and ITI length were randomized. B: fraction of trials with eyes closed (line: average; shadow:
SE) as detected by computerized eye state detection (ESD) algorithm. Trial epoch (cue, outcome: food or airpuff; no outcome: sound only) are aligned to event
onset (time = 0). Note the overlap of 0.5 s between the start of the Outcome and the No-outcome epochs and the last 0.5 s of the Cue epoch. Data were averaged
for each session (several hundred trials) and then across sessions (n = 105, number of recording sessions; 89,727, total number of trials). Color coding of trial
types is given on the right (A, Aversive; N, Neutral; R, Reward; the number is the outcome probability). B1: enlargement of the last second of the cue epoch.
C: fraction of trials with licking as computed according to an infrared reflection detector signal directed at the monkey’s mouth. Same conventions as in B (n =
125 number of recording session; 113,022, total number of trials).
J Neurophysiol • VOL
101 • FEBRUARY 2009 •
Two monkeys (L and S, Macaca fascicularis, female 4 kg and male
5 kg) were introduced to seven different fractal visual cues, each
predicting the outcome in a probabilistic manner. Three cues (reward
cues) predicted a food outcome (L: 0.4 ml, 100-ms duration; S: 0.6 ml,
150-ms duration) with delivery probabilities of 1/3, 2/3, and 1; three
cues (aversive cues) predicted an airpuff outcome (100- and 150-ms
duration for L and S, respectively; 50–70 psi; split and directed 2 cm
from each eye; Airstim System, San Diego Instruments) with delivery
probabilities of 1/3, 2/3, and 1. The seventh cue (the neutral cue) was
never followed by a food or an airpuff outcome. The full-screen cues
were presented on a 17-in. monitor (located 50 cm from the monkeys’
eyes) for 2 s and were immediately followed by an outcome (food,
airpuff) or no outcome, according to the probabilities associated with
the cue. Outcomes and outcome omissions were signaled by one of
three sounds that discriminated the three possible events: a drop of
food, an airpuff, or no outcome. Trials were followed by a variable
intertrial interval (ITI, monkey S: 3–7 s; monkey L: 4–8 s; Fig. 1A).
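The trial structure described above can be sketched as a small simulation. This is an illustration only: the function and dictionary names are hypothetical, cues are drawn with equal frequency and ITIs from a uniform distribution, both of which are assumptions rather than the experimental schedule, and the default ITI range is that of monkey S.

```python
import random

# (outcome, delivery probability) for the seven cues; labels follow Fig. 1.
CUES = {
    "R1/3": ("food", 1 / 3),
    "R2/3": ("food", 2 / 3),
    "R1": ("food", 1.0),
    "A1/3": ("airpuff", 1 / 3),
    "A2/3": ("airpuff", 2 / 3),
    "A1": ("airpuff", 1.0),
    "N1": (None, 0.0),          # neutral cue: never followed by an outcome
}
CUE_DURATION_S = 2.0


def draw_trial(rng, iti_range_s=(3.0, 7.0)):
    """Draw one trial: cue, delivered event, and the following ITI.

    Assumption: cues are equiprobable and the ITI is uniform in iti_range_s.
    """
    cue = rng.choice(sorted(CUES))
    outcome, probability = CUES[cue]
    delivered = outcome is not None and rng.random() < probability
    return {
        "cue": cue,
        "cue_duration_s": CUE_DURATION_S,
        "event": outcome if delivered else "no outcome",
        "iti_s": rng.uniform(*iti_range_s),
    }
```

For example, `draw_trial(random.Random(1))` returns one trial record; passing `iti_range_s=(4.0, 8.0)` would mimic the ITI range reported for monkey L.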
JOSHUA, ADLER, ROSIN, VAADIA, AND BERGMAN
TABLE 1. The neural database

Population | Number of Cells | Isolation Score | Fraction ISI <2 ms | Discharge Rate, spikes/s | Recorded Time, s | Number of Recorded Trials | Number of Spikes/Recorded Cell
GPe | L: 191 | 0.95 ± 0.05 [0.80–0.99] | 0.0034 ± 0.0042 [0–0.020] | 83.3 ± 21.7 [27–171] | 3,574.0 ± 2,136 [1,260–10,618] | 304 ± 187 [103–988] | 301,810 ± 207,937 [44,456–1,298,582]
GPe | S: 119 | 0.96 ± 0.04 [0.80–0.99] | 0.0020 ± 0.0038 [0–0.018] | 77.6 ± 25.7 [22–160] | 3,188.7 ± 1,591 [1,260–7,740] | 341 ± 198 [111–1,183] | 246,557 ± 145,823 [43,515–671,140]
GPi | L: 82 | 0.96 ± 0.04 [0.82–0.99] | 0.0034 ± 0.0037 [0–0.017] | 88.1 ± 21.9 [38–153] | 3,350.0 ± 1,136 [1,260–9,719] | 285 ± 150 [108–829] | 291,892 ± 157,110 [71,502–795,572]
GPi | S: 67 | 0.94 ± 0.05 [0.82–0.99] | 0.0036 ± 0.0046 [0–0.020] | 83.2 ± 18.7 [42–124] | 2,572.0 ± 1,136 [1,260–5,580] | 252 ± 118 [112–535] | 212,567 ± 114,868 [75,577–591,935]
SNr | L: 68 | 0.95 ± 0.05 [0.81–0.99] | 0.0025 ± 0.0042 [0–0.019] | 56.5 ± 26.1 [6–108] | 3,976.0 ± 2,136 [1,440–11,341] | 352 ± 219 [125–981] | 238,492 ± 225,442 [20,370–1,013,418]
SNr | S: 77 | 0.96 ± 0.04 [0.82–0.99] | 0.0020 ± 0.0040 [0–0.020] | 45.1 ± 17.1 [14–91.6] | 3,062.0 ± 1,136 [1,260–8,640] | 301 ± 176 [124–870] | 133,548 ± 91,485 [40,366–563,125]

Values are means ± SD, with the range of scores in brackets. Recording statistics were calculated separately for each monkey and each neural population. The
range of the isolation score is 0 to 1. Fraction ISI <2 ms is the fraction of ISIs shorter than 2 ms out of all ISIs of a cell. Recording time and number of recorded
trials represent only the part of the recording satisfying the inclusion criteria and included in the analysis database.
At the end of the experiment the chamber and head holder of both
monkeys were removed, the skin was sutured, and following a
recovery period the monkeys were sent to a primate sanctuary (http://monkeypark.co.il).
Statistical analysis of population responses
Responses of the HFD neurons in the GP (Arkadir et al. 2004;
Georgopoulos et al. 1983; Mink and Thach 1991b; Mitchell et al.
1987a; Turner and Anderson 2005) and SNr (Nevet et al. 2007; Sato
and Hikosaka 2002) to behavioral events are composed of either
increases or decreases in discharge rate. For this reason, responses of
BG main-axis neurons were calculated as the absolute deviation from
the baseline of the firing rate (baselineFR) and then averaged across
the population. However, this statistic does not have a natural zero
baseline. To obtain such a baseline we calculated the average of the
same statistic (i.e., absolute deviation from baseline) in the last 3 s of
the ITI when using the same number of trials as those used for the
calculation of the cell response and denoted it as baselineabs.
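First, we define baselineFR as
\[ \mathrm{baseline}_{FR} \equiv \operatorname*{mean}_{0 \le t \le 3} \left[ \mathrm{psth}_{ITI\_END}(t) \right] \]
Then baselineabs is defined as
\[ \mathrm{baseline}_{abs} \equiv \operatorname*{mean}_{0 \le t \le 3} \left\{ \mathrm{abs}\left[ \mathrm{psth}_{ITI\_END}(t) - \mathrm{baseline}_{FR} \right] \right\} \]
Note that baselineabs calculates the mean fluctuations of the baseline
firing rate around baselineFR.
We then subtract this value from the response, i.e.
\[ \mathrm{response}(t) \equiv \mathrm{abs}\left[ \mathrm{psth}(t) - \mathrm{baseline}_{FR} \right] - \mathrm{baseline}_{abs} \]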
The average population response was defined as the average of the
responses of all units (Figs. 3A, 5A, and 7A). To validate results
obtained using this statistic we divided each cell’s response into 1-ms
bins with either increases or decreases in firing rate. We then averaged
these responses separately across the populations. This analysis
yielded the same qualitative result as the former (data not shown). In
addition, we calculated the average peristimulus time histogram
(PSTH) without the absolute operation (Supplemental Fig. S1).1
Finally, some of the neurons had sustained ITI activity after reward
delivery; we analyzed the population responses to cues following
trials with no reward; however, analysis yielded the same results as
those of the whole population analysis (data not shown).
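The absolute-deviation statistic defined above can be written out as a short Python sketch. The function names are hypothetical and the PSTHs are assumed to be plain lists of firing rates (spikes/s) in equal-width bins; this illustrates the three definitions and is not the analysis code used in the study.

```python
from statistics import mean


def baseline_stats(psth_iti_end):
    """Return (baselineFR, baselineabs) from the PSTH of the last 3 s of the ITI."""
    baseline_fr = mean(psth_iti_end)
    baseline_abs = mean(abs(rate - baseline_fr) for rate in psth_iti_end)
    return baseline_fr, baseline_abs


def absolute_response(psth, psth_iti_end):
    """response(t) = abs[psth(t) - baselineFR] - baselineabs, bin by bin."""
    baseline_fr, baseline_abs = baseline_stats(psth_iti_end)
    return [abs(rate - baseline_fr) - baseline_abs for rate in psth]


def population_response(cells):
    """Average the absolute responses of all units.

    cells: list of (psth, psth_iti_end) pairs; all psth lists share one length.
    """
    responses = [absolute_response(psth, iti) for psth, iti in cells]
    return [mean(bin_values) for bin_values in zip(*responses)]
```

With one cell that increases its rate by 10 spikes/s and another that decreases by 10 spikes/s, a signed average would cancel to zero, whereas `population_response` reports the same positive deviation for both, which is the motivation given in the text for the absolute operator.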
1
The online version of this article contains supplemental data.
in the same electrode trajectory to the GPe. The GPe cells can be
categorized into two subgroups (DeLong 1971): one with a high-frequency discharge rate (in this study >20 Hz, HFD) and the other
with a low-frequency discharge rate (LFD). Typically the discharge of
the HFD neurons was found to be interrupted by long intervals of total
silence (Elias et al. 2007) and the LFD firing pattern usually included
short bursts with the amplitude of the spike declining along the burst.
Pallidal border cells (Bezard et al. 2001; DeLong 1971; Mitchell
et al. 1987b) were identified by their typical regular firing pattern and
broad action potentials and were excluded from the study database.
Cells were also recorded from the output structures of the basal
ganglia: the GPi and the SNr. Neurons of both structures were
identified according to their stereotaxic coordinates (based on MRI
and primate atlas data) and real-time physiological recordings. For
GPi neurons, the identification criteria constituted the depth of the
electrode, the physiological identification of border cells between the
GPe and the GPi (DeLong 1971), and the real-time assessment of
the firing pattern of the cell. SNr neurons were identified according to
the electrophysiological characteristics (narrow spike shape and high
firing rate) of the cells (DeLong et al. 1983; Schultz 1986) and the
firing characteristics of neighboring neurons and fibers (e.g., fibers of
the internal capsule, SN pars compacta (SNc) dopaminergic neurons,
and fibers of the oculomotor nerve).
We estimated the stereotaxic coordinates of the physiological
recordings within the basal ganglia nuclei by alignment of MRI scans
and the primate atlas (Martin and Bowden 2000) sections. By using
these anatomical and physiological criteria we attempted to sample all
territories of the three studied BG nuclei.
Three computerized digital video cameras recorded the monkey’s
face and upper limbs at 50 Hz. Video analysis was carried out on
custom software to identify periods when the monkeys closed their
eyes. Briefly, the monkey’s eye location was identified by a human
observer (once for a daily recording session in which the monkey’s
head was immobilized by connecting the head holder to an external
metal frame); a classification of eye states (open or closed) was made
based on the number of dark pixels in the eye area. The eye state
detection (ESD) algorithm was tested by random samples from
several recording days and found to be consistent with the judgments
of a human observer for >99% of the images. Mouth movements
were monitored by an infrared reflection detector (Dr. Bouis Devices,
Karlsruhe, Germany). The infrared signal was filtered between 1 and
100 Hz by a band-pass four-pole Butterworth filter and sampled at
1.56 kHz. Based on these recordings we detected times in which the
monkeys moved their mouths by implementing a threshold-based
method. We compared mouth-movement detection with the video of
the monkeys’ faces over several recording days and found that they
were consistent.
Statistical analysis of single-unit responses
We defined the difference index between the responses of a single
cell to two events as the mean absolute difference between the
corresponding PSTHs and used resampling (bootstrap) methods to test
the significance of this index (Joshua et al. 2008). We calculated two
difference indices. The first, the response index, measures the difference between the reward or aversive event and the neutral event. The
second, the probability coding index, measures the difference between responses to the events with a high probability (P = 2/3 and 1)
of receiving an outcome and responses to the events with a low
probability (P = 1/3) of receiving the same outcome. We also
calculated the temporal evolution of the fraction of cells with significant probability discrimination. Responses were binned in nonoverlapping 100-ms bins and tested for significance (ANOVA test, P <
0.01) at each time bin (Supplemental Fig. S2).
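A difference index of this kind can be sketched in Python as follows. The resampling scheme shown here, shuffling trial labels between the two events, is an assumption made for illustration; the exact bootstrap procedure is described in Joshua et al. (2008) and may differ. All names are hypothetical, each trial is assumed to be a list of binned firing rates, and no smoothing is applied.

```python
import random
from statistics import mean


def psth(trials):
    """Average binned rates across trials (each trial is a list of bins)."""
    return [mean(bin_values) for bin_values in zip(*trials)]


def difference_index(trials_a, trials_b):
    """Mean absolute difference between the PSTHs of two events."""
    return mean(abs(a - b) for a, b in zip(psth(trials_a), psth(trials_b)))


def resampling_p_value(trials_a, trials_b, n_resamples=1000, seed=0):
    """Estimate how often shuffled trial labels give an index >= the observed one.

    Assumption: label shuffling stands in for the resampling used in the study.
    """
    rng = random.Random(seed)
    observed = difference_index(trials_a, trials_b)
    pooled = list(trials_a) + list(trials_b)
    n_a = len(trials_a)
    exceed = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if difference_index(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return observed, (exceed + 1) / (n_resamples + 1)
```

Under this sketch the response index would be obtained by passing reward (or aversive) trials and neutral trials, and the probability coding index by passing high-probability and low-probability trials of the same outcome type.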
Note that the statistical significance of the response and probability
coding index analyses depends on the number of trials. In the response
index analysis we compare the reward or aversive responses to the
neutral trial response; however, there are relatively fewer neutral than
aversive or rewarding trials. In the probability coding index analysis
we compare the high and low probabilities that are usually introduced
more often than the neutral cue (threefold more for the low-probability cue and fourfold more for the high-probability cue). Due to these
limitations we did not compare between the response and probability
coding indices but only between the same indices when the number of
trials was similar (e.g., we compared the response index for the reward
and aversive trials).
The responses of most HFD pallidal and SNr neurons to the cue
were sustained and thus the deviation from rate baseline in the
outcome and no-outcome epochs could be the result of a slow decay
from the sustained cue-related activity. We tested whether activity
after the ending of the cue (average rate in 1 s) differed significantly
(t-test, P < 0.05) from both the activity before the cue (0.5 s precue)
and from the activity at the end of the cue epoch (0.5 s before cue
ending). Cells in which both of these tests were significant and activity
did not fall between the precue and end of the cue activity were
considered to have a response that was not suspected to be due to
decay of their discharge back to baseline level.
RESULTS
Monkey behavior reflected expectation of rewarding
and aversive events
We recorded the monkeys’ behavior during performance of
a probabilistic classical conditioning task (Fig. 1A) with food
or airpuff as the rewarding and aversive outcomes, respectively. We tested how extensive conditioning (several months,
5 days/week, ~1,000 trials/day) affected the monkeys’ behavior by monitoring licking and blinking responses during neural
recordings (Fig. 1, B and C).
Figure 1 shows the average frequency of blinking and
licking in all trial epochs. The frequency of licking increased in
response to cues predicting food but only slightly to the
aversive and neutral cues (Fig. 1C). Similarly the monkeys’
frequency of blinking increased to cues predicting airpuff but
only slightly to reward and neutral cues (Fig. 1B). The increase
in blinking and licking during the cue epoch was maximal in
trials where the probability of outcome was 2/3 or 1 and
smaller in trials where the probability was 1/3. The frequency
of the behavioral responses to reward and aversive events was
only slightly larger for the licking versus the blinking responses. For example, at the end of the R1 (Reward, P = 1)
cue presentation the monkeys increased their licking frequency
from baseline by 40%, whereas in the A1 (Aversive, P = 1)
cue the monkeys increased their blinking frequency by 35%
(t-test, P = 0.057).
TABLE 2. The fraction of cells with a significant increase or decrease in discharge response

Population | Cue, Reward, Inc | Cue, Reward, Dec | Cue, Aversive, Inc | Cue, Aversive, Dec | Outcome, Reward, Inc | Outcome, Reward, Dec | Outcome, Aversive, Inc | Outcome, Aversive, Dec | No Outcome, Reward, Inc | No Outcome, Reward, Dec | No Outcome, Aversive, Inc | No Outcome, Aversive, Dec
GPe | 17.3 | 5.7 | 5.9 | 4.0 | 20.0 | 7.6 | 5.6 | 3.7 | 7.7 | 3.9 | 2.4 | 1.8
GPi | 12.0 | 4.8 | 6.5 | 3.2 | 16.7 | 8.6 | 5.7 | 2.5 | 7.6 | 3.2 | 2.1 | 1.2
SNr | 25.3 | 14.5 | 13.9 | 11.1 | 24.0 | 15.0 | 12.7 | 10.0 | 17.7 | 4.9 | 5.5 | 5.4

Values are percentages of cells. For each time bin (1 ms) the percentage of cells that responded to a given event by a significant (3-sigma rule) increase (Inc) or decrease (Dec) in firing rate
was calculated; this percentage was then averaged across all time bins of each of the three epochs. Epoch duration of 2,000 ms, starting at the event, was used
for the three epochs. For example, in the GPe the average percentage of cells that responded with an increase in firing rate at the cue epoch was 17.3%
(Cue–Reward–Inc–GPe entry). Note that this measure gives a smaller percentage from the overall number of responding cells since it is dependent on the fraction
of bins with a significant modulation. Each neuron might make a different contribution to this average according to the number of significant response bins in
the relevant epoch.
To determine significant responses in the single-unit analysis we
calculated the SD of the PSTH of the last 3 s of the ITI using the
same number of trials as in the target PSTH and identified time
segments in which the response exceeded threefold the ITI SD
(3-sigma rule). A response was considered significant only if the
duration of the deviant segment was >60 ms (threefold the SD of
the smoothing filter).
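The 3-sigma rule with its minimum-duration requirement can be sketched as below. Several details are assumptions for illustration: deviations are measured from the mean of the ITI PSTH, the population SD is used, bins are 1 ms wide, and the function names are hypothetical. The latency helper follows the definition used in this study (the first bin of a significant response).

```python
from statistics import mean, pstdev


def significant_segments(psth, psth_iti_end, n_sigma=3.0, min_duration_bins=60):
    """Find runs of 1-ms bins deviating from baseline by more than n_sigma SDs.

    Returns (start, end, sign) tuples with end exclusive; only runs longer
    than min_duration_bins are kept. sign is +1 for increases, -1 for decreases.
    """
    baseline = mean(psth_iti_end)
    threshold = n_sigma * pstdev(psth_iti_end)
    segments = []
    start, sign = None, 0
    # A trailing baseline-valued sentinel bin closes a run that reaches the end.
    for i, rate in enumerate(list(psth) + [baseline]):
        deviation = rate - baseline
        s = 1 if deviation > threshold else -1 if deviation < -threshold else 0
        if s != sign:
            if sign != 0 and i - start > min_duration_bins:
                segments.append((start, i, sign))
            start, sign = i, s
    return segments


def response_latency(psth, psth_iti_end):
    """First bin of the first significant segment, or None if there is none."""
    segments = significant_segments(psth, psth_iti_end)
    return segments[0][0] if segments else None
```

The per-bin fractions of cells with increases or decreases reported in Table 2 could then be obtained by counting, for each bin, how many cells have a segment of the corresponding sign covering that bin.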
To obtain the number of time bins in which a cell had a
significant response to an event, we calculated the fraction of cells
that had a significant response in each 1-ms time bin after an event.
We divided these responses into increases and decreases in the
firing rate and calculated the fraction of cells that increased their
firing rate and the fraction of cells that decreased their firing rate
during the response epoch (Figs. 3B, 5B, and 7B and Table 2).
The latency of a response was defined as the first bin in which
a significant (3-sigma rule) response was detected. This conservative estimate of response latency enables comparison of the relative latencies of different neuronal populations; however, other
methods (e.g., Berenyi et al. 2007; Ritov et al. 2002) might yield
different estimates of the response latencies. For each population
we calculated the median of the response latency and the confidence interval (CI) of this median. The CI was calculated by
resampling (bootstrapping with repetitions) the latencies and recalculating the median of these surrogates. We repeated this
process 1,000 times and the 95% CI was determined as the
boundary values that included 95% of the median surrogates
(excluding 2.5% above and 2.5% below the boundaries). Calculating the CI with bias correction gave similar results.
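The confidence interval of the median latency can be illustrated with a plain percentile bootstrap. This is a minimal sketch with hypothetical names; it resamples with replacement, uses 1,000 surrogates as in the text, and omits the bias correction that is reported to give similar results.

```python
import random
from statistics import median


def bootstrap_median_ci(latencies, n_resamples=1000, ci=0.95, seed=0):
    """Return (median, (low, high)) using a percentile bootstrap of the median."""
    rng = random.Random(seed)
    n = len(latencies)
    # Each surrogate is the median of n latencies drawn with replacement.
    surrogates = sorted(
        median(rng.choices(latencies, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - ci) / 2.0
    low = surrogates[round(tail * n_resamples)]
    high = surrogates[round((1.0 - tail) * n_resamples) - 1]
    return median(latencies), (low, high)
```

With the default settings, the lowest 25 and highest 25 of the 1,000 sorted surrogate medians are excluded, so the returned bounds enclose the central 95% of the surrogates.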
The behavioral responses to food or airpuff delivery were
not dependent on their previous predictions (Fig. 1, B and C,
outcome). Food and airpuff omission, as well as the final
(no-outcome) event of the neutral trials, were indicated to the
monkeys by a “no-outcome” sound. When expected food or
airpuff was not delivered (no outcome of the P = 1/3 or P =
2/3 trials) the licking and blinking frequency increased, respectively; this increase was in line with the previously instructed
probability. Licking and blinking increased slightly to the
neutral trials (Fig. 1, B and C, no outcome, green line).
Analysis of the behavioral responses indicates that the monkeys
could distinguish between aversive, reward, and neutral cues and
between the high (P = 2/3 and 1) and low (P = 1/3) outcome
probabilities. Accordingly, we grouped the events with high
probability (P = 2/3 and P = 1) for the neural activity analysis.
Neuronal database
We recorded 592 GPe, 267 GPi, and 226 SNr units during
the performance of the probabilistic conditioning task (Fig.
1A); out of these, 310 GPe, 149 GPi, and 145 SNr units passed
the quality inclusion criteria (see METHODS) and their responses
were further analyzed (Table 1).
Figure 2 shows examples of the responses of neurons from
GPe, GPi, and SNr to the 18 events of our behavioral task. The
GPe neuron in Fig. 2A had a large response in the reward-
[FIG. 2 graphic not reproduced. Panels A–C: PSTHs (Spike/s vs. Time (s)) in rows reward, neutral, aversive and columns cue, outcome, no outcome; legend: R1, R2/3, R1/3, N1, A1/3, A2/3, A1. Panel D: raster plots for the R1 and A1 cues. Panel E: analog trace with scale bars of 0.5 s, 0.1 s, and 0.1 mV.]
FIG. 2. Neural activity of neurons of the globus pallidus external and internal segments (GPe and GPi, respectively) and substantia nigra pars reticulata (SNr).
A: peristimulus time histograms (PSTHs) of a single GPe cell of monkey L aligned to the trial behavioral events. The rows are separated according to the expected
outcome. First row: trials with cues that predict the delivery of food. Second row: trials with the neutral cue (a cue always followed by no outcome). Third row:
trials with cues that predict an airpuff. Columns are aligned according to the trial epoch. First column: cue presentation epoch (−0.5 to 2 s after cue onset). Second
column: outcome epoch (−0.5 to 2 s after delivery of food or airpuff). Third column: trials in which no outcome was delivered; outcome omission was signaled
to the monkey by the no-outcome sound (−0.5 to 2 s after sound onset). The first 0.5 s of the 2nd and 3rd columns overlaps the last 0.5 s of the left column.
Gray-level codes are marked on the middle plot (A, Aversive; N, Neutral; R, Reward; the number is the outcome probability). PSTHs were constructed by
summing activity across trials in 1-ms resolution and then smoothing with a Gaussian window (SD of 20 ms). Total number of trials in this example = 511;
isolation score = 0.98; fraction of spikes in first 2 ms of the interspike interval (ISI) histogram = 0.0007. B: same conventions as in A for a GPi neuron. Total
number of trials = 530; isolation score = 0.99, fraction of spikes in first 2 ms of the ISI histogram = 0.0002. C: same conventions as in A for a SNr neuron.
Total number of trials = 234; isolation score = 0.98, fraction of spikes in first 2 ms of the ISI histogram = 0.0007. D: example of 2 raster plots from the SNr neuron
in C. Top: raster of the R1 cue. Bottom: raster of the A1 cue. Vertical black arrows mark the cue onset. E: an example of the analog data (after digital 250- to
6,000-Hz band-pass filter) for a single trial (marked by a gray horizontal arrow in D). The last row contains a magnified 0.75-s segment from the 2.5-s analog
segment above.
Activity was asymmetrically modulated by expectation
of aversive events and reward in the cue epoch
Figure 3 shows the population analysis of the absolute
response (deviation from the background discharge rate) of
GPe, GPi, and SNr neurons to the cues. The absolute population response (see METHODS) was sustained and spanned the
complete (2-s) duration of the cue epoch. The GPe and SNr
population responses to reward cues were significantly larger
than the responses to aversive and neutral cues (Fig. 3A).
Furthermore, in the beginning of the cue epoch, responses were
larger for the cues indicating a high probability of future
reward than for the low-probability cue; however, this probability-dependent difference was not observed for aversive cues.
Compared with the large differential modulation of the GPe
and SNr, the difference in GPi population response between
reward and aversive events was small and the population
response did not robustly differentiate reward probabilities
(Fig. 3A).
We used the absolute operator to examine the deviation of
the discharge rate of the BG main axis from their baseline (ITI)
discharge rate since the high-frequency tonic discharge (Table
1) of these neurons enables them to respond with both increases and decreases in their discharge rate. Absolute population analysis assumes that opposite modulations can be
detected by the nervous system (for example, due to specificity in connectivity); however, this may not be the case.
Thus we also performed the population analysis without
using absolute operator. This analysis assumes that target
structures are homogeneously innervated by neurons of the
studied structure and do not keep labeled lines for individual
neurons with increases or decreases in discharge rate. This
standard population analysis revealed the same trends of
larger responses for reward cues (Supplemental Fig. S1).
[FIG. 3 graphic not reproduced. Panel A: population PSTHs for GPe, GPi, and SNr (ordinate: Spike/s; abscissa: Time (s), 0 to 2); legend: Rwd High P, Rwd Low P, Neutral, Avr Low P, Avr High P. Panel B: fraction of cells with increases or decreases in firing rate for GPe, GPi, and SNr (ordinate: Fraction of cells; abscissa: Time (s), 0 to 2); legend: Reward, Aversive.]
FIG. 3. Population response in the cue epoch. A: population
responses (average ± SE) to the task cues. The PSTHs were
calculated with the absolute operator and show the mean
deviation from the background activity. Top: GPe (n = 310
neurons). Middle: GPi (n = 149). Bottom: SNr (n = 145).
Color coding: dark blue, responses to high-probability (P = 1
and P = 2/3) reward cues; light blue, low-probability (P = 1/3)
reward cue; green, neutral cue; orange, aversive low-probability cue; red, aversive high-probability cue. B: fraction of cells
with significant (3-sigma rule) modulations of firing rate in the
cue epochs. Blue, responses to all reward predicting cues; red,
responses to all aversive predicting cues. Neutral events are not
included because of their relatively lower number and to enable
inclusion of all rewarding/aversive events in the statistical tests.
The ordinate is the fraction of cells that had a significant
response at each time bin (1 ms). The values above zero are the
fraction of cells that significantly increased their firing rate; the
values below zero are the fraction of cells that significantly
decreased their firing rate.
predicting cue conditions (top left) that differentiates between
reward probabilities. This neuron had a small response in the
aversive and neutral conditions (middle and bottom left). Discharge rate returned rapidly to baseline in the outcome and
no-outcome (omission) phases (middle and right columns).
Figure 2B shows a GPi neuron; with respect to the GPe
example, responses of the GPi neuron were largest in the
reward-cue conditions (top left). Unlike the neuron in Fig. 2A,
this neuron also responded to the aversive cue (bottom left);
however, this response was similar to the response to the
neutral cue (middle left). Finally, the SNr neuron in Fig. 2C
also responded to the reward cue and only slightly to the
aversive cue (left column). However, unlike the other two
neurons, this neuron responded mainly with a decrease to the
reward cue. Notably this neuron had a very large response when
reward was omitted (top right). To summarize, all the neurons
had larger responses to the reward cues than to the aversive
cue. Furthermore, reward probability, but not aversive probability, was encoded by these neurons. The neurons that did
respond to the aversive cue responded similarly to the aversive
and neutral cues. In the following text we provide further analyses
of both the population PSTH and the single-cell responses of all
recorded neurons at the three BG structures.
[FIG. 4 graphic not reproduced. Panel A: aversive response index vs. reward response index (Spike/s, log-log axes from 1 to 100) for GPe (N = 108/310; 34.8%), GPi (N = 50/149; 33.6%), and SNr (N = 75/145; 51.7%). Panel B: aversive probability index vs. reward probability index (Spike/s, log-log axes from 1 to 100) for GPe (N = 103/310; 33.2%), GPi (N = 49/149; 32.9%), and SNr (N = 77/145; 53.1%). Legend: reward, both, aversive, none.]
2008). Note that in the population PSTH analysis we found
a substantial response to the aversive cue. However, in the
next sections we show that these responses were similar to
the response to the neutral cue and thus do not reflect the
expectation of an aversive event.
The population PSTH is an average measure and therefore may
be biased by a few neurons with an extreme response and,
likewise, opposite effects may be averaged out. On the other hand,
the fractional analysis classifies the responding bins in a binary
rather than a graded way. We therefore formulated the difference
index as a measure of the modulations of a single neuron to
different events. For the response index, we grouped responses
across probabilities and tested whether single-cell responses to
reward and aversive cues were different from their response to
the neutral cue. Figure 4A shows the scatterplots comparing the
response index for the reward and aversive trials. Many of the
BG main-axis neurons had a significant reward and/or aversive
response index (GPe: 34.8%; GPi: 33.6%; and SNr: 51.7% of
the total number of recorded neurons), indicating a significant
difference between these responses and the responses to the
neutral cue. In all populations, the response index for the
reward trials of most neurons was larger than the response index for
aversive trials (Fig. 4A). A substantial fraction of the BG units
showed a significant response index for reward cues, whereas
FIG. 4. Single-cell responses in the cue epoch. A: log-log
scatterplots comparing the response index of individual neurons
to reward and aversive cues. The response index was calculated
for each cell (310 GPe, 149 GPi, and 145 SNr neurons) as the
absolute difference between the aversive or reward cue-aligned
PSTH and the PSTH of the neutral cue. The black line is the
identity (Y = X) line. Points below this line represent cells with
a response index that was larger for the reward cues than for
aversive cues. Top: GPe. Middle: GPi. Bottom: SNr. Color
code: blue, response index significant only for reward cues; red,
response index significant only for aversive cues; green, both
response indices were significant; gray, neither response index
was significant. Significance level was P < 0.05. The time
window used for this analysis was 0–2,000 ms from cue
presentation. Inset: pie chart of the fraction of cells with a
significant index for reward (blue), aversive (red), and both
(green) cues out of all cells with a significant response index
(number of responding cells is given in the text at inset, top).
B: log-log scatterplots comparing the probability coding of
rewarding and aversive events by individual GPe, GPi, and SNr
neurons. The index was calculated as the difference between
the grouped response to the high-probability (P = 2/3 and P =
1) and the low-probability (P = 1/3) events. Color code: blue,
probability coding index significant only for reward cues; red,
probability coding index significant only for aversive cues;
green, both response indices were significant; gray, neither
response index was significant. Points below the identity line
represent cells with a probability coding index that was larger
for the reward cues than for aversive cues. Inset: pie chart of the
fraction of cells with a significant probability coding index for
reward only (blue), for aversive only (red), and for both (green)
cues out of all cells with a significant probability index.
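As a rough illustration of the two difference indices defined in this caption, the sketch below computes a response index as the absolute difference between an event-aligned PSTH and the neutral-cue PSTH, and a probability coding index as the difference between the grouped high-probability response and the low-probability response. Averaging the absolute difference across bins, taking the absolute value for the probability index, and all function names and firing rates are assumptions made for illustration; the significance testing described in METHODS is not reproduced.

```python
from statistics import fmean


def response_index(event_psth, neutral_psth):
    """Mean absolute difference (spikes/s) between two PSTHs with equal binning."""
    if len(event_psth) != len(neutral_psth):
        raise ValueError("PSTHs must have the same number of bins")
    return fmean(abs(e - n) for e, n in zip(event_psth, neutral_psth))


def probability_index(high_p_psths, low_p_psth):
    """Difference between the grouped high-probability response and the low-probability response."""
    # Group the high-probability conditions (e.g., P = 2/3 and P = 1) bin by bin.
    grouped_high = [fmean(bins) for bins in zip(*high_p_psths)]
    return response_index(grouped_high, low_p_psth)


# Hypothetical PSTHs (spikes/s) for one cell, four bins each.
neutral = [50.0, 50.0, 50.0, 50.0]
reward = [50.0, 62.0, 58.0, 54.0]
aversive = [50.0, 49.0, 52.0, 50.0]

reward_index = response_index(reward, neutral)
aversive_index = response_index(aversive, neutral)

reward_prob_index = probability_index(
    [[50.0, 60.0, 60.0, 50.0], [50.0, 64.0, 56.0, 50.0]],  # two high-P conditions
    [50.0, 54.0, 52.0, 50.0],                              # low-P condition
)
```

In this toy case the cell would fall below the identity line of panel A, because its reward index exceeds its aversive index.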
[Fig. 4 x-axes: reward response index (spikes/s) in A; reward probability index (spikes/s) in B; log scale.]
J Neurophysiol • VOL
101 • FEBRUARY 2009 •
32
www.jn.org
Downloaded from jn.physiology.org on March 1, 2009
This is probably due to the larger fraction of these neurons
that responded with increases rather than with decreases in
their firing rate (Fig. 3B and Table 2).
Figure 3B shows the fraction of cells that increased or
decreased their rate at each time after the cue presentation.
Unlike the population PSTH analysis, this analysis uses a
cutoff (3-sigma rule; see METHODS) for the identification of
bins with a significant deviation from the background discharge rate. In line with the population PSTH analysis, the
fraction of cells that significantly modulated their firing rate
in each of the 1-ms bins of the cue epoch was larger for the
reward cue than for the aversive cue. This difference in the
number of cells with significant responses to the reward
versus the aversive cue was larger for increases in firing rate
than for decreases (Fig. 3B and Table 2). Comparing the
patterns of response bins with the increase versus decrease
in discharge rate showed that, unlike the BG neuromodulators (Joshua et al. 2008), these opposing responses were
coincident (Fig. 3B); i.e., some of the cells increased their
firing rate whereas others decreased it at the same time.
Finally, both the population (Fig. 3A and Supplemental Fig.
S1) and the fractional analyses (Fig. 3B) showed that activity in the main axis was sustained, which contrasts with the
phasic responses of the neuromodulators (Joshua et al.
Results II
VALUE ENCODING IN THE BASAL GANGLIA MAIN AXIS
In the outcome epoch neurons responded both to food and
air puff delivery but, unlike in the cue epoch, they did not
consistently encode the probability of these events
Figure 5 shows the population PSTH and fraction of responding cells for the outcome epoch. PSTH population analysis of the outcome epoch showed that all BG main-axis
populations responded to both reward and aversive outcomes
(Fig. 5A). Responses in this epoch to the neutral trials (i.e.,
when no reward or airpuff was expected) were small (Fig. 5A,
green traces, and next paragraph). In the GPe and GPi the peak
response was larger for the food outcome than that for the
airpuff, whereas in the SNr the magnitude of the peak response
to aversive and reward outcomes was similar. Unlike the population cue responses, the population responses to the outcomes that
followed cues indicating different outcome probabilities were
similar and the SNr population alone showed a slight difference at food delivery time (Fig. 5A). As in the cue epoch, the
BG responses to the different outcomes contained both increases and decreases, with more cells increasing than decreasing their firing rate (Fig. 5B and Table 2) and the differences
between the average responses to reward versus aversive outcomes were due to differences in the fraction of cells responding with increases in discharge rate (Table 2).
Figure 6 shows the response index and probability coding
index analysis in the outcome epoch. The GPe and GPi responses to the reward outcome were larger and more frequent
than the responses to aversive outcome (Fig. 6A, top subplots).
However, many SNr cells responded to both food and airpuff
outcomes (Fig. 6A, bottom). Contrary to the population analysis (Fig. 5A), many GPe, GPi, and SNr cells did in fact encode
the difference between high- and low-reward probabilities
(Fig. 6B). These differences between the average population
and the single-unit analysis suggest that the absence of significant probability coding in the population analysis can be
attributed to opposite modulation effects; i.e., some cells had a
[Fig. 5 plot residue. Panels: A, population PSTH (spikes/s); B, fraction of cells with increases or decreases; rows: GPe, GPi, SNr; x-axis: time (s). Legend: Reward, Aversive, Rwd High P, Rwd Low P, Neutral, Avr Low P, Avr High P.]
FIG. 5. Population response in outcome epoch. A: population responses at the time of outcome delivery (blue, food; red,
airpuff) and the response to the neutral noise in the trials when
no outcome was expected (green, neutral trials). The PSTHs are
calculated with the absolute operator and show mean deviation
from baseline. B: fraction of cells with significant modulations
of firing rate. Same conventions as in Fig. 3.
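The effect of the absolute operator mentioned in this caption can be illustrated with a toy example: when some cells increase and others decrease their rate at the same time, the signed population average cancels, whereas the mean absolute deviation from baseline does not. This is a minimal sketch with made-up rates; the per-cell baseline values and the binning are assumptions.

```python
from statistics import fmean


def population_psth(cell_psths, baselines, absolute=True):
    """Average across cells of each bin's deviation from that cell's baseline rate."""
    n_bins = len(cell_psths[0])
    out = []
    for b in range(n_bins):
        devs = [psth[b] - base for psth, base in zip(cell_psths, baselines)]
        if absolute:
            # Absolute operator: increases and decreases both count as modulation.
            devs = [abs(d) for d in devs]
        out.append(fmean(devs))
    return out


# Two hypothetical cells with opposite modulations around a 60 spikes/s baseline.
cells = [[60.0, 70.0, 65.0], [60.0, 50.0, 55.0]]
baselines = [60.0, 60.0]

signed_psth = population_psth(cells, baselines, absolute=False)
absolute_psth = population_psth(cells, baselines)
```

Here the signed average is flat, while the absolute version shows the 10 and 5 spikes/s modulations.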
only a small number of cells had a significant response index
for aversive cues (Fig. 4A, insets).
The probability coding index compares the difference between reward and aversive probability coding. For this purpose
we classified the cues into high-probability (P = 2/3 and 1) and
low-probability (P = 1/3) cues (in accordance with the monkeys’ behavior; Fig. 1). In Fig. 4B we show scatterplots of the
probability coding indices of three neuronal populations. In
addition to the larger reward response index (Fig. 4A), coding
of the reward probability was larger (Fig. 4B) and more
frequent (Fig. 4B, insets) than coding of the aversive probability in the three neuronal populations. Supplemental Fig. S2
shows the time course of the probability encoding (100-ms
bins, ANOVA). In most cases, a sustained encoding is seen
that is greater for the rewarding than that for the aversive trials.
In both of these difference index analyses (response index
and probability coding index) the fraction of cells with a
significant index was larger for SNr (51–53% of the cells) than
that for GPe and GPi (32–34%; Fig. 4, inset text; χ² test, P <
0.01). The difference in the fraction of cells between the GPe
and GPi was not significant (χ² test, P = 0.78).
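The comparison of fractions between structures can be sketched with a 2 × 2 chi-square test. The example below uses response-index counts consistent with the reported 34.8% of 310 GPe cells and 33.6% of 149 GPi cells (108 and 50 cells). No continuity correction is applied, which is an assumption, since the text does not state the exact variant of the test; the resulting P value therefore need not match the reported one exactly. The function name is hypothetical.

```python
import math


def chi2_two_fractions(k1, n1, k2, n2):
    """Pearson chi-square (1 df, no continuity correction) comparing k1/n1 with k2/n2."""
    observed = [[k1, n1 - k1], [k2, n2 - k2]]
    total = n1 + n2
    row_sums = [n1, n2]
    col_sums = [k1 + k2, total - (k1 + k2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# GPe vs. GPi cells with a significant response index in the cue epoch.
chi2_gpe_gpi, p_gpe_gpi = chi2_two_fractions(108, 310, 50, 149)
```

With these counts the statistic is small and the P value is far above 0.05, in line with the statement that the GPe and GPi fractions did not differ significantly.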
JOSHUA, ADLER, ROSIN, VAADIA, AND BERGMAN
[Fig. 6 plot residue. Panels: A, response index; B, probability coding index. Inset text recovered: GPe, N = 180/310 (58.1%) and N = 80/310 (25.8%); GPi, N = 87/149 (58.4%) and N = 27/149 (18.1%).]
FIG. 6. Single-cell responses in the outcome epoch. A: log-log scatterplots comparing the response index of individual
neurons to reward and aversive outcomes. B: log-log scatterplot
comparing the probability coding index of the responses of
individual neurons to reward and aversive outcomes. Same
conventions as in Fig. 4. In this analysis we used a short time
window of 0–1,000 ms from outcome delivery to enable better
comparison between the fast response to the aversive event and
the slower response to reward outcome.
[Fig. 6 plot residue, continued. SNr inset text recovered: N = 122/145 (84.1%) and N = 49/145 (33.8%). X-axes: reward response index (spikes/s) in A; reward probability index (spikes/s) in B; log scale.]
larger response to the high probability, whereas others had a
larger response to the lower-probability trials. Finally, the
fraction of SNr neurons with a significant response index
(84%) was greater than the corresponding fraction of GPe and
GPi cells (58%; Fig. 6; χ² test, P < 0.01). The fraction of SNr
cells with probability coding indices (34%) was greater than
the corresponding fraction of GPe and GPi cells (25 and 18%,
respectively, Fig. 6B). However, this difference in fraction of
cells was significant only for the GPi (χ² test, P < 0.01).
Encoding of reward prediction error would predict the
opposite trend in the coding of reward probability in the cue
and outcome (Fiorillo et al. 2003; Morris et al. 2004). To
probe this possibility we tested for correlations between the
difference in response to the high and low probabilities at
the cue epoch versus the difference at the outcome epoch.
For the GPe and GPi we found a small positive correlation
coefficient (CC = 0.16 and 0.34, respectively; t-test, P < 0.01);
for the SNr we found a small negative correlation that did not
reach significance (CC = −0.08; P = 0.32). Thus we conclude
that HFD neurons of the main axis of the BG do not encode the
prediction error.
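The cue-versus-outcome test described above can be sketched as a Pearson correlation, across cells, between the high-minus-low probability difference at the cue and the same difference at the outcome. Under a prediction-error code the two differences would be expected to have opposite signs, which would give a negative correlation. The data below are made up, the helper name is hypothetical, and the t-test on the coefficient is not reproduced.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-cell differences (high-P minus low-P response, spikes/s).
cue_diff = [4.0, 2.0, -1.0, 3.0, 0.0]

# Idealized prediction-error pattern: the outcome difference mirrors the cue difference.
outcome_diff_mirrored = [-4.0, -2.0, 1.0, -3.0, 0.0]
cc_mirrored = pearson_cc(cue_diff, outcome_diff_mirrored)

# Same-sign pattern, as with the positive coefficients reported for GPe and GPi.
outcome_diff_same_sign = [2.0, 1.5, 0.0, 1.0, 0.5]
cc_same_sign = pearson_cc(cue_diff, outcome_diff_same_sign)
```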
Neural response in the no-outcome epoch
Figure 7 shows the PSTH population and fraction of
responding cells for the no-outcome epoch. As in the cue
epoch, population analysis in the no-outcome epoch showed
that responses to reward omission trials were larger (Fig.
7A) and more frequent (Fig. 7B) than responses to aversive
omission trials. As in the outcome epoch the difference
between the population responses to omission of high- and
low-probability outcomes was small (Fig. 7A). The population response (Fig. 7A) and the fraction of units with
significant changes in their discharge rate (Fig. 7B) to
outcome omission declined rapidly and reached the baseline
within <1.5 s. This contrasts with the outcome responses
where the response did not decline to the background (ITI)
level even after 2 s (Fig. 5, A and B).
Figure 8 shows the response index and probability coding
index analysis in the no-outcome epoch. This single-cell
analysis shows that, comparable to the population analysis,
cell responses to the reward omission were larger and more
frequent than their responses to aversive omission (Fig. 8A)
and more cells encoded the a priori reward probability than
the aversive probability (Fig. 8B). The fraction of SNr cells
with a significant response index (41%) was greater than the
fraction of GPe and GPi neurons with a significant response
index (30 and 33%, respectively). This difference was
significant only for the GPe (χ² test, P < 0.05). The fraction
of SNr cells with a significant probability coding index
(34%) was greater than the fraction of GPe and GPi neurons
with a significant index (22 and 19%, respectively; χ² test,
P < 0.05).
Activity in the outcome and no-outcome epochs did not only
reflect decay from sustained cue activity
Activity in the cue epoch is sustained and continues until
the end of the cue epoch (Fig. 3 and Supplemental Fig. S1).
Thus activity after the cue epoch (i.e., at outcome and
no-outcome epochs) could reflect a slow decay of cue-related activity to the tonic discharge level of these neurons.
For example, the response of the GPe neuron in Fig. 2A at
the outcome epoch (Fig. 2A, top middle plot) could be
attributed to a slow decay from cue activity. A contrasting
example is the response of the SNr neuron in Fig. 2C at the
no-outcome epoch (Fig. 2C, top right plot). This response
cannot be attributed to a slow decay since it shows a clear
increase after reward omission (no outcome).
In Fig. 9 we show the percentage of cells whose activity in
the outcome/no-outcome epochs was significantly different
from the precue activity and the percentage of cells from these
groups in which activity did not reflect decay (see METHODS).
We found that many of the responses to the reward outcome
could not be attributed to slow decay of the sustained cue
activity (Fig. 9A, black bars; GPe: 28%; GPi: 22%; and SNr:
40% out of the whole population). The number of responses
(that cannot be attributed to decay of cue activity) to aversive
outcome was smaller than the number of responses to reward
outcome (Fig. 9A, gray bars; GPe: 4%; GPi: 6%; and SNr:
20%). Very few GPe and GPi cells responded to reward
omission itself (Fig. 9B, black bars; GPe: 9%, GPi: 6%);
however, in the SNr a larger fraction of cells responded (decay
excluded) to reward omission (Fig. 9B, black bar; SNr: 21%).
In all the structures the number of cells that responded (decay
excluded) to aversive omission was smaller than the fraction of
cells that responded to reward omission (Fig. 9B, white bar;
GPe: 1%; GPi: 1%; and SNr: 8%).
In summary, we found that activity in the outcome/no-outcome epoch did not only reflect the decay from sustained
cue-related activity and that BG HFD cells clearly encode
outcome and no-outcome events. Note that the fraction of cells
of which we could rule out the possibility of decay from
sustained activity is a lower limit of the actual number of
responding cells. This is because our method for testing the null
hypothesis—that activity is not due to decay—is very conservative (i.e., the discharge at the outcome or the no-outcome epoch
may fall between the ITI and the end of cue discharge level and
still reflect a valid response to the outcome or no-outcome events).
Other methods that include interpolation of the whole temporal
pattern of the response may report a larger number of responding
cells to the outcome and no-outcome events.
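One way to read the conservative criterion described above is sketched below: a response is counted as not attributable to decay only when the outcome-epoch rate lies outside the range spanned by the ITI rate and the end-of-cue rate. This is an interpretation of the parenthetical remark in the text, not the procedure from METHODS; the function name and the rates are hypothetical, and the significance test against precue activity is omitted.

```python
def excludes_decay(iti_rate, cue_end_rate, outcome_rate):
    """True if the outcome-epoch rate cannot be a point on the path from cue-end rate back to ITI rate."""
    low, high = sorted((iti_rate, cue_end_rate))
    # A rate between the two levels is compatible with slow decay, so it is not counted,
    # even though it may still be a genuine response (hence a lower limit).
    return outcome_rate < low or outcome_rate > high


# Hypothetical cell: 60 spikes/s in the ITI, 75 spikes/s at the end of the cue.
within_range = excludes_decay(60.0, 75.0, 68.0)   # compatible with decay
above_range = excludes_decay(60.0, 75.0, 85.0)    # further increase after the cue
below_range = excludes_decay(60.0, 75.0, 50.0)    # drop below the ITI level
```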
SNr neurons responded with shorter latencies than those
of GPe and GPi neurons
Figure 10 shows the analysis of the response latency to the
reward and aversive cues. The latency of SNr responses was
significantly shorter than the responses of the GPe and GPi
FIG. 7. Population response in no-outcome epoch. A: population responses in trials with no food or airpuff delivery. Mean
deviations from background calculated with the absolute operator are shown. The same no-outcome tone was given at time = 0
in all trials. B: fraction of cells with significant modulations of
firing rate. Same conventions as in Fig. 3.
[Fig. 8 plot residue. Panels: A, response index; B, probability coding index. Legend: reward, both, aversive, none. Inset text recovered: N = 95/310 (30.6%); N = 50/149 (33.6%); N = 50/145 (34.5%). X-axes: reward response index (spikes/s) in A; reward probability index (spikes/s) in B.]
(Fig. 10A; Mann–Whitney, P < 0.001). No difference between
the GPe and GPi populations was found (P = 0.93). We
grouped the responses to reward and aversive cues and the
increase and decrease responses since we did not find any
significant difference between these parameters (Fig. 10, B
and C). Although not significant, the GPi decrease response
tended to be earlier than the increase response (Turner and
Anderson 1997).
We did see similar trends in the responses in the outcome
and the no-outcome epochs. However, the persistent but
nevertheless nonsteady activity of the BG neurons during
the cue response (Fig. 3A) prevented us from establishing a
reliable baseline for testing the outcome epoch responses
and thus we carried out only the latency analysis for the cue
epoch.
DISCUSSION
In this report we extended our previous study (Joshua
et al. 2008) to the study of the responses of BG main-axis
HFD neurons to expectation, delivery, and omission of
appetitive (food), aversive (airpuff), and neutral (sound
only) events. We found that the responses of GPe, GPi, and
SNr neurons were longer in duration and less stereotypic than
the responses of the main BG neuromodulators (TANs and
DANs). As with the TANs and DANs, the responses of the BG
FIG. 8. Single-cell responses in the no-outcome epoch.
A: log-log scatterplots comparing the response index of individual neurons in trials in which food and airpuff were not
delivered. B: scatterplot comparing the probability coding index
to food and airpuff omissions. Since P = 1 trials were never
omitted, high-probability trials include only P = 2/3 trials.
Same conventions as in Fig. 4 and same time window as in Fig.
6 (0–1,000 ms from end of cue epoch and the onset of the
omission sound).
[Fig. 9 plot residue. Panels: A, Outcome; B, No Outcome. Y-axis: fraction of cells; x-axis: GPe, GPi, SNr. Legend: Reward non decay, Aversive non decay, Decay not excluded.]
FIG. 9. Fraction of cells responding at
the end stage of the trials. A: black: fraction
of cells that responded to the reward outcome itself and their outcome activity does
not reflect decay from reward cue response;
gray: fraction of cells that responded to the
aversive outcome itself; activity does not
reflect decay from aversive cue response.
White bars: total fraction of cells with a
significant difference between the precue
and the outcome epoch (reward above black;
aversive above gray) in which the possibility
of decay activity was not excluded. B: same
as A for the no-outcome epoch.
[Fig. 8 plot residue, continued. Inset text recovered: N = 60/145 (41.4%); N = 34/149 (22.8%); N = 59/310 (19.0%). Y-axes: aversive response index (spikes/s) in A; aversive probability index (spikes/s) in B.]
[Fig. 10 plot residue. Panels: A, cumulative fraction of cells vs. time (ms) for GPe, GPi, and SNr; B, latency (ms) by structure for reward and aversive cues; C, latency (ms) by structure for increases and decreases.]
Neural responses were larger for the reward than for the
aversive trials
We found preferential activation to reward versus aversive
events. One possible explanation for this asymmetric neural
activity is that the asymmetry arises from differences in the
relative value of the rewarding/aversive stimuli that we used.
An alternative possibility is that the encoding of reward/
aversion expectation is asymmetric in the BG. We find the
second possibility more likely since the population responses
to the aversive predicting cue and to the neutral cue were
remarkably similar (Fig. 3) and very few cells encoded the cue
predicting airpuff (Fig. 4). It could be argued that the monkeys
ignored the air puff; we have shown this is not the case since
there were large behavioral responses to cues predicting the
airpuff (Fig. 1). In a previous experiment (Mirenowicz and
Schultz 1996), in which the subjective values were calibrated,
the authors compared a reward of 0.15 ml of juice and an
aversive 28- to 58-psi airpuff directed to the hand. Similar
airpuff intensities have been used in other studies comparing
the responses of amygdala neurons (40- to 60-psi airpuff vs.
0.1–0.9 ml of liquid food) (Belova et al. 2007; Paton et al.
2006) and lateral prefrontal cortex neurons (29-psi airpuff, 10
cm from the monkey’s face) (Kobayashi et al. 2006) to both
rewarding and aversive events. The airpuff in the current
experiment was larger (50–70 psi) and delivered 2 cm from the
monkey’s eyes. Thus this larger and closer airpuff must have
had a negative subjective value. We further discuss the possibility of asymmetric encoding in the following text.
Preferential control over reward-related behavior
We have shown that just before the end of the cue, the
fraction of trials in which the monkey licked in expectation of
future reward and the fraction of trials in which the monkey
blinked in expectation of future airpuff were similar in magnitude. In addition we found a large blinking response even
when the airpuff was omitted (Fig. 1C). Finally, with the
exception of the outcome epoch, the licking and the blinking
behaviors reflected the expected (low vs. high) probability of
the reward and the aversive events. Nevertheless, the BG
single-cell activity was found to be biased toward the encoding
of reward-related events and encoding of aversive events was
very weak. This difference in activity may be compensated by
the difference in synaptic connections between the BG and
their targets; however, such differences have yet to be described. Several studies have used similar paradigms to compare neural responses to reward food and aversive airpuff
(Kobayashi et al. 2006; Mirenowicz and Schultz 1996; Paton
et al. 2006). Paton et al. (2006) showed that in the amygdala,
expectations of food and airpuff are represented symmetrically.
Our research shows that, in contrast to the amygdala, food and
airpuff expectations are represented asymmetrically in the
basal ganglia. Thus we found comparable aversive- and reward-related behaviors; however, whereas the activity in the
basal ganglia strongly reflects reward behavior and encodes
probability, aversive-related events and their probability are
only weakly encoded in basal ganglia activity.
Although we found similarity in the behavioral responses
(Fig. 1, B and C), in this study we did not calibrate the
subjective value (utility) of food versus airpuff; however, we
did manipulate the expectation of aversive outcome. In previous instrumental conditioning experiments, including both reward and aversive events, the monkey could avoid the aversive
airpuff by a correct response (Mirenowicz and Schultz 1996;
Yamada et al. 2004, 2007). In the current experiment the
airpuff was unavoidable and thus the aversive cue led to direct
expectation of aversion.
In a previous study (Joshua et al. 2008), we reported that the
responses of midbrain DANs and striatal TANs (of the same
monkeys engaged in the same behavioral task) are biased
toward the encoding of rewarding events. The BG main axis is
affected by additional neuromodulator systems, e.g., serotonin
(Lavoie and Parent 1990). Theoretical studies have suggested
that the phasic serotonin signal might report the prediction
error for future punishment (Daw et al. 2002; Dayan and Huys
2008) and therefore could compensate for the biased encoding
of the value domain by the TANs and the DANs. The current
study of the BG output structures indicates that the BG main-axis neurons have a bias toward control of reward-related
behavior similar to that of TANs and DANs. Thus even if there
are BG modulators other than the cholinergic and dopaminer-
FIG. 10. Response latency to cue. A: cumulative response latency distribution. Fraction of cells revealing significant response vs.
time after cue presentation. The faster response
of the SNr is represented by the faster increase
of the cumulative sum and by the early crossing of the median (0.5) horizontal line. Timescale (abscissa) is shown in log scale and starts
at 50 ms after cue onset. Gray coding: solid
light gray, GPe; dashed gray, GPi; solid black,
SNr. B: bar plot of the median and 95% confidence interval of the response latency to reward (white) and aversive cues (gray). Confidence intervals were calculated using bootstrap
methods; since distributions are not symmetric
upper and lower limits may be uneven. C: bar
plot of the median and 95% confidence interval
of the response latency of responses with increases (white) and decreases (gray) in firing rate.
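The confidence intervals in B and C can be illustrated with a percentile bootstrap of the median. This is a minimal sketch: the caption states only that bootstrap methods were used, so the percentile variant, the number of resamples, the fixed seed, and the latencies below are all assumptions.

```python
import random
from statistics import median


def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Sample median with a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and collect the median of each resample.
    boot_medians = sorted(median(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = boot_medians[int((alpha / 2) * n_boot)]
    upper = boot_medians[int((1 - alpha / 2) * n_boot) - 1]
    return median(values), lower, upper


# Hypothetical, right-skewed response latencies (ms) for one population.
latencies_ms = [90, 110, 120, 130, 150, 180, 220, 300, 450, 700]
med, ci_low, ci_high = bootstrap_median_ci(latencies_ms)
```

Because the resampled medians inherit the skew of the latency distribution, the lower and upper limits need not be equidistant from the median, as the caption notes.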
main-axis neurons were larger and usually encoded reward
better than aversive-related events. We found substantial differences between the three populations of BG main-axis neurons. Most notably, SNr responses were more frequent, had
shorter latencies, and encoded the airpuff delivery better than
the corresponding responses of GPe and GPi neurons.
gic striatal inputs, the activity of BG output neurons follows
the same trend as that of the TANs and DANs and is biased
toward rewarding events. We therefore suggest that the other
modulators do not extend the basal ganglia encoding to aversive events and that there are neuronal systems other than the
BG that have control over aversive-related behavior.
BG main-axis responses were long-lasting and diverse
Different response characteristics of the main-axis nuclei
In this study we found several major differences between
the GPe, GPi, and the SNr. We found more intense changes
in the responses of the SNr compared with the responses of
the GPe and the GPi. SNr neurons responded with shorter
latencies to the cue (Fig. 10A) and encode the airpuff outcome
better than the pallidal neurons (Figs. 5 and 6). A simple
explanation for the enhanced encoding is the orofacial (licking
and blinking) motor behavior of the monkeys in this experiment. Initial studies emphasized the role of the SNr in the
control of orofacial movements (DeLong et al. 1983; Hikosaka
Concluding remarks
In this study we extend our previous work on BG neuromodulators (Joshua et al. 2008). We found a similar bias of
GPe, GPi, SNr, TANs, and DANs for the encoding of
expectation of rewarding versus aversive events. Thus the
BG main axis may mainly reflect the teaching message (and
corticostriatal plasticity control) of the TANs and DANs and
may not be significantly affected by additional modulators
with broader or different messages. Our results show a
complex and different encoding by GPe, GPi, and SNr
neurons. Moreover, they indicate a different encoding by
GPi and SNr neurons and therefore suggest that there are
many functional differences between these two BG output
nuclei, despite their similar biochemical and physiological
characteristics. Future models and studies of the computational physiology of the basal ganglia and their disorders
should therefore attempt to disentangle the different functions of GPi and SNr.
ACKNOWLEDGMENTS
We thank Dr. Bryon Gomberg for MRI; M. Levi and M. Rivlin for help in
preparing the experimental setup; Y. Renernt and I. Finkes for monkey training
and general assistance; and G. Schoenbaum, Y. Shaham, and G. Morris for
critical reading of earlier versions of this manuscript.
In contrast to the short (<0.7 s) responses of the BG
modulators (Apicella 2007; Joshua et al. 2008; Morris et al.
2004; Schultz 1998), the responses of the BG main-axis HFD
neurons lasted throughout the 2-s cue epoch. This is in line
with previous descriptions of pallidal (Arkadir et al. 2004) and
SNr (Wichmann and Kliem 2004) responses. Long-duration,
set-related responses have frequently been described in the
cortex (Fuster 1999; Miyashita 1988; Wise and Kurata 1989),
where they have been attributed to short-term memory or
action-preparation processes. We cannot rule out similar processes in the basal ganglia and the experimental design does
not allow us to dissociate set-related versus cue-evoked responses. However, the encoding of probability by the BG
main-axis neurons (Figs. 3 and 4) and the dissociation between
actions and neural response (for example, no neural encoding
of the probability of aversive trials, the early decay of the
neural activity compared with licking behavior after reward
delivery) suggests that the activity of these neurons may
encode the value of the current state or state–action pairs (Lau
and Glimcher 2007; Samejima et al. 2005).
The tonic discharge rate of the HFD neurons (population
average: 45.1–88.1 spikes/s in this study) endows them with a
better dynamic range for responses with a decrease in discharge rate. Nevertheless, consistent with many previous studies (Georgopoulos et al. 1983; Mink and Thach 1991a; Mitchell et al. 1987a; Turner and Anderson 1997) we found that the
BG HFD neurons respond to behavioral events more frequently
with increases than with decreases in discharge rate. The
latencies and the temporal distribution of the responses with
increases and decreases in discharge rate were similar (Figs.
3B, 5B, 7B, and 10C), thus leading to highly diverse BG
encoding, with different polarities and different amplitudes of
the responses. The differences between the population responses with no encoding of the a priori probability of outcome
(Fig. 5) versus the single-unit encoding of this probability (Fig.
6) are in line with such a balanced diversity of the responses of
BG single units. These diverse responses augment the information capacity of the BG output structure (Bar-Gad et al.
2003).
and Wurtz 1983). Although this separation is not clear-cut
(DeLong et al. 1985; Wichmann and Kliem 2004) our results
may reflect this organization. Thus the small and less-frequent
responses in the GPi could reflect the smaller representation of
orofacial movements in the GPi. This could be also the reason
for the activation of the SNr to aversive events, but as noted
earlier this does not explain the asymmetric value representation in the SNr.
At the circuitry level, one possibility is that the origins of the
difference in pallidal versus SNr responses could be a result of
different projections from the striatum or the subthalamic
nucleus (Haber and Gdowski 2004). Another possibility is that
the GPe has different pathways to the GPi and SNr and those
GPe neurons that do project to the SNr are the neurons with the
short latency and larger response. Nevertheless, we did not find
any topographic organization in the responses of the GPe that
supports this hypothesis (data not shown). Finally, another
putative explanation for the differences between the GPi and
the SNr is the direct effects of somatodendritic release of
dopamine on SNr, but not on pallidal, neurons. The similar
latencies of SNc and SNr responses support the hypothesis that
SNc neurons may drive SNr responses by somatodendritic
release of dopamine (Cragg et al. 2001; Windels and Kiyatkin
2006).
Finally, the neural recordings were made after the monkey
was highly familiar with the task and thus activity might not be
the same as activity that occurs during learning. Previous
studies of dopaminergic neurons have shown that activity in a
familiar probabilistic task does resemble the activity in a
learning task (Fiorillo et al. 2003; Hollerman and Schultz 1998;
Morris et al. 2004). A functional MRI study has shown that
striatal activity underlies novelty-based choice in humans
(Wittmann et al. 2008). Whether this is the case for other
basal ganglia populations and the single-cell activity that
underlies novelty representation should be investigated by
future studies.
GRANTS
This study was partly supported by a “Fighting against Parkinson” Grant
from the Hebrew University Netherlands Association and a Max Vorst Family
Foundation grant.
REFERENCES
Hikosaka O, Wurtz RH. Visual and oculomotor functions of monkey substantia nigra pars reticulata. II. Visual responses related to fixation of gaze.
J Neurophysiol 49: 1254–1267, 1983.
Hollerman JR, Schultz W. Dopamine neurons report an error in the temporal
prediction of reward during learning. Nat Neurosci 1: 304–309, 1998.
Joshua M, Adler A, Mitelman R, Vaadia E, Bergman H. Midbrain dopaminergic neurons and striatal cholinergic interneurons encode the difference
between reward and aversive events at different epochs of probabilistic
classical conditioning trials. J Neurosci 28: 11673–11684, 2008.
Joshua M, Elias S, Levine O, Bergman H. Quantifying the isolation quality
of extracellularly recorded action potentials. J Neurosci Methods 163:
267–282, 2007.
Kobayashi S, Nomoto K, Watanabe M, Hikosaka O, Schultz W, Sakagami
M. Influences of rewarding and aversive outcomes on activity in macaque
lateral prefrontal cortex. Neuron 51: 861– 870, 2006.
Lau B, Glimcher PW. Action and outcome encoding in the primate caudate
nucleus. J Neurosci 27: 14502–14514, 2007.
Lauwereyns J, Watanabe K, Coe B, Hikosaka O. A neural correlate of
response bias in monkey caudate nucleus. Nature 418: 413– 417, 2002.
Lavoie B, Parent A. Immunohistochemical study of the serotoninergic innervation of the basal ganglia in the squirrel monkey. J Comp Neurol 299:
1–16, 1990.
Martin RF, Bowden DM. Primate Brain Maps: Structure of the Macaque
Brain. Amsterdam: Elsevier Science, 2000.
Mink JW, Thach WT. Basal ganglia motor control. I. Nonexclusive relation
of pallidal discharge to five movement modes. J Neurophysiol 65: 273–300,
1991a.
Mink JW, Thach WT. Basal ganglia motor control. II. Late pallidal timing
relative to movement onset and inconsistent pallidal coding of movement
parameters. J Neurophysiol 65: 301–329, 1991b.
Mirenowicz J, Schultz W. Preferential activation of midbrain dopamine
neurons by appetitive rather than aversive stimuli. Nature 379: 449 – 451,
1996.
Mitchell SJ, Richardson RT, Baker FH, DeLong MR. The primate globus
pallidus: neuronal activity related to direction of movement. Exp Brain Res
68: 491–505, 1987a.
Mitchell SJ, Richardson RT, Baker FH, DeLong MR. The primate nucleus
basalis of Meynert: neuronal activity related to a visuomotor tracking task.
Exp Brain Res 68: 506 –515, 1987b.
Miyashita Y. Neuronal correlate of visual associative long-term memory in
the primate temporal cortex. Nature 335: 817– 820, 1988.
Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H. Coincident but
distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43: 133–143, 2004.
Nakahara H, Itoh H, Kawagoe R, Takikawa Y, Hikosaka O. Dopamine
neurons can represent context-dependent prediction error. Neuron 41: 269 –
280, 2004.
Nevet A, Morris G, Saban G, Arkadir D, Bergman H. Lack of spike-count
and spike-time correlations in the substantia nigra reticulata despite overlap
of neural responses. J Neurophysiol 98: 2232–2243, 2007.
Parent A, Cote PY, Lavoie B. Chemical anatomy of primate basal ganglia.
Prog Neurobiol 46: 131–197, 1995.
Pasquereau B, Nadjar A, Arkadir D, Bezard E, Goillandeau M, Bioulac B,
Gross CE, Boraud T. Shaping of motor responses by incentive values
through the basal ganglia. J Neurosci 27: 1176 –1183, 2007.
Paton JJ, Belova MA, Morrison SE, Salzman CD. The primate amygdala
represents the positive and negative value of visual stimuli during learning.
Nature 439: 865– 870, 2006.
Ravel S, Sardo P, Legallet E, Apicella P. Reward unpredictability inside
and outside of a task context as a determinant of the responses of
tonically active neurons in the monkey striatum. J Neurosci 21: 5730 –
5739, 2001.
Reynolds JN, Hyland BI, Wickens JR. A cellular mechanism of reward-related learning. Nature 413: 67–70, 2001.
Ritov Y, Raz A, Bergman H. Detection of onset of neuronal activity by
allowing for heterogeneity in the change points. J Neurosci Methods 122:
25– 42, 2002.
Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific
reward values in the striatum. Science 310: 1337–1340, 2005.
Sato M, Hikosaka O. Role of primate substantia nigra pars reticulata in
reward-oriented saccadic eye movement. J Neurosci 22: 2363–2373,
2002.
101 • FEBRUARY 2009 •
39
www.jn.org
Downloaded from jn.physiology.org on March 1, 2009
Apicella P. Leading tonically active neurons of the striatum from reward
detection to context recognition. Trends Neurosci 30: 299 –306, 2007.
Apicella P, Scarnati E, Ljungberg T, Schultz W. Neuronal activity in
monkey striatum related to the expectation of predictable environmental
events. J Neurophysiol 68: 945–960, 1992.
Arkadir D, Morris G, Vaadia E, Bergman H. Independent coding of
movement direction and reward prediction by single pallidal neurons.
J Neurosci 24: 10047–10056, 2004.
Bar-Gad I, Bergman H. Stepping out of the box: information processing in
the neural networks of the basal ganglia. Curr Opin Neurobiol 11: 689 – 695,
2001.
Bar-Gad I, Morris G, Bergman H. Information processing, dimensionality
reduction and reinforcement learning in the basal ganglia. Prog Neurobiol
71: 439 – 473, 2003.
Belova MA, Paton JJ, Morrison SE, Salzman CD. Expectation modulates
neural responses to pleasant and aversive stimuli in primate amygdala.
Neuron 55: 970 –984, 2007.
Berenyi A, Benedek G, Nagy A. Double sliding-window technique: a new
method to calculate the neuronal response onset latency. Brain Res 1178:
141–148, 2007.
Bezard E, Boraud T, Chalon S, Brotchie JM, Guilloteau D, Gross CE.
Pallidal border cells: an anatomical and electrophysiological study in the
1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine-treated monkey. Neuroscience 103: 117–123, 2001.
Calabresi P, Centonze D, Gubellini P, Pisani A, Bernardi G. Acetylcholine-mediated modulation of striatal function. Trends Neurosci 23: 120 –126,
2000.
Cragg SJ, Nicholson C, Kume-Kick J, Tao L, Rice ME. Dopamine-mediated volume transmission in midbrain is regulated by distinct extracellular geometry and uptake. J Neurophysiol 85: 1761–1771, 2001.
Daw ND, Kakade S, Dayan P. Opponent interactions between serotonin and
dopamine. Neural Networks 15: 603– 616, 2002.
Dayan P, Huys QJ. Serotonin, inhibition, and negative mood. PLoS Comput
Biol 4: e4, 2008.
DeLong MR. Activity of pallidal neurons during movement. J Neurophysiol
34: 414 – 427, 1971.
DeLong MR, Crutcher MD, Georgopoulos AP. Relations between movement and single cell discharge in the substantia nigra of the behaving
monkey. J Neurosci 3: 1599 –1606, 1983.
DeLong MR, Crutcher MD, Georgopoulos AP. Primate globus pallidus and
subthalamic nucleus: functional organization. J Neurophysiol 53: 530 –543,
1985.
Elias S, Joshua M, Goldberg JA, Heimer G, Arkadir D, Morris G,
Bergman H. Statistical properties of pauses of the high-frequency discharge
neurons in the external segment of the globus pallidus. J Neurosci 27:
2525–2538, 2007.
Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability
and uncertainty by dopamine neurons. Science 299: 1898 –1902, 2003.
Fuster JM. The Prefrontal Cortex: Anatomy, Physiology, and Neuropsychology of the Frontal Lobes (3rd ed.). Philadelphia, PA: Lippincott–Raven,
1999.
Gdowski MJ, Miller LE, Parrish T, Nenonene EK, Houk JC. Context
dependency in the globus pallidus internal segment during targeted arm
movements. J Neurophysiol 85: 998 –1004, 2001.
Georgopoulos AP, DeLong MR, Crutcher MD. Relations between parameters of step-tracking movements and single cell discharge in the globus
pallidus and subthalamic nucleus of the behaving monkey. J Neurosci 3:
1586 –1598, 1983.
Gurney K, Prescott TJ, Wickens JR, Redgrave P. Computational models of
the basal ganglia: from robots to membranes. Trends Neurosci 27: 453– 459,
2004.
Haber SN, Gdowski MJ. The basal ganglia. In The Human Nervous System,
edited by Paxinos G, Mai JK. Amsterdam: Elsevier, 2004, p. 676 –738.
Handel A, Glimcher PW. Contextual modulation of substantia nigra pars
reticulata neurons. J Neurophysiol 83: 3042–3048, 2000.
JOSHUA, ADLER, ROSIN, VAADIA, AND BERGMAN
Schultz W. Activity of pars reticulata neurons of monkey substantia nigra in
relation to motor, sensory, and complex events. J Neurophysiol 55: 660 –
677, 1986.
Schultz W. Predictive reward signal of dopamine neurons. J Neurophysiol 80:
1–27, 1998.
Schultz W, Dayan P, Montague PR. A neural substrate of prediction and
reward. Science 275: 1593–1599, 1997.
Turner RS, Anderson ME. Pallidal discharge related to the kinematics of
reaching movements in two dimensions. J Neurophysiol 77: 1051–1074, 1997.
Turner RS, Anderson ME. Context-dependent modulation of movement-related
discharge in the primate globus pallidus. J Neurosci 25: 2965–2976, 2005.
Wichmann T, Kliem MA. Neuronal activity in the primate substantia nigra
pars reticulata during the performance of simple and memory-guided elbow
movements. J Neurophysiol 91: 815– 827, 2004.
Windels F, Kiyatkin EA. Dopamine action in the substantia nigra pars
reticulata: iontophoretic studies in awake, unrestrained rats. Eur J Neurosci
24: 1385–1394, 2006.
Wise SP, Kurata K. Set-related activity in the premotor cortex of rhesus
monkeys: effect of triggering cues and relatively long delay intervals.
Somatosens Mot Res 6: 455– 476, 1989.
Wittmann BC, Daw ND, Seymour B, Dolan RJ. Striatal activity underlies
novelty-based choice in humans. Neuron 58: 967–973, 2008.
Yamada H, Matsumoto N, Kimura M. Tonically active neurons in the
primate caudate nucleus and putamen differentially encode instructed motivational outcomes of action. J Neurosci 24: 3500 –3510, 2004.
Yamada H, Matsumoto N, Kimura M. History- and current instruction-based coding of forthcoming behavioral outcomes in the striatum. J Neurophysiol 98: 3557–3567, 2007.
Results III
Asymmetric Encoding of Positive and Negative Expectations by Low-Frequency
Discharge Basal Ganglia Neurons
Mati Joshua1, 2, Avital Adler1, 2 and Hagai Bergman1, 2, 3
1 Department of Physiology, The Hebrew University-Hadassah Medical School, Jerusalem, 91120, Israel
2 The Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem, 91904, Israel
3 Eric Roland Center for Neurodegenerative Diseases, The Hebrew University, Jerusalem, 91904, Israel
Abstract.
Experimental and theoretical studies depict the basal ganglia as a reinforcement
learning system where the dopaminergic neurons provide the reinforcement error
signal by modulation of their firing rate. However, the low tonic discharge rate
of the dopaminergic neurons suggests that their capability to encode negative
events by suppressing firing rate is limited. We recorded the activity of single
neurons in the basal ganglia of two monkeys during the performance of a
probabilistic conditioning task with food, neutral and air-puff outcomes. In a
related paper we analyzed the activity of five basal ganglia populations; here we
extend this to the low frequency discharge neurons of the main axis of the basal
ganglia, i.e., the striatal phasically active neurons (PANs) and the low frequency
discharge (LFD) neurons in the external segment of the globus pallidus (GPe).
The licking and blinking behavior during the cue presentation epoch reveals that
the monkeys expected the different probabilistic appetitive, neutral and aversive
outcomes. Nevertheless, the activity of single striatal and GPe neurons is more
strongly modulated by expectation of reward than by expectation of the aversive
event. The neural-behavioral asymmetry suggests that expectations of aversive
events and of rewards are differentially represented at many levels of the basal
ganglia.
1 Introduction
Experimental and theoretical studies depict the basal ganglia as a reinforcement
learning system where the dopaminergic neurons provide the reinforcement error
signal by modulation of their firing rate. Previous studies in primates have shown that
basal ganglia activity is modulated by expectation of rewards. Most of these studies
have focused on midbrain dopaminergic neurons and striatal cholinergic interneurons
(tonically active neurons, TANs; Wilson et al. 1990). Midbrain dopaminergic neurons
have been shown to encode the mismatch in the positive domain of reinforcement;
i.e., they respond when conditions are better than expected (Fiorillo et al. 2003; Satoh
et al. 2003; Nakahara et al. 2004; Bayer and Glimcher 2005). TANs also modulate
their activity when a reward is given (Kimura et al. 1984) or expected (Graybiel et al.
1994). However, some reports have indicated that TAN modulation is invariant to
reward predictability (Shimo and Hikosaka 2001; Morris et al. 2004). Finally,
although to a lesser extent, several studies have demonstrated that the main axis of the
basal ganglia is modulated by expectation of reward (Sato and Hikosaka 2002;
Arkadir et al. 2004; Samejima et al. 2005; Darbaky et al. 2005; Pasquereau et al.
2007; Lau and Glimcher 2007).
In contrast to the extensive research on reward related activity, only a few
physiological studies have explored whether neural activity of the basal ganglia
encodes the negative domain (i.e., aversive outcome or omission of rewards, which as
outlined below might not be identical events). Dopamine neurons decrease their firing
rate in response to reward omission (Schultz 1998; Satoh et al. 2003; Matsumoto and
Hikosaka 2007). However, this suppression is limited since the firing rate is truncated
at zero. In fact, other groups (Morris et al. 2004; Bayer and Glimcher 2005) have
reported that the firing of dopaminergic neurons does not demonstrate incremental
encoding of reward omission, and alternative encoding schemes have been proposed
(Tobler et al. 2005; Bayer et al. 2007). There are even fewer studies on the responses
of the basal ganglia neurons to aversive events. Although arising from slightly
different experimental paradigms (instrumental vs. classical conditioning, behaving
vs. anesthetized animals) and slightly different recording locations (ventral tegmental
area vs. the more lateral substantia nigra pars compacta) the findings are
incompatible. Some studies suggest that dopamine neurons increase their firing rate
following aversive events (Guarraci and Kapp 1999; Coizet et al. 2006) whereas
others have evidence of a decrease (Mirenowicz and Schultz 1996; Ungless et al.
2004). There are reports that TAN activity differentiates motivationally opposing
stimuli (Ravel et al. 2003; Yamada et al. 2004), but it remains unclear whether and
how TANs respond to expectation of aversion.
We designed a classical conditioning paradigm with aversive and rewarding
probabilistic outcomes. Thus, symmetric manipulations of expectations of food
(appetitive event) or airpuff (aversive event) were built into the experimental design,
allowing for comparison of neural responses to expectation of positive and negative
outcomes. In parallel studies (Joshua et al. 2008a; Joshua et al. 2008b) we describe
the activity of basal ganglia critics (midbrain dopaminergic neurons and striatal
cholinergic interneurons) and high frequency (> 50 spikes/s) discharge neurons in the
main axis (external and internal segments of the globus pallidus and the substantia
nigra pars reticulata). In this manuscript, we analyzed the activity of two basal ganglia
populations with low frequency (< 20 spikes/s) discharge, the striatal phasically active
neurons (PANs), and the low frequency discharge (LFD) neurons in the external
segment of the globus pallidus (GPe).
2 Methods
All experimental protocols were performed in accordance with the National Institute
of Health Guide for the Care and Use of Laboratory Animals and with Hebrew
University guidelines for the use and care of laboratory animals in research,
supervised by the institutional animal care and use committee. Two monkeys (L and
S, Macaca fascicularis, female 4 kg and male 5 kg) were engaged in a probabilistic
delay classical-conditioning task. At the beginning of each trial, a visual cue covering
the full extent of a 17" computer screen was presented for 2 seconds. After the cue
offset the monkeys received the outcome in a probabilistic manner. Images were
fractal patterns constructed with the Chaos Pro 3.2 program (www.chaospro.de), with
the same images presented during all training and recording periods. We delivered
liquid food (L: 0.4 ml, 100 ms duration, S: 0.6 ml 150 ms) as the positive reward and
an airpuff (L: 100 ms duration, S: 150 ms; 50-70 psi; 2 cm from eye), split and
directed to both eyes, as an aversive stimulus. To enhance the monkeys' ability to
discriminate trial outcomes, the beginning of the result epoch was signaled by one of
three sounds that discriminated the delivery of food, the delivery of the airpuff and no
outcome: i.e., each possible end result was accompanied by a different sound. Sounds
were normalized to the same intensity and duration. These sounds were additional to
the background device sounds (airpuff solenoid and food pump). Sounds and visual
cues were shuffled between monkeys. All trials were followed by a variable inter trial
interval (ITI) (Monkey S: 3-7 seconds, Monkey L: 4-8 seconds). Due to the different
probabilities, and in order to equalize the average occurrence of each outcome, we
presented the non-deterministic cues (P ≠ 1 for the reward or aversive outcome) three
times more often than the deterministic ones. With this occurrence ratio, all trials
were randomly interleaved.
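The cue schedule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the task-control code used in the experiments: the cue labels, the function name build_block and the 3:1 ratio parameter are hypothetical, and the scheduling of the neutral cue (treated here like a deterministic cue) is an assumption, since the text does not specify it.

```python
import random

# Cue table: (label, outcome type, outcome probability).
# Labels are illustrative (R = reward, A = aversive, N = neutral).
CUES = [
    ("R1/3", "food", 1 / 3), ("R2/3", "food", 2 / 3), ("R1", "food", 1.0),
    ("A1/3", "airpuff", 1 / 3), ("A2/3", "airpuff", 2 / 3), ("A1", "airpuff", 1.0),
    ("N", None, 0.0),
]

def build_block(rng, ratio=3):
    """Return one shuffled block of (cue label, delivered outcome) trials."""
    trials = []
    for label, kind, p in CUES:
        # Non-deterministic cues (0 < p < 1) appear `ratio` times as often.
        # ASSUMPTION: the neutral cue is scheduled like a deterministic cue.
        repeats = ratio if 0.0 < p < 1.0 else 1
        for _ in range(repeats):
            delivered = kind is not None and rng.random() < p
            trials.append((label, kind if delivered else "none"))
    rng.shuffle(trials)  # random interleaving of all trial types
    return trials

block = build_block(random.Random(0))
```

Under these assumptions one block holds 15 trials (four non-deterministic cues presented three times each, plus the two deterministic cues and the neutral cue once each).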
3 Results
3.1 Monkey behavior reflects expectation of rewarding and aversive events
We recorded neuronal activity in the basal ganglia (Fig. 1a, b) during performance of
a probabilistic classical conditioning task (Fig. 1c) with food or airpuff as the
rewarding and aversive outcomes, respectively. The two monkeys were introduced to
seven different fractal visual cues, each predicting the outcome in a probabilistic
manner. Three cues predicted a food outcome (reward cues) with a delivery
probability of 1/3, 2/3 and 1; three cues predicted an airpuff outcome (aversive cues)
with a delivery probability of 1/3, 2/3 and 1. The 7th cue (the neutral cue) was never
followed by a food or airpuff outcome. The same seven fractal cues were presented to
both monkeys; however the associated outcomes were randomized for each monkey.
Cues were presented for two seconds and were immediately followed by a result
epoch which could be an outcome (food, airpuff) or no-outcome, according to the
probabilities associated with the cue. The beginning of the result epoch was signaled
by one of three sounds that discriminated the three possible events: a drop of food, an
airpuff, or no outcome (Fig. 1c).
We tested how conditioning affected the monkeys' behavior by monitoring
licking and blinking during neural recordings. The monkeys increased their licking in
response to cues predicting food but only slightly to the aversive and neutral cues.
Similarly the monkeys increased their frequency of blinking to cues predicting an
airpuff but only slightly to reward and neutral cues (Fig. 1d, left). Moreover, the
increase of blinking and licking during the cue epoch was maximal in trials where the
probability of the outcome was 2/3 or 1 and smaller in trials where the probability was
1/3. When no food or airpuff was delivered (no outcome epoch - the p=1/3 or p=2/3
trials) licking and blinking increased, respectively. Furthermore, the increase was in
accordance with the previously instructed probability (Fig. 1d, right). These
behavioral results indicate that the monkeys could distinguish between aversive,
reward and neutral cues and between the different outcome probabilities they were
intended to signal, and that the symmetry in task design was reflected by the monkeys'
behavior.
Figure 1 – MRI, Electrophysiology and Behavior
a) MRI identification of recording coordinates. Coronal MRI image at anterior commissure
level. Tungsten microelectrodes are inserted at known chamber coordinates enabling
identification of the brain structures by alignment of the MRI images with the monkey atlas.
Abbreviations C, caudate; Chm, recording chamber (filled with 3% agar); Elc, electrode; G,
globus pallidus; P, putamen. b) Example from the recordings of a low frequency discharge
neuron in the GPe (top) and a phasically active neuron in the striatum (bottom). c)
Behavioral task. Top - reward trials; Middle - neutral trials; Bottom – aversive trials. d)
Normalized behavioral response. Licking (Black) and blinking (Gray) response (average ±
SEM) in a time window around the behavioral event (cue: 500-0 ms before cue ending;
outcome and no-outcome: 0-500 ms after cue ending for blinking response and 500-1000 ms
for licking response). The responses are normalized in each epoch by the minimal and
maximal values (normalized responses = (Response-min)/(max-min)). Abscissa: different
behavioral conditions (A-Aversive, N-Neutral, R-Reward; the number is the outcome
probability).
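The normalization stated in the legend, (Response-min)/(max-min), can be written as a short function. This is a minimal sketch; the function name and the handling of a flat response (all values equal) are assumptions, and the input numbers are illustrative rather than measured data.

```python
def normalize_responses(responses):
    """Min-max normalize the responses of one epoch to the range [0, 1]."""
    lo, hi = min(responses), max(responses)
    if hi == lo:
        # ASSUMPTION: a flat response profile maps to zeros (avoids 0/0).
        return [0.0 for _ in responses]
    return [(r - lo) / (hi - lo) for r in responses]

# Illustrative values only (one value per behavioral condition).
example = normalize_responses([2.0, 4.0, 6.0, 3.0])
```

After normalization the smallest response in the epoch is 0 and the largest is 1, so licking and blinking can be plotted on a common scale.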
3.2 PANs and GPe LFD activity is asymmetrically modulated by expectation of
aversive and reward outcomes
We recorded single unit activity from putamen PANs and from GPe LFD neurons
(Fig. 1b). Cells were included in the study if they passed criteria of waveform
isolation (isolation score > 0.5; Joshua et al. 2007) and discharge rate stability. Of the
cells that met these criteria we further analyzed only those that were recorded during
at least 20 minutes of task performance. Finally, we grouped the neurons in the same
structure recorded from both monkeys. A total of 113 neurons were recorded of which
65 neurons (38 PANs and 27 GPe LFD) passed the above criteria.
Figure 2a is an example of a PAN recorded during the performance of the
behavioral task. The neural activity following the reward cue was larger than the
activity following the neutral and aversive cues. Furthermore, the response to reward
cues increased with reward probability. Population analysis shows that both GPe LFD
neurons and PANs had larger responses to reward cues than to aversive cues and a
larger fraction of cells responded to the reward cues (Fig. 2b-c). The population PSTH
(Fig 2b, c – left columns) is an average estimate and may be biased by a few neurons
with an extreme response. However, analysis of the fraction of cells with a significant
response (Fig. 2b, c – right columns) is not sensitive to the relative amplitude of the
responses. We therefore formulated the response-index as a measure of the relative
differences between the neutral vs. the aversive and the reward responses of single
neurons. We found that for the majority of cells the response-index for the reward
trials was larger than the response-index for aversive trials (Fig. 3a-b). In addition, a
substantial fraction of the low frequency discharge neurons in the basal ganglia
showed a significant response-index to reward cues, whereas only a small number of
cells had a significant response-index to aversive cues (Fig. 3c).
Figure 2 – Population responses of striatal PANs and GPe LFD neurons to reward cues
are larger and more common than responses to aversive cues
a) Example of rasters and peri-stimulus time histograms (PSTHs) of a PAN aligned to
behavioral events. The rows are separated according to the expected outcome. First row: trials
with cues that predict the delivery of food. Second row: trials with the neutral cue (a cue
always followed by no outcome). Third row: trials with cues that predict an airpuff. Columns
are aligned according to the trial epoch. First column: cue presentation epoch (-0.5s to 2s after
cue onset). Second column: outcome epoch (-0.5s to 2s after delivery of food or airpuff).
Third column: trials in which no outcome was delivered; outcome omission was signaled to
the monkey by the no-outcome sound (-0.5s to 2s after sound onset). The first 0.5s of the
second and third column overlap the last 0.5s of the first column. Gray level codes are marked
at the right side of the no outcome rasters (A-Aversive, N-Neutral, R-Reward; the number is
the outcome probability). For the graphic presentation, rasters were randomly pruned and
adjusted to contain the same number of trials. PSTHs were constructed by summing activity
across trials in 1ms resolution and then smoothing with a Gaussian window (SD of 20ms).
b) Left column: Population responses of PANs (n=38) to behavioral cues. PSTHs were
smoothed with a Gaussian (SD = 20 ms) and averaged across cells. Black – average responses to
reward predicting cues, Gray – neutral cue, Light gray – aversive cues. Right column:
Fraction of cells with significant (2-sigma rule) modulations of firing rate in the cue epoch.
Same color coding as in left column. Neutral events are not included to enable inclusion of all
rewarding/aversive events in the statistical tests. The ordinate is the fraction of cells that had a
significant response at each time bin (1ms). c) Left column: Population responses of GPe
LFD neurons (n= 27) to behavioral cues. Smoothing with a SD =40 ms. Right column:
Fraction of cells with significant firing rate modulations in the cue epoch. Same analysis and
gray level code as in b.
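The PSTH construction described in the legend (binning at 1 ms resolution, then Gaussian smoothing) can be sketched as follows. This is an illustration under assumptions, not the analysis code behind the figure: the conversion to spikes/s by averaging over trials, the truncation of the kernel at ±4 SD, the zero padding at the window edges, and the function names are all choices made here. The 2-sigma test in particular is hypothetical: the legend does not say which baseline the rule is applied to, so the sketch uses the first bins of the window.

```python
import math

def gaussian_kernel(sd_ms):
    """Unit-area Gaussian sampled at 1 ms steps, truncated at +/- 4 SD."""
    half = int(4 * sd_ms)
    weights = [math.exp(-0.5 * (t / sd_ms) ** 2) for t in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def psth(trials, t_start_ms, t_stop_ms, sd_ms=20):
    """Smoothed firing rate (spikes/s) in 1 ms bins.

    trials: list of lists of spike times (ms) relative to the aligning event.
    """
    n_bins = t_stop_ms - t_start_ms
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            idx = int(math.floor(t - t_start_ms))
            if 0 <= idx < n_bins:
                counts[idx] += 1
    # Spikes per 1 ms bin, averaged over trials, expressed in spikes/s.
    rate = [1000.0 * c / len(trials) for c in counts]
    kernel = gaussian_kernel(sd_ms)
    half = len(kernel) // 2
    smoothed = []
    for i in range(n_bins):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n_bins:  # zero padding outside the window
                acc += w * rate[j]
        smoothed.append(acc)
    return smoothed

def significant_bins(rate, baseline_bins, n_sigma=2.0):
    """Flag bins deviating more than n_sigma SDs from the baseline mean.

    ASSUMPTION: the baseline is the first `baseline_bins` bins of `rate`.
    """
    base = rate[:baseline_bins]
    mean = sum(base) / len(base)
    sd = math.sqrt(sum((x - mean) ** 2 for x in base) / len(base))
    return [abs(x - mean) > n_sigma * sd for x in rate]

# Toy input: two trials, each with one spike 100 ms after the event.
toy = psth([[100.0], [100.0]], t_start_ms=-500, t_stop_ms=2000, sd_ms=20)
flags = significant_bins(toy, baseline_bins=500)
```

The fraction of cells with a significant response at each time bin would then be the mean of these per-cell flags across the population.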
Figure 3 – Response-index analysis reveals larger responses to reward than to aversive
cues
a) Scatter plots comparing the response-index of individual PAN to reward and aversive cues.
Response-index was calculated for each cell as the absolute difference between the (aversive
or reward) cue-aligned PSTH and the PSTH of the neutral cue. The black line is the identity
(Y=X) line. Points below this line represent cells with a response-index that is larger for the
reward cues than for aversive cues. The differences in the scale between nuclei reflect
differences in modulation size. Significance level was p<0.05. The time window used for this
analysis was 0-2000 ms from cue presentation. b) Same as a, for the GPe LFD population. c)
Summary of the fraction of cells with a significant response-index.
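The response-index defined in the legend (the absolute difference between a cue-aligned PSTH and the neutral-cue PSTH over 0-2000 ms) can be sketched as below. The legend does not state how the difference is aggregated over time; averaging across bins is an assumption made here, as is the function name. The input values are synthetic and are not recorded data, and the significance test (p<0.05) is not reproduced because its procedure is not given in this section.

```python
def response_index(cue_psth, neutral_psth):
    """Mean absolute difference (spikes/s) between two equal-length PSTHs.

    ASSUMPTION: the per-bin absolute differences are averaged over the window.
    """
    if len(cue_psth) != len(neutral_psth):
        raise ValueError("PSTHs must cover the same time window")
    diffs = [abs(c - n) for c, n in zip(cue_psth, neutral_psth)]
    return sum(diffs) / len(diffs)

# Synthetic 1 ms resolution PSTHs covering 0-2000 ms after cue onset.
neutral = [5.0] * 2000
reward = [5.0] * 500 + [9.0] * 1500     # sustained increase after 500 ms
aversive = [5.0] * 1900 + [4.0] * 100   # brief, small decrease

ri_reward = response_index(reward, neutral)
ri_aversive = response_index(aversive, neutral)
```

A cell with these values would fall below the identity line in the scatter plots, since its reward response-index exceeds its aversive one. Because the index uses the absolute difference, increases and decreases in firing rate both count as responses.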
4 Discussion
We have shown that despite the symmetry in behavior, expectation of reward - but not
of an aversive event - affects rate modulations of basal ganglia low-frequency
discharge neurons beyond the modulation which followed a neutral cue. Asymmetry
in value expectation is congruent with theories on the localization of antagonistic
motivational systems (Konorski 1967). It has been shown that neural systems other
than the basal ganglia, e.g., the amygdala (LeDoux 2000) and the cerebellum
(McCormick and Thompson 1984), are involved in aversive conditioning. In
parallel studies, we showed that both the neuromodulators (TANs and SNc) and basal
ganglia populations with high frequency discharge (SNr, GPi and GPe) encode
expectation of reward but not the expectation of an aversive event (Joshua et al.
2008a; Joshua et al. 2008b). Here we extend this notion and show similar results for
the PANs and for the LFD neurons of the GPe.
In a previous study, Samejima et al. (2005) showed that the activity of many
striatal projection neurons was selective to both values and action; however, only a
few neurons were tuned to relative values or action choice. Lau and Glimcher (2007) reported
that the encoding of action and outcome was carried out by largely separate
populations of caudate neurons that were active after movement execution. Although
these studies focus on different trial epochs they are suggestive of two different
coding schemes. In our task the monkey performed actions in the cue epoch both in
the reward and the aversive trials but not in the neutral trials (Fig 1d). However,
although the action dissociates neutral and aversive trials, responses to the aversive
and neutral cue were the same (Fig 3a). This suggests that it is not action per se that is
encoded in these neurons. Furthermore, in the reward trials we found considerable
modulations, suggesting that it is the action-value that is encoded in the striatum.
We found that the fraction of cells with a short latency response was larger for
the GPe LFD than for the PANs (Fig. 2b right vs. 2c right). This difference is
surprising since the striatum is the main input of the GPe. Our recordings in the
striatum are limited to the putamen and it could be that the fast responses of GPe LFD
are due to input from other striatal territories with faster responses.
Human decisions are not symmetric in response to negative and positive
prospects (Tversky and Kahneman 1981). Here, we show that the basal ganglia
encoding of the positive domain surpasses their encoding of the negative domain. We
extend our report to the input stage of the basal ganglia – the striatum, as well as to a
unique population of neurons in the GPe – the low frequency discharge neurons.
Having two biological systems, one for the aversive domain and one for the reward
domain, might be the neural basis of risk-averse, asymmetric and non-rational
human behavior (Tversky and Kahneman 1981).
Reference List
1. D. Arkadir, G. Morris, E. Vaadia, H. Bergman, J. Neurosci. 24, 10047 (2004).
2. H. M. Bayer, P. W. Glimcher, Neuron 47, 129 (2005).
3. H. M. Bayer, B. Lau, P. W. Glimcher, J. Neurophysiol. 98, 1428 (2007).
4. V. Coizet, E. J. Dommett, P. Redgrave, P. G. Overton, Neuroscience 139, 1479
(2006).
5. Y. Darbaky, C. Baunez, P. Arecchi, E. Legallet, P. Apicella, Neuroreport 16,
1241 (2005).
6. C. D. Fiorillo, P. N. Tobler, W. Schultz, Science 299, 1898 (2003).
7. A. M. Graybiel, T. Aosaki, A. W. Flaherty, M. Kimura, Science 265, 1826 (1994).
8. F. A. Guarraci, B. S. Kapp, Behav. Brain Res. 99, 169 (1999).
9. M. Joshua, A. Adler, R. Mitelman, E. Vaadia, H. Bergman, Asymmetric Encoding
of Positive and Negative Expectations in the Basal Ganglia. Submitted (2008).
10. M. Joshua, S. Elias, O. Levine, H. Bergman, J. Neurosci. Methods 163, 267
(2007).
11. M. Kimura, J. Rajkowski, E. Evarts, Proc. Natl. Acad. Sci. U.S.A. 81, 4998
(1984).
12. J. Konorski, Integrative Activity of the Brain: An Interdisciplinary Approach
(Chicago Univ. Press, Chicago, 1967).
13. B. Lau, P. W. Glimcher, J. Neurosci. 27, 14502 (2007).
14. J. E. LeDoux, in The Amygdala, J. P. Aggleton, Ed. (Oxford University, 2000),
chap. 7, pp. 289-310.
15. M. Matsumoto, O. Hikosaka, Nature 447, 1111 (2007).
16. D. A. McCormick, R. F. Thompson, Science 223, 296 (1984).
17. J. Mirenowicz, W. Schultz, Nature 379, 449 (1996).
18. G. Morris, D. Arkadir, A. Nevet, E. Vaadia, H. Bergman, Neuron 43, 133 (2004).
19. H. Nakahara, H. Itoh, R. Kawagoe, Y. Takikawa, O. Hikosaka, Neuron 41, 269
(2004).
20. B. Pasquereau et al., J. Neurosci. 27, 1176 (2007).
21. S. Ravel, E. Legallet, P. Apicella, J. Neurosci. 23, 8489 (2003).
22. K. Samejima, Y. Ueda, K. Doya, M. Kimura, Science 310, 1337 (2005).
23. M. Sato, O. Hikosaka, J. Neurosci. 22, 2363 (2002).
24. T. Satoh, S. Nakai, T. Sato, M. Kimura, J. Neurosci. 23, 9913 (2003).
25. W. Schultz, J. Neurophysiol. 80, 1 (1998).
26. Y. Shimo, O. Hikosaka, J. Neurosci. 21, 7804 (2001).
27. P. N. Tobler, C. D. Fiorillo, W. Schultz, Science 307, 1642 (2005).
28. A. Tversky, D. Kahneman, Science 211, 453 (1981).
29. M. A. Ungless, P. J. Magill, J. P. Bolam, Science 303, 2040 (2004).
30. C. J. Wilson, H. T. Chang, S. T. Kitai, J. Neurosci. 10, 508 (1990).
31. H. Yamada, N. Matsumoto, M. Kimura, J. Neurosci. 24, 3500 (2004).
Results IV
Neuron
Article
Synchronization of Midbrain Dopaminergic Neurons
Is Enhanced by Rewarding Events
Mati Joshua,1,2,* Avital Adler,1,2 Yifat Prut,1,2 Eilon Vaadia,1,2 Jeffery R. Wickens,4 and Hagai Bergman1,2,3
1Department of Physiology, The Hebrew University-Hadassah Medical School, Jerusalem 91120, Israel
2The Interdisciplinary Center for Neural Computation
3Eric Roland Center for Neurodegenerative Diseases
The Hebrew University, Jerusalem 91904, Israel
4Okinawa Institute of Science and Technology, 12-22, Suzaki, Uruma, Okinawa 904-2234, Japan
*Correspondence: [email protected]
DOI 10.1016/j.neuron.2009.04.026
SUMMARY
The basal ganglia network is divided into two functionally related subsystems: the neuromodulators and
the main axis. It is assumed that neuromodulators
adjust cortico-striatal coupling. This adjustment
might depend on the response properties and
temporal interactions between neuromodulators. We
studied functional interactions between simultaneously recorded pairs of neurons in the basal ganglia
while monkeys performed a classical conditioning
task that included rewarding, neutral, and aversive
events. Neurons that belong to a single neuromodulator group exhibited similar average responses,
whereas main axis neurons responded in a highly
diverse manner. Dopaminergic neuromodulators
transiently increased trial-to-trial (noise) correlation
following rewarding but not aversive events, whereas
cholinergic neurons of the striatum decreased their
trial-to-trial correlation. These changes in functional
connectivity occurred at different epochs of the trial.
Thus, the coding scheme of neuromodulators (but
not main axis neurons) can be viewed as a single-dimensional code that is further enriched by dynamic
neuronal interactions.
INTRODUCTION
Technical advances enabling recordings of the simultaneous
activity of several neurons (Abeles, 1982; Eggermont, 1990;
Baker et al., 1999) have made it possible to study the properties
of neuronal networks. Early studies (Perkel et al., 1967; Abeles,
1982; Aertsen et al., 1989; Bartho et al., 2004) focused on detection and quantization of the functional connectivity between
neurons (e.g., direct excitatory, inhibitory synapses or common
synaptic inputs). In the basal ganglia (Bergman et al., 1998),
this approach was used to provide insights into the debate
regarding the existence of parallel segregated basal ganglia
pathways (Alexander et al., 1986) versus a convergent funneling
architecture (Percheron et al., 1984; Percheron and Filion, 1991).
Recent studies have used data from simultaneously recorded
neurons to examine issues related to encoding/decoding and
information processing in the nervous system (Gawne and Richmond, 1993; Schneidman et al., 2003; Averbeck et al., 2006).
One study conducted by our group (Nevet et al., 2007) showed
that contrary to the positive noise and signal correlation found
between pairs of cortical neurons (Gawne and Richmond,
1993; Zohary et al., 1994; Lee et al., 1998; Yanai et al., 2007),
the average correlation in the substantia nigra pars reticulata
(SNr) population does not differ significantly from zero. However,
there are no studies of correlations exploring the similarity of
average responses of neurons in other structures of the basal
ganglia such as the globus pallidus external and internal
segments (GPe and GPi respectively) on the one hand, or the
neuromodulators of the basal ganglia, such as tonically active
neurons (TANs, striatal cholinergic interneurons) and midbrain
dopaminergic neurons (DANs) on the other. Moreover, there
are no studies on the basal ganglia that have examined dynamics
in the correlation of trial-by-trial discharge variations; i.e., the
dynamics of the noise correlation.
The division of the basal ganglia into neuromodulator and main
axis subsystems is based on both anatomical (Parent and Hazrati,
1995; Haber and Gdowski, 2004) and physiological properties of
these neurons (DeLong, 1971; Grace and Bunney, 1983a; Kimura
et al., 1984; Joshua et al., 2008, 2009). It was suggested that
the neuromodulators provide the network a single-dimensional
signal (scalar) and that the main axis utilizes this scalar (Schultz,
1998; Bar-Gad et al., 2003). The most common basal ganglia
models suggest that they operate as a reinforcement learning
system in which the DANs encode the temporal-difference
prediction error (Schultz et al., 1997). These models assume
that the teaching message is transmitted to all striatal territories,
and the neural plasticity of the cortico-striatal synapses is regulated by a homogenous dopamine signal and selective cortico-striatal activity (Arbuthnott and Wickens, 2007). The cholinergic
interneurons are assumed to mediate or complement the
teaching message of the DANs (Centonze et al., 2003; Pisani
et al., 2007). Models that include the basal ganglia main axis
suggest that by contrast to the scalar nature of the neuromodulators, the main axis activity is diverse (Mink, 1996; Bar-Gad et al.,
2003). The GABAergic lateral connections in the main axis (Tunstall et al., 2002; Plenz, 2003; Haber and Gdowski, 2004) support
the notion of a competitive component in the activity of main axis
neurons (Fukai and Tanaka, 1997; Frank et al., 2004).
Neuron 62, 695–704, June 11, 2009 ©2009 Elsevier Inc. 695
Basal Ganglia Correlations
Figure 1. Recording and Behavioral Task
(A) Behavioral task. Classical conditioning task with
three cues that predicted a food outcome (reward
cues), three cues that predicted an airpuff outcome
(aversive cues), and one neutral cue. The outcome
delivery on each trial was randomized according to
a fixed probability associated with the trial cue.
Cues were randomized between monkeys and
are shown as presented to monkey S.
(B) Top: Simultaneous extracellular recordings
from eight electrodes in the globus pallidus. In
seven electrodes the cells were classified as GPe
pausers, and one of the cells was classified as
a pallidal border cell (electrode 6). Bottom: Simultaneous extracellular recordings of TANs from six
electrodes in the striatum. Data are shown after
300–6000 Hz digital band-pass filtering.
(C) A schematic diagram of basal ganglia connectivity. Dark blue arrows indicate glutamatergic
excitatory connections; light blue arrows,
GABAergic inhibitory connections; red, neuromodulators. Abbreviations: GPe indicates external
segment of the globus pallidus; GPi, internal
segment of the globus pallidus; SNc, substantia
nigra pars compacta; SNr, substantia nigra pars
reticulata; STN, subthalamic nucleus; TAN, tonically active neurons (putative striatal cholinergic
interneurons).
The recent development of efficient tools for simultaneous
recording of multineuron activity from the basal ganglia makes it
possible to explore the correlation of basal ganglia neurons. Given
the above, our working hypothesis predicts that the responses of
neuromodulators should be homogenous and synchronized
whereas main axis activity should be diverse and independent.
In addition, the temporal modulation of noise correlation (Aertsen
et al., 1989; Vaadia et al., 1995; Baker et al., 2001) might provide
another domain, beyond rate and pattern, for neuronal encoding.
RESULTS
Behavior Task and the Neuronal Data Base
Two monkeys were introduced to seven different visual cues,
each predicting the outcome in a probabilistic manner (Figure 1A).
Three cues predicted a food outcome
(reward cues) with a delivery probability
of 1/3, 2/3, and 1, and three cues predicted an airpuff outcome (aversive cues)
with a delivery probability of 1/3, 2/3,
and 1. The seventh cue (the neutral cue)
was never followed by a food or an airpuff
outcome. Thus the task contained 18
different events, i.e., 7 different cues and
11 cue-outcome/no-outcome combinations. During the task we recorded the
spiking activity of two to eight electrodes
simultaneously (see Figure 1B for an
example of simultaneous recordings of
eight electrodes in the globus pallidus
and for the simultaneous recording of six electrodes in the striatum that show activity of TANs). To avoid bias caused by shadowing effects (Lewicki, 1998; Bar-Gad et al., 2001), we limited
this study to units recorded by different electrodes. Our neural
database included 163 TANs, 144 DANs, 368 GPe, 158 GPi,
and 174 SNr pairs of neurons (see Figure 1C for schematic
network diagram) that were recorded simultaneously and satisfied the study inclusion criteria (see Experimental Procedures)
for more than 30 successive minutes during task performance.
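The count of 18 events can be checked by enumeration. The sketch below assumes that a cue with delivery probability 1 has no no-outcome variant and that the neutral cue has only a no-outcome variant; the dictionary keys and function name are illustrative and do not come from the original analysis code.

```python
from fractions import Fraction

# Cue name -> (outcome type, delivery probability), as stated in the text.
CUES = {
    "reward_1/3": ("food", Fraction(1, 3)),
    "reward_2/3": ("food", Fraction(2, 3)),
    "reward_1": ("food", Fraction(1)),
    "aversive_1/3": ("airpuff", Fraction(1, 3)),
    "aversive_2/3": ("airpuff", Fraction(2, 3)),
    "aversive_1": ("airpuff", Fraction(1)),
    "neutral": (None, Fraction(0)),
}

def result_events(cues):
    """List the cue-outcome and cue-no-outcome combinations that can occur."""
    events = []
    for name, (outcome, p) in cues.items():
        if p > 0:  # the outcome is delivered on some trials
            events.append((name, outcome))
        if p < 1:  # the outcome is omitted on some trials
            events.append((name, "no outcome"))
    return events

cue_events = list(CUES)          # 7 cue events
combos = result_events(CUES)     # 11 result-epoch events
n_events = len(cue_events) + len(combos)
```

Under these assumptions each outcome type contributes five result-epoch events (two for each partial probability, one for probability 1) and the neutral cue contributes one, giving 7 + 11 = 18.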
Response Homogeneity of Neuromodulators
versus Diversity of Responses in the Main
Axis of Basal Ganglia Networks
We used the response correlation (Nevet et al., 2007) to quantify
the similarity of the responses of a pair of cells to the same event.
Figure 2. Response Correlation Reveals Similarity of
Responses of the Basal Ganglia Modulators versus
Heterogeneity of Responses of Main Axis Neurons
(A) Distribution of the GPe, GPi, and SNr (main axis) response
correlations. Only responses with significant rate modulations
of both neurons were included. N indicates number of
included response pairs out of the total number of response
pairs. For this analysis we constructed the PSTHs for the 2 s
after the event onset in bins of 1 ms and smoothed them
with a Gaussian filter of SD = 20 ms.
(B) Distribution of the DAN and TAN (neuromodulators)
response correlations (same conventions as in A).
(C) The mean and SEM of the response correlation in each of
the recorded populations.
(D) The percentage of significant response correlations (t test;
p < 0.05). Black indicates positive response correlations;
white, negative response correlations. The smoothing of the
PSTHs leads to dependency between bins, and hence for
the significance testing we constructed the PSTHs in bins of
50 ms with no smoothing.
The response correlation is the correlation coefficient between
two average responses (poststimulus time histogram [PSTH])
and hence quantifies the similarity of the temporal pattern of
the responses. Figure 2 shows the distribution of the response
correlation analysis for all studied populations. The response
correlations for the GPe, GPi, and SNr neurons were symmetrically distributed with an average close to zero (Figure 2A).
However, the distribution of the response correlation of DANs
and TANs was skewed toward positive values (Figure 2B). The
mean response correlation of the neuromodulators was larger
than the mean correlation for the main axis (p < 0.001; t test
on the z transformed values, Figure 2C). We found that the difference was also apparent in the fraction of significant positive and
negative response correlations. A large proportion of the positive
response correlations of the DANs and TANs were significantly
different from zero, but this was true for only a small proportion
of the negative correlations (Figure 2D). In the GPe, GPi, and
SNr, although many of the response correlations were significantly different from zero, the proportion of cells with positive
and negative response correlations was similar (Figure 2D). We
conclude that the neuromodulators of the basal ganglia have
homogenous responses whereas the responses of the main
axis are diverse.
Response correlation analysis tests the correlation between pairs of responses to single events;
however, it does not directly test the correlation
between the average responses of pairs of neurons
to more than one event. To test whether encoding of
different events is correlated we performed signal
correlation analysis (Gawne and Richmond, 1993;
Lee et al., 1998; Averbeck and Lee, 2004). We found
that the signal and response correlation analysis
yielded similar results; i.e., the distribution of the
signal correlation of the neuromodulators was
skewed toward positive values and for the main
axis the signal correlation was symmetrically
distributed with an average close to zero (see
Figures S1A–S1D available online). Comparing the signal and
response correlations showed that these two correlation
measures were correlated (Figure S1E). This indicates that the
cell pairs with comparable temporal response pattern are
those that encode different events similarly. To summarize, the
average responses of the basal ganglia neuromodulators (TANs
and DANs) were homogeneous, in contrast to the diverse
responses of neurons in the main axis of the basal ganglia
(GPe, GPi, and SNr).
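As a rough illustration of the signal correlation, the sketch below summarizes each event by a single mean spike count per neuron and correlates these means across events for a pair of cells. This one-number-per-event summary is an assumption made for the example; the function names and the spike counts are invented and are not the recorded data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signal_correlation(counts_a, counts_b):
    """Correlate the across-trial mean responses of two cells over events.

    counts_x maps an event name to a list of per-trial spike counts.
    """
    events = sorted(set(counts_a) & set(counts_b))
    mean_a = [sum(counts_a[e]) / len(counts_a[e]) for e in events]
    mean_b = [sum(counts_b[e]) / len(counts_b[e]) for e in events]
    return pearson(mean_a, mean_b)

# Invented counts: cells A and B rank the events identically, C inverts them.
cell_a = {"reward cue": [5, 7, 6], "aversive cue": [3, 2, 4], "neutral cue": [1, 1, 1]}
cell_b = {"reward cue": [12, 12], "aversive cue": [6, 6], "neutral cue": [2, 2]}
cell_c = {"reward cue": [4, 4], "aversive cue": [7, 7], "neutral cue": [9, 9]}
r_ab = signal_correlation(cell_a, cell_b)
r_ac = signal_correlation(cell_a, cell_c)
```

Because trial-to-trial variability is averaged out before correlating, this measure reflects only the similarity of event encoding and is insensitive to noise correlation.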
Reward Expectation and Delivery Enhances Temporal
Modulation of DAN Correlations
The response and signal correlations are measures of the correlation of the average responses (across trials) of two cells and
do not take into account the dynamic changes in their noise
correlation (correlations between variations from the average
response) that can occur within a given epoch (see Figure S2
for average noise correlation). We therefore calculated the joint
peristimulus histogram (JPSTH) (Gerstein and Perkel, 1969; Aertsen et al., 1989; Vaadia et al., 1995). The JPSTH is obtained by
subtracting the PSTH predictors from the raw coincident count
matrix to obtain an estimate of the unpredicted correlations,
i.e., correlations beyond those predicted by the modulation of
Figure 3. Noise Correlation of DAN Pairs Increased with Expectation of Reward and Reward Delivery but Not for Aversive Events
(A) The population JPSTH of the DANs (n = 144 pairs) for the reward trials. Left, cue; middle, outcome; right, no outcome. Bin size 50 × 50 ms, smoothed with
a two-dimensional Gaussian filter with SD = 1 bin. The different JPSTHs have different intensity (color bars on the right) scales to enhance the visibility of the
correlation dynamics.
(B) The DAN population JPSTH for aversive trials. Corresponding epochs in (A) and (B) have the same color scaling to enable comparison of aversive and reward
JPSTHs.
the average discharge rate (see Figure S3 for three examples of
JPSTH analysis). Note that the JPSTH diagonal quantifies the
time-dependent modulation of zero lag noise correlation.
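The subtraction described above can be sketched as follows. This is a minimal, unnormalized version, assuming binned spike counts per trial; it omits the smoothing and any normalization of the matrix, and the function names and toy data are illustrative only.

```python
def jpsth(trials_a, trials_b):
    """Raw coincidence matrix minus the PSTH predictor.

    trials_x is a list of trials, each a list of binned spike counts.
    Entry [t1][t2] is the across-trial covariance between cell A in
    bin t1 and cell B in bin t2.
    """
    n_trials = len(trials_a)
    n_bins = len(trials_a[0])
    psth_a = [sum(tr[t] for tr in trials_a) / n_trials for t in range(n_bins)]
    psth_b = [sum(tr[t] for tr in trials_b) / n_trials for t in range(n_bins)]
    matrix = []
    for t1 in range(n_bins):
        row = []
        for t2 in range(n_bins):
            raw = sum(a[t1] * b[t2] for a, b in zip(trials_a, trials_b)) / n_trials
            row.append(raw - psth_a[t1] * psth_b[t2])  # subtract the predictor
        matrix.append(row)
    return matrix

def diagonal(matrix):
    """Zero-lag values: cell A and cell B in the same time bin."""
    return [matrix[t][t] for t in range(len(matrix))]

# Toy data: four trials, three bins; the cells covary only in the second bin.
cell_a = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1]]
cell_b = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
diag = diagonal(jpsth(cell_a, cell_b))
```

In the toy data the diagonal is zero in the first and third bins and positive in the second, which is the kind of time-locked change in zero-lag noise correlation that the analysis is designed to detect.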
We extended the JPSTH analysis of a single neuron pair to the
populations of neuromodulator neurons. To examine whether
the DANs noise correlation depends on the context of the behavioral task, we analyzed the reward and aversive trials separately.
In Figure 3 we show the separation of the DAN population
JPSTHs into reward and aversive trials. In the cue and outcome
epochs, the DAN noise correlation increased only for the reward
trials (Figure 3A) but not for the aversive trials (Figure 3B). Testing
for differences between the average JPSTH diagonal before and
after the event (paired t test on the average diagonal comparing
−0.5 to 0.0 s versus 0.1 to 0.6 s) shows that there was a substantial
increase in the noise correlation for the reward cue (p < 0.01) and
outcome (p < 0.001) as compared with a nonsignificant increase
for the aversive cue (p = 0.46) and a nonsignificant decrease for
the aversive outcome (p = 0.06).
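The before-versus-after comparison can be sketched as a paired t statistic across neuron pairs. The example assumes 50 ms bins and a pre-event window that ends at the event; the bin layout, function names, and diagonal values are invented for illustration. The p value would then be read from a Student t distribution with the returned degrees of freedom, a step not shown here.

```python
import math
from statistics import mean, stdev

BIN_STARTS_MS = list(range(-500, 1000, 50))  # illustrative 50 ms bins

def window_mean(diagonal, start_ms, end_ms):
    """Mean of a JPSTH diagonal over bins starting in [start_ms, end_ms)."""
    return mean([v for v, t in zip(diagonal, BIN_STARTS_MS) if start_ms <= t < end_ms])

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for after minus before."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Invented diagonals for three pairs: zero except in the post-event window.
diagonals = [[c if 100 <= t < 600 else 0.0 for t in BIN_STARTS_MS]
             for c in (0.01, 0.02, 0.03)]
before = [window_mean(d, -500, 0) for d in diagonals]
after = [window_mean(d, 100, 600) for d in diagonals]
t_stat, dof = paired_t(before, after)
```

Pairing by neuron pair removes differences in baseline correlation between pairs, so the test is sensitive only to the event-related change.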
The JPSTH analysis revealed changes in the synchronization
level beyond those expected by the changes in firing rate (Aertsen et al., 1989). In Figure 4 we show the comparison between
synchronization and rate modulations (JPSTH and predictor
diagonals, respectively). We found that although there was an
increase in rate for both reward and aversive trials (Figure 4A
and Joshua et al., 2008), the increase in the noise correlation
was found only in the reward trials (Figure 4B, and see Figure 4C
for a comparison of noise correlation dynamics for epochs with
similar rate modulation). Furthermore, the JPSTH analysis for
the subset of dopaminergic pairs that simultaneously increase
their firing rate to aversive outcome shows that the noise correlation of these cells does not increase (Figure S4).
JPSTH analysis of the TANs did not reveal a correlation encoding of the rewarding versus aversive events (Figure S5). Figure 5
shows the results of the significance test (paired t test)
comparing the JPSTH diagonals for the reward and aversive
trials. The difference between reward and aversive in the cue
and outcome epochs was highly significant for the DANs
(Figure 5, red line) but not for the TAN pairs (Figure 5, green
line). Thus, the transient changes in noise correlation in the
DANs, but not TANs, discriminate between reward and aversive
related events.
TANs Show an Unspecific Decrease in Noise Correlation
before Cue Ending
Figure 6 presents the analysis of the population JPSTH for the
TANs (from 0.5 s before cue onset to 1 s after cue offset and
the beginning of the outcome/no-outcome epoch). We grouped
the outcome and no-outcome epochs because we did not find
significant differences between their JPSTHs (paired t test; p >
0.16). As was previously shown (Raz et al., 1996; Kimura et al.,
2003; Morris et al., 2004), we found that TANs tend to have positive noise correlations. In comparison to the fast increase of the
Figure 4. Modulations of DAN Noise Correlation Do Not Mirror Rate Modulation
(A) Common rate modulations: Diagonal of the PSTH predictor (±SEM in gray shading, n = 144 DAN pairs) for the reward (blue) and aversive events (red). Left, cue;
middle, outcome; right, no outcome.
(B) Zero lag noise correlation: JPSTH diagonal (±SEM in gray shading) of the DANs for the reward (blue) and aversive (red) events. Same conventions as in (A).
(C) An example of reward and aversive events with similar rate modulation but opposite JPSTH modulations. Left: Predictor diagonal (common rate modulation)
for reward cue (blue solid line) and aversive outcome (red solid line). Right: Corresponding JPSTH diagonals (noise correlation modulations). The rate and JPSTH
modulation of the other events in (A) and (B) left and middle subplots are given in dashed lines. Although both PSTH predictors (common rate modulations) have
a similar positive peak (left), only the diagonal of the JPSTH for the reward cue has positive modulations (right).
noise correlation of the DANs (Figures 3 and 4) following the
onset of rewarding cue and outcome, the TAN correlations
decreased gradually during the cue epoch and increased in the
outcome epoch (Figures 6A and 6B). We found that the TANs
correlation and rate modulations tended to be separated in
time (Figure 6C).
DISCUSSION
We showed that the responses of cells from the same neuromodulator population (TANs or DANs) tended to have a positive
correlation. In comparison to the homogenous responses of
the basal ganglia modulators, the neurons of the basal ganglia
main axis had diverse responses. Pairs of DANs, as well as pairs
of TANs, dynamically modulate their discharge variation (noise
correlation) in accordance with events in the behavioral task.
The noise correlation between the DANs increased after the
cue and outcome events, whereas the TANs noise correlation
decreased just before cue offset. Furthermore, although the
discharge rate of the DANs increased both in reward and
aversive trials, their noise correlation increased only in the
reward trials.
Correlations of the Average Response Set
Neuromodulators Apart from the Main Axis
Previous studies have observed that different neuromodulator
cells have responses with similar temporal patterns (Graybiel
et al., 1994; Schultz, 1998). In this manuscript we quantified
the similarity of the temporal pattern of the response (response
correlation) and the similarity of the encoding of different events
(signal correlation). We showed that in contrast to the basal
ganglia neuromodulators, the main axis responses are diverse
(Figures 2, S1, and S2). The homogeneous responses of the neuromodulators suggest that these populations as a whole provide
the main axis with a scalar message; i.e., the encoding of
different DANs, as well as different TANs, is similar. By contrast,
the diversity of the main axis responses suggests that its activity
is highly independent, which is conducive to a large information
capacity (Bar-Gad et al., 2003). The contrast between the diversity of the main axis response and the homogeneity of the
modulators was demonstrated in a behavioral task with 18
different events. Nevertheless, we cannot rule out the possibility
that the recording of neural activity during other tasks or over greater
spatial distances (including DANs in the ventral tegmental area
and TANs in the caudate or ventral striatum) might reveal other
effects. Future studies using a large variety of tasks and wider
sampling of basal ganglia neurons should test the consistency
and the spatial extent of the homogeneity of the basal ganglia
modulators.
Based mainly on the activity of the DANs, it has been suggested that the basal ganglia implement a reinforcement
learning algorithm (Schultz et al., 1997). The distinction between
the correlation properties of neuromodulators and the main axis
is in line with the idea that these populations have a different
role in the reinforcement learning system. The neuromodulators’
scalar response is consistent with these neurons being the
teacher (e.g., a critic) of this system. The actor, however,
requires specificity in encoding of different neuronal elements.
Indeed we have found such diversity in the encoding of the
main axis neurons.
Figure 5. DAN but Not TAN Noise Correlation Differentiates Reward
from Aversive Trials
The surprise (−ln(p), p of the paired t test) of the difference between reward
and aversive JPSTH diagonals for TAN (green) and DAN (red) neuronal pairs.
Dashed line indicates surprise at p = 0.01; values above the dashed line indicate p < 0.01 events. Top, cue; middle, outcome; bottom, no outcome.
Limitations of JPSTH Analysis
Several factors limit the interpretation of JPSTH analysis. Variability of latency or excitability effects contribute confounding
factors to the JPSTH matrix (Brody, 1999). We could not
unequivocally exclude the possibility that these effects contributed to our JPSTHs. For the TANs, however, this is unlikely
because the decrease in noise correlation toward the end of
the cue epoch does not overlap with the typical fast and transient
TAN response (Figure 6C). For the DANs we indeed found
a tendency toward coincidence of noise correlation and rate
modulations, but the JPSTH analysis dissociated the rewarding
and aversive events which nevertheless have similar rate modulations (Figure 4).
Trial-to-trial variability in action might also confound the interpretation of JPSTH analysis (Ben Shaul et al., 2001). Previously
we have shown that due to their motor-related sustained
responses, the JPSTHs of main axis neuronal pairs are sensitive
to false detection of dynamic changes (Arkadir et al., 2002).
However, action itself is not encoded in neuromodulators
(Kimura et al., 1984; Schultz, 1998; Morris et al., 2004). Hence,
we conclude that variability in action did not contribute to the
neuromodulator JPSTH analysis.
The neuromodulators’ firing pattern is composed of a stereotypic short latency phasic response to external events and tonic
Poisson-like activity between these responses (Kimura et al.,
1984; Schultz, 1986; Bayer et al., 2007). This excludes the possibility that opposite signs of neural transients lead to detection of
discharge covariation without rate modulations (Friston, 1995).
We do not exclude the possibility that the increase in the correlation of the DAN population at the time of the response is due
to dynamics of neural transients. Other possibilities are that
the increase in correlation is due to changes in the effective
connectivity in the dopaminergic neuron network or covariability
of inputs. Hence we did not focus on the source of correlation,
but refer to the possible effect of the correlation dynamics on
the postsynaptic striatal neurons (see below).
Thus the JPSTH analysis of the neuromodulators can be
considered valid and provides valuable insights into the encoding of the basal ganglia. Similar studies of the dynamics of noise
correlation of the basal ganglia main axis neurons will need to
wait for future technical and methodological advances.
Reward-Related Increase in the Noise Correlation
of Dopaminergic Neurons
Previous studies have shown that the discharge rate of DANs is
modulated by reward, and it was suggested that these neurons
encode the reward prediction error (Schultz, 1997; Nakahara
et al., 2004; Bayer and Glimcher, 2005; Pan et al., 2005; Morris
et al., 2006). Other behavioral factors might also lead to an
increase in the dopaminergic rate (Horvitz, 2000; Kakade and
Dayan, 2002; Redgrave and Gurney, 2006; Day et al., 2007).
We showed that in a classical conditioning task, the activity of
Figure 6. Population JPSTH of TANs
Reveals a Decrease in Noise Correlation
around Cue Offset
(A) The population JPSTH of the TANs (n = 163
pairs). Bin size 50 × 50 ms, smoothed with a two-dimensional Gaussian filter with SD = 1 bin. Cue
appeared at time 0 and lasted until the beginning
of the outcome/no-outcome epochs at time = 2 s
(marked by dashed lines).
(B) Diagonal of the population JPSTH (smoothed
with Gaussian kernel, SD = 1 bin), average in solid
line and SEM in light gray.
(C) The mean diagonal of TAN JPSTH (blue) and
the mean PSTH predictor (common rate modulation, green) superimposed. The temporal pattern
of noise correlation modulations does not reflect
the temporal pattern of rate modulations. Specifically, the decrease in noise correlation before the
end of the cue epoch is not coincident with rate
modulations.
the dopaminergic neurons also increased following nonrewarding events such as the prediction and delivery of airpuffs (Figures
4 and S4, and Joshua et al., 2008). Nonetheless, we found an
increase in the noise correlation of DANs to expectation and
delivery of reward and not to other events (Figures 3 and 4).
These findings of a reward-related increase in the noise correlation extend previous findings of unspecific spike-to-spike (noise)
correlations of the DANs (Grace and Bunney, 1983b; Morris
et al., 2004).
The modulations of the noise correlation were small compared
with the modulations of rate (Figure 4). In a recent study, Schneidman et al. (2006) showed that a weak pairwise correlation might
imply a strongly correlated network and provides an effective
description of the system. It remains to be determined whether
pairwise correlations can yield an effective description of the
dopaminergic neurons because current recording methods
do not enable in vivo simultaneous recording of many neurons;
nevertheless, it demonstrates the potential importance of the
current finding of an increase in the pairwise noise correlations.
Dopamine transmission is probably not limited to classical
synaptic action because it might also diffuse and reach extrasynaptic receptors (Cragg and Rice, 2004; Arbuthnott and Wickens,
2007; Moss and Bolam, 2008). The spatiotemporal distribution of
dopamine effects in the striatum depends on the interaction of
release, reuptake, and diffusion. The degree of temporal correlation of the release events influences the relative importance of
reuptake versus diffusion. Reuptake by the dopamine transporter
is a slow process compared with diffusion of dopamine away
from a synapse. Diffusion produces a relatively rapid decrease
in concentration if the extracellular concentration of dopamine
from other sources is relatively low. However, if dopamine is
released from many adjacent sources simultaneously, diffusion
is slowed, and reuptake predominates. We used a one-dimensional random walk model to simulate diffusion of dopamine
from multiple sources, combined with Michaelis-Menten reuptake kinetics. In Figure S6 we show that the DAN correlation
might increase the efficiency of dopamine signaling by reduced
clearance through diffusion in the correlated condition. Future
studies, using 3D models of the striatum and more comprehensive
models of correlated DAN activity, could provide a
better understanding of the physiological significance of this
phenomenon.
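The interplay of release timing, diffusion, and reuptake can be illustrated with a toy calculation. The sketch below is not the supplementary model: it replaces the stochastic random walk with its deterministic expected density on a one-dimensional lattice, and every parameter (lattice size, hop fraction, vmax, km, release quantum, release sites) is a hypothetical value chosen only to make the example run.

```python
def simulate(release_steps, n_sites=41, n_steps=200,
             hop=0.25, vmax=0.02, km=0.5, quantum=1.0):
    """Return total extracellular dopamine (arbitrary units) at each step.

    release_steps maps a site index to the step at which that site releases.
    """
    conc = [0.0] * n_sites
    totals = []
    for step in range(n_steps):
        # Release: add one quantum at each site scheduled for this step.
        for site, when in release_steps.items():
            if when == step:
                conc[site] += quantum
        totals.append(sum(conc))
        # Diffusion: expected occupancy after one symmetric random-walk step.
        # Mass hopping past either end of the lattice is lost.
        spread = [0.0] * n_sites
        for i, c in enumerate(conc):
            spread[i] += c * (1.0 - 2.0 * hop)
            if i > 0:
                spread[i - 1] += c * hop
            if i < n_sites - 1:
                spread[i + 1] += c * hop
        # Reuptake: Michaelis-Menten removal, rate = vmax * c / (km + c).
        conc = [c - vmax * c / (km + c) for c in spread]
    return totals

sites = (18, 20, 22)
synchronous = simulate({s: 0 for s in sites})                   # correlated release
staggered = simulate({s: 20 * k for k, s in enumerate(sites)})  # uncorrelated release
```

Comparing the two time courses shows how the same amount of released dopamine produces a different extracellular profile depending on release timing. Any quantitative conclusion would require realistic parameters and geometry.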
TAN Correlations Are Modulated by Task Timing
but Not by Value
Previous studies have shown that TANs are highly synchronized
(Raz et al., 1996; Kimura et al., 2003; Morris et al., 2004).
However, these studies did not consider the temporal dynamics
of the noise correlation. Consistent with these studies, we found
that TANs are indeed highly synchronized. Additionally, we found
that there is a decrease in their noise correlation just before cue
offset (Figure 6). This decrease in noise correlation did not
discriminate significantly between the aversive and reward trials
(Figures 5 and S5) and appears after the average TAN discharge
rate returns to baseline (Figure 6C). It was shown that subpopulations of striatal projection cells encode the outcome stages of
the task (Lau and Glimcher, 2007). Thus the decorrelation of
TANs at the end of the cue epoch could enable or facilitate this
encoding of striatal projection neurons through the cholinergic
control of cortico-striatal plasticity (Calabresi et al., 2000; Pisani
et al., 2007).
Concluding Remarks
Consistent with the classical concept of dopamine-acetylcholine
balance (Barbeau, 1962), the DANs and the TANs have opposing
single cell responses. DANs typically increase their discharge
rate in response to appetitive predictive cues and outcomes
(Schultz, 1998), whereas TANs show a decrease or pause in their
background discharge (Aosaki et al., 1994). We found that during
the cue epoch the noise correlation of the DANs increases,
whereas the correlation for the TANs decreases. We therefore
suggest that the concept of dopamine-acetylcholine balance
can be extended to the noise correlation of these systems. It is
possible that increasing the DAN correlation and the decorrelation of TANs enables an increase and decrease, respectively,
in the effective concentrations of striatal dopamine and acetylcholine. The right balance of the basal ganglia neuromodulators
and cortico-striatal activity might lead to a maximization of
information in the basal ganglia main axis and an optimal behavioral policy.
EXPERIMENTAL PROCEDURES
All experimental protocols were conducted in accordance with the National
Institutes of Health Guide for the Care and Use of Laboratory Animals and
with the Hebrew University guidelines for the use and care of laboratory
animals in research, supervised by the Institutional Animal Care and Use
Committee. Behavioral task, data-recording methods, and single cell analysis
appear in detail in previous manuscripts (Joshua et al., 2008, 2009). Here we
present a brief summary of these methods and describe methods not used
in the previous manuscripts.
Behavioral Task
Two monkeys (L and S, Macaca fascicularis, female 4 kg and male 5 kg) were
introduced to seven different fractal visual cues, each predicting the outcome
in a probabilistic manner (Figure 1A). Fractal cues (full-screen images, 17’’ LCD
monitor, 50 cm in front of the monkey’s face) were presented for 2 s. The cues
were immediately followed by a result epoch, which could include an outcome
(food, airpuff) or no outcome, according to the probabilities associated with the
cue. The beginning of the result epoch was signaled by one of three sounds
that discriminated the three possible events: a drop of food, an airpuff, or no
outcome. Trials were followed by a variable intertrial interval (ITI, monkey S:
3–7 s, monkey L: 4–8 s; Figure 1A).
Recording and Data Acquisition
During the acquisition of the neuronal data, two experimenters (M.J. and A.A.)
controlled the vertical position of the eight glass-coated tungsten electrodes
(confined with 1.65 mm guide) and real-time spike sorting (AlphaMap, ASD,
AlphaOmega). Recorded units were subjected to offline quality analysis that
included tests for rate stability, refractory period, waveform isolation, and
recording time. First, firing rate as a function of time during the recording
session was graphically displayed, and the largest continuous segment of
stable data was selected for further analysis. Second, cells in which more
than 0.02 of the total ISIs were shorter than 2 ms were excluded from the database. Third, only units with an isolation score (Joshua et al., 2007) above 0.8
(except for the DANs, in which we used a threshold of isolation score > 0.5)
were included in the database. The lower threshold used for the DANs is
due to the highly dense cellular structure of the SNc, which makes single
cell isolation difficult. We also performed the analysis on the high-quality
DANs (isolation score > 0.8) and received similar results to those reported.
The largest segment for which two simultaneously recorded units fulfilled the
inclusion criteria was included in the analysis database only if it was greater
than 30 min.
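The refractory-period, isolation, and recording-time criteria above can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the function names are hypothetical, spike times are assumed to be sorted and given in milliseconds, the isolation score is assumed to be computed elsewhere, and the 30 min criterion is applied to a single unit here although the text applies it to the segment shared by a pair of units.

```python
def isi_violation_fraction(spike_times_ms, refractory_ms=2.0):
    """Fraction of inter-spike intervals (ISIs) shorter than refractory_ms.

    Assumes spike_times_ms is sorted in ascending order.
    """
    isis = [b - a for a, b in zip(spike_times_ms, spike_times_ms[1:])]
    if not isis:
        return 0.0
    return sum(1 for isi in isis if isi < refractory_ms) / len(isis)


def passes_inclusion(spike_times_ms, isolation_score, stable_minutes,
                     is_dan=False):
    """Apply the three numeric criteria quoted in the text.

    - no more than 0.02 of all ISIs shorter than 2 ms
    - isolation score above 0.8 (above 0.5 for DANs)
    - stable segment longer than 30 min (simplified to one unit here)
    """
    min_isolation = 0.5 if is_dan else 0.8
    return (isi_violation_fraction(spike_times_ms) <= 0.02
            and isolation_score > min_isolation
            and stable_minutes > 30.0)
```

The rate-stability step is not included because the text describes it as a graphical, manual selection.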
Quantification of Similarity of Temporal Profile of Neuronal
Responses: Response Correlation Analysis
For each cell and each behavioral event, we calculated the PSTH. Each of
these PSTHs is an n-dimensional vector, where n is the number of 1 ms bins
in the histogram (n = 2000 bins, starting at the event onset). This vector was
smoothed with a Gaussian window (standard deviation [SD] = 20 ms). To avoid
spurious positive correlations due to smoothing of the PSTHs, we padded the
PSTH edges with the mirrors of the PSTHs before smoothing. Responses were
considered significant if they exceeded the mean of the ITI by three times the ITI
SD (3σ rule) for 60 consecutive bins (three times the smoothing SD). To calculate the ITI SD, we randomly pruned the number of ITI trials to the same number
of trials for which we calculated the PSTH.
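The smoothing and significance test described above might be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the function names, the kernel truncation at three SDs, and the `two_sided` switch are hypothetical (the text does not say how the kernel was truncated or whether decreases in rate were also tested), and the PSTH is assumed to be at least as long as half the kernel.

```python
import math


def gaussian_kernel(sd_bins, half_width_sds=3):
    """Normalized Gaussian weights, truncated at +/- half_width_sds SDs."""
    half = int(half_width_sds * sd_bins)
    weights = [math.exp(-0.5 * (k / sd_bins) ** 2)
               for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_with_mirror_padding(psth, sd_bins=20):
    """Smooth a PSTH (list of rates in 1 ms bins) after mirroring its edges.

    Assumes sd_bins >= 1 and len(psth) >= 3 * sd_bins.
    """
    kernel = gaussian_kernel(sd_bins)
    half = len(kernel) // 2
    # Reflect the first and last `half` bins so that the kernel never
    # averages over artificial zeros beyond the PSTH edges.
    padded = psth[:half][::-1] + psth + psth[-half:][::-1]
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(psth))]


def is_significant(smoothed, iti_mean, iti_sd, n_sd=3.0, min_run=60,
                   two_sided=False):
    """True if the PSTH stays beyond mean + n_sd * SD for min_run bins in a row."""
    threshold = n_sd * iti_sd
    run = 0
    for value in smoothed:
        deviation = value - iti_mean
        if two_sided:
            deviation = abs(deviation)
        if deviation > threshold:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False
```

With the defaults, `min_run=60` corresponds to three times the 20 ms smoothing SD at 1 ms bins, as in the text.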
We determined the similarity of the responses of two cells to a behavioral
event by calculating the correlation coefficient of the PSTHs. We denoted
this correlation the response correlation. The response correlation was calculated only for PSTHs with significant responses. To obtain the population
response correlation, we grouped all the correlation values, transformed
them by a z-transform (Sokal and Rohlf, 1981), and calculated their mean
and the standard error of the mean (SEM). The population mean and SEM
were obtained by inverse z-transform of these values. For the response
correlation analysis, we used a time window of 2 s starting at the event onset.
Because the neuromodulators have a short response, we also performed the
analysis on a time window of 1 s, and this analysis gave similar results.
702 Neuron 62, 695–704, June 11, 2009 ©2009 Elsevier Inc.
Quantification of Similarity of Responses across Different Events:
Signal Correlation Analysis
For each neuron, we computed the PSTHs for all behavioral events (18 events).
For this analysis we used the first five 100 ms bins (with no Gaussian
smoothing) of the response. We combined all PSTHs into an 18 × 5 matrix,
where each row was a task event and each column was a 100 ms bin. For
each column, we subtracted that column’s mean and then flattened the matrix
into a vector of length 90 (18 events × 5 bins). For each pair of simultaneously
recorded neurons, we computed the signal correlation by calculating the
correlation coefficient of these vectors. For the population average and SEM,
we z-transformed the correlation coefficients (Sokal and Rohlf, 1981), calculated the average and SEM, and then applied the inverse transform.
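As a rough sketch of the signal correlation and the population averaging described above, the following assumes each neuron's responses are already arranged as an events-by-bins matrix (a list of 18 rows of 5 rates). The function names are hypothetical. The z-transform is taken to be the Fisher transform (`atanh`), and the SEM is mapped back with the inverse transform exactly as the text states; both are readings of the text, not a confirmed implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signal_vector(event_by_bin):
    """Subtract each column's mean, then flatten the matrix row by row."""
    n_rows = len(event_by_bin)
    n_cols = len(event_by_bin[0])
    col_means = [sum(row[c] for row in event_by_bin) / n_rows
                 for c in range(n_cols)]
    return [row[c] - col_means[c]
            for row in event_by_bin for c in range(n_cols)]


def signal_correlation(matrix1, matrix2):
    """Correlation of the flattened, column-centered response matrices."""
    return pearson(signal_vector(matrix1), signal_vector(matrix2))


def population_mean_sem(correlations):
    """Average correlations in Fisher-z space and map the results back.

    Requires at least two values, all strictly between -1 and 1.
    """
    z = [math.atanh(r) for r in correlations]
    n = len(z)
    mean_z = sum(z) / n
    sd_z = math.sqrt(sum((v - mean_z) ** 2 for v in z) / (n - 1))
    sem_z = sd_z / math.sqrt(n)
    return math.tanh(mean_z), math.tanh(sem_z)
```

The same `pearson` and `population_mean_sem` helpers would apply to the response correlation, with the two smoothed PSTHs as inputs.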
The response and signal correlation were also calculated for pairs of
neurons that were not simultaneously recorded and therefore were probably
more remote than neurons recorded simultaneously. Analysis of nonsimultaneously recorded cells showed trends similar to those of the simultaneously recorded cells
(i.e., large positive correlations for the neuromodulators versus close to zero
average correlations for the main axis); however, correlation values were
generally smaller (data not shown).
Quantification of the Temporal Dynamics of the Noise Correlation:
JPSTH Analysis
The JPSTH analysis quantifies the temporal dynamics of the modulation of
correlations (Gerstein and Perkel, 1969; Aertsen et al., 1989). For this analysis,
we calculated the raw JPSTH matrix in which the (t1,t2)-th bin was the count of
the number of times that a coincidence occurred, in which neuron #1 spiked in
time bin t1 and neuron #2 spiked in time bin t2 on the same trial (see examples
in the first column of Figure S3). To correct for rate modulations we calculated
the PSTH predictor (Aertsen et al., 1989). The predictor matrix is the product of
the single-neuron PSTHs, i.e., the (t1,t2)-th bin is equal to PSTH1(t1)*PSTH2(t2)
(see examples in the second column of Figure S3). The JPSTH was calculated
as the subtraction of the number of coincident spikes expected by chance
(PSTH predictor) from the raw matrix (see examples in Figure S3). The JPSTH
was calculated in bins of 50 ms and smoothed with a two-dimensional
Gaussian window with an SD of 50 ms (1 bin).
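The raw JPSTH, the PSTH predictor, and their difference can be sketched as below. This is a minimal illustration with hypothetical function names. Each neuron's data is assumed to be a list of trials, each trial a list of spike counts per bin. Both the raw matrix and the PSTHs are expressed per trial (divided by the number of trials) so that the predictor and the raw matrix are on the same scale; that normalization is an assumption, and the two-dimensional Gaussian smoothing is omitted.

```python
def psth(trials):
    """Mean spike count per bin across trials."""
    n_trials = len(trials)
    n_bins = len(trials[0])
    return [sum(trial[b] for trial in trials) / n_trials
            for b in range(n_bins)]


def raw_jpsth(trials1, trials2):
    """Per-trial average of coincidences: neuron 1 in bin t1, neuron 2 in bin t2."""
    n_trials = len(trials1)
    n_bins = len(trials1[0])
    return [[sum(a[t1] * b[t2] for a, b in zip(trials1, trials2)) / n_trials
             for t2 in range(n_bins)]
            for t1 in range(n_bins)]


def psth_predictor(trials1, trials2):
    """Coincidences expected by chance: PSTH1(t1) * PSTH2(t2)."""
    p1, p2 = psth(trials1), psth(trials2)
    return [[r1 * r2 for r2 in p2] for r1 in p1]


def corrected_jpsth(trials1, trials2):
    """Raw JPSTH minus the PSTH predictor."""
    raw = raw_jpsth(trials1, trials2)
    predicted = psth_predictor(trials1, trials2)
    return [[r - p for r, p in zip(raw_row, pred_row)]
            for raw_row, pred_row in zip(raw, predicted)]
```

In a two-trial toy case where both neurons fire in bin 0 on the first trial and in bin 1 on the second, the corrected matrix is positive on the diagonal and negative off it, reflecting trial-by-trial covariation that the PSTHs alone do not predict.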
We also corrected the raw JPSTH using the shift predictor. The two
predictors gave similar results, and no trend was found when calculating
the difference between them (data not shown). We therefore
concluded that the data did not suffer from long-lasting trends, because
such trends would affect the shift predictor and the PSTH predictor differently.
We preferred the use of the PSTH correction in the graphical displays in this
manuscript because it results in less noisy estimates (Aertsen et al., 1989).
In the text, JPSTH refers to the JPSTH corrected by the PSTH predictor.
To group several JPSTHs from several events, we calculated the corrected
JPSTH of each event separately and then summed all corrected JPSTHs. For
example, the JPSTH for the reward cue is the sum of the corrected JPSTH of
the three cues with different probabilities (p = 1/3, 2/3, 1) of receiving reward.
We also normalized the JPSTH to obtain correlation coefficient values as introduced by Aertsen et al. (1989); i.e., each bin was divided by the SD of the trial to
trial response. Population analysis of the normalized and nonnormalized (but
corrected) JPSTH gave qualitatively similar results. In the text, JPSTH refers
to the corrected but not normalized JPSTH. To test whether the population
JPSTHs for two different events were significantly different, we performed
a bin-by-bin paired t test. The surprise values were obtained by transforming
the p value of this test by −ln(p).
We carried out JPSTH analysis for both the neuromodulators and main axis
neurons; however, as we and others have shown, for the neurons of the main
axis of the basal ganglia, JPSTH analysis might lead to false detection of correlation dynamics due to variability in the motor-related responses (Arkadir et al.,
2002). Indeed many of the JPSTH matrices of the main axis neurons revealed
significant marginal effects of the PSTH. This indicates that the PSTH and shift
predictors were not able to correct the raw JPSTH reliably, and therefore we
excluded the main axis populations from the JPSTH analysis.
Results IV
Neuron
Basal Ganglia Correlations
SUPPLEMENTAL DATA
Supplemental Data include six figures and can be found with this article online
at http://www.cell.com/neuron/supplemental/S0896-6273(09)00350-X.
ACKNOWLEDGMENTS
This study was partly supported by the Hebrew University Netherlands Association (HUNA)’s “Fighting against Parkinson,” the Vorst family foundation
grants, FP7 “Select and Act” grant, and the Okinawa Institute of Science
and Technology (OIST).
Accepted: April 28, 2009
Published: June 10, 2009
REFERENCES
Ben Shaul, Y., Bergman, H., Ritov, Y., and Abeles, M. (2001). Trial to trial
variability in either stimulus or action causes apparent correlation and
synchrony in neuronal activity. J. Neurosci. Methods 111, 99–110.
Bergman, H., Feingold, A., Nini, A., Raz, A., Slovin, H., Abeles, M., and Vaadia,
E. (1998). Physiological aspects of information processing in the basal ganglia
of normal and parkinsonian primates. Trends Neurosci. 21, 32–38.
Brody, C.D. (1999). Correlations without synchrony. Neural Comput. 11,
1537–1551.
Calabresi, P., Centonze, D., Gubellini, P., Pisani, A., and Bernardi, G. (2000).
Acetylcholine-mediated modulation of striatal function. Trends Neurosci. 23,
120–126.
Centonze, D., Gubellini, P., Pisani, A., Bernardi, G., and Calabresi, P. (2003).
Dopamine, acetylcholine and nitric oxide systems interact to induce corticostriatal synaptic plasticity. Rev. Neurosci. 14, 207–216.
Cragg, S.J., and Rice, M.E. (2004). DAncing past the DAT at a DA synapse.
Trends Neurosci. 27, 270–277.
Abeles, M. (1982). Local Cortical Circuits (Berlin, Heidelberg, New York:
Springer-Verlag).
Aertsen, A.M., Gerstein, G.L., Habib, M.K., and Palm, G. (1989). Dynamics of
neuronal firing correlation: modulation of “effective connectivity”. J. Neurophysiol. 61, 900–917.
Alexander, G.E., DeLong, M.R., and Strick, P.L. (1986). Parallel organization of
functionally segregated circuits linking basal ganglia and cortex. Annu. Rev.
Neurosci. 9, 357–381.
Aosaki, T., Tsubokawa, H., Ishida, A., Watanabe, K., Graybiel, A.M., and
Kimura, M. (1994). Responses of tonically active neurons in the primate’s
striatum undergo systematic changes during behavioral sensorimotor conditioning. J. Neurosci. 14, 3969–3984.
Arbuthnott, G.W., and Wickens, J. (2007). Space, time and dopamine. Trends
Neurosci. 30, 62–69.
Arkadir, D., Ben Shaul, Y., Morris, G., Maraton, S., Goldberg, J.A., and Bergman, H. (2002). False detection of dynamic changes in pallidal neuron interactions by the Joint Peri-Stimulus Histogram method. In The Basal Ganglia VII,
L.F.B. Nicholson and R.L.M. Faull, eds. (New York: Kluwer Academic/Plenum
Publishers), pp. 181–190.
Averbeck, B.B., and Lee, D. (2004). Coding and transmission of information by
neural ensembles. Trends Neurosci. 27, 225–230.
Day, J.J., Roitman, M.F., Wightman, R.M., and Carelli, R.M. (2007). Associative learning mediates dynamic shifts in dopamine signaling in the nucleus
accumbens. Nat. Neurosci. 10, 1020–1028.
DeLong, M.R. (1971). Activity of pallidal neurons during movement. J. Neurophysiol. 34, 414–427.
Eggermont, J.J. (1990). The Correlative Brain. Theory and Experiment in
Neuronal Interaction (Berlin: Springer-Verlag).
Frank, M.J., Seeberger, L.C., and O’Reilly, R.C. (2004). By carrot or by stick:
cognitive reinforcement learning in parkinsonism. Science 306, 1940–1943.
Friston, K.J. (1995). Neuronal transients. Proc. Biol. Sci. 261, 401–405.
Fukai, T., and Tanaka, S. (1997). A simple neural network exhibiting selective
activation of neuronal ensembles: from winner-take-all to winners-share-all.
Neural Comput. 9, 77–97.
Gawne, T.J., and Richmond, B.J. (1993). How independent are the messages
carried by adjacent inferior temporal cortical neurons? J. Neurosci. 13, 2758–
2771.
Gerstein, G.L., and Perkel, D.H. (1969). Simultaneously recorded trains
of action potentials: analysis and functional interpretation. Science 164,
828–830.
Averbeck, B.B., Latham, P.E., and Pouget, A. (2006). Neural correlations,
population coding and computation. Nat. Rev. Neurosci. 7, 358–366.
Grace, A.A., and Bunney, B.S. (1983a). Intracellular and extracellular electrophysiology of nigral dopaminergic neurons—1. Identification and characterization. Neuroscience 10, 301–315.
Baker, S.N., Philbin, N., Spinks, R., Pinches, E.M., Wolpert, D.M., MacManus,
D.G., Pauluis, Q., and Lemon, R.N. (1999). Multiple single unit recording in the
cortex of monkeys using independently moveable microelectrodes. J. Neurosci. Methods 94, 5–17.
Grace, A.A., and Bunney, B.S. (1983b). Intracellular and extracellular electrophysiology of nigral dopaminergic neurons—3. Evidence for electrotonic
coupling. Neuroscience 10, 333–348.
Baker, S.N., Spinks, R., Jackson, A., and Lemon, R.N. (2001). Synchronization
in monkey motor cortex during a precision grip task. I. Task-dependent modulation in single-unit synchrony. J. Neurophysiol. 85, 869–885.
Bar-Gad, I., Ritov, Y., Vaadia, E., and Bergman, H. (2001). Failure in identification of overlapping spikes from multiple neuron activity causes artificial
correlations. J. Neurosci. Methods 107, 1–13.
Bar-Gad, I., Morris, G., and Bergman, H. (2003). Information processing,
dimensionality reduction and reinforcement learning in the basal ganglia.
Prog. Neurobiol. 71, 439–473.
Barbeau, A. (1962). The pathogenesis of Parkinson’s disease: A new hypothesis. Can. Med. Assoc. J. 87, 802–807.
Bartho, P., Hirase, H., Monconduit, L., Zugaro, M., Harris, K.D., and Buzsaki,
G. (2004). Characterization of neocortical principal cells and interneurons by
network interactions and extracellular features. J. Neurophysiol. 92, 600–608.
Bayer, H.M., and Glimcher, P.W. (2005). Midbrain dopamine neurons encode
a quantitative reward prediction error signal. Neuron 47, 129–141.
Bayer, H.M., Lau, B., and Glimcher, P.W. (2007). Statistics of midbrain
dopamine neuron spike trains in the awake primate. J. Neurophysiol. 98,
1428–1439.
Graybiel, A.M., Aosaki, T., Flaherty, A.W., and Kimura, M. (1994). The basal
ganglia and adaptive motor control. Science 265, 1826–1831.
Haber, S.N., and Gdowski, M.J. (2004). The basal ganglia. In The Human
Nervous System, G. Paxinos and J.K. Mai, eds. (Amsterdam: Elsevier),
pp. 676–738.
Horvitz, J.C. (2000). Mesolimbocortical and nigrostriatal dopamine responses
to salient non-reward events. Neuroscience 96, 651–656.
Joshua, M., Elias, S., Levine, O., and Bergman, H. (2007). Quantifying the isolation quality of extracellularly recorded action potentials. J. Neurosci. Methods
163, 267–282.
Joshua, M., Adler, A., Mitelman, R., Vaadia, E., and Bergman, H. (2008).
Midbrain dopaminergic neurons and striatal cholinergic interneurons encode
the difference between reward and aversive events at different epochs of
probabilistic classical conditioning trials. J. Neurosci. 28, 11673–11684.
Joshua, M., Adler, A., Rosin, B., Vaadia, E., and Bergman, H. (2009). Encoding
of probabilistic rewarding and aversive events by pallidal and nigral neurons.
J. Neurophysiol. 101, 758–772.
Kakade, S., and Dayan, P. (2002). Dopamine: generalization and bonuses.
Neural Netw. 15, 549–559.
Kimura, M., Rajkowski, J., and Evarts, E. (1984). Tonically discharging putamen neurons exhibit set-dependent responses. Proc. Natl. Acad. Sci. USA
81, 4998–5001.
Perkel, D.H., Gerstein, G.L., and Moore, G.P. (1967). Neuronal spike trains
and stochastic point processes. II. Simultaneous spike trains. Biophys. J. 7,
419–440.
Kimura, M., Matsumoto, N., Okahashi, K., Ueda, Y., Satoh, T., Minamimoto, T.,
Sakamoto, M., and Yamada, H. (2003). Goal-directed, serial and synchronous
activation of neurons in the primate striatum. Neuroreport 14, 799–802.
Pisani, A., Bernardi, G., Ding, J., and Surmeier, D.J. (2007). Re-emergence of
striatal cholinergic interneurons in movement disorders. Trends Neurosci. 30,
545–553.
Lau, B., and Glimcher, P.W. (2007). Action and outcome encoding in the
primate caudate nucleus. J. Neurosci. 27, 14502–14514.
Plenz, D. (2003). When inhibition goes incognito: feedback interaction between
spiny projection neurons in striatal function. Trends Neurosci. 26, 436–443.
Lee, D., Port, N.L., Kruse, W., and Georgopoulos, A.P. (1998). Variability and
correlated noise in the discharge of neurons in motor and parietal areas of
the primate cortex. J. Neurosci. 18, 1161–1170.
Raz, A., Feingold, A., Zelanskaya, V., Vaadia, E., and Bergman, H. (1996).
Neuronal synchronization of tonically active neurons in the striatum of normal
and parkinsonian primates. J. Neurophysiol. 76, 2083–2088.
Lewicki, M.S. (1998). A review of methods for spike sorting: the detection and
classification of neural action potentials. Network 9, R53–R78.
Mink, J.W. (1996). The basal ganglia: focused selection and inhibition of
competing motor programs. Prog. Neurobiol. 50, 381–425.
Morris, G., Arkadir, D., Nevet, A., Vaadia, E., and Bergman, H. (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active
neurons. Neuron 43, 133–143.
Morris, G., Nevet, A., Arkadir, D., Vaadia, E., and Bergman, H. (2006). Midbrain
dopamine neurons encode decisions for future action. Nat. Neurosci. 9, 1057–
1063.
Moss, J., and Bolam, J.P. (2008). A dopaminergic axon lattice in the striatum
and its relationship with cortical and thalamic terminals. J. Neurosci. 28,
11221–11230.
Nakahara, H., Itoh, H., Kawagoe, R., Takikawa, Y., and Hikosaka, O. (2004).
Dopamine neurons can represent context-dependent prediction error. Neuron
41, 269–280.
Nevet, A., Morris, G., Saban, G., Arkadir, D., and Bergman, H. (2007). Lack of
spike-count and spike-time correlations in the substantia nigra reticulata
despite overlap of neural responses. J. Neurophysiol. 98, 2232–2243.
Pan, W.X., Schmidt, R., Wickens, J.R., and Hyland, B.I. (2005). Dopamine cells
respond to predicted events during classical conditioning: evidence for
eligibility traces in the reward-learning network. J. Neurosci. 25, 6235–6242.
Parent, A., and Hazrati, L.N. (1995). Functional anatomy of the basal ganglia. I.
The cortico-basal ganglia-thalamo-cortical loop. Brain Res. Brain Res. Rev.
20, 91–127.
Redgrave, P., and Gurney, K. (2006). The short-latency dopamine signal: a role
in discovering novel actions? Nat. Rev. Neurosci. 7, 967–975.
Schneidman, E., Bialek, W., and Berry, M.J. (2003). Synergy, redundancy, and
independence in population codes. J. Neurosci. 23, 11539–11553.
Schneidman, E., Berry, M.J., Segev, R., and Bialek, W. (2006). Weak pairwise
correlations imply strongly correlated network states in a neural population.
Nature 440, 1007–1012.
Schultz, W. (1986). Responses of midbrain dopamine neurons to behavioral
trigger stimuli in the monkey. J. Neurophysiol. 56, 1439–1461.
Schultz, W. (1997). Dopamine neurons and their role in reward mechanisms.
Curr. Opin. Neurobiol. 7, 191–197.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27.
Schultz, W., Dayan, P., and Montague, P.R. (1997). A neural substrate of
prediction and reward. Science 275, 1593–1599.
Sokal, R.R., and Rohlf, F.J. (1981). Biometry (New York: W.H. Freeman & Co.).
Tunstall, M.J., Oorschot, D.E., Kean, A., and Wickens, J.R. (2002). Inhibitory
interactions between spiny projection neurons in the rat striatum. J. Neurophysiol. 88, 1263–1269.
Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., and
Aertsen, A. (1995). Dynamics of neuronal interactions in monkey cortex in
relation to behavioral events. Nature 373, 515–518.
Percheron, G., and Filion, M. (1991). Parallel processing in the basal ganglia:
up to a point. Trends Neurosci. 14, 55–56.
Yanai, Y., Adamit, N., Harel, R., Israel, Z., and Prut, Y. (2007). Connected
corticospinal sites show enhanced tuning similarity at the onset of voluntary
action. J. Neurosci. 27, 12349–12357.
Percheron, G., Yelnik, J., and Francois, C. (1984). A Golgi analysis of the
primate globus pallidus. III. Spatial organization of the striato-pallidal complex.
J. Comp. Neurol. 227, 214–227.
Zohary, E., Shadlen, M.N., and Newsome, W.T. (1994). Correlated neuronal
discharge rate and its implications for psychophysical performance. Nature
370, 140–143.
Results V
Journal of Neuroscience Methods 163 (2007) 267–282
Quantifying the isolation quality of extracellularly
recorded action potentials
Mati Joshua a,∗, Shlomo Elias b, Odeya Levine b, Hagai Bergman a,b,c
a The Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem 91904, Israel
b Department of Physiology, The Hebrew University-Hadassah Medical School, Jerusalem 91120, Israel
c Eric Roland Center for Neurodegenerative Diseases, The Hebrew University, Jerusalem 91904, Israel
Received 11 April 2006; received in revised form 18 March 2007; accepted 18 March 2007
Abstract
There have been many approaches to the problem of detection and sorting of extra-cellularly recorded action potentials, but only a few methods
actually quantify the quality of this fundamental process. In most cases, the quality assessment is based on the subjective judgment of human
observers and the recorded units are divided into “well isolated” or “multi-unit” groups. This subjective evaluation precludes comprehensive
assessment of single-unit studies since the most basic parameter, i.e. their data quality, is not explicitly defined. Here we propose objective
measures to evaluate the quality of spike data, based on the time-stamps of the detected spikes and the high-frequency sampling of the analog
signal of cortical and basal-ganglia data. We show that quantification of recording quality by the signal-to-noise ratio (SNR) may be misleading.
The recording quality is better assessed by an isolation score that measures the overlap between the noise (non-spike) and the spike clusters.
Furthermore, we use a nearest-neighbors algorithm to estimate the proportion of false positive and false negative classification errors. To validate
these quality measures, we simulate spike detection and sorting errors and show that the scores are good predictors of the frequency of errors.
The reliability of the isolation score is further verified by errors implanted in real basal ganglia data and by using different sorting algorithms.
We conclude that quantitative measures of spike isolation can be obtained independently of the method used for spike detection and sorting, and
recommend reporting them in any study based on the activity of single neurons.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Spike sorting; Multi-unit recording; Signal detection
1. Introduction
The problem of extracting single neuron activity from
extracellular recordings has been investigated extensively and
comprehensively reviewed (e.g. Lewicki, 1998). The process
of detecting action potentials from the extracellular waveforms
(spikes) and clustering them into different neuronal sources is
known as spike detection and sorting. Spike detection and sorting algorithms are not perfect and classification errors can occur
for a number of reasons. First, most algorithms are not fully
automatic (e.g. Abeles and Goldstein, 1977; Worgotter et al.,
1986; Bergman and DeLong, 1992) and their real-time use can
lead to human errors (Wood et al., 2004). Second, inaccurate
assumptions about the data can also lead to errors. Some algorithms presuppose a parametric statistical model (Lewicki, 1994;
Pouzat et al., 2002, 2004; Shoham et al., 2003), whereas other
algorithms are based on non-parametric assumptions (Fee et al.,
1996a). In both cases these assumptions, whether explicit or
implicit, may be violated. For example, the analog trace in Fig. 1
shows significant modulation of the spike waveforms and illustrates how the stationarity (waveform stability) assumption may
be violated and thus lead to classification errors.
∗ Corresponding author. Tel.: +97226757168.
E-mail address: [email protected] (M. Joshua).
0165-0270/$ – see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.jneumeth.2007.03.012
Although many approaches to the problem of sorting spikes
have been put forward, only a few methods have been developed to quantify the quality of the spike sorting (Harris et al.,
2001; Pouzat et al., 2002; Schmitzer-Torbert et al., 2005). In
most cases, the quality assessment is done subjectively by a
human observer, and units with high scores are then reported as
having a “high signal-to-noise ratio” and being “well isolated”.
These subjective reports do not permit comparison of data quality across different studies and unfortunately are predisposed to
personal bias.
Fig. 1. An extreme example of non-stationarity of extracellular recording. Instability in the extracellular recording can lead to misclassifications by the spike sorting
algorithm. (a–d) A single trace of the extracellular recording, at different time scales (b depicts the arrow-marked interval in (a), etc.). In (c and d) spikes detected
by the real-time spike sorter are marked by black dots; high noise events (events that crossed threshold but not classified as spikes) are marked by gray triangles. (e)
Average of squared peak-to-peak differences of spike waveform, over all time intervals as a function of time starting from the real-time detection. This is a gross
measure of the change in spike waveform shape in time, similar to the autocorrelation function. Note the large periodic changes at 0.66 Hz and small periodic changes
at ∼3 Hz. These are probably due to periodic changes in electrode position, caused by respiratory (∼40 min−1 ) and cardiac (∼180 min−1 ) waves, respectively. (f)
Events classified in real-time as a single unit are in black; the noise events are in gray. Note that the noise cluster contains two different classes of events. One
class forms a smooth continuum with the boundaries of the spike cluster (probably missed spikes). The other class can be dissociated from the spike cluster due to
its smaller waveforms (probably spikes from other units). The scores of this unit are: isolation score, 0.93; false negative score, 0.12; false positive score, 0.002;
SNRNo Spk , 5.35; SNRSpk , 5.26. The false negative score is suggestive of the instability shown above.
In this article we propose objective measures to assess the
quality of spike detection and sorting. Our measures quantify
two different aspects of the data:
1. Quality of the recording, by calculating SNR (Section 3.1).
We present and discuss two calculations of the SNR that
differ in their noise estimation. The first is based on the noise
when an action potential occurs and the second is based on
the noise between action potentials.
2. Clustering quality. We introduce an isolation score for
quantifying the overlap between the spike and the noise
(non-spike) clusters (Section 3.2). We then present classification error scores that estimate the fraction of events
that were misclassified as spikes (false positive errors)
or misclassified as noise-events (false negative errors)
(Section 3.3).
To validate these measures, we simulate spike-sorting errors
(Section 3.4.1) and test the isolation score and classification
error scores as a function of the fraction of simulated errors for
different units. We check the scores under different conditions
by applying several clustering algorithms (Section 3.4.2). We
use real data from the basal ganglia and simulated errors to
investigate the score parameter space (Section 3.4.4). Finally,
we compare the results of the different scores (Section 3.4.5).
2. Methods
2.1. Neuronal recording procedures
The data were collected from experiments performed on two
vervet monkeys (Monkey Cu: Cercopithecus aethiops, female,
weighing 3.5–4 kg and monkey T, female, weighing 3 kg) and
two Macaca fascicularis (monkey Y, male, weighing 5 kg and
monkey P, female, weighing 3 kg). Details of the behavior of
the monkeys and animal care are described elsewhere (Heimer
et al., 2002; Morris et al., 2004; Elias et al., 2007). Recordings were made in the external segment of the globus pallidus
(GPe), a central nucleus of the basal ganglia and in the primary
motor cortex (M1, monkey T only). Animal care and surgical
procedures were in accordance with the NIH Guide for the Care
and Use of Laboratory Animals (1996) and the Hebrew University guidelines for the use and care of laboratory animals in
research, supervised by the institutional committee for animal
care and use.
During the recording sessions, glass-coated tungsten microelectrodes (impedance at 1 kHz: 0.2–0.6 MΩ) were
advanced to the target. Neuronal activity from each electrode
was amplified (monkeys P, Y and T: ×5000; monkey Cu: ×10,000), filtered (monkeys P and Y: 1–6000 Hz; monkeys Cu and
T: 300–6000 Hz), and continuously sampled at 24 kHz/electrode
(AlphaMap, Alpha-Omega Engineering, Nazareth, Israel).
269
a score of 1 meant perfect isolation, where the experimenter
judged that close to 100% of the spikes emitted by a single neuron were detected with no false detections (zero false positive
and negative errors). Scores between 3 and 4 meant that most
(but not all) spikes generated by a neuron were detected (small
fraction of false negative errors), but still with a negligible fraction of false detections (“no false positive errors”). Scores of 5–6
meant a mixture of two to three units. Finally, a grade of 7–8
meant a recording of multi-unit activity (significant fraction of
false negative and positive errors).
2.3. Algorithm development
All the functions used for both the analysis and algorithm
implementations are Matlab 7.1 (Mathworks, Natick, MA, USA)
compatible, and are available at: http://alice.nc.huji.ac.il/∼mati/
sorting quality programs.
2.4. Data preprocessing and event representation
2.1.1. Real-time spike detection and sorting algorithm
The electrode output was processed and classified in real
time (MSD, ASD, Alpha-Omega Engineering) by a templatematching algorithm (Worgotter et al., 1986). The electrode
signal was continuously sampled at 36–50 kHz, placed in a
buffer containing the last 100 samples (2–2.8 ms), and compared continuously with one to three templates. Each template
was constructed of eight equally spaced points separated by
five sampling points (e.g. 0.1 ms for the 50 kHz sampling),
and was defined by the experimenter following a learning process of threshold crossing signals. The sum of squares of the
differences between eight points in the buffer and the templates was calculated. When this sum reached a minimum that
was below a user-defined threshold, detection was hardware
reported. In cases where a buffer was double matched (e.g. a
signal passed the criterion of more than one template), an error
signal was given to the user, but no hardware report was created.
A dead time of 0.06 ms followed detection. The timing of the
hardware detections (100 ␮s active-low TTL pulses) was edge
sampled at 12 kHz (33 kHz in monkey T) in parallel with the
analog signals of the electrode output. During recording sessions, the experimenter closely followed the spike shape and
discharge rate. The experimenter graded the isolation quality
approximately every 2–4 min (see below) and when necessary
adjusted the template, detection threshold, or rarely the electrode
position.
The initial input for the estimation of the spike isolation quality consisted of the time stamps of the detected spikes (spike
trains) and the entire analog signal. We defined two clusters of
events: (i) spike cluster – a cluster classified as a single unit and
(ii) noise cluster – a cluster of events not classified with that unit.
The spike cluster was simply constructed from segments of the
analog signal according to the spike trains. The noise cluster
was extracted from the same analog trace, and contained events
that were not detected as spikes of the given unit. However, the
noise events had some similarity to the events in the spike clusters (e.g. similar amplitude, see details below). Each event, in
both spike and noise clusters, was represented as a point in a
high-dimensional space. Fig. 2 and Sections 2.4.1–2.4.3 depict
step by step the extracting of the spike and noise clusters.
2.2. Real-time grading of isolation quality
In most cases one experimenter controlled the position and
spike sorting of four electrodes. The quality of the detection and
spike sorting was estimated on-line by the experimenter. This quality
estimation was based on the superimposed analog traces of the
most recently sorted spikes (20–100) as well as the waveforms of
events that passed an amplitude threshold that was set by the
experimenter but were not classified as spikes. The grade scale
ranged from 1 (highest score) to 8 (lowest score). Generally,
M. Joshua et al. / Journal of Neuroscience Methods 163 (2007) 267–282
Fig. 2. Preprocessing. The process of extracting the spike and noise clusters. (a) The raw analog trace (1–6000 Hz band-pass hardware filtered and digitally sampled
at 24 kHz). (b) Analog trace after filtering with a digital high-pass filter (>300 Hz, two pole Butterworth filter, a zero-phase forward and reverse digital filter, Matlab
filtfilt function). (c) Superimposed waveforms of spikes extracted according to classification by the spike sorter (spike cluster), and aligned to the largest negative
peak (before the spline upsampling). Note the large variability during the fast phase of the action potential that results from the limited sampling rate. (d) Spike
cluster after upsampling the events and realignment to the negative peak revealing reduction in variability compared to (c). (e) The noise cluster, detected by threshold
crossing. The same upsampling and alignment process was used. The scores of this unit are: isolation score, 0.98; false negative score, 0.01; false positive score, 0;
SNRNo Spk, 2.57; SNRSpk, 2.52.
2.4.1. Up-sampling using cubic spline
Discrete sampling of analog traces leads to a time jitter
between superimposed frames of events (Fee et al., 1996a;
Pouzat et al., 2002). This jitter contributes to the variability in
extracellular waveforms from the same cell (Fig. 2c). To reduce
this variability, we up-sampled the data using cubic spline interpolation (Fig. 2d). The factor by which we up-sampled the data
was fixed at 4; using this value for the up-sampling factor had a
maximal effect on the reduction of variability between our data
waveforms (data not shown).
2.4.2. The spike cluster
To represent an event we used 1.5 ms (144 sampling points
after the cubic spline interpolation) of the corresponding upsampled analog trace. The resulting vector, whose ith value is
the voltage measured after i time steps from the beginning of the
event, can be viewed as a point in a high-dimensional space:
$$X = (V_1, V_2, \ldots, V_{144}) \tag{1}$$
All events were aligned by the largest negative peak. The offset
of this peak from the beginning of the event was set to 0.5 ms
(i.e. V48). Our extracellular recordings are from neurons in the
GP and the primary motor cortex with a large negative phase.
Hence, we used the largest negative peak for alignment of the
spike vector (however one can easily generalize this algorithm
to positive peaks). Finally, the aligned vector was normalized
to have a zero mean. The cluster of up-sampled, aligned and
normalized spike events is denoted as Scluster.
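The alignment and normalization of a single event can be illustrated with a minimal Python sketch. It assumes the trace has already been up-sampled (the cubic spline step is not shown), that V48 corresponds to 0-based index 47, and that events too close to the edge of the trace are simply skipped; the function names are hypothetical and are not taken from the original implementation.

```python
def largest_negative_peak(trace, lo, hi):
    """Index of the most negative sample in trace[lo:hi]."""
    return min(range(lo, hi), key=lambda i: trace[i])


def align_and_normalize(trace, peak_index, peak_position=48, length=144):
    """Cut `length` samples so that the negative peak becomes the
    `peak_position`-th sample (1-based, i.e. V48), then subtract the mean.

    Returns None when the window would fall outside the trace
    (an assumption of this sketch, not stated in the text).
    """
    start = peak_index - (peak_position - 1)
    if start < 0 or start + length > len(trace):
        return None
    segment = trace[start:start + length]
    mean = sum(segment) / length
    return [v - mean for v in segment]  # zero-mean event vector X
```

Applying `align_and_normalize` to every sorted spike would give the list of vectors that plays the role of Scluster in the later sketches.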
2.4.3. The noise cluster
Detection of the events comprising the noise cluster was
based on threshold amplitude crossing. Because of the typical
spike shape in our extra-cellular recordings, we only used a negative (lower) threshold to detect the noise cluster (however one
can easily generalize this algorithm to upper or dual, i.e. upper
and lower, threshold crossing events).
The noise cluster was constructed in the following manner.
First we selected from Scluster the 2% of the spikes with the
smallest negative peak (closest to zero). We then took the average
of these negative peaks and defined the threshold as half of this
value:
$$\text{threshold} = \frac{\text{average negative peak}_{0.02}}{2} \tag{2}$$
where 0.02 is the fraction of spikes used for calculation of the
average negative peak.
Next, we identified all events that crossed this threshold, but
removed the events already marked as spikes. Finally, we upsampled and aligned the noise events (0.5 ms offset similar to
the spike waveforms) by the local minimum between the first
two (down and up) threshold crossings (Fig. 2e).
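The threshold of Eq. (2) and the detection of candidate noise events can be sketched as follows. This is only an illustration: the rounding of the 2% fraction, the handling of an event that is still below threshold at the end of the trace, and the function names are assumptions. Removal of events already marked as spikes is left to the caller.

```python
def noise_threshold(negative_peaks, fraction=0.02):
    """Half the average of the `fraction` of spike peaks closest to zero.

    negative_peaks: the negative peak voltage of every spike in the
    spike cluster (all values are expected to be negative).
    """
    n = max(1, int(round(fraction * len(negative_peaks))))
    smallest = sorted(negative_peaks, reverse=True)[:n]  # closest to zero
    return (sum(smallest) / n) / 2.0


def crossing_minima(trace, threshold):
    """Index of the local minimum between each downward threshold
    crossing and the following upward crossing."""
    minima, start = [], None
    for i, v in enumerate(trace):
        if v < threshold and start is None:
            start = i  # downward crossing
        elif v >= threshold and start is not None:
            minima.append(min(range(start, i), key=lambda j: trace[j]))
            start = None  # upward crossing closes the event
    return minima
```

The indices returned by `crossing_minima` would then be filtered against the spike times and used as alignment points for the noise events.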
The noise cluster models all high amplitude events that are
not classified as spikes from the given unit. The noise-cluster
should contain all unclassified putative spikes; i.e. events that
are close, but not in, the spike cluster. This is achieved by
using only the Scluster events with the smallest negative peaks
to determine the threshold. The spike sorting quality measures
are insensitive to inclusion of more noise event crossings; i.e.
with a more conservative threshold. To verify this insensitivity
we modified the threshold parameters and found that the quality
measures were stable (see below). Therefore, we recommend the
use of a conservative threshold that ensures that the noise cluster contains putative spikes even when the noise cluster is overly
large.
3. Results
Spike detection and sorting quality depends first on the
recording quality and then on the quality of the clustering
algorithm. To evaluate recording quality we used the signal-to-noise ratio (SNR) (Section 3.1). Although the SNR can be used
for initial estimation of recording quality and a high SNR is
usually a necessary condition for good unit isolation, the SNR
is not a direct measure of the isolation of a single unit. Sorting
of recordings with a high SNR may nonetheless result in a
spike cluster that excludes spikes (false negative errors, e.g.
Fig. 1) or a cluster composed of two large units (false positive
errors). We therefore applied more direct measures of cluster
quality by measuring the isolation of the spike cluster from
the noise cluster: the isolation score (Section 3.2), and false
positive and false negative measures (Section 3.3). In Section
3.4 we compare and validate the scores under different sorting
methods and simulated error frequencies.
3.1. Signal-to-noise ratio measures
Several previous studies have taken an initial step towards
assessing the quality of spike data by reporting some variants
of the spike SNR (Pare and Gaudreau, 1996; Likhtik et al.,
2005). However, there is no explicit definition of the SNR in
these reports, making them very difficult to compare. The spike
signal-to-noise ratio can be computed in two ways. Both methods compute the signal in the same fashion but differ in their
noise calculation.
3.1.1. Signal calculation
The average of Scluster (up-sampled and aligned by the negative peak) is defined as:
$$S_{\text{avg}} \equiv \frac{1}{|S_{\text{cluster}}|} \sum_{X \in S_{\text{cluster}}} X \tag{3}$$
We quantify the signal as the difference between the minimum
and the maximum of the average spike waveform (Fig. 3a):
$$\text{peak-to-peak} \equiv \max(S_{\text{avg}}) - \min(S_{\text{avg}}) \tag{4}$$
We prefer using the peak-to-peak to quantify the signal rather
than other methods that integrate the area enclosed by the spike
waveform (spike energy). This is because the peak-to-peak signal value does not depend on the duration of the spike waveform,
which is conditioned by the filter and the edge detection parameters.
3.1.2. Noise calculation
We quantified noise in two ways:
1. The noise underlying the spike events, which corresponds to
the intra-cluster variability (Fig. 3b). For each spike waveform, Xk , we subtracted the mean waveform, Savg , to produce
Residk . We then concatenated all resulting residuals to produce one long vector Resid. The noise is then defined as the
standard deviation of this vector:
$$\mathrm{Noise}_{\mathrm{Spk}} \equiv \mathrm{S.D.}(\mathrm{Resid}) \tag{5}$$
Since our filters exclude the low frequencies and we use segments larger than most spikes, we can disregard the increase
in variability that may be created by the concatenation process.
2. The noise from the inter-spike-intervals (Fig. 3c), where
spikes are defined as events in the signal cluster. For each
spike in the signal cluster, we extract the 1.5–3 ms period
before the spike event negative peak, Prevk (Fig. 3c), unless
another spike from the cluster occurred in that interval. We
then concatenate all such intervals to produce one long vector
Prev. The noise is then defined equivalently:
$$\mathrm{Noise}_{\mathrm{No\,Spk}} \equiv \mathrm{S.D.}(\mathrm{Prev}) \tag{6}$$
Fig. 3. Signal-to-noise ratio calculation. (a) Average spike waveform (solid line) and peak-to-peak (dotted lines). (b) NoiseSpk. For a given spike event (dashed gray
line), we subtracted the average spike waveform (dotted black line) which results in the noise during the spike event (the solid black line). We then concatenated
segments from all spikes and calculated the standard deviation of this vector. (c) NoiseNo Spk. For a given spike we extracted the analog trace 1.5–3 ms before the
negative peak of the spike event (between the dashed lines). We then concatenated all these traces from all spikes and calculated the standard deviation of this vector.
(d) SNRNo Spk vs. SNRSpk. The SNR scores are the ratios between the peak-to-peak and the noise estimations (S.D. × 5). The SNR scores for 155 GP units. Scores are
highly correlated (R² = 0.94). Generally, for high values, SNRNo Spk is larger than SNRSpk (large values tend to be above the line Y = X), as changes in the waveform
that contribute to NoiseSpk but not to NoiseNo Spk are more likely when the electrode is close to the cell. On the other hand, when the scores are small, SNRSpk is
larger (small values are below the Y = X line), probably due to failure in detecting overlapping spikes.
The two SNR scores are highly correlated in our data (Fig. 3d).
There are some cases where these two measures are not equal.
When the waveform of a single unit changes (e.g. due to electrode drift, or intrinsic firing properties), or when the spike
cluster actually reflects multi-unit instead of single-unit activity,
NoiseSpk will be larger than NoiseNo Spk . Surprisingly, the opposite can also occur. For example, when the spike of a second unit
temporally overlaps with the spike of the given unit, the sorting
algorithm may drop these spikes (Bar-Gad et al., 2001). As a
result, the second unit will contribute only to NoiseNo Spk as its
coincidence with the first spike is ignored.
3.1.3. Signal-to-noise ratio
We define the two signal-to-noise ratios as simply:
$$\mathrm{SNR} \equiv \frac{\text{peak-to-peak}}{\mathrm{Noise} \times C} \tag{7}$$
where Noise is calculated by one of the two methods and C is a
scaling factor (commonly set by us to 5) which scales the noise
measures to peak-to-peak equivalent units. Examples of spikes
and their SNR measures are given in Figs. 1, 2, 6 and 8.
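As an illustration of Eqs. (3)–(7), the following Python sketch computes the SNR from a list of aligned spike vectors. It assumes the sample (N − 1) standard deviation, which the text does not specify, and the function names are hypothetical. Passing no noise vector gives the NoiseSpk variant; passing the concatenated pre-spike segments (Prev) gives the NoiseNo Spk variant.

```python
import math


def mean_waveform(spikes):
    """Eq. (3): point-by-point average of the aligned spike vectors."""
    n = len(spikes)
    return [sum(column) / n for column in zip(*spikes)]


def std(values):
    """Sample standard deviation (N - 1 normalization is an assumption)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def snr(spikes, noise_vector=None, c=5.0):
    """Eq. (7): peak-to-peak of the mean waveform over (noise S.D. x C)."""
    avg = mean_waveform(spikes)
    peak_to_peak = max(avg) - min(avg)           # Eq. (4)
    if noise_vector is None:
        # Eq. (5): concatenated residuals around the mean waveform
        noise_vector = [v - a for s in spikes for v, a in zip(s, avg)]
    return peak_to_peak / (std(noise_vector) * c)
```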
3.2. Isolation score
As stated above, SNR might be problematic, especially in
cases where the spike cluster actually reflects high amplitude
multi-unit activity. We therefore assessed the quality of
the spike isolation directly. The isolation score quantifies the
distance between the spike cluster and the noise cluster. We
computed this distance on the raw events directly, without mapping the spikes to some feature space, e.g. PCA (Abeles and
Goldstein, 1977) or wavelet transform (Quiroga et al., 2004;
Nenadic and Burdick, 2005).
3.2.1. Mandatory features of the isolation score
The isolation score needs to exhibit several critical properties:
1. The score should decrease with the number of real spikes
missed by the sorting algorithm (false negatives).
2. The score should decrease with the number of noise events
that were classified as spikes (false positives).
3. The score should be insensitive to the size of the extracted
noise cluster.
4. The score should span an intuitive range, e.g. 0–1.
3.2.2. Isolation score: definition
The isolation score quantifies the distance between events in
the spike cluster and the noise cluster. Nevertheless, since we are
only interested in the spike cluster events, this measure is not
symmetric. First, we compute the normalized similarity between
each event in the spike cluster, X, and all other events (spikes and
noise), Y:
$$\mathrm{Similarity}(X, Y) \equiv \exp\left(-d(X, Y)\,\frac{\lambda}{d_0}\right) \tag{8}$$
where d(X,Y) is the Euclidean distance between vectors X and Y.
Note that Similarity(X,Y) between close events is close to one
(exp(0)), whereas between distant events it is closer to zero
(exp(−∞)). d0 is the average Euclidean distance in the spike
cluster; this parameter normalizes the Euclidean distance to avoid
dependence on the units of a particular data set. The exponent
function stretches the Euclidean distance nonlinearly; thus Similarity(X,Y) of remote events becomes infinitesimally small. λ is a
gain constant (0 < λ < ∞) that sets the gain of this stretch. With
λ ≪ 1, all events are similar and Similarity(X,Y) is close to one,
whereas with λ ≫ 1, all events are dissimilar and Similarity(X,Y)
becomes infinitesimally small.
In order to turn the above similarity index into a probability-like quantity (positive values that sum to 1), we normalize it by
the sum of similarities between a given event, X, from the spike
cluster, to all other events (spikes and noise):
$$P_X(Y) \equiv \frac{\exp(-d(X, Y)(\lambda/d_0))}{\sum_{Z \neq X} \exp(-d(X, Z)(\lambda/d_0))} \tag{9}$$
For each event X we get a function PX that takes the form of
the Boltzmann–Gibbs distribution, also known as “softmax”
(Goldberger et al., 2004). Note that the parameter λ controls
the “softness” of the max operation; i.e. λ behaves like 1/temperature in some notations of the softmax equation. For a given
event X when λ approaches infinity (zero temperature) PX (Y) is
the deterministic probability function; i.e. PX (Y) = 1 for the event
nearest X and zero for all other events. On the other hand when
λ approaches zero (maximal temperature) PX (Y) is the uniform
distribution; i.e. PX (Y) is equal for all events. In this manuscript
we used λ = 10; i.e. we stretched the distances between near and
remote events.
In the next step, for each event in the spike cluster, X, we sum
over all the normalized similarity values PX (Y) for all the Y’s in
the spike cluster:
$$P(X) \equiv \sum_{Y \in S_{\text{cluster}}} P_X(Y) \tag{10}$$
P(X) is therefore a measure of how close event X is to the spike
cluster compared to the noise cluster. Intuitively, P(X) is the
probability that event X belongs to the spike cluster. The calculation of P(X) is illustrated in Fig. 4 (note that P(X) and PX (Y),
Eqs. (10) and (9), respectively, are not the same).
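The calculation of P(X) in Eqs. (8)–(10) can be sketched in Python as below. Several details are assumptions of this sketch rather than statements of the text: d0 is taken as the mean pairwise Euclidean distance within the spike cluster, the smallest distance is subtracted before exponentiation (this cancels in the ratio of Eq. (9) and only avoids numerical underflow), and the function names are hypothetical. The function also returns the mean of P(X) over the spike cluster, which is the quantity the text goes on to define as the isolation score.

```python
import math


def euclid(a, b):
    """Euclidean distance d(X, Y) between two event vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def isolation_score(spikes, noise, lam=10.0):
    """Return (mean of P(X) over the spike cluster, list of P(X))."""
    n_spk = len(spikes)
    events = spikes + noise
    pair_d = [euclid(spikes[i], spikes[j])
              for i in range(n_spk) for j in range(i + 1, n_spk)]
    d0 = sum(pair_d) / len(pair_d)  # assumed definition of d0
    if d0 == 0:
        raise ValueError("all spike events are identical")
    p_values = []
    for i, x in enumerate(spikes):
        dists = [(euclid(x, z), k < n_spk)
                 for k, z in enumerate(events) if k != i]
        d_min = min(d for d, _ in dists)
        sim_spk = sim_all = 0.0
        for d, is_spike in dists:
            s = math.exp(-(d - d_min) * lam / d0)  # Eq. (8), shifted
            sim_all += s                           # denominator of Eq. (9)
            if is_spike:
                sim_spk += s                       # numerator sum, Eq. (10)
        p_values.append(sim_spk / sim_all)
    return sum(p_values) / n_spk, p_values
```

With a noise cluster far from the spikes every P(X) is close to 1, whereas noise events lying on top of the spikes drive the values towards 0.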
The isolation score is defined as:
$$\text{isolation score} \equiv \frac{1}{|S_{\text{cluster}}|} \sum_{X \in S_{\text{cluster}}} P(X) \tag{11}$$
and can be intuitively considered as the average probability that
an event classified as a spike belongs to the spike cluster. Thus,
our isolation score is a combination of two approaches:
1. Quantifying the connectivity between two clusters using the
energy at the interface of the two groups (Fee et al., 1996a).
2. Grading the distance of two events using the “softmax over
Euclidean distances” function (Goldberger et al., 2004).
The range of the isolation score is from 0 to 1, where a score
of 1 means ideal isolation, with minimal distances between the
elements of the spike cluster, and a large distance between them
and all the elements of the noise cluster. A score close to zero
means very poor isolation, where the Euclidean distance among
elements in the spike cluster is larger than the Euclidean distances
between them and the noise cluster; i.e. elements from the spike
cluster are surrounded by elements from the noise cluster.
The isolation score satisfies the requirements defined in Section 3.2.1:
1. Spikes that were missed by the spike-detection or sorting
algorithm (false negatives) are nonetheless close to the spike
cluster. As a result, events in the cluster that are close to such
misses will have a reduced P(X) (Fig. 4), which in turn will
reduce the overall isolation score.
2. Likewise, noise events that were classified as spike events
(false positives) are close to the noise cluster. For these false
positives the P(X) value is reduced, due to their proximity
to the other noise events, thus again reducing the overall
isolation score.
3. The isolation score is insensitive to the size of the noise cluster. This is a result of the exponential decay of the similarity
value, PX (Y), between a spike event X and a distant event Y.
Therefore, adding more noise events, which are mostly distant from the spike cluster, contributes only small additional
values to PX (Y).
4. The isolation score is the average of probability-like values
and hence is bounded between 0 and 1.
It is crucial to note that the isolation score does not measure
the distance between the noise and spike distributions directly.
Nor does it directly measure the performance of the clustering
procedure. Rather, it measures how far away the noise and the
spike clusters are. It is similar to the gap measure common in
classification discussions, except that it recognizes that there is
no real gap between the two clusters.
Fig. 4. Calculation of the isolation score. Calculating the proximity of a spike to
the spike cluster, relative to the noise cluster. This figure is a schematic representation of the isolation score calculation. The x–y coordinates represent the 144
dimensions of a waveform from the spike and noise cluster. The gray triangles
represent points in the noise cluster, whereas the black squares represent spike
events in the spike cluster. For a given point X in the spike cluster (black oval),
the numbers next to each of the other points, Y, are PX (Y). The arrows denote the
Euclidean distance. Finally, P(X) for the given point (black oval), is the sum of
all PX (Y) values for all other spike events (black squares). Note that for events
far from X, PX (Y) is infinitesimal, and hence they have only a small influence
on the P(X). On the other hand, noise events that are close to the spike cluster
significantly decrease the P(X) values (e.g. gray triangle in the upper right-hand
corner).
3.3. Scores of classification errors
As described in the previous Section 3.2, the isolation score
quantifies the separation of the spike cluster from other events,
but it does not estimate the number of spikes missed by the spike
detection and sorting process or the number of noise events that
were erroneously classified as spikes (false negative and positive
errors, respectively). Moreover, the isolation quality measure
cannot separate these errors. However, some physiological studies are more sensitive to one of the two errors and therefore their
separate estimates may provide a better database for these experiments. In this section we describe a method for estimating these
errors. For each event we find its K nearest neighbors (KNN)
(Vapnik, 1998) and compare the classification of the majority
of these neighbors to the event classification (produced by the
sorting algorithm). This method is illustrated in Fig. 5.
3.3.1. False negatives score
False negatives are spikes that were missed by the spike detection or sorting algorithm. We estimate these by the number of
noise events having most of their K nearest neighbors (see below
for the choices of K) from the spike cluster, denoted Nfn. The
false negatives score is thus defined as:
$$\text{false negatives score} \equiv \frac{N_{fn}}{N_{fn} + |S_{\text{cluster}}|} \tag{12}$$
When the number of false negatives is small the score is close to
zero. The score then increases with the number of false negatives.
When all the real spike events are missed (Nfn ≫ |Scluster|) the
score reaches the maximum of 1. However, in practical terms
the score cannot reach this limit (see below Section 3.3.4).
Fig. 5. Illustration of the KNN algorithm for estimating classification error
scores. The x–y coordinates represent the 144 dimensions of a waveform from
the spike and noise cluster. Black squares represent spike events, gray triangles
represent noise events. The notations are similar to those used in Fig. 4. For each
event we calculated the K nearest neighbors (KNN, here K = 3). Spike events
having most of their KNN from the noise cluster are considered false positives;
similarly noise events with a majority of their KNN from the spike cluster are
considered false negatives.
3.3.2. False positives scores
Similarly, we estimate the number of false positive events
(events that were classified as spike events but are noise events):
$$\text{false positives score} \equiv \frac{N_{fp}}{|S_{\text{cluster}}|} \tag{13}$$
where Nfp is the number of spike events having most of their
K nearest neighbors as noise events. When the number of false
positives is small the score is close to zero. As this number
increases the score increases. When all spike cluster events are
surrounded by noise events (Nfp = |Scluster |) the score reaches the
maximum of 1.
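The two classification error scores can be illustrated with a brute-force K-nearest-neighbor sketch in Python. It is a minimal illustration, not the original implementation: it assumes an odd K (so that no vote is tied), uses an exhaustive distance search, and the function name is hypothetical.

```python
import math


def knn_error_scores(spikes, noise, k=3):
    """Return (false negatives score, false positives score)."""
    events = [(e, True) for e in spikes] + [(e, False) for e in noise]
    n_fn = n_fp = 0
    for i, (x, is_spike) in enumerate(events):
        # distances from x to every other event, with that event's label
        others = sorted(
            (math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z))), z_is_spike)
            for j, (z, z_is_spike) in enumerate(events) if j != i
        )
        spike_votes = sum(1 for _, s in others[:k] if s)
        if is_spike and spike_votes < k / 2:
            n_fp += 1  # spike event surrounded mostly by noise events
        elif not is_spike and spike_votes > k / 2:
            n_fn += 1  # noise event surrounded mostly by spike events
    fn_score = n_fn / (n_fn + len(spikes))
    fp_score = n_fp / len(spikes)
    return fn_score, fp_score
```

For clusters of realistic size the exhaustive search is quadratic in the number of events, so a spatial index or a vectorized distance computation would be preferable in practice.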
3.3.3. Choosing K
Choosing inappropriate values of K leads to biases in the
classification error scores. For example, if too small a value is
chosen for K, false negative events may erroneously lead their
correctly classified spike-event-neighbors to be considered as
false positives. Likewise, to take an extreme example, when the
spike cluster is larger than the noise cluster, using a K value that is
larger than twice the size of the noise cluster will cause all noise
events to be considered false negatives. Generally large values
of K may lead to biased estimations of error rates of events that
are close to the boundaries of the clusters.
In this study, we selected an intermediate value for K, such
that a small number of clustering errors did not cause a large bias,
and the K value was far smaller than the size of both clusters.
Typically, our validation tests were performed on clusters that
contained 1500 events, using K = 31. In our experience, a good
rule of thumb is that K should equal 1–5% of the number of
events in the signal cluster.
3.3.4. Using the classification error scores
The classification error scores are a refinement of the isolation score. These false positive and negative estimates may help
constrain neuronal data analysis. For example, the existence of
false positives is one reason one should not expect to find a
perfectly oscillatory cell, or should not be surprised by multiparameter encoding of a single neuron. Naïve use of these error
scores, however, may be misleading. When the spike and noise
clusters overlap highly these scores are biased (Figs. 6b and c,
7c and d and 9c and d). Thus, when a large ratio of the spike
events are missed, real spike events from the spike cluster may
have most of their KNN from the noise cluster and hence be
considered false positives; for the same reason the estimation of
false negatives will be low. This is the reason we argued (Section
3.3.1) that the false negative score will not reach its theoretical
upper limit of 1. Similarly, real noise events may have most of
their KNN from the spike cluster and hence be considered false
negatives. Nonetheless, the isolation score measures the overlap between the spike and the noise clusters. When the isolation
score is high the classification error scores are good estimates
of the frequencies of false positive and negative errors and can
be used. When the isolation score is low, the error classification
scores are biased; however low isolation scores should dissuade
us from using the data and therefore further refinement of the
errors is unnecessary.
3.4. Validation of the isolation scores by simulation and real data
3.4.1. Random simulation of false negative and false positive errors
To test the efficiency of the various scores we simulated
spike-sorting errors and calculated the isolation and the classification error scores (Fig. 6). The error simulation was carried
out by modifying well-sorted data (original isolation quality
>0.99, less than 1% false errors) of four real-time detected GPe
neurons with different signal-to-noise ratios (Fig. 6a). To validate the quality of the real-time sorting of the selected units
we further examined the data using the off-line PCA method
and also checked for inconsistency by screening of the analog
signal.
To simulate false negative errors we eliminated spike events
from the spike cluster and marked them as noise events (Fig. 6b).
The independent variable was the ratio between the number of
eliminated (i.e. missed) spikes and the real number of spikes
(false negative ratio). Zero means no false negatives were generated and 1 means all spikes are classified as noise events.
As expected the isolation score was close to 1 when the ratio
of missed events was 0, and dropped to 0.5 when the missed
ratio was 0.5 (Fig. 6b1). We conclude that the isolation score
decreases linearly with the ratio of missed spikes when they are
equally distributed. Moreover, the scores of the four different
units were highly correlated (R² > 0.99). This demonstrates the
consistency of the isolation score; i.e. the same ratio of errors
yields the same isolation score.
The estimated false negative score was a good estimation of
the simulated false negative ratio values between 0 and approximately 0.35 (Fig. 6b2). When the fraction of simulated errors
was above 0.35 the estimation of the false negative was noisy, and
it fluctuated around 0.35. The false positive score was a valid
estimator when the fraction of simulated false negative errors
was between 0 and 0.3 (Fig. 6b3); in this range the estimated rate
of the false positive errors was close to zero, as expected. However, a ratio of 0.3 or more of simulated false negatives caused
the estimate of the false positive error to erroneously increase.
In contrast to the isolation and classification scores, the SNRSPK
does not change as a function of the simulated false negative
ratio (Fig. 6b4).
To simulate false positive errors we added events from the
noise cluster to the spike cluster (Fig. 6c). The independent variable was the ratio between the number of noise events included
in the spike cluster and the size of the spike cluster (false positive ratio). Zero means no false positives were generated and
0.5 means the number of simulated false positives was equal to
the number of real spikes in the spike cluster. Unlike the case of
simulated false negative errors, the reduction in size of the noise
cluster may influence this simulation. To minimize such effects
we set the noise cluster to be 10 times the size of the spike cluster before generating the errors. The isolation score was close
to 1 when the error ratio was 0 and decreased to 0.55 when the
simulated error ratio was 0.5 (Fig. 6c1). The changes in the isolation score as a function of the simulated false positive error
were highly correlated (R² > 0.99) for the four different units
depicted in Fig. 6. The false positive score was a good estimation of the ratio of errors; the difference between the score and
the ratio of the simulated errors was less than 0.02 (Fig. 6c3). The
false negative score changed only slightly when the simulated
false positive error ratio was less than 0.3; when the error ratio
increased the false negative score increased to 0.08 (Fig. 6c2).
The SNRSPK decreased with the number of false positives, however the effect on the different units was not consistent; i.e. the
ratio of simulated false positives had different effects on the
SNRSPK of the four tested units (Fig. 6c4).
Fig. 6. Simulation of false positive and negative errors. False negatives were simulated by random reclassification of spike events as noise events. The independent
variable is the ratio between the number of false negatives introduced and the size of the original spike cluster. The false positive errors were simulated by setting the
noise cluster to be 10 times the size of the spike cluster and reclassifying noise events as spike events. The independent variable is the ratio of the number of false
positives to the size of the spike cluster after reclassifying. (a) Spike waveforms from four well-sorted units with different signal-to-noise ratios. (b) Simulation of
false negative errors. (b1) Isolation score. The score decreases with the number of false negatives; the difference between the units is negligible. (b2) False negative
score. The score predicts the ratio of simulated false negatives well when the fraction of misclassified units is below 0.3. In this range, the difference between the
score and the error ratio is less than 0.02. For larger simulated error ratios the score is misleading; instead of increasing, the score is bounded by 0.45. (b3) False
positive score. The score predicts the false positive errors well when the fraction of misclassified units (simulated false negative) is below 0.3. For large error ratios
of simulated false negatives the false positive score is misleading; instead of remaining at zero the score rises to 0.5. (b4) SNRSPK. The SNRSPK does not change as
a function of the false negatives. (c) Simulation of false positive errors. (c1) Isolation score. The score decreases with the number of simulated false positive errors;
no significant difference was found between the different units. (c2) False negative score. When the ratio of simulated false positive errors is larger than 0.3 the score
increases from 0.01 to 0.08 due to biases when the noise and the spike cluster overlap. (c3) False positive score. The score follows the ratio of simulated errors (less
than 0.02 difference). (c4) SNRSPK. The SNRSPK decreases with the number of false positives; however the SNRSPK of different units is not consistently modified by
the fraction of simulated false positives.
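The random error simulations described in this section can be sketched in Python as follows. This is an illustrative sketch only: the random number generator, the rounding of event counts, and the function names are assumptions. The false negative ratio is taken relative to the original spike cluster, and the false positive ratio relative to the spike cluster after reclassification, as in the caption of Fig. 6.

```python
import random


def simulate_false_negatives(spikes, noise, ratio, seed=0):
    """Relabel a random fraction `ratio` of the spikes as noise events."""
    rng = random.Random(seed)
    n_missed = int(round(ratio * len(spikes)))
    missed = set(rng.sample(range(len(spikes)), n_missed))
    kept = [s for i, s in enumerate(spikes) if i not in missed]
    moved = [s for i, s in enumerate(spikes) if i in missed]
    return kept, noise + moved


def simulate_false_positives(spikes, noise, ratio, seed=0):
    """Relabel random noise events as spikes so that they make up a
    fraction `ratio` (< 1) of the enlarged spike cluster."""
    rng = random.Random(seed)
    n_added = int(round(ratio * len(spikes) / (1.0 - ratio)))
    added = set(rng.sample(range(len(noise)), n_added))
    new_spikes = spikes + [e for i, e in enumerate(noise) if i in added]
    new_noise = [e for i, e in enumerate(noise) if i not in added]
    return new_spikes, new_noise
```

The relabeled clusters can then be passed to the score functions to trace each score as a function of the simulated error ratio.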
In conclusion, the error simulations verify that the isolation
score can be a measure of the extent to which noise events and
spike events overlap. The simulation results also emphasize the
fact that the classification error scores have a range of good predictability that is dependent on the overlap between clusters and
hence on the isolation score. In this range of good predictability
(e.g. for isolation scores >0.70) the false positive and negative
scores should be used as a refinement of the isolation score. The
results also demonstrate that SNR is misleading; it does not follow the false negatives ratio nor does it have a scale in which
different units with the same ratio of false positives have the
same score.
Fig. 7. Effects of different sorting algorithms on the isolation scores. We generated sorting errors using three different sorting methods on the data of unit 1 of Fig. 6.
(a) The scores and the real ratio of classification errors as a function of the criterion used in the sorting method. (a1) Amplitude threshold crossing classification. (a2)
Template matching algorithm. We used a training set of 200 spikes to generate a 12-point template. We then calculated the distance of all events to this template and
applied a threshold to classify an event as a spike. (a3) Projection on the average template. Similar to the template matching method we projected the data on the
average template (1.5 ms, 36 points) defined by a training set of 200 spikes. (b) The isolation score as a function of the ratio of real false positive and negative errors.
The different lines are the scores given when using different sorting methods. The isolation score was consistent across the sorting methods. (c) False negative score
as a function of the real ratio of errors. (d) False positive score as a function of the ratio of errors.
3.4.2. Sorting errors and the isolation scores
To further validate our scores under different sorting conditions we simulated clustering errors using different sorting
methods. From the four well-isolated units in Section 3.4.1 we
took the one with the lowest signal-to-noise ratio (Fig. 6, unit 1)
and re-clustered the continuously sampled analog data using three
clustering methods: (1) threshold crossing—all events that pass
a threshold are marked as spikes. (2) Template matching—we
used a training set to calculate the average spike waveform and
then implemented an off-line algorithm similar to our real-time
eight-point template matching algorithm (Section 2.1.1). Waveforms that were similar to the average of the training set have
small Euclidian distances from the template and were considered as spikes from the same unit. (3) Projecting the data on the
average template. The average template was generated using a
training set. We then normalized this template to have a norm of
1 and convolved it with the analog data. Spikes were detected
as peaks in the resulting vector. This method is equivalent to
the projection on the first principal component when the data
contains only one unit (Abeles and Goldstein, 1977).
Each of these methods classifies events as spikes or noise by a
user-defined threshold. Here we used this threshold as our independent variable and examined its effect on the isolation scores.
For each threshold we estimated the real ratio of false positive
and negative errors (assuming that the original classification represented the real classification) and calculated the isolation score
and error classification scores. As with the random simulation of
errors (e.g. random switching of spike and noise events; Section
3.4.1 and Fig. 6) the isolation scores decreased with modifications of the sorting thresholds that increased the number of
classification errors. This decrease took place both when the
threshold values were very conservative and led to false negative
errors (Fig. 7a, left side of plots) and when the thresholds were
too permissive and led to false positive errors (Fig. 7a, right side
of plots). The error classification scores followed the real error
ratio when it was small but suffered from biases for large real
error ratios and low isolation scores. To check the dependency of
the scores on the clustering method we compared the scores with
the real ratio of false negatives and the ratio of false positives
(Fig. 7b–d). We found that the isolation score differed slightly
between the tested methods (Fig. 7b). However, when comparing
these isolation scores to the isolation score obtained when errors
were simulated randomly (Fig. 6) we found that for a given number of false negatives the isolation score was higher when we
used different threshold levels. This over-estimation of the isolation score was probably due to the local consistency of errors
induced by systematic modification of the thresholds in the sorting
methods. The false negative score had a range in which
it was equal to the real false negative ratio (Fig. 7c). This range was
larger when using template matching than when using the other
sorting methods. Similarly, the false positive score (Fig. 7d) was
equal to the false positive ratio when such errors existed and was
biased when the number of false negatives was large. This bias
was smallest for the template matching algorithm. Nevertheless,
as with the random simulation of errors, systematic modification
of the thresholds by several sorting methods reveals that the isolation quality is a consistent and reliable estimator of the quality
of the spike clustering. The classification errors can be used in
cases with high isolation scores (>0.8) and small levels
of false positive and negative errors (<0.25).
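The "real" error ratios used as a reference in this section can be sketched as below, treating the original classification as ground truth. The normalization is an assumption made for illustration (false negatives relative to the reference spikes, false positives relative to the re-sorted spike cluster); the function name and the event identifiers are hypothetical.

```python
def error_ratios(reference_spikes, resorted_spikes):
    """Return (false_negative_ratio, false_positive_ratio).

    reference_spikes: event identifiers labelled as spikes originally.
    resorted_spikes:  event identifiers labelled as spikes after re-sorting.
    """
    reference = set(reference_spikes)
    resorted = set(resorted_spikes)
    missed = reference - resorted   # real spikes now classified as noise
    added = resorted - reference    # noise events now classified as spikes
    fn_ratio = len(missed) / len(reference) if reference else 0.0
    fp_ratio = len(added) / len(resorted) if resorted else 0.0
    return fn_ratio, fp_ratio

# Ten reference spikes; the re-sorting misses two and adds one noise event.
fn_ratio, fp_ratio = error_ratios(range(1, 11), [1, 2, 3, 4, 5, 6, 7, 8, 20])
```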
3.4.3. Dynamic and population analysis of the isolation
scores
Typical physiological experiments include long duration
(>15 min) recordings of the same units. Naturally, the isolation
quality may drift or change over these periods. The isolation
quality tests were applied to real data recorded for periods of
more than 10 min. Each recording was split into segments of
60 s (∼1000–4000 spikes in our GP data). To limit the algorithm
complexity (time and space) we reduced (by random pruning)
the largest cluster to a size of 1500 spikes; the other cluster was
then reduced to maintain the size ratio between clusters. The
length of the segment is thus a tradeoff between computational
time and effectiveness. With a short segment the spike and noise clusters are sampled more accurately, because non-stationarity has less effect and
less random pruning is needed; however, the computational time increases. After extracting these clusters they
were scored as described in the previous sections. Thus, for each
unit we obtained a series of scores. These series of scores can
be examined for problematic recording epochs which should be
scrutinized more carefully (by re-clustering or omitting these
sessions). This is depicted in Fig. 8 where 43 min of consecutive real-time sorting were scored. After 35 min of recording,
it can be seen that the quality of the sorting decreased rapidly
despite the apparent increase in the SNR of the unit. Our recommendation is therefore to apply these tests to any prolonged
extracellular recording, and then to exclude periods with low
scores from the analysis database. As a rule of thumb we suggest excluding recording periods with an isolation score below
0.7–0.8.
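The segmentation and pruning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: events are represented by their times in seconds, the function names are hypothetical, and the sampling is uniform without replacement.

```python
import random

def split_into_segments(event_times_s, segment_s=60.0):
    """Group event indices into consecutive segments of segment_s seconds."""
    segments = {}
    for index, t in enumerate(event_times_s):
        segments.setdefault(int(t // segment_s), []).append(index)
    return [segments[key] for key in sorted(segments)]

def prune_clusters(spikes, noise, max_size=1500, rng=None):
    """Randomly prune so the largest cluster has max_size events,
    shrinking the other cluster by the same factor to keep the size ratio."""
    if rng is None:
        rng = random.Random()
    largest = max(len(spikes), len(noise))
    if largest <= max_size:
        return list(spikes), list(noise)
    keep = max_size / largest
    pruned_spikes = rng.sample(list(spikes), round(len(spikes) * keep))
    pruned_noise = rng.sample(list(noise), round(len(noise) * keep))
    return pruned_spikes, pruned_noise

segments = split_into_segments([0.5, 59.9, 60.0, 130.0])
pruned_spikes, pruned_noise = prune_clusters(
    list(range(3000)), list(range(6000)), rng=random.Random(1))
```

Each pruned segment would then be scored separately, giving the series of scores per unit that is inspected for problematic epochs.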
To achieve a single score for each unit we averaged the scores
over all sessions. The average scores of the 155 GPe units in
our database were: isolation score, 0.93 ± 0.08; false negative
score, 0.1 ± 0.09; false positive score, 0.02 ± 0.04 (Fig. 8c). To
compare the scores from different brain areas we calculated
the scores of 87 units recorded in the primary motor cortex.
Action potentials from the cortex were wider than GPe waveforms. Hence, we used 2 ms of analog recordings for each
action potential. The average isolation score of these cells was
0.79 ± 0.17. The average false negative score was 0.09 ± 0.19
and the average false positive score was 0.13 ± 0.18 (Fig. 8d).
All distributions were significantly different from GP scores
(p < 10^−3, Kolmogorov–Smirnov test). This difference in scores
is consistent with our subjective sense of the better quality of
the GP data and is probably due to the difference in cell sizes
and cell density in these brain areas.
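For readers unfamiliar with the test, the two-sample Kolmogorov–Smirnov statistic is the largest vertical gap between the two empirical cumulative distributions. The sketch below computes only that statistic on made-up score values; it does not reproduce the analysis above, and in practice a library routine such as SciPy's `scipy.stats.ks_2samp` would also supply the p-value.

```python
def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap

# Made-up isolation scores for two hypothetical populations.
gap = ks_statistic([0.9, 0.95, 1.0, 0.8], [0.5, 0.9, 0.7, 1.0])
```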
3.4.4. Exploring parameter space of the isolation and
classification error scores
The isolation score was designed to be insensitive to the noise
cluster size; i.e. adding events to the noise cluster that are far from
the spike cluster should not affect the score. The size of the noise
cluster is determined by the level of the amplitude threshold used
for extracting the noise cluster. To verify this insensitivity we
Fig. 8. Score statistics on real data. (a) Dynamic changes of the isolation scores. Data were split into segments of 60 s. For each segment we calculated the isolation
and classification error scores. All five scores and spike rates were computed for 43 consecutive minutes. Although all scores were stable during the first 35 min,
from the 36th minute they began to change. Both SNR scores increased; in contrast, the isolation score (which was extremely stable) rapidly decreased. Therefore, in
this case the SNR scores are misleading and the isolation score indicates the moment when the quality of the sorting decreased. (b) Spike (b1) and noise (b2) events
(n = 100, randomly selected) from 6 min of recording. The right column is from the first 6 min, the middle column from minute 19 to 25 and the left from the last
6 min of recording. In the last 6 min many noise waveforms resemble spikes. This misclassification is probably due to a slight modification in the spike waveform
(reflected by the SNR) that was not identified by the semi-automatic template matching algorithm. (c) Distributions of the scores of 155 GPe units. Scores from
different sessions were averaged. (c1) Isolation score. (c2) False negative score. (c3) False positive score. (d) Distribution of scores of 87 cortex units. (d1) Isolation
score. (d2) False negative score. (d3) False positive score.
modified the size of the noise cluster by changing the fraction
of events from the spike cluster used to calculate the threshold (these were the low amplitude spikes, hence fewer spikes
means a closer to zero threshold). The distribution of the isolation score was independent of the threshold used for noise cluster
extraction (p > 0.86 one-way ANOVA, p > 0.79 Kruskal–Wallis
non-parametric ANOVA). We then compared the isolation score
of all GPe units (n = 155) and found that the scores calculated
with different noise clusters were highly correlated (Fig. 9a).
Hence, our methods are insensitive to the size of the noise cluster. As described above, in order to reduce computation time we
used a random sample from the spike and noise clusters. As a
result each time we calculated the isolation score we used different events. The fact that we obtained the same scores when using
different random samples from the same distribution further
demonstrates the stability of our method.
The isolation score depends on the λ parameter that sets the
gain of the distance stretch. To check the dependency of the isolation score on this parameter we modified this parameter and
calculated the isolation score of all 155 GPe units (Fig. 9b1).
When λ was larger than 5 the isolation score was highly correlated with the scores calculated with our default value of λ = 10
(R > 0.946). However, when λ was equal to 1 the scores were
not as highly correlated (R = 0.72). This is expected since small
values of λ mean that the Euclidean distance between events is
not stretched, and therefore distant events influence the score. To
further investigate the influence of λ on our scores we simulated
classification errors by applying different thresholds when sorting the data (as described in Section 3.4.2). We modified λ and
calculated the isolation score as a function of the false negatives
and positives ratio (Fig. 9b2–3). We found that when λ is small
the score over-estimates the number of false positives (Fig. 9b3).
Fig. 9. Investigating parameter space of the isolation and classification-error scores. We modified the parameters used for calculating the score and compared the
scores of real units and the scores of a unit with errors simulated by re-clustering the data using a threshold crossing method. (a) We modified the fraction of the spike
clusters we used to calculate the threshold (these were the low amplitude spikes; hence fewer spikes means a closer to zero threshold) and compared the isolation
scores when using 2% of the spike cluster. (a1) 2% vs. 5%. (a2) 2% vs. 20%. (a3) 2% vs. 100%. (b) Comparison of isolation score when modifying λ. (b1) Real data
results. Units were sorted by the isolation score when using λ = 10. (b2) Isolation score as a function of the ratio of simulated false negatives. (b3) Isolation score as
a function of the ratio of simulated false positives. (c) Comparison of false negative score when modifying K. (c1) Real data results. Units were sorted by the false
negative score when using K = 31. (c2) False negative score as a function of the ratio of false negatives. (c3) False negative score as a function of false positives. (d)
Same as (c) for the false positive score.
In addition we found that as λ increases the false negative score
tends to increase. However, this increase is bounded. We conclude that our selection of λ = 10 does not suffer from biases that
occur when λ is small and it is within the large range in which
the isolation score follows the ratio of classification errors.
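The role of λ can be illustrated with a small sketch. It assumes a score of the following form, which should be checked against the definition given in the methods section: the similarity between two events decays exponentially with their Euclidean distance, scaled by λ over the mean distance within the spike cluster; for each spike X, P(X) is the share of its total similarity that goes to other spikes; and the isolation score is the mean P(X). The function name and the toy 2-D events are hypothetical.

```python
import math

def isolation_score(spikes, noise, lam=10.0):
    """Mean over spikes of the similarity share that falls on other spikes.

    Assumes at least two spikes that are not all identical.
    """
    n = len(spikes)
    pair_distances = [math.dist(spikes[i], spikes[j])
                      for i in range(n) for j in range(i + 1, n)]
    d0 = sum(pair_distances) / len(pair_distances)  # mean within-cluster distance

    def similarity(x, y):
        # A larger lam stretches distances, so far events contribute almost nothing.
        return math.exp(-lam * math.dist(x, y) / d0)

    total = 0.0
    for i, x in enumerate(spikes):
        to_spikes = sum(similarity(x, y) for j, y in enumerate(spikes) if j != i)
        to_noise = sum(similarity(x, z) for z in noise)
        denominator = to_spikes + to_noise
        if denominator > 0.0:  # guard against numerical underflow
            total += to_spikes / denominator
    return total / n

# Toy 2-D events: one compact spike cluster, noise either far away or overlapping.
spikes = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
far_score = isolation_score(spikes, [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)])
near_score = isolation_score(spikes, [(0.5, 0.5), (0.0, 0.5), (1.0, 0.5)])
```

With the noise far away the score is essentially 1, and it drops sharply once noise events sit inside the spike cluster, which is the behaviour the default λ = 10 is meant to preserve.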
The KNN algorithm used for the calculation of the classification error scores depends on the K we use. We modified this
parameter and calculated the scores of all 155 GPe units (Fig. 9c1
and d1). The units were sorted by the scores when using the
default value of K = 31. The false negative score changed only
slightly when modifying K (Fig. 9c1); on the other hand the false
positive score was sensitive to the K used (Fig. 9d1). We simulated clustering errors (as we simulated errors when modifying
λ) and calculated the error classification score as a function of
Fig. 10. Comparing the scores. The scores were compared using the data from 155 GP units from three different monkeys. (a) Isolation score vs. SNR_{No Spk}. When
the SNR is small the isolation score tends to be small, and when the SNR is large the isolation score is usually close to 1. But this relation is not linear. Therefore,
any SNR threshold will either include units that are poorly isolated (low isolation score) or exclude units that are well isolated (high isolation score). Furthermore,
the outliers (high SNR with low isolation score) reveal the weaknesses of the spike sorting process. (b) False positive + false negative scores vs. isolation score. As
the isolation score decreases the variability increases. Hence, the error type scores should only be used when the isolation score is high.
error ratio for different K values. We found that although K can
bias the scores when the error ratio is large, there is a range of
good predictability.
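The dependence on K can be explored with a sketch of a K-nearest-neighbours vote. The decision rule and the normalization below are assumptions made for illustration, not the definitions used for the published scores: a spike whose neighbours are mostly noise is counted as a putative false positive, a noise event whose neighbours are mostly spikes is counted as a putative false negative, and both counts are divided by the spike cluster size. The function name and the toy 1-D events are hypothetical.

```python
import math

def knn_error_scores(spikes, noise, k=31):
    """Return (false_negative_score, false_positive_score) from a KNN vote."""
    events = [(p, True) for p in spikes] + [(p, False) for p in noise]
    k = min(k, len(events) - 1)
    suspect_fp = suspect_fn = 0
    for i, (point, is_spike) in enumerate(events):
        # Labels of the k nearest other events (brute-force search).
        neighbours = sorted(
            (math.dist(point, other), other_is_spike)
            for j, (other, other_is_spike) in enumerate(events) if j != i
        )[:k]
        spike_votes = sum(1 for _, label in neighbours if label)
        if is_spike and spike_votes < k / 2:
            suspect_fp += 1   # labelled spike, surrounded by noise
        elif not is_spike and spike_votes > k / 2:
            suspect_fn += 1   # labelled noise, surrounded by spikes
    return suspect_fn / len(spikes), suspect_fp / len(spikes)

# Toy 1-D events: one mislabelled event in each cluster.
spikes = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (10.05,)]
noise = [(10.0,), (10.1,), (10.2,), (10.3,), (10.4,), (0.15,)]
fn_score, fp_score = knn_error_scores(spikes, noise, k=3)
```

An odd K such as the default of 31 avoids tied votes; the toy example uses K = 3 only because it contains twelve events.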
3.4.5. Comparing the scores
Our SNR, isolation and classification error scores were not
designed to be independent. To determine the degree of dependency we compared the different scores (Figs. 3d and 10). First
we compared the two SNR scores (Fig. 3d). The underlying reasons for the differences between these scores were described
in Section 3.1.2; nevertheless we found the SNR scores to be
highly correlated (R^2 = 0.94).
We then compared the isolation score and the SNR score and
found that as expected, in most cases units with a high SNR score
had high isolation scores and units with low SNR scores had low
isolation scores (Fig. 10a). On the other hand, the connection
between these scores was non-linear and had outliers in which
the isolation score was low although the SNR was high (e.g. last
6 min of Fig. 8). Due to these properties, any exclusion/inclusion
criteria of units using a threshold based on the SNR scores will
lead either to inclusion of units with a low isolation score or
to exclusion of units with a high isolation score. Finally we
compared the isolation score and the sum of false positive and
negative scores (Fig. 10b) and found that when the isolation
score was high, the variability of the sum was low and when
the score was low the variability of the sum of the classification
error scores was high. This again shows that the classification
error scores are a refinement of the isolation score only when
it is high, and further that when the isolation score is low these
scores are less reliable and should not be used.
4. Discussion
We quantified the quality of spike detection and sorting using
signal-to-noise ratios (SNR), isolation scores, and classification
error scores. We then simulated errors for validating the scores,
compared several spike sorting algorithms, and investigated the
parameter space of the scores.
4.1. Related studies
Some previous studies have quantified the quality of clustering of recordings from multi-channel electrodes. These methods
can be adapted to single channel recordings. In their study,
Pouzat et al. (2002) assumed a Gaussian distribution of the
noise, which they used to evaluate the variability of the spike
waveforms. Shoham et al. (2003) have argued that the Gaussian assumption is inaccurate, and that the t-distribution is a
better fit for the data. Furthermore, the distribution of noise, in
general, is not sufficient for estimating the variability of signal
statistics (Fee et al., 1996b). Even modeling the variability generated by the cells’ intrinsic properties is not always sufficient
because it does not predict the variability caused by changes in
the relative position of electrodes and recorded neurons (Fig. 1).
Schmitzer-Torbert et al. (2005) used the χ2 distribution as a
distance measure of the noise events from the spike cluster in a
feature space. In their method the distance between a noise event
and the spike cluster was treated in a global manner; i.e. the score
of each noise event depended on its distance from the center of
the spike cluster. By contrast, our scores are based on the local
properties of the spike cluster. While their approach focused on
the contribution of the noise events, our scores iterate over the
events in the spike cluster. As a result of these differences, our
isolation score captures phenomena found in non-homogeneous
spike clusters (i.e. clusters containing false positives), which
the χ2 distance does not. In addition, we can obtain an estimate of the number of false positive and false negative errors
that is not available with previous methods. Harris et al. (2001)
and Schmitzer-Torbert et al. (2005) introduced the isolation distance which quantifies the quality of clustering by the minimal
distance where the number of spike events and noise events is
equal. Although this score is “self consistent”; i.e. the score will
decrease in the same recording with a reduction of the quality
of the sorting, it does not have a global scale to differentiate
between well and poorly isolated units. For example, a well-isolated unit with a low SNR can have the same score as a poorly
isolated cluster with a high SNR. A major advantage of our isolation score is its intuitive range of zero to one, which enables
easy comparison of units recorded at different times, and even
by different research groups.
In summary, we propose the isolation score, which is a measure of the separation between two groups (clusters); and then we
present the two classification scores using a “one-class classification problem” approach. There are a few other metrics for group
separation (usually as evaluations of clustering techniques) and
classification problems (Trevor et al., 2001), e.g. metrics based
on Euclidean distance, city block distance, etc. However, we feel that
the isolation and classification scores provide better metrics for
spike data due to their insensitivity to noise cluster size.
4.2. Relationship between scores
The scores show inter-dependence. A low isolation score
is likely when the SNR is low, because low recording quality
leads to cluster errors. On the other hand, large SNR values
that appear with low isolation scores indicate problems with the
clustering algorithm. There are many possible reasons for such
isolation failure. These include assumptions in the clustering
algorithm that may not have been fulfilled; e.g. the statistical
model was wrong, the data were non-stationary or human errors
were made. In this case (high SNR, low isolation score) we
suggest re-clustering the data.
To enhance the reliability of the results of studies based on
extracellular recordings we suggest using the isolation score for
preliminary analysis and exclusion of units or periods with very
low isolation scores from the study database. We suggest that
the findings be first verified on the recordings with high isolation
scores and then extended to the entire database. We suggest
excluding units with isolation scores below 0.8 in studies whose
conclusions may be influenced by the isolation quality of the
recorded units. However, we believe that more testing is needed
for setting this threshold and hope that such a threshold will
emerge after future work is done in different recording settings
and neuronal areas. In any case, this should not limit the report
of the isolation score even when it is not used as a criterion for
excluding data.
4.3. The score under different conditions
Our simulation of spike errors using different sorting algorithms has shown that under different sorting conditions the
scores are consistent and follow the number of simulated classification errors. We showed that the isolation score decreases
with the ratio of classification errors and the classification error
scores have a range in which they follow the real error ratio.
However, applying sorting algorithms that directly reduce the
scores may lead to a bias; i.e. a high isolation score despite a
large ratio of classification errors. Nonetheless such an algorithm requires local consistency of spike clusters. Hence, we
suggest using our scores when the sorting algorithms are based
on global parameters (such as template matching and PCA based
methods). Furthermore we suggest that local consistency algorithms should be used for post processing of sorting algorithms
(see below).
We compared the scores of two different brain areas. To
enable this comparison we adjusted the time interval slightly
for representation of events. We found that the isolation scores
of GPe units were significantly larger than those of cortex units. This
was consistent with our subjective impression that GPe units
were better isolated. A major difference between GP and cortical recordings is that GP recordings are usually of only one cell
per electrode, whereas two to three units are typically recorded
by a single cortical electrode. Our isolation score methods do
not distinguish between recordings of several cells on one electrode versus single cell recordings. In both cases given a cluster
of spikes we extract the noise cluster and calculate our scores.
As a result our scores reflect the quality of each unit and not
the overall quality of all the units recorded from a given electrode.
A preliminary condition for quality assessment is the insensitivity of the isolation score to the exact size of the noise cluster.
By using different thresholds for extracting the noise cluster we
showed that once the noise cluster contains the events that are
close to the spike cluster, the score depends only slightly on the
exact size of the noise cluster. As a result our methods can be
applied to systems with intermittent sampling conditioned by
the extraction and analog sampling only of putative spikes. In
such systems it is possible to use other spike clusters, if they
exist, as the noise reference; however this may lead to
over-estimation of data quality.
4.4. Future directions
An additional benefit of classification error scores is that they
identify likely misclassified events. Our KNN approach can be
used as a post-processing tool to optimize the original spike
sorting, by flipping the classification for these missed events.
An even more promising approach would be to use our isolation
score algorithm to recluster these missed events, by using the
P(X) values (Fig. 4). Recall that this value is akin to the probability that event X belongs to the spike cluster. The re-clustering
could simply flip the classification for events X whose
P(X) value is greater than some threshold.
In this study we did not attempt to develop a method for
finding the optimal value of K in the K nearest neighbors
approach, but only constrained it. A data-driven approach, where
K depends on various parameters of the spike and noise clusters
(e.g. number of elements, overlap of the two clusters as measured
by the isolation score, etc.) may be pursued. One may consider
using two values for K, one for detecting false negatives and one
for detecting false positives.
Finally, our methods are based on analyzing the waveform
of extracellular events and did not take spike train properties
into account such as the firing rate or refractory periods. These
properties are valuable for assessing spike sorting quality and
thus can be used independently or could be incorporated into our
scores. For example, we could introduce a progressive penalty
for units with detected spikes in their estimated refractory period.
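One hypothetical form of such a penalty is sketched below; it is not part of the published method. The refractory period of 1 ms, the linear penalty and its weight are arbitrary illustrative choices, as are the function names.

```python
def refractory_violation_ratio(spike_times_s, refractory_s=0.001):
    """Fraction of inter-spike intervals shorter than the refractory period."""
    times = sorted(spike_times_s)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    if not intervals:
        return 0.0
    return sum(1 for isi in intervals if isi < refractory_s) / len(intervals)

def penalized_score(isolation_score, violation_ratio, weight=5.0):
    """Scale the score down linearly with the violation ratio, floored at 0."""
    return isolation_score * max(0.0, 1.0 - weight * violation_ratio)

# One of the four intervals (0.5 ms) violates the assumed 1 ms refractory period.
ratio = refractory_violation_ratio([0.0, 0.0005, 0.1, 0.2, 0.3])
score = penalized_score(0.9, violation_ratio=0.1)
```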
In summary, we have developed methods for quantifying the
isolation quality of extracellularly recorded action potentials
and compared these different methods. The scoring methods
were applied directly to the spike waveform; however they
may be used on other representations of the spike, e.g. PCA or
wavelet-based representations. Isolation quality quantifications
are a necessary step in interpreting studies based on extracellular recording. The conclusions of many single-units studies are
more dependent on their unit isolation quality than on the power
of the statistical and analytical methods used for their spike-train
analysis. Nevertheless, in most cases, objective criteria are used
and reported for the later stage but not for the first stages of the
data acquisition process. We encourage research groups to use
isolation measures, as developed in this manuscript, rather than
more commonly used phrases such as “only well-isolated units
were included in our study”.
Acknowledgement
This study was partly supported by a Center of Excellence
grant administered by the ISF and HUNA’s “Fighting against
Parkinson” grant.
References
Abeles M, Goldstein MHJ. Multispike train analysis. IEEE Trans Biomed Eng
1977;65:762–73.
Bar-Gad I, Ritov Y, Bergman H. Failure in identification of overlapping spikes
from multiple neuron activity causes artificial correlations. J Neurosci Methods 2001;107:1–13.
Bergman H, DeLong MR. A personal computer-based spike detector and sorter:
implementation and evaluation. J Neurosci Methods 1992;41:187–97.
Elias S, Joshua M, Goldberg JA, Heimer G, Arkadir D, Morris G, et al. Statistical
properties of pauses of the high-frequency discharge neurons in the external
segment of the globus pallidus. J Neurosci 2007;27:2525–38.
Fee MS, Mitra PP, Kleinfeld D. Automatic sorting of multiple unit neuronal signals in the presence of anisotropic and non-Gaussian variability. J Neurosci
Methods 1996a;69:175–88.
Fee MS, Mitra PP, Kleinfeld D. Variability of extracellular spike waveforms of
cortical neurons. J Neurophysiol 1996b;76:3823–33.
Goldberger J, Roweis S, Hinton G, Salakhutdinov R. Neighbourhood component analysis. Neural Inform Process Syst (NIPS’04) 2004;17:513–20.
Harris KD, Hirase H, Leinekugel X, Henze DA, Buzsaki G. Temporal interaction
between single spikes and complex spike bursts in hippocampal pyramidal
cells. Neuron 2001;32:141–9.
Heimer G, Bar-Gad I, Goldberg JA, Bergman H. Dopamine replacement
therapy reverses abnormal synchronization of pallidal neurons in the
1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine primate model of parkinsonism. J Neurosci 2002;22:7850–5.
Lewicki MS. Bayesian modeling and classification of neural signals. Neural
Comp 1994;6:1005–30.
Lewicki MS. A review of methods for spike sorting: the detection and classification of neural action potentials. Network 1998;9:R53–78.
Likhtik E, Pelletier JG, Paz R, Pare D. Prefrontal control of the amygdala. J
Neurosci 2005;25:7429–37.
Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H. Coincident but distinct
messages of midbrain dopamine and striatal tonically active neurons. Neuron
2004;43:133–43.
Nenadic Z, Burdick JW. Spike detection using the continuous wavelet transform.
IEEE Trans Biomed Eng 2005;52:74–87.
Pare D, Gaudreau H. Projection cells and interneurons of the lateral and basolateral amygdala: distinct firing patterns and differential relation to theta and
delta rhythms in conscious cats. J Neurosci 1996;16:3334–50.
Pouzat C, Delescluse M, Viot P, Diebolt J. Improved spike-sorting by modeling
firing statistics and burst-dependent spike amplitude attenuation: a Markov
chain Monte Carlo approach. J Neurophysiol 2004;91:2910–28.
Pouzat C, Mazor O, Laurent G. Using noise signature to optimize spike-sorting and to assess neuronal classification quality. J Neurosci Methods
2002;122:43–57.
Quiroga RQ, Nadasdy Z, Ben Shaul Y. Unsupervised spike detection and
sorting with wavelets and superparamagnetic clustering. Neural Comput
2004;16:1661–87.
Schmitzer-Torbert N, Jackson J, Henze D, Harris K, Redish AD. Quantitative
measures of cluster quality for use in extracellular recordings. Neuroscience
2005;131:1–11.
Shoham S, Fellows MR, Normann RA. Robust, automatic spike sorting using mixtures of multivariate t-distributions. J Neurosci Methods
2003;127:111–22.
Trevor H, Robert T, Jerome F. The elements of statistical learning: data mining,
inference and prediction. New York: Springer Verlag; 2001.
Vapnik VN. Statistical learning theory. New York: Wiley; 1998.
Wood F, Black MJ, Vargas-Irwin C, Fellows M, Donoghue JP. On the variability
of manual spike sorting. IEEE Trans Biomed Eng 2004;51:912–8.
Worgotter F, Daunicht WJ, Eckmiller R. An on-line spike form discriminator for
extracellular recordings based on an analog correlation technique. J Neurosci
Methods 1986;17:141–51.
Discussion
In this thesis I studied the responses of different basal ganglia neurons to rewarding
and aversive related events. The results are summarized in a series of peer-reviewed
journal manuscripts. The main findings of these manuscripts are discussed below.
In the first paper included in this thesis (64) I found that the rate modulations of striatal
tonically active neurons (TANs) and dopaminergic neurons in response to the expectation of reward
were larger than the modulations that followed predictions of aversive events.
Furthermore, these neurons encode the expectation level (or the prior probability) of
reward better than the expectation of aversive events. Finally, TAN responses were
not coincident with dopaminergic neuron responses in all trial epochs. More
specifically, dopaminergic neurons encode the difference between reward and
aversive trials in the cue and outcome epoch whereas the TAN population encodes
this difference in the outcome and no-outcome epochs. Therefore complementary
coding of dopaminergic neurons and TANs expands the encoding scope of the basal
ganglia neuromodulators.
In the second paper (65) and third chapter I extended the first study to the
investigation of the responses of basal ganglia main axis neurons to expectation,
delivery and omission of appetitive (food), aversive (airpuff) and neutral (sound only)
events. I found that the responses of GPe, GPi and SNr neurons were longer in
duration and less stereotypic than the responses of the main basal ganglia
neuromodulators. As with the TANs and dopaminergic neurons, the responses of the
basal ganglia main axis neurons were larger and usually encoded reward better than
aversive related events. I found substantial differences between the three populations
of basal ganglia main axis neurons. Most notably, SNr responses were more frequent,
had shorter latencies, and encoded the airpuff delivery better than the corresponding
responses of GPe and GPi neurons.
In the fourth chapter (66) I used pairwise correlation analysis. I showed that the
average responses of neuromodulators (TANs and dopaminergic neurons) tended to
have a positive response correlation (i.e., similar time pattern). In comparison to the
homogeneous responses of the basal ganglia modulators, the neurons of the basal
ganglia main axis had diverse responses. Pairs of dopaminergic neurons, as well as
pairs of TANs, dynamically modulate their discharge variation in accordance with
events in the behavioral task. The synchronization between dopaminergic neurons
increased after the cue and outcome events whereas synchronization of TANs
decreased just before cue offset. Furthermore, although the discharge rate of the
dopaminergic neurons increased both in reward and aversive trials, their
synchronization increased only in the reward trials. Similarly, the dynamic changes in
synchronization of TAN pairs were not coincident with their discharge rate
modulation. Finally in the fifth chapter (67) I developed a method for quantification
of the quality of extracellular recording. This method was used in the analyses in all
the result chapters.
Asymmetry in the encoding of values in the basal ganglia
Asymmetric encoding of positive and negative expectations by the basal ganglia
I found that before the end of the cue presentation, the fraction of trials in which the
monkey licked in expectation of a future reward and the fraction of trials in which the
monkey blinked in expectation of a future airpuff were similar. In addition I found a
large blinking response even when the airpuff was omitted. Finally, with the
exception of the outcome epoch, the licking and the blinking behavior reflected the
expected (low vs. high) probability of the reward and the aversive events.
Nevertheless the basal ganglia single cell activity was found to be biased toward the
encoding of reward related events, and encoding of aversive events was very weak.
Several studies have used similar paradigms to compare neural responses to rewarding
food and an aversive airpuff (56, 68, 69). Paton et al. showed that in the amygdala,
expectations of food and airpuff are represented symmetrically. My research shows
that, in contrast to the amygdala, food and airpuff expectations are represented
asymmetrically in the basal ganglia. Thus, I found comparable aversive and reward
related behavior. However, whereas the activity in the basal ganglia strongly reflects
reward behavior and encodes reward probability, aversive related events and their
probability are only weakly encoded in basal ganglia activity.
Although I found similarity in the behavioral responses, in this study I did not
calibrate the subjective value (utility) of food vs. airpuff. I did, however, manipulate
the expectation of aversive outcome. In previous instrumental conditioning
experiments including both reward and aversive events the monkey could avoid the
aversive airpuff by a correct response (56, 61, 70). In the current experiment the
airpuff was unavoidable and hence the aversive cue led to direct expectation of
aversion.
Discussion
In the first paper (64), I reported that the responses of midbrain dopaminergic neurons
and striatal TANs are biased towards the encoding of rewarding events and in the
second I found a similar result for the main axis neurons (71). The basal ganglia main
axis is affected by additional neuromodulator systems, e.g., serotonin (72).
Theoretical studies have suggested that the phasic serotonin signal might report the
prediction error for future punishment (73, 74) and therefore could compensate for the
biased encoding of the value domain by the TANs and the dopaminergic neurons. The
current study of the basal ganglia output structures indicates that the basal ganglia
main axis neurons have a similar bias toward control of reward related behavior as
TANs and dopaminergic neurons. Thus, even if there are other basal ganglia
modulators than the cholinergic and dopaminergic striatal inputs, the activity of basal
ganglia output neurons follows the same trend as the TANs and dopaminergic neurons
and is biased towards rewarding events. I therefore suggest that the other modulators
do not extend the basal ganglia encoding to aversive events and that there are other
neuronal systems than the basal ganglia that have control over aversive related
behavior.
Encoding of Dopaminergic neurons
Dopaminergic neurons encode more than reward prediction errors
Recent studies have shown that dopaminergic neuron activity encodes the mismatch
between prediction and reality. Most of these studies have focused on the mismatch in
the positive domain; i.e., when conditions are better than expected (25).
Dopaminergic neurons typically increase their discharge rate in response to appetitive
predictive cues and outcomes. In line with the predictions of reinforcement learning
theories, the discharge of dopaminergic neurons decreases with omission of predicted
rewards (29, 51, 52). However, this discharge suppression is limited since the
neuronal firing rate is truncated at zero. Several groups (27, 28) have reported that the
instantaneous firing of dopaminergic neurons does not demonstrate incremental
encoding of reward omission, and it was suggested that omission is encoded by
duration of the discharge decrease (53). In this experiment, however, I failed to find
any significant coding of reward omission by response amplitude or duration.
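The truncation argument above can be illustrated with a minimal sketch. It assumes a
hypothetical linear rate code in which the firing rate equals a baseline plus a gain
times the temporal-difference error; the baseline (5 spikes/s), the gain and the
function names are illustrative assumptions, not values estimated in this study.

```python
import numpy as np

def td_error(reward, value_next, value_current, gamma=1.0):
    """Temporal-difference error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current

def rate_response(delta, baseline=5.0, gain=20.0):
    """Hypothetical linear rate code for delta, truncated at zero spikes/s."""
    return np.maximum(0.0, baseline + gain * delta)

# Cue value equals the reward probability; the trial ends at the outcome,
# so the value of the next state is taken as zero (assumption).
probabilities = np.array([0.25, 0.50, 0.75])
omission_delta = td_error(reward=0.0, value_next=0.0, value_current=probabilities)
delivery_delta = td_error(reward=1.0, value_next=0.0, value_current=probabilities)

omission_rate = rate_response(omission_delta)
delivery_rate = rate_response(delivery_delta)
print("delivery rates:", delivery_rate)
print("omission rates:", omission_rate)
```

With these assumed parameters the delivery responses scale with the prediction error
(20, 15 and 10 spikes/s), whereas all three omission responses are clipped to zero, so
the response amplitude alone cannot distinguish between the omission conditions.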
Naïve reinforcement learning models categorize events as having positive or negative
errors and would suggest opposite sign modulation to reward and aversive trials (25).
However I found similar trends for dopaminergic neuron responses to predictions,
outcomes and omission of reward and aversive related events (64). In particular I
found a substantial increase to both reward and aversive outcomes. Furthermore, the
responses of the dopaminergic neurons to reward omission and aversive outcome
were very different (decrease vs. increase) although in both cases there was a negative
reinforcement error.
To summarize, the results reveal an increase in the complexity of the dopaminergic
neuron encoding of value. This does not rule out their role in the temporal difference
hypothesis. On the contrary, my working hypothesis holds that the discharge rate of
dopaminergic neurons and TANs reflects changes in reward prediction as well as
changes in attention/arousal levels (54, 75, 76).
Reward related increase in the synchronization of dopaminergic neurons
I showed that in a classical conditioning task, the activity of the dopaminergic neurons
also increases following non-rewarding events such as the prediction and delivery of
airpuffs (64). Nonetheless, I found an increase in the noise correlation of
dopaminergic neurons to expectation and delivery of reward and not to other events
(66). These findings indicating a reward related increase of noise correlation extend
previous findings of unspecific spike to spike (noise) correlations of dopaminergic
neurons (28, 77).
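The distinction between correlated tuning and correlated trial-to-trial variability
can be sketched as follows. This is a simplified illustration that uses trial-wise
responses grouped by condition; the noise correlation reported here was based on
spike-to-spike correlations, so the function below is an assumed, commonly used
stand-in rather than the analysis of this study. The function name and the synthetic
data are hypothetical.

```python
import numpy as np

def signal_and_noise_correlation(resp_a, resp_b):
    """resp_a, resp_b: arrays of shape (n_conditions, n_trials) recorded simultaneously.

    Signal correlation: Pearson correlation of the condition-mean responses.
    Noise correlation: Pearson correlation of the trial-by-trial residuals
    after subtracting each condition's mean (pooled over conditions).
    """
    resp_a = np.asarray(resp_a, dtype=float)
    resp_b = np.asarray(resp_b, dtype=float)
    mean_a = resp_a.mean(axis=1, keepdims=True)
    mean_b = resp_b.mean(axis=1, keepdims=True)
    signal = np.corrcoef(mean_a.ravel(), mean_b.ravel())[0, 1]
    noise = np.corrcoef((resp_a - mean_a).ravel(), (resp_b - mean_b).ravel())[0, 1]
    return signal, noise

# Synthetic pair: identical tuning plus shared and private trial-to-trial noise.
rng = np.random.default_rng(0)
tuning = np.array([2.0, 5.0, 9.0, 14.0])[:, None]
shared = rng.normal(size=(4, 200))
resp_a = tuning + shared + rng.normal(size=(4, 200))
resp_b = tuning + shared + rng.normal(size=(4, 200))
signal_corr, noise_corr = signal_and_noise_correlation(resp_a, resp_b)
print(signal_corr, noise_corr)
```

Because the condition means are removed before the residuals are correlated, a change
in rate does not by itself change the noise correlation, which is why the two measures
can dissociate.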
The modulations of the noise correlation were small compared to the modulations of
rate. In a recent study Schneidman et al. (78) have shown that weak pairwise
correlations may imply a strongly correlated network, and provide an effective
description of the system. It is unclear whether pairwise correlations give an effective
description of the dopaminergic neurons since current recording methods do not
enable in vivo simultaneous recording of many neurons, yet that study demonstrates the
potential importance of the noise correlations.
Although I found some overlap between noise correlations and rate modulations, they
were dissociated. There were periods with modulation of rate but not of the noise
correlation.
Comparing basal ganglia subpopulations
Comparing neuromodulators - TANs do not mirror the dopaminergic neurons
responses
The anatomical demonstration of dopaminergic innervation of striatal cholinergic
interneurons (79) and the suppression of acetylcholine efflux from striatal slices by
dopamine (80) suggest that dopaminergic neurons directly inhibit TANs (41). TANs
might mediate the dopaminergic message to the D1 and D2 dopamine receptor
containing striatal projection neurons. The opposite and coincident responses of the
TANs and dopaminergic neurons to predictive cues support direct inhibition.
However TAN responses at the terminal stage of the trial include major positive
deflections which do not mirror any phase of the dopaminergic response. Notably,
following outcome omission, dopaminergic neurons respond similarly to the neutral
outcome, reward and airpuff omissions, whereas the TANs robustly discriminate
between the three events. Thus, dopaminergic neurons may better encode the cue
predicting events and the TANs may provide more information at the completion of
the trial. This is consistent with the findings of sub-populations of striatal projection
neurons with selective evaluative encoding of trial results (42, 81). In any case, these
differential responses indicate that the TAN discharge is not totally governed by its
dopaminergic inputs; neither are the TANs and dopaminergic neurons driven by a
common source (82) with opposite effects on the two systems.
In addition to the differences between single cell activity of TANs and dopaminergic
neurons I found differences in the pattern of the correlation between pairs of cells in
these populations (66). I found that the noise correlation of the dopaminergic neurons
increases whereas the correlation for the TANs decreases. Thus, it is possible that
the increased correlation of the dopaminergic neurons and the decorrelation of the TANs
enable an increase and a decrease, respectively, in the effective concentrations of
striatal dopamine and acetylcholine. The right balance between basal ganglia
neuromodulators and cortico-striatal activity may lead to a maximization of
information in the basal ganglia main axis and an optimal behavioral policy.
Comparing main axis populations - different response characteristics of the main
axis nuclei
In this study I found several major differences between the GPe, GPi and the SNr
(71). I found more intense changes in the responses of the SNr compared to the
responses of the GPe and the GPi. SNr neurons responded with shorter latencies to the
cue, and encoded the airpuff outcome better than the pallidal neurons. A simple
explanation for the enhanced encoding is the orofacial (licking and blinking) motor
behavior of the monkeys in this experiment. Initial studies emphasized the role of the
SNr in the control of orofacial movements (83, 84). Although this separation is not
clear cut (85, 86) the results may reflect this organization. Thus the small and less
frequent responses in the GPi could reflect the smaller representation of orofacial
movements in the GPi. This could also account for the activation of the SNr to
aversive events, but as noted above this does not explain the asymmetric value
representation in the SNr.
At the circuitry level, one possibility is that the origins of the difference in pallidal vs.
SNr responses could be a result of different projections from the striatum or the STN
(6). Another possibility is that the GPe has different pathways to the GPi and SNr and
those GPe neurons that do project to the SNr are the neurons with the short latency
and larger response. Nevertheless I did not find any topographic organization in the
responses of the GPe that supports this hypothesis. Finally another putative
explanation for the differences between the GPi and the SNr is the direct effects of
somatodendritic release of dopamine on SNr, but not on pallidal neurons. The similar
latencies of SNc and SNr responses support the hypothesis that SNc neurons may
drive SNr responses by somatodendritic release of dopamine (87, 88).
Finally, the neural recordings were made after the monkey was highly familiar with
the task and hence activity might not be the same as activity that occurs during
learning. Previous studies of dopaminergic neurons have shown that activity on a
familiar probabilistic task resembles the activity in a learning task (26, 28, 29). An
fMRI study has shown that striatal activity underlies novelty-based choice in humans
(89). Whether this is the case for other basal ganglia populations and the single cell
activity that underlies novelty representation should be investigated in future studies.
Comparing main axis and neuromodulators - Phasic response of
neuromodulators vs. long lasting response of main axis
In contrast to the short (<0.7 s) responses of the basal ganglia modulators (28, 64, 90,
91), the responses of the basal ganglia main axis high frequency discharge neurons
lasted throughout the two second cue epochs. This is in line with previous
descriptions of pallidal (50) and SNr (85) responses. Long duration, set-related
responses have frequently been described in the cortex (92-94) where they have been
attributed to short term memory or action preparation processes. I cannot rule out
similar processes in the basal ganglia and the experimental design could not dissociate
set-related vs. cue-evoked responses. However, the encoding of probability by the
basal ganglia main axis neurons and the dissociation between actions and neural
response (for example no neural encoding of the probability of aversive trials, the
early decay of the neural activity compared to licking behavior after reward delivery)
suggests that the activity of these neurons may encode the value of the current state or
state-action pairs (42, 45).
The tonic high frequency discharge rate of neurons (population average: 45-88
spikes/s in this study) endows them with a better dynamic range for responses with a
decrease in discharge rate. Nevertheless, consistent with many previous studies
(95-98) I found that the high frequency discharge neurons respond to behavioral events
more frequently with increases than with decreases in discharge rate. The latencies
and the temporal distribution of the responses with increases and decreases in
discharge rate were similar, thus leading to highly diverse basal ganglia encoding,
with different polarities and different amplitudes of responses. The difference
between the population responses, which showed no encoding of the a-priori probability
of outcome, and the single unit encoding of this probability is in line with such a
balanced diversity of responses of basal ganglia single units. These diverse responses augment
the information capacity of the basal ganglia output structure (99).
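The cancellation described above can be illustrated with a toy population. The slopes
and the 60 spikes/s baseline are hypothetical values chosen for illustration (the
baseline lies within the 45-88 spikes/s range reported here); they are not fitted to
the recorded data.

```python
import numpy as np

# Outcome probabilities signalled by the cues.
probabilities = np.array([0.25, 0.50, 0.75])

# Hypothetical units: half increase and half decrease their rate with probability.
slopes = np.array([8.0, -8.0, 4.0, -4.0])   # spikes/s per unit of probability
baseline = 60.0                              # spikes/s, assumed tonic rate

# rates[i, j] is the rate of unit i for probability j.
rates = baseline + slopes[:, None] * (probabilities[None, :] - 0.5)

population_mean = rates.mean(axis=0)
single_unit_modulation = rates.max(axis=1) - rates.min(axis=1)
print("population mean per probability:", population_mean)
print("modulation depth per unit:", single_unit_modulation)
```

Every unit in this sketch is modulated by probability, yet the population average is
flat because responses of opposite polarity are balanced.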
Correlations of the average response set neuromodulators apart from the main
axis
Previous studies have observed that different neuromodulator cells have responses
with similar temporal patterns (40, 91). In the fourth chapter (66) I quantified the
similarity of the temporal pattern of the response (response correlation) and the
similarity of the encoding of different events (signal correlation). I showed that as
opposed to the basal ganglia neuromodulators, the main axis responses are diverse.
The homogeneous and synchronized responses of the neuromodulators suggest that
these populations as a whole provide the main axis with a scalar message; i.e., the
encoding of different dopaminergic neurons, as well as of different TANs, is similar.
On the other hand the diversity of the main axis responses suggests that its activity is
highly independent, which is conducive to a large information capacity (99). The
contrast between the diversity of main axis response and the homogeneity of
modulators was demonstrated in a behavioral task with 18 different events.
Nevertheless, I cannot rule out the possibility that the recording of neural activity
during other tasks or over greater spatial distances (including dopaminergic neurons
in the ventral tegmental area and TANs in the caudate or ventral striatum) may reveal
other effects. Future studies using a large variety of tasks and wider sampling of basal
ganglia neurons should test the consistency and the spatial extent of the homogeneity
of the basal ganglia modulators.
Based mainly on the activity of the dopaminergic neurons it was suggested that the
basal ganglia implements a reinforcement learning algorithm (25). The distinction
between the correlation properties of neuromodulators and the main axis is in line
with the idea that these populations have a different role in the reinforcement learning
algorithm. The neuromodulator scalar response is consistent with these neurons being
the teacher (e.g., a critic) of this system and the diversity of the main axis is in
agreement with it being the executor of the system (e.g., the actor) which requires
specificity in the encoding of the different neuronal elements.
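The division of labor between a scalar teacher and a specific executor can be sketched
with a textbook actor-critic learner (2). The task below (one state, two actions with
different reward probabilities), the learning rates and all names are assumptions made
for illustration; the sketch is not a model of the recorded populations.

```python
import numpy as np

def run_actor_critic(reward_prob, n_trials=2000, alpha_critic=0.1,
                     alpha_actor=0.1, seed=0):
    """One-state actor-critic; returns the learned state value and action preferences."""
    rng = np.random.default_rng(seed)
    value = 0.0                          # critic: a single scalar state value
    prefs = np.zeros(len(reward_prob))   # actor: one preference per action
    for _ in range(n_trials):
        # Softmax policy over the actor's preferences.
        policy = np.exp(prefs - prefs.max())
        policy /= policy.sum()
        action = rng.choice(len(prefs), p=policy)
        reward = float(rng.random() < reward_prob[action])
        # Scalar prediction error, identical for every element of the actor.
        delta = reward - value
        value += alpha_critic * delta
        # Action-specific update: gradient of the log-policy for the chosen action.
        grad = -policy
        grad[action] += 1.0
        prefs += alpha_actor * delta * grad
    return value, prefs

value, prefs = run_actor_critic([0.2, 0.8])
print("state value:", value)
print("action preferences:", prefs)
```

The same scalar error reaches every actor element, while the action-specific term
determines which preference changes, mirroring the contrast between a homogeneous
teaching signal and a diverse executor.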
The basal ganglia in control of motor behavior
In the previous sections I have discussed the different results of my thesis; in this
section I will try to unify the different results under one framework.
The response to silent non rewarding events
I found that responses to expectation of aversive events are similar to the responses to
the neutral cue. Nevertheless, in many cells I did see a large (but similar) response to
both of these events (64, 65). In addition, I found many neurons that respond to the
aversive outcome with a short-duration response. In the following section I discuss the
possibility that the responses to events represent two modes of activity in the basal
ganglia: a fast component that encodes saliency and another component that is
selective for rewarding events. I suggest that both of these phases have origins in the
dynamic response of the dopaminergic signal.
The dopaminergic signal may provide different messages to different basal
ganglia pathways
Close examination of the response of the dopaminergic neurons reveals that many of
these cells have a bi-phasic response with an increase that is followed by a decrease in
activity (51, 64). The increase in activity for non rewarding events is brief (64, 66),
whereas the increase to the rewarding events is longer. Dopamine transmission is not
limited to classical synaptic action since it may also diffuse and reach extra synaptic
receptors (23, 100). The coordinated burst of dopaminergic neurons leads to large
extra synaptic dopamine concentrations (101), and a pause in activity of dopaminergic
neurons induces a decrease in the extracellular dopamine level (102). This suggests
that the bi-phasic response enables fast increase in the extra synaptic dopamine that is
followed by a fast clearance. The larger, longer (64) and more synchronized (66)
response to the rewarding events may selectively potentiate the transmission of
dopamine in the striatum for rewarding events (103). The augmented dopamine
release to rewarding events would enable summation of the extracellular dopamine
released for adjacent synapses. The brief unsynchronized increase for the non
rewarding events may lead to a fast localized increase in dopamine that will then
rapidly decrease and lead to a wide-ranging decrease in extra synaptic dopamine.
Based on the reuptake, diffusion of dopamine and the affinity of the different
dopamine receptors Cragg and Rice (100) have calculated the sphere of influence of a
single release. They concluded that D1 receptors are activated at short distances (<2
µm) from the dopamine release site while the D2 receptors are activated at longer
distances (<7 µm). The biphasic response may lead to differences in the activation of
the receptor types; i.e., the first phase of the response reaches the close D1 and D2
receptors while the late increase (which is limited to rewarding events) influences the
remote high affinity D2 receptors. Together with the previous assumption of fast
clearance for non rewarding events, this difference in the field of effect suggests that
the fast direct D1 pathway receives only the fast (saliency) signal and that the slow
indirect D2 pathway is activated by both the fast and the slow dopamine signals.
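Taking the two radii quoted above as upper bounds, the volume over which a single
release could reach each receptor type follows from the sphere volume formula. The
sketch below is plain arithmetic under the assumption of ideal, isotropic spheres; it
is not a reimplementation of the diffusion model of Cragg and Rice (100).

```python
import math

def sphere_volume(radius_um):
    """Volume (cubic micrometers) of a sphere with the given radius (micrometers)."""
    return 4.0 / 3.0 * math.pi * radius_um ** 3

d1_radius_um = 2.0   # upper bound quoted for D1 activation
d2_radius_um = 7.0   # upper bound quoted for D2 activation

d1_volume = sphere_volume(d1_radius_um)
d2_volume = sphere_volume(d2_radius_um)
volume_ratio = d2_volume / d1_volume   # equals (7 / 2) ** 3
print(d1_volume, d2_volume, volume_ratio)
```

Under this idealization the D2 sphere of influence is (7/2)^3, roughly 43 times, larger
in volume than the D1 sphere.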
Two time scales for controlling motor behavior
Control of behavior at short latencies has the advantage of responding online which is
essential when the environment is rapidly changing. On the other hand, fast responses
may lead to errors which might be avoided with more prolonged processing of
information. Theories of the function of the basal ganglia have suggested that the
main role of their output is to open a gate to enable motor behavior (104). The
dynamic response enables fast non selective opening of the motor gate (perhaps in the
fast direct pathway). These fast motor responses are probably necessary for a rapidly
changing environment with salient events. Accordingly, we found that many neurons
in the output of the basal ganglia have a fast response to the air puff delivery (65). The
second phase of the dopaminergic response enables selective behavioral responses
which might require planning for maximization of future reward. Indeed, we found
significant encoding of reward expectation, but hardly any representation of
expectation of air puff in the output stages of the basal ganglia (65).
Diseases of the basal ganglia cause severe motor and cognitive impairments (12).
These symptoms can be divided into those in which behavior is unrestrained
(positive symptoms; e.g., impulsivity, gambling, obsessive behavior, dyskinesia and
tremor) and those in which behavior is over-restrained (negative symptoms; e.g.,
bradykinesia, akinesia). Previous action selection models of the basal ganglia (104)
hold that akinesia is due to dopamine depletion and closure of the basal ganglia gate,
but bradykinesia is not easily explained. The convergence of high and low order
information in the basal ganglia suggests that these clinical symptoms may reflect
impairments in the ability of the basal ganglia to trade off between the fast and slow
pathways. Under this view, unrestrained behavior is due to overactivity in the fast
pathways and over-restrained behavior is due to over-processing in the slow
pathways.
Bibliography
1. Marr,D. (1983) Vision: A Computational Investigation into the Human
Representation and Processing of Visual Information, W. H. Freeman
2. Sutton,R.S. and Barto,A.G. (1998) Reinforcement learning - an introduction, The
MIT Press
3. Bar-Gad,I. and Bergman,H. (2001) Stepping out of the box: information
processing in the neural networks of the basal ganglia. Curr. Opin. Neurobiol. 11,
689-695
4. Gurney,K. et al. (2004) Computational models of the basal ganglia: from robots
to membranes. Trends Neurosci. 27, 453-459
5. Parent,A. and Hazrati,L.N. (1995) Functional anatomy of the basal ganglia. I. The
cortico-basal ganglia-thalamo-cortical loop. Brain Res. Rev. 20, 91-127
6. Haber,S.N. and Gdowski,M.J. (2004) The Basal Ganglia. In The Human Nervous
System (Second edn) (Paxinos,G. and Mai,J.K., eds), pp. 676-738, Elsevier
7. Albin,R.L. et al. (1989) The functional anatomy of basal ganglia disorders.
Trends Neurosci. 12, 366-375
8. Tepper,J.M. et al. (2004) GABAergic microcircuits in the neostriatum. Trends
Neurosci. 27, 662-669
9. Aosaki,T. et al. (1994) Responses of tonically active neurons in the primate's
striatum undergo systematic changes during behavioral sensorimotor conditioning.
J. Neurosci. 14, 3969-3984
10. Wilson,C.J. et al. (1990) Firing patterns and synaptic potentials of identified giant
aspiny interneurons in the rat neostriatum. J. Neurosci. 10, 508-519
11. Tepper,J.M. and Bolam,J.P. (2004) Functional diversity and specificity of
neostriatal interneurons. Curr. Opin. Neurobiol. 14, 685-692
12. DeLong,M.R. (1990) Primate models of movement disorders of basal ganglia
origin. Trends Neurosci. 13, 281-285
13. Gerfen,C.R. et al. (1990) D1 and D2 dopamine receptor-regulated gene
expression of striatonigral and striatopallidal neurons. Science 250, 1429-1432
14. Shen,W. et al. (2008) Dichotomous dopaminergic control of striatal synaptic
plasticity. Science 321, 848-851
15. Surmeier,D.J. and Kitai,S.T. (1994) Dopaminergic regulation of striatal efferent
pathways. Curr. Opin. Neurobiol. 4, 915-919
16. Levesque,M. and Parent,A. (2005) The striatofugal fiber system in primates: a
reevaluation of its organization based on single-axon tracing studies. Proc. Natl.
Acad. Sci. U. S. A 102, 11888-11893
17. Nadjar,A. et al. (2006) Phenotype of striatofugal medium spiny neurons in
parkinsonian and dyskinetic nonhuman primates: a call for a reappraisal of the
functional organization of the basal ganglia. J. Neurosci. 26, 8653-8661
18. Feger,J. et al. (1994) The projections from the parafascicular thalamic nucleus to
the subthalamic nucleus and the striatum arise from separate neuronal populations:
a comparison with the corticostriatal and corticosubthalamic efferents in a
retrograde fluorescent double- labelling study. Neuroscience 60, 125-132
19. Nambu,A. et al. (2002) Functional significance of the cortico-subthalamo-pallidal
'hyperdirect' pathway. Neurosci. Res. 43, 111-117
20. Bolam,J.P. et al. (2000) Synaptic organisation of the basal ganglia. J Anat. 196,
527-542
21. Reynolds,J.N. et al. (2001) A cellular mechanism of reward-related learning.
Nature 413, 67-70
22. Calabresi,P. et al. (2000) Acetylcholine-mediated modulation of striatal function.
Trends Neurosci. 23, 120-126
23. Arbuthnott,G.W. and Wickens,J. (2007) Space, time and dopamine. Trends
Neurosci. 30, 62-69
24. Jellinger,K.A. (1991) Pathology of Parkinson's disease. Changes other than the
nigrostriatal pathway. Mol. Chem. Neuropathol. 14, 153-197
25. Schultz,W. et al. (1997) A neural substrate of prediction and reward. Science 275,
1593-1599
26. Hollerman,J.R. and Schultz,W. (1998) Dopamine neurons report an error in the
temporal prediction of reward during learning. Nat. Neurosci. 1, 304-309
27. Bayer,H.M. and Glimcher,P.W. (2005) Midbrain dopamine neurons encode a
quantitative reward prediction error signal. Neuron 47, 129-141
28. Morris,G. et al. (2004) Coincident but distinct messages of midbrain dopamine
and striatal tonically active neurons. Neuron 43, 133-143
29. Fiorillo,C.D. et al. (2003) Discrete coding of reward probability and uncertainty
by dopamine neurons. Science 299, 1898-1902
30. Tobler,P.N. et al. (2005) Adaptive coding of reward value by dopamine neurons.
Science 307, 1642-1645
31. Waelti,P. et al. (2001) Dopamine responses comply with basic assumptions of
formal learning theory. Nature 412, 43-48
32. Tobler,P.N. et al. (2003) Coding of predicted reward omission by dopamine
neurons in a conditioned inhibition paradigm. J. Neurosci. 23, 10402-10410
33. Pan,W.X. et al. (2005) Dopamine cells respond to predicted events during
classical conditioning: evidence for eligibility traces in the reward-learning
network. J. Neurosci. 25, 6235-6242
34. Satoh,T. et al. (2003) Correlated coding of motivation and outcome of decision
by dopamine neurons. J. Neurosci. 23, 9913-9923
35. Nakahara,H. et al. (2004) Dopamine neurons can represent context-dependent
prediction error. Neuron 41, 269-280
36. D'Ardenne,K. et al. (2008) BOLD responses reflecting dopaminergic signals in
the human ventral tegmental area. Science 319, 1264-1267
37. Fiorillo,C.D. et al. (2008) The temporal precision of reward prediction in
dopamine neurons. Nat. Neurosci.
38. Centonze,D. et al. (2003) Dopamine, acetylcholine and nitric oxide systems
interact to induce corticostriatal synaptic plasticity. Rev. Neurosci. 14, 207-216
39. Barbeau,A. (1962) The pathogenesis of Parkinson's disease: A new hypothesis.
Canad. Med. Ass. J. 87, 802-807
40. Graybiel,A.M. et al. (1994) The basal ganglia and adaptive motor control.
Science 265, 1826-1831
41. Wang,Z. et al. (2006) Dopaminergic control of corticostriatal long-term synaptic
depression in medium spiny neurons is mediated by cholinergic interneurons.
Neuron 50, 443-452
42. Lau,B. and Glimcher,P.W. (2007) Action and outcome encoding in the primate
caudate nucleus. J. Neurosci. 27, 14502-14514
43. Apicella,P. et al. (1992) Neuronal activity in monkey striatum related to the
expectation of predictable environmental events. J. Neurophysiol. 68, 945-960
44. Lauwereyns,J. et al. (2002) A neural correlate of response bias in monkey
caudate nucleus. Nature 418, 413-417
45. Samejima,K. et al. (2005) Representation of action-specific reward values in the
striatum. Science 310, 1337-1340
46. Turner,R.S. and Anderson,M.E. (2005) Context-dependent modulation of
movement-related discharge in the primate globus pallidus. J. Neurosci. 25,
2965-2976
47. Gdowski,M.J. et al. (2001) Context dependency in the globus pallidus internal
segment during targeted arm movements. J Neurophysiol 85, 998-1004
48. Handel,A. and Glimcher,P.W. (2000) Contextual modulation of substantia nigra
pars reticulata neurons. J. Neurophysiol. 83, 3042-3048
49. Pasquereau,B. et al. (2007) Shaping of motor responses by incentive values
through the basal ganglia. J. Neurosci. 27, 1176-1183
50. Arkadir,D. et al. (2004) Independent coding of movement direction and reward
prediction by single pallidal neurons. J. Neurosci. 24, 10047-10056
51. Schultz,W. et al. (1993) Responses of monkey dopamine neurons to reward and
conditioned stimuli during successive steps of learning a delayed response task. J.
Neurosci. 13, 900-913
52. Matsumoto,M. and Hikosaka,O. (2007) Lateral habenula as a source of negative
reward signals in dopamine neurons. Nature 447, 1111-1115
53. Bayer,H.M. et al. (2007) Statistics of midbrain dopamine neuron spike trains in
the awake primate. J. Neurophysiol. 98, 1428-1439
54. Horvitz,J.C. (2000) Mesolimbocortical and nigrostriatal dopamine responses to
salient non-reward events. Neuroscience 96, 651-656
55. Guarraci,F.A. and Kapp,B.S. (1999) An electrophysiological characterization of
ventral tegmental area dopaminergic neurons during differential pavlovian fear
conditioning in the awake rabbit. Behav. Brain Res. 99, 169-179
56. Mirenowicz,J. and Schultz,W. (1996) Preferential activation of midbrain
dopamine neurons by appetitive rather than aversive stimuli. Nature 379, 449-451
57. Ungless,M.A. et al. (2004) Uniform inhibition of dopamine neurons in the ventral
tegmental area by aversive stimuli. Science 303, 2040-2042
58. Coizet,V. et al. (2006) Nociceptive responses of midbrain dopaminergic neurones
are modulated by the superior colliculus in the rat. Neuroscience 139, 1479-1493
59. Brown,M.T. et al. (2009) Activity of neurochemically heterogeneous
dopaminergic neurons in the substantia nigra during spontaneous and driven
changes in brain state. J. Neurosci. 29, 2915-2925
60. Ravel,S. et al. (2003) Responses of tonically active neurons in the monkey
striatum discriminate between motivationally opposing stimuli. J. Neurosci. 23,
8489-8497
61. Yamada,H. et al. (2004) Tonically active neurons in the primate caudate nucleus
and putamen differentially encode instructed motivational outcomes of action. J.
Neurosci. 24, 3500-3510
62. Martin,R.F. and Bowden,D.M. (2000) Primate Brain Maps: Structure of the
Macaque Brain, Elsevier Science
63. Szabo,J. and Cowan,W.M. (1984) A stereotaxic atlas of the brain of the
cynomolgus monkey (Macaca fascicularis). J Comp Neurol. 222, 265-300
64. Joshua,M. et al. (2008) Midbrain dopaminergic neurons and striatal cholinergic
interneurons encode the difference between reward and aversive events at
different epochs of probabilistic classical conditioning trials. J. Neurosci. 28,
11673-11684
65. Joshua,M. et al. (2009) Encoding of probabilistic rewarding and aversive events
by pallidal and nigral neurons. J. Neurophysiol. 101, 758-772
66. Joshua,M. et al. (2009) Synchronization of midbrain dopaminergic neurons is
enhanced by rewarding events. Neuron 62(5), 695-704
67. Joshua,M. et al. (2007) Quantifying the isolation quality of extracellularly
recorded action potentials. J. Neurosci. Methods 163, 267-282
68. Paton,J.J. et al. (2006) The primate amygdala represents the positive and negative
value of visual stimuli during learning. Nature 439, 865-870
69. Kobayashi,S. et al. (2006) Influences of rewarding and aversive outcomes on
activity in macaque lateral prefrontal cortex. Neuron 51, 861-870
70. Yamada,H. et al. (2007) History- and current instruction-based coding of
forthcoming behavioral outcomes in the striatum. J. Neurophysiol. 98, 3557-3567
71. Joshua,M. et al. (2008) Different encoding of probabilistic rewarding and
aversive events by pallidal and nigral neurons. J. Neurophys. In press
72. Lavoie,B. and Parent,A. (1990) Immunohistochemical study of the serotoninergic
innervation of the basal ganglia in the squirrel monkey. J Comp Neurol. 299, 1-16
73. Daw,N.D. et al. (2002) Opponent interactions between serotonin and dopamine.
Neural Netw. 15, 603-616
74. Dayan,P. and Huys,Q.J. (2008) Serotonin, inhibition, and negative mood. PLoS.
Comput. Biol. 4, e4
75. Redgrave,P. and Gurney,K. (2006) The short-latency dopamine signal: a role in
discovering novel actions? Nat. Rev. Neurosci. 7, 967-975
76. Ravel,S. and Richmond,B.J. (2006) Dopamine neuronal responses in monkeys
performing visually cued reward schedules. Eur. J. Neurosci. 24, 277-290
77. Grace,A.A. and Bunney,B.S. (1983) Intracellular and extracellular
electrophysiology of nigral dopaminergic neurons--3. Evidence for electrotonic
coupling. Neuroscience 10, 333-348
78. Schneidman,E. et al. (2006) Weak pairwise correlations imply strongly correlated
network states in a neural population. Nature 440, 1007-1012
79. Lehmann,J. and Langer,S.Z. (1983) The striatal cholinergic interneuron: synaptic
target of dopaminergic terminals? Neuroscience 10, 1105-1120
80. Stoof,J.C. et al. (1992) Regulation of the activity of striatal cholinergic neurons
by dopamine. Neuroscience 47, 755-770
81. Lau,B. and Glimcher,P.W. (2008) Value representations in the primate striatum
during matching behavior. Neuron 58, 451-463
82. Matsumoto,N. et al. (2001) Neurons in the thalamic CM-Pf complex supply
striatal neurons with information about behaviorally significant sensory events. J
Neurophysiol. 85, 960-976
83. Hikosaka,O. and Wurtz,R.H. (1983) Visual and oculomotor functions of monkey
substantia nigra pars reticulata. II. Visual responses related to fixation of gaze. J.
Neurophysiol 49, 1254-1267
84. DeLong,M.R. et al. (1983) Relations between movement and single cell
discharge in the substantia nigra of the behaving monkey. J. Neurosci. 3, 1599-1606
85. Wichmann,T. and Kliem,M.A. (2004) Neuronal activity in the primate substantia
nigra pars reticulata during the performance of simple and memory-guided elbow
movements. J. Neurophysiol. 91, 815-827
86. DeLong,M.R. et al. (1985) Primate globus pallidus and subthalamic nucleus:
functional organization. J. Neurophysiol. 53, 530-543
87. Cragg,S.J. et al. (2001) Dopamine-mediated volume transmission in midbrain is
regulated by distinct extracellular geometry and uptake. J Neurophysiol 85, 1761-1771
88. Windels,F. and Kiyatkin,E.A. (2006) Dopamine action in the substantia nigra
pars reticulata: iontophoretic studies in awake, unrestrained rats. Eur. J. Neurosci.
24, 1385-1394
89. Wittmann,B.C. et al. (2008) Striatal activity underlies novelty-based choice in
humans. Neuron 58, 967-973
90. Apicella,P. (2007) Leading tonically active neurons of the striatum from reward
detection to context recognition. Trends Neurosci. 30, 299-306
91. Schultz,W. (1998) Predictive reward signal of dopamine neurons. J.
Neurophysiol. 80, 1-27
92. Miyashita,Y. (1988) Neuronal correlate of visual associative long-term memory
in the primate temporal cortex. Nature. 335, 817-820
93. Wise,S.P. and Kurata,K. (1989) Set-related activity in the premotor cortex of
rhesus monkeys: effect of triggering cues and relatively long delay intervals.
Somatosens. Mot. Res. 6, 455-476
94. Fuster,J.M. (1999) The Prefrontal Cortex: Anatomy, Physiology, and
Neuropsychology of the frontal lobes, Lippincott-Raven
95. Turner,R.S. and Anderson,M.E. (1997) Pallidal discharge related to the
kinematics of reaching movements in two dimensions. J. Neurophysiol. 77, 1051-1074
96. Georgopoulos,A.P. et al. (1983) Relations between parameters of step-tracking
movements and single cell discharge in the globus pallidus and subthalamic
nucleus of the behaving monkey. J. Neurosci. 3, 1586-1598
97. Mink,J.W. and Thach,W.T. (1991) Basal ganglia motor control. I. Nonexclusive
relation of pallidal discharge to five movement modes. J. Neurophysiol. 65, 273-300
98. Mitchell,S.J. et al. (1987) The primate globus pallidus: neuronal activity related
to direction of movement. Exp. Brain Res. 68, 491-505
99. Bar-Gad,I. et al. (2003) Information processing, dimensionality reduction and
reinforcement learning in the basal ganglia. Prog. Neurobiol. 71, 439-473
100. Cragg,S.J. and Rice,M.E. (2004) DAncing past the DAT at a DA synapse.
Trends Neurosci. 27, 270-277
101. Day,J.J. et al. (2007) Associative learning mediates dynamic shifts in
dopamine signaling in the nucleus accumbens. Nat. Neurosci. 10, 1020-1028
102. Suaud-Chagny,M.F. et al. (1992) Relationship between dopamine release in
the rat nucleus accumbens and the discharge activity of dopaminergic neurons
during local in vivo application of amino acids in the ventral tegmental area.
Neuroscience 49, 63-72
103. Horvitz,J.C. (2009) Stimulus-response and response-outcome learning
mechanisms in the striatum. Behav. Brain Res. 199, 129-140
104. Mink,J.W. (1996) The basal ganglia: focused selection and inhibition of
competing motor programs. Prog. Neurobiol. 50, 381-425
Journal of Neuroscience Methods 178 (2009) 350–356
Appendix
A noninvasive, fast and inexpensive tool for the detection of eye open/closed
state in primates
Rea Mitelman a,b,∗,1, Mati Joshua a,b,1, Avital Adler a,b, Hagai Bergman a,b,c
a Department of Physiology, The Hebrew University – Hadassah Medical School, Jerusalem 91120, Israel
b The Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem 91904, Israel
c Eric Roland Center for Neurodegenerative Diseases, The Hebrew University, Jerusalem 91904, Israel
Article info
Article history:
Received 5 October 2008
Received in revised form 4 December 2008
Accepted 4 December 2008
Keywords:
Electrophysiological recordings
Image processing
Eyeblink conditioning
Primates
Eyelid
Abstract
Accurate detection of the eye state (i.e., open or closed) of animals during electrophysiological recordings
is often crucial for analyzing physiological data. This requires a system which is reliable, and preferably noninvasive and inexpensive. Here we present such a tool incorporating a standard digital camera
and a semi-automatic eye state detection (ESD) algorithm that can be used easily in typical primate
electrophysiological setups.
The ESD algorithm is based on the high light absorbance of the iris and pupil relative to the eyelid
and takes advantage of the unique conditions found in primate physiological recordings (minimal area of
sclera and head fixation). The ESD algorithm is as accurate as a human observer, and is robust to the
variance inherent in the human decisions that it requires (i.e., eye location setting, training set classification
and threshold setting). The temporal resolution with standard interlaced digital cameras is 17–20 ms.
This is sufficient for the detection of eye state changes during electrophysiological recordings including
spontaneous blinking and eye blink conditioning, as demonstrated here. Furthermore, the ESD tool can
be applied to other physiological areas of research in which changes in eye state are critical to analyzing
neuronal activity.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Vision is the main sense by which primates (both human and
nonhuman) perceive the world. Unlike other senses, visual input
can be completely blocked at the level of the sensory organ by
the eyelid. Therefore, understanding any neuronal activity involving
the visual system requires an accurate recording of the state of the
eyelids, i.e., whether they are open or closed. Furthermore, detecting the state of the eyelid is crucial for monitoring motor output
during eyeblink conditioning (Marquis and Hilgard, 1937). Finally,
detection of eyeblink enables the study of the natural frequency of
blinking, which is altered in different pathological states such as
schizophrenia or Parkinson’s disease (Ponder and Kennedy, 1927;
Stevens, 1978; Karson, 1983).
Several methods have been suggested for detection of the eye
state of primates. One useful technique is electromyography (EMG)
of the orbicularis oculi, the main muscle that is involved in blinking movement, and detecting its activation during eye closure
(Silverstein et al., 1978; Blazquez et al., 2002). Another more direct
method attaches the eyelid by a wire to a microtorque potentiometer that can measure its movements (Pennypacker et al., 1966).
These methods are somewhat invasive, and it is unclear how these
devices influence the natural movements of the eyelid.
∗ Corresponding author at: Department of Physiology, The Hebrew University –
Hadassah Medical School, POB 12272, Jerusalem 91120, Israel. Tel.: +972 2 6757388;
fax: +972 2 6439736.
E-mail address: [email protected] (R. Mitelman).
1 These authors contributed equally.
Less invasive ways include connecting an electromagnetic
search coil to the eyelid (Robinson, 1963; Porter et al., 1993). Here,
a wire coil is secured to the upper eyelid of the animal, and placed
in a weak magnetic field. This generates a current in the coil that
is proportional to the angular velocity of the eyelid, thus enabling
detection of changes in the state of the eye. Another noninvasive
method uses an infrared light-emitting diode (LED) and a photo
sensor (Thompson et al., 1994; Clark and Zola, 1998). However, this
method requires placing the detector at a distance of 4–5 mm from
the animal’s eye, which may block significant parts of its field of
view. These methods may be irritating to the primates, and therefore could influence their behavior.
The least invasive method that has been used by researchers is
direct detection by a human observer. This is usually done offline,
after videotaping the animal’s behavior (e.g., Nevet et al., 2004).
However this method is very cumbersome and time consuming,
and therefore is not feasible for processing large amounts of data.
Furthermore, human observers are prone to mistakes when asked
to classify long video sequences and may be biased by their a priori
expectations.
doi:10.1016/j.jneumeth.2008.12.007
Several automatic visual analysis based methods have been suggested for eye state detection in other mammals. In humans there
are several algorithms (Tian et al., 2000; Miyakawa et al., 2004;
Benoit and Caplier, 2005; Tan and Zhang, 2006; Heishman and
Duric, 2007), but they are rather complex, and do not take into
account some of the differences between human and non-human
primates (e.g. the difference in the sclera’s relative size). Moreover,
these algorithms are primarily designed for non-scientific goals
such as driver fatigue detection, and are intended to achieve impressive stability under unsupervised circumstances. On the other hand,
they do not take advantage of the typical primate physiological
recording setting, and fall below the performance level of human
observers. A system that was suggested for use in rabbits (Bracha et
al., 2003) has the disadvantage of requiring markers to be attached to the upper
and lower eyelids of the animal and therefore is less suitable for
the daily repeated recording sessions that are typical of physiological
studies of awake behaving primates.
In this manuscript we suggest a simple, noninvasive and
inexpensive video-based method to detect the state of the eye
of primates under head immobilization conditions. The system
takes advantage of the typical setting of primate physiological
experiments, and operates on the basis of minimal changes in
the position of the eyes during a recording session. The video
camera can be positioned at a distance from the monkey (depending on its zoom properties) and therefore does not obscure
the visual field and does not modify natural blinking behavior. The method is also highly accurate, with a performance
level equivalent to that of a human observer (a mean normalized error of 0.15%). Furthermore, since this method works with
infrared videotaping, the eye state can be detected in a dark
environment.
2. Materials and methods
The tool we describe in this paper includes standard hardware
and simple custom-made software. We present the hardware we
used in the experiments, and the way we chose to implement the
algorithm, although any equivalent hardware and software implementation can be employed.
2.1. Physical setup and data acquisition
All experimental protocols were performed in accordance with
the National Institutes of Health Guide for the Care and Use of Laboratory Animals and with Hebrew University guidelines for the use
and care of laboratory animals in research, supervised by the institutional animal care and use committee. Briefly, monkeys went
through an operation during which a head holder and a recording
chamber were attached to their head. During recording sessions,
the monkeys’ heads were immobilized and microelectrodes were
advanced into different targets in the basal ganglia.
A standard infrared digital surveillance camera was used to
digitally record the monkey’s facial movements (AVer-s 2.54, AverMedia Systems, Taipei, Taiwan). The recording was done in an
interlaced mode, with sampling rate of 25 frames per second (PAL
mode). In the interlaced mode, each frame is composed of two separately sampled fields: one occupying the even rows and the other
the odd ones, without smoothing them. Movies were saved in AVI
format in 640 × 480 pixel resolution, with a grayscale color depth
of 8 bits (i.e., 256 levels of infrared brightness).
Fig. 1. Example of density histograms of the brightness of a closed and open eye. (a and b) Randomly chosen open (a) and closed (b) eye field. Pixels in two ranges of brightness
are color marked, and the original eye fields are shown in the inset. The darker hue, marked in blue, is seen specifically in the pupil, and the intermediate hue, marked in green,
is seen mostly in the iris. Scale indicates 10 pixels horizontally and vertically. (c and d) Brightness histograms of the corresponding pictures. The two peaks were manually
marked, and the color codes are as in a and b. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
All data analysis was done in Matlab (Version 7.5, R2007b, The
MathWorks). Movies or single frames were easily imported to Matlab, such that each frame is a single brightness matrix and an entire
movie is a hypermatrix. To improve performance, importing was
done in blocks of a few dozen frames. Each frame was de-interlaced
into its two fields, and missing lines were interpolated by averaging
each adjacent pair of existing lines. This recreated a full-sized image
and doubled the sampling rate to 50 images per second.
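As a rough illustration of the de-interlacing step, the following Python sketch splits one interlaced frame into its even-row and odd-row fields and fills each missing line with the average of its two neighbouring lines. The published implementation is in Matlab and is not reproduced here; the function name `deinterlace`, the list-of-rows frame representation, and the choice to copy the single available neighbour at the top and bottom edges are assumptions made for illustration only.

```python
def deinterlace(frame):
    """Split an interlaced frame into two full-height images.

    `frame` is a list of rows (each a list of brightness values) with at
    least two rows. Returns [even_field_image, odd_field_image].
    Edge handling (copying the only neighbour) is an assumption.
    """
    n_rows = len(frame)
    images = []
    for parity in (0, 1):  # 0: field on even rows, 1: field on odd rows
        image = []
        for r in range(n_rows):
            if r % 2 == parity:
                image.append(list(frame[r]))  # line sampled in this field
                continue
            # Missing line: its neighbours r-1 and r+1 belong to this field.
            above = frame[r - 1] if r - 1 >= 0 else None
            below = frame[r + 1] if r + 1 < n_rows else None
            if above is None:
                image.append(list(below))
            elif below is None:
                image.append(list(above))
            else:
                image.append([(a + b) / 2 for a, b in zip(above, below)])
        images.append(image)
    return images
```

Applied to every frame of a 25 frames-per-second recording, this yields two images per frame, which corresponds to the 50 images per second stated in the text.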
2.2. Algorithm description
The eye state detection (ESD) algorithm is based on the difference in light absorbance between the eyelid and the eye itself – the
pupil, as well as the iris. Unlike humans, non-human primates have
a relatively small sclera, so the pupil and the iris occupy most of
the eye opening space. Visible light, as well as infrared radiation, is
absorbed by the pupil and the iris considerably more than by
the eyelid (Durkin et al., 1990; Thompson et al., 1994). As a result,
in an open eye image, a certain number of pixels are dramatically
darker than all other pixels, whereas in a closed eye image there are
hardly any such dark pixels. This can be seen in Fig. 1 which plots
the brightness histogram of the area of the eye. The brightness histogram of a typical open eye has two peaks that do not appear in
the unimodal brightness histogram of the closed eye. The darkest
peak originates from the high light absorbance of the pupil, and the
second dim peak from the iris. As outlined below, proper detection
of these peaks is enough for correct automatic classification of the
state of the eye.
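The two quantities this description relies on, the brightness histogram of an eye field and the number of pixels darker than a cutoff, can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' Matlab code: the function names, the list-of-rows representation of an eye field, and the toy pixel values are hypothetical, and 8-bit brightness (0–255) is taken from the recording format described above.

```python
def brightness_histogram(eye_field, levels=256):
    """Count how many pixels of the eye field have each brightness level."""
    counts = [0] * levels
    for row in eye_field:
        for value in row:
            counts[value] += 1
    return counts


def count_black_pixels(eye_field, brightness_threshold):
    """Number of pixels strictly darker than the brightness threshold."""
    return sum(1 for row in eye_field for value in row
               if value < brightness_threshold)


# Toy 2 x 3 eye fields (hypothetical values): dark pupil/iris pixels are
# present in the open eye and absent in the closed eye.
open_eye = [[200, 30, 25], [190, 90, 28]]
closed_eye = [[200, 180, 190], [185, 195, 170]]
```

With a cutoff of 100, the open field above contains four dark pixels and the closed field none, which is the separation the algorithm exploits.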
The ESD algorithm is semi-automatic and requires three quick
human decisions. First, the user is asked to indicate the location of
the monkey’s eye (termed “eye field”) in an arbitrarily chosen frame,
by marking two opposite corners of a rectangle (Fig. 2a). Since the
head was immobilized during the experiment reported here, this
rectangle only needed to be marked once per experimental day.
The next step is training the algorithm. Eye fields from the video
are chosen randomly by the algorithm, and are presented to the
user. The user classifies the state of the eye in each eye field as
open, closed, or inconclusive (Fig. 2b). This step is completed when
the user determines that enough eye fields of both conclusive states
have been classified (usually about 20 fields in total). Most videos
contain more open eye fields than closed ones and a similar ratio is
therefore found in the training set.
The last step is to set the thresholds for the classification: brightness threshold and eye state threshold. This is done by pooling
the eye field matrices for each conclusive state. This yields two
brightness histograms, one for the open and one for the closed
eye (Fig. 2c). The brightness histogram of the open eye fields consistently includes two peaks of darker pixels that fail to appear
in the brightness histogram of the closed eye. The user is asked
to set a brightness threshold that includes the maximum area of
these peaks and the minimum area of the closed eye histogram
(the dashed line in Fig. 2c). All pixels darker than this threshold are
considered “black” for the following stage of the algorithm.
The ESD algorithm calculates the number of black pixels in each
eye field in the training set. The closed eye field with the maximal
number of black pixels and the open eye field with the minimal
number of black pixels are defined as ‘anchors’. The average number
of black pixels in the two anchors is set as the eye state threshold.
Taking the midpoint of the anchors as a threshold yields optimal
separation in the training set, in the sense of minimizing the generalization error. Such a threshold is conceptually similar to the
one-dimensional case in the support vector machine (SVM) classification algorithm (Cortes and Vapnik, 1995). The calculation of the
eye state threshold completes the training stage, and the algorithm
now has all the necessary data for classification of the entire day of
the experiment. The entire training stage takes the user about 30 s,
and contains all the human-based decision input to the process.
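The anchor rule described above can be written as a short Python sketch. The function name and the example counts are hypothetical; the inputs are assumed to be the black-pixel counts of the training eye fields that the user labelled open and closed (inconclusive fields excluded).

```python
def eye_state_threshold(open_counts, closed_counts):
    """Midpoint between the two 'anchor' training fields.

    open_counts / closed_counts: black-pixel counts of the training eye
    fields labelled open / closed by the user.
    """
    open_anchor = min(open_counts)      # open field with the fewest black pixels
    closed_anchor = max(closed_counts)  # closed field with the most black pixels
    return (open_anchor + closed_anchor) / 2
```

For example, open counts of 310, 275 and 340 and closed counts of 12, 40 and 5 give anchors of 275 and 40, and hence a threshold of 157.5 black pixels.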
Fig. 2. The training stage of the eye state detection algorithm (ESD). (a) Arbitrarily chosen frame presented to the user, and the marked location of the eye. Scale
indicates 50 pixels horizontally and vertically. (b) Three randomly chosen eye fields
that were categorized by the user as open (I), closed (II) and inconclusive (III). Scale
indicates 20 pixels horizontally and vertically. (c) Brightness histogram of the open
(top) and closed (bottom) eye fields of the training set. The vertical dashed line is the
brightness value that was chosen by the user as the threshold for the categorization
of the entire video.
The eye state classification is obtained by calculating the number
of black pixels for each eye field of the entire video sequence, based
on the user’s chosen brightness threshold. Each eye field is then
classified according to the eye state threshold: eye fields with more
black pixels than the eye state threshold are classified “open”, and
ones with fewer black pixels than the eye state threshold are classified “closed”. This classification is ‘hard’ in the sense that a decision
is forced. Therefore, eye fields that could be perceived by a human
observer as inconclusive are also classified according to the number
of black pixels. Using our hardware (2.8 GHz Pentium with 2 GB of
RAM), the classification of a 2-h video took about 20 min (Matlab m
files can be found at http://basalganglia.huji.ac.il/assets/ESD.zip).
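The hard classification rule amounts to one comparison per eye field, as in the Python sketch below. It is an illustration under stated assumptions, not the distributed Matlab code: the function name is hypothetical, and the text does not say how a field whose count exactly equals the threshold is handled, so labelling such ties "closed" is an arbitrary choice made here.

```python
def classify_fields(black_counts, threshold):
    """Label each eye field from its number of black pixels.

    A field is "open" if it has more black pixels than the eye state
    threshold and "closed" otherwise. Ties are labelled "closed"; this
    tie rule is an assumption, not taken from the paper.
    """
    return ["open" if count > threshold else "closed"
            for count in black_counts]
```

Because every field receives a label, fields that a human would call inconclusive are still forced into one of the two classes, as the text notes.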
3. Results
3.1. Algorithm performance and stability
The eye state detection algorithm is based on two thresholds. As
described above, the first is the brightness threshold, which determines how dark (on a scale of 0–255) a pixel needs to be so as to
be considered black and is set by the user during the training stage.
The second is the eye state threshold, which determines the state
of the eye according to the number of black pixels, and is calculated
automatically (based on the user’s classification during the training stage). Note that these two thresholds could cause performance
instability in the algorithm, since they are influenced by the features
of the training set and the user’s decisions. Therefore, we calculated
the algorithm error as a function of these two thresholds. We chose
a 20,000 field (400 s) video, and one of the experimenters (RM)
classified its fields manually into open eye fields (88.0%), closed
eye fields (8.9%) and inconclusive eye fields (3.1%). The inconclusive
eye fields were discarded in this analysis, since the human observer
performance was considered the gold standard.
Fig. 3. ESD algorithm error as a function of the two thresholds. Normalized error is
shown (color coded) as a function of the two types of classification thresholds used
in the ESD algorithm (brightness threshold and eye state threshold). Normalized
error is the probability of a classification error, assuming equal probability of open
and closed eye. Brightness threshold is the threshold which defines which pixels
are dark enough and is determined by the user. The eye state threshold defines how
many black pixels (as defined by the first threshold) are enough to determine that the
eye is open, and is determined by the algorithm according to the human decisions
on the training set. The green Xs denote brightness thresholds chosen by the user,
and the corresponding number of eye state thresholds for different training sets.
This was done by repeating the algorithm 50 times with a training set of 20 images
chosen randomly. Although the eye state threshold has a relatively large variance,
all repetitions produced a normalized error of 0–1.5%. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version
of the article.)
Two types of error are possible: detecting a closed eye when it
is open, and vice versa. We were interested in performance independent of the eye state statistics in the chosen video. Therefore we
used a normalized error ($E_N$), which is identical to the probability
of an error in the case of equiprobability of open and closed eyes:
$$E_N = \frac{1}{2}\,\frac{P(\text{classify open},\ \text{closed eye})}{P(\text{closed eye})} + \frac{1}{2}\,\frac{P(\text{classify closed},\ \text{open eye})}{P(\text{open eye})} = \frac{1}{2}\left[\frac{P(\text{classify open},\ \text{closed eye})}{P(\text{closed eye})} + \frac{P(\text{classify closed},\ \text{open eye})}{P(\text{open eye})}\right]$$
By the definition of conditional probability, this is also the average of the conditional probabilities of an error:
$$E_N = \frac{1}{2}\left[P(\text{classify open} \mid \text{closed eye}) + P(\text{classify closed} \mid \text{open eye})\right]$$
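To make the definition concrete, the normalized error can be computed from paired human and algorithm labels as in the Python sketch below. The function name and the label strings are hypothetical, and the sketch assumes that inconclusive fields have already been discarded and that both states occur at least once in the human labels.

```python
def normalized_error(true_states, predicted_states):
    """Average of the two conditional error rates (open and closed)."""
    pairs = list(zip(true_states, predicted_states))
    n_closed = sum(1 for t, _ in pairs if t == "closed")
    n_open = sum(1 for t, _ in pairs if t == "open")
    # Closed eyes classified as open, and open eyes classified as closed.
    closed_as_open = sum(1 for t, p in pairs if t == "closed" and p == "open")
    open_as_closed = sum(1 for t, p in pairs if t == "open" and p == "closed")
    return 0.5 * (closed_as_open / n_closed + open_as_closed / n_open)


# Eight open and two closed fields, with one error of each type.
truth = ["open"] * 8 + ["closed"] * 2
guess = ["closed"] + ["open"] * 7 + ["open", "closed"]
```

In this toy case the raw error rate is 2/10 = 0.2, whereas the normalized error is 0.5 × (1/2 + 1/8) = 0.3125, because the rarer closed state is weighted equally with the open state.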
Each run of the algorithm produces two values, as described
above: a brightness threshold selected by the user, and an eyestate threshold calculated by the algorithm according to the user’s
open/close classification in the training set. To show how the
normalized error depends on these two values, Fig. 3 plots the
normalized error as a function of them. The figure shows that a
large range of these two thresholds results in a relatively low error;
hence the algorithm’s performance is stable over a large range of
thresholds.
Fig. 4. ESD algorithm performs more accurately than the SVM algorithm and needs a
much smaller training set. The median of the normalized error of the ESD algorithm
(continuous), and the SVM algorithm (dashed) ± the median absolute deviation
(gray shadow), calculated following 10 repeats per training set size. The training sets
were chosen randomly, while keeping a constant ratio of closed and open eye fields.
SVM fails to match the performance of the ESD algorithm for median performance,
variability and dependence on training size.
Next, in order to verify that the thresholds generated during the
training stage of the algorithm actually resulted in a low normalized
error, we ran the semi-automatic algorithm 50 times with different
randomly chosen training sets of 20 conclusive eye fields (i.e., open
or closed) while applying the same open and closed eye statistics as
in the entire video (i.e. 18 open eye fields and 2 closed). The resulting
threshold pair of each run is plotted as an ‘x’ in Fig. 3. The brightness thresholds have a relatively low range of values, whereas the
eye-state thresholds have a higher range of values. This is due to
the low variance in the brightness of the iris and the pupil, in comparison to a higher variance in the number of pupil and iris pixels.
Nevertheless, the number of classification errors for this range of
thresholds falls within the span of a relatively small error. The normalized classification error had a mean of 0.15%, and a maximal
value of 1.4%.
Next, to assess the semi-automatic algorithm’s dependence on
the size of the training set, we trained it on different training set
sizes, ranging from two to thirty eye fields. Again, this was done by
forcing the statistics of the entire video on the training sets (while
keeping at least one eye field of each conclusive type). This was
repeated 10 times for each training set size. The median ± median
absolute deviation of the classification’s normalized error is plotted
in Fig. 4 (continuous line). This shows a low normalized error median even for a training set with a single eye field for
each category, and a negligible error with training sets as small as 8
eye fields.
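The summary statistic used in Fig. 4, the median and the median absolute deviation over repeated runs, can be computed with the Python standard library as sketched below. The function name and the example values are hypothetical and do not correspond to the reported data.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, median absolute deviation) of a list of numbers."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return center, mad
```

For the five hypothetical values 2, 4, 4, 5 and 30 the median is 4 and the median absolute deviation is 1, which shows how a single outlying run has little influence on either statistic.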
To demonstrate the robustness of the ESD algorithm, we compared its performance to the SVM algorithm, which is an optimal
linear classification algorithm, in the sense of minimizing generalization error. Briefly, this algorithm finds the linear classifier (a separating
hyperplane in an n-dimensional space) that separates two clusters with maximal margins
(Cortes and Vapnik, 1995). We reshaped the eye field matrices to
vectors, and used randomly chosen training sets of these vectors
to train the SVM algorithm. We used the linear classifier to classify
the entire video, and calculated the normalized error (identically
to the normalized error of the ESD algorithm). This was repeated
with different training set sizes, 10 times per size, and we obtained
the median ± median absolute deviation (Fig. 4, dashed line). This
shows that the ESD algorithm performs considerably better than the
SVM algorithm. Although the difference decreases with the increase
in training set size, it remains noticeable even in larger (n = 100)
training sets (data not shown). Furthermore, the ESD algorithm
emerges as more reliable, since the deviation around the median
error is smaller than the deviation obtained in SVM (Fig. 4, gray
shadow).
Another user-selected parameter which could be an additional
source of error is the location and size of the area of the eye (eye
field). To test the ESD algorithm’s stability for different eye field sizes,
we compared the algorithm’s performance on a randomly chosen
training set. Fig. 5 depicts the normalized error as a function of the
eye field’s rectangular size, measured by the length of diagonal (in
pixels). The ESD algorithm maintained its level of performance for
a large range of sizes, with a normalized error of less than 10% even
for a large rectangle that occupied almost the entire monkey’s face.
Furthermore, a low level of error was maintained for a relatively
small rectangle.
Fig. 5. ESD algorithm error as a function of the size of the rectangle which marked
the eye location. (a) Normalized error as a function of the size of the eye field (in
pixels on the diagonal). The algorithm was trained with the same training set, but
with different sizes of eye rectangles. The normalized error of each eye rectangle
size is shown as a single point on the curve. (b) An arbitrary field, with different
sizes of marked rectangles. The rectangles correspond to the same color points as in
(a). There is a broad area between the green and blue frames in which the error is
zero, and a broader area between the green point and the cyan in which the error is
less than 5%. The error only becomes considerable beyond this range. Scale indicates
50 pixels horizontally and vertically. (For interpretation of the references to color in
this figure legend, the reader is referred to the web version of the article.)
Finally, surveillance cameras are sensitive to visible light; hence
changes in the luminance might affect the tool’s performance.
Luminance can go through high frequency modulations, e.g.,
because the camera’s sampling and the refresh rate of the video
screen (where visual stimuli were presented to the monkey) were
not synchronized. Additionally, transient luminance changes may
occur, e.g. due to opening of the recording chamber cover. To test
the stability of the ESD algorithm as regards changes of luminance,
we calculated the average brightness of the entire image (while
omitting the area of the frame showing the time, see Fig. 2a bottom
right) for each field. This is plotted in Fig. 6a, and reveals high frequency changes in luminance, as well as a transient change around
200 s. We manually split the video into brighter and darker periods,
which are gray color coded in the figure. We ran the ESD algorithm
50 times with a training set of 20 fields, and calculated the average error per field. Fig. 6b shows the average normalized error in
each of the video streams ± the standard deviation. The difference
between the two error values was not significant (Student’s t-test,
p = 0.19), which indicates that the open/close classification error
is negligibly affected by the general luminance.
3.2. Possible applications of the ESD algorithm
Eye state classification has potential for a wide variety of applications. We used the system on a delayed probabilistic classical
conditioning task (Joshua et al., 2008). In this task, the monkey
was repeatedly presented with one of a set of visual stimuli, each
predicting an outcome with a different probability. Three stimuli
predicted the administration of liquid food and three predicted the
delivery of an airpuff with the same probabilities. Fig. 7a shows the
percentage of the trials in which the eye is closed, in 20 ms bins,
with respect to the administration of the outcome (food/airpuff).
This indicates that the monkey closes its eyes to airpuffs but not
to food. Furthermore, the monkey indeed learned to distinguish
between these stimuli, as can be seen by the timing of the response
which preceded the airpuff itself (Fig. 7a). The algorithm provides
an elegant way of showing that the monkey’s eye state has an
increasingly higher probability of being closed as the time of the
airpuff approaches.
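A peri-event curve of the kind shown in Fig. 7a can be obtained by aligning the classified eye states of each trial to the outcome and averaging across trials. The Python sketch below assumes, hypothetically, that each trial is already given as a list of "open"/"closed" labels, one per 20 ms bin (at 50 fields per second each field spans 20 ms), and that all trials have the same number of bins; the function name is illustrative.

```python
def percent_closed(trials):
    """Percentage of trials with a closed eye in each time bin.

    `trials` is a list of equally long lists of "open"/"closed" labels,
    one label per time bin, all aligned to the same event.
    """
    n_trials = len(trials)
    n_bins = len(trials[0])
    return [100.0 * sum(1 for trial in trials if trial[b] == "closed") / n_trials
            for b in range(n_bins)]


# Four hypothetical trials, three bins each, ending at the outcome.
example_trials = [
    ["open", "closed", "closed"],
    ["open", "open", "closed"],
    ["open", "open", "closed"],
    ["open", "closed", "closed"],
]
```

In this toy example the closed-eye percentage rises from 0% to 50% to 100% across the three bins, mimicking an anticipatory eye closure that builds up before the outcome.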
We further used the algorithm to assess the blinking response
to apomorphine (Apo) induced dyskinesias. Systemic injection of
Apo, an ultra-fast dopamine agonist, induces orofacial dyskinesias
which are known to include higher blinking rates (Blin et al., 1990;
Kleven and Koek, 1996; Nevet et al., 2004). This is usually measured
by human observers who count the number of blinks, a method
which is prone to error and bias. Fig. 7b depicts the blinking rate
of a monkey after intramuscular injection of 0.1 mg/kg Apo HCl 1%,
as measured by the ESD algorithm. Closed eye events that lasted
less than a second were defined as blinks. The blinks were counted
in 1 min bins, and then smoothed with a Gaussian window with a
standard deviation of 1 bin. As described in previous studies (Nevet
et al., 2004, Fig. 1c), the blinking rate increased with the Apo administration, and remained elevated for at least 25 min.
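The blink-rate analysis can be sketched in Python as three small steps: find closed-eye runs shorter than one second, count their onsets in 1 min bins, and smooth the counts with a Gaussian window of 1 bin standard deviation. This is a minimal sketch under assumptions that are not in the paper: the function names, the window half-width of 3 bins, the renormalization of the window at the edges of the record, and treating a closed run that reaches the end of the record as ending there are all illustrative choices.

```python
import math


def detect_blinks(states, field_rate=50.0, max_duration_s=1.0):
    """Onset times (s) of closed-eye runs shorter than max_duration_s."""
    onsets = []
    run_start = None
    # A trailing "open" sentinel terminates a run that reaches the end.
    for i, state in enumerate(list(states) + ["open"]):
        if state == "closed" and run_start is None:
            run_start = i
        elif state != "closed" and run_start is not None:
            if (i - run_start) / field_rate < max_duration_s:
                onsets.append(run_start / field_rate)
            run_start = None
    return onsets


def blink_counts(onsets, total_s, bin_s=60.0):
    """Number of blink onsets in consecutive bins of bin_s seconds."""
    counts = [0] * int(math.ceil(total_s / bin_s))
    for t in onsets:
        counts[int(t // bin_s)] += 1
    return counts


def gaussian_smooth(counts, sd_bins=1.0, half_width=3):
    """Smooth with a truncated Gaussian window, renormalized at the edges."""
    offsets = range(-half_width, half_width + 1)
    weights = [math.exp(-0.5 * (k / sd_bins) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(counts)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            if 0 <= i + k < len(counts):
                num += w * counts[i + k]
                den += w
        smoothed.append(num / den)
    return smoothed
```

Chaining the three functions on the classified eye states of a session gives a smoothed blinks-per-minute curve comparable in form to the one described for Fig. 7b.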
4. Discussion
This manuscript describes a fast, simple, inexpensive and noninvasive tool for eye state detection during electrophysiological
studies of primates. It is adapted to perform optimally in the typical
setting of primate physiological studies; e.g., head fixation (Lemon,
1984). This type of tool is valuable for many primate studies, and
can be easily adapted to most setups, since the only hardware it
requires is a digital surveillance camera. The use of such a camera,
which is sensitive to the infrared wavelength and has its own source
of infrared radiation, makes it possible to detect eye state under
different illumination conditions, including darkness. Furthermore,
using a digital camera rather than an analog device makes the eye
state detection less vulnerable to electrical noise.
R. Mitelman et al. / Journal of Neuroscience Methods 178 (2009) 350–356
Fig. 6. ESD algorithm error as a function of general scene luminance. (a) Luminance in the entire scene as a function of time. Luminance was calculated by taking the average
brightness value in each field during the 400-s video (omitting the area of the video image presenting the time). A striking increase in luminance can be seen around 200 s
(due to opening of the recording chamber cover). This was manually marked and the video was split into a darker, early period (marked in black) and a lighter, late period
(marked in gray). (b) The average normalized error of the darker and the lighter periods is presented (same color code as in a). The error was calculated by repeating the ESD
algorithm 50 times with a training set of 20 eye fields. The error bars represent the standard deviation of the normalized error. Both periods present a very low error level,
with no significant difference (Student’s t-test, p = 0.19), indicating the low dependence of ESD performance on general scene luminance.
The temporal resolution of these cameras is usually 25–30
frames per second, which can be doubled by de-interlacing. This
makes the tool suitable for detecting blinks in non-human primates
(Baker et al., 2002), although blinks with only a fractional
closure of the eyelid (Rambold et al., 2005) might be missed.
For higher temporal resolution, a faster camera is needed, rendering the system more expensive, but requiring no changes in
the algorithm. Correct timing of the electrophysiological recording and the eye state calls for accurate synchronization between
the video and the electrophysiological recording. In our setup
this was done by feeding a time signal from the recording system into the digital video, presenting it in the bottom left corner
(e.g., Fig. 2a), and detecting it offline. Other synchronization signals, such as an auditory signal for the camera, can be used as
well.
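The de-interlacing step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a frame is a list of pixel rows, that the even rows form the first field, and that fields are evenly spaced in time. The names `deinterlace` and `field_times` are hypothetical. The nominal times computed here would not replace the embedded time signal used for synchronization in the setup described.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # rows of grayscale pixel values


def deinterlace(frame: Frame) -> Tuple[List[Sequence[int]], List[Sequence[int]]]:
    """Split one interlaced frame into two fields (even rows, odd rows).

    Assumption: the even rows were captured first. Real cameras may use
    the opposite field order.
    """
    return list(frame[0::2]), list(frame[1::2])


def field_times(n_frames: int, fps: float) -> List[float]:
    """Nominal time in seconds of every field, two fields per frame."""
    return [i / (2.0 * fps) for i in range(2 * n_frames)]
```

With a 25 frames-per-second camera, the field spacing under these assumptions is 20 ms, which matches the doubling of temporal resolution described in the text.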
Although detection of the state of a single eye was sufficient for
our purposes, the algorithm could be easily adapted to detect the
states of both eyes by marking their location and possibly detecting the two thresholds separately. This would also require proper
positioning of the camera, such that both eyes are clearly visible.
Both the ESD and SVM algorithms are supervised binary
classifiers, but ESD performs considerably better than SVM in detecting the eye state, even though SVM is considered a very
robust linear classifier. This is probably because the ESD algorithm
makes assumptions regarding the data, whereas SVM does not.
ESD assumes that the number of sufficiently dark pixels in the eye field
is a strong enough rule for detecting the state
of the eye. Naturally, when this assumption does not hold, ESD will
perform worse than SVM.
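The decision rule that this assumption implies can be sketched as below. This is only an illustration of the stated rule, assuming an 8-bit grayscale eye field and two thresholds (one on pixel darkness, one on the dark-pixel count). The function names and any threshold values are hypothetical, and the training step that would set the thresholds is not shown.

```python
from typing import Sequence

EyeField = Sequence[Sequence[int]]  # rows of grayscale values, 0 = black


def count_dark_pixels(eye_field: EyeField, darkness_threshold: int) -> int:
    """Count the pixels in the eye field darker than the threshold."""
    return sum(1 for row in eye_field for px in row if px < darkness_threshold)


def eye_is_open(eye_field: EyeField, darkness_threshold: int,
                count_threshold: int) -> bool:
    """Classify the eye as open when enough pixels are sufficiently dark.

    Both thresholds are assumed to come from a labeled training set;
    they are plain parameters here.
    """
    return count_dark_pixels(eye_field, darkness_threshold) >= count_threshold
```

Because the rule depends only on a count of dark pixels, it fails in the cases the text goes on to describe: a partially closed eyelid or an averted eye leaves few dark pixels even though the eye is open.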
Indeed, the ESD algorithm’s errors occur mostly when there are
fewer dark pixels, e.g., when the eye is turned away from the camera
or the eyelid is partially closed. This also accounts for the increase
in error with the decrease in eye field size (Fig. 5a). Nevertheless,
the ESD algorithm maintains a low level of error even for a relatively
small rectangle. This suggests that the algorithm is robust to physiological changes in the angle of the eye in which smaller areas of
the pupil and iris are visible. The ESD algorithm is also robust to changes
in general scene luminance. This is because the iris, and even more
so the pupil, have a high light absorbance, and therefore there are
enough dark pixels in an open eye image, even with greater light
intensity.
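The scene luminance measure described in the caption of Fig. 6 (average brightness of a field, omitting the area that displays the time) could be computed along the following lines. This is a sketch under assumptions: the field is a list of pixel rows, the excluded area is a rectangle, and the name `mean_luminance` and the rectangle convention are hypothetical.

```python
from typing import Sequence, Tuple

# (top, left, bottom, right); bottom and right are exclusive
Rect = Tuple[int, int, int, int]


def mean_luminance(field: Sequence[Sequence[int]], excluded: Rect) -> float:
    """Average brightness of a field, skipping pixels inside `excluded`."""
    top, left, bottom, right = excluded
    total = 0
    n_pixels = 0
    for r, row in enumerate(field):
        for c, value in enumerate(row):
            if top <= r < bottom and left <= c < right:
                continue  # pixel belongs to the on-screen time display
            total += value
            n_pixels += 1
    if n_pixels == 0:
        raise ValueError("excluded region covers the whole field")
    return total / n_pixels
```

Applying this to every field gives a luminance trace over time, from which darker and lighter periods can be marked and compared.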
This semi-automatic procedure was found to be adequate for
our needs, because it was highly accurate and required very little
training time. Therefore, we did not find it necessary to make it
fully automatic. However, fully automatic algorithms that detect
the location of the eye for a moving human face have been reported
by other researchers (e.g., Craw et al., 1992), and could be adapted to
our tool. This may enable using the tool for experiments that do not
require a head restraint; e.g., with chronically implanted electrodes
(Nordhausen et al., 1996; Jackson and Fetz, 2007).
Fig. 7. Possible applications of the eye state detection tool. (a) Eye closure to airpuff. The monkey was presented with a reward (liquid food) or an aversive stimulus (airpuff)
at t = 0, after a reward- or aversion-predicting visual stimulus. The percentage of times that the animal kept the eye closed in 20 ms bins is plotted ± variance (shaded).
The monkey closed its eyes in anticipation of and following the aversive stimulus, but not for the reward. (b) Apomorphine-induced dyskinesia increases blinking frequency.
Blinking rate was measured by counting eye closures lasting less than a second, before and after the apomorphine injection (at t = 0). The dashed line is the raw blinking rate
per minute and the continuous line is the blinking rate after smoothing with a Gaussian window of one bin (1 min).
Finally, there is continuing interest in the effect of arousal levels on neural activity (e.g., Steriade and McCarley, 2005). Eye state
detection, supported by analysis of eye movements, EEG and EMG,
provides an accurate estimation of arousal state. Moreover, it provides a reliable estimation of blinking rate, which is affected by
many physiological and pathological processes. Overall, our
tool provides a reliable, noninvasive and inexpensive method for
detection of eye open/closed states, and is therefore a recommended add-on for primate electrophysiological setups.
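The blinking-rate estimate can be illustrated with a short sketch that follows the criterion given in the caption of Fig. 7: a blink is an eye closure lasting less than a second, counted per minute. The input is assumed to be one open/closed decision per video field. The function names and the `max_blink_s` parameter are hypothetical, and the Gaussian smoothing is omitted.

```python
import math
from typing import List, Sequence, Tuple


def closure_durations(eye_open: Sequence[bool],
                      field_rate: float) -> List[Tuple[float, float]]:
    """Return (start time, duration) in seconds for each run of closed samples."""
    closures = []
    run_start = None
    for i, is_open in enumerate(eye_open):
        if not is_open and run_start is None:
            run_start = i  # a closure begins
        elif is_open and run_start is not None:
            closures.append((run_start / field_rate, (i - run_start) / field_rate))
            run_start = None
    if run_start is not None:  # closure still ongoing when the recording ends
        closures.append((run_start / field_rate,
                         (len(eye_open) - run_start) / field_rate))
    return closures


def blinks_per_minute(eye_open: Sequence[bool], field_rate: float,
                      max_blink_s: float = 1.0) -> List[int]:
    """Count closures shorter than `max_blink_s` in consecutive 1-min bins."""
    n_minutes = math.ceil(len(eye_open) / field_rate / 60.0)
    counts = [0] * n_minutes
    for start, duration in closure_durations(eye_open, field_rate):
        if duration < max_blink_s:
            counts[int(start // 60)] += 1  # bin by the closure's start time
    return counts
```

Longer closures are excluded from the count, so periods of eye closure such as drowsiness are not mistaken for blinks. A closure cut off by the end of the recording is treated like any other here, which is a simplification.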
Acknowledgement
This work was partly supported by a Hebrew University Netherlands Association grant entitled “Fighting against Parkinson”, and
the Harry and Sylvia Hoffman leadership and responsibility program. We would like to thank E. Singer for language editing.
References
Baker RS, Radmanesh SM, Abell KM. The effect of apomorphine on blink kinematics
in subhuman primates with and without facial nerve palsy. Invest Ophthalmol
Vis Sci 2002;43:2933–8.
Benoit A, Caplier A. Hypovigilence analysis: open or closed eye or mouth? Blinking
or yawning frequency? In: IEEE conference on advanced video and signal based
surveillance, AVSS; 2005. p. 207–12.
Blazquez PM, Fujii N, Kojima J, Graybiel AM. A network representation of response
probability in the striatum. Neuron 2002;33(6):973–82.
Blin O, Masson G, Azulay JP, Fondarai J, Serratrice G. Apomorphine-induced blinking
and yawning in healthy volunteers. Br J Clin Pharmacol 1990;30:769–73.
Bracha V, Nilaweera W, Zenitsky G, Irwin K. Video recording system for the measurement of eyelid movements during classical conditioning of the eyeblink response
in the rabbit. J Neurosci Methods 2003;125:173–81.
Clark RE, Zola S. Trace eyeblink classical conditioning in the monkey: a nonsurgical
method and behavioral analysis. Behav Neurosci 1998;112:1062–8.
Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20:273–97.
Craw I, Tock D, Bennett A. Finding face features. In: Goos G, Hartmanis J, editors.
Computer vision – ECCV’92. Berlin: Springer; 1992. p. 92–6.
Durkin M, Prescott L, Jonet CJ, Frank E, Niggel M, Powell DA. Photoresistive measurement of the Pavlovian conditioned eyelid response in human subjects.
Psychophysiology 1990;27:599–603.
Heishman R, Duric Z. Using image flow to detect eye blinks in color videos. In: IEEE
workshop on applications of computer vision, WACV; 2007. p. 52.
Jackson A, Fetz EE. Compact movable microwire array for long-term chronic
unit recording in cerebral cortex of primates. J Neurophysiol 2007;98:3109–18.
Joshua M, Adler A, Mitelman R, Vaadia E, Bergman H. Midbrain dopaminergic neurons and striatal cholinergic interneurons encode the difference between reward
and aversive events at different epochs of probabilistic classical conditioning
trials. J Neurosci 2008;28:11673–84.
Karson CN. Spontaneous eye-blink rates and dopaminergic systems. Brain
1983;106(Pt 3):643–53.
Kleven MS, Koek W. Differential effects of direct and indirect dopamine agonists on
eye blink rate in cynomolgus monkeys. J Pharmacol Exp Ther 1996;279:1211–9.
Lemon RN. Methods for neuronal recording in conscious animals. In: IBRO handbook
series: methods in neurosciences. London: Wiley; 1984.
Marquis DG, Hilgard ER. Conditioned responses to light in monkeys after removal of
the occipital lobes. Brain 1937;60:1–12.
Miyakawa T, Takano H, Nakamura K. Development of non-contact real-time
blink detection system for doze alarm. In: SICE Annual Conference; 2004. p. 1626–31.
Nevet A, Morris G, Saban G, Fainstein N, Bergman H. Discharge rate of substantia
nigra pars reticulata neurons is reduced in non-parkinsonian monkeys with
apomorphine-induced orofacial dyskinesia. J Neurophysiol 2004;92:1973–81.
Nordhausen CT, Maynard EM, Normann RA. Single unit recording capabilities of a
100 microelectrode array. Brain Res 1996;726:129–40.
Pennypacker HS, King FA, Achenbach KE, Roberts L. An apparatus and procedure
for conditioning the eye-blink reflex in the squirrel monkey. J Exp Anal Behav
1966;9:601–4.
Ponder E, Kennedy WP. On the Act of Blinking. Q J Exp Physiol 1927;18:89–110.
Porter JD, Stava MW, Gaddie IB, Baker RS. Quantitative analysis of eyelid movement metrics reveals the highly stereotyped nature of monkey blinks. Brain Res
1993;609:159–66.
Rambold H, El Baz I, Helmchen C. Blink effects on ongoing smooth pursuit eye
movements in humans. Exp Brain Res 2005;161:11–26.
Robinson DA. A method of measuring eye movement using a scleral search coil in a
magnetic field. IEEE Trans Biomed Eng 1963;10:137–45.
Silverstein LD, Graham FK. Eyeblink EMG: a miniature eyelid electrode for recording
from orbicularis oculi. Psychophysiology 1978;15:377–9.
Steriade M, McCarley R. Brain control of wakefulness and sleep. 2nd ed. New York:
Springer; 2005.
Stevens JR. Eye blink and schizophrenia: psychosis or tardive dyskinesia? Am J Psychiatry 1978;135:223–6.
Tan H, Zhang YJ. Detecting eye blink states by tracking iris and eyelids. Pattern Recog
Lett 2006;27:667–75.
Thompson LT, Moyer JR, Akase E, Disterhoft JF. A system for quantitative analysis of
associative learning. Part 1. Hardware interfaces with cross-species applications.
J Neurosci Methods 1994;54:109–17.
Tian Yl, Kanade T, Cohn J. Eye-state action unit detection by gabor wavelets. In:
Advances in multimodal interfaces – ICMI 2000. Berlin: Springer; 2000. p. 143–50.
The role of the basal ganglia in reinforcement learning
Thesis submitted for the degree of Doctor of Philosophy
by
Mati Joshua
Submitted to the Senate of the Hebrew University in the year 5769 (Hebrew calendar)
March 2009
This work was carried out under the supervision of Prof. Hagai Bergman
Abstract
The basal ganglia are neural structures within the motor, cognitive and emotional control mechanisms.
Recent experimental studies and theoretical work describe the basal ganglia as a system
that implements reinforcement learning. These studies suggested that the neural activity of the basal ganglia enables
maximization of future rewards by controlling the environment.
In particular, studies have shown that the midbrain dopaminergic neurons respond with an increase in firing rate when the animal's state
is better than expected (positive surprise). This neural signal matches the error signal obtained in reinforcement learning. However,
the low baseline firing rate of these neurons limits their ability to encode negative events by
decreasing their firing rate.
The limitations of the dopaminergic neurons' response led me to examine two possibilities. The first is that activity
in the basal ganglia encodes both positive and negative values. The second is that only positive
values are encoded in the basal ganglia, and negative values are encoded by other neural structures.
To distinguish between these possibilities, I trained two monkeys on a probabilistic classical conditioning task. In each trial
the monkey was shown an image, after which it could receive food (reward), an air puff (punishment) or nothing. During
the task I recorded the neural activity of cells in five different areas of the basal ganglia of behaving
monkeys. Activity was recorded both from the modulators (the midbrain dopaminergic neurons and the striatal
cholinergic interneurons) and from the GABA-releasing cells (striatum, the internal and external segments of the
Globus Pallidus, and the Substantia Nigra pars reticulata) of the main axis of the basal ganglia.
Licking and blinking during presentation of the images showed that the monkeys expected both reward and punishment. However, in all
the neural populations from which I recorded, the activity encoded the expectation of food but not the expectation of the air puff.
In addition, I compared the response properties of the modulator cells and the GABA-releasing cells and found
that while the modulator cells have a fast and uniform response, the responses of the GABA-releasing cells were
long and diverse, and included both increases and decreases in rate.
Analysis of the correlation between the activity of simultaneously recorded dopaminergic neurons showed a rapid increase in correlation after
reward-related events but not after punishment-related events. The increase in correlation did not directly reflect
the rate changes of the cells. Computer simulation showed that the increase in correlation of the
dopaminergic neurons' activity may provide an additional mechanism for controlling dopamine levels in the striatum, beyond the control
afforded by changes in rate and response pattern.
In summary, the difference between the responses of the basal ganglia sub-populations (modulators versus
GABA-releasing cells) shows that these populations have different roles in reinforcement learning, in which the
modulator populations provide the main axis with a one-dimensional signal. The difference between the behavior, in which I found
responses to both the expectation of punishment and the expectation of reward, and the cellular activity in the basal ganglia, in which I found mainly
responses to the expectation of reward, indicates a difference between the reward and punishment systems of the brain, and may be the
neural basis for the differences between these systems observed in human behavior.