Articles in PresS. J Neurophysiol (September 16, 2015). doi:10.1152/jn.00097.2015
Discrete coding of stimulus value, reward expectation, and
reward prediction error in the dorsal striatum
Kei Oyama (1, 2), Yukina Tateyama (1), Istvan Hernadi (3), Philippe N. Tobler (4), Toshio Iijima (1), and Ken-Ichiro Tsutsui (1)
(1) Division of Systems Neuroscience, Tohoku University Graduate School of Life Sciences, Sendai 980-8577, Japan
(2) Department of Physiology, Tohoku University School of Medicine, Sendai 980-8575, Japan
(3) Department of Experimental Zoology and Neurobiology, and Szentagothai Research Center, University of Pécs, 7624 Pécs, Hungary
(4) Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, 8006 Zurich, Switzerland
Running head: Value, expectation, and prediction error in dorsal striatum
Corresponding author:
Ken-Ichiro Tsutsui, Ph.D.
Division of Systems Neuroscience
Tohoku University Graduate School of Life Sciences,
2-1-1 Katahira, Aoba-ku, Sendai 980-8577, Japan
E-mail: [email protected]
Copyright © 2015 by the American Physiological Society.
Abstract
To investigate how the striatum integrates sensory information with reward information for behavioral guidance, we recorded single-unit activity in the dorsal striatum of head-fixed rats performing a probabilistic Pavlovian conditioning task with auditory conditioned stimuli (CSs), in which reward probability was fixed for each CS but varied parametrically across CSs. We found that the activity of many neurons was linearly correlated with the reward probability indicated by the CSs. Based on their firing patterns, the recorded neurons could be classified into functional subtypes that coded reward probability in different forms: stimulus value, reward expectation, and reward prediction error. These results suggest that several functional subgroups of dorsal striatal neurons represent different kinds of information formed through extensive prior exposure to CS-reward contingencies.
Keywords
single-unit recording, head-fixed rats, Pavlovian conditioning
Introduction
The striatum, an input stage of the basal ganglia, receives projections from almost all areas of the cerebral cortex (Bolam et al. 2000) as well as from dopamine neurons in the substantia nigra pars compacta (SNc) (Anden 1964). These diverse anatomical inputs make the striatum ideally placed to integrate sensory and motor information from the cerebral cortex with reward information from the SNc. Previous studies have shown that striatal neurons are activated at various events within a behavioral task trial, such as the instruction cue, the delay, and the execution of the movement leading to reward delivery (Apicella et al. 1992; Hikosaka et al. 1989a, 1989b; Kimura 1990; Rolls et al. 1983), and that such task-related activity is modulated by the likelihood of obtaining the reward (Cromwell et al. 2003; Hassani et al. 2001; Hollerman et al. 1998; Kawagoe et al. 1998; Nakamura et al. 2012). Furthermore, studies using behavioral tasks in which the probability of obtaining reward by one action or another changed dynamically found that striatal neurons track the value of a specific action (Samejima et al. 2005) or the value of the action actually chosen (Lau and Glimcher 2008).
A useful way to investigate how the striatum integrates information from the SNc and the cerebral cortex is to record single-unit activity in these structures while animals perform the same task. The activity of dopamine neurons in the SNc has been widely investigated using a probabilistic Pavlovian conditioning task in which the association between the conditioned stimulus (CS) and the subsequent reward (unconditioned stimulus, US) is varied parametrically across the full probability range (p = 0 to 1). Using such a task, Schultz and colleagues found that dopamine neurons code reward prediction error in their phasic responses to the stimulus and the reward, and that they also code reward uncertainty in their tonic activity between the CS and the outcome (Fiorillo et al. 2003). We recently recorded single-unit activity in the rat dorsal striatum and SNc during a probabilistic Pavlovian conditioning task and found that a group of neurons in the dorsal striatum codes reward prediction error in the same manner as dopamine neurons in the SNc (Oyama et al. 2010). Whereas that study focused on the neurons coding reward prediction error, in the present study we analyzed the same dataset for any task-related variation of activity. We also analyzed data from an additional experiment in which the delay duration was extended.
Materials & Methods
Subjects
Twenty-one male albino Wistar rats weighing 220–300 g were used as subjects. They were individually housed under a 12:12-h light:dark cycle with light onset at 8:00 P.M. Throughout the experiments they were treated in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals, the Tohoku University Guidelines for Animal Care and Use, and the APS Guiding Principles for the Care and Use of Vertebrate Animals in Research and Training.
Apparatus
Experiments were conducted in a dimly lit, sound-attenuated room. Brief auditory CSs were generated by a personal computer and presented diagonally from two loudspeakers (ASP-701, Elecom) placed 30 cm from the rat's head. An infrared sensor system was used to detect conditioned and unconditioned spout-licking movements.
Behavioral procedure and task
Before behavioral training, a head-fixation device consisting of two metal tubes, together with a stainless steel screw serving as a grounded reference electrode, was fixed to the skull with dental cement under anesthesia induced by a combination of ketamine (80.0 mg/kg) and xylazine (0.8 mg/kg). After recovery from the surgery, each rat was habituated to an acrylic half-cylinder restraining device (diameter: 8.5 cm, length: 15 cm) combined with a stereotaxic head-fixation frame (SR-5R, Narishige). During the task training and single-unit recording sessions, the rat was placed in the restraining device with its head fixed firmly and painlessly in the stereotaxic device (Fig. 1).
The rats were trained with a probabilistic classical conditioning procedure. Five auditory stimuli of the same intensity but different frequencies (1.2, 2, 5, 9, and 14 kHz) were used as CSs, each indicating a different reward probability (p = 0, 0.25, 0.5, 0.75, or 1.0). The assignment of tone frequencies to reward probabilities was varied between rats. To dissociate reward-probability-dependent neuronal activity from auditory sensory responses, the assignments were arranged so that reward-probability-dependent tuning of response amplitude would appear as multi-peaked tuning when responses were plotted against tone frequency on a logarithmic axis. Typical sensory responses, in contrast, would appear as single-peaked tuning when activity is plotted against tone frequency (Bordi and LeDoux 1992; Doron et al. 2002; Sally and Kelly 1988; Sutter and Schreiner 1999).
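The counterbalancing logic can be checked with a short sketch. A neuron tuned to reward probability should fire in rank order of probability, so the sequence of probabilities ordered by tone frequency predicts the shape of its frequency tuning curve; counting local maxima in that sequence shows whether the tuning would look multi-peaked. The two assignments below are hypothetical examples, not the mappings used for any rat in the study. Peak counting depends only on frequency order, so plotting on a logarithmic axis does not change the count.

```python
def count_peaks(values):
    """Count local maxima, including end points that exceed their single neighbour."""
    n = len(values)
    peaks = 0
    for i, v in enumerate(values):
        left_ok = i == 0 or v > values[i - 1]
        right_ok = i == n - 1 or v > values[i + 1]
        if left_ok and right_ok:
            peaks += 1
    return peaks


def probability_profile(mapping):
    """Reward probabilities ordered by tone frequency (kHz)."""
    return [mapping[f] for f in sorted(mapping)]


# Hypothetical frequency (kHz) -> reward probability assignments.
EXAMPLE = {1.2: 0.5, 2: 0.0, 5: 1.0, 9: 0.25, 14: 0.75}    # interleaved
MONOTONIC = {1.2: 0.0, 2: 0.25, 5: 0.5, 9: 0.75, 14: 1.0}  # rank-ordered

print(count_peaks(probability_profile(EXAMPLE)))    # multi-peaked profile
print(count_peaks(probability_profile(MONOTONIC)))  # single peak at 14 kHz
```

An interleaved assignment yields three peaks, whereas a monotonic one yields a single peak that would be indistinguishable from ordinary frequency tuning.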
In each trial a 1.5-s CS was followed by a 0.5-s delay. Whether reward was delivered immediately after the delay was determined probabilistically according to the CS; in a rewarded trial, a solenoid valve opened for 250 ms and delivered 50 μl of sucrose solution through a spout in front of the rat's mouth. The intertrial interval (ITI) was usually set to one of six durations, each consisting of a fixed 4 s plus an exponentially distributed interval with a mean of 5 s. The exception was when an unpredicted reward was given during the ITI; in that case, the interval between the end of the previous trial and the unpredicted reward and the interval between the unpredicted reward and the start of the next trial were each set to one of the regular ITI durations. The trial sequence was predetermined by a personal computer so that each of the five CSs and the unpredicted reward appeared twice in a block of 10 trials. A daily session consisted of 600 trials.
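A minimal Python sketch of such a schedule follows. It assumes simple block randomization (each of the five CSs appears twice per 10-trial block) and assumes that the six ITI durations are 4 s plus the mid-quantiles of an exponential distribution with a 5-s mean; the paper does not state how the six values were chosen, and unpredicted ITI rewards are omitted for brevity. All function names are hypothetical.

```python
import math
import random

REWARD_PROB = [0.0, 0.25, 0.5, 0.75, 1.0]  # one CS per reward probability


def iti_durations(n_values=6, fixed_s=4.0, mean_s=5.0):
    """Hypothetical discretization: 4 s plus exponential mid-quantiles."""
    out = []
    for k in range(n_values):
        q = (k + 0.5) / n_values  # mid-quantile of the k-th bin
        out.append(fixed_s - mean_s * math.log(1.0 - q))
    return out


def make_block(rng):
    """One block of 10 trials: each of the five CSs twice, shuffled."""
    block = [cs for cs in range(len(REWARD_PROB)) for _ in range(2)]
    rng.shuffle(block)
    return block


def make_session(n_trials=600, seed=0):
    rng = random.Random(seed)
    itis = iti_durations()
    trials = []
    while len(trials) < n_trials:
        for cs in make_block(rng):
            # random() is in [0, 1), so p = 0 never and p = 1 always rewards
            rewarded = rng.random() < REWARD_PROB[cs]
            trials.append({"cs": cs, "rewarded": rewarded, "iti_s": rng.choice(itis)})
    return trials[:n_trials]


session = make_session()
print(len(session), sum(t["rewarded"] for t in session))
```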
An additional experiment was conducted to identify the task events to which the activity of the recorded neurons was time-locked. Six rats were used as subjects in this experiment. The delay period was extended in a stepwise fashion, and three auditory stimuli were used as CSs indicating reward probabilities of 0, 50, and 100%. Again, an unpredicted reward was occasionally given during the ITI. The CS indicating a reward probability of 50% appeared twice as often as each of the CSs indicating reward probabilities of 0 or 100%. As a consequence, the number of rewarded trials in the 50% condition was the same as that in the 100% condition. Each recording session for a given delay duration consisted of 60–90 trials, and the initial 20–30 trials after each delay extension were excluded from analysis to allow the rats to adapt to the new timing of reward delivery. After a neuron was isolated, its activity was first recorded with a 0.5-s delay; the delay duration (i.e., the time without an explicit timing cue) was then extended to 1.5 s, then from 1.5 to 3.5 s, and finally set back to 0.5 s.
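The balancing of rewarded trials is simple expectation arithmetic. Writing n for the number of presentations of the 100% CS (an illustrative symbol, not a value reported in the study), the 50% CS is presented 2n times, so in expectation:

```latex
\mathbb{E}\left[N_{\mathrm{rew}}^{50\%}\right] = 2n \times 0.5 = n = n \times 1.0 = N_{\mathrm{rew}}^{100\%}
```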
Single-unit recording
The recording sessions began once the rat's anticipatory licking responses discriminated between reward probabilities during the CS and delay periods. Chronic access to the brain was provided by a second surgical procedure in which a hole was opened in the skull and a recording chamber was attached over it. The position of the hole (AP = +2.0 to −1.5 mm from bregma; L = 1.0 to 4.5 mm from the midline) was determined according to a standard stereotaxic atlas (Paxinos and Watson 2005). After recovery from surgery, the activity of single neurons was recorded extracellularly with platinized-tip tungsten microelectrodes (1–3 MΩ measured at 1 kHz, 0.125-mm-diameter shaft; FHC) while the rat performed the Pavlovian conditioning task. The electrode was attached to a hydraulic microdrive (MO-15, Narishige) so that it could be advanced into the brain. Electrophysiological signals were amplified (10,000 times) and bandpass filtered (low cut: 100 Hz; high cut: 10,000 Hz) with a standard biophysical amplifier (Bio Amp A2-v6, Supertech) and displayed on an oscilloscope (CS-4125A, Kenwood). The amplified signals were also made audible to the experimenter through headphones. The action potentials of isolated neurons were sorted by a window discriminator (DDIS-1, Bak Electronics) and displayed on a digital storage oscilloscope (DCS-7040, Kenwood). The recorded electrophysiological signals were digitized at 25 kHz with an analog-digital conversion interface (Power 1401, CED) and stored on the hard disk of a personal computer (X100, IBM), together with the times of the detected action potentials, licking movements, and task events. Rasters and histograms showing the neuronal activity recorded under each probability condition and the response to the unpredicted reward were displayed online on an LCD screen. If visual inspection suggested that the neural activity was related to one or more task events (CS, delay, and/or reward) or to the unpredicted reward given during the ITI, we stored the recorded data for off-line analysis. For each stored neuron, the dataset included activity recorded in at least 7 trials for each probability condition, and this dataset was subjected to further analysis in the present study.
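The window discriminator used here was a hardware device. As an illustration only, a simplified software analogue can be sketched in Python with NumPy: a candidate event starts at an upward threshold crossing and is accepted as a spike only if its peak within a short dead time stays below an upper amplitude limit, which rejects large artifacts. The threshold, upper limit, and 1-ms dead time are hypothetical parameters; only the 25-kHz sampling rate comes from the text.

```python
import numpy as np

FS_HZ = 25_000  # digitization rate reported in the text


def window_discriminate(signal, threshold, upper, dead_time_s=0.001, fs=FS_HZ):
    """Return sample indices of accepted spikes (simplified window discriminator)."""
    x = np.asarray(signal, dtype=float)
    dead = max(1, int(round(dead_time_s * fs)))
    # Indices where the signal crosses the threshold from below.
    crossings = np.flatnonzero((x[:-1] < threshold) & (x[1:] >= threshold)) + 1
    accepted = []
    last = -dead
    for idx in crossings:
        if idx - last < dead:
            continue  # still inside the dead time of the previous accepted spike
        if x[idx:idx + dead].max() <= upper:  # peak must stay inside the window
            accepted.append(int(idx))
            last = idx
    return accepted


sig = np.zeros(1000)
sig[100:105] = 5.0   # spike-like event inside the window
sig[500:505] = 50.0  # large artifact above the upper limit
print(window_discriminate(sig, threshold=3.0, upper=10.0))
```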
Analysis of neuronal activity
When analyzing the activity of each neuron, we focused on four time windows within the trial: stimulus-related activity from CS onset to 750 ms after CS onset (first half of the CS presentation), stimulus-related activity from 750 ms before CS offset to CS offset (second half of the CS presentation), reward expectation activity from 1000 ms before reward delivery to reward delivery, and reward activity (including prediction error responses) from reward onset to 500 ms after reward delivery.
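The window definitions can be made concrete with a small sketch that computes per-trial firing rates, with spike times expressed relative to CS onset. The 1.5-s CS and 0.5-s default delay come from the task description and the 500-ms baseline from the classification procedure; the function names and half-open window convention are assumptions for illustration.

```python
import numpy as np

CS_DURATION_S = 1.5
DELAY_S = 0.5  # main experiment; reward time = CS onset + CS duration + delay


def analysis_windows(cs_duration=CS_DURATION_S, delay=DELAY_S):
    """(start, end) of each window in seconds relative to CS onset."""
    reward_t = cs_duration + delay
    return {
        "baseline": (-0.5, 0.0),
        "cs_first_half": (0.0, 0.75),
        "cs_second_half": (cs_duration - 0.75, cs_duration),
        "pre_reward": (reward_t - 1.0, reward_t),
        "post_reward": (reward_t, reward_t + 0.5),
    }


def window_rates(spike_times, windows):
    """Firing rate (spikes/s) in each half-open window [start, end) for one trial."""
    s = np.asarray(spike_times, dtype=float)
    return {name: np.count_nonzero((s >= a) & (s < b)) / (b - a)
            for name, (a, b) in windows.items()}


rates = window_rates([-0.3, 0.1, 0.2, 0.8, 1.2, 1.9, 2.1], analysis_windows())
print(rates)
```

With the 0.5-s delay of the main experiment, the pre-reward window (1.0 to 2.0 s after CS onset) overlaps the last 0.5 s of the second-half CS window; with the longer delays of the delay-extension experiment, the two windows separate.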
We classified neurons into three groups according to which of the three pre-reward time windows showed the greatest change in activity; the window after reward delivery was excluded from this analysis because we wanted to classify neurons on the basis of their activity before receipt of the reward. For each window, we calculated the t-value comparing the baseline activity (500 ms before CS onset) with the activity in that window, and the window yielding the largest absolute t-value was taken as the one with the greatest change in activity. Neurons showing the greatest change during the 750 ms after CS onset (first half of the CS presentation), during the 750 ms before CS offset (second half of the CS presentation), and during the 1000 ms before reward delivery were classified as CS phasic, CS tonic, and US build-up neurons, respectively.
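The classification rule can be sketched as follows. The text does not state which t-test was used; this sketch assumes a paired t-test across trials (each trial contributes a baseline rate and a window rate) and takes the pre-reward window with the largest absolute t-value, so inhibitory changes count as well. The window keys and data are hypothetical.

```python
import numpy as np

PRE_REWARD_WINDOWS = ("cs_first_half", "cs_second_half", "pre_reward")
LABELS = {"cs_first_half": "CS phasic",
          "cs_second_half": "CS tonic",
          "pre_reward": "US build-up"}


def paired_t(x, y):
    """Paired t statistic of trial-wise window rates x against baseline rates y."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))


def classify(rates_by_window):
    """rates_by_window maps each window (and 'baseline') to per-trial rates."""
    base = rates_by_window["baseline"]
    t = {w: paired_t(rates_by_window[w], base) for w in PRE_REWARD_WINDOWS}
    best = max(t, key=lambda w: abs(t[w]))  # largest absolute t-value wins
    return LABELS[best], t


# Hypothetical per-trial rates (spikes/s) for one neuron, four trials.
data = {
    "baseline": [2.0, 3.0, 2.0, 3.0],
    "cs_first_half": [3.0, 4.0, 3.5, 4.5],
    "cs_second_half": [2.5, 3.0, 2.0, 3.5],
    "pre_reward": [10.0, 11.0, 10.5, 11.5],
}
label, t_values = classify(data)
print(label)
```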
To analyze the information coded by each neuron in each time window, we conducted a multiple linear regression analysis on a trial-by-trial basis with reward probability and uncertainty as regressors (p < 0.05, uncorrected for multiple regressors). Uncertainty was quantified as relative variance, which is 1 at p = 0.5 and 0 at p = 0 and p = 1. When analyzing the activity within 500 ms after reward onset, we used only rewarded trials. The response to an unpredicted reward given during the ITI served as an approximation of the reward response in the p = 0 condition in the multiple regression analysis. Neurons showing greater activity within 500 ms after the onset of an unpredicted reward than within 500 ms before it were judged to be responsive to reward.
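A sketch of the regression follows, assuming that relative variance is the Bernoulli variance p(1 − p) divided by its maximum of 0.25, that is 4p(1 − p), which matches the stated values of 1 at p = 0.5 and 0 at p = 0 and 1. Standardized beta coefficients are obtained by z-scoring the firing rate and both regressors before a least-squares fit. Significance testing of each coefficient is omitted, and the function names are hypothetical.

```python
import numpy as np


def relative_variance(p):
    """Uncertainty as relative variance: 4p(1 - p), i.e. 1 at p = 0.5, 0 at p = 0 or 1."""
    p = np.asarray(p, dtype=float)
    return 4.0 * p * (1.0 - p)


def standardized_betas(rates, probs):
    """Trial-by-trial regression of firing rate on reward probability and uncertainty."""
    p = np.asarray(probs, dtype=float)
    X = np.column_stack([p, relative_variance(p)])
    y = np.asarray(rates, dtype=float)
    # z-scoring every variable makes the fitted slopes standardized betas;
    # centering also removes the need for an intercept column.
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return {"probability": beta[0], "uncertainty": beta[1]}


probs = np.repeat([0.0, 0.25, 0.5, 0.75, 1.0], 10)  # balanced design
print(standardized_betas(5.0 + 8.0 * probs, probs))
```

In a balanced design over the five probabilities, p and 4p(1 − p) are uncorrelated, so a purely probability-driven neuron yields a standardized beta of 1 for probability and 0 for uncertainty.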
We used the Wilcoxon signed-rank test to determine whether the distributions of the standardized beta coefficients for probability and uncertainty, obtained for each neuron from the multiple linear regression analysis, deviated significantly (p < 0.05) from zero. To compare the effect size of each regressor, we compared the absolute values of the standardized beta coefficients for reward probability and uncertainty in each time window using a paired t-test (p < 0.05). We also conducted a permutation test to determine whether the number of neurons showing statistical significance was greater than chance level (p < 0.05). Population-averaged histograms were constructed as follows. First, for each neuron, we subtracted the baseline firing rate from the firing rate in each bin. We then normalized the firing rate in each bin by that of the bin with the highest firing rate. Finally, the normalized activity was averaged across the neuronal population.
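The three normalization steps can be sketched with NumPy. This sketch assumes that the peak used for normalization is taken after baseline subtraction and that every neuron's peak lies above baseline (as for excitatory responses); neurons whose activity never exceeds baseline would need separate handling. All histograms are assumed to share the same bin edges.

```python
import numpy as np


def normalize_psth(rates, baseline_rate):
    """Subtract the baseline rate, then divide by the value of the highest bin."""
    r = np.asarray(rates, dtype=float) - baseline_rate
    return r / r.max()


def population_average(psths, baselines):
    """Average the normalized histograms across neurons (same bins assumed)."""
    return np.mean([normalize_psth(r, b) for r, b in zip(psths, baselines)], axis=0)


# Two hypothetical neurons, four bins each (spikes/s), with their baseline rates.
print(population_average([[2, 4, 10, 6], [1, 1, 3, 2]], [2, 1]))
```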
Histology
Electrolytic lesions were made by passing electrical current through the electrode tip into the brain tissue. After the rat was killed with an overdose of pentobarbital, it was transcardially perfused first with 0.9% saline and then with 10% formalin. The brain was then removed from the skull and stored in 10% formalin solution. For histological inspection, the brain was cut into 50-μm coronal sections and stained with thionine. Sections were examined under a light microscope to verify lesion sites and electrode tracks. Electrode placements were finally verified using the rat brain atlas (Paxinos and Watson 2005). The recording sites were superimposed at 0.5-mm intervals on corresponding coronal sections of the left hemisphere.
Results
We trained 15 head-fixed rats in the probabilistic Pavlovian conditioning task (Fig. 1). After 2 to 3 months of training, the rats exhibited probability-dependent spout licking (i.e., a longer cumulative licking duration for higher reward probability) during the CS and/or delay period (Oyama et al. 2010). Training was considered complete when discriminative anticipatory licking responses emerged during the CS and delay period.
Temporal response profiles of dorsal striatal neurons during the probabilistic Pavlovian conditioning task
After training was complete, we recorded single-unit activity in the dorsal striatum while the rats performed the probabilistic Pavlovian conditioning task. Of the 1102 striatal neurons whose activity was isolated, 18% (N = 194) were judged by the experimenter's visual inspection to show a change in activity (more or less firing than the baseline level) during a trial, and their activity was stored on the computer.
Figure 2 shows the activity of every recorded neuron in rewarded (Fig. 2, upper left) and unrewarded trials (Fig. 2, lower left) of the 50% reward probability condition, normalized by the peak response. Neurons were rank-sorted by the time of their peak response within the trial, calculated from the pooled data of all reward probability conditions, including both rewarded and unrewarded trials. The activity of each neuron around the delivery of the unpredicted reward is also shown (Fig. 2, right). Visual inspection suggested several subtypes of neurons with different firing patterns. The first consisted of neurons showing a phasic response to the CS presentation and to reward delivery (neuron ID #1 to around #100), the second consisted of neurons showing a tonic response during CS presentation (neuron ID around #101 to around #120), and the third consisted of neurons showing build-up activity towards the time of reward delivery (neuron ID around #121 to #194).
We classified neurons into three groups according to which of the three pre-reward time windows showed the greatest change in activity (see Materials & Methods). Of the 194 recorded neurons, 39% (76; all excitatory) showed the greatest change during the 750 ms after CS onset (first half of the CS presentation), 12% (23; 17 excitatory and 6 inhibitory) showed the greatest change during the 750 ms before CS offset (second half of the CS presentation), and 49% (95; 91 excitatory and 4 inhibitory) showed the greatest change during the 1000 ms before reward delivery. Hereafter we refer to the neurons showing the greatest activity change during the first half of the CS presentation, the second half of the CS presentation, and before reward as CS phasic, CS tonic, and US build-up neurons, respectively, and focus on their firing patterns.
Neurons with phasic CS response (CS phasic neurons)
Of the 194 recorded neurons, 39% (76/194) were classified as CS phasic neurons. They typically showed a phasic response after CS onset, and most of them (76%, 58/76) also showed a significant response to the reward. We then assessed how their activity was related to reward probability. We conducted a multiple linear regression analysis to test whether the activity in each time window (first half of the CS presentation; second half of the CS presentation; 1000 ms before reward delivery; 500 ms after reward delivery) was linearly related to reward probability or to reward uncertainty, quantified as relative variance (1 at p = 0.5 and 0 at p = 0 and 1). In this paper we use the term “uncertainty” as defined by Fiorillo et al. (2003), although recent studies have also referred to it as “risk” (e.g., Burke and Tobler 2011). We included uncertainty in the regression model because the tonic activity of midbrain dopamine neurons is correlated with reward uncertainty (Fiorillo et al. 2003). Table 1 summarizes the results of the multiple linear regression analysis. The proportion of neurons showing a positive correlation between activity and reward probability was largest in the three time windows before reward delivery, especially during the first half of the CS presentation, when the majority (53%, 40/76) showed a positive correlation between CS response and reward probability. In contrast, half of the CS phasic neurons (50%, 38/76) showed a negative correlation between reward response and reward probability. In total, 34% (26/76) showed both a positive correlation between CS response and reward probability and a negative correlation between reward response and reward probability.
Figures 3 and 4 show the activity of a representative CS phasic neuron and the average histograms of the 26 CS phasic neurons whose CS and reward responses were positively and negatively related to reward probability, respectively. These neurons showed a phasic response both to the CS and to the reward. The response during the first half of the CS was positively related to reward probability (r = 0.55, p < 0.0001), whereas the reward response was negatively related to reward probability (r = −0.44, p < 0.0001) and was largest for the unpredicted reward. These neurons were therefore considered to code reward prediction error, the discrepancy between the predicted and the actual occurrence of reward. This population was almost the same as the one we previously reported as reward prediction error-coding neurons (Oyama et al. 2010).
Figure 5 shows the distributions of the standardized beta coefficients for reward probability and uncertainty of all CS phasic neurons in each time window. In the three time windows before reward delivery, the distribution of coefficients for reward probability deviated positively from zero (p < 0.05, Wilcoxon signed-rank test), whereas in the window after reward delivery it deviated negatively (p < 0.05, Wilcoxon signed-rank test) (Fig. 5A). The distribution of coefficients for uncertainty did not deviate from zero during the first half of the CS presentation (p > 0.05, Wilcoxon signed-rank test); it deviated positively during the second half of the CS presentation and before reward (p < 0.05, Wilcoxon signed-rank test), and negatively after reward delivery (p < 0.05, Wilcoxon signed-rank test) (Fig. 5B).
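For readers who want to reproduce this kind of population test, a self-contained sketch of a two-sided Wilcoxon signed-rank test against zero follows. It uses the normal approximation with average ranks for ties and no tie correction of the variance, so it is only suitable for moderately sized samples; the study does not state which implementation it used, and a library routine such as SciPy's would normally be preferred.

```python
import math


def wilcoxon_signed_rank(values):
    """Two-sided Wilcoxon signed-rank test against zero (normal approximation).

    Zeros are dropped; tied absolute values receive average ranks.
    Requires at least one nonzero value.
    """
    x = [v for v in values if v != 0]
    n = len(x)
    order = sorted(range(n), key=lambda i: abs(x[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(x[order[j + 1]]) == abs(x[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, x) if v > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, z, p


# Hypothetical standardized betas that are all positive.
print(wilcoxon_signed_rank([0.1 * i for i in range(1, 21)]))
```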
To compare the effect size of each regressor, we compared the absolute values of the standardized beta coefficients for reward probability and uncertainty. In every time window, the absolute value of the standardized beta coefficient for reward probability was greater than that for uncertainty (p < 0.05, paired t-test). We also confirmed this tendency in the proportion of neurons that showed a statistically significant activity change during the first half of the CS presentation and after the reward (chi-square test, p < 0.05 with Bonferroni correction).
Neurons with tonic CS response (CS tonic neurons)
Of the 194 recorded neurons, 12% (23/194) were classified as CS tonic neurons. They typically showed a tonic response during CS presentation, and 30% of them (7/23) showed a significant response to the reward. The proportion of neurons showing a positive correlation between activity and reward probability was largest in the three time windows before reward delivery, especially during the second half of the CS presentation (Table 1), when the majority (57%, 13/23) showed a positive correlation between CS response and reward probability.
Figures 6 and 7 show the activity of a representative CS tonic neuron and the average histograms of the 13 CS tonic neurons whose CS response was positively related to reward probability. These neurons were tonically active during CS presentation and showed no response to reward delivery. The CS response was positively related to reward probability (r = 0.78, p < 0.0001).
Figure 8 shows the distributions of the standardized beta coefficients for reward probability and uncertainty of all CS tonic neurons in each time window. In the three time windows before reward delivery, the distribution of coefficients for reward probability deviated positively from zero (p < 0.05, Wilcoxon signed-rank test), whereas in the window after reward delivery it did not deviate (p > 0.05, Wilcoxon signed-rank test) (Fig. 8A). The distribution of coefficients for uncertainty did not deviate from zero in any time window (p > 0.05, Wilcoxon signed-rank test) (Fig. 8B). In the three time windows before reward delivery, the absolute value of the standardized beta coefficient for reward probability was greater than that for uncertainty (p < 0.05, paired t-test). We also confirmed this tendency in the proportion of neurons that showed a statistically significant activity change during the second half of the CS presentation (chi-square test, p < 0.05 with Bonferroni correction).
Neurons with pre-US build-up activity (US build-up neurons)
Of the 194 recorded neurons, 49% (95/194) were classified as US build-up neurons. They typically showed gradually increasing activity toward the time of reward delivery. Of these, 47% (45/95) also showed a significant response to the reward, whereas 53% (50/95) did not. The proportion of neurons showing a positive correlation between activity and reward probability was largest in every time window, especially during the 1000 ms before reward delivery (Table 1), when the majority (68%, 65/95) showed a positive correlation between activity in this window and reward probability.
Figures 9A and 10A show the activity of a representative US build-up neuron and the average histograms of the 31 US build-up neurons whose pre-reward activity was positively related to reward probability and that responded to the reward. The activity of these neurons gradually increased towards the time of reward delivery and was also high after the reward. The activity during the 1000 ms before reward delivery was positively related to reward probability (r = 0.68, p < 0.0001), whereas the activity after reward delivery was not (p > 0.1).
Figures 9B and 10B show the activity of a representative US build-up neuron and the average histograms of the 34 US build-up neurons whose pre-reward activity was positively related to reward probability but that did not respond to the reward. As in the reward-responsive class of US build-up neurons, activity gradually increased towards the time of reward delivery. The activity of these neurons, however, decreased steeply after reward delivery. The activity during the 1000 ms before reward delivery was positively and linearly related to reward probability (r = 0.70, p < 0.0001). The activity after the time of reward delivery was also positively correlated with reward probability (r = 0.35, p < 0.0001), but this may simply reflect the preceding probability-dependent build-up activity. The activity after the usual time of reward was higher in unrewarded than in rewarded trials (comparison in the intermediate reward probability conditions, 75%, 50%, and 25%; p < 0.0001, two-way ANOVA).
Figure 11 shows the distributions of the standardized beta coefficients for reward probability and uncertainty of all US build-up neurons in each time window. In every time window, the distribution of coefficients for reward probability deviated positively from zero (p < 0.05, Wilcoxon signed-rank test) (Fig. 11A). The distribution of coefficients for uncertainty did not deviate from zero during the first half of the CS presentation (p > 0.05, Wilcoxon signed-rank test), deviated positively during the second half of the CS presentation and before reward (p < 0.05, Wilcoxon signed-rank test), and did not deviate after reward delivery (p > 0.05, Wilcoxon signed-rank test) (Fig. 11B). In every time window, the absolute value of the standardized beta coefficient for reward probability was greater than that for uncertainty (p < 0.05, paired t-test). We also confirmed this tendency in the proportion of neurons that showed a statistically significant activity difference during the 1000 ms before reward delivery (chi-square test, p < 0.05 with Bonferroni correction).
Impact of delay extensions
In classifying the neurons as above, we assumed that the activity of CS phasic and CS tonic neurons between the CS onset and the reward onset was time-locked to the CS onset, and that the activity of US build-up neurons was time-locked to the reward onset. To test this assumption, we conducted an additional experiment in which we recorded the activity of neurons while changing the length of the delay from 0.5 s to 1.5 s, then to 3.5 s, and finally back to 0.5 s. The activity of representative CS phasic, CS tonic, and US build-up neurons during this delay manipulation is shown in Fig. 12A, B, and C. We calculated the onset and peak latency of event-related activity in each neuron (Fig. 13A). In CS phasic neurons, the onset and peak latencies of activity related to the CS onset remained unchanged (p > 0.05, Kruskal-Wallis test), and the responses related to the reward onset likewise occurred at similar times relative to reward onset irrespective of delay duration. In CS tonic neurons, the onset and peak latencies of responses related to the CS remained unchanged (p > 0.05, Kruskal-Wallis test). Thus we confirmed that the pre-reward activity of CS phasic and CS tonic neurons was time-locked to the CS onset, whereas the post-reward activity of CS phasic neurons was time-locked to the reward onset. In strong contrast to the pre-reward responses of CS phasic and CS tonic neurons, the onset times of pre-reward responses in US build-up neurons shifted away from the CS onset as the delay to reward increased (p < 0.0001, Kruskal-Wallis test), whereas the peak response times with respect to the reward onset remained unchanged (p > 0.05, Kruskal-Wallis test). Thus, in terms of peak response time, the activity of US build-up neurons was time-locked to the reward onset.
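The latency comparisons above rely on the Kruskal-Wallis test. A minimal, dependency-free sketch of the H statistic (with tie correction) and its chi-square p-value is given below; the latency values in the usage lines are invented for illustration, and in practice a library routine such as `scipy.stats.kruskal` would normally be used instead.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution (closed forms for integer df)."""
    half = x / 2
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def kruskal_wallis(*groups):
    """Return (H, p) for k independent groups, with H corrected for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # all observations identical
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical peak latencies (ms after CS onset) at delays of 0.5, 1.5, and 3.5 s.
d05 = [112, 98, 105, 120, 101]
d15 = [109, 115, 99, 104, 118]
d35 = [103, 111, 107, 97, 116]
h, p = kruskal_wallis(d05, d15, d35)
print(f"H = {h:.3f}, p = {p:.3f}")
```

A latency that is time-locked to the CS should give a large p here, whereas a latency that shifts with the delay should give a small one.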
We also examined whether the magnitude of the event-related activity changed with the delay extension (Fig. 13B). On average, CS phasic neurons showed weaker CS responses and stronger reward responses with delay extension, CS tonic neurons showed weaker CS responses, and US build-up neurons showed reduced activity during the last 1 s before reward time (p < 0.05, one-way ANOVA). However, the activity level appeared not only to change abruptly with the shift of the delay duration but also to change gradually as the daily session progressed. It is possible that the gradual decrease of neuronal firing reflects a decrease of the animal’s motivation. As an indicator of motivational level, we used the number of licking movements after reward delivery. To dissociate the abrupt change associated with the shift of the delay duration from the gradual change associated with motivational level, we applied multiple linear regression analysis on an individual-neuron basis, with the delay length as one factor and the number of licking movements after reward delivery as another factor (p < 0.05, uncorrected). In this analysis, the number of licking movements was averaged over the rewarded trials within each block of 10 trials, and this average was assigned to all 10 trials of the block. The results are summarized in Table 2. Of all the CS phasic neurons we found (N = 14), 6 showed reduced CS responses only with delay extension, 1 showed reduced responses only with the decrease of licking movements, and 1 showed both effects. Moreover, 6 showed stronger reward responses with delay extension and 1 showed stronger reward responses with the decrease of licking movements. Of all the CS tonic neurons we found (N = 6), 1 showed reduced CS responses only with delay extension, 1 showed reduced responses only with the decrease of licking movements, and 3 showed both effects. Of all the US build-up neurons we found (N = 19), 4 showed reduced pre-reward activity only with delay extension, 6 showed reduced activity only with the decrease of licking movements, and 4 showed both effects. Thus, in about half of the neurons, the analysis of individual neuron data confirmed the main findings from the analysis of average data. A considerable number of striatal neurons were affected by the delay factor, while some of these, and some others, were affected by the motivational factor.
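The construction of the motivational regressor described above (lick counts averaged over the rewarded trials of each 10-trial block and assigned to every trial of that block) can be sketched as follows, together with the per-trial delay regressor. The function name, the NaN fallback for blocks without any rewarded trial, and the example session are assumptions made for illustration.

```python
import math


def block_averaged_licks(licks, rewarded, block=10):
    """Average post-reward lick counts over the rewarded trials of each block
    and assign that average to every trial in the block.

    Blocks with no rewarded trial get NaN (an assumption; the text does not
    say how such blocks were handled).
    """
    out = []
    for start in range(0, len(licks), block):
        idx = range(start, min(start + block, len(licks)))
        vals = [licks[i] for i in idx if rewarded[i]]
        avg = sum(vals) / len(vals) if vals else math.nan
        out.extend([avg] * len(idx))
    return out


# Hypothetical session: 20 trials, every other trial rewarded, delay 0.5 s then 1.5 s.
licks = [8, 0, 7, 0, 9, 0, 6, 0, 10, 0, 5, 0, 4, 0, 6, 0, 5, 0, 5, 0]
rewarded = [i % 2 == 0 for i in range(20)]
delay_s = [0.5] * 10 + [1.5] * 10
lick_regressor = block_averaged_licks(licks, rewarded)
# Each trial now carries two regressors for the per-neuron multiple regression.
design = list(zip(delay_s, lick_regressor))
print(design[0], design[10])  # (0.5, 8.0) (1.5, 5.0)
```

Averaging within blocks smooths trial-to-trial licking noise so that the regressor tracks the slow drift in motivation rather than single-trial motor variability.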
Relationship between licking movement and neuronal activity
Because the striatum is known to be involved in motor functions and many striatal neurons fire in relation to movement (Rolls et al. 1983; Hikosaka et al. 1989a), the observed neuronal activity could be related to movement rather than to reward probability. As we have previously shown (Oyama et al. 2010), anticipatory licking movement was often positively correlated with reward probability. In such cases, these two factors cannot both be included as regressors in a multiple linear regression model because of the multicollinearity between them. Therefore, to evaluate whether reward probability or licking movement better explained the change of neuronal activity, we first divided the recorded neurons into two groups: those recorded when probability-dependent anticipatory licking movement was not observed and those recorded when it was observed. For the former group, we applied a model in which reward probability, uncertainty, and the number of licking movements were included as regressors. For the latter group, we applied two multiple linear regression models independently: one that included reward probability and uncertainty as regressors, and one in which reward probability was replaced by the number of licking movements. For neurons that showed correlations with both reward probability and licking movement, we compared the R-squared values of the two models to determine which factor better accounted for the neuronal activity. The results are summarized in Table 3.
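The model comparison for the second group of neurons can be illustrated with a small sketch: fit two ordinary least-squares models that share the uncertainty regressor but use either reward probability or lick count as the other regressor, and compare their R-squared values. The helper function, the Bernoulli-variance definition of uncertainty, and the toy data are assumptions; the sketch does not reproduce the significance tests applied to each regressor.

```python
from statistics import mean


def r_squared(y, x1, x2):
    """R^2 of the OLS fit y ~ b0 + b1*x1 + b2*x2 (solved on centered data)."""
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    yc = [v - my for v in y]
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    s11 = sum(u * u for u in a)
    s22 = sum(v * v for v in b)
    s12 = sum(u * v for u, v in zip(a, b))
    s1y = sum(u * w for u, w in zip(a, yc))
    s2y = sum(v * w for v, w in zip(b, yc))
    det = s11 * s22 - s12 * s12  # approaches zero if x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    ss_res = sum((w - b1 * u - b2 * v) ** 2 for u, v, w in zip(a, b, yc))
    ss_tot = sum(w * w for w in yc)
    return 1 - ss_res / ss_tot


# Toy data: licking rises with probability, but not linearly; firing follows licking.
probs = [0.0, 0.25, 0.5, 0.75, 1.0] * 4
licks_by_p = {0.0: 0, 0.25: 1, 0.5: 3, 0.75: 3, 1.0: 4}
licks = [licks_by_p[p] for p in probs]
uncs = [p * (1 - p) for p in probs]  # assumed uncertainty measure
rates = [1.0 + 2.0 * n for n in licks]

r2_prob = r_squared(rates, probs, uncs)
r2_lick = r_squared(rates, licks, uncs)
print(f"probability model R^2 = {r2_prob:.3f}, licking model R^2 = {r2_lick:.3f}")
```

Fitting the two models separately sidesteps the unstable coefficients that a single model with both correlated regressors would produce; the higher R-squared then indicates which factor accounts for more of the variance.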
Of all CS phasic neurons, the activity of 60 was recorded when probability-dependent anticipatory licking movement was not observed. Of these neurons, 27 showed a correlation only with reward probability, 3 showed a correlation only with licking movement, and 5 showed correlations with both factors during the first half of the CS presentation. Of the 16 neurons whose activity was recorded when probability-dependent anticipatory licking movement was observed, 8 showed a correlation only with reward probability, none showed a correlation only with licking movement, and 2 showed correlations with both factors. Both of these neurons had a higher R-squared value for the model including licking movement as a regressor.
Of all CS tonic neurons, the activity of 17 was recorded when probability-dependent anticipatory licking movement was not observed. Of these neurons, 10 showed a correlation only with reward probability, none showed a correlation only with licking movement, and 1 showed correlations with both factors during the second half of the CS presentation. Of the 6 neurons whose activity was recorded when probability-dependent anticipatory licking movement was observed, 3 showed a correlation only with reward probability, 2 showed a correlation only with licking movement, and none showed correlations with both factors.
Of all US build-up neurons, the activity of 69 was recorded when probability-dependent anticipatory licking movement was not observed. Of these neurons, 29 showed a correlation only with reward probability, 6 showed a correlation only with licking movement, and 19 showed correlations with both factors during the 1000 ms before the reward delivery. Of the 26 neurons whose activity was recorded when probability-dependent anticipatory licking movement was observed, 14 showed a correlation only with reward probability, 1 showed a correlation only with licking movement, and 4 showed correlations with both factors. All 4 of the neurons that showed correlations with both factors had a higher R-squared value for the model including licking movement as a regressor.
Thus a few neurons that were originally considered non-differential were found to show licking-related activity when the data were re-analyzed with a model including the number of licking movements as a factor. In Table 3A the numbers of these neurons are listed under the heading “Licking only.” On the other hand, several neurons that were originally considered probability-dependent were found to also show licking-related activity. The numbers of these neurons are listed in Table 3A under the heading “Both.” Of the neurons recorded while probability-dependent licking movement was observed, only a small number were found more likely to be licking movement-dependent than reward probability-dependent. The numbers of these neurons are listed in parentheses in Table 3B under the heading “Both.” These results suggest that the activity of striatal neurons recorded under the Pavlovian conditioning paradigm was related more to reward probability than to licking movement, although some neurons, especially some US build-up neurons, showed licking-related activity. However, because anticipatory spout-licking behavior may emerge as the animal’s intrinsic expectation level increases, it is unclear whether the observed licking-related activity was directly related to motor function or only indirectly related through such motivational factors.
Relationship between tone frequency and neuronal activity
Because the striatum receives sensory inputs from various cortical areas, we tested whether the observed activity of striatal neurons could be explained by typical sensory responses, which would appear as single-peaked tuning when activity in auditory-related areas is plotted against tone frequency (Bordi and LeDoux 1992; Doron et al. 2002; Sally and Kelly 1988; Sutter and Schreiner 1991). In designing the task, we chose the combination of tone frequency and reward probability so that a reward-probability-dependent response would not appear as single-peaked tone tuning. We tested for single-peaked tuning by fitting a Gaussian curve to the response magnitude of each neuron against the logarithmically scaled tone frequency. The response time windows for this curve-fitting were the first half of the CS presentation for CS phasic neurons, the second half of the CS presentation for CS tonic neurons, and the 1000 ms before the reward delivery for US build-up neurons. Because a good fit was obtained for only one US build-up neuron, it is unlikely that the activity of the recorded striatal neurons was an artifact of simple auditory tone tuning.
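A test for single-peaked tuning can be sketched in a few lines. The version below swaps in a simple technique, fitting a parabola to the logarithm of the responses (a Caruana-style log-parabola fit), which recovers a noise-free Gaussian exactly and reports no peak when the fitted parabola opens upward. It is a stand-in for general nonlinear least-squares Gaussian fitting, it requires strictly positive responses, and the tone frequencies in the example are hypothetical.

```python
import math


def solve3(a, b):
    """Solve the 3x3 system a @ x = b (Gaussian elimination, partial pivoting)."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_log_gaussian(log_freq, response):
    """Fit response = A * exp(-(x - mu)^2 / (2 sigma^2)) via a parabola in ln(response).

    Returns (A, mu, sigma), or None when the fitted parabola has no maximum
    (i.e. the tuning is not single-peaked).
    """
    if min(response) <= 0:
        raise ValueError("log-parabola fitting needs strictly positive responses")
    ys = [math.log(r) for r in response]
    # Normal equations for ln y = c0 + c1*x + c2*x^2.
    s = [sum(x ** k for x in log_freq) for k in range(5)]
    t = [sum(x ** k * y for x, y in zip(log_freq, ys)) for k in range(3)]
    c0, c1, c2 = solve3([[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]], t)
    if c2 >= 0:
        return None
    mu = -c1 / (2 * c2)
    sigma = math.sqrt(-1 / (2 * c2))
    amp = math.exp(c0 - c1 ** 2 / (4 * c2))
    return amp, mu, sigma


# Hypothetical tones, expressed in octaves above 1 kHz (log2 scale).
log_freq = [math.log2(f / 1000) for f in (1000, 2000, 4000, 8000, 16000)]
response = [12.0 * math.exp(-(x - 2.3) ** 2 / (2 * 0.9 ** 2)) for x in log_freq]
print(fit_log_gaussian(log_freq, response))
```

With real, noisy data one would also check fit quality (for example, R-squared in the original response units) before classifying a neuron as tuned; the criterion used for a "good fit" is not stated in the text.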
Comparison of firing properties and waveforms between neuron types
We compared the baseline firing rates of the three types of neurons. The baseline firing rates of CS phasic, CS tonic, and US build-up neurons were 3.1 ± 2.5, 3.0 ± 1.9, and 3.3 ± 3.1 spikes/s (mean ± s.d.), respectively, and did not differ between neuron types (p > 0.05, one-way ANOVA). We also compared the duration of the action potential waveforms (width of the negative component at half-maximum). The durations for CS phasic, CS tonic, and US build-up neurons were 205 ± 31, 223 ± 45, and 212 ± 30 µs (mean ± s.d.), respectively, and did not differ significantly between neuron types (p > 0.05, one-way ANOVA).
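The waveform duration measure above (width of the negative component at half of its peak amplitude) can be computed from a sampled spike waveform as in the following sketch. It assumes a baseline-subtracted waveform (baseline = 0) and linear interpolation between samples; the sampling rate and waveform in the usage lines are hypothetical.

```python
def trough_half_width_us(waveform, fs_hz):
    """Width (in microseconds) of the negative component at half its minimum.

    Assumes the waveform is baseline-subtracted (baseline = 0) and that the
    trough is bracketed by samples above the half-amplitude level on both sides.
    Crossing times are found by linear interpolation between samples.
    """
    i_min = min(range(len(waveform)), key=lambda i: waveform[i])
    half = waveform[i_min] / 2.0  # negative half-amplitude level
    left = i_min
    while left > 0 and waveform[left - 1] <= half:
        left -= 1
    right = i_min
    while right + 1 < len(waveform) and waveform[right + 1] <= half:
        right += 1
    if left == 0 or right == len(waveform) - 1:
        raise ValueError("trough is not bracketed by the waveform")
    # Fractional sample positions where the waveform crosses `half`.
    t_left = (left - 1) + (waveform[left - 1] - half) / (waveform[left - 1] - waveform[left])
    t_right = right + (half - waveform[right]) / (waveform[right + 1] - waveform[right])
    return (t_right - t_left) / fs_hz * 1e6


# Hypothetical 40-kHz waveform (arbitrary units): one sample every 25 µs.
wf = [0, 0, -1, -2, -3, -4, -3, -2, -1, 0, 0]
print(trough_half_width_us(wf, 40_000))  # ~100 µs (4 samples between crossings)
```

Interpolating the crossing points, rather than counting whole samples, keeps the measure from being quantized to the sampling interval, which matters when group differences are on the order of tens of microseconds.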
Recording sites
The recording sites of each neuron were reconstructed histologically and superimposed onto coronal sections of the left hemisphere of the standard rat brain atlas (Paxinos and Watson 2005). Figure 14 shows the recording sites of each neuron type. All three neuron types were widely distributed within the dorsal striatum, without any specific topographical clustering for any of the three types.
Discussion
In this study we recorded single-unit activity in the dorsal striatum of head-fixed rats that had been pretrained in a probabilistic Pavlovian conditioning task using auditory cues. The recorded neurons could be categorized into three types based on their firing patterns. CS phasic neurons showed a phasic response to the CS onset, and the magnitude of this response was positively related to reward probability. The majority of these neurons also showed a phasic reward response whose magnitude was negatively related to reward probability (Fig. 4). Thus many CS phasic neurons showed greater phasic responses to CSs that predicted higher reward probability and greater phasic responses to less probable rewards. These firing properties correspond to those of midbrain dopamine neurons and indicate that a subset of CS phasic neurons code a reward prediction error at both the CS onset and the reward onset. In our previous study (Oyama et al. 2010), we compared this type of neuron with midbrain dopamine neurons and concluded that they have highly similar firing properties. CS tonic neurons showed a tonic response during the CS presentation without a reward response, and the magnitude of this response was positively related to reward probability (Fig. 7). These neurons can be considered to code the value of the stimulus. US build-up neurons showed gradually increasing activity toward the time of reward delivery, and the magnitude of the pre-reward activity was positively related to reward probability (Fig. 10). The firing of this type of neuron may reflect the animal’s internal expectation about the upcoming reward.
For all three types of neurons, we examined how activity changed as the delay between CS offset and reward onset was prolonged. As expected, the phasic responses in CS phasic neurons were time-locked to the CS and reward onsets, and the tonic response in CS tonic neurons was time-locked to the CS onset. The build-up activity in US build-up neurons continued to peak at the reward onset regardless of the delay extension, but the build-up activity itself was prolonged as the delay was extended. These results support our interpretation of the function of each neuron type. In general, extending the time interval between the CS and the reward decreases the value of the CS, or the value of the reward as measured at the time of the CS, a phenomenon known in behavioral economics as “temporal discounting” (Ainslie 1975). Value-coding neurons have been reported to show this devaluing effect of delay prolongation (Kobayashi and Schultz 2008; Cai et al. 2011). We found that for CS phasic and CS tonic neurons, the CS responses in the high-probability conditions decreased as the delay was prolonged. Moreover, given that the reward value predicted by the CS decreases with longer delays, the prediction error that occurs at the actual delivery of reward increases. In addition, the timing of reward is less precise with longer CS-US intervals, which also increases the prediction error (Fiorillo et al. 2008). In accordance with this, we also observed that for CS phasic neurons, the reward responses under the high-probability conditions were stronger when the delay was prolonged. When the delay was set back to the original duration, the activity level did not return all the way to the original level: the peak of the CS response in CS phasic and CS tonic neurons and the peak of the build-up activity in US build-up neurons were lower than in the initial session. Because the whole procedure of extending the delay in two steps and then bringing it back to the original length took a long time, the motivational level of the animal could have been reduced considerably during the procedure. It is therefore possible that the data reflect not only temporal discounting but also an overall decrease of motivational level over time.
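The discounting argument in this paragraph can be made concrete with a toy calculation. The sketch assumes a hyperbolic discount function V = pA / (1 + kD), the form associated with Ainslie's account, with an arbitrary illustrative discount rate k rather than a fitted parameter, and it treats the prediction error at reward delivery simply as A − V. It deliberately ignores the loss of timing precision mentioned above, which would enlarge the error further.

```python
def discounted_value(prob, magnitude, delay_s, k):
    """Hyperbolic discounting (assumed form): V = p * A / (1 + k * D)."""
    return prob * magnitude / (1.0 + k * delay_s)


def reward_prediction_error(prob, magnitude, delay_s, k):
    """Simplified prediction error at reward delivery: obtained minus predicted value."""
    return magnitude - discounted_value(prob, magnitude, delay_s, k)


K = 0.2  # hypothetical discount rate per second
for delay in (0.5, 1.5, 3.5):  # delays used in the delay-extension experiment
    v = discounted_value(0.75, 1.0, delay, K)
    d = reward_prediction_error(0.75, 1.0, delay, K)
    print(f"delay {delay:.1f} s: CS value {v:.3f}, RPE at reward {d:.3f}")
```

For a 75% CS, the predicted value falls from about 0.68 to 0.44 across the three delays while the prediction error at reward grows correspondingly, which matches the direction of the changes reported for CS phasic neurons.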
In the present task, the activity of the majority of the recorded neurons depended on reward probability as indicated by different CSs. This dependency on reward probability implies that the activity was the product of associative learning during the extensive training phase. Previous studies on synaptic mechanisms in the striatum have shown that long-term potentiation can occur at cortico-striatal synapses when the striatal neuron receives both cortical and dopaminergic inputs (Canales et al. 2002; Reynolds et al. 2001; Wickens et al. 1996). In addition, dopamine neurons, which send dense projections to the striatum, fire in response to unexpected rewards, i.e., when a positive reward prediction error occurs (Schultz 1998). These results suggest that when a reward is given after the presentation of a CS, the cortico-striatal synapses that transmit the sensory information of the CS would be strengthened. During extensive training, the synapses that transmit the information of a CS indicating higher reward probability, which is more frequently followed by reward, would be strengthened further. As a result, the presentation of a CS indicating higher reward probability would elicit greater striatal activation. This may be the mechanism by which information about stimulus value is acquired. Similarly, the probability-dependent phasic CS response of the reward prediction error-coding neurons may be formed through this process. Moreover, it is conceivable that the firing of neurons coding stimulus value may differentially change the animal’s internal motivational state, the elevation of which would be reflected in the build-up activity of reward expectation-coding neurons toward the time of reward delivery. The reward expectation signal would lead to preparation for appropriately acquiring the reward, such as directing attention to the reward and preparing to execute the reward-acquiring action. At the time of reward delivery, the reward expectation signal would be used to calculate the reward prediction error signal, which is represented in the phasic reward response of the reward prediction error-coding neurons.
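The learning account proposed here can be illustrated with a prediction-error-gated update of a single CS weight in the style of the Rescorla-Wagner rule: on each trial the weight moves toward the outcome by a fraction of the prediction error. This is a textbook model swapped in purely for illustration, not the synaptic mechanism measured in the studies cited above, and the learning rate, trial count, and random seed are arbitrary. Under this rule the weight settles near the CS's reward probability, so a CS signalling a higher probability ends up with a stronger weight.

```python
import random


def train_cs_weight(reward_prob, n_trials=2000, alpha=0.05, seed=0):
    """Rescorla-Wagner-style learning of one CS weight.

    Each trial: reward r is 1 with probability `reward_prob`, else 0;
    the prediction error is delta = r - w, and w += alpha * delta.
    Returns the weight trajectory (one value per trial, after the update).
    """
    rng = random.Random(seed)
    w, history = 0.0, []
    for _ in range(n_trials):
        r = 1.0 if rng.random() < reward_prob else 0.0
        delta = r - w  # positive when the outcome is better than predicted
        w += alpha * delta
        history.append(w)
    return history


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    late = train_cs_weight(p)[-1000:]
    print(f"P(reward) = {p:.2f}: mean late weight = {sum(late) / len(late):.3f}")
```

In the same model, the prediction error at reward delivery (1 − w) is smaller for high-probability CSs, which parallels the inverse relation between reward response and reward probability seen in CS phasic neurons.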
According to the firing properties of the neurons and the waveforms of their action potentials (Oyama et al. 2010), it is most likely that we recorded from medium spiny projection neurons, which constitute the vast majority of striatal neurons (Apicella 2007; Oorschot 1996). Our results indicate that within the striatal medium spiny neurons there are discrete functional subtypes that code different aspects of reward. Different subpopulations of medium spiny neurons are known to exist, such as neurons belonging to the direct or indirect pathway and neurons located in the patch or matrix. Recent studies using transgenic animals and molecular biological techniques have found that neurons belonging to the direct and indirect pathways have different motor functions (Kravitz et al. 2010) and cognitive functions, such as learning (Hikida et al. 2010; Kravitz et al. 2012). These results suggest that striatal neurons with different histochemical properties code different information. However, to understand how a neuron relates to a larger neuronal network and how it functions and interacts with other neurons, we need to investigate the precise morphological and histochemical background of the neuron, including its type, which receptors it expresses, and which other neurons it projects to. Labeling a single neuron after recording from it during a behavioral paradigm (Oyama et al. 2013) will allow such histochemical and morphological investigations and may reveal the functions and relationships of discrete subtypes of striatal neurons that code different reward-related information.
Among our recorded neurons, only a small population showed activity related to reward uncertainty, which is maximal at a reward probability of 50% and decreases gradually as reward probability becomes smaller or larger. (In our previous study (Oyama et al. 2010), none of the RPE-coding neurons, a subset of the CS phasic neurons of this study, showed uncertainty-related activity; this may have been because the statistical method used there was less powerful than the one used here, leading us to underestimate the number of neurons coding uncertainty.) This suggests that the striatum is preferentially involved in coding parametric reward value rather than reward uncertainty. Such a conclusion is consistent with human imaging findings (Tobler et al. 2008) indicating that striatal activation depends on reward probability but not on reward uncertainty in a very similar probabilistic Pavlovian conditioning task. Furthermore, only a small number of neurons showed negative correlations between CS-related activity and reward probability, even though positive correlations between CS-related activity and reward probability were substantial and numerous. This suggests that in a probabilistic Pavlovian conditioning paradigm, negative reward value coding is not common in dorsal striatal neurons. This contrasts with previous studies investigating striatal value representation in monkeys performing instrumental tasks, in which 30–60% of task-related neurons coded value negatively (Cromwell and Schultz 2003; Samejima et al. 2005). In addition, our striatal neurons showed neither an increase nor a decrease of activity at the time of an unexpected reward omission, either at the population level or at the single-neuron level. This suggests that negative reward prediction error was not coded by striatal neurons in our task, in contrast to dopamine neurons, which are known to code both positive and negative prediction errors in monkeys (Schultz 1998) and in rodents (Oyama et al. 2010). On the other hand, a recent study in monkeys performing an instrumental conditioning task demonstrated that both positive and negative prediction errors were represented in presumed medium spiny neurons, primarily by increases in firing rate (Asaad and Eskandar 2011). These inconsistencies with previous studies may be attributable to task or species differences.
In a behavioral task in which both reward and punishment were used as unconditioned stimuli, Matsumoto and Hikosaka (2009) proposed that a subpopulation of dopamine neurons encodes general motivational salience rather than value (but see Fiorillo et al. 2013). Their argument raises the possibility that striatal neurons showing probability-dependent activity reflect the motivational salience rather than the value of stimuli and outcomes. However, the behavioral task we used in this study cannot dissociate value from motivational salience (Kahnt et al. 2014). Therefore we cannot rule out the possibility that some neurons recorded in this study code motivational salience rather than value. To determine whether the activity of a neuron reflects reward, punishment, or motivational salience, activity must be recorded in a behavioral paradigm in which both reward and punishment are used as unconditioned stimuli.
It has been suggested that the dorsomedial striatum mediates action-outcome learning or goal-directed behavior and that the dorsolateral striatum mediates stimulus-response learning or habitual behavior (Barnes et al. 2005; Jog et al. 1999; Packard and Knowlton 2002; Yin et al. 2004; Yin et al. 2005). In this study we recorded from both medial and lateral areas and found many active neurons, even though we used a Pavlovian paradigm in which the animals were not required to perform any action. Our data suggest that the dorsal striatum is involved not only in goal-directed or habitual behavior but also in more general associative learning, including probabilistic Pavlovian conditioning.
Acknowledgements
Grants
This study was funded by Grants-in-Aid for Scientific Research (KAKENHI) #24223004, #24243067, and #19673002 to K.T. K.O. was supported by JSPS as a Research Fellow and was funded by KAKENHI #24-8027. P.N.T. was supported by the Swiss NSF (PP00P1_128574 and PP00P1_150739).
Disclosures
No conflicts of interest, financial or otherwise, are declared by the authors.
References
Ainslie G. Specious reward: a behavioral theory of impulsiveness and impulse control. Psychol Bull 82: 463–496, 1975.
Anden NE, Carlsson A, Dahlstroem A, Fuxe K, Hillarp NA, Larsson K. Demonstration and mapping out of nigro-neostriatal dopamine neurons. Life Sci 3: 523–530, 1964.
Apicella P, Scarnati E, Ljungberg T, Schultz W. Neuronal activity in monkey striatum related to the expectation of predictable environmental events. J Neurophysiol 68: 945–960, 1992.
Apicella P. Leading tonically active neurons of the striatum from reward detection to context recognition. Trends Neurosci 30: 299–306, 2007.
Asaad WF, Eskandar EN. Encoding of both positive and negative reward prediction errors by neurons of the primate lateral prefrontal cortex and caudate nucleus. J Neurosci 31: 17772–17787, 2011.
Barnes TD, Kubota Y, Hu D, Jin DZ, Graybiel AM. Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature 437: 1158–1161, 2005.
Bolam JP, Hanley JJ, Booth PA, Bevan MD. Synaptic organization of the basal ganglia. J Anat 196: 527–542, 2000.
Bordi F, LeDoux J. Sensory tuning beyond the sensory system: an initial analysis of auditory response properties of neurons in the lateral amygdaloid nucleus and overlying areas of the striatum. J Neurosci 12: 2493–2503, 1992.
Burke CJ, Tobler PN. Coding of reward probability and risk by single neurons in animals. Front Neurosci 5: 121, 2011.
Cai X, Kim S, Lee D. Heterogeneous coding of temporally discounted values in the dorsal and ventral striatum during intertemporal choice. Neuron 69: 170–182, 2011.
Canales JJ, Capper-Loup C, Hu D, Choe ES, Upadhyay U, Graybiel AM. Shifts in striatal responsivity evoked by chronic stimulation of dopamine and glutamate systems. Brain 125: 2353–2363, 2002.
Cromwell HC, Schultz W. Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J Neurophysiol 89: 2823–2838, 2003.
Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299: 1898–1902, 2003.
Fiorillo CD, Newsome WT, Schultz W. The temporal precision of reward prediction in dopamine neurons. Nat Neurosci 11: 966–973, 2008.
Fiorillo CD, Yun SR, Song MR. Diversity and homogeneity in responses of midbrain dopamine neurons. J Neurosci 33: 4693–4709, 2013.
Graybiel AM. The basal ganglia. Curr Biol 10: R509–R511, 2000.
Hassani OK, Cromwell HC, Schultz W. Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J Neurophysiol 85: 2477–2489, 2001.
Hikida T, Kimura K, Wada N, Funabiki K, Nakanishi S. Distinct roles of synaptic transmission in direct and indirect striatal pathways to reward and aversive behavior. Neuron 66: 896–907, 2010.
Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons. I. Activities related to saccadic eye movements. J Neurophysiol 61: 780–798, 1989a.
Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons. II. Visual and auditory responses. J Neurophysiol 61: 799–813, 1989b.
Hollerman JR, Tremblay L, Schultz W. Influence of reward expectation on behavior-related neuronal activity in primate striatum. J Neurophysiol 80: 947–963, 1998.
Jog MS, Kubota Y, Connolly CI, Hillegaart V, Graybiel AM. Building neural representations of habits. Science 286: 1745–1749, 1999.
Kahnt T, Park SQ, Haynes JD, Tobler PN. Disentangling neural representations of value and salience in the human brain. Proc Natl Acad Sci 111: 5000–5005, 2014.
Kawagoe R, Takikawa Y, Hikosaka O. Expectation of reward modulates cognitive signals in the basal ganglia. Nat Neurosci 1: 411–416, 1998.
Kepecs A, Uchida N, Zariwala HA, Mainen ZF. Neural correlates, computation and behavioral impact of decision confidence. Nature 455: 227–231, 2008.
Kimura M. Behaviorally contingent property of movement-related activity of the primate putamen. J Neurophysiol 63: 1277–1296, 1990.
Kravitz AV, Freeze BS, Parker PRL, Kay K, Thwin MT, Deisseroth K, Kreitzer AC. Regulation of parkinsonian motor behaviors by optogenetic control of basal ganglia circuitry. Nature 466: 622–626, 2010.
Kravitz AV, Tye LD, Kreitzer AC. Distinct roles for direct and indirect pathway striatal neurons in reinforcement. Nat Neurosci 15: 816–818, 2012.
Lau B, Glimcher PW. Value representations in the primate striatum during matching behavior. Neuron 58: 451–463, 2008.
Matsumoto M, Hikosaka O. Two types of dopamine neurons distinctly convey positive and negative motivational signals. Nature 459: 837–841, 2009.
Nakamura K, Santos G, Matsuzaki R, Nakahara H. Differential reward coding in the subdivisions of the primate caudate during an oculomotor task. J Neurosci 32: 15963–15982, 2012.
Ogawa M, van der Meer MAA, Esber GR, Cerri DH, Stalnaker TA, Schoenbaum G. Risk-responsive orbitofrontal neurons track acquired salience. Neuron 77: 251–258, 2013.
Oorschot DE. Total number of neurons in the neostriatal, pallidal, subthalamic, and substantia nigral nuclei of the rat basal ganglia: a stereological study using the Cavalieri and optical disector methods. J Comp Neurol 366: 580–599, 1996.
Oyama K, Hernádi I, Iijima T, Tsutsui K. Reward prediction error coding in dorsal striatal neurons. J Neurosci 30: 11447–11457, 2010.
Oyama K, Ohara S, Sato S, Karube F, Fujiyama F, Isomura Y, Mushiake H, Iijima T, Tsutsui KI. Long-lasting single-neuron labeling by in vivo electroporation without microscopic guidance. J Neurosci Methods 218: 139–147, 2013.
Packard MG, Knowlton BJ. Learning and memory functions of the basal ganglia. Annu Rev Neurosci 25: 563–593, 2002.
Paxinos G, Watson C. The Rat Brain in Stereotaxic Coordinates. San Diego, CA: Academic Press, 2005.
Reynolds JN, Hyland BI, Wickens JR. A cellular mechanism of reward-related learning. Nature 413: 67–70, 2001.
Rolls ET, Thorpe SJ, Maddison SP. Responses of striatal neurons in the behaving monkey. 1. Head of the caudate nucleus. Behav Brain Res 7: 179–210, 1983.
Sally SL, Kelly JB. Organization of auditory cortex in the albino rat: sound frequency. J Neurophysiol 59: 1627–1638, 1988.
Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific reward values in the striatum. Science 310: 1337–1340, 2005.
Schultz W. Predictive reward signal of dopamine neurons. J Neurophysiol 80: 1–27, 1998.
Sutter ML, Schreiner CE. Physiology and topography of neurons with multipeaked tuning curves in cat primary auditory cortex. J Neurophysiol 65: 1207–1226, 1991.
Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 1998.
Tobler PN, Christopoulos GI, O’Doherty JP, Dolan RJ, Schultz W. Neuronal distortions of reward probability without choice. J Neurosci 28: 11703–11711, 2008.
Wickens JR, Begg AJ, Arbuthnott GW. Dopamine reverses the depression of rat corticostriatal synapses which normally follows high-frequency stimulation of cortex in vitro. Neuroscience 70: 1–5, 1996.
Yin HH, Knowlton BJ, Balleine BW. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 19: 181–189, 2004.
Yin HH, Ostlund SB, Knowlton BJ, Balleine BW. The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci 22: 513–523, 2005.
Figure Legends
Fig. 1. Outline of the behavioral paradigm. A: The apparatus. The rats performed a probabilistic Pavlovian conditioning task with the head stabilized by a head fixation device and with body movement restricted by an acrylic half-cylinder. The auditory stimuli used in this study were generated by a personal computer and presented from two loudspeakers 30 cm from the head of the rat. A sucrose solution was delivered through a spout in front of the rat’s mouth, and an infrared sensor system was used to detect spout-licking movements. B: Time sequence of task events in a trial. Lt, left; Rt, right.
727
Fig. 2. Temporal response profiles of all 194 neurons whose activity was recorded during the
728
probabilistic Pavlovian conditioning task. The upper left panel shows the activity in rewarded trials,
729
and the lower left panel shows the activity in unrewarded trials of the 50% condition. The right panel
730
shows the activity around the time of delivery of the unpredicted reward. Each row represents
731
peak-normalized and baseline-subtracted activity for a single neuron, and the data are sorted from
732
top to bottom by peak response time. For neurons which showed responses to both the CS/delay and
733
the reward, the peak response time of their CS/delay response was used to align them. The horizontal
734
bars above the histograms and white dashed lines indicate the durations and times of CS presentation
735
and reward delivery. We used the moving-window method to calculate the peak response time of
736
each neuron. A 50-ms window was moved in 10-ms steps from the onset of the CS to the end of the
737
delay period, and the peak response time was determined as the center of the 50-ms window showing
738
maximum bin height.
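The moving-window procedure in the Fig. 2 legend can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes spike times are given in seconds relative to CS onset, one list per trial, and it pools spikes across trials (the trial count scales every window equally, so it does not change which window is maximal). The function name `peak_response_time` and its parameters are hypothetical.

```python
from typing import Sequence


def peak_response_time(
    spike_times_per_trial: Sequence[Sequence[float]],
    start: float,
    stop: float,
    window: float = 0.050,
    step: float = 0.010,
) -> float:
    """Return the center (s) of the sliding window with the most spikes.

    spike_times_per_trial: spike times (s) relative to CS onset, one list per trial.
    start, stop: analysis interval (s), e.g. CS onset to the end of the delay.
    """
    # Pool spikes across trials; bin height is proportional to the pooled count.
    spikes = sorted(t for trial in spike_times_per_trial for t in trial)
    n_steps = int(round((stop - start - window) / step))
    best_center, best_count = start + window / 2, -1
    for i in range(n_steps + 1):
        left = start + i * step
        right = left + window
        # Count spikes in the half-open window [left, right).
        count = sum(1 for t in spikes if left <= t < right)
        if count > best_count:  # strict ">" keeps the earliest window on ties
            best_count = count
            best_center = left + window / 2
    return best_center
```

For example, `peak_response_time(trials, start=0.0, stop=2.0)` scans from CS onset to 2 s, which assumes, as the Table 1 windows suggest, that the delay period ends at reward delivery 2 s after CS onset. Resolving ties to the earliest window is an arbitrary choice of this sketch.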
Fig. 3. Activity of a representative CS phasic neuron. Rasters and histograms (bin width = 50 ms) showing the activity recorded in different reward probability conditions. Rasters and histograms are aligned to CS onset. The horizontal bars below the histograms indicate the durations of CS presentation and reward delivery.
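The histograms in Fig. 3 are peri-stimulus time histograms (PSTHs). A minimal sketch of one follows, assuming spike times in seconds relative to the alignment event (here CS onset) and the 50-ms bins stated in the legend. The function name `psth` is hypothetical, and the handling of spikes at the interval edges is a choice of this sketch.

```python
from typing import List, Sequence


def psth(
    spike_times_per_trial: Sequence[Sequence[float]],
    t_start: float,
    t_stop: float,
    bin_width: float = 0.050,
) -> List[float]:
    """Trial-averaged firing rate (spikes/s) in fixed bins aligned to an event.

    Spike times are in seconds relative to the alignment event (e.g. CS onset).
    """
    n_bins = int(round((t_stop - t_start) / bin_width))
    counts = [0] * n_bins
    for trial in spike_times_per_trial:
        for t in trial:
            if t_start <= t < t_stop:
                idx = int((t - t_start) // bin_width)
                if idx < n_bins:  # guard against float rounding at the right edge
                    counts[idx] += 1
    n_trials = len(spike_times_per_trial)
    # Convert summed counts to a mean rate per trial in spikes/s.
    return [c / (n_trials * bin_width) for c in counts]
```

Negative `t_start` values give pre-CS bins, which a baseline estimate can draw on.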
Fig. 4. Population activity of CS phasic neurons whose CS and reward responses were positively and negatively related, respectively, to reward probability (N = 26). Peak-normalized, baseline activity-subtracted population histograms (bin width = 50 ms) are shown for rewarded trials (upper left), unrewarded trials (lower left), and unpredicted reward (upper right). Lines of different colors represent neuronal activity recorded in the different reward probability conditions (red = 100%, orange = 75%, purple = 50%, green = 25%, light blue = 0%) and during delivery of unpredicted rewards (blue). Each bin was smoothed by a moving average of three bins.
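The normalization and smoothing steps named in the Fig. 4 legend could be implemented as below. The order of operations (subtract each neuron's mean baseline rate, divide by the peak of its baseline-subtracted histogram, average across neurons, then apply a centered three-bin moving average), the baseline interval, and all function names are assumptions; the legend does not state them, nor how neurons without a positive peak or the edge bins were handled.

```python
from typing import List, Sequence


def normalize_psth(rates: Sequence[float], n_baseline_bins: int) -> List[float]:
    """Subtract the mean baseline rate, then divide by the peak of the result.

    Assumes the first `n_baseline_bins` bins precede the CS (hypothetical choice).
    """
    baseline = sum(rates[:n_baseline_bins]) / n_baseline_bins
    shifted = [r - baseline for r in rates]
    peak = max(shifted)
    if peak <= 0:
        # No excitatory peak to normalize by; this sketch returns zeros.
        return [0.0 for _ in shifted]
    return [s / peak for s in shifted]


def population_average(per_neuron: Sequence[Sequence[float]]) -> List[float]:
    """Bin-by-bin mean across neurons (all histograms must share one binning)."""
    n = len(per_neuron)
    return [sum(col) / n for col in zip(*per_neuron)]


def smooth3(values: Sequence[float]) -> List[float]:
    """Centered 3-bin moving average; edge bins average only existing neighbors."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out
```

A population trace for one probability condition would then be `smooth3(population_average([normalize_psth(h, n_baseline_bins) for h in histograms]))`, with `histograms` holding one PSTH per neuron.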
Fig. 5. Distributions of the standardized beta coefficients of CS phasic neurons obtained from the multiple linear regression analysis. A: Distribution of the standardized beta coefficients for reward probability. B: Distribution of the standardized beta coefficients for uncertainty. The time windows used for the analyses are the first half of the CS presentation (upper left), the second half of the CS presentation (upper right), 1000 ms before reward delivery (lower left), and 500 ms after reward delivery (lower right). Asterisks at the upper right and upper left of a graph indicate that the distribution deviated positively and negatively from zero, respectively (p < 0.05, Wilcoxon signed-rank test).
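With two regressors, standardized beta coefficients follow in closed form from pairwise Pearson correlations: for firing rate y with regressors x1 and x2, beta1 = (r_y1 - r_y2 r_12) / (1 - r_12^2) and symmetrically for beta2. The sketch below assumes that uncertainty is quantified as outcome variance p(1 - p), which peaks in the 50% condition; that definition and the function names are assumptions, not details given in this section. The Wilcoxon signed-rank test applied to the resulting distributions is not shown.

```python
import math
from typing import Sequence, Tuple


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variance: treat as no linear relationship
    return sxy / math.sqrt(sxx * syy)


def standardized_betas(
    rates: Sequence[float], probs: Sequence[float]
) -> Tuple[float, float]:
    """Standardized coefficients for the model rate ~ probability + uncertainty.

    rates: firing rate in one time window, one value per trial.
    probs: reward probability of each trial (0.0 to 1.0).
    Uncertainty is assumed to be the outcome variance p * (1 - p).
    """
    unc = [p * (1 - p) for p in probs]
    r_y1 = _pearson(probs, rates)
    r_y2 = _pearson(unc, rates)
    r_12 = _pearson(probs, unc)
    denom = 1 - r_12 ** 2
    # Closed-form OLS solution on z-scored variables with two predictors.
    beta_prob = (r_y1 - r_y2 * r_12) / denom
    beta_unc = (r_y2 - r_y1 * r_12) / denom
    return beta_prob, beta_unc
```

With equal numbers of trials in the 0%, 25%, 50%, 75%, and 100% conditions, p and p(1 - p) are uncorrelated because p(1 - p) is symmetric about 0.5, so each beta reduces to the simple correlation between that regressor and firing rate.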
Fig. 6. Activity of a representative CS tonic neuron. Conventions are the same as in Fig. 3.
Fig. 7. Population activity of CS tonic neurons whose CS responses were positively related to reward probability (N = 13). Conventions are the same as in Fig. 4.
Fig. 8. Distributions of the standardized beta coefficients of CS tonic neurons. Conventions are the same as in Fig. 5.
Fig. 9. Activity of two representative US build-up neurons, one with a reward response (A) and the other without a reward response (B). Conventions are the same as in Fig. 3.
Fig. 10. Population activity of US build-up neurons. A: Peak-normalized, baseline activity-subtracted population histograms of US build-up neurons with a reward response whose pre-reward activity was positively related to reward probability (N = 31). B: Peak-normalized, baseline activity-subtracted population histograms of US build-up neurons without a reward response whose pre-reward activity was positively related to reward probability (N = 34). Conventions are the same as in Fig. 4.
Fig. 11. Distributions of the standardized beta coefficients of US build-up neurons. Conventions are the same as in Fig. 5.
Fig. 12. Effects of delay extension on the activity of representative neurons. A: Activity of a representative CS phasic neuron with a delay of 0.5 s (top row), 1.5 s (second row), and 3.5 s (third row). The bottom row shows activity in a second 0.5-s delay condition following the 3.5-s delay condition. The horizontal bar below the raster indicates the duration of CS presentation, and the arrows above each condition indicate reward delivery times. B: Activity of a representative CS tonic neuron. C: Activity of a representative US build-up neuron. For simplicity, only activity in the 100% condition is shown for all neuron types.
Fig. 13. Effects of delay extension on the onset and peak latency (A) and the magnitude of event-related activity (B) in each type of neuron. A: Onset and peak latencies of CS phasic neurons relative to CS onset (upper left) and to reward onset (upper right), of CS tonic neurons relative to CS onset (lower left), and of US build-up neurons relative to reward onset (lower right). Filled circles and open triangles represent the onset and peak latency of each neuron, respectively. B: Magnitude of the CS response (upper left) and reward response (upper right) of CS phasic neurons, the CS response of CS tonic neurons (lower left), and the pre-reward activity of US build-up neurons (lower right). Filled circles, filled squares, and open triangles represent activity in the 100%, 50%, and 0% conditions, respectively. The time windows for determining the peak normalized response were 750 ms from CS onset for the CS response of CS phasic neurons, 500 ms from reward onset for the reward response of CS phasic neurons, 750 ms before CS offset for CS tonic neurons, and 1000 ms before reward onset for US build-up neurons.
Fig. 14. Recording site of each neuron type. Numbers at the bottom indicate anteroposterior coordinates (in mm) from bregma, taken from the stereotaxic atlas of Paxinos & Watson (2005). Filled circles represent recording locations of neurons that showed probability-dependent CS responses or pre-reward activity, and open circles represent recording locations of neurons that showed non-probability-dependent CS responses or pre-reward activity.
[Figure 1. A: apparatus (Lt. speaker, Rt. speaker, solenoid valve delivering sucrose reward, lick sensor). B: time sequence of task events (stimulus, reward; time, s).]
[Figure 2. Panels: rewarded trials in the 50% reward condition (CS, reward); unrewarded trials in the 50% reward condition (CS, no reward); unpredicted reward. Y-axis: neuron ID number; x-axes: time from CS onset (s) and time from reward onset (s); color scale: peak normalized activity (-0.1 to 0.8).]
[Figure 3. Rasters and histograms for rewarded and unrewarded trials in the 100%, 75%, 50%, 25%, and 0% conditions and for unpredicted reward (CS and reward periods marked); scale bars: 1 s, 20/s.]
[Figure 4. Population histograms (N = 26) for rewarded trials (CS, reward; 100%, 75%, 50%, 25%), unrewarded trials (CS, no reward; 75%, 50%, 25%, 0%), and unpredicted reward. Y-axis: peak normalized response; scale bar: 1 s.]
[Figure 5. A: probability; B: uncertainty. Histograms for the CS 1st half, CS 2nd half, before reward, and after reward windows. X-axis: standardized beta coefficient; y-axis: proportion of neurons (%); asterisks mark distributions deviating from zero.]
[Figure 6. Rasters and histograms for rewarded and unrewarded trials in the 100%, 75%, 50%, 25%, and 0% conditions and for unpredicted reward (CS and reward periods marked); scale bars: 1 s, 20/s.]
[Figure 7. Population histograms (N = 13) for rewarded trials (CS, reward; 100%, 75%, 50%, 25%), unrewarded trials (CS, no reward; 75%, 50%, 25%, 0%), and unpredicted reward. Y-axis: peak normalized response; scale bar: 1 s.]
[Figure 8. A: probability; B: uncertainty. Histograms for the CS 1st half, CS 2nd half, before reward, and after reward windows. X-axis: standardized beta coefficient; y-axis: proportion of neurons (%); asterisks mark distributions deviating from zero.]
[Figure 9. A and B: rasters and histograms for rewarded and unrewarded trials in the 100%, 75%, 50%, 25%, and 0% conditions and for unpredicted reward (CS and reward periods marked); scale bars: 1 s, 20/s.]
[Figure 10. A (N = 31) and B (N = 34): population histograms for rewarded trials (CS, reward; 100%, 75%, 50%, 25%), unrewarded trials (CS, no reward; 75%, 50%, 25%, 0%), and unpredicted reward. Y-axis: peak normalized response; scale bar: 1 s.]
[Figure 11. A: probability; B: uncertainty. Histograms for the CS 1st half, CS 2nd half, before reward, and after reward windows. X-axis: standardized beta coefficient; y-axis: proportion of neurons (%); asterisks mark distributions deviating from zero.]
[Figure 12. A, B, C: rasters with CS and reward times marked; scale bar: 1 s.]
[Figure 13. A: onset and peak latency for CS phasic (CS res.), CS phasic (Rew. res.), CS tonic, and US build-up neurons; y-axes: time from CS onset (ms) or time from reward onset (ms); x-axis: delay length (s; 0.5, 1.5, 3.5, 0.5); symbols: onset latency, peak latency. B: magnitude of event-related activity for the same groups; y-axis: peak normalized response; x-axis: delay length (s); symbols: 100%, 50%, 0%.]
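[Figure 14. Coronal sections showing recording sites of CS phasic, CS tonic, and US build-up neurons at anteroposterior levels -1.0, -0.5, 0.0, +0.5, +1.0, and +1.5 mm; scale bar: 2 mm; symbols: probability-dependent activity, non-probability-dependent activity.]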
Table 1. Summary of the relationship between the activity of each type of neuron and reward probability or uncertainty.

Type | Window | Probability: Positive | Probability: Negative | Uncertainty: Positive | Uncertainty: Negative
CS phasic (N = 76) | CS 1st half (0-750) | 40 (53*) | 2 (3) | 0 (0) | 2 (3)
CS phasic (N = 76) | CS 2nd half (750-1500) | 16 (21*) | 8 (11*) | 5 (7*) | 2 (3)
CS phasic (N = 76) | Before reward (1000-2000) | 17 (22*) | 5 (7*) | 6 (8*) | 2 (3)
CS phasic (N = 76) | After reward (2000-2500) | 1 (1) | 38 (50*) | 2 (3) | 6 (8*)
CS tonic (N = 23) | CS 1st half (0-750) | 12 (52*) | 0 (0) | 1 (4) | 3 (13*)
CS tonic (N = 23) | CS 2nd half (750-1500) | 13 (57*) | 1 (4) | 2 (9*) | 1 (4)
CS tonic (N = 23) | Before reward (1000-2000) | 12 (52*) | 1 (4) | 2 (9*) | 2 (9*)
CS tonic (N = 23) | After reward (2000-2500) | 0 (0) | 2 (9*) | 0 (0) | 0 (0)
US build-up (N = 95) | CS 1st half (0-750) | 26 (27*) | 3 (3) | 5 (5) | 5 (5)
US build-up (N = 95) | CS 2nd half (750-1500) | 46 (48*) | 1 (1) | 5 (5) | 3 (3)
US build-up (N = 95) | Before reward (1000-2000) | 65 (68*) | 1 (1) | 10 (11*) | 3 (3)
US build-up (N = 95) | After reward (2000-2500) | 28 (29*) | 10 (11*) | 7 (7*) | 7 (7*)

Numbers in parentheses after each time window show the time from CS onset (ms). Numbers in parentheses in each cell show percentages of the neurons. Asterisks indicate that the proportion is greater than chance level (permutation test, p < 0.05).
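The Table 1 footnote does not describe the permutation procedure. One possible sketch: count the neurons whose firing rate is significantly related to reward probability in a given direction, then build a null distribution of that count by shuffling condition labels across trials within each neuron. The per-neuron criterion used here (a Fisher z-test of the Pearson correlation at two-tailed p < 0.05) is a hypothetical stand-in for the authors' regression-based criterion, and all names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variance: treat as no relationship
    return sxy / math.sqrt(sxx * syy)


def _significant(rates, probs, sign: int, z_crit: float = 1.96) -> bool:
    """Hypothetical per-neuron criterion: Fisher z-test of r (two-tailed
    p < 0.05), counted only when the effect has the requested sign (+1 or -1)."""
    r = max(min(_pearson(probs, rates), 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(len(rates) - 3)
    return sign * z > z_crit


def proportion_permutation_p(
    neurons: Sequence[Tuple[Sequence[float], Sequence[float]]],
    sign: int = 1,
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Permutation p-value that the count of significant neurons exceeds chance.

    neurons: one (trial firing rates, trial reward probabilities) pair per neuron.
    """
    rng = random.Random(seed)
    observed = sum(_significant(r, p, sign) for r, p in neurons)
    n_at_least = 0
    for _ in range(n_perm):
        null_count = 0
        for rates, probs in neurons:
            shuffled: List[float] = list(probs)
            rng.shuffle(shuffled)  # break the pairing of rates and conditions
            null_count += _significant(rates, shuffled, sign)
        if null_count >= observed:
            n_at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (n_at_least + 1) / (n_perm + 1)
```

Passing `sign=-1` tests for an excess of negatively related neurons, as in the Negative columns; supplying p(1 - p) values in place of probabilities would apply the same scheme to uncertainty.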
Table 2. Summary of the relationship between the activity of each type of neuron and delay length or licking movement after reward delivery.

Type | Delay length only | Licking only | Both
CS phasic (CS) (N = 14) | 6 | 1 | 1
CS phasic (Reward) (N = 14) | 6 | 1 | 0
CS tonic (N = 6) | 1 | 1 | 3
US build-up (N = 19) | 4 | 6 | 4

Note that reward responses of CS phasic neurons increased with delay extension or a decrease in licking movement, while pre-reward activity of all types of neurons decreased with delay extension or a decrease in licking movement.
Table 3. Summary of the relationship between the activity of each type of neuron and reward probability or licking movement.

A, Recorded without probability-dependent licking movement
Type | Probability only | Licking only | Both
CS phasic (N = 60) | 27 | 3 | 5
CS tonic (N = 17) | 10 | 0 | 1
US build-up (N = 69) | 29 | 6 | 19

B, Recorded with probability-dependent licking movement
Type | Probability only | Licking only | Both
CS phasic (N = 16) | 8 | 0 | 2 (2)
CS tonic (N = 6) | 3 | 2 | 0 (0)
US build-up (N = 26) | 14 | 1 | 4 (4)

Numbers in parentheses show the number of neurons that had a higher R-squared value for the model including licking movement as a regressor.
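The Table 3 footnote does not say exactly which models were compared. The sketch below assumes an ordinary least-squares model with probability alone versus one with probability plus licking, each with an intercept. Because plain R-squared of a nested model can never decrease when a regressor is added, the sketch also provides adjusted R-squared, which penalizes the extra regressor; whether the authors used either quantity, and all function names, are assumptions.

```python
import math
from typing import Sequence


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variance: treat as no linear relationship
    return sxy / math.sqrt(sxx * syy)


def r2_probability_only(rates: Sequence[float], probs: Sequence[float]) -> float:
    """R^2 of the OLS model rate ~ probability (with intercept)."""
    return _pearson(probs, rates) ** 2


def r2_probability_and_licking(
    rates: Sequence[float], probs: Sequence[float], licks: Sequence[float]
) -> float:
    """R^2 of the OLS model rate ~ probability + licking (with intercept).

    Uses the two-predictor identity
    R^2 = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2).
    """
    r_y1 = _pearson(probs, rates)
    r_y2 = _pearson(licks, rates)
    r_12 = _pearson(probs, licks)
    if abs(r_12) > 1 - 1e-12:
        raise ValueError("probability and licking regressors are collinear")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```

For a neuron with n trials, comparing `adjusted_r2(r2_probability_only(...), n, 1)` with `adjusted_r2(r2_probability_and_licking(...), n, 2)` asks whether licking explains enough additional variance to justify the extra regressor.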