
Articles in PresS. J Neurophysiol (September 16, 2015). doi:10.1152/jn.00097.2015

Discrete coding of stimulus value, reward expectation, and reward prediction error in the dorsal striatum

Kei Oyama1,2, Yukina Tateyama1, Istvan Hernadi3, Philippe N. Tobler4, Toshio Iijima1, and Ken-Ichiro Tsutsui1

1 Division of Systems Neuroscience, Tohoku University Graduate School of Life Sciences, Sendai 980-8577, Japan
2 Department of Physiology, Tohoku University School of Medicine, Sendai 980-8575, Japan
3 Department of Experimental Zoology and Neurobiology, and Szentagothai Research Center, University of Pécs, 7624 Pécs, Hungary
4 Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, 8006 Zurich, Switzerland

Running head: Value, expectation, and prediction error in dorsal striatum

Corresponding author: Ken-Ichiro Tsutsui, Ph.D., Division of Systems Neuroscience, Tohoku University Graduate School of Life Sciences, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577, Japan
E-mail: [email protected]

Copyright © 2015 by the American Physiological Society.

Abstract

To investigate how the striatum integrates sensory information with reward information for behavioral guidance, we recorded single-unit activity in the dorsal striatum of head-fixed rats participating in a probabilistic Pavlovian conditioning task with auditory conditioned stimuli (CSs) in which reward probability was fixed for each CS but parametrically varied across CSs. We found that the activity of many neurons was linearly correlated with the reward probability indicated by the CSs. The recorded neurons could be classified according to their firing patterns into functional subtypes coding reward probability in different forms, such as stimulus value, reward expectation, and reward prediction error.
These results suggest that several functional subgroups of dorsal striatal neurons represent different kinds of information formed through extensive prior exposure to CS-reward contingencies.

Keywords

single-unit recording, head-fixed rats, Pavlovian conditioning

Introduction

The striatum, an input stage of the basal ganglia, receives projections from almost all areas of the cerebral cortices (Bolam et al. 2000) as well as from dopamine neurons in the substantia nigra pars compacta (SNc) (Anden 1964). These diverse anatomical inputs make the striatum a structure ideal for integrating sensory and motor information from the cerebral cortex with reward information from the SNc. Previous studies have shown that striatal neurons are activated at various events within a trial in a behavioral task, such as instruction cue, delay, and execution of the movement leading to reward delivery (Apicella et al. 1992; Hikosaka et al. 1989a, 1989b; Kimura 1990; Rolls et al. 1983), and that such task-related activity is modulated by the likelihood of obtaining the reward (Cromwell et al. 2003; Hassani et al. 2001; Hollerman et al. 1998; Kawagoe et al. 1998; Nakamura et al. 2012). Furthermore, investigators using behavioral tasks in which the probability of obtaining reward by one or another action dynamically changed found striatal neurons to track the value of a specific action (Samejima et al. 2005) or that of the one actually chosen (Lau and Glimcher 2008).

A useful way to investigate how the striatum integrates information from the SNc and the cerebral cortex is to record single-unit activity in these structures while animals are performing the same task.
The activity of dopamine neurons in the SNc has been widely investigated using a probabilistic Pavlovian conditioning task in which the association between the conditioned stimulus (CS) and subsequent reward (US) is varied parametrically across the full probability range (p = 0 to 1). Using such a task, Schultz and colleagues have found that dopamine neurons code reward prediction error in their phasic response to the stimulus and the reward and that they also code reward uncertainty in their tonic activity between the CS and the outcome (Fiorillo et al. 2003). We have recently recorded single-unit activity in the rat dorsal striatum and SNc during a probabilistic Pavlovian conditioning task and found that a group of neurons in the dorsal striatum codes reward prediction error information in the same manner as dopamine neurons in the SNc (Oyama et al. 2010). While in that study we focused on the neurons coding reward prediction error, in this study we analyzed the same dataset looking for any task-related variation of activity. Furthermore, we analyzed data from an additional experiment extending the delay duration.

Materials & Methods

Subjects

Twenty-one male albino Wistar rats weighing 220–300 g were used as subjects. They were individually housed under a 12:12 hr light:dark cycle with light onset at 8:00 P.M. Throughout the experiments they were treated in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals, the Tohoku University Guidelines for Animal Care and Use, and the APS Guiding Principles for the Care and Use of Vertebrate Animals in Research and Training.

Apparatus

Experiments were conducted in a dimly lit sound-attenuated room. Brief auditory CSs were generated by a personal computer and presented diagonally from two loudspeakers (ASP-701, Elecom) 30 cm from the head of the rat.
An infrared sensor system was used to detect conditioned and unconditioned spout-licking movement.

Behavioral procedure and task

Before behavioral training, a head-fixation device consisting of two metal tubes and a stainless steel screw as a grounded reference electrode were fixed to the skull with dental cement under anesthesia induced by a combination of ketamine (80.0 mg/kg) and xylazine (0.8 mg/kg). After recovery from the surgery, each rat was habituated to an acrylic half-cylinder restraining device (diameter: 8.5 cm, length: 15 cm) that was combined with a stereotaxic head-fixation frame (SR-5R, Narishige). During the task training and single-unit recording sessions, the rat was placed in the restraining device with its head fixed firmly and painlessly in a stereotaxic device (Fig. 1).

The rats were trained with a probabilistic classical conditioning procedure. Five different auditory stimuli with the same intensity but with different frequencies ranging from 1.2 to 14 kHz (1.2, 2, 5, 9, and 14 kHz) were used as CSs, each indicating a different reward probability (p = 0, 0.25, 0.5, 0.75, or 1.0). Combinations of tone frequencies and reward probabilities were varied between rats. In order to dissociate reward-probability-dependent neuronal activity from the auditory sensory response, the combinations of tone frequencies and reward probabilities were organized so that a reward-probability-dependent tuning of response amplitude would appear as multi-peaked tuning when responses are plotted against log-aligned tone frequency. This allowed us to dissociate reward-probability-dependent activity from typical sensory responses that would be expressed as single-peaked tuning when activity is plotted against tone frequency (Bordi and LeDoux 1992; Doron et al. 2002; Sally and Kelly 1988; Sutter and Schreiner 1999).

In each trial a 1.5-s CS was followed by a 0.5-s delay.
Whether reward occurred immediately after the delay was determined probabilistically depending on the CS, and in a rewarded trial a solenoid valve opened for 250 ms and delivered 50 μl of a sucrose solution through a spout in front of the rat’s mouth. The intertrial interval (ITI) was usually set to one of six durations, each consisting of a fixed 4 s plus an exponentially distributed interval with a mean of 5 s. The exception was when an unpredicted reward was given during the ITI. In that case the time between the end of the previous trial and the unpredicted reward and the time between the unpredicted reward and the start of the next trial were both set to one of the above regular ITI durations. Trial sequence was predetermined by a personal computer so that each of the five CSs and the unpredicted reward appeared twice in a block of 10 trials. A daily session consisted of 600 trials.

An additional experiment was conducted in order to identify the task events to which the activity of the recorded neurons was time-locked. In this experiment, 6 rats were used as subjects. The length of the delay period was extended in a stepwise fashion, and three different auditory stimuli were used as CSs indicating reward probabilities of 0, 50, and 100%. Again, an unpredicted reward was occasionally given during the ITI. The CS indicating a reward probability of 50% appeared twice as often as the CSs indicating reward probabilities of 0 or 100%. As a consequence, the number of rewarded trials in the 50% condition was the same as that in the 100% condition. Each recording session for a delay duration consisted of 60–90 trials, and the initial 20–30 trials after delay extension were excluded from analysis (allowing rats to adapt to the new timing of the reward delivery).
After a neuron was isolated, the neuronal activity was first recorded with a 0.5-s delay and then the delay duration (i.e., the time without an explicit timing cue) was extended to 1.5 s. The delay was then extended from 1.5 to 3.5 s and, finally, set back to 0.5 s.

Single-unit recording

The recording session began after the rat’s anticipatory licking responses discriminated between probabilities during the CS and delay period. Chronic access to the brain was provided by using a second surgical procedure to open a hole in the skull and attach a recording chamber over it. The position of the hole (AP = +2.0 to −1.5 from bregma; L = 1.0 to 4.5 from the midline) was determined according to the standard stereotaxic atlas (Paxinos and Watson 2005). After recovery from surgery, the activity of single neurons was recorded extracellularly, using tungsten microelectrodes with a platinized tip (1–3 MΩ measured at 1 kHz, 0.125-mm-diameter shaft; FHC), while the rat performed the Pavlovian conditioning task. The electrode was attached to a hydraulic microdrive (MO-15, Narishige) so that it could be advanced into the brain. Electrophysiological signals were amplified (10,000 times) and bandpass filtered (low-cut: 100 Hz; high-cut: 10,000 Hz) with a standard biophysical amplifier (Bio Amp A2-v6, Supertech) and were displayed on an oscilloscope (CS-4125A, Kenwood). The amplified signals were also rendered audible and presented to the experimenter through headphones. The action potentials of isolated neurons were sorted by a window-discriminator (DDIS-1, Bak Electronics) and displayed on a digital storage oscilloscope (DCS-7040, Kenwood). The recorded electrophysiological signals were digitized at 25 kHz by using an analogue-digital conversion interface (Power 1401, CED) and then stored on a hard disk of a personal computer (X100, IBM).
The times of the detected action potentials, licking movements, and task events were also stored on the hard disk. Rasters and histograms showing the neuronal activity recorded under each probability condition and the response to the unpredicted reward were displayed on-line on an LCD video screen. If visual inspection suggested that the neural activity was related to one or more task events (CS, delay, and/or reward) or to the unpredicted reward given during the ITI, we stored the recorded data on the computer for off-line analysis. The dataset included the activity recorded in at least 7 trials for each probability condition, and in this study it was subjected to further analysis.

Analysis of neuronal activity

When analyzing the activity of each neuron, we focused on the activity during four time periods within the trial: stimulus-related activity from CS onset till 750 ms after the CS onset (first half of the CS presentation), stimulus-related activity from 750 ms before the CS offset to CS offset (second half of the CS presentation), reward expectation activity from 1000 ms before reward delivery till reward delivery, and reward activity (and prediction error response) from reward onset till 500 ms after reward delivery.

We classified neurons into three groups according to which of the three pre-reward time windows was the one in which the neuron showed the greatest change in activity (in this analysis we excluded the time window after reward delivery as we wanted to classify neurons based on the activity before receipt of the reward). We calculated the t-value by comparing the baseline activity (500 ms before CS onset) and the activity in each time window. The time window that yielded the largest absolute t-value was considered the one with the greatest change in activity.
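The window comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which form of t-test was used, so the sketch assumes a paired t statistic computed across trials on firing rates (spikes/s, so that windows of different length are comparable). The window labels, function names, and firing rates are hypothetical.

```python
import math
import statistics

# Hypothetical labels for the three pre-reward windows described in the text.
WINDOWS = ("cs_first_half", "cs_second_half", "pre_reward")


def paired_t(baseline, window):
    """Paired t statistic for per-trial firing rates (window minus baseline).

    Assumption: a paired comparison across trials; the source does not
    specify the exact form of the t-test.
    """
    if len(baseline) != len(window):
        raise ValueError("baseline and window must have one value per trial")
    diffs = [w - b for b, w in zip(baseline, window)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def window_of_greatest_change(baseline, rates_by_window):
    """Return the window with the largest |t| together with all t-values."""
    t_values = {name: paired_t(baseline, rates_by_window[name]) for name in WINDOWS}
    best = max(WINDOWS, key=lambda name: abs(t_values[name]))
    return best, t_values


# Made-up firing rates (spikes/s) for seven trials of one hypothetical neuron.
baseline = [4, 5, 6, 5, 4, 6, 5]
rates = {
    "cs_first_half": [5, 5, 7, 6, 4, 6, 6],
    "cs_second_half": [4, 6, 6, 5, 5, 6, 5],
    "pre_reward": [9, 11, 12, 9, 10, 13, 10],
}
best, t_values = window_of_greatest_change(baseline, rates)
print(best, {name: round(t, 2) for name, t in t_values.items()})
```

Taking the absolute value of t means that a strong decrease in firing is treated the same way as a strong increase, which is how inhibitory responses can also determine the window.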
Neurons showing the greatest change during the 750 ms after the CS onset (first half of the CS presentation), during the 750 ms before the CS offset (second half of the CS presentation), and during the 1000 ms before reward delivery were classified as CS phasic, CS tonic, and US build-up neurons, respectively.

To analyze the information content coded by each neuron in each time window, we conducted a multiple linear regression analysis on a trial-by-trial basis with the reward probability and uncertainty as regressors (p < 0.05, uncorrected for multiple regressors). Uncertainty was quantified as relative variance, which is 1 at p = 0.5 and is 0 at p = 0 and at p = 1. When analyzing the activity within the 500 ms after the reward onset, we used only the rewarded trials. The response to an unpredicted reward given during the ITI served as an approximation of the reward response in the p = 0 condition in the multiple regression analysis. Neurons showing greater activity within 500 ms after the onset of an unpredicted reward than they did within 500 ms before the onset were judged to be responsive to reward.

We used the Wilcoxon signed-rank test to determine whether the distributions of the standardized beta coefficients obtained in each neuron from the multiple linear regression analysis for probability and uncertainty deviated significantly (p < 0.05) from zero. To compare the effect size of each regressor, we compared the absolute values of the standardized beta coefficients for reward probability and uncertainty using a paired t-test (p < 0.05) in each time window. We also conducted a permutation test to test whether the number of neurons showing a statistical significance was greater than chance level (p < 0.05). To construct population-averaged histograms, we used the following procedure. First, for each neuron, we subtracted the baseline firing rate from the firing rate in each bin.
We then normalized the firing rate of each bin by the firing rate of the bin with the highest firing rate. Finally, the normalized activity of each neuron was averaged across the neuronal population.

Histology

Electrolytic lesions were made by passing electrical current through the tip of the electrode and into the brain tissue. After the rat was killed with an overdose of pentobarbital, it was transcardially perfused first with 0.9% saline and then with 10% formalin. Then the brain was removed from the skull and stored in a 10% formalin solution. For histological inspection the brain was sliced into 50-μm coronal sections and stained with thionine. Slices were examined under a light microscope to verify lesion site and electrode tracks. Electrode placements were finally verified using the rat brain atlas (Paxinos and Watson 2005). The plots of recording sites were superimposed at 0.5-mm intervals on corresponding coronal sections of the left hemisphere.

Results

We trained 15 head-fixed rats in a probabilistic Pavlovian conditioning task (Fig. 1). After 2 to 3 months of training, the rats exhibited probability-dependent spout-licking movement (i.e., longer cumulative duration of licking for higher reward probability) during the CS and/or delay period (Oyama et al. 2010). The training was considered complete with the emergence of discriminative anticipatory licking responses during the CS and delay period.

Temporal response profiles of dorsal striatal neurons during the probabilistic Pavlovian conditioning task

After the completion of training, we recorded single-unit activity in the dorsal striatum during the performance of the probabilistic Pavlovian conditioning task.
Of the 1102 striatal neurons whose activity was isolated, 18% (N = 194) were judged from the experimenter’s visual inspection to show changed activity (more or less firing than the baseline level) during a trial, and their activity was recorded on the computer.

Figure 2 shows the activity of every recorded neuron in rewarded (Fig. 2, upper left) and unrewarded trials (Fig. 2, lower left) in the 50% reward probability condition normalized by the peak response. We rank-sorted neuronal activities by occurrence of their peak response in the trial as calculated in the pooled data from all reward probability conditions including both rewarded and unrewarded trials. We also show the activity of each neuron around the delivery of the unpredicted reward (Fig. 2, right). By visual inspection there appeared to be several subtypes of neurons that had different firing patterns. The first consisted of neurons showing a phasic response to the CS presentation and to the delivery of the reward (neuron ID #1 to around #100), the second consisted of neurons showing a tonic response during CS presentation (neuron ID around #101 to around #120), and the third consisted of neurons showing a build-up activity towards the time of reward delivery (neuron ID around #121 to #194).

We classified neurons into three groups according to which of the three pre-reward time windows was the one in which the neuron showed the greatest change in activity (see Materials & Methods). Of the 194 recorded neurons, 39% (76: all excitatory) showed the greatest change during the 750 ms after the CS onset (first half of the CS presentation), 12% (23: 17 excitatory and 6 inhibitory) showed the greatest change during the 750 ms before the CS offset (second half of the CS presentation), and 49% (95: 91 excitatory and 4 inhibitory) showed the greatest change during the 1000 ms before reward delivery.
Hereafter we refer to the neurons showing the greatest activity change during the first half of the CS presentation, second half of the CS presentation, and before reward as CS phasic, CS tonic, and US build-up neurons, respectively. We will focus on the firing patterns of these neurons.

Neurons with phasic CS response (CS phasic neurons)

Of the 194 recorded neurons, 39% (76/194) were classified as CS phasic neurons. They typically showed a phasic response after the CS onset. Most of them (76%, 58/76) also showed a significant response to the reward. We then assessed the specific correlation of activity to reward probability. We conducted a multiple linear regression analysis to test whether the activity in each time window (first half of the CS presentation; second half of the CS presentation; 1000 ms before reward delivery; 500 ms after reward delivery) was linearly related to reward probability or reward uncertainty, which can be quantified as relative variance (1 at p = 0.5 and 0 at p = 0 and 1). In this paper we have used the term “uncertainty” according to the definition by Fiorillo et al. (2003), although recent studies have also referred to “uncertainty” as “risk” (e.g., Burke and Tobler 2011). We included uncertainty in the regression model because tonic activity of midbrain dopamine neurons is correlated with reward uncertainty (Fiorillo et al. 2003). Table 1 summarizes the results of the multiple linear regression analysis. The proportions of neurons that showed a positive correlation between the activity and reward probability were highest in the three time windows before the reward delivery, especially during the first half of the CS presentation. In that time window the majority (53%, 40/76) showed a positive correlation between CS response and reward probability.
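As an illustration of the regression described above, the following Python sketch computes standardized beta coefficients for reward probability and uncertainty for one neuron and one time window. It rests on assumptions that are not stated in the text: "relative variance" is taken to be the Bernoulli variance p(1 − p) divided by its maximum of 0.25, that is 4p(1 − p), which gives 1 at p = 0.5 and 0 at p = 0 and p = 1; the betas are obtained from the standard closed form for a two-regressor model. Function names and data are hypothetical, and the significance test is omitted.

```python
import math
import statistics


def relative_variance(p):
    """Assumed definition of uncertainty: p(1 - p) scaled by its maximum, 0.25."""
    return 4.0 * p * (1.0 - p)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def standardized_betas(prob, rate):
    """Standardized betas of rate ~ probability + uncertainty (two regressors).

    Closed form for two predictors:
    beta_1 = (r_y1 - r_12 * r_y2) / (1 - r_12 ** 2), and symmetrically for beta_2.
    """
    unc = [relative_variance(p) for p in prob]
    r_yp, r_yu, r_pu = pearson(rate, prob), pearson(rate, unc), pearson(prob, unc)
    denom = 1.0 - r_pu ** 2
    beta_prob = (r_yp - r_pu * r_yu) / denom
    beta_unc = (r_yu - r_pu * r_yp) / denom
    return beta_prob, beta_unc


# Hypothetical trials: four per probability condition, firing rate in spikes/s.
prob = [0.0, 0.25, 0.5, 0.75, 1.0] * 4
rate = [2.0 + 10.0 * p for p in prob]  # made-up neuron that tracks probability
beta_prob, beta_unc = standardized_betas(prob, rate)
print(round(beta_prob, 3), round(beta_unc, 3))
```

With equal numbers of trials per condition, probability and 4p(1 − p) are uncorrelated across the five probability levels, so the two coefficients can be estimated separately; with unbalanced trial counts this no longer holds exactly.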
On the other hand, half of the CS phasic neurons (50%, 38/76) showed a negative correlation between reward response and reward probability. In total, 34% (26/76) showed both a positive correlation between the CS response and reward probability and a negative correlation between the reward response and reward probability.

Figures 3 and 4 show the activity of a representative CS phasic neuron and the average histograms of the activity of 26 CS phasic neurons that showed CS and reward responses positively and negatively related to the reward probability, respectively. They showed a phasic response both to the CS and to the reward. The response during the first half of the CS was positively related to reward probability (r = 0.55, p < 0.0001), whereas the reward response was negatively related to reward probability (r = −0.44, p < 0.0001) and was highest for the unpredicted reward. These neurons were considered to code a reward prediction error: the discrepancy between the prediction and occurrence of a reward. These neurons were almost the same population as the one we have previously reported as reward prediction error-coding neurons (Oyama et al. 2010).

Figure 5 shows the distributions of the standardized beta coefficients for reward probability and uncertainty of all CS phasic neurons in each time window. In the three time windows before the reward delivery, the distribution for reward probability showed a positive deviation from zero (p < 0.05, Wilcoxon signed-rank test) (Fig. 5A). On the other hand, the distribution of the reward responses for reward probability showed a negative deviation (p < 0.05, Wilcoxon signed-rank test) (Fig. 5A).
The distribution for uncertainty did not show any deviation during the first half of the CS presentation (p > 0.05, Wilcoxon signed-rank test), whereas those during the second half of the CS presentation and before reward showed positive deviations from zero (p < 0.05, Wilcoxon signed-rank test), and that after the reward delivery showed a negative deviation (p < 0.05, Wilcoxon signed-rank test) (Fig. 5B).

To compare the effect size of each regressor, we compared the absolute values of the standardized beta coefficients for reward probability and uncertainty. In every time window the absolute value of the standardized beta coefficient for reward probability was greater than that of the standardized beta coefficient for uncertainty (p < 0.05, paired t-test). We also confirmed this tendency in the proportion of neurons that showed a statistically significant activity change during the first half of the CS presentation and after the reward (chi-square test, p < 0.05 with Bonferroni correction).

Neurons with tonic CS response (CS tonic neurons)

Of the 194 recorded neurons, 12% (23/194) were classified as CS tonic neurons. They typically showed a tonic response during the CS presentation, and 30% of those neurons (7/23) showed a significant response to the reward. The proportions of neurons that showed a positive correlation between activity and reward probability were highest in the three time windows before the reward delivery, especially during the second half of the CS presentation (Table 1). In that time window the majority (57%, 13/23) showed a positive correlation between CS response and reward probability.

Figures 6 and 7 show the activity of a representative CS tonic neuron and the average histograms of the activity of 13 CS tonic neurons that showed a CS response positively related to the reward probability.
The neurons were tonically active during the presentation of the CS and showed no response to reward delivery. The CS response was positively related to reward probability (r = 0.78, p < 0.0001).

Figure 8 shows the distribution of the standardized beta coefficients for reward probability and uncertainty of all CS tonic neurons in each time window. In the three time windows before the reward delivery the distribution for reward probability showed a positive deviation from zero (p < 0.05, Wilcoxon signed-rank test) (Fig. 8A). The distribution of reward responses for reward probability did not show any deviation (p > 0.05, Wilcoxon signed-rank test) (Fig. 8A). In every time window the distribution for uncertainty did not show any deviation (p > 0.05, Wilcoxon signed-rank test) (Fig. 8B). In the three time windows before the reward delivery, the absolute value of the standardized beta coefficient for reward probability was greater than that of the standardized beta coefficient for uncertainty (p < 0.05, paired t-test). We also confirmed this tendency in the proportion of neurons that showed a statistically significant activity change during the second half of the CS presentation (chi-square test, p < 0.05 with Bonferroni correction).

Neurons with pre-US build-up activity (US build-up neurons)

Of the 194 recorded neurons, 49% (95/194) were classified as US build-up neurons. They typically showed gradually increasing activity toward the time of reward delivery. Of them, 47% (45/95) also showed a significant response to the reward, while 53% (50/95) did not. The proportions of neurons that showed a positive correlation between the activity and reward probability were highest in every time window, especially during the 1000 ms before the reward delivery (Table 1). In that time window the majority (68%, 65/95) showed a positive correlation between pre-reward activity and reward probability.
Figures 9A and 10A show the activity of a representative US build-up neuron and the average histograms of the activity of 31 US build-up neurons that showed pre-reward activity positively related to reward probability with a reward response. The activity of these neurons gradually increased towards the time of reward delivery and also was high after the reward. The activity during the 1000 ms before the reward delivery was positively related to reward probability (r = 0.68, p < 0.0001), whereas the activity after the time of reward delivery was not (p > 0.1).

Figures 9B and 10B show the activity of a representative US build-up neuron and the average histograms of the activity of the 34 US build-up neurons that showed pre-reward activity positively related to reward probability without a reward response. As in the reward-responsive class of US build-up neurons, activity gradually increased towards the time of the reward delivery. The activity of these neurons, however, steeply decreased after reward delivery. The activity during the 1000 ms before the reward delivery was positively and linearly related to reward probability (r = 0.70, p < 0.0001). The activity after the time of the reward delivery showed a positive correlation with reward probability (r = 0.35, p < 0.0001), but this may simply reflect the preceding probability-dependent build-up activity. The activity after the usual time of reward was higher in unrewarded trials than in rewarded trials (comparison in intermediate reward probability conditions; 75%, 50%, and 25%; p < 0.0001, two-way ANOVA).

Figure 11 shows the distribution of the standardized beta coefficients for reward probability and uncertainty of all US build-up neurons in each time window. In every time window the distribution for reward probability showed a positive deviation from zero (p < 0.05, Wilcoxon signed-rank test) (Fig. 11A).
The distribution for uncertainty did not show any deviation during the first half of the CS presentation (p > 0.05, Wilcoxon signed-rank test), whereas those during the second half of the CS presentation and before reward showed a positive deviation from zero (p < 0.05, Wilcoxon signed-rank test), and that after the reward delivery did not show any deviation from zero (p > 0.05, Wilcoxon signed-rank test) (Fig. 11B). In every time window the absolute value of the standardized beta coefficient for reward probability was greater than that of the standardized beta coefficient for uncertainty (p < 0.05, paired t-test). We also confirmed this tendency in the proportion of neurons that showed a statistically significant activity difference during the 1000 ms before the reward delivery (chi-square test, p < 0.05 with Bonferroni correction).

Impact of delay extensions

In classifying the neurons as above, we assumed that the activity between the CS onset and the reward onset of CS phasic and CS tonic neurons was time-locked to the CS onset and that of US build-up neurons was time-locked to the reward onset. To test this assumption we conducted an additional experiment in which we recorded the activity of neurons while we changed the length of the delay from 0.5 s to 1.5 s, then to 3.5 s, and finally back to 0.5 s. The activity of representative CS phasic, CS tonic, and US build-up neurons in this delay manipulation is shown in Fig. 12A, B, and C. We calculated the onset and peak latency of event-related activity in each neuron (Fig. 13A). In CS phasic neurons the onset and the peak latency of activity related to the CS onset remained unchanged (p > 0.05, Kruskal-Wallis test). The responses related to the reward onset likewise occurred similarly, irrespective of delay duration.
In CS tonic neurons the onset and the peak latency of responses related to the CS remained unchanged (p > 0.05, Kruskal-Wallis test). Thus we confirmed that the pre-reward activity of CS phasic and CS tonic neurons was time-locked to the CS onset whereas the post-reward activity of CS phasic neurons was time-locked to the reward onset. In strong contrast to the pre-reward responses of CS phasic and CS tonic neurons, the onset times of pre-reward responses in US build-up neurons shifted away from the CS onset when the delay to reward increased (p < 0.0001, Kruskal-Wallis test). The peak response times with respect to the reward onset remained unchanged (p > 0.05, Kruskal-Wallis test). Thus in terms of peak response time, the activity of US build-up neurons was time-locked to the reward onset.

We also examined whether the magnitude of the event-related activity changed with the delay extension (Fig. 13B). On average, CS phasic neurons showed weaker CS responses and stronger reward responses with delay extension, CS tonic neurons showed weaker CS responses, and US build-up neurons showed reduced activity during the last 1 s before reward time (p < 0.05, one-way ANOVA). However, it appeared that the activity level not only changed abruptly with the shift of the delay duration but also changed gradually as the daily session progressed. It is possible that the gradual decrease of the neuronal firing reflects the decrease of the animal’s motivation. We used as an indicator of motivational level the number of licking movements after reward delivery. In order to dissociate abrupt change with the shift of the delay duration and gradual change with the motivational level, we applied multiple linear regression analysis on an individual-neuron basis, with the delay length as one factor and with the number of licking movements after reward delivery as another factor (p < 0.05, uncorrected).
In this analysis the number of licking movements was averaged over the rewarded trials within each block of 10 trials, and the averaged value was applied to all 10 trials of the block. The results are summarized in Table 2. Of all the CS phasic neurons we found (N = 14), 6 neurons showed reduced CS responses only with delay extension, 1 showed reduced response only with the decrease of licking movement, and 1 showed both effects. Moreover, 6 neurons showed stronger reward responses with delay extension and 1 showed stronger responses with the decrease of licking movement. Of all the CS tonic neurons we found (N = 6), 1 showed reduced CS responses only with delay extension, 1 showed reduced response only with the decrease of licking movement, and 3 showed both effects. Of all the US build-up neurons we found (N = 19), 4 showed reduced CS responses only with delay extension, 6 showed reduced response only with the decrease of licking movement, and 4 showed both effects. Thus in about half of the neurons the analysis of individual neuron data confirmed the main findings from the analysis of average data. A considerable number of striatal neurons were affected by the delay factor, and some of these, as well as some other neurons, were affected by the motivational factor.

Relationship between licking movement and neuronal activity

As it is known that the striatum is involved in motor functions and that many neurons fire in relation to movement (Rolls et al. 1983; Hikosaka et al. 1989a), it is possible that the observed neuronal activity is related to movement rather than reward probability. As we have previously shown (Oyama et al. 2010), anticipatory licking movement was often positively correlated with reward probability. In such cases, it is impossible to include these two factors as regressors in a multiple linear regression model because of the multicollinearity between them.
Therefore, in order to evaluate whether the reward probability or the licking movement was more suitable to explain the change of neuronal activity, we first divided recorded neurons into two groups: those recorded when probability-dependent anticipatory licking movement was not observed and those recorded when it was observed. For the former group, we applied a model in which reward probability, uncertainty, and the number of licking movements were included as regressors. For the latter group, we applied two models of multiple linear regression analysis independently: one that included reward probability and uncertainty as regressors, and one in which reward probability was replaced by the number of licking movements. For neurons that showed correlations with both reward probability and licking movement, we compared the R-squared values of the two models to determine which factor better explained the neuronal activity. The results are summarized in Table 3.

Of all CS phasic neurons, the activity of 60 was recorded when probability-dependent anticipatory licking movement was not observed. Of these neurons, 27 showed a correlation only with reward probability, 3 showed a correlation only with licking movement, and 5 showed correlations with both factors during the first half of the CS presentation. Of 16 neurons whose activity was recorded when probability-dependent anticipatory licking movement was observed, 8 showed a correlation only with reward probability, none showed a correlation only with licking movement, and 2 showed correlations with both factors. Both of those neurons had a higher R-squared value for the model including licking movement as a regressor.

Of all CS tonic neurons, the activity of 17 was recorded when probability-dependent anticipatory licking movement was not observed.
Of those neurons, 10 showed a correlation only with reward probability, none showed a correlation only with licking movement, and 1 showed correlations with both factors during the second half of the CS presentation. Of 6 neurons whose activity was recorded when probability-dependent anticipatory licking movement was observed, 3 showed a correlation only with reward probability, 2 showed a correlation only with licking movement, and none showed correlations with both factors.

Of all US build-up neurons, the activity of 69 was recorded when probability-dependent anticipatory licking movement was not observed. Of those neurons, 29 showed a correlation only with reward probability, 6 showed a correlation only with licking movement, and 19 neurons showed correlations with both factors during the 1000 ms before the reward delivery. Of 26 neurons whose activity was recorded when probability-dependent anticipatory licking movement was observed, 14 showed a correlation only with reward probability, 1 showed a correlation only with licking movement, and 4 showed correlations with both factors. All 4 of the neurons that showed correlations with both factors had a higher R-squared value for the model including licking movement as a regressor.

Thus a few neurons that were originally considered to be non-differential were found to show licking-related activity by re-analyzing the data with a model including the number of licking movements as a factor. In Table 3A the numbers of those neurons are listed under the heading “Licking only.” On the other hand, several neurons that were originally considered to be probability-dependent were found to also show licking-related activity.
The numbers of those neurons are listed in Table 3A under the heading “Both.” Of the neurons that were recorded while probability-dependent licking movement was observed, only a small number were found more likely to be dependent on licking movement than on reward probability. The numbers of those neurons are listed in parentheses in Table 3B under the heading “Both.” These results suggest that the activity of striatal neurons recorded under the Pavlovian conditioning paradigm was related more to reward probability than to licking movement, although some neurons, especially some US build-up neurons, showed licking-related activity. However, as anticipatory spout-licking behavior may emerge with the increase of animals’ intrinsic expectation level, it is unclear whether the observed licking-related activity was directly related to motor function or was related only indirectly through such motivational factors.

Relationship between tone frequency and neuronal activity

As it is known that the striatum receives sensory inputs from various cortices, we tested whether the observed activity of striatal neurons can be explained by the typical sensory responses that would appear as single-peaked tuning when activity in auditory-related areas is plotted against tone frequency (Bordi and LeDoux 1992; Doron et al. 2002; Sally and Kelly 1988; Sutter and Schreiner 1991). In designing the task we determined the combination of the tone frequency and the reward probability so that the reward-probability-dependent response would not appear as single-peaked tone tuning.
We tested for single-peaked tuning by Gaussian curve-fitting to the response magnitude of each type of neuron against the logarithmically scaled tone frequency (in this curve-fitting the time windows for the responses were the first half of the CS presentation for CS phasic neurons, the second half of the CS presentation for CS tonic neurons, and the 1000 ms before the reward delivery for US build-up neurons). Because we found a good fit for only one US build-up neuron, it is unlikely that the activity of the recorded striatal neurons was an artifact of simple auditory tone tuning.

Comparison of firing property and waveforms between neuron types

We compared the baseline firing rates of the above three types of neurons. The baseline firing rates of CS phasic, CS tonic, and US build-up neurons were 3.1 ± 2.5, 3.0 ± 1.9, and 3.3 ± 3.1 (mean ± s.d.) spikes/s, respectively, and they did not differ between neuron types (p > 0.05, one-way ANOVA). We also compared the duration of the waveforms of action potentials (width of negative component at half-maximum). The durations for CS phasic, CS tonic, and US build-up neurons were 205 ± 31, 223 ± 45, and 212 ± 30 (mean ± s.d.) µs, respectively, and did not differ significantly between neuron types (p > 0.05, one-way ANOVA).

Recording sites

The recording sites of each neuron were reconstructed histologically and superimposed onto coronal sections of the left hemisphere of the standard rat brain atlas (Paxinos and Watson 2005). Figure 14 shows the recording sites of each neuron type. We found that all three neuron types were widely distributed within the dorsal striatum, without any specific topographical clustering for any of the three types.
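The test for single-peaked tone tuning described in the Results (Gaussian curve-fitting of response magnitude against logarithmically scaled tone frequency) can be illustrated with a minimal sketch. The manuscript does not state the fitting algorithm or the criterion for a "good" fit, so everything below is an assumption made for illustration: the grid search over peak position and width, the least-squares solution for amplitude and baseline, the use of R-squared as the goodness-of-fit measure, and the function name `fit_gaussian_tuning`.

```python
import math

def fit_gaussian_tuning(freqs_hz, responses, n_grid=60):
    """Fit y = baseline + amplitude * exp(-(x - mu)^2 / (2 * sigma^2)), x = log2(frequency).

    Hypothetical procedure (not taken from the manuscript): mu and sigma are
    found by grid search, and amplitude and baseline are solved by ordinary
    least squares at each grid point. Only positive amplitudes are accepted,
    so the fitted curve has a single peak rather than a trough.
    Returns a dict describing the best fit, or None if no valid fit exists.
    """
    x = [math.log2(f) for f in freqs_hz]
    n = len(x)
    lo, span = min(x), max(x) - min(x)
    mean_y = sum(responses) / n
    ss_tot = sum((y - mean_y) ** 2 for y in responses)
    if span == 0 or ss_tot == 0:
        return None
    best = None
    for i in range(n_grid + 1):
        mu = lo + span * i / n_grid          # candidate peak position (log2 Hz)
        for j in range(1, n_grid + 1):
            sigma = span * j / n_grid        # candidate tuning width (octaves)
            g = [math.exp(-((xi - mu) ** 2) / (2 * sigma ** 2)) for xi in x]
            mean_g = sum(g) / n
            ss_g = sum((gi - mean_g) ** 2 for gi in g)
            if ss_g == 0:
                continue
            amplitude = sum((gi - mean_g) * (y - mean_y)
                            for gi, y in zip(g, responses)) / ss_g
            if amplitude <= 0:
                continue
            baseline = mean_y - amplitude * mean_g
            ss_res = sum((y - (baseline + amplitude * gi)) ** 2
                         for gi, y in zip(g, responses))
            r2 = 1.0 - ss_res / ss_tot
            if best is None or r2 > best["r2"]:
                best = {"mu_log2_hz": mu, "sigma_octaves": sigma,
                        "amplitude": amplitude, "baseline": baseline, "r2": r2}
    return best
```

Calling `fit_gaussian_tuning(freqs, mean_responses)` with one mean response per CS returns the best-fitting peak position and width together with R-squared. With only a handful of tones the Gaussian has nearly as many free parameters as there are data points, so any threshold on R-squared for calling a fit "good" should be chosen conservatively.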
Discussion

In this study we recorded single-unit activity in the dorsal striatum of head-fixed rats that had been pretrained in a probabilistic Pavlovian conditioning task using auditory cues. The neurons recorded in rats performing this task could be categorized into three types based on their firing patterns. CS phasic neurons showed a phasic response to the CS onset, and the magnitude of this response was positively related to reward probability. The majority of these neurons also showed a phasic reward response whose magnitude was negatively related to reward probability (Fig. 4). Thus, many CS phasic neurons showed greater phasic responses to CSs that predicted higher reward probability and showed greater phasic responses to less probable rewards. These firing properties correspond to the firing properties of midbrain dopamine neurons and indicate that a subset of CS phasic neurons code a reward prediction error at both the CS onset and the reward onset. In our previous study (Oyama et al. 2010) we compared this type of neuron with midbrain dopamine neurons and concluded that they have highly similar firing properties. CS tonic neurons showed a tonic response during the CS presentation without a reward response, and the magnitude of this response was positively related to reward probability (Fig. 7). These neurons can be considered to code the value of the stimulus. US build-up neurons showed gradually increasing activity toward the time of reward delivery, and the magnitude of the pre-reward activity was positively related to reward probability. The firing of this type of neuron may reflect the animal’s internal expectation about the upcoming reward (Fig. 10).

For all three types of neurons, we examined how activity changed as the delay duration between CS offset and reward onset was prolonged.
As expected, the phasic responses in CS phasic neurons were time-locked to the CS and reward onset and the tonic response in CS tonic neurons was time-locked to the CS onset. The build-up activity in US build-up neurons continued to peak at the reward onset regardless of the delay extension, but the build-up activity itself was prolonged as the delay was extended. These results confirmed the validity of the interpretation of the function of each neuron type. In general, extension of the time interval between the CS and the reward leads to a decrease of the value of the CS or the value of the reward when measured at the time of the CS, which in behavioral economics is known as “temporal discounting” (Ainslie 1975). It has been reported that value-coding neurons show this devaluing effect induced by delay prolongation (Kobayashi and Schultz 2008; Cai et al. 2011). We found that for CS phasic and CS tonic neurons the CS responses in the high-probability conditions decreased as the delay was prolonged. Given that the reward value predicted by the CS decreases with longer delays, the prediction error that occurs at the actual delivery of reward increases. In addition, the timing of reward is less precise with longer CS-US intervals, which also increases the prediction error (Fiorillo et al. 2008). In accordance with this, we also observed that for CS phasic neurons the reward responses under the high-probability conditions were stronger when the delay was prolonged. When the delay was again set to the original duration, the activity level did not return all the way to the original level: the peak of the CS response in CS phasic and CS tonic neurons and the peak of the build-up activity in US build-up neurons were lower than in the initial session.
As the whole procedure of extending the delay duration in two steps and then bringing the delay duration back to the original length took a long time, the motivational level of the animal could have been reduced considerably during the procedure. It is possible that the data reflect not only the effect of temporal discounting but also an overall decrease of the motivational level over time.

In the present task the activity of the majority of the recorded neurons depended on reward probability as indicated by different CSs. This dependency on reward probability implies that the activity was the product of associative learning during the extensive training phase. Previous studies on the synaptic mechanisms in the striatum have shown that long-term potentiation can occur at cortico-striatal synapses when the striatal neuron receives both cortical and dopaminergic inputs (Canales et al. 2002; Reynolds et al. 2001; Wickens et al. 1996). In addition, dopamine neurons, which send dense projections to the striatum, fire in response to unexpected rewards, i.e., when a positive reward prediction error occurs (Schultz 1998). These results suggest that when a reward is given after the presentation of a CS, the cortico-striatal synapses that transmit the sensory information of the CS would be strengthened. During extensive training, the synapses that transmit the information of a CS indicating higher reward probability, which is more frequently followed by reward, would be strengthened further. As a result of this process, the presentation of a CS indicating higher reward probability would elicit greater striatal activation. This may be the mechanism by which information about stimulus value is acquired. Similarly, the probability-dependent phasic CS response of the reward prediction error-coding neurons may be formed through this process.
Moreover, it is conceivable that the firing of neurons coding stimulus value may differentially change the animals’ internal motivational state, the elevation of which would be reflected in the build-up activity of reward expectation-coding neurons toward the time of reward delivery. The reward expectation signal would lead to preparation for appropriately acquiring the reward, such as directing attention to the reward and preparing to execute the reward-acquiring action. At the time of reward delivery, the reward expectation signal would be used to calculate the reward prediction error signal, which is represented in the phasic response to the reward of the reward prediction error-coding neurons.

According to the firing properties of neurons and the waveform of action potentials (Oyama et al. 2010), it is most likely that we recorded from medium spiny projection neurons, which constitute the vast majority of striatal neurons (Apicella 2007; Oorschot 1996). Our results indicate that within the striatal medium spiny neurons there are discrete functional subtypes that code different aspects of reward. It is known that there are different subpopulations of medium spiny neurons, such as neurons belonging to the direct pathway or indirect pathway and neurons located in the patch or matrix. Recent studies using transgenic animals and molecular biological techniques have found that neurons belonging to the direct and indirect pathway have different motor functions (Kravitz et al. 2010) and cognitive functions, such as learning (Hikida et al. 2010; Kravitz et al. 2012). These results suggest that striatal neurons with different histochemical properties code different information.
However, in order to understand how a neuron relates to a larger neuronal network and how it functions and interacts with other neurons, we need to investigate the precise morphological and histochemical background of the neuron, including its type, which receptors it expresses, and which other neurons it projects to. Staining a single neuron after having recorded from it (Oyama et al. 2013) during a behavioral paradigm will allow for such histochemical and morphological investigations and may reveal the functions and relationships of discrete subtypes of striatal neurons that code different reward-related information.

In our recorded neurons, only a small population showed activity related to reward uncertainty, which is maximal at a reward probability of 50% and gradually decreases as reward probability becomes smaller or larger. (In our previous study (Oyama et al. 2010) we found that none of the RPE-coding neurons, a subset of the CS phasic neurons of this study, showed activity related to uncertainty; that result may reflect an underestimate of the number of neurons that code uncertainty, because the statistical method we used was not as powerful as the one used in this study.) The small size of this population suggests that the striatum is preferentially involved in coding parametric reward value rather than reward uncertainty. Such a conclusion is consistent with human imaging findings (Tobler et al. 2008) indicating that striatal activation is dependent on reward probability but not on reward uncertainty in a very similar probabilistic Pavlovian conditioning task. Furthermore, we found that only a small number of neurons showed negative correlations between CS-related activity and reward probability, even though positive correlations between CS-related activity and reward probability were substantial and numerous.
This suggests that in a probabilistic Pavlovian conditioning paradigm, negative reward value coding is not common in dorsal striatal neurons. This contrasts with previous studies investigating striatal value representation in monkeys involved in an instrumental task, in which 30–60% of task-related neurons coded value negatively (Cromwell and Schultz 2003; Samejima et al. 2005). In addition, our striatal neurons showed neither an increase nor a decrease of activity at the time of an unexpected reward omission, either at the population level or at the single-neuron level. This suggests that negative reward prediction error was not coded by striatal neurons in our task and contrasts with what is known about dopamine neurons, which code both positive and negative prediction errors in monkeys (Schultz 1998) and in rodents (Oyama et al. 2010). On the other hand, a recent study recording from monkeys performing an instrumental conditioning task demonstrated that both positive and negative prediction errors were represented in presumed medium spiny neurons primarily by increases in firing rates (Asaad and Eskandar 2011). These inconsistencies with previous studies may be attributable to task or species differences.

In a behavioral task in which both reward and punishment were used as unconditioned stimuli, Matsumoto and Hikosaka (2009) claimed that a subpopulation of dopamine neurons encodes general motivational salience rather than value (but see Fiorillo et al. 2013). Their argument raises the possibility that striatal neurons showing probability-dependent activity may reflect the motivational salience but not the value of stimuli and outcomes. The behavioral task we used in this study, however, cannot dissociate value from motivational salience (Kahnt et al. 2014).
Therefore we cannot rule out the possibility that some neurons recorded in this study code motivational salience rather than value. To determine whether the activity of a neuron reflects reward, punishment, or motivational salience, we need to record the activity in a behavioral paradigm in which both reward and punishment are used as unconditioned stimuli.

It has been suggested that the dorsomedial striatum mediates action-outcome learning or goal-directed behavior and that the dorsolateral striatum mediates stimulus-response learning or habitual behavior (Barnes et al. 2005; Jog et al. 1999; Packard and Knowlton 2002; Yin et al. 2004; Yin et al. 2005). In this study we recorded from both medial and lateral areas and found many active neurons, although we used a Pavlovian paradigm in which the animals were not required to perform any action. Our data suggest that the dorsal striatum is involved not only in goal-directed or habitual behavior but also in more general associative learning including probabilistic Pavlovian conditioning.

Acknowledgements

Grants

This study was funded by Grants-in-Aid for Scientific Research (KAKENHI) #24223004, #24243067 and #19673002 to K.T. K.O. was supported by JSPS as a Research Fellow and was funded by KAKENHI #24-8027. P.N.T. was supported by the Swiss NSF (PP00P1_128574 and PP00P1_150739).

Disclosures

No conflicts of interest, financial or otherwise, are declared by the authors.

References

Ainslie G. Specious reward: a behavioral theory of impulsiveness and impulse control. Psychol Bull 82: 463–496, 1975.
Anden NE, Carlsson A, Dahlstroem A, Fuxe K, Hillarp NA, Larsson K. Demonstration and mapping out of nigro-neostriatal dopamine neurons. Life Sci 3: 523–530, 1964.
Apicella P, Scarnati E, Ljungberg T, Schultz W. Neuronal activity in monkey striatum related to the expectation of predictable environmental events.
J Neurophysiol 68: 945–960, 1992.
Apicella P. Leading tonically active neurons of the striatum from reward detection to context recognition. Trends Neurosci 30: 299–306, 2007.
Asaad WF, Eskandar EN. Encoding of both positive and negative reward prediction errors by neurons of the primate lateral prefrontal cortex and caudate nucleus. J Neurosci 31: 17772–17787, 2011.
Barnes TD, Kubota Y, Hu D, Jin DZ, Graybiel AM. Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature 437: 1158–1161, 2005.
Bordi F, LeDoux J. Sensory tuning beyond the sensory system: An initial analysis of auditory response properties of neurons in the lateral amygdaloid nucleus and overlying areas of the striatum. J Neurosci 12: 2493–2503, 1992.
Bolam JP, Hanley JJ, Booth PA, Bevan MD. Synaptic organization of the basal ganglia. J Anat 196: 527–542, 2000.
Burke CJ, Tobler PN. Coding of reward probability and risk by single neurons in animals. Front Neurosci 5: 121, 2011.
Cai X, Kim S, Lee D. Heterogeneous coding of temporally discounted values in the dorsal and ventral striatum during intertemporal choice. Neuron 69: 170–182, 2011.
Canales JJ, Capper-Loup C, Hu D, Choe ES, Upadhyay U, Graybiel AM. Shifts in striatal responsivity evoked by chronic stimulation of dopamine and glutamate systems. Brain 125: 2353–2363, 2002.
Cromwell HC, Schultz W. Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J Neurophysiol 89: 2823–2838, 2003.
Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299: 1898–1902, 2003.
Fiorillo CD, Newsome WT, Schultz W. The temporal precision of reward prediction in dopamine neurons. Nat Neurosci 11: 966–973, 2008.
Fiorillo CD, Yun SR, Song MR. Diversity and homogeneity in responses of midbrain dopamine neurons.
J Neurosci 33: 4693–4709, 2013.
Graybiel AM. The basal ganglia. Curr Biol 10: R509–R511, 2000.
Hassani OK, Cromwell HC, Schultz W. Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J Neurophysiol 85: 2477–2489, 2001.
Hikida T, Kimura K, Wada N, Funabiki K, Nakanishi S. Distinct roles of synaptic transmission in direct and indirect striatal pathways to reward and aversive behavior. Neuron 66: 896–907, 2010.
Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons I. Activities related to saccadic eye movements. J Neurophysiol 61: 780–798, 1989a.
Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons II. Visual and auditory responses. J Neurophysiol 61: 799–813, 1989b.
Hollerman JR, Tremblay L, Schultz W. Influence of reward expectation on behavior-related neuronal activity in primate striatum. J Neurophysiol 80: 947–963, 1998.
Jog MS, Kubota Y, Connolly CI, Hillegaart V, Graybiel AM. Building neural representations of habits. Science 286: 1745–1749, 1999.
Kahnt T, Park SQ, Haynes JD, Tobler PN. Disentangling neural representations of value and salience in the human brain. Proc Natl Acad Sci 111: 5000–5005, 2014.
Kawagoe R, Takikawa Y, Hikosaka O. Expectation of reward modulates cognitive signals in the basal ganglia. Nat Neurosci 1: 411–416, 1998.
Kepecs A, Uchida N, Zariwala HA, Mainen ZF. Neural correlates, computation and behavioral impact of decision confidence. Nature 455: 227–231, 2008.
Kimura M. Behaviorally contingent property of movement-related activity of the primate putamen. J Neurophysiol 63: 1277–1296, 1990.
Kravitz AV, Freeze BS, Parker PRL, Kay K, Thwin MT, Deisseroth K, Kreitzer AC. Regulation of parkinsonian motor behaviors by optogenetic control of basal ganglia circuitry. Nature 466: 622–626, 2010.
Kravitz AV, Tye LD, Kreitzer AC.
Distinct roles for direct and indirect pathway striatal neurons in reinforcement. Nat Neurosci 15: 816–818, 2012.
Lau B, Glimcher PW. Value representations in the primate striatum during matching behavior. Neuron 58: 451–463, 2008.
Matsumoto M, Hikosaka O. Two types of dopamine neurons distinctly convey positive and negative motivational signals. Nature 459: 837–841, 2009.
Nakamura K, Santos G, Matsuzaki R, Nakahara H. Differential reward coding in the subdivisions of the primate caudate during an oculomotor task. J Neurosci 32: 15963–15982, 2012.
Ogawa M, van der Meer MAA, Esber GR, Cerri DH, Stalnaker TA, Schoenbaum G. Risk-responsive orbitofrontal neurons track acquired salience. Neuron 77: 251–258, 2013.
Oorschot DE. Total number of neurons in the neostriatal, pallidal, subthalamic, and substantia nigral nuclei of the rat basal ganglia: a stereological study using the Cavalieri and optical disector methods. J Comp Neurol 366: 580–599, 1996.
Oyama K, Hernádi I, Iijima T, Tsutsui K. Reward prediction error coding in dorsal striatal neurons. J Neurosci 30: 11447–11457, 2010.
Oyama K, Ohara S, Sato S, Karube F, Fujiyama F, Isomura Y, Mushiake H, Iijima T, Tsutsui KI. Long-lasting single-neuron labeling by in vivo electroporation without microscopic guidance. J Neurosci Methods 218: 139–147, 2013.
Packard MG, Knowlton BJ. Learning and memory functions of the basal ganglia. Annu Rev Neurosci 25: 563–593, 2002.
Paxinos G, Watson C. The Rat Brain in Stereotaxic Coordinates. San Diego, CA: Academic Press, 2005.
Reynolds JN, Hyland BI, Wickens JR. A cellular mechanism of reward-related learning. Nature 413: 67–70, 2001.
Rolls ET, Thorpe SJ, Maddison SP. Responses of striatal neurons in the behaving monkey. 1. Head of the caudate nucleus. Behav Brain Res 7: 179–210, 1983.
Sally SL, Kelly JB.
Organization of auditory cortex in the albino rat: sound frequency. J Neurophysiol 59: 1627–1638, 1988.
Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific reward values in the striatum. Science 310: 1337–1340, 2005.
Schultz W. Predictive reward signal of dopamine neurons. J Neurophysiol 80: 1–27, 1998.
Sutter ML, Schreiner CE. Physiology and topography of neurons with multipeaked tuning curves in cat primary auditory cortex. J Neurophysiol 65: 1207–1226, 1991.
Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 1998.
Tobler PN, Christopoulos GI, O’Doherty JP, Dolan RJ, Schultz W. Neuronal distortions of reward probability without choice. J Neurosci 28: 11703–11711, 2008.
Wickens JR, Begg AJ, Arbuthnott GW. Dopamine reverses the depression of rat corticostriatal synapses which normally follows high-frequency stimulation of cortex in vitro. Neuroscience 70: 1–5, 1996.
Yin HH, Knowlton BJ, Balleine BW. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 19: 181–189, 2004.
Yin HH, Ostlund SB, Knowlton BJ, Balleine BW. The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci 22: 513–523, 2005.

Figure Legends

Fig. 1. Outline of the behavioral paradigm. A: The apparatus. The rats performed a probabilistic Pavlovian conditioning task with the head stabilized by a head-fixation device and with body movement restricted by an acrylic half-cylinder. The auditory stimuli used in this study were generated by a personal computer and presented from two loudspeakers 30 cm from the head of the rat. A sucrose solution was given through a spout in front of the rat’s mouth, and an infrared sensor system was used to detect spout-licking movements. B: Time sequence of task events in a trial. Lt., left; Rt., right.

Fig. 2.
Temporal response profiles of all 194 neurons whose activity was recorded during the probabilistic Pavlovian conditioning task. The upper left panel shows the activity in rewarded trials, and the lower left panel shows the activity in unrewarded trials of the 50% condition. The right panel shows the activity around the time of delivery of the unpredicted reward. Each row represents peak-normalized and baseline-subtracted activity for a single neuron, and the data are sorted from top to bottom by peak response time. For neurons that showed responses to both the CS/delay and the reward, the peak response time of their CS/delay response was used to align them. The horizontal bars above the histograms and white dashed lines indicate the durations and times of CS presentation and reward delivery. We used the moving-window method to calculate the peak response time of each neuron. A 50-ms window was moved in 10-ms steps from the onset of the CS to the end of the delay period, and the peak response time was determined as the center of the 50-ms window showing maximum bin height.

Fig. 3. Activity of a representative CS phasic neuron. Rasters and histograms (bin width = 50 ms) showing the activity recorded in different reward probability conditions. Rasters and histograms are aligned to the CS onset. The horizontal bars below the histograms indicate the durations of CS presentation and reward delivery.

Fig. 4. Population activity of CS phasic neurons that showed CS and reward responses positively and negatively related to the reward probability (N = 26). Peak-normalized and baseline activity-subtracted population histograms (bin width = 50 ms) are shown for rewarded trials (upper left), for unrewarded trials (lower left), and for unpredicted reward (upper right).
Lines of different colors represent the neuronal activity recorded in different reward probability conditions (red = 100%, orange = 75%, purple = 50%, green = 25%, light blue = 0%) and during delivery of unpredicted rewards (blue). Each bin was smoothed by a moving average of three bins.

Fig. 5. Distributions of the standardized beta coefficients of CS phasic neurons obtained from the multiple linear regression analysis. A: Distribution of the standardized beta coefficients for reward probability. B: Distribution of the standardized beta coefficients for uncertainty. The time windows used for the analyses are the first half of the CS presentation (upper left), the second half of the CS presentation (upper right), 1000 ms before reward delivery (lower left), and 500 ms after reward delivery (lower right). Asterisks at the upper right and upper left of a graph indicate that the distribution showed a positive or negative deviation from zero, respectively (p < 0.05, Wilcoxon signed-rank test).

Fig. 6. Activity of a representative CS tonic neuron. Conventions are the same as in Fig. 3.

Fig. 7. Population activity of CS tonic neurons that showed a CS response positively related to reward probability (N = 13). Conventions are the same as in Fig. 4.

Fig. 8. Distributions of the standardized beta coefficients of CS tonic neurons. Conventions are the same as in Fig. 5.

Fig. 9. Activity of two representative US build-up neurons, one with a reward response (A) and the other without a reward response (B). Conventions are the same as in Fig. 3.

Fig. 10. Population activity of US build-up neurons. A: Peak-normalized and baseline activity-subtracted population histograms of the US build-up neurons that showed pre-reward activity positively related to reward probability with a reward response (N = 31).
B: Peak-normalized and baseline activity-subtracted population histograms of the US build-up neurons that showed pre-reward activity positively related to reward probability without a reward response (N = 34). Conventions are the same as in Fig. 4.

Fig. 11. Distributions of the standardized beta coefficients of US build-up neurons. Conventions are the same as in Fig. 5.

Fig. 12. Effects of delay extension on the activity of representative neurons. A: Activity of a representative CS phasic neuron with a delay of 0.5 s (top row), 1.5 s (second row), and 3.5 s (third row). The bottom row represents the activity in a second 0.5-s delay condition after the 3.5-s delay condition. The horizontal bar below the raster indicates the duration of the CS presentation, and the arrows above each condition indicate the reward delivery times. B: Activity of a representative CS tonic neuron. C: Activity of a representative US build-up neuron. For all types of neurons, only responses in the 100% condition are shown for simplicity.

Fig. 13. Effects of delay extension on the onset and peak latency (A) and the magnitude of event-related activity (B) in each type of neuron. A: Onset and peak latencies of CS phasic neurons relative to the CS onset (upper left) and to the reward onset (upper right), of CS tonic neurons relative to the CS onset (lower left), and of US build-up neurons relative to the reward onset (lower right). Filled circles and open triangles represent the onset and peak latency of each neuron, respectively. B: Magnitude of the CS response (upper left) and reward response (upper right) of CS phasic neurons, the CS response of CS tonic neurons (lower left), and the pre-reward activity of US build-up neurons (lower right). Filled circles, filled squares, and open triangles represent the activity in the 100%, 50%, and 0% conditions, respectively.
The time window for determining the peak normalized response was 750 ms from CS onset for the CS response of CS phasic neurons, 500 ms from reward onset for the reward response of CS phasic neurons, 750 ms before CS offset for CS tonic neurons, and 1000 ms before reward onset for US build-up neurons.

Fig. 14. Recording site for each neuron type. Numbers at the bottom indicate the anteroposterior coordinates (in mm) from bregma. Coordinates were taken from the stereotaxic atlas of Paxinos & Watson (2005). Filled circles represent the recording locations of neurons that showed probability-dependent CS response or pre-reward activity, and open circles represent the recording locations of neurons that showed non-probability-dependent CS response or pre-reward activity.

[Figures 1–14: images not reproduced; panel contents are described in the Figure Legends.]

Table 1. Summary of the relationship between the activity of each type of neuron and reward probability or uncertainty.

Type                  Window                       Probability             Uncertainty
                                                   Positive    Negative    Positive    Negative
CS phasic (N = 76)    CS 1st half (0-750)          40 (53*)    2 (3)       0 (0)       2 (3)
                      CS 2nd half (750-1500)       16 (21*)    8 (11*)     5 (7*)      2 (3)
                      Before reward (1000-2000)    17 (22*)    5 (7*)      6 (8*)      2 (3)
                      After reward (2000-2500)     1 (1)       38 (50*)    2 (3)       6 (8*)
CS tonic (N = 23)     CS 1st half (0-750)          12 (52*)    0 (0)       1 (4)       3 (13*)
                      CS 2nd half (750-1500)       13 (57*)    1 (4)       2 (9*)      1 (4)
                      Before reward (1000-2000)    12 (52*)    1 (4)       2 (9*)      2 (9*)
                      After reward (2000-2500)     0 (0)       2 (9*)      0 (0)       0 (0)
US build-up (N = 95)  CS 1st half (0-750)          26 (27*)    3 (3)       5 (5)       5 (5)
                      CS 2nd half (750-1500)       46 (48*)    1 (1)       5 (5)       3 (3)
                      Before reward (1000-2000)    65 (68*)    1 (1)       10 (11*)    3 (3)
                      After reward (2000-2500)     28 (29*)    10 (11*)    7 (7*)      7 (7*)

Numbers in parentheses after the name of each time window show the time from the CS onset (ms). Numbers in parentheses in each box show percentages of the neurons. Asterisks indicate that the proportion is greater than chance level (permutation test, p < 0.05).

Table 2. Summary of the relationship between the activity of each type of neuron and delay length or licking movement after reward delivery.
Type                         Delay length only    Licking only    Both
CS phasic (CS) (N = 14)      6                    1               1
CS phasic (Reward) (N = 14)  6                    1               0
CS tonic (N = 6)             1                    1               3
US build-up (N = 19)         4                    6               4

Note that reward responses of CS phasic neurons increased with delay extension or decrease of licking movement, while pre-reward activity of all types of neurons decreased with delay extension or decrease of licking movement.

Table 3. Summary of the relationship between the activity of each type of neuron and reward probability or licking movement.

A, Recorded without probability-dependent licking movement

Type                  Probability only    Licking only    Both
CS phasic (N = 60)    27                  3               5
CS tonic (N = 17)     10                  0               1
US build-up (N = 69)  29                  6               19

B, Recorded with probability-dependent licking movement

Type                  Probability only    Licking only    Both
CS phasic (N = 16)    8                   0               2 (2)
CS tonic (N = 6)      3                   2               0 (0)
US build-up (N = 26)  14                  1               4 (4)

Numbers in parentheses show the number of neurons that had a higher R-squared value for the model including licking movement as a regressor.
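
The moving-window procedure described in the legend of Fig. 2 can be illustrated with a short Python sketch. It is not the authors' analysis code: the function name `peak_response_time`, its parameters, the pooling of spike times across trials, and the earliest-window tie-break are all assumptions made for illustration. The 0 to 2000 ms search range in the usage example assumes that the delay period ends at reward delivery, 2000 ms after CS onset, as the time windows in Table 1 suggest.

```python
def peak_response_time(spike_times_ms, start_ms, end_ms,
                       window_ms=50, step_ms=10):
    """Return the center (ms) of the sliding window containing the most spikes.

    spike_times_ms: spike times in ms relative to CS onset (hypothetical
        input format; here assumed to be pooled across trials).
    The window slides from start_ms in step_ms increments until its right
    edge reaches end_ms. Ties go to the earliest window (an assumption;
    the legend does not specify a tie-break).
    """
    best_count = -1
    best_center = None
    left = start_ms
    while left + window_ms <= end_ms:
        # Count spikes inside the half-open window [left, left + window_ms).
        count = sum(1 for t in spike_times_ms if left <= t < left + window_ms)
        if count > best_count:
            best_count = count
            best_center = left + window_ms / 2
        left += step_ms
    return best_center


# Usage with made-up spike times clustered around 400-440 ms after CS onset.
spikes = [100, 405, 410, 420, 430, 440, 900]
print(peak_response_time(spikes, 0, 2000))  # 425.0
```

Because the window width and step are divisors of the search range here, every candidate window lies fully inside the CS and delay periods; a different edge treatment would need to be stated explicitly.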
