The role of the basal ganglia in reinforcement learning

Thesis submitted for the degree of “Doctor of Philosophy”
by Mati Joshua
Submitted to the Senate of the Hebrew University of Jerusalem, March 2009
This work was carried out under the supervision of Prof. Hagai Bergman

Table of Contents

Abstract
Introduction
  I. Formalism of the reinforcement learning problem
  II. Basal ganglia anatomy
  III. The basal ganglia as a reinforcement learning agent
  IV. The research goals and thesis outline
Methods
Results
  I. Value encoding by basal ganglia neuromodulators
  II. Value encoding by basal ganglia high frequency GABAergic neurons
  III. Value encoding by basal ganglia low frequency GABAergic neurons
  IV. Value encoding by correlated activity of the basal ganglia
  V. Quantifying quality of extracellular recording
Discussion
  I. Asymmetry in the encoding of values in the basal ganglia
  II. Encoding of dopaminergic neurons
  III. Comparing basal ganglia subpopulations
  IV. The basal ganglia in control of motor behavior
Bibliography
Appendix
  An algorithm for detection of eye state

Abstract

The basal ganglia are neural structures within the motor, cognitive and limbic control circuits of the mammalian forebrain. Recent experimental and theoretical studies depict the basal ganglia as a reinforcement learning system. This model suggests that basal ganglia activity enables maximization of future reward by controlling the environment. Research has indicated that midbrain dopaminergic neurons respond with an increase in their firing rate when the situation is better than expected (positive surprise). This signal is in accordance with a reinforcement error signal. However, the low tonic discharge rate of the dopaminergic neurons suggests that their capability to encode negative events by suppressing firing rate is limited. This limitation of the dopaminergic signal suggests two possibilities. The first is that activity in the basal ganglia encodes both positive and negative values. The second is that activity in the basal ganglia encodes only positive values and negative values are encoded by other neural structures.

To dissociate these possibilities, I trained two monkeys on a probabilistic conditioning task with food, neutral and airpuff outcomes. I recorded the activity of single neurons in six distinct areas of the basal ganglia of awake behaving monkeys, both from the basal ganglia neuromodulators (midbrain dopaminergic neurons and cholinergic interneurons of the striatum, TANs) and from the GABAergic neurons of the main axis of the basal ganglia (medium spiny neurons, external and internal segments of the globus pallidus and substantia nigra reticulata). The licking and blinking behavior during cue presentation indicated that the monkeys expected the different probabilistic appetitive, neutral and aversive outcomes. Nevertheless, the activity of all five basal ganglia nuclei following the cues was strongly modulated by expectation of reward but not by expectation of the aversive event.
Furthermore, this neural activity better reflected the probability of future reward than the probability of future aversive outcome. A comparison of the response properties of the modulators and the GABAergic neurons showed that the modulators had phasic and homogeneous responses, whereas the responses of the GABAergic neurons were sustained and diverse, including coincident increases and decreases of discharge rate.

Analysis of the correlation between cells revealed that the synchronization between dopaminergic neurons transiently increased following rewarding but not aversive events. The dynamics of the increase in synchronization did not mirror the dynamics of rate modulations. A simulation suggests that the changes in dopaminergic synchronization could provide an additional mechanism for controlling dopamine concentrations in the striatum, beyond firing rate and pattern.

Thus, the difference between the response properties of the basal ganglia subsystems suggests distinct functions for these populations, in which the modulators provide a scalar signal to the main axis of the basal ganglia network. The neural-behavioral asymmetry shows that aversive events and rewards are represented in segregated neuronal systems. This might be the physiological basis for aversive-appetitive asymmetric human behavior.

Introduction

In an attempt to understand neuronal information processing, David Marr (1) identified three levels of analysis: the problems which must be overcome (computational level), the strategy that can be used (algorithmic level) and how it actually occurs in neural activity (implementation level). In their influential book, Sutton and Barto deal with the first two levels for the reinforcement learning problem (2). They state: "Reinforcement learning (RL) is learning what to do … how to map a situation to actions … so as to maximize a numerical reward signal". They investigated the different classes of algorithms that can solve the RL problem.
In the last ten years neuroscientists have started dealing with the RL problem at the third level, that of neural implementation. None of these levels stands alone; it is the interactions between disciplines and questions that may lead to a better grasp of the nature of RL. In the following sections of the introduction I present a formalization of the reinforcement learning problem and its solutions. I then review the main anatomical components of the basal ganglia and discuss the physiological evidence that connects RL theory and basal ganglia activity. Finally I outline the major goals of this research, which is aimed at reducing the gap between computational theoretical RL models and current knowledge of basal ganglia activity.

Formalism of the reinforcement learning problem

A reinforcement learning system can be divided into four sub-elements:

Policy – the policy defines the agent's actions in the environment. Given a state of the environment, a policy is a mapping from this state to an action.
Reward function – after executing an action in a given state, the agent receives a single scalar reward. This reward defines the goal of the learning agent; i.e., to maximize this value in the long run.
Value function – given a policy, the value function of a state is the total amount of reward an agent can expect to receive in the future.
Environment model – given a state and an action, the environment model provides the statistics of the next state and the expected reward.

Mathematically, let $S$ be the set of states and $A$ be the set of all actions; then a policy is a conditional probability function

$$\Pi(a \in A \mid s \in S)$$

that gives the probability of taking an action at a given state. The environment model contains a probabilistic function

$$P(a \in A,\ s_t \in S,\ s_{t+1} \in S)$$

that specifies the transition probability of moving from state $s_t$ to $s_{t+1}$ by taking action $a$.
The reward function is a probability function

$$R(r \in \mathbb{R} \mid a \in A,\ s_t \in S,\ s_{t+1} \in S)$$

that gives the probability of receiving a reward $r$ given an action that was taken in state $s_t$ and has led to state $s_{t+1}$. The value function $V^{\Pi}(s \in S)$ gives the expected future reward given a policy. Rewards in the distant future may be worth less than near-future rewards. One of the ways to model this "present preference" is by defining the value function as a discounted sum of the future reward:

$$V^{\Pi}(s_t) = E\left[\sum_{n=0}^{\infty} \gamma^{n} \cdot r(t+n)\right],$$

where $s_t$ is the state at time $t$, $r(t+n)$ is the reward $n$ time steps after time $t$, and $\gamma$ is a discount parameter between 0 and 1. The computational problem of RL is to follow a policy that maximizes the future reward; i.e., to search for a policy $\Pi^{*}$ that maximizes the value function $V(s)$ (it can be proven that such a policy exists).

Solution of the RL problem - algorithms

Finding the optimal policy $\Pi^{*}$ relies on two processes. The first is evaluating the quality of a policy $\Pi$; i.e., calculating the value function of all states given a policy, $V^{\Pi}(s)$. The second is improving a policy; i.e., finding a policy $\Pi'$ such that $V^{\Pi'}(s) \geq V^{\Pi}(s)$. These two processes have been combined in many ways in different situations. When a full description of the environment has been established, one can use dynamic programming. This method uses the statistics of the environment and combines value iterations to evaluate the value function and policy iteration to improve the policy. However, this approach cannot be implemented in the most common case, where a full and reliable model of the environment cannot be found. Even when such a model exists, it may be impractical to use it. The state of the art algorithm for solving the RL problem with no prior knowledge of the environment statistics is TD(λ).
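As a rough illustration of the two ideas in this section, the sketch below computes the discounted sum that defines the value function and compares it with the estimate of a tabular TD(λ) learner. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the three-state chain (cue, delay, outcome), the unit reward, the parameter values, and the choice of accumulating eligibility traces as the TD(λ) variant. The prediction error used inside the loop is the δ that the text introduces next.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence: sum of gamma**n * r(t+n)."""
    return sum((gamma ** n) * r for n, r in enumerate(rewards))


def td_lambda(episodes, gamma, lam, alpha):
    """Tabular TD(lambda) with accumulating eligibility traces.

    Hypothetical episode: cue -> delay -> outcome (terminal), with a
    reward of 1 delivered on the delay -> outcome transition.
    """
    states = ["cue", "delay", "outcome"]
    transitions = [("cue", "delay", 0.0), ("delay", "outcome", 1.0)]
    V = {s: 0.0 for s in states}  # the terminal state is never updated
    for _ in range(episodes):
        trace = {s: 0.0 for s in states}  # traces are reset every episode
        for s, s_next, r in transitions:
            delta = r + gamma * V[s_next] - V[s]  # prediction error
            trace[s] += 1.0                       # mark the visited state
            for x in states:
                V[x] += alpha * delta * trace[x]  # credit recently visited states
                trace[x] *= gamma * lam           # decay their eligibility
    return V


gamma = 0.9
V = td_lambda(episodes=500, gamma=gamma, lam=0.5, alpha=0.1)
target_cue = discounted_return([0.0, 1.0], gamma)   # rewards that follow the cue
target_delay = discounted_return([1.0], gamma)      # rewards that follow the delay
print(V["cue"], target_cue)
print(V["delay"], target_delay)
```

Because this toy environment is deterministic, the learned values should approach the discounted sums closely; in a probabilistic task the estimates would instead fluctuate around the expected values.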
At each step the algorithm improves the estimation of the value function by generating a prediction error $\delta_t$:

$$\delta_t = r_{t+1} + \gamma \cdot V(s_{t+1}) - V(s_t),$$

where $r_{t+1}$ is the reward given at time $t+1$, $s_t$ and $s_{t+1}$ are the states of the agent at times $t$ and $t+1$ respectively, $V$ is the estimation of the value function at time $t$, and $\gamma$ is the discounting factor defined above. The error is then used to update the estimation of the policy and value function. The update is also applied to states that were visited in the past, and the parameter λ sets the amount of credit each step in the past should receive. When λ = 0, the algorithm is the one-step TD algorithm, TD(0); in this case only the value function of the last state is updated. When λ = 1, the algorithm is the Monte Carlo algorithm; in this case there is no decay in the update rate of past states.

A very important subclass of methods that uses the δ error for solving the RL problem is known as actor-critic methods. These methods use separate memories to represent the policy and the value function. The actor stores the policy, and the critic stores the value and generates the TD error when there is a mismatch between predictions and actual outcomes. The error is then used to update the policy and the value function. This subclass is important because of its similarity to the biological structure of basal ganglia networks.

Basal ganglia anatomy

The basal ganglia are neural structures within the motor, cognitive and limbic control circuits in the mammalian forebrain. The neural network of the basal ganglia is commonly viewed as two functionally related subsystems, the main axis and the neuromodulators (3-5). The main axis subsystem includes connections between all neocortical areas, the amygdala and the hippocampus and the basal ganglia input structures; i.e., the striatum (caudate, putamen and ventral striatum) and the subthalamic nucleus.
These project both directly and indirectly through the external segment of the globus pallidus (GPe) to the basal ganglia output structures: the internal segment of the globus pallidus (GPi) and the substantia nigra reticulata (SNr). The GPi and SNr modify behavior through their projections to the frontal cortex (via the thalamus) and brain stem premotor nuclei (5-7).

The major population of neurons in the striatum is made up of the medium spiny neurons (MSN). These GABAergic neurons, which constitute >90% of the striatal cells, receive their major excitatory input from the cortex and the thalamus and project to both segments of the globus pallidus and the SNr. In addition, their axons give rise to a local collateral arborization, which contacts other spiny neurons (8). Other striatal neurons are the small GABAergic interneurons (1% of the population) and the large cholinergic interneurons (2%). These cholinergic neurons are thought to correspond to the physiologically defined (by extracellular recording) tonically active neurons (TANs) (9, 10). Other types of striatal interneurons have also been observed (11).

In the classic view of the basal ganglia (7, 12), transmission of information within the basal ganglia occurs both directly from the striatum to the GPi/SNr and indirectly through the GPe and the subthalamic nucleus (STN). The striatal origins of the direct and indirect pathways are oppositely affected by D1 and D2 dopamine receptors (13-15). Recently, single axon tracing anatomical studies have revealed an even more complex map of basal ganglia connectivity. Striatal neurons projecting to the GPi and SNr send collaterals to the GPe (16, 17). The physiological evidence for the importance of direct projections from the motor cortex to the STN (the ‘hyper-direct pathway’) indicates that, like the striatum, the STN is an input stage of the basal ganglia (18, 19).
In addition, the recently described feedback projections from the GPe to the striatum (8, 20) demonstrate the additional complexity of the network compared to the classical view.

The basal ganglia neuromodulators adjust activity along the main axis by regulation of plasticity at the corticostriatal synapses (21, 22). The primary basal ganglia neuromodulators are dopamine (from midbrain dopaminergic neurons, 23) and acetylcholine (from striatal cholinergic interneurons, TANs, 22). Parkinson's disease, in which the dopaminergic system is the most seriously damaged but the noradrenergic, serotonergic and cholinergic systems are also affected (24), demonstrates the importance of neuromodulator input to the basal ganglia main axis.

The basal ganglia as a reinforcement learning agent

The pioneering studies of Schultz et al. (25) showed that dopaminergic neurons increase their discharge rate when conditions are better than expected. These studies (26) indicated that a dopaminergic cell that initially responds to delivery of food shifts its response to an external cue that predicts the delivery of food and stops responding to food delivery. Based on these results it was suggested that dopaminergic neurons encode the temporal difference prediction error (25). Other studies extended these groundbreaking findings and showed that the dopaminergic signal resembles the TD error signal (26-37). It has been shown at the cellular level that dopamine contributes to plasticity in the striatum (14, 21, 38). Based on the response properties of dopamine neurons and the plasticity effects of dopamine on the striatal neurons, reinforcement learning models of the basal ganglia assume that the teaching message is transmitted to striatal territories and reshapes the behavioral policy. Reinforcement learning models have influenced basal ganglia research for the last decade, yet there are still many fundamental questions which have not been addressed.
One of the major issues that still needs to be investigated in detail is: what are the other neural correlates for reinforcement learning besides the dopaminergic activity? An important neuromodulator subpopulation is the cholinergic neurons of the striatum. Consistent with the classical concept of dopamine-acetylcholine balance (39), the dopaminergic neurons and the TANs have opposite responses. Dopaminergic neurons typically increase their discharge rate in response to appetitive predictive cues and outcomes, whereas TANs suppress their tonic discharge (40). Thus, it has been suggested that some of the dopamine influence on striatal projection neurons is mediated through inhibition of the TANs (41). The typical TAN response has led to the conclusion that they may not encode the prediction error themselves but may condition the dopaminergic signal (28).

Another fundamental issue is the neural correlates of RL in main axis activity. Reward modulation of the main axis has mainly been studied at the level of the striatum (42-45). Several studies have revealed discharge modulation of pallidal and SNr neurons by reward (46-49) and even by the probability of future reward (50). Nevertheless, unlike the dopaminergic studies, these studies did not find a simple and coherent relation with RL models.

The lack of a negative teacher

Most studies of dopaminergic neurons have focused on the mismatch in the positive domain of reinforcement; i.e., when conditions are better than expected (25). Dopaminergic neurons typically increase their discharge rate in response to appetitive predictive cues and outcomes. In line with the predictions of reinforcement learning theories, dopaminergic neuron discharge decreases with omission of predicted rewards (29, 51, 52). However, this discharge suppression is limited since the neuronal firing rate is truncated at zero.
In fact, several groups (27, 28) have reported that the instantaneous firing of dopaminergic neurons does not demonstrate incremental encoding of reward omission, and it was suggested that omission is encoded by the duration of the discharge decrease (53).

There are even fewer studies and less agreement on basal ganglia responses to aversive events. There is no consensus regarding the responses of dopaminergic neurons to aversive events. Some studies suggest that at least some of the dopaminergic neurons increase their firing rate following an aversive outcome (see 54 for review). Other classical and instrumental conditioning studies suggest that some dopamine neurons increase their firing rate following a cue that predicts aversive outcomes (55, 56). Studies on anesthetized rats have shown that dopaminergic neurons mainly decrease their discharge rate following an aversive stimulus (57, 58), but a recent study by this group showed that the decrease is limited to VTA dopaminergic neurons (59). There are reports that TAN activity differentiates appetitive and aversive stimuli (60, 61), but it remains unclear whether and how TANs respond to expectation of aversion. There are no studies on the responses of the primate basal ganglia main axis high frequency neurons to expectation of deterministic or probabilistic aversive events.

The research goals and thesis outline

The main goal of my research was to compare aversive and reward related activity in the basal ganglia. The main research question was whether activity in the basal ganglia encodes positive and/or negative values. Another goal was to test whether the anatomical division of the basal ganglia systems into neuromodulators and main axis is also reflected in the activity of these populations. Furthermore, does this division also reflect functional differences between these subpopulations?
Specifically, does neuromodulator activity resemble the activity expected from an RL teacher (e.g., a critic), and is activity in the main axis consistent with it being the executor of the system (e.g., the actor)?

To address these questions I trained two monkeys on a probabilistic conditioning task with food, neutral and airpuff outcomes and recorded single cell activity in the basal ganglia.

The first chapter describing my work was published in the Journal of Neuroscience. In this paper I analyzed the activity of midbrain dopaminergic neurons and striatal cholinergic interneurons (neuromodulators). I found that both dopaminergic and cholinergic neurons were more strongly modulated by reward than by aversive related events and better reflected the probability of reward than aversive outcome. I also found that these populations encode the difference between reward and aversive events at different epochs of probabilistic classical conditioning trials.

The second chapter was published in the Journal of Neurophysiology. In this paper I analyzed the activity of main axis neurons. Like neuromodulators, the cells in the GPe, GPi and SNr also showed preferential activation to reward. I compared these populations and found differences between the output structures of the basal ganglia.

The third chapter of the results analyzes the activity of the two GABAergic subpopulations of the basal ganglia: the low frequency discharge neurons of the globus pallidus and the phasically active neurons of the striatum. I found that although these populations have different physiological properties (low vs. high frequency of the other GABAergic populations), the low frequency discharge neurons show asymmetry in value encoding.

The fourth chapter was published in Neuron. In this paper I conducted a correlation analysis and found that responses of the neuromodulators were homogeneous whereas the main axis responses were diverse.
In addition, I found that pairs of neuromodulator cells dynamically modulate their correlation. These changes in correlations may provide an additional mechanism for controlling neuromodulator concentrations in the striatum, beyond firing rate and pattern.

The fifth chapter was published in the Journal of Neuroscience Methods. In this paper I describe methods which I developed to quantify the quality of the isolation of extracellular recordings. These methods were used in the first four chapters of the thesis.

Methods

A full description of the methods which were used in this research can be found in the methods sections of the articles. In this chapter I briefly summarize the behavioral task and the recording technique.

Behavioral task, recording and data acquisition

Two monkeys (L and S, Macaca fascicularis, female 4 kg and male 5 kg) were engaged in a probabilistic delay classical-conditioning task. Seven different fractal cues, filling the entire screen, were introduced to the monkey, each predicting the outcome in a probabilistic manner. Three cues (reward cues) predicted a liquid food outcome with a delivery probability of 1/3, 2/3 and 1. Three other cues (aversive cues) predicted an airpuff outcome with a delivery probability of 1/3, 2/3 and 1. The 7th cue (the neutral cue) was never followed by a food or airpuff outcome. Cues were presented for two seconds and were immediately followed by a result epoch which could include an outcome (food, airpuff) or no outcome according to the probabilities associated with the cue. All trials were followed by a variable inter-trial interval.

Following the training period (L: 6, S: 2 months), I recorded the behavior and the basal ganglia neural activity while the monkeys were engaged in the behavioral task. Both monkeys had reached a steady state in their behavior before recording; monkey L was trained for a longer period since during training I was preparing the data acquisition setup for recording.
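The cue-outcome contingencies described above can be written down as a small lookup table and sampled. The sketch below is only an illustration of that structure: the cue labels, the function names and the use of a seeded pseudo-random generator are assumptions, and details the text does not give (trial order, inter-trial interval distribution) are left out.

```python
import random

# Cue table from the task description: (outcome type, delivery probability).
# The string labels are hypothetical names chosen for this sketch.
CUES = {
    "reward_1/3": ("food", 1 / 3),
    "reward_2/3": ("food", 2 / 3),
    "reward_1": ("food", 1.0),
    "aversive_1/3": ("airpuff", 1 / 3),
    "aversive_2/3": ("airpuff", 2 / 3),
    "aversive_1": ("airpuff", 1.0),
    "neutral": (None, 0.0),
}
CUE_DURATION_S = 2.0  # cues were presented for two seconds


def run_trial(cue, rng):
    """Return the outcome delivered after `cue`, or None for no outcome."""
    outcome, p = CUES[cue]
    # rng.random() lies in [0, 1), so p = 1 always delivers and p = 0 never does.
    return outcome if rng.random() < p else None


def delivery_rate(cue, n_trials, seed=0):
    """Fraction of simulated trials in which the cue was followed by an outcome."""
    rng = random.Random(seed)
    delivered = sum(run_trial(cue, rng) is not None for _ in range(n_trials))
    return delivered / n_trials


for cue in CUES:
    print(cue, delivery_rate(cue, n_trials=3000))
```

The reward and aversive cues share the same three probabilities, which is what allows a symmetric comparison of neural responses to expected food and expected airpuff.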
After the training period an MRI-compatible head holder and a recording chamber were attached to the monkey’s head. The head holder enabled the immobilization of the head during recording. The recording chamber was attached to the skull tilted 40° laterally in the coronal plane with its center targeted at the stereotaxic coordinates of the GPe (A15, L7, H1; 62, 63). In each recording session I recorded extracellular activity from 8 glass-coated tungsten microelectrodes, which were advanced separately into the targets in the basal ganglia. Spike activity was sorted and classified online using a template-matching algorithm. During recording, units were classified according to anatomical location, extracellular waveform, firing rate and pattern, background activity and, in some cases, response to free reward and to injection of dopamine agonists.

In addition to spiking data, I monitored mouth movements with an infrared reflection detector, and three computerized digital video cameras recorded the monkey's face and upper limbs at 50 Hz. Video analysis was carried out with custom software to identify periods when the monkeys closed their eyes (see appendix).

Results

Chapter details:

I. Midbrain Dopaminergic Neurons and Striatal Cholinergic Interneurons Encode the Difference between Reward and Aversive Events at Different Epochs of Probabilistic Classical Conditioning Trials.
Mati Joshua, Avital Adler, Rea Mitelman, Eilon Vaadia and Hagai Bergman. Journal of Neuroscience. 2008 28(45): 11673-11684.

II. Encoding of probabilistic rewarding and aversive events by pallidal and nigral neurons.
Mati Joshua, Avital Adler, Boris Rosin, Eilon Vaadia and Hagai Bergman. J Neurophysiol. 2009 Feb;101(2):758-72.

III. Asymmetric Encoding of Positive and Negative Expectations by Low-Frequency Discharge Basal Ganglia Neurons.
Mati Joshua, Avital Adler and Hagai Bergman.

IV. Synchronization of midbrain dopaminergic neurons is enhanced by rewarding events.
Mati Joshua, Avital Adler, Yifat Prut, Eilon Vaadia, Jeffery R. Wickens and Hagai Bergman. Neuron, 2009 June 11; 62(5): 695–704.

V. Quantifying the isolation quality of extracellularly recorded action potentials.
Mati Joshua, Shlomo Elias, Odeya Levine and Hagai Bergman. The Journal of Neuroscience Methods, 2007 Jul 30;163(2):267-82.

Results I

The Journal of Neuroscience, November 5, 2008 • 28(45):11673–11684
Behavioral/Systems/Cognitive

Midbrain Dopaminergic Neurons and Striatal Cholinergic Interneurons Encode the Difference between Reward and Aversive Events at Different Epochs of Probabilistic Classical Conditioning Trials

Mati Joshua,1,2 Avital Adler,1,2 Rea Mitelman,1,2 Eilon Vaadia,1,2 and Hagai Bergman1,2,3
1 Department of Physiology, The Hebrew University–Hadassah Medical School, Jerusalem 91120, Israel, and 2 The Interdisciplinary Center for Neural Computation and 3 Eric Roland Center for Neurodegenerative Diseases, The Hebrew University, Jerusalem 91904, Israel

Midbrain dopaminergic neurons (DANs) typically increase their discharge rate in response to appetitive predictive cues and outcomes, whereas striatal cholinergic tonically active interneurons (TANs) decrease their rate. This may indicate that the activity of TANs and DANs is negatively correlated and that TANs can broaden the basal ganglia reinforcement teaching signal, for instance by encoding worse than predicted events. We studied the activity of 106 DANs and 180 TANs of two monkeys recorded during the performance of a classical conditioning task with cues predicting the probability of food, neutral, and air puff outcomes. DANs responded to all cues with elevations of discharge rate, whereas TANs depressed their discharge rate. Nevertheless, although dopaminergic responses to appetitive cues were larger than their responses to neutral or aversive cues, the TAN responses were more similar.
Both TANs and DANs responded faster to an air puff than to a food outcome; however, DANs responded with a discharge elevation, whereas the TAN responses included major negative and positive deflections. Finally, food versus air puff omission was better encoded by TANs. In terms of the activity of single neurons with distinct responses to the different behavioral events, both DANs and TANs were more strongly modulated by reward than by aversive related events and better reflected the probability of reward than aversive outcome. Thus, TANs and DANs encode the task episodes differentially. The DANs encode mainly the cue and outcome delivery, whereas the TANs mainly encode outcome delivery and omission at termination of the behavioral trial episode. Key words: primate; basal ganglia; spike train; reinforcement; substantia nigra; striatum Introduction The neural network of the basal ganglia (Bar-Gad and Bergman, 2001; Gurney et al., 2004) is commonly viewed as two functionally related subsystems. The main axis includes fast neurotransmissions (glutamate and GABA) between the cortex, striatum, and the basal ganglia output structures. The second subsystem is composed of neuromodulators that adjust the activity along the main axis by regulation of plasticity at the corticostriatal synapse (Calabresi et al., 2000; Reynolds et al., 2001). The primary basal ganglia neuromodulators are dopamine [from midbrain dopaminergic neurons (DANs) (Arbuthnott and Wickens, 2007)] and acetylcholine [from striatal cholinergic tonically active interneurons (TANs) (Calabresi et al., 2000)]. Previous studies have shown that DANs encode the prediction Received Aug. 13, 2008; accepted Sept. 16, 2008. This work was partly supported by the “Fighting against Parkinson” grant from the Hebrew University Netherlands Association. We thank Dr. 
Bryon Gomberg for MRI; Michael Levi and Michal Rivlin for help in preparing the experimental setup; Yael Renernt and Inna Finkes for monkey training and general assistance; and Geoffrey Schoenbaum, Yavin Shaham, and Genela Morris for critical reading of early versions of this manuscript. Correspondence should be addressed to Mati Joshua, Department of Physiology, The Hebrew University–Hadassah Medical School, P.O. Box 12272, Jerusalem 91120, Israel. E-mail: [email protected]. DOI:10.1523/JNEUROSCI.3839-08.2008 Copyright © 2008 Society for Neuroscience 0270-6474/08/2811673-12$15.00/0 14 error in the positive domain; (i.e., they respond when conditions are better than expected) (Schultz et al., 1997). Consistent with the classical concept of dopamine–acetylcholine balance (Barbeau, 1962), the DANs and the TANs have opposite responses. DANs typically increase their discharge rate in response to appetitive predictive cues and outcomes, whereas TANs suppress their tonic discharge (Graybiel et al., 1994). Thus, it has been suggested that some of the dopamine influence on striatal projections neurons is mediated through inhibition of the TANs (Wang et al., 2006). In contrast to the extensive research on reward-related activity, only a few studies have explored whether basal ganglia neurons encode the negative domain (e.g., aversive outcome or omission of rewards, which might not be identically encoded by the nervous system). Dopamine neurons decrease their firing rate in response to reward omission (Schultz et al., 1997). However, this suppression is limited because firing rate is truncated at zero. Other groups (Morris et al., 2004; Bayer and Glimcher, 2005) have reported that the discharge rate of dopaminergic neurons does not demonstrate instantaneous incremental encoding of reward omission, and an alternative encoding scheme, based on response duration, has been proposed (Bayer et al., 2007). 
There are even fewer studies and less agreement on basal ganglia responses to aversive events. Classical and instrumental conditioning studies suggest that some of the dopamine neurons increase their firing rate after a cue that predicts aversive outcomes (Mirenowicz and Schultz, 1996; Guarraci and Kapp, 1999). However, that increase in firing rate may be a result of reward generalization. Studies on anesthetized rats have shown that DANs mainly decrease their discharge rate after an aversive stimulus (Ungless et al., 2004; Coizet et al., 2006). There are reports that TAN activity differentiates appetitive and aversive stimuli (Ravel et al., 2003; Yamada et al., 2004), but it remains unclear whether and how TANs respond to expectation of aversion.

Here, we designed a classical conditioning paradigm with aversive and rewarding probabilistic outcomes. Symmetric manipulations of expectation of food (rewarding event) or an air puff (aversive event) enable the comparison of neural responses to expectation of positive and negative outcomes. To provide additional controls for sensory, arousal-related, and generalization responses, our behavioral task included neutral trials, which had the same structure as the rewarding and the aversive trials but never yielded positive or negative outcomes.

Results I. Joshua et al. • Value Encoding by Basal Ganglia Critics • J. Neurosci., November 5, 2008 • 28(45):11673–11684

side were attached to the monkey’s head. The head holder enabled the immobilization of the head during recording. The recording chamber was attached to the skull tilted 40° laterally in the coronal plane with its center targeted at the stereotaxic coordinates of the GPe (A15, L7, H1) (Szabo and Cowan, 1984; Martin and Bowden, 2000). Analgesia and antibiotics were administered during surgery and continued for 2 d postoperatively. Recording began after a postoperative recovery period of 5 d.
We estimated the stereotaxic coordinates of the physiological recordings within the basal ganglia nuclei with MRI scans (see Fig. 1a). The MRI scan (General Electric 1.5 Tesla system; fast spin echo inversion recovery sequence; dual surface coil; repetition time, 3 s; echo time, 0.044 s; inversion time, 0.3 s; echo train length, 8; coronal slices, 2 mm wide) (Matsui et al., 2007) was performed with five tungsten electrodes at accurate coordinates of the chamber [Y,X = (6,0), (0,−6), (0,0), (0,6), and (−6,0) in mm from the chamber center]. We then aligned the two-dimensional MRI images with the sections of the atlas of Macaca fascicularis (Martin and Bowden, 2000). We performed an additional MRI scan at the final stage of the recording period of monkey L to verify our coordinate system. At the end of the experiment, the chamber and head holder of both monkeys were removed, the skin was sutured, and after a recovery period the monkeys were sent to a primate sanctuary (http://monkeypark.co.il). All surgical procedures were performed under aseptic conditions and general isoflurane and N2O deep anesthesia. The MRI procedure was performed under Dormitor and ketamine light anesthesia.

Recording and data acquisition. During recording sessions, the monkey’s head was immobilized and eight glass-coated tungsten microelectrodes (impedance, 0.2–0.8 MΩ at 1000 Hz), confined within a cylindrical guide (1.65 mm inner diameter), were advanced separately (EPS; Alpha Omega Engineering) into the targets in the basal ganglia. The electrical activity was amplified with a gain of 5K, bandpass filtered with a 1–6000 Hz four-pole Butterworth filter, and continuously sampled at 25 kHz by a 12 bit, ±5 V analog-to-digital (A/D) converter. Spike activity was sorted and classified on-line using a template-matching algorithm (ASD; Alpha Omega Engineering). Spike detection pulses and behavioral events were sampled at 25 kHz (AlphaLab; Alpha Omega Engineering).
Mouth movements were monitored by an infrared reflection detector (see Fig. 2a) (Dr. Bouis Devices). The infrared signal was filtered between 1 and 100 Hz by a bandpass four-pole Butterworth filter and sampled at 1.56 kHz. In addition, three computerized digital video cameras recorded the monkey’s face and upper limbs at 50 Hz. Video analysis was performed with custom in-house software to identify periods when the monkeys closed their eyes (see Fig. 2b). Briefly, monkey eye location was identified by a human observer (once for a daily recording session in which the monkey’s head was immobilized by connecting the head holder to an external metal frame), and a classification of eye states (open or closed) was made based on the number of dark pixels in the eye area. The algorithm was tested on random samples from several recording days and found to be consistent with the judgments of a human observer for >99% of the images. In representative recording sessions, we recorded the monkeys’ spontaneous vocalizations, their arm movements with an accelerometer, eye position using infrared reflection, and heartbeat by electrocardiogram (ECG) (veterinary ECG 5 leads system; Palco Laboratories).

During the acquisition of the neuronal data, two experimenters (M.J., A.A.) controlled the position (2–50 μm steps) of the eight electrodes and the on-line spike sorting (ASD; Alpha Omega). Quality of detection and spike sorting was estimated and graded on-line every 3 min. The on-line quality estimation was based on the superimposed analog traces of the recently (20–100) sorted spikes, the waveforms of events that crossed an amplitude threshold set by the experimenter above the noise level of each electrode, the cumulative distribution of the distances from the detected events to the detection template, and the stability of the discharge rate.
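The eye-state classification described above (count the dark pixels in a manually marked eye area, then decide open or closed) can be illustrated with a minimal Python sketch. The source states only that the decision was based on the number of dark pixels; the gray-level cutoff `dark_level`, the count threshold `min_dark_pixels`, the function name, and the direction of the rule (an open eye exposes the dark pupil and iris, so it yields more dark pixels) are all assumptions made for illustration and are not taken from the original software.

```python
import numpy as np

def eye_closed(frame, eye_box, dark_level=60, min_dark_pixels=40):
    """Classify one grayscale video frame as eyes closed (True) or open (False).

    frame           : 2-D array of gray levels (0 = black, 255 = white)
    eye_box         : (row0, row1, col0, col1) marked once per session by a human
    dark_level      : hypothetical gray-level cutoff for a "dark" pixel
    min_dark_pixels : hypothetical count above which the eye is called open
    """
    r0, r1, c0, c1 = eye_box
    region = frame[r0:r1, c0:c1]
    n_dark = int(np.count_nonzero(region < dark_level))
    # Assumed rule: few dark pixels means the lid covers the eye.
    return n_dark < min_dark_pixels

# Example: a uniformly bright eye area is classified as closed.
bright = np.full((60, 80), 200, dtype=np.uint8)
print(eye_closed(bright, (10, 30, 10, 40)))
```

In practice both thresholds would have to be tuned per session against a human observer, which is how the text reports validating the classifier.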
The first step in the neuronal data analysis targeted verification of the real-time isolation quality (Joshua et al., 2007) and stability of the discharge rate (Gourévitch and Eggermont, 2007). Recorded units were subjected to off-line quality analysis that included tests for rate stability, refractory period, waveform isolation, and recording time. First, firing rate as a function of time during the recording session was graphically displayed, and the largest continuous segment of stable data was selected

Materials and Methods

All experimental protocols were performed in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and with Hebrew University guidelines for the use and care of laboratory animals in research, supervised by the institutional animal care and use committee.

Behavioral task. Two monkeys (L and S; Macaca fascicularis; female, 4 kg; male, 5 kg) were engaged in a probabilistic delay classical conditioning task (see Fig. 1b). The monkeys were seated in a primate chair facing a 17 inch computer screen placed at a distance of 50 cm. Seven different fractal cues (Chaos Pro 3.2 program; www.chaospro.de), stretched across the entire screen, were introduced to the monkey, each predicting the outcome in a probabilistic manner. Three cues (reward cues) predicted a liquid food outcome (L, 0.4 ml, 100 ms duration; S, 0.6 ml, 150 ms) with a delivery probability of 1/3, 2/3, and 1. Three other cues (aversive cues) predicted an air puff outcome (L, 100 ms duration; S, 150 ms; 50–70 psi; split and directed 2 cm from each eye; Airstim; San Diego Instruments) with a delivery probability of 1/3, 2/3, and 1. The seventh cue (the neutral cue) was never followed by a food or air puff outcome. Cues were presented for 2 s and were immediately followed by a result epoch, which could include an outcome (food, air puff) or no outcome according to the probabilities associated with the cue.
The beginning of the result epoch was signaled by one of three sounds that discriminated the three possible events: a drop of food, an air puff, or no outcome (see Fig. 1b). Sounds were normalized to the same intensity and duration. These sounds were additional to the background device sounds (air puff solenoid and food pump). All trials were followed by a variable intertrial interval (ITI) (monkey S, 3–7 s; monkey L, 4–8 s). Because of the probabilistic structure of the behavioral task and to equalize the average occurrence of each outcome, the nondeterministic cues (p ≠ 1 for reward or aversive outcome) were introduced three times more often than the deterministic ones. With this occurrence ratio, all trials were randomly interleaved. During a behavioral session (usually five sessions per week), the monkeys performed 900–1300 trials/d before losing their motivation for food. Over the weekend, the monkeys were given ad libitum access to food. Water was available ad libitum during all training and recording periods. After the training period (L, 6; S, 2 months), we recorded the behavior and the basal ganglia neural activity while the monkeys were engaged in the behavioral task. The same images and sounds were used both for training and for the recording periods (L, 6; S, 5 months); however, the visual and the auditory stimuli were shuffled between monkeys.

Surgery, magnetic resonance imaging, and rehabilitation. After the training period, a magnetic resonance imaging (MRI)-compatible Cilux head holder and a square Cilux recording chamber with a 27 mm (inner)

Table 1. The neural database

Measure                        TAN, monkey L                   TAN, monkey S                   DAN, monkey L                   DAN, monkey S
No. of cells                   71                              109                             41                              65
Isolation score                0.92 ± 0.06 [0.8–0.99]          0.91 ± 0.06 [0.8–0.99]          0.76 ± 0.16 [0.5–0.99]          0.8 ± 0.13 [0.5–0.99]
Fraction ISI <2 ms             0.0002 ± 0.0003 [0–0.0016]      0.0002 ± 0.0003 [0–0.0016]      0.0008 ± 0.0009 [0–0.0039]      0.0007 ± 0.001 [0–0.005]
Discharge rate (spike/s)       6.8 ± 1.4 [3.1–9.4]             5.0 ± 1.4 [1.93–9.2]            4.5 ± 1.9 [0.69–9.4]            3.7 ± 1.6 [0.77–9.4]
Recorded time (s)              3647 ± 1891 [1210–9901]         3414 ± 1920 [1260–10,801]       2992 ± 1372 [1260–7195]         3984 ± 2037 [1260–11,161]
No. of recorded trials         309 ± 168 [111–828]             378 ± 204 [123–1221]            254 ± 118 [110–621]             393 ± 200 [126–1098]
No. of spikes/recorded cell    25,112 ± 14,270 [5026–66,410]   17,147 ± 10,948 [2442–63,364]   13,211 ± 10,817 [3952–67,895]   15,070 ± 10,327 [2324–46,617]

The recording statistics were calculated separately for each neural population. Each cell in the table contains the mean and SD and in brackets the range of the scores. The range of the isolation score is 0 to 1. "Fraction ISI <2 ms" is the fraction of interspike intervals (ISIs) <2 ms of all ISIs of a cell. Recording time and number of recorded trials represent only the part of the recording satisfying the inclusion criteria and included in the analysis database.

Figure 1. MRI and task. a, MRI identification of recording coordinates. Coronal MRI images numbered with respect to distance (in millimeters) from anterior commissure. Tungsten microelectrodes are inserted at known chamber coordinates. Identification of the brain structures is based on alignment of the MRI images with the monkey atlas. Abbreviations: AC, Anterior commissure; C, caudate; Chm, recording chamber (filled with 3% agar); Elc, electrode; G, globus pallidus; P, putamen; S, substantia nigra; T, thalamus. b, Behavioral task. Top, Reward trials; middle, neutral trials; bottom, aversive trials. Cues are shown for monkey L. Different speaker colors represent different sounds.

for additional analysis. Second, cells in which >0.02 of the total interspike intervals were <2 ms were excluded from the database.
Third, only TANs with an isolation score (Joshua et al., 2007) >0.8 and DANs with an isolation score >0.5 were included in the database. The lower threshold used for the DANs is attributable to the highly dense cellular structure of the substantia nigra pars compacta (SNc), which makes single-cell isolation difficult. We tested the subgroup of DANs with an isolation score >0.8 (N = 52) and found the same qualitative population results as reported for the larger DAN population. Finally, only cells that met the above inclusion criteria for >20 min during performance of the behavioral task were included in the neural database (average, 59 min and 346 trials). Table 1 provides the statistical details of the cells that were included in the analysis database.

During recording, units were classified according to anatomical location, extracellular waveform, firing rate and pattern, background activity, and in some cases response to free reward and to injection of dopamine agonists. To validate classification, we performed off-line analyses of the extracellular waveform shape, firing rate, and firing pattern of the neurons (see Figs. 3b, 4b). Waveform shape was quantified as the duration from the first negative peak to the next positive peak; rate was defined as the average of the overall firing rate; firing pattern was quantified by the coefficient of variation (SD/mean) of the interspike intervals. To further validate the DAN population response, we repeated the population analysis on a subset of DANs with a firing rate <8 Hz and peak-to-peak duration of >0.5 ms. The results of this analysis were similar to those of the whole recorded population (data not shown). Finally, apomorphine (0.1 mg/kg) was injected in a few cases (see Fig. 4c) to test for suppression of DAN activity (Aebischer and Schultz, 1984). We quantified this suppression as the root mean square (RMS) of the high-pass-filtered signal (300–6000 Hz).
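The two interspike-interval measures used above, the fraction of ISIs shorter than 2 ms (exclusion threshold 0.02) and the coefficient of variation of the ISIs (SD/mean), can be sketched in Python as follows. This is only an illustration of the stated definitions: the function names are hypothetical, spike times are assumed to be given in seconds, and the population SD (`ddof=0`) is an assumption because the text does not say which SD estimator was used.

```python
import numpy as np

def isi_metrics(spike_times_s):
    """Return (fraction of ISIs < 2 ms, coefficient of variation of the ISIs)."""
    spikes = np.sort(np.asarray(spike_times_s, dtype=float))
    isis = np.diff(spikes)                      # interspike intervals in seconds
    frac_short = float(np.mean(isis < 0.002))   # refractory-period violations
    cv = float(np.std(isis) / np.mean(isis))    # SD/mean; population SD assumed
    return frac_short, cv

def passes_isi_criterion(spike_times_s, max_fraction=0.02):
    """Cells with more than max_fraction of ISIs < 2 ms are excluded."""
    return isi_metrics(spike_times_s)[0] <= max_fraction

# Example: a perfectly regular 10 Hz train has no short ISIs and a CV near zero.
regular_train = np.arange(0.0, 5.0, 0.1)
print(isi_metrics(regular_train))
```

A tonically firing cell such as a TAN would be expected to give a small CV under this measure, and a phasic cell a large one, which is the separation the off-line clustering relies on.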
We used the RMS and not the spike rate to avoid possible errors and biases induced by spike detection and sorting (Moran et al., 2006), which are enhanced after apomorphine intramuscular injection because of monkey movements.

Statistical analysis. Neuronal responses to behavioral events were first characterized by their poststimulus time histogram (PSTH). The histograms were calculated in 1 ms bins and smoothed with a Gaussian window with an SD of 20 ms. The baseline firing rate was calculated by averaging the firing rate in the last 3 s of the variable ITI (4–8 and 3–7 s for monkeys L and S, respectively) and was denoted as baselineFR as follows:

\[ \mathrm{baselineFR} \equiv \operatorname*{mean}_{0 \le t \le 3}\bigl(\mathrm{psth}_{\mathrm{ITI\_END}}(t)\bigr). \]

To determine significant responses in the single PSTH analysis, we calculated the SD of the PSTH of the last 3 s of the ITI using the same number of trials as in the studied PSTH and identified time segments in which the deviation from baselineFR exceeded three times the ITI-SD (3σ rule). A response was considered significant only if the duration of the deviant segment was >60 ms (three times the SD of the smoothing filter). A cell was considered to have a significant response on a trial epoch if at least one of its PSTHs (e.g., one of the three PSTHs after reward, aversive, or neutral cue) had a significant response. To check that this analysis was not biased by multiple comparison confounds, we performed the same analysis on the ITI epoch and found that none of the cells was significantly modulated at this epoch.

We defined the difference index between two events as the mean absolute difference between their PSTHs, i.e., the following:

\[ \text{difference index}(\mathrm{event}_1, \mathrm{event}_2) \equiv \operatorname*{mean}_{t}\bigl(\lvert \mathrm{psth}_1(t) - \mathrm{psth}_2(t) \rvert\bigr). \]

This index is a mean difference between rate functions and hence has units of spike per second.
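The PSTH construction, the 3σ significance rule, and the difference index defined above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' MATLAB code: the function names, the kernel truncation at four SDs, and the input layout (one list of spike times in seconds per trial) are assumptions; the 1 ms bins, 20 ms Gaussian SD, 3 × ITI-SD threshold, and >60 ms duration criterion are taken from the text.

```python
import numpy as np

def smoothed_psth(spike_times_per_trial, t_start, t_stop, sd_ms=20.0):
    """PSTH in spikes/s: 1 ms bins averaged over trials, Gaussian-smoothed."""
    n_bins = int(round((t_stop - t_start) * 1000))
    edges = t_start + np.arange(n_bins + 1) / 1000.0
    counts = np.zeros(n_bins)
    for spikes in spike_times_per_trial:
        counts += np.histogram(spikes, bins=edges)[0]
    rate = counts / len(spike_times_per_trial) * 1000.0  # spikes/s in 1 ms bins
    half = int(4 * sd_ms)                                # kernel cut at 4 SD (assumed)
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sd_ms) ** 2)
    kernel /= kernel.sum()
    # mode="same" keeps the length; values near the window edges are attenuated.
    return np.convolve(rate, kernel, mode="same")

def significant_segments(psth, baseline_fr, iti_sd, min_ms=60):
    """(start, stop) bin indices where |psth - baseline| > 3*iti_sd for > min_ms."""
    deviant = np.abs(psth - baseline_fr) > 3.0 * iti_sd
    padded = np.concatenate(([False], deviant, [False]))
    changes = np.flatnonzero(np.diff(padded.astype(int)))
    starts, stops = changes[0::2], changes[1::2]         # stop index is exclusive
    return [(int(s), int(e)) for s, e in zip(starts, stops) if e - s > min_ms]

def difference_index(psth1, psth2):
    """Mean absolute difference between two PSTHs (spikes/s)."""
    return float(np.mean(np.abs(psth1 - psth2)))

# Example: a 70 ms deviation is kept, a 20 ms deviation is rejected.
trace = np.zeros(300)
trace[50:120] = 10.0
trace[200:220] = 10.0
print(significant_segments(trace, baseline_fr=0.0, iti_sd=1.0))
```

Here `baseline_fr` and `iti_sd` would be the mean and SD of the PSTH over the last 3 s of the ITI; because the bins are 1 ms wide, segment lengths in bins equal durations in milliseconds.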
To test the significance of this index, we used resampling (bootstrap) methods. Single-trial responses were shuffled and resampled repeatedly into two groups, and the difference index was then calculated between them. This process was repeated 500 times. A difference index was considered significant if it was larger than the difference indices of a given fraction of these surrogates (1 − p, where p is the test confidence level). To cross-check the difference index results, we performed MANOVAs (p < 0.05) using 50 and 100 ms time bins. We also bootstrapped the MANOVA statistics and found that all these analyses yielded similar results. In this manuscript, we elected to show the difference index because it gives an intuitive range of difference [i.e., the average difference (in spike rate) between the responses to two events]. We derived two indices from the difference index; the first was the response index that was defined as the difference index when one of the events was the neutral event, i.e., the following:

\[ \text{response index}(\mathrm{event}) \equiv \text{difference index}(\mathrm{event}, \text{neutral event}). \]

Figure 2. Behavioral monitoring and results. a, Mouth signal: example from the reward cue epoch of the licking signal, monitored by an infrared reflection detector. The black arrow indicates time of cue presentation, and the gray arrow indicates cue offset and reward tone onset. b, Image of monkey’s eyes. Video signal was processed and each frame was classified according to the state of the eyes [i.e., open (top) or closed (bottom)]. c, Behavioral results. Top, Licking (average ± SEM) as recorded by an infrared reflection detector directed at the monkey’s mouth. The voltage output of the detector was sampled by A/D converter and the y-scale is given in arbitrary A/D units. Bottom, Fraction of trials with eyes closed (average ± SEM) as recorded by computerized video processing.
Columns correspond to trial epoch (cue; outcome, food or air puff; no outcome, sound only) aligned to event onset (time = 0). Note the overlap of 0.5 s between the start of the outcome and the no-outcome epochs and the last 0.5 s of the cue epoch. Data were averaged for each session and then across sessions (N, number of recording sessions). Color coding of trial types is given at bottom right (A, aversive; N, neutral; R, reward; the number is the outcome probability). d, Normalized behavioral response. Licking (blue) and blinking (red) response (average ± SEM, number of sessions as in c) in a time window around the behavioral event (cue, 500–0 ms before cue end; outcome and no outcome, 0–500 ms after cue end for blinking response and 500–1000 ms for licking response). The responses are normalized between 0 and 1 [i.e., in each epoch a response (X) is transformed by (X − min)/(max − min), where min and max are the minimal and maximal values of the response in this epoch]. Abscissa, Different behavioral conditions (A, aversive; N, neutral; R, reward; the number is the outcome probability).

The second was the probability coding index. This index was defined as the difference index between the events with a high probability (p = 2/3 and 1) of receiving an outcome and the event with a low probability (p = 1/3) of receiving the same outcome. The clustering of the events into high and low probability followed the behavioral responses of the monkey (see Results) and allowed us to generate a simple graphic representation of our results. A MANOVA of the responses to all the three different probabilities yielded similar results.

In addition to the single-cell analysis, we performed population analyses. The responses of striatal TANs and DANs are very stereotypic (Graybiel et al., 1994; Schultz, 1998). Hence, the average population response was estimated by averaging the PSTH deviation from baselineFR across the whole population.
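The resampling test for the difference index (and hence for the response index and the probability coding index, which are difference indices between particular pairs of events) can be sketched as follows in Python. The text says that single-trial responses were shuffled and resampled into two groups 500 times; whether the resampling was with or without replacement is not stated, so this sketch assumes a permutation without replacement. The function names and the input layout (one smoothed single-trial rate function per row) are hypothetical.

```python
import numpy as np

def difference_index(psth1, psth2):
    """Mean absolute difference between two PSTHs (spikes/s)."""
    return float(np.mean(np.abs(psth1 - psth2)))

def shuffle_test(trials1, trials2, n_surrogates=500, seed=0):
    """Compare the observed difference index with shuffled surrogates.

    trials1, trials2 : arrays of shape (n_trials, n_bins), one row per trial
    Returns (observed index, fraction of surrogates smaller than it).
    """
    rng = np.random.default_rng(seed)
    observed = difference_index(trials1.mean(axis=0), trials2.mean(axis=0))
    pooled = np.vstack([trials1, trials2])
    n1 = len(trials1)
    surrogates = np.empty(n_surrogates)
    for i in range(n_surrogates):
        order = rng.permutation(len(pooled))     # reassign trials to two groups
        group_a = pooled[order[:n1]]
        group_b = pooled[order[n1:]]
        surrogates[i] = difference_index(group_a.mean(axis=0), group_b.mean(axis=0))
    return observed, float(np.mean(surrogates < observed))

# Example with synthetic rates that differ by about 3 spikes/s.
demo_rng = np.random.default_rng(1)
event = demo_rng.normal(8.0, 1.0, size=(40, 50))
neutral = demo_rng.normal(5.0, 1.0, size=(40, 50))
observed, fraction_below = shuffle_test(event, neutral)
print(observed, fraction_below)
```

Under the criterion in the text, the index would be called significant when `fraction_below` is at least 1 − p. Passing the neutral-event trials as the second argument gives the response index; passing the low-probability trials against the pooled high-probability trials gives the probability coding index.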
To determine whether the population response was significant, we first constructed the single-cell PSTH at bins of 20 ms, and then averaged across the population to obtain the population PSTH. Finally, we performed a t test to check bin by bin whether the population response was significantly different from zero (p < 0.01). If the population PSTH was significant for more than three consecutive bins, it was considered a significant population response. The data of the two monkeys were grouped unless a significant difference between the individual monkeys was detected. Data analysis was performed on custom software using MATLAB V7 (Mathworks).

Results

We recorded the neuronal activity of TANs and DANs (Fig. 1a, Table 1) in parallel with the monitoring of the monkeys’ behavior (Fig. 2). During recordings, the monkeys performed a probabilistic classical conditioning task (Fig. 1b) with food or air puff as the rewarding and aversive outcomes, respectively. This task design provides a symmetric expectation of a rewarding or aversive event after cue presentation and therefore served to test the following three hypotheses.

First, DAN and TAN activity reflects expectation, delivery, and omission of reward and of aversive events. The alternative is that only reward-related events are represented by the activity of one or both basal ganglia neuromodulators.

Second, DAN and TAN activity encode an error in the temporal prediction (TD) of reward and aversive events (Sutton and Barto, 1998). The TD hypothesis suggests opposite modulations for positive (i.e., delivery and expectation of reward and omission of aversive events) and negative errors (i.e., expectation and delivery of aversive events and omission of predicted rewarding event).

Third, TAN activity mirrors DAN activity. In a previous study, it was shown that the pause response of TANs was coincident with the increase in DAN activity; however, DANs but not TANs incrementally encoded reward probability (Morris et al., 2004). Here, we test whether this simultaneous opposite response also appears in a task that includes expectation and delivery of aversive events. We also examine whether TANs and DANs discriminate between reward and aversive events during the same parts of the task episode.

Monkey behavior reflects expectation of rewarding and aversive events

We recorded the monkeys’ behavior during performance of a probabilistic classical conditioning task (Fig. 1b) with food or air puff as the rewarding and aversive outcomes, respectively. We tested how extensive (several months; 5 d/week; ~1000 trials/d) conditioning affected the monkeys’ behavior by monitoring licking and blinking responses during neural recordings (Fig. 2a,b). The monkeys increased their licking in response to cues predicting food but only slightly to the aversive and neutral cues (Fig. 2c, top row). Similarly, the monkeys’ frequency of blinking increased to cues predicting air puff but only slightly to reward and neutral cues (Fig. 2c, bottom row). The increase in blinking and licking during the cue epoch was maximal in trials in which the probability of outcome was 2/3 or 1 and smaller in trials in which the probability was 1/3 (0–500 ms before cue ending; p < 0.01, Tukey’s HSD post hoc).

The behavioral responses to food or air puff delivery (and their corresponding sounds) were not dependent on their previous predictions (Fig. 2c, outcome column). Food and air puff omission, as well as the final (no outcome) event of the neutral trials were indicated to the monkeys by an additional “no outcome” sound. When expected food or air puff were not delivered (no outcome on the p = 1/3 or p = 2/3 trials), licking and blinking increased, respectively; this increase was in accordance with the previously instructed probability. The increase in the licking and blinking behavior was smaller and shorter than the increase after food or air puff outcomes (Fig. 2c, no outcome). Licking and blinking increased slightly to the neutral trials (Fig. 2c, no outcome, green line).

Normalization of the behavioral responses (Fig. 2d) reflects the opposite trends of the response to aversive versus rewarding events. It suggests that the monkeys mainly categorized the high-probability (p = 2/3 and 1) versus the low-probability (p = 1/3) cues. Heart rate analysis can discriminate between high- and low-arousal states (Berntson et al., 1997). However, analysis of the heart rate and its variability did not reveal significant differences between the epochs after aversive versus reward predicting cues, suggesting a symmetric effect on monkey arousal. In sum, the analysis of the behavioral responses indicates the monkeys could distinguish between aversive, reward, and neutral cues and between the cues with high- and low-outcome probabilities. According to these behavioral findings, we grouped the events with high probability (p = 2/3 and p = 1) for the neural activity analysis.

Figure 3. An example of neural activity of a single striatal TAN and identification of striatal cell types. a, Rasters and PSTHs of a single TAN of monkey L aligned to the trial behavioral events. The rows are separated according to the expected outcome. First row, Trials with cues that predict the delivery of food. Second row, Trials with the neutral cue (a cue always followed by no outcome). Third row, Trials with cues that predict an air puff. Columns are aligned according to the trial epoch. First column, Cue presentation epoch (−0.2 to 1 s after cue onset). Second column, Outcome epoch (−0.2 to 1 s after delivery of food or air puff). Third column, Trials in which no outcome was delivered; outcome omission was signaled to the monkey by the no-outcome sound (−0.2 to 1 s after sound onset). Color codes are marked at the left side of the cue rasters (A, aversive; N, neutral; R, reward; the number is the outcome probability). For the graphic presentation, rasters were randomly pruned and adjusted to contain the same number of trials. The total number of trials (before pruning) was 708. PSTHs were constructed by summing activity across trials in 1 ms resolution and then smoothing with a Gaussian window (SD, 20 ms). Examples from three 500 ms segments of the analog signal (from first, second, and last third of the recording session) are shown in the middle plot. Examples of spike waveforms are shown next to the 500 ms analog segment. The spike waveform plot includes 100 superimposed waveforms selected randomly around the time of the corresponding analog trace. Isolation score was 0.98; the fraction of spikes in first 2 ms of the interspike interval (ISI) histogram was 0.00002. b, Off-line analysis of striatal cell identification based on firing pattern (abscissa) and spike peak-to-peak duration (ordinate). Color code: Black, TANs; gray, phasic active neurons (PANs). Off-line analysis of neuron shape and coefficient of variance (CV) of the time of interspike interval shows that striatal neurons are separated into two clusters in which the PANs have a large CV with comparably narrow waveforms and TANs have a small CV with very wide waveforms. The cell in a is plotted in a large black circle and marked with an arrow.

The neuronal database

We recorded 191 DANs from the SNc and 313 TANs from the putamen; of these, 106 DANs and 180 TANs passed the quality criteria (see Materials and Methods) and their response was further analyzed (Table 1).
Figures 3a and 4a show examples of the activity of a TAN and a DAN, respectively, recorded during the performance of the behavioral task. The TANs and DANs were identified on-line (see Materials and Methods) and identity was verified by off-line clustering of the spike waveforms, spike train pattern (Figs. 3b, 4b), and occasionally by analysis of the responses to apomorphine injection (Fig. 4c). We found that, in each trial epoch (cue, outcome, and no outcome), most of the cells had a significant response to at least one event (Fig. 5). Below, we provide additional analysis both of the population and the single-cell responses at each epoch and compare the responses to the aversive, neutral, and reward-related events and between the DANs and TANs.

Figure 4. An example of neural activity from a single DAN and identification of substantia nigra cell types. a, Same conventions as in Figure 3a. Total number of trials was 271; isolation score was 0.67; fraction of spikes in first 2 ms of the ISI histogram was 0.0001. b, Off-line analysis of substantia nigra (SN) cell identification based on firing rate (abscissa) and spike peak-to-peak duration (ordinate). Color code: Black, DANs; gray, substantia nigra pars reticulata (SNr) neurons; light gray, unclassified SN neurons. Off-line analysis of the spike shape and firing rate shows that nigral neurons are separated into two clusters in which the SNr cells have a high firing rate with narrow waveforms and DANs have a low firing rate with wide waveforms. Cells that were not classified as DAN or SNr tended to be between clusters. The cell in a is plotted in a large black circle and marked with an arrow. c, Example of neuronal responses to apomorphine injection in a single recording day. The continuous line is the RMS of the bandpass-filtered analog signal (300–6000 Hz) in bins of 10 s. Color code: Black, Electrodes in which a DAN was identified; dotted gray, electrode with an SNr neuron.
TAN and DAN activity is asymmetrically modulated by expectation of aversive events and reward in the cue epoch

Population analysis of the neuronal activity in the cue epoch shows that, whereas TAN average responses to the aversive, neutral, and reward predicting cues tended to overlap (Fig. 6a, top), the population average activity of the DANs was highly discriminative both between reward and aversive events and between cues with high (p = 2/3 and p = 1) and low (p = 1/3) prediction probability of reward delivery. The DANs positive response to aversive cues was smaller than the response to the reward cues (Fig. 6a, bottom). The suppression of TAN activity after high-probability reward cues tended to be longer than after low-probability cues (Fig. 6a, compare blue with light blue lines) (for similar trends, see Shimo and Hikosaka, 2001; Ravel et al., 2003). As previously reported (Morris et al., 2004), comparison of the time of reward cue modulation showed that the DAN increase and TAN decrease of discharge rate were coincident (Fig. 6b, top) with slightly shorter lags for the DAN responses. Finally, unlike the responses to the reward cue, the TAN and DAN responses to aversive cues had a significant second phase in which the TANs increased their activity and DANs decreased their activity (Fig. 6b, bottom).

The population average PSTH can be biased by a few neurons with an extreme response or opposite effects may be averaged out. We therefore formulated the difference index (see Materials and Methods) as a measure of the modulations of a single neuron to different events. We grouped responses across probabilities and tested whether the single-cell responses to reward and aversive cues were different from the response to the neutral cue. We found that, in both TAN and DAN populations, the response index (absolute deviation from the neutral response) for the reward trials was larger than the response index for aversive trials (Fig. 7a). A substantial fraction of TANs and DANs showed a significant response index to reward cues, whereas only a small number of cells had a significant response index to aversive cues (Fig. 7a, inset).

When separating the DAN and the TAN responses into high-probability (p = 2/3 and 1) and low-probability (p = 1/3) cues, coding of the reward probability was larger and more frequent than coding of the aversive probability (Fig. 7b). A multivariate ANOVA in which we did not group the high probability (p = 2/3 and 1) cues yielded similar results (data not shown). The difference between TAN single-cell responses and the TAN population results suggests that single-cell responses had opposite trends and were averaged out in the population analysis.

To summarize, we found larger and more frequent single-cell [...] population did not robustly encode the outcome probability. Disconfirming the TD hypothesis, DANs also increased their activity in response to aversive cues and there was a small (but significant) difference between the DAN responses to aversive cues with different predictions of a future aversive event (hypothesis 2). Finally, the first phase of the responses of the TANs to the visual cues generally mirrored the response of the DANs. DANs but not the TAN population discriminated robustly between reward and aversive cues (hypothesis 3).

Figure 5. Percentage of TANs and DANs with significant responses to the different behavioral events. The percentage of neurons with significant responses to the cue, outcome, and no-outcome tone events of the total number of studied neurons (n = 180 TANs and 106 DANs). Color code: Black, TAN; white, DAN. For each epoch, we grouped trials according to trial type (aversive, reward, and neutral). A cell was considered to be significantly modulated in an epoch if at least one of the responses in that epoch was significant.

Figure 6. TAN and DAN population response at cue epoch. a, Population average response to behavioral cues. Only the first 0.8 s after the cue is shown to highlight the short duration of the responses. Top, TANs (n = 180 neurons). Bottom, DANs (n = 106). Color coding: Dark blue, Responses to high-probability (p = 1 and p = 2/3) reward cues; light blue, reward low (p = 1/3)-probability cues; green, neutral cue; orange, aversive low-probability cues; red, aversive high-probability cues. b, TAN versus DAN population response. The populations and timescale are the same as in a. The population response was considered significant if it passed the significance criteria (t test, p < 0.01) for at least three consecutive 20 ms bins. For this analysis, all trials of the same type (aversive or reward) were grouped. Top, TAN versus DAN in the reward trials. Bottom, TAN versus DAN in the aversive trials. Color coding: Orange, TAN significant bins; white, TAN nonsignificant bins; purple, DAN significant bins; gray, DAN nonsignificant bins.

Both TANs and DANs respond to aversive and reward outcome

Population analysis of the neural activity at the time of outcome delivery and coincident sounds showed that both TANs and DANs respond to food and air puff delivery, but with a faster response to the air puff (Fig. 8a). Whereas the cumulative responses of the TANs to aversive and reward outcome were similar in magnitude, the response of the DANs was larger for the reward events. However, although smaller in magnitude, the DANs respond with an excitation to the aversive outcome (Fig. 8a). TANs and DANs activity at reward delivery was larger for the low-probability trials than for the high-probability trials (Fig. 8a). Comparison of the modulation time showed that the TAN and the DAN responses at the outcome epoch did not mirror each other. The large significant increase in the second phase of the TAN response to the reward outcome was coincident with only a small nonsignificant decrease in DAN activity (Fig. 8b, top). Furthermore, in the aversive outcome, the second phase of the TAN response overlapped the end of the first phase of the DAN response (i.e., the increase in the discharge rate of the TAN overlapped the increase in DAN activity) (Fig. 8b, bottom).

Single-cell analysis shows that a large fraction of the cells responded to reward or aversive events (Fig. 9a). However, as in the cue epoch, reward probability was better encoded than air puff probability in the outcome epoch as well (Fig. 9b), further showing that TAN and DAN activity was more strongly modulated by expectation of reward.

To summarize, we found a large modulation of TAN and DAN discharge rate after delivery of food or air puff, but with probability coding only for reward (hypothesis 1). We found larger TAN and DAN activity at reward delivery for the low-probability trials than for the high-probability trials. This trend was opposite to the trend found in the cue epoch (large response to the large probability) as expected according to the TD hypothesis (hypothesis 2). However, contrary to the naive TD hypothesis (which predicts that activity moves from outcome to cue during conditioning), we found that many TANs and DANs encoded the aversive outcome, whereas only very few encoded the aversive cue (compare Figs. 8, 9 with Figs. 6, 7).
Furthermore, as opposed to modulation of TAN and DAN discharge after reward than after the TD hypothesis, which predicts a decrease, DANs increased aversive predicting cues (hypothesis 1). As expected from the TD activity in response to the aversive outcome. In addition, the hypothesis, DANs code reward probability both at level of the multiphasic response of the TANs, and the major changes obpopulation and the single-unit responses; however, the TAN served in the second excitatory phase of the TAN responses, does 20 Results I Joshua et al. • Value Encoding by Basal Ganglia Critics 11680 • J. Neurosci., November 5, 2008 • 28(45):11673–11684 not enable a straightforward comparison with the TD predictions. Finally, the TAN and DAN activity were not completely coincident, and both populations discriminated between reward and aversive trials (hypothesis 3). TAN but not DAN populations robustly differentiates between omission of rewards and omission of aversive events In contrast to our previous study (Morris et al., 2004), outcome omission was explicitly notified to the monkeys by a typical sound (Fig. 1b). In the 2004 study, the responses of both DANs and TANs to the reward omission were small. Population analysis of the no-outcome events in the current study showed that TANs, but not DANs, had large modulations of their discharge rates. During this epoch, the TAN population response (Fig. 10a) differentiated between the reward trials (food omission), the aversive trials (air puff omission), and the neutral trials (expected no outcome). Furthermore, the suppression of TAN activity was slightly longer after omission of the high-probability reward (Fig. 10a). Analysis of the population PSTHs shows that only the average response of the TANs, but not the DANs to the outcome omission events was significant (Fig. 10b). The multiphase TAN response was not coincident with the phases of the insignificant DAN modulation (Fig. 10b). 
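The comparisons with the TD hypothesis made above can be illustrated with a small worked example. The following is a minimal sketch, not the analysis used in the study: it assumes a fully learned task, a discount factor of 1, zero value before cue onset, and unit outcome magnitudes (+1 for food, -1 for air puff). The function name `td_errors` and all of these values are assumptions introduced for illustration.

```python
from fractions import Fraction


def td_errors(p, magnitude):
    """Idealized TD errors for a cue that predicts an outcome of the given
    signed magnitude with probability p.

    Assumptions (not from the paper): learning is complete, gamma = 1,
    and the value before cue onset is zero.
    """
    value = p * magnitude  # learned value of the cue, V(cue) = p * r
    return {
        "cue": value,                   # V(cue) - 0
        "delivery": magnitude - value,  # r - V(cue)
        "omission": -value,             # 0 - V(cue)
    }


# Reward cues: the cue error grows with p, the delivery error shrinks with p.
for p in (Fraction(1, 3), Fraction(2, 3), Fraction(1)):
    print("reward", p, td_errors(p, 1))

# Aversive cues: the idealized delivery error is negative or zero.
for p in (Fraction(1, 3), Fraction(2, 3), Fraction(1)):
    print("aversive", p, td_errors(p, -1))
```

Under these assumptions the error at reward delivery is larger for p = 1/3 than for p = 1, which is the direction of the trend reported for the outcome epoch, whereas the error at air puff delivery is never positive, which is the prediction that the observed DAN excitation contradicts.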
Finally, we did not find a significant difference between the duration of the DAN responses (Bayer et al., 2007) to the outcome omission after high- and low-probability cues (data not shown).

Single-cell analysis shows that, as in the cue epoch, the response index of both TANs and DANs for the reward trials was larger than the response index for aversive trials (Fig. 11a). A substantial fraction of TANs and DANs showed a significant response index to reward omission, whereas only a smaller number of cells had a significant response index to aversive omission (Fig. 11a, inset). Coding of reward probability was larger and more frequent than coding of the probability of the aversive events (Fig. 11b). The difference between DAN single-cell responses and the population results suggests that some single-cell responses had opposite trends and were averaged out in the population analysis.

To summarize, TAN and DAN single-cell modulations were larger for reward than for aversive omission, with probability coding only for reward omission (hypothesis 1). As shown previously, after outcome omission (no outcome) the DANs encode the TD error weakly (hypothesis 2). The TAN and DAN activity were not coincident, and the TAN, but not the DAN, population coded the difference between reward, aversive, and neutral trials robustly (hypothesis 3).

Figure 7. TAN and DAN single-cell response at cue epoch. a, Scatter plots comparing the response index of individual neurons to reward and aversive cues. Response index was calculated for each cell (n = 180 TANs and 106 DANs) as the absolute difference between the aversive or reward cue-aligned PSTH and the PSTH of the neutral cue. The black line is the identity (Y = X) line. Points below this line represent cells with a response index that is larger for the reward cues than for aversive cues. Top, TAN. Bottom, DAN. Color code: Blue, response index significant only for reward cues; red, response index significant only for aversive cues; green, both response indices were significant; gray, neither response index was significant. Significance level was p < 0.05. The time window used for this analysis was 0–1000 ms from cue presentation. Inset, Pie chart of the fraction of cells with a significant index for reward (blue), aversive (red), and both (green) cues of all cells with significant response index (number of responding of total number of cells is given in the text at inset top). b, Scatter plot comparing the probability coding of individual TAN and DAN neurons. The index was calculated as the difference between the grouped response to the high-probability (p = 2/3 and p = 1) and the low-probability (p = 1/3) events. The format and color code are the same as in a. Points below the identity line represent cells with a probability-coding index that is larger for the reward cues than for aversive cues.

Figure 8. TAN and DAN population response at outcome delivery. a, Population responses at the time of outcome (food or air puff) and the corresponding sounds delivery. b, Comparison between the responses of TANs and DANs. The conventions are the same as in Figure 6.

Figure 9. TAN and DAN single-cell response at outcome epoch. a, Scatter plots comparing the response index of individual neurons to reward and aversive outcomes. b, Scatter plot comparing the probability-coding index of the single neuron response to reward and aversive outcome. The conventions are the same as in Figure 7.

Figure 10. TAN and DAN population response at no outcome. a, Population responses in trials with no food or air puff delivery. The same no-outcome tone is given at time = 0. b, Comparison between the responses of TANs and DANs. The conventions are the same as in Figure 6.

Figure 11. TAN and DAN single-cell response at no outcome. a, Scatter plots comparing the response index of individual neurons in trials in which food and air puff were not delivered. b, Scatter plot comparing the probability-coding index. The conventions are the same as in Figure 7.

Discussion

In this manuscript, we have shown that DAN and TAN encoding is not limited to encoding of reward prediction error but also reflects other psychological and behavioral processes. We found that rate modulations of TANs and DANs to expectation of reward were larger than the modulations that followed predictions of aversive events. Furthermore, these neurons encode the expectation level (or the previous probability) of reward better than the expectation of aversive events. Finally, TAN responses were not coincident with DAN responses in all trial epochs. DANs encode the difference between reward and aversive trials in the cue and outcome epoch, whereas the TAN population encodes this difference in the outcome and no-outcome epochs. Therefore, complementary coding of TANs and DANs expands the encoding scope of the basal ganglia neuromodulators.

TANs and DANs strongly encode aversive outcome but not aversive expectations

There is no consensus regarding the responses of DANs to aversive events. Some studies suggest that at least some of the DANs increase their firing rate after aversive outcome (for review, see Horvitz, 2000), whereas others have evidence of a decrease (Ungless et al., 2004; Coizet et al., 2006). The reported increase in the firing rate of the DANs and striatal dopamine levels to negative events might be attributable to reward generalization (Mirenowicz and Schultz, 1996; Kakade and Dayan, 2002; Day et al., 2007). However, the blinking and licking behavior observed here indicates that the monkeys were able to reliably discriminate between reward, neutral, and aversive cues. Second, the calculation of the response index as the difference between the responses to appetitive/aversive and the neutral events overcomes the confounding effects of generalization. Finally, we found that the neuronal response to the aversive outcome was faster than responses to reward trials. Thus, DAN responses to aversive events may reflect multiple sources of modulation (see below for error prediction encoding). This may explain some of the inconsistencies between previous experiments. Whereas in the behaving animal there can be positive modulations of DAN discharge in response to aversive events because of attention/arousal processes, when the animal is anesthetized, only discomfort-related activity can be demonstrated (Ungless et al., 2004; Coizet et al., 2006).

As for the TANs, our study confirms the pioneering works showing fast and robust TAN responses to an aversive outcome (Ravel et al., 1999, 2003). In addition, our study extends our previous work (Morris et al., 2004) showing minimal differences between the TAN population responses to cues predicting future rewards, to cues predicting future aversive events, and to neutral cues. However, we found that unlike the population results that probably represent the average of opposing effects, many single TANs differentiate between reward and neutral trials and encode reward probability (Fig. 7). We further found that the TAN population encodes the difference in the reward probability at outcome delivery (Fig. 8a) and that these cells have a large response to outcome omission (Fig. 10a). In the previous work (Morris et al., 2004), we found no discriminative response at outcome delivery and only a low response to reward omission. Future studies should explore whether this lack of concordance is attributable to differences in behavioral paradigms (for example, explicit vs implicit notification of trial termination, operant vs classical conditioning, and only rewarding outcomes vs rewarding and aversive outcomes) or to the animals’ behavioral strategy and confidence in the prediction of future outcome.

DANs encode more than reward prediction errors

Recent studies have shown that DAN activity encodes the mismatch between prediction and reality. Most of these studies have focused on the mismatch in the positive domain (i.e., when conditions are better than expected) (Schultz et al., 1997). DANs typically increase their discharge rate in response to appetitive predictive cues and outcomes. In line with the predictions of reinforcement learning theories, the DAN discharge decreases with omission of predicted rewards (Schultz et al., 1993; Fiorillo et al., 2003; Matsumoto and Hikosaka, 2007). However, this discharge suppression is limited because the neuronal firing rate is truncated at zero. Indeed, several groups (Morris et al., 2004; Bayer and Glimcher, 2005) have reported that the instantaneous firing of DANs does not demonstrate incremental encoding of reward omission, and it was suggested that omission is encoded by duration of the discharge decrease (Bayer et al., 2007). In this experiment, however, we failed to find any significant coding of reward omission by response amplitude or duration.

Naive reinforcement learning models categorize events as having positive or negative errors and would suggest opposite sign modulation to reward and aversive trials (Schultz et al., 1997). However, we found similar trends for DAN responses to predictions, outcomes, and omission of reward and aversive-related events (Figs. 6a, 8a, 10a). In particular, we found a substantial increase to both reward and aversive outcome. Furthermore, responses of the DANs to reward omission and aversive outcome (Figs. 10a, 8a, respectively) were very different (decrease vs increase), although in both cases there was a negative reinforcement error.

To summarize, our results reveal an increase in the complexity of the encoding of value by the DANs. This does not rule out their role in the temporal difference hypothesis. On the contrary, our working hypothesis holds that the discharge rate of DANs and TANs reflects changes in reward prediction as well as changes in attention/arousal levels (Horvitz, 2000; Ravel and Richmond, 2006; Redgrave and Gurney, 2006).

Asymmetric encoding of positive and negative expectations by the basal ganglia

Previous primate instrumental conditioning experiments in which DANs and TANs were recorded did not include expectation of aversive outcome because the air puff could be avoided by a correct response (Mirenowicz and Schultz, 1996; Yamada et al., 2004). The symmetric classical conditioning paradigm of this study, which included reward predicting, aversive predicting, and a neutral cue, enabled us to explore whether there was symmetry in the encoding of expectation of rewarding versus aversive events by the TANs and the DANs.

Single-cell analysis revealed that TAN and DAN encoding of reward expectation and omission was larger and more frequent than encoding of expectation and omission of aversive events (Figs. 7a, 11a). Furthermore, we found that TAN and DAN encoding of the reward probability was larger and more frequent than their encoding of the probability of the air puff-related events (Figs. 7b, 9b, 11b). The preferential activation to reward was also apparent in the population response of the DANs at the cue and outcome epoch in which the activity of these cells was larger and coded the probabilities better (Figs. 6a, 8a). Thus, in line with previous studies (Mirenowicz and Schultz, 1996; Yamada et al., 2007), we show that even in a classical conditioning task in which the air puff is unavoidable, expectation of aversive events is weakly represented in the basal ganglia activity.

TANs do not mirror the DAN responses

The anatomical demonstration of dopaminergic innervations of striatal cholinergic interneurons (Lehmann and Langer, 1983) and the suppression of acetylcholine efflux from striatal slice by dopamine (Stoof et al., 1992) suggest that DANs directly inhibit the TANs (Wang et al., 2006). TANs might mediate the dopaminergic message to the D1 and D2 dopamine receptor containing striatal projection neurons. The opposite and coincident responses of the TANs and DANs to predictive cues (Fig. 6) support direct inhibition. However, TAN responses at the terminal stage of the trial (Figs. 8b, 10b) include major positive deflections that do not mirror any phase of the dopaminergic response. Notably, after outcome omission, DANs respond similarly to the neutral outcome, reward, and air puff omissions, whereas the TANs robustly discriminate between the three events (Fig. 10). Thus, DANs may better encode the cue predicting events and the TANs may provide more information at the completion of the trial. This is consistent with the findings of subpopulations of striatal projection neurons with selective evaluative encoding of trial results (Lau and Glimcher, 2007, 2008). In any case, these differential responses indicate that the TAN discharge is not totally governed by its dopaminergic inputs; neither are the TANs and DANs driven by a common source (Matsumoto et al., 2001) with opposite effects on the two systems.

Concluding remarks

In this study, we showed that the dopaminergic and the cholinergic neuromodulators of the basal ganglia encode the positive domain of behavior in a nonredundant manner. This asymmetric encoding of behavior suggests that the basal ganglia collaborate with other neuronal systems to shape the animal’s response to diverse environmental events. The characteristics and interactions of these different neuronal systems may provide the basis for asymmetric, irrational human attitudes toward rewarding and aversive events (Tversky and Kahneman, 1981). Finally, the stronger involvement of the basal ganglia in positive reinforcement learning is congruent with the findings that parkinsonian patients are better at learning to avoid choices that lead to negative outcomes than they are at learning from positive outcomes (Frank et al., 2004).

References

Aebischer P, Schultz W (1984) The activity of pars compacta neurons of the monkey substantia nigra is depressed by apomorphine. Neurosci Lett 50:25–29.
Arbuthnott GW, Wickens J (2007) Space, time and dopamine. Trends Neurosci 30:62–69.
Barbeau A (1962) The pathogenesis of Parkinson’s disease: a new hypothesis. Can Med Assoc J 87:802–807.
Bar-Gad I, Bergman H (2001) Stepping out of the box: information processing in the neural networks of the basal ganglia. Curr Opin Neurobiol 11:689–695.
Bayer HM, Glimcher PW (2005) Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47:129–141.
Bayer HM, Lau B, Glimcher PW (2007) Statistics of midbrain dopamine neuron spike trains in the awake primate. J Neurophysiol 98:1428–1439.
Berntson GG, Bigger JT Jr, Eckberg DL, Grossman P, Kaufmann PG, Malik M, Nagaraja HN, Porges SW, Saul JP, Stone PH, van der Molen MW (1997) Heart rate variability: origins, methods, and interpretive caveats. Psychophysiology 34:623–648.
Calabresi P, Centonze D, Gubellini P, Pisani A, Bernardi G (2000) Acetylcholine-mediated modulation of striatal function. Trends Neurosci 23:120–126.
Coizet V, Dommett EJ, Redgrave P, Overton PG (2006) Nociceptive responses of midbrain dopaminergic neurones are modulated by the superior colliculus in the rat. Neuroscience 139:1479–1493.
Day JJ, Roitman MF, Wightman RM, Carelli RM (2007) Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat Neurosci 10:1020–1028.
Fiorillo CD, Tobler PN, Schultz W (2003) Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299:1898–1902.
Frank MJ, Seeberger LC, O’Reilly RC (2004) By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science 306:1940–1943.
Gourévitch B, Eggermont JJ (2007) A simple indicator of nonstationarity of firing rate in spike trains. J Neurosci Methods 163:181–187.
Graybiel AM, Aosaki T, Flaherty AW, Kimura M (1994) The basal ganglia and adaptive motor control. Science 265:1826–1831.
Guarraci FA, Kapp BS (1999) An electrophysiological characterization of ventral tegmental area dopaminergic neurons during differential pavlovian fear conditioning in the awake rabbit. Behav Brain Res 99:169–179.
Gurney K, Prescott TJ, Wickens JR, Redgrave P (2004) Computational models of the basal ganglia: from robots to membranes. Trends Neurosci 27:453–459.
Horvitz JC (2000) Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience 96:651–656.
Joshua M, Elias S, Levine O, Bergman H (2007) Quantifying the isolation quality of extracellularly recorded action potentials. J Neurosci Methods 163:267–282.
Kakade S, Dayan P (2002) Dopamine: generalization and bonuses. Neural Netw 15:549–559.
Lau B, Glimcher PW (2007) Action and outcome encoding in the primate caudate nucleus. J Neurosci 27:14502–14514.
Lau B, Glimcher PW (2008) Value representations in the primate striatum during matching behavior. Neuron 58:451–463.
Lehmann J, Langer SZ (1983) The striatal cholinergic interneuron: synaptic target of dopaminergic terminals? Neuroscience 10:1105–1120.
Martin RF, Bowden DM (2000) Primate brain maps: structure of the macaque brain. Amsterdam: Elsevier Science.
Matsui T, Koyano KW, Koyama M, Nakahara K, Takeda M, Ohashi Y, Naya Y, Miyashita Y (2007) MRI-based localization of electrophysiological recording sites within the cerebral cortex at single-voxel accuracy. Nat Methods 4:161–168.
Matsumoto M, Hikosaka O (2007) Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447:1111–1115.
Matsumoto N, Minamimoto T, Graybiel AM, Kimura M (2001) Neurons in the thalamic CM-Pf complex supply striatal neurons with information about behaviorally significant sensory events. J Neurophysiol 85:960–976.
Mirenowicz J, Schultz W (1996) Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379:449–451.
Moran A, Bar-Gad I, Bergman H, Israel Z (2006) Real-time refinement of subthalamic nucleus targeting using Bayesian decision-making on the root mean square measure. Mov Disord 21:1425–1431.
Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H (2004) Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43:133–143.
Ravel S, Richmond BJ (2006) Dopamine neuronal responses in monkeys performing visually cued reward schedules. Eur J Neurosci 24:277–290.
Ravel S, Legallet E, Apicella P (1999) Tonically active neurons in the monkey striatum do not preferentially respond to appetitive stimuli. Exp Brain Res 128:531–534.
Ravel S, Legallet E, Apicella P (2003) Responses of tonically active neurons in the monkey striatum discriminate between motivationally opposing stimuli. J Neurosci 23:8489–8497.
Redgrave P, Gurney K (2006) The short-latency dopamine signal: a role in discovering novel actions? Nat Rev Neurosci 7:967–975.
Reynolds JN, Hyland BI, Wickens JR (2001) A cellular mechanism of reward-related learning. Nature 413:67–70.
Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80:1–27.
Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13:900–913.
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599.
Shimo Y, Hikosaka O (2001) Role of tonically active neurons in primate caudate in reward-oriented saccadic eye movement. J Neurosci 21:7804–7814.
Stoof JC, Drukarch B, de Boer P, Westerink BH, Groenewegen HJ (1992) Regulation of the activity of striatal cholinergic neurons by dopamine. Neuroscience 47:755–770.
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. Cambridge, MA: MIT.
Szabo J, Cowan WM (1984) A stereotaxic atlas of the brain of the cynomolgus monkey (Macaca fascicularis). J Comp Neurol 222:265–300.
Tversky A, Kahneman D (1981) The framing of decisions and the psychology of choice. Science 211:453–458.
Ungless MA, Magill PJ, Bolam JP (2004) Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science 303:2040–2042.
Wang Z, Kai L, Day M, Ronesi J, Yin HH, Ding J, Tkatch T, Lovinger DM, Surmeier DJ (2006) Dopaminergic control of corticostriatal long-term synaptic depression in medium spiny neurons is mediated by cholinergic interneurons. Neuron 50:443–452.
Yamada H, Matsumoto N, Kimura M (2004) Tonically active neurons in the primate caudate nucleus and putamen differentially encode instructed motivational outcomes of action. J Neurosci 24:3500–3510.
Yamada H, Matsumoto N, Kimura M (2007) History- and current instruction-based coding of forthcoming behavioral outcomes in the striatum. J Neurophysiol 98:3557–3567.

Results II. J Neurophysiol 101: 758–772, 2009. First published December 3, 2008; doi:10.1152/jn.90764.2008.
Encoding of Probabilistic Rewarding and Aversive Events by Pallidal and Nigral Neurons Mati Joshua,1,2 Avital Adler,1,2 Boris Rosin,1 Eilon Vaadia,1,2 and Hagai Bergman1,2,3 1 Department of Physiology, The Hebrew University–Hadassah Medical School; and 2The Interdisciplinary Center for Neural Computation and 3Eric Roland Center for Neurodegenerative Diseases, The Hebrew University, Jerusalem, Israel Submitted 14 July 2008; accepted in final form 1 December 2008 INTRODUCTION The neural network of the basal ganglia (BG) is commonly viewed as two functionally related subsystems (e.g., Bar-Gad and Bergman 2001; Gurney et al. 2004): the neuromodulator subsystem and the main-axis subsystem. The neuromodulators (e.g., midbrain dopaminergic neurons and cholinergic tonically active interneurons of the striatum, DANs and TANs, respectively) control plasticity of the corticostriatal synapse (Calabresi et al. 2000; Reynolds et al. 2001). The main-axis subsystem includes connections between all neocortical areas, the amygdala and the hippocampus and the BG input structures, i.e., the striatum (caudate, putamen, and ventral striatum) and the subthalamic nucleus. These project both directly and indirectly through the external segment of the globus pallidus (GPe) to the BG output structures: the internal segment of the globus pallidus (GPi) and the substantia nigra pars reticulata (SNr). The GPi and SNr modify behavior through their proAddress for reprint requests and other correspondence: M. Joshua, Department of Physiology, The Hebrew University–Hadassah Medical School, POB 12272, Jerusalem 91120, Israel (E-mail: [email protected]). 758 jection to the frontal cortex (via the thalamus) and brain stem premotor nuclei (Haber and Gdowski 2004). Previous studies on primates have shown that BG neuromodulator activity is modulated by expectation, delivery, and omission of rewards (Morris et al. 2004; Nakahara et al. 2004; Ravel et al. 2001; Schultz 1998). 
These data have been modeled in a reinforcement framework in which the dopamine neurons could signal prediction error (Schultz et al. 1997). Reward modulation of the main axis has mainly been studied at the level of the striatum (Apicella et al. 1992; Lau and Glimcher 2007; Lauwereyns et al. 2002; Samejima et al. 2005). Several studies have revealed discharge modulation of pallidal and SNr neurons by reward (Gdowski et al. 2001; Handel and Glimcher 2000; Pasquereau et al. 2007; Turner and Anderson 2005) and even by the probability of future reward (Arkadir et al. 2004). Nevertheless, understanding the full domain of value encoding by a neural network calls for study of neuronal responses to expectation, delivery, and omission of predicted aversive events as well. We recently reported that the responses of DANs and TANs of monkeys engaged in a probabilistic conditioning task involving both aversive and appetitive outcomes are biased toward the encoding of the rewarding events (Joshua et al. 2008). The BG main axis may be affected by other neuromodulator systems, e.g., serotonin (Daw et al. 2002; Parent et al. 1995), and thus may have a broader encoding domain than that of the TANs and the DANs. However, there are no studies on the responses of the primate BG main-axis high-frequency discharge (HFD) neurons to expectation of deterministic or probabilistic aversive events. We therefore used the same classical conditioning paradigm with aversive and rewarding probabilistic outcomes used in a previous study (Joshua et al. 2008) and recorded the activity of GPe, GPi, and SNr neurons in the same two monkeys that served as subjects for the recording of DANs and TANs activity. This enabled us to compare the different structures of the main axis and these structures and the main BG neuromodulators. We limited this study to the major neuronal population of these BG structures: the HFD neurons (DeLong 1971; Elias et al. 2007; Schultz 1986). 
Results II
VALUE ENCODING IN THE BASAL GANGLIA MAIN AXIS

Joshua M, Adler A, Rosin B, Vaadia E, Bergman H. Encoding of probabilistic rewarding and aversive events by pallidal and nigral neurons. J Neurophysiol 101: 758–772, 2009. First published December 3, 2008; doi:10.1152/jn.90764.2008. Copyright © 2009 The American Physiological Society. www.jn.org. Downloaded from jn.physiology.org on March 1, 2009.

Previous studies have rarely tested whether the activity of high-frequency discharge (HFD) neurons of the basal ganglia (BG) is modulated by expectation, delivery, and omission of aversive events. Therefore the full value domain encoded by the BG network is still unknown. We studied the activity of HFD neurons of the globus pallidus external segment (GPe, n = 310), internal segment (GPi, n = 149), and substantia nigra pars reticulata (SNr, n = 145) in two monkeys during a classical conditioning task with cues predicting the probability of food, neutral, or airpuff outcomes. The responses of BG HFD neurons were long-lasting and diverse with coincident increases and decreases in discharge rate. The population responses to reward-related events were larger than the responses to aversive and neutral-related events. The latter responses were similar, except for the responses to actual airpuff delivery. The fraction of responding cells was larger for reward-related events, with better discrimination between rewarding and aversive trials in the responses with an increase rather than a decrease in discharge rate. GPe and GPi single units were more strongly modulated and better reflected the probability of reward- than aversive-related events. SNr neurons were less biased toward the encoding of the rewarding events, especially during the outcome epoch. Finally, the latency of SNr responses to all predictive cues was shorter than the latency of pallidal responses. These results suggest preferential activation of the BG HFD neurons by rewarding compared with aversive events.

METHODS

All experimental protocols were performed in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and with the Hebrew University guidelines for the use and care of laboratory animals in research, supervised by the institutional animal care and use committee. Methods are explained in detail in a previous study (Joshua et al. 2008). Here we present a brief summary of these methods but describe in detail methods not used in the previous study.

Behavioral task

Recording and data acquisition

During the acquisition of the neuronal data, two experimenters (MJ and AA) controlled the position of eight coated tungsten microelectrodes (impedance 0.2–0.8 MΩ at 1,000 Hz) and the real-time spike sorting (AlphaMap, Alpha Spike Detector, Alpha-Omega Engineering) of the eight electrodes. Recorded units were subjected to off-line quality analysis that included tests for rate stability, refractory period, waveform isolation, and recording time. First, firing rate as a function of time during the recording session was graphically displayed and the largest continuous segments of stable data were selected for further analysis. Second, cells in which >0.02 of the total interspike intervals were <2 ms were excluded from the database. Third, only units with an isolation score (Joshua et al. 2007) >0.8 were included in the database. Finally, only cells that met the above-cited inclusion criteria for >20 min during the performance of the behavioral task were included in the neural database (average 56 min and 307 trials). Table 1 provides the statistics for the cells that were included in the analysis database.
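The numerical inclusion criteria above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the isolation score (Joshua et al. 2007) and the duration of the stable segment are taken as precomputed inputs, and the manual selection of stable segments is not modeled.

```python
import numpy as np


def fraction_short_isi(spike_times_s, threshold_s=0.002):
    """Fraction of interspike intervals (ISIs) shorter than threshold_s."""
    isis = np.diff(np.sort(np.asarray(spike_times_s, dtype=float)))
    if isis.size == 0:
        return 0.0
    return float(np.mean(isis < threshold_s))


def passes_inclusion(spike_times_s, isolation_score, stable_duration_s):
    """Apply the three numerical criteria stated in the text.

    isolation_score and stable_duration_s are assumed to be computed
    elsewhere (hypothetical inputs for this sketch).
    """
    return (
        fraction_short_isi(spike_times_s) <= 0.02  # excluded if > 0.02
        and isolation_score > 0.8                  # isolation score > 0.8
        and stable_duration_s > 20 * 60            # stable for > 20 min
    )


# A regular 80-spikes/s train lasting 25 min (synthetic data).
regular_train = np.arange(0.0, 1500.0, 0.0125)
keep = passes_inclusion(regular_train, isolation_score=0.95,
                        stable_duration_s=1500.0)
```

Keeping each criterion as a separate term makes it easy to report which test a rejected unit failed.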
GPe neurons were identified according to their stereotaxic coordinates (based on magnetic resonance imaging [MRI] and primate atlas data) and their real-time physiological identification. These physiological parameters included the characteristic symmetric, narrow, and high-amplitude spike shape; the typical firing rate and pattern (DeLong 1971); and the neuronal activity of the striatum obtained earlier

FIG. 1. Behavioral task and results. A: flow of the behavioral task. Two representative trials are shown. The outcome delivery on each trial was randomized according to a probability associated with the cue. Cue duration = 2 s; Outcome = 0.1–0.15 s; Intertrial Interval (ITI) = 3–8 s. Two of 7 possible cues with a different probability for food and airpuff are shown. Trial order and ITI length were randomized. B: fraction of trials with eyes closed (line: average; shadow: SE) as detected by a computerized eye state detection (ESD) algorithm. Trial epochs (cue, outcome: food or airpuff; no outcome: sound only) are aligned to event onset (time = 0). Note the overlap of 0.5 s between the start of the Outcome and the No-outcome epochs and the last 0.5 s of the Cue epoch. Data were averaged for each session (several hundred trials) and then across sessions (n = 105, number of recording sessions; 89,727, total number of trials). Color coding of trial types is given on the right (A, Aversive; N, Neutral; R, Reward; the number is the outcome probability). B1: enlargement of the last second of the cue epoch.
C: fraction of trials with licking as computed according to an infrared reflection detector signal directed at the monkey's mouth. Same conventions as in B (n = 125, number of recording sessions; 113,022, total number of trials).

Two monkeys (L and S, Macaca fascicularis, female 4 kg and male 5 kg) were introduced to seven different fractal visual cues, each predicting the outcome in a probabilistic manner. Three cues (reward cues) predicted a food outcome (L: 0.4 ml, 100-ms duration; S: 0.6 ml, 150-ms duration) with delivery probabilities of 1/3, 2/3, and 1; three cues (aversive cues) predicted an airpuff outcome (100- and 150-ms duration for L and S, respectively; 50–70 psi; split and directed 2 cm from each eye; Airstim System, San Diego Instruments) with delivery probabilities of 1/3, 2/3, and 1. The seventh cue (the neutral cue) was never followed by a food or an airpuff outcome. The full-screen cues were presented on a 17-in. monitor (located 50 cm from the monkeys' eyes) for 2 s and were immediately followed by an outcome (food, airpuff) or no outcome, according to the probabilities associated with the cue. Outcomes and outcome omissions were signaled by one of three sounds that discriminated the three possible events: a drop of food, an airpuff, or no outcome. Trials were followed by a variable intertrial interval (ITI, monkey S: 3–7 s; monkey L: 4–8 s; Fig. 1A).

TABLE 1.
The neural database

Population | Monkey | Number of Cells | Isolation Score | Fraction ISI <2 ms | Discharge Rate, spikes/s | Recorded Time, s | Number of Recorded Trials | Number of Spikes/Recorded Cell
GPe | L | 191 | 0.95 ± 0.05 [0.80–0.99] | 0.0034 ± 0.0042 [0–0.020] | 83.3 ± 21.7 [27–171] | 3,574.0 ± 2,136 [1,260–10,618] | 304 ± 187 [103–988] | 301,810 ± 207,937 [44,456–1,298,582]
GPe | S | 119 | 0.96 ± 0.04 [0.80–0.99] | 0.0020 ± 0.0038 [0–0.018] | 77.6 ± 25.7 [22–160] | 3,188.7 ± 1,591 [1,260–7,740] | 341 ± 198 [111–1,183] | 246,557 ± 145,823 [43,515–671,140]
GPi | L | 82 | 0.96 ± 0.04 [0.82–0.99] | 0.0034 ± 0.0037 [0–0.017] | 88.1 ± 21.9 [38–153] | 3,350.0 ± 1,136 [1,260–9,719] | 285 ± 150 [108–829] | 291,892 ± 157,110 [71,502–795,572]
GPi | S | 67 | 0.94 ± 0.05 [0.82–0.99] | 0.0036 ± 0.0046 [0–0.020] | 83.2 ± 18.7 [42–124] | 2,572.0 ± 1,136 [1,260–5,580] | 252 ± 118 [112–535] | 212,567 ± 114,868 [75,577–591,935]
SNr | L | 68 | 0.95 ± 0.05 [0.81–0.99] | 0.0025 ± 0.0042 [0–0.019] | 56.5 ± 26.1 [6–108] | 3,976.0 ± 2,136 [1,440–11,341] | 352 ± 219 [125–981] | 238,492 ± 225,442 [20,370–1,013,418]
SNr | S | 77 | 0.96 ± 0.04 [0.82–0.99] | 0.0020 ± 0.0040 [0–0.020] | 45.1 ± 17.1 [14–91.6] | 3,062.0 ± 1,136 [1,260–8,640] | 301 ± 176 [124–870] | 133,548 ± 91,485 [40,366–563,125]

Values are means ± SD, with the range of scores in brackets. Recording statistics were calculated separately for each monkey and each neural population. The range of the isolation score is 0 to 1. Fraction ISI <2 ms is the fraction of ISIs shorter than 2 ms out of all ISIs of a cell. Recording time and number of recorded trials represent only the part of the recording satisfying the inclusion criteria and included in the analysis database.

At the end of the experiment the chamber and head holder of both monkeys were removed, the skin was sutured, and following a recovery period the monkeys were sent to a primate sanctuary (http://monkeypark.co.il).

Statistical analysis of population responses

Responses of the HFD neurons in the GP (Arkadir et al. 2004; Georgopoulos et al.
1983; Mink and Thach 1991b; Mitchell et al. 1987a; Turner and Anderson 2005) and SNr (Nevet et al. 2007; Sato and Hikosaka 2002) to behavioral events are composed of either increases or decreases in discharge rate. For this reason, responses of BG main-axis neurons were calculated as the absolute deviation from the baseline of the firing rate (baselineFR) and then averaged across the population. However, this statistic does not have a natural zero baseline. To obtain such a baseline we calculated the average of the same statistic (i.e., absolute deviation from baseline) in the last 3 s of the ITI, using the same number of trials as those used for the calculation of the cell response, and denoted it as baselineabs. First, we define baselineFR as

\[
\mathrm{baseline}_{FR} \equiv \underset{0 \le t \le 3}{\mathrm{mean}} \left[ \mathrm{psth}_{ITI\_END}(t) \right]
\]

Then baselineabs is defined as

\[
\mathrm{baseline}_{abs} \equiv \underset{0 \le t \le 3}{\mathrm{mean}} \left\{ \mathrm{abs}\left[ \mathrm{psth}_{ITI\_END}(t) - \mathrm{baseline}_{FR} \right] \right\}
\]

Note that baselineabs calculates the mean fluctuations of the baseline firing rate around baselineFR. We then subtract this value from the response, i.e.

\[
\mathrm{response}(t) \equiv \mathrm{abs}\left[ \mathrm{psth}(t) - \mathrm{baseline}_{FR} \right] - \mathrm{baseline}_{abs}
\]

The average population response was defined as the average of the responses of all units (Figs. 3A, 5A, and 7A). To validate results obtained using this statistic we divided each cell's response into 1-ms bins with either increases or decreases in firing rate. We then averaged these responses separately across the populations. This analysis yielded the same qualitative result as the former (data not shown). In addition, we calculated the average peristimulus time histogram (PSTH) without the absolute operation (Supplemental Fig. S1).¹ Finally, some of the neurons had sustained ITI activity after reward delivery; we therefore also analyzed the population responses to cues following trials with no reward. This analysis yielded the same results as those of the whole population analysis (data not shown).

¹ The online version of this article contains supplemental data.
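The three definitions above translate directly into array operations. The following NumPy sketch is an illustration only: the function names and the toy numbers are ours, each PSTH is assumed to be an already smoothed 1-D rate array, and the trial-count matching between the ITI PSTH and the response PSTH is assumed to have been done upstream.

```python
import numpy as np


def absolute_response(psth, psth_iti_end):
    """response(t) = abs[psth(t) - baseline_FR] - baseline_abs for one cell.

    psth_iti_end is the PSTH over the last 3 s of the ITI.
    """
    psth = np.asarray(psth, dtype=float)
    iti = np.asarray(psth_iti_end, dtype=float)
    baseline_fr = iti.mean()                         # baseline_FR
    baseline_abs = np.abs(iti - baseline_fr).mean()  # baseline_abs
    return np.abs(psth - baseline_fr) - baseline_abs


def population_response(psths, iti_psths):
    """Average of the single-cell absolute responses."""
    return np.mean(
        [absolute_response(p, b) for p, b in zip(psths, iti_psths)], axis=0
    )


# Toy example: two cells with opposite 10-spikes/s modulations.
iti = np.array([48.0, 52.0, 48.0, 52.0])    # baseline_FR = 50, baseline_abs = 2
cell_up = np.array([50.0, 60.0, 60.0])
cell_down = np.array([50.0, 40.0, 40.0])

pop_abs = population_response([cell_up, cell_down], [iti, iti])
pop_signed = np.mean([cell_up, cell_down], axis=0)
```

In this toy case the signed average stays at the 50-spikes/s baseline because the two modulations cancel, whereas the absolute statistic reports an 8-spikes/s deviation, which is the motivation given in the text for using the absolute operator.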
in the same electrode trajectory to the GPe. The GPe cells can be categorized into two subgroups (DeLong 1971): one with a high-frequency discharge rate (in this study >20 Hz, HFD) and the other with a low-frequency discharge rate (LFD). Typically the discharge of the HFD neurons was found to be interrupted by long intervals of total silence (Elias et al. 2007) and the LFD firing pattern usually included short bursts with the amplitude of the spike declining along the burst. Pallidal border cells (Bezard et al. 2001; DeLong 1971; Mitchell et al. 1987b) were identified by their typical regular firing pattern and broad action potentials and were excluded from the study database.

Cells were also recorded from the output structures of the basal ganglia: the GPi and the SNr. Neurons of both structures were identified according to their stereotaxic coordinates (based on MRI and primate atlas data) and real-time physiological recordings. For GPi neurons, the identification criteria constituted the depth of the electrode, the physiological identification of border cells between the GPe and the GPi (DeLong 1971), and the real-time assessment of the firing pattern of the cell. SNr neurons were identified according to the electrophysiological characteristics (narrow spike shape and high firing rate) of the cells (DeLong et al. 1983; Schultz 1986) and the firing characteristics of neighboring neurons and fibers (e.g., fibers of the internal capsule, SN pars compacta (SNc) dopaminergic neurons, and fibers of the oculomotor nerve). We estimated the stereotaxic coordinates of the physiological recordings within the basal ganglia nuclei by alignment of MRI scans and the primate atlas (Martin and Bowden 2000) sections. By using these anatomical and physiological criteria we attempted to sample all territories of the three studied BG nuclei.
Three computerized digital video cameras recorded the monkey's face and upper limbs at 50 Hz. Video analysis was carried out on custom software to identify periods when the monkeys closed their eyes. Briefly, the monkey's eye location was identified by a human observer (once for a daily recording session in which the monkey's head was immobilized by connecting the head holder to an external metal frame); a classification of eye states (open or closed) was made based on the number of dark pixels in the eye area. The eye state detection (ESD) algorithm was tested by random samples from several recording days and found to be consistent with the judgments of a human observer for >99% of the images. Mouth movements were monitored by an infrared reflection detector (Dr. Bouis Devices, Karlsruhe, Germany). The infrared signal was filtered between 1 and 100 Hz by a band-pass four-pole Butterworth filter and sampled at 1.56 kHz. Based on these recordings we detected times in which the monkeys moved their mouths by implementing a threshold-based method. We compared mouth-movement detection with the video of the monkeys' faces over several recording days and found that they were consistent.

Statistical analysis of single-unit responses

We defined the difference index between the responses of a single cell to two events as the mean absolute difference between the corresponding PSTHs and used resampling (bootstrap) methods to test the significance of this index (Joshua et al. 2008). We calculated two difference indices. The first, the response index, measures the difference between the reward or aversive event and the neutral event. The second, the probability coding index, measures the difference between responses to the events with a high probability (P = 2/3 and 1) of receiving an outcome and responses to the events with a low probability (P = 1/3) of receiving the same outcome.
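As an illustration of the difference index, the sketch below computes the mean absolute difference between two trial-averaged PSTHs and attaches a resampling-based P value. The exact resampling procedure is described in Joshua et al. (2008) and is not reproduced here; the label-shuffling scheme, the function names, and the default of 1,000 resamples are assumptions made only for this example. Inputs are assumed to be arrays of shape (trials, time bins).

```python
import numpy as np


def difference_index(trials_a, trials_b):
    """Mean absolute difference between the two trial-averaged PSTHs."""
    psth_a = np.asarray(trials_a, dtype=float).mean(axis=0)
    psth_b = np.asarray(trials_b, dtype=float).mean(axis=0)
    return float(np.mean(np.abs(psth_a - psth_b)))


def resampling_p_value(trials_a, trials_b, n_resamples=1000, seed=0):
    """P value from shuffling trial labels (an assumed resampling scheme)."""
    rng = np.random.default_rng(seed)
    trials_a = np.asarray(trials_a, dtype=float)
    trials_b = np.asarray(trials_b, dtype=float)
    observed = difference_index(trials_a, trials_b)
    pooled = np.concatenate([trials_a, trials_b], axis=0)
    n_a = len(trials_a)
    n_extreme = 0
    for _ in range(n_resamples):
        order = rng.permutation(len(pooled))
        surrogate = difference_index(pooled[order[:n_a]], pooled[order[n_a:]])
        n_extreme += int(surrogate >= observed)
    # Add-one correction so the estimate is never exactly zero.
    return observed, (n_extreme + 1) / (n_resamples + 1)
```

Because the index is an absolute difference, it is positive even for two sets of trials drawn from the same distribution, and it shrinks as the number of trials grows. This is one way to see why the text restricts comparisons to indices computed from similar numbers of trials.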
We also calculated the temporal evolution of the fraction of cells with significant probability discrimination. Responses were binned in nonoverlapping 100-ms bins and tested for significance (ANOVA test, P < 0.01) at each time bin (Supplemental Fig. S2). Note that the statistical significance of the response and probability coding index analyses depends on the number of trials. In the response index analysis we compare the reward or aversive responses to the neutral trial response; however, there are relatively fewer neutral than aversive or rewarding trials. In the probability coding index analysis we compare the high and low probabilities that are usually introduced more often than the neutral cue (threefold more for the low-probability cue and fourfold more for the high-probability cue). Due to these limitations we did not compare between the response and probability coding indices but only between the same indices when the number of trials was similar (e.g., we compared the response index for the reward and aversive trials).

The responses of most HFD pallidal and SNr neurons to the cue were sustained and thus the deviation from rate baseline in the outcome and no-outcome epochs could be the result of a slow decay from the sustained cue-related activity. We tested whether activity after the ending of the cue (average rate in 1 s) differed significantly (t-test, P < 0.05) from both the activity before the cue (0.5 s precue) and from the activity at the end of the cue epoch (0.5 s before cue ending). Cells in which both of these tests were significant and activity did not fall between the precue and end-of-cue activity were considered to have a response that was not suspected to be due to decay of their discharge back to baseline level.

RESULTS

Monkey behavior reflected expectation of rewarding and aversive events

We recorded the monkeys' behavior during performance of a probabilistic classical conditioning task (Fig.
1A) with food or airpuff as the rewarding and aversive outcomes, respectively. We tested how extensive conditioning (several months, 5 days/week, ~1,000 trials/day) affected the monkeys' behavior by monitoring licking and blinking responses during neural recordings (Fig. 1, B and C). Figure 1 shows the average frequency of blinking and licking in all trial epochs. The frequency of licking increased in response to cues predicting food but only slightly to the aversive and neutral cues (Fig. 1C). Similarly the monkeys' frequency of blinking increased to cues predicting airpuff but only slightly to reward and neutral cues (Fig. 1B). The increase in blinking and licking during the cue epoch was maximal in trials where the probability of outcome was 2/3 or 1 and smaller in trials where the probability was 1/3. The frequency of the behavioral responses to reward and aversive events was only slightly larger for the licking versus the blinking responses. For example, at the end of the R1 (Reward, P = 1) cue presentation the monkeys increased their licking frequency from baseline by 40%, whereas in the A1 (Aversive, P = 1) cue the monkeys increased their blinking frequency by 35% (t-test, P = 0.057).

TABLE 2. The fraction of cells with a significant increase or decrease in discharge response

Population | Cue, Reward, Inc | Cue, Reward, Dec | Cue, Aversive, Inc | Cue, Aversive, Dec | Outcome, Reward, Inc | Outcome, Reward, Dec | Outcome, Aversive, Inc | Outcome, Aversive, Dec | No Outcome, Reward, Inc | No Outcome, Reward, Dec | No Outcome, Aversive, Inc | No Outcome, Aversive, Dec
GPe | 17.3 | 5.7 | 5.9 | 4.0 | 20.0 | 7.6 | 5.6 | 3.7 | 7.7 | 3.9 | 2.4 | 1.8
GPi | 12.0 | 4.8 | 6.5 | 3.2 | 16.7 | 8.6 | 5.7 | 2.5 | 7.6 | 3.2 | 2.1 | 1.2
SNr | 25.3 | 14.5 | 13.9 | 11.1 | 24.0 | 15.0 | 12.7 | 10.0 | 17.7 | 4.9 | 5.5 | 5.4

For each time bin (1 ms) the percentage of cells that responded to a given event by a significant (3-sigma rule) increase (Inc) or decrease (Dec) in firing rate was calculated; this percentage was then averaged across all time bins of each of the three epochs. Epoch duration of 2,000 ms, starting at the event, was used for the three epochs.
For example, in the GPe the average percentage of cells that responded with an increase in firing rate at the cue epoch was 17.3% (Cue–Reward–Inc–GPe entry). Note that this measure gives a smaller percentage than the overall number of responding cells since it is dependent on the fraction of bins with a significant modulation. Each neuron might make a different contribution to this average according to the number of significant response bins in the relevant epoch.

To determine significant responses in the single-unit analysis we calculated the SD of the PSTH of the last 3 s of the ITI using the same number of trials as in the target PSTH and identified time segments in which the response exceeded threefold the ITI SD (3-sigma rule). A response was considered significant only if the duration of the deviant segment was >60 ms (threefold the SD of the smoothing filter). To obtain the number of time bins in which a cell had a significant response to an event, we calculated the fraction of cells that had a significant response in each 1-ms time bin after an event. We divided these responses into increases and decreases in the firing rate and calculated the fraction of cells that increased their firing rate and the fraction of cells that decreased their firing rate during the response epoch (Figs. 3B, 5B, and 7B and Table 2).

The latency of a response was defined as the first bin in which a significant (3-sigma rule) response was detected. This conservative estimate of response latency enables comparison of the relative latencies of different neuronal populations; however, other methods (e.g., Berenyi et al. 2007; Ritov et al. 2002) might yield different estimates of the response latencies. For each population we calculated the median of the response latency and the confidence interval (CI) of this median.
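The 3-sigma rule, the minimum-duration criterion, the latency definition, and a percentile bootstrap of the median latency can be sketched as follows. This is a hedged illustration rather than the analysis code of the study: function names are hypothetical, PSTHs are assumed to be sampled in 1-ms bins, the deviation is measured from the mean of the ITI PSTH (an assumption, since the text specifies only the SD), and the bootstrap shown is the plain percentile variant without bias correction.

```python
import numpy as np


def significant_bins(psth, iti_psth, n_sigma=3.0, min_duration_bins=60):
    """Label each 1-ms bin +1 (increase), -1 (decrease), or 0 (none).

    A bin counts only if it lies in a run of consecutive bins deviating
    from the ITI mean by more than n_sigma ITI SDs, and the run is
    longer than min_duration_bins.
    """
    psth = np.asarray(psth, dtype=float)
    iti = np.asarray(iti_psth, dtype=float)
    deviation = psth - iti.mean()
    exceeds = np.abs(deviation) > n_sigma * iti.std()
    labels = np.zeros(psth.size, dtype=int)
    start = 0
    while start < psth.size:
        if not exceeds[start]:
            start += 1
            continue
        end = start
        while end < psth.size and exceeds[end]:
            end += 1
        if end - start > min_duration_bins:
            labels[start:end] = np.sign(deviation[start:end])
        start = end
    return labels


def response_latency(labels):
    """Index of the first significant bin, or None if there is none."""
    idx = np.flatnonzero(labels)
    return int(idx[0]) if idx.size else None


def bootstrap_median_ci(latencies, n_resamples=1000, seed=0):
    """Median and percentile-bootstrap 95% CI (resampling with repetition)."""
    rng = np.random.default_rng(seed)
    latencies = np.asarray(latencies, dtype=float)
    medians = [
        np.median(rng.choice(latencies, size=latencies.size, replace=True))
        for _ in range(n_resamples)
    ]
    low, high = np.percentile(medians, [2.5, 97.5])
    return float(np.median(latencies)), (float(low), float(high))
```

Averaging `labels == 1` and `labels == -1` across cells at each bin would give the fraction-of-cells curves of the kind summarized in Table 2.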
The CI was calculated by resampling (bootstrapping with repetitions) the latencies and recalculating the median of these surrogates. We repeated this process 1,000 times and the 95% CI was determined as the boundary values that included 95% of the median surrogates (excluding 2.5% above and 2.5% below the boundaries). Calculating the CI with bias correction gave similar results.

The behavioral responses to food or airpuff delivery were not dependent on their previous predictions (Fig. 1, B and C, outcome). Food and airpuff omission, as well as the final (no-outcome) event of the neutral trials, were indicated to the monkeys by a "no-outcome" sound. When expected food or airpuff was not delivered (no outcome of the P = 1/3 or P = 2/3 trials) the licking and blinking frequency increased, respectively; this increase was in line with the previously instructed probability. Licking and blinking increased slightly to the neutral trials (Fig. 1, B and C, no outcome, green line). Analysis of the behavioral responses indicates that the monkeys could distinguish between aversive, reward, and neutral cues and between the high (P = 2/3 and 1) and low (P = 1/3) outcome

We recorded 592 GPe, 267 GPi, and 226 SNr units during the performance of the probabilistic conditioning task (Fig. 1A); out of these, 310 GPe, 149 GPi, and 145 SNr units passed the quality inclusion criteria (see METHODS) and their responses were further analyzed (Table 1). Figure 2 shows examples of the responses of neurons from GPe, GPi, and SNr to the 18 events of our behavioral task. The GPe neuron in Fig.
2A had a large response in the reward-

probabilities. Accordingly, we grouped the events with high probability (P = 2/3 and P = 1) for the neural activity analysis.

Neuronal database

FIG. 2. Neural activity of neurons of the globus pallidus external and internal segments (GPe and GPi, respectively) and substantia nigra pars reticulata (SNr). A: peristimulus time histograms (PSTHs) of a single GPe cell of monkey L aligned to the trial behavioral events. The rows are separated according to the expected outcome. First row: trials with cues that predict the delivery of food. Second row: trials with the neutral cue (a cue always followed by no outcome). Third row: trials with cues that predict an airpuff. Columns are aligned according to the trial epoch. First column: cue presentation epoch (−0.5 to 2 s after cue onset). Second column: outcome epoch (−0.5 to 2 s after delivery of food or airpuff). Third column: trials in which no outcome was delivered; outcome omission was signaled to the monkey by the no-outcome sound (−0.5 to 2 s after sound onset). The first 0.5 s of the 2nd and 3rd columns overlaps the last 0.5 s of the left column. Gray-level codes are marked on the middle plot (A, Aversive; N, Neutral; R, Reward; the number is the outcome probability). PSTHs were constructed by summing activity across trials in 1-ms resolution and then smoothing with a Gaussian window (SD of 20 ms). Total number of trials in this example = 511; isolation score = 0.98; fraction of spikes in first 2 ms of the interspike interval (ISI) histogram = 0.0007. B: same conventions as in A for a GPi neuron. Total number of trials = 530; isolation score = 0.99; fraction of spikes in first 2 ms of the ISI histogram = 0.0002. C: same conventions as in A for a SNr neuron. Total number of trials = 234; isolation score = 0.98; fraction of spikes in first 2 ms of the ISI histogram = 0.0007.
D: example of 2 raster plots from the SNr neuron in C. Top: raster of the R1 cue. Bottom: raster of the A1 cue. Vertical black arrows mark the cue onset. E: an example of the analog data (after digital 250- to 6,000-Hz band-pass filter) for a single trial (marked by a gray horizontal arrow in D). The last row contains a magnified 0.75-s segment from the 2.5-s analog segment above.

Activity was asymmetrically modulated by expectation of aversive events and reward in the cue epoch

Figure 3 shows the population analysis of the absolute response (deviation from the background discharge rate) of GPe, GPi, and SNr neurons to the cues. The absolute population response (see METHODS) was sustained and spanned the complete (2-s) duration of the cue epoch. The GPe and SNr population responses to reward cues were significantly larger than the responses to aversive and neutral cues (Fig. 3A). Furthermore, in the beginning of the cue epoch, responses were larger for the cues indicating a high probability of future reward than for the low-probability cue; however, this probability-dependent difference was not observed for aversive cues. Compared with the large differential modulation of the GPe and SNr, the difference in GPi population response between reward and aversive events was small and the population response did not robustly differentiate reward probabilities (Fig. 3A).

We used the absolute operator to examine the deviation of the discharge rate of the BG main axis from their baseline (ITI) discharge rate since the high-frequency tonic discharge (Table 1) of these neurons enables them to respond with both increases and decreases in their discharge rate. Absolute population analysis assumes that opposite modulations can be detected by the nervous system (for example, due to specificity in connectivity); however, this may not be the case.
Thus we also performed the population analysis without using the absolute operator. This analysis assumes that target structures are homogeneously innervated by neurons of the studied structure and do not keep labeled lines for individual neurons with increases or decreases in discharge rate. This standard population analysis revealed the same trends of larger responses for reward cues (Supplemental Fig. S1).

FIG. 3. Population response in the cue epoch. A: population responses (average ± SE) to the task cues. The PSTHs were calculated with the absolute operator and show the mean deviation from the background activity. Top: GPe (n = 310 neurons). Middle: GPi (n = 149). Bottom: SNr (n = 145). Color coding: dark blue, responses to high-probability (P = 1 and P = 2/3) reward cues; light blue, low-probability (P = 1/3) reward cue; green, neutral cue; orange, aversive low-probability cue; red, aversive high-probability cue. B: fraction of cells with significant (3-sigma rule) modulations of firing rate in the cue epochs. Blue, responses to all reward predicting cues; red, responses to all aversive predicting cues. Neutral events are not included because of their relatively lower number and to enable inclusion of all rewarding/aversive events in the statistical tests. The ordinate is the fraction of cells that had a significant response at each time bin (1 ms). The values above zero are the fraction of cells that significantly increased their firing rate; the values below zero are the fraction of cells that significantly decreased their firing rate.

predicting cue conditions (top left) that differentiates between reward probabilities.
This neuron had a small response in the aversive and neutral conditions (middle and bottom left). Discharge rate returned rapidly to baseline in the outcome and no-outcome (omission) phases (middle and right columns). Figure 2B shows a GPi neuron; as in the GPe example, responses of the GPi neuron were largest in the reward-cue conditions (top left). Unlike the neuron in Fig. 2A, this neuron also responded to the aversive cue (bottom left); however, this response was similar to the response to the neutral cue (middle left). Finally, the SNr neuron in Fig. 2C also responded to the reward cue and only slightly to the aversive cue (left column). However, unlike the other two neurons, this neuron responded mainly with a decrease to the reward cue. Notably this neuron had a very large response when reward was omitted (top right). To summarize, all the neurons had larger responses to the reward cues than to the aversive cue. Furthermore, reward probability, but not aversive probability, was encoded by these neurons. The neurons that did respond to the aversive cue responded similarly to the aversive and neutral cues. In the following text we provide further analyses of both the population PSTH and the single-cell responses of all recorded neurons at the three BG structures.

[Fig. 4 panel insets. A, cells with a significant response index: GPe, N = 108/310 (34.8%); GPi, N = 50/149 (33.6%); SNr, N = 75/145 (51.7%). B, cells with a significant probability coding index: GPe, N = 103/310 (33.2%); GPi, N = 49/149 (32.9%); SNr, N = 77/145 (53.1%).]

2008). Note that in the population PSTH analysis we found a substantial response to the aversive cue.
However, in the next sections we show that these responses were similar to the response to the neutral cue and thus do not reflect the expectation of an aversive event.

The population PSTH is an average measure and therefore may be biased by a few neurons with an extreme response and, likewise, opposite effects may be averaged out. On the other hand, the fractional analysis classifies the responding bins in a binary rather than a graded way. We therefore formulated the difference index as a measure of the modulations of a single neuron to different events. For the response index, we grouped responses across probabilities and tested whether single-cell responses to reward and aversive cues were different from their response to the neutral cue. Figure 4A shows the scatterplots comparing the response index for the reward and aversive trials. Many of the BG main-axis neurons had a significant reward and/or aversive response index (GPe: 34.8%; GPi: 33.6%; and SNr: 51.7% of the total number of recorded neurons), indicating a significant difference between these responses and the responses to the neutral cue. In all populations, the response index for the reward trials of most neurons was larger than the response index for aversive trials (Fig. 4A). A substantial fraction of the BG units showed a significant response index for reward cues, whereas

FIG. 4. Single-cell responses in the cue epoch. A: log-log scatterplots comparing the response index of individual neurons to reward and aversive cues. The response index was calculated for each cell (310 GPe, 149 GPi, and 145 SNr neurons) as the absolute difference between the aversive or reward cue-aligned PSTH and the PSTH of the neutral cue. The black line is the identity (Y = X) line. Points below this line represent cells with a response index that was larger for the reward cues than for aversive cues. Top: GPe. Middle: GPi. Bottom: SNr.
Color code: blue, response index significant only for reward cues; red, response index significant only for aversive cues; green, both response indices were significant; gray, neither response index was significant. Significance level was P < 0.05. The time window used for this analysis was 0–2,000 ms from cue presentation. Inset: pie chart of the fraction of cells with a significant index for reward (blue), aversive (red), and both (green) cues out of all cells with a significant response index (number of responding cells is given in the text at inset, top). B: log-log scatterplots comparing the probability coding of rewarding and aversive events by individual GPe, GPi, and SNr neurons. The index was calculated as the difference between the grouped response to the high-probability (P = 2/3 and P = 1) and the low-probability (P = 1/3) events. Color code: blue, probability coding index significant only for reward cues; red, probability coding index significant only for aversive cues; green, both response indices were significant; gray, neither response index was significant. Points below the identity line represent cells with a probability coding index that was larger for the reward cues than for aversive cues. Inset: pie chart of the fraction of cells with a significant probability coding index for reward only (blue), for aversive only (red), and for both (green) cues out of all cells with a significant probability index.

This is probably due to the larger fraction of these neurons that responded with increases rather than with decreases in their firing rate (Fig. 3B and Table 2). Figure 3B shows the fraction of cells that increased or decreased their rate at each time after the cue presentation.
Unlike the population PSTH analysis, this analysis uses a cutoff (3-sigma rule; see METHODS) for the identification of bins with a significant deviation from the background discharge rate. In line with the population PSTH analysis, the fraction of cells that significantly modulated their firing rate in each of the 1-ms bins of the cue epoch was larger for the reward cue than for the aversive cue. This difference in the number of cells with significant responses to the reward versus the aversive cue was larger for increases in firing rate than for decreases (Fig. 3B and Table 2). Comparing the patterns of response bins with the increase versus decrease in discharge rate showed that, unlike the BG neuromodulators (Joshua et al. 2008), these opposing responses were coincident (Fig. 3B); i.e., some of the cells increased their firing rate whereas others decreased it at the same time. Finally, both the population (Fig. 3A and Supplemental Fig. S1) and the fractional analyses (Fig. 3B) showed that activity in the main axis was sustained, which contrasts with the phasic responses of the neuromodulators (Joshua et al.

Results II VALUE ENCODING IN THE BASAL GANGLIA MAIN AXIS

In the outcome epoch neurons responded both to food and air puff delivery but, unlike in the cue epoch, they did not consistently encode the probability of these events

Figure 5 shows the population PSTH and fraction of responding cells for the outcome epoch. PSTH population analysis of the outcome epoch showed that all BG main-axis populations responded to both reward and aversive outcomes (Fig. 5A). Responses in this epoch to the neutral trials (i.e., when no reward or airpuff was expected) were small (Fig. 5A, green traces, and next paragraph). In the GPe and GPi the peak response was larger for the food outcome than that for the airpuff, whereas in the SNr the magnitude of the peak response to aversive and reward outcomes was similar.
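The 3-sigma cutoff used in the fractional analysis can be sketched as follows. This is a minimal illustration under stated assumptions: the background discharge is taken from a set of baseline bins, the sample standard deviation is used, and the function name `significant_bins` and all rates are hypothetical. The actual procedure is the one given in METHODS.

```python
from statistics import mean, stdev


def significant_bins(response, baseline, n_sigma=3.0):
    """Label each response bin as +1 (increase), -1 (decrease), or 0.

    A bin is significant when it deviates from the baseline mean by more
    than n_sigma baseline standard deviations (assumed 3-sigma rule).
    """
    mu = mean(baseline)
    sd = stdev(baseline)
    labels = []
    for rate in response:
        if rate > mu + n_sigma * sd:
            labels.append(1)
        elif rate < mu - n_sigma * sd:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Hypothetical background and cue-epoch rates (spikes/s).
baseline_rates = [58.0, 60.0, 62.0, 60.0, 59.0, 61.0]
cue_rates = [61.0, 70.0, 64.0, 50.0, 57.0]

labels = significant_bins(cue_rates, baseline_rates)
print(labels)
```

Counting, per time bin, how many cells carry a +1 or a -1 label gives the kind of "fraction of cells" trace shown in Fig. 3B, with increases and decreases tallied separately.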
Unlike the population cue responses, the population responses to the outcomes that followed cues indicating different outcome probabilities were similar and the SNr population alone showed a slight difference at food delivery time (Fig. 5A). As in the cue epoch, the BG responses to the different outcomes contained both increases and decreases, with more cells increasing than decreasing their firing rate (Fig. 5B and Table 2) and the differences between the average responses to reward versus aversive outcomes were due to differences in the fraction of cells responding with increases in discharge rate (Table 2). Figure 6 shows the response index and probability coding index analysis in the outcome epoch. The GPe and GPi responses to the reward outcome were larger and more frequent than the responses to aversive outcome (Fig. 6A, top subplots). However, many SNr cells responded to both food and airpuff outcomes (Fig. 6A, bottom). Contrary to the population analysis (Fig. 5A), many GPe, GPi, and SNr cells did in fact encode the difference between high- and low-reward probabilities (Fig. 6B). These differences between the average population and the single-unit analysis suggest that the absence of significant probability coding in the population analysis can be attributed to opposite modulation effects; i.e., some cells had a

[Fig. 5 panels: GPe, GPi, SNr. Ordinates: Spike/s (A) and Fraction of cells (B). Legend: Rwd High P, Rwd Low P, Neutral, Avr Low P, Avr High P; Reward, Aversive; Increase, Decrease.]

FIG. 5. Population response in outcome epoch. A: population responses at the time of outcome delivery (blue, food; red, airpuff) and the response to the neutral noise in the trials when no outcome was expected (green, neutral trials). The PSTHs are calculated with the absolute operator and show mean deviation from baseline. B: fraction of cells with significant modulations of firing rate. Same conventions as in Fig. 3.
only a small number of cells had a significant response index for aversive cues (Fig. 4A, insets). The probability coding index compares the difference between reward and aversive probability coding. For this purpose we classified the cues into high-probability (P = 2/3 and 1) and low-probability (P = 1/3) cues (in accordance with the monkeys’ behavior; Fig. 1). In Fig. 4B we show scatterplots of the probability coding indices of three neuronal populations. In addition to the larger reward response index (Fig. 4A), coding of the reward probability was larger (Fig. 4B) and more frequent (Fig. 4B, insets) than coding of the aversive probability in the three neuronal populations. Supplemental Fig. S2 shows the time course of the probability encoding (100-ms bins, ANOVA). In most cases, a sustained encoding is seen that is greater for the rewarding than that for the aversive trials. In both of these difference index analyses (response index and probability coding index) the fraction of cells with a significant index was larger for SNr (51–53% of the cells) than that for GPe and GPi (32–34%; Fig. 4, inset text; χ² test, P < 0.01). The difference in the fraction of cells between the GPe and GPi was not significant (χ² test, P = 0.78).

JOSHUA, ADLER, ROSIN, VAADIA, AND BERGMAN

[Fig. 6 inset counts: GPe N=180/310 (58.1%) and N=80/310 (25.8%); GPi N=87/149 (58.4%) and N=27/149 (18.1%).]

FIG. 6. Single-cell responses in the outcome epoch. A: log-log scatterplots comparing the response index of individual neurons to reward and aversive outcomes. B: log-log scatterplot comparing the probability coding index of the responses of individual neurons to reward and aversive outcomes. Same conventions as in Fig. 4.
In this analysis we used a short time window of 0–1,000 ms from outcome delivery to enable better comparison between the fast response to the aversive event and the slower response to reward outcome.

[Fig. 6 inset counts: SNr N=122/145 (84.1%) and N=49/145 (33.8%). Abscissa labels: reward response index (Spike/s); reward probability index (Spike/s).]

larger response to the high probability, whereas others had a larger response to the lower-probability trials. Finally, the fraction of SNr neurons with a significant response index (84%) was greater than the corresponding fraction of GPe and GPi cells (58%; Fig. 6; χ² test, P < 0.01). The fraction of SNr cells with probability coding indices (34%) was greater than the corresponding fraction of GPe and GPi cells (25 and 18%, respectively, Fig. 6B). However, this difference in fraction of cells was significant only for the GPi (χ² test, P < 0.01). Encoding of reward prediction error would predict the opposite trend in the coding of reward probability in the cue and outcome (Fiorillo et al. 2003; Morris et al. 2004). To probe this possibility we tested for correlations between the difference in response to the high and low probabilities at the cue epoch versus the difference at the outcome epoch. For the GPe and GPi we found a small positive correlation coefficient (CC = 0.16 and 0.34, respectively; t-test, P < 0.01); for the SNr we found a small negative correlation that did not reach significance (CC = −0.08; P = 0.32). Thus we conclude that HFD neurons of the main axis of the BG do not encode the prediction error.

Neural response in the no-outcome epoch

Figure 7 shows the PSTH population and fraction of responding cells for the no-outcome epoch. As in the cue epoch, population analysis in the no-outcome epoch showed that responses to reward omission trials were larger (Fig. 7A) and more frequent (Fig. 7B) than responses to aversive omission trials.
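The comparison of responding fractions between nuclei can be illustrated with a short sketch of a χ² test on two proportions. It uses the counts that appear in the Fig. 6 insets (SNr 122/145, GPe 180/310) and assumes a Pearson χ² test on a 2 × 2 table with 1 degree of freedom and no continuity correction; the source does not state which variant was used, and the function name `chi2_two_fractions` is hypothetical.

```python
import math


def chi2_two_fractions(k1, n1, k2, n2):
    """Pearson chi-square statistic and P value for k1/n1 versus k2/n2.

    Assumption: 2 x 2 table, 1 degree of freedom, no continuity correction.
    """
    a, b = k1, n1 - k1  # group 1: responding, not responding
    c, d = k2, n2 - k2  # group 2: responding, not responding
    n = n1 + n2
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 dof, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Outcome-epoch counts from the Fig. 6 insets: SNr versus GPe.
chi2, p_value = chi2_two_fractions(122, 145, 180, 310)
print(f"chi2 = {chi2:.2f}, P = {p_value:.2g}")
```

Under these assumptions the P value falls well below the 0.01 threshold reported in the text for the SNr versus pallidal comparison.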
As in the outcome epoch the difference between the population responses to omission of high- and low-probability outcomes was small (Fig. 7A). The population response (Fig. 7A) and the fraction of units with significant changes in their discharge rate (Fig. 7B) to outcome omission declined rapidly and reached the baseline within <1.5 s. This contrasts with the outcome responses where the response did not decline to the background (ITI) level even after 2 s (Fig. 5, A and B). Figure 8 shows the response index and probability coding index analysis in the no-outcome epoch. This single-cell analysis shows that, comparable to the population analysis, cell responses to the reward omission were larger and more frequent than their responses to aversive omission (Fig. 8A) and more cells encoded the a priori reward probability than the aversive probability (Fig. 8B). The fraction of SNr cells with a significant response index (41%) was greater than the fraction of GPe and GPi neurons with a significant response index (30 and 33%, respectively). This difference was significant only for the GPe (χ² test, P < 0.05). The fraction of SNr cells with a significant probability coding index (34%) was greater than the fraction of GPe and GPi neurons with a significant index (22 and 19%, respectively; χ² test, P < 0.05).

[Scatterplot ordinate labels: aversive response index (Spike/s); aversive probability index (Spike/s). Pie-chart legend: reward, both, aversive, none. Fig. 7 panels: GPe, GPi, SNr; ordinates Spike/s (A) and Fraction of cells (B) versus Time (s).]

Activity in the outcome and no-outcome epochs did not only reflect decay from sustained cue activity

Activity in the cue epoch is sustained and continues until the end of the cue epoch (Fig.
3 and Supplemental Fig. S1). Thus activity after the cue epoch (i.e., at outcome and no-outcome epochs) could reflect a slow decay of cue-related activity to the tonic discharge level of these neurons. For example, the response of the GPe neuron in Fig. 2A at the outcome epoch (Fig. 2A, top middle plot) could be attributed to a slow decay from cue activity. A contrasting example is the response of the SNr neuron in Fig. 2C at the no-outcome epoch (Fig. 2C, top right plot). This response cannot be attributed to a slow decay since it shows a clear increase after reward omission (no outcome). In Fig. 9 we show the percentage of cells whose activity in the outcome/no-outcome epochs was significantly different from the precue activity and the percentage of cells from these groups in which activity did not reflect decay (see METHODS). We found that many of the responses to the reward outcome could not be attributed to slow decay of the sustained cue activity (Fig. 9A, black bars; GPe: 28%; GPi: 22%; and SNr: 40% out of the whole population). The number of responses (that cannot be attributed to decay of cue activity) to aversive outcome was smaller than the number of responses to reward outcome (Fig. 9A, gray bars; GPe: 4%; GPi: 6%; and SNr: 20%). Very few GPe and GPi cells responded to reward omission itself (Fig. 9B, black bars; GPe: 9%, GPi: 6%); however, in the SNr a larger fraction of cells responded (decay excluded) to reward omission (Fig. 9B, black bar; SNr: 21%). In all the structures the number of cells that responded (decay excluded) to aversive omission was smaller than the fraction of cells that responded to reward omission (Fig. 9B, white bar; GPe: 1%; GPi: 1%; and SNr: 8%). In summary, we found that activity in the outcome/no-outcome epoch did not only reflect the decay from sustained cue-related activity and that BG HFD cells clearly encode outcome and no-outcome events.
Note that the fraction of cells of which we could rule out the possibility of decay from sustained activity is a lower limit of the actual number of responding cells. This is because our method for testing the null hypothesis, that activity is not due to decay, is very conservative (i.e., the discharge at the outcome or the no-outcome epoch may fall between the ITI and the end of cue discharge level and still reflect a valid response to the outcome or no-outcome events). Other methods that include interpolation of the whole temporal pattern of the response may report a larger number of responding cells to the outcome and no-outcome events.

FIG. 7. Population response in no-outcome epoch. A: population responses in trials with no food or airpuff delivery. Mean deviations from background calculated with the absolute operator are shown. The same no-outcome tone was given at time = 0 in all trials. B: fraction of cells with significant modulations of firing rate. Same conventions as in Fig. 3.

[Fig. 8 inset counts: N=95/310 (30.6%); N=50/149 (33.6%); N=50/145 (34.5%). Abscissa labels: reward response index (Spike/s); reward probability index (Spike/s). Pie-chart legend: reward, both, aversive, none.]

SNr neurons responded with shorter latencies than those of GPe and GPi neurons

Figure 10 shows the analysis of the response latency to the reward and aversive cues. The latency of SNr responses was significantly shorter than the responses of the GPe and GPi (Fig. 10A; Mann–Whitney, P < 0.001). No difference between the GPe and GPi populations was found (P = 0.93). We grouped the responses to reward and aversive cues and the increase and decrease responses since we did not find any significant difference between these parameters (Fig. 10, B and C).
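The Fig. 10 caption states that the confidence intervals of the median latency were calculated using bootstrap methods. A minimal sketch of one common variant, the percentile bootstrap, is given here; the choice of variant, the number of resamples, the function name `bootstrap_median_ci`, and the latency values are all assumptions made for illustration rather than details taken from the study.

```python
import random
from statistics import median


def bootstrap_median_ci(samples, n_boot=2000, alpha=0.05, seed=0):
    """Median with a percentile-bootstrap confidence interval.

    Resamples with replacement n_boot times and reads the interval limits
    from the sorted resampled medians. The limits need not be symmetric
    around the median when the data are skewed.
    """
    rng = random.Random(seed)
    n = len(samples)
    medians = sorted(median(rng.choices(samples, k=n)) for _ in range(n_boot))
    low = medians[int((alpha / 2) * n_boot)]
    high = medians[int((1 - alpha / 2) * n_boot) - 1]
    return median(samples), low, high


# Hypothetical, right-skewed response latencies (ms) for one population.
latencies = [110, 120, 135, 150, 160, 180, 210, 260, 340, 520]

median_latency, ci_low, ci_high = bootstrap_median_ci(latencies)
print(median_latency, ci_low, ci_high)
```

Because latency distributions are typically skewed toward long values, the upper and lower limits returned by such a procedure may be uneven, as the caption notes.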
Although not significant, the GPi decrease response tended to be earlier than the increase response (Turner and Anderson 1997). We did see similar trends in the responses in the outcome and the no-outcome epochs. However, the persistent but nevertheless nonsteady activity of the BG neurons during the cue response (Fig. 3A) prevented us from establishing a reliable baseline for testing the outcome epoch responses and thus we carried out only the latency analysis for the cue epoch.

DISCUSSION

In this report we extended our previous study (Joshua et al. 2008) to the study of the responses of BG main-axis HFD neurons to expectation, delivery, and omission of appetitive (food), aversive (airpuff), and neutral (sound only) events. We found that the responses of GPe, GPi, and SNr neurons were longer in duration and less stereotypic than the responses of the main BG neuromodulators (TANs and DANs). As with the TANs and DANs, the responses of the BG

FIG. 8. Single-cell responses in the no-outcome epoch. A: log-log scatterplots comparing the response index of individual neurons in trials in which food and airpuff were not delivered. B: scatterplot comparing the probability coding index to food and airpuff omissions. Since P = 1 trials were never omitted, high-probability trials include only P = 2/3 trials. Same conventions as in Fig. 4 and same time window as in Fig. 6 (0–1,000 ms from end of cue epoch and the onset of the omission sound).

[Fig. 9 panels: A, Outcome; B, No Outcome. Ordinate: Fraction of cells (0 to 0.6) for GPe, GPi, and SNr. Legend: Reward non decay; Aversive non decay; Decay not excluded.]

FIG. 9. Fraction of cells responding at the end stage of the trials.
A: black: fraction of cells that responded to the reward outcome itself and their outcome activity does not reflect decay from reward cue response; gray: fraction of cells that responded to the aversive outcome itself; activity does not reflect decay from aversive cue response. White bars: total fraction of cells with a significant difference between the precue and the outcome epoch (reward above black; aversive above gray) in which the possibility of decay activity was not excluded. B: same as A for the no-outcome epoch.

[Fig. 8 inset counts: N=60/145 (41.4%); N=34/149 (22.8%); N=59/310 (19.0%). Ordinate labels: aversive response index (Spike/s); aversive probability index (Spike/s).]

[Fig. 10 panels: A, Fraction of cells versus Time (ms) for GPe, GPi, and SNr; B, Reward and Aversive, Time (ms); C, Increase and Decrease, Time (ms).]

Neural responses were larger for the reward than for the aversive trials

We found preferential activation to reward versus aversive events. One possible explanation for this asymmetric neural activity is that the asymmetry arises from differences in the relative value of the rewarding/aversive stimuli that we used. An alternative possibility is that the encoding of reward/aversion expectation is asymmetric in the BG. We find the second possibility more likely since the population responses to the aversive predicting cue and to the neutral cue were remarkably similar (Fig. 3) and very few cells encoded the cue predicting airpuff (Fig. 4). It could be argued that the monkeys ignored the air puff; we have shown this is not the case since there were large behavioral responses to cues predicting the airpuff (Fig. 1).
In a previous experiment (Mirenowicz and Schultz 1996), in which the subjective values were calibrated, the authors compared a reward of 0.15 ml of juice and an aversive 28- to 58-psi airpuff directed to the hand. Similar airpuff intensities have been used in other studies comparing the responses of amygdala neurons (40- to 60-psi airpuff vs. 0.1–0.9 ml of liquid food) (Belova et al. 2007; Paton et al. 2006) and lateral prefrontal cortex neurons (29-psi airpuff, 10 cm from the monkey’s face) (Kobayashi et al. 2006) to both rewarding and aversive events. The airpuff in the current experiment was larger (50–70 psi) and delivered 2 cm from the monkey’s eyes. Thus this larger and closer airpuff must have had a negative subjective value. We further discuss the possibility of asymmetric encoding in the following text.

Preferential control over reward-related behavior

We have shown that just before the end of the cue, the fraction of trials in which the monkey licked in expectation of future reward and the fraction of trials in which the monkey blinked in expectation of future airpuff were similar in magnitude. In addition we found a large blinking response even when the airpuff was omitted (Fig. 1C). Finally, with the exception of the outcome epoch, the licking and the blinking behaviors reflected the expected (low vs. high) probability of the reward and the aversive events. Nevertheless, the BG single-cell activity was found to be biased toward the encoding of reward-related events and encoding of aversive events was very weak. This difference in activity may be compensated by the difference in synaptic connections between the BG and their targets; however, such differences have yet to be described. Several studies have used similar paradigms to compare neural responses to reward food and aversive airpuff (Kobayashi et al. 2006; Mirenowicz and Schultz 1996; Paton et al. 2006). Paton et al.
(2006) showed that in the amygdala, expectations of food and airpuff are represented symmetrically. Our research shows that, in contrast to the amygdala, food and airpuff expectations are represented asymmetrically in the basal ganglia. Thus we found comparable aversive- and reward-related behaviors; however, whereas the activity in the basal ganglia strongly reflects reward behavior and encodes probability, aversive-related events and their probability are only weakly encoded in basal ganglia activity. Although we found similarity in the behavioral responses (Fig. 1, B and C), in this study we did not calibrate the subjective value (utility) of food versus airpuff; however, we did manipulate the expectation of aversive outcome. In previous instrumental conditioning experiments, including both reward and aversive events, the monkey could avoid the aversive airpuff by a correct response (Mirenowicz and Schultz 1996; Yamada et al. 2004, 2007). In the current experiment the airpuff was unavoidable and thus the aversive cue led to direct expectation of aversion. In a previous study (Joshua et al. 2008), we reported that the responses of midbrain DANs and striatal TANs (of the same monkeys engaged in the same behavioral task) are biased toward the encoding of rewarding events. The BG main axis is affected by additional neuromodulator systems, e.g., serotonin (Lavoie and Parent 1990). Theoretical studies have suggested that the phasic serotonin signal might report the prediction error for future punishment (Daw et al. 2002; Dayan and Huys 2008) and therefore could compensate for the biased encoding of the value domain by the TANs and the DANs. The current study of the BG output structures indicates that the BG main-axis neurons have a bias toward control of reward-related behavior similar to that of TANs and DANs. Thus even if there are BG modulators other than the cholinergic and dopaminergic striatal inputs, the activity of BG output neurons follows the same trend as that of the TANs and DANs and is biased toward rewarding events. We therefore suggest that the other modulators do not extend the basal ganglia encoding to aversive events and that there are neuronal systems other than the BG that have control over aversive-related behavior.

FIG. 10. Response latency to cue. A: cumulative response latency distribution. Fraction of cells revealing significant response vs. time after cue presentation. The faster response of the SNr is represented by the faster increase of the cumulative sum and by the early crossing of the median (0.5) horizontal line. Timescale (abscissa) is shown in log scale and starts at 50 ms after cue onset. Gray coding: solid light gray, GPe; dashed gray, GPi; solid black, SNr. B: bar plot of the median and 95% confidence interval of the response latency to reward (white) and aversive cues (gray). Confidence intervals were calculated using bootstrap methods; since distributions are not symmetric, upper and lower limits may be uneven. C: bar plot of the median and 95% confidence interval of the response latency of responses with increases (white) and decreases (gray) in firing rate.

main-axis neurons were larger and usually encoded reward better than aversive-related events. We found substantial differences between the three populations of BG main-axis neurons. Most notably, SNr responses were more frequent, had shorter latencies, and encoded the airpuff delivery better than the corresponding responses of GPe and GPi neurons.

BG main-axis responses were long-lasting and diverse

Different response characteristics of the main-axis nuclei

In this study we found several major differences between the GPe, GPi, and the SNr. We found more intense changes in the responses of the SNr compared with the responses of the GPe and the GPi.
SNr neurons responded with shorter latencies to the cue (Fig. 10A) and encode the airpuff outcome better than the pallidal neurons (Figs. 5 and 6). A simple explanation for the enhanced encoding is the orofacial (licking and blinking) motor behavior of the monkeys in this experiment. Initial studies emphasized the role of the SNr in the control of orofacial movements (DeLong et al. 1983; Hikosaka

Concluding remarks

In this study we extend our previous work on BG neuromodulators (Joshua et al. 2008). We found a similar bias of GPe, GPi, SNr, TANs, and DANs for the encoding of expectation of rewarding versus aversive events. Thus the BG main axis may mainly reflect the teaching message (and corticostriatal plasticity control) of the TANs and DANs and may not be significantly affected by additional modulators with broader or different messages. Our results show a complex and different encoding by GPe, GPi, and SNr neurons. Moreover, they indicate a different encoding by GPi and SNr neurons and therefore suggest that there are many functional differences between these two BG output nuclei, despite their similar biochemical and physiological characteristics. Future models and studies of the computational physiology of the basal ganglia and their disorders should therefore attempt to disentangle the different functions of GPi and SNr.

ACKNOWLEDGMENTS

We thank Dr. Bryon Gomberg for MRI; M. Levi and M. Rivlin for help in preparing the experimental setup; Y. Renernt and I. Finkes for monkey training and general assistance; and G. Schoenbaum, Y. Shaham, and G. Morris for critical reading of earlier versions of this manuscript.

In contrast to the short (<0.7 s) responses of the BG modulators (Apicella 2007; Joshua et al. 2008; Morris et al. 2004; Schultz 1998), the responses of the BG main-axis HFD neurons lasted throughout the 2-s cue epoch.
This is in line with previous descriptions of pallidal (Arkadir et al. 2004) and SNr (Wichmann and Kliem 2004) responses. Long-duration, set-related responses have frequently been described in the cortex (Fuster 1999; Miyashita 1988; Wise and Kurata 1989), where they have been attributed to short-term memory or action-preparation processes. We cannot rule out similar processes in the basal ganglia and the experimental design does not allow us to dissociate set-related versus cue-evoked responses. However, the encoding of probability by the BG main-axis neurons (Figs. 3 and 4) and the dissociation between actions and neural response (for example, no neural encoding of the probability of aversive trials, the early decay of the neural activity compared with licking behavior after reward delivery) suggests that the activity of these neurons may encode the value of the current state or state–action pairs (Lau and Glimcher 2007; Samejima et al. 2005). The tonic discharge rate of the HFD neurons (population average: 45.1– 88.1 spikes/s in this study) endows them with a better dynamic range for responses with a decrease in discharge rate. Nevertheless, consistent with many previous studies (Georgopoulos et al. 1983; Mink and Thach 1991a; Mitchell et al. 1987a; Turner and Anderson 1997) we found that the BG HFD neurons respond to behavioral events more frequently with increases than with decreases in discharge rate. The latencies and the temporal distribution of the responses with increases and decreases in discharge rate were similar (Figs. 3B, 5B, 7B, and 10C), thus leading to highly diverse BG encoding, with different polarities and different amplitudes of the responses. The differences between the population responses with no encoding of the a priori probability of outcome (Fig. 5) versus the single-unit encoding of this probability (Fig. 6) are in line with such a balanced diversity of the responses of BG single units. 
These diverse responses augment the information capacity of the BG output structure (Bar-Gad et al. 2003).

and Wurtz 1983). Although this separation is not clear-cut (DeLong et al. 1985; Wichmann and Kliem 2004) our results may reflect this organization. Thus the small and less-frequent responses in the GPi could reflect the smaller representation of orofacial movements in the GPi. This could be also the reason for the activation of the SNr to aversive events, but as noted earlier this does not explain the asymmetric value representation in the SNr. At the circuitry level, one possibility is that the origins of the difference in pallidal versus SNr responses could be a result of different projections from the striatum or the subthalamic nucleus (Haber and Gdowski 2004). Another possibility is that the GPe has different pathways to the GPi and SNr and those GPe neurons that do project to the SNr are the neurons with the short latency and larger response. Nevertheless, we did not find any topographic organization in the responses of the GPe that supports this hypothesis (data not shown). Finally, another putative explanation for the differences between the GPi and the SNr is the direct effects of somatodendritic release of dopamine on SNr, but not on pallidal, neurons. The similar latencies of SNc and SNr responses support the hypothesis that SNc neurons may drive SNr responses by somatodendritic release of dopamine (Cragg et al. 2001; Windels and Kiyatkin 2006). Finally, the neural recordings were made after the monkey was highly familiar with the task and thus activity might not be the same as activity that occurs during learning. Previous studies of dopaminergic neurons have shown that activity in a familiar probabilistic task does resemble the activity in a learning task (Fiorillo et al. 2003; Hollerman and Schultz 1998; Morris et al. 2004). A functional MRI study has shown that striatal activity underlies novelty-based choice in humans (Wittmann et al. 2008).
Whether this is the case for other basal ganglia populations and the single-cell activity that underlies novelty representation should be investigated by future studies.

GRANTS

This study was partly supported by a “Fighting against Parkinson” Grant from the Hebrew University Netherlands Association and a Max Vorst Family Foundation grant.

REFERENCES

Hikosaka O, Wurtz RH. Visual and oculomotor functions of monkey substantia nigra pars reticulata. II. Visual responses related to fixation of gaze. J Neurophysiol 49: 1254–1267, 1983. Hollerman JR, Schultz W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci 1: 304–309, 1998. Joshua M, Adler A, Mitelman R, Vaadia E, Bergman H. Midbrain dopaminergic neurons and striatal cholinergic interneurons encode the difference between reward and aversive events at different epochs of probabilistic classical conditioning trials. J Neurosci 28: 11673–11684, 2008. Joshua M, Elias S, Levine O, Bergman H. Quantifying the isolation quality of extracellularly recorded action potentials. J Neurosci Methods 163: 267–282, 2007. Kobayashi S, Nomoto K, Watanabe M, Hikosaka O, Schultz W, Sakagami M. Influences of rewarding and aversive outcomes on activity in macaque lateral prefrontal cortex. Neuron 51: 861–870, 2006. Lau B, Glimcher PW. Action and outcome encoding in the primate caudate nucleus. J Neurosci 27: 14502–14514, 2007. Lauwereyns J, Watanabe K, Coe B, Hikosaka O. A neural correlate of response bias in monkey caudate nucleus. Nature 418: 413–417, 2002. Lavoie B, Parent A. Immunohistochemical study of the serotoninergic innervation of the basal ganglia in the squirrel monkey. J Comp Neurol 299: 1–16, 1990. Martin RF, Bowden DM. Primate Brain Maps: Structure of the Macaque Brain. Amsterdam: Elsevier Science, 2000. Mink JW, Thach WT. Basal ganglia motor control. I.
Nonexclusive relation of pallidal discharge to five movement modes. J Neurophysiol 65: 273–300, 1991a. Mink JW, Thach WT. Basal ganglia motor control. II. Late pallidal timing relative to movement onset and inconsistent pallidal coding of movement parameters. J Neurophysiol 65: 301–329, 1991b. Mirenowicz J, Schultz W. Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379: 449 – 451, 1996. Mitchell SJ, Richardson RT, Baker FH, DeLong MR. The primate globus pallidus: neuronal activity related to direction of movement. Exp Brain Res 68: 491–505, 1987a. Mitchell SJ, Richardson RT, Baker FH, DeLong MR. The primate nucleus basalis of Meynert: neuronal activity related to a visuomotor tracking task. Exp Brain Res 68: 506 –515, 1987b. Miyashita Y. Neuronal correlate of visual associative long-term memory in the primate temporal cortex. Nature 335: 817– 820, 1988. Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43: 133–143, 2004. Nakahara H, Itoh H, Kawagoe R, Takikawa Y, Hikosaka O. Dopamine neurons can represent context-dependent prediction error. Neuron 41: 269 – 280, 2004. Nevet A, Morris G, Saban G, Arkadir D, Bergman H. Lack of spike-count and spike-time correlations in the substantia nigra reticulata despite overlap of neural responses. J Neurophysiol 98: 2232–2243, 2007. Parent A, Cote PY, Lavoie B. Chemical anatomy of primate basal ganglia. Prog Neurobiol 46: 131–197, 1995. Pasquereau B, Nadjar A, Arkadir D, Bezard E, Goillandeau M, Bioulac B, Gross CE, Boraud T. Shaping of motor responses by incentive values through the basal ganglia. J Neurosci 27: 1176 –1183, 2007. Paton JJ, Belova MA, Morrison SE, Salzman CD. The primate amygdala represents the positive and negative value of visual stimuli during learning. Nature 439: 865– 870, 2006. Ravel S, Sardo P, Legallet E, Apicella P. 
Reward unpredictability inside and outside of a task context as a determinant of the responses of tonically active neurons in the monkey striatum. J Neurosci 21: 5730 – 5739, 2001. Reynolds JN, Hyland BI, Wickens JR. A cellular mechanism of rewardrelated learning. Nature 413: 67–70, 2001. Ritov Y, Raz A, Bergman H. Detection of onset of neuronal activity by allowing for heterogeneity in the change points. J Neurosci Methods 122: 25– 42, 2002. Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific reward values in the striatum. Science 310: 1337–1340, 2005. Sato M, Hikosaka O. Role of primate substantia nigra pars reticulata in reward-oriented saccadic eye movement. J Neurosci 22: 2363–2373, 2002. 101 • FEBRUARY 2009 • 39 www.jn.org Downloaded from jn.physiology.org on March 1, 2009 Apicella P. Leading tonically active neurons of the striatum from reward detection to context recognition. Trends Neurosci 30: 299 –306, 2007. Apicella P, Scarnati E, Ljungberg T, Schultz W. Neuronal activity in monkey striatum related to the expectation of predictable environmental events. J Neurophysiol 68: 945–960, 1992. Arkadir D, Morris G, Vaadia E, Bergman H. Independent coding of movement direction and reward prediction by single pallidal neurons. J Neurosci 24: 10047–10056, 2004. Bar-Gad I, Bergman H. Stepping out of the box: information processing in the neural networks of the basal ganglia. Curr Opin Neurobiol 11: 689 – 695, 2001. Bar-Gad I, Morris G, Bergman H. Information processing, dimensionality reduction and reinforcement learning in the basal ganglia. Prog Neurobiol 71: 439 – 473, 2003. Belova MA, Paton JJ, Morrison SE, Salzman CD. Expectation modulates neural responses to pleasant and aversive stimuli in primate amygdala. Neuron 55: 970 –984, 2007. Berenyi A, Benedek G, Nagy A. Double sliding-window technique: a new method to calculate the neuronal response onset latency. Brain Res 1178: 141–148, 2007. 
Bezard E, Boraud T, Chalon S, Brotchie JM, Guilloteau D, Gross CE. Pallidal border cells: an anatomical and electrophysiological study in the 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine-treated monkey. Neuroscience 103: 117–123, 2001. Calabresi P, Centonze D, Gubellini P, Pisani A, Bernardi G. Acetylcholinemediated modulation of striatal function. Trends Neurosci 23: 120 –126, 2000. Cragg SJ, Nicholson C, Kume-Kick J, Tao L, Rice ME. Dopaminemediated volume transmission in midbrain is regulated by distinct extracellular geometry and uptake. J Neurophysiol 85: 1761–1771, 2001. Daw ND, Kakade S, Dayan P. Opponent interactions between serotonin and dopamine. Neural Networks 15: 603– 616, 2002. Dayan P, Huys QJ. Serotonin, inhibition, and negative mood. PLoS Comput Biol 4: e4, 2008. DeLong MR. Activity of pallidal neurons during movement. J Neurophysiol 34: 414 – 427, 1971. DeLong MR, Crutcher MD, Georgopoulos AP. Relations between movement and single cell discharge in the substantia nigra of the behaving monkey. J Neurosci 3: 1599 –1606, 1983. DeLong MR, Crutcher MD, Georgopoulos AP. Primate globus pallidus and subthalamic nucleus: functional organization. J Neurophysiol 53: 530 –543, 1985. Elias S, Joshua M, Goldberg JA, Heimer G, Arkadir D, Morris G, Bergman H. Statistical properties of pauses of the high-frequency discharge neurons in the external segment of the globus pallidus. J Neurosci 27: 2525–2538, 2007. Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299: 1898 –1902, 2003. Fuster JM. The Prefrontal Cortex: Anatomy, Physiology, and Neuropsychology of the Frontal Lobes (3rd ed.). Philadelphia, PA: Lippincott–Raven, 1999. Gdowski MJ, Miller LE, Parrish T, Nenonene EK, Houk JC. Context dependency in the globus pallidus internal segment during targeted arm movements. J Neurophysiol 85: 998 –1004, 2001. Georgopoulos AP, DeLong MR, Crutcher MD. 
Relations between parameters of step-tracking movements and single cell discharge in the globus pallidus and subthalamic nucleus of the behaving monkey. J Neurosci 3: 1586 –1598, 1983. Gurney K, Prescott TJ, Wickens JR, Redgrave P. Computational models of the basal ganglia: from robots to membranes. Trends Neurosci 27: 453– 459, 2004. Haber SN, Gdowski MJ. The basal ganglia. In The Human Nervous System, edited by Paxinos G, Mai JK. Amsterdam: Elsevier, 2004, p. 676 –738. Handel A, Glimcher PW. Contextual modulation of substantia nigra pars reticulata neurons. J Neurophysiol 83: 3042–3048, 2000. 771 Results II 772 JOSHUA, ADLER, ROSIN, VAADIA, AND BERGMAN Schultz W. Activity of pars reticulata neurons of monkey substantia nigra in relation to motor, sensory, and complex events. J Neurophysiol 55: 660 – 677, 1986. Schultz W. Predictive reward signal of dopamine neurons. J Neurophysiol 80: 1–27, 1998. Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science 275: 1593–1599, 1997. Turner RS, Anderson ME. Pallidal discharge related to the kinematics of reaching movements in two dimensions. J Neurophysiol 77: 1051–1074, 1997. Turner RS, Anderson ME. Context-dependent modulation of movement-related discharge in the primate globus pallidus. J Neurosci 25: 2965–2976, 2005. Wichmann T, Kliem MA. Neuronal activity in the primate substantia nigra pars reticulata during the performance of simple and memory-guided elbow movements. J Neurophysiol 91: 815– 827, 2004. Windels F, Kiyatkin EA. Dopamine action in the substantia nigra pars reticulata: iontophoretic studies in awake, unrestrained rats. Eur J Neurosci 24: 1385–1394, 2006. Wise SP, Kurata K. Set-related activity in the premotor cortex of rhesus monkeys: effect of triggering cues and relatively long delay intervals. Somatosens Mot Res 6: 455– 476, 1989. Wittmann BC, Daw ND, Seymour B, Dolan RJ. Striatal activity underlies novelty-based choice in humans. Neuron 58: 967–973, 2008. 
Yamada H, Matsumoto N, Kimura M. Tonically active neurons in the primate caudate nucleus and putamen differentially encode instructed motivational outcomes of action. J Neurosci 24: 3500 –3510, 2004. Yamada H, Matsumoto N, Kimura M. History- and current instructionbased coding of forthcoming behavioral outcomes in the striatum. J Neurophysiol 98: 3557–3567, 2007. Downloaded from jn.physiology.org on March 1, 2009 J Neurophysiol • VOL 101 • FEBRUARY 2009 • 40 www.jn.org Results III Asymmetric Encoding of Positive and Negative Expectations by Low-Frequency Discharge Basal Ganglia Neurons Mati Joshua1, 2, Avital Adler1, 2 and Hagai Bergman1, 2, 3 1 Department of Physiology, The Hebrew University-Hadassah Medical School, Jerusalem, 91120, Israel 2 The Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem, 91904, 3 Eric Roland Center for Neurodegenerative Diseases, The Hebrew University, Jerusalem, 91904, Israel. Abstract. Experimental and theoretical studies depict the basal ganglia as a reinforcement learning system where the dopaminergic neurons provide reinforcement error signal by modulation of their firing rate. However, the low tonic discharge rate of the dopaminergic neurons suggests that their capability to encode negative events by suppressing firing rate is limited. We recorded the activity of single neurons in the basal ganglia of two monkeys during the performance of probabilistic conditioning task with food, neutral and air-puff outcomes. In a related paper we analyzed the activity of five basal ganglia populations; here we extend this to the low frequency discharge neurons of the main axis of the basal ganglia i.e. the striatal phasically active neurons (PANs), and the low frequency discharge (LFD) neurons in the external segment of the globus pallidus (GPe). The licking and blinking behavior during the cue presentation epoch reveals that monkeys expected the different probabilistic appetitive, neutral and aversive outcomes. 
Nevertheless, the activity of single striatal and GPe neurons is more strongly modulated by expectation of reward than by expectation of the aversive event. The neural-behavioral asymmetry suggests that expectations of aversive events and of rewards are differentially represented at many levels of the basal ganglia.

1 Introduction
Experimental and theoretical studies depict the basal ganglia as a reinforcement learning system where the dopaminergic neurons provide the reinforcement error signal by modulation of their firing rate. Previous studies in primates have shown that basal ganglia activity is modulated by expectation of rewards. Most of these studies have focused on midbrain dopaminergic neurons and striatal cholinergic interneurons (tonically active neurons, TANs; Wilson et al. 1990). Midbrain dopaminergic neurons have been shown to encode the mismatch in the positive domain of reinforcement; i.e., they respond when conditions are better than expected (Fiorillo et al. 2003; Satoh et al. 2003; Nakahara et al. 2004; Bayer and Glimcher 2005). TANs also modulate their activity when a reward is given (Kimura et al. 1984) or expected (Graybiel et al. 1994). However, some reports have indicated that TAN modulation is invariant to reward predictability (Shimo and Hikosaka 2001; Morris et al. 2004). Finally, although to a lesser extent, several studies have demonstrated that the main axis of the basal ganglia is modulated by expectation of reward (Sato and Hikosaka 2002; Arkadir et al. 2004; Samejima et al. 2005; Darbaky et al. 2005; Pasquereau et al. 2007; Lau and Glimcher 2007).

In contrast to the extensive research on reward-related activity, only a few physiological studies have explored whether neural activity of the basal ganglia encodes the negative domain (i.e., aversive outcome or omission of rewards, which as outlined below might not be identical events).
Dopamine neurons decrease their firing rate in response to reward omission (Schultz 1998; Satoh et al. 2003; Matsumoto and Hikosaka 2007). However, this suppression is limited since the firing rate is truncated at zero. In fact, other groups (Morris et al. 2004; Bayer and Glimcher 2005) have reported that the firing of dopaminergic neurons does not demonstrate incremental encoding of reward omission, and alternative encoding schemes have been proposed (Tobler et al. 2005; Bayer et al. 2007). There are even fewer studies on the responses of the basal ganglia neurons to aversive events. Although they arise from only slightly different experimental paradigms (instrumental vs. classical conditioning, behaving vs. anesthetized animals) and slightly different recording locations (ventral tegmental area vs. the more lateral substantia nigra pars compacta), the findings are incompatible. Some studies suggest that dopamine neurons increase their firing rate following aversive events (Guarraci and Kapp 1999; Coizet et al. 2006), whereas others have evidence of a decrease (Mirenowicz and Schultz 1996; Ungless et al. 2004). There are reports that TAN activity differentiates motivationally opposing stimuli (Ravel et al. 2003; Yamada et al. 2004), but it remains unclear whether and how TANs respond to expectation of aversion.

We designed a classical conditioning paradigm with aversive and rewarding probabilistic outcomes. Thus, symmetric manipulations of expectations of food (appetitive event) or airpuff (aversive event) were built into the experimental design, allowing for comparison of neural responses to expectation of positive and negative outcomes. In parallel studies (Joshua et al. 2008a; Joshua et al.
2008b), we describe the activity of basal ganglia critics (midbrain dopaminergic neurons and striatal cholinergic interneurons) and high frequency (> 50 spikes/s) discharge neurons in the main axis (external and internal segments of the globus pallidus and the substantia nigra pars reticulata). In this manuscript, we analyze the activity of two basal ganglia populations with low frequency (< 20 spikes/s) discharge: the striatal phasically active neurons (PANs) and the low frequency discharge (LFD) neurons in the external segment of the globus pallidus (GPe).

2 Methods
All experimental protocols were performed in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and with Hebrew University guidelines for the use and care of laboratory animals in research, supervised by the institutional animal care and use committee. Two monkeys (L and S, Macaca fascicularis, female 4 kg and male 5 kg) were engaged in a probabilistic delay classical-conditioning task. At the beginning of each trial, a visual cue covering the full extent of a 17" computer screen was presented for 2 seconds. After the cue offset the monkeys received the outcome in a probabilistic manner. Images were fractal patterns constructed with the Chaos Pro 3.2 program (www.chaospro.de), with the same images presented during all training and recording periods. We delivered liquid food (L: 0.4 ml, 100 ms duration; S: 0.6 ml, 150 ms) as the positive reward and an airpuff (L: 100 ms duration; S: 150 ms; 50-70 psi; 2 cm from eye), split and directed to both eyes, as an aversive stimulus. To enhance the monkeys' ability to discriminate trial outcomes, the beginning of the result epoch was signaled by one of three sounds that discriminated the delivery of food, the delivery of the airpuff and no outcome; i.e., each possible end result was accompanied by a different sound. Sounds were normalized to the same intensity and duration.
These sounds were additional to the background device sounds (airpuff solenoid and food pump). Sounds and visual cues were shuffled between monkeys. All trials were followed by a variable intertrial interval (ITI) (Monkey S: 3-7 seconds, Monkey L: 4-8 seconds). Due to the different probabilities, and in order to equalize the average occurrence of each outcome, we introduced the non-deterministic cues (P ≠ 1 for reward or aversive outcome) three times more often than the deterministic ones. With this occurrence ratio, all trials were randomly interleaved.

3 Results
3.1 Monkey behavior reflects expectation of rewarding and aversive events
We recorded neuronal activity in the basal ganglia (Fig. 1a, b) during performance of a probabilistic classical conditioning task (Fig. 1c) with food or airpuff as the rewarding and aversive outcomes, respectively. The two monkeys were introduced to seven different fractal visual cues, each predicting the outcome in a probabilistic manner. Three cues predicted a food outcome (reward cues) with a delivery probability of 1/3, 2/3 and 1; three cues predicted an airpuff outcome (aversive cues) with a delivery probability of 1/3, 2/3 and 1. The 7th cue (the neutral cue) was never followed by a food or airpuff outcome. The same seven fractal cues were presented to both monkeys; however, the associated outcomes were randomized for each monkey. Cues were presented for two seconds and were immediately followed by a result epoch which could be an outcome (food, airpuff) or no-outcome, according to the probabilities associated with the cue. The beginning of the result epoch was signaled by one of three sounds that discriminated the three possible events: a drop of food, an airpuff, or no outcome (Fig. 1c).

We tested how conditioning affected the monkeys' behavior by monitoring licking and blinking during neural recordings. The monkeys increased their licking in response to cues predicting food but only slightly to the aversive and neutral cues.
Similarly, the monkeys increased their frequency of blinking to cues predicting an airpuff but only slightly to reward and neutral cues (Fig. 1d, left). Moreover, the increase of blinking and licking during the cue epoch was maximal in trials where the probability of the outcome was 2/3 or 1 and smaller in trials where the probability was 1/3. When no food or airpuff was delivered (no-outcome epoch of the p=1/3 or p=2/3 trials), licking and blinking, respectively, increased. Furthermore, the increase was in accordance with the previously instructed probability (Fig. 1d, right). These behavioral results indicate that the monkeys could distinguish between aversive, reward and neutral cues and between the different outcome probabilities they were intended to signal, and that the symmetry in task design was reflected by the monkeys' behavior.

Figure 1 – MRI, Electrophysiology and Behavior
a) MRI identification of recording coordinates. Coronal MRI image at anterior commissure level. Tungsten microelectrodes are inserted at known chamber coordinates, enabling identification of the brain structures by alignment of the MRI images with the monkey atlas. Abbreviations: C, caudate; Chm, recording chamber (filled with 3% agar); Elc, electrode; G, globus pallidus; P, putamen.
b) Example from the recordings of a low frequency discharge neuron in the GPe (top) and a phasically active neuron in the striatum (bottom).
c) Behavioral task. Top - reward trials; Middle - neutral trials; Bottom - aversive trials.
d) Normalized behavioral response. Licking (black) and blinking (gray) response (average ± SEM) in a time window around the behavioral event (cue: 500-0 ms before cue ending; outcome and no-outcome: 0-500 ms after cue ending for blinking response and 500-1000 ms for licking response). The responses are normalized in each epoch by the minimal and maximal values (normalized response = (response - min)/(max - min)).
Abscissa: different behavioral conditions (A-Aversive, N-Neutral, R-Reward; the number is the outcome probability).

3.2 PANs and GPe LFD activity is asymmetrically modulated by expectation of aversive and reward outcomes
We recorded single unit activity from putamen PANs and from GPe LFD neurons (Fig. 1b). Cells were included in the study if they passed criteria of waveform isolation (isolation score > 0.5; Joshua et al. 2007) and discharge rate stability. Of the cells that met these criteria we further analyzed only those that were recorded during at least 20 minutes of task performance. Finally, we grouped the neurons in the same structure recorded from both monkeys. A total of 113 neurons were recorded, of which 65 neurons (38 PANs and 27 GPe LFD) passed the above criteria.

Figure 2a is an example of a PAN recorded during the performance of the behavioral task. The neural activity following the reward cue was larger than the activity following the neutral and aversive cues. Furthermore, the response to reward cues increased with reward probability. Population analysis shows that both GPe LFD neurons and PANs had larger responses to reward cues than to aversive cues and that a larger fraction of cells responded to the reward cues (Fig. 2b-c). The population PSTH (Fig. 2b, c – left columns) is an average estimate and may be biased by a few neurons with an extreme response. However, analysis of the fraction of cells with a significant response (Fig. 2b, c – right columns) is not sensitive to the relative amplitude of the responses. We therefore formulated the response-index as a measure of the relative differences between the neutral vs. the aversive and the reward responses of single neurons. We found that for the majority of cells the response-index for the reward trials was larger than the response-index for aversive trials (Fig. 3a-b).
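The PSTH smoothing and the response-index used in this section can be sketched in Python roughly as follows. This is an illustrative sketch, not the analysis code used in the study: the function names `psth` and `response_index`, the truncation of the Gaussian kernel at ±4 SD, and the choice to average the absolute PSTH difference over the analysis window are assumptions. The figure captions specify only the 1 ms resolution, the Gaussian SD, and that the index is based on the absolute difference between a cue-aligned PSTH and the neutral-cue PSTH.

```python
import numpy as np

def psth(spike_times_per_trial, t_start, t_stop, sd_ms=20.0):
    """Trial-averaged firing rate (spikes/s) in 1 ms bins, Gaussian-smoothed.

    spike_times_per_trial: list of arrays of spike times (ms, relative to
    the aligning event), one array per trial.
    """
    edges = np.arange(t_start, t_stop + 1)      # 1 ms bin edges
    counts = np.zeros(len(edges) - 1)
    for spikes in spike_times_per_trial:
        counts += np.histogram(spikes, bins=edges)[0]
    rate = counts / len(spike_times_per_trial) * 1000.0  # spikes/s

    # Gaussian kernel truncated at +-4 SD (assumption), unit area so the
    # smoothing preserves the total spike count away from the edges.
    half = int(4 * sd_ms)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sd_ms) ** 2)
    kernel /= kernel.sum()
    return np.convolve(rate, kernel, mode="same")

def response_index(cue_psth, neutral_psth):
    """Mean absolute difference between a cue PSTH and the neutral-cue PSTH.

    Averaging over the window is an assumption of this sketch.
    """
    cue_psth = np.asarray(cue_psth, dtype=float)
    neutral_psth = np.asarray(neutral_psth, dtype=float)
    return float(np.mean(np.abs(cue_psth - neutral_psth)))
```

Under these assumptions, a cell's reward and aversive indices would be obtained by calling `response_index` once with the reward-cue PSTH and once with the aversive-cue PSTH, each against the same neutral-cue PSTH computed over the 0-2000 ms cue window.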
In addition, a substantial fraction of the low frequency discharge neurons in the basal ganglia showed a significant response-index to reward cues, whereas only a small number of cells had a significant response-index to aversive cues (Fig. 3c).

Figure 2 – Population responses of striatal PANs and GPe LFD neurons to reward cues are larger and more common than responses to aversive cues
a) Example of rasters and peri-stimulus time histograms (PSTHs) of a PAN aligned to behavioral events. The rows are separated according to the expected outcome. First row: trials with cues that predict the delivery of food. Second row: trials with the neutral cue (a cue always followed by no outcome). Third row: trials with cues that predict an airpuff. Columns are aligned according to the trial epoch. First column: cue presentation epoch (-0.5 s to 2 s after cue onset). Second column: outcome epoch (-0.5 s to 2 s after delivery of food or airpuff). Third column: trials in which no outcome was delivered; outcome omission was signaled to the monkey by the no-outcome sound (-0.5 s to 2 s after sound onset). The first 0.5 s of the second and third columns overlap the last 0.5 s of the first column. Gray level codes are marked at the right side of the no-outcome rasters (A-Aversive, N-Neutral, R-Reward; the number is the outcome probability). For the graphic presentation, rasters were randomly pruned and adjusted to contain the same number of trials. PSTHs were constructed by summing activity across trials at 1 ms resolution and then smoothing with a Gaussian window (SD of 20 ms).
b) Left column: Population responses of PANs (n=38) to behavioral cues. PSTHs were smoothed with a Gaussian (SD = 20 ms) and averaged across cells. Black – average responses to reward predicting cues, Gray – neutral cue, Light gray – aversive cues. Right column: Fraction of cells with significant (2-sigma rule) modulations of firing rate in the cue epoch. Same color coding as in left column.
Neutral events are not included to enable inclusion of all rewarding/aversive events in the statistical tests. The ordinate is the fraction of cells that had a significant response at each time bin (1 ms).
c) Left column: Population responses of GPe LFD neurons (n=27) to behavioral cues. Smoothing with an SD of 40 ms. Right column: Fraction of cells with significant firing rate modulations in the cue epoch. Same analysis and gray level code as in b.

Figure 3 – Response-index analysis reveals larger responses to reward than to aversive cues
a) Scatter plots comparing the response-index of individual PANs to reward and aversive cues. Response-index was calculated for each cell as the absolute difference between the (aversive or reward) cue-aligned PSTH and the PSTH of the neutral cue. The black line is the identity (Y=X) line. Points below this line represent cells with a response-index that is larger for the reward cues than for aversive cues. The differences in the scale between nuclei reflect differences in modulation size. Significance level was p<0.05. The time window used for this analysis was 0-2000 ms from cue presentation.
b) Same as a, for the GPe LFD population.
c) Summary of the fraction of cells with a significant response-index.

4 Discussion
We have shown that despite the symmetry in behavior, expectation of reward, but not of an aversive event, affects rate modulations of basal ganglia low-frequency discharge neurons beyond the modulation which followed a neutral cue. Asymmetry in value expectation is congruent with theories on the localization of antagonistic motivational systems (Konorski 1967). It has been shown that neural systems other than the basal ganglia, e.g., the amygdala (LeDoux 2000) and the cerebellum (McCormick and Thompson 1984), are involved in aversive conditioning.
In parallel studies, we showed that both the neuromodulators (TANs and SNc) and basal ganglia populations with high frequency discharge (SNr, GPi and GPe) encode expectation of reward but not the expectation of an aversive event (Joshua et al. 2008a; Joshua et al. 2008b). Here we extend this notion and show similar results for the PANs and for the LFD neurons of the GPe.

In a previous study, Samejima et al. (2005) showed that the activity of many striatal projection neurons was selective to both values and action; however, only a few neurons were tuned to relative values or action choice. Lau and Glimcher (2007) reported that the encoding of action and outcome was carried out by largely separate populations of caudate neurons that were active after movement execution. Although these studies focus on different trial epochs, they are suggestive of two different coding schemes. In our task the monkey performed actions in the cue epoch both in the reward and the aversive trials but not in the neutral trials (Fig. 1d). However, although the action dissociates neutral and aversive trials, responses to the aversive and neutral cue were the same (Fig. 3a). This suggests that it is not action per se that is encoded in these neurons. Furthermore, in the reward trials we found considerable modulations, suggesting that it is the action-value that is encoded in the striatum.

We found that the fraction of cells with a short latency response was larger for the GPe LFD than for the PANs (Fig. 2b right vs. 2c right). This difference is surprising since the striatum is the main input of the GPe. Our recordings in the striatum are limited to the putamen, and it could be that the fast responses of GPe LFD neurons are due to input from other striatal territories with faster responses.

Human decisions are not symmetric in response to negative and positive prospects (Tversky and Kahneman 1981).
Here, we show that the basal ganglia encoding of the positive domain surpasses their encoding of the negative domain. We extend our report to the input stage of the basal ganglia (the striatum) as well as to a unique population of neurons in the GPe (the low frequency discharge neurons). Having two biological systems, one for the aversive domain and one for the reward domain, might be the neural basis of risk-aversive, asymmetric and non-rational human behavior (Tversky and Kahneman 1981).

Reference List
1. D. Arkadir, G. Morris, E. Vaadia, H. Bergman, J. Neurosci. 24, 10047 (2004).
2. H. M. Bayer, P. W. Glimcher, Neuron 47, 129 (2005).
3. H. M. Bayer, B. Lau, P. W. Glimcher, J. Neurophysiol. 98, 1428 (2007).
4. V. Coizet, E. J. Dommett, P. Redgrave, P. G. Overton, Neuroscience 139, 1479 (2006).
5. Y. Darbaky, C. Baunez, P. Arecchi, E. Legallet, P. Apicella, Neuroreport 16, 1241 (2005).
6. C. D. Fiorillo, P. N. Tobler, W. Schultz, Science 299, 1898 (2003).
7. A. M. Graybiel, T. Aosaki, A. W. Flaherty, M. Kimura, Science 265, 1826 (1994).
8. F. A. Guarraci, B. S. Kapp, Behav. Brain Res. 99, 169 (1999).
9. M. Joshua, A. Adler, R. Mitelman, E. Vaadia, H. Bergman, Asymmetric Encoding of Positive and Negative Expectations in the Basal Ganglia. Submitted (2008).
10. M. Joshua, S. Elias, O. Levine, H. Bergman, J. Neurosci. Methods 163, 267 (2007).
11. M. Kimura, J. Rajkowski, E. Evarts, Proc. Natl. Acad. Sci. U. S. A. 81, 4998 (1984).
12. J. Konorski, Integrative Activity of the Brain: An Interdisciplinary Approach (Chicago Univ. Press, Chicago, 1967).
13. B. Lau, P. W. Glimcher, J. Neurosci. 27, 14502 (2007).
14. J. E. LeDoux, in The Amygdala, J. P. Aggleton, Ed. (Oxford University, 2000), chap. 7, pp. 289-310.
15. M. Matsumoto, O. Hikosaka, Nature 447, 1111 (2007).
16. D. A. McCormick, R. F. Thompson, Science 223, 296 (1984).
17. J. Mirenowicz, W. Schultz, Nature 379, 449 (1996).
18. G. Morris, D. Arkadir, A. Nevet, E. Vaadia, H. Bergman, Neuron 43, 133 (2004).
19. H. Nakahara, H. Itoh, R. Kawagoe, Y. Takikawa, O. Hikosaka, Neuron 41, 269 (2004).
20. B. Pasquereau et al., J. Neurosci. 27, 1176 (2007).
21. S. Ravel, E. Legallet, P. Apicella, J. Neurosci. 23, 8489 (2003).
22. K. Samejima, Y. Ueda, K. Doya, M. Kimura, Science 310, 1337 (2005).
23. M. Sato, O. Hikosaka, J. Neurosci. 22, 2363 (2002).
24. T. Satoh, S. Nakai, T. Sato, M. Kimura, J. Neurosci. 23, 9913 (2003).
25. W. Schultz, J. Neurophysiol. 80, 1 (1998).
26. Y. Shimo, O. Hikosaka, J. Neurosci. 21, 7804 (2001).
27. P. N. Tobler, C. D. Fiorillo, W. Schultz, Science 307, 1642 (2005).
28. A. Tversky, D. Kahneman, Science 211, 453 (1981).
29. M. A. Ungless, P. J. Magill, J. P. Bolam, Science 303, 2040 (2004).
30. C. J. Wilson, H. T. Chang, S. T. Kitai, J. Neurosci. 10, 508 (1990).
31. H. Yamada, N. Matsumoto, M. Kimura, J. Neurosci. 24, 3500 (2004).

Results IV
Neuron Article
Synchronization of Midbrain Dopaminergic Neurons Is Enhanced by Rewarding Events
Mati Joshua,1,2,* Avital Adler,1,2 Yifat Prut,1,2 Eilon Vaadia,1,2 Jeffery R. Wickens,4 and Hagai Bergman1,2,3
1Department of Physiology, The Hebrew University-Hadassah Medical School, Jerusalem 91120, Israel
2The Interdisciplinary Center for Neural Computation
3Eric Roland Center for Neurodegenerative Diseases
The Hebrew University, Jerusalem 91904, Israel
4Okinawa Institute of Science and Technology, 12-22, Suzaki, Uruma, Okinawa 904-2234, Japan
*Correspondence: [email protected]
DOI 10.1016/j.neuron.2009.04.026

SUMMARY
The basal ganglia network is divided into two functionally related subsystems: the neuromodulators and the main axis. It is assumed that neuromodulators adjust cortico-striatal coupling. This adjustment might depend on the response properties and temporal interactions between neuromodulators.
We studied functional interactions between simultaneously recorded pairs of neurons in the basal ganglia while monkeys performed a classical conditioning task that included rewarding, neutral, and aversive events. Neurons that belong to a single neuromodulator group exhibited similar average responses, whereas main axis neurons responded in a highly diverse manner. Dopaminergic neuromodulators transiently increased trial-to-trial (noise) correlation following rewarding but not aversive events, whereas cholinergic neurons of the striatum decreased their trial-to-trial correlation. These changes in functional connectivity occurred at different epochs of the trial. Thus, the coding scheme of neuromodulators (but not main axis neurons) can be viewed as a single-dimensional code that is further enriched by dynamic neuronal interactions.

INTRODUCTION
Technical advances enabling recordings of the simultaneous activity of several neurons (Abeles, 1982; Eggermont, 1990; Baker et al., 1999) have made it possible to study the properties of neuronal networks. Early studies (Perkel et al., 1967; Abeles, 1982; Aertsen et al., 1989; Bartho et al., 2004) focused on detection and quantification of the functional connectivity between neurons (e.g., direct excitatory, inhibitory synapses or common synaptic inputs). In the basal ganglia (Bergman et al., 1998), this approach was used to provide insights into the debate regarding the existence of parallel segregated basal ganglia pathways (Alexander et al., 1986) versus a convergent funneling architecture (Percheron et al., 1984; Percheron and Filion, 1991).

Recent studies have used data from simultaneously recorded neurons to examine issues related to encoding/decoding and information processing in the nervous system (Gawne and Richmond, 1993; Schneidman et al., 2003; Averbeck et al., 2006).
One study conducted by our group (Nevet et al., 2007) showed that contrary to the positive noise and signal correlation found between pairs of cortical neurons (Gawne and Richmond, 1993; Zohary et al., 1994; Lee et al., 1998; Yanai et al., 2007), the average correlation in the substantia nigra pars reticulata (SNr) population does not differ significantly from zero. However, there are no studies of correlations exploring the similarity of average responses of neurons in other structures of the basal ganglia such as the globus pallidus external and internal segments (GPe and GPi respectively) on the one hand, or the neuromodulators of the basal ganglia, such as tonically active neurons (TANs, striatal cholinergic interneurons) and midbrain dopaminergic neurons (DANs) on the other. Moreover, there are no studies on the basal ganglia that have examined dynamics in the correlation of trial-by-trial discharge variations; i.e., the dynamics of the noise correlation. The division of the basal ganglia into neuromodulator and main axis subsystems is based on both anatomical (Parent and Hazrati, 1995; Haber and Gdowski, 2004) and physiological properties of these neurons (DeLong, 1971; Grace and Bunney, 1983a; Kimura et al., 1984; Joshua et al., 2008, 2009). It was suggested that the neuromodulators provide the network a single-dimensional signal (scalar) and that the main axis utilizes this scalar (Schultz, 1998; Bar-Gad et al., 2003). The most common basal ganglia models suggest that they operate as a reinforcement learning system in which the DANs encode the temporal-difference prediction error (Schultz et al., 1997). These models assume that the teaching message is transmitted to all striatal territories, and the neural plasticity of the cortico-striatal synapses is regulated by a homogenous dopamine signal and selective corticostriatal activity (Arbuthnott and Wickens, 2007). 
The cholinergic interneurons are assumed to mediate or complement the teaching message of the DANs (Centonze et al., 2003; Pisani et al., 2007). Models that include the basal ganglia main axis suggest that by contrast to the scalar nature of the neuromodulators, the main axis activity is diverse (Mink, 1996; Bar-Gad et al., 2003). The GABAergic lateral connections in the main axis (Tunstall et al., 2002; Plenz, 2003; Haber and Gdowski, 2004) support the notion of a competitive component in the activity of main axis neurons (Fukai and Tanaka, 1997; Frank et al., 2004).

Neuron 62, 695–704, June 11, 2009 ©2009 Elsevier Inc.

Figure 1. Recording and Behavioral Task
(A) Behavioral task. Classical conditioning task with three cues that predicted a food outcome (reward cues), three cues that predicted an airpuff outcome (aversive cues), and one neutral cue. The outcome delivery on each trial was randomized according to a fixed probability associated with the trial cue. Cues were randomized between monkeys and are shown as presented to monkey S.
(B) Top: Simultaneous extracellular recordings from eight electrodes in the globus pallidus. In seven electrodes the cells were classified as GPe pausers, and one of the cells was classified as a pallidal border cell (electrode 6). Bottom: Simultaneous extracellular recordings of TANs from six electrodes in the striatum. Data are shown after 300–6000 Hz digital band-pass filtering.
(C) A schematic diagram of basal ganglia connectivity. Dark blue arrows indicate glutamatergic excitatory connections; light blue arrows, GABAergic inhibitory connections; red, neuromodulators. Abbreviations: GPe indicates external segment of the globus pallidus; GPi, internal segment of the globus pallidus; SNc, substantia nigra pars compacta; SNr, substantia nigra pars reticulata; STN, subthalamic nucleus; TAN, tonically active neurons (putative striatal cholinergic interneurons).
The recent development of efficient tools for simultaneous recording of multineuron activity from the basal ganglia makes it possible to explore the correlation of basal ganglia neurons. Given the above, our working hypothesis predicts that the responses of neuromodulators should be homogenous and synchronized whereas main axis activity should be diverse and independent. In addition, the temporal modulation of noise correlation (Aertsen et al., 1989; Vaadia et al., 1995; Baker et al., 2001) might provide another domain, beyond rate and pattern, for neuronal encoding.

RESULTS

Behavior Task and the Neuronal Data Base

Two monkeys were introduced to seven different visual cues, each predicting the outcome in a probabilistic manner (Figure 1A). Three cues predicted a food outcome (reward cues) with a delivery probability of 1/3, 2/3, and 1, and three cues predicted an airpuff outcome (aversive cues) with a delivery probability of 1/3, 2/3, and 1. The seventh cue (the neutral cue) was never followed by a food or an airpuff outcome. Thus the task contained 18 different events, i.e., 7 different cues and 11 cue-outcome/no-outcome combinations. During the task we recorded the spiking activity of two to eight electrodes simultaneously (see Figure 1B for an example of simultaneous recordings of eight electrodes in the globus pallidus and for the simultaneous recording of six electrodes in the striatum that show activity of TANs). To avoid bias caused by shadowing effects (Lewicki, 1998; Bar-Gad et al., 2001), we limited this study to units recorded by different electrodes. Our neural database included 163 TANs, 144 DANs, 368 GPe, 158 GPi, and 174 SNr pairs of neurons (see Figure 1C for schematic network diagram) that were recorded simultaneously and satisfied the study inclusion criteria (see Experimental Procedures) for more than 30 successive minutes during task performance.
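The count of 18 task events follows from the cue probabilities given above: a cue with delivery probability 1 has no no-outcome variant, and the neutral cue has only a no-outcome variant. The short sketch below enumerates the events to make this arithmetic explicit. The event labels and the function name `enumerate_events` are illustrative assumptions and do not come from the original analysis code.

```python
from fractions import Fraction

# Outcome delivery probabilities for the three reward and three aversive cues,
# as stated in the text.
CUE_PROBS = [Fraction(1, 3), Fraction(2, 3), Fraction(1)]


def enumerate_events():
    """Return (cues, result_events) for the task; labels are hypothetical."""
    cues = []
    result_events = []
    for kind in ("reward", "aversive"):
        for p in CUE_PROBS:
            cue = f"{kind}_cue_p={p}"
            cues.append(cue)
            # Every reward/aversive cue can be followed by its outcome.
            result_events.append((cue, "outcome"))
            # Only probabilistic cues (p < 1) can also end with no outcome.
            if p < 1:
                result_events.append((cue, "no_outcome"))
    # The neutral cue is never followed by food or an airpuff.
    cues.append("neutral_cue")
    result_events.append(("neutral_cue", "no_outcome"))
    return cues, result_events


cues, result_events = enumerate_events()
print(len(cues), len(result_events), len(cues) + len(result_events))
```

Each cue type contributes three outcome events and two no-outcome events, and the neutral cue adds one, which gives the 11 cue-outcome/no-outcome combinations.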
Response Homogeneity of Neuromodulators versus Diversity of Responses in the Main Axis of Basal Ganglia Networks

We used the response correlation (Nevet et al., 2007) to quantify the similarity of the responses of a pair of cells to the same event. The response correlation is the correlation coefficient between two average responses (poststimulus time histogram [PSTH]) and hence quantifies the similarity of the temporal pattern of the responses. Figure 2 shows the distribution of the response correlation analysis for all studied populations. The response correlations for the GPe, GPi, and SNr neurons were symmetrically distributed with an average close to zero (Figure 2A). However, the distribution of the response correlation of DANs and TANs was skewed toward positive values (Figure 2B).

Figure 2. Response Correlation Reveals Similarity of Responses of the Basal Ganglia Modulators versus Heterogeneity of Responses of Main Axis Neurons
(A) Distribution of the GPe, GPi, and SNr (main axis) response correlations. Only responses with significant rate modulations of both neurons were included. N indicates number of included response pairs out of the total number of response pairs. For this analysis we constructed the PSTHs for the 2 s after the event onset in bins of 1 ms and smoothed them with a Gaussian filter of SD = 20 ms.
(B) Distribution of the DAN and TAN (neuromodulators) response correlations (same conventions as in A).
(C) The mean and SEM of the response correlation in each of the recorded populations.
(D) The percentage of significant response correlations (t test; p < 0.05). Black indicates positive response correlations; white, negative response correlations. The smoothing of the PSTHs leads to dependency between bins, and hence for the significance testing we constructed the PSTHs in bins of 50 ms with no smoothing.
The mean response correlation of the neuromodulators was larger than the mean correlation for the main axis (p < 0.001; t test on the z transformed values, Figure 2C). We found that the difference was also apparent in the fraction of significant positive and negative response correlations. A large proportion of the positive response correlations of the DANs and TANs were significantly different from zero, but this was true for only a small proportion of the negative correlations (Figure 2D). In the GPe, GPi, and SNr, although many of the response correlations were significantly different from zero, the proportion of cells with positive and negative response correlations was similar (Figure 2D). We conclude that the neuromodulators of the basal ganglia have homogenous responses whereas the responses of the main axis are diverse. Response correlation analysis tests the correlation between pairs of responses to single events; however, it does not directly test the correlation between the average responses of pairs of neurons to more than one event. To test whether encoding of different events is correlated we performed signal correlation analysis (Gawne and Richmond, 1993; Lee et al., 1998; Averbeck and Lee, 2004). We found that the signal and response correlation analysis yielded similar results; i.e., the distribution of the signal correlation of the neuromodulators was skewed toward positive values and for the main axis the signal correlation was symmetrically distributed with an average close to zero (see Figures S1A–S1D available online). Comparing the signal and response correlations showed that these two correlation measures were correlated (Figure S1E). This indicates that the cell pairs with comparable temporal response pattern are those that encode different events similarly. 
To summarize, the average responses of the basal ganglia neuromodulators (TANs and DANs) were homogeneous, in contrast to the diverse responses of neurons in the main axis of the basal ganglia (GPe, GPi, and SNr).

Reward Expectation and Delivery Enhances Temporal Modulation of DAN Correlations

The response and signal correlations are measures of the correlation of the average responses (across trials) of two cells and do not take into account the dynamic changes in their noise correlation (correlations between variations from the average response) that can occur within a given epoch (see Figure S2 for average noise correlation). We therefore calculated the joint peristimulus histogram (JPSTH) (Gerstein and Perkel, 1969; Aertsen et al., 1989; Vaadia et al., 1995). The JPSTH is obtained by subtracting the PSTH predictors from the raw coincident count matrix to obtain an estimate of the unpredicted correlations, i.e., correlations beyond those predicted by the modulation of the average discharge rate (see Figure S3 for three examples of JPSTH analysis). Note that the JPSTH diagonal quantifies the time-dependent modulation of zero lag noise correlation.

Figure 3. Noise Correlation of DAN Pairs Increased with Expectation of Reward and Reward Delivery but Not for Aversive Events
(A) The population JPSTH of the DANs (n = 144 pairs) for the reward trials. Left, cue; middle, outcome; right, no outcome. Bin size 50 × 50 ms, smoothed with a two-dimensional Gaussian filter with SD = 1 bin. The different JPSTHs have different intensity (color bars on the right) scales to enhance the visibility of the correlation dynamics.
(B) The DAN population JPSTH for aversive trials. Corresponding epochs in (A) and (B) have the same color scaling to enable comparison of aversive and reward JPSTHs.
We extended the JPSTH analysis of a single neuron pair to the populations of neuromodulator neurons. To examine whether the DAN noise correlation depends on the context of the behavioral task, we analyzed the reward and aversive trials separately. In Figure 3 we show the separation of the DAN population JPSTHs into reward and aversive trials. In the cue and outcome epochs, the DAN noise correlation increased for the reward trials (Figure 3A) but not for the aversive trials (Figure 3B). Testing for differences between the average JPSTH diagonal before and after the event (paired t test on the average diagonal comparing −0.5 to 0.0 s versus 0.1 to 0.6 s) shows that there was a substantial increase in the noise correlation for the reward cue (p < 0.01) and outcome (p < 0.001) as compared with a nonsignificant increase for the aversive cue (p = 0.46) and a nonsignificant decrease for the aversive outcome (p = 0.06). The JPSTH analysis revealed changes in the synchronization level beyond those expected by the changes in firing rate (Aertsen et al., 1989). In Figure 4 we show the comparison between synchronization and rate modulations (JPSTH and predictor diagonals, respectively). We found that although there was an increase in rate for both reward and aversive trials (Figure 4A and Joshua et al., 2008), the increase in the noise correlation was found only in the reward trials (Figure 4B, and see Figure 4C for a comparison of noise correlation dynamics for epochs with similar rate modulation). Furthermore, the JPSTH analysis for the subset of dopaminergic pairs that simultaneously increase their firing rate to aversive outcome shows that the noise correlation of these cells does not increase (Figure S4). JPSTH analysis of the TANs did not reveal a correlation encoding of the rewarding versus aversive events (Figure S5).
Figure 5 shows the results of the significance test (paired t test) comparing the JPSTH diagonals for the reward and aversive trials. The difference between reward and aversive in the cue and outcome epochs was highly significant for the DANs (Figure 5, red line) but not for the TAN pairs (Figure 5, green line). Thus, the transient changes in noise correlation in the DANs, but not TANs, discriminate between reward and aversive related events.

TANs Show an Unspecific Decrease in Noise Correlation before Cue Ending

Figure 6 presents the analysis of the population JPSTH for the TANs (from 0.5 s before cue onset to 1 s after cue offset and the beginning of the outcome/no-outcome epoch). We grouped the outcome and no-outcome epochs because we did not find significant differences between their JPSTHs (paired t test; p > 0.16). As was previously shown (Raz et al., 1996; Kimura et al., 2003; Morris et al., 2004), we found that TANs tend to have positive noise correlations. In comparison to the fast increase of the noise correlation of the DANs (Figures 3 and 4) following the onset of rewarding cue and outcome, the TAN correlations decreased gradually during the cue epoch and increased in the outcome epoch (Figures 6A and 6B). We found that the TAN correlation and rate modulations tended to be separated in time (Figure 6C).

Figure 4. Modulations of DAN Noise Correlation Do Not Mirror Rate Modulation
(A) Common rate modulations: Diagonal of the PSTH predictor (±SEM in gray shading, n = 144 DAN pairs) for the reward (blue) and aversive events (red). Left, cue; middle, outcome; right, no outcome.
(B) Zero lag noise correlation: JPSTH diagonal (±SEM in gray shading) of the DANs for the reward (blue) and aversive (red) events. Same conventions as in (A).
(C) An example of reward and aversive events with similar rate modulation but opposite JPSTH modulations. Left: Predictor diagonal (common rate modulation) for reward cue (blue solid line) and aversive outcome (red solid line). Right: Corresponding JPSTH diagonals (noise correlation modulations). The rate and JPSTH modulation of the other events in (A) and (B) left and middle subplots are given in dashed lines. Although both PSTH predictors (common rate modulations) have a similar positive peak (left), only the diagonal of the JPSTH for the reward cue has positive modulations (right).

DISCUSSION

We showed that the responses of cells from the same neuromodulator population (TANs or DANs) tended to have a positive correlation. In comparison to the homogenous responses of the basal ganglia modulators, the neurons of the basal ganglia main axis had diverse responses. Pairs of DANs, as well as pairs of TANs, dynamically modulate their discharge variation (noise correlation) in accordance with events in the behavioral task. The noise correlation between the DANs increased after the cue and outcome events, whereas the TAN noise correlation decreased just before cue offset. Furthermore, although the discharge rate of the DANs increased both in reward and aversive trials, their noise correlation increased only in the reward trials.

Correlations of the Average Response Set Neuromodulators Apart from the Main Axis

Previous studies have observed that different neuromodulator cells have responses with similar temporal patterns (Graybiel et al., 1994; Schultz, 1998). In this manuscript we quantified the similarity of the temporal pattern of the response (response correlation) and the similarity of the encoding of different events (signal correlation). We showed that in contrast to the basal ganglia neuromodulators, the main axis responses are diverse (Figures 2, S1, and S2).
The homogeneous responses of the neuromodulators suggest that these populations as a whole provide the main axis with a scalar message; i.e., the encoding of different DANs, as well as different TANs, is similar. By contrast, the diversity of the main axis responses suggests that its activity is highly independent, which is conducive to a large information capacity (Bar-Gad et al., 2003). The contrast between the diversity of the main axis response and the homogeneity of the modulators was demonstrated in a behavioral task with 18 different events. Nevertheless, we cannot rule out the possibility that the recording of neural activity during other tasks or over greater spatial distances (including DANs in the ventral tegmental area and TANs in the caudate or ventral striatum) might reveal other effects. Future studies using a large variety of tasks and wider sampling of basal ganglia neurons should test the consistency and the spatial extent of the homogeneity of the basal ganglia modulators.

Based mainly on the activity of the DANs, it has been suggested that the basal ganglia implements a reinforcement learning algorithm (Schultz et al., 1997). The distinction between the correlation properties of neuromodulators and the main axis is in line with the idea that these populations have a different role in the reinforcement learning system. The neuromodulators’ scalar response is consistent with these neurons being the teacher (e.g., a critic) of this system. The actor, however, requires specificity in encoding of different neuronal elements. Indeed we have found such diversity in the encoding of the main axis neurons.

Figure 5. DAN but Not TAN Noise Correlation Differentiates Reward from Aversive Trials
The surprise (−ln(p), p of the paired t test) of the difference between reward and aversive JPSTH diagonals for TAN (green) and DAN (red) neuronal pairs. Dashed line indicates surprise at p = 0.01; values above the dashed line indicate p < 0.01 events. Top, cue; middle, outcome; bottom, no outcome.

Limitations of JPSTH Analysis

Several factors limit the interpretation of JPSTH analysis. Variability of latency or excitability effects contribute confounding factors to the JPSTH matrix (Brody, 1999). We could not unequivocally exclude the possibility that these effects contributed to our JPSTHs. For the TANs, however, this is unlikely because the decrease in noise correlation toward the end of the cue epoch does not overlap with the typical fast and transient TAN response (Figure 6C). For the DANs we indeed found a tendency toward coincidence of noise correlation and rate modulations, but the JPSTH analysis dissociated the rewarding and aversive events which nevertheless have similar rate modulations (Figure 4). Trial-to-trial variability in action might also confound the interpretation of JPSTH analysis (Ben Shaul et al., 2001). Previously we have shown that due to their motor-related sustained responses, the JPSTHs of main axis neuronal pairs are sensitive to false detection of dynamic changes (Arkadir et al., 2002). However, action itself is not encoded in neuromodulators (Kimura et al., 1984; Schultz, 1998; Morris et al., 2004). Hence, we conclude that variability in action did not contribute to the neuromodulator JPSTH analysis. The neuromodulators’ firing pattern is composed of a stereotypic short latency phasic response to external events and tonic Poisson-like activity between these responses (Kimura et al., 1984; Schultz, 1986; Bayer et al., 2007). This excludes the possibility that opposite signs of neural transients lead to detection of discharge covariation without rate modulations (Friston, 1995).
We do not exclude the possibility that the increase in the correlation of the DAN population at the time of the response is due to dynamics of neural transients. Other possibilities are that the increase in correlation is due to changes in the effective connectivity in the dopaminergic neuron network or covariability of inputs. Hence we did not focus on the source of correlation, but refer to the possible effect of the correlation dynamics on the postsynaptic striatal neurons (see below). Thus the JPSTH analysis of the neuromodulators can be considered valid and provides valuable insights into the encoding of the basal ganglia. Similar studies of the dynamics of noise correlation of the basal ganglia main axis neurons will need to wait for future technical and methodological advances.

Figure 6. Population JPSTH of TANs Reveals a Decrease in Noise Correlation around Cue Offset
(A) The population JPSTH of the TANs (n = 163 pairs). Bin size 50 × 50 ms, smoothed with a two-dimensional Gaussian filter with SD = 1 bin. Cue appeared at time 0 and lasted until the beginning of the outcome/no-outcome epochs at time = 2 s (marked by dashed lines).
(B) Diagonal of the population JPSTH (smoothed with Gaussian kernel, SD = 1 bin), average in solid line and SEM in light gray.
(C) The mean diagonal of TAN JPSTH (blue) and the mean PSTH predictor (common rate modulation, green) superimposed. The temporal pattern of noise correlation modulations does not reflect the temporal pattern of rate modulations. Specifically, the decrease in noise correlation before the end of the cue epoch is not coincident with rate modulations.

Reward-Related Increase in the Noise Correlation of Dopaminergic Neurons

Previous studies have shown that the discharge rate of DANs is modulated by reward, and it was suggested that these neurons encode the reward prediction error (Schultz, 1997; Nakahara et al., 2004; Bayer and Glimcher, 2005; Pan et al., 2005; Morris et al., 2006). Other behavioral factors might also lead to an increase in the dopaminergic rate (Horvitz, 2000; Kakade and Dayan, 2002; Redgrave and Gurney, 2006; Day et al., 2007). We showed that in a classical conditioning task, the activity of the dopaminergic neurons also increased following nonrewarding events such as the prediction and delivery of airpuffs (Figures 4 and S4, and Joshua et al., 2008). Nonetheless, we found an increase in the noise correlation of DANs to expectation and delivery of reward and not to other events (Figures 3 and 4). These findings of a reward-related increase of the noise correlation extend previous findings of unspecific spike-to-spike (noise) correlations of the DANs (Grace and Bunney, 1983b; Morris et al., 2004). The modulations of the noise correlation were small compared with the modulations of rate (Figure 4). In a recent study, Schneidman et al. (2006) showed that a weak pairwise correlation might imply a strongly correlated network and provides an effective description of the system. It remains to be determined whether pairwise correlations can yield an effective description of the dopaminergic neurons because current recording methods do not enable in vivo simultaneous recording of many neurons; nevertheless, it demonstrates the potential importance of the current finding of an increase in the pairwise noise correlations. Dopamine transmission is probably not limited to classical synaptic action because it might also diffuse and reach extrasynaptic receptors (Cragg and Rice, 2004; Arbuthnott and Wickens, 2007; Moss and Bolam, 2008). The spatiotemporal distribution of dopamine effects in the striatum depends on the interaction of release, reuptake, and diffusion. The degree of temporal correlation of the release events influences the relative importance of reuptake versus diffusion.
Reuptake by the dopamine transporter is a slow process compared with diffusion of dopamine away from a synapse. Diffusion produces a relatively rapid decrease in concentration if the extracellular concentration of dopamine from other sources is relatively low. However, if dopamine is released from many adjacent sources simultaneously, diffusion is slowed, and reuptake predominates. We used a one-dimensional random walk model to simulate diffusion of dopamine from multiple sources, combined with Michaelis-Menten reuptake kinetics. In Figure S6 we show that the DAN correlation might increase the efficiency of dopamine signaling by reduced clearance through diffusion in the correlated condition. Future studies, using 3D models of the striatum and more comprehensive models of correlated DAN activity, could provide a better understanding of the physiological significance of this phenomenon.

TAN Correlations Are Modulated by Task Timing but Not by Value

Previous studies have shown that TANs are highly synchronized (Raz et al., 1996; Kimura et al., 2003; Morris et al., 2004). However, these studies did not consider the temporal dynamics of the noise correlation. Consistent with these studies, we found that TANs are indeed highly synchronized. Additionally, we found that there is a decrease in their noise correlation just before cue offset (Figure 6). This decrease in noise correlation did not discriminate significantly between the aversive and reward trials (Figures 5 and S5) and appears after the average TAN discharge rate returns to baseline (Figure 6C). It was shown that subpopulations of striatal projection cells encode the outcome stages of the task (Lau and Glimcher, 2007). Thus the decorrelation of TANs at the end of the cue epoch could enable or facilitate this encoding of striatal projection neurons through the cholinergic control of cortico-striatal plasticity (Calabresi et al., 2000; Pisani et al., 2007).
Concluding Remarks

Consistent with the classical concept of dopamine-acetylcholine balance (Barbeau, 1962), the DANs and the TANs have opposing single cell responses. DANs typically increase their discharge rate in response to appetitive predictive cues and outcomes (Schultz, 1998), whereas TANs show a decrease or pause in their background discharge (Aosaki et al., 1994). We found that during the cue epoch the noise correlation of the DANs increases, whereas the correlation for the TANs decreases. We therefore suggest that the concept of dopamine-acetylcholine balance can be extended to the noise correlation of these systems. It is possible that increasing the DAN correlation and the decorrelation of TANs enables an increase and decrease, respectively, in the effective concentrations of striatal dopamine and acetylcholine. The right balance of the basal ganglia neuromodulators and cortico-striatal activity might lead to a maximization of information in the basal ganglia main axis and an optimal behavioral policy.

EXPERIMENTAL PROCEDURES

All experimental protocols were conducted in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and with the Hebrew University guidelines for the use and care of laboratory animals in research, supervised by the Institutional Animal Care and Use Committee. Behavioral task, data-recording methods, and single cell analysis appear in detail in previous manuscripts (Joshua et al., 2008, 2009). Here we present a brief summary of these methods and describe methods not used in the previous manuscripts.

Behavioral Task

Two monkeys (L and S, Macaca fascicularis, female 4 kg and male 5 kg) were introduced to seven different fractal visual cues, each predicting the outcome in a probabilistic manner (Figure 1A).
Fractal cues (full-screen images, 17″ LCD monitor, 50 cm in front of the monkey’s face) were presented for 2 s. The cues were immediately followed by a result epoch, which could include an outcome (food, airpuff) or no outcome, according to the probabilities associated with the cue. The beginning of the result epoch was signaled by one of three sounds that discriminated the three possible events: a drop of food, an airpuff, or no outcome. Trials were followed by a variable intertrial interval (ITI, monkey S: 3–7 s, monkey L: 4–8 s; Figure 1A).

Recording and Data Acquisition

During the acquisition of the neuronal data, two experimenters (M.J. and A.A.) controlled the vertical position of the eight glass-coated tungsten electrodes (confined with 1.65 mm guide) and real-time spike sorting (AlphaMap, ASD, AlphaOmega). Recorded units were subjected to offline quality analysis that included tests for rate stability, refractory period, waveform isolation, and recording time. First, firing rate as a function of time during the recording session was graphically displayed, and the largest continuous segment of stable data was selected for further analysis. Second, cells in which more than 0.02 of the total ISIs were shorter than 2 ms were excluded from the database. Third, only units with an isolation score (Joshua et al., 2007) above 0.8 (except for the DANs, in which we used a threshold of isolation score > 0.5) were included in the database. The lower threshold used for the DANs is due to the highly dense cellular structure of the SNc, which makes single cell isolation difficult. We also performed the analysis on the high-quality DANs (isolation score > 0.8) and received similar results to those reported. The largest segment for which two simultaneously recorded units fulfilled the inclusion criteria was included in the analysis database only if it was greater than 30 min.
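The numerical inclusion criteria above (short-ISI fraction, isolation score, and joint recording time) can be summarized in a few lines. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the isolation score is taken as a precomputed number rather than derived from waveforms, and the graphical rate-stability step is assumed to have been applied already when selecting the spike train segment.

```python
def short_isi_fraction(spike_times_s, threshold_s=0.002):
    """Fraction of inter-spike intervals shorter than threshold_s (2 ms)."""
    isis = [b - a for a, b in zip(spike_times_s, spike_times_s[1:])]
    if not isis:
        return 0.0
    return sum(isi < threshold_s for isi in isis) / len(isis)


def unit_passes(spike_times_s, isolation_score, is_dan=False):
    """Apply the refractory-period and isolation criteria to one unit.

    Units with more than 0.02 of their ISIs below 2 ms are rejected.
    The isolation threshold is 0.8, relaxed to 0.5 for DANs.
    """
    min_score = 0.5 if is_dan else 0.8
    refractory_ok = short_isi_fraction(spike_times_s) <= 0.02
    return refractory_ok and isolation_score > min_score


def pair_passes(unit_a_ok, unit_b_ok, joint_stable_minutes):
    """A pair is kept only if both units pass and overlap for more than 30 min."""
    return unit_a_ok and unit_b_ok and joint_stable_minutes > 30


# Hypothetical example: a regular 10 Hz train with a good isolation score.
regular_train = [i * 0.1 for i in range(100)]
print(unit_passes(regular_train, isolation_score=0.9))
```

Keeping the per-unit and per-pair checks separate mirrors the text, in which units are screened first and the 30 min requirement is then applied to the segment shared by the two units.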
Quantification of Similarity of Temporal Profile of Neuronal Responses: Response Correlation Analysis

For each cell and each behavioral event, we calculated the PSTH. Each of these PSTHs is an n-dimensional vector, where n is the number of 1 ms bins in the histogram (n = 2000 bins, starting at the event onset). This vector was smoothed with a Gaussian window (standard deviation [SD] = 20 ms). To avoid spurious positive correlations due to smoothing of the PSTHs, we padded the PSTH edges with the mirrors of the PSTHs before smoothing. Responses were considered significant if they exceeded the mean of the ITI by three times the ITI SD (3σ rule) for 60 consecutive bins (three times the smoothing SD). To calculate the ITI SD, we randomly pruned the number of ITI trials to the same number of trials for which we calculated the PSTH. We determined the similarity of the responses of two cells to a behavioral event by calculating the correlation coefficient of the PSTHs. We denoted this correlation the response correlation. The response correlation was calculated only for PSTHs with significant responses. To obtain the population response correlation, we grouped all the correlation values, transformed them by a z-transform (Sokal and Rohlf, 1981), and calculated their mean and the standard error of the mean (SEM). The population mean and SEM were obtained by inverse z-transform of these values. For the response correlation analysis, we used a time window of 2 s starting at the event onset. Because the neuromodulators have a short response, we also performed the analysis on a time window of 1 s, and this analysis gave similar results.

Quantification of Similarity of Responses across Different Events: Signal Correlation Analysis

For each neuron, we computed the PSTHs for all behavioral events (18 events). For this analysis we used the first five 100 ms bins (with no Gaussian smoothing) of the response.
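The response correlation procedure described above (mirror padding, Gaussian smoothing, correlation coefficient, and population averaging through the z-transform) can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the function names, the kernel truncation at 3 SD, and the use of the Fisher z-transform (`atanh`/`tanh`) for the z-transform of Sokal and Rohlf are assumptions, and the 3σ significance screening is not modeled.

```python
import math


def gaussian_kernel(sd_bins, half_width_sds=3):
    """Normalized Gaussian kernel, truncated at half_width_sds (assumed: 3 SD)."""
    half = int(round(half_width_sds * sd_bins))
    weights = [math.exp(-0.5 * (k / sd_bins) ** 2) for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_mirror(psth, sd_bins=20):
    """Smooth a PSTH after padding both edges with mirrored copies of the data."""
    kernel = gaussian_kernel(sd_bins)
    half = len(kernel) // 2
    if half == 0:
        return list(psth)
    if len(psth) < half:
        raise ValueError("PSTH is shorter than the kernel half-width")
    padded = list(psth[:half])[::-1] + list(psth) + list(psth[-half:])[::-1]
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(psth))
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def response_correlation(psth_a, psth_b, sd_bins=20):
    """Correlation of two smoothed PSTHs (1 ms bins, SD = 20 bins by default)."""
    return pearson(smooth_mirror(psth_a, sd_bins), smooth_mirror(psth_b, sd_bins))


def fisher_mean_sem(correlations):
    """Population mean and SEM computed in z-space, then inverse-transformed.

    Requires at least two values, each strictly between -1 and 1.
    """
    zs = [math.atanh(r) for r in correlations]
    n = len(zs)
    mean_z = sum(zs) / n
    sd_z = math.sqrt(sum((z - mean_z) ** 2 for z in zs) / (n - 1))
    return math.tanh(mean_z), math.tanh(sd_z / math.sqrt(n))
```

Because the smoothing kernel is normalized and the padding is a reflection of the data, scaling or offsetting one PSTH leaves the response correlation unchanged, so the measure reflects only the temporal pattern of the response.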
We combined all PSTHs into an 18 × 5 matrix, where each row was a task event and each column was a 100 ms bin. For each column, we subtracted that column’s mean and then flattened the matrix into a vector of length 90 (18 events × 5 bins). For each pair of simultaneously recorded neurons, we computed the signal correlation by calculating the correlation coefficient of these vectors. For the population average and SEM we z-transformed the correlation coefficients (Sokal and Rohlf, 1981), calculated the average and SEM, and obtained the inverse of the transform. The response and signal correlation were also calculated for pairs of neurons that were not simultaneously recorded and therefore were probably more remote than neurons recorded simultaneously. Analysis of nonsimultaneously recorded cells generated similar trends as the simultaneous ones (i.e., large positive correlations for the neuromodulators versus close to zero average correlations for the main axis); however, correlation values were generally smaller (data not shown).

Quantification of the Temporal Dynamics of the Noise Correlation: JPSTH Analysis

The JPSTH analysis quantifies the temporal dynamics of the modulation of correlations (Gerstein and Perkel, 1969; Aertsen et al., 1989). For this analysis, we calculated the raw JPSTH matrix in which the (t1,t2)-th bin was the count of the number of times that a coincidence occurred, in which neuron #1 spiked in time bin t1 and neuron #2 spiked in time bin t2 on the same trial (see examples in the first column of Figure S3). To correct for rate modulations we calculated the PSTH predictor (Aertsen et al., 1989). The predictor matrix is the product of the single-neuron PSTHs, i.e., the (t1,t2)-th bin is equal to PSTH1(t1)*PSTH2(t2) (see examples in the second column of Figure S3). The JPSTH was calculated as the subtraction of the number of coincident spikes expected by chance (PSTH predictor) from the raw matrix (see examples in Figure S3).
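The signal correlation computation described above (an events × bins matrix per neuron, column-mean subtraction, flattening, and correlation) is sketched below. The function names are hypothetical and the input matrices are assumed to already hold the unsmoothed PSTH values in 100 ms bins.

```python
import math


def signal_vector(psth_matrix):
    """Subtract each column's mean and flatten an events x bins matrix row by row."""
    n_events = len(psth_matrix)
    n_bins = len(psth_matrix[0])
    col_means = [
        sum(row[j] for row in psth_matrix) / n_events for j in range(n_bins)
    ]
    return [row[j] - col_means[j] for row in psth_matrix for j in range(n_bins)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signal_correlation(matrix_1, matrix_2):
    """Signal correlation of two neurons from their events x bins PSTH matrices."""
    return pearson(signal_vector(matrix_1), signal_vector(matrix_2))
```

Subtracting the column means removes the component of the response time course that is common to all events, so the resulting coefficient reflects how similarly the two neurons differentiate between events rather than their shared temporal profile.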
The JPSTH was calculated in bins of 50 ms and smoothed with a two-dimensional Gaussian window with an SD of 50 ms (1 bin). We also corrected the raw JPSTH using the shift predictor. The different predictors gave similar results, and no trend was found when calculating the difference between these predictors (data not shown). We therefore concluded that the data did not suffer from long-lasting trends, because such trends would affect the shift predictor and the PSTH predictor differently. We preferred the use of the PSTH correction in the graphical displays in this manuscript because it results in less noisy estimates (Aertsen et al., 1989). In the text, JPSTH refers to the JPSTH corrected by the PSTH predictor. To group several JPSTHs from several events, we calculated the corrected JPSTH of each event separately and then summed all corrected JPSTHs. For example, the JPSTH for the reward cue is the sum of the corrected JPSTHs of the three cues with different probabilities (p = 1/3, 2/3, 1) of receiving reward. We also normalized the JPSTH to obtain correlation coefficient values as introduced by Aertsen et al. (1989); i.e., each bin was divided by the SD of the trial-to-trial response. Population analysis of the normalized and nonnormalized (but corrected) JPSTH gave qualitatively similar results. In the text, JPSTH refers to the corrected but not normalized JPSTH. To test whether the population JPSTHs for two different events were significantly different, we performed a bin-by-bin paired t test. The surprise values were obtained by transforming the p value of this test by −ln(p). We carried out JPSTH analysis for both the neuromodulators and main axis neurons; however, as we and others have shown, for the neurons of the main axis of the basal ganglia, JPSTH analysis might lead to false detection of correlation dynamics due to variability in the motor-related responses (Arkadir et al., 2002).
Indeed, many of the JPSTH matrices of the main axis neurons revealed significant marginal effects of the PSTH. This indicates that the PSTH and shift predictors were not able to correct the raw JPSTH reliably, and therefore we excluded the main axis populations from the JPSTH analysis.

SUPPLEMENTAL DATA

Supplemental Data include six figures and can be found with this article online at http://www.cell.com/neuron/supplemental/S0896-6273(09)00350-X.

ACKNOWLEDGMENTS

This study was partly supported by the Hebrew University Netherlands Association (HUNA)’s “Fighting against Parkinson,” the Vorst family foundation grants, FP7 “Select and Act” grant, and the Okinawa Institute of Science and Technology (OIST).

Accepted: April 28, 2009
Published: June 10, 2009

REFERENCES

Ben Shaul, Y., Bergman, H., Ritov, Y., and Abeles, M. (2001). Trial to trial variability in either stimulus or action causes apparent correlation and synchrony in neuronal activity. J. Neurosci. Methods 111, 99–110.
Bergman, H., Feingold, A., Nini, A., Raz, A., Slovin, H., Abeles, M., and Vaadia, E. (1998). Physiological aspects of information processing in the basal ganglia of normal and parkinsonian primates. Trends Neurosci. 21, 32–38.
Brody, C.D. (1999). Correlations without synchrony. Neural Comput. 11, 1537–1551.
Calabresi, P., Centonze, D., Gubellini, P., Pisani, A., and Bernardi, G. (2000). Acetylcholine-mediated modulation of striatal function. Trends Neurosci. 23, 120–126.
Centonze, D., Gubellini, P., Pisani, A., Bernardi, G., and Calabresi, P. (2003). Dopamine, acetylcholine and nitric oxide systems interact to induce corticostriatal synaptic plasticity. Rev. Neurosci. 14, 207–216.
Cragg, S.J., and Rice, M.E. (2004). DAncing past the DAT at a DA synapse. Trends Neurosci. 27, 270–277.
Abeles, M. (1982). Local Cortical Circuits (Berlin, Heidelberg, New York: Springer-Verlag).
Aertsen, A.M., Gerstein, G.L., Habib, M.K., and Palm, G. (1989).
Dynamics of neuronal firing correlation: modulation of “effective connectivity”. J. Neurophysiol. 61, 900–917.
Alexander, G.E., DeLong, M.R., and Strick, P.L. (1986). Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu. Rev. Neurosci. 9, 357–381.
Aosaki, T., Tsubokawa, H., Ishida, A., Watanabe, K., Graybiel, A.M., and Kimura, M. (1994). Responses of tonically active neurons in the primate’s striatum undergo systematic changes during behavioral sensorimotor conditioning. J. Neurosci. 14, 3969–3984.
Arbuthnott, G.W., and Wickens, J. (2007). Space, time and dopamine. Trends Neurosci. 30, 62–69.
Arkadir, D., Ben Shaul, Y., Morris, G., Maraton, S., Goldberg, J.A., and Bergman, H. (2002). False detection of dynamic changes in pallidal neuron interactions by the Joint Peri-Stimulus Histogram method. In The Basal Ganglia VII, L.F.B. Nicholson and R.L.M. Faull, eds. (New York: Kluwer Academic/Plenum Publishers), pp. 181–190.
Averbeck, B.B., and Lee, D. (2004). Coding and transmission of information by neural ensembles. Trends Neurosci. 27, 225–230.
Day, J.J., Roitman, M.F., Wightman, R.M., and Carelli, R.M. (2007). Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat. Neurosci. 10, 1020–1028.
DeLong, M.R. (1971). Activity of pallidal neurons during movement. J. Neurophysiol. 34, 414–427.
Eggermont, J.J. (1990). The Correlative Brain. Theory and Experiment in Neuronal Interaction (Berlin: Springer-Verlag).
Frank, M.J., Seeberger, L.C., and O’Reilly, R.C. (2004). By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science 306, 1940–1943.
Friston, K.J. (1995). Neuronal transients. Proc. Biol. Sci. 261, 401–405.
Fukai, T., and Tanaka, S. (1997). A simple neural network exhibiting selective activation of neuronal ensembles: from winner-take-all to winners-share-all. Neural Comput. 9, 77–97.
Gawne, T.J., and Richmond, B.J. (1993).
How independent are the messages carried by adjacent inferior temporal cortical neurons? J. Neurosci. 13, 2758–2771.
Gerstein, G.L., and Perkel, D.H. (1969). Simultaneously recorded trains of action potentials: analysis and functional interpretation. Science 164, 828–830.
Averbeck, B.B., Latham, P.E., and Pouget, A. (2006). Neural correlations, population coding and computation. Nat. Rev. Neurosci. 7, 358–366.
Grace, A.A., and Bunney, B.S. (1983a). Intracellular and extracellular electrophysiology of nigral dopaminergic neurons—1. Identification and characterization. Neuroscience 10, 301–315.
Baker, S.N., Philbin, N., Spinks, R., Pinches, E.M., Wolpert, D.M., MacManus, D.G., Pauluis, Q., and Lemon, R.N. (1999). Multiple single unit recording in the cortex of monkeys using independently moveable microelectrodes. J. Neurosci. Methods 94, 5–17.
Grace, A.A., and Bunney, B.S. (1983b). Intracellular and extracellular electrophysiology of nigral dopaminergic neurons—3. Evidence for electrotonic coupling. Neuroscience 10, 333–348.
Baker, S.N., Spinks, R., Jackson, A., and Lemon, R.N. (2001). Synchronization in monkey motor cortex during a precision grip task. I. Task-dependent modulation in single-unit synchrony. J. Neurophysiol. 85, 869–885.
Bar-Gad, I., Ritov, Y., Vaadia, E., and Bergman, H. (2001). Failure in identification of overlapping spikes from multiple neuron activity causes artificial correlations. J. Neurosci. Methods 107, 1–13.
Bar-Gad, I., Morris, G., and Bergman, H. (2003). Information processing, dimensionality reduction and reinforcement learning in the basal ganglia. Prog. Neurobiol. 71, 439–473.
Barbeau, A. (1962). The pathogenesis of Parkinson’s disease: A new hypothesis. Can. Med. Assoc. J. 87, 802–807.
Bartho, P., Hirase, H., Monconduit, L., Zugaro, M., Harris, K.D., and Buzsaki, G. (2004). Characterization of neocortical principal cells and interneurons by network interactions and extracellular features. J. Neurophysiol. 92, 600–608.
Bayer, H.M., and Glimcher, P.W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129–141.
Bayer, H.M., Lau, B., and Glimcher, P.W. (2007). Statistics of midbrain dopamine neuron spike trains in the awake primate. J. Neurophysiol. 98, 1428–1439.
Graybiel, A.M., Aosaki, T., Flaherty, A.W., and Kimura, M. (1994). The basal ganglia and adaptive motor control. Science 265, 1826–1831.
Haber, S.N., and Gdowski, M.J. (2004). The basal ganglia. In The Human Nervous System, G. Paxinos and J.K. Mai, eds. (Amsterdam: Elsevier), pp. 676–738.
Horvitz, J.C. (2000). Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience 96, 651–656.
Joshua, M., Elias, S., Levine, O., and Bergman, H. (2007). Quantifying the isolation quality of extracellularly recorded action potentials. J. Neurosci. Methods 163, 267–282.
Joshua, M., Adler, A., Mitelman, R., Vaadia, E., and Bergman, H. (2008). Midbrain dopaminergic neurons and striatal cholinergic interneurons encode the difference between reward and aversive events at different epochs of probabilistic classical conditioning trials. J. Neurosci. 28, 11673–11684.
Joshua, M., Adler, A., Rosin, B., Vaadia, E., and Bergman, H. (2009). Encoding of probabilistic rewarding and aversive events by pallidal and nigral neurons. J. Neurophysiol. 101, 758–772.
Kakade, S., and Dayan, P. (2002). Dopamine: generalization and bonuses. Neural Netw. 15, 549–559.
Kimura, M., Rajkowski, J., and Evarts, E. (1984). Tonically discharging putamen neurons exhibit set-dependent responses. Proc. Natl. Acad. Sci. USA 81, 4998–5001.
Perkel, D.H., Gerstein, G.L., and Moore, G.P. (1967). Neuronal spike trains and stochastic point processes. II. Simultaneous spike trains. Biophys. J. 7, 419–440.
Kimura, M., Matsumoto, N., Okahashi, K., Ueda, Y., Satoh, T., Minamimoto, T., Sakamoto, M., and Yamada, H. (2003). Goal-directed, serial and synchronous activation of neurons in the primate striatum. Neuroreport 14, 799–802.
Pisani, A., Bernardi, G., Ding, J., and Surmeier, D.J. (2007). Re-emergence of striatal cholinergic interneurons in movement disorders. Trends Neurosci. 30, 545–553.
Lau, B., and Glimcher, P.W. (2007). Action and outcome encoding in the primate caudate nucleus. J. Neurosci. 27, 14502–14514.
Plenz, D. (2003). When inhibition goes incognito: feedback interaction between spiny projection neurons in striatal function. Trends Neurosci. 26, 436–443.
Lee, D., Port, N.L., Kruse, W., and Georgopoulos, A.P. (1998). Variability and correlated noise in the discharge of neurons in motor and parietal areas of the primate cortex. J. Neurosci. 18, 1161–1170.
Raz, A., Feingold, A., Zelanskaya, V., Vaadia, E., and Bergman, H. (1996). Neuronal synchronization of tonically active neurons in the striatum of normal and parkinsonian primates. J. Neurophysiol. 76, 2083–2088.
Lewicki, M.S. (1998). A review of methods for spike sorting: the detection and classification of neural action potentials. Network 9, R53–R78.
Mink, J.W. (1996). The basal ganglia: focused selection and inhibition of competing motor programs. Prog. Neurobiol. 50, 381–425.
Morris, G., Arkadir, D., Nevet, A., Vaadia, E., and Bergman, H. (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43, 133–143.
Morris, G., Nevet, A., Arkadir, D., Vaadia, E., and Bergman, H. (2006). Midbrain dopamine neurons encode decisions for future action. Nat. Neurosci. 9, 1057–1063.
Moss, J., and Bolam, J.P. (2008). A dopaminergic axon lattice in the striatum and its relationship with cortical and thalamic terminals. J. Neurosci. 28, 11221–11230.
Nakahara, H., Itoh, H., Kawagoe, R., Takikawa, Y., and Hikosaka, O. (2004).
Dopamine neurons can represent context-dependent prediction error. Neuron 41, 269–280.
Nevet, A., Morris, G., Saban, G., Arkadir, D., and Bergman, H. (2007). Lack of spike-count and spike-time correlations in the substantia nigra reticulata despite overlap of neural responses. J. Neurophysiol. 98, 2232–2243.
Pan, W.X., Schmidt, R., Wickens, J.R., and Hyland, B.I. (2005). Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J. Neurosci. 25, 6235–6242.
Parent, A., and Hazrati, L.N. (1995). Functional anatomy of the basal ganglia. I. The cortico-basal ganglia-thalamo-cortical loop. Brain Res. Brain Res. Rev. 20, 91–127.
Redgrave, P., and Gurney, K. (2006). The short-latency dopamine signal: a role in discovering novel actions? Nat. Rev. Neurosci. 7, 967–975.
Schneidman, E., Bialek, W., and Berry, M.J. (2003). Synergy, redundancy, and independence in population codes. J. Neurosci. 23, 11539–11553.
Schneidman, E., Berry, M.J., Segev, R., and Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440, 1007–1012.
Schultz, W. (1986). Responses of midbrain dopamine neurons to behavioral trigger stimuli in the monkey. J. Neurophysiol. 56, 1439–1461.
Schultz, W. (1997). Dopamine neurons and their role in reward mechanisms. Curr. Opin. Neurobiol. 7, 191–197.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27.
Schultz, W., Dayan, P., and Montague, P.R. (1997). A neural substrate of prediction and reward. Science 275, 1593–1599.
Sokal, R.R., and Rohlf, F.J. (1981). Biometry (New York: W.H. Freeman & Co.).
Tunstall, M.J., Oorschot, D.E., Kean, A., and Wickens, J.R. (2002). Inhibitory interactions between spiny projection neurons in the rat striatum. J. Neurophysiol. 88, 1263–1269.
Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., and Aertsen, A. (1995).
Dynamics of neuronal interactions in monkey cortex in relation to behavioral events. Nature 373, 515–518.
Percheron, G., and Filion, M. (1991). Parallel processing in the basal ganglia: up to a point. Trends Neurosci. 14, 55–56.
Yanai, Y., Adamit, N., Harel, R., Israel, Z., and Prut, Y. (2007). Connected corticospinal sites show enhanced tuning similarity at the onset of voluntary action. J. Neurosci. 27, 12349–12357.
Percheron, G., Yelnik, J., and Francois, C. (1984). A Golgi analysis of the primate globus pallidus. III. Spatial organization of the striato-pallidal complex. J. Comp. Neurol. 227, 214–227.
Zohary, E., Shadlen, M.N., and Newsome, W.T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature 370, 140–143.

Results V

Journal of Neuroscience Methods 163 (2007) 267–282

Quantifying the isolation quality of extracellularly recorded action potentials

Mati Joshua a,∗, Shlomo Elias b, Odeya Levine b, Hagai Bergman a,b,c

a The Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem 91904, Israel
b Department of Physiology, The Hebrew University-Hadassah Medical School, Jerusalem 91120, Israel
c Eric Roland Center for Neurodegenerative Diseases, The Hebrew University, Jerusalem 91904, Israel

Received 11 April 2006; received in revised form 18 March 2007; accepted 18 March 2007

Abstract

There have been many approaches to the problem of detection and sorting of extracellularly recorded action potentials, but only a few methods actually quantify the quality of this fundamental process. In most cases, the quality assessment is based on the subjective judgment of human observers and the recorded units are divided into “well isolated” or “multi-unit” groups. This subjective evaluation precludes comprehensive assessment of single-unit studies since the most basic parameter, i.e. their data quality, is not explicitly defined.
Here we propose objective measures to evaluate the quality of spike data, based on the time-stamps of the detected spikes and the high-frequency sampling of the analog signal of cortical and basal-ganglia data. We show that quantification of recording quality by the signal-to-noise ratio (SNR) may be misleading. The recording quality is better assessed by an isolation score that measures the overlap between the noise (non-spike) and the spike clusters. Furthermore, we use a nearest-neighbors algorithm to estimate the proportion of false positive and false negative classification errors. To validate these quality measures, we simulate spike detection and sorting errors and show that the scores are good predictors of the frequency of errors. The reliability of the isolation score is further verified by errors implanted in real basal ganglia data and by using different sorting algorithms. We conclude that quantitative measures of spike isolation can be obtained independently of the method used for spike detection and sorting, and recommend reporting them in any study based on the activity of single neurons.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Spike sorting; Multi-unit recording; Signal detection

1. Introduction

The problem of extracting single neuron activity from extracellular recordings has been investigated extensively and comprehensively reviewed (e.g. Lewicki, 1998). The process of detecting action potentials from the extracellular waveforms (spikes) and clustering them into different neuronal sources is known as spike detection and sorting. Spike detection and sorting algorithms are not perfect and classification errors can occur for a number of reasons. First, most algorithms are not fully automatic (e.g. Abeles and Goldstein, 1977; Worgotter et al., 1986; Bergman and DeLong, 1992) and their real-time use can lead to human errors (Wood et al., 2004). Second, inaccurate assumptions about the data can also lead to errors. Some algorithms presuppose a parametric statistical model (Lewicki, 1994; Pouzat et al., 2002, 2004; Shoham et al., 2003), whereas other algorithms are based on non-parametric assumptions (Fee et al., 1996a). In both cases these assumptions, whether explicit or implicit, may be violated. For example, the analog trace in Fig. 1 shows significant modulation of the spike waveforms and illustrates how the stationarity (waveform stability) assumption may be violated and thus lead to classification errors.

Although many approaches to the problem of sorting spikes have been put forward, only a few methods have been developed to quantify the quality of the spike sorting (Harris et al., 2001; Pouzat et al., 2002; Schmitzer-Torbert et al., 2005). In most cases, the quality assessment is done subjectively by a human observer, and units with high scores are then reported as having a “high signal-to-noise ratio” and being “well isolated”. These subjective reports do not permit comparison of data quality across different studies and unfortunately are predisposed to personal bias.

∗ Corresponding author. Tel.: +97226757168. E-mail address: [email protected] (M. Joshua).
0165-0270/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.jneumeth.2007.03.012

Fig. 1. An extreme example of non-stationarity of extracellular recording. Instability in the extracellular recording can lead to misclassifications by the spike sorting algorithm. (a–d) A single trace of the extracellular recording, at different time scales (b depicts the arrow-marked interval in (a), etc.). In (c and d) spikes detected by the real-time spike sorter are marked by black dots; high noise events (events that crossed threshold but were not classified as spikes) are marked by gray triangles.
(e) Average of squared peak-to-peak differences of spike waveform, over all time intervals as a function of time starting from the real-time detection. This is a gross measure of the change in spike waveform shape in time, similar to the autocorrelation function. Note the large periodic changes at 0.66 Hz and small periodic changes at ∼3 Hz. These are probably due to periodic changes in electrode position, caused by respiratory (∼40 min⁻¹) and cardiac (∼180 min⁻¹) waves, respectively. (f) Events classified in real-time as a single unit are in black; the noise events are in gray. Note that the noise cluster contains two different classes of events. One class forms a smooth continuum with the boundaries of the spike cluster (probably missed spikes). The other class can be dissociated from the spike cluster due to its smaller waveforms (probably spikes from other units). The scores of this unit are: isolation score, 0.93; false negative score, 0.12; false positive score, 0.002; SNR_NoSpk, 5.35; SNR_Spk, 5.26. The false negative score is suggestive of the instability shown above.

In this article we propose objective measures to assess the quality of spike detection and sorting. Our measures quantify two different aspects of the data:

1. Quality of the recording, by calculating SNR (Section 3.1). We present and discuss two calculations of the SNR that differ in their noise estimation. The first is based on the noise when an action potential occurs and the second is based on the noise between action potentials.
2. Clustering quality. We introduce an isolation score for quantifying the overlap between the spike and the noise (non-spike) clusters (Section 3.2). We then present classification error scores that estimate the fraction of events that were misclassified as spikes (false positive errors) or misclassified as noise-events (false negative errors) (Section 3.3).
To validate these measures, we simulate spike-sorting errors (Section 3.4.1) and test the isolation score and classification error scores as a function of the fraction of simulated errors for different units. We check the scores under different conditions by applying several clustering algorithms (Section 3.4.2). We use real data from the basal ganglia and simulated errors to investigate the score parameter space (Section 3.4.4). Finally, we compare the results of the different scores (Section 3.4.5).

2. Methods

2.1. Neuronal recording procedures

The data were collected from experiments performed on two vervet monkeys (Cercopithecus aethiops; monkey Cu, female, weighing 3.5–4 kg, and monkey T, female, weighing 3 kg) and two Macaca fascicularis (monkey Y, male, weighing 5 kg, and monkey P, female, weighing 3 kg). Details of the behavior of the monkeys and animal care are described elsewhere (Heimer et al., 2002; Morris et al., 2004; Elias et al., 2007). Recordings were made in the external segment of the globus pallidus (GPe), a central nucleus of the basal ganglia, and in the primary motor cortex (M1, monkey T only). Animal care and surgical procedures were in accordance with the NIH Guide for the Care and Use of Laboratory Animals (1996) and the Hebrew University guidelines for the use and care of laboratory animals in research, supervised by the institutional committee for animal care and use. During the recording sessions, glass-coated tungsten microelectrodes (impedance at 1 kHz equals 0.2–0.6 MΩ) were advanced to the target. Neuronal activity from each electrode was amplified (monkeys P, Y and T: ×5000; monkey Cu: ×10,000), filtered (monkeys P and Y: 1–6000 Hz; monkeys Cu and T: 300–6000 Hz), and continuously sampled at 24 kHz/electrode (AlphaMap, Alpha-Omega Engineering, Nazareth, Israel).
2.1.1. Real-time spike detection and sorting algorithm

The electrode output was processed and classified in real time (MSD, ASD, Alpha-Omega Engineering) by a template-matching algorithm (Worgotter et al., 1986). The electrode signal was continuously sampled at 36–50 kHz, placed in a buffer containing the last 100 samples (2–2.8 ms), and compared continuously with one to three templates. Each template was constructed of eight equally spaced points separated by five sampling points (e.g. 0.1 ms for the 50 kHz sampling), and was defined by the experimenter following a learning process of threshold crossing signals. The sum of squares of the differences between eight points in the buffer and the templates was calculated. When this sum reached a minimum that was below a user-defined threshold, detection was hardware reported. In cases where a buffer was double matched (e.g. a signal passed the criterion of more than one template), an error signal was given to the user, but no hardware report was created. A dead time of 0.06 ms followed detection. The timing of the hardware detections (100 µs active-low TTL pulses) was edge sampled at 12 kHz (33 kHz in monkey T) in parallel with the analog signals of the electrode output. During recording sessions, the experimenter closely followed the spike shape and discharge rate. The experimenter graded the isolation quality approximately every 2–4 min (see below) and when necessary adjusted the template, detection threshold, or rarely the electrode position.

2.2. Real-time grading of isolation quality

In most cases one experimenter controlled the position and spike sorting of four electrodes. The quality of the detection and spike sorting was estimated on-line by the experimenter. This quality estimation was based on the superimposed analog traces of the recently (20–100) sorted spikes as well as the waveforms of events that passed an amplitude threshold that was set by the experimenter but were not classified as spikes. The grade scale ranged from 1 (highest quality) to 8 (lowest quality). Generally, a score of 1 meant perfect isolation, where the experimenter judged that close to 100% of the spikes emitted by a single neuron were detected with no false detections (zero false positive and negative errors). Scores between 3 and 4 meant that most (but not all) spikes generated by a neuron were detected (small fraction of false negative errors), but still with a negligible fraction of false detections (“no false positive errors”). Scores of 5–6 meant a mixture of two to three units. Finally, a grade of 7–8 meant a recording of multi-unit activity (significant fraction of false negative and positive errors).

2.3. Algorithm development

All the functions used for both the analysis and algorithm implementations are Matlab 7.1 (Mathworks, Natick, MA, USA) compatible, and are available at: http://alice.nc.huji.ac.il/∼mati/ sorting quality programs.

2.4. Data preprocessing and event representation

The initial input for the estimation of the spike isolation quality consisted of the time stamps of the detected spikes (spike trains) and the entire analog signal. We defined two clusters of events: (i) spike cluster, a cluster classified as a single unit, and (ii) noise cluster, a cluster of events not classified with that unit. The spike cluster was simply constructed from segments of the analog signal according to the spike trains. The noise cluster was extracted from the same analog trace, and contained events that were not detected as spikes of the given unit. However, the noise events had some similarity to the events in the spike clusters (e.g. similar amplitude, see details below). Each event, in both spike and noise clusters, was represented as a point in a high-dimensional space. Fig. 2 and Sections 2.4.1–2.4.3 depict step by step the extraction of the spike and noise clusters.

2.4.1. Up-sampling using cubic spline

Discrete sampling of analog traces leads to a time jitter between superimposed frames of events (Fee et al., 1996a; Pouzat et al., 2002). This jitter contributes to the variability in extracellular waveforms from the same cell (Fig. 2c). To reduce this variability, we up-sampled the data using cubic spline interpolation (Fig. 2d). The factor by which we up-sampled the data was fixed at 4; using this value for the up-sampling factor had a maximal effect on the reduction of variability between our data waveforms (data not shown).

2.4.2. The spike cluster

To represent an event we used 1.5 ms (144 sampling points after the cubic spline interpolation) of the corresponding up-sampled analog trace. The resulting vector, whose ith value is the voltage measured after i time steps from the beginning of the event, can be viewed as a point in a high-dimensional space:

\[ \vec{X} = (V_1, V_2, \ldots, V_{144}) \tag{1} \]

All events were aligned by the largest negative peak. The offset of this peak from the beginning of the event was set to 0.5 ms (i.e. V_48). Our extracellular recordings are from neurons in the GP and the primary motor cortex with a large negative phase. Hence, we used the largest negative peak for alignment of the spike vector (however, one can easily generalize this algorithm to positive peaks). Finally, the aligned vector was normalized to have a zero mean. The cluster of up-sampled, aligned and normalized spike events is denoted as S_cluster.

Fig. 2. Preprocessing. The process of extracting the spike and noise clusters. (a) The raw analog trace (1–6000 Hz band-pass hardware filtered and digitally sampled at 24 kHz). (b) Analog trace after filtering with a digital high-pass filter (>300 Hz, two pole Butterworth filter, a zero-phase forward and reverse digital filter, Matlab filtfilt function). (c) Superimposed waveforms of spikes extracted according to classification by the spike sorter (spike cluster), and aligned to the largest negative peak (before the spline upsampling). Note the large variability during the fast phase of the action potential that results from the limited sampling rate. (d) Spike cluster after upsampling the events and realignment to the negative peak, revealing reduction in variability compared to (c). (e) The noise cluster, detected by threshold crossing. The same upsampling and alignment process was used. The scores of this unit are: isolation score, 0.98; false negative score, 0.01; false positive score, 0; SNR_NoSpk, 2.57; SNR_Spk, 2.52.

2.4.3. The noise cluster

Detection of the events comprising the noise cluster was based on threshold amplitude crossing. Because of the typical spike shape in our extracellular recordings, we only used a negative (lower) threshold to detect the noise cluster (however, one can easily generalize this algorithm to upper or dual, i.e. upper and lower, threshold crossing events). The noise cluster was constructed in the following manner. First we selected from S_cluster the 2% of the spikes with the smallest negative peak (closest to zero). We then took the average of these negative peaks and defined the threshold as half of this value:

\[ \mathrm{threshold} = \frac{\text{average negative peak}_{0.02}}{2} \tag{2} \]

where 0.02 is the fraction of spikes used for calculation of the average negative peak. Next, we identified all events that crossed this threshold, but removed the events already marked as spikes. Finally, we upsampled and aligned the noise events (0.5 ms offset similar to the spike waveforms) by the local minimum between the first two (down and up) threshold crossings (Fig. 2e). The noise cluster models all high amplitude events that are not classified as spikes from the given unit. The noise cluster should contain all unclassified putative spikes; i.e. events that are close to, but not in, the spike cluster.
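The threshold rule of Eq. (2) and the detection of noise events can be sketched as follows. This is a minimal illustration in plain Python, not the authors' Matlab implementation: the function names, the list-based trace, and the use of a set of spike peak indices to exclude already-classified events are assumptions made here for the example.

```python
def noise_threshold(negative_peaks, fraction=0.02):
    """Half the mean of the smallest negative peaks (those closest to zero).

    `negative_peaks` holds the most negative voltage of each spike in the
    spike cluster; `fraction` is the 0.02 of Eq. (2).
    """
    n = max(1, round(len(negative_peaks) * fraction))
    # Peaks are negative, so "closest to zero" means the largest values.
    smallest = sorted(negative_peaks, reverse=True)[:n]
    return (sum(smallest) / n) / 2.0


def noise_events(trace, threshold, spike_peak_indices):
    """Index of the local minimum of each excursion below `threshold`,
    skipping excursions whose minimum is already marked as a spike."""
    events = []
    i = 0
    while i < len(trace):
        if trace[i] < threshold:
            j = i
            while j < len(trace) and trace[j] < threshold:
                j += 1
            segment = trace[i:j]  # samples between down and up crossings
            peak = i + segment.index(min(segment))
            if peak not in spike_peak_indices:
                events.append(peak)
            i = j
        else:
            i += 1
    return events
```

For instance, with 100 spikes whose two smallest negative peaks are -40 and -60, the threshold would be -25, and every excursion of the trace below -25 that is not a classified spike becomes a candidate noise event.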
This is achieved by using only the S_cluster events with the smallest negative peaks to determine the threshold. The spike sorting quality measures are insensitive to inclusion of more noise event crossings; i.e. with a more conservative threshold. To verify this insensitivity we modified the threshold parameters and found that the quality measures were stable (see below). Therefore, we recommend the use of a conservative threshold that ensures that the noise cluster contains putative spikes even when the noise cluster is overly large.

3. Results

Spike detection and sorting quality depends first on the recording quality and then on the quality of the clustering algorithm. To evaluate recording quality we used the signal-to-noise ratio (SNR) (Section 3.1). Although the SNR can be used for initial estimation of recording quality and a high SNR is usually a necessary condition for good unit isolation, the SNR is not a direct measure of the isolation of a single unit. Sorting of recordings with a high SNR may nonetheless result in a spike cluster that excludes spikes (false negative errors, e.g. Fig. 1) or a cluster composed of two large units (false positive errors). We therefore applied more direct measures of cluster quality by measuring the isolation of the spike cluster from the noise cluster: the isolation score (Section 3.2), and false positive and false negative measures (Section 3.3). In Section 3.4 we compare and validate the scores under different sorting methods and simulated error frequencies.

3.1. Signal-to-noise ratio measures

Several previous studies have taken an initial step towards assessing the quality of spike data by reporting some variants of the spike SNR (Pare and Gaudreau, 1996; Likhtik et al., 2005). However, there is no explicit definition of the SNR in these reports, making them very difficult to compare.
The spike signal-to-noise ratio can be computed in two ways. Both methods compute the signal in the same fashion but differ in their noise calculation.

3.1.1. Signal calculation

The average of S_cluster (up-sampled and aligned by the negative peak) is defined as:

$$S_{\text{avg}} \equiv \frac{1}{|S_{\text{cluster}}|} \sum_{x \in S_{\text{cluster}}} x \qquad (3)$$

We quantify the signal as the difference between the maximum and the minimum of the average spike waveform (Fig. 3a):

$$\text{peak-to-peak} \equiv \max(S_{\text{avg}}) - \min(S_{\text{avg}}) \qquad (4)$$

We prefer using the peak-to-peak to quantify the signal rather than other methods that integrate the area enclosed by the spike waveform (spike energy). This is because the peak-to-peak signal value does not depend on the duration of the spike waveform, which is conditioned by the filter and the edge detection parameters.

3.1.2. Noise calculation

We quantified noise in two ways:

1. The noise underlying the spike events, which corresponds to the intra-cluster variability (Fig. 3b). For each spike waveform, X_k, we subtracted the mean waveform, S_avg, to produce Resid_k. We then concatenated all resulting residuals to produce one long vector Resid. The noise is then defined as the standard deviation of this vector:

$$\text{Noise}_{\text{Spk}} \equiv \text{S.D.}(\text{Resid}) \qquad (5)$$

Since our filters exclude the low frequencies and we use segments larger than most spikes, we can disregard the increase in variability that may be created by the concatenation process.

2. The noise from the inter-spike-intervals (Fig. 3c), where spikes are defined as events in the signal cluster. For each spike in the signal cluster, we extract the 1.5–3 ms period before the spike event negative peak, Prev_k (Fig. 3c), unless another spike from the cluster occurred in that interval. We then concatenate all such intervals to produce one long vector Prev. The noise is then defined equivalently:

$$\text{Noise}_{\text{No Spk}} \equiv \text{S.D.}(\text{Prev}) \qquad (6)$$

Fig. 3. Signal-to-noise ratio calculation. (a) Average spike waveform (solid line) and peak-to-peak (dotted lines). (b) Noise_Spk. For a given spike event (dashed gray line), we subtracted the average spike waveform (dotted black line), which results in the noise during the spike event (the solid black line). We then concatenated segments from all spikes and calculated the standard deviation of this vector. (c) Noise_NoSpk. For a given spike we extracted the analog trace 1.5–3 ms before the negative peak of the spike event (between the dashed lines). We then concatenated all these traces from all spikes and calculated the standard deviation of this vector. (d) SNR_NoSpk vs. SNR_Spk. The SNR scores are the ratios between the peak-to-peak and the noise estimations (S.D. × 5). The SNR scores for 155 GP units. Scores are highly correlated (R² = 0.94). Generally, for high values, SNR_NoSpk is larger than SNR_Spk (large values tend to be above the line Y = X), as changes in the waveform that contribute to Noise_Spk but not to Noise_NoSpk are more likely when the electrode is close to the cell. On the other hand, when the scores are small, SNR_Spk is larger (small values are below the Y = X line), probably due to failure in detecting overlapping spikes.

The two SNR scores are highly correlated in our data (Fig. 3d). There are some cases where these two measures are not equal. When the waveform of a single unit changes (e.g. due to electrode drift, or intrinsic firing properties), or when the spike cluster actually reflects multi-unit instead of single-unit activity, Noise_Spk will be larger than Noise_NoSpk. Surprisingly, the opposite can also occur. For example, when the spike of a second unit temporally overlaps with the spike of the given unit, the sorting algorithm may drop these spikes (Bar-Gad et al., 2001). As a result, the second unit will contribute only to Noise_NoSpk as its coincidence with the first spike is ignored.
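The signal and the two noise estimates of Eqs. (3)–(6) can be sketched in a few lines. The sketch below is illustrative only: the function name and array layout are assumptions, the population standard deviation is assumed for "S.D.", and the ratio uses the scaling described in the Fig. 3 caption (peak-to-peak divided by S.D. × 5).

```python
import numpy as np

def snr_measures(spikes, pre_spike_segments, c=5.0):
    """Return (SNR_Spk, SNR_NoSpk) for one unit.

    spikes: array (n_spikes, n_samples) of aligned waveforms (S_cluster).
    pre_spike_segments: array (n_segments, m) of spike-free traces taken
        1.5-3 ms before each negative peak (hypothetical layout).
    c: scaling of the noise S.D. to peak-to-peak units (5 in Fig. 3).
    """
    spikes = np.asarray(spikes, dtype=float)
    s_avg = spikes.mean(axis=0)                    # Eq. (3): average waveform
    peak_to_peak = s_avg.max() - s_avg.min()       # Eq. (4)
    resid = (spikes - s_avg).ravel()               # concatenated residuals
    noise_spk = resid.std()                        # Eq. (5), population S.D. assumed
    prev = np.asarray(pre_spike_segments, dtype=float).ravel()
    noise_no_spk = prev.std()                      # Eq. (6)
    return peak_to_peak / (c * noise_spk), peak_to_peak / (c * noise_no_spk)
```

Both measures share the same numerator, so any difference between them reflects only the difference between intra-cluster variability and background noise.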
3.1.3. Signal-to-noise ratio

We define the two signal-to-noise ratios simply as:

$$\text{SNR} \equiv \frac{\text{peak-to-peak}}{\text{Noise} \times C} \qquad (7)$$

where Noise is calculated by one of the two methods and C is a scaling factor (which we commonly set to 5) that scales the noise measures to peak-to-peak equivalent units. Examples of spikes and their SNR measures are given in Figs. 1, 2, 6 and 8.

3.2. Isolation score

As stated above, SNR might be problematic, especially in cases where the spike cluster actually reflects high amplitude multi-unit activity. We therefore assessed the quality of the spike isolation directly. The isolation score quantifies the distance between the spike cluster and the noise cluster. We computed this distance on the raw events directly, without mapping the spikes to some feature space, e.g. PCA (Abeles and Goldstein, 1977) or wavelet transform (Quiroga et al., 2004; Nenadic and Burdick, 2005).

3.2.1. Mandatory features of the isolation score

The isolation score needs to exhibit several critical properties:

1. The score should decrease with the number of real spikes missed by the sorting algorithm (false negatives).
2. The score should decrease with the number of noise events that were classified as spikes (false positives).
3. The score should be insensitive to the size of the extracted noise cluster.
4. The score should span an intuitive range, e.g. 0–1.

3.2.2. Isolation score: definition

The isolation score quantifies the distance from events in the spike cluster to the noise cluster. Since we are only interested in the spike cluster events, this measure is not symmetric. First, we compute the normalized similarity between each event in the spike cluster, X, and all other events (spikes and noise), Y:

$$\text{Similarity}(X, Y) \equiv \exp\left(-d(X, Y)\,\frac{\lambda}{d_0}\right) \qquad (8)$$

where d(X,Y) is the Euclidean distance between vectors X and Y. Note that Similarity(X,Y) between close events is close to one (exp(0)), whereas between distant events it is closer to zero (exp(−∞)).
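As a rough worked illustration of how quickly the similarity decays, the following evaluates the exponential at a few distances expressed in units of the normalizing distance d_0. The choice λ = 10 matches the value the text reports using; the three distances are arbitrary examples chosen for illustration.

```latex
\begin{align*}
d(X,Y) = 0.1\,d_0 &: \quad \exp(-10 \times 0.1) = e^{-1} \approx 0.37 \\
d(X,Y) = d_0      &: \quad \exp(-10 \times 1)   = e^{-10} \approx 4.5 \times 10^{-5} \\
d(X,Y) = 2\,d_0   &: \quad \exp(-10 \times 2)   = e^{-20} \approx 2.1 \times 10^{-9}
\end{align*}
```

An event at the typical within-cluster distance therefore already carries about four orders of magnitude less weight than a very close neighbor.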
d_0 is the average Euclidean distance in the spike cluster; this parameter normalizes the Euclidean distance to avoid dependence on the units of a particular data set. The exponent function stretches the Euclidean distance nonlinearly; thus Similarity(X,Y) of remote events becomes infinitesimally small. λ is a gain constant (0 < λ < ∞) that sets the gain of this stretch. With λ ≪ 1, all events are similar and Similarity(X,Y) is close to one, whereas with λ ≫ 1, all events are dissimilar and Similarity(X,Y) becomes infinitesimally small.

In order to turn the above similarity index into a probability-like quantity (positive values that sum to 1), we normalize it by the sum of similarities between a given event, X, from the spike cluster, and all other events (spikes and noise):

$$P_X(Y) \equiv \frac{\exp\left(-d(X, Y)(\lambda/d_0)\right)}{\sum_{Z \neq X} \exp\left(-d(X, Z)(\lambda/d_0)\right)} \qquad (9)$$

For each event X we get a function P_X that takes the form of the Boltzmann–Gibbs distribution, also known as "softmax" (Goldberger et al., 2004). Note that the parameter λ controls the "softness" of the max operation; i.e. λ behaves like 1/temperature in some notations of the softmax equation. For a given event X, when λ approaches infinity (zero temperature) P_X(Y) is the deterministic probability function; i.e. P_X(Y) = 1 for the event nearest X and zero for all other events. On the other hand, when λ approaches zero (maximal temperature) P_X(Y) is the uniform distribution; i.e. P_X(Y) is equal for all events. In this manuscript we used λ = 10; i.e. we stretched the distances between near and remote events.

In the next step, for each event in the spike cluster, X, we sum over all the normalized similarity values P_X(Y) for all the Y's in the spike cluster:

$$P(X) \equiv \sum_{Y \in S_{\text{cluster}}} P_X(Y) \qquad (10)$$

P(X) is therefore a measure of how close event X is to the spike cluster compared to the noise cluster. Intuitively, P(X) is the probability that event X belongs to the spike cluster. The calculation of P(X) is illustrated in Fig.
4 (note that P(X) and P_X(Y), Eqs. (10) and (9), respectively, are not the same). The isolation score is defined as:

$$\text{isolation score} \equiv \frac{1}{|S_{\text{cluster}}|} \sum_{X \in S_{\text{cluster}}} P(X) \qquad (11)$$

and can be intuitively considered as the average probability that an event classified as a spike belongs to the spike cluster. Thus, our isolation score is a combination of two approaches:

1. Quantifying the connectivity between two clusters using the energy at the interface of the two groups (Fee et al., 1996a).
2. Grading the distance of two events using the "softmax over Euclidean distances" function (Goldberger et al., 2004).

The range of the isolation score is from 0 to 1, where a score of 1 means ideal isolation, with minimal distances between the elements of the spike cluster, and a large distance between them and all the elements of the noise cluster. A score close to zero means very poor isolation, where the Euclidean distance among elements in the spike cluster is larger than the Euclidean distances between them and the noise cluster; i.e. elements from the spike cluster are surrounded by elements from the noise cluster.

The isolation score satisfies the requirements defined in Section 3.2.1:

1. Spikes that were missed by the spike-detection or sorting algorithm (false negatives) are nonetheless close to the spike cluster. As a result, events in the cluster that are close to such misses will have a reduced P(X) (Fig. 4), which in turn will reduce the overall isolation score.
2. Likewise, noise events that were classified as spike events (false positives) are close to the noise cluster. For these false positives the P(X) value is reduced, due to their proximity to the other noise events, thus again reducing the overall isolation score.
3. The isolation score is insensitive to the size of the noise cluster. This is a result of the exponential decay of the similarity value, P_X(Y), between a spike event X and a distant event Y. Therefore, adding more noise events, which are mostly distant from the spike cluster, contributes only small additional values to P_X(Y).
4. The isolation score is the average of probability-like values and hence is bounded between 0 and 1.

It is crucial to note that the isolation score does not measure the distance between the noise and spike distributions directly. Nor does it directly measure the performance of the clustering procedure. Rather, it measures how far away the noise and the spike clusters are. It is similar to the gap measure common in classification discussions, except that it recognizes that there is no real gap between the two clusters.

Fig. 4. Calculation of the isolation score. Calculating the proximity of a spike to the spike cluster, relative to the noise cluster. This figure is a schematic representation of the isolation score calculation. The x–y coordinates represent the 144 dimensions of a waveform from the spike and noise cluster. The gray triangles represent points in the noise cluster, whereas the black squares represent spike events in the spike cluster. For a given point X in the spike cluster (black oval), the numbers next to each of the other points, Y, are P_X(Y). The arrows denote the Euclidean distance. Finally, P(X) for the given point (black oval) is the sum of all P_X(Y) values for all other spike events (black squares). Note that for events far from X, P_X(Y) is infinitesimal, and hence they have only a small influence on P(X). On the other hand, noise events that are close to the spike cluster significantly decrease the P(X) values (e.g. gray triangle in the upper right-hand corner).

3.3. Scores of classification errors

As described in the previous Section 3.2, the isolation score quantifies the separation of the spike cluster from other events, but it does not estimate the number of spikes missed by the spike detection and sorting process or the number of noise events that were erroneously classified as spikes (false negative and positive errors, respectively). Moreover, the isolation quality measure cannot separate these errors. However, some physiological studies are more sensitive to one of the two errors and therefore their separate estimates may provide a better database for these experiments. In this section we describe a method for estimating these errors. For each event we find its K nearest neighbors (KNN) (Vapnik, 1998) and compare the classification of the majority of these neighbors to the event classification (produced by the sorting algorithm). This method is illustrated in Fig. 5.

Fig. 5. Illustration of the KNN algorithm for estimating classification error scores. The x–y coordinates represent the 144 dimensions of a waveform from the spike and noise cluster. Black squares represent spike events, gray triangles represent noise events. The notations are similar to those used in Fig. 4. For each event we calculated the K nearest neighbors (KNN, here K = 3). Spike events having most of their KNN from the noise cluster are considered false positives; similarly noise events with a majority of their KNN from the spike cluster are considered false negatives.

3.3.1. False negatives score

False negatives are spikes that were missed by the spike detection or sorting algorithm. We estimate these by the number of noise events having most of their K nearest neighbors (see below for the choice of K) from the spike cluster, denoted N_fn.
The false negatives score is thus defined as:

$$\text{false negatives score} \equiv \frac{N_{fn}}{N_{fn} + |S_{\text{cluster}}|} \qquad (12)$$

When the number of false negatives is small the score is close to zero. The score then increases with the number of false negatives. When all the real spike events are missed (N_fn ≫ |S_cluster|) the score reaches the maximum of 1. However, in practical terms the score cannot reach this limit (see below, Section 3.3.4).

3.3.2. False positives score

Similarly, we estimate the number of false positive events (events that were classified as spike events but are noise events):

$$\text{false positives score} \equiv \frac{N_{fp}}{|S_{\text{cluster}}|} \qquad (13)$$

where N_fp is the number of spike events having most of their K nearest neighbors as noise events. When the number of false positives is small the score is close to zero. As this number increases the score increases. When all spike cluster events are surrounded by noise events (N_fp = |S_cluster|) the score reaches the maximum of 1.

3.3.3. Choosing K

Choosing inappropriate values of K leads to biases in the classification error scores. For example, if too small a value is chosen for K, false negative events may erroneously lead their correctly classified spike-event neighbors to be considered as false positives. Likewise, to take an extreme example, when the spike cluster is larger than the noise cluster, using a K value that is larger than twice the size of the noise cluster will cause all noise events to be considered false negatives. Generally, large values of K may lead to biased estimations of error rates of events that are close to the boundaries of the clusters. In this study, we selected an intermediate value for K, such that a small number of clustering errors did not cause a large bias, and the K value was far smaller than the size of both clusters. Typically, our validation tests were performed on clusters that contained 1500 events, using K = 31.
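The KNN-based error scores of Eqs. (12) and (13) can be sketched as below. This is a minimal sketch under stated assumptions rather than the authors' code: the function name is hypothetical, neighbors are found by brute-force pairwise distances, an event is not counted as its own neighbor, and a tie (possible only for even K) counts as no majority.

```python
import numpy as np

def classification_error_scores(spikes, noise, k=31):
    """Return (false_negatives_score, false_positives_score), Eqs. (12)-(13)."""
    spikes = np.asarray(spikes, dtype=float)
    noise = np.asarray(noise, dtype=float)
    events = np.vstack([spikes, noise])
    is_spike = np.r_[np.ones(len(spikes), bool), np.zeros(len(noise), bool)]
    # Squared Euclidean distances between all pairs of events.
    sq = (events ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * events @ events.T
    np.fill_diagonal(d2, np.inf)               # an event is never its own neighbor
    nearest = np.argsort(d2, axis=1)[:, :k]    # indices of the K nearest neighbors
    spike_votes = is_spike[nearest].sum(axis=1)
    majority_spike = spike_votes > k / 2.0
    majority_noise = (k - spike_votes) > k / 2.0
    n_fn = np.count_nonzero(~is_spike & majority_spike)  # noise events among spikes
    n_fp = np.count_nonzero(is_spike & majority_noise)   # spike events among noise
    fn_score = n_fn / (n_fn + len(spikes))     # Eq. (12)
    fp_score = n_fp / len(spikes)              # Eq. (13)
    return fn_score, fp_score
```

For clusters of about 1500 events with 144-sample waveforms the full distance matrix is small enough to hold in memory; for much larger clusters a tree-based neighbor search would be a more economical choice.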
In our experience, a good rule of thumb is that K should equal 1–5% of the number of events in the signal cluster. 3.3.4. Using the classification error scores The classification error scores are a refinement of the isolation score. These false positive and negative estimates may help constrain neuronal data analysis. For example, the existence of false positives is one reason one should not expect to find a perfectly oscillatory cell, or should not be surprised by multiparameter encoding of a single neuron. Naı̈ve use of these error scores, however, may be misleading. When the spike and noise clusters overlap highly these scores are biased (Figs. 6b and c, 7c and d and 9c and d). Thus, when a large ratio of the spike events are missed, real spike events from the spike cluster may have most of their KNN from the noise cluster and hence be considered false positives; for the same reason the estimation of false negatives will be low. This is the reason we argued (Section 3.3.1) that the false negative score will not reach its theoretical upper limit of 1. Similarly, real noise events may have most of their KNN from the spike cluster and hence be considered false negatives. Nonetheless, the isolation score measures the overlap between the spike and the noise clusters. When the isolation score is high the classification error scores are good estimates of the frequencies of false positive and negative errors and can be used. When the isolation score is low, the error classification scores are biased; however low isolation scores should dissuade us from using the data and therefore further refinement of the errors is unnecessary. 3.4. Validation of the isolation scores by simulation and real data 3.4.1. Random simulation of false negative and false positive errors To test the efficiency of the various scores we simulated spike-sorting errors and calculated the isolation and the classification error scores (Fig. 6). 
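For reference, the isolation score that these simulations evaluate (Eqs. 8–11) can be sketched as follows. The sketch makes several assumptions that the text does not spell out: d_0 is taken as the mean distance between distinct pairs of spike events, the event X itself is excluded from both sums, and the function name and array layout are hypothetical.

```python
import numpy as np

def isolation_score(spikes, noise, lam=10.0):
    """Average over spike events X of P(X), following Eqs. (8)-(11)."""
    spikes = np.asarray(spikes, dtype=float)
    noise = np.asarray(noise, dtype=float)
    events = np.vstack([spikes, noise])        # spike events first, then noise
    n_spk = len(spikes)
    # Euclidean distance from every spike event X to every event Y.
    sq_s = (spikes ** 2).sum(axis=1)
    sq_e = (events ** 2).sum(axis=1)
    d2 = sq_s[:, None] + sq_e[None, :] - 2.0 * spikes @ events.T
    d = np.sqrt(np.maximum(d2, 0.0))
    # d_0: mean distance between distinct events of the spike cluster (assumption).
    off_diag = ~np.eye(n_spk, dtype=bool)
    d0 = d[:, :n_spk][off_diag].mean()
    logits = -lam * d / d0                     # log of Similarity(X, Y), Eq. (8)
    rows = np.arange(n_spk)
    logits[rows, rows] = -np.inf               # exclude Y == X from the softmax
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability only
    w = np.exp(logits)
    p = w / w.sum(axis=1, keepdims=True)       # P_X(Y), Eq. (9)
    p_x = p[:, :n_spk].sum(axis=1)             # P(X), Eq. (10)
    return float(p_x.mean())                   # Eq. (11)
```

Subtracting the row maximum before exponentiating leaves P_X(Y) unchanged but avoids underflow of every term when λ is large and all neighbors are far away.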
The error simulation was carried out by modifying well-sorted data (original isolation quality >0.99, less than 1% false errors) of four real-time detected GPe neurons with different signal-to-noise ratios (Fig. 6a). To validate the quality of the real-time sorting of the selected units we further examined the data using the off-line PCA method and also checked for inconsistency by screening of the analog signal. To simulate false negative errors we eliminated spike events from the spike cluster and marked them as noise events (Fig. 6b). The independent variable was the ratio between the number of eliminated (i.e. missed) spikes and the real number of spikes (false negative ratio). Zero means no false negatives were generated and 1 means all spikes are classified as noise events. As expected the isolation score was close to 1 when the ratio of missed events was 0, and dropped to 0.5 when the missed ratio was 0.5 (Fig. 6b1). We conclude that the isolation score decreases linearly with the ratio of missed spikes when they are equally distributed. Moreover, the scores of the four different units were highly correlated (R2 > 0.99). This demonstrates the consistency of the isolation score; i.e. the same ratio of errors yields the same isolation score. The estimated false negative score was a good estimation of the simulated false negative ratio values between 0 and approximately 0.35 (Fig. 6b2). When the fraction of simulated errors was above 0.35 the estimation of the false negative was noisy, and it fluctuated around 0.35. The false positive score was a valid estimator when the fraction of simulated false negative errors was between 0 and 0.3 (Fig. 6b3); in this range the estimate rate of the false positive errors was close to zero, as expected. However, a ratio of 0.3 or more of simulated false negatives caused the estimate of the false positive error to erroneously increase. 
By contrast to the isolation and classification scores, the SNR_Spk does not change as a function of the simulated false negative ratio (Fig. 6b4).

To simulate false positive errors we added events from the noise cluster to the spike cluster (Fig. 6c). The independent variable was the ratio between the number of noise events included in the spike cluster and the size of the spike cluster (false positive ratio). Zero means no false positives were generated and 0.5 means the number of simulated false positives was equal to the number of real spikes in the spike cluster. Unlike the case of simulated false negative errors, the reduction in size of the noise cluster may influence this simulation. To minimize such effects we set the noise cluster to be 10 times the size of the spike cluster before generating the errors. The isolation score was close to 1 when the error ratio was 0 and decreased to 0.55 when the simulated error ratio was 0.5 (Fig. 6c1). The changes in the isolation score as a function of the simulated false positive error were highly correlated (R² > 0.99) for the four different units depicted in Fig. 6. The false positive score was a good estimation of the ratio of errors; the difference between the score and the ratio of the simulated errors was less than 0.02 (Fig. 6c3). The false negative score changed only slightly when the simulated false positive error ratio was less than 0.3; when the error ratio increased the false negative score increased to 0.08 (Fig. 6c2). The SNR_Spk decreased with the number of false positives; however, the effect on the different units was not consistent; i.e. the ratio of simulated false positives had different effects on the SNR_Spk of the four tested units (Fig. 6c4).

In conclusion, the error simulations verify that the isolation score can be a measure of the extent to which noise events and spike events overlap. The simulation results also emphasize the fact that the classification error scores have a range of good predictability that is dependent on the overlap between clusters and hence on the isolation score. In this range of good predictability (e.g. for isolation scores >0.70) the false positive and negative scores should be used as a refinement of the isolation score. The results also demonstrate that SNR is misleading; it does not follow the false negatives ratio nor does it have a scale in which different units with the same ratio of false positives have the same score.

Fig. 6. Simulation of false positive and negative errors. False negatives were simulated by random reclassification of spike events as noise events. The independent variable is the ratio between the number of false negatives introduced and the size of the original spike cluster. The false positive errors were simulated by setting the noise cluster to be 10 times the size of the spike cluster and reclassifying noise events as spike events. The independent variable is the ratio of the number of false positives to the size of the spike cluster after reclassifying. (a) Spike waveforms from four well-sorted units with different signal-to-noise ratios. (b) Simulation of false negative errors. (b1) Isolation score. The score decreases with the number of false negatives; the difference between the units is negligible. (b2) False negative score. The score predicts the ratio of simulated false negatives well when the fraction of misclassified units is below 0.3. In this range, the difference between the score and the error ratio is less than 0.02. For larger simulated error ratios the score is misleading; instead of increasing, the score is bounded by 0.45. (b3) False positive score. The score predicts the false positive errors well when the fraction of misclassified units (simulated false negative) is below 0.3. For large error ratios of simulated false negatives the false positive score is misleading; instead of remaining at zero the score rises to 0.5. (b4) SNR_Spk. The SNR_Spk does not change as a function of the false negatives. (c) Simulation of false positive errors. (c1) Isolation score. The score decreases with the number of simulated false positive errors; no significant difference was found between the different units. (c2) False negative score. When the ratio of simulated false positive errors is larger than 0.3 the score increases from 0.01 to 0.08 due to biases when the noise and the spike cluster overlap. (c3) False positive score. The score follows the ratio of simulated errors (less than 0.02 difference). (c4) SNR_Spk. The SNR_Spk decreases with the number of false positives; however, the SNR_Spk of different units is not consistently modified by the fraction of simulated false positives.

Fig. 7. Effects of different sorting algorithms on the isolation scores. We generated sorting errors using three different sorting methods on the data of unit 1 of Fig. 6. (a) The scores and the real ratio of classification errors as a function of the criterion used in the sorting method. (a1) Amplitude threshold crossing classification. (a2) Template matching algorithm. We used a training set of 200 spikes to generate a 12-point template. We then calculated the distance of all events to this template and applied a threshold to classify an event as a spike. (a3) Projection on the average template. Similar to the template matching method we projected the data on the average template (1.5 ms, 36 points) defined by a training set of 200 spikes. (b) The isolation score as a function of the ratio of real false positive and negative errors. The different lines are the scores given when using different sorting methods. The isolation score was consistent across the sorting methods. (c) False negative score as a function of the real ratio of errors.
(d) False positive score as a function of the ratio of errors.

3.4.2. Sorting errors and the isolation scores

To further validate our scores under different sorting conditions we simulated clustering errors using different sorting methods. From the four well-isolated units in Section 3.4.1 we took the one with the lowest signal-to-noise ratio (Fig. 6, unit 1) and re-clustered the continuously sampled analog data using three clustering methods:

(1) Threshold crossing: all events that pass a threshold are marked as spikes.
(2) Template matching: we used a training set to calculate the average spike waveform and then implemented an off-line algorithm similar to our real-time eight-point template matching algorithm (Section 2.1.1). Waveforms that were similar to the average of the training set have small Euclidean distances from the template and were considered as spikes from the same unit.
(3) Projecting the data on the average template. The average template was generated using a training set. We then normalized this template to have a norm of 1 and convolved it with the analog data. Spikes were detected as peaks in the resulting vector. This method is equivalent to the projection on the first principal component when the data contains only one unit (Abeles and Goldstein, 1977).

Each of these methods classifies events as spikes or noise by a user-defined threshold. Here we used this threshold as our independent variable and examined its effect on the isolation scores. For each threshold we estimated the real ratio of false positive and negative errors (assuming that the original classification represented the real classification) and calculated the isolation score and error classification scores. As with the random simulation of errors (e.g. random switching of spike and noise events; Section 3.4.1 and Fig.
6) the isolation scores decreased with modifications of the sorting thresholds that increased the number of classification errors. This decrease took place both when the threshold values were very conservative and led to false negative errors (Fig. 7a, left side of plots) and when the thresholds were too permissive and led to false positive errors (Fig. 7a, right side of plots). The error classification scores followed the real error ratio when it was small but suffered from biases for large real error ratios and low isolation scores.

To check the dependency of the scores on the clustering method we compared the scores with the real ratio of false negatives and the ratio of false positives (Fig. 7b–d). We found that the isolation score differed slightly between the tested methods (Fig. 7b). However, when comparing these isolation scores to the isolation score obtained when errors were simulated randomly (Fig. 6) we found that for a given number of false negatives the isolation score was higher when we used different threshold levels. This over-estimation of the isolation score was probably due to the local consistency of errors induced by systematic modification of the thresholds in the sorting methods. The false negative score had a range in which it is equal to the real false negative ratio (Fig. 7c). This range was larger when using template matching than when using the other sorting methods. Similarly, the false positive score (Fig. 7d) was equal to the false positive ratio when such errors existed and was biased when the number of false negatives was large. This bias was smallest for the template matching algorithm. Nevertheless, as with the random simulation of errors, systematic modification of the thresholds by several sorting methods reveals that the isolation quality is a consistent and reliable estimator of the quality of the spike clustering.
The classification errors can be used in cases with high isolation scores (>0.8) and small levels of false positive and negative errors (<0.25).

3.4.3. Dynamic and population analysis of the isolation scores

Typical physiological experiments include long duration (>15 min) recordings of the same units. Naturally, the isolation quality may drift or change over these periods. The isolation quality tests were applied to real data recorded for periods of more than 10 min. Each recording was split into segments of 60 s (∼1000–4000 spikes in our GP data). To limit the algorithm complexity (time and space) we reduced (by random pruning) the largest cluster to a size of 1500 spikes; the other cluster was then reduced to maintain the size ratio between clusters. The length of the segment is thus a tradeoff between computational time and effectiveness. When using a short segment the sampling of the spike and noise cluster will be more accurate, owing to reduced non-stationarity and less random pruning; however, the computational time will increase. After extracting these clusters they were scored as described in the previous sections. Thus, for each unit we obtained a series of scores. These series of scores can be examined for problematic recording epochs which should be scrutinized more carefully (by re-clustering or omitting these sessions). This is depicted in Fig. 8, where 43 min of consecutive real-time sorting were scored. After 35 min of recording, it can be seen that the quality of the sorting decreased rapidly despite the apparent increase in the SNR of the unit. Our recommendation is therefore to apply these tests to any prolonged extracellular recording, and then to exclude periods with low scores from the analysis database. As a rule of thumb we suggest excluding recording periods with an isolation score below 0.7–0.8. To achieve a single score for each unit we averaged the scores over all sessions.
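The random pruning step described above, which caps the larger cluster at 1500 events while keeping the size ratio between the clusters, can be sketched as follows. The function name, the `seed` argument and the rounding rule are assumptions made for illustration; both clusters are assumed to be non-empty.

```python
import numpy as np

def prune_clusters(spikes, noise, max_size=1500, seed=None):
    """Randomly subsample both clusters so the larger one has max_size events."""
    rng = np.random.default_rng(seed)
    spikes = np.asarray(spikes)
    noise = np.asarray(noise)
    largest = max(len(spikes), len(noise))
    if largest <= max_size:
        return spikes, noise                   # nothing to prune
    keep_fraction = max_size / largest         # same fraction for both clusters

    def subsample(events):
        n_keep = max(1, int(round(keep_fraction * len(events))))
        idx = rng.choice(len(events), size=n_keep, replace=False)
        return events[np.sort(idx)]            # keep the original temporal order

    return subsample(spikes), subsample(noise)
```

Applying the same keep fraction to both clusters is one simple way to preserve their size ratio; the per-segment scores would then be computed on the pruned clusters and averaged over segments to give one score per unit.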
The average scores of the 155 GPe units in our database were: isolation score, 0.93 ± 0.08; false negative score, 0.1 ± 0.09; false positive score, 0.02 ± 0.04 (Fig. 8c). To compare the scores from different brain areas we calculated the scores of 87 units recorded in the primary motor cortex. Action potentials from the cortex were wider than GPe waveforms. Hence, we used 2 ms of analog recordings for each action potential. The average isolation score of these cells was 0.79 ± 0.17. The average false negative score was 0.09 ± 0.19 and the average false positive score was 0.13 ± 0.18 (Fig. 8d). All distributions were significantly different from GP scores (p < 10⁻³, Kolmogorov–Smirnov test). This difference in scores is consistent with our subjective sense of the better quality of the GP data and is probably due to the difference in cell sizes and cell density in these brain areas.

Fig. 8. Score statistics on real data. (a) Dynamic changes of the isolation scores. Data were split into segments of 60 s. For each segment we calculated the isolation and classification error scores. All five scores and spike rates were computed for 43 consecutive minutes. Although all scores were stable during the first 35 min, from the 36th minute they began to change. Both SNR scores increased; in contrast, the isolation score (which was extremely stable) rapidly decreased. Therefore, in this case the SNR scores are misleading and the isolation score indicates the moment when the quality of the sorting decreased. (b) Spike (b1) and noise (b2) events (n = 100, randomly selected) from 6 min of recording. The right column is from the first 6 min, the middle column from minute 19 to 25 and the left from the last 6 min of recording. In the last 6 min many noise waveforms resemble spikes. This misclassification is probably due to a slight modification in the spike waveform (reflected by the SNR) that was not identified by the semi-automatic template matching algorithm. (c) Distributions of the scores of 155 GPe units. Scores from different sessions were averaged. (c1) Isolation score. (c2) False negative score. (c3) False positive score. (d) Distribution of scores of 87 cortex units. (d1) Isolation score. (d2) False negative score. (d3) False positive score.

3.4.4. Exploring parameter space of the isolation and classification error scores

The isolation score was designed to be insensitive to the noise cluster size; i.e. adding events to the noise cluster that are far from the spike cluster should not affect the score. The size of the noise cluster is determined by the level of the amplitude threshold used for extracting the noise cluster. To verify this insensitivity we modified the size of the noise cluster by changing the fraction of events from the spike cluster used to calculate the threshold (these were the low amplitude spikes, hence fewer spikes means a closer to zero threshold). The distribution of the isolation score was independent of the threshold used for noise cluster extraction (p > 0.86 one-way ANOVA, p > 0.79 Kruskal–Wallis non-parametric ANOVA). We then compared the isolation score of all GPe units (n = 155) and found that the scores calculated with different noise clusters were highly correlated (Fig. 9a). Hence, our methods are insensitive to the size of the noise cluster. As described above, in order to reduce computation time we used a random sample from the spike and noise clusters. As a result each time we calculated the isolation score we used different events. The fact that we obtained the same scores when using different random samples from the same distribution further demonstrates the stability of our method. The isolation score depends on the λ parameter that sets the gain of the distance stretch.
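To make the role of λ concrete, the Python sketch below computes an isolation-type score on toy two-dimensional events. The formal definition of the score is given in an earlier section and is not restated here, so the functional form used below is an assumption made for illustration: an exponential similarity exp(−λ·d/d0), with d0 taken as the mean pairwise Euclidean distance within the spike cluster, normalized over all other events and averaged over the spike cluster. The function names and the toy data are hypothetical.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length event vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def isolation_score(spikes, noise, lam=10.0):
    """Assumed form of an isolation-type score in [0, 1].

    For each spike X, the similarity to every other event Y is
    exp(-lam * d(X, Y) / d0); P(X) is the share of that similarity
    carried by spike-cluster events; the score is the mean P(X).
    Requires at least two distinct spike events.
    """
    n = len(spikes)
    within = [euclid(spikes[i], spikes[j])
              for i in range(n) for j in range(i + 1, n)]
    d0 = sum(within) / len(within)  # normalizing distance (assumption)
    events = [(w, True) for w in spikes] + [(w, False) for w in noise]
    total = 0.0
    for i, x in enumerate(spikes):
        sim_spike = sim_all = 0.0
        for j, (y, is_spike) in enumerate(events):
            if j == i:
                continue  # skip the event itself
            s = math.exp(-lam * euclid(x, y) / d0)
            sim_all += s
            if is_spike:
                sim_spike += s
        total += sim_spike / sim_all
    return total / n


# Toy data: the same spike cluster with a distant and a nearby noise cluster.
spikes = [(10.0, 0.0), (10.5, 0.2), (9.8, -0.1), (10.2, 0.1)]
far_noise = [(0.0, 0.0), (0.3, 0.1), (-0.2, 0.2), (0.1, -0.3)]
near_noise = [(10.1, 0.0), (10.4, 0.1), (9.9, 0.0), (10.3, 0.2)]
score_far = isolation_score(spikes, far_noise)    # close to 1
score_near = isolation_score(spikes, near_noise)  # much lower
```

Under this assumed form, a large λ means that only near neighbours contribute appreciably to P(X), whereas with λ close to 1 distant events also carry weight.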
To check the dependency of the isolation score on this parameter we varied it and calculated the isolation score of all 155 GPe units (Fig. 9b1). When λ was larger than 5 the isolation score was highly correlated with the scores calculated with our default value of λ = 10 (R > 0.946). However, when λ was equal to 1 the scores were not as highly correlated (R = 0.72). This is expected since small values of λ mean that the Euclidean distance between events is not stretched, and therefore distant events influence the score. To further investigate the influence of λ on our scores we simulated classification errors by applying different thresholds when sorting the data (as described in Section 3.4.2). We modified λ and calculated the isolation score as a function of the false negatives and positives ratio (Fig. 9b2–3). We found that when λ is small the score over-estimates the number of false positives (Fig. 9b3). In addition we found that as λ increases the false negative score tends to increase. However, this increase is bounded. We conclude that our selection of λ = 10 does not suffer from biases that occur when λ is small and it is within the large range in which the isolation score follows the ratio of classification errors.

The KNN algorithm used for the calculation of the classification error scores depends on the K we use. We modified this parameter and calculated the scores of all 155 GPe units (Fig. 9c1 and d1). The units were sorted by the scores when using the default value of K = 31. The false negative score changed only slightly when modifying K (Fig. 9c1); on the other hand the false positive score was sensitive to the K used (Fig. 9d1). We simulated clustering errors (as we simulated errors when modifying λ) and calculated the error classification score as a function of error ratio for different K values. We found that although K can bias the scores when the error ratio is large, there is a range of good predictability.

Fig. 9. Investigating parameter space of the isolation and classification-error scores. We modified the parameters used for calculating the score and compared the scores of real units and the scores of a unit with errors simulated by re-clustering the data using a threshold crossing method. (a) We modified the fraction of the spike clusters we used to calculate the threshold (these were the low amplitude spikes; hence fewer spikes means a closer to zero threshold) and compared the isolation scores when using 2% of the spike cluster. (a1) 2% vs. 5%. (a2) 2% vs. 20%. (a3) 2% vs. 100%. (b) Comparison of isolation score when modifying λ. (b1) Real data results. Units were sorted by the isolation score when using λ = 10. (b2) Isolation score as a function of the ratio of simulated false negatives. (b3) Isolation score as a function of the ratio of simulated false positives. (c) Comparison of false negative score when modifying K. (c1) Real data results. Units were sorted by the false negative score when using K = 31. (c2) False negative score as a function of the ratio of false negatives. (c3) False negative score as a function of false positives. (d) Same as (c) for the false positive score.

3.4.5. Comparing the scores

Our SNR, isolation and classification error scores were not designed to be independent. To determine the degree of dependency we compared the different scores (Figs. 3d and 10). First we compared the two SNR scores (Fig. 3d). The underlying reasons for the differences between these scores were described in Section 3.1.2; nevertheless we found the SNR scores to be highly correlated (R² = 0.94). We then compared the isolation score and the SNR score and found that as expected, in most cases units with a high SNR score had high isolation scores and units with low SNR scores had low isolation scores (Fig. 10a). On the other hand, the connection between these scores was non-linear and had outliers in which the isolation score was low although the SNR was high (e.g. last 6 min of Fig. 8). Due to these properties, any exclusion/inclusion criteria of units using a threshold based on the SNR scores will lead either to inclusion of units with a low isolation score or to exclusion of units with a high isolation score. Finally we compared the isolation score and the sum of false positive and negative scores (Fig. 10b) and found that when the isolation score was high, the variability of the sum was low and when the score was low the variability of the sum of the classification error scores was high. This again shows that the classification error scores are a refinement of the isolation score only when it is high, and further that when the isolation score is low these scores are less reliable and should not be used.

Fig. 10. Comparing the scores. The scores were compared using the data from 155 GP units from three different monkeys. (a) Isolation score vs. SNR_{No Spk}. When the SNR is small the isolation score tends to be small, and when the SNR is large the isolation score is usually close to 1. But this relation is not linear. Therefore, any SNR threshold will either include units that are poorly isolated (low isolation score) or exclude units that are well isolated (high isolation score). Furthermore, the outliers (high SNR with low isolation score) reveal the weaknesses of the spike sorting process. (b) False positive + false negative scores vs. isolation score. As the isolation score decreases the variability increases. Hence, the error type scores should only be used when the isolation score is high.

4. Discussion

We quantified the quality of spike detection and sorting using signal-to-noise ratios (SNR), isolation scores, and classification error scores. We then simulated errors for validating the scores, compared several spike sorting algorithms, and investigated the parameter space of the scores.

4.1. Related studies

Some previous studies have quantified the quality of clustering of recordings from multi-channel electrodes. These methods can be adapted to single channel recordings. In their study, Pouzat et al. (2002) assumed a Gaussian distribution of the noise, which they used to evaluate the variability of the spike waveforms. Shoham et al. (2003) have argued that the Gaussian assumption is inaccurate, and that the t-distribution is a better fit for the data. Furthermore, the distribution of noise, in general, is not sufficient for estimating the variability of signal statistics (Fee et al., 1996b). Even modeling the variability generated by the cells’ intrinsic properties is not always sufficient because it does not predict the variability caused by changes in the relative position of electrodes and recorded neurons (Fig. 1). Schmitzer-Torbert et al. (2005) used the χ² distribution as a distance measure of the noise events from the spike cluster in a feature space. In their method the distance between a noise event and the spike cluster was treated in a global manner; i.e. the score of each noise event depended on its distance from the center of the spike cluster. By contrast, our scores are based on the local properties of the spike cluster. While their approach focused on the contribution of the noise events, our scores iterate over the events in the spike cluster. As a result of these differences, our isolation score captures phenomena found in non-homogeneous spike clusters (i.e. clusters containing false positives), which the χ² distance does not. In addition, we can obtain an estimate of the number of false positive and false negative errors that is not available with previous methods. Harris et al. (2001) and Schmitzer-Torbert et al.
(2005) introduced the isolation distance, which quantifies the quality of clustering by the minimal distance where the number of spike events and noise events is equal. Although this score is “self consistent”, i.e. the score will decrease in the same recording with a reduction of the quality of the sorting, it does not have a global scale to differentiate between well and poorly isolated units. For example, a well-isolated unit with a low SNR can have the same score as a poorly isolated cluster with a high SNR. A major advantage of our isolation score is its intuitive range of zero to one, which enables easy comparison of units recorded at different times, and even by different research groups. In summary, we propose the isolation score, which is a measure of the separation between two groups (clusters); and then we present the two classification scores using a “one-class classification problem” approach. There are a few other metrics for group separation (usually as evaluations of clustering techniques) and classification problems (Trevor et al., 2001), e.g. metrics based on Euclidean distance, city block, etc. However, we feel that the isolation and classification scores provide better metrics for spike data due to their insensitivity to noise cluster size.

4.2. Relationship between scores

The scores show inter-dependence. A low isolation score is likely when the SNR is low, because low recording quality leads to cluster errors. On the other hand, large SNR values that appear with low isolation scores indicate problems with the clustering algorithm. There are many possible reasons for such isolation failure. These include assumptions in the clustering algorithm that may not have been fulfilled; e.g. the statistical model was wrong, the data were non-stationary or human errors were made. In this case (high SNR, low isolation score) we suggest re-clustering the data. To enhance the reliability of the results of studies based on extracellular recordings we suggest using the isolation score for preliminary analysis and exclusion of units or periods with very low isolation scores from the study database. We suggest that the findings be first verified on the recordings with high isolation scores and then extended to the entire database. We suggest excluding units with isolation scores below 0.8 in studies whose conclusions may be influenced by the isolation quality of the recorded units. However, we believe that more testing is needed for setting this threshold and hope that such a threshold will emerge after future work is done in different recording settings and neuronal areas. In any case, this should not limit the report of the isolation score even when it is not used as a criterion for excluding data.

4.3. The score under different conditions

Our simulation of spike errors using different sorting algorithms has shown that under different sorting conditions the scores are consistent and follow the number of simulated classification errors. We showed that the isolation score decreases with the ratio of classification errors and the classification error scores have a range in which they follow the real error ratio. However, applying sorting algorithms that directly reduce the scores may lead to a bias; i.e. a high isolation score despite a large ratio of classification errors. Nonetheless such an algorithm requires local consistency of spike clusters. Hence, we suggest using our scores when the sorting algorithms are based on global parameters (such as template matching and PCA based methods). Furthermore we suggest that local consistency algorithms should be used for post processing of sorting algorithms (see below). We compared the scores of two different brain areas. To enable this comparison we adjusted the time interval slightly for representation of events. We found that the isolation scores of GPe units were significantly larger than those of cortex units. This was consistent with our subjective impression that GPe units were better isolated. A major difference between GP and cortical recordings is that GP recordings are usually of only one cell per electrode, whereas two to three units are typically recorded by a single cortical electrode. Our isolation score methods do not distinguish between recordings of several cells on one electrode versus single cell recordings. In both cases given a cluster of spikes we extract the noise cluster and calculate our scores. As a result our scores reflect the quality of each unit and not the overall quality of all the units recorded from a given electrode. A preliminary condition for quality assessment is the insensitivity of the isolation score to the exact size of the noise cluster. By using different thresholds for extracting the noise cluster we showed that once the noise cluster contains the events that are close to the spike cluster, the score depends only slightly on the exact size of the noise cluster. As a result our methods can be applied to systems with intermittent sampling conditioned by the extraction and analog sampling only of putative spikes. In such systems it is possible to use other spike clusters, if they exist, as the noise reference; however this may lead to over-estimation of data quality.

4.4. Future directions

An additional benefit of classification error scores is that they identify likely misclassified events. Our KNN approach can be used as a post-processing tool to optimize the original spike sorting, by flipping the classification for these missed events. An even more promising approach would be to use our isolation score algorithm to recluster these missed events, by using the P(X) values (Fig. 4). Recall that this value is akin to the probability that event X belongs to the spike cluster. The re-clustering could simply flip the classification for events X for which the P(X) value is greater than some threshold. In this study we did not attempt to develop a method for finding the optimal value of K in the K nearest neighbors approach, but only constrained it. A data-driven approach, where K depends on various parameters of the spike and noise clusters (e.g. number of elements, overlap of the two clusters as measured by the isolation score, etc.) may be pursued. One may consider using two values for K, one for detecting false negatives and one for detecting false positives. Finally, our methods are based on analyzing the waveform of extracellular events and did not take spike train properties into account such as the firing rate or refractory periods. These properties are valuable for assessing spike sorting quality and thus can be used independently or could be incorporated into our scores. For example, we could introduce a progressive penalty for units with detected spikes in their estimated refractory period.

In summary, we have developed methods for quantifying the isolation quality of extracellularly recorded action potentials and compared these different methods. The scoring methods were applied directly to the spike waveform; however they may be used on other representations of the spike, e.g. PCA or wavelet-based representations. Isolation quality quantifications are a necessary step in interpreting studies based on extracellular recording. The conclusions of many single-unit studies are more dependent on their unit isolation quality than on the power of the statistical and analytical methods used for their spike-train analysis.
Nevertheless, in most cases, objective criteria are used and reported for the later stage but not for the first stages of the data acquisition process. We encourage research groups to use isolation measures, as developed in this manuscript, rather than more commonly used phrases such as “only well-isolated units were included in our study”.

Acknowledgement

This study was partly supported by a Center of Excellence grant administered by the ISF and HUNA’s “Fighting against Parkinson” grant.

References

Abeles M, Goldstein MHJ. Multispike train analysis. IEEE Trans Biomed Eng 1977;65:762–73.
Bar-Gad I, Ritov Y, Bergman H. Failure in identification of overlapping spikes from multiple neuron activity causes artificial correlations. J Neurosci Methods 2001;107:1–13.
Bergman H, DeLong MR. A personal computer-based spike detector and sorter: implementation and evaluation. J Neurosci Methods 1992;41:187–97.
Elias S, Joshua M, Goldberg JA, Heimer G, Arkadir D, Morris G, et al. Statistical properties of pauses of the high-frequency discharge neurons in the external segment of the globus pallidus. J Neurosci 2007;27:2525–38.
Fee MS, Mitra PP, Kleinfeld D. Automatic sorting of multiple unit neuronal signals in the presence of anisotropic and non-Gaussian variability. J Neurosci Methods 1996a;69:175–88.
Fee MS, Mitra PP, Kleinfeld D. Variability of extracellular spike waveforms of cortical neurons. J Neurophysiol 1996b;76:3823–33.
Goldberger J, Roweis S, Hinton G, Salakhutdinov R. Neighbourhood component analysis. Neural Inform Process Syst (NIPS’04) 2004;17:513–20.
Harris KD, Hirase H, Leinekugel X, Henze DA, Buzsaki G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 2001;32:141–9.
Heimer G, Bar-Gad I, Goldberg JA, Bergman H. Dopamine replacement therapy reverses abnormal synchronization of pallidal neurons in the 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine primate model of parkinsonism. J Neurosci 2002;22:7850–5.
Lewicki MS. Bayesian modeling and classification of neural signals. Neural Comp 1994;6:1005–30.
Lewicki MS. A review of methods for spike sorting: the detection and classification of neural action potentials. Network 1998;9:R53–78.
Likhtik E, Pelletier JG, Paz R, Pare D. Prefrontal control of the amygdala. J Neurosci 2005;25:7429–37.
Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 2004;43:133–43.
Nenadic Z, Burdick JW. Spike detection using the continuous wavelet transform. IEEE Trans Biomed Eng 2005;52:74–87.
Pare D, Gaudreau H. Projection cells and interneurons of the lateral and basolateral amygdala: distinct firing patterns and differential relation to theta and delta rhythms in conscious cats. J Neurosci 1996;16:3334–50.
Pouzat C, Delescluse M, Viot P, Diebolt J. Improved spike-sorting by modeling firing statistics and burst-dependent spike amplitude attenuation: a Markov chain Monte Carlo approach. J Neurophysiol 2004;91:2910–28.
Pouzat C, Mazor O, Laurent G. Using noise signature to optimize spike-sorting and to assess neuronal classification quality. J Neurosci Methods 2002;122:43–57.
Quiroga RQ, Nadasdy Z, Ben Shaul Y. Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering. Neural Comput 2004;16:1661–87.
Schmitzer-Torbert N, Jackson J, Henze D, Harris K, Redish AD. Quantitative measures of cluster quality for use in extracellular recordings. Neuroscience 2005;131:1–11.
Shoham S, Fellows MR, Normann RA. Robust, automatic spike sorting using mixtures of multivariate t-distributions. J Neurosci Methods 2003;127:111–22.
Trevor H, Robert T, Jerome F. The elements of statistical learning: data mining, inference and prediction. New York: Springer Verlag; 2001.
Vapnik VN. Statistical learning theory. New York: Wiley; 1998.
Wood F, Black MJ, Vargas-Irwin C, Fellows M, Donoghue JP. On the variability of manual spike sorting. IEEE Trans Biomed Eng 2004;51:912–8.
Worgotter F, Daunicht WJ, Eckmiller R. An on-line spike form discriminator for extracellular recordings based on an analog correlation technique. J Neurosci Methods 1986;17:141–51.

Discussion

In this thesis I studied the responses of different basal ganglia neurons to rewarding and aversive related events. The results are summarized in a series of peer-reviewed journal manuscripts. The main findings of these manuscripts are discussed below.

In the first paper included in this thesis (64) I found that rate modulations of striatal tonically active neurons (TANs) and dopaminergic neurons to expectation of reward were larger than the modulation which followed predictions of aversive events. Furthermore, these neurons encode the expectation level (or the prior probability) of reward better than the expectation of aversive events. Finally, TAN responses were not coincident with dopaminergic neurons responses in all trial epochs. More specifically, dopaminergic neurons encode the difference between reward and aversive trials in the cue and outcome epoch whereas the TAN population encodes this difference in the outcome and no-outcome epochs. Therefore complementary coding of dopaminergic neurons and TANs expands the encoding scope of the basal ganglia neuromodulators.

In the second paper (65) and third chapter I extended the first study to the investigation of the responses of basal ganglia main axis neurons to expectation, delivery and omission of appetitive (food), aversive (airpuff) and neutral (sound only) events. I found that the responses of GPe, GPi and SNr neurons were longer in duration and less stereotypic than the responses of the main basal ganglia neuromodulators. As with the TANs and dopaminergic neurons, the responses of the basal ganglia main axis neurons were larger and usually encoded reward better than aversive related events.
I found substantial differences between the three populations of basal ganglia main axis neurons. Most notably, SNr responses were more frequent, had shorter latencies, and encoded the airpuff delivery better than the corresponding responses of GPe and GPi neurons.

In the fourth chapter (66) I used pairwise correlation analysis. I showed that the average responses of neuromodulators (TANs and dopaminergic neurons) tended to have a positive response correlation (i.e., similar time pattern). In comparison to the homogeneous responses of the basal ganglia modulators, the neurons of the basal ganglia main axis had diverse responses. Pairs of dopaminergic neurons, as well as pairs of TANs, dynamically modulate their discharge variation in accordance with events in the behavioral task. The synchronization between dopaminergic neurons increased after the cue and outcome events whereas synchronization of TANs decreased just before cue offset. Furthermore, although the discharge rate of the dopaminergic neurons increased both in reward and aversive trials, their synchronization increased only in the reward trials. Similarly, the dynamic changes in synchronization of TAN pairs were not coincident with their discharge rate modulation.

Finally, in the fifth chapter (67) I developed a method for quantification of the quality of extracellular recording. This method was used in the analyses in all the result chapters.

Asymmetry in the encoding of values in the basal ganglia

Asymmetric encoding of positive and negative expectations by the basal ganglia

I found that before the end of the cue presentation, the fraction of trials in which the monkey licked in expectation of a future reward and the fraction of trials in which the monkey blinked in expectation of a future airpuff were similar. In addition I found a large blinking response even when the airpuff was omitted.
Finally, with the exception of the outcome epoch, the licking and the blinking behavior reflected the expected (low vs. high) probability of the reward and the aversive events. Nevertheless the basal ganglia single cell activity was found to be biased toward the encoding of reward related events, and encoding of aversive events was very weak.

Several studies have used similar paradigms to compare neural responses to reward food and aversive airpuff (56, 68, 69). Paton et al. showed that in the amygdala, expectations of food and airpuff are represented symmetrically. My research shows that by contrast to the amygdala, food and airpuff expectations are represented asymmetrically in the basal ganglia. Thus, I found comparable aversive and reward related behavior. However, whereas the activity in the basal ganglia strongly reflects reward behavior and encodes reward probability, aversive related events and their probability are only weakly encoded in basal ganglia activity.

Although I found similarity in the behavioral responses, in this study I did not calibrate the subjective value (utility) of food vs. airpuff. I however did manipulate the expectation of aversive outcome. In previous instrumental conditioning experiments including both reward and aversive events the monkey could avoid the aversive airpuff by a correct response (56, 61, 70). In the current experiment the airpuff was unavoidable and hence the aversive cue led to direct expectation of aversion.

In the first paper (64), I reported that the responses of midbrain dopaminergic neurons and striatal TANs are biased towards the encoding of rewarding events and in the second I found a similar result for the main axis neurons (71). The basal ganglia main axis is affected by additional neuromodulator systems, e.g., serotonin (72).
Theoretical studies have suggested that the phasic serotonin signal might report the prediction error for future punishment (73, 74) and therefore could compensate for the biased encoding of the value domain by the TANs and the dopaminergic neurons. The current study of the basal ganglia output structures indicates that the basal ganglia main axis neurons have a similar bias toward control of reward related behavior as TANs and dopaminergic neurons. Thus, even if there are other basal ganglia modulators than the cholinergic and dopaminergic striatal inputs, the activity of basal ganglia output neurons follows the same trend as the TANs and dopaminergic neurons and is biased towards rewarding events. I therefore suggest that the other modulators do not extend the basal ganglia encoding to aversive events and that there are other neuronal systems than the basal ganglia that have control over aversive related behavior.

Encoding of dopaminergic neurons

Dopaminergic neurons encode more than reward prediction errors

Recent studies have shown that dopaminergic neuron activity encodes the mismatch between prediction and reality. Most of these studies have focused on the mismatch in the positive domain; i.e., when conditions are better than expected (25). Dopaminergic neurons typically increase their discharge rate in response to appetitive predictive cues and outcomes. In line with the predictions of reinforcement learning theories, the dopaminergic neurons discharge decreases with omission of predicted rewards (29, 51, 52). However, this discharge suppression is limited since the neuronal firing rate is truncated at zero. Several groups (27, 28) have reported that the instantaneous firing of dopaminergic neurons does not demonstrate incremental encoding of reward omission, and it was suggested that omission is encoded by duration of the discharge decrease (53).
In this experiment, however, I failed to find any significant coding of reward omission by response amplitude or duration. Naïve reinforcement learning models categorize events as having positive or negative errors and would suggest opposite sign modulation to reward and aversive trials (25). However I found similar trends for dopaminergic neuron responses to predictions, outcomes and omission of reward and aversive related events (64). In particular I found a substantial increase to both reward and aversive outcomes. Furthermore, the responses of the dopaminergic neurons to reward omission and aversive outcome were very different (decrease vs. increase) although in both cases there was a negative reinforcement error. To summarize, the results reveal an increase in the complexity of the dopaminergic neuron encoding of value. This does not rule out their role in the temporal difference hypothesis. On the contrary, my working hypothesis holds that the discharge rate of dopaminergic neurons and TANs reflects changes in reward prediction as well as changes in attention/arousal levels (54, 75, 76).

Reward related increase in the synchronization of dopaminergic neurons

I showed that in a classical conditioning task, the activity of the dopaminergic neurons also increases following non-rewarding events such as the prediction and delivery of airpuffs (64). Nonetheless, I found an increase in the noise correlation of dopaminergic neurons to expectation and delivery of reward and not to other events (66). These findings indicating a reward related increase of noise correlation extend previous findings of unspecific spike to spike (noise) correlations of dopaminergic neurons (28, 77). The modulations of the noise correlation were small compared to the modulations of rate. In a recent study Schneidman et al. (78) have shown that weak pairwise correlation may imply a strongly correlated network, and provide an effective description of the system.
It is unclear whether pairwise correlations give an effective description of the dopaminergic neurons since current recording methods do not enable in vivo simultaneous recording of many neurons, yet it demonstrates the potential importance of the noise correlations. Although I found some overlap between noise correlations and rate modulations, they were dissociated. There were periods with modulation of rate but not of the noise correlation.

Comparing basal ganglia subpopulations

Comparing neuromodulators - TANs do not mirror the dopaminergic neurons responses

The anatomical demonstration of dopaminergic innervations of striatal cholinergic interneurons (79) and the suppression of acetylcholine efflux from striatal slices by dopamine (80) suggest that dopaminergic neurons directly inhibit TANs (41). TANs might mediate the dopaminergic message to the D1 and D2 dopamine receptor containing striatal projection neurons. The opposite and coincident responses of the TANs and dopaminergic neurons to predictive cues support direct inhibition. However TAN responses at the terminal stage of the trial include major positive deflections which do not mirror any phase of the dopaminergic response. Notably, following outcome omission, dopaminergic neurons respond similarly to the neutral outcome, reward and airpuff omissions, whereas the TANs robustly discriminate between the three events. Thus, dopaminergic neurons may better encode the cue predicting events and the TANs may provide more information at the completion of the trial. This is consistent with the findings of sub-populations of striatal projection neurons with selective evaluative encoding of trial results (42, 81). In any case, these differential responses indicate that the TAN discharge is not totally governed by its dopaminergic inputs; neither are the TANs and dopaminergic neurons driven by a common source (82) with opposite effects on the two systems.
In addition to the differences between single cell activity of TANs and dopaminergic neurons, I found differences in the pattern of the correlation between pairs of cells in these populations (66). I found that the noise correlation of the dopaminergic neurons increases whereas the correlation for the TANs decreases. Thus, it is possible that increasing the dopaminergic neuron correlation and de-correlating the TANs enable an increase and a decrease, respectively, in the effective concentrations of striatal dopamine and acetylcholine. The right balance between basal ganglia neuromodulators and cortico-striatal activity may lead to a maximization of information in the basal ganglia main axis and an optimal behavioral policy.

Comparing main axis populations - different response characteristics of the main axis nuclei

In this study I found several major differences between the GPe, GPi and the SNr (71). I found more intense changes in the responses of the SNr compared to the responses of the GPe and the GPi. SNr neurons responded with shorter latencies to the cue, and encoded the airpuff outcome better than the pallidal neurons. A simple explanation for the enhanced encoding is the orofacial (licking and blinking) motor behavior of the monkeys in this experiment. Initial studies emphasized the role of the SNr in the control of orofacial movements (83, 84). Although this separation is not clear cut (85, 86), the results may reflect this organization. Thus the small and less frequent responses in the GPi could reflect the smaller representation of orofacial movements in the GPi. This could also account for the activation of the SNr to aversive events, but as noted above this does not explain the asymmetric value representation in the SNr. At the circuitry level, one possibility is that the difference in pallidal vs. SNr responses results from different projections from the striatum or the STN (6).
Another possibility is that the GPe has different pathways to the GPi and SNr and those GPe neurons that do project to the SNr are the neurons with the short latency and larger response. Nevertheless, I did not find any topographic organization in the responses of the GPe that supports this hypothesis. Finally, another putative explanation for the differences between the GPi and the SNr is the direct effects of somatodendritic release of dopamine on SNr, but not on pallidal neurons. The similar latencies of SNc and SNr responses support the hypothesis that SNc neurons may drive SNr responses by somatodendritic release of dopamine (87, 88).

Finally, the neural recordings were made after the monkey was highly familiar with the task and hence activity might not be the same as activity that occurs during learning. Previous studies of dopaminergic neurons have shown that activity on a familiar probabilistic task resembles the activity in a learning task (26, 28, 29). An fMRI study has shown that striatal activity underlies novelty-based choice in humans (89). Whether this is the case for other basal ganglia populations and the single cell activity that underlies novelty representation should be investigated in future studies.

Comparing main axis and neuromodulators - Phasic response of neuromodulators vs. long lasting response of main axis

In contrast to the short (<0.7 s) responses of the basal ganglia modulators (28, 64, 90, 91), the responses of the basal ganglia main axis high frequency discharge neurons lasted throughout the two second cue epochs. This is in line with previous descriptions of pallidal (50) and SNr (85) responses. Long duration, set-related responses have frequently been described in the cortex (92-94) where they have been attributed to short term memory or action preparation processes. I cannot rule out similar processes in the basal ganglia and the experimental design could not dissociate set-related vs. cue-evoked responses.
However, the encoding of probability by the basal ganglia main axis neurons and the dissociation between actions and neural response (for example, no neural encoding of the probability of aversive trials, and the early decay of the neural activity compared to licking behavior after reward delivery) suggest that the activity of these neurons may encode the value of the current state or state-action pairs (42, 45).

The tonic high frequency discharge rate of neurons (population average: 45-88 spikes/s in this study) endows them with a better dynamic range for responses with a decrease in discharge rate. Nevertheless, consistent with many previous studies (95-98), I found that the high frequency discharge neurons respond to behavioral events more frequently with increases than with decreases in discharge rate. The latencies and the temporal distribution of the responses with increases and decreases in discharge rate were similar, thus leading to highly diverse basal ganglia encoding, with different polarities and different amplitudes of responses. The difference between the population responses, with no encoding of the a-priori probability of outcome, and the single unit encoding of this probability is in line with such a balanced diversity of responses of basal ganglia single units. These diverse responses augment the information capacity of the basal ganglia output structure (99).

Correlations of the average response set neuromodulators apart from the main axis

Previous studies have observed that different neuromodulator cells have responses with similar temporal patterns (40, 91). In the fourth chapter (66) I quantified the similarity of the temporal pattern of the response (response correlation) and the similarity of the encoding of different events (signal correlation). I showed that as opposed to the basal ganglia neuromodulators, the main axis responses are diverse.
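As a rough illustration of the two measures named above, the sketch below computes a response correlation and a signal correlation for one pair of neurons. It assumes that both measures are Pearson correlations: of two trial-averaged time courses for a single event, and of two sets of mean responses across events, respectively. The exact definitions used in (66) may differ. The function names and all numbers are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def response_correlation(psth_a, psth_b):
    """Assumed definition: similarity of the temporal pattern of two
    trial-averaged responses (same event, same time bins)."""
    return pearson(psth_a, psth_b)


def signal_correlation(mean_resp_a, mean_resp_b):
    """Assumed definition: similarity of event encoding, i.e. correlation
    of the two neurons' mean responses across the same list of events."""
    return pearson(mean_resp_a, mean_resp_b)


# Hypothetical firing rates (spikes/s), for illustration only.
psth_a = [5.0, 9.0, 14.0, 8.0, 5.0]   # neuron A, one event, five time bins
psth_b = [4.0, 7.0, 11.0, 6.0, 4.0]   # neuron B, same event and bins
resp_a = [12.0, 9.0, 6.0]             # neuron A, mean response to three events
resp_b = [3.0, 5.0, 8.0]              # neuron B, same three events

rc = response_correlation(psth_a, psth_b)  # similar time courses
sc = signal_correlation(resp_a, resp_b)    # opposite event preferences
print(f"response correlation: {rc:.3f}")
print(f"signal correlation:   {sc:.3f}")
```

In this toy pair the two measures come apart: the time courses are nearly identical while the event preferences are opposite. A homogeneous population in the sense used here would show high values on both measures across most pairs.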
The homogeneous and synchronized responses of the neuromodulators suggest that these populations as a whole provide the main axis with a scalar message; i.e., the encoding of different dopaminergic neurons, as well as of different TANs, is similar. On the other hand, the diversity of the main axis responses suggests that its activity is highly independent, which is conducive to a large information capacity (99). The contrast between the diversity of main axis response and the homogeneity of modulators was demonstrated in a behavioral task with 18 different events. Nevertheless, I cannot rule out the possibility that the recording of neural activity during other tasks or over greater spatial distances (including dopaminergic neurons in the ventral tegmental area and TANs in the caudate or ventral striatum) may reveal other effects. Future studies using a large variety of tasks and wider sampling of basal ganglia neurons should test the consistency and the spatial extent of the homogeneity of the basal ganglia modulators.

Based mainly on the activity of the dopaminergic neurons, it was suggested that the basal ganglia implement a reinforcement learning algorithm (25). The distinction between the correlation properties of neuromodulators and the main axis is in line with the idea that these populations have different roles in the reinforcement learning algorithm. The neuromodulator scalar response is consistent with these neurons being the teacher (e.g., a critic) of this system, and the diversity of the main axis is in agreement with it being the executor of the system (e.g., the actor), which requires specificity in the encoding of the different neuronal elements.

The basal ganglia in control of motor behavior

In the previous sections I have discussed the different results of my thesis; in this section I will try to unify the different results under one framework.
The response to silent non rewarding events

I found that responses to expectation of aversive events are similar to the responses to the neutral cue. Nevertheless, in many cells we do see a large (but similar) response to both of these events (64, 65). In addition, I found many neurons that respond to the aversive outcome with a short duration response. In the following section I discuss the possibility that the responses to events represent two modes of activity in the basal ganglia: a fast component that encodes the saliency and another component that is selective for rewarding events. I suggest that both of these phases have origins in the dynamic response of the dopaminergic signal.

The dopaminergic signal may provide different messages to different basal ganglia pathways

Close examination of the response of the dopaminergic neurons reveals that many of these cells have a bi-phasic response with an increase that is followed by a decrease in activity (51, 64). The increase in activity for non rewarding events is brief (64, 66), whereas the increase to the rewarding events is longer. Dopamine transmission is not limited to classical synaptic action since it may also diffuse and reach extra synaptic receptors (23, 100). The coordinated burst of dopaminergic neurons leads to large extra synaptic dopamine concentrations (101), and a pause in activity of dopaminergic neurons induces a decrease in the extracellular dopamine level (102). This suggests that the bi-phasic response enables a fast increase in the extra synaptic dopamine that is followed by a fast clearance. The larger, longer (64) and more synchronized (66) response to the rewarding events may selectively potentiate the transmission of dopamine in the striatum for rewarding events (103). The augmented dopamine release to rewarding events would enable summation of the extracellular dopamine released from adjacent synapses.
The brief unsynchronized increase for the non rewarding events may lead to a fast localized increase in dopamine that will then rapidly decrease and lead to a wide-ranging decrease in extra synaptic dopamine. Based on the reuptake and diffusion of dopamine and the affinity of the different dopamine receptors, Cragg and Rice (100) have calculated the sphere of influence of a single release. They concluded that D1 receptors are activated at short distances (<2 µm) from the dopamine release site while the D2 receptors are activated at longer distances (<7 µm). The biphasic response may lead to differences in the activation of the receptor types; i.e., the first phase of the response reaches the close D1 and D2 receptors while the late increase (which is limited to rewarding events) influences the remote high affinity D2 receptors. Together with the previous assumption of fast clearance for non rewarding events, this different field of effects suggests that the fast direct D1 pathways receive only the fast (saliency) signal and that the slow indirect D2 pathway is activated by both fast and slow dopamine signals.

Two time scales for controlling motor behavior

Control of behavior at short latencies has the advantage of responding online, which is essential when the environment is rapidly changing. On the other hand, fast responses may lead to errors which might be avoided with more prolonged processing of information. Theories of the function of the basal ganglia have suggested that the main role of their output is to open a gate to enable motor behavior (104). The dynamic response enables fast non selective opening of the motor gate (perhaps in the fast direct pathway). These fast motor responses are probably necessary for a rapidly changing environment with salient events. Accordingly, we found that many neurons in the output of the basal ganglia have a fast response to the air puff delivery (65).
The second phase of the dopaminergic response enables selective behavioral responses which might require planning for maximization of future reward. Indeed, we found significant encoding of reward expectation, but hardly any representation of expectation of air puff in the output stages of the basal ganglia (65).

Diseases of the basal ganglia cause severe motor and cognitive impairments (12). These symptoms can be divided into those in which behavior is unrestrained (positive symptoms; e.g., impulsivity, gambling, obsessive behavior, dyskinesia and tremor) and those in which behavior is over restrained (negative symptoms; e.g., bradykinesia, akinesia). Previous action selection models of the basal ganglia (104) hold that akinesia is due to dopamine depletion and closure of the basal ganglia gate, but bradykinesia is not easily explained. The convergence of high and low order information in the basal ganglia suggests that these clinical symptoms may reflect impairments in the ability of the basal ganglia to trade off between the fast and slow pathways. On this view, unrestrained behavior is due to over activity in the fast pathways and over restrained behavior is due to over processing in the slow pathways.

Bibliography

1. Marr,D. (1983) Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, W. H. Freeman
2. Sutton,R.S. and Barto,A.G. (1998) Reinforcement learning - an introduction, The MIT Press
3. Bar-Gad,I. and Bergman,H. (2001) Stepping out of the box: information processing in the neural networks of the basal ganglia. Curr. Opin. Neurobiol. 11, 689-695
4. Gurney,K. et al. (2004) Computational models of the basal ganglia: from robots to membranes. Trends Neurosci. 27, 453-459
5. Parent,A. and Hazrati,L.N. (1995) Functional anatomy of the basal ganglia. I. The cortico-basal ganglia-thalamo-cortical loop. Brain Res. Rev. 20, 91-127
6. Haber,S.N. and Gdowski,M.J.
(2004) The Basal Ganglia. In The Human Nervous System (Second edn) (Paxinos,G. and Mai,J.K., eds), pp. 676-738, Elsevier
7. Albin,R.L. et al. (1989) The functional anatomy of basal ganglia disorders. Trends Neurosci. 12, 366-375
8. Tepper,J.M. et al. (2004) GABAergic microcircuits in the neostriatum. Trends Neurosci. 27, 662-669
9. Aosaki,T. et al. (1994) Responses of tonically active neurons in the primate's striatum undergo systematic changes during behavioral sensorimotor conditioning. J. Neurosci. 14, 3969-3984
10. Wilson,C.J. et al. (1990) Firing patterns and synaptic potentials of identified giant aspiny interneurons in the rat neostriatum. J. Neurosci. 10, 508-519
11. Tepper,J.M. and Bolam,J.P. (2004) Functional diversity and specificity of neostriatal interneurons. Curr. Opin. Neurobiol. 14, 685-692
12. DeLong,M.R. (1990) Primate models of movement disorders of basal ganglia origin. Trends Neurosci. 13, 281-285
13. Gerfen,C.R. et al. (1990) D1 and D2 dopamine receptor-regulated gene expression of striatonigral and striatopallidal neurons. Science 250, 1429-1432
14. Shen,W. et al. (2008) Dichotomous dopaminergic control of striatal synaptic plasticity. Science 321, 848-851
15. Surmeier,D.J. and Kitai,S.T. (1994) Dopaminergic regulation of striatal efferent pathways. Curr. Opin. Neurobiol. 4, 915-919
16. Levesque,M. and Parent,A. (2005) The striatofugal fiber system in primates: a reevaluation of its organization based on single-axon tracing studies. Proc. Natl. Acad. Sci. U. S. A. 102, 11888-11893
17. Nadjar,A. et al. (2006) Phenotype of striatofugal medium spiny neurons in parkinsonian and dyskinetic nonhuman primates: a call for a reappraisal of the functional organization of the basal ganglia. J. Neurosci. 26, 8653-8661
18. Feger,J. et al.
(1994) The projections from the parafascicular thalamic nucleus to the subthalamic nucleus and the striatum arise from separate neuronal populations: a comparison with the corticostriatal and corticosubthalamic efferents in a retrograde fluorescent double-labelling study. Neuroscience 60, 125-132
19. Nambu,A. et al. (2002) Functional significance of the cortico-subthalamo-pallidal 'hyperdirect' pathway. Neurosci. Res. 43, 111-117
20. Bolam,J.P. et al. (2000) Synaptic organisation of the basal ganglia. J. Anat. 196, 527-542
21. Reynolds,J.N. et al. (2001) A cellular mechanism of reward-related learning. Nature 413, 67-70
22. Calabresi,P. et al. (2000) Acetylcholine-mediated modulation of striatal function. Trends Neurosci. 23, 120-126
23. Arbuthnott,G.W. and Wickens,J. (2007) Space, time and dopamine. Trends Neurosci. 30, 62-69
24. Jellinger,K.A. (1991) Pathology of Parkinson's disease. Changes other than the nigrostriatal pathway. Mol. Chem. Neuropathol. 14, 153-197
25. Schultz,W. et al. (1997) A neural substrate of prediction and reward. Science 275, 1593-1599
26. Hollerman,J.R. and Schultz,W. (1998) Dopamine neurons report an error in the temporal prediction of reward during learning. Nat. Neurosci. 1, 304-309
27. Bayer,H.M. and Glimcher,P.W. (2005) Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129-141
28. Morris,G. et al. (2004) Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43, 133-143
29. Fiorillo,C.D. et al. (2003) Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898-1902
30. Tobler,P.N. et al. (2005) Adaptive coding of reward value by dopamine neurons. Science 307, 1642-1645
31. Waelti,P. et al. (2001) Dopamine responses comply with basic assumptions of formal learning theory. Nature 412, 43-48
32. Tobler,P.N. et al.
(2003) Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. J. Neurosci. 23, 10402-10410
33. Pan,W.X. et al. (2005) Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J. Neurosci. 25, 6235-6242
34. Satoh,T. et al. (2003) Correlated coding of motivation and outcome of decision by dopamine neurons. J. Neurosci. 23, 9913-9923
35. Nakahara,H. et al. (2004) Dopamine neurons can represent context-dependent prediction error. Neuron 41, 269-280
36. D'Ardenne,K. et al. (2008) BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science 319, 1264-1267
37. Fiorillo,C.D. et al. (2008) The temporal precision of reward prediction in dopamine neurons. Nat. Neurosci.
38. Centonze,D. et al. (2003) Dopamine, acetylcholine and nitric oxide systems interact to induce corticostriatal synaptic plasticity. Rev. Neurosci. 14, 207-216
39. Barbeau,A. (1962) The pathogenesis of Parkinson's disease: A new hypothesis. Canad. Med. Ass. J. 87, 802-807
40. Graybiel,A.M. et al. (1994) The basal ganglia and adaptive motor control. Science 265, 1826-1831
41. Wang,Z. et al. (2006) Dopaminergic control of corticostriatal long-term synaptic depression in medium spiny neurons is mediated by cholinergic interneurons. Neuron 50, 443-452
42. Lau,B. and Glimcher,P.W. (2007) Action and outcome encoding in the primate caudate nucleus. J. Neurosci. 27, 14502-14514
43. Apicella,P. et al. (1992) Neuronal activity in monkey striatum related to the expectation of predictable environmental events. J. Neurophysiol. 68, 945-960
44. Lauwereyns,J. et al. (2002) A neural correlate of response bias in monkey caudate nucleus. Nature 418, 413-417
45. Samejima,K. et al. (2005) Representation of action-specific reward values in the striatum. Science 310, 1337-1340
46. Turner,R.S. and Anderson,M.E.
(2005) Context-dependent modulation of movement-related discharge in the primate globus pallidus. J. Neurosci. 25, 2965-2976
47. Gdowski,M.J. et al. (2001) Context dependency in the globus pallidus internal segment during targeted arm movements. J. Neurophysiol. 85, 998-1004
48. Handel,A. and Glimcher,P.W. (2000) Contextual modulation of substantia nigra pars reticulata neurons. J. Neurophysiol. 83, 3042-3048
49. Pasquereau,B. et al. (2007) Shaping of motor responses by incentive values through the basal ganglia. J. Neurosci. 27, 1176-1183
50. Arkadir,D. et al. (2004) Independent coding of movement direction and reward prediction by single pallidal neurons. J. Neurosci. 24, 10047-10056
51. Schultz,W. et al. (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J. Neurosci. 13, 900-913
52. Matsumoto,M. and Hikosaka,O. (2007) Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447, 1111-1115
53. Bayer,H.M. et al. (2007) Statistics of midbrain dopamine neuron spike trains in the awake primate. J. Neurophysiol. 98, 1428-1439
54. Horvitz,J.C. (2000) Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience 96, 651-656
55. Guarraci,F.A. and Kapp,B.S. (1999) An electrophysiological characterization of ventral tegmental area dopaminergic neurons during differential pavlovian fear conditioning in the awake rabbit. Behav. Brain Res. 99, 169-179
56. Mirenowicz,J. and Schultz,W. (1996) Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379, 449-451
57. Ungless,M.A. et al. (2004) Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science 303, 2040-2042
58. Coizet,V. et al. (2006) Nociceptive responses of midbrain dopaminergic neurones are modulated by the superior colliculus in the rat. Neuroscience 139, 1479-1493
59. Brown,M.T.
et al. (2009) Activity of neurochemically heterogeneous dopaminergic neurons in the substantia nigra during spontaneous and driven changes in brain state. J. Neurosci. 29, 2915-2925
60. Ravel,S. et al. (2003) Responses of tonically active neurons in the monkey striatum discriminate between motivationally opposing stimuli. J. Neurosci. 23, 8489-8497
61. Yamada,H. et al. (2004) Tonically active neurons in the primate caudate nucleus and putamen differentially encode instructed motivational outcomes of action. J. Neurosci. 24, 3500-3510
62. Martin,R.F. and Bowden,D.M. (2000) Primate Brain Maps: Structure of the Macaque Brain, Elsevier Science
63. Szabo,J. and Cowan,W.M. (1984) A stereotaxic atlas of the brain of the cynomolgus monkey (Macaca fascicularis). J. Comp. Neurol. 222, 265-300
64. Joshua,M. et al. (2008) Midbrain dopaminergic neurons and striatal cholinergic interneurons encode the difference between reward and aversive events at different epochs of probabilistic classical conditioning trials. J. Neurosci. 28, 11673-11684
65. Joshua,M. et al. (2009) Encoding of probabilistic rewarding and aversive events by pallidal and nigral neurons. J. Neurophysiol. 101, 758-772
66. Joshua,M. et al. (2009) Synchronization of midbrain dopaminergic neurons is enhanced by rewarding events. Neuron 62(5), 695-704
67. Joshua,M. et al. (2007) Quantifying the isolation quality of extracellularly recorded action potentials. J. Neurosci. Methods 163, 267-282
68. Paton,J.J. et al. (2006) The primate amygdala represents the positive and negative value of visual stimuli during learning. Nature 439, 865-870
69. Kobayashi,S. et al. (2006) Influences of rewarding and aversive outcomes on activity in macaque lateral prefrontal cortex. Neuron 51, 861-870
70. Yamada,H. et al. (2007) History- and current instruction-based coding of forthcoming behavioral outcomes in the striatum. J. Neurophysiol. 98, 3557-3567
71. Joshua,M. et al.
(2008) Different encoding of probabilistic rewarding and aversive events by pallidal and nigral neurons. J. Neurophysiol. In press
72. Lavoie,B. and Parent,A. (1990) Immunohistochemical study of the serotoninergic innervation of the basal ganglia in the squirrel monkey. J. Comp. Neurol. 299, 1-16
73. Daw,N.D. et al. (2002) Opponent interactions between serotonin and dopamine. Neural Netw. 15, 603-616
74. Dayan,P. and Huys,Q.J. (2008) Serotonin, inhibition, and negative mood. PLoS Comput. Biol. 4, e4
75. Redgrave,P. and Gurney,K. (2006) The short-latency dopamine signal: a role in discovering novel actions? Nat. Rev. Neurosci. 7, 967-975
76. Ravel,S. and Richmond,B.J. (2006) Dopamine neuronal responses in monkeys performing visually cued reward schedules. Eur. J. Neurosci. 24, 277-290
77. Grace,A.A. and Bunney,B.S. (1983) Intracellular and extracellular electrophysiology of nigral dopaminergic neurons--3. Evidence for electrotonic coupling. Neuroscience 10, 333-348
78. Schneidman,E. et al. (2006) Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440, 1007-1012
79. Lehmann,J. and Langer,S.Z. (1983) The striatal cholinergic interneuron: synaptic target of dopaminergic terminals? Neuroscience 10, 1105-1120
80. Stoof,J.C. et al. (1992) Regulation of the activity of striatal cholinergic neurons by dopamine. Neuroscience 47, 755-770
81. Lau,B. and Glimcher,P.W. (2008) Value representations in the primate striatum during matching behavior. Neuron 58, 451-463
82. Matsumoto,N. et al. (2001) Neurons in the thalamic CM-Pf complex supply striatal neurons with information about behaviorally significant sensory events. J. Neurophysiol. 85, 960-976
83. Hikosaka,O. and Wurtz,R.H. (1983) Visual and oculomotor functions of monkey substantia nigra pars reticulata. II. Visual responses related to fixation of gaze. J. Neurophysiol. 49, 1254-1267
84. DeLong,M.R. et al.
(1983) Relations between movement and single cell discharge in the substantia nigra of the behaving monkey. J. Neurosci. 3, 1599-1606
85. Wichmann,T. and Kliem,M.A. (2004) Neuronal activity in the primate substantia nigra pars reticulata during the performance of simple and memory-guided elbow movements. J. Neurophysiol. 91, 815-827
86. DeLong,M.R. et al. (1985) Primate globus pallidus and subthalamic nucleus: functional organization. J. Neurophysiol. 53, 530-543
87. Cragg,S.J. et al. (2001) Dopamine-mediated volume transmission in midbrain is regulated by distinct extracellular geometry and uptake. J. Neurophysiol. 85, 1761-1771
88. Windels,F. and Kiyatkin,E.A. (2006) Dopamine action in the substantia nigra pars reticulata: iontophoretic studies in awake, unrestrained rats. Eur. J. Neurosci. 24, 1385-1394
89. Wittmann,B.C. et al. (2008) Striatal activity underlies novelty-based choice in humans. Neuron 58, 967-973
90. Apicella,P. (2007) Leading tonically active neurons of the striatum from reward detection to context recognition. Trends Neurosci. 30, 299-306
91. Schultz,W. (1998) Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1-27
92. Miyashita,Y. (1988) Neuronal correlate of visual associative long-term memory in the primate temporal cortex. Nature 335, 817-820
93. Wise,S.P. and Kurata,K. (1989) Set-related activity in the premotor cortex of rhesus monkeys: effect of triggering cues and relatively long delay intervals. Somatosens. Mot. Res. 6, 455-476
94. Fuster,J.M. (1999) The Prefrontal Cortex: Anatomy, Physiology, and Neuropsychology of the frontal lobes, Lippincott-Raven
95. Turner,R.S. and Anderson,M.E. (1997) Pallidal discharge related to the kinematics of reaching movements in two dimensions. J. Neurophysiol. 77, 1051-1074
96. Georgopoulos,A.P. et al. (1983) Relations between parameters of step-tracking movements and single cell discharge in the globus pallidus and subthalamic nucleus of the behaving monkey. J. Neurosci.
3, 1586-1598
97. Mink,J.W. and Thach,W.T. (1991) Basal ganglia motor control. I. Nonexclusive relation of pallidal discharge to five movement modes. J. Neurophysiol. 65, 273-300
98. Mitchell,S.J. et al. (1987) The primate globus pallidus: neuronal activity related to direction of movement. Exp. Brain Res. 68, 491-505
99. Bar-Gad,I. et al. (2003) Information processing, dimensionality reduction and reinforcement learning in the basal ganglia. Prog. Neurobiol. 71, 439-473
100. Cragg,S.J. and Rice,M.E. (2004) DAncing past the DAT at a DA synapse. Trends Neurosci. 27, 270-277
101. Day,J.J. et al. (2007) Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat. Neurosci. 10, 1020-1028
102. Suaud-Chagny,M.F. et al. (1992) Relationship between dopamine release in the rat nucleus accumbens and the discharge activity of dopaminergic neurons during local in vivo application of amino acids in the ventral tegmental area. Neuroscience 49, 63-72
103. Horvitz,J.C. (2009) Stimulus-response and response-outcome learning mechanisms in the striatum. Behav. Brain Res. 199, 129-140
104. Mink,J.W. (1996) The basal ganglia: focused selection and inhibition of competing motor programs. Prog. Neurobiol.
50, 381-425

Appendix

Journal of Neuroscience Methods 178 (2009) 350–356

A noninvasive, fast and inexpensive tool for the detection of eye open/closed state in primates

Rea Mitelman a,b,∗,1, Mati Joshua a,b,1, Avital Adler a,b, Hagai Bergman a,b,c

a Department of Physiology, The Hebrew University – Hadassah Medical School, Jerusalem 91120, Israel
b The Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem 91904, Israel
c Eric Roland Center for Neurodegenerative Diseases, The Hebrew University, Jerusalem 91904, Israel

Article info

Article history: Received 5 October 2008; received in revised form 4 December 2008; accepted 4 December 2008

Keywords: Electrophysiological recordings; Image processing; Eyeblink conditioning; Primates; Eyelid

Abstract

Accurate detection of the eye state (i.e., open or closed) of animals during electrophysiological recordings is often crucial for analyzing physiological data. This requires a system which is reliable, and preferably noninvasive and inexpensive. Here we present such a tool incorporating a standard digital camera and a semi-automatic eye state detection (ESD) algorithm that can be used easily in typical primate electrophysiological setups. The ESD algorithm is based on the high light absorbance of the iris and pupil relative to the eyelid and takes advantage of the unique conditions found in primate physiological recordings (minimal area of sclera and head fixation). The ESD algorithm is as accurate as a human observer, and is not vulnerable to variance inherent to human decisions that it requires (i.e., eye location setting, training set classification and threshold setting). The temporal resolution with standard interlaced digital cameras is 17–20 ms.
This is sufficient for the detection of eye state changes during electrophysiological recordings including spontaneous blinking and eye blink conditioning, as demonstrated here. Furthermore, the ESD tool can be applied to other physiological areas of research in which changes in eye state are critical to analyzing neuronal activity.

© 2008 Elsevier B.V. All rights reserved.

∗ Corresponding author at: Department of Physiology, The Hebrew University – Hadassah Medical School, POB 12272, Jerusalem 91120, Israel. Tel.: +972 2 6757388; fax: +972 2 6439736. E-mail address: [email protected] (R. Mitelman).
1 These authors contributed equally.

1. Introduction

Vision is the main sense by which primates (both human and nonhuman) perceive the world. Unlike other senses, visual input can be completely blocked at the level of the sensory organ by the eyelid. Therefore, understanding any neuronal activity involving the visual system requires an accurate recording of the state of the eyelids, i.e., whether they are open or closed. Furthermore, detecting the state of the eyelid is crucial for monitoring motor output during eyeblink conditioning (Marquis and Hilgard, 1937). Finally, detection of eyeblink enables the study of the natural frequency of blinking, which is altered in different pathological states such as schizophrenia or Parkinson’s disease (Ponder and Kennedy, 1927; Stevens, 1978; Karson, 1983).

Several methods have been suggested for detection of the eye state of primates. One useful technique is electromyography (EMG) of the orbicularis oculi, the main muscle that is involved in blinking movement, and detecting its activation during eye closure (Silverstein et al., 1978; Blazquez et al., 2002). Another more direct method attaches the eyelid by a wire to a microtorque potentiometer that can measure its movements (Pennypacker et al., 1966). These methods are somewhat invasive, and it is unclear how these devices influence the natural movements of the eyelid.
Less invasive ways include connecting an electromagnetic search coil to the eyelid (Robinson, 1963; Porter et al., 1993). Here, a wire coil is secured to the upper eyelid of the animal, and placed in a weak magnetic field. This generates a current in the coil that is proportional to the angular velocity of the eyelid, thus enabling detection of changes in the state of the eye. Another noninvasive method uses an infrared light-emitting diode (LED) and a photo sensor (Thompson et al., 1994; Clark and Zola, 1998). However, this method requires placing the detector at a distance of 4–5 mm from the animal’s eye, which may block significant parts of its field of view. These methods may be irritating to the primates, and therefore could influence their behavior.
The least invasive method that has been used by researchers is direct detection by a human observer. This is usually done offline, after videotaping the animal’s behavior (e.g., Nevet et al., 2004). However this method is very cumbersome and time consuming, and therefore is not feasible for processing large amounts of data. Furthermore, human observers are prone to mistakes when asked to classify long video sequences and may be biased by their a priori expectations.
doi:10.1016/j.jneumeth.2008.12.007
Several automatic visual analysis based methods have been suggested for eye state detection in other mammals. In humans there are several algorithms (Tian et al., 2000; Miyakawa et al., 2004; Benoit and Caplier, 2005; Tan and Zhang, 2006; Heishman and Duric, 2007), but they are rather complex, and do not take into account some of the differences between human and non-human primates (e.g. the difference in the sclera’s relative size).
Moreover, these algorithms are primarily designed for non-scientific goals such as driver fatigue detection, and are intended to achieve impressive stability under unsupervised circumstances. On the other hand, they do not take advantage of the typical primate physiological recording setting, and fall below the performance level of human observers. A system that was suggested for use in rabbits (Bracha et al., 2003) has the disadvantage of requiring markers to be attached to the upper and lower eyelids of the animal, and is therefore less suitable for the daily repeated recording sessions that are typical of physiological studies of awake behaving primates.
In this manuscript we suggest a simple, noninvasive and inexpensive video-based method to detect the state of the eye of primates under head immobilization conditions. The system takes advantage of the typical setting of primate physiological experiments, and relies on the fact that the position of the eyes changes only minimally during a recording session. The video camera can be positioned at a distance from the monkey (depending on its zoom properties) and therefore does not obscure the visual field and does not modify natural blinking behavior. The method is also highly accurate, with a performance level equivalent to that of a human observer (a mean normalized error of 0.15%). Furthermore, since this method works with infrared videotaping, the eye state can be detected in a dark environment.
2. Materials and methods
The tool we describe in this paper includes standard hardware and simple custom-made software. We present the hardware we used in the experiments, and the way we chose to implement the algorithm, although any equivalent hardware and software implementation can be employed.
2.1. Physical setup and data acquisition
All experimental protocols were performed in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and with Hebrew University guidelines for the use and care of laboratory animals in research, supervised by the institutional animal care and use committee. Briefly, monkeys went through an operation during which a head holder and a recording chamber were attached to their head. During recording sessions, the monkeys’ heads were immobilized and microelectrodes were advanced into different targets in the basal ganglia.
A standard infrared digital surveillance camera was used to digitally record the monkey’s facial movements (AVer-s 2.54, AverMedia Systems, Taipei, Taiwan). The recording was done in an interlaced mode, with a sampling rate of 25 frames per second (PAL mode). In the interlaced mode, each frame is composed of two separately sampled fields: one occupying the even rows and the other the odd ones, without smoothing them. Movies were saved in AVI format in 640 × 480 pixel resolution, with a grayscale color depth of 8 bits (i.e., 256 levels of infrared brightness).
All data analysis was done in Matlab (Version 7.5, R2007b, The MathWorks). Movies or single frames were easily imported to Matlab, such that each frame is a single brightness matrix and an entire movie is a hypermatrix. To improve performance, importing was done in blocks of a few dozen frames. Each frame was de-interlaced to its two fields, and missing lines were interpolated by averaging each adjacent pair of existing lines. This recreated a full-sized image and doubled the sampling rate to 50 images per second.
Fig. 1. Example of density histograms of the brightness of a closed and open eye. (a and b) Randomly chosen open (a) and closed (b) eye field. Pixels in two ranges of brightness are color marked, and the original eye fields are shown in the inset. The darker hue, marked in blue, is seen specifically in the pupil, and the intermediate hue, marked in green, is seen mostly in the iris. Scale indicates 10 pixels horizontally and vertically. (c and d) Brightness histograms of the corresponding pictures. The two peaks were manually marked, and the color codes are as in a and b. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
2.2. Algorithm description
The eye state detection (ESD) algorithm is based on the difference in light absorbance between the eyelid and the eye itself, i.e., the pupil and the iris. Unlike humans, non-human primates have a relatively small sclera, so the pupil and the iris occupy most of the eye opening space. Visible light, as well as infrared radiation, is absorbed by the pupil and the iris considerably more than by the eyelid (Durkin et al., 1990; Thompson et al., 1994). As a result, in an open eye image, a certain number of pixels are dramatically darker than all other pixels, whereas in a closed eye image there are hardly any such dark pixels. This can be seen in Fig. 1, which plots the brightness histogram of the area of the eye. The brightness histogram of a typical open eye has two peaks that do not appear in the unimodal brightness histogram of the closed eye. The darkest peak originates from the high light absorbance of the pupil, and the second dim peak from the iris. As outlined below, proper detection of these peaks is enough for correct automatic classification of the state of the eye.
The ESD algorithm is semi-automatic and requires three quick human decisions. First, the user is asked to indicate the location of the monkey’s eye (termed “eye field”) in an arbitrarily chosen frame, by marking two opposite corners of a rectangle (Fig. 2a).
Since the head was immobilized during the experiment reported here, this rectangle only needed to be marked once per experimental day. The next step is training the algorithm. Eye fields from the video are chosen randomly by the algorithm, and are presented to the user. The user classifies the state of the eye in each eye field as open, closed, or inconclusive (Fig. 2b). This step is completed when the user determines that enough eye fields of both conclusive states have been classified (usually about 20 fields in total). Most videos contain more open eye fields than closed ones and a similar ratio is therefore found in the training set. The last step is to set the thresholds for the classification: the brightness threshold and the eye state threshold. This is done by pooling the eye field matrices for each conclusive state. This yields two brightness histograms, one for the open and one for the closed eye (Fig. 2c). The brightness histogram of the open eye fields consistently includes two peaks of darker pixels that fail to appear in the brightness histogram of the closed eye. The user is asked to set a brightness threshold that includes the maximum area of these peaks and the minimum area of the closed eye histogram (the dashed line in Fig. 2c). All pixels darker than this threshold are considered “black” for the following stage of the algorithm. The ESD algorithm calculates the number of black pixels in each eye field in the training set. The closed eye field with the maximal number of black pixels and the open eye field with the minimal number of black pixels are defined as ‘anchors’. The average number of black pixels in the two anchors is set as the eye state threshold. Taking the midpoint of the anchors as a threshold yields optimal separation in the training set, in the sense of minimizing the generalization error.
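The threshold calculation described above can be sketched in a few lines. This is a minimal illustration in Python rather than the authors' Matlab implementation; the function names, the strict "darker than" comparison and the toy 4 × 4 eye fields are assumptions made for the example only.

```python
def count_black_pixels(eye_field, brightness_threshold):
    """Count pixels strictly darker than the user-chosen brightness threshold."""
    return sum(1 for row in eye_field for pixel in row
               if pixel < brightness_threshold)


def eye_state_threshold(open_fields, closed_fields, brightness_threshold):
    """Midpoint between the two 'anchors' of the labeled training set."""
    if not open_fields or not closed_fields:
        raise ValueError("need at least one open and one closed eye field")
    open_counts = [count_black_pixels(f, brightness_threshold) for f in open_fields]
    closed_counts = [count_black_pixels(f, brightness_threshold) for f in closed_fields]
    open_anchor = min(open_counts)      # open field with the fewest black pixels
    closed_anchor = max(closed_counts)  # closed field with the most black pixels
    return (open_anchor + closed_anchor) / 2.0


def make_field(n_dark, rows=4, cols=4, dark=20, bright=200):
    """Hypothetical toy eye field (8-bit brightness) with n_dark dark pixels."""
    flat = [dark] * n_dark + [bright] * (rows * cols - n_dark)
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Toy training set: open eyes show 6 and 4 dark pixels, closed eyes 0 and 1.
open_fields = [make_field(6), make_field(4)]
closed_fields = [make_field(0), make_field(1)]
threshold = eye_state_threshold(open_fields, closed_fields, brightness_threshold=60)
print(threshold)  # 2.5, the midpoint of the anchors 4 and 1
```

In this toy case the anchors are 4 (open) and 1 (closed) black pixels, so any field with three or more black pixels would fall on the open side of the threshold.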
Such a threshold is conceptually similar to the one-dimensional case in the support vector machine (SVM) classification algorithm (Cortes and Vapnik, 1995). The calculation of the eye state threshold completes the training stage, and the algorithm now has all the necessary data for classification of the entire day of the experiment. The entire training stage takes the user about 30 s, and contains all the human-based decision input to the process.
The eye state classification is obtained by calculating the number of black pixels for each eye field of the entire video sequence, based on the user’s chosen brightness threshold. Each eye field is then classified according to the eye state threshold: eye fields with more black points than the eye state threshold are classified “open”, and ones with fewer black points than the eye state threshold are classified “closed”. This classification is ‘hard’ in the sense that a decision is forced. Therefore, eye fields that could be perceived by a human observer as inconclusive are also classified according to the number of black pixels. Using our hardware (2.8 GHz Pentium with 2 GB of RAM), the classification of a 2-h video took about 20 min (Matlab m files can be found at http://basalganglia.huji.ac.il/assets/ESD.zip).
Fig. 2. The training stage of the eye state detection algorithm (ESD). (a) Arbitrarily chosen frame presented to the user, and the marked location of the eye. Scale indicates 50 pixels horizontally and vertically. (b) Three randomly chosen eye fields that were categorized by the user as open (I), closed (II) and inconclusive (III). Scale indicates 20 pixels horizontally and vertically. (c) Brightness histogram of the open (top) and closed (bottom) eye fields of the training set. The vertical dashed line is the brightness value that was chosen by the user as the threshold for the categorization of the entire video.
3. Results
3.1. Algorithm performance and stability
The eye state detection algorithm is based on two thresholds. As described above, the first is the brightness threshold, which determines how dark (on a scale of 0–255) a pixel needs to be so as to be considered black and is set by the user during the training stage. The second is the eye state threshold, which determines the state of the eye according to the number of black pixels, and is calculated automatically (based on the user’s classification during the training stage). Note that these two thresholds could cause performance instability in the algorithm, since they are influenced by the features of the training set and the user’s decisions. Therefore, we calculated the algorithm error as a function of these two thresholds. We chose a 20,000 field (400 s) video, and one of the experimenters (RM) classified its fields manually into open eye fields (88.0%), closed eye fields (8.9%) and inconclusive eye fields (3.1%). The inconclusive eye fields were discarded in this analysis, since the human observer performance was considered the gold standard.
Two types of error are possible: detecting a closed eye when it is open, and vice versa. We were interested in performance independent of the eye state statistics in the chosen video. Therefore we used a normalized error ($E_N$), which is identical to the probability of an error in the case of equiprobability of open and closed eyes:
\[ E_N = \frac{1}{2}\,\frac{P(\text{classify open},\ \text{closed eye})}{P(\text{closed eye})} + \frac{1}{2}\,\frac{P(\text{classify closed},\ \text{open eye})}{P(\text{open eye})} \]
By the definition of conditional probability, this is also the average of the conditioned probability of an error:
\[ E_N = \frac{1}{2}\left[ P(\text{classify open} \mid \text{closed eye}) + P(\text{classify closed} \mid \text{open eye}) \right] \]
Each run of the algorithm produces two values, as described above: a brightness threshold selected by the user, and an eye state threshold calculated by the algorithm according to the user’s open/closed classification in the training set. To show how the normalized error depends on these two values, Fig. 3 plots the normalized error as a function of them. The figure shows that a large range of these two thresholds results in a relatively low error; hence the algorithm’s performance is stable over a large range of thresholds.
Fig. 3. ESD algorithm error as a function of the two thresholds. Normalized error is shown (color coded) as a function of the two types of classification thresholds used in the ESD algorithm (brightness threshold and eye state threshold). Normalized error is the probability of a classification error, assuming equal probability of open and closed eye. Brightness threshold is the threshold which defines which pixels are dark enough and is determined by the user. The eye state threshold defines how many black pixels (as defined by the first threshold) are enough to determine that the eye is open, and is determined by the algorithm according to the human decisions on the training set. The green Xs denote brightness thresholds chosen by the user, and the corresponding eye state thresholds for different training sets. This was done by repeating the algorithm 50 times with a training set of 20 images chosen randomly. Although the eye state threshold has a relatively large variance, all repetitions produced a normalized error of 0–1.5%. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
Next, in order to verify that the thresholds generated during the training stage of the algorithm actually resulted in a low normalized error, we ran the semi-automatic algorithm 50 times with different randomly chosen training sets of 20 conclusive eye fields (i.e., open or closed) while applying the same open and closed eye statistics as in the entire video (i.e. 18 open eye fields and 2 closed). The resulting threshold pair of each run is plotted as an ‘x’ in Fig. 3. The brightness thresholds have a relatively low range of values, whereas the eye state thresholds have a higher range of values. This is due to the low variance in the brightness of the iris and the pupil, in comparison to a higher variance in the number of pupil and iris pixels. Nevertheless, the number of classification errors for this range of thresholds falls within the span of a relatively small error. The normalized classification error had a mean of 0.15%, and a maximal value of 1.4%.
Fig. 4. ESD algorithm performs more accurately than the SVM algorithm and needs a much smaller training set. The median of the normalized error of the ESD algorithm (continuous), and the SVM algorithm (dashed) ± the median absolute deviation (gray shadow), calculated following 10 repeats per training set size. The training sets were chosen randomly, while keeping a constant ratio of closed and open eye fields. SVM fails to match the performance of the ESD algorithm for median performance, variability and dependence on training size.
Next, to assess the semi-automatic algorithm’s dependence on the size of the training set, we trained it on different training set sizes, ranging from two to thirty eye fields. Again, this was done by forcing the statistics of the entire video on the training sets (while keeping at least one eye field of each conclusive type). This was repeated 10 times for each training set size. The median ± median absolute deviation of the classification’s normalized error is plotted in Fig. 4 (continuous line). This shows an impressively low normalized error median even on a training set with a single eye field for each category, and a negligible error with a training set as small as 8 eye fields.
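The normalized error used in these evaluations can be illustrated with a short sketch. This is a hedged Python example rather than the authors' Matlab code; the function name and the string labels "open", "closed" and "inconclusive" are assumptions chosen for the illustration, and the label lists are invented toy data.

```python
def normalized_error(true_states, predicted_states):
    """E_N = 0.5 * [P(classify open | closed eye) + P(classify closed | open eye)].

    Fields whose true label is neither "open" nor "closed" (for example
    "inconclusive") are discarded, as in the analysis described in the text.
    """
    n_open = n_closed = 0
    open_errors = closed_errors = 0
    for truth, predicted in zip(true_states, predicted_states):
        if truth == "open":
            n_open += 1
            open_errors += predicted != "open"
        elif truth == "closed":
            n_closed += 1
            closed_errors += predicted != "closed"
    if n_open == 0 or n_closed == 0:
        raise ValueError("need at least one open and one closed eye field")
    return 0.5 * (closed_errors / n_closed + open_errors / n_open)


# Toy data: 8 open fields, 2 closed fields, 1 inconclusive field (ignored).
truth = ["open"] * 8 + ["closed"] * 2 + ["inconclusive"]
predicted = ["open"] * 7 + ["closed"] + ["closed", "open"] + ["open"]
error = normalized_error(truth, predicted)
print(error)  # 0.3125 = 0.5 * (1/2 + 1/8)
```

In this toy case two of the ten conclusive fields are misclassified, a raw error rate of 20%, yet the normalized error is 31.25% because the single mistake on the rare closed class is weighted as heavily as the single mistake on the common open class. This is the sense in which the measure is independent of the eye state statistics of a particular video.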
To demonstrate the robustness of the ESD algorithm, we compared its performance to the SVM algorithm, which is an optimal linear classification algorithm, in the sense of minimizing generalization error. Briefly, this algorithm finds the linear classifier (a hyperplane in n-dimensional space) that separates two clusters with maximal margins (Cortes and Vapnik, 1995). We reshaped the eye field matrices to vectors, and used randomly chosen training sets of these vectors to train the SVM algorithm. We used the linear classifier to classify the entire video, and calculated the normalized error (identically to the normalized error of the ESD algorithm). This was repeated with different training set sizes, 10 times per size, and we obtained the median ± median absolute deviation (Fig. 4, dashed line). This shows that the ESD algorithm performs considerably better than the SVM algorithm. Although the difference decreases with the increase in training set size, it remains noticeable even in larger (n = 100) training sets (data not shown). Furthermore, the ESD algorithm emerges as more reliable, since the deviation around the median error is smaller than the deviation obtained in SVM (Fig. 4, gray shadow).
Another user-selected parameter which could be an additional source of error is the location and size of the area of the eye (eye field). To test the ESD algorithm’s stability for different eye field sizes, we compared the algorithm’s performance on a randomly chosen training set. Fig. 5 depicts the normalized error as a function of the eye field’s rectangular size, measured by the length of the diagonal (in pixels). The ESD algorithm maintained its level of performance for a large range of sizes, with a normalized error of less than 10% even for a large rectangle that occupied almost the entire monkey’s face. Furthermore, a low level of error was maintained for a relatively small rectangle.
Fig. 5. ESD algorithm error as a function of the size of the rectangle which marked the eye location. (a) Normalized error as a function of the size of the eye field (in pixels on the diagonal). The algorithm was trained with the same training set, but with different sizes of eye rectangles. The normalized error of each eye rectangle size is shown as a single point on the curve. (b) An arbitrary field, with different sizes of marked rectangles. The rectangles correspond to the same color points as in (a). There is a broad area between the green and blue frames in which the error is zero, and a broader area between the green point and the cyan in which the error is less than 5%. The error only becomes considerable beyond this range. Scale indicates 50 pixels horizontally and vertically. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
Finally, surveillance cameras are sensitive to visible light; hence changes in the luminance might affect the tool’s performance. Luminance can go through high frequency modulations, e.g., because the camera’s sampling and the refresh rate of the video screen (where visual stimuli were presented to the monkey) were not synchronized. Additionally, transient luminance changes may occur, e.g. due to opening of the recording chamber cover. To test the stability of the ESD algorithm as regards changes of luminance, we calculated the average brightness of the entire image (while omitting the area of the frame showing the time, see Fig. 2a bottom right) for each field. This is plotted in Fig. 6a, and reveals high frequency changes in luminance, as well as a transient change around 200 s. We manually split the video into brighter and darker periods, which are gray color coded in the figure. We ran the ESD algorithm 50 times with a training set of 20 fields, and calculated the average error per field. Fig. 6b shows the average normalized error in each of the video streams ± the standard deviation. The difference between the two error values was not significant (Student’s t-test, p = 0.19), which indicates that the open/closed classification error is negligibly affected by the general luminance.
Fig. 6. ESD algorithm error as a function of general scene luminance. (a) Luminance in the entire scene as a function of time. Luminance was calculated by taking the average brightness value in each field during the 400-s video (omitting the area of the video image presenting the time). A striking increase in luminance can be seen around 200 s (due to opening of the recording chamber cover). This was manually marked and the video was split into a darker, early period (marked in black) and a lighter, late period (marked in gray). (b) The average normalized error of the darker and the lighter periods is presented (same color code as in a). The error was calculated by repeating the ESD algorithm 50 times with a training set of 20 eye fields. The error bars represent the standard deviation of the normalized error. Both periods present a very low error level, with no significant difference (Student’s t-test, p = 0.19), indicating the low dependence of ESD performance on general scene luminance.
3.2. Possible applications of the ESD algorithm
Eye state classification has potential for a wide variety of applications. We used the system on a delayed probabilistic classical conditioning task (Joshua et al., 2008). In this task, the monkey was repeatedly presented with one of a set of visual stimuli, each predicting an outcome with a different probability. Three stimuli predicted the administration of liquid food and three predicted the delivery of an airpuff with the same probabilities. Fig. 7a shows the percentage of the trials in which the eye is closed, in 20 ms bins, with respect to the administration of the outcome (food/airpuff). This indicates that the monkey closes its eyes to airpuffs but not to food. Furthermore, the monkey indeed learned to distinguish between these stimuli, as can be seen by the timing of the response which preceded the airpuff itself (Fig. 7a). The algorithm provides an elegant way of showing that the monkey’s eye state has an increasingly higher probability of being closed as the time of the airpuff approaches.
We further used the algorithm to assess the blinking response to apomorphine (Apo) induced dyskinesias. Systemic injection of Apo, an ultra-fast dopamine agonist, induces orofacial dyskinesias which are known to include higher blinking rates (Blin et al., 1990; Kleven and Koek, 1996; Nevet et al., 2004). This is usually measured by human observers who count the number of blinks, a method which is prone to error and bias. Fig. 7b depicts the blinking rate of a monkey after intramuscular injection of 0.1 mg/kg Apo HCl 1%, as measured by the ESD algorithm. Closed eye events that lasted less than a second were defined as a blink. The blinks were counted in 1 min bins, and then smoothed with a Gaussian window with a standard deviation of 1 bin. As described in previous studies (Nevet et al., 2004, Fig. 1c), the blinking rate increased with the Apo administration, and remained so for at least 25 min.
4. Discussion
This manuscript describes a fast, simple, inexpensive and noninvasive tool for eye state detection during electrophysiological studies of primates. It is adapted to perform optimally in the typical setting of primate physiological studies; e.g., head fixation (Lemon, 1984). This type of tool is valuable for many primate studies, and can be easily adapted to most setups, since the only hardware it requires is a digital surveillance camera. The use of such a camera, which is sensitive to the infrared wavelength and has its own source of infrared radiation, makes it possible to detect eye state under different illumination conditions, including darkness. Furthermore, using a digital camera rather than an analog device makes the eye state detection less vulnerable to electrical noise. The temporal resolution of these cameras is usually 25–30 frames per second, which can be doubled by de-interlacing. This makes the tool suitable for detection of non-human primate blinks (Baker et al., 2002), although cases of blinks with a fractional closure of the eyelid (Rambold et al., 2005) might be missed. For higher temporal resolution, a faster camera is needed, rendering the system more expensive, but requiring no changes in the algorithm.
Correct timing of the electrophysiological recording and the eye state calls for accurate synchronization between the video and the electrophysiological recording. In our setup this was done by feeding a time signal from the recording system into the digital video, presenting it on the bottom left corner (e.g., Fig. 2a), and detecting it offline. Other synchronization signals, such as an auditory signal for the camera, can be used as well.
Although detection of the state of a single eye was sufficient for our purposes, the algorithm could be easily adapted to detect the states of both eyes by marking their location and possibly detecting the two thresholds separately. This would also require proper positioning of the camera, such that both eyes are clearly visible.
Both ESD and SVM algorithms are supervised learning binary classifiers, but ESD performs considerably better than the SVM algorithm in detecting the eye state, although SVM is considered a very robust linear classifier. This is probably because the ESD algorithm makes assumptions regarding the data, whereas SVM does not. The ESD assumes that the number of pixels in the eye field that are sufficiently dark is a strong enough rule for the detection of the state of the eye. Naturally, when this assumption does not hold, ESD will perform worse than SVM. Indeed, the ESD algorithm’s errors occur mostly when there are fewer dark pixels, e.g. when the eye is turned away from the camera, or the eyelid is partially closed. This also accounts for the increase in error with the decrease in eye field size (Fig. 5a). Nevertheless, the ESD algorithm maintains a low level of error for a relatively small rectangle. This suggests that the algorithm is stable to physiological changes in the angle of the eye in which smaller areas of the pupil and iris are visible.
The ESD algorithm is also stable to changes in general scene luminance. This is because the iris, and even more so the pupil, have a high light absorbance, and therefore there are enough dark pixels in an open eye image, even with greater light intensity.
This semi-automatic procedure was found to be adequate for our needs, because it was highly accurate and required very little training time. Therefore, we did not find it necessary to make it fully automatic. However, fully automatic algorithms that detect the location of the eye for a moving human face have been reported by other researchers (e.g., Craw et al., 1992), and could be adapted to our tool. This may enable using the tool for experiments that do not require a head restraint; e.g., with chronically implanted electrodes (Nordhausen et al., 1996; Jackson and Fetz, 2007).
Fig. 7. Possible applications of the eye state detection tool. (a) Eye closure to airpuff. The monkey was presented with a reward (liquid food) or an aversive stimulus (airpuff) at t = 0, after a reward- or aversion-predicting visual stimulus. The percentage of times that the animal kept the eye closed in 20 ms bins is plotted ± variance (shaded). The monkey closed its eyes in anticipation of and following the aversive stimulus, but not for the reward. (b) Apomorphine-induced dyskinesia increases blinking frequency. Blinking rate was measured by counting eye closures lasting less than a second, before and after the apomorphine injection (at t = 0). The dashed line is the raw blinking rate per minute and the continuous line is the blinking rate after smoothing with a Gaussian window of one bin (1 min).
Finally, there is continual interest in the effect of arousal levels on neural activity (e.g., Steriade and McCarley, 2005). Eye state detection, supported by analysis of eye movements, EEG and EMG, provides an accurate estimation of arousal state. Moreover, it provides a reliable estimation of blinking rate, which is affected by many physiological and pathological processes. Thus overall, our tool provides a reliable, noninvasive and inexpensive method for detection of eye open/closed states, and is therefore a recommended add-on for primate electrophysiological setups.
Acknowledgement
This work was partly supported by a Hebrew University Netherlands Association grant entitled “Fighting against Parkinson”, and the Harry and Sylvia Hoffman leadership and responsibility program. We would like to thank E. Singer for language editing.
References
Baker RS, Radmanesh SM, Abell KM. The effect of apomorphine on blink kinematics in subhuman primates with and without facial nerve palsy. Invest Ophthalmol Vis Sci 2002;43:2933–8.
Benoit A, Caplier A. Hypovigilence analysis: open or closed eye or mouth? Blinking or yawning frequency? In: IEEE conference on advanced video and signal based surveillance, AVSS; 2005. p. 207–12.
Blazquez PM, Fujii N, Kojima J, Graybiel AM. A network representation of response probability in the striatum. Neuron 2002;33(6):973–82.
Blin O, Masson G, Azulay JP, Fondarai J, Serratrice G. Apomorphine-induced blinking and yawning in healthy volunteers. Br J Clin Pharmacol 1990;30:769–73.
Bracha V, Nilaweera W, Zenitsky G, Irwin K. Video recording system for the measurement of eyelid movements during classical conditioning of the eyeblink response in the rabbit. J Neurosci Methods 2003;125:173–81.
Clark RE, Zola S. Trace eyeblink classical conditioning in the monkey: a nonsurgical method and behavioral analysis. Behav Neurosci 1998;112:1062–8.
Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20:273–97.
Craw I, Tock D, Bennett A. Finding face features. In: Goos G, Hartmanis J, editors. Computer vision – ECCV’92. Berlin: Springer; 1992. p. 92–6.
Durkin M, Prescott L, Jonet CJ, Frank E, Niggel M, Powell DA. Photoresistive measurement of the Pavlovian conditioned eyelid response in human subjects. Psychophysiology 1990;27:599–603.
Heishman R, Duric Z. Using image flow to detect eye blinks in color videos. In: IEEE workshop on applications of computer vision, WACV; 2007. p. 52.
Jackson A, Fetz EE. Compact movable microwire array for long-term chronic unit recording in cerebral cortex of primates. J Neurophysiol 2007;98:3109–18.
Joshua M, Adler A, Mitelman R, Vaadia E, Bergman H. Midbrain dopaminergic neurons and striatal cholinergic interneurons encode the difference between reward and aversive events at different epochs of probabilistic classical conditioning trials. J Neurosci 2008;28:11673–84.
Karson CN. Spontaneous eye-blink rates and dopaminergic systems. Brain 1983;106(Pt 3):643–53.
Kleven MS, Koek W. Differential effects of direct and indirect dopamine agonists on eye blink rate in cynomolgus monkeys. J Pharmacol Exp Ther 1996;279:1211–9.
Lemon RN. Methods for neuronal recording in conscious animals. In: IBRO handbook series: methods in neurosciences. London: Wiley; 1984.
Marquis DG, Hilgard ER. Conditioned responses to light in monkeys after removal of the occipital lobes. Brain 1937;60:1–12.
Miyakawa T, Takano H, Nakamura K. Development of non-contact real-time blink detection system for doze alarm. In: SICE Annual Conference; 2004. p. 1626–31.
Nevet A, Morris G, Saban G, Fainstein N, Bergman H. Discharge rate of substantia nigra pars reticulata neurons is reduced in non-parkinsonian monkeys with apomorphine-induced orofacial dyskinesia. J Neurophysiol 2004;92:1973–81.
Nordhausen CT, Maynard EM, Normann RA. Single unit recording capabilities of a 100 microelectrode array. Brain Res 1996;726:129–40.
Pennypacker HS, King FA, Achenbach KE, Roberts L. An apparatus and procedure for conditioning the eye-blink reflex in the squirrel monkey. J Exp Anal Behav 1966;9:601–4.
Ponder E, Kennedy WP. On the act of blinking. Q J Exp Physiol 1927;18:89–110.
Porter JD, Stava MW, Gaddie IB, Baker RS. Quantitative analysis of eyelid movement metrics reveals the highly stereotyped nature of monkey blinks. Brain Res 1993;609:159–66.
Rambold H, El Baz I, Helmchen C. Blink effects on ongoing smooth pursuit eye movements in humans. Exp Brain Res 2005;161:11–26.
Robinson DA. A method of measuring eye movement using a scleral search coil in a magnetic field. IEEE Trans Biomed Eng 1963;10:137–45.
Silverstein LD, Graham FK. Eyeblink EMG: a miniature eyelid electrode for recording from orbicularis oculi. Psychophysiology 1978;15:377–9.
Steriade M, McCarley R. Brain control of wakefulness and sleep. 2nd ed. New York: Springer; 2005.
Stevens JR. Eye blink and schizophrenia: psychosis or tardive dyskinesia? Am J Psychiatry 1978;135:223–6.
Tan H, Zhang YJ. Detecting eye blink states by tracking iris and eyelids. Pattern Recog Lett 2006;27:667–75.
Thompson LT, Moyer JR, Akase E, Disterhoft JF. A system for quantitative analysis of associative learning. Part 1. Hardware interfaces with cross-species applications. J Neurosci Methods 1994;54:109–17.
Tian Yl, Kanade T, Cohn J. Eye-state action unit detection by gabor wavelets. In: Advances in multimodal interfaces – ICMI 2000. Berlin: Springer; 2000. p. 143–50.
responses to reward expectation points to a difference between the reward and punishment systems in the brain, and this may be the neural basis for the differences between these systems that are observed in human behavior.
Abstract
The basal ganglia are neural structures within the motor, cognitive and emotional control mechanisms. Recent experimental studies and theoretical work describe the basal ganglia as a system that implements reinforcement learning. These studies proposed that the neural activity of the basal ganglia enables maximization of future rewards by controlling the environment. In particular, studies have shown that midbrain dopaminergic neurons respond with an increase in firing rate when the animal's state is better than expected (a positive surprise). This neural signal matches the error signal obtained in reinforcement learning. However, the low baseline firing rate of these neurons limits their ability to encode negative events by decreasing their firing rate.
The limitations of the dopaminergic neurons' response led me to examine two possibilities. The first is that activity in the basal ganglia encodes both positive and negative values. The second is that only positive values are encoded in the basal ganglia, and negative values are encoded by other neural structures.
To distinguish between these possibilities, I trained two monkeys on a probabilistic classical conditioning task. In each trial the monkey was shown an image, after which it could receive food (reward), an airpuff (punishment), or nothing. During the task I recorded neural activity from neurons in five different regions of the basal ganglia of behaving monkeys. Activity was recorded both from the neuromodulators (midbrain dopaminergic neurons and striatal cholinergic interneurons) and from GABA-releasing neurons (striatum, the internal and external segments of the globus pallidus, and the substantia nigra pars reticulata) in the main axis of the basal ganglia.
Licking and blinking during image presentation showed that the monkeys expected both the reward and the punishment; however, in all the neuronal populations I recorded from, activity encoded the expectation of food but not the expectation of the airpuff.
בנוסף השוויתי את מאפייני התגובה של התאים המודולטוריים ושל תאים מפרישי ה GABAומצאתי שבעוד שלתאים המודולטוריים יש תגובה מהירה ואחידה התגובות של התאים מפרישי ה GABAהיו ארוכות מגוונות שכללו הן עליות והורדות קצב. ניתוח המתאם בין הפעילות של תאים דופמינריגים שנרשמו בו זמנית הראה עליה מהירה בתיאום לאחר אירועים הקשורים לתגמול אך לא לאחר אירועים הקשורים לעונש .העלייה במתאם הפעילות לא שיקפה באופן ישיר את שינויי הקצב של התאים .הדמיה ממוחשבת הראתה שייתכן והעלייה במתאם הפעילות של התאים הדופאמינרגים מספקת מנגנון נוסף לשליטה על כמויות הדופמין בסטריאטום מעבר לשליטה המתאפשרת בעזרת שינויי קצב ודפוס תגובה. לסיכום ההבדל בין התגובות של תת האוכלוסיות של הגרעינים הבזליים )מודולטורים לעומת תאים מפרישי ה (GABAמראה שלאוכלוסיות הללו תפקידים שונים בלמידת חיזוקים שבו האוכלוסיות המודולטוריות מספקות לציר המרכזי אות חד מימדי .ההבדל בין ההתנהגות שבה מצאתי הן תגובות הן לצפייה לעונש והן תגובות לצפייה לתגמול לבין הפעילות התאית בגרעינים הבזליים שבה מצאתי בעיקר עבודה זו נעשתה בהדרכתו של פרופ' חגי ברגמן תפקיד הגרעינים הבזליים בלמידת חיזוקים חיבור לשם קבלת תואר דוקטור לפילוסופיה מאת מתי יהושע הוגש לסינט האוניברסיטה העברית בשנת תשס"ט מרץ 2009