Download decision-making in the primate brain

J Comp Physiol A (2005) 191: 201–211 DOI 10.1007/s00359-004-0565-9 R EV IE W Allison N. McCoy Æ Michael L. Platt Expectations and outcomes: decision-making in the primate brain Received: 2 March 2004 / Revised: 28 July 2004 / Accepted: 12 August 2004 / Published online: 12 October 2004 Springer-Verlag 2004 Abstract Success in a constantly changing environment requires that decision-making strategies be updated as reward contingencies change. How this is accomplished by the nervous system has, until recently, remained a profound mystery. New studies coupling economic theory with neurophysiological techniques have revealed the explicit representation of behavioral value. Specifically, when fluid reinforcement is paired with visually-guided eye movements, neurons in parietal cortex, prefrontal cortex, the basal ganglia, and superior colliculus—all nodes in a network linking visual stimulation with the generation of oculomotor behavior—encode the expected value of targets lying within their response fields. Other brain areas have been implicated in the processing of reward-related information in the abstract: midbrain dopaminergic neurons, for instance, signal an error in reward prediction. Still other brain areas link information about reward to the selection and performance of specific actions in order for behavior to adapt to changing environmental exigencies. Neurons in posterior cingulate cortex have been shown to carry signals related to both reward outcomes and oculomotor behavior, suggesting that they participate in updating estimates of orienting value. A. N. McCoy Æ M. L. Platt (&) Department of Neurobiology, Duke University Medical Center, 325 Bryan Research Building, Box 3209, Durham, NC 27710, USA E-mail: [email protected] M. L. Platt Center for Cognitive Neuroscience, Duke University, Durham, NC 27710, USA M. L. Platt Department of Biological Anthropology and Anatomy, Duke University, Durham, NC 27710, USA Introduction For even the simplest of organisms, adaptive behavior requires, to paraphrase Darwin, the preservation of favorable variants and the rejection of injurious ones. In other words, nervous systems must weigh the potential rewards and punishments associated with available options and select the behavior likely to yield the best possible outcome. This process of adaptive decisionmaking is well illustrated by the example of a financial advisor choosing whether to buy, sell or hold portions of stock based on calculations of the most likely monetary gain. In the face of sudden change, such as a stock market crash, the value associated with available options is rapidly updated. In this way, the decision process allows behavior to adjust to environmental flux, resulting in a net benefit to the organism. While the analogy of the financial advisor captures the iterative, adaptive nature of the decision process, how does the brain actually select an action from the behavioral repertoire? This is a question that has long puzzled philosophers, economists, behavioral ecologists, and neuroscientists. In fact, significant advancement in our understanding of the neural basis of decision-making has been made through a combination of economic theory and neurophysiological techniques (see Glimcher 2003, for exposition). This review describes current understanding of the neural mechanisms underlying decision-making, focusing on evidence for the neural representation, computation and revision of economic decision variables used to guide visual orienting in primates. Behavioral economics: expected value and decision-making In neurobiology, as in psychology, the sensory-motor reflex has been wide viewed as a useful model for directly linking sensation to action (Pavlov 1927; Sherrington 1906). As instructive as it has been, however, the simple 202 sensory-motor reflex can only begin to approximate the rich ethology of real-world behavior (Glimcher 2003; Herrnstein 1997; Platt 2002). Only recently have neurobiologists begun to investigate the idea that internal decision variables dynamically link purely sensory and purely motor processes and are explicitly represented in the nervous system (Basso and Wurtz 1997; Dorris and Munoz 1998; Kim and Shadlen 1999; Platt and Glimcher 1999; Shadlen and Newsome 1996, 2001). In contrast, over the past 400 years, economists have developed simple normative models to describe what rational agents should do when confronted with a choice between two options [Arnauld and Nichole 1662; Bernouilli (1758) in Speiser 1982]. The simplest economic model of decision-making, known as Expected Value Theory, posits that rational agents compute the likelihood that a particular action will yield a gain or loss, as well as the amount of gain or loss that can be expected from that choice (Arnauld and Nichole 1662). These values are then multiplied to arrive at an estimate of expected value for each possible course of action, and the option with the highest expected value is chosen (Arnauld and Nichole 1662). Maximizing expected value is the optimal algorithm for a rational chooser with complete information about the costs and benefits of different options, as well as their likelihood of occurrence. Expected value models, and variations on them, i.e., Expected Utility Theory [Bernouilli (1758) in Speiser (1982)], have been shown to be very good descriptors of the choices both people and animals make in a variety of simple situations (Stephens and Krebs 1986; Herrnstein 1997; Camerer 2003; Glimcher 2003). Expected value models of decision-making assume that rational choosers have access to information about the probability and amount of gain that can be expected from each action. Behavioral observations have repeatedly demonstrated that humans and animals are exquisitely sensitive to the expected value of available options (Herrnstein 1961; Stephens and Krebs 1986; Glimcher 2003). This suggests that nervous systems somehow represent information about the estimated costs and benefits of potential behaviors and use that information to dynamically link sensation to action. New studies have begun to shed light on this process. These studies suggest that neurons representing different eye movements compete to reach a firing rate threshold for movement initiation (Hanes and Schall 1996; Roitman and Shadlen 2002). The activity of these neurons is systematically enhanced by both sensory information favoring a particular eye movement (Shadlen and Newsome 1996, 2001; Roitman and Shadlen 2002) and increases in the expected value of that eye movement (Kawagoe et al. 1998; Leon and Shadlen 1999; Platt and Glimcher 1999; Coe et al. 2002). These observations intimate that the eye movement decision process may be instantiated rather simply by scaling the activity of movement-related neurons by the expected value of the eye movements they encode. Some of the important evidence supporting this model is described further below. Expected value and primate parietal cortex In an explicit application of economic theory to experimental neurophysiology, Platt and Glimcher (1999) manipulated the expected value of saccadic eye movements while monitoring the activity of neurons in the lateral intraparietal area (LIP), a subregion of primate posterior parietal cortex thought to intervene between visual sensation and eye movements (Fig. 1; Gnadt and Andersen 1988; Goldberg et al. 1990, 1998). Prior studies had suggested that the responses of LIP neurons were sensitive to the behavioral relevance of visual targets for guiding attention or gaze shifts (Gnadt and Andersen 1988; Colby et al. 1996; Platt and Glimcher 1997) as well as the strength of perceptual evidence favoring the production of a particular eye movement (Shadlen and Newsome 1996, 2001). These observations suggested that LIP neurons play a role in the oculomotor decision-making process (Platt and Glimcher 1999; Shadlen and Newsome 1996). In the first set of experiments designed to test this hypothesis (Platt and Glimcher 1999), monkeys performed cued saccade trials, in which a change in the color of a central fixation light instructed subjects to Fig. 1 A network for oculomotor decision-making. Medial and lateral views of the macaque monkey brain illustrating major pathways carrying signals related to saccadic eye movements (green) and reward (blue). Note: many of these pathways are bidirectional but have been simplified for presentation. AMYG amygdala, CGp posterior cingulate cortex, CGa anterior cingulate cortex, LIP lateral intraparietal cortex, SEF supplementary eye fields, FEF frontal eye fields, PFC prefrontal cortex, OFC orbitofrontal cortex, SC superior colliculus, NAC nucleus accumbens, SNr substantia nigra pars reticulata, SNc substantia nigra pars compacta, VTA ventral tegmental area. After (Platt et al. 2004) 203 shift gaze to one of two possible target locations in order to receive a fruit juice reward. A change from yellow to red indicated that looking to the upper target would be rewarded while a change from yellow to green indicated that the lower target would be rewarded. In successive blocks of trials, either the volume of juice delivered for each eye movement response or the probability that each possible movement would be cued and therefore reinforced was varied. The investigators found that the activity of LIP neurons was systematically modulated by both the size of the reward a monkey could expect to receive and the probability that a particular gaze shift would be reinforced—the two variables needed to compute the expect value of the movement. Intriguingly, the effects of expected value on neuronal firing rates were maximal early on each trial but diminished after the rewarded target had been cued. The information encoded by LIP neurons thus appeared to track the monkey’s running estimate of the expected value of a particular movement, which was maximal once that movement was cued (Glimcher 2002). These experiments demonstrated that LIP neurons carry information about the economic decision variables thought to drive rational decision-making. In those studies, however, monkeys did not have a choice about which eye movement to make and thus the link between decision-making and LIP activity could not be directly observed. In a second experiment (Platt and Glimcher 1999), the investigators studied the activity of LIP neurons while monkeys were permitted to freely choose between two movements differing in expected value. This allowed the investigators to derive a behavioral estimate of the monkey’s subjective valuation of each response, which could be compared directly with LIP neuronal activity. Under these conditions, monkeys behaved like rational consumers: their pattern of target choices was sensitive to the amount of fruit juice associated with each target (Fig. 2b). LIP neuronal activity was also sensitive to expected value in the free choice task (Fig. 2a, c). Activity prior to movement increased systematically with its expected value in concert with the monkey’s valuation of the same movement. Based on these data, Platt and Glimcher (1999) concluded that LIP neurons encode the expected value of potential eye movements. More specifically, Gold and Shadlen (2002) have argued that these data are consistent with the hypothesis that LIP neurons signal the instantaneous likelihood that the monkey will choose and execute a particular eye movement, and that this likelihood is computed, at least in part, based on the expected value of that movement. Neurons in several other brain areas have been shown to carry signals related to the expected value of eye movements. For example, neurons in the caudate nucleus were found to respond in anticipation of rewardpredicting stimuli on a memory saccade task (Kawagoe et al. 1998), and the strength of this signal was correlated with the latency of behavioral response (Takikawa et al. Fig. 2a–c Monkeys and posterior parietal neurons are sensitive to eye movement value. a Firing rate as a function of time for a single LIP neuron when the monkey subject chose to shift gaze to the target in the neuronal response field. Black curve high value trials, grey curve low value trials. Tick marks indicate action potentials recorded on successive trials for the first ten trials of each block. b Proportion of trials on which a single monkey subject chose target 1 as a function of its expected value [reward size target 1/ (reward size target 1 + reward size target 2)]. c Average normalized premovement firing rate (±SE) for a single posterior parietal neuron in the same monkey as a function of the expected value of the same target. After Platt and Glimcher (1999). Reprinted by permission from Nature 2002). On a reaction time eye movement task, anticipatory activity in the caudate was maximal when the contralateral target was reliably rewarded, and tracked changes in the monkeys‘ response time as the rewarded location was reversed (Lauwereyns et al. 2002). Anticipatory activity was absent, however, during a task in which the rewarded target unpredictably changed from trial to trial (Lauwereyns et al. 2002). Signals correlated with the reward value associated with eye movements have also been found in dorsolateral prefrontal cortex (Leon and Shadlen 1999), supplementary eye fields (Amador et al. 2000; Stuphorn et al. 2000a), substantia nigra pars reticulata (Sato and Hikosaka 2002), and even the superior colliculus (Ikeda and Hikosaka 2003). Expected value signals thus appear to penetrate the oculomotor system from the cortex through the final common pathway in the superior colliculus, presumably serving to bias eye movement selection towards maximization of reward. Behavioral measures indicate that expected value systematically influences not only target selection by eye movements, but also modulates saccade metrics, including amplitude, velocity, and reaction time 204 b Fig. 3a, b Monkeys rapidly learn to choose the high value target. a Proportion of trials on which a single monkey chose target 1 as a function of its expected value. Individual graphs plot choice functions for sequential five trial blocks following a change in reward contingencies favoring target 1. Red line indicates a cumulative normal function fit to the choice data. b Reward sensitivity over time. Graph plots the steepness of the psychometric function (Weber fraction) computed by dividing the standard deviation by the mean of the cumulative normal function fit to the choice data in each five trial block for two monkey subjects. After McCoy et al. (2003). Reprinted with permission from Neuron throughout the cortical and subcortical oculomotor afferents to the superior colliculus. Such modulations in saccade-associated activity presumably result in the differential activation of pools of motor neurons in the oculomotor brainstem that ultimately generate the patterns of muscle contraction responsible for shifting gaze. This model, however, has yet to be tested functionally using either microstimulation or inactivation techniques. Learning from mistakes: dopamine and prediction error (Leon and Shadlen 1999; Takikawa et al. 2002; McCoy et al. 2003). These behavioral observations are consistent with a model for oculomotor decision-making in which expected value systematically biases neuronal activity Recent studies by Platt and colleagues (McCoy et al. 2003) have demonstrated that the expected value of eye movements is rapidly updated in the primate brain when reward contingencies change. Figure 3a shows the pattern of choice behavior in the target choice task for one well-trained monkey as a function of time following the introduction of a novel set of reward contingencies. Each graph plots the frequency of choosing target 1 as a function of the relative value of that target (reward for target 1 divided by the sum of the reward values offered for targets 1 and 2) for data gathered with multiple different target values over several weeks of experimental sessions. The choice functions were well fit by cumulative normal functions, which became steeper over time but nearly reached asymptote after only about ten trials. Figure 3b plots reward sensitivity (Weber fraction) against time for two monkeys. These data indicate that the monkeys rapidly learned to choose the target associated with the greatest amount of reward. These data suggest that the brain continuously evaluates the reward (or punishment) outcome associated with movements and uses that information to update the expected value of those movements. Learning theorists have long argued that a comparison of reward expectation and reward outcome directly drives learning (Pearce and Hall 1980; Rescorla and Wagner 1972) and should therefore be encoded in the nervous system. The comparison of expected and actual reward is known as a reward prediction error (Sutton and Barto 1981), and is defined formally as the instantaneous discrepancy between the maximal associative strength sustained by the reinforcer and the current strength of the predictive stimulus (Schultz 2004). Reward prediction error models of learning have been invoked to explain previously elusive behavioral phenomena such as blocking. In traditional associative 205 models of learning (e.g., Skinner 1981; Thorndike 1898) any cue paired with reinforcement should be learned, but in blocking paradigms redundant cues paired with rewards fail to be learned (the cues are ‘‘blocked’’ from acquiring associative value). The typical blocking experiment introduces four stimuli, A, B, X, Y, which are either paired with a reward or with nothing. In the first stage of the experiment, a subject learns that stimulus A is paired with a reward while stimulus B is not. Once this is learned, the same stimuli are subsequently paired with two novel stimuli (X and Y), and, in this second stage of the experiment, the joint stimuli AX and BY are both paired with rewards. If learning were merely associative, the subject would respond to both novel stimuli X and Y as if they predicted the reward. In fact, under these conditions subjects learn to associate Y, but not X, with reward. The explanation for this effect lies in the fact that because stimulus A fully predicted reward in stage 1, stimulus X was redundant and was therefore not highlighted for learning (Rescorla and Wagner 1972). Reward prediction error models, such as the Rescorla-Wagner model or temporal difference learning models (Sutton and Barto 1981), nicely account for these results because once the relationship between cue A and reward has been learned, the prediction error term goes to 0 and remains unchanged when the combined stimulus AX is followed by a reward in stage 2; hence, no new learning occurs for the redundant cue. Over the past decade, Wolfram Schultz and colleagues have demonstrated that the activation of dopamine neurons in the midbrain represents, at least in part, an instantiation of a reward prediction error in neural circuitry (Schultz and Dickinson 2000). These conclusions rest on the observation that the activity of dopamine neurons in the substantia nigra and ventral tegmental area is elevated by the delivery of unpredicted rewards, unchanged following the delivery of predicted rewards, and depressed when expected rewards are withheld (Schultz 1998). Moreover, in blocking paradigms these dopamine neurons fail to respond to redundant cues (Waelti et al. 2001). The activity of these neurons has therefore been proposed to encode reward prediction error and thus to determine the direction and rate of learning: learning occurs when the error is positive and the reward is not predicted by the stimulus; no learning occurs when the error is 0 and the reward is fully predicted; forgetting or extinction occurs when the error is negative and the reward is less than predicted by the stimulus; and the rate of learning is determined by the absolute size of the error, whether positive or negative (Schultz and Dickinson 2000). Recent evidence has called into question whether the signals carried by dopaminergic midbrain neurons are specific to rewards or can be generalized to other salient stimuli (Horvitz 2002). Horvitz and others have shown that putative dopamine neurons respond to a variety of novel or arousing events, including aversive as well as rewarding stimuli (Salamone 1994; Horvitz 2000; but see Mirenowicz and Schultz 1996). These observations suggest that the responses of dopamine neurons might therefore signal surprising events in the environment regardless of whether they are rewarding or aversive (Horvitz 2002). Rather than encoding a reward prediction error per se, dopamine neurons might instead gate cortical and limbic inputs to the striatum following unpredicted events and thereby facilitate learning. The role of dopamine neurons in learning thus appears to be more complex than previously thought, and a deeper understanding will likely lie in further investigations of the response of dopamine neurons to all salient stimuli—good or bad. Updating expectations: the role of posterior cingulate cortex Up to this point, we have discussed the evidence that neurons in parietal cortex, prefrontal cortex, the basal ganglia, and the superior colliculus carry information about the expected value of the decision about where to look. Midbrain dopamine neurons, on the other hand, appear to play a role in processing reward-related information in the abstract—highlighting when expected and actual reward outcomes are in conflict. Such a signal would be useful for initiating the assignment of new estimates of the expected value of potential behaviors, and evidence increasingly suggests this to be the case (Montague and Berns 2002). But how are abstract prediction error signals generated by midbrain dopamine neurons linked to the selection and performance of specific actions, such as eye movements, that maximize reward? With this question in mind, Platt and colleagues (McCoy et al. 2003) investigated the properties of neurons in posterior cingulate cortex (CGp), a poorly understood part of the limbic system thought to be related to eye movements (Olson et al. 1996), spatial localization (Sutherland et al. 1988; Harker and Whishaw 2002), and learning (Bussey et al. 1997; Gabriel et al. 1980; Gabriel and Sparenborg 1987). Many CGp neurons are activated following saccades, in contrast with the activation of saccade-related neurons in parietal and prefrontal cortex prior to movement (Olson et al. 1996). Such a pattern of activation is suggestive of an evaluative, rather than generative, role in oculomotor behavior, an idea first suggested by Vogt and colleagues (Vogt et al. 1992). Anatomically, CGp is interconnected with reward-related areas of the brain (see Fig. 1), including the anterior cingulate cortex (Morecraft et al. 1993), orbitofrontal cortex (Cavada et al. 2000), and the caudate nucleus (Baleydier and Mauguiere 1980), as well as oculomotor areas such as parietal cortex (Cavada and Goldman-Rakic 1989), prefrontal cortex (Vogt and Pandya 1987) and the supplementary eye fields. These connections provide potential sources for a functional linkage of motivational and oculomotor information within CGp. To test the idea that CGp links motivational outcomes with eye movements, Platt and colleagues 206 recorded the activity of CGp neurons while monkeys shifted gaze to a single visual target for fruit juice rewards. In the first experiment, the size of the reward associated with visually-guided saccades was held constant for a block of 50 trials, and then varied between blocks, while the oculomotor behavior of the monkeys and the activity of CGp neurons were examined. The authors found that saccade metrics were sensitive to the size of reward associated with gaze shifts. Specifically, monkeys made faster, higher velocity saccades when expecting smaller rewards, a strategy consistent with reducing the delay to reinforcement and thereby increasing fluid intake rate in low reward blocks of trials. This observation is consistent with the idea that, even in the absence of an overt decision task, monkeys are sensitive to the expected value of gaze shifts. In this study, the activity of CGp neurons was also systematically modulated by reward value, illustrated by data from two example neurons in Fig. 4. Single neurons were sensitive to reward size following movement (Fig. 4a) as well as following reward delivery (Fig. 4b). These represented largely separate modulations since the events were separated in time by at least 500 ms. Across the population of studied neurons, approximately onethird of the population was sensitive to reward size following movement and another third following the receipt of reward (Fig. 4c). These modulations by reward size were independent of any effects of saccade metrics, demonstrated by the inclusion of saccade amplitude, latency, and peak velocity as independent factors in multiple linear regression analysis of firing rate as a function of reward size. These results thus demonstrate that information about both the predicted and experienced reward value of a particular eye movement is carried by the activity of CGp neurons. Such modulation in neuronal activity by reward size cannot be accounted for solely by reafferent input from motor areas. In this experiment, the net effect of changes in reward value was to scale the gain of spatially-selective neuronal responses following movement. Sixty-two percent of studied neurons in CGp responded in a spatially selective manner following eye movement onset. Among these, neurons excited following a particular movement were further excited when the expected value of that movement was increased; similarly, neurons suppressed following a particular movement were further suppressed when the expected value of that movement was increased (Fig. 4d). Across the population, reward value thus tuned the spatial sensitivity of the CGp neuronal population to saccade direction. Improved spatial sensitivity under high reward conditions may be associated with the slower, more deliberate saccades made by monkeys under high reward conditions in this experiment. The timing of modulations in CGp activity by reward value suggests that this area encodes both the predicted and experienced value of a particular movement. If so, CGp might also be expected to carry signals related to an error in reward prediction, arising from a discrepancy between predicted and experienced value. This hypoth- c Fig. 4a–d Representation of saccade value in posterior cingulate cortex. a Left panel, average firing rate (±SE) for a single CGp neuron plotted as a function of time on high reward (black curve) and low reward (grey curve) blocks of trials. Rasters indicate time of action potentials on individual trials. All trials are aligned on movement onset. Right panel, average firing rate (±SE) following movement onset (grey shaded region on left panel) plotted as a function of reward size, for the same neuron. Reward size is measured as the open time (ms) of a computer driven solenoid and is linearly related to juice volume. b left panel, average firing rate (±SE) for a second CGp neuron plotted as a function of time on high reward (black curve) and low reward (grey curve) blocks of trials. Rasters indicate time of action potentials on individual trials. All trials are aligned on reward offset. Right panel, average firing rate (±SE) following reward offset (grey shaded region on left panel) plotted as a function of reward size, for the same neuron. c Population data. Proportion of CGp neurons with significant modulation by reward size, peak saccade velocity, saccade amplitude, and saccade latency plotted as a function of time. Significant modulations P< 0.05 for each individual factor in a multiple regression analysis of firing rate in each of ten sequential 200 ms epochs. The size of liquid reward delivered on correct trials was controlled linearly by the open time of a computer-driven solenoid (volume=0.0026+0.001·open time in ms). d Reward value scales the gain of CGp responses. The correlation coefficient between firing rate and reward size for each neuron was plotted as a function of movement response index, a logarithmically scaled measure of the degree to which each neuron was excited or inhibited after movement relative to fixation level activity on mapping trials. Each dot represents data for one cell (n=67) and the best-fit line is shown in grey. Note: two outliers were removed from this analysis. After McCoy et al. (2003). Reprinted with permission from Neuron esis was addressed in a second experiment in which reward size was held constant but delivered probabilistically while monkeys shifted gaze to a target fixed in the area of maximal response for the neuron under study. Correct trials were reinforced on a variable reward schedule of 0.8, meaning that, on average, 80% of correct trials were followed by an auditory noise burst and reward (‘‘rewarded trials’’), while the remaining 20% of correct trials were followed by an auditory noise burst only but no juice (‘‘unrewarded trials’’). The relatively low frequency of unrewarded trials allowed them to serve as catch trials for which a predicted reward was unexpectedly withheld. Once again, the behavior of monkeys and the activity of CGp neurons were both sensitive to this manipulation in reward contingencies. After predicted rewards were withheld, monkeys made higher velocity saccades, similar to the effects of low reward on saccade metrics found in the previous experiment. Similarly, the activity of single CGp neurons was also significantly different on rewarded and unrewarded trials (Fig. 5a, b). Across the population of cells studied in this experiment, firing rate following the usual time of reward delivery was significantly greater when expected rewards were omitted (Fig. 5c). These data demonstrate that CGp neurons faithfully report the omission of expected reward, suggestive of a prediction error-like signal for eye movements (Schultz and Dickinson 2000). Intriguingly, many CGp neurons responded equivalently to the delivery of larger than average rewards and 207 the omission of predicted rewards (McCoy et al. 2003), unlike dopamine neurons which respond with a burst of action potentials following unpredicted rewards but are suppressed following the omission of predicted rewards (Schultz et al. 1997). The reward modulation of neuronal activity in CGp is therefore consistent with attentional theories of learning, which posit that reward prediction errors highlight unpredicted stimuli as important for learning (Mackintosh 1975; Pearce and Hall 1980). According to this idea, the absolute value of 208 Fig. 5a, b Representation of saccade reward prediction error in posterior cingulate cortex. a Firing rate as a function of time for a single CGp neuron on rewarded (black curve) and unrewarded (grey curve) delayed saccade trials. On average, eight out of ten randomly-selected correct trials were rewarded. Conventions as in Fig. 4. b Population response to reward omission. c Average (+SE) normalized firing rate measured after the normal time of reward delivery for the population of CGp neurons on rewarded and unrewarded trials. After McCoy et al. (2003). Reprinted with permission from Neuron the neuronal response correlates with the extent to which the reward event differed from expectation, whether in a positive or negative direction. While such a signal would not contain information about ‘‘what’’ needs to be learned about the relationship between a stimulus and reward, such a signal would be useful for instructing ‘‘when’’ and ‘‘how rapidly’’ learning should occur. Some of the saccade-related reward signals uncovered in CGp may best be understood in terms of such an attentional learning theory. Taken together, the results of these experiments suggest that neurons in CGp link motivational outcomes with gaze shifts. Recent studies have reported that neurons in the supplementary eye fields, with which CGp is reciprocally connected, also respond to reward outcomes associated with eye movements (Amador et al. 2000; Stuphorn et al. 2000b). Neurons in other areas of cingulate cortex also have been implicated in guiding actions based on reward value. While studying timing behavior in primates, Niki and Watanabe (1979) first uncovered two classes of rewarderror units in anterior cingulate cortex (CGa): one class responding after juice delivery, and the other responding following incorrect trials as well as following the omission of reward on correct trials. Single cells in anterior cingulate cortex (CGa) have also been found to carry signals related to reward expectation on a cued multi-trial color discrimination task (Shidara and Richmond 2002). In the rostral cingulate motor area (CMAr), Shima and Tanji (1998) found neurons that responded when a change in reward contingencies—but not a neutral cue—prompted monkeys to change the strategy of their behavioral response in order to maximize their receipt of reward. Inactivation of the CMAr by muscimol injection resulted in insensitivity of the monkeys to a reward decrement and an apparent inability to modify their behavioral strategy in order to maximize reward. Taken together, these data are consistent with a direct role for cingulate cortex and supplementary eye fields in linking motivational outcomes to action. Neurons in the primate ventral striatum (Cromwell and Schultz 2003; Hassani et al. 2001; Schultz 2004) and orbitofrontal cortex (Rolls 2000; Tremblay and Schultz 1999) have also been shown to carry information about predicted and experienced rewards. These neurons do not appear to be directly related to the generation of action, but their responses otherwise bear striking resemblance to those in cingulate cortex. In particular, neurons in ventral striatum and orbitofrontal cortex respond during delay periods preceding reinforcement, as well as following reinforcement, and these responses reflect the subjective preferences of subjects for particular rewards (Tremblay and Schultz 1999). Thus, orbitofrontal cortex and ventral striatum appear to convert information about rewards, punishments, and their predictors into a common internal currency (Montague and Berns 2002). Neurons in the dorsal striatum and dorsolateral prefrontal cortex, on the other hand, show similar response properties, but are activated in association with particular movements, much like neurons in CGp and supplementary eye fields. In summary, the response properties of neurons in ventral striatum and orbitofrontal cortex suggest they provide a valuation scale for a broad range of stimuli, (Montague and Berns 2002), while those in the dorsal striatum, dorsolateral prefrontal cortex, supplementary eye fields, and cingulate cortex may serve to associate this information with specific actions (Schultz 2004). 209 Decision-making in the human brain Recent neuroimaging studies suggest that some of the same neurophysiological processes guiding oculomotor decision-making characterize more abstract representations of reward, punishment, and decision-making in humans. Specifically, the striatum, amygdala, orbitofrontal cortex, prefrontal cortex, anterior cingulate cortex, and parietal cortex are activated by rewards and reward-associated stimuli (Elliott et al. 2003). Moreover, hemodynamic responses are modulated by reward uncertainty in orbitofrontal cortex, ventral striatum (Berns et al. 2001; Critchley et al. 2001), and CGp (Smith et al. 2002) and are linearly correlated with expected value in orbitofrontal cortex (O’Doherty et al. 2001), ventral and dorsal striatum (Delgado et al. 2000, 2003), amygdala (Breiter et al. 2001), and premotor cortex (Elliott et al. 2003). Further, errors in predicting rewards or punishments evoke hemodynamic responses in anterior cingulate cortex (Holroyd et al. 2004), ventral striatum (Pagnoni et al. 2002; Seymour et al. 2004), CGp (Smith et al. 2002) and insula (Seymour et al. 2004), and error-related electrophysiological responses have been recorded from the medial frontal and anterior cingulate cortices with scalp electrodes in humans as well (Holroyd et al. 2003). These observations indicate that brain regions carrying value-related information are activated in a similar fashion in monkeys and humans performing disparate types of learning and decision-making tasks. Human neuroimaging studies, however, have not yet tested the hypothesis that pools of neurons coding different movements are activated in proportion to movement value. Conclusions In summary, brain regions have been identified that appear to participate in several stages of oculomotor decision-making, from sensation, to reward expectation, to action, to outcome evaluation. Neurons in parietal cortex, prefrontal cortex, the basal ganglia, and the superior colliculus have been shown to encode in their firing rates the expected value of eye movements. Dopamine neurons in the midbrain, on the other hand, appear to signal an abstract reward prediction error, encoding in their firing rates a discrepancy between predicted and actual rewards. Because dopamine neurons terminate on glutamatergic inputs to the striatum from orbitofrontal cortex and amygdala (among other cortical and limbic structures), they are well-situated to gate the flow of motivational information through the striatum and other basal ganglia structures, whose purpose is ultimately to produce adaptive behaviors. Neurons in the ventral striatum, orbitofrontal cortex, and cingulate cortices, in turn, appear to signal the relative value or salience of features in the environment that may be important for learning and controlling behavior. In particular, posterior cingulate and supplementary eye field neurons have been recently shown to carry signals related to both reward outcomes and oculomotor behavior, and may contribute to updating orienting value signals in parietal and prefrontal cortex. Activation of this complex network thus appears to underlie visuallyguided behavior—in particular, how animals choose where to look- and serves as a model for understanding behavioral decision-making more generally. References Amador N, Schlag-Rey M, Schlag J (2000) Reward-predicting and reward-detecting neuronal activity in the primate supplementary eye field. J Neurophysiol 84:2166–2170 Arnauld A, Nichole P (1662) The art of thinking: Port-Royal logic. Translated by Dickoff J, James P. Bobbs-Merrill, Indianapolis Baleydier C, Mauguiere F (1980) The duality of the cingulate gyrus in monkey. Neuroanatomical study and functional hypothesis. Brain 103(3):525–554 Basso MA, Wurtz RH (1997) Modulation of neuronal activity by target uncertainty. Nature 389:66–69 Berns GS, McClure SM, Pagnoni G, Montague PR (2001) Predictability modulates human brain response to reward. J Neurosci 21:2793–2798 Breiter HC, Aharon I, Kahneman D, Dale A, Shizgal P (2001) Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron 30:619–639 Bussey TJ, Everitt BJ, Robbins TW (1997) Dissociable effects of cingulate and medial frontal cortex lesions on stimulus-reward learning using a novel Pavlovian autoshaping procedure for the rat: implications for the neurobiology of emotion. Behav Neurosci 111:908–919 Camerer CF (2003) Behavioral game theory: experiments in strategic interaction (Roundtable Series in Behavioral Economics). Princeton University Press, Princeton Cavada C, Goldman-Rakic PS (1989) Posterior parietal cortex in rhesus monkey: I. Parcellation of areas based on distinctive limbic and sensory corticocortical connections. J Comp Neurol 287:393–421 Cavada C, Company T, Tejedor J, Cruz-Rizzolo RJ, Reinoso-Suarez F (2000) The anatomical connections of the macaque monkey orbitofrontal cortex. A review. Cereb Cortex 10:220– 242 Coe B, Tomihara K, Matsuzawa M, Hikosaka O (2002) Visual and anticipatory bias in three cortical eye fields of the monkey during an adaptive decision-making task. J Neurosci 22(12):5081–5090 Colby CL, Duhamel JR, Goldberg ME (1996) Visual, presaccadic, and cognitive activation of single neurons in monkey lateral intraparietal area. J Neurophysiol 76:2841–2852 Critchley HD, Mathias CJ, Dolan RJ (2001) Neural activity in the human brain relating to uncertainty and arousal during anticipation. Neuron 29:537–545 Cromwell HC, Schultz W (2003) Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J Neurophysiol 29:29 Delgado MR, Nystrom LE, Fissell C, Noll DC, Fiez JA (2000) Tracking the hemodynamic responses to reward and punishment in the striatum. J Neurophysiol 84:3072–3077 Delgado MR, Locke HM, Stenger VA, Fiez JA (2003) Dorsal striatum responses to reward and punishment: effects of valence and magnitude manipulations. Cogn Affect Behav Neurosci 3:27–38 Dorris MC, Munoz DP (1998) Saccadic probability influences motor preparation signals and time to saccadic initiation. J Neurosci 18:7015–7026 210 Elliott R, Newman JL, Longe OA, Deakin JF (2003) Differential response patterns in the striatum and orbitofrontal cortex to financial reward in humans: a parametric functional magnetic resonance imaging study. J Neurosci 23(1):303–307 Gabriel M, Sparenborg S (1987) Posterior cingulate cortical lesions eliminate learning-related unit activity in the anterior cingulate cortex. Brain Res 409:151–157 Gabriel M, Orona E, Foster K, Lambert RW (1980) Cingulate cortical and anterior thalamic neuronal correlates of reversal learning in rabbits. J Comp Physiol Psychol 94:1087–1100 Glimcher P (2002) Decisions, decisions, decisions: choosing a biological science of choice. Neuron 36:323–332 Glimcher P (2003) Decisions, Uncertainty, and the brain: the science of neuroeconomics. MIT Press, Cambridge Gnadt JW, Andersen RA (1988) Memory related motor planning activity in posterior parietal cortex of macaque. Exp Brain Res 70:216–220 Gold JI, Shadlen MN (2002) Banburismus and the brain: decoding the relationship between sensory stimuli, decisions, and reward. Neuron 36:299–308 Goldberg ME, Colby CL, Duhamel JR (1990) Representation of visuomotor space in the parietal lobe of the monkey. Cold Spring Harb Symp Quant Biol 55:729–739 Gottlieb JP, Kusunoki M, Goldberg ME (1998) The representation of visual salience in monkey parietal cortex. Nature 391:481–484 Hanes DP, Schall JD (1996) Neural control of voluntary movement initiation. Science 274:427–430 Harker KT, Whishaw IQ (2002) Impaired spatial performance in rats with retrosplenial lesions: importance of the spatial problem and the rat strain in identifying lesion effects in a swimming pool. J Neurosci 22:1155–1164 Hassani OK, Cromwell HC, Schultz W (2001) Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J Neurophysiol 85:2477–2489 Herrnstein RJ (1961) Relative and absolute strength of response as a function of frequency of reinforcement. J Exp Anal Behav 4:267–272 Herrnstein RJ (1997) The matching law: papers in psychology and economics. Harvard University Press, Cambridge Holroyd CB, Nieuwenhuis S, Yeung N, Cohen JD (2003) Errors in reward prediction are reflected in the event-related brain potential. Neuroreport 14:2481–2484 Holroyd CB, Nieuwenhuis S, Yeung N, Nystrom L, Mars RB, Coles MG, Cohen JD (2004) Dorsal anterior cingulate cortex shows fMRI response to internal and external error signals. Nat Neurosci 7:497–498 Horvitz JC (2000) Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience 96:651–656 Horvitz JC (2002) Dopamine gating of glutamatergic sensorimotor and incentive motivational input signals to the striatum. Behav Brain Res 137:65–74 Ikeda T, Hikosaka O (2003) Reward-dependent gain and bias of visual responses in primate superior colliculus. Neuron 39:693– 700 Kawagoe R, Takikawa Y, Hikosaka O (1998) Expectation of reward modulates cognitive signals in the basal ganglia. Nat Neurosci 1:411–416 Kim JN, Shadlen MN (1999) Neural correlates of a decision in the dorsolateral prefrontal cortex of the macaque. Nat Neurosci 2:176–185 Lauwereyns J, Watanabe K, Coe B, Hikosaka O (2002) A neural correlate of response bias in monkey caudate nucleus. Nature 418:413–417 Leon MI, Shadlen MN (1999) Effect of expected reward magnitude on the response of neurons in the dorsolateral prefrontal cortex of the macaque. Neuron 24:415–425 Mackintosh NJ (1975) Blocking of conditioned suppression: role of the first compound trial. J Exp Psychol Anim Behav Process 1:335–345 McCoy AN, Crowley JC, Haghighian G, Dean HL, Platt ML (2003) Saccade reward signals in posterior cingulate cortex. Neuron 40:1031–1040 Mirenowicz J, Schultz W (1996) Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379:449–451 Montague PR, Berns GS (2002) Neural economics and the biological substrates of valuation. Neuron 36:265–284 Morecraft RJ, Geula C, Mesulam MM (1993) Architecture of connectivity within a cingulo-fronto-parietal neurocognitive network for directed attention. Arch Neurol 50:279–284 Niki H, Watanabe M (1979) Prefrontal and cingulate unit activity during timing behavior in the monkey. Brain Res 171:213– 224 O’Doherty J, Kringelbach ML, Rolls ET, Hornak J, Andrews C (2001) Abstract reward and punishment representations in the human orbitofrontal cortex. Nat Neurosci 4:95–102 Olson CR, Musil SY, Goldberg ME (1996) Single neurons in posterior cingulate cortex of behaving macaque: eye movement signals. J Neurophysiol 76:3285–3300 Pagnoni G, Zink CF, Montague PR, Berns GS (2002) Activity in human ventral striatum locked to errors of reward prediction. Nat Neurosci 5:97–98 Pavlov IP (1927) Conditioned reflexes. Oxford University Press, London Pearce JM, Hall G (1980) A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol Rev 87:532–552 Platt ML (2002) Neural correlates of decisions. Curr Opin Neurobiol 12(2):141–148 Platt ML, Glimcher PW (1997) Responses of intraparietal neurons to saccadic targets and visual distractors. J Neurophysiol 78:1574–1589 Platt ML, Glimcher PW (1999) Neural correlates of decision variables in parietal cortex. Nature 400:233–238 Platt ML, Lau B, Glimcher PW (2004) Situating the Superior Colliculus within the Gaze Control Network. In: Hall WC, Moschovakis A (eds) The superior colliculus: new approaches for studying sensorimotor integration. CRC Press, New York Rescorla RA, Wagner AR (1972) A theory of Pavlovian conditioning. Variations in the effectiveness of reinforcement and nonreinforcement. In: Black AHaP WF (eds) Classical conditioning. II. Current research and theory. Appleton-CenturyCrofts, New York Roitman JD, Shadlen MN (2002) Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J Neurosci 22:9475–9489 Rolls ET (2000) The orbitofrontal cortex and reward. Cereb Cortex 10:284–294 Salamone JD (1994) The involvement of nucleus accumbens dopamine in appetitive and aversive motivation. Behav Brain Res 61:117–133 Sato M, Hikosaka O (2002) Role of primate substantia nigra pars reticulata in reward-oriented saccadic eye movement. J Neurosci 22:2363–2373 Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80:1–27 Schultz W (2004) Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioural ecology. Curr Opin Neurobiol 14:139–147 Schultz W, Dickinson A (2000) Neuronal coding of prediction errors. Annu Rev Neurosci 23:473–500 Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599 Seymour B, O’Doherty JP, Dayan P, Koltzenburg M, Jones AK, Dolan RJ, Friston KJ, Frackowiak RS (2004) Temporal difference models describe higher-order learning in humans. Nature 429:664–667 Shadlen MN, Newsome WT (1996) Motion perception: seeing and deciding. Proc Natl Acad Sci U S A 93:628–633 Shadlen MN, Newsome WT (2001) Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J Neurophysiol 86:1916–1936 Sherrington CS (1906) The integrative action of the nervous system. Scribner, New York 211 Shidara M, Richmond BJ (2002) Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science 296:1709–1711 Shima K, Tanji J (1998) Role for cingulate motor area cells in voluntary movement selection based on reward. Science 282:1335–1338 Skinner BF (1981) Selection by consequences. Science 213:501–504 Smith K, Dickhaut J, McCabe K, Pardo JV (2002) Neuronal substrates for choice under ambiquity, risk, gains, and losses. Mangement Science 48:711–718 Speiser D (ed) (1982) The works of Daniel Bernouilli. Birkhauser, Boston Stephens DW, Krebs JR (1986) Foraging theory. Princeton University Press, Princeton Stuphorn V, Taylor TL, Schall JD (2000a) Performance monitoring by the supplementary eye field. Nature 408:857–860 Stuphorn V, Taylor TL, Schall JD (2000b) Performance monitoring by the supplementary eye field. Nature 408:857–860 Sutherland RJ, Whishaw IQ, Kolb B (1988) Contributions of cingulate cortex to two forms of spatial learning and memory. J Neurosci 8:1863–1872 Sutton RS, Barto AG (1981) Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev 88:135–170 Takikawa Y, Kawagoe R, Itoh H, Nakahara H, Hikosaka O (2002) Modulation of saccadic eye movements by predicted reward outcome. Exp Brain Res 142:284–291 Thorndike LL (1898) Animal intelligence: an experimental study of the asssociative processes in animals. Psychol Rev Monogr [Suppl 11] Tremblay L, Schultz W (1999) Relative reward preference in primate orbitofrontal cortex. Nature 398:704–708 Vogt BA, Pandya DN (1987) Cingulate cortex of the rhesus monkey. II. Cortical afferents. J Comp Neurol 262:271–289 Vogt BA, Finch DM, Olson CR (1992) Functional heterogeneity in cingulate cortex: the anterior executive and posterior evaluative regions. Cereb Cortex 2:435–443

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download decision-making in the primate brain