* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Optimal decision making theories - Bristol CS
Stimulus (physiology) wikipedia , lookup
Neural oscillation wikipedia , lookup
Neuroplasticity wikipedia , lookup
Holonomic brain theory wikipedia , lookup
Types of artificial neural networks wikipedia , lookup
Neuroanatomy wikipedia , lookup
Pre-Bötzinger complex wikipedia , lookup
Circumventricular organs wikipedia , lookup
Neural modeling fields wikipedia , lookup
Basal ganglia wikipedia , lookup
Neural coding wikipedia , lookup
Development of the nervous system wikipedia , lookup
Optogenetics wikipedia , lookup
Neuropsychopharmacology wikipedia , lookup
Neural correlates of consciousness wikipedia , lookup
Premovement neuronal activity wikipedia , lookup
Channelrhodopsin wikipedia , lookup
Central pattern generator wikipedia , lookup
Feature detection (nervous system) wikipedia , lookup
Efficient coding hypothesis wikipedia , lookup
Synaptic gating wikipedia , lookup
Neuroeconomics wikipedia , lookup
Metastability in the brain wikipedia , lookup
Optimal decision making theories Rafal Bogacz, University of Bristol, UK Abstract In case of many decisions based on sensory information, the sensory stimulus or its neural representation are noisy. This chapter reviews theories proposing that the brain implements statistically optimal strategies for decision making on the basis of noisy information. These strategies maximize the accuracy and speed of decisions, as well as the rate of receiving rewards for correct choices. The chapter first reviews computational models of cortical decision circuits that can optimally perform choices between two alternatives. Then, it describes a model of cortico-basal-ganglia circuit that implements the optimal strategy for choice between multiple alternatives. Finally, it shows how the basal ganglia may modulate decision processes in the cortex, allowing cortical neurons to represent the probabilities of alternative choices being correct. For each set of theories their predictions are compared with existing experimental data. 1. Introduction Imagine an animal trying to decide if a shape moving behind the leaves is predator or prey. The sensory information the animal receives may be noisy and partial (e.g. occluded by the leaves), and hence need to be accumulated over time to gain sufficient accuracy, but the speed of such choices is also of critical importance. Due to the noisy nature of the sensory information, such decisions can be considered as statistical problems. This chapter reviews theories assuming that during perceptual decisions the brain performs statistically optimal tests, thereby maximizing the speed and accuracy of choices. The optimal decision making theories are motivated by an assumption that evolutionary pressure has been promoting animals that make fast and accurate choices. Although optimality is not always achieved, these theories provide interpretation for existing data and further experimental predictions that can guide empirical research. The optimal decision making strategies are often precisely defined and relatively simple, thus they constrain the parameters of the models of neural decision circuits. This chapter is organized as follows. Section 2 reviews models assuming that cortical decision circuits implement an optimal test for choice between two alternatives. Section 3 reviews the model of cortico-basal ganglia circuit assuming that it implements optimal choice between multiple alternatives; this model is closely related to that described in [Michael’s chapter]. Section 4 shows how the basal ganglia may modulate the integration of sensory evidence in the cortex and allow cortical neurons represent the probabilities of alternative choices being correct. Section 5 discusses open questions. In addition, the Appendices contain essential derivations presented step-by-step (without skipping any calculations) such that they can be understood without extensive mathematical background. In this chapter we focus on models of decision making in highly practiced task; the models describing task acquisition are discussed in [Michael’s chapter]. Throughout the chapter we use population level models describing activities of neuronal populations rather than individual neurons. Although this level of modelling is unable to capture many important details of neural decision circuits, it helps in understanding the essence of computation performed by various populations during decision process. 1 2. Models of optimal decision making in the cortex In this section we first briefly review the responses of cortical neurons during a choice task; a more detailed review is available in [Hauke’s chapter]. Next, the statistically optimal decision strategy and its proposed neural implementations are described. Finally, the predictions of the models are compared with experimental data. 2.1. Neurobiology of decision processes in the cortex The neural bases of decision are typically studied in the motion discrimination task, in which a monkey is presented with a display containing moving dots (Britten, Shadlen, Newsome, & Movshon, 1993). A fraction of the dots moves left on some trials or right on other trials, while the rest is moving randomly. The task of the animal is to make a saccade (i.e. an eye-movement) in the direction of motion of the majority of dots. One of the areas critically important for this task is the medial temporal (MT) area in the visual cortex. The neurons in area MT respond selectively for a particular directions of motion, thus the MT neurons preferring left motion are more active on trials when the majority of dots move left, while the neurons preferring right motion are more active on trials when the majority of dots move right (Britten et al., 1993). However, their activity is also very noisy, as the stimulus itself is noisy. Let us now consider the decision faced by the areas receiving input from MT. Let us imagine a judge listening to the activity from the two populations of MT neurons, each preferring one of the two directions. The MT neurons produce spikes that can be interpreted as votes for the alternative directions. The task of the judge is to choose the alternative receiving more votes, i.e. corresponding to the MT population with higher mean firing rate. Note however that the judge cannot measure the mean firing rate instantaneously, as the spikes are discrete events spread out in time. Thus the judge needs to observe the MT activity for a period of time, and integrate the evidence or “count the votes” until it reaches a certain level of confidence. Indeed, such an information integration process has been observed in the lateral intraparietal (LIP) area and frontal eye field during the motion coherence task. Neurons in area LIP respond selectively before and during saccades in particular directions. It has been observed that the LIP neurons selective for the chosen direction gradually increase their firing rate during the motion discrimination task (Roitman & Shadlen, 2002; Shadlen & Newsome, 2001), and the data indicate that these neurons integrate the input from corresponding MT neurons over time (Ditterich, Mazurek, & Shadlen, 2003; Gold & Shadlen, 2007; Schall, 2001). Thus using the judge metaphor, the LIP neurons “count the votes” and represent the total numbers of votes in their firing rate. The neurons representing integrated evidence have also been observed in other tasks in areas within frontal lobe (Cisek & Kalaska, 2005; Heekeren, Marrett, Bandettini, & Ungerleider, 2004; Heekeren, Marrett, & Ungerleider, 2008; Thompson, Hanes, Bichot, & Schall, 1996). 2.2. Optimal stopping criterion for two alternatives The above analysis still leaves an open question: When should the integration process be stopped and action executed? This subsection first considers this question intuitively, then it presents an optimal stopping criterion formally, and describes in what sense it is optimal. 2 Stopping criteria. The simplest possible criterion is to stop the integration whenever the integrated evidence, or the total number of votes, for one of the alternatives reaches a threshold. This strategy is known as the race model (Vickers, 1970). Another possibility is to stop the integration when the difference between the integrated evidence in favour of the winning and losing alternatives exceeds a threshold. This strategy is referred to as the diffusion model (Laming, 1968; Ratcliff, 1978; Stone, 1960). The diffusion model is usually formulated in a different but equivalent way: It includes a single integrator which accumulates the difference between the sensory evidence supporting the two alternatives. A choice is made in favour of the first alternative if the integrator exceeds a positive threshold, or in favour of the second alternative if the integrator decreases below a negative threshold. As will be demonstrated formally below, the diffusion model provides an optimal stopping criterion. The advantage of diffusion over race models can also be seen intuitively, as the diffusion model allows the decision process to be modulated by the amount of conflict between evidence supporting alternatives on a given trial: Note that decisions will take longer when the evidence for the two alternatives is similar, and importantly, that decisions will be faster when there is little evidence for the losing alternative. By contrast, in the race model the decision time does not depend on the level of evidence for the losing alternative (Bogacz, 2007). Statistical formulation. Let us now formalize the decision problem. For simplicity, let us assume that time can be divided into discrete steps. Let xi(t) denote the sensory evidence supporting alternative i at time t, which in the motion discrimination task corresponds to the firing rate of the MT neurons selective for alternative i at time t. Let yi denote the evidence for alternative i integrated until time t: t yi = ∑ xi (τ ) . (1) τ =1 Gold and Shadlen (2001; 2002) formulated the decision as a statistical problem in the following way: They assumed that xi(t) come from a normal distribution with mean μi and standard deviation σ. They defined hypotheses Hi stating that the sensory evidence supporting alternative i has higher mean: H1: μ1=μ+, μ2=μ–; H2: μ1=μ–, μ2=μ+, (2) where μ+>μ–. The optimal procedure for distinguishing between these hypotheses is provided by the Sequential Probability Ratio Test (SPRT) (Wald, 1947), which is equivalent to the diffusion model (Laming, 1968), as will be shown below. According to SPRT, at each moment of time t, one computes the ratio of the likelihoods of the sensory evidence given the hypotheses: R= P( x(1..t ) | H1 ) , P( x(1..t ) | H 2 ) (3) where x(1..t) denotes a set of all sensory evidence observed so far (i.e. x1(1), x2(1), …, x1(t), x2(t)). If R exceeds a threshold Z1 or decreases below a lower threshold Z2, the decision process is stopped and the choice is made (H1 if R>Z1 or H2 if R<Z2). Otherwise the decision process continues and another sample of sensory information is observed. Thus note that SPRT observes sensory evidence only for as long as it is necessary to distinguish between the hypotheses. 3 The relationship between SPRT and the diffusion model is described in Appendix A. Its first part shows that R in the SPRT changes in a similar way as the single integrator in the diffusion model (which accumulates the difference between the sensory evidence supporting the two alternatives). Namely, if at a given moment of time t, the sensory evidence supports the first alternative more than the second (i.e. x1(t) > x2(t)) then R increases; and otherwise R decreases. The second part of Appendix A shows that log R is exactly proportional to the difference between the integrated evidence: log R = g ( y1 − y2 ) , (4) where g is a constant. Thus according to SPRT, the decision process will be stopped when the difference (y1 – y2) exceeds a positive threshold (equal to logZ1/g) or decreases below a negative threshold (equal to logZ2/g). Hence the SPRT (with hypotheses of Equation 2) is equivalent to the diffusion model. Optimality. The SPRT is optimal in the following sense: it minimizes the average decision time for any required accuracy (Wald & Wolfowitz, 1948). Let us illustrate this property by comparing the race and the diffusion models. In both models, speed and accuracy are controlled by the height of decision threshold, and there exists a speedaccuracy tradeoff. But if we choose the thresholds in both models giving the same accuracy for a given sensory input, then the diffusion model will on average be faster than the race model. One can ask if the above optimality property could bring benefits to animals. If we consider a scenario in which an animal receives rewards for correct choices, then making fast choices allows an animal to receive more rewards per unit of time. In particular, Gold & Shadlen (2002) considered a task in which the animal receives a reward for correct choices, there is no penalty for errors, and there is a fixed delay D between the response and the onset of the next choice trial. In this task the reward rate is equal to the ratio of accuracy AC and the average duration of the trial (Gold & Shadlen, 2002): RR = AC , RT + D (5) where RT denotes the average reaction time. The reward rate depends on decision threshold. But Appendix B shows that the diffusion model with appropriately chosen threshold gives higher or equal reward rate than any other decision strategy. Furthermore, the diffusion model maximizes reward rate not only for the task described above, but also for a wide range of other tasks (Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006). 2.3. Neural implementations As mentioned in the previous subsection, the optimal decision strategy is to make a choice when the difference between the evidence supporting the two alternatives exceeds a threshold. However, the neurophysiological data suggest that an animal makes a choice when the activity of integrator neurons selective for a given alternative exceeds a fixed threshold (Hanes & Schall, 1996; Roitman & Shadlen, 2002). One way to reconcile these two observations is to assume that the firing rate of these neurons represents the difference in evidence supporting the two alternatives. Several models have been proposed that exhibit this property, and they are briefly reviewed in this subsection. 4 [Figure 1 around here] Let us refer to the population of cortical integrator neurons selective for a particular alternative as an integrator. In the model shown in Figure 1a, the integrators directly accumulate the difference between the activities of sensory neurons via feed-forward inhibitory connections (Mazurek, Roitman, Ditterich, & Shadlen, 2003; Shadlen & Newsome, 2001). Thus this model is equivalent to the diffusion model. An alternative model shown in Figures 1b assumes that integrators mutually inhibit each other (Usher & McClelland, 2001). An analysis of the dynamics of this model reveals that, for certain parameter values, the activity of integrators is approximately proportional to the difference between the integrated evidence, thus it also approximates the diffusion model (Bogacz et al., 2006). In the models shown in Figures 1a and 1b, the sensory neurons send both excitatory and inhibitory connections. But in the brain the cortico-cortical connections are only excitatory, thus Figures 1c and 1d show more realistic versions of the models in Figures 1a and 1b in which the inhibition is provided by a population of inhibitory inter-neurons (the model in Figure 1d has been proposes by Wang, 2002). It is easy to set the parameters of the model shown in Figure 1c so it also integrates the difference between sensory evidence (in particular the weights of connections between sensory and integrator populations need to be twice as large as the other weights). Similarly Wong and Wang (2006) have shown that the model in Figure 1d can also for certain parameters integrate the difference between sensory evidence. In summary, there are several architectures of cortical decision networks, shown in Figure 1, that can approximate the optimal decision making for two alternatives. Matlab codes allowing simulation and comparison of performance of the models described in this chapter are available from [Book website, currently the codes are at: http://www.cs.bris.ac.uk/home/rafal/optimal/codes.html]. 2.4. Comparison with experimental data The diffusion model has been shown to fit the reaction time (RT) distributions from a wide range of choice tasks (Ratcliff, Gomez, & McKoon, 2004; Ratcliff & Rouder, 2000; Ratcliff, Thapar, & McKoon, 2003). Furthermore, the diffusion model has been shown to describe the patterns of RTs and the growth of information in neurons involved in the decision process better than the race model (Ratcliff, Cherian, & Segraves, 2003; Ratcliff & Smith, 2004; Smith & Ratcliff, 2004). The diffusion model has been also used to explain the effect of stimulation of MT and LIP neurons during the motion discrimination task. Ditterich et al. (2003) showed that stimulation of MT neurons selective for a particular alternative produced two effects: reduced RTs on trials when this alternative was chosen, and increased RTs when the other alternative was chosen. The second effect suggests that the decision is made on the basis of the difference between integrated evidence, and thus supports the diffusion model over the race model. Hanks et al. (2006) compared these effects of MT stimulation with the effects of LIP stimulation. By estimating parameters of the diffusion model, they established that the effect of MT stimulation on RTs is best explained by change in sensory evidence, while the effect of LIP stimulation is best explained by change in the integrated evidence. This analysis supports the model in which the activity of LIP neurons represent the difference between integrated information from MT neurons. 5 The models shown in Figure 1 can also fit the time-courses of neural activity in LIP neurons during the choice process. In particular, due to the inhibitory connections, the models can describe the decrease in firing rate of LIP neurons representing the losing alternative (Ditterich, 2006; Mazurek et al., 2003; Wang, 2002). However, the current neurophysiological data do not allow us to distinguish which type of inhibition (feedforward or mutual) is present in the integrator networks. We will come back to this question in Section 5.1. 3. Model of decision making in the cortico-basal-ganglia circuit This section reviews a theory (Bogacz & Gurney, 2007) suggesting that the corticobasal-ganglia circuit implements a generalization of SPRT to choice between multiple alternatives, the Multihypothesis SPRT (MSPRT) (Baum & Veeravalli, 1994). This section follows the same organization as Section 2. 3.1. Neurobiology of decision processes in the basal ganglia The basal ganglia are a set of nuclei connected with one another, cortex and subcortical regions. Redgrave et al. (1999) have proposed that the basal ganglia act as a central switch resolving competition between cortical regions vying for behavioural expression. In the default state the output nuclei of the basal ganglia send tonic inhibition to the thalamus and brain stem, and thus block execution of any actions (Chevalier, Vacher, Deniau, & Desban, 1985; Deniau & Chevalier, 1985). To execute a movement, the firing rates of the corresponding neurons in the output nuclei need to decrease (Chevalier et al., 1985; Deniau & Chevalier, 1985), thus releasing the corresponding motor plan from inhibition. Another property of basal ganglia organization relevant to the model is that within each nucleus different neurons are selective for different body parts (Georgopoulos, DeLong, & Crutcher, 1983; Gerardin et al., 2003). On the basis of this observation it has been proposed that basal ganglia are organized into channels corresponding to individual body parts that traverse all nuclei (Alexander, DeLong, & Strick, 1986). Two sample channels are shown in two colours in Figure 2. Furthermore, the connectivity between nuclei is usually within channels (Alexander et al., 1986), for example, neurons in the motor cortex selective for the right hand project to the neurons in striatum selective for the right hand, etc. The only exception is the subthalamic nucleus (STN) where neurons project more diffusely across channels (Parent & Hazrati, 1993; Parent & Smith, 1987) (see Figure 2). Figure 2 shows a subset of basal ganglia connectivity required for optimal decision making (Bogacz & Gurney, 2007). The figure does not show “the indirect pathway” between the striatum and the output nuclei via the globus pallidus (GP). It has been suggested that this pathway is involved in learning from punishments (Frank, Seeberger, & O'Reilly, 2004) [see Michael’s chapter] and will not be discussed in this chapter for simplicity, but it can be included in the model and it continues to implement MSPRT (Bogacz & Gurney, 2007; Bogacz & Larsen, submitted). 3.2. Optimal stopping criterion for multiple alternatives Let us first describe MSPRT intuitively (a more formal description will follow). As mentioned in Section 2.2, to optimize performance, the choice criterion should take into account the amount of conflict in the evidence. Consequently, in the MSPRT, a choice 6 is made when the integrated evidence for one of the alternatives exceeds a certain level, but this level is not fixed; rather it is increased when the evidence is more conflicting: yi > Threshold + Conflict, (6) where Threshold is a fixed value. Subtracting Conflict from both sides, the criterion for making a choice can be written as: yi – Conflict > Threshold. (7) In order to implement the MSPRT, Conflict needs to take a particular form: N Conflict = log ∑ exp y j , (8) j =1 where N is the number of alternative choices (we use the italicised word ‘Conflict’ to refer to a term defined in Equation 8, and the non-italicised word ‘conflict’ to refer to a general situation where sensory evidence supports more than one alternative). Equation 8 is said to express conflict because it involves summation of evidence across alternatives, but it also includes particular nonlinearities, and to understand where they come from, we need to describe MSPRT more formally. MSPRT is a statistical test between N hypotheses. We can generalize the hypotheses described in Equation 2, so that the hypothesis Hi states that the sensory evidence supporting alternative i has the highest mean: Hi: μi=μ+, μj≠i=μ–. (9) In the MSPRT, at each time t, and for each alternative i, one computes the probability of alternative i being correct given all sensory evidence observed so far; let us denote this probability by Pi: Pi = P(H i | x(1..t )) . (10) If for any alternative Pi exceeds a fixed threshold, the choice is made in favour of the corresponding alternative; otherwise the sampling continues. Appendix C shows that the logarithm of Pi is equal to: N log Pi = yi − log ∑ exp y j . (11) j =1 Equation 11 includes two terms: the integrated evidence yi and Conflict defined in Equation 8. Thus MSPRT is equivalent to making a choice when the difference between yi and Conflict exceeds a threshold. MSPRT has similar optimality property as SPRT, namely it minimizes the decision time for any required accuracy (which has been shown analytically for accuracies close to 100% (Dragalin, Tertakovsky, & Veeravalli, 1999), and numerically for lower accuracies (McMillen & Holmes, 2006)). Thus, given the discussion in Section 2.2 and Appendix B, MSPRT also maximizes the reward rate. [Figure 2 around here] 3.3. Neural implementation Bogacz and Gurney (2007) proposed that the circuit of Figure 2 computes the expression yi – Conflict (required for MSPRT). In their model, yi are computed in cortical integrators accumulating input from sensory neurons (see Figure 2), and the 7 Conflict is computed by STN and GP (we will describe how it can be computed later). The output nuclei receive inhibition from cortical integrators via the striatum and excitation from STN, thus they compute in the model: OUTi = – yi + Conflict = – (yi – Conflict). (12) Thus the output nuclei in the model represent the negative of the expression that is compared against a threshold in MSPRT (cf. Equation 7). Therefore, in the model the choice is made when the activity of any output channel OUTi decreases below the threshold, in agreement with selection by disinhibition reviewed in Section 3.1. Frank (2006) has also proposed that the conflict is computed in STN [see Michael’s chapter]. STN has a suitable anatomical location to modulate choice process according to the conflict, as it can effectively inhibit all motor programs by its diffuse projections to output nuclei (Mink, 1996). Studies of patients who receive deep brain stimulation to STN support the idea that STN computes conflict. They suggest that disrupting computations in STN makes patients unable to prevent premature responding in high conflict choices (Frank, Samanta, Moustafa, & Sherman, 2007; Jahanshahi et al., 2000). [see Michael’s chapter for details]. Bogacz and Gurney (2007) showed that input sent by STN to the output nuclei can be proportional to the particular form of Conflict defined in Equation 8, if the neurons in STN and GP had the following relationships between their input and the firing rate: STN = exp (input), (13) GP = input – log (input). (14) Let us analyse how each term in Equation 8 can be computed. For ease of reading let us restate Equation 8: N Conflict = log ∑ exp y j . (8) j =1 Starting from the right end of Equation 8, the integrated evidence yi is provided to the STN in the model by a direct connection from cortical integrators (see Figure 2). The exponentiation is performed by the STN neurons (cf. Equation 13). The summation across channels is achieved due to the diffuse projections from the STN – in the model each channel in the output nuclei receives input from all channels in the STN hence the input to the output neurons is proportional to the sum of activity in STN channels. The only non-intuitive element of the computation of Equation 8 is the logarithm – it is achieved by the interactions between STN and GP, as shown in Appendix D. 3.4. Comparison with experimental data The input-output relationships of Equations 13 and 14 form predictions of the model. The first prediction says that the firing rate of STN neurons should be proportional to the exponent of their input. Figure 3 shows the firing rate as a function of injected current for seven STN neurons, for which this relationship has been studied precisely (Hallworth, Wilson, & Bevan, 2003; Wilson, Weyrick, Terman, Hallworth, & Bevan, 2004). Solid lines show fits of the exponential function to firing rates below 135 Hz, suggesting that up to approximately 135Hz the STN neurons have an input-output relationship which is very close to exponential. [Figure 3 around here] 8 For the entire range in Figure 3, the input-output relationship seems to be sigmoidal, often used in neural network models, and one could ask how unique the exponential relationship discussed above is, as every sigmoidal curve has a segment with an exponential increase. However, note that typical neurons have much lower maximum firing rate, so even if they had an exponential segment, it would apply to a much smaller range of firing rates. By contrast, the STN neurons have exponential input-output relationship for up to ~135 Hz, and this is approximately the operating range of these neurons in humans during choice tasks (Williams, Neimat, Cosgrove, & Eskandar, 2005). The model also predicts that GP neurons should have approximately linear input-output relationship (because the linear term in Equation 14 dominates for higher input). Neurophysiological studies suggest that there are three distinct subpopulations of neurons in GP, and the one which contributes most to the population firing rate has indeed the linear input-output relationship (Kita & Kitai, 1991; Nambu & Llinas, 1994). The model is also consistent with behavioural data from choice tasks. First, for two alternatives the model produces the same behaviour as the diffusion model, because then MSPRT reduces to SPRT, thus the model fits all behavioural data that the diffusion model can describe (see Section 2.4). Second, the model reproduces the Hick’s law, stating that the decision time is proportional to the logarithm of the number of alternatives (Bogacz & Gurney, 2007). Third, Norris (2006) has shown that MSPRT describes patterns of RTs during word recognition (which can be interpreted as a perceptual decision with the number of alternatives equal to the number of words known by participants). 4. Basal ganglia and cortical integration This section discusses the relationship between the models presented in the two previous sections, and how the basal ganglia may modulate cortical integration allowing cortical integrators to represent probabilities of corresponding alternatives being correct. 4.1. Integration of sensory information In the model shown in Figure 2, a cortical integrator accumulates sensory evidence only for the corresponding alternative. This simple model is inconsistent with the observation that integrators representing the losing alternative decrease their firing rate during a choice process (Shadlen & Newsome, 2001); to account for this observation the inhibitory connections need to be introduced, as shown in Figure 1. Nevertheless, if the simple cortical integration in the model of the cortico-basal-ganglia circuit is replaced by any of the models in Figure 1, the circuit still implements MSPRT (Bogacz & Gurney, 2007), as we shall now explain. Note that in the biologically more realistic models of Figures 1c and 1d, both integrators receive the same inhibition from a pool of inhibitory neurons. The basal ganglia model has a surprising property that if the same inhibition, inh, is applied to all integrators, the activity of output nuclei does not change. Intuitively, this property is desirable for the basal ganglia network because, as we discussed in Section 2.2, the optimal choice criterion should be based on differences between integrated evidence for the alternatives, so increasing or decreasing the activity of all integrators should not affect the optimal choice criterion. This property is shown formally in Appendix E. 9 Bogacz and Larsen (submitted) have proposed another way in which the cortical integrators may compete with one another. In their model, the integration is performed via cortico-basal-ganglia-thalamic loops as shown in Figure 4. In this model the circuit of STN and GP (represented by a small circle in Figure 4) provides the effective inhibition, which is the same for all integrators. Using the property described in the previous paragraph, Bogacz and Larsen (submitted) have shown that the activity of output nuclei in the model of Figure 4 is exactly the same as in the model of Figure 2, thus the former also implements MSPRT. [Figure 4 around here] Let us now investigate the properties of the cortical integrators predicted by the model of Figure 4. Let us recall that in the model the activities of output nuclei are equal to (from Equations 8, 11, 12): OUTi = − log Pi (15) (recall that Pi denotes the probability of alternative i being correct given the sensory evidence). The integrators in Figure 4 receive input from the sensory neurons and effective inhibition from the output nuclei. If there is no sensory evidence coming, the activity of the integrators corresponding to alternative i is proportional to logPi (from Equation 15). Thus in this model the cortical integrators represent logPi in their firing rate, they update logPi according to new sensory evidence, and the basal ganglia continuously renormalizes the cortical activity such that Pi sum up to 1. 4.2. Representing prior probabilities Let us now investigate how one can incorporate into the model of Figure 4 the prior probabilities, i.e. expectations about correct alternative prior to stimulus onset (e.g. that may arise in perceptual choice task when one stimulus is presented on a greater fraction of trials than others). Before the sensory evidence is provided, Pi are equal to the prior probabilities. Hence, given the discussion in the previous paragraph, to utilise the prior information in the decision process, the initial values of the integrators (just before stimulus onset) should be equal to the logarithm of the prior probabilities. 4.3. Comparison with experimental data The theory reviewed in this section predicts that the activity of integrator neurons should be proportional to logPi. Two studies (Platt & Glimcher, 1999; Yang & Shadlen, 2007) have shown directly that activity of LIP neurons is modulated by Pi prior to and during the decision process respectively. But their published results do not allow us to distinguish if the LIP neurons represent logPi or some other function of Pi. Nevertheless, there are two less direct studies supporting the hypothesis that the activity of cortical integrators prior to stimulus onset is proportional to the logarithms of the prior probabilities. First, Carpenter and Williams (1995) have also proposed that the starting point of integration is proportional to the logarithm of the prior probability, on the basis of careful analysis of RTs. In their experiment, participants were required to make an eye movement to a dot appearing on the screen. Carpenter and Williams (1995) observed that the median RT to a dot appearing in a particular location was proportional to: median RT ~ –log Prior, 10 (16) where Prior was computed as the fraction of trials within a block in which the dot appears in this location. Now notice that if there were no noise in sensory evidence, then RT would always be constant and proportional to the distance between the starting point (firing rate of integrators at stimulus onset) and the threshold (firing rate at the moment of decision). If the noise is present, RT differs between trials, but the median RT is equal to the distance between the starting point and the threshold (this relation is exact in a particular model used by Carpenter and Williams (1995) and approximate in other models): median RT ~ Threshold – starting point (17) Putting Equations 16 and 17 together implies that the starting point is proportional to log Prior. Second, Basso and Wurtz (1998) recorded neural activity in the superior colliculus (a subcortical structure receiving input from cortical integrators) in a task in which the number of alternative choices N differed between blocks. Note that in this task Prior = 1/N, and thus log Prior = –log N. Figure 5 shows that the firing rate before stimulus onset in their task was approximately proportional to –log N, and hence to log Prior. [Figure 5 around here] 5. Discussion In this chapter we reviewed theories proposing that cortical networks and cortico-basalganglia circuit implement optimal statistical tests for decision making on the basis of noisy information. The predictions of these theories have been shown to be consistent with both behavioural and neurophysiological data. We have also reviewed a theory suggesting that the basal ganglia modulate cortical integrators allowing them to represent probabilities of corresponding alternatives being correct. In this section we discuss open questions. 5.1. Open questions Let us discuss five open questions relating to the theories reviewed in this chapter. 1. Is MSPRT a realistic model for the cortico-basal-ganglia circuit? Although Section 3.4 shows that a number of predictions of this model are consistent with existing data, more evidence needs to be provided. The model makes many more precise predictions. For example, it predicts that the total activity of all STN channels should be proportional to Equation 8; since this prediction concerns the activity of a substantial neuronal population, it could be verified with functional magnetic resonance imaging. 2. Is the integration of sensory evidence supported by feedback loops within the cortex (Figure 1) or by the cortico-basal-ganglia-thalamic loops (Figure 4)? These possibilities could be distinguished by deactivation of striatal neurons (in the relevant channels) during a decision task. Then, if information is integrated via the cortico-basal-gangliathalamic loops, the gradual increasing firing rates of cortical integrators should no longer be observed. 3. What type of inhibition is present between cortical integrators? In this chapter we reviewed three possible pathways by which cortical integrators can compete and inhibit one another, shown in Figures 1c, 1d and 4. It should be possible to distinguish between these models on the basis of future neurophysiological studies. For example, in tasks in 11 which the amount of evidence for the two alternatives can be varied independently, the feed-forward inhibition model predicts that the activity of the integrators should only depend on the difference between sensory evidence, while the mutual-inhibition model predicts that it should also depend on the total input to the integrators (Bogacz, 2007). 4. What else is represented by the cortical integrators? The theories reviewed in this chapter propose that the integrators encode integrated evidence or the probability of the corresponding alternative being correct. However, it is known that the firing rate of the cortical integrators is also modulated by other factors, e.g. motor preparation or general desirability of alternatives (Dorris & Glimcher, 2004). It would be interesting to investigate how these other factors can further optimize decision making. 5. What is the relationship between decision making and reinforcement learning in the basal ganglia? The basal ganglia are also strongly involved in reinforcement learning [see Michael’s chapter]. It would be interesting to integrate reinforcement learning and decision making theories (Bogacz & Larsen, submitted). Furthermore, it could be interesting to model the role of dopamine during decision process, and how it modulates information processing in the basal ganglia during choice. Appendices A. Diffusion model implements SPRT This appendix describes the relationship between SPRT and the diffusion model. First note, if we assume that xi(t) are sampled independently, then the likelihood of the sequence of samples is equal to the product of the likelihoods of individual samples: t P( x(1..t ) | H1 ) = ∏ P( x(τ ) | H1 ) , (A.1) τ =1 where x(τ) denotes a pair: x1(τ), x2(τ). Substituting Equation A.1 into 3 and taking the logarithm, we obtain: t log R = ∑ log τ =1 P(x(τ ) | H1 ) . P ( x(τ ) | H 2 ) (A.2) We will now illustrate how log R changes during the choice process. First note that according to Equation A.2, at each time step t, a term is added to log R which only depends on x(t). This term will be positive if x1(t) > x2(t), because then x(t) will be more likely given hypothesis H1 (stating that μ1 > μ2) than given hypothesis H2, thus P(x(t)|H1) > P(x(t)|H2). Hence in each time step, log R increases if x1(t) > x2(t), and analogously decreases if x1(t) < x2(t). Let us now derive the likelihood ratio given in Equation 3 for the hypotheses of Equation 2. Let us start with the numerator of Equation 3. Hypothesis H1 states that x1(t) come from a normal distribution with mean μ+, while x2(t) come from a normal distribution with mean μ– (see Equation 2). Thus denoting the probability density of normal distribution with mean μ and standard deviation σ by fμ,σ, Equation A.1 becomes: t P( x(1..t ) | H1 ) = ∏ f μ + ,σ (x1 (τ )) f μ − ,σ ( x2 (τ )) . τ =1 Using the equation for the normal probability density function, we obtain: 12 (A.3) t P(x(1..t ) | H1 ) = ∏ τ =1 ( ⎛ x (τ ) − μ + 1 exp⎜ − 1 ⎜ 2σ 2 2π σ ⎝ ) 2 ( ⎞ 1 ⎛ x (τ ) − μ − ⎟ exp⎜ − 2 ⎟ 2π σ ⎜ 2σ 2 ⎠ ⎝ ) 2 ⎞ ⎟ . (A.4) ⎟ ⎠ Performing analogous calculation for the denominator of Equation 3 we obtain very similar expression as in Equation A.4 but with swapped μ+ and μ–. Hence when we write the ratio of the numerator and the denominator, many terms will cancel. First, the constants in front of the exponents will cancel and we obtain: ( ) ( ) ( ) ( ) ⎛ x (τ ) − μ + exp⎜ − 1 ⎜ t 2σ 2 ⎝ R=∏ ⎛ x (τ ) − μ − τ =1 exp⎜ − 1 ⎜ 2σ 2 ⎝ − ⎛ ⎞ ⎟ exp⎜ − x2 (τ ) − μ ⎜ ⎟ 2σ 2 ⎝ ⎠ 2 + ⎛ ⎞ ⎟ exp⎜ − x2 (τ ) − μ ⎜ ⎟ 2σ 2 ⎝ ⎠ 2 ⎞ ⎟ ⎟ ⎠. 2 ⎞ ⎟ ⎟ ⎠ 2 (A.5) Second, if we expand the squares inside the exponents, and split the exponents using exp(a+b) = exp(a)exp(b), most of the terms will cancel, and we obtain: ⎛ μ + x1 (τ ) ⎞ ⎛ μ − x2 (τ ) ⎞ ⎜ ⎟ ⎜⎜ ⎟ exp exp ⎜ σ2 ⎟ t σ 2 ⎟⎠ ⎝ ⎠ ⎝ R=∏ . ⎛ μ − x1 (τ ) ⎞ ⎛ μ + x2 (τ ) ⎞ τ =1 ⎟⎟ exp⎜⎜ ⎟⎟ exp⎜⎜ 2 2 σ σ ⎝ ⎠ ⎝ ⎠ (A.6) Now if we join the terms including x1(τ) using exp(a)/exp(b) = exp(a–b), and do the same for the terms including x2(τ) we obtain: t R = ∏ exp( gx1 (τ )) exp(− gx2 (τ )) , where g = τ =1 μ+ − μ_ . σ2 (A.7) If we put the product inside the exponent, it becomes a summation, and using Equation 1 we obtain: R= exp( gy1 ) . exp( gy2 ) (A.8) In summary, there were many cancellations during the derivation, and comparing Equations 3 and A.8 reveals that the only term which remains from P(x(1..t)|Hi) is the exponent of the evidence integrated in favour of alternative i, and the exponentiation comes from the exponent in the normal probability density function. Taking the logarithm of Equation A.8 gives Equation 4. B. Diffusion model maximizes reward rate This appendix shows that the diffusion model with optimally chosen threshold achieves higher or equal reward rate than any other model of decision making. As mentioned in Section 2.2, decision making models exhibit a speed-accuracy tradeoff (controlled by the decision threshold). For a given task, let us denote the average reaction time of the diffusion model for a given accuracy by RTd(AC), and the reaction time of another model by RTa(AC). The optimality of SPRT implies that for any accuracy level AC: RTd(AC) ≤ RTa(AC). 13 (B.1) Similarly, let us denote the reward rates of the diffusion and the other model for a given level of accuracy by RRd(AC) and RRa(AC) respectively. Let us denote the accuracy level that maximizes the reward rate of the diffusion model by ACd, so from this definition for any accuracy level AC: RRd(AC) ≤ RRd(ACd). (B.2) Similarly, let us denote the accuracy maximizing the reward rate of the other model by ACa. We can now derive the following relationship between the maximum possible reward rates achieved by the two models: RRa ( ACa ) = ACa ACa ≤ = RRd ( ACa ) ≤ RRd ( ACd ) . RTa ( ACa ) + D RTd ( ACa ) + D (B.3) In the above derivation the first inequality comes from Inequality B.1, and the second inequality from Inequality B.2. Thus Inequality B.3 implies that the highest reward rate possible for the diffusion model, RRd(ACd), is higher or equal than the highest possible reward rate in any other model, RRa(ACa). C. MSPRT This appendix shows that the logarithm of the probability Pi defined in Equation 10 is expressed by Equation 11. We can compute Pi from the Bayes theorem: Pi ≡ P (H i | x(1..t )) = P ( x(1..t ) | H i )P (H i ) . P( x(1..t )) (C.1) As usual in statistical testing, we assume that one of the hypotheses Hi is correct, thus the probability of sensory evidence P(x(1..t)) is equal to the average probability of sensory evidence given the hypotheses (weighted by the prior probabilities of the hypotheses): Pi = P ( x(1..t ) | H i )P (H i ) ∑ P(x(1..t ) | H )P(H ) N j j =1 . (C.2) j Let us for simplicity assume that we do not have any prior knowledge favouring any of the alternatives, so the prior probabilities are equal (Section 4.2 shows how the prior probabilities can be incorporated). Then the prior probabilities in Equation C.2 cancel and it becomes: Pi = P( x(1..t ) | H i ) ∑ P(x(1..t ) | H ) N . (C.3) j j =1 Now, let us recall that in Appendix A we have already evaluated a very similar ratio for a similar set of hypotheses. Following calculations like those in Appendix A, analogous cancellations happen, and Equation C.3 becomes (cf. Equation A.8): Pi = exp(gyi ) ∑ exp(gy ) N j =1 , (C.4) j where g is a constant defined in Equation A.7. As we observed in Appendix A, the exponentiation of integrated evidence comes from the exponents present in the normal 14 probability density function. If we take the logarithm of Equation C.4, then the division becomes a subtraction: log Pi = log exp( gyi ) − log ∑ exp(gy j ) . N (C.5) j =1 Equation C.5 involves two terms: In the first term the logarithm cancels the exponentiation so it becomes the integrated evidence (scaled by constant g), while the second is the Conflict in evidence (scaled by g). For simplicity, we ignore the constant g, because its precise value has been shown to be of little importance for performance (Bogacz & Gurney, 2007), and then Equation C.5 becomes Equation 11. D. STN and GP can compute the optimal conflict In this appendix we show that if STN and GP have input-output relationship given by Equations 13 and 14, then the sum of the activities of STN channels is equal to Conflict. Let us denote the activities of channel i in STN and GP by STNi and GPi respectively. The STN receives input from cortical integrators and inhibition from GP, hence STN i = exp( yi − GPi ) . (D.1) Let us denote the sum of activity of all STN channels by Σ: N Σ = ∑ STN j . (D.2) j =1 In the model of Figure 2, GP receives input from all STN channels, hence GPi = Σ − log Σ . (D.3) Substituting Equation D.3 into D.1 gives STN i = exp( yi − Σ + log Σ ) . (D.4) Using the property of exponentiation ea+b=eaeb we obtain: STN i = exp( yi ) exp(− Σ )Σ . (D.5) Summing over i and using Equation D.2 we obtain: N Σ = ∑ exp( yi ) exp(− Σ )Σ . (D.6) i =1 Taking the logarithm of Equation D.6 we get: N log Σ = log ∑ exp yi − Σ + log Σ . (D.7) i =1 log Σ cancels on both sides in Equation D.7, and moving Σ on the left side we see that the sum of activities of all STN channels is equal to Conflict: N Σ = log ∑ exp yi . i =1 15 (D.8) E. Basal ganglia model is unaffected by inhibition of integrators This Appendix shows that if the same inhibition is applied to all cortical integrators projecting to the basal ganglia, the activity of the output nuclei in the model does not change. To see this property, we subtract inh from yi in the first line of Equation E.1 (cf. Equations 8, 12) and observe that inh cancels out (last Equality in Equation E.1) and the activity of the output nuclei does not change. ⎛ N ⎞ OUTi = −( yi − inh ) + log⎜⎜ ∑ exp( y j − inh )⎟⎟ = ⎝ j =1 ⎠ ⎛⎛ N ⎞ ⎞ = − yi + inh + log⎜ ⎜⎜ ∑ exp y j ⎟⎟ exp(− inh )⎟ = (E.1) ⎜ j =1 ⎟ ⎝ ⎠ ⎝ ⎠ N ⎛ ⎞ ⎛ N ⎞ = − yi + inh + log⎜⎜ ∑ exp y j ⎟⎟ + log exp(− inh ) = − yi + log⎜⎜ ∑ exp y j ⎟⎟. ⎝ j =1 ⎠ ⎝ j =1 ⎠ Acknowledgement This work has been supported by EPSRC grants EP/C514416/1 and EP/C516303/1. The author thanks Simon Chiu for reading the previous version of the manuscript and very useful comments. References Alexander, G. E., DeLong, M. R., & Strick, P. L. (1986). Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu Rev Neurosci, 9, 357-381. Basso, M. A., & Wurtz, R. H. (1998). Modulation of neuronal activity in superior colliculus by changes in target probability. J Neurosci, 18(18), 7519-7534. Baum, C. W., & Veeravalli, V. V. (1994). A sequential procedure for multihypothesis testing. IEEE Transactions on Information Theory, 40, 1996-2007. Bogacz, R. (2007). Optimal decision-making theories: linking neurobiology with behaviour. Trends Cogn Sci, 11(3), 118-125. Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: A formal analysis of models of performance in twoalternative forced choice tasks. Psychological Review, 113, 700-765. Bogacz, R., & Gurney, K. (2007). The basal ganglia and cortex implement optimal decision making between alternative actions. Neural Computation, 19, 442-477. Bogacz, R., & Larsen, T. (submitted). Towards integration of reinforcement learning and optimal decision making theories. Britten, K. H., Shadlen, M. N., Newsome, W. T., & Movshon, J. A. (1993). Responses of neurons in macaque MT to stochastic motion signals. Vis Neurosci, 10(6), 11571169. Carpenter, R. H., & Williams, M. L. (1995). Neural computation of log likelihood in control of saccadic eye movements. Nature, 377(6544), 59-62. 16 Chevalier, G., Vacher, S., Deniau, J. M., & Desban, M. (1985). Disinhibition as a basic process in the expression of striatal functions. I. The striato-nigral influence on tectospinal/tecto-diencephalic neurons. Brain Res, 334(2), 215-226. Cisek, P., & Kalaska, J. F. (2005). Neural correlates of reaching decisions in dorsal premotor cortex: specification of multiple direction choices and final selection of action. Neuron, 45(5), 801-814. Deniau, J. M., & Chevalier, G. (1985). Disinhibition as a basic process in the expression of striatal functions. II. The striato-nigral influence on thalamocortical cells of the ventromedial thalamic nucleus. Brain Res, 334(2), 227-233. Ditterich, J. (2006). Stochastic models of decisions about motion direction: behavior and physiology. Neural Netw, 19(8), 981-1012. Ditterich, J., Mazurek, M., & Shadlen, M. N. (2003). Microstimulation of visual cortex affect the speed of perceptual decisions. Nature Neuroscience, 6, 891-898. Dorris, M. C., & Glimcher, P. W. (2004). Activity in posterior parietal cortex is correlated with the relative subjective desirability of action. Neuron, 44(2), 365-378. Dragalin, V. P., Tertakovsky, A. G., & Veeravalli, V. V. (1999). Multihypothesis sequential probability ratio tests – part I: asymptotic optimality. IEEE Transactions on Information Theory, 45, 2448-2461. Frank, M. J. (2006). Hold your horses: a dynamic computational role for the subthalamic nucleus in decision making. Neural Netw, 19(8), 1120-1136. Frank, M. J., Samanta, J., Moustafa, A. A., & Sherman, S. J. (2007). Hold your horses: impulsivity, deep brain stimulation, and medication in parkinsonism. Science, 318(5854), 1309-1312. Frank, M. J., Seeberger, L. C., & O'Reilly, R. C. (2004). By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science, 306(5703), 1940-1943. Georgopoulos, A. P., DeLong, M. R., & Crutcher, M. D. (1983). Relations between parameters of step-tracking movements and single cell discharge in the globus pallidus and subthalamic nucleus of the behaving monkey. J Neurosci, 3(8), 1586-1598. Gerardin, E., Lehericy, S., Pochon, J. B., Montcel, S. T., Mangin, J. F., Poupon, F., et al. (2003). Foot, hand, face and eye representation in human striatum. Cerebral Cortex, 13, 162-169. Gold, J. I., & Shadlen, M. N. (2001). Neural computations that underlie decisions about sensory stimuli. Trends Cogn Sci, 5(1), 10-16. Gold, J. I., & Shadlen, M. N. (2002). Banburismus and the brain: decoding the relationship between sensory stimuli, decisions, and reward. Neuron, 36(2), 299-308. Gold, J. I., & Shadlen, M. N. (2007). The neural basis of decision making. Annu Rev Neurosci, 30, 535-574. Hallworth, N. E., Wilson, C. J., & Bevan, M. D. (2003). Apamin-sensitive small conductance calcium-activated potassium channels, through their selective coupling to voltage-gated calcium channels, are critical determinants of the precision, pace, and pattern of action potential generation in rat subthalamic nucleus neurons in vitro. J Neurosci, 23(20), 7525-7542. 17 Hanes, D. P., & Schall, J. D. (1996). Neural control of voluntary movement initiation. Science, 274(5286), 427-430. Hanks, T. D., Ditterich, J., & Shadlen, M. N. (2006). Microstimulation of macaque area LIP affects decision-making in a motion discrimination task. Nat Neurosci, 9(5), 682689. Heekeren, H. R., Marrett, S., Bandettini, P. A., & Ungerleider, L. G. (2004). A general mechanism for perceptual decision-making in the human brain. Nature, 431(7010), 859862. Heekeren, H. R., Marrett, S., & Ungerleider, L. G. (2008). The neural systems that mediate human perceptual decision making. Nat Rev Neurosci, 9(6), 467-479. Jahanshahi, M., Ardouin, C. M., Brown, R. G., Rothwell, J. C., Obeso, J., Albanese, A., et al. (2000). The impact of deep brain stimulation on executive function in Parkinson's disease. Brain, 123 ( Pt 6), 1142-1154. Kita, H., & Kitai, S. T. (1991). Intracellular study of rat globus pallidus neurons: membrane properties and responses to neostriatal, subthalamic and nigral stimulation. Brain Res, 564(2), 296-305. Laming, D. R. J. (1968). Information theory of choice reaction time. New York: Wiley. Mazurek, M. E., Roitman, J. D., Ditterich, J., & Shadlen, M. N. (2003). A role for neural integrators in perceptual decision making. Cereb Cortex, 13(11), 1257-1269. McMillen, T., & Holmes, P. (2006). The dynamics of choice among multiple alternatives. Journal of Mathematical Psychology, 50, 30-57. Mink, J. W. (1996). The basal ganglia: focused selection and inhibition of competing motor programs. Prog Neurobiol, 50(4), 381-425. Nambu, A., & Llinas, R. (1994). Electrophysiology of globus pallidus neurons in vitro. J Neurophysiol, 72(3), 1127-1139. Norris, D. (2006). The Bayesian reader: explaining word recognition as an optimal Bayesian decision process. Psychol Rev, 113(2), 327-357. Parent, A., & Hazrati, L. N. (1993). Anatomical aspects of information processing in primate basal ganglia. Trends Neurosci, 16(3), 111-116. Parent, A., & Smith, Y. (1987). Organization of efferent projections of the subthalamic nucleus in the squirrel monkey as revealed by retrograde labeling methods. Brain Res, 436(2), 296-310. Platt, M. L., & Glimcher, P. W. (1999). Neural correlates of decision variables in parietal cortex. Nature, 400, 233- 238. Ratcliff, R. (1978). A theory of memory retrieval. Psychol Rev, 83, 59-108. Ratcliff, R., Cherian, A., & Segraves, M. (2003). A comparison of macaques behavior and superior colliculus neuronal activity to predictions from models of two-choice decisions. J Neurophysiol, 90, 1392-1407. Ratcliff, R., Gomez, R., & McKoon, G. (2004). A diffusion model account of the lexical decision task. Psychological Review, 111, 159-182. 18 Ratcliff, R., & Rouder, J. N. (2000). A diffusion model account of masking in twochoice letter identification. Journal of Experimental Psychology: Human Perception and Performance, 26, 127-140. Ratcliff, R., & Smith, P. L. (2004). Comparison of sequential sampling models for twochoice reaction time. Psychological Review, 111(2), 333-367. Ratcliff, R., Thapar, A., & McKoon, G. (2003). A diffusion model analysis of the effects of aging on brightness discrimination. Perception and Psychophysics, 65, 523535. Redgrave, P., Prescott, T. J., & Gurney, K. (1999). The basal ganglia: a vertebrate solution to the selection problem? Neuroscience, 89(4), 1009-1023. Roitman, J. D., & Shadlen, M. N. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J Neurosci, 22(21), 9475-9489. Schall, J. D. (2001). Neural basis of deciding, choosing and acting. Nat Rev Neurosci, 2(1), 33-42. Shadlen, M. N., & Newsome, W. T. (2001). Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J Neurophysiol, 86(4), 1916-1936. Smith, P. L., & Ratcliff, R. (2004). Psychology and neurobiology of simple decisions. Trends in Neurosciences, 27(3), 161-168. Stone, M. (1960). Models for choice reaction time. Psychometrika, 25, 251-260. Thompson, K. G., Hanes, D. P., Bichot, N. P., & Schall, J. D. (1996). Perceptual and motor processing stages identified in the activity of macaque frontal eye field neurons during visual search. J Neurophysiol, 76(6), 4040-4055. Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: the leaky, competing accumulator model. Psychol Rev, 108(3), 550-592. Vickers, D. (1970). Evidence for an accumulator model of psychophysical discrimination. Ergonomics, 13, 37-58. Wald, A. (1947). Sequential Analysis. New York: Wiley. Wald, A., & Wolfowitz, J. (1948). Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics, 19, 326-339. Wang, X. J. (2002). Probabilistic decision making by slow reverberation in cortical circuits. Neuron, 36(5), 955-968. Williams, Z. M., Neimat, J. S., Cosgrove, G. R., & Eskandar, E. N. (2005). Timing and direction selectivity of subthalamic and pallidal neurons in patients with Parkinson disease. Exp Brain Res, 162(4), 407-416. Wilson, C. J., Weyrick, A., Terman, D., Hallworth, N. E., & Bevan, M. D. (2004). A model of reverse spike frequency adaptation and repetitive firing of subthalamic nucleus neurons. J Neurophysiol, 91(5), 1963-1980. Wong, K.-F., & Wang, X.-J. (2006). A recurrent network mechanism of time integration in perceptual decisions. Journal of Neuroscience, 26, 1314-1328. Yang, T., & Shadlen, M. N. (2007). Probabilistic reasoning by neurons. Nature, 447(7148), 1075-1080. 19 Figure 1 a) b) c) d) Sensory Integra. Sensory Integra. Figure 1. Connections between neuronal populations in cortical models of decision making a) Shadlen and Newsome (2001) model, b) Usher and McClelland (2001) model, c) feed-forward pooled inhibition model, d) Wang (2002) model. Open circles denote neuronal populations. Arrows denote excitatory connections, lines ended with circles denote inhibitory connections. Thus the open circles with selfexcitatory connections denote the integrator populations. Black and grey pathways correspond to neuronal populations selective for the two alternative choices. Figure 2 Sensory Integra. Striatum STN Conflict Out GP Figure 2. A subset of the cortico-basal-ganglia circuit required to implement MSPRT. Connectivity between areas based on Gurney et al. (2001). Pairs of circles correspond to brain areas: Sensory – sensory cortex encoding relevant aspects of stimuli (e.g. MT in motion discrimination task), Integra. – cortical region integrating sensory evidence (e.g. LIP in tasks with saccadic response), STN – subthalamic nucleus, GP – globus pallidus (or its homologue GPe in primates), Out – output nuclei: substantia nigra pars reticulate and entopeduncular nucleus (or its homologue GPi in primates). Arrows denote excitatory connections, lines ended with circles denote inhibitory connections. Black and grey pathways correspond to two sample channels. Figure 3 Firing rate [Hz] a) b) d) 300 300 300 250 250 250 250 200 200 200 200 150 150 150 150 100 100 100 100 50 50 50 50 0 0 0 100 200 Input current [pA] e) Firing rate [Hz] c) 300 0 0 100 200 300 Input current [pA] f) 300 300 250 250 250 200 200 200 150 150 150 100 100 100 50 50 50 0 200 400 Input current [pA] 100 200 Input current [pA] 0 100 200 Input current [pA] 0 0 100 Input current [pA] 200 g) 300 0 0 0 0 200 400 Input current [pA] 0 Figure 3. Firing rates f of STN neurons as a function of input current I. Panels (A-D) re-plot data on the firing rate of STN neurons presented in Hallworth et al. (2003) in Figures 4b, 4f, 12d, 13d respectively (control condition). Panels (E-G) re-plot the data from STN presented in Wilson et al. (2004) in Figures 1c, 2c, 2f respectively (control condition). Lines show best fit of the function f = a exp(b I) to the points with firing rates below 135 Hz. Figure 4 Sensory Integra. BG Out Thalam. Figure 4. Connectivity in the model of integration in cortico-basal-ganglia thalamic loops. Notation as in Figures 1 and 2. The dotted rectangle indicates the basal ganglia which are modelled in exactly the same way as in Figure 2, but here for simplicity not all nuclei are shown: small circle represents STN and GP, and cortical integrators send effective inhibition to the output nuclei via the striatum not shown for simplicity. Figure 5 100 Firing rate [Hz] 80 60 40 20 1 2 4 Number of alternatives 8 Figure 5. Firing rates f of superior colliculus neurons before stimulus onset as a function of the number of alternatives N. The circles correspond to the experimental data taken from Figure 4 of Basso and Wurtz (1998) averaged across two time periods (the position of each dot was computed from the heights of bars in the original figure as [black bar + 2*grey bar] / 3, because the grey bars were showing the average firing rate on 2 times longer intervals than the black bars). The solid line shows the best fitting logarithmic function f = a logN + b.