ACQ and the Basal Ganglia
Jimmy Bonaiuto
USC Brain Project
6/26/2007

Actor-Critic Learning
• Actor – learns action policy
• Critic – learns value functions
• Different actor-critic architectures have been proposed for learning different value functions:
  – V(s) = state values (most common)
  – V(a) = action values
  – Q(s,a) = state-action pair values

Actor-Critic Architecture
• Core data – recordings of midbrain dopaminergic neurons in appetitive learning tasks (Schultz, 1992; Schultz, 1998)
[Figure: actor-critic architecture (from Barto, 1995)]

Critic – V(s), V(a), or Q(s,a)?
• How do dopamine cells know about reward value?
  – The largest input to the striatum is from cortex (Haber and Gdowski, 2004)
  – V(s) and Q(s,a) learning may require the ventral striatum, SNc, and/or VTA to receive a copy of the same cortical projections that the dorsal striatum receives (state information)
  – V(a) may only require a projection from the dorsal striatum or globus pallidus (actor) to the ventral striatum, SNc, and/or VTA (critic)
  – The largest forebrain input to dopamine neurons is from the striatum (Haber and Gdowski, 2004)
  – V(a) may be more biologically plausible in terms of connectivity

Actor-Critic in the Basal Ganglia
• Dopamine targets (striatum) are the site of value and policy learning (Suri & Schultz, 2001)
• The striatum is split into dorsal and ventral divisions (some say dorsolateral and ventromedial) (Voorn et al., 2004)
  – Ventral striatum – inputs from limbic structures (critic?)
  – Dorsal striatum – connected with motor and associative cortices (actor?)
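Sketch: Actor-Critic with a Shared TD Error
The actor-critic scheme outlined on the preceding slides can be sketched in a few lines. This is only a minimal tabular illustration, assuming a V(s) critic, softmax action selection, and a hypothetical chain task with a reward at the right end. The function names, learning rates, and discount factor are illustrative assumptions and do not come from the slides.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def td_error(reward, v_state, v_next, gamma):
    """Temporal-difference error: reward plus discounted next value minus current value."""
    return reward + gamma * v_next - v_state


def actor_critic_step(V, prefs, s, a, reward, s_next, terminal,
                      alpha_critic=0.1, alpha_actor=0.1, gamma=0.9):
    """Update the critic V(s) and the actor preferences with one shared TD error."""
    v_next = 0.0 if terminal else V[s_next]
    delta = td_error(reward, V[s], v_next, gamma)
    V[s] += alpha_critic * delta        # critic: state-value learning
    prefs[s][a] += alpha_actor * delta  # actor: policy learning
    return delta


def run_chain(n_states=4, episodes=200, max_steps=50, seed=0):
    """Hypothetical toy task: walk a chain of states; reward 1.0 at the right end."""
    rng = random.Random(seed)
    V = [0.0] * n_states
    prefs = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _episode in range(episodes):
        s = 0
        for _step in range(max_steps):
            probs = softmax(prefs[s])
            a = 0 if rng.random() < probs[0] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            terminal = s_next == n_states - 1
            reward = 1.0 if terminal else 0.0
            actor_critic_step(V, prefs, s, a, reward, s_next, terminal)
            if terminal:
                break
            s = s_next
    return V, prefs
```

Calling run_chain() returns the learned state values and action preferences. Both tables are driven by the same error signal, which is the design choice this sketch is meant to make visible.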

Role of Dopamine
• Dopamine neurons in the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) (Joel & Weiner, 2000)
  – VTA projects to ventral striatum – learning state values
  – SNc projects to dorsal striatum – policy learning
• Little difference in VTA and SNc firing (Schultz et al., 1993)
  – Predicted by the TD learning equation, since the policy and the values are both updated using the TD error

ACQ
• Reinforcement learning should maximize total utility, not necessarily total reward. Motivations map outcomes to utilities (Niv et al., 2006)
• Multiple critics – one for each dimension of interoception (hunger, thirst, etc.)
  – Q(s_i, a), s_i = internal state, a = action
• Actor – composite policy
  – Desirability – based on internal state
  – Executability – based on environmental state
  – Eligibility trace from mirror and canonical motor signals

ACQ – Actor/Multiple Critics
[Figure: ACQ architecture with an actor and multiple critics; x = executed action, x̂ = recognized action]

ACQ – Eligibility Trace
• x = executed action (from efference copy)
• x̂ = recognized action (from mirror system)
• ε = eligibility, determined by x and x̂:

  Action outcome    x      x̂      ε
  Not attempted     0.0    0.0    0.0
  Unsuccessful      1.0    0.0   -1.0
  Unintended        0.0    1.0    1.0
  Successful        1.0    1.0    2.0

• The table shows idealized situations (perfect recognition). A realistic implementation would have confidence values between 0.0 and 1.0 for x and x̂, but the pattern of values for ε would be the same

ACQ – Weight Modification
• Desirability and executability are updated using the same eligibility and reinforcement signals
• They require different weight change rules (operators and Greek symbols restored from the slide annotations):
  – Desirability: $\Delta W_i = I(t)\,\hat{r}(t)\,\Theta(\varepsilon(t-1))\,\max \hat{x}(t)$
    • max x̂(t): don't update the value of the last action unless some action is currently recognized
    • Θ(ε(t−1)): step function of the eligibility trace – makes the sign of the weight change depend on r̂(t)
  – Executability: $\Delta W_e = E(t)\,(d + \hat{r}(t))\,\varepsilon(t)$
    • Tonic dopamine level, d, is added to the TD error r̂(t) – makes the sign of the weight change depend on ε(t)

Multiple Critics – Q(s_i, a)
• Is there evidence for multiple critics gated by interoceptive information?
  – The lateral hypothalamus projects to the SNc, VTA, and the ventral striatum (Saper et al., 1979; Fadel & Deutch, 2002; Brog et al., 1993)
  – The accumbens shell of the ventral striatum is reciprocally connected with the lateral hypothalamus and has been called a “sensory sentinel” or “visceral striatum” (Kelley, 1999, 2004)
  – Motivational state, such as food deprivation, can influence the magnitude of dopamine release in the ventral striatum (Wilson et al., 1995; Ahn & Phillips, 1999)
  – Sexual satiety is signaled by serotonin from the lateral hypothalamus to the ventral striatum, which reduces dopamine levels (Lorrain et al., 1999)

Internal State-Dependent Policy
• Is there evidence for internal state-dependent policies? (Kelley et al., 2005)
  – Information from the lateral hypothalamus reaches the dorsal striatum through the paraventricular nucleus
  – Hypothalamic-midline thalamic-striatal projections carry internal state information to cholinergic interneurons of the dorsal striatum
    • These are thought to modulate dorsal striatal output neurons

Eligibility Trace from the Mirror System
• What is the evidence for an eligibility signal from mirror neurons?
  – People can implicitly learn sequences through action observation (Bird et al., 2005)
  – The striatum is consistently implicated in implicit sequence learning, and the magnitude of activation is correlated with reaction time improvement (Rauch et al., 1997, 1998)
  – The basal ganglia are active during action observation (Frey & Gerry, 2006)
  – Ventral premotor cortex (including the arcuate sulcus) projects to dorsal and ventral striatum in the macaque (McFarland & Haber, 2000)

References
• Ahn S, Phillips AG (1999) Dopaminergic Correlates of Sensory-Specific Satiety in the Medial Prefrontal Cortex and Nucleus Accumbens of the Rat. The Journal of Neuroscience, 19: RC29, 1–6.
• Bird G, Osman M, Saggerson A, Heyes C (2005) Sequence learning by action, observation and action observation. British Journal of Psychology, 96: 371–388.
• Brog JS, Salyapongse A, Deutch AY, Zahm DS (1993) The patterns of afferent innervation of the core and shell in the Accumbens part of the rat ventral striatum: Immunohistochemical detection of retrogradely transported fluoro-gold. The Journal of Comparative Neurology, 338(2): 255–278.
• Fadel J, Deutch AY (2002) Anatomical Substrates of Orexin-Dopamine Interactions: Lateral hypothalamic projections to the ventral tegmental area. Neuroscience, 111(2): 379–387.
• Frey SH, Gerry VE (2006) Modulation of Neural Activity during Observational Learning of Actions and Their Sequential Orders. The Journal of Neuroscience, 26(51): 13194–13201.
• Haber SN, Gdowski MJ (2004) The basal ganglia. In: The human nervous system (Paxinos G, Mai JK, eds), Ed 2, pp. 676–738. New York: Elsevier Academic.
• Joel D, Weiner I (2000) The connections of the dopaminergic system with the striatum in rats and primates: An analysis with respect to the functional and compartmental organization of the striatum. Neuroscience, 96: 451–474.
• Kelley AE (1999) Functional Specificity of Ventral Striatal Compartments in Appetitive Behaviors. Annals New York Academy of Sciences.
• Kelley AE (2004) Ventral striatal control of appetitive motivation: role in ingestive behavior and reward-related learning. Neurosci Biobehav Rev, 27: 765–776.
• Kelley AE, Baldo BA, Pratt WE (2005) A proposed hypothalamic-thalamic-striatal axis for the integration of energy balance, arousal, and food reward. J Comp Neurol, 493(1): 72–85.
• Lorrain DS, Riolo JV, Matuszewich L, Hull EM (1999) Lateral Hypothalamic Serotonin Inhibits Nucleus Accumbens Dopamine: Implications for Sexual Satiety. The Journal of Neuroscience, 19(17): 7648–7652.
• McFarland NR, Haber SN (2000) Convergent Inputs from Thalamic Motor Nuclei and Frontal Cortical Areas to the Dorsal Striatum in the Primate. The Journal of Neuroscience, 20(10): 3798–3813.
• Niv Y, Joel D, Dayan P (2006) A normative perspective on motivation. Trends in Cognitive Sciences, 10(8): 375–381.
• Rauch SL, Whalen PJ, Savage CR, Curran T, Kendrick A, Brown HD, Bush G, Breiter HC, Rosen BR (1997) Striatal Recruitment During an Implicit Sequence Learning Task as Measured by Functional Magnetic Resonance Imaging. Human Brain Mapping, 5: 124–132.
• Rauch SL, Whalen PJ, Curran T, McInerney S, Heckers S, Savage CR (1998) Thalamic deactivation during early implicit sequence learning: a functional MRI study. NeuroReport, 9: 865–870.
• Saper CB, Swanson LW, Cowan WM (1979) An autoradiographic study of the efferent connections of the lateral hypothalamic area in the rat. J Comp Neurol, 183(4): 689–706.
• Schultz W (1992) Activity of dopamine neurons in the behaving primate. Seminars in the Neurosciences, 4: 129–138.
• Schultz W (1998) Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80: 1–27.
• Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. Journal of Neuroscience, 13: 900–913.
• Suri RE, Schultz W (2001) Temporal difference model reproduces predictive neural activity. Neural Computation, 13: 841–862.
• Voorn P, Vanderschuren LJ, Groenewegen HJ, Robbins TW, Pennartz CM (2004) Putting a spin on the dorsal-ventral divide of the striatum. Trends in Neurosciences, 27: 468–474.
• Wilson C, Nomikos GG, Collu M, Fibiger HC (1995) Dopaminergic correlates of motivated behavior: importance of drive. Journal of Neuroscience, 15: 5169–5178.
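
Sketch: ACQ Eligibility and Weight Updates
The ACQ eligibility-trace and weight-modification slides can be illustrated with a small sketch. Everything here is an assumption made for illustration. The eligibility function is a bilinear fit to the four idealized table entries (not executed/not recognized = 0, executed only = -1, recognized only = 1, both = 2). The two update rules assume that desirability changes as TD error times a step function of the previous eligibility times the strongest current recognition, and that executability changes as (tonic dopamine d + TD error) times the current eligibility. The learning rate alpha, the default value of d, and all function names are hypothetical.

```python
def eligibility(x, x_hat):
    """Eligibility from executed (x) and recognized (x_hat) action signals.

    Assumption: bilinear interpolation through the four idealized cases.
    """
    return x_hat - x + 2.0 * x * x_hat


def step(value):
    """Heaviside step, taken here as 1.0 for positive input and 0.0 otherwise."""
    return 1.0 if value > 0.0 else 0.0


def desirability_delta(internal_input, td_err, prev_eligibility, recognized,
                       alpha=0.1):
    """Assumed desirability rule: the sign follows the TD error.

    No change occurs unless the previous eligibility is positive and
    some action is currently recognized (max of `recognized` above zero).
    """
    gate = step(prev_eligibility) * max(recognized)
    return alpha * internal_input * td_err * gate


def executability_delta(env_input, td_err, elig, tonic_da=0.5, alpha=0.1):
    """Assumed executability rule: the sign follows the eligibility.

    This holds as long as tonic_da + td_err stays positive.
    """
    return alpha * env_input * (tonic_da + td_err) * elig
```

With perfect recognition, eligibility(1.0, 1.0) gives 2.0 for a successful action and eligibility(1.0, 0.0) gives -1.0 for an unsuccessful one, matching the table. Under these assumptions an unsuccessful attempt lowers executability even when the TD error is mildly negative, while desirability is left unchanged.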