ACQ and the Basal Ganglia
Jimmy Bonaiuto
USC Brain Project
6/26/2007
Actor-Critic Learning
• Actor – learns action policy
• Critic – learns value functions
• Different actor-critic architectures have
been proposed for learning different value
functions:
– V(s) = State values (most common)
– V(a) = Action values
– Q(s,a) = State, action pair values
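The three critic variants differ mainly in what the value table is indexed by. The following is a minimal sketch, assuming tabular values, a discount factor `gamma`, and one-step temporal-difference (TD) errors. The function names are hypothetical and do not come from the slides, and treating V(a) as a table indexed by successive actions is an assumption.

```python
import numpy as np

def td_error_state(V, s, r, s_next, gamma=0.9):
    """TD error for a state-value critic V(s)."""
    return r + gamma * V[s_next] - V[s]

def td_error_action(V, a, r, a_next, gamma=0.9):
    """TD error for an action-value critic V(a), indexed by action only (assumed form)."""
    return r + gamma * V[a_next] - V[a]

def td_error_state_action(Q, s, a, r, s_next, a_next, gamma=0.9):
    """TD error for a state-action critic Q(s, a), SARSA-style."""
    return r + gamma * Q[s_next, a_next] - Q[s, a]

# Example: 3 states, 2 actions, all values initially zero.
V_s = np.zeros(3)
V_a = np.zeros(2)
Q = np.zeros((3, 2))
delta_s = td_error_state(V_s, s=0, r=1.0, s_next=1)
delta_a = td_error_action(V_a, a=0, r=1.0, a_next=1)
delta_q = td_error_state_action(Q, s=0, a=0, r=1.0, s_next=1, a_next=1)
```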
Actor-Critic Architecture
• Core Data – recording of midbrain dopaminergic
neurons in appetitive learning tasks (Schultz,
1992; Schultz, 1998)
(Figure from Barto, 1995)
Critic – V(s), V(a), or Q(s,a)?
• How do dopamine cells know about reward value?
– The largest input to the striatum is from cortex (Haber and Gdowski, 2004)
– V(s) and Q(s,a) learning may require the ventral striatum, SNc, and/or
VTA to receive a copy of the same cortical projections that the dorsal
striatum receives (state information)
– V(a) may only require a projection from the dorsal striatum or globus
pallidus (actor) to the ventral striatum, SNc and/or VTA (critic)
– The largest forebrain input to dopamine neurons is from the striatum (Haber and
Gdowski, 2004)
– V(a) may be more biologically plausible in terms of connectivity
Actor-Critic in the Basal Ganglia
• Dopamine targets (striatum) are the site of
value and policy learning (Suri & Schultz,
2001)
• The striatum is split into dorsal and ventral
divisions (some say dorsolateral and
ventromedial) (Voorn et al., 2004)
– Ventral striatum – inputs from limbic
structures (critic?)
– Dorsal striatum – connected with motor and
associative cortices (actor?)
Role of Dopamine
• Dopamine neurons in the ventral tegmental area (VTA) and substantia
nigra pars compacta (SNc) (Joel & Weiner, 2000)
– VTA projects to ventral striatum – learning state
values
– SNc projects to dorsal striatum – policy learning
• Little difference in VTA and SNc firing (Schultz et
al., 1993)
– Predicted by TD learning equation since the policy
and values are both updated using TD error
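The point that a single TD error drives both updates can be sketched as follows. This is a minimal tabular actor-critic step under stated assumptions: the learning rates `alpha_v` and `alpha_p`, the discount `gamma`, and the function name are illustrative and not taken from the slides.

```python
import numpy as np

def actor_critic_step(V, prefs, s, a, r, s_next,
                      alpha_v=0.1, alpha_p=0.1, gamma=0.9):
    """Update critic values V and actor preferences prefs in place.

    Both updates are scaled by the same TD error, which is why similar
    dopamine signals could serve value learning and policy learning.
    """
    delta = r + gamma * V[s_next] - V[s]  # TD error
    V[s] += alpha_v * delta               # critic: state-value update
    prefs[s, a] += alpha_p * delta        # actor: action-preference update
    return delta

# Example: 2 states, 2 actions, one rewarded transition.
V = np.zeros(2)
prefs = np.zeros((2, 2))
delta = actor_critic_step(V, prefs, s=0, a=1, r=1.0, s_next=1)
```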
ACQ
• Reinforcement learning should maximize total
utility, not necessarily total reward. Motivations
map outcomes to utilities (Niv et al., 2006)
• Multiple critics – one for each dimension of
interoception (hunger, thirst, etc.)
– Q(s_i, a), s_i = internal state, a = action
• Actor
– Composite policy
• Desirability – based on internal state
• Executability – based on environmental state
– Eligibility trace from mirror and canonical motor
signals
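One way to picture a composite policy is sketched below. The slides do not state how desirability and executability are combined, so the linear read-outs, the elementwise product, and the greedy choice are all assumptions, as are the names `W_i`, `W_e`, and `select_action`.

```python
import numpy as np

def select_action(W_i, W_e, internal_state, env_state):
    """Pick the action with the highest combined priority.

    W_i: (n_actions, n_internal) desirability weights (assumed linear)
    W_e: (n_actions, n_env) executability weights (assumed linear)
    """
    desirability = W_i @ internal_state    # driven by internal state
    executability = W_e @ env_state        # driven by environmental state
    priority = desirability * executability  # assumed combination rule
    return int(np.argmax(priority)), priority

# Example: action 0 is more desirable (hunger is high) but not executable,
# so the executable action 1 is selected instead.
W_i = np.array([[1.0, 0.0],    # action 0 satisfies hunger
                [0.0, 1.0]])   # action 1 satisfies thirst
W_e = np.array([[0.0],         # action 0 not afforded by the environment
                [1.0]])        # action 1 afforded
internal_state = np.array([0.9, 0.2])  # hunger, thirst
env_state = np.array([1.0])
chosen, priority = select_action(W_i, W_e, internal_state, env_state)
```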
ACQ – Actor/Multiple Critics
• x = executed action
• x̂ = recognized action
ACQ - Eligibility Trace
$\varepsilon = \hat{x} + \hat{x}x - (x - \hat{x}x)$
• x = executed action (from efference copy)
• x̂ = recognized action (from mirror system)

Action Outcome    x      x̂      ε
Not Attempted     0.0    0.0     0.0
Unsuccessful      1.0    0.0    -1.0
Unintended        0.0    1.0     1.0
Successful        1.0    1.0     2.0

The table shows idealized situations (perfect recognition). A realistic implementation would have confidence values between 0.0 and 1.0 for x and x̂, but the pattern of values for ε would be the same.
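The eligibility values for the four action outcomes can be checked with a short sketch. It assumes the eligibility signal has the form ε = x̂ + x̂x − (x − x̂x), which reproduces the idealized values 0, −1, 1, and 2. The function name and the table layout are illustrative.

```python
def eligibility(x, x_hat):
    """Eligibility from executed (x) and recognized (x_hat) action signals.

    Assumed form: recognized, plus recognized-and-executed,
    minus executed-but-not-recognized.
    """
    return x_hat + x_hat * x - (x - x_hat * x)

# Idealized cases (perfect recognition): outcome -> (x, x_hat, expected epsilon)
TABLE = {
    "Not Attempted": (0.0, 0.0, 0.0),
    "Unsuccessful":  (1.0, 0.0, -1.0),
    "Unintended":    (0.0, 1.0, 1.0),
    "Successful":    (1.0, 1.0, 2.0),
}

for outcome, (x, x_hat, expected) in TABLE.items():
    assert eligibility(x, x_hat) == expected, outcome

# Graded confidence values keep the same sign pattern.
graded_success = eligibility(0.8, 0.9)
graded_failure = eligibility(0.8, 0.1)
```

With graded inputs the function stays positive when recognition confidence is high and negative when an executed action is poorly recognized, matching the pattern of the idealized cases.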
ACQ - Weight Modification
• Desirability and Executability are updated
using the same eligibility and reinforcement
signals
• Requires different weight change rules:
– Desirability: $\Delta W_i \propto I(t)\,\hat{r}(t)\,\Theta(\varepsilon(t-1))\,\max(\hat{x}(t))$
• Step function Θ of the eligibility trace: makes the sign of the weight change depend on r̂(t)
• max(x̂(t)) term: don't update the value of the last action unless some action is currently recognized
– Executability: $\Delta W_e \propto E(t)\,(d + \hat{r}(t))\,\varepsilon(t)$
• Tonic dopamine level, d, added to the TD error: makes the sign of the weight change depend on ε(t)
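The two weight change rules can be sketched in code as below. This is a hedged illustration, not the author's implementation: it assumes I(t) and E(t) are internal-state and environmental-state input vectors, that weights are matrices of shape (actions × inputs), that the step function is 1 for strictly positive eligibility and 0 otherwise, and that a learning rate `alpha` scales both updates. All function and parameter names are hypothetical.

```python
import numpy as np

def step(z):
    """Step function (assumed): 1.0 where z > 0, else 0.0."""
    return (np.asarray(z) > 0).astype(float)

def update_desirability(W_i, I, r_hat, eps_prev, x_hat, alpha=0.1):
    """Desirability rule: sign of the change follows the TD error r_hat.

    The update is gated by max(x_hat), so the last action's value changes
    only if some action is currently recognized.
    """
    gate = np.max(x_hat)
    W_i += alpha * r_hat * gate * np.outer(step(eps_prev), I)
    return W_i

def update_executability(W_e, E, r_hat, eps, d=0.1, alpha=0.1):
    """Executability rule: tonic dopamine d is added to the TD error,
    so the sign of the change follows the eligibility eps."""
    W_e += alpha * (d + r_hat) * np.outer(eps, E)
    return W_e

# Example: 2 actions, 1 input unit for each weight matrix.
W_i = update_desirability(np.zeros((2, 1)), I=np.array([1.0]), r_hat=-0.5,
                          eps_prev=np.array([2.0, 0.0]),
                          x_hat=np.array([1.0, 0.0]))
W_e = update_executability(np.zeros((2, 1)), E=np.array([1.0]), r_hat=0.0,
                           eps=np.array([-1.0, 2.0]))
```

In the example, a negative TD error lowers the desirability of the eligible action, while executability falls for the unsuccessful action (ε = −1) and rises for the successful one (ε = 2) even with zero TD error.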
Multiple Critics – Q(s_i, a)
• Is there evidence for multiple critics gated by
interoceptive information?
– The lateral hypothalamus does project to the SNc, VTA, and the
ventral striatum (Saper et al., 1979; Fadel & Deutch, 2002; Brog
et al., 1993)
– The accumbens shell of the ventral striatum is reciprocally
connected with the lateral hypothalamus and has been called a
“sensory sentinel” or “visceral striatum” (Kelley, 1999, 2004)
– Motivational state, such as food deprivation, can influence the
magnitude of dopamine release in the ventral striatum (Wilson et
al., 1995; Ahn & Phillips, 1999)
– Sexual satiety is signaled by serotonin from the lateral
hypothalamus to the ventral striatum, which reduces dopamine
levels (Lorrain et al., 1999)
Internal State-Dependent Policy
• Is there evidence for internal state-dependent policies? (Kelley et al., 2005)
– Information from the lateral hypothalamus
reaches the dorsal striatum through the
paraventricular nucleus of the thalamus
– Hypothalamic-midline thalamic-striatal
projections carry internal state information to
cholinergic interneurons of the dorsal striatum
• These are thought to modulate dorsal striatal
output neurons
Eligibility Trace from the Mirror System
• What is the evidence for an eligibility signal from
mirror neurons?
– People can implicitly learn sequences through action
observation (Bird et al., 2005)
– The striatum is consistently implicated in implicit
sequence learning and the magnitude of activation is
correlated with reaction time improvement (Rauch et
al., 1997, 1998)
– The basal ganglia are active during action observation
(Frey & Gerry, 2006)
– Projection from ventral premotor cortex (including the
arcuate sulcus) to dorsal and ventral striatum in the
macaque (McFarland & Haber, 2000)
References
• Ahn S, Phillips AG (1999) Dopaminergic Correlates of Sensory-Specific Satiety in the Medial Prefrontal Cortex and Nucleus Accumbens of the Rat. The Journal of Neuroscience, 19: RC29: 1-6.
• Bird G, Osman M, Saggerson A, Heyes C (2005) Sequence learning by action, observation and action observation. British Journal of Psychology, 96: 371-388.
• Brog JS, Salyapongse A, Deutch AY, Zahm DS (1993) The patterns of afferent innervation of the core and shell in the Accumbens part of the rat ventral striatum: Immunohistochemical detection of retrogradely transported fluoro-gold. The Journal of Comparative Neurology, 338(2): 255-278.
• Fadel J, Deutch AY (2002) Anatomical Substrates of Orexin-Dopamine Interactions: Lateral hypothalamic projections to the ventral tegmental area. Neuroscience, 111(2): 379-387.
• Frey SH, Gerry VE (2006) Modulation of Neural Activity during Observational Learning of Actions and Their Sequential Orders. The Journal of Neuroscience, 26(51): 13194-13201.
• Haber SN, Gdowski MJ (2004) The basal ganglia. In: The human nervous system (Paxinos G, Mai JK, eds), Ed 2, pp. 676-738. New York: Elsevier Academic.
• Joel D, Weiner I (2000) The connections of the dopaminergic system with the striatum in rats and primates: An analysis with respect to the functional and compartmental organization of the striatum. Neuroscience, 96: 451-474.
• Kelley AE (1999) Functional Specificity of Ventral Striatal Compartments in Appetitive Behaviors. Annals of the New York Academy of Sciences.
• Kelley AE (2004) Ventral striatal control of appetitive motivation: role in ingestive behavior and reward-related learning. Neurosci Biobehav Rev, 27: 765-776.
• Kelley AE, Baldo BA, Pratt WE (2005) A proposed hypothalamic-thalamic-striatal axis for the integration of energy balance, arousal, and food reward. J Comp Neurol, 493(1): 72-85.
• Lorrain DS, Riolo JV, Matuszewich L, Hull EM (1999) Lateral Hypothalamic Serotonin Inhibits Nucleus Accumbens Dopamine: Implications for Sexual Satiety. The Journal of Neuroscience, 19(17): 7648-7652.
• McFarland NR, Haber SN (2000) Convergent Inputs from Thalamic Motor Nuclei and Frontal Cortical Areas to the Dorsal Striatum in the Primate. The Journal of Neuroscience, 20(10): 3798-3813.
• Niv Y, Joel D, Dayan P (2006) A normative perspective on motivation. Trends in Cognitive Sciences, 10(8): 375-381.
• Rauch SL, Whalen PJ, Savage CR, Curran T, Kendrick A, Brown HD, Bush G, Breiter HC, Rosen BR (1997) Striatal Recruitment During an Implicit Sequence Learning Task as Measured by Functional Magnetic Resonance Imaging. Human Brain Mapping, 5: 124-132.
• Rauch SL, Whalen PJ, Curran T, McInerney S, Heckers S, Savage CR (1998) Thalamic deactivation during early implicit sequence learning: a functional MRI study. NeuroReport, 9: 865-870.
• Saper CB, Swanson LW, Cowan WM (1979) An autoradiographic study of the efferent connections of the lateral hypothalamic area in the rat. J Comp Neurol, 183(4): 689-706.
• Schultz W (1992) Activity of dopamine neurons in the behaving primate. Seminars in the Neurosciences, 4: 129-138.
• Schultz W (1998) Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80: 1-27.
• Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. Journal of Neuroscience, 13: 900-913.
• Suri RE, Schultz W (2001) Temporal difference model reproduces predictive neural activity. Neural Computation, 13: 841-862.
• Voorn P, Vanderschuren LJ, Groenewegen HJ, Robbins TW, Pennartz CM (2004) Putting a spin on the dorsal-ventral divide of the striatum. Trends in Neurosciences, 27: 468-474.
• Wilson C, Nomikos GG, Collu M, Fibiger HC (1995) Dopaminergic correlates of motivated behavior: importance of drive. Journal of Neuroscience, 15: 5169-5178.