Long Term Memory – Reinforcement Learning
Task Name
Description
Cognitive Construct Validity
Neural Construct Validity
Sensitivity to Manipulation
Probabilistic Reversal Learning
Subjects select one stimulus from an
array of concurrently presented stimuli
(preferably 3); typically, more than one
set of discriminanda are presented, in a
randomized order across trials. Each
choice is followed by positive or
negative feedback (correct choice or
incorrect choice). The relationship
between the stimulus presented and the
feedback given can be deterministic or
probabilistic. Subjects must, based
upon feedback alone, learn which cue in
each discriminanda set is associated
with positive feedback and then
maintain optimized performance of the
rule (rule maintenance). Once the
subject meets pre-set performance
criteria, feedback is adjusted without
warning. In other words, the rules are
"reversed" in one or more of the
discriminanda sets. Subjects must
update their behavior based upon the
change in rules. Optimal reversal
performance is often considered to
involve cognitive control over pre-potent
responding.
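The procedure described above can be sketched as a small simulation. This is a minimal illustration, not the task's actual implementation: the simulated subject (a delta-rule learner with epsilon-greedy choice), the function name `run_reversal_task`, and all parameter values (80% valid feedback, a criterion of 8 consecutive correct choices) are assumptions made for the example. Only the overall structure (three concurrent stimuli, probabilistic feedback, unsignaled reversal once a criterion is met) comes from the description.

```python
import random

def run_reversal_task(n_trials=200, n_stimuli=3, p_valid_feedback=0.8,
                      criterion=8, learning_rate=0.3, epsilon=0.1, seed=0):
    """Simulate one discriminanda set with probabilistic feedback and
    unsignaled reversals. All defaults are illustrative assumptions."""
    rng = random.Random(seed)
    values = [0.5] * n_stimuli   # simulated subject's value estimate per stimulus
    rewarded = 0                 # index of the currently correct stimulus
    streak = 0                   # consecutive correct choices so far
    reversals = []               # trial indices on which the rule was reversed
    n_correct = 0
    for t in range(n_trials):
        # Epsilon-greedy choice: usually pick the highest-valued stimulus.
        if rng.random() < epsilon:
            choice = rng.randrange(n_stimuli)
        else:
            choice = max(range(n_stimuli), key=lambda i: values[i])
        correct = choice == rewarded
        # Probabilistic feedback: it matches the rule on p_valid_feedback of trials.
        valid = rng.random() < p_valid_feedback
        positive = correct if valid else not correct
        # Delta-rule update driven by feedback alone.
        values[choice] += learning_rate * ((1.0 if positive else 0.0) - values[choice])
        n_correct += int(correct)
        streak = streak + 1 if correct else 0
        if streak >= criterion:
            # Reversal without warning: a different stimulus becomes correct.
            rewarded = rng.choice([i for i in range(n_stimuli) if i != rewarded])
            streak = 0
            reversals.append(t)
    return {"accuracy": n_correct / n_trials, "reversals": reversals}

result = run_reversal_task()
```

Setting `p_valid_feedback=1.0` gives the deterministic variant of the task; the `reversals` list shows how many trials the simulated subject needed to re-reach criterion after each rule change.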
Acquisition of a multiple
choice visual discrimination is
believed to reflect implicit
learning processes. Optimal
performance of a visual
discrimination is thought to
reflect rule maintenance.
Reversal of a learned visual
discrimination is thought to
measure cognitive control over
pre-potent responding
(response inhibition).
Interactions between
declarative systems
(hippocampal and
prefrontal) and implicit
systems (striatum) are
predicted across initial
learning of a
discrimination, such
that increasing
activation in the
striatum, as a function
of trials performed,
occurs along with
learning. The
ventrolateral and
ventromedial frontal
cortex are particularly
involved in the
effective inhibition of
responses at reversal.
Evidence in both
animals and
humans for
change in
response to
pharmacological
manipulations
(Cools,
Altamirano, &
D'Esposito, 2006;
Cools, Lewis,
Clark, Barker, &
Robbins, 2007).
fMRI
(Waltz & Gold, 2007)
(Lee, Groman, London, & Jentsch,
2007)
(Frank & Claus, 2006) (Cools et al.,
2009; Robinson, Frank, Sahakian, &
Cools, 2010)
MANUSCRIPTS ON THE WEBSITE:
Cools, R., Frank, M. J., Gibbs, S. E.,
Miyakawa, A., Jagust, W., & D'Esposito,
M. (2009). Striatal dopamine predicts
outcome-specific reversal learning and
In humans, striatal
dopamine predicts
outcome specific
reversal learning
(Cools et al., 2009).
(Dias, Robbins, &
Roberts, 1996, 1997)
(Fellows & Farah,
2005)
(Fellows & Farah,
2003)
Relationships to Behavior and Schizophrenia
Individuals with
schizophrenia show
impairments in reversal
learning (Waltz & Gold,
2007).
Psychometrics
Stage of Research
Practice effects
are unknown, but
are the subject of
active
investigation.
Depending upon
how the test is
conducted, there
can be ceiling
effects for
discrimination
learning. The use
of multiple
discriminanda sets,
at least 3 stimuli in
each set and
probabilistic
feedback can
avoid this.
There is evidence
that this specific task
elicits deficits in
schizophrenia at the
behavioral level.
Unknown at the
neural level.
We need to assess
psychometric
characteristics such
as test-retest
reliability, practice
effects, and
ceiling/floor effects
for this task.
There is some
evidence that
performance on this
task changes in
response to
psychological or
pharmacological
intervention
(Cools et al., 2006;
Cools et al., 2007)
its sensitivity to dopaminergic drug
administration. J Neurosci, 29(5), 1538-1543.
Robinson, O. J., Frank, M. J., Sahakian,
B. J., & Cools, R. (2010). Dissociable
responses to punishment in distinct
striatal regions during reversal learning.
Neuroimage, 51(4), 1459-1467.
Probabilistic Selection Task
fMRI
The probabilistic selection (PS) task
measures participants' ability to learn
from positive and negative feedback,
both on a trial-to-trial basis, and in
integrating reinforcement probabilities
over many trials. It also probes
performance in a novel 'test' phase
which permits evaluation of whether participants
implicitly learned more from the positive
or negative outcomes of their decisions.
This "Go" or "NoGo" learning bias is
very sensitive to dopaminergic
manipulation and dopamine-related
genetics.
(Frank, Seeberger, & O'Reilly, 2004)
(Waltz, Frank, Robinson, & Gold,
2007a)
(Frank, Moustafa, Haughey, Curran, &
Hutchison, 2007)
MANUSCRIPTS ON THE WEBSITE:
Frank, M. J., Seeberger, L. C., &
O'Reilly, R. C. (2004). By carrot or by
stick: cognitive reinforcement learning in
parkinsonism. Science, 306(5703),
1940-1943.
Waltz, J. A., Frank, M. J., Robinson, B.
M., & Gold, J. M. (2007). Selective
reinforcement learning deficits in
schizophrenia support predictions from
computational models of striatal-cortical
dysfunction. Biological Psychiatry, 62(7), 756-764.
Performance in this task is
defined by the ability to
choose the probabilistically
optimal stimulus.
Relative positive and negative
feedback learning is similarly
modulated by dopamine
manipulation in this task and
others that are meant to
measure similar constructs
using different stimuli, motor
responses, and task rules.
Probabilistic positive
and negative feedback
learning are sensitive
to dopaminergic
manipulation.
Increases in
dopaminergic
stimulation, likely in
the striatum, lead to
better positive learning
but cause impairments
in negative feedback
learning. Dopamine
depletions, as in
Parkinson's disease
and older seniors
(greater than 70 years
of age) are associated
with relatively better
negative feedback
learning. Genes that
control dopamine
function in the striatum
are predictive of
probabilistic positive
and negative learning,
whereas genes that
control dopamine
function in prefrontal
cortex are predictive of
rapid trial-to-trial
learning from negative
feedback. Negative
feedback learning is
also associated with
enhanced error-
There are multiple
examples of how
performance is
sensitive to
pharmacological
manipulations of
the dopamine
system (Frank et
al., 2004;
Santesso et al.,
2009). Cavanagh,
Frank & Allen
(2010) reported
that performance
and associated
EEG measures
are affected by
social stress, in a
way that interacts
with both trait
vulnerability and
state variables
(negative affect)
(Cavanagh, Frank,
& Allen, 2010).
Petzold et al.
(2010) also
reported a
behavioral study in
which learning
from negative
feedback was
affected by stress
(Petzold, Plessow,
Goschke, &
Kirschbaum,
An initial behavioral study
showed that patients with
schizophrenia exhibit
selective impairments in
learning from positive but
not negative prediction
errors in this task (Waltz,
Frank, Robinson, & Gold,
2007b). Other imaging
studies (using a different
task) report that SZ
patients exhibit reduced
neural representation of
positive but not negative
prediction errors, including
in the striatum (Waltz et
al., 2009).
Practice effects
have been
assessed in Frank
& O'Reilly (2006).
On average
participants are
faster to learn the
task after multiple
sessions, but this
practice does not
affect relative
positive versus
negative feedback
learning.
(Frank & O'Reilly,
2006)
However, we have
found in other
unpublished data
that when the task
is run twice in a
row (in the same
session) there is
little
correspondence
between learning
bias in the two
iterations, and that
striatal dopamine
genetic
polymorphisms
are only predictive
of performance in
the first iteration
There is evidence
that this specific task
elicits deficits in
schizophrenia.
Data already exist
on psychometric
characteristics of this
task, such as test-retest reliability,
practice effects,
and ceiling/floor effects.
There is evidence
that performance on
this task can
improve in response
to psychological or
pharmacological
interventions.
related negativity
(brain potentials
originating from
anterior-cingulate
cortex) and activation
of this same region in
fMRI.
2010).
(Doll and Frank, in
review). It is likely
that repeated
performance of
the task allows
participants to use
other higher order
strategies once
they understand
the task structure,
such that single
gene effects on
simple aspects of
reward and
punishment
learning are no
longer visible, but
that larger effects
due to
pharmacological
manipulations
nevertheless
persist.
(Frank et al., 2007)
(Frank, Woroch, &
Curran, 2005)
(Klein et al., 2007)
(Frank & O'Reilly,
2006)
Corlett Associative Learning Task
fMRI
This task is a retrospective revaluation
paradigm in which expectations created
during a learning phase are violated to
produce a prediction error. Subjects are
asked to imagine themselves working
as an allergist confronted with a new
patient ‘Mr X’, who suffers allergic
reactions following some meals but not
others. Their task is to work out which
foods cause allergic reactions by
observing the consequences of eating
various foods. Trials consist of the
presentation of a food picture
(representing a meal eaten by Mr X), a
predictive response by the subject and,
following this, an outcome indicating
whether or not Mr X got sick. There
are three phases to the task. In Stage
1, participants learn the initial
expectancies. The key trial types in this
stage are four pairs of foods in which subjects
See task description.
A reliable marker for
prediction-error
processing has been
identified in right
prefrontal cortex
(rPFC) using fMRI
(Corlett et al., 2004;
Fletcher et al., 2001;
Turner et al., 2004).
Ketamine, a drug
which causes a
transient psychosis,
perturbs this brain
response in healthy
individuals (Corlett et
al., 2006).
Furthermore, the
magnitude of the
prediction-error
response under
placebo predicts an
Ketamine alters
causal associative
prediction learning
(Corlett et al.,
2006).
Psychotic individuals show
impaired causal
associative learning and
reduced violation of
expectancies (Corlett et al.,
2007).
Unknown
There is evidence
that this specific task
elicits deficits in
schizophrenia.
We need to collect
data on
psychometric
characteristics of this
task, such as test-retest reliability,
practice effects,
and ceiling/floor effects.
There is evidence
that performance on
this task can change
in response to
psychological or
pharmacological
interventions.
learn to expect that these food pairs will
always predict an allergic response. In
Stage 2, single foods, one from each of
the compounds presented in stage 1, are
presented with or without allergic outcomes. These trials are arranged to
cause retrospective revaluation of the
absent but expected cues from stage 1.
For example, in the backward blocking
condition one cue from a pair that had
previously caused an allergy is itself
paired with an allergy. This is
designed to create a downward
revaluation of the allergenic status of
the other cue. In the unovershadowing
condition, one cue from a pair that had
previously been paired with an allergy
is presented without an allergy
outcome. In Stage 3, the outcome of
some trials can violate the expectation
engendered by any retrospective
revaluation that occurred during
stage 2, while others fulfill the
predictions. Thus, stage 3 allows one to
compare brain activity during trials in
which prediction error should be larger
(backward blocked items in which an
allergic reaction occurred and
unovershadowed items in which an
allergic response did not occur) in
comparison to perfectly matched stimuli
from stage 1 in which prediction error
should be smaller (unovershadowed
items in which an allergic reaction
occurred and backward blocked items in
which an allergic reaction did not occur).
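The Stage 3 contrast can be illustrated with a simple prediction-error calculation, where the error is the observed outcome minus the expected outcome. The expectancy values below are hypothetical numbers chosen for the example (a lowered expectancy for the backward-blocked cue and a raised one for the unovershadowed cue); they are not data from the task, and the function names are illustrative.

```python
def prediction_error(outcome, expectancy):
    """Signed prediction error: observed outcome minus expected outcome."""
    return outcome - expectancy

# Hypothetical expectancies of an allergic reaction (0 to 1) after Stage 2:
# backward blocking revalues the absent cue downward, unovershadowing upward.
EXPECTANCY = {"backward_blocked": 0.3, "unovershadowed": 0.9}

def stage3_error_magnitudes(expectancy=EXPECTANCY):
    """Absolute prediction error for each cue type crossed with each outcome
    (1 = allergic reaction occurred, 0 = no reaction)."""
    return {
        (cue, outcome): abs(prediction_error(outcome, expected))
        for cue, expected in expectancy.items()
        for outcome in (1, 0)
    }

errors = stage3_error_magnitudes()
# Trials predicted to carry larger prediction error.
large = [errors[("backward_blocked", 1)], errors[("unovershadowed", 0)]]
# Matched trials predicted to carry smaller prediction error.
small = [errors[("unovershadowed", 1)], errors[("backward_blocked", 0)]]
```

Under these assumed expectancies, every value in `large` exceeds every value in `small`, which is the ordering the Stage 3 imaging contrast relies on.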
MANUSCRIPTS ON THE WEBSITE:
Corlett, P. R., Honey, G. D., Aitken, M.
R., Dickinson, A., Shanks, D. R.,
Absalom, A. R., et al. (2006). Frontal
responses during learning predict
vulnerability to the psychotogenic
effects of ketamine: linking cognition,
brain activity, and psychosis. Arch Gen
individual’s likelihood
of experiencing
delusional beliefs
under ketamine
(Corlett et al., 2007).
Psychiatry, 63(6), 611-621.
Corlett, P. R., Murray, G. K., Honey, G.
D., Aitken, M. R., Shanks, D. R.,
Robbins, T. W., et al. (2007). Disrupted
prediction-error signal in psychosis:
evidence for an associative account of
delusions. Brain, 130(Pt 9), 2387-2400.
REFERENCES:
Cavanagh, J. F., Frank, M. J., & Allen, J. J. (2010). Social stress reactivity alters reward and punishment learning. Soc Cogn Affect Neurosci.
Cools, R., Altamirano, L., & D'Esposito, M. (2006). Reversal learning in Parkinson's disease depends on medication status and outcome valence. Neuropsychologia, 44(10), 1663-1673.
Cools, R., Frank, M. J., Gibbs, S. E., Miyakawa, A., Jagust, W., & D'Esposito, M. (2009). Striatal dopamine predicts outcome-specific reversal learning and its sensitivity to dopaminergic drug
administration. J Neurosci, 29(5), 1538-1543.
Cools, R., Lewis, S. J., Clark, L., Barker, R. A., & Robbins, T. W. (2007). L-DOPA disrupts activity in the nucleus accumbens during reversal learning in Parkinson's disease.
Neuropsychopharmacology, 32(1), 180-189.
Corlett, P. R., Aitken, M. R., Dickinson, A., Shanks, D. R., Honey, G. D., Honey, R. A., et al. (2004). Prediction error during retrospective revaluation of causal associations in humans: fMRI
evidence in favor of an associative model of learning. Neuron, 44(5), 877-888.
Corlett, P. R., Murray, G. K., Honey, G. D., Aitken, M. R., Shanks, D. R., Robbins, T. W., et al. (2007). Disrupted prediction-error signal in psychosis: evidence for an associative account of
delusions. Brain, 130(Pt 9), 2387-2400.
Dias, R., Robbins, T. W., & Roberts, A. C. (1996). Dissociation in prefrontal cortex of affective and attentional shifts. Nature, 380, 69-72.
Dias, R., Robbins, T. W., & Roberts, A. C. (1997). Dissociable forms of inhibitory control within prefrontal cortex with an analog of the Wisconsin Card Sort Test: restrictions to novel situations
and independence from "on-line" processing. The Journal of Neuroscience, 17(23), 9285-9297.
Fellows, L. K., & Farah, M. J. (2003). Ventromedial frontal cortex mediates affective shifting in humans: evidence from a reversal learning paradigm. Brain, 126(Pt 8), 1830-1837.
Fellows, L. K., & Farah, M. J. (2005). Different underlying impairments in decision-making following ventromedial and dorsolateral frontal lobe damage in humans. Cereb Cortex, 15(1), 58-63.
Fletcher, P. C., Anderson, J. M., Shanks, D. R., Honey, R., Carpenter, T. A., Donovan, T., et al. (2001). Responses of human frontal cortex to surprising events are predicted by formal associative
learning theory. Nat Neurosci, 4(10), 1043-1048.
Frank, M. J., & Claus, E. D. (2006). Anatomy of a decision: striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychol Rev, 113(2), 300-326.
Frank, M. J., Moustafa, A. A., Haughey, H. M., Curran, T., & Hutchison, K. E. (2007). Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning. Proc Natl Acad
Sci U S A, 104(41), 16311-16316.
Frank, M. J., & O'Reilly, R. C. (2006). A mechanistic account of striatal dopamine function in human cognition: psychopharmacological studies with cabergoline and haloperidol. Behav
Neurosci, 120(3), 497-517.
Frank, M. J., Seeberger, L. C., & O'Reilly, R. C. (2004). By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science, 306(5703), 1940-1943.
Frank, M. J., Woroch, B. S., & Curran, T. (2005). Error-related negativity predicts reinforcement learning and conflict biases. Neuron, 47(4), 495-501.
Klein, T. A., Neumann, J., Reuter, M., Hennig, J., von Cramon, D. Y., & Ullsperger, M. (2007). Genetically determined differences in learning from errors. Science, 318(5856), 1642-1645.
Lee, B., Groman, S., London, E. D., & Jentsch, J. D. (2007). Dopamine D2/D3 receptors play a specific role in the reversal of a learned visual discrimination in monkeys.
Neuropsychopharmacology, 32(10), 2125-2134.
Petzold, A., Plessow, F., Goschke, T., & Kirschbaum, C. (2010). Stress reduces use of negative feedback in a feedback-based learning task. Behav Neurosci, 124(2), 248-255.
Robinson, O. J., Frank, M. J., Sahakian, B. J., & Cools, R. (2010). Dissociable responses to punishment in distinct striatal regions during reversal learning. Neuroimage, 51(4), 1459-1467.
Santesso, D. L., Evins, A. E., Frank, M. J., Schetter, E. C., Bogdan, R., & Pizzagalli, D. A. (2009). Single dose of a dopamine agonist impairs reinforcement learning in humans: evidence from
event-related potentials and computational modeling of striatal-cortical function. Hum Brain Mapp, 30(7), 1963-1976.
Turner, D. C., Aitken, M. R., Shanks, D. R., Sahakian, B. J., Robbins, T. W., Schwarzbauer, C., et al. (2004). The role of the lateral frontal cortex in causal associative learning: exploring
preventative and super-learning. Cereb Cortex, 14(8), 872-880.
Waltz, J. A., Frank, M. J., Robinson, B. M., & Gold, J. M. (2007a). Selective Reinforcement Learning Deficits in Schizophrenia Support Predictions from Computational Models of Striatal-Cortical Dysfunction. Biological Psychiatry, 62(7), 756-764.
Waltz, J. A., Frank, M. J., Robinson, B. M., & Gold, J. M. (2007b). Selective Reinforcement Learning Deficits in Schizophrenia Support Predictions from Computational Models of Striatal-Cortical Dysfunction. Biological Psychiatry.
Waltz, J. A., & Gold, J. M. (2007). Probabilistic reversal learning impairments in schizophrenia: further evidence of orbitofrontal dysfunction. Schizophrenia Research, 93(1-3), 296-303.
Waltz, J. A., Schweitzer, J. B., Gold, J. M., Kurup, P. K., Ross, T. J., Salmeron, B. J., et al. (2009). Patients with schizophrenia have a reduced neural response to both unpredictable and
predictable primary reinforcers. Neuropsychopharmacology, 34(6), 1567-1577.