PSY402
Theories of Learning
Friday
February 7, 2003
Discussion of Class Project



Handout on project – see me to get
one if you were not in class today.
Plan due next Wednesday – ½ page
briefly describing the behavior to be
changed and how you will do it.
IMPORTANT – do not use
punishment.

Do nothing that would harm yourself or
others.
Extinction of Operant Responding

Without reward, operant responding
gradually stops.


After it stops, with a timeout,
spontaneous recovery occurs.


Before it stops (is extinguished), it
temporarily increases.
This is similar to classical conditioning.
Without reward, spontaneous
recovery also goes away.
Hull’s Explanation


Environmental cues present during
nonrewarded behavior become
associated with the inhibitory state.
Example:




A rat runs down alley but gets no
reward.
An inhibition response is elicited.
Inhibition is associated with the alley.
The alley evokes inhibition next time.
Amsel’s Explanation

Amsel – nonreward elicits
frustration, an aversive state.



Environmental cues associated with
nonreward become able to elicit
frustration.
Escape from frustration is rewarded
because the animal feels better (relief).
The operant behavior stops being
performed because not responding avoids
(escapes) the frustration of nonreward.
Nonreward is Aversive

Adelman & Maatsch – animals jump
out of box associated with
nonreward:




Time to jump out: 5 sec if not rewarded,
20 sec if rewarded for jumping out.
Jumping extinguished after 60 trials if
rewarded, but continued for 100+ trials if
not rewarded.
Daly – nonreward cues are aversive

Motivate behavior to terminate cue.
Nonreward Can Increase Behavior


If frustration cues are associated
with appetitive instead of avoidance
behavior, responding increases.
Alternation of rewarded and nonrewarded trials:


Responding increases after a
nonrewarded trial, decreases after a
rewarded trial.
Capaldi – animals have a memory
for previous reward.
Resistance to Extinction

Three factors affect how quickly
extinction occurs:



Reward magnitude (in relation to
length of training)
Delay of reward experienced during
acquisition training.
Consistency of reinforcement during
acquisition training.
Reward Size

Effect on extinction depends on
number of learning trials:



With a few trials, higher reward leads
to slower extinction.
With extended training, high reward
leads to faster extinction.
D’Amato – anticipatory goal states
and frustration cause this shift.
Effects of Frustration



Frustration builds up when there is a
strong anticipatory goal response
(expectation of reward).
With small reward, there is little
anticipation and little frustration, so only
acquisition trials matter.
With more training and large reward,
greater anticipation leads to greater
frustration which leads to faster
extinction.
Effects of Delay and Consistency



Only variable delay (not constant
delay), when substantial (20-30
sec) makes extinction slower.
Intermittent reinforcement – if the
response was not reinforced every
time it occurred, extinction is
slower.
Partial Reinforcement Effect (PRE)



Extinction is slowest when behavior
was intermittently reinforced during
learning.
With humans, the lower the slot
machine payoff, the longer people
play (resistance to extinction).
But, if the percent of reinforced
trials is too low, rapid extinction
occurs (U-shaped relationship).
Explanations for PRE

Two explanations:



Amsel – frustration-based
Capaldi – sequential theory
Both provide good explanations for
observed data.
Amsel’s Frustration Theory


Frustration leads to rapid extinction
during continuous reinforcement.
During intermittent reinforcement,
frustration becomes associated with
responding.

Frustration then elicits, rather than
suppresses, responding.
Capaldi’s Sequential Theory


If reward follows a nonrewarded
trial, memory of the nonrewarded
trial is associated with responding.
During continuous reinforcement,
animals do not associate lack of
reward with responding.

When they encounter the first
nonrewarded trial, the state it produces
is not associated with responding.
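
The sequential idea above can be illustrated by counting how often a rewarded trial directly follows a nonrewarded one in a training schedule. This is only a minimal sketch: the function name and the two example schedules are hypothetical, and the count is a rough stand-in for "opportunities to associate the memory of nonreward with responding", not Capaldi's actual model.

```python
def count_nr_transitions(schedule):
    """Count trials where a reward (True) immediately follows a nonreward (False)."""
    return sum(1 for prev, cur in zip(schedule, schedule[1:]) if not prev and cur)


# Hypothetical schedules: True = rewarded trial, False = nonrewarded trial.
continuous = [True] * 10
partial = [True, False, True, False, False, True, True, False, True, True]

print(count_nr_transitions(continuous))  # 0
print(count_nr_transitions(partial))     # 3
```

Under the continuous schedule the count is zero, which matches the point that the memory of nonreward never gets paired with rewarded responding before extinction begins.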
Contingency Management



Assessment phase – determine the
frequency of behavior and the
situations in which it occurs.
Contracting phase – specifies the
relationship between responding
and reinforcement.
Management phase – implement
the contract and evaluate results.
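
For the assessment phase, one simple way to get the frequency of a behavior and the situations in which it occurs is to tally an observation log. The sketch below assumes a hypothetical log format (one entry per occurrence, recording the day and the situation); the entries are made-up placeholders, not data from the class.

```python
from collections import Counter

# Hypothetical observation log: one (day, situation) entry per occurrence
# of the target behavior. Replace with your own records.
log = [
    ("Mon", "studying"),
    ("Mon", "watching TV"),
    ("Tue", "studying"),
    ("Wed", "studying"),
    ("Wed", "with friends"),
]

# Frequency of the behavior per day.
per_day = Counter(day for day, _ in log)

# Situations in which the behavior occurs, most common first.
per_situation = Counter(situation for _, situation in log)

print(per_day)
print(per_situation.most_common())
```

The same tallies, repeated during the management phase, give a baseline to compare against when evaluating results.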