Lecture 13

Schedules of reinforcement
Simple schedules of reinforcement
- CRF (continuous reinforcement)
- FR (fixed ratio)
- VR (variable ratio)
- FI (fixed interval)
- VI (variable interval)
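The simple schedules listed above can be sketched as small decision rules. This is a minimal illustration, not a standard implementation: the function names, the `now` argument (time of the response in seconds), and the choice of an exponential distribution for VI intervals are all assumptions made for the example. CRF is treated as FR 1.

```python
import random

def make_fixed_ratio(n):
    """FR n: every n-th response is reinforced (CRF is FR 1)."""
    count = 0
    def respond(now):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond

def make_variable_ratio(n, rng):
    """VR n: each response is reinforced with probability 1/n,
    so a reinforcer follows every n-th response on average."""
    def respond(now):
        return rng.random() < 1.0 / n
    return respond

def make_fixed_interval(t):
    """FI t: the first response at least t seconds after the
    last reinforcer is reinforced."""
    last = 0.0
    def respond(now):
        nonlocal last
        if now - last >= t:
            last = now
            return True
        return False
    return respond

def make_variable_interval(t, rng):
    """VI t: like FI, but the required wait is redrawn after each
    reinforcer and averages t seconds (exponential draw is an assumption)."""
    last = 0.0
    wait = rng.expovariate(1.0 / t)
    def respond(now):
        nonlocal last, wait
        if now - last >= wait:
            last = now
            wait = rng.expovariate(1.0 / t)
            return True
        return False
    return respond

# FR 10: responses 10, 20 and 30 are reinforced.
fr10 = make_fixed_ratio(10)
fr_hits = [i for i in range(1, 31) if fr10(i)]

# FI 60 s: only the first response after each 60 s interval pays off.
fi60 = make_fixed_interval(60)
fi_outcomes = [fi60(t) for t in [10, 30, 59, 61, 70, 125]]
```

On the ratio schedules only the response count matters; on the interval schedules extra responses before the interval has elapsed earn nothing.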
Response-rate schedules of reinforcement
- DRL (differential reinforcement of low rates)
- DRH (differential reinforcement of high rates)
Why do ratio schedules produce higher rates of
responding than interval schedules?
Inter-response time (IRT)
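A minimal sketch of how inter-response times can be computed from a list of response timestamps. The DRL and DRH rules shown here are a simplified, IRT-based reading of those schedules (reinforce only long IRTs, or only short IRTs); the function names, thresholds, and timestamps are hypothetical.

```python
def inter_response_times(response_times):
    """Time elapsed between each response and the one before it."""
    return [later - earlier
            for earlier, later in zip(response_times, response_times[1:])]

def drl_reinforced(irts, min_irt):
    """DRL (simplified): a response is reinforced only if its IRT
    is at least min_irt, which favors slow responding."""
    return [irt >= min_irt for irt in irts]

def drh_reinforced(irts, max_irt):
    """DRH (simplified): a response is reinforced only if its IRT
    is at most max_irt, which favors fast responding."""
    return [irt <= max_irt for irt in irts]

# Hypothetical response times in seconds.
times = [0.0, 2.0, 3.0, 9.0, 10.0]
irts = inter_response_times(times)
drl = drl_reinforced(irts, min_irt=5.0)
drh = drh_reinforced(irts, max_irt=1.5)
```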
Francis sells jewelry to a local gift shop. Each time he
completes 10 pairs of earrings, the shopkeeper pays him
for them. This is an example of a ______ schedule of
reinforcement.
A. Fixed ratio
B. Variable ratio
C. Fixed interval
D. Variable interval
Vernon is practicing his golf putting. On the average, it
takes him four tries before the ball goes in the hole. This
is an example of a ______ schedule of reinforcement.
A. Fixed ratio
B. Variable ratio
C. Fixed interval
D. Variable interval
Sandra’s mail is delivered every day at 10:00. She checks
her mailbox several times each morning, but only finds
mail the first time she checks after 10:00. This is an
example of a ______ schedule of reinforcement.
A. Fixed ratio
B. Variable ratio
C. Fixed interval
D. Variable interval
Paula is an eager third-grader, and loves to be called on
by her teacher. Her teacher calls on her approximately
once each period, although Paula is never sure when her
turn will come. This is an example of a ______ schedule
of reinforcement.
A. Fixed ratio
B. Variable ratio
C. Fixed interval
D. Variable interval
Concurrent schedules of reinforcement
Two schedules are in effect at the same time and the
subject is free to switch from one response alternative
to the other
[Figure: two response keys. Key A operates on Schedule A (VI 60 s); Key B operates on Schedule B (FR 10).]
Choice Behavior and the Matching Law
The Matching Law is a mathematical statement describing
the relationship between the rate of responding and the
rate of reward
- developed by Herrnstein
- the relative rate of responding on a particular lever
equals the relative rate of reinforcement on that lever
The Matching Law
Formula:
$$\frac{R_a}{R_a + R_b} = \frac{F_a}{F_a + F_b}$$
$R_a$ and $R_b$ = number of responses on schedules a and b
$F_a$ and $F_b$ = number (frequency) of reinforcers received as a
consequence of responding on schedules a and b
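As a worked example of the formula, the sketch below compares the observed relative rate of responding with the relative rate of reinforcement. The session counts are made up for illustration, and the function name is an assumption.

```python
def relative_rate(a, b):
    """Share of the total that belongs to alternative a: a / (a + b)."""
    return a / (a + b)

# Hypothetical session totals.
responses_a, responses_b = 1500, 500      # Ra, Rb
reinforcers_a, reinforcers_b = 45, 15     # Fa, Fb

observed = relative_rate(responses_a, responses_b)       # Ra / (Ra + Rb)
predicted = relative_rate(reinforcers_a, reinforcers_b)  # Fa / (Fa + Fb)

# Perfect matching means the two proportions are equal.
matches = abs(observed - predicted) < 1e-9
```

With these numbers both sides come to 0.75, so the hypothetical subject matches exactly.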
The Matching Law
Herrnstein found that pigeons matched their responses on
a given key to the relative frequency of reinforcement for
that key
That is, the # of pecks on Key A relative to the # pecks on
key B matched the # of rewards earned on schedule A
relative to the # of rewards earned on schedule B
Similar formulas apply, with similar results, for:
- magnitude of reward
- immediacy/delay of reward
Evaluation of the Matching Law
The matching law provides an accurate description of
choice behavior in many situations, but there are
exceptions and problems
- overmatching
- undermatching
- bias
- ratio versus interval schedules
Overmatching
- higher rate of responding for the better of the two
schedules than the matching law predicts
- overmatching occurs when it is costly for a subject
to switch to the less preferred response alternative
(e.g., when the two levers are far apart)
Undermatching
- occurs when the subject responds less than
predicted on the advantageous schedule
- absolute versus relative value of the amount or
frequency of reward
• for example, the matching law predicts subjects
should make same choice when reward magnitudes
are 5 versus 3, as when the magnitudes are 10
versus 6, or 100 versus 60
• however, when absolute values are increased, the
matching law is not always accurate
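The prediction in the example above can be checked with a little arithmetic. This sketch assumes the magnitude version of the matching law has the same form as the frequency version, with reward magnitudes in place of reinforcer counts; the function name is hypothetical.

```python
def predicted_choice_proportion(magnitude_a, magnitude_b):
    """Matching-law prediction for the share of responses on a,
    using reward magnitudes in place of reinforcer frequencies."""
    return magnitude_a / (magnitude_a + magnitude_b)

# The three magnitude pairs from the slide share the same 5:3 ratio.
pairs = [(5, 3), (10, 6), (100, 60)]
predictions = [predicted_choice_proportion(a, b) for a, b in pairs]
```

All three predictions are 0.625, because the formula depends only on the ratio of the two magnitudes and not on their absolute size.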
Experiment by Logue & Chavarro (1987)
- varied absolute reward magnitude but kept ratio at 3:1
for the left key and the right key
- the authors found that the proportion of
responses devoted to the better choice declined as the
absolute values of the reward increased
• response on left key = 3 grains/pellets of food
• response on right key = 1 grain/pellet of food
• the matching law worked in this example, but then
they increased the absolute value of reward
• response on left key = 30 grains/pellets of food
• response on right key = 10 grains/pellets of food
• in this example the animals responded more on the
right key than the matching law would predict
Bias
- subject may have a special affinity or preference
for one of the choices
- a rat may prefer the R lever over the L lever or a
pigeon may prefer a red key over a green key
Ratio versus interval schedules
- animals do not match when given concurrent ratio
schedules
Theories of Matching
- the matching law is merely a description of behavior
- it does not say why a subject behaves the way it does
- there are two main explanations of why animals match
• maximization
• melioration
Maximization
- subjects attempt to maximize the rate of reinforcement
- animals have evolved to perform in a manner that yields
the greatest rate of reinforcement
- can explain why subjects match with concurrent VI-VI
schedules but not with concurrent ratio schedules
- molecular and molar maximizing theories
• according to molecular theories, animals choose
whichever response alternative is most likely to be
reinforced at that time
• according to molar theories, animals distribute their
choices to maximize reward over the long run
Melioration
- ‘make better’
- melioration mechanisms work on a time scale that is not
molecular or molar
- matching behavior occurs because the subject is
continuously choosing the more promising option, that is,
the schedule with the momentarily higher rate of
reinforcement
- subjects are continuously attempting to better their
current chances of receiving reward by switching to the
other choice
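The idea that repeatedly shifting toward the locally better option ends in matching can be illustrated with a toy model. This is a sketch under strong assumptions, not a published formulation: it assumes concurrent VI-VI schedules on which each alternative delivers reinforcers at its programmed overall rate however time is divided, so the local rate on an alternative is its overall rate divided by the share of time spent there. The function name, step size, and rates are hypothetical.

```python
def meliorate(rate_a, rate_b, p=0.5, step=0.001, n_steps=5000):
    """Shift the share of time on alternative a (p) toward whichever
    alternative currently has the higher local reinforcement rate."""
    for _ in range(n_steps):
        local_a = rate_a / p            # reinforcers per unit time spent on a
        local_b = rate_b / (1.0 - p)    # reinforcers per unit time spent on b
        if local_a > local_b:
            p = min(p + step, 1.0 - step)   # a looks better: spend more time on a
        elif local_b > local_a:
            p = max(p - step, step)         # b looks better: spend more time on b
    return p

# Hypothetical schedules: VI 60 s (60 reinforcers/hour) and VI 180 s (20/hour).
final_share = meliorate(rate_a=60.0, rate_b=20.0)
matching_share = 60.0 / (60.0 + 20.0)
```

In this toy model the shifting stops only when the two local rates are equal, which happens when the share of time on a equals its share of the reinforcers (0.75 here), the allocation the matching law describes.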
Choice with Commitment
In a standard concurrent schedule of reinforcement, two
(or more) response alternatives are available at the same
time and the subject is free to switch from one to the other
at any time
However, in some (real-life) situations, choosing one
alternative makes other alternatives unavailable
In these cases, the choice may involve assessing complex,
long-range goals
These types of situations can be studied in the lab using a
concurrent-chain schedule of reinforcement
Concurrent-chain schedule
[Figure: over time, the choice link (keys A and B) is followed by the terminal link, in which either reinforcement schedule A (VR 10) or reinforcement schedule B (FR 10) is in effect.]
Pecking the left key in the choice link puts into effect reinforcement
schedule A in the terminal link. Pecking the right key in the choice
link puts into effect reinforcement schedule B in the terminal link.
Self-Control
Concurrent chain schedules have been used to study
‘self-control’ in the lab
e.g., choosing a large delayed reward over an
immediate small reward
With direct choice procedures, animals often lack
self-control. That is, they choose the immediate, but
smaller reward
With concurrent-chain procedures, animals do show
self-control. That is, they choose the larger, but delayed
reward
[Figure, two panels showing alternatives A and B over time, each leading to a small reward or a large reward, with delays before reward.
Direct-choice procedure: pigeon chooses the immediate, small reward.
Concurrent-chain procedure: pigeon chooses the schedule with the delayed, larger reward.]
Chapter 7
The Associative Structure of Instrumental Conditioning
Instrumental conditioning permits the development of
several types of associations
[Diagram: the three elements S, R, and O]
The instrumental response (R) occurs in the presence of
distinctive stimuli (S) and results in the delivery of the
outcome (O)
• S-R
• S-O
• R-O
The S-R Association and the Law of Effect
According to Thorndike, animals form an S-R association
- an association between the stimuli present in the
experimental situation and the instrumental response
Law of Effect
- according to the law of effect, the role of the
reinforcer (or response outcome) is to ‘stamp in’ an
association between the contextual cues (S) and the
instrumental response (R)
- an important implication of the Law of Effect is that
instrumental conditioning does not involve learning
about the reinforcer
Expectancy of Reward and the S-O Association
It seems intuitive to think that instrumental conditioning
would involve the subject learning to expect the reinforcer
However, Thorndike and Skinner did not talk about the
cognitive notion of an expectancy
The idea that reward expectancy may motivate
instrumental behavior came from developments in
Pavlovian conditioning
In Pavlovian conditioning, animals learn about stimuli
that signal some important event
One way to look for reward expectancy is to consider how
Pavlovian processes might be involved in instrumental
conditioning
Modern Two-Process Theory
The instrumental response is motivated by two factors
- first, the presence of S comes to evoke the response
directly, through a Thorndikian S-R association
- second, the instrumental response comes to be made
in response to the expectancy of reward because of an
S-O association
- through the S-O association, S comes to motivate the
instrumental behavior by activating a central emotional
state
- the implication is that the rate of an instrumental
response will be modified by the presentation of a
classically conditioned stimulus
Modern Two-Process Theory
Studies that evaluate modern two-process theory
employ a transfer-of-control experimental design
- phase 1 = operant conditioning
- phase 2 = Pavlovian conditioning
- phase 3 = transfer phase
- the subjects are allowed to engage in the
instrumental response and the CS from phase 2
is periodically presented to observe its effect on
the rate of the instrumental response
- where have we seen this before?
- CER (Conditioned Emotional Response) procedure
Evidence of R-O Associations
Neither the S-R nor the S-O association involves a direct
link between the response (R) and the outcome (O), but an R-O
association intuitively makes sense
A common technique for assessing R-O associations
involves devaluing the reinforcer after conditioning
to see if this decreases the instrumental response
Read experiment by Colwill & Rescorla (1986) described
on pp. 197-98 of textbook