PSY304 Test 2 Review: Reinforcement
Psychology 304
Learning
James T. Todd, Ph.D.
Eastern Michigan University
Operant Conditioning
(also Instrumental Conditioning, Trial and
Error, and Type-R)
1. The reflex (S-R) is defined by association.
2. The operant is defined by its consequences.
3. A response that occurs because it was
reinforced in the past is an operant.
The Operant
R ——> S^r+
(Response —> Consequence)
“Law of Effect”
The future probability of a response is dependent
on the effects of past occurrences.
First formally investigated by Edward Thorndike.
His 1911 book on this, Animal Intelligence, can be
found free online. It is based on his 1898 article of
the same title.
The “Three-Term Contingency”
S^d ——> R ——> S^r+
(Discriminative Stimulus —> Response —> Consequence)
Discriminative Stimulus: Signals the kind of consequence.
Response: What the organism does.
Consequence: What happens after the organism emits the response.
(We say that operant responses are “emitted.”)
The “Three-Term Contingency”
S^d ——> R ——> S^r+
(Discriminative Stimulus —> Response —> Consequence)
Based on B.F. Skinner’s work, first compiled in The Behavior of
Organisms: An Experimental Analysis (1938), then expanded throughout
his life.
Later, we will learn about “schedules of reinforcement,” also based on
Skinner’s work in the 1950s.
Kinds of Consequences
S^d ——> R ——> S^r+
(Discriminative Stimulus —> Response —> Consequence)
Positive reinforcer: increases the probability of the response by its occurrence.
Negative reinforcer: increases the probability of the response by its removal.
Positive punisher: decreases the probability of the response by its occurrence.
Negative punisher: decreases the probability of the response by its removal.
Kinds of Discriminative Stimuli
S^d ——> R ——> S^r+
(Discriminative Stimulus —> Response —> Consequence)
Sd or S+: Generally indicates the response will be reinforced.
S∆ or S- or S-delta: Generally indicates the response will be punished
or not reinforced.
Effects of Single Reinforcement
(one powerful reinforcer can produce hundreds
of responses in extinction)
Superstitious Behavior
Superstitious behavior: Behavior that is controlled by accidental pairings
of consequences and responses.
Usually, we think of this as behavior accidentally reinforced.
But, it can be behavior accidentally punished or extinguished.
Superstitious behavior is generally unstable, because the reinforcers or
punishers aren’t consistently paired with the responding.
Operant Basics
Extinction: Removing the reinforcing consequence to
decrease the probability of the response.
[Figure: idealized extinction curve; response strength (y-axis, 0 to 10) declines to zero across sessions (x-axis, 0 to 10)]
Real Extinction
There is usually a temporary increase in rate, and
occasionally increases throughout extinction. These are
called “extinction bursts.”
[Figure: extinction curve with three labeled "bursts," temporary increases in response strength during extinction, across sessions 0 to 20]
Response Shaping
Differential Reinforcement of Successive Approximations.
Sometimes called “hand shaping.”
Now we usually do “clicker training.” This is associated with
Karen Pryor in the 1960s, but was first written about by Skinner in 1950.
You pair a click with food using Pavlovian Conditioning.
Then, you use the click to gradually shape the behavior you want by
approximating it in steps.
Some additional reinforcement terms
Primary reinforcer: A reinforcer that acts without
previous experience with it (e.g., food for a hungry
organism).
Conditioned/secondary reinforcer: A reinforcer
that acts due to previous learning or experience (e.g.,
money).
Generalized reinforcer: a reinforcer that acts on a
broad range of responses under a variety of conditions.
You get what the contingency is on.
Reinforcing variability
Barry Schwartz argued that contingent reinforcement produces
“behavioral stereotypy.”
Behavioral stereotypy is behavior that is the same time after time
and hard to change.
Schwartz suggested using more verbal instruction and less contingent
rewards in teaching to avoid behavioral stereotypy and increase
creativity.
However, Schwartz required that the pigeons’ responses not only be different, but
also reach the goal, thereby punishing too much variability. Thus, his
procedure did not produce variability.
Schedules of Reinforcement
Continuous reinforcement (CRF): Each response is reinforced.
• CRF produces rapid acquisition and low resistance to extinction.
• The rate of responding is controlled by the time required to consume the reinforcer.
Fixed Ratio reinforcement (FR): Responses are reinforced after a fixed number are emitted.
• FR produces a “pause and run” pattern of responding.
Fixed Interval reinforcement (FI): The first response after a specified interval is
reinforced.
• FI produces a “scalloped” or accelerated pattern.
As the time of the reinforcement approaches,
the responding becomes faster.
• FI responding usually begins about half-way through the interval.
Variable Ratio (VR): The number of responses required to produce a reinforcer varies
according to a mathematical distribution, usually random.
• VR produces a high, steady rate of responding. Slot machines are on VR.
Variable Interval (VI): The first response after variable intervals is reinforced. The
distribution of intervals is usually random, but can follow a range of mathematical functions.
• VI produces a moderate, steady rate of responding.
• VI is useful as a baseline for studies of other effects because changes in response rate need not affect
reinforcement rate very strongly.
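That last point can be shown in a rough simulation. The sketch below (the function name, parameters, and exponential timing assumptions are my own, not standard lab software) models a VI schedule in which a reinforcer is "set up" after a random interval and delivered for the first response after setup. A tenfold change in response rate changes the number of reinforcers earned only modestly.

```python
import random

def variable_interval(mean_interval, session_time, response_rate, seed=0):
    """Simulate a VI schedule (illustrative sketch).

    A reinforcer is 'set up' after a random interval (exponential,
    mean = mean_interval seconds); the first response after setup is
    reinforced. Responses occur at random times with the given mean
    rate (responses per second). Returns reinforcers earned.
    """
    rng = random.Random(seed)
    t, reinforcers = 0.0, 0
    next_setup = rng.expovariate(1.0 / mean_interval)
    while t < session_time:
        t += rng.expovariate(response_rate)   # wait for the next response
        if t >= next_setup:                   # first response after the interval
            reinforcers += 1
            next_setup = t + rng.expovariate(1.0 / mean_interval)
    return reinforcers

# One-hour session on VI 30 s: a 10x change in response rate
# produces only a small change in reinforcement rate.
slow = variable_interval(30, 3600, response_rate=0.2)  # ~1 response / 5 s
fast = variable_interval(30, 3600, response_rate=2.0)  # ~2 responses / s
```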
Schedules of Reinforcement
[Figure: cumulative records of responding under the four basic schedules; from ProProfs flashcards]
Some Other Schedules
Fixed Time (FT): A reinforcer is delivered entirely on the basis of time, regardless of the
activity of the organism.
Variable Time (VT): A reinforcer is delivered entirely on the basis of time, but the time
varies according to a mathematical distribution.
Differential Reinforcement of Other Behavior (DRO): A reinforcer is delivered after a
specified interval without a specific “target” response.
• DRO is used to eliminate a response without punishment.
Differential Reinforcement of Low Rates (DRL): A reinforcer is delivered if a response
is emitted after a specified interval has elapsed.
• DRL is used to reduce the rate of a response, but not eliminate it.
Differential Reinforcement of High Rates (DRH): Reinforcement is programmed to
reinforce rates above a certain value.
Progressive Ratio (PR): The value of the ratio changes systematically in one
direction, up or down.
• PR schedules are used to test the motivation to emit responses—how much work the
organism will do for the food.
Some Other Schedule Concepts
Limited Hold (LH): A limited period of time during which a reinforcer is available on interval
schedules.
• An LH is added to increase the rate of responding and engagement
with interval schedules.
• You would generally write something like FI 20 sec (LH 5) if a
reinforcer was available for only five seconds after the main 20
second FI interval elapsed.
Adjusting Schedule: Any schedule in which the value required changes.
PR is a type of adjusting schedule.
Post-Reinforcement Pause (PRP): The amount of time the organism
pauses after a reinforcer is delivered.
• Usually a consideration in fixed schedules.
Local Rate of Responding: The response rate in a particular part of a
schedule performance, such as the rate of the run in an FR schedule.
Matching Law
• Matching Law: Behavior is distributed among available alternatives in
proportion to the relative amounts of obtained reinforcement on the
alternatives.
• This means that if you get 1/3 of your reinforcement from lever A, and 2/3 from
lever B, you will devote 1/3 of your responding to A and 2/3 to B.
[Figure: pie chart, A = 33%, B = 66%]
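Strict matching is easy to express numerically. This sketch (the function name and dictionary layout are illustrative, my own) computes the predicted share of responding on each alternative from the obtained reinforcement.

```python
def matching_proportion(obtained):
    """Strict matching: the share of responding on each alternative
    equals that alternative's share of obtained reinforcement."""
    total = sum(obtained.values())
    return {alt: r / total for alt, r in obtained.items()}

# Lever A earns 1/3 of reinforcement, lever B earns 2/3:
shares = matching_proportion({"A": 20, "B": 40})
# shares["A"] is 1/3 and shares["B"] is 2/3
```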
Generalized Matching Law
Generalized Matching Law: Solves for the
proportion of behavior accounted for by its
reinforcement relative to the reinforcement for
everything else.
B_target = b × ( R_target / (R_target + R_everything else) )^a
b: bias    a: sensitivity
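A minimal sketch of this proportion form, with `bias` and `sensitivity` standing in for b and a (the function and parameter names are illustrative):

```python
def generalized_matching(r_target, r_other, bias=1.0, sensitivity=1.0):
    """Proportion form of the generalized matching law:
    B_target = b * (R_target / (R_target + R_other)) ** a
    bias (b) and sensitivity (a) are free parameters fit to data."""
    return bias * (r_target / (r_target + r_other)) ** sensitivity

# With b = 1 and a = 1, this reduces to strict matching:
strict = generalized_matching(30, 60)          # 30 / 90 = 1/3
biased = generalized_matching(30, 60, bias=1.2)  # 1.2 * 1/3 = 0.4
```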
Matching Law Facts
• If you change the reinforcement for a behavior
in constant steps, the responding will change
“hyperbolically.”
• That means that the behavior will change
quickly at first, then the effects of the changes
in reinforcement will taper off.
Matching Law Fact
It is important to note that behavior matches various
measures of reinforcement, including its duration,
magnitude, quality, rate, probability, and delay.
The total “hedonic value” of a consequence equals:
(Rate × Duration × Magnitude × Quality × Probability) / Delay
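That ratio can be computed directly. The function below is an illustrative sketch of the slide's formula (it is a teaching heuristic, not an established psychometric measure); it also shows why delay matters: dividing by delay lets a small immediate reinforcer outvalue a large delayed one.

```python
def hedonic_value(rate, duration, magnitude, quality, probability, delay):
    """Total 'hedonic value' of a consequence, per the slide's formula:
    (rate * duration * magnitude * quality * probability) / delay."""
    return rate * duration * magnitude * quality * probability / delay

# A small immediate reinforcer can beat a large delayed one:
small_now = hedonic_value(1, 1, 2, 1, 1.0, delay=1)    # 2 / 1 = 2.0
big_later = hedonic_value(1, 1, 10, 1, 1.0, delay=10)  # 10 / 10 = 1.0
```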
Matching Law Facts
•If you reinforce two alternatives equally often,
you sometimes see a “bias,” a preference for
one alternative over the other not accounted
for by the schedules.
•Sometimes the change in the response is less
than the change in reinforcer. This is called
“undermatching.” It is due to a sensitivity of
less than 1.0.
Matching Law Facts
•Another sign of “undermatching” is the
organism devoting more behavior to the low
probability alternative than is expected by true
matching.
•You might have a 60%-40% ratio of
reinforcement on two keys, yet your organism
distributes its behavior 55%-45%.
Rachlin & Green, 1972
•Rachlin & Green confirmed that delay of
reinforcement is accounted for by
matching.
•Also: Once the reinforcement for an
alternative is great enough, you will
commit to the behavior, giving up the
chance to do the other.
Social Traps
Social Trap: A situation that leads
to a small short-term gain, but a
long-term relative loss.
Quick calculation:
•Unless you have been taught to avoid
the situation, you will reliably choose
the reinforcer with the greatest
hedonic value.
•Always think: “What is the answer to
Reinforcement/Delay?”
Transaction Costs
•Transaction cost: In economics, it is
the cost of making an economic
exchange.
•For us: The response cost required to
shift behavior between alternatives.
•Developed by John R. Commons in 1933.
Delay Discounting
•AKA: Temporal Discounting:
The reduction in the effective value
of a consequence due to the passage
of time.
•Delayed reinforcers are usually
worth less than immediate ones.
Delay Discounting
•Impulsivity: The degree to which a
response is sensitive to temporal delays in
reinforcement.
•High impulsivity means you are highly
affected by delays. You are impatient.
•Low impulsivity means you are less
affected by delays. You are patient.
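A commonly used quantitative model of delay discounting (an addition here, not given on the slides) is Mazur's hyperbolic equation, V = A / (1 + kD), where A is the undiscounted amount, D the delay, and a larger k means steeper discounting, i.e., more impulsivity:

```python
def hyperbolic_discount(amount, delay, k):
    """Hyperbolic delay discounting (Mazur's model, assumed here):
    V = A / (1 + k * D). Larger k = steeper discounting = more impulsive."""
    return amount / (1 + k * delay)

# The same $100 at a 30-unit delay, for an impulsive vs. a patient organism:
impatient = hyperbolic_discount(100, delay=30, k=1.0)   # 100 / 31, about 3.2
patient = hyperbolic_discount(100, delay=30, k=0.01)    # 100 / 1.3, about 76.9
```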