Chapter 4 Learning

Instrumental Learning
All learning in which an animal operates on
its environment to obtain a reinforcement.
Operant (Skinnerian) conditioning
1. Thorndike and the law of
effect
• The animal is more likely to repeat the
behavior that was emitted just before the
reward.
• As Thorndike would say, “the memory
becomes stamped in”
Many different types of learning
apparatus were tried between 1900 and 1945
• Escape learning for cats (Thorndike,
Guthrie).
• Rat Jumping Stand (Myers)
• Complex Maze Learning (Tolman,
Lashley)
• T-maze (Hull, Spence)
• Morris water-maze (modern day
development)
The Common element
• All used a fixed trial presentation method.
• Subjects had a fixed experience across
animals and the learning varied per
animal.
• Subjects had a variable experience but
had a fixed criterion of learning that must
be obtained.
• One looked at the time it took to acquire
the goal (mazes) or the number of trials to
reach the criterion.
Skinner and Operant Behavior
The unique feature of operant training is that
the experimenter waits until the animal makes
the specified response before a reward is
given. This is called free operant behavior.
Reward vs. Reinforcement
• A reward is a global state of affairs given
to the whole animal (food, electric shock).
• A reinforcement is for the specific discrete
response done by the animal to obtain the
reward.
Primary reinforcers
• Eating, drinking, and sex
• Addicting drugs
• For animals, the equivalent of money for
humans, e.g., poker chips, marbles.
• When used in this way, they are called
conditioned reinforcers.
Positive and Negative Reinforcement
• Both positive and aversive stimuli can be
used to guide behavior. Both are used to
increase a desired response. The
reinforcement is delivered shortly after
the desired response is emitted.
Reinforcement & Punishment
• Concept – Positive Reinforcement
Description
• Increasing the frequency of a behavior by
following it with the presentation of a
positive reinforcer – a pleasant, positive
stimulus or experience
Example
• Saying “Good job” after someone works
hard to perform a task.
Types of reinforcers
• Appetitive – usually food
• Negative – shock, air puff; stimuli that
deliver pain or discomfort.
Positive Reinforcement
Concept:
• Negative reinforcer
Negative Reinforcement
Note the following
• The removal of a negative stimulus is
positively reinforcing – the animal will tend
to perform the behavior that removes it from
the cues associated with the aversive
state of affairs.
Reinforcement/Punishment
Shaping
• Shaping is the method by which one gets
the animal to accomplish the desired
response in the first place.
• The final behavior desired is broken down
into small steps or increments. The
accomplishment of the first step leads
directly to the next step in the chain.
How to train a monkey to hit a key.
Continuous reinforcement
• A reinforcement is given for every desired
response. If the reinforcement stops, the
animal soon stops responding.
Intermittent reinforcement
• Intermittent reinforcement is more
resistant to extinction than continuous
reinforcement.
Appetitive Schedules of reinforcement
• Schedules of reinforcement are based on
one of two criteria: the number of
responses or the passage of time.
Fixed Ratio (FR)
• A fixed ratio schedule delivers a
reinforcement after a given number of
responses has been made.
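A minimal Python sketch of the fixed ratio rule above, assuming responses are simply counted one at a time. The function name fixed_ratio and the FR 5 value are illustrative choices, not taken from the slides.

```python
def fixed_ratio(n_responses, ratio):
    """Return the response numbers (1-based) that earn a reinforcement."""
    reinforced = []
    count = 0  # responses made since the last reinforcement
    for response_number in range(1, n_responses + 1):
        count += 1
        if count == ratio:
            reinforced.append(response_number)
            count = 0  # the count starts over after each reinforcement
    return reinforced


# Hypothetical FR 5 schedule over 20 responses.
print(fixed_ratio(20, 5))  # [5, 10, 15, 20]
```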
Variable Ratio (VR)
• Here the number of responses required for
each reinforcement varies about a mean.
• Slope is not quite as steep as fixed ratio
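A variable ratio schedule can be sketched the same way. This is only an illustration: it assumes the required count is drawn uniformly from 1 to 2 x mean - 1 so that the average equals the named mean. The slides do not say how the requirement is varied, and the name variable_ratio is hypothetical.

```python
import random


def variable_ratio(n_responses, mean_ratio, seed=0):
    """Return the response numbers (1-based) that earn a reinforcement."""
    rng = random.Random(seed)

    def draw_requirement():
        # Assumed distribution: uniform on 1 .. 2*mean - 1, whose mean is mean_ratio.
        return rng.randint(1, 2 * mean_ratio - 1)

    reinforced = []
    required = draw_requirement()
    count = 0  # responses made since the last reinforcement
    for response_number in range(1, n_responses + 1):
        count += 1
        if count >= required:
            reinforced.append(response_number)
            count = 0
            required = draw_requirement()  # new requirement for the next reinforcer
    return reinforced


# Hypothetical VR 5 schedule: responses per reinforcement average about 5.
earned = variable_ratio(10000, 5)
print(10000 / len(earned))
```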
Fixed interval (FI)
• Here, a reinforcement is delivered after the
first response after the passage of a fixed
amount of time.
• Note the scalloping of the cumulative
record.
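The fixed interval rule can be sketched as follows. The sketch assumes each interval is timed from the previous reinforcement (or from the start of the session); the slides do not specify this, and the function name and times are made up for illustration.

```python
def fixed_interval(response_times, interval):
    """Return the times of the responses that earn a reinforcement."""
    reinforced = []
    available_at = interval  # first reinforcement becomes available here
    for t in sorted(response_times):
        if t >= available_at:
            # First response after the interval has elapsed is reinforced.
            reinforced.append(t)
            available_at = t + interval  # assumed: timing restarts at reinforcement
        # Responses made before the interval elapses earn nothing.
    return reinforced


# Hypothetical FI 10 s schedule.
print(fixed_interval([2, 9, 11, 12, 19, 25, 31], 10))  # [11, 25]
```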
Variable Interval (VI)
• Variable interval is similar to the FI
schedule except that the time lapse between
the availability of successive
reinforcements is varied. For example, 1, 3,
2, etc. The interval is named after the mean
amount of time passed. Again, the
reinforcement is delivered after the first
response after the interval has passed.
VI
• Note that in variable interval schedules
one does not see the scalloping one sees
in FI schedules. The slope is not as steep
as in VR or FR schedules.
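A hedged sketch of a variable interval schedule. It assumes each interval is drawn uniformly between 0 and twice the mean, so the average matches the value the schedule is named after; the actual distribution is not given in the slides, and variable_interval is a hypothetical name.

```python
import random


def variable_interval(response_times, mean_interval, seed=0):
    """Return the times of the responses that earn a reinforcement."""
    rng = random.Random(seed)

    def draw_interval():
        # Assumed distribution: uniform on [0, 2*mean], whose mean is mean_interval.
        return rng.uniform(0, 2 * mean_interval)

    reinforced = []
    available_at = draw_interval()
    for t in sorted(response_times):
        if t >= available_at:
            # First response after the current interval has passed is reinforced.
            reinforced.append(t)
            available_at = t + draw_interval()  # next interval is drawn afresh
    return reinforced


# Steady responding every 0.1 s for 10,000 s on a hypothetical VI 10 s schedule.
times = [i * 0.1 for i in range(100000)]
earned = variable_interval(times, 10)
print(len(earned))  # roughly one reinforcement per 10 s
```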
Differential Reinforcement for Rate
• In ratio schedules there is a contingency
between the rate of responding and the
rate of reinforcement. That is, the faster
the animal responds, the faster it gets a
reinforcement. The contingency is not as
strong for interval schedules, but it is
still there.
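The difference in contingency can be illustrated numerically. This sketch assumes steady responding on a fixed ratio schedule and, for the interval case, responses scattered randomly in time, so that the average wait for the next response after the interval ends is 1 / response rate. Both function names and all numbers are hypothetical.

```python
def ratio_reinforcement_rate(response_rate, ratio):
    """Reinforcements per minute on a ratio schedule: strictly proportional."""
    return response_rate / ratio


def interval_reinforcement_rate(response_rate, interval):
    """Approximate reinforcements per minute on an interval schedule.

    Assumption: responses occur at random moments, so after the interval
    elapses the animal waits 1 / response_rate on average before responding.
    """
    if response_rate <= 0:
        return 0.0
    return 1.0 / (interval + 1.0 / response_rate)


# Doubling the response rate doubles the payoff on a ratio schedule...
print(ratio_reinforcement_rate(60, 10), ratio_reinforcement_rate(120, 10))
# ...but barely changes it on a 1-minute interval schedule.
print(interval_reinforcement_rate(60, 1), interval_reinforcement_rate(120, 1))
```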
Setting up a Differential Rate
• One sets up a contingency between the
number of responses within a given time
interval and reinforcement. The key is to
control the rate of response per unit time,
i.e., to control the inter-response time (IRT).
Differential Reinforcement for High
Rates (DRH)
• Here the animal must respond, for example,
10 times in 5 seconds. Each time this
criterion is met, the animal is reinforced
after the last response.
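A minimal sketch of the DRH criterion, using the 10 responses in 5 seconds example from the slide. It assumes the count starts over after each reinforcement, which the slide does not state; the name drh is illustrative.

```python
from collections import deque


def drh(response_times, n_required=10, window=5.0):
    """Return the times of the responses that earn a reinforcement."""
    recent = deque()  # times of the most recent responses, oldest first
    reinforced = []
    for t in sorted(response_times):
        recent.append(t)
        if len(recent) > n_required:
            recent.popleft()  # keep only the last n_required responses
        # Criterion: the last n_required responses all fall within the window.
        if len(recent) == n_required and t - recent[0] <= window:
            reinforced.append(t)  # reinforced after the last response
            recent.clear()  # assumed: counting starts over
    return reinforced


fast = [i * 0.5 for i in range(1, 21)]  # one response every 0.5 s
slow = [i * 1.0 for i in range(1, 21)]  # one response every 1 s
print(drh(fast))  # [5.0, 10.0]
print(drh(slow))  # []
```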
Differential Reinforcement for Low Rates of
Responding (DRL)
• Here the animal must inhibit early
responses to meet a waiting criterion of,
say, 10 sec. If the animal responds before
the 10 sec have elapsed, a clock is reset
and the animal must start the wait period
over.
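The DRL rule can be sketched in a few lines. The sketch assumes the clock starts at the beginning of the session and restarts after every response, reinforced or not; the name drl and the response times are invented for illustration.

```python
def drl(response_times, min_wait=10.0):
    """Return the times of the responses that earn a reinforcement."""
    reinforced = []
    clock_start = 0.0  # assumed: the first wait is timed from session start
    for t in sorted(response_times):
        if t - clock_start >= min_wait:
            reinforced.append(t)  # the animal waited long enough
        # An early response earns nothing; either way the clock is reset.
        clock_start = t
    return reinforced


# Hypothetical DRL 10 s schedule.
print(drl([4, 15, 20, 31, 45]))  # [15, 31, 45]
```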
Current theory postulates two
underlying processes
• The animal forms a temporal
discrimination.
• The animal actively inhibits responding
(it uses ancillary responses, not directed
at the requisite key or bar, to pass the
time).
DRH/DRL
• Respond within a window of time. Must
respond after a specific time has passed,
but must not allow an upper time limit to
be exceeded.
• Wyler/Prim study using single neuron
Negative Control of Behavior
• Behavior emitted that removes an aversive
state of affairs.
Negative reinforcer
Description: Increasing the frequency of a
behavior by following it with the removal of
an unpleasant stimulus or experience
Concept
• Avoidance conditioning
Avoidance conditioning
• Description: Learning to make a response
that avoids an unpleasant stimulus.
Example
• You slow your car to the speed limit when
you spot a police car, thus avoiding being
stopped and reducing the fear of a fine;
very resistant to extinction
1. Escape and Avoidance
The control of Intrinsic behavior
• Avoidance tasks involve the removal of
oneself from an environment which has
previously been associated with a negative
reinforcer.
Sidman Avoidance
• Shock-Shock interval (shock every 5 sec)
S. A. (cont.)
• Response-shock interval (the delay between
a bar push and the next shock)
S. A. (cont.)
• Very, very hard to extinguish.
• VAN - chimp
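A rough sketch of the two Sidman avoidance timers. The 5 sec shock-shock interval comes from the slide; the 20 sec response-shock interval, the function name, and the response times are assumptions made only for illustration.

```python
def sidman_avoidance(response_times, session_length,
                     ss_interval=5.0, rs_interval=20.0):
    """Return the times at which shocks are delivered during the session."""
    responses = sorted(response_times)
    shocks = []
    i = 0  # index of the next response not yet processed
    next_shock = ss_interval
    while next_shock <= session_length:
        if i < len(responses) and responses[i] < next_shock:
            # A response before the scheduled shock postpones it
            # by the response-shock interval.
            next_shock = responses[i] + rs_interval
            i += 1
        else:
            # No response in time: shock, then shocks recur every ss_interval.
            shocks.append(next_shock)
            next_shock += ss_interval
    return shocks


print(sidman_avoidance([], 20))               # [5.0, 10.0, 15.0, 20.0]
print(sidman_avoidance([3, 13, 23, 33], 40))  # [] (every shock is postponed)
print(sidman_avoidance([3], 40))              # [23.0, 28.0, 33.0, 38.0]
```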
VIII. Punishment – different
types
Punishment 2 (Penalty)
Example
• You learn to use the mute button on the
TV remote control to remove the sound of
an obnoxious commercial
Concept
• Escape Conditioning
Escape Conditioning
• Description: Learning to make a response
that removes an unpleasant stimulus
Example
• A little boy learns that crying will cut
short the time that he must stay in his room
Concept
• Punishment
Punishment
• Description: Decreasing the frequency of a
behavior by either presenting an
unpleasant stimulus (punishment 1) or
removing a pleasant one (punishment 2, or
penalty).
Example
• You swat the dog after it steals food from
the table, or you take a favorite toy away
from a child who misbehaves. A number of
cautions should be kept in mind when
using punishment (see below for an
example).
Learned helplessness
• Continued punishment until the animal
refuses to respond even when there is no
aversive state of affairs.
Combined Operant and C. C.