Operant Conditioning – Module 19
Cognitive (Latent) Learning – Module 19
Intro Psychology
Oct 21-23, 2009
Classes #23-24
Instrumental Conditioning

E. L. Thorndike (1905)

Described the learning that was governed by his "law of effect" as instrumental conditioning because responses are strengthened when they are instrumental in producing rewards
• Law of Effect
• Responses that are rewarded are more likely to be repeated and responses that produce discomfort are less likely to be repeated
Thorndike's Puzzle Box

In his classic experiment, a cat was locked in the box and enticed to escape by food placed outside the box, out of reach
• The box included ropes, levers, and latches that the cat could use to escape
• Trial-and-error behavior would lead to ultimate success (usually within three minutes)
• Thorndike felt we learned things through trial and error – awareness
Gestalt Viewpoint
• Wolfgang Kohler
A Gestalt psychologist who held the opposing view that we learn things implicitly – unawareness – natural insight
• Example: chimpanzee in a cage – food out of reach – but stick is not…
Operant Conditioning
• Operant Conditioning
A type of learning in which voluntary (controllable and non-reflexive) behavior is strengthened if it is reinforced and weakened if it is punished (or not reinforced)
Skinner (1938)
• The organism learns a response by operating on the environment…

Note:

The terms instrumental conditioning and operant conditioning describe essentially the same learning process and are often used interchangeably
Basically, Skinner extended and formalized many of Thorndike's ideas
Operant Conditioning

Response comes first and is voluntary, unlike classical conditioning, where the stimulus comes first and the response is involuntary
• Classical: S → R
• Operant: S → R → S, which becomes R → S
The Skinner Box

Soundproof chamber with a bar or key that could be manipulated to release a food or water reward
Shaping:
Reinforcing successive approximations

Responses that come successively closer to the desired response are reinforced…

Skinner referred to this as his “Behavioral Technology”
Taught pigeons “unpigeon-like” behaviors
Walking in a figure 8, playing ping-pong, and keeping a “guided missile” on course by pecking at a moving target displayed on a screen…but most proud of getting them to hoist an American flag and then to salute it
B.F. Skinner (1904-1990)
In the Lab…
Operant Conditioning

Important terms
• Primary Reinforcers
• Conditioned (Secondary) Reinforcers
• Positive Reinforcement
• Punishment
• Negative Reinforcement
Reinforcers

Primary Reinforcers

Innately rewarding; no learning necessary
A stimulus that naturally strengthens any response that precedes it without the need for any learning on the part of the organism
Food, water, etc.
Secondary Reinforcers

A consequence that becomes reinforcing by being paired with a primary reinforcer
For people, money, good grades, words of praise, etc. are often linked to basic rewards
We need money to buy food, etc.
Positive Reinforcement

Behavior is strengthened when something pleasant or desirable occurs following the behavior

With positive reinforcement, the chance that the behavior will occur in the future is increased
Punishment

Any stimulus presented immediately after a behavior in order to decrease the future probability of that behavior
• For example:
• If your kid runs into the middle of the street and you flip out and “express to him how bad he is,” this (at least in psychological terms) is only considered to be punishment if it does in fact lead to a decrease in that child’s behavior of running into the street
Negative Reinforcement

One of the most misunderstood terms in psychology…
Definitely a problem with semantics here

The word reinforcement means that a response is strengthened
The word negative seems to imply that the response is somehow weakened

This is not the case here!
So how, literally, can a response be negatively reinforced???
Often, this term is mistakenly used to mean punishment
So let's try to proceed slowly in our attempts to figure this out…
Negative Reinforcement

Positive Reinforcement is a reward
• That’s easy enough
Punishment is something that weakens a response
• Again, this is pretty basic
In an attempt to increase the likelihood of a behavior occurring in the future, an operant response is followed by the removal of an aversive stimulus. This is negative reinforcement…
• Example: When a child says "please" and "thank you" to his/her mother, the child may not have to engage in his/her dreaded chore of setting the table
Negative Reinforcement
So we are learning to do something to turn off a bad stimulus
• Example: We put on boots to prevent sitting in class with wet socks on
• Increasing a behavior to stop a bad thing from occurring
• Doing something to remove the aversive stimulus

Types of Negative Reinforcement

Escape Conditioning

This occurs when the behavior has led to a reduction of the aversiveness of the environment
• Example: Rats moving away from the shock area after feeling the pain
• This does involve an observable change in the environment
Avoidance Conditioning

When a behavior has prevented the onset of an impending increase in the aversiveness of the environment
• Example: Rats moving away from the shock area after hearing a signal that the shock is about to be administered
• A child apologizes upon seeing their parent frowning, thus avoiding being yelled at
• Involves no observable change in the environment
Schedules of Reinforcement

Continuous Reinforcement

Reinforcement delivered every time a particular response occurs
Intermittent Reinforcement

Reinforcement is administered only some of the time
Intermittent Schedules of Reinforcement

Fixed-Ratio

Reinforcement provided after a fixed number of responses
• Food every tenth bar press
Variable-Ratio

Reinforcement after a variable number of responses (works on an average)
• An unpredictable number of responses is required (slot machines)
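The two ratio schedules above can be sketched as small simulations. This is only an illustrative sketch, not something from the lecture: the function names `fixed_ratio` and `variable_ratio` are hypothetical, and drawing each variable-ratio requirement uniformly from 1 to 2 × mean − 1 is an assumption chosen so that the requirement averages out to the stated mean.

```python
import random


def fixed_ratio(n):
    """Return a respond() function that reinforces every n-th response."""
    count = 0

    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0  # start counting toward the next reward
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Return a respond() function that reinforces after an unpredictable
    number of responses that averages mean_n.

    Assumption: each requirement is drawn uniformly from 1..(2*mean_n - 1).
    """
    count = 0
    required = rng.randint(1, 2 * mean_n - 1)

    def respond():
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)  # new hidden requirement
            return True
        return False

    return respond


if __name__ == "__main__":
    press = fixed_ratio(10)
    fixed_rewards = sum(press() for _ in range(30))
    pull = variable_ratio(10, random.Random(0))
    variable_rewards = sum(pull() for _ in range(10_000))
    print("fixed-ratio rewards in 30 presses:", fixed_rewards)
    print("variable-ratio rewards in 10,000 pulls:", variable_rewards)
```

With `fixed_ratio(10)`, 30 bar presses yield exactly 3 rewards, on presses 10, 20, and 30. With `variable_ratio(10, ...)` the long-run reward rate is about the same, but no single response can be predicted to pay off, which is the slot-machine property.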
Intermittent Schedules of Reinforcement

Fixed-Interval Schedules

Provides reinforcement for the first response that occurs after some fixed time has passed since the last reward
Number of responses doesn’t matter, only time
• Example: A rat gets food for its first response after each 20 min. interval
Variable-Interval Schedule

Reinforces the first response after a certain amount of time has passed
Again, number of responses doesn’t matter
But this time the amount of time changes
• Might be the first response after ten minutes, then the next time it is the first response after 20 minutes, and then the next time it is the first response after 30 min…
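The interval schedules above can be sketched the same way. Again, this is an illustrative sketch rather than anything from the lecture: the names `fixed_interval` and `variable_interval` are hypothetical, time is passed in explicitly as `now` (in minutes) instead of being read from a clock, and drawing each variable wait uniformly from 0 to 2 × mean is an assumption.

```python
import random


def fixed_interval(interval):
    """Return respond(now): reinforce the first response made at least
    `interval` minutes after the last reward (or after the start)."""
    last_reward = 0.0

    def respond(now):
        nonlocal last_reward
        if now - last_reward >= interval:
            last_reward = now  # the clock restarts at the rewarded response
            return True
        return False

    return respond


def variable_interval(mean_interval, rng):
    """Return respond(now): like fixed_interval, but the required wait
    changes after every reward.

    Assumption: each wait is drawn uniformly from 0..2*mean_interval.
    """
    last_reward = 0.0
    wait = rng.uniform(0, 2 * mean_interval)

    def respond(now):
        nonlocal last_reward, wait
        if now - last_reward >= wait:
            last_reward = now
            wait = rng.uniform(0, 2 * mean_interval)  # new hidden wait
            return True
        return False

    return respond


if __name__ == "__main__":
    slow = fixed_interval(20)
    fast = fixed_interval(20)
    # One response per minute versus two per minute, over 100 minutes.
    slow_rewards = sum(slow(float(t)) for t in range(1, 101))
    fast_rewards = sum(fast(t / 2) for t in range(1, 201))
    print("rewards at 1 response/min:", slow_rewards)
    print("rewards at 2 responses/min:", fast_rewards)
```

With `fixed_interval(20)`, responding every minute or every half minute for 100 minutes earns the same 5 rewards, which is the sense in which the number of responses does not matter on an interval schedule.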
Applications of Operant Conditioning: In the Classroom

Skinner thought that our education system was ineffective
He suggested that one teacher in a classroom could not teach many students adequately when each child learns at a different rate
He proposed using teaching machines (what we now call computers) that would allow each student to move at their own pace
The teaching machine would provide self-paced learning that gave immediate feedback, immediate reinforcement, identification of problem areas, etc., that a teacher could not possibly provide
Applications of Operant Conditioning: In the Workplace
• Pedalino & Gamboa (1974)
• To help reduce the frequency of employee tardiness, these researchers implemented a game-like system for all employees who arrived on time
• When an employee arrived on time, they were allowed to draw a card
• Over the course of a 5-day workweek, the employee would have a full hand for poker
• At the end of the week, the best hand won $20
• This simple method reduced employee tardiness significantly and demonstrated the effectiveness of operant conditioning on humans
Criticisms Of The Use Of Reinforcement

Criticism #1:
• Behavior should not have to rely on persuasion…
• It is manipulative and controlling
• Appropriate behavior should be the norm
• Skinner says we are always controlled by rewards but are often unaware of them…
• Parents, peers, schools, employers, etc. all use rewards to control our behavior

Skinner:

If it's manipulative, then everyone is to blame?
Criticisms Of The Use Of Reinforcement
• Criticism #2:
Reinforcement undermines Intrinsic Motivation…
• Messes up our inner desire to do something
• Now we need to do it for a tangible reward
• Example: Child cleaning his/her room…
• Why do they do it?
• Be careful of overjustification…
Cognitive Learning
Focus on the role of thinking processes in learning
• Theory based on unseen internal factors rather than on external factors

Skinner was very much against these theories, but let's look at one…latent learning…
Latent Learning

Tolman and Honzik (1930)
• Took three groups of rats and had them run a maze

Group 1
• Reinforced every time they found their way out of the maze (food box) for ten days
Group 2
• Never reinforced (no food at the end)
Group 3
• Reinforced only after day 10 of the experiment
Latent Learning

On day 11, they timed the three groups to see which group would make it through the maze the quickest…

Which group do you think was the fastest?