LEARNING SUMMARIES – OPERANT CONDITIONING
In classical conditioning we learn to anticipate important events by coming to
recognize that those events are often predicted by other events. This helps us
prepare for these crucial times. But, obviously, we need to do more than get
ready for threats or the chance to eat or reproduce. We need to know how to
avoid threats and find food and chances to reproduce.
The ability to do things which make us safer, help us find things to eat and drink, and
get close to other people develops through another type of learning called operant
conditioning. Through this process we actively make choices and do things which get
us the things we need, whether food, fun, sex, or money.
The most important concept in operant conditioning is reinforcement, an event which
increases the chance that the action which came before it will happen again. For
example, if you get $50 every time you walk into IVCC and say hello to the first person
you see, you will likely keep saying hello to the first person you see when you walk
into IVCC. The $50 reinforces the action/behavior which came just before you received
it: saying hello. If getting the $50 makes you feel good but doesn't make you say hello
the next time you walk into IVCC, it isn't reinforcement. In other words, reinforcement
must have an effect on your behavior.
This concept was first described by Edward Thorndike over 100 years ago. He
originated the Law of Effect, which holds that we are more likely to repeat actions which
lead to (what we view as) favorable outcomes. In other words, if you are hungry and you
learn that asking your Mom for a sandwich will get you one, you are very likely to ask
her for one. Similarly, if you know that your coach will play you more (a favorable
outcome) if you play good defense, you are more likely to play good defense so you can
play more.
Remember that you are actively doing something, playing defense or asking for a
sandwich, to get something you want. In classical conditioning you never got the chance
to do anything: certain things were going to happen, and learning helped you get ready
for them, but you were powerless to control, increase, or stop them. In operant
conditioning, though, you are doing things that change what you will experience.
While Thorndike was the first to discuss these ideas in the study of behavior, it was a
psychologist named B. F. Skinner in the 1930s and 1940s who really described and
explained how operant conditioning works, both for you and me and for birds,
mammals, even fish.
B. F. SKINNER & the BASICS of OPERANT CONDITIONING
Skinner is called "the father of operant conditioning". He invented the most famous
way to study it: putting rats and pigeons in a small metallic box he called an "operant
chamber". Everyone else calls it a "Skinner box". In it he explored the laws of
reinforcement (and punishment) by giving rats little pellets of food if they pulled down
on a bar, or giving pigeons pellets if they pecked a designated circle. Since both the
pigeons and the rats were quite hungry, the pellets were powerful reinforcers: the
animals would repeat any action which caused the delivery of a pellet.
But how did Skinner get the rats/pigeons to press the bar or peck the circle in the
first place? Neither animal engaged in either action when first placed in the Skinner box.
He did it by reinforcing actions which came close (and then closer) to the desired
behavior, a process he named shaping. For example, when the rat was first placed in the
box he only gave it a pellet when it was in the half of the box where the lever was
located. This reinforced the rat's choice to stay in the portion of the box close to the
lever, and soon the hungry rat spent all of its time in that half of the box. Next Skinner
only gave the rat a pellet if it was in the quarter of the box near the lever, then when the
rat was next to the lever, then only when the rat faced the lever. Soon a pellet was only
given when the rat raised its paw next to the lever, then only when it touched the lever,
and finally only when its paw pulled the lever down. Thus, gradually, Skinner shaped
the rat's behavior, causing it to do something it would not have done without the
consistent delivery of reinforcement.
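To make the logic of shaping concrete, here is a minimal simulation sketch in Python.
Everything in it is an illustrative assumption, not Skinner's actual procedure: the rat's
position is a number from 0.0 (far wall) to 1.0 (the lever), the four criteria loosely
mirror the steps above, and a simple learning rule pulls the rat's typical position toward
wherever it earned a pellet.

```python
import random

# A toy model of shaping by successive approximation (illustrative
# assumptions throughout; not Skinner's actual procedure).
CRITERIA = [0.3, 0.5, 0.7, 0.9]  # lever half of box -> quarter -> beside -> touching

def shape(trials_per_criterion=300, learning_rate=0.2, seed=0):
    rng = random.Random(seed)
    typical = 0.1                           # the naive rat starts far from the lever
    for criterion in CRITERIA:
        for _ in range(trials_per_criterion):
            # behavior varies randomly around the rat's typical position
            position = min(1.0, max(0.0, rng.gauss(typical, 0.15)))
            if position >= criterion:       # criterion met: deliver a pellet
                typical += learning_rate * (position - typical)
        print(f"criterion {criterion:.1f}: typical position now {typical:.2f}")

shape()
```

Running it shows the rat's typical position climbing toward the lever only because each
criterion is reachable from the previous one; jump straight to the final criterion and the
pellet almost never arrives, which is the whole point of shaping.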
Skinner also taught animals complex behaviors through a process he called chaining.
Please look in your text for a detailed description.
Through these processes Skinner could teach an animal to perform amazing tricks.
These concepts are now used every day, everywhere, to change the behavior of animals
and humans, whether in zoos, schools, prisons, or homes.
Another way to change behavior is through punishment, anything which lessens the
chance that the action/behavior which came before it will happen again. For example,
imagine that Larisha makes a bad pass and Coach Crick pulls her out of the game,
providing a consequence. If this decreases the odds that Larisha will throw a bad pass in
the future, the consequence (pulling her out of the game) is punishment. If Larisha
continues to make bad passes even when Coach Crick takes her out, then pulling her out
of the game is not punishment, even if Larisha doesn't like it. Remember, consequences
have to change future behavior patterns to earn the labels punishment or reinforcement.
Punishment can also occur when a painful consequence follows an action or behavior,
as long as the painful consequence stops us from engaging in the behavior in the future.
For example, walking off trails and through weeds in the summer can cause me to suffer
from the severe skin rashes and itching caused by poison ivy. These consequences
(itching and rashes) keep me from walking off the trails. Since they make this behavior
less likely to occur, they are punishment for me.
Also, reinforcement can arise from removing a condition we don't like. For instance,
if taking a pill makes pain or sadness go away, we will probably take the pill the next
time we are in pain or sad. If we do, then we are looking at reinforcement, because the
consequence (taking the pain or sadness away) has increased the frequency of the
behavior.
Picking Reinforcers
To use the principles of operant conditioning to change behavior we need to find
effective reinforcers. This is not as easy as it sounds. Different people are influenced by
different things at different times. We will do something for food at noon, but maybe not
at 7 a.m. We usually like to play basketball, but maybe not as much after a three-hour
practice. How can we select the right reinforcer?
There are two ways. David Premack thought that we can pick an appropriate reinforcer
by looking at what people like to do. For example, he would watch what children did
whenever they had free time and count how much time they spent doing various things.
Perhaps a child spent 90% of his time playing video games and just 10% reading. If we
want to increase the time the child spends reading, we should only let him play video
games if he has already spent some time reading.
This strategy is called the Premack Principle: using the chance to engage in more
common behaviors (in this case, playing video games) as reinforcement for performing
less common behaviors (reading).
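As a toy illustration of that selection rule, here is how it might look in Python,
assuming we have already measured the child's free-time split (the 90%/10% numbers
and behavior labels come from the example above):

```python
# Premack Principle sketch: the more probable behavior becomes the
# reinforcer; access to it is made contingent on the less probable one.
# Baseline numbers and labels are from the example above.
free_time = {"video games": 0.90, "reading": 0.10}   # observed baseline

reinforcer = max(free_time, key=free_time.get)       # high-probability behavior
target = min(free_time, key=free_time.get)           # behavior we want to increase

print(f"Rule: {target} first, then {reinforcer}.")
# -> Rule: reading first, then video games.
```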
Another strategy claims that allowing someone the chance to return to typical routines
will be reinforcing. In other words, if we prevented the child from reading (part of his
typical routine) for long enough, he would clean his room for the chance to read. This is
called the Disequilibrium Principle.
Also, some reinforcers, such as food, water, and affection, will affect our behavior
from the moment we are born. These are called unconditional reinforcers because we
don't have to learn their value. For other reinforcers, though, we have to learn their
value. These are called conditional reinforcers, and we eventually learn that they can
help us get access to unconditional reinforcers. The best example of such a reinforcer
is, of course, money, since it can get us food, water, etc.
MORE OPERANT CONDITIONING CONCEPTS
Generalization – if a behavior works for us (we receive reinforcement or avoid
punishment) in a certain situation, we are likely to repeat the action in a similar situation.
For example, if you scored a lot of points with your jump hook when you played in
Turkey, you are likely to use it here at IVCC. This response or behavior (your jump
hook) has generalized to a new situation.
Discrimination – if you can use a cross-over dribble to get to the basket and score points
in Rockford, you are likely to try your cross-over here at IVCC. However, if you discover
that opposing players consistently steal the ball when you try your cross-over here, you
will probably stop using it. You have learned to discriminate between the two different
situations.
Similarly, if you tell a joke to some of your teammates and they laugh, you will
probably be encouraged to tell the same joke to the rest of your teammates. That behavior
(telling the joke) has spread or generalized. However, you might know that your parents,
minister, Imam or Mullah would not think that joke was funny, so you would not share it
with them. You have learned to discriminate between the differing situations based upon
the likely consequences of telling the joke.
Extinction – if we stop receiving reinforcement after performing actions which were
formerly reinforced, we will probably eventually stop performing those actions. When
behaviors stop because they are no longer reinforced, we say that the behavior has gone
through extinction. But behaviors go through extinction in different ways depending on
how they were originally reinforced. To understand these important differences we must
carefully describe how reinforcement is administered through various reinforcement
schedules.
Reinforcement Schedules
Continuous reinforcement – if reinforcement follows every time the correct behavior is
performed, we have a continuous reinforcement schedule. For example, every time we
place money in a soda machine we expect to get a soda in return.
Intermittent reinforcement – most of the time, we don't get reinforcement every time
we do something. People don't always say "Thank you" when we help them, or smile
back every time we smile at them. Often we know that we will have to make a number
of responses (a ratio) or wait some amount of time (an interval) before we get our
reinforcement. There are four types of intermittent reinforcement schedules (a short
simulation of all four follows the list):
1) Fixed ratio: a certain, constant number of responses have to be made before
reinforcement is received. Migrant workers must pick a bushel full of peaches
before they get paid, not just one peach. A college student must complete a
semester of class work before getting credit for the class, not some small amount
of credit for every class he/she attends.
2) Variable ratio: an unknown, varying number of responses have to be made
before reinforcement follows. When we play a slot machine we don’t know how
many times we’ll have to play to hit the jackpot. It might be 5, it might be 500.
3) Fixed interval: we have to wait a certain, consistent amount of time before we
get reinforced. In most jobs, we don’t get paid every time we work, we have to
wait till Friday, or every other Friday, or maybe twice a month.
4) Variable interval: we have to wait an unknown, varying amount of time before
reinforcement follows. We're never really sure how long we'll have to wait
before our favorite band comes out with their next CD or when our favorite team
will win their next championship, so we just keep going to their games and
rooting for them on TV while we wait.
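Here is a minimal sketch of all four schedules in Python. The specific numbers (a ratio
of 5, an interval of 10 time steps, and one response arriving per step) are assumptions
made only so the four rules can be compared side by side:

```python
import random

rng = random.Random(0)

class FixedRatio:
    """Reinforce every nth response (e.g., one bushel of peaches -> one payment)."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self, t):      # t unused: ratio schedules count responses, not time
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """Reinforce after a varying number of responses averaging n (the slot machine)."""
    def __init__(self, n):
        self.n, self.count = n, 0
        self.required = rng.randint(1, 2 * n - 1)
    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count, self.required = 0, rng.randint(1, 2 * self.n - 1)
            return True
        return False

class FixedInterval:
    """Reinforce the first response after a fixed time has elapsed (the paycheck)."""
    def __init__(self, interval):
        self.interval, self.ready_at = interval, interval
    def respond(self, t):
        if t >= self.ready_at:
            self.ready_at = t + self.interval
            return True
        return False

class VariableInterval:
    """Reinforce the first response after a varying, unpredictable time."""
    def __init__(self, interval):
        self.interval = interval
        self.ready_at = rng.randint(1, 2 * interval - 1)
    def respond(self, t):
        if t >= self.ready_at:
            self.ready_at = t + rng.randint(1, 2 * self.interval - 1)
            return True
        return False

# One response per time step for 100 steps under each schedule.
for schedule in (FixedRatio(5), VariableRatio(5), FixedInterval(10), VariableInterval(10)):
    pellets = sum(schedule.respond(t) for t in range(100))
    print(f"{type(schedule).__name__}: {pellets} reinforcements in 100 responses")
```

Note the design difference: the ratio schedules only count responses, while the interval
schedules only watch the clock and pay off the first response after the waiting period.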
Extinction under Intermittent Reinforcement
If we have been continuously reinforced for performing an action in the past, we will
quickly stop making the response if reinforcement ends. If the soda machine doesn't
give you a Pepsi or Coke after you put in your 65 cents, you won't put in another 65
cents.
However, if you have been reinforced through an intermittent schedule, you won't stop
responding just because reinforcement fails to arrive a few times. Behaviors reinforced
intermittently go through extinction much more slowly, if they go extinct at all. How
many people do we know who bet on one of the lotteries every week (or more) though
they rarely, if ever, win (receive reinforcement)?
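One informal explanation of this difference is that extinction only affects behavior once
the world looks different from training. Under continuous reinforcement a single
unreinforced response is obviously new; under a variable ratio, long dry spells were
always normal. The Python sketch below is a toy version of that idea, assuming the
responder quits only once a dry spell lasts twice the worst one it experienced during
training; the schedule parameters and the quitting rule are illustrative assumptions, not
a validated model.

```python
import random

rng = random.Random(1)

def longest_dry_spell(history):
    """Longest run of unreinforced responses in a training history."""
    run = worst = 0
    for reinforced in history:
        run = 0 if reinforced else run + 1
        worst = max(worst, run)
    return worst

def responses_before_quitting(history):
    # Assumed quitting rule: stop once a dry spell lasts twice the worst
    # dry spell ever seen in training (plus the response that reveals it).
    return 2 * longest_dry_spell(history) + 1

continuous = [True] * 500                               # reinforced every response
variable = [rng.random() < 1 / 20 for _ in range(500)]  # roughly a VR-20 schedule

print("continuous:", responses_before_quitting(continuous), "responses in extinction")
print("variable ratio:", responses_before_quitting(variable), "responses in extinction")
```

The continuously reinforced responder quits after a single failure, while the
variable-ratio responder keeps going for well over a hundred responses, which matches
the lottery players described above.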