Operant Conditioning of Canis lupus familiaris
That is, clicker training the domestic dog!
This course is threefold:
• To learn and understand basic learning theory including:
– Classical conditioning
– Operant conditioning
– Modern Behavior Analysis
• To understand the behavior of the domestic canine:
– Behavioral signaling and communication
– Development of the species and the individual dog
– Social behavior of the domestic dog, including dog-to-dog and dog-to-human interactions
• To apply basic learning theory through applied behavior analysis
(ABA) to teach basic obedience skills, remediate behavioral issues
and prepare shelter dogs for adoption
How do we do this?
• Begin with an overview of learning theory
• Learn the techniques of positive reinforcement
based teaching and Clicker training
• Begin to interact with our dogs and apply our
lecture-based and readings-based knowledge as
we assist our dogs in becoming adoption ready!
Let’s start with some basic theory!
Classical vs. Operant Conditioning
Clicker training is based on the science
of operant conditioning
• Emerged from an area of psychology called Behavior Analysis
– Experimental Analysis of Behavior
– Applied Behavior Analysis
• Both study how living organisms learn about
contingencies.
– Antecedents: what is/are the setting condition(s)?
– Behavior: what behavior is emitted?
– Consequences: what consequences maintain the
response?
Defining learning
• Learning is:
– A relatively permanent change in behavior not due to maturation but to experience and/or practice.
– A learned behavior is predictable
– For our class we will concentrate on 2 kinds of learning
• Classical conditioning is pairing a predictive stimulus with a predicted event:
– CS → US → UR (after conditioning: CS → CR)
– You respond because a stimulus signals an upcoming event
– You do NOT have to respond for the US to occur
• Operant conditioning: Pairing a consequence with a behavior
– R → C
– Can add a predictive stimulus: S+: R → C
– Animal MUST engage in the response in order to receive the consequence
Pavlov’s Contribution
• Ivan Pavlov
– Russian physiologist: Studied salivation
– 1901: discovered and wrote about classical conditioning
– Found that his dogs reacted to both his presence and the time
of day for feeding/experimentation
• Researched this:
– Measured amount of salivation during baseline:
• Present food to dogs
• Measure slobber
– Then added a predictive stimulus: a Bell
• Presented the BellFood
• Measured slobber to see if dogs would begin to slobber to the bell
Labeled each part of these events:
• Unconditioned stimulus or US:
– The stimulus that automatically elicited the behavior (usually innate)
– E.g., the food elicited the slobber
• Unconditioned response or UR
– The behavior that is automatically elicited
– Unlearned; often reflexive
• Conditioned stimulus or CS:
– The stimulus that predicts the US
– Is a learned (thus conditioned) stimulus
• Conditioned response or CR:
– The behavior that occurs to the CS
– Often very similar to the unconditioned response
– Occurs because the CS predicts the US
Classical Conditioning Procedure
CS → US → UR
Bell → Food → Slobber
CS → CR
Bell → Slobber with fewer digestive enzymes
Order of presentation is very important!
[Figure: Classical conditioning learning curve. Y-axis: strength or magnitude of CR; x-axis: trials]
The CR does not just suddenly appear; rather, it takes several trials or sessions to learn the connection between the CS and the US.
Several important characteristics
• The CR is not always identical to the UR and can even be its opposite. This is called a compensatory response.
• The CR gets stronger with more pairings.
• The CR gets weaker if you stop the CS-US pairings; this is called extinction.
• If a setting similar to the original CS-US setting recurs, even an extinguished CR may reappear: this is called spontaneous recovery.
• An extinguished CR can be relearned more easily than a new CS-US pairing
• A CR will generalize to similar CSs, but one can also learn to
discriminate to a particular CS.
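The acquisition and extinction points above can be illustrated with a short simulation. This is a minimal sketch, and the model is an assumption rather than something from the lecture: it uses a single-CS update in the style of the Rescorla-Wagner model, where the associative strength `V` of the CS moves a fixed fraction `alpha` of the way toward a target on each trial (target `lam` when the CS is followed by the US, 0 when the CS is presented alone). The function name and parameter values are hypothetical and chosen only for illustration.

```python
def simulate_cs_strength(n_pairings: int, n_extinction: int,
                         alpha: float = 0.2, lam: float = 1.0) -> list:
    """Associative strength V of a CS after each trial (illustrative model).

    Assumption: a single-CS Rescorla-Wagner-style update,
        V <- V + alpha * (target - V),
    with target = lam on CS-US pairing trials and target = 0 on
    extinction trials (CS presented without the US).
    """
    v = 0.0
    history = []
    for trial in range(n_pairings + n_extinction):
        target = lam if trial < n_pairings else 0.0
        # The change is largest when V is far from the target, so the
        # curve rises steeply at first and then levels off.
        v += alpha * (target - v)
        history.append(v)
    return history


if __name__ == "__main__":
    curve = simulate_cs_strength(n_pairings=20, n_extinction=20)
    for i, v in enumerate(curve, start=1):
        print(f"trial {i:2d}: V = {v:.3f}")
```

The simulated strength grows gradually over the pairing trials and falls once the pairings stop. Because this simple model treats extinction as erasing the association, it cannot reproduce spontaneous recovery or faster relearning, so it is best read as a cartoon of the learning curve rather than a full account.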
The Law of Effect
• Thorndike (1911): Animal Intelligence
– Experimented with cats in a puzzle box
• Put cats in the box
• Cats had to figure out how to pull/push/move a lever to get out; once out, they got a reward
• The cats got faster and faster with each trial
• Law of Effect emerged from this research:
– When a response is followed by a satisfying state
of affairs, that response will increase in frequency.
Skinner’s version of the Law of Effect
• Had two problems with Thorndike’s law:
– Defining “satisfying state of affairs”
– Defining “increase” in behavior
• Rewrote the law to be more specific:
– Used words reinforcer and punisher
– The idea of a reinforcer is the strengthening of the relation between a response (R) and a reinforcer (Sr)
Skinner’s version of the Law of Effect
• Now defined reinforcement and punishment:
– A reinforcer is any stimulus which increases the
probability of a response when delivered
contingently
– A punisher is any stimulus which decreases the
probability of a response when delivered
contingently
• Also noted that reinforcers and punishers can be delivered in TWO ways:
– Add something: positive
– Take away something: negative
Reinforcers vs. Punishers
Positive vs. Negative
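• Reinforcer = rate of response INCREASES
• Punisher = rate of response DECREASES
• Positive: something is ADDED to the environment
• Negative: something is TAKEN AWAY from the environment
• Can make a 2x2 contingency table (four combinations):
– Positive reinforcement: something added, response increases
– Negative reinforcement: something taken away, response increases
– Positive punishment (P+): something added, response decreases
– Negative punishment (P–): something taken away, response decreases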
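Because each combination is fixed by two independent questions (was something added or taken away? did the rate of the response go up or down?), the contingency table can be written as a tiny lookup. This Python sketch is illustrative only; the enum and function names are hypothetical and not part of any library.

```python
from enum import Enum


class Change(Enum):
    ADDED = "added"        # something is added to the environment
    REMOVED = "removed"    # something is taken away from the environment


class Effect(Enum):
    INCREASES = "increases"  # the rate of the response goes up
    DECREASES = "decreases"  # the rate of the response goes down


def classify_consequence(change: Change, effect: Effect) -> str:
    """Name the cell of the positive/negative x reinforcer/punisher table.

    Positive/negative describes what happened to the stimulus;
    reinforcement/punishment describes what happened to the behavior.
    """
    sign = "positive" if change is Change.ADDED else "negative"
    kind = "reinforcement" if effect is Effect.INCREASES else "punishment"
    return f"{sign} {kind}"
```

For example, if a treat is delivered after a sit and sitting becomes more frequent, `classify_consequence(Change.ADDED, Effect.INCREASES)` gives "positive reinforcement". The classification depends on the measured effect on behavior, not on whether the trainer intended the stimulus to be pleasant, which matches Skinner's definitions of reinforcer and punisher.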
Several important characteristics
• The R gets stronger with R-Sr pairings.
• The R gets weaker if you stop the R-Sr pairings, BUT it will increase in intensity before it weakens
– Transitory increase in the rate of responding (extinction burst)
– Extinction-induced aggression
• If a setting similar to the original R-Sr setting recurs, even an extinguished R may reappear: this is called spontaneous recovery.
• An extinguished R can be relearned more easily than a new R-Sr contingency
More parameters:
• Generalization can occur:
– Operant response may occur in situations similar to the one in which it was originally trained
– Can learn to behave in many similar settings
• Discrimination can occur
– Operant response can be trained to very specific stimuli
– The animal exhibits the response only in specific situations
• Can use a cue to teach the animal:
– S+ or SD: contingency in place
– S- or SΔ (S-delta): contingency not in place
– Thus: SD: R → Sr
Schedules of Reinforcement:
• Continuous reinforcement (Sr):
– Reinforce every single time the animal performs the
response
– Use for teaching the animal the contingency
– Problem: Satiation
• Solution: only reinforce occasionally: PARTIAL Sr (partial reinforcement)
– Can reinforce occasionally based on time
– Can reinforce occasionally based on amount (number of responses)
– Can make it predictable or unpredictable
Partial Reinforcement Schedules
• Fixed Ratio: every nth response is reinforced
• Fixed interval: the first response after x amount of
time is reinforced
• Variable ratio: on average, every nth response is reinforced
• Variable interval: the first response after an
average of x amount of time is reinforced
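The four partial schedules differ only in the rule that decides whether a given response earns a reinforcer, so each can be written as a small object with a `respond(t)` method that returns True when the response at time `t` (in seconds) is reinforced. This is a minimal Python sketch; the class names are hypothetical, and the way the variable schedules draw their requirements (uniform distributions with the stated mean) is an assumption, since other distributions are also used in practice.

```python
import random


class FixedRatio:
    """FR n: every nth response is reinforced."""

    def __init__(self, n: int):
        self.n = n
        self.count = 0  # responses since the last reinforcer

    def respond(self, t: float) -> bool:
        # Ratio schedules count responses; the time t is ignored.
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR n: on average, every nth response is reinforced.

    Assumption: each requirement is drawn uniformly from 1..2n-1 (mean n).
    """

    def __init__(self, n: int, rng: random.Random):
        self.n = n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self) -> int:
        return self.rng.randint(1, 2 * self.n - 1)

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()  # new, unpredictable requirement
            return True
        return False


class FixedInterval:
    """FI x: the first response after x seconds have elapsed is reinforced.

    The interval is timed from the session start, then from each reinforcer.
    """

    def __init__(self, x: float):
        self.x = x
        self.available_at = x

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self.x
            return True
        return False  # responses before the interval ends earn nothing


class VariableInterval:
    """VI x: the first response after an interval averaging x seconds.

    Assumption: each interval is drawn uniformly from [0, 2x] (mean x).
    """

    def __init__(self, x: float, rng: random.Random):
        self.x = x
        self.rng = rng
        self.available_at = self._draw()

    def _draw(self) -> float:
        return self.rng.uniform(0.0, 2.0 * self.x)

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


if __name__ == "__main__":
    rng = random.Random(0)
    fr5 = FixedRatio(5)
    vr5 = VariableRatio(5, rng)
    # One response per second for 100 seconds under each ratio schedule.
    print("FR 5:", sum(fr5.respond(t) for t in range(100)), "reinforcers")
    print("VR 5:", sum(vr5.respond(t) for t in range(100)), "reinforcers")
```

Setting `n = 1` in `FixedRatio` reinforces every response, which is continuous reinforcement, so CRF can be seen as the starting point before thinning to a partial schedule.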
[Figure: Cumulative records for variable ratio and variable interval schedules]
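A cumulative record plots the running total of responses against time, so it never goes down: a steep stretch means fast responding and a flat stretch means a pause. Variable ratio schedules typically produce steep, steady records and variable interval schedules moderate, steady ones. The Python sketch below builds the data for such a plot from a list of response times; the function names and the sampling step `bin_width` are hypothetical choices, not a standard API.

```python
def cumulative_record(response_times, session_length, bin_width=1.0):
    """Cumulative responses at each sample time from 0 to session_length.

    response_times: times (in seconds) at which responses occurred.
    Returns a list of (time, total responses so far) pairs; plotted as a
    line, the slope of each segment is the response rate in that segment.
    """
    times = sorted(response_times)
    points = []
    total = 0
    i = 0
    n_samples = int(session_length // bin_width)
    for k in range(n_samples + 1):
        t = k * bin_width
        # Count every response that happened at or before this sample time.
        while i < len(times) and times[i] <= t:
            total += 1
            i += 1
        points.append((t, total))
    return points


def average_rate(points):
    """Overall responses per second: slope from the first to the last point."""
    (t0, c0), (t1, c1) = points[0], points[-1]
    return (c1 - c0) / (t1 - t0)
```

For responses at 0.5, 1.2, 1.8 and 3.0 seconds over a 4-second session, the record climbs 0, 1, 3, 4 and then stays flat at 4, which is how a post-response pause would show up on the plot.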
What is Clicker Training?
• Popular term for a training/teaching method based on operant conditioning
– Can be used with any living organism
• Goldfish
• Dogs
• Humans!
• Very simple process:
– S+: R → Src → Sr
– Cue → response → marker → reinforcement
Clicker training
• System of training/teaching that uses positive
reinforcement in combination with an event
marker
• The event marker (click) “marks” the response
as correct
An Atypical way to train
• You WATCH for the behavior to occur
– Wait for the behavior or approximation of that behavior
– Don’t lure or force
• MARK the behavior: click when the behavior is occurring to
signal to the animal “yes, do THAT!”
• After marking the behavior, REINFORCE the response!!
• Clicker = Precise tool: an event marker
– The “click” pinpoints a behavioral instance
– Informs the animal that that was the correct targeted response
Why not just use your voice?
• Clicker is unique and distinct; your voice is not.
• Is a novel, not-yet-conditioned signal, so it can be quickly paired with reinforcement as an event marker
• Used to teach NEW behaviors
• But: we WILL fade the clicker and replace it with a cue for the response:
– This could be a vocal cue, sign, etc.
Three hypotheses for why the Clicker is effective!
• The reinforcing hypothesis
• The Marking hypothesis
• The Bridging hypothesis
The Reinforcing Hypothesis
• The reinforcing hypothesis:
– The clicker is a secondary reinforcer:
– R Sr-clickSr-food
– The click predicts food (CS-US) and so becomes
associated with the food reinforcer and becomes
reinforcing in and of itself
• Data support this hypothesis the most (although there are very few data)
The Marking Hypothesis
• The marking hypothesis:
– The click marks the behavior in time
– R Sr-clickSr-food
– The click tells the animal precisely what behavior was
reinforced.
– That is, it serves as a communicative feedback….do
THAT
• Data also support this hypothesis (although there are very few data)
The Bridging Hypothesis
• The bridging hypothesis:
– The click bridges the time gap between the behavior and the primary
reinforcer.
– R Sr-clickSr-food
– Dopamine (DA) is released when an animal gets a primary reinforcer
• The operant behavior or CS becomes associated with that reinforcer
• After many pairings, the DA release occurs to the PREDICTOR (the R or CS) in
prediction of the primary reinforcer
– The click, because it is a secondary reinforcer and marks the behavior,
also allows release of DA as part of the feedback system
– That is, it helps bridge the time delay between the behavior and the primary reinforcer
– The release of DA tells the animal to “keep going”
• Data also support this hypothesis (although there are very few data)
Why should you use a clicker?
• Very powerful teaching tool
• According to Karen Pryor, clicker training
– Accelerates learning
– Strengthens the human-animal bond
– Produces long-term recall
– Produces creativity and initiative
– Forgives your mistakes
– Generates enthusiastic learners
We use positive reinforcement and TEACH to behavior
• We avoid using P+ or P–: many, many reasons (avoidance, poor learning, poor attention, etc.)
• We teach appropriate ways to get what a dog
wants
• We teach a replacement behavior for
inappropriate behaviors!
Examples of learning vs. environmental manipulation
• Want to keep dog out of kitchen:
– Put up a gate: dog can’t get in, so behavior decreases
– Does not alter the contingency of going into the kitchen
– The dog has learned nothing
• Want you to sit in a chair
– I poke you behind the knees and you fall into the chair
– You increased “chair sitting” but didn’t learn chair sitting!
– Your behavior is not predictable when presented with the
chair
– or worse yet, you are now afraid of the chair and avoid it!
ABCs of Operant Conditioning
• As behavior analysts we focus on the ABCs of behavior.
• Antecedents: Behavior Analysts determine what cues or sets off the behavior
– Identify the setting conditions
– Alter the setting conditions to give us environmental control
– Allows us to control what happens prior to the behavior
– Can introduce cues: Learned Antecedents
• Behavior: Behavior Analysts target which response to increase or decrease
– Must shape and control behavior within the physical limits of the organism
– Whether you increase or decrease a response depends on the results of that response
• Consequences: Behavior Analysts decide the consequence of the behavior
– No consequence (ignore the response)
– Reinforce the response
– Punish the response
– Consequences are the Behavior Analyst’s most important tool!
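One practical way to apply the ABCs is to log each observed episode as an antecedent, behavior, consequence triple and then tally what tends to precede and follow a target behavior. The Python sketch below assumes a hypothetical record format and helper names, and the example log entries are invented for illustration, not real observations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ABCRecord:
    """One observed episode of behavior (hypothetical record format)."""
    antecedent: str   # setting condition or cue just before the behavior
    behavior: str     # the response the animal emitted
    consequence: str  # what followed: ignored, reinforced, punished, ...


def consequences_for(records, behavior):
    """Tally which consequences followed a target behavior."""
    return Counter(r.consequence for r in records if r.behavior == behavior)


def antecedents_for(records, behavior):
    """Tally which antecedents preceded a target behavior."""
    return Counter(r.antecedent for r in records if r.behavior == behavior)


if __name__ == "__main__":
    log = [
        ABCRecord("visitor at door", "jumps up", "gets attention"),
        ABCRecord("visitor at door", "jumps up", "gets attention"),
        ABCRecord("owner returns", "jumps up", "ignored"),
        ABCRecord("visitor at door", "sits", "treat"),
    ]
    print(antecedents_for(log, "jumps up"))
    print(consequences_for(log, "jumps up"))
```

If attention follows jumping most of the time, attention is a candidate for what maintains the behavior; a trainer could then withhold it after jumping and deliver it instead after a replacement behavior such as sitting.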
Which consequence should we use?
• Punish the behavior?
– Decreases the probability of the behavior
– Can result in unstable responding, particularly
with negative reinforcement
– Can result in learned helplessness, avoidance and
aggression!
– There are often ethical limitations
Which consequence should we use?
• Ignore the behavior?
– Decreases the probability of the behavior
– Process of extinction
– Two problems:
• Extinction burst
• Extinction-induced aggression
– What is the organism learning?
Which consequence should we use?
• Positively Reinforce a behavior:
– Increases the probability of the behavior
– Can reinforce the opposite of the response you
are trying to decrease!
– Creates a “fun” learning environment
– Data suggest that organisms trained with positive
reinforcement WANT to work!
Which consequence should we use?
• But wait: won’t positive reinforcement make
greedy organisms?
– Initially, we use continuous reinforcement
– Gradually we thin out the rate of reinforcement using
partial schedules of reinforcement
– More and more responding or chains of behavior
required to get a reward
• When you were in kindergarten, you needed lots of
reinforcers every day
• Now in college you can work all semester for that
final reinforcer of an “A”.
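The gradual thinning described above can be sketched as a rule that raises the response requirement only once performance is reliable, in line with the advice to move to partial reinforcement only when the response is solid. This Python sketch is hypothetical: the window of 10 cued trials, the 80% criterion, and the maximum ratio of 5 are illustrative assumptions, not recommendations from the course.

```python
from collections import deque


class ThinningPlan:
    """Hypothetical plan for thinning from continuous to partial reinforcement.

    Starts at a ratio of 1 (continuous reinforcement). When the last
    `window` cued trials reach `criterion` proportion correct, the ratio
    requirement steps up by one, up to `max_ratio`. All three thresholds
    are illustrative assumptions.
    """

    def __init__(self, window=10, criterion=0.8, max_ratio=5):
        self.window = window
        self.criterion = criterion
        self.max_ratio = max_ratio
        self.ratio = 1
        self.recent = deque(maxlen=window)  # outcomes of the latest trials

    def record_trial(self, correct: bool) -> int:
        """Log one cued trial and return the ratio to use next."""
        self.recent.append(correct)
        full = len(self.recent) == self.window
        if full and sum(self.recent) / self.window >= self.criterion:
            if self.ratio < self.max_ratio:
                self.ratio += 1
                self.recent.clear()  # require a fresh block at the new ratio
        return self.ratio
```

The returned value could set the average requirement of a ratio schedule. The sketch only ever moves forward; if accuracy drops, a trainer might return to a denser schedule, which could be added as a second, lower threshold.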
What skills are necessary to become a good clicker trainer?
• Must be an excellent observer of behavior
– Must be able to identify the response or component of the response
• Must be precise with your clicker
– Must be quick to “catch” and “mark” that response
– May introduce a “keep going” signal, too!
• Must use the clicker as a conditioned reinforcer:
– The clicker derives its value from being tightly paired with the primary reinforcer
– Use as a bridge or a “yes, keep going” signal
– We will call the clicker an event marker
• Must be generous with reinforcement
– When learning a new response, the animal needs lots of feedback
– Reinforcement variety improves the learning process
– Reinforcers must be of value to the learner (what THEY like, not what YOU like).
– Barney stickers or Beer?
• Must be consistent!
– The animal is learning the rule, so the rule must be consistent
– Only when the response is solid will you move to partial reinforcement
Three important class components:
• During Monday lectures we will:
– Learn the theory of operant conditioning
– Learn about our organism: the domestic dog
• During our labs we will apply what we learn in the classroom, practicing with each other and with the dogs.
• Are you ready to do this?