Response-Reinforcer Relation
• Two types of relationships exist between a response and a reinforcer.
– 1. Causal Relationship between a response and the reinforcer
• Contingency: (R) - (O), doing (R) produces (O)
• the extent to which the response is necessary and sufficient for occurrence of the
reinforcer (a simple way to quantify this is sketched after this list)
– 2. Temporal Relationship between the response and reinforcer
• amount of time between response and reinforcer.
• Temporal Contiguity: (R) - (O), outcome is immediate
• Or with some delay (R) - - - -delay- - - - - (O)
• Response-reinforcer contingency and temporal contiguity are
independent of each other
– Can have an (R)-(O) contingency with either a short or long temporal relationship
• Mow the lawn, get paid immediately or get paid next Tuesday
– Can have immediate outcome but only get the outcome part of the time
• Put money in a slot machine, sometimes you win and sometimes you do not
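One standard way to quantify response-reinforcer contingency is ΔP = P(O | R) − P(O | no R): the probability of the outcome given the response minus its probability when the response does not occur. The slides do not give code, so the minimal Python sketch below is only an illustration; the delta_p helper and the trial lists are made up for the example.

```python
def delta_p(trials):
    """Contingency index: P(outcome | response) - P(outcome | no response).

    `trials` is a list of (response, outcome) booleans, one pair per trial.
    """
    with_r = [o for r, o in trials if r]
    without_r = [o for r, o in trials if not r]
    p_o_given_r = sum(with_r) / len(with_r) if with_r else 0.0
    p_o_given_not_r = sum(without_r) / len(without_r) if without_r else 0.0
    return p_o_given_r - p_o_given_not_r


# Perfect contingency: every response is reinforced, nothing happens otherwise.
perfect = [(True, True)] * 10 + [(False, False)] * 10
# Partial contingency (slot machine): responding pays off only some of the time.
partial = [(True, True)] * 3 + [(True, False)] * 7 + [(False, False)] * 10
# Zero contingency (like a superstition procedure): outcome is response-independent.
zero = [(True, True)] * 5 + [(True, False)] * 5 + [(False, True)] * 5 + [(False, False)] * 5

print(delta_p(perfect))  # 1.0
print(delta_p(partial))  # 0.3
print(delta_p(zero))     # 0.0
```

A perfect contingency gives ΔP = 1, a response-independent outcome gives ΔP = 0, and the slot-machine case falls in between.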
Effects of Temporal Contiguity
• Temporal contiguity (R) - (O)
– immediate outcome provides the best learning
– Even short delays CAN hinder learning
• Dickinson et al (1992)
– rats were reinforced for lever-pressing
– varied delay between response and reinforcer
– As the delay between the response and the reinforcer increased, conditioned
lever pressing decreased dramatically (see Figure 5.11; a toy discounting sketch
follows this list)
• Why is instrumental conditioning so sensitive to a delay of
reinforcement?
– Is this also true for human behavior?
• How much delay can people tolerate?
• What are some examples of long delay between behavior and outcome?
• Dealing with delay of reinforcement
– However, even rats can tolerate some delay if they bridge the delay
• with a conditioned reinforcer
• Or with a marking procedure
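The slides do not give a quantitative model of the delay gradient, but one common assumption, hyperbolic discounting (V = A / (1 + kD)), captures its basic shape: the effective value of the reinforcer drops sharply with even short delays. The sketch below is purely illustrative; the discount rate k = 0.2 is an arbitrary assumed value, not a parameter from Dickinson et al (1992).

```python
def discounted_value(amount, delay_s, k=0.2):
    """Hyperbolic discounting: effective value falls off as the delay grows.

    `k` is an assumed, arbitrary discount rate used only for illustration.
    """
    return amount / (1.0 + k * delay_s)


# Effective value of one reinforcer (amount = 1.0) at increasing delays.
for delay in (0, 2, 4, 8, 16, 32, 64):
    print(f"delay {delay:>2} s -> value {discounted_value(1.0, delay):.2f}")
```

Under these assumed numbers the reinforcer loses roughly half its value within a few seconds, which is the qualitative pattern the delay-of-reinforcement studies show.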
The Principles of Learning and Behavior, 7e by Michael Domjan
Copyright © 2015 Wadsworth Publishing, a division of Cengage Learning. All rights reserved.
Secondary or Conditioned Reinforcer
• Primary reinforcers: usually food, drink, and pleasure
• Secondary, or conditioned reinforcer
– associated with the primary reinforcer
• Present clicker – food pairings for dog training
• Then response "sit" -- outcome "clicker" for training the dog to sit
• The clicker is a conditioned reinforcer used to bridge the gap until the primary
reinforcer arrives (a minimal pairing sketch follows this list)
• However, the clicker – food pairing only needs to be reinstated occasionally
• Response "sit" -- outcome "verbal praise" for training a dog to sit
– Is verbal praise a conditioned reinforcer?
– What works as conditioned reinforcers for humans?
• What is a common conditioned reinforcer for people?
• Coaches, instructors, parents use verbal praise
• Is verbal praise a conditioned reinforcer?
– What is the primary reinforcer?
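As a hedged illustration of how a clicker might come to function as a conditioned reinforcer, the sketch below first builds up the clicker's value through clicker-food pairings (a Rescorla-Wagner-style update) and then uses that acquired value to strengthen the "sit" response. All learning rates and values are invented for the example; this is not a model from the text.

```python
# Minimal sketch (illustrative parameters only): a clicker acquires value by
# being paired with food, and that acquired value then reinforces a response.

FOOD_VALUE = 1.0   # value of the primary reinforcer
ALPHA = 0.2        # assumed learning rate for clicker-food pairings
BETA = 0.1         # assumed learning rate for response strengthening

clicker_value = 0.0  # associative strength of the clicker (starts neutral)
sit_strength = 0.0   # strength of the trained "sit" response

# Phase 1: clicker-food pairings make the clicker a conditioned reinforcer.
for _ in range(30):
    clicker_value += ALPHA * (FOOD_VALUE - clicker_value)

# Phase 2: "sit" is followed by the clicker alone; the response gains
# strength in proportion to the clicker's acquired value.
for _ in range(30):
    sit_strength += BETA * (clicker_value - sit_strength)

print(f"clicker value after pairings : {clicker_value:.2f}")
print(f"'sit' strength (clicker only): {sit_strength:.2f}")
```

The occasional clicker-food reinstatement mentioned above would correspond to repeating a few Phase 1 pairings so the clicker's value does not extinguish.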
Marking Procedure
• Marking the target response to bridge delayed reinforcement
– Use a specific stimulus such as a flashing light in conjunction with the response
– In discrete-trial maze tasks, the rat can be moved to a holding chamber during the delay
• Lieberman et al (1979) tested whether rats could learn a correct turn or choice in a
maze despite a long delay of reward
• Williams (1999)
– No signal group: lever press, 30-second delay, then food delivery
– Marked group: lever press, 5-second light, 25-second delay, then food delivery
– Blocked group: lever press, 25-second delay, 5-second light just before food delivery
– See Figure 5.12 (the three schedules are also laid out as timelines after this list)
– Marking improves learning while blocking prevents learning
– The marked group can use the light signal to fill the delay
– In the blocked group, the light just before the food interferes with learning
– However, even the no-signal group shows some learning
• so they must be bridging the delay by some other means
• such as ?
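To make the three Williams (1999) conditions easy to compare, the sketch below lays each one out as an event timeline relative to the lever press, using the delays given in the slides. The dictionary layout and event labels are illustrative, not taken from the original report.

```python
# Event timelines for the three Williams (1999) conditions as described above.
# Times are seconds after the lever press; labels are illustrative.
conditions = {
    "no signal": [(0, "lever press"), (30, "food")],
    "marked":    [(0, "lever press"), (0, "light on"), (5, "light off"), (30, "food")],
    "blocked":   [(0, "lever press"), (25, "light on"), (30, "light off"), (30, "food")],
}

for name, events in conditions.items():
    timeline = ", ".join(f"{t:>2}s {label}" for t, label in events)
    print(f"{name:<10}: {timeline}")
```

Laid out this way, the only difference between the marked and blocked groups is whether the 5-second light immediately follows the response or immediately precedes the food.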
Response-Reinforcer Contingency
• Contingency: the outcome is "contingent on" a particular response
– positive contingency: the response produces the outcome
– negative contingency: the response prevents or eliminates the outcome
• Studies of delay of reinforcement demonstrated
– that a perfect contingency (R – O) is not sufficient to produce strong
instrumental responding
– Led to the conclusion that response-reinforcer contiguity, rather than
contingency, was the critical factor.
• Skinner’s Superstition Experiment supported this conclusion
– Food presented to pigeons every 15 seconds regardless of the behavior of
the bird.
– Birds showed stereotyped behavior patterns
– Skinner’s operant conditioning explanation
• Adventitious (accidental) reinforcement of the bird’s behavior
• Stressed the importance of contiguity between response and the reinforcer
• However, Skinner got this one wrong
Staddon and Simmelhag Superstition Experiment
• A landmark study that challenged Skinner’s interpretation.
– See Figure 5.13
– Similar procedure except fixed time interval of 12 s
– birds were observed and their behavior recorded on all sessions.
• Found two types of responses at asymptote:
– Interim responses: started immediately after food delivery but terminated
several seconds before food (e.g., turning circles, flapping wings), differed
among pigeons and intervals
– Terminal responses: started mid-interval and continued until food was
delivered.
• For all pigeons the terminal response was pecking
• terminal responses were reinforced
– Differences in Interim and Terminal responses can be explained by behavior
systems
• Terminal responses are species-specific behavior that is part of focal search
• Interim responses are more like general search behavior
Controllability of Reinforcers
• Response controls the reinforcer
– A strong contingency allows control over the reinforcer.
– A weak contingency does not allow control over the reinforcer
– can be seen in positive reinforcement
• Eating candy for example
• However, most of the research has used Negative Reinforcement
– Which has more clinical application
– Responses remove or prevent an aversive event
• Taking drugs to reduce pain
• Jumping back to avoid getting run over
– Early experiments used escape-avoidance learning in dogs
• Unavoidable “uncontrollable” shock disrupted subsequent avoidance learning
• Avoidable “controllable” shock did not disrupt avoidance learning
– Called the learned-helplessness effect because the dogs failed to avoid the
aversive shock even when they had the opportunity, i.e., they gave up
Learned-Helplessness (LH) effect
• The triadic design for LH experiments is outlined in Table 5.2
– Phase one: Exposure to inescapable shock
• Group 1: Escape - restrained and given unsignaled shock to tail and could
terminate the shock by spinning a wheel in front of them
• Group 2: Yoked - placed in same restraint and given same number and pattern of
shocks but could not terminate the shocks
• Group 3: Control - just put in restraint, no shocks
– Phase two: all groups were put in a 2-compartment shuttlebox and taught a
normal escape/avoidance reaction
• avoid shock by responding during a 10-s warning light, or escape the shock once it
came on by jumping to the other side of the compartment
• if subject did not respond in 60 seconds the shock was terminated
– the experiment tested whether the phase-one experience affected
escape/avoidance learning in phase two (the yoking logic is sketched below)
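The defining feature of the triadic design is the yoking between groups 1 and 2: the yoked animal receives exactly the shock pattern its escape partner produces, so the two groups differ only in controllability. The sketch below mimics that bookkeeping with an invented shock-duration model; the uniform 1-10 s escape latencies are an assumption for illustration, not data from the experiments.

```python
import random

random.seed(0)  # reproducible example


def escape_latency():
    """Time the escape subject takes to turn the wheel and end the shock.

    The uniform 1-10 s latency model is invented for illustration only.
    """
    return random.uniform(1.0, 10.0)


# Phase 1 of the triadic design:
#   escape group  - shock duration is determined by its own behavior
#   yoked group   - receives the identical durations but has no control
#   control group - no shock at all
escape_durations = [escape_latency() for _ in range(20)]
yoked_durations = list(escape_durations)   # same shocks, zero controllability
control_durations = [0.0] * 20

print(f"escape  total shock: {sum(escape_durations):.1f} s")
print(f"yoked   total shock: {sum(yoked_durations):.1f} s (identical, but uncontrollable)")
print(f"control total shock: {sum(control_durations):.1f} s")
```

Because the escape and yoked animals receive the same amount of shock, any phase-two difference between them can be attributed to control rather than to shock exposure.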
Wheel-turn apparatus used in LH experiments
Learned-Helplessness (LH) effect
• Results from the triadic design for LH experiments
– The Escape group learned as easily as the Control group
– But the Yoked group showed an impairment.
• This deficit in learning is the learned-helplessness effect.
• The failure to learn was due to the inability to control shock in phase one
– According to Seligman and Maier, the lack of control in phase one led to the
development of the general expectation that behavior is irrelevant to the
shock offset.
– This expectation of lack of control transferred to the new situation in phase
two, causing retardation of learning in the shuttle box
Learned-Helplessness hypothesis (LH)
• Based on the conclusion that animals can perceive the contingency
between their behavior and the reinforcer.
• When the outcomes are independent of the subject’s behavior
– the subject develops a state of learned helplessness, which is manifest in 2 ways
• 1. there is a motivational loss, indicated by a decline in performance and a heightened
level of passivity
• 2. the subject has a generalized expectation that reinforcers will continue to be
independent of its behavior; this persistent belief is the cause of the future learning
deficit
• The LH hypothesis has been challenged by studies showing that it is not
the lack of control that leads to the LH outcome, but rather the inability
to predict the reinforcer.
– 1. Receiving predictable, inescapable shock is less damaging than receiving
unsignaled shock. If inescapable shock is signaled, there is less of a learning deficit.
– 2. Presentation of a stimulus following the offset of inescapable shock eliminates the LH deficit.
• house-light was turned off for a few seconds when shock ended
• Yoked/feedback group learned as well as the Escape and No shock groups
Alternatives to LH hypothesis: Attentional deficits
• Activity deficit hypothesis
– Inescapable shock reduces activity (similar to freezing), and the low activity itself could impair escape responding
– Not supported by Y-maze choice learning
• Attentional deficits hypothesis
– Inescapable shock may cause animals to pay less attention to their actions.
– If an animal fails to pay attention to its behavior, it will have difficulty
associating its actions with reinforcers in escape-avoidance conditioning.
– When the response is marked by an external stimulus, which helps the
animal pay attention to the appropriate response, the LH deficit is reduced.
• Stimulus relations in escape conditioning
– Why doesn’t controllable shock produce deficits in learning?
– Receiving shock will produce strong emotional responses
– In escape condition the response (turning wheel) terminates the shock
• Very similar to actual escape from a predator, i.e., negative reinforcement
• Signals "safety," so there are safety-signal feedback cues (see Figure 5.14)
FIGURE 5.14
Stimulus relations in an escape-conditioning trial. Shock-cessation feedback cues are experienced at
the start of the escape response, just before the termination of shock. Safety-signal feedback cues are
experienced just after the termination of shock, at the start of the intertrial interval.
Alternatives to LH hypothesis: Safety-signal feedback
• Safety-signal feedback hypothesis
• Minor (1988, 1990)
– The inescapable-shock group was given a safety signal at the termination of shock
• 5 second light in first study
• Audio-visual combination in the second study
– Addition of the safety signal eliminated the learned-helplessness effect
– The animal does not need to be able to escape from the aversive event
– Predictability of when the shock will begin and end prevents learned helplessness