Response-Reinforcer Relation
• Two types of relationships exist between a response and a reinforcer.
  – 1. Causal relationship between a response and the reinforcer
    • Contingency: (R) - (O), doing (R) produces (O)
    • The extent to which the response is necessary and sufficient for the occurrence of the reinforcer
  – 2. Temporal relationship between the response and reinforcer
    • The amount of time between response and reinforcer
    • Temporal contiguity: (R) - (O), the outcome is immediate
    • Or with some delay: (R) - - - - delay - - - - (O)
• Response-reinforcer contingency and temporal contiguity are independent of each other
  – Can have an (R)-(O) contingency with either a short or a long temporal relationship
    • Mow the lawn, get paid immediately or get paid next Tuesday
  – Can have an immediate outcome but only get the outcome part of the time
    • Put money in a slot machine; sometimes you win and sometimes you do not

Effects of Temporal Contiguity
• Temporal contiguity (R) - (O)
  – An immediate outcome provides the best learning
  – Even short delays CAN hinder learning
• Dickinson et al. (1992)
  – Rats were reinforced for lever-pressing
  – Varied the delay between response and reinforcer
  – As the delay between the response and reinforcer increased, conditioned lever pressing decreased dramatically (see Figure 5.11)
• Why is instrumental conditioning so sensitive to a delay of reinforcement?
  – Is this also true for human behavior?
    • How much delay can people tolerate?
    • What are some examples of a long delay between behavior and outcome?
• Dealing with delay of reinforcement
  – However, even rats can tolerate some delay if they bridge the delay
    • with a conditioned reinforcer
    • or with a marking procedure
    • (a toy simulation of the delay gradient and conditioned-reinforcer bridging follows the next slide)

Secondary or Conditioned Reinforcer
• Primary reinforcers: usually food, drink, and pleasure
• Secondary, or conditioned, reinforcer: associated with the primary reinforcer
  • Present clicker–food pairings for dog training
  • Then response "sit" -- outcome "clicker" for training the dog to sit
  • The clicker is a conditioned reinforcer used to bridge the gap until the primary reinforcer
  • However, the clicker–food pairing only needs to be reinstated occasionally
  • Response "sit" -- outcome "verbal praise" for training a dog to sit
  – Is verbal praise a conditioned reinforcer?
  – What works as conditioned reinforcers for humans?
    • What is a common conditioned reinforcer for people?
    • Coaches, instructors, and parents use verbal praise
    • Is verbal praise a conditioned reinforcer?
      – What is the primary reinforcer?
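As a rough illustration of the delay-of-reinforcement gradient and of how a conditioned reinforcer can bridge the gap, the sketch below assumes an exponential decay of reinforcer effectiveness with delay and treats an immediate clicker as carrying a fixed fraction of the value of an immediate primary reinforcer. The decay form, the time constant, and the bridge value are all illustrative assumptions, not Dickinson et al.'s (1992) model or data.

```python
import math

def reinforcement_strength(delay_s, tau=4.0):
    """Hypothetical effectiveness of a reinforcer delivered delay_s seconds
    after the response, decaying exponentially with time constant tau.
    The exponential form and tau value are assumptions for illustration."""
    return math.exp(-delay_s / tau)

def effective_strength(delay_s, bridged=False, bridge_value=0.8):
    """If a conditioned reinforcer (e.g., a clicker) follows the response
    immediately, assume it supplies most of the reinforcing effect right
    away, so the long delay to the primary reinforcer matters less."""
    primary = reinforcement_strength(delay_s)
    if bridged:
        # Immediate conditioned reinforcer: assumed fixed fraction of the
        # value an immediate primary reinforcer would have.
        return max(primary, bridge_value)
    return primary

if __name__ == "__main__":
    for delay in (0, 2, 8, 30, 64):
        print(f"delay {delay:>2}s  unbridged {effective_strength(delay):.2f}"
              f"  with clicker {effective_strength(delay, bridged=True):.2f}")
```

Under these assumptions the unbridged strength collapses within a few seconds of delay, while the bridged value stays high regardless of when the food arrives, which is the intuition behind clicker training.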
Marking Procedure
• Marking the target response to bridge delayed reinforcement
  – Use a specific stimulus, such as a flashing light, in conjunction with the response
  – In discrete-trial ("maze") procedures, move the rat to a holding chamber during the delay
• Lieberman et al. (1979) tested whether rats could learn a correct turn or choice in a maze despite a long delay of reward
• Williams (1999) (the three trial timelines are sketched after this slide set)
  – No signal group: lever press, 30-second delay, then food delivery
  – Marked group: lever press, 5-second light, 25-second delay, then food delivery
  – Blocked group: lever press, 25-second delay, 5-second light before food delivery
  – See Figure 5.12
  – Marking improves learning, while blocking prevents learning
  – The marked group can use the light signal to fill the delay
  – In the blocked group, the light just before the food interferes with learning
  – However, even the no-signal group shows some learning
    • so they must be bridging the delay by some other means
    • such as ?

Response-Reinforcer Contingency
• Contingency: the outcome is "contingent on" a particular response
  – Positive contingency: the response produces the outcome
  – Negative contingency: the response prevents or eliminates the outcome
• Studies of delay of reinforcement demonstrated
  – that a perfect contingency (R – O) is not sufficient to produce strong instrumental responding
  – This led to the conclusion that response-reinforcer contiguity, rather than contingency, was the critical factor
• Skinner's Superstition Experiment supported this conclusion
  – Food was presented to pigeons every 15 seconds regardless of the behavior of the bird
  – Birds showed stereotyped behavior patterns
  – Skinner's operant conditioning explanation
    • Adventitious (accidental) reinforcement of the bird's behavior
    • Stressed the importance of contiguity between the response and the reinforcer
    • However, Skinner got this one wrong

Staddon and Simmelhag Superstition Experiment
• A landmark study that challenged Skinner's interpretation
  – See Figure 5.13
  – Similar procedure, except with a fixed time interval of 12 s
  – Birds were observed and their behavior recorded in all sessions
• Found two types of responses at asymptote:
  – Interim responses: started immediately after food delivery but terminated several seconds before food (e.g., turning circles, flapping wings); differed among pigeons and intervals
  – Terminal responses: started mid-interval and continued until food was delivered
    • For all pigeons the terminal response was pecking
    • Terminal responses were reinforced
  – Differences in interim and terminal responses can be explained by behavior systems
    • Terminal responses are species-specific behavior that is part of focal search
    • Interim responses are more like general search behavior
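To keep the three Williams (1999) conditions straight, the sketch below simply encodes each group's trial as the event sequence listed above; the data layout and printing are an illustrative study aid, not Williams's procedure or code.

```python
# Each trial is a sequence of (event, duration in seconds) pairs,
# taken directly from the group descriptions on the Marking Procedure slide.
TRIALS = {
    "No signal": [("lever press", 0), ("delay", 30), ("food", 0)],
    "Marked":    [("lever press", 0), ("light", 5), ("delay", 25), ("food", 0)],
    "Blocked":   [("lever press", 0), ("delay", 25), ("light", 5), ("food", 0)],
}

def describe(group):
    """Print one group's trial as a simple timeline string."""
    steps = " -> ".join(
        f"{event}({dur}s)" if dur else event
        for event, dur in TRIALS[group]
    )
    print(f"{group:>9}: {steps}")

for name in TRIALS:
    describe(name)
# A light right after the response ("Marked") aids learning;
# the same light right before food ("Blocked") interferes with it.
```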
Controllability of Reinforcers
• Response controls the reinforcer
  – A strong contingency allows control over the reinforcer
  – A weak contingency does not allow control over the reinforcer
  – Can be seen in positive reinforcement
    • Eating candy, for example
• However, most of the research has used negative reinforcement
  – Which has more clinical application
  – Responses remove or prevent an aversive event
    • Taking drugs to reduce pain
    • Jumping back to avoid getting run over
  – Early experiments used escape-avoidance learning in dogs
    • Unavoidable "uncontrollable" shock disrupted subsequent avoidance learning
    • Avoidable "controllable" shock did not disrupt avoidance learning
  – Called the learned-helplessness effect because the dogs failed to avoid the aversive shock even when they had the opportunity, i.e., they gave up

Learned-Helplessness (LH) effect
• The triadic design for LH experiments is outlined in Table 5.2
  – Phase one: exposure to inescapable shock
    • Group 1: Escape - restrained and given unsignaled shock to the tail; could terminate the shock by spinning a wheel in front of them
    • Group 2: Yoked - placed in the same restraint and given the same number and pattern of shocks, but could not terminate the shocks
    • Group 3: Control - just put in the restraint, no shocks
  – Phase two: all groups put in a two-compartment shuttle box and taught a normal escape/avoidance reaction
    • Avoid shock by responding during a 10-s warning light, or escape the shock once it came on, by jumping to the other side of the compartment
    • If the subject did not respond within 60 seconds, the shock was terminated
  – The experiment tested whether the phase one experience affected escape/avoidance learning

Wheel-turn apparatus used in LH experiments (figure)

Learned-Helplessness (LH) effect
• Results from the triadic design for LH experiments
  – The Escape group learned as easily as the Control group
  – But the Yoked group showed an impairment
    • This deficit in learning is the learned-helplessness effect
    • The failure to learn was due to the inability to control shock in phase one
  – According to Seligman and Maier, the lack of control in phase one led to the development of the general expectation that behavior is irrelevant to shock offset
  – This expectation of lack of control transferred to the new situation in phase two, causing retardation of learning in the shuttle box

Learned-Helplessness hypothesis (LH)
• Based on the conclusion that animals can perceive the contingency between their behavior and the reinforcer
• When the outcomes are independent of the subject's behavior (a short sketch after this slide set shows one way to formalize this zero contingency)
  – The subject develops a state of learned helplessness, which is manifest in two ways:
    • 1. There is a motivational loss, indicated by a decline in performance and a heightened level of passivity
    • 2. The subject has a generalized expectation that reinforcers will continue to be independent of its behavior; this persistent belief is the cause of the future learning deficit
• The LH hypothesis has been challenged by studies showing that it is not the lack of control that leads to the LH outcome, but rather the inability to predict the reinforcer
  – 1. Receiving predictable, inescapable shock is less damaging than receiving unsignaled shock. If inescapable shock is signaled, there is less learning deficit.
  – 2. Presentation of stimuli following the offset of inescapable shock eliminates the LH deficit
    • The house light was turned off for a few seconds when the shock ended
    • The Yoked/feedback group learned as well as the Escape and No-shock groups
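The slides describe the yoked condition as one in which outcomes are "independent of the subject's behavior." One common way to quantify response-outcome contingency (not spelled out in the slides) is dP = P(outcome | response) - P(outcome | no response). The sketch below computes dP for hypothetical probabilities chosen only to illustrate positive, negative, and zero contingencies.

```python
def delta_p(p_outcome_given_response, p_outcome_given_no_response):
    """Response-outcome contingency: positive if responding makes the
    outcome more likely, negative if responding makes it less likely,
    and zero if the outcome is independent of behavior (as for a yoked
    subject receiving uncontrollable shock)."""
    return p_outcome_given_response - p_outcome_given_no_response

# Probabilities below are made up purely to illustrate the three cases.
examples = {
    "positive (response produces outcome)":   (0.9, 0.1),
    "negative (response prevents outcome)":   (0.1, 0.9),
    "zero (outcome independent of behavior)": (0.5, 0.5),
}

for label, (p_r, p_no_r) in examples.items():
    print(f"{label}: dP = {delta_p(p_r, p_no_r):+.1f}")
```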
Alternatives to LH hypothesis: Attentional deficits
• Activity deficit hypothesis
  – Reduced activity, similar to freezing
  – Not supported by Y-maze choice learning
• Attentional deficits hypothesis
  – Inescapable shock may cause animals to pay less attention to their actions
  – If an animal fails to pay attention to its behavior, it will have difficulty associating its actions with reinforcers in escape-avoidance conditioning
  – When the response is marked by an external stimulus, which helps the animal pay attention to the appropriate response, the LH deficit is reduced
• Stimulus relations in escape conditioning
  – Why doesn't controllable shock produce deficits in learning?
  – Receiving shock will produce strong emotional responses
  – In the escape condition the response (turning the wheel) terminates the shock
    • Very similar to actual escape from a predator, i.e., negative reinforcement
    • Signals "safety," so there are safety-signal feedback cues (see Figure 5.14)

FIGURE 5.14 Stimulus relations in an escape-conditioning trial. Shock-cessation feedback cues are experienced at the start of the escape response, just before the termination of shock. Safety-signal feedback cues are experienced just after the termination of shock, at the start of the intertrial interval.

Alternatives to LH hypothesis: Safety-signal feedback
• Safety-signal feedback hypothesis
• Minor (1988, 1990)
  – Inescapable-shock group given a safety signal at the termination of shock
    • A 5-second light in the first study
    • An audio-visual combination in the second study
  – Addition of the safety signal eliminated the learned-helplessness effect
  – The subject does not need to be able to escape from the aversive event
  – Predictability, of when it will begin and end, will prevent learned helplessness

The Principles of Learning and Behavior, 7e by Michael Domjan. Copyright © 2015 Wadsworth Publishing, a division of Cengage Learning. All rights reserved.
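As a compact recap of the triadic design and the safety-signal variant described in the slides above, the sketch below restates each group's phase 1 treatment and phase 2 result; the dictionary layout is just an illustrative way to organize the comparison, not Table 5.2 itself.

```python
# Phase 1 treatment and phase 2 shuttle-box result for each group,
# as summarized in the slides (triadic design plus Minor's feedback variant).
TRIADIC_DESIGN = {
    "Escape":           ("escapable shock (wheel turn ends it)",        "learns escape/avoidance normally"),
    "Yoked":            ("same shocks, inescapable",                    "learning deficit (learned helplessness)"),
    "Yoked + feedback": ("inescapable shock, safety signal at offset",  "learns as well as Escape and Control"),
    "Control":          ("restraint only, no shock",                    "learns escape/avoidance normally"),
}

for group, (phase1, phase2) in TRIADIC_DESIGN.items():
    print(f"{group:<16} phase 1: {phase1:<44} phase 2: {phase2}")
```

Laid out this way, the contrast the section builds toward is easy to see: the Yoked and Yoked + feedback groups receive identical, uncontrollable shocks, so the difference in phase 2 points to predictability of shock offset rather than controllability.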