PSY 445: Learning & Memory
Chapter 4: Instrumental Conditioning: Reward

INSTRUMENTAL CONDITIONING
E. L. Thorndike (1874-1949)
• Described the learning governed by his "law of effect" (1905) as instrumental conditioning, because responses are strengthened when they are instrumental in producing rewards
• Law of Effect: responses that are rewarded are more likely to be repeated, and responses that produce discomfort are less likely to be repeated
• "Rewarded behaviors are more likely to recur"

TRIAL-AND-ERROR LEARNING
Thorndike's Puzzle Box
• In his classic experiment, a cat was locked in the box and enticed to escape by food placed out of reach outside the box
• Ropes, levers, and latches inside the box could be used to escape
• Trial-and-error behavior would lead to ultimate success (usually within three minutes)
• Thorndike felt such trial-and-error learning occurred with awareness

GESTALT VIEWPOINT
Wolfgang Kohler
• A Gestalt psychologist who held the opposing view that we learn some things implicitly, through natural insight, without deliberate trial and error
• Example: a chimpanzee in a cage, food out of reach, but a stick is not...

INSTRUMENTAL CONDITIONING
Operant Conditioning
• A type of learning in which voluntary (controllable, non-reflexive) behavior is strengthened if it is reinforced and weakened if it is punished (or not reinforced)
B. F. Skinner

SKINNER'S OPERANT CONDITIONING
The organism learns a response by operating on the environment...
Note:
• The terms instrumental conditioning and operant conditioning describe essentially the same learning process and are often used interchangeably
• Basically, Skinner extended and formalized many of Thorndike's ideas

CLASSICAL CONDITIONING VS.
INSTRUMENTAL CONDITIONING
Instrumental Conditioning
• The response comes first and is voluntary; in classical conditioning, the stimulus comes first and the response is involuntary
• Classical: S → R (a stimulus elicits the response)
• Operant: S → R → S (a discriminative stimulus sets the occasion for a response, which produces a reinforcing stimulus)

METHODS OF STUDY
Skinner Box
• Key-pecking at a round plexiglass disk placed at eye level
Mazes
• T-shaped; straight runway
Infants
• Head turning; leg movements
Computer games

POSITIVE REINFORCEMENT
Behavior is strengthened when something pleasant or desirable occurs following the behavior
• With positive reinforcement, the chance that the behavior will occur in the future is increased

REINFORCEMENT VARIABLES AFFECTING ACQUISITION
Amount/Quality of Reinforcement
• Contrast Effect: previous experience with the reward matters
• Qualitatively: Kobre & Lipsitt (1972): infants given water, then sucrose, then water again sucked less for the water
• Quantitatively: Crespi (1942): rats running for pellets (see next slide)

QUANTITY, QUALITY, & CONTRASTS OF REINFORCEMENT
Amount of Reinforcement Effect/Contrast Effect
Crespi (1942)
Procedure
• Rats ran for pellets
• Group 1: initial small reward, then switched from the small reward to a larger reward
• Group 2: initial large reward, then switched from the large reward to a smaller reward
• Group 3: served as a control (no change in reward)
Results
• Initially, Group 2 > Group 1
• Group 1 started running faster after the shift (positive contrast)
• Group 2 started running slower after the shift (negative contrast)

CRESPI (1942)
[Figure: running speed (ft/sec) across preshift and postshift trials for three groups: 256→16 pellets, 16→16 pellets, and 1→16 pellets]

REINFORCEMENT VARIABLES AFFECTING ACQUISITION
Drive
• Motivational need or desire for the reward
Raymond (1954)
Procedure
• Deprivation experiment in which rats were deprived of food
Results
• The longer the rats were deprived, the faster they ran

REINFORCEMENT VARIABLES AFFECTING ACQUISITION
Drive
Hull (1949)
• Response strength = H x D x K
Rats will run
fastest when:
• H: prior reinforcement is high (habit strength)
• D: the rat is hungry (in a deprivation state; high drive state)
• K: the reinforcer is appealing (incentive)

SCHEDULES OF REINFORCEMENT
A schedule of reinforcement is the response requirement that must be met in order to obtain reinforcement
Different schedules
• Continuous: usually better for acquisition
• Partial (intermittent): more resistant to extinction

SCHEDULES OF REINFORCEMENT
Stevenson & Zigler (1958)
Procedure
• Children performed a push-button task; one of three buttons produced the reward
• 3 groups: 100% reward; 66% reward; 33% reward
Results
• Children on the continuous schedule pressed the correct button most often
• Children on the partial schedules tried patterns involving all three buttons, making many errors
Interpretation
• The partial schedule interfered with learning of the response-reward contingency

PARTIAL SCHEDULES
Ratio Schedule
• Used when you want to reinforce based on a certain number of responses occurring
Interval Schedule
• Used when you want to reinforce the first response after a certain amount of time has passed

FOUR TYPES OF PARTIAL SCHEDULES
Ratio Schedules
• Fixed Ratio
• Variable Ratio
Interval Schedules
• Fixed Interval
• Variable Interval

FIXED RATIO SCHEDULE
On a fixed ratio schedule, reinforcement is contingent upon a fixed, predictable number of responses
Characteristic pattern:
• High rate of response
• Short pause following each reinforcer

FIXED RATIO SCHEDULE
Higher ratio requirements result in longer post-reinforcement pauses
• Example: the longer the chapter you read, the longer the study break!
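The four partial schedules can be summarized as "is reinforcement due now?" decisions. A minimal sketch in Python (all function names are hypothetical, for illustration only, not from the lecture):

```python
import random

# Each function answers: given the current state, is reinforcement due?

def fixed_ratio(n, response_count):
    """FR-n: reinforce every n-th response (fixed, predictable count)."""
    return response_count > 0 and response_count % n == 0

def variable_ratio(n, rng=random):
    """VR-n: reinforce after an unpredictable number of responses,
    averaging n, so reinforcement is equally likely after any response."""
    return rng.random() < 1.0 / n

def fixed_interval(t, elapsed):
    """FI-t: reinforce the first response after a fixed t seconds."""
    return elapsed >= t

def variable_interval(scheduled, elapsed):
    """VI: reinforce the first response after an unpredictable interval;
    'scheduled' is redrawn around some average after each reinforcer."""
    return elapsed >= scheduled
```

This makes the characteristic response patterns easier to see: on FR, the counter resets after each reinforcer, so a pause costs nothing (post-reinforcement pause); on VR, the very next response might pay off, so pausing is never advantageous (high, steady responding).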
Ratio Strain: a disruption in responding due to an overly demanding response requirement
• Movement from a "dense/rich" schedule to a "lean" schedule should be done gradually

FIXED INTERVAL SCHEDULES
On a fixed interval schedule, reinforcement is contingent upon the first response after a fixed, predictable period of time
Characteristic pattern:
• Response/reward, then a post-reinforcement pause, followed by a gradually increasing rate of response as the time interval draws to a close

VARIABLE RATIO SCHEDULE
On a variable ratio schedule, reinforcement is contingent upon a varying, unpredictable number of responses
Characteristic pattern:
• High and steady rate of response
• Little or no post-reinforcement pausing
• Examples: telemarketing, casino slot machines

VARIABLE INTERVAL SCHEDULE
On a variable interval schedule, reinforcement is contingent upon the first response after a varying, unpredictable period of time
Characteristic pattern:
• A moderate, steady rate of response with little or no post-reinforcement pause

DELAY OF REINFORCEMENT
Delayed reinforcement is usually less effective than reinforcement given immediately after a correct response
Problems
• Other behaviors occurring during the delay may unintentionally become conditioned
• The response may have been forgotten

DELAY OF REINFORCEMENT
Lieberman, McIntosh, & Thomas (1979)
Procedure
• T-shaped maze: rats rewarded for a correct turn, not rewarded for an incorrect turn
• However, before the reward/no reward, they were placed in a delay box
Results
• Rats were slow to learn which turn was correct
Interpretation
• Difficulty remembering: "Did I make the right turn?"

DELAY OF REINFORCEMENT
Self-Control: the capacity to inhibit immediate gratification
• Choice between a small immediate reward vs.
a delayed large reward
• Too long a delay is not good; too small a reward is not good
• Gradual increases in delay can work
Logue, Forzano, & Ackerman (1996)
• Experiment on children: age matters, as 3-year-olds were more likely than 5-year-olds to choose a smaller immediate reward

DELAY OF REINFORCEMENT
Self-Control
• Food is a tough one to wait for
• Even adults have trouble waiting for the bigger reward

REINFORCERS
Primary Reinforcers
• Innately rewarding; no learning necessary
• A stimulus that naturally strengthens any response that precedes it (increases the behavior) without any learning on the part of the organism
• Food, water, etc.
Secondary Reinforcers
• A consequence that becomes reinforcing through pairing with a primary reinforcer and thus increases behavior
• For people, money, good grades, words of praise, etc. are often linked to basic rewards
• We need money to buy food, etc.

SECONDARY REINFORCEMENT
Social Reinforcers
• Praise, attention, physical contact, and facial expressions given by parents, teachers, or peers can exert considerable control over our behavior

THEORIES OF REINFORCEMENT
Reinforcers as stimuli
• Drive Reduction
• Incentive Motivation
• Brain Stimulation

DRIVE REDUCTION THEORY (HULL, 1943)
Supporters of this theory believe that when a need requires satisfaction, it produces drives
• Drives are tensions that energize behavior in order to satisfy a need
• Hunger and thirst, for instance, are drives for satisfying the needs of eating and drinking, respectively

DRIVE REDUCTION THEORY
Drives are generally classified as primary and secondary...
• Primary drives satisfy biological needs and must be fulfilled in order to survive
• Homeostasis is the motivational phenomenon for primary drives that preserves our internal equilibrium.
This is true, for example, for hunger and thirst
• Secondary drives satisfy needs that are not crucial to a person's life
Critics felt that this theory was inadequate in explaining secondary drives

INCENTIVE MOTIVATION
Sometimes we just do things because they are FUN! When this happens, motivation comes from some property of the reinforcer itself rather than from an internal drive
• Examples include playing games and sports, putting spices on food, etc.

INCENTIVE THEORY
Suggests that people act to obtain positive incentives and avoid negative incentives
• Explains secondary drives much better than drive-reduction theory

BRAIN STIMULATION
Underlying physiological basis of reinforcement
• Possibly a part of the brain that is activated by stimuli that work as reinforcers
Olds & Milner (1954)
• Electrical stimulation of the rat's brain (electrodes aimed at the reticular formation landed in the septal area) was reinforcing
Blum et al. (1996)
• Found some evidence that a genetic anomaly is associated with a reward-craving syndrome
• People with a certain version of this gene become easily addicted to compulsive behaviors (smoking, gambling)

REINFORCERS AS BEHAVIORS
Rather than characterizing reinforcers as stimuli, they can be viewed as activities and behaviors
• This view clearly expands the category of reinforcers
Premack Principle (1962)
• Behaviors can be ranked in terms of their preference or value to an individual
• Once this is determined, a more probable activity can be used to reinforce a less probable one
Limitations
• Determining an activity's baseline frequency requires giving the person unrestricted opportunity to engage in it
• Behaviors will vary over time (deprivation, satiation)

REINFORCERS AS BEHAVIORS
Homme et al. (1963)
• Unruly preschoolers
High-probability behaviors
• Ignoring the teacher
• Screaming
• Pushing furniture
Low-probability behavior
• Sitting quietly
(Premack Principle)

HOMME ET AL. (1963)
Rewarded sitting quietly with...
• 3 minutes of running around and screaming
Results
• Sitting quietly increased
• Particular behaviors were observed for different kids; different responses were effective reinforcers for different kids
(Premack Principle)

REINFORCERS AS STRENGTHENERS
A reinforcer can strengthen the association between a discriminative stimulus and an instrumental response
• Light → Bar Press → Food
• The food reinforcer strengthens the association between the light and the bar press
• The question remains: are the rats responding because of the reward (food), or because the food strengthened the association between light and bar press?

REINFORCERS AS STRENGTHENERS
Huston, Mondadori, & Waser (1974)
• Mice on a platform in a Skinner box; the natural reaction is to step off
• Group 1: step off platform → shock
• Group 2: step off platform → shock → food
• If food is a reward, then Group 2 should step off again
OR
• If food is a strengthener, it will create a stronger link between stepping off the platform and the shock
• Results?

REINFORCERS AS INFORMATION
No obvious reinforcer
• Information may be positive ("Yes, I got it right") or negative ("No, I messed up")

IS REINFORCEMENT NECESSARY?
Tolman & Honzik (1930)
Exp. 1: discussed in chapter 1; latent learning among rats not immediately reinforced
Exp.
2
Procedure
• Group 1: reinforced every time they found their way out of the maze (food in the goal box) for 10 days; on day 11, no food in the goal box
Results
• Rats started taking wrong turns
Interpretation
• Taking reinforcement away leads to confusion

AWARENESS IN HUMAN INSTRUMENTAL LEARNING
Subliminal messages were once thought to be so effective that legislation was proposed to prohibit such commercial messages in movies
New Jersey (1957) movie theatre
• "Eat popcorn"
• "Drink Coca-Cola"

AWARENESS IN HUMAN INSTRUMENTAL LEARNING
Greenspoon (1955)
• The researcher would mutter "umm humm" whenever a plural noun was emitted
• Although subjects were not told of this contingency, they nevertheless began to use more plural nouns over the course of the experiment
• Researcher's conclusion: verbal conditioning took place without awareness
Dulany (1968)
• A replication suggests a different conclusion

CRITICISMS OF THE USE OF REINFORCEMENT
1. Manipulative form of control
2. Certain behaviors should be performed without rewards
3. Reinforcement produces transient changes
4. Intrinsic motivation is undermined by rewards
• The internally motivated desire to perform a behavior for its own sake may be lessened

DOES REINFORCEMENT UNDERMINE INTRINSIC MOTIVATION?
Lepper, Greene, and Nisbett (1973)
Baseline observations:
• 51 three- to five-year-olds who showed intrinsic interest in a target activity
Procedure
• Expected-Award condition (reinforcement)
• Unexpected-Award condition (reinforcement)
• No-Award condition (no reinforcement)
Results
• Those reinforced colored less than those not reinforced

DOES REINFORCEMENT UNDERMINE INTRINSIC MOTIVATION?
Lepper, Greene, and Nisbett (1973)
Interpretation
• Overjustification
Limitations
• Children in the rewarded experimental groups may have become satiated on the drawing activity

RESPONSE LEARNING
Shaping
• Training a behavior not in an organism's behavioral repertoire
• Reinforcing successive approximations is usually part of this process (see next slide)
• Skinner taught pigeons "unpigeon-like" behaviors
• Others have trained monkeys to help quadriplegics

RESPONSE LEARNING
Chaining
• Here, the response being reinforced is an entire sequence of behaviors
• Explanations:
1. Each response also acts as the discriminative stimulus for the next response in the series
2. Each response acts as a secondary reinforcer for the previous response
Note
• Forward-chaining and backward-chaining appear to work equally well

RESPONSE LEARNING
Limitations
1. Some reflex responses cannot be modified
2. Species-specific limitations
Breland & Breland (1961)
• Tried to teach pigs to deposit wooden coins in a "piggy bank" by offering a food reward
• The researchers were unsuccessful; natural instincts won out over reinforcement

RESPONSE LEARNING
Limitations
3. Evolutionary preparedness interference
4. Behavior system interference

DISCRIMINATIVE STIMULUS CONTROL
Discriminative Stimuli (SD)
• Stimuli that signal when (or where) reinforcement is available
Stimulus Control
• The probability of the behavior varies depending upon the stimuli present
• The response is brought under the control of the stimulus
Talwar et al.
(2002)
• Remote-control rat

STIMULUS CONTROL: GENERALIZATION
Generalization occurs when responses to one stimulus also occur to other, usually similar, stimuli
• Generally, as the training and test stimuli become more different, responding declines, producing what is called a generalization gradient
Guttman & Kalish (1956)
• Pigeons were reinforced for pecking a key lit at 580 nm (orange-yellow) on a VI schedule
• A test session was then given in which many different key-light colors were presented in extinction (see next slide)

STIMULUS GENERALIZATION AS A MEASURE OF STIMULUS CONTROL
[Figure: Guttman & Kalish (1956): generalization gradient, responses (0-400) plotted against wavelength (500-640 nm), peaking at the 580 nm training SD]
Pigeons were trained to peck in the presence of a colored light of 580 nm wavelength and then tested in the presence of other colors.

STIMULUS CONTROL: DISCRIMINATION
Discrimination training involves presenting at least two stimuli but reinforcing responses to only one of them
• Discrimination is differential responding to multiple stimuli
• Responses are reinforced in the presence of SD, but not in the presence of S∆, a stimulus that signals the absence of reinforcement
Limitations
• Non-reinforced responses can produce negative reactions (frustration, agitation)
• S∆ may become aversive through association with these negative emotions

WHAT IS LEARNED IN INSTRUMENTAL CONDITIONING
Response-Reinforcer Learning
• The organism performs the response to get the reward
Stimulus-Response Learning
• A connection is learned between the SD and the response
• The reinforcer acts to condition this association but is not part of the learned sequence
• Singh (1970): "free rewards"
Stimulus-Reinforcer Learning
• Classical conditioning can also occur
• The typical sequence in an instrumental trial is: SD → Response → Reinforcement

HABITS
Habit Slips
• The intrusion of a habit when an alternative behavior had been intended
• An SD can evoke an instrumental response even
though changed conditions suggest that a different response is currently more appropriate
Breaking Habits
• It is easier to correct a habit when the eliciting stimuli (SD) are absent or disrupted

BEHAVIOR MODIFICATION
Successful programs follow the rules of instrumental conditioning
• Punishment: Ayllon (1963), stealing food in the cafeteria example
• Eliminate the reinforcer: Ayllon (1963), towel example
• Increase rewards: token economy (secondary reinforcers)

BEHAVIORAL ECONOMICS
Loss Aversion: irrational actions related to being more sensitive to potential losses than to potential equivalent gains
Chen, Lakshminarayanan, & Santos (2006)
Procedure
• Monkeys were offered slices of apple in exchange for a token
• Two options:
• Person 1: showed two slices and sometimes gave one or both
• Person 2: showed one slice but sometimes gave a second one
• On average, both persons gave the monkeys the same number of slices
Results
• Monkeys began to avoid the person showing two slices
Interpretation
• Being shown two slices and receiving only one feels like a loss, compared with being shown one slice and receiving it

BEHAVIORAL ECONOMICS
The Goal Gradient Hypothesis: the effect of a reward is weaker the further the behavior is from the reward
Kivetz, Urminsky, & Zheng (2006)
Procedure
• Reward-program punch cards at a campus coffee shop
• Experiment 1: regular punch cards (10 holes)
• Experiment 2: one group of students got regular punch cards while a second group got "illusory" punch cards (12 holes, but 2 already punched)
Results
• In Exp. 1, responding increased as students got closer to the free coffee; in Exp. 2, students with the "illusory" punch cards responded faster than the others

CREDITS
Some slides prepared with the help of the following website:
• www.radford.edu/~pjackson/ExtinctIC.ppt