Concepts of Conditioning
Dr Dinesh Ramoo

Classical Conditioning: Concepts

Explanations of Classical Conditioning
What really is classical conditioning? As is often the case, the process appeared simple at first, but later investigation found it to be a more complex and more interesting phenomenon. Pavlov noted that conditioning depended on the timing between the CS and the UCS.

Pavlov's Hypothesis
Pavlov surmised that presenting the CS and UCS at nearly the same time caused a connection to grow in the brain, so that the animal treated the CS as if it were the UCS. The first figure illustrates the connections before the start of training: the UCS excites a UCS centre in the brain, which immediately stimulates the UCR centre. The second figure illustrates the connections that develop during conditioning: pairing the CS and UCS develops a connection between their brain representations. After this connection develops, the CS excites the CS centre, which excites the UCS centre, which excites the UCR centre and produces a response.

Later studies contradicted that idea. For example, a shock (UCS) causes rats to jump and shriek, but a conditioned stimulus paired with shock makes them freeze in position. They react to the conditioned stimulus as a danger signal, not as if they felt a shock. Also, when conditioning involves a long delay between the CS and the UCS, the animal does not make a conditioned response immediately after the conditioned stimulus but instead waits until almost the end of the usual delay between the CS and the UCS. Again, it is not treating the CS as if it were the UCS; it is using it as a predictor, a way to prepare for the UCS (Gallistel & Gibbon, 2000).

It is true, as Pavlov suggested, that the longer the delay between the CS and the UCS, the weaker the conditioning, other things being equal. However, just having the CS and UCS close together in time is not enough. It is essential that they occur together more often than they occur apart; that is, there must be some contingency, or predictability, between them. Consider this experiment: for rats in both Group 1 and Group 2, every presentation of a CS is followed by a UCS, as shown in Figure 6.9. However, for Group 2 the UCS also appears at many other times, without the CS. In other words, for this group the UCS happens every few seconds anyway, and it is not much more likely with the CS than without it. Group 1 learns a strong response to the CS; Group 2 does not (Rescorla, 1968, 1988).

Now consider this experiment: one group of rats receives a light (CS) followed by shock (UCS) until they respond consistently to the light. (The response is to freeze in place.) Then they get a series of trials with both a light and a tone, again followed by shock. Do they learn a response to the tone? No. The tone always precedes the shock, but the light already predicted the shock, and the tone adds nothing new. The same pattern occurs with the reverse order: first the rats learn a response to the tone, and then they get light-tone combinations before the shock. They continue responding to the tone, but not to the light, again because the new stimulus predicted nothing that was not already predicted (Kamin, 1969). These results demonstrate the blocking effect: the previously established association to one stimulus blocks the formation of an association to the added stimulus. Again, it appears that conditioning depends on more than presenting two stimuli together in time. Learning occurs only when one stimulus predicts another.
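This prediction-based account can be made concrete with the Rescorla-Wagner learning rule, a standard textbook model that the lecture itself does not present: on each trial, every stimulus that is present shares an update proportional to the prediction error, the difference between what the UCS delivers and what the stimuli already predict. Below is a minimal sketch in Python; the learning rate, trial counts, and stimulus names are illustrative assumptions, not values from the experiments cited above.

    # Minimal sketch of the Rescorla-Wagner rule: on each reinforced trial,
    # every stimulus that is present shares one prediction-error-driven update.
    def rescorla_wagner(trials, alpha=0.3, lam=1.0):
        """trials: a list of sets of stimulus names, each set presented together
        and followed by the UCS. Returns the associative strength of each stimulus."""
        V = {}
        for present in trials:
            prediction = sum(V.get(s, 0.0) for s in present)  # what the animal expects
            error = lam - prediction                          # surprise caused by the UCS
            for s in present:
                V[s] = V.get(s, 0.0) + alpha * error
        return V

    # Phase 1: light alone signals shock. Phase 2: light and tone together signal shock.
    V = rescorla_wagner([{"light"}] * 20 + [{"light", "tone"}] * 20)
    print(V)  # light ends near 1.0; tone stays near 0.0

By the time the tone is introduced, the light already predicts the shock almost perfectly, so the prediction error is near zero and the tone gains almost no associative strength. That is the blocking effect just described.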
Later research has found that presenting two or more stimuli at a time often produces complex results that we would not have predicted from the results of single-stimulus experiments (Urushihara, Stout, & Miller, 2004).

Operant Conditioning

Shaping Behaviour
Suppose you want to train a rat to press a lever. If you put the rat in a box and wait, the rat might never press it. To avoid interminable waits, Skinner introduced a powerful technique called shaping: establishing a new response by reinforcing successive approximations to it. To shape a rat to press a lever, you might begin by reinforcing the rat for standing up, a common behaviour in rats. After a few reinforcements, the rat stands up more frequently. Now you change the rules, giving food only when the rat stands up while facing the lever. Soon it spends more time standing up and facing the lever. (It extinguishes its behaviour of standing and facing in other directions because those responses are not reinforced.) Next you provide reinforcement only when the rat stands facing the correct direction while in the half of the cage nearer the lever. You gradually move the boundary, and the rat moves closer to the lever. Then the rat must touch the lever and, finally, apply weight to it. Through a series of short, easy steps, you shape the rat to press a lever.

Shaping works with humans too, of course. All of education is based on the idea of shaping: first, your parents or teachers praise you for counting your fingers; later, you must add and subtract to earn their congratulations; step by step your tasks become more complex until you are doing calculus.

Chaining Behaviour
Ordinarily, you don't do just one action and then stop; you do a long sequence of actions. To produce sequences of learned behaviour, psychologists use a procedure called chaining. Assume you want to train an animal, perhaps a guide dog or a show horse, to go through a sequence of actions in a particular order. You could chain the behaviours, reinforcing each one with the opportunity to engage in the next one. First, the animal learns the final behaviour for a reinforcement. Then it learns the next-to-last behaviour, which is reinforced by the opportunity to perform the final behaviour. And so on.

For example, a rat might first be placed on the top platform, as shown in figure f, where it eats food. Then it is put on the intermediate platform with a ladder in place leading to the top platform. The rat learns to climb the ladder. After it has done so, it is placed again on the intermediate platform, but this time the ladder is not present. It must learn to pull a string to raise the ladder so that it can climb to the top platform. Then the rat is placed on the bottom platform (figure a). It now has to learn to climb the ladder to the intermediate platform, pull a string to raise the ladder, and then climb the ladder again. We could, of course, extend the chain still further. Each behaviour is reinforced with the opportunity for the next behaviour, except for the final behaviour, which is reinforced with food.

People learn to make chains of responses too. First, you learned to eat with a fork and spoon. Later, you learned to put your own food on the plate before eating. Eventually, you learned to plan a menu, go to the store, buy the ingredients, cook the meal, put it on the plate, and then eat it. Each behaviour is reinforced by the opportunity to engage in the next behaviour.
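The logic of shaping can be summarized as a loop: reinforce any response that meets the current criterion, and once responding at that level is reliable, tighten the criterion. The toy simulation below is purely illustrative and not from the lecture; the response model, learning rate, and thresholds are made-up assumptions chosen only to make the loop runnable.

    import random

    # Toy shaping loop: the "rat" emits responses of varying quality (0 = nothing
    # like a lever press, 1 = a full lever press), and reinforcement nudges its
    # typical response toward whatever level was just reinforced.
    criterion = 0.1   # start with an easy approximation (e.g., merely standing up)
    skill = 0.05      # the rat's current typical response quality
    for _ in range(1000):
        response = random.gauss(skill, 0.1)             # behaviour varies from trial to trial
        if response >= criterion:                       # good enough under the current rule?
            skill += 0.2 * (response - skill)           # reinforced responses become more typical
            if skill > criterion + 0.05:                # responding reliably above the criterion,
                criterion = min(1.0, criterion + 0.05)  # so raise the requirement a little
    print(round(criterion, 2), round(skill, 2))  # both end near 1.0, the full lever press

The point of the sketch is only that each small increase in the requirement is easy relative to what the animal already does, which is why a long series of short steps succeeds where waiting for the full response would not.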
To show how effective shaping and chaining can be, Skinner performed this demonstration: first, he trained a rat to go to the centre of a cage. Then he trained it to do so only when he was playing a certain piece of music. Next he trained it to wait for the music, go to the centre of the cage, and sit up on its hind legs. Step by step he eventually trained the rat to wait for the music (which happened to be the "Star-Spangled Banner"), move to the centre of the cage, sit up on its hind legs, put its claws on a string next to a pole, pull the string to hoist the U.S. flag, and then salute it. Only then did the rat get its reinforcement. Needless to say, a display of patriotism is not part of a rat's usual repertoire of behaviour.

Schedules of Reinforcement
The simplest procedure in operant conditioning is to provide reinforcement for every correct response, a procedure known as continuous reinforcement. However, in the real world, unlike the laboratory, continuous reinforcement is not common. Reinforcement for some responses and not for others is known as intermittent reinforcement. We behave differently when we learn that only some of our responses will be reinforced. Psychologists have investigated the effects of many schedules of reinforcement, which are rules or procedures for the delivery of reinforcement. Four schedules for the delivery of intermittent reinforcement are fixed ratio, fixed interval, variable ratio, and variable interval. A ratio schedule provides reinforcements depending on the number of responses. An interval schedule provides reinforcements depending on the timing of responses.

Fixed-Ratio Schedule
A fixed-ratio schedule provides reinforcement only after a certain (fixed) number of correct responses have been made (after every sixth response, for example). We see similar behaviour among pieceworkers in a factory whose pay depends on how many pieces they turn out, or among fruit pickers who get paid by the bushel. A fixed-ratio schedule tends to produce rapid and steady responding. Researchers sometimes graph the results with a cumulative record, in which the line is flat when the animal does not respond and moves up with each response. For a fixed-ratio schedule, a typical result would look like the figure. However, if the schedule requires a large number of responses for reinforcement, the individual pauses after each reinforced response. For example, if you have just completed 10 calculus problems, you may pause briefly before starting your next assignment. After completing 100 problems, you would pause even longer.

Variable-Ratio Schedule
A variable-ratio schedule is similar to a fixed-ratio schedule, except that reinforcement occurs after a variable number of correct responses. For example, reinforcement may come after as few as one or two responses or after a great many. Variable-ratio schedules generate steady response rates. Variable-ratio schedules, or approximations of them, occur whenever each response has about an equal probability of success. For example, when you apply for a job, you might or might not be hired. The more times you apply, the better your chances, but you cannot predict how many applications you need to submit before receiving a job offer.

Fixed-Interval Schedule
A fixed-interval schedule provides reinforcement for the first response made after a specific time interval. For instance, an animal might get food for only the first response it makes after each 15-second interval.
Then it would have to wait another 15 seconds before another response would be effective. Animals (including humans) on such a schedule learn to pause after each reinforcement and begin to respond again toward the end of the time interval. The cumulative record would look like the figure. Checking your mailbox is an example of behaviour on a fixed-interval schedule. If your mail is delivered at about 3 P.M. and you are eagerly awaiting an important package, you might begin to check around 2:30 and continue checking every few minutes until it arrives.

Variable-Interval Schedule
With a variable-interval schedule, reinforcement is available after a variable amount of time has elapsed. For example, reinforcement may come for the first response after 2 minutes, then for the first response after the next 7 seconds, then after 3 minutes 20 seconds, and so forth. You cannot know how much time will pass before your next response is reinforced. Consequently, responses on a variable-interval schedule occur slowly but steadily. Checking your e-mail is an example: a new message could appear at any time, so you check occasionally but not constantly. Stargazing is also reinforced on a variable-interval schedule. The reinforcement for stargazing (finding a comet, for example) appears at unpredictable intervals. Consequently, both professional and amateur astronomers scan the skies regularly.
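To tie the four schedules together, each can be written as a small rule that decides which responses earn reinforcement: ratio schedules count responses, and interval schedules watch the clock. The sketch below is illustrative only and not from the lecture; the class names, the ratio of 6, and the 15-second interval are arbitrary assumptions.

    import random

    class RatioSchedule:
        """Reinforce after a number of responses: fixed (always n) or variable (averaging n)."""
        def __init__(self, n, variable=False):
            self.n, self.variable = n, variable
            self.remaining = self._next_requirement()
        def _next_requirement(self):
            return random.randint(1, 2 * self.n - 1) if self.variable else self.n
        def respond(self):
            self.remaining -= 1
            if self.remaining <= 0:                       # requirement met by this response
                self.remaining = self._next_requirement()
                return True                               # this response is reinforced
            return False

    class IntervalSchedule:
        """Reinforce the first response after a wait: fixed (always t) or variable (averaging t)."""
        def __init__(self, t, variable=False):
            self.t, self.variable = t, variable
            self.wait = self._next_wait()
        def _next_wait(self):
            return random.uniform(0, 2 * self.t) if self.variable else self.t
        def respond(self, elapsed):
            # elapsed: seconds since the last reinforcer (the caller resets its clock
            # whenever this method returns True)
            if elapsed >= self.wait:
                self.wait = self._next_wait()
                return True
            return False

    fr6 = RatioSchedule(6)                        # fixed ratio: every sixth response pays off
    vi15 = IntervalSchedule(15, variable=True)    # variable interval: about 15 s, unpredictably

The characteristic behaviour on each schedule (rapid steady responding on ratio schedules, pausing and then speeding up on fixed-interval schedules) comes from how the organism adapts to the rule; the rule itself only decides which responses pay off.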
Extinction of Responses Reinforced on Different Schedules
Suppose you and a friend go to a gambling casino and bet on the roulette wheel. Amazingly, your first 10 bets are all winners. Your friend wins some and loses some. Then both of you go into a prolonged losing streak. Presuming the two of you have the same amount of money available and no unusual personality quirks, which of you is likely to continue betting longer? Your friend is, even though you had a more favourable early experience. Responses extinguish more slowly after intermittent reinforcement (either a ratio schedule or an interval schedule) than after continuous reinforcement.

Consider another example. Your friend Beth has been highly reliable: whenever she says she will do something, she does it. Becky, on the other hand, sometimes keeps her word and sometimes doesn't. Now both of them go through a period of untrustworthy behaviour. With whom will you lose patience sooner? It's Beth. One explanation is that you notice the change more quickly. If someone has been unreliable in the past, a new stretch of similar behaviour is nothing new.

Classical and Operant Conditioning: Interrelationships

Interrelationships of Classical and Operant Conditioning
We have been discussing classical and operant conditioning as if they were totally separate aspects of behaviour. However, it should not be surprising to find that there are interconnections between the two: after all, organisms are constantly producing many responses, both reflex and operant. In this sense, the distinction between the two types of learning is partly a way of simplifying the analysis of behaviour, by breaking it into reflex and operant components. In the real world, both processes can occur simultaneously. One striking example of this is negative reinforcement. You may recall that negative reinforcement utilizes a negative reinforcer in order to increase the probability of a response.

One form of this is escape, where a negative reinforcer is presented and is removed only after the organism makes the desired response. In this circumstance, the removal of the aversive stimulus is effectively like a reward, so the behaviour becomes more likely (hence, reinforcement). For example, a dog given a mild shock through an electrified floor grid will learn to jump to another chamber to escape the shock. Now, if a light flashes before the start of the shock, the dog will soon anticipate the shock and jump before the shock begins. This becomes avoidance: the dog is jumping in order to avoid the negative reinforcer.

This leads to an interesting problem: since the dog jumps before the shock, there is no longer any experience of the original reinforcer, a circumstance that would lead to extinction of the response if one were looking at positive reinforcement. So why does the dog keep jumping each time the light goes on? The light, of course, has become a discriminative stimulus, enabling the dog to respond before the shock occurs. Still, why should the dog persist in jumping without at least an occasional experience of shock? The answer seems to be that, through classical conditioning, the light has become a CS associated with the UCS of shock, which is a perfect scenario for creating a conditioned fear. Thus, the dog continues to jump, not to avoid the shock, but to escape from the feared light! (Mowrer, 1956; Rescorla & Solomon, 1967).

Recognizing that the two processes (operant and classical conditioning) occur together also adds to our understanding of conditioned fears. Watson, in his demonstration with little Albert, discovered that conditioned fears do not readily extinguish. The reason for this seems to be that the feared stimulus (the CS) also triggers operant escape behaviour. This escape response removes the individual from the situation before there is an opportunity to determine whether the UCS will follow, thereby preventing the conditions necessary for extinction. (The same mixture of classical and operant responses happens in the shower when we hear the toilet flush: while we fear the sound, we also tend to jump away from the water spray to avoid being scalded.)

The fact that fear stimuli can evoke an operant response is very significant for those everyday fears called phobias. If, as Watson argued, such fears are based on classical conditioning, then it is also likely that the fears persist long after the original experience, because we avoid the situations that elicit the fear. As a result, there is no opportunity to find out whether our fear is realistic. For example, a person who is afraid of flying will be reluctant to fly and therefore has no chance to find out that flying is safe and that there is nothing to fear. In essence, until we face the fear situation, there is no opportunity to extinguish the fear response.

Another type of interaction can occur in which conditioned behaviours are also sustained by reinforcement. For example, a phobia may arise through classical conditioning, but the individual may also be positively reinforced by attention and sympathy from other people. In such circumstances, the individual may be unlikely to try to change.

Questions?