Revision II
Dr Dinesh Ramoo

What is Learning?
Almost all human behaviour is learned. Imagine if you suddenly lost all you had ever learned. What could you do? You would be unable to read, write, or speak. You couldn’t feed yourself, find your way home, drive a car, play the bassoon, or “party.” Needless to say, you would be totally incapacitated. (Dull, too!)

Learning is a relatively permanent change in behaviour due to experience (Powell, Symbaluk, & Honey, 2009). Notice that this definition excludes both temporary changes and more permanent changes caused by motivation, fatigue, maturation, disease, injury, or drugs. Each of these can alter behaviour, but none qualifies as learning.

Definitions of Learning
A change in behaviour as a result of experience or practice.
The acquisition of knowledge.
Knowledge gained through study.
To gain knowledge of, or skill in, something through study, teaching, instruction or experience.
The process of gaining knowledge.
A process by which behaviour is changed, shaped or controlled.
The individual process of constructing understanding based on experience from a wide range of sources.

Behaviourism
Most psychologists who study animal learning and behaviour seek simple explanations, such as trial-and-error learning, that do not require us to assume complicated mental processes. The behaviourists, who have dominated the study of animal learning, insist that psychologists should study only observable, measurable behaviours, not mental processes. Behaviourists seek the simplest possible explanation for any behaviour and resist interpretations in terms of understanding or insight. At least, they insist, we should exhaust attempts at simple explanations before we adopt more complex ones.

Behaviourists
The term behaviourist applies to theorists and researchers with quite a range of views (O’Donohue & Kitchener, 1999). Two major categories are methodological behaviourists and radical behaviourists.

Classical Conditioning

Pavlov’s Experiment
Pavlov used an experimental setup like the one in the figure (Goodwin, 1991). First, he selected dogs with a moderate degree of arousal. (Highly excitable dogs would not hold still long enough, and highly inhibited dogs would fall asleep.) Then he attached a tube to one of the salivary ducts in the dog’s mouth to measure salivation. He could have measured stomach secretions, but measuring salivation was easier.

Pavlov found that, whenever he gave a dog food, the dog salivated. The food and salivation connection was automatic, requiring no training. Pavlov called food the unconditioned stimulus, and he called salivation the unconditioned response. If a particular stimulus consistently, automatically elicits a particular response, we call that stimulus the unconditioned stimulus (UCS), and the response to it is the unconditioned response (UCR).

Next Pavlov introduced a new stimulus, such as a metronome. Upon hearing the metronome, the dog lifted its ears and looked around but did not salivate, so the metronome was a neutral stimulus with regard to salivation. Then Pavlov sounded the metronome a couple of seconds before giving food to the dog. After a few pairings of the metronome with food, the dog began to salivate as soon as it heard the metronome (Pavlov, 1927/1960).

We call the metronome the conditioned stimulus (CS) because the dog’s response to it depends on the preceding conditions: the pairing of the CS with the UCS. The salivation that follows the metronome is the conditioned response (CR). The conditioned response is simply whatever response the conditioned stimulus begins to elicit as a result of the conditioning (training) procedure. At the start of the conditioning procedure, the conditioned stimulus does not elicit a conditioned response. After conditioning, it does.

[Figure: stages of conditioning. Panels: At first; During training; After some number of repetitions]

Conditioned and Unconditioned Response
In Pavlov’s experiment the conditioned response (salivation) closely resembled the unconditioned response (also salivation). However, in some cases it is quite different. For example, the unconditioned response to an electric shock includes shrieking and jumping. The conditioned response to a stimulus paired with shock (i.e., a warning signal for shock) is a tensing of the muscles and lack of activity (e.g., Pezze, Bast, & Feldon, 2003).

Examples of Classical Conditioning
Your alarm clock makes a faint clicking sound a couple of seconds before the alarm goes off. At first the click by itself does not awaken you, but the alarm does. After a week or so, you awaken as soon as you hear the click.
Unconditioned Stimulus = Alarm
Conditioned Stimulus = Click
Unconditioned Response = Awakening
Conditioned Response = Awakening

You hear the sound of a dentist’s drill shortly before the unpleasant experience of the drill on your teeth. From then on the sound of a dentist’s drill arouses anxiety.
Unconditioned Stimulus = Drilling
Conditioned Stimulus = Sound of the drill
Unconditioned Response = Tension
Conditioned Response = Tension

A nursing mother responds to her baby’s cries by putting the baby to her breast, stimulating the flow of milk. After a few days of repetitions, the sound of the baby’s cry is enough to start the milk flowing.
Unconditioned Stimulus = Baby sucking
Conditioned Stimulus = Baby’s cry
Unconditioned Response = Milk flow
Conditioned Response = Milk flow

Note the usefulness of classical conditioning in each case: It prepares an individual for likely events. In some cases, however, the effects can be unwelcome.
For example, many cancer patients who have had repeated chemotherapy or radiation become nauseated when they approach or even imagine the building where they received treatment (Dadds, Bovbjerg, Redd, & Cutmore, 1997).
Unconditioned Stimulus = Chemotherapy or radiation
Conditioned Stimulus = Approaching the building
Unconditioned Response = Nausea
Conditioned Response = Nausea

Extinction
Extinction is not the same as forgetting. Both weaken a learned response, but they arise in different ways. You forget during a long period with no relevant experience or practice. Extinction occurs as the result of a specific experience: perceiving the conditioned stimulus without the unconditioned stimulus.

Extinction does not erase the original connection between the CS and the UCS. We can regard acquisition as learning to do a response and extinction as learning to inhibit it. For example, suppose you have gone through original learning in which a tone regularly predicted a puff of air to your eyes. You learned to blink your eyes at the tone. Then you went through an extinction process in which you heard the tone many times but received no air puffs. You extinguished, so the tone no longer elicited a blink. Now, without hearing a tone, you get another puff of air to your eyes. As a result, the next time you hear the tone, you will blink your eyes. Extinction inhibited your response to the CS (here, the tone), but a sudden puff of air weakens that inhibition (Bouton, 1994).

Spontaneous Recovery
Suppose you are in a classical-conditioning experiment. At first you repeatedly hear a buzzer sound (CS) that precedes a puff of air to your eyes (UCS). Then the buzzer stops predicting an air puff. After a few trials, your response to the buzzer extinguishes. Now, suppose you sit there for a long time with nothing happening and then suddenly you hear another buzzer sound. What will you do? Chances are, you will blink your eyes at least slightly.
Spontaneous recovery is this temporary return of an extinguished response after a delay. Spontaneous recovery requires no additional CS–UCS pairings.

Operant Conditioning

Reinforcement
A reinforcement is an event that increases the future probability of the most recent response. Thorndike said that it “stamps in,” or strengthens, the response. The next time the cat is in the puzzle box, it has a slightly higher probability of the effective response; after each succeeding reinforcement, the probability goes up another notch.

According to Skinner, reinforcement occurs when a response is followed by rewarding consequences and the organism’s tendency to make the response increases. The two examples diagrammed here illustrate the basic premise of operant conditioning, namely that voluntary behaviour is controlled by its consequences. These examples involve positive reinforcement (for a comparison of positive and negative reinforcement, …).

Punishment
In contrast to a reinforcer, which increases the probability of a response, a punishment decreases the probability of a response. A reinforcer can be either the presentation of something (e.g., food) or the removal of something (e.g., pain). A punishment can be either the presentation of something (e.g., pain) or the removal of something (e.g., food).

Reinforcement and Punishment
What constitutes reinforcement? From a practical standpoint, a reinforcer is an event that follows a response and increases the later probability or frequency of that response. However, from a theoretical standpoint, we would like to have some way of predicting what would be a reinforcer and what would not. We might guess that reinforcers are biologically useful to the individual, but in fact many are not. For example, saccharin, a sweet but biologically useless chemical, can be a reinforcer. For many people alcohol and tobacco are stronger reinforcers than vitamin-rich vegetables. So biological usefulness doesn’t define reinforcement.
In his law of effect, Thorndike described reinforcers as events that brought “satisfaction to the animal.” That definition won’t work either. How could you know what brings a rat or a cat satisfaction? Furthermore, people will work hard for a pay check, a decent grade in a course, and other outcomes that often don’t produce evidence of pleasure (Berridge & Robinson, 1995).

Classical and Operant Conditioning
In general the two kinds of conditioning also differ in the behaviours they affect. Classical conditioning applies primarily to visceral responses (i.e., responses of the internal organs), such as salivation and digestion, whereas operant conditioning applies primarily to skeletal responses (i.e., movements of leg muscles, arm muscles, etc.). However, this distinction sometimes breaks down. For example, if a tone consistently precedes an electric shock (a classical-conditioning procedure), the tone will make the animal freeze in position (a skeletal response) as well as increase its heart rate (a visceral response).

[Figure: Categories of Reinforcement and Punishment]

Extinction
In operant conditioning extinction occurs if responses stop producing reinforcements. For example, you were once in the habit of asking your roommate to join you for supper. The last five times you asked, your roommate said no, so you stop asking. In classical conditioning extinction is achieved by presenting the CS without the UCS; in operant conditioning the procedure is response without reinforcement.

Generalization
Someone who receives reinforcement for a response in the presence of one stimulus will probably make the same response in the presence of a similar stimulus. The more similar a new stimulus is to the original reinforced stimulus, the more likely the same response. This phenomenon is known as stimulus generalization. For example, you might reach for the turn signal of a rented car in the same place you would find it in your own car.

Examples of Generalization
Many harmless animals have evolved an appearance that resembles a poisonous animal, because any predator that learns to avoid the poisonous animal generalizes its learning and avoids the harmless animal also. Eastern Ecuador has two similar poisonous frog species and one harmless species that mimics their appearance.

Discrimination
If reinforcement occurs for responding to one stimulus and not another, the result is a discrimination between them, yielding a response to one stimulus and not the other. For example, you smile and greet someone you think you know, but then you realize it is someone else. After several such experiences, you learn to recognize the difference between the two people.

Discriminative Stimuli
A stimulus that indicates which response is appropriate or inappropriate is called a discriminative stimulus. A great deal of our behaviour is governed by discriminative stimuli. For example, you learn ordinarily to be quiet in class but to talk when the professor encourages discussion. You learn to drive fast on some streets and slowly on others. Throughout your day one stimulus after another signals which behaviours will yield reinforcement, punishment, or neither. The ability of a stimulus to encourage some responses and discourage others is known as stimulus control.

Basic Processes in Classical and Operant Conditioning

Explanations of Classical Conditioning
What is classical conditioning, really? As is often the case, the process appeared simple at first, but later investigation found it to be a more complex and more interesting phenomenon. Pavlov noted that conditioning depended on the timing between CS and UCS, and he took this to mean that the animal comes to treat the CS as if it were the UCS. Later studies contradicted that idea. For example, a shock (UCS) causes rats to jump and shriek, but a conditioned stimulus paired with shock makes them freeze in position. They react to the conditioned stimulus as a danger signal, not as if they felt a shock.
Also, in trace conditioning, where a delay separates the end of the CS from the start of the UCS, the animal does not make a conditioned response immediately after the conditioned stimulus but instead waits until almost the end of the usual delay between the CS and the UCS. Again, it is not treating the CS as if it were the UCS; it is using it as a predictor, a way to prepare for the UCS (Gallistel & Gibbon, 2000).

It is true, as Pavlov suggested, that the longer the delay between the CS and the UCS, the weaker the conditioning, other things being equal. However, just having the CS and UCS close together in time is not enough. It is essential that they occur more often together than they occur apart. That is, there must be some contingency or predictability between them.

Consider this experiment: For rats in both Group 1 and Group 2, every presentation of a CS is followed by a UCS, as shown in Figure 6.9. However, for Group 2, the UCS also appears at many other times, without the CS. In other words, for this group, the UCS happens every few seconds anyway, and it isn’t much more likely with the CS than without it. Group 1 learns a strong response to the CS; Group 2 does not (Rescorla, 1968, 1988).

Now consider this experiment: One group of rats receives a light (CS) followed by shock (UCS) until they respond consistently to the light. (The response is to freeze in place.) Then they get a series of trials with both a light and a tone, again followed by shock. Do they learn a response to the tone? No. The tone always precedes the shock, but the light already predicted the shock, and the tone adds nothing new. The same pattern occurs with the reverse order: First rats learn a response to the tone and then they get light–tone combinations before the shock. They continue responding to the tone, but not to the light, again because the new stimulus predicted nothing that wasn’t already predicted (Kamin, 1969).
These results demonstrate the blocking effect: The previously established association to one stimulus blocks the formation of an association to the added stimulus. Again, it appears that conditioning depends on more than presenting two stimuli together in time. Learning occurs only when one stimulus predicts another. Later research has found that presenting two or more stimuli at a time often produces complex results that we would not have predicted from the results of single-stimulus experiments (Urushihara, Stout, & Miller, 2004).

Chaining Behaviour
Ordinarily, you don’t do just one action and then stop. You do a long sequence of actions. To produce sequences of learned behaviour, psychologists use a procedure called chaining. Assume you want to train an animal, perhaps a guide dog or a show horse, to go through a sequence of actions in a particular order. You could chain the behaviours, reinforcing each one with the opportunity to engage in the next one. First, the animal learns the final behaviour for a reinforcement. Then it learns the next to last behaviour, which is reinforced by the opportunity to perform the final behaviour. And so on.

Schedules of Reinforcement
The simplest procedure in operant conditioning is to provide reinforcement for every correct response, a procedure known as continuous reinforcement. However, in the real world, unlike the laboratory, continuous reinforcement is not common. Reinforcement for some responses and not for others is known as intermittent reinforcement. We behave differently when we learn that only some of our responses will be reinforced.

Psychologists have investigated the effects of many schedules of reinforcement, which are rules or procedures for the delivery of reinforcement. Four schedules for the delivery of intermittent reinforcement are fixed ratio, fixed interval, variable ratio, and variable interval. A ratio schedule provides reinforcements depending on the number of responses.
An interval schedule provides reinforcements depending on the timing of responses.

Fixed-Ratio Schedule
A fixed-ratio schedule provides reinforcement only after a certain (fixed) number of correct responses have been made, for example after every sixth response. We see similar behaviour among pieceworkers in a factory whose pay depends on how many pieces they turn out or among fruit pickers who get paid by the bushel. A fixed-ratio schedule tends to produce rapid and steady responding. Researchers sometimes graph the results with a cumulative record, in which the line is flat when the animal does not respond, and it moves up with each response. For a fixed-ratio schedule, a typical result would look like the figure.

However, if the schedule requires a large number of responses for reinforcement, the individual pauses after each reinforced response. For example, if you have just completed 10 calculus problems, you may pause briefly before starting your next assignment. After completing 100 problems, you would pause even longer.

Variable-Ratio Schedule
A variable-ratio schedule is similar to a fixed-ratio schedule, except that reinforcement occurs after a variable number of correct responses. For example, reinforcement may come after as few as one or two responses or after a great many. Variable-ratio schedules generate steady response rates. Variable-ratio schedules, or approximations of them, occur whenever each response has about an equal probability of success. For example, when you apply for a job, you might or might not be hired. The more times you apply, the better your chances, but you cannot predict how many applications you need to submit before receiving a job offer.

Fixed-Interval Schedule
A fixed-interval schedule provides reinforcement for the first response made after a specific time interval. For instance, an animal might get food for only the first response it makes after each 15-second interval.
Then it would have to wait another 15 seconds before another response would be effective. Animals (including humans) on such a schedule learn to pause after each reinforcement and begin to respond again toward the end of the time interval. The cumulative record would look like the figure. Checking your mailbox is an example of behaviour on a fixed-interval schedule. If your mail is delivered at about 3 P.M., and you are eagerly awaiting an important package, you might begin to check around 2:30 and continue checking every few minutes until it arrives.

Variable-Interval Schedule
With a variable-interval schedule, reinforcement is available after a variable amount of time has elapsed. For example, reinforcement may come for the first response after 2 minutes, then for the first response after the next 7 seconds, then after 3 minutes 20 seconds, and so forth. You cannot know how much time will pass before your next response is reinforced. Consequently, responses on a variable-interval schedule occur slowly but steadily. Checking your e-mail is an example: A new message could appear at any time, so you check occasionally but not constantly.

Stargazing is also reinforced on a variable-interval schedule. The reinforcement for stargazing (finding a comet, for example) appears at unpredictable intervals. Consequently, both professional and amateur astronomers scan the skies regularly.

Cognitive Social Theory

Introduction
By the 1960s, many researchers and theorists had begun to wonder whether a psychological science could be built strictly on observable behaviours without reference to thoughts. Most agreed that learning is the basis of much of human behaviour, but some were not convinced that classical and operant conditioning could explain everything people do.
From behaviourist learning principles thus emerged cognitive–social theory (sometimes called cognitive–social learning or cognitive–behavioural theory), which incorporates concepts of conditioning but adds two new features: a focus on cognition and a focus on social learning.

Learning and Cognition
According to cognitive–social theory, the way an animal construes the environment is as important to learning as actual environmental contingencies. That is, humans and other animals are always developing mental images of, and expectations about, the environment, and these cognitions influence their behaviour.

Preparedness and Phobias
According to Martin Seligman (1971) and other theorists (Öhman, 1979; Öhman, Dimberg, & Öst, 1985), evolution has also programmed organisms to acquire certain fears more readily than others because of a phenomenon called preparedness. Preparedness involves species-specific predispositions to be conditioned in certain ways and not others.

Learned Helplessness
The powerful impact of expectancies on the behaviour of nonhuman animals was dramatically demonstrated in a series of studies by Martin Seligman (1975). Seligman harnessed dogs so that they could not escape electric shocks. At first the dogs howled, whimpered, and tried to escape the shocks, but eventually they gave up; they would lie on the floor without struggle, showing physiological stress responses and behaviours resembling human depression. A day later Seligman placed the dogs in a shuttle-box from which they could easily escape the shocks. Unlike dogs in a control condition who had not been previously exposed to inescapable shocks, the dogs in the experimental condition made no effort to escape and generally failed to learn to do so even when they occasionally did escape. The dogs had come to expect that they could not get away; they had learned to be helpless.
Learned helplessness consists of the expectancy that one cannot escape aversive events and the motivational and learning deficits that result from this belief.

Explanatory Style
Seligman argued that learned helplessness is central to human depression as well. In humans, however, learned helplessness is not an automatic outcome of uncontrollable aversive events. Seligman and his colleagues observed that some people have a positive, active coping attitude in the face of failure or disappointment, whereas others become depressed and helpless (Peterson, 2000; Peterson & Seligman, 1984). They demonstrated in dozens of studies that explanatory style plays a crucial role in whether or not people become, and remain, depressed.

Individuals with a depressive or pessimistic explanatory style blame themselves for the bad things that happen to them. In the language of helplessness theory, pessimists believe the causes of their misfortune are internal rather than external, leading to lowered self-esteem. They also tend to see these causes as stable (unlikely to change) and global (broad, general, and widespread in their impact). When a person with a pessimistic style does poorly on a biology exam, he may blame it on his own stupidity, an explanation that is internal, stable, and global. Most people, in contrast, would offer themselves explanations that permit hope and encourage further effort, such as “I didn’t study hard enough.”

Observational Learning
Albert Bandura, Dorothea Ross, and Sheila Ross (1963) studied the role of imitation for learning aggressive behaviour. They asked two groups of children to watch films in which an adult or a cartoon character violently attacked an inflated “Bobo” doll. Another group watched a different film. They then left the children in a room with a Bobo doll. Only the children who had watched films with attacks on the doll attacked the doll themselves, using many of the same movements they had just seen.
The clear implication is that children copy the aggressive behaviour they have seen in others.

Basic Processes
Bandura has identified four key processes that are crucial in observational learning. The first two, attention and retention, highlight the importance of cognition in this type of learning.
Attention. To learn through observation, you must pay attention to another person’s behaviour and its consequences.
Retention. You may not have occasion to use an observed response for weeks, months, or even years. Thus, you must store a mental representation of what you have witnessed in your memory.
Reproduction. Enacting a modelled response depends on your ability to reproduce the response by converting your stored mental images into overt behaviour. This step may not be easy for some responses. For example, most people cannot execute a breath-taking windmill dunk after watching Derrick Rose do it in a basketball game.
Motivation. Finally, you are unlikely to reproduce an observed response unless you are motivated to do so. Your motivation depends on whether you encounter a situation in which you believe that the response is likely to pay off for you.

1. Contagion
Contagion, a phenomenon in which a response by one individual tends to elicit the same response in others, might be mistaken for observational learning. For example, perhaps you’ve noticed that when one person yawns, others tend to yawn also. Have they learned to yawn by observing others? Obviously not.

2. Classical Conditioning
One might mistake classical conditioning for observational learning. Suppose Michelle is in the garage with her mother when a mouse scurries by. Her mother screams and jumps away. This might cause Michelle to be afraid of mice, but not necessarily because she learned her mother’s fear.
It could be that her mother’s scream scared Michelle (just like the loud noise scared little Albert), and because the mouse was also present (just like the rat for little Albert) Michelle might learn to fear mice.

3. Stimulus Enhancement
Behaviours that are due to stimulus enhancement might be mistaken for observational learning. Stimulus enhancement, as the name implies, occurs when attention is directed to a stimulus, such as when an illusionist says, “Keep your eyes on the red ball.” How could this be mistaken for observational learning? Well, suppose one night we discover that a raccoon has learned how to open a garbage can, and, much to our dismay, the following night many raccoons have opened many garbage cans. We might assume that they learned how to do this by watching the first raccoon, but that might not be what happened. It could be that the behaviour of the first raccoon caused the other raccoons to realize that garbage cans might hold some tasty treasures: pizza crusts, fried chicken skins, and half-eaten jellyrolls. This might have emboldened the other raccoons to try to open garbage cans, and after a bit of effort they might have figured out how to do it. Thus, the first raccoon might not have taught them how to open garbage cans but might simply have directed their attention to the garbage cans.

Self-Reinforcement and Self-Punishment
We learn by observing others who are doing what we would like to do. If our sense of self-efficacy is strong enough, we try to imitate their behaviour. But actually succeeding often requires prolonged efforts. People typically set a goal for themselves and monitor their progress toward that goal. They provide reinforcement or punishment for themselves, just as if they were training someone else. They say to themselves, “If I finish this math assignment on time, I’ll treat myself to a movie and a new magazine.
If I don’t finish on time, I’ll make myself clean the stove and the sink.” (Nice threat, but people usually forgive themselves without imposing the punishment.)

Questions?
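
Supplement: Schedules of Reinforcement as Rules
The four intermittent schedules described earlier are rules for when a response earns reinforcement, so they can be written out as small programs. The following is a minimal sketch in Python, not a model taken from the cited studies. All function names are hypothetical. Two details are assumptions made only for illustration: the variable-ratio rule reinforces each response with the same probability (1 divided by the average ratio), and the variable-interval rule draws each waiting time from an exponential distribution. The comparison at the end illustrates the point that a ratio schedule depends on the number of responses, whereas an interval schedule depends on their timing.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response, e.g. every sixth response."""
    count = 0

    def schedule(t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return schedule


def variable_ratio(mean_n, rng):
    """Reinforce after a varying number of responses.

    Assumption: each response is reinforced with probability 1 / mean_n,
    so the required number of responses averages mean_n.
    """

    def schedule(t):
        return rng.random() < 1.0 / mean_n

    return schedule


def fixed_interval(seconds):
    """Reinforce the first response made at least `seconds` after the
    previous reinforcement (or after the start of the session)."""
    last = 0.0

    def schedule(t):
        nonlocal last
        if t - last >= seconds:
            last = t
            return True
        return False

    return schedule


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but the required wait changes after every
    reinforcement. Assumption: waits are drawn from an exponential
    distribution with the given mean."""
    last = 0.0
    wait = rng.expovariate(1.0 / mean_seconds)

    def schedule(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.expovariate(1.0 / mean_seconds)
            return True
        return False

    return schedule


def count_reinforcements(schedule, response_times):
    """Feed each response time (in seconds) to the schedule and count
    how many responses are reinforced."""
    return sum(1 for t in response_times if schedule(t))


# One response per second versus ten responses per second, for 60 seconds.
slow = [float(t) for t in range(1, 61)]
fast = [t / 10 for t in range(1, 601)]

print(count_reinforcements(fixed_ratio(6), slow))      # 10
print(count_reinforcements(fixed_ratio(6), fast))      # 100
print(count_reinforcements(fixed_interval(15), slow))  # 4
print(count_reinforcements(fixed_interval(15), fast))  # 4

rng = random.Random(0)  # seeded so repeated runs give the same counts
print(count_reinforcements(variable_ratio(6, rng), fast))
print(count_reinforcements(variable_interval(15, rng), fast))
```

In this sketch, responding ten times faster multiplies the reinforcements earned on the fixed-ratio rule by ten but leaves the fixed-interval total unchanged, which is consistent with the description of rapid responding on ratio schedules and pauses on fixed-interval schedules. It says nothing about how real animals distribute their responses; that comes from the experimental findings, not from the rules themselves.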