Animal Learning

Introductory article

Armando Machado, University of Minho, Braga, Portugal
Francisco J Silva, University of Redlands, Redlands, California, USA

CONTENTS
Introduction
Evolution and learning
Habituation: adapting to repetitive, harmless events
Pavlovian conditioning: learning about correlations
Operant conditioning: learning about causation and control
Conclusion

The field of animal learning studies the behavioral mechanisms and processes that animals use to adapt to changes in their environment. Its emphasis is on environment–behavior interactions.

INTRODUCTION

Learning refers to a heterogeneous set of processes that evolved in animals to accommodate changes in their environments. These processes can produce relatively permanent changes in behavior and are brought into play by some form of interaction between the animal and its surroundings. For instance, a moving nematode (a tiny roundworm) that stops momentarily or reverses its motion when it experiences a vibration will cease doing so if the vibration occurs repeatedly. Foraging bees perform an intricate dance, the orientation and speed of which change with the direction and distance of the food source from the hive, such that other bees can also locate the food site. Hungry domestic cats mew in the presence of their owner, who will then give them food. In all of these examples, the organism's behavior shows the effects of particular interactions between itself and its environment. Classifying distinct types of interactions, identifying their elements, quantifying their static and dynamic properties, and describing how their cumulative effects are expressed is the goal of people who study learning. Before classifying interactions between organisms and their environments, it is important to place the process of learning within an evolutionary context, that is, to understand why learning evolved.

EVOLUTION AND LEARNING

It is widely assumed that learning evolved in specific contexts, such as gathering food, eluding predators, capturing prey, attracting mates and avoiding poisons. Despite this variety in contexts, countless experiments indicate that the same principles of learning hold across many different species, tasks and behaviors. How can we reconcile the assumption that learning evolved in specific contexts with the fact that it occurs similarly in many contexts? The answer lies in conceiving of learning processes as mixtures, in varying proportions, of context-specific and general mechanisms. For example, taste aversion learning occurs when animals avoid gustatory and olfactory cues associated with foods that made them ill, even when the flavor is separated from the illness by many hours. That animals can learn which cues predict biologically significant events is commonplace; that animals can learn to avoid cues separated by many hours from a biologically significant event typically happens only when flavor is the predictive cue and illness is the significant event.

To clarify further the relationship between context-specific and general features of learning, consider the following analogy. A house built for living near the Arctic Circle will differ considerably from one built for living in south Florida: the former requires thick insulation, double-paned windows and a furnace; the latter needs mildew-resistant insulation, shaded glass and air conditioning. Despite these differences, there are general features common to both houses, such as the presence of windows, doors, rooms, walls and a roof, and a general function for both houses, such as sheltering and protecting their inhabitants from weather and predators. Because the function of a house is similar in both regions, there is some commonality to the solutions.
The presence of general features in learning is illustrated by the fact that animals from widely separated taxa respond to environmental stimuli in similar ways. For example, they ignore repetitive harmless stimuli, a process called habituation; they detect correlations among biologically important events, a process called Pavlovian conditioning; and they learn causal relations between their behavior and its consequences, a process called operant conditioning. Central to these processes is temporal integration, for only by tracking and integrating events across time can animals determine whether an event is repeating, whether it occurs before, during or after other events, or whether it is a reliable consequence of responding or not responding. It should come as no surprise, then, that temporal variables are often critical determinants of learning.

HABITUATION: ADAPTING TO REPETITIVE, HARMLESS EVENTS

When a harmless stimulus occurs repeatedly and there are no other events associated with it, there might be an advantage to ignoring the stimulus. For example, imagine a land snail on a small wooden platform that vibrates briefly while the animal moves. Typically, this stimulus (the vibration) elicits a protective response, contraction of the antennae. If these vibrations are repeated, say every 30 s, then the contractions decline. Eventually, the vibrations are ignored in the sense that the antennae are not contracted and the snail keeps moving. This waning of a response to repeated presentations of a stimulus is termed habituation.

To interpret habituation we can appeal to the concept of response threshold, which is defined roughly as the minimum stimulus intensity required to elicit a reflexive response. As the stimulus is repeated, the threshold increases, which makes it more difficult to produce a response. Eventually the threshold is greater than the current stimulus intensity and the response fails to occur.
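The threshold account can be made concrete with a small simulation. The sketch below is an illustrative assumption, not a model taken from this article: each stimulus raises the threshold by a fixed step (`rise`), the threshold relaxes exponentially toward `baseline` between stimuli (`decay`), and a response occurs only when the stimulus intensity exceeds the current threshold. The function name and all parameter values are hypothetical.

```python
import math


def simulate_habituation(intensity, stimulus_times,
                         baseline=1.0, rise=0.5, decay=0.01):
    """Return one boolean per stimulus: True if it elicited a response.

    Hypothetical threshold model (all parameters are illustrative):
    - a response occurs when `intensity` exceeds the current threshold;
    - each stimulus raises the threshold by `rise`;
    - between stimuli the threshold relaxes exponentially toward
      `baseline` at rate `decay` per second.
    """
    threshold = baseline
    previous_time = None
    responses = []
    for t in stimulus_times:
        if previous_time is not None:
            elapsed = t - previous_time
            # The threshold drifts back toward baseline while nothing happens.
            threshold = baseline + (threshold - baseline) * math.exp(-decay * elapsed)
        responses.append(intensity > threshold)
        threshold += rise  # each presentation makes the next response harder
        previous_time = t
    return responses


# Vibrations every 30 s, as in the snail example.
every_30_s = [30 * i for i in range(10)]
print(simulate_habituation(1.8, every_30_s))
```

With these made-up values the response stops after a few vibrations. Appending one more stimulus after a long pause (for example `every_30_s + [1270]`) gives the threshold time to relax, so the response returns. A single threshold variable of this kind is deliberately simple and should not be expected to capture every property of habituation.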
Properties of Habituation

Habituation is present in virtually every animal species, from single-celled animals such as the ciliate Vorticella to humans. It has even been observed in individual motor neurons. This widespread phenomenon deserves attention for two related reasons. First, it introduces some of the key variables and functional relations that psychologists have identified in most learning processes. Second, habituation reveals the amazing complexity of even the simplest of learning processes.

Recovery from habituation

In the example above, if the platform ceases to vibrate after habituation has occurred, then with the passage of time since the last vibration, the snail's antennae are increasingly likely to contract when the platform again vibrates. In other words, habituation seems to 'wear off' when the stimulus that produced it is not presented. This recovery of the response corresponds with a return of the threshold to its initial value.

Stimulus intensity

If the vibration of the platform is sufficiently intense, then the snail may withdraw into its shell. This response might also habituate if the strong stimulus is repeated. Typically, however, the courses of habituation and recovery from habituation are slower for strong than for weak stimuli. When the stimulus is more intense, the rise of the threshold takes longer to surpass the stimulus intensity, and the fall of the threshold in the absence of the stimulus takes longer to return to its initial, baseline value.

Time between stimulus presentations

All else being equal, stimuli closer in time produce faster habituation than stimuli farther apart. This finding is consistent with the threshold account of habituation: longer intervals give the threshold more time to decrease, which partly offsets the effect of the stimulus presentations. Interestingly, high rates of stimulation may also lead to faster recovery from habituation.
This result, unlike the preceding ones, does not follow from the view that changes in threshold are responsible for habituation, unless the rate at which the threshold returns to its baseline value depends on the rate of stimulation.

Relearning effect

Imagine the following experiment. After we record the course of habituation on day 1, we stop the vibrations and let recovery occur. On day 2 we vibrate the platform again and record the new course of habituation. Typically, the rate of habituation is faster during the second day. The importance of this finding is that the difference in the rates of habituation shows that the animal's internal state on day 2 is different from its state on day 1 despite the similarity in its initial responses. Again, a simple change in the response threshold cannot accommodate this finding.

Stimulus generalization and stimulus specificity

The contraction of the antennae ceases not only in the presence of the original vibration, but also in the presence of similar stimuli (stimulus generalization). However, it is readily elicited by different stimuli such as a blast of air (stimulus specificity). These properties are two sides of the same coin: generalization focuses on the fact that habituation to one stimulus extends to some of the other stimuli that can also elicit the response; specificity focuses on the fact that habituation to one stimulus does not extend to all stimuli that can elicit the response. Careful empirical work is needed to identify the stimulus properties (e.g. intensity, duration, rate) along which generalization proceeds.

Functions of Habituation

As noted above, habituation has been observed in almost every animal species. Why is it so prevalent, and why do the properties of habituation hold true across many different species, responses and a fortiori physiological mechanisms? Probably, habituation occurs because it is sometimes safe and economical to ignore a repetitive stimulus.
An animal that continued to respond to every stimulus impinging on its receptors would be overwhelmed by stimulation and incapable of acting appropriately. However, the animal would pay a high price if the effects of habituation were permanent, for what was once a harmless vibration caused by the running of a distant animal might now be caused by an approaching predator. In the same vein, assuming stronger stimuli are potentially more harmful than weaker stimuli, it makes sense that they should be ignored less quickly than weaker stimuli. Classifying incoming stimuli as 'The same!' 'The same!' 'The same!' also seems safer when the stimuli are close in time.

PAVLOVIAN CONDITIONING: LEARNING ABOUT CORRELATIONS

Food and water, predators and prey, mates and offspring, and escape routes and shelter are some of the primary determiners of survival and reproduction. As such, it is reasonable to attribute great evolutionary advantage to animals capable of anticipating them. For all animals, specific sounds, sights or odor trails, places or times of occurrence, or more complex sequences and configurations of stimuli might be reliably correlated with biologically important events. If an animal could learn the correlational texture of its world (i.e. the relationships among events), then it would have the advantage of responding one way when a stimulus predicts an important event and in another way when a stimulus does not.

The correlations that an animal can learn depend on the animal and the types of stimuli and events in its environment. In terms of the animal, there might be reliable cues that it cannot detect simply because it has not evolved the required biological machinery (e.g. sensory receptors). In terms of the environment, a stimulus might be detectable but its frequency of occurrence or its reliability as a cue might be too low to support the evolution of an ability to fully exploit it; the cost would outweigh the benefit, as it were.
Learning about the cueing function of a stimulus is therefore constrained both by the animal's physiology and by the specific arrangements of events in the animal's world.

Constraints notwithstanding, how does an animal learn the correlation between a neutral and a biologically important stimulus? The pioneering work on this question was conducted by Ivan Petrovich Pavlov (1849–1936), the famous Russian physiologist and 1904 Nobel prizewinner. In good scientific fashion, Pavlov reduced the problem to its bare essentials: a tone reliably preceded a bit of meat powder delivered to the mouth of a hungry dog. Of interest was the animal's behavior during the tone. Initially, when the tone was presented the dog pricked up its ears and looked in the direction of the source of the tone, but, critically, it did not salivate. After a few pairings of the tone and food, the orienting response elicited by the tone ceased (habituation had set in) and a new response during the tone began to occur: salivation. Because 'food in the mouth' elicited copious salivation without any previous training, Pavlov called it the unconditional stimulus (US) and 'salivation in the presence of food' the unconditional response (UR). As the quantity and quality of salivation to the tone depended on the prior predictive history of the tone, Pavlov called the salivation to the tone a conditional response (CR) and the tone a conditional stimulus (CS). The study of how behavior changes when two or more stimuli are paired, as in the preceding example, is known as Pavlovian or classical conditioning.

With this and similar laboratory preparations, Pavlov and many subsequent researchers have tried to understand how animals learn the cueing function of stimuli. Some of their experiments showed the following results, many of which resemble those obtained in studies of habituation.
Extinction

If, after the tone elicits salivation reliably, it is presented without the food, then the dog will eventually stop salivating during the tone. That is, when the CS no longer predicts the US, the CR weakens and may eventually disappear. Through acquisition and extinction processes, animals adjust to changes in the pattern of events in their environment.

Spontaneous Recovery

If the experimenter allows the dog to rest for, say, 24 h after the extinction training, and then presents the tone again, the animal that had stopped salivating to the tone may again salivate to it; that is, the CR spontaneously recovers. The passage of time undoes some of the effects of extinction. Why spontaneous recovery of the CR happens is still poorly understood.

Stimulus Generalization

Having learned to salivate to a specific tone, the dog will also salivate to similar tones. That is, a CR will be elicited by the original stimulus as well as by similar stimuli; however, the more these other stimuli differ from the original CS, the weaker the CR they elicit. Because no stimulus ever recurs in precisely the same way (e.g. the rustling of the leaves announcing a lion is different in different situations), it is advantageous to extend newly acquired responses to similar stimuli.

Stimulus Discrimination

When Pavlov alternated two tones during training and paired one but not the other with food, his dogs eventually salivated only to the tone paired with food. That is, if one stimulus (CS+) is paired with a US, but another stimulus (CS−) is not, then the CR will occur only or mainly in the presence of the CS+. Stimulus discrimination helps ensure that a response occurs in particular environments, rather than indiscriminately across situations and time.

Contingency Effects

Assume that during the original training the food follows the tone on only 50 percent of the trials. On the remaining 50 percent the tone occurs alone.
Under this circumstance, the amount of salivation to the tone during training is smaller than when the food always followed the tone. Similarly, if food also occurs occasionally in the absence of the tone, the CR is weaker than when food only follows the tone. In the extreme, if food occurs more often in the absence of the tone than in its presence, the tone will actively suppress salivation instead of eliciting it.

In summary, the results of many experiments show that animals are sensitive to the direction (positive or negative) and the strength of the correlation, or contingency, between the CS and the US. The effect of contingency shows that temporal contiguity between the tone and food is insufficient to ensure that the tone will become a CS. Much depends on what else the animal has been experiencing, both during the presence and the absence of the tone. That is, the animal seems to integrate events that are temporally extended, and to behave according to the actual correlation value between the tone and the food. Both temporal and probability relations between the CS and US, or contiguity and contingency, are important in Pavlovian conditioning.

In fact, the process is even more complex than stated above. Consider an experiment in which a tone is paired with food until it elicits salivation reliably. Next, the tone is presented along with a light, and this compound stimulus is followed with food. Will the light elicit salivation when it is presented alone and without the food? Because food always occurs after the light and never in its absence, the light is maximally (and positively) correlated with food. Moreover, because the food closely follows the light, the two stimuli also are temporally contiguous. Hence, one might predict a strong association between the light and food and, therefore, salivation to the light. However, routinely little or no salivation to the light is found.
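One way to see why the light might gain so little control is to simulate the tone-then-compound experiment with a simple error-correction rule. The sketch below assumes the Rescorla-Wagner learning rule, a standard formal model that this article does not name; it is offered as one possible account, not as the authors' own. In it, every cue present on a trial changes in strength in proportion to the difference between what the US supports and what all present cues together already predict. The learning rate `alpha`, the asymptote `lam`, the cue names and the trial counts are illustrative assumptions.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Return the associative strength of each cue after training.

    `trials` is a list of (cues, us_present) pairs, where `cues` is a
    tuple of the cue names present on that trial. `alpha` (learning
    rate) and `lam` (maximum strength the US supports) are illustrative.
    """
    strength = {}
    for cues, us_present in trials:
        target = lam if us_present else 0.0
        # The prediction is the summed strength of all cues present.
        prediction = sum(strength.get(cue, 0.0) for cue in cues)
        error = target - prediction
        # Every cue present is updated by the same shared prediction error.
        for cue in cues:
            strength[cue] = strength.get(cue, 0.0) + alpha * error
    return strength


# Tone alone with food, then tone plus light with food (as in the text).
tone_then_compound = [(("tone",), True)] * 50 + [(("tone", "light"), True)] * 50
# Comparison condition: compound trials only, with no tone pretraining.
compound_only = [(("tone", "light"), True)] * 50

pretrained = rescorla_wagner(tone_then_compound)
control = rescorla_wagner(compound_only)
print(f"light after tone pretraining: {pretrained['light']:.3f}")
print(f"light without pretraining:    {control['light']:.3f}")
```

Under this rule the light gains almost no strength when the tone has been pretrained, because the prediction error is already close to zero by the time the light is introduced. In the comparison condition, the light and the tone share the available strength.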
Control experiments indicate that because the tone already predicted the food at the end of the first part of the experiment, it somehow blocked the association 'light–food'. We could say that the light provided no new information about the food beyond that already provided by the tone and, hence, the light did not help the animal anticipate the US any better than the tone. The important point is that such blocking highlights the fact that an animal's prior experiences can modulate the effects of contiguity and contingency.

Relevance of Pavlovian Conditioning

Since Pavlov's pioneering work, the study of Pavlovian conditioning has revealed many other complex relations among the CR and temporal variables, the sequential arrangements of the various stimuli, the context in which conditioning occurs, and the animal species and the particular response system under consideration. Pavlovian conditioning is fundamental to understanding drug addictions, phobias and a variety of sexual responses in humans and other animals. Its domain of study also has become increasingly quantitative. Real-time, dynamic models of the learning process have started to replace verbal accounts. However, much remains unknown about the process through which stimuli that are insignificant when considered in isolation become significant when they signal biologically important events.

OPERANT CONDITIONING: LEARNING ABOUT CAUSATION AND CONTROL

The preceding discussion focused on how an animal's behavior is changed by repeated presentations of single stimuli (habituation) or by relationships among stimuli (Pavlovian conditioning). In both of these situations, behavior changes as a result of the stimuli that precede it. However, it is also the case that things happen after a response. For example, a young male cowbird sings one of its song variants and elicits a subtle wing flick from a female.
An adult male cowbird sings a variant that stimulates a precopulatory display in a female and a vigorous attack from a dominant male. However, if the same adult cowbird sings a less stimulating song, then it avoids being chased by the dominant male. Operant or instrumental learning results when an animal's behavior causes a stimulus change, which in turn changes the animal's subsequent behavior: the young cowbird is more likely to sing the variant that caused the positive female reaction; the adult cowbird is less likely to sing the song that caused the attack and more likely to sing the one that avoided it. This capacity to change behavior because of its consequences enables animals to learn about control and to exploit the causal texture of their social and physical worlds.

In the examples above, the operant response produced different types of consequences. Psychologists classify these consequences by means of their effect on behavior. Consequences of an action that increase the likelihood of that action recurring are termed 'reinforcers': positive reinforcers if the consequence is the occurrence of a stimulus (e.g. the wing flick display from the female), and negative reinforcers if the consequence is the cessation or avoidance of a stimulus (e.g. the threat and attack avoided by the adult cowbird when it sang the less stimulating song). Consequences of an action that decrease the probability of that action recurring are 'punishers' (e.g. the attack suffered by the low-ranking cowbird when it sang its most stimulating song). By modifying its behavior to produce reinforcers and eliminate or avoid punishers, an animal shapes its world while its behavior is shaped by its world. This closed feedback system is the hallmark of operant conditioning.

The laboratory study of operant conditioning began with the work of Edward L.
Thorndike (1874–1949), who showed that cats placed in a puzzle box (a wooden cage with a door that could be operated by the animal) become quicker at escaping with repeated successes. Later, B. F. Skinner (1904–1990) studied how behavior is shaped by its consequences, and how new response forms emerge when variations in behavior have different consequences. To conduct his experiments, Skinner invented the operant chamber, a box with a lever that hungry rats could press to receive food dispensed into a tray. The operant chamber soon became the microscope of learning psychologists.

What sort of consequences function as reinforcers or punishers? An agreed-upon theory that predicts which stimuli reinforce or punish behavior, the circumstances in which they do so, and why they do so, has eluded experimental psychologists. However, researchers have identified several factors that influence the behavioral effects of reinforcers and punishers.

Contiguity and Contingency

As in Pavlovian conditioning, contiguity and contingency are important. Other things being equal, long intervals between a response and a consequence weaken the strengthening effect of the latter; the more immediately a consequence follows a response, the more likely it is that the response will be affected by the consequence. However, short intervals may have weak effects if the correlation between the response and the consequence is low. This can occur in two ways: when a response is followed by the consequence too infrequently, or when a reinforcer occurs independently of the response that normally produces it. An adult, subordinate cowbird might continue to sing its most stimulating song if that song rarely produces aggression from other males; a young cowbird might spend less time singing if the positive female display occurred in the absence of the song.

Extinction

The behavioral changes brought about with operant conditioning are reversible.
When the environment changes and a response that used to be followed by a positive outcome is no longer followed by it, the probability of that response occurring declines. That is, the animal ceases emitting behavior that is no longer functional. However, extinction may be rapid or slow. On some occasions, the animal quickly changes its behavior, whereas on other occasions it perseveres for long periods of time. Whether extinction is rapid or slow depends largely on whether every instance of a particular response was reinforced (rapid) or only occasionally reinforced (slow).

Schedules of Reinforcement

In the natural environment, it is rare that every instance of a particular response is followed by a consequence. To understand the effects of intermittent reinforcement, psychologists have studied different rules specifying when a response will produce a consequence. Two examples of these rules, collectively known as schedules of reinforcement, are ratio and interval schedules. In ratio schedules the reinforcer depends solely on the occurrence of behavior (e.g. a rat receives food each time it completes five lever presses); in interval schedules the reinforcer depends on the occurrence of behavior and the passage of time (e.g. a rat receives food following the first lever press made at least 15 s after the previous delivery of food). Because there are no restrictions on when the rat can press the lever, the experimenter can study how the rate of a response changes across time as a function of how it produces a consequence.

Typically, ratio schedules support higher rates of responding than comparable interval schedules. Why? Because the two types of schedule induce different feedback functions, that is, different relations between response rate and reinforcement rate. As an illustration, consider a ratio schedule in which five responses produce one reinforcer.
In this schedule, how often reinforcers occur depends exclusively on how rapidly the animal responds; reinforcement rate will always equal one-fifth of the response rate. In contrast, consider an interval schedule in which a response is reinforced if it occurs at least 15 s after the previous reinforcer. In this schedule, reinforcement rate cannot exceed four reinforcers per minute (one every 15 s), and a response rate of four responses per minute matches that reinforcement rate. Slower response rates produce proportional changes in reinforcer rates, but faster response rates do not. That is, beyond this point reinforcement rate ceases to vary with response rate. Differences in the feedback function explain why ratio schedules typically produce higher rates of responding than interval schedules.

Choice

Just as simple processes combine to produce complex phenomena such as weather, geological formations and evolution, so too do basic processes of learning combine to produce more complex behavior. Choice among options is an example that illustrates how reinforcement rates interact to affect behavior. In the simplest situation, an animal faces two response keys, each of which delivers a reinforcer according to a schedule of reinforcement. For example, a pigeon might peck one key and receive a morsel of food with probability p, or peck another key and receive food with probability q. Another example might consist of a rat that presses one lever that delivers reinforcers with rate r, or another lever that delivers them with rate s. Studies such as these with pigeons, rats, rhesus monkeys and humans, among other species, have yielded a robust empirical finding known as the matching law. The law states that the proportion of choices on one alternative equals the proportion of reinforcers obtained from that alternative. In symbols:

\[ \frac{x}{x + y} = \frac{R_x}{R_x + R_y} \tag{1} \]

where x and y are the total numbers of choices of each alternative, and R_x and R_y are the corresponding total numbers of obtained reinforcers.
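As a worked illustration of the matching law, the sketch below computes both sides of the equation from session totals and reports how far choice proportion departs from reinforcer proportion. The session numbers are invented for illustration and do not come from any experiment; the function names are hypothetical.

```python
def choice_proportion(x, y):
    """Left-hand side: proportion of all choices made on the first alternative."""
    return x / (x + y)


def reinforcer_proportion(r_x, r_y):
    """Right-hand side: proportion of obtained reinforcers from the first alternative."""
    return r_x / (r_x + r_y)


# Hypothetical session totals:
# (choices on x, choices on y, reinforcers from x, reinforcers from y)
sessions = [
    (300, 100, 45, 15),
    (260, 140, 40, 20),
]

for x, y, r_x, r_y in sessions:
    choices = choice_proportion(x, y)
    reinforcers = reinforcer_proportion(r_x, r_y)
    # A deviation of zero means the session conforms exactly to matching.
    print(f"choices={choices:.3f}  reinforcers={reinforcers:.3f}  "
          f"deviation={choices - reinforcers:+.3f}")
```

The first made-up session conforms to the law exactly, whereas the second shows a small departure from it. Because the law is stated in terms of obtained rather than scheduled reinforcers, the totals fed to such a check would have to be counted from the session itself.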
Much less understood is how basic behavioral processes combine to yield the matching law. Some researchers propose that the equality is the outcome of the cumulative strengthening effect of individual reinforcers on the two response alternatives. Others suggest that the law results from the animal's sensitivity to global rates of reinforcement and its ability to maximize these rates under constraint. Still others suggest that matching derives from the tracking of the intervals between successive reinforcers on the two alternatives. In these three hypotheses we see, once again, the difficulty of determining the timescale of the learning process. Equally poorly understood is the acquisition of preference and how it relates to the various parameters of the choice situations (e.g. how the values of p and q, or r and s, in the examples above, determine how fast the animal comes to prefer the best alternative).

Stimulus Control

Because no response occurs in a vacuum, a response–consequence relation is always context-specific. In the laboratory, if a pigeon is reinforced for pecking a green key but not a red one, then the bird will restrict its pecking to the green key. This differential responding occurs because the two stimuli are correlated with different response–consequence relations. However, as with habituation and Pavlovian conditioning, the extent to which stimuli different from the ones used in training control operant behavior depends on a variety of factors. For example, the amount of pecking in the presence of stimuli similar to the green key (e.g. a blue key) and stimuli similar to the red key (e.g. an orange key) depends on how much training the pigeon received with the original stimuli. More extensive stimulus discrimination training promotes discrimination, whereas less training promotes stimulus generalization.
Also, reinforcing behavior differentially in the presence of different stimuli produces sharper discriminations (less generalization) than reinforcing a response in the presence of a single stimulus.

Moreover, when two or more stimulus elements signal that a response is likely to be reinforced, some form of stimulus competition may ensue. Consider the following experiment: a pigeon is trained to peck a green key with a black horizontal line, but not to peck a red key with a vertical line. The degree of stimulus discrimination and generalization is then tested by recording the amount of pecking at a white key during presentations of the line in various degrees of orientation. During the test, the pigeon pecked all line orientations similarly (i.e. stimulus generalization). However, when tested with keys of different colors but without the line, the pigeon pecked most at the green key and least at the red one (i.e. stimulus discrimination). This example shows that, for reasons that are poorly understood, some features of a stimulus may overshadow others. For the bird in this example, color overshadowed line orientation; but the reverse could have happened. A similar effect occurs in Pavlovian conditioning.

Timing

Temporal variables play a fundamental role in habituation and in Pavlovian and operant conditioning. Time may also be more directly involved in learning, as when animals learn to act according to the temporal attributes of a stimulus. For example, rats, pigeons, monkeys and other vertebrates can learn to behave in one manner after a 2 s stimulus and in another manner after an 8 s stimulus. When a reinforcer such as food is available periodically, say every 30 s (an example of an interval reinforcement schedule), animals learn to pause immediately after food and then, after about 15 s, respond at an increasingly faster rate until reinforcement.
The study of the temporal regulation of behavior is one of the most developed areas in the study of animal learning. A major empirical finding that has emerged from these studies is the scalar property of temporal discrimination, which states that all temporal judgments are relative. Hence, how a rat behaves at 10 s when reinforcement occurs every 30 s is similar to the way it behaves at 20 s when reinforcement occurs every 60 s. As another example, assume that a rat is trained to press a lever on the left after a 2 s signal and a lever on the right after an 8 s signal. Empirical tests show that the rat will be indifferent (i.e. it is just as likely to press the left as the right lever) when presented with a 4 s signal, because 2 is to 4 as 4 is to 8; that is, 4 s is the geometric mean of the two training durations. However, if the two training stimuli were 4 s and 16 s long, then the rat would be indifferent when presented with an 8 s signal. The scalar property derives its name from the fact that when the intervals of a temporal discrimination change, the animal's performance is scaled (stretched or shrunk) by the same factor. Why the scalar property holds remains a matter of controversy.

Avoidance Learning

Although operant and Pavlovian conditioning seem clearly distinguishable, components of each are often present in the procedures of the other. In this sense, it is perhaps better to conceive of operant and Pavlovian conditioning as analogous to elemental hydrogen and oxygen. These two elements are rare in nature, but their combination in the form of water is common. Similarly, most learned behavior is a mixture in varying proportions of operant and Pavlovian conditioning.

The clearest examples of operant–Pavlovian interaction can be seen in situations where animals have to avoid aversive outcomes. A gopher that sees a hawk overhead will not wait to see what the bird will do; upon sensing the hawk, the gopher will retreat to its burrow.
In the laboratory, a dog will jump over a hurdle during a tone if this response prevents the delivery of shock. In these and similar circumstances, the sight of the hawk or the sound of the tone predicts an aversive outcome (being attacked or shocked) unless a certain response occurs (retreating to a burrow or jumping a hurdle). The relation between the signal and the aversive event is a Pavlovian CS–US relation; but because responding during the CS allows the animal to avoid the aversive event, this response is negatively reinforced (an operant response–consequence relation).

Relevance of Operant Conditioning

Since Thorndike's and Skinner's early work, the study of operant conditioning has been extended in many different directions. Neuroscientists, pharmacologists and clinical psychologists, for example, have used the techniques and conceptual tools of operant conditioning to understand the functioning of the nervous system, the behavioral effects of drugs, and the intricacies of behavioral disorders such as depression. Artificial intelligence researchers also have borrowed ideas from the domain of operant conditioning to design systems that learn through the consequences of their actions.

The study of operant conditioning also has become more quantitative. As in Pavlovian conditioning, real-time, dynamic models of the operant process have started to replace purely verbal accounts. However, much remains unknown. For example, although it is reasonable to assume that a consequence with survival value (or which is closely associated with a stimulus that has survival value) will have a strong effect on the response that produced it, research has yet to yield a general theory of reinforcement and punishment.

CONCLUSION

The processes reviewed above represent only a fraction of the most basic categories of the taxonomy of learning. Much else has been done, but still more remains to be investigated.
Despite a century of research, three central questions of learning theory remain largely unanswered. First, how does an animal's evolutionary history constrain the sorts of things that it can learn? Second, how are processes that occur on different timescales integrated? Third, why are seemingly simple processes so complexly organized? These questions are likely to set the research agenda of learning psychologists for the coming decades.

Further Reading

Abramson CI (1994) A Primer of Invertebrate Learning: The Behavioral Perspective. Washington, DC: American Psychological Association.
Hearst E (1988) Fundamentals of learning and conditioning. In: Atkinson RC, Herrnstein RJ, Lindzey G and Luce RD (eds) Stevens' Handbook of Experimental Psychology, 2nd edn, vol. 2, Learning and Cognition, pp. 3–109. New York, NY: John Wiley.
Mackintosh NJ (1974) The Psychology of Animal Learning. New York, NY: Academic Press.
Mackintosh NJ (1983) Conditioning and Associative Learning. Oxford, UK: Oxford University Press.
Mazur J (1998) Learning and Behavior, 4th edn. London: Prentice-Hall.
Pavlov IP (1927) Conditioned Reflexes, translated by GV Anrep. London, UK: Oxford University Press.
Rescorla RA (1988) Pavlovian conditioning: it's not what you think it is. American Psychologist 43: 151–160.
Skinner BF (1981) Selection by consequences. Science 213: 501–504.
Staddon JER (2001) Adaptive Dynamics: The Theoretical Analysis of Behavior. Cambridge, MA: MIT Press.
Thorndike EL (1911) Animal Intelligence. New York, NY: Macmillan.
Williams BA (1988) Reinforcement, choice, and response strength. In: Atkinson RC, Herrnstein RJ, Lindzey G and Luce RD (eds) Stevens' Handbook of Experimental Psychology, 2nd edn, vol. 2, Learning and Cognition, pp. 167–244. New York, NY: John Wiley.