Animal Learning
Introductory article
Armando Machado, University of Minho, Braga, Portugal
Francisco J Silva, University of Redlands, Redlands, California, USA
CONTENTS
Introduction
Evolution and learning
Habituation: adapting to repetitive, harmless events
Pavlovian conditioning: learning about correlations
Operant conditioning: learning about causation and control
Conclusion
The field of animal learning studies the behavioral
mechanisms and processes that animals use to
adapt to changes in their environment. Its emphasis
is on environment-behavior interactions.
INTRODUCTION
Learning refers to a heterogeneous set of processes
that evolved in animals to accommodate changes in
their environments. These processes can produce
relatively permanent changes in behavior and are
brought into play by some form of interaction between the animal and its surroundings. For instance, a moving nematode (a tiny roundworm)
that stops momentarily or reverses its motion
when it experiences a vibration will cease doing
so if the vibration occurs repeatedly. Foraging
bees perform an intricate dance, the orientation
and speed of which change with the direction and
distance of the food source from the hive, such that
other bees also can locate the food site. Hungry
domestic cats mew in the presence of their owner,
who will then give them food. In all of these
examples, the organism's behavior is showing the
effects of particular interactions between itself and
its environment. Classifying distinct types of interactions, identifying their elements, quantifying
their static and dynamic properties, and describing
how their cumulative effects are expressed are the goals of people who study learning. Before classifying interactions between organisms and their environments, it is important to place the process of
learning within an evolutionary context, that is, to
understand why learning evolved.
EVOLUTION AND LEARNING
It is widely assumed that learning evolved in specific contexts, such as gathering food, eluding
predators, capturing prey, attracting mates and
avoiding poisons. Despite this variety in contexts,
countless experiments indicate that the same principles of learning hold across many different
species, tasks and behaviors. How can we reconcile
the assumption that learning evolved in specific
contexts with the fact that it occurs similarly in
many contexts? The answer is, by conceiving of
learning processes as mixtures in varying proportions of both context-specific and general mechanisms. For example, taste aversion learning occurs
when animals avoid gustatory and olfactory cues
associated with foods that made them ill, even
when the flavor is separated from the illness by
many hours. That animals can learn which cues
predict biologically significant events is commonplace; that animals can learn to avoid cues separated by many hours from a biologically significant
event typically happens only when flavor is the
predictive cue and illness is the significant event.
To clarify further the relationship between
context-specific and general features of learning,
consider the following analogy. A house built for
living near the Arctic Circle will differ considerably
from one built for living in south Florida: the
former requires thick insulation, double-paned
windows and a furnace; the latter needs mildew-resistant insulation, shaded glass and air conditioning. Despite these differences, there are general
features common to both houses, such as the presence of windows, doors, rooms, walls and a roof,
and a general function for both houses, such as
sheltering and protecting their inhabitants from
weather and predators. Because the function of a
house is similar in both regions, there is some commonality to the solutions.
The presence of general features in learning is
illustrated by the fact that animals from widely
separated taxa respond to environmental stimuli
in similar ways. For example, they ignore repetitive
harmless stimuli, a process called habituation; they
detect correlations among biologically important
events, a process called Pavlovian conditioning;
and they learn causal relations between their
behavior and its consequences, a process called
operant conditioning.
Central to these processes is temporal integration, for only by tracking and integrating events
across time can animals determine whether an
event is repeating, whether it occurs before, during
or after other events, or whether it is a reliable
consequence of responding or not responding. It
should come as no surprise, then, that temporal
variables are often critical determinants of learning.
HABITUATION: ADAPTING TO
REPETITIVE, HARMLESS EVENTS
When a harmless stimulus occurs repeatedly and
there are no other events associated with it, there
might be an advantage to ignoring the stimulus.
For example, imagine a land snail on a small
wooden platform that vibrates briefly while the
animal moves. Typically, this stimulus (the vibration) elicits a protective response, contracting the
antennae. If these vibrations are repeated, say every
30 s, then the contractions decline. Eventually, the
vibrations are ignored in the sense that the antennae are not contracted and the snail keeps moving.
This waning of a response to repeated presentations of a stimulus is termed habituation.
To interpret habituation we can appeal to the
concept of response threshold, which is defined
roughly as the minimum stimulus intensity required to elicit a reflexive response. As the stimulus
is repeated, the threshold increases, which makes it
more difficult to produce a response. Eventually
the threshold is greater than the current stimulus
intensity and the response fails to occur.
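A minimal numerical sketch may make this threshold account concrete. In the Python fragment below, the threshold rises with every stimulus presentation and partially recovers before the next one; the parameter values and update rules are illustrative assumptions, not quantities taken from the article or the literature.

```python
# Minimal sketch of the threshold account of habituation.
# Parameter names, values and update rules are assumptions for illustration.

def simulate_habituation(intensity=1.0, n_presentations=10,
                         rise=0.4, decay=0.2, baseline=0.2):
    """Return one (threshold, responded) pair per stimulus presentation."""
    threshold = baseline
    history = []
    for _ in range(n_presentations):
        responded = intensity > threshold                  # response occurs only while the
        history.append((round(threshold, 2), responded))   # threshold is below the intensity
        threshold += rise                                  # each presentation raises the threshold
        threshold -= decay * (threshold - baseline)        # partial recovery before the next one
    return history

if __name__ == "__main__":
    for i, (th, resp) in enumerate(simulate_habituation(), start=1):
        print(f"presentation {i:2d}: threshold={th:.2f}  response={'yes' if resp else 'no'}")
```

In this toy model, raising the stimulus intensity or allowing more recovery between presentations delays habituation, which anticipates the properties discussed next.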
Properties of Habituation
Habituation is present in virtually every animal
species, from single-celled animals such as the ciliate Vorticella to humans. It has even been observed
in individual motor neurons. This widespread phenomenon deserves attention for two related
reasons. First, it introduces some of the key variables and functional relations that psychologists
have identified in most learning processes. Second,
habituation reveals the amazing complexity of even
the simplest of learning processes.
Recovery from habituation
In the example above, if the platform ceases to
vibrate after habituation has occurred, then with
the passage of time since the last vibration, the
snail's antennae are increasingly likely to contract
when the platform again vibrates. In other words,
habituation seems to 'wear off' when the stimulus
that produced it is not presented. This recovery of
the response corresponds with a return of the
threshold to its initial value.
Stimulus intensity
If the vibration of the platform is sufficiently intense, then the snail may withdraw into its shell.
This response might also habituate if the strong
stimulus is repeated. Typically, however, the
courses of habituation and recovery from habituation are slower for strong than for weak stimuli.
When the stimulus is more intense, the threshold takes longer to rise above the stimulus intensity, and, in the absence of the stimulus, it takes longer to fall back to its initial, baseline value.
Time between stimulus presentations
All else being equal, stimuli closer in time produce
faster habituation than stimuli farther apart. This
finding is consistent with the threshold account of
habituation: longer intervals give the threshold
more time to decrease, which partly offsets the effect
of the stimulus presentations. Interestingly, high
rates of stimulation may also lead to faster recovery
from habituation. This result, unlike the preceding
ones, does not follow from the view that changes in
threshold are responsible for habituation, unless
the rate at which the threshold returns to its baseline value depends on the rate of stimulation.
Relearning effect
Imagine the following experiment. After we record
the course of habituation on day 1, we stop the
vibrations and let recovery occur. On day 2 we
vibrate the platform again and record the new
course of habituation. Typically, the rate of habituation is faster during the second day. The importance of this finding is that the difference in the rates
of habituation shows that the animal's internal state
on day 2 is different from its state on day 1 despite
the similarity in its initial responses. Again, a
simple change in the response threshold cannot
accommodate this finding.
Stimulus generalization and stimulus
specificity
The contraction of the antennae ceases not only in
the presence of the original vibration, but also in
the presence of similar stimuli (stimulus generalization). However, it is readily elicited by different
stimuli such as a blast of air (stimulus specificity).
These properties are two sides of the same coin;
generalization focuses on the fact that habituation
to one stimulus extends to some of the other stimuli
that can also elicit the response; specificity focuses
on the fact that habituation to one stimulus does
not extend to all stimuli that can elicit the response.
Careful empirical work is needed to identify the
stimulus properties (e.g. intensity, duration, rate)
along which generalization proceeds.
Functions of Habituation
As noted above, habituation has been observed in
almost every animal species. Why is it so prevalent, and why do the properties of habituation hold true across many different species, responses and, a fortiori, physiological mechanisms? Probably, habituation occurs because it is sometimes safe and
economical to ignore a repetitive stimulus. An
animal that continued to respond to every stimulus
impinging on its receptors would be overwhelmed
by stimulation and incapable of acting appropriately. However, the animal would pay a high
price if the effects of habituation were permanent,
for what was once a harmless vibration caused by
the running of a distant animal might now be
caused by an approaching predator. In the same
vein, assuming stronger stimuli are potentially
more harmful than weaker stimuli, it makes sense
that they should be ignored less quickly than
weaker stimuli. Classifying incoming stimuli as
'The same!' 'The same!' 'The same!' also seems
safer when the stimuli are close in time.
PAVLOVIAN CONDITIONING:
LEARNING ABOUT CORRELATIONS
Food and water, predators and prey, mates and
offspring, and escape routes and shelter, are some
of the primary determinants of survival and reproduction. As such, it is reasonable to attribute great
evolutionary advantage to animals capable of anticipating them. For all animals, specific sounds,
sights or odor trails, places or times of occurrence,
or more complex sequences and configurations of
stimuli might be reliably correlated with biologically important events. If an animal could learn the
correlational texture of its world (i.e. the relationships among events), then it would have the advantage of responding one way when a stimulus
predicts an important event and in another way
when a stimulus does not.
The correlations that an animal can learn depend
on the animal and the types of stimuli and events in
its environment. In terms of the animal, there might
be reliable cues that it cannot detect simply because
it has not evolved the required biological machinery (e.g. sensory receptors). In terms of the environment, a stimulus might be detectable but its
frequency of occurrence or its reliability as a cue
might be too low to support the evolution of an
ability to fully exploit it; the cost would outweigh
the benefit, as it were. Learning about the cueing
function of a stimulus is therefore constrained both
by the animal's physiology and by the specific arrangements of events in the animal's world.
Constraints notwithstanding, how does an
animal learn the correlation between a neutral and
a biologically important stimulus? The pioneering
work on this question was conducted by Ivan Petrovich Pavlov (1849-1936), the famous Russian
physiologist and 1904 Nobel prizewinner. In good
scientific fashion, Pavlov reduced the problem to
its bare essentials: a tone reliably preceded a bit of
meat powder delivered to the mouth of a hungry
dog. Of interest was the animal's behavior during
the tone. Initially, when the tone was presented the
dog pricked up its ears and looked in the direction
of the source of the tone, but, critically, it did not
salivate. After a few pairings of the tone and food,
the orienting response elicited by the tone ceased
(habituation had set in) and a new response during
the tone began to occur: salivation. Because 'food in the mouth' elicited copious salivation without any previous training, Pavlov called it the unconditional stimulus (US) and 'salivation in the presence of food' the unconditional response (UR). As
the quantity and quality of salivation to the tone
depended on the prior predictive history of the
tone, Pavlov called the salivation to the tone a conditional response (CR) and the tone a conditional
stimulus (CS). The study of how behavior changes
when two or more stimuli are paired, as in the
preceding example, is known as Pavlovian or classical conditioning.
With this and similar laboratory preparations,
Pavlov and many subsequent researchers have
tried to understand how animals learn the cueing
function of stimuli. Some of their experiments
showed the following results, many of which resemble those obtained in studies of habituation.
Extinction
If, after the tone elicits salivation reliably, it is presented without the food, then the dog will eventually stop salivating during the tone. That is, when
the CS no longer predicts the US, the CR weakens
and may eventually disappear. Through acquisition
and extinction processes, animals adjust to changes
in the pattern of events in their environment.
Spontaneous Recovery
If the experimenter allows the dog to rest for, say,
24 h after the extinction training, and then presents
the tone again, the animal that had stopped salivating to the tone may again salivate to it; that is, the
CR spontaneously recovers. The passage of time
undoes some of the effects of extinction. Why spontaneous recovery of the CR happens is still poorly
understood.
Stimulus Generalization
Having learned to salivate to a specific tone, the
dog also will salivate to similar tones. That is, a CR
will be elicited by the original stimulus as well as
similar stimuli; however, the more different these
other stimuli are from the original CS, the weaker
the CR they elicit. Because no stimulus ever recurs
in precisely the same way (e.g. the rustling of the
leaves announcing a lion is different in different
situations), it is advantageous to extend newly acquired responses to similar stimuli.
Stimulus Discrimination
When Pavlov alternated two tones during training
and paired one but not the other with food, his
dogs eventually salivated only to the tone paired
with food. That is, if one stimulus (CS+) is paired with a US, but another stimulus (CS−) is not, then the CR will occur only or mainly in the presence of the CS+. Stimulus discrimination helps ensure that
a response occurs in particular environments,
rather than indiscriminately across situations
and time.
Contingency Effects
Assume that during the original training the food
follows the tone on only 50 percent of the trials. On
the remaining 50 percent the tone occurs alone.
Under this circumstance, the amount of salivation
to the tone during training is smaller than when the
food always followed the tone. Similarly, if food
also occurs occasionally in the absence of the tone,
the CR is weaker than when food only follows the
tone. In the extreme, if food occurs more often in
the absence of the tone than in its presence, the tone
will actively suppress salivation instead of eliciting
it. In summary, the results of many experiments
show that animals are sensitive to the direction
(positive or negative) and the strength of the correlation, or contingency, between the CS and the US.
The effect of contingency shows that temporal
contiguity between the tone and food is insufficient
to ensure that the tone will become a CS. Much
depends on what else the animal has been experiencing, both during the presence and the absence of
the tone. That is, the animal seems to integrate
events that are temporally extended, and to behave
according to the actual correlation value between
the tone and the food. Both temporal and probability relations between the CS and US, or contiguity
and contingency, are important in Pavlovian conditioning.
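One simple way to express such a contingency numerically is the difference between the probability of the US given the CS and the probability of the US given no CS. The Python sketch below is only an illustration; the trial counts are invented, and the article does not commit to this particular statistic.

```python
# Contingency as a difference of conditional probabilities (Delta-P).
# The trial counts below are made up for illustration.

def delta_p(us_with_cs, cs_trials, us_without_cs, no_cs_trials):
    """P(US | CS) - P(US | no CS): positive, zero, or negative contingency."""
    return us_with_cs / cs_trials - us_without_cs / no_cs_trials

# Food follows the tone on half of the tone trials and never otherwise.
print(delta_p(us_with_cs=10, cs_trials=20, us_without_cs=0, no_cs_trials=20))   # 0.5
# Food occurs more often without the tone than with it; such a tone can come
# to suppress salivation rather than elicit it.
print(delta_p(us_with_cs=5, cs_trials=20, us_without_cs=15, no_cs_trials=20))   # -0.5
```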
In fact, the process is even more complex than
stated above. Consider an experiment in which a
tone is paired with food until it elicits salivation
reliably. Next, the tone is presented along with a
light, and this compound stimulus is followed with
food. Will the light elicit salivation when it is presented alone and without the food? Because food
always occurs after the light and never in its absence, the light is maximally (and positively) correlated with food. Moreover, because the food
closely follows the light, the two stimuli also are
temporally contiguous. Hence, one might predict a
strong association between the light and food and,
therefore, salivation to the light. However, little or no salivation to the light is typically found.
Control experiments indicate that because the tone
already predicted the food at the end of the first
part of the experiment, it somehow blocked the
association 'light-food'. We could say that the
light provided no new information about the food
beyond that already provided by the tone and,
hence, the light did not help the animal anticipate
the US any better than the tone. The important
point is that such blocking highlights the fact that
an animal's prior experiences can modulate the
effects of contiguity and contingency.
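A well-known formal account that reproduces blocking is the Rescorla-Wagner learning rule, in which all stimuli present on a trial share a single prediction error. The article does not name this or any other model, so the sketch below should be read only as one possible illustration of why a pretrained tone leaves no error for the light to absorb.

```python
# Rescorla-Wagner sketch of blocking (an illustration; the article does not
# specify this or any other learning rule).

def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """trials: list of (set_of_present_CSs, us_present). Returns associative strengths."""
    strength = {}
    for present, us_present in trials:
        prediction = sum(strength.get(cs, 0.0) for cs in present)
        error = (lam if us_present else 0.0) - prediction   # shared prediction error
        for cs in present:
            strength[cs] = strength.get(cs, 0.0) + alpha * error
    return strength

# Phase 1: tone alone is paired with food; phase 2: the tone + light compound is paired with food.
trials = [({"tone"}, True)] * 40 + [({"tone", "light"}, True)] * 40
print(rescorla_wagner(trials))  # tone ends near 1.0; light stays near 0.0 (blocked)
```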
Relevance of Pavlovian Conditioning
Since Pavlov's pioneering work, the study of Pavlovian conditioning has revealed many other
complex relations among the CR and temporal
variables, the sequential arrangements of the various stimuli, the context in which conditioning
occurs, and the animal species and the particular
response system under consideration. Pavlovian
conditioning is fundamental to understanding
drug addictions, phobias and a variety of sexual
responses in humans and other animals. Its domain
of study also has become increasingly quantitative.
Real-time, dynamic models of the learning process
have started to replace verbal accounts. However,
much remains unknown about the process through
which stimuli that are insignificant when considered in isolation become significant when they
signal biologically important events.
OPERANT CONDITIONING: LEARNING
ABOUT CAUSATION AND CONTROL
The preceding discussion focused on how an
animal's behavior is changed by repeated presentations of single stimuli (habituation) or by
relationships among stimuli (Pavlovian conditioning). In both of these situations, behavior changes
as a result of the stimuli that precede it. However,
it is also the case that things happen after a response. For example, a young male cowbird sings
one of its song variants and elicits a subtle wing
flick from a female. An adult male cowbird sings a
variant that stimulates a precopulatory display in
a female and a vigorous attack from a dominant
male. However, if the same adult cowbird sings a
less stimulating song, then it avoids being chased
by the dominant male. Operant or instrumental
learning results when an animal's behavior causes
a stimulus change, which in turn changes the animal's subsequent behavior: the young cowbird is
more likely to sing the variant that caused the positive female reaction; the adult cowbird is less likely
to sing the song that caused the attack and more
likely to sing the one that avoided it. This capacity
to change behavior because of its consequences
enables animals to learn about control and to exploit the causal texture of their social and physical
worlds.
In the examples above, the operant response produced different types of consequences. Psychologists classify these consequences by means of their
effect on behavior. Consequences of an action that
increase the likelihood of that action recurring are
termed 'reinforcers': positive reinforcers if the
consequence is the occurrence of a stimulus (e.g.
the wing flick display from the female), and negative reinforcers if the consequence is the cessation
or avoidance of a stimulus (e.g. the threat and
attack avoided by the adult cowbird when it sang
the less stimulating song). Consequences of an
action that decrease the probability of that action
recurring are 'punishers' (e.g. the attack suffered by
the low-ranking cowbird when it sang its most
stimulating song). By modifying its behavior to
produce reinforcers and eliminate or avoid punishers, an animal shapes its world while its behavior is
shaped by its world. This closed feedback system is
the hallmark of operant conditioning.
The laboratory study of operant conditioning
began with the work of Edward L. Thorndike
(1874-1949), who showed that cats placed in a
puzzle box (a wooden cage with a door that could
be operated by the animal) become quicker at escaping with repeated successes. Later, B. F. Skinner
(1904-1990) studied how behavior is shaped by its
consequences, and how new response forms
emerge when variations in behavior have different
consequences. To conduct his experiments, Skinner
invented the operant chamber, a box with a lever
that hungry rats could press to receive food dispensed into a tray. The operant chamber soon
became the microscope of learning psychologists.
What sort of consequences function as reinforcers or punishers? An agreed-upon theory that predicts which stimuli reinforce or punish behavior,
the circumstances in which they do so, and why
they do so, has eluded experimental psychologists.
However, researchers have identified several
factors that influence the behavioral effects of reinforcers and punishers.
Contiguity and Contingency
As in Pavlovian conditioning, contiguity and contingency are important. Other things being equal,
long intervals between a response and a consequence weaken the strengthening effect of the
latter; the more immediately a consequence follows
a response, the more likely it is that the response
will be affected by the consequence. However,
short intervals may have weak effects if the correlation between the response and the consequence is
low. This can occur in two ways: when a response
is followed by the consequence too infrequently, or
when a reinforcer occurs independently of the response that normally produces it. An adult, subordinate cowbird might continue to sing its most
stimulating song if that song rarely produces aggression from other males; a young cowbird might
spend less time singing if the positive female display occurred in the absence of the song.
Extinction
The behavioral changes brought about with operant conditioning are reversible. When the environment changes and a response that used to
be followed by a positive outcome is no longer
followed by it, the probability of that response occurring declines. That is, the animal ceases emitting
behavior that is no longer functional. However,
extinction may be rapid or slow. On some occasions, the animal quickly changes its behavior,
whereas on other occasions it perseveres for long
periods of time. Whether extinction is rapid or slow
depends largely on whether every instance of a
particular response was reinforced (rapid) or only
occasionally reinforced (slow).
Schedules of Reinforcement
In the natural environment, it is rare that every
instance of a particular response is followed by a
consequence. To understand the effects of intermittent reinforcement, psychologists have studied
different rules specifying when a response will
produce a consequence. Two examples of these
rules, collectively known as schedules of reinforcement, are ratio and interval schedules. In ratio
schedules the reinforcer depends solely on the occurrence of behavior (e.g. a rat receives food each
time it completes five lever presses); in interval
schedules the reinforcer depends on the occurrence
of behavior and the passage of time (e.g. a rat
receives food following the first lever press after
15 s since the previous occurrence of food). Because
there are no restrictions on when the rat can press
the lever, the experimenter can study how the rate
of a response changes across time as a function of
how it produces a consequence.
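To make the two rules concrete, the following Python sketch implements a fixed-ratio 5 schedule and a 15 s interval schedule as just described; the class names and structure are illustrative, not standard terminology.

```python
# Sketch of the two reinforcement rules described above: food after every fifth
# press (ratio) versus food for the first press at least 15 s after the last
# food delivery (interval). Class names and structure are illustrative.
from dataclasses import dataclass

@dataclass
class RatioSchedule:
    ratio: int = 5
    presses: int = 0
    def press(self, time_s: float) -> bool:
        self.presses += 1
        if self.presses >= self.ratio:      # every fifth press produces food
            self.presses = 0
            return True
        return False

@dataclass
class IntervalSchedule:
    interval_s: float = 15.0
    last_food_s: float = float("-inf")
    def press(self, time_s: float) -> bool:
        if time_s - self.last_food_s >= self.interval_s:  # first press after 15 s produces food
            self.last_food_s = time_s
            return True
        return False

fr5, fi15 = RatioSchedule(), IntervalSchedule()
print([fr5.press(t) for t in range(10)])          # food on the 5th and 10th presses
print([fi15.press(t) for t in range(0, 60, 5)])   # food at t = 0, 15, 30, 45 s
```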
Typically, ratio schedules support higher rates of
responding than comparable interval schedules.
Why? Because the two types of schedule induce
different feedback functions, that is, different relations between response rate and reinforcement
rate. As an illustration, consider a ratio schedule
in which five responses produce one reinforcer. In
this schedule, how often reinforcers occur depends
exclusively on how rapidly the animal responds;
reinforcement rate will always equal one-fifth of
the response rate. In contrast, consider an interval
schedule in which a response is reinforced if it
occurs at least 15 s after the previous reinforcer.
In this schedule, a response rate of four responses per minute (one response every 15 s) matches the maximum reinforcement rate of four reinforcers per minute. Slower response rates produce proportional changes in reinforcement rate, but faster response rates do not.
That is, reinforcement rate ceases to vary with response rate. Differences in the feedback function
explain why ratio schedules typically produce
higher rates of responding than interval schedules.
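The two feedback functions can be written down directly. The sketch below is a rough idealization that assumes responses are spread evenly in time; it is meant only to show why the interval schedule's reinforcement rate levels off.

```python
# Idealized feedback functions: reinforcement rate as a function of response rate,
# assuming responses are spread evenly in time. A rough sketch, not data.

def ratio_feedback(responses_per_min: float, ratio: int = 5) -> float:
    """On a ratio schedule, reinforcement rate is always response rate / ratio."""
    return responses_per_min / ratio

def interval_feedback(responses_per_min: float, interval_s: float = 15.0) -> float:
    """On an interval schedule, reinforcement rate tracks response rate only up to
    the ceiling set by the interval (60 / 15 = 4 reinforcers per minute here)."""
    return min(responses_per_min, 60.0 / interval_s)

for rate in (1, 4, 20, 60):
    print(rate, ratio_feedback(rate), interval_feedback(rate))
# On the ratio schedule, reinforcement rate keeps growing with response rate;
# on the interval schedule it levels off at 4 reinforcers per minute.
```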
Choice
Just as simple processes combine to produce complex phenomena such as weather, geological formations and evolution, so too do basic processes
of learning combine to produce more complex
behavior. Choice among options is an example
that illustrates how reinforcement rates interact to
affect behavior.
In the simplest situation, an animal faces two
response keys, each of which delivers a reinforcer
according to a schedule of reinforcement. For
example, a pigeon might peck one key and receive
a morsel of food with probability p, or peck another
key and receive food with probability q. Another
example might consist of a rat that presses one
lever that delivers reinforcers with rate r, or another
lever that delivers them with rate s. Studies such as
these with pigeons, rats, rhesus monkeys and
humans, among other species, have yielded a
robust empirical finding known as the matching
law. The law states that the proportion of choices
on one alternative equals the proportion of reinforcers obtained from that alternative. In symbols:
x/(x + y) = Rx/(Rx + Ry)    (1)
where x and y are the total numbers of choices of
each alternative, and Rx and Ry are the corresponding total numbers of obtained reinforcers.
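As a concrete check on eqn (1), the short Python snippet below plugs hypothetical session totals into both sides of the equation; the numbers are invented for illustration.

```python
# The matching law, eqn (1), checked against hypothetical session totals.
# All numbers below are invented for illustration.

def choice_proportion(x: float, y: float) -> float:
    """Proportion of choices made on the first alternative."""
    return x / (x + y)

def reinforcer_proportion(rx: float, ry: float) -> float:
    """Proportion of reinforcers obtained from the first alternative."""
    return rx / (rx + ry)

# 1500 pecks on the left key and 500 on the right, yielding 60 and 20 reinforcers:
print(choice_proportion(1500, 500))    # 0.75
print(reinforcer_proportion(60, 20))   # 0.75 -> matching holds for these totals
```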
Much less understood is how basic behavioral
processes combine to yield the matching law.
Some researchers propose that the equality is the
outcome of the cumulative strengthening effect of
individual reinforcers on the two response alternatives. Others suggest that the law results from the
animal's sensitivity to global rates of reinforcement
and its ability to maximize these rates under constraint. Still others suggest that matching derives
from the tracking of the intervals between successive reinforcers on the two alternatives. In these
three hypotheses we see, once again, the difficulty
of determining the timescale of the learning process. Equally poorly understood is the acquisition
of preference and how it relates to the various
parameters of the choice situations (e.g. how the
values of p and q, or r and s, in the examples above,
determine how fast the animal comes to prefer the
best alternative).
Stimulus Control
Because no response occurs in a vacuum, a response-consequence relation is always context-specific. In the laboratory, if a pigeon is reinforced
for pecking a green key but not a red one, then the
bird will restrict its pecking to the green key. This
differential responding occurs because the two
stimuli are correlated with different response-consequence relations. However, as with habituation and Pavlovian conditioning, the extent to which stimuli
different from the ones used in training control
operant behavior depends on a variety of factors.
For example, the amount of pecking in the presence
of stimuli similar to the green key (e.g. a blue key)
and stimuli similar to the red key (e.g. an orange
key) depends on how much training the pigeon
received with the original stimuli. More extensive
stimulus discrimination training promotes discrimination, whereas less training promotes stimulus generalization. Also, reinforcing behavior
differentially in the presence of different stimuli
produces sharper discriminations (less generalization) than reinforcing a response in the presence of
a single stimulus.
Moreover, when two or more stimulus elements
signal that a response is likely to be reinforced,
some form of stimulus competition may ensue.
Consider the following experiment: a pigeon is
trained to peck a green key with a black horizontal
line, but not to peck a red key with a vertical line.
The degree of stimulus discrimination and generalization is then tested by recording the amount of
pecking at a white key during presentations of the
line at various orientations. During the test, the pigeon pecked all line orientations similarly (i.e. stimulus generalization). However, when tested with keys of different colors but without the line, it pecked most at the green key
and least at the red one (i.e. stimulus discrimination). This example shows that, for reasons that
are poorly understood, some features of a stimulus
may overshadow others. For the bird in this
example, color overshadowed line orientation; but
the reverse could have happened. A similar effect
occurs in Pavlovian conditioning.
Timing
Temporal variables play a fundamental role in habituation and in Pavlovian and operant conditioning. Time may also be more directly involved in
learning, as when animals learn to act according to
the temporal attributes of a stimulus. For example,
rats, pigeons, monkeys and other vertebrates can
learn to behave in one manner after a 2 s stimulus
and in another manner after an 8 s stimulus. When
a reinforcer such as food is available periodically,
say every 30 s (an example of an interval reinforcement schedule), animals learn to pause immediately after food and then, after about 15 s, respond
at an increasingly faster rate until reinforcement.
The study of the temporal regulation of behavior is
one of the most developed areas in the study of
animal learning.
A major empirical finding that has emerged from
these studies is the scalar property of temporal
discrimination, which states that all temporal judgments are relative. Hence, how a rat behaves at 10 s
when reinforcement occurs every 30 s is similar to
the way it behaves at 20 s when reinforcement
occurs every 60 s. As another example, assume
that a rat is trained to press a lever on the left
after a 2 s signal and a lever on the right after an
8 s signal. Empirical tests show that the rat will be
indifferent (i.e. it is just as likely to press the left as
the right lever) when presented with a 4 s signal,
because 2 is to 4 as 4 is to 8. However, if the two
training stimuli were 4 s and 16 s long, then the rat
would be indifferent when presented with an 8 s
signal. The scalar property derives its name from
the fact that when the intervals of a temporal discrimination change, the animal's performance is
scaled (stretched or shrunk) by the same factor.
Why the scalar property holds remains a matter of
controversy.
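The two bisection results just described are consistent with indifference at the geometric mean of the trained durations. Treating the geometric mean as a general rule is an extrapolation made here only for illustration; the article itself reports only the two examples.

```python
# Indifference point consistent with the scalar property: the geometric mean of
# the two trained durations. This generalization is an illustrative assumption.
import math

def indifference_point(short_s: float, long_s: float) -> float:
    """Signal duration at which the animal is predicted to be indifferent."""
    return math.sqrt(short_s * long_s)

print(indifference_point(2, 8))    # 4.0 -> indifferent at 4 s after 2 s vs 8 s training
print(indifference_point(4, 16))   # 8.0 -> indifferent at 8 s after 4 s vs 16 s training
```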
Avoidance Learning
Although operant and Pavlovian conditioning
seem clearly distinguishable, components of each
are often present in the procedures of the other. In
this sense, it is perhaps better to conceive of operant
and Pavlovian conditioning as analogous to elemental hydrogen and oxygen. These two elements
are rare in nature, but their combination in the
form of water is common. Similarly, most learned
behavior is a mixture in varying proportions of
operant and Pavlovian conditioning.
The clearest examples of operant-Pavlovian
interaction can be seen in situations where animals
have to avoid aversive outcomes. A gopher that
sees a hawk overhead will not wait to see what
the bird will do; upon sensing the hawk, the gopher
will retreat to its burrow. In the laboratory, a dog
will jump over a hurdle during a tone if this response prevents the delivery of shock. In these and
similar circumstances, the sight of the hawk or the
sound of the tone predict an aversive outcome
(being attacked or shocked) unless a certain response occurs (retreating to a burrow or jumping
a hurdle). The relation between the signal and the
aversive event is a Pavlovian CS-US relation; but
because responding during the CS allows the
animal to avoid the aversive event, this response
is negatively reinforced (an operant response-consequence relation).
Relevance of Operant Conditioning
Since Thorndike's and Skinner's early work, the
study of operant conditioning has been extended
in many different directions. Neuroscientists,
pharmacologists and clinical psychologists, for
example, have used the techniques and conceptual
tools of operant conditioning to understand the
functioning of the nervous system, the behavioral
effects of drugs, and the intricacies of behavioral
disorders such as depression. Artificial intelligence
researchers also have borrowed ideas from the
domain of operant conditioning to design systems
that learn through the consequences of their
actions. The study of operant conditioning also
has become more quantitative. As in Pavlovian
conditioning, real-time, dynamic models of the
operant process have started to replace purely
verbal accounts. However, much remains unknown. For example, although it is reasonable to
assume that a consequence with survival value (or
which is closely associated with a stimulus that has
survival value) will have a strong effect on the
response that produced it, research has yet to
yield a general theory of reinforcement and
punishment.
CONCLUSION
The processes reviewed above represent only a
fraction of the most basic categories of the taxonomy of learning. Much else has been done, but
still more remains to be investigated. Despite a
century of research, three central questions of
learning theory remain largely unanswered. First,
how does an animal's evolutionary history constrain the sorts of things that it can learn? Second,
how are processes that occur on different timescales
integrated? Third, why are seemingly simple processes so complexly organized? These questions
are likely to set the research agenda of learning
psychologists for the next decades.
Further Reading
Abramson CI (1994) A Primer of Invertebrate Learning: The
Behavioral Perspective. Washington, DC: American
Psychological Association.
Hearst E (1988) Fundamentals of learning and
conditioning. In: Atkinson RC, Herrnstein RJ, Lindzey
G and Luce RD (eds) Stevens' Handbook of Experimental Psychology, 2nd edn, vol. 2, Learning and Cognition, pp. 3-109. New York, NY: John Wiley.
Mackintosh NJ (1974) The Psychology of Animal Learning.
New York, NY: Academic Press.
Mackintosh NJ (1983) Conditioning and Associative
Learning. Oxford, UK: Oxford University Press.
Mazur J (1998) Learning and Behavior, 4th edn. London:
Prentice-Hall.
Pavlov IP (1927) Conditioned Reflexes, translated by GV
Anrep. London, UK: Oxford University Press.
Rescorla RA (1988) Pavlovian conditioning: it's not what
you think it is. American Psychologist 43: 151-160.
Skinner BF (1981) Selection by consequences. Science 213: 501-504.
Staddon JER (2001) Adaptive Dynamics: The Theoretical
Analysis of Behavior. Cambridge, MA: MIT Press.
Thorndike EL (1911) Animal Intelligence. New York, NY:
Macmillan.
Williams BA (1988) Reinforcement, choice, and response
strength. In: Atkinson RC, Herrnstein RJ, Lindzey G
and Luce RD (eds) Stevens' Handbook of Experimental Psychology, 2nd edn, vol. 2, Learning and Cognition, pp. 167-244. New York, NY: John Wiley.