BRAIN AND BEHAVIOR
INSTRUMENTAL LEARNING: POSITIVE REINFORCEMENT AND EXTINCTION
• Also called operant learning
• Science of learning about responses: stimulus → response → outcome
INSTRUMENTAL CONDITIONING
• Distinct from Classical (Pavlovian) conditioning
o Classical: the animal behaves as if it has learned to associate a stimulus with a significant event. The response is elicited by a stimulus that comes before it
o Instrumental: the animal behaves as if it has learned to associate a behavior with a significant event. The response is voluntary, and not elicited by any stimulus
o The subject's behaviour determines the outcome only in instrumental conditioning
• The distinction is largely academic – outside the lab, the two paradigms are largely inseparable; behavior is a combined result of stimulus learning and response learning
THORNDIKE'S LAW OF EFFECT
• A response that is met with a satisfying outcome will increase in the future
• A response that is met with a frustrating outcome will decrease in the future
• Studied by Thorndike and extended by Skinner
REINFORCEMENT
• Reinforcement: a relation between some event (a reinforcer) and a preceding response increases the strength of the response (as opposed to a punisher)
• The reinforcer is defined by its observed effect on behavior and NOT by its subjective qualities
• The four basic contingencies (outcome valence × contingency):
o Appetitive (good) outcome, positive contingency (response results in outcome): positive reinforcement / reward → Response ↑
o Appetitive (good) outcome, negative contingency (response prevents outcome): omission → Response ↓
o Aversive (bad) outcome, positive contingency: punishment → Response ↓
o Aversive (bad) outcome, negative contingency (e.g. avoidance): negative reinforcement → Response ↑
• Conditioned (secondary) reinforcement:
o Previously neutral stimuli may acquire reinforcing properties (e.g. context, the sound of the lever)
o Most rewarding stimuli in our lives are secondary reinforcers, e.g. money
o Very useful in animal training (e.g. clicker training – the clicker serves as a signal)
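The four contingencies above form a 2×2 of outcome valence × contingency. A minimal lookup sketch in Python – the cell labels follow the notes, but the function and its names are my own illustration:

```python
def predicted_response_change(outcome, contingency):
    """Look up the predicted effect on responding in the 2x2 of
    outcome valence x contingency described in the notes.
    positive contingency = response produces the outcome;
    negative contingency = response prevents/omits it.
    Illustrative sketch only."""
    table = {
        ("appetitive", "positive"): ("positive reinforcement (reward)", "response up"),
        ("appetitive", "negative"): ("omission", "response down"),
        ("aversive", "positive"): ("punishment", "response down"),
        ("aversive", "negative"): ("negative reinforcement (avoidance)", "response up"),
    }
    return table[(outcome, contingency)]

print(predicted_response_change("aversive", "negative"))
# → ('negative reinforcement (avoidance)', 'response up')
```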
FACTORS AFFECTING INSTRUMENTAL CONDITIONING
1. Temporal contiguity
o A small (or no) interval between response and reinforcer produces stronger learning in (almost) all cases of instrumental and classical conditioning
o Why does a long delay weaken learning? Memory decay? Interference from other events?
o Exception: conditioned taste aversion (e.g. alcohol) – strong temporal contiguity is not necessary
2. Contingency
o When rewards are given all the time regardless of behavior, is Rft really contingent upon the performance of response R?
o Does the response INCREASE the probability of Rft? i.e. is P(Rft|R) > P(Rft|no R)?
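The contingency question P(Rft|R) > P(Rft|no R) can be estimated directly from trial records. A minimal sketch, assuming a made-up session log of (responded, reinforced) pairs:

```python
def contingency(trials):
    """Estimate P(Rft|R) and P(Rft|no R) from a list of
    (responded, reinforced) trial records. Illustrative sketch;
    the session data below are made up."""
    r = [rft for resp, rft in trials if resp]
    no_r = [rft for resp, rft in trials if not resp]
    p_rft_r = sum(r) / len(r) if r else 0.0
    p_rft_no_r = sum(no_r) / len(no_r) if no_r else 0.0
    return p_rft_r, p_rft_no_r

# Hypothetical session: the rat is reinforced mostly when it presses
trials = [(True, True), (True, True), (True, False),
          (False, False), (False, False), (False, True)]
p_r, p_no_r = contingency(trials)
print(p_r > p_no_r)  # True -> positive contingency: responding should increase
```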
SHAPING
• Problem: complex learned behaviors are unlikely to occur spontaneously
• Behaviors "evolve" through reinforcement of successive approximations of a desired response
• The term behavior "shaping" was popularized by behaviorists (especially Skinner) – e.g. in bird training
• To be effective, behaviour shaping must adhere to the basic principles of reinforcement:
o Close temporal contiguity between R and Rft
o Contingency – avoid giving spurious Rfts!
o Avoid reinforcing the wrong behaviour – this could lead to the development of "superstitious" behaviour (Skinner, 1948: pigeons placed in Skinner boxes and rewarded regardless of their behavior began to behave as though a causal relation was involved; accidental pairings had occurred)
SCHEDULES OF REINFORCEMENT
• In animal training and real life, primary rewards are rarely guaranteed 100% of the time
o Partial (intermittent) reinforcement occurs when not every response is reinforced
o Produces slower but more persistent responding
o FR produces a "post-reinforcer pause"; VR, by contrast, can generate very high and steady rates
o FI produces "scalloping", and elapsed time becomes an SD; VI produces steadier responding
• The four basic schedules (fixed/variable × ratio/interval):
o Fixed ratio (e.g. FR5): reinforcement is delivered once every 5 responses
o Variable ratio (e.g. VR5): reinforcement is delivered on average every 5 responses
o Fixed interval (e.g. FI5 sec): reinforcement is delivered on the first response after 5 seconds have elapsed since the last reinforcement
o Variable interval (e.g. VI5 sec): reinforcement is delivered on the first response after a variable time (mean = 5 seconds) has elapsed since the last reinforcement
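The four schedules can be sketched as a single decision function called once per response. This is an illustrative simulation, not standard lab software; the sampling distributions for the variable schedules are my own choice (any distribution with the stated mean would do):

```python
import random

def make_schedule(kind, n):
    """Return reinforce(t) -> bool, called once per response at time t
    (seconds). Illustrative sketch; names and VR/VI distributions are
    my own choices. kind: "FR"/"VR" (ratio) or "FI"/"VI" (interval);
    n is the ratio requirement or interval (mean for VR/VI)."""
    def draw():
        # resample a requirement whose mean is n
        return random.randint(1, 2 * n - 1) if kind == "VR" else random.uniform(0.0, 2 * n)
    state = {"count": 0, "last_rft": 0.0,
             "target": n if kind in ("FR", "FI") else draw()}
    def reinforce(t):
        if kind in ("FR", "VR"):               # ratio: count responses
            state["count"] += 1
            if state["count"] >= state["target"]:
                state["count"] = 0
                if kind == "VR":
                    state["target"] = draw()
                return True
            return False
        # interval: first response after the interval has elapsed
        if t - state["last_rft"] >= state["target"]:
            state["last_rft"] = t
            if kind == "VI":
                state["target"] = draw()
            return True
        return False
    return reinforce

fr5 = make_schedule("FR", 5)
print([fr5(t) for t in range(1, 11)])  # every 5th response is reinforced
```

A VR schedule built this way reinforces after an unpredictable number of responses, which is what sustains the high, steady rates noted above.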
EXTINCTION
• Availability of reinforcement is removed – zero contingency between response and reinforcer
• The established response tends to decline; observed in instrumental and classical conditioning
• It is the basis for therapies that clinical psychologists use to eliminate maladaptive or unwanted behavior
• Negative punishment, or omission training, works on a similar basis (the omission of an expected reward; a negative contingency between response and Rft)
• Partial reinforcement extinction effect (PREE)
o Responding acquired with PRF (partial reinforcement) persists when non-reinforced to a greater extent than responding acquired with CRF (continuous reinforcement)
o This is because CRF is very distinguishable from extinction, whereas PRF is less so
o Gambling study: the less reliably a response is reinforced, the more persistent it is during extinction, i.e. it takes longer to be extinguished
• Extinction does not necessarily destroy the original learning – a lot of the learning remains, shown by:
o Spontaneous recovery: if time has passed after extinction has occurred, presentation of the CS can again evoke some responding (a distinction between learning and performance) – i.e. a renewal effect that occurs when the CS is tested outside of extinction's temporal context
o Renewal effect: after extinction, if the CS is tested in a different context, responding can also return, demonstrating that extinction performance can be quite specific to the context in which it is learned
o Reinstatement: if the US (shock) is presented on its own after extinction, it can cause responding to the CS (tone) to return – the new US presentations condition the context, which then triggers fear of the CS
o Rapid re-acquisition: the conditioned response can return very quickly when CS–US pairings are repeated after extinction
o The renewal effect and spontaneous recovery may provide reasons for relapse after therapy
o These effects suggest that extinction inhibits rather than erases the learned behavior
[Figure: persistence at gambling – number of plays before quitting]
THE DISCRIMINATIVE STIMULUS IN BEHAVIOR
• The classic operant response in the lab is lever-pressing in rats reinforced by food. However, things can be arranged so that lever-pressing only produces food when a particular stimulus is present, e.g. a light
o The rat learns to detect the different contingencies and begins to confine lever-pressing to when the light is on; responses in the light-off condition are extinguished
o This is a simple example of discrimination learning
o The operant response is now said to be under stimulus control – its occurrence now depends on a stimulus that sets the occasion for it (applicable to real life: we act differently in different situations)
• Stimulus control is acquired through differential reinforcement; behavior becomes observably different in the presence vs. the absence of this particular stimulus
o Also, a particular stimulus feature or dimension can control behavior, and response rate varies when this feature is manipulated
• The discriminative stimulus that initiates the operant response is not thought to 'elicit' it; instead, the stimulus 'sets the occasion' for the response, or signifies that there is now a relationship between the response and the reinforcer. The operant is controlled both by the stimulus that sets the occasion for it and by the reinforcer that reinforces it
• SD (or S+) versus SΔ (or S−):
o In the presence of SD, the response is reinforced (e.g. the light-on situation)
o In the presence of SΔ (S-delta), the response is not reinforced (e.g. the light-off situation)
o Reinforcement "stamps in" a connection between SD and R
• In behavior chaining (a sequence of behaviors), the behaviors in the chain are glued together by discriminative stimuli (e.g. the tone that signifies food is present), which are present at each step
o Dual function of the SD: it reinforces the preceding response, while also setting the occasion for the next
• Too simplistic in some cases?
o Responding in the presence of SD is sensitive to the "value" of Rft
o SD and SΔ act to facilitate and inhibit the R–Rft association
• In experiments, discriminative stimuli are usually discrete events (lights, tones, etc.). But the following may also serve as SD/SΔ:
o Contexts (a context dictates the kind of behavior that is appropriate)
o Emotional/physiological states
o The passage of time (e.g. the rat learns that it will get food every second day)
o The reinforcer itself (it may act as a cue that other reinforcers are available)
• Experiments using stimulus-control methods have tested how well animals can see colors or hear ultrasounds
• Extinction as "new learning" – context plays a critical role in extinction
o Is inhibitory learning (extinction) specific to the context in which extinction occurs?
o Does context act as a discriminative stimulus?
GENERALIZATION
• If Rft is delivered in the presence of a stimulus (S+), learning tends to generalise to similar stimuli
• There tends to be a stimulus generalization gradient: responding to a new stimulus depends on its similarity to a stimulus that has already been reinforced; such gradients have been shown for a wide range of stimuli
o Gradually change the features of the stimulus → responding drops off (boot fetish example)
o The steepness of the gradient (the rate at which responding declines as the stimulus is changed) indicates how much responding actually depends on that particular stimulus dimension
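The gradient idea can be sketched as a similarity function: responding declines with distance from the reinforced S+ along one dimension, and a steepness parameter plays the role described above. The Gaussian form and the numbers are my own illustrative choices, not from the notes:

```python
import math

def response_strength(test, s_plus, steepness):
    """Generalization gradient: responding to a test stimulus declines
    with its distance from the reinforced stimulus S+ along one
    dimension (e.g. tone frequency). Gaussian form chosen for
    illustration; larger steepness = tighter stimulus control."""
    return math.exp(-steepness * (test - s_plus) ** 2)

s_plus = 1000.0  # Hz, the trained tone (hypothetical)
for steepness, label in [(1e-5, "shallow (weak control)"),
                         (1e-4, "steep (strong control)")]:
    r = response_strength(1200.0, s_plus, steepness)
    print(f"{label}: responding at 1200 Hz = {r:.2f}")
```

A steep gradient means responding is almost gone 200 Hz away from S+, i.e. behavior depends heavily on that dimension; a shallow gradient means broad generalization.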
DISCRIMINATION
• A failure of generalization is discrimination; a failure of discrimination is generalization
• Discriminating between stimuli means behaving differently towards them
• Discrimination applies in cases where:
o The stimuli are easy to tell apart (obviously different along some dimension, e.g. colour), OR
o The stimuli are confusable (the difference between them is not obvious)
• The organism cannot discriminate → a sensory limitation
• The organism doesn't discriminate → a lack of stimulus control
• Better stimulus control can be created through discrimination learning
o Finer discriminations can be learned through differential reinforcement
o The content of what is learned is critical for generalization and discrimination in similar situations
• Example: an explanation of the PREE in these terms
o CRF → extinction: CRF and extinction serve as distinguishable "markers", so the new (extinction) learning is facilitated by the clearly different context (more effective discriminative stimuli)
o VR3 → extinction: partial reinforcement is much less distinguishable from extinction, so extinction learning is slower and responding persists
The organism doesn’t discriminate ! lack of stimulus control Can create better stimulus control through discrimination learning o Finer discriminations can be learned through differential reinforcement o The content of what is learned is critical for generalization and discrimination in similar situations