PSYC2011 Exam Notes
Instrumental conditioning




Also called “operant conditioning”
“Response” learning
- Stimulus -> Response -> Outcome
- Learning about the consequences of your actions, behaviour change
Distinct from classical (Pavlovian) conditioning
- Conditioned Stimulus (CS) -> Unconditioned Stimulus (US); the US is delivered regardless of what the subject does
- In instrumental conditioning, by contrast, the response changes the outcome
The subject’s behaviour determines the presentation of outcomes only in instrumental conditioning
Thorndike’s Law of Effect





If an animal behaves in a certain way and receives some form of satisfaction, it is more likely to behave in that way again in the same situation
Behaviours which are closely followed by punishment are less likely to occur in the same situation
Cat in the puzzle box
- No insight or point where the cat realised that the lever needs to be pushed to escape
- Trial and error led to success, the amount of time for trials diminished over time
- Learning is a continuous process, it is incremental
Response -> Satisfying outcome -> Increase response
Response -> Frustrating outcome -> Decrease response
Reinforcement








Relation between some event (a reinforcer) and a preceding response increases the strength of the
response
Reinforcers are defined by their observed effect on behaviour and not by their subjective qualities
Positive contingency: response results in outcome
Negative contingency: response prevents outcome
Positive reinforcement (reward): good outcome increases response
Negative reinforcement (avoidance): removal of a bad outcome increases response
Punishment: bad outcome decreases response
Omission: removal of a good outcome decreases response
Secondary reinforcement



Previously neutral stimuli may acquire reinforcing properties
- Reinforcement can transfer to other stimuli
- e.g. lever retracting = food coming, sound of food dispenser, signal marking reinforcement (lights,
etc.), other stimuli present in chamber (context)
- These things are loosely associated with the delivery of reinforcement
Most rewarding stimuli in our lives are secondary reinforcers
Very useful in animal training (e.g. clicker training)
- Delivers an immediate reinforcer (the click) the very second the animal performs the task, signalling that food is coming
Factors affecting instrumental conditioning


Temporal contiguity: the amount of time between the response and the delivery of the reinforcer
- Strong temporal contiguity (reinforcer delivered close in time to the response) = more effective conditioning
- Memory decay over time? By the time the reinforcer is delivered, the memory of the response is weak; this leads to weaker conditioning
- Interference from other events? The subject has done other things in the meantime, so the reinforcer could be strengthening one of those actions instead of the desired action
- A small/no interval produces stronger learning in (almost) all cases of instrumental and classical conditioning (exception: conditioned taste aversion [alcohol, chemotherapy drugs, etc.])
Contingency: describes the statistical relation between the two events (formalised below)
- Does performing the action lead to reinforcement?
- Strong = the reward follows each response (response/reward, response/reward)
- Weak = rewards also arrive without a response (response/reward/reward, response/reward/reward/reward), which dilutes the contingency
- The response needs to be a necessary requirement for getting the reward to increase the effectiveness of conditioning
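One common way to formalise this contingency (an addition here, not stated in the notes) is the delta-P statistic, which compares the probability of the reinforcer given a response with its probability in the absence of a response:

ΔP = P(reinforcer | response) − P(reinforcer | no response)

A clearly positive ΔP corresponds to a positive contingency (the reward depends on responding), ΔP near zero to no contingency (e.g. "free" rewards), and a negative ΔP to a negative (omission) contingency.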
Shaping





Problem: complex behaviours are unlikely to occur spontaneously
Behaviour “evolves” through reinforcement of successive approximations of a desired response
The term behaviour “shaping” popularised by behaviourists (especially Skinner)
Can sometimes occur inadvertently (e.g. mother rewarding child’s tantrum by comforting them)
To be effective, behaviour shaping must adhere to the basic principles of reinforcement
- Close temporal contiguity between response and reinforcement
- Avoid giving spurious reinforcement, this degrades contingency
- Avoid reinforcing the wrong behaviour, development of “superstitious” behaviour
Response chaining


Many complex behaviours can be thought of as a series of simple responses
Response “chaining” involves shaping a sequence of responses
- e.g. dancing, driving a manual car
- Sight of lever (stimulus) -> approach lever (response) -> feel of lever (s) -> press lever (r) -> sound of lever (s) -> approach magazine (r) -> food (s) -> leave magazine (r)
The most effective way of shaping a chain is to start with the last response in the chain and work backwards to the first response
Schedules of reinforcement


In animal training and real life, primary rewards are rarely guaranteed 100% of the time
Partial reinforcement or secondary reinforcement
- Often desirable for practical reasons
- Produces slower but more persistent responding
Fixed ratio (e.g. FR5: reinforcement is delivered once every 5 responses)
Variable ratio (e.g. VR5: reinforcement is delivered on average once every 5 responses)
Fixed interval (e.g. FI5: reinforcement is delivered on the first response after 5 seconds have elapsed since the last reinforcement)
Variable interval (e.g. VI5: reinforcement is delivered on the first response after a variable time [mean = 5 seconds] has elapsed since the last reinforcement; see the simulation sketch below)
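As a rough illustration only (not part of the course notes), the sketch below simulates when each of the four schedules would deliver reinforcement, assuming a subject that responds once per simulated second; the numbers and names are made up for the example.

```python
# Toy simulation of FR5, VR5, FI5 and VI5 schedules (illustrative assumptions only).
import random

def simulate(schedule, seconds=60, mean=5):
    """Return the times (in seconds) at which reinforcement is delivered."""
    reinforced = []
    responses = 0        # responses since the last reinforcement (ratio schedules)
    last_time = 0        # time of the last reinforcement (interval schedules)
    ratio_req = random.randint(1, 2 * mean - 1)   # variable ratio requirement, mean ~5 responses
    interval_req = random.uniform(0, 2 * mean)    # variable interval requirement, mean ~5 s

    for t in range(1, seconds + 1):               # one response every second
        responses += 1
        if schedule == "FR":      # reinforcement once every 5 responses
            hit = responses >= mean
        elif schedule == "VR":    # on average once every 5 responses
            hit = responses >= ratio_req
        elif schedule == "FI":    # first response after 5 s since last reinforcement
            hit = (t - last_time) >= mean
        else:                     # "VI": first response after ~5 s on average
            hit = (t - last_time) >= interval_req
        if hit:
            reinforced.append(t)
            responses, last_time = 0, t
            ratio_req = random.randint(1, 2 * mean - 1)
            interval_req = random.uniform(0, 2 * mean)
    return reinforced

for s in ("FR", "VR", "FI", "VI"):
    print(f"{s}5:", simulate(s))
```

Ratio schedules depend only on the count of responses since the last reinforcement, while interval schedules depend on the time elapsed, which is one reason the schedules produce different patterns of responding.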
Extinction




Availability of reinforcement is removed
- Zero contingency between response and reinforcer
Established response tends to decline
Observed in instrumental and classical conditioning
Omission training works on a similar basis
- The omission of an expected reward
- Negative contingency between response and reinforcement
- “Negative punishment”
Partial reinforcement extinction effect

Responding acquired under partial reinforcement (PRF) persists during extinction (when non-reinforced) to a greater extent than responding acquired under continuous reinforcement (CRF)
Partial reinforcement produces more persistent responding, although the rate of responding is relatively slow at the beginning
The less reliably a response is reinforced, the more persistent it is during extinction
Discriminative stimuli

SD (or S+) vs. SΔ (or S-)
- In the presence of SD, the response is reinforced
- In the presence of SΔ, the response is not reinforced

Reinforcement “stamps in” a connection between SD and response – Thorndike
- SD -> response -> reinforcement
- Habit formation: the next time the SD is encountered, the response is elicited without deliberation

Too simplistic in some cases?
- Responding in presence of SD sensitive to “value” of reinforcement
- SD and SΔ act to facilitate and inhibit the response-reinforcement association


In experiments, discriminative stimuli are usually discrete events (lights, tones, etc.)

The discrete trial is made up of:
- The SD (instruction or stimulus given)
- A response or prompt
- Reinforcement or correction
But the following might also serve as SD/SΔ:
- Contexts
- Emotional/physiological states
- The passage of time
- The reinforcer itself
Example: explanation of PREE?

CRF is very distinguishable from extinction whereas PRF is less so:
- CRF -> extinction: response/reward, response/reward, response/nothing, response/nothing, etc.
- PRF -> extinction: response/reward, response/nothing, response/nothing, response/reward,
response/nothing, etc.
- Much less noticeable shift in context


CRF vs. extinction serve as distinguishable “markers”
New learning facilitated by the different contexts (more effective discriminative stimuli)
Is extinction unlearning?




Evidence for the original association re-emerges under some circumstances:
- Spontaneous recovery: after an extinction session and the passage of time, responding starts again as if extinction had never occurred
- Reinstatement: previously extinguished association returns after the unsignalled presentation of an
unconditioned stimulus
- Rapid reacquisition: acquiring response faster upon retraining, original learning still present?
- Renewal: subtle change in context can renew the original response, extinction is context-specific?
All of these effects point toward the context serving as a cue – SD
Context plays a critical role in extinction
Extinction as new learning:
- Inhibitory learning specific to the context in which extinction occurs?
- Context acts as a discriminative stimulus?
Stimulus control

Discriminative stimuli “control” behaviour
- Behaviour is observably different in the presence vs. the absence of a particular stimulus
- Stimulus control is acquired through differential reinforcement

A particular stimulus feature or stimulus dimension can control behaviour
- Variations in response rate when the feature is manipulated (e.g. colour, size, orientation)
Generalisation


If reinforcement is delivered in the presence of a stimulus (SD/S+), learning tends to generalise to
similar stimuli
Generalisation gradient (across a stimulus continuum):
- The closer to the original stimulus, the more generalisation occurs
- The less similar a stimulus is to what has been presented in the training, the less response you’ll see
Discrimination


Discriminating between stimuli means behaving differently towards them
Discrimination applies in cases where:
- The stimuli are easy to tell apart (obviously different along some dimension, e.g. colour)
- The stimuli are confusable (the difference between them is not obvious)
Discrimination learning



Generalisation as failure to discriminate?
- Organism cannot discriminate (sensory limitation)
- Organism doesn’t discriminate (lack of stimulus control)
Finer discriminations can be learned through reinforcement
The content of what is learned is critical for generalisation and discrimination in similar situations
Transposition: relational learning?

e.g. Kohler (1918)
- Trained chickens to peck at a darker stimulus for reward
- Changed the colours to see which stimulus they would peck at
- Saw a preference for the darker stimulus when colours had been changed
- Evidence of learning a relationship between two stimuli?
Spence’s theory





Excitatory conditioning to SD/S+, generalises to similar values
Inhibitory conditioning to SΔ/S-, generalises to similar values
Spence (1936): “gradient summation” theory of discrimination learning
Feature based conditioning can explain transposition
Predicts that “relational” choices will have clear physical limitations
Peak shift


Displacement of the “peak” of the gradient away from S+ in the direction opposite to S-
Spence’s theory provides an explanation (a toy sketch follows below)
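As a toy illustration of the gradient-summation idea (my own sketch with made-up numbers, not taken from the notes), the code below subtracts an inhibitory gradient centred on S- from an excitatory gradient centred on S+ and finds the peak of the net gradient; the peak lands on the far side of S+, away from S-, which is the peak-shift pattern.

```python
# Toy sketch of Spence's gradient-summation account of peak shift
# (all numbers are illustrative assumptions).
import math

def gaussian(x, centre, width=15.0, height=1.0):
    """A simple generalisation gradient around `centre`."""
    return height * math.exp(-((x - centre) ** 2) / (2 * width ** 2))

s_plus, s_minus = 550, 570          # e.g. two wavelengths (nm) used as S+ and S-
stimuli = range(500, 601)

# net response tendency = excitation around S+ minus inhibition around S-
net = {x: gaussian(x, s_plus) - 0.6 * gaussian(x, s_minus) for x in stimuli}
peak = max(net, key=net.get)

print("S+ =", s_plus, "S- =", s_minus, "peak of net gradient =", peak)
# The peak falls below 550, i.e. displaced away from S-: peak shift.
```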
Discrimination and categorisation

Animals can learn to discriminate between complex stimuli, even on seemingly “conceptual” grounds
- e.g. categorisation of complex scenes by pigeons
- Pigeons conditioned using large set of stimuli
- Often over diverse physical features (e.g. trees change with the seasons)
- Perform above chance on new category members
- Indicative of the formation of a prototype (a representation of the typical category member)



Features common to one category are more strongly reinforced
Features common to both categories are not as strongly reinforced
The formation of a concept?
- The most common features (e.g. “leg” shapes) are most strongly reinforced and become the best discriminative stimuli
What looks like the learning of a prototype or category might be learning about the features that category members share in common
Motivation



Conditioned behaviour:
- Variable but
- Persistent
Deprivation and satiation:
- Affect activity
- Affect preferences
What is the role of motivation in:
- Instrumental conditioning?
- Performing a conditioned response?
Motivation and performance



Internal states can affect performance of previously learned responses
e.g. Frustration: a motivational response to the omission of an expected reward
Frustration can produce a paradoxical reward effect
- Responding seemingly strengthened by the omission of a reward
- This is temporary



Frustration in extinction?
Omission of reward generates frustration, driving a brief spurt of activity (spontaneous recovery)?
Explains the PREE:
- Partial reinforcement = reinforcement in the presence of frustration
- Responding more resilient to frustration than in CRF
The role of motivation in learning




Thorndike’s Law of Effect
Motivational properties of the reinforcer are critical for learning
Satisfaction results in stimulus-response learning
No learning without the reinforcing outcome
Latent learning


Tolman:
- Maze learning with rats
- Rats that received food at the end of the maze learned better, making fewer errors in the maze
- After the groups were swapped and food was given to the rats that had never received it before, their errors dropped dramatically, whereas the group whose food was removed showed a drastic increase in errors
- Without food, there is no strong motivation to navigate the maze without making errors
Learning occurs without reinforcement
- Learning without behaviour (in the absence of reinforcement) (latent learning)
- Reinforcement provides impetus to perform
Circularity in the Law of Effect

Skinner:
- What is reinforcement? Increase in response when paired with a reinforcer
- What is a reinforcer? Stimulus/event that causes reinforcement
- Explanatory value = 0
Better definitions of reinforcement

Hull:
- Biological needs (e.g. for food, water, sleep, sex) motivate behaviour
- “Drives”
- Behaviour organised to satisfy needs (reduce drives):
- Behaviour = habit x drive (in other words, learning x motivation)
- Reinforcement = drive reduction
- Reinforcer = a stimulus that reduces a drive

Premack (1959):
- Reinforcement involves behaviour of its own (e.g. consumption)
- Reinforcement = increasing access to preferred behaviours
- Providing the opportunity to perform a preferred behaviour (e.g. eating)

The Premack Principle: (given sufficient freedom) what behaviour is an individual most likely to
engage in?
- High probability behaviour (more preferred)
- Low probability behaviour (less preferred)
- Relative behavioural property
- Reinforcement depends on current preference of the individual (reinforcement is dynamic)
- According to this principle, some behaviour that happens reliably (or without interference by a researcher, e.g. a child watching TV) can be used as a reinforcer for a behaviour that occurs less reliably (e.g. a child doing the dishes)
Instrumental conditioning: what is learned?


Stimulus-response theory (e.g. Thorndike, Hull)
- Motivating outcome reinforces the stimulus-response association
- Insensitive to changes in motivation for the outcome
- Habitual
- Strong links between habitual behaviour and automaticity
- Habitual responses are not sensitive to motivational changes that are specific to the outcome
- But they are sensitive to the general motivational state of the organism
- A stimulus that elicits a habitual response primes us to respond in a certain way
- There may be subtle biases in conscious decision and action that can be described as being habitual
But discriminative stimuli influence motivational states

e.g. Cigarette craving in smokers (Dar et al., 2010)
- Craving ratings go up toward the end of flights; knowing that they will be allowed to smoke soon increases craving
- Ratings are lower at the beginning of flights, when the reward is not available

Two-process theory (stimulus-outcome learning)
- As stimulus is associated with outcome, it elicits emotional state
- Sensitive to “central” emotional states elicited by stimulus
- Excitement or fear leads to the type of response given
- Goal-directed action/behaviour
Outcome devaluation


A (negative) change in the motivational significance of the outcome (the reinforcer)
Achieved by pairing the outcome with an aversive event (e.g. poisoning) or through satiation (e.g. free feeding, long exposure)
- Conditioned taste aversion, pairing with other negative events, satiation
Used to determine whether a subject is capable of choosing an action based on its current goals
Sensitivity to the current value of the reward even though the reward is not being experienced at the time of choice
Need to retrieve from memory that you no longer like that reward after devaluation and choose the alternative
Stimulus activates knowledge of the devalued relationship (cognitive)
Apparent in some animals and most humans
Stimulus (response-outcome) learning

Stimulus acts as an occasion setter
- A stimulus that signifies that there is now a relationship between response and reinforcer
- Different to having direct associations with the response or the outcome


Sensitive to the specific appraisal of expected outcome: will outcome be satisfying?
Goal-directed
Punishment






A situation where responding decreases because of a contingency between the response and a bad
outcome
Involves the delivery of an aversive stimulus (shock, loud noise, physical action, physical irritation,
reprimand, time-out [sensory deprivation], overcorrection [performing the errored action over and
over again], monetary fines)
Omission is also a form of punishment (negative punishment): performing the act results in a lower probability of something nice happening, i.e. it prevents you from receiving a reward (negative contingency)
Punishment is contentious:
- Is punishment cruel? Is it unnecessary?
Physical punishment:
- In schools
- In public
In contrast, exaggeration of the risk of aversive outcomes in media is rife:
- Heightened perceived threat
- Avoidance learning receives little attention
Early studies


If a response is met with a frustrating outcome, the response is diminished – Thorndike, the negative
Law of Effect
- Dropped this from the law as he couldn’t get it to work in the lab
Punishment is ineffective?
- Thorndike (with humans): telling participants a response was “wrong” had little effect
- Skinner (with rats): a response met with a slap on the paw had little effect
- But: a response met with an electric shock is very effective
Factors affecting punishment




Yerkes and Dodson (1908)
- Rats need to learn to discriminate between two chambers
- One of the chambers is electrified and will give an electric shock when the rat runs through it
- Looked at the number of trials it takes before the rat learns this perfectly and doesn’t make any
errors
- The stronger the shock, the faster the rat learns
- In chambers where the discrimination is harder, a very strong shock also slows learning; there is an optimal intensity
Intensity determines effectiveness
- Yerkes-Dodson law
- Depends on difficulty
- If you are teaching someone who is making errors, punishing them too severely will make their performance worse rather than better
Stimulus control
- Reduction of response for SD but not SΔ
Path dependence
- Weaker -> stronger = ineffective (e.g. electric shock building up over time)
- If you start with a strong shock and make it weaker over time, this is sufficient to sustain change in
behaviour
- Resistance/habituation



Delay
- Shorter better than longer
- Temporal contiguity
Reinforcement schedule
- CRF better than PRF for punishment
- But what will happen in extinction? (Effect diminishes faster)
Contingency between response and punishment
Punishment and reinforcement




Punishment of a reinforced response?
- Trade-off between reward and aversive outcome
Punishment affects responding to Interval and Ratio schedules differently
- Steady rate vs. bouts of behaviour
- Punishment can increase a reinforced response
Availability of other responses
- Must be alternative ways to achieve goal
- Having alternative things to do increases efficacy of a punisher (even a very mild one)
Punishment seeking behaviour
- Brown et al. (1964): an animal model of masochistic behaviour?
- e.g. Avoidance learning
- Persistent, self-punitive
- “Vicious circle” of behaviour
Explaining effects of punishment




The (negative) Law of Effect
- Thorndike abandoned the idea
Premack principle still applicable:
- If more-preferred behaviour leads to having to perform less-preferred behaviour, more-preferred
behaviour would diminish
Conditioned emotional response
- Suppression through fear conditioning
- Instrumental or classical?
Avoidance learning
- Learning of an incompatible (competing) response
- Learning to perform in a certain way in order to avoid an aversive outcome
- Unpleasant event avoided by performing alternative response
Side effects

Punishment seems to be effective but:
- Neurotic symptoms
- Aggression (elicited by pain, frustration, modelling of behaviour)
- Fear/anxiety (response -> shock -> fear)
- Fear conditioning not specific to the undesirable response (can relate to context, punisher, the
whole situation, etc.)
Fear conditioning

Generalisation of fear

Little Albert
- J. B. Watson
- Fear of rat due to loud noise generalised to stuffed animals, coats, rabbits, etc.
Alternatives to punishment


Extinction
- Undesirable behaviour -> nothing
Differential reinforcement of other behaviours (DRO)
- Other behaviour -> reward
Effective punishment is…








Immediate
Consistent
Contingent on undesirable response
Delivered under variety of conditions
Sufficiently aversive from the outset
Not too severe
Delivered in the presence of alternative responses
And (in the case of humans) accompanied by a rational explanation
Instrumental avoidance

Public advertising
- e.g. From the RTA
- Trying to get you to change your behaviour because of the threat of something bad happening
- Bechterev: “classical” conditioning in humans?
- Brogden et al. (1938): running/activity, motivated to continue running on the basis of an absent
event (no electric shock)
Avoidance learning



Negative reinforcement
Response is encouraged because a negative outcome is avoided
Two types of response:
- Escape (the response terminates an ongoing shock), early in training
- Avoidance (response avoids future shock), later in training
- Signalled or discriminative avoidance: a signal present to let the participant know a shock is coming
Problem




No response -> shock
Response -> nothing
Avoidance involves something not happening
How can this be considered reinforcing?
- Learning about absent events?
- Shock is not the only thing that “doesn’t happen”