Conditioning
Bear with me. Bare with me. Beer with me.
Stay focused.
Learning
• A. Two-process learning (Rescorla-Solomon 67)
• fast: fear and arousal
• slow: adaptive behavioral responses
• B. Three-process learning
• A (both processes above)
• declarative memory (as opposed to procedural)
• C. More-than-three-process learning
• A (both processes above)
• declarative memory
• episodic memory
• semantic memory
• more stuff
Typically the fast process (fear and arousal) subsides as the slow process (adaptive behavioral responses) is learned.
Conditional and Unconditional
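[Diagram: the US elicits the UR through an innate link. During training a stimulus S (the CS) is paired with the US, the "reinforcer", and comes to elicit the UR/CR.]
[Diagram: delay procedure (CS still on when the US arrives; easier) versus trace procedure (CS ends before the US arrives; harder).]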
Classical and Operant
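Classical: delivery of the reinforcer is contingent on the occurrence of a stimulus (the CS).
[Diagram: CS, US, and the innate link from US to UR/CR.]
Operant: delivery of the reinforcer is contingent on the occurrence of a designated response.
[Diagram: S1, Action, US, and the innate link from US to UR/CR.]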
Classical conditioning (CC) predicts that the animal will produce the UR/CR while performing the desired action, but it does not explain why the animal learns to select the action.
Selectionist View
• Selectionist principles
– Behaviors are varied, selected, and retained in a process similar to the natural selection of species
– Only overt behaviors can be reinforced by the environment
– The principle of selection is based on behavioral discrepancy
Behavioral Discrepancy
Behavioral discrepancy is the change in an ongoing
behavior produced by the eliciting stimulus
Example:
Presentation of food produces salivation which
would not otherwise occur
Unified Selection Principle
Whenever a behavioral discrepancy occurs, an environment-behavior relation is selected that consists (other things being equal) of all those stimuli occurring immediately before the discrepancy and all those responses occurring immediately before and at the same time as the elicited response.
Under this principle there is no difference between
Classical and Operant conditioning as far as learning goes.
Conditioning Phenomena
[Table: training sets (Set I, Set II) and test outcomes for the Pavlovian, overshadowing, inhibitory, blocking, upwards unblocking, and downwards unblocking paradigms.]
It goes on...
Conditioning/Selection Models
• Trial-by-trial
– Probabilistic (Dayan-Long, Cheng-Novick)
– … and not (Rescorla-Wagner)
– NN (Donahoe)
• Moment-by-moment
– Sutton-Barto
– Mignault
– Schmajuk (NN)
• About a bazillion others...
S1 and S2 processing should happen at roughly the same time, so almost all models suggest a multiplicative relationship between the levels of S1 and S2:
$\Delta V \propto \mathrm{US} \times \mathrm{CS}$ or $\Delta V \propto (\text{reinforcement}) \times (\text{eligibility})$
Rescorla-Wagner model
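• Trial based
• Based on the net prediction of the reward
• Learning only happens when a prediction discrepancy is detected
• Falls straight out of maximum-likelihood (ML) estimation of association strength
• Is essentially the delta rule
$\Delta V = \alpha \left( \lambda - \sum_S V_S \right) S$
where $\Delta V$ is the association strength update, $\lambda$ is the reward, $\sum_S V_S$ is the net prediction, $S$ is the stimulus eligibility, and $\alpha$ is a learning rate.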
Problems:
• Does not deal well with overshadowing and downwards unblocking...
• Does not depend on the temporal relations between stimuli
• Does not explain re-acquisition rate
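The trial-based update above can be sketched in a few lines of Python. This is a minimal illustration and not code from the talk: the function name `rescorla_wagner`, the learning rate of 0.2, and the trial counts are all assumptions chosen for the example. It runs a blocking schedule (first stimulus alone, then the compound) next to a control schedule that trains the compound from the start.

```python
def rescorla_wagner(trials, alpha=0.2, n_stimuli=2):
    """Apply the Rescorla-Wagner update over a list of (stimuli, reward) trials.

    stimuli is a tuple of 0/1 presence flags (S in the equation),
    reward is lambda. Returns the final association strengths V.
    alpha and the trial format are illustrative assumptions.
    """
    V = [0.0] * n_stimuli
    for stimuli, reward in trials:
        # Net prediction: summed strength of the stimuli that are present.
        net_prediction = sum(v * s for v, s in zip(V, stimuli))
        error = reward - net_prediction
        # Only stimuli that are present (s == 1) are updated.
        for i, s in enumerate(stimuli):
            V[i] += alpha * error * s
    return V

# Blocking: first stimulus alone is rewarded, then the compound is rewarded.
blocking = [((1, 0), 1.0)] * 100 + [((1, 1), 1.0)] * 100
V_first, V_second = rescorla_wagner(blocking)

# Control: the compound is rewarded from the very first trial.
control = [((1, 1), 1.0)] * 100
C_first, C_second = rescorla_wagner(control)

print(f"blocking: V_first={V_first:.3f}, V_second={V_second:.3f}")
print(f"control:  V_first={C_first:.3f}, V_second={C_second:.3f}")
```

In the blocking schedule the second stimulus gains almost no strength, because the first stimulus already predicts the reward and the error term is close to zero. In the control schedule the two stimuli split the prediction between them.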
Sutton-Barto model
• Real-time model
• Combines $\dot{Y}$ theory with the RW model
• Time-derivative model
• Presumes that all stimuli produce +V at the onset and -V at the offset
• Deals with secondary conditioning
$\Delta V = \alpha \, \dot{Y} \, S$
$\dot{Y} = Y(t) - Y(t - \Delta t)$
where $Y$ is the sum of all the associative strengths at a given time.
Problems:
• Does not model Inter-Stimulus Intervals where the efficiency of the
training should decrease with increased ISI
• Does not deal with reacquisition
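A time-derivative update of this kind can be sketched for a single CS as follows. This is a toy illustration under several assumptions that are not on the slide: Y is taken to include the US itself, the eligibility S is a decaying trace of the CS that lags one step behind it, and the names `sutton_barto_trial`, `rate`, and `trace_decay` as well as all numeric values are hypothetical.

```python
def sutton_barto_trial(V, cs, us, rate=0.1, trace_decay=0.5):
    """One trial of a single-CS time-derivative (Y-dot) update.

    cs and us are equal-length lists of stimulus levels per time step.
    Returns the association strength V after the trial.
    Assumption: Y(t) = V * cs(t) + us(t), and eligibility is a lagging trace.
    """
    y_prev = 0.0   # Y(t - dt)
    trace = 0.0    # eligibility trace of the CS
    cs_prev = 0.0  # CS level on the previous step
    for x, r in zip(cs, us):
        trace += trace_decay * (cs_prev - trace)  # trace reflects past CS only
        y = V * x + r                             # Y(t)
        y_dot = y - y_prev                        # Y(t) - Y(t - dt)
        V += rate * y_dot * trace                 # update scaled by eligibility
        y_prev, cs_prev = y, x
    return V

cs = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]  # CS on at steps 2-3
us = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # US at step 4, right after CS offset

V = 0.0
for _ in range(200):
    V = sutton_barto_trial(V, cs, us)
print(f"V after training: {V:.4f}")
```

With these particular settings V settles near 0.5 instead of 1: the rise in Y at US onset strengthens the association, but the fall in Y at US offset still overlaps a decaying trace and takes part of it back.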
Temporal Difference model
• Is related to the SB model (and the RW model)
• Models reward in small discrete intervals
• Models second order conditioning
• Based on the assumption that the goal of learning is to accurately predict the future US levels
$\Delta V = \alpha \left( \lambda_{t+1} + \gamma V_{t+1} - V_t \right) S$
where $\lambda_{t+1} + \gamma V_{t+1}$ is the discounted prediction of the future reward (V for predicted values of S).
Problems:
• No model of attention, salience, configuration etc...
• No indirect associations modeled (sensory preconditioning)
• Problems with downwards unblocking
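The temporal-difference update can be illustrated with a small sketch. It assumes one weight per time step after CS onset, so that exactly one "stimulus" is eligible at each step; that representation, the function name `td_trial`, and the values of `alpha` and `gamma` are assumptions made for the example and are not taken from the slides.

```python
def td_trial(w, reward=1.0, alpha=0.1, gamma=0.9):
    """One trial of TD learning over the interval from CS onset to the US.

    w[k] is the predicted value k steps after CS onset. The reward arrives
    one step after the last entry of w. Updates w in place and returns it.
    """
    n = len(w)
    for k in range(n):
        is_last = (k == n - 1)
        r_next = reward if is_last else 0.0    # lambda_{t+1}
        v_next = 0.0 if is_last else w[k + 1]  # V_{t+1}
        delta = r_next + gamma * v_next - w[k] # prediction error
        w[k] += alpha * delta
    return w

w = [0.0] * 5  # five time steps between CS onset and the US
for _ in range(500):
    td_trial(w)
print([round(v, 3) for v in w])
```

After training, the prediction is highest just before the US and falls off by a factor of `gamma` for each step further back toward CS onset, which is what "discounted prediction of the future reward" means here.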
Statistical models
$P(r \mid s_1, s_2) = \,?$
$P(r \mid s_1, s_2) = N(w_1 s_1 + w_2 s_2, \sigma^2)$
This results in exactly the RW model with ML.
$P(r \mid s_1, s_2) = \pi_1 N(w_1 s_1, \sigma^2) + \pi_2 N(w_2 s_2, \sigma^2) + \bar{\pi} N(\bar{w}, \sigma^2)$
This is EM. Similar to comparator models of conditioning (whatever they are). Has problems with inhibitory conditioning.
$P(r \mid s_1, s_2) = N(\pi_1 w_1 s_1 + \pi_2 w_2 s_2, \sigma^2)$
Dayan & Long's model. Models the conditioning phenomena.
Does not consider associability (eligibility in SB) and attention.
No distinction between preparatory and consummatory conditioning.
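The claim that the additive Gaussian model gives the RW rule under ML can be checked with a short derivation. The step size $\epsilon$ is an assumption introduced here for the gradient step; the other symbols follow the model above.

```latex
\begin{align*}
\log P(r \mid s_1, s_2)
  &= -\frac{1}{2\sigma^2}\Bigl(r - \sum_j w_j s_j\Bigr)^2 + \text{const} \\
\frac{\partial}{\partial w_i} \log P(r \mid s_1, s_2)
  &= \frac{1}{\sigma^2}\Bigl(r - \sum_j w_j s_j\Bigr) s_i \\
\Delta w_i
  &= \frac{\epsilon}{\sigma^2}\Bigl(r - \sum_j w_j s_j\Bigr) s_i
\end{align*}
```

Reading $w_i$ as the association strength, $r$ as the reward, and $\epsilon / \sigma^2$ as the learning rate, a gradient step on the log-likelihood has the same form as the RW update: prediction error times stimulus eligibility.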
NN models
Warning: a personal opinion!
• Everything is a neural net - things happen naturally
• The weights propagate and this forms the dynamics
of the Stimulus-Stimulus interactions
[Diagram: S1 and S2 feed into a box labeled "Stuff happens here", which produces the Response.]
Whatever….
Bruce’s favorite model
• Model time and rate of CS and reinforcement
• Timescale invariant
• Non-associative framework
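$$
\begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \end{pmatrix}
=
\begin{pmatrix}
1 & \frac{t_{12}}{t_1} & \cdots & \frac{t_{1n}}{t_1} \\
\frac{t_{21}}{t_2} & 1 & \cdots & \frac{t_{2n}}{t_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{t_{n1}}{t_n} & \frac{t_{n2}}{t_n} & \cdots & 1
\end{pmatrix}^{-1}
\begin{pmatrix} \frac{N_1}{t_1} \\ \frac{N_2}{t_2} \\ \vdots \\ \frac{N_n}{t_n} \end{pmatrix}
$$
where $\lambda_i$ are the rates of reinforcement, $t_{1n}$ is the cumulative duration of the conjunction of S1 and Sn, $N_n$ is the cumulative number of reinforcements in the presence of Sn, and $t_n$ is the cumulative duration of Sn.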
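The rate computation above amounts to solving a small linear system, which can be sketched in plain Python. The helper names `solve` and `true_rates` and the session numbers are made up for illustration: a background that is present for the whole 1000 s session, and a CS that is on for 200 s of it, with all 20 reinforcements delivered while the CS is on.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def true_rates(durations, joint, counts):
    """Rates credited to each stimulus.

    durations[i] = t_i, joint[i][j] = t_ij, counts[i] = N_i.
    """
    n = len(durations)
    T = [[1.0 if i == j else joint[i][j] / durations[i] for j in range(n)]
         for i in range(n)]
    raw = [counts[i] / durations[i] for i in range(n)]  # N_i / t_i
    return solve(T, raw)

# Hypothetical session: index 0 is the background, index 1 is the CS.
durations = [1000.0, 200.0]
joint = [[1000.0, 200.0], [200.0, 200.0]]
counts = [20, 20]
rate_background, rate_cs = true_rates(durations, joint, counts)
print(rate_background, rate_cs)
```

In this example the raw rate in the background is 0.02 per second, but once the overlap with the CS is accounted for the whole rate of 0.1 per second is credited to the CS and essentially none to the background.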
References
• Dayan, P., and Abbott, L. F. (2000?). Theoretical Neuroscience. In print??? (http://www.gatsby.ucl.ac.uk/~dayan/book/)
• Dayan, P., and Long, T. (1998?). Statistical Models of Conditioning. NIPS 10.
• Gallistel, C. R., and Gibbon, J. (2000). Time, Rate and Conditioning. Psychological Review, in print.
• Pavlov, I. P. (1927). Conditioned Reflexes. Oxford: Oxford University Press.
• Mignault, A., and Marley, A. A. J. (1997). A Real-Time Neuronal Model of Classical Conditioning. Adaptive Behavior 6(1): 3-61.
• Rescorla, R. A. (1988). Behavioral studies of Pavlovian conditioning. Annual Review of Neuroscience 11: 329-352.
• Rescorla, R. A., and Solomon, R. L. (1967). Two-process learning theory: Relationships between Pavlovian conditioning and instrumental learning. Psychological Review 74: 151-182.
• Rescorla, R. A., and Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy, Eds., Classical Conditioning, vol. 2, Current Research and Theory. New York: Appleton-Century-Crofts, pp. 64-99.
• Roitblat, H. L., and Meyer, J.-A. Comparative Approaches to Cognitive Science. MIT Press.
• Schmajuk, N. A. (1997). Animal Learning and Cognition: A Neural Network Approach.
• Skinner, B. F. (1938). The Behavior of Organisms. New York: Appleton-Century-Crofts.
• Sutton, R. S., and Barto, A. G. (1990). Learning and Computational Neuroscience: Foundations of Adaptive Networks. MIT Press.
• Thorndike, E. L. (1911). Animal Intelligence: Experimental Studies. New York: Macmillan.
• Wilson, R. A., and Keil, F. (1999). The MIT Encyclopedia of the Cognitive Sciences. MIT Press. MITECS (http://cognet.mit.edu/MITECS)