Classical Conditioning

Pavlov's finding (1890s-1900s): Initially, the sight of food leads to the dog salivating. If the sound of a bell consistently accompanies or precedes the presentation of food, then after a while the sound of the bell leads to salivating.

Terminology

• food: unconditioned stimulus, US (reward)
• salivating: unconditioned response, UR

The sound of the bell consistently precedes the food. Afterwards, the bell leads to salivating:

• bell: conditioned stimulus, CS (expectation of reward)
• salivating: conditioned response, CR

Variations of Conditioning 1

Extinction:
• Stimulus (bell) repeatedly shown without reward (food).
• Result: conditioned response (salivating) reduced.

Partial reinforcement:
• Stimulus only sometimes precedes reward.
• Result: conditioned response weaker than in the classical case.

Blocking (2 stimuli, S1 + S2):
• First: S1 associated with reward (classical case).
• Then: S1 and S2 shown together, followed by reward.
• Result: association between S2 and reward not learned.

Variations of Conditioning 2

Inhibitory conditioning (2 stimuli): alternate 2 types of trials:
1. S1 followed by reward.
2. S1 + S2 followed by absence of reward.
Result: S1 becomes a predictor of the presence of reward, S2 becomes a predictor of the absence of reward.
To show this, use for example the following 2 methods:
• Train the animal to predict reward based on S2. Result: learning is slowed.
• Train the animal to predict reward based on S3, then show S2 + S3. Result: conditioned response weaker than for S3 alone.

Variations of Conditioning 3

Overshadowing (2 stimuli):
• Repeatedly present S1 + S2 followed by reward.
• Result: often, reward prediction is shared unequally between the stimuli.
Example (made up):
• Red light + high-pitched beep precede pigeon food.
• Result: red light more effective in predicting the food than the high-pitched beep.

Secondary conditioning:
• S1 preceding reward. Then, S2 preceding S1.
• Result: S2 leads to prediction of reward.
• But: if S1 following S2 is shown too often, extinction will occur.

Summary of Conditioning Findings

(Incomplete; conditioning has been studied extensively for decades, and there are many books on the topic.)
(Figure taken from Dayan & Abbott.)

Rescorla-Wagner Model of Classical Conditioning

(Rescorla & Wagner, 1972):
• simple trial-level model that explains a number of findings about classical conditioning
• there are many things it doesn't explain
• a number of extensions exist

The setting (Rescorla & Wagner, 1972):

Consider a stimulus variable u representing presence (u = 1) or absence (u = 0) of the stimulus. Correspondingly, the reward variable r represents presence or absence of reward. The expected reward v is modeled as "stimulus times weight":

v = wu

[Diagram: input u, weight w, output v]

Learning is done by adjusting the weight to minimize the error between the predicted reward v and the actual reward r.

How to change the weight?

Denote the prediction error by δ (delta): δ = r − v
Learning rule: w ← w + εδu, where ε is a learning rate.
Q: Why is this useful?
A: This rule does stochastic gradient descent to minimize the expected squared error (r − v)²; w converges to <r> (the average reward). The R.W. rule is a variant of the "delta rule" in neural networks.

Goal: minimize the squared error E = δ²

$$\frac{\partial E}{\partial w} = \frac{\partial}{\partial w}\,\delta^2 = \frac{\partial}{\partial w}\,(r - v)^2 = \frac{\partial}{\partial w}\,(r - wu)^2 = 2(r - wu)(-u) = -2\delta u$$

$$w \leftarrow w - \eta\,\frac{\partial E}{\partial w} \quad\Longrightarrow\quad w \leftarrow w + \varepsilon\delta u, \quad \text{where } \varepsilon = 2\eta$$

[Figure: δ² plotted against w, with regions ∂E/∂w > 0 and ∂E/∂w < 0 marked]

Note: in psychological terms, the learning rate is a measure of the associability of the stimulus with the reward.

Example

prediction error: δ = r − v; learning rule: w ← w + εδu
[Figure taken from Dayan & Abbott; panels: conditioning, extinction, partial reinforcement]

Multiple Stimuli

Essentially the same idea and learning rule. In the case of multiple stimuli:
v = w·u (predicted reward = dot product of stimulus vector and weight vector)
Prediction error: δ = r − v
Learning rule: w ← w + εδu

$$\frac{\partial}{\partial w_i}\,\delta^2 = \frac{\partial}{\partial w_i}\,(r - v)^2 = \frac{\partial}{\partial w_i}\,(r - \mathbf{w}\cdot\mathbf{u})^2 = 2(r - \mathbf{w}\cdot\mathbf{u})\,\frac{\partial}{\partial w_i}\Big(-\sum_j w_j u_j\Big) = -2\delta u_i$$

To what extent does the Rescorla-Wagner rule account for variants of classical conditioning?
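A quick way to probe the rule numerically is to simulate it. The following is a minimal Python sketch, not code from the lecture: the function name `rescorla_wagner`, the learning rate ε = 0.1, and the 100 trials per phase are assumptions chosen for illustration. It applies w ← w + εδu with v = w·u to an acquisition-then-extinction schedule and to the blocking protocol (S1 alone, then S1 + S2), with compound-only training as a control.

```python
def rescorla_wagner(trials, n_stimuli, epsilon=0.1):
    """Apply w <- w + epsilon * delta * u over a sequence of (u, r) trials.

    trials    : list of (u, r) pairs; u is a tuple of 0/1 stimulus indicators,
                r is the reward delivered on that trial
    n_stimuli : number of stimuli (length of each u)
    epsilon   : learning rate (illustrative value, an assumption)
    Returns the final weights and the weight history after each trial.
    """
    w = [0.0] * n_stimuli
    history = []
    for u, r in trials:
        v = sum(wi * ui for wi, ui in zip(w, u))  # prediction v = w . u
        delta = r - v                             # prediction error
        w = [wi + epsilon * delta * ui for wi, ui in zip(w, u)]
        history.append(list(w))
    return w, history


# Single stimulus: acquisition (reward present), then extinction (no reward).
acquisition = [((1,), 1.0)] * 100
extinction = [((1,), 0.0)] * 100
w_acq, _ = rescorla_wagner(acquisition, n_stimuli=1)
w_ext, _ = rescorla_wagner(acquisition + extinction, n_stimuli=1)

# Blocking: S1 alone is rewarded first, then S1 + S2 together are rewarded.
pretraining = [((1, 0), 1.0)] * 100
compound = [((1, 1), 1.0)] * 100
w_block, _ = rescorla_wagner(pretraining + compound, n_stimuli=2)

# Control: compound training without pre-training on S1.
w_control, _ = rescorla_wagner(compound, n_stimuli=2)

print("after acquisition:", w_acq)
print("after extinction: ", w_ext)
print("blocking (w1, w2):", w_block)
print("control  (w1, w2):", w_control)
```

With these settings, w1 ends near 1 and w2 stays near 0 after blocking, whereas compound-only training with equal learning rates splits the prediction evenly, with each weight near 0.5.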
(prediction: v = w·u; error: δ = r − v; learning: w ← w + εδu)
(Figure taken from Dayan & Abbott.)

• Extinction, partial reinforcement: o.k., since w converges to <r>.
• Blocking: during pre-training, w1 converges to r. During training, v = w1u1 + w2u2 = r, hence δ = 0 and w2 does not grow.
• Inhibitory conditioning: on S1-only trials, w1 gets a positive value. On S1 + S2 trials, v = w1 + w2 must converge to zero, hence w2 becomes negative.
• Overshadowing: v = w1 + w2 goes to r, but w1 and w2 may become different if they have different learning rates εi.
• Secondary conditioning: the R.-W. rule predicts a negative S2 weight!
• The Rescorla-Wagner rule qualitatively accounts for a wide range of conditioning phenomena, but not for secondary conditioning (and some other things).

Instrumental Conditioning

• Classical conditioning is only concerned with the prediction of reward; it doesn't consider the agent's actions. But reward usually depends on what you've done!

Edward L. Thorndike's "Law of Effect" (1911):
• Responses to a situation that are followed by satisfaction are strengthened.
• Responses that are followed by discomfort are weakened.

"Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond" (Thorndike, 1911, p. 244)

[Figures: cat in "puzzle box"; rat in "Skinner box"]
[Figure: multi-modal "Skinner box"]

Clicker Training

• used by animal trainers to teach behaviors
• developed by Marian Kruse and Keller Breland (two of Skinner's first students)
• used on over 150 species and on robots [Kaplan et al., 2002]
• basic idea: use a "click" as a secondary reinforcer
• trainer "marks" the desired behavior with a click at precisely the right time