Lecture 04 - main goals:
• Describe the essential difference between classical and operant conditioning
• Acquire an understanding of the neural basis of reward and its relation to the dopaminergic system
• Understand how drugs of abuse might tap into the neural reward circuitry
• Acquire an understanding of reward prediction errors
• Button push neurons in basal ganglia. Reverse replay

•Slide 1 Lecture 04 – Circuit Motifs
1. reinforcement learning – operant conditioning (Basal Ganglia?)
Behavior 1 (push left button)
Behavior 2 (push right button)

•Slide 2 Lecture 04 – Circuit Motifs
Behavior 1
Behavior 2
Synaptic tagging vs Working Memory

•Slide 3 What is free will? Do we have it?
What about disorders of decision-making?

What is free will? Do we have it?
Nao Uchida: Neurobiology of Perception and Decision Making - MCB 145

Drugs of abuse
• cocaine
• amphetamine
• opiate (heroin)
• nicotine
• ethanol
• cannabinoids (marijuana)
• hallucinogens
• PCP
• These drugs cause “compulsive” drug-taking despite the knowledge of negative outcomes.

Operant Conditioning
Operant conditioning was the dominant school in American psychology from the 1930s through the 1950s (Edward Thorndike; Burrhus Frederic Skinner).
Where classical conditioning illustrates S-->R learning, operant conditioning is often viewed as R-->S learning.

•Slide 7 Law of effect

•Slide 8 Thorndike’s puzzle box
• Placed hungry cat in box
• Cat can escape and eat if it hits the foot pedal.
• Thorndike observed the behaviours of the cat.
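The puzzle-box procedure can be sketched as a small simulation. The Python sketch below is illustrative only and is not part of the lecture: it assumes the cat picks behaviours at random in proportion to a per-behaviour strength, and that only the behaviour followed by escape is strengthened (the weakening of discomforting responses is left out). The function name `run_puzzle_box`, the behaviour list, and the `boost` parameter are hypothetical.

```python
import random

# Illustrative repertoire; the names are assumptions made for this sketch.
BEHAVIOURS = [
    "scratch at bars", "dig at floor", "howl", "push at ceiling",
    "pace around", "hiss", "press pedal",
]
ESCAPE_BEHAVIOUR = "press pedal"


def run_puzzle_box(n_trials=25, boost=1.0, seed=0):
    """Simulate repeated puzzle-box trials.

    Returns (attempts_per_trial, strength): how many behaviours the cat
    tried before escaping on each trial, and the final response strengths.
    """
    rng = random.Random(seed)
    strength = {b: 1.0 for b in BEHAVIOURS}  # all behaviours start equal
    attempts_per_trial = []
    for _ in range(n_trials):
        attempts = 0
        while True:
            attempts += 1
            weights = [strength[b] for b in BEHAVIOURS]
            choice = rng.choices(BEHAVIOURS, weights=weights)[0]
            if choice == ESCAPE_BEHAVIOUR:
                # Escape and food follow this response, so it is strengthened.
                strength[choice] += boost
                break
        attempts_per_trial.append(attempts)
    return attempts_per_trial, strength


if __name__ == "__main__":
    attempts, strength = run_puzzle_box()
    print("attempts per trial:", attempts)
    print("final strengths:", strength)
```

With a fixed seed the run is reproducible. The number of attempts per trial tends to fall as the pedal press gains strength, although individual trials remain noisy.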
•Slide 9 Thorndike’s observation: first trial inside of box
Scratch at bars, Dig at floors, Howl, Push at ceiling, Pace around, Hiss, Press Lever

•Slide 10 Thorndike’s observation: a few trials later
Scratch at bars, Dig at floors, Howl, Push at ceiling, Pace around, Hiss, Press Lever

•Slide 11 Thorndike’s observation: after many trials in the box
Scratch at bars, Dig at floors, Howl, Push at ceiling, Pace around, Hiss, Press Lever

•Slide 12 Thorndike’s results
[Figure: time required to escape (seconds) plotted against successive trials in the puzzle box]
• Law of Effect: Responses that produce a satisfying result are more likely to be repeated in a similar situation; responses that produce a discomforting result are less likely to recur in similar situations.

•Slide 13 Skinner’s operant conditioning

•Slide 14 Pigeon Movies

•Slide 15 Skinner’s operant conditioning
• Operant response: Behaviour that has an effect on the environment.
• Operant conditioning: Learning associated with the above behaviour.
• Reinforcer: A stimulus that increases the likelihood of a behaviour.
-> Thorndike’s “satisfaction” is mentalistic
• Problems with the puzzle box.
-> Animal can only make one correct response per trial.

•Slide 16 The Skinner box
• Animal can respond multiple times
• Operant response: Bar pressing
• Operant conditioning: Increased bar pressing when food is delivered following the response.
• Shaping by successive approximations

•Slide 17 Pleasure Rats

•Slide 18

•Slide 19 Dopamine
Substantia nigra pars compacta (SNc)
Ventral tegmental area (VTA)

The mesocorticolimbic dopamine pathway
•The neurons of the VTA (ventral tegmental area) contain the neurotransmitter dopamine, which is released in the nucleus accumbens and in the prefrontal cortex.
This pathway is activated by a rewarding stimulus.

•Slide 21 The mesocorticolimbic dopamine pathway

•Slide 22 Error in prediction drives learning
Prediction
Evaluation

Blocking Paradigm Showing that Learning Depends on Prediction Error Rather than Stimulus-Reward Pairing Alone

•Slide 24 Dopamine neurons encode reward prediction during learning

•Slide 25 Activity of Dopamine Neurons depends on Prediction Error ( = Surprise)
W. Schultz. Getting formal with dopamine and reward. Neuron 36:241, 2002.

•Slide 26 Sustained activity correlates with uncertainty
C. D. Fiorillo, et al. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299:1898-1902, 2003.
•Mcb105 2003 4th

•Slide 27 Sustained activity correlates with uncertainty…
•Risk-taking
•Gambling
•uncertain rewards are much more powerful – starting an old car

… and with importance

•Slide 29 Different Temporal Operating Modes for Different Dopamine Functions

•Slide 30 papers
B. Brembs, F. D. Lorenzetti, F. D. Reyes, D. A. Baxter, and J. H. Byrne. Operant reward learning in Aplysia: neuronal correlates and mechanisms. Science 296:1706-1709, 2002.
J. Y. Cohen, S. Haesler, L. Vong, B. B. Lowell, and N. Uchida. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482:85-88, 2012.

•Slide 31 Matlab Module: Simple Model of Cue vs Reward responses during learning

•Slide 32 How do dopamine neurons compute error signals?
[Diagram labels: Input, Output, Cue, Reward, Reward expectation, excitatory, inhibitory]
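The prediction-error idea running through these slides can be sketched with a delta rule of the Rescorla-Wagner type. The Python sketch below is an illustration under stated assumptions, not the Matlab module named on Slide 31. It assumes that the reward expectation is the summed strength of the cues present, that the response at reward time equals delivered minus expected reward, and that the response at cue time equals the expectation the cue evokes (the cue itself arrives unpredicted). The function name `rescorla_wagner` and the learning rate `alpha` are hypothetical.

```python
def rescorla_wagner(trials, alpha=0.1):
    """Run a delta-rule learner over a sequence of trials.

    trials: list of (stimuli, reward) pairs, e.g. (("cue",), 1.0).
    Returns (weights, history); history holds one
    (prediction, prediction_error) pair per trial.
    """
    weights = {}
    history = []
    for stimuli, reward in trials:
        # Reward expectation: summed strength of all cues present.
        prediction = sum(weights.get(s, 0.0) for s in stimuli)
        # Prediction error ("surprise"): delivered minus expected reward.
        error = reward - prediction
        # Every cue present is updated by the same error signal.
        for s in stimuli:
            weights[s] = weights.get(s, 0.0) + alpha * error
        history.append((prediction, error))
    return weights, history


if __name__ == "__main__":
    # Cue vs reward: the reward-time error shrinks while the
    # cue-evoked expectation grows over repeated pairings.
    _, hist = rescorla_wagner([(("cue",), 1.0)] * 100)
    print("first trial (cue, reward):", hist[0])
    print("last trial  (cue, reward):", hist[-1])

    # Blocking: A is trained alone first, then A and B together.
    blocked, _ = rescorla_wagner(
        [(("A",), 1.0)] * 100 + [(("A", "B"), 1.0)] * 100
    )
    control, _ = rescorla_wagner([(("A", "B"), 1.0)] * 100)
    print("B after pretraining A:", blocked["B"])
    print("B without pretraining:", control["B"])
```

In this sketch, cue B is paired with reward equally often in both conditions, yet it gains almost no strength when A already predicts the reward, because the error that drives learning is close to zero. Appending a trial with reward 0.0 after training yields a negative error, the model's counterpart of an omitted expected reward.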