Summary: A Neural Substrate of Prediction and Reward
Authors: Wolfram Schultz, Peter Dayan, P. Read Montague
Summary by: Parth Sharma
October 24, 2016

1 Introduction

Predicting future events is essential to the survival of any organism. Predictions give the organism time to prepare for an upcoming event, mitigating catastrophes and exploiting opportunities. But how does this learning occur? An organism must base its predictions on some criterion of significance, as it cannot attempt to learn to predict everything. Rewards and punishments are obvious choices for deciding the significance of an event: if an event is associated with neither reward nor punishment, it is probably irrelevant to the organism's survival. Behavioral experiments support this claim.

2 Building blocks of the model: learning

Reinforcement learning (RL) is a type of machine learning widely used in robotics. The basic RL model comprises an agent and an environment. The agent starts in an initial "state" (essentially a description of the current environment) and then performs an "action". Depending on the state and the action, the environment transitions the agent into another state and gives it a "reward". This cycle of action, followed by a change of state and a reward, continues until the episode ends. The agent learns a "policy", which is a mapping (probabilistic or deterministic) from the state space to the action space, i.e. what action to perform in a given state. For example, for an agent in a 2-D world whose aim is to reach (4,4), the states can be represented by (x,y) coordinates and the actions can be to move up, down, left, or right by 1. The agent receives a reward of +1 on reaching (4,4); the reward in every other state is zero. The agent's learning is driven by an "error" signal, which tells the agent how to modify its policy to increase reward.
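The gridworld example above can be sketched as a minimal tabular TD(0) value learner under a random policy. This is an illustrative sketch, not the paper's model; the grid size, learning rate, and discount factor are assumed values:

```python
import random

random.seed(0)  # make the run reproducible

GOAL = (4, 4)
ACTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0)]  # up, down, left, right
ALPHA, GAMMA = 0.1, 0.9                        # illustrative learning rate, discount

def step(state, action):
    """Move within the 5x5 grid; reward +1 only on reaching the goal."""
    x = min(4, max(0, state[0] + action[0]))
    y = min(4, max(0, state[1] + action[1]))
    next_state = (x, y)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

# V[s]: the agent's prediction of future reward from state s
V = {(x, y): 0.0 for x in range(5) for y in range(5)}

for episode in range(500):
    state = (0, 0)
    while state != GOAL:
        next_state, reward = step(state, random.choice(ACTIONS))
        # The "error" signal: how much better or worse the outcome was
        # than the current prediction.
        delta = reward + GAMMA * V[next_state] - V[state]
        V[state] += ALPHA * delta  # nudge the prediction toward the outcome
        state = next_state

print(V[(4, 3)] > V[(0, 0)])  # states near the goal earn higher predicted value
```

After training, predicted values grow with proximity to (4,4), which is exactly the information a policy-improvement step would exploit.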
3 Modelling the neurological data

Physiological studies tell us that dopamine plays an essential role in learning. The paper presents recordings from dopamine neurons of primates, made while the animals learned associations between neutral stimuli and rewards. What is remarkable is that the activity of these neurons bears an uncanny resemblance to the expected "error" signal of temporal-difference (TD) learning, an RL algorithm. The authors therefore hypothesize that dopamine encodes the brain's prediction-error signal and drives learning, and they propose a corresponding model of classical conditioning (a type of learning). This finding has been used to create numerous successful computational models of phenomena such as addiction and interval timing. Researchers can run nuanced experiments on these models, which act as a driving force for empirical research, and can then improve a model by examining instances where the empirical data disagree with its predictions.
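The resemblance can be illustrated with a toy TD simulation of classical conditioning. This is a sketch, not the paper's exact model: the trial length, cue and reward times, and parameters are all assumed. A cue appears at step 5 and a reward at step 10; the TD error δ(t) = r(t) + γV(t+1) − V(t) peaks at the reward on the first trial, but after training it peaks at the transition into the cue state (step CUE − 1 under this indexing), mirroring how dopamine responses shift from the reward to the predictive stimulus:

```python
T = 12                   # time steps per trial (assumed)
CUE, REWARD = 5, 10      # cue onset and reward delivery times (assumed)
ALPHA, GAMMA = 0.3, 1.0  # illustrative learning rate and discount

V = [0.0] * (T + 1)      # V[t]: predicted future reward at step t

def run_trial():
    """Run one conditioning trial, returning the TD error at each step."""
    delta = [0.0] * T
    for t in range(T):
        r = 1.0 if t == REWARD else 0.0
        delta[t] = r + GAMMA * V[t + 1] - V[t]
        if t >= CUE:              # states before the cue carry no prediction
            V[t] += ALPHA * delta[t]
    return delta

first = run_trial()               # trial 1: reward is unpredicted
for _ in range(200):
    last = run_trial()            # after extensive training

print(first.index(max(first)))    # error peaks at the reward time on trial 1
print(last.index(max(last)))      # after training it peaks when the cue arrives
```

On trial 1 the reward is unpredicted, so the error fires at step 10; after training the value of the cued states has converged, the fully predicted reward evokes no error, and the surprise has migrated back to cue onset.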