Summary: A Neural Substrate of Prediction and Reward
Authors: Wolfram Schultz, Peter Dayan, P. Read Montague
Summary by: Parth Sharma
October 24, 2016
1 Introduction
Predicting future events is essential to the survival of any organism. Predictions give the organism time to prepare for an upcoming event, allowing it to mitigate catastrophes and exploit opportunities. But how does this learning occur? An organism must base its predictions on some discriminant, since it cannot attempt to learn to predict everything. Rewards and punishments are obvious choices for deciding the significance of an event: if an event is not associated with any reward or punishment, it is probably irrelevant to the organism's survival. Behavioral experiments have supported this claim.
2 Building Blocks of Modelling Learning
Reinforcement learning (RL) is a type of machine learning that is widely used in robotics. The basic model of RL comprises an agent and an environment. The agent starts in an initial "state" (essentially a description of the current environment) and then performs an "action". Depending on the state and the action, the environment transitions the agent into another state and gives it a "reward". This cycle of action, change of state, and reward continues until the episode ends. The agent learns a "policy", which is a mapping (probabilistic or deterministic) from the state space to the action space, i.e. which action to perform in a given state. For example, for an agent in a 2-D world whose aim is to reach (4,4), the states can be represented by (x,y) coordinates and the actions are to move up, down, left, or right by 1. The agent receives a reward of +1 when it reaches (4,4); the reward in every other state is zero (a sketch of this setup appears below). The agent's learning is driven by an "error" signal, which tells the agent how to modify its policy so as to increase reward.
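
To make the setup concrete, here is a minimal sketch in Python of the 2-D gridworld example above. The specific names (step, random_policy) and the choice of a bounded 5x5 grid are illustrative assumptions, not details from the original summary:

    import random

    # Illustrative 5x5 gridworld: states are (x, y) pairs, the goal is (4, 4).
    GOAL = (4, 4)
    ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

    def step(state, action):
        # Apply an action to a state; return the next state and the reward.
        dx, dy = ACTIONS[action]
        # Clamp to the grid so the agent stays within [0, 4] in each dimension.
        x = min(max(state[0] + dx, 0), 4)
        y = min(max(state[1] + dy, 0), 4)
        next_state = (x, y)
        reward = 1 if next_state == GOAL else 0   # +1 only at the goal
        return next_state, reward

    def random_policy(state):
        # A policy maps states to actions; here it is simply uniform random.
        return random.choice(list(ACTIONS))

    state = (0, 0)
    while state != GOAL:
        action = random_policy(state)
        state, reward = step(state, action)

A learning algorithm would replace the random policy with one that is improved over time using the error signal described above.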
3 Modelling Neurological Data
From physiological studies, we know that dopamine plays an essential role in learning. The paper presents recordings from dopamine neurons of primates, made while the animals were learning associations between a neutral stimulus and rewards. What is remarkable is that the activity of these neurons bears an uncanny resemblance to the expected "error" signal from temporal-difference (TD) learning, an RL algorithm. The authors therefore hypothesize that dopamine encodes the brain's prediction-error signal and drives learning, and they propose a corresponding model of classical conditioning (a type of learning). This finding has been used to create numerous successful computational models of phenomena such as addiction and interval timing. Researchers can run nuanced experiments on these models, which act as a driving force for empirical research, and can then improve the models by examining the instances where empirical data disagree with the models' predictions.
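
The TD prediction error referred to above can be made explicit: at each step it is delta = r + gamma * V(s') - V(s), i.e. the difference between what actually happened (the reward plus the discounted prediction from the next state) and what was predicted, and the value estimate V(s) is nudged in the direction of delta. In the paper's interpretation, phasic dopamine activity resembles this delta. Below is a minimal tabular TD(0) sketch in Python, reusing the illustrative gridworld from Section 2; the environment details and constants are assumptions for illustration, not taken from the paper:

    import random
    from collections import defaultdict

    GOAL = (4, 4)
    ACTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0)]
    GAMMA = 0.9   # discount factor
    ALPHA = 0.1   # learning rate

    def step(state, action):
        # Clamped 5x5 gridworld: reward is +1 at the goal, 0 everywhere else.
        x = min(max(state[0] + action[0], 0), 4)
        y = min(max(state[1] + action[1], 0), 4)
        next_state = (x, y)
        return next_state, (1 if next_state == GOAL else 0)

    V = defaultdict(float)  # tabular value function: V[s] = predicted future reward

    for episode in range(500):
        state = (0, 0)
        while state != GOAL:
            next_state, reward = step(state, random.choice(ACTIONS))
            # TD prediction error: (reward + discounted next prediction) - prediction.
            delta = reward + GAMMA * V[next_state] - V[state]
            V[state] += ALPHA * delta   # move the prediction toward its target
            state = next_state

    # States nearer the goal end up with higher predicted value.
    print(V[(3, 4)], V[(0, 0)])

As learning proceeds, delta shrinks for well-predicted rewards and appears earlier, at the predictive stimulus, which mirrors the shift in dopamine responses reported in the paper.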