Decision Making (Neurobiology and Brain Sciences, 2010)

Types of Machine Learning

1. Unsupervised Learning:
• Only network inputs are available to the learning algorithm.
• The network is given only unlabeled examples.
• The network learns to categorize (cluster) the inputs.
• Example: the Hebbian plasticity rule

Wi(n+1) = Wi(n) + a · Xi(n) · Y(n)

Wi – weight of the ith synapse; X – presynaptic activity; Y – postsynaptic activity; n – number of synaptic changes (input patterns); a – amplitude (rate) of learning.

Hebbian Rules
• In 1949, Hebb postulated that the changes in a synapse are proportional to the correlation between the firing of the neurons that the synapse connects (the pre- and postsynaptic neurons): "Neurons that fire together, wire together."
• Examples: classical conditioning; spike-timing-dependent synaptic plasticity (STDP).

Synaptic Plasticity and Memory
• Learning – imposing an activity pattern across the cells that represents events in the world causes changes in the synaptic strengths of a network of neurons.
• Memory retrieval – reactivation of the modified connections, triggered by exposure to part of the previously learned pattern.
• LTP as a Hebbian mechanism for learning and memory:
 - It has early and late phases that are separate processes, such that one can be blocked pharmacologically without the other.
 - It is specific and associative.
 - Learning correlates with LTP (classical conditioning, fear conditioning).
 - Blocking LTP (via NMDA-receptor blockade) correlates with blocked learning and memory retrieval (Morris water maze).
 - Artificially induced LTP (electrical stimulation alone) can substitute for a sensory stimulus that leads to learning and memory.

Application of the Hebbian learning rule: the linear associator
• The activation of each neuron in the output layer is given by a sum of weighted inputs.
• The strength of each connection is calculated from the product of the pre- and postsynaptic activities, scaled by a "learning rate" a (which determines how fast connection weights change): Δwij = a · g[i] · f[j].
• The linear associator stores associations between a pattern of neural activations in the input layer f and a pattern of activations in the output layer g.
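The associator update Δwij = a · g[i] · f[j] can be sketched in a few lines of NumPy. This is an illustrative snippet written for these notes, not part of them; the pattern values and learning rate are arbitrary.

```python
import numpy as np

def hebbian_update(W, f, g, a=0.5):
    """One Hebbian step for the linear associator:
    delta_w[i, j] = a * g[i] * f[j], the product of post- and
    presynaptic activity scaled by the learning rate a."""
    return W + a * np.outer(g, f)

# Store one association between an input pattern f and an output pattern g.
f = np.array([1.0, 0.0, 1.0])   # activity in the input layer
g = np.array([0.0, 1.0])        # activity in the output layer
W = hebbian_update(np.zeros((g.size, f.size)), f, g)

# Recall: each output neuron sums its weighted inputs, so presenting f
# again reproduces a pattern proportional to g.
recall = W @ f                   # array([0., 1.])
```

Storing several pattern pairs simply sums their outer products; recall stays exact as long as the stored input patterns are orthogonal to each other.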
• Once the associations have been stored in the connection weights between layer f and layer g, the pattern in layer g can be "recalled" by presenting the input pattern in layer f.

Types of Machine Learning

2. Reinforcement Learning:
• The network is only provided with a grade, or score, which indicates network performance.
• The network learns how to act given an observation of the world. Every action has some impact on the environment, and the environment provides feedback in the form of rewards that guides the learning algorithm.
• Reinforcement learning differs from supervised learning in that correct input/output pairs are never presented, and sub-optimal actions are not explicitly corrected.
• Formally, the basic reinforcement learning model consists of:
 - a set of environment states S
 - a set of actions A
 - a set of scalar "rewards" in ℝ

3. Supervised Learning (deducing a function from training data):
• The network is provided with a set of examples of proper network behavior (inputs/targets): {p1, t1}, {p2, t2}, …, {pQ, tQ}.
 - The experimenter needs to determine the type of training examples.
 - The training set needs to be characteristic of the real-world use of the function.
 - The input feature representation of the learned function must be determined (which features, and how many, go in the input vector).
• The network generates a function that maps inputs to desired outputs.
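The reinforcement-learning formalism above (states S, actions A, scalar rewards) can be made concrete with tabular Q-learning, a standard reinforcement-learning algorithm that is not named in the notes; the two-state toy environment, learning rate, and discount factor below are invented for illustration.

```python
import random

def q_learning(n_episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy two-state chain: in state 0, action 1
    moves to state 1; in state 1, action 1 earns reward 1 and ends the
    episode; action 0 does nothing. No correct state/action pairs are
    ever shown; only the scalar reward guides learning."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(n_episodes):
        s = 0
        for _ in range(10):                 # cap episode length
            a = rng.choice((0, 1))          # behave randomly; Q-learning is off-policy
            if a == 1 and s == 0:
                s2, r, done = 1, 0.0, False
            elif a == 1 and s == 1:
                s2, r, done = 1, 1.0, True
            else:
                s2, r, done = s, 0.0, False
            # Move Q(s, a) toward the reward plus the discounted best future value.
            target = r if done else r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if done:
                break
    return Q
```

Here Q[(1, 1)] converges to the reward 1.0 and Q[(0, 1)] toward its discounted value 0.9, so acting greedily on Q moves toward the reward even though no correct input/output pairs were ever presented.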
• Example: the perceptron

Application of Supervised Learning: Binary Classification
• Given learning data: labeled pairs (X(1), y(1)), (X(2), y(2)), …, (X(N), y(N)).
• A model is constructed that maps an input vector X to an output y ∈ {0, 1}.
• The output y is computed from a linear combination of the inputs x1, …, xm with weights w1, …, wm.

The Perceptron

h = Σj Wj Xj    Y = (1 + sgn(h)) / 2

Y – output; h – sum of scaled inputs; W – synaptic weight; X – input.
sgn(h) = +1 if h > 0, else sgn(h) = -1, so that Y ∈ {0, 1}:

Y = (1 + sgn(W · X)) / 2

Geometrical interpretation
For two inputs, W · X = W1 X1 + W2 X2. Writing the vectors in polar form, W1 = |W| cos θW, W2 = |W| sin θW, X1 = |X| cos θX, X2 = |X| sin θX, gives
W · X = |W||X| (cos θW cos θX + sin θW sin θX) = |W||X| cos(θW - θX).
So Y = (1 + sgn(cos θ)) / 2, where θ is the angle between W and X: the perceptron outputs 1 exactly when X lies within 90° of the weight vector, i.e., on one side of the line perpendicular to W.

The Perceptron
• A single-layer perceptron can only learn linearly separable problems.
• A single-layer perceptron of N units can only learn N patterns.
• More than one layer of perceptrons can learn any Boolean function.
• Overtraining: accuracy usually rises, then falls.

Perceptron Learning Demonstration
Input features:
 Taste: Sweet = 1, Not_Sweet = 0
 Seeds: Edible = 1, Not_Edible = 0
 Skin: Edible = 1, Not_Edible = 0
Output: sweet fruit = 1, not sweet fruit = 0.
We start with no knowledge: all three weights are 0.0, and the unit fires (outputs 1) if ∑ > 0.4.

Perceptron Learning
• To train the perceptron, we will show it each example and have it categorize each one.
• Since it is starting with no knowledge, it is going to make mistakes.
• When it makes a mistake, we are going to adjust the weights to make that mistake less likely in the future.
• When we adjust the weights, we are going to take relatively small steps to be sure we don't over-correct and create new problems.

1. We show it a banana: Taste = 1, Seeds = 1, Skin = 0; all weights 0.0.
In this case we have: (1 × 0) + (1 × 0) + (0 × 0) = 0.
Since that is less than the threshold (0.4), we responded "no." Is that correct? No.
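The threshold unit used throughout this demonstration can be written directly in code. This is a minimal sketch; the function name is mine, and the threshold parameter generalizes the sgn form (threshold 0) to the fruit demonstration's threshold of 0.4.

```python
def perceptron_output(weights, inputs, threshold=0.0):
    """Threshold unit: h = sum_j W_j * X_j, output Y = 1 if h > threshold,
    else 0 (for threshold 0 this equals Y = (1 + sgn(h)) / 2 with sgn = ±1)."""
    h = sum(w * x for w, x in zip(weights, inputs))
    return 1 if h > threshold else 0

# Geometric view: with W = (1, 1), inputs within 90 degrees of W fire.
# perceptron_output([1, 1], [2.0, 0.5])   -> 1  (X on W's side of the boundary)
# perceptron_output([1, 1], [-1.0, -2.0]) -> 0  (X on the other side)
```

With threshold = 0.4 and all weights at 0.0, the unit answers "no" to every fruit, which is exactly the untrained state the demonstration starts from.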
Since we got it wrong, we need to change the weights using the delta rule:

∆w = learning rate × (overall teacher - overall output) × node output

1. Learning rate: we set that ourselves. It has to be large enough that learning happens in a reasonable amount of time, but small enough not to go too fast. (Let's pick 0.25.)
2. (overall teacher - overall output): the teacher knows the correct answer (e.g., that a banana should be a good fruit). In this case, the teacher says 1, the output is 0, so (1 - 0) = 1.
3. Node output: that is what came out of the node whose weight we are adjusting.
First node: ∆w = 0.25 × 1 × 1 = 0.25.

The Delta Rule
∆w = learning rate × (overall teacher - overall output) × node output
• If we get the categorization right, (overall teacher - overall output) will be zero (the right answer minus itself). In other words, if we get it right, we won't change any of the weights.
• If we get the categorization wrong, (overall teacher - overall output) will be either -1 or +1:
 - If we said "yes" when the answer was "no," the weights are too high; (teacher - output) will be -1, which reduces the weights.
 - If we said "no" when the answer was "yes," the weights are too low, and this will cause them to be increased.

The Delta Rule
∆w = learning rate × (overall teacher - overall output) × node output
• If the node whose weight we are adjusting output "0", then it didn't participate in making the decision. In that case, it shouldn't be adjusted, and multiplying by zero makes that happen.
• If the node whose weight we are adjusting output "1", then it did participate, and we should change its weight (up or down as needed).

How do we change the weights for a banana?
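The delta rule as stated above is a one-line update per weight. The sketch below is illustrative (the function and argument names are mine):

```python
def delta_rule_step(weights, inputs, teacher, output, lr=0.25):
    """Apply delta_w = lr * (teacher - output) * x to every weight:
    no change when the answer was right (teacher == output), and no
    change for a node whose output x was 0, since it did not participate."""
    return [w + lr * (teacher - output) * x for w, x in zip(weights, inputs)]

# The banana case: teacher = 1, output = 0, node outputs (1, 1, 0).
w = delta_rule_step([0.0, 0.0, 0.0], [1, 1, 0], teacher=1, output=0)
# w is now [0.25, 0.25, 0.0]: taste and seeds rise, skin is untouched.
```

Iterating this step over the remaining examples (pear, lemon, strawberry, green apple) reproduces the weight trajectories worked out below.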
Feature   Learning rate   (teacher - output)   Node output   ∆w
taste     0.25            1                    1             +0.25
seeds     0.25            1                    1             +0.25
skin      0.25            1                    0             0

Adjusted weights for the banana: Taste 0.25, Seeds 0.25, Skin 0.0 (fire if ∑ > 0.4).

2. We show it a pear: Taste = 1, Seeds = 0, Skin = 1.
∑ = (1 × 0.25) + (0 × 0.25) + (1 × 0.0) = 0.25. That is below the threshold (0.4), so the output is 0. Wrong again, so we change the weights for the pear:

Feature   Learning rate   (teacher - output)   Node output   ∆w
taste     0.25            1                    1             +0.25
seeds     0.25            1                    0             0
skin      0.25            1                    1             +0.25

Adjusted weights for the pear: Taste 0.50, Seeds 0.25, Skin 0.25.

3. We show it a lemon: Taste = 0, Seeds = 0, Skin = 0.
∑ = 0, so the output is 0. That is correct, so (teacher - output) = 0 and nothing changes:

Feature   Learning rate   (teacher - output)   Node output   ∆w
taste     0.25            0                    0             0
seeds     0.25            0                    0             0
skin      0.25            0                    0             0

Weights after the lemon (unchanged): Taste 0.50, Seeds 0.25, Skin 0.25.

4. We show it a strawberry: Taste = 1, Seeds = 1, Skin = 1.
∑ = 0.50 + 0.25 + 0.25 = 1.0 > 0.4, so the output is 1. That is correct, so again nothing changes:

Feature   Learning rate   (teacher - output)   Node output   ∆w
taste     0.25            0                    1             0
seeds     0.25            0                    1             0
skin      0.25            0                    1             0

Weights after the strawberry (unchanged): Taste 0.50, Seeds 0.25, Skin 0.25.
The perceptron can now classify every example correctly.

5. We show it a green apple: Taste = 0, Seeds = 0, Skin = 1.
∑ = (0 × 0.50) + (0 × 0.25) + (1 × 0.25) = 0.25 < 0.4, so the output is 0. Correct.

Decision Making
• Neuroanatomical substrates of decision making:
Orbitofrontal cortex (within the prefrontal cortex): responsible for processing, evaluating and filtering social and emotional information for appropriate decision making. It is thought to be involved because of its on-line, rapid evaluation of stimulus-reinforcement associations, that is, learning to link a stimulus and action with its reinforcing properties.
Anterior cingulate cortex: controls and selects appropriate behavior, and monitors the organism's errors and incorrect responses.
Dorsolateral prefrontal cortex (DLPFC): monitors errors and makes appropriate choices during decision making; carries out cost-benefit analysis in working memory.
Basal ganglia-thalamocortical circuits (BGTC) and frontoparietal networks: direct attention toward relevant information, as opposed to irrelevant information, during goal-related decision-making processes.

Decision Making
• Neuroanatomical substrates of decision making:
The dopaminergic system: appears to be a primary substrate for the representation of decision utility. Increased firing of dopamine neurons has been documented when people are faced with unexpected rewards and in response to stimuli that predict future rewards.
The ventral striatum: the center of integration of the 'data' between the prefrontal cortex, amygdala and hippocampus. It plays a critical role in representing the magnitude of anticipated reward.
The amygdala: involved in emotion and learning; responsible for producing fear responses. Plays a key role in representing the utility of a gain or the dis-utility of a loss.

Decision Making
• Factors that impact decision making:
Expertise: with expertise come differences in the function and structure of the brain regions required for decision making and task completion.
 - London black-cab drivers, who are required to learn and memorize London's streets, show a different distribution of hippocampal volume compared to ordinary drivers.
 - Physics experts use a 'working forwards' strategy to solve problems, making decisions using the information given in the problem to derive a solution. In contrast, physics novices typically employ a 'working backwards' strategy, in which they start from the perceived goal state or decision and backtrack.
Age: with age come changes in the recruitment of specific brain regions for task completion during decision making. Older adults will often compensate for age-related declines in prefrontal structure and function by recruiting additional prefrontal regions and more posterior regions.
Sex: there is a bias toward faster decision making in men in situations of uncertainty and limited feedback.

Neural Activity Correlates of Decision Making
• Neural correlates of decision variables in parietal cortex (M.L. Platt & P.W. Glimcher, 1999):
 - The gain (or reward) a monkey can expect to realize from an eye-movement response modulates the activity of neurons in the lateral intraparietal area (LIP).
 - In addition, the activity of these neurons is sensitive to the probability that a particular response will result in a gain.

Neural Activity Correlates of Decision Making
• "Neurons in the orbitofrontal cortex encode economic value" (C. Padoa-Schioppa & J.A. Assad, 2006):
 - Neurons in the orbitofrontal cortex (OFC) encode the value of offered and chosen goods.
 - OFC neurons encode value independently of visuospatial factors and motor responses. (If a monkey chooses between A and B, neurons in the OFC encode the value of the two goods independently of whether A is presented on the right and B on the left, or vice versa.)
 - Conclusion: economic choice is essentially a choice between goods rather than a choice between actions.

Neural Activity Correlates of Decision Making
• "Microstimulation of macaque area LIP affects decision-making in a motion discrimination task" (T.D. Hanks, J. Ditterich & M.N. Shadlen, 2006):
 - In each experiment, they identified a cluster of LIP cells with overlapping response fields (RFs).
 - Choices toward the stimulated RF were faster with microstimulation, while choices in the opposite direction were slower.
 - Microstimulation never directly evoked saccades, nor did it change reaction times in a simple saccade task.
- These results demonstrate that the discharge of LIP neurons is causally related to decision formation in the discrimination task.