Download Slides - NYU Computation and Cognition Lab
Transcript
An Introduction to Learning
Lecture 3/13
Todd M. Gureckis
Department of Psychology, New York University
[email protected]

Agenda for Today
Simple forms of non-associative learning
Habituation/Sensitization
Priming
Imprinting
Sensory adaptation
Perceptual Learning
Unsupervised Learning

Non-associative Learning
Most of the forms of learning we will cover in the next few weeks (classical/instrumental conditioning) are about learning the relationship between actions or stimuli and particular consequences.
Other forms of learning have no obvious relationship to reward/consequence.
This is an interesting place to start because these processes are much less well understood (neurally speaking), yet they are also a key component of contemporary machine learning research (unsupervised learning).
The reason for the lack of neural understanding is that the processes are largely distributed throughout the brain (visual areas for perceptual learning, higher cortex for habituation, reflex systems for other forms of habituation).

1. Tolman

Tolman (1948): “Cognitive Maps” to save the world!
Evidence for non-stimulus/response learning (could perhaps be thought of as stimulus-stimulus behavior) was explored in five paradigms:
Latent learning
Vicarious trial and error
Searching for the stimulus
Hypotheses
Spatial orientation

Latent Learning

Vicarious Trial and Error
Basically, when the rat is learning and has to make a choice, there is a pattern of looking back and forth between the options.
They don’t do this at first when the task is really hard, but do once they show learning.
On each task, VTEs appear early on.
“VTEing, as I see it, is evidence that in the critical stages - whether in the first picking up of the instructions or in the later making sure of which stimulus is which - the animal’s activity is not just one of responding passively to discrete stimuli, but rather one of active selecting and comparing of stimuli.”

Searching for the Stimulus
Watch: Werner Herzog get hit by sniper http://www.youtube.com/watch?v=ylXqc8TQ15w
Immediate search for the stimulus that caused the pain.
If the stimulus is quickly removed, rats will fail to learn the association between cues and a fear response.

Hypothesis Testing in the Rat

Spatial Orientation

What does it mean to learn a “model” of the world?
A Computational Framework for Thinking about Nonassociative Learning - Unsupervised Learning
How might something like a “cognitive map” be learned without “a teacher” (so to speak)?
Why would organisms bother learning something that is not directly tied to reward?
One way to think about this problem is to provide a clear articulation of the computational-level account of what the organism is trying to do (e.g., Barlow, 1989).

A Computational Framework for Thinking about Nonassociative Learning - Unsupervised Learning
In information-theoretic terms, our sensory signals are highly redundant across time and space.
To see this, just imagine a high-entropy signal like random white noise.
Clearly our experience has more structure than this high-entropy signal, and the difference can essentially be viewed as the redundancy of the signal.
We normally think of redundancy as lower information content (why not get rid of the redundant part of the signal?).
Barlow’s point is that this redundancy is basically our knowledge about the world. It is the part of the signal that our brains can latch onto as a source of structure.

Key Points from Barlow (1989)
How might organisms make use of this redundancy?
Genetic specification to invariant features of the environment (e.g., taking advantage of light coming from above) ... also learning!
How might the brain learn to take advantage of this structure?
By tracking aspects of experience that are stable across time (i.e., statistics).
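The contrast between white noise and structured experience can be made concrete with a small calculation. The sketch below is illustrative only and is not from Barlow (1989): it assumes a four-symbol alphabet, measures only the first-order (marginal) entropy of each signal, and defines redundancy as 1 - H/H_max. The signals `noise` and `structured` and both function names are hypothetical.

```python
import math
import random
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the empirical symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def redundancy(symbols, alphabet_size):
    """Redundancy as 1 - H / H_max, where H_max = log2(alphabet_size)."""
    return 1.0 - entropy_bits(symbols) / math.log2(alphabet_size)

rng = random.Random(0)  # fixed seed so the run is repeatable

# "White noise": every one of the 4 symbols is equally likely.
noise = [rng.randrange(4) for _ in range(10_000)]

# "Structured" signal (assumed for illustration): symbol 0 occurs about
# 90% of the time, so most of each observation is predictable in advance.
structured = [0 if rng.random() < 0.9 else rng.randrange(1, 4)
              for _ in range(10_000)]

print("noise redundancy:     ", round(redundancy(noise, 4), 3))
print("structured redundancy:", round(redundancy(structured, 4), 3))
```

The noise signal comes out with redundancy near zero, while the structured signal is mostly redundant. That predictable portion is what a learner could, in principle, absorb into a model of the environment. Redundancy across time and space (correlations between successive or neighboring samples) would need joint statistics, which this first-order sketch ignores.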
Mean, Variance, Covariances
“...one might take advantage of covariance by devising a code in which the measured correlations are “expected” in the input, but removed from the output by a suitable set of linear combinations of the input signals”
This is similar to the idea of differentiation.
Alternatively, correlated signals might become fused into a single code (categorical perception, unitization).

A more general approach: (replacing minimum entropy coding with a more intuitive example)
The EM algorithm in this case takes a large set of inputs and discovers a particular decomposition of the features of those items in a simple “code”.
Instead of expressing each item individually, the more compressed code models the stable and variable parts of the input.
from 00000000000000000001 to 01001001

Key Points from Barlow (1989)
Building a model of the regularities in the environment (i.e., an internal code that captures aspects of the statistics in the world) also captures the prior structure.
Learning should largely be about deviation from expectations.
“One can regard the model or map as something automatically held up for comparison with the current input; it is like a negative filter through which incoming messages are automatically passed, so that what emerges is the difference between what is actually happening and what one would expect to happen, based on past experience. In this way, past experience can be made continuously and automatically available.”
We should take this as our computational-level account of what unsupervised learning is designed to do.
It allows the organism to model stable aspects of the world as a way to drive future learning.

2. Mechanism of nonassociative learning

Habituation
Everyone in this classroom knows about habituation since you live in one of the noisiest, dirtiest, most crowded cities on earth ...
or if you really want, you can consider Dahmer’s mom as an example.
Is habituation a form of learning?

Habituation as a Tool for Studying Learning and Memory
pa-pa-pa-pa-pa-pa-pa-pa-pa
pa da
da pa
Longer looking time toward a novel/changed stimulus reveals the ability to detect differences (called dishabituation).

The Basic Neurobiology of Habituation
Note that habituation can be of either an orienting response (kids looking at sounds/pictures) or withdrawal (Aplysia)!

Sensitization
The flip side of habituation is sensitization, the increased sensitivity to stimuli in the environment.
Example: After 9-11 people became temporarily sensitized to the sound of airplanes flying overhead and to seeing them in the sky.
Are they simply two sides of the same coin?

No.
Habituation: specific to a particular stimulus and brain circuit, due to repetition, short- and long-term effects.
Sensitization: general to multiple stimuli and responses, general heightened sensitivity, associated with emotional stimuli, short lasting.

The Basic Neurobiology of Sensitization
Dual process theory

Habituation and Sensitization
Primitive forms of non-associative learning that modulate adaptive behavior.
Help organisms tune out and conserve energy by not processing repeated stimuli.
The converse is that threatening situations can cause a general sensitization to stimuli throughout the nervous system.

Perceptual Learning
Changes in the ability to detect, discriminate, and classify sensory stimuli following extensive experience with those stimuli.
Happens in almost all of the brain pathways we considered in the last lectures (audition, vision, touch, olfaction, etc.).
Non-associative (although it can be aided with rewards).
Unlike habituation, it is not about reducing some behavioral response, but about aiding perceptual analysis of the environment without affecting specific responses.

Perceptual Learning
[Figure labels: perception; perception after learning. Image: Bridget Riley, Movement in Squares, 1961]

Short-term to medium-term forms of perceptual learning: Adaptation
(Learn how to build your own upside-down glasses at http://www.instructables.com/id/Upsidedown-glasses/)

Mechanisms of Perceptual Learning (Goldstone, 1998)
Attentional weighting - an adaptive allocation of computational resources/processing to important parts of the environment, ignoring irrelevant parts (see Barlow’s negative filter idea)
Imprinting - direct adaptation of the perceptual system to the particular pattern of input (similar to priming)
Differentiation - increased differences in particular percepts, breaking apart of stimuli that were once fused, modeling the independent components
Unitization - the converse, grouping items into a single
functional unit that can be triggered in response to particular combinations

Attentional weighting

Mechanisms and Processes of Perceptual Learning (from Rob Goldstone and Joe Lee)
One of the simplest ways in which thoughts distort perception is by highlighting or emphasizing certain aspects that are momentarily important.

Changes in Space: Perceptual Learning
Selective attention can cause a re-weighting of the importance of particular kinds of information that can “warp” our perception.
[Figure: panels A and B]

Changes in Space: Perceptual Learning
Attentional processes are not fully strategic (e.g., Shiffrin & Schneider, 1977).
There is also evidence of negative priming (Tipper, 1992).
[Display: a single T among many L's. Search for T, ignore L.]
[Display: a single Z among many T's. Search for Z, ignore T.]
Previous targets are hard to ignore (they automatically capture attention).
It also goes the opposite way: previous distractors are responded to more slowly than novel items...
effects can last for days.

Imprinting

Imprinting of Whole Stimuli
Storage of entire stimuli in memory.
Central to classic theories of learning and memory generally referred to as instance-based models (Nosofsky, 1986; Logan, 1988).
Evidence for whole-stimulus storage includes:
Spoken word recognition is better in the original voice than in a different voice (Palmeri et al., 1993).
After training in numerosity judgments of dots, response times can become the same for 6-11 dots only if the dots are arranged as they were during training (Palmeri, 1997).
Identification is more accurate for stimuli that have been pre-exposed.

Schyns and Rodet (1997)
[Figures: stimuli built from features X, Y, and XY; training phase; test phase; bar charts of percent correct classifications and percent of each classification (X, Y, XY, X-Y) for the X-Y-XY and XY-X-Y training orders.]

Mere exposure/Latent Learning

Priming and Repetition Suppression
Prior, repeated exposure to a stimulus can lead to easier, faster, and less effortful processing of that stimulus.
Examples include word-stem completion tasks (MOT___ is completed as MOTEL, but given prior exposure to MOTH people fill in MOTH).
A possible neural correlate is the repetition suppression found in neural signals (less activation/activity on repeated presentations of a stimulus).

Theoretical Models of R.S.

Mechanisms and Processes of Perceptual Learning (from Rob Goldstone and Joe Lee)
With expertise or practice, differences are noticed between objects that were once thought to be identical.
With training or experience, differences that seem small become easier to detect.

The 75% acuity threshold (the distance at which you can tell whether there are 1 or 2 dots separated by 0.5 mm) is 3 m.
The 75% acuity threshold (the distance at which you can tell whether there are 1 or 2 lines separated by 0.5 mm) is 20 m!
Why are lines better than dots? ... The answer can be understood in terms of HYPER-ACUITY of the perceptual system.

Stimulus specific.
The only route to improved performance is experience.
Doesn’t require overt discovery of some criterion or feedback.
Even specific to particular regions of the visual field (a low-level effect).

The Computational Basis of Hyperacuity
Key points:
1. A “sloppy” system with low resolution can do surprisingly well!
2. Tiny differences that fall within the receptive field of any individual neuron are captured better in the aggregate action of a large number of overlapping neurons.
3. Different inputs are projected into a “space” or “map”, the dimensionality of which is given by the number of neurons.
4. Way better than a digitized non-overlapping system such as the CCD in your digital camera!

Improvements in discrimination can come from increases in the number of units responding to a particular region of input space.

Another form of differentiation/unitization
Differences among items that fall into different categories are exaggerated, and differences among items that fall into the same category are minimized.

Changes in Space: Perceptual Learning
Categorical Perception Effect
People are better able to discriminate items when they come from different categories than when they come from the same category.
It has both an innate (Eimas et al., 1971) and a learned basis (Lively et al., 1993; Goldstone, 1998).

Changes in Space: Perceptual Learning
Beale & Keil (1995) examined whether individual faces are perceived categorically.
The sharpening is strong for familiar faces compared to unfamiliar faces (although this is a bad example... Arnold is pretty well known now!).
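The hyperacuity key points above (a coarse but overlapping code resolving differences far smaller than any single receptive field) can be sketched in a few lines. This is a toy illustration under assumed values, not a model from the lecture: the Gaussian tuning curves, the unit spacing of 1.0, the tuning width of 1.5, and the center-of-mass decoder are all assumptions chosen for simplicity.

```python
import math

def population_response(x, centers, width):
    """Responses of overlapping units with Gaussian tuning curves."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]

def decode_center_of_mass(responses, centers):
    """Estimate stimulus position as the response-weighted mean of the centers."""
    total = sum(responses)
    return sum(r * c for r, c in zip(responses, centers)) / total

def nearest_bin(x, centers):
    """Non-overlapping 'pixel' code: report only the index of the closest unit."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

centers = [float(i) for i in range(11)]  # units spaced 1.0 apart (assumed)
width = 1.5                              # each unit is broadly tuned (assumed)

x1, x2 = 5.10, 5.15  # two stimuli only 0.05 apart, far below the unit spacing

est1 = decode_center_of_mass(population_response(x1, centers, width), centers)
est2 = decode_center_of_mass(population_response(x2, centers, width), centers)

print("overlapping code estimates:", round(est1, 3), round(est2, 3))
print("non-overlapping bins:      ", nearest_bin(x1, centers), nearest_bin(x2, centers))
```

In this sketch the non-overlapping code assigns both stimuli to the same bin, so the difference is lost, while the pattern of activity across the broadly tuned units still separates them. Adding more units over the same region would be one way to picture the slide's point about improving discrimination by increasing the number of responding units.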
Computational Basis (Warping of Internal Codes)

Computational Basis (Selective Attention)
[Figure: panels A and B]

Mechanisms and Processes of Perceptual Learning (from Rob Goldstone and Joe Lee)
We create perceptual dimensions (size, brightness, saturation, eccentricity) by witnessing variation along these dimensions. We tend to order objects by their value on dimensions that we create or already possess. Objects that are originally perceived holistically, without being decomposed into separate dimensions, come to be perceived analytically, in terms of their underlying dimensions.

Dimensionalization
Dimensions that are easy for adults to see are hard for kids. For example, brightness and size appear fused (processed holistically) by young children (Smith, 1989).
Children have trouble selectively attending to certain dimensions while ignoring others (for example size and color) (Smith & Evans, 1989), and their sorting behavior tends to be more “holistic”.
Things like brightness and saturation are typically considered integral in the sense that you can’t ignore one and process the other; however, with extensive training, people can selectively attend to one or the other (Goldstone, 1994).

Mechanisms and Processes of Perceptual Learning (from Rob Goldstone and Joe Lee)
If a group of shapes form a coherent, contiguous pattern and are often repeated together, a single chunk or unit will be formed by concatenating them together. In some ways, unitization is the opposite of dimensionalization and segmentation. Whereas these latter mechanisms break down an object into parts or dimensions, unitization creates a single unit from multiple parts that occur together.
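One common way to formalize selective attention to dimensions, as in the Computational Basis (Selective Attention) and Dimensionalization slides above, is an attention-weighted distance with similarity that decays exponentially with distance. The sketch below follows the general form used in exemplar models of the kind cited earlier (Nosofsky, 1986), but the stimuli, the (size, brightness) coordinates, the weights, and the function names are hypothetical values chosen for illustration.

```python
import math

def weighted_distance(a, b, weights):
    """City-block distance with one attention weight per stimulus dimension."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def similarity(a, b, weights, sensitivity=1.0):
    """Similarity falls off exponentially with attention-weighted distance."""
    return math.exp(-sensitivity * weighted_distance(a, b, weights))

# Hypothetical stimuli described as (size, brightness).
item_a = (0.4, 0.2)
item_b = (0.6, 0.2)  # differs from item_a only in size
item_c = (0.4, 0.8)  # differs from item_a only in brightness

equal_attention = (0.5, 0.5)  # both dimensions weighted equally
attend_to_size = (0.9, 0.1)   # assumed state after learning that size matters

for label, w in [("equal", equal_attention), ("size", attend_to_size)]:
    print(label,
          "d(a,b) =", round(weighted_distance(item_a, item_b, w), 3),
          "d(a,c) =", round(weighted_distance(item_a, item_c, w), 3))
```

Shifting weight toward size stretches the space along that dimension, so the small size difference between `item_a` and `item_b` counts for more, and shrinks it along brightness, so the larger brightness difference between `item_a` and `item_c` counts for less. If size were the dimension that separates two categories, this re-weighting would exaggerate between-category differences and minimize within-category ones, which is the pattern described for categorical perception.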
Unitization

Unitization
In a complex conjunctive search task, Shiffrin and Lightfoot (1997) showed prolonged improvements (decreasing search slopes) over sessions, suggesting unitization and more effective processing of the target.

Unitization
[Figure: one display type is harder than the other]

Mechanisms and Processes of Perceptual Learning (from Rob Goldstone and Joe Lee)
We naturally see objects as being composed out of parts. Instead of perceiving indivisible objects, we perceive objects in terms of their labelled or categorized parts. We break objects into parts that we have learned are relevant or important.

Pevtzow and Goldstone (1994)

Key Summary
There are a variety of mechanisms for non-associative learning in the human and animal brain.
These types of learning can be viewed as adaptation to the environment to enable an efficient coding of experience.
These forms of learning extend across multiple areas and levels of behavior (forms of cognitive learning, low-level perceptual effects).
In addition, these very simple forms of learning appear to manifest in lower species as well.

Key Principles for the Semester
Learning and memory are closely related and intertwined states of information processing.
Major insights about learning and memory have come from studies of the brain.
The concept of multiple memory systems unifies the study of learning and memory.
The underlying bases of learning and memory are the same in humans and animals.
Our theoretical approaches to studying learning are always closely tied to technological advances that are unfolding in general society (e.g., today - machine learning).

Readings
Textbook reading: Gluck, Ch. 7 - Classical Conditioning
Rescorla, R.A. (1988) “Pavlovian Conditioning: It’s not what you think it is” American Psychologist, 43(3), 151-160.
Rescorla, R.A. and Wagner, A.R. (1972) “A theory of Pavlovian Conditioning: Variations in the Effectiveness of Reinforcement and Non-reinforcement” in Black, A.H.
& Prokasy, W.F. (Eds.), Classical conditioning II: Current research and theory (pp. 64-99). New York: Appleton-Century-Crofts.
Clark, R.E. and Squire, L.R. (1998) “Classical Conditioning and Brain Systems: The Role of Awareness” Science, 280, 77-81.
Dayan, P., Kakade, S. and Montague, P.R. (2000) “Learning and selective attention” Nature Neuroscience, 3, 1218-1223.
Pearce, J.M. and Hall, G. (1980) “A Model for Pavlovian Learning: Variations in the Effectiveness of Conditioned but Not of Unconditioned Stimuli” Psychological Review, 87, 532-552.

References for Slides
Eichenbaum, H. (2008) Learning & Memory, New York, NY: WW Norton and Company.
Gluck, M.A., Mercado, E., & Myers, C.E. (2008) Learning & Memory: From Brain to Behavior, New York, NY: Worth Publishers.
Barlow, H.B. (1989) “Unsupervised Learning” Neural Computation, 1, 295-311.
Tolman, E.C. (1948) “Cognitive Maps in Rats and Men” Psychological Review, 55(4), 189-208.
Goldstone, R.L. (1998) “Perceptual Learning” Annual Review of Psychology, 49, 585-612.
Schyns, P.G., & Rodet, L. (1997) “Categorization creates functional features” Journal of Experimental Psychology: Learning, Memory & Cognition, 23, 681–696.
Lecture notes from Rob Goldstone, Brad Love