Download Slides - NYU Computation and Cognition Lab

An Introduction to Learning
Lecture 3/13
Todd M. Gureckis
Department of Psychology
New York University
1
Agenda for Today
Simple forms of non-associative learning
Habituation/Sensitization
Priming
Imprinting
Sensory adaptation
Perceptual Learning
Unsupervised Learning
2
Non-associative Learning
Most of the forms of learning we will cover in the next few weeks
(classical/instrumental conditioning) are about learning the
relationship between actions or stimuli and particular
consequences
Other forms of learning have no obvious relationship to reward/
consequence
Interesting place to start because the nature of these processes is
much less understood (neurally speaking), but they are also a key
component of contemporary machine learning research
(unsupervised learning)
The reason for the lack of neural understanding is that the processes are
largely distributed throughout the brain (visual areas for perceptual learning,
higher cortex for habituation, reflex systems for other forms of habituation)
3
1. Tolman.
4
Tolman (1948)
“Cognitive Maps” to save the world!
Evidence for non-stimulus/response learning (could perhaps be
thought of as stimulus-stimulus behavior) was explored in five
paradigms
Latent learning
Vicarious trial and error
Searching for the stimulus
Hypotheses
Spatial Orientation
5
Latent Learning
6
Vicarious Trial and Error
Basically, when the rat is learning and has to
make a choice, there is a pattern of looking
back and forth between the options
They don’t do this at first when the task is
really hard, but do once they show learning
On each task, VTEs appear early on
“VTEing, as I see it, is evidence that in the
critical stages - whether in the first picking up
of the instructions or in the later making sure
of which stimulus is which - the animal’s
activity is not just one of responding
passively to discrete stimuli, but rather one
of active selecting and comparing of stimuli.”
7
Searching for the Stimulus
Watch: Werner Herzog get hit by sniper
http://www.youtube.com/watch?v=ylXqc8TQ15w
Immediate search for the stimulus that caused the pain
If the stimulus is quickly removed, rats will fail to learn the association
between cues and a fear response
8
Hypothesis Testing in the Rat
9
Spatial Orientation
11
What does it mean to learn a “model” of the world?
A Computational Framework for Thinking about Nonassociative Learning - Unsupervised Learning
How might something like a “cognitive map” be learned
without “a teacher” (so to speak)?
Why would organisms bother learning something that is not
directly tied to reward?
One way to think about this problem is to provide a clear
articulation of the computational level account of what the
organism is trying to do (e.g., Barlow, 1989)
12
A Computational Framework for Thinking about Nonassociative Learning - Unsupervised Learning
In information theoretic terms, our sensory signals are highly
redundant across time and space
To see this, just imagine a high entropy signal like random
white noise
Clearly our experience has more structure than this high-entropy
signal, and the difference can essentially be viewed as
the redundancy of the signal.
We normally think of redundancy as lower information content
(why not get rid of the redundant part of the signal?)
Barlow’s point is that this redundancy is basically our
knowledge about the world. It is the part of the signal that our
brains can latch onto as a source of structure
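One rough way to put a number on this idea is sketched here. It is only an illustration: it measures redundancy as the unused fraction of the maximum possible entropy, and it looks only at how often each symbol occurs (the signals and the four-symbol alphabet are made up). Redundancy across time and space, which is what Barlow emphasizes, would require joint statistics over neighboring samples as well.

```python
import math
from collections import Counter


def entropy_bits(signal):
    """Empirical Shannon entropy (bits per symbol) of a sequence."""
    counts = Counter(signal)
    total = len(signal)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(signal, alphabet_size):
    """Fraction of the maximum possible entropy that the signal does not use."""
    max_entropy = math.log2(alphabet_size)
    return 1.0 - entropy_bits(signal) / max_entropy


# Hypothetical signal that uses four symbols equally often.
# Only symbol frequencies are measured here, so ordering is ignored.
uniform_signal = "abcd" * 25
# Hypothetical structured signal dominated by a single symbol.
structured_signal = "a" * 97 + "bcd"

print(redundancy(uniform_signal, 4))     # no redundancy at this level of description
print(redundancy(structured_signal, 4))  # most of the available capacity is unused
```

The second signal is far more predictable than the first, and that predictability is exactly the part a learner could exploit.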
13
Key Points from Barlow (1989)
How might organisms make use of this redundancy?
Genetic specification to invariant features of the environment
(e.g., taking advantage of light coming from above) ... also
Learning!
How might the brain learn to take advantage of this structure?
By tracking aspects of experience that are stable across
time (i.e., statistics).
Mean, Variance, Covariances
“...one might take advantage of covariance by devising a code
in which the measured correlations are “expected” in the
input, but removed from the output by a suitable set of linear
combinations of the input signals” --- Similar to idea of
differentiation
Alternatively, correlated signals might become fused into a
single code (categorical perception, unitization)
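The covariance idea in the quote can be sketched with two made-up sensory channels. This is a minimal illustration, not Barlow's actual coding scheme: it estimates the covariance between the channels and then forms a linear combination whose output no longer covaries with the first channel. The channel values and function names are assumptions chosen for the example.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def decorrelate(xs, ys):
    """Remove from ys the part that is linearly predictable from xs."""
    slope = covariance(xs, ys) / covariance(xs, xs)
    mx, my = mean(xs), mean(ys)
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


# Two hypothetical sensory channels that tend to rise and fall together.
channel_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
channel_b = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

residual = decorrelate(channel_a, channel_b)
print(covariance(channel_a, channel_b))  # clearly positive
print(covariance(channel_a, residual))   # approximately zero
```

The expected correlation is "built into" the slope, so the output carries only what the first channel could not predict.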
14
A more general approach: (replacing minimum entropy
coding with a more intuitive example)
The EM algorithm in this
case takes a large set of
inputs and discovers a
particular decomposition of
the features of those items in
a simple “code”
Instead of expressing each
item individually, the more
compressed code models
the stable and variable part
of the input
15
from
00000000000000000001
to
01001001
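As a stand-in for the slide's example, here is a minimal sketch of the EM algorithm on a deliberately simple problem: a two-component, one-dimensional Gaussian mixture fit to made-up data. It is not the binary feature code shown on the slide. The point is only that, without a teacher, EM recovers a compact description in which each cluster mean plays the role of the stable part of the input and each variance the variable part.

```python
import math


def gaussian(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_clusters(data, steps=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    mus = [min(data), max(data)]  # crude initial guesses
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(steps):
        # E-step: how responsible is each component for each data point?
        resp = []
        for x in data:
            likes = [w * gaussian(x, m, v)
                     for w, m, v in zip(weights, mus, variances)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: re-estimate each component from its weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing component
            weights[k] = nk / len(data)
    return mus, variances, weights


# Hypothetical unlabeled inputs that happen to fall into two groups.
data = [0.8, 1.1, 0.9, 1.2, 1.0, 4.9, 5.2, 5.0, 5.1, 4.8]
mus, variances, weights = fit_two_clusters(data)
print(mus, variances, weights)
```

After fitting, each item can be described by which cluster it belongs to plus a small deviation, rather than being stored individually.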
Key Points from Barlow (1989)
Building a model of the regularities in environment (i.e., an
internal code that captures aspects of the statistics in the
world) also captures the prior structure
Learning should largely be about deviation from expectations
“One can regard the model or map as something automatically
held up for comparison with the current input; it is like a
negative filter through which incoming messages are
automatically passed, so that what emerges is the difference
between what is actually happening and what one would
expect to happen, based on past experience. In this way, past
experience can be made continuously and automatically
available.”
We should take this as our computational-level account of
what unsupervised learning is designed to do. It allows the
organism to model stable aspects of the world as a way to
drive future learning.
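The "negative filter" can be sketched as a running expectation that is subtracted from each incoming value. This is a toy illustration under simple assumptions (a single scalar input, an exponentially weighted average as the "model", and an arbitrary learning rate), not a claim about how the brain implements it.

```python
def negative_filter(inputs, learning_rate=0.3):
    """Return the sequence of (input - expectation) differences."""
    expectation = 0.0
    surprises = []
    for value in inputs:
        surprise = value - expectation           # what the model did not expect
        surprises.append(surprise)
        expectation += learning_rate * surprise  # nudge the expectation toward the input
    return surprises


# Hypothetical steady signal that changes abruptly at step 20.
signal = [1.0] * 20 + [3.0] * 5
out = negative_filter(signal)
print(round(out[0], 3), round(out[19], 3), round(out[20], 3))
```

While the input is stable the output shrinks toward zero; when the world changes, the output is large again. What emerges is the difference between what is happening and what past experience predicts.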
16
2. Mechanism of nonassociative learning.
17
Habituation
Everyone in this classroom knows about habituation since you live
in one of the noisiest, dirtiest, most crowded cities on earth
... or if you really want, you can consider Dahmer’s mom as an
example
Is habituation a form of learning?
19
Habituation as a Tool for studying
Learning and Memory
pa-pa-pa-pa-pa-pa-pa-pa-pa
pa
da
Longer looking time toward novel/
changed stimulus
reveals ability to detect differences
(called dishabituation)
25
The Basic Neurobiology
of Habituation
Note that habituation can apply both to an
orienting response (kids looking at sounds/
pictures) and to withdrawal (Aplysia)!
29
Sensitization
The flip side of habituation is sensitization, the increased
sensitivity to stimuli in the environment
Examples:
After 9-11 people became temporarily sensitized to the sound
of airplanes flying overhead and to seeing them in the sky
Are they simply two sides of the same coin?
30
No.
Habituation
Specific to particular stimulus, brain
circuit, due to repetition, short and
long-term effects
Sensitization
General to multiple stimuli and
response, general heightened
sensitivity, associated with emotional
stimuli, short lasting
31
The Basic Neurobiology
of Sensitization
Dual process
theory
33
Habituation and Sensitization
Primitive forms of non-associative learning that modulate adaptive
behavior
Help organisms tune out and conserve energy by not processing
repeated stimuli
Converse is that threatening situations can cause a general
sensitization to stimuli throughout the nervous system
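The contrast between the two processes can be sketched with a toy simulation. It is a simplification loosely in the spirit of dual process theory, not a fitted model: habituation is tracked separately for each stimulus and grows with repetition, while sensitization is a single general arousal term that is raised by an aversive event and decays quickly. All names and parameter values are assumptions made for the example.

```python
def simulate(events, habituation_rate=0.3, arousal_decay=0.5):
    """events: list of stimulus names; 'SHOCK' marks an aversive event.

    Returns a list of (stimulus, response_strength) pairs.
    """
    habituation = {}  # stimulus-specific: grows with repetition
    arousal = 0.0     # general: raised by aversive events, decays quickly
    responses = []
    for stimulus in events:
        if stimulus == "SHOCK":
            arousal = 1.0
            continue
        h = habituation.get(stimulus, 0.0)
        response = (1.0 - h) * (1.0 + arousal)
        responses.append((stimulus, response))
        habituation[stimulus] = h + habituation_rate * (1.0 - h)
        arousal *= arousal_decay
    return responses


# Hypothetical session: a repeated tone, a novel light, then an aversive event.
trial = ["tone"] * 5 + ["light"] + ["SHOCK", "tone", "light"]
results = simulate(trial)
for stimulus, strength in results:
    print(stimulus, round(strength, 3))
```

In this sketch the response to the repeated tone declines, the novel light still gets a full response (stimulus specificity), and the aversive event temporarily boosts responding to both stimuli (generality).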
34
Perceptual Learning
35
Perceptual Learning
Changes in the ability to detect,
discriminate, and classify sensory
stimuli following extensive experience
with those stimuli
Happens in almost all of the brain
pathways we considered in the last
lectures (audition, vision, touch,
olfaction, etc...)
Non-associative (although can be
aided with rewards)
Unlike habituation, not about reducing
some behavioral response, but aiding
in perceptual analysis of the
environment without affecting
specific responses
36
Perceptual Learning
[Figure labels: perception; perception after learning]
Bridget Riley, Movement in Squares, 1961
37
Short-term to medium-term forms of perceptual
learning: Adaptation
**learn how to build your own upside down
glasses at http://www.instructables.com/id/Upsidedown-glasses/
38
Mechanisms of Perceptual Learning (Goldstone, 1998)
Attentional weighting - an adaptive allocation of
computational resources/processing to important parts of
the environment, ignoring irrelevant parts (see Barlow’s
negative filter idea)
Imprinting - direct adaptation of the perceptual system to
the particular pattern of input (similar to priming)
Differentiation - increased differences in particular
percepts, breaking apart of stimuli that were once fused,
modeling the independent components
Unitization - the converse, grouping items into a single
functional unit that can be triggered in response to
particular combinations
39
Attentional weighting
40
Mechanisms and Processes of Perceptual
Learning (from Rob Goldstone and Joe Lee)
One of the simplest ways in which thoughts distort perception is by highlighting or
emphasizing certain aspects that are momentarily important.
41
Changes in Space: Perceptual Learning
Selective attention can cause a re-weighting of the importance of
particular kinds of information that can “warp” our perception
A
B
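One common way to formalize this "warping" is a distance measure with one attention weight per dimension. The sketch below is a generic illustration with hypothetical stimuli and weights, not a specific published model: shifting weight toward one dimension stretches differences along it and shrinks differences along the others.

```python
import math


def weighted_distance(a, b, attention):
    """Euclidean distance with one attention weight per dimension."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(attention, a, b)))


# Hypothetical stimuli described by (size, brightness).
item_a = (1.0, 1.0)
item_b = (2.0, 1.0)  # differs from A in size only
item_c = (1.0, 2.0)  # differs from A in brightness only

equal = (0.5, 0.5)         # attention spread evenly
size_focused = (0.9, 0.1)  # attention shifted toward size

print(weighted_distance(item_a, item_b, equal),
      weighted_distance(item_a, item_c, equal))
print(weighted_distance(item_a, item_b, size_focused),
      weighted_distance(item_a, item_c, size_focused))
```

With even attention the two pairs are equally far apart; with attention on size, the size difference looms larger than the brightness difference.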
42
Changes in Space: Perceptual Learning
Attentional processes not fully strategic (e.g.
Shiffrin & Schneider, 1977)
Also evidence of negative priming (Tipper, 1992)
[Figure: search display of L distractors containing a single T]
Search for T, ignore L
43
Changes in Space: Perceptual Learning
Attentional processes not fully strategic (e.g.
Shiffrin & Schneider, 1977)
Also evidence of negative priming (Tipper, 1992)
[Figure: search display of T distractors containing a single Z]
Search for Z, ignore T
... Previous targets are hard to ignore
(they automatically capture
attention).
It also goes the opposite way: previous
distractors are responded to more slowly
than novel items... effects can last for
days
45
Imprinting
46
Imprinting of Whole Stimuli
Storage of entire stimuli in memory
Central to classic theories of learning and memory generally referred to as
instance-based models (Nosofsky, 1986; Logan, 1988)
Evidence for whole-stimulus storage includes:
Spoken word recognition is better in the original voice than in a
different voice (Palmeri et al, 1993)
After training in numerosity judgments of dots, response times
can become the same for 6-11 dots only if the dots are
arranged as they were during training (Palmeri, 1997)
Identification of previously presented stimuli is higher for things
that have been preexposed
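The core idea of an instance-based model can be sketched in a few lines. This is a stripped-down illustration in the spirit of the exemplar models cited above, not the published formulation: every training item is stored whole, and a new item is classified by its summed similarity to the stored instances of each category. The stimuli, labels, and sensitivity value are hypothetical.

```python
import math


def similarity(a, b, sensitivity=2.0):
    """Similarity decays exponentially with city-block distance."""
    distance = sum(abs(x - y) for x, y in zip(a, b))
    return math.exp(-sensitivity * distance)


def classify(probe, memory):
    """memory: list of (stimulus, label) instances stored whole.

    Returns a dict mapping each label to its choice probability.
    """
    evidence = {}
    for stimulus, label in memory:
        evidence[label] = evidence.get(label, 0.0) + similarity(probe, stimulus)
    total = sum(evidence.values())
    return {label: value / total for label, value in evidence.items()}


# Hypothetical stored instances from two categories.
memory = [((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
          ((0.9, 0.8), "B"), ((0.8, 0.9), "B")]

near_a = classify((0.15, 0.15), memory)
midway = classify((0.5, 0.5), memory)
print(near_a)
print(midway)
```

No summary of the categories is ever computed; the behavior comes entirely from the stored instances, which is why such models predict advantages for stimuli that match earlier experiences closely.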
47
Stimuli
[Figure: stimuli X, Y, and XY]
48
Schyns and Rodet (1997)
Training Phase
[Figure: training categories 1, 2, and 3 involving X, Y, and XY]
49
Schyns and Rodet (1997)
Test Phase
[Figure: test items X, Y, XY, and X-Y]
50
Schyns and Rodet (1997)
Test Phase
[Figure: test-phase results for the X-Y-XY and XY-X-Y groups. Axis labels: Percent Correct Classifications (X, Y, XY); Percent Each Classification (XY, X, Y) for X-Y]
51
Schyns and Rodet (1997)
Mere exposure/Latent Learning
53
Priming and Repetition Suppression
Prior, repeated exposure to a
stimulus can lead to easier, faster,
and less effortful processing of that
stimulus
Examples include word-stem
completion tasks (MOT___ is
completed as MOTEL, but given
prior exposure to MOTH people fill
in MOTH)
A possible neural correlate is the
repetition suppression found in
neural signals (less activation/
activity on repeated presentations
of a stimulus)
54
Theoretical Models of R.S.
55
Mechanisms and Processes of Perceptual
Learning (from Rob Goldstone and Joe Lee)
With expertise or practice, differences are noticed between objects that were once thought to
be identical.
56
With training or experience, differences
that seem small become easier to detect
57
58
The 75% acuity threshold (place where you can
tell 1 or 2 dots separated by .5mm) is 3m
The 75% acuity threshold (place where you can
tell 1 or 2 lines separated by .5mm) is 20m!
Why are lines better than dots? ... the
answer can be understood in terms of
HYPER-ACUITY of the perceptual system
59
Stimulus specific
Only route to improved
performance is experience
Doesn’t require overt
discovery of some criterion
or feedback
Even specific to particular
regions of the visual field (low
level effect)
60
The Computational Basis of Hyperacuity
Key points:
1. A “sloppy” system with low
resolution can do surprisingly
well!
2. Tiny differences that fall
within the receptive field of
any individual neuron are
captured in a better way in the
aggregate action of a large
number of overlapping
neurons.
3. Different inputs are
projected into a “space” or
“map” the dimensionality of
which is given by the number
of neurons
4. Way better than a digitized
non-overlapping system such
as the CCD in your digital
camera!
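These points can be illustrated with an idealized, noise-free sketch. It assumes a row of receptors with broad Gaussian tuning curves and a simple response-weighted average as the read-out; neither is meant as a claim about the actual neural code. For comparison, a pixel-like system with non-overlapping bins is decoded the same way.

```python
import math

CENTERS = [float(i) for i in range(11)]  # receptor positions, spacing = 1.0


def overlapping_responses(position, width=1.0):
    """Broad, overlapping (Gaussian) receptive fields."""
    return [math.exp(-((position - c) ** 2) / (2 * width ** 2)) for c in CENTERS]


def pixel_responses(position):
    """Non-overlapping 'pixels': only the bin containing the stimulus responds."""
    return [1.0 if abs(position - c) <= 0.5 else 0.0 for c in CENTERS]


def decode(responses):
    """Population read-out: response-weighted average of receptor positions."""
    return sum(c * r for c, r in zip(CENTERS, responses)) / sum(responses)


# Two hypothetical stimulus positions much closer together than the receptor spacing.
for stimulus in (4.3, 4.4):
    coarse = decode(overlapping_responses(stimulus))
    pixels = decode(pixel_responses(stimulus))
    print(stimulus, round(coarse, 3), round(pixels, 3))
```

The overlapping population separates two positions that are only a tenth of the receptor spacing apart, while the pixel-like system maps both onto the same bin. Real neurons are noisy, so the achievable precision would be lower than in this sketch.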
61
Improvements in
discrimination can come
from increases in the
number of units responding
to a particular region of
input space
62
63
Another form of differentiation/unitization
Differences among items that fall into different categories are exaggerated, and differences
among items that fall into the same category are minimized.
64
Changes in Space: Perceptual Learning
Categorical Perception Effect
People are better able to
discriminate items
when they come
from different categories than
when they come from the
same category
An innate (Eimas et al. 1971)
and learned basis (Lively et al,
1993; Goldstone, 1998)
65
Changes in Space: Perceptual Learning
Beale & Keil (1995) examined if
individual faces are perceived
categorically
Sharper for familiar
faces than for unfamiliar
faces (although this is a bad
example... Arnold is pretty
well known now!)
66
Computational Basis (Warping of Internal Codes)
67
Computational Basis (Selective Attention)
A
B
68
Mechanisms and Processes of Perceptual
Learning (from Rob Goldstone and Joe Lee)
We create perceptual dimensions (size, brightness, saturation, eccentricity) by witnessing
variation along these dimensions. We tend to order objects by their value on dimensions that
we create or already possess. Objects that are originally perceived holistically, without being
decomposed into separate dimensions, come to be perceived analytically, in terms of their
underlying dimensions.
69
Dimensionalization
Dimensions that are easy for adults to see are hard for kids. For example,
brightness and size appear fused (processed holistically) by young children
(Smith, 1989)
Children have trouble selectively attending to certain dimensions while
ignoring others (for example size and color) (Smith & Evans, 1989) and their
sorting behavior tends to be more “wholistic”
Things like brightness and saturation are typically considered integral in the
sense that you can’t ignore one and process the other, however with
extensive training, people can selectively attend to one or the other
(Goldstone, 1994)
70
Mechanisms and Processes of Perceptual
Learning (from Rob Goldstone and Joe Lee)
If a group of shapes form a coherent, contiguous pattern and are often repeated together, a
single chunk or unit will be formed by concatenating them together. In some ways, unitization
is the opposite of dimensionalization and segmentation. Whereas these latter mechanisms
break down an object into parts or dimensions, unitization creates a single unit from multiple
parts that occur together.
71
Unitization
72
Unitization
In a complex conjunctive search
task, Shiffrin and Lightfoot
(1997) showed prolonged
decreases in
search slopes over sessions,
suggesting unitization and
more effective processing of
the target
73
Unitization
Harder than
74
Mechanisms and Processes of Perceptual
Learning (from Rob Goldstone and Joe Lee)
We naturally see objects as being composed out of parts. Instead of perceiving indivisible objects, we perceive objects in
terms of their labelled or categorized parts. We break objects into parts that we have learned are relevant or important.
75
Pevtzow and Goldstone (1994)
76
Key Summary
There are a variety of mechanisms for non-associative learning
in the human and animal brain
These types of learning can be viewed as adaptation to the
environment to enable an efficient coding of experience
These forms of learning extend across multiple areas and levels
of behavior (forms of cognitive learning, low-level perceptual
effects)
In addition, these very simple forms of learning appear to
manifest in lower species as well.
77
Key Principles for the Semester
Learning and memory are closely related and intertwined
states of information processing
Major insights about learning and memory have come from
studies of the brain
The concept of multiple memory systems unifies the study
of learning and memory
The underlying bases of learning and memory are the same
in humans and animals
Our theoretical approaches to studying learning are always
closely tied to technological advances that are unfolding in
general society (e.g., today - machine learning)
78
Readings
Textbook reading: Gluck, Ch. 7 - Classical Conditioning
Rescorla, R.A. (1988) “Pavlovian Conditioning: It’s not what you think it is” American
Psychologist, 43(3), 151-160.
Rescorla, R.A. and Wagner, A.R. (1972) “A theory of Pavlovian Conditioning:
Variations in the Effectiveness of Reinforcement and Non-reinforcement” in Black,
A.H. & Prokasy, W.F. (Eds.), Classical conditioning II: Current research and theory
(pp. 64-99). New York: Appleton-Century-Crofts.
Clark, R.E. and Squire, L.R. (1998) “Classical Conditioning and Brain Systems: The
Role of Awareness” Science, 280, 77-81.
Dayan, P., Kakade, S. and Montague, P.R. (2000) “Learning and selective attention”
Nature Neuroscience, 3, 1218-1223.
Pearce, J.M. and Hall, G. (1980) “A Model for Pavlovian Learning: Variations in the
Effectiveness of Conditioned but Not of Unconditioned Stimuli” Psychological Review,
87, 532-552.
79
References for Slides
Eichenbaum, H. (2008) Learning & Memory, New York, NY: WW Norton and Company.
Gluck, M.A., Mercado, E., & Myers, C.E. (2008) Learning & Memory: From Brain to
Behavior, New York, NY: Worth Publishers.
Barlow, H.B. (1989) “Unsupervised Learning” Neural Computation, 1, 295-311.
Tolman, E.C. (1948) “Cognitive Maps in Rats and Men” Psychological Review, 55(4), 189-208.
Goldstone, R.L. (1998) "Perceptual Learning" Annual Review of Psychology, 49, 585-612.
Schyns, P. G., & Rodet, L. (1997). Categorization creates functional features. Journal of
Experimental Psychology: Learning, Memory & Cognition, 23, 681–696.
Lecture notes from Rob Goldstone, Brad Love
80