Ling 411 – 19
Learning
REVIEW
Operations in relational networks
 Relational networks are dynamic
 Activation moves along lines and through nodes
 Links have varying strengths
• A stronger link carries more activation, other
things being equal
 All nodes operate on two principles:
• Integration
 Of incoming activation
• Broadcasting
 To other nodes
Review
Operation of the Network
in terms of cortical columns
 The linguistic system operates as distributed
processing of multiple individual components
• “Nodes” in an abstract model
• These nodes are implemented as cortical columns
 Columnar Functions
• Integration: A column is activated if it receives enough
activation from other columns
 Can be activated to varying degrees
 Can keep activation alive for a period of time
• Broadcasting: An activated column transmits activation to
other columns
 Excitatory – contribution to higher level
 Inhibitory – dampens competition at same level
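The two columnar functions above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the lecture: the `Node` class, its field names, the all-or-nothing handling of the threshold, and the numeric strengths are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cortical column in the abstract model."""
    name: str
    threshold: float = 1.0
    # Outgoing links: target name -> strength (assumed encoding).
    # Positive = excitatory (to a higher level), negative = inhibitory (same level).
    links: dict = field(default_factory=dict)

    def integrate(self, incoming):
        """Integration: sum incoming activation; respond only if the threshold is met."""
        total = sum(incoming)
        return total if total >= self.threshold else 0.0

    def broadcast(self, activation):
        """Broadcasting: send activation along every outgoing link, scaled by its strength."""
        return {target: activation * strength for target, strength in self.links.items()}


column = Node("AB", threshold=1.0, links={"higher_level": 0.5, "same_level_rival": -0.25})
level = column.integrate([0.5, 0.75])   # joint input satisfies the threshold
outgoing = column.broadcast(level)      # excitatory upward, inhibitory sideways
```

A stronger link carries more of the activation, other things being equal, which is why `broadcast` multiplies by the link strength.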
Additional operations: Learning
 Links get stronger when they are successfully used
(Hebbian learning)
• Learning consists of strengthening them
• Hebb 1949
 Threshold adjustment
• When a node is recruited its threshold increases
• Otherwise, nodes would be too easily satisfied
Neural processes for learning
 Basic principle: when a connection is successfully
used, it becomes stronger
• Successfully used if another connection to same
node is simultaneously active
 Mechanisms of strengthening
• Biochemical changes at synapses
• Growth of dendritic spines
• Formation of new synapses
 Weakening: when neurons fire independently of
each other their mutual connections (if any) weaken
Neural processes for learning
[Figure: nodes A and B each connect to node C; the synapses on C get strengthened]
If connections AC and BC are active at the same time, and
if their joint activation is strong enough to activate C, they
both get strengthened
(adapted from Hebb)
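The strengthening rule in this diagram can be written out as a small function. This is a minimal sketch under stated assumptions: the function name `hebbian_step`, the fixed `increment`, and the numeric strengths are invented for illustration, and weakening of unused links is left out.

```python
def hebbian_step(strengths, active, threshold, increment=0.25):
    """Return updated strengths of the links into one target node (C).

    strengths: link name -> current strength, e.g. {"AC": 0.5, "BC": 0.5}
    active:    set of link names carrying activation right now
    """
    joint = sum(strengths[name] for name in active)
    if joint < threshold:
        return dict(strengths)  # C is not activated, so nothing is strengthened
    # C is activated: every link that contributed gets stronger
    return {name: s + increment if name in active else s
            for name, s in strengths.items()}


links_into_c = {"AC": 0.5, "BC": 0.5}
together = hebbian_step(links_into_c, {"AC", "BC"}, threshold=1.0)  # both strengthened
alone = hebbian_step(links_into_c, {"AC"}, threshold=1.0)           # unchanged
```

With AC active alone, the joint activation stays below C's threshold, so neither link changes; this is the sense in which a link counts as "successfully used" only when another connection to the same node is simultaneously active.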
Requirements that must be assumed
(implied by the Hebbian learning principle)
 Prerequisites:
• Initially, connection strengths are very weak
 Term: Latent Links
• They must be accompanied by nodes
 Term: Latent Nodes
• Latent nodes and latent connections must
be available for learning anything learnable
 The Abundance Hypothesis
• Abundant latent links
• Abundant latent nodes
Abundance is a property of biological systems generally
 Cf.: Acorns falling from an oak tree
 Cf.: A sea tortoise lays thousands of eggs
• Only a few will produce viable offspring
 Cf. Edelman: “silent synapses”
• The great preponderance of cortical
synapses are “silent” (i.e., latent)
 Electrical activity sent from a cell body to its
axon travels to thousands of axon branches,
even though only one or a few of them may
lead to downstream activation
Learning – The Basic Process
[Diagram sequence; labels: latent nodes; latent links; dedicated nodes and links]
 Let these links get activated
 Then these nodes will get activated
 That will activate these links
 This node gets enough activation to satisfy its threshold
 This node is therefore recruited
• These links (from A and B) get strengthened and the node’s
threshold gets raised
 This node is now dedicated to function AB
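The recruitment steps above can be simulated on a small pool of latent nodes. The sketch below is an assumption-laden illustration, not the lecture's own formalism: the dictionary layout, the function name `recruit`, and the values of `boost` and `threshold_raise` are hypothetical.

```python
def recruit(latent_nodes, inputs, boost=0.5, threshold_raise=0.5):
    """Recruit the first latent node whose threshold is satisfied by `inputs`.

    latent_nodes: list of dicts with keys
        "links"     - source name -> (initially very weak) link strength
        "threshold" - activation needed to satisfy the node
        "function"  - None while the node is latent
    inputs: set of currently active source names, e.g. {"A", "B"}
    """
    for node in latent_nodes:
        if node["function"] is not None:
            continue  # already dedicated to some function
        received = sum(s for src, s in node["links"].items() if src in inputs)
        if received >= node["threshold"]:
            for src in node["links"]:
                if src in inputs:
                    node["links"][src] += boost   # the used links get strengthened
            node["threshold"] += threshold_raise  # the node's threshold gets raised
            node["function"] = "".join(sorted(inputs))
            return node
    return None


pool = [
    {"links": {"A": 0.125, "C": 0.125}, "threshold": 0.25, "function": None},
    {"links": {"A": 0.125, "B": 0.125}, "threshold": 0.25, "function": None},
]
recruited = recruit(pool, {"A", "B"})  # second node: it has latent links from both A and B
```

With these numbers the recruited node ends up with links of 0.625 and a threshold of 0.75, so A or B alone no longer satisfies it. That is one way to read the remark that, without a raised threshold, nodes would be too easily satisfied.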
Learning
 Next time it gets activated it will send activation on
these links to the next level
[Diagram labels: AB; B; A]
Learning: more terms
[Diagram labels: AB; child nodes (potential, actual); B; A; parent nodes]
Learning: Deductions from
the basic process
 Learning is generally bottom-up
 The knowledge structure as learned by the cognitive
network is hierarchical — has multiple layers
 Hierarchy and proximity:
• Logically adjacent levels in a hierarchy can be expected to
be locally adjacent
 Excitatory connections are predominantly from one
layer of a hierarchy to the next
 Higher levels will tend to have larger numbers of
nodes than lower levels
Learning in cortical networks:
A Darwinian process
 The abundance hypothesis
• Needed to allow flexibility of learning
• Abundant latent nodes
 Must be present throughout cortex
• Abundant latent connections of a node
 Every node must have abundant latent links
 A trial-and-error process:
• Thousands of connection possibilities available
 The abundance hypothesis
• Strengthen those few that succeed
 Cf. natural selection
 “Neural Darwinism” (Edelman)
Anatomical support for the hypothesis
of abundant latent links
 A typical pyramidal node has
• thousands of incoming synapses
 connecting to its dendrites and its cell body
• thousands of output synapses
 from multiple branches of its axon
 But only a very few of these are recruited for a specific
function
• For example, the typical node in a functional web has
perhaps only dozens or maybe up to 100 or so links
 By far the great preponderance of these are latent
• Edelman: “silent synapses”
Learning – Enhanced understanding
 This “basic process” is not the full story
 The nodes of the above depiction:
• Are they minicolumns, maxicolumns, or what?
• Most likely, a bundle of contiguous columns
• Often a maxicolumn or hypercolumn
REVIEW
Columns of different sizes
 Minicolumn
• Basic anatomically described unit
• 70-110 neurons (avg 75-80)
• Diameter barely more than that of pyramidal cell body (30-50 μ)
 Maxicolumn (term used by Mountcastle)
• Diameter 300-500 μ
• Bundle of 100 or more contiguous minicolumns
 Hypercolumn – up to 1 mm diameter
• Can be long and narrow rather than cylindrical
• Bundle of contiguous maxicolumns
 Functional column
• Intermediate between minicolumn and maxicolumn
• A contiguous group of minicolumns
REVIEW
Hypercolumns: Modules of maxicolumns
[Figure: a homotypical area in the temporal lobe of a macaque monkey]
Functional columns
vis-à-vis minicolumns and maxicolumns
 Maxicolumn
• About 100 minicolumns
• About 300-500 microns in diameter
 Functional column
• A group of one to several contiguous
minicolumns within a maxicolumn
• Established during learning
• Initially it might be an entire maxicolumn
Learning in a system with
columns of different sizes
 At early learning stage, maybe a whole hypercolumn
gets recruited
 Later, maxicolumns for further distinctions
 Still later, functional columns as subcolumns within
maxicolumns
 New term: Supercolumn – a group of minicolumns of
whatever size: hypercolumn, maxicolumn, or functional
column
• Any supercolumn is potentially divisible
 Links between supercolumns will thus consist of
multiple fibers
REVIEW
Functional columns in phonological recognition:
A hypothesis
 Demisyllable (e.g. /de-/) activates a maxicolumn
 Different functional columns within the maxicolumn
for syllables with this demisyllable
• /ded/, /deb/, /det/, /dek/, /den/, /del/
REVIEW
Functional columns in phonological recognition
A hypothesis
[Diagram: a maxicolumn (ca. 100 minicolumns) for [de-], divided into
functional columns deb, ded, det, den, del, dek. Note that all respond to /de-/]
[Diagram, continued: the maxicolumn for de- is learned first; then the
subdivisions are established]
REVIEW
Adjacent maxicolumns in phonological cortex?
[Diagram: a hypercolumn, i.e. a module of contiguous maxicolumns
de-, te-, be-, pe-, ge-, ke-]
Each of these maxicolumns is divided into functional columns
Note that the entire module responds to [-e-]
Revisit the basic learning diagram:
Let each node represent a supercolumn
[Diagram labels: latent supercolumns; bundles of latent links; dedicated
supercolumns and links]
Learning – The Basic Process: Refined view
 Let these links get activated
 Then these supercolumns get activated
 That will activate these links
 This supercolumn gets enough activation to satisfy its threshold
 This supercolumn is recruited for function AB
 Next time it gets activated it will send activation on
these links to the next level
 It can get subdivided for finer distinctions
• For example, a hypercolumn composed of 3 maxicolumns
A further enhancement
 Minicolumns within a supercolumn have
mutual horizontal excitatory connections
 Therefore, some minicolumns can get
activated from their neighbors even if
they don’t receive activation from outside
Learning: refined view
 If, later, C is activated along with A and B,
then maxicolumn ABC is recruited for ABC
 And the connection from C to ABC is strengthened:
it is no longer latent
[Diagram labels: ABC; AB; B; A; C]
Learning phonological distinctions:
A hypothesis
1. In learning, this hypercolumn gets established first,
responding to [-e-]
2. It gets subdivided into maxicolumns for demisyllables
(de-, te-, be-, pe-, ge-, ke-)
3. The maxicolumn (here de-) gets divided into functional
columns (deb, ded, det, den, del, dek)
Remaining question – learning lateral inhibition
 When a hypercolumn is first recruited, no lateral
inhibition among its internal subdivisions
• (Or very little)
 Later, when finer distinctions are learned, they
get reinforced by lateral inhibition
• Latent inhibitory neurons become activated
 Question: How does this process work?
• I.e., what makes these inhibitory neurons
change from latent to active?
“Evolutionary Learning” and
the Proximity Principle
 Related functions tend to be in close proximity
• If very closely related, they tend to be adjacent
 Areas which integrate properties of different subsystems
(e.g., different sensory modalities) tend to be in locations
intermediate between those subsystems
Evolutionary Learning and the Proximity Principle
 Start with the observation:
• Related areas tend to be adjacent to each other
 Primary auditory and Wernicke’s area
 V1 and V2, etc.
 Wernicke’s area and lexical-conceptual
information – angular gyrus, MTG
• Thus we have the ‘proximity principle’
 Question: Why – How to explain?
How to Explain the Proximity Principle?
 Factors responsible for observations of
proximity in cortical structure
1. Economic necessity
2. Genetic factors
3. Experience – provides details of localization
within the limits imposed by genetic factors
Proximity: Economic necessity
 Question: Could a given column be connected to any
other column anywhere in the cortex?
 That would require a huge number of available latent
connections
 Way more than are present
 Hence there are strict limits on intercolumn connectivity
 Therefore, proximity is necessary just for economy of
representation
Limits on intercolumn connectivity
 Number of cortical minicolumns:
• If 27 billion neurons in entire cortex
• If avg. 77 neurons per minicolumn
• Then about 350 million minicolumns in the cortex
 Extent of available latent connections to other columns
• Perhaps 35,000 to 350,000
• Dividing by 350 million:
 A given column has available latent connections
to between 1/10,000 and 1/1,000 of the other
columns in the cortex
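The arithmetic on this slide can be checked directly. The figures are the ones given above; the function names are only illustrative.

```python
def minicolumn_count(neurons=27e9, neurons_per_minicolumn=77):
    """Estimated number of minicolumns in the cortex."""
    return neurons / neurons_per_minicolumn


def reachable_fraction(latent_connections, total_columns):
    """Share of all columns that one column has latent connections to."""
    return latent_connections / total_columns


total = minicolumn_count()                       # roughly 350 million
low = reachable_fraction(35_000, total)          # roughly 1/10,000
high = reachable_fraction(350_000, total)        # roughly 1/1,000
print(f"{total:,.0f} minicolumns; 1/{1 / low:,.0f} to 1/{1 / high:,.0f} reachable")
```

Even at the upper estimate, a column can reach only about one in a thousand of the others, which is the basis for the claim that proximity is an economic necessity.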
Locations of available latent connections
 Local
• Surrounding area
• Horizontal connections (grey matter)
 Intermediate
• Short-distance fibers in white matter
• For example from one gyrus to neighboring gyrus
 Long-distance
• Long-distance fiber bundles
• At ends, considerable branching
The role of long-distance fibers
 Arcuate fasciculus
• Genetically determined
• Limits location of phonological recognition area
 Interhemispheric fibers
• Also genetically determined
• Wernicke’s area – RH homolog of W’s area
• Broca’s area – RH homolog of B’s area
• Etc.
Cortical connectivity properties
(Cf. Pulvermüller 2002:17)
 Probability of adjacent areas being connected: >70%
• But if we count by minicolumns instead of cells the
figure is probably higher, maybe close to 100%
 Probability of distant areas being connected: 15-30%
• Distant areas: at least one intervening area
• In Macaque monkey, most areas have links to 10
or more other areas within same hemisphere
More cortical connectivity properties
 Most areas are connected to homotopic area of
opposite hemisphere
 Most connections between areas are reciprocal
 Primary areas not directly connected to one
another, except for motor-somatosensory
• Connections under central sulcus
Degrees of separation
between cortical neurons or columns
 For neurons of neighboring columns: 1
 For distant neurons in same hemisphere
• Range: 1 to about 5 or 6 (estimate)
• Mostly 1, 2, or 3, especially if functionally
closely related
• Average about 3 (estimate)
 For opposite hemisphere
• Add 1 to figures for same hemisphere
 Probably, for any two columns anywhere in
the cortex, whether functionally related or
not, fewer than 6 degrees of separation
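"Degrees of separation" here is the length of the shortest chain of links between two columns, which a breadth-first search computes. The sketch below uses a made-up toy network; the column names and links are hypothetical and carry no anatomical claim.

```python
from collections import deque


def degrees_of_separation(links, start, goal):
    """Length of the shortest chain of links from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbor in links.get(node, ()):
            if neighbor == goal:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Toy network (hypothetical): a chain of four columns plus one isolated column
toy_links = {
    "col1": ["col2"],
    "col2": ["col1", "col3"],
    "col3": ["col2", "col4"],
    "col4": ["col3"],
    "col5": [],
}
print(degrees_of_separation(toy_links, "col1", "col2"))  # neighboring columns
print(degrees_of_separation(toy_links, "col1", "col4"))  # more distant columns
```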
Some long-distance fiber bundles
(schematic)
Two Factors in Localization
 Genetic factors determine general area for a
particular type of knowledge
 Within this general area the learning-based proximity
factors select a more narrowly defined location
 Thus the exact localization depends on experience of
the individual
 When part of the system is damaged, learning-based
factors can take over and result in an abnormal
location for a function – plasticity
Genetically determined proximity
 Genetically-determined proximity would have
developed over a long period of evolution
• Many features are shared with other mammals
 This process could be called ‘evolutionary learning’
 According to standard evolutionary theory:
• A process of trial-and-error:
 Trial
• Produce varieties
 Error:
• Most varieties will not survive/reproduce
• The others – the best among them – are selected
 Other genetic factors supplement proximity
• Long-distance fiber bundles
Some innate factors relating to localization
 Primary areas
 Long-distance fiber bundles
Innate factors relating to primary areas
 Location
• Genetically determined locations
 But there are exceptions
• Malformation
• Damage
 Structure
• Genetically determined structures adapted to
sensory modality (they have to be where they are)
 Heterotypical structures
• Found in primary areas
 Primary visual
 Primary auditory
REVIEW
A Heterotypical (i.e., genetically built-in) structure
Visual motion perception
[Figure: a heterotypical area in the posterior bank of the superior
temporal sulcus of a macaque monkey (“V-5”); Albright et al. 1984]
REVIEW
A Heterotypical structure:
Auditory areas in a cat’s cortex
[Figure legend]
AAF – Anterior auditory field
A1 – Primary auditory field
PAF – Posterior auditory field
VPAF – Ventral posterior auditory field
Innate factors relating to localization
 The primary areas
 Long-distance fiber bundles
• Interhemispheric – via corpus callosum
• Longitudinal – from front to back
 Arcuate fasciculus is part of the
superior longitudinal fasciculus
 They allow for exceptions to proximity
• Areas closely related yet not neighboring
Implications of the proximity principle
 System level
• Functionally related subsystems will tend to be close to
one another
• Neighboring subsystems will probably have related
functions
 Cortical column level
• Nodes for similar functions should be physically close to
one another
• Nodes that are physically close to one another probably
have similar functions
 Therefore..
• Neighboring nodes are likely to be competitors
• They need to have mutually inhibitory connections
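Mutual inhibition among neighboring competitors can be illustrated with a simple subtractive scheme in which each node is dampened in proportion to the activation of its rivals. This is a sketch under assumptions: the subtractive rule, the `inhibition` constant, and the starting activations are chosen for illustration and are not taken from the lecture.

```python
def lateral_inhibition(activations, inhibition=0.5, steps=10):
    """Repeatedly reduce each node's activation by a share of its competitors' total."""
    acts = dict(activations)
    for _ in range(steps):
        total = sum(acts.values())
        acts = {name: max(0.0, a - inhibition * (total - a))
                for name, a in acts.items()}
    return acts


# Hypothetical neighboring functional columns competing for the same input
start = {"ded": 1.0, "det": 0.5, "deb": 0.25}
settled = lateral_inhibition(start)
print(settled)  # only the most strongly activated competitor stays active
```

Because the competitors are neighbors with similar functions, a scheme like this sharpens the distinction between them: the best-matching node suppresses the rest.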
Applying the proximity principle
 For both types (genetic and experience-based) we can
make predictions of where various functions are most
likely to be located, based on the proximity principle
• Broca’s area near the inferior precentral gyrus
• Wernicke’s area near the primary auditory area
 Such predictions are possible even in cases where we
don’t know whether genetics or learning is responsible
• maybe both
Deriving location from proximity hypothesis
 The cortex has to provide for “decoding” speech input
 Speech input enters the cortex in the primary auditory
area
 Results of the “decoding” (recognition of syllables etc.)
are represented in Wernicke’s area
 Why is Wernicke’s area where it is?
Speech Recognition in the Left Hemisphere
[Figure labels: phonological production; primary auditory area;
phonological recognition (Wernicke’s area)]
Exercise: Location of Wernicke’s area
 Why is phonological recognition in the posterior superior
temporal gyrus?
• Alternatives to consider:
 Anterior to primary auditory cortex
• Advantage: would be close to phonological production
 Inferior to primary auditory cortex
 (There are two reasons)
Answer: Location of Wernicke’s area
 Wernicke’s area pretty much has to be where it is to take
advantage of the arcuate fasciculus
 The location of W.’s area makes it close to angular gyrus,
likely area for noun lemmas (morphemes and complex
morphemes)
 Also, close to SMG, presumed area for phonological
monitoring
• (Why?
 Because it is adjacent to primary somatosensory area)
More exercises
 Explaining likely locations of morphemes
• verb morphemes in the frontal lobe
• noun morphemes in the angular gyrus
and/or middle temporal gyrus
 The dorsal (where) pathway of visual perception
Experience-based proximity
 Can be expected to be operative
• more at higher (more abstract) levels, less at
lower levels
• for areas of knowledge that have developed
too recently for evolution to have played a role
 Reading
 Writing
 Higher mathematics
 Physics, computer technology, etc.
Innate features that support language
 Columnar structure
 Coding of frequencies in Heschl’s gyrus
 Arcuate fasciculus
 Interhemispheric connections (via corpus callosum)
– e.g., connect Wernicke’s area with RH homolog
 Spread of myelination from primary areas to
successively higher levels
 Left-hemisphere dominance for grammar etc.
Consequences of the Proximity Principle
 Nodes in close competition will tend to be neighbors
• And their mutual competition is preordained even
though the properties they are destined to integrate
will only be established through the learning process
 Therefore, inhibitory connections should exist
predominantly among nodes of the same hierarchical
level
• Confirmed by neuroanatomy
• The presence of their mutual inhibitory connections is
presumably specified genetically
Variation in threshold strength
N.B. All of these properties are found in neural structures
 Thresholds are not fixed
• They vary as a result of use – learning
 Nor are they integral
 What we really have are threshold functions,
such that
• A weak amount of incoming activation
produces no response
• A larger degree of activation results in
weak outgoing activation
• A still higher degree of activation yields
strong outgoing activation
• S-shaped (“sigmoid”) function
[Figure: threshold function, an S-shaped curve of outgoing activation
(vertical axis) against incoming activation (horizontal axis)]
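One common way to write such an S-shaped threshold function is the logistic curve. The sketch below assumes that form; the parameter names `threshold` and `steepness` and their default values are illustrative, not taken from the lecture.

```python
import math


def threshold_function(incoming, threshold=1.0, steepness=8.0):
    """Logistic ("sigmoid") mapping from incoming to outgoing activation, in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * (incoming - threshold)))


for incoming in (0.0, 0.75, 1.0, 1.25, 2.0):
    print(f"incoming {incoming:.2f} -> outgoing {threshold_function(incoming):.3f}")
```

Raising `threshold` shifts the curve to the right, which is one way to model a node whose threshold has increased through learning; a non-integral threshold is just a non-integer value of that parameter.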