Ling 411 – 19 Learning

REVIEW – Operations in relational networks
• Relational networks are dynamic
• Activation moves along lines and through nodes
• Links have varying strengths
  ◦ A stronger link carries more activation, other things being equal
• All nodes operate on two principles:
  ◦ Integration
    ▪ Of incoming activation
  ◦ Broadcasting
    ▪ To other nodes

REVIEW – Operation of the network in terms of cortical columns
• The linguistic system operates as distributed processing of multiple individual components
  ◦ “Nodes” in an abstract model
  ◦ These nodes are implemented as cortical columns
• Columnar functions
  ◦ Integration: A column is activated if it receives enough activation from other columns
    ▪ Can be activated to varying degrees
    ▪ Can keep activation alive for a period of time
  ◦ Broadcasting: An activated column transmits activation to other columns
    ▪ Excitatory – contribution to higher level
    ▪ Inhibitory – dampens competition at same level

Additional operations: Learning
• Links get stronger when they are successfully used (Hebbian learning)
  ◦ Learning consists of strengthening them
  ◦ Hebb 1949
• Threshold adjustment
  ◦ When a node is recruited its threshold increases
  ◦ Otherwise, nodes would be too easily satisfied

Neural processes for learning
• Basic principle: when a connection is successfully used, it becomes stronger
  ◦ Successfully used if another connection to the same node is simultaneously active
• Mechanisms of strengthening
  ◦ Biochemical changes at synapses
  ◦ Growth of dendritic spines
  ◦ Formation of new synapses
• Weakening: when neurons fire independently of each other their mutual connections (if any) weaken

Neural processes for learning
[Figure: nodes A and B each connect to node C; label: “Synapses here get strengthened”.]
If connections AC and BC are active at the same time, and if their joint activation is strong enough to activate C, they both get strengthened (adapted from Hebb)

Requirements that must be assumed (implied by the Hebbian learning principle)
• Prerequisites:
  ◦ Initially, connection strengths are very weak
    ▪ Term: Latent Links
  ◦ They must be accompanied by nodes
    ▪ Term: Latent Nodes
  ◦ Latent nodes and latent connections must be available for learning anything learnable
• The Abundance Hypothesis
  ◦ Abundant latent links
  ◦ Abundant latent nodes

Abundance is a property of biological systems generally
• Cf.: Acorns falling from an oak tree
• Cf.: A sea tortoise lays thousands of eggs
  ◦ Only a few will produce viable offspring
• Cf. Edelman: “silent synapses”
  ◦ The great preponderance of cortical synapses are “silent” (i.e., latent)
• Electrical activity sent from a cell body to its axon travels to thousands of axon branches, even though only one or a few of them may lead to downstream activation

Learning – The Basic Process
[Figure sequence; labels: latent nodes, latent links, dedicated nodes and links, nodes A and B. Captions in order:]
1. Let these links get activated
2. Then these nodes will get activated
3. That will activate these links
4. This node gets enough activation to satisfy its threshold
5. This node is therefore recruited. These links get strengthened and the node’s threshold gets raised
6. This node is now dedicated to function AB
7. Next time it gets activated it will send activation on these links to the next level

Learning: more terms
[Figure labels: AB, A, B; child nodes (potential, actual); parent nodes.]

Learning: Deductions from the basic process
• Learning is generally bottom-up
• The knowledge structure as learned by the cognitive network is hierarchical (it has multiple layers)
• Hierarchy and proximity:
  ◦ Logically adjacent levels in a hierarchy can be expected to be locally adjacent
• Excitatory connections are predominantly from one layer of a hierarchy to the next
• Higher levels will tend to have larger numbers of nodes than lower levels

Learning in cortical networks: A Darwinian process
• The abundance hypothesis
  ◦ Needed to allow flexibility of learning
  ◦ Abundant latent nodes
    ▪ Must be present throughout cortex
  ◦ Abundant latent connections of a node
    ▪ Every node must have abundant latent links
• A trial-and-error process:
  ◦ Thousands of connection possibilities available
    ▪ The abundance hypothesis
  ◦ Strengthen those few that succeed
    ▪ Cf. natural selection
• “Neural Darwinism” (Edelman)

Anatomical support for the hypothesis of abundant latent links
• A typical pyramidal node has
  ◦ thousands of incoming synapses
    ▪ connecting to its dendrites and its cell body
  ◦ thousands of output synapses
    ▪ from multiple branches of its axon
• But only a very few of these are recruited for a specific function
  ◦ For example, the typical node in a functional web has perhaps only dozens or maybe up to 100 or so links
• By far the great preponderance of these are latent
  ◦ Edelman: “silent synapses”

Learning – Enhanced understanding
• This “basic process” is not the full story
• The nodes of the above depiction:
  ◦ Are they minicolumns, maxicolumns, or what?
  ◦ Most likely, a bundle of contiguous columns
  ◦ Often a maxicolumn or hypercolumn

REVIEW – Columns of different sizes
• Minicolumn
  ◦ Basic anatomically described unit
  ◦ 70-110 neurons (avg 75-80)
  ◦ Diameter barely more than that of pyramidal cell body (30-50 μm)
• Maxicolumn (term used by Mountcastle)
  ◦ Diameter 300-500 μm
  ◦ Bundle of 100 or more contiguous minicolumns
• Hypercolumn – up to 1 mm diameter
  ◦ Can be long and narrow rather than cylindrical
  ◦ Bundle of contiguous maxicolumns
• Functional column
  ◦ Intermediate between minicolumn and maxicolumn
  ◦ A contiguous group of minicolumns

REVIEW – Hypercolumns: Modules of maxicolumns
[Figure: a homotypical area in the temporal lobe of a macaque monkey.]

Functional columns vis-à-vis minicolumns and maxicolumns
• Maxicolumn
  ◦ About 100 minicolumns
  ◦ About 300-500 μm in diameter
• Functional column
  ◦ A group of one to several contiguous minicolumns within a maxicolumn
  ◦ Established during learning
  ◦ Initially it might be an entire maxicolumn

Learning in a system with columns of different sizes
• At an early learning stage, maybe a whole hypercolumn gets recruited
• Later, maxicolumns for further distinctions
• Still later, functional columns as subcolumns within maxicolumns
• New term: Supercolumn – a group of minicolumns of whatever size: hypercolumn, maxicolumn, or functional column
  ◦ Any supercolumn is potentially divisible
• Links between supercolumns will thus consist of multiple fibers

REVIEW – Functional columns in phonological recognition: A hypothesis
• Demisyllable (e.g. /de-/) activates a maxicolumn
• Different functional columns within the maxicolumn for syllables with this demisyllable
  ◦ /ded/, /deb/, /det/, /dek/, /den/, /del/
[Figure: a maxicolumn (ca. 100 minicolumns) for [de-], divided into functional columns labeled deb, ded, det, de-, den, del, dek. Note that all respond to /de-/. Captions: “This one learned first”; “Then, subdivisions are established”.]

REVIEW – Adjacent maxicolumns in phonological cortex?
[Figure: a hypercolumn containing the maxicolumns de-, te-, be-, pe-, ge-, ke-.]
• A module of contiguous maxicolumns
• Each of these maxicolumns is divided into functional columns
• Note that the entire module responds to [-e-]

Revisit the basic learning diagram: Let each node represent a supercolumn
[Figure labels: latent supercolumns, bundles of latent links, dedicated supercolumns and links.]

Learning – The Basic Process: Refined view
[Figure sequence; nodes A, B, AB. Captions in order:]
1. Let these links get activated
2. Then these supercolumns get activated
3. That will activate these links
4. This supercolumn gets enough activation to satisfy its threshold
5. This supercolumn is recruited for function AB
6. Next time it gets activated it will send activation on these links to the next level
7. Can get subdivided for finer distinctions
8. Hypercolumn composed of 3 maxicolumns – can get subdivided for finer distinctions

A further enhancement
• Minicolumns within a supercolumn have mutual horizontal excitatory connections
• Therefore, some minicolumns can get activated from their neighbors even if they don’t receive activation from outside

Learning: Refined view
[Figure: nodes A, B, C, AB, ABC.]
• If, later, C is activated along with A and B, then maxicolumn ABC is recruited for ABC
• And the connection from C to ABC is strengthened – it is no longer latent

Learning phonological distinctions: A hypothesis
[Figure labels: hypercolumn with maxicolumns de-, te-, be-, pe-, ge-, ke-; functional columns deb, ded, det, de-, den, del, dek.]
1. In learning, this hypercolumn gets established first, responding to [-e-]
2. It gets subdivided into maxicolumns for demisyllables
3. The maxicolumn gets divided into functional columns

Remaining question – learning lateral inhibition
• When a hypercolumn is first recruited, no lateral inhibition among its internal subdivisions
  ◦ (Or very little)
• Later, when finer distinctions are learned, they get reinforced by lateral inhibition
  ◦ Latent inhibitory neurons become activated
• Question: How does this process work?
  ◦ I.e., what makes these inhibitory neurons change from latent to active?

“Evolutionary Learning” and the Proximity Principle
• Related functions tend to be in close proximity
  ◦ If very closely related, they tend to be adjacent
• Areas which integrate properties of different subsystems (e.g., different sensory modalities) tend to be in locations intermediate between those subsystems

Evolutionary Learning and the Proximity Principle
• Start with the observation:
  ◦ Related areas tend to be adjacent to each other
    ▪ Primary auditory and Wernicke’s area
    ▪ V1 and V2, etc.
    ▪ Wernicke’s area and lexical-conceptual information – angular gyrus, MTG
  ◦ Thus we have the ‘proximity principle’
• Question: Why – How to explain?

How to Explain the Proximity Principle?
• Factors responsible for observations of proximity in cortical structure
  1. Economic necessity
  2. Genetic factors
  3. Experience – provides details of localization within the limits imposed by genetic factors

Proximity: Economic necessity
• Question: Could a given column be connected to any other column anywhere in the cortex?
• That would require a huge number of available latent connections
• Way more than are present
• Hence there are strict limits on intercolumn connectivity
• Therefore, proximity is necessary just for economy of representation

Limits on intercolumn connectivity
• Number of cortical minicolumns:
  ◦ If 27 billion neurons in entire cortex
  ◦ If avg. 77 neurons per minicolumn
  ◦ Then about 350 million minicolumns in the cortex (27 billion / 77)
• Extent of available latent connections to other columns
  ◦ Perhaps 35,000 to 350,000
  ◦ Do the math: 35,000 / 350 million = 1/10,000 and 350,000 / 350 million = 1/1,000
• A given column has available latent connections to between 1/1,000 and 1/10,000 of the other columns in the cortex

Locations of available latent connections
• Local
  ◦ Surrounding area
  ◦ Horizontal connections (grey matter)
• Intermediate
  ◦ Short-distance fibers in white matter
  ◦ For example from one gyrus to neighboring gyrus
• Long-distance
  ◦ Long-distance fiber bundles
  ◦ At ends, considerable branching

The role of long-distance fibers
• Arcuate fasciculus
  ◦ Genetically determined
  ◦ Limits location of phonological recognition area
• Interhemispheric fibers
  ◦ Also genetically determined
  ◦ Wernicke’s area – RH homolog of Wernicke’s area
  ◦ Broca’s area – RH homolog of Broca’s area
  ◦ Etc.

Cortical connectivity properties (Cf. Pulvermüller 2002:17)
• Probability of adjacent areas being connected: >70%
  ◦ But if we count by minicolumns instead of cells the figure is probably higher, maybe close to 100%
• Probability of distant areas being connected: 15-30%
  ◦ Distant areas: at least one intervening area
  ◦ In macaque monkey, most areas have links to 10 or more other areas within same hemisphere

More cortical connectivity properties
• Most areas are connected to homotopic area of opposite hemisphere
• Most connections between areas are reciprocal
• Primary areas not directly connected to one another, except for motor-somatosensory
  ◦ Connections under central sulcus

Degrees of separation between cortical neurons or columns
• For neurons of neighboring columns: 1
• For distant neurons in same hemisphere
  ◦ Range: 1 to about 5 or 6 (estimate)
  ◦ Mostly 1, 2, or 3, especially if functionally closely related
  ◦ Average about 3 (estimate)
• For opposite hemisphere
  ◦ Add 1 to figures for same hemisphere
• Probably, for any two columns anywhere in the cortex, whether functionally related or not, fewer than 6 degrees of separation

Some long-distance fiber bundles (schematic)
[Figure: schematic of long-distance fiber bundles.]

Two Factors in Localization
• Genetic factors determine general area for a particular type of knowledge
• Within this general area the learning-based proximity factors select a more narrowly defined location
• Thus the exact localization depends on experience of the individual
• When part of the system is damaged, learning-based factors can take over and result in an abnormal location for a function – plasticity

Genetically determined proximity
• Genetically-determined proximity would have developed over a long period of evolution
  ◦ Many features are shared with other mammals
• This process could be called ‘evolutionary learning’
• According to standard evolutionary theory:
  ◦ A process of trial-and-error:
    ▪ Trial: produce varieties
    ▪ Error: most varieties will not survive/reproduce
    ▪ The others – the best among them – are selected
• Other genetic factors supplement proximity
  ◦ Long-distance fiber bundles

Some innate factors relating to localization
• Primary areas
• Long-distance fiber bundles

Innate factors relating to primary areas
• Location
  ◦ Genetically determined locations
  ◦ But there are exceptions
    ▪ Malformation
    ▪ Damage
• Structure
  ◦ Genetically determined structures adapted to sensory modality (they have to be where they are)
• Heterotypical structures
  ◦ Found in primary areas
    ▪ Primary visual
    ▪ Primary auditory

REVIEW – A heterotypical (i.e., genetically built-in) structure: Visual motion perception
[Figure: an area in the posterior bank of the superior temporal sulcus of a macaque monkey (“V-5”), a heterotypical area. Albright et al. 1984.]

REVIEW – A heterotypical structure: Auditory areas in a cat’s cortex
[Figure legend:]
• AAF – Anterior auditory field
• A1 – Primary auditory field
• PAF – Posterior auditory field
• VPAF – Ventral posterior auditory field

Innate factors relating to localization
• The primary areas
• Long-distance fiber bundles
  ◦ Interhemispheric – via corpus callosum
  ◦ Longitudinal – from front to back
    ▪ Arcuate fasciculus is part of the superior longitudinal fasciculus
• They allow for exceptions to proximity
  ◦ Areas closely related yet not neighboring

Implications of the proximity principle
• System level
  ◦ Functionally related subsystems will tend to be close to one another
  ◦ Neighboring subsystems will probably have related functions
• Cortical column level
  ◦ Nodes for similar functions should be physically close to one another
  ◦ Nodes that are physically close to one another probably have similar functions
• Therefore:
  ◦ Neighboring nodes are likely to be competitors
  ◦ They need to have mutually inhibitory connections

Applying the proximity principle
• For both types (genetic and experience-based) we can make predictions of where various functions are most likely to be located, based on the proximity principle
  ◦ Broca’s area near the inferior precentral gyrus
  ◦ Wernicke’s area near the primary auditory area
• Such predictions are possible even in cases where we don’t know whether genetics or learning is responsible
  ◦ maybe both

Deriving location from proximity hypothesis
• The cortex has to provide for “decoding” speech input
• Speech input enters the cortex in the primary auditory area
• Results of the “decoding” (recognition of syllables etc.) are represented in Wernicke’s area
• Why is Wernicke’s area where it is?

Speech Recognition in the Left Hemisphere
[Figure labels: phonological production; primary auditory area; phonological recognition (Wernicke’s area).]

Exercise: Location of Wernicke’s area
• Why is phonological recognition in the posterior superior temporal gyrus?
  ◦ Alternatives to consider:
    ▪ Anterior to primary auditory cortex (advantage: would be close to phonological production)
    ▪ Inferior to primary auditory cortex
• (There are two reasons)

Answer: Location of Wernicke’s area
• Wernicke’s area pretty much has to be where it is to take advantage of the arcuate fasciculus
• The location of Wernicke’s area makes it close to angular gyrus, likely area for noun lemmas (morphemes and complex morphemes)
• Also, close to SMG, presumed area for phonological monitoring
  ◦ (Why? Because it is adjacent to primary somatosensory area)

More exercises
• Explaining likely locations of morphemes
  ◦ verb morphemes in the frontal lobe
  ◦ noun morphemes in the angular gyrus and/or middle temporal gyrus
• The dorsal (where) pathway of visual perception

Experience-based proximity
• Can be expected to be operative
  ◦ more at higher (more abstract) levels, less at lower levels
  ◦ for areas of knowledge that have developed too recently for evolution to have played a role
    ▪ Reading
    ▪ Writing
    ▪ Higher mathematics
    ▪ Physics, computer technology, etc.

Innate features that support language
• Columnar structure
• Coding of frequencies in Heschl’s gyrus
• Arcuate fasciculus
• Interhemispheric connections (via corpus callosum) – e.g., connect Wernicke’s area with RH homolog
• Spread of myelination from primary areas to successively higher levels
• Left-hemisphere dominance for grammar etc.

Consequences of the Proximity Principle
• Nodes in close competition will tend to be neighbors
  ◦ And their mutual competition is preordained even though the properties they are destined to integrate will only be established through the learning process
• Therefore, inhibitory connections should exist predominantly among nodes of the same hierarchical level
  ◦ Confirmed by neuroanatomy
  ◦ The presence of their mutual inhibitory connections is presumably specified genetically

Variation in threshold strength
N.B. All of these properties are found in neural structures
• Thresholds are not fixed
  ◦ They vary as a result of use – learning
• Nor are they integral
• What we really have are threshold functions, such that
  ◦ A weak amount of incoming activation produces no response
  ◦ A larger degree of activation results in weak outgoing activation
  ◦ A still higher degree of activation yields strong outgoing activation
  ◦ S-shaped (“sigmoid”) function
[Figure: threshold function, an S-shaped curve; axes: incoming activation, outgoing activation.]
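
The S-shaped threshold function described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give a formula, so the logistic curve, the function name `outgoing_activation`, and the parameters `threshold` and `steepness` (with their default values) are all hypothetical choices made for this sketch, not part of the relational network model itself.

```python
import math


def outgoing_activation(incoming: float,
                        threshold: float = 1.0,
                        steepness: float = 6.0) -> float:
    """Logistic (sigmoid) threshold function, an assumed stand-in for the
    S-shaped curve on the slide. Returns a value strictly between 0 and 1.

    incoming  -- summed incoming activation (arbitrary units)
    threshold -- midpoint of the curve (hypothetical parameter)
    steepness -- how sharply output rises near the threshold (hypothetical)
    """
    return 1.0 / (1.0 + math.exp(-steepness * (incoming - threshold)))


if __name__ == "__main__":
    # Weak input gives almost no output, moderate input gives weak output,
    # strong input gives strong output.
    for x in (0.0, 0.7, 1.0, 1.5, 2.0):
        print(f"incoming={x:.1f}  outgoing={outgoing_activation(x):.3f}")

    # Raising the threshold (as when a node is recruited) lowers the
    # response to the same incoming activation.
    before = outgoing_activation(1.0, threshold=1.0)
    after = outgoing_activation(1.0, threshold=1.5)
    print(f"same input, raised threshold: {before:.3f} -> {after:.3f}")
```

In this sketch, threshold adjustment during learning corresponds to shifting the curve to the right, so a recruited node needs more incoming activation before it responds strongly.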