Modelling Language Evolution
Lecture 1: Introduction to Learning

Simon Kirby
University of Edinburgh
Language Evolution & Computation Research Unit

Course Overview
* Learning
  * Introduction to neural nets
  * Learning syntax
* Evolution
  * Syntax
  * Learning bias and structure
* Culture
  * Iterated learning
  * The Talking Heads (practical)

Computers for modelling
* Computers in linguistics
  * Engineering (speech and language technologies)
  * Research tools (waveform analysis, psycholinguistic stimuli etc.)
  * Recently: model building
* Why build models?
* Why use computers?
* What is a model anyway?

What is a model?
* One view:
  [Diagram with the labels MODEL, THEORY, PREDICTION, OBSERVATION]
* We use models when we can’t be sure what our theories predict
* Especially useful when dealing with complex systems

A simple example
* Vowels exist in a “space”
* Only some patterns arise cross-linguistically
* E.g. vowel space seems to be symmetrically filled
* Why?

Theory to Model
* We need a theory to explain the vowel-space universal
* Possible theory:
  * Vowels tend to avoid being close to each other to maintain perceptual distinctiveness.
* Use a model to test the theory (Liljencrants & Lindblom 1972)
* In general, computational models are useful when dealing with “complex systems”

Is language a complex system?
* Yes – evolution on many different timescales:
  * Individual learning
  * Cultural evolution
  * Biological evolution
* Computational models will help us understand these interactions…

Learning
* Language learning is crucial to language evolution
* What is learning?
  * Learning occurs when an organism changes its internal state on the basis of experience
* What do we need to model learning?
  1. A model of internal states
  2. A model of experience
  3. An algorithm to change 1 on the basis of 2

One approach: Neural nets
* An approach to internal states based on the brain
* An artificial neuron is a computational unit that sums inputs and uses them to decide whether to produce an output

Networks of neurons
* Typically there will be many connected neurons
* Information is stored in weights on the connections
* Weights multiply signals sent between nodes
* Signals into a node can be excitatory or inhibitory

An artificial neuron
  $$ net_i = \sum_j w_{ij} a_j $$
* Add up all the inputs multiplied by their weights
* f(net) is the “activation function” that scales the input

A useful activation function
  $$ a_i = \frac{1}{1 + e^{-net_i}} $$
* All or nothing for big excitations or inhibitions…
* … but more sensitive in between.

AND: a very simple network
* A network that works out if both inputs are activated:
  [Figure: OUTPUT unit with weight -7.5 from the BIAS NODE (always set to 1.0) and weights 5 and 5 from INPUT 1 and INPUT 2]
* Network gives an output over 0.5 only if both inputs are 1.

OR: another very simple network
* A network that works out if either input is activated:
  [Figure: OUTPUT unit with weight -7.5 from the BIAS NODE (always set to 1.0) and weights 10 and 10 from INPUT 1 and INPUT 2]
* Network gives an output over 0.5 if either input is 1.

XOR: a difficult challenge
* A network that works out if only one input is activated:
  [Figure: OUTPUT unit connected to the BIAS NODE (always set to 1.0), INPUT 1 and INPUT 2, with all three weights marked “?”]
* Solution needs a more complex net with three layers. WHY?

XOR network - step 1
* XOR is the same as OR but not AND
* Calculate OR
* Calculate NOT AND
* AND the results
  [Diagram: an AND unit combining the results of NOT AND and OR]

XOR network - step 2
  [Figure: three-layer network with a BIAS NODE, INPUT 1 and INPUT 2, two hidden units and an OUTPUT unit.
   OUTPUT unit (AND): weight -7.5 from the bias node, weights 5 and 5 from the two hidden units.
   Hidden NOT AND unit: weight 7.5 from the bias node, weights -5 and -5 from the inputs.
   Hidden OR unit: weight -7.5 from the bias node, weights 10 and 10 from the inputs.]

But what about learning?
* We now have:
  * a model of internal states (connection weights)
  * a model of experience (inputs and outputs)
* Learning:
  * set the weights in response to experience
* How?
  * Compare network behaviour with “correct” behaviour
  * Adjust the weights to reduce network error

Error-driven learning
1. Set weights to random values
2. Present input pattern
3. Feed-forward activation through the network to get an output
4. Calculate difference between output and desired output (i.e. error)
5. Adjust weights so that the error is reduced
6. Repeat until network is producing the desired results.

Gradient descent
* Gradient descent is a form of error-driven learning
* Start at a random point on the “error surface”
* Move on the surface in the direction of steepest downward slope
* Potential problems:
  * May overshoot the global minimum
  * Might get stuck in a local minimum

Example: learning past tense of verbs
* Network that takes present tense form of verb…
* …and produces past tense.
* Uses examples to set weights
* Generalises to add /-ed/ to verbs it’s never seen before.
* Has it learnt a linguistic rule?

Is this psychologically plausible?
* We need an error signal
* Where does this error signal come from?
* Possibilities:
  * A teacher
  * Reinforcement
  * The outcome of some prediction:
    * e.g. what’s the next word?
    * what’s the past tense of this verb?

Summary
* Modelling tests theories
* Computer modelling is appropriate for complex systems
* Language evolution involves several complex systems
* Neural nets are one approach to modelling learning
* Networks can be made to adapt to data through error-driven learning
* Next lecture: how to model acquisition of syntax
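
Worked example (a sketch, not part of the original slides)

The AND, OR and XOR networks and the error-driven learning steps described in the slides can be tried out in a few lines of Python. The weights for the three hand-built networks are the ones shown on the slides. Everything else is an assumption made for illustration: the function names (sigmoid, unit, and_net, or_net, xor_net, train), the learning rate, the number of epochs, the random seed, and the exact weight-update rule, which the slides do not specify.

```python
import math
import random


def sigmoid(net):
    """Activation function from the slides: a = 1 / (1 + e^-net)."""
    return 1.0 / (1.0 + math.exp(-net))


def unit(bias_weight, weights, inputs):
    """One artificial neuron: weighted sum of inputs plus bias, then sigmoid.

    The bias node is always 1.0, so its contribution is just bias_weight.
    """
    net = bias_weight * 1.0 + sum(w * a for w, a in zip(weights, inputs))
    return sigmoid(net)


def and_net(x1, x2):
    # Weights from the AND slide: bias -7.5, inputs 5 and 5.
    return unit(-7.5, [5.0, 5.0], [x1, x2])


def or_net(x1, x2):
    # Weights from the OR slide: bias -7.5, inputs 10 and 10.
    return unit(-7.5, [10.0, 10.0], [x1, x2])


def xor_net(x1, x2):
    # Hidden layer: one NOT AND unit and one OR unit.
    not_and = unit(7.5, [-5.0, -5.0], [x1, x2])
    either = unit(-7.5, [10.0, 10.0], [x1, x2])
    # Output layer: AND of the two hidden activations.
    return unit(-7.5, [5.0, 5.0], [not_and, either])


def train(patterns, epochs=2000, rate=0.5, seed=0):
    """Error-driven learning for a single two-input unit.

    Assumption: each weight is nudged by rate * error * input activation,
    which is the gradient-descent step for a cross-entropy error measure.
    Returns [bias_weight, weight_1, weight_2].
    """
    rng = random.Random(seed)
    # Step 1: set weights to random values.
    weights = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    for _ in range(epochs):
        for inputs, target in patterns:
            # Steps 2-3: present the pattern and feed activation forward.
            acts = [1.0] + list(inputs)
            output = sigmoid(sum(w * a for w, a in zip(weights, acts)))
            # Step 4: difference between desired and actual output.
            error = target - output
            # Step 5: adjust weights so that the error is reduced.
            weights = [w + rate * error * a for w, a in zip(weights, acts)]
    return weights


AND_PATTERNS = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0),
                ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]

if __name__ == "__main__":
    for x1, x2 in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
        print(x1, x2, and_net(x1, x2) > 0.5, or_net(x1, x2) > 0.5,
              xor_net(x1, x2) > 0.5)
    learned = train(AND_PATTERNS)
    print("learned AND weights:", learned)
```

A single unit is enough to learn AND or OR this way. XOR needs the hidden layer, and training a hidden layer requires passing the error back through the network, which this sketch does not attempt.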