Why memory matters in English grammar
Dick Hudson
Manchester, March 2009

Memory
• Long-term memory
• Short-term memory
  – aka ‘working memory’
  – used for thinking
  – limited capacity: ‘7 ± 2’
• Maybe working memory is the currently active area of long-term memory

Memory as a network
• Long-term memory is a network
• Evidence: activation spills onto neighbours
  – priming of neighbouring words
  – speech errors are wrongly selected neighbours
• But the network’s not just language – ‘cognitive linguistics’

Network activity
• Node activation
  – activation takes energy, and is limited
  – keeping a node active is expensive
• Node creation
  – essential for processing experience
  – also expensive
• Node binding
  – expensive as it tends to confuse similar nodes
[Diagrams: activation (‘active!’), node building (‘new!!’), node binding]

Tokens and types
• Memory must include temporary tokens as well as permanent types.
• Tokens are different from types
  – different properties, e.g. time, speaker
  – even conflicting properties, e.g. mispelings
• But tokens are also very expensive
  – because they’re the focus of attention.

Tokens in syntactic theory
• What tokens can we afford?
• At least one token per word – e.g. five tokens here
• At least one dependency token per word
• But do we really need more?
  – e.g. for we: a word, and a DP?
• Phrases are expensive – so they need really strong evidence!

Dependency structure
• Just one token node per word
• And one per dependency
• e.g. “Dependency grammar is very ancient.”
[Diagram: dependency arcs (labelled p, a, s) over “Dependency grammar is very ancient”]

Phrase structure
• One token per word, plus:
  – one token per phrase-mother
  – one part-whole relation per word or phrase
• e.g. “Phrase structure is very young.”

The cost of phrase structure
• VERY expensive!
[Diagram: extra tokens for the phrases [phrase structure], [very young], [is very young] and [Phrase structure is very young]]

So what? (1)
• Tokens are expensive (for memory resources) as long as they’re active.
• So the sooner they de-activate, the better.
• Tokens can de-activate sooner in dependency structure than in phrase structure.
• So dependency structure is psychologically more plausible.

Dependency distance
• How long must a word token stay active?
  – Till it’s linked as dependent to a ‘parent’.
• What’s the cost of keeping it active?
  – The other tokens that are active at the same time.
• I.e. cost of W = number of words between W and its parent = dependency distance (dd)

An example
• “Dependency grammar is very ancient.”
  – Dependency: dd = 0
  – grammar: dd = 0
  – is: N/A (the root has no parent)
  – very: dd = 0
  – ancient: dd = 1

Long subjects and dependency distance
• “This is the dog that chased the cat that caught the rat that ate the cheese that lay in the house that Jack built.”
  – max dd = 0
• “The dog that chased the cat that caught the rat that ate the cheese that lay in the house that Jack built is this one.”
  – max dd = 21

So what? (2)
• Long subjects are expensive because their head competes for activation with all the other words between it and the verb.
• Dependency distance measures this precisely.
  – Ed Gibson (MIT) has independently developed a similar measure.

Learning syntax
• Dependency patterns can only be learned from active tokens.
• Most words in casual speech have dd = 0.
  – 74.2% in the Penn Treebank
  – 63% (adult speech) in CHILDES
  – only 1–4% have dd > 4.
• Every English dependency allows dd = 0.

So what? (3)
• Learning dependency patterns is easy.
• Adjacent but non-dependent words are (by definition) random, and have no lasting effect.
• Non-adjacent but dependent words don’t matter, because the same patterns can always be learned from easier examples.
• So most of syntax is easy to learn as data.
  – inducing generalizations is more tricky.

Typology
• Why are SVO languages so common?
  – SOV = 45%, SVO = 35%, VSO = 10% (± 5%)
• Each order has some benefits.
• For SVO, it’s low dependency distance.
  – S O V: min dd = 1
  – V S O: min dd = 1
  – S V O: min dd = 0
• Moreover, …
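The dependency-distance bookkeeping on the slides above can be sketched in a few lines of Python. This is only an illustration: the list-of-parent-indices encoding is my own, not Word Grammar notation, and the word-order figures treat S and O as the verb’s only dependents.

```python
def dependency_distances(parents):
    """dd of word i = number of words between word i and its parent.
    parents[i] is the index of word i's head, or None for the root."""
    return [None if p is None else abs(i - p) - 1
            for i, p in enumerate(parents)]

# "Dependency grammar is very ancient."
# Dependency -> grammar, grammar -> is (root), very -> ancient, ancient -> is
print(dependency_distances([1, 2, None, 4, 2]))  # [0, 0, None, 0, 1]

# Word-order typology: S and O both depend on V; the slide's 'min dd'
# is the worst dependency distance each order forces.
for name, parents in [("SOV", [2, 2, None]),
                      ("VSO", [None, 0, 0]),
                      ("SVO", [1, None, 1])]:
    worst = max(d for d in dependency_distances(parents) if d is not None)
    print(name, "worst dd =", worst)  # SOV 1, VSO 1, SVO 0
```

The helper reproduces the dd column from the ‘An example’ slide and the min-dd figures from the Typology slide.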
• Dependents on both sides of the head, in every word class:
  – noun: big book about linguistics
  – adjective: very happy to see you
  – preposition: just before Christmas

So what? (4)
• One of the pressures on languages is to minimize dependency distances.
• If a word allows two dependents, dd is 0 for both when the dependents sit on opposite sides.
• This is possible in all English word classes, not just in verbs.
• Maybe SVO is part of a more general ‘consistently mixed’ pattern which reduces memory load.

Long subjects
• Long subjects are hard to produce.
• The head word may de-activate before the verb is produced, hence frequent non-agreement examples:
  – “… the accuracy of the quotes have not been disputed.”
  – (the verb agrees with the nearest active noun, quotes)
• Long subjects are also hard to understand.

Why is it-extraposition helpful?
• “That extraposed sentences are easier to process than their unextraposed equivalents is clear.” (dd = 10)
• “It is clear that extraposed sentences are easier to process than their unextraposed equivalents.” (dd = 1)
• The extraposed version is more complex but easier.

Dependency structures for it-extraposition
• “It ’s clear that extraposed sentences are easier to process than their unextraposed equivalents.”
  – max dd = 2
• “That extraposed sentences are easier to process than their unextraposed equivalents is clear.”
  – max dd = 10

Other tactics to help memory
• Extraposition from NP:
  – “Two people died who were on the pavement” (for “Two people who were on the pavement died”)
• ‘Heavy NP shift’:
  – “I saw yesterday something that would have made even you laugh” (for “I saw something that would have made even you laugh yesterday”)
• Topicalisation, which reduces anaphoric distance:
  – “When we got there we sat down to rest and have a light snack” (for “We sat down to rest and have a light snack when we got there”)

Grammaticality and weight
• These special strategies override normal rules.
• But they’re only allowed for ‘heavy’ (or otherwise memory-heavy) structures.
  – *I rang up her.
  – I rang up the girl who …
• So grammarians need a theory of memory.

Thank you
• The theory is called Word Grammar: www.phon.ucl.ac.uk/home/dick/wg.htm
• This slide show can be found at www.phon.ucl.ac.uk/home/dick/talks.htm

So what? (5)
• English grammar has evolved to minimize demands on memory.
  – basic word order (consistently mixed)
  – special orders for overriding the basic order.
• Grammaticality depends on memory load as well as on grammar.
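As a closing illustration, the it-extraposition comparison above can be checked mechanically. A minimal sketch, assuming whitespace tokenisation and taking the two links straight from the slides (subject head ‘That’ to the verb ‘is’; extraposed that-clause back to ‘it’):

```python
def dd(sentence, dependent, head):
    """Words intervening between a dependent and its head
    (each word form must occur only once in the sentence)."""
    words = sentence.split()
    return abs(words.index(dependent) - words.index(head)) - 1

unextraposed = ("That extraposed sentences are easier to process "
                "than their unextraposed equivalents is clear")
extraposed = ("It 's clear that extraposed sentences are easier "
              "to process than their unextraposed equivalents")

print(dd(unextraposed, "That", "is"))  # 10: the subject's head waits for the verb
print(dd(extraposed, "that", "It"))    # 2: the clause links back to nearby 'it'
```

These are the max-dd figures (10 vs 2) from the ‘Dependency structures for it-extraposition’ slide.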