Lexical Frequency and Linguistic Variation
Gregory R. Guy
Pennsylvania State University
8 November 2013

Issues, order of presentation
• Theories and models: Bybee, Pierrehumbert; Exemplar Theory vs. conventional phonology
• Does frequency significantly affect phonology?
• Are frequency effects continuous, discrete, linear?
• Are frequency effects independent and orthogonal to other constraints, or do they interact?
• Does frequency affect morphological variation?
• Does frequency affect syntactic variation?
• Negative evidence: when does frequency fail?

Other problems
• The data problem. Zipf’s law curve.
  – Few tokens of low frequency items, by definition
• Theoretical issues: how to incorporate frequency in abstract representation, independence of operations, entrenchment vs. change
• Exemplars: remembering the whole cloud
• Anti-frequency effects of generalization

The data problem: Zipf’s Law
• Lexical frequency follows a power law distribution: there are very few words with very high frequencies, and many words with very low frequencies.
  – e.g., in the million-word Brown corpus, the 135 most common words account for half the data
• Hence, for statistical analysis of individual low frequency items, you need to process LOTS of data

Zipf’s law: Text frequency in Moby Dick

Part I. Does lexical frequency affect phonological processes?

Theoretical claims: Exemplar Theory, Usage-based Phonology
• Bybee, Pierrehumbert, Hay, etc.
• Words have very rich representations, incorporating information about frequency and contexts of occurrence
• Frequency postulated to condition variation, drive certain phonological changes (e.g., lenition)

Bybee on lexical frequency effects (Bybee 2000)
• “The more a word is used, the more it is exposed to the reductive effect of articulatory automation…”
• “Sound change affects stored representations incrementally each time a word is used…”

Phonological variables involving “reduction”
• English: final coronal stop (t/d) deletion:
  – variable in all dialects
  – constrained by linguistic context (preceding and following context, morphology)
    {t,d} -> Ø / C __ ##
  – e.g. ol’ man, eas’ side

Spanish: final /s/ lenition (‘aspiration and deletion’)
• Articulation: reduction in extent and duration of lingual constriction
• Acoustic properties: lowered frequency distribution, shorter duration
  – variable in some dialects (e.g. Caribbean, Argentine, Andalucian)
    [s] -> {h,Ø} / __ ##
  – e.g. menos~menoh~meno

Does frequency have significant effects on variation?
• Yes! e.g., studies of English -t,d deletion:
  – Myers & Guy 1997
  – Guy, Hay & Walker 2008
• And No! e.g.:
  – Myers & Guy 1997
  – Erker 2008 (Caribbean Spanish -s lenition)

Is frequency significant? Yes
-t,d deletion (Myers & Guy 1997)
Monomorphemic words*
                  N     Deletions   % Del
Low frequency     151   28          18.5
High frequency    573   194         33.9
*p < .01
Obs: a lenition process, promoted by higher lexical frequency, per Bybee.

Guy, Hay & Walker 2008: t,d deletion. p=.0005

Is frequency significant? No!
-t,d deletion (Myers & Guy 1997)
Regular Past Tense Verbs
                  N     Deletions   % Del
Low frequency     96    7           7.3
High frequency    220   18          8.2
chi-sq (1df) = .073, p > .70

Is frequency significant? No!
Spanish -s lenition (Erker 2008)
• Spectral center of gravity: N=318, r = -.02, p = .74
• Duration*: N=453, r = .07, p = .136
*n.s. overall, but see below

Spanish -s lenition: Spectral center of gravity by frequency. (Erker 2008) p=.74

What’s the right measure of frequency?
• Published counts for large corpora:
  – Myers & Guy: Francis & Kucera (significant)
  – Guy, Hay, Walker: CELEX (not significant)
  – Erker 2008: Davies (not significant)
• Local corpus counts:
  – Guy, Hay, Walker: ONZE corpus (significant)
  – Erker 2008: Interview frequency (not significant)

The right stuff: Lexical frequency in the ONZE corpus (Guy, Hay & Walker, 2008)
“Log frequency of word, as produced by these speakers, has a significant effect on -t,d deletion (p=.0005). Higher frequency words are more reduced, [per Hooper 1976, Bybee 2000, etc.]”
“Substituting Log CELEX frequency yields a nonsignificant effect (p=.13). Frequencies from CELEX, based on data from a different dialect of English collected many years later, are not significant predictors of deletion, even though they correlate well with local frequency measures.”

Sometimes there is no right measure
Spanish -s lenition (Erker 2008)
• Spectral center of gravity
  – Davies freq: r = -.02, p = .740
  – IV freq: r = .029, p = .606
• Duration
  – Davies freq: r = .07, p = .136
  – IV freq: r = -.066, p = .244

What kind of effect?
• Continuous, log-linear (GHW)
• Discrete (Myers & Guy) (data set partitioned at K&F count of 35 pmw)
• Threshold effect? (Erker, duration)

Continuous, linear effect (Guy, Hay & Walker 2008)

Discrete effect: data partitioned into high and low freq. sets (Myers & Guy)
-t,d deletion: Monomorphemic words*
                  N     Deletions   % Del
Low frequency     151   28          18.5
High frequency    573   194         33.9
*p < .01

A threshold effect: Spanish -s lenition: duration (Erker 2008)
• In a continuous treatment, lexical frequency is not significantly correlated with duration: r = .07, p = .136
  (Duration is not significantly shorter in more frequent words)
• But in a discretely partitioned data set, frequency is significant.
  (High frequency words are shorter than low frequency words)

Spanish -s lenition: Duration by lexical frequency.
(Erker 2008)

Threshold effect: -s lenition (Erker)
Frequency significant in discretely partitioned data set
• High frequency (rank < 250): N=318, mean duration = 41 ms
• Low frequency (rank > 250): N=134, mean duration = 55 ms
t-test: p=.002

Do frequency effects extend beyond phonology?
• Bybee’s focus is mainly phonological; her mechanism for advancing lenition by frequent repetition applies to articulation.
• But other evidence suggests frequency effects at other levels (e.g. morphology, syntax, lexical semantics)

Part II. Does frequency affect morphology?
• Does lexical frequency interact with morphological constraints on phonological processes?
• Does it affect morphological variation?

Are frequency effects constant, orthogonal to morphological constraints?
• Bybee’s model of lenition fed by frequency is phonologically motivated
• It suggests systematic preference for lenition/deletion in higher frequency forms
• This should be independent of and orthogonal to morphological constraints (also others, e.g. syntax, discourse)

But, other models make other predictions, e.g…
• Pinker (inter alia) argues that regular derived forms are generated by rule; only roots and irregular forms are stored
• Hence, frequency effects on regular derived forms shouldn’t occur, because they have no independent lexical representations to accumulate exemplars or collocational information

Frequency interacts with morphology: -t,d deletion (Myers & Guy)
[Figure: Frequency effect by morphology. y-axis: % -t,d deletion; x-axis: lexical frequency (low freq, high freq); series: monomorphemes, regular past]

Fruehwald on lexical frequency and -t,d deletion
• Multivariate analysis of -t,d deletion in the Buckeye corpus, including a treatment of frequency
• Like Myers & Guy, this study shows interaction b/w frequency and morphological class: the morphology effect is weak or neutralized at low frequencies, manifest at high frequencies.

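
The Myers & Guy counts quoted earlier (monomorphemic words and regular past tense verbs, each split into low and high frequency) can be rechecked with a short calculation. The sketch below is not the authors' procedure: it assumes an uncorrected Pearson chi-square on each 2x2 table of deleted vs. retained tokens, and the names `chi_square_2x2`, `TABLES`, and `analyse` are made up for illustration. The p-value uses the fact that, for one degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# (N, deletions) per frequency class, copied from the Myers & Guy tables.
TABLES = {
    "monomorphemic": {"low": (151, 28), "high": (573, 194)},
    "regular past": {"low": (96, 7), "high": (220, 18)},
}


def analyse(table):
    """Return deletion percentages and the chi-square test for one word class."""
    (n_lo, del_lo), (n_hi, del_hi) = table["low"], table["high"]
    chi2, p = chi_square_2x2(del_lo, n_lo - del_lo, del_hi, n_hi - del_hi)
    return {
        "pct_low": 100 * del_lo / n_lo,
        "pct_high": 100 * del_hi / n_hi,
        "chi2": chi2,
        "p": p,
    }


results = {name: analyse(table) for name, table in TABLES.items()}

for name, r in results.items():
    print(
        f"{name}: low {r['pct_low']:.1f}%, high {r['pct_high']:.1f}%, "
        f"chi2 = {r['chi2']:.3f}, p = {r['p']:.4f}"
    )
```

Under these assumptions the regular past tense table gives a chi-square that rounds to the reported .073 with p above .70, while the monomorphemic table gives p well below .01. This is the interaction the slides describe: a frequency effect in monomorphemes and none in regular past forms.
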
-t,d morphology: Fruehwald (probabilities of /t,d/ retention)

Does frequency affect morphological variation?
• Morphological variation: choices among morphological alternants in a language
• LaFave and Guy (2011) look at frequency constraints on adjective gradation in English

LaFave & Guy: Adjective gradation in English
• English has two morphological options for making comparative and superlative adjectives:
  – Synthetic: great, greater, greatest; happy, happier, happiest
  – Analytic: great, more great, most great; happy, more happy, most happy
  (nb: trisyllabic or longer roots have only analytic forms: *importantest)

Adjective gradation is socially and linguistically constrained
• Lower status speakers use more synthetic/inflected forms
• Shorter words favor synthetic forms
• Does lexical frequency have an effect on this choice?

High Frequency Roots (mono- and di-syllables)
i.     good    185        ix.    early   16
ii.    old     51         x.     large   14
iii.   big     41         xi.    small   14
iv.    easy    41         xii.   cheap   13
v.     high    37         xiii.  long    12
vi.    young   23         xiv.   low     12
vii.   close   19         xv.    great   11
viii.  hard    17         xvi.   late    10

Constraints on adjective gradation (Goldvarb factor weights)
Probability of producing Synthetic/Inflected variant
Factor group          Factor                   N      Weight
Degree                Comparative              527    .420
                      Superlative              248    .665
Number of Syllables   One                      589    .640
                      Two                      186    .139
Frequency             High                     516    .863
                      Low                      259    .025
Education             High School (or less)    79     .654
                      Undergraduate            486    .565
                      Graduate                 96     .357
                      Unknown                  114    .256
N.S. FGs: Channel (Style), Age, Sex, Dialect Region, Ethnicity
input = 0.992

Morphology and lexical frequency: conclusions
• Morphological constraints on phonology interact with lexical frequency: derived forms are less affected by frequency
• Lexical frequency can affect choice among morphological alternatives:
  – higher frequency favors selection of marked alternatives;
  – lower frequency favors use of most general alternative

Part III. Does frequency affect syntactic variation?
• Can lexical frequency affect syntactic variation, i.e.
  selection among syntactic alternatives?
• A test case: Spanish pro-drop
• Spanish is a pro-drop language; subject pronoun expression is optional

Variable pro-drop in Spanish
• Spanish speakers alternate between overt subject personal pronouns (SPPs) and zeros:
  – Él habla español vs. (Ø) Habla español.
  – Yo vengo mañana vs. (Ø) Vengo mañana.
• This variable is well-studied; known to be constrained by many linguistic factors; some dialectal differences

Constraints on Spanish pro-drop
• Properties of verb
  – Paradigm regularity
  – Tense/Mood/Aspect
  – Person/number
  – Verbal semantics
• Discourse properties
  – Switch reference vs. continuity of reference

The research question (Erker & Guy 2012)
• If properties of the verb (like its person/number form or paradigmatic regularity) determine whether or not it has an overt subject pronoun, then…
• Does the lexical frequency of the verb also affect pro-drop?

Frequency effect on Spanish SPPs
• Individual verbs could be stored in the exemplar cloud with collocational information about whether the associated subject pronoun was expressed or not
• Results: in 4k+ verbs from Otheguy/Zentella NYC Spanish corpus, a main effect of frequency is found… but…

Erker & Guy, Spanish subject pronouns: Frequency and morphological regularity
[Figure legend: regular verbs, irregular verbs]

Frequency and tense-mood-aspect

Frequency and verbal semantics

Frequency and switch-reference

Constraint     Main effect:        Main effect:       Interaction
               infrequent forms    frequent forms     w/ frequency
Regularity     No (p = .73)        Yes (p = .001)     Yes (p = .001)
Semantic Cl.   No (p = .38)        Yes (p = .001)     Yes (p = .001)
Person/Num.    No (p = .47)        Yes (p = .001)     Yes (p = .001)
TMA            Yes (p = .001)      Yes (p = .006)     Near sig. (p<.08)
Switch Ref.    Yes (p = .001)      Yes (p = .001)     Near sig. (p<.06)

Interactions: summary
• In all cases, a constraint effect is greater in more frequent forms
• In some cases, the constraint effect is neutralized (insignificant) in infrequent forms
• The effect of higher frequency is not constant: in some contexts, higher frequency favors more overt SPPs, in others there are fewer SPPs.

Interactions: summary
• Conjecture: frequency interacts strongly with features local to the lexical item:
  – -t,d deletion: morphology, derivational history
  – Spanish pronouns: regularity, semantics
• Frequency interacts weakly with constraints external to the lexical item:
  – Discourse level: switch reference
  – Paradigmatic: tense/mood/aspect

How does frequency operate?
• Hypothesis: Speakers require some minimal level of exposure to a lexical item to formulate hypotheses about it, or to identify patterns it participates in.
• Hence, high frequency lexical items can be associated with collocational information (e.g. whether a verb cooccurs with an overt subject pronoun)

Corollary: Interaction
• Speakers can formulate analogical generalizations across higher frequency forms; e.g. regarding paradigmatic regularity, person/number forms, etc.
• But lower frequency forms get the ‘plain vanilla’ treatment, with SPPs inserted at the unmarked average rate, undifferentiated by lexical identity or structural properties

Part IV. Some negative evidence: When frequency (and exemplar models) fail

Jesse goes to Australia
• Amer Eng-speaking child moved to Oz at the age of 1 yr 10.5 mos
• Evidence from input: Aus Eng has regular aspirated /t/ intervocalically where Amer Eng has a flap:
  – e.g., water, little, pretty: [lɪɾl] vs. [lɪtʰl]

Jesse’s output after 10 weeks in Australian daycare
• All intervocalic post-tonic obstruents become voiceless!
• Coronal stops: water, pretty, but also: daddy>datty, cuddle>cuttle
• Noncoronal stops: doggie>dockie, table>tapu, bobble>bapu, baby buggy>bapy bucky
• Fricatives: fuzzy bear>fussy bear; driver, driving>drifer, drifing

Note about the evidence
• No exemplars in target dialect for devoicing other than /t/ tokens
• No exemplars in native dialect for any devoicing!
• Massive counterevidence in both native and target dialect AGAINST his rule
• Frequency goes massively the wrong way
• No explanation by linguistic immaturity, markedness

Jesse’s devoicing rule
C[-son] --> [-voice] / V ___ V[-stress]

Repairing the overshoot
• Jesse pares away incorrect devoicings by (abstract) natural classes: fricatives, velar stops, labial stops
• Persistent difficulty identifying which Am Eng flaps matched Aus Eng voiceless stops (e.g., continued use of ‘datty’), despite frequent counterexamples

Conclusion
• Jesse’s evidence suggests he has abstract underlying representations, one per lexical item, and performs abstract phonological operations on these URs.
• This evidence is inconsistent with exemplar theory, i.e., a word-by-word frequency driven model

Morphological acquisition in coronal stop deletion
• Guy & Boyd find age-graded treatments of semi-weak past tense forms in coronal stop deletion (e.g., left, kept, told)
• Most favorable category for deletion in young children
• Deleted at rate of monomorphemes for adolescents, young adults
• In middle age, most speakers move to a conservative treatment, suggesting a ‘derivational’ morphological analysis

Therefore, children are unable to match parental inputs, regardless of frequency
• If at a given stage of acquisition, children cannot formulate ‘derivational’ morphological structures, they will deviate from adult usage no matter how frequently they hear such forms

-t,d deletion probabilities for Curt and Kay C., King of Prussia
[Figure: y-axis: probability of deletion; series: Curt C., 44 and Kay C., 34; x-axis: Monomorphemic (fist, hand, cold), Derivational (lost, kept, told), Preterit (tossed, rolled)]

Probability matching of David C., 7 years old, King of Prussia PA
[Figure: x-axis categories: fist, hand, cold; lost, kept, told; tossed, rolled]

Probability matching of 16 children, 3-5 years old, So. Philadelphia. Source: Figure 7.4, Roberts 1994
[Figure: y-axis: probability of deletion; series: Children (N=1841), Parents (N=604); x-axis: Monomorpheme (fist, hand, cold), Derivational (lost, kept, told), Preterit (tossed, rolled)]

General Conclusions
• Frequency has some significant effects, in morphosyntax as well as phonology
• But sometimes it has no effect, or fails completely
• It’s not a general and unitary phenomenon; shows interactions
• Threshold effects: speakers need some minimum number of instances of a word to formulate lexically-linked constraints, collocations, etc.
• Otherwise they treat lexical items the same

Therefore…
• (Some) lexical representations must incorporate (some) frequency info
• Conventional abstract lexical representations (e.g. generative phonology) are too impoverished to adequately account for freq. info.
• Richer representations are needed, including information about collocations

But do the facts require full exemplar clouds in memory?
• This may be overkill:
• Fine frequency gradations are not usually evident in the data
• Exemplar/frequentist models sometimes make completely wrong predictions
• Abstract representations and generalized operations are required

How frequency effects work
• Where frequency effects are evident, high frequency appears to favor:
  – lenition over retention
  – marked over unmarked
  – lexically specific over general
  – ‘exceptions’ over ‘rules’
• In other words, lexical frequency resembles the ‘elsewhere’ condition
• If you have a lot of information about a word, from hearing and using it often, you can formulate lexically-specific hypotheses on that word, and store them in your mental representation
• Failing that, you treat words using general processes.
• High lexical frequency ENABLES but does not REQUIRE specific linguistic outcomes.

So,
Sometimes the frequency magic works,
And sometimes it doesn’t,
It’s only part of the story.

Thank you, Thank you… Frequently