Lexical Frequency and
Linguistic Variation
Gregory R. Guy
Pennsylvania State University
8 November 2013
Issues, order of presentation
• Theories and models: Bybee, Pierrehumbert;
Exemplar Theory vs. conventional phonology
• Does frequency significantly affect phonology?
• Are frequency effects continuous, discrete, linear?
• Are frequency effects independent and orthogonal
to other constraints, or do they interact?
• Does frequency affect morphological variation?
• Does frequency affect syntactic variation?
• Negative evidence – when does frequency fail?
Other problems
• The data problem. Zipf’s law curve.
– Few tokens of low frequency items, by def.
• Theoretical issues: how to incorporate freq in
abstract representation, independence of
operations, entrenchment vs change
• Exemplars- remembering the whole cloud
• Anti-frequency effects of generalization
The data problem: Zipf’s Law
• Lexical frequency follows a power law
distribution: there are very few words with
very high frequencies, and many words with
very low frequencies.
– e.g., in the million word Brown corpus, the 135
most common words account for half the data
• Hence, for statistical analysis of individual low
frequency items, you need to process LOTS
of data
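As a rough illustration of the data problem, the Python sketch below counts word types in a token list and reports how many of the top-ranked types are needed to cover a given share of all tokens. The function name `types_needed_for_coverage` and the toy token list are assumptions made for illustration; they are not taken from the Brown corpus or any cited study.

```python
from collections import Counter

def types_needed_for_coverage(tokens, share=0.5):
    """Return how many of the most frequent word types cover `share` of all tokens."""
    counts = Counter(tokens)
    target = share * len(tokens)
    covered = 0
    # Walk down the rank-frequency list until the running total reaches the target.
    for n_types, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= target:
            return n_types
    return len(counts)

# Hypothetical toy corpus: one very frequent type, a few rare ones.
toy_tokens = "the the the the of of cat dog".split()
print(types_needed_for_coverage(toy_tokens))
```

Run on a real corpus, the same count would show how few types dominate the tokens, and therefore how few tokens are left for each low frequency item.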
Zipf’s law: Text frequency in Moby Dick
Part I.
Does lexical frequency affect
phonological processes?
Theoretical claims: Exemplar
Theory, Usage-based Phonology
• Bybee, Pierrehumbert, Hay, etc.
• Words have very rich representations,
incorporating information about
frequency and contexts of occurrence
• Frequency postulated to condition
variation, drive certain phonological
changes (e.g., lenition)
Bybee on lexical frequency
effects (Bybee 2000)
• “The more a word is used, the more it
is exposed to the reductive effect of
articulatory automation…”
• “Sound change affects stored
representations incrementally each
time a word is used…”
Phonological variables
involving “reduction”
• English: final coronal stop (t/d) deletion:
– variable in all dialects
– constrained by linguistic context (preceding
and following context, morphology)
{t,d} -> Ø/ C__##
– e.g. ol’ man, eas’ side
Spanish: final /s/ lenition
(‘aspiration and deletion’)
• Articulation: reduction in extent and duration
of lingual constriction
• Acoustic properties: lowered frequency
distribution, shorter duration
– variable in some dialects (e.g. Caribbean,
Argentine, Andalusian)
[s] -> {h,Ø}/ __##
– e.g. menos~menoh~meno
Does frequency have
significant effects on variation?
• Yes! e.g., studies of English -t,d
deletion:
– Myers & Guy 1997
– Guy, Hay & Walker 2008
• And No! e.g.:
– Myers & Guy 1997
– Erker 2008 (Caribbean Spanish -s lenition)
Is frequency significant? Yes
-t,d deletion (Myers & Guy 1997)
Monomorphemic words*
                 N     Deletions   % Del
Low frequency    151   28          18.5
High frequency   573   194         33.9
*p < .01
Obs: a lenition process, promoted by higher
lexical frequency, per Bybee.
Guy, Hay & Walker 2008: t,d deletion. p=.0005
Is frequency significant? No!
-t,d deletion (Myers & Guy 1997)
Regular Past Tense Verbs
                 N     Deletions   % Del
Low frequency    96    7           7.3
High frequency   220   18          8.2
chi-sq (1df) = .073, p > .70
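The significance tests on the two Myers & Guy tables can be checked with a short Python sketch. It assumes a plain Pearson chi-square on the 2x2 table of deleted vs. retained tokens, with no continuity correction; the function name is hypothetical, and the original analysis may have been computed differently.

```python
import math

def pearson_chi_square_2x2(low_n, low_del, high_n, high_del):
    """Pearson chi-square (1 df, no continuity correction) for deletion by frequency class."""
    observed = [
        [low_del, low_n - low_del],     # low frequency: deleted, retained
        [high_del, high_n - high_del],  # high frequency: deleted, retained
    ]
    total = low_n + high_n
    row_totals = [low_n, high_n]
    col_totals = [low_del + high_del, total - low_del - high_del]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

past_chi2, past_p = pearson_chi_square_2x2(96, 7, 220, 18)
mono_chi2, mono_p = pearson_chi_square_2x2(151, 28, 573, 194)
print(f"regular past:  chi-sq = {past_chi2:.3f}, p = {past_p:.2f}")
print(f"monomorphemes: chi-sq = {mono_chi2:.2f}, p = {mono_p:.4f}")
```

Under these assumptions the regular past table gives a chi-square of about .073, matching the reported value, and the monomorphemic table falls well below p = .01.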
Is frequency significant? No!
Spanish -s lenition (Erker 2008)
• Spectral center of gravity: N=318, r = -.02, p = .74
• Duration*: N=453, r = .07, p = .136
*n.s. overall, but see below
Spanish -s lenition: Spectral center of
gravity by frequency. (Erker 2008)
p=.74
What’s the right measure of
frequency?
• Published counts for large corpora:
– Myers & Guy: Francis & Kucera -- significant
– Guy, Hay, Walker: CELEX -- not significant
– Erker 2008: Davies -- not significant
• Local corpus counts:
– Guy, Hay, Walker: ONZE corpus, significant
– Erker 2008: Interview frequency, not significant
The right stuff: Lexical frequency in
the ONZE corpus
(Guy, Hay & Walker, 2008)
“Log frequency of word, as produced by these speakers,
has a significant effect on -t,d deletion (p=.0005).
Higher frequency words are more reduced, [per Hooper
1976, Bybee 2000, etc.]”
“Substituting Log CELEX frequency yields a nonsignificant effect (p=.13). Frequencies from CELEX,
based on data from a different dialect of English
collected many years later, are not significant
predictors of deletion, even though they correlate well
with local frequency measures.”
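A local frequency measure of the kind described in the quotation can be sketched in Python as follows. The per-million scaling, the base-10 logarithm, the function name, and the toy token list are all assumptions for illustration; the source does not say how the ONZE counts were normalized.

```python
import math
from collections import Counter

def local_log_frequencies(tokens):
    """Map each word type to log10 of its count per million tokens in this corpus."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        word: math.log10(count / total * 1_000_000)
        for word, count in counts.items()
    }

# Hypothetical tokens standing in for the speakers' own production.
toy = ["and"] * 6 + ["just"] * 3 + ["mist"]
freqs = local_log_frequencies(toy)
for word, log_freq in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{word}: {log_freq:.2f}")
```

The point of the design is that the counts come from the same speakers whose deletion rates are being modeled, rather than from a published word list for another dialect.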
Sometimes there is no right measure
Spanish -s lenition (Erker 2008)
• Spectral center of gravity
– Davies freq: r = -.02, p = .740
– Interview (IV) freq: r = .029, p = .606
• Duration
– Davies freq: r = .07, p = .136
– Interview (IV) freq: r = -.066, p = .244
What kind of effect?
• Continuous, log-linear (GHW)
• Discrete (Myers & Guy) (data set
partitioned at K&F count of 35 pmw)
• Threshold effect? (Erker, duration)
Continuous, linear effect (Guy, Hay & Walker 2008)
Discrete effect: data partitioned into
high and low freq. sets (Myers & Guy)
-t,d deletion: Monomorphemic words*
                 N     Deletions   % Del
Low frequency    151   28          18.5
High frequency   573   194         33.9
*p < .01
A threshold effect: Spanish -s
lenition: duration (Erker 2008)
• In a continuous treatment, lexical frequency is
not significantly correlated with duration:
r = .07
p = .136
(Duration is not significantly shorter in more
frequent words)
• But in a discretely partitioned data set,
frequency is significant. (High frequency
words are shorter than low frequency words)
Spanish -s lenition: Duration by
lexical frequency. (Erker 2008)
Threshold effect: -s lenition (Erker)
Frequency significant in discretely
partitioned data set
• High frequency (rank < 250)
N=318
Mean duration = 41ms
• Low frequency (rank > 250)
N=134
Mean duration = 55ms
t-test: p=.002
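The discrete comparison can be sketched in Python as a split at a rank cutoff followed by a two-sample t statistic. This is only an illustration: the tokens are hypothetical, the function name is invented, Welch's unequal-variance form is an assumption (the slide does not say which t-test was used), and a word at exactly rank 250 is placed in the low frequency group by assumption.

```python
import math
from statistics import mean, variance

def compare_by_rank_cutoff(tokens, cutoff=250):
    """Split (frequency_rank, duration_ms) pairs at `cutoff`; return group means and Welch's t."""
    high = [dur for rank, dur in tokens if rank < cutoff]   # more frequent words
    low = [dur for rank, dur in tokens if rank >= cutoff]   # less frequent words
    standard_error = math.sqrt(variance(high) / len(high) + variance(low) / len(low))
    t_stat = (mean(high) - mean(low)) / standard_error
    return mean(high), mean(low), t_stat

# Hypothetical tokens: (frequency rank of the word, /s/ duration in ms).
toy = [(10, 40), (50, 42), (200, 41), (300, 54), (900, 56), (4000, 55)]
high_mean, low_mean, t_stat = compare_by_rank_cutoff(toy)
print(high_mean, low_mean, round(t_stat, 2))
```

A negative t here means the high frequency group has the shorter mean duration, which is the direction reported on the slide.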
Do frequency effects extend
beyond phonology?
• Bybee’s focus is mainly phonological;
her mechanism for advancing lenition
by frequent repetition applies to
articulation.
• But other evidence suggests frequency
effects at other levels (e.g. morphology,
syntax, lexical semantics)
Part II.
Does frequency affect morphology?
• Does lexical frequency interact with
morphological constraints on
phonological processes?
• Does it affect morphological variation?
Are frequency effects constant,
orthogonal to morphological constraints?
• Bybee’s model of lenition fed by
frequency is phonologically motivated
• It suggests systematic preference for
lenition/deletion in higher frequency
forms
• This should be independent of and
orthogonal to morphological constraints
(also others, e.g. syntax, discourse)
But, other models make
other predictions, e.g…
• Pinker (inter alia) argues that regular
derived forms are generated by rule;
only roots and irregular forms are stored
• Hence, frequency effects on regular
derived forms shouldn’t occur, because
they have no independent lexical
representations to accumulate
exemplars or collocational information
Frequency interacts with morphology:
-t,d deletion (Myers & Guy)
[Figure: Frequency effect by morphology. % -t,d deletion by lexical frequency (low freq vs. high freq) for monomorphemes and regular past forms.]
Fruehwald on lexical
frequency and -t,d deletion
• Multivariate analysis of -t,d deletion in the
Buckeye corpus, including a treatment of
frequency
• Like Myers & Guy, this study shows
interaction b/w frequency and morphological
class: the morphology effect is weak or
neutralized at low frequencies, manifest at
high frequencies.
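One way to make the interaction concrete is to compare the size of the frequency effect within each morphological class. The Python sketch below does this with odds ratios computed from the Myers & Guy counts given earlier; the odds ratio is an illustrative choice of effect size here, not a statistic reported in the source.

```python
def odds(deletions, n):
    """Odds of deletion given `deletions` out of `n` tokens."""
    return deletions / (n - deletions)

# (N, deletions) from the Myers & Guy 1997 tables for -t,d deletion.
counts = {
    "monomorphemes": {"low": (151, 28), "high": (573, 194)},
    "regular past": {"low": (96, 7), "high": (220, 18)},
}

odds_ratios = {}
for morph_class, cells in counts.items():
    low_n, low_del = cells["low"]
    high_n, high_del = cells["high"]
    # Ratio above 1 means deletion is more likely in high frequency words.
    odds_ratios[morph_class] = odds(high_del, high_n) / odds(low_del, low_n)
    print(f"{morph_class}: high/low odds ratio = {odds_ratios[morph_class]:.2f}")
```

The ratio is a little over 2 for monomorphemes and close to 1 for regular past forms, which is the interaction in numerical form.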
-t,d morphology: Fruehwald
(probabilities of /t,d/ retention)
Does frequency affect
morphological variation?
• Morphological variation: choices among
morphological alternants in a language
• LaFave and Guy (2011) look at
frequency constraints on adjective
gradation in English
LaFave & Guy: Adjective
gradation in English
• English has two morphological options for
making comparative and superlative
adjectives:
– Synthetic: great, greater, greatest
happy, happier, happiest
– Analytic: great, more great, most great
happy, more happy, most happy
(nb: trisyllabic or longer roots have only analytic forms: *importantest)
Adjective gradation is socially
and linguistically constrained
• Lower status speakers use more
synthetic/inflected forms
• Shorter words favor synthetic forms
• Does lexical frequency have an effect
on this choice?
High Frequency Roots
(mono- and di-syllables)
i.    good   185      ix.   early  16
ii.   old    51       x.    large  14
iii.  big    41       xi.   small  14
iv.   easy   41       xii.  cheap  13
v.    high   37       xiii. long   12
vi.   young  23       xiv.  low    12
vii.  close  19       xv.   great  11
viii. hard   17       xvi.  late   10
Constraints on adjective gradation
(Goldvarb factor weights)
Probability of producing Synthetic/Inflected variant
Factor group          Factor                  N     Weight
Degree                Comparative             527   .420
                      Superlative             248   .665
Number of syllables   One                     589   .640
                      Two                     186   .139
Frequency             High                    516   .863
                      Low                     259   .025
Education             High School (or less)   79    .654
                      Undergraduate           486   .565
                      Graduate                96    .357
                      Unknown                 114   .256
N.S. FGs: Channel (Style), Age, Sex, Dialect Region, Ethnicity
input = 0.992
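To show how factor weights and the input value combine, the Python sketch below uses the logistic form of the variable rule model, in which the input and each applicable weight are added on the logit scale. The function names are hypothetical, and the particular combination of factors is chosen only for illustration, using the weights as read from the table.

```python
import math

def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1 - p))

def predicted_probability(input_prob, factor_weights):
    """Combine an input probability with factor weights on the logit scale."""
    log_odds = logit(input_prob) + sum(logit(w) for w in factor_weights)
    return 1 / (1 + math.exp(-log_odds))

INPUT = 0.992
# Comparative, one syllable, undergraduate education (weights from the table).
shared = [0.420, 0.640, 0.565]
p_high = predicted_probability(INPUT, shared + [0.863])  # high frequency root
p_low = predicted_probability(INPUT, shared + [0.025])   # low frequency root
print(f"high frequency: {p_high:.3f}, low frequency: {p_low:.3f}")
```

A weight of .5 leaves the input unchanged, weights above .5 favor the synthetic variant, and weights below .5 disfavor it.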
Morphology and lexical
frequency: conclusions
• Morphological constraints on phonology
interact with lexical frequency: derived
forms are less affected by frequency
• Lexical frequency can affect choice
among morphological alternatives:
– higher frequency favors selection of
marked alternatives;
– lower frequency favors use of most general
alternative
Part III. Does frequency affect
syntactic variation?
• Can lexical frequency affect syntactic
variation, i.e. selection among syntactic
alternatives?
• A test case: Spanish pro-drop
• Spanish is a pro-drop language; subject
pronoun expression is optional
Variable pro-drop in Spanish
• Spanish speakers alternate between
overt subject personal pronouns (SPPs)
and zeros:
– Él habla español vs. (Ø) Habla español.
– Yo vengo mañana vs. (Ø) Vengo mañana.
• This variable is well-studied; known to
be constrained by many linguistic
factors; some dialectal differences
Constraints on Spanish pro-drop
• Properties of verb
– Paradigm regularity
– Tense/Mood/Aspect
– Person/number
– Verbal semantics
• Discourse properties
– Switch reference vs. continuity of reference
The research question: Erker & Guy 2012
• If properties of the verb (like its
person/number form or paradigmatic
regularity) determine whether or not it has an
overt subject pronoun, then…
• Does the lexical frequency of the
verb also affect pro-drop?
Frequency effect
on Spanish SPPs
• Individual verbs could be stored in the
exemplar cloud with collocational
information about whether the
associated subject pronoun was
expressed or not
• Results: in 4k+ verbs from Otheguy/
Zentella NYC Spanish corpus, a main
effect of frequency is found… but….
Erker & Guy, Spanish subject pronouns:
Frequency and morphological regularity
[Figure legend: regular verbs vs. irregular verbs]
Frequency and tense-mood-aspect
Frequency and verbal semantics
Frequency and switch-reference
Constraint     Main effect:        Main effect:       Interaction w/
               Infrequent forms    Frequent forms     frequency
Regularity     No (p = .73)        Yes (p = .001)     Yes (p = .001)
Semantic Cl.   No (p = .38)        Yes (p = .001)     Yes (p = .001)
Person/Num.    No (p = .47)        Yes (p = .001)     Yes (p = .001)
TMA            Yes (p = .001)      Yes (p = .006)     Near sig. (p<.08)
Switch Ref.    Yes (p = .001)      Yes (p = .001)     Near sig. (p<.06)
Interactions: summary
• In all cases, a constraint effect is greater in
more frequent forms
• In some cases, the constraint effect is
neutralized (insignificant) in infrequent forms
• The effect of higher frequency is not constant:
in some contexts, higher frequency favors
more overt SPPs, in others there are fewer
SPPs.
Interactions: summary
• Conjecture: frequency interacts strongly with
features local to the lexical item:
– -t,d deletion: morphology, derivational history
– Spanish pronouns: regularity, semantics
• Frequency interacts weakly with constraints
external to the lexical item:
– Discourse level: switch reference
– Paradigmatic: tense/mood/aspect
How does frequency operate?
• Hypothesis: Speakers require some
minimal level of exposure to a lexical
item to formulate hypotheses about it, or
to identify patterns it participates in.
• Hence, high frequency lexical items can
be associated with collocational
information (e.g. whether a verb co-occurs with an overt subject pronoun)
Corollary: Interaction
• Speakers can formulate analogical
generalizations across higher frequency
forms; e.g. regarding paradigmatic
regularity, person/number forms, etc.
• But lower frequency forms get the ‘plain
vanilla’ treatment, with SPPs inserted at
the unmarked average rate,
undifferentiated by lexical identity or
structural properties
Part IV.
Some negative evidence:
When frequency (and
exemplar models) fail
Jesse goes to Australia
• Amer Eng speaking child moved to Oz
at the age of 1 yr 10.5 mos
• Evidence from input: Aus Eng has
regular aspirated /t/ intervocalically
where Amer Eng has a flap:
– e.g., water, little, pretty
[lɪɾl] vs. [lɪtʰl]
Jesse’s output after 10 weeks
in Australian daycare
• All intervocalic post-tonic obstruents become
voiceless!
• Coronal stops: water, pretty, but also:
daddy>datty, cuddle>cuttle
• Noncoronal stops: doggie>dockie,
table>tapu, bobble>bapu, baby buggy>bapy
bucky
• Fricatives: fuzzy bear>fussy bear, driver,
driving> drifer, drifing
Note about the evidence
• No exemplars in target dialect for devoicing
other than /t/ tokens
• No exemplars in native dialect for any
devoicing!
• Massive counterevidence in both native and
target dialect AGAINST his rule
• Frequency goes massively the wrong way
• No explanation by linguistic immaturity,
markedness
Jesse’s devoicing rule
C[-son] --> [-voice] / V ___ V[-stress]
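The rule can be sketched in Python as an operation over segment lists. Everything about the representation is an assumption for illustration: words are lists of (symbol, stressed) pairs, the vowel set and the voiced-to-voiceless mapping are simplified, and flaps are left out. The example transcriptions are rough approximations, not Jesse's recorded forms.

```python
VOWELS = set("aeiouæɪɛɔʌə")
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}

def apply_devoicing(segments):
    """Devoice obstruents between a vowel and a following unstressed vowel.

    `segments` is a list of (symbol, stressed) pairs; `stressed` matters only for vowels.
    """
    output = []
    for i, (symbol, _) in enumerate(segments):
        between_vowels = (
            0 < i < len(segments) - 1
            and segments[i - 1][0] in VOWELS
            and segments[i + 1][0] in VOWELS
        )
        # Apply only when the following vowel is unstressed (post-tonic position).
        if symbol in DEVOICE and between_vowels and not segments[i + 1][1]:
            symbol = DEVOICE[symbol]
        output.append(symbol)
    return "".join(output)

daddy = [("d", False), ("æ", True), ("d", False), ("i", False)]
again = [("ə", False), ("g", False), ("ɛ", True), ("n", False)]
print(apply_devoicing(daddy))  # medial /d/ devoices; initial /d/ does not
print(apply_devoicing(again))  # following vowel is stressed, so /g/ stays voiced
```

The rule refers only to natural classes and position, never to the identity or frequency of the word, which is what makes it an abstract operation.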
Repairing the overshoot
• Jesse pares away incorrect devoicings by
(abstract) natural classes: fricatives, velar
stops, labial stops
• Persistent difficulty identifying which Am Eng
flaps matched Aus Eng voiceless stops (e.g.,
continued use of ‘datty’), despite frequent
counterexamples
Conclusion
• Jesse’s evidence suggests he has
abstract underlying representations, one
per lexical item, and performs abstract
phonological operations on these URs.
• This evidence is inconsistent with
exemplar theory, i.e., a word-by-word
frequency driven model
Morphological acquisition in
coronal stop deletion
• Guy & Boyd find age-graded treatments of
semi-weak past tense forms in coronal stop
deletion (e.g., left, kept, told)
• Most favorable category for deletion in young
children
• Deleted at rate of monomorphemes for
adolescents, young adults
• In middle age, most speakers move to a
conservative treatment, suggesting a
‘derivational’ morphological analysis
Therefore, children are unable
to match parental inputs,
regardless of frequency
• If at a given stage of acquisition,
children cannot formulate ‘derivational’
morphological structures, they will
deviate from adult usage no matter how
frequently they hear such forms
-t,d deletion probabilities for Curt and Kay C., King of Prussia
[Figure: probability of deletion (0 to 1) for Curt C., 44, and Kay C., 34, by morphological class: Monomorphemic (fist, hand, cold), Derivational (lost, kept, told), Preterit (tossed, rolled)]
Probability matching of David C., 7 years old, King of Prussia PA
[Figure: same three classes: fist, hand, cold; lost, kept, told; tossed, rolled]
Probability matching of 16 children, 3-5 years old, So. Philadelphia. Source: Figure 7.4, Roberts 1994
[Figure: probability of deletion (0 to 1) for Children (N=1841) and Parents (N=604), by morphological class: Monomorpheme (fist, hand, cold), Derivational (lost, kept, told), Preterit (tossed, rolled)]
General Conclusions
• Frequency has some significant effects, in
morphosyntax as well as phonology
• But sometimes it has no effect, or fails
completely
• It’s not a general and unitary phenomenon;
shows interactions
• Threshold effects: speakers need some
minimum number of instances of a word to
formulate lexically-linked constraints,
collocations, etc.
• Otherwise they treat lexical items the same
Therefore…
• (Some) lexical representations must
incorporate (some) frequency info
• Conventional abstract lexical
representations (e.g. generative
phonology) are too impoverished to
adequately account for freq. info.
• Richer representations are needed,
including information about collocations
But do the facts require full
exemplar clouds in memory?
• This may be overkill:
• Fine frequency gradations are not
usually evident in the data
• Exemplar/frequentist models sometimes
make completely wrong predictions
• Abstract representations and
generalized operations are required
How frequency effects work
• Where frequency effects are evident,
high frequency appears to favor:
– lenition over retention
– marked over unmarked
– lexically specific over general
– ‘exceptions’ over ‘rules’
• In other words, lexical frequency
resembles the ‘elsewhere’ condition
• If you have a lot of information about a
word, from hearing and using it often,
you can formulate lexically-specific
hypotheses on that word, and store
them in your mental representation
• Failing that, you treat words using
general processes.
• High lexical frequency ENABLES but
does not REQUIRE specific linguistic
outcomes.
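The "elsewhere"-style logic in these bullets can be sketched in Python as a lookup with a fallback. This is a toy model only: the exposure threshold `min_tokens`, the function name, and all the counts are hypothetical, and nothing here is a claim about how speakers actually store this information.

```python
def deletion_rate(word, lexical_stats, general_rate, min_tokens=20):
    """Use a word-specific rate only when the word has been encountered often enough."""
    seen, deleted = lexical_stats.get(word, (0, 0))
    if seen >= min_tokens:
        return deleted / seen   # lexically specific treatment
    return general_rate         # elsewhere case: the general process applies

# Hypothetical exposure counts: word -> (tokens heard, tokens with deletion).
stats = {"and": (200, 150), "mist": (3, 0)}
print(deletion_rate("and", stats, general_rate=0.3))   # frequent: own rate
print(deletion_rate("mist", stats, general_rate=0.3))  # infrequent: general rate
```

High frequency makes the word-specific branch available, but the specific rate itself can be higher or lower than the general one, in line with the claim that frequency enables rather than requires an outcome.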
So,
Sometimes the frequency magic works,
And sometimes it doesn’t,
It’s only part of the story.
Thank you,
Thank you…
Frequently