Effect of cochlear implants on children’s perception and
production of speech prosody
Takayuki Nakata a)
Department of Complex and Intelligent Systems, Future University Hakodate, 116-2 Kamedanakano,
Hakodate, Hokkaido 041-8655, Japan
Sandra E. Trehub
Department of Psychology, University of Toronto, 3359 Mississauga Road North, Mississauga,
Ontario L5L 1C6, Canada
Yukihiko Kanda
Kanda ENT Clinic and Nagasaki Bell Hearing Center, 4-25 Wakakusa-machi, Nagasaki, Nagasaki 852-8023, Japan

a) Author to whom correspondence should be addressed. Electronic mail: [email protected]
(Received 10 June 2010; revised 30 November 2011; accepted 30 November 2011)
Japanese 5- to 13-yr-olds who used cochlear implants (CIs) and a comparison group of normally
hearing (NH) Japanese children were tested on their perception and production of speech prosody.
For the perception task, they were required to judge whether semantically neutral utterances that
were normalized for amplitude were spoken in a happy, sad, or angry manner. The performance of
NH children was error-free. By contrast, child CI users performed well below ceiling but above
chance levels on happy- and sad-sounding utterances but not on angry-sounding utterances. For the
production task, children were required to imitate stereotyped Japanese utterances expressing
disappointment and surprise as well as culturally typical representations of crow and cat sounds.
NH 5- and 6-year-olds produced significantly poorer imitations than older hearing children, but age
was unrelated to the imitation quality of child CI users. Overall, child CI users' imitations were
significantly poorer than those of NH children, but they did not differ significantly from the
imitations of the youngest NH group. Moreover, there was a robust correlation between the
performance of child CI users on the perception and production tasks; this implies that difficulties
with prosodic perception underlie their difficulties with prosodic imitation.
© 2012 Acoustical Society of America. [DOI: 10.1121/1.3672697]
PACS number(s): 43.64.Me, 43.70.Mn, 43.71.Ky, 43.66.Ts [JES]
I. INTRODUCTION
Prosody can distinguish statements (e.g., You’re tired.)
from questions (e.g., You’re tired?) and nouns (e.g., object)
from verbs (e.g., object). It also conveys information about a
speaker’s feelings (e.g., happy, sad) and communicative
intentions (e.g., humor or irony). These prosodic distinctions
are realized acoustically by variations in fundamental frequency (F0) or pitch, duration, and amplitude. For example,
happy and angry utterances are higher and more variable in
pitch, louder, and faster than sad utterances (Banse and
Scherer, 1996). Speaking rate (Breitenstein et al., 2001) and
pitch patterns (Murray and Arnott, 1993; Ladd, 1996) are
thought to play a dominant role in the perception of intonation and vocal emotion with amplitude making secondary
contributions (Murray and Arnott, 1993; Ladd, 1996).
Because cochlear implants (CIs) provide limited information about the temporal fine structure of speech, its pitch and harmonics in particular (Geurts and Wouters, 2001; Green et al., 2004), CI users are likely to have difficulty perceiving and producing some aspects of prosody. Not surprisingly, the degraded pitch and spectral cues provided by implants also have adverse consequences for CI users' perception of music (Vongpaisal et al., 2006; Cooper et al., 2008; Kang et al., 2009).
The distinction between statements (i.e., falling terminal
pitch) and yes-no questions (i.e., rising terminal pitch) may
be easier than other prosodic distinctions because implant
users can capitalize on gross periodicity cues in the signal
(Rosen, 1992). In general, child and adult CI users differentiate questions from statements, but performance is well below
that of normally hearing (NH) peers (Most and Peled, 2007;
Peng et al., 2008; Meister et al., 2009). Moreover, child CI
users’ ability to distinguish the intonation patterns of statements from questions is poorer than that of severely and profoundly deaf children who use hearing aids (Most and Peled,
2007).
The decoding of vocal emotions involves even greater
challenges, especially for child CI users. Adult CI users generally recognize vocal emotions at better than chance levels,
but their performance is substantially poorer than that of NH
adults. With semantically neutral sentences that portray
happy, sad, angry, and neutral emotions, CI users achieve
51% correct on a four-alternative forced-choice task as compared with 84% correct for NH adults (Pereira, 2000).
Because of overlapping acoustic cues, perhaps it is not surprising that happy utterances are confused with angry
utterances and sad utterances with neutral utterances. In a
subsequent study with a five-alternative forced-choice task
(anxious emotions added to the other four), adult CI users
achieved roughly 45% correct on utterances with normally
varying amplitude and 37% on utterances with normalized
amplitude (Luo et al., 2007). The corresponding performance levels for hearing adults were 90% and 87% correct,
respectively. CI users’ ability to distinguish emotions with
higher amplitude (happy, angry, anxious) from those with
lower amplitude (sad, neutral) and the adverse consequences
of normalized amplitude indicate that intensity cues play a
much greater role in emotion recognition for CI adults than
for NH adults. Moreover, it is likely that CI users’ confusion
among happy, angry, and anxious emotions, which differed
mainly in mean pitch and pitch variability, stemmed from
their problems with pitch resolution.
Because the adult CI users in these studies were postlingually deafened, they would have well-established representations of various vocal emotions. Nevertheless, those
representations were insufficient in the absence of critical
pitch cues. The situation is even more challenging for prelingually deaf children with CIs, who must learn the vocal emotion categories from spectrally degraded input. One would
therefore expect child CI users to have even greater difficulty
recognizing vocal emotions than adult CI users.
In one study, NH children and adolescents identified
emotions more accurately from audiovisual information than
from visual information alone, but the addition of auditory
information did not improve performance for same-age CI
users (Most and Aviner, 2009). In another study, CI users
8–13 years of age failed to differentiate happy, sad, angry,
and fearful utterances, but they showed age-appropriate differentiation of facial portrayals of the same emotions
(Hopyan-Misakyan et al., 2009). Child CI users have more
success identifying the valence of nonverbal expressions of
emotion (e.g., giggling, crying, and interjections like ow and
mmm), even though they often assign positive valence to crying (Schorr, 2005). Interestingly, performance on the latter
task is correlated with self-ratings of quality of life with a
cochlear implant (Schorr et al., 2009), highlighting the contribution of vocal emotion perception to effective communication and social adjustment.
Although the general consensus is that adult and child
CI users have considerable difficulty interpreting vocal emotion, there is limited research in this realm compared to the
large literature on speech perception. Accordingly, one goal
of the present investigation was to examine vocal emotion
perception in child CI users and a comparison group of NH
children from Japan. Instead of the four target emotions
(happy, sad, angry, fearful) used by Hopyan-Misakyan et al.
(2009) and the six emotions (happy, sad, angry, fearful, disgusted, surprised) used by Most and Aviner (2009), the present study was restricted to happy, sad, and angry emotions.
Instead of stimuli that yield modest levels of performance
with NH children, the present stimuli yielded virtually unanimous agreement with NH children. Finally, unlike the vocal
emotion stimuli in previous studies with CI users, the present
stimuli were normalized for amplitude. Amplitude normalization eliminated an important cue for CI users, making it
possible to examine the contributions of spectral and speech
rate information. Child CI users were expected to identify
happy and sad utterances at better than chance levels not
only because these are the earliest emotion categories
acquired by children (Saarni, 1999) but also because they are
the most acoustically contrastive. CI users were expected to
confuse happy with angry utterances because of overlapping
acoustic cues for these vocal emotions.
To date, there has been no research on the production of
vocal emotions in CI users. For adult implant users, such
productions would be based largely on patterns of speech
acquired prior to their hearing loss. By contrast, successful
production of vocal emotions in child CI users would depend
on their ability to extract the relevant features from the input,
associate them with appropriate emotional categories, and
monitor feedback from their own productions.
Research on vocal production in child CI users has
focused largely on speech intelligibility (Peng et al., 2004a;
Flipsen and Colvard, 2006; Svirsky et al., 2007) and only
minimally on prosody. Because pitch level and pitch contour
can be cues to word meaning in tone languages such as Mandarin or Cantonese, these cues are critical to speech perception and production in such languages. The available
evidence reveals that child CI users have difficulty perceiving lexical tone contrasts (Barry et al., 2002; Ciocca et al.,
2002). It is not surprising, then, that they have difficulty
mastering the production of such contrasts (Wei et al., 2000;
Xu et al., 2004), even after 6 years of implant experience.
Nevertheless, the problems cannot be attributed entirely to
device limitations because some CI users succeed in perceiving and producing Mandarin tones (Peng et al., 2004b).
In general, cochlear implants have resulted in significant
improvements in prosodic production on the part of prelingually deaf children. Conversational samples of speech
reveal superior pitch contour and phrasing for child CI users
relative to same-age hearing aid users with severe or profound hearing loss (Lenden and Flipsen, 2007). Nevertheless, child CI users have substantial deficiencies in prosodic
production relative to NH children. For example, adults categorize the statements and yes-no questions produced by 7- to
20-yr-old CI users with 73% accuracy (chance level of 50%)
and those produced by age-matched NH individuals with
97% accuracy (Peng et al., 2008). Moreover, the quality of
CI users’ productions correlates significantly with their accuracy of perceiving the distinctions between statements and
questions. In another study, child CI users were required to
imitate a question with rising intonation contour (Are you
ready?) on an annual basis for several years post implant
(Peng et al., 2007). Production of rising intonation was consistent with gradual improvement from year to year until 7
years of implant experience, when about 62% of utterances
were judged as contour appropriate. Because there were no
comparison data from NH children, it is difficult to separate
the contribution of implant experience from age-related
changes in prosodic imitation. Moreover, rising intonation is
not obligatory for yes-no questions that involve inversion of
subject and verb (e.g., Are you ready?), which complicates
the interpretation of productions that lack rising intonation.
In principle, it would be preferable to evaluate prosodic production from imitative contexts where the target prosodic patterns are obligatory.
TABLE I. Background information of CI participants.

Child   Age at implant (yr)   Age at test (yr)   Implant use (yr)   Hearing aid at test   Age at amplification (yr)   Device                      Coding
CI-1    2.5                   6.4                3.9                No                    1.9                         Cochlear Nucleus            ACE
CI-2    3.4                   6.6                3.2                Yes                   0.5                         Cochlear Nucleus            ACE
CI-3    3.0                   6.6                3.7                Yes                   2.0                         Cochlear Nucleus            ACE
CI-4    4.9                   7.3                2.4                Yes                   1.2                         Cochlear Nucleus            ACE
CI-5    2.9                   7.4                4.5                Yes                   0.5                         Cochlear Nucleus            ACE
CI-6    2.8                   7.4                4.6                No                    1.9                         Cochlear Nucleus            ACE
CI-7    4.2                   7.5                3.2                Yes                   2.2                         Cochlear Nucleus            ACE
CI-8    3.1                   7.6                4.6                Yes                   1.0                         Cochlear Nucleus            ACE
CI-9    3.1                   8.0                4.9                Yes                   2.0                         Cochlear Nucleus            ACE
CI-10   3.7                   8.3                4.6                Yes                   2.0                         Advanced Bionics Clarion    SAS
CI-11   4.8                   8.3                3.5                No                    2.4                         Cochlear Nucleus            ACE
CI-12   2.6                   8.5                5.9                Yes                   1.0                         Advanced Bionics Clarion    SAS
CI-13   5.3                   8.7                3.4                Yes                   0.7                         Cochlear Nucleus            ACE
CI-14   3.8                   9.2                5.3                Yes                   2.1                         Cochlear Nucleus            ACE
CI-15   3.1                   9.8                6.7                Yes                   1.5                         Cochlear Nucleus            SPEAK
CI-16   4.5                   11.3               6.9                No                    0.6                         Cochlear Nucleus            SPEAK
CI-17   4.4                   12.6               8.3                No                    never                       Cochlear Nucleus            SPEAK
CI-18   2.1                   13.2               8.1                No                    1.7                         Cochlear Nucleus            SPEAK
A second goal of the present investigation was to
examine changes in prosodic imitation as a function of age
and implant use. Accordingly, we evaluated the ability of
child CI users and NH children to imitate two typical Japanese exclamatory utterances—one involving falling-rising
intonation, the other rising intonation—and two characteristic simulations of animal sounds—a crow (two successive
falling contours) and a cat (rising-falling intonation). Onomatopoeia and sound symbolism are much more common
in Japanese than in English, both in formal and in vernacular language (Hasada, 2001), so that expressions such as
those used here would be understood widely in the general
population. Presumably, age-related changes in the prosodic
productions of NH children would provide an effective
standard against which to evaluate the abilities of CI
children.
II. METHOD
A. Participants
The target participants in the present study were 18 Japanese children (3 boys, 15 girls), 5–13 yr of age (M = 8.60, SD = 1.97), all but one of whom were congenitally deaf and used CIs for at least 2 years (M = 4 years, 10 months). One
child became deaf at 4 years of age but was retained in the
sample because of performance within 1 SD of the other
children. All CI children were relatively successful implant
users as indicated by their placement in age-appropriate
classes in regular schools and their ease of communicating
orally with hearing adults and children. Prior to implantation, these children had used bilateral hearing aids. Details
of etiology, age of implantation, type of implant, and age of
testing are provided in Table I. A control sample consisted
of 44 Japanese NH children (17 boys, 27 girls) 5–10 yr of
age (for boys, M = 7.59, SD = 1.90, and for girls, M = 7.38, SD = 1.59). Their hearing status was not tested but was based on parent report. An additional five children were tested but
excluded from the final sample because of failure to understand the task (one CI child, one NH child) or technical recording difficulties (three NH children). All CI and NH
children participated in the prosody production task. After
testing 20 NH children (four 5- to 6-yr-olds, five 7- to 8-yr-olds, and eleven 9- to 10-yr-olds) on both prosody perception and prosody production tasks, testing of NH children on the prosody perception task was discontinued because of error-free performance by all of the children. NH undergraduates (n = 31) from
Japan who were 19–22 years of age were recruited to rate
the utterance imitations of CI and NH children. The hearing
status of the undergraduates was not tested but was based on self-report.
B. Apparatus
For the CI children, testing occurred in a sound-attenuating booth at a hearing clinic that the children visited
regularly. Twenty-nine of the NH children were tested in a sound-attenuating booth. The remaining children, 17 of the 22 5- to 6-yr-olds, were tested with headphones in a quiet
room on the premises of their kindergarten. There was no
indication that differences in the test conditions of NH
children—sound field in a sound-attenuating booth or headphones in a quiet room—affected performance. Presentation
of audio stimuli and recording of responses for perception
and production tasks were controlled by customized software
on a touch screen notebook computer. A desktop computer
with regular monitor controlled presentation and recording
of responses for rating tasks. For testing conducted in the
sound-attenuating booth, stimuli were presented from a loudspeaker 1 m in front of the participant. For testing conducted
at the kindergarten, stimuli were presented over headphones.
The NH undergraduates were tested in a sound-attenuating
booth with audio stimuli presented over loudspeakers (Bose
Micro Music Monitor).
TABLE II. Sentences for prosody perception task.
1. Denki-o tsukemasu (I turn on the light)
2. Isu-ni suwarimasu (I sit on a chair)
3. Enpitsu-o motteimasu (I have a pencil)
4. Michi-o arukimasu (I walk in the street)

FIG. 1. Schematic happy, sad, and angry faces.
C. Stimuli
For the prosody perception task, four semantically neutral utterances (see Table II) were produced by the same woman with three types of affective prosody: happy (fast speaking rate, high pitch level), sad (slow speaking rate, low pitch level), or angry (fast speaking rate, high pitch level),
for a total of 12 utterances. The speaker was instructed to
portray four emotionally neutral utterances with happy, sad,
and angry paralanguage at four levels of emotional intensity,
reflecting gradations of the specific emotion in question.
From these recordings, the best or clearest exemplars of
happy, sad, and angry emotions were selected for each utterance and emotion category. Root-mean-square amplitude
was equalized across utterances by means of PRAAT 5.0.34
speech analysis and synthesis software (Boersma and
Weenink, 2008), and stimuli were presented at approximately 65-70 dB SPL. Mean pitch, pitch range, and speaking
rate are shown in Table III. As can be seen in Table III,
mean F0, F0 range, and F0 variability were highest for happy
sentences, intermediate for angry sentences, and lowest for
sad sentences. Utterance duration was shortest (i.e., fastest
speaking rate) for happy sentences, similar for angry sentences, and longest (i.e., slowest speaking rate) for sad sentences. Sentence 1 with happy prosody, sentence 2 with sad
prosody, and sentence 3 with angry prosody (see Table II)
were used as practice stimuli and also as test stimuli. Three
line drawings depicted schematic facial expressions of happiness, sadness, or anger (see Fig. 1).
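The amplitude equalization described above (matching root-mean-square levels across utterances) was carried out in PRAAT. The short Python sketch below illustrates the same operation; the file names, the target RMS value, and the use of the soundfile package are illustrative assumptions, not the authors' actual processing script.

```python
# Sketch of RMS amplitude equalization across utterances (illustrative only;
# the study used PRAAT). Assumes mono WAV files and the soundfile package.
import numpy as np
import soundfile as sf

def equalize_rms(in_paths, out_paths, target_rms=0.05):
    """Scale each recording so that its RMS amplitude equals target_rms."""
    for src, dst in zip(in_paths, out_paths):
        y, sr = sf.read(src)              # y: float samples in [-1, 1], sr: sample rate
        rms = np.sqrt(np.mean(y ** 2))    # current root-mean-square amplitude
        y_scaled = y * (target_rms / rms)
        y_scaled = np.clip(y_scaled, -1.0, 1.0)  # guard against clipping
        sf.write(dst, y_scaled, sr)

# Hypothetical usage with the 12 perception-task utterances:
# equalize_rms([f"utt_{i:02d}.wav" for i in range(1, 13)],
#              [f"utt_{i:02d}_norm.wav" for i in range(1, 13)])
```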
For the prosody production task (see Table IV), there
were four utterances consisting of two Japanese exclamatory
utterances—a stereotyped expression of disappointment
involving falling and rising contour; a stereotyped expression of surprise involving rising contour—and two characteristic simulations of animal sounds—Japanese renditions of
the caw caw of a crow (repeated falling contour) and the
meow of a cat (rising and falling contour). These utterances
were presented at 65-70 dB SPL. A digital audio recorder
(SONY TCD-D8) and microphone (EV Cobalt Co9) were
used for recordings of stimuli for the prosody perception and
production tasks and for recordings of responses for the production task.
D. Procedure
All children were tested individually. The prosody perception task always preceded the prosody production task.
The prosody perception task was introduced as a listening
game. First, the experimenter demonstrated each type of
emotional expression with animated examples (e.g., “When
I’m feeling happy, I talk like this”). Then the children were
encouraged to show how they talk when they feel happy,
sad, and angry. After those demonstrations, children proceeded to the perception task in which they were asked to
judge whether the girl, who was named Mei, was feeling
happy, sad, or angry. They were required to respond by
touching the picture (one of three schematic drawings) on a
touch screen notebook computer that showed her feelings.
Initially, they received one practice trial with each of three
emotion categories, and they received verbal feedback about
their response choices. If children responded correctly to the
three utterances, they proceeded to the test phase. If not, the
practice phase was continued with the same three utterances
until they understood the goal of the game (i.e., matching the vocal emotional expressions to the pictured facial expressions) and responded correctly to those utterances. During the test phase, the 12 utterances, which included the three used in the practice phase, were presented in random order without feedback. For the prosody production task, children were asked to imitate, as closely as possible, each of the model's utterances, which were presented in fixed order: exclamatory expression of disappointment (falling-rising contour), exclamatory expression of surprise (rising contour), crow sound (falling-falling contour), and cat sound (rising-falling contour) (see Fig. 2).

FIG. 2. Amplitude waveforms (top) and F0 patterns (bottom) of the four utterances in the prosody production task.

TABLE III. Acoustic features of utterances in prosody perception task.

Emotion   Sentence      Mean F0 (Hz)   F0 range (Hz)   SD of F0 (Hz)   Sentence duration (s)
Happy     1             374            171             43              1.009
Happy     2             398            240             41              1.192
Happy     3             405            140             43              1.394
Happy     4             397            224             52              0.966
Happy     Mean of 1–4   393            194             45              1.140
Sad       1             291            116             33              1.236
Sad       2             283            76              10              1.368
Sad       3             263            99              28              1.561
Sad       4             270            62              13              1.503
Sad       Mean of 1–4   277            88              21              1.417
Angry     1             350            174             43              1.088
Angry     2             386            229             45              1.208
Angry     3             362            136             39              1.406
Angry     4             317            113             28              1.050
Angry     Mean of 1–4   354            163             39              1.188

TABLE IV. Utterances for prosody production task.
1. Ah-ah (exclamatory expression of disappointment)
2. Eh (exclamatory expression of surprise)
3. Kah-kah (simulated sound of crow)
4. Nya-o (simulated sound of cat)
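For readers who wish to derive measures like those in Table III (mean F0, F0 range, F0 variability, and utterance duration) from their own recordings, a minimal sketch is given below. The authors obtained their values with PRAAT; the use of librosa, the file name, and the pitch floor and ceiling here are assumptions made only for illustration.

```python
# Illustrative extraction of the Table III measures from one utterance.
# The study used PRAAT; librosa and the settings below are assumptions.
import numpy as np
import librosa

def prosodic_features(path, fmin=75.0, fmax=600.0):
    y, sr = librosa.load(path, sr=None)                  # keep original sample rate
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    f0 = f0[~np.isnan(f0)]                               # keep voiced frames only
    return {
        "mean_f0_hz": float(np.mean(f0)),
        "f0_range_hz": float(np.max(f0) - np.min(f0)),
        "f0_sd_hz": float(np.std(f0)),
        "duration_s": len(y) / sr,
    }

# Hypothetical usage:
# print(prosodic_features("sentence1_happy.wav"))
```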
Undergraduate students were tested individually. They
listened to the utterances from CI and NH children and rated
how closely each utterance matched the model on a 10-point
scale (1 = not similar at all to 10 = extremely similar). To
familiarize raters with the range of expected performance,
there was a practice phase in which they listened to imitations of utterance 1 by four children whose utterances were
also presented in the rating task. These four children (three
CI children and one NH child) were chosen on the basis of
preliminary ratings indicating that they spanned the range of
rating categories from lowest to highest.
III. RESULTS
NH children were 100% accurate on the perception task. For CI children, preliminary one-sample t-tests showed correct identification above chance levels (33%) for happy utterances (M = 61%, SD = 32), t(17) = 3.64, P = 0.002, and for sad utterances (M = 64%, SD = 33), t(17) = 3.88, P = 0.001, but not for angry utterances (M = 39%, SD = 31), t(17) = 0.76, P = 0.460. An analysis of variance (ANOVA) of CI children's performance revealed a significant effect of emotion category, F(2,34) = 4.19, P = 0.024, η² = 0.20. Post hoc t-tests with Bonferroni-adjusted P values revealed significantly poorer identification of angry utterances than sad utterances, P = 0.013. Angry utterances were misidentified as happy 83% (SD = 21) of the time, significantly more often than would be expected by chance, t(15) = 6.22, P < 0.001. CI children's performance on prosody perception was not correlated with chronological age, r(16) = 0.008, age of implantation, r(16) = 0.25, or implant experience, r(16) = 0.10, Ps > 0.05.
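A minimal sketch of this style of analysis is shown below: one-sample t-tests of per-child proportion-correct scores against the 33% chance level, followed by Bonferroni-adjusted pairwise comparisons among the emotion categories. The score arrays are randomly generated placeholders, not the study's data.

```python
# Sketch of the perception analyses: one-sample t-tests against chance (1/3)
# and Bonferroni-adjusted pairwise comparisons. The score arrays are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical proportion-correct scores for 18 CI children per emotion.
happy = rng.normal(0.61, 0.32, 18).clip(0, 1)
sad = rng.normal(0.64, 0.33, 18).clip(0, 1)
angry = rng.normal(0.39, 0.31, 18).clip(0, 1)

chance = 1 / 3
for name, scores in [("happy", happy), ("sad", sad), ("angry", angry)]:
    t, p = stats.ttest_1samp(scores, popmean=chance)
    print(f"{name}: t(17) = {t:.2f}, P = {p:.3f}")

# Bonferroni-adjusted pairwise comparisons among the three emotion categories.
pairs = [("happy", happy, "sad", sad), ("happy", happy, "angry", angry),
         ("sad", sad, "angry", angry)]
for n1, s1, n2, s2 in pairs:
    t, p = stats.ttest_rel(s1, s2)   # paired t-test (same children in each condition)
    print(f"{n1} vs {n2}: P = {min(p * len(pairs), 1.0):.3f} (Bonferroni)")
```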
Undergraduate raters were divided into two groups. The first group (n = 14) rated 22 CI children and 22 NH children. The second group (n = 17) rated 9 randomly selected children from the 22 CI children rated by the first group as well as 28 additional NH children. Interrater reliability on ratings for the 9 CI children was highly significant (intraclass correlation coefficient = 0.96, P < 0.001). For each NH and CI child, there were four scores corresponding to the average ratings of the judges on each of the four imitated utterances. CI children's performance on prosody production, as reflected in the ratings, showed no significant correlation with chronological age, r(16) = 0.06, age of implantation, r(16) = 0.37, or implant experience, r(16) = 0.13, Ps > 0.05. Mean ratings for the three age groups of hearing children as well as the entire group of CI children are shown in Fig. 3.
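The intraclass correlation reported above indexes agreement among raters. The sketch below computes a two-way, average-measures ICC, often written ICC(2,k) in the Shrout and Fleiss scheme, from a children-by-raters matrix; the rating matrix is hypothetical, and this is one common formulation rather than necessarily the exact procedure used in the study.

```python
# Sketch of an average-measures intraclass correlation, ICC(2,k),
# for a matrix of ratings with children as rows and raters as columns.
# The example matrix is hypothetical.
import numpy as np

def icc_2k(ratings):
    """ICC(2,k): two-way random effects, average of k raters (Shrout & Fleiss)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)

# Hypothetical 9 children x 17 raters matrix of 1-10 similarity ratings:
rng = np.random.default_rng(1)
child_quality = rng.uniform(2, 8, size=(9, 1))
ratings = np.clip(np.round(child_quality + rng.normal(0, 1, size=(9, 17))), 1, 10)
print(f"ICC(2,k) = {icc_2k(ratings):.2f}")
```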
An ANOVA to examine the effects of hearing status (CI or NH) and utterance type on adults' ratings of children's imitations revealed a main effect of hearing status, F(1,60) = 18.68, P < 0.001, η² = 0.24. Collapsing over age and utterance type, CI children received significantly lower ratings (M = 3.80, SD = 1.13) than NH children (M = 5.51, SD = 1.51). There was also a main effect of utterance type, F(3,180) = 20.42, P < 0.001, η² = 0.25 (see Fig. 4). Averaging across hearing status and age group, ratings were highest for surprise (M = 5.73, SD = 1.92), with its rising contour, which significantly exceeded all other ratings, Ps < 0.02 (with Bonferroni adjustment). Next highest were the ratings of the simulated cat sound (M = 5.15, SD = 1.72), followed by the ratings of the simulated crow sound (M = 5.08, SD = 1.64) and disappointment ratings (M = 4.34, SD = 1.77). Ratings of disappointment were significantly lower than all others, Ps < 0.01 (with Bonferroni adjustment).

FIG. 4. Mean ratings of CI and NH children on the four utterance types. Error bars indicate standard errors.

A separate ANOVA with age (5–6, 7–8, or 9–10 yr) as the independent variable and ratings of NH children's imitations averaged over the four utterance types as the dependent variable revealed a highly significant effect of age, F(2,41) = 24.80, P < 0.001, η² = 0.55. Ratings for NH children increased linearly from 5–6 yr of age (M = 4.44, SD = 1.13) to 7–8 yr of age (M = 6.11, SD = 1.23) and to 9–10 yr of age (M = 6.86, SD = 0.73). Follow-up Scheffé tests revealed that 5- to 6-yr-olds (M = 4.43, SD = 1.13) received significantly lower ratings than 7- to 8-yr-olds (M = 6.11, SD = 1.23) and 9- to 10-yr-olds (M = 6.86, SD = 0.73), Ps < 0.005, who did not differ from one another. As can be seen in Fig. 3, the oldest NH children performed reasonably well but below ceiling levels (rating of 10). Imitations of the youngest NH children were only modestly similar to the modeled utterances.

FIG. 3. Mean ratings of CI and NH children's prosodic imitations. Error bars indicate standard errors.

CI children's imitation scores were compared against the three age groups of NH children by Bonferroni-adjusted pairwise t-tests. Ratings of CI children's imitations (M = 3.80, SD = 1.13) trended lower but did not differ significantly from those of 5- to 6-yr-old NH children (M = 4.44, SD = 1.13), P = 0.26. CI children's ratings were significantly poorer than those of NH 7- to 8-yr-olds (M = 6.11, SD = 1.23) and 9- to 10-yr-olds (M = 6.86, SD = 0.73), P < 0.01. While age was significantly correlated with imitation ratings for NH children, r(42) = 0.66, P < 0.001, neither age nor implant experience was correlated with the performance of child CI users, rs < 0.14, Ps > 0.05. Nevertheless, there was a robust correlation between prosody perception and production among child CI users, r(16) = 0.56, P < 0.05. The outcome of all analyses remained unchanged when the child who became deaf at 4 years of age was excluded.
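The ratings analysis above is a mixed design, with hearing status as a between-subjects factor and utterance type as a within-subjects factor. One way to sketch such an analysis is given below; the long-format data frame is fabricated for illustration, and the pingouin package is an assumption rather than the authors' analysis software.

```python
# Sketch of a mixed ANOVA: hearing status (between) x utterance type (within)
# on mean imitation ratings. The data frame is hypothetical; the study's
# analysis was not necessarily run with pingouin.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(2)
utterances = ["disappointment", "surprise", "crow", "cat"]
rows = []
for child in range(62):                      # 18 CI + 44 NH children
    group = "CI" if child < 18 else "NH"
    base = 3.8 if group == "CI" else 5.5     # group means reported in the text
    for utt in utterances:
        rows.append({"child": child, "group": group, "utterance": utt,
                     "rating": base + rng.normal(0, 1.3)})
df = pd.DataFrame(rows)

aov = pg.mixed_anova(data=df, dv="rating", within="utterance",
                     subject="child", between="group")
print(aov)
```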
IV. DISCUSSION
Congenitally deaf Japanese children with cochlear
implants succeeded in identifying happy and sad utterances
that were amplitude normalized, but they failed to identify
angry utterances. In fact, they typically misidentified angry
utterances as happy. Although happy and sad utterances usually differ in pitch level, pitch variability, amplitude, and
speaking rate, happy and angry utterances tend to be minimally contrastive in these respects (Banse and Scherer,
1996). The provision of amplitude cues, which were unavailable in the present study, would have increased the distinctions between happy and sad utterances but not between
happy and angry utterances. The happy and angry utterances
were somewhat contrastive in pitch level and pitch variability, but they were very similar in speaking rate (see
Table III). In view of the pitch processing limitations of CI
users (Geurts and Wouters, 2001), it is likely that child CI
users relied primarily on differences in speaking rate, which
led to successful differentiation of happy from sad utterances
and unsuccessful differentiation of happy from angry
utterances. There is suggestive evidence that child CI users
identify familiar voices on the basis of articulatory timing
cues and speaking rate even though NH children do so
largely on the basis of pitch and timbre cues (Vongpaisal
et al., 2010). Pitch and timbre cues in the present study
enabled NH children to achieve errorless performance on all
emotion categories.
What factors underlie CI children’s success in differentiating happy from sad utterances and their failure to do so in
previous research (Hopyan-Misakyan et al., 2009)? Perhaps
the three-alternative forced-choice task in the present study
was somewhat easier than the four-alternative forced-choice
task used by Hopyan-Misakyan et al. (2009) and the six-alternative task used by Most and Peled (2007). What was
undoubtedly more important was the clarity of emotional
expression in the present study, which resulted in errorless
performance for NH children. By contrast, NH children’s
correct identification of happy and sad utterances in the
Hopyan-Misakyan et al. (2009) study was 77% and 83%,
respectively, dropping to 48% and 50% for CI children.
No feedback was provided in the present task or in
previous studies of emotion identification by CI users.
Nevertheless, feedback is likely to help child CI users focus on the relevant cues and optimize their performance. Feedback,
which has been used successfully in child CI users’
differentiation of same-gender voices (Vongpaisal et al.,
2010), can be regarded as a form of short-term training.
There are indications of the efficacy of more extended training in the domain of prosody. For example, several weeks of
prosodic training for CI children led to improvement in prosodic perception and imitation (Klieve and Jeanes, 2001). At
present, however, habilitation programs for CI children focus
largely on phonemic perception and production as well as
language acquisition with a very limited role for prosody.
Surprisingly, NH children were not very proficient at
imitating stereotyped intonation patterns that are used widely
in Japanese conversation. In fact, they showed an unexpectedly protracted course of development. Even the oldest NH
children, who were 9–10 years of age, did not consistently
provide precise imitations of the modeled utterances. Perhaps it is not surprising, then, that the child CI users, most of
whom were well beyond 6 yr of age, performed much like
NH 5- and 6-yr-olds. Peng et al. (2007) found a protracted
and uneven developmental course for the imitation of rising
intonation patterns by child and adult CI users who were
native speakers of Mandarin. Specifically, CI users progressed steadily from year to year, reaching a plateau 7 yr
post-implant and subsequently showing some regression.
Most of the child CI users in the present sample had considerably less than 7 yr of implant experience. Unlike the situation in the Peng et al. (2007) study, however, years of
implant experience were uncorrelated with the performance
of CI users in the present study. Participants in the Peng
et al. (2007) study had a much wider range of implant
experience than those in the present study, and they were
considerably older, which could account for the contrasting
experiential effects. The target utterances were also different: stereotyped emotional expressions and animal sounds in
the present study and yes-no questions in the other study.
The relative difficulty of imitating the four utterances in
the present study is unlikely to be related to their relative
familiarity. All of the utterances would be highly familiar to
Japanese speakers, but the expressions of surprise and disappointment have the greatest frequency of occurrence. One
might argue that animal sounds are more salient for young
children than are conventional expressions of surprise and
disappointment. However, neither frequency of occurrence
nor age appropriateness can account for the observed patterns of performance. The order of difficulty was similar for
NH and CI children (see Fig. 4), increasing the likelihood
that relative difficulty was related to complexity of pitch patterning (see Fig. 2). For example, the highest ratings, corresponding to the most accurate reproductions, were obtained
for the surprise expression, which featured a unidirectional
(rising) pitch contour, and the lowest ratings for the disappointment expression, which featured a bi-directional (falling-rising) contour. Because child CI users were relatively
successful in reproducing the surprise expression, with its
rising pitch contour, they might also be successful in producing yes-no questions, which are challenging for many adult
CI users (Peng et al., 2007). It is important to note, however,
that the absence of a model is likely to result in poorer prosodic production skills than those observed in the present
study.
The significant correlation between CI children’s perception of affective prosody and their prosodic imitations
implies that perceptual factors played an important role in
the observed production difficulties. In fact, the correlations
mirror those found by Peng et al. (2008) for the perception
and production of statements and yes-no questions by child
and young adult CI users. Nevertheless, device limitations
cannot be entirely responsible for the difficulties of child CI
users. In some studies, years of implant experience have
made important contributions to prosodic perception and
production (Peng et al., 2007; Peng et al., 2008). The role of
everyday experience and learning was also evident in the
modest prosodic imitations of 5- to 6-yr-old NH children relative to their older counterparts. Finally, the ability of some
CI users to produce age-appropriate intonation patterns
implies that the spectral processing limitations of current
implants provide challenges but not insurmountable barriers
to progress in this domain. An important task for future
research is to isolate the factors associated with positive outcomes in the best performers. Moreover, because prosodic
perception and production contribute to social and emotional
well-being (Schorr et al., 2009), and they are responsive to
intervention in CI users (Klieve and Jeanes, 2001), it would
be prudent to invest more research and rehabilitation effort
in this realm.
ACKNOWLEDGMENTS
This research was supported by grants from the Japan
Society for the Promotion of Science to T.N. and from the
Natural Sciences and Engineering Research Council of Canada to S.E.T. We gratefully acknowledge the cooperation of
many parents and children whose participation made this
study possible. We also acknowledge the insightful advice
of Dr. Haruo Takahashi and Hideo Tanaka and assistance
with data collection from Akiko Ito, Yumiko Kido, Yoko
Kakita, Noriko Matsunaga, Kyoko Ohba, Kyoko Sasaki, and
Naoko Maruo.
Banse, R., and Scherer, K. R. (1996). “Acoustic profiles in vocal emotion
expression,” J. Pers. Soc. Psychol. 70, 614–636.
Barry, J. G., Blamey, P. J., Martin, L. F. A., Lee, K. Y. S., Tang, T., Ming,
Y. Y., and van Hasselt, C. A. (2002). “Tone discrimination in Cantonese-speaking children using a cochlear implant,” Clin. Linguist. Phonet. 16,
79–99.
Boersma, P., and Weenink, D. (2008). “PRAAT” (University of Amsterdam).
Breitenstein, C., Van Lancker, D., and Daum, I. (2001). “The contribution
of speech rate and pitch variation to the perception of vocal emotions in a
German and an American sample,” Cognit. Emotion 15, 57–79.
Ciocca, V., Francis, A. L., Aisha, R., and Wong, L. (2002). “The perception
of Cantonese lexical tones by early-deafened cochlear implantees,”
J. Acoust. Soc. Am. 111, 2250–2256.
Cooper, W. B., Tobey, E., and Loizou, P. C. (2008). “Music perception by
cochlear implant and normal hearing listeners as measured by the Montreal Battery for Evaluation of Amusia,” Ear Hear. 29, 618–626.
Flipsen, P., Jr., and Colvard, L. G. (2006). “Intelligibility of conversational
speech produced by children with cochlear implants,” J. Commun. Disord.
39, 93–108.
Geurts, L., and Wouters, J. (2001). “Coding of the fundamental frequency
in continuous interleaved sampling processors for cochlear implants,”
J. Acoust. Soc. Am. 109, 713–726.
Green, T., Faulkner, A., and Rosen, S. (2004). “Enhancing temporal cues
to voice pitch in continuous interleaved sampling cochlear implants,”
J. Acoust. Soc. Am. 116, 2298–2310.
Hasada, R. (2001). “Meaning of Japanese sound-symbolic emotion words,”
in Emotions in Crosslinguistic Perspective, edited by J. Harkins and A.
Wierzbicka (Mouton de Gruyter, Berlin), pp. 217–253.
Hopyan-Misakyan, T. M., Gordon, K. A., Dennis, M., and Papsin, B. C.
(2009). “Recognition of affective speech prosody and facial affect in deaf
children with unilateral right cochlear implants,” Child Neuropsychol. 15,
136–146.
Kang, R., Nimmons, G. L., Drennan, W., Lognion, J., Ruffin, C., Nie, K.,
Won, J. H., Worman, T., Yueh, B., and Rubinstein, J. (2009).
“Development and validation of the University of Washington Clinical
Assessment of Music Perception test,” Ear Hear. 30, 411–418.
Klieve, S., and Jeanes, R. C. (2001). “Perception of prosodic features by
children with cochlear implants: Is it sufficient for understanding meaning
differences in language?,” Deaf. Educ. Int. 3, 15–37.
Ladd, D. R. (1996). Intonational Phonology (Cambridge University Press,
Cambridge), pp. 6–33.
Lenden, J. M., and Flipsen, P., Jr. (2007). “Prosody and voice characteristics
of children with cochlear implants,” J. Commun. Disord. 40, 66–81.
Luo, X., Fu, Q.-J., and Galvin, J. J. (2007). “Vocal emotion recognition by
normal-hearing listeners and cochlear implant users,” Trends Amplif. 11,
301–315.
Meister, H., Landwehr, M., Pyschny, V., Walger, M., and von Wedel, H.
(2009). “The perception of prosody and speaker gender in normal-hearing
listeners and cochlear implant recipients,” Int. J. Audiol. 48, 38–48.
Most, T., and Aviner, C. (2009). “Auditory, visual, and auditory-visual perception of emotions by individuals with cochlear implants, hearing aids,
and normal hearing,” J. Deaf Stud. Deaf Educ. 14, 449–464.
Most, T., and Peled, M. (2007). “Perception of suprasegmental features of
speech by children with cochlear implants and children with hearing aids,”
J. Deaf Stud. Deaf Educ. 12, 350–361.
Murray, I. R., and Arnott, J. L. (1993). “Toward the simulation of emotion
in synthetic speech: A review of the literature on human vocal emotion,”
J. Acoust. Soc. Am. 93, 1097–1108.
Peng, S.-C., Spencer, L. J., and Tomblin, J. B. (2004a). “Speech intelligibility of pediatric cochlear implant recipients with 7 years of device experience,” J. Speech Lang. Hear. Res. 47, 1227–1236.
Peng, S.-C., Tomblin, J. B., Cheung, H., Lin, Y.-S., and Wang, L.-S.
(2004b). “Perception and production of Mandarin tones in prelingually
deaf children with cochlear implants,” Ear Hear. 25, 251–264.
Peng, S.-C., Tomblin, J. B., Spencer, L. J., and Hurtig, R. R. (2007).
“Imitative production of rising speech intonation in pediatric cochlear
implant recipients,” J. Speech Lang. Hear. Res. 50, 1210–1227.
Peng, S.-C., Tomblin, J. B., and Turner, C. W. (2008). “Production and perception of speech intonation in pediatric cochlear implant recipients and
individuals with normal hearing,” Ear Hear. 29, 336–351.
Pereira, C. (2000). “The perception of vocal affect by cochlear implantees,”
in Cochlear Implants, edited by S. B. Waltzman and N. L. Cohen
(Thieme Medical, New York), pp. 343–345.
Rosen, S. (1992). “Temporal information in speech: Acoustic, auditory and
linguistic aspects,” in Processing Complex Sounds by the Auditory System,
edited by R. P. Carlyon, C. J. Darwin, and I. J. Russell (Clarendon Press,
New York), pp. 73–79.
Saarni, C. (1999). The Development of Emotional Competence (Guilford
Press, New York), pp. 131–161.
Schorr, E. A. (2005). “Social and emotional functioning of children with
cochlear implants,” Ph.D. thesis, University of Maryland, College Park,
MD.
Schorr, E. A., Roth, F. P., and Fox, N. A. (2009). “Quality of life for children with cochlear implants: Perceived benefits and problems and the perception of single words and emotional sounds,” J. Speech Lang. Hear.
Res. 52, 141–152.
Svirsky, M. A., Chin, S. B., and Jester, A. (2007). “The effects of age at
implantation on speech intelligibility in pediatric cochlear implant
users: Clinical outcomes and sensitive periods,” Audiol. Med. 5,
293–306.
Vongpaisal, T., Trehub, S. E., and Schellenberg, E. G. (2006). “Song recognition by children and adolescents with cochlear implants,” J. Speech
Lang. Hear. Res. 49, 1091–1103.
Vongpaisal, T., Trehub, S. E., Schellenberg, E. G., van Lieshout, P., and
Papsin, B. C. (2010). “Children with cochlear implants recognize their
mother’s voice,” Ear Hear. 31, 555–566.
Wei, W. I., Wong, R., Hui, Y., Au, D. K. K., Wong, B. Y. K., Ho, W. K.,
Tsang, A., Kung, P., and Chung, E. (2000). “Chinese tonal language rehabilitation following cochlear implantation in children,” Acta Otolaryngol.
120, 218–221.
Xu, L., Li, Y., Hao, J., Chen, X., Xue, S. A., and Han, D. (2004). “Tone production in Mandarin-speaking children with cochlear implants: A preliminary study,” Acta Otolaryngol. 124, 363–367.