Effect of cochlear implants on children’s perception and production of speech prosody

Takayuki Nakata (a)
Department of Complex and Intelligent Systems, Future University Hakodate, 116-2 Kamedanakano, Hakodate, Hokkaido 041-8655, Japan

Sandra E. Trehub
Department of Psychology, University of Toronto, 3359 Mississauga Road North, Mississauga, Ontario L5L 1C6, Canada

Yukihiko Kanda
Kanda ENT Clinic and Nagasaki Bell Hearing Center, 4-25 Wakakusa-machi, Nagasaki, Nagasaki 852-8023, Japan

(Received 10 June 2010; revised 30 November 2011; accepted 30 November 2011)

Japanese 5- to 13-yr-olds who used cochlear implants (CIs) and a comparison group of normally hearing (NH) Japanese children were tested on their perception and production of speech prosody. For the perception task, they were required to judge whether semantically neutral utterances that were normalized for amplitude were spoken in a happy, sad, or angry manner. The performance of NH children was error-free. By contrast, child CI users performed well below ceiling; they exceeded chance levels on happy- and sad-sounding utterances but not on angry-sounding utterances. For the production task, children were required to imitate stereotyped Japanese utterances expressing disappointment and surprise as well as culturally typical representations of crow and cat sounds. NH 5- and 6-year-olds produced significantly poorer imitations than older hearing children, but age was unrelated to the imitation quality of child CI users. Overall, child CI users’ imitations were significantly poorer than those of NH children, but they did not differ significantly from the imitations of the youngest NH group. Moreover, there was a robust correlation between the performance of child CI users on the perception and production tasks; this implies that difficulties with prosodic perception underlie their difficulties with prosodic imitation. © 2012 Acoustical Society of America. [DOI: 10.1121/1.3672697]

PACS number(s): 43.64.Me, 43.70.Mn, 43.71.Ky, 43.66.Ts [JES]

J. Acoust. Soc. Am. 131 (2), February 2012, Pages 1307–1314

(a) Author to whom correspondence should be addressed. Electronic mail: [email protected]

I. INTRODUCTION

Prosody can distinguish statements (e.g., You’re tired.) from questions (e.g., You’re tired?) and nouns (e.g., OBject) from verbs (e.g., obJECT). It also conveys information about a speaker’s feelings (e.g., happy, sad) and communicative intentions (e.g., humor or irony). These prosodic distinctions are realized acoustically by variations in fundamental frequency (F0) or pitch, duration, and amplitude. For example, happy and angry utterances are higher and more variable in pitch, louder, and faster than sad utterances (Banse and Scherer, 1996). Speaking rate (Breitenstein et al., 2001) and pitch patterns (Murray and Arnott, 1993; Ladd, 1996) are thought to play a dominant role in the perception of intonation and vocal emotion, with amplitude making secondary contributions (Murray and Arnott, 1993; Ladd, 1996). Because cochlear implants (CIs) provide limited information about the temporal fine structure of speech, particularly its pitch and harmonics (Geurts and Wouters, 2001; Green et al., 2004), CI users are likely to have difficulty perceiving and producing some aspects of prosody. Not surprisingly, the degraded pitch and spectral cues provided by implants also have adverse consequences for CI users’ perception of music (Vongpaisal et al., 2006; Cooper et al., 2008; Kang et al., 2009).
The distinction between statements (i.e., falling terminal pitch) and yes-no questions (i.e., rising terminal pitch) may be easier than other prosodic distinctions because implant users can capitalize on gross periodicity cues in the signal (Rosen, 1992). In general, child and adult CI users differentiate questions from statements, but performance is well below that of normally hearing (NH) peers (Most and Peled, 2007; Peng et al., 2008; Meister et al., 2009). Moreover, child CI users’ ability to distinguish the intonation patterns of statements from questions is poorer than that of severely and profoundly deaf children who use hearing aids (Most and Peled, 2007).

The decoding of vocal emotions involves even greater challenges, especially for child CI users. Adult CI users generally recognize vocal emotions at better than chance levels, but their performance is substantially poorer than that of NH adults. With semantically neutral sentences that portray happy, sad, angry, and neutral emotions, CI users achieve 51% correct on a four-alternative forced-choice task as compared with 84% correct for NH adults (Pereira, 2000). Because of overlapping acoustic cues, perhaps it is not surprising that happy utterances are confused with angry utterances and sad utterances with neutral utterances. In a subsequent study with a five-alternative forced-choice task (anxious emotions added to the other four), adult CI users achieved roughly 45% correct on utterances with normally varying amplitude and 37% on utterances with normalized amplitude (Luo et al., 2007). The corresponding performance levels for hearing adults were 90% and 87% correct, respectively. CI users’ ability to distinguish emotions with higher amplitude (happy, angry, anxious) from those with lower amplitude (sad, neutral) and the adverse consequences of normalized amplitude indicate that intensity cues play a much greater role in emotion recognition for CI adults than for NH adults. Moreover, it is likely that CI users’ confusion among happy, angry, and anxious emotions, which differed mainly in mean pitch and pitch variability, stemmed from their problems with pitch resolution. Because the adult CI users in these studies were postlingually deafened, they would have well-established representations of various vocal emotions. Nevertheless, those representations were insufficient in the absence of critical pitch cues.

The situation is even more challenging for prelingually deaf children with CIs, who must learn the vocal emotion categories from spectrally degraded input. One would therefore expect child CI users to have even greater difficulty recognizing vocal emotions than adult CI users. In one study, NH children and adolescents identified emotions more accurately from audiovisual information than from visual information alone, but the addition of auditory information did not improve performance for same-age CI users (Most and Aviner, 2009). In another study, CI users 8–13 years of age failed to differentiate happy, sad, angry, and fearful utterances, but they showed age-appropriate differentiation of facial portrayals of the same emotions (Hopyan-Misakyan et al., 2009).
Child CI users have more success identifying the valence of nonverbal expressions of emotion (e.g., giggling, crying, and interjections like ow and mmm), even though they often assign positive valence to crying (Schorr, 2005). Interestingly, performance on the latter task is correlated with self-ratings of quality of life with a cochlear implant (Schorr et al., 2009), highlighting the contribution of vocal emotion perception to effective communication and social adjustment.

Although the general consensus is that adult and child CI users have considerable difficulty interpreting vocal emotion, there is limited research in this realm compared to the large literature on speech perception. Accordingly, one goal of the present investigation was to examine vocal emotion perception in child CI users and a comparison group of NH children from Japan. Instead of the four target emotions (happy, sad, angry, fearful) used by Hopyan-Misakyan et al. (2009) and the six emotions (happy, sad, angry, fearful, disgusted, surprised) used by Most and Aviner (2009), the present study was restricted to happy, sad, and angry emotions. Instead of stimuli that yield modest levels of performance with NH children, the present stimuli yielded virtually unanimous agreement with NH children. Finally, unlike the vocal emotion stimuli in previous studies with CI users, the present stimuli were normalized for amplitude. Amplitude normalization eliminated an important cue for CI users, making it possible to examine the contributions of spectral and speech rate information. Child CI users were expected to identify happy and sad utterances at better than chance levels not only because these are the earliest emotion categories acquired by children (Saarni, 1999) but also because they are the most acoustically contrastive. CI users were expected to confuse happy with angry utterances because of overlapping acoustic cues for these vocal emotions.

To date, there has been no research on the production of vocal emotions in CI users. For adult implant users, such productions would be based largely on patterns of speech acquired prior to their hearing loss. By contrast, successful production of vocal emotions in child CI users would depend on their ability to extract the relevant features from the input, associate them with appropriate emotional categories, and monitor feedback from their own productions. Research on vocal production in child CI users has focused largely on speech intelligibility (Peng et al., 2004a; Flipsen and Colvard, 2006; Svirsky et al., 2007) and only minimally on prosody. Because pitch level and pitch contour can be cues to word meaning in tone languages such as Mandarin or Cantonese, these cues are critical to speech perception and production in such languages. The available evidence reveals that child CI users have difficulty perceiving lexical tone contrasts (Barry et al., 2002; Ciocca et al., 2002). It is not surprising, then, that they have difficulty mastering the production of such contrasts (Wei et al., 2000; Xu et al., 2004), even after 6 years of implant experience. Nevertheless, the problems cannot be attributed entirely to device limitations because some CI users succeed in perceiving and producing Mandarin tones (Peng et al., 2004b). In general, cochlear implants have resulted in significant improvements in prosodic production on the part of prelingually deaf children.
Conversational samples of speech reveal superior pitch contour and phrasing for child CI users relative to same-age hearing aid users with severe or profound hearing loss (Lenden and Flipsen, 2007). Nevertheless, child CI users have substantial deficiencies in prosodic production relative to NH children. For example, adults categorize the statements and yes-no questions produced by 7- to 20-yr-old CI users with 73% accuracy (chance level of 50%) and those produced by age-matched NH individuals with 97% accuracy (Peng et al., 2008). Moreover, the quality of CI users’ productions correlates significantly with their accuracy of perceiving the distinctions between statements and questions. In another study, child CI users were required to imitate a question with rising intonation contour (Are you ready?) on an annual basis for several years post implant (Peng et al., 2007). Production of rising intonation was consistent with gradual improvement from year to year until 7 years of implant experience, when about 62% of utterances were judged as contour appropriate. Because there were no comparison data from NH children, it is difficult to separate the contribution of implant experience from age-related changes in prosodic imitation. Moreover, rising intonation is not obligatory for yes-no questions that involve inversion of subject and verb (e.g., Are you ready?), which complicates the interpretation of productions that lack rising intonation. In principle, it would be preferable to evaluate prosodic production from imitative contexts where the target prosodic patterns are obligatory.

A second goal of the present investigation was to examine changes in prosodic imitation as a function of age and implant use. Accordingly, we evaluated the ability of child CI users and NH children to imitate two typical Japanese exclamatory utterances, one involving falling-rising intonation and the other rising intonation, and two characteristic simulations of animal sounds: a crow (two successive falling contours) and a cat (rising-falling intonation). Onomatopoeia and sound symbolism are much more common in Japanese than in English, both in formal and in vernacular language (Hasada, 2001), so that expressions such as those used here would be understood widely in the general population. Presumably, age-related changes in the prosodic productions of NH children would provide an effective standard against which to evaluate the abilities of CI children.

II. METHOD

A. Participants

The target participants in the present study were 18 Japanese children (3 boys, 15 girls), 5–13 yr of age (M = 8.60, SD = 1.97), all but one of whom were congenitally deaf and used CIs for at least 2 years (M = 4 years, 10 months). One child became deaf at 4 years of age but was retained in the sample because of performance within 1 SD of the other children. All CI children were relatively successful implant users, as indicated by their placement in age-appropriate classes in regular schools and their ease of communicating orally with hearing adults and children. Prior to implantation, these children had used bilateral hearing aids. Details of etiology, age of implantation, type of implant, and age of testing are provided in Table I.

TABLE I. Background information of CI participants.

Child   Age at implant (yr)   Age at test (yr)   Implant use (yr)   Hearing aid at test   Age at amplification (yr)   Device                     Coding
CI-1    2.5                   6.4                3.9                No                    1.9                         Cochlear Nucleus           ACE
CI-2    3.4                   6.6                3.2                Yes                   0.5                         Cochlear Nucleus           ACE
CI-3    3.0                   6.6                3.7                Yes                   2.0                         Cochlear Nucleus           ACE
CI-4    4.9                   7.3                2.4                Yes                   1.2                         Cochlear Nucleus           ACE
CI-5    2.9                   7.4                4.5                Yes                   0.5                         Cochlear Nucleus           ACE
CI-6    2.8                   7.4                4.6                No                    1.9                         Cochlear Nucleus           ACE
CI-7    4.2                   7.5                3.2                Yes                   2.2                         Cochlear Nucleus           ACE
CI-8    3.1                   7.6                4.6                Yes                   1.0                         Cochlear Nucleus           ACE
CI-9    3.1                   8.0                4.9                Yes                   2.0                         Cochlear Nucleus           ACE
CI-10   3.7                   8.3                4.6                Yes                   2.0                         Advanced Bionics Clarion   SAS
CI-11   4.8                   8.3                3.5                No                    2.4                         Cochlear Nucleus           ACE
CI-12   2.6                   8.5                5.9                Yes                   1.0                         Advanced Bionics Clarion   SAS
CI-13   5.3                   8.7                3.4                Yes                   0.7                         Cochlear Nucleus           ACE
CI-14   3.8                   9.2                5.3                Yes                   2.1                         Cochlear Nucleus           ACE
CI-15   3.1                   9.8                6.7                Yes                   1.5                         Cochlear Nucleus           SPEAK
CI-16   4.5                   11.3               6.9                No                    0.6                         Cochlear Nucleus           SPEAK
CI-17   4.4                   12.6               8.3                No                    never                       Cochlear Nucleus           SPEAK
CI-18   2.1                   13.2               8.1                No                    1.7                         Cochlear Nucleus           SPEAK

A control sample consisted of 44 Japanese NH children (17 boys, 27 girls) 5–10 yr of age (for boys, M = 7.59, SD = 1.90; for girls, M = 7.38, SD = 1.59). Their hearing status was not tested but was based on parent report. An additional five children were tested but excluded from the final sample because of failure to understand the task (one CI child, one NH child) or technical recording difficulties (three NH children). All CI and NH children participated in the prosody production task. After testing 20 NH children (four 5- to 6-yr-olds, five 7- to 8-yr-olds, and eleven 9- to 10-yr-olds) on both prosody perception and prosody production tasks, testing of NH children on the prosody perception task was discontinued because of error-free performance by all of the children. NH undergraduates (n = 31) from Japan who were 19–22 years of age were recruited to rate the utterance imitations of CI and NH children. The hearing status of undergraduates was not tested but was based on self-report.
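As a quick consistency check on the sample description above, the group descriptives reported for the CI children can be recomputed from the Table I values as transcribed. The short Python sketch below is an illustration added here, not part of the original analysis; small discrepancies from the reported M = 8.60, SD = 1.97 and the mean of roughly 4 years, 10 months of implant use reflect rounding of the tabulated ages.

```python
import numpy as np

# Columns transcribed from Table I (18 child CI users), in the order CI-1 to CI-18.
age_at_test = np.array([6.4, 6.6, 6.6, 7.3, 7.4, 7.4, 7.5, 7.6, 8.0,
                        8.3, 8.3, 8.5, 8.7, 9.2, 9.8, 11.3, 12.6, 13.2])
implant_use = np.array([3.9, 3.2, 3.7, 2.4, 4.5, 4.6, 3.2, 4.6, 4.9,
                        4.6, 3.5, 5.9, 3.4, 5.3, 6.7, 6.9, 8.3, 8.1])

# Age at test: close to the reported M = 8.60 yr, SD = 1.97 yr.
print(f"Age at test: M = {age_at_test.mean():.2f} yr, SD = {age_at_test.std(ddof=1):.2f} yr")

# Implant use: about 4.87 yr, i.e., roughly 4 years, 10 months, as reported.
years = int(implant_use.mean())
months = round((implant_use.mean() - years) * 12)
print(f"Implant use: M = {implant_use.mean():.2f} yr (~{years} yr {months} mo)")
```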
B. Apparatus

For the CI children, testing occurred in a sound-attenuating booth at a hearing clinic that the children visited regularly. For the NH children, 29 were tested in a sound-attenuating booth. The remaining children, 17 of the 22 5- to 6-yr-olds, were tested with headphones in a quiet room on the premises of their kindergarten. There was no indication that differences in the test conditions of NH children (sound field in a sound-attenuating booth or headphones in a quiet room) affected performance. Presentation of audio stimuli and recording of responses for the perception and production tasks were controlled by customized software on a touch-screen notebook computer. A desktop computer with a regular monitor controlled presentation and recording of responses for the rating tasks. For testing conducted in the sound-attenuating booth, stimuli were presented from a loudspeaker 1 m in front of the participant. For testing conducted at the kindergarten, stimuli were presented over headphones. The NH undergraduates were tested in a sound-attenuating booth with audio stimuli presented over loudspeakers (Bose Micro Music Monitor).

C. Stimuli

For the prosody perception task, four semantically neutral utterances (see Table II) were produced by the same woman with three types of affective prosody: happy (fast speaking rate, high pitch level), sad (slow speaking rate, low pitch level), or angry (fast speaking rate, high pitch level), for a total of 12 utterances. The speaker was instructed to portray the four emotionally neutral utterances with happy, sad, and angry paralanguage at four levels of emotional intensity, reflecting gradations of the specific emotion in question. From these recordings, the best or clearest exemplars of happy, sad, and angry emotions were selected for each utterance and emotion category. Root-mean-square amplitude was equalized across utterances by means of PRAAT 5.0.34 speech analysis and synthesis software (Boersma and Weenink, 2008), and stimuli were presented at approximately 65–70 dB SPL.

TABLE II. Sentences for prosody perception task.

1. Denki-o tsukemasu (I turn on the light)
2. Isu-ni suwarimasu (I sit on a chair)
3. Enpitsu-o motteimasu (I have a pencil)
4. Michi-o arukimasu (I walk in the street)

Mean pitch, pitch range, and speaking rate are shown in Table III. As can be seen in Table III, mean F0, F0 range, and F0 variability were highest for happy sentences, intermediate for angry sentences, and lowest for sad sentences. Utterance duration was shortest (i.e., fastest speaking rate) for happy sentences, similar for angry sentences, and longest (i.e., slowest speaking rate) for sad sentences.

TABLE III. Acoustic features of utterances in prosody perception task.

Emotion   Sentence      Mean F0 (Hz)   F0 range (Hz)   SD of F0 (Hz)   Sentence duration (s)
Happy     1             374            171             43              1.009
Happy     2             398            240             41              1.192
Happy     3             405            140             43              1.394
Happy     4             397            224             52              0.966
Happy     Mean of 1–4   393            194             45              1.140
Sad       1             291            116             33              1.236
Sad       2             283            76              10              1.368
Sad       3             263            99              28              1.561
Sad       4             270            62              13              1.503
Sad       Mean of 1–4   277            88              21              1.417
Angry     1             350            174             43              1.088
Angry     2             386            229             45              1.208
Angry     3             362            136             39              1.406
Angry     4             317            113             28              1.050
Angry     Mean of 1–4   354            163             39              1.188

Sentence 1 with happy prosody, sentence 2 with sad prosody, and sentence 3 with angry prosody (see Table II) were used as practice stimuli and also as test stimuli. Three line drawings depicted schematic facial expressions of happiness, sadness, or anger (see Fig. 1).

FIG. 1. Schematic happy, sad, and angry faces.

For the prosody production task (see Table IV), there were four utterances consisting of two Japanese exclamatory utterances (a stereotyped expression of disappointment involving falling and rising contour, and a stereotyped expression of surprise involving rising contour) and two characteristic simulations of animal sounds (Japanese renditions of the caw caw of a crow, with repeated falling contour, and the meow of a cat, with rising and falling contour). These utterances were presented at 65–70 dB SPL.

TABLE IV. Utterances for prosody production task.

1. Ah-ah (exclamatory expression of disappointment)
2. Eh (exclamatory expression of surprise)
3. Kah-kah (simulated sound of crow)
4. Nya-o (simulated sound of cat)

A digital audio recorder (SONY TCD-D8) and microphone (EV Cobalt Co9) were used for recordings of stimuli for the prosody perception and production tasks and for recordings of responses for the production task.
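For readers who want to reproduce this kind of stimulus preparation and measurement, the sketch below shows one way to equalize RMS level and extract the Table III measures (mean F0, F0 range, SD of F0, duration) in Python with the parselmouth bindings to Praat. It is an illustrative approximation under assumed settings (the file name, the 70 dB target, and the pitch floor/ceiling are all assumptions), not the authors’ actual PRAAT 5.0.34 procedure.

```python
import parselmouth
from parselmouth.praat import call

# Load one utterance (hypothetical file name).
snd = parselmouth.Sound("utterance.wav")

# Equalize level: Praat's "Scale intensity" rescales the waveform in place so that
# its average (RMS-based) intensity matches the target in dB (target assumed here).
call(snd, "Scale intensity", 70.0)

# Extract F0; floor and ceiling are assumed values for an adult female voice.
pitch = snd.to_pitch(time_step=0.01, pitch_floor=100, pitch_ceiling=600)
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]  # keep voiced frames only (unvoiced frames are coded as 0 Hz)

print(f"Mean F0:  {f0.mean():.0f} Hz")
print(f"F0 range: {f0.max() - f0.min():.0f} Hz")
print(f"SD of F0: {f0.std(ddof=1):.0f} Hz")
print(f"Duration: {snd.duration:.3f} s")
```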
D. Procedure

All children were tested individually. The prosody perception task always preceded the prosody production task. The prosody perception task was introduced as a listening game. First, the experimenter demonstrated each type of emotional expression with animated examples (e.g., “When I’m feeling happy, I talk like this”). Then the children were encouraged to show how they talk when they feel happy, sad, and angry. After those demonstrations, children proceeded to the perception task in which they were asked to judge whether the girl, who was named Mei, was feeling happy, sad, or angry. They were required to respond by touching the picture (one of the three schematic drawings on the touch-screen notebook computer) that showed her feelings. Initially, they received one practice trial with each of the three emotion categories, and they received verbal feedback about their response choices. If children responded correctly to the three utterances, they proceeded to the test phase. If not, the practice phase was continued with the same three utterances until they understood the goal of the game (i.e., matching the vocal emotional expressions to the pictured facial expressions) and responded correctly to those utterances. During the test phase, the 12 utterances, which included the three used in the practice phase, were presented in random order without feedback.

For the prosody production task, children were asked to imitate, as closely as possible, each of the model’s utterances, which were presented in fixed order: exclamatory expression of disappointment (falling-rising contour), exclamatory expression of surprise (rising contour), crow sound (falling-falling contour), and cat sound (rising-falling contour) (see Fig. 2).

FIG. 2. Amplitude waveforms (top) and F0 patterns (bottom) of the four utterances in the prosody production task.

Undergraduate students were tested individually. They listened to the utterances from CI and NH children and rated how closely each utterance matched the model on a 10-point scale (1 = not similar at all to 10 = extremely similar). To familiarize raters with the range of expected performance, there was a practice phase in which they listened to imitations of utterance 1 by four children whose utterances were also presented in the rating task. These four children (three CI children and one NH child) were chosen on the basis of preliminary ratings indicating that they spanned the range of rating categories from lowest to highest.

III. RESULTS

NH children were 100% accurate on the perception task. For CI children, preliminary one-sample t-tests showed correct identification above chance levels (33%) for happy utterances (M = 61%, SD = 32), t(17) = 3.64, P = 0.002, and for sad utterances (M = 64%, SD = 33), t(17) = 3.88, P = 0.001, but not for angry utterances (M = 39%, SD = 31), t(17) = 0.76, P = 0.460. An analysis of variance (ANOVA) of CI children’s performance revealed a significant effect of emotion category, F(2,34) = 4.19, P = 0.024, η² = 0.20. Post hoc t-tests with Bonferroni-adjusted P values revealed significantly poorer identification of angry utterances than sad utterances, P = 0.013. Angry utterances were misidentified as happy 83% (SD = 21) of the time, significantly more often than would be expected by chance, t(15) = 6.22, P < 0.001. CI children’s performance on prosody perception was not correlated with chronological age, r(16) = 0.008, age of implantation, r(16) = 0.25, or implant experience, r(16) = 0.10, Ps > 0.05.
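The chance-level comparisons above can be reconstructed from the reported group means and standard deviations alone. The sketch below is illustrative only (the trial-level data are not available here): it computes one-sample t statistics against the 33% chance level of a three-alternative task and recovers the reported values up to rounding.

```python
import math
from scipy import stats

def one_sample_t(mean, sd, n, mu):
    """One-sample t and two-tailed p computed from summary statistics."""
    t = (mean - mu) / (sd / math.sqrt(n))
    p = 2 * stats.t.sf(abs(t), n - 1)
    return t, p

chance = 1 / 3  # three-alternative forced choice
for emotion, m, sd in [("happy", 0.61, 0.32), ("sad", 0.64, 0.33), ("angry", 0.39, 0.31)]:
    t, p = one_sample_t(m, sd, n=18, mu=chance)
    # Prints roughly t(17) = 3.67, 3.94, and 0.78, close to the reported 3.64, 3.88,
    # and 0.76; the small differences reflect rounding of the reported M and SD.
    print(f"{emotion}: t(17) = {t:.2f}, p = {p:.3f}")
```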
Undergraduate raters were divided into two groups. The first group (n = 14) rated 22 CI children and 22 NH children. The second group (n = 17) rated 9 randomly selected children from the 22 CI children rated by the first group as well as 28 additional NH children. Interrater reliability on ratings for the 9 CI children was highly significant (intraclass correlation coefficient = 0.96, P < 0.001). For each NH and CI child, there were four scores corresponding to the average ratings of the judges on each of the four imitated utterances.

CI children’s performance on prosody production, as reflected in the ratings, showed no significant correlation with chronological age, r(16) = 0.06, age of implantation, r(16) = 0.37, or implant experience, r(16) = 0.13, Ps > 0.05. Mean ratings for the three age groups of hearing children as well as the entire group of CI children are shown in Fig. 3.

FIG. 3. Mean ratings of CI and NH children’s prosodic imitations. Error bars indicate standard errors.

An ANOVA to examine the effects of hearing status (CI or NH) and utterance type on adults’ ratings of children’s imitations revealed a main effect of hearing status, F(1,60) = 18.68, P < 0.001, η² = 0.24. Collapsing over age and utterance type, CI children received significantly lower ratings (M = 3.80, SD = 1.13) than NH children (M = 5.51, SD = 1.51). There was also a main effect of utterance type, F(3,180) = 20.42, P < 0.001, η² = 0.25 (see Fig. 4). Averaging across hearing status and age group, ratings were highest for surprise (M = 5.73, SD = 1.92), with its rising contour, which significantly exceeded all other ratings, Ps < 0.02 (with Bonferroni adjustment). Next highest were the ratings of the simulated cat sound (M = 5.15, SD = 1.72), followed by the ratings of the simulated crow sound (M = 5.08, SD = 1.64) and disappointment ratings (M = 4.34, SD = 1.77). Ratings of disappointment were significantly lower than all others, Ps < 0.01 (with Bonferroni adjustment).

FIG. 4. Mean ratings of CI and NH children on the four utterance types. Error bars indicate standard errors.

A separate ANOVA with age (5–6, 7–8, or 9–10 yr) as the independent variable and ratings of NH children’s imitations averaged over the four utterance types as the dependent variable revealed a highly significant effect of age, F(2,41) = 24.80, P < 0.001, η² = 0.55. Ratings for NH children increased linearly from 5–6 yr of age (M = 4.44, SD = 1.13) to 7–8 yr of age (M = 6.11, SD = 1.23) and to 9–10 yr of age (M = 6.86, SD = 0.73). Follow-up Scheffé tests revealed that 5- to 6-yr-olds (M = 4.43, SD = 1.13) received significantly lower ratings than 7- to 8-yr-olds (M = 6.11, SD = 1.23) and 9- to 10-yr-olds (M = 6.86, SD = 0.73), Ps < 0.005, who did not differ from one another. As can be seen in Fig. 3, the oldest NH children performed reasonably well but below ceiling levels (rating of 10). Imitations of the youngest NH children were only modestly similar to the modeled utterances.

CI children’s imitation scores were compared against the three age groups of NH children by Bonferroni-adjusted pairwise t-tests. Ratings of CI children’s imitations (M = 3.80, SD = 1.13) trended lower but did not differ significantly from those of 5- to 6-yr-old NH children (M = 4.44, SD = 1.13), P = 0.26. CI children’s ratings were significantly poorer than those of NH 7- to 8-yr-olds (M = 6.11, SD = 1.23) and 9- to 10-yr-olds (M = 6.86, SD = 0.73), P < 0.01. While age was significantly correlated with imitation ratings for NH children, r(42) = 0.66, P < 0.001, neither age nor implant experience was correlated with the performance of child CI users, rs < 0.14, Ps > 0.05. Nevertheless, there was a robust correlation between prosody perception and production among child CI users, r(16) = 0.56, P < 0.05. The outcome of all analyses remained unchanged when the child who became deaf at 4 years of age was excluded.

IV. DISCUSSION

Congenitally deaf Japanese children with cochlear implants succeeded in identifying happy and sad utterances that were amplitude normalized, but they failed to identify angry utterances. In fact, they typically misidentified angry utterances as happy. Although happy and sad utterances usually differ in pitch level, pitch variability, amplitude, and speaking rate, happy and angry utterances tend to be minimally contrastive in these respects (Banse and Scherer, 1996). The provision of amplitude cues, which were unavailable in the present study, would have increased the distinctions between happy and sad utterances but not between happy and angry utterances. The happy and angry utterances were somewhat contrastive in pitch level and pitch variability, but they were very similar in speaking rate (see Table III). In view of the pitch processing limitations of CI users (Geurts and Wouters, 2001), it is likely that child CI users relied primarily on differences in speaking rate, which led to successful differentiation of happy from sad utterances and unsuccessful differentiation of happy from angry utterances. There is suggestive evidence that child CI users identify familiar voices on the basis of articulatory timing cues and speaking rate even though NH children do so largely on the basis of pitch and timbre cues (Vongpaisal et al., 2010). Pitch and timbre cues in the present study enabled NH children to achieve errorless performance on all emotion categories.

What factors underlie CI children’s success in differentiating happy from sad utterances and their failure to do so in previous research (Hopyan-Misakyan et al., 2009)? Perhaps the three-alternative forced-choice task in the present study was somewhat easier than the four-alternative forced-choice task used by Hopyan-Misakyan et al. (2009) and the six-alternative task used by Most and Peled (2007). What was undoubtedly more important was the clarity of emotional expression in the present study, which resulted in errorless performance for NH children. By contrast, NH children’s correct identification of happy and sad utterances in the Hopyan-Misakyan et al. (2009) study was 77% and 83%, respectively, dropping to 48% and 50% for CI children.

No feedback was provided in the present task or in previous studies of emotion identification by CI users. Nevertheless, feedback is likely to help child CI users focus on the relevant cues and optimize their performance. Feedback, which has been used successfully in child CI users’
differentiation of same-gender voices (Vongpaisal et al., 2010), can be regarded as a form of short-term training. There are indications of the efficacy of more extended training in the domain of prosody. For example, several weeks of prosodic training for CI children led to improvement in prosodic perception and imitation (Klieve and Jeanes, 2001). At present, however, habilitation programs for CI children focus largely on phonemic perception and production as well as language acquisition, with a very limited role for prosody.

Surprisingly, NH children were not very proficient at imitating stereotyped intonation patterns that are used widely in Japanese conversation. In fact, they showed an unexpectedly protracted course of development. Even the oldest NH children, who were 9–10 years of age, did not consistently provide precise imitations of the modeled utterances. Perhaps it is not surprising, then, that the child CI users, most of whom were well beyond 6 yr of age, performed much like NH 5- and 6-yr-olds. Peng et al. (2007) found a protracted and uneven developmental course for the imitation of rising intonation patterns by child and adult CI users who were native speakers of Mandarin. Specifically, CI users progressed steadily from year to year, reaching a plateau 7 yr post-implant and subsequently showing some regression. Most of the child CI users in the present sample had considerably less than 7 yr of implant experience. Unlike the situation in the Peng et al. (2007) study, however, years of implant experience were uncorrelated with the performance of CI users in the present study. Participants in the Peng et al. (2007) study had a much wider range of implant experience than those in the present study, and they were considerably older, which could account for the contrasting experiential effects. The target utterances were also different: stereotyped emotional expressions and animal sounds in the present study and yes-no questions in the other study.

The relative difficulty of imitating the four utterances in the present study is unlikely to be related to their relative familiarity. All of the utterances would be highly familiar to Japanese speakers, but the expressions of surprise and disappointment have the greatest frequency of occurrence. One might argue that animal sounds are more salient for young children than are conventional expressions of surprise and disappointment. However, neither frequency of occurrence nor age appropriateness can account for the observed patterns of performance. The order of difficulty was similar for NH and CI children (see Fig. 4), increasing the likelihood that relative difficulty was related to complexity of pitch patterning (see Fig. 2). For example, the highest ratings, corresponding to the most accurate reproductions, were obtained for the surprise expression, which featured a unidirectional (rising) pitch contour, and the lowest ratings for the disappointment expression, which featured a bi-directional (falling-rising) contour. Because child CI users were relatively successful in reproducing the surprise expression, with its rising pitch contour, they might also be successful in producing yes-no questions, which are challenging for many adult CI users (Peng et al., 2007). It is important to note, however, that the absence of a model is likely to result in poorer prosodic production skills than those observed in the present study.
The significant correlation between CI children’s perception of affective prosody and their prosodic imitations implies that perceptual factors played an important role in the observed production difficulties. In fact, the correlations mirror those found by Peng et al. (2008) for the perception and production of statements and yes-no questions by child and young adult CI users. Nevertheless, device limitations cannot be entirely responsible for the difficulties of child CI users. In some studies, years of implant experience have made important contributions to prosodic perception and production (Peng et al., 2007; Peng et al., 2008). The role of everyday experience and learning was also evident in the modest prosodic imitations of 5- to 6-yr-old NH children relative to their older counterparts. Finally, the ability of some CI users to produce age-appropriate intonation patterns implies that the spectral processing limitations of current implants provide challenges but not insurmountable barriers to progress in this domain. An important task for future research is to isolate the factors associated with positive outcomes in the best performers. Moreover, because prosodic perception and production contribute to social and emotional well-being (Schorr et al., 2009), and they are responsive to intervention in CI users (Klieve and Jeanes, 2001), it would be prudent to invest more research and rehabilitation effort in this realm.

ACKNOWLEDGMENTS

This research was supported by grants from the Japan Society for the Promotion of Science to T.N. and from the Natural Sciences and Engineering Research Council of Canada to S.E.T. We gratefully acknowledge the cooperation of many parents and children whose participation made this study possible. We also acknowledge the insightful advice of Dr. Haruo Takahashi and Hideo Tanaka and assistance with data collection from Akiko Ito, Yumiko Kido, Yoko Kakita, Noriko Matsunaga, Kyoko Ohba, Kyoko Sasaki, and Naoko Maruo.

Banse, R., and Scherer, K. R. (1996). “Acoustic profiles in vocal emotion expression,” J. Pers. Soc. Psychol. 70, 614–636.
Barry, J. G., Blamey, P. J., Martin, L. F. A., Lee, K. Y. S., Tang, T., Ming, Y. Y., and van Hasselt, C. A. (2002). “Tone discrimination in Cantonese-speaking children using a cochlear implant,” Clin. Linguist. Phonet. 16, 79–99.
Boersma, P., and Weenink, D. (2008). “PRAAT” (University of Amsterdam).
Breitenstein, C., Van Lancker, D., and Daum, I. (2001). “The contribution of speech rate and pitch variation to the perception of vocal emotions in a German and an American sample,” Cognit. Emotion 15, 57–79.
Ciocca, V., Francis, A. L., Aisha, R., and Wong, L. (2002). “The perception of Cantonese lexical tones by early-deafened cochlear implantees,” J. Acoust. Soc. Am. 111, 2250–2256.
Cooper, W. B., Tobey, E., and Loizou, P. C. (2008). “Music perception by cochlear implant and normal hearing listeners as measured by the Montreal Battery for Evaluation of Amusia,” Ear Hear. 29, 618–626.
Flipsen, P., Jr., and Colvard, L. G. (2006). “Intelligibility of conversational speech produced by children with cochlear implants,” J. Commun. Disord. 39, 93–108.
Geurts, L., and Wouters, J. (2001). “Coding of the fundamental frequency in continuous interleaved sampling processors for cochlear implants,” J. Acoust. Soc. Am. 109, 713–726.
Green, T., Faulkner, A., and Rosen, S. (2004). “Enhancing temporal cues to voice pitch in continuous interleaved sampling cochlear implants,” J. Acoust. Soc. Am. 116, 2298–2310.
Hasada, R. (2001). “Meaning of Japanese sound-symbolic emotion words,” in Emotions in Crosslinguistic Perspective, edited by J. Harkins and A. Wierzbicka (Mouton de Gruyter, Berlin), pp. 217–253.
Hopyan-Misakyan, T. M., Gordon, K. A., Dennis, M., and Papsin, B. C. (2009). “Recognition of affective speech prosody and facial affect in deaf children with unilateral right cochlear implants,” Child Neuropsychol. 15, 136–146.
Kang, R., Nimmons, G. L., Drennan, W., Lognion, J., Ruffin, C., Nie, K., Won, J. H., Worman, T., Yueh, B., and Rubinstein, J. (2009). “Development and validation of the University of Washington Clinical Assessment of Music Perception test,” Ear Hear. 30, 411–418.
Klieve, S., and Jeanes, R. C. (2001). “Perception of prosodic features by children with cochlear implants: Is it sufficient for understanding meaning differences in language?,” Deaf. Educ. Int. 3, 15–37.
Ladd, D. R. (1996). Intonational Phonology (Cambridge University Press, Cambridge), pp. 6–33.
Lenden, J. M., and Flipsen, P., Jr. (2007). “Prosody and voice characteristics of children with cochlear implants,” J. Commun. Disord. 40, 66–81.
Luo, X., Fu, Q.-J., and Galvin, J. J. (2007). “Vocal emotion recognition by normal-hearing listeners and cochlear implant users,” Trends Amplif. 11, 301–315.
Meister, H., Landwehr, M., Pyschny, V., Walger, M., and von Wedel, H. (2009). “The perception of prosody and speaker gender in normal-hearing listeners and cochlear implant recipients,” Int. J. Audiol. 48, 38–48.
Most, T., and Aviner, C. (2009). “Auditory, visual, and auditory-visual perception of emotions by individuals with cochlear implants, hearing aids, and normal hearing,” J. Deaf Stud. Deaf Educ. 14, 449–464.
Most, T., and Peled, M. (2007). “Perception of suprasegmental features of speech by children with cochlear implants and children with hearing aids,” J. Deaf Stud. Deaf Educ. 12, 350–361.
Murray, I. R., and Arnott, J. L. (1993). “Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion,” J. Acoust. Soc. Am. 93, 1097–1108.
Peng, S.-C., Spencer, L. J., and Tomblin, J. B. (2004a). “Speech intelligibility of pediatric cochlear implant recipients with 7 years of device experience,” J. Speech Lang. Hear. Res. 47, 1227–1236.
Peng, S.-C., Tomblin, J. B., Cheung, H., Lin, Y.-S., and Wang, L.-S. (2004b). “Perception and production of Mandarin tones in prelingually deaf children with cochlear implants,” Ear Hear. 25, 251–264.
Peng, S.-C., Tomblin, J. B., Spencer, L. J., and Hurtig, R. R. (2007). “Imitative production of rising speech intonation in pediatric cochlear implant recipients,” J. Speech Lang. Hear. Res. 50, 1210–1227.
Peng, S.-C., Tomblin, J. B., and Turner, C. W. (2008). “Production and perception of speech intonation in pediatric cochlear implant recipients and individuals with normal hearing,” Ear Hear. 29, 336–351.
Pereira, C. (2000). “The perception of vocal affect by cochlear implantees,” in Cochlear Implants, edited by S. B. Waltzman and N. L. Cohen (Thieme Medical, New York), pp. 343–345.
Rosen, S. (1992). “Temporal information in speech: Acoustic, auditory and linguistic aspects,” in Processing Complex Sounds by the Auditory System, edited by R. P. Carlyon, C. J. Darwin, and I. J. Russell (Clarendon Press, New York), pp. 73–79.
Saarni, C. (1999). The Development of Emotional Competence (Guilford Press, New York), pp. 131–161.
Schorr, E. A. (2005). “Social and emotional functioning of children with cochlear implants,” Ph.D. thesis, University of Maryland, College Park, MD.
Schorr, E. A., Roth, F. P., and Fox, N. A. (2009). “Quality of life for children with cochlear implants: Perceived benefits and problems and the perception of single words and emotional sounds,” J. Speech Lang. Hear. Res. 52, 141–152.
Svirsky, M. A., Chin, S. B., and Jester, A. (2007). “The effects of age at implantation on speech intelligibility in pediatric cochlear implant users: Clinical outcomes and sensitive periods,” Audiol. Med. 5, 293–306.
Vongpaisal, T., Trehub, S. E., and Schellenberg, E. G. (2006). “Song recognition by children and adolescents with cochlear implants,” J. Speech Lang. Hear. Res. 49, 1091–1103.
Vongpaisal, T., Trehub, S. E., Schellenberg, E. G., van Lieshout, P., and Papsin, B. C. (2010). “Children with cochlear implants recognize their mother’s voice,” Ear Hear. 31, 555–566.
Wei, W. I., Wong, R., Hui, Y., Au, D. K. K., Wong, B. Y. K., Ho, W. K., Tsang, A., Kung, P., and Chung, E. (2000). “Chinese tonal language rehabilitation following cochlear implantation in children,” Acta Otolaryngol. 120, 218–221.
Xu, L., Li, Y., Hao, J., Chen, X., Xue, S. A., and Han, D. (2004). “Tone production in Mandarin-speaking children with cochlear implants: A preliminary study,” Acta Otolaryngol. 124, 363–367.