What matters more, the right information or the right place?

• TRACK: Behavior and the Brain
• SYMPOSIUM: Auditory Learning in Profoundly Deaf Adults with Cochlear Implants. Monday, February 16, 12:30 p.m. - 2:00 p.m.
• TITLE: What matters more, the right information or the right place?
• AUTHORS: Stuart Rosen and Andrew Faulkner, Dept. of Phonetics & Linguistics, University College London
• SPEAKER: Stuart Rosen – [email protected] – Telephone: 44 20 7679-7404

1

Cochlear implants have proved a boon for deaf people in providing access to the auditory world. Yet there is still a great degree of variability in outcome for people provided with implants, and there are many possible reasons for this. The one we focus on here concerns the depth to which the electrode is inserted into the cochlea. All current cochlear implants try to mimic the normal working of the inner ear by dividing the sound into separate frequency bands and delivering those bands to the appropriate nerve fibres. In the normal ear, different groups of hearing nerve fibres go to different parts of the cochlea, with each part of the cochlea most sensitive to a particular sound frequency. For children with little auditory experience, and with much ability to adapt, the exact nerve fibres to which that information is delivered are probably not very important.

2

But for deafened adults, with many years of auditory experience, the match of frequency band to auditory nerve fibres seems to be very important, at least initially. Matching up frequency bands to cochlear locations across the full frequency range of hearing depends upon a complete insertion of the electrode array, deep into the cochlea. But electrodes are often not inserted fully, for a variety of reasons.
Because frequency positions in the normal cochlea are ordered from high to low as one moves from the base of the cochlea (where the electrode is inserted) to its apex, an incomplete electrode insertion means that the lower frequency areas of the cochlea are not reached. There are then two obvious choices of what to do in the case of incomplete insertion. Either one can match frequency bands to cochlear locations, but lose the low frequency information corresponding to the cochlear locations not reached by the electrode. Or, one can decide which are the most important frequency regions to preserve in the speech, and then present them to the electrodes regardless of their position. This is equivalent to moving the frequency content, or spectrum, of the sounds upwards. The drawback of this approach is that the speech is now presented with a different frequency content.

3

Because of the difficulty of doing well-controlled studies in genuine cochlear implant patients, our studies (and many performed by others) have used simulations of the sounds that cochlear implants deliver to patients, played to normal listeners (examples of these are available as WAV files). Initial studies by Bob Shannon and his colleagues (another speaker at this symposium) showed that shifting the frequency content of speech sufficiently could lead to an immediate and dramatic decrement in intelligibility. We were able to replicate this finding, assuming an electrode array that was inserted 6.5 mm short: for understanding words in sentences, performance dropped from about 65% correct to 0%. For audio examples of these stimuli, and spectrograms, or voiceprints, of them, see the next page. (A spectrogram shows the dynamic frequency content of sounds. Time runs along the x-axis and frequency along the y-axis. The darkness of the trace indicates the amount of energy in a particular frequency region at a particular time: the darker the trace, the more energy.)
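The kind of implant simulation described above, noise-excited vocoding into a small number of frequency channels, can be sketched roughly as follows. This is an illustrative reconstruction, not the exact processing used in these studies: the function name, the 4th-order Butterworth analysis bands, the Hilbert-envelope extraction and the 100–5000 Hz range are all assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(x, fs, n_channels=12, lo=100.0, hi=5000.0):
    """Crude noise-excited vocoder: split the speech into log-spaced
    bands, extract each band's amplitude envelope, and use it to
    modulate noise filtered into the same band."""
    edges = np.geomspace(lo, hi, n_channels + 1)  # log-spaced band edges
    noise = np.random.randn(len(x))
    out = np.zeros(len(x), dtype=float)
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, x)
        env = np.abs(hilbert(band))          # amplitude envelope of this band
        out += env * sosfilt(sos, noise)     # envelope modulates band-limited noise
    return out
```

Simulating an incomplete insertion would then amount to synthesizing each envelope on a noise carrier in a band shifted basally (upwards in frequency) relative to its analysis band.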
4

Simulations of incomplete insertions: [figure showing spectrograms of the simulations at shifts of 0 mm, 2.2 mm, 4.3 mm and 6.5 mm, using 12 noise-vocoded channels]

5

But the listeners in these first experiments were given no chance to adapt to the altered stimuli. The acoustic characteristics of speech vary a lot from person to person, because of differences in sex, age, size, accent and emotional state, among other factors. So one essential aspect of our abilities as perceivers of speech is to adapt to changes in the particular acoustic form of the speech. In fact, people can adapt to extreme changes in the form of speech. One example we have recently studied (in an MSc thesis by Ruth Finn) is speech that has its spectrum rotated. See the next page for spectrograms and audio examples of a sentence in its normal form (top) and rotated (bottom), so that its spectrogram looks upside down.

6

A more extreme transformation: spectrally-rotated speech
• Rotate the spectrum of speech around 2 kHz (Blesser, 1972): low frequencies become high, and vice versa.
• Preserves aspects of voice melody and buzziness/noisiness.
“She cut with her knife.”

7

The following box plots show performance in identifying words in unknown spectrally-rotated sentences from a male and a female speaker, for two groups of listeners. The control group were simply tested on three occasions, but had no other experience of rotated speech. The experimental group were tested before, in the middle of, and after 6 hours of training by a female speaker using live connected speech that was spectrally-rotated. The median, or ‘typical’, score is shown by the horizontal middle bar on each box. (For more explanation of boxplots, see: http://exploringdata.cqu.edu.au/box_draw.htm) Note that performance in both groups is very low before training, and does not change for the controls. For the group who received training, performance increased significantly. We suspect that performance for the female speaker improves faster because all the training involved a female speaker, albeit a different one.
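The spectral rotation described above can be produced by ring modulation: band-limit the speech to 0–4 kHz, multiply by a 4 kHz sinusoid, and low-pass again, so that a component at frequency f reappears at 4 kHz − f, i.e., a rotation about 2 kHz. A minimal digital sketch, assuming scipy is available (the filter orders are illustrative, and this is not Blesser's original analogue implementation):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def rotate_spectrum(x, fs, band=4000.0):
    """Spectral rotation about band/2 (here 2 kHz): band-limit to
    [0, band], ring-modulate with a carrier at `band`, then low-pass
    again so that frequency f maps to band - f."""
    sos = butter(8, band * 0.99, btype="lowpass", fs=fs, output="sos")
    x_lp = sosfilt(sos, x)                           # restrict to 0..4 kHz
    t = np.arange(len(x)) / fs
    modulated = x_lp * np.cos(2 * np.pi * band * t)  # images at band-f and band+f
    return sosfilt(sos, 2 * modulated)               # keep (mainly) the band-f image
```

Because the operation is its own inverse, applying it twice returns (approximately) the original speech, which is one reason it is an attractive laboratory transformation.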
Also note that the 6 hours of training is a very small amount compared to what one would experience through even a few days of normal listening. So we expect that performance would improve even further with more training.

8

[Box plots: Identifying words in sentences (sound alone). Proportion of words correct (0.0–1.0) across three test sessions for the male and female speakers, for the control and trained groups.]

9

To return to the question of adapting to incomplete electrode insertions, we simulated an insertion that was too shallow by 6.5 mm, and chose to fix the frequency range of the speech presented rather than lose the low frequencies. As mentioned before, this is equivalent to an upward shift in the frequency content of the speech. We trained listeners to understand this shifted speech for 3 hours. The following box plots show performance in identifying words in unknown sentences before, throughout, and after training. Even after such little training, performance levels increase from near zero to about half the level possible with the unshifted stimuli. (Imperfect performance even for the unshifted stimuli is due to a number of aspects of the processing meant to simulate what cochlear implants deliver to patients, including smearing of spectral detail.)

Rosen, S., Faulkner, A., & Wilkinson, L. (1999) Adaptation by normal listeners to upward spectral shifts of speech: Implications for cochlear implants. J Acoust Soc Am 106: 3629-3636.

10

[Box plots: Deleterious effects of spectral shifting can be ameliorated through experience. Words in sentences, pre-training vs. post-training, over 3 hours of experience of continuous speech.]

11

Given that listeners can adapt, at least partially, to spectral shifts, there is still the question of the extent to which this would be better than simply avoiding the necessary adaptation by matching frequency information to cochlear place (but losing some low frequency components of the speech).
The spectrograms below give audio examples and spectrograms of the effect of having an electrode fully inserted (top) compared to one that is 8 mm short of full insertion (bottom). Note the higher range of frequencies (not very important for intelligibility), and the missing low frequency components, in the short insertion.

12

We therefore did a direct comparison of shifted vs. matched conditions in a crossover training study with 3 hours of training per condition. Looking only at results for sentences, the boxplots on the next page show that for the male talker, performance is always better in the shifted condition, whereas for the female talker, matched is better. This is easily understood, given that male voices have more crucial information in the low frequency region that is lost in the matched condition. On the other hand, performance in the shifted condition benefits more from training. This too is easily understood: in the shifted condition the information is still present, but presented in a new way, whereas in the matched condition crucial information is lost. It also seems likely that with further training, performance in the shifted condition would improve further still, leading to improved performance even for the female talker.

Faulkner, A., Rosen, S., & Norman, C. (2001) The right information matters more than frequency-place alignment: simulations of cochlear implant processors with an electrode array insertion depth of 17 mm. Speech, Hearing and Language: Work in Progress 13: 52-71.

13

Shifting vs. Matching: sentences. [Box plots comparing shifted and matched conditions. Male talker: shifted > matched; significant training effect, with training helping in matched only when it followed the shifted condition. Female talker: shifted < matched; significant training effect, with training helping more in shifted.]

14

In a final example of the ability of listeners to adapt to transformations of the speech signal, our PhD student Matt Smith has been simulating the effects of a missing region of nerve fibres, a so-called hole in hearing. As the next slide illustrates, it is normally assumed that the residual auditory nerve fibres necessary for the functioning of a cochlear implant are spread reasonably uniformly through the cochlea.

15

‘Normal’ cochlear representation: [schematic of the analysis filter bank mapped onto cochlear location, from high frequency at the base to low frequency at the apex]

16

But suppose auditory nerve fibres do not survive in a region normally tuned to the crucial mid-frequency region of speech. Again we have two choices. We can preserve the frequency-to-electrode relationship and avoid the need for adaptation, accepting the fact that a frequency region is dropped from the signal, as illustrated in the next slide.

17

A ‘hole’ can mean dropped frequencies: [schematic of the analysis filter bank mapped onto cochlear location, with the bands covering the hole omitted]

18

Or, much as was done in the shifted condition previously, we can warp the representation of frequencies from the hole to adjacent regions, as illustrated in the next slide. But here we would expect adaptation to the altered acoustic structure of speech to be necessary.

19

A ‘hole’ can mean warped frequencies: [schematic of the analysis filter bank with the bands covering the hole remapped to cochlear locations above and below it, from high frequency at the base to low frequency at the apex]

20

Audio examples and spectrograms

21

The boxplots in the next slide show performance for the three conditions as the listeners receive 3 hours of experience in the warped condition.
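The warping idea described above can be illustrated with a simple piecewise-linear remapping of analysis frequencies onto the surviving cochlear places on either side of the hole. The actual warping function used in the study is not specified here, so this scheme, including its split point at the middle of the analysis range, is purely an assumption for illustration:

```python
def warp_around_hole(f, f_lo, f_hi, hole_lo, hole_hi):
    """Map an analysis frequency f in [f_lo, f_hi] onto the surviving
    places [f_lo, hole_lo] and [hole_hi, f_hi], skipping the dead
    region. The lower half of the analysis range is compressed below
    the hole and the upper half above it, so (unlike the dropped
    condition) no frequency region of the speech is discarded."""
    mid = 0.5 * (f_lo + f_hi)
    if f <= mid:
        # compress [f_lo, mid] linearly onto [f_lo, hole_lo]
        return f_lo + (f - f_lo) * (hole_lo - f_lo) / (mid - f_lo)
    # compress [mid, f_hi] linearly onto [hole_hi, f_hi]
    return hole_hi + (f - mid) * (f_hi - hole_hi) / (f_hi - mid)
```

For example, with a 100–5000 Hz analysis range and a hypothetical dead region from 1500 to 2500 Hz, an analysis band centred at 1000 Hz would be re-presented at about 614 Hz, just below the hole. The dropped condition, by contrast, simply discards any band whose place falls inside the hole.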
Interestingly, even before any training the warped condition leads to better performance than the dropped condition, and performance then increases markedly for the warped condition but relatively little for the dropped condition. This study shows a clear advantage for altering the acoustic structure of speech to preserve crucial information, because people show a great deal of plasticity in adapting to altered acoustic structure.

22

Adapting to a warped spectrum: [box plots of percent words correct (0–100) for the dropped, warped and matched conditions across sessions Base1, T1–T5 and Retest]

23

Conclusions
• Adaptation to spectrally-shifted, warped and rotated speech suggests considerable plasticity and scope for learning by adult listeners in general, and implant users in particular.
• This ability needs to be allowed for in any study of speech processing, simulated or real. Perceptual testing without allowing an opportunity for learning is likely to seriously underestimate the intelligibility of signals transformed in acoustic structure.
• Speech processors should deliver the most informative frequency range irrespective of electrode position.

24

WAV files available: I

• ice_cream_no_shift.WAV (slide 5, left column, top): “The ice cream was pink.” Simulation of a cochlear implant (CI) with full electrode insertion.
• ice_cream_22mm_shift.WAV (slide 5, left column, one down): “The ice cream was pink”, shifted condition. Simulation of a CI electrode incompletely inserted by 2.2 mm.
• ice_cream_43mm_shift.WAV (slide 5, left column, two down): “The ice cream was pink”, shifted condition. Simulation of a CI electrode incompletely inserted by 4.3 mm.
• ice_cream_65mm_shift.WAV (slide 5, left column, bottom): “The ice cream was pink”, shifted condition. Simulation of a CI electrode incompletely inserted by 6.5 mm.
• buying_bread_normal.WAV (slide 7, top spectrogram): “They’re buying some bread.” Normal speech.
• buying_bread_rotate.WAV (slide 7, bottom spectrogram): “They’re buying some bread.” Spectrally-rotated speech.
• cut_knife_normal.WAV (slide 7, bottom left): “She cut with her knife.” Normal speech.
• cut_knife_rotated.WAV (slide 7, bottom right): “She cut with her knife.” Spectrally-rotated speech.
• green_tomatoes_0mm_match_m.WAV (slide 12, top spectrogram): “The green tomatoes are small.” Simulation of a CI with full electrode insertion.
• green_tomatoes_8mm_match_m.WAV (slide 12, bottom spectrogram): “The green tomatoes are small”, matched condition. Simulation of a CI electrode incompletely inserted by 8 mm.

25

WAV files available: II

• birch_normal.wav (slide 21, top spectrogram): “The birch canoe slid on the smooth planks.” Simulation of a CI with full representation of auditory nerve fibres.
• birch_dropped.wav (slide 21, middle spectrogram): “The birch canoe slid on the smooth planks”, dropped condition. Simulation of a CI with a mid-frequency hole in auditory nerve fibres, but preserving the frequency-to-cochlear-place relationship. Information is lost.
• birch_warped.wav (slide 21, bottom spectrogram): “The birch canoe slid on the smooth planks”, warped condition. Simulation of a CI with a mid-frequency hole in auditory nerve fibres, in which the representation of the frequency information is warped. Information is preserved, but represented differently.

26

Thank you!

27