What matters more, the right information or
the right place?
• TRACK
– Behavior and the Brain
• SYMPOSIUM
– Auditory Learning in Profoundly Deaf Adults with Cochlear Implants
– Monday, February 16, 12:30 p.m. - 2:00 p.m.
• TITLE
– What matters more, the right information or the right place?
• AUTHORS
– Stuart Rosen and Andrew Faulkner
Dept. of Phonetics & Linguistics
University College London
• SPEAKER: Stuart Rosen
– [email protected]
– Telephone: 44 20 7679-7404
1
Cochlear implants have proved a boon for deaf people in providing access to the
auditory world. Yet there is still a great degree of variability in outcome for people
provided with implants, and there are many possible reasons for this. One we will
focus on here concerns the depth to which the electrode is inserted into the cochlea.
All current cochlear implants try to mimic the normal working of the inner ear by
dividing up the sound into separate frequency bands and delivering those bands to
the appropriate nerve fibres. In the normal ear, different groups of hearing nerve
fibres go to different parts of the cochlea, with each part of the cochlea most
sensitive to a particular sound frequency. For children with little auditory
experience, and with much ability to adapt, the exact nerve fibres to which that
information is delivered are probably not very important.
2
But for deafened adults, with many years of auditory experience, the match of
frequency band to auditory nerve fibres seems to be very important, at least initially.
Matching up frequency bands to cochlear locations across the full frequency range
of hearing depends upon a complete insertion of the electrode array, deep into the
cochlea. But electrodes are often not inserted fully, for a variety of reasons. Because
frequency positions in the normal cochlea are ordered from high to low as one
moves from the base of the cochlea (where the electrode is inserted) to its apex, an
incomplete electrode insertion means that the lower frequency areas of the cochlea
are not reached.
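To get a feel for what an incomplete insertion costs in frequency terms, here is a minimal Python sketch using Greenwood's (1990) frequency-position function for a 35 mm human cochlea. The constants are the standard published ones, but the 25 mm "full" insertion depth is purely an illustrative assumption, not a value from our processors.

# Greenwood (1990) frequency-position map; the insertion depths below are
# illustrative assumptions only.
def greenwood_frequency(dist_from_apex_mm, length_mm=35.0):
    """Characteristic frequency (Hz) at a given distance from the apex."""
    x = dist_from_apex_mm / length_mm        # proportion of cochlear length
    return 165.4 * (10.0 ** (2.1 * x) - 0.88)

def lowest_frequency_reached(insertion_mm, length_mm=35.0):
    """The array enters at the base, so its deepest electrode sits
    (length - insertion) mm from the apex."""
    return greenwood_frequency(length_mm - insertion_mm, length_mm)

print(round(lowest_frequency_reached(25.0)))        # hypothetical full insertion: ~500 Hz
print(round(lowest_frequency_reached(25.0 - 6.5)))  # 6.5 mm short: ~1500 Hz

On these assumptions, a 6.5 mm shortfall raises the lowest cochlear place reached from roughly 500 Hz to roughly 1.5 kHz, leaving a substantial slice of the low-frequency speech range with no matched place to go to.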
There are then two obvious choices of what to do in the case of an incomplete insertion.
Either one can match frequency bands to cochlear locations, but also lose the low
frequency information corresponding to the cochlear locations not reached by the
electrode. Or, one can decide which are the most important frequency regions to
preserve in the speech, and then present them to electrodes regardless of their
position. This is equivalent to moving the frequency content, or spectrum, of sounds
upwards. The drawback of this approach is that speech is now presented with a
different frequency content.
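To make the two options concrete, here is a small sketch of the two channel allocations for a hypothetical 12-channel processor whose electrodes reach only places tuned above about 1.5 kHz. The band edges and channel count are invented for illustration, and "matched" and "shifted" are the labels used for these options later in this talk.

import numpy as np

# Purely illustrative numbers: a 12-channel processor whose electrodes reach
# only cochlear places tuned above ~1.5 kHz because of a short insertion.
full_range = np.geomspace(100.0, 5000.0, 13)     # band edges the processor could analyse
reachable = np.geomspace(1500.0, 5000.0, 13)     # band edges of the places actually reached

# Option 1, "matched": analyse only the range whose places are reached, so
# frequency and place stay aligned but everything below ~1.5 kHz is lost.
matched = {"analysis": reachable, "output": reachable}

# Option 2, "shifted": analyse the full speech range but deliver it to the
# reachable places, which amounts to shifting the whole spectrum upwards.
shifted = {"analysis": full_range, "output": reachable}

Either configuration can then drive a vocoder-style simulation of the kind sketched a little later.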
3
Because of the difficulty of doing well-controlled studies in genuine cochlear
implant patients, our studies (and many performed by others) have used
simulations of the sounds that cochlear implants deliver to patients, played to
normal listeners (examples of these are available as WAV files). Initial studies
by Bob Shannon & his colleagues (another speaker at this symposium) showed
that shifting the frequency content of speech sufficiently could lead to an
immediate and dramatic decrement in intelligibility. We were able to replicate
this finding, assuming an electrode array that was inserted 6.5 mm short. For
understanding words in sentences, performance dropped from about 65%
correct to 0%. For audio examples of these stimuli, and spectrograms, or
voiceprints, of them, see the next page.
(A spectrogram shows the dynamic frequency content of sounds. Time runs
along the x-axis and frequency along the y-axis. The darkness of the trace
indicates the amount of energy in a particular frequency region at a particular
time. The darker the trace, the more energy.)
4
Simulations of incomplete insertions
[Spectrograms of 12-channel noise-vocoded speech simulating electrode insertions 0 mm (full), 2.2 mm, 4.3 mm and 6.5 mm short of full; audio versions are among the WAV files listed at the end.]
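Stimuli of this general kind can be produced with a noise-excited vocoder along the lines sketched below. This is only a schematic version, not our actual processor: the filter types, envelope smoothing and band edges are simplified assumptions. The logic, though, is the same: band envelopes are extracted from analysis bands and then used to modulate noise in output bands that can be shifted basally (upwards in frequency) to simulate a short insertion.

import numpy as np
from scipy import signal

def bandpass(x, lo, hi, fs, order=4):
    sos = signal.butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos, x)

def envelope(x, fs, cutoff=160.0):
    # Rectify and smooth to get the band envelope.
    sos = signal.butter(2, cutoff, btype="lowpass", fs=fs, output="sos")
    return np.maximum(signal.sosfiltfilt(sos, np.abs(x)), 0.0)

def noise_vocode(x, fs, analysis_edges, output_edges):
    # Envelopes from the analysis bands modulate noise filtered into the
    # (possibly shifted) output bands; summing the channels gives the stimulus.
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for (alo, ahi), (olo, ohi) in zip(zip(analysis_edges[:-1], analysis_edges[1:]),
                                      zip(output_edges[:-1], output_edges[1:])):
        env = envelope(bandpass(x, alo, ahi, fs), fs)
        carrier = bandpass(rng.standard_normal(len(x)), olo, ohi, fs)
        out += env * carrier
    return out

# Hypothetical use, with the 'matched' or 'shifted' allocations sketched earlier:
# fs, x = ...  (load a speech waveform)
# y = noise_vocode(x.astype(float), fs, shifted["analysis"], shifted["output"])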
5
But the listeners in these first experiments were given no chance to adapt to the
altered stimuli. The acoustic characteristics of speech vary a lot from person to
person, because of differences in sex, age, size, accent and emotional state,
among others. So one essential aspect of our abilities as perceivers of speech is to
adapt to changes in the particular acoustic form of the speech. In fact, people can
adapt to extreme changes in the form of speech. One example we have recently
studied (in an MSc thesis by Ruth Finn) is speech that has its spectrum rotated.
See the next page for spectrograms and audio examples of a sentence in its
normal form (top) and rotated (bottom), so its spectrogram looks upside down.
6
A more extreme transformation: Spectrally-rotated speech
• Rotate the spectrum of speech around 2 kHz (Blesser, 1972): low frequencies become high, and vice versa.
• Preserves aspects of voice melody and buzziness/noisiness.
[Spectrograms and audio example: "She cut with her knife."]
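One standard way to produce such spectral rotation is amplitude modulation by a sinusoid at twice the pivot frequency, with low-pass filtering before and after, in the spirit of Blesser (1972). The Python sketch below is a minimal version of that idea; the filter order and exact cutoffs are arbitrary choices, not the settings used for our stimuli, and it assumes the sampling rate is comfortably above four times the pivot so the unwanted sum components do not alias.

import numpy as np
from scipy import signal

def rotate_spectrum(x, fs, pivot_hz=2000.0):
    # Frequency f is mapped to (2*pivot - f): band-limit to 2*pivot, multiply
    # by a sinusoid at 2*pivot (mirroring the spectrum about the pivot), then
    # low-pass again to remove the sum terms above 2*pivot.
    sos = signal.butter(8, 2 * pivot_hz * 0.98, btype="lowpass", fs=fs, output="sos")
    x_lp = signal.sosfiltfilt(sos, x)
    t = np.arange(len(x)) / fs
    modulated = x_lp * np.cos(2 * np.pi * 2 * pivot_hz * t)
    return signal.sosfiltfilt(sos, modulated)

# Hypothetical use with one of the example files (assumed mono):
# from scipy.io import wavfile
# fs, x = wavfile.read("cut_knife_normal.WAV")
# y = rotate_spectrum(x.astype(float), fs)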
7
The following box plots show performance for identifying
words in unknown spectrally-rotated sentences for a male
and a female speaker for two groups of listeners. The
control group were simply tested on three occasions, but
had no other experience of rotated speech. The
experimental group were tested before, in the middle of,
and after, 6 hours of training by a female speaker using
live connected speech that was spectrally-rotated. The
median, or ‘typical’ score is shown by the horizontal
middle bar on each box. (For more explanation about
boxplots, see:
http://exploringdata.cqu.edu.au/box_draw.htm)
Note that performance in both groups is very low before
training, and does not change for the controls. For the
group who received training, performance increased
significantly. We suspect that performance for the female
speaker improves faster because all the training involved
a female speaker, albeit a different one. Also note that
6 hours of training is a very small amount compared to
what one would receive through even a few days of
normal listening, so we expect that performance would
improve even further with more training.
8
Identifying words in sentences (sound alone)
[Boxplots of the proportion of words correctly identified (0 to 1.0) against session number (1, 2, 3) for the male and the female speaker, for the control and trained groups (group sizes of 7 or 8).]
9
To return to the question of adapting to incomplete
electrode insertions, we simulated an insertion that was
too shallow by 6.5 mm, and chose to preserve the full frequency
range of the speech presented rather than lose the low
frequencies. As mentioned before, this is equivalent to an
upward shift in the frequency content of the speech. We
trained listeners to understand this shifted speech for 3
hours. The following box plots show performance for
identifying words in unknown sentences before,
throughout, and after training. Even after so little
training, performance levels increase from near zero to
about half the level possible with the unshifted stimuli.
(Imperfect performance for even the unshifted stimuli is
due to a number of aspects of the processing meant to
simulate what cochlear implants deliver to patients,
including smearing of spectral detail.)
Rosen, S., Faulkner, A., & Wilkinson, L. (1999)
Adaptation by normal listeners to upward spectral shifts
of speech: Implications for cochlear implants. J Acoust
Soc Am 106: 3629-3636.
10
Deleterious effects of spectral shifting can be ameliorated through experience
[Pre-training and post-training results: words in sentences over 3 hours of experience of continuous speech.]
11
Given that listeners can adapt to spectral shifts, at least partially, there is still the question of
the extent to which this would be better than simply avoiding the necessary adaptation by matching
frequency information to cochlear place (but losing some low frequency components in the
speech). The next page gives audio examples and spectrograms of the effect of having
an electrode fully inserted (top) compared to one that is 8 mm short of full insertion (bottom).
Note the higher range of frequencies (not very important for intelligibility) and the missing low
frequency components for the short insertion.
12
We therefore did a direct comparison of shifted vs. matched conditions in a crossover training
study with 3 hours of training per condition. Looking only at results for sentences, the
boxplots on the next page show that for the male talker, performance is always better in the
shifted condition, whereas for the female talker, matched is better. This is easily understood
given that male voices have more crucial information in the low frequency region that is lost in
the matched condition. On the other hand, performance in the shifted condition benefits more
from training. This also is easily understood, as in the shifted condition, the information is still
present, but presented in a new way. In the matched condition, crucial information is lost. It also
seems likely that with further training, performance in the shifted condition would improve
further still, leading to improved performance even for the female talker.
Faulkner, A., Rosen, S., & Norman, C. (2001) The right information matters more than
frequency-place alignment: simulations of cochlear implant processors with an electrode array
insertion depth of 17 mm. Speech, Hearing and Language: Work in Progress 13: 52-71.
13
Shifting vs. Matching: sentences
• Male talker: shifted > matched. Significant training effect: training helps in matched only when it comes first.
• Female talker: shifted < matched. Significant training effect: training helps more in shifted.
[Boxplots for the two crossover orders (shifted then matched, and matched then shifted) for each talker.]
14
In a final example of the ability of listeners to adapt to
transformations of the speech signal, our PhD student
Matt Smith has been simulating the effects of a missing
region of nerve fibres, a so-called hole in hearing.
As the next slide illustrates, it is normally assumed that
the residual auditory nerve fibres necessary for the
functioning of a cochlear implant are spread reasonably
uniformly through the cochlea.
15
‘Normal’ cochlear representation
[Diagram: the analysis filter bank mapped onto cochlear location, running from high frequency at the base to low frequency at the apex.]
16
But suppose auditory nerve fibres do not survive in a
region normally tuned to the crucial mid-frequency
region of speech. Again we have two choices. We can
preserve the frequency-to-electrode relationship and
avoid the need for adaptation, accepting the fact that a
frequency region is dropped from the signal, illustrated
in the next slide.
17
A ‘hole’ can mean dropped frequencies
[Diagram: the analysis filter bank mapped onto cochlear location as before, but with the bands whose places fall in the hole dropped from the signal.]
18
Or, as was done in the shifted condition
previously, we can warp the representation of the
frequencies from the hole onto adjacent regions, as
illustrated in the next slide. But here we would expect
adaptation to the altered acoustic structure of speech to
be necessary.
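As a concrete sketch of the difference between the two choices, the mapping below takes a frequency in the speech and returns the cochlear place (expressed as a frequency) it would be delivered to, for a hypothetical 1-2 kHz hole in a 100-5000 Hz processor. The numbers and the particular piecewise-log warping are illustrative assumptions only, not the scheme used in Matt Smith's simulations.

import numpy as np

LO, HI = 100.0, 5000.0             # analysis range (illustrative)
HOLE_LO, HOLE_HI = 1000.0, 2000.0  # hypothetical dead region

def dropped_place(f_hz):
    # 'Dropped' scheme: frequency-to-place alignment is kept, and anything
    # falling in the hole is simply discarded.
    return None if HOLE_LO <= f_hz <= HOLE_HI else f_hz

def warped_place(f_hz):
    # One plausible 'warped' scheme: on a log-frequency axis, compress the
    # lower half of the analysis range (up to the middle of the hole) onto
    # the places below the hole, and the upper half onto the places above it.
    # Nothing is discarded, but much of the speech arrives at the wrong place.
    log = np.log10
    mid = (log(HOLE_LO) + log(HOLE_HI)) / 2.0
    if log(f_hz) <= mid:
        slope = (log(HOLE_LO) - log(LO)) / (mid - log(LO))
        return 10 ** (log(LO) + slope * (log(f_hz) - log(LO)))
    slope = (log(HI) - log(HOLE_HI)) / (log(HI) - mid)
    return 10 ** (log(HOLE_HI) + slope * (log(f_hz) - mid))

for f in (300.0, 1200.0, 1800.0, 3000.0):
    print(f, dropped_place(f), round(warped_place(f)))

In this sketch the 1200 Hz and 1800 Hz components, which the dropped scheme discards, are re-delivered below and above the hole respectively.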
19
A ‘hole’ can mean warped frequencies
[Diagram: the analysis bands that would fall in the hole warped onto the adjacent cochlear regions (base to apex, high to low frequency).]
20
Audio examples and spectrograms
21
The boxplots in the next slide show performance for the
three conditions as the listeners receive 3 hours of
experience in the warped condition. Interestingly, even
before any training, the warped condition leads to
better performance than the dropped condition, and with
training performance increases markedly for the warped
condition but relatively little for the dropped condition.
This study shows a clear advantage for altering the
acoustic structure of speech to preserve crucial
information, because people show a great deal of
plasticity in adapting to altered acoustic structure.
22
Adapting to a warped spectrum
[Boxplots of percent correct (0 to 100) against session (Base1, T1-T5, Retest) for the dropped, warped and matched conditions.]
23
Conclusions
• Adaptation to spectrally-shifted, warped and rotated speech suggests
considerable plasticity and scope for learning by adult listeners in
general, and implant users, in particular.
• This ability needs to be allowed for in any study of speech processing,
simulated or real. Perceptual testing without allowing opportunity for
learning is likely to seriously underestimate the intelligibility of signals
transformed in acoustic structure.
• Speech processors should deliver the most informative frequency
range irrespective of electrode position.
24
WAV files available: I
ice_cream_no_shift.WAV (slide 5, left column, top): "The ice cream was pink". Simulation of a cochlear implant (CI) with full electrode insertion.
ice_cream_22mm_shift.WAV (slide 5, left column, one down): "The ice cream was pink", shifted condition. Simulation of a CI electrode incompletely inserted by 2.2 mm.
ice_cream_43mm_shift.WAV (slide 5, left column, two down): "The ice cream was pink", shifted condition. Simulation of a CI electrode incompletely inserted by 4.3 mm.
ice_cream_65mm_shift.WAV (slide 5, left column, bottom): "The ice cream was pink", shifted condition. Simulation of a CI electrode incompletely inserted by 6.5 mm.
buying_bread_normal.WAV (slide 7, top spectrogram): "They're buying some bread". Normal speech.
buying_bread_rotate.WAV (slide 7, bottom spectrogram): "They're buying some bread". Spectrally-rotated speech.
cut_knife_normal.WAV (slide 7, bottom left): "She cut with her knife". Normal speech.
cut_knife_rotated.WAV (slide 7, bottom right): "She cut with her knife". Spectrally-rotated speech.
green_tomatoes_0mm_match_m.WAV (slide 12, top spectrogram): "The green tomatoes are small". Simulation of a cochlear implant (CI) with full electrode insertion.
green_tomatoes_8mm_match_m.WAV (slide 12, bottom spectrogram): "The green tomatoes are small", matched condition. Simulation of a CI electrode incompletely inserted by 8 mm.
25
WAV files available: II
birch_normal.wav (slide 21, top spectrogram): "The birch canoe slid on the smooth planks." Simulation of a cochlear implant (CI) with full representation of auditory nerve fibres.
birch_dropped.wav (slide 21, middle spectrogram): "The birch canoe slid on the smooth planks.", dropped condition. Simulation of a CI with a mid-frequency hole in auditory nerve fibres, but preserving the frequency-to-cochlear-place relationship. Information is lost.
birch_warped.wav (slide 21, bottom spectrogram): "The birch canoe slid on the smooth planks.", warped condition. Simulation of a CI with a mid-frequency hole in auditory nerve fibres, in which the representation of the frequency information is warped. Information is preserved, but represented differently.
26
Thank you!
27