Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report

Speech coding in aids for the deaf: An overview of research from 1924 to 1982
Risberg, A.

journal: STL-QPSR
volume: 23
number: 4
year: 1982
pages: 065-098
http://www.speech.kth.se/qpsr
II. SPEECH AND HEARING DEFECTS AND AIDS

A. SPEECH CODING IN AIDS FOR THE DEAF: AN OVERVIEW OF RESEARCH FROM 1924 TO 1982*
Arne Risberg
Abstract
The aim of speech coding in aids for the deaf is to adapt the acoustic speech code to the sensory capabilities of a limited residual hearing, to a direct stimulation of the auditory nerve, or to a transmission through the visual or tactile sensory system. In the 1920's, Gault (1926) started his work on transmission of speech through the tactile sense. In 1925, Perwitzschky suggested that the speech perception ability of a hearing impaired person with hearing only in the low frequencies could be improved by frequency lowering. During the sixty years from 1922 to 1982, many attempts have been made to develop speech coding aids for the deaf. A selection of the more important articles published up to 1979 has been compiled by Levitt, Pickett, & Houde (1980). In spite of great efforts, very few aids based on speech recoding techniques are used by deaf persons today. The only exceptions seem to be some visual aids used for speech training, a few persons that use tactile aids, and 200 to 300 persons that use cochlear implants of different types. There are several reasons for this lack of success. In this article I have tried to give an overview of this area of research and development. Work on amplitude compression systems has not been included.
Is the auditory system special?
In a paper presented at the Washington Conference on Speech-Analyzing Aids for the Deaf in 1967, Liberman & al (1968) suggested that the explanation for the difficulties experienced in learning to read visual transforms of the acoustic speech signal, the sonagram, was that speech is a special code and that a special decoder in the brain of the listener is needed. This decoder only works for the auditory input. A tactually or visually presented speech signal does not have access to this decoder, and this cannot be changed by training.
The support that Liberman & al presented for their hypothesis was the limited success achieved by people that have tried to learn to read sonagrams, especially the results of the experiment made by Potter, Kopp, & Green at Bell Telephone Laboratories (1947). In a recent paper, Cole & al (1980) question the hypothesis of Liberman & al. In spite of the rather primitive technical device used by Potter & al, a 12-channel spectrum analyzer that presented the energy in the frequency range from 300-3000 Hz on a moving phosphorous belt, the most motivated subject could read a vocabulary of 800 words after about 200 hours of training. Compared to the amount of training given a child in learning
* Invited lecture at the colloquium on "Speech Processing Aids and Cochlear Implants", Göttingen, BRD, September 1982.
Fig. 1. [Figure: sonagram of a speech sample, frequency (Hz) versus time (sec), with the source features voice, noise, transient, and silence indicated.]
Table 1. Visually contrastive phoneme groups (Woodward & Barber, 1960).
1. p b m
2. f v
3. w r
4. t d n l θ ð s z tʃ dʒ j ʒ k g h

Table 2. Results from a study of the ability to identify prosodic contrasts during lipreading (Risberg & Lubker, 1978).

Test material     Guessing %                      Range %    Mean %   SD %
Vowel length          50      Normal hearing       65-80      72.0     6.4
                              Hearing impaired     55-80      70.4     7.8
Syllable stress       50      Normal hearing       75-100     85.7     8.9
                              Hearing impaired     80-100     90.8     5.9
Juncture              50      Normal hearing       60-75      67.9     6.4
                              Hearing impaired     45-85      65.0    14.8
Emphasis              50      Hearing impaired    6.3-75.0    42.8    22.0
Stress-pattern in two-syllable words could in most cases be identified.
The selection of code in a lipreading aid
In describing the sonagram in Fig. 1, it was said that the acoustic speech signal could be seen as the combined effect of a source and a filter. The same model can be used in speech perception. The perception is then based on the parallel processing of source and filter information. This model is well suited to the work on lipreading aids. During lipreading, the filter information, that is, the place of articulation of different vowel and consonant phonemes, can with reasonable success be seen, whereas source features are less visible. A lipreading aid should then primarily be designed to transmit information on the source features, voice, noise, transient, and silence, plus changes in the fundamental frequency. These features will also relatively well describe the manner of articulation of consonants. An exception might be nasalization and lateralization.
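A minimal Python sketch of the kind of processing such an aid would need is given below. It is an illustration only, not taken from any of the systems discussed here; the frame length, thresholds, and function name are assumed values chosen for the example. It labels short frames of a digitized waveform as one of the four source features using only frame energy and zero-crossing rate.

import numpy as np

def classify_source_features(x, fs, frame_ms=20.0, silence_db=-50.0,
                             zcr_voiced=0.15, transient_ratio=8.0):
    """Label each frame as 'silence', 'voice', 'noise' or 'transient'."""
    n = int(fs * frame_ms / 1000)
    labels, prev_energy = [], None
    for start in range(0, len(x) - n + 1, n):
        frame = x[start:start + n]
        energy = np.mean(frame ** 2) + 1e-12
        level_db = 10 * np.log10(energy)
        # Zero-crossing rate: high for noise-like (fricative) frames,
        # low for voiced frames dominated by low-frequency periodicity.
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
        if level_db < silence_db:
            labels.append('silence')
        elif (prev_energy is not None and labels[-1] == 'silence'
              and energy > transient_ratio * prev_energy):
            labels.append('transient')   # sudden burst after silence, plosive-like
        elif zcr < zcr_voiced:
            labels.append('voice')
        else:
            labels.append('noise')
        prev_energy = energy
    return labels

# Example: half a second of silence, one second of a 120 Hz "voiced" tone,
# half a second of white noise.
fs = 8000
t = np.arange(fs) / fs
signal = np.concatenate([np.zeros(fs // 2),
                         0.5 * np.sin(2 * np.pi * 120 * t),
                         0.1 * np.random.randn(fs // 2)])
print(classify_source_features(signal, fs)[::10])

The frame labels, together with an estimate of the fundamental frequency, would then have to be recoded into a form suitable for the chosen sense modality.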
The result of an experiment that supports this is shown in Fig. 2 (Risberg & Lubker, 1968). In this study, older hearing impaired persons were tested in three situations: only listening to lowpass filtered speech with a cutoff frequency of 180 Hz, only lipreading, and lipreading plus listening to lowpass filtered speech. The test material was unknown sentences read by a female speaker. The great improvement in the audio-visual situation supports the above conclusion. During this severe filtering, mainly source features and prosodic features could be perceived, as shown by Miller & Nicely (1955). Some of the good results obtained by the subjects in this experiment were based on the use of the intonation pattern (fundamental frequency variations) in the sentences. That this information is vital to the lipreader has been shown by the group at University College London, which works on extracochlear implants (Rosen & al, 1981; Fourcin & al, 1982). In their experiment, the task of the lipreader was to track word by word what a speaker read from a short story in two situations, only lipreading and lipreading supported by acoustically transmitted information about the fundamental frequency. The number of correctly reported words per minute was measured. With lipreading only, the subjects could track with a speed of 10 to 20 words per minute. When lipreading was supplemented by information about fundamental frequency variations, they tracked with a speed of 25 to 40 words per minute. The tracking procedure has been developed at the Central Institute for the Deaf in the USA (De Filippo & Scott, 1979). The method gives a good indication of the effectiveness of a communication aid.
The results shown in Fig. 2 indicate that there is good hope that speech recoding aids can be designed that give substantial help during lipreading. The amount of acoustic speech information transmitted through the lowpass filter in the first experiment is very limited. It ought to be possible to transmit this information in coded form through another sense modality, a small residual hearing, or by a direct stimulation of the auditory nerve.
Recoding aids for the deaf. Some examples
It is beyond the scope of this overview to give a detailed description of all experiments that have been made with different coding approaches in technical aids for the deaf. In the following, some examples are selected that illustrate the problems encountered with the different coding strategies and the different sense modalities used. In the discussion on auditory recoding, only frequency lowering systems are included. A review of work both on amplitude compression and frequency lowering has been published by Braida & al (1979).
Auditory recoding
Work on auditory recoding to help the deaf was started by Perwitzschky, who in 1925 suggested that frequency lowering could improve hearing impaired subjects' ability to perceive speech. This suggestion was based on knowledge of the relative importance of speech information in different parts of the frequency spectrum (Fletcher, 1953) and the observation that many hearing impaired persons have better hearing for low frequencies than for high. In the 1920's, frequency lowering systems could not be technically realized. During the last 30 years, however, many different experiments have been made with frequency lowering. The tested systems can roughly be grouped into three categories: linear frequency lowering, nonlinear frequency lowering, and selective frequency lowering, see Fig. 3. A review of published experiments has been compiled by Braida & al (1979).
In linear frequency lowering, the whole frequency range of speech is lowered, or compressed, to fit into the frequency range of the residual hearing. Lowering ratios of up to four have been used. This has been achieved either by means of tape recorders with a rotating head (Oeken, 1963), channel vocoders (Denes, 1967; Pimonow, 1962), or systems based on the FFT-algorithm (Block & Boerger, 1980; Hicks & al, 1981).
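A rough sketch of the FFT-based variant is given below; it is an illustration for this text only. The window length, hop size, and lowering ratio are arbitrary assumptions, and the overlap-add details do not follow Block & Boerger or Hicks & al. Each short-time spectrum is simply compressed along the frequency axis before resynthesis.

import numpy as np

def linear_frequency_lowering(x, ratio=2.0, n_fft=512, hop=128):
    """Compress the short-time spectrum of x by `ratio` along the frequency axis."""
    window = np.hanning(n_fft)
    y = np.zeros(len(x) + n_fft)
    bins = n_fft // 2 + 1
    for start in range(0, len(x) - n_fft, hop):
        spec = np.fft.rfft(x[start:start + n_fft] * window)
        lowered = np.zeros(bins, dtype=complex)
        for k in range(bins):
            src = int(round(k * ratio))      # output bin k takes the content of bin k*ratio
            if src < bins:
                lowered[k] = spec[src]
        y[start:start + n_fft] += np.fft.irfft(lowered, n_fft) * window
    return y[:len(x)]

# Example: a 2400 Hz tone sampled at 8 kHz ends up close to 1200 Hz.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 2400 * t)
out = linear_frequency_lowering(tone, ratio=2.0)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(out))))
print("dominant output frequency: about", peak_bin * fs / len(out), "Hz")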
Fig. 3. Different technical systems for frequency lowering. [Schematic spectra: the original signal, the range of the residual hearing, and the effect of linear, nonlinear, and selective frequency lowering.]
Linear frequency lowering might result in increased masking of the high frequencies by the low frequencies. To overcome this masking, nonlinear frequency lowering systems have been tried (Block & Boerger, 1980; Hicks, 1981). In these experiments, a technique based on the FFT-transform was also used. In selective frequency lowering, only high frequency fricative sounds are changed to low frequency sounds (Risberg, 1969). Different technical solutions have been used, e.g., a channel vocoder technique (Ling & Druz, 1967), modulation with a carrier around 4000 Hz (Johansson, 1959; Velmans, 1972), and shift registers with read-out at a lower speed than read-in (Knorr, 1976). In some systems a voiced/unvoiced detector is included to avoid lowering of voiced sounds, as this might introduce undesired distortions (Risberg, 1963; Knorr, 1976). One hearing aid with frequency lowering, based on the lowering system developed by Johansson (1959), has been commercially manufactured by Oticon (Oticon TP72). In an evaluation study some positive results were obtained (Foust & Gengel, 1973).
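The selective approach can be sketched in the same frame-by-frame way. The fragment below is an illustration only; the band edges, the amount of shift, and the dominance threshold are invented values, not the Johansson or Oticon TP72 parameters. It transposes the high-frequency band downwards only in frames where noise-like high-frequency energy dominates, leaving voiced frames untouched, as in the systems with a voiced/unvoiced detector.

import numpy as np

def selective_lowering(x, fs, split_hz=4000.0, target_hz=500.0,
                       n_fft=512, dominance=2.0):
    hop = n_fft // 4
    window = np.hanning(n_fft)
    y = np.zeros(len(x) + n_fft)
    df = fs / n_fft
    split = int(split_hz / df)                     # first bin of the high band
    shift = int((split_hz - target_hz) / df)       # number of bins to move the band down
    for start in range(0, len(x) - n_fft, hop):
        spec = np.fft.rfft(x[start:start + n_fft] * window)
        low_e = np.sum(np.abs(spec[:split]) ** 2)
        high_e = np.sum(np.abs(spec[split:]) ** 2)
        if high_e > dominance * low_e:             # fricative-like frame: transpose the band
            lowered = np.zeros_like(spec)
            lowered[split - shift:len(spec) - shift] = spec[split:]
            spec = lowered
        y[start:start + n_fft] += np.fft.irfft(spec, n_fft) * window
    return y[:len(x)]

def spectral_centroid(sig, fs):
    mag = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), 1.0 / fs)
    return np.sum(freqs * mag) / np.sum(mag)

# Example: noise band-limited to above 4 kHz (a crude /s/-like signal) is moved down.
fs = 16000
noise = np.random.randn(fs)
spec = np.fft.rfft(noise)
spec[:4000] = 0                                    # keep only energy above 4 kHz (1 Hz per bin here)
fricative_like = np.fft.irfft(spec, len(noise))
out = selective_lowering(fricative_like, fs)
print("centroid before: %.0f Hz, after: %.0f Hz"
      % (spectral_centroid(fricative_like, fs), spectral_centroid(out, fs)))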
The evaluation studies of different frequency lowering systems have in many cases been made with normal hearing subjects. A hearing loss has then been simulated by means of a lowpass filter, usually with a cutoff frequency around 1000 Hz. In all experiments it has been shown that the normal hearing subjects can improve their ability to identify speech by means of the lowered speech signal. When experiments have been made with hearing impaired subjects, it has also been shown that they, with training, can improve their ability to understand speech over the system, but when an equal amount of training has been given with the original speech signal no difference has, as a rule, been obtained between direct transmission and lowering (Oeken, 1963; Block & Boerger, 1980). In the experiment by Block & Boerger (1980), however, better results were obtained on CV-syllables with lowering but about the same results on words and sentences.
The results of evaluation experiments with frequency lowering have been disappointing. Theoretically, it seems reasonable to expect that some improvement in speech perception ability could be obtained with this technique. There are probably several reasons for this lack of success. In some systems, inappropriate coding techniques have been used that introduced disturbances. The training time has in all studies been too short. In several studies it has also been shown that the frequency selectivity or frequency discrimination ability has deteriorated in sensorineural hearing loss (Zwicker & Schorn, 1978; Gengel, 1973). It is, therefore, possible that subjects have been used who could not perceive the differences between the lowered speech sounds. This is a very likely explanation when severely hard of hearing subjects have been used (Ling & Druz, 1967; Foust & Gengel, 1973). Even with a moderate hearing loss, frequency discrimination ability can be affected. Fig. 4 shows the audiogram and frequency discrimination ability of a subject with a moderate loss. In spite of better hearing in the right ear, frequency discrimination ability is poor in this ear but almost normal in the left ear. The speech signal is severely distorted in this subject's right ear.
One of the problems with many of the lowering systems tested has been that they only existed as laboratory equipment or as a computer program. The amount of training that could be given was, therefore, limited. It is very likely that very long training times are needed before full use can be made of a frequency lowering signal. The results from the experiment by Block & Boerger (1980) showed that better results were obtained on syllables with the lowering system than with the direct signal. This might indicate that syllable identification could be improved after the short training period but that word and sentence intelligibility does not improve so quickly.
A possible strategy for future work on frequency lowering might be to concentrate the work on systems that can easily be made wearable. More work is needed on the selection of suitable subjects, and in all experiments some measurements of the auditory capacity of the subjects must be included. Fig. 5 shows the audiograms of five subjects with a sensorineural hearing loss. In the bottom part, the results obtained by these subjects on tests with monosyllabic phonetically balanced word lists (PB) and on identifying keywords in sentences (S) are shown. In both cases the speech material was presented in a silent room. Subject A had no difficulties perceiving speech in this situation. This is a typical audiogram for an older patient with an acquired hearing loss, often diagnosed as presbyacusis. Typical for these subjects is that they complain about their difficulties in perceiving speech in the presence of noise but have no difficulties in silence. A recoding system for these subjects must then be designed for this situation. The high age of these subjects might mean that they have difficulties in learning to use a new acoustic code. Subject B has a flatter audiogram. In this case the dynamic range is also limited due to recruitment. This might, therefore, be a case where amplitude compression can be of some help.
Fig. 4. Puretone audiogram and result of frequency discrimination measurements on a subject with large differences in frequency discrimination ability in the two ears.

Fig. 5. Puretone audiograms and results obtained on speech perception tests with five hearing impaired persons. PB = results on a list with monosyllabic words, S = results on a test with sentences.
Subject C is a case of low frequency hearing with very low results on the speech tests. This is the type of subject that needs help from a frequency lowering system. An ordinary hearing aid is in these cases of very little help. Speech perception is only possible with simultaneous lipreading, and even in this situation speech perception is often difficult. Fig. 6 shows results that, at least theoretically, indicate that selective frequency lowering can be of help in these cases (Risberg, 1965). The experiment was made with the fricative lowering system of Johansson (1959). Normal hearing persons were used, and hearing impairment was simulated by means of lowpass filtering. The test material consisted of CV-syllables and monosyllabic words. With frequency lowering, an increase in the per cent correctly identified words was obtained. This was coupled with an increase in the per cent correctly perceived manner of consonant articulation. As subjects with the kind of hearing loss that was simulated must rely on lipreading, this increased ability to identify the manner of consonant articulation is important. A technical aid with selective frequency lowering might then be a good lipreading aid. A small study that also seems to show this has been made by Spens & Martony (1972).
Subject D has poor hearing in the whole frequency range. In this case, frequency discrimination ability was also shown to be very abnormal. It is, therefore, not likely that this subject can be helped by means of a frequency lowering system. A system that improves the ability to perceive some source features and fundamental frequency variations might be used. Audiogram E follows closely the threshold of vibration in the ear (Norber, 1967). This is an audiogram from a congenitally totally deaf child. Auditory stimulation in the ear can in this case probably not give more than tactile stimulation on a finger, for example.
Recoding to a visual signal
As an information transmission channel in an aid for speech perception, the visual channel has the limitation that it makes simultaneous lipreading difficult. In 1968, Upton showed a solution to this problem. Speech information was presented on the lipreader's eyeglasses by means of a number of small lamps. In a later design this system was changed to a system based on light-emitting diodes that projected signals on a mirror on the eyeglasses. The lipreader then saw the speaker's face surrounded by a changing light pattern.
Fig. 6. Per cent correctly identified monosyllabic words and per cent correctly identified manner and place of consonant articulation, as a function of lowpass cutoff frequency. Comparison between results obtained with lowpass filtering only and with frequency lowering plus lowpass filter (Risberg, 1965).
Tactile aids and cochlear implants
Tactile aids and cochlear implants can for two reasons be grouped together. The first is that in many cases the speech coding problem has been approached in very much the same way. The other reason is that it has often been questioned whether a single channel cochlear implant can give more speech information than what can be obtained by means of a tactile aid.
Coding strategies
The coding strategies used in many tactile aids and cochlear implants can roughly be grouped into three categories: direct transmission, spectral coding, and feature coding. Sometimes approaches have been used that can be described as a combination of these categories.
In the first approach, direct transmission, the acoustic signal is directly transmitted to the selected sensory modality by means of a vibrator or an electrode inserted into the cochlea or placed outside the cochlea. Sometimes the speech signal is adapted to the limited dynamic properties or limited frequency range of the modality. In the second type of code, spectral coding, the main part of the information in speech is assumed to be transmitted by the time-varying distribution of energy in the frequency domain. This approach results in a technical system consisting of a number of filter channels. The energy in the different channels is transmitted to the subject by means of a suitable code. Based on experiments with channel vocoders, it seems that six to ten channels is the minimum number required. Fig. 7 shows the results of experiments made by Hill & al (1968) and Zollner (1979). In these experiments a channel vocoder was used where the synthesis filters were replaced by sinusoids with the same frequency as the center frequency of the synthesis channels. The figure shows that with only six channels about 70 % of the vowels and the consonants are correctly identified and 90 % of the monosyllabic words. These results have been interpreted to indicate that in a speech coding aid a maximum of ten channels is needed to get a good ability to perceive speech over the system. No transmission of the source features, voice and noise, or of changes in fundamental frequency has been considered necessary. The results shown in Fig. 7 have, however, been obtained with normal hearing listeners. This means that a sensory system with a good time and frequency resolution was used. It is not likely that this good resolution can be obtained when, e.g., the tactile sense is used or in direct stimulation of the auditory nerve.
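For concreteness, the spectral-coding idea behind these listening experiments can be sketched in a few lines. The six log-spaced channels, the frame length, and the FFT analysis used below are assumptions made for this example; they are not the parameters of Hill & al or Zollner. Short-time band levels drive sinusoids at the channel centre frequencies, and the sum of those sinusoids is the coded signal.

import numpy as np

def channel_vocoder(x, fs, n_channels=6, f_lo=200.0, f_hi=5000.0, n_fft=512):
    """Code x as the sum of `n_channels` sinusoids whose amplitudes follow
    the short-time level in log-spaced frequency bands."""
    hop = n_fft // 2
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced band edges
    centres = np.sqrt(edges[:-1] * edges[1:])          # geometric centre of each band
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    window = np.hanning(n_fft)
    t_frame = np.arange(n_fft) / fs
    y = np.zeros(len(x) + n_fft)
    for start in range(0, len(x) - n_fft, hop):
        spec = np.abs(np.fft.rfft(x[start:start + n_fft] * window))
        frame = np.zeros(n_fft)
        for ch in range(n_channels):
            band = (freqs >= edges[ch]) & (freqs < edges[ch + 1])
            level = np.sqrt(np.mean(spec[band] ** 2))   # rms magnitude in the band
            # Resynthesize the channel as a sinusoid at its centre frequency,
            # phase-continuous across frames.
            frame += level * np.sin(2 * np.pi * centres[ch] * (t_frame + start / fs))
        y[start:start + n_fft] += frame * window
    return y[:len(x)] / (np.max(np.abs(y)) + 1e-12)

# Example: a crude vowel-like signal (two "formant" tones) keeps its gross
# spectral shape through the six-channel code.
fs = 16000
t = np.arange(fs // 2) / fs
vowel_like = np.sin(2 * np.pi * 700 * t) + 0.6 * np.sin(2 * np.pi * 1200 * t)
coded = channel_vocoder(vowel_like, fs)
print("coded signal length:", len(coded))

In a tactile or implant application, the per-channel levels would of course drive vibrators or electrodes rather than sinusoids; the sinusoid resynthesis corresponds to the normal-hearing listening experiments described above.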
Fig. 7. Speech perception obtained in a channel vocoder with different numbers of channels (per cent correct for monosyllabic words, consonants, and vowels as a function of the number of sinusoids). As synthesis signals, sinusoids with the same center frequency as the synthesis channels were used (Hill & al, 1968; Zollner, 1979).
Fig. 8. Tactile speech coding based on a channel vocoder. [Block diagram: bandpass filters (BP), rectification and lowpass smoothing (LP, 50 Hz), modulators, amplifier, and bone conduction vibrators.]
In feature coding systems, speech perception is seen as the recognition of a number of acoustic features such as: periodicity, noise, fundamental frequency variations, the frequencies of the formants, formant transitions, time relations between different acoustic elements, etc. Some of these features are more important than others. Some of them can be identified during lipreading and some cannot. Based on an opinion of the relative importance of the features, a selection is made, and the selected features are transmitted in a suitable code. In some systems different phonetic classes, such as voiced plosive, unvoiced fricative, nasalized, etc., are automatically recognized and coded. This automatic recognition increases the possibility to select a suitable code.
Tactile aids
Gault started his work on tactile recoding in the 1920's (1926).
From the beginning he used single channel systems but he gradually
realized that the small usable frequency range, only up to 500 Hz, and
the poor frequency discrimination ability of the tactile sense made it
necessary to use a system where frequency was substituted for place of
stimulation. Gault could show that deaf and normal hearing subjects
could learn to identify some words with single channel tactile aids and
that they could get help during lipreading (1926).
During the 1950's, the interest in tactile recoding increased as a result of the invention of the channel vocoder. Tactile systems were built of the basic type shown in Fig. 8, with the number of channels ranging from 10 to 24. Evaluation studies showed that subjects could learn to detect syllabic structures and gross spectral properties and that the systems gave support during lipreading (Pickett, 1963). The effect was, however, small. Liberman & al (1968), in the paper quoted earlier, suggested that this was due to the existence of a special decoder in the brain of the listener and that this decoder only worked for auditory signals. Kirman (1974) made a review of published studies on tactile speech communication and came to the conclusion that a more probable explanation was that the tactile code used was less suitable for the transmission of speech signals. He suggested that a matrix of vibrators should be used instead. Based on his suggestion, several systems of this type have been tested. In some of them a frequency-intensity pattern is transmitted as in the earlier systems (Sparks & al, 1978) and in some of them a time-frequency-amplitude pattern is transmitted (Spens, 1980).
Fig. 9. Type and placement of the vibrators in the different tactile speech aids compared by Spens (1980).

Fig. 10. Results obtained in the comparison of some tactile speech aids (Spens, 1980).
from the speech signal and how to transmit them in a wearable technical
aid.
Another approach in the present studies of tactile aids is to make the aids comfortably wearable. One objection to most of the earlier work on tactile speech transmission is that the systems are so bulky that they can only be used in the laboratory, which results in very short training periods. An alternative approach is then to start from the requirement that the aid must be easily wearable and then try to incorporate as much of the wanted recoding power as possible. This approach has in our laboratory resulted in a very simple aid where an optimum recoding is made only for the time-amplitude information. The electronics is built into the case of a body-worn hearing aid. Preliminary studies with normal hearing subjects show that the aid gives some support during lipreading. Fig. 11 shows the result of an evaluation study. Two groups of normal hearing subjects tried to lipread sentence material. Half of the group started with only lipreading and half started with lipreading plus the tactile aid. After five to six lists they changed the situation. As can be seen, both groups obtained better results when the vibrator is used. The aid has also been tried by several persons with an acquired total deafness. They find it very useful, especially to get information about environmental sounds. The tactile signal also makes lipreading less tiring.
Cochlear implants
Work on direct stimulation of the auditory nerve, cochlear implants, has been going on at several laboratories since the end of the 1960's. Most of the work on cochlear implants has been based on the assumption that the only way to obtain some ability to perceive speech with such systems is to use a multichannel approach that copies the frequency-analyzing ability of the peripheral auditory system. Single channel systems were believed to be able to transmit only information about speech rhythm and some information about environmental sounds (Zwicker & Zollner, 1980). The results reported by House & al (1979) and also the results of the evaluation of 13 patients equipped with single channel systems in the study by Bilger (1977) supported this view.
Psychophysical studies during electric stimulation by means of an electrode inserted in the cochlea give pitch sensations that depend on the site of stimulation. This pitch sensation changes, however, with the frequency of stimulation. The usable dynamic range is very small, often only 6 to 10 dB. Based on these results, it seems that a possible code could be to use a multichannel system with constant stimulation frequency and to adapt the amplitude range to the usable range during electric stimulation. This type of coding is used by Chouard (1977). Based on the same studies that were used to design tactile spectrum systems, the necessary number of channels was estimated to be ten (Hill & al, 1968; Zollner, 1979). The selected code very much resembles tactile spectrum coding. In both situations the difference between voiced and unvoiced sounds is not coded efficiently, and intonation is not coded at all. Strong energy in one part of the frequency range also masks energy in other parts.
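To make the amplitude problem concrete, the toy mapping below squeezes a 30 dB acoustic channel range into an 8 dB electrical range above threshold. The input range, the choice of 8 dB within the 6 to 10 dB span quoted above, and the linear-in-dB mapping are generic assumptions for the example, not the coding actually used by Chouard.

import numpy as np

def compress_to_electric_range(level_db, acoustic_range=(40.0, 70.0),
                               electric_range_db=8.0, threshold_db=0.0):
    """Map an acoustic channel level (dB) onto a dB value above the electrical
    threshold, confined to `electric_range_db`."""
    lo, hi = acoustic_range
    frac = np.clip((level_db - lo) / (hi - lo), 0.0, 1.0)
    return threshold_db + frac * electric_range_db

for acoustic in (40, 50, 60, 70):
    print(acoustic, "dB in ->", compress_to_electric_range(acoustic), "dB above threshold")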
The cochlear implant group in Australia has more directly applied knowledge of the importance of different acoustic speech elements to design their implant (Tong & al, 1981). The selected code can be described as a feature code. The frequency of the midfrequency range (mainly the frequency of the second formant) is coded as the position of stimulation on ten electrodes in the cochlea. The stimulation frequency is proportional to the fundamental frequency, and for unvoiced sounds the stimulation frequency is lowered to 70 Hz, which gives a rough percept. From a speech coding point of view this code seems rather promising, and the code is also well matched to simultaneous speech-reading. Evaluation studies report that one subject could correctly identify 65 % of spondee words in a known set of 16. He could also identify 16 % of unknown simple sentences (Tong & al, 1981).
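The feature code described here can be summarized as a small mapping from (F2, F0, voicing) per analysis frame to (electrode, pulse rate). The sketch below only illustrates that mapping; the electrode boundaries, the F2 range, and the logarithmic spacing are invented for the example and are not the Tong & al (1981) processor parameters.

import numpy as np

N_ELECTRODES = 10
F2_RANGE = (800.0, 2800.0)        # assumed F2 range mapped across the electrode array

def feature_to_stimulus(f2_hz, f0_hz, voiced):
    """Map one analysis frame (F2, F0, voicing) to (electrode index, pulse rate)."""
    # Place F2 on a logarithmic scale across the ten electrodes,
    # mimicking the tonotopic ordering of the array.
    lo, hi = np.log(F2_RANGE[0]), np.log(F2_RANGE[1])
    pos = (np.log(np.clip(f2_hz, *F2_RANGE)) - lo) / (hi - lo)
    electrode = int(round(pos * (N_ELECTRODES - 1)))
    # Voiced frames: stimulation rate follows F0.  Unvoiced frames: fixed 70 Hz,
    # the value quoted in the text, giving a rough percept.
    rate = float(f0_hz) if voiced else 70.0
    return electrode, rate

# Example frames: a voiced vowel-like frame and an unvoiced fricative-like frame.
print(feature_to_stimulus(f2_hz=1200.0, f0_hz=110.0, voiced=True))    # rate follows F0
print(feature_to_stimulus(f2_hz=2500.0, f0_hz=0.0, voiced=False))     # fixed 70 Hz rate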
The multichannel systems result in a bulky electronic unit that must be implanted and a bulky, power consuming, external stimulator. Even if electronic circuitry gradually can be miniaturized, it is likely that it will be difficult to reduce the size and weight of these systems so that they can be easily wearable. An important question is then whether these multichannel systems give much better results than single channel systems. Recently, results have been published that make it necessary to answer this question (Hochmair-Desoyer & al, 1981; Michelson & Schindler, 1981).
In a single channel approach to electric stimulation of the auditory nerve, no attempt is made to simulate the frequency analyzing ability of the peripheral auditory system. Instead, the time domain properties of the system are used (Keidel, 1980). Experiments have shown that with electric stimulation in the cochlea, subjects can scale pitch up to 400
Fig. 11. Results from the evaluation of a simple single-vibrator tactile lipreading aid (per cent correct lipreading with and without the vibrator as a function of training session).

Fig. 12. Comparison between frequency discrimination ability obtained with auditory stimulation at 40 dB SL (Wier & al, 1977), tactile stimulation (Rothenberg & al, 1977; Goff, 1967), and direct electrical stimulation of the auditory nerve (Bilger, 1977; Fischer, 1981).

Fig. 13. Vowel confusions for subjects CK and SS reported by Hochmair-Desoyer & al (1981). The vowels are arranged in groups with about the same first formant frequency.
that can be obtained with all subjects or if these four subjects represent an optimum. Similar results have been reported by Michelson & Schindler (1981).
Stimulation outside the cochlea, extracochlear stimulation, is attractive as no irreversible surgical operation has to be made. This technique is studied in several laboratories. Fourcin & al (1982) in London have concentrated their work on the transmission of prosodic features, as this will result in an aid that will give good help during speechreading.
Cochlear implants and hearing aids
The very good results that have been reported from experiments with cochlear implants will with certainty result in a rapid increase of experiments in this area. It is then of interest to compare the results that have been reported from these studies with results that are obtained with hearing aids on severely hard of hearing subjects. Fig. 15 shows results from measurements of speech perception ability by means of a test that consisted of twelve known spondee words. The results are plotted against the mean hearing loss for the frequencies 500, 1000, and 2000 Hz.
In the figure, results on similar tests reported from cochlear implant studies are shown. CK, FW, LP, and SS are from the Vienna group (Hochmair-Desoyer & al, 1981) and MC1 from the Australian group (Tong & al, 1981). As can be seen, the reported results from cochlear implants are comparable to a hearing loss in the order of 90 to 95 dB. The good results from subjects with hearing aids with a mean loss of more than 100 dB are from persons with a combined conductive and sensorineural hearing loss. The results shown in the figure indicate that with a cochlear implant of the type used today, a totally deaf person might be changed into a person with a severe hearing loss.
Some conclusions
The work on speech coding in aids for the deaf has, as was pointed out in the introduction, very rarely resulted in technical aids that are used by the deaf. In this overview some reasons for this lack of success have been discussed. The rapid development around large scale integration of electronic circuitry makes it possible to realize even very complex speech processing algorithms in compact hardware. The problem is, however, to specify what the electronic circuit should do. To solve this problem, basic research is needed on speech processing and the importance of different speech elements for the perception of speech, both with and without the support of lipreading. More knowledge is also needed about the capability of the sensory system selected for signal transmission.
The relative success recently achieved in the work on cochlear implants will result in an increased activity in this area. In many cases it is difficult to explain published results from evaluation studies based on our present knowledge about speech perception and the neurophysiology of the auditory system. Basic research is needed on different aspects of speech signal transmission by means of electrical stimulation of the auditory nerve. This can be achieved by means of a closer cooperation between groups working on implants and speech research groups. This coupling can also result in better designed studies for the evaluation of the speech perception ability achieved with different types of implants and different speech coding systems.
References
Bilger, R.C. & Hopkinson, N.T. (1977): "Hearing performance in the auditory prosthesis", Ann. Otol. Rhinol. Laryngol. 86, Suppl. 38, pp. 76-91.
Bilger, R.C. (1977): "Psychoacoustic evaluation of present prostheses", Ann. Otol. Rhinol. Laryngol. 86, Suppl. 38, pp. 92-140.
Block, R. & Boerger, G. (1980): "Hörverbessernde Verfahren mit Bandbreitenkompression", Acustica 45, pp. 294-303.
Braida, L.D., Durlach, N.I., Lippmann, R.P., Hicks, B.L., Rabinowitz, W.M., & Reed, C.M. (1979): "Hearing aids - A review of past research on linear amplification, amplitude compression, and frequency lowering", ASHA Monographs No. 19.
Chouard, C.H. & MacLeod, P. (1977): "Auditory prosthesis equipment", US patent 4 207 441, 1980; French patent 1977.
Cole, R.A., Rudnicky, A.I., Zue, V.W., & Reddy, D.R. (1980): "Speech as patterns on paper", pp. 3-50 in Cole, R.A. (Ed.): Perception and Production of Fluent Speech, Lawrence Erlbaum Assoc. Publ., Hillsdale, NJ.
Cornett, R.O. (1967): "Cued speech", Am. Ann. Deaf 112, pp. 3-13.
De Filippo, C.L. & Scott, B.L. (1978): "A method for training and evaluating the reception of ongoing speech", J. Acoust. Soc. Am. 63, pp. 1186-1192.
Denes, P.B. (1963): "On the statistics of spoken English", J. Acoust. Soc. Am. 35, pp. 892-904.
Denes, P.B. (1967): "On the motor theory of speech perception", pp. 309-314 in Wathen-Dunn, W. (Ed.): Models for the Perception of Speech and Visual Form, MIT Press, Cambridge, MA.
Farwell, R.M. (1976): "Speech reading: a research review", Am. Ann. Deaf 122, pp. 19-30.
Fischer, R. (1981): "Methodik und Experimente zur psychoakustischen Evaluation des durch Cochleaprothesen wiedererlangbaren Hörvermögens in sensorineural Ertaubten", Diss. Techn. Universität Wien.
Fletcher, H. (1953): Speech and Hearing in Communication, Van Nostrand, Princeton, NJ.
Fourcin, A.J., Douek, E.E., Moore, B.C.J., Rosen, S.M., Walliker, J.R., Howard, D.M., Abberton, E., & Frampton (1982): "Speech perception by promontory stimulation", paper presented at the Int. Cochlear Prosthesis Conf., New York, NY.
Foust, K.O. & Gengel, R.W. (1973): "Speech discrimination by sensorineural hearing-impaired persons using a transposing hearing aid", Scand. Audiol. 2, pp. 161-170; reprinted in Levitt, Pickett, & Houde (1980), pp. 227-236.
Gault, R.H. (1926): "Touch as a substitute for hearing in the interpretation and control of speech", Arch. Otolaryngol. 3, pp. 121-135; reprinted in Levitt, Pickett, & Houde (1980), pp. 243-246.
Gengel, R.W. (1973): "Temporal effects in frequency discrimination by hearing-impaired listeners", J. Acoust. Soc. Am. 54, pp. 11-15.
Gengel, R.W. (1976): "Upton's wearable eyeglass speechreading aid: history and current status", pp. 291-299 in Hirsh, S.K., Eldredge, D.H., & Silverman, S.R. (Eds.): Hearing and Davis: Essays Honoring Hallowell Davis, Washington University Press, St. Louis, MO.
Gescheider, G.A. (1970): "Some comparisons between touch and hearing", IEEE Trans. on Man-Machine Systems, pp. 28-35.
Goff, G.D. (1967): "Differential discrimination of frequency of cutaneous mechanical vibration", J. Exp. Psychology 74, pp. 294-299.
Hicks, B.L., Braida, L.D., & Durlach, N.I. (1981): "Pitch invariant frequency lowering with nonuniform spectral compression", pp. 121-124 in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing, 1981.
Hill, F.J., McRae, L.P., & McClellan, R.P. (1968): "Speech recognition as a function of channel capacity in a discrete set of channels", J. Acoust. Soc. Am. 44, pp. 13-18.
Hochmair-Desoyer, I.J., Hochmair, E.S., Burian, K., & Fischer, R.E. (1981): "Four years of experience with cochlear prostheses", Medical Progress through Technology 8, pp. 107-119.
House, W.F., Berliner, K.I., & Eisenberg, L.S. (1979): "Present status and future directions of the Ear Research Institute cochlear implant program", Acta Otolaryngol. 87, pp. 176-184.
Jackson, P.L., Montgomery, A.A., & Binnie, C.A. (1976): "Perceptual dimensions underlying vowel lipreading performance", J. Speech Hear. Res. 19, pp. 796-812.
Johansson, B. (1959): "A new coding amplifier system for the severely hard of hearing", pp. 665-667 in Proc. III Int. Conf. Acoustics, Vol. II.
Keidel, W.D. (1973): "The cochlear model in skin stimulation", pp. 27-32 in Geldard (Ed.): Cutaneous Communication Systems and Devices, The Psychonomic Society.
Keidel, W.D. (1980): "Neurophysiological requirement for implanted cochlear prostheses", Audiology 19, pp. 105-127.
King, A., Parker, A., Spanner, M., & Wright, R. (1982): "A speech display computer for use in schools for the deaf", pp. 755-758 in Proc. IEEE ICASSP 82.
Kirman, J.H. (1974): "Tactile communication of speech: a review and an analysis", Psychol. Bull. 80, pp. 54-74; reprinted in Levitt, Pickett, & Houde (1980), pp. 297-318.
Knorr, S.G. (1976): "A hearing aid for subjects with extreme high-frequency losses", IEEE Trans. ASSP 24, pp. 473-480.
Levitt, H., Pickett, J.M., & Houde, R.A. (Eds.) (1980): Sensory Aids for the Hearing Impaired, IEEE Press, John Wiley & Sons, Inc., New York.
Liberman, A.M., Cooper, F.S., Shankweiler, D.P., & Studdert-Kennedy, M. (1968): "Why are speech spectrograms hard to read?", Am. Ann. Deaf 113, pp. 127-133; reprinted in Levitt, Pickett, & Houde (1980), pp. 287-288.
Ling, D. & Druz, W.S. (1967): "Transposition of high frequency sounds by partly vocoding of the speech spectrum: its use by deaf children", J. Aud. Res. 7, pp. 133-144.
Maki, J.E., Conklin, J.M., Gustafson, M.S., & Humphrey-Whitehead, B.K. (1981): "The speech spectrographic display: Interpretation of visual patterns by hearing impaired adults", J. Speech Hear. Dis. 46, pp. 379-387.
Michelson, R.P. & Schindler, R.A. (1981): "Multichannel cochlear implant. Preliminary results in man", The Laryngoscope 91, pp. 38-42.
Miller, G.A. & Nicely, P.E. (1955): "An analysis of perceptual confusions among some English consonants", J. Acoust. Soc. Am. 27, pp. 338-352.
Nicholls, G.H. (1979): "Cued speech and the reception of spoken language", Report from School of Human Communication Disorders, McGill University, Montreal.
Norber, E.H. (1967): "Vibrotactile sensitivity of deaf children to high intensity sound", The Laryngoscope 78, pp. 2128-2146.
Oeken, F.W. (1963): "Frequenztransposition zur Hörverbesserung bei Innenohrschwerhörigkeit", Arch. Ohren- usw. Heilk. u. Z. Hals- usw. Heilk. 181, pp. 418-425.
Spens, K-E. & Martony, J. (1972): "The transposer as a lipreading aid", pp. 246-248 in Proc. 1972 IEEE Conf. Speech Comm. & Processing; reprinted in Levitt, Pickett, & Houde (1980), pp. 224-226.
Tong, Y.C., Black, R.C., Clark, G.M., Forster, I.C., Millar, J.B., O'Loughlin, B.J., & Patrick, J.F. (1979): "A preliminary report on a multiple-channel cochlear implant operation", J. Laryngol. and Otology 93, pp. 679-695.
Tong, Y.C., Clark, G.M., Dowell, R.C., Martin, L.F.A., Seligman, P.M., & Patrick, J.F. (1981): "A multiple-channel cochlear implant and wearable speech processor", Acta Otolaryngol. 92, pp. 193-198.
Traunmüller, H. (1980): "The Sentiphone: A tactual speech communication aid", J. Comm. Dis. 13, pp. 183-193.
Upton, H.W. (1968): "Wearable eyeglass speechreading aid", Am. Ann. Deaf 113, pp. 222-229; reprinted in Levitt, Pickett, & Houde (1980), pp. 322-333.
Velmans, M.L. (1972): Aids for deaf persons, British patent No. 1340105.
Wier, C.C., Jesteadt, W., & Green, D.M. (1977): "Frequency discrimination as a function of frequency and sensation level", J. Acoust. Soc. Am. 61, pp. 178-184.
Woodward, M.F. & Barber, C.G. (1960): "Phoneme perception in lipreading", J. Speech Hear. Res. 3, pp. 212-222.
Zollner, M. (1979): "Verständlichkeit der Sprache eines einfachen Vocoders", Acustica 43, pp. 271-272.
Zwicker, E. & Schorn, K. (1978): "Psychoacoustic tuning curves in audiology", Audiology 17, pp. 120-140.
Zwicker, E. & Zollner, M. (1980): "Criteria for VIIIth nerve implants and for feasible coding", Scand. Audiol. Suppl. 11, pp. 179-181.