Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report

Speech coding in aids for the deaf: An overview of research from 1924 to 1982
Risberg, A.

journal: STL-QPSR
volume: 23
number: 4
year: 1982
pages: 065-098

http://www.speech.kth.se/qpsr

II. SPEECH AND HEARING DEFECTS AND AIDS

A. SPEECH CODING IN AIDS FOR THE DEAF: AN OVERVIEW OF RESEARCH FROM 1924 TO 1982*
Arne Risberg

Abstract
The aim of speech coding in aids for the deaf is to adapt the acoustic speech code to the sensory capabilities of a limited residual hearing, to a direct stimulation of the auditory nerve, or to a transmission through the visual or tactile sensory system. In the 1920's, Gault (1926) started his work on transmission of speech through the tactile sense. In 1925, Perwitzschky suggested that the speech perception ability of a hearing impaired person with hearing only in the low frequencies could be improved by frequency lowering. During the sixty years from 1922 to 1982, many attempts have been made to develop speech coding aids for the deaf. A selection of the more important articles published up to 1979 has been compiled by Levitt, Pickett, & Houde (1980). In spite of great efforts, very few aids based on speech recoding techniques are used by deaf persons today. The only exceptions seem to be some visual aids used for speech training, a few persons that use tactile aids, and 200 to 300 persons that use cochlear implants of different types. There are several reasons for this lack of success. In this article I have tried to give an overview of this area of research and development. Work on amplitude compression systems has not been included.

Is the auditory system special?
In a paper presented at the Washington Conference on Speech-Analyzing Aids for the Deaf in 1967, Liberman & al (1968) suggested that the explanation for the difficulties experienced in learning to read visual transforms of the acoustic speech signal, the sonagram, was that speech is a special code and that a special decoder in the brain of the listener is needed. This decoder works only for auditory input. A tactually or visually presented speech signal does not have access to this decoder, and this cannot be changed by training. The support that Liberman & al presented for their hypothesis was the limited success achieved by people that have tried to learn to read sonagrams, especially the results of the experiment made by Potter, Kopp, & Green at Bell Telephone Laboratories (1947). In a recent paper, Cole & al (1980) question the hypothesis of Liberman & al. In spite of the rather primitive technical device used by Potter & al, a 12-channel spectrum analyzer that presented the energy in the frequency range from 300-3000 Hz on a moving phosphorescent belt, the most motivated subject could read a vocabulary of 800 words after about 200 hours of training.
Compared to the amount of training given a child in learning [...]

* Invited lecture at the colloquium on "Speech Processing Aids and Cochlear Implants", Göttingen, BRD, September 1982.

[Fig. 1. Sonagram of the utterance "speech coding": frequency (Hz) versus time (sec), with the source features voice, noise, transient, and silence indicated.]

Table 1. Visually contrastive phoneme groups (Woodward & Barber, 1960).

    1.  p b m
    2.  w r
    3.  f v
    4.  t d n l θ ð s z tʃ dʒ ʃ ʒ j k g h

Table 2. Results from a study of the ability to identify prosodic contrasts during lipreading (Risberg & Lubker, 1978).

    Test material     Guessing %   Group             Range %    Mean %   SD %
    Vowel length      50           Normal hearing    65-80      72.0     6.4
                                   Hearing impaired  55-80      70.4     7.8
    Syllable stress   50           Normal hearing    75-100     85.7     8.9
                                   Hearing impaired  80-100     90.8     5.9
    Juncture          50           Normal hearing    60-75      67.9     6.4
                                   Hearing impaired  45-85      65.0     14.8
    Emphasis          50                             6.3-75.0   42.8     22.0

Stress patterns in two-syllable words could in most cases be identified.

The selection of code in a lipreading aid
In describing the sonagram in Fig. 1, it was said that the acoustic speech signal can be seen as the combined effect of a source and a filter. The same model can be used in speech perception. The perception is then based on the parallel processing of source and filter information. This model is very suitable for work on lipreading aids. During lipreading, the filter information, that is, the place of articulation of the different vowel and consonant phonemes, can be seen with reasonable success, whereas source features are less visible. A lipreading aid should then primarily be designed to transmit information on the source features voice, noise, transient, and silence, plus changes in the fundamental frequency. These features will also describe the manner of articulation of consonants relatively well. An exception might be nasalization and lateralization. The result of an experiment that supports this is shown in Fig. 2 (Risberg & Lubker, 1978). In this study, older hearing impaired persons were tested in three situations: only listening to lowpass filtered speech with a cutoff frequency of 180 Hz, only lipreading, and lipreading plus listening to lowpass filtered speech. The test material was unknown sentences read by a female speaker. The great improvement in the audio-visual situation supports the above conclusion. With this severe filtering, mainly source features and prosodic features can be perceived, as shown by Miller & Nicely (1955).

Some of the good results obtained by the subjects in this experiment were based on the use of the intonation pattern (fundamental frequency variations) in the sentences. That this information is vital to the lipreader has been shown by the group at University College, London, that works on extracochlear implants (Rosen & al, 1981; Fourcin & al, 1982). In their experiment, the task of the lipreader was to track, word by word, what a speaker read from a short story in two situations: only lipreading, and lipreading supported by acoustically transmitted information about the fundamental frequency. The number of correctly reported words per minute was measured. With lipreading only, the subjects could track at a speed of 10 to 20 words per minute. When lipreading was supplemented by information about the fundamental frequency variations, they tracked at a speed of 25 to 40 words per minute. The tracking procedure has been developed at the Central Institute for the Deaf in the USA (De Filippo & Scott, 1978).
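As a concrete illustration of the kind of acoustic fundamental-frequency support used in the tracking experiment above, the sketch below extracts a frame-by-frame F0 track by autocorrelation and re-presents it as a tone that follows the speaker's intonation. This is only a minimal sketch under assumed parameters (frame length, voicing threshold, function names); it is not the equipment used by the London group.

```python
import numpy as np

def track_f0(x, fs, fmin=70.0, fmax=400.0, frame_len=0.03, hop=0.01,
             voicing_thresh=0.3):
    """Frame-by-frame fundamental frequency estimate by autocorrelation.
    Returns F0 in Hz per frame (0.0 for frames judged voiceless)."""
    n, h = int(frame_len * fs), int(hop * fs)
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    f0 = []
    for start in range(0, len(x) - n, h):
        frame = x[start:start + n] - np.mean(x[start:start + n])
        ac = np.correlate(frame, frame, mode="full")[n - 1:]
        if ac[0] <= 0:            # silent frame
            f0.append(0.0)
            continue
        lag = lag_min + np.argmax(ac[lag_min:lag_max])
        # Weak periodicity is treated as voiceless: no F0 is transmitted.
        f0.append(fs / lag if ac[lag] / ac[0] > voicing_thresh else 0.0)
    return np.array(f0)

def f0_to_tone(f0, fs, hop=0.01):
    """Turn the F0 track into an audible tone that follows the speaker's
    intonation, standing in for the acoustic F0 support described above."""
    f_samples = np.repeat(f0, int(hop * fs))
    phase = 2.0 * np.pi * np.cumsum(f_samples) / fs
    return np.where(f_samples > 0, np.sin(phase), 0.0)
```

A practical aid would additionally smooth the track and map it into the listener's usable frequency and dynamic range.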
This tracking method gives a good indication of the effectiveness of a communication aid.

The results shown in Fig. 2 indicate that there is good hope that speech recoding aids can be designed that give substantial help during lipreading. The amount of acoustic speech information transmitted through the lowpass filter in the first experiment is very limited. It ought to be possible to transmit this information in coded form through another sense modality, through a small residual hearing, or by a direct stimulation of the auditory nerve.

Recoding aids for the deaf. Some examples
It is beyond the scope of this overview to give a detailed description of all experiments that have been made with different coding approaches in technical aids for the deaf. In the following, some examples are selected that illustrate the problems encountered with the different coding strategies and the different sense modalities used. In the discussion of auditory recoding, only frequency lowering systems are included. A review of work on both amplitude compression and frequency lowering has been published by Braida & al (1979).

Auditory recoding
Work on auditory recoding to help the deaf was started by Perwitzschky, who in 1925 suggested that frequency lowering could improve hearing impaired subjects' ability to perceive speech. This suggestion was based on knowledge of the relative importance of speech information in different parts of the frequency spectrum (Fletcher, 1953) and the observation that many hearing impaired persons have better hearing for low frequencies than for high. In the 1920's, frequency lowering systems could not be technically realized. During the last 30 years, however, many different experiments have been made with frequency lowering. The tested systems can roughly be grouped into three categories: linear frequency lowering, nonlinear frequency lowering, and selective frequency lowering, see Fig. 3. A review of experiments has been compiled by Braida & al (1979).

In linear frequency lowering, the whole frequency range of speech is lowered, or compressed, to fit into the frequency range of the residual hearing. Lowering ratios of up to four have been used. This has been achieved either by means of tape recorders with a rotating head (Oeken, 1963), channel vocoders (Denes, 1967; Pimonow, 1962), or systems based on the FFT algorithm (Block & Boerger, 1980; Hicks & al, 1981).

Fig. 3. Different technical systems for frequency lowering.

Linear frequency lowering might result in increased masking of the high frequencies by the low frequencies. To overcome this masking, nonlinear frequency lowering systems have been tried (Block & Boerger, 1980; Hicks & al, 1981). In these experiments, a technique based on the FFT was also used.
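To make the linear approach concrete, the sketch below compresses the short-time spectrum by a fixed ratio in an FFT framework, in the spirit of the FFT-based systems mentioned above. It is an illustrative sketch only, not the algorithm of Block & Boerger or Hicks & al: the function name, the default compression ratio, and the crude bin interpolation (which ignores phase coherence between frames) are all assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def linear_frequency_lowering(x, fs, ratio=2.0, nperseg=512):
    """Compress the whole speech spectrum by `ratio` so that it fits into
    the low-frequency region of a residual hearing.

    Sketch: take a short-time Fourier transform, squeeze each frame's
    spectrum towards 0 Hz by resampling the frequency axis, resynthesize."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    n_bins = X.shape[0]
    Y = np.zeros_like(X)
    # Output bin k takes its value from input bin k*ratio, i.e. energy that
    # was at frequency f appears at f/ratio after resynthesis.
    src = np.arange(n_bins) * ratio
    keep = src < n_bins - 1
    lo = np.floor(src[keep]).astype(int)
    frac = (src[keep] - lo)[:, None]
    Y[keep] = (1.0 - frac) * X[lo] + frac * X[lo + 1]
    _, y = istft(Y, fs=fs, nperseg=nperseg)
    return y
```

With ratio=4.0 this would correspond to the largest lowering ratios mentioned in the text; a nonlinear system would instead use a frequency-dependent ratio.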
In selective frequency lowering, only high frequency fricative sounds are changed into low frequency sounds (Risberg, 1969). Different technical solutions have been used, e.g., a channel vocoder technique (Ling & Druz, 1967), modulation with a carrier around 4000 Hz (Johansson, 1959; Velmans, 1972), and shift registers with read-out at a lower speed than read-in (Knorr, 1976). In some systems a voiced/unvoiced detector is included to avoid lowering of voiced sounds, as this might introduce undesired distortions (Risberg, 1963; Knorr, 1976). One hearing aid with frequency lowering, based on the lowering system developed by Johansson (1959), has been commercially manufactured by Oticon (Oticon TP72). In an evaluation study some positive results were obtained (Foust & Gengel, 1973).
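The sketch below illustrates this kind of selective lowering: the high-frequency band is shifted down by modulation with a carrier near 4000 Hz, and the transposed band is added only during frames that a crude zero-crossing detector judges to be unvoiced. It is a rough sketch under assumed filter orders, band edges, thresholds, and a sampling rate of, say, 16 kHz; it is not the Johansson or Oticon TP72 design.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def is_unvoiced(frame, zcr_thresh=0.25):
    """Crude voiced/unvoiced decision: fricatives give many zero
    crossings per sample, voiced speech comparatively few."""
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return zcr > zcr_thresh

def selective_lowering(x, fs, carrier_hz=4000.0, frame_len=0.02):
    """Add a transposed copy of the fricative band to the low band,
    but only during unvoiced frames."""
    lo_sos = butter(4, 1000, btype="low", fs=fs, output="sos")
    hi_sos = butter(4, 4000, btype="high", fs=fs, output="sos")
    low_band = sosfilt(lo_sos, x)
    high_band = sosfilt(hi_sos, x)
    t = np.arange(len(x)) / fs
    # Multiplying the high band by the carrier mirrors it down towards
    # 0 Hz; the lowpass filter removes the sum-frequency image.
    shifted = sosfilt(lo_sos, high_band * np.cos(2 * np.pi * carrier_hz * t))
    out = np.copy(low_band)
    n = int(frame_len * fs)
    for start in range(0, len(x) - n, n):
        sl = slice(start, start + n)
        if is_unvoiced(x[sl]):
            out[sl] = low_band[sl] + shifted[sl]
    return out
```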
The evaluation studies of different frequency lowering systems have in many cases been made with normal hearing subjects. A hearing loss has then been simulated by means of a lowpass filter, usually with a cutoff frequency around 1000 Hz. In all experiments it has been shown that normal hearing subjects can improve their ability to identify speech by means of the lowered speech signal. When experiments have been made with hearing impaired subjects, it has also been shown that they can, with training, improve their ability to understand speech over the system; but when an equal amount of training has been given with the original speech signal, no difference has, as a rule, been obtained between direct transmission and lowering (Oeken, 1963; Block & Boerger, 1980). In the experiment by Block & Boerger (1980), however, better results were obtained on CV-syllables with lowering, but about the same results on words and sentences.

The results of evaluation experiments with frequency lowering have been disappointing. Theoretically, it seems reasonable to expect that some improvement in speech perception ability could be obtained with this technique. There are probably several reasons for this lack of success. In some systems, inappropriate coding techniques that introduced disturbances have been used. The training time has in all studies been too short. In several studies it has also been shown that frequency selectivity or frequency discrimination ability has deteriorated in sensorineural hearing loss (Zwicker & Schorn, 1978; Gengel, 1973). It is, therefore, possible that subjects have been used who could not perceive the differences between the lowered speech sounds. This is a very likely explanation when severely hard of hearing subjects have been used (Ling & Druz, 1967; Foust & Gengel, 1973). Even with a moderate hearing loss, frequency discrimination ability can be affected. Fig. 4 shows the audiogram and frequency discrimination ability of a subject with a moderate loss. In spite of better hearing in the right ear, frequency discrimination ability is poor in this ear but almost normal in the left ear. The speech signal is severely distorted in this subject's right ear.

One of the problems with many of the lowering systems tested has been that they existed only as laboratory equipment or as a computer program. The amount of training that could be given was, therefore, limited. It is very likely that very long training times are needed before full use can be made of a frequency lowered signal. The results from the experiment by Block & Boerger (1980) showed that better results were obtained on syllables with the lowering system than with the direct signal. This might indicate that syllable identification could be improved after the short training period but that word and sentence intelligibility does not improve as quickly. A possible strategy for future work on frequency lowering might be to concentrate on systems that can easily be made wearable. More work is needed on the selection of suitable subjects, and in all experiments some measurement of the auditory capacity of the subjects must be included.

Fig. 5 shows the audiograms of five subjects with a sensorineural hearing loss. In the bottom part, the results obtained by these subjects on tests with monosyllabic phonetically balanced word lists (PB) and on identifying keywords in sentences (S) are shown. In both cases the speech material was presented in a silent room. Subject A had no difficulties perceiving speech in this situation. This is a typical audiogram for an older patient with an acquired hearing loss, often diagnosed as presbyacusis. Typical for these subjects is that they complain about their difficulties perceiving speech in the presence of noise but have no difficulties in silence. A recoding system for these subjects must then be designed for this situation. The high age of these subjects might mean that they have difficulties in learning to use a new acoustic code. Subject B has a flatter audiogram. In this case the dynamic range is also limited due to recruitment. This might, therefore, be a case where amplitude compression can be of some help.

Fig. 4. Pure-tone audiogram and results of frequency discrimination measurements on a subject with large differences in frequency discrimination ability between the two ears.

Fig. 5. Pure-tone audiograms and results obtained on speech perception tests with five hearing impaired persons. PB = results on a list with monosyllabic words, S = results on a test with sentences.

Subject C is a case of low frequency hearing with very low results on the speech tests. This is the type of subject that needs help from a frequency lowering system. An ordinary hearing aid is in these cases of very little help. Speech perception is only possible with simultaneous lipreading, and even in this situation speech perception is often difficult. Fig. 6 shows results that, at least theoretically, indicate that selective frequency lowering can be of help in these cases (Risberg, 1965). The experiment was made with the fricative lowering system of Johansson (1959). Normal hearing persons were used, and hearing impairment was simulated by means of lowpass filtering. The test material consisted of CV-syllables and monosyllabic words. With frequency lowering, an increase in the per cent correctly identified words was obtained. This was coupled with an increase in the per cent correctly perceived manner of consonant articulation. As subjects with the kind of hearing loss that was simulated must rely on lipreading, this increased ability to identify the manner of consonant articulation is important. A technical aid with selective frequency lowering might then be a good lipreading aid. A small study that also seems to show this has been made by Spens & Martony (1972).

Subject D has poor hearing over the whole frequency range. In this case, frequency discrimination ability was also shown to be very abnormal. It is, therefore, not likely that this subject can be helped by means of a frequency lowering system.
A system that improves the ability to perceive some source features and fundamental frequency variations might be used instead. Audiogram E follows closely the threshold of vibration in the ear (Norber, 1967). This is an audiogram from a congenitally totally deaf child. Auditory stimulation in the ear can in this case probably not give more than tactile stimulation on, for example, a finger.

Recoding to a visual signal
As an information transmission channel in an aid for speech perception, the visual channel has the limitation that it makes simultaneous lipreading difficult. In 1968, Upton showed a solution to this problem. Speech information was presented on the lipreader's eyeglasses by means of a number of small lamps. In a later design this system was changed to one based on light-emitting diodes that projected signals onto a mirror on the eyeglasses. The lipreader then saw the speaker's face surrounded by a changing light pattern.

Fig. 6. Per cent correctly identified monosyllabic words and per cent correctly identified manner and place of consonant articulation, as a function of low-pass cutoff frequency (500 Hz to 2 kHz). Comparison between results obtained with lowpass filtering only and with frequency lowering plus lowpass filter (Risberg, 1965).

Tactile aids and cochlear implants
Tactile aids and cochlear implants can be grouped together for two reasons. The first is that in many cases the speech coding problem has been approached in very much the same way. The other reason is that it has often been questioned whether a single channel cochlear implant can give more speech information than what can be obtained by means of a tactile aid.

Coding strategies
The coding strategies used in many tactile aids and cochlear implants can roughly be grouped into three categories: direct transmission, spectral coding, and feature coding. Sometimes approaches have been used that can be described as a combination of these categories. In the first approach, direct transmission, the acoustic signal is directly transmitted to the selected sensory modality by means of a vibrator or an electrode inserted into the cochlea or placed outside the cochlea. Sometimes the speech signal is adapted to the limited dynamic properties or limited frequency range of the modality.

In the second type of code, spectral coding, the main part of the information in speech is assumed to be transmitted by the time-varying distribution of energy in the frequency domain. This approach results in a technical system consisting of a number of filter channels. The energy in the different channels is transmitted to the subject by means of a suitable code. Based on experiments with channel vocoders, it seems that six to ten channels are the minimum number required. Fig. 7 shows the results of experiments made by Hill & al (1968) and Zollner (1979). In these experiments a channel vocoder was used in which the synthesis filters were replaced by sinusoids with the same frequency as the center frequency of the synthesis channels. The figure shows that with only six channels about 70 % of the vowels and consonants are correctly identified, and 90 % of the monosyllabic words. These results have been interpreted to indicate that in a speech coding aid a maximum of ten channels is needed to get a good ability to perceive speech over the system. No transmission of the source features, voice and noise, or of changes in fundamental frequency has been considered necessary.
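To make the kind of system used in these experiments concrete, here is a minimal sketch of a channel vocoder resynthesized with sinusoids at the channel centre frequencies. The channel count, centre frequencies, filter orders, and the assumption of a sampling rate around 16 kHz are illustrative choices, not the actual Hill & al or Zollner configurations.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def sinusoidal_vocoder(x, fs, centers=(250, 500, 1000, 1500, 2500, 4000)):
    """Channel vocoder resynthesized with sinusoids: each analysis band's
    envelope modulates a sinusoid at the band's centre frequency, and the
    modulated sinusoids are summed."""
    t = np.arange(len(x)) / fs
    env_sos = butter(2, 50, btype="low", fs=fs, output="sos")  # envelope smoother
    out = np.zeros(len(x))
    for fc in centers:
        band_sos = butter(4, [fc / np.sqrt(2), fc * np.sqrt(2)],
                          btype="band", fs=fs, output="sos")
        envelope = sosfilt(env_sos, np.abs(sosfilt(band_sos, x)))
        out += envelope * np.sin(2.0 * np.pi * fc * t)
    return out
```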
The results shown in Fig. 7 have, however, been obtained with normal hearing listeners. This means that a sensory system with good time and frequency resolution was used. It is not likely that this good resolution can be obtained when, e.g., the tactile sense is used, or in direct stimulation of the auditory nerve.

Fig. 7. Speech perception obtained with a channel vocoder with different numbers of channels: monosyllabic words (Zollner, 1979) and consonants and vowels (Hill & al, 1968), plotted against the number of sinusoids. As synthesis signals, sinusoids with the same center frequency as the synthesis channels were used.

Fig. 8. Tactile speech coding based on a channel vocoder: a bank of bandpass filters, each followed by a 50 Hz lowpass envelope detector and a modulator that drives a bone-conduction vibrator through an amplifier.

In the third type of code, feature coding, speech perception is seen as the recognition of a number of acoustic features such as: periodicity, noise, fundamental frequency variations, the frequencies of the formants, formant transitions, time relations between different acoustic elements, etc. Some of these features are more important than others. Some of them can be identified during lipreading and some cannot. Based on an opinion of the relative importance of the features, a selection is made, and the selected features are transmitted in a suitable code. In some systems, different phonetic classes, such as voiced plosive, unvoiced fricative, nasalized, etc., are automatically recognized and coded. This automatic recognition increases the possibility to select a suitable code.

Tactile aids
Gault started his work on tactile recoding in the 1920's (1926). From the beginning he used single channel systems, but he gradually realized that the small usable frequency range, only up to 500 Hz, and the poor frequency discrimination ability of the tactile sense made it necessary to use a system where frequency was substituted by place of stimulation. Gault could show that deaf and normal hearing subjects could learn to identify some words with single channel tactile aids and that they could get help during lipreading (1926).

During the 1950's, interest in tactile recoding increased as a result of the invention of the channel vocoder. Tactile systems were built of the basic type shown in Fig. 8, with the number of channels ranging from 10 to 24. Evaluation studies showed that subjects could learn to detect syllabic structure and gross spectral properties and that the systems gave support during lipreading (Pickett, 1963). The effect was, however, small. Liberman & al (1968), in the paper quoted earlier, suggested that this was due to the existence of a special decoder in the brain of the listener and that this decoder only worked for auditory signals. Kirman (1974) made a review of published studies on tactile speech communication and came to the conclusion that a more probable explanation was that the tactile code used was not well suited to the transmission of speech signals. He suggested that a matrix of vibrators should be used instead. Based on his suggestion, several systems of this type have been tested. In some of them a frequency-intensity pattern is transmitted, as in the earlier systems (Sparks & al, 1978), and in some of them a time-frequency-amplitude pattern is transmitted (Spens, [...]).

Fig. 9. Type and placement of the vibrators in the different tactile speech aids compared by Spens (1980).
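As a rough illustration of the Fig. 8 type of tactile spectral display, the sketch below maps band envelopes onto one drive level per vibrator per frame. The channel count, band edges, frame rate, and the assumed sampling rate of about 16 kHz are illustrative guesses, not the parameters of any of the aids compared by Spens.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def tactile_vocoder_frames(x, fs, n_channels=10, f_lo=200.0, f_hi=5000.0,
                           frame_rate=100.0):
    """Map the speech spectrum onto drive levels for a row of vibrators,
    in the spirit of the channel-vocoder tactile aid of Fig. 8.

    Returns an array of shape (n_frames, n_channels): one amplitude per
    vibrator per 10 ms frame. Channel edges are spaced logarithmically."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    env_sos = butter(2, 50, btype="low", fs=fs, output="sos")  # 50 Hz envelope LP
    hop = int(fs / frame_rate)
    n_frames = len(x) // hop
    levels = np.zeros((n_frames, n_channels))
    for ch in range(n_channels):
        band_sos = butter(4, [edges[ch], edges[ch + 1]],
                          btype="band", fs=fs, output="sos")
        env = sosfilt(env_sos, np.abs(sosfilt(band_sos, x)))
        # One drive level per frame; a real aid would also compress the
        # dynamic range to match the skin.
        levels[:, ch] = env[:n_frames * hop].reshape(n_frames, hop).mean(axis=1)
    return levels
```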
Fig. 10. Results obtained in the comparison of some tactile speech aids (Spens, 1980).

[...] from the speech signal and how to transmit them in a wearable technical aid. Another approach in the present studies of tactile aids is to make the aids comfortably wearable. One objection to most of the earlier work on tactile speech transmission is that the systems are so bulky that they can only be used in the laboratory, which results in very short training periods. An alternative approach is then to start from the requirement that the aid must be easily wearable and then try to incorporate as much of the wanted recoding power as possible. This approach has in our laboratory resulted in a very simple aid where an optimum recoding is made only for the time-amplitude information. The electronics is built into the case of a body-worn hearing aid. Preliminary studies with normal hearing subjects show that the aid gives some support during lipreading. Fig. 11 shows the result of an evaluation study. Two groups of normal hearing subjects tried to lipread sentence material. Half of the group started with only lipreading and half started with lipreading plus the tactile aid. After five to six lists they changed the situation. As can be seen, both groups obtained better results when the vibrator was used. The aid has also been tried by several persons with an acquired total deafness. They find it very useful, especially for getting information about environmental sounds. The tactile signal also makes lipreading less tiring.

Cochlear implants
Work on direct stimulation of the auditory nerve, cochlear implants, has been going on at several laboratories since the end of the 1960's. Most of the work on cochlear implants has been based on the assumption that the only way to obtain some ability to perceive speech with such systems is to use a multichannel approach that copies the frequency-analyzing ability of the peripheral auditory system. Single channel systems were believed to be able to transmit only information about speech rhythm and some information about environmental sounds (Zwicker & Zollner, 1980). The results reported by House & al (1979), and also the results of the evaluation of 13 patients equipped with single channel systems in the study by Bilger (1977), supported this view.

Psychophysical studies during electric stimulation by means of an electrode inserted in the cochlea show pitch sensations that depend on the site of stimulation. This pitch sensation changes, however, with the frequency of stimulation. The usable dynamic range is very small, often only 6 to 10 dB. Based on these results it seems that a possible code could be to use a multichannel system with constant stimulation frequency and to adapt the amplitude range to the usable range during electric stimulation. This type of coding is used by Chouard (1977). Based on the same studies that were used to design tactile spectrum systems, the necessary number of channels was estimated to be ten (Hill & al, 1968; Zollner, 1979). The selected code very much resembles tactile spectrum coding. In both cases the difference between voiced and unvoiced sounds is not coded efficiently, and intonation is not coded at all.
Strong energy in one part of the frequency range also masks energy in other parts.

The cochlear implant group in Australia has more directly applied knowledge of the importance of different acoustic speech elements to the design of their implant (Tong & al, 1981). The selected code can be described as a feature code. The frequency of the mid-frequency range (mainly the frequency of the second formant) is coded as the position of stimulation on ten electrodes in the cochlea. The stimulation frequency is proportional to the fundamental frequency, and for unvoiced sounds the stimulation frequency is lowered to 70 Hz, which gives a rough percept. From a speech coding point of view this code seems rather promising, and it is also well matched to simultaneous speechreading. Evaluation studies report that one subject could correctly identify 65 % of spondee words in a known set of 16. He could also identify 16 % of unknown simple sentences (Tong & al, 1981).

The multichannel systems result in a bulky electronic unit that must be implanted and a bulky, power consuming, external stimulator. Even if electronic circuitry gradually can be miniaturized, it is likely that it will be difficult to reduce the size and weight of these systems so that they can be easily wearable. An important question is then whether these multichannel systems give much better results than single channel systems. Recently, results have been published that make it necessary to answer this question (Hochmair-Desoyer & al, 1981; Michelson & Schindler, 1981).

In a single channel approach to electric stimulation of the auditory nerve, no attempt is made to simulate the frequency analyzing ability of the peripheral auditory system. Instead, the time domain properties of the system are used (Keidel, 1980). Experiments have shown that with electric stimulation in the cochlea, subjects can scale pitch up to 400 [...]

Fig. 11. Results from the evaluation of simple single-vibrator tactile lipreading aids: per cent correct during lipreading with and without the vibrator, as a function of training session.

Fig. 12. Comparison between the frequency discrimination ability (20 Hz to 10 kHz) obtained with auditory stimulation at 40 dB SL (Wier & al, 1977), tactile stimulation (Rothenberg & al, 1977; Goff, 1967), and direct electrical stimulation of the auditory nerve (Bilger, 1977; Fischer, 1981).

Fig. 13. Vowel confusions for subjects CK and SS reported by Hochmair-Desoyer & al (1981). The vowels are arranged in groups with about the same first formant frequency.

[...] that can be obtained with all subjects, or if these four subjects represent an optimum. Similar results have been reported by Michelson & Schindler (1981). Stimulation outside the cochlea, extracochlear stimulation, is attractive as no irreversible surgical operation has to be made. This technique is studied in several laboratories. Fourcin & al (1982) in London have concentrated their work on the transmission of prosodic features, as this will result in an aid that gives good help during speechreading.
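The feature code described above for the Australian device can be summarized as a small mapping from per-frame acoustic features to stimulation parameters. The sketch below is purely illustrative and is not the Tong & al processor: the electrode count, the assumed F2 range, the fixed 70 Hz unvoiced rate, and the way the level is squeezed into a few dB of electrical dynamic range are assumptions chosen only to mirror the description in the text.

```python
import numpy as np

def feature_code_frame(f0_hz, f2_hz, level_db, voiced,
                       n_electrodes=10, f2_range=(800.0, 2500.0),
                       elec_dyn_range_db=8.0, speech_floor_db=40.0,
                       speech_range_db=30.0):
    """Map one analysis frame (F0, F2, level, voicing) onto stimulation
    parameters: which electrode to pulse, at what rate, and how far above
    its threshold current. All constants are illustrative guesses."""
    # Place coding: the second-formant region selects one of the electrodes
    # (log-spaced boundaries across the assumed F2 range).
    lo, hi = np.log(f2_range[0]), np.log(f2_range[1])
    pos = (np.log(np.clip(f2_hz, *f2_range)) - lo) / (hi - lo)
    electrode = int(round(pos * (n_electrodes - 1)))

    # Rate coding: pulse rate follows F0 when voiced, a fixed low rate when
    # unvoiced (giving a deliberately rough percept).
    pulse_rate_hz = float(f0_hz) if voiced else 70.0

    # Amplitude coding: compress an assumed 30 dB acoustic range above a
    # 40 dB floor into the few dB of usable electrical dynamic range.
    fraction = np.clip((level_db - speech_floor_db) / speech_range_db, 0.0, 1.0)
    current_db_above_threshold = fraction * elec_dyn_range_db
    return electrode, pulse_rate_hz, current_db_above_threshold
```

For example, a voiced frame with F0 = 120 Hz, F2 = 1500 Hz, and a level of 60 dB would, under these assumptions, be coded as 120 Hz pulses on a mid-array electrode at roughly two thirds of the usable current range.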
Cochlear implants and hearing aids
The very good results that have been reported from experiments with cochlear implants will with certainty result in a rapid increase of experiments in this area. It is then of interest to compare the results that have been reported from these studies with results that are obtained with hearing aids by severely hard of hearing subjects. Fig. 15 shows results from measurements of speech perception ability by means of a test that consisted of twelve known spondee words. The results are plotted against the mean hearing loss for the frequencies 500, 1000, and 2000 Hz. In the figure, results on similar tests reported from cochlear implant studies are also shown. CK, FW, LP, and SS are from the Vienna group (Hochmair-Desoyer & al, 1981) and MC1 from the Australian group (Tong & al, 1981). As can be seen, the reported results from cochlear implants are comparable to a hearing loss in the order of 90 to 95 dB. The good results from subjects with hearing aids and a mean loss of more than 100 dB are from persons with a combined conductive and sensorineural hearing loss. The results shown in the figure indicate that with a cochlear implant of the type used today, a totally deaf person might be changed into a person with a severe hearing loss.

Some conclusions
The work on speech coding in aids for the deaf has, as was pointed out in the introduction, very rarely resulted in technical aids that are used by the deaf. In this overview some reasons for this lack of success have been discussed. The rapid development of large scale integration of electronic circuitry makes it possible to realize even very complex speech processing algorithms in compact hardware. The problem is, however, to specify what the electronic circuit should do. To solve this problem, basic research is needed on speech processing and on the importance of different speech elements for the perception of speech, both with and without the support of lipreading. More knowledge is also needed about the capability of the sensory system selected for signal transmission.

The relative success recently achieved in the work on cochlear implants will result in increased activity in this area. In many cases it is difficult to explain published results from evaluation studies on the basis of our present knowledge about speech perception and the neurophysiology of the auditory system. Basic research is needed on different aspects of speech signal transmission by means of electrical stimulation of the auditory nerve. This can be achieved through closer cooperation between groups working on implants and speech research groups. Such cooperation can also result in better designed studies for the evaluation of the speech perception ability achieved with different types of implants and different speech coding systems.

References
Bilger, R.C. & Hopkinson, N.T. (1977): "Hearing performance with the auditory prosthesis", Ann. Otol. Rhinol. Laryngol. 86, Suppl. 38, pp. 76-91.
Bilger, R.C. (1977): "Psychoacoustic evaluation of present prostheses", Ann. Otol. Rhinol. Laryngol. 86, Suppl. 38, pp. 92-140.
Block, R. & Boerger, G. (1980): "Hörverbessernde Verfahren mit Bandbreitenkompression", Acustica 45, pp. 294-303.
Braida, L.D., Durlach, N.I., Lippmann, R.P., Hicks, B.L., Rabinowitz, W.M., & Reed, C.M. (1979): "Hearing aids - A review of past research on linear amplification, amplitude compression, and frequency lowering", ASHA Monographs No. 19.
Chouard, C.H. & MacLeod, P. (1977): "Auditory prosthesis equipment", US patent 4,207,441 (1980); French patent 1977.
Cole, R.A., Rudnicky, A.I., Zue, V.W., & Reddy, D.R. (1980): "Speech as patterns on paper", pp. 3-50 in Cole, R.A. (Ed.): Perception and Production of Fluent Speech, Lawrence Erlbaum Assoc. Publ., Hillsdale, NJ.
Cornett, R.O. (1967): "Cued speech", Am. Ann. Deaf 112, pp. 3-13.
De Filippo, C.L. & Scott, B.L. (1978): "A method for training and evaluating the reception of ongoing speech", J. Acoust. Soc. Am. 63, pp. 1186-1192.
Denes, P.B. (1963): "On the statistics of spoken English", J. Acoust. Soc. Am. 35, pp. 892-904.
Denes, P.B. (1967): "On the motor theory of speech perception", pp. 309-314 in Wathen-Dunn, W. (Ed.): Models for the Perception of Speech and Visual Form, MIT Press, Cambridge, MA.
Farwell, R.M. (1976): "Speech reading: a research review", Am. Ann. Deaf 122, pp. 19-30.
Fischer, R. (1981): "Methodik und Experimente zur psychoakustischen Evaluation des durch Cochleaprothesen wiedererlangbaren Hörvermögens bei sensorineural Ertaubten", Diss. Techn. Universität Wien.
Fletcher, H. (1953): Speech and Hearing in Communication, Van Nostrand, Princeton, NJ.
Fourcin, A.J., Douek, E.E., Moore, B.C.J., Rosen, S.M., Walliker, J.R., Howard, D.M., Abberton, E., & Frampton (1982): "Speech perception by promontory stimulation", paper presented at the Int. Cochlear Prosthesis Conf., New York, NY.
Foust, K.O. & Gengel, R.W. (1973): "Speech discrimination by sensorineural hearing-impaired persons using a transposing hearing aid", Scand. Audiol. 2, pp. 161-170; reprinted in Levitt, Pickett, & Houde (1980), pp. 227-236.
Gault, R.H. (1926): "Touch as a substitute for hearing in the interpretation and control of speech", Arch. Otolaryngol. 3, pp. 121-135; reprinted in Levitt, Pickett, & Houde (1980), pp. 243-246.
Gengel, R.W. (1973): "Temporal effects in frequency discrimination by hearing-impaired listeners", J. Acoust. Soc. Am. 54, pp. 11-15.
Gengel, R.W. (1976): "Upton's wearable eyeglass speechreading aid: history and current status", pp. 291-299 in Hirsh, S.K., Eldredge, D.H., & Silverman, S.R. (Eds.): Hearing and Davis: Essays Honoring Hallowell Davis, Washington University Press, St. Louis, MO, 1976.
Gescheider, G.A. (1970): "Some comparisons between touch and hearing", IEEE Trans. Man-Machine Systems, pp. 28-35.
Goff, G.D. (1967): "Differential discrimination of frequency of cutaneous mechanical vibration", J. Exp. Psychology 74, pp. 294-299.
Hicks, B.L., Braida, L.D., & Durlach, N.I. (1981): "Pitch invariant frequency lowering with nonuniform spectral compression", pp. 121-124 in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing, 1981.
Hill, F.J., McRae, L.P., & McClellan, R.P. (1968): "Speech recognition as a function of channel capacity in a discrete set of channels", J. Acoust. Soc. Am. 44, pp. 13-18.
Hochmair-Desoyer, I.J., Hochmair, E.S., Burian, K., & Fischer, R.E. (1981): "Four years of experience with cochlear prostheses", Medical Progress through Technology 8, pp. 107-119.
House, W.F., Berliner, K.I., & Eisenberg, L.S. (1979): "Present status and future directions of the Ear Research Institute cochlear implant program", Acta Otolaryngol. 87, pp. 176-184.
Jackson, P.L., Montgomery, A.A., & Binnie, C.A. (1976): "Perceptual dimensions underlying vowel lipreading performance", J. Speech Hear. Res. 19, pp. 796-812.
Johansson, B. (1959): "A new coding amplifier system for the severely hard of hearing", pp. 665-667 in Proc. III Int. Conf. Acoustics, Vol. II.
Keidel, W.D. (1973): "The cochlear model in skin stimulation", pp. 27-32 in Geldard, F.A. (Ed.): Cutaneous Communication Systems and Devices, The Psychonomic Society.
Keidel, W.D. (1980): "Neurophysiological requirement for implanted cochlear prostheses", Audiology 19, pp. 105-127.
King, A., Parker, A., Spanner, M., & Wright, R. (1982): "A speech display computer for use in schools for the deaf", pp. 755-758 in Proc. IEEE ICASSP 82.
Kirman, J.H. (1974): "Tactile communication of speech: a review and an analysis", Psychol. Bull. 80, pp. 54-74; reprinted in Levitt, Pickett, & Houde (1980), pp. 297-318.
Knorr, S.G. (1976): "A hearing aid for subjects with extreme high-frequency losses", IEEE Trans. ASSP 24, pp. 473-480.
Levitt, H., Pickett, J.M., & Houde, R.A. (Eds.) (1980): Sensory Aids for the Hearing Impaired, IEEE Press, John Wiley & Sons, Inc., New York.
Liberman, A.M., Cooper, F.S., Shankweiler, D.P., & Studdert-Kennedy, M. (1968): "Why are speech spectrograms hard to read?", Am. Ann. Deaf 113, pp. 127-133; reprinted in Levitt, Pickett, & Houde (1980), pp. 287-288.
Ling, D. & Druz, W.S. (1967): "Transposition of high frequency sounds by partial vocoding of the speech spectrum: its use by deaf children", J. Aud. Res. 7, pp. 133-144.
Maki, J.E., Conklin, J.M., Gustafson, M.S., & Humphrey-Whitehead, B.K. (1981): "The speech spectrographic display: Interpretation of visual patterns by hearing impaired adults", J. Speech Hear. Dis. 46, pp. 379-387.
Michelson, R.P. & Schindler, R.A. (1981): "Multichannel cochlear implant. Preliminary results in man", The Laryngoscope 91, pp. 38-42.
Miller, G.A. & Nicely, P.E. (1955): "An analysis of perceptual confusions among some English consonants", J. Acoust. Soc. Am. 27, pp. 338-352.
Nicholls, G.H. (1979): "Cued speech and the reception of spoken language", Report from School of Human Communication Disorders, McGill University, Montreal.
Norber, E.H. (1967): "Vibrotactile sensitivity of deaf children to high intensity sound", The Laryngoscope 78, pp. 2128-2146.
Oeken, F.W. (1963): "Frequenztransposition zur Hörverbesserung bei Innenohrschwerhörigkeit", Arch. Ohren- usw. Heilk. u. Z. Hals- usw. Heilk. 181, pp. 418-425.
Spens, K-E. & Martony, J. (1972): "The transposer as a lipreading aid", pp. 246-248 in Proc. 1972 IEEE Conf. Speech Communication and Processing; reprinted in Levitt, Pickett, & Houde (1980), pp. 224-226.
Traunmüller, H. (1980): "The Sentiphone: A tactual speech communication aid", J. Comm. Dis. 13, pp. 183-193.
Tong, Y.C., Black, R.C., Clark, G.M., Forster, I.C., Millar, J.B., O'Loughlin, B.J., & Patrick, J.F. (1979): "A preliminary report on a multiple-channel cochlear implant operation", J. Laryngol. and Otology 93, pp. 679-695.
Tong, Y.C., Clark, G.M., Dowell, R.C., Martin, L.F.A., Seligman, P.M., & Patrick, J.F. (1981): "A multiple-channel cochlear implant and wearable speech processor", Acta Otolaryngol. 92, pp. 193-198.
Upton, H.W. (1968): "Wearable eyeglass speechreading aid", Am. Ann. Deaf 113, pp. 222-229; reprinted in Levitt, Pickett, & Houde (1980), pp. 322-333.
Velmans, M.L. (1972): Aids for deaf persons, British patent No. 1340105.
Wier, C.C., Jesteadt, W., & Green, D.M. (1977): "Frequency discrimination as a function of frequency and sensation level", J. Acoust. Soc. Am. 61, pp. 178-184.
Woodward, M.F. & Barber, C.G. (1960): "Phoneme perception in lipreading", J. Speech Hear. Res. 3, pp. 212-222.
Zollner, M. (1979): "Verständlichkeit der Sprache eines einfachen Vocoders", Acustica 43, pp. 271-272.
Zwicker, E. & Schorn, K. (1978): "Psychoacoustic tuning curves in audiology", Audiology 17, pp. 120-140.
Zwicker, E. & Zollner, M. (1980): "Criteria for VIIIth nerve implants and for feasible coding", Scand. Audiol. Suppl. 11, pp. 179-181.