
Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report

The identification of the mood of a speaker by hearing impaired listeners
Öster, A-M. and Risberg, A.

journal: STL-QPSR
volume: 27
number: 4
year: 1986
pages: 079-090

http://www.speech.kth.se/qpsr

A. THE IDENTIFICATION OF THE MOOD OF A SPEAKER BY HEARING IMPAIRED LISTENERS
Anne-Marie Öster and Arne Risberg

Abstract
Recordings were made when two professional actors, one male and one female, read a number of sentences in the moods angry, astonished, sad, afraid, happy and positive. Based on listening tests with normal-hearing adults, a set of sentences was selected on which the listeners agreed on the mood of the speaker. From these sentences, a test list was compiled. In the list, the number of different moods was reduced to four: angry, astonished, sad and happy. An analysis was made of the median fundamental frequency and the total range of fundamental frequency variation in the test sentences. Normal-hearing children, age ten, and hearing impaired children and adults were tested with this list. For the normal-hearing children, the confusions were few, but many of the hearing impaired subjects had great difficulties in identifying the speakers' moods. A test was also made in which normal-hearing persons listened to the test sentences low-pass filtered with a cutoff frequency of 500 Hz. This considerably reduced the subjects' abilities to identify the moods, and about the same confusions were made as by the hearing impaired listeners. A plausible explanation of the results, both for the normal-hearing listeners in the filtering situation and for the hearing impaired subjects, seems to be reduced frequency discrimination ability.
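The two measures named in the abstract, the median fundamental frequency and the total range of its variation, can be illustrated with a small sketch. It is not the authors' procedure: the source does not say how the range was expressed, so reporting it both in Hz and in semitones is an assumption, as are the function name `f0_summary`, the convention that unvoiced frames are coded as 0 or None, and the invented contour.

```python
import math
from statistics import median


def f0_summary(f0_hz):
    """Summarise an F0 contour given as a list of frame values in Hz.

    Frames coded as 0 or None are treated as unvoiced and skipped
    (an assumed convention, not taken from the paper).
    Returns (median in Hz, total range in Hz, total range in semitones).
    """
    voiced = [f for f in f0_hz if f is not None and f > 0]
    if not voiced:
        raise ValueError("no voiced frames in contour")
    lowest, highest = min(voiced), max(voiced)
    range_semitones = 12 * math.log2(highest / lowest)
    return median(voiced), highest - lowest, range_semitones


# Invented contour for illustration only; not data from the study.
contour = [0, 110, 120, 140, 180, 160, 0, 130, 100, 0]
med, rng_hz, rng_st = f0_summary(contour)
print(f"median {med} Hz, range {rng_hz} Hz ({rng_st:.1f} semitones)")
```

Plotting the range against the median for each sentence would give a scatter of the kind the paper uses to compare moods.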
INTRODUCTION
A hearing impairment results in difficulties in detecting and identifying the acoustic elements of the different speech sounds. This difficulty can be explained by a reduced useful dynamic and frequency range, degraded frequency selectivity, a reduced ability to detect frequency and amplitude changes, etc. Hearing impaired persons' difficulties in understanding speech are often measured by means of lists of monosyllabic words, but sometimes sentences are used, which ought to give a more valid measure of a person's difficulties in communicating with others. In a communication situation, however, the true meaning of the communication is also transmitted by how something is said, how words are emphasized, the speaker's mood and attitude toward what is said, etc. This type of information is transmitted by tempo, rhythm and intonation, changes in voice quality, etc.

Hearing impaired persons' abilities to identify this type of information have been the topic of only a few studies. Fourcin (1980) used synthetic speech stimuli to study hearing impaired children's abilities to identify intonation contours in statements and questions. Risberg & Agelfors (1978) studied hearing impaired persons' abilities to identify which word was emphasized in a sentence. In both cases, the information is mainly transmitted by means of changes in the fundamental frequency. The results of the studies showed that many of the subjects had difficulties in using this information.

The acoustic correlates of a speaker's different moods have been studied by, among others, Cowan (1936); Fairbanks & Hoaglin (1941); Fairbanks & Pronovost (1939); Lieberman & Michaels (1962); Williams & Stevens (1972).
They all found that the most important factors in signaling the speaker's mood are the mean fundamental frequency and its range, but that other factors also contribute, e.g., intensity, voice quality, formant frequency changes, etc. As the above-mentioned experiments of Fourcin (1980) and the experiments of Risberg & Agelfors (1984) have shown that hearing impaired persons' abilities to use information in fundamental frequency changes are reduced, it is also possible that they have difficulties in identifying the speaker's mood. The aim of this study is to shed some light on this problem.

METHOD
Recording of speech material
In studies of the acoustic correlates of a speaker's mood and the listeners' abilities to identify these, two different types of material can be used. The first is "field" recordings from actual situations where the speaker's mood is evident from the situation. The second type of material is recordings of professional actors simulating specific moods. The first type of material might be more realistic than the second but has several drawbacks, e.g., limited possibilities to select the speech material, poor control of the acoustic situation, etc. Williams & Stevens (1972) compared the recording of a speaker reporting from a dramatic event (the Hindenburg disaster) with the recording of an actor simulating the reporter's emotional state during the event. They found differences in details but general agreement in the mode of speaking and in the fundamental frequency range and variation.

In the study presented here, it was decided to use recordings of professional actors. Two speakers were used: one male and one female professional actor. They were asked to read the sentences in Table I in the moods "angry", "astonished", "sad", "afraid", "happy" and "positive". In studies of this type, it is necessary to select semantically neutral sentences.
As it was planned to use the material with children, it was also necessary to use simple sentences which referred to the children's interests. Some of the sentences in Table I might not be ideal in tests with naive listeners, as they might make it difficult for the actors to express the intended mood. In the listening tests, these sentences might also have caused some interaction between the most likely moods, based on the meaning of the sentences, and the actor's intended mood.

Table I. Sentences used in the experiment.
I. Fröken kom för sent till skolan. (The teacher was late to school)
II. Dom kommer på torsdag. (They are coming on Thursday)
III. Det var Olle som vann tävlingen. (It was Olle who won the competition)
IV. Sommarlovet börjar sent i år. (Summer vacation starts late this year)
V. Bollen studsade in genom fönstret. (The ball bounced in through the window)
VI. Det finns en råtta i skafferiet. (There is a mouse in the pantry)

Both actors read the sentences in the six different moods. The recordings were made with a high-quality microphone and tape recorder in an anechoic room.

Selecting stimuli for the test list
It was apparent that the actors, more or less successfully, had been able to achieve the intended mood in the different sentences. For some sentences, it was apparent that they had been unsuccessful, and in some, the acoustic quality was unsatisfactory. The first author selected 72 sentences from the recordings. Each intended mood was presented 12 times. To select the stimuli for the final test list, 23 members of the Dept. of Speech Communication & Music Acoustics listened to the tape over headphones. On the answering sheet with all the sentences in the test, they marked which of the six moods they thought was intended by the speaker. The results of the listening test are shown in Table II. The table shows the disagreement between the listeners for the 72 different sentences.
For the sentences with the intended mood "sad" (mood no 3), for example, all listeners agreed on sentences 34 and 42. On sentence 6, one listener identified the mood as no 2, "astonished"; on sentence 11, one listener also disagreed and identified the intended mood as no 1, "angry"; and so on. The total per cent confusions made in the test on the sentences are shown in Fig. 1.

[Table II: entries grouped by intended mood (no 1 "Angry", no 2 "Astonished", no 3 "Sad", no 4 "Afraid", ...), with the columns "Sentence no", "Sentence type" and "Confusions".]
Table II. Number of confusions and type of confusions made by a group of 23 normal-hearing subjects on the test tape. "Sentence no" is the number of the sentence on the tape. "Sentence type": M = male, F = female speaker, I-VI from Table I. In the column "Confusions", the number of confusions and which confusions were made are shown. The different moods are numbered 1-6. Sentences marked * were used in the final test tape.

[Figure: panels ANGRY, AFRAID, HAPPY and POSITIVE; fundamental frequency (Hz) versus time (sec).]
Fig. 2. Fundamental frequency variations in the sentence "Dom kommer på torsdag" (They are coming on Thursday) in the four different moods for the male speaker.

[Figure: scatter plot with separate symbols for ANGRY, ASTONISHED, SAD, AFRAID and HAPPY; horizontal axis FUNDAMENTAL FREQUENCY, MEDIAN VALUE, in Hz.]
Fig. 3. Relation between median value and total range in the different moods for the male speaker. The figure shows results for sentences where more than 75% of 23 normal-hearing listeners agreed on the mood.

[...] test, 22 normal-hearing adult visitors at the Department listened to the tape over headphones and selected one of the four moods marked on the answering sheets for each test sentence. The results are shown as per cent confusions in the matrices in Fig. 5.

In the next experiment, the test tape was presented over a loudspeaker in a normal classroom to 20 normal-hearing children, ten years of age. The results are shown in Fig. 6.

In the last experiment, ten normal-hearing members of the Department listened to the test tape when the signal was low-pass filtered with a cutoff frequency of 500 Hz, damping 70 dB/oct. The results are shown in Fig. 7.

[Figure: matrices TOTAL, FEMALE VOICE and MALE VOICE; moods ANGRY, ASTONISHED, SAD, HAPPY.]
Fig. 5. Confusions in per cent between different moods of the speaker for 22 normal-hearing adults on the test list with four moods.

[Figure: matrices TOTAL, MALE VOICE and FEMALE VOICE; moods ANGRY, ASTONISHED, SAD, HAPPY.]
Fig. 6. Confusions in per cent between different moods of the speaker for 20 normal-hearing children.

[Figure: matrices MALE VOICE and FEMALE VOICE; moods ANGRY, ASTONISHED, SAD, HAPPY.]
Fig. 7. Confusions in per cent between different moods of the speaker for 10 normal-hearing adults. The test tape was low-pass filtered with a cutoff frequency of 500 Hz.

Tests with hearing impaired subjects
Two groups of hearing impaired subjects were tested.
The first was a group of 18 children from the School for the Partially Hearing in Stockholm. They were between 11 and 14 years old, with a mean of 13 years. Their hearing losses were between 40 and 97 dB for the frequencies 500, 1000 and 2000 Hz in the best ear, with a mean of 76 dB. In all cases, the hearing impairment was congenital or early acquired. The method of communication used in the school is oral, and the children always used hearing aids. The children listened to the test tape over headphones (TDH39). Before the actual test, they were carefully trained with the four training sentences until they understood the task. The results are shown in Fig. 8. The children were also tested with a list of three-word sentences where the emphasis was placed on the first, second or third word. The main acoustic difference between these test sentences is changes in the fundamental frequency (Risberg & Agelfors, 1978). The children's ability to detect small changes in a sinusoidal signal was also measured (Risberg & Agelfors, 1984).

[Figure: matrices TOTAL, MALE VOICE and FEMALE VOICE; moods ANGRY, ASTONISHED, SAD, HAPPY.]
Fig. 8. Confusions in per cent between different moods of the speaker for 18 hearing impaired children.

The other group of hearing impaired subjects consisted of 45 patients at the Rehabilitation Clinic of the South Hospital in Stockholm. The patients' ages varied from 26 to 74, with a mean of 55 years. Their hearing losses were between 10 and 88 dB for the frequencies 500, 1000 and 2000 Hz in the best ear, with a mean of 38 dB. The cause of hearing impairment was in most cases presbyacusis or noise-induced hearing loss. This group listened to the test with their personal hearing aids when the sentences were presented from a loudspeaker in an ordinary room. Before testing, they were trained with the four training sentences.
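The confusion matrices in the figures give, for each intended mood, the share of responses falling on each of the four moods. A minimal sketch of how such a matrix could be tabulated from (intended, response) pairs follows. Normalising each row to per cent is assumed from the figure captions, and the function name `confusion_percent` and the trial data are invented for illustration, not taken from the study.

```python
from collections import Counter

MOODS = ["angry", "astonished", "sad", "happy"]


def confusion_percent(trials):
    """Tabulate (intended, response) pairs as a row-normalised matrix.

    Rows are the intended moods, columns the listeners' responses;
    each row sums to 100 per cent (or is all zeros if the mood was
    never presented).
    """
    counts = Counter(trials)
    matrix = {}
    for intended in MOODS:
        row_total = sum(counts[(intended, resp)] for resp in MOODS)
        matrix[intended] = {
            resp: (100.0 * counts[(intended, resp)] / row_total
                   if row_total else 0.0)
            for resp in MOODS
        }
    return matrix


# Invented responses for illustration only; not data from the study.
trials = (
    [("angry", "angry")] * 3 + [("angry", "happy")] * 1
    + [("astonished", "astonished")] * 3 + [("astonished", "happy")] * 1
    + [("sad", "sad")] * 4
    + [("happy", "happy")] * 2 + [("happy", "angry")] * 2
)
matrix = confusion_percent(trials)
for intended in MOODS:
    print(intended, [round(matrix[intended][resp]) for resp in MOODS])
```

Keeping separate trial lists for the male and the female voice and calling the function on each would give the per-voice panels alongside the total.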
For 24 of the subjects, the same test tape was presented twice, with an interval of three weeks between the two test sessions. The confusions made in the first test session with the total group of 45 patients are shown in Fig. 9.

[Figure: matrices TOTAL, MALE VOICE and FEMALE VOICE; moods ANGRY, ASTONISHED, SAD, HAPPY.]
Fig. 9. Confusions in per cent between different moods of the speaker for 45 hearing impaired adults.

DISCUSSION
The final test list with 16 sentences and with the four moods "angry", "astonished", "sad" and "happy" seemed to be satisfactory. In the test with normal-hearing listeners, the number of disagreements was low. Eighteen of the 22 adult listeners agreed with the intended mood on all 16 sentences, two disagreed on one sentence, one on two sentences and one on three sentences. In the test with the normal-hearing ten-year-old children, the number of disagreements was higher. Six of them agreed with the intended mood on all 16 sentences, seven agreed on 15, four on 14 and three on 13 of the sentences. The main disagreement was on sentence V, "The ball bounced in through the window", pronounced by the female voice in the mood "angry" and identified as "sad", and for the same sentence the stimulus in the mood "happy" for the male voice was identified as "angry". Sentence III, "It was Olle who won the competition", pronounced in the mood "happy", was for the male voice often identified as "angry". The speaker's mood was in this stimulus expressed in a boisterous way that in many respects resembled the way he expressed "angry". It is possible that especially the first sentence was, for the children, too loaded with associations that influenced them. In continued work in this area, it is necessary to put more effort into selecting semantically neutral sentences, especially if the test is to be used with children.

Many of the hearing impaired subjects, both children and adults, had difficulties in identifying the speaker's mood, see Figs.
8 and 9. For the children, the per cent correct identification was 63% and for [...]

References
Cowan, M. (1936): "Pitch and intensity characteristics of stage speech", Arch. of Speech, Suppl., Dec., pp. 3-92.
Fairbanks, G. & Hoaglin, L.W. (1941): "An experimental study of the durational characteristics of the voice during the expression of emotion", Speech Monograph, 8, pp. 85-91.
Fairbanks, G. & Pronovost, W. (1939): "An experimental study of the pitch characteristics of the voice during the expression of emotion", Speech Monograph, 6, pp. 87-104.
Fastl, H. & Weinberger, M. (1981): "Frequency discrimination of pure tones and complex tones", Acustica, 49, pp. 77-78.
Fonagy, I. (1981): "Emotion, voice and music", pp. 51-79 in (J. Sundberg, ed.): Research aspects on singing, Proc. from a seminar organized by the committee for the acoustics of music, Publ. issued by the Royal Swedish Academy of Music, no 33, Stockholm.
Fourcin, A.J. (1980): "Speech pattern audiometry", pp. 170-208 in (H.A. Beagley, ed.): Auditory investigation; the Scientific and Technological Basis, Clarendon Press, Oxford.
Huttar, G.L. (1967): "Some relations between emotions and the prosodic parameters of speech", Speech Comm. Lab., Inc., St. Barbara/CA, Monograph no 1, July 1967.
Huttar, G.L. (1968): "Relations between prosodic variables and emotions in normal American English utterances", J. Speech Hearing Res. 11, pp. 481-487.
Lieberman, P. & Michaels, S.B. (1962): "Some aspects of fundamental frequency and envelope amplitude as related to the emotional content of speech", J. Acoust. Soc. Am. 34:7, pp. 922-927.
Risberg, A. & Agelfors, E. (1978): "On the identification of intonation contours by hearing impaired listeners", STL-QPSR 2-3/1978, pp. 51-61.
Risberg, A. & Agelfors, E. (1984): "On the relation between frequency discrimination ability and the degree of hearing loss", STL-QPSR 4/1984, pp. 59-70.
Williams, C.E. & Stevens, K.N. (1972): "Emotions and speech: Some acoustical correlates", J. Acoust. Soc. Am. 52:4, part 2, pp. 1238-1250.
