Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report

The identification of the mood of a speaker by hearing impaired listeners
Öster, A-M. and Risberg, A.

journal: STL-QPSR
volume: 27
number: 4
year: 1986
pages: 079-090
http://www.speech.kth.se/qpsr

A. THE IDENTIFICATION OF THE MOOD OF A SPEAKER BY HEARING IMPAIRED LISTENERS
Anne-Marie Öster and Arne Risberg
Abstract
Recordings were made of two professional actors, one male and one female, reading a number of sentences in the moods angry, astonished, sad, afraid, happy and positive. Based on listening tests with normal-hearing adults, a set of sentences was selected on which the listeners agreed about the mood of the speaker. From these sentences, a test list was compiled. In the list, the number of different moods was reduced to four: angry, astonished, sad and happy. An analysis was made of the median fundamental frequency and the total range of fundamental frequency variation in the test sentences.
Normal-hearing children, age ten, and hearing impaired children and adults were tested with this list. For the normal-hearing children, the confusions were few, but many of the hearing impaired subjects had great difficulty in identifying the speakers' moods. A test was also made in which normal-hearing persons listened to the test sentences low-pass filtered with a cutoff frequency of 500 Hz. This considerably reduced the subjects' abilities to identify the moods, and about the same confusions were made as by the hearing impaired listeners. A plausible explanation of both the results of the normal-hearing listeners in the filtering situation and the results of the hearing impaired subjects seems to be reduced frequency discrimination ability.
INTRODUCTION
A hearing impairment results in difficulties in detecting and identifying the acoustic elements of the different speech sounds. This difficulty can be explained by a reduced useful dynamic and frequency range, degraded frequency selectivity, reduced ability to detect frequency and amplitude changes, etc. Hearing impaired persons' difficulties in understanding speech are often measured by means of lists of monosyllabic words, but sometimes sentences are used, which ought to give a more valid measure of a person's difficulties in communicating with others. In a communication situation, however, the true meaning of the communication is also transmitted by how something is said, how words are emphasized, the speaker's mood and attitude toward what is said, etc. This type of information is transmitted by tempo, rhythm and intonation, changes in voice quality, etc.
Hearing impaired persons' abilities to identify this type of information have been the topic of only a few studies. Fourcin (1980) used synthetic speech stimuli to study hearing impaired children's abilities to identify intonation contours in statements and questions. Risberg & Agelfors (1978) studied hearing impaired persons' abilities to identify which word was emphasized in a sentence. In both cases, the information is mainly transmitted by means of changes in the fundamental frequency. The results of the studies showed that many of the subjects had difficulties in using this information.
The acoustic correlates of a speaker's different moods have been studied by, among others, Cowan (1936); Fairbanks & Hoaglin (1941); Fairbanks & Pronovost (1939); Lieberman & Michaels (1962); Williams & Stevens (1972). They all found that the most important factors in signaling the speaker's mood are the mean fundamental frequency and its range, but that other factors also contribute, e.g., intensity, voice quality, formant frequency changes, etc. As the above-mentioned experiments of Fourcin (1980) and the experiments of Risberg & Agelfors (1984) have shown that hearing impaired persons' abilities to use information in fundamental frequency changes are reduced, it is also possible that they have difficulties in identifying the speaker's mood. The aim of this study is to shed some light on this problem.
METHOD
Recording of speech material
In studies of the acoustic correlates of a speaker's mood and the listeners' abilities to identify these, two different types of material can be used. The first is "field" recordings from actual situations where the speaker's mood is evident from the situation. The second type of material is recordings of professional actors simulating specific moods. The first type of material might be more realistic than the second but has several drawbacks, e.g., limited possibilities to select the speech material, poor control of the acoustic situation, etc. Williams & Stevens (1972) compared the recording of a speaker reporting from a dramatic event (the Hindenburg disaster) with the recording of an actor simulating the reporter's emotional state during the event. They found differences in details but general agreement in the mode of speaking and in the fundamental frequency range and variation. In the study presented here, it was decided to use recordings of professional actors.
Two speakers were used: one male and one female professional actor. They were asked to read the sentences in Table I in the moods "angry", "astonished", "sad", "afraid", "happy" and "positive".
In studies of this type, it is necessary to select semantically neutral sentences. As it was planned to use the material with children, it was also necessary to use simple sentences which referred to the children's interests. Some of the sentences in Table I might not be ideal in tests with naive listeners, as they might cause difficulties for the actors to express the intended mood. In the listening tests, these sentences might also have caused some interaction between the most likely moods, based on the meaning of the sentences, and the actor's intended mood. Both actors read the sentences in the six different moods. The recordings were made with a high-quality microphone and tape recorder in an anechoic room.

Table I: Sentences used in the experiment.
I. Fröken kom för sent till skolan
(The teacher was late to school)
II. Dom kommer på torsdag
(They are coming on Thursday)
III. Det var Olle som vann tävlingen
(It was Olle who won the competition)
IV. Sommarlovet börjar sent i år
(Summer vacation starts late this year)
V. Bollen studsade in genom fönstret
(The ball bounced in through the window)
VI. Det finns en råtta i skafferiet
(There is a mouse in the pantry)

Selecting stimuli for the test list
It was apparent that the actors had been more or less successful in achieving the intended mood in the different sentences. For some sentences, it was apparent that they had been unsuccessful, and in some, the acoustic quality was unsatisfactory.
The first author selected 72 sentences from the recordings. Each intended mood was presented 12 times. To select the stimuli for the final test list, 23 members of the Dept. of Speech Communication & Music Acoustics listened to the tape over headphones. On the answering sheet with all the sentences in the test, they marked which of the six moods they thought was intended by the speaker. The results of the listening test are shown in Table II.
The table shows the disagreement between the listeners for the 72 different sentences. For the sentences with the intended mood "sad" (mood no 3), for example, all listeners agreed on sentences 34 and 42. On sentence 6, one listener identified the mood as no 2, "astonished"; on sentence 11, one listener also disagreed and identified the intended mood as no 1, "angry"; and so on. The total per cent confusions made in the test on the sentences are shown in Fig. 1.
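
The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report does not give the exact selection rule for the final list, so the 75% cut-off is borrowed from the caption of Fig. 3, and the names `agreement`, `confusions`, `tape` and `THRESHOLD` are hypothetical. The responses for sentences 34 and 6 mirror the examples in the text; "sentence 99" is invented to show a rejected item.

```python
from collections import Counter

def agreement(responses, intended):
    """Fraction of listeners whose response equals the intended mood."""
    return sum(1 for r in responses if r == intended) / len(responses)

def confusions(responses, intended):
    """Count the non-intended responses, keyed by the mood that was chosen."""
    return Counter(r for r in responses if r != intended)

# Responses of 23 listeners, moods numbered 1-6.
# Sentence 34: intended mood 3 ("sad"), all listeners agree.
# Sentence 6: intended mood 3, one listener answers 2 ("astonished").
# Sentence 99: invented item with poor agreement (not from the report).
tape = {
    34: (3, [3] * 23),
    6: (3, [3] * 22 + [2]),
    99: (5, [5] * 15 + [1] * 8),
}

THRESHOLD = 0.75  # assumed cut-off, taken from the Fig. 3 caption

selected = [no for no, (mood, resp) in tape.items()
            if agreement(resp, mood) > THRESHOLD]
print(selected)
print(confusions(tape[6][1], 3))
```

With real answer sheets, `tape` would hold one entry per sentence on the 72-item tape.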
[Table II: one panel per mood (Mood no 1 "Angry", Mood no 2 "Astonished", Mood no 3 "Sad", Mood no 4 "Afraid", ...), each with the columns "Sentence no", "Sentence type" and "Confusions". The individual entries are not recoverable here.]
Table II. Number of confusions and type of confusions made by a group of 23 normal-hearing subjects on the test tape. "Sentence no" is the number of the sentence on the tape. "Sentence type": M = male, F = female speaker, I-VI from Table I. In the column "Confusions", the number of confusions and which confusions were made are shown. The different moods are numbered 1-6. Sentences marked * were used in the final test tape.
Fig. 2. Fundamental frequency variations in the sentence "Dom kommer på torsdag" (They are coming on Thursday) in the four different moods for the male speaker. [Axes: fundamental frequency in Hz, 60-300, against time in sec.]
Fig. 3. Relation between median value and total range in the different moods for the male speaker. The figure shows results for sentences where more than 75% of 23 normal-hearing listeners agreed on the mood. [Horizontal axis: fundamental frequency, median value, in Hz. Separate symbols mark the moods angry, astonished, sad, afraid and happy.]
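
The two measures plotted in Fig. 3, the median fundamental frequency and the total range, can be illustrated with a minimal Python sketch. The report does not describe how the contours were extracted or in which unit the range was expressed, so the sketch assumes a list of frame-wise F0 values in Hz with unvoiced frames coded as 0, and a range taken as maximum minus minimum in Hz. The contours below are invented and the function name `f0_summary` is hypothetical.

```python
from statistics import median

def f0_summary(f0_hz):
    """Return (median, total range) of the voiced F0 samples, in Hz.

    Unvoiced frames are assumed to be coded as 0 or None and are skipped.
    """
    voiced = [f for f in f0_hz if f]
    if not voiced:
        raise ValueError("no voiced frames in contour")
    return median(voiced), max(voiced) - min(voiced)

# Invented contours, for illustration only.
contours = {
    "angry": [0, 150, 180, 220, 200, 160, 0, 140],
    "sad": [0, 100, 105, 110, 104, 98, 0, 95],
}

for mood, f0 in contours.items():
    med, rng = f0_summary(f0)
    print(f"{mood}: median {med} Hz, range {rng} Hz")
```

Plotting the range against the median for each sentence would give a scatter of the kind shown in Fig. 3.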
test, 22 normal-hearing adult visitors at the Department listened to the tape over headphones and selected one of the four moods marked on the answering sheets for each test sentence. The result is shown as per cent confusions in the matrices in Fig. 5. In the next experiment, the test tape was presented over a loudspeaker in a normal classroom to 20 normal-hearing children, ten years of age. The results are shown in Fig. 6. In the last experiment, ten normal-hearing members of the Department listened to the test tape when the signal was low-pass filtered with a cutoff frequency of 500 Hz and a damping of 70 dB/oct. The results are shown in Fig. 7.
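
As an illustration of how confusion matrices in per cent, such as those in Figs. 5-7, can be derived from answer sheets, the following Python sketch tabulates (intended mood, response) pairs and normalizes each row by the number of presentations of that mood. The trial data are invented and the names `MOODS` and `confusion_percent` are hypothetical; the report does not say how the matrices were actually computed.

```python
MOODS = ["angry", "astonished", "sad", "happy"]

def confusion_percent(trials):
    """Build a row-normalized confusion matrix in per cent.

    trials: iterable of (intended, response) pairs.
    Returns {intended: {response: per cent of that mood's trials}}.
    """
    counts = {m: {r: 0 for r in MOODS} for m in MOODS}
    for intended, response in trials:
        counts[intended][response] += 1
    matrix = {}
    for m in MOODS:
        total = sum(counts[m].values())
        matrix[m] = {r: (100.0 * c / total if total else 0.0)
                     for r, c in counts[m].items()}
    return matrix

# Invented trials, for illustration only.
trials = ([("angry", "angry")] * 9 + [("angry", "happy")] * 1
          + [("sad", "sad")] * 8 + [("sad", "astonished")] * 2
          + [("happy", "happy")] * 10
          + [("astonished", "astonished")] * 10)

matrix = confusion_percent(trials)
for intended in MOODS:
    print(intended, [round(matrix[intended][r]) for r in MOODS])
```

Each row sums to 100, so the diagonal entry is the per cent correct for that mood.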
Fig. 5. Confusions in per cent between different moods of the speaker for 22 normal-hearing adults on the test list with four moods. [Panels: total, female voice, male voice. Moods: angry, astonished, sad, happy.]
Fig. 6. Confusions in per cent between different moods of the speaker for 20 normal-hearing children. [Panels: total, male voice, female voice. Moods: angry, astonished, sad, happy.]
Fig. 7. Confusions in per cent between different moods of the speaker for 10 normal-hearing adults. The test tape was low-pass filtered with a cutoff frequency of 500 Hz. [Panels: male voice, female voice. Moods: angry, astonished, sad, happy.]
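
To give a feeling for what a 500 Hz low-pass filter with a damping of 70 dB/oct removes, the sketch below computes the attenuation of an idealized Butterworth low-pass filter. The filter type is an assumption made here for illustration; the report only states the cutoff frequency and the slope. A Butterworth filter of order n falls off by about 6 dB/oct per order, so a slope of roughly 70 dB/oct corresponds to about order 12. The function names are hypothetical.

```python
import math

def butterworth_attenuation_db(f_hz, cutoff_hz, order):
    """Attenuation in dB (positive) of an ideal Butterworth low-pass."""
    return 10.0 * math.log10(1.0 + (f_hz / cutoff_hz) ** (2 * order))

def order_for_slope(slope_db_per_oct):
    """Smallest order whose asymptotic slope reaches the requested value."""
    per_order = 20.0 * math.log10(2.0)  # about 6.02 dB/oct per order
    return math.ceil(slope_db_per_oct / per_order)

CUTOFF_HZ = 500.0                 # from the report
order = order_for_slope(70.0)     # assumed Butterworth equivalent

for f in (250.0, 500.0, 1000.0, 2000.0):
    att = butterworth_attenuation_db(f, CUTOFF_HZ, order)
    print(f"{f:6.0f} Hz: {att:6.1f} dB")
```

Under this assumption, the signal one octave above the cutoff is already attenuated by roughly 70 dB, so most of the formant information above 500 Hz is removed while the fundamental frequency region is passed.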
Tests with hearing impaired subjects
Two groups of hearing impaired subjects were tested. The first was a group of 18 children from the School for the Partially Hearing in Stockholm. They were between 11 and 14 years old, with a mean of 13 years. Their hearing losses were between 40 and 97 dB for the frequencies 500, 1000 and 2000 Hz in the best ear, with a mean of 76 dB. In all cases, the hearing impairment was congenital or acquired early. The method of communication used in the school is oral, and the children always used hearing aids. The children listened to the test tape over headphones (TDH39). Before the actual test, they were carefully trained with the four training sentences until they understood the task. The results are shown in Fig. 8. The children were also tested with a list of three-word sentences where the emphasis was placed on the first, second or third word. The main acoustic difference between these test sentences is in the changes in the fundamental frequency (Risberg & Agelfors, 1978). The children's abilities to detect small frequency changes in a sinusoidal signal were also measured (Risberg & Agelfors, 1984).
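
The hearing losses above are given as a single figure for the frequencies 500, 1000 and 2000 Hz in the best ear. A common way to obtain such a figure is the pure-tone average over the three frequencies, taken for the ear with the lower average. The sketch below assumes that convention, which the report does not spell out; the audiogram values are invented and the function names are hypothetical.

```python
def pure_tone_average(thresholds_db, freqs=(500, 1000, 2000)):
    """Mean hearing threshold in dB over the given frequencies (Hz)."""
    return sum(thresholds_db[f] for f in freqs) / len(freqs)

def best_ear_pta(left, right):
    """Pure-tone average of the better ear, i.e. the lower of the two."""
    return min(pure_tone_average(left), pure_tone_average(right))

# Invented audiogram for one subject, thresholds in dB per frequency in Hz.
left = {500: 70, 1000: 80, 2000: 90}
right = {500: 60, 1000: 75, 2000: 90}

print(best_ear_pta(left, right))
```

Averaging this value over all subjects in a group would give a group mean of the kind quoted in the text.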
Fig. 8. Confusions in per cent between different moods of the speaker for 18 hearing impaired children. [Panels: total, male voice, female voice. Moods: angry, astonished, sad, happy.]
The other group of hearing impaired subjects consisted of 45 patients at the Rehabilitation Clinic of the South Hospital in Stockholm. The patients' ages varied from 26 to 74, with a mean of 55 years. Their hearing losses were between 10 and 88 dB for the frequencies 500, 1000 and 2000 Hz in the best ear, with a mean of 38 dB. The cause of hearing impairment was in most cases presbyacusis or noise-induced hearing loss. This group listened to the test with their personal hearing aids, the sentences being presented from a loudspeaker in an ordinary room. Before testing, they were trained with the four training sentences. For 24 of the subjects, the same test tape was presented twice, with an interval of three weeks between the two test sessions. The confusions made in the first test session by the total group of 45 patients are shown in Fig. 9.
Fig. 9. Confusions in per cent between different moods of the speaker for 45 hearing impaired adults. [Panels: total, male voice, female voice. Moods: angry, astonished, sad, happy.]
DISCUSSION
The final test list with 16 sentences and the four moods "angry", "astonished", "sad" and "happy" seemed to be satisfactory. In the test with normal-hearing listeners, the number of disagreements was low. Eighteen of the 22 adult listeners agreed with the intended mood on all 16 sentences, two disagreed on one sentence, one on two sentences and one on three sentences. In the test with the normal-hearing ten-year-old children, the number of disagreements was higher. Six of them agreed with the intended mood on all 16 sentences, seven agreed on 15, four on 14 and three on 13 of the sentences. The main disagreement was on sentence V, "The ball bounced in through the window", pronounced by the female voice in the mood "angry" and identified as "sad"; for the same sentence, the stimulus in the mood "happy" for the male voice was identified as "angry". Sentence III, "It was Olle who won the competition", pronounced in the mood "happy", was for the male voice often identified as "angry". The speaker's mood was in this stimulus expressed in a boisterous way that in many respects resembled the way he expressed "angry". It is possible that, for the children, especially the first sentence was too loaded with associations that influenced them. In continued work in this area, it is necessary to put more effort into selecting semantically neutral sentences, especially if the test is to be used with children.
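
The score distributions given above for the normal-hearing groups can be pooled into an overall per cent correct, which makes them easier to compare with other groups. The pooled figures are derived here from the counts in the text and are not reported by the authors; the function name `pooled_percent_correct` is hypothetical.

```python
def pooled_percent_correct(distribution, n_items=16):
    """Overall per cent correct for a group of listeners.

    distribution: {number of sentences agreed on: number of listeners}.
    """
    listeners = sum(distribution.values())
    correct = sum(score * n for score, n in distribution.items())
    return 100.0 * correct / (listeners * n_items)

# Counts taken from the text: 22 adults and 20 children, 16 sentences each.
adults = {16: 18, 15: 2, 14: 1, 13: 1}
children = {16: 6, 15: 7, 14: 4, 13: 3}

print(f"adults:   {pooled_percent_correct(adults):.1f} %")
print(f"children: {pooled_percent_correct(children):.1f} %")
```

The adults answer 345 of 352 presentations as intended (about 98%) and the children 296 of 320 (92.5%).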
Many of the hearing impaired subjects, both children and adults,
had difficulties in identifying the speaker's mood, see Figs. 8 and 9.
For the children, the per cent correct identification was 63% and for
References
Cowan, M. (1936): "Pitch and intensity characteristics of stage speech", Arch. of Speech, Suppl., Dec., pp. 3-92.
Fairbanks, G. & Hoaglin, L.W. (1941): "An experimental study of the durational characteristics of the voice during the expression of emotion", Speech Monograph, 8, pp. 85-91.
Fairbanks, G. & Pronovost, W. (1939): "An experimental study of the pitch characteristics of the voice during the expression of emotion", Speech Monograph, 6, pp. 87-104.
Fastl, H. & Weinberger, M. (1981): "Frequency discrimination of pure tones and complex tones", Acustica, 49, pp. 77-78.
Fonagy, I. (1981): "Emotion, voice and music", pp. 51-79 in (J. Sundberg, ed.): Research aspects on singing, Proc. from a seminar organized by the committee for the acoustics of music, Publ. issued by the Royal Swedish Academy of Music, no 33, Stockholm.
Fourcin, A.J. (1980): "Speech pattern audiometry", pp. 170-208 in (H.A. Beagley, ed.): Auditory investigation; the Scientific and Technological Basis, Clarendon Press, Oxford.
Huttar, G.L. (1967): "Some relations between emotions and the prosodic parameters of speech", Speech Comm. Lab., Inc, St. Barbara/CA, Monograph no 1, July 1967.
Huttar, G.L. (1968): "Relations between prosodic variables and emotions in normal American English utterances", J. Speech Hearing Res. 11, pp. 481-487.
Lieberman, P. & Michaels, S.B. (1962): "Some aspects of fundamental frequency and envelope amplitude as related to the emotional content of speech", J. Acoust. Soc. Am. 34:7, pp. 922-927.
Risberg, A. & Agelfors, E. (1978): "On the identification of intonation contours by hearing impaired listeners", STL-QPSR 2-3/1978, pp. 51-61.
Risberg, A. & Agelfors, E. (1984): "On the relation between frequency discrimination ability and the degree of hearing loss", STL-QPSR 4/1984, pp. 59-70.
Williams, C.E. & Stevens, K.N. (1972): "Emotions and speech: Some acoustical correlates", J. Acoust. Soc. Am. 52:4, part 2, pp. 1238-1250.