SPEECH INTELLIGIBILITY

1.0 CHAPTER ONE
1.1 INTRODUCTION
Hearing a sound is different from listening to it, and both are different from comprehending it. Speech intelligibility refers to the ability to comprehend what is spoken. Humans perceive a great deal of sound without actively listening to it; such sound is referred to as background noise. Humans can also sometimes listen to sound without being able to understand it, and this can result from many factors. Music is usually easier to understand than speech because the information in speech is constantly changing and the brain may not be able to fill in missing parts, whereas music contains repeated words and rhymes that help the listener understand what is being sung even after missing parts of the music.
In terms of individual communication, speech is probably the most important and efficient
means, even in today's multi-media society. Experimental tests by Chapanis (cited in Andrew Marsh, UWA, 1999) show that, as would be expected, cooperative tasks performed in groups were completed up to ten times faster when speech was allowed than when it was not.
In situations where egress is complex or difficult, such as in high-rise buildings or large
factories, the human voice is often used to provide information. Failure to understand message content can arise in several ways. A message that is not intelligently composed may not be understood. A message spoken in Spanish to an audience that only understands Cantonese will not be understood. A person talking rapidly or with a speech impediment can cause a message not to be understood. Even a well-spoken, intelligent message in the language native to the listener can be
misunderstood if it is not audible or if its delivery to the listener is distorted. These last failure
mechanisms are the basis for the specification, modeling, and measurement of speech
intelligibility performance.
Thus, with many rooms being used solely for speech between individuals and groups, it is
important that acoustic designs accommodate and enhance such use. In order to fully understand and make recommendations on speech intelligibility, it is first necessary to understand how humans perceive sound.
1.2 ANATOMY OF THE HUMAN EAR
Plate 1: Diagram showing anatomy of the ear.
Source: virtualmedicalcentre.com
As shown in the diagram above, the ear is divided into three parts:
1.2.1 OUTER EAR
The outer part of the human ear is made up of the pinna, the ear canal (or external auditory
meatus) and the eardrum (or tympanic membrane).
1.2.2 MIDDLE EAR
It is made up of the three ear ossicles: the hammer (malleus), which is attached to the eardrum; the anvil (incus); and the stirrup (stapes), together with the oval window and the Eustachian tube.
1.2.3 INNER EAR
It consists of the vestibule, the semicircular canals, and the cochlea.
1.3 HEARING MECHANISM
Sound is a disturbance that originates from a source and travels through a medium, which could be air, liquid or solid, and that produces the sensation of hearing in animals. Sound is the means of auditory communication, including frog calls, bird songs and spoken language. The ear is the sense organ that detects sound, but interpreting sound is the function of the brain.
Sometimes sound waves are transmitted to the inner ear by a method of hearing called bone
conduction. For example, people hear their own voice partly by bone conduction. The voice
causes the bones of the skull to vibrate, and these vibrations directly stimulate the sound-sensitive cells of the inner ear. Only a relatively small part of a normal person's hearing depends
on bone conduction, but some totally deaf people can be helped if sound vibrations are
transferred to the skull bones by a hearing aid.
Humans hear primarily by detecting airborne sound waves, which are collected by the auricles.
The auricles also help locate the direction of sound.
After being collected by the auricles, sound waves pass through the outer auditory canal to the
eardrum, causing it to vibrate. The vibrations of the eardrum are then transmitted through the
ossicles, the chain of bones in the middle ear. As the vibrations pass from the relatively large
area of the eardrum through the chain of bones, which have a smaller area, their force is
concentrated. This concentration amplifies, or increases, the sound.
When the sound vibrations reach the stirrup, the stirrup pushes in and out of the oval window.
This movement sets the fluids in the vestibular and tympanic canals in motion. To relieve the
pressure of the moving fluid, the membrane of the oval window bulges out and in. The
alternating changes of pressure in the fluid of the canals cause the basilar membrane to move.
The organ of Corti, which is part of the basilar membrane, also moves, bending its hairlike
projections. The bent projections stimulate the sensory cells to transmit impulses along the
auditory nerve to the brain. The part of the ear dedicated to sensing balance and position also sends impulses through the vestibular portion of the eighth cranial nerve to the vestibular portion of the central nervous system. The messages from the right and left vestibular systems feed, by way of the right and left vestibular nerves, into the vestibular centers (nuclei) in the brainstem. These centers also receive input from the eyes,
muscles, spinal cord, and joints. Furthermore, higher centers in the brain continue to process the
information. The final result is an integrated system that allows us to maintain our balance in our
ever-changing environment.
Plate 2: Diagram showing the hearing mechanism.
Source: virtualmedicalcentre.com
2.0 CHAPTER TWO
2.1 SPEECH INTELLIGIBILITY
The intelligibility of speech refers to the accuracy with which a normal listener can understand a
spoken word or phrase. Because some of the information communicated through speech is carried by contextual, visual and gestural cues, it is possible to understand meaning even if only a fraction of the discrete speech units is heard correctly. However, in
large auditoria and places where reproduced speech is used, the listener has limited access to
these cues and must rely more heavily upon the sound actually produced by the mouth.
2.1.1 MEASURING INTELLIGIBILITY
Research into this area began with the development of telephone and telecommunication systems in the early part of the twentieth century. A product of this research was a quantitative measure for
intelligibility based on articulation testing. This procedure (as described by Lochner and Burger)
normally consists of an announcer reading out lists of syllables, words or sentences to one or
more listeners within the test enclosure. The percentage of these correctly recorded by the
listeners is called the articulation score and is then taken as an 'in-situ' measure of the speech
intelligibility of that enclosure.
The science of articulation testing was substantially refined at Bell Telephone Laboratories and
later at the Psycho-Acoustic Laboratory at Harvard University. From this later work, a set of
phonetically balanced, monosyllabic test lists was prepared, called the Harvard P.B.50 word lists. In order to negate any influence of non-phonetic cues on the measured intelligibility, these word lists comprise only meaningless or jumbled syllables. Thus, in order to be correctly
recorded by the listener, each consonant and vowel sound must be clearly audible. As a further
measure, many tests are conducted with the syllables embedded in a carrier phrase in an attempt
to simulate fluent speech. There are now many derivations of this methodology (such as the
Fairbanks rhyme test used by Bradley and Latham); however, the resulting value is a
percentage score of correctly recorded syllables. Thus the degree of intelligibility is considered
to correlate with the average of these scores. This percentage becomes the measured speech
intelligibility rating for that particular enclosure.
However, even under perfect conditions, the maximum word score normally attainable is about
95% due to unavoidable errors. A word score of 80% enables the audience to understand every
sentence without due effort. In a room where the word score is closer to 70%, the listener has to
concentrate to understand what is said whilst below 60% the intelligibility is quite poor.
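As a rough illustration only, the short Python sketch below tallies the percentage of correctly recorded test words and interprets the result against the thresholds quoted above; the function names and example data are invented for this sketch and are not part of any standard test procedure.

```python
def word_score(responses, answers):
    """Percentage of test items recorded correctly by the listener."""
    correct = sum(1 for r, a in zip(responses, answers) if r == a)
    return 100.0 * correct / len(answers)

def interpret(score):
    """Map a word score (%) to the qualitative bands described above."""
    if score >= 80:
        return "every sentence understood without undue effort"
    if score >= 70:
        return "listener must concentrate to follow what is said"
    if score >= 60:
        return "marginal intelligibility"
    return "quite poor intelligibility"

# Hypothetical articulation test: 50 items, 10 of which the listener misses.
answers = ["pave", "mitt", "rag", "jam"] * 12 + ["dove", "shin"]
responses = answers[:40] + ["?"] * 10
s = word_score(responses, answers)
print(f"word score = {s:.0f}% -> {interpret(s)}")
```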
2.1.2 PREDICTING INTELLIGIBILITY
There are several available methods of predicting Speech Intelligibility within an enclosure.
These include the articulation index (AI), the speech interference level (SIL), the A-weighted
signal-to-noise ratio (Lsa), useful/detrimental sound ratios (U80 and U95) and the speech
transmission index (STI). Each of these methods is based on the same fundamental principle,
determining a ratio between the received speech signal and the level of interfering noise. It is this
basic signal-to-noise relationship upon which speech intelligibility is deemed to depend - the
higher the ratio, the greater the intelligibility.
There are basically three measurable factors which influence these signal-to-noise ratios:
• the level and manner of the speech output,
• the level and spectrum of the background noise, and
• the nature and duration of the room response.
2.1.3 CALCULATING SPEECH INTELLIGIBILITY
Recently developed software packages can be used to calculate speech intelligibility. They are able to calculate reverberation time, critical distance, echoes, and sound levels over the seating areas of rooms. The results show the inter-relationship between reverberation time, room finishes, speaker selection and speech intelligibility. The calculation is carried out by inputting the physical parameters of the space into the program; an AutoCAD file can be used to transfer the data. The data entry includes surface materials, background noise, and the seating layout. This calculation also supports auralization, an acoustic simulation that reproduces the sound as it would be heard after the project is built.
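As a minimal sketch of the kind of calculation such programs perform internally, the example below estimates reverberation time with the classical Sabine formula from room volume and surface absorption. The room dimensions, materials and absorption coefficients here are invented for illustration; real programs read this data from an AutoCAD model and a material database and repeat the calculation per octave band.

```python
# Sabine reverberation time: RT60 = 0.161 * V / A
# V = room volume (m^3), A = total absorption (m^2 sabins).
# All surfaces and coefficients below are illustrative values at 1 kHz only.
surfaces = [
    (200.0, 0.05),  # painted concrete walls: area (m^2), absorption coefficient
    (120.0, 0.60),  # acoustic ceiling tiles
    (120.0, 0.30),  # carpeted floor
    (80.0, 0.80),   # upholstered seating
]
volume = 600.0  # m^3

absorption = sum(area * alpha for area, alpha in surfaces)
rt60 = 0.161 * volume / absorption
print(f"total absorption = {absorption:.0f} m^2 sabins, RT60 = {rt60:.2f} s")
```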
3.0 CHAPTER THREE
3.1 EFFECTS OF HUMAN MIDDLE EAR MUSCLE CONTRACTIONS ON SPEECH INTELLIGIBILITY
An assessment of the relationship between the acoustic reflex and speech intelligibility was made. Monosyllabic words mixed with noise were presented at -6, -3, 0 and +3 dB signal-to-noise ratios. The word lists were presented with a 2000-Hz tone in the contralateral ear at a level 15 dB above or 20 dB below the acoustic reflex threshold, in order to evaluate intelligibility differences with the acoustic reflex contracted and relaxed. The results indicated that a significant decrement in speech intelligibility occurred with the reflex contracted at signal-to-noise ratios of -3 and 0 dB. Slight but not significant decrements were seen at -6 and +3 dB signal-to-noise ratios.
3.2 EFFECT OF AGE ON BINAURAL SPEECH INTELLIGIBILITY
Sentence perception performance, in quiet and in background noise, was measured in three
groups of adult subjects categorized as young, middle-aged, and elderly. Pure tone audiometric
thresholds, measures of inner ear function, obtained in all subjects were within the clinically
normal hearing range. The primary purpose of this study was to determine the effect of age on
speech perception; a secondary purpose was to determine whether the speech recognition problem commonly reported in elderly subjects might be due to alterations at sites central to the peripheral hearing mechanism (the inner ear). Standardized sentence lists were presented in free-field
conditions in order to invoke binaural hearing that occurs at the brainstem level, and to simulate
everyday speech-in-noise listening conditions. The results indicated: (1) an age effect on speech
perception performance in quiet and in noise backgrounds, (2) absolute pure tone thresholds
conventionally obtained monaurally do not accurately predict suprathreshold speech perception
performance in elderly subjects, and (3) by implication the listening problems of the elderly may
be influenced by auditory processing changes upstream of the inner ear.
4.0 CHAPTER FOUR
4.1 FACTORS AFFECTING SPEECH INTELLIGIBILITY
For speech to be intelligible, it must have adequate audibility (sound pressure level) and
adequate clarity. For audibility, we are concerned with the signal-to-noise ratio. Because the voice is highly modulated, intelligibility measurements do incorporate audibility, but not to the same standards used for the audibility of tone-generating systems.
4.1.1 BACKGROUND NOISE
Within every acoustic environment there is always a certain level of ambient background noise present. Its level depends mostly on the activities taking place within the space and its immediate surroundings. The most obvious effect of background noise is that it masks the speech signal, reducing the signal-to-noise ratio so that the receiver must concentrate specifically on the speech. There is, however, another effect of background noise.
As is obvious, short-duration speech may entail significant vocal effort, whereas conversations of longer duration require a much lower level so that they can be comfortably sustained by the speaker. This, however, explains only one aspect of a speaker's choice of speech level.
One of the most important determinants of speech level is what is termed the Lombard effect.
This effect (originally noted by Lombard and studied further by Lane and Tranel) is most clearly illustrated when a person is required to speak whilst listening to music over headphones. In general, a speaker regulates his vocal effort using feedback from his own hearing and from the exertion of the muscles participating in the speech process. With headphones covering the ears and the music masking much of this feedback, the voice is almost automatically raised to compensate. Lane and Tranel showed that
background and other interfering noise have much the same effect, as do both temporary and
permanent hearing loss.
Quantifying this effect is difficult as the individual response is often complex. For example, in
normal conversation, rather than raising their voice, participants are more likely to move closer
together. However, a rule-of-thumb relationship (as suggested by Lazarus) is that every 1 dB increase in interfering noise above 45 dB results in an average rise in output speech level of 0.5 to 0.6 dB. This automatic rise does not normally occur at softer speech levels, as these are more likely to correspond to individual or face-to-face conversations. In those situations the clear preference is almost always to move closer together, meaning that the effect is only applied to speech levels of 55 dB or higher.
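The rule of thumb above can be written out as a short calculation. The sketch below applies Lazarus's 0.5 to 0.6 dB-per-dB rise only when the interfering noise exceeds 45 dB and the baseline speech level is 55 dB or higher; the function name and the default slope of 0.6 are assumptions made for this illustration.

```python
def lombard_speech_level(base_level_db, noise_level_db, slope=0.6):
    """Estimate the raised speech level under the Lombard effect.

    Each 1 dB of interfering noise above 45 dB raises the voice by
    roughly 0.5-0.6 dB, but only for speech levels of 55 dB or more;
    quieter talkers tend to move closer together instead.
    """
    if base_level_db < 55 or noise_level_db <= 45:
        return base_level_db
    return base_level_db + slope * (noise_level_db - 45)

# Example: a 60 dB conversational voice competing with 65 dB of noise.
print(lombard_speech_level(60, 65))  # 60 + 0.6 * (65 - 45) = 72 dB
```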
4.1.2 ROOM RESPONSE
The nature of the room response can significantly affect speech intelligibility. This influence,
whether beneficial or detrimental, is a function of the impulse response. In general, the enclosure
will enhance the perception of speech when the amount of energy reaching the listener within the
speech integration period (35-50 msec) is relatively high. Given that ambient background noise
is constantly being reflected about the enclosure, any additional early speech reflections will
effectively increase the apparent signal-to-noise ratio. However, late arriving reflections and
excessive reverberation actually contribute to the apparent background noise level by interfering
with the direct speech signal. Thus, too much late sound energy will tend to reduce the apparent
signal-to-noise ratio.
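This early-versus-late balance is what the useful/detrimental ratios mentioned earlier (U80, U95) attempt to capture. The following is a minimal sketch, assuming a sampled room impulse response and treating energy inside a chosen integration limit (50 ms here) as useful and later energy as detrimental; the function name, split point and synthetic impulse response are illustrative assumptions, and background noise energy would normally be added to the late term.

```python
import numpy as np

def early_late_ratio_db(impulse_response, fs, split_ms=50.0):
    """Ratio (dB) of early to late energy in a room impulse response."""
    split = int(fs * split_ms / 1000.0)
    h = np.asarray(impulse_response, dtype=float)
    early = np.sum(h[:split] ** 2)   # energy within the speech integration period
    late = np.sum(h[split:] ** 2)    # late reflections and reverberant tail
    return 10.0 * np.log10(early / late)

# Synthetic exponentially decaying impulse response (illustration only).
fs = 8000
t = np.arange(int(0.5 * fs)) / fs
h = np.exp(-t / 0.12) * np.random.default_rng(0).standard_normal(t.size)
print(f"early/late energy ratio = {early_late_ratio_db(h, fs):.1f} dB")
```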
4.1.3 SPEECH LEVEL
In a disturbance-free environment, normal speech levels fall between 55 and 65 dB (measured at
a distance of 1m from the speaker). In specific situations, levels may reach as high as 96 dB
when shouting a warning, or as low as 30 dB when whispering softly. It should be noted that
speech levels vary significantly between individuals, even in similar acoustic conditions. Thus,
when making predictions of speech intelligibility, it is often wise to base them on worst-case
values rather than the average. This can be done by tempering the average value by an amount
representing one standard deviation of inter-individual speech levels. Even this does not fully
account for the variation at extremes of effort. As has been shown (Pearsons et al.), up to
15% of women cannot raise their voice above 75 dB whilst 15% of shouting men can easily
exceed 96 dB, sometimes reaching as high as 104 dB. The following table presents 'worst-case'
values for vocal effort.
Vocal Effort        dB(A)    Vocal Effort    dB(A)
Whispering          32       Raised          57
Soft                37       Loud            62
Relaxed             42       Very Loud       67
Normal (private)    47       Shouting        72
Normal (public)     52       Max. Shout      77
Table 1: Vocal effort and sound level.
Source:
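For design calculations, the worst-case values of Table 1 can simply be stored as a lookup table, as in the small sketch below (the dictionary and function name are illustrative only).

```python
# Worst-case vocal effort levels, dB(A), taken from Table 1.
VOCAL_EFFORT_DBA = {
    "whispering": 32, "soft": 37, "relaxed": 42,
    "normal (private)": 47, "normal (public)": 52,
    "raised": 57, "loud": 62, "very loud": 67,
    "shouting": 72, "max. shout": 77,
}

def worst_case_speech_level(effort):
    """Return the worst-case speech level for a named vocal effort."""
    return VOCAL_EFFORT_DBA[effort.lower()]

print(worst_case_speech_level("Normal (public)"))  # 52 dB(A)
```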
When calculating intelligibility, it has been shown that loud or shouted speech is more difficult
to understand, regardless of the level at the listeners’ ear. This is due mainly to changes in
phonetics and intonation, becoming noticeable above 75 dB. Additionally, if, for more normal
speech, the level arriving at the ear is very high (greater than 80 dB), there is research to indicate
an overloading of the ear. This may occur as a result of a very short speaker-listener distance and results in a further decrease in intelligibility.
5.0 CHAPTER FIVE
5.1 MEASURES OF SPEECH INTELLIGIBILITY
5.1.1 THE A-WEIGHTED SIGNAL/NOISE RATIO (SNA)
This is probably the simplest and easiest to apply of all the methods proposed. Simply
determined, this measurement relates to the difference between the A-weighted long-term
average speech level and the A-weighted long-term average level of background noise, measured
over any particular time.
SNA = LSA - LNA
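A direct implementation of this measure is trivial, as the sketch below shows; the variable names are illustrative and both levels are assumed to be A-weighted long-term averages.

```python
def a_weighted_snr(l_sa, l_na):
    """SNA = LSA - LNA: A-weighted long-term speech level minus
    A-weighted long-term background noise level, both in dB(A)."""
    return l_sa - l_na

# Example: a 62 dB(A) raised voice against 45 dB(A) of background noise.
print(a_weighted_snr(62, 45))  # 17 dB
```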
5.1.2 THE ARTICULATION INDEX (AI.)
This value is basically a linear measure ranging from 0.0 to 1.0 based on calculations of the
signal to noise ratios in five octave bands (with centre frequencies of 0.25, 0.5, 1, 2 and 4 kHz).
It is possible to obtain a more accurate calculation based upon 1/3rd octave band sound pressure
levels (based on work by Kryter); however, this requires more detailed knowledge of both the speech and noise spectra. Since the speech level usually refers to the long-term value for normal speakers, octave spectra are normally sufficient for simple calculations.
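A heavily simplified sketch of an octave-band Articulation Index calculation is shown below. The clipping range and band-importance weights used here are illustrative placeholders that merely sum to one; the standardized values and the full procedure are given in ANSI S3.5 and in Kryter's work.

```python
def articulation_index(speech_octave_db, noise_octave_db,
                       weights=(0.072, 0.144, 0.222, 0.327, 0.235)):
    """Very simplified octave-band Articulation Index sketch.

    For each of the five octave bands (250 Hz - 4 kHz) the signal-to-noise
    ratio is clipped to a 30 dB range (taken here as -12 to +18 dB),
    rescaled to 0-1 and combined with band importance weights. Weights
    and clipping limits are illustrative placeholders only.
    """
    ai = 0.0
    for s, n, w in zip(speech_octave_db, noise_octave_db, weights):
        snr = min(max(s - n, -12.0), 18.0)   # clip to the 30 dB range
        ai += w * (snr + 12.0) / 30.0        # rescale band contribution to 0-1
    return ai

# Octave-band levels (dB) at 250, 500, 1000, 2000, 4000 Hz (made-up example).
speech = [60, 62, 58, 52, 45]
noise = [48, 45, 40, 38, 35]
print(f"AI = {articulation_index(speech, noise):.2f}")
```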
5.1.3 SPEECH TRANSMISSION INDEX (STI)
The STI is a measure of speech transmission quality. The absolute measurement of speech intelligibility
is a complex science. The STI measures some physical characteristics of a transmission channel
(a room, electro-acoustic equipment, telephone line, etc.), and expresses the ability of the
channel to carry across the characteristics of a speech signal. STI is a well-established objective
measurement predictor of how the characteristics of the transmission channel affect speech
intelligibility. STI predicts the likelihood of syllables, words and sentences being comprehended.
As an example, for native speakers, this likelihood is given by:
STI Value     Quality according to   Intelligibility of   Intelligibility of   Intelligibility of
              IEC 60268-16           Syllables in %       Words in %           Sentences in %
0 - 0.3       bad                    0 - 34               0 - 67               0 - 89
0.3 - 0.45    poor                   34 - 48              67 - 78              89 - 92
0.45 - 0.6    fair                   48 - 67              78 - 87              92 - 95
0.6 - 0.75    good                   67 - 90              87 - 94              95 - 96
0.75 - 1      excellent              90 - 96              94 - 96              96 - 100
Table 2: Speech intelligibility likelihood for native speakers.
Source:
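The banding of Table 2 lends itself to a simple lookup. The sketch below maps an STI value to its IEC 60268-16 quality label using the breakpoints copied from the table; the function name is illustrative.

```python
def sti_quality(sti):
    """Map an STI value (0-1) to the IEC 60268-16 quality band of Table 2."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0 and 1")
    bands = [(0.30, "bad"), (0.45, "poor"), (0.60, "fair"),
             (0.75, "good"), (1.00, "excellent")]
    for upper, label in bands:
        if sti <= upper:
            return label

print(sti_quality(0.52))  # fair
print(sti_quality(0.81))  # excellent
```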
Another method defines a physical measure that is highly correlated with the intelligibility of speech as evaluated by speech perception tests given to a group of talkers and listeners. This measure is called the Speech Intelligibility Index (SII).
5.2 RECOMMENDED ACOUSTICS FOR SPEECH
Unamplified speech sound levels at a distance of 3 metres:
• 30 dBA - whispering.
• 60 dBA - lecture voice.
• 70 dBA - loud actor (falling to about 50 dB after 30 rows of seats).
6.0 CHAPTER SIX
6.1 SPEECH REINFORCEMENT
The goal of a speech reinforcement system is to deliver the speaking voice to listeners with
sufficient clarity to be understood. Given the complexity of the speech signal, the task of
providing high-quality speech reinforcement in real-world, less-than-ideal conditions is doubly
complicated. The following factors must be taken into account.
6.1.1 MASKING.
The most common obstacle that speech system designers face is the intrusion of unwanted
sounds that inevitably interfere with the speech signal. The effect is called “masking”, a general term that covers a very wide variety of situations.
Masking noise can come from acoustical sources such as ventilation equipment, traffic, crowds
and commonly, reverberation and echoes. It can also arise electronically from thermal noise, tape
hiss or distortion products. If the sound system has unusually large peaks in its frequency
response, the speech signal can even end up masking itself.
The relationship between the strength of the speech signal and the masking sound is called the signal-to-noise ratio, expressed in decibels. Ideally, the S/N ratio is greater than 0 dB, indicating that the speech is louder than the noise. Just how much louder the speech needs to be in order to
be understood varies with, among other things, the type and spectral content of the masking
noise. The most uniformly effective mask is broadband noise.
6.1.2 FREQUENCY RESPONSE.
One of the most obvious aspects of sound system performance that affect intelligibility is
frequency response. Severely band-limited systems deliver speech poorly. For instance,
telephones are generally limited to a 2 kHz bandwidth, and this makes it hard to distinguish
between “f” and “s” or “d” and “t” sounds.
High-quality speech systems need to cover the frequency range of about 80 Hz (for especially
deep male voices) to about 10 kHz (for best reproduction of consonants, which are crucial to
intelligibility). Response below 80 Hz must be eliminated to the extent possible: not only do
these frequencies fall below the range of the speech signal, but also they will cause particularly
destructive masking at high sound levels.
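As a rough illustration of the band-limiting described above, the sketch below designs a Butterworth band-pass filter covering roughly 80 Hz to 10 kHz with SciPy and applies it to a stand-in signal; the filter order, sample rate and test signal are arbitrary assumptions rather than recommended design values.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 44100  # assumed sample rate, Hz

# 4th-order Butterworth band-pass spanning the speech range discussed above:
# attenuate below 80 Hz (destructive low-frequency energy) and above 10 kHz
# (little further contribution to consonant intelligibility).
sos = butter(4, [80, 10000], btype="bandpass", fs=fs, output="sos")

# Apply to one second of made-up "speech" (white noise stands in here).
signal = np.random.default_rng(0).standard_normal(fs)
filtered = sosfilt(sos, signal)
print(filtered.shape)
```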
6.1.3 DISTORTION.
Early studies of intelligibility in communication systems suggest that clipping the peaks of the
speech signal, and then amplifying it to restore its peak-to-peak amplitude, improves
intelligibility. The trick works in very noisy situations because clipping generates partials that are
harmonically related to the fundamental — and thus less likely to mask the speech — and
because it both accentuates consonants and increases the sound power of the signal. As such, it
has been helpful for band-limited communication systems that are used in very noisy
environments, such as the deck of an aircraft carrier.
6.1.4 TIME RESPONSE.
Perhaps because it remains poorly understood and its effects are more subtle, phase response in
communication systems has received scant attention. In fact, most published research about
“phase” and intelligibility actually deals with the effects of relative polarity. It’s been shown, for
instance, that when speech is presented with noise over headphones, intelligibility increases by
about 25% if the speech signal in one ear is inverted relative to the other ear. But this result has
no application in sound reinforcement, other than for in-ear stage monitors.
REFERENCES
1. http://www.sdngnet.com/Files/Lectures/FUTA-ARC-507
2. Andrew Marsh (1999). The School of Architecture and Fine Arts, The University of Western Australia (UWA).
3. Greinwald, J. H. Jr. and Hartnick, C. J. (2002). The Evaluation of Children with Hearing Loss. Archives of Otolaryngology - Head & Neck Surgery, 128(1): 84-87.
4. Stenström, J. S. (1979). Deformities of the ear. In: Grabb, W. C. and Smith, J. S. (eds), Plastic Surgery. Little, Brown and Company, Boston. ISBN 0-316-32269-5 (C), ISBN 0-316-32268-7 (P).
5. Lam, S. M. (2004). Edward Talbot Ely: father of aesthetic otoplasty. Archives of Facial Plastic Surgery, 6(1): 64.
6. Siegert, R. (2003). Combined reconstruction of congenital auricular atresia and severe microtia. Laryngoscope, 113(11): 2021-2027; discussion 2028-2029.
7. Encarta 2009.