PAPERS
Audio Engineering and Psychoacoustics:
Matching Signals to the Final Receiver,
the Human Auditory System*
EBERHARD ZWICKER AND U. TILMANN ZWICKER
Institute of Electroacoustics, Technical University Munich, D-8000 München 2, Germany
The consequences of the fact that the human auditory system is the final receiver in
almost all cases of sound recording, transmission, and reproduction are discussed. The
strategies of processing and transmitting sound as effectively as possible on one hand, and
also as "undistorted" as possible on the other need adaption to the perception characteristics
of the auditory system: The transformation of frequency to critical-band rate as well as the
transformation of level to specific loudness are the tools used for this adaption.
Examples for practical applications of the basic idea are illustrated.
0 INTRODUCTION
During the last few years, digital sound processing
and storage have been adopted widely in audio, and
are providing excellent sound quality. However, converting an audio stereo signal to a 16-bit digital format
with appropriate redundancy for error correction and
with a minimum sampling rate around 44 kHz requires
extremely extended bandwidth for signal transmission
and storage, the latter coupled with huge mass-storage
necessities. The large bandwidth results in problems for
radio transmission in particular, so there is considerable
interest in avoiding any redundancy in the signal other than
for error-correction purposes. To achieve sound
transmission or reproduction that is not only very
good but also efficient, all equipment has to be adapted
to the characteristics of the final receiver, in this case
the human ear. Any part of the transmitted signal that
the auditory system cannot perceive is poorly matched
to the receiver and constitutes unnecessary
redundancy. Considerable progress has been made in
implementing data-reduction methods
derived from findings in the field of psychoacoustics.
Most of these efforts concern the future digital audio
broadcasting (DAB), for example [1]-[3], but
digital storage can profit from the possible information
reduction as well.
* Manuscript received 1990 July 18; revised 1990 December 12.
J. Audio Eng. Soc., Vol. 39, No. 3, 1991 March
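As a rough illustration of the bandwidth problem described in the introduction, the raw bit rate of a 16-bit stereo signal can be estimated as sketched here. The 44.1-kHz sampling rate is an assumption (the text only says "around 44 kHz"), the function name is hypothetical, and the error-correction redundancy mentioned in the text is deliberately left out.

```python
def pcm_bit_rate(sample_rate_hz: float, bits_per_sample: int, channels: int) -> float:
    """Raw PCM bit rate in bit/s, without any error-correction overhead."""
    return sample_rate_hz * bits_per_sample * channels


# Assumed values: 44.1 kHz sampling rate, 16 bit, two channels (stereo)
rate = pcm_bit_rate(44_100, 16, 2)
print(f"{rate / 1e6:.3f} Mbit/s")
```

Any data reduction based on psychoacoustics would be measured against this raw figure.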
In fields other than audio, such as the transmission
of electrical power, the adaption to the final receivers is
very well established and generally applied for
transmission from power plant to power plant as well
as from power plant to factories and even to individual
households. In the field of transmitting information,
the same rule holds as for power transmission. Therefore, all of our efforts in improving electroacoustic
information transmission—including recording—have
to be seen from the perspective of the final receiver, the
human auditory system. This perspective has many more
advantages in audio engineering, such as in instrumentation and with public-address applications, as
discussed in this paper.
1 THE FINAL RECEIVER: THE HUMAN
AUDITORY SYSTEM AND PERCEPTION
What ultimately matters is the perception of sound. We
do not perceive frequency, we rather perceive pitch;
we do not perceive level, but loudness. We do not
perceive spectral shape, modulation depth, or frequency
of modulation; instead we perceive "sharpness," "fluctuation strength," or "roughness." We also do not perceive time directly; our perception is the subjective
duration, often quite different from the physical duration. In all of the hearing sensations mentioned, which
are described in detail elsewhere [4]-[6], masking plays
an important role in the frequency, as well as in the time
domain. Consequently Sec. 2 deals with masking effects
and the transformation from frequency scale to
critical-band-rate scale and from level scale to
specific-loudness scale. The information received by our
auditory system can be described most effectively in the
three dimensions of specific loudness, critical-band rate,
and time. The resulting three-dimensional pattern is the
measure from which the assessment of sound quality
can be achieved. Some applications of this pattern,
which is reproduced in a modern loudness meter, for
example, are discussed especially in view of modern
electroacoustic transmission and reproduction.
In this paper, the main emphasis is on practical applications of psychoacoustics in the field of perception
and reproduction of sound, and many scientific details
are therefore omitted. Rather, the basic, important facts
are emphasized. Further information is available from
books on psychoacoustics and electroacoustics [4]-[8].
2 PSYCHOACOUSTICAL PRINCIPLES
APPLICABLE IN AUDIO ENGINEERING
PRACTICE
2.1 Transformation from Frequency to
Critical-Band Rate
The effect of masking plays a very important role in hearing, and is differentiated into simultaneous and nonsimultaneous masking. An example of the simultaneous condition would be the case where we have a conversation with our neighbor while a loud truck passes by. In this case our conversation is severely disturbed. To continue our conversation successfully we have to raise our voice to produce more speech power and greater loudness. In music, similar effects take place. The different instruments can mask each other, and softer instruments become audible only when the loud instrument pauses. Such simultaneous masking is outlined here for quantitatively easily describable conditions, while nonsimultaneous masking is discussed in Sec. 2.3. Simultaneous masking can be understood more easily if instead of the frequency scale a hearing-equivalent scale, that is, the critical-band-rate scale, is used.
Masking usually is described as the sound-pressure level of a test sound (a pure tone in most cases) necessary to be barely audible in the presence of a masker. For narrow-band noises used as maskers and pure tones used as test sounds, masking patterns can be produced for different center frequencies of the narrow-band noise maskers, as shown in Fig. 1. The same information is given in Fig. 1(a) and (b). However, in Fig. 1(a) the level of the barely audible pure tone is plotted as a function of frequency on a linear scale, in contrast to the logarithmic scale used in Fig. 1(b). In order to make the masking patterns directly comparable through having the same peak values, the so-called masking index, a value of 2-6 dB (for details see the literature mentioned), is added to the sound-pressure level of the barely audible test tone, and the resulting level, called excitation level, is shown as ordinate. The level of the narrow-band maskers is 60 dB for all curves. Comparing the results produced from different center frequencies of the masker, we find the form of the curves to be rather dissimilar, no matter what frequency scaling we use. It seems as if the shape of the curves is similar for center frequencies up to about 500 Hz on a linear frequency scale, while for center frequencies above 500 Hz there is a similarity on a logarithmic frequency scale. This intuitive result is quite accurate since the hearing-equivalent critical-band-rate scale mentioned follows a linear frequency scale up to about 500 Hz and then a logarithmic frequency scale above 500 Hz.
Fig. 1. Excitation level (masking level with added masking index) of narrow-band noises of given center frequency as a function of frequency. Broken lines: threshold in quiet. (a) Linear scale. (b) Logarithmic scale.
This relation is illustrated in Fig. 2 by two different frequency scales, one divided linearly, the other logarithmically. Approximations, which sometimes may be useful within certain frequency ranges, are also indicated. Fig. 2(a) shows the uncoiled inner ear, including the basilar membrane. It indicates that the critical-band-rate scale is directly related to the place along the basilar membrane where all the sensory cells are located in a very equidistant configuration (one row of inner hair cells and three rows of outer hair cells). Thus the critical-band-rate scale is closely related to our physiology, too.
The critical-band concept is based on the well-proven assumption that our auditory system analyzes a broad spectrum in parts that correspond to critical bands. Adding one critical band to the next, so that the upper limit of the lower critical band corresponds to the lower limit of the next higher critical band, produces the scale of the critical-band rate. Since critical bands have a 100-Hz width up to 500 Hz and above 500 Hz take a relative width of 20%, it becomes clear why the critical-band rate is dependent on frequency as illustrated. This can also be seen in Fig. 2(c), where the critical-band rate is plotted as a function of frequency on the logarithmic scale, a scale more appropriate for approximating the critical-band rate. The latter fact is especially advantageous for problems dealing with speech transmission, where important spectral features are located in the spectral region between 300 and 5000 Hz. However, it is also necessary to realize that the linear relation between frequency and critical-band rate plays an important role in music based on harmony.
Fig. 2. (a) Scale of uncoiled cochlea. (b), (c) Critical-band rate (ordinate, linear scale) as a function of frequency. (b) Linear scale. (c) Logarithmic scale. Useful approximations are indicated by broken lines and related equations. Adopted from [5].
Because the critical-band concept is used in so many models and hypotheses, a unit for the critical-band rate was defined, which is one critical band wide. It is the bark, in memory of Barkhausen, a scientist from Dresden, Germany, who introduced the phon, a unit describing the loudness level for which the critical band plays an important role.
When frequency is transferred into critical-band rate, the masking patterns outlined in Fig. 1 change to those seen in Fig. 3. There the level of the barely audible pure tone (again expressed as excitation level, that is, including the masking index) is plotted as a function of the critical-band rate for the same narrow-band maskers as shown in Fig. 1. The effectiveness of the natural frequency scale, that is, the critical-band-rate scale, is obvious. The shapes of the curves for different center frequencies are very similar. Only at very low frequencies, below about 100 Hz, where special masking effects (such as the masking-period patterns) lower the amount of masking, the upper slope is somewhat steeper.
Fig. 3. Excitation level versus critical-band rate for narrow-band noises of given center frequency and 60-dB sound pressure level. Broken lines: threshold in quiet. Adopted from [5].
It is not only the masking effect that can be described more simply and become more easily understandable in terms of this natural scale corresponding to location along the basilar membrane, but also many other effects, such as pitch, frequency differences barely noticeable, or the growth of loudness as a function of bandwidth. Therefore when dealing with hearing sensations, it is very effective to transfer first the frequency scale into the critical-band-rate scale.
The effect of masking produced by narrow-band maskers is level dependent and, therefore, a nonlinear effect. As shown in Fig. 4(a), all masked thresholds show a steep rise from low to higher frequencies up to the maximum of masking. Beyond this maximum, the masked threshold decreases quite rapidly toward higher frequencies for low and medium masker levels. At higher masker levels, however, the slope toward high frequencies, that is, larger critical-band rate, becomes increasingly shallow. This nonlinear rise of the upper slope of the masked threshold with the masker level is an effect that is assumed to be produced in the inner ear already. The outer hair cell rows form a feedback system which, through saturation, is effective at low levels only. At higher levels the feedback automatically disappears. This leads to a shape of the masking curve corresponding to the amplitude of the traveling wave along the basilar membrane, as seen for higher levels. Recent data of this traveling wave measured at very low levels have proven that feedback takes place in the inner ear already, producing a narrower amplitude distribution of the traveling wave and consequently a narrower masking pattern at low masker levels. This is also seen in data for masking patterns produced by a model reproducing peripheral preprocessing in the inner ear, including nonlinear feedback with lateral coupling.
The real masking data [Fig. 4(a)] and the model data
[Fig. 4(b)] compare very nicely. Therefore, and in
accordance with physiological data from animals, it can be
assumed that simultaneous masking is already produced in
the peripheral preprocessing of the inner ear, that is,
before the information is transferred to the neural level.
Another way to measure masking is by means of
psychoacoustical tuning curves. In this case the level
of the test tone is fixed, while the level of the masker, in
most cases also a tone, is increased so that the test tone just
becomes inaudible. Plotting this masker level as a
function of critical-band rate results in the so-called
psychoacoustical tuning curves. Such a curve is outlined
in Fig. 4(c); it has a shape that correlates quite strongly
with the data seen in Fig. 4(d), as described in the
following.
The assumption that frequency selectivity takes place
in the peripheral part of the auditory system (the inner
ear) and produces the critical-band-rate scale can also be
supported by experiments on the suppression of so-
called spontaneous otoacoustic emissions [9], which
appear in half of all ears at levels around 0 dB SPL.
These tonal emissions, which are proven to be produced
in the inner ear, can be suppressed by adding a suppressor tone, the level of which is given in the related
suppression tuning curves as a function of the
critical-band rate. The curve given belongs to the
criterion of 6-dB amplitude reduction of the spontaneous
otoacoustic emission. These are objective data because
they do not depend on a subject's response at all. By
comparing Fig. 4(c) and (d), one can easily see that the
psychoacoustically measured tuning curves, which
involve the highest possible signal processing level in the
brain, show the same frequency selectivity as the
suppression tuning curves resulting from purely
peripheral processing. Therefore the frequency-selective
and nonlinear effect of simultaneous masking produced in
our auditory system can be assumed as being produced
already in the peripheral part of the inner ear and still in the
analog domain, that is, installed before the signal
information is transferred into neural information using
spike rates. Since the arrangement of the hair cells is
equidistant and the form of the traveling wave in the
inner ear, besides the shift along the basilar membrane,
does not change much as a function of frequency, it
becomes understandable why the natural scale of the
critical-band rate, which corresponds to the location
along the basilar membrane, is the adequate scale to describe
frequency and frequency-selective effects in hearing.
Fig. 4. (a) Level of test tone barely masked by narrow-band noise of given level. (b) Test-tone level needed in model to produce 1-dB increment anywhere along basilar membrane. (c) Psychoacoustical and (d) suppression tuning curve; each as a function of critical-band rate.
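The transformation from frequency to critical-band rate discussed in Sec. 2.1 can be sketched numerically. The closed-form expression used here is not given in this text; it is a commonly cited approximation attributed to Zwicker and Terhardt, and the function name is hypothetical. It yields a value close to the 8.5 bark that the paper quotes for 1 kHz.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate in bark for a frequency in Hz.

    Assumption: Zwicker-Terhardt style closed form, not taken from this paper.
    """
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


for f in (100, 500, 1000, 5000):
    print(f, "Hz ->", round(hz_to_bark(f), 2), "bark")
```

The nearly linear growth at low frequencies and the roughly logarithmic growth at higher frequencies mirror the behavior described for Fig. 2.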
2.2 Transformation from Level to Specific
Loudness
When we talk about loudness in view of quantitative
relations, we often think of the loudness function of a
1-kHz tone. This function is established by answering
the question of how much louder a sound is heard relative
to a standard sound. The standard sound in electroacoustics is a 1-kHz tone, and the reference level in this
case is 40 dB. Many measurements from different
laboratories have produced similar results, so that
eventually the loudness function of a 1-kHz tone in
the free field was standardized. It is given in Fig. 5 as a
solid curve. With the definition that a 1-kHz tone of
40-dB SPL has the loudness of 1 sone, the curve indicates that doubling the loudness from 1 to 2 sone is
equivalent to increasing the sound-pressure level from
40 to 50 dB. The same holds for larger levels: a doubling in
loudness is achieved with each increment of 10 dB of
the 1-kHz tone. This means that 50 dB corresponds to 2
sone, while 100 dB corresponds to 64 sone. The loudness
function of the 1-kHz tone above 40 dB corresponds to a
power law if loudness is related to sound intensity. Its
exponent can be extracted easily by the fact that a
10-dB increment produces an increment in loudness of a
factor of 2, which in logarithmic values is equivalent to
an increment of 3 dB. Therefore the exponent of the
power law connecting loudness with the sound intensity
of the 1-kHz tone for sound pressure levels above 40 dB is
0.3. At sound pressure levels below 40 dB, the loudness
function becomes steeper and steeper toward threshold
in quiet, which per definition corresponds to a loudness
of 0 sone. On the logarithmic loudness scale this zero
corresponds to a value of minus infinity.
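The doubling rule for the 1-kHz tone can be checked with a short sketch. It assumes only what the text states (1 sone at 40 dB, a doubling per 10 dB above that); the function name is hypothetical, and the steeper region below 40 dB is deliberately not modeled.

```python
import math


def sone_1khz(level_db: float) -> float:
    """Loudness in sone of a 1-kHz tone for levels of 40 dB and above."""
    if level_db < 40:
        raise ValueError("power-law approximation holds only at 40 dB and above")
    return 2.0 ** ((level_db - 40.0) / 10.0)


# A factor of 2 in loudness per factor of 10 in intensity gives the exponent.
exponent = math.log10(2.0)

print(sone_1khz(50), sone_1khz(100), round(exponent, 2))
```

The exponent follows from N being proportional to I^k with 2 = 10^k, so k = log10(2), which rounds to the 0.3 given in the text.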
From the masking pattern outlined in Fig. 4(a) and the corresponding excitation pattern, we already know that a 1-kHz tone, although it has an infinitely small spectral width, does not lead to an infinitesimally narrow excitation in our auditory system (the final receiver) and thus on the critical-band-rate scale. Instead, it results in an excitation over a range increasing with larger SPL values of the 1-kHz tone. Although easily describable in purely physical terms, the 1-kHz tone produces a complex pattern of excitation, which from this point of view does not seem directly useful for answering the question we are interested in, namely, transferring the excitation into an equivalent psychoacoustic value.
Fig. 5. Loudness function of 1-kHz tone (solid line) and uniform exciting noise (dotted line). Loudness is given as a function of sound pressure level. Approximations using power laws are indicated as broken and dashed-dotted lines together with the corresponding equations. Adopted from [5].
When we talk about loudness, we mean total loudness, knowing that this loudness is comprised of very many partial loudnesses which are located along the critical-band-rate scale. The physiological equivalent of this assumption would be that all the neural activity of the sensory cells along the basilar membrane is summed up into a value that finally leads to the total loudness. Many experiments dealing with the loudness of sounds of different spectral widths have shown that the instruments our auditory system uses are the critical bands that shape and weigh the many partial loudnesses to be summed up. If the summation or integral mentioned leads to the loudness that is given in units of sones, the value we are looking for has to have the dimension of sones per bark. This value is called specific loudness and is denoted by N'. The total loudness N is thus the integral of specific loudness over the critical-band rate, which can be expressed mathematically as follows [10]:

$$N = \int_{0}^{24\,\mathrm{bark}} N'(z)\,dz . \qquad (1)$$

Since the 1-kHz tone produces a complicated excitation pattern and therefore also a complicated specific-loudness pattern, we have to search for a sound that produces more homogeneous excitation versus the critical-band-rate pattern. This sound is the uniform exciting noise, which fills up the entire auditory range in such a way that the same sound intensity falls into each of the 24 abutting critical bands (meaning that all critical bands are positioned adjacent without space between them). The loudness of such a uniform exciting noise was measured. It was found that the loudness of 1 sone is reached at a level of about 30 dB for uniform exciting noise. The entire loudness function of uniform exciting noise is shown by the dotted line in Fig. 5. The curve rises somewhat more steeply with level than the loudness of the 1-kHz tone, at least for levels of uniform exciting noise up to about 50 dB. Above 60 dB, the dotted line can also be approximated by a straight line, which is shown dashed-dotted in Fig. 5. This straight line again means that a power law holds for the relation between the loudness of uniform exciting noise and the sound intensity of that noise. The exponent of this dashed-dotted line is smaller, however, than that for the loudness function of the 1-kHz tone (dashed straight line). It has a value of only 0.23, and thus the two loudness functions shown in Fig. 5 come closer together at higher levels.
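Eq. (1), the integral of specific loudness over critical-band rate, can be approximated numerically. The sketch uses a plain trapezoidal rule; the function name, the 0.1-bark grid, and the flat 1-sone/bark test pattern are all hypothetical choices for illustration, not measured data.

```python
def total_loudness(z_bark, n_specific):
    """Trapezoidal approximation of N = integral of N'(z) dz, in sone."""
    total = 0.0
    for i in range(1, len(z_bark)):
        dz = z_bark[i] - z_bark[i - 1]
        total += 0.5 * (n_specific[i] + n_specific[i - 1]) * dz
    return total


# Hypothetical flat pattern: 1 sone/bark over the full 0..24 bark range
z = [0.1 * k for k in range(241)]
flat = [1.0] * len(z)
print(round(total_loudness(z, flat), 6))
```

A flat pattern is the easiest sanity check because its integral is simply the specific loudness times the 24-bark width.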
Besides the different exponents of the two loudness
functions at higher levels it is also interesting to see that
the loudness of uniform exciting noise is much larger than
the loudness of the 1-kHz tone in almost the entire level
range indicated. For example, the loudness of a 60-dB
uniform exciting noise is about 3.5 times larger than
the loudness of the 1-kHz tone with the same level. This
difference is a very distinct effect, which plays an
important role in judging and measuring the loudness of
noises. It indicates very clearly that an overall
sound-pressure level of broad-band noises is an
extremely inadequate value if loudness is to be approximated. Unfortunately most noises producing annoyance to people are broad-band noises, and the
A-weighted sound-pressure level is a measure of the total
level, which creates misleading values when used as an
indication for loudness. Almost all sounds occurring in
audio broadcasting and recording not only have a large
bandwidth but also differ in spectral shape. Therefore
meters based merely on total level (such as VU or
peak-level meters) usually give readings quite unrelated
to loudness, although these readings should correspond to
loudness sensation as closely as possible from the view of
the listener as the final receiver. This is referred to later.
Because uniform exciting noise produces the same excitation along the whole critical-band-rate scale, it can
be used very nicely to calculate the value we are searching
for (the specific loudness) out of its total loudness. Fig. 6
shows the procedure schematically. In Fig. 6(a) the
excitation levels of uniform exciting noise (dashed) and
of narrow-band noise, one critical band wide and centered
at 1 kHz (solid), are shown. The two distributions are
given for the condition that both the uniform exciting
noise and the narrow-band noise have the same
sound-pressure level of 64 dB. This value was chosen
because the level in each of the 24 abutting critical bands
produced by the uniform exciting noise is 50 dB, leading
to an overall sound-pressure level of 50 dB + (10 x log
24) dB = 64 dB. For the narrow-band noise, the entire
intensity is concentrated around 1 kHz, corresponding
to a critical-band rate of 8.5 bark. The distribution of
the excitation level as a function of critical-band rate
reaches a peak value of 64 dB for the narrow-band
noise, while it remains constant for the uniform exciting
noise at 50 dB.
Using our assumption that total loudness is the integral over specific loudness along the critical-band-rate scale (as discussed), we can calculate the specific loudness corresponding to an excitation level of 50 dB from the total loudness of uniform exciting noise. According to Fig. 5, uniform exciting noise with a level of 64 dB produces a total loudness of 20 sone. Dividing this value by 24 bark, the entire width of the critical-band-rate scale, leads to the value for the specific loudness caused by an excitation level of 50 dB, that is, 20 sone divided by 24 bark leads to about 0.85 sone/bark. The same procedure can be used to calculate the relation between specific loudness and the excitation level for different values of the excitation level, that is, different values of the total level and total loudness of uniform exciting noise. The results show that specific loudness is related to the excitation in a similar way as the total loudness of uniform exciting noise is related to the sound intensity of the noise at high levels, namely, through a power law with an exponent of 0.23. The effect of threshold, which influences the relation between specific loudness and excitation level for levels ranging between threshold and about 40 dB above threshold, is ignored here for reasons of simplicity and accessibility. For practical applications, the exponent of 0.23 is often approximated with 0.25, as it then corresponds to an exponent of 0.5 on the sound pressure, and this square root is easily available technically.
Fig. 6. (a) Excitation level and (b) specific loudness of narrow-band noise (solid lines) and uniform exciting noise (broken lines) of equal sound-pressure levels (64 dB) as a function of critical-band rate.
The distribution of specific loudness as a function of critical-band rate for the 1-kHz narrow-band noise with the same level of 64 dB is shown in Fig. 6(b) by a solid line. It is obvious that the loudness of the two noises, the uniform exciting noise and the narrow-band noise, that is, the integral of specific loudness over critical-band rate, is quite different. For the narrow-band noise, the area below the curve corresponding to the integral is only about one quarter of that of the rectangularly shaped area of the uniform exciting noise. The same relation can be seen in Fig. 5 for a level of 64 dB, where the two curves indicate a loudness of 20 sone for the uniform exciting noise, but only 5 sone for the 1-kHz tone, which is as loud as the narrow-band noise centered at 1 kHz.
The distributions of specific loudness as a function of critical-band rate shown in Fig. 6(b) are the most extreme cases. The one produced by uniform exciting noise is completely flat. However, even this noise produces a flat shape only if the frequency response between the free-field condition and the sound pressure at the ear drum is not accounted for. In our discussion we ignore this for didactical reasons, but for precise loudness measurements all these effects naturally have to be included [4], [6]. The distribution of specific loudness over the critical-band rate is often called the loudness pattern. This pattern varies for different kinds of noises, tones, or complex tones quite drastically. However, this loudness pattern is the pattern that is most interesting for the assessment of sound quality in the case of steady-state conditions because it shows on both coordinates the adequate hearing values: frequency is expressed via critical-band rate and level is expressed via specific loudness. If temporal effects are taken into account as well, then the time-varying specific-loudness versus critical-band-rate pattern contains all the information that eventually is evaluated by our auditory system.
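The worked numbers of Sec. 2.2 for uniform exciting noise can be reproduced with a small sketch. All input values (24 bands, 50 dB per band, 20 sone total) come from the text; the function names are hypothetical. The division 20/24 gives roughly 0.83 sone/bark, which the text rounds to about 0.85.

```python
import math

BANDS = 24  # abutting critical bands, as stated in the text


def overall_level_db(band_level_db: float, bands: int = BANDS) -> float:
    """Overall level when every critical band carries the same intensity."""
    return band_level_db + 10.0 * math.log10(bands)


def flat_specific_loudness(total_sone: float, width_bark: float = 24.0) -> float:
    """Specific loudness (sone/bark) of a completely flat loudness pattern."""
    return total_sone / width_bark


print(round(overall_level_db(50.0), 1))        # overall level of the noise in dB
print(round(flat_specific_loudness(20.0), 2))  # 20 sone spread over 24 bark

# Intensity exponent 0.25 equals pressure exponent 0.5, because I ~ p**2.
p = 3.0
assert math.isclose((p ** 2) ** 0.25, p ** 0.5)
```

The final check illustrates why the 0.25 approximation is convenient: a power of 0.25 on intensity is just a square root of sound pressure.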
2.3 Pattern of Specific Loudness versus
Critical-Band Rate versus Time
From the many temporal effects included in the
masking mechanisms only that of postmasking is discussed here, because it has the biggest impact on efficient coding for digital audio broadcasting (DAB) [2].
Postmasking results from the gradual release of the
effect of a masker, that is, masking does not immediately
stop with switching off the masker but still lasts while
there is actually no masker present. Postmasking depends on the duration of the masker. Fig. 7 shows a
typical result for a 2-kHz test-tone burst of 5-ms duration. The delay time at which the test-tone burst is
presented after the end of the masker is plotted as the
abscissa. The level of the test-tone burst necessary for
audibility is the ordinate. For a long masker duration
of at least 200 ms, the solid curve indicates postmasking.
It decreases from the value for simultaneous masking (plotted on the left, outside the logarithmic scale) as a function of the delay time. However, postmasking produced by a very short masker burst (such as 5 ms) behaves quite differently. Postmasking in this case (as indicated by the dotted line in Fig. 7) decays much faster, so that threshold in quiet is already reached after about 50 ms. This implies that postmasking strongly depends on the duration of the masker and therefore is another highly nonlinear effect.
Fig. 7. Dependence of postmasking on masker duration: Level of barely audible test-tone burst as a function of its delay time (time between end of masker and end of test tone). Duration of maskers 200 and 5 ms; level of masker (uniform masking noise) 60 dB; duration of 2-kHz test tone 5 ms. Adopted from [5].
Specific loudness as calculated from excitation in
the steady-state condition can also be considered as
being a time-dependent value. Simultaneous masking
and postmasking can be used to approximate the time
functions of the specific loudness. Using this complete
transformation, the specific loudness for a tone burst of
200 ms and that for a tone burst of 5 ms is plotted over
time in Fig. 8. The tone bursts are located on the linear
time scale in such a way that both bursts end at the same
instant (200 ms). For the 200-ms tone burst, the specific
loudness shows a very steep rise and stays at the peak
value for almost 200 ms. The subsequent decay does not
seem to have only one time constant. The specific
loudness of the 5-ms tone burst rises just as quickly as
for the 200-ms tone burst; the decay, however, is quite
different and much faster, as can be expected from the
postmasking pattern shown in Fig. 7. The different
behavior of the specific loudness after the end of the
tone bursts is shown by a dotted and a solid line. The two
different decays can be approximated very roughly by
single time constants of about 30 ms for a tone-burst
duration of 200 ms and about 8 ms for a duration of 5
ms. Actually, in both cases the slope is much steeper
during the early decay and less steep during the later
decay (compared to the approximation using only one
time constant).
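The single-time-constant approximation mentioned above can be made concrete with a short sketch. It assumes a plain exponential decay of specific loudness after the end of the burst, which, as the text notes, is only a very rough description; the function names are hypothetical and chosen for illustration.

```python
import math

# Rough time constants quoted in the text, in ms, keyed by burst duration.
TAU_MS = {"200-ms burst": 30.0, "5-ms burst": 8.0}

def decayed_fraction(t_ms, tau_ms):
    """Fraction of the peak specific loudness left t_ms after the burst ends,
    assuming a single-exponential decay (a simplification)."""
    return math.exp(-t_ms / tau_ms)

def time_to_fraction(fraction, tau_ms):
    """Time in ms for the assumed exponential decay to fall to `fraction`."""
    return -tau_ms * math.log(fraction)

for label, tau in TAU_MS.items():
    left_after_20_ms = decayed_fraction(20.0, tau)
    t_10_percent = time_to_fraction(0.1, tau)
    print(f"{label}: {left_after_20_ms:.2f} of peak after 20 ms, "
          f"10% of peak after {t_10_percent:.0f} ms")
```

With these values the decay after the 5-ms burst reaches 10% of its peak almost four times sooner than the decay after the 200-ms burst, which is the difference the two curves in Fig. 8 illustrate.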
These functions of specific loudness versus
critical-band rate versus time illustrate best the
information flow in the human auditory system. As
three-dimensional patterns, they contain all the
information that is subsequently processed and leads to the
different hearing sensations. An example for such a
complete pattern is shown in Fig. 9 for the spoken word
"electroacoustics" fed into the auditory system. The
specific loudness produced by this sound is plotted for
22 places with a 1-bark spacing along the
critical-band-rate scale. For speech transmission, the
spectral resolution in about 20 abutting channels is
sufficient; for the transmission of music, additional
information on pitch is necessary. However, the most
important information, especially in music with strong
temporal effects, can already be seen nicely in patterns
showing the specific loudness as a function of
critical-band rate and of time. The pattern in Fig. 9
clearly shows the formants of the vowels and the spectral
centers of the consonants, and also indicates the
relatively quick rise following the stimulus, as well as a
longer decay corresponding to postmasking.
Fig. 8. Specific loudness produced by masker bursts of 200 ms (dotted line) and 5 ms (solid line) as a function of time.
3 APPLICATIONS
3.1 Loudness
Loudness is a sensation of great interest in many
problems related to audio engineering. For example, it
is of interest how the loudness of a piece of music is
perceived where the level changes drastically as a
function of time. Often engineers are interested in a
single number that is comparable with other data.
Fig. 10(a) shows the loudness versus time function of
pieces of broadcast music interrupted by a commercial.
In order to get an estimate of the loudness perceived by
the listener, the so-called cumulative loudness
distribution is calculated for the different parts of the
broadcast, as indicated by the numbers and the dashed
vertical dividing lines. The cumulative loudness
distribution supplies information about the probability
that a given loudness is exceeded. This probability is
shown in Fig. 10(b) for the three different temporal parts
indicated by the numbers in Fig. 10(a). At the start of the
specific sequence, around (0), a jingle is presented.
Comparisons of the loudnesses perceived by many
subjects have indicated that the average loudness
corresponding to N50 (the loudness exceeded in 50% of
the time) gives an inadequate number, whereas N5 to
N10 give adequate readings of what the subjects really
perceive. It becomes very clear from Fig. 10(b) that
the commercial (2) is perceived far louder than the
adjacent pieces of music (3).
Sometimes in broadcasting different voices follow
each other in the program, with the level being
monitored on a volume meter. Adjustment for equal level of
the voices often leads to strongly unequal loudness
perceived by the listener (the final receiver), who
can be rather annoyed by this. In accordance with
the basic idea introduced at the beginning of this
paper, it would be much better to control the
broadcasting level utilizing a loudness level meter [11],
[12] rather than a volume meter, the reading of which is
only of importance for preventing equipment overload
but not for the listener.
Total loudness can be derived from the 24
specific-loudness channels by summing up all 24 channels
and feeding this function through a special low pass
which, in useful approximation, reproduces the behavior of
our auditory system with regard to temporal effects in
loudness perception. Through this special low pass, the
time function of the perceived loudness is strongly
smoothed, but shows single syllables with clear
separation. It is then evident that peak loudness,
normally assumed to be the perceived loudness, is
produced by the vowels in speech. Consonants and
plosives are very important for the understanding of
speech and are also very clearly visible in the
specific-loudness versus critical-band-rate versus time
pattern; their contribution to the total loudness, however,
is almost negligible.
Fig. 9. Specific-loudness versus critical-band-rate versus time pattern of spoken word "electroacoustics." Specific loudness is
plotted for 22 discrete values of critical-band rate. Ordinate scale is marked at panel related to 21 bark. Abscissa: time; 200 ms
is indicated. Total loudness as a function of time is plotted on top. Adopted from [6].
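The loudness statistics used in Sec. 3.1 (summing the specific-loudness channels to a total loudness, and reading N5, N10, or N50 from the cumulative loudness distribution) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the special low pass described in the text is not specified in enough detail to reproduce and is left out, the function names are hypothetical, and the loudness trace is invented test data.

```python
def total_loudness(specific_loudness, dz_bark=1.0):
    """Sum specific loudness (sone/bark) over channels spaced dz_bark apart.

    Returns loudness in sone. The temporal low pass is omitted here.
    """
    return sum(specific_loudness) * dz_bark

def percentile_loudness(loudness_trace, percent):
    """Loudness exceeded in `percent` % of the time (N5, N10, N50, ...)."""
    ordered = sorted(loudness_trace, reverse=True)
    # `percent` % of the samples lie above position k in the sorted list.
    k = int(len(ordered) * percent / 100.0)
    k = min(max(k, 0), len(ordered) - 1)
    return ordered[k]

# Invented trace: quiet passages at 2 sone, loud passages (20% of the time)
# at 10 sone, sampled at equal time steps.
trace = [2.0] * 80 + [10.0] * 20

n5 = percentile_loudness(trace, 5)
n10 = percentile_loudness(trace, 10)
n50 = percentile_loudness(trace, 50)
print(n5, n10, n50)
```

In this invented trace N5 and N10 follow the loud passages while N50 stays at the level of the quiet passages, which illustrates why the median can be a poor indicator of perceived loudness.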
3.2 Sharpness
Sharpness is an important concept correlated with
the color of sound, and can also be calculated from the
specific-loudness versus critical-band-rate pattern. It
was found that the sharpness of narrow-band noises
increases proportionally with the critical-band rate for
center frequencies below about 3 kHz. At higher frequencies, however, sharpness increases more strongly,
an effect that has to be taken into account when the
sharpness S is calculated using a formula that gives
the weighted first moment of the critical-band-rate
distribution of specific loudness,
$$ S = 0.11 \, \frac{\int_{0}^{24\,\mathrm{bark}} N' \, g(z) \, z \, \mathrm{d}z}{\int_{0}^{24\,\mathrm{bark}} N' \, \mathrm{d}z} \ \mathrm{acum}. \qquad (2) $$
In Eq. (2) the denominator gives the total loudness,
while the upper integral is the weighted moment
mentioned. The weighting factor g(z) takes into account
the fact that spectral components above 3 kHz contribute
more to sharpness than components below that frequency.
An example of the calculation of sharpness is
given in Fig. 11 for uniform exciting noise and for a
high-pass noise above 3 kHz. The weighted specific
loudnesses are shown as a function of the critical-band
rate together with the location of their first moment
(center of gravity) marked by arrows. When the cutoff
frequency of the high-pass noise is shifted toward lower
values and the noise is finally transformed into a uniform
exciting noise, loudness increases quite strongly;
however, sharpness decreases markedly, in agreement with
psychoacoustical results.
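A numerical sketch of Eq. (2) may help to show how the weighted first moment behaves. The paper does not give the weighting factor g(z) explicitly, so the function below is an assumption: it equals 1 up to 16 bark (roughly 3 kHz) and rises exponentially above, with constants chosen for illustration. The flat test spectra are also invented and only mimic the two cases of Fig. 11 qualitatively.

```python
import math

def weighting_g(z_bark):
    """Assumed weighting factor g(z): 1 up to 16 bark, rising above.
    The constants are illustrative, not taken from the paper."""
    if z_bark <= 16.0:
        return 1.0
    return 0.066 * math.exp(0.171 * z_bark)

def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))

def sharpness_acum(z, n_spec):
    """Eq. (2): 0.11 * weighted first moment / total loudness, in acum."""
    total = trapezoid(n_spec, z)
    if total <= 0.0:
        return 0.0  # no loudness, so no sharpness to report
    weighted = trapezoid(
        [n * weighting_g(zi) * zi for n, zi in zip(n_spec, z)], z)
    return 0.11 * weighted / total

# Critical-band-rate grid from 0 to 24 bark in 0.1-bark steps.
z = [0.1 * i for i in range(241)]
uniform = [1.0] * len(z)                               # flat over all bands
high_pass = [1.0 if zi >= 16.0 else 0.0 for zi in z]   # only above ~3 kHz

s_uniform = sharpness_acum(z, uniform)
s_high_pass = sharpness_acum(z, high_pass)
print(s_high_pass > s_uniform)
```

Under these assumptions the high-pass pattern yields the larger sharpness although its total loudness is smaller, in line with the behavior described for Fig. 11.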
3.3 Fluctuation Strength
Fluctuation strength is a sensation correlated with the
temporal variation of sounds. It was examined quite
extensively by Fastl [13] during the last decade. It is
important for the transmission of music as well as for the
transmission of speech. Interestingly, the fluctuation
strength measured as a function of the modulation
frequency shows a maximum near 4 Hz, a value for which the
frequency of syllables in running speech has a maximum
as well.
Fluctuation strength can also be calculated using the
temporal dependence of the specific-loudness versus
critical-band-rate pattern. The period of the modulation
(or its frequency) as well as the ratio between maximum
specific loudness and minimum specific loudness are
of importance. Without going into detail, the influence
of room acoustics on the fluctuation strength may be
illustrated by using a 100% amplitude-modulated
1-kHz tone. Such a tone, recorded under free-field conditions, is played back in a room. The 100% amplitude
modulation is decreased quite strongly to a nonsinusoidal amplitude modulation (Fig. 12). The specific
loudness corresponding to the frequency range around 1
kHz is shown as a function of time in Fig. 12(a) for the
recorded sound and in Fig. 12(b) for the sound picked
up by a microphone in the room. The difference between
the two time functions is remarkable, indicating that room
acoustics influence the fluctuation strength quite
strongly and thus the quality of sound reproduction.
Actual values in the example illustrated lead to a 75%
reduction of fluctuation strength.
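The dependence on modulation frequency described above, with a maximum near 4 Hz, can be illustrated with a simple band-pass weighting. The paper does not state the model, so the shape below is an assumption: a curve that is symmetric on a logarithmic frequency axis and normalized to 1 at the peak. The function name and the peak parameter are hypothetical.

```python
def modulation_weight(f_mod_hz, f_peak_hz=4.0):
    """Assumed relative weighting of fluctuation strength versus
    modulation frequency, normalized to 1 at f_peak_hz."""
    return 2.0 / (f_mod_hz / f_peak_hz + f_peak_hz / f_mod_hz)

for f_mod in (1.0, 2.0, 4.0, 8.0, 16.0):
    print(f"{f_mod:5.1f} Hz -> {modulation_weight(f_mod):.2f}")
```

With this assumed shape, modulation two octaves above or below 4 Hz contributes a little less than half as much as modulation at 4 Hz. The dependence on the ratio of maximum to minimum specific loudness is not modeled here.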
Fig. 10. (a) Loudness-time function of broadcast including a jingle at (0), preparation for a commercial (1), commercial (2),
and music (3). (b) Cumulative loudness distributions.
Fig. 11. Sharpness of uniform exciting noise (broken line, area
hatched lower left to upper right), and high-pass noise (dotted
line, hatched upper left to lower right). Weighted
specific loudness is shown as a function of critical-band
rate. Calculated sharpness is indicated by vertical arrows.
Adopted from [5].
Fig. 12. Specific-loudness versus time function of tone with 100% amplitude modulation. (a) In free-field condition. (b) Played
back in room.
3.4 Room Acoustics
Room acoustics, however, produce positive effects,
too. For example, reverberation increases the loudness
of a speaker in a room because of the many reflections
that finally lead to a more diffuse field rather than to a free
field. As an example, a speaker is approximated as a
source of constant volume velocity, and Fig. 13
indicates the effect of increasing reverberation in a
room when the same speech source produces loudness
versus time functions under three different conditions.
Curves (a) give the free-field condition, curves (b) the
condition for a room with a reverberation time of 0.6 s,
and curves (c) give the same for a room with a
reverberation time of 2.5 s. Short periods from a 10-min
speech are shown in the left part of Fig. 13. The right part
indicates the corresponding cumulative distributions
resulting from the loudness versus time functions for
the three conditions. Using the loudness exceeded in
10% of the time as an indication of the perceived loudness,
it can be expected that the speech is 1.2 times louder in
the room with 0.6-s reverberation time and about two
times louder in the room with 2.5-s reverberation time
compared with the loudness produced in the free-field
condition. This increment in loudness is often very helpful
for the intelligibility of speech in rooms as long as the
reverberation time does not produce temporal masking,
which reduces the audibility of faint consonants
following loud vowels.
Fig. 13. Effect of reverberation time on loudness-time functions (left) and on loudness distributions (right). Data obtained (a)
in free-field condition and in rooms with reverberation times of (b) 0.6 s and (c) 2.5 s indicate increase of loudness with increasing
reverberation time. Adopted from [6].
3.5 Digital Transmission and Reproduction
of Audio Signals at Reduced Bit Rate
Transmission and reproduction at reduced bit rate
(especially in light of the proposed realizations for DAB)
as a new and important area in electroacoustics and
audio engineering were a major motivation for writing
this paper. The pattern of specific loudness versus
critical-band rate versus time can be used as a 'yardstick'
that we have to follow in order to reduce information
without introducing audible distortion of the sound.
This holds for music as well as for speech. The systems
realized in this area strictly follow this basic
idea and, so far, mostly masking effects have been
taken into account. Since this particular area is well
covered by other publications and this paper deals with
the fundamental ideas behind DAB, there is no need
to go into the technical details here. It should be
mentioned, though, that in music the physical equivalent of
spectral pitch percepts, which can be extracted by a
hearing-equivalent spectral analysis, can also be used
as a tool to reduce the information flow drastically without
making this reduction audible [14].
The specific-loudness versus critical-band-rate versus
time pattern produced by music or speech can be seen on
the screen incorporated in modern loudness meters. It is
very interesting and impressive to listen to music or
speech and at the same time look at this information flow
indicated by the movement of this pattern. It illustrates
very strongly what we have tried to convey to the reader
as the basic idea behind the data reduction necessary for
efficient DAB.
4 CONCLUSIONS
The specific-loudness versus critical-band-rate versus
time pattern contains all the information that is used
by our auditory system in order to produce the different
hearing sensations. We propose not to transfer less
information than contained in this pattern; however,
we also do not need to transfer more than this information. A reproduction accuracy of 1 dB in excitation
level, corresponding to a relative value of 7% in specific
loudness, is sufficient for practical applications.
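As a rough plausibility check of the 7% figure, assume the familiar rule of thumb that loudness doubles for every 10-dB increase in level, and apply it to specific loudness N' and excitation level L_E. This is an assumption made here for illustration, not a relation stated in this section. A 1-dB step then gives

$$ \frac{N'(L_E + 1\,\mathrm{dB})}{N'(L_E)} = 2^{1/10} \approx 1.072, $$

that is, a relative change of about 7%.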
A Personal Comment of E. Z.
In 1950 I had to solve the problem of why
tape-recorded music was accepted differently if
recorded and played back by different apparatus. The
barely noticeable amplitude and frequency modulation
as a function of modulation frequency and level, as
characteristics of the final receiver (our auditory system),
have led to the solution. Today, after having published
more than 200 papers related to the field, I am
propagating the same approach for a solution of the problems
in modern electroacoustics and audio engineering,
although at a somewhat higher level. It is obvious that we
have learned quite a bit during the last 40 years, and I
would like to thank all those who have contributed to that
very much.
5 REFERENCES
[1] D. Krahe, "Ein Verfahren zur Datenreduktion
bei digitalen Audio-Signalen unter Ausnutzung
psychoakustischer Phänomene," Rundfunktech. Mitt., vol.
30, no. 3, pp. 117-123 (1986).
[2] G. Stoll, M. Link, and G. Theile, "Masking-Pattern
Adapted Subband Coding: Use of the Dynamic
Bit-Rate Margin," presented at the 84th Convention of
the Audio Engineering Society, J. Audio Eng. Soc.
(Abstracts), vol. 36, p. 382 (1988 May), preprint 2585.
[3] G. Stoll and Y. Dehery, "High Quality Audio
Bit-Rate Reduction System Family for Different
Applications," in Proc. IEEE ITC '90 (1990), pp. 937-941.
[4] E. Zwicker and R. Feldtkeller, Das Ohr als
Nachrichtenempfänger (Hirzel, Stuttgart, 1967).
[5] E. Zwicker, Psychoakustik (Springer, Berlin,
1982).
[6] E. Zwicker and H. Fastl, Psychoacoustics: Facts
and Models (Springer, Berlin, 1990).
[7] E. Zwicker and M. Zollner, Elektroakustik
(Springer, Berlin, 1987).
[8] H. Fastl, "Dynamic Hearing Sensations: Facts
and Models," Trans. Comm. on Hearing Research
H-84-13, Acoust. Society of Japan, 1984.
[9] E. Zwicker, "The Inner Ear, a Sound Processing
and a Sound Emitting System," J. Acoust. Soc. Jpn
(E), vol. 9, pp. 59-74 (1988).
[10] ISO 532, "Acoustics: Method for Calculating
Loudness Level," International Organization for
Standardization, Geneva, Switzerland, 1975.
[11] B. Bauer and E. L. Torick, "Researches in
Loudness Measurements," IEEE Trans. Audio
Electroacoust., vol. AU-14, no. 3, 1966.
[12] B. L. Jones and E. L. Torick, "A New Loudness
Indicator for Use in Broadcasting," SMPTE (1981
Sept.).
[13] H. Fastl, "Fluctuation Strength of Modulated
Tones and Broadband Noise," in R. Klinke and R.
Hartmann, Eds., Hearing: Physiological Bases and
Psychophysics (Springer, Berlin, 1989).
[14] W. Heinbach, "Aurally Adequate Signal
Representation: The Part-Tone-Time-Pattern," Acustica,
vol. 67, pp. 113-121 (1988).
THE AUTHORS
Eberhard Zwicker was born in Öhringen, Germany,
in 1924. He studied physics at the University Tübingen
(1945/46), and electrical engineering (communications)
at the Technical University Stuttgart from 1946 to 1950.
He received a doctor-engineer degree in electroacoustics
in 1952. That year, as scientific assistant with Professor
Dr. R. Feldtkeller at the Technical University Stuttgart
(TUS), he began a career of teaching and research. He
spent one year as a researcher at the Harvard University
Psychoacoustics Laboratories, Cambridge, MA (1956/
57), as associate professor at the TUS (1957/61), lecturing with Dr. J. Zwislocki at Syracuse University
Bio-Acoustic-Laboratory, Syracuse, NY, with visits
to numerous colleges in the USA at the invitation of
the American Institute of Physics (1961/62); extraordinary professor at TUS (1962/67); research at Bell
Telephone Labs, Murray Hill, NJ (1964); and as Professor and Director of the Institute of Electroacoustics,
Technical University Munich (1967/90).
During those years, Professor Zwicker was also a
member of the Kuratorium der Technisch-Physikalischen Bundesanstalt (1970/76), Speaker of the special
research group Cybernetics sponsored by the Deutsche
Forschungsgemeinschaft (1971/77), Dean of the Faculty
of Electrical Engineering at the Technical University
of Munich (1977/79), Speaker of the special research
group Hearing sponsored by the Deutsche Forschungsgemeinschaft (1983/90), and did research and lecturing
throughout the world including the USA, Great Britain,
Japan, France, Switzerland, The Netherlands, Belgium,
Spain, Poland, Czechoslovakia, Hungary, Italy, Austria, and Argentina.
Professor Zwicker's committee activity in the field
of acoustics started in 1955 as a member of the German
DIN standardization committees on Acoustic Measurements, Loudness and Noise Measurements, and
Electronic Filters. In 1958 he became a member of
ISO TC 43/working group Loudness From Objective
Analysis, and was the German delegate to ISO meetings
in Stockholm (1958), Rapallo (1960), Helsinki (1961),
and Baden-Baden (1962). In 1959 he was secretary of
the 3rd Congress of the International Commission on
Acoustics in Stuttgart. He became international correspondent of the Committee of Hearing and Bioacoustics (CHABA) in 1963, a member of the International Commission on Acoustics from 1966 to 1972,
and in 1984 was corresponding member of the Institute of
Noise Control Engineering.
In 1956, Professor Zwicker was awarded the venia
legendi in electroacoustics by the Nachrichtentechnische
Gesellschaft. He received a Fellowship from the
Acoustical Society of America in 1962, and, in 1987,
he was awarded that society's Silver Medal. In 1982
he was made an Honorary Member of the Audio Engineering
Society, and in 1988 he received the following
awards: Bundesverdienstkreuz am Bande des Verdienstordens
der Bundesrepublik Deutschland, the
Karl-Küpfmüller-Ehrenring from the Technical University
Darmstadt, and the Preis der Hörgeräte-Akustiker.
In 1990 October Professor Zwicker retired from his
duties as Director of the Institute of Electroacoustics
at the Technical University Munich. A little more than a
month later, on November 22, he died of cancer at his
home in Icking, Germany. In 1991 February, the AES
Gold Medal was awarded to him posthumously.
Professor Zwicker's obituary appears in In Memoriam
in the 1991 March issue of the AES Journal.
•
Ulrich Tilmann Zwicker was born in Stuttgart, Germany, in 1955. He studied physics and electrical engineering at the Technical University Munich (TUM)
from which he received a bachelor's degree specializing in
communications engineering, electroacoustics, and
psychoacoustics in 1978, and a master's degree in 1981.
After graduating, Dr. Zwicker became a research
associate at the Institute of Electroacoustics (TUM) in
the area of acoustics, electroacoustics, psychoacoustics,
acoustical measurements, and audio. In 1983 he became
assistant professor at the Institute of Instrumentation
(TUM) where his research moved into the field of high-frequency/solid-state acoustics, instrumentation, and
control. In addition to administrative tasks, his work
included teaching instrumentation with associated laboratory courses, and cooperative projects with Siemens
AG, Mercedes-Benz AG, and the 1st Institute of Metrology, Beijing, China. In 1988 he received the Dr.-Ing. degree with a dissertation in instrumentation.
In 1989, Dr. Zwicker continued his research in the
area of psychoacoustics and acoustics as visiting research associate to the Department of Audiology and
Department of Electrical and Computer Engineering
at Northeastern University, Boston, MA; the Physics/
Astronomy Department of Michigan State University,
East Lansing, MI; and the Department of Neurophysiology of the University of Wisconsin Medical School,
Madison, WI. He became an associate professor at the
Institute of Electroacoustics (TUM) in 1990 doing research in binaural hearing. In 1990 September, he joined
the European Patent Office, in Munich, as a patent
examiner doing substantive examination.
Dr. Zwicker has published extensively and has given
numerous invited and contributed lectures in his field of
interest. He is a member of the AES.