PAPERS
Audio Engineering and Psychoacoustics: Matching Signals to the Final Receiver, the Human Auditory System*
EBERHARD ZWICKER AND U. TILMANN ZWICKER
Institute of Electroacoustics, Technical University Munich, D-8000 München 2, Germany

The consequences of the fact that the human auditory system is the final receiver in almost all cases of sound recording, transmission, and reproduction are discussed. The strategies of processing and transmitting sound as effectively as possible on the one hand, and as "undistorted" as possible on the other, need to be adapted to the perception characteristics of the auditory system. The tools used for this adaptation are the transformation of frequency to critical-band rate and the transformation of level to specific loudness. Examples of practical applications of the basic idea are illustrated.

0 INTRODUCTION

During the last few years, digital sound processing and storage have been adopted widely in audio and are providing excellent sound quality. However, converting a stereo audio signal to a 16-bit digital format, with appropriate redundancy for error correction and a minimum sampling rate of around 44 kHz, requires a greatly extended bandwidth for signal transmission and storage, the latter coupled with huge mass-storage requirements. The large bandwidth causes problems for radio transmission in particular, so there is considerable interest in avoiding any redundancy in the signal other than that needed for error correction. To achieve sound transmission or reproduction that is not only very good but also efficient, all equipment has to be adapted to the characteristics of the final receiver, in this case the human ear. Any part of the transmitted signal that is not perceived by the auditory system is poorly matched to the receiver and constitutes unnecessary redundancy. Considerable progress has been made in implementing methods that remove such unnecessary data, based on findings in the field of psychoacoustics.
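To make the bandwidth argument concrete, the raw bit rate of a two-channel, 16-bit signal can be computed directly. This is a minimal sketch under stated assumptions: it uses the CD sampling rate of 44.1 kHz for the text's "around 44 kHz" and ignores the error-correction redundancy mentioned above, so an actual channel would need a higher rate still. The function names are illustrative only.

```python
# Raw PCM bit rate for the stereo case mentioned in the introduction.
# Assumption: 44.1 kHz sampling (the text says "around 44 kHz");
# error-correction redundancy is NOT included.

def pcm_bit_rate(channels: int, bits_per_sample: int, sample_rate_hz: int) -> int:
    """Bits per second of uncompressed linear PCM."""
    return channels * bits_per_sample * sample_rate_hz


def storage_bytes(bit_rate: int, seconds: float) -> float:
    """Storage needed for `seconds` of audio at `bit_rate` bit/s."""
    return bit_rate * seconds / 8


if __name__ == "__main__":
    rate = pcm_bit_rate(2, 16, 44_100)
    print(f"{rate} bit/s")                                        # 1411200 bit/s
    print(f"{storage_bytes(rate, 3600) / 1e6:.0f} MB per hour")   # 635 MB per hour
```

Roughly 1.4 Mbit/s before any coding overhead is the figure that perceptually based data reduction aims to cut down.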
Most of these data-reduction efforts concern the future digital audio broadcasting (DAB), for example [1]-[3], but digital storage can profit from the possible information reduction as well.

* Manuscript received 1990 July 18; revised 1990 December 12. J. Audio Eng. Soc., Vol. 39, No. 3, 1991 March.

In fields other than audio, such as the transmission of electrical power, adaptation to the final receiver is very well established and generally applied, for transmission from power plant to power plant as well as from power plants to factories and even to individual households. In the field of information transmission, the same rule holds as for power transmission. Therefore all of our efforts to improve electroacoustic information transmission, including recording, have to be seen from the perspective of the final receiver, the human auditory system. As discussed in this paper, this perspective offers many further advantages in audio engineering, for example in instrumentation and in public-address applications.

1 THE FINAL RECEIVER: THE HUMAN AUDITORY SYSTEM AND PERCEPTION

What ultimately matters is the perception of sound. We do not perceive frequency; rather, we perceive pitch. We do not perceive level, but loudness. We do not perceive spectral shape, modulation depth, or modulation frequency; instead we perceive "sharpness," "fluctuation strength," or "roughness." Nor do we perceive time directly: what we perceive is subjective duration, which is often quite different from physical duration. In all of the hearing sensations mentioned, which are described in detail elsewhere [4]-[6], masking plays an important role in the frequency domain as well as in the time domain. Consequently Sec. 2 deals with masking effects and with the transformations from the frequency scale to the critical-band-rate scale and from the level scale to the specific-loudness scale.
The information received by our auditory system can be described most effectively in the three dimensions of specific loudness, critical-band rate, and time. The resulting three-dimensional pattern is the measure from which sound quality can be assessed. Some applications of this pattern, which is reproduced, for example, in a modern loudness meter, are discussed, especially in view of modern electroacoustic transmission and reproduction. In this paper the main emphasis is on practical applications of psychoacoustics in the perception and reproduction of sound, and many scientific details are therefore omitted; instead, the basic, important facts are emphasized. Further information is available in books on psychoacoustics and electroacoustics [4]-[8].

2 PSYCHOACOUSTICAL PRINCIPLES APPLICABLE IN AUDIO ENGINEERING PRACTICE

2.1 Transformation from Frequency to Critical-Band Rate

The effect of masking plays a very important role in hearing and is differentiated into simultaneous and nonsimultaneous masking. An example of the simultaneous condition is a conversation with a neighbor while a loud truck passes by. In this case our conversation is severely disturbed; to continue it successfully we have to raise our voices to produce more speech power and greater loudness. Similar effects take place in music: the different instruments can mask each other, and softer instruments become audible only when the loud instrument pauses. Simultaneous masking is outlined here for conditions that are easy to describe quantitatively, while nonsimultaneous masking is discussed in Sec. 2.3.

Simultaneous masking can be understood more easily if a hearing-equivalent scale, that is, the critical-band-rate scale, is used instead of the frequency scale. Masking is usually described as the sound-pressure level a test sound (in most cases a pure tone) needs in order to be barely audible in the presence of a masker. With narrow-band noises as maskers and pure tones as test sounds, masking patterns can be produced for different center frequencies of the narrow-band noise maskers, as shown in Fig. 1. Fig. 1(a) and (b) give the same information; however, in Fig. 1(a) the level of the barely audible pure tone is plotted as a function of frequency on a linear scale, whereas Fig. 1(b) uses a logarithmic scale. To make the masking patterns directly comparable by giving them the same peak values, the so-called masking index, a value of 2-6 dB (for details see the literature mentioned), is added to the sound-pressure level of the barely audible test tone; the resulting level, called excitation level, is shown as the ordinate. The level of the narrow-band maskers is 60 dB for all curves.

Fig. 1. Excitation level (masking level with added masking index) of narrow-band noises of given center frequency as a function of frequency (0.5 to 20 kHz). Broken lines: threshold in quiet. (a) Linear frequency scale. (b) Logarithmic frequency scale.

Comparing the results produced by maskers of different center frequencies, we find the shapes of the curves to be rather dissimilar, no matter which frequency scaling we use. The shapes appear similar on a linear frequency scale for center frequencies up to about 500 Hz, whereas above 500 Hz they appear similar on a logarithmic frequency scale. This intuitive result is quite accurate, since the hearing-equivalent critical-band-rate scale follows a linear frequency scale up to about 500 Hz and a logarithmic frequency scale above 500 Hz. This relation is illustrated in Fig. 2 by two different frequency scales, one divided linearly, the other logarithmically. Approximations that may be useful within certain frequency ranges are also indicated. Fig. 2(a) shows the uncoiled inner ear, including the basilar membrane. It indicates that the critical-band-rate scale is directly related to the place along the basilar membrane where all the sensory cells are located in a highly regular, equidistant arrangement (one row of inner hair cells and three rows of outer hair cells). Thus the critical-band-rate scale is closely related to our physiology as well.

The critical-band concept is based on the well-proven assumption that our auditory system analyzes a broad spectrum in parts that correspond to critical bands. Adding one critical band to the next, so that the upper limit of the lower critical band coincides with the lower limit of the next higher one, produces the critical-band-rate scale. Since critical bands are 100 Hz wide up to 500 Hz and have a relative width of 20% above 500 Hz, it becomes clear why the critical-band rate depends on frequency as illustrated. This can also be seen in Fig. 2(c), where the critical-band rate is plotted as a function of frequency on a logarithmic scale, a scale more appropriate for approximating the critical-band rate. The latter fact is especially advantageous for problems dealing with speech transmission, where important spectral features are located between 300 and 5000 Hz. However, it is also necessary to realize that the linear relation between frequency and critical-band rate plays an important role in music based on harmony.

Fig. 2. (a) Scale of uncoiled cochlea. (b), (c) Critical-band rate z (ordinate, linear scale) as a function of frequency f on (b) a linear and (c) a logarithmic frequency scale. Useful approximations are indicated by broken lines and the related equations, $z = f/(100\ \mathrm{Hz})$ bark at low frequencies and $z = [9 + 4\log_2(f/1\ \mathrm{kHz})]$ bark at higher frequencies. Adopted from [5].

Because the critical-band concept is used in so many models and hypotheses, a unit one critical band wide was defined for the critical-band rate. It is the bark, named in memory of Barkhausen, a scientist from Dresden, Germany, who introduced the phon, a unit describing loudness level for which the critical band plays an important role. When frequency is transformed into critical-band rate, the masking patterns of Fig. 1 change into those of Fig. 3. There the level of the barely audible pure tone (again expressed as excitation level, that is, including the masking index) is plotted as a function of critical-band rate for the same narrow-band maskers as in Fig. 1. The effectiveness of this natural frequency scale, the critical-band-rate scale, is obvious: the shapes of the curves for different center frequencies are very similar. Only at very low frequencies, below about 100 Hz, where special masking effects (such as masking-period patterns) reduce the amount of masking, is the upper slope somewhat steeper.

Fig. 3. Excitation level versus critical-band rate for narrow-band noises of given center frequency and 60-dB sound-pressure level. Broken lines: threshold in quiet. Adopted from [5].

Not only the masking effect but also many other effects, such as pitch, just-noticeable frequency differences, or the growth of loudness as a function of bandwidth, can be described more simply and understood more easily in terms of this natural scale, which corresponds to location along the basilar membrane. Therefore, when dealing with hearing sensations, it is very effective first to transform the frequency scale into the critical-band-rate scale.

The masking produced by narrow-band maskers is level dependent and therefore a nonlinear effect. As shown in Fig. 4(a), all masked thresholds rise steeply from low toward higher frequencies up to the maximum of masking. Beyond this maximum, the masked threshold decreases quite rapidly toward higher frequencies for low and medium masker levels. At higher masker levels, however, the slope toward high frequencies, that is, toward larger critical-band rates, becomes increasingly shallow. This nonlinear dependence of the upper slope of the masked threshold on masker level is assumed to be produced already in the inner ear. The rows of outer hair cells form a feedback system which, through saturation, is effective at low levels only; at higher levels the feedback automatically disappears. This leads to a masking curve whose shape corresponds to the amplitude of the traveling wave along the basilar membrane, as seen for higher levels. Recent measurements of this traveling wave at very low levels have proven that feedback already takes place in the inner ear, producing a narrower amplitude distribution of the traveling wave and consequently a narrower masking pattern at low masker levels. This is also seen in masking patterns produced by a model that reproduces the peripheral preprocessing in the inner ear, including nonlinear feedback with lateral coupling. The real masking data [Fig. 4(a)] and the model data [Fig. 4(b)] agree very well. Therefore, and in accordance with physiological data from animals, it can be assumed that simultaneous masking is already produced in the peripheral preprocessing of the inner ear, that is, before the information is transferred to the neural level.

Another possibility for measuring masking is provided by psychoacoustical tuning curves.
In this case the level of the test tone is fixed, while the level of the masker, in most cases also a tone, is increased until the test tone just becomes inaudible. Plotting this masker level as a function of critical-band rate yields the so-called psychoacoustical tuning curve. Such a curve is outlined in Fig. 4(c); as described in the following, its shape correlates quite strongly with the data of Fig. 4(d). The assumption that frequency selectivity takes place in the peripheral part of the auditory system (the inner ear) and produces the critical-band-rate scale is also supported by experiments on the suppression of so-called spontaneous otoacoustic emissions [9], which appear in half of all ears at levels around 0 dB SPL. These tonal emissions, which have been proven to be produced in the inner ear, can be suppressed by adding a suppressor tone, whose level is given in the related suppression tuning curves as a function of critical-band rate. The curve shown corresponds to a criterion of 6-dB amplitude reduction of the spontaneous otoacoustic emission. These are objective data, because they do not depend on a subject's response at all. Comparing Fig. 4(c) and (d), one can easily see that the psychoacoustically measured tuning curves, which involve the highest signal-processing level in the brain, show the same frequency selectivity as the suppression tuning curves, which result from purely peripheral processing. Therefore the frequency-selective and nonlinear effect of simultaneous masking can be assumed to be produced already in the peripheral part of the auditory system, the inner ear, and still in the analog domain, that is, before the signal information is transferred into neural information coded as spike rates.
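The frequency-to-critical-band-rate transformation of this section can be sketched in a few lines. This is an illustrative sketch, not a standardized conversion: `bark_piecewise` uses the two straight-line approximations indicated in Fig. 2, while `bark_by_stacking` literally adds abutting critical bands using the rule of thumb quoted above (100 Hz wide up to 500 Hz, 20% of the frequency above), integrated in closed form. Both function names are hypothetical.

```python
import math


def bark_piecewise(f_hz: float) -> float:
    """Critical-band rate in bark from the two approximations of Fig. 2:
    z = f / (100 Hz) below 500 Hz, z = 9 + 4*log2(f / 1 kHz) from 500 Hz up."""
    if f_hz <= 0.0:
        raise ValueError("frequency must be positive")
    if f_hz < 500.0:
        return f_hz / 100.0
    return 9.0 + 4.0 * math.log2(f_hz / 1000.0)


def critical_bandwidth_hz(f_hz: float) -> float:
    """Rule of thumb from the text: 100 Hz wide up to 500 Hz, 20% of f above."""
    return 100.0 if f_hz <= 500.0 else 0.2 * f_hz


def bark_by_stacking(f_hz: float) -> float:
    """Stack abutting critical bands: z(f) = integral of df / bandwidth(f).
    Closed form for the rule above: f/100 below 500 Hz, 5 + 5*ln(f/500) above."""
    if f_hz <= 0.0:
        raise ValueError("frequency must be positive")
    if f_hz <= 500.0:
        return f_hz / 100.0
    return 5.0 + 5.0 * math.log(f_hz / 500.0)


if __name__ == "__main__":
    for f in (100.0, 500.0, 1000.0, 4000.0):
        print(f"{f:7.0f} Hz  piecewise {bark_piecewise(f):5.2f} bark"
              f"  stacked {bark_by_stacking(f):5.2f} bark")
```

Both variants give 5 bark at 500 Hz. At 1 kHz, stacking gives about 8.47 bark, close to the 8.5 bark the text uses for 1 kHz, whereas the straight-line approximation gives 9 bark; the approximations are only meant for the frequency ranges where they fit.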
Since the arrangement of the hair cells is equidistant and the form of the traveling wave in the inner ear, apart from its shift along the basilar membrane, does not change much as a function of frequency, it becomes understandable why the natural scale of critical-band rate, which corresponds to location along the basilar membrane, is the adequate scale for describing frequency and frequency-selective effects in hearing.

Fig. 4. (a) Level of test tone barely masked by narrow-band noise of given level. (b) Test-tone level needed in model to produce 1-dB increment anywhere along basilar membrane. (c) Psychoacoustical and (d) suppression tuning curve; each as a function of critical-band rate (abscissa in bark, ordinates in dB).

2.2 Transformation from Level to Specific Loudness

When we talk about loudness in quantitative terms, we often think of the loudness function of a 1-kHz tone. This function is established by answering the question of how much louder a sound is heard relative to a standard sound. The standard sound in electroacoustics is a 1-kHz tone, and the reference level in this case is 40 dB. Many measurements in different laboratories have produced similar results, so that the loudness function of a 1-kHz tone in the free field was eventually standardized. It is given in Fig. 5 as a solid curve. With the definition that a 1-kHz tone of 40-dB SPL has a loudness of 1 sone, the curve indicates that doubling the loudness from 1 to 2 sone is equivalent to increasing the sound-pressure level from 40 to 50 dB.
The same holds for higher levels: each increment of 10 dB in the level of the 1-kHz tone doubles its loudness. Thus 50 dB corresponds to 2 sone, while 100 dB corresponds to 64 sone. Above 40 dB, the loudness function of the 1-kHz tone is a power law if loudness is related to sound intensity. Its exponent follows directly from the fact that a 10-dB increment increases loudness by a factor of 2, which in logarithmic terms is an increment of 3 dB. Therefore the exponent of the power law connecting loudness with the sound intensity of the 1-kHz tone for sound-pressure levels above 40 dB is 3/10 = 0.3. At sound-pressure levels below 40 dB, the loudness function becomes steeper and steeper toward threshold in quiet, which by definition corresponds to a loudness of 0 sone. On the logarithmic loudness scale this zero corresponds to a value of minus infinity.

Fig. 5. Loudness function of 1-kHz tone (solid line) and uniform exciting noise (dotted line). Loudness is given as a function of sound-pressure level. Approximations using power laws are indicated as broken and dash-dotted lines together with the corresponding equations. Adopted from [5].

From the masking pattern outlined in Fig. 4(a) and the corresponding excitation pattern, we already know that a 1-kHz tone, although it has an infinitely small spectral width, does not produce an infinitesimally narrow excitation in our auditory system, the final receiver, and hence not on the critical-band-rate scale either. Instead, it produces an excitation over a range that widens with increasing SPL of the 1-kHz tone. Although it is easily described in purely physical terms, the 1-kHz tone therefore produces a complex pattern of excitation, which from this point of view does not seem directly useful for the question we are interested in, namely, transforming the excitation into an equivalent psychoacoustic value.

When we talk about loudness, we mean total loudness, knowing that this loudness is composed of very many partial loudnesses located along the critical-band-rate scale. The physiological equivalent of this assumption would be that all the neural activity of the sensory cells along the basilar membrane is summed into a value that finally leads to total loudness. Many experiments on the loudness of sounds of different spectral widths have shown that the instruments our auditory system uses to shape and weight the many partial loudnesses to be summed are the critical bands. If this summation, or integral, is to yield loudness in sones, the quantity we are looking for must have the dimension of sones per bark. This quantity is called specific loudness and is denoted by N'. The total loudness N is thus the integral of specific loudness over critical-band rate [10]:

$$N = \int_{0}^{24\,\mathrm{bark}} N'(z)\,\mathrm{d}z \qquad (1)$$

Since the 1-kHz tone produces a complicated excitation pattern and therefore also a complicated specific-loudness pattern, we have to look for a sound that produces a more homogeneous pattern of excitation versus critical-band rate. This sound is uniform exciting noise, which fills the entire auditory range in such a way that the same sound intensity falls into each of the 24 abutting critical bands (that is, critical bands positioned adjacent to one another without gaps). The loudness of such uniform exciting noise has been measured: a loudness of 1 sone is reached at a level of about 30 dB. The entire loudness function of uniform exciting noise is shown by the dotted line in Fig. 5. The curve rises somewhat more steeply with level than the loudness of the 1-kHz tone, at least for levels of uniform exciting noise up to about 50 dB. Above 60 dB, the dotted line can also be approximated by a straight line, shown dash-dotted in Fig. 5. This straight line again means that a power law holds between the loudness of uniform exciting noise and the sound intensity of that noise. Its exponent, however, is smaller than that of the loudness function of the 1-kHz tone (dashed straight line): it is only 0.23, and thus the two loudness functions in Fig. 5 come closer together at higher levels.

Besides the different exponents of the two loudness functions at higher levels, it is also interesting to see that the loudness of uniform exciting noise is much larger than that of the 1-kHz tone over almost the entire level range indicated. For example, the loudness of 60-dB uniform exciting noise is about 3.5 times that of a 1-kHz tone of the same level. This very distinct effect plays an important role in judging and measuring the loudness of noises. It shows very clearly that the overall sound-pressure level of broad-band noises is an extremely inadequate value if loudness is to be approximated. Unfortunately, most noises that annoy people are broad-band noises, and the A-weighted sound-pressure level is a measure of total level, which gives misleading values when used as an indication of loudness. Almost all sounds occurring in audio broadcasting and recording not only have a large bandwidth but also differ in spectral shape. Therefore meters based merely on total level (such as VU or peak-level meters) usually give readings that are quite unrelated to loudness, although, from the point of view of the listener as the final receiver, these readings should correspond to the loudness sensation as closely as possible. This point is taken up again later.
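The quantitative relations of this section can be checked with a short numerical sketch: the 1-kHz loudness function above 40 dB, the derivation of the exponent 0.3, the overall level of 24 abutting bands, and Eq. (1) as a discrete sum. The `specific_loudness` function anticipates the uniform-exciting-noise calculation that follows in the text (20 sone spread over 24 bark, power-law exponent 0.23) and, like the text, ignores the threshold region. All names and the uniform bark grid are illustrative assumptions; this is not the standardized loudness procedure.

```python
import math


def tone_loudness_sone(level_db: float) -> float:
    """1-kHz tone loudness above 40 dB SPL: 1 sone at 40 dB, doubling per 10 dB."""
    if level_db < 40.0:
        raise ValueError("this power-law form only holds above 40 dB SPL")
    return 2.0 ** ((level_db - 40.0) / 10.0)


def power_law_exponent(db_per_doubling: float = 10.0) -> float:
    """Exponent alpha in N ~ I**alpha if loudness doubles every db_per_doubling dB.
    A factor of 2 is 10*log10(2), about 3 dB, in logarithmic terms."""
    return 10.0 * math.log10(2.0) / db_per_doubling


def overall_level_db(band_levels_db) -> float:
    """Overall SPL of abutting bands: add intensities, convert back to dB."""
    return 10.0 * math.log10(sum(10.0 ** (lvl / 10.0) for lvl in band_levels_db))


# Assumed anchor: 20 sone for 64-dB uniform exciting noise spread over 24 bark.
N_PRIME_AT_50_DB = 20.0 / 24.0  # sone/bark at an excitation level of 50 dB


def specific_loudness(excitation_db: float, exponent: float = 0.23) -> float:
    """Power law N' ~ E**exponent scaled to N'(50 dB); threshold effects ignored,
    so this is only meaningful well above threshold in quiet."""
    return N_PRIME_AT_50_DB * 10.0 ** (exponent * (excitation_db - 50.0) / 10.0)


def total_loudness(specific_pattern, dz_bark: float) -> float:
    """Eq. (1) as a Riemann sum over a uniformly sampled critical-band-rate grid."""
    return sum(specific_pattern) * dz_bark


if __name__ == "__main__":
    print(tone_loudness_sone(50.0), tone_loudness_sone(100.0))  # 2.0 64.0
    print(round(power_law_exponent(), 2))                       # 0.3
    print(round(overall_level_db([50.0] * 24), 1))              # 63.8
    flat = [specific_loudness(50.0)] * 240                      # 0.1-bark steps
    print(round(total_loudness(flat, 0.1), 1))                  # 20.0
```

Passing `exponent=0.25` reproduces the practical approximation mentioned in the text: an intensity exponent of 0.25 is a square root of the sound-pressure ratio.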
Because uniform exciting noise produces the same excitation along the whole critical-band-rate scale, it can be used very conveniently to calculate the quantity we are searching for, the specific loudness, from its total loudness. Fig. 6 shows the procedure schematically. In Fig. 6(a) the excitation levels of uniform exciting noise (broken line) and of a narrow-band noise one critical band wide and centered at 1 kHz (solid line) are shown. Both distributions are given for the condition that the uniform exciting noise and the narrow-band noise have the same sound-pressure level of 64 dB. This value was chosen because the level produced by the uniform exciting noise in each of the 24 abutting critical bands is then 50 dB, leading to an overall sound-pressure level of 50 dB + 10 log 24 dB ≈ 64 dB. For the narrow-band noise, the entire intensity is concentrated around 1 kHz, corresponding to a critical-band rate of 8.5 bark. The excitation level as a function of critical-band rate reaches a peak value of 64 dB for the narrow-band noise, while it remains constant at 50 dB for the uniform exciting noise.

Using our assumption that total loudness is the integral of specific loudness along the critical-band-rate scale [Eq. (1)], we can calculate the specific loudness corresponding to an excitation level of 50 dB from the total loudness of uniform exciting noise. According to Fig. 5, uniform exciting noise with a level of 64 dB produces a total loudness of 20 sone. Dividing this value by 24 bark, the entire width of the critical-band-rate scale, gives the specific loudness caused by an excitation level of 50 dB: 20 sone / 24 bark ≈ 0.83 sone/bark. The same procedure can be used to calculate the relation between specific loudness and excitation level for other values of the excitation level, that is, for other total levels and total loudnesses of uniform exciting noise. The results show that specific loudness is related to excitation in the same way as the total loudness of uniform exciting noise is related to the sound intensity of the noise at high levels, namely, through a power law with an exponent of 0.23. The effect of threshold, which influences the relation between specific loudness and excitation level for levels between threshold and about 40 dB above threshold, is ignored here for simplicity. For practical applications, the exponent of 0.23 is often approximated by 0.25, because an exponent of 0.25 with respect to sound intensity corresponds to an exponent of 0.5 with respect to sound pressure, and this square root is easy to realize technically.

The distribution of specific loudness as a function of critical-band rate for the 1-kHz narrow-band noise with the same level of 64 dB is shown in Fig. 6(b) by a solid line. The loudness of the two noises, that is, the integral of specific loudness over critical-band rate, is obviously quite different. For the narrow-band noise, the area below the curve, corresponding to the integral, is only about one quarter of the rectangular area of the uniform exciting noise. The same relation can be seen in Fig. 5 for a level of 64 dB, where the two curves indicate a loudness of 20 sone for the uniform exciting noise but only 5 sone for the 1-kHz tone, which is as loud as the narrow-band noise centered at 1 kHz. The distributions of specific loudness as a function of critical-band rate shown in Fig. 6(b) are the most extreme cases. The one produced by uniform exciting noise is completely flat. However, even this noise produces a flat pattern only if the frequency response between the free-field condition and the sound pressure at the eardrum is not accounted for. In our discussion we ignore this for didactic reasons, but for precise loudness measurements all these effects naturally have to be included [4], [6].

Fig. 6. (a) Excitation level and (b) specific loudness of narrow-band noise (solid lines) and uniform exciting noise (broken lines) of equal sound-pressure levels (64 dB) as a function of critical-band rate (0 to 24 bark).

The distribution of specific loudness over critical-band rate is often called the loudness pattern. This pattern varies quite drastically for different kinds of noises, tones, or complex tones. Nevertheless, the loudness pattern is the most interesting pattern for assessing sound quality under steady-state conditions, because both of its coordinates are the adequate hearing quantities: frequency is expressed via critical-band rate and level via specific loudness. If temporal effects are taken into account as well, the time-varying pattern of specific loudness versus critical-band rate contains all the information that is eventually evaluated by our auditory system.

2.3 Pattern of Specific Loudness versus Critical-Band Rate versus Time

Of the many temporal effects involved in masking, only postmasking is discussed here, because it has the biggest impact on efficient coding for digital audio broadcasting (DAB) [2]. Postmasking results from the gradual release of the effect of a masker; that is, masking does not stop immediately when the masker is switched off but persists for a while after the masker is no longer present. Postmasking depends on the duration of the masker. Fig. 7 shows a typical result for a 2-kHz test-tone burst of 5-ms duration.
The abscissa is the delay time at which the test-tone burst is presented after the end of the masker; the ordinate is the level of the test-tone burst necessary for audibility. For a long masker duration of at least 200 ms, the solid curve indicates postmasking.

Fig. 7. Dependence of postmasking on masker duration: level of barely audible test-tone burst as a function of its delay time t_d (time between end of masker and end of test tone), logarithmic scale from 5 to 500 ms. Durations of maskers T_M: 200 and 5 ms; level of masker (uniform masking noise): 60 dB; duration of 2-kHz test tone: 5 ms. Adopted from [5].

This postmasking decreases as a function of delay time from the value for simultaneous masking (plotted on the left, outside the logarithmic scale). Postmasking produced by a very short masker burst (such as 5 ms), however, behaves quite differently: in this case (dotted line in Fig. 7) it decays much faster, so that threshold in quiet is reached after only about 50 ms. This implies that postmasking depends strongly on masker duration and is therefore another highly nonlinear effect.

Specific loudness as calculated from excitation under steady-state conditions can also be considered a time-dependent quantity. Simultaneous masking and postmasking can be used to approximate the time functions of specific loudness. Using this complete transformation, the specific loudness for a tone burst of 200 ms and for a tone burst of 5 ms is plotted over time in Fig. 8. The tone bursts are placed on the linear time scale so that both end at the same instant (200 ms). For the 200-ms tone burst, the specific loudness rises very steeply and stays at its peak value for almost 200 ms. The subsequent decay does not appear to follow a single time constant.
The specific loudness of the 5-ms tone burst rises just as quickly as for the 200-ms tone burst; the decay, however, is quite different and much faster, as can be expected from the postmasking pattern shown in Fig. 7. The different behavior of the specific loudness after the end of the tone bursts is shown by a dotted and a solid line. The two different decays can be approximated very roughly by single time constants of about 30 ms for a tone-burst duration of 200 ms and about 8 ms for a duration of 5 ms. Actually, in both cases the slope is much steeper during the early decay and less steep during the later decay (compared to the approximation using only one time constant). These functions of specific loudness versus critical-band rate versus time illustrate best the information flow in the human auditory system. As three-dimensional patterns, they contain all the information that is subsequently processed and leads to the different hearing sensations. An example for such a complete pattern is shown in Fig. 9 for the spoken word "electroacoustics" fed into the auditory system. The specific loudness produced by this sound is plotted for 22 places with a 1-bark spacing along the critical-band-rate scale. For speech transmission, the spectral resolution in about 20 abutting channels is sufficient; for the transmission of music, additional information on pitch is necessary. However, the most important information, especially in music with strong time Fig. 8. Specific loudness produced by masker bursts of 200 ms (dotted line) and 5 ms (solid line) as a function of time. 16 J. Audio Eng. Soc., Vol. 39, No. 3, 1991 March PAPERS AUDIO ENGINEERING AND PSYCHOACOUSTICS temporal effects, can already be J. Audio Eng. Soc., Vol. 39, No. 3, 1991 March 17 ZWICKER AND ZWICKER seen nicely in patterns showing the specific loudness as a function of critical-band rate and of time. The pattern in Fig. 
9 clearly shows the formants of the vowels and the spectral centers of the consonants; it also shows the relatively quick rise following the stimulus, as well as a longer decay corresponding to postmasking.

Total loudness can be derived from the 24 specific-loudness channels by summing all 24 channels and feeding the result through a special low pass, which in useful approximation reproduces the behavior of our auditory system with regard to temporal effects in loudness perception. This low pass strongly smooths the time function of the perceived loudness, but single syllables remain clearly separated. It then becomes evident that the peak loudness, normally assumed to be the perceived loudness, is produced by the vowels in speech. Consonants and plosives are very important for the understanding of speech and are also clearly visible in the specific-loudness versus critical-band-rate versus time pattern; their contribution to the total loudness, however, is almost negligible.

Fig. 9. Specific-loudness versus critical-band-rate versus time pattern of the spoken word "electroacoustics." Specific loudness is plotted for 22 discrete values of critical-band rate; the ordinate scale is marked at the panel for 21 bark. Abscissa: time; a 200-ms interval is indicated. Total loudness as a function of time is plotted on top. Adapted from [6].

3 APPLICATIONS

3.1 Loudness

Loudness is a sensation of great interest in many problems related to audio engineering. For example, it is of interest how the loudness of a piece of music is perceived when its level changes drastically as a function of time. Engineers are often interested in a single number that can be compared with other data. Fig. 10(a) shows the loudness versus time function of pieces of broadcast music interrupted by a commercial. To estimate the loudness perceived by the listener, the so-called cumulative loudness distribution is calculated for the different parts of the broadcast, as indicated by the numbers and the dashed vertical dividing lines.

The cumulative loudness distribution gives the probability that a given loudness is exceeded. This probability is shown in Fig. 10(b) for the three different temporal parts indicated by the numbers in Fig. 10(a). At the start of the sequence, around (0), a jingle is presented. Comparisons of the loudnesses perceived by many subjects have indicated that the average loudness corresponding to N50 (the loudness exceeded in 50% of the time) gives an inadequate number, whereas N5 to N10 give adequate readings of what the subjects really perceive. Fig. 10(b) makes it very clear that the commercial is perceived as far louder than the adjacent pieces of music.

Sometimes in broadcasting different voices follow each other in the program, with the level being monitored on a volume meter. Adjusting the voices for equal level often leads to strongly unequal loudness perceived by the listener (the final receiver), who can be rather annoyed by this. In accordance with the basic idea introduced at the beginning of this paper, it would be much better to control the broadcasting level with a loudness level meter [11], [12] rather than a volume meter, whose reading is important only for preventing equipment overload, not for the listener.
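The two loudness computations described above can be sketched in a few lines: total loudness as the sum of the 24 specific-loudness channels (each 1 bark wide), followed by smoothing, and the percentile loudnesses N5, N10, and N50 read from the resulting time function. The paper does not specify its "special low pass," so the first-order smoother below, its time constant, and all function names are hypothetical stand-ins; the nearest-rank percentile is likewise only one of several reasonable conventions.

```python
import math
from typing import Sequence

def total_loudness(frames: Sequence[Sequence[float]], dz_bark: float = 1.0) -> list[float]:
    """Total loudness (sone) per time frame: sum of specific loudness (sone/bark)
    over critical-band channels, i.e. a rectangle-rule integral over z."""
    return [dz_bark * sum(frame) for frame in frames]

def smooth(loudness: Sequence[float], dt_ms: float, tau_ms: float) -> list[float]:
    """First-order low pass (exponential smoothing).

    HYPOTHETICAL stand-in for the paper's 'special low pass'; its real
    characteristics are not given in this section.
    """
    alpha = 1.0 - math.exp(-dt_ms / tau_ms)
    out, state = [], 0.0
    for x in loudness:
        state += alpha * (x - state)
        out.append(state)
    return out

def percent_exceeded(loudness: Sequence[float], percent: float) -> float:
    """N_x: loudness exceeded in `percent` % of the time (nearest-rank estimate)."""
    ordered = sorted(loudness, reverse=True)             # loudest frames first
    rank = max(1, math.ceil(percent * len(ordered) / 100.0))
    return ordered[rank - 1]

# Toy input: 100 frames x 24 channels; only the first 10 frames are loud.
frames = [[2.0] * 24 if i < 10 else [0.5] * 24 for i in range(100)]
n = total_loudness(frames)                               # 48 sone or 12 sone per frame
print("N5 =", percent_exceeded(n, 5), "N10 =", percent_exceeded(n, 10),
      "N50 =", percent_exceeded(n, 50))
```

In this toy sequence the loud passage fills only 10% of the frames, so N5 and N10 report its 48 sone while N50 reports only 12 sone, the kind of underestimate the text attributes to N50. In practice the output of `smooth()` would be passed to `percent_exceeded()`; the toy example skips that step so the numbers are easy to check by hand.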
3.2 Sharpness

Sharpness is an important concept correlated with the color of sound, and it can also be calculated from the specific-loudness versus critical-band-rate pattern. It was found that the sharpness of narrow-band noises increases proportionally with the critical-band rate for center frequencies below about 3 kHz. At higher frequencies, however, sharpness increases more strongly, an effect that has to be taken into account when the sharpness S is calculated using a formula that gives the weighted first moment of the critical-band-rate distribution of specific loudness:

$$S = 0.11\,\frac{\int_0^{24\,\mathrm{bark}} N'\,g(z)\,z\,dz}{\int_0^{24\,\mathrm{bark}} N'\,dz}\ \mathrm{acum} \qquad (2)$$

In Eq. (2) the denominator gives the total loudness, while the numerator is the weighted moment just mentioned. The weighting factor g(z) takes into account that spectral components above 3 kHz contribute more to sharpness than components below that frequency. An example of the calculation of sharpness is given in Fig. 11 for uniform exciting noise and for a high-pass noise above 3 kHz. The weighted specific loudnesses are shown as a function of critical-band rate, together with the location of their first moment (center of gravity), marked by arrows. When the cutoff frequency of the high-pass noise is shifted toward lower values until the noise finally becomes uniform exciting noise, loudness increases quite strongly while sharpness decreases markedly, in agreement with psychoacoustical results.

3.3 Fluctuation Strength

Fluctuation strength is a sensation correlated with the temporal variation of sounds. It was examined quite extensively by Fastl [13] during the last decade. It is important for the transmission of music as well as for the transmission of speech.
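Returning briefly to Eq. (2) of Section 3.2: once N' is sampled along the critical-band-rate axis, both integrals can be evaluated numerically, for instance with the trapezoidal rule as in the sketch below. The paper does not give g(z) explicitly, only that it exceeds 1 above about 3 kHz (roughly 16 bark); the default `g_assumed` (1 below 16 bark, 0.066·exp(0.171 z) above) is an assumption of the kind found in later literature and should be checked against a primary source before use. Function names are illustrative.

```python
import math
from typing import Callable, Sequence

def g_assumed(z: float) -> float:
    """ASSUMED weighting g(z): 1 below 16 bark (about 3 kHz), rising above it.
    Not taken from this paper; verify before relying on it."""
    return 1.0 if z < 16.0 else 0.066 * math.exp(0.171 * z)

def sharpness(z: Sequence[float], n_spec: Sequence[float],
              g: Callable[[float], float] = g_assumed) -> float:
    """Sharpness S in acum following Eq. (2):
    S = 0.11 * integral(N' g(z) z dz) / integral(N' dz), trapezoidal rule on grid z."""
    num = den = 0.0
    for i in range(1, len(z)):
        dz = z[i] - z[i - 1]
        num += 0.5 * dz * (n_spec[i] * g(z[i]) * z[i] + n_spec[i - 1] * g(z[i - 1]) * z[i - 1])
        den += 0.5 * dz * (n_spec[i] + n_spec[i - 1])
    if den == 0.0:
        raise ValueError("total loudness is zero; sharpness is undefined")
    return 0.11 * num / den

z = [0.1 * i for i in range(241)]                       # 0 ... 24 bark
uniform = [1.0] * len(z)                                # flat specific loudness
high_pass = [1.0 if zi >= 16.0 else 0.0 for zi in z]    # loudness only above ~3 kHz
print(sharpness(z, uniform), sharpness(z, high_pass))
```

As a hand check, with g(z) = 1 and a flat N' from 0 to 24 bark the numerator integral is 288 and the denominator 24, so S = 0.11 · 12 = 1.32 acum. Confining the specific loudness to the region above 16 bark, as for the high-pass noise of Fig. 11, yields a larger S, consistent with the text.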
Interestingly, the fluctuation strength measured as a function of modulation frequency shows a maximum near 4 Hz, a value at which the syllable rate of running speech also has a maximum. Fluctuation strength can also be calculated from the temporal dependence of the specific-loudness versus critical-band-rate pattern; both the period of the modulation (or its frequency) and the ratio between maximum and minimum specific loudness are important. Without going into detail, the influence of room acoustics on fluctuation strength may be illustrated with a 100% amplitude-modulated 1-kHz tone. Such a tone, recorded under free-field conditions, is played back in a room. The 100% amplitude modulation is reduced quite strongly and turned into a nonsinusoidal amplitude modulation (Fig. 12). The specific loudness corresponding to the frequency range around 1 kHz is shown as a function of time in Fig. 12(a) for the recorded sound and in Fig. 12(b) for the sound picked up by a microphone in the room. The difference between the two time functions is remarkable, indicating that room acoustics influence fluctuation strength quite strongly, and thus the quality of sound reproduction. The actual values in this example correspond to a 75% reduction of fluctuation strength.

Fig. 10. (a) Loudness-time function of broadcast including a jingle at (0), preparation for a commercial (1), commercial (2), and music (3). (b) Cumulative loudness distributions.

Fig. 11. Sharpness of uniform exciting noise (broken line, area hatched lower left to upper right) and high-pass noise (dotted line, hatched upper left to lower right). Weighted specific loudness is shown as a function of critical-band rate. Calculated sharpness is indicated by vertical arrows. Adapted from [5].

Fig. 12. Specific-loudness versus time function of a tone with 100% amplitude modulation. (a) In free-field condition. (b) Played back in a room.

3.4 Room Acoustics

Room acoustics, however, produce positive effects, too. For example, reverberation increases the loudness of a speaker in a room because of the many reflections, which finally lead to a more diffuse field rather than a free field. As an example, a speaker is approximated as a source of constant volume velocity, and Fig. 13 shows the effect of increasing reverberation when the same speech source produces loudness versus time functions under three different conditions. Curves (a) give the free-field condition, curves (b) the condition for a room with a reverberation time of 0.6 s, and curves (c) the same for a room with a reverberation time of 2.5 s. Short periods from a 10-min speech are shown in the left part of Fig. 13; the right part shows the corresponding cumulative distributions resulting from the loudness versus time functions for the three conditions. Using the loudness exceeded in 10% of the time as an indication of the perceived loudness, the speech can be expected to be 1.2 times louder in the room with 0.6-s reverberation time, and about two times louder in the room with 2.5-s reverberation time, than in the free-field condition. This increase in loudness often helps the intelligibility of speech in rooms, as long as the reverberation is not so long that it produces temporal masking, which reduces the audibility of faint consonants appearing in sequence to loud vowels.

Fig. 13. Effect of reverberation time on loudness-time functions (left) and on loudness distributions (right). Data obtained (a) in free-field condition and in rooms with reverberation times of (b) 0.6 s and (c) 2.5 s indicate increase of loudness with increasing reverberation time. Adapted from [6].

3.5 Digital Transmission and Reproduction of Audio Signals at Reduced Bit Rate

Transmission and reproduction at reduced bit rate (especially in light of the proposed realizations for DAB), a new and important area in electroacoustics and audio engineering, were a major motivation for writing this paper. The pattern of specific loudness versus critical-band rate versus time can be used as a "yardstick" that we have to follow in order to reduce information without introducing audible distortion of the sound. This holds for music as well as for speech. The systems realized in this area strictly follow this basic idea and, so far, have mostly taken masking effects into account. Since this particular area is well covered by other publications and this paper deals with the fundamental ideas behind DAB, there is no need to go into the technical details here. It should be mentioned, though, that in music the physical equivalent of spectral pitch percepts, which can be extracted by a hearing-equivalent spectral analysis, can also be used as a tool to reduce the information flow drastically without making this reduction audible [14]. The specific-loudness versus critical-band-rate versus time pattern produced by music or speech can be seen on the screen incorporated in modern loudness meters.
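The masking-based data reduction mentioned in Section 3.5 can be illustrated with a deliberately simplified per-band bit allocation: a band whose level lies below its masked threshold receives no bits, and any other band receives just enough bits that the quantization noise, assumed to drop by about 6.02 dB per bit, stays below that threshold. This is a generic rule of thumb, not the scheme of the systems described in [1]-[3]; the band levels, thresholds, the `max_bits` cap, and the function name are hypothetical, and a real coder must also respect a fixed total bit budget.

```python
import math
from typing import Sequence

DB_PER_BIT = 20.0 * math.log10(2.0)   # ~6.02 dB of SNR per quantizer bit (rule of thumb)

def allocate_bits(signal_db: Sequence[float], mask_db: Sequence[float],
                  max_bits: int = 16) -> list[int]:
    """Bits per band from the signal-to-mask ratio (SMR = signal - masked threshold, dB)."""
    bits = []
    for s, m in zip(signal_db, mask_db):
        smr = s - m
        if smr <= 0.0:
            bits.append(0)                      # fully masked: nothing audible to code
        else:
            bits.append(min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits

# Four hypothetical bands with SMR = -5, 0, 13 and 30 dB.
print(allocate_bits([40.0, 50.0, 63.0, 80.0], [45.0, 50.0, 50.0, 50.0]))
```

For these four example bands the sketch assigns 0, 0, 3, and 5 bits: only the part of the signal that rises above the masked threshold costs any data at all.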
It is very interesting and impressive to listen to music or speech and at the same time watch the information flow indicated by the movement of the specific-loudness pattern on a loudness meter's screen. It illustrates very strongly what we have tried to convey to the reader as the basic idea behind the data reduction necessary for efficient DAB.

4 CONCLUSIONS

The specific-loudness versus critical-band-rate versus time pattern contains all the information that our auditory system uses to produce the different hearing sensations. We propose not to transfer less information than this pattern contains; however, there is also no need to transfer more. A reproduction accuracy of 1 dB in excitation level, corresponding to a relative value of 7% in specific loudness, is sufficient for practical applications.

A Personal Comment by E. Z.

In 1950 I had to solve the problem of why tape-recorded music was accepted differently by listeners when it was recorded and played back by different apparatus. The barely noticeable amplitude and frequency modulation as a function of modulation frequency and level, as characteristics of the final receiver (our auditory system), led to the solution. Today, after having published more than 200 papers related to the field, I am advocating the same approach, although at a somewhat higher level, for solving the problems of modern electroacoustics and audio engineering. Obviously we have learned quite a bit during the last 40 years, and I would like to thank very much all those who have contributed to that.

5 REFERENCES

[1] D. Krahe, "Ein Verfahren zur Datenreduktion bei digitalen Audio-Signalen unter Ausnutzung psychoakustischer Phänomene," Rundfunktech. Mitt., vol. 30, no. 3, pp. 117-123 (1986).
[2] G. Stoll, M. Link, and G. Theile, "Masking-Pattern Adapted Subband Coding: Use of the Dynamic Bit-Rate Margin," presented at the 84th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 36, p. 382 (1988 May), preprint 2585.
[3] G. Stoll and Y. Dehery, "High Quality Audio Bit-Rate Reduction System Family for Different Applications," in Proc. IEEE ITC '90 (1990), pp. 937-941.
[4] E. Zwicker and R. Feldtkeller, Das Ohr als Nachrichtenempfänger (Hirzel, Stuttgart, 1967).
[5] E. Zwicker, Psychoakustik (Springer, Berlin, 1982).
[6] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models (Springer, Berlin, 1990).
[7] E. Zwicker and M. Zollner, Elektroakustik (Springer, Berlin, 1987).
[8] H. Fastl, "Dynamic Hearing Sensations: Facts and Models," Trans. Comm. on Hearing Research H-84-13, Acoust. Society of Japan, 1984.
[9] E. Zwicker, "The Inner Ear, a Sound Processing and a Sound Emitting System," J. Acoust. Soc. Jpn (E), vol. 9, pp. 59-74 (1988).
[10] ISO 532, "Acoustics: Method for Calculating Loudness Level," International Organization for Standardization, Geneva, Switzerland, 1975.
[11] B. Bauer and E. L. Torick, "Researches in Loudness Measurements," IEEE Trans. Audio Electroacoust., vol. AU-14, no. 3, 1966.
[12] B. L. Jones and E. L. Torick, "A New Loudness Indicator for Use in Broadcasting," SMPTE (1981 Sept.).
[13] H. Fastl, "Fluctuation Strength of Modulated Tones and Broadband Noise," in R. Klinke and R. Hartmann, Eds., Hearing: Physiological Bases and Psychophysics (Springer, Berlin, 1989).
[14] W. Heinbach, "Aurally Adequate Signal Representation: The Part-Tone-Time-Pattern," Acustica, vol. 67, pp. 113-121 (1988).

THE AUTHORS

Eberhard Zwicker was born in Öhringen, Germany, in 1924.
He studied physics at the University of Tübingen (1945/46) and electrical engineering (communications) at the Technical University Stuttgart from 1946 to 1950. He received a doctor-engineer degree in electroacoustics in 1952. That year, as scientific assistant to Professor Dr. R. Feldtkeller at the Technical University Stuttgart (TUS), he began a career of teaching and research. He spent one year as a researcher at the Harvard University Psychoacoustics Laboratories, Cambridge, MA (1956/57); was associate professor at the TUS (1957/61); lectured with Dr. J. Zwislocki at the Syracuse University Bio-Acoustic-Laboratory, Syracuse, NY, and visited numerous colleges in the USA at the invitation of the American Institute of Physics (1961/62); was extraordinary professor at the TUS (1962/67); did research at Bell Telephone Labs, Murray Hill, NJ (1964); and was Professor and Director of the Institute of Electroacoustics, Technical University Munich (1967/90). During those years, Professor Zwicker was also a member of the Kuratorium der Technisch-Physikalischen Bundesanstalt (1970/76), Speaker of the special research group Cybernetics sponsored by the Deutsche Forschungsgemeinschaft (1971/77), Dean of the Faculty of Electrical Engineering at the Technical University of Munich (1977/79), and Speaker of the special research group Hearing sponsored by the Deutsche Forschungsgemeinschaft (1983/90); he also did research and lectured throughout the world, including the USA, Great Britain, Japan, France, Switzerland, The Netherlands, Belgium, Spain, Poland, Czechoslovakia, Hungary, Italy, Austria, and Argentina. Professor Zwicker's committee activity in the field of acoustics started in 1955 as a member of the German DIN standardization committees on Acoustic Measurements, Loudness and Noise Measurements, and Electronic Filters.
In 1958 he became a member of the ISO TC 43 working group Loudness From Objective Analysis, and he was the German delegate to ISO meetings in Stockholm (1958), Rapallo (1960), Helsinki (1961), and Baden-Baden (1962). In 1959 he was secretary of the 3rd Congress of the International Commission on Acoustics in Stuttgart. He became international correspondent of the Committee on Hearing and Bioacoustics (CHABA) in 1963, was a member of the International Commission on Acoustics from 1966 to 1972, and in 1984 became a corresponding member of the Institute of Noise Control Engineering. In 1956, Professor Zwicker was awarded the venia legendi in electroacoustics by the Nachrichtentechnische Gesellschaft. He became a Fellow of the Acoustical Society of America in 1962, and in 1987 he was awarded that society's Silver Medal. In 1982 he was made an Honorary Member of the Audio Engineering Society, and in 1988 he received the following awards: the Bundesverdienstkreuz am Bande des Verdienstordens der Bundesrepublik Deutschland, the Karl-Küpfmüller-Ehrenring from the Technical University Darmstadt, and the Preis der Hörgeräte-Akustiker. In 1990 October Professor Zwicker retired from his duties as Director of the Institute of Electroacoustics at the Technical University Munich. A little more than a month later, on November 22, he died of cancer at his home in Icking, Germany. In 1991 February, the AES Gold Medal was awarded to him posthumously. Professor Zwicker's obituary appears in In Memoriam in the 1991 March issue of the AES Journal.

Ulrich Tilmann Zwicker was born in Stuttgart, Germany, in 1955. He studied physics and electrical engineering at the Technical University Munich (TUM), from which he received a bachelor's degree specializing in communications engineering, electroacoustics, and psychoacoustics in 1978, and a master's degree in 1981. After graduating, Dr.
Zwicker became a research associate at the Institute of Electroacoustics (TUM) in the areas of acoustics, electroacoustics, psychoacoustics, acoustical measurements, and audio. In 1983 he became assistant professor at the Institute of Instrumentation (TUM), where his research moved into the fields of high-frequency/solid-state acoustics, instrumentation, and control. In addition to administrative tasks, his work included teaching instrumentation with associated laboratory courses, and cooperative projects with Siemens AG, Mercedes-Benz AG, and the 1st Institute of Metrology, Beijing, China. In 1988 he received the Dr.-Ing. degree with a dissertation in instrumentation. In 1989, Dr. Zwicker continued his research in psychoacoustics and acoustics as visiting research associate at the Department of Audiology and the Department of Electrical and Computer Engineering at Northeastern University, Boston, MA; the Physics/Astronomy Department of Michigan State University, East Lansing, MI; and the Department of Neurophysiology of the University of Wisconsin Medical School, Madison, WI. He became an associate professor at the Institute of Electroacoustics (TUM) in 1990, doing research in binaural hearing. In 1990 September, he joined the European Patent Office in Munich as a patent examiner doing substantive examination. Dr. Zwicker has published extensively and has given numerous invited and contributed lectures in his field of interest. He is a member of the AES.

J. Audio Eng. Soc., Vol. 39, No. 3, 1991 March