Loudness model extension improving predictions of broadband sounds

Josef Schlittenlacher a)
Wolfgang Ellermeier b)
Applied Cognitive Psychology, Technische Universität Darmstadt
Alexanderstraße 10, 64283 Darmstadt, Germany

Takeo Hashimoto c)
Department of Electrical and Mechanical Engineering, Seikei University
3-3-1 Kichijoji Kitamachi, Musashino-shi 180-8633 Tokyo, Japan

Current loudness standards disagree in the evaluation of broadband sounds. For example, ANSI S3.4-2007 predicts pink noise to result in greater loudness than other standards do, e.g. DIN 45631, and, more importantly, overestimates loudness compared to actual experimental results. For this reason, an extension of ANSI S3.4-2007 is proposed that corrects for its overestimation of broadband sounds while at the same time retaining its modeling of equal loudness contours. A major change implemented consists of transforming the excitation based on ERBs to specific loudness in Bark. Further changes include using the precise ISO 226:2003 equal-loudness contours in the critical 3-kHz region. Recent and new listening-test data using both synthetic and recorded sounds shows that the extended model predicts loudness more accurately than the original one.

1 INTRODUCTION

ANSI S3.4-2007 [1], a comparatively young standard for the computation of loudness of steady sounds, shares the basic principles of the well-established method of DIN 45631 (1991) [2] or ISO 532 B (1975) [7]: Transmission characteristics of the human ear, excitation patterns and specific loudness. Nonetheless it produces different outcomes because the details of its implementation are different. For example, ANSI S3.4 uses an approximation of the revised equal loudness contours of ISO 226:2003 [5], while the other models are based on an earlier version of this standard, ISO 226:1987. Note, however, almost all real-world sounds are complex sounds, not sinusoids.
Recent research has shown the loudness of broadband sounds to be considerably overestimated by ANSI S3.4 [11,16]. An obvious reason could be that it uses 40 equivalent rectangular bandwidth (ERBN) filters to model peripheral filtering by the auditory system. Compared with other models, which are based on 24 Bark bands, this leads to greater spectral summation. Furthermore, it has been shown that the major differences occur in the region around 3 kHz [17]. A closer look at the equal loudness contours reveals that ANSI S3.4 predicts significantly greater loudness values than ISO 226:2003 in the most sensitive region of the human ear. This also contributes to a higher value being computed for complex sounds.

The present proposal to extend ANSI S3.4 accounts for both circumstances. By reducing spectral summation and the loudness around 3 kHz, the computed overall loudness as well as the specific loudness of broadband sounds become very similar to the outcome of DIN 45631. At the same time the modified model predicts the equal loudness contours of ISO 226:2003 very well.

Before introducing the model extension, this paper presents new experimental results using bandpass-filtered pink noise designed to pinpoint the differences between the models in predicting specific loudness.

a) email: [email protected]
b) email: [email protected]
c) email: [email protected]

2 EXPERIMENTS

The loudness of pure tones, which has been studied extensively, depends very much on the upper slope of the excitation patterns. By contrast, the loudness of broadband sounds is largely obtained by summing the main loudness of the critical bands involved. Though it cannot be neglected, the upper slope plays a minor role. This makes it difficult to measure specific loudness for a given frequency. The goal of the present experiments using bandpass-filtered pink noise is to provide estimates for the specific loudness of broadband noise for two frequency regions: low frequencies and the region around 3 kHz.

2.1 Equipment and stimuli

Two diotic bandpass-filtered pink noise sounds were compared to a 1-kHz pure tone, at three levels each: 45, 60 and 75 dB SPL. The first noise was obtained by bandpass filtering pink noise between 125 Hz and 1000 Hz (in the following called “lower noise”); the second noise stimulus was generated with limiting frequencies of 1.25 kHz and 5 kHz (“higher noise”). The upper limiting frequency of the lower noise was chosen to be rather high because the current standards do not exhibit similar upper slopes in specific loudness at frequencies below 1 kHz. With this choice, any differences must be primarily due to main loudness. The limiting frequencies of the higher noise were chosen to cover a broad frequency range that reveals the greatest differences between the current standards [17].

The duration of the stimuli was 1 second with a Gaussian rise and fall time of 20 ms. They were computed with 24 bit resolution and 48 kHz sampling rate, D/A converted by an external audio interface (RME Hammerfall DSP Multiface II), and subsequently presented via Sennheiser HDA 200 headphones. The experiments took place in a double-walled soundproof chamber.

For the Sennheiser HDA 200 headphones, free-field equalization data are available [15]. These data, based on subjective comparisons made by 16 participants, allow the calculation of free-field levels on the basis of levels measured by a coupler. To obtain these levels, the same type of coupler and adapter were used here. The free-field frequency response of the headphones thus obtained is almost flat up to 1 kHz, but it shows some irregularities in the frequency region of the second sound, having -4 dB at 3.15 kHz and +5 dB at 4 kHz, both relative to the value at 1 kHz. After the sounds had been generated digitally and adjusted to have the same Leq, the free-field equalization filter was applied digitally and the result stored to a wav-file.
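
The stimulus description above can be illustrated with a minimal sketch. It is not the authors' procedure: the frequency-domain synthesis of pink noise, the random seed, the half-Gaussian ramp spanning three standard deviations, and the names bandpass_pink_noise and gaussian_ramp are all assumptions made here for illustration. Free-field equalization and absolute level calibration are not modeled; the result is only scaled to unit RMS.

```python
import numpy as np

def gaussian_ramp(n_ramp):
    """Rising half-Gaussian from about 0.01 to 1 (ramp shape is an assumption)."""
    t = np.linspace(-3.0, 0.0, n_ramp)  # ramp spans three standard deviations
    return np.exp(-0.5 * t ** 2)

def bandpass_pink_noise(f_lo, f_hi, fs=48000, dur=1.0, ramp=0.020, seed=0):
    """Pink noise restricted to [f_lo, f_hi] Hz with Gaussian onset and offset."""
    rng = np.random.default_rng(seed)
    n = int(round(fs * dur))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)

    # Pink noise: power proportional to 1/f, hence amplitude to 1/sqrt(f);
    # random phases, all components outside the band set to zero.
    spectrum = np.zeros(freqs.size, dtype=complex)
    phases = rng.uniform(0.0, 2.0 * np.pi, int(band.sum()))
    spectrum[band] = np.exp(1j * phases) / np.sqrt(freqs[band])
    x = np.fft.irfft(spectrum, n)

    # Apply 20-ms Gaussian rise and fall.
    n_ramp = int(round(fs * ramp))
    env = np.ones(n)
    env[:n_ramp] = gaussian_ramp(n_ramp)
    env[-n_ramp:] = gaussian_ramp(n_ramp)[::-1]
    x *= env

    return x / np.sqrt(np.mean(x ** 2))  # unit RMS, no absolute calibration

lower_noise = bandpass_pink_noise(125.0, 1000.0)    # "lower noise"
higher_noise = bandpass_pink_noise(1250.0, 5000.0)  # "higher noise"
```

Scaling both signals to the same RMS mirrors the adjustment to equal Leq mentioned in the text; the equalization filter would be applied afterwards.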
The free-field equalization data also allowed the absolute level to be calibrated on the basis of the free-field level.

2.2 Procedure and participants

As in Schlittenlacher et al. [16], a method of adjustment was used: The standard stimulus (Ss) and comparison stimulus (Sc) were separated by 1 second, with a silent interval of 3 seconds until the pair was presented again. The participant was instructed to adjust the level of Sc. He or she could listen to the pair as often as needed until Sc and Ss appeared to be matched in loudness. Based on previous experience [16], the 1-kHz pure tone was always the Ss, the bandpass-filtered pink noise always the Sc.

This procedure was repeated eight times for each condition with Sc starting from levels well above or below the expected loudness match. The order of ascending (A) and descending (D) trials was counterbalanced: ADDADAAD. Before the experiment proper, two training trials were run.

Twelve students aged 20 to 43 with a median age of 21 years, 6 of them female and 6 male, participated for course credit. Their thresholds in quiet were no worse than 15 dB HL at any frequency between 125 Hz and 8 kHz, measured in octave steps.

The predictions made by the loudness standards (see the figures below) were obtained using software provided by the Working Group for Technical Acoustics of TU München (www.mmk.ei.tum.de/~kes/LoudnessMeter) and the Cambridge University Hearing Group (hearing.psychol.cam.ac.uk/Demos/demos.html).

2.3 Results and discussion

The loudness matches produced by the 12 participants are illustrated in figures 1 and 2. Circles indicate the medians of a total of 96 trials per condition (12 participants with 8 adjustments each), whisker lines the interquartile range. The abscissa shows the sound pressure level of the adjusted bandpass-filtered pink noise, the ordinate that of the fixed 1-kHz pure tone. This means that the ordinate also shows the loudness level in phon.

For the 125 to 1000 Hz bandpass-filtered pink noise (“lower noise”), figure 1 shows that ANSI S3.4 overestimates its loudness by up to 5 dB. However, its prediction is within the interquartile range for two of the three conditions investigated. By contrast, the DIN 45631 prediction always lies within the interquartile range of the data and comes very close to the median. The loudness of the 1.25 to 5 kHz bandpass-filtered pink noise (“higher noise”) is overestimated significantly by both algorithms (see figure 2), with DIN 45631 being closer to the subjective evaluations.

In order to close the circle between the three stimuli, the two noises were also compared directly by 6 of the participants at 60 dB SPL. On average, they adjusted the higher noise to a median level of 58 dB SPL to sound equally loud as the lower noise fixed at 60 dB SPL. Likewise, they adjusted the lower noise to 61 dB SPL to sound as loud as the higher noise fixed at 60 dB SPL. Averaging the two differences (2 dB and 1 dB) means that the higher noise needs to be about 1.5 dB lower in level in order to be perceived as loud as the lower noise. Almost the same value is obtained when comparing figures 1 and 2, meaning that this transitivity test was successful.

The results for the lower noise suggest that the main loudness described by DIN 45631 is probably a very good approximation in the frequency range up to 1 kHz. The subjective evaluations of the higher noise are rather surprising and in conflict with older experiments which found increased spectral summation above 2 kHz [8]. One might suspect that the free-field equalizer is not as well established as that for other headphones [9,19]. However, the measurements used [15] rely on a careful experiment with subjective evaluations according to IEC 268-7 [4], and the 4 dB amplification of our implementation at the most crucial third-octave band at 3.15 kHz is already rather high for a headphone assumed to have a flat free-field frequency response.
Even if the absolute level of the present result might be questioned, the experiment clearly indicates that both algorithms overestimate the loudness of broadband sounds around 1.25 to 5 kHz, and that particularly the remarkably high values predicted by ANSI S3.4 might not be correct.

3 EXTENSION FOR ANSI S3.4-2007

Altogether, the experiments confirm that ANSI S3.4 overestimates the loudness of broadband sounds, as had already been shown for full-spectrum pink noise [16]. Therefore sections 3.2 and 3.3 propose an extension and a slight modification of the standard to correct for these departures. First of all, the original algorithm will be introduced.

3.1 Structure of ANSI S3.4

The input of ANSI S3.4 is a spectrum, for which various formats, such as the components of a complex tone or third-octave levels, are allowed. First, the transmission characteristics of the ear are taken into account by transfer functions for the outer and middle ear, which leads to the levels hypothesized to be processed by the cochlea. Then equivalent rectangular bandwidth (ERBN) filters model the critical bands. Applying them takes into account that a component excites the basilar membrane not only at a single point but also at neighboring frequencies. At the same time, this implements the phenomenon of masking. Thereafter the excitation patterns are transformed to specific loudness, which is finally summed along the ERBN critical bands and across the two ears to yield loudness.

As the present work does not focus on masking or the physiology of the ear, and hence offers no new findings about them, the first stages up to the excitation patterns remain unchanged. Only the later processing stages will be extended to better comply with the experimental results about loudness and specific loudness.

3.2 Transformation of excitation from ERB to Bark

ANSI S3.4 gives the excitation as a function of the ERBN-number.
However, the ERB scale, which relies on notched-noise masking experiments [3,14], is not the only way to model critical bands. The Bark scale goes further back and is based on various hearing phenomena, for example spectral loudness summation [10]. It consists of 24 critical bands instead of 40 ERBN, which entails less spectral summation of loudness. That both scales coexist is due not only to historical reasons but also to the different methods employed in determining their number and bandwidth. While the ERB scale is closely related to peripheral masking as observed at the cochlear level, the suprathreshold psychophysical phenomena used to define critical (Bark) bands may reflect higher stages of auditory processing. Thus, it might be that not all information available at the entrance of the auditory pathway is used to determine the perceived loudness. Apart from this speculation, it is a fact that both scales rely on many and frequently reproduced observations. In addition, there is evidence that “the value of the CBL (critical band for loudness) is similar to, but a little greater than the ERBN” [13]. For this reason, the present extension proposes to transform the peripheral excitation based on ERBN to the Bark scale, so both scales will be used by the model for the calculation of loudness.

Different solutions are conceivable for this transformation. Frequently, Gaussian-shaped or other windows are used for broad tuning tasks [12]. However, a much simpler method yields very good or even better results here. The present model performs the transformation as follows: For a given frequency or position on the basilar membrane, the processed excitation shall be the arithmetic mean of the original (ERBN) excitation within the Bark band centered at this frequency. For example, the band centered at 16 Bark (3.21 kHz) has limiting frequencies of 15.5 Bark or 24.4 ERBN (2.95 kHz) and 16.5 Bark or 25.9 ERBN (3.50 kHz).
So the excitation at 16 Bark is the mean excitation within 24.4 ERBN to 25.9 ERBN. Analogous to the 0.1 ERBN steps used by ANSI S3.4, the excitation is calculated in 0.1 Bark steps from 0.6 Bark to 23.5 Bark. If a limiting frequency falls below the minimum of 1.8 ERBN or exceeds the maximum of 38.9 ERBN, it is set to the minimum or maximum, respectively. It should further be emphasized that the mean is taken over excitation in linear units, not over excitation levels.

Taking the mean within one Bark has the advantage of implicitly implementing the findings of the classical experiments [18]: No matter how intensity, or in this case excitation, is distributed within one Bark, it results in the same loudness. From the computational point of view, this transformation can be implemented easily and efficiently by multiplying the ERBN excitation vector with a fixed transformation matrix, resulting in the Bark excitation vector.

3.3 Parameters for the calculation of specific loudness

Apart from the insufficiencies in predicting broadband sounds, ANSI S3.4 deviates from the equal loudness contours of ISO 226:2003 in the crucial region around 3 kHz. It estimates them very well near the threshold in quiet. However, the discrepancy increases with level, reaching more than 5 dB at 80 phon. If one tried to improve the predictions for high levels by modifying the middle-ear transfer function, that would worsen the calculations for the low levels. Therefore a slight change of the nonlinearity appears to be much more promising. ANSI S3.4 gives three formulae for the calculation of specific loudness from excitation. The most important one, for excitation between threshold in quiet and very high levels, is

$$ N' = C \left[ (G \cdot E_{SIG} + A)^{\alpha} - A^{\alpha} \right] \tag{1} $$

where N' is the specific loudness, E_SIG the excitation, α the exponent accounting for the nonlinearity and C a scaling constant.
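
The ERB-to-Bark transformation of section 3.2 can be sketched as a fixed averaging matrix. This is an illustration under stated assumptions, not the authors' code: the paper does not say which Bark formula it uses, so the Zwicker and Terhardt approximation is assumed here (it reproduces the quoted 16 Bark at about 3.21 kHz); the ERBN-number formula is that of Glasberg and Moore; the mean is taken over the 0.1-ERBN grid points falling inside each Bark band; and all function names are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation (assumed; the paper names no formula)."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return 13.0 * np.arctan(0.76 * f_khz) + 3.5 * np.arctan((f_khz / 7.5) ** 2)

def hz_to_erb_number(f_hz):
    """ERBN-number scale after Glasberg & Moore (1990)."""
    return 21.4 * np.log10(4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)

def bark_to_hz(z):
    """Numerical inverse of hz_to_bark via interpolation on a dense grid."""
    f_grid = np.linspace(0.0, 20000.0, 200001)
    return np.interp(z, hz_to_bark(f_grid), f_grid)

def erb_to_bark_matrix():
    """Return (T, erb_grid, bark_grid) with bark_excitation = T @ erb_excitation."""
    erb_grid = np.round(np.arange(18, 390) * 0.1, 1)   # 1.8 ... 38.9 ERBN
    bark_grid = np.round(np.arange(6, 236) * 0.1, 1)   # 0.6 ... 23.5 Bark

    # Limits of the one-Bark-wide band around each Bark position, expressed
    # in ERBN numbers and clipped to the range covered by the excitation.
    lo = np.clip(hz_to_erb_number(bark_to_hz(bark_grid - 0.5)),
                 erb_grid[0], erb_grid[-1])
    hi = np.clip(hz_to_erb_number(bark_to_hz(bark_grid + 0.5)),
                 erb_grid[0], erb_grid[-1])

    # Each row averages the ERBN grid points that lie inside its band.
    inside = (erb_grid[None, :] >= lo[:, None]) & (erb_grid[None, :] <= hi[:, None])
    weights = inside.astype(float)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights, erb_grid, bark_grid

T, erb_grid, bark_grid = erb_to_bark_matrix()

# Usage: the excitation must be in linear units, not in dB.
erb_excitation = np.ones(erb_grid.size)   # placeholder flat excitation
bark_excitation = T @ erb_excitation
```

Because every row of T sums to one, redistributing excitation within a single Bark band leaves the corresponding mean unchanged, which is the property the text attributes to the classical experiments.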
For simplicity, the values for the cochlear amplifier G and for A shall be related to the unprocessed ERB excitation and thus remain as specified in ANSI S3.4. The exponent α originally varies only up to 500 Hz. To account for the equal loudness contours around 3 kHz it shall now be made frequency dependent across the entire range:

$$ x = \log_{10}(f / \mathrm{Hz}) \tag{2} $$

$$ \alpha = 0.0225 x^{4} - 0.2604 x^{3} + 1.1606 x^{2} - 2.4046 x + 2.1893 \tag{3} $$

As a result, α lies between 0.20 and 0.23 from 500 Hz to 10 kHz. The formula was obtained iteratively by choosing the exponent such that the equal loudness contours are met well above threshold. To avoid huge lookup tables, the values were eventually fitted to a function of logarithmic frequency, where a fourth-order polynomial was the most accurate among several choices. The constant factor C is used to ensure that a 40 dB, 1-kHz pure tone produces a loudness of 1 sone. So C is set to 0.043944.

The rest of the algorithm remains unchanged: Specific loudness, now represented in 0.1 Bark steps, is integrated along the critical bands and multiplied by 2 for loudness summation across the two ears. The model extension proposed by Moore and Glasberg [12] for better calculation of monaural and dichotic loudness may of course be combined with the present approach as well if its parameters are adapted to the Bark scale.

3.4 Discussion: Calculated loudness and specific loudness

The proposed extension of the ANSI S3.4 loudness standard constitutes only a minor modification, as is shown by figure 3. One step is added before calculating specific loudness and two parameters are assigned slightly different values. Nothing else is changed, so the model does not become substantially more complex.

A first criterion for any valid loudness calculation is a proper prediction for 1-kHz tones as a function of level, because of its importance for the phon scale. Loudness should double with every 10-dB increase above 40 dB SPL and deviations should be within 5% [11].
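
Two of the numerical statements above can be checked with a short sketch: the range of the exponent α from Eq. (3), and the doubling criterion applied to the values of table 1. The function names are hypothetical, and expressing the target as 2^((L - 40)/10) sone is simply the doubling rule written as a formula.

```python
import numpy as np

# Coefficients of Eq. (3), highest power of x first.
ALPHA_COEFFS = [0.0225, -0.2604, 1.1606, -2.4046, 2.1893]

def alpha(f_hz):
    """Exponent of Eq. (3) as a function of frequency in Hz, with x = log10(f/Hz)."""
    x = np.log10(np.asarray(f_hz, dtype=float))
    return np.polyval(ALPHA_COEFFS, x)

def target_sone(level_db):
    """Loudness of a 1-kHz tone if it doubles with every 10 dB above 40 dB SPL."""
    return 2.0 ** ((np.asarray(level_db, dtype=float) - 40.0) / 10.0)

def deviation_percent(calculated_sone, level_db):
    """Relative deviation of a calculated loudness from the target, in percent."""
    return 100.0 * (np.asarray(calculated_sone) / target_sone(level_db) - 1.0)

# Values copied from table 1 (levels of 40 dB and above).
levels = np.array([40, 50, 60, 70, 80, 90, 100], dtype=float)
n_ansi = np.array([1.00, 2.10, 4.17, 8.10, 16.0, 33.2, 70.4])
n_extended = np.array([1.00, 2.10, 4.15, 8.02, 15.5, 31.0, 63.4])

dev_ansi = deviation_percent(n_ansi, levels)
dev_extended = deviation_percent(n_extended, levels)

# Range of alpha between 500 Hz and 10 kHz.
alpha_range = alpha(np.logspace(np.log10(500.0), 4.0, 200))

for lvl, da, de in zip(levels, dev_ansi, dev_extended):
    print(f"{lvl:5.0f} dB  ANSI {da:+6.2f} %  extended {de:+6.2f} %")
```

The 5% limit is reached exactly at 50 dB for both models, so any comparison against it should allow for floating-point rounding.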
This is fulfilled by ANSI S3.4 up to 90 dB and also by the proposed extension (see table 1).

Other criteria are the threshold in quiet and the equal loudness contours. If the sound of interest is a pure tone, the values for an average young normal-hearing listener can be looked up directly in the appropriate standards. Nevertheless, a loudness model should predict them as well. ISO 389-7:2005 [6] states that the absolute threshold is at 2.4 phon, which corresponds to 0.002 sone in the proposed extension. This value is obtained with an accuracy of ±2 dB between 80 Hz and 10 kHz and is thus within the deviation of the results obtained by the participating laboratories. The equal loudness contours are shown in figure 4. It can be seen that the estimations of the extended model agree reasonably well with the standard (ISO 226:2003).

Although real-world sounds may have tonal components, most of them are broadband sounds with little similarity to a pure tone. That is why it is very important to look at the calculated loudness of broadband sounds. Figure 5 illustrates binaural specific loudness as predicted by the two standards and by the present extension for pink noise. To enable this comparison, the summation across the two ears was performed before spectral summation and the original specific loudness of ANSI S3.4 was converted to the Bark scale [17]. The specific loudness of the extension for ANSI S3.4 is very similar to that of DIN 45631, especially in the most sensitive auditory frequency range. As a result, the loudness of pink noise calculated by the extension is close to that of DIN 45631 and hence also close to subjective evaluations [16]. At low levels, such as 15 dB per third octave, the extension predicts loudness to be slightly higher than DIN 45631 does and almost exactly coincides with the experimental data (not shown here).
Because of the similar main loudness, the predictions for bandpass-filtered pink noise are also close to those of DIN 45631, and the estimations for the lower noise investigated in the present study are very good (figure 1). For the higher noise the model extension also overestimates loudness, although not as dramatically as the original model (figure 2). Altogether, the proposed extension predicts broadband noise quite well.

Finally, a test of the extended model on real-world sounds is presented. Figure 6 depicts results presented by Schlittenlacher et al. [16] and shows the predictions of the extended model for stationary technical sounds plotted against the corresponding subjective loudness matches. Third-octave levels were used as the input for the calculations. The prediction made by the proposed extension falls within the interquartile range of the adjustments for six of the eight sounds. Comparing the model predictions with the experimental data (figure 6), the root-mean-square error for the seven louder sounds, based on the deviation of loudness in percent, is 25% for the proposed extension. DIN 45631 achieves 21%, the unchanged ANSI S3.4 56%. The softer notebook fan noise was excluded because participants were probably confused by a strong tonal component near the reference frequency; it is not predicted well by any algorithm. A deviation of about 20% is satisfactory because it is within the range of variations between listeners and some of the chosen technical sounds are only approximately stationary.

4 CONCLUSIONS

The experimental results of this paper show that ANSI S3.4 and DIN 45631 calculate the loudness of low-frequency broadband sounds quite well. However, both considerably overestimate loudness around 3 kHz, the unchanged ANSI S3.4 to a greater extent than the DIN standard. The proposed extension of the ANSI standard accounts for this problem by reducing the amount of spectral summation and specific loudness around 3 kHz.
This leads to predictions for the loudness of broadband and technical sounds which are quite consistent with actual loudness matches obtained in listening tests. At the same time the algorithm is optimized regarding ISO 226:2003. Thus, the extension combines the advantages of the current standards: Like ANSI S3.4, it estimates the revised equal loudness contours very well, but it also achieves the good predictions of DIN 45631 for broadband sounds.

5 ACKNOWLEDGMENTS

The authors would like to thank Professor Hugo Fastl (Technische Universität München), Professor Seiichiro Namba and Professor Sonoko Kuwano (Osaka University) for valuable comments and the fruitful cooperation which led to this work.

6 REFERENCES

1. ANSI S3.4, “Procedure for the Computation of Loudness of Steady Sounds”, (2007)
2. DIN 45631, “Berechnung des Lautstärkepegels und der Lautheit aus dem Geräuschspektrum – Verfahren nach E. Zwicker (Procedure for calculating loudness level and loudness)”, (1991)
3. B.R. Glasberg, B.C.J. Moore, “Derivation of auditory filter shapes from notched-noise data”, Hearing Research, 47, (1990)
4. IEC 268-7, “Sound system equipment – Part 7: Headphones and earphones”, (1996)
5. ISO 226, “Acoustics – Normal equal-loudness-level contours”, (2003)
6. ISO 389-7, “Acoustics – Reference zero for the calibration of audiometric equipment – Part 7: Reference threshold of hearing under free-field and diffuse-field listening conditions”, (2005)
7. ISO 532, “Acoustics – Method for calculating loudness level”, (1975)
8. H. Fastl, “Loudness and Masking Patterns of Narrow Noise Bands”, Acustica, 33, (1975)
9. H. Fastl, E. Zwicker, “A free-field equalizer for TDH 39 earphones”, J. Acoust. Soc. of Am., 73(1), (1983)
10. H. Fastl, E. Zwicker, Psychoacoustics – Facts and models, 3rd edition, Springer, (2007)
11. H. Fastl, F. Völk and M. Straubinger, “Standards for calculating loudness of stationary or time-varying sounds”, Proc. Inter-Noise 2009, Ottawa, (2009)
12. B.C.J. Moore, B.R. Glasberg, “Modeling binaural loudness”, J. Acoust. Soc. of Am., 121(3), (2007)
13. B.C.J. Moore, An Introduction to the Psychology of Hearing, 6th Edition, Emerald, (2012)
14. R.D. Patterson, “Auditory filter shapes derived with noise stimuli”, J. Acoust. Soc. of Am., 59(3), (1976)
15. U. Richter, “Characteristic data of different kinds of earphones used in the extended high frequency range for pure-tone audiometry”, Mechanik und Akustik, PTB-MA-72, Braunschweig, (2003)
16. J. Schlittenlacher, T. Hashimoto, H. Fastl, S. Namba, S. Kuwano and S. Hatano, “Loudness of pink noise and stationary technical sounds”, Proc. Inter-Noise 2011, Osaka, (2011)
17. J. Schlittenlacher, H. Fastl, T. Hashimoto, S. Kuwano and S. Namba, “Differences of loudness algorithms across the frequency spectrum”, Tagungsband Fortschritte der Akustik – DAGA 2012, Darmstadt, (2012)
18. E. Zwicker, G. Flottrop, S.S. Stevens, “Critical Band Width in Loudness Summation”, J. Acoust. Soc. of Am., 29(5), (1957)
19. E. Zwicker, D. Maiwald, “Über das Freifeldübertragungsmaß des Kopfhörers DT 48 (On the free-field response of the earphone DT 48)”, Acustica, 13, (1963)

Table 1: Loudness calculated for 1-kHz pure tones

L1kHz (dB)   NANSI (sone)   NANSI, extended (sone)   Target value (sone)
10           0.03           0.03                     -
20           0.14           0.14                     -
30           0.42           0.42                     -
40           1.00           1.00                     1.00
50           2.10           2.10                     2.00
60           4.17           4.15                     4.00
70           8.10           8.02                     8.00
80           16.0           15.5                     16.0
90           33.2           31.0                     32.0
100          70.4           63.4                     64.0

Figure 1: Level of a 1-kHz pure tone that is as loud as a bandpass pink noise from 125 to 1000 Hz. Circles indicate medians, whiskers the interquartile range.

Figure 2: Level of a 1-kHz pure tone that is as loud as a bandpass pink noise from 1.25 to 5 kHz. Circles indicate medians, whiskers the interquartile range.

Figure 3: Structure of the extended algorithm. The dashed line indicates that the step was added, the star that parameters were changed. The processing chain shown is:
Spectrum
Outer ear transfer function
Middle ear transfer function
Calculation of excitation patterns
ERB to Bark transformation (added step)
Transformation to specific loudness*
Summation along critical bands
Summation to binaural loudness
Loudness

Figure 4: Equal loudness contours

Figure 5: Binaural specific loudness of pink noise with a third-octave level of 55 dB

Figure 6: Loudness of technical sounds. The ordinate shows the loudness as given by the fixed 1-kHz reference tone, the abscissa as calculated by the extended ANSI S3.4. Circles indicate medians, whiskers the interquartile range.