Loudness model extension improving predictions of broadband
sounds
Josef Schlittenlacher
Wolfgang Ellermeier
Applied Cognitive Psychology, Technische Universität Darmstadt
Alexanderstraße 10, 64283 Darmstadt, Germany
Takeo Hashimoto
Department of Electrical and Mechanical Engineering, Seikei University
3-3-1 Kichijoji Kitamachi, Musashino-shi 180-8633 Tokyo, Japan
Current loudness standards disagree in the evaluation of broadband sounds. For example,
ANSI S3.4-2007 predicts pink noise to result in greater loudness than other standards do,
e.g. DIN 45631, and – more importantly – overestimates loudness compared to actual
experimental results. For this reason, an extension of ANSI S3.4-2007 is proposed that
corrects for its overestimation of broadband sounds while at the same time retaining its
modeling of equal loudness contours. A major change implemented consists of
transforming the excitation based on ERBs to specific loudness in Bark. Further changes
include using the precise ISO 226:2003 equal-loudness contours in the critical 3-kHz
region. Recent and new listening-test data using both synthetic and recorded sounds show
that the extended model predicts loudness more accurately than the original one.
1 INTRODUCTION
ANSI S3.4-2007 [1], a comparatively young standard for the computation of loudness of
steady sounds, shares the basic principles of the well-established method of DIN 45631 (1991) [2]
or ISO 532 B (1975) [7]: transmission characteristics of the human ear, excitation patterns and
specific loudness. Nonetheless, it produces different outcomes because the details of its
implementation are different. For example, ANSI S3.4 uses an approximation of the revised
equal loudness contours of ISO 226:2003 [5], while the other models are based on an earlier version
of this standard, ISO 226:1987.
Note, however, that almost all real-world sounds are complex sounds, not sinusoids. Recent
research has shown the loudness of broadband sounds to be considerably overestimated by
ANSI S3.4 [11,16]. An obvious reason could be that it uses 40 equivalent rectangular bandwidth
(ERBN) filters to model peripheral filtering by the auditory system. Compared to other models,
which are based on 24 Bark bands, this leads to greater spectral summation. Furthermore, it has been shown
that the major differences occur in the region around 3 kHz [17]. A closer look at the equal loudness
contours reveals that ANSI S3.4 predicts significantly greater loudness values than
ISO 226:2003 in the most sensitive region of the human ear. This also contributes to a higher
value being computed for complex sounds.
The present proposal to extend ANSI S3.4 accounts for both circumstances. By reducing
spectral summation and the loudness around 3 kHz, computed overall loudness as well as the
specific loudness of broadband sounds become very similar to the outcome of DIN 45631. At the
same time the modified model predicts the equal loudness contours of ISO 226:2003 very well.
Before introducing the model extension, this paper presents new experimental results using
bandpass-filtered pink noise designed to pinpoint the differences between the models in
predicting specific loudness.
2 EXPERIMENTS
The loudness of pure tones, which has been studied extensively, depends very much on the
upper slope of the excitation patterns. By contrast, the loudness of broadband sounds is largely
obtained by summing the main loudness of the critical bands involved. Though it cannot be
neglected, the upper slope plays a minor role. This makes it difficult to measure specific
loudness for a given frequency. The goal of the present experiments using bandpass-filtered pink
noise is to provide estimates for the specific loudness of broadband noise for two frequency
regions: Low frequencies and the region around 3 kHz.
2.1 Equipment and stimuli
Two diotic bandpass-filtered pink noise sounds were compared to a 1-kHz pure tone, at
three levels each: 45, 60 and 75 dB SPL. The first noise was obtained by bandpass filtering pink
noise between 125 Hz and 1000 Hz (in the following called “lower noise”), the second noise
stimulus was generated with limiting frequencies of 1.25 kHz and 5 kHz (“higher noise”). The
upper limiting frequency of the lower noise was chosen to be rather high because the current
standards do not exhibit similar upper slopes in specific loudness at frequencies below
1 kHz. With this choice, any differences must be primarily due to main loudness. The limiting
frequencies of the higher noise were chosen to cover a broad frequency range in which
the differences between the current standards are greatest [17].
The duration of the stimuli was 1 second with a Gaussian rise and fall time of 20 ms. They
were computed with 24 bit resolution and 48 kHz sampling rate, D/A converted by an external
audio interface (RME Hammerfall DSP Multiface II), and subsequently presented via Sennheiser
HDA 200 headphones. The experiments took place in a double-walled soundproof chamber.
For the Sennheiser HDA 200 headphones, free-field equalization data are available [15]. These
data, based on subjective comparisons made by 16 participants, allow the calculation of free-field
levels on the basis of levels measured by a coupler. To get these levels, the same type of
coupler and adapter were used here. The free-field frequency response of the headphones thus
obtained is almost flat up to 1 kHz, but it shows some irregularities in the frequency region of the
second sound, having -4 dB at 3.15 kHz and +5 dB at 4 kHz, both relative to the value at 1 kHz.
After the sounds had been generated digitally and adjusted to have the same Leq, the free-field
equalization filter was applied digitally and the result stored to a wav-file. The data also
allowed the absolute level to be calibrated on the basis of the free-field level.
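The stimulus generation described above could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name, the FFT-based spectral shaping, the exact Gaussian ramp shape (a half Gaussian whose 3-sigma point spans the 20 ms ramp) and the use of equal RMS in place of equal Leq are all assumptions. The free-field equalization filter and the absolute level calibration are not included.

```python
import numpy as np

FS = 48_000  # sampling rate in Hz, as stated in the text


def bandpass_pink_noise(f_lo, f_hi, dur=1.0, ramp=0.020, seed=0):
    """Unit-RMS pink noise restricted to [f_lo, f_hi] Hz with Gaussian ramps."""
    rng = np.random.default_rng(seed)
    n = int(round(dur * FS))
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)

    # Random complex spectrum, kept only inside the pass band.
    spec = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    spec[~band] = 0.0
    # Pink noise: power proportional to 1/f, so amplitude proportional to 1/sqrt(f).
    spec[band] /= np.sqrt(freqs[band])
    x = np.fft.irfft(spec, n=n)

    # Assumed ramp shape: half a Gaussian, sigma equal to one third of the ramp.
    m = int(round(ramp * FS))
    rise = np.exp(-0.5 * ((np.arange(m) - m) / (m / 3.0)) ** 2)
    env = np.ones(n)
    env[:m] = rise
    env[-m:] = rise[::-1]
    x *= env

    # Equal RMS stands in for "same Leq" in this sketch.
    return x / np.sqrt(np.mean(x ** 2))


lower_noise = bandpass_pink_noise(125.0, 1000.0)            # "lower noise"
higher_noise = bandpass_pink_noise(1250.0, 5000.0, seed=1)  # "higher noise"
```

Both signals come out with the same RMS, so a single gain factor would set either of them to a desired presentation level.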
2.2 Procedure and participants
A method of adjustment was used as in Schlittenlacher et al. [16]: The standard stimulus (Ss)
and comparison stimulus (Sc) were separated by 1 second, with a silent interval of 3 seconds until
the pair was presented again. The participant was instructed to adjust the level of Sc. He or she
could listen to the pair as often as needed until Sc and Ss appeared to be matched in loudness.
Because of previous experience [16], the 1-kHz pure tone was always the Ss and the bandpass-filtered
pink noise always the Sc.
This procedure was repeated eight times for each condition with Sc starting from levels well
above or below the expected loudness match. The order of ascending (A) and descending (D)
trials was counterbalanced: ADDADAAD. Before the experiment proper, two trials of training
were run.
Twelve students aged 20 to 43 with a median age of 21 years, 6 of them female and 6 male,
participated for course credit. Their thresholds in quiet were no worse than 15 dB HL at any
frequency between 125 Hz and 8 kHz, measured in octave steps.
The predictions made by the loudness standards (see the figures below) were obtained using
software provided by the Working Group for Technical Acoustics of TU München
(www.mmk.ei.tum.de/~kes/LoudnessMeter) and the Cambridge University Hearing Group
(hearing.psychol.cam.ac.uk/Demos/demos.html).
2.3 Results and discussion
The loudness matches produced by the 12 participants are illustrated in figures 1 and 2.
Circles indicate the medians of a total of 96 trials per condition, whisker lines the interquartile
range. The abscissa shows the sound pressure level of the adjusted bandpass-filtered pink noise,
the ordinate that of the fixed 1-kHz pure tone. This means that the ordinate also shows the
loudness level in phon.
As for the 125 to 1000 Hz bandpass-filtered pink noise (“lower noise”), it can be seen in
figure 1 that ANSI S3.4 overestimates its loudness by up to 5 dB; however, its prediction is within the
interquartile range for two of the three conditions investigated. By contrast, the DIN 45631
prediction always lies within the interquartile range of the data, and comes very close to the
median. The loudness of the 1.25 to 5 kHz bandpass-filtered pink noise (“higher noise”) is
overestimated significantly by both algorithms (see figure 2), with DIN 45631 being closer to the
subjective evaluations.
In order to close the circle between the three stimuli, the two noises were also compared
directly by 6 of the participants at 60 dB SPL. On average, they adjusted the higher noise to a
median level of 58 dB SPL to sound equally loud as the lower noise fixed at 60 dB SPL.
Likewise, they adjusted the lower noise to 61 dB SPL to sound as loud as the higher noise fixed
at 60 dB SPL. This means that the higher noise needs to be about 1.5 dB less in level in order to
be perceived as loud as the lower noise. Almost the same value is obtained when comparing
figures 1 and 2, meaning that this transitivity test was successful.
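As a worked check of the arithmetic, the figure of about 1.5 dB is simply the mean of the two level differences reported for the direct comparison:

```latex
\[
\frac{(60 - 58)\,\mathrm{dB} + (61 - 60)\,\mathrm{dB}}{2}
  = \frac{2\,\mathrm{dB} + 1\,\mathrm{dB}}{2}
  = 1.5\,\mathrm{dB}
\]
```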
The results for the lower noise suggest that the main loudness described by DIN 45631 is
probably a very good approximation in the frequency range up to 1 kHz. The subjective
evaluations of the higher noise are rather surprising and in conflict with older experiments which
found increased spectral summation above 2 kHz [8]. One might suspect that the free-field
equalizer is not as well established as that for other headphones [9,19]; however, the measurements
used [15] rely on a careful experiment with subjective evaluations according to IEC 268-7 [4], and the
4 dB amplification of our implementation at the most crucial third-octave band at 3.15 kHz is already
rather high for a headphone assumed to have a flat free-field frequency response. Even if the
absolute level of the present result might be questioned, the experiment clearly indicates that
both algorithms overestimate the loudness of broadband sounds around 1.25 to 5 kHz, and that
particularly the remarkably high values predicted by ANSI S3.4 might not be correct.
3 EXTENSION FOR ANSI S3.4-2007
Altogether, the experiments confirm that ANSI S3.4 overestimates the loudness of
broadband sounds, which had already been shown for full-spectrum pink noise [16]. Therefore,
sections 3.2 and 3.3 propose an extension and a slight modification of the standard to correct for
these departures. First, the original algorithm is introduced.
3.1 Structure of ANSI S3.4
The input of ANSI S3.4 is a spectrum, for which various formats such as the components of a
complex tone or third-octave levels are allowed. First, the transmission characteristics of the
ear are accounted for by transfer functions for the outer and middle ear, which leads to the levels
hypothesized to be processed by the cochlea. Then equivalent rectangular bandwidth (ERBN)
filters model the critical bands. Applying them takes into account that a component not only
excites the basilar membrane at a single point but also at neighboring frequencies. At the same
time, this implements the phenomenon of masking. Thereafter the excitation patterns are
transformed to specific loudness which is finally summed along the ERBN critical bands and
across the two ears to loudness.
As the present work does not focus on masking or the physiology of the ear, and hence offers
no new findings about them, the first stages up to the excitation patterns remain
unchanged. Only the later processing stages will be extended to better comply with the
experimental results about loudness and specific loudness.
3.2 Transformation of excitation from ERB to Bark
ANSI S3.4 gives the excitation as a function of the ERBN-number. However, the ERB scale,
which relies on notched-noise masking experiments [3,14], is not the only way to model
critical bands. The Bark scale is older and is based on various hearing phenomena, for
example spectral loudness summation [10]. It consists of 24 critical bands instead of 40 ERBN,
which entails less spectral summation of loudness.
That both scales coexist is due not only to historical reasons but also to the different
methods employed in determining their number and bandwidth. While the ERB scale is closely
related to peripheral masking as observed at the cochlear level, the suprathreshold
psychophysical phenomena used to define critical (Bark) bands may reflect higher stages of
auditory processing. Thus, it might be that not all information available at the entrance of the
auditory pathway is used to determine the perceived loudness. Apart from this
speculation, it is a fact that both scales rely on many frequently reproduced observations. In
addition, there is evidence that “the value of the CBL (critical band for loudness) is similar to, but
a little greater than the ERBN” [13]. For this reason, the present extension proposes to transform the
peripheral excitation based on ERBN to the Bark scale, so both scales will be used by the model
for the calculation of loudness.
Different solutions are conceivable for this transformation. Frequently, Gaussian-shaped or
other windows are used for broad tuning tasks [12]. However, a much simpler method yields very
good or even better results here. The present model performs the transformation as
follows:
For a given frequency or position on the basilar membrane, the processed excitation shall be
the arithmetic mean of the original (ERBN) excitation within the Bark band centered at this
frequency.
For example, the band centered at 16 Bark (3.21 kHz) has limiting frequencies of
15.5 Bark or 24.4 ERBN (2.95 kHz) and 16.5 Bark or 25.9 ERBN (3.50 kHz). So the excitation at
16 Bark is the mean excitation within 24.4 ERBN to 25.9 ERBN. Analogous to the 0.1 ERBN
steps used by ANSI S3.4, the excitation is calculated in 0.1 Bark steps from 0.6 Bark to
23.5 Bark. If a limiting frequency falls below the minimum of 1.8 ERBN or exceeds the
maximum of 38.9 ERBN, it is set to the minimum or maximum, respectively. It should further be
emphasized that rather than the mean of excitation levels, the mean excitation in linear units is
taken.
Taking the mean within one Bark has the advantage of implicitly implementing the findings of
the classical experiments [18]: No matter how intensity, or in this case excitation, is distributed
within one Bark, it results in the same loudness. From the computational point of view, this
transformation can be implemented easily and efficiently by multiplying the ERBN excitation
vector with a fixed transformation matrix, resulting in the Bark excitation vector.
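The matrix formulation can be sketched as follows. The paper does not state which frequency-to-scale formulas it uses, so this sketch assumes the Glasberg and Moore ERB-number formula and the Zwicker and Terhardt Bark approximation, which together reproduce the example above (16 Bark at about 3.21 kHz, with band limits near 24.4 and 25.9 ERBN). All function and variable names are illustrative, and the mean is approximated by averaging the 0.1-ERBN grid points that fall inside each Bark band.

```python
import numpy as np


def hz_to_erb_number(f_hz):
    # Glasberg & Moore (1990) ERB-number formula (assumed here).
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)


def hz_to_bark(f_hz):
    # Zwicker & Terhardt approximation of the Bark scale (assumed here).
    f_khz = np.asarray(f_hz) / 1000.0
    return 13.0 * np.arctan(0.76 * f_khz) + 3.5 * np.arctan((f_khz / 7.5) ** 2)


def bark_to_hz(z):
    # Numerical inverse of hz_to_bark by interpolation on a dense grid.
    f = np.linspace(0.0, 20000.0, 200001)
    return np.interp(z, hz_to_bark(f), f)


ERB_GRID = 1.8 + 0.1 * np.arange(372)   # 1.8 ... 38.9 ERBN in 0.1 steps
BARK_GRID = 0.6 + 0.1 * np.arange(230)  # 0.6 ... 23.5 Bark in 0.1 steps


def erb_to_bark_matrix():
    """Each row averages the ERB-scale excitation within one Bark band."""
    eps = 1e-9  # guards against floating-point error at the grid ends
    T = np.zeros((BARK_GRID.size, ERB_GRID.size))
    for i, z in enumerate(BARK_GRID):
        lo, hi = hz_to_erb_number(bark_to_hz(np.array([z - 0.5, z + 0.5])))
        # Limits outside the ERB range are set to its minimum or maximum.
        lo = np.clip(lo, ERB_GRID[0], ERB_GRID[-1])
        hi = np.clip(hi, ERB_GRID[0], ERB_GRID[-1])
        inside = (ERB_GRID >= lo - eps) & (ERB_GRID <= hi + eps)
        T[i, inside] = 1.0 / inside.sum()
    return T


T = erb_to_bark_matrix()
```

With `e_erb` holding the excitation in linear units (not in dB) on the ERB grid, the Bark-scale excitation would be `T @ e_erb`. Because the matrix is fixed, it only needs to be computed once.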
3.3 Parameters for the calculation of specific loudness
Apart from the insufficiencies in predicting broadband sounds, ANSI S3.4 deviates from the
equal loudness contours of ISO 226:2003 in the crucial region around 3 kHz. It estimates them
very well near the threshold in quiet; however, the discrepancy increases with level, reaching
more than 5 dB at 80 phon. If one tried to improve the predictions for high levels by modifying
the middle-ear transfer function, that would worsen the calculations for the low levels. Therefore
a slight change of the nonlinearity appears to be much more promising.
ANSI S3.4 gives three formulae for the calculation of specific loudness from excitation. The
most important one for excitation between threshold in quiet and very high levels is
$$N' = C\left[(G \cdot E_{\mathrm{SIG}} + A)^{\alpha} - A^{\alpha}\right] \qquad (1)$$
where N’ is the specific loudness, ESIG the excitation, α the exponent accounting for the
nonlinearity and C a scaling constant. For simplicity, the values for the cochlear amplifier G and
A shall be related to the unprocessed ERB excitation and thus remain as specified in ANSI S3.4.
The exponent α originally is only variable up to 500 Hz. To account for the equal loudness
contours around 3 kHz it shall now be made frequency dependent across the entire range
$$x = \log_{10}(f/\mathrm{Hz}) \qquad (2)$$
$$\alpha = 0.0225x^{4} - 0.2604x^{3} + 1.1606x^{2} - 2.4046x + 2.1893 \qquad (3)$$
which places α between 0.20 and 0.23 from 500 Hz to 10 kHz. The formula was obtained
iteratively by choosing the exponent such that the equal loudness contours are met well
above threshold. To avoid huge lookup tables, the values were then fitted to a function of
logarithmic frequency, for which a fourth-order polynomial was the most accurate among several
choices. The constant factor C is used to ensure that a 40 dB, 1-kHz pure tone produces a
loudness of 1 sone. So C is set to 0.043944.
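Equations (1) to (3) translate directly into code. The sketch below uses the fourth-order polynomial and the constant C given in the text; the function names are illustrative, and G and A are passed in as plain arguments because their values remain as specified in ANSI S3.4 and are not reproduced here.

```python
import math

C = 0.043944  # scaling constant from the text


def alpha_exponent(f_hz):
    """Frequency-dependent exponent, equations (2) and (3)."""
    x = math.log10(f_hz)
    return 0.0225 * x**4 - 0.2604 * x**3 + 1.1606 * x**2 - 2.4046 * x + 2.1893


def specific_loudness(e_sig, g, a, f_hz):
    """Equation (1): specific loudness from excitation e_sig (linear units).

    g and a are the ANSI S3.4 parameters G and A; the caller must supply them.
    """
    al = alpha_exponent(f_hz)
    return C * ((g * e_sig + a) ** al - a ** al)
```

At 1 kHz the polynomial evaluates to roughly 0.21, which lies inside the 0.20 to 0.23 range quoted above.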
The rest of the algorithm remains unchanged: Specific loudness, now represented in
0.1 Bark steps, is integrated along the critical bands and multiplied by 2 for loudness summation
across the two ears. The model extension proposed by Moore and Glasberg [12] for better
calculation of monaural and dichotic loudness may of course be combined with the present
approach as well if its parameters are adapted to the Bark scale.
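The final summation step might look like the following sketch. It assumes a simple rectangle-rule integration over the 0.1-Bark grid; the function name is illustrative.

```python
import numpy as np

BARK_STEP = 0.1  # grid spacing of the specific loudness in Bark


def total_loudness(specific_loudness):
    """Integrate specific loudness (sone/Bark) over the Bark grid, then
    multiply by 2 for loudness summation across the two ears."""
    return 2.0 * BARK_STEP * float(np.sum(specific_loudness))
```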
3.4 Discussion: Calculated loudness and specific loudness
The proposed extension of the ANSI S3.4 loudness standard constitutes but a minor
modification as is shown by figure 3. One step is added before calculating specific loudness and
two parameters are assigned slightly different values. Nothing else is changed, so the model does
not become substantially more complex.
A first criterion for any valid loudness calculation is a proper prediction for 1-kHz tones as a
function of level, because of their importance for the phon scale. Loudness should double with
every 10-dB increase above 40 dB SPL and deviations should be within 5% [11]. This is fulfilled by
ANSI S3.4 up to 90 dB and also by the proposed extension, see table 1.
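The doubling criterion can be checked against the values of table 1 with a few lines of code. The loudness values below are copied from table 1; the function names and the small numerical tolerance added to the 5% limit are illustrative choices.

```python
LEVELS = [40, 50, 60, 70, 80, 90, 100]               # dB SPL, 1-kHz tone
N_ANSI = [1.00, 2.10, 4.17, 8.10, 16.0, 33.2, 70.4]  # sone, ANSI S3.4
N_EXT = [1.00, 2.10, 4.15, 8.02, 15.5, 31.0, 63.4]   # sone, proposed extension


def target_sone(level_db):
    """Loudness doubles with every 10 dB above 40 dB SPL (1 sone at 40 dB)."""
    return 2.0 ** ((level_db - 40) / 10)


def relative_deviation(values):
    return [n / target_sone(l) - 1.0 for l, n in zip(LEVELS, values)]


def within_tolerance(values, tol=0.05):
    # A tiny epsilon keeps an exact 5% deviation from failing on rounding.
    return [abs(d) <= tol + 1e-9 for d in relative_deviation(values)]
```

For the tabulated values, the ANSI S3.4 entry at 100 dB deviates by 10% (70.4 against 64 sone), while all entries of the extension from 40 to 100 dB stay within 5%.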
Other criteria are the threshold in quiet or equal loudness contours. If the sound of interest is
a pure tone, the values for an average young normal-hearing listener can be looked up directly in
the appropriate standards; nevertheless, a loudness model should predict them as well.
ISO 389-7:2005 [6] states that the absolute threshold is at 2.4 phon which corresponds to
0.002 sone in the proposed extension. This value is obtained with an accuracy of ±2 dB between
80 Hz and 10 kHz and is thus within the deviation of the results obtained by the participating
laboratories. The equal loudness contours are shown in figure 4. It can be seen that the
estimations of the extended model agree reasonably well with the standard (ISO 226:2003).
Although real-world sounds may have tonal components, most of them are broadband
sounds with little similarity to a pure tone. That is why it is very important to look at the
calculated loudness of broadband sounds. Figure 5 illustrates binaural specific loudness as
predicted by the two standards and by the present extension for pink noise. To enable this
comparison, the summation across the two ears was performed before spectral summation and
the original specific loudness of ANSI S3.4 was converted to the Bark scale [17]. The specific
loudness of the extension for ANSI S3.4 is very similar to that of DIN 45631, especially in the
most sensitive auditory frequency range. This means that the loudness of pink noise calculated by the
extension is close to that of DIN 45631 and hence also close to subjective evaluations [16]. At low
levels, like 15 dB per third octave, the extension predicts loudness to be slightly higher than
DIN 45631 does and almost exactly coincides with the experimental data (not shown here).
Because of the similar main loudness, the predictions for bandpass-filtered pink noise are
also close to those of DIN 45631 and the estimations for the lower noise investigated in the
present study are very good (figure 1). For the higher noise, the model extension also
overestimates loudness, though not as dramatically as the original model (figure 2).
Altogether, the proposed extension predicts broadband noise quite well.
Finally, a test of the extended model on real-world sounds is presented. Figure 6 depicts
results presented by Schlittenlacher et al. [16] and shows the predictions of the extended model for
stationary technical sounds plotted against the corresponding subjective loudness matches.
Third-octave levels were used as the input for the calculations. The prediction made by the
proposed extension falls within the interquartile range of the adjustments made for six of the
eight sounds. When comparing the model predictions with the experimental data (figure 6),
the root mean square error of the seven louder sounds, based on the deviation of loudness in
percent, is 25% for the proposed extension. DIN 45631 achieves 21%, the unchanged ANSI S3.4 56%. The softer notebook
fan noise was excluded because participants were probably confused by a strong tonal component near
the reference frequency; it is not predicted well by any algorithm. A deviation of about 20% is
satisfactory because it is within the range of variations between listeners and some of the chosen
technical sounds are only approximately stationary.
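The error measure used here, a root mean square of the percentage deviations in loudness, could be computed as in the sketch below. The paper does not spell out the formula, so taking the subjective match as the reference in the denominator is an assumption, as is the function name. The individual values behind figure 6 are not listed in the text and are therefore not reproduced.

```python
import math


def rms_percent_error(predicted_sone, matched_sone):
    """RMS of the deviation of predicted loudness from matched loudness, in percent."""
    devs = [100.0 * (p - m) / m for p, m in zip(predicted_sone, matched_sone)]
    return math.sqrt(sum(d * d for d in devs) / len(devs))
```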
4 CONCLUSIONS
The experimental results of this paper show that ANSI S3.4 and DIN 45631 calculate the
loudness of low-frequency broadband sounds quite well. However, both considerably
overestimate loudness around 3 kHz, the unchanged ANSI S3.4 to a greater extent than the DIN
standard.
The proposed extension of the ANSI standard accounts for this problem by reducing the
amount of spectral summation and specific loudness around 3 kHz. This leads to predictions for
the loudness of broadband and technical sounds which are quite consistent with actual loudness
matches obtained in listening tests. At the same time the algorithm is optimized regarding
ISO 226:2003.
Thus, the extension combines the advantages of the current standards: Like ANSI S3.4, it
estimates the revised equal loudness contours very well, but it also achieves the good predictions
of DIN 45631 for broadband sounds.
5 ACKNOWLEDGMENTS
The authors would like to thank Professor Hugo Fastl (Technische Universität München),
Professor Seiichiro Namba and Professor Sonoko Kuwano (Osaka University) for valuable
comments and the fruitful cooperation which led to this work.
6 REFERENCES
1. ANSI S3.4, “Procedure for the Computation of Loudness of Steady Sounds”, (2007)
2. DIN 45631, “Berechnung des Lautstärkepegels und der Lautheit aus dem Geräuschspektrum
– Verfahren nach E. Zwicker (Procedure for calculating loudness level and loudness)”,
(1991)
3. B.R. Glasberg, B.C.J. Moore, “Derivation of auditory filter shapes from notched-noise data”,
Hearing Research, 47, (1990)
4. IEC 268-7, “Sound system equipment – Part 7: Headphones and earphones”, (1996)
5. ISO 226, “Acoustics – Normal equal-loudness-level contours”, (2003)
6. ISO 389-7, “Acoustics – Reference zero for the calibration of audiometric equipment – Part
7: Reference threshold of hearing under free-field and diffuse-field listening conditions”,
(2005)
7. ISO 532, “Acoustics – Method for calculating loudness level”, (1975)
8. H. Fastl, “Loudness and Masking Patterns of Narrow Noise Bands”, Acustica, 33, (1975)
9. H. Fastl, E. Zwicker, “A free-field equalizer for TDH 39 earphones”, J. Acoust. Soc. of Am.,
73(1), (1983)
10. H. Fastl, E. Zwicker, Psychoacoustics – Facts and models, 3rd edition, Springer, (2007)
11. H. Fastl, F. Völk and M. Straubinger, “Standards for calculating loudness of stationary or
time-varying sounds”, Proc. Inter-Noise 2009, Ottawa, (2009)
12. B.C.J. Moore, B.R. Glasberg, “Modeling binaural loudness”, J. Acoust. Soc. of Am., 121(3),
(2007)
13. B.C.J. Moore, An Introduction to the Psychology of Hearing, 6th Edition, Emerald, (2012)
14. R.D. Patterson, “Auditory filter shapes derived with noise stimuli”, J. Acoust. Soc. of Am.,
59(3), (1976)
15. U. Richter, “Characteristic data of different kinds of earphones used in the extended high
frequency range for pure-tone audiometry”, Mechanik und Akustik, PTB-MA-72,
Braunschweig, (2003)
16. J. Schlittenlacher, T. Hashimoto, H. Fastl, S. Namba, S. Kuwano and S. Hatano, “Loudness
of pink noise and stationary technical sounds”, Proc. Inter-Noise 2011, Osaka, (2011)
17. J. Schlittenlacher, H. Fastl, T. Hashimoto, S. Kuwano and S. Namba, “Differences of loudness
algorithms across the frequency spectrum”, Tagungsband Fortschritte der Akustik – DAGA
2012, Darmstadt, (2012)
18. E. Zwicker, G. Flottrop, S.S. Stevens, “Critical Band Width in Loudness Summation”, J.
Acoust. Soc. of Am., 29(5), (1957)
19. E. Zwicker, D. Maiwald, “Über das Freifeldübertragungsmaß des Kopfhörers DT 48 (On the
free-field response of the earphone DT 48)”, Acustica, 13, (1963)
Table 1: Loudness calculated for 1-kHz pure tones

L1kHz (dB)              10    20    30    40    50    60    70    80    90    100
NANSI (sone)            0.03  0.14  0.42  1.00  2.10  4.17  8.10  16.0  33.2  70.4
NANSI, extended (sone)  0.03  0.14  0.42  1.00  2.10  4.15  8.02  15.5  31.0  63.4
Target value (sone)     -     -     -     1.00  2.00  4.00  8.00  16.0  32.0  64.0
Figure 1: Level of a 1-kHz pure tone that is as loud as a bandpass pink noise from 125 to
1000 Hz. Circles indicate medians, whiskers the interquartile range.
Figure 2: Level of a 1-kHz pure tone that is as loud as a bandpass pink noise from 1.25 to 5 kHz.
Circles indicate medians, whiskers the interquartile range.
Spectrum → Outer ear transfer function → Middle ear transfer function → Calculation of excitation patterns → ERB to Bark transformation (added step) → Transformation to specific loudness* → Summation along critical bands → Summation to binaural loudness → Loudness
Figure 3: Structure of the extended algorithm. In the original figure a dashed line marks the step that was added (ERB to Bark transformation); the star indicates that parameters were changed.
Figure 4: Equal loudness contours
Figure 5: Binaural specific loudness of pink noise with a third-octave level of 55 dB
Figure 6: Loudness of technical sounds. The ordinate shows the loudness as given by the fixed
1-kHz reference tone, the abscissa as calculated by the extended ANSI S3.4. Circles indicate
medians, whiskers the interquartile range