www.siemens.com/hearing
Improving Speech Intelligibility in Noise
with Binaural Beamforming
The technology behind binax Narrow Directionality & Spatial SpeechFocus
Homayoun Kamkar-Parsi, Ph.D., Dipl.-Ing. Eghart Fischer, Dipl.-Ing. Marc Aubreville
© Siemens Audiology, 2014.
Abstract
This paper describes the new binaural directional features, implemented in the latest binaural Siemens hearing
aid system binax, which is capable of transmitting audio signals from one hearing instrument to the other in
bilateral fitting. ‘Narrow Directionality’, an enhanced binaural beamforming algorithm designed especially for
very difficult listening situations, is introduced. A second feature, ‘Spatial SpeechFocus’, a self-steering binaural
beamforming algorithm, is also described. This feature is designed especially for situations with talkers to the
side in noisy environments, e.g. in cars. The advantages of automatic activation and control of the
new algorithms, and their low power consumption, are also reviewed.
Introduction
Binaural hearing, meaning perceiving and processing slightly different acoustical information from both sides
of the head, enables our hearing system to perform amazingly well in multiple respects. Surely the most well-known benefit of our binaural system is the ability to localize acoustic sources. This important ability is essential for spatial orientation and for reacting quickly and appropriately to acoustic events. But there are more fascinating qualities associated with binaural processing. Our brain is able to fuse the information perceived by both
ears together to form one enhanced common output even when the individual ears only received incomplete or
distorted input. This process is called binaural redundancy. Furthermore, our binaural system provides a kind of
natural noise reduction, meaning that due to spatial, spectral, and time-based differences between the ears, a
so-called binaural squelch is applied.
But perhaps the most intriguing area related to binaural processing is the set of effects that help us to understand a
particular desired talker in background noise. Our brain is able to localize multiple sound sources and to
assign the correct characteristics to these sources simultaneously. As soon as the auditory system has localized
a preferred sound, it can extract the signals of this sound source out of a mixture of interfering sounds in a
process called binaural directed listening. Common real-world examples might be a dinner table, where an
individual is trying to listen to a person sitting off to one side, or while driving a car with the radio on and trying to understand the passenger.
All these benefits of natural binaural hearing can be reduced by hearing loss, aging, declining cognitive function, and central auditory processing deficits, all factors that often apply to hearing instrument wearers. This
presents a challenge when new hearing instrument technology is designed, as these user-related conditions
need to be taken into consideration. In addition, this is especially critical regarding hearing instrument use, as
not being able to understand speech in background noise is the most common complaint from new hearing
instrument wearers, and poor performance in background noise is the most common reason for hearing instrument rejection.
How can we enhance binaural processing?
In order to emulate binaural listening, we first need to link the two hearing instruments, just as the brain
uses input from both ears. Building upon the original e2e technology, which won the prestigious German President’s Future Prize in 2012, e2e wireless 3.0 is more powerful than ever before. In addition to exchanging hearing instrument data such as volume and program settings, e2e wireless 3.0 is also able to transmit
audio signals directly between the two hearing instruments. This means that each hearing instrument in a bilateral pair
works not only with input from the two microphones on its own housing, but also with the acoustic
signal picked up by the two microphones on the other hearing instrument, creating a hearing instrument that
works with input from four microphones. And because, in the wearing position, the microphones are located at
different parts of the head, the acoustic information each one picks up is slightly different, just like it is for the
right and left ear. Together, the information from all the microphones creates a much more complete and accurate impression of the surrounding acoustic environment. We call this heightened sensitivity to the acoustic
environment high definition sound resolution (HDSR). The HDSR processing is what enables the new Siemens
binax hearing instruments to offer a number of features which emulate the natural binaural hearing processes.
We believe that the algorithms in binaural hearing instruments have to follow the same principles as our internal binaural processor, the brain. To achieve optimal speech intelligibility in very noisy situations, we strategically combine the signals from both right and left instruments in order to use any information available from
the target direction, while at the same time reducing the influence of the non-frontal directions where interferers are assumed. To better hear a speaker from one side in the presence of noise from the other side, we also
exploit the natural principles of auditory localization, using interaural time differences (ITDs) and interaural
level differences (ILDs). In this way, we keep the sound sources at their natural positions even though the
non-desired side is attenuated.
In all that we do, we know that speech intelligibility is not only a matter of simply attenuating interfering noise,
but also of keeping these (attenuated) interferers in the correct spatial position for later processing in the brain
(spatial stream segregation). Therefore, we ensure that the important binaural cues are well preserved so that
the natural and artificial “binaural processors” can work together.
Bringing benefits of binaural hearing to daily use
New features should not only be innovative, they also need to provide practical benefit for everyday use. First
of all, this means that these features should work automatically without wearer manipulation, and fast enough
in order to be effective in ever-changing real-life situations. The features of binax are fully integrated into the
automatic Universal program. They engage, disengage, and adapt smoothly and automatically in response to the
changing acoustic environment, and function synergistically with other existing hearing instrument features.
They do not require separate programs, additional accessories, or volume adjustments. Only when
everything just works, and works automatically, can the wearer forget that he or she is wearing hearing instruments.
Secondly, when we are talking about features in tiny hearing instruments with even tinier batteries, practicality also means that new features need to be energy efficient. Only when these features can be engaged
when necessary without significantly reducing the battery life can they be truly beneficial. Compared with our
previous micon platform, which did not have bilateral audio data transfer, binax battery consumption remains
the same for the normal microphone modes. Most importantly, when audio streaming is active, the additional binaural features offered by binax increase the battery consumption by only 200 µA or less, from 1.3 mA to
still below 1.6 mA. And this, of course, applies only while the new features are automatically activated. This is considerably more efficient than other products using this type of binaural processing. Put in more practical terms, a
size 312 battery in a Pure micon S would last approximately 9.5 days given a typical usage profile (see footnote 1). That same
battery would last 8.3 days in a Pure binax S given the same usage profile including active binaural features.
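The two battery-life figures can be put in relative terms with a little arithmetic. The sketch below uses only the 9.5-day and 8.3-day values quoted above. The function name is illustrative, and the "implied current increase" assumes a fixed battery capacity, so that average current is inversely proportional to battery life.

```python
def relative_reduction(before_days: float, after_days: float) -> float:
    """Fractional reduction in battery life when going from before_days to after_days."""
    return 1.0 - after_days / before_days


micon_days = 9.5  # Pure micon S, value quoted in the text
binax_days = 8.3  # Pure binax S with active binaural features, value quoted in the text

life_reduction = relative_reduction(micon_days, binax_days)

# Assumption: same battery capacity in both cases, so the average current
# over the usage profile scales with 1 / battery life.
implied_current_increase = micon_days / binax_days - 1.0

print(f"battery life reduction: {life_reduction:.1%}")
print(f"implied average current increase: {implied_current_increase:.1%}")
```

Under these assumptions the quoted figures correspond to a battery-life reduction of roughly one eighth over the whole usage profile.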
binax Narrow Directionality
Our objective was to develop binaural algorithms which address the most dissatisfying listening situations for
those with hearing loss. It has been well documented in MarkeTrak studies (e.g. Kochkin, 2005; Kochkin, 2010)
that the most problematic situations for the hearing impaired are understanding speech in noisy environments
and large groups [1, 2]. In fact, satisfaction in this area did not improve substantially from MarkeTrak VII to
MarkeTrak VIII. Therefore, we created Narrow Directionality to address those problems.
Narrow Directionality is a new advanced binaural beamforming system, which uses our binaural wireless audio
link technology (e2e wireless 3.0). This algorithm improves speech understanding in extremely noisy and adverse acoustic environments, and provides a more effective solution to the cocktail party problem than
bilaterally-fitted conventional monaural differential microphone systems.
Narrow Directionality is designed to enhance the speech signal coming from a target speaker located among
multiple other competing or interfering speakers around the listener. It creates a narrow beam towards the
front direction so the wearer can listen easily to any distinct speaker by turning his head towards him or her. It
improves the speech signal from the target speaker in two ways simultaneously: by quickly reducing other
competing speech signals outside the beam angular range, and by boosting the level of the target speaker
signal within the beam (i.e. ‘focus’ on the target speaker).
Footnote 1: 16-hour wearing day including 2 hours of Bluetooth streaming.
As is well known, conventional monaural directional microphone systems can only effectively suppress interferences in the rear hemisphere. This new system, however, can attenuate interfering speakers or
noise from any direction that is not immediately in front of the wearer (i.e. a ‘narrow point of listening’). The system also incorporates a module that prevents the attenuation or distortion of the target speech signal due to small head movements during a typical conversation. As a result, the system can quickly adapt and compensate for small movements such as ±10°, so that the target speech signal is not constrained to be precisely within the very narrow frontal
beam range. This allows a more comfortable conversation without obliging the user to always
face the target speaker directly.
Additionally, the system is well integrated with an automatic control which smoothly adjusts (channel-wise) the
directivity from a wider to a narrower beam as the SNR in the environment deteriorates. This ensures the optimal listening experience in all kinds of noisy environments.
Monaural processing and binaural processing
Existing advanced monaural directional microphone systems (i.e. monaural beamforming) provide great noise
suppression, but mostly from the back hemisphere. In simple terms, only noises or even interfering speakers at
the back of the hearing impaired person are well attenuated. So what if the interfering noise comes from the
side or from next to the target speaker? Narrow Directionality (i.e. our advanced binaural beamforming system;
see Figure 2) is built on top of our existing monaural directional microphone system to tackle even more challenging noisy situations.
Figure 2: Simplified block diagram of Binax Narrow Directionality system composed of a monaural processing stage followed
by a binaural processing stage, which takes as inputs the local signal (the monaural directional signal) and the contralateral
signal (the monaural directional signal transmitted from the hearing instrument from the other side of the head i.e. via e2e
wireless audio link)
Figure 3 illustrates the various listening modes from omnidirectional to Narrow Directionality. In Figure 3a, the
listening mode is set to omnidirectional. This implies no sound suppression of any kind. In this setting, the listener
hears all the surrounding sounds equally. In Figure 3b, the listening mode is set to monaural or traditional directional processing. This creates a wide beam towards the front direction, and any interfering noise or talkers
from the back hemisphere are attenuated. In Figure 3c, the listening mode is set to Narrow Directionality. The interferers from the back hemisphere remain attenuated, but additionally, interfering speakers in the proximity of
the target speaker are also well suppressed. This is due to the narrower frontal beam offered by binax technology.
Figure 3: a) Omnidirectionality. b) Monaural directionality. c) Binaural Narrow Directionality
Insights into Narrow Directionality processing
As introduced in the previous section, Figure 2 is a simplified block diagram of our Narrow Directionality system
incorporating the fusion of monaural and binaural systems. Now if we take a closer look at our system as
shown in Figure 4, our binaural processing is composed of three essential components: the binaural
beamforming, the binaural noise reduction gain and the head movement compensation module. All those
components work together to achieve the Narrow Directionality effect.
Figure 4: Binax Narrow Directionality – closer look at the Binaural Processing system composed of three components: Binaural beamforming, binaural noise reduction gain and a head movement compensation module
Binaural Beamforming
This is a new kind of binaural beamforming based on utilizing the head shadowing effect. It takes into account
the contralateral wireless binaural signal as shown in Figure 4.
For each side, the binaural beamformer is designed as follows: it takes as input the local signal, which is the
monaural directional signal, and the contralateral signal, which is the monaural directional signal transmitted
from the hearing instrument from the other side of the head (i.e. via our binaural wireless link e2e wireless
3.0). It should be noted that the monaural directional signal (from the local side or the contralateral side) is
already an enhanced signal with reduced noise from the back hemisphere as illustrated in Figure 3b.
The output of the beamformer is generated by linearly adding the weighted local signal and the weighted contralateral signal. The weighting scheme, which is a crucial part of the overall design, aims to provide an
output signal with maximum lateral interference cancellation while keeping the frontal speaker untouched.
How, then, should the weights be optimized to achieve this goal? Taking the left hearing instrument as the reference, imagine the following example: a target speaker is located at 0° in front of the listener, who is fitted with
binaural hearing instruments. The local and contralateral (i.e. transmitted from the right hearing
instrument) signals will therefore arrive at both ears at the same time, without any head shadowing effect. In
other words, both signals will have approximately the same power and phase. However, for the case of a lateral
interfering noise (e.g. at 45° toward the reference side), the local and contralateral signals will be different due to head shadowing and
interaural time difference, i.e. the signals will have power and phase (or time) differences. Therefore, given a
target at 0° with the presence of an interfering lateral noise or a competing talker at 45°, the local signal power
will be higher than the contralateral signal power due to the interfering noise. It also implies that the contralateral signal is the one which is less affected by the noise due to the head shadowing.
Having this example in mind, we designed an algorithm to derive the optimum weights with the following
criteria:
A) The sum of the weighted local and the weighted contralateral input signals should always produce
an output signal (to each ear) with minimum power with respect to the original local and contralateral signal powers. This criterion ensures that lateral interferences are attenuated. Importantly, the
weights are also adaptive and are updated within milliseconds to closely follow fast changes in the
noisy environment.
B) The additive combination of the weights should always add up to 1. This ensures that the target
signal at 0° remains untouched.
Applying A and B, our binaural beamformer creates a narrow beam to the front direction with the beam pattern
representation as depicted in Figure 5. Figure 5 illustrates Narrow Directionality (combination of binaural
beamforming and binaural noise reduction) output signal characteristic relative to monaural directionality.
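Criteria A and B can be illustrated with a small numerical sketch. This is not the binax implementation. It assumes a single real-valued signal frame, a closed-form block solution instead of the adaptive update described above, and head shadowing modelled as a pure level difference (no interaural delay). All function names and the 0.3 shadowing factor are hypothetical. With weights w and 1 - w, the output is contra + w * (local - contra), so the power-minimising w follows from a one-dimensional least-squares fit.

```python
import math


def min_power_weights(local, contra):
    """Return (w_local, w_contra) that minimise output power with w_local + w_contra = 1."""
    diff = [l - c for l, c in zip(local, contra)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical inputs (pure frontal target): every valid weight pair gives the same output.
        return 0.5, 0.5
    # Output = contra + w * diff, so the minimising w is the least-squares coefficient.
    w_local = -sum(d * c for d, c in zip(diff, contra)) / denom
    return w_local, 1.0 - w_local


def beamform(local, contra):
    """Weighted sum of the local and contralateral signals (criteria A and B)."""
    w_l, w_c = min_power_weights(local, contra)
    return [w_l * l + w_c * c for l, c in zip(local, contra)]


def power(x):
    return sum(v * v for v in x) / len(x)


n = 256
target = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]       # frontal talker
interferer = [math.sin(2 * math.pi * 13 * k / n) for k in range(n)]  # lateral talker
# Hypothetical head shadow: the interferer is 0.3 times as strong on the far side.
local = [s + 1.0 * v for s, v in zip(target, interferer)]
contra = [s + 0.3 * v for s, v in zip(target, interferer)]

out = beamform(local, contra)
residual = power([o - s for o, s in zip(out, target)])
print(f"input powers: {power(local):.3f}, {power(contra):.3f}; output power: {power(out):.3f}")
print(f"residual interference power: {residual:.2e}")
```

Because the weights sum to 1, any component that is identical in both inputs (the frontal target) passes through unchanged, while components that differ between the two sides are what the power minimisation removes.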
Binaural Noise Reduction
To further enhance the output signal from our binaural beamformer, as depicted in Figure 5b, we also developed a new adaptive binaural noise reduction gain which is fully integrated within the noise reduction unit.
This noise reduction gain can be interpreted as a binaural Wiener-based gain computed using as inputs the local
and the contralateral signals. We designed it to have two specific properties. First, it attenuates lateral interfering speakers coming from outside the frontal target angular range (±10°). A competing speaker
very close to the target speaker is therefore further attenuated (i.e. by applying a gain below 1, the typical Wiener-based gain
attenuation). Second, if there is a frontal target speaker present within the frontal angular range, the frontal
speaker is moderately amplified (or boosted) by applying a gain above 1 (which is not a typical Wiener-based
gain property).
Figure 5: Narrow Directionality output signal characteristic. a) Monaural directional output characteristic. b) Binaural
Beamforming output characteristic. c) Binax Narrow Directionality: Binaural Beamforming combined with Binaural Noise
Reduction.
Narrow Directionality gives the hearing impaired wearer the perception that he is focusing on the person he is
directly looking at (like a magnifying glass) as illustrated in Figure 5c. It should be noted that the adaptation of
the binaural noise reduction gain is fast enough (within milliseconds) to rapidly amplify or attenuate depending
on the acoustic situation and of course without any background noise increase.
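One way to picture a Wiener-like gain that can exceed 1 for frontal sound is sketched below. This is an illustrative construction, not the binax gain rule. It assumes a broadband, real-valued frame rather than per-band processing, uses the normalised cross-power between the two sides as a stand-in for the "target to total" power ratio, and the `boost` and `floor` values are hypothetical.

```python
import math


def binaural_gain(local, contra, boost=1.25, floor=0.1):
    """Wiener-like gain from inter-side similarity, scaled so frontal sound gets gain > 1."""
    n = len(local)
    p_local = sum(v * v for v in local) / n
    p_contra = sum(v * v for v in contra) / n
    cross = sum(l * c for l, c in zip(local, contra)) / n
    mean_power = 0.5 * (p_local + p_contra)
    if mean_power == 0.0:
        return 1.0  # silence: leave the signal untouched
    # Similarity is 1 when both sides carry the same signal (frontal source) and
    # drops when level or phase differ between the sides (lateral source).
    similarity = max(cross, 0.0) / mean_power
    return max(floor, boost * similarity)


n = 256
talker = [math.sin(2 * math.pi * 7 * k / n) for k in range(n)]

frontal_gain = binaural_gain(talker, talker)                      # same signal on both sides
lateral_gain = binaural_gain(talker, [0.3 * v for v in talker])   # hypothetical head shadow
print(f"frontal gain: {frontal_gain:.2f}, lateral gain: {lateral_gain:.2f}")
```

In this sketch the frontal case receives the full boost, while the lateral case falls below 1, mirroring the two properties described in the text.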
Head movement compensation
As discussed earlier, Narrow Directionality creates a narrow frontal beam to efficiently reduce interfering noises
from all directions under the assumption that the target speaker is in front of the listener. But what about small
movements from both the target speaker and the listener? Would any small head movements also attenuate the
desired target speaker due to the narrow beam? At first glance, the answer would be yes. And this is why Narrow Directionality also includes a head movement compensation module to avoid any target distortion. This is
necessary to ensure a normal, comfortable conversation without requiring the listener to always directly face
the target speaker.
Head movements are a part of normal behavior during a conversation, and they usually occur very quickly. Plus
and minus 10 degrees can be considered as a normal range of regular head movements by either the speaker or
the listener. So if the target signal is at +10°, the head movement compensation module modifies the incoming
binaural signal from +10° and brings it back to 0° (more specifically, the phase and the level of the contralateral
signal are re-adjusted to match the local signal). This allows the binaural beamforming and the binaural noise
reduction to behave in the same original manner; that is, the target signal is ‘seen’ at 0°, which lies within the
narrow frontal beam of Narrow Directionality. As a result, the target signal is still enhanced rather
than attenuated despite head movements.
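The idea of re-adjusting the phase and level of the contralateral signal can be sketched as a simple alignment step. This is only an illustration under strong assumptions: a single periodic frame, an integer-sample circular shift standing in for the phase correction, and a broadband level match. The function name, the lag search range, and the test signal are all hypothetical.

```python
import math


def align_contra(local, contra, max_lag=4):
    """Shift and scale the contralateral frame so that it matches the local frame."""
    n = len(local)

    def shifted(x, lag):
        # Circular shift: a positive lag delays x, a negative lag advances it.
        return [x[(k - lag) % n] for k in range(n)]

    def correlation(lag):
        return sum(l * c for l, c in zip(local, shifted(contra, lag)))

    # Time (phase) alignment: pick the small shift with the highest correlation.
    best_lag = max(range(-max_lag, max_lag + 1), key=correlation)
    aligned = shifted(contra, best_lag)

    # Level alignment: match the power of the shifted signal to the local signal.
    p_local = sum(v * v for v in local)
    p_aligned = sum(v * v for v in aligned)
    scale = math.sqrt(p_local / p_aligned) if p_aligned > 0.0 else 1.0
    return [scale * v for v in aligned], best_lag, scale


n = 256
local = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
# Hypothetical slightly off-axis target: the far side is 2 samples later and 20% weaker.
contra = [0.8 * local[(k - 2) % n] for k in range(n)]

aligned, best_lag, scale = align_contra(local, contra)
max_error = max(abs(a - l) for a, l in zip(aligned, local))
print(f"lag: {best_lag} samples, scale: {scale:.2f}, max error: {max_error:.2e}")
```

After alignment the two inputs look like a target at 0°, so a beamformer whose weights sum to 1 would pass the target unchanged.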
Automatic Control of Narrow Directionality
With Narrow Directionality, the user can now understand speech better in challenging situations, such as a loud cafeteria environment. However, in quieter situations, it is important to hear from all around. This is where automatic control of the binaural algorithms comes into play. The goal of this intelligent algorithm is to estimate the
complexity of a situation, and seamlessly introduce more directionality when needed. For this, multiple criteria
are evaluated.
The acoustic complexity of the hearing aid user’s listening environment is usually linked with the background noise
level. Binaural processing is activated only above a certain background noise level, a threshold that was optimized in various
noisy environments. This threshold is higher than the corresponding noise level for monaural
processing. If this binaural threshold is exceeded, the binaural audio transmission is enabled. Once the audio
signal from the contralateral hearing instrument is available, the hearing instrument has far more possibilities
to analyze the scene. Using a combination of both ipsi- and contralateral metrics, the effect of the beamformer
is restricted to situations and frequency bands where it is useful. Furthermore, its strength is adjusted frequency-specifically, depending on the background noise level (Figure 6a). For lower noise levels, the monaural directional microphone is engaged to provide sufficient noise attenuation in these situations (Figure 6b). As the
noise level increases, Narrow Directionality engages and its effects are increased accordingly until it reaches full
directionality for high noise levels. This has the important advantage of keeping spatial orientation and sound
naturalness to a maximum in medium noisy situations, and in situations where most noise is contained in the
lowest frequencies. It should be noted that the classification of the acoustic situation surrounding the hearing
aid wearer is also taken into account. For instance, there might be some situations, such as enjoying loud music,
that should not activate a narrow beamformer in an automatic program.
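The level-dependent, channel-wise behaviour described above can be sketched as a simple mapping from noise level to beamformer strength. The thresholds (65 and 80 dB), the linear ramp, and the function names are hypothetical placeholders chosen for illustration. The text does not state the actual values or the shape of the transition.

```python
def narrow_strength(noise_db, onset_db=65.0, full_db=80.0):
    """Map a channel's background noise level to a Narrow Directionality strength in [0, 1]."""
    if noise_db <= onset_db:
        return 0.0  # below the binaural threshold: monaural directionality only
    if noise_db >= full_db:
        return 1.0  # very noisy: full binaural effect
    return (noise_db - onset_db) / (full_db - onset_db)


def control(channel_levels_db, music_detected=False):
    """Per-channel strengths; the situation classifier can veto the narrow beam."""
    if music_detected:
        return [0.0] * len(channel_levels_db)
    return [narrow_strength(level) for level in channel_levels_db]


# Hypothetical noise levels in three frequency channels (dB).
levels = [60.0, 72.5, 90.0]
print(control(levels))                       # [0.0, 0.5, 1.0]
print(control(levels, music_detected=True))  # [0.0, 0.0, 0.0]
```

A real controller would additionally smooth these strengths over time so that transitions are not audible.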
Figure 6: a) An example of frequency-dependent activation of the binaural Narrow Directionality effect, e.g. in a cocktail party situation. b) Directional microphone benefit is maximized by providing it in 48 channels.
For the hearing instrument user, the mechanisms “under the hood” are not important. He or she only needs to
be focused on good speech understanding and listening comfort, in every situation. This is why it is so important to have a seamless adaptation as the acoustic scenery changes. Any control that is acting too fast is
prone to misdetections, whereas any control that is acting too slowly can be noticed, and is therefore irritating. If
the situation changes quickly, the automatic steering has to adapt quickly; if the situation changes gradually,
the automatic steering should also adapt gradually. The best possible user reaction is: “I did not
notice any automatic adjustments of the hearing instrument; it just always seemed right.” The maximum user
benefit is reached when the amount of directionality and noise reduction is optimized so that the overall perception is natural and the adjustments are unnoticeable.
The efficacy of the Narrow Directionality algorithm for speech recognition for the hearing impaired has been
studied in clinical trials at two different independent sites. These behavioral findings are very encouraging, and
are in good agreement with SNR advantages expected in the algorithm design. This research is summarized in
a companion paper by Powers & Froehlich, 2014 [3].
Advanced beamforming in 360 degrees: Spatial SpeechFocus
The binaural audio transmission introduced with the binax platform enables not only beamforming for situations where the target speaker is in the front, but also beamforming to the side of the hearing instrument wearer, e.g. for situations like walking or sitting side by side.
Up to now, the best one could do when a target speaker is located to the side was to select the omnidirectional
mode. However, in these situations the interfering sources, such as street noise, often come from the other directions, and omnidirectional processing cannot suppress the undesired sources. This is where Spatial SpeechFocus, introduced in the binax platform, comes into play. By suppressing the undesired side and
enhancing the target signal from the desired side on both ears, this technology is built to increase listening
comfort as well as speech understanding.
Figure 7: Interaural time difference. a) f = 250 Hz. b) f = 1 kHz. Whereas in panel a) the direction of arrival of the sound wave is clear,
in panel b) it is already ambiguous.
The algorithm works on the same fundamental principles as human hearing. When sound is coming from
one side of the head, it will differ between the two ears in two major ways. First, as sound propagates, it will arrive
earlier at the ear closer to the sound source. This effect is called interaural time difference (ITD). If sound
comes exactly from the side of the person (90 degrees), this time difference is approximately 0.7 milliseconds.
This results in a phase difference of the respective sound, which can be used by a state-of-the-art differential
beamformer [4]. This phase difference is most evident for frequencies lower than around 750 Hz (Figure 7).
Luckily, there is a second effect to exploit for higher frequencies. As the sound waves travel past the head,
lower-frequency sound waves are diffracted by the head, and hence not much attenuated. But for higher
frequencies, there is significant attenuation that can be detected and used to determine the origin of a sound.
This can be accomplished even for high-frequency noises, albeit with less precision. This effect is called
interaural level difference (ILD). In the Spatial SpeechFocus algorithm, introduced with the binax platform, a so-called Wiener-filter-based approach is used to suppress signals coming from an undesired side.
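The link between the 0.7 ms time difference and the frequency limit can be checked with a short calculation. The sketch assumes a pure tone and uses the standard relation that a delay of t seconds corresponds to a phase shift of 360 * f * t degrees. The phase becomes ambiguous once it reaches half a cycle (180°), because the leading ear can then no longer be identified from phase alone. Function names are illustrative.

```python
ITD_90_DEG_S = 0.7e-3  # interaural time difference for sound from 90 degrees, from the text


def interaural_phase_deg(freq_hz, itd_s=ITD_90_DEG_S):
    """Phase difference between the ears, in degrees, for a pure tone."""
    return 360.0 * freq_hz * itd_s


def is_phase_unambiguous(freq_hz, itd_s=ITD_90_DEG_S):
    """True while the phase difference stays below half a cycle."""
    return interaural_phase_deg(freq_hz, itd_s) < 180.0


# Frequency at which the phase difference reaches exactly half a cycle.
ambiguity_limit_hz = 1.0 / (2.0 * ITD_90_DEG_S)

for f in (250.0, 1000.0):  # the two frequencies shown in Figure 7
    print(f"{f:6.0f} Hz: {interaural_phase_deg(f):5.1f} deg, "
          f"unambiguous: {is_phase_unambiguous(f)}")
print(f"half-cycle limit: {ambiguity_limit_hz:.0f} Hz")
```

The half-cycle limit of roughly 714 Hz for a 0.7 ms delay is in line with the "around 750 Hz" figure quoted above, and it explains why the 250 Hz case in Figure 7 is clear while the 1 kHz case is not.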
Figure 8: Polar patterns for binax Spatial SpeechFocus, measured on the left hearing aid, in anechoic chamber condition.
Dashed: omnidirectional, solid: Spatial SpeechFocus. a) f = 500 Hz. b) f = 2 kHz.
Both ITD and ILD are utilized to create a powerful beamforming algorithm, the effect of which can be observed
in the polar plots shown in Figure 8. Depending on the frequency and room acoustics, the attenuation is approximately 10 dB.
The major advantage over merely copying the signal of the preferred ear to the other side is that in this application, spatial
cues are kept. That is, the user can still localize the sound and has a natural spatial impression. The polar pattern in Figure 9b shows a beamformer focused to the left side. It can be observed that sources coming from the
right side (regarded as noise in this case) are attenuated by the same amount on the left and right ear compared
to the omni signal in Figure 9a. The interaural level differences (shaded area) and thus the localization of
the sources remain untouched.
This beamformer can be controlled manually, using a remote control application (“Spatial Configurator Direction”), but it also can be controlled automatically. For this, features that correlate with the signal-to-noise ratio
of speech are calculated independently in both hearing aids. These features represent the probability of speech
from the front, the back and one of the sides. By exchanging this information and combining the data, it is
possible to determine from which direction a speech signal originates. This is an extension of the previous generation micon SpeechFocus algorithm. Now the Spatial SpeechFocus beamformer is able to provide a signal
focused to either side in addition to the front and back. If speech is coming from both sides, or in quiet, an
omnidirectional microphone mode is chosen (Figure 10).
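The decision logic described above can be sketched as follows. This is a hypothetical illustration only: the text does not specify how the speech features are computed or combined. Here each hearing aid is assumed to report speech probabilities for the front, the back, and its own side. The front and back values are averaged, and a single made-up threshold of 0.5 decides whether speech is present.

```python
def choose_direction(left_aid, right_aid, threshold=0.5):
    """Pick a focus direction from the speech probabilities exchanged by both aids."""
    scores = {
        "front": 0.5 * (left_aid["front"] + right_aid["front"]),
        "back": 0.5 * (left_aid["back"] + right_aid["back"]),
        "left": left_aid["side"],    # each aid reports speech from its own side
        "right": right_aid["side"],
    }
    # Speech from both sides: keep everything audible.
    if scores["left"] >= threshold and scores["right"] >= threshold:
        return "omni"
    best = max(scores, key=scores.get)
    # No direction shows convincing speech (e.g. in quiet): stay omnidirectional.
    if scores[best] < threshold:
        return "omni"
    return best


passenger_on_left = choose_direction(
    {"front": 0.2, "back": 0.1, "side": 0.8},
    {"front": 0.3, "back": 0.1, "side": 0.1},
)
talkers_on_both_sides = choose_direction(
    {"front": 0.2, "back": 0.1, "side": 0.8},
    {"front": 0.2, "back": 0.1, "side": 0.8},
)
print(passenger_on_left, talkers_on_both_sides)
```

Because both instruments evaluate the same combined scores, they reach the same decision, which is what allows the switching to be performed synchronously on both ears.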
Figure 9: Polar plots for left and right ear for a) omnidirectional mode and b) Spatial SpeechFocus to the left side, f = 2 kHz,
measured on KEMAR in a low-reverberation room. The interaural level difference, represented by the shaded area, is the same
in both modes.
The switching is performed synchronously on both ears, using a smooth transition. In all situations, the correct
perception of the speaker(s) will be kept because the binaural cues are kept to a large extent.
Figure 10: Directivity patterns in Spatial SpeechFocus. The directionality is steered according to where speech originates
Spatial SpeechFocus is activated if the hearing instrument detects a dedicated car situation in the automatic
program, or when a special program that the hearing instrument acoustician can configure is selected.
Conclusion
In this paper, we provide insights into the new binaural directional features implemented in the latest binaural
Siemens hearing aid binax. All described functionalities are based on the new ability of the hearing aid system
to transmit audio signals from ear to ear with low latency.
With ‘Narrow Directionality’, we presented an enhanced binaural beamforming algorithm, designed especially
for very difficult listening situations with multiple talkers in the background.
It provides a narrow acoustic focus to the “look direction” of the user, and thus enables the hearing aid user to
understand the preferred talker, even with several other talking persons in the same proximity.
What also makes it unique is its smooth, situation-dependent activation and deactivation and the high-resolution control of its strength of effect, which depends separately on the acoustic conditions in each frequency
band.
We also described ‘Spatial SpeechFocus’, a self-steering binaural beamforming algorithm, especially useful for
situations with speakers talking from the left or right side in noisy environments. It is automatically activated
when the presence of speech from one side is detected in the presence of background noise. Like the human
ear, it uses interaural phase and level differences for maintaining a natural, spatially correct sound impression.
Finally, we highlighted the surprisingly low additional power consumption needed for operating these powerful
binaural algorithms.
References
[1] Kochkin, S. (2005). MarkeTrak VII: Customer satisfaction with hearing instruments in the digital age. Hearing J, 58(9), 30, 32–34, 38–40, 42–43.
[2] Kochkin, S. (2010). MarkeTrak VIII: Consumer satisfaction with hearing aids is slowly increasing. Hearing J, 63(1), 19–20, 22, 24, 26, 28, 30–32.
[3] Powers, T. & Froehlich, M. (2014). Clinical results with a new wireless binaural directional hearing system. Hearing Rev, in print.
[4] Elko, G. W. & Nguyen Pong, A.-T. (1995). A simple adaptive first-order differential microphone. IEEE AASP Workshop on Applications of Signal Processing to Audio and Acoustics.