IEEE Workshop on Intelligent Computing, IIIT Allahabad, 13-15 Oct. 2014
Signal processing for improving speech perception by persons with sensorineural hearing loss: Challenges and some solutions
P. C. Pandey
IIT Bombay
Outline
A. Speech & Hearing
B. Sliding-band Dynamic Range Compression
(N. Tiwari & P. C. Pandey, NCC 2014)
C. Automated modification of consonant-vowel ratio of stops
(A. R. Jayan & P. C. Pandey, Int. J. Speech Technology, 2014)
P. C. Pandey, "Signal processing for improving speech perception by
persons with sensorineural hearing loss: Challenges and some solutions",
IEEE Workshop on Intelligent Computing, IIIT Allahabad, 13-15 Oct. 2014
Part A
Speech & Hearing
Speech Production Mechanism
Excitation source & filter model
• Excitation: voiced/unvoiced (glottal, frication)
• Filtering: vocal tract filter
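The source-filter model above can be illustrated with a minimal sketch, assuming an 8 kHz sampling rate, an impulse train standing in for voiced glottal excitation, white noise standing in for unvoiced (frication) excitation, and a cascade of second-order resonators standing in for the vocal-tract filter. The function names and the formant frequencies and bandwidths are hypothetical illustration values, not taken from the talk.

```python
import math
import random


def excitation(n_samples, fs, f0=None, seed=0):
    """Voiced: impulse train at f0 Hz (glottal pulses). Unvoiced: white noise (frication)."""
    if f0 is not None:
        period = round(fs / f0)  # pitch period in samples
        return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def resonator(x, fs, freq, bw):
    """Two-pole (all-pole) filter with one resonance at freq Hz and bandwidth bw Hz."""
    r = math.exp(-math.pi * bw / fs)  # pole radius set by the bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    a2 = -r * r
    y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y


def vocal_tract(x, fs, formants):
    """Cascade of resonators, one per (frequency, bandwidth) pair."""
    for freq, bw in formants:
        x = resonator(x, fs, freq, bw)
    return x


fs = 8000
formants = [(500, 80), (1500, 100), (2500, 120)]  # hypothetical formant values
voiced = vocal_tract(excitation(800, fs, f0=100), fs, formants)  # pulse-excited
unvoiced = vocal_tract(excitation(800, fs), fs, formants)        # noise-excited
```

The same filter is shared by both outputs; only the excitation changes, which is the separation the model expresses.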
Speech segments
• Words • Syllables • Phonemes • Sub-phonemic segments
Phonemes: basic speech units
• Vowels: Pure vowels, Diphthongs
• Consonants: Semivowels, Stops, Fricatives, Affricates, Nasals
[Figure: VCV utterances /aba/, /apa/, /ada/, /aga/]
Phonemic features
• Modes of excitation
  ○ Glottal: Unvoiced (constriction at the glottis), Voiced (glottal vibration)
  ○ Frication: Unvoiced (constriction in vocal tract), Voiced (constriction in v.t., glottal vibration)
• Movement of articulators
  ○ Continuant (steady-state v.t. configuration): vowels, nasal stops, fricatives
  ○ Non-continuant (changing v.t.): diphthongs, semivowels, oral stops (plosives)
• Place of articulation (place of maximum constriction in v.t.)
  Bilabial, Labio-dental, Linguo-dental, Alveolar, Palatal, Velar, Glottal
• Changes in voicing frequency (F0)
  Supra-segmental features: Intonation, Rhythm
Hearing Mechanism
Peripheral auditory system
• External ear: sound collection
  ○ Pinna
  ○ Auditory canal
• Middle ear: impedance matching
  ○ Ear drum
  ○ Middle ear bones
• Inner ear (cochlea): analysis & transduction
• Auditory nerve: transmission of neural impulses
Central auditory system: information processing & interpretation
[Figures: Auditory system; Tonotopic map of cochlea]
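The tonotopic map can be made concrete with the Greenwood place-frequency function, a widely used empirical model that is not part of the talk itself. The sketch assumes the commonly quoted human constants (A = 165.4 Hz, a = 2.1, k = 0.88), with cochlear position x given as a fraction of cochlear length measured from the apex; the function names are hypothetical.

```python
import math

# Greenwood place-frequency constants commonly quoted for the human cochlea (assumed).
A_HZ = 165.4
A_SLOPE = 2.1
K = 0.88


def place_to_freq(x):
    """Characteristic frequency (Hz) at relative position x (0 = apex, 1 = base)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return A_HZ * (10.0 ** (A_SLOPE * x) - K)


def freq_to_place(f_hz):
    """Inverse map: relative position (from the apex) whose characteristic frequency is f_hz."""
    return math.log10(f_hz / A_HZ + K) / A_SLOPE
```

With these constants the map runs from roughly 20 Hz at the apex to about 20.7 kHz at the base, with low frequencies at the apex and high frequencies at the base.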
Hearing impairment
Types of hearing losses
• Conductive
• Central
• Sensorineural
• Functional
Sensorineural hearing loss: abnormalities in the cochlear
hair cells or the auditory nerve. Causes:
• Aging
• Excessive exposure to noise
• Infection
• Adverse effect of medicines
• Congenital
Effects of sensorineural hearing loss
• Elevated hearing thresholds: inaudibility of low-level
sounds
• Reduced dynamic range & loudness recruitment
(abnormal loudness growth): distorted loudness
relationships among speech components
• Increased temporal masking: poor detection of acoustic
landmarks
• Increased spectral masking (widening of auditory filters):
reduced ability to sense spectral shapes
>> Poor intelligibility and degraded perception of speech,
particularly in noisy environments.
Signal processing in hearing aids
Currently available techniques
• Frequency-selective amplification: improves audibility
but not necessarily intelligibility
• Automatic volume control: not effective in improving
intelligibility
• Multichannel dynamic range compression (with settable
attack & release times, compression ratios): effectiveness
reduced due to processing artifacts
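The attack and release times and compression ratio mentioned above can be illustrated with a minimal single-channel compressor sketch. It assumes a feed-forward design with a one-pole level detector on |x| and a static curve that reduces the gain by (1 - 1/ratio) dB per dB of level above a threshold; levels are relative to a full scale of 1.0, and the function names and default values are hypothetical. A multichannel compressor would run one such gain computation per band.

```python
import math


def time_constant_coeff(tau_s, fs):
    """One-pole smoothing coefficient for a time constant tau_s (seconds)."""
    return math.exp(-1.0 / (tau_s * fs))


def compress(x, fs, threshold_db=-30.0, ratio=3.0, attack_s=0.005, release_s=0.05):
    """Feed-forward compressor: attack/release level detector plus static curve."""
    a_att = time_constant_coeff(attack_s, fs)
    a_rel = time_constant_coeff(release_s, fs)
    env = 0.0
    y = []
    for xn in x:
        mag = abs(xn)
        # Fast (attack) smoothing when the level rises, slow (release) when it falls.
        coeff = a_att if mag > env else a_rel
        env = coeff * env + (1.0 - coeff) * mag
        level_db = 20.0 * math.log10(max(env, 1e-9))
        over = level_db - threshold_db
        # Above threshold the output level rises only 1/ratio dB per input dB.
        gain_db = -(1.0 - 1.0 / ratio) * over if over > 0.0 else 0.0
        y.append(xn * 10.0 ** (gain_db / 20.0))
    return y
```

With the defaults, a steady 0 dB input (30 dB above threshold) settles to a gain of -20 dB, while signals that stay below -30 dB pass unchanged.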
Techniques under development
• Noise suppression
• Distortion-free dynamic range compression
• Techniques for reducing the effects of increased
spectral masking
  ○ Binaural dichotic presentation
  ○ Spectral contrast enhancement
  ○ Multi-band frequency compression
• Improvement of consonant-to-vowel ratio (CVR): for
reducing the effects of increased temporal masking
Analog Hearing Aids
Pre-amp → AVC → Selectable Freq. Response → Amp.
Digital Hearing Aids
Pre-amp & AVC → ADC → Multi-band Amplitude Compr. & Freq. Resp. → DAC & Amp.
Existing Problems
• Noisy environment & reverberation
• Distortions due to multiband amplitude compression
• Poor speech perception due to increased spectral &
temporal masking
• Need to visit an audiologist for changing settings
Proposed Hearing Aids
• Distortion-free dynamic range compression &
adjustable frequency response
• Noise suppression & de-reverberation
• Processing for reducing the effects of increased
spectral masking
• Processing for reducing the effects of increased
temporal masking
• User selectable settings
• Implementation using a low-power DSP chip with
acceptable signal delay (< 60 ms)
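As a rough check against the < 60 ms delay budget, the algorithmic delay of block-based FFT processing is about one analysis frame, since a full frame must be buffered before it can be transformed. The sketch below assumes that approximation, treats computation time as a separate input, and ignores converter delays; the function names are hypothetical, and the frame lengths and 10 kHz sampling rate in the example are illustrative, not the parameters used in the talk.

```python
def frame_delay_ms(frame_len, fs_hz):
    """Approximate algorithmic delay (ms) of FFT block processing: one analysis frame."""
    return 1000.0 * frame_len / fs_hz


def meets_delay_budget(frame_len, fs_hz, processing_ms=0.0, budget_ms=60.0):
    """True if the buffering delay plus an assumed processing time stays below the budget."""
    return frame_delay_ms(frame_len, fs_hz) + processing_ms < budget_ms
```

For example, a 512-sample frame at 10 kHz gives 51.2 ms, within the budget, while a 1024-sample frame at the same rate gives 102.4 ms, which exceeds it.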
Some Solutions for improving speech
perception by listeners with moderate-to-severe sensorineural loss
• Sliding-band dynamic range compression as a solution
to the problem posed by loudness recruitment
• Automated modification of consonant-vowel ratio of
stop consonants as a solution to the problem posed by
increased intraspeech spectral and temporal masking.
• Implementation using a 16-bit fixed-point DSP
processor & testing for satisfactory operation.
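On a 16-bit fixed-point DSP, samples and gains are commonly held in Q15 format (1 sign bit, 15 fractional bits). The minimal sketch below models Q15 conversion and multiplication with rounding and saturation. It illustrates typical fixed-point arithmetic and is not taken from the authors' implementation; the helper names are hypothetical.

```python
Q15_ONE = 1 << 15       # 32768 represents 1.0 (not itself representable)
Q15_MAX = Q15_ONE - 1   # largest value, about 0.99997
Q15_MIN = -Q15_ONE      # -1.0


def saturate16(v):
    """Clamp an integer to the signed 16-bit range."""
    return max(Q15_MIN, min(Q15_MAX, v))


def float_to_q15(v):
    """Quantize a float in [-1, 1) to Q15 with rounding and saturation."""
    return saturate16(int(round(v * Q15_ONE)))


def q15_to_float(q):
    return q / Q15_ONE


def q15_mul(a, b):
    """Q15 x Q15 product: Q30 result, rounded and shifted back to Q15."""
    prod = a * b                     # Q30 value held in a wide accumulator
    prod = (prod + (1 << 14)) >> 15  # add half an LSB, then shift: round to nearest
    return saturate16(prod)          # only (-1.0) * (-1.0) overflows
```

For example, `q15_mul(float_to_q15(0.5), float_to_q15(0.5))` returns 8192, which is 0.25 in Q15.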
To be continued in Part B.
Workshop: IEEE Workshop on Intelligent Computing, Allahabad, October 13-15, 2014, organized jointly by CSIR-CEERI Pilani and IIIT Allahabad.
Speaker: Prof. P. C. Pandey, EE Dept., IIT Bombay
Topic: Signal processing for improving speech perception by persons with sensorineural hearing loss: Challenges and some solutions
Abstract
Sensorineural hearing loss is caused by abnormalities in the cochlear hair cells or the auditory nerve. It occurs due to aging, excessive
exposure to noise, infection, or abnormalities at the time of birth. It is generally associated with elevated hearing thresholds, reduced dynamic range,
and increased temporal and spectral masking, leading to degraded perception of speech, particularly in noisy environments. Several signal processing
techniques have been investigated and reported to address these problems. However, most of these are not suited for use in hearing aids due to
distortions caused by processing related artifacts or due to constraints of size, power, and acceptable signal delay. As some of the possible
solutions, two signal processing techniques have been investigated: (i) a sliding-band dynamic range compression as a solution to the problem
posed by loudness recruitment, and (ii) automated modification of consonant-vowel ratio of stop consonants as a solution to the problem posed by
increased intraspeech spectral and temporal masking. Both techniques have been implemented using a 16-bit fixed-point DSP processor and tested
for satisfactory operation.
Persons with sensorineural loss generally have a highly reduced dynamic range of hearing, with a significant frequency-dependent
elevation of hearing threshold levels without corresponding increase in the upper comfortable listening levels. Signal processing for dynamic range
compression is used to present the sounds comfortably within the limited dynamic range of the listener. Analog hearing aids generally use single-band compression, with the gain depending on the time-varying signal level. As the power is mostly contributed by the low-frequency
components, the amplification of the high-frequency components depends on the energy in the low-frequency components. Thus the high-frequency
components may become inaudible, and distortions in the temporal envelope may be introduced. In the multiband compression available in most digital
hearing aids, the spectral components of the input signal are divided into multiple bands and the gain for each band is calculated on the basis of
signal power in that band. This type of processing can introduce spurious spectral distortions, and the use of a large number of bands reduces spectral
contrasts and the modulation depth of speech, resulting in an adverse effect on the perception of certain speech cues. Further, the frequency
response of a multiband compression system has a time-varying magnitude response without corresponding variation in the phase response, which
can cause audible distortions, particularly for non-speech audio. These distortions may partly offset the advantages of dynamic range compression
for the hearing-impaired listener. In order to significantly reduce the temporal and spectral distortions associated with the currently used single-band
and multiband compressions in hearing aids, a "sliding-band compression" has been developed. It involves calculating a frequency-dependent gain
function, in which the gain for each spectral sample is determined by the short-time power in an auditory critical band centered at it. The gain
calculation takes into account the specified hearing thresholds, compression ratios, and attack and release times. Unlike single-band compression, it
does not result in any significant temporal distortions because the effect of short-time energy of a spectral component on other spectral components
is limited to those located within a critical bandwidth. Due to the use of sliding critical bands for calculating the power spectrum, formant transitions do
not result in discontinuities in the processed output. The technique is realized using an FFT-based analysis-synthesis method, which masks
phase-related discontinuities and can be integrated with other FFT-based signal processing in hearing aids. The technique is implemented and tested for
satisfactory real-time operation on a 16-bit fixed-point DSP processor.
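The core idea of sliding-band compression, a gain for each spectral sample set by the short-time power in an auditory critical band centered on it, can be sketched for a single frame. This is a simplified illustration under several assumptions: the critical bandwidth is approximated by the Glasberg and Moore ERB formula, a naive DFT stands in for the FFT, levels are uncalibrated (relative to full scale), attack and release are per-frame smoothing coefficients, and windowing and overlap-add are omitted. The names and default values are hypothetical and do not reproduce the published method [1].

```python
import cmath
import math


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth of the auditory filter (Glasberg & Moore)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def dft(x, inverse=False):
    """Naive O(N^2) DFT; a DSP implementation would use an FFT."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def static_gain_db(level_db, threshold_db, ratio):
    """Above threshold_db the output level rises only 1/ratio dB per input dB."""
    over = level_db - threshold_db
    return -(1.0 - 1.0 / ratio) * over if over > 0.0 else 0.0


def sliding_band_frame(frame, fs, prev_gain_db, threshold_db=-40.0, ratio=2.0,
                       a_att=0.5, a_rel=0.9):
    """Process one frame. prev_gain_db holds N/2 + 1 per-bin gains (dB).

    Returns (output_frame, smoothed per-bin gains in dB).
    """
    n = len(frame)
    half = n // 2
    spec = dft(frame)
    power = [abs(spec[k]) ** 2 / (n * n) for k in range(half + 1)]
    bin_hz = fs / n
    gain_db = []
    for k in range(half + 1):
        # Critical band (one ERB wide) centered on this bin, in bins on either side.
        w = max(1, round(0.5 * erb_hz(k * bin_hz) / bin_hz))
        band = power[max(0, k - w): min(half, k + w) + 1]
        level_db = 10.0 * math.log10(max(sum(band), 1e-12))
        target = static_gain_db(level_db, threshold_db, ratio)
        # Attack when more gain reduction is needed, release otherwise.
        coeff = a_att if target < prev_gain_db[k] else a_rel
        gain_db.append(coeff * prev_gain_db[k] + (1.0 - coeff) * target)
    # Apply the same real gain to bin k and its mirror n - k so the output stays real.
    out_spec = list(spec)
    for k in range(half + 1):
        g = 10.0 ** (gain_db[k] / 20.0)
        out_spec[k] *= g
        if 0 < k < n - k:
            out_spec[n - k] *= g
    out = [v.real for v in dft(out_spec, inverse=True)]
    return out, gain_db
```

Because each bin's gain depends only on power within about one critical band, a strong low-frequency component does not pull down the gain of distant high-frequency bins, which is the contrast with single-band compression drawn above.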
Increasing the level of the consonant segments relative to the nearby vowel segments, known as consonant-vowel ratio (CVR) modification,
is reported to be effective in improving speech intelligibility for listeners in noisy backgrounds and for hearing impaired listeners. A technique
for real-time CVR modification of stops using the rate of change of spectral centroid for detection of spectral transitions is presented. Its
effectiveness in improving the recognition of consonants in the presence of speech-spectrum-shaped noise is evaluated by conducting
listening tests on normal-hearing subjects. At lower values of SNR, there was an increase of 7-21% in recognition scores and an
equivalent SNR advantage of 3 dB. The technique is implemented on a DSP board based on a 16-bit fixed-point processor with on-chip
FFT hardware and tested for satisfactory real-time operation.
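A simplified sketch of CVR modification driven by the rate of change of the spectral centroid follows. It assumes that a rapid centroid change between successive frames marks a spectral transition (such as a stop release) and boosts that frame and the following frame(s) by a fixed gain. The function names, threshold, boost, and hold length are hypothetical; the frame-wise gain steps would need smoothing in practice, and the detection logic is much simpler than the published technique [2].

```python
import cmath
import math


def spectral_centroid(frame, fs):
    """Magnitude-weighted mean frequency (Hz) of one frame, via a naive DFT."""
    n = len(frame)
    mags = [abs(sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)))
            for k in range(n // 2 + 1)]
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return sum(k * fs / n * a for k, a in enumerate(mags)) / total


def cvr_modify(frames, fs, hop_s, rate_threshold_hz_per_s=20000.0, boost_db=6.0, hold=2):
    """Boost frames at rapid spectral-centroid changes (hypothetical parameters).

    Returns (modified frames, per-frame centroids in Hz).
    """
    centroids = [spectral_centroid(f, fs) for f in frames]
    boost = 10.0 ** (boost_db / 20.0)
    out, remaining = [], 0
    for i, frame in enumerate(frames):
        # Rate of change of the centroid between successive frames (Hz per second).
        rate = abs(centroids[i] - centroids[i - 1]) / hop_s if i > 0 else 0.0
        if rate > rate_threshold_hz_per_s:
            remaining = hold  # boost this frame and the next hold - 1 frames
        g = boost if remaining > 0 else 1.0
        remaining = max(0, remaining - 1)
        out.append([g * v for v in frame])
    return out, centroids
```

In this sketch, a jump from a low-frequency (vowel-like) frame to a high-frequency (burst-like) frame triggers the boost, while steady segments pass unchanged.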
References
[1] N. Tiwari and P. C. Pandey, "A sliding-band dynamic range compression for use in hearing aids", Proc. National Conference on Communications 2014 (NCC 2014), Kanpur, Feb. 28 - Mar. 2, 2014, paper no. 1569847357.
[2] A. R. Jayan and P. C. Pandey, "Automated modification of consonant-vowel ratio of stops for improving speech intelligibility", Int. J. Speech Technology, 2014 (accepted for publication).
Dr. Prem C. Pandey
Dr. Pandey is a Professor in Electrical Engineering at IIT Bombay. He is currently also the Associate Dean of Academic Programmes.
He received the B.Tech. degree in electronics engineering from Banaras Hindu University in 1979, the M.Tech. degree in electrical engineering from IIT Kanpur in
1981, and the Ph.D. degree in electrical & biomedical engineering from the University of Toronto (Canada) in 1987. In 1987, he joined the University of
Wyoming (USA) as an assistant professor and later joined IIT Bombay in 1989.
His research interests include speech & signal processing; biomedical signal processing & instrumentation; electronic instrumentation &
embedded system design. The focus of his R&D efforts has been in the areas of impedance cardiography and aids for persons with speech
and hearing impairment.