Workshop “AICTE Sponsored Faculty Development Programme on Signal Processing and Applications”, Dept. of Electrical
Engineering, VJTI, Mumbai, Feb 23-27, 2015, Coordinator: Prof. Alice N. Cheeran.
Session: Feb 27, Friday, 10:00 a.m. to 11:30 a.m.
============================================================================
Signal processing for persons with
sensorineural hearing loss:
Challenges and some solutions
P. C. Pandey
IIT Bombay
Outline
A. Speech & Hearing
B. Sliding-band Dynamic Range Compression
(Ref: N. Tiwari & P. C. Pandey, NCC 2014, Paper No.1569847357)
C. Automated modification of consonant-vowel ratio of stops
(Ref: A. R. Jayan & P. C. Pandey, Int. J. Speech Technology, vol. 18, pp. 113–130, 2015)
============================================================================
Part A
Speech & Hearing
Speech Production
Excitation source & filter model
• Excitation: voiced/unvoiced
glottal, frication
• Filtering: vocal tract filter
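The source-filter model above can be illustrated with a minimal sketch: a voiced excitation (an impulse train at the pitch period) is passed through a single two-pole resonator standing in for one vocal-tract formant. The sampling rate, pitch, formant frequency, and bandwidth are illustrative assumptions, not values from the talk, and a realistic vocal-tract filter would cascade several such resonators.

```python
import math

def impulse_train(n_samples, fs, f0):
    """Voiced excitation: unit impulses spaced one pitch period apart."""
    period = round(fs / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, fs, centre_hz, bandwidth_hz):
    """Two-pole IIR resonator used here as a stand-in for one formant."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius from bandwidth
    theta = 2.0 * math.pi * centre_hz / fs       # pole angle from centre frequency
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

# Assumed illustrative values: 8 kHz sampling, 100 Hz pitch, one formant at 700 Hz.
fs = 8000
excitation = impulse_train(800, fs, f0=100.0)
speech_like = resonator(excitation, fs, centre_hz=700.0, bandwidth_hz=100.0)
```

Replacing the impulse train with random noise would give a rough stand-in for unvoiced (frication) excitation.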
Speech segments
• Words • Syllables • Phonemes • Sub-phonemic segments
Phonemes: basic speech units
• Vowels: Pure vowels, Diphthongs
• Consonants: Semivowels, Stops, Fricatives, Affricates, Nasals
Examples: /aba/, /apa/, /ada/, /aga/
Phonemic features
• Modes of excitation
• Glottal: Unvoiced (constriction at the glottis), Voiced (glottal vibration)
• Frication: Unvoiced (constriction in vocal tract), Voiced (constriction in
v.t. & glottal vibration)
• Movement of articulators
• Continuant (steady-state v.t. configuration): vowels, nasal stops,
fricatives
• Non-continuant (changing v.t.): diphthongs, semivowels, oral stops
(plosives)
• Place of articulation (place of maximum constriction in v.t.)
Bilabial, Labio-dental, Linguo-dental, Alveolar, Palatal, Velar, Glottal
• Changes in voicing frequency (F0)
Supra-segmental features: Intonation, Rhythm
Hearing Mechanism
Peripheral auditory system
• External ear: sound collection
○ Pinna
○ Auditory canal
• Middle ear: impedance matching
○ Ear drum
○ Middle ear bones
• Inner ear (cochlea): analysis & transduction
• Auditory nerve: transmission of neural impulses
Central auditory system
Information processing & interpretation
[Figure: auditory system and tonotopic map of cochlea]
Hearing Impairment
Types of hearing losses
• Conductive
• Central
• Sensorineural
• Functional
Sensorineural hearing loss
Associated with abnormalities in the cochlear hair cells or
the auditory nerve.
Causes: aging, excessive noise exposure, infection,
adverse effect of medicines, congenital.
Effects of sensorineural hearing loss
• Elevated hearing thresholds: inaudibility of low-level sounds
• Reduced dynamic range & loudness recruitment (abnormal
loudness growth): distortion of loudness relationship among
speech components
• Increased temporal masking: poor detection of acoustic
landmarks
• Increased spectral masking (widening of auditory filters):
reduced ability to sense spectral shapes
>> Poor intelligibility and degraded perception of speech,
particularly in noisy environments.
Signal Processing in Hearing Aids
Currently available techniques
• Frequency selective amplification: improves audibility but not
necessarily intelligibility
• Automatic volume control: not effective in improving
intelligibility
• Multichannel dynamic range compression (with settable
attack & release times, compression ratios): effectiveness
reduced due to processing artifacts
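To make the roles of attack time, release time, and compression ratio concrete, here is a minimal sketch of a compressor for one channel; a multichannel system would run one such compressor per band. The threshold, ratio, and time constants are illustrative assumptions, and the one-pole envelope follower is one common choice rather than the method of any particular hearing aid.

```python
import math

def compress(samples, fs, threshold_db=-30.0, ratio=3.0,
             attack_ms=5.0, release_ms=50.0):
    """Single-channel compressor: envelope follower plus static gain rule."""
    att = math.exp(-1.0 / (fs * attack_ms / 1000.0))   # smoothing while level rises
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))  # smoothing while level falls
    env = 0.0
    out = []
    for x in samples:
        mag = abs(x)
        coeff = att if mag > env else rel
        env = coeff * env + (1.0 - coeff) * mag
        level_db = 20.0 * math.log10(max(env, 1e-9))
        over = level_db - threshold_db
        # Above threshold, the output level grows by only 1/ratio dB per input dB.
        gain_db = -over * (1.0 - 1.0 / ratio) if over > 0.0 else 0.0
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out

fs = 8000
loud = compress([1.0] * fs, fs)      # steady 0 dB input, 30 dB above threshold
quiet = compress([0.001] * 100, fs)  # steady -60 dB input, below threshold
```

With these assumed settings a steady input 30 dB above threshold settles 20 dB lower, while input below threshold passes unchanged. Because the gain varies over time, short attack and release times trade audibility against the processing artifacts noted above.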
Techniques under development
• Noise suppression
• Distortion-free dynamic range compression
• Techniques for reducing the effects of increased spectral
masking
○ Binaural dichotic presentation
○ Spectral contrast enhancement
○ Multi-band frequency compression
• Improvement of consonant-to-vowel ratio (CVR): for reducing
the effects of increased temporal masking
Analog Hearing Aids
Pre-amp → AVC → Freq. Response → Amp.
Digital Hearing Aids
Pre-amp & AVC → ADC → Multi-band Amplitude Compr. & Freq. Resp. → DAC & Amp.
Existing Problems
• Poor intelligibility in noisy & reverberant environments
• Distortions due to multiband amplitude compression
• Poor speech perception due to increased spectral &
temporal masking
• Need to visit an audiologist for any change of settings
Proposed Hearing Aids
• Distortion-free dynamic range compression & adjustable
frequency response
• Noise suppression & de-reverberation
• Processing for reducing the effects of increased spectral
masking
• Processing for reducing the effects of increased temporal
masking
• Implementation of signal processing using a low-power DSP
chip with acceptable signal delay (< 60 ms)
• User selectable settings
Some Solutions for Improving Speech Perception by Listeners with
Moderate-to-severe Sensorineural Loss
• Sliding-band dynamic range compression as a solution to the
problem posed by loudness recruitment
• Automated modification of consonant-vowel ratio of stop
consonants as a solution to the problem posed by increased
intraspeech spectral and temporal masking.
• Implementation using a 16-bit fixed-point DSP processor &
testing for satisfactory operation.
Workshop: “AICTE Sponsored Faculty Development Programme on Signal Processing and Applications”, Dept. of Electrical
Engineering, VJTI, Mumbai, Feb 23-27, 2015, Coordinator: Prof. Alice N. Cheeran.
Speaker: Prof. P. C. Pandey, EE Dept, IIT Bombay
Topic: Signal processing for persons with sensorineural hearing loss: Challenges and some solutions
Abstract
Sensorineural hearing loss is caused by abnormalities in the cochlear hair cells or the auditory nerve. It occurs due to aging,
excessive exposure to noise, infection, or congenital abnormalities. It is generally associated with elevated hearing thresholds, reduced
dynamic range and loudness recruitment, and increased temporal and spectral masking, leading to degraded perception of speech,
particularly in noisy environments. To address these problems, several signal processing techniques have been reported. Most of these
techniques are not suited for use in hearing aids due to distortions caused by processing-related artifacts, computational complexities in
implementing the technique for real-time processing using a low-power processor, or excessive signal delay which may interfere with
lipreading. We have investigated two novel techniques: (i) a sliding-band dynamic range compression as a solution to the problem
posed by loudness recruitment [1], and (ii) automated modification of consonant-vowel ratio of stop consonants as a solution to the
problem posed by increased intraspeech spectral and temporal masking [2].
Persons with sensorineural loss generally have a highly reduced dynamic range of hearing, with a significant frequency-dependent elevation of hearing threshold levels without a corresponding increase in the upper comfortable listening levels. To present the
sounds comfortably within the limited dynamic range of the listener, analog hearing aids generally use single-band compression with
the gain being dependent on the time-varying signal level. As the power is mostly contributed by the low-frequency components, the
high frequency components may become inaudible and distortions in temporal envelope may get introduced. In multiband compression
available in most digital hearing aids, the spectral components of the input signal are divided in multiple bands and the gain for each
band is calculated on the basis of signal power in that band. This type of processing can introduce spurious spectral distortions. Use of a
large number of bands reduces spectral contrasts and the modulation depth of speech, resulting in an adverse effect on the perception of
certain speech cues. Further, the frequency response of a multiband compression system has a time-varying magnitude response without
corresponding variation in the phase response, which can cause audible distortions, particularly for non-speech audio. These distortions
may partly offset the advantages of dynamic range compression for the hearing-impaired listener. In order to significantly reduce the
temporal and spectral distortions associated with the currently used single-band and multiband compressions in hearing aids, a "sliding-band compression" has been developed. It involves calculating a frequency-dependent gain function, in which the gain for each spectral
sample is determined by the short-time power in an auditory critical band centered at it. The gain calculation takes into account the
specified hearing thresholds, compression ratios, and attack and release times. Unlike single-band compression, it does not result in any
significant temporal distortions because the effect of short-time energy of a spectral component on other spectral components is limited
to those located within a critical bandwidth. Due to use of sliding critical bands for calculating the power spectrum, formant transitions
do not result in discontinuities in the processed output. The technique is realized using an FFT-based analysis-synthesis method which
masks phase-related discontinuities.
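The gain calculation described above can be sketched for a single analysis frame as follows. This is a simplified illustration, not the published implementation: it uses the Zwicker-Terhardt approximation for critical bandwidth (an assumption, since the formula used in [1] is not stated here), a single threshold and compression ratio instead of frequency-dependent hearing thresholds, and no attack or release smoothing. The function and parameter names are hypothetical.

```python
import math

def critical_bandwidth_hz(f_hz):
    """Zwicker-Terhardt approximation of auditory critical bandwidth (assumed)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

def sliding_band_gains_db(power_spectrum, fs, threshold_db, ratio):
    """Gain in dB for each spectral sample k, from the power in a band centred at k.

    power_spectrum holds |X[k]|^2 for k = 0..N/2 of an N-point FFT frame.
    """
    n_bins = len(power_spectrum)
    bin_hz = (fs / 2.0) / (n_bins - 1)
    gains = []
    for k in range(n_bins):
        half_bins = round(critical_bandwidth_hz(k * bin_hz) / (2.0 * bin_hz))
        lo = max(0, k - half_bins)
        hi = min(n_bins - 1, k + half_bins)
        band_power = sum(power_spectrum[lo:hi + 1])   # the band slides with k
        level_db = 10.0 * math.log10(max(band_power, 1e-12))
        over = level_db - threshold_db
        gains.append(-over * (1.0 - 1.0 / ratio) if over > 0.0 else 0.0)
    return gains

# Assumed test frame: 65 bins at fs = 8 kHz with one strong component near 2 kHz.
spectrum = [1e-8] * 65
spectrum[32] = 1.0
gains = sliding_band_gains_db(spectrum, fs=8000, threshold_db=-20.0, ratio=2.0)
```

Because each bin has its own band, a strong component changes the gain only for bins within about one critical bandwidth of it, which is the property the text credits for limiting temporal and spectral distortions.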
Increasing the level of the consonant segments relative to the nearby vowel segments, known as consonant-vowel
ratio (CVR) modification, is reported to be effective in improving speech intelligibility for listeners in noisy backgrounds
and for hearing impaired listeners. A technique for real-time CVR modification of stops using the rate of change of spectral
centroid for detection of spectral transitions is presented. Its effectiveness in improving the recognition of consonants in the
presence of speech spectrum shaped noise is evaluated by conducting listening tests on normal-hearing subjects. At lower
values of SNR, there was an increase of 7–21% in recognition scores and an equivalent SNR advantage of 3 dB.
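The idea of detecting spectral transitions from the rate of change of the spectral centroid, and then raising the level of the detected segment, can be sketched as below. This is a minimal illustration under stated assumptions: the first-difference rate estimate, the detection threshold, and the fixed gain are hypothetical choices, and the processing in [2] is more elaborate.

```python
def spectral_centroid_hz(magnitudes, fs):
    """Magnitude-weighted mean frequency of one frame (bins 0..N/2)."""
    bin_hz = (fs / 2.0) / (len(magnitudes) - 1)
    total = sum(magnitudes)
    if total == 0.0:
        return 0.0
    return sum(k * bin_hz * m for k, m in enumerate(magnitudes)) / total

def centroid_rate_hz_per_s(frames, fs, hop_s):
    """First difference of the centroid track, scaled by the frame shift."""
    centroids = [spectral_centroid_hz(f, fs) for f in frames]
    return [(b - a) / hop_s for a, b in zip(centroids, centroids[1:])]

def transition_frames(rates, threshold_hz_per_s):
    """Indices of frames whose centroid moved faster than the threshold."""
    return [i + 1 for i, r in enumerate(rates) if abs(r) > threshold_hz_per_s]

def boost_segment(samples, start, end, gain_db):
    """Scale samples[start:end] by gain_db, leaving the rest unchanged."""
    g = 10.0 ** (gain_db / 20.0)
    return [s * g if start <= i < end else s for i, s in enumerate(samples)]

# Assumed toy input: energy jumps from the 1 kHz bin to the 3 kHz bin.
low = [0.0, 1.0, 0.0, 0.0, 0.0]
high = [0.0, 0.0, 0.0, 1.0, 0.0]
rates = centroid_rate_hz_per_s([low, low, high, high], fs=8000, hop_s=0.005)
detected = transition_frames(rates, threshold_hz_per_s=100000.0)
boosted = boost_segment([1.0, 1.0, 1.0], 1, 2, gain_db=20.0)
```

In a real-time system the frames would come from a short-time FFT, and the gain would be applied smoothly around each detected transition rather than as a step.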
Both techniques have been implemented using a 16-bit fixed-point DSP processor with on-chip FFT hardware
and have been tested for satisfactory real-time operation. They can be integrated with other FFT-based signal processing
techniques in hearing aids.
References
[1] N. Tiwari and P. C. Pandey, "A sliding-band dynamic range compression for use in hearing aids," Proc. National
Conference on Communications 2014 (NCC 2014), Kanpur, Feb. 28 - Mar. 2, 2014, paper no. 1569847357.
[2] A. R. Jayan and P. C. Pandey, "Automated modification of consonant-vowel ratio of stops for improving speech
intelligibility," Int. J. Speech Technology, vol. 18, pp. 113–130, 2015. DOI: 10.1007/s10772-014-9254-4.
Dr. Prem C. Pandey
Dr. Pandey is a Professor in Electrical Engineering at IIT Bombay. He received a B.Tech. in electronics engineering
from Banaras Hindu University in 1979, an M.Tech. in electrical engineering from IIT Kanpur in 1981, and a Ph.D. in electrical
& biomedical engineering from the University of Toronto (Canada) in 1987. In 1987, he joined the University of Wyoming
(USA) as an assistant professor and later joined IIT Bombay in 1989.
His research interests include speech & signal processing; biomedical signal processing & instrumentation;
electronic instrumentation & embedded system design. The focus of his R&D efforts has been in the areas of impedance
cardiography and aids for persons with speech and hearing impairment.