ICMSAO Presentation

A NEW FEATURE EXTRACTION MOTIVATED BY HUMAN EAR
Amin Fazel
Sharif University of Technology
Hossein Sameti, S. K. Ghiathi
February 2005
Outline
- Introduction
- Physiological basis in the human auditory system
- Modeling of the basilar membrane and hair cells
- Experimental results
- Summary and conclusions
Thursday, February 03, 2005
Department of Computer Engineering
2/26
Introduction
- Speech is the primary real-time communication medium among humans.
- Advantages of a voice interface to machines:
  - Hands-free operation
  - Speed
  - Ease of use
Introduction
- Humans are a high-performance existence proof for speech recognition in noisy environments.
[Figure: Wall Street Journal/Broadcast News readings, 5000 words; untrained human listeners vs. the Cambridge HTK LVCSR system]
Physiological Basis
Physiological Basis
[Figure: inner ear, showing the semicircular canals and the cochlea]
- The semicircular canals are the body's balance organs.
- Hair cells in the canals detect movements of the canal fluid caused by angular acceleration.
- The canals are connected to the auditory nerve.
Physiological Basis
[Figure: inner ear, showing the semicircular canals and the cochlea]
- The cochlea is a snail-shell-like structure of the inner ear, divided into three fluid-filled parts.
- Two of these parts are canals (scala tympani and scala vestibuli) that transmit pressure. The third contains the sensitive organ of Corti, which detects pressure impulses and responds with electrical impulses that travel along the auditory nerve to the brain.
Physiological Basis
[Figure: inner ear, showing the semicircular canals and the cochlea]
- The organ of Corti can be thought of as the body's microphone.
- Perception of pitch and of loudness is connected with this organ.
- It sits on the basilar membrane in the cochlear duct.
- It contains inner hair cells and outer hair cells.
- Some 16,000 to 20,000 hair cells are distributed along the basilar membrane.
- Vibrations of the oval window cause the cochlear fluid to vibrate.
- This makes the basilar membrane vibrate, producing a traveling wave.
- The traveling wave bends the hair cells, which produces generator potentials.
- If these potentials are large enough, they stimulate the fibers of the auditory nerve to produce action potentials.
- The outer hair cells amplify vibrations of the basilar membrane.
Modeling of BM and Hair Cells
- Different parts of the basilar membrane and hair cells are sensitive to different frequencies of the input signal.
Modeling of BM and Hair Cells
- Since the basilar membrane and hair cells together convert all frequencies of speech into mechanical energy, we can, to a good approximation, represent them discretely as a set of forced, damped oscillators with different natural frequencies.
Modeling of BM and Hair Cells
- We stimulate these oscillators with the input sound.
- In this simulation, each oscillator is a particle that is always pulled by a force toward the center of oscillation.
- The displacement of the particle from the center of oscillation is denoted by x, and the inward (restoring) force is −kx.
- k is a constant for each oscillator, determined by the mass m and the natural angular frequency ω0:

$$k = m\,\omega_0^2$$
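A minimal Python sketch of the relation k = m ω0², assuming unit mass for every oscillator. The helper name `spring_constants` and the example natural frequencies are hypothetical; the slides do not say how the natural frequencies of the oscillator bank are chosen.

```python
import math

def spring_constants(natural_freqs_hz, mass=1.0):
    """Return k = m * w0**2 for each natural frequency (hypothetical helper).

    natural_freqs_hz: natural frequencies f0 in Hz; w0 = 2*pi*f0 in rad/s.
    mass: oscillator mass m (assumed identical for every oscillator).
    """
    return [mass * (2.0 * math.pi * f0) ** 2 for f0 in natural_freqs_hz]

# Hypothetical bank of four oscillators.
ks = spring_constants([250.0, 500.0, 1000.0, 2000.0])
```

Because k grows with the square of the natural frequency, doubling f0 quadruples the spring constant.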
Modeling of BM and Hair Cells
- Because the sound imposes an external force, we can no longer use the standard equations, which assume that the energy of the system is constant. Without friction, the energy of the system never decreases, and the system becomes unstable. We therefore add a force opposing the motion. Since the direction of motion is given by the velocity v, the friction force is −bv.
- Viewing each diapason as a filter, its quality factor and bandwidth are

$$Q = \frac{m\,\omega_0}{b}, \qquad \text{Bandwidth} = \frac{\omega_0}{Q} = \frac{b}{m}$$
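With the restoring force −kx, the friction force −bv and the external force F, each oscillator obeys m x'' + b x' + k x = F. The Python sketch below evaluates Q = m ω0 / b and the bandwidth ω0/Q = b/m (rad/s, converted to Hz) for one oscillator. The function name `filter_params` and the example values of m, f0 and b are hypothetical, not taken from the slides.

```python
import math

def filter_params(mass, f0_hz, damping):
    """Quality factor and bandwidth of a damped oscillator viewed as a filter.

    mass: m; f0_hz: natural frequency in Hz; damping: friction coefficient b.
    Returns (Q, bandwidth_hz), where Q = m*w0/b and bandwidth = w0/Q = b/m rad/s.
    """
    w0 = 2.0 * math.pi * f0_hz      # natural angular frequency (rad/s)
    q = mass * w0 / damping         # quality factor
    bandwidth_rad = w0 / q          # equals damping / mass
    return q, bandwidth_rad / (2.0 * math.pi)

# Hypothetical example: a 1 kHz oscillator with b chosen for a 100 Hz bandwidth.
q, bw = filter_params(mass=1.0, f0_hz=1000.0, damping=2.0 * math.pi * 100.0)
```

A larger b widens each oscillator's passband and lowers its Q, so b controls how sharply each oscillator is tuned.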
Modeling of BM and Hair Cells
- We model the state of each oscillator with the pair [x v], where x is the displacement and v is the velocity of the particle:

$$\begin{bmatrix} x_{\text{new}} \\ v_{\text{new}} \end{bmatrix} = \begin{bmatrix} 1 & \Delta t & 0 \\ 0 & 1 & \Delta t \end{bmatrix} \begin{bmatrix} x_{\text{old}} \\ v_{\text{old}} \\ a \end{bmatrix}$$

  where Δt = 1/f_s is the inverse of the sampling frequency.
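The matrix update amounts to a forward-Euler step: the new displacement uses the old velocity, and the new velocity uses the acceleration a. A minimal Python sketch of one step, written out without the matrix product; the function name `euler_step` and the 8 kHz sampling rate in the example are hypothetical.

```python
def euler_step(x_old, v_old, a, fs):
    """One forward-Euler update of an oscillator state [x, v].

    x_new = x_old + dt * v_old
    v_new = v_old + dt * a
    where dt = 1 / fs is the inverse of the sampling frequency.
    """
    dt = 1.0 / fs
    x_new = x_old + dt * v_old
    v_new = v_old + dt * a
    return x_new, v_new

# Hypothetical example: start at rest at 8 kHz and apply a = 8000 for one sample.
x, v = euler_step(0.0, 0.0, a=8000.0, fs=8000.0)   # x stays 0.0, v becomes 1.0 (up to rounding)
```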
Modeling of BM and Hair Cells
- The particle is subject to three forces:
  - The diapason itself pulls the particle with force −kx.
  - The sound imposes an external force, F_external. To compute F_external from the current sample, we use the value of the sample itself as the external force.
  - Friction opposes the movement with force −bv.
Modeling of BM and Hair Cells
- We can now compute the acceleration a from the previous displacement x_pr and velocity v_pr:

$$a = \frac{F - b\,v_{pr} - k\,x_{pr}}{m}$$

- To use this model for feature extraction, we compute the energy of each oscillator,

$$E = \frac{1}{2} m v^2 + \frac{1}{2} k x^2$$

  and use these energies as feature vectors in ASR systems.
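Putting the pieces together, the Python sketch below drives a bank of oscillators sample by sample: the external force is the sample value, a = (F − b v − k x)/m uses the previous state, the state advances by forward Euler, and E = ½ m v² + ½ k x² is recorded. The function name, the shared damping b, the unit mass, and the averaging of energies over fixed frames of `frame_len` samples are all assumptions for illustration; the slides do not specify how energies are grouped into feature vectors.

```python
import math

def oscillator_bank_energies(signal, fs, natural_freqs_hz, damping,
                             mass=1.0, frame_len=160):
    """Drive a bank of damped oscillators with `signal`; return mean energy per frame.

    Hypothetical sketch: damping (b), mass (m) and the frame averaging are
    assumptions. A trailing partial frame is dropped.
    """
    dt = 1.0 / fs
    ks = [mass * (2.0 * math.pi * f0) ** 2 for f0 in natural_freqs_hz]  # k = m*w0^2
    n = len(ks)
    x = [0.0] * n                  # displacements
    v = [0.0] * n                  # velocities
    acc = [0.0] * n                # energy accumulated in the current frame
    count = 0
    frames = []
    for sample in signal:
        force = float(sample)      # F_external = value of the current sample
        for i, k in enumerate(ks):
            # a = (F - b*v_pr - k*x_pr) / m, using the previous state
            a = (force - damping * v[i] - k * x[i]) / mass
            # forward-Euler update of [x, v]
            x[i], v[i] = x[i] + dt * v[i], v[i] + dt * a
            # E = 1/2 m v^2 + 1/2 k x^2
            acc[i] += 0.5 * mass * v[i] ** 2 + 0.5 * k * x[i] ** 2
        count += 1
        if count == frame_len:
            frames.append([e / frame_len for e in acc])
            acc = [0.0] * n
            count = 0
    return frames

# Hypothetical usage: 16 kHz audio, three oscillators, 10 ms frames.
feats = oscillator_bank_energies([1.0] + [0.0] * 319, 16000, [250.0, 500.0, 1000.0], 3000.0)
```

One design caveat: with this explicit update, an oscillator only decays when b/m exceeds ω0²Δt, so the example damping was picked to satisfy that for the highest natural frequency (1000 Hz at 16 kHz needs b/m above roughly 2470).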
Experimental results
- We transform a speech signal with our human-based model and compare the result with the spectrum of the same speech.
- The two representations differ only slightly.
Experimental results
- This comparison suggests that the human-based model can be used effectively in ASR systems.
- In addition, the method could serve as an effective and fast signal transformation in place of the FFT or the wavelet transform in various tasks.
ASR Experiments
- The proposed feature extraction algorithm was tested on an English digit database:
  - For training, we use 1386 digit sequences spoken by 18 speakers.
  - For testing, we use 200 digit sequences uttered by speakers who are not in the training set.
  - The test set is split into four groups of 50 sequences, and four types of noise are added to these groups.
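A Python sketch of how such a noisy test set could be built: split the 200 test sequences into four groups of 50 and mix noise into speech at a target SNR. The slides do not describe the mixing procedure, so the power-based scaling, the one-noise-type-per-group assignment, and all function names are assumptions; the noise types and SNR levels are the ones shown in the result plots.

```python
import math

# Noise types and SNR levels as they appear in the result plots.
NOISE_TYPES = ["car", "exhibition", "babble", "subway"]
SNRS_DB = [20, 15, 10, 5, 0, -5]

def split_into_groups(items, n_groups=4):
    """Split a list into n_groups consecutive, equal-size groups (remainder dropped)."""
    size = len(items) // n_groups
    return [items[g * size:(g + 1) * size] for g in range(n_groups)]

def add_noise_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the speech-to-noise power ratio is snr_db.

    Assumes len(noise) >= len(speech) and that neither signal is silent.
    """
    segment = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in segment) / len(segment)
    # Solve p_speech / (gain**2 * p_noise) = 10 ** (snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, segment)]

# Hypothetical usage: 200 test sequence ids -> four groups of 50, one noise type each.
groups = split_into_groups(list(range(200)))
assignment = dict(zip(NOISE_TYPES, groups))
```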
ASR Experiments
- Recognition is performed using HTK:
  - Continuous HMMs with 16 emitting states and three Gaussian mixture components per state
  - A 3-state silence model
  - A single-state inter-digit pause model
- In the reference experiments, MFCC_0_D_A features are used:
  - 13 standard cepstral coefficients, including C0, augmented with their first and second derivatives (39 features in total)
  - MFCC features were generated by applying a 25 ms Hamming window with a 10 ms overlap and a pre-emphasized 23-channel Mel-scale filterbank.
  - The cepstral features were obtained from the DCT of the log energies over the 23 frequency channels.
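The last two steps of the reference front end can be sketched in Python: a scaled DCT-II turns the 23 log filterbank energies of each frame into 13 cepstra (c0 to c12), and a regression formula adds first and second derivatives to reach the 39 MFCC_0_D_A dimensions. The delta formula below is the standard regression form described in the HTK Book; the window of 2 frames, edge replication, and the omission of liftering are assumptions, and the function names are hypothetical. Computing the Mel filterbank itself is out of scope: the input is assumed to be per-frame log energies.

```python
import math

def cepstra_from_log_energies(log_energies, num_ceps=13):
    """Scaled DCT-II of log filterbank energies; returns c0..c(num_ceps-1)."""
    n = len(log_energies)                       # e.g. 23 Mel channels
    scale = math.sqrt(2.0 / n)
    return [scale * sum(m * math.cos(math.pi * i * (j + 0.5) / n)
                        for j, m in enumerate(log_energies))
            for i in range(num_ceps)]

def deltas(frames, window=2):
    """Regression deltas: d_t = sum_th th*(c[t+th] - c[t-th]) / (2*sum_th th^2).

    Frames beyond either end are replaced by the first/last frame.
    """
    t_max = len(frames)
    denom = 2.0 * sum(th * th for th in range(1, window + 1))
    out = []
    for t in range(t_max):
        d = [0.0] * len(frames[t])
        for th in range(1, window + 1):
            nxt = frames[min(t + th, t_max - 1)]
            prv = frames[max(t - th, 0)]
            for i in range(len(d)):
                d[i] += th * (nxt[i] - prv[i])
        out.append([val / denom for val in d])
    return out

def mfcc_0_d_a(log_energy_frames):
    """Stack 13 static cepstra with their deltas and accelerations (39 dims)."""
    static = [cepstra_from_log_energies(f) for f in log_energy_frames]
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```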
ASR Experiments
- Car noise
[Figure: Comparison of MFCC and HEFE for car noise; word error rate (%) vs. SNR (20, 15, 10, 5, 0, −5 dB)]
ASR Experiments
- Exhibition noise
[Figure: Comparison of MFCC and HEFE for exhibition noise; word error rate (%) vs. SNR (20, 15, 10, 5, 0, −5 dB)]
ASR Experiments
- Babble noise
[Figure: Comparison of MFCC and HEFE for babble noise; word error rate (%) vs. SNR (20, 15, 10, 5, 0, −5 dB)]
ASR Experiments
- Subway noise
[Figure: Comparison of MFCC and HEFE for subway noise; word error rate (%) vs. SNR (20, 15, 10, 5, 0, −5 dB)]
ASR Experiments
- On noisy speech, HEFE outperforms MFCC for all noise types at most SNR levels.
- For babble noise, HEFE performs significantly better than MFCC.
- For subway noise, the improvements from HEFE are the smallest, but still noticeable.
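The plots report word error rate (WER). The slides do not say how it was scored, so the Python sketch below simply uses the standard definition, WER = 100 · (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions of a minimum-edit alignment and N is the number of reference words. The function names are hypothetical. Because insertions are counted, WER can exceed 100%.

```python
def word_errors(reference, hypothesis):
    """Minimum number of substitutions + deletions + insertions (word-level Levenshtein)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        cur = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cur[j] = min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = cur
    return prev[-1]

def word_error_rate(reference, hypothesis):
    """WER in percent: 100 * (S + D + I) / N, N = number of reference words."""
    return 100.0 * word_errors(reference, hypothesis) / len(reference)

# Hypothetical usage on digit strings: one substitution and one deletion out of 4 words.
wer = word_error_rate("one two three four".split(), "one too three".split())
```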
Summary
- In this paper, we have introduced a simple model of the basilar membrane and hair cells based on their physiology.
- We use this model for feature extraction in ASR systems.
- These features significantly outperform MFCC features in babble noise.
Thank you!