Hearing
SGN-14006 / A.K.

1 Introduction

Sources:
– Rossing (1990). "The Science of Sound". Chapters 5–7.
– Karjalainen (1999). "Kommunikaatioakustiikka".
– Moore (1997). "An Introduction to the Psychology of Hearing".

Contents:
1. Introduction
2. Ear physiology
3. Masking
4. Sound pressure level
5. Loudness
6. Pitch
7. Spatial hearing

The auditory system can be divided into two parts:
– Peripheral auditory system (outer, middle, and inner ear)
– Auditory nervous system (in the brain)

Ear physiology studies the peripheral system.
Psychoacoustics studies the entire sensation: the relationships between sound stimuli and the subjective sensation.

1.1 Auditory system

Dynamic range of hearing is wide
– the ratio of a very loud to a barely audible sound pressure is 1:10^5 (in terms of power 1:10^10, i.e. 100 dB)

Frequency range of hearing varies a lot between individuals
– only a few can hear from 20 Hz to 20 kHz
– sensitivity to low sounds (< 100 Hz) is not very good
– sensitivity to high sounds (> 12 kHz) decreases with age

Selectivity of hearing
– a listener can pick out an instrument from an orchestra
– a listener can follow a single speaker at a cocktail party
– one can sleep in background noise but still wake up to an abnormal sound

1.2 Psychoacoustics
Perception involves information processing in the brain
– knowledge about the brain is limited

Psychoacoustics studies the relationships between sound stimuli and the resulting sensations
– attempts to model the process of perception
– for example, trying to predict the perceived loudness / pitch / timbre from the acoustic properties of the sound signal

In a psychoacoustic listening test
– the test subject listens to sounds
– questions are asked, or the subject is asked to describe her sensations

2 Ear physiology
The human ear consists of three main parts:
(1) outer ear, (2) middle ear, (3) inner ear

2.1 Outer ear
The outer ear consists of:
– pinna – gathers sound; direction-dependent response
– auditory canal (ear canal) – conveys sound to the middle ear

Figure: anatomy of the ear; the nerve signal travels to the brain [Chittka05]

2.2 Middle ear
The middle ear contains
– the eardrum, which transforms sound waves into mechanical vibration
– tiny auditory bones: hammer (resting against the eardrum, see figure), anvil, and stirrup

The bones transmit eardrum vibrations to the oval window of the inner ear.

Acoustic reflex: when the sound pressure level exceeds ~80 dB, eardrum tension increases and the stirrup is pulled away from the oval window
– protects the inner ear from damage

2.3 Inner ear, cochlea
The inner ear contains the cochlea: a fluid-filled organ where vibrations are converted into nerve impulses to the brain.
Cochlea = Greek: "snail shell".
Spiral tube: when stretched out, approximately 30 millimeters long.
Vibrations of the oval window cause hydraulic pressure waves inside the cochlea.
Inside the cochlea is the basilar membrane.
On the basilar membrane is the organ of Corti, with nerve cells that are sensitive to vibration.
The nerve cells transform movement information into neural impulses in the auditory nerve.

2.4 Basilar membrane
Figure: the cochlea stretched out for illustration purposes
– the basilar membrane divides the fluid of the cochlea into separate tunnels
– when hydraulic pressure waves travel along the cochlea, they move the basilar membrane

Basilar membrane
Different frequencies produce the highest amplitude at different sites.
A preliminary frequency analysis happens on the basilar membrane.

Figure: travelling waves on the basilar membrane; the best frequency (Hz) varies along its length
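
As a rough illustration of this place–frequency mapping, the sketch below uses Greenwood's (1990) cochlear frequency–position function with commonly quoted human constants; the specific formula and constants are an assumption added here, not part of the lecture material.

```python
import numpy as np

def greenwood_place_to_freq(x):
    """Approximate best frequency (Hz) at relative position x along the
    basilar membrane (x = 0 at the apex, x = 1 at the base / oval window).
    Greenwood's (1990) function with typical human constants (assumed)."""
    return 165.4 * (10.0 ** (2.1 * x) - 0.88)

# Best frequency at a few positions along the membrane:
for x in np.linspace(0.0, 1.0, 5):
    print(f"x = {x:.2f} -> ~{greenwood_place_to_freq(x):7.0f} Hz")
# Low frequencies peak near the apex (~20 Hz), high frequencies near the
# oval window (~20 kHz), matching the figure.
```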
2.5 Sensory hair cells
Distributed along the basilar membrane are sensory hair cells that transform membrane movement into neural impulses.
When a hair cell bends, it generates neural impulses
– the impulse rate depends on the vibration amplitude and frequency
Each nerve cell has a characteristic frequency to which it is most responsive (Figure: tuning curves of 6 different cells)

3 Masking
Masking describes the situation where a weaker but clearly audible signal (maskee, test tone) becomes inaudible in the presence of a louder signal (masker).
Masking depends on both the spectral structure of the sounds and their variation over time.

3.1 Masking in frequency domain
Model of the frequency analysis in the auditory system
– subdivision of the frequency axis into critical bands
– frequency components within the same critical band mask each other easily
– Bark scale: a frequency scale derived by mapping frequencies to critical-band numbers
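
A minimal sketch of mapping frequency (Hz) to critical-band number (Bark). This uses Traunmüller's (1990) approximation, which is one of several published fits and an assumption on my part rather than the exact formula behind the slides.

```python
def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) for a frequency in Hz,
    using Traunmüller's (1990) fit (assumed here for illustration)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Two components landing within roughly one Bark of each other sit in the
# same critical band and therefore mask each other easily:
print(hz_to_bark(1000.0))   # ~8.5 Bark
print(hz_to_bark(1100.0))   # ~9.1 Bark
```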
Masking in frequency domain
Figure: masked thresholds [Herre95]
– masker: narrowband noise around 250 Hz, 1 kHz, and 4 kHz
– spreading function: the effect of masking extends to the spectral vicinity of the masker (it spreads more towards high frequencies)

Narrowband noise masks a tone (sinusoid) more easily than a tone masks noise.
Masked threshold refers to the raised threshold of audibility caused by the masker
– sounds with a level below the masked threshold are inaudible
– the masked threshold in quiet = threshold of hearing in quiet
Additivity of masking: the joint masked threshold is approximately (but slightly more than) the sum of the component thresholds.
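
The spread of masking into neighbouring critical bands is often modelled with a spreading function on the Bark scale. The sketch below uses the Schroeder, Atal & Hall (1979) approximation as an illustration; it is not necessarily the function used for the figure above.

```python
import numpy as np

def schroeder_spread_db(dz):
    """Approximate masking spread (dB relative to the masker) at a
    critical-band distance dz (Bark) from the masker
    (Schroeder, Atal & Hall, 1979; used here as an example)."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * np.sqrt(1.0 + (dz + 0.474) ** 2)

for dz in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(f"dz = {dz:+.1f} Bark -> {schroeder_spread_db(dz):6.1f} dB")
# The curve falls off much more steeply below the masker (negative dz) than
# above it, matching the asymmetry noted above (masking spreads upwards).
```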
3.2 Masking in time domain

Forward masking
– the masking effect extends to times after the masker is switched off
Backward masking
– masking extends to times before the masker has been switched on
Forward/backward masking does not extend far in time
⇒ simultaneous masking is a more important phenomenon

Figure: backward-masking and forward-masking regions around the masker in time

Masking: Examples

A single tone is played, followed by the same tone and a higher-frequency tone. The high-frequency tone is reduced in intensity first by 12 dB, then in steps of 5 dB. The sequence repeats twice: the second time, the frequency separation between the tones is increased.
– attempt to mask higher frequencies
– attempt to mask lower frequencies (not masked as easily)

Application to audio steganography
Idea: hide a message in the audio data, keeping the message inaudible yet decodable

Example
– here, robustness to environmental noise was important
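
The slide only names the idea; as a toy illustration, the sketch below hides bits in the least significant bits of 16-bit PCM samples. This is a much cruder method than the masking-based embedding implied above (and it is not robust to noise), but it shows the "inaudible yet decodable" principle.

```python
import numpy as np

def embed_bits_lsb(samples: np.ndarray, bits: list[int]) -> np.ndarray:
    """Hide a bit sequence in the least significant bits of int16 samples.
    LSB changes of +/-1 are far below audibility for typical material."""
    out = samples.copy()
    out[:len(bits)] = (out[:len(bits)] & ~1) | np.array(bits, dtype=np.int16)
    return out

def extract_bits_lsb(samples: np.ndarray, n_bits: int) -> list[int]:
    """Recover the hidden bits from the first n_bits samples."""
    return [int(b) for b in (samples[:n_bits] & 1)]

audio = (np.random.randn(1000) * 3000).astype(np.int16)   # placeholder signal
message = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_bits_lsb(audio, message)
assert extract_bits_lsb(stego, len(message)) == message
```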
4 Sound pressure level
A sound signal s(t) at time t represents the pressure deviation from the normal atmospheric pressure.
Sound pressure p_RMS = sqrt(E{s(t)^2}) is the (linear) RMS level of the signal
– E{·} denotes expectation (RMS = root-mean-square)

Due to the wide dynamic range, the decibel scale is convenient
– L_p = 20 log10(p_RMS / p0)
where p0 is a reference pressure.
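
A minimal numerical sketch of the definitions above: the RMS pressure of a signal and its level in dB relative to p0 = 20 µPa. The calibration of the digital signal to pascals (amplitude 1 Pa) is assumed purely for illustration.

```python
import numpy as np

P0 = 20e-6  # reference pressure, 20 micropascals

def rms(p: np.ndarray) -> float:
    """RMS value: square root of the mean squared pressure."""
    return float(np.sqrt(np.mean(p ** 2)))

def spl_db(p: np.ndarray) -> float:
    """Sound pressure level L_p = 20 log10(p_RMS / p0) in dB."""
    return 20.0 * np.log10(rms(p) / P0)

# 1 kHz sinusoid with an (assumed) amplitude of 1 Pa -> p_RMS ~ 0.707 Pa
t = np.arange(0, 0.1, 1.0 / 48000.0)
p = 1.0 * np.sin(2 * np.pi * 1000.0 * t)
print(f"p_RMS = {rms(p):.3f} Pa, L_p = {spl_db(p):.1f} dB SPL")  # ~91 dB
```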
4.1 Threshold of hearing and dB scale
Threshold of hearing
– the weakest audible sound pressure at 1 kHz is 20 µPa, which has been chosen as the reference level p0 of the dB scale
– L_p = 20 log10(p/p0) = 10 log10(p^2/p0^2)

Threshold of pain
– the loudest sound that the auditory system can meaningfully deal with
– 130 dB @ 1 kHz

4.2 Multiple sources
Two sound sources: s(t) = s1(t) + s2(t)
RMS pressure of the sum signal:
  p_RMS = sqrt(E{s(t)^2}) = sqrt(E{s1(t)^2 + 2·s1(t)·s2(t) + s2(t)^2})

If the signals are uncorrelated, E{s1(t)·s2(t)} = 0, and the formula above simplifies to
  p_RMS = sqrt(p1^2 + p2^2)

If p1 = p2, the sound pressure level of the sum signal is 3 dB higher than that of p1 alone (why?)
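
A quick numerical check of the formula above (and of the 80 dB example on the next slide): adding an uncorrelated signal of equal power raises the level by about 3 dB, while adding a perfectly correlated copy (amplitude doubling) raises it by 6 dB. The signals here are arbitrary unit-variance noise.

```python
import numpy as np

def level_db(x):
    """Level in dB relative to an arbitrary reference RMS of 1."""
    return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)))

rng = np.random.default_rng(0)
s1 = rng.standard_normal(100_000)
s2 = rng.standard_normal(100_000)      # uncorrelated with s1

print(level_db(s1))          # reference level, ~0 dB
print(level_db(s1 + s2))     # uncorrelated sum: ~ +3 dB
print(level_db(s1 + s1))     # identical (fully correlated) sum: +6 dB
```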
Multiple sources

Two sources, each with an 80 dB sound pressure level
– source signals uncorrelated: together they produce an 83 dB level
– sources correlate perfectly (same sound): results in an 86 dB level

Doubling the sound amplitude increases the sound pressure level by 6 dB
– because: L_p = 20 log10(2·p/p0) = 20 log10(p/p0) + 6 [dB]
– equivalent to adding another identical source next to the first one

Intuitively: if the two sources do not correlate, the components of the two audio signals may amplify or cancel each other, depending on their relative phases, and hence the level rises to only 83 dB.

5 Loudness

Loudness describes the subjective level of sound
– perception of loudness is a relatively complex but consistent phenomenon, and one of the central topics of psychoacoustics

The loudness of a sound can be compared to a standardized reference tone, for example a 1000 Hz sinusoidal tone
– loudness level (phon) is defined as the sound pressure level (dB) of a 1000 Hz sinusoid that has the same subjective loudness as the target sound
– for example, if the heard sound is perceived as equally loud as a 40 dB, 1 kHz sinusoid, its loudness level is 40 phons

5.1 Equal-loudness curves
Figure: equal-loudness curves; sound pressure level (dB) as a function of frequency (Hz) for different loudness levels (phons)

5.3 Critical bands
When listening to two sinusoids with nearby frequencies and increasing their frequency difference, the perceived loudness increases once the frequency difference exceeds the critical bandwidth
– Figure: components around 1 kHz at 60 dB; the critical bandwidth at 1 kHz is 160 Hz

The ear analyzes sound at critical-band resolution. Each critical band contributes to the overall loudness level.
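
As a sketch of how the critical bandwidth grows with centre frequency, the function below uses the Zwicker & Terhardt (1980) analytic approximation; this particular formula is an assumption added here, but it reproduces the ~160 Hz bandwidth at 1 kHz quoted above.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth (Hz) at centre frequency f_hz,
    after Zwicker & Terhardt (1980) (assumed formula)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

for f in [100.0, 500.0, 1000.0, 4000.0]:
    print(f"{f:6.0f} Hz -> critical bandwidth ~{critical_bandwidth_hz(f):5.0f} Hz")
# Roughly 100 Hz at low frequencies, ~160 Hz at 1 kHz, much wider at high frequencies.
```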
5.4 Loudness of a complex sound
The loudness of a complex sound is calculated by using so-called loudness density as an intermediate quantity.
The loudness density at each critical band is (roughly) proportional to the log-power of the signal in the band (weighted according to the sensitivity of hearing and spread slightly by convolving over frequency).
The overall loudness is obtained by summing up the loudness density values from all critical bands.

Figure: integration of loudness for a sinusoidal tone and for wideband noise; loudness density (sones/Bark) as a function of frequency (Bark)
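
A toy sketch of the band-wise summation idea: compute the power in each critical band, turn it into a crude per-band "density", and sum across bands. The band grouping, the use of raw dB as density, and the 0 dB floor are placeholders; a real model (e.g. Zwicker's) also applies the hearing-sensitivity weighting and spectral spreading mentioned above.

```python
import numpy as np

def bark(f):
    """Traunmüller's Bark approximation (see the earlier sketch)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def toy_loudness(signal, fs):
    """Very rough band-wise 'loudness': per-band dB power summed over
    critical bands. Omits the weighting and spreading of a real loudness
    model, so the result is only a unitless illustration."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    keep = freqs >= 20.0                          # drop sub-audio bins
    spectrum, freqs = spectrum[keep], freqs[keep]
    band = np.floor(bark(freqs)).astype(int)      # critical-band index per bin
    total = 0.0
    for b in np.unique(band):
        db = 10.0 * np.log10(spectrum[band == b].sum() + 1e-12)
        total += max(0.0, db)                     # only bands above an arbitrary floor
    return total

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
noise = np.random.randn(fs) * np.sqrt(np.mean(tone ** 2))  # equal-power wideband noise
print(toy_loudness(tone, fs), toy_loudness(noise, fs))     # the noise scores much higher
```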
6 Pitch
Pitch
– a subjective attribute of sounds that enables us to arrange them on a frequency-related scale ranging from low to high
– a sound has a certain pitch if human listeners can consistently match the frequency of a sinusoidal tone to the pitch of the sound

Fundamental frequency vs. pitch
– fundamental frequency is a physical attribute
– pitch is a perceptual attribute
– both are measured in hertz (Hz)
– in practice, perceived pitch ≈ fundamental frequency

"Perfect pitch" or "absolute pitch" – the ability to recognize the pitch of a musical note without any reference
– a minority of the population can do that

6.1 Harmonic sound

For a sinusoidal tone
– fundamental frequency = sinusoidal frequency
– pitch ≈ sinusoidal frequency

Harmonic sound
– Figure: trumpet sound with fundamental frequency F = 262 Hz and period 1/F ≈ 3.8 ms

6.2 Pitch perception

Pitch perception has been explained with two competing theories
– Place theory: "Peak activity along the basilar membrane determines pitch" (fails to explain the missing fundamental)
– Periodicity theory: "Pitch depends on the rate, not the place, of response." Neurons fire in sync with the signal.

The real mechanism is a combination of the above
– sound is subdivided into subbands (critical bands)
– the periodicity of the amplitude envelope (see the lowest panel of the figure) is analyzed within each band
– the results are combined across bands
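
A minimal sketch of periodicity-based pitch estimation. For brevity a single full-band autocorrelation stands in for the per-band envelope analysis described above; the harmonic test tone and its 262 Hz fundamental follow the trumpet example.

```python
import numpy as np

def estimate_f0_autocorr(x, fs, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency from the autocorrelation peak.
    A single full-band autocorrelation; an auditory model would instead do
    this per critical band on amplitude envelopes and combine the results."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0..N-1
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])                     # strongest repetition period
    return fs / lag

fs = 16000
t = np.arange(int(0.1 * fs)) / fs
# Toy "harmonic sound": 262 Hz fundamental plus two weaker harmonics
x = sum(a * np.sin(2 * np.pi * 262.0 * k * t) for k, a in [(1, 1.0), (2, 0.6), (3, 0.3)])
print(estimate_f0_autocorr(x, fs))   # ~262 Hz (period ~3.8 ms)
```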
6.3 Perceptually-motivated frequency scales
Figure: mel scale, Bark scale, and position (mm) on the basilar membrane as functions of frequency (kHz)
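
A sketch of one common perceptually motivated scale, the mel scale, using the widely used 2595·log10(1 + f/700) formula (an assumption here; other mel variants differ slightly). The Bark conversion was sketched earlier in Section 3.1.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel using the common HTK-style formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz) / 700.0)

# Roughly linear below ~1 kHz, approximately logarithmic above it:
print(hz_to_mel([100, 500, 1000, 4000, 8000]))
```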
Subjective attributes of sound
Sounds are typically described using four main attributes
– loudness, pitch, timbre, and duration

Table: dependence of the subjective attributes on physical parameters
– ♦♦♦ = strongly dependent, ♦♦ = to some extent, ♦ = weak dependence

Physical parameter | Loudness | Pitch | Timbre | Duration
Pressure           |   ♦♦♦    |   ♦   |   ♦    |    ♦
Frequency          |    ♦     |  ♦♦♦  |   ♦♦   |    ♦
Spectrum           |    ♦     |   ♦   |  ♦♦♦   |    ♦
Duration           |    ♦     |   ♦   |   ♦    |   ♦♦♦
Envelope           |    ♦     |   ♦   |   ♦♦   |    ♦

7 Spatial hearing
The most important auditory cues for localizing sound sources in space are
1. interaural time difference
2. interaural intensity difference
3. direction-dependent filtering of the sound spectrum by the head and pinnae

Terms
– Monaural: with one ear
– Binaural: with two ears
– Interaural: between the ears (interaural time difference, etc.)
– Lateralization: localizing a source in the horizontal plane
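
A sketch of how the interaural time difference relates to the azimuth of a distant source, using the Woodworth spherical-head approximation; the head radius is an assumed typical value, not a figure from the slides.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, assumed typical head radius

def itd_seconds(azimuth_deg: float) -> float:
    """Approximate interaural time difference for a distant source at the
    given azimuth (0 deg = straight ahead), Woodworth spherical-head model."""
    theta = np.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + np.sin(theta))

for az in [0, 30, 60, 90]:
    print(f"azimuth {az:2d} deg -> ITD ~{itd_seconds(az) * 1e6:5.0f} us")
# The maximum ITD (source directly to the side) is roughly 650-700 microseconds.
```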
7.1 Monaural source localization
Directional hearing works to some extent even with one ear.
The head and pinna form a direction-dependent filter
– direction-dependent changes in the spectrum of the sound arriving at the ear can be described with HRTFs
– HRTF = head-related transfer function

HRTFs are crucial for localizing sources in the median plane (vertical localization).

Monaural source localization

HRTFs can be measured by recording
– the sound emitted by a source
– the sound arriving at the auditory canal or eardrum (the transfer function of the auditory canal does not vary with direction)
In practice
– left: microphone in the ear of a test subject, OR
– right: head and torso simulator

7.2 Localizing a sinusoidal

Experimenting with sinusoidal tones helps to understand the localization of more complex sounds.
Angle-of-arrival perception for sinusoids below 750 Hz is based mainly on the interaural time difference.

Localizing a sinusoidal
The interaural time difference is useful only up to about 750 Hz
– above that, the time difference becomes ambiguous, since the interaural delay can exceed half the period of the tone, so the interaural phase no longer determines the direction uniquely
– moving the head (or source movement) helps; this works up to about 1500 Hz
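
A short worked check of the ~750 Hz limit: once half a period of the tone is shorter than the largest possible ITD, the interaural phase is ambiguous. The maximum ITD value below is the assumed ~0.66 ms from the sketch above.

```python
MAX_ITD = 0.66e-3   # s, roughly the largest ITD for a human head (assumed, see above)

# The interaural phase becomes ambiguous when half the period of the tone
# fits inside the maximum ITD, i.e. for frequencies above ~1 / (2 * MAX_ITD).
f_ambiguous = 1.0 / (2.0 * MAX_ITD)
print(f"ITD becomes ambiguous above ~{f_ambiguous:.0f} Hz")   # ~760 Hz

# At 2 kHz, more than a full period fits within the maximum interaural delay:
print(2000.0 * MAX_ITD)   # ~1.3 periods inside the delay
```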
7.3 Localizing complex sounds
Complex sounds refer to sounds that
– involve a number of different frequency components and
– vary over time

At higher frequencies (> 750 Hz) the auditory system utilizes the interaural intensity difference
– the head casts an acoustic "shadow" (the sound level is lower behind the head)
– this works especially well at high frequencies

Localizing sound sources is typically a result of combining all the above-described mechanisms
1. interaural time difference (most important)
2. interaural intensity difference
3. HRTFs

Wideband noise: directional hearing works well.
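
As an engineering-flavoured sketch of the two binaural cues, the snippet below estimates the ITD from the cross-correlation peak of the two ear signals and the intensity difference from their RMS ratio. The synthetic "ears" (a pure delay plus attenuation) are of course a drastic simplification of real head filtering.

```python
import numpy as np

def estimate_itd_ild(left, right, fs):
    """Estimate interaural time difference (s) from the cross-correlation
    peak and interaural level difference (dB) from the RMS ratio."""
    xc = np.correlate(right, left, mode="full")     # cross-correlate the ear signals
    lag = np.argmax(xc) - (len(left) - 1)           # samples by which the right ear lags
    itd = lag / fs                                  # positive: the left ear leads
    ild = 20.0 * np.log10(np.sqrt(np.mean(left ** 2)) / np.sqrt(np.mean(right ** 2)))
    return itd, ild

fs = 48000
rng = np.random.default_rng(1)
src = rng.standard_normal(fs // 10)                      # wideband noise source
delay = 24                                               # 0.5 ms: source on the left
left = np.concatenate([src, np.zeros(delay)])
right = 0.7 * np.concatenate([np.zeros(delay), src])     # delayed and attenuated
print(estimate_itd_ild(left, right, fs))                 # ~(0.0005 s, ~3.1 dB)
```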
7.4 Lateralization in headphone listening
When listening with headphones, the sounds are often localized inside the head, on the axis between the ears
– the sound does not seem to come from outside the head because the diffraction caused by the pinnae and head is missing
– if the sounds are carefully processed with HRTFs, they move outside the head