SECOND YEAR COURSE
PERCEPTION
Hearing Lecture notes (1): Introductory Hearing
1. What is hearing for?
• (i) Indicate the direction of sound sources (better than the eyes, since hearing is omnidirectional and has no eye-lids, but with poorer directional resolution).
• (ii) Recognise the identity and content of a sound source (such as speech
or music or a car).
• (iii) Give information on the nature of the environment via echoes,
reverberation (normal room, cathedral, open field).
2. Waveforms and Frequency Analysis
Sound is a change in the pressure of the air. The waveform of any sound
shows how the pressure changes over time. The eardrum moves in response
to changes in pressure.
Any waveform shape can be produced by adding together sine waves of
appropriate frequencies and amplitudes. The amplitudes (and phases) of the
sine waves give the spectrum of the sound. The spectrum of a sine wave is a
single point at the frequency of the sine wave. The spectrum of white noise is
a line covering all frequencies.
The cochlea breaks the waveform at the ear down into its component sine
waves - frequency analysis. Hair cells in the cochlea respond to these
component frequencies. This process of frequency analysis is impaired in
sensori-neural hearing loss. It cannot be compensated for by a conventional
hearing aid.
3. Why does the auditory system analyse sound by frequency?
Some animals do not analyse sound by frequency, but simply transmit the
pressure waveform at the ear directly. We could do this by having hair cells
on the eardrum. But instead we have an elaborate system to analyse sound
into its frequency components. We do this because, since almost all sounds
are structured in frequency, we can detect them, especially in the presence of
other sounds, more easily by "looking" at the spectrum than at the
waveform.
In the six panels below, the left-hand column shows plots of the waveform of
a sound - the way that pressure changes over time. The right-hand column
shows the spectrum of the sound - how much of each sine-wave you have to
add together in order to make that particular waveform.
The upper panel is a sine wave tone with a frequency of 1000 Hz. A sine
wave has energy at just one frequency, so the spectrum is just one point.
[Figure: waveform (left) and spectrum (right) of the sine tone p = 1.0·sin(2π·1000·t); the waveform oscillates between +1 and -1, and the spectrum is a single point at 1000 Hz.]
The middle panel is white noise (like the sound of a waterfall). White noise
has equal energy at all frequencies, so the spectrum is a horizontal line.
[Figure: waveform and spectrum of white noise; the waveform is irregular, and the spectrum is a flat horizontal line across all frequencies.]
The lower panel is the sine tone added to the noise. The spectrum of the
sum is just the sum of the spectra of the two components. Notice that you
can see the tone very easily in the spectrum, but it is completely obscured by
the noise in the waveform.
[Figure: waveform and spectrum of the noise plus the 1000 Hz tone; the tone is invisible in the waveform but appears as a clear point at 1000 Hz in the spectrum.]
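You can check this numerically. The following Python sketch (our illustration, not part of the original notes; it assumes numpy is available) buries a 1000 Hz tone in much louder white noise and shows that the tone still stands out as the largest peak in the spectrum:

    import numpy as np

    fs = 8000                                  # sampling rate (Hz)
    t = np.arange(0, 0.5, 1 / fs)              # half a second of time samples
    tone = 1.0 * np.sin(2 * np.pi * 1000 * t)  # the 1000 Hz sine wave
    noise = 5.0 * np.random.randn(t.size)      # white noise, five times the amplitude
    signal = tone + noise

    # In the waveform the tone is invisible, but its energy is concentrated
    # in a single spectral bin while the noise is spread over all bins.
    spectrum = np.abs(np.fft.rfft(signal)) / t.size
    freqs = np.fft.rfftfreq(t.size, 1 / fs)
    peak = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
    print(f"largest spectral peak at {peak:.0f} Hz")   # ~1000 Hz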
4. Sine waves
A sine wave has three properties which appear in the basic equation:
p(t) = a·sin(2πft + φ)
(i) frequency (f) - measured in Hertz (Hz), cycles per second.
(ii) amplitude (a) - a measure of the pressure change of a sound. It is
usually measured in decibels (dB) relative to another sound; the dB scale
is a logarithmic scale: if we have two sounds with pressures p1 and p2,
then p1 is 20·log10(p1/p2) dB greater than p2. Doubling pressure
(amplitude) gives an increase of 6 dB: 20·log10(2/1) = 20 × 0.3 = 6.
Amplitude squared is proportional to the energy, or level, or intensity (I)
of a sound. The decibel difference between two sounds can also be
expressed in terms of intensity changes: 10·log10(I1/I2). Doubling
intensity gives an increase of 3 dB (10 × 0.3). The just noticeable
difference (jnd) between two sounds is about 1 dB.
(iii) phase (φ) - measured in degrees or radians, indicates the relative time
of a wave.
The sine wave shown above has an amplitude of 1, a frequency of 1000 Hz,
and it starts in zero sine phase (φ = 0).
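The decibel arithmetic is easy to verify. A small sketch (our illustration):

    import math

    def db_pressure(p1, p2):
        """dB difference between two sounds, given their pressures."""
        return 20 * math.log10(p1 / p2)

    def db_intensity(i1, i2):
        """dB difference given intensities (intensity ~ pressure squared)."""
        return 10 * math.log10(i1 / i2)

    print(db_pressure(2, 1))   # doubling pressure  -> ~6.02 dB
    print(db_intensity(2, 1))  # doubling intensity -> ~3.01 dB
    # Doubling pressure quadruples intensity, so the two scales agree:
    print(abs(db_pressure(2, 1) - db_intensity(4, 1)) < 1e-9)   # True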
5. Complex periodic sounds
A sound which has more than one (sine-wave) frequency component is a
complex sound. A periodic sound is one which repeats itself at regular
intervals. A sine wave is a simple periodic sound. Musical instruments or
the voice produce complex periodic sounds. They have a spectrum
consisting of a series of harmonics. Each harmonic is a sine wave that has a
frequency that is an integer multiple of the fundamental frequency.
For example, the note 'A' played by the oboe to tune the orchestra has a
fundamental frequency of 440 Hz, giving harmonics at 440, 880, 1320, 1760,
2200, 2640, etc. If the oboe played a higher pitch, the fundamental frequency
(and so all the harmonic frequencies of the note) would be higher. The period
of a complex sound is 1/fundamental frequency (in this case 1/440 = 0.0023s
= 2.3ms). A different instrument, with a different timbre, playing the same
pitch as the oboe, would have harmonics at the same frequencies, but the
harmonics would have different relative amplitudes. The overall timbre of a
natural instrument is partly determined by the relative amplitudes of the
harmonics, but the attack of the note is also important. Different harmonics
start at different times in different instruments, and the rate at which they
start also differs markedly across instruments. Cheap synthesisers cannot
imitate the attack, and so they do not make very lifelike sounds. Expensive
synthesisers (like Yamaha's Clavinova) store the whole note including the
attack and so sound very realistic.
Here is one-tenth of a second of the waveform and also the spectrum of a
complex periodic sound consisting of the first four harmonics of a
fundamental of 100 Hz. Notice that there are 10 cycles of the waveform in
0.1s, and all the frequency components are integer multiples of 100 Hz.
[Figure: waveform and spectrum of a complex periodic sound made of the first four harmonics (equal amplitudes) of a 100 Hz fundamental; the 0.1 s of waveform contains 10 cycles, and the spectrum has points at 100, 200, 300 and 400 Hz.]
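The waveform in this figure can be reconstructed in a few lines of Python (a sketch of the synthesis, with all four harmonics at equal amplitude and in sine phase, as in the figure):

    import numpy as np

    fs = 8000                          # sampling rate (Hz)
    t = np.arange(0, 0.1, 1 / fs)      # one tenth of a second
    f0 = 100                           # fundamental frequency (Hz)

    # Add the first four harmonics: 100, 200, 300 and 400 Hz.
    wave = sum(np.sin(2 * np.pi * n * f0 * t) for n in range(1, 5))

    # The result repeats with period 1/f0 = 10 ms, so 0.1 s holds 10 cycles.
    period = fs // f0                  # samples per cycle
    print(np.allclose(wave[:period], wave[period:2 * period]))   # True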
Here is a sound with the same period, but a different timbre. Notice that the
waveform has a different shape, but the same period. The change in timbre
is produced by making the higher harmonics lower in amplitude.
[Figure: a sound with the same period but a different timbre; the waveform has a different shape, and the spectrum shows the harmonics at 100, 200, 300 and 400 Hz falling off in amplitude (1.0, 0.5, 0.25, ...).]
We can also change the shape of the waveform by changing the relative phase
of the different frequencies. In the examples above the four components were
all in sine phase; in the next example the odd harmonics are in sine phase
and the even ones in cosine phase. This change produces very little change
in timbre.
[Figure: the same four harmonics with the odd harmonics in sine phase and the even harmonics in cosine phase; the waveform shape changes, but the spectrum (and the timbre) is essentially unchanged.]
6. Linearity
Most studies of the auditory system have used sine waves. If we know how
a system responds to sine waves, then we can predict exactly how it will
behave to complex waves (which are made up of sine waves), provided that
the system is linear.
• The output of a linear system to the sum of two inputs is equal to the
sum of its outputs to the two inputs separately.
• Equivalently, if you double the input to a linear system, then you double
the output.
• A linear system can only output frequencies that are present in the input;
non-linear systems always add extra frequency components.
The filters we describe below are linear. The auditory system is only linear to
a first approximation.
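These properties translate directly into code. The sketch below (our illustration) checks superposition for a trivially linear system (a scaler) and shows that a non-linear system (a squarer) fails it:

    import numpy as np

    def linear_system(x):
        return 3.0 * x        # scaling is linear

    def nonlinear_system(x):
        return x ** 2         # squaring is non-linear

    rng = np.random.default_rng(0)
    a, b = rng.standard_normal(100), rng.standard_normal(100)

    # Superposition: output to (a + b) equals output to a plus output to b.
    print(np.allclose(linear_system(a + b), linear_system(a) + linear_system(b)))          # True
    print(np.allclose(nonlinear_system(a + b), nonlinear_system(a) + nonlinear_system(b))) # False

    # Squaring also creates a frequency that was not in the input, which a
    # linear system can never do: sin^2(2*pi*f*t) = 0.5 - 0.5*cos(2*pi*(2f)*t).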
7. Filters
A filter lets through some frequencies but not others. A treble control acts as
a low-pass filter, letting less of the high frequencies through as you turn the
treble down. A bass control acts as a high-pass filter, letting less of the low
frequencies through as you turn the bass down. A band-pass filter only lets
through frequencies that fall within some range. A slider on a graphic
equalizer controls the output level of a band-pass filter. In analysing sound
into its frequency components, the ear acts like a set of band-pass filters.
We can represent the action of a filter with a diagram like a spectrum which
shows by how much each frequency is attenuated (or reduced in amplitude)
when it passes through the filter.
[Figure: a complex input sound (equal-amplitude harmonics at 100, 200, 300 and 400 Hz) is passed through a low-pass filter whose attenuation increases with frequency; in the output spectrum the higher harmonics are progressively reduced (1.0, 0.5, 0.25, ...), and the output waveform is correspondingly smoothed.]
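A low-pass filtering stage like the one in the diagram can be sketched with scipy (the cutoff frequency and filter order are our own illustrative choices, not values from these notes):

    import numpy as np
    from scipy.signal import butter, lfilter

    fs = 8000
    t = np.arange(0, 0.1, 1 / fs)
    # Input: equal-amplitude harmonics at 100, 200, 300 and 400 Hz.
    x = sum(np.sin(2 * np.pi * f * t) for f in (100, 200, 300, 400))

    # 4th-order Butterworth low-pass with a 250 Hz cutoff: it passes the
    # lower harmonics and attenuates the higher ones.
    b, a = butter(4, 250, btype="low", fs=fs)
    y = lfilter(b, a, x)

    # Amplitude of each harmonic after filtering (roughly 1.0, 0.9, 0.4, 0.15).
    spectrum = 2 * np.abs(np.fft.rfft(y)) / t.size
    freqs = np.fft.rfftfreq(t.size, 1 / fs)
    for f in (100, 200, 300, 400):
        print(f, round(spectrum[np.argmin(np.abs(freqs - f))], 2))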
8. Resonance
A resonant system acts like a band-pass filter, responding to a narrow range
of frequencies. Examples are: a tuning fork, a string of a harp or piano, a
swing. Helmholtz was almost right in thinking that the ear consisted of a
series of resonators - like a grand-piano with the sustaining pedal held
down. Here is what happens when a complex sound is passed through a
sharply-tuned band-pass filter. Notice that a complex wave goes in, but a
sine wave comes out. Each part of the basilar membrane acts like a band-pass
filter tuned to a different frequency.
[Figure: the same complex input passed through a sharply-tuned band-pass filter centred on one harmonic; a complex wave goes in, but the output waveform is a sine wave at that harmonic's frequency.]
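The "complex wave in, sine wave out" behaviour is easy to reproduce: pass the same kind of four-harmonic complex through a narrow band-pass filter centred on one harmonic (the centre frequency and bandwidth below are our own illustrative choices):

    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 8000
    t = np.arange(0, 0.5, 1 / fs)
    x = sum(np.sin(2 * np.pi * f * t) for f in (100, 200, 300, 400))

    # Narrow band-pass around 300 Hz: only the 300 Hz harmonic survives,
    # so a complex wave goes in but a near-sinusoid comes out.
    b, a = butter(2, [280, 320], btype="band", fs=fs)
    y = filtfilt(b, a, x)                 # zero-phase filtering

    # Compare a middle stretch of the output with a pure 300 Hz sine.
    seg = slice(fs // 4, fs // 2)
    ref = np.sin(2 * np.pi * 300 * t)
    print(round(np.corrcoef(y[seg], ref[seg])[0, 1], 3))   # close to 1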
What you should know.
You should understand the meaning of all the terms shown in italics. You
should also be able to explain all the diagrams in this handout. If you do not
understand any of the terms or diagrams, first try asking someone else in the
class who you think might. If you still don't, then ask me either in a lecture,
after a lecture or in my office.
SECOND YEAR COURSE
PERCEPTION
Hearing Lecture Notes (2): Ear and Auditory Nerve
1. THE EAR
There are three main parts of the ear: the pinna (or external ear) and meatus,
the middle ear, and the cochlea (inner ear).
1.1 Pinna and meatus
The pinna serves different functions in different animals. Those with mobile
pinnae (donkey, cat) use them to amplify sound coming from a particular
direction, at the expense of other sounds. The human pinna is not mobile,
but serves to colour high frequency sounds by interference between the
echoes reflected off its different structures (like the colours of light produced
by reflection from an oil slick). Only frequencies that have a wavelength
comparable to the dimensions of the pinna are influenced by it (> 3kHz).
Different high frequencies are amplified by different amounts depending on
the direction of the sound. The brain interprets these changes as direction.
The meatus is the tube that links the pinna to the eardrum. It resonates at
around 2kHz so that frequencies in that region are transmitted more
efficiently to the cochlea than others. This frequency region is particularly
important in speech.
1.2 Middle ear: tympanic membrane, malleus, incus and stapes
The middle ear transmits the vibrations of the ear drum (tympanic
membrane) to the cochlea. The middle ear performs two functions.
(i) Impedance matching - vibrations in air must be transmitted efficiently into
the fluid of the cochlea. If there were no middle ear most of the sound
would just bounce off the cochlea. The middle ear helps turn a large
amplitude vibration in air into a small amplitude vibration (of the same
energy) in fluid. The large area of the ear-drum compared with the small
area of the stapes helps to achieve this, together with the lever action of
the three middle ear bones or ossicles (malleus, incus, stapes).
(ii) Protection against loud low frequency sounds - the cochlea is susceptible
to damage from intense sounds. The middle ear offers some protection
by the stapedius reflex, which tenses muscles that stiffen the vibration of
the ossicles, thus reducing the extent to which low frequency sounds are
transmitted. The reflex is triggered by loud sounds; it also reduces the
extent of upward spread of masking from intense low-frequency sounds
(see hearing lecture 3).
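As a rough worked example of the impedance matching in (i), using typical textbook values rather than anything in these notes: the eardrum's effective area is about 55 mm², the stapes footplate about 3.2 mm², and the ossicular lever ratio about 1.3, giving a pressure gain of roughly 27 dB.

    import math

    # Illustrative textbook values, not measurements from these notes.
    area_eardrum = 55.0   # mm^2, effective area of the tympanic membrane
    area_stapes = 3.2     # mm^2, stapes footplate
    lever_ratio = 1.3     # lever action of the ossicles

    gain = (area_eardrum / area_stapes) * lever_ratio
    print(f"pressure gain ~{gain:.0f}x = ~{20 * math.log10(gain):.0f} dB")  # ~22x, ~27 dB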
Damage to the middle ear causes a Conductive Hearing Loss which can
usually be corrected by a hearing aid. In a conductive hearing loss, absolute
thresholds are elevated. These thresholds are measured in an audiological
test and shown in an audiogram. Appropriate amplification at different
frequencies compensates for the conductive loss.
1.3 Inner ear: cochlea
The snail-shaped cochlea, unwound, is a three-chambered tube. Two of the
chambers are separated by the basilar membrane, on which sits the organ of
Corti. The tectorial membrane sits on top of the organ of Corti and is fixed
rigidly to it at one end only. Sound produces a travelling wave down the
basilar membrane that is detected by shearing movement between the
tectorial and basilar membranes, which bends the hairs on top of the inner
hair cells that form part of the organ of Corti. Different frequencies of
sound give maximum vibration at different places along the basilar
membrane.
When a low frequency pure tone stimulates the ear, the whole basilar
membrane, up to the point at which the travelling wave dies out, vibrates at
the frequency of the tone. The amplitude of the vibration has a very sharp
peak. The vibration to high frequency tones peaks nearer the base of the
membrane than does the vibration to low frequency sounds. The
characteristic frequency (CF) of a particular place along the membrane is the
frequency that peaks at that point. If more than one tone is present at a time,
then their vibrations on the membrane add together (but see remarks on
non-linearity).
[Figure: envelope of basilar membrane vibration against distance along the membrane; high frequencies peak near the base, low frequencies nearer the apex.]
More intense tones give a greater vibration than do less intense:
[Figure: the same envelopes at high and low intensity; more intense tones give greater vibration, with each envelope peaking at its characteristic frequency (CF) place between base and apex.]
A brief click contains energy at virtually all frequencies. Each part of the
basilar membrane will resonate to its particular frequency.
[Figure: response of a band-pass filter to a click - a damped oscillation ("ringing") at the filter's characteristic frequency, with successive peaks separated by 1/CF.]
It is a useful approximation to note that each point on the basilar membrane
acts like a very sharply tuned band-pass filter. In the normal ear these filters
are just as sharply tuned as are individual fibers of the auditory nerve (see
below).
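This ringing can be reproduced by driving a sharply-tuned band-pass filter with a click (a unit impulse). In the sketch below the CF and bandwidth are our own illustrative choices:

    import numpy as np
    from scipy.signal import butter, lfilter

    fs = 8000
    cf, bw = 500.0, 50.0          # illustrative CF and bandwidth (Hz)
    b, a = butter(2, [cf - bw / 2, cf + bw / 2], btype="band", fs=fs)

    click = np.zeros(fs)          # one second of silence...
    click[0] = 1.0                # ...with a single impulse at t = 0

    ringing = lfilter(b, a, click)

    # The impulse response is a damped oscillation at roughly the CF:
    # count zero crossings over the first 0.1 s to estimate its frequency.
    signs = np.signbit(ringing[:fs // 10]).astype(int)
    crossings = np.sum(np.diff(signs) != 0)
    print(f"ringing at ~{crossings / 2 / 0.1:.0f} Hz")   # close to 500 Hz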
1.3.1 Non-linearity
In normal ears the response of the basilar membrane to sound is actually
non-linear - there is significant distortion.
• If you double the input to the basilar membrane, the output less than
doubles (saturating non-linearity).
• If you add a second tone at a different frequency, the response to the first
tone can decrease (Two-tone suppression).
• If you play two tones (say 1000 & 1200 Hz) a third tone can appear (at 800
Hz) - the so-called Cubic Difference Tone.
1.3.2 Sensori-neural hearing loss (SNHL)
Sensori-neural hearing loss can be brought about by exposure to loud
sounds (particularly impulsive ones like gun shots), or by infection or by
antibiotics. It usually arises from loss of outer hair cells. It is likely that outer
hair cells act as tiny motors; they feed back energy into the ear at the CF. In
ears with a sensori-neural hearing loss (SNHL), this distortion is reduced or
disappears. So, paradoxically, abnormal ears are more nearly linear.
1.3.3 Forms of deafness
There are two major forms of deafness: conductive and sensori-neural.

                       Conductive    Sensori-neural
  Origin               Middle-ear    Cochlea (OHCs)
  Thresholds           Raised        Raised
  Filter bandwidths    Normal        Increased*
  Loudness growth      Normal        Increased* (Recruitment)

Symptoms marked * are not alleviated by a conventional hearing aid.
1.3.4 Role of outer hair cells
The active feedback of energy by outer hair cells into the basilar membrane
is probably responsible for:
(i) the sharp peak in the basilar membrane response - low thresholds and
narrow bandwidth;
(ii) oto-acoustic emissions (sounds that come out of the ear);
(iii) the non-linear response of the basilar membrane vibration. The more
linear behaviour of the SNHL basilar membrane is probably the cause of
loudness recruitment (abnormally rapid growth of loudness).
2. AUDITORY NERVE
As the hairs of inner hair cells bend, the voltage of the hair cell changes; when
the hairs are bent sufficiently in one direction (but not the other) the voltage
changes enough to release neurotransmitter at the synapse between the hair
cell and the auditory nerve fibre, and the auditory nerve fires. This
direction corresponds to a pressure rarefaction in the air. After firing, an
auditory nerve fibre has a refractory period of around 1 ms. Each hair cell
has about 10 auditory nerve fibers connected to it. These fibers have
different thresholds.
Inner hair cells stimulate the afferent auditory nerve, outer hair cells
generally do not, but are innervated by the efferent auditory nerve. Efferent
activity may influence the mechanical response of the basilar membrane via
the outer hair cells.
2.1 Response to single pure tones
As the amplitude of a tone played to the ear increases, so the rate of firing of
a nerve fibre at CF increases up to saturation. Most auditory nerve fibers
have high spontaneous rates and saturate rapidly, but there are others
(which are harder to record from) that have low spontaneous rates and
saturate more slowly. High spontaneous rate fibers code intensity changes
at low levels, and the low spontaneous rate ones code intensity changes at
high levels.
[Figure: firing rate against level (dB SPL); the many high spontaneous rate fibres rise from a high spontaneous rate and saturate by moderate levels, while the few low spontaneous rate fibres rise from a low spontaneous rate and saturate more slowly at higher levels.]
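Such rate-level functions are often summarised by a saturating sigmoid. The sketch below is a toy model (all parameters are our own choices, not fits to real data):

    import numpy as np

    def firing_rate(level_db, threshold_db, spont, max_rate, slope=0.3):
        """Toy saturating rate-level function for an auditory nerve fibre."""
        drive = 1 / (1 + np.exp(-slope * (level_db - threshold_db)))
        return spont + (max_rate - spont) * drive

    levels = np.array([0, 20, 40, 60, 80, 100])    # dB SPL
    high_spont = firing_rate(levels, threshold_db=10, spont=60, max_rate=200)
    low_spont = firing_rate(levels, threshold_db=40, spont=1, max_rate=150)

    # High-spontaneous fibres saturate early (coding changes at low levels);
    # low-spontaneous fibres keep growing at higher levels.
    print(np.round(high_spont))   # saturated by ~40 dB
    print(np.round(low_spont))    # still growing between 40 and 80 dB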
2.2 Frequency threshold curves (FTCs)
FTCs plot the minimum intensity of sound needed at a particular frequency
to just stimulate an auditory nerve fibre above spontaneous activity. The
high frequency slopes are very steep (c. 300 dB/oct), the low frequency
slopes generally have a steep tip followed by a flatter base. Damage to the
cochlea easily abolishes the tip, and explains some features of Sensori-Neural
Hearing Loss: raised thresholds and reduced frequency selectivity.
[Figure: normal and abnormal frequency threshold curves on a log frequency axis. The normal FTC has a sharp low-threshold tip at the characteristic frequency with a narrow bandwidth; the abnormal FTC has lost the tip, giving a raised threshold and an abnormally broad bandwidth.]
2.3 Characteristic frequency (CF)
The CF of an auditory nerve fibre is the frequency at which least energy is
needed to stimulate it. Different nerve fibers have different CFs and different
thresholds. The CF of a fiber is roughly the same as the resonant frequency
of the part of the basilar membrane that it is attached to.
2.4 Phase locking
The auditory nerve will tend to fire at a particular phase of a stimulating
low-frequency tone. So the inter-spike intervals tend to occur at integer
multiples of the period of the tone. With high frequency tones (> 3kHz) phase
locking gets weaker, because the capacitance of inner hair cells prevents them
from changing in voltage sufficiently rapidly. Please note that the weaker
phase-locking at high frequencies is NOT due to the refractory period.
2.5 Coding frequency
How does the brain tell, from the pattern of firing in the auditory nerve,
what frequencies are present? There are two alternatives:
(a) place of maximal excitation - fibres whose CF is close to a stimulating
tone's frequency will fire at a higher rate than those remote from it. So
the frequency of a tone will be given by the place on the membrane from
which emerge fibers having the highest rate of firing.
(b) timing information - fibres with a CF near to a stimulating tone's
frequency will be phase locked to the tone, provided it is low in
frequency (< 3kHz). So, consistent inter-spike intervals across a band of
fibers indicate the frequency of a tone.
[Figure: phase locking. For low frequency tones, nerve spikes occur at a fixed phase of the stimulating waveform, so inter-spike intervals fall at integer multiples (1 period, 2 periods, ...) of the tone's period. For high frequency tones (> 5 kHz) the intervals are random.]
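Phase locking can be simulated by letting a model fibre fire, with some probability, only at one fixed phase of each cycle of the tone, and then examining the inter-spike intervals (the firing probability here is our own choice):

    import numpy as np

    rng = np.random.default_rng(1)
    f = 250.0                     # a low frequency tone (Hz)
    period = 1.0 / f              # 4 ms, well above the ~1 ms refractory period
    n_cycles = 2000

    # On each cycle the fibre may fire once, always at the same phase.
    fired = rng.random(n_cycles) < 0.3             # fires on ~30% of cycles
    spike_times = np.nonzero(fired)[0] * period

    # Inter-spike intervals fall at integer multiples of the tone's period.
    isis = np.diff(spike_times)
    print(np.unique(np.round(isis / period))[:5])  # [1. 2. 3. 4. 5.]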
2.6 Coding intensity
How does the brain tell, from the pattern of firing in the auditory nerve,
what the intensities of the different frequencies present are? The dynamic
range of most auditory nerve fibres (high spontaneous) is not sufficient to
cover the range of hearing (c. 100 dB). Low spontaneous rate fibers have a
larger dynamic range and provide useful information at high levels. So
information about intensity is carried in different fibers at different levels.
2.7 Two-tone suppression
If a tone at a fiber's CF is played just above threshold for that fiber, the fiber
will fire. But if a second tone is also played, at a frequency and level in the
shaded area of the next diagram, then the firing rate will be reduced. This
two-tone suppression demonstrates that the normal auditory system is
non-linear. If the system were linear, then the firing rate could only be
unchanged or increased by the addition of an extra tone.
Two-tone suppression is a characteristic of the normal ear and may be absent
in the damaged ear. It is formally similar to lateral inhibition in vision, but it
has a very different underlying cause. Lateral inhibition in vision is the
result of neural mechanisms, whereas two-tone suppression is the result of
mechanical processes in the cochlea.
[Figure: regions for two-tone suppression - shaded areas on either side of the FTC tip; a test tone at the characteristic frequency, just above threshold, is suppressed by a second tone whose frequency and level fall in the shaded regions.]
2.8 Cochlear implants
Implants can be fitted to patients who are profoundly deaf (>90dB loss), who
gain very little benefit from conventional hearing aids. In multi-channel
implants, a number of bipolar electrodes are inserted into the cochlea,
terminating at different places. Electrical current derived from band-pass
filtering sound can stimulate selectively auditory nerve fibers near the
electrode, giving some crude 'place' coding of frequency.
The best patients' hearing is good enough for them to understand isolated
words over the telephone, but there is a great deal of variation across
patients, which may be partly due to the integrity of the auditory nerve and
higher pathways. It is increasingly common to fit cochlear implants to
profoundly deaf children, so that they gain exposure to spoken language.
This move raises ethical issues, as well as social ones for the signing deaf
community, some of whom oppose implants.
3. WHAT YOU SHOULD KNOW.
You should understand the meaning of all the terms shown in italics. You
should also be able to explain all the diagrams in this handout. If you do not
understand any of the terms or diagrams, first try asking someone else in the
class whom you think might understand. If you still don't, then ask me
either in the lecture or afterwards.
SECOND YEAR COURSE
PERCEPTION
Hearing Lecture notes (3): Introductory psychoacoustics
1. BASIC TERMS
There is an important distinction between terms used to describe physical
properties and those used to describe psychological properties.
Psychological properties are usually influenced by many physical ones.

  Physical            Psychological
  Intensity, Level    Loudness
  Frequency           Pitch
  Spectrum            Timbre
1.1. Absolute threshold
Human listeners are most sensitive to sounds around 2-3kHz. Absolute
threshold at these frequencies for normal young adults is around 0 dB Sound
Pressure Level (SPL - level relative to 0.0002 dyne/cm2 ). Thresholds increase
to about 50 dB SPL at 100 Hz and 10 dB SPL at 10 kHz. A normal young
adult's absolute threshold for a pure tone defines 0 dB Hearing Level (HL) at
that frequency. An audiogram measures an individual's threshold at different
frequencies relative to 0dB HL. Normal ageing progressively increases
thresholds at high frequencies (presbyacusis). A noisy environment will lead
to a more rapid hearing loss (40 dB loss at 4kHz for a factory worker at age
35, compared with 20 dB for an office worker). The term Sensation Level (SL)
gives the number of dB that a sound is above its absolute threshold for a
particular individual.
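The different dB reference points can be kept straight with a couple of helper functions (a sketch; 20 µPa is the standard reference pressure, equivalent to the 0.0002 dyne/cm2 quoted above):

    import math

    P0 = 20e-6    # reference pressure in pascals (20 uPa = 0.0002 dyne/cm^2)

    def db_spl(pressure_pa):
        """Sound Pressure Level: dB relative to the fixed reference P0."""
        return 20 * math.log10(pressure_pa / P0)

    def sensation_level(level_db_spl, threshold_db_spl):
        """Sensation Level: dB above this listener's own absolute threshold."""
        return level_db_spl - threshold_db_spl

    print(db_spl(20e-6))            # 0 dB SPL: normal threshold around 2-3 kHz
    print(sensation_level(60, 50))  # a 60 dB SPL tone, threshold 50 dB SPL -> 10 dB SL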
2. FREQUENCY RESOLUTION AND MASKING
Ohm's Acoustic Law states that we can perceive the individual Fourier
components of a complex sound. It is only partly true since the ear has a
limited ability to resolve different frequencies. Our ability to separate
different frequencies in the ear depends on the sharpness of our auditory
filters. The physiology underlying auditory filters is described in the
previous Notes. The bandwidth of human auditory filters at different
frequencies can be measured psychoacoustically in masking experiments
(see below). The older literature refers to the width of an auditory filter at a
particular frequency as the Critical Band. Sounds mix together in the ear
when they fall within a Critical Band, but can be separated when they do
not. For example (and somewhat oversimplified!), only harmonics that are
separated by more than a critical band can be heard out from a mixture;
only noise that is within a critical band contributes to the masking of a tone.
A simple demonstration of the bandwidth of noise that contributes to the
masking of a tone is in the following band-limiting demonstration which is
Demonstration 2 in the ASA "Auditory Demonstrations" CD.
• In silence, you can hear all ten 5dB steps of the 2000Hz tone.
• In wide-band noise you can only hear about five because of masking.
• As the bandwidth of the noise is decreased to 1000 Hz and then to 250 Hz
there is no change, because your auditory bandwidth is narrower than
these values.
• When the bandwidth of the noise is decreased to 10 Hz, you hear more
tone steps because the noise bandwidth is now narrower than the
auditory filter and so less noise gets into the auditory filter to mask the
tone.
[Figure: measurement of auditory bandwidth with band-limited noise - a 2000 Hz tone masked by noise of decreasing bandwidth (broadband, 1000 Hz, 250 Hz); only the part of the noise band that falls within the auditory filter around the tone reaches the detection mechanism and contributes to masking.]
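The logic of the demonstration can be captured in a few lines: with the noise's spectrum level held fixed, only the part of the noise band that falls inside the auditory filter contributes to masking. A simplified rectangular-filter sketch (the 200 Hz filter bandwidth is our own illustrative value):

    # Simplified masking model: rectangular auditory filter, flat-spectrum noise.
    def noise_in_filter(noise_bw_hz, filter_bw_hz, density=1.0):
        """Noise power reaching the detection mechanism through the filter."""
        return density * min(noise_bw_hz, filter_bw_hz)

    auditory_bw = 200.0   # assumed auditory filter bandwidth near 2000 Hz
    for noise_bw in (10000.0, 1000.0, 250.0, 10.0):   # 10000 ~ "broadband"
        print(noise_bw, noise_in_filter(noise_bw, auditory_bw))
    # Masking is unchanged until the noise band becomes narrower than the
    # auditory filter (the 10 Hz case), when less noise gets into the filter.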
The masked threshold of a tone is its level when it is just detectable in the
presence of some other sound. It will of course vary with the masking
sound. The amount of masking is the difference between the masked
threshold and the absolute threshold. Generally, individuals with broader
auditory filters (as a result of SNHL) show more masking. In Simultaneous
masking the two sounds are presented at the same time. In Forward masking
the masking sound is presented just before the test tone. It gives slightly
different results from simultaneous masking because of non-linearities in the
auditory system.
[Figure: types of masking - in Forward masking the mask precedes the target in time, in Backward masking the mask follows the target, and in Simultaneous masking mask and target overlap.]
In older studies of masking:
Mask frequency and level are fixed
Threshold level for the Target is measured at different frequencies
In measuring Psychoacoustic Tuning Curves:
Target frequency and level are fixed
Threshold level for the Mask is measured at different frequencies
2.1. Psychophysical Tuning Curves
A psychophysical method can be used to generate an analogy to the
physiological frequency threshold curve for a single auditory fiber. A
narrowband noise of variable center frequency is the masker, and a fixed
frequency and fixed level pure tone at about 20 dB HL is the target. The
level of masker is found that just masks the tone for different masker
frequencies. Compare the following diagram with the FTC in the previous
Notes.
[Figure: a psychophysical tuning curve - the masker level (dB) needed to just mask a fixed low-level target tone, plotted against masker centre frequency (600-1200 Hz); the curve has a sharp minimum at the target frequency (here 1000 Hz).]
Using these techniques (and other similar ones) we can estimate the shape
and bandwidth of human auditory filters at different (target) frequencies.
The bandwidth values are shown in the next diagram. At 1 kHz the
bandwidth is about 130 Hz; at 5 kHz about 650 Hz.
[Figure: estimated auditory filter bandwidth (0-1000 Hz) against centre frequency (0-8000 Hz); the bandwidth grows roughly in proportion to centre frequency.]
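For comparison, a widely used modern summary of such measurements is the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990). It comes from the later literature rather than from these notes, but it gives values close to those quoted above:

    def erb_hz(f_hz):
        """Equivalent rectangular bandwidth of the auditory filter
        centred at f_hz (Glasberg & Moore, 1990)."""
        return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

    for f in (500, 1000, 2000, 5000):
        print(f, round(erb_hz(f)))   # ~79, ~133, ~241, ~564 Hz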
Psychophysical tuning curves measured in people with SNHL often show
increased auditory bandwidths at those frequencies where they have a
hearing loss.
2.2. Excitation pattern
Using the filter shapes and bandwidths derived from masking experiments
we can produce the excitation pattern produced by a sound. The excitation
pattern shows how much energy comes through each filter in a bank of
auditory filters. It is analogous to the pattern of vibration on the basilar
membrane. For a 1000 Hz pure tone the excitation pattern for a normal and
for a SNHL listener look like this:
[Figure: excitation patterns of a 1000 Hz pure tone - the normal listener's pattern has a sharp peak at the 1000 Hz auditory filter, while the SNHL listener's pattern is broader and flatter.]
The excitation pattern to a complex tone is simply the sum of the patterns to
the sine waves that make up the complex tone (since the model is a linear
one). We can hear out a tone at a particular frequency in a mixture if there is
a clear peak in the excitation pattern at that frequency.
Since people suffering from SNHL have broader auditory filters their
excitation patterns do not have such clear peaks. Sounds mask each other
more, and so they have difficulty hearing sounds (such as speech) in noise.
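An excitation pattern can be sketched as the energy coming through each filter in a bank of overlapping band-pass filters. The toy model below uses Gaussian filter shapes whose bandwidth is a fixed fraction of centre frequency (all parameter choices are ours):

    import numpy as np

    def excitation_pattern(freqs, amps, centers, bw_frac=0.13):
        """Energy through each auditory filter (toy Gaussian filter shapes)."""
        pattern = np.zeros(len(centers))
        for i, cf in enumerate(centers):
            bw = bw_frac * cf                                  # bandwidth grows with cf
            gains = np.exp(-0.5 * ((freqs - cf) / bw) ** 2)    # filter gain per component
            pattern[i] = np.sum((amps * gains) ** 2)           # linear: energies add
        return pattern

    centers = np.linspace(200, 2000, 50)
    tone = (np.array([1000.0]), np.array([1.0]))               # a 1000 Hz pure tone

    normal = excitation_pattern(*tone, centers)                # sharp peak near 1000 Hz
    snhl = excitation_pattern(*tone, centers, bw_frac=0.4)     # broader filters

    print(round(centers[np.argmax(normal)]))                   # ~1000
    # Broader filters give a flatter, less peaked pattern:
    print(round(normal.max() / normal.mean(), 1), round(snhl.max() / snhl.mean(), 1))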
3. NON-LINEARITIES
To a first approximation the cochlea acts like a row of linear overlapping
band-pass filters. But there is clear evidence that the cochlea is in fact
inherently non-linear (i.e. its non-linearity is not just a result of over-loading it
at high signal levels). In a non-linear system the output to (a+b) is not the
same as the output to (a) plus the output to (b).
3.1. Combination tones
If two tones at frequencies f1 and f2 are played to the same ear
simultaneously, a third tone is heard at a frequency (2f1 − f2) provided that f1
and f2 are close in frequency (f2/f1 < 1.2) and at similar levels. Combination
tones are often absent in SNHL.
3.2. Two-tone suppression
[Figure: two-tone suppression in forward masking - a 1000 Hz mask (a) forward-masks a 1000 Hz target (c); when the mask is accompanied by a 900 Hz suppressor (b), there is less masking of the target.]
In single auditory nerve recordings, the response to a just supra-threshold
tone at CF can be reduced by a second tone, even though that second tone
would by itself have increased the nerve's firing rate. A similar effect is found in
forward masking. The forward masking of tone a on tone c can be reduced
if a is accompanied by a third tone b with a different frequency, even though
b has no effect on c on its own. Two-tone suppression is often absent in
SNHL.
What you should know
You should understand: what an auditory filter is and how it is measured;
what an excitation pattern is and how it changes for those having a SNHL.
You should know what the evidence is for non-linearities in human hearing.
SECOND YEAR COURSE
PERCEPTION
Hearing Lecture notes (4): Pitch Perception
Definition: Pitch is the 'attribute of auditory sensation in terms of which
sounds may be ordered on a musical scale'.
1. PURE TONES
Pitch of pure tones is influenced mainly by their frequency, but also by
intensity: high frequency pure tones go flat when played loud. The pitch of
pure tones is probably coded by a combination of place and timing
mechanisms:
• Place mechanisms can explain diplacusis (same tone giving different
pitches in the two ears) more easily than can timing mechanisms.
• But timing theories based on phase-locked neural discharge appear to be
needed in order to explain our ability to distinguish the frequencies of
very short duration tones (whose place representation would be very
blurred).
• Timing theories could be the whole story for musical pitch since it
deteriorates at high frequencies where phase locking is weak. (The
highest note on the piano is around 4 kHz; higher notes lose their sense
of musical pitch). For very high frequency tones (5-20kHz) you can tell
crudely which is the higher in frequency, but not what musical note is
being played.
2. COMPLEX TONES
Structure. Almost all sounds that give a sensation of pitch are periodic.
Their spectrum consists of harmonics that are integer multiples of the
fundamental. The pitch of a complex periodic tone is close to the pitch of a
sine wave at the fundamental. Helmholtz claimed that the pitch is heard at
the fundamental since the fundamental frequency gives the lowest
frequency peak on the basilar membrane.
[Figure: waveform and spectrum of a complex tone with a 200 Hz fundamental - harmonics at 200, 400, 600 and 800 Hz (harmonic spacing 200 Hz), and a waveform period of 1/200 s = 5 ms.]
2.1. Missing fundamental
Seebeck (and later Schouten) showed that complex periodic sounds with no
energy at the fundamental may still give a clear pitch sensation at the
fundamental (cf telephone speech - the telephone acts as a high-pass filter,
removing energy below about 300 Hz).
[Figure: the same complex with the energy at the 200 Hz fundamental removed - harmonics at 400, 600 and 800 Hz, harmonic spacing still 200 Hz, waveform period still 5 ms; the pitch is still heard at 200 Hz.]
2.2. Helmholtz's place theory
Helmholtz suggested that the ear reintroduces energy at the fundamental by
a process of distortion that produces energy at frequencies corresponding to
the difference between two components physically present (i.e. at the
harmonic spacing). Any pair of adjacent harmonics would generate energy
at the fundamental.
Helmholtz's explanation is wrong because:
(i) a pitch at the fundamental is still heard in lowpass filtered masking noise
that heavily masks the fundamental.
(ii) a complex sound consisting of inharmonic frequencies (e.g. 807, 1007,
1207 Hz) gives a pitch that is slightly higher than the difference
frequency of 200 Hz.
(iii) the distortion only occurs at high intensities, but low intensities still give
the pitch.
2.3. Schouten's timing theory
Schouten proposed that the brain times the intervals between beats of the
unresolved (see next diagram) harmonics of a complex sound, in order to
find the pitch.
Schouten's theory is wrong because:
(i) pitch is determined more by the resolved than by the unresolved harmonics
(ii) you can still hear a pitch corresponding to the fundamental when two
consecutive frequency components go to opposite ears.
The following diagram shows the excitation pattern that would be produced
on the basilar membrane separately by individual harmonics of a 200 Hz
fundamental. Notice that the excitation patterns of the higher numbered
harmonics are closer together than those of the low-numbered harmonics.
This is because the filters have a bandwidth which is roughly a tenth of their
center frequency (and so is constant on a log scale), whereas harmonics are
equally spaced in frequency on a linear scale. More harmonics then get into
a high-frequency filter than into a low-frequency one. The low-numbered
harmonics are resolved by the basilar membrane (giving roughly sinusoidal
output in their filters); but the high-numbered harmonics are not resolved.
They add together in their filters to give a complex vibration which shows
beats at the fundamental frequency.
[Figure: excitation patterns produced separately by individual harmonics of a 200 Hz fundamental, on a log frequency axis from base to apex. The low-numbered harmonics (200, 400, 600, 800 Hz) give well-separated peaks and are resolved; the high-numbered harmonics (around 1600 Hz) overlap and are unresolved. The output of a filter at 200 Hz is roughly sinusoidal, while the output of a filter at 1600 Hz is a complex wave whose envelope beats at the fundamental (period 1/200 s = 5 ms).]
2.4. Pattern recognition theories
Goldstein's theory states that pitch is determined by a pattern recognition
process on the resolved harmonics from both ears. The brain finds the
best-fitting harmonic series to the resolved frequencies, and takes its
fundamental as the pitch.
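A minimal version of Goldstein's idea can be written as a search over candidate fundamentals: choose the f0 whose harmonic series best fits the resolved components. This sketch is ours and far simpler than the full theory (in particular, the candidate range is restricted to dodge subharmonic ambiguities), but it reproduces both the missing fundamental and the slightly sharpened pitch of the inharmonic complex 807, 1007, 1207 Hz mentioned earlier:

    import numpy as np

    def best_f0(freqs, candidates=np.arange(150.0, 300.0, 0.1)):
        """Fundamental whose harmonic series best fits the resolved components."""
        freqs = np.asarray(freqs, dtype=float)
        errors = []
        for f0 in candidates:
            harmonics = np.round(freqs / f0)          # nearest harmonic numbers
            errors.append(np.sum((freqs - harmonics * f0) ** 2))
        return candidates[np.argmin(errors)]

    print(round(best_f0([400, 600, 800]), 1))     # 200.0 - the missing fundamental
    print(round(best_f0([807, 1007, 1207]), 1))   # ~201.4 - slightly above 200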
Goldstein's theory accounts well for most of the data, but there is also a
weak pitch sensation from periodic sounds which do not contain any
resolvable harmonics or from aperiodic sounds that have a regular envelope
(such as amplitude modulated noise). A theory such as Schouten's may be
needed in addition to Goldstein's in order to account for such effects.
Evidence for there being two separate mechanisms for resolved and
unresolved harmonics is:
• pitch discrimination and musical pitch labelling (e.g. A#) is much worse
for sounds consisting of only unresolved harmonics;
• comparison of pitches between two sounds one having resolved and the
other unresolved harmonics is worse than comparison of pitches between
two sounds both with unresolved harmonics.
3. WHAT YOU SHOULD KNOW
You should know the evidence for and against the three different theories of
pitch perception for complex tones, and the difference between place and
timing mechanisms for the pitch of pure tones.
SECOND YEAR COURSE
PERCEPTION
Hearing Lecture Notes (5): Binaural hearing and localization
Possible cues to localization of a sound:
• binaural time/intensity differences (inherently ambiguous);
• pinna effects;
• reverberation and intensity;
• head movements.
1. PURE TONES
1.1. Rayleigh's duplex theory
(only applies to azimuth, i.e. localization in the horizontal plane)
1.1.1. Low frequency tones (<1500 Hz) localised by phase differences:
• Phase locking present for low frequency tones (<4 kHz).
• Jeffress' cross-correlator gives a possible neural model.
• Maximum time difference between the ears is about 670 µs,
corresponding to one full cycle at 1500 Hz, the upper limit for binaural
phase sensitivity (see the sketch after this list).
• Onset time is different from the ongoing phase difference. Onset time
differences are important for short sounds.
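The ~670 µs figure follows from head geometry. A standard spherical-head approximation is Woodworth's formula (the head radius below is a typical value, not one given in these notes):

    import math

    def itd_seconds(azimuth_deg, head_radius_m=0.0875, c=343.0):
        """Woodworth's spherical-head approximation to the interaural
        time difference for a distant source at the given azimuth."""
        az = math.radians(azimuth_deg)
        return head_radius_m * (az + math.sin(az)) / c

    print(f"{itd_seconds(90) * 1e6:.0f} us")   # source directly to one side: ~656 us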
1.1.2. High (and low) frequency tones localised by intensity differences
• Shadow cast by head greater at high (20 dB at 6 kHz) than low frequencies
(3 dB at 500 Hz), i.e. the head acts as a lowpass filter.
• Auditory nerve not phase-locked for high frequency tones (>4 kHz).
• Phase differences are ambiguous for high frequency tones (>1500 Hz).
1.2. Time/intensity trade
The time/intensity trade is shown by titrating a phase difference in one
direction against an intensity difference in the other direction.
• Varies markedly with frequency of a sound.
• Not due to a peripheral effect of intensity on nerve latency, since:
• can get multiple images
• optimally traded stimulus is distinguishable from untraded.
1.3 Cone of confusion
Binaural cues are inherently ambiguous. The same differences can be
produced by a sound anywhere on the surface of an imaginary cone whose
tip is at the ear. For pure tones this ambiguity can only be resolved by head
movements.
2. COMPLEX TONES
2.1. Timing cues
As with pure tones, onset time cues are important for (particularly short)
complex tones. But the use of other timing cues is different since high
frequency complex tones can change in localization with an ongoing timing
difference. The next diagram shows the output of an auditory filter at 1600
Hz to a complex tone with a fundamental of 200 Hz. The 1400, 1600 and
1800 Hz components of the complex pass through the filter and add together
to give the complex wave shown in the diagram. The complex wave has an
envelope that repeats at 200 Hz. Phase differences would not change the
localization of any of those tones if they were heard individually, but we can
localize sounds by the relative timing of the envelopes in the two ears
(provided that the fundamental frequency (envelope frequency) is less than
about 400 Hz).
Output of 1600 Hz filter to complex tone with a 200 Hz fundamental
frequency; right ear leading by 500 us.
[Figure: the left-ear and right-ear filter outputs - two complex waveforms with envelope period 1/200 s = 5 ms, the right ear's envelope leading by 500 us.]
2.2. Pinna effects
(mainly median plane)
Pinna reflects high frequency sound (wavelength less than the dimensions of
the outer ear) with echoes whose latency varies with direction (Batteau).
Reflections cause echoes which interfere with other echoes/direct sound to
give spectral peaks and notches. The frequencies of the peaks and notches
vary with the direction of the sound and are used to indicate direction in the
median plane.
2.3 Head movements
Head movements can resolve the ambiguity of front-back confusions.
3. DISTANCE
A distant sound will be quieter and have relatively more reverberation than
a close sound. Increasing the proportion of reverberant sound leads to
greater apparent distance. Lowpass filtering also leads to greater apparent
distance; (high frequencies absorbed more by water vapour in air by up to
about 3 dB/100 ft). If you know what a sound is, then you can use its actual
timbre and loudness to tell its distance.
4. VISION
Seen location easily dominates over heard location when the two are in
conflict.
5. PRECEDENCE (OR HAAS) EFFECT
In an echoic environment the first wavefront to reach a listener indicates the
direction of the source. The brain suppresses directional information from
subsequent sounds.
Since echoes come from different directions than the main sound, they may
be ignored more easily with two ears.
6. BINAURAL EFFECTS
A number of psychoacoustic phenomena demonstrate that we are only
binaurally sensitive to the phase of a pure tone if its frequency is less than
about 2 kHz.
6.1. Binaural beats
Fluctuations in intensity and/or localisation occur when two slightly
different tones are played, one to each ear (e.g. 500 + 504 Hz gives a beat at
4 Hz). Only works for low frequency tones < 1.5 kHz.
6.2. Binaural masking level difference (BMLD)
When the same tone in noise is played to both ears, the tone is harder to
detect than when one ear either does not get the tone, or has the tone at a
different phase. The magnitude of the effect declines above about 1 kHz, as
phase-locking breaks down. It is explained by Durlach's Equalization and
Cancellation model.
6.3. Cramer-Huggins pitch
If noise is fed to one ear and the same noise to the other ear but with the
phase changed in a narrow band of frequencies, subjects hear a pitch
sensation at the frequency of the band. Pitch gets rapidly less clear above
1500 Hz. (NB Can be explained by models of the BMLD effect if you think of
the phase-shifted band as the 'tone').
WHAT YOU SHOULD KNOW
You should be able to describe the different cues used to localize pure and
complex tones. You should understand why phase-locking does not occur
for high frequency pure tones, and why this is important in localization and in
other binaural effects. You should know what the BMLD is and how
Durlach's model explains it.
SECOND YEAR COURSE
PERCEPTION
Hearing Lecture Notes (6): Auditory Object Recognition & Music
Timbre
Vowel sounds in speech differ in the relative amplitudes of their harmonics. A
particular vowel has harmonics that have a greater amplitude near the formant
frequencies. A formant is a resonant frequency of the vocal tract. As you change
the pitch of a vowel, you change the fundamental frequency and the spacing of the
harmonics, but the formant frequencies stay the same. If you change the vowel
without changing the pitch of the voice, the fundamental and the harmonic spacing
stay the same but the formant frequencies change. Here is the spectrum of the vowel
in "bit" on a fundamental frequency of 125 Hz.
[Figure: spectrum of the vowel in "bit" on a fundamental frequency of 125 Hz - harmonics spaced at 125 Hz, with amplitude peaks at the formants F1 = 396 Hz, F2 = 1520 Hz and F3 = 1940 Hz.]
Musical instruments: The synthetic sounds produced by a simple keyboard
synthesiser differ in:
• the relative amplitude of their harmonics;
• their attack time and decay time.
For most synthesisers the relative amplitudes of the different harmonics stay
constant throughout the sound.
The sounds produced by a natural musical instrument are much more complex; the
different harmonics start and stop at different times and change in relative amplitude
throughout the "steady-state" of the note. Our ability to tell one natural musical
instrument from another depends more on the attack (and decay) than on the
"steady-state". The nature of the attack and the relative amplitudes during the
steady-state are not constant for a particular instrument. They depend on the style of
playing, where in the range of the instrument the note is, etc.
Auditory Scene Analysis
Ears receive waves from many different sound sources at the same time eg multiple
talkers, or instruments, cars, machinery etc. In order to recognise the pitch and
timbre of the sound from a particular source the brain must decide which frequencies
"belong together" and have come from this source. The problem is formally similar
to that of "parsing" a visual scene into separate objects.
Principles enunciated by the Gestalt psychologists in vision are useful as heuristics
for helping to decide which sounds will be grouped together: proximity, similarity,
good continuation and common fate all have auditory analogues.
The brain needs to group simultaneously (separating out which frequency
components that are present at a particular time have come from the same sound
source) and also successively (deciding which group of components at one time is a
continuation of a previous group).
Auditory streaming
Auditory streaming is the formation of perceptually distinct apparent sound sources.
Temporal order judgement is good within a stream but bad between streams.
Examples include:
• implied polyphony
• noise burst replacing a consonant in a sentence.
• click superimposed on a sentence or melody.
Grouping Principles
(i) Proximity
• Tones close in frequency will group together, so as to minimise the extent of
frequency jumps and the number of streams.
• Tones with similar timbre will tend to group together.
• Speech sounds of similar pitch will tend to be heard from the same speaker.
• Sounds from different locations are harder to group together across time than
those from the same location.
(ii) Common fate
• Sounds from a common source tend to start and stop at the same time and change
in amplitude or frequency together (vibrato).
• A single component is easy to hear out if it is the only one to change in a
complex.
(iii) Good continuation
Abrupt discontinuities in frequency or pitch can give the impression of a different
sound source.
Continuity Effect
Sound that is interrupted by a noise that masks it, can appear to be continuous.
Alternations of sound and mask can give the illusion of continuity with the auditory
system interpolating across the mask.
Music Perception
Tuning Consonant intervals have harmonics that do not beat together to give
roughness; they occur at small integer frequency ratios: 2:1 (octave), 3:2 (fifth),
4:3 (fourth), 5:4 (major third).
Unfortunately, a scale based on such intervals is not internally consistent and does
not allow modulations.
Equal temperament sacrifices some consonance in the primary intervals for an equal
size of semitone (a frequency ratio of 2^(1/12)), and so sounds equally in tune in any
key.
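A quick numerical comparison of just and equal-tempered intervals (our illustration):

    import math

    semitone = 2 ** (1 / 12)    # the equal-tempered semitone ratio
    just = {"octave": 2 / 1, "fifth": 3 / 2, "fourth": 4 / 3, "major third": 5 / 4}
    steps = {"octave": 12, "fifth": 7, "fourth": 5, "major third": 4}

    for name, ratio in just.items():
        tempered = semitone ** steps[name]
        cents = 1200 * math.log2(tempered / ratio)   # mistuning in cents
        print(f"{name:12s} just {ratio:.4f}  tempered {tempered:.4f}  ({cents:+.1f} cents)")
    # The octave is exact; the fifth is ~2 cents flat and the major third
    # ~14 cents sharp - the price paid for sounding in tune in every key.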
Absolute pitch About 1 person in 10,000 has "Absolute Pitch" - they can identify
the pitch of a musical note without the use of an external reference pitch. Most
people can only give pitch names relatively - "if that is A this must be C". Absolute pitch is
much more common in people who had musical training at an early age than among
those who started later, and is probably more common in those whose early training
involved learning the names of notes. It can be a liability, since pitch perception can
change as you grow older, and international pitch standards also change. A more
common absolute ability is the ability to tell when a piece of music is being played
in the correct key.
Melody The pitch of a tone can be regarded as having chroma (musical note name)
and height (which octave). Melodies are hard to recognise if only chroma is
maintained (transposing notes by octaves). Overall contour is an important attribute
of melody, and allows variation of chroma within a recognisable framework.