10/9/12
Comprehensive review of psychoacoustics
Moore BCJ (2012) An Introduction to the Psychology of Hearing. Academic Press: London.
Pitch
Houtsma AJM, Goldstein JL (1972) The central origin of the pitch of complex tones: evidence from
musical interval recognition. J Acoust Soc Am 51:520-9.
Loudness
Moore BCJ, Glasberg BR, Vickers DA (1995) Simulation of the effects of loudness recruitment on the
intelligibility of speech in noise. Brit J Audiol 29:131-43.
Cai S, Ma WL, Young ED (2009) Encoding intensity in ventral cochlear nucleus following acoustic
trauma: implications for loudness recruitment. JARO 10:5-22.
May et al. 2012
Brad May, PhD
Dept. of Otolaryngology-Head and Neck Surgery
Johns Hopkins University
Frequency is the number of times that
a periodic event repeats in a unit of
time. For example, the sinusoidal
amplitude oscillations of a pure tone
are described in cycles per second, or
Hertz. These are physical attributes of
the stimulus that can be measured with
instruments.
Pitch is the perceptual property that
allows sounds to be ordered along a
musical scale (low to high). Pitch is
related to the objective property of
frequency, but the two attributes are not
identical. The frequency content of a
sound does not necessarily predict its
pitch. Properties of sound other than
frequency (e.g., duration, bandwidth, or
loudness) also influence pitch.
Pitch is specified in mels. By
convention, 1000 mels equal the pitch
of a 1000-Hz tone at a 40 dB
sensation level. This subjective
property of sound can only be
measured with the human ear. For
example, the change from 1000 to
8000 Hz spans 3 octaves on the
frequency scale but only 1.5 octaves
on the mel scale.
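The octave comparison above can be checked numerically. The sketch below assumes the widely used analytic approximation mel = 2595 log10(1 + f/700), which is not given in the lecture and only approximates the measured mel scale. The function names are illustrative.

```python
import math

def hz_to_mel(f_hz):
    # Assumed analytic fit to the mel scale; not taken from the lecture.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def octaves(low, high):
    # Number of doublings between two positive values.
    return math.log2(high / low)

freq_octaves = octaves(1000.0, 8000.0)
mel_octaves = octaves(hz_to_mel(1000.0), hz_to_mel(8000.0))
print(f"frequency scale: {freq_octaves:.2f} octaves")
print(f"mel scale: {mel_octaves:.2f} octaves")
```

Under this approximation, the 1000 to 8000 Hz step covers about half as many octaves in mels as it does in Hz.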
The perception of pure tone pitch is
usually measured by presenting two
tones in sequence (two-interval forced choice, 2IFC) and asking the
subject to indicate which has the higher
pitch. The difference limen for frequency
(DLF, ∆F) is the change in frequency
that produces 75% correct responses.
This measure is typically called
frequency discrimination because the
frequency of the tone is being varied,
but the subject’s response is based on
the perception of pitch information.
Weber’s Law: The just-noticeable difference (JND) for frequency is a
constant fraction of the reference
frequency (∆F/F is a constant).
Frequency discrimination thresholds
can be quite small at low frequencies.
For example, the ∆F for a 1000-Hz
tone is about 2 Hz (arrow). An issue
for the psychoacoustician is
explaining how this information is
represented in the auditory system.
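As a worked illustration of Weber's Law, the sketch below takes the 2-Hz DLF at 1000 Hz quoted above and extrapolates it to other reference frequencies, assuming that ∆F/F stays constant. The extrapolated values are predictions of the law, not measurements. Measured DLFs depart from them, particularly at high frequencies.

```python
def weber_fraction(delta_f, f_ref):
    # Relative threshold: the JND expressed as a fraction of the reference.
    return delta_f / f_ref

def predicted_dlf(f_ref, fraction):
    # DLF expected at f_ref if the Weber fraction were truly constant.
    return fraction * f_ref

k = weber_fraction(2.0, 1000.0)  # value from the 1000-Hz example in the text
for f in (250.0, 500.0, 2000.0):
    print(f"{f:.0f} Hz -> predicted DLF {predicted_dlf(f, k):.2f} Hz")
```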
We know from previous lectures that
auditory neurons may use place and
time representations to encode pitch
information.
Psychoacousticians search for
auditory stimuli that can separate the
relative importance of these two
processing modalities.
A basic problem for any place theory of
pitch perception is to explain the
underlying representation of pitch
information at very small DLFs.
The DLF at 1000 Hz is only 2 Hz.
Tones of 995 and 1005 Hz produce the
nearly identical excitation patterns
shown on the right.
A problem for a time theory is the
upper limit of phase-locking at
frequencies near 1000 Hz. A dual
representation is suggested by the
steep rise in the mel function at this
transition.
The relative importance of place vs
time cues can be tested by varying the
duration of the pure tone (numerical
labels on the right).
At short durations, energy spreads over
a wider range of frequencies, lowering
the slope of the excitation pattern. At
frequencies below 4 kHz (arrow), the
DLFs for short-duration tones are better
than those predicted by place models.
The pitch of a pure tone decreases with
increasing level at frequencies below
2500 Hz. It increases at frequencies
above 2500 Hz.
The excitation pattern of a low-frequency
tone shifts toward the base of the
cochlea at high sound levels. Therefore,
place models predict an upward shift in
pitch. This prediction holds only for high-frequency tones.
Place models do not adequately predict the effects of frequency and
level on the DLFs of pure tones at frequencies below 4 – 5 kHz. These
results suggest that pitch is determined by temporal mechanisms at low
frequencies and by place mechanisms at high frequencies.
The relative importance of place vs time representations can be directly
measured with complex sounds that independently manipulate both
parameters.
In the mid-19th century, contemporary
theorists assumed that the pitch of
complex tones was produced by
patterns of excitation on the basilar
membrane. Seebeck challenged this
view with a simple forced-air siren.
When the rotating disk of the siren contained
one hole and the period of rotation was
constant (T), the pitch of the siren equaled
1/T. For example, a 2-ms interval produced a
pitch that was equivalent to a 500-Hz tone.
This result is not surprising because the siren
produced a pulse train with 500 pulses/s and
the output spectrum concentrated energy at
500 Hz.
Adding a second hole, evenly spaced
with the first, produced a pitch that
equaled a 1000-Hz tone (2/T). Again,
this result is not surprising because
the pulse train and activation pattern
were the same as those produced by
a 1-kHz tone.
If the second hole was not evenly
spaced, the pitch of the siren dropped
to the pitch of a 500-Hz tone.
Because the output spectrum still
contained most of its energy at 1000
Hz, pitch must be derived from the 2-ms
period of the pulse train (T1 + T2).
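The three siren configurations can be compared with a small sketch that finds the repetition period of each pulse pattern, that is, the smallest time shift that maps the pattern onto itself. The 0.9-ms offset used for the unevenly spaced hole is a hypothetical value chosen for illustration. The lecture does not state the actual spacing.

```python
def repetition_period(offsets_us, period_us):
    # Smallest shift (in microseconds) that maps the periodic pulse pattern
    # onto itself. offsets_us are the hole positions within one rotation.
    pattern = set(o % period_us for o in offsets_us)
    for shift in range(1, period_us + 1):
        if set((o + shift) % period_us for o in pattern) == pattern:
            return shift

T = 2000  # rotation period in microseconds (2 ms)
configs = [
    ("one hole", [0]),
    ("two evenly spaced holes", [0, 1000]),
    ("two unevenly spaced holes", [0, 900]),  # hypothetical spacing: T1 = 0.9 ms, T2 = 1.1 ms
]
for label, holes in configs:
    p = repetition_period(holes, T)
    print(f"{label}: period {p} us, rate {1e6 / p:.0f} Hz")
```

Only the evenly spaced pair repeats every 1 ms. Any uneven spacing returns the pattern to the full 2-ms period (T1 + T2), which corresponds to the 500-Hz pitch described above.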
The 2000-Hz carrier frequency of the
upper sine wave is an exact multiple
of the 200-Hz modulation frequency.
The spectrum of the stimulus contains
energy at 1800, 2000, and 2200 Hz.
The 2040-Hz carrier frequency of the
lower sine wave produces an
inharmonic series with components at
1840, 2040, and 2240 Hz.
The lower stimulus has a higher pitch
than the upper stimulus.
The higher pitch of the lower stimulus
is not determined by repetition rate or
harmonic spacing, which are identical
for both stimuli.
For both stimuli, pitch is predicted by
the time interval between the
prominent peaks in the waveforms.
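The two stimuli can be sketched numerically. The component frequencies follow from the carrier and modulation frequencies given above. The pitch estimate uses an assumed first-order rule that is not stated in the lecture: the interval between prominent waveform peaks is the whole number of carrier cycles closest to one envelope period. Measured pitch matches can differ slightly from this estimate.

```python
def am_components(carrier_hz, mod_hz):
    # A sinusoidally amplitude-modulated tone has energy at the carrier
    # and at the carrier plus and minus the modulation frequency.
    return [carrier_hz - mod_hz, carrier_hz, carrier_hz + mod_hz]

def peak_interval_pitch(carrier_hz, mod_hz):
    # Assumed first-order rule: n carrier cycles fit most closely into one
    # envelope period, so the peak-to-peak interval is n / carrier_hz.
    n = round(carrier_hz / mod_hz)
    return carrier_hz / n

for fc in (2000, 2040):
    print(fc, am_components(fc, 200), f"{peak_interval_pitch(fc, 200):.1f} Hz")
```

Both stimuli share the same 200-Hz envelope rate and component spacing, yet the estimated peak interval is shorter for the 2040-Hz carrier, consistent with its higher pitch.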
This figure simulates the response of
the basilar membrane to a periodic
pulse train with a presentation rate of
200 pulses per second. Harmonic
frequencies are marked with arrows.
The lower harmonics (black labels) of
the waveform are clearly resolved,
but the higher harmonics (red labels)
are not. The complex response at
high-frequency locations repeats at a
rate that equals the pulse
presentation rate.
The 200-Hz pitch of the pulse train
will persist if the resolved low-frequency
harmonics are removed from the stimulus
or masked by low-pass noise. Classical
place theory cannot account for this phenomenon.
•  A complex tone may produce multiple pitches.
•  Pitches produced by the lower resolved harmonics (partials) sound like pure
tones.
•  Pitches that correspond to the interaction of higher unresolved harmonics
produce a less pure residue pitch.
•  The value of the residue pitch is determined by the periodicity of the basilar
membrane waveform where the partials interfere.
Schouten, 1970
The spectro-temporal model incorporates
both place and time cues. The initial spectral
analysis is performed by peripheral filters
that contain resolved harmonics at low
frequencies and unresolved harmonics at
high frequencies. The complex response at
high harmonics repeats at the fundamental
frequency of the input.
The subsequent temporal analysis
determines common spike intervals. The
period of the inputs will be represented in
the intervals of filters tuned to both resolved
and unresolved harmonics.
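The idea of a common spike interval can be illustrated with a highly idealized sketch. It assumes that each channel fires only at whole multiples of the period of the harmonic it is tuned to. This is a simplification introduced here, not the lecture's model. The predicted pitch is the reciprocal of the shortest interval shared by every channel.

```python
from fractions import Fraction

def channel_intervals(freq_hz, max_interval_s):
    # Idealized phase-locked channel: spike intervals are whole multiples
    # of the period of the harmonic driving the channel.
    period = Fraction(1, freq_hz)
    intervals = set()
    k = 1
    while k * period <= max_interval_s:
        intervals.add(k * period)
        k += 1
    return intervals

def common_interval_pitch(harmonics_hz, max_interval_s=Fraction(1, 50)):
    # Shortest interval present in every channel; its reciprocal is the pitch.
    shared = set.intersection(
        *(channel_intervals(f, max_interval_s) for f in harmonics_hz)
    )
    return 1 / min(shared)

# Harmonics 3 to 5 of a 200-Hz fundamental, with the fundamental itself absent.
print(common_interval_pitch([600, 800, 1000]), "Hz")
```

Exact fractions are used so that intervals from different channels compare equal without floating-point rounding.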
Instantaneous sound pressure is the
deviation from the local ambient
pressure caused by a sound wave at a
given moment in time.
Effective sound pressure is the root
mean square of the instantaneous
sound pressure over a given interval of
time. This is the property that is
measured by a sound level meter.
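The root-mean-square calculation can be sketched on a sampled waveform. The 1-Pa peak amplitude and the 1000-sample cycle are arbitrary values chosen for illustration.

```python
import math

def rms(samples):
    # Square each instantaneous value, average, then take the square root.
    return math.sqrt(sum(x * x for x in samples) / len(samples))

# One full cycle of a sinusoid with a peak pressure of 1 Pa (illustrative).
n = 1000
wave = [math.sin(2 * math.pi * i / n) for i in range(n)]
print(f"effective pressure: {rms(wave):.4f} Pa")
```

For a sinusoid, the effective pressure is the peak pressure divided by the square root of 2.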
Sound pressure level (SPL) is a
logarithmic measure of sound
pressure. It is typically specified in
decibels (dB) above a standard
reference level. The common reference
is 20 µPa RMS, which approximates the
absolute threshold of human hearing for
a 1-kHz pure tone.
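The decibel conversion can be written out as a short sketch using the standard definition SPL = 20 log10(p / p_ref). The function name is illustrative.

```python
import math

P_REF = 20e-6  # reference pressure in Pa RMS (20 µPa)

def spl_db(p_rms_pa):
    # Sound pressure level in dB re 20 µPa for an RMS pressure in pascals.
    return 20.0 * math.log10(p_rms_pa / P_REF)

for p in (20e-6, 0.02, 1.0):
    print(f"{p:g} Pa -> {spl_db(p):.1f} dB SPL")
```

Each doubling of sound pressure adds about 6 dB, and each tenfold increase adds 20 dB.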
Loudness is the perceptual property
that allows sounds to be ordered along
a magnitude scale (quiet to loud).
Loudness is related to the objective
property of SPL, but the two attributes
are not identical. Properties other than
SPL (e.g., duration, bandwidth, or
frequency) also influence loudness.
Loudness scales are typically specified
in sones. By convention, one sone equals
the loudness of a 1000-Hz tone at a 40 dB
sensation level. A sound judged twice as
loud is 2 sones. Like pitch, this subjective
property can only be measured with the
human ear.
Stevens’ Power Law states that
sensation grows as a power function of
stimulus intensity, with an exponent that
depends on sensory modality and a constant
k that depends on the units of measurement.
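The power law can be sketched as sensation = k × intensity^exponent. The exponent of 0.3 used below is a commonly cited approximate value for loudness as a function of sound intensity. It is an assumption here, not a figure from the lecture. The constant k is set so that a 40-dB tone yields 1 sone.

```python
def stevens_sensation(intensity, k, exponent):
    # Stevens' Power Law: sensation = k * intensity ** exponent.
    return k * intensity ** exponent

def loudness_sones(level_db, exponent=0.3):
    # Intensity is expressed relative to a 40-dB tone, so k = 1 sone
    # at 40 dB (assumed anchoring; exponent is an assumed typical value).
    rel_intensity = 10.0 ** ((level_db - 40.0) / 10.0)
    return stevens_sensation(rel_intensity, k=1.0, exponent=exponent)

for level in (40, 50, 60):
    print(f"{level} dB -> {loudness_sones(level):.2f} sones")
```

With this exponent, each 10-dB increase approximately doubles loudness.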
Discrimination thresholds for pure tone
loudness are measured by presenting
two tones (2IFC) at different sound
levels and asking the subject to indicate
which is louder. The difference limen for
intensity (DLI, ∆I) is the change in level
that produces 75% correct responses.
This measure is usually called intensity
discrimination although the SPL of the
stimulus is being manipulated to
determine the just-noticeable difference
(JND) for loudness.
Weber’s Law predicts that the JND for
intensity should be a constant fraction of
the reference intensity (∆I/I is a constant).
This prediction holds true for broadband
noise, but pure tones show a “near miss”
because relative thresholds improve at
higher sensation levels.
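Weber's Law for intensity can be restated in decibels: a constant ∆I/I corresponds to a constant level step of 10 log10(1 + ∆I/I) dB at every reference level. The sketch below checks this. The Weber fraction of 0.26 is a hypothetical value chosen for illustration, not one reported in the lecture.

```python
import math

def jnd_db(weber_fraction):
    # Level change in dB that corresponds to a relative intensity change.
    return 10.0 * math.log10(1.0 + weber_fraction)

def just_noticeable_level(level_db, weber_fraction):
    # Convert to intensity, add the just-noticeable increment, convert back.
    intensity = 10.0 ** (level_db / 10.0)
    return 10.0 * math.log10(intensity * (1.0 + weber_fraction))

w = 0.26  # hypothetical Weber fraction, for illustration only
for level in (20.0, 50.0, 80.0):
    step = just_noticeable_level(level, w) - level
    print(f"{level:.0f} dB reference -> JND {step:.3f} dB")
```

Under strict Weber's Law the step is identical at every level. The "near miss" for pure tones means that the measured step shrinks as level increases.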
Increased performance at higher
sensation levels may be explained by
the effects of tone level on excitation
patterns. Although levels near the
center of the excitation pattern increase
in 10 dB steps, levels near the high-frequency
side increase by more than 10 dB.
Loudness equality is usually specified in
phons. The phon value of a tone is the
SPL of an equally loud 1-kHz tone. The
contours shift upward at low frequencies
because low-frequency tones do not
sound as loud as 1-kHz tones when
presented at the same SPL. The
contours are closely spaced because
loudness grows rapidly at low
frequencies.
Equal-loudness contours can be
explained by the summed discharge
rates of auditory nerve fibers. The
neural phon is based on the spike
count elicited by a 1000-Hz tone. This
metric captures the essential
loudness features of human
psychoacoustic performance.
This figure shows an equal loudness
contour for 1-kHz tones with different
durations. Tones with a duration of
less than 150 msec must be presented at
a higher SPL to produce the same
loudness sensation.
The effect of duration on loudness
has been attributed to temporal
integration properties of neural
discharge rates and the spread of
energy at brief stimulus durations.
In this example, the width of a
noise band is varied while total energy
is held constant at one of five different
SPLs.
Energy summation: the loudness of
a complex sound is based upon the
sum of energy when components fall
within a critical band.
Loudness summation: loudness is
based on the sum of the loudness
levels in each critical band when
multiple critical bands are involved.
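The difference between the two summation rules can be sketched for two 60-dB components. The sone conversion assumes the common rule of thumb of 1 sone at 40 dB with a doubling for every 10-dB increase. That rule is an approximation introduced here, not a formula from the lecture.

```python
import math

def combine_levels_db(levels_db):
    # Energy summation: add intensities, then convert back to dB.
    return 10.0 * math.log10(sum(10.0 ** (l / 10.0) for l in levels_db))

def sones(level_db):
    # Assumed rule of thumb: 1 sone at 40 dB, doubling per 10-dB increase.
    return 2.0 ** ((level_db - 40.0) / 10.0)

components = [60.0, 60.0]
within = sones(combine_levels_db(components))  # both in one critical band
across = sum(sones(l) for l in components)     # in separate critical bands
print(f"within one critical band: {within:.2f} sones")
print(f"across critical bands:    {across:.2f} sones")
```

The same two components are louder when they fall in separate critical bands, because their loudness values add instead of their energies.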
This figure describes the effects of
bandwidth on loudness in terms of
three non-overlapping “loudness
units.” The striped rectangles
symbolize four bands of noise with
constant SPL but different
bandwidths.
Loudness grows faster across critical
bands because the growth of
loudness within critical bands
decelerates at higher levels of input.
Stimuli that span critical bands work
at a lower point on the input/output
curve.
Loudness functions derived from equal-loudness
contours contrast the growth of
loudness at low vs high frequencies. The
growth of loudness is relatively stable at
frequencies above 1 kHz.
Sound level meters and power amplifiers
are weighted to take these frequency
effects into account. Standards for
environmental noise control are usually
based on A-weighted measurements that
reflect the shape of the 40-phon contour.
Deafness-related changes in cochlear
excitation patterns often produce
abnormally steep loudness functions.
This rapid growth of loudness is termed
“loudness recruitment.” Hearing aids
must compress the dynamic range of
amplification to produce sounds that are
both audible and not uncomfortably loud.
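Dynamic range compression can be sketched as a simple input/output rule: linear gain for soft sounds and reduced growth above a compression threshold. The gain, threshold, and ratio below are hypothetical values for illustration and do not represent a clinical fitting.

```python
def compress_level(input_db, threshold_db, ratio, gain_db):
    # Below the threshold, apply linear gain. Above it, the output grows
    # by only 1/ratio dB for each 1-dB increase in input.
    if input_db <= threshold_db:
        return input_db + gain_db
    return threshold_db + gain_db + (input_db - threshold_db) / ratio

# Hypothetical settings: 30 dB of gain, 3:1 compression above 50 dB SPL.
for level in (30, 50, 80, 110):
    out = compress_level(level, threshold_db=50, ratio=3.0, gain_db=30)
    print(f"input {level} dB -> output {out:.0f} dB")
```

With these settings, an 80-dB range of input levels is mapped onto a 40-dB range of output levels, so soft sounds become audible while intense sounds receive little amplification.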
Young and colleagues (Cai et al., 2009)
recently described abnormal neural excitation
patterns in the ventral cochlear nucleus (VCN)
of sound-exposed cats.
The high-frequency side of the excitation
pattern expanded at 2 – 3 times the normal
rate at higher sound levels.
Question: Do sound-exposed cats
experience loudness recruitment?
Listeners respond faster to louder
sounds. Animal psychophysical
paradigms estimate the subjective
loudness of auditory stimuli by
analyzing reaction time.
A reaction time equivalent of the phon can be
derived by adjusting the sound level of a tone
until its reaction time matches the
reaction time of a 1000-Hz tone at a specified
SPL. Equal-loudness contours obtained with
this metric capture the essential loudness
features of human psychoacoustic
performance.
Relative to their normal pre-exposure reaction
times, sound-exposed cats show accelerated
responses that resemble the recruitment effects
of hearing-impaired humans.
This is the same sound exposure that produced
expanded neural excitation patterns in the VCN
of cats with acoustic trauma (slide 39).
•  For broadband sounds and high-frequency tones, the growth of loudness is
well predicted by Stevens’ Power Law.
•  The “near miss” to Weber’s Law can be explained by the effects of SPL on
basilar membrane excitation patterns.
•  Loudness recruitment is associated with altered excitation patterns that are
observed in the central auditory system, not the auditory periphery.
•  Energy summation and loudness summation suggest that the auditory system
has the capacity to integrate the neural correlates of loudness within and
across multiple channels of input.