10/9/12

Comprehensive review of psychoacoustics
Moore BCJ (2012) An Introduction to the Psychology of Hearing. Academic Press: London.

Pitch
Houtsma AJM, Goldstein JL (1972) The central origin of the pitch of complex tones: evidence from musical interval recognition. J Acoust Soc Am 51:520-9.

Loudness
Moore BCJ, Glasberg BR, Vickers DA (1995) Simulation of the effects of loudness recruitment on the intelligibility of speech in noise. Brit J Audiol 29:131-43.
Cai S, Ma WL, Young ED (2009) Encoding intensity in ventral cochlear nucleus following acoustic trauma: implications for loudness recruitment. JARO 10:5-22.
May et al. 2012

Brad May, PhD
Dept. of Otolaryngology-Head and Neck Surgery
Johns Hopkins University

Frequency is the number of times that a periodic event repeats in a unit of time. For example, the sinusoidal amplitude oscillations of a pure tone are described in cycles per second, or hertz (Hz). These are physical attributes of the stimulus that can be measured with instruments.

Pitch is the perceptual property that allows sounds to be ordered along a musical scale (low to high). Pitch is related to the objective property of frequency, but the two attributes are not identical. The frequency content of a sound does not necessarily predict its pitch. Properties of sound other than frequency (e.g., duration, bandwidth, or loudness) also influence pitch.

Pitch is specified in mels. By convention, 1000 mels equal the pitch of a 1000-Hz tone at a 40 dB sensation level. This subjective property of sound can only be measured with the human ear. For example, the change from 1000 to 8000 Hz spans 3 octaves on the frequency scale but only 1.5 octaves on the mel scale.

The perception of pure-tone pitch is usually measured by presenting two tones in sequence (two-interval forced choice, 2IFC) and asking the subject to indicate which has the higher pitch. The difference limen for frequency (DLF, ∆F) is the change in frequency that produces 75% correct responses.
This measure is typically called frequency discrimination because the frequency of the tone is being varied, but the subject’s response is based on the perception of pitch information.

Weber’s Law: The just-noticeable difference (JND) for frequency is a constant fraction of the reference frequency (∆F/F is a constant). Frequency discrimination thresholds can be quite small at low frequencies. For example, the ∆F for a 1000-Hz tone is about 2 Hz (arrow), a Weber fraction of roughly 0.002. An issue for the psychoacoustician is explaining how this information is represented in the auditory system.

We know from previous lectures that auditory neurons may use place and time representations to encode pitch information. Psychoacousticians search for auditory stimuli that can separate the relative importance of these two processing modalities. A basic problem for any place theory of pitch perception is to explain the underlying representation of pitch information at very small DLFs. The DLF at 1000 Hz is only 2 Hz. Tones of 995 and 1005 Hz produce the nearly identical excitation patterns shown on the right.

A problem for a time theory is the upper limit of phase locking at frequencies near 1000 Hz. A dual representation is suggested by the steep rise in the mel function at this transition.

The relative importance of place vs time cues can be tested by varying the duration of the pure tone (numerical labels on the right). At short durations, energy spreads over a wider range of frequencies, lowering the slope of the excitation pattern. At frequencies below 4 kHz (arrow), the DLFs for short-duration tones are better than those predicted by place models.

The pitch of a pure tone decreases with increasing level at frequencies below 2500 Hz. It increases at frequencies above 2500 Hz. The excitation pattern of a low-frequency tone shifts toward the base of the cochlea at high sound levels. Therefore, place models predict an upward shift in pitch.
This prediction is only true for high-frequency tones. Place models do not adequately predict the effects of frequency and level on the DLFs of pure tones at frequencies below 4-5 kHz. These results suggest that pitch is determined by temporal mechanisms at low frequencies and by place mechanisms at high frequencies.

The relative importance of place vs time representations can be directly measured with complex sounds that independently manipulate both parameters. In the mid-19th century, contemporary theorists assumed that the pitch of complex tones was produced by patterns of excitation on the basilar membrane. Seebeck challenged this view with a simple forced-air siren.

When the rotating disk of the siren contained one hole and the period of rotation was constant (T), the pitch of the siren equaled 1/T. For example, a 2-ms interval produced a pitch that was equivalent to a 500-Hz tone. This result is not surprising because the siren produced a pulse train with 500 pulses/s and the output spectrum concentrated energy at 500 Hz.

Adding a second hole, evenly spaced with the first, produced a pitch that equaled a 1000-Hz tone (2/T). Again, this result is not surprising because the pulse train and activation pattern were the same as those produced by a 1-kHz tone.

If the second hole was not evenly spaced, the pitch of the siren dropped to the pitch of a 500-Hz tone. Because the output spectrum still contained most of its energy at 1000 Hz, pitch must be derived from the 2-ms period of the pulse train (T1 + T2).

The 2000-Hz carrier frequency of the upper sine wave is an exact multiple of the 200-Hz modulation frequency. The spectrum of the stimulus contains energy at 1800, 2000, and 2200 Hz. The 2040-Hz carrier frequency of the lower sine wave produces an inharmonic series with components at 1840, 2040, and 2240 Hz. The lower stimulus has a higher pitch than the upper stimulus.
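The component frequencies quoted above follow from the arithmetic of sinusoidal amplitude modulation: a carrier at fc modulated at fm has energy at fc - fm, fc, and fc + fm. The minimal Python sketch below reproduces the two spectra and checks whether each forms a harmonic series of the modulation frequency. The function names are illustrative and are not part of the lecture.

```python
def am_components(carrier_hz, modulation_hz):
    """Return the three spectral components of a sinusoidally amplitude-modulated tone."""
    return [carrier_hz - modulation_hz, carrier_hz, carrier_hz + modulation_hz]


def is_harmonic_series(components_hz, fundamental_hz):
    """True if every component is an integer multiple of the fundamental."""
    return all(f % fundamental_hz == 0 for f in components_hz)


upper = am_components(2000, 200)  # [1800, 2000, 2200]
lower = am_components(2040, 200)  # [1840, 2040, 2240]

print(upper, is_harmonic_series(upper, 200))  # multiples 9, 10, 11 of 200 Hz
print(lower, is_harmonic_series(lower, 200))  # 1840 / 200 = 9.2, so inharmonic
```

The sketch covers only the spectral arithmetic; it does not model pitch.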
The higher pitch of the lower stimulus is not determined by repetition rate or harmonic spacing, which are identical for both stimuli. For both stimuli, pitch is predicted by the time interval between the prominent peaks in the waveforms.

This figure simulates the response of the basilar membrane to a periodic pulse train with a presentation rate of 200 pulses per second. Harmonic frequencies are marked with arrows. The lower harmonics (black labels) of the waveform are clearly resolved, but the higher harmonics (red labels) are not. The complex response at high-frequency locations repeats at a rate that equals the pulse presentation rate.

The 200-Hz pitch of the pulse train will persist if the resolved low-frequency harmonics are removed from the stimulus or masked by lowpass noise. Classical place theory cannot account for this phenomenon.

• A complex tone may produce multiple pitches.
• Pitches produced by the lower resolved harmonics (partials) sound like pure tones.
• Pitches that correspond to the interaction of higher unresolved harmonics produce a less pure residue pitch.
• The value of the residue pitch is determined by the periodicity of the basilar membrane waveform where the partials interfere.
Schouten, 1970

The spectro-temporal model incorporates both place and time cues. The initial spectral analysis is performed by peripheral filters that contain resolved harmonics at low frequencies and unresolved harmonics at high frequencies. The complex response at high harmonics repeats at the fundamental frequency of the input. The subsequent temporal analysis determines common spike intervals. The period of the input will be represented in the intervals of filters tuned to both resolved and unresolved harmonics.

Instantaneous sound pressure is the deviation from the local ambient pressure caused by a sound wave at a given moment in time.
Effective sound pressure is the root mean square (RMS) of the instantaneous sound pressure over a given interval of time. This is the property that is measured by a sound level meter.

Sound pressure level (SPL) is a logarithmic measure of sound pressure. It is typically specified in decibels (dB) above a standard reference level. The common reference is 20 µPa RMS, which approximates the absolute threshold of human hearing for a pure-tone stimulus (1 kHz).

Loudness is the perceptual property that allows sounds to be ordered along a magnitude scale (quiet to loud). Loudness is related to the objective property of SPL, but the two attributes are not identical. Properties other than SPL (e.g., duration, bandwidth, or frequency) also influence loudness.

Loudness scales are typically specified in sones. By convention, one sone equals the loudness of a 1000-Hz tone at a 40 dB sensation level. A sound judged twice as loud as this reference is 2 sones. Like pitch, this subjective property can only be measured with the human ear.

Stevens’ Power Law states that sensation magnitude S is related to stimulus intensity I by a power function, S = kI^a, where the exponent a depends on sensory modality and the constant k depends on the units of measurement.

Discrimination thresholds for pure-tone loudness are measured by presenting two tones (2IFC) at different sound levels and asking the subject to indicate which is louder. The difference limen for intensity (DLI, ∆I) is the change in level that produces 75% correct responses. This measure is usually called intensity discrimination, although the SPL of the stimulus is being manipulated to determine the just-noticeable difference (JND) for loudness.

Weber’s Law predicts that the JND for intensity should be a constant fraction of the reference intensity (∆I/I is a constant). This prediction holds true for broadband noise, but pure tones show a “near miss” because relative thresholds improve at higher sensation levels.
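The two quantities used above, SPL and the Weber fraction, can both be expressed in decibels. The sketch below is illustrative only: `spl_db` applies the standard definition 20·log10(p/p_ref) with the 20-µPa reference, and `jnd_db` converts a Weber fraction ∆I/I into the equivalent level change 10·log10(1 + ∆I/I). The function names and the Weber fraction of 0.25 are assumed placeholders, not values from the lecture.

```python
import math

P_REF = 20e-6  # reference pressure in pascals (20 µPa RMS)


def spl_db(p_rms):
    """Sound pressure level in dB re 20 µPa for an RMS pressure in pascals."""
    return 20 * math.log10(p_rms / P_REF)


def jnd_db(weber_fraction):
    """Level change in dB that corresponds to a Weber fraction delta_I / I."""
    return 10 * math.log10(1 + weber_fraction)


print(spl_db(20e-6))  # 0 dB SPL at the reference pressure
print(spl_db(0.2))    # approximately 80 dB SPL

# Under Weber's Law the fraction is constant, so the JND in dB is the same
# at every reference level. 0.25 is a hypothetical value for illustration.
print(round(jnd_db(0.25), 2))  # about 0.97 dB
```

In these terms, a “near miss” corresponds to a JND in dB that shrinks as the reference level rises instead of staying fixed.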
Increased performance at higher sensation levels may be explained by the effects of tone level on excitation patterns. Although levels near the center of the excitation pattern increase in 10 dB steps, levels near the high-frequency side increase by more than 10 dB.

Loudness equality is usually specified in phons. The phon value of a tone is the SPL of an equally loud 1-kHz tone. The contours shift upward at low frequencies because low-frequency tones do not sound as loud as 1-kHz tones when presented at the same SPL. The contours are closely spaced because loudness grows rapidly at low frequencies.

Equal-loudness contours can be explained by the summed discharge rates of auditory nerve fibers. The neural phon is based on the spike count elicited by a 1000-Hz tone. This metric captures the essential loudness features of human psychoacoustic performance.

This figure shows an equal-loudness contour for 1-kHz tones with different durations. Tones with a duration of less than 150 ms must be presented at a higher SPL to produce the same loudness sensation. The effect of duration on loudness has been attributed to temporal integration properties of neural discharge rates and the spread of energy at brief stimulus durations.

In this example, the width of a noise band is varied while total energy is held constant at one of five different SPLs. Energy summation: the loudness of a complex sound is based upon the sum of energy when components fall within a critical band. Loudness summation: loudness is based on the sum of the loudness levels in each critical band when multiple critical bands are involved.

This figure describes the effects of bandwidth on loudness in terms of three non-overlapping “loudness units.” The striped rectangles symbolize four bands of noise with constant SPL but different bandwidths. Loudness grows faster across critical bands because the growth of loudness within critical bands decelerates at higher levels of input.
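The argument above can be made concrete with a toy calculation. The sketch assumes, purely for illustration, that loudness within each critical band grows as a compressive power function of band energy (exponent 0.3, a commonly quoted approximation that is not stated in the lecture) and that loudness adds across bands. Under those assumptions, spreading a fixed total energy over more bands increases the summed loudness.

```python
def summed_loudness(total_energy, n_bands, exponent=0.3):
    """Toy model: split total_energy evenly over n_bands, apply a compressive
    power law within each band, and add the band loudnesses.
    The exponent is an assumed value, not one given in the lecture."""
    energy_per_band = total_energy / n_bands
    return n_bands * energy_per_band ** exponent


one_band = summed_loudness(1000.0, 1)
four_bands = summed_loudness(1000.0, 4)

# With constant total energy, four bands are louder than one by a factor of
# 4 ** (1 - 0.3), roughly 2.6.
print(four_bands / one_band)
```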
Stimuli that span critical bands work at a lower point on the input/output curve.

Loudness functions derived from equal-loudness contours contrast the growth of loudness at low vs high frequencies. The growth of loudness is relatively stable at frequencies above 1 kHz. Sound level meters and power amplifiers are weighted to take these frequency effects into account. Standards for environmental noise control are usually based on A-weighted measurements that reflect the shape of the 40-phon contour.

Deafness-related changes in cochlear excitation patterns often produce abnormally steep loudness functions. This rapid growth of loudness is termed “loudness recruitment.” Hearing aids must compress the dynamic range of amplification to produce sounds that are both audible and not uncomfortably loud.

Young and colleagues (Cai et al., 2009) recently described abnormal neural excitation patterns in the VCN of sound-exposed cats. The high-frequency side of the excitation pattern expanded at 2 to 3 times the normal rate at higher sound levels. Question: Do sound-exposed cats experience loudness recruitment?

Listeners respond faster to louder sounds. Animal psychophysical paradigms estimate the subjective loudness of auditory stimuli by analyzing reaction time. A reaction-time equivalent of the phon can be derived by adjusting the sound level of a tone until the reaction time of the response matches the reaction time to a 1000-Hz tone at a specified SPL. Equal-loudness contours obtained with this metric capture the essential loudness features of human psychoacoustic performance.

Relative to their normal pre-exposure reaction times, sound-exposed cats show accelerated responses that resemble the recruitment effects of hearing-impaired humans. This is the same sound exposure that produced expanded neural excitation patterns in the VCN of cats with acoustic trauma (slide 39).
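The A-weighting mentioned earlier for sound level meters can be sketched from the standard analog weighting formula. The constants below are those commonly quoted for the A curve (as standardized in IEC 61672); the function name is illustrative, and the lecture itself gives no formula.

```python
import math


def a_weighting_db(freq_hz):
    """Approximate A-weighting gain in dB at freq_hz, normalized to 0 dB at 1 kHz."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(numerator / denominator) + 2.00


for f in (100, 1000, 4000):
    print(f, round(a_weighting_db(f), 1))
```

Low frequencies are strongly attenuated (about -19 dB at 100 Hz), which mirrors the upward shift of the equal-loudness contours at low frequencies.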
• For broadband sounds and high-frequency tones, the growth of loudness is well predicted by Stevens’ Power Law.
• The “near miss” to Weber’s Law can be explained by the effects of SPL on basilar membrane excitation patterns.
• Loudness recruitment is associated with altered excitation patterns that are observed in the central auditory system, not the auditory periphery.
• Energy summation and loudness summation suggest that the auditory system has the capacity to integrate the neural correlates of loudness within and across multiple channels of input.
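As a closing illustration of the power law named in the first point, the sketch below converts the level of a 1000-Hz tone into sones. It assumes an intensity exponent of 0.3 (a commonly quoted approximation for loudness, not given in the lecture) and anchors the scale at 1 sone for 40 dB, following the convention stated earlier. Under these assumptions, each 10-dB increase roughly doubles loudness.

```python
def sones_from_level(level_db, exponent=0.3, reference_db=40.0):
    """Loudness in sones from a power law on intensity.
    The exponent and the 1-sone anchor at reference_db are assumptions."""
    intensity_ratio = 10 ** ((level_db - reference_db) / 10)
    return intensity_ratio ** exponent


for level in (40, 50, 60):
    print(level, round(sones_from_level(level), 2))
```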