SECOND YEAR COURSE PERCEPTION
Hearing Lecture Notes (1): Introductory Hearing

1. What is hearing for?
• (i) Indicating the direction of sound sources (better than the eyes, since hearing is omnidirectional and has no "eye-lids", but with poorer resolution of direction).
• (ii) Recognising the identity and content of a sound source (such as speech, music or a car).
• (iii) Giving information about the nature of the environment via echoes and reverberation (normal room, cathedral, open field).

2. Waveforms and Frequency Analysis
Sound is a change in the pressure of the air. The waveform of a sound shows how the pressure changes over time. The eardrum moves in response to these changes in pressure. Any waveform shape can be produced by adding together sine waves of appropriate frequencies and amplitudes. The amplitudes (and phases) of those sine waves give the spectrum of the sound. The spectrum of a sine wave is a single point at the frequency of the sine wave. The spectrum of white noise is a line covering all frequencies. The cochlea breaks the waveform at the ear down into its component sine waves - frequency analysis. Hair cells in the cochlea respond to these component frequencies. This process of frequency analysis is impaired in sensori-neural hearing loss, and it cannot be compensated for by a conventional hearing aid.

3. Why does the auditory system analyse sound by frequency?
Some animals do not analyse sound by frequency, but simply transmit the pressure waveform at the ear directly. We could do this by having hair cells on the eardrum. Instead, we have an elaborate system that analyses sound into its frequency components. We do this because almost all sounds are structured in frequency, so we can detect them, especially in the presence of other sounds, more easily by "looking" at the spectrum than at the waveform.
In the six panels below, the left-hand column shows plots of the waveform of a sound - the way that pressure changes over time. The right-hand column shows the spectrum of the sound - how much of each sine wave you have to add together in order to make that particular waveform.
The upper panel is a sine-wave tone with a frequency of 1000 Hz. A sine wave has energy at just one frequency, so the spectrum is just one point.
[Figure: waveform and spectrum of the tone p = 1.0 sin(2π·1000t); the spectrum is a single point at 1000 Hz]
The middle panel is white noise (like the sound of a waterfall). White noise has equal energy at all frequencies, so the spectrum is a horizontal line.
[Figure: waveform and spectrum of white noise; the spectrum is flat across frequency]
The lower panel is the sine tone added to the noise. The spectrum of the sum is just the sum of the spectra of the two components. Notice that you can see the tone very easily in the spectrum, but it is completely obscured by the noise in the waveform.
[Figure: waveform and spectrum of noise + tone; the tone stands out as a peak at 1000 Hz in the spectrum]

4. Sine waves
A sine wave has three properties, which appear in the basic equation:
p(t) = a·sin(2πft + φ)
(i) Frequency (f) - measured in Hertz (Hz), cycles per second.
(ii) Amplitude (a) - a measure of the pressure change of a sound. It is usually measured in decibels (dB) relative to another sound; the dB scale is a logarithmic scale: if we have two sounds p1 and p2, then p1 is 20·log10(p1/p2) dB greater than p2. Doubling pressure (amplitude) gives an increase of 6 dB: 20·log10(2/1) = 20·0.3 = 6. Amplitude squared is proportional to the energy, or level, or intensity (I) of a sound.
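Decibel arithmetic is easy to slip on, so here is a minimal Python (NumPy) sketch that checks the worked example above. The function name is ours, not from the notes.

    import numpy as np

    def db_gain_pressure(p1, p2):
        """Level difference in dB between two pressures p1 and p2."""
        return 20.0 * np.log10(p1 / p2)

    # Doubling the pressure amplitude:
    print(db_gain_pressure(2.0, 1.0))   # ~6.02 dB (the notes round log10(2) to 0.3, giving 6 dB)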
The decibel difference between two sounds can also be expressed in terms of intensity changes: 10·log10(I1/I2). Doubling intensity gives an increase of 3 dB (10·0.3). The just noticeable difference (jnd) between two sounds is about 1 dB.
(iii) Phase (φ) - measured in degrees or radians, indicates the relative time of a wave. The sine wave shown above has an amplitude of 1, a frequency of 1000 Hz, and it starts in zero sine phase (φ = 0).

5. Complex periodic sounds
A sound which has more than one (sine-wave) frequency component is a complex sound. A periodic sound is one which repeats itself at regular intervals. A sine wave is a simple periodic sound. Musical instruments and the voice produce complex periodic sounds. They have a spectrum consisting of a series of harmonics. Each harmonic is a sine wave whose frequency is an integer multiple of the fundamental frequency. For example, the note 'A' played by the oboe to tune the orchestra has a fundamental frequency of 440 Hz, giving harmonics at 440, 880, 1320, 1760, 2200, 2640 Hz, etc. If the oboe played a higher pitch, the fundamental frequency (and so all the harmonic frequencies of the note) would be higher. The period of a complex sound is 1/fundamental frequency (in this case 1/440 = 0.0023 s = 2.3 ms).
A different instrument, with a different timbre, playing the same pitch as the oboe would have harmonics at the same frequencies, but the harmonics would have different relative amplitudes. The overall timbre of a natural instrument is partly determined by the relative amplitudes of the harmonics, but the attack of the note is also important. Different harmonics start at different times in different instruments, and the rate at which they start also differs markedly across instruments. Cheap synthesisers cannot imitate the attack, and so they do not make very lifelike sounds. Expensive synthesisers (like Yamaha's Clavinova) store the whole note including the attack, and so sound very realistic.
Here is one-tenth of a second of the waveform, and also the spectrum, of a complex periodic sound consisting of the first four harmonics of a fundamental of 100 Hz. Notice that there are 10 cycles of the waveform in 0.1 s, and all the frequency components are integer multiples of 100 Hz.
[Figure: waveform and spectrum of a complex tone with equal-amplitude harmonics at 100, 200, 300 and 400 Hz]
Here is a sound with the same period, but a different timbre. Notice that the waveform has a different shape, but the same period. The change in timbre is produced by making the higher harmonics lower in amplitude.
[Figure: waveform and spectrum of the same four harmonics with amplitudes falling from 1.0 towards 0.25 across 100-400 Hz]
We can also change the shape of the waveform by changing the relative phase of the different frequencies. In the previous example the four components were all in sine phase; in the next example the odd harmonics are in sine phase and the even harmonics in cosine phase. This change produces very little change in timbre.
[Figure: waveform and spectrum of the same four harmonics with odd harmonics in sine and even harmonics in cosine phase; the waveform changes shape but the spectrum is unchanged]
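To make the figures concrete, here is a short NumPy sketch that synthesises the three four-harmonic complexes described above. It is a sketch under assumed values: the sample rate and the exact harmonic amplitudes are our illustrative choices, since the notes only show them graphically.

    import numpy as np

    fs = 8000                      # sample rate (Hz); an arbitrary choice for the sketch
    t = np.arange(0, 0.1, 1/fs)    # one-tenth of a second, as in the figures
    f0 = 100                       # fundamental frequency (Hz)

    # Equal-amplitude harmonics at 100, 200, 300, 400 Hz, all in sine phase
    flat = sum(np.sin(2*np.pi*k*f0*t) for k in (1, 2, 3, 4))

    # Same period, different timbre: higher harmonics reduced in amplitude
    amps = [1.0, 0.5, 0.25, 0.125]
    rolled = sum(a*np.sin(2*np.pi*k*f0*t) for k, a in zip((1, 2, 3, 4), amps))

    # Same amplitudes, but even harmonics moved to cosine phase:
    # the waveform shape changes, the amplitude spectrum does not
    mixed = sum(a*np.sin(2*np.pi*k*f0*t + (np.pi/2 if k % 2 == 0 else 0))
                for k, a in zip((1, 2, 3, 4), amps))

All three signals repeat every 10 ms, because every component is an integer multiple of 100 Hz.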
6. Linearity
Most studies of the auditory system have used sine waves. If we know how a system responds to sine waves, then we can predict exactly how it will behave with complex waves (which are made up of sine waves), provided that the system is linear.
• The output of a linear system to the sum of two inputs is equal to the sum of its outputs to the two inputs separately.
• Equivalently, if you double the input to a linear system, then you double the output.
• A linear system can only output frequencies that are present in the input; non-linear systems always add extra frequency components.
The filters we describe below are linear. The auditory system is only linear to a first approximation.

7. Filters
A filter lets through some frequencies but not others. A treble control acts as a low-pass filter, letting less of the high frequencies through as you turn the treble down. A bass control acts as a high-pass filter, letting less of the low frequencies through as you turn the bass down. A band-pass filter only lets through frequencies that fall within some range. A slider on a graphic equalizer controls the output level of a band-pass filter. In analysing sound into its frequency components, the ear acts like a set of band-pass filters. We can represent the action of a filter with a diagram, like a spectrum, which shows by how much each frequency is attenuated (or reduced in amplitude) when it passes through the filter.
[Figure: a complex input sound (harmonics at 100-400 Hz) passed through a low-pass filter; the output spectrum keeps the low harmonics and attenuates the high ones]

8. Resonance
A resonant system acts like a band-pass filter, responding to a narrow range of frequencies. Examples are: a tuning fork, a string of a harp or piano, a swing. Helmholtz was almost right in thinking that the ear consisted of a series of resonators - like a grand piano with the sustaining pedal held down. Here is what happens when a complex sound is passed through a sharply-tuned band-pass filter. Notice that a complex wave goes in, but a sine wave comes out. Each part of the basilar membrane acts like a band-pass filter tuned to a different frequency.
[Figure: a complex input sound passed through a sharply-tuned band-pass filter; the output is a single sine-wave component]
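Here is a minimal sketch of that last point using scipy.signal: the four-harmonic complex from section 5 is passed through a sharply tuned band-pass filter centred on its 300 Hz harmonic. The Butterworth design, order and band edges are our assumptions; the notes do not specify a filter type.

    import numpy as np
    from scipy.signal import butter, sosfilt

    fs = 8000
    t = np.arange(0, 0.5, 1/fs)
    # Complex periodic input: first four harmonics of 100 Hz
    x = sum(np.sin(2*np.pi*k*100*t) for k in (1, 2, 3, 4))

    # Sharply tuned band-pass filter centred on the 300 Hz harmonic
    sos = butter(4, [280, 320], btype='bandpass', fs=fs, output='sos')
    y = sosfilt(sos, x)

After the initial transient, y is close to a pure 300 Hz sine wave: a complex wave goes in, but a sine wave comes out.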
What you should know
You should understand the meaning of all the terms shown in italics. You should also be able to explain all the diagrams in this handout. If you do not understand any of the terms or diagrams, first try asking someone else in the class who you think might. If you still don't understand, then ask me either in a lecture, after a lecture or in my office.

SECOND YEAR COURSE PERCEPTION
Hearing Lecture Notes (2): Ear and Auditory Nerve

1 THE EAR
There are three main parts of the ear: the pinna (or external ear) and meatus, the middle ear, and the cochlea (inner ear).

1.1 Pinna and meatus
The pinna serves different functions in different animals. Those with mobile pinnae (donkey, cat) use them to amplify sound coming from a particular direction, at the expense of other sounds. The human pinna is not mobile, but serves to colour high-frequency sounds by interference between the echoes reflected off its different structures (like the colours of light produced by reflection from an oil slick). Only frequencies that have a wavelength comparable to the dimensions of the pinna are influenced by it (> 3 kHz). Different high frequencies are amplified by different amounts depending on the direction of the sound. The brain interprets these changes as direction.
The meatus is the tube that links the pinna to the eardrum. It resonates at around 2 kHz, so that frequencies in that region are transmitted more efficiently to the cochlea than others. This frequency region is particularly important in speech.

1.2 Middle ear: tympanic membrane, malleus, incus and stapes
The middle ear transmits the vibrations of the eardrum (tympanic membrane) to the cochlea. It performs two functions.
(i) Impedance matching - vibrations in air must be transmitted efficiently into the fluid of the cochlea. If there were no middle ear, most of the sound would just bounce off the cochlea. The middle ear helps turn a large-amplitude vibration in air into a small-amplitude vibration (of the same energy) in fluid. The large area of the eardrum compared with the small area of the stapes helps to achieve this, together with the lever action of the three middle-ear bones or ossicles (malleus, incus, stapes).
(ii) Protection against loud low-frequency sounds - the cochlea is susceptible to damage from intense sounds. The middle ear offers some protection through the stapedius reflex, which tenses muscles that stiffen the vibration of the ossicles, thus reducing the extent to which low-frequency sounds are transmitted. The reflex is triggered by loud sounds; it also reduces the extent of upward spread of masking from intense low-frequency sounds (see Hearing Lecture 3).
Damage to the middle ear causes a Conductive Hearing Loss, which can usually be corrected by a hearing aid. In a conductive hearing loss, absolute thresholds are elevated. These thresholds are measured in an audiological test and shown in an audiogram. Appropriate amplification at different frequencies compensates for the conductive loss.

1.3 Inner ear: cochlea
The snail-shaped cochlea, unwound, is a three-chambered tube. Two of the chambers are separated by the basilar membrane, on which sits the organ of Corti. The tectorial membrane sits on top of the organ of Corti and is fixed rigidly to it at one end only. Sound produces a travelling wave down the basilar membrane, which is detected by the shearing movement between the tectorial and basilar membranes bending the hairs on top of the inner hair cells that form part of the organ of Corti. Different frequencies of sound give maximum vibration at different places along the basilar membrane. When a low-frequency pure tone stimulates the ear, the whole basilar membrane, up to the point at which the travelling wave dies out, vibrates at the frequency of the tone. The amplitude of the vibration has a very sharp peak. The vibration to high-frequency tones peaks nearer the base of the membrane than does the vibration to low-frequency sounds. The characteristic frequency (CF) of a particular place along the membrane is the frequency that peaks at that point. If more than one tone is present at a time, then their vibrations on the membrane add together (but see the remarks on non-linearity).
[Figure: amplitude of basilar-membrane vibration against distance from base to apex; high frequencies peak near the base, low frequencies near the apex]
More intense tones give a greater vibration than do less intense ones:
[Figure: basilar-membrane vibration for high- and low-intensity tones; the peak at CF grows with intensity]
A brief click contains energy at virtually all frequencies. Each part of the basilar membrane will resonate at its particular frequency.
[Figure: response of a band-pass filter to a click - a damped oscillation with period 1/CF]
It is a useful approximation to note that each point on the basilar membrane acts like a very sharply tuned band-pass filter. In the normal ear these filters are just as sharply tuned as are individual fibres of the auditory nerve (see below).
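A rough illustration of the click example: pass an impulse through a few band-pass filters standing in for places on the basilar membrane. The Butterworth filters and the 20%-of-CF bandwidths are our assumptions, not a cochlear model; the point is only that each filter "rings" with a period of about 1/CF.

    import numpy as np
    from scipy.signal import butter, sosfilt

    fs = 16000
    click = np.zeros(512)
    click[0] = 1.0                      # a brief click: energy at virtually all frequencies

    for cf in (400, 1000, 2500):        # three illustrative characteristic frequencies (Hz)
        bw = 0.2 * cf                   # assumed bandwidth: 20% of CF
        sos = butter(2, [cf - bw/2, cf + bw/2], btype='bandpass', fs=fs, output='sos')
        ring = sosfilt(sos, click)      # a damped oscillation with period ~1/CF seconds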
1.3.1 Non-linearity
In normal ears the response of the basilar membrane to sound is actually non-linear - there is significant distortion.
• If you double the input to the basilar membrane, the output less than doubles (saturating non-linearity).
• If you add a second tone at a different frequency, the response to the first tone can decrease (two-tone suppression).
• If you play two tones (say 1000 and 1200 Hz), a third tone can appear (at 800 Hz) - the so-called cubic difference tone. A sketch of how a non-linearity creates such a tone follows below.
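Here is a hedged sketch of that last point. The tanh function is just an illustrative compressive non-linearity, not the cochlea's actual input-output function; with 1000 and 1200 Hz inputs, its output acquires a component at 2f1 - f2 = 800 Hz that the input lacks.

    import numpy as np

    fs = 8000
    t = np.arange(0, 1.0, 1/fs)
    f1, f2 = 1000, 1200
    x = np.sin(2*np.pi*f1*t) + np.sin(2*np.pi*f2*t)

    # A saturating non-linearity (an illustrative stand-in for the cochlea's compression)
    y = np.tanh(1.5 * x)

    spec_in = np.abs(np.fft.rfft(x)) / len(x)
    spec_out = np.abs(np.fft.rfft(y)) / len(y)
    bin_800 = int(round(800 * len(y) / fs))   # exactly the 800 Hz bin with these choices
    print(spec_in[bin_800])                   # ~0: no 800 Hz in the input
    print(spec_out[bin_800])                  # clearly non-zero: the cubic difference tone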
1.3.2 Sensori-neural hearing loss (SNHL)
Sensori-neural hearing loss can be brought about by exposure to loud sounds (particularly impulsive ones like gun shots), by infection, or by antibiotics. It usually arises from loss of outer hair cells. It is likely that outer hair cells act as tiny motors; they feed back energy into the ear at the CF. In ears with a sensori-neural hearing loss (SNHL), the distortion described above is reduced or disappears. So, paradoxically, abnormal ears are more nearly linear.

1.3.3 Forms of deafness
There are two major forms of deafness: conductive and sensori-neural.
                      Conductive    Sensori-neural
Origin                Middle ear    Cochlea (OHCs)
Thresholds            Raised        Raised
Filter bandwidths     Normal        Increased
Loudness growth       Normal        Increased (Recruitment)
The sensori-neural symptoms of increased filter bandwidths and increased loudness growth are not alleviated by a conventional hearing aid.

1.3.4 Role of outer hair cells
The active feedback of energy by outer hair cells into the basilar membrane is probably responsible for: (i) the sharp peak in the basilar-membrane response - low thresholds and narrow bandwidth; (ii) oto-acoustic emissions (sounds that come out of the ear); (iii) the non-linear response of basilar-membrane vibration. The more linear behaviour of the SNHL basilar membrane is probably the cause of loudness recruitment (abnormally rapid growth of loudness).

2 AUDITORY NERVE
As the hairs of inner hair cells bend, the voltage of the hair cell changes; when the hairs are bent sufficiently in one direction (but not the other), the voltage changes enough to release neurotransmitter at the synapse between the hair cell and the auditory nerve, and the auditory nerve fires. This direction corresponds to a pressure rarefaction in the air. After firing, an auditory nerve fibre has a refractory period of around 1 ms. Each hair cell has about 10 auditory nerve fibres connected to it. These fibres have different thresholds. Inner hair cells stimulate the afferent auditory nerve; outer hair cells generally do not, but are innervated by the efferent auditory nerve. Efferent activity may influence the mechanical response of the basilar membrane via the outer hair cells.

2.1 Response to single pure tones
As the amplitude of a tone played to the ear increases, so the rate of firing of a nerve fibre at CF increases, up to saturation. Most auditory nerve fibres have high spontaneous rates and saturate rapidly, but there are others (which are harder to record from) that have low spontaneous rates and saturate more slowly. High spontaneous rate fibres code intensity changes at low levels, and the low spontaneous rate ones code intensity changes at high levels.
[Figure: firing rate against level (dB SPL); the many high-spontaneous-rate fibres saturate early, the few low-spontaneous-rate fibres saturate more slowly]

2.2 Frequency threshold curves (FTCs)
FTCs plot the minimum intensity of sound needed at a particular frequency to just stimulate an auditory nerve fibre above its spontaneous activity. The high-frequency slopes are very steep (c. 300 dB/oct); the low-frequency slopes generally have a steep tip followed by a flatter base. Damage to the cochlea easily abolishes the tip, and this explains some features of sensori-neural hearing loss: raised thresholds and reduced frequency selectivity.
[Figure: a normal FTC with a low threshold and narrow bandwidth at the characteristic frequency, and an abnormal FTC with a raised threshold and broadened tuning]

2.3 Characteristic frequency (CF)
The CF of an auditory nerve fibre is the frequency at which least energy is needed to stimulate it. Different nerve fibres have different CFs and different thresholds. The CF of a fibre is roughly the same as the resonant frequency of the part of the basilar membrane that it is attached to.

2.4 Phase locking
The auditory nerve will tend to fire at a particular phase of a stimulating low-frequency tone, so the inter-spike intervals tend to occur at integer multiples of the period of the tone. With high-frequency tones (> 3 kHz) phase locking gets weaker, because the capacitance of inner hair cells prevents their voltage from changing sufficiently rapidly. Please note that the weaker phase locking at high frequencies is NOT due to the refractory period.

2.5 Coding frequency
How does the brain tell, from the pattern of firing in the auditory nerve, what frequencies are present? There are two alternatives:
(a) place of maximal excitation - fibres whose CF is close to a stimulating tone's frequency will fire at a higher rate than those remote from it. So the frequency of a tone will be given by the place on the membrane from which emerge the fibres with the highest rate of firing.
(b) timing information - fibres with a CF near to a stimulating tone's frequency will be phase locked to the tone, provided it is low in frequency (< 3 kHz). So consistent inter-spike intervals across a band of fibres indicate the frequency of a tone. A sketch of this idea follows below.
[Figure: for low-frequency tones, nerve spikes occur at a fixed phase and inter-spike intervals are one or two periods of the tone; for high-frequency tones (> 5 kHz) the intervals are random]
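The toy simulation below illustrates the inter-spike-interval idea: a model fibre that can fire only at one phase of a 500 Hz tone, with a fixed chance of firing on each cycle. The 0.3 firing probability is an arbitrary choice, and refractoriness can be ignored here because the 2 ms period exceeds the 1 ms refractory period.

    import numpy as np

    rng = np.random.default_rng(0)
    f = 500.0                     # low-frequency tone (Hz)
    period = 1.0 / f              # 2 ms
    n_cycles = 2000
    p_fire = 0.3                  # assumed probability of firing on any given cycle

    # The fibre may fire once per cycle, always at the same phase of the tone
    fires = rng.random(n_cycles) < p_fire
    spike_times = np.nonzero(fires)[0] * period

    isis = np.diff(spike_times)
    # Inter-spike intervals cluster at integer multiples of the 2 ms period
    print(np.round(isis[:10] / period))   # whole numbers of periods: 1, 2, 3, ...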
2.6 Coding intensity
How does the brain tell, from the pattern of firing in the auditory nerve, what the intensities of the different frequencies present are? The dynamic range of most auditory nerve fibres (high spontaneous rate) is not sufficient to cover the range of hearing (c. 100 dB). Low spontaneous rate fibres have a larger dynamic range and provide useful information at high levels. So information about intensity is carried by different fibres at different levels.

2.7 Two-tone suppression
If a tone at a fibre's CF is played just above threshold for that fibre, the fibre will fire. But if a second tone is also played, at a frequency and level in the shaded area of the next diagram, then the firing rate will be reduced. This two-tone suppression demonstrates that the normal auditory system is non-linear: if the system were linear, then the firing rate could only be unchanged or increased by the addition of an extra tone. Two-tone suppression is a characteristic of the normal ear and may be absent in the damaged ear. It is formally similar to lateral inhibition in vision, but it has a very different underlying cause. Lateral inhibition in vision is the result of neural mechanisms, whereas two-tone suppression is the result of mechanical processes in the cochlea.
[Figure: FTC with a test tone at the characteristic frequency; shaded regions flanking the FTC tip mark the frequencies and levels at which a second tone suppresses firing]

2.8 Cochlear implants
Implants can be fitted to patients who are profoundly deaf (> 90 dB loss) and who gain very little benefit from conventional hearing aids. In multi-channel implants, a number of bipolar electrodes are inserted into the cochlea, terminating at different places. Electrical current derived from band-pass filtering of sound can selectively stimulate auditory nerve fibres near each electrode, giving some crude 'place' coding of frequency. The best patients' hearing is good enough for them to understand isolated words over the telephone, but there is a great deal of variation across patients, which may be partly due to the integrity of the auditory nerve and higher pathways. It is increasingly common to fit cochlear implants to profoundly deaf children, so that they gain exposure to spoken language. This move raises ethical issues, as well as social ones for the signing deaf community, some of whom oppose implants.

3 WHAT YOU SHOULD KNOW
You should understand the meaning of all the terms shown in italics. You should also be able to explain all the diagrams in this handout. If you do not understand any of the terms or diagrams, first try asking someone else in the class whom you think might understand. If you still don't, then ask me either in the lecture or afterwards.

SECOND YEAR COURSE PERCEPTION
Hearing Lecture Notes (3): Introductory Psychoacoustics

1. BASIC TERMS
There is an important distinction between terms used to describe physical properties and those used to describe psychological properties. Psychological properties are usually influenced by many physical ones.
Physical:        Intensity, Level    Frequency    Spectrum
Psychological:   Loudness            Pitch        Timbre

1.1. Absolute threshold
Human listeners are most sensitive to sounds around 2-3 kHz. Absolute threshold at these frequencies for normal young adults is around 0 dB Sound Pressure Level (SPL - level relative to 0.0002 dyne/cm2). Thresholds increase to about 50 dB SPL at 100 Hz and 10 dB SPL at 10 kHz. A normal young adult's absolute threshold for a pure tone defines 0 dB Hearing Level (HL) at that frequency. An audiogram measures an individual's threshold at different frequencies relative to 0 dB HL. Normal ageing progressively increases thresholds at high frequencies (presbyacusis). A noisy environment will lead to a more rapid hearing loss (40 dB loss at 4 kHz for a factory worker at age 35, compared with 20 dB for an office worker). The term Sensation Level (SL) gives the number of dB that a sound is above its absolute threshold for a particular individual.
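A small sketch of the level conventions just defined, assuming only the standard SPL reference (0.0002 dyne/cm2 = 20 micropascals); the function names are ours.

    import numpy as np

    P_REF = 20e-6    # 0.0002 dyne/cm^2 = 20 micropascals, the SPL reference pressure

    def spl(pressure_pa):
        """Sound pressure level in dB SPL for an RMS pressure in pascals."""
        return 20 * np.log10(pressure_pa / P_REF)

    def sensation_level(level_db, threshold_db):
        """Sensation Level: dB above this listener's absolute threshold."""
        return level_db - threshold_db

    print(spl(20e-6))                    # 0 dB SPL: about absolute threshold at 2-3 kHz
    print(sensation_level(60.0, 10.0))   # a 60 dB tone at a 10 dB threshold is at 50 dB SL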
2. FREQUENCY RESOLUTION AND MASKING
Ohm's Acoustic Law states that we can perceive the individual Fourier components of a complex sound. It is only partly true, since the ear has a limited ability to resolve different frequencies. Our ability to separate different frequencies depends on the sharpness of our auditory filters. The physiology underlying auditory filters is described in the previous Notes. The bandwidth of human auditory filters at different frequencies can be measured psychoacoustically in masking experiments (see below). The older literature refers to the width of an auditory filter at a particular frequency as the Critical Band. Sounds can be separated by the ear when they fall in different critical bands, but they mix together when they fall within the same one. For example (and somewhat oversimplified!), only harmonics that are separated by more than a critical band can be heard out from a mixture; only noise that is within a critical band of a tone contributes to the masking of that tone.
A simple demonstration of the bandwidth of noise that contributes to the masking of a tone is the following band-limiting demonstration, which is Demonstration 2 on the ASA "Auditory Demonstrations" CD.
• In silence, you can hear all ten 5 dB steps of the 2000 Hz tone.
• In wide-band noise you can only hear about five, because of masking.
• As the bandwidth of the noise is decreased to 1000 Hz and then to 250 Hz there is no change, because your auditory bandwidth is narrower than these values.
• When the bandwidth of the noise is decreased to 10 Hz, you hear more tone steps, because the noise bandwidth is now narrower than the auditory filter, and so less noise gets into the auditory filter to mask the tone.
[Figure: measurement of auditory bandwidth with band-limited noise - a 2000 Hz tone detected through an auditory filter, with masking noise of decreasing bandwidth (broadband, 1000 Hz, 250 Hz) around it]
The masked threshold of a tone is its level when it is just detectable in the presence of some other sound. It will of course vary with the masking sound. The amount of masking is the difference between the masked threshold and the absolute threshold. Generally, individuals with broader auditory filters (as a result of SNHL) show more masking.
In simultaneous masking the two sounds are presented at the same time. In forward masking the masking sound is presented just before the test tone. Forward masking gives slightly different results from simultaneous masking because of non-linearities in the auditory system.
[Figure: types of masking - in forward masking the mask precedes the target, in backward masking it follows the target, and in simultaneous masking they overlap in time]
In older studies of masking, the mask frequency and level are fixed, and the threshold level for the target is measured at different frequencies. In measuring psychoacoustic tuning curves, the target frequency and level are fixed, and the threshold level for the mask is measured at different frequencies.

2.1. Psychophysical tuning curves
A psychophysical method can be used to generate an analogue of the physiological frequency threshold curve for a single auditory nerve fibre. A narrow-band noise of variable centre frequency is the masker, and a fixed-frequency, fixed-level pure tone at about 20 dB HL is the target. The level of masker that just masks the tone is found for different masker frequencies. Compare the following diagram with the FTC in the previous Notes.
[Figure: psychophysical tuning curve - masker level needed to just mask a fixed target, plotted against masker centre frequency, with a sharp minimum at the target frequency]
Using these techniques (and other similar ones) we can estimate the shape and bandwidth of human auditory filters at different (target) frequencies. At 1 kHz the bandwidth is about 130 Hz; at 5 kHz it is about 650 Hz.
[Figure: estimated auditory-filter bandwidth (Hz) against centre frequency, increasing from a few hundred Hz at low frequencies to around 1000 Hz at 8 kHz]
Psychophysical tuning curves measured in people with SNHL often show increased auditory bandwidths at those frequencies where they have a hearing loss.

2.2. Excitation pattern
Using the filter shapes and bandwidths derived from masking experiments we can compute the excitation pattern produced by a sound. The excitation pattern shows how much energy comes through each filter in a bank of auditory filters. It is analogous to the pattern of vibration on the basilar membrane. For a 1000 Hz pure tone, the excitation patterns for a normal and for a SNHL listener look like this:
[Figure: excitation pattern of a 1000 Hz tone against filter centre frequency - a sharp peak for the normal listener, a broader and flatter pattern for the SNHL listener]
The excitation pattern of a complex tone is simply the sum of the patterns of the sine waves that make up the complex tone (since the model is a linear one). A simple computational sketch follows below.
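The notes do not give the filter formulas, so the sketch below borrows two published approximations often used for this purpose - the Glasberg & Moore (1990) ERB equation and the roex(p) filter shape - to compute the excitation pattern of a 1000 Hz tone. Treat it as one possible implementation; its bandwidths are close to, but not identical with, the values quoted above.

    import numpy as np

    def erb(fc):
        """Equivalent rectangular bandwidth (Hz) of the auditory filter at fc,
        using the Glasberg & Moore (1990) approximation."""
        return 24.7 * (4.37 * fc / 1000.0 + 1.0)

    def roex_weight(f, fc):
        """roex(p) filter weighting for a component at f in a filter centred at fc."""
        p = 4.0 * fc / erb(fc)
        g = abs(f - fc) / fc
        return (1.0 + p * g) * np.exp(-p * g)

    print(erb(1000.0))   # ~133 Hz, close to the ~130 Hz quoted above
    print(erb(5000.0))   # ~564 Hz, in the same region as the ~650 Hz quoted above

    # Excitation pattern of a 1000 Hz tone: filter output against centre frequency
    centres = np.linspace(500, 2000, 31)
    pattern_db = 10 * np.log10([roex_weight(1000.0, fc) for fc in centres])
    # Doubling each filter's ERB (a crude stand-in for SNHL) flattens the peak.

For a complex tone, the pattern would be the sum (in intensity) of such single-component patterns, as the notes say.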
We can hear out a tone at a particular frequency in a mixture if there is a clear peak in the excitation pattern at that frequency. Since people suffering from SNHL have broader auditory filters, their excitation patterns do not have such clear peaks. Sounds mask each other more, and so such listeners have difficulty hearing sounds (such as speech) in noise.

3. NON-LINEARITIES
To a first approximation the cochlea acts like a row of overlapping linear band-pass filters. But there is clear evidence that the cochlea is in fact inherently non-linear (i.e. its non-linearity is not just a result of overloading it at high signal levels). In a non-linear system the output to (a+b) is not the same as the output to (a) plus the output to (b).

3.1. Combination tones
If two tones at frequencies f1 and f2 are played to the same ear simultaneously, a third tone is heard at the frequency (2f1 - f2), provided that f1 and f2 are close in frequency (f2/f1 < 1.2) and at similar levels. Combination tones are often absent in SNHL.

3.2. Two-tone suppression
In single auditory nerve recordings, the response to a just supra-threshold tone at CF can be reduced by a second tone, even though that second tone would by itself have increased the nerve's firing rate. A similar effect is found in forward masking: the forward masking of tone c by tone a can be reduced if a is accompanied by a third tone b at a different frequency, even though b has no effect on c on its own. Two-tone suppression is often absent in SNHL.
[Figure: forward masking of a 1000 Hz target (c) by a 1000 Hz mask (a) is reduced when the mask is accompanied by a 900 Hz suppressor (b)]

What you should know
You should understand: what an auditory filter is and how it is measured; what an excitation pattern is and how it changes for those having a SNHL. You should know the evidence for non-linearities in human hearing.

SECOND YEAR COURSE PERCEPTION
Hearing Lecture Notes (4): Pitch Perception

Definition: pitch is the 'attribute of auditory sensation in terms of which sounds may be ordered on a musical scale'.

1. PURE TONES
The pitch of pure tones is influenced mainly by their frequency, but also by intensity: high-frequency pure tones go flat when played loud. The pitch of pure tones is probably coded by a combination of place and timing mechanisms:
• Place mechanisms can explain diplacusis (the same tone giving different pitches in the two ears) more easily than can timing mechanisms.
• But timing theories based on phase-locked neural discharge appear to be needed in order to explain our ability to distinguish the frequencies of very short duration tones (whose place representation would be very blurred).
• Timing theories could be the whole story for musical pitch, since it deteriorates at high frequencies where phase locking is weak. (The highest note on the piano is around 4 kHz; higher notes lose their sense of musical pitch.) For very high frequency tones (5-20 kHz) you can tell crudely which of two is the higher in frequency, but not what musical note is being played.

2. COMPLEX TONES
Structure. Almost all sounds that give a sensation of pitch are periodic. Their spectrum consists of harmonics that are integer multiples of the fundamental. The pitch of a complex periodic tone is close to the pitch of a sine wave at the fundamental. Helmholtz claimed that the pitch is heard at the fundamental because the fundamental frequency gives the lowest-frequency peak on the basilar membrane.
[Figure: waveform and spectrum of a complex tone with a 200 Hz fundamental - harmonics at 200, 400, 600 and 800 Hz, harmonic spacing 200 Hz, period 1/200 s = 5 ms]

2.1. Missing fundamental
Seebeck (and later Schouten) showed that complex periodic sounds with no energy at the fundamental may still give a clear pitch sensation at the fundamental (cf. telephone speech - the telephone acts as a high-pass filter, removing energy below about 300 Hz).
[Figure: the same complex with the 200 Hz fundamental removed - harmonics at 400, 600 and 800 Hz; the harmonic spacing and the 5 ms period are unchanged]

2.2. Helmholtz's place theory
Helmholtz suggested that the ear reintroduces energy at the fundamental by a process of distortion that produces energy at frequencies corresponding to the difference between two components physically present (i.e. at the harmonic spacing). Any pair of adjacent harmonics would generate energy at the fundamental. Helmholtz's explanation is wrong because: (i) a pitch at the fundamental is still heard in low-pass filtered masking noise that heavily masks the fundamental; (ii) a complex sound consisting of inharmonic frequencies (e.g. 807, 1007, 1207 Hz) gives a pitch that is slightly higher than the difference frequency of 200 Hz; (iii) the distortion only occurs at high intensities, but low intensities still give the pitch.

2.3. Schouten's timing theory
Schouten proposed that the brain times the intervals between beats of the unresolved (see the next diagram) harmonics of a complex sound, in order to find the pitch. Schouten's theory is wrong because: (i) pitch is determined more by the resolved than by the unresolved harmonics; (ii) you can still hear a pitch corresponding to the fundamental when two consecutive frequency components go to opposite ears.
The following diagram shows the excitation pattern that would be produced on the basilar membrane separately by individual harmonics of a 200 Hz fundamental. Notice that the excitation patterns of the higher-numbered harmonics are closer together than those of the low-numbered harmonics. This is because the filters have a bandwidth which is roughly a tenth of their centre frequency (and so is constant on a log scale), whereas harmonics are equally spaced in frequency on a linear scale. More harmonics therefore get into a high-frequency filter than into a low-frequency one. The low-numbered harmonics are resolved by the basilar membrane (giving roughly sinusoidal output in their filters); but the high-numbered harmonics are not resolved. They add together in their filters to give a complex vibration which beats at the fundamental frequency.
[Figure: excitation patterns of the harmonics of 200 Hz from base to apex - low harmonics (200-800 Hz) resolved, higher harmonics (around 1600 Hz) unresolved; the output of a 200 Hz filter is a sinusoid, while the output of a 1600 Hz filter is a complex wave whose envelope repeats every 1/200 s = 5 ms]

2.4. Pattern recognition theories
Goldstein's theory states that pitch is determined by a pattern-recognition process operating on the resolved harmonics from both ears. The brain finds the best-fitting harmonic series to the resolved frequencies, and takes its fundamental as the pitch. Goldstein's theory accounts well for most of the data, but there is also a weak pitch sensation from periodic sounds which do not contain any resolvable harmonics, and from aperiodic sounds that have a regular envelope (such as amplitude-modulated noise). A theory such as Schouten's may be needed in addition to Goldstein's in order to account for such effects.
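A brief NumPy sketch of the missing fundamental: a complex of 400, 600 and 800 Hz has no energy at 200 Hz, yet its waveform still repeats every 5 ms. Autocorrelation is used here as a generic stand-in for a timing analysis - it is not the specific mechanism of either Schouten's or Goldstein's theory.

    import numpy as np

    fs = 8000
    t = np.arange(0, 0.5, 1/fs)
    # Harmonics 2-4 of a 200 Hz fundamental; no energy at 200 Hz itself
    x = sum(np.sin(2*np.pi*f*t) for f in (400, 600, 800))

    ac = np.correlate(x, x, mode='full')[len(x)-1:]   # autocorrelation at lags >= 0
    lag = np.argmax(ac[20:]) + 20                     # skip the peak at zero lag
    print(lag / fs)   # ~0.005 s: the 5 ms period of the missing 200 Hz fundamental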
Evidence for there being two separate mechanisms for resolved and unresolved harmonics:
• pitch discrimination and musical pitch labelling (e.g. A#) are much worse for sounds consisting only of unresolved harmonics;
• comparison of pitches between two sounds, one having resolved and the other unresolved harmonics, is worse than comparison of pitches between two sounds both with unresolved harmonics.

3. WHAT YOU SHOULD KNOW
You should know the evidence for and against the three different theories of pitch perception for complex tones, and the difference between place and timing mechanisms for the pitch of pure tones.

SECOND YEAR COURSE PERCEPTION
Hearing Lecture Notes (5): Binaural Hearing and Localization

Possible cues to the localization of a sound:
• binaural time/intensity differences (inherently ambiguous);
• pinna effects;
• reverberation and intensity;
• head movements.

1. PURE TONES
1.1. Rayleigh's duplex theory (applies only to azimuth, i.e. localization in the horizontal plane)
1.1.1. Low-frequency tones (< 1500 Hz) are localised by phase differences:
• Phase locking is present for low-frequency tones (< 4 kHz).
• Jeffress' cross-correlator gives a possible neural model.
• The maximum time difference between the ears is about 670 µs, corresponding to one full cycle at 1500 Hz (the upper limit for binaural phase sensitivity). A sketch of where this number comes from follows below.
• Onset time is different from the ongoing phase difference. Onset time differences are important for short sounds.
1.1.2. High (and low) frequency tones are localised by intensity differences:
• The shadow cast by the head is greater at high frequencies (20 dB at 6 kHz) than at low frequencies (3 dB at 500 Hz), i.e. the head acts as a low-pass filter.
• The auditory nerve is not phase-locked for high-frequency tones (> 4 kHz).
• Phase differences are ambiguous for high-frequency tones (> 1500 Hz).

1.2. Time/intensity trade
The time/intensity trade is shown by titrating a phase difference in one direction against an intensity difference in the other direction.
• It varies markedly with the frequency of a sound.
• It is not due to a peripheral effect of intensity on nerve latency, since:
  • you can get multiple images;
  • an optimally traded stimulus is distinguishable from an untraded one.

1.3 Cone of confusion
Binaural cues are inherently ambiguous. The same differences can be produced by a sound anywhere on the surface of an imaginary cone whose tip is in the ear. For pure tones this ambiguity can only be resolved by head movements.

2. COMPLEX TONES
2.1. Timing cues
As with pure tones, onset time cues are important for complex tones (particularly short ones). But the use of other timing cues is different, since high-frequency complex tones can change in localization with an ongoing timing difference. The next diagram shows the output of an auditory filter at 1600 Hz to a complex tone with a fundamental of 200 Hz. The 1400, 1600 and 1800 Hz components of the complex pass through the filter and add together to give the complex wave shown in the diagram. The complex wave has an envelope that repeats at 200 Hz. Ongoing phase differences would not change the localization of any of those tones heard individually, but we can localize such sounds by the relative timing of the envelopes at the two ears (provided that the fundamental frequency (envelope frequency) is less than about 400 Hz).
[Figure: output of a 1600 Hz auditory filter to a complex tone with a 200 Hz fundamental at the two ears; the right ear leads by 500 µs, and the 5 ms envelopes are shifted accordingly]
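The 670 µs figure can be checked with Woodworth's spherical-head approximation, which is our addition (the notes just quote the number); the head radius of about 9 cm is an assumed value.

    import numpy as np

    def itd_woodworth(theta_rad, head_radius=0.09, c=343.0):
        """Interaural time difference (s) for a source at azimuth theta,
        using Woodworth's spherical-head approximation (an assumption here,
        not a formula from the notes): ITD = (r/c) * (theta + sin(theta))."""
        return head_radius / c * (theta_rad + np.sin(theta_rad))

    # A source directly to one side (90 degrees) gives the maximum ITD:
    print(itd_woodworth(np.pi / 2))   # ~6.7e-4 s, i.e. about 670 microseconds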
2.2. Pinna effects (mainly median plane)
The pinna reflects high-frequency sound (wavelength less than the dimensions of the outer ear), producing echoes whose latency varies with direction (Batteau). These reflections interfere with other echoes and with the direct sound to give spectral peaks and notches. The frequencies of the peaks and notches vary with the direction of the sound, and are used to indicate direction in the median plane.

2.3 Head movements
Head movements can resolve the ambiguity of front-back confusions.

3. DISTANCE
A distant sound will be quieter and have relatively more reverberation than a close sound. Increasing the proportion of reverberant sound leads to greater apparent distance. Low-pass filtering also leads to greater apparent distance (high frequencies are absorbed more by water vapour in the air, by up to about 3 dB/100 ft). If you know what a sound is, then you can use its actual timbre and loudness to tell its distance.

4. VISION
Seen location easily dominates over heard location when the two are in conflict.

5. PRECEDENCE (OR HAAS) EFFECT
In an echoic environment the first wavefront to reach a listener indicates the direction of the source. The brain suppresses directional information from subsequent sounds. Since echoes come from different directions than the main sound, they may be ignored more easily with two ears.

6. BINAURAL EFFECTS
A number of psychoacoustic phenomena demonstrate that we are only binaurally sensitive to the phase of a pure tone if its frequency is less than about 2 kHz.

6.1. Binaural beats
Fluctuations in intensity and/or localisation occur when two slightly different tones are played, one to each ear (e.g. 500 Hz + 504 Hz gives a beat at 4 Hz). This only works for low-frequency tones (< 1.5 kHz).

6.2. Binaural masking level difference (BMLD)
When the same tone in noise is played to both ears, the tone is harder to detect than when one ear either does not get the tone, or gets the tone at a different phase. The magnitude of the effect declines above about 1 kHz, as phase locking breaks down. It is explained by Durlach's Equalization and Cancellation model.

6.3. Cramer-Huggins pitch
If noise is fed to one ear, and the same noise to the other ear but with the phase changed in a narrow band of frequencies, subjects hear a pitch sensation at the frequency of the band. The pitch gets rapidly less clear above 1500 Hz. (NB: this can be explained by models of the BMLD effect if you think of the phase-shifted band as the 'tone'.)

WHAT YOU SHOULD KNOW
You should be able to describe the different cues used to localize pure and complex tones. You should understand why phase locking does not occur for high-frequency pure tones, and why this is important in localization and in other binaural effects. You should know what the BMLD is and how Durlach's model explains it.

SECOND YEAR COURSE PERCEPTION
Hearing Lecture Notes (6): Auditory Object Recognition & Music

Timbre
Vowel sounds in speech differ in the relative amplitudes of their harmonics. A particular vowel has harmonics of greater amplitude near the formant frequencies. A formant is a resonant frequency of the vocal tract. As you change the pitch of a vowel, you change the fundamental frequency and the spacing of the harmonics, but the formant frequencies stay the same. If you change the vowel without changing the pitch of the voice, the fundamental and the harmonic spacing stay the same but the formant frequencies change. Here is the spectrum of the vowel in "bit" on a fundamental frequency of 125 Hz.
[Figure: spectrum of the vowel in "bit" on a 125 Hz fundamental, with formant peaks at F1 = 396 Hz, F2 = 1520 Hz and F3 = 1940 Hz]
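A rough source-filter sketch of such a vowel: a 125 Hz pulse train (the source, rich in harmonics of the fundamental) is passed through one resonance per formant. The scipy iirpeak resonators and the Q value are our assumptions, not a vocal-tract model; the point is that the harmonics nearest each formant come out with the greatest amplitude.

    import numpy as np
    from scipy.signal import iirpeak, lfilter

    fs = 16000
    f0 = 125                                  # fundamental frequency (Hz)
    t = np.arange(0, 0.5, 1/fs)

    # Source: a pulse train at 125 Hz, with energy at every harmonic of f0
    source = (np.arange(len(t)) % (fs // f0) == 0).astype(float)

    # Filter: one resonance per formant of the vowel in "bit"
    vowel = source
    for formant in (396, 1520, 1940):
        b, a = iirpeak(formant, Q=8, fs=fs)   # Q=8 is an assumed bandwidth choice
        vowel = lfilter(b, a, vowel)

Changing f0 moves the harmonics but not the resonances (a change of pitch); changing the three formant frequencies changes the vowel.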
Musical instruments
The synthetic sounds produced by a simple keyboard synthesiser differ in:
• the relative amplitudes of their harmonics;
• their attack time and decay time.
For most synthesisers the relative amplitudes of the different harmonics stay constant throughout the sound. The sounds produced by a natural musical instrument are much more complex; the different harmonics start and stop at different times and change in relative amplitude throughout the "steady state" of the note. Our ability to distinguish one natural musical instrument from another depends more on the attack (and decay) than on the "steady state". The nature of the attack and the relative amplitudes during the steady state are not constant for a particular instrument: they depend on the style of playing, where in the range of the instrument the note lies, etc.

Auditory Scene Analysis
The ears receive waves from many different sound sources at the same time, e.g. multiple talkers, or instruments, cars, machinery, etc. In order to recognise the pitch and timbre of the sound from a particular source, the brain must decide which frequencies "belong together" and have come from that source. The problem is formally similar to that of "parsing" a visual scene into separate objects. The principles enunciated by the Gestalt psychologists for vision are useful heuristics for deciding what sounds will be grouped together: proximity, similarity, good continuation and common fate all have auditory analogues.
The brain needs to group simultaneously (separating out which frequency components present at a particular time have come from the same sound source) and also successively (deciding which group of components at one time is a continuation of a previous group).

Auditory streaming
Auditory streaming is the formation of perceptually distinct apparent sound sources. Temporal order judgement is good within a stream but bad between streams. Examples include:
• implied polyphony;
• a noise burst replacing a consonant in a sentence;
• a click superimposed on a sentence or melody.

Grouping principles
(i) Proximity
• Tones close in frequency will group together, so as to minimise the extent of frequency jumps and the number of streams.
• Tones with similar timbre will tend to group together.
• Speech sounds of similar pitch will tend to be heard as coming from the same speaker.
• Sounds from different locations are harder to group together across time than those from the same location.
(ii) Common fate
Sounds from a common source tend to start and stop at the same time, and to change in amplitude or frequency together (vibrato). A single component is easy to hear out if it is the only one to change in a complex.
(iii) Good continuation
Abrupt discontinuities in frequency or pitch can give the impression of a different sound source.

Continuity effect
A sound that is interrupted by a noise that masks it can appear to be continuous. Alternations of sound and mask can give the illusion of continuity, with the auditory system interpolating across the mask.

Music perception
Tuning
Consonant intervals have harmonics that do not beat together to give roughness, i.e. they lie at small-integer frequency ratios: 2:1 (octave), 3:2 (fifth), 4:3 (fourth), 5:4 (major third). Unfortunately, a scale based on such intervals is not internally consistent and does not allow modulations. Equal temperament sacrifices some consonance in the primary intervals for an equal size of semitone (a frequency ratio of 2^(1/12)), and so sounds equally in tune in any key. A worked comparison follows below.
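Here is a short worked comparison of the just intervals above with their equal-tempered approximations; the cent measure (1 cent = 1/100 of an equal-tempered semitone) is standard but not defined in the notes.

    import numpy as np

    just = {'octave': 2/1, 'fifth': 3/2, 'fourth': 4/3, 'major third': 5/4}
    semitone = 2 ** (1/12)                  # the equal-tempered semitone ratio
    steps = {'octave': 12, 'fifth': 7, 'fourth': 5, 'major third': 4}

    for name, ratio in just.items():
        et = semitone ** steps[name]
        cents = 1200 * np.log2(et / ratio)  # deviation of equal temperament from just
        print(f"{name}: just {ratio:.4f}, equal-tempered {et:.4f}, error {cents:+.1f} cents")

The octave comes out exact, the fifth and fourth are out by about 2 cents, and the major third by about 14 cents - the consonance that equal temperament sacrifices.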
Absolute pitch
About 1 person in 10,000 has "absolute pitch" - they can identify the pitch of a musical note without the use of an external reference pitch. Most people can only give pitch names relatively - "if that is A, this must be C". Absolute pitch is much more common in people who had musical training at an early age than among those who started later, and is probably more common in those whose early training involved learning the names of notes. It can be a liability, since pitch perception can change as you grow older, and international pitch standards also change. A more common absolute ability is being able to tell when a piece of music is played in the correct key.

Melody
The pitch of a tone can be regarded as having chroma (musical note name) and height (which octave it is in). Melodies are hard to recognise if only chroma is maintained (i.e. if notes are transposed by octaves). Overall contour is an important attribute of melody, and allows variation of chroma within a recognisable framework.