What can we hear?
James D. Johnston
home.comcast.net/~retired_old_jj
3/23/2002. Copyright James D. Johnston 2003. Permission granted for any educational use.

How do our ears work? And what can we detect in a natural soundfield?

How One Ear Works - Short Form!
We aren’t going to talk about binaural hearing today. (This is the short form of what should occupy a semester’s examination and discussion.)
The ear is usually broken into 3 separate parts: the outer, middle, and inner ears.
The outer ear consists of the head, the pinna, and the ear canal.
The middle ear consists of the eardrum, the 3 small bones, and the connection to the cochlea.
Finally, the inner ear consists of the cochlea, containing the organ of Corti, basilar membrane, tectorial membrane, and the associated fluids and spaces.

The Outer Ear
The outer ear provides frequency-dependent directivity via shadowing, shaping, diffraction, and the like. It is different (by enough to matter) for different individuals, but can be summarized by the “Head Related Transfer Functions” (HRTF’s) or “Head Related Impulse Responses” (HRIR’s) mentioned in the literature, at least on the average or for a given listener.
The HRTF’s or HRIR’s are ways of determining the effect on a sound coming from a given direction to a given ear.
The ear canal inserts a resonance, 1 octave or so wide, at about 1 to 4 kHz depending on the individual.

The Middle Ear
The middle ear carries out several functions, the most important of which, for levels and frequencies that are normally (or wisely) experienced, is matching the impedance of the air to the fluid in the cochlea.
There are several other functions related to overload protection and such, which are not particularly germane under comfortable conditions.
The primary effect of the middle ear is to provide a 1-zero high pass function, with a matching pole at approximately 700Hz or so, depending on the individual.

The Inner Ear
A complicated subject at best, the inner ear can be thought of as having two membranes, each a travelling wave filter, one a high-pass, and the other a low-pass filter. Between the two membranes are two sets of hair cells, the inner hair cells and the outer hair cells.
The inner hair cells are primarily detectors. They fire when the movements of the two membranes differ.
The outer hair cells are primarily a system that controls the exact points of the very steep low pass filters and high pass filters. The outer hair cells can polarize and depolarize, and change both their length and stiffness. This polarization is how they affect the relative tunings of the two membranes.

[Figure: two panels, “Outer Hair Cells Fully Polarized” and “Outer Hair Cells Fully Depolarized”; axis: frequency]

[Figure: An example (not a human subject)]

The exact magnitude and shape of those curves are under a great deal of discussion and examination, but it seems clear that the polarization of the outer hair cells creates the compression exhibited in the difference between applied intensity (the external power) and the internal loudness (the actual sensation level experienced by the listener).
There is at least 60dB of compression available.
Fortunately, the shape of the resulting curve does not change very much, except at the tails, between the compressed and uncompressed state, leading to a set of filter functions known as the cochlear filters.

Critical Bands and Cochlear Filters
The overall effect of this filter structure is time/frequency analysis of a particular sort, called critical band (Bark Scale) or equivalent rectangular bandwidth (ERB) filter functions.
Note that this is not a discrete set of filters, but rather a continuum of filters, with bandwidths that vary with the center frequency. Roughly speaking, critical bandwidths are about 100Hz up to 700Hz, and 1/3 octave thereafter. ERB’s are usually a bit narrower, especially at higher frequencies.

A discussion of which is right, and which should be used, is, by itself, well beyond the range of a one-hour seminar.
The basic point that must come out of this discussion is that the sound arriving in an ear will be analyzed in something approximating 100Hz bandwidth filters at low frequencies, and at something like 1/3 octave bandwidths at higher frequencies, and that the system will detect either the signal waveform itself (below 500Hz) or the signal envelope (above 4000 Hz), or a bit of both (in the range between 500Hz and 4000 Hz).
Exactly what is detected is likewise, by itself, well beyond a one-hour seminar, and furthermore, a consensus is yet to emerge.

For a given cochlear filter bandwidth, there is a corresponding time width of the main lobe of the filter. For the auditory system, these filter lengths vary approximately by a factor of 40:1, from the range of 10 milliseconds down to 0.25 milliseconds.
This means that at low frequencies, the time resolution available to the ear is quite poor, but that at high frequencies, it is quite accurate, on the order of a dozen or so samples at 48kHz.
Over any time extent longer than this, the ear, because of its compression effects, cannot be considered a linear transducer.
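The time widths quoted above can be turned into sample counts with a little arithmetic. The following is only a minimal sketch: the 10 millisecond and 0.25 millisecond figures and the 48kHz rate come from the slide, while the function and variable names are hypothetical and chosen for illustration.

```python
# Sketch: express the cochlear-filter time widths quoted above as sample counts.
# The 10 ms / 0.25 ms widths and the 48 kHz rate are taken from the slide text;
# all names here are hypothetical, introduced only for this illustration.

def window_in_samples(width_ms: float, sample_rate_hz: int = 48_000) -> int:
    """Return the number of samples spanned by a window of width_ms."""
    return round(width_ms * sample_rate_hz / 1000)

LONGEST_MS = 10.0    # approximate main-lobe width at the low-frequency end
SHORTEST_MS = 0.25   # approximate main-lobe width at the high-frequency end

longest = window_in_samples(LONGEST_MS)
shortest = window_in_samples(SHORTEST_MS)
ratio = LONGEST_MS / SHORTEST_MS

print(f"longest window:  {longest} samples")
print(f"shortest window: {shortest} samples")
print(f"ratio: {ratio:.0f}:1")
```

Under these assumptions the shortest window comes out at 12 samples, which matches the "dozen or so samples at 48kHz" in the text, and the ratio of the two widths is the quoted 40:1.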
This can create problems, such as pre-echo, in filterbanks or even in simple filters under some situations.

[Figure: two panels, “2.25kHz filter” and “750Hz filter”]

Schematic Cochlea
[Figure: block diagram with the labels Audio In; Filters (HF to LF); Detectors; Auditory Nerve; CNS; Feedback (marked twice)]

How about the detectors?
• Below 500Hz, the detectors fire on the positive-going edge of the filtered waveform.
• Above 2kHz, the detectors fire synchronously with the ENVELOPE of the filtered waveform.
• Between 500Hz and 2kHz, the detectors function on a mix of the two mechanisms.

What does this mean, in practical terms?
• Below 500Hz, distorting the waveform itself, and moving zero-crossings of the filtered waveform (due to distortion, phase shifts, etc.) will be audible.
• Above 2kHz, the same effects happen on the signal envelope. Again, phase shifts can radically change the signal envelope, as can distortions.
• Between 500Hz and 2kHz, both mechanisms will operate to some extent, with each favored toward its end of the frequency spectrum.

So?
• At low frequencies, don’t change zero crossings or the signal waveform.
• At high frequencies, don’t change the signal envelope.
• Things like jitter, distortions, and phase shift can cause either of these problems.

What are the hard level limits?
• The atmosphere, due to the discrete nature of air molecules, has a noise level. At the eardrum, it is approximately white noise at a level of 6dB SPL.
• The ear’s lowest detection level is about -6dB SPL, which nearly matches the energy in the critical band near the ear canal resonance due to basic atmospheric noise.

[Figure: Fletcher’s loudness plot (from Fletcher)]

What about the loud end of things?
• Anything above 120dB SPL is bad for the auditory system.
• Anything above 140dB SPL is in a regime where the atmosphere is very nonlinear. Some signals (percussion, natural sounds, shuttle takeoffs) may reach these levels.
• About 70-80dB of instantaneous dynamic range across frequency within a 20 millisecond period is approximately the largest spectral tilt that is audible.

What does extreme loudness mean?
• A sine wave at 194dB SPL swings from zero to two atmospheres of pressure. This cannot be physically realized.
• Above that level, the proper term is “shock wave”, as the air is propagating in a very nonlinear fashion.
• 32 bits of uniform PCM dynamic range takes us from the noise level of the atmosphere (6dB SPL) to 198dB SPL, or 4dB above 1 atmosphere. This level is usually experienced in catastrophic military situations.

What about high and low frequencies?
• Frequencies in the lowest audio octave are sensed substantially by the body. The hearing apparatus has a high-pass filter, which is fortunate, because otherwise the “weather” would be deafeningly loud.
• 20kHz is not a firm “cutoff” for human hearing. Children appear to hear above 20kHz, as do some teens who haven’t been noise-exposed.
• Age and noise exposure reduce high-frequency hearing ability.
• At high power levels, ultrasonic signals are perceived on the skin. These levels are approached in sonar and the like; however, the only musical occurrences may be from percussion, and at close range.

What about nonlinear effects?
• The ear analyzes on a time-scale much like that of the cochlear filters. If a signal, or a signal-processing operation, extends over a time longer than the shortest cochlear filter, the effects of the nonuniform time/frequency scaling and detection must be considered.
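The level figures quoted in the slides on extreme loudness can be checked with a few lines of arithmetic. This is a rough sketch, not the author's calculation: it assumes the usual 20 micropascal reference for 0dB SPL and a standard atmosphere of 101325 Pa, and it treats the one-atmosphere figure as the peak pressure of the sine wave, which is how the slide describes it (an RMS convention would read about 3dB lower). All names are hypothetical.

```python
import math

# Assumptions (not stated in the slides): 0 dB SPL reference of 20 micropascals
# and a standard atmosphere of 101325 Pa. The 6 dB SPL atmospheric noise floor
# is the figure given in the slides.
P_REF_PA = 20e-6
ONE_ATM_PA = 101_325.0
NOISE_FLOOR_DB_SPL = 6.0

def spl_db(pressure_pa: float) -> float:
    """Level in dB re 20 uPa for the given pressure (peak value used here)."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)

def pcm_range_db(bits: int) -> float:
    """Ratio between full scale and one step of uniform PCM, in dB."""
    return 20.0 * math.log10(2.0 ** bits)

one_atm_db = spl_db(ONE_ATM_PA)          # sine with a peak of one atmosphere
range_32 = pcm_range_db(32)              # roughly 6.02 dB per bit
top_db = NOISE_FLOOR_DB_SPL + range_32   # top of a 32-bit range above the floor

print(f"1 atm peak:        {one_atm_db:.1f} dB SPL")
print(f"32-bit PCM range:  {range_32:.1f} dB")
print(f"floor + range:     {top_db:.1f} dB SPL")
```

Under these assumptions the one-atmosphere level comes out near 194dB SPL and the top of a 32-bit range starting at 6dB SPL lands a little over 198dB SPL, in line with the figures on the slide.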
[Figure: An extreme example, pre-echo in audio codecs]

[Figure: A potential, but unproven, issue with pre-echo]

Some conclusions
• Audible effects must be considered as analyzed by critical band filters. These filters determine both time and frequency sensitivity to artifacts.
• Altering the waveform at low frequencies, or the signal envelope at high frequencies, will create audible differences.
• 0dB SPL is a more than reasonable minimum level for presentations. More low-level response is only useful before the ear is involved.
• Recording engineers may meet levels peaking above 150dB or so, but such levels may not be either accurately recordable or reproducible.
• 20kHz is a reasonable limit for adult human beings, but is not a “hard limit”. A young individual may be able to hear above 20kHz. Other sensory modes are not generally active at high frequencies at levels that we hope to be exposed to.
• A variety of nonlinear effects may create audible differences due to small time or frequency changes in signals.
• In general, the farther removed from the original frequency that an artifact occurs, the more audible it will be, if it creates sensation or changes sensation at a point where signal energy on the basilar membrane is lower.
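The first conclusion leans on the critical band filters, for which the talk gives only a rule of thumb: about 100Hz wide up to 700Hz, and 1/3 octave thereafter. The sketch below is a literal transcription of that rule and nothing more. The function name is hypothetical, and the 1/3-octave width is taken, as an assumption, to be the distance between band edges one sixth of an octave either side of the center frequency.

```python
# Sketch of the rule of thumb quoted in the talk: roughly 100 Hz wide up to
# 700 Hz, and 1/3 octave above that. This is NOT a Bark or ERB formula.
# Assumption: a 1/3-octave band runs from f * 2**(-1/6) to f * 2**(1/6).

CROSSOVER_HZ = 700.0
LOW_BANDWIDTH_HZ = 100.0
THIRD_OCTAVE_FRACTION = 2 ** (1 / 6) - 2 ** (-1 / 6)  # about 0.232 of center

def rough_critical_bandwidth_hz(center_hz: float) -> float:
    """Approximate critical bandwidth in Hz at the given center frequency."""
    if center_hz <= 0:
        raise ValueError("center frequency must be positive")
    if center_hz <= CROSSOVER_HZ:
        return LOW_BANDWIDTH_HZ
    return center_hz * THIRD_OCTAVE_FRACTION

for f in (100.0, 500.0, 1000.0, 4000.0, 10000.0):
    print(f"{f:7.0f} Hz -> about {rough_critical_bandwidth_hz(f):6.0f} Hz wide")
```

Taken literally, the rule steps from 100Hz to about 162Hz at the 700Hz crossover, which is a reminder of how coarse it is; the cochlear filters themselves form a continuum, as the talk notes.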