PAPERS

On the Naturalness of Two-Channel Stereo Sound*

GÜNTHER THEILE

Institut für Rundfunktechnik GmbH, D-8000 München 45, Germany

Psychoacoustic principles are considered in order to enhance the naturalness of the sound image achievable in a conventional two-loudspeaker arrangement. It is found that simulation of depth and space is lacking when the coincident-microphone and panpot techniques are applied. To obtain optimum simulation of spatial perspective it is important for the two loudspeaker signals to have interaural correlation that is as natural as possible. This requirement is met by the so-called sphere microphone, used as a main microphone, associated with the room-related balancing technique, which generates artificial reflections and reverberation from spot-microphone signals. Music recordings confirm that the sphere microphone combines favorable imaging characteristics with regard to spatial perspective, accuracy of localization, and sound color, and that the room-related balancing technique is able to preserve this stereophonic quality.

0 INTRODUCTION

A particularly large number of studies have been published during the last few years with the goal of improving the capabilities of current stereophony. This applies to microphone techniques as well as to mixing and reproduction techniques, and major overall progress can be expected. In this paper possible developments in stereophonic recording technique are described, which may improve the "naturalness" of the stereophonic sound image in the playback room.

First, how can the desired naturalness of the stereophonic sound image be defined? The simplest definition would be: the reproduced sound image must correspond to the original sound image. This definition is problematic because identity cannot, in principle, be required as a goal for optimizing the stereophonic transmission technique.
Identity may conceivably be appropriate for head-referred stereophony, or perhaps for the reproduction of a speaker's voice through loudspeakers, but it is probably appropriate only to a limited extent for the reproduction of the sound of a large orchestra through loudspeakers. Aesthetic irregularities in the orchestra, poor conditions of room acoustics, as well as the necessity of creating a sound image "suitable for a living room" (in other words, the essential problems of loudspeaker stereophony) actually force a deviation from identity.

The desired natural stereophonic sound image should therefore meet two requirements: it should satisfy aesthetically and, at the same time, it should match the tonal and spatial properties of the original sound. These two requirements will undoubtedly be contradictory in many situations. However, the compromise, namely optimization by the sound engineer, will be better the more flexible the stereophonic recording technique is and the more accurately the psychoacoustic principles are understood and taken into account from the technical and artistic points of view.

1 STEREOPHONIC IMAGING OF SPACE

Which stereophonic loudspeaker signals does the ear require so that a natural sound image is achieved? What kind of quality of stereophonic presentation of direction, distance, and spatial impression¹ is possible, in principle, in the case of conventional two-channel loudspeaker reproduction?

* Presented at the AES 9th International Conference, Detroit, MI, 1991 February 1-2. J. Audio Eng. Soc., Vol. 39, No. 10, 1991 October.

¹ The term spatial impression comprises two attributes of the sound image [1], [2]: (1) reverberance (a temporal slurring of auditory events [2]), which is caused by late reflections and reverberation, and (2) auditory spaciousness (a spatial spreading of auditory events [2]), which is caused by early reflections in the range of 10-80-ms delay.
The following fundamental statements have been derived in earlier papers by means of the association model [3] for spatial hearing.

1) The distance of the phantom sound source [2] is equal to the (mean) distance from the two stereo loudspeakers. The spatial perspective can only be represented in the simulation plane between the loudspeakers, in a manner similar to the perspective presentation in the visual area [4] (see Fig. 1). The real distance from the loudspeakers corresponds to the real distance from the picture.

2) The spatial perspective in the simulation plane is achieved better the more accurately the interaural signal differences during natural listening are imitated by the loudspeaker signal differences [4]. Due to an inverse filtering process postulated in [3]-[5], the auditory system recognizes the relations between the left and right loudspeaker signals independently of the binaural crosstalk and evaluates them according to the listening experience.

Thus, in principle, optimum presentation of direction, distance, and space in the simulation plane is made possible by the stereophonic signal differences generated by a dummy head [4]. A dummy-head signal which produces the head-referred three-dimensional perception of space during headphone reproduction generates, during loudspeaker reproduction,² an equivalent loudspeaker-referred presentation of the spatial perspective in the simulation plane, which is comparable to the spatial perspective of a picture.

To verify this important statement, suitable experiments can be carried out. When the dummy-head signals are compared in a listening test with stereophonic signals which do not provide the head-specific interaural signal differences with sufficient accuracy, a relatively high degree of sensitivity of the ear to such interaural "irregularities" is noted during playback through headphones. The quality of the perceived spatial image suffers in some way and to some degree.
The result of the same listening comparison during playback through loudspeakers is both surprising and impressive: the quality of the perceived spatial image (in the simulation plane) suffers in a similar way and to almost the same degree. Two examples are shown in Fig. 2.

1) A dummy-head recording of a sound source [such as a loudspeaker located on the right side of the dummy head; see Fig. 2(a)] will produce a corresponding image on the right side of the headphone listener and, in the case of loudspeaker reproduction, a sharp image located close to the right loudspeaker, due to the maximum magnitudes of the interaural signal differences of the dummy head (case A). When the maximum natural interaural time difference of 0.74 ms is enlarged to the unnatural value of about 1 ms by means of a delay device (case B), the stereophonic quality drops distinctly. This is true even in the case of playback through loudspeakers. The sound event appears more blurred and vague in the case of the unnatural interaural time difference of 1 ms.

2) When a dummy-head recording A and a coincident-microphone recording B of an orchestra are compared by means of headphones, the superiority of the dummy head is obvious. The dummy-head signal produces a head-referred natural spatial impression, but the coincident-microphone signal produces a poor spatial impression. It is important that a corresponding stereophonic quality difference can be observed in the case of loudspeaker reproduction: the dummy-head signal generates a loudspeaker-referred presentation of the spatial perspective in the simulation plane (according to Fig. 1), but the coincident-microphone signal produces a flat distribution of sound sources between the two loudspeakers in front of the listener without simulating spatial perspective.
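To make the time-difference figures in the first example more concrete, the following minimal sketch estimates the interaural time difference (ITD) produced by a rigid sphere and checks whether a given ITD lies outside the natural range. The Woodworth approximation, the sphere radius, and the speed of sound used here are assumptions chosen for illustration and are not taken from the paper; only the 0.74 ms and 1 ms values come from the text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def sphere_itd(azimuth_deg, radius_m=0.0875):
    """Approximate ITD in seconds for a rigid sphere (Woodworth model).

    azimuth_deg: source angle from straight ahead, 0 to 90 degrees.
    radius_m: sphere radius; 0.0875 m is an assumed average head radius.
    """
    theta = math.radians(azimuth_deg)
    return (radius_m / SPEED_OF_SOUND) * (theta + math.sin(theta))


def exceeds_natural_range(itd_s, max_natural_s=0.74e-3):
    """True if the ITD is larger than the maximum natural value (0.74 ms in the text)."""
    return abs(itd_s) > max_natural_s


lateral_itd = sphere_itd(90.0)                 # fully lateral source
case_b_unnatural = exceeds_natural_range(1.0e-3)  # case B: about 1 ms
```

With these assumed parameters a fully lateral source stays inside the natural range, whereas the artificially enlarged 1 ms delay of case B falls outside it.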
The coincident-microphone signal, which does not provide any head-specific interaural signal difference, fails not only in generating a head-referred presentation of the authentic spatial impression and depth, but also in generating a loudspeaker-referred simulation of the spatial impression and depth.

In summary, loudspeaker stereophony according to the association model is based on introducing corresponding physical attributes of the ear signals (which correlate with phenomena of natural spatial hearing) into the stereophonic signals [4]. This is contrary to summing localization theories, which attempt to introduce them into the resulting ear signals of the listener. Even today, attempts are made to assess stereophonic techniques on the basis of summing localization theories (see, for example, [7], [8]).

Fig. 1. The distance of this picture can be compared with the distance of stereo loudspeakers. The visual perspective, which is simulated by applying phenomena of spatial vision, can be compared to the stereophonic perspective, which can be simulated by applying corresponding phenomena of spatial hearing.

² Loudspeaker reproduction does not include the technique of biphonal reproduction, that is, loudspeaker reproduction techniques that aim to simulate headphone reproduction by compensating the interaural crosstalk portions (a survey is found in [6]). The biphonal reproduction methods cannot be considered as a possibility to improve the capability of loudspeaker stereophony because the listening area is always minimal.
As a recent example, Lipshitz has concluded that coincident-microphone techniques are most advantageous for getting a natural spatial impression [7]:

"I believe that spaced-microphone recording techniques are fundamentally flawed, although highly regarded in some quarters, and that coincident-microphone recordings are the correct way to go."

His arguments are based on an analysis of the resulting interaural characteristics of the listener's ear signals, according to the principle of summing localization:

"The level and time (or phase) differences at the listener's ears are not the same as those at the loudspeakers .... It is important that, as far as possible, the two loudspeaker signals combine at the listener's ears to produce cues which are compatible with natural hearing."

However, natural interaural attributes of the listener's ear signals can only be obtained by using the dummy-head technique (head-referred imaging). In contrast, conventional two-loudspeaker stereophony is a loudspeaker-referred imaging technique, and it is important that, as far as possible, the two loudspeaker signals contain natural interaural attributes, rather than the resultant listener's ear signals in the playback room. This requirement is not met by pure intensity or time stereophony, and the stereophonic quality is not advantageous with respect to depth and space imaging (intensity stereophony) or localization (time stereophony) in comparison to dummy-head, OSS, or ORTF techniques, as found in practical comparison tests on the performance of the main microphones in different concert halls [9].

On this basis, there are possibilities for optimizing the stereophonic presentation of direction, distance, and spatial impression through two loudspeakers. The so-called sphere microphone [10] and the room-related balancing technique [11]-[13] represent appropriate optimization approaches for the recording end.

Fig. 2. Two examples for demonstrating stereophonic quality differences in headphone and loudspeaker reproduction. (a) Sound source recorded with a dummy head. (b) Orchestra recorded with a dummy head and with a coincidence microphone.

2 SPHERE MICROPHONE

In [10] a microphone system has been proposed where two boundary-layer microphones are placed on the sides of a sphere, as shown in Fig. 3. This so-called sphere microphone produces stereophonic signals which are composed of natural interaural differences, quite similar to dummy-head signals, as required in Sec. 1. However, in contrast to the dummy head, it has a linear frontal frequency response (see Fig. 4, upper curve). The sphere-microphone signal does not contain those dummy-head-specific spectral cues which are used for front-back orientation during headphone listening [2], [5] (Fig. 4, lower curve), but which are not used during loudspeaker-referred presentation and would therefore cause coloration problems [10].

The frequency responses of the sphere microphone Schoeps KFM 6U are plotted in Fig. 5. They are linear for sound reaching the sphere from the front (0°), and the sum of left and right energy is frequency independent for sound sources moving toward the side. Also, the frequency response is linear for the integrated resultant of sounds reaching the sphere from any angle in the reverberant (diffuse) field [3]. The choice of pressure capsules sharply reduces sensitivity to air motion, while ensuring frequency response to the lower limit of human hearing (see Fig. 5).

The sphere microphone fuses advantageous features of the systems already in general use to produce a stereo image of outstanding naturalness and spatial integrity, combined with excellent sound-color neutrality and low-frequency response.

At present the sphere microphone Schoeps KFM 6U (Fig. 6) is being tested in different situations and compared with other main microphones. First results confirm that the sphere microphone in fact combines favorable imaging characteristics with respect to spatial perspective, accuracy of localization, and sound color.

Fig. 3. Principal function of sphere microphone (frontal incidence and lateral incidence).

Fig. 4. Frequency response of sphere microphone and dummy head (0° free field minus diffuse field).

Fig. 5. Frequency response of sphere microphone (free field 20°, 40°, 60°). Upper curves: right; lower curves: left; middle curve: sum of left and right energies.

Fig. 6. Sphere microphone KFM 6U (Schoeps), suspended.

3 ROOM-RELATED BALANCING

Theoretical considerations [11] and practical tests [13] show that spot-microphone signals can be added to the sphere-microphone signal without disturbing the perception of spatial perspective.

A disturbing effect occurs in the case of conventional panpot balancing, as demonstrated in Fig. 7. The signal picked up by a spot microphone is reproduced earlier than the corresponding main-microphone signal. Thus the ear interprets the spot-microphone signal as the direct sound [11], [14], and due to the precedence effect, the favorable imaging characteristics of the sphere microphone (or any appropriate main microphone) are lost. Such recordings sound flat, without spatial depth.

It is common practice to moderate this space-disturbing effect by artificial reverberation or by compensating the delay of the main-microphone signal (see, for example, [14]). However, those techniques are not satisfactory, because they do not improve the stereophonic quality of the direct sound. In practice a delay compensation leads to "notching" effects, which are particularly disturbing when the musicians move about near the spot microphone. Experiments have shown [13] that pure panpot balancing can even be preferred in comparison to delay-compensated panpot balancing, depending on the recording situation and the desired balancing gain.³

To preserve the perception of spatial perspective due to the main-microphone signal and, at the same time, achieve a high balancing gain, the spot-microphone signal should be delayed much more than necessary for the compensation, so as to fall within the region of the early reflections. It has been proposed to achieve the desired increase in volume by adding sound energy from artificially generated reflections [11]. This so-called room-related balancing technique has been tested and optimized recently [13] with the aid of an appropriate audio processing unit [15]. It was found that a loss in depth can be avoided satisfactorily by generating just two artificial reflections from the spot-microphone signal (according to Fig. 8), and that the greater the required balancing gain, the more favorable the room-related balancing technique is.

In principle, the room-related balancing algorithm could be implemented in digital mixing desks so that it could be used as an alternative to conventional panpot balancing. However, further optimization is useful first, to minimize the signal processing effort and to introduce improvements such as distance equalization (taking into account changes in spectrum, such as by absorption at the room boundaries), additional artificial reverberation (generated from the spot-microphone signal in accordance with the artificial reflections), and so on.

Fig. 7. Panpot balancing: main-microphone signal plus spot-microphone signal without time delay. (Level in dB versus delay of reflections in ms; direct sound from the spot microphone, followed by direct sound, first reflections, and reverberation from the main microphone.)

Fig. 8. Room-related balancing: main-microphone signal plus artificial reflections and reverberation. (Level in dB versus delay of the reflections in ms; direct sound, artificial reflections 1 and 2, and artificial reverberation.)

³ Balancing gain is the level of the balancing signal with reference to the balancing signal's threshold level, which is measured at the threshold of perception of the balancing signal in the main-microphone signal.

4 SUMMARY

It can be concluded that current two-channel stereophonic recording techniques can be improved with regard to the naturalness of the stereophonic sound image as defined. A consistent consideration of new knowledge and understanding of psychoacoustic principles, particularly with regard to spatial hearing, leads to the principal result that the two-channel stereo presentation of direction, distance, and space is only possible as a presentation of spatial perspective in the simulation plane between the loudspeakers. This has the effect of creating a natural two-dimensional image of a three-dimensional space.

We consider that an optimization of the techniques for simulating spatial perspective is achievable by using natural interaural signal differences instead of pure intensity or time differences. The room-related recording technique is consequently based on this knowledge. This technique can be applied to the well-tried main/spot-microphone methods, which means that the optimum main microphone is the sphere microphone or a corresponding microphone generating natural interaural signal differences, and that any spot-microphone signal added to the main-microphone signal represents additional artificial reflections from the recording room.

Since the generation of natural interaural signal differences and the simulation of artificial reflections and reverberation from a spot-microphone signal are possible with the aid of modern computer technology, a room-related recording technique can also be applied in principle to polymicrophony. This would result in a stereophonic presentation of any artificially created space in the simulation plane.

5 REFERENCES

[1] W. Kuhl, "Räumlichkeit als Komponente des Höreindrucks" (Spaciousness as Component of the Auditory Impression), Acustica, vol. 40, pp. 167-181 (1978).
[2] J. Blauert, Spatial Hearing--The Psychophysics of Human Sound Localization (M.I.T. Press, Cambridge, MA, 1983).
[3] G. Theile, "Zur Theorie der optimalen Wiedergabe von stereofonen Signalen über Lautsprecher und Kopfhörer" (On the Theory of Optimum Reproduction of Stereophonic Signals through Loudspeakers and Earphones), Rundfunktechn. Mitt., vol. 25, pp. 155-170 (1981).
[4] G. Theile, "On the Stereophonic Imaging of Natural Spatial Perspective via Loudspeakers: Theory," in Perception of Reproduced Sound (1987), pp. 135-146.
[5] G. Theile, "On the Standardization of the Frequency Response of High Quality Studio Headphones," J. Audio Eng. Soc., vol. 34, pp. 956-969 (1986 Dec.).
[6] G. Steinke, "Stand und Entwicklungstendenzen der Stereofonie" (Current Situation and Development Trends in Stereophony), pts. 1 and 2, Tech. Mitt. RFZ, vol. 28, pp. 1-10, 25-32 (1984).
[7] S. P. Lipshitz, "Stereo Microphone Techniques . . . Are the Purists Wrong?" J. Audio Eng. Soc. (Features), vol. 34, pp. 716-744 (1986 Sept.).
[8] J. C. Bennett, K. Barker, and F. O. Edeko, "A New Approach to the Assessment of Stereophonic Sound System Performance," J. Audio Eng. Soc., vol. 33, pp. 314-321 (1985 May).
[9] M. Wöhr and B. Nellesen, "Untersuchungen zur Wahl des Hauptmikrofonverfahrens" (Studies on the Selection of the Main Microphone Method), in Proc. 14th Audio Engineers' Conf., pp. 106-120 (1986).
[10] G. Theile, "Das Kugelflächenmikrofon" (The Sphere Microphone), in Proc. 14th Audio Engineers' Conf., pp. 277-293 (1986).
[11] G. Theile, "Hauptmikrofone und Stützmikrofone--neue Gesichtspunkte für ein bewährtes Aufnahmeverfahren" (Main Microphone and Spot Microphones--New Aspects for a Proven Recording Technique), in Proc. 13th Audio Engineers' Conf., pp. 170-184 (1984).
[12] M. Wöhr, J. Goeres, C. Pösselt, and G. Theile, "Raumbezogene Stütztechnik--Möglichkeit zur Optimierung der Aufnahmequalität" (Room-Related Supporting Technique--Possibility for Optimizing the Recording Quality), in Proc. 15th Audio Engineers' Conf., pp. 302-315 (1988).
[13] M. Wöhr, G. Theile, H. J. Goeres, and A. Persterer, "Room-Related Balancing Technique--A Method for Optimizing Recording Quality," J. Audio Eng. Soc., vol. 39, pp. 623-631 (1991 Sept.).
[14] J. Jecklin, Musikaufnahmen: Grundlagen, Technik, Praxis (Music Recordings: Foundations, Technique, Practice) (Franzis-Verlag, Munich, 1980), p. 55.
[15] F. Richter and A. Persterer, "Design and Applications of a Creative Audio Processor," presented at the 86th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 37, p. 398 (1989 May), preprint 2782.

The biography for Günther Theile was published in the 1991 September issue of this Journal.
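The room-related balancing idea of Sec. 3 (adding the spot-microphone signal only as delayed, attenuated artificial reflections rather than as an early direct sound) can be illustrated with a minimal sketch. The function name, the delays of 15 ms and 25 ms, the gain of -6 dB, the left/right placement, and the constant-power pan law are all hypothetical choices made for this illustration; the paper states only that two artificial reflections within the early-reflection region were found sufficient, and the processing actually used in [13], [15] is not reproduced here.

```python
import math


def add_artificial_reflections(main_left, main_right, spot, sample_rate, reflections):
    """Mix delayed, attenuated copies of a spot signal into a stereo main signal.

    main_left, main_right: main-microphone channels (sequences of equal length).
    spot: spot-microphone signal.
    reflections: iterable of (delay_ms, gain_db, pan) tuples, pan in [-1, 1]
        (-1 = left, +1 = right). All values are illustrative assumptions.
    The undelayed spot signal itself is deliberately not added.
    """
    out_left = list(main_left)
    out_right = list(main_right)
    for delay_ms, gain_db, pan in reflections:
        delay = round(delay_ms * sample_rate / 1000.0)  # delay in samples
        gain = 10.0 ** (gain_db / 20.0)                 # dB to linear amplitude
        angle = (pan + 1.0) * math.pi / 4.0             # constant-power pan law
        g_left = gain * math.cos(angle)
        g_right = gain * math.sin(angle)
        for n, sample in enumerate(spot):
            k = n + delay
            if k >= len(out_left):
                break  # reflection runs past the end of the main signal
            out_left[k] += g_left * sample
            out_right[k] += g_right * sample
    return out_left, out_right


# Usage: a unit impulse on the spot channel, silent main signal, 1 sample per ms.
fs = 1000
spot = [1.0] + [0.0] * 99
silence = [0.0] * 100
left, right = add_artificial_reflections(
    silence, silence, spot, fs,
    [(15.0, -6.0, -1.0), (25.0, -6.0, 1.0)],  # hypothetical reflection pattern
)
```

In this example the impulse reappears only at 15 ms in the left channel and at 25 ms in the right channel, so the main-microphone signal keeps the role of the first-arriving sound. Distance equalization and artificial reverberation, which the text names as further improvements, are left out of the sketch.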
