PAPERS
On the Naturalness of Two-Channel Stereo Sound*
GÜNTHER THEILE
Institut für Rundfunktechnik GmbH, D-8000 München 45, Germany
Psychoacoustic principles are considered in order to enhance the naturalness of the
sound image achievable in a conventional two-loudspeaker arrangement. It is found
that the simulation of depth and space is lacking when the coincident-microphone and
panpot techniques are applied. To obtain optimum simulation of spatial perspective it
is important for the two-loudspeaker signals to have interaural correlation that is as
natural as possible. This requirement is met by the so-called sphere microphone, used
as a main microphone, associated with the room-related balancing technique, which
generates artificial reflections and reverberation from spot-microphone signals. Music
recordings confirm that the sphere microphone combines favorable imaging characteristics
with regard to spatial perspective, accuracy of localization, and sound color; and that
the room-related balancing technique is able to preserve this stereophonic quality.
0 INTRODUCTION
A particularly large number of studies have been
published during the last few years with the goal of
improving the capabilities of current stereophony. This
applies to microphone techniques as well as to mixing
and reproduction techniques, and major overall progress
can be expected. In this paper possible developments
in stereophonic recording technique are described,
which may improve the "naturalness" of the stereophonic
sound image in the playback room.
First, how can the desired naturalness of the stereophonic sound image be defined? The simplest theorem
would be: the reproduced sound image must correspond
to the original sound image. This definition appears to
be problematic because identity can definitely not be
required, in principle, as a goal for optimizing the stereophonic transmission
technique. Identity may conceivably be appropriate for head-referred stereophony,
or perhaps for the reproduction of a speaker's voice
through loudspeakers, but it is probably appropriate to
a limited extent only for the reproduction of the sound
of a large orchestra through loudspeakers.
Aesthetic irregularities in the orchestra, poor conditions of room
acoustics, as well as the necessity of creating a sound
image "suitable for a living room" (in other words,
the essential problems of loudspeaker stereophony) actually
force a deviation from identity. The desired
natural stereophonic sound image should therefore meet
two requirements: it should satisfy aesthetically and it
should match the tonal and spatial properties of the
original sound at the same time.
* Presented at the AES 9th International Conference, Detroit, MI, 1991 February 1-2.
J. Audio Eng. Soc., Vol. 39, No. 10, 1991 October
Both requirements will undoubtedly be contradictory
in many situations. However, the compromise, namely,
optimization by the sound engineer, will be the better,
the more flexible the stereophonic recording technique
is and the more accurately the psychoacoustic principles
are understood and taken into account from the technical
and artistic points of view.
1 STEREOPHONIC IMAGING OF SPACE
Which stereophonic loudspeaker signals does the ear
require so that a natural sound image is achieved? What
kind of quality of stereophonic presentation of direction,
distance, and spatial impression¹ is possible, in principle,
in the case of conventional two-channel loudspeaker reproduction?
¹ The term spatial impression comprises two attributes of
the sound image [1], [2]: (1) reverberance (a temporal slurring
of auditory events [2]), which is caused by late reflections
and reverberation, and (2) auditory spaciousness (a spatial
spreading of auditory events [2]), which is caused by early
reflections in the range of 10-80-ms delay.
The following fundamental statements have been derived
in earlier papers by means of the association model [3]
for spatial hearing.
1) The distance of the phantom sound source [2] is
equal to the (mean) distance from the two stereo loudspeakers.
The spatial perspective can only be represented in the
simulation plane between the loudspeakers
in a manner similar to the perspective presentation in
the visual area [4] (see Fig. 1). The real distance from
the loudspeakers corresponds to the real distance from
the picture.
2) The spatial perspective in the simulation plane is
achieved better the more accurately the interaural signal
differences of natural listening are imitated by the
loudspeaker signal differences [4]. Due to an inverse
filtering process postulated in [3]-[5], the auditory
system recognizes the relations between the left and
right loudspeaker signals independently of the binaural
crosstalk and evaluates them according to the listening
experience.
Thus, in principle, optimum presentation of direction,
distance, and space in the simulation plane is made
possible by the stereophonic signal differences generated
by a dummy head [4]. A dummy-head signal which
produces the head-referred three-dimensional perception
of space during headphone reproduction generates,
during loudspeaker reproduction,² an equivalent
loudspeaker-referred presentation of the spatial perspective
in the simulation plane, which is comparable to the
spatial perspective of a picture.
To verify this important statement, suitable experiments can be carried out. When the dummy-head signals
are compared in a listening test with stereophonic signals
which do not provide the head-specific interaural signal
differences with sufficient accuracy, a relatively high
degree of sensitivity of the ear to such interaural "irregularities" is noted during playback through headphones. The quality of the perceived spatial image suffers in some way and to some degree. The result of the
same listening comparison during playback through
loudspeakers is both surprising and impressive: the
quality of the perceived spatial image (in the simulation
plane) suffers in a similar way and to almost the same
degree. Two examples are shown in Fig. 2.
1) A dummy-head recording of a sound source [such
as a loudspeaker located on the right side of the dummy
head; see Fig. 2(a)] will produce a corresponding image
on the right side of the headphone listener and, in the
case of loudspeaker reproduction, a sharp image located
close to the right loudspeaker, due to maximum magnitudes of the interaural signal differences of the dummy
head (case A). When the maximum natural interaural
time difference of 0.74 ms is enlarged to the unnatural
value of about 1 ms by means of a delay device (case
B), the stereophonic quality drops distinctly. This is
true even in the case of playback through loudspeakers.
The sound event appears in a more blurred and vague
manner in the case of the unnatural interaural time
difference of 1 ms.
2) When comparing a dummy-head recording A and a
coincident-microphone recording B of an orchestra by
means of headphones, the superiority of the dummy
head is obvious. The dummy-head signal produces a
head-referred natural spatial impression, but the
coincident-microphone signal produces a poor spatial
impression. It is important that a corresponding
stereophonic quality difference can be observed in the case
of loudspeaker reproduction: the dummy-head signal
generates a loudspeaker-referred presentation of the
spatial perspective in the simulation plane (according
to Fig. 1), but the coincident-microphone signal produces
a flat distribution of sound sources between the
two loudspeakers in front of the listener without
simulating spatial perspective.
Fig. 1. The distance of this picture can be compared with
the distance of stereo loudspeakers. The visual perspective,
which is simulated by applying phenomena of spatial vision,
can be compared to the stereophonic perspective, which can
be simulated by applying corresponding phenomena of spatial
hearing.
² Loudspeaker reproduction does not include the technique
of biphonal reproduction, that is, loudspeaker reproduction
techniques that aim to simulate headphone reproduction by
compensating the interaural crosstalk portions (a survey is
found in [6]). The biphonal reproduction methods cannot be
considered as a possibility to improve the capability of
loudspeaker stereophony because the listening area is always
minimal.
The coincident-microphone signal, which does not provide
any head-specific interaural signal difference, fails not only
in generating a head-referred presentation of the authentic spatial
impression and depth, but also in generating a
loudspeaker-referred simulation of the spatial impression
and depth.
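The delay manipulation of the first example (case A versus case B) can be sketched in code. This is only an illustration, not the procedure used in the listening tests: the 48-kHz sampling rate, the function names, the test impulse, and the rounding to whole samples are assumptions. Only the 0.74-ms and 1-ms values are taken from the text.

```python
def ms_to_samples(delay_ms, sample_rate_hz=48000):
    """Convert a delay in milliseconds to the nearest whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000.0)


def delay_channel(samples, delay_ms, sample_rate_hz=48000):
    """Delay one channel by prepending zeros; the output keeps the input length."""
    n = ms_to_samples(delay_ms, sample_rate_hz)
    if n == 0:
        return list(samples)
    return ([0.0] * n + list(samples))[:len(samples)]


# Case A -> case B: the maximum natural interaural time difference of
# 0.74 ms is enlarged to about 1 ms by an additional delay.
NATURAL_MAX_ITD_MS = 0.74
ENLARGED_ITD_MS = 1.0
extra_delay_ms = ENLARGED_ITD_MS - NATURAL_MAX_ITD_MS

# Hypothetical test signal: an impulse in the left (far-ear) channel of a
# recording whose source is on the right side of the dummy head.
left = [1.0] + [0.0] * 99
left_case_b = delay_channel(left, extra_delay_ms)
```

With these assumed values the additional delay of about 0.26 ms corresponds to roughly a dozen samples, which shows how small the change is that degrades the image in case B.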
Summarizing, loudspeaker stereophony according to
the association model is based on introducing corresponding
physical attributes of the ear signals (which
correlate with phenomena of natural spatial hearing)
into the stereophonic signals [4]. This is contradictory
to summing localization theories, which attempt to introduce
them into the resulting ear signals of the listener.
Even today, attempts are made to assess stereophonic techniques
on the basis of summing localization theories (see, for
example, [7], [8]). As a recent example, Lipshitz has
concluded that coincident-microphone techniques are
most advantageous for getting a natural spatial impression [7]:
"I believe that spaced-microphone recording techniques are
fundamentally flawed, although highly regarded in some quarters,
and that coincident-microphone recordings are the correct way to go."
His arguments are based on an analysis of the resulting
interaural characteristics of the listener's ear signal,
according to the principle of summing localization:
"The level and time (or phase) differences at the listener's ears
are not the same as those at the loudspeakers ....
It is important that, as far as possible,
the two loudspeaker signals combine at the listener's
ears to produce cues which are compatible with natural
hearing."
However, natural interaural attributes of the listener's
ear signals can only be obtained by using the dummy-head
technique (head-referred imaging). In contrast,
conventional two-loudspeaker stereophony is a
loudspeaker-referred imaging technique, and it is important
that, as far as possible, the two loudspeaker signals
contain natural interaural attributes rather than the
resultant listener's ear signals in the playback room. This
requirement is not met by pure intensity or time stereophony,
and the stereophonic quality is not advantageous with respect
to depth and space imaging (intensity stereophony)
or localization (time stereophony) in
comparison to dummy-head, OSS, or ORTF techniques,
as found in practical comparison tests on the performance
of the main microphones in different concert halls [9].
On this basis, there are possibilities for optimizing
the stereophonic presentation of direction, distance,
and spatial impression through two loudspeakers. The
so-called sphere microphone [10] and the room-related
balancing techniques [11]-[13] represent appropriate
optimization approaches for the recording end.
2 SPHERE MICROPHONE
In [10] a microphone system has been proposed where
two boundary-layer microphones are placed on the sides
of a sphere, as shown in Fig. 3. This so-called sphere
microphone produces stereophonic signals which are
composed of natural interaural differences, quite similar
to dummy-head signals, as required in Sec. 1. However,
in contrast to the dummy head, it has a linear frontal
frequency response (see Fig. 4, upper curve). The
sphere-microphone signal does not contain those
dummy-head-specific spectral cues, which are used for
front-back orientation during headphone listening [2],
[5] (Fig. 4, lower curve), but which are not used during
loudspeaker-referred presentation and would therefore
cause coloration problems [10].
The frequency responses of the sphere microphone
Schoeps KFM 6U are plotted in Fig. 5. They are linear
for sound reaching the sphere from the front (0°), and
the sum of left and right energy is frequency independent
for sound sources moving toward the side. Also, the
frequency response is linear for the integrated resultant
of sounds reaching the sphere from any angle in the
reverberant (diffuse) field [3]. The choice of pressure
capsules sharply reduces sensitivity to air motion, while
ensuring frequency response to the lower limit of human
hearing (see Fig. 5). The sphere microphone fuses
advantageous features of the systems already in general
use to produce a stereo image of outstanding naturalness
and spatial integrity, combined with excellent sound-color
neutrality and low-frequency response.
At present the sphere microphone Schoeps KFM 6U
(Fig. 6) is being tested in different situations and compared
with other main microphones. First results confirm
that the sphere microphone in fact combines favorable
imaging characteristics with respect to spatial perspective,
accuracy of localization, and sound color.
Fig. 2. Two examples for demonstrating stereophonic quality
differences in headphone and loudspeaker reproduction.
Fig. 3. Principal function of sphere microphone (frontal
incidence; lateral incidence).
Fig. 4. Frequency response of sphere microphone and dummy head
(0° free field minus diffuse field).
Fig. 5. Frequency response of sphere microphone (free field
20°, 40°, 60°). Upper curves: right; lower curves: left;
middle curve: sum of left and right energies.
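The "sum of left and right energies" plotted as the middle curve of Fig. 5 can be made explicit with a small calculation. The following Python sketch assumes that the two channel responses are given as levels in dB and that their energies simply add. The function names, the tolerance, and the level values are hypothetical and are not measured KFM 6U data.

```python
import math


def energy_sum_db(left_db, right_db):
    """Level in dB of the summed energy of two channels given their levels in dB."""
    return 10.0 * math.log10(10.0 ** (left_db / 10.0) + 10.0 ** (right_db / 10.0))


def is_frequency_independent(left_curve_db, right_curve_db, tolerance_db=0.5):
    """True if the energy sum varies by at most tolerance_db over all frequency points."""
    sums = [energy_sum_db(l, r) for l, r in zip(left_curve_db, right_curve_db)]
    return max(sums) - min(sums) <= tolerance_db


# Hypothetical lateral-incidence responses at three frequency points:
# the far-side channel falls while the near-side channel rises.
left_db = [-3.0, -6.0, -10.0]
right_db = [-3.02, -1.26, -0.46]
flat = is_frequency_independent(left_db, right_db)
```

In this made-up example the individual channels differ by up to about 10 dB, yet the energy sum stays nearly constant, which is the property the text describes for lateral sound incidence.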
3 ROOM-RELATED BALANCING
Theoretical considerations [11] and practical tests
[13] show that spot-microphone signals can be added
to the sphere-microphone signal without disturbing the
perception of spatial perspective. A disturbing effect
occurs in the case of conventional panpot balancing,
as demonstrated in Fig. 7. The signal picked up by a
spot microphone is reproduced earlier than the corresponding
main-microphone signal. Thus the ear interprets
the spot-microphone signal as the direct sound
[11], [14], and due to the precedence effect, the favorable
imaging characteristics of the sphere microphone (or any
appropriate main microphone) are lost.
Such recordings sound flat, without spatial depth.
It is common practice to moderate this space-disturbing
effect by artificial reverberation or by compensating
the delay of the main-microphone signal (see,
for example, [14]). However, those techniques are not
satisfying, because the stereophonic quality of the direct
sound is not improved by this method. In practice a
delay compensation leads to "notching" effects, which
are particularly disturbing when the musicians move
about near the spot microphone. Experiments have
shown [13] that pure panpot balancing can even be
preferred in comparison to delay-compensated panpot
balancing, depending on the recording situation and
the desired balancing gain.³
To preserve the perception of spatial perspective due
to the main-microphone signal and, at the same time,
achieve a high balancing gain, the spot-microphone
signal should be delayed much more than necessary
for the compensation, so as to fall within the region
of the early reflections. It has been proposed to achieve
the desired increase in volume by adding sound energy
from artificially generated reflections [11]. This so-called
room-related balancing technique has been tested
and optimized recently [13] with the aid of an appropriate
audio processing unit [15]. It was found that a
loss in depth can be avoided satisfactorily by generating
just two artificial reflections from the spot-microphone
signal (according to Fig. 8), and that the greater the
required balancing gain, the more favorable the room-related
balancing technique is.
In principle, the room-related balancing algorithm
could be implemented in digital mixing desks so that
it could be used as an alternative to conventional panpot
balancing. However, further optimization would first
be useful to minimize the signal processing effort and
to introduce improvements, such as distance equalization
(taking into account changes in spectrum, such
as by absorption at the room boundaries), additional
artificial reverberation (generated from the spot-microphone
signal in accordance with the artificial reflections), and so on.
³ Balancing gain is the level of the balancing signal with
reference to the balancing signal's threshold level, which is
measured at the threshold of perception of the balancing signal
in the main-microphone signal.
Fig. 6. Sphere microphone KFM 6U (Schoeps), suspended.
Fig. 7. Panpot balancing: main-microphone signal plus
spot-microphone signal without time delay.
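The timing argument behind room-related balancing can be sketched numerically. The Python sketch below is an illustration only; it is not the algorithm of [11] or [13], nor that of the processing unit in [15]. The speed of sound, the microphone distances, the reflection delays and gains, and all function names are assumptions chosen for the example. It first computes how much earlier a spot microphone picks up a source than the main microphone, and then adds the spot signal to one main-microphone channel only as two delayed copies, so that they arrive after the main-microphone direct sound, in the region of the early reflections.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def spot_lead_ms(dist_main_m, dist_spot_m):
    """Time by which the spot microphone receives the source before the main microphone."""
    return (dist_main_m - dist_spot_m) / SPEED_OF_SOUND_M_S * 1000.0


def add_artificial_reflections(main, spot, reflections, sample_rate_hz=48000):
    """Add delayed, scaled copies of the spot signal to one main-microphone channel.

    reflections is a list of (delay_ms, gain) pairs; delays are measured from
    the start of the spot signal. The output has the same length as main.
    """
    out = list(main)
    for delay_ms, gain in reflections:
        shift = round(delay_ms * sample_rate_hz / 1000.0)
        for i, x in enumerate(spot):
            j = i + shift
            if j >= len(out):
                break
            out[j] += gain * x
    return out


# Hypothetical geometry: source 8 m from the main microphone, 1 m from the spot.
lead = spot_lead_ms(8.0, 1.0)

# Two artificial reflections, 15 ms and 25 ms after the main-microphone
# direct sound (hypothetical delays and gains).
reflections = [(lead + 15.0, 0.5), (lead + 25.0, 0.4)]

n = 4800                                   # 100 ms at 48 kHz
spot = [1.0] + [0.0] * (n - 1)             # impulse as seen by the spot microphone
main = [0.0] * n
main[round(lead * 48)] = 1.0               # same impulse, arriving later at the main microphone
mixed = add_artificial_reflections(main, spot, reflections)
```

In this sketch the direct sound in the mix still comes from the main microphone, and the spot-microphone energy appears only afterwards. The left/right distribution of each reflection and any artificial reverberation are left out.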
4 SUMMARY
It can be concluded that current two-channel stereophony
recording techniques can be improved with regard to the
naturalness of the stereophonic sound image
as defined. A consistent consideration of new knowledge
and understanding with regard to psychoacoustic principles,
particularly with regard to spatial hearing, leads
to the principal result that the two-channel stereo
presentation of direction, distance, and space is only
possible as a presentation of spatial perspective in the
simulation plane between the loudspeakers. This has
the effect of creating a natural two-dimensional image
of a three-dimensional space.
We consider that an optimization of the techniques
for simulating spatial perspective is achievable by using
natural interaural signal differences instead of pure
intensity or time differences. The room-related recording
technique is consequently based on this knowledge.
This technique can be applied to the well-tried
main/spot-microphone methods, which means that the
optimum main microphone is the sphere microphone or
a corresponding microphone generating natural interaural
signal differences, and that any spot-microphone
signal added to the main-microphone signal represents
additional artificial reflections from the recording room.
Since the generation of natural interaural signal differences
and the simulation of artificial reflections and
reverberation from a spot-microphone signal are possible
with the aid of modern computer technology, a room-related
recording technique can also be applied in principle
to polymicrophony. This would result in a stereophonic
presentation of any artificially created space
in the simulation plane.
Fig. 8. Room-related balancing: main-microphone signal plus
artificial reflections and reverberation.
5 REFERENCES
[1] W. Kuhl, "Räumlichkeit als Komponente des Höreindrucks"
(Spaciousness as Component of the Auditory Impression),
Acustica, vol. 40, pp. 167-181 (1978).
[2] J. Blauert, Spatial Hearing: The Psychophysics
of Human Sound Localization (M.I.T. Press, Cambridge, MA, 1983).
[3] G. Theile, "Zur Theorie der optimalen Wiedergabe
von stereofonen Signalen über Lautsprecher und Kopfhörer"
(On the Theory of Optimum Reproduction of Stereophonic
Signals through Loudspeakers and Earphones),
Rundfunktechn. Mitt., vol. 25, pp. 155-170 (1981).
[4] G. Theile, "On the Stereophonic Imaging of
Natural Spatial Perspective via Loudspeakers: Theory,"
in Perception of Reproduced Sound (1987), pp. 135-146.
[5] G. Theile, "On the Standardization of the Frequency
Response of High Quality Studio Headphones,"
J. Audio Eng. Soc., vol. 34, pp. 956-969 (1986 Dec.).
[6] G. Steinke, "Stand und Entwicklungstendenzen
der Stereofonie" (Current Situation and Development
Trends in Stereophony), pts. 1 and 2, Tech. Mitt. RFZ,
vol. 28, pp. 1-10, 25-32 (1984).
[7] S. P. Lipshitz, "Stereo Microphone Techniques
. . . Are the Purists Wrong?" J. Audio Eng. Soc. (Features),
vol. 34, pp. 716-744 (1986 Sept.).
[8] J. C. Bennett, K. Barker, and F. O. Edeko, "A
New Approach to the Assessment of Stereophonic Sound
System Performance," J. Audio Eng. Soc., vol. 33,
pp. 314-321 (1985 May).
[9] M. Wöhr and B. Nellesen, "Untersuchungen zur
Wahl des Hauptmikrofonverfahrens" (Studies on the
Selection of the Main Microphone Method), in Proc.
14th Audio Engineers' Conf., pp. 106-120 (1986).
[10] G. Theile, "Das Kugelflächenmikrofon" (The
Sphere Microphone), in Proc. 14th Audio Engineers'
Conf., pp. 277-293 (1986).
[11] G. Theile, "Hauptmikrofone und Stützmikrofone: neue
Gesichtspunkte für ein bewährtes Aufnahmeverfahren"
(Main Microphone and Spot Microphones: New
Aspects for a Proven Recording Technique),
in Proc. 13th Audio Engineers' Conf., pp. 170-184 (1984).
[12] M. Wöhr, J. Goeres, C. Pösselt, and G. Theile,
"Raumbezogene Stütztechnik: Möglichkeit zur Optimierung
der Aufnahmequalität" (Room-Related Supporting Technique:
Possibility for Optimizing the Recording Quality),
in Proc. 15th Audio Engineers' Conf., pp. 302-315 (1988).
[13] M. Wöhr, G. Theile, H. J. Goeres, and A.
Persterer, "Room-Related Balancing Technique: A
Method for Optimizing Recording Quality," J. Audio
Eng. Soc., vol. 39, pp. 623-631 (1991 Sept.).
[14] J. Jecklin, Musikaufnahmen: Grundlagen,
Technik, Praxis (Music Recordings: Foundations,
Technique, Practice) (Franzis-Verlag, Munich, 1980), p. 55.
[15] F. Richter and A. Persterer, "Design and Applications
of a Creative Audio Processor," presented
at the 86th Convention of the Audio Engineering Society,
J. Audio Eng. Soc. (Abstracts), vol. 37, p. 398
(1989 May), preprint 2782.
The biography for Günther Theile was published in the
1991 September issue of this Journal.