Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vol. 20, No. 3, 1998
The Generation and Validation of High Fidelity
Virtual Auditory Space
Simon Carlile¹, Craig Jin¹·², Vaughn Harvey¹·²
¹Department of Physiology, The University of Sydney, NSW, Australia 2006
²Department of Electrical Engineering, The University of Sydney, NSW, Australia 2006
E-mail: [email protected]
Abstract-This paper reviews a number of the issues involved in recording the filter functions of the outer ear and subsequently using these functions to render virtual auditory space using headphones. The acoustical problems associated with recording within the confined tube of the auditory canal are considered for the specification of both the head related transfer function and the headphone transfer function. The difficulties of acoustically validating the rendered VAS indicate the importance of validation using a powerful test of auditory performance. We describe such methods as used in our laboratory. The issue of individualised HRTFs is considered and two relevant experiments are described concerning (i) the mapping of the morphology of the outer ear to the filter characteristics and (ii) the subsequent manipulation of standard HRTFs.
Keywords-Virtual auditory space, Head-related transfer
function, Sound localization, Virtual auditory displays
I. INTRODUCTION
A. Applications of VAS
There are a number of outstanding problems in rendering
high fidelity virtual auditory space (VAS). This paper will
review some of the most common ways of measuring the filter
functions of the outer ears and of using these functions to
generate sounds localised in a virtual space. We will highlight
some of the technical difficulties associated with performing
these kinds of measurements on humans as well as the
problems associated with acoustically validating the rendered
virtual space. In discussing the nature of the perceptual errors
that can arise as a result of poorly rendered virtual space we
will focus on a class of large localisation errors: the so-called
cone-of-confusion errors. On the one hand, for relatively trivial
applications of these technologies such as by the games
industry, these problems are not so acute, but where high
fidelity localisation is required (such as for directional or
location mapping) then these problems are far more profound.
Work over the last decade has demonstrated that, under carefully controlled conditions, it is possible to render auditory environments with high fidelity. In the case of artificial virtual environments, non-auditory data can also be mapped into this domain. This provides a means by which data that is either highly symbolic, or data that is usually delivered to a different sensory channel, can be mapped onto the auditory channel. For instance, there are a variety of situations where the visual channel carries a very high data load, such as in a flight cockpit. Efforts have been made to remodel these kinds of data to facilitate operations, such as in the development of head-up displays. In addition to the remapping of some of these data, the auditory dimension has also been used to replace, supplement or facilitate visual information. Some examples include a horizon indicator that uses a moving auditory icon to indicate the gravitational 'up' direction, a collision warning system which indicates the direction of an incoming threat using an appropriate auditory icon, or a flight plan navigational aid (e.g. see [1-4]). The fidelity of the rendered virtual auditory space is most important in mission-critical applications such as the gravitational horizon or collision indicator.
B. The Acoustical Basis of Virtual Auditory Displays
The perception of auditory space is dependent on acoustical
cues that arise as a result of the pattern of sound waves arriving
at each ear (for recent review see [5]). A sound source located
away from the midline results in differences in the arrival time
and the level of the sounds at each ear. The auditory periphery,
comprising the pinna, concha, head and torso, interact with the
incoming sound waves and spectrally filter the sound. Because
of the morphological asymmetries of these structures, the
nature of this filtering is dependent on the relative location of
the sound in space (see [6]). These filter functions are
commonly described as the head related transfer functions
(HRTF) and ideally provide a complete description of the
transformation of the sound from a point in free space to the
eardrum. However, the perception of an externalised sound is
not dependent exclusively on the HRTFs and there are a range
of other relevant acoustical cues ([7]).
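As an aside, the binaural timing cue mentioned above can be made concrete with the textbook Woodworth spherical-head approximation. The sketch below is ours, not the paper's; the nominal head radius and speed of sound are assumed values.

```python
import numpy as np

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate interaural time difference (seconds) for a spherical head.

    Woodworth's approximation: ITD = (a / c) * (theta + sin(theta)),
    where theta is the source azimuth from the median plane. The head
    radius (8.75 cm) and speed of sound (343 m/s) are nominal assumptions.
    """
    theta = np.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + np.sin(theta))

# A source 90 degrees to one side yields roughly 0.66 ms of ITD.
print(itd_woodworth(90.0))  # ~6.6e-4 s
```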
In the ideal case, the functional requirement of a VAS display is to generate the pattern of sound waves at the eardrums that would have occurred had the sound been presented in the free field (for recent review see [8]). In practice this is very difficult to achieve. There are a range of technical difficulties associated with recording the transfer function to the eardrum. First, the placement of the probe microphone in the human outer ear is not a straightforward procedure, due to the delicacy of the eardrum and the sensitivity of the proximal portion of the auditory canal. Secondly, the impedance mismatch between the outer and inner ear results in the reflection of energy at the eardrum and the consequent development of standing waves along the length of the auditory canal. A microphone within the auditory canal will sample pressures that result from a combination of incoming and outgoing sounds [9]. Moreover, close to the eardrum the sound field becomes particularly complicated due to the coupling of the sound field and the
effective reflecting surface of the eardrum [10]. As a consequence, it is difficult to specify an optimal location within the proximal portion of the auditory canal at which the HRTF should be measured. In addition, a compromise also needs to be made between recording convenience, the safety of the experimental subject and the complex acoustics close to the eardrum.
Of course, the transfer functions of the headphones (HpTF) used to deliver the VAS also need to be measured and compensated for in the process. All of the problems associated with the measurement of the HRTF also apply to the measurement of the headphone transfer function. There are, in addition, two other problems that need to be overcome. First, there is an increased difficulty in maintaining the exact location of the microphone for both the free field and headphone transfer functions: the headphones have a tendency to distort the ear and push on the recording microphone. Variations in the point to which these functions are specified will lead to sharp and quite large disparities resulting from small variations in the frequency of the standing wave null at the location of the probe microphone ([6]). Secondly, most headphones tested demonstrate significant variation in their transfer functions dependent on their exact placement on the outer ear ([11]). This latter study also demonstrated that the use of a standardised headphone calibration would lead to significant errors in the regenerated HRTF, as the headphone calibration captures a significant component of the individualised characteristics of a listener's ears.
As a result of the many technical difficulties outlined above, most approaches involve a number of approximations in the recording procedure and untested or untestable assumptions in the rendering of VAS.
C. Common Methods Applied in Generating Virtual Auditory Space
There are a number of methods that have been employed to measure the HRTF (e.g. [12-15]). These methods can be broadly divided into (i) deep ear canal recordings, (ii) shallow ear canal recordings and (iii) blocked ear canal recordings. The different techniques all aim to overcome the problem of the artifacts produced by the standing waves within the canal. We have previously attempted to record as deeply as possible within the canal by using the frequency of the standing wave null to determine the distance of the microphone probe from the effective reflecting surface of the eardrum ([14]; see also [16]). The microphone is placed sufficiently close to the eardrum that the frequency of the standing wave null occurs above 14 kHz. This provides a reasonable estimate of the transfer function at low to mid frequencies [16], but an increasing underestimate at progressively higher frequencies. Other authors have recorded at more distal locations within the canal, although these locations provide a poor estimate of the spectrum of the sound at the eardrum (e.g., [29]).
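The link between the null frequency and probe depth can be made explicit if the usual quarter-wavelength relation between a pressure null and the effective reflecting surface is assumed; this small sketch is our illustration, not the authors' calibration code.

```python
def null_distance_m(null_freq_hz, c=343.0):
    """Distance from the effective reflecting surface at which a
    standing-wave pressure null of the given frequency appears,
    assuming the quarter-wavelength relation d = c / (4 * f)."""
    return c / (4.0 * null_freq_hz)

# A null above 14 kHz implies the probe sits within about 6 mm of the eardrum.
print(null_distance_m(14e3) * 1000.0)  # ~6.1 mm
```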
An alternative approach is to record the so-called 'Thevenin pressure', which is obtained at the entrance of an ear canal that has been plugged to eliminate the effects of the canal resonance and the standing waves. This situation is analogous to the 'open circuit' condition used when deriving a Thevenin equivalent circuit in electronics, except that here the circuit is acoustical. There are two characteristic impedances describing this situation: (i) the input impedance seen at the ear canal, Z_ear_canal, and (ii) the output impedance seen from the ear canal, Z_radiation. The validity of this approach relies on the argument that, using a sufficiently 'open' headphone, the transfer function obtained with the headphone will be equivalent to the free field transfer function. Briefly, this can be seen as follows. The free field transfer function is given by:

Z_ear_canal / (Z_ear_canal + Z_radiation),

while the transfer function with headphone delivery is:

Z_ear_canal / (Z_ear_canal + Z_headphone),

where Z_headphone is the impedance of the headphone seen from the ear canal. Thus it can be seen that an 'open' headphone must have Z_headphone approximately equal to Z_radiation.
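A minimal numerical sketch of this argument follows; the impedance values are arbitrary illustrations chosen by us and are not taken from the paper.

```python
import numpy as np

def transfer(z_ear_canal, z_source):
    """Acoustic pressure divider: Z_ear_canal / (Z_ear_canal + Z_source)."""
    return z_ear_canal / (z_ear_canal + z_source)

# Illustrative (made-up) complex acoustic impedances at one frequency.
z_ear_canal = 1.0 + 0.4j
z_radiation = 0.2 + 0.1j

free_field = transfer(z_ear_canal, z_radiation)
open_phone = transfer(z_ear_canal, 0.21 + 0.11j)  # 'open' phone: ~ Z_radiation
sealed_phone = transfer(z_ear_canal, 2.0 + 0.5j)  # sealed phone: very different

print(abs(free_field - open_phone))    # small: conditions nearly equivalent
print(abs(free_field - sealed_phone))  # large: sealed phone distorts the HRTF
```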
The methods employed in generating the VAS will depend
on the methods that have been used in recording the HRTFs.
The simplest case is for the deep ear canal recordings. If the
transfer function of the measurement system is known (system
transfer function: the speaker, microphone and recording
system) then dividing the measurement made within the ear
canal with the system transfer function should provide the
HRTF to the measured point in the ear. Likewise, if the HpTF
has been obtained then dividing the HRTF with the HpTF will
produce a signal that, when filtered using the headphone,
should produce the desired sound at the eardrum. However, as
noted above, there are problems associated with obtaining
reliable HpTF. To avoid this difficulty, in-ear tube phones
such as those produced by Etymotic have been used in a
number of laboratories to generate VAS. Although relatively
expensive, the tube-phones appear to have two main
advantages. First, the placement within the ear is highly
reproducible compared to placement of circum-aural or supraaural headphones. Secondly, the manufacturers claim that the
transfer function of the driver is flat to the eardrum so there is
effectively no headphone transfer function that needs to be
compensated for.
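The divisions described above are frequency-domain deconvolutions. A minimal sketch follows, assuming complex spectra of equal length; the function names and the small regularising constant are our choices, since the authors do not describe their numerical implementation.

```python
import numpy as np

def deconvolve(measured_spectrum, reference_spectrum, eps=1e-6):
    """Divide one complex spectrum by another, with simple Tikhonov-style
    regularisation so the result does not blow up near spectral nulls."""
    ref_conj = np.conj(reference_spectrum)
    return measured_spectrum * ref_conj / (np.abs(reference_spectrum) ** 2 + eps)

# Synthetic demonstration: removing a known measurement chain.
rng = np.random.default_rng(0)
n = 256
system_tf = np.fft.rfft(rng.standard_normal(n))   # speaker + mic + recorder
true_hrtf = np.fft.rfft(rng.standard_normal(n))
ear_rec = true_hrtf * system_tf                   # measurement includes the chain
hrtf_est = deconvolve(ear_rec, system_tf)
print(np.max(np.abs(hrtf_est - true_hrtf)))       # ~0: chain removed

# In the same spirit (assumed variable names):
# playback_spectrum = deconvolve(hrtf_est, hptf)  # pre-compensate the headphone
```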
It is common to treat the transfer function from a particular
sound source location as a two-stage transfer function. One
stage is a location independent transfer function and the other
stage is a location dependent transfer function. If the location
independent transfer function is deconvolved from the original
transfer function, then the resulting transfer function is known
as the directional transfer function or DTF. The primary reason
for performing such a manipulation is to remove measurement
artifacts. Theoretically this is generally taken care of by
calibrating the recordings using the inverse of the recording
microphone transfer function. However this method can be
prone to errors because of noise inherent in the measurement
process. If instead, all of the transfer functions are averaged to
obtain a location independent transfer function, which is then
deconvolved from the original transfer function, the influence
of noise upon the measurement process is greatly reduced.
Different techniques exist for computing the average transfer
function (also known as the diffuse field transfer function) and
probably the most popular method is the one introduced by
Middlebrooks [17], which involves averaging the log magnitude
spectra across all locations. A drawback to this procedure is the invariable introduction of an overall interaural distortion to the simulation process. This can be reasoned as follows. If the original interaural transfer function, ITF_original, is given by:

ITF_original = (I_R * D_R) / (I_L * D_L),

where I_R is the right location-independent transfer function and D_R is the right location-dependent transfer function (similarly for the left ear), then the interaural transfer function resulting from using the DTFs, D_R and D_L, is given by:

ITF_dtf = D_R / D_L.

It then follows that the difference between the two interaural transfer functions can be expressed as:

ITF_original / ITF_dtf = I_R / I_L.

Thus I_R / I_L gives a measure of the distortion introduced into the interaural transfer function when using the DTF.
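A sketch of the diffuse-field averaging step as we read it from the description of [17] follows; the array names and shapes are our assumptions.

```python
import numpy as np

def directional_transfer_functions(hrtf_mags):
    """hrtf_mags: array of shape (n_locations, n_freq_bins) holding the HRTF
    magnitude spectra for one ear. Returns (dtf_mags, diffuse_field), where
    the diffuse-field (location-independent) component is the mean of the
    log-magnitude spectra across all measured locations, per [17]."""
    log_mags = 20.0 * np.log10(hrtf_mags)
    diffuse_field_db = log_mags.mean(axis=0)   # average over locations
    dtf_db = log_mags - diffuse_field_db       # deconvolution in the log domain
    return 10.0 ** (dtf_db / 20.0), 10.0 ** (diffuse_field_db / 20.0)

# The interaural distortion term I_R / I_L discussed above is then the ratio
# of the two ears' diffuse-field components.
```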
One outstanding question is the general acoustical validation of this approach. It is not practical, or in the case of the in-ear tube phones even possible, to calibrate the headphone transfer functions for each operator prior to generating VAS. This leads to some uncertainty regarding the acoustical accuracy of the rendered HRTFs and begs the need for an appropriate means by which individualised VAS can be appropriately and conveniently validated.
D. Tests of the Fidelity of Rendered Virtual Auditory Space

We have assessed the fidelity of virtual auditory space by using sound localisation performance. The ability of a subject to localise a burst of broadband noise in anechoic space is compared with his/her localisation performance for the same type of stimulus presented in virtual auditory space. From an evolutionary point of view, the most sophisticated form of sound localisation involves binaural and monaural processing and is capable of determining the locations of very brief sounds with considerable accuracy, particularly if the sounds are spectrally dense (for recent review see [5]). There is considerable evolutionary pressure to accurately localise sounds such as the inadvertent movement noises of a predator or prey. We have chosen localisation of a brief sound as a test of the fidelity of VAS because the processing of these kinds of stimuli represents a demanding test of auditory localisation abilities. Combined with appropriate statistical treatments (see below), such a test gives a powerful evaluation of VAS fidelity.

II. ASSESSMENT OF SOUND LOCALISATION PERFORMANCE

A. Free Field Localisation

We have developed an automated stimulus system that allows the placement of a sound source at any position on an imaginary sphere surrounding the subject. A robot arm places the sound stimulus at one of 76 randomly chosen locations in a darkened anechoic chamber. The subject turns to face the perceived location of the target and points his/her nose at the source. An electromagnetic tracking device mounted on the top of the head (Polhemus: IsoTrack) is used to measure the orientation of the head and thus provides an objective measure of the perceived location. Turning to face the source of a sound is a highly ecological behaviour which brings the source of the sound into the visual field [18]. All subjects undergo a short period of training prior to localisation testing to ensure that they can reliably use this method of pointing to indicate the perceived location of the sound source.

Localisation errors fall broadly into two categories. The most common type of error is associated with relatively small deviations of the perceived location from the actual location and is referred to here as a "local error". The second type typically involves a very large error where the perceived location of the target is reflected about the interaural axis (the line passing through the two ears). In this type of error the correct angle with respect to the median plane is estimated but the spatial quadrant is confused; for example, a sound located close to the anterior midline is judged to be close to the posterior midline. Such errors occur relatively infrequently (typically less than 4% under the conditions tested in our laboratory) and are referred to as "front-back confusions" or, more properly, "cone-of-confusion" errors. The large qualitative differences between local and cone-of-confusion errors are generally taken to indicate the failure of two different processes in auditory localisation (for an extensive discussion see [6]).

We have examined both the performance of individuals and a pooled population of subjects using a number of statistical measures. The cluster of localisation estimates about an individual target location can be described by the centroid and standard deviation calculated using a Kent or Fisher distribution as appropriate ([19, 20]). The centroid indicates the systematic error in localisation and the dispersion the accuracy. The systematic errors and the dispersion of these clusters are smallest for frontal locations close to the audio-visual horizon and largest for locations behind and above the subject (Figure 1: taken from [21]). The spherical correlation coefficient is used to assess the association between the centroids of the perceived locations and the actual target locations; for the data shown in Figure 1 this correlation was 0.98 for the pooled data, and the rate of front-back confusions was 3.2% of the localisation trials.
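To make two of these measures concrete, a minimal sketch follows. This is our illustration, not the analysis code of [19, 20]: the centroid of a response cluster is taken as the normalised resultant of the response unit vectors, and a crude front-back classification compares hemispheres; the Kent/Fisher dispersion estimates and the spherical correlation coefficient are richer statistics not reproduced here.

```python
import numpy as np

def centroid(unit_vectors):
    """Centroid of a cluster of response directions, given as an (n, 3)
    array of unit vectors: the normalised resultant (mean direction)."""
    resultant = unit_vectors.sum(axis=0)
    return resultant / np.linalg.norm(resultant)

def is_front_back_confusion(target, response):
    """Crude front-back test: target and response fall in opposite
    front/back hemispheres. Taking +x as straight ahead is an
    assumption of this sketch, not a convention from the paper."""
    return target[0] * response[0] < 0

# Example: responses scattered around a frontal target direction.
rng = np.random.default_rng(3)
target = np.array([1.0, 0.0, 0.0])
responses = target + 0.1 * rng.standard_normal((20, 3))
responses /= np.linalg.norm(responses, axis=1, keepdims=True)
print(centroid(responses))                       # close to the target
print(is_front_back_confusion(target, -target))  # True: mirrored response
```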
B. Localisation Performance in Virtual Auditory Space

Individualised HRTFs were used to generate virtual auditory space stimuli for each subject. The impulse responses for the left and right ears for the same 76 locations used in testing free field localisation performance were calculated as described above. In some experiments, localisation accuracy was tested for the different kinds of filtering: namely, impulse responses calculated from (i) the deep ear canal HRTF, (ii) the blocked ear canal HRTF and (iii) the DTF calculated from any of the microphone recording conditions. For the data shown in Figure 2, the impulse responses for the left and right ears were calculated from the deep ear canal recordings of five subjects and convolved with 150 ms of broadband noise using MatLab (Mathworks). Stimuli were generated using 16-bit D-A converters at 80 kHz (TDT System II) and were presented to the subjects using in-ear tube phones (Etymotic ER-2). As
mentioned above, the principal disadvantage of using the tube
phones was that any inter-subject differences in the tube-phone
transfer functions could not be measured and accounted for
under these experimental conditions.
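In outline, the stimulus generation described above amounts to convolving the noise burst with the measured left- and right-ear impulse responses. The sketch below is ours (the authors used MatLab and TDT hardware, not this code); the HRIR variable names are assumptions.

```python
import numpy as np

def vas_stimulus(noise, hrir_left, hrir_right):
    """Render a binaural VAS stimulus by convolving a broadband noise
    burst with the left- and right-ear impulse responses for the target
    location. Returns an (n, 2) array for headphone delivery."""
    left = np.convolve(noise, hrir_left)
    right = np.convolve(noise, hrir_right)
    return np.stack([left, right], axis=1)

fs = 80_000                                                # 80 kHz, as in the paper
noise = np.random.default_rng(1).standard_normal(int(0.150 * fs))  # 150 ms burst
# stimulus = vas_stimulus(noise, hrir_left, hrir_right)    # per-subject HRIRs
```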
Sound localisation performance for stimuli presented in VAS was assessed in exactly the same manner as for free field localisation. The only difference in conditions was that the subjects wore the in-ear tube phones to deliver the stimulus. Some subjects demonstrated a pattern of acclimatisation to the VAS, initially showing a higher than normal rate of front-back confusions that declined significantly after one or two trial blocks ([22]). The spatial distribution of localisation errors for sounds located in the virtual space was very similar to that for sounds presented in the free field (Figure 2).
On average, dispersion was greater by 1.5° for stimuli presented in VAS compared to the free field. There was a slight increase in the dispersion of localisation estimates for locations behind and also above the subjects when compared to free field localisation. An increase in the front-back confusion rates (the most prominent form of the cone-of-confusion errors) was also seen, with average rates rising from around 3-4% in the free field up to 6% for sounds presented in virtual space. A significant proportion of these front-back confusions can probably be attributed to an increase in the angular errors about the interaural axes, particularly at the higher elevations. The spherical correlation between the perceived and actual target locations was 0.973. Furthermore, the spherical correlation between the VAS and free field localisation was higher still (0.98), indicating that subject biases evident in the free field data were also replicated in the VAS data.
For some subjects we have carried out VAS testing using a range of different methods of generating VAS, and some differences have been found between DTF and HRTF recordings (e.g. Table 1). In this case, the subject was known to have slight asymmetries between the ears that resulted in differences in the directional characteristics of the ears. The calculation of the DTF might be expected to produce a subtle distortion in the interaural differences by removing a constant interaural difference, resulting in confounded binaural spectral cues. While there is some evidence that the VAS rendered without calculating the DTF resulted in better performance for either the deep ear canal recording or the blocked ear canal recording, the differences are not that marked. This suggests that there is redundancy in the auditory localisation cues that may result in compensation for the deficiencies or distortion in one cue set.
Table 1. Sound localisation performance (spherical correlation coefficient, SCC, and front-back confusion rate) for one subject under the different recording and filtering conditions.

Condition          | SCC  | Front-back confusions
Normal free field  | 0.95 | 1.5%
Deep ear: HRTF     | 0.96 | 2.6%
Deep ear: DTF      | 0.89 | 4.6%
Blocked ear: DTF   | 0.93 | 4.5%
Blocked ear: HRTF  | 0.95 | 4.0%

III. INDIVIDUALISED HRTFS

It has been known for some time that individual differences in the structural features of the outer ears can lead to significant differences in the filter functions. Wenzel et al. ([23]) showed that localisation in VAS generated using HRTFs that were not obtained from the listener often resulted in localisation performance that was significantly degraded. Any one subject showed a range of localisation performance using HRTFs obtained from different subjects. The performance differences presumably reflect the extent of the physical differences between the ears of the listener and the ears from which the HRTFs were obtained. Subjects seemed largely unable to learn to use these non-individualised HRTFs despite repeated exposure.
This has important consequences for the generalised use of
VAS displays. Unless the HRTFs used in generating a display
are individualised for the listener, there will likely be significant
variations in the fidelity of the VAS generated across a
population of listeners. To that end a number of projects are
being pursued in our laboratory that seek (i) to find ways in
which the structural features of an individual's ears can be used
to predict the nature of the HRTF and (ii) to examine ways in
which a standard set of HRTFs might be modified or 'morphed'
to better fit the filter functions of a particular listener's ears.
A. HRTF Mapping
No one has yet found a mathematically tractable or useful
functional relationship between the acoustical morphology of
the outer ear and the HRTF. And yet, such a relationship could
be exploited to produce a generic VAS system adaptable to
each individual. In hopes of such a realization, a number of inroads have been made towards solving this problem. For
example, Lopez-Poveda and Meddis [24] have used a wave-equation model to characterize the human concha. Kulkarni
and Colburn [25] have applied ARMA modeling to HRTFs and
have noted that the resonances and anti-resonances of the pinna
may be characterized via the pole-zero positions of their
model. Wightman and Kistler [26] have used multidimensional
scaling analysis to analyze the differences between individual
HRTFs in a low-dimensional space. Whichever method is
employed, the underlying difficulty arises from the large
morphological differences between people and the amazing
sensitivity of the auditory system to the specific filter
characteristics of its own ears. It is not just the shape of the
ears that plays a role but also the angle of the ears, size of the
head, and shape of the torso. Therefore we have taken a
reductionist approach and have studied a system with a reduced
number of varying parameters. To this end, a real-life
acoustical mannequin was built which allows us to vary many
characteristics of the auditory periphery and to measure
acoustically the consequences of such modifications. In the
experiment described here we have recorded the HRTFs while
varying only the angle between the ear and the head (the ears
were attached to the head using a notched hinge).
Figure 1: Pooled localisation responses from 19 subjects shown for the front (f), back (b), left (l) and right (r) hemispheres. The actual target locations are shown by the small symbols and the centroids of the pooled data by the filled circles. The ellipses indicate the standard deviations of the clusters of perceived responses.

Figure 2: Location estimates pooled for 5 subjects for sounds presented in virtual auditory space. All other details as for Figure 1.
A mathematical model of the functional relationship between the angle of the ear with respect to the head and the HRTFs was derived. While only one physical parameter was varied, namely the angle of the ear, we did not expect that only a single parameter in the model would be required to completely describe the HRTF variations. Essentially, the angle of the ear influences the HRTFs through a complicated boundary-value problem involving the wave equation. However, by utilizing this physical reductionist approach, one can begin to assess more accurately the difficulties of the original problem.

Seven complete sets of HRTF recordings of the mannequin were performed, one for each of 7 different angles of the ear with respect to the head. These angles were (in degrees): 0, 10, 20, 30, 40, 50, 60. Each set of recordings contained the left and right ear transfer functions for 390 locations equally distributed on the sphere. Principal component analysis [27] was performed across the HRTFs in all 7 recordings (each ear was treated separately). A linear functional was then used to map the angle between the ear and the head to the principal component representation of the HRTFs in a given recording. The linear functional was derived such that the coefficients of the functions within the linear functional did not vary with the angle of the ear, but did vary with location. This relationship is described as follows:

hrtf_i(α) = a(α) k_i + b(α) l_i + c(α) m_i,

where hrtf_i specifies the principal component representation of the HRTF with an outer ear angle of α at the location labeled by i, and a(α), b(α), c(α) are functions of α with respective vector coefficients k_i, l_i, m_i that vary with the location, again labeled by i. The parameter functions and coefficients of the linear functional were derived by minimization of the mean square error between the original HRTF and the calculated HRTF. The mean square error also provided a measure of the explained variance accounted for by the linear functional model of the data.

It was found that a reasonable mapping between the outer ear angle, α, and the HRTF data could be derived using a linear functional consisting of three parameter functions and coefficients. The degree of match between an original HRTF and that derived from the linear functional model is shown in Figure 3. The first two functions, a(α) and b(α), of the linear functional demonstrated a linear and quadratic functional dependence, respectively, upon the outer ear angle, whilst the third function, c(α), had a more complicated functional dependence.

The approximately linear dependence of the first function, a(α), on the angle between the ear and the head is shown in Figure 4. These data indicate that the dependence of the HRTFs upon a single physical parameter, the angle between the outer ear and the head, expanded within the model to at least three parameter functions. This is a rather sharp increase in complexity when examined in terms of the number of potential physical variations between individuals. Furthermore, this example varied only a single physical parameter, and the additional complications introduced when two physical parameters co-vary are unknown.
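One plausible way to realise such a fit is sketched below under our own assumptions (this is our reading of the procedure, not the authors' code): stacking the principal component representations for all locations, row by row over the seven ear angles, gives a matrix whose best rank-3 factorisation in the mean square sense yields three parameter functions of α together with the location-dependent coefficient vectors.

```python
import numpy as np

# pc_reps: (n_angles, n_locations * n_pc) matrix; row j holds the principal
# component representation of every location's HRTF at ear angle alpha_j.
# Random data stands in for real measurements in this sketch.
rng = np.random.default_rng(2)
n_angles, n_feat = 7, 390 * 8            # 7 ear angles; feature count is illustrative
pc_reps = rng.standard_normal((n_angles, n_feat))

# Rank-3 truncated SVD: columns of U scaled by the singular values give the
# three parameter functions a(alpha), b(alpha), c(alpha) sampled at the
# measured angles; rows of Vt give location-dependent coefficients k_i, l_i, m_i.
U, s, Vt = np.linalg.svd(pc_reps, full_matrices=False)
param_funcs = U[:, :3] * s[:3]           # (7, 3): one column per function
coeffs = Vt[:3, :]                       # (3, n_feat)

reconstruction = param_funcs @ coeffs    # mean-square-optimal rank-3 model
print(np.linalg.norm(pc_reps - reconstruction) / np.linalg.norm(pc_reps))
```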
B. HRTF Morphing

Another interesting approach towards exploring the relationship between outer ear morphology and HRTFs starts not with the basic functional relationship between the ear and the HRTF, but with the functional relationship between the HRTFs of different individuals. This approach shall be called the 'morphing approach' and is one in which a given set of HRTFs for one individual is adapted or 'morphed' into another set for a different individual. Middlebrooks [28] has recently demonstrated that a simple scaling of the HRTFs along the frequency axis can account for a substantial proportion of the variance between the HRTFs of different individuals.
Figure 4. The first parameter function, a(α), as a function of the angle, α, between the outer ear and the head.
The difficulty, in general, with this approach is deriving a suitable 'morphing function' between different HRTF data sets. Ongoing research in this laboratory is exploring the linear functional technique described above. The intriguing aspect of the morphing approach is the possibility of discovering a psychophysical relationship between the morphing parameters and the perception of the derived VAS. However, experiments within our own laboratory and personal communication with Wightman and Kistler indicate that such a relationship will only be useful if a small number of parameters is required.
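As a sketch of the frequency-scaling idea reported in [28]: an HRTF magnitude spectrum for one listener is mapped toward another's by stretching or compressing its frequency axis. The interpolation scheme, toy spectrum and scale factor below are our assumptions, not details from [28].

```python
import numpy as np

def scale_hrtf_frequency(mag, freqs, scale):
    """Resample an HRTF magnitude spectrum so that a feature at frequency f
    moves to f * scale (scale > 1 shifts features upward, as might suit a
    listener with smaller ears)."""
    return np.interp(freqs, freqs * scale, mag)

freqs = np.linspace(0.0, 20_000.0, 512)
mag = np.exp(-((freqs - 8_000.0) / 1_500.0) ** 2)   # toy spectral feature at 8 kHz
morphed = scale_hrtf_frequency(mag, freqs, 1.1)      # feature moves to ~8.8 kHz
print(freqs[np.argmax(morphed)])
```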
ACKNOWLEDGEMENTS
This work was supported by grants from the Australian Research
Council, National Health and Medical Research Council (Australia),
the Ramaciotti Foundation and the University of Sydney. The authors
would like to acknowledge the assistance of Johahn Leung, Stephanie Hyams, Anna Corderoy and André van Schaik.
REFERENCES
[1] E. M. Wenzel, "Research in virtual auditory displays at NASA", in SimTecT 96, Melbourne, 1996.
[2] E. M. Wenzel, "Localization in virtual acoustic displays", Presence, pp. 80-105, 1992.
[3] D. R. Begault, 3-D Sound for Virtual Reality and Multimedia, Chestnut Hill, MA: Academic Press, 1994.
[4] R. L. McKinley and M. A. Ericson, "Flight demonstration of a 3-D auditory display", in Binaural and Spatial Hearing in Real and Virtual Environments, R. H. Gilkey and T. R. Anderson, Eds. Mahwah, NJ: Lawrence Erlbaum Associates, 1997, pp. 683-699.
[5] S. Carlile, "Auditory space", in Virtual Auditory Space: Generation and Applications, S. Carlile, Ed. Austin: Landes, 1996, chapter 1.
[6] S. Carlile, "The physical and psychophysical basis of sound localization", in Virtual Auditory Space: Generation and Applications, S. Carlile, Ed. Austin: Landes, 1996, chapter 2.
[7] N. I. Durlach, et al., "On the externalization of auditory images", Presence, vol. 1, pp. 251-257, 1992.
[8] S. Carlile, Ed., Virtual Auditory Space: Generation and Applications, Austin: Landes, 1996.
[9] S. M. Khanna and M. R. Stinson, "Specification of the acoustical input to the ear at high frequencies", J. Acoust. Soc. Am., vol. 77(2), pp. 577-589, 1985.
[10] R. D. Rabbitt and M. H. Holmes, "Three dimensional acoustic waves in the ear canal and their interaction with the tympanic membrane", J. Acoust. Soc. Am., vol. 83(3), pp. 1064-1080, 1988.
[11] D. Pralong and S. Carlile, "The role of individualized headphone calibration for the generation of high fidelity virtual auditory space", J. Acoust. Soc. Am., vol. 100(6), pp. 3785-3793, 1996.
[12] F. L. Wightman and D. J. Kistler, "Headphone simulation of free field listening. I: Stimulus synthesis", J. Acoust. Soc. Am., vol. 85(2), pp. 858-867, 1989.
[13] J. C. Middlebrooks, J. C. Makous, and D. M. Green, "Directional sensitivity of sound-pressure levels in the human ear canal", J. Acoust. Soc. Am., vol. 86(1), pp. 89-108, 1989.
[14] D. Pralong and S. Carlile, "Measuring the human head-related transfer functions: A novel method for the construction and calibration of a miniature 'in-ear' recording system", J. Acoust. Soc. Am., vol. 95(6), pp. 3435-3444, 1994.
[15] D. Hammershoi and H. Moller, "Sound transmission to and within the human ear canal", J. Acoust. Soc. Am., vol. 100(1), pp. 408-427, 1996.
[16] J. C. K. Chan and C. D. Geisler, "Estimation of eardrum acoustic pressure and of ear canal length from remote points in the canal", J. Acoust. Soc. Am., vol. 87(3), pp. 1237-1247, 1990.
[17] J. C. Middlebrooks and D. M. Green, "Directional dependence of interaural envelope delays", J. Acoust. Soc. Am., vol. 87(5), pp. 2149-2162, 1990.
[18] J. C. Middlebrooks and D. M. Green, "Sound localization by human listeners", Annu. Rev. Psychol., vol. 42, pp. 135-159, 1991.
[19] P. H. W. Leong and S. Carlile, "Methods for spherical data analysis and visualisation", J. Neurosci. Methods, in press, 1998.
[20] N. I. Fisher, T. Lewis, and B. J. J. Embleton, Statistical Analysis of Spherical Data, paperback (with errata) ed. Cambridge: Cambridge University Press, 1993.
[21] S. Carlile, P. Leong, and S. Hyams, "The nature and distribution of errors in the localization of sounds by humans", Hear. Res., vol. 114, pp. 179-196, 1997.
[22] D. Pralong and S. Carlile, "Localization accuracy in virtual auditory space", Proc. Aust. Neurosci. Soc., vol. 7, p. 225, 1996.
[23] E. M. Wenzel, et al., "Localization using non-individualized head-related transfer functions", J. Acoust. Soc. Am., vol. 94(1), pp. 111-123, 1993.
[24] E. A. Lopez-Poveda and R. Meddis, "A physical model of sound diffraction and reflections in the human concha", J. Acoust. Soc. Am., vol. 100(5), pp. 3248-3259, 1996.
[25] A. Kulkarni and H. S. Colburn, "Infinite-impulse response models of the head-related transfer function", J. Acoust. Soc. Am., submitted, 1997.
[26] F. L. Wightman and D. J. Kistler, "Multi-dimensional scaling analysis of head-related transfer functions", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1993.
[27] I. T. Jolliffe, Principal Component Analysis. New York: Springer-Verlag, 1986.
[28] J. C. Middlebrooks, Z. A. Onsan, and L. Xu, "Virtual sound localization with non-individualized transfer functions is improved by scaling transfer functions in frequency", Abstracts of the Twenty-first Midwinter Research Meeting of the Association for Research in Otolaryngology, vol. 21, p. 43, 1998.
[29] J. C. Middlebrooks, J. C. Makous, and D. M. Green, "Directional sensitivity of sound-pressure levels in the human ear canal", J. Acoust. Soc. Am., vol. 86, pp. 89-108, 1989.