Hearing Research 312 (2014) 143–159
Contents lists available at ScienceDirect
Hearing Research
journal homepage: www.elsevier.com/locate/heares
Research paper
Selective attention reduces physiological noise in the external ear
canals of humans. I: Auditory attention
Kyle P. Walsh*, Edward G. Pasanen, Dennis McFadden
Department of Psychology and Center for Perceptual Systems, 1 University Station A8000, University of Texas, Austin, TX 78712-0187, USA
Article info
Article history:
Received 12 July 2013
Received in revised form 13 February 2014
Accepted 28 March 2014
Available online 13 April 2014
Abstract
In this study, a nonlinear version of the stimulus-frequency OAE (SFOAE), called the nSFOAE, was used to
measure cochlear responses from human subjects while they simultaneously performed behavioral tasks
requiring, or not requiring, selective auditory attention. Appended to each stimulus presentation, and
included in the calculation of each nSFOAE response, was a 30-ms silent period that was used to estimate
the level of the inherent physiological noise in the ear canals of our subjects during each behavioral
condition. Physiological-noise magnitudes were higher (noisier) for all subjects in the inattention task,
and lower (quieter) in the selective auditory-attention tasks. These noise measures initially were made at
the frequency of our nSFOAE probe tone (4.0 kHz), but the same attention effects also were observed
across a wide range of frequencies. We attribute the observed differences in physiological-noise magnitudes between the inattention and attention conditions to different levels of efferent activation
associated with the differing attentional demands of the behavioral tasks. One hypothesis is that when
the attentional demand is relatively great, efferent activation is relatively high, and a decrease in the gain
of the cochlear amplifier leads to lower-amplitude cochlear activity, and thus a smaller measure of noise
from the ear.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Although much has been learned about the anatomy, neurophysiology, and biochemistry of the olivocochlear efferent system
since the early reports of Rasmussen (1946, 1953), its function
during everyday listening remains uncertain. Motivated by the
seminal, if controversial, report by Hernández-Peón et al. (1956),
there has been continual interest in the question of whether the
olivocochlear efferent system plays a role in selective attention
(e.g., Picton and Hillyard, 1971; Puel et al., 1988; Meric and Collet,
1994a; Fritz et al., 2007). Hernández-Peón et al. (1956) reported
that gross electrical potentials recorded from the dorsal cochlear
nucleus (DCN) in the auditory brainstem of cats were reduced in
magnitude when the animals attended to visual, somatic, or olfactory stimuli, relative to when the animals were in a state of
inattention. Ultimately, this fascinating finding was discredited on
grounds of poor experimental control (e.g., Worden, 1973). Nevertheless, it created considerable, continuing interest in the
* Corresponding author. Current address: Department of Psychology and Center
for Cognitive Sciences, 75 East River Road, University of Minnesota, Minneapolis,
Minnesota 55455, USA.
E-mail address: [email protected] (K.P. Walsh).
http://dx.doi.org/10.1016/j.heares.2014.03.012
0378-5955/© 2014 Elsevier B.V. All rights reserved.
possibility that the attentional demands of a behavioral task, or
those of an environment, can modulate the afferent responses of
the peripheral auditory system, either at the level of the auditory
brainstem (as in Hernández-Peón; e.g., Lukas, 1980), or in the responses of the cochlea (e.g., Puel et al., 1988; Giard et al., 1994;
Maison et al., 2001; Harkrider and Bowers, 2009). Research confirms everyday experience that humans are able to control their
attention (Hafter et al., 1998; Gallun et al., 2007). However, after
more than one-half century of research, there is a paucity of clear
evidence that cognitive processes, such as the selective allocation
of attentional resources, can affect the responses of the afferent
auditory periphery.
If attentional demands (or other cognitive or perceptual demands) were capable of modulating afferent auditory responses at
the level of the cochlea (the transduction stage in the auditory
system), the medial olivocochlear bundle (MOCB) would be one
neural pathway through which these effects would be realized. This
pathway originates in the superior olivary complex of the brainstem, which is innervated directly by efferent neurons originating
in the inferior colliculus and auditory cortex (Mulders and
Robertson, 2000a,b). The fibers of the MOCB terminate at the bases of the outer hair cells (OHCs) of the cochlea (Warr and Guinan,
1979), where they inhibit (hyperpolarize) the OHCs, which in turn
increases the local stiffness of the cochlear partition and diminishes
the displacement of the basilar membrane (Cooper and Guinan,
2006), thereby reducing the afferent output of the inner ear. If
the efferent flow from the cortex to the brainstem varied with level
of attention, then the activity in the MOCB would vary, as would the
afferent flow from the cochlea.
The OHCs are part of the cochlear-amplifier system (Davis, 1983)
that is thought to be involved in the production of otoacoustic
emissions (OAEs), weak sounds produced in the inner ear and
measured in the external ear canal (Kemp, 1978, 1980). For this
reason, OAEs have been used to examine how human cochlear responses are affected by MOCB activation, and in turn how the
attentional demands of a behavioral task affect efferent feedback to
the cochlea. Previous studies have demonstrated that OAE magnitudes measured during auditory- or visual-attention tasks were
different from OAE magnitudes measured during tasks that did not
require attention (Puel et al., 1988; Froehlich et al., 1990, 1993;
Meric and Collet, 1992, 1994b; Giard et al., 1994; Ferber-Viart
et al., 1995; Maison et al., 2001; Harkrider and Bowers, 2009).
However, across studies, the directions of the attentional effects on
OAEs have been inconsistent, the magnitudes of the observed differences always have been small (less than about 1 dB), and comparisons across studies have been made difficult by significant
procedural differences (see Discussion).
This is the first in a series of reports describing differences in
cochlear responses when human subjects are, or are not, engaged
in selective attention to either auditory or visual stimuli. In all cases,
physiological and behavioral auditory measures were obtained
simultaneously during the same test trials. In this first report, a
nonlinear version of the stimulus-frequency OAE (SFOAE), called
the nSFOAE or the residual from linear prediction (Walsh et al.,
2010a,b), was used to measure cochlear responses during tasks
that required either selective auditory attention to strings of digits
spoken by one of two simultaneous talkers (dichotic or diotic
listening), or relative inattention. In a companion paper, we report
similar results involving visual rather than auditory attention
(Walsh et al., 2014). These first two reports emphasize cochlear
measures made during brief silent periods following the nSFOAE-evoking stimuli. Later we also will report parallel measurements
obtained during the nSFOAE-evoking stimuli, which we call “perstimulatory” measures. Both the silent-period and perstimulatory
measures exhibited marked differences during attention and inattention conditions.
Our measure of physiological noise was recorded in the external
ear canals of our subjects during every behavioral condition, using
the same cancellation procedure used to estimate the perstimulatory nSFOAE response. In contrast to the majority of previous studies on the effects of attention on OAEs that also measured
noise levels in the test ears (Froehlich et al., 1990, 1993; Ferber-Viart
et al., 1995; de Boer and Thornton, 2007; Harkrider and Bowers,
2009), every subject exhibited consistent differences in our
physiological-noise measure between the inattention and
selective-attention conditions. Specifically, the magnitudes of the
physiological noise always were higher during the inattention
condition than during the auditory selective-attention conditions,
the differences being about 3.0 dB averaged across subjects,
attention condition, and test frequency.
2. Methods
2.1. General
This first report focuses on an auditory measure of the physiological noise present in the external ear canals of humans during
each of several auditory-attention conditions. A nonlinear
procedure was used to estimate the level of the nSFOAE during a
brief silent period following each nSFOAE-evoking stimulus presentation. The Institutional Review Board at The University of Texas
at Austin approved the procedures described here. All subjects
provided their informed consent prior to any testing, and they were
paid for their participation. The behavioral measures will be
described first, followed by the physiological measures. Then, a
description will be provided of the integration of the behavioral and
physiological measures.
2.1.1. Subjects
Two males (both aged 22) and six females (aged 20–25) were
paid an hourly rate to participate in this study. All eight subjects
completed two 2-hr auditory-attention sessions. Across those sessions, each subject completed each of the experimental conditions
to be described a minimum of four times. All subjects had normal
hearing [≤15 dB Hearing Level (HL)] at octave frequencies between
250 and 8000 Hz, and normal middle-ear and tympanic reflexes, as
determined using an audiometric screening device (Auto Tymp 38,
GSI/VIASYS, Inc., Madison, WI). Across the eight subjects, two ears,
and four frequencies (0.5, 1.0, 2.0, and 4.0 kHz), the average middle-ear reflex (MER) threshold in our subjects was about 91 dB HL, and
no individual subject had unusually low or high thresholds. No
subject had a spontaneous otoacoustic emission (SOAE) stronger
than 15.0 dB SPL within 600 Hz of the frequency of the 4.0-kHz
probe tone used to elicit the nSFOAE.
2.2. Behavioral measures
Each subject was tested individually while seated in a reclining
chair inside a double-walled, sound-attenuated room. Two insert
earphone systems delivered sounds directly to the two external ear
canals. (The earphone systems are described in detail in Section 2.3
below.) Some of the sounds presented were relevant for the
behavioral task, and interleaved with these sounds were the stimuli
for evoking the nSFOAE response. A computer screen attached to an
articulating mounting arm was positioned by the subject to a
comfortable viewing distance, and was used to provide task instructions and trial-by-trial feedback. A numerical keypad was
provided to the subject to indicate his or her responses on the
behavioral task.
2.2.1. Selective auditory-attention conditions
2.2.1.1. Dichotic condition. There were two auditory selective-attention conditions: one involved dichotic presentation of the
stimuli for the behavioral task, and one involved diotic presentation. For the dichotic-listening condition, two competing speech
streams were presented separately to the ears, and the task of the
subject was to attend to one of the speech streams. In one ear the
talker was female, in the other ear the talker was male, and which
ear received the female talker was selected trial-by-trial from a
closed set of random permutations. The number of trials having the
female voice presented to the right ear was approximately equal to
the number of trials having the female voice presented to the left
ear. On each trial, the two talkers simultaneously spoke two
different sequences of seven single-digit numbers. Each digit (0–9)
was selected randomly with replacement, and the digit sequence
spoken by the single female talker was selected independently
from that spoken by the single male talker. Each digit was presented during a 500-ms interval, and consecutive digits were
separated by 330-ms interstimulus intervals (ISIs). As described
below, the stimulus waveforms used to elicit the nSFOAE response
were presented in the ISIs between spoken digits. Fig. 1 shows an
example of the speech waveforms presented on a single trial of the
dichotic-listening condition.
Fig. 1. An example of the speech waveforms presented to the ears during one trial of the dichotic-listening condition. Each ear was presented with a series of seven spoken digits,
one series spoken by a female talker, and the other series spoken simultaneously by a male talker. The ear receiving the female talker was selected randomly on each trial. Each digit
was presented in a 500-ms temporal window, and a 330-ms ISI separated consecutive digits. Although not shown here, the nSFOAE-eliciting stimuli were presented in the ISIs, and
a 2000-ms silent response interval and a 200-ms feedback interval completed each trial. During the response interval, the subject performed a matching task based on the digits
spoken by the female talker.
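The trial timing given in the text and in the Fig. 1 caption can be tallied in a few lines. This is a minimal sketch, assuming that a trial begins at the onset of the first digit and that the response interval follows the seventh digit with no extra padding; the function and constant names are illustrative and do not come from the original software.

```python
# Durations in milliseconds, taken from the text and the Fig. 1 caption.
DIGIT_MS = 500       # window for each spoken digit
ISI_MS = 330         # gap between consecutive digits
N_DIGITS = 7
RESPONSE_MS = 2000   # silent response interval
FEEDBACK_MS = 200    # feedback interval


def trial_duration_ms(n_digits=N_DIGITS):
    """Total trial length, assuming no padding before or after the digit series."""
    n_isis = n_digits - 1          # gaps fall only between digits
    stimulus_ms = n_digits * DIGIT_MS + n_isis * ISI_MS
    return stimulus_ms + RESPONSE_MS + FEEDBACK_MS


def digit_onsets_ms(n_digits=N_DIGITS):
    """Onset time of each digit window relative to the start of the trial."""
    return [i * (DIGIT_MS + ISI_MS) for i in range(n_digits)]


if __name__ == "__main__":
    print(trial_duration_ms())   # 7680
    print(digit_onsets_ms())     # [0, 830, 1660, 2490, 3320, 4150, 4980]
```

Under these assumptions a trial lasts about 7.7 s, so 30 trials would take roughly 3.8 min before any trials are repeated.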
The subject always was instructed to attend to the ear in which
the female was talking, to remember the seven-digit sequence that
she spoke, and then to choose a subset of that sequence from one of
two choices presented visually on a computer screen at the end of
the trial. Each choice of response consisted of five digits, presented
simultaneously, and the correct choice always corresponded to the
middle five digits spoken by the female talker. The incorrect choice
differed from the correct choice by only one randomly mismatched
digit, and it also was unrelated to the digit sequence spoken by the
male talker.
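The construction of the two response choices can be sketched as follows. This is an illustration only: the function name, the dictionary keys, and the random placement of the correct choice on the left or right are assumptions, and "unrelated to the digit sequence spoken by the male talker" is interpreted here simply as "not equal to the male talker's middle five digits".

```python
import random


def make_trial(rng):
    """Build one trial: two independent digit sequences plus the two visual choices."""
    female = [rng.randrange(10) for _ in range(7)]   # digits 0-9, with replacement
    male = [rng.randrange(10) for _ in range(7)]     # drawn independently
    correct = female[1:6]                            # middle five digits
    while True:
        foil = list(correct)
        pos = rng.randrange(5)                       # position to mismatch
        foil[pos] = rng.choice([d for d in range(10) if d != correct[pos]])
        # Assumption: the foil must not match the male talker's middle five digits.
        if foil != male[1:6]:
            break
    # Assumption: the side showing the correct choice is picked at random.
    correct_side = rng.choice(["left", "right"])
    return {"female": female, "male": male, "correct": correct,
            "foil": foil, "correct_side": correct_side}


if __name__ == "__main__":
    trial = make_trial(random.Random(0))
    print(trial["correct"], trial["foil"], trial["correct_side"])
```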
Not shown in Fig. 1 is the silent, 2000-ms response interval that
occurred at the end of each trial, during which the subject
responded by pressing one of two keys on a keypad (“4” or “6”) to
indicate whether the correct series of digits was displayed on the
left or the right side of the computer screen, respectively. The “5”
key had a raised nipple so that the subject knew where his or her
fingers were located without having to look at the keypad. At the
beginning of a block of trials, the subject placed his or her index,
middle, and ring fingers on the “4,” “5,” and “6” keys, respectively,
and maintained that placement through the entire block. This
eliminated body and head movements, which would disrupt the
nSFOAE recordings. Immediately following the behavioral
response, the series of digits selected by the subject was surrounded by an illuminated border, and a 200-ms feedback light
indicated which of the two response choices was correct.
2.2.1.1.1. Diotic condition. The diotic-listening condition was
similar to the dichotic-listening condition with the exception that
the male and female talkers were presented simultaneously to both
ears on each trial, rather than to separate ears. Thus, the dichotic-listening condition required attention to one of two spatial locations (the left or the right ear), whereas the diotic-listening condition required subjects to partition two speech streams that
seemed to originate from the same location in space, roughly in the
center of the head. It is intuitive that the diotic condition would be
more difficult than the dichotic condition, but this did not prove to
be true for all subjects.
2.2.1.1.2. Speech stimuli. One female talker and one male talker
were recorded to create the speech stimuli. The ten individual digit
waveforms for each talker were fitted to a 500-ms window by
aligning the onset of each waveform with the onset of the window,
and by adding the appropriate number of zero-amplitude samples
(“zeros”) to the end of each waveform to fill the window. The recordings were made using a 50-kHz sampling rate and 16-bit
resolution, and were not filtered or processed further before being
saved individually to disk. Before presentation, all waveforms were
lowpass filtered at 3.0 kHz, and were equalized such that the overall
level of each waveform was about 50 dB SPL. These levels were
weaker than the levels of the sounds used for evoking the nSFOAE
(see below).
2.2.2. Inattention condition
An inattention condition was used for comparison with the
dichotic- and diotic-listening conditions just described. This control
condition was designed to differ from the selective-attention conditions primarily in the amount of cognitive resources required to
perform the behavioral task. During each trial of the inattention
condition, series of speech-shaped noise (SSN) stimuli (described
below) were presented dichotically to the two ears instead of
spoken digits. The SSN stimuli had the characteristics of speech
without actually sounding like speech. As in the selective-listening
conditions, the SSN stimuli were interleaved with the nSFOAE
stimuli, and the timing of the stimulus presentations was the same
as for the two selective-listening conditions (see Fig. 1). The subject’s task simply was to press the number “4” on the response
keypad at the end of each trial, after the final sound in the stimulus
series.
The SSN stimuli were constructed by taking the Fast Fourier
Transform (FFT) of each of the 20 speech waveforms used in the
auditory-attention conditions and creating 20 samples of noise
having the same overall frequency and amplitude spectra, and the
same durations. A Hilbert transform was used to extract the envelope from each spoken digit from both talkers. Those envelopes
were lowpass filtered at 500 Hz to limit moment-to-moment
fluctuations, and then applied to the relevant sample of noise.
The resultant waveforms were not intelligible as speech although
the stimuli derived from the female talker were noticeably higher
in pitch. Similar to the dichotic-listening condition, one ear
received a series of seven SSN stimuli derived from the female
talker, and the other ear received a series of seven SSN stimuli
derived from the male talker. Different series of SSN stimuli were
presented on different trials, having been selected randomly with
replacement. The ear that received the “female” noise bursts was
selected randomly on each trial from a random set of permutations
that equated the number of trials in a block during which the “female” noise bursts were presented to the right versus left ear.
Although these manipulations were important as controls, they
were not important for the subject, who was not required to attend
preferentially to any of these stimuli.
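The SSN construction described above (matching the magnitude spectrum, then imposing a smoothed Hilbert envelope) can be sketched in plain Python. Several choices here are assumptions and are not stated in the text: the noise is generated by randomizing the phase of each spectral component, the 500-Hz lowpass is a brick-wall filter applied in the frequency domain, the envelope is applied by multiplication, and no level equalization is performed. A naive DFT is used only to keep the sketch self-contained; it is suitable for short signals.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (adequate for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT; returns complex samples."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectrum_matched_noise(x, rng):
    """Noise with the same magnitude spectrum as x but randomized phases."""
    n = len(x)
    X = dft(x)
    Y = [0j] * n
    Y[0] = complex(abs(X[0]))                      # keep the DC bin real
    for k in range(1, (n + 1) // 2):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        Y[k] = abs(X[k]) * cmath.exp(1j * phase)
        Y[n - k] = Y[k].conjugate()                # conjugate symmetry gives a real signal
    if n % 2 == 0:
        Y[n // 2] = complex(abs(X[n // 2]))        # keep the Nyquist bin real
    return [v.real for v in idft(Y)]


def smoothed_envelope(x, fs, cutoff_hz):
    """Hilbert envelope of x, lowpass filtered by zeroing DFT bins above cutoff_hz."""
    n = len(x)
    X = dft(x)
    A = [0j] * n                                   # spectrum of the analytic signal
    A[0] = X[0]
    for k in range(1, (n + 1) // 2):
        A[k] = 2 * X[k]                            # double positive frequencies
    if n % 2 == 0:
        A[n // 2] = X[n // 2]
    env = [abs(v) for v in idft(A)]
    E = dft(env)
    for k in range(n):
        freq = min(k, n - k) * fs / n              # frequency of bin k in Hz
        if freq > cutoff_hz:
            E[k] = 0j                              # brick-wall lowpass (an assumption)
    return [v.real for v in idft(E)]


def speech_shaped_noise(x, fs, rng, cutoff_hz=500.0):
    """Spectrum-matched noise multiplied by the smoothed envelope of x."""
    noise = spectrum_matched_noise(x, rng)
    env = smoothed_envelope(x, fs, cutoff_hz)
    return [nv * ev for nv, ev in zip(noise, env)]
```

For waveforms of realistic length (500 ms at 50 kHz), a fast FFT routine would replace the naive transforms; the logic would be unchanged.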
2.2.2.1. General. There were nSFOAE-evoking stimuli interleaved
with the speech sounds such that every test trial had the potential
to yield two physiological responses (see below). To be accepted for
averaging, those physiological responses had to meet certain pre-established criteria (see Appendix), but they were evaluated for
acceptance only if a key-press response (correct or incorrect) was
made during the 2000-ms response interval. When a response
failed to meet the criteria for acceptance, an additional trial was
added to the block, and subjects received trial-by-trial feedback
about this process. By design, every block of trials provided at least
30 trials having both a behavioral response and at least one
accepted physiological response. Because sometimes subjects did
not produce a key-press within the allotted time, and sometimes
the physiological responses did not meet the pre-established
criteria, the number of trials necessary to acquire 30 usable trials
varied across blocks. The physiological responses obtained on trials
having a correct behavioral response were stored separately from
the responses obtained on trials having an incorrect response, but
in the end, the latter were discarded because they were based on
too few trials to be reliable. Thus, the physiological responses reported here are only those obtained from behaviorally correct trials,
meaning that, depending upon a subject’s behavioral performance,
the physiological responses for a particular block were based on
about 20–30 trials. After pooling across blocks (described below),
the final physiological responses were based on about 80–120
trials. Past experience with the nSFOAE procedure (Walsh et al.,
2010a,b) revealed that this is sufficient averaging to obtain reliable responses. The typical duration of a block of trials was about
4–6 min.
2.3. Physiological measures
The stimuli used to evoke the nSFOAE responses were presented
on the same trials used to collect the behavioral responses. The
nSFOAE-evoking stimuli were interleaved with the speech or SSN
stimuli and were delivered directly to the ears by the same two
insert earphone systems used to present the speech or SSN stimuli.
For the right ear, two Etymotic ER-2 earphones (Etymotic, Elk Grove
Village, IL) were attached to plastic sound-delivery tubes that were
connected to an Etymotic ER-10A microphone capsule. The
microphone capsule had two sound-delivery ports that were
enclosed by the foam ear-tip that was fitted into the ear canal. The
nSFOAE responses were elicited by sounds presented by both ER-2
earphones (see below), and were recorded using the ER-10A
microphone. No microphone was present in the left ear, just a
single ER-2 earphone for presenting the nSFOAE-evoking and
speech (or SSN) stimuli. The nSFOAEs and physiological-noise
measures always were recorded from the right ear only, but the
nSFOAE-evoking stimuli were presented to both ears simultaneously. Accordingly, here the right ear sometimes is referred to as
the ipsilateral ear and the left as the contralateral ear.
The acoustic stimuli (spoken digits or SSN sounds, and the
nSFOAE-evoking sounds) were presented and the nSFOAE responses were digitized simultaneously using a National Instruments sound board (PCI-MIO-16XE-10) installed in a Macintosh
G4 computer. Stimulus presentation and nSFOAE recording both
were implemented using custom-written LabVIEW® software
(National Instruments, Austin, TX). The sampling rate for both input
and output was 50 kHz with 16-bit resolution. The stimulus
waveforms were passed from the digital-to-analog converter in the
sound board to a custom-built lowpass filter/pre-amplifier before
being passed to the earphones for presentation. The analog output
of the microphone was passed to an Etymotic preamplifier (20 dB
gain), and then to a custom-built amplifier/bandpass filter (14 dB
gain, filtered from 0.4 to 15.0 kHz), before being passed to the
analog-to-digital converter on the sound board.
2.3.1. The nSFOAE procedure
As noted, the physiological measure used here was a nonlinear
version of the stimulus-frequency otoacoustic emission (SFOAE).
The SFOAE is a perstimulatory OAE emitted by the cochlea
throughout the presentation of a long-duration sound, typically a
tone (Kemp, 1978, 1980). The cochlear response is a tone of the
same frequency as the input stimulus, but it is much weaker and
has a time delay. The input tone and the SFOAE tone sum in the ear
canal, and the resultant must be processed in some way to extract
the cochlear response (Guinan et al., 2003). Our process for estimating the cochlear response relies on a version of the “double-evoked” procedure described by Keefe (1998), and as a consequence, what was extracted is a nonlinear measure of the cochlear
response. Accordingly, we call our measure the nSFOAE (Walsh
et al., 2010a) to distinguish it from other SFOAE measures.
In the double-evoked procedure, the acoustic stimulus is presented to the same ear three times per trial (a “triplet”), and three
measures of the sound in the ear canal are collected. The first two
presentations are repetitions of an acoustic stimulus of exactly the
same frequency content, level, duration, and starting phase. In our
hands, these two presentations were made using different ER-2
earphones, each calibrated separately after placement in the ear
canal. For the third stimulus of each triplet, the two earphones
present simultaneously the same exact acoustic stimuli that they
previously had presented sequentially. Accordingly, the third
stimulus waveform was a more-or-less perfect sum of the first two
waveforms, and, here, its level was essentially 6 dB greater than
that of either of the first two stimulus presentations.
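As a quick check of the 6-dB figure, assume that the two earphone outputs add perfectly in phase, so that the pressure amplitude A of a single presentation is doubled in the third presentation:

```latex
\[
\Delta L = 20\log_{10}\!\left(\frac{2A}{A}\right) = 20\log_{10} 2 \approx 6.02\ \text{dB}
\]
```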
Recordings of the sound in the ear canal were collected during
all three presentations comprising each triplet. As noted, these
recordings are the sum of the input stimulus and any cochlear
response made to that stimulus, plus any extraneous sounds from
subject movement, breathing, or other physiological or ambient
noise. In order to extract the nSFOAE, the first and second recordings of a triplet were summed, and the third recording was
subtracted from this sum. If only linear processes were operating to
produce the individual cochlear responses contained in each of the
three recordings, and if the recording system itself was linear, the
result of this subtraction would have been near-perfect cancellation; specifically, a resultant waveform whose magnitude was not
discriminable from that of the physical noise floor of our measurement system. In fact, when our procedures and equipment
were used to “stimulate” a passive cavity (see below), near-perfect
cancellation did occur; however, in a human ear canal, the result of
the double-evoked subtraction always was a residual waveform
whose magnitude did exceed the noise floor. This is the nSFOAE
response. The nSFOAE also could be called the residual from linear
prediction or the residual from additivity. The nSFOAE exists, in
part, because as stimulus level increases, the cochlear response
grows more slowly than linear additivity (Bacon et al., 2004). One
way of thinking about the nSFOAE is that it represents the amount
by which the amplitude of the sum of the first two recordings exceeds the amplitude of the third triplet recording. A stable estimate
of the nSFOAE was achieved by averaging the residual waveform
obtained from each triplet across multiple triplets in the same block
of trials. A primary strength of the double-evoked procedure is that
it eliminates the physical stimulus from the residual response.
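The logic of the double-evoked subtraction can be illustrated with a toy model. This is a sketch under strong assumptions, not a model of the cochlea: each ear-canal recording is taken to be the stimulus plus a weak, memoryless "cochlear" component that follows a power law, with an exponent of 1.0 standing for a purely linear system and a hypothetical exponent of 0.5 standing for compressive growth. The function names and the `emission_gain` value are illustrative.

```python
import math


def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def recording(stimulus, exponent, emission_gain=0.01):
    """Ear-canal sound: the stimulus itself plus a weak power-law response."""
    return [s + emission_gain * math.copysign(abs(s) ** exponent, s)
            for s in stimulus]


def double_evoked_residual(stimulus, exponent):
    """Sum of the two single-earphone recordings minus the combined recording."""
    rec1 = recording(stimulus, exponent)                      # earphone 1 alone
    rec2 = recording(stimulus, exponent)                      # earphone 2 alone
    rec3 = recording([2.0 * s for s in stimulus], exponent)   # both earphones together
    return [a + b - c for a, b, c in zip(rec1, rec2, rec3)]


fs = 50_000                                                   # sampling rate from the text (Hz)
tone = [math.sin(2 * math.pi * 4000 * n / fs) for n in range(500)]  # 10 ms at 4.0 kHz

linear = double_evoked_residual(tone, 1.0)
compressive = double_evoked_residual(tone, 0.5)               # hypothetical exponent
print(rms(linear))        # essentially zero: the stimulus and linear response cancel
print(rms(compressive))   # nonzero residual left by the compressive component
```

In the linear case the stimulus and the response both cancel, which corresponds to the passive-cavity result; with compressive growth, only the nonlinear part of the response survives the subtraction.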
The stimulus used here to evoke the nSFOAE always was a long-duration tone presented in wideband noise. The tone was 4.0 kHz,
300 ms in duration, and 60 dB SPL in level. The noise had a
bandwidth of 0.1–6.0 kHz, was 250 ms in duration, and had an
overall level of about 62.7 dB SPL (a spectrum level of about 25 dB,
so the signal-to-noise ratio at 4.0 kHz was about 35 dB). The onset
of the tone always preceded the onset of the noise by 50 ms. The
tone was gated using a 5-ms cosine-squared rise and decay, and the
noise was gated using a 2-ms cosine-squared rise and decay. The
same random sample of noise was used across all presentations of a
triplet, across all triplets, across all conditions, and across all subjects. Consecutive nSFOAE stimuli always were separated by a 500-ms ISI (during which digits were spoken by the two talkers). As
noted, the nSFOAE responses obtained during the presentations of
the tone and noise (the perstimulatory responses) will be described
elsewhere; here and in the companion paper on visual attention
(Walsh et al., 2014), the emphasis will be on the nSFOAEs measured
during brief silent periods that followed each nSFOAE-evoking
stimulus. Specifically, following the simultaneous offset of tone
and noise for each presentation was a 30-ms silent period from
which a measure of the physiological noise in the ear canal was
extracted for each triplet.
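The noise levels quoted above are mutually consistent if the noise spectrum is taken to be flat across its passband. In the check below, N_0 denotes the spectrum level and Δf the bandwidth in Hz; these symbols are introduced here for convenience and are not the authors' notation.

```latex
\begin{align*}
L_{\text{overall}} &= N_0 + 10\log_{10}(\Delta f)
  = 25 + 10\log_{10}(6000 - 100) \approx 25 + 37.7 = 62.7\ \text{dB SPL},\\
\text{SNR}_{4\,\text{kHz}} &= L_{\text{tone}} - N_0 = 60 - 25 = 35\ \text{dB}.
\end{align*}
```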
The above description of the double-evoked procedure and the
resulting nSFOAE response is accurate and necessary to understanding the data presented here. Every trial of every block was
analyzed as described above, and that includes the 30-ms silent
periods that ended each stimulus presentation. Note that all that
was saved for later analysis from each block of trials were the difference responses accumulated across trials for each triplet, not the
accumulated responses to each of the three presentations for each
triplet.
The double-evoked procedure is not a first choice for measuring
cochlear responses during periods of silence because SFOAEs and
nSFOAEs are, by definition, perstimulatory responses, and when
there is no tone or noise-band present, the double-evoked procedure is unnecessarily indirect. (There were no acoustic stimuli
during the silent periods, so there were no SFOAEs behaving nonlinearly across stimulus levels, so the rationale for subtracting two
summed responses from a double-amplitude response was gone.)
Once any aftereffects of the nSFOAE-evoking stimuli had died out
(see below), the double-evoked subtraction of responses obtained
during the silent period (likely) was tantamount to summing two
independent samples of noise and subtracting a third independent
sample of noise of the same approximate magnitude; that is,
summing three independent samples of noise. A simpler procedure
would have been to accumulate and store separately all three responses of each triplet, not just the difference responses trial-by-trial. However, when this study was designed, we did not anticipate that the silent periods would yield interesting attentional effects; they were included only for calibration purposes. As noted,
our primary interest initially was in the perstimulatory responses,
for which the double-evoked procedure was an appropriate choice,
and, as will be shown below, that procedure did yield measures that
differed across attentional conditions during the perstimulatory
periods, just as for the silent periods, experimental differences
that could not have been created by the double-evoked procedure
itself.
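If the three silent-period recordings of a triplet are treated as independent noise samples n_1, n_2, and n_3 with equal variance σ² (the supposition made above), the effect of the double-evoked subtraction on a single triplet, before any averaging across triplets, can be written as:

```latex
\[
\operatorname{Var}(n_1 + n_2 - n_3) = 3\sigma^2
\quad\Longrightarrow\quad
10\log_{10}\!\left(\frac{3\sigma^2}{\sigma^2}\right) \approx 4.77\ \text{dB}
\]
```

Under that supposition the subtraction raises the noise level by a fixed amount of about 4.8 dB relative to a single recording, and a fixed offset of this kind would apply equally to every behavioral condition.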
speech series (female talker) plus the interleaved nSFOAE-evoking
stimuli, and the bottom trace contains the unattended speech series (male talker) plus the identical interleaved nSFOAE-evoking
stimuli. On every trial, two triplets were presented consecutively;
thus, the third and sixth nSFOAE-evoking stimuli were twice the
amplitude of the other nSFOAE-evoking stimuli in the series.
Although nSFOAE responses always were measured from the right
ear only, the same nSFOAE-evoking stimulus series also was presented to the left ear on every trial. Recall that on one-half of the
dichotic-listening trials the subject was listening to the female voice
in the left (contralateral) ear while the nSFOAE response was
measured in the right ear, whereas in the diotic-listening condition,
the same speech stimuli and the nSFOAE-evoking stimuli were
presented simultaneously to both ears. Fig. 2 also illustrates that the
speech stimuli (about 50 dB SPL each) were weak relative to the
nSFOAE stimuli (about 60 dB SPL each). Thus, in order to perform the
dichotic- and diotic-listening tasks, subjects had to attend selectively
to relatively weak speech sounds in the gaps between relatively
strong bursts of tone-plus-noise. Note again, that the physiological
responses of interest in this report are those collected during the 30-ms silent periods at the end of each nSFOAE-evoking stimulus (the
small open rectangles in Fig. 2).
2.4.2. Physiological-noise measure
The specific procedures used to obtain our physiological-noise
measure are shown in Fig. 3. For each block of trials, the responses from about 20 to 30 trials having correct behavioral responses were sorted and averaged for each of the two triplets per
trial. An example of an averaged, unfiltered nSFOAE response obtained for one of the triplets is shown at the top of the figure. The
initial 50 ms of the perstimulatory waveform is the response to the
4.0-kHz tone presented alone, and the final 250 ms is the response
to that same tone presented in wideband noise. Immediately
following the nSFOAE response was the 30-ms silent period,
delineated here by an open rectangle, from which our
physiological-noise measures were obtained. Following data
collection, each average nSFOAE response was analyzed offline by
passing the 330-ms raw waveform through a succession of 10-ms
rectangular windows, beginning at the onset of the response to
the tone in quiet and continuing in 1-ms increments until the end
of the silent period. At each step, the 10-ms waveform segment was
bandpass filtered at some center frequency (typically between 3.8
and 4.2 kHz because the tonal signal used during the perstimulatory period to elicit the nSFOAE was 4.0 kHz) using a 6th-order, elliptical digital filter, the rms amplitude of that filtered
waveform was calculated, and the result was converted to decibels
sound-pressure level (dB SPL). As illustrated at the bottom of Fig. 3,
here we will emphasize the final succession of 10 windows during
the silent period, called the asymptotic physiological response
(enclosed by the long rectangle). To estimate these asymptotic
values, the sound-pressure levels from the last 10 available analysis
windows were averaged (the levels of the 10 windows were averaged; the individual responses were not pooled). The decline in
magnitude of the physiological response during the first few milliseconds of the silent period is discussed below.
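As an illustration only, the moving-window level calculation described above can be sketched as follows. The sampling rate, the synthetic tone, and all function names are assumptions of this sketch rather than details taken from the study, and the bandpass-filtering stage is deliberately omitted so that the example stays self-contained.

```python
import math

P_REF = 20e-6  # reference pressure for dB SPL, in pascals


def rms_level_db_spl(samples):
    """Return the rms level of a pressure waveform (in Pa) in dB SPL."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square / (P_REF * P_REF))


def moving_window_levels(waveform, fs, window_ms=10, step_ms=1):
    """Slide a rectangular window through the waveform; one level per step.

    The bandpass filter applied to each segment in the text is omitted here.
    """
    win = fs * window_ms // 1000
    step = fs * step_ms // 1000
    return [rms_level_db_spl(waveform[s:s + win])
            for s in range(0, len(waveform) - win + 1, step)]


def asymptotic_level(levels, n_last=10):
    """Average the dB levels of the last n_last windows (levels, not waveforms)."""
    tail = levels[-n_last:]
    return sum(tail) / len(tail)


# Synthetic 330-ms, 4.0-kHz tone at a hypothetical 50-kHz sampling rate.
fs = 50000
amp = 0.02  # peak pressure in Pa (illustrative value)
n_samples = fs * 330 // 1000
tone = [amp * math.sin(2 * math.pi * 4000 * n / fs) for n in range(n_samples)]

levels = moving_window_levels(tone, fs)
print(len(levels), round(asymptotic_level(levels), 2))
```

Averaging the decibel values of the final windows, rather than pooling the underlying waveform segments, mirrors the parenthetical remark in the text.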
2.4. Behavior and physiology
2.4.1. Selective auditory-attention conditions
For the auditory-attention conditions, the stimuli used to evoke
the nSFOAE responses were interleaved either with the speech
stimuli used for the dichotic- and diotic-listening conditions, or with
the SSN stimuli used for the inattention condition. To illustrate this
arrangement, Fig. 2 shows an example of all the acoustic waveforms
presented to the ears during a single trial of the dichotic-listening
condition. The trace at the top of Fig. 2 contains the attended
2.4.3. Off-frequency measures
Once the initial analyses had been made at 4.0 kHz, it was clear
that analyses at other frequencies would be informative. For consistency, it was desirable to conduct these additional analyses using
the same 10% bandwidth used at 4.0 kHz, but this raised a problem.
The rise time of the digital filter would be a different fraction of our
10-ms analysis window at the different center frequencies, meaning that the true levels of the physiological noise would be
increasingly underestimated the lower the frequency. Our solution
K.P. Walsh et al. / Hearing Research 312 (2014) 143–159
Fig. 2. Schematic showing how the nSFOAE and speech waveforms were interleaved during one trial of the dichotic-listening condition. The nSFOAE-evoking stimulus always was
composed of a 300-ms tone and a 250-ms frozen sample of wideband noise, and the onset of the tone always preceded the onset of the noise by the difference in their durations. A
30-ms silent period, shown here as an open rectangle, followed each nSFOAE-evoking stimulus for the purpose of estimating the magnitude of the physiological noise in the nSFOAE
recordings. The nSFOAE cancellation procedure was performed separately on each of the two triplets presented on each trial, yielding two estimates of the nSFOAE per trial, and two
estimates of the background noise from the silent periods. The seven speech waveforms used for the behavioral task were presented in the 500-ms ISIs between nSFOAE presentations. A 2000-ms response interval and a 200-ms feedback interval completed each trial. For each block of trials, the physiological responses from the trials having a correct
behavioral response were based on about 20–30 trials.
was to estimate correction factors for each center frequency
examined. Specifically, for each frequency of interest, a 10-ms sample
was obtained from a steady-state pure tone using the same procedures, software, and 6th-order elliptical filter set for a 10%
bandwidth, as were used to analyze the physiological responses.
The rms output of the filter then was compared with the actual rms
of the input waveform, and a correction factor was calculated in
decibels. These corrections ranged from 21.5 dB at 1.1 kHz to 1.5 dB
at 7 kHz (2.46 dB at 4.0 kHz). All noise levels reported in this and
the companion paper (Walsh et al., 2014) have been adjusted by
these frequency-dependent correction factors; the values in Walsh
et al. (2010a,b) were not so corrected. Note that these adjustments
had no effects on whatever differences existed between the physiological responses obtained from different attention conditions.
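The logic of the correction factor, and the reason it cannot alter differences between conditions, can be sketched as below. This is a hedged illustration: the onset ramp is only a stand-in for the rise time of the real 6th-order elliptical filter, and the sampling rate and the two raw levels are hypothetical values chosen for the example.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def correction_db(input_wave, filtered_wave):
    """Decibels to add to a filtered level so that it matches the input level."""
    return 20.0 * math.log10(rms(input_wave) / rms(filtered_wave))


# Steady-state 4.0-kHz tone, 10 ms long, at a hypothetical 50-kHz sampling rate.
fs = 50000
n = fs * 10 // 1000
tone = [math.sin(2 * math.pi * 4000 * i / fs) for i in range(n)]

# Stand-in for the filter output: the same tone with a linear 2.5-ms onset ramp.
# The ramp only mimics a finite rise time; it is not the elliptic filter itself.
ramp_len = fs * 25 // 10000
filtered = [x * min(1.0, i / ramp_len) for i, x in enumerate(tone)]

corr = correction_db(tone, filtered)

# Hypothetical uncorrected levels (dB SPL) for two attention conditions.
inattention_raw, attention_raw = -10.0, -13.0
diff_before = inattention_raw - attention_raw
diff_after = (inattention_raw + corr) - (attention_raw + corr)
print(round(corr, 2), diff_before, round(diff_after, 6))
```

Because the same frequency-dependent constant is added to every condition at a given center frequency, it cancels in any between-condition difference.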
Fig. 3. Schematic showing how the physiological-noise measure was calculated. At the
end of a block of trials, the average physiological-noise recording from each triplet was
analyzed by passing a 10-ms rectangular window through the raw waveform in 1-ms
steps. At each step the noise waveform was bandpass filtered, and the rms voltage was
converted to decibels SPL. The figure emphasizes the physiological-noise magnitudes
from the final ten windows of the 30-ms silent period (enclosed by the rectangle), the
overall levels of which were averaged for comparison across conditions.
2.5. Data analysis
Although not mentioned previously, data collection and analysis
were slightly different for the three experimental conditions. For
the diotic-listening condition, the physiological responses that
satisfied the criteria for acceptance (see Appendix) were pooled
across all trials in that block having the same behavioral response.
That is, at the end of each block of trials, there were four physiological measures: an averaged physiological response for all the
trials having correct behavioral responses, all those having incorrect behavioral responses, and each of those separately for triplets 1
and 2. However, for the dichotic-listening and inattention conditions, there were eight physiological measures at the end of each
block of trials. The reason is that the four measures just described
for the diotic condition were kept separately for trials on which the
female voice (or female-derived speech-shaped sounds) was in the
right ear (which had the recording microphone; the ipsilateral ear)
or the left (contralateral) ear. The purpose of this procedure was to
allow a test of the logical possibility that the amount of efferent
activity differs in ears having, and not having, the targeted, female
voice. However, the use of this procedure meant that the number of
individual waveforms contributing to the physiological response in
the diotic condition was about double that in the dichotic and
inattention conditions. A solution emerged when we found no
systematic difference in the physiological responses from the
ipsilateral and contralateral ears, within either subjects or conditions (an unexpected result). This allowed us to sum the raw
contralateral responses with the raw ipsilateral responses for the
dichotic and inattention conditions (still using only behaviorally
correct trials), which essentially equated the number of individual
trials for all conditions.
Data were averaged in another way for another purpose.
Namely, each subject completed each of the experimental conditions at least 4 times (M = 4.6). The averaged response waveform
from each block was passed through the moving-window filter
analysis and the values obtained were converted to dB SPL. The
resulting moving-window analyses were pooled (averaged) to yield
a single estimate of the physiological noise for each subject and
each condition. In order to estimate the asymptotic level of the
response at the end of the silent period (see Fig. 3), the levels in the
final 10 analysis windows of the pooled response were averaged.
Fig. 4. The nSFOAE magnitude for a representative subject over the time course of the
recording epoch: tone-alone (50 ms), tone-plus-noise (250 ms), and silent period
following termination of tone-plus-noise (30 ms). The dashed and solid lines show
nSFOAE magnitude during the inattention and attention conditions, respectively. The
tone was 4.0 kHz, 60 dB SPL; the noise was 0.1–6.0 kHz, 25 dB spectrum level. The
values plotted are levels measured by a 400-Hz bandpass filter centered at 4.0 kHz for
each 1-ms step in a succession of 10-ms rectangular windows. The same frozen-noise
sample was used for all presentations in every condition; hence the short-term temporal fluctuations in magnitude are approximately parallel for the two conditions. The
dotted/dashed gray line below the nSFOAE data represents the average physical noise
floor obtained using the identical experimental procedure but with the earphone/
microphone assembly placed in a 0.5-cc passive cavity.
The end result was similar numbers of individual trials contributing
to the averaged responses for all three experimental conditions for
each subject, typically between 80 and 120 trials (all having correct behavioral responses).
2.5.1. Statistical measures
Physiological-noise magnitudes from the selective-attention
and inattention conditions were compared using analyses of variance (ANOVA), and measures of effect size (d; see Cohen, 1992).
Here, effect size is the difference between the means of two distributions of data divided by an estimate of the pooled standard
deviation across the two distributions (see Eq. (1)). By convention,
effect sizes between 0.20 and 0.50 are considered to be small, effect
sizes between 0.50 and 0.80 are considered to be medium, and
effect sizes greater than 0.80 are considered to be large (Cohen,
1992).
$$
d = \frac{m_1 - m_2}{\sqrt{\dfrac{s_1^2\,(n_1 - 1) + s_2^2\,(n_2 - 1)}{n_1 + n_2 - 2}}} \tag{1}
$$
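A minimal sketch of the effect-size calculation in Eq. (1) follows. The function name is hypothetical. The data are the triplet-1 entries of Table 2, entered as negative dB SPL on the reading that larger negative values are quieter; that sign convention is an assumption of this sketch and affects only the sign of d.

```python
import math


def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation, as in Eq. (1)."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)  # equals s1^2 * (n1 - 1)
    ss2 = sum((v - m2) ** 2 for v in y)  # equals s2^2 * (n2 - 1)
    pooled_sd = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd


# Triplet-1 asymptotic levels for subjects L01 to L08 (Table 2), entered as
# negative dB SPL; the sign is an assumption of this sketch.
inattention = [-6.2, -8.8, -6.7, -8.6, -8.4, -8.5, -5.6, -7.6]
dichotic = [-10.0, -10.1, -11.0, -11.0, -11.2, -10.9, -8.3, -10.8]

d = cohens_d(inattention, dichotic)
print(round(d, 1))  # prints 2.6
```

The rounded result agrees with the triplet-1 effect size listed in Table 2.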
Accordingly, Fig. 4 shows the nSFOAE response for one subject
across the full time course of a stimulus presentation, using the
analysis procedure just described. As can be seen, there is a small
and essentially constant nSFOAE during the 50 ms of tone-alone, a
short hesitation at the onset of the wideband noise, a rising, dynamic response lasting about 100 ms, and then an apparently
asymptotic response lasting throughout the course of the tone-plus-noise. This is the same response pattern reported in Walsh
et al. (2010a) to essentially the same stimuli. In accord with
Guinan (e.g., Guinan et al., 2003), the interpretation is that tone-alone was not effective in triggering an efferent response, but the
wideband noise was; the efferent response takes about 100 ms to
reach its maximum; it then remains essentially constant
throughout the course of the activating sound (and it persists for
several hundred milliseconds thereafter). What is new in this figure
is the difference in the nSFOAE magnitudes for the inattention and
attention conditions.
Fig. 4 also shows the focus of the current paper, the silent period.
As shown, physiological responses still could be measured after the
offset of the tone-plus-noise, and those responses still showed a
difference for these two attentional conditions. The responses
during the silent period were invariably smaller in the selective-attention conditions than in the inattention conditions, and this
was true for every subject and for both the auditory- and visual-attention tasks (see Walsh et al., 2014). Before showing the comparisons between the attention and inattention conditions in detail
for the silent period, we describe several equivalences in the data:
between ipsilateral and contralateral measurements, between the
two triplets, and between dichotic- and diotic-listening conditions.
3.1. Comparison of ipsilateral and contralateral measures
The physiological-noise levels measured in the right ear were
essentially the same whether the female voice being attended to was
in the right or the left ear (the ipsilateral and contralateral trials,
respectively). The data for both triplets of the dichotic-listening
condition are shown in Table 1. To review, each entry is based on
the following: for each block of trials, each subject provided one
averaged physiological response for ipsilateral trials on which the
behavioral response was correct, and one averaged physiological
response for contralateral trials on which the behavioral response
was correct, for each triplet. For each of those responses, the levels
in the 10 analysis windows beginning 10 ms into the silent period
were averaged window-by-window, and those values were averaged
with the levels obtained from at least three other blocks of trials of
Table 1
Asymptotic physiological-noise levels (dB SPL) in the dichotic-listening condition, from trials having correct behavioral responses.

Subject      Triplet 1                      Triplet 2
             Ipsilateral   Contralateral    Ipsilateral   Contralateral
L01          −9.5          −10.6            −10.1         −9.0
L02          −10.1         −10.1            −9.4          −9.8
L03          −11.2         −10.8            −10.8         −11.4
L04          −10.8         −11.1            −10.5         −11.0
L05          −11.4         −11.0            −9.7          −11.7
L06          −10.3         −11.4            −12.3         −8.9
L07          −8.6          −7.9             −8.2          −10.4
L08          −10.8         −10.7            −10.6         −10.9
Mean         −10.3         −10.4            −10.2         −10.4
Std. dev.    0.9           1.1              1.2           1.1
Effect size (Ipsilateral − Contralateral)
             0.1                            0.2

Each entry is a window-by-window mean first across the final ten 10-ms windows
of the silent period and then across at least four 30-trial blocks, corresponding to
about 80–120 trials (correct only) averaged for each physiological response.

3. Results
Behaviorally, subjects performed well on the digit-recognition
task in the selective-attention conditions, indicating that they
were attending reliably to the correct stimuli. This is an important
outcome to consider when interpreting the differences between
physiological-noise levels in the attention and inattention conditions. In the two auditory-attention conditions, subjects performed
at 86.0% correct on average (range = 73.0–98.5%). There was no
statistical difference in behavioral performance across the dichotic- and diotic-listening conditions.
Although the emphasis in this paper is on the physiological
responses during the silent periods, there is value in providing the
reader with some information about the general pattern of
response seen during the preceding perstimulatory period.
Table 2
Asymptotic physiological noise levels (dB SPL) in the inattention and dichotic-attention conditions, from trials having correct behavioral responses.

Subject      Triplet 1                  Triplet 2
             Inattention   Dichotic     Inattention   Dichotic
L01          −6.2          −10.0        −8.9          −9.5
L02          −8.8          −10.1        −8.3          −9.6
L03          −6.7          −11.0        −6.3          −11.1
L04          −8.6          −11.0        −10.0         −10.7
L05          −8.4          −11.2        −9.0          −10.7
L06          −8.5          −10.9        −9.0          −10.6
L07          −5.6          −8.3         −7.0          −9.3
L08          −7.6          −10.8        −8.2          −10.7
Mean         −7.5          −10.4        −8.3          −10.3
Std. dev.    1.2           1.0          1.2           0.7
Effect size (Inattention − Dichotic)
             2.6                        2.2

Each entry is a window-by-window mean first across the final ten 10-ms windows
of the silent period and then across at least four 30-trial blocks, corresponding to
about 80–120 trials (correct only) averaged for each physiological response.
the same condition. These results we call the asymptotic levels of the
physiological noise. Larger negative values indicate a smaller
physiological-noise measurement (a quieter recording).
As the means and effect sizes at the bottom of Table 1 reveal, the
asymptotic levels were essentially identical for the ipsilateral and
contralateral measures for both triplets. This was an unexpected
outcome because, for us, the wiring of the olivocochlear system
(Brown, 2011) always has suggested that “ear-switching” is a likely
function of efferent activation (e.g., Cherry, 1953). The data in
Table 1 also reveal that there were no differences in the physiological noise levels across the two triplets.
For the inattention condition, just as for the dichotic-listening
condition, the physiological responses from the right ear were
averaged separately depending upon whether the SSN stimuli
based on the female-spoken digits were presented to the right or
left ear. This was done even though the subjects were unaware that
the SSN stimuli in one ear were based on the female voice and those
in the other ear on the male voice, nor were they required to attend
to those SSN stimuli. The results were similar to those in Table 1;
the means calculated either across ears or triplets differed by less
than 1 dB. (No systematic differences between the ipsilateral and
contralateral measures, or between the two triplets, were seen in the visual-attention data either; Walsh et al., 2014.) Accordingly, we conclude
that the ipsilateral and contralateral measures from the silent
period are essentially equivalent, as are the triplet 1 and triplet 2
measures. So, in all of the analyses reported below, the ipsilateral
and contralateral data are averaged, within subjects and window-by-window, as a way to improve the reliability of our measures.
Also, for simplicity, often only data for triplet 1 are presented.
3.2. Comparison of diotic and dichotic conditions
The two auditory-attention conditions in this study involved
either dichotic or diotic stimulus presentations. Phenomenologically (for the authors, at least, if not for the highly practiced subjects), the dichotic condition was easier because the female and
male voices originated from different spatial locations. However,
the behavioral data were not systematically different for those two
listening conditions. Were the physiological-noise measures also
similar for the dichotic and diotic conditions? The short answer is
yes; there were no systematic differences between the dichotic and
diotic auditory-attention conditions. Accordingly, for the remainder
of the Results section, only the dichotic data are presented. The
details of the comparison between the diotic- and dichotic-attention conditions are in the Appendix (Section 6.2).
Fig. 5. (Left) Physiological-noise magnitudes from the silent periods in the inattention
(white bars), and dichotic-listening conditions (black bars), averaged across eight
subjects. These data were collected from triplet 1. Error bars show one standard deviation. The physiological-noise magnitudes recorded in the ear canals of our subjects
were lower in the dichotic-listening condition than in the inattention condition, and
this was true for all eight subjects individually (see Table 2). The mean difference
between conditions was statistically significant. (Right) Physical noise-floor magnitudes calculated across three repeated measures of each condition in a passive cavity (a
syringe). Error bars show one standard deviation. In contrast with the human data, the
physical noise-floor magnitudes recorded in the passive cavity did not differ across
conditions.
3.3. Comparison of inattention and attention conditions
This brings us to the central question motivating this research:
Were the physiological-noise levels different when subjects were
attending, or not attending, to the spoken digits? The answer is yes.
In Table 2 we show the window-by-window averages for the ten
analysis windows beginning 10 ms into the silent period (the
asymptotic values) for both the inattention and dichotic-attention
conditions, and for both triplets. Examination of Table 2 reveals
that for every subject, and for both triplets, the physiological-noise
magnitudes always were larger (noisier) in the inattention condition, and smaller (quieter) in the dichotic-listening condition. The
effect sizes for the differences between inattention and dichotic
attention were greater than 2.0 for both triplets. A two-way univariate ANOVA, with experimental condition and triplet as the two
factors, revealed a significant main effect of condition on the average
noise magnitudes [F(1, 28) = 45.8, p < 0.0001], but the main effect of
triplet was not significant [F(1, 28) = 1.2, p = 0.3], nor was the
interaction of condition and triplet [F(1, 28) = 1.1, p = 0.3]. Again,
similar results were obtained for the diotic-attention condition.
Our interpretation (elaborated below) is that the cortico-olivo
and medial olivocochlear components of the efferent system were
more strongly activated during the selective-attention conditions
than during the inattention condition, and that those differences in
activation persisted into and throughout the silent period (see
Backus and Guinan, 2006; Walsh et al., 2010a,b).1
1
We regard the apparently asymptotic noise levels seen at the end of the silent
period to be only temporarily asymptotic. If the differences in the physiological
noise seen at the end of the silent period are in fact attributable to differences in the
strength of the efferent effect under attention and inattention, then we should
expect those differences to diminish as the silent period lengthens. That is, as the
persistence of the efferent effect begins to decline (i.e., after a few hundred milliseconds of silence; see Backus and Guinan, 2006; Walsh et al., 2010a,b), then we
should expect the physiological noise for the attention and inattention conditions
to begin to converge, and with further increases in the silent period, that convergence should become complete. The final level should be somewhere above that for
the current inattention condition.
3.4. Supplementary analyses
3.4.1. Physical-noise measures
In order to test the possibility that the observed differences in
the magnitudes of physiological noise were due to an unappreciated procedural difference across the three conditions, an additional calibration was conducted. The acoustic stimuli were played
to, and recorded from, a syringe (a non-human, passive cavity)
whose volume was approximately 0.5 cm³, using exactly the same
equipment, software, and procedures as were used with the human
subjects. Responses were collected for three blocks of trials for each
(human) condition of listening. The data from these physical-noise
recordings were averaged and analyzed in the same way as the
physiological-noise recordings. The asymptotic “responses” for
triplet 1 from the silent periods are shown on the right side of Fig. 5.
Across the experimental conditions, the average physical-noise
magnitudes were highly similar to one another (averaging
about −15.7 dB at 4.0 kHz). For comparison, the physiological-noise
data from triplet 1 are shown on the left side of Fig. 5. The strong
implication is that the differences in the human data observed in
the attention and inattention conditions were not attributable to
inconsistencies or artifacts in the software or procedures used.
3.4.2. Initial decline
The data in Table 2 and Fig. 5 were obtained by averaging the
levels from the last ten 10-ms windows at the end of the 30-ms
silent period, and thus represent an asymptotic level of the physiological noise in the ear canal. For some subjects, the physiological
responses at the beginning of the silent period were stronger than
at asymptote, and they underwent a decline during the first few
milliseconds of the silent period. For other subjects there was little
or no decline. Because averaging the responses across all subjects
would have misrepresented the individual data, we partitioned the
subjects into two groups prior to averaging. The top panel of Fig. 6
shows across-subject averages for the three subjects exhibiting
little or no decline in response magnitude during the silent period,
and the bottom panel of Fig. 6 shows the same for the five subjects
who did exhibit a decline. The averages across subjects shown in
each panel are for each of the twenty 10-ms windows spanning the
entire 30-ms silent period. The dashed lines in the bottom panel
show the levels of physical noise measured in a passive cavity for
each condition. Only triplet 1 is illustrated because the data from
triplet 2 were similar.
We believe that the ears exhibiting decline were emitting echo-like responses to the tone-plus-noise stimuli that ended just before
the silent periods. That is, we believe the declines represent
decaying nSFOAEs after stimulus offset. All the subjects exhibited
strong perstimulatory nSFOAE responses, but for the subgroup in
the top panel of Fig. 6, those responses had declined to the
asymptotic physiological noise floor by the end of the 5-ms decay of
the tone-plus-noise stimulus, especially in the inattention condition. For both groups, more decline was evident for the dichotic-attention condition. Note that the differences between the inattention and dichotic-attention conditions present in the final moments of the 30-ms silent period (Fig. 5 and Table 2) also were
present, or were beginning to emerge, soon after the onset of the
silent period. When the asymptotic noise power was subtracted out
for each condition, the difference between inattention and
dichotic-attention still was present for both groups at the beginning of the silent period (not shown), and this difference was about
3 dB for both groups of subjects.
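Subtracting an asymptotic noise power from a level expressed in decibels requires converting to linear power first. The following is a minimal sketch of that step; the function name and the example levels are hypothetical and are not values from the study.

```python
import math


def subtract_noise_power_db(total_db, noise_db):
    """Return the level (dB) left after removing noise power from a total level."""
    total_power = 10.0 ** (total_db / 10.0)
    noise_power = 10.0 ** (noise_db / 10.0)
    if total_power <= noise_power:
        raise ValueError("total level must exceed the noise level")
    return 10.0 * math.log10(total_power - noise_power)


# Hypothetical example: a total level 3.01 dB above an asymptotic noise of -8 dB
# corresponds to a residual component whose level equals the noise level itself.
noise_db = -8.0
total_db = noise_db + 10.0 * math.log10(2.0)
residual_db = subtract_noise_power_db(total_db, noise_db)
print(round(residual_db, 2))
```

The subtraction is done on powers rather than on decibel values because decibels are logarithmic; subtracting the decibel numbers directly would give a ratio, not a residual.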
3.4.3. Correlations during initial decline
If the early portions of the noise-like waveforms we obtained
during the silent periods were, in part, after-effects of the tone-
Fig. 6. Physiological-noise magnitudes as a function of time from the start of the 30ms silent period. The top panel shows average magnitudes across the three subjects
whose nSFOAE responses showed no decline during the first few milliseconds of the
silent period, and the bottom panel shows the average magnitudes for the five subjects
whose nSFOAEs did exhibit a decline to asymptote during the silent period. The data in
both panels were averaged for triplet 1, and were filtered at 4.0 kHz. The data collected
during the inattention and dichotic-attention conditions are shown as open squares
and black circles, respectively. The lines connecting the data points are shown only to
guide the eye. At the bottom of the lower panel are physical noise measures obtained
from a passive cavity (a syringe) using the same equipment, software, and procedures
as used with the humans; these means were calculated across all 20 noise magnitudes
obtained over the entire 30-ms silent period. For the human data, error bars show one
standard error of the mean. For the syringe data, the flags show one standard deviation
calculated across three blocks of trials.
plus-noise stimuli preceding them (a decaying SFOAE), then the
fine structure of the averaged responses obtained at similar times
post-stimulus ought to be similar for triplets 1 and 2. To test this
implication, we calculated within-subject correlation coefficients
for pairs of filtered, averaged responses beginning at corresponding
moments during the silent periods from triplets 1 and 2, namely
10-ms segments beginning 1 or 2 or 3, etc., ms after the offset of the
tone-plus-noise. The calculations were averaged separately within
the same two groups of subjects as used in the previous section: those who did, or did not, exhibit a decline in physiological-noise level during the first few milliseconds of the silent period. The
results were qualitatively the same for the two groups. The correlations between the fine structures of the corresponding averaged
responses from the two triplets were 0.8 or greater for the first 10-ms analysis window of the silent period and then gradually
declined with successive advancements of the window. The primary difference between the two groups of subjects was that the
five subjects showing an initial decline in physiological-noise level
had correlations that stayed high longer as the analysis window
was advanced into the silent period. Specifically, when the correlations had fallen to about 0.0 for the no-decline group (at about
307 ms), they still were about 0.4 for the with-decline group. For
both groups, the declines in correlation obtained were similar for
both the attention and inattention conditions. All these outcomes
were obtained with the analysis filter centered at 4.0 kHz, but a
similar pattern of results was obtained when the analysis filter was
moved to other frequencies. These results support the interpretation that the energy seen during the first few milliseconds of the
silent period consisted largely of decaying SFOAEs evoked by the
tone-plus-noise stimulus, and that the energy seen during the final
milliseconds was random in nature (i.e., noise).
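The reasoning behind this correlation analysis can be illustrated with synthetic data. In the sketch below, each simulated silent period contains the same decaying 4.0-kHz component (standing in for a decaying SFOAE) plus independent random noise. The sampling rate, decay constant, noise level, and function names are all assumptions made for the example; the analysis filter is omitted.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def sliding_correlations(resp1, resp2, fs, window_ms=10, step_ms=1):
    """Correlate corresponding windows of two responses at successive offsets."""
    win = fs * window_ms // 1000
    step = fs * step_ms // 1000
    return [pearson_r(resp1[s:s + win], resp2[s:s + win])
            for s in range(0, len(resp1) - win + 1, step)]


def synthetic_silent_period(rng, fs, dur_ms=30, tau_ms=3.0, noise_sd=0.05):
    """Decaying 4-kHz component shared across calls, plus independent noise."""
    out = []
    for i in range(fs * dur_ms // 1000):
        t_ms = 1000.0 * i / fs
        echo = math.exp(-t_ms / tau_ms) * math.sin(2 * math.pi * 4000 * i / fs)
        out.append(echo + rng.gauss(0.0, noise_sd))
    return out


fs = 50000  # hypothetical sampling rate
rng = random.Random(0)
triplet1 = synthetic_silent_period(rng, fs)
triplet2 = synthetic_silent_period(rng, fs)
corrs = sliding_correlations(triplet1, triplet2, fs)
print(round(corrs[0], 2), round(corrs[-1], 2))
```

In this toy case the correlation is high for the first window, where the shared decaying component dominates, and falls toward zero for the last window, where only independent noise remains. That is the qualitative pattern the text describes.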
3.4.4. Off-frequency measurements
In addition to measuring physiological-noise magnitudes at the
frequency of the 4.0-kHz tone used to elicit the nSFOAE, noise
measures also were obtained at a number of other frequencies. The
specific frequencies were selected for different reasons; some
represented peaks or valleys in the spectrum of the last 20 ms of the
noise sample used, while others were chosen to fill voids in the
spectral set. In the end, the outcomes did not differ according to the
original basis for choice of the individual frequencies.
For each subject and each condition, the averaged waveform for
triplet 1 of every trial was bandpass filtered symmetrically around
each selected frequency, using a filter whose bandwidth was 10% of
that frequency. A correction factor then was applied to the
magnitude at each center frequency to account for the differential
rise times of the different digital filters. Physiological-noise magnitudes at each frequency were calculated by averaging the last ten
available data points in the silent period (from 310 to 319 ms), just
as was done for the measures at 4.0 kHz. In Fig. 7, the physiological-noise magnitudes averaged across subjects in the inattention and
dichotic-listening conditions are plotted as a function of frequency.
The symbols for the two experimental conditions are the same as
those used in Fig. 6 above.
The data in Fig. 7 show that noise magnitudes were largest at the
lowest and highest frequencies tested, and were smaller at intermediate frequencies; the smallest noise magnitudes were
measured between about 2 and 4 kHz. At every frequency selected,
physiological-noise magnitudes were higher in the inattention
condition and lower in the dichotic-listening condition, the same
pattern of results that was observed at 4.0 kHz (those data are
shown again in Fig. 7 for comparison). Interestingly, the noise
magnitudes at 4.0 kHz (the frequency of the probe tone used to
measure the nSFOAE during the perstimulatory period) were not
noticeably different from those at neighboring frequencies. Measurements at the two highest frequencies shown in Fig. 7 (6.5 and
7.0 kHz) were made outside the passband of the frozen sample of
wideband noise that was presented simultaneously with the probe
tone, yet the outcomes were essentially the same. These data reveal
that the mechanism responsible for the marked differences across
our attention conditions operated across a wide spectral region.
In addition to the human data, Fig. 7 contains two lines for
purposes of comparison. The dashed line shows the spectrum of the
noise measured in a syringe using the same equipment and procedures as used for the human data. (The published frequency
response of the Etymotic ER-10A microphone varies less than 3 dB
over the entire frequency range shown.) The solid line has a slope of
3 dB per octave, which is the rise expected in the human and syringe measurements attributable to the use of the 10% filter
bandwidth when analyzing the data. Note that the larger differences between the human and syringe data at low frequencies than
at high frequencies in Fig. 7 likely are attributable to factors such as
breathing, swallowing, and other essential sounds in the human
subjects.
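The 3-dB-per-octave reference slope can be derived in one line. The sketch below assumes an idealized noise with a flat spectrum level $N_0$ (in dB re a 1-Hz bandwidth) and an ideal rectangular filter of bandwidth $0.1 f$; both are simplifying assumptions, and the symbols are introduced here only for illustration.

```latex
L(f) = N_0 + 10\log_{10}(0.1 f), \qquad
L(2f) - L(f) = 10\log_{10}\!\frac{0.1 \cdot 2f}{0.1 \cdot f} = 10\log_{10} 2 \approx 3.01\ \text{dB}
```

Under these assumptions, each doubling of center frequency doubles the filter bandwidth and so raises the measured level by about 3 dB, whatever the value of $N_0$.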
Fig. 7. The asymptotic physiological-noise magnitudes as a function of frequency,
averaged across the final 10 analysis windows of the silent period and across eight
subjects for triplet 1 of the inattention and dichotic-attention conditions. The data
from the inattention condition are shown as open squares and the data from the
dichotic-listening condition are shown as black circles. Error bars show one standard
error of the mean. At each frequency, the physiological-noise waveforms were bandpass filtered using a bandwidth that was 10% of that particular frequency. The dashed
line shows the mean level of the physical noise floor across experimental conditions,
when measured in a passive cavity. The solid line is shown for comparison with the
data; it has a slope of 3 dB per octave, the slope predicted by the use of a filter whose
bandwidth doubles each octave.
The physiological-noise magnitudes in Fig. 7 were compared
across the inattention and dichotic-attention conditions using a
two-way univariate ANOVA, with experimental condition and
center frequency (of the bandpass filter) as the two factors. Significant main effects were revealed for both condition [F(1,
168) = 54.4, p < 0.0001] and center frequency [F(11, 168) = 75.5,
p < 0.0001], but not for the interaction of the two factors [F(11,
168) = 0.2, p = 1.0].
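The degrees of freedom in these F ratios are consistent with eight subjects measured at 12 center frequencies in each of two conditions, if every subject's value is treated as a separate observation. A minimal bookkeeping sketch under that assumption (the count of 12 frequencies is inferred from the 11 degrees of freedom for center frequency, and the variable names are illustrative):

```python
n_conditions = 2    # inattention, dichotic attention
n_frequencies = 12  # inferred from F(11, 168); an assumption
n_subjects = 8      # from the Fig. 7 caption

n_observations = n_conditions * n_frequencies * n_subjects
n_cells = n_conditions * n_frequencies

df_condition = n_conditions - 1
df_frequency = n_frequencies - 1
df_interaction = df_condition * df_frequency
df_error = n_observations - n_cells  # observations minus cell means
```

With these counts the error term has 192 - 24 = 168 degrees of freedom, matching the denominators reported for all three tests.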
In passing, we note that it probably is incorrect to assume that
the noise measurements shown for the humans at each frequency
in Fig. 7 originated from single, separate “characteristic places”
along the basilar membrane. Rather, it is likely that the level
measured at each frequency represents a sum across a number of
reflection sites in the general vicinity of those characteristic places
(see Shera, 2003).
Fig. 8. Physiological-response magnitudes at four selected frequencies as a function of
time from the start of the 30-ms silent period. Two panels were used to eliminate
overlap of the functions. The data points at the far right of the figure show the
asymptotic physiological-noise magnitudes averaged over the last ten analysis windows of the silent period. The data from the inattention and dichotic-attention conditions are shown as open squares and closed circles, respectively. The results for the
diotic-listening condition were essentially the same as for the dichotic condition. For
each frequency, the physiological-noise waveforms were bandpass filtered using a
bandwidth that was 10% of the indicated center frequency.
3.4.5. Initial declines across frequency
The rates of decline of the physiological responses (Fig. 6) also
were measured at the frequencies shown in Fig. 7. Fig. 8 shows
noise magnitudes in the inattention and dichotic-attention conditions at four frequencies as a function of time from the start of the
30-ms silent period. The four frequencies shown are the lowest and
highest that were tested (1.1 and 7.0 kHz), plus two intermediate
frequencies (3.2 and 5.3 kHz). These plots are averages across all
eight subjects because here the individual differences in magnitude
of decline were smaller than those seen at 4.0 kHz (Fig. 6) (perhaps
because at frequencies other than 4.0 kHz, the response more
closely reflects the bandwidth of the measurement filter, whereas
at 4.0 kHz the response is more nearly tonal). For the data measured
at 1.1, 3.2, and 5.3 kHz, just as for 4.0 kHz (Fig. 6), initial declines in
noise level were observed across the two conditions. The highest
frequency, 7.0 kHz, showed no initial decline, presumably because
that frequency was outside the passband of our MOC-eliciting
noise.
Four general results emerged from the analysis of the data
across multiple frequencies: (1) the time-course of the decline of
the physiological response became progressively shorter as frequency was increased, (2) there was no initial decline at frequencies above the passband of the wideband noise, (3) the
difference between the inattention and dichotic-attention conditions was smaller at the lower frequencies, and (4) the temporal
and asymptotic effects seen at 4.0 kHz were not different from the
effects seen at neighboring frequencies, suggesting that the 4.0-kHz
tone played no unique role in the effects seen during the silent
period. These facts confirm the assumption that the initial declines
are attributable to echo-like responses (decaying SFOAEs) to the
various frequencies present in the tone-plus-noise stimulus that
terminated just prior to the silent period. Presumably the declines
took longer at lower frequencies because of longer intra-cochlear
travel times, and it may be that the smaller attention effect at
low frequencies is attributable to the reduced density of efferent
innervation at the apical end of the cochlea (Brown, 2011; Liberman
et al., 1990).
3.4.6. Measurements at SOAE frequencies
Prior to data collection in the attention conditions, SOAEs were
identified from recordings in the quiet, using our standard procedures (Pasanen and McFadden, 2000). Six of the eight subjects
who participated in this experiment had SOAEs in the (right) ear
from which our physiological-noise measures always were obtained. To examine whether SOAE magnitudes also were affected by
the attentional manipulations, the averaged physiological-noise
waveforms from triplet 1 of the inattention and dichotic-attention conditions were filtered around the frequencies of each
subject’s two strongest SOAEs. As before, the bandwidth of the filter
was 10% of the SOAE frequency, centered on the frequency of the
SOAE. The same moving-window analysis procedure described
above was used for the SOAE analyses.
The result was that the SOAE magnitudes measured during the
silent period also were stronger for the inattention condition than
for the dichotic condition (except for one SOAE for one subject
where the difference was reversed by 0.3 dB). The average difference between the inattention and dichotic-attention conditions
over the final ten windows of the silent period was 1.4 dB
(compared to about 2.8 dB in Table 2 for all eight subjects and
triplet 1 at 4.0 kHz). These are the results for triplet 1; this attentional effect on SOAEs was smaller for triplet 2. A simple
explanation is that, during the perstimulatory period, cortico-olivo
efferent activity was stronger in the attention condition than in the
inattention condition, and those different levels of efferent activity
then persisted through the silent period. Note that, during the
perstimulatory period, there would have been two contributions to
the overall efferent inhibition acting on the SOAEs: the reflexive
component triggered by the wideband noise (Guinan, 2006;
Guinan et al., 2003; Walsh et al., 2010a,b) plus whatever modulation of the reflexive component existed because of selective
attention.
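For scale, a level difference in decibels can be converted to a ratio of sound-pressure amplitudes. A minimal sketch using the standard 20 log10 convention for pressure (the function and variable names are illustrative):

```python
def db_to_pressure_ratio(delta_db):
    """Convert a level difference in dB to a ratio of pressure amplitudes."""
    return 10.0 ** (delta_db / 20.0)

ratio_soae = db_to_pressure_ratio(1.4)   # difference at SOAE frequencies
ratio_probe = db_to_pressure_ratio(2.8)  # difference reported at 4.0 kHz
```

Under this convention, 1.4 dB corresponds to a pressure ratio of roughly 1.17 and 2.8 dB to roughly 1.38.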
Between-triplet correlations were calculated between the fine
structures of the averaged responses obtained at SOAE frequencies,
just as was reported for the averaged responses at 4.0 kHz (Section
3.4.3). Unlike the systematically declining correlations observed
through the silent period at 4.0 kHz, the between-triplet correlations between responses from corresponding analysis windows at
SOAE frequencies remained at about 0.8–0.9 over the entire 30-ms
silent period. This outcome suggests that SOAEs were present
immediately after the end of the tone-plus-noise stimulus used to
measure the physiological response, and further, the SOAEs
apparently were being synchronized by the stimuli used (e.g.,
Wilson, 1980; Wit and Ritsma, 1979), as evidenced by the high
running correlations between the fine structures of the averaged
responses.
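The between-triplet comparison can be pictured as a Pearson correlation computed window by window between two averaged waveforms. A minimal sketch, assuming non-overlapping windows of arbitrary length and synthetic waveforms (the window length, sample values, and function names are illustrative and are not the parameters of the analysis reported here):

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den > 0 else 0.0

def windowed_correlation(wave_a, wave_b, window_len):
    """Correlate corresponding non-overlapping windows of two waveforms."""
    n_windows = min(len(wave_a), len(wave_b)) // window_len
    return [
        pearson(wave_a[i * window_len:(i + 1) * window_len],
                wave_b[i * window_len:(i + 1) * window_len])
        for i in range(n_windows)
    ]

rng = random.Random(0)
n = 1000
# A phase-locked tone shared by both "triplets" plus independent noise.
tone = [math.sin(2 * math.pi * 0.05 * k) for k in range(n)]
sync_a = [t + 0.2 * rng.gauss(0, 1) for t in tone]
sync_b = [t + 0.2 * rng.gauss(0, 1) for t in tone]
# Independent noise only, as expected once an evoked response has decayed.
noise_a = [rng.gauss(0, 1) for _ in range(n)]
noise_b = [rng.gauss(0, 1) for _ in range(n)]

r_sync = windowed_correlation(sync_a, sync_b, 200)
r_noise = windowed_correlation(noise_a, noise_b, 200)
```

In this toy example a component that keeps the same phase in both waveforms holds the correlation high in every window, whereas independent noise gives values scattered around zero.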
4. Summary
(1) A tone-plus-noise stimulus was used to activate the MOC
efferent system and to obtain a nonlinear measure of
cochlear response (the nSFOAE) during both the perstimulatory period and a 30-ms silent period following the
tone-plus-noise stimulus. In some conditions, the subjects
needed to attend to auditory stimuli in order to perform a
behavioral task; in other conditions, they had no reason to
attend.
(2) The physiological noise recorded in the ear canals of our
human subjects was substantially weaker during behavioral
tasks requiring selective auditory attention than during a
task involving relative inattention. This was true for all
subjects. The magnitude of the difference was about 2–3 dB,
which corresponded to effect sizes larger than 2.0. The
implication is that the cortico-olivo component of the
efferent system was more active, and therefore MOC efferent
activity was greater, when selective attention was required to
complete the behavioral task.
(3) The magnitudes of the asymptotic physiological-noise responses were essentially identical for the dichotic- and
diotic-attention conditions. Behavioral performance in those
conditions also did not differ.
(4) In the dichotic-attention task, the magnitude of the asymptotic physiological response in the right (ipsilateral) ear was
essentially the same whether that ear contained the attended (female) voice or the non-attended (male) voice. That is,
there was no evidence of the efferent system acting to
implement switching between the ears in a condition where
it might have been expected.
(5) The physiological responses exhibited an initial period of
decline in magnitude before reaching their asymptotic levels,
presumably because of echo-like responses (decaying
SFOAEs) to the tone-plus-noise stimuli that terminated just
prior to the silent periods. The physiological responses during the inattention condition generally were greater than
those during the dichotic and diotic conditions during this
initial period of decline as well as during the remainder of
the 30-ms silent period. This confirms that the differences
across experimental conditions were attributable to persistence of the MOC activity present during the tone-plus-noise
stimuli into the silent period.
(6) Cross correlations for pairs of filtered, averaged responses
from triplets 1 and 2 of the same block of trials were high
(0.8–0.9) for the first several milliseconds of the silent
period, and then declined to values near zero as asymptotic
noise magnitudes were reached. This suggests both that the
decline in physiological-response magnitude is attributable
to a decline in the echo-like responses to the acoustic stimulus and that, for the asymptotic measures, the double-evoked procedure was essentially summing uncorrelated
samples of noise.
(7) Physiological-noise magnitudes measured at different frequencies in the spectrum also differed according to the
attentional demands of the behavioral task, just as was
observed at 4.0 kHz. In other words, the presence of the 4.0-kHz probe tone was not necessary to observe the effects of
attention on cochlear noise. The time course of decay in the
silent period was slightly longer, and the magnitude of decay
greater, at low frequencies than at high frequencies.
(8) The physiological-noise levels measured at SOAE frequencies
in individual ears typically also were larger during inattention than during attention, suggesting that the efferent system was inhibiting the mechanisms underlying both SOAEs
and SFOAEs. When cross correlations between triplets were
examined at SOAE frequencies, the values remained high
across the entire duration of the silent period. The strong
implication is that the SOAEs were being synchronized by the
nSFOAE-evoking stimuli.
(9) Because behavioral responses of the same type (a key press)
occurred at the end of every trial during both the inattention
and attention conditions, the various differences between
conditions cannot be attributed to the absence of similar
motor responses in one condition, as was logically possible in
past studies (Puel et al., 1988; Avan and Bonfils, 1992; Meric
and Collet, 1992, 1994b; Froehlich et al., 1993; Ferber-Viart
et al., 1995). The nature of the behavioral task can affect
the results when attention is manipulated; our behavioral
task was more similar to the identification tasks than the
detection tasks used by others (e.g., Hafter et al., 1998; Gallun
et al., 2007).
5. Discussion
We attribute the observed differences in physiological-noise
magnitudes between the inattention and attention conditions to
different levels of activation of the medial olivocochlear bundle
(MOCB) associated with the differing attentional demands of the
behavioral tasks. Just as the superior olivary complex (SOC) sends
efferent projections into the cochlea, the SOC receives efferent
projections from the inferior colliculus in the brainstem, as well as
direct projections from auditory cortex (Mulders and Robertson,
2000a,b). The very existence of these latter connections suggests
the possibility that cognitive processes originating in auditory
cortex can affect the processing of the sounds upon which those
cognitive processes are being based. The MOCB generally is viewed
to be the primary neural pathway through which attention can
modulate cochlear activity.
We recognize that our nonlinear measure is not capable of
distinguishing between noise that originates in the cochlea and
noise that originates in the middle- or outer-ear cavities. Here we
have presumed that the preponderance of what we measured
during our 30-ms silent intervals originated from the cochlea
largely because it is difficult for us to understand how attentional
demand could alter the contributions from the middle or outer ears
(details are below).
5.1. Interpreting the double-evoked measure
If, as argued above, the application of the double-evoked procedure to the responses measured during the 30-ms silent periods
was tantamount to summing three independent samples of noise,
then how is it that our estimates of that physiological noise were
different depending upon the attentional demands of the behavioral task? In order to understand the answer, it is necessary to
understand what we believe was happening prior to the silent
periods, during the perstimulatory segments of each trial. Recall
that each of the three nSFOAE-evoking stimuli presented during
every triplet consisted of 50 ms of 4.0-kHz tone-alone, followed by
250 ms of tone-plus-wideband-noise, followed by the 30 ms of
silence that was emphasized here. Guinan et al. (2003) have shown
that single tones of moderate level are not particularly effective at
activating the MOC efferent reflex, but wideband noises are, especially when they are binaural. In accord with
Guinan’s reports and previous work in this laboratory (Walsh et al.,
2010a,b), the differencing manipulation required by the double-evoked procedure revealed the averaged, filtered nSFOAE
response during the tone-alone period to be weak and essentially
constant throughout the 50-ms duration of tone-alone (see Fig. 4).
[The nonlinearity responsible for this nSFOAE response either
stems from mechanisms other than the MOC system or it depends
on some component of the MOC system having an extremely long
time-constant (Backus and Guinan, 2006; Cooper and Guinan,
2003; Guinan, 2006; Sridhar et al., 1995).] Then, following the
onset of the wideband noise, and the concomitant activation of the
MOC efferent reflex (Guinan, 2006, 2011; Guinan et al., 2003), the
nSFOAE response showed a rising, dynamic segment that took
about 100–150 ms to reach asymptote (see Walsh et al., 2010a,
Fig. 1; cf., Backus and Guinan, 2006), after which the response
remained essentially constant throughout the remainder of the
tone-plus-noise stimulus. Although MOC activation by the wideband noise is reflexive, its magnitude apparently differs depending
upon the attention condition, meaning that the magnitude of the
nSFOAE response at the end of the tone-plus-noise segment also
differed depending upon the level of attention required. Specifically, we believe that, during the perstimulatory period, the reflexive MOC efferent response to the wideband noise was
augmented by a cortico-olivo efferent response such that the
overall efferent effect was greater during the attention conditions
than during the inattention condition (those perstimulatory data
will be reported more fully elsewhere). What is missing from this
interpretation of the events during the perstimulatory period is
why there also should be differences observed during the silent
period following the perstimulatory period.
As previously demonstrated (Walsh et al., 2010a,b, Fig. 9, Fig. 6;
Goodman and Keefe, 2006; Backus and Guinan, 2006), the reflexive
efferent response activated by the wideband noise does not recover
immediately upon the termination of the wideband noise, but
rather it can persist unabated for several hundred milliseconds
after noise offset. It follows that the level of the overall efferent
activity at the onset of the silent periods (and throughout their
short time course), was greater in the attention conditions than in
the inattention condition. Accordingly, any measure obtained from
the cochlea during the silent period that can be affected by efferent
activity should have been different depending on the subjects’ level
of attention. Specifically, we suggest that the decaying SFOAEs seen
at the beginning of the silent period (Figs. 6 and 8), the nSFOAEs
measured at SOAE frequencies, and the physiological-noise
measurements obtained during the latter part of the silent period
(Figs. 5 and 7), all were weaker during the attention conditions than
the inattention conditions because the persisting efferent activity
was greater during the attention conditions than the inattention
conditions.
The physiological noise measured during the latter part of the
silent period has, to our knowledge, not been reported previously.
What is the origin of this noise? There may be multiple sources.
Over the years, investigators have discussed noise processes,
particularly Brownian motion, in various cochlear elements (e.g., de
Vries, 1948; Corey and Hudspeth, 1983; Harris, 1968; Svrcek-Seiler
et al., 1998), and some have discussed the potential benefits to
detection afforded by stochastic resonance (e.g., Jaramillo and
Wiesenfeld, 1998). Most relevant here, of course, are those mechanisms that lead eventually to acoustical noise, not just molecular,
receptor, or neural noise. Rather than discuss the merits of various
possible mechanisms, we will summarize what we know about the
characteristics of the noise we have measured during the silent
periods. It is wideband; the physiological noise we measured during the silent period existed over a range of about 1.1–7.0 kHz. It
was present in every subject tested, and, for every subject, its level
covaried with the attention condition at every frequency analyzed.
Our physiological-noise function has a U-shape: the largest magnitudes were measured at the lowest and highest frequencies
tested. (In contrast, the spectrum of Brownian noise declines
significantly per octave from low to high frequencies.) Over the
middle and upper frequency range, the spectrum of the physiological noise is highly similar in shape to the spectrum of the noise
we measured in a passive cavity using the same procedures and
equipment.
We have suggested that we are measuring cochlear noise
because we know that cochlear noise exists in healthy ears, that
this inherent noise is broadband, and that it can be suppressed by
efferent activation. Nuttall et al. (1997) measured basilar membrane
(BM) vibration in the guinea-pig cochlea in the absence of sound
stimulation. They demonstrated that BM noise was associated with
a healthy, sensitive cochlea; BM noise decreased over the duration
of surgery, and disappeared completely with the death of the animal. Stimulation of the crossed olivocochlear bundle (COCB)
significantly suppressed BM noise by about 10 dB. Nuttall et al.
(1997) concluded that BM noise is broadband because the noise
function measured in quiet was very similar to the response function to a low-level broadband stimulus. They proposed that thermal
noise could be the origin of this energy because the spectrum of
thermal noise is approximately flat.
An alternative explanation is that the noise we have measured
did not originate in the cochlea, but rather that the observed differences in noise level resulted simply from our subjects sitting
more quietly in the attention conditions than in the inattention
conditions. That way, there would be more head and body noise
recorded in the inattention conditions. However, there is considerable evidence against this suggestion. First, unlike many past
studies (Puel et al., 1988; Avan and Bonfils, 1992; Meric and Collet,
1992, 1994b; Froehlich et al., 1993; Ferber-Viart et al., 1995), our
subjects were required to make the same behavioral response on
every trial whether it was an inattention block or an attention
block. Second, our subjects were given trial-by-trial feedback as to
whether their physiological responses were being rejected as too
noisy, meaning that they knew when additional trials were being
added to the block; this encouraged them to sit quietly. Third, if the
subjects were more restless in the inattention condition, one would
expect that more trials would be rejected by our acceptance criteria
(see Appendix, Section 6.1) during the inattention blocks than
during attention blocks. In fact, however, the duration of a block of
trials was similar in the inattention and attention conditions,
suggesting that the levels of extraneous subject noise were similar
across conditions: the average block duration for the inattention
condition was 270.1 s (SD = 69.1), and for the dichotic condition it
was 270.6 s (SD = 79.0). The number of trials in each condition also
was very similar: the average number of trials for the inattention
condition was 32.3 (SD = 3.6), and for the dichotic condition it was
32.2 (SD = 4.4). Fourth, on every trial we measured the time
elapsed from the onset of the behavioral response interval until the
subject’s button press (the reaction time, or RT). For every subject,
the RTs for the inattention task invariably were substantially faster
than the RTs for the attention tasks. If our subjects were totally
inattentive and moving around restlessly during the inattention
blocks, then one would expect the RTs to be slow, not fast. Our
subjects were attending to the stimulus sequences during the
inattention blocks; they just did not have to remember the sounds
for a subsequent cognitive task.
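How small the differences in block duration and trial count quoted above are, relative to their variability, can be gauged with a standardized mean difference. A minimal sketch, assuming equal numbers of blocks in the two conditions so that the pooled SD is the root mean square of the two SDs (this pooling choice and the function name are assumptions, not part of the reported analysis):

```python
import math

def standardized_difference(mean_a, sd_a, mean_b, sd_b):
    """Mean difference divided by the pooled SD (equal group sizes assumed)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd

# Block duration in seconds: dichotic vs. inattention.
d_duration = standardized_difference(270.6, 79.0, 270.1, 69.1)
# Number of trials per block: inattention vs. dichotic.
d_trials = standardized_difference(32.3, 3.6, 32.2, 4.4)
```

Under these assumptions both standardized differences are below 0.03, which is negligible next to the effect sizes larger than 2.0 noted for the physiological-noise magnitudes.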
A related possibility is that subjects specifically reduced their
cardiac, pulmonary, or muscular noise during the attention blocks.
However, this alternative also is an implausible explanation for our
data. Ren et al. (1995) showed that the acoustic energy produced by
the cardiac cycle is weak, and has spectral components predominantly located below 500 Hz. Similarly, Gavriely et al. (1981)
measured the spectral characteristics of the human pulmonary
system and showed that the maximal frequency for inhalation and
exhalation (the frequency beyond which no acoustic energy was
detectable) ranged from 286 to 604 Hz, depending upon where on
the chest wall the measurements were made. Finally, acoustic noise
resulting from contractions of skeletal muscles in humans has been
shown to be below about 25 Hz (Diemont et al., 1988; Oster and
Jaffe, 1980). In contrast, the lowest frequency at which we
measured noise levels in our subjects was 1074 Hz, and we
observed consistent differences across subjects and frequencies up
to 7000 Hz.
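The separation between these body-noise sources and the analysis bands can be checked directly. A minimal sketch, assuming the 10% analysis filter is centered arithmetically on its center frequency, as described for the SOAE analysis (the dictionary and function names are illustrative):

```python
def band_edges(center_hz, fractional_bw=0.10):
    """Lower and upper edges of a filter centered on center_hz."""
    half_width = 0.5 * fractional_bw * center_hz
    return center_hz - half_width, center_hz + half_width

# Upper frequency limits (Hz) cited in the text for each body-noise source.
body_noise_upper_hz = {
    "cardiac": 500.0,    # components predominantly below 500 Hz
    "pulmonary": 604.0,  # highest maximal frequency reported
    "muscular": 25.0,    # below about 25 Hz
}

lowest_edge_hz, _ = band_edges(1074.0)
margin_hz = lowest_edge_hz - max(body_noise_upper_hz.values())
```

With these values the lower edge of the lowest analysis band is about 1020 Hz, more than 400 Hz above the highest of the cited body-noise limits.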
Another explanation that has been proposed is that the middle-ear reflex (MER) was responsible for our findings. Specifically,
perhaps the MER made both forward and reverse transmission less
efficient (e.g., Puria, 2003) in the attention conditions. We regard
this to be an unlikely explanation for the attention/inattention
differences we have obtained. To be sure, the stimulus levels used
in this study were in the range where the MER could be activated in
some sensitive subjects (Guinan et al., 2003). But to explain the
differences in physiological noise measured in our attention and
inattention conditions, the MER would have to contract differently
in the two conditions even though all of the acoustic stimuli (the
speech sounds and the tone-plus-noise used to evoke efferent activity) were identical in the two conditions. We are not aware of any
evidence suggesting that the MER is differentially affected
depending upon cognitive demands, and even if it were, the
expectation would be that the biggest effects would be at low frequencies whereas our attention/inattention differences were about
the same across frequency. Finally, note that this MER proposal
does seem to assume that the physiological noise we have
measured during the silent period has its origin in the cochlea,
which we do believe. Although we cannot know for certain
whether the MER was activated for some of our subjects in this
study, we doubt strongly that the MER was responsible for the
differential measurements obtained for attention and inattention
conditions. Differential activation of the efferent system is a more
parsimonious explanation.
5.1.1. A speculation
While discussing our selective-attention results with colleagues,
we were informed of an intriguing mechanism apparently used by
the visual system during attention (W.S. Geisler III, personal
communication, 2012). Mitchell et al. (2009) showed that there are
low-frequency (<5 Hz) fluctuations in firing rate in neurons located
in the V4 region of visual cortex in rhesus monkey. Furthermore,
those low-frequency fluctuations in rate are ordinarily correlated
across neurons. Then, when attention is demanded of the monkey
to accomplish a discrimination task, the low-frequency fluctuations
cease being correlated. The interpretation of these findings is that,
when information is pooled across populations of neurons sensitive
to similar aspects of the attended stimuli, the effective signal-to-noise ratio is improved by the decorrelation in the firings. This
finding led us to consider whether a similar mechanism might be
operating in the cochlea. The following speculation was the result.
Imagine that, in the quiet, the basilar membrane (or associated
structures, or both) undergoes weak vibrations that are attributable
in part to the random, spontaneous firings of the MOC efferent fibers. Each MOC fiber contacts dozens of OHCs spread across the
three linear rows, and each OHC receives innervation from several
different fibers (Warr and Guinan, 1979; Brown, 2011). The density
of this efferent innervation is greater in the middle and at the basal
end of the cochlea than at the apical end (Liberman et al., 1990;
Brown, 2011), but along much of the length of the basilar membrane, there is a complex, overlapping mesh of efferent innervation.
We speculate that, during inattention, the random, spontaneous
firings in the MOC fibers are somewhat positively correlated,
meaning that individual OHCs have a relatively high probability of
receiving nearly synchronous neurotransmitter releases from more
than one input fiber. We know that efferent input to an OHC causes
inhibition (e.g., Brown et al., 1983; Cooper and Guinan, 2006;
Guinan, 2006, 2011; Guinan et al., 2003; Murugasu and Russell,
1996; Rabbitt et al., 2009; Wiederhold and Kiang, 1970), so moments of multiple, nearly synchronous activations alternating with
moments of fewer or no activations would mean that there would
be corresponding alternations in the level of inhibition of the
affected OHCs. We speculate that those nearly synchronous activations, and the consequent nearly regular alternations in inhibition, lead to nearly regular alternations in the lengths of the OHCs
(Brownell et al., 1985; Breneman et al., 2009) that in turn lead to
weak vibrations, that then sum with other local random vibrations,
propagate basally, and escape into the ear canal where they can be
recorded by our microphone. This cochlear noise ordinarily would
not be separable from other sources of noise in the ear canal, but its
susceptibility to modulation by cortico-olivo influences both allows
it to be detected and confirms its cochlear origin.
To complete this speculation, we presume that, under heavy
attentional load, the increase in efferent activation that persists
throughout the silent period coincides with a persisting decrease in
synchronization of the firing patterns in the overlapping efferent
fibers. This loss of synchronization produces smaller differences in
the alternating magnitudes of inhibition, and the magnitude of the
sound produced is reduced. Were a mechanism of this sort operating, the auditory system would be using a similar strategy to that
apparently used by the visual system (desynchronization of firings) to solve a similar problem when attention is required either
within or across modalities.
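The intuition behind this speculation can be illustrated with a standard result for sums of correlated random variables; it is offered only as an illustration and not as a model of the cochlea. Assume N efferent inputs to a region, each contributing a fluctuation x_i with variance \sigma^2 and a common pairwise correlation \rho (N, \sigma, and \rho are hypothetical symbols introduced for this illustration):

```latex
\operatorname{Var}\!\left(\sum_{i=1}^{N} x_i\right)
  = N\sigma^{2} + N(N-1)\,\rho\,\sigma^{2}
  = N\sigma^{2}\bigl[1 + (N-1)\rho\bigr]
```

With \rho = 0 the variance of the summed fluctuation grows in proportion to N, whereas for \rho > 0 it contains a term that grows as N^2, so a reduction in \rho reduces the summed fluctuation even when the individual inputs are unchanged.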
5.2. Controls
Our auditory-attention and inattention conditions were
designed to be as similar as possible and to differ primarily in the
amount of cognitive resources required to perform the behavioral
task. The dichotic- and diotic-listening conditions had both
selective-attention (Treisman, 1969) and working-memory components (Baddeley, 1992), but the inattention condition had neither
of these components. Specifically, for the dichotic-listening condition: auditory attention was required to locate the female talker at
either the left or right ear, and then to maintain focus on the female
speech stream while concurrently ignoring the male speech
stream. While the subject was attending to the female talker,
working memory was required to retain the sequence of digits that
she spoke until the response interval began. Similar components
were required in the diotic-listening task. In contrast, in the inattention condition, the behavioral task only required that a subject
listen for the final sound on every trial. Then, for the attention and
inattention tasks alike, the actual behavioral response was the
same: a button press with the right hand.
At the level of the auditory cortex, the sounds presented during
the inattention and selective-listening conditions differed greatly
in terms of their perceptual saliency, but for the cochlea, the sounds
presented during every condition should have been processed
similarly. The spectra were identical, the temporal envelopes were
identical, and the overall level of the SSN stimuli was equal to the
overall level of the speech waveforms (about 50 dB SPL). Furthermore, the SSN stimuli were lowpass filtered at 3.0 kHz, just like the
actual speech waveforms, and the timing of the presentation of the
waveforms was identical in the inattention and selective-listening
conditions. In other words, the SSN stimuli had the characteristics
of speech without actually sounding like speech, and they were
presented exactly like the speech stimuli. As in the selective-listening conditions, the physiological responses during the inattention condition were evaluated for acceptance only if a key-press
was recorded during the response interval. Again, the nSFOAE
stimuli were identical across all conditions.
5.3. Comparison with previous reports
Common to many past studies of the effects of attention on OAEs
was a concern that differential levels of physiological noise might
exist across experimental conditions (Froehlich et al., 1990, 1993;
Giard et al., 1994; Ferber-Viart et al., 1995; de Boer and Thornton,
2007; Harkrider and Bowers, 2009), and that these differences in
background noise would have the potential to confound the
interpretation of their OAE measures. Often, this concern was
rooted in the fact that the attention task commonly required a
motor response (typically a button-press) but the inattention task
did not (Puel et al., 1988; Avan and Bonfils, 1992; Meric and Collet,
1992, 1994b; Froehlich et al., 1993; Ferber-Viart et al., 1995).
In those studies that did measure and report the noise levels
from their OAE recordings (Froehlich et al., 1990, 1993; Ferber-Viart
et al., 1995; de Boer and Thornton, 2007; Harkrider and Bowers,
2009), only de Boer and Thornton (2007) found a significant main
effect of behavioral task (inattention versus attention) on the
physiological-noise magnitudes in their OAE measures. In that
study, click-evoked otoacoustic emissions (CEOAEs) were recorded
and then saved to two buffers in an alternating fashion, thus
yielding a pair of averaged CEOAE responses. For each pair, noise
level was calculated by subtracting the two averaged CEOAE responses, and then converting the rms of the difference waveform to
dB.
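As a rough illustration of the split-buffer noise estimate just described, the following Python sketch subtracts two averaged responses point by point and expresses the rms of the difference waveform in decibels. The function names, the 20-µPa reference pressure, and the assumption that the waveforms are expressed in pascals are illustrative choices made here; they are not details reported by de Boer and Thornton (2007).

```python
import math

REF_PRESSURE_PA = 20e-6  # assumed 0 dB SPL reference; not stated in the text


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def split_buffer_noise_db(buffer_a, buffer_b, ref=REF_PRESSURE_PA):
    """Noise level (dB re: ref) from two alternately filled average buffers.

    Any response component common to both buffers cancels in the
    difference, so what remains reflects the noise in the recording.
    """
    if len(buffer_a) != len(buffer_b):
        raise ValueError("buffers must have the same length")
    difference = [a - b for a, b in zip(buffer_a, buffer_b)]
    level = rms(difference)
    if level == 0.0:
        return float("-inf")  # identical buffers: no measurable noise
    return 20.0 * math.log10(level / ref)
```

In this sketch the evoked response, which is present in both buffers, drops out of the difference waveform, so only the components that differ between the two averages contribute to the estimate.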
De Boer and Thornton (2007) observed that physiological-noise
levels were significantly higher on average during an inattention
task (pressing a button at the onset of each contralateral-noise
presentation) than during a passive visual-attention task (watching a silent movie), and marginally significantly higher than during
an active auditory-attention task (detecting tone pips in the click-train stimulus). There was no difference in noise level, however,
between the inattention task and an active visual-attention task
(judging the correctness of simple additive sums). For all comparisons, the magnitude of the difference between the inattention and
attention conditions was less than 1.0 dB SPL. Notably, the one
significant difference in noise level was for the comparison between the inattention task, which required a motor response (a
button-press), and the passive visual-attention task, which did not;
noise levels were higher when a motor response was required. In
the de Boer and Thornton study, there also were significantly more
rejected CEOAE measures (due to excessive noise) during the
inattention task than during either the passive visual- or active
auditory-attention tasks, but not during the active visual-attention
task, which had a higher rejection rate than the inattention task.
Overall, between the noise and rejection measures, there generally
was more subject noise associated with the inattention task than
with the various attention tasks (which was not true in the present
study; see above).
In a similar study, Harkrider and Bowers (2009) also used
otoacoustic emissions to study the effects of attention on peripheral
auditory processing. Like de Boer and Thornton (2007), they
measured CEOAEs, saved the responses to alternating buffers, and
calculated a noise level for each condition by subtracting the two
averaged responses. Unlike de Boer and Thornton, however, Harkrider and Bowers did not observe a significant main effect of condition on noise level. They did observe more rejected CEOAE
measures in the inattention condition than in their auditory-attention conditions, but none of these differences were statistically significant.
In the present study, the same motor response (a button-press
on a keypad) was required on every trial of every condition, thus
eliminating the possibility that the substantial differences in noise
levels across conditions were due to differences in the motor
behavior of our subjects. It is possible that we observed larger effects
of attention on cochlear noise levels than previous researchers
because we used a bilateral MOC elicitor, which produces more
activation than either an ipsilateral or contralateral noise presented
in isolation (Lilaonitkul and Guinan, 2009), or because our behavioral tasks were more demanding cognitively than those used
previously. The effects reported here actually could be underestimates of what would be obtained with a slightly stronger
wideband noise, because MOC activation increases with increasing
level of the eliciting sound over a range of moderate sound-pressure levels (Veuillet et al., 1991; Guinan et al., 2003; Backus
and Guinan, 2006).
5.4. Final comment
In conclusion, we note that our results provide no obvious evidence for a mechanism that could help with differential attention
to one ear or to different frequency regions within an ear; nor is
there evidence for a mechanism that could switch attention rapidly
between various auditory targets. For the stimuli employed here, at
least, the mechanism we have documented operated equally on the
attended and non-attended ears, operated globally across the
spectrum, and showed persistence (was sluggish; also see Walsh
et al., 2010a,b). Although individual readers may join us in
finding some of our new results interesting, the reality is that we
still do not know whether or how the auditory system profits from
the additional efferent activity during selective attention, nor even
whether there are degrees of activation of the attention mechanism. All we know is that MOC efferent activity does appear to be
correlated with selective attention. Learning more will require
more subtle manipulations than were reported here.
Acknowledgments
This study was done as part of the requirements for a doctoral
degree from The University of Texas at Austin, by author KPW, who
now is located at the University of Minnesota. The work was supported by a research grant awarded to author DM by the National
Institute on Deafness and other Communication Disorders (NIDCD;
RO1 DC000153). The content is solely the responsibility of the authors and does not necessarily represent the official views of the
NIDCD or the National Institutes of Health. D.O. Kim and J.G. Guinan
provided helpful discussion about these results, and two anonymous reviewers provided thoughtful comments.
Appendix
nSFOAE acceptance criteria
The nSFOAE procedure began with a calibration routine, during
which no speech stimuli were presented. The level of a 500-Hz tone
was adjusted in the right ear canal of the subject to attain 65 dB SPL.
This routine was run separately for each of the ER-2 earphones. A
calibration factor was calculated and then was used to scale the
amplitude of the experimental stimulus used to elicit the nSFOAE
(the tone-plus-noise stimulus described above). This calibration
routine was followed by two criterion-setting routines, and then
the main data-acquisition routine. The first criterion-setting
routine consisted of 12 trials (each having two triplets) during
which all nSFOAE responses were accepted unless the peak
amplitude of the recording exceeded 45 dB SPL. (Difference waveforms whose amplitudes exceeded this limit typically were
observed when the subject moved, swallowed, or produced some
other artifactual noise.) All of the accepted nSFOAE responses were
averaged point-by-point, and the resultant waveform served as the
foundation for the accumulating nSFOAE average to be constructed
during the main acquisition routine. Furthermore, the rms value of
each accepted nSFOAE response was computed, and the resulting
distribution of rms values was used to evaluate subsequent responses during the main acquisition routine. The second criterion-setting routine consisted of a 20-s recording in the quiet during
which no sound was presented to the ears. The median rms voltage
from this recording was calculated, and was used as a measure of
the ambient (physiological) noise level for that individual subject.
During the main acquisition routine, each new nSFOAE response
was compared to the data collected during the two criterion-setting
routines, and was accepted into the accumulating nSFOAE average
if either one of two criteria was satisfied. First, if the rms value of
the new nSFOAE response was less than 0.25 standard deviations
above the median rms of the saved distribution, then the nSFOAE
was added to the accumulating average. Second, each new nSFOAE
response was subtracted point-for-point from the accumulating
nSFOAE average. The rms of this difference waveform was
computed, then converted to decibels. If the magnitude of the
difference waveform was less than 6.0 dB above the noise level
measured earlier in the quiet, the new nSFOAE was accepted into
the accumulating average. Subjects received feedback at the end of
each trial about whether the physiological data met the criteria for
acceptance. An additional trial was added to the block whenever
the physiological data were not acceptable, which encouraged the
subjects to remain as still and quiet as possible. The nSFOAE responses from triplet 1 always were averaged and analyzed separately from the responses from triplet 2, and the block of trials
terminated when each of the two averages was composed of about
20-30 nSFOAE responses. Eventually the accepted responses were
pooled across at least four blocks of trials (see text).
Comparison of dichotic- and diotic-attention conditions
Before a determination could be made about the comparability
of the dichotic- and diotic-listening conditions, the two sets of data
had to be made equivalent. As mentioned above, approximately
half the number of individual trials contributed to the averaged
physiological responses during the dichotic (and inattention)
blocks of trials as during the diotic blocks, because the responses
on ipsilateral trials initially were kept separate from those on
contralateral trials. To make the data equivalent, the raw ipsilateral
responses were summed, point for point, with the raw contralateral
responses within each dichotic block of trials prior to doing
window-by-window analyses like those already described. That is,
for the dichotic blocks, the response waveforms themselves were
summed prior to extracting the estimates of level within the successive 10-ms windows, just as for the diotic blocks. [The
demonstration of the comparability of the ipsilateral and contralateral responses (Table 1) confirms that this summing did not
distort the data.]
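The combination step described in this paragraph can be sketched in Python as follows. This is only a schematic under stated assumptions: the sampling rate, the window step, the decibel reference, and the function names are hypothetical, and the sketch computes a plain rms level per window rather than reproducing the full window-by-window analysis described in the main text.

```python
import math


def combine_dichotic(ipsilateral, contralateral):
    """Sum the raw ipsilateral and contralateral waveforms point for point."""
    if len(ipsilateral) != len(contralateral):
        raise ValueError("waveforms must have the same length")
    return [a + b for a, b in zip(ipsilateral, contralateral)]


def window_levels_db(waveform, sample_rate_hz, window_ms=10.0,
                     step_ms=10.0, ref=1.0):
    """Level (dB re: ref) in successive analysis windows.

    window_ms matches the 10-ms windows named in the text; step_ms and
    ref are assumptions made for this sketch.
    """
    window = int(round(sample_rate_hz * window_ms / 1000.0))
    step = int(round(sample_rate_hz * step_ms / 1000.0))
    levels = []
    for start in range(0, len(waveform) - window + 1, step):
        segment = waveform[start:start + window]
        value = math.sqrt(sum(x * x for x in segment) / window)
        levels.append(20.0 * math.log10(value / ref) if value > 0
                      else float("-inf"))
    return levels
```

The point of the ordering is that the waveforms are combined first and the level estimates are extracted afterward, so the dichotic data pass through the same sequence of operations as the diotic data.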
The results of this analysis are shown in Table A1. The individual
entries are window-by-window averages across the ten 10-ms
analysis windows beginning 10 ms into the silent period (the
asymptotic values), then averaged across at least four blocks of
trials. Although the dichotic and diotic data did exhibit differences
for some subjects, the means at the bottom of Table A1 reveal that,
on average, the asymptotic levels were essentially identical across
the two listening conditions and the two triplets. Note that the
effect sizes for the dichotic/diotic difference were small for both
triplets. As a consequence of this comparison, for simplicity, the
diotic data are omitted throughout the Results section; for all
comparisons discussed, the diotic and dichotic data were essentially the same.
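For readers who wish to check the effect sizes, the following Python sketch computes Cohen's d from the subject values listed in Table A1. It assumes the pooled-standard-deviation form of d and assumes that the columns of Table A1 are ordered Dichotic then Diotic within each triplet; the variable and function names are illustrative only.

```python
import math
import statistics

# Asymptotic noise levels (dB SPL) for subjects L01-L08, as read from Table A1.
TRIPLET_1 = {
    "dichotic": [13.2, 12.4, 13.0, 15.6, 13.6, 14.1, 10.6, 14.4],
    "diotic": [11.0, 12.4, 14.2, 12.5, 13.9, 14.5, 12.1, 15.7],
}
TRIPLET_2 = {
    "dichotic": [12.5, 12.8, 14.5, 14.2, 13.4, 12.7, 11.8, 14.4],
    "diotic": [14.1, 12.4, 13.9, 13.6, 13.9, 13.6, 11.1, 12.0],
}


def cohens_d(a, b):
    """Difference of means divided by the pooled standard deviation.

    Assumes equal group sizes, so the pooled variance is the simple
    average of the two sample variances.
    """
    pooled_sd = math.sqrt(
        (statistics.variance(a) + statistics.variance(b)) / 2.0
    )
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


d_triplet_1 = cohens_d(TRIPLET_1["dichotic"], TRIPLET_1["diotic"])
d_triplet_2 = cohens_d(TRIPLET_2["dichotic"], TRIPLET_2["diotic"])
```

Under these assumptions the two values come out at approximately 0.05 and 0.20, which agree with the tabled effect sizes of 0.0 and 0.2 after rounding.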
Table A1
Asymptotic physiological-noise levels (dB SPL) in the auditory-attention conditions,
from trials having correct behavioral responses.
                 Triplet 1             Triplet 2
Subject          Dichotic   Diotic     Dichotic   Diotic
L01              13.2       11.0       12.5       14.1
L02              12.4       12.4       12.8       12.4
L03              13.0       14.2       14.5       13.9
L04              15.6       12.5       14.2       13.6
L05              13.6       13.9       13.4       13.9
L06              14.1       14.5       12.7       13.6
L07              10.6       12.1       11.8       11.1
L08              14.4       15.7       14.4       12.0
Mean             13.4       13.3       13.3       13.1
Std. dev.        1.5        1.5        1.0        1.1
Effect size
(Dichotic - Diotic)    0.0                   0.2
Within each block of trials, the raw ipsilateral and contralateral responses were
averaged prior to window-by-window analysis.
Each entry is a window-by-window mean first across the final ten 10-ms windows
of the silent period and then across at least four 30-trial blocks, corresponding to
about 80-120 trials (correct only) averaged for each physiological response.
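The two acceptance criteria described in the Appendix (nSFOAE acceptance criteria) can be summarized with a short Python sketch. It is a schematic rather than the acquisition software itself: the function names, the use of the sample standard deviation, and the decibel reference are assumptions made for illustration, and all waveforms are assumed to share the same units.

```python
import math
import statistics


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def accept_response(new_response, running_average, criterion_rms_values,
                    quiet_noise_db, ref=1.0):
    """Return True if a new response satisfies either acceptance criterion.

    criterion_rms_values: rms values saved during the first
        criterion-setting routine.
    quiet_noise_db: noise level (dB re: ref) from the recording in quiet.
    """
    # Criterion 1: rms less than 0.25 standard deviations above the
    # median rms of the saved distribution.
    limit = (statistics.median(criterion_rms_values)
             + 0.25 * statistics.stdev(criterion_rms_values))
    if rms(new_response) < limit:
        return True

    # Criterion 2: the difference from the accumulating average is less
    # than 6.0 dB above the noise level measured in quiet.
    difference = [n - a for n, a in zip(new_response, running_average)]
    diff_rms = rms(difference)
    if diff_rms == 0.0:
        return True  # identical to the running average
    diff_db = 20.0 * math.log10(diff_rms / ref)
    return diff_db < quiet_noise_db + 6.0
```

Because either criterion is sufficient, a response with a relatively large rms can still be accepted when it closely resembles the accumulating average.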
References
Avan, P., Bonfils, P., 1992. Analysis of possible interactions of an attentional task with
cochlear micromechanics. Hear. Res. 57, 269-275.
Backus, B.C., Guinan Jr., J.J., 2006. Time-course of the human medial olivocochlear
reflex. J. Acoust. Soc. Am. 119, 2889-2904.
Bacon, S.P., Fay, R.R., Popper, A.N. (Eds.), 2004. Compression: From Cochlea to
Cochlear Implants. Springer, New York.
Baddeley, A., 1992. Working memory. Science 31, 556-559.
Breneman, K.D., Brownell, W.E., Rabbitt, R.D., 2009. Hair cell bundles: flexoelectric
motors of the inner ear. PLoS One 4 (4), e5201.
Brown, M.C., 2011. Anatomy of olivocochlear neurons. In: Ryugo, D.K., Fay, R.R.,
Popper, A.N. (Eds.), Auditory and Vestibular Efferents. Springer, New York,
pp. 17-37.
Brown, M.C., Nuttall, A.L., Masta, R.I., 1983. Intracellular recordings from cochlear
inner hair cells: effects of stimulation of the crossed olivocochlear efferents.
Science 222, 69-72.
Brownell, W.E., Bader, C.R., Bertrand, D., de Ribaupierre, Y., 1985. Evoked mechanical
responses of isolated cochlear outer hair cells. Science 227, 194-196.
Cherry, C., 1953. Some experiments on the recognition of speech, with one and with
two ears. J. Acoust. Soc. Am. 25, 975-979.
Cohen, J., 1992. A power primer. Psych. Bull. 112, 155-159.
Cooper, N.P., Guinan Jr., J.J., 2003. Separate mechanical processes underlie fast
and slow effects of medial olivocochlear efferent activity. J. Physiol. 548,
307-312.
Cooper, N.P., Guinan Jr., J.J., 2006. Efferent-mediated control of basilar membrane
motion. J. Physiol. 576, 49-54.
Corey, D.P., Hudspeth, A.J., 1983. Kinetics of the receptor current in bullfrog saccular
hair cells. J. Neurosci. 3, 962-976.
Davis, H., 1983. An active process in cochlear mechanics. Hear. Res. 9, 79-90.
de Boer, J., Thornton, A.R.D., 2007. Effect of subject task on contralateral suppression
of click evoked otoacoustic emissions. Hear. Res. 233, 117-123.
de Vries, H.L., 1948. Brownian movement and hearing. Physica 14, 48-60.
Diemont, B., Figini, M.M., Orizio, C., Perin, R., Veicsteinas, A., 1988. Spectral analysis
of muscular sound at low and high contraction level. Int. J. Bio-Med. Comp. 23,
161-175.
Ferber-Viart, C., Duclaux, R., Collet, L., Guyonnard, F., 1995. Influence of auditory
stimulation and visual attention on otoacoustic emissions. Physiol. Behav. 57,
1075-1079.
Fritz, J.B., Elhilali, M., David, S.V., Shamma, S.A., 2007. Auditory attention - focusing
the searchlight on sound. Curr. Opin. Neurobiol. 17, 437-455.
Froehlich, P., Collet, L., Chanal, J.-M., Morgon, A., 1990. Variability of the influence of
a visual task on the active micromechanical properties of the cochlea. Brain Res.
508, 286-288.
Froehlich, P., Collet, L., Morgon, A., 1993. Transiently evoked otoacoustic emission
amplitudes change with changes of directed attention. Physiol. Behav. 53,
679-682.
Gallun, F.J., Mason, C.R., Kidd Jr., G., 2007. Task-dependent costs in processing two
simultaneous auditory stimuli. Percept. Psychophys. 69, 757-771.
Gavriely, N., Palti, Y., Alroy, G., 1981. Spectral characteristics of normal breath
sounds. J. Appl. Physiol. 50, 307-314.
Giard, M.-H., Collet, L., Bouchet, P., Pernier, J., 1994. Auditory selective attention in
the human cochlea. Brain Res. 633, 353-356.
Goodman, S.S., Keefe, D.H., 2006. Simultaneous measurement of noise-activated
middle-ear muscle reflex and stimulus frequency otoacoustic emissions.
J. Assoc. Res. Otolaryngol. 7, 125-139.
Guinan Jr., J.J., 2006. Olivocochlear efferents: anatomy, physiology, function, and the
measurement of efferent effects in humans. Ear Hear 27, 589-607.
Guinan Jr., J.J., 2011. Physiology of the medial and lateral olivocochlear systems. In:
Ryugo, D.K., Fay, R.R., Popper, A.N. (Eds.), Auditory and Vestibular Efferents.
Springer, New York, pp. 39-81.
Guinan Jr., J.J., Backus, B.C., Lilaonitkul, W., Aharonson, V., 2003. Medial olivocochlear efferent reflex in humans: otoacoustic emission (OAE) measurement
issues and the advantages of stimulus frequency OAEs. J. Assoc. Res. Otolaryngol. 4, 521-540.
Hafter, E.R., Bonnel, A.-M., Gallun, E., Cohen, E., 1998. A role for memory in divided
attention between two independent stimuli. In: Palmer, A.R., Rees, A.,
Summerfield, A.Q., Meddis, R. (Eds.), Psychophysical and Physiological Advances
in Hearing. Whurr, London, pp. 228-237.
Harkrider, A.W., Bowers, C.D., 2009. Evidence for a cortically mediated release from
inhibition in the human cochlea. J. Am. Acad. Audiol. 20, 208-215.
Harris, G.G., 1968. Brownian motion in the cochlear partition. J. Acoust. Soc. Am. 44,
176-186.
Hernández-Peón, R., Scherrer, H., Jouvet, M., 1956. Modification of electric activity
in cochlear nucleus during “attention” in unanesthetized cats. Science 123,
331-332.
Jaramillo, F., Wiesenfeld, K., 1998. Mechanoelectrical transduction assisted by
Brownian motion: a role for noise in the auditory system. Nat. Neurosci. 1 (5),
384-388.
Keefe, D.H., 1998. Double-evoked otoacoustic emissions. I. Measurement theory and
nonlinear coherence. J. Acoust. Soc. Am. 103, 3489-3498.
Kemp, D.T., 1978. Stimulated acoustic emissions from within the human auditory
system. J. Acoust. Soc. Am. 64, 1386-1391.
Kemp, D.T., 1980. Towards a model for the origin of cochlear echoes. Hear. Res. 2,
533-548.
Liberman, M.C., Dodds, L.W., Pierce, S., 1990. Afferent and efferent innervation of the
cat cochlea: quantitative analysis with light and electron microscopy. J. Comp.
Neurol. 301, 443-460.
Lilaonitkul, W., Guinan Jr., J.J., 2009. Human medial olivocochlear reflex: effects as
functions of contralateral, ipsilateral, and bilateral elicitor bandwidths. J. Assoc.
Res. Otolaryngol. 10, 459-470.
Lukas, J.H., 1980. Human auditory attention: the olivocochlear bundle may function
as a peripheral filter. Psychophysiol. 17, 444-452.
Maison, S., Micheyl, C., Collet, L., 2001. Influence of focused auditory attention on
cochlear activity in humans. Psychophys. 38, 35-40.
Meric, C., Collet, L., 1992. Visual attention and evoked otoacoustic emissions: a
slight but real effect. Int. J. Psychophysiol. 12, 233-235.
Meric, C., Collet, L., 1994a. Attention and otoacoustic emissions: a review. Neurosci.
Biobehav. Rev. 18, 215-222.
Meric, C., Collet, L., 1994b. Differential effects of visual attention on spontaneous
and evoked otoacoustic emissions. Int. J. Psychophysiol. 17, 281-289.
Mitchell, J.F., Sundberg, K.A., Reynolds, J.H., 2009. Spatial attention decorrelates
intrinsic activity fluctuations in macaque area V4. Neuron 63, 879-888.
Mulders, W.H.A.M., Robertson, D., 2000a. Effects on cochlear responses of activation
of descending pathways from the inferior colliculus. Hear. Res. 149, 11-23.
Mulders, W.H.A.M., Robertson, D., 2000b. Evidence for direct cortical innervation of
medial olivocochlear neurons in rats. Hear. Res. 144, 65-72.
Murugasu, E., Russell, I.J., 1996. The effect of efferent stimulation on basilar membrane displacement in the basal turn of the guinea pig cochlea. J. Neurosci. 16,
325-332.
Nuttall, A.L., Guo, M., Ren, T., Dolan, D.F., 1997. Basilar membrane velocity noise. Hear.
Res. 114, 35-42.
Oster, G., Jaffe, J.S., 1980. Low frequency sounds from sustained contraction of human skeletal muscle. Biophys. J. 30, 119-127.
Pasanen, E.G., McFadden, D., 2000. An automated procedure for identifying spontaneous otoacoustic emissions. J. Acoust. Soc. Am. 108, 1105-1116.
Picton, T.W., Hillyard, S.A., 1971. Human auditory evoked potentials. II. Effects of
attention. Electroenceph. Clin. Neurophys. 36, 191-199.
Puel, J.-L., Bonfils, P., Pujol, R., 1988. Selective attention modifies the active micromechanical properties of the cochlea. Brain Res. 447, 380-383.
Puria, S., 2003. Measurements of human middle ear forward and reverse acoustics:
implications for otoacoustic emissions. J. Acoust. Soc. Am. 113, 2773-2789.
Rabbitt, R.D., Clifford, S., Breneman, K.D., Farrell, B., Brownell, W.E., 2009. Power
efficiency of outer hair cell somatic electromotility. PLoS Comput. Biol. 5 (7),
e1000444.
Rasmussen, G.L., 1946. The olivary peduncle and other fiber projections of the superior olivary complex. J. Comp. Neurol. 84, 141-219.
Rasmussen, G.L., 1953. Further observations of the efferent cochlear bundle.
J. Comp. Neurol. 99, 61-74.
Ren, T., Zhang, M., Nuttall, A.L., Miller, J.M., 1995. Heart beat modulation of spontaneous otoacoustic emissions in guinea pig. Acta Otolaryngol. Stockh. 115,
725-731.
Shera, C.A., 2003. Mammalian spontaneous otoacoustic emissions are amplitude-stabilized cochlear standing waves. J. Acoust. Soc. Am. 114, 244-262.
Sridhar, T.S., Liberman, M.C., Brown, M.C., Sewell, W.F., 1995. A novel cholinergic
“slow effect” of efferent stimulation on cochlear potentials in the guinea pig.
J. Neurosci. 15, 3667-3678.
Svrcek-Seiler, W.A., Gebeshuber, I.C., Rattay, F., Biro, T.S., Markum, H., 1998. Micromechanical models for the Brownian motion of hair cell stereocilia. J. Theor.
Biol. 193, 623-630.
Treisman, A.M., 1969. Strategies and models of selective attention. Psych. Rev. 76,
282-299.
Veuillet, E., Collet, L., Duclaux, R., 1991. Effect of contralateral acoustic stimulation
on active cochlear micromechanical properties in human subjects: dependence
on stimulus variables. J. Neurophysiol. 65, 724-735.
Walsh, K.P., Pasanen, E.G., McFadden, D., 2010a. Properties of a nonlinear version
of the stimulus-frequency otoacoustic emission. J. Acoust. Soc. Am. 127,
955-969.
Walsh, K.P., Pasanen, E.G., McFadden, D., 2010b. Overshoot measured physiologically and psychophysically in the same human ears. Hear. Res. 268, 22-37.
Walsh, K.P., Pasanen, E.G., McFadden, D., 2014. Selective attention reduces physiological noise in the external ear canals of humans. II: visual attention. Hear. Res.
312, 160-167.
Warr, W.B., Guinan Jr., J.J., 1979. Efferent innervation of the organ of Corti: two
separate systems. Brain Res. 173 (1), 152-155.
Wiederhold, M.L., Kiang, N.Y.S., 1970. Effects of electric stimulation of the crossed
olivocochlear bundle on single auditory-nerve fibers in the cat. J. Acoust. Soc.
Am. 48, 950-965.
Wilson, J.P., 1980. Evidence for a cochlear origin for acoustic reemissions, threshold
fine-structure and tonal tinnitus. Hear. Res. 2, 233-252.
Wit, H.P., Ritsma, R.J., 1979. Stimulated acoustic emissions from the human ear.
J. Acoust. Soc. Am. 66, 911-913.
Worden, F.G., 1973. Auditory habituation. In: Peeke, H., Herz, M. (Eds.), Habituation.
Academic Press, New York, pp. 109-137.
Glossary
ANOVA: analysis of variance
CEOAE: click evoked otoacoustic emission
d: Cohen’s d, effect size
DCN: dorsal cochlear nucleus
FFT: fast Fourier transform
HL: hearing level
ISI: interstimulus interval
MOC: medial olivocochlear
MOCB: medial olivocochlear bundle
nSFOAE: nonlinear stimulus-frequency otoacoustic emission
OAE: otoacoustic emission
OHC: outer hair cell
SFOAE: stimulus-frequency otoacoustic emission
SOAE: spontaneous otoacoustic emission
SOC: superior olivary complex
SSN: speech-shaped noise
Not included
dB SPL: decibels sound-pressure level
Hz: hertz
kHz: kilohertz
min: minute
ms: millisecond