www.elsevier.com/locate/ynimg
NeuroImage 23 (2004) 1143 – 1151
Multistable representation of speech forms: a functional MRI study of
verbal transformations
Marc Sato,a,* Monica Baciu,b Hélène Lœvenbruck,a Jean-Luc Schwartz,a Marie-Agnès Cathiard,a
Christoph Segebarth,c and Christian Abry a
a Institut de la Communication Parlée, CNRS UMR 5009, Institut National Polytechnique de Grenoble, Université Stendhal, Grenoble, France
b Laboratoire de Psychologie et Neurocognition, CNRS UMR 5105, Université Pierre Mendès, Grenoble, France
c Unité Mixte INSERM/Université Joseph Fourier U594, NeuroImagerie Fonctionnelle et Métabolique, LRC CEA 30V, Grenoble, France
Received 12 March 2004; revised 12 July 2004; accepted 19 July 2004
Available online 6 October 2004
We used functional magnetic resonance imaging (fMRI) to localize the
brain areas involved in the imagery analogue of the verbal transformation effect, that is, the perceptual changes that occur when a
speech form is cycled in rapid and continuous mental repetition. Two
conditions were contrasted: a baseline condition involving the simple
mental repetition of speech sequences, and a verbal transformation
condition involving the mental repetition of the same items with an
active search for verbal transformation. Our results reveal a predominantly left-lateralized network of cerebral regions activated by the
verbal transformation task, similar to the neural network involved in
verbal working memory: the left inferior frontal gyrus, the left
supramarginal gyrus, the left superior temporal gyrus, the anterior
part of the right cingulate cortex, and the cerebellar cortex, bilaterally.
Our results strongly suggest that the imagery analogue of the verbal
transformation effect, which requires percept analysis, form interpretation, and attentional maintenance of verbal material, relies on a
working memory module sharing common components of speech
perception and speech production systems.
© 2004 Elsevier Inc. All rights reserved.
Keywords: fMRI; Auditory imagery; Verbal working memory; Speech
production; Speech perception
* Corresponding author. Institut de la Communication Parlée-UMR
CNRS 5009, Institut National Polytechnique de Grenoble-Université
Stendhal 46, Avenue Félix Viallet, 38031 Grenoble Cedex 01-France.
Fax: +33 4 76 57 47 10.
E-mail address: [email protected] (M. Sato).
Available online on ScienceDirect (www.sciencedirect.com).
1053-8119/$ - see front matter © 2004 Elsevier Inc. All rights reserved.
doi:10.1016/j.neuroimage.2004.07.055
Introduction
Auditory imagery can be defined as an introspective and
conscious persistence of an auditory experience in the absence of
related auditory input. It plays an important role in numerous
cognitive functions whenever auditory material is represented for
analysis to make comparisons or to form interpretations from it
(Reisberg, 1992). For example, with visually presented stimuli,
auditory imagery may be involved in various phonological tasks,
such as homophone detection, rhyme judgments, or metrical stress
evaluation (for a review, see Reisberg, 1992; Smith et al., 1995).
Likewise, auditory imagery may also play a role in music
perception and cognition (e.g., Aleman et al., 2000; Halpern and
Zatorre, 1999) and even during auditory hallucinations in
schizophrenia (e.g., Shergill et al., 2000). As it requires temporary
manipulation of verbal material, auditory imagery is assumed to be
mediated by verbal working memory, that is, the cognitive system
involved in the temporary maintenance and manipulation of
information (Baddeley and Logie, 1992; Reisberg et al., 1989;
Smith et al., 1995). A classical model of verbal working memory,
in line with psychological, neuropsychological, and developmental
findings, is the phonological, or articulatory, loop (Baddeley, 1986,
1992; Baddeley and Hitch, 1974; for a review of verbal working
memory models, see Miyake and Shah, 1999). According to this
model, verbal working memory comprises two components, a
passive phonological store, in which the phonological material is
temporarily held, and an active articulatory rehearsal system, the
role of which is to refresh the phonological material by subvocal
articulation. Previous behavioral as well as functional neuroimaging studies have provided convergent evidence for the
involvement of this mnemic system during auditory imagery
(McGuire et al., 1996; Shergill et al., 2001; Smith et al., 1995). For
instance, Shergill et al. (2001) showed that consciously recalling
the sound of someone’s voice (imagining speech) involves a left-lateralized network of frontal, parietal, temporal, and cerebellar
areas similar to that traditionally considered to be involved in
verbal working memory.
One such auditory imagery task is the imagery analogue of the
verbal transformation effect, first described by Warren and Gregory
(1958). This effect arises when a speech form is cycled in rapid and
continuous repetition. While, initially, a percept matching the
original form is perceived, at some point of time, another percept
suddenly pops up, corresponding to an abrupt change in perception
of the original speech form. This transformation process persists
throughout the repetition procedure, leading to perceptual transitions from one speech form to another (or back to the original
form). For example, rapid repetitions of the word “life” provide a
sound flow fully compatible with the perception that either “life” or
“fly” is being repeated. Verbal transformations have mainly been
studied as a pure perceptual effect, that is, when a speech stimulus
is repetitively presented to listeners (e.g., MacKay et al., 1993; Pitt
and Shoaf, 2002; Warren, 1961). It has been shown that this effect
also occurs during production, when subjects repeatedly utter the
speech stimulus in a covert or an overt mode (Reisberg et al., 1989;
Sato and Schwartz, 2003; Smith et al., 1995). Thus, the verbal
transformation effect seems to provide an interesting phenomenon
to test for a possible coupling between production and perception
within the language system.
The aim of the present study was to evaluate the neural correlates
of the imagery analogue of the verbal transformation effect. We
contrasted a baseline condition involving “simple” mental repetition
of speech sequences and a verbal transformation condition involving
the mental repetition of the same stimuli and an active search for
transformations. Thus, we were able to identify the neural network
of brain areas specifically associated with the search for verbal
transformations. Since the verbal transformation task requires
subjects to analyze the mentally repeated speech form until the
emergence of a new one and, then, to make judgements about what
form is perceived, three distinct cognitive processes are possibly
involved: the on-line linguistic processing of the current rehearsed
sequence (syllable parsing), the decision-making process during the
popping up of a new speech form, and finally the temporary storage
of the newly built representation. Our hypothesis was therefore that
the verbal transformation task must involve a frontal–parietal
coupling between the left inferior frontal gyrus and the left
supramarginal gyrus, two regions traditionally considered to be
involved in the phonological analysis of spoken words (for a review,
see Poldrack et al., 1999) and in the temporary storage of
phonological forms (Cohen et al., 1997; Honey et al., 2000; Jonides
et al., 1998; Paulesu et al., 1993). This frontal–parietal coupling
involved in the on-line processing of the rehearsed sequence and the
temporary storage of the newly built representation could then
provide a well-adapted platform for the decision-making process
during the emergence of a new speech form.
Materials and methods
Participants
Ten right-handed, healthy volunteers participated in the study
(seven males; mean age 24 ± 7 years; handedness was assessed by
means of the Edinburgh Inventory; Oldfield, 1971). All participants were native speakers of French without hearing or speaking
disorders; none of them reported history of neurological or
psychiatric disease. Informed consent was obtained from each
participant before the experiment. The study was approved by the
local ethics committee.
Speech sequences
Two pairs of monosyllabic nonsense words were selected from
a set used in a previous behavioral study dealing with the imagery
analogue of the verbal transformation task (Sato and Schwartz,
2003). The speech sequences, /psə/–/əps/ and /spə/–/pəs/, consisted of a combination of the neutral vowel [ə] and the bilabial
[p] and alveolar [s] consonants. We also used a pair of reversible
disyllabic first names, /lwiʒã/–/ʒãlwi/ (“Louis-Jean,” “Jean-Louis”), to test for lexical influences during the verbal transformation task (Shoaf and Pitt, 2002). The three pairs of speech
sequences are all phonotactically attested in French.
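One way to see why each pair is reversible under continuous repetition is that the two members are cyclic rotations of the same segment string, so a looped stream of one member also contains the other. The sketch below checks this property; the segment lists are an illustrative transcription (with the neutral vowel written as ə), and the function name is hypothetical rather than part of the study's materials.

```python
def is_cyclic_rotation(a, b):
    """Return True if segment sequence b is a cyclic rotation of sequence a."""
    if len(a) != len(b):
        return False
    doubled = list(a) + list(a)  # two repetitions contain every rotation of a
    n = len(b)
    return any(doubled[i:i + n] == list(b) for i in range(len(a)))


# Illustrative segmentations of the three stimulus pairs (assumed transcription).
PAIRS = [
    (["p", "s", "ə"], ["ə", "p", "s"]),
    (["s", "p", "ə"], ["p", "ə", "s"]),
    (["l", "w", "i", "ʒ", "ã"], ["ʒ", "ã", "l", "w", "i"]),
]

for first, second in PAIRS:
    print("".join(first), "->", "".join(second), is_cyclic_rotation(first, second))
```

Note that members of different pairs (for example /psə/ and /spə/) are not rotations of each other, which is consistent with the stimuli being organized as pairs.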
Task
For each pair of speech sequences, two conditions were
considered: a baseline condition involving simple mental repetition
of the speech sequences and a verbal transformation (VT) condition
involving mental repetition of the same stimuli with active search for
transformations. The choice of a mental repetition condition as
baseline was made to minimize cerebral activations not directly
associated with the verbal transformation task, considering that both
conditions share a common primary process, that is, the mental
repetition of the (same) speech sequences.
Concerning the VT condition, a previous behavioral study (Sato
and Schwartz, 2003) has shown that for the three pairs of stimuli
selected in this study, each speech sequence may be transformed into
its associated one. Subjects were instructed to detect the successive
occurrences of these transformations. For instance, for the [spə]/
[pəs] pair, they were trained to first repeat [spə] until the perceptual
pop up of [pəs], then to continue producing [pəs] while searching for
the associated form (i.e., [spə]), and so on. This procedure slightly
differs from the classical verbal transformation paradigm in that, in
the latter, subjects are not informed about expected transformations.
However, since a previous behavioral study has demonstrated that
the main dynamics of perceptual transitions is that of a pairwise
coupling (Ditzinger et al., 1997), we hypothesize that our procedure
does involve all the specific cognitive processes associated with the
verbal transformation.
In order to ensure optimal task performance during the scans,
subjects had been extensively trained before the functional
magnetic resonance imaging (fMRI) experiments. During the
training sessions (lasting about 1 h), the tasks were rehearsed,
initially overtly and eventually covertly. Subjects were
instructed to signal the detection of the verbal transformations
during the active search tasks using a finger-tapping procedure. All
subjects signaled such transformations. The rate of repetition was
controlled during the training sessions while the tasks were
performed overtly. No significant differences between conditions were
noted. A repetition rate of 2 Hz was chosen because
preliminary measurements had indicated that
producing the speech sequences at this frequency is
fairly “natural” and comfortable for the subjects.
fMRI paradigm
A block paradigm was used, which was composed of nine
baseline epochs interleaved with nine VT epochs. Each epoch
lasted 50 s. The total duration of the paradigm was 15 min. At the
beginning of each epoch, an instruction screen indicating the
condition and the particular pair of speech sequences to be used
was displayed (3 s). No message was displayed during the task
performance. Screens were generated by means of PsyScope v.1.1
(Carnegie Mellon Department of Psychology; Cohen et al., 1993)
running on a Macintosh computer. They were transmitted to the
participants via a video projector, a projection screen situated
behind the magnet and a mirror centered above the subject’s eyes.
The order of the speech sequences was pseudo-randomized
between scans and subjects. In order to minimize movement
artefacts, a foam rubber pad secured the subjects’ heads. Finally,
participants wore earplugs to attenuate scanner noise.
MR acquisition
Two functional scans were performed. Axial T2*-weighted
functional images covering the whole brain were acquired with a
gradient-recalled echo echo-planar imaging (EPI) pulse sequence on
a 1.5-T MR imager (Philips NT, Best, The Netherlands). The
imaging volume was oriented parallel to the bicommissural plane
(AC-PC). Positioning of the image planes was performed on scout
images acquired in the sagittal plane. Each functional volume
comprised 25 adjacent slices (slice thickness of 6 mm). The major
MR parameters of the EPI pulse sequence were as follows: TR =
3500 ms, TE = 45 ms, pulse angle = 90°, field-of-view = 256 × 256
mm², acquisition and reconstruction matrices = 64 × 64 pixels.
“Dummy” volumes acquired at the beginning of each session to
allow for MRI signal stabilization were discarded from the
functional data set. Between the first and the second functional
MR scans, high-resolution 3-D T1-weighted images were acquired
from the same volume as the functional images (150 slices, 1 mm
thickness, field-of-view = 256 × 256 mm², acquisition and
reconstruction matrices = 256 × 256 pixels).
Data analysis: preprocessing
The data were preprocessed and analyzed with SPM-99
software (Wellcome Department of Cognitive Neurology, London,
UK) in the MATLAB environment (Mathworks, Sherborn, USA).
Motion correction was carried out for each subject. Functional
images were realigned using the first scan of each session as a
reference. The images were then spatially normalized into the
reference system of Talairach and Tournoux (1988), using as
template a representative brain from the Montreal Neurological
Institute (Evans et al., 1994). The first step of the spatial
normalization was to determine the optimal affine transformation
for mapping the anatomical image onto the T1 template. Residual
differences between images were then corrected using nonlinear
basis functions (Friston et al., 1995a). The normalization parameters used were subsequently applied to the functional images.
Finally, the latter were filtered with a low-pass Gaussian filter
(FWHM = 8 × 8 × 12 mm³) to improve the signal-to-noise ratio
and to reduce the interindividual variability.
Data analysis: statistical model and inference
To detect task-related activations, the functional data were
analyzed using an implementation of the general linear model
(GLM; Friston et al., 1995b) in which the condition effects
(covariates of interest) were estimated voxelwise. Each condition
was modeled by convolving a boxcar function with the standard
hemodynamic response function to create regressors of interest.
Global activity was corrected by grand mean scaling, a high-pass
filter removed low-frequency drifts in the BOLD signal, and finally
the signal was temporally smoothed.
Fixed-effect analysis
We first proceeded to a multisubject fixed-effect analysis. The
effect of the search for verbal transformations was assessed by
contrasting all VT epochs with all baseline epochs. The SPM{t} map
obtained was then transformed into an SPM{Z} map, thresholded at
P < 0.05 and corrected for multiple comparisons (cluster extent
threshold = 15 voxels). Activation clusters were characterized in
terms of their peak heights (Z value maxima) and locations within
the stereotactic space of Talairach and Tournoux (1988) (for
coordinate conversion from MNI to Talairach and Tournoux
stereotactic space, see Brett et al., 2002).
Table 1
Maximal activation peaks obtained for the [VT–baseline] contrast
Region of maximal activation  Voxels  Left/right  Talairach coordinates   BA      T value  Z value
                                                  x       y       z
Inferior frontal gyrus        264     L           55      9       22      44      12.63    Inf
Precentral gyrus                      L           48      2       44      4/6     10.93    Inf
Middle frontal gyrus                  L           28      1       55      6       10.43    Inf
Supramarginal gyrus           192     L           44      33      46      40      11.85    Inf
Supramarginal gyrus                   L           51      36      52      40      11.17    Inf
Postcentral gyrus                     L           59      14      28      3       7.83     7.80
Supramarginal gyrus           103     R           44      32      51      40      10.99    Inf
Supramarginal gyrus                   R           63      33      29      40      6.40     6.39
Cerebellum                    146     L           32      52      23              9.88     Inf
Cerebellum                            L           20      63      12              7.97     Inf
Cerebellum                            L           20      68      32              5.31     5.30
Cerebellum                    171     R           20      64      37              9.85     Inf
Cerebellum                            R           32      51      18              8.96     Inf
Cerebellum                            R           20      55      12              7.13     7.10
Middle frontal gyrus          53      R           24      3       61      6       9.82     Inf
Medial frontal gyrus          77      L/R         4       14      44      6       8.86     Inf
Inferior frontal gyrus        46      R           59      13      21      44/45   8.02     Inf
Middle frontal gyrus                  R           51      6       44      6       6.77     6.75
Inferior frontal gyrus        19      R           36      24      4               6.29     6.28
Caudate nucleus               40      L           20      11      23              5.86     5.85
Fixed-effect analysis, P < 0.05 corrected, cluster size threshold = 15 voxels. BA: Brodmann area.
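The block regressors underlying the [VT–baseline] contrast can be sketched as follows. This is a minimal illustration, not the SPM-99 implementation: the TR (3.5 s), epoch length (50 s), and number of epochs (18) come from the text, whereas the double-gamma response shape, the baseline-first epoch order, the treatment of the paradigm as one continuous run, and all function names are assumptions made for the example. Instruction screens are ignored.

```python
import math

TR = 3.5        # s, repetition time reported for the EPI sequence
EPOCH = 50.0    # s, duration of each epoch
N_EPOCHS = 18   # nine baseline epochs interleaved with nine VT epochs


def gamma_pdf(t, shape):
    """Gamma density with unit scale, defined as zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(tr, duration=32.0):
    """Assumed double-gamma response sampled every TR and scaled to sum to 1."""
    n = int(math.ceil(duration / tr))
    h = [gamma_pdf(i * tr, 6.0) - gamma_pdf(i * tr, 16.0) / 6.0 for i in range(n)]
    total = sum(h)
    return [v / total for v in h]


def convolve_causal(signal, kernel):
    """Discrete causal convolution, truncated to the length of the signal."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if n - k >= 0:
                acc += signal[n - k] * w
        out.append(acc)
    return out


def block_regressors(tr=TR, epoch=EPOCH, n_epochs=N_EPOCHS):
    """Return (vt, baseline) regressors: boxcars convolved with the HRF."""
    n_scans = int(n_epochs * epoch // tr)
    # Assumption: even-numbered epochs are baseline, odd-numbered epochs are VT.
    vt_box = [float(int(i * tr // epoch) % 2 == 1) for i in range(n_scans)]
    base_box = [1.0 - v for v in vt_box]
    hrf = double_gamma_hrf(tr)
    return convolve_causal(vt_box, hrf), convolve_causal(base_box, hrf)


vt, baseline = block_regressors()
```

With these two columns in a design matrix, the contrast of interest corresponds to the weight vector [1, -1] over the (VT, baseline) parameter estimates.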
Random-effect analysis
In order to test the potential predictive validity of our results and
make inference at the population level, we subsequently proceeded
to a random-effect analysis implemented with a two-level procedure
(Holmes and Friston, 1998; see also Friston et al., 1999). The
specific contrast images (i.e., VT versus baseline) were first
calculated for each subject individually and they were then entered
into a one-tailed t test with nine degrees of freedom. The set of
t values thus obtained constituted an SPM{t} map, which was
transformed into an SPM{Z} map thresholded at P < 0.005
uncorrected for multiple comparisons (cluster extent threshold =
15 voxels) for the two regions for which we had a strong a priori
hypothesis (i.e., the left inferior frontal and supramarginal gyri). All
other activations reported survived a threshold corresponding to P <
0.05 corrected for multiple comparisons at the cluster level.
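The second level of this procedure amounts to a one-sample t test over the per-subject contrast values at each voxel. The sketch below shows that computation for a single voxel. The ten contrast values are invented purely for illustration, the function name is hypothetical, and the critical value 3.250 is the standard tabulated t quantile for a one-tailed P = 0.005 with nine degrees of freedom. The conversion of t to Z is not reproduced here.

```python
import math
import statistics

# Tabulated one-tailed critical t for P = 0.005 with 9 degrees of freedom.
T_CRIT_P005_DF9 = 3.250


def one_sample_t(values):
    """Return (t, df) for a one-sample t test of the mean against zero."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical VT-minus-baseline contrast values for ten subjects at one voxel.
contrast_values = [0.8, 1.1, 0.5, 0.9, 1.4, 0.7, 1.0, 0.6, 1.2, 0.9]

t_value, df = one_sample_t(contrast_values)
exceeds_threshold = t_value > T_CRIT_P005_DF9
print(f"t({df}) = {t_value:.2f}, exceeds uncorrected threshold: {exceeds_threshold}")
```

Because between-subject variance enters the denominator, this test supports inference beyond the sample, which is the motivation given above for the random-effect analysis.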
Results
The fixed-effect analysis of the [VT–baseline] contrast
revealed a predominantly left-lateralized network of areas,
including frontal, parietal, temporal, basal ganglia, and cerebellar
regions (see Table 1 and Fig. 1). The maximal peak activation
was detected in the left opercular part of the inferior frontal gyrus
(i.e., in Broca’s area; x, y, z = 55 9 22), within a cluster
extending dorsally into the middle frontal gyrus (x, y, z = 28
1 55) and into the anterior part of the precentral gyrus (x, y, z =
48 2 44). Activations of the left anterior part of the insular
cortex and of the left perisylvian limit of the superior temporal
gyrus were also evident within this cluster. Less extensive
activations of homologous prefrontal or perisylvian areas were
found in the right hemisphere, with three peaks of activation
within the inferior frontal gyrus and within the middle frontal
gyrus (x, y, z = 59 13 21; x, y, z = 36 24 4; x, y, z = 24 3 61).
Bilateral activations of the medial prefrontal cortex (x, y, z = 4
14 44), that is, the supplementary motor area, were also detected,
extending ventrally within the anterior part of the cingulate cortex
in both hemispheres. In addition, activations were found in both
left and right inferior parietal lobules, albeit less extensive within
the right hemisphere (x, y, z = 44 33 46; x, y, z = 44 32
51). Peak activations were detected in the supramarginal gyrus,
with a cluster extending, within the left-hemisphere, into the
anterior limit of the postcentral gyrus (x, y, z = 59 14 28).
Finally, activations were also found in the left caudate nucleus (x,
y, z = 20 11 23) and in the cerebellum, bilaterally (x, y, z =
32 52 23; x, y, z = 20 64 37).
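Peak locations are reported in Talairach space after conversion from MNI space (Brett et al., 2002). A minimal sketch of that kind of conversion is given below, using the piecewise affine coefficients commonly circulated for the mni2tal approximation attributed to Brett. Whether the authors used exactly these coefficients is an assumption, and the input coordinate in the usage line is hypothetical.

```python
def mni2tal(x, y, z):
    """Approximate MNI -> Talairach conversion (piecewise affine).

    Coefficients follow the widely circulated mni2tal approximation;
    treating them as the ones used in this study is an assumption.
    """
    if z >= 0:
        # Points at or above the AC plane
        return (0.9900 * x,
                0.9688 * y + 0.0460 * z,
                -0.0485 * y + 0.9189 * z)
    # Points below the AC plane
    return (0.9900 * x,
            0.9688 * y + 0.0420 * z,
            -0.0485 * y + 0.8390 * z)


# Hypothetical MNI coordinate in the left inferior frontal region.
tal = mni2tal(-56, 8, 24)
print(tuple(round(v) for v in tal))
```

The transform shrinks coordinates slightly and tilts the y–z plane, which is why MNI and Talairach peaks for the same voxel differ by a few millimetres.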
Fig. 1. Statistical parametric maps provided by the [VT–baseline] contrast in the fixed-effect analysis. The threshold is set at P < 0.05 corrected, excluding
clusters with spatial extent of less than 15 voxels. (A) Glass-brain views in the stereotactic space of the Montreal Neurological Institute. The two stronger
activations are in (a) the left inferior frontal gyrus (x, y, z = 55 9 22) and in (b) the left supramarginal gyrus (x, y, z = 44 33 46). (B) Left: activations
obtained in the sagittal, coronal, and transverse slices comprising the activation in the left inferior frontal gyrus (x, y, z = 55 9 22). Right: mean size of
effect ± standard error of the mean, for each regressor of interest within the peak voxel. The regressors of interest represent, respectively, the VT
conditions for the two pairs of monosyllabic nonsense words (1,2), for the pair of disyllabic first names (3), and the baseline conditions for the same
sequences (respectively, 4,5,6). (C) Left: activations obtained in the sagittal, coronal, and transverse slices comprising the activation in the left
supramarginal gyrus (x, y, z = 44 33 46). Right: mean size of effect ± standard error of the mean for each regressor of interest within the peak voxel. The
regressors of interest represent, respectively, the VT conditions for the two pairs of monosyllabic nonsense words (1,2), for the pair of disyllabic first names
(3), and the baseline conditions for the same sequences (respectively, 4,5,6).
Table 2
Maximal activation peaks obtained for the [VT–baseline] contrast
Region of maximal activation  Voxels  Left/right  Talairach coordinates   BA      T value  Z value
                                                  x       y       z
Anterior cingulate gyrus      35      R           16      21      27      32      6.39     3.83
Anterior cingulate gyrus              R           4       13      32      24      5.17     3.44
Anterior cingulate gyrus              R           24      2       44      6/32    4.90     3.34
Cerebellum                    41      R           36      48      23              5.63     3.60
Cerebellum                            R           12      51      13              4.44     3.15
Cerebellum                            R           20      55      12              4.24     3.06
Superior temporal gyrus       29      L           55      8       0       22      5.17     3.44
Inferior frontal gyrus                L           55      9       22      44      4.28     3.08
Inferior frontal gyrus                L           44      9       16      44      4.24     3.06
Supramarginal gyrus           17      L           40      41      35      40      4.44     3.15
Cerebellum                    31      L           4       55      7               4.30     3.09
Cerebellum                            L           4       63      12              4.23     3.06
Cerebellum                            L           8       40      13              3.82     2.87
Random-effects analysis, P < 0.005 uncorrected, cluster size threshold = 15 voxels. BA: Brodmann area. Except for the left inferior frontal and supramarginal
gyri, all activations reported survived a threshold corresponding to P < 0.05 corrected for multiple comparisons at the cluster level.
In order to test for possible differences in activation arising from
the use of different types of speech sequences, we estimated the
mean size of effect for each regressor of interest for the peak
activations within the left inferior frontal gyrus (x, y, z = 55 9 22;
Fig. 1B) and within the left supramarginal gyrus (x, y, z = 44 33
46; Fig. 1C). In both cases, a stronger effect was obtained for the
monosyllabic nonsense words than for the disyllabic first names.
The random-effect analysis yielded a more reduced set of
activations (see Table 2 and Fig. 2). Compared with the fixed-effect
analysis, the peak activation in the medial frontal gyrus was shifted
slightly ventrally into the anterior part of the right cingulate gyrus
(x, y, z = 16 21 27). Activations within the inferior frontal gyrus
were observed in the left hemisphere only (x, y, z = 55 9 22; x, y,
z = 44 9 16). As with the fixed-effect analysis, activations were
detected within the left supramarginal gyrus (x, y, z = 40 41
35), albeit more ventrally, the cerebellum (x, y, z = 4 55 7; x,
y, z = 36 48 23) and the left superior temporal gyrus (x, y, z =
55 8 0).
Fig. 2. Statistical parametric maps provided by the [VT–baseline] contrast in the random-effect analysis. The threshold is set at P < 0.005 uncorrected,
excluding clusters with spatial extent of less than 15 voxels. (A) Glass-brain views in the stereotactic space of the Montreal Neurological Institute. (B)
Activations obtained in the sagittal, coronal, and transverse slices comprising the activation in the left inferior frontal gyrus (x, y, z = 55 9 22). (C) Activations
obtained in the sagittal, coronal, and transverse slices comprising the activation in the left supramarginal gyrus (x, y, z = 40 41 35).
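The cluster-extent criterion applied to both statistical maps (clusters of fewer than 15 voxels are excluded) can be illustrated with a simple connected-component search over suprathreshold voxels. This is a sketch under stated assumptions: face connectivity (six neighbours) is chosen for simplicity, the connectivity rule used by the analysis software is not given in the text, and the function names and toy voxel sets are hypothetical.

```python
from collections import deque

# Assumption: voxels are connected if they share a face (6-connectivity).
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def find_clusters(voxels):
    """Group suprathreshold voxel coordinates into connected clusters."""
    voxels = set(voxels)
    seen = set()
    clusters = []
    for start in sorted(voxels):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cluster = [start]
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in voxels and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
                    cluster.append(nb)
        clusters.append(cluster)
    return clusters


def apply_extent_threshold(voxels, min_extent=15):
    """Keep only clusters containing at least min_extent voxels."""
    return [c for c in find_clusters(voxels) if len(c) >= min_extent]


# Toy example: one 16-voxel sheet and one isolated 2-voxel cluster.
big = {(x, y, 0) for x in range(4) for y in range(4)}
small = {(10, 10, 10), (10, 11, 10)}
kept = apply_extent_threshold(big | small)
print([len(c) for c in kept])
```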
Discussion
The aim of this fMRI study was to explore the neural correlates
of the imagery analogue of the verbal transformation effect. The
auditory imagery task performed involves on-line linguistic
processing of the mentally rehearsed sequence, a decision-making
process during the popping up of a new speech form, and the
temporary storage of the newly built representation. To minimize
cerebral activations not directly associated with the search for
verbal transformations, two conditions were contrasted: a baseline
condition involving the simple mental repetition of speech
sequences and a verbal transformation condition (VT) involving
the mental repetition of the same items with an active search for
verbal transformation. The contrast between the verbal transformation task and the baseline revealed, in the random-effect
analysis, a predominantly left-lateralized network of brain areas
including the left inferior frontal gyrus, the left superior temporal
gyrus, the left supramarginal gyrus, the right anterior cingulate
gyrus, and the cerebellum bilaterally. In the fixed-effect analysis,
speech-motor-related areas were furthermore found activated: the
anterior part of the insular cortex, the supplementary motor area,
the premotor cortex, the left caudate nucleus, and the left
postcentral gyrus.
Task performance
As the verbal transformation task and the baseline were
performed covertly, so as to avoid task-related motion artefacts
during fMRI acquisition, no external assessment of task performance was conducted. Therefore, to ensure optimal realization of
the tasks during the scans, subjects were extensively trained before
the fMRI experiments. While slight fluctuations in repetition rate
within and between conditions cannot be excluded a priori,
preliminary measurements during the training session, using an
overt repetition mode, revealed no significant differences in
repetition rate between or within conditions. Since the production
of the speech sequences at this particular frequency appears to be
very stable, little systematic effect of the repetition rate was
expected during the scans. Concerning the actual execution of the
imagery task on-line, all the participants signaled such transformations during the training session. Furthermore, upon debriefing following the actual fMRI experiments, they all reported
having performed the tasks correctly (i.e., having searched for the
transformations—and having detected some—during the active
condition and having merely repeated the sequences without
actively searching during the baseline condition). Finally, the
incidental occurrence of verbal transformations during the baseline
condition certainly cannot be excluded a priori. However, while
such occurrences might lower the statistical significance of the
activations obtained when contrasting the baseline and active task
conditions, they are not expected to affect the main conclusions of
this study.
Involvement of the phonological loop
As it generally requires temporary manipulation of verbal
material, auditory imagery is assumed to be mediated by the
phonological loop (Baddeley’s verbal working memory
system) and to involve both components of this model, that
is, the phonological store and the articulatory rehearsal system
(Baddeley and Logie, 1992; Reisberg et al., 1989; Smith et al.,
1995). Behavioral studies have tested this hypothesis using the
imagery analogue of the verbal transformation effect (Reisberg
et al., 1989; Smith et al., 1995). Those studies have shown that
the reinterpretation of auditory images is blocked both by
concurrent articulation and concurrent auditory input, two effects
known to disrupt performance during verbal working memory
tasks. Given that the neural network of brain areas obtained in
the [VT–baseline] contrast appears to be similar to that involved
in verbal working memory (Cohen et al., 1997; Henson et al.,
2000; Honey et al., 2000, 2002; Jonides et al., 1998; Paulesu et
al., 1993; Schumacher et al., 1996; Smith et al., 1998), our
results corroborate this behavioral evidence regarding the
involvement of verbal working memory during the verbal
transformation task.
A frontoparietal coupling between the left inferior frontal gyrus
and the supramarginal gyrus
Since the original study by Paulesu et al. (1993), several
memory load parametric functional studies have indicated a
neuroanatomical dissociation during verbal working memory tasks,
with the left inferior frontal gyrus and the left supramarginal gyrus
supporting the processes underlying verbal rehearsal and temporary retention of phonological contents, respectively (e.g., Cohen et
al., 1997; Honey et al., 2000, 2002).
Considering the processes specifically associated with the
verbal transformation task, the view of a frontal–parietal dissociation in terms of rehearsal and storage of phonological material
deserves to be further examined.
Traditionally, literature on verbal working memory and
language has emphasized Broca’s area (more specifically, the
opercular part of the inferior frontal gyrus) as a major structure
involved in subvocal rehearsal and speech production. Our results
make it possible to refine the role of Broca’s area in these processes. While
both conditions in our study involved subvocal rehearsal of the
same speech sequences, one of the main activations observed for
the [VT–baseline] contrast was that of the left inferior frontal gyrus
in its opercular part. It seems therefore difficult to assign a purely
rehearsal role to this region. Following this line, the classical view
of Broca’s area, as being the major structure for the motor act of
speech, has been reconsidered in recent functional neuroimaging
studies that obtained no activation of this region during automatic
speech tasks, executed either overtly or covertly (Bookheimer et
al., 2000; Murphy et al., 1997; Wise et al., 1999). Convergent
neurological findings have been obtained from stroke patients and
from patients having undergone resection. Removal of, or injury to,
Broca’s area does not necessarily result in persistent Broca’s
aphasia (as traditionally described) and the articulatory planning
deficits observed in Broca’s aphasia may well result from lesions
outside this region (Dronkers, 1996; Dronkers et al., 2000). Several
studies have demonstrated the involvement of the left inferior
frontal gyrus in graphophonemic conversion processes during
silent reading (Fiez et al., 1999; Price et al., 1997), attentive speech
listening (Papathanassiou et al., 2000), lip reading (Calvert and
Campbell, 2003), and more specifically during phonological
analysis processing such as phoneme monitoring, syllable counting, and rhyming (Démonet et al., 1992, 1994; Paulesu et al., 1993;
for a review, see Poldrack et al., 1999). Therefore, it appears that
Broca’s area is not related stricto sensu to subvocal rehearsal or to
the motor act of speech production but rather to the on-line analysis
of articulatory speech forms that support communicative or
interpretative speech. Hence, this region appears well adapted to
syllable parsing in the verbal transformation task.
A broad range of other speech-related areas were found activated
in the [VT–baseline] contrast: the supplementary motor area, the
premotor area, the insular cortex, the caudate nucleus, and the
cerebellum. These regions are respectively considered to be
involved in the initiation (Ziegler et al., 1997), the planning and
coordination of articulatory movements (Dronkers, 1996; Wise et
al., 1999), and in the modulation and temporal encoding or control of
such movements (Mathiak et al., 2003; Wildgruber et al., 2001). We
hypothesize that these activations may be related to the shifts in
stimulus (from one speech form to the other) during the verbal
transformation task. These successive shifts would then imply
alternating articulatory or phonological mental representations of the
rehearsed items and therefore a stronger involvement of the speech-related areas than is the case during baseline. In addition to these
speech-related areas, the left superior temporal gyrus was also
activated in the [VT–baseline] contrast. The activation of this region
has been observed during inner speech and auditory imagery tasks
and has been hypothesized to underlie a verbal monitoring system
(McGuire et al., 1996; Shergill et al., 2001, 2002). Communication
between frontal and temporal regions during the generation of inner
speech may then inform areas involved in language perception that
verbal output is self-generated. This hypothesis is also consistent
with the activations observed in these regions during auditory verbal
hallucinations in patients with schizophrenia (Shergill et al., 2000)
and with the notion that verbal hallucinations are due to a lack of
awareness of the patient’s normal inner speech.
The left supramarginal gyrus was also found to be activated in the
[VT–baseline] contrast. Given that the verbal transformation task
appears to require minimal verbal storage (i.e., the storage of the
rehearsed sequence until the emergence of the associated form), the
traditional view of this region as being involved in the temporary
storage of phonologically coded verbal material must be reexamined. Hickok and Poeppel (2000) (see also Hickok, 2003) have
offered a hypothesis that better accounts for our results. In their
model of speech perception, in line with several other models that
posit a basis in articulatory or gestural representations (Liberman
and Mattingly, 1985; Liberman and Whalen, 2000; Schwartz et al.,
2002), the authors suggest a fronto-temporo-parietal network
predominantly in the left hemisphere, which interfaces auditory
and articulatory representations of speech. According to these
authors, the supramarginal gyrus “should not be the site of storage
of phonemic representations per se, [. . .] but rather serve to
interface sound-based representations in auditory cortex with
articulatory-based representations of speech in the frontal cortex”
via sensorimotor recoding. From our results, we assume that a
similar fronto-temporo-parietal circuit is recruited during the verbal
transformation task. Accordingly, while the superior temporal
gyrus could be responsible for verbal monitoring, the supramarginal gyrus could be involved in the encoding of phonological-integrated representations of speech sequences. Once recoded,
these representations would then be sent to the inferior frontal
gyrus. At the emergence of a new representation, which could rely
on both the inferior frontal gyrus for syllable parsing and on the
anterior cingulate cortex for possible competition mechanisms
between representations (Carter et al., 1998), the new speech form
would then be sent back to the supramarginal gyrus. Further
arguments in favor of this frontoparietal coupling stem from the
analysis of the mean size of effect for each regressor of interest
inside the peak activations of these two regions (Fig. 1C). These
estimations clearly show a stronger effect for the monosyllabic
nonsense words than for the disyllabic first names during the VT
condition. This result may be explained by the lexical and semantic
nature of the latter sequences. With such materials, an auditory-motor network should not play a prominent role; rather, subjects
should rely more on semantic codes (Jonides et al., 1998). It is thus
conceivable that “overlearned” lexical sequences require less neural
involvement in both the inferior frontal gyrus and the supramarginal gyrus than nonsense articulatory gestures.
Finally, the frontoparietal coupling involved in the on-line
linguistic processing of the mentally rehearsed sequence and the
temporary storage of the recently built representation could provide
a well-adapted platform for the decision-making process during the
emergence of a new speech form. Activations of the cerebellar
cortex and the cingulate gyrus are of particular interest in relation
to this decision-making process. Desmond et al. (1997) suggested
that during verbal working memory tasks, the cerebellum could act
as a comparator system in which discrepancies between actual and
intended phonological output (coming from the articulatory control
system in the prefrontal cortex and the phonological store in the
inferior parietal cortex) could be estimated and then used to update
a feedforward articulatory rehearsal command to the frontal cortex.
The anterior cingulate gyrus may also be involved in the decision-making process. This region has indeed been previously described
as being involved in executive processes such as attentional control
and monitoring of performance during verbal working memory
tasks (Osaka et al., 2003; Smith and Jonides, 1999).
In summary, our results, combined with those of previous
behavioral studies, indicate the involvement of verbal working
memory during the imagery analogue of the verbal transformation
effect. The functional coupling observed in the left hemisphere
between the inferior frontal gyrus, the supramarginal gyrus, and the
superior temporal gyrus—areas considered to be involved in the on-line linguistic processing, the temporary storage, and the self-monitoring of verbal material—strongly suggests that the auditory
imagery task shares common components of speech perception and
speech production and that it relies both on sound-based and on
articulatory representations. In addition to this fronto-temporo-parietal neural network, activations observed within the right
anterior cingulate cortex and the cerebellum bilaterally are assumed
to reflect attentional control and the comparison of speech forms
during the active search for verbal transformations. The
functional coupling between all these brain regions might constitute
the neural basis for decision-making processes during the emergence
of a new speech form.
References
Aleman, A., Nieuwenstein, M.R., Bocker, K.B., de Haan, H.F., 2000.
Music training and mental imagery ability. Neuropsychologia 38 (12),
1664 – 1668.
Baddeley, A.D., 1986. Working Memory. Oxford Univ. Press, Oxford.
Baddeley, A.D., 1992. Working Memory. Science 255, 556 – 559.
Baddeley, A.D., Hitch, G.J., 1974. Working memory. In: Bower, G.A.
(Ed.), Recent Advances in Learning and Motivation, vol. 8. New York
Academic Press, New York, pp. 47 – 90.
Baddeley, A.D., Logie, R., 1992. Auditory imagery and working memory.
In: Reisberg, D. (Ed.), Auditory Imagery. Lawrence Erlbaum, Hillsdale,
pp. 179 – 197.
Bookheimer, S.Y., Zeffiro, T.A., Blaxton, T., Gaillard, W.D., Theodore,
W.H., 2000. Activation of language cortex with automatic speech tasks.
Neurology 55, 1151 – 1157.
Brett, M., Johnsrude, I.S., Owen, A.M., 2002. The problem of localization
in the human brain. Nat. Neurosci. 3, 243 – 249.
Calvert, G.A., Campbell, R., 2003. Reading speech from still and moving
faces: the neural substrates of visible speech. J. Cogn. Neurosci. 15 (1),
57 – 70.
Carter, C.S., Braver, T.S., Barch, D.M., Botvinick, M.M., Noll, D., Cohen,
J.D., 1998. Anterior cingulate cortex, error detection, and the online
monitoring of performance. Science 280, 747 – 749.
Cohen, J.D., MacWhinney, B., Flatt, M.R., Provost, J., 1993. PsyScope: a
new graphic interactive environment for designing psychology experiments. Behav. Res. Methods Instrum. Comput. 25 (2), 257 – 271.
Cohen, J.D., Perlstein, W.M., Braver, T.S., Nystrom, L.E., Noll, D.C.,
Jonides, J., Smith, E.E., 1997. Temporal dynamics of brain activation
during a working memory task. Nature 386, 604 – 608.
Démonet, J.-F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J.-L.,
Wise, R., Rascol, A., Frackowiak, R.S.J., 1992. The anatomy of
phonological and semantic processing in normal subjects. Brain 115,
1753 – 1768.
Démonet, J.-F., Price, C., Wise, R., Frackowiak, R.S.J., 1994. A PET study
of cognitive strategies in normal subjects during language tasks:
influence of phonetic ambiguity and sequence processing on phoneme
monitoring. Brain 117 (4), 671 – 682.
Desmond, J.E., Gabrieli, J.D.E., Wagner, A., Ginier, B.L., Glover, G.H.,
1997. Lobular patterns of cerebellar activation in verbal working
memory and finger-tapping tasks as revealed by functional MRI.
J. Neurosci. 17, 9675 – 9685.
Ditzinger, T., Tuller, B., Kelso, J.A.S., 1997. Temporal patterning in an
auditory illusion: the verbal transformation effect. Biol. Cybern. 77,
23 – 30.
Dronkers, N.F., 1996. A new brain region for coordinating speech
articulation. Nature 384, 159 – 161.
Dronkers, N.F., Redfern, B.B., Knight, R.T., 2000. The neural architecture
of language disorders. In: Gazzaniga, M.S. (Ed.), The New Cognitive
Neurosciences. MIT Press, Cambridge, pp. 949 – 958.
Evans, A.C., Kamber, M., Collins, D.L., MacDonald, D., 1994. An MRI-based probabilistic atlas of neuroanatomy. In: Shorvon, S., Fish, D.,
Andermann, F., Bydder, G.M., Stefan, H. (Eds.), Magnetic Resonance
Scanning and Epilepsy, NATO ASI Series A, Life Sciences, vol. 264.
Plenum, New York, pp. 263 – 274.
Fiez, J., Balota, D.A., Raichle, M.E., Petersen, S.E., 1999. Effects of
lexicality, frequency, and spelling to sound consistency on the
functional anatomy of reading. Neuron 24, 205 – 218.
Friston, K.J., Ashburner, J., Frith, C.D., Poline, J.-B., Heather, J.D.,
Frackowiak, R.S.J., 1995a. Spatial registration and normalization of
images. Hum. Brain Mapp. 2, 165 – 189.
Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.-B., Frith, C.D.,
Frackowiak, R.S.J., 1995b. Statistical parametric maps in functional
imaging: a general linear approach. Hum. Brain Mapp. 2, 189 – 210.
Friston, K.J., Holmes, A.P., Worsley, K.J., 1999. How many subjects
constitute a study? NeuroImage 10 (1), 1 – 5.
Halpern, A.R., Zatorre, R.J., 1999. When that tune runs through your head:
a PET investigation of auditory imagery for familiar melodies. Cereb.
Cortex 9 (7), 697 – 704.
Henson, R.N.A., Burgess, N., Frith, C.D., 2000. Recoding, storage,
rehearsal and grouping in verbal short-term memory: an fMRI study.
Neuropsychologia 38, 426 – 440.
Hickok, G., 2003. Auditory-motor interaction revealed by fMRI: speech,
music and working memory in area Spt. J. Cogn. Neurosci. 15 (5),
673 – 682.
Hickok, G., Poeppel, D., 2000. Towards a functional neuroanatomy of
speech perception. Trends Cogn. Sci. 4 (4), 131 – 138.
Holmes, A.P., Friston, K.J., 1998. Generalisability, random effects and
population inference. NeuroImage 7, S754.
Honey, G.D., Bullmore, E., Sharma, T., 2000. Prolonged reaction time to a
verbal working memory task predicts increased power of posterior
parietal cortical activation. NeuroImage 12, 495 – 503.
Honey, G.D., Fu, C.H.Y., Kim, J., Brammer, M.J., Croudace, T.J., Suckling,
J., Pich, E.M., Williams, S.C.R., Bullmore, E., 2002. Effects of verbal
working memory load on corticocortical connectivity modelled by path
analysis of functional magnetic resonance imaging data. NeuroImage
17, 573 – 582.
Jonides, J., Schumacher, E.H., Smith, E.E., Koeppe, R.A., Awh, E., Reuter-Lorenz, P.A., Marshuetz, C., Willis, C.R., 1998. The role of parietal
cortex in verbal working memory. J. Neurosci. 18, 5026 – 5034.
Liberman, A.M., Mattingly, I.G., 1985. The motor theory of speech
perception revised. Cognition 21, 1 – 36.
Liberman, A.M., Whalen, D.H., 2000. On the relation of speech to
language. Trends Cogn. Sci. 4, 187 – 196.
MacKay, D.G., Wulf, G., Yin, C., Abrams, L., 1993. Relations between
word perception and production: new theory and data on the verbal
transformation effect. J. Mem. Lang. 32, 624 – 646.
Mathiak, K., Hertrich, I., Grodd, W., Ackermann, H., 2003. Cerebellum and
speech perception: a functional magnetic resonance imaging study.
J. Cogn. Neurosci. 14 (6), 902 – 912.
McGuire, P.K., Silbersweig, D.A., Murray, R.M., David, A.S., Frackowiak,
R.S., Frith, C.D., 1996. Functional anatomy of inner speech and
auditory imagery. Psychol. Med. 26 (1), 29 – 38.
Miyake, A., Shah, P., 1999. Models of Working Memory: Mechanisms of
Active Maintenance and Executive Control. Cambridge Univ. Press,
New York.
Murphy, K., Corfield, D.R., Guz, A., Fink, G.R., Harrison, J., Wise, R.J.S.,
Adams, L., 1997. Cerebral areas associated with motor control of
speech in humans. J. Appl. Physiol. 83 (5), 1438 – 1447.
Oldfield, R.C., 1971. The assessment and analysis of handedness: the
Edinburgh Inventory. Neuropsychologia 9, 97 – 114.
Osaka, M., Osaka, N., Kondo, H., Morishita, M., Fukuyama, H., Aso, T.,
Shibasaki, H., 2003. The neural basis of individual differences in
working memory capacity: an fMRI study. NeuroImage 18, 789 – 797.
Papathanassiou, D., Etard, O., Mellet, E., Zago, L., Mazoyer, B., Tzourio-Mazoyer, N., 2000. A common language network for comprehension
and production: a contribution to the definition of language epicenters
with PET. NeuroImage 11 (4), 347 – 357.
Paulesu, E., Frith, C.D., Frackowiak, R.S.J., 1993. The neural correlates of
the verbal components of working memory. Nature 362, 342 – 344.
Pitt, M., Shoaf, L., 2002. Linking verbal transformations to their causes. J.
Exp. Psychol. Hum. Percept. Perform. 28, 150 – 162.
Poldrack, R.A., Wagner, A.D., Prull, M.W., Desmond, J.E., Glover, G.H.,
Gabrieli, J.D.E., 1999. Functional specialization for semantic and
phonological processing in the left inferior prefrontal cortex. NeuroImage 10, 15 – 35.
Price, C.J., Moore, C.J., Humphreys, G.W., Wise, R.J.S., 1997. Segregating
semantic from phonological processes during reading. J. Cogn.
Neurosci. 9, 727 – 733.
Reisberg, D., 1992. Auditory Imagery. Lawrence Erlbaum, Hillsdale.
Reisberg, D., Smith, J.D., Baxter, A.D., Sonenshine, M., 1989. Enacted
auditory images are ambiguous; pure auditory images are not. Q. J. Exp.
Psychol. 41A, 619 – 641.
Sato, M., Schwartz, J.-L., 2003. Linking speech, verbal imagery and
working memory: articulatory control constraints in the verbal transformation effect. In: Solé, M.J., Recasens, D., Romero, J. (Eds.),
Proceedings of the XVth International Congress of Phonetic Sciences,
Casual Productions, Adelaide, pp. 435 – 438.
Schumacher, E.H., Lauber, E., Awh, E., Jonides, J., Smith, E., Koeppe,
R.A., 1996. PET evidence for an amodal verbal working memory
system. NeuroImage 3, 79 – 88.
Schwartz, J.-L., Abry, C., Boë, L.-J., Cathiard, M.A., 2002. Phonology in a
theory of perception-for-action-control. In: Durand, J., Laks, B. (Eds.),
Phonology: From Phonetics to Cognition. Oxford Univ. Press, Oxford,
pp. 240 – 280.
Shergill, S.S., Bullmore, E.T., Simmons, A., Murray, R.M., McGuire, P.K.,
2000. Functional anatomy of auditory imagery in schizophrenic patients
with auditory hallucinations. Am. J. Psychiatry 157 (10), 1691 – 1693.
Shergill, S.S., Bullmore, E.T., Brammer, M.J., Williams, S.C., Murray,
R.M., McGuire, P.K., 2001. A functional study of auditory imagery.
Psychol. Med. 31 (2), 241 – 253.
Shergill, S.S., Brammer, M.J., Fukuda, R., Bullmore, E., Amaro Jr., E.,
Murray, R.M., McGuire, P.K., 2002. Modulation of activity in
temporal cortex during generation of inner speech. Hum. Brain
Mapp. 16, 219 – 227.
Shoaf, L., Pitt, M., 2002. Does node stability underlie the verbal
transformation effect? A test of node structure theory. Percept.
Psychophys. 64 (5), 795 – 803.
Smith, E.E., Jonides, J., 1999. Storage and executive processes in the
frontal lobes. Science 283, 1657 – 1661.
Smith, J.D., Reisberg, D., Wilson, M., 1995. The role of subvocalization in
auditory imagery. Neuropsychologia 11, 1433 – 1454.
Smith, E.E., Jonides, J., Marshuetz, C., Koeppe, R.A., 1998. Components
of verbal working memory: evidence from neuroimaging. Proc. Natl.
Acad. Sci. 95, 876 – 882.
Talairach, J., Tournoux, P., 1988. A Co-Planar Stereo-Taxic Atlas of Human
Brain. Thieme, Stuttgart.
Warren, R.M., 1961. Illusory changes of distinct speech upon
repetition—The verbal transformation effect. Br. J. Psychol. 52,
249 – 258.
Warren, R.M., Gregory, R.L., 1958. An auditory analogue of the visual
reversible figure. Am. J. Psychol. 71, 612 – 613.
Wildgruber, D., Ackermann, H., Grodd, W., 2001. Differential contributions of motor cortex, basal ganglia, and cerebellum to speech motor
control: effects of syllable repetition rate evaluated by fMRI. NeuroImage 13, 101 – 109.
Wise, R.J., Greene, J., Buchel, C., Scott, S.K., 1999. Brain regions involved
in articulation. Lancet 353, 1057 – 1061.
Ziegler, W., Kilian, B., Deger, K., 1997. The role of left mesial
frontal cortex in fluent speech: evidence from a case of left
supplementary motor area hemorrhage. Neuropsychologia 35,
1197 – 1208.