www.elsevier.com/locate/ynimg
NeuroImage 23 (2004) 1143–1151

Multistable representation of speech forms: a functional MRI study of verbal transformations

Marc Sato (a,*), Monica Baciu (b), Hélène Lœvenbruck (a), Jean-Luc Schwartz (a), Marie-Agnès Cathiard (a), Christoph Segebarth (c), and Christian Abry (a)

(a) Institut de la Communication Parlée, CNRS UMR 5009, Institut National Polytechnique de Grenoble, Université Stendhal, Grenoble, France
(b) Laboratoire de Psychologie et Neurocognition, CNRS UMR 5105, Université Pierre Mendès, Grenoble, France
(c) Unité Mixte INSERM/Université Joseph Fourier U594, NeuroImagerie Fonctionnelle et Métabolique, LRC CEA 30V, Grenoble, France

Received 12 March 2004; revised 12 July 2004; accepted 19 July 2004
Available online 6 October 2004

We used functional magnetic resonance imaging (fMRI) to localize the brain areas involved in the imagery analogue of the verbal transformation effect, that is, the perceptual changes that occur when a speech form is cycled in rapid and continuous mental repetition. Two conditions were contrasted: a baseline condition involving the simple mental repetition of speech sequences, and a verbal transformation condition involving the mental repetition of the same items with an active search for verbal transformation. Our results reveal a predominantly left-lateralized network of cerebral regions activated by the verbal transformation task, similar to the neural network involved in verbal working memory: the left inferior frontal gyrus, the left supramarginal gyrus, the left superior temporal gyrus, the anterior part of the right cingulate cortex, and the cerebellar cortex, bilaterally. Our results strongly suggest that the imagery analogue of the verbal transformation effect, which requires percept analysis, form interpretation, and attentional maintenance of verbal material, relies on a working memory module sharing common components of speech perception and speech production systems.
© 2004 Elsevier Inc. All rights reserved.
Keywords: fMRI; Auditory imagery; Verbal working memory; Speech production; Speech perception

* Corresponding author. Institut de la Communication Parlée-UMR CNRS 5009, Institut National Polytechnique de Grenoble-Université Stendhal 46, Avenue Félix Viallet, 38031 Grenoble Cedex 01-France. Fax: +33 4 76 57 47 10. E-mail address: [email protected] (M. Sato). Available online on ScienceDirect (www.sciencedirect.com.) 1053-8119/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.neuroimage.2004.07.055

Introduction

Auditory imagery can be defined as an introspective and conscious persistence of an auditory experience in the absence of related auditory input. It plays an important role in numerous cognitive functions whenever auditory material is represented for analysis to make comparisons or to form interpretations from it (Reisberg, 1992). For example, with visually presented stimuli, auditory imagery may be involved in various phonological tasks, such as homophone detection, rhyme judgments, or metrical stress evaluation (for a review, see Reisberg, 1992; Smith et al., 1995). Likewise, auditory imagery may also play a role in music perception and cognition (e.g., Aleman et al., 2000; Halpern and Zatorre, 1999) and even during auditory hallucinations in schizophrenia (e.g., Shergill et al., 2000).

As it requires temporary manipulation of verbal material, auditory imagery is assumed to be mediated by verbal working memory, that is, the cognitive system involved in the temporary maintenance and manipulation of information (Baddeley and Logie, 1992; Reisberg et al., 1989; Smith et al., 1995). A classical model of verbal working memory, in line with psychological, neuropsychological, and developmental findings, is the phonological, or articulatory, loop (Baddeley, 1986, 1992; Baddeley and Hitch, 1974; for a review of verbal working memory models, see Miyake and Shah, 1999).
According to this model, verbal working memory comprises two components, a passive phonological store, in which the phonological material is temporarily held, and an active articulatory rehearsal system, the role of which is to refresh the phonological material by subvocal articulation. Previous behavioral as well as functional neuroimaging studies have provided convergent evidence for the involvement of this mnemic system during auditory imagery (McGuire et al., 1996; Shergill et al., 2001; Smith et al., 1995). For instance, Shergill et al. (2001) showed that consciously recalling the sound of someone’s voice (imagining speech) involves a left-lateralized network of frontal, parietal, temporal, and cerebellar areas similar to that traditionally considered to be involved in verbal working memory.

One such auditory imagery task is the imagery analogue of the verbal transformation effect, first described by Warren and Gregory (1958). This effect arises when a speech form is cycled in rapid and continuous repetition. While, initially, a percept matching the original form is perceived, at some point of time, another percept suddenly pops up, corresponding to an abrupt change in perception of the original speech form. This transformation process persists throughout the repetition procedure, leading to perceptual transitions from one speech form to another (or back to the original form). For example, rapid repetitions of the word “life” provide a sound flow fully compatible with the perception that either “life” or “fly” is being repeated. Verbal transformations have mainly been studied as a pure perceptual effect, that is, when a speech stimulus is repetitively presented to listeners (e.g., MacKay et al., 1993; Pitt and Shoaf, 2002; Warren, 1961).
It has been shown that this effect also occurs during production, when subjects repeatedly utter the speech stimulus in a covert or an overt mode (Reisberg et al., 1989; Sato and Schwartz, 2003; Smith et al., 1995). Thus, the verbal transformation effect seems to provide an interesting phenomenon to test for a possible coupling between production and perception within the language system.

The aim of the present study was to evaluate the neural correlates of the imagery analogue of the verbal transformation effect. We contrasted a baseline condition involving “simple” mental repetition of speech sequences and a verbal transformation condition involving the mental repetition of the same stimuli and an active search for transformations. Thus, we were able to identify the neural network of brain areas specifically associated with the search for verbal transformations. Since the verbal transformation task requires subjects to analyze the mentally repeated speech form until the emergence of a new one and, then, to make judgements about what form is perceived, three distinct cognitive processes are possibly involved: the on-line linguistic processing of the current rehearsed sequence (syllable parsing), the decision-making process during the popping up of a new speech form, and finally the temporary storage of the newly built representation. Our hypothesis was therefore that the verbal transformation task must involve a frontal–parietal coupling between the left inferior frontal gyrus and the left supramarginal gyrus, two regions traditionally considered to be involved in the phonological analysis of spoken words (for a review, see Poldrack et al., 1999) and in the temporary storage of phonological forms (Cohen et al., 1997; Honey et al., 2000; Jonides et al., 1998; Paulesu et al., 1993).
This frontal–parietal coupling involved in the on-line processing of the rehearsed sequence and the temporary storage of the latterly built representation could then provide a well-adapted platform for the decision-making process during the emergence of a new speech form.

Materials and methods

Participants

Ten right-handed, healthy volunteers participated in the study (seven males; mean age 24 ± 7 years; handedness was assessed by means of the Edinburgh Inventory; Oldfield, 1971). All participants were native speakers of French without hearing or speaking disorders; none of them reported a history of neurological or psychiatric disease. Informed consent was obtained from each participant before the experiment. The study was approved by the local ethics committee.

Speech sequences

Two pairs of monosyllabic nonsense words were selected from a set used in a previous behavioral study dealing with the imagery analogue of the verbal transformation task (Sato and Schwartz, 2003). The speech sequences, /psə/–/əps/ and /spə/–/pəs/, consisted of a combination of the neutral vowel [ə] and the bilabial [p] and alveolar [s] consonants. We also used a pair of reversible disyllabic first names, /lwiʒã/–/ʒãlwi/ (“Louis-Jean,” “Jean-Louis”) to test for lexical influences during the verbal transformation task (Shoaf and Pitt, 2002). The three pairs of speech sequences are all phonotactically attested in French.

Task

For each pair of speech sequences, two conditions were considered: a baseline condition involving simple mental repetition of the speech sequences and a verbal transformation (VT) condition involving mental repetition of the same stimuli with active search for transformations. The choice of a mental repetition condition as baseline was made to minimize cerebral activations not directly associated with the verbal transformation task, considering that both conditions share a common primary process, that is, the mental repetition of the (same) speech sequences.
Concerning the VT condition, a previous behavioral study (Sato and Schwartz, 2003) has shown that for the three pairs of stimuli selected in this study, each speech sequence may be transformed into its associated one. Subjects were instructed to detect the successive occurrences of these transformations. For instance, for the [spə]/[pəs] pair, they were trained to first repeat [spə] until the perceptual pop up of [pəs], then to continue producing [pəs] while searching for the associated form (i.e., [spə]), and so on. This procedure slightly differs from the classical verbal transformation paradigm in that, in the latter, subjects are not informed about expected transformations. However, since a previous behavioral study has demonstrated that the main dynamics of perceptual transitions is that of a pairwise coupling (Ditzinger et al., 1997), we hypothesize that our procedure does involve all the specific cognitive processes associated with the verbal transformation.

In order to ensure optimal task performance during the scans, subjects were extensively trained before the functional magnetic resonance imaging (fMRI) experiments. During the training sessions (lasting about 1 h), the tasks were rehearsed, initially overtly and eventually covertly. Subjects were instructed to signal the detection of the verbal transformations during the active search tasks using a finger-tapping procedure. All subjects signaled such transformations. The rate of repetition was controlled during the training sessions, while the task was performed overtly. No significant differences between conditions were noted. A repetition rate of 2 Hz was chosen because preliminary measurements had shown that producing the speech sequences at this particular frequency is fairly “natural” and comfortable for the subjects.

fMRI paradigm

A block paradigm was used, which was composed of nine baseline epochs interleaved with nine VT epochs.
Each epoch lasted 50 s. The total duration of the paradigm was 15 min. At the beginning of each epoch, an instruction screen indicating the condition and the particular pair of speech sequences to be used was displayed (3 s). No message was displayed during the task performance. Screens were generated by means of PsyScope v.1.1 (Carnegie Mellon Department of Psychology; Cohen et al., 1993) running on a Macintosh computer. They were transmitted to the participants via a video projector, a projection screen situated behind the magnet and a mirror centered above the subject’s eyes. The order of the speech sequences was pseudo-randomized between scans and subjects. In order to minimize movement artefacts, a foam rubber pad secured the subjects’ heads. Finally, participants wore earplugs to attenuate scanner noise.

MR acquisition

Two functional scans were performed. Axial T2*-weighted functional images covering the whole brain were acquired with a gradient-recalled echo echo-planar imaging (EPI) pulse sequence on a 1.5-T MR imager (Philips NT, Best, The Netherlands). The imaging volume was oriented parallel to the bicommissural plane (AC-PC). Positioning of the image planes was performed on scout images acquired in the sagittal plane. Each functional volume comprised 25 adjacent slices (slice thickness of 6 mm). The major MR parameters of the EPI pulse sequence were as follows: TR = 3500 ms, TE = 45 ms, pulse angle = 90°, field-of-view = 256 × 256 mm², acquisition and reconstruction matrices = 64 × 64 pixels. “Dummy” volumes acquired at the beginning of each session to allow for MRI signal stabilization were discarded from the functional data set. Between the first and the second functional MR scans, high-resolution 3-D T1-weighted images were acquired from the same volume as the functional images (150 slices, 1 mm thickness, field-of-view = 256 × 256 mm², acquisition and reconstruction matrices = 256 × 256 pixels).

Data analysis: preprocessing

The data were preprocessed and analyzed with SPM-99 software (Wellcome Department of Cognitive Neurology, London, UK) in the MATLAB environment (Mathworks, Sherborn, USA). Motion correction was carried out for each subject. Functional images were realigned using the first scan of each session as a reference. The images were then spatially normalized into the reference system of Talairach and Tournoux (1988), using as template a representative brain from the Montreal Neurological Institute (Evans et al., 1994). The first step of the spatial normalization was to determine the optimal affine transformation for mapping the anatomical image onto the T1 template. Residual differences between images were then corrected using nonlinear basis functions (Friston et al., 1995a). The normalization parameters used were subsequently applied to the functional images. Finally, the latter were filtered with a low-pass Gaussian filter (FWHM = 8 × 8 × 12 mm³) to improve the signal-to-noise ratio and to reduce the interindividual variability.

Data analysis: statistical model and inference

To detect task-related activations, the functional data were analyzed using an implementation of the general linear model (GLM; Friston et al., 1995b) in which the condition effects (covariates of interest) were estimated voxelwise. Each condition was modeled by convolving a boxcar function with the standard hemodynamic response function to create regressors of interest. Global activity was corrected by grand mean scaling, a high-pass filter removed low-frequency drifts in the BOLD signal, and finally the signal was temporally smoothed.

Fixed-effect analysis

We first proceeded to a multisubject fixed-effect analysis. The effect of the search for verbal transformations was assessed by contrasting all VT epochs with all baseline epochs. The SPM{t} map obtained was then transformed into an SPM{Z} map, thresholded at P < 0.05 corrected for multiple comparisons (cluster extent threshold = 15 voxels). Activation clusters were characterized in terms of their peak heights (Z value maxima) and locations within the stereotactic space of Talairach and Tournoux (1988) (for coordinate conversion from MNI to Talairach and Tournoux stereotactic space, see Brett et al., 2002).

Table 1
Maximal activation peaks obtained for the [VT–baseline] contrast

Region of maximal activation   Left/right   x    y    z    BA     T value   Z value
Inferior frontal gyrus         L            55   9    22   44     12.63     Inf
Precentral gyrus               L            48   2    44   4/6    10.93     Inf
Middle frontal gyrus           L            28   1    55   6      10.43     Inf
Supramarginal gyrus            L            44   33   46   40     11.85     Inf
Supramarginal gyrus            L            51   36   52   40     11.17     Inf
Postcentral gyrus              L            59   14   28   3      7.83      7.80
Supramarginal gyrus            R            44   32   51   40     10.99     Inf
Supramarginal gyrus            R            63   33   29   40     6.40      6.39
Cerebellum                     L            32   52   23          9.88      Inf
Cerebellum                     L            20   63   12          7.97      Inf
Cerebellum                     L            20   68   32          5.31      5.30
Cerebellum                     R            20   64   37          9.85      Inf
Cerebellum                     R            32   51   18          8.96      Inf
Cerebellum                     R            20   55   12          7.13      7.10
Middle frontal gyrus           R            24   3    61          9.82      Inf
Medial frontal gyrus           L/R          4    14   44          8.86      Inf
Inferior frontal gyrus         R            59   13   21          8.02      Inf
Middle frontal gyrus           R            51   6    44          6.77      6.75
Inferior frontal gyrus         R            36   24   4           6.29      6.28
Caudate nucleus                L            20   11   23          5.86      5.85

Cluster sizes (voxels), in order of listing: 264, 192, 103, 146, 171, 53, 77, 46, 19, 40. BA for the right and medial frontal peaks, in order of listing: 6, 6, 44/45, 6. Coordinates are Talairach coordinates.
Fixed-effect analysis, P < 0.05 corrected, cluster size threshold = 15 voxels. BA: Brodmann area.
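The regressor construction described in the statistical model (a boxcar convolved with a hemodynamic response function) can be illustrated with a minimal sketch. It uses the epoch length, epoch count, and TR reported above; everything else is an assumption made for illustration: the double-gamma shape of `hrf`, the choice of baseline as the first epoch, the single VT regressor (the study modeled one regressor per condition and stimulus pair), and the names `vt_boxcar` and `convolve_with_hrf`. It is not the SPM-99 implementation.

```python
import math

TR = 3.5          # repetition time in s (from the MR acquisition parameters)
EPOCH = 50.0      # epoch duration in s
N_EPOCHS = 18     # nine baseline epochs interleaved with nine VT epochs
N_SCANS = int(N_EPOCHS * EPOCH / TR)   # whole volumes that fit in 900 s


def hrf(t):
    """Illustrative double-gamma response (assumed shape, not SPM-99's kernel)."""
    if t < 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)          # maximal at t = 5 s
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)  # maximal at t = 15 s
    return peak - undershoot / 6.0


def vt_boxcar(n_scans=N_SCANS):
    """1 during VT epochs, 0 during baseline (assumes baseline comes first)."""
    box = []
    for i in range(n_scans):
        epoch_index = int((i * TR) // EPOCH)
        box.append(1.0 if epoch_index % 2 == 1 else 0.0)
    return box


def convolve_with_hrf(box, kernel_seconds=32.0):
    """Discrete convolution of the boxcar with the HRF sampled every TR."""
    kernel = [hrf(k * TR) for k in range(int(kernel_seconds / TR) + 1)]
    out = []
    for i in range(len(box)):
        acc = 0.0
        for k, h in enumerate(kernel):
            if i - k >= 0:
                acc += box[i - k] * h
        out.append(acc * TR)
    return out


box = vt_boxcar()
regressor = convolve_with_hrf(box)
print(N_SCANS, "scans;", int(sum(box)), "of them fall in VT epochs")
```

In a full GLM, a column like `regressor` would sit in the design matrix next to the other condition regressors and a constant term, and the [VT–baseline] contrast would compare the fitted weights.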
Random-effect analysis

In order to test the potential predictive validity of our results and make inference at the population level, we subsequently proceeded to a random-effect analysis implemented with a two-level procedure (Holmes and Friston, 1998; see also Friston et al., 1999). The specific contrast images (i.e., VT versus baseline) were first calculated for each subject individually and they were then entered into a one-tailed t test with nine degrees of freedom. The set of t values thus obtained constituted an SPM{t} map, which was transformed into an SPM{Z} map thresholded at P < 0.005 uncorrected for multiple comparisons (cluster extent threshold = 15 voxels) for the two regions for which we had a strong a priori hypothesis (i.e., the left inferior frontal and supramarginal gyri). All other activations reported survived a threshold corresponding to P < 0.05 corrected for multiple comparisons at the cluster level.

Results

The fixed-effect analysis of the [VT–baseline] contrast revealed a predominantly left-lateralized network of areas, including frontal, parietal, temporal, basal ganglia, and cerebellar regions (see Table 1 and Fig. 1). The maximal peak activation was detected in the left opercular part of the inferior frontal gyrus (i.e., in Broca’s area; x, y, z = 55 9 22), within a cluster extending dorsally into the middle frontal gyrus (x, y, z = 28 1 55) and into the anterior part of the precentral gyrus (x, y, z = 48 2 44). Activations of the left anterior part of the insular cortex and of the left perisylvian limit of the superior temporal gyrus were also evident within this cluster. Less extensive activations of homologous prefrontal or perisylvian areas were found in the right hemisphere, with three peaks of activation within the inferior frontal gyrus and within the middle frontal gyrus (x, y, z = 59 13 21; x, y, z = 36 24 4; x, y, z = 24 3 61).
Bilateral activations of the medial prefrontal cortex (x, y, z = 4 14 44), that is, the supplementary motor area, were also detected, extending ventrally within the anterior part of the cingulate cortex in both hemispheres. In addition, activations were found in both left and right inferior parietal lobules, albeit less extensive within the right hemisphere (x, y, z = 44 33 46; x, y, z = 44 32 51). Peak activations were detected in the supramarginal gyrus, with a cluster extending, within the left hemisphere, into the anterior limit of the postcentral gyrus (x, y, z = 59 14 28). Finally, activations were also found in the left caudate nucleus (x, y, z = 20 11 23) and in the cerebellum, bilaterally (x, y, z = 32 52 23; x, y, z = 20 64 37).

Fig. 1. Statistical parametric maps provided by the [VT–baseline] contrast in the fixed-effect analysis. The threshold is set at P < 0.05 corrected, excluding clusters with spatial extent of less than 15 voxels. (A) Glass-brain views in the stereotactic space of the Montreal Neurological Institute. The two stronger activations are in (a) the left inferior frontal gyrus (x, y, z = 55 9 22) and in (b) the left supramarginal gyrus (x, y, z = 44 33 46). (B) Left: activations obtained in the sagittal, coronal, and transverse slices comprising the activation in the left inferior frontal gyrus (x, y, z = 55 9 22). Right: mean size effect ± standard error of the mean, for each regressor of interest within the peak voxel. The regressors of interest represent, respectively, the VT conditions for the two pairs of monosyllabic nonsense words (1,2), for the pair of disyllabic first names (3), and the baseline conditions for the same sequences (respectively, 4,5,6). (C) Left: activations obtained in the sagittal, coronal, and transverse slices comprising the activation in the left supramarginal gyrus (x, y, z = 44 33 46). Right: mean size effect ± standard error of the mean for each regressor of interest within the peak voxel.
The regressors of interest represent, respectively, the VT conditions for the two pairs of monosyllabic nonsense words (1,2), for the pair of disyllabic first names (3), and the baseline conditions for the same sequences (respectively, 4,5,6).

Table 2
Maximal activation peaks obtained for the [VT–baseline] contrast

Region of maximal activation   Voxels   Left/right   x    y    z    BA     T value   Z value
Anterior cingulate gyrus       35       R            16   21   27   32     6.39      3.83
Anterior cingulate gyrus                R            4    13   32   24     5.17      3.44
Anterior cingulate gyrus                R            24   2    44   6/32   4.90      3.34
Cerebellum                     41       R            36   48   23          5.63      3.60
Cerebellum                              R            12   51   13          4.44      3.15
Cerebellum                              R            20   55   12          4.24      3.06
Superior temporal gyrus        29       L            55   8    0    22     5.17      3.44
Inferior frontal gyrus                  L            55   9    22   44     4.28      3.08
Inferior frontal gyrus                  L            44   9    16   44     4.24      3.06
Supramarginal gyrus            17       L            40   41   35   40     4.44      3.15
Cerebellum                     31       L            4    55   7           4.30      3.09
Cerebellum                              L            4    63   12          4.23      3.06
Cerebellum                              L            8    40   13          3.82      2.87

Coordinates are Talairach coordinates.
Random-effects analysis, P < 0.005 uncorrected, cluster size threshold = 15 voxels. BA: Brodmann area. Except for the left inferior frontal and supramarginal gyri, all activations reported survived a threshold corresponding to P < 0.05 corrected for multiple comparisons at the cluster level.

In order to test for possible differences in activation arising from the use of different types of speech sequences, we estimated the mean size of effect for each regressor of interest for the peak activations within the left inferior frontal gyrus (x, y, z = 55 9 22; Fig. 1B) and within the left supramarginal gyrus (x, y, z = 44 33 46; Fig. 1C). In both cases, a stronger effect was obtained for the monosyllabic nonsense words than for the disyllabic first names.

The random-effect analysis yielded a more reduced set of activations (see Table 2 and Fig. 2).
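The second level of the random-effect procedure, in which one contrast value per subject enters a one-tailed t test with nine degrees of freedom, can be sketched for a single voxel as follows. The ten contrast values are invented for illustration and are not data from the study; the function name `one_sample_t` is likewise hypothetical. The critical value 3.250 is the standard one-tailed t-table entry for P < 0.005 with 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical per-subject contrast values (VT minus baseline) at one voxel.
# Invented for illustration only; the study reports no per-subject values.
contrasts = [0.8, 1.1, 0.4, 0.9, 1.3, 0.2, 0.7, 1.0, 0.6, 0.5]


def one_sample_t(values):
    """Return (t, degrees of freedom) for a test of the mean against zero."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean / sem, n - 1


t_value, dof = one_sample_t(contrasts)

# One-tailed critical t for P < 0.005 with 9 degrees of freedom (t table).
T_CRIT = 3.250
significant = t_value > T_CRIT
print(f"t({dof}) = {t_value:.2f}, exceeds threshold: {significant}")
```

Because between-subject variance enters the denominator, an effect driven by only a few subjects yields a small t here, which is one reason the random-effect map is sparser than the fixed-effect one.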
Compared with the fixed-effect analysis, the peak activation in the medial frontal gyrus was shifted slightly ventrally into the anterior part of the right cingulate gyrus (x, y, z = 16 21 27). Activations within the inferior frontal gyrus were observed in the left hemisphere only (x, y, z = 55 9 22; x, y, z = 44 9 16). As with the fixed-effect analysis, activations were detected within the left supramarginal gyrus (x, y, z = 40 41 35), albeit more ventrally, the cerebellum (x, y, z = 4 55 7; x, y, z = 36 48 23) and the left superior temporal gyrus (x, y, z = 55 8 0).

Fig. 2. Statistical parametric maps provided by the [VT–baseline] contrast in the random-effect analysis. The threshold is set at P < 0.005 uncorrected, excluding clusters with spatial extent of less than 15 voxels. (A) Glass-brain views in the stereotactic space of the Montreal Neurological Institute. (B) Activations obtained in the sagittal, coronal, and transverse slices comprising the activation in the left inferior frontal gyrus (x, y, z = 55 9 22). (C) Activations obtained in the sagittal, coronal, and transverse slices comprising the activation in the left supramarginal gyrus (x, y, z = 40 41 35).

Discussion

The aim of this fMRI study was to explore the neural correlates of the imagery analogue of the verbal transformation effect. The auditory imagery task performed involves on-line linguistic processing of the mentally rehearsed sequence, a decision-making process during the popping up of a new speech form, and the temporary storage of the newly built representation. To minimize cerebral activations not directly associated with the search for verbal transformations, two conditions were contrasted: a baseline condition involving the simple mental repetition of speech sequences and a verbal transformation condition (VT) involving the mental repetition of the same items with an active search for verbal transformation.
The contrast between the verbal transformation task and the baseline revealed, in the random-effect analysis, a predominantly left-lateralized network of brain areas including the left inferior frontal gyrus, the left superior temporal gyrus, the left supramarginal gyrus, the right anterior cingulate gyrus, and the cerebellum bilaterally. In the fixed-effect analysis, speech-motor-related areas were additionally found to be activated: the anterior part of the insular cortex, the supplementary motor area, the premotor cortex, the left caudate nucleus, and the left postcentral gyrus.

Task performance

As the verbal transformation task and the baseline were performed covertly, so as to avoid task-related motion artefacts during fMRI acquisition, no external assessment of task performance was conducted. Therefore, to ensure optimal realization of the tasks during the scans, subjects were extensively trained before the fMRI experiments. While slight fluctuations in repetition rate within and between conditions cannot be excluded a priori, preliminary measurements during the training session, using an overt repetition mode, showed no significant differences in repetition rate between or within conditions. Since the production of the speech sequences at this particular frequency appears to be very stable, little systematic effect of the repetition rate was expected during the scans. Concerning the actual on-line execution of the imagery task, all the participants signaled transformations during the training session. Furthermore, upon debriefing following the actual fMRI experiments, they all reported having performed the tasks correctly (i.e., having searched for the transformations, and having detected some, during the active condition and having merely repeated the sequences without actively searching during the baseline condition). Finally, the incidental occurrence of verbal transformations during the baseline condition cannot be excluded a priori.
However, while such occurrences might lower the statistical significance of the activations obtained when contrasting the baseline and active task conditions, they are not expected to affect the main conclusions of this study.

Involvement of the phonological loop

As it generally requires temporary manipulation of verbal material, auditory imagery is assumed to be mediated by the phonological loop (Baddeley’s verbal working memory system) and to involve both components of this model, that is, the phonological store and the articulatory rehearsal system (Baddeley and Logie, 1992; Reisberg et al., 1989; Smith et al., 1995). Behavioral studies have tested this hypothesis using the imagery analogue of the verbal transformation effect (Reisberg et al., 1989; Smith et al., 1995). Those studies have shown that the reinterpretation of auditory images is blocked both by concurrent articulation and concurrent auditory input, two effects known to disrupt performance during verbal working memory tasks. Given that the neural network of brain areas obtained in the [VT–baseline] contrast appears to be similar to that involved in verbal working memory (Cohen et al., 1997; Henson et al., 2000; Honey et al., 2000, 2002; Jonides et al., 1998; Paulesu et al., 1993; Schumacher et al., 1996; Smith et al., 1998), our results corroborate this behavioral evidence regarding the involvement of verbal working memory during the verbal transformation task.

A frontoparietal coupling between the left inferior frontal gyrus and the supramarginal gyrus

Since the original study by Paulesu et al. (1993), several memory load parametric functional studies have indicated a neuroanatomical dissociation during verbal working memory tasks, with the left inferior frontal gyrus and the left supramarginal gyrus supporting the processes underlying verbal rehearsal and temporary retention of phonological contents, respectively (e.g., Cohen et al., 1997; Honey et al., 2000, 2002).
Considering the processes specifically associated with the verbal transformation task, the view of a frontal–parietal dissociation in terms of rehearsal and storage of phonological material deserves to be further examined.

Traditionally, literature on verbal working memory and language has emphasized Broca’s area (more specifically, the opercular part of the inferior frontal gyrus) as a major structure involved in subvocal rehearsal and speech production. Our results allow us to refine the role of Broca’s area in these processes. While both conditions in our study involved subvocal rehearsal of the same speech sequences, one of the main activations observed for the [VT–baseline] contrast was that of the left inferior frontal gyrus in its opercular part. It therefore seems difficult to assign a purely rehearsal role to this region. Along the same lines, the classical view of Broca’s area as the major structure for the motor act of speech has been reconsidered in recent functional neuroimaging studies that obtained no activation of this region during automatic speech tasks, executed either overtly or covertly (Bookheimer et al., 2000; Murphy et al., 1997; Wise et al., 1999). Convergent neurological findings have been obtained from stroke patients and from patients having undergone resection. Removal of, or injury to, Broca’s area does not necessarily result in persistent Broca’s aphasia (as traditionally described) and the articulatory planning deficits observed in Broca’s aphasia may well result from lesions outside this region (Dronkers, 1996; Dronkers et al., 2000).
Several studies have demonstrated the involvement of the left inferior frontal gyrus during graphophonemic conversion processes in silent reading (Fiez et al., 1999; Price et al., 1997), attentive speech listening (Papathanassiou et al., 2000), lip reading (Calvert and Campbell, 2003), and more specifically during phonological analysis processing such as phoneme monitoring, syllable counting, and rhyming (Démonet et al., 1992, 1994; Paulesu et al., 1993; for a review, see Poldrack et al., 1999). Therefore, it appears that Broca’s area is not stricto sensu related to subvocal rehearsal or to the motor act of speech production but rather to the on-line analysis of articulatory speech forms that support communicative or interpretative speech. Hence, this region appears well adapted to syllable parsing in the verbal transformation task.

A broad range of other speech-related areas were found activated in the [VT–baseline] contrast: the supplementary motor area, the premotor area, the insular cortex, the caudate nucleus, and the cerebellum. These regions are respectively considered to be involved in the initiation (Ziegler et al., 1997), the planning and coordination of articulatory movements (Dronkers, 1996; Wise et al., 1999), and in the modulation and temporal encoding or control of such movements (Mathiak et al., 2003; Wildgruber et al., 2001). We hypothesize that these activations may be related to the shifts in stimulus (from one speech form to the other) during the verbal transformation task. These successive shifts would then imply alternating articulatory or phonological mental representations of the rehearsed items and therefore a stronger involvement of the speech-related areas than is the case during baseline.

In addition to these speech-related areas, the left superior temporal gyrus was also activated in the [VT–baseline] contrast.
The activation of this region has been observed during inner speech and auditory imagery tasks and has been hypothesized to underlie a verbal monitoring system (McGuire et al., 1996; Shergill et al., 2001, 2002). Communication between frontal and temporal regions during the generation of inner speech may then inform areas involved in language perception that verbal output is self-generated. This hypothesis is also consistent with the activations observed in these regions during auditory verbal hallucinations in patients with schizophrenia (Shergill et al., 2000) and with the notion that verbal hallucinations are due to a lack of awareness of the patient’s normal inner speech.

The left supramarginal gyrus was also activated in the [VT–baseline] contrast. Given that the verbal transformation task appears to require minimal verbal storage (i.e., the storage of the rehearsed sequence until the emergence of the associated form), the traditional view of this region as being involved in the temporary storage of phonologically coded verbal material must be reexamined. Hickok and Poeppel (2000) (see also Hickok, 2003) have offered a hypothesis that better accounts for our results. In their model of speech perception, in line with several other models that posit a basis in articulatory or gestural representations (Liberman and Mattingly, 1985; Liberman and Whalen, 2000; Schwartz et al., 2002), the authors suggest a fronto-temporo-parietal network, predominantly in the left hemisphere, which interfaces auditory and articulatory representations of speech. According to these authors, the supramarginal gyrus “should not be the site of storage of phonemic representations per se, [...] but rather serve to interface sound-based representations in auditory cortex with articulatory-based representations of speech in the frontal cortex” via sensorimotor recoding.
From our results, we assume that a similar fronto-temporo-parietal circuit is recruited during the verbal transformation task. Accordingly, while the superior temporal gyrus could be responsible for verbal monitoring, the supramarginal gyrus could be involved in the encoding of phonologically integrated representations of speech sequences. Once recoded, these representations would then be sent to the inferior frontal gyrus. At the emergence of a new representation, which could rely both on the inferior frontal gyrus for syllable parsing and on the anterior cingulate cortex for possible competition mechanisms between representations (Carter et al., 1998), the new speech form would then be sent back to the supramarginal gyrus.

Further arguments in favor of this frontoparietal coupling stem from the analysis of the mean size of effect for each regressor of interest inside the peak activations of these two regions (Fig. 1C). These estimations clearly show a stronger effect for the monosyllabic nonsense words than for the disyllabic first names during the VT condition. This result may be explained by the lexical and semantic nature of the latter sequences. With such materials, an auditory-motor network should not play a prominent role; rather, subjects should rely more on semantic codes (Jonides et al., 1998). It is thus conceivable that “overlearned” lexical sequences require less neural involvement in both the inferior frontal gyrus and the supramarginal gyrus than nonsense articulatory gestures.

Finally, the frontoparietal coupling involved in the on-line linguistic processing of the mentally rehearsed sequence and the temporary storage of the most recently built representation could provide a well-adapted platform for the decision-making process during the emergence of a new speech form. Activations of the cerebellar cortex and the cingulate gyrus are of particular interest in relation to this decision-making process. Desmond et al.
(1997) suggested that during verbal working memory tasks, the cerebellum could act as a comparator system in which discrepancies between actual and intended phonological output (coming from the articulatory control system in the prefrontal cortex and the phonological store in the inferior parietal cortex) could be estimated and then used to update a feedforward articulatory rehearsal command to the frontal cortex. The anterior cingulate gyrus may also be involved in the decision-making process. This region has indeed been previously described as being implicated in executive processes such as attentional control and monitoring of performance during verbal working memory tasks (Osaka et al., 2003; Smith and Jonides, 1999).

In summary, our results, combined with those of previous behavioral studies, indicate the involvement of verbal working memory during the imagery analogue of the verbal transformation effect. The functional coupling observed in the left hemisphere between the inferior frontal gyrus, the supramarginal gyrus, and the superior temporal gyrus (areas considered to be involved in the on-line linguistic processing, the temporary storage, and the self-monitoring of verbal material) strongly suggests that the auditory imagery task shares common components of speech perception and speech production and that it relies on both sound-based and articulatory representations. In addition to this fronto-temporo-parietal neural network, activations observed within the right anterior cingulate cortex and the cerebellum bilaterally are assumed to reflect attentional control and the comparison of speech forms during the active search for verbal transformations. The functional coupling between all these brain regions might constitute the neural basis for decision-making processes during the emergence of a new speech form.

References

Aleman, A., Nieuwenstein, M.R., Bocker, K.B., de Haan, H.F., 2000. Music training and mental imagery ability.
Neuropsychologia 38 (12), 1664–1668.
Baddeley, A.D., 1986. Working Memory. Oxford Univ. Press, Oxford.
Baddeley, A.D., 1992. Working memory. Science 255, 556–559.
Baddeley, A.D., Hitch, G.J., 1974. Working memory. In: Bower, G.A. (Ed.), Recent Advances in Learning and Motivation, vol. 8. Academic Press, New York, pp. 47–90.
Baddeley, A.D., Logie, R., 1992. Auditory imagery and working memory. In: Reisberg, D. (Ed.), Auditory Imagery. Lawrence Erlbaum, Hillsdale, pp. 179–197.
Bookheimer, S.Y., Zeffiro, T.A., Blaxton, T., Gaillard, W.D., Theodore, W.H., 2000. Activation of language cortex with automatic speech tasks. Neurology 55, 1151–1157.
Brett, M., Johnsrude, I.S., Owen, A.M., 2002. The problem of localization in the human brain. Nat. Neurosci. 3, 243–249.
Calvert, G.A., Campbell, R., 2003. Reading speech from still and moving faces: the neural substrates of visible speech. J. Cogn. Neurosci. 15 (1), 57–70.
Carter, C.S., Braver, T.S., Barch, D.M., Botvinick, M.M., Noll, D., Cohen, J.D., 1998. Anterior cingulate cortex, error detection, and the online monitoring of performance. Science 280, 747–749.
Cohen, J.D., MacWhinney, B., Flatt, M.R., Provost, J., 1993. PsyScope: a new graphic interactive environment for designing psychology experiments. Behav. Res. Methods Instrum. Comput. 25 (2), 257–271.
Cohen, J.D., Perlstein, W.M., Braver, T.S., Nystrom, L.E., Noll, D.C., Jonides, J., Smith, E.E., 1997. Temporal dynamics of brain activation during a working memory task. Nature 386, 604–608.
Démonet, J.-F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J.-L., Wise, R., Rascol, A., Frackowiak, R.S.J., 1992. The anatomy of phonological and semantic processing in normal subjects. Brain 115, 1753–1768.
Démonet, J.-F., Price, C., Wise, R., Frackowiak, R.S.J., 1994.
A PET study of cognitive strategies in normal subjects during language tasks: influence of phonetic ambiguity and sequence processing on phoneme monitoring. Brain 117 (4), 671–682.
Desmond, J.E., Gabrieli, J.D.E., Wagner, A., Ginier, B.L., Glover, G.H., 1997. Lobular patterns of cerebellar activation in verbal working memory and finger-tapping tasks as revealed by functional MRI. J. Neurosci. 17, 9675–9685.
Ditzinger, T., Tuller, B., Kelso, J.A.S., 1997. Temporal patterning in an auditory illusion: the verbal transformation effect. Biol. Cybern. 77, 23–30.
Dronkers, N.F., 1996. A new brain region for coordinating speech articulation. Nature 384, 159–161.
Dronkers, N.F., Redfern, B.B., Knight, R.T., 2000. The neural architecture of language disorders. In: Gazzaniga, M.S. (Ed.), The New Cognitive Neurosciences. MIT Press, Cambridge, pp. 949–958.
Evans, A.C., Kamber, M., Collins, D.L., MacDonald, D., 1994. An MRI-based probabilistic atlas of neuroanatomy. In: Shorvon, S., Fish, D., Andermann, F., Bydder, G.M., Stefan, H. (Eds.), Magnetic Resonance Scanning and Epilepsy, NATO ASI Series A, Life Sciences, vol. 264. Plenum, New York, pp. 263–274.
Fiez, J., Balota, D.A., Raichle, M.E., Petersen, S.E., 1999. Effects of lexicality, frequency, and spelling to sound consistency on the functional anatomy of reading. Neuron 24, 205–218.
Friston, K.J., Ashburner, J., Frith, C.D., Poline, J.-B., Heather, J.D., Frackowiak, R.S.J., 1995a. Spatial registration and normalization of images. Hum. Brain Mapp. 2, 165–189.
Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.-B., Frith, C.D., Frackowiak, R.S.J., 1995b. Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 2, 189–210.
Friston, K.J., Holmes, A.P., Worsley, K.J., 1999. How many subjects constitute a study? NeuroImage 10 (1), 1–5.
Halpern, A.R., Zatorre, R.J., 1999. When that tune runs through your head: a PET investigation of auditory imagery for familiar melodies.
Cereb. Cortex 9 (7), 697–704.
Henson, R.N.A., Burgess, N., Frith, C.D., 2000. Recoding, storage, rehearsal and grouping in verbal short-term memory: an fMRI study. Neuropsychologia 38, 426–440.
Hickok, G., 2003. Auditory-motor interaction revealed by fMRI: speech, music and working memory in area Spt. J. Cogn. Neurosci. 15 (5), 673–682.
Hickok, G., Poeppel, D., 2000. Towards a functional neuroanatomy of speech perception. Trends Cogn. Sci. 4 (4), 131–138.
Holmes, A.P., Friston, K.J., 1998. Generalisability, random effects and population inference. NeuroImage 7, S754.
Honey, G.D., Bullmore, E., Sharma, T., 2000. Prolonged reaction time to a verbal working memory task predicts increased power of posterior parietal cortical activation. NeuroImage 12, 495–503.
Honey, G.D., Fu, C.H.Y., Kim, J., Brammer, M.J., Croudace, T.J., Suckling, J., Pich, E.M., Williams, S.C.R., Bullmore, E., 2002. Effects of verbal working memory load on corticocortical connectivity modelled by path analysis of functional magnetic resonance imaging data. NeuroImage 17, 573–582.
Jonides, J., Schumacher, E.H., Smith, E.E., Koeppe, R.A., Awh, E., Reuter-Lorenz, P.A., Marshuetz, C., Willis, C.R., 1998. The role of parietal cortex in verbal working memory. J. Neurosci. 18, 5026–5034.
Liberman, A.M., Mattingly, I.G., 1985. The motor theory of speech perception revised. Cognition 21, 1–36.
Liberman, A.M., Whalen, D.H., 2000. On the relation of speech to language. Trends Cogn. Sci. 4, 187–196.
MacKay, D.G., Wulf, G., Yin, C., Abrams, L., 1993. Relations between word perception and production: new theory and data on the verbal transformation effect. J. Mem. Lang. 32, 624–646.
Mathiak, K., Hertrich, I., Grodd, W., Ackermann, H., 2003. Cerebellum and speech perception: a functional magnetic resonance imaging study. J. Cogn. Neurosci. 14 (6), 902–912.
McGuire, P.K., Silbersweig, D.A., Murray, R.M., David, A.S., Frackowiak, R.S., Frith, C.D., 1996.
Functional anatomy of inner speech and auditory imagery. Psychol. Med. 26 (1), 29–38.
Miyake, A., Shah, P., 1999. Models of Working Memory: Mechanisms of Active Maintenance and Executive Control. Cambridge Univ. Press, New York.
Murphy, K., Corfield, D.R., Guz, A., Fink, G.R., Harrison, J., Wise, R.J.S., Adams, L., 1997. Cerebral areas associated with motor control of speech in humans. J. Appl. Physiol. 83 (5), 1438–1447.
Oldfield, R.C., 1971. The assessment and analysis of handedness: the Edinburgh Inventory. Neuropsychologia 9, 97–114.
Osaka, M., Osaka, N., Kondo, H., Morishita, M., Fukuyama, H., Aso, T., Shibasaki, H., 2003. The neural basis of individual differences in working memory capacity: an fMRI study. NeuroImage 18, 789–797.
Papathanassiou, D., Etard, O., Mellet, E., Zago, L., Mazoyer, B., Tzourio-Mazoyer, N., 2000. A common language network for comprehension and production: a contribution to the definition of language epicenters with PET. NeuroImage 11 (4), 347–357.
Paulesu, E., Frith, C.D., Frackowiak, R.S.J., 1993. The neural correlates of the verbal components of working memory. Nature 362, 342–344.
Pitt, M., Shoaf, L., 2002. Linking verbal transformations to their causes. J. Exp. Psychol. Hum. Percept. Perform. 28, 150–162.
Poldrack, R.A., Wagner, A.D., Prull, M.W., Desmond, J.E., Glover, G.H., Gabrieli, J.D.E., 1999. Functional specialization for semantic and phonological processing in the left inferior prefrontal cortex. NeuroImage 10, 15–35.
Price, C.J., Moore, C.J., Humphreys, G.W., Wise, R.J.S., 1997. Segregating semantic from phonological processes during reading. J. Cogn. Neurosci. 9, 727–733.
Reisberg, D., 1992. Auditory Imagery. Lawrence Erlbaum, Hillsdale.
Reisberg, D., Smith, J.D., Baxter, A.D., Sonenshine, M., 1989. Enacted auditory images are ambiguous; pure auditory images are not. Q. J. Exp. Psychol. 41A, 619–641.
Sato, M., Schwartz, J.-L., 2003.
Linking speech, verbal imagery and working memory: articulatory control constraints in the verbal transformation effect. In: Solé, M.J., Recasens, D., Romero, J. (Eds.), Proceedings of the XVth International Congress of Phonetic Sciences, Causal Productions, Adelaide, pp. 435–438.
Schumacher, E.H., Lauber, E., Awh, E., Jonides, J., Smith, E., Koeppe, R.A., 1996. PET evidence for an amodal verbal working memory system. NeuroImage 3, 79–88.
Schwartz, J.-L., Abry, C., Boë, L.-J., Cathiard, M.A., 2002. Phonology in a theory of perception-for-action-control. In: Durand, J., Laks, B. (Eds.), Phonology: From Phonetics to Cognition. Oxford Univ. Press, Oxford, pp. 240–280.
Shergill, S.S., Bullmore, E.T., Simmons, A., Murray, R.M., McGuire, P.K., 2000. Functional anatomy of auditory imagery in schizophrenic patients with auditory hallucinations. Am. J. Psychiatry 157 (10), 1691–1693.
Shergill, S.S., Bullmore, E.T., Brammer, M.J., Williams, S.C., Murray, R.M., McGuire, P.K., 2001. A functional study of auditory imagery. Psychol. Med. 31 (2), 241–253.
Shergill, S.S., Brammer, M.J., Fukuda, R., Bullmore, E., Amaro Jr., E., Murray, R.M., McGuire, P.K., 2002. Modulation of activity in temporal cortex during generation of inner speech. Hum. Brain Mapp. 16, 219–227.
Shoaf, L., Pitt, M., 2002. Does node stability underlie the verbal transformation effect? A test of node structure theory. Percept. Psychophys. 64 (5), 795–803.
Smith, E.E., Jonides, J., 1999. Storage and executive processes in the frontal lobes. Science 283, 1657–1661.
Smith, J.D., Reisberg, D., Wilson, M., 1995. The role of subvocalization in auditory imagery. Neuropsychologia 11, 1433–1454.
Smith, E.E., Jonides, J., Marshuetz, C., Koeppe, R.A., 1998. Components of verbal working memory: evidence from neuroimaging. Proc. Natl. Acad. Sci. 95, 876–882.
Talairach, J., Tournoux, P., 1988. A Co-Planar Stereo-Taxic Atlas of Human Brain.
Thieme, Stuttgart.
Warren, M.R., 1961. Illusory changes of distinct speech upon repetition: the verbal transformation effect. Br. J. Psychol. 52, 249–258.
Warren, M.R., Gregory, R.L., 1958. An auditory analogue of the visual reversible figure. Am. J. Psychol. 71, 612–613.
Wildgruber, D., Ackermann, H., Grodd, W., 2001. Differential contributions of motor cortex, basal ganglia, and cerebellum to speech motor control: effects of syllable repetition rate evaluated by fMRI. NeuroImage 13, 101–109.
Wise, R.J., Greene, J., Buchel, C., Scott, S.K., 1999. Brain regions involved in articulation. Lancet 353, 1057–1061.
Ziegler, W., Kilian, B., Deger, K., 1997. The role of left mesial frontal cortex in fluent speech: evidence from a case of left supplementary motor area hemorrhage. Neuropsychologia 35, 1197–1208.