Cerebral Cortex March 2008;18:541--552
doi:10.1093/cercor/bhm083
Advance Access publication June 24, 2007
Processing Prosodic Boundaries in Natural and Hummed Speech: An fMRI Study

Anja K. Ischebeck (1,2), Angela D. Friederici (2), and Kai Alter (2,3)
1 Clinical Department of Neurology, Innsbruck Medical University, 6020 Innsbruck, Austria; 2 Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; 3 School of Neurology, Neurobiology, and Psychiatry, Newcastle University Medical School, Newcastle upon Tyne, UK
Speech contains prosodic cues such as pauses between different
phrases of a sentence. These intonational phrase boundaries (IPBs)
elicit a specific component in event-related brain potential studies,
the so-called closure positive shift. The aim of the present functional magnetic resonance imaging study is to identify the neural
correlates of this prosody-related component in sentences containing segmental and prosodic information (natural speech) and
hummed sentences only containing prosodic information. Sentences with 2 IPBs both in normal and hummed speech activated
the middle superior temporal gyrus, the rolandic operculum, and the
gyrus of Heschl more strongly than sentences with 1 IPB. The
results from a region of interest analysis of auditory cortex and
auditory association areas suggest that the posterior rolandic
operculum, in particular, supports the processing of prosodic information. A comparison of natural speech and hummed sentences
revealed a number of left-hemispheric areas within the temporal
lobe as well as in the frontal and parietal lobe that were activated
more strongly for natural speech than for hummed sentences.
These areas constitute the neural network for the processing of
natural speech. The finding that no area was activated more
strongly for hummed sentences compared with natural speech
suggests that prosody is an integrated part of natural speech.
Keywords: prosody processing, auditory cortex, intonational phrase
boundaries, fMRI, sentence processing
Introduction
The speech melody of an utterance can carry information that is
critically important to understand the meaning of a sentence
(see, for a review, Friederici and Alter 2004; Frazier et al. 2006).
In intonational languages such as German, Dutch, and English,
prosodic information at the sentence level is mainly conveyed by, among other cues, the pitch contour of an utterance and the presence of speech pauses. Sentences usually contain one or
more major intonational phrases (IPhs; Selkirk 1995) that can be
separated by speech pauses. Syntactically relevant speech
breaks are also referred to as intonational phrase boundaries
(IPBs). In studies using an event-related brain potential (ERP)
paradigm, IPBs were observed to give rise to a positive shift in
the electroencephalographic (EEG) signal that is referred to as
the closure positive shift or CPS component (Steinhauer et al.
1999). This component has been interpreted as being specifically related to the prosodic information contained in IPBs, as it
has been observed using sentence materials that lacked semantic and syntactic information, such as pseudoword sentences
(Pannekamp et al. 2005), filtered speech materials (Steinhauer
and Friederici 2001), or hummed speech (Pannekamp et al.
2005). The present functional magnetic resonance imaging
(fMRI) study attempts to identify the brain regions that are
involved in the processing of IPBs.
IPBs are often employed by the speaker to clarify the
structure of an otherwise syntactically ambiguous sentence. A
sentence like ‘‘Before Ben starts # the day dreaming has to stop’’
has a different meaning than ‘‘Before Ben starts the day #
dreaming has to stop’’ (# indicating a break). IPBs often separate major IPhs and correspond to major syntactic boundaries (Cooper and Paccia-Cooper 1981). They represent a high
level in the so-called prosodic hierarchy (Nespor and Vogel
1983; Selkirk 1995, 2000). In psycholinguistic experiments, IPBs
were shown to help resolve ambiguities related to late closure
ambiguities (Grabe et al. 1994; Schafer et al. 2000).
Experimental evidence suggests that humans can make use of the prosodic information contained in IPBs even in the absence of semantic or syntactic information. Behaviorally, it
has been shown that listeners are able to detect major prosodic boundaries in meaningless speech materials, such as reiterant speech (i.e., a sentence spoken as a string of repeated syllables while preserving the original prosodic contour) (de Rooij 1976), spectrally scrambled and low-pass-filtered speech (de Rooij 1975; Kreiman 1982), and hummed sentences (’t Hart and Cohen 1990). It should be noted that the
rationale for using stimuli of this kind is based on the assumption that speech is separable into layers, such as semantics,
syntax, and prosody. This assumption, however, may only be an approximation, as it has been argued that the possibility of separating these layers is inherently limited (Searle 1969; Austin 1975).
In studies using an ERP paradigm, IPBs were observed to give
rise to a positive shift in the EEG signal that is referred to as the
CPS component (Steinhauer et al. 1999). Subsequent studies
indicated that the CPS is specifically related to the prosodic
aspects of an IPB, as it was also observed for sentence materials
that were stripped of semantic information, such as pseudoword sentences, and for sentence materials with reduced
or absent segmental information, such as hummed sentences (Pannekamp et al. 2005), and filtered speech materials
(Steinhauer and Friederici 2001). The CPS component is typically distributed bilaterally with a central maximum, which shifts toward the right for hummed sentences. However, due to the
intrinsic difficulty of source localization in EEG studies, it is
unclear which brain regions generate the CPS component.
To our knowledge, only one imaging study so far investigated
the processing of IPBs in speech (Strelnikov et al. 2006).
Strelnikov et al. compared sentence materials in Russian that
contained an IPB (e.g., ‘‘To chop not # to saw,’’ meaning one
should not chop but saw) with sentences that did not contain an
IPB (‘‘Father bought him a coat’’). Comparing sentences with
IPB (segmented) to sentences without IPB (not segmented),
stronger activation was observed within the right posterior
prefrontal cortex and an area within the right cerebellum. In the
reverse comparison, stronger activation was observed in the
gyrus of Heschl, bilaterally, and the left sylvian sulcus. However,
the 2 types of sentence materials were used in 2 different tasks, thus confounding stimulus type and task. Although the comparison between segmented and nonsegmented speech materials very likely yielded brain areas that are relevant for prosody
processing, it cannot be excluded that differences due to the
tasks contributed to the results.
It should be noted that the term ‘‘activation’’ suggests an
absolute value, although it is, in the case of fMRI data, always
relative. This is due to 2 reasons. First, the statistical analysis only
evaluates differences between conditions. Second, it is generally
difficult to define an absolute baseline in brain activation for
physiological reasons (Stark and Squire 2001; Tomasi et al.
2006). In the present article, the term ‘‘activation’’ is used only
when an experimental condition is compared with a low-level
baseline. For comparisons between conditions, the term ‘‘activation difference’’ is used, or the term activation with an adjective indicating the direction of the difference (e.g., ‘‘stronger’’).
Ideally, a study investigating the processing of IPBs should
compare conditions that do not differ in any other respect than
the presence or absence of IPBs, keeping everything else
constant. In the electrophysiological studies reviewed above
(e.g., Steinhauer et al. 1999), the same task was used on sentence
materials that either had 1 or 2 IPBs. The sentence materials
used in the electrophysiological studies reviewed above were
carefully constructed as sentence pairs with the same or similar
words. It should be noted that the IPBs in these sentences were
obligatory, entailing differences with regard to the syntactic and
semantic structure between the 2 sentences of such a pair.
However, these differences only play a role when the sentence
materials are presented naturally spoken but not when their
segmental content is removed by filtering or humming.
Hummed speech has the advantage that it preserves the
prosody of natural speech while removing major aspects of
semantic and syntactic information of the utterance. Human
speakers have been shown to be able to selectively preserve the
original prosodic contour of an utterance. When asked to
produce reiterant speech (i.e., selectively preserving the prosodic contour of a meaningful utterance by repeating a syllable),
the resultant utterance preserved the prosodic aspects from
normal speech, such as duration and pitch (Larkey 1983), as
well as accentuation and boundary marking (Rilliard and
Aubergé 1998). When utterances are hummed, the hummed
version of an utterance has also been shown to preserve pitch
contour and duration of the natural spoken version (Pannekamp
et al. 2005). Most importantly, the CPS component Pannekamp
et al. observed at IPBs within hummed speech materials was
similar to the CPS observed for natural speech, but more
lateralized to the right hemisphere. Hummed speech has the
additional advantage that it is a familiar human vocalization.
Different from speech materials that are rendered unintelligible
artificially and sound more unfamiliar (Scott et al. 2000; Meyer
et al. 2004), the known unintelligibility of humming effectively
prevents participants from any attempt to decipher the original
speech content of the signal.
In the present fMRI study, sentence pairs containing 1 or 2
IPBs were presented auditorily to the participants as natural
speech and as hummed sentences. Similar to previous electrophysiological studies (Steinhauer et al. 1999; Isel et al. 2005;
Pannekamp et al. 2005), the sentence pairs were constructed
using the same or equivalent content words, except for 1 or 2
critical words. All sentences were meaningful, syntactically
correct, and spoken with natural prosody. To ensure variability
in the sentence materials used with regard to the position of the
additional IPB, 2 types of sentence pairs were constructed, one
type with the additional IPB at an early position within the
sentence (type A), the other type at a later position in the
sentence (see Table 1, for examples of the sentence materials).
The IPBs contained in the sentences were obligatory because materials of this kind had been investigated in previous electrophysiological studies. A CPS had been
observed for sentences of type A (Steinhauer et al. 1999;
Pannekamp et al. 2005) as well as for coordination structures
like type B (Steinhauer 2003). In the case of the naturally
spoken sentences with obligatory IPBs, observed activation
differences might in part be due to the additional semantic and
syntactic differences between the sentences, rather than being
solely due to the presence of IPBs. To differentiate the
processing of prosody from associated syntactic and semantic
processing, a hummed version of each sentence was also
produced by a trained speaker who was instructed to preserve
the natural prosody of the original sentence.
The aim of the present study was to identify the neural
structures involved in the processing of sentence-level prosody
by comparing sentences with 2 IPBs to sentences that have only
1 IPB. To differentiate prosodic processing from syntactic and
semantic processing, differences between sentences with
a different number of IPBs were investigated separately for
natural speech and hummed sentences.
With regard to the neural correlates of IPB processing, we
hypothesized that the primary auditory cortices and the
auditory association areas play an important role. These areas
are, among others, involved in the processing of complex
auditory signals and speech (see for a review, Griffiths et al.
2004). The superior temporal gyrus (STG) has been observed to
be involved in the processing of prosody (Doherty et al. 2004;
Hesling et al. 2005). Activation in regions outside the temporal
lobe has also been reported but appears to vary across different
studies. These activations were observed to depend on the
specifics of the task (Plante et al. 2002; Tong et al. 2005),
the type of prosody involved (e.g., affective vs. linguistic,
Wildgruber et al. 2004), and the degree of propositional
information contained in the stimulus materials (Gandour
et al. 2002, 2004; Hesling et al. 2005; Tong et al. 2005). It is
possible that areas outside the temporal lobe are also involved
and that the auditory association areas are only part of a more
extended processing network. IPBs are realized by variations in
the prosody of an utterance. We therefore hypothesized that
the STG, among others, will show a modulation, positive or
negative, due to the presence of an additional IPB.
Furthermore, given the observed shift of the CPS from
a bilateral distribution for natural speech to a more right-hemispheric distribution for hummed sentences, a more right-hemispheric lateralization for prosodic processing may be
observed. Evidence from patients with brain lesions inspired
a first raw hypothesis about prosody processing, namely, that
the right hemisphere plays a dominant role in prosody
processing. Patients with lesions within the left hemisphere
often suffer from aphasia, whereas nonaphasic patients with
right-hemispheric damage seem to have difficulties to perceive
or produce the prosodic aspects of speech (see, for a review,
Wong 2002; but see Perkins et al. 1996). On the basis of this first
raw hypothesis, 2 more sophisticated classes of hypotheses
have been developed. According to the acoustic or cue-dependent class of hypotheses, hemispheric dominance is
determined solely by the acoustic properties of the auditory
signal. Zatorre and Belin (2001), for example, suggested that the
left hemisphere is specialized in processing the high-frequency
components that generate the vowels and consonants in
speech, whereas the right hemisphere is specialized in processing the low-frequency patterns that make up the intonational
contour of a syllable or sentence (for a similar view, see Poeppel
2003). According to the class of functional or task-dependent
hypotheses, hemispheric dominance depends on the function
of prosodic information (van Lancker 1980) or on the attentional focus required, for example, by an experimental task. A
shift from the right hemisphere to the left is assumed to occur
when the task or function of the prosodic information contained in a speech signal involves language processing rather
than prosody processing. A review of the available empirical
studies suggests that lateralization is stimulus dependent but
can, in addition, vary as a function of task (Friederici and Alter
2004). It should be noted, however, that the first raw hypothesis
of right-hemispheric dominance of prosody is still under debate.
If prosodic processes are mainly subserved by the right hemisphere, we should find the right hemisphere activated for both natural and hummed speech; if, moreover, prosody is represented neuronally as a separate layer, a direct comparison between hummed and natural speech should result in more left-hemispheric activation for natural speech.
Methods
Participants
Sixteen healthy right-handed young adults (8 female; mean age:
26.1 years, standard deviation [SD] = 4.44) took part in the experiment.
The data of 2 participants were discarded from the analysis, one because
of scanner malfunction, the other because of too many errors. All
participants were or had been students of the University of Leipzig.
They were native speakers of German with normal hearing and had no
history of neurological or psychiatric illness. Volunteers were paid for
their cooperation and had given written consent before the experiment.
Ethical approval for the present study was provided by the University of Leipzig.
Materials
Forty-eight German sentence pairs were constructed, which either had
1 or 2 IPBs. Ideally, the materials used should only differ with regard to
the number of IPBs they contain. If possible, they should consist of the
same words to ensure similar lexical retrieval processes. They should
also contain the same number of words and syllables, so that they do not
differ in length. To ensure some variability in the sentence materials
used with regard to the position of the additional IPB, 2 types of
sentence pairs were constructed. One type (A) had an early additional
phrase boundary and one type (B) had an additional phrase boundary at
a later position in the sentence (see Table 1, for examples of the sentence materials). The IPBs contained in the sentences were obligatory
because such materials had been investigated in previous electrophysiological studies. Sentences of type A had been used in earlier electrophysiological studies where they were observed to elicit the CPS
component (Steinhauer et al. 1999). These materials were also found to
elicit the CPS component when they were low-pass filtered (Steinhauer
and Friederici 2001) or hummed (Pannekamp et al. 2005). In addition,
coordination structures like the type B sentences used here also have
been investigated in previous electrophysiological studies (Steinhauer
2003), and a CPS was observed. The pairs of 1 and 2 IPB sentences of
types A and B were as similar as possible with regard to the words used.
In type B, the same content words were used. Also in type A, the same
content words were used, with the exception of the verb, which was
either transitive or intransitive. To ensure comparability, the frequency
and number of syllables of the verb were matched within a type A
sentence pair. We constructed these sentence materials with the aim of creating conditions that do not differ in more than the aspect
under scrutiny, namely the number of IPBs. It should be noted that the
sentence materials constructed for this experiment come close to this
ideal but are not perfect. The type A (and type B) sentences with 1 or 2
IPBs additionally differ from each other with regard to their syntactic
structure and word order (type B). One way to circumvent these differences would have been to choose sentence materials with
optional IPBs rather than obligatory IPBs as in the materials used here.
Although obligatory IPBs correspond to major syntactic boundaries,
optional IPBs are on a lower level of the prosodic hierarchy, namely the
level of phonological phrases (Selkirk 1995). They do not necessarily
correspond to syntactic phrases (Truckenbrodt 2005). Furthermore,
optional IPBs depend on several factors such as speech rate and hesitations (filled or unfilled pauses), which might make them more difficult
to detect for the listener. Although it is highly probable that optional
IPBs also elicit a CPS component, this has not yet been investigated.
The choice of our materials was primarily motivated by the aim of ensuring comparability to previous electrophysiological studies where a CPS component was observed. Although not optimal, we think that our materials are sufficiently well matched to allow inferences about prosodic processing.
The sentences were spoken with natural prosody. Additionally,
a hummed version of each sentence was recorded. All materials were
recorded from a trained female native speaker of German. The speaker
was instructed to produce a hummed version of each sentence after its naturally spoken version, taking care to preserve the original prosody, speed, and total number of syllables of the normal sentence. In Figure 1, spectrograms and fundamental frequency contours are given for a type A and a type B example sentence (hummed and naturally spoken). The
recorded materials were digitized (44.1 kHz) and normalized (70%) with
regard to amplitude envelope to ensure an equal volume for all stimuli.
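As an illustration of this normalization step, here is a minimal sketch in Python, assuming that ‘‘normalized (70%) with regard to amplitude envelope’’ means scaling each recording so that its peak amplitude reaches 70% of full scale; the file names and the soundfile/numpy toolchain are illustrative, not the authors' actual processing pipeline.

```python
# Sketch: peak-normalize recordings to 70% of full scale (one assumed reading
# of "normalized (70%) with regard to amplitude envelope").
import numpy as np
import soundfile as sf  # pip install soundfile

def normalize_to_70_percent(in_path: str, out_path: str) -> None:
    data, rate = sf.read(in_path)      # float samples in [-1, 1]
    peak = np.max(np.abs(data))
    if peak > 0:
        data = data * (0.7 / peak)     # peak now sits at 70% of full scale
    sf.write(out_path, data, rate)

# Hypothetical file names; the stimuli were digitized at 44.1 kHz.
normalize_to_70_percent("sentence_A1_natural.wav", "sentence_A1_natural_norm.wav")
```

Peak normalization keeps the relative amplitude envelope of each utterance intact while equalizing the loudest point across stimuli, which matches the stated goal of an equal volume for all stimuli.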
Design and Procedure
The task of the participants was to indicate in a 2-alternative forced-choice task whether a probe word presented after the sentence had been contained in the sentence or not.

Table 1
Example of the stimulus materials and mean length of the sentences in seconds (naturally spoken and hummed)

Type A, 1 IPB: Peter verspricht Anna zu arbeiten # und das Büro zu putzen
(Peter promises Anna to work and to clean the office)
Natural: 4.42, Hummed: 4.64

Type A, 2 IPBs: Peter verspricht # Anna zu entlasten # und das Büro zu putzen
(Peter promises to support Anna and to clean the office)
Natural: 4.74, Hummed: 5.06

Type B, 1 IPB: Otto bringt Fleisch, # Ute und Georg kaufen Salat und Säfte für das Grillfest
(Otto contributes meat, Ute and Georg buy salad and soft drinks for the barbecue)
Natural: 5.73, Hummed: 5.76

Type B, 2 IPBs: Otto bringt Fleisch, # Ute kauft Salat # und Georg kauft Säfte für das Grillfest
(Otto contributes meat, Ute buys salad and Georg buys soft drinks for the barbecue)
Natural: 6.12, Hummed: 5.85

Figure 1. Spectrograms (0--5000 Hz) and fundamental frequency contours (pitch: 0--500 Hz) for sample stimuli of the sentence materials used for all conditions of the experiment. Natural speech is presented on the left, the respective hummed version of the sentence on the right. Spectrograms and pitch contours are given for each pair (1 and 2 IPBs) of sentence materials, for type A sentences in the top half, for type B sentences in the bottom half.

Figure 2. Trial timing scheme.

Probe words were chosen for
each sentence and belonged to different word classes (e.g., nouns, verbs,
determiners). The probes were spoken with final rising pitch, indicating
a question. Of the 96 sentences in total, 72 were selected as experimental materials, 18 as probe sentences, and 6 as practice sentences. In
each trial, participants had to decide whether the probe word had
occurred in the sentence or not. ‘‘Yes’’ (‘‘no’’) answers were given with
the middle (index) finger of the right hand. Sentences from the
experimental materials never contained the probe. The probe sentences
were not analyzed. In the case of the naturally spoken probe sentences,
the probe word was always contained in the sentence and a ‘‘yes’’
answer was expected. In the case of the hummed experimental sentence materials, the task was very easy. As soon as the hummed sentence
presentation finished, participants knew that ‘‘no’’ had to be the answer.
The main purpose of the task in the case of the hummed sentence
materials was to have participants attend to the presentation of the
sentence while it lasted. In the hummed probe sentences, one word of
the hummed sentence was naturally spoken. To prevent participants from suspending attention as soon as they detected a naturally
spoken word in the hummed probe sentence, the probe word was
identical to the naturally spoken word within the probe sentence in only
half of the trials. This ensured that participants had to wait until the
presentation of the probe word.
Naturally spoken and hummed sentence materials were presented in
alternating blocks. Every block consisted of 2 practice sentences, 24
experimental sentences, 6 probe sentences, and 6 null events (i.e., trials
in which no auditory stimulus was presented). The null events were
included to increase the efficiency of the design (Liu et al. 2001) and to
provide a baseline condition for analysis. The blocks of hummed
materials were of the same structure. The total experiment consisted
of 6 blocks (3 hummed, 3 normal), that is, 228 trials in total. Trials were
randomized with first-order transition probabilities between conditions
held constant. Three randomizations were generated in total. Each trial
began with the presentation of a hummed or normal sentence. Then
a probe word was presented 500 ms after the offset of the sentence. After
the presentation of the probe, a waiting time was inserted to ensure a total
intertrial interval (ITI) of 11 s (see Fig. 2). With a repetition time (TR) of 3
s and an ITI of 11 s, trial presentation and scanner triggering were
synchronous every 3 trials. To ensure synchronization, the beginning of
every third trial was synchronized using the scanner trigger. After
synchronization, a random waiting time of 0--600 ms was inserted before
the presentation of the sentence. Stimulus presentation and response
time recording were controlled by a computer outside the scanner using
Presentation software (Neurobehavioral Systems Inc., Albany, CA). The
experimental materials were presented over earphones within the
scanner. The participants were additionally protected from scanner noise
by earplugs. They reported that they had no difficulties understanding the
sentences over the scanner noise. A button box compatible with the
scanner environment was used to record the responses of the participants.
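To make the timing arithmetic concrete, the sketch below schedules the onsets of one block under the constraints stated above (ITI of 11 s, TR of 3 s, re-alignment to the scanner trigger every third trial, 0--600 ms random waiting time). The trigger-waiting step is a stand-in for whatever the Presentation script actually did; this illustrates the scheduling logic, not the authors' code.

```python
# Sketch: schedule stimulus onsets for one block. With TR = 3 s and ITI = 11 s,
# 3 trials (33 s) span exactly 11 TRs, so every third trial can be locked to a
# scanner trigger; the 0-600 ms random delay then decorrelates sentence onset
# from slice acquisition.
import math
import random

TR = 3.0       # repetition time (s)
ITI = 11.0     # intertrial interval (s)
N_TRIALS = 38  # 2 practice + 24 experimental + 6 probe + 6 null events

def schedule_block(seed: int = 0) -> list:
    rng = random.Random(seed)
    onsets = []
    t = 0.0
    for trial in range(N_TRIALS):
        if trial % 3 == 0:
            # Stand-in for waiting on the next scanner trigger,
            # which fires on multiples of TR.
            t = TR * math.ceil(t / TR - 1e-9)
        onsets.append(t + rng.uniform(0.0, 0.6))  # random 0-600 ms delay
        t += ITI
    return onsets

print(["%.2f s" % s for s in schedule_block()[:6]])
```

Because the jitter is added to the stimulus onset but not to the running clock, the trigger-locked grid is preserved across the whole block while each sentence still starts at a slightly different phase of the acquisition.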
fMRI Data Acquisition
All imaging data were acquired using a 3T Bruker 30/100 Medspec MR
scanner. A high-resolution structural scan using the 3-dimensional
modified driven equilibrium Fourier transform imaging technique was
also obtained (128 sagittal slices, 1.5 mm thickness, in-plane resolution:
0.98 × 0.98 mm, field of view [FOV]: 250 mm). Functional images were
acquired using a T2* gradient echo-planar imaging sequence. For the
functional measurement, a noise-reduced sequence was chosen to
ensure that noise would not impair comprehension. Eighteen ascending
axial slices per volume were continuously obtained, in the plane of the
anterior and posterior commissure (in-plane resolution of 3 × 3 mm,
FOV: 192 mm, 5 mm thickness, 1 mm gap, echo time: 40 ms, TR: 3000
ms). The 18 slices covered the temporal lobes and part of the parietal
lobes and cerebellum. According to a short questionnaire given to the
participants after the experiment, the noise level of the scanner did not
critically impair the auditory quality and comprehensibility of the
naturally spoken or the hummed stimulus materials.
Functional Data Analysis
fMRI data analysis was performed with statistical parametric mapping
(SPM2) software (Wellcome Department of Cognitive Neurology,
London, UK). The 6 blocks were measured as separate runs with 142
volumes each. The first 2 images of each functional run were discarded to
ensure signal stabilization. The functional data of each participant were
motion corrected. The structural image of each participant was registered to the time series of functional images and normalized using the T1
template provided by SPM2, corresponding approximately to Talairach
and Tournoux space (Talairach and Tournoux 1988). The functional
images were normalized using the normalization parameters of the
structural image and then smoothed with a full-width half-maximum
Gaussian kernel of 12 mm. A statistical analysis on the basis of the general
linear model was performed, as implemented in SPM2. Though hummed
and normal sentence materials were presented in blocks, an event-related
analysis was chosen. This made it possible to compare the sentences to
the null events interspersed within the blocks as a baseline, as well as to
discard probe trials and error trials from the analysis. The delta function of
the trial onsets per condition was convolved with the canonical form of
the hemodynamic response function as given in SPM2 and its first and
second temporal derivative to generate model time courses for the
different conditions. Each trial was modeled in SPM2 using the beginning
of the sentence as trial onset. Due to the length of the auditorily
presented sentence stimuli, the blood oxygen level--dependent response
was modeled with a duration of 5.345 s (average length of the sentences,
hummed and natural). Errors and probe sentences were modeled as
separate conditions and excluded from the contrasts calculated. The
functional time series was high-pass filtered with a frequency cutoff of 1/80 Hz. No global normalization was used. Motion parameters and the
lengths of the individual sentences per trial were entered into the analysis
as parameters of no interest. For the random-effects analysis, the
calculated contrast images from the first-level analysis of each participant
were entered into a second-level analysis.
Region of interest (ROI) analysis: 8 ROIs were created on the basis of the automated anatomical
labeling (AAL) atlas (Tzourio-Mazoyer et al. 2002): STG, the rolandic
operculum, the gyrus of Heschl, and the inferior frontal gyrus (pars
triangularis and pars opercularis). The ROI for the STG was divided into 3
parts, anterior (y > –15), middle (–35 < y < –15), and posterior (y < –35),
and the rolandic operculum into 2 parts, anterior (y > –15) and posterior
(y < –15). A visualization of the temporal ROIs is given in Figure 3. The
ROI analysis was performed by averaging the effect sizes (contrast
estimates) over all voxels for the left and right ROIs per participant.
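As an illustration of this averaging step, here is a minimal sketch in Python using nibabel, assuming the AAL atlas and a participant's contrast image share one space; the file names and the AAL label value are placeholders rather than the authors' actual setup. The anterior/middle/posterior split uses the y-coordinate cutoffs given above.

```python
# Sketch: average contrast estimates over an AAL-defined ROI, with the STG
# split by y coordinate as in the text (anterior y > -15, middle
# -35 < y < -15, posterior y < -35). Label value and paths are placeholders.
import numpy as np
import nibabel as nib

AAL_STG_LEFT = 81  # placeholder label value; check your AAL lookup table

atlas = nib.load("aal.nii")        # AAL atlas resampled to the data space
con = nib.load("con_0001.nii")     # participant's contrast image
labels = atlas.get_fdata()
values = con.get_fdata()

# World-space y coordinate of every voxel, computed via the atlas affine.
ijk = np.array(np.meshgrid(*map(np.arange, labels.shape), indexing="ij"))
xyz = nib.affines.apply_affine(atlas.affine, ijk.reshape(3, -1).T)
y = xyz[:, 1].reshape(labels.shape)

stg = labels == AAL_STG_LEFT
for name, part in [("anterior",  stg & (y > -15)),
                   ("middle",    stg & (y > -35) & (y < -15)),
                   ("posterior", stg & (y < -35))]:
    print(name, np.nanmean(values[part]))  # mean effect size over the ROI
```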
Figure 3. A visualization of the 6 temporal ROIs as an ascending series of axial slices (shown only for the left hemisphere). ROIs: anterior (y > –15, light blue), middle (–35 < y < –15, yellow), and posterior (y < –35, brown) part of the STG; gyrus of Heschl (dark blue); anterior (y > –15, medium blue) and posterior (y < –15, orange) part of the rolandic operculum.
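Returning to the first-level model described in the preceding section, the sketch below builds one condition's regressors from scratch: a 5.345 s boxcar at each trial onset, convolved with a canonical double-gamma HRF and its first temporal derivative. The gamma parameters are commonly used SPM-style defaults assumed here, not values taken from the paper, and the onsets are hypothetical; a real analysis would use SPM2's own basis set.

```python
# Sketch: canonical HRF regressor plus temporal derivative for one condition.
import numpy as np
from scipy.stats import gamma

TR = 3.0
N_SCANS = 140   # 142 volumes per run minus the 2 discarded
DT = 0.1        # high-resolution time grid (s)

def canonical_hrf(t):
    # Double-gamma HRF (peak ~6 s, undershoot ~16 s); assumed SPM-like
    # defaults, not parameters reported in the study.
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def make_regressor(onsets, duration=5.345):
    """Boxcar at each onset (duration = mean sentence length), convolved
    with the canonical HRF, then downsampled to the TR grid."""
    t_hi = np.arange(0, N_SCANS * TR, DT)
    boxcar = np.zeros_like(t_hi)
    for onset in onsets:
        boxcar[(t_hi >= onset) & (t_hi < onset + duration)] = 1.0
    hrf = canonical_hrf(np.arange(0, 32, DT))
    conv = np.convolve(boxcar, hrf)[: len(t_hi)] * DT
    deriv = np.gradient(conv, DT)  # first temporal derivative
    idx = (np.arange(N_SCANS) * TR / DT).astype(int)
    return conv[idx], deriv[idx]

# Hypothetical onsets (s) for a few trials of one condition:
reg, reg_dt = make_regressor([11.3, 44.2, 77.5, 110.1])
```

Stacking such regressor pairs per condition, together with the motion parameters and sentence lengths as nuisance columns, yields the kind of design matrix the general linear model analysis operates on.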
Results
Behavioral
Participants were excluded from further analysis if they made
more than 30% errors in any condition. This led to the exclusion
of one participant in addition to the one participant who was
excluded due to scanner malfunction. Analyzing the data of the
remaining 14 participants yielded 144 errors (5.7%) in total.
Response times were measured from the presentation of the
probe word. Only correct responses to experimental trials, not
probe trials, within the interval of 200--2000 ms were analyzed.
Response times were entered into a repeated-measures analysis
of variance (ANOVA) with speech type (natural vs. hummed)
and IPB (1 vs. 2 IPBs) as factors. Participants reacted significantly
faster to the hummed (473 ms, SD = 46.05) than to the naturally
spoken sentence materials (753 ms, SD = 85.36), yielding a significant main effect of speech type (F(1,13) = 30.64, mean squared error [MSE] = 35691, P < 0.001). The number of IPBs did not influence response times; the main effect of IPB and the interaction speech type × IPB were not significant.
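A sketch of this response-time analysis, assuming a long-format table of trials with illustrative column names. For a 2-level within-participant factor, the repeated-measures F equals the squared paired t, so a paired t-test on per-participant condition means reproduces the logic of the reported main effect of speech type.

```python
# Sketch: keep correct experimental trials with RTs in 200-2000 ms, then test
# the speech-type effect with a paired t-test (for a 2-level within factor,
# F(1, n-1) = t^2). File and column names are hypothetical.
import pandas as pd
from scipy.stats import ttest_rel

trials = pd.read_csv("behavior.csv")  # columns: subject, speech, rt, correct, probe
ok = trials[(trials.correct == 1) & (~trials.probe)
            & trials.rt.between(200, 2000)]

# Per-participant mean RT for natural vs. hummed materials.
means = ok.groupby(["subject", "speech"]).rt.mean().unstack()
t, p = ttest_rel(means["natural"], means["hummed"])
print(f"t = {t:.2f}, F = t^2 = {t**2:.2f}, p = {p:.4f}")
```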
fMRI Data: Whole-Brain Analysis
Comparisons of Sentence Materials with a Different
Number of IPBs
For natural speech, sentences with 2 IPBs activated the STG,
extending to the rolandic operculum and gyrus of Heschl,
bilaterally, more strongly than sentences with 1 IPB (Fig. 4a and
Table 2). In the corresponding comparison for hummed
sentences, a significant cluster was observed in the left supramarginal gyrus, extending to the left STG and the left gyrus of
Heschl. No significant clusters were observed in the reverse
comparisons, for natural speech as well as for hummed
sentences.
Comparisons of the 2 Basic Types of Materials (hummed
sentences and natural speech) to Baseline
When natural speech was compared with baseline (null events),
the strongest activations were observed within the STG, bilaterally, and the supplemental motor area (SMA) (Table 2).
Further activations were observed within the right precentral
gyrus, the right insula, the left superior parietal lobule, and in the
cerebellum. When hummed sentences were compared with
baseline, activations were observed within the STG, bilaterally,
the precentral gyrus, bilaterally, the SMA, and the right middle
frontal gyrus, as well as the left inferior parietal lobule. Activations were also observed within the cerebellum, bilaterally.
Comparisons of Hummed Sentences to Natural Speech
Compared with hummed sentences, natural speech activated
the frontal gyrus and middle temporal gyrus, bilaterally, but with
a left-hemispheric predominance, as well as the left angular
gyrus and the left thalamus and caudate, more strongly (Fig. 4b
and Table 2). No brain area was significantly more activated for
hummed sentences than for natural speech.

Figure 4. (a) Comparison of sentences with 2 IPBs to sentences with 1 IPB: 2 IPBs > 1 IPB. Natural speech (left) and hummed sentences (right). (b) Comparison of speech types: natural speech > hummed sentences. Threshold: P < 0.001, uncorrected, showing only clusters with more than 40 voxels, corresponding to P < 0.05 corrected at the cluster level. Left is left in the image.
fMRI Data: ROI Analysis
To further investigate and compare the behavior of the brain
areas potentially involved in IPB processing, we conducted an
ROI analysis over 8 ROIs (pars opercularis and triangularis of the
inferior frontal gyrus, anterior, middle and posterior part of the
STG, gyrus of Heschl, and anterior and posterior rolandic
operculum). The effect sizes for each condition were averaged
over all voxels contained within each ROI per side and
participant. The results per ROI, hemisphere, and condition
are shown in Figure 5. They were entered into a repeated-measures ANOVA with ROI (8 levels), hemisphere (left vs. right), IPB (1 vs. 2 IPBs), and speech type (naturally spoken vs. hummed) as factors. The left hemisphere was on average more strongly activated than the right hemisphere, which is reflected in a significant main effect of hemisphere (F(1,13) = 49.53, MSE = 2.279, P < 0.001). Natural speech activated the brain areas investigated more strongly than hummed speech, yielding a significant main effect of speech type (F(1,13) = 11.93, MSE = 4.415, P < 0.01). More activation was observed for sentences containing 2 IPBs than for sentences containing 1 IPB, giving a main effect of IPB (F(1,13) = 17.45, MSE = 0.385, P < 0.01). The different overall activation levels within the ROIs yielded a significant main effect of ROI (F(7,91) = 72.71, MSE = 1.435, P < 0.001). Speech type and IPB yielded additive effects, as none of the interactions containing both factors reached significance. The type of speech materials (hummed or natural) influenced lateralization, yielding a significant speech type × hemisphere interaction (F(1,13) = 30.59, MSE = 0.419, P < 0.001) and depended
on the ROI, giving a significant triple ROI × speech type × hemisphere interaction (F(7,91) = 3.92, MSE = 0.103, P < 0.001). The lateralization, type of materials, and number of IPBs influenced activation differently in different ROIs, giving the significant 2-way interactions ROI × hemisphere (F(7,91) = 19.83, MSE = 0.437, P < 0.001), ROI × speech type (F(7,91) = 6.40, MSE = 0.175, P < 0.001), and ROI × IPB (F(7,91) = 16.00, MSE = 0.018, P < 0.001). The significant triple interactions between ROI × hemisphere × IPB (F(7,91) = 6.14, MSE = 0.007, P < 0.001) and ROI × hemisphere × speech type (F(7,91) = 3.92, MSE = 0.103, P < 0.001) indicate that speech type and IPB each interacted with hemisphere differently in different ROIs. The remaining interactions were not significant.

Table 2
Brain regions showing significant activation differences per contrast. Contrasts are thresholded at P < 0.001 uncorrected and P < 0.05 corrected at the cluster level. [Peak coordinates (x, y, z), cluster sizes (k), and Z values of the original table are not reproduced here.]

Natural speech > baseline -- Frontal: SMA; precentral gyrus (right); insula (right). Temporal: STG (left and right). Parietal: superior parietal lobule (left). Cerebellum: Crus1 (left and right).
Hummed sentences > baseline -- Frontal: SMA; precentral gyrus (left and right); middle frontal gyrus (right). Temporal: STG (left and right). Parietal: inferior parietal lobule (left). Cerebellum: Crus1 (left and right), lobule VI (right).
Natural > hummed sentences (b) -- Frontal: IFG, pars triangularis and pars opercularis (left); IFG, pars triangularis (right); MFG (right); SMA. Temporal: temporal pole, MTG, STG (left); MTG, STG (right). Parietal: angular gyrus, SPL (left). Basal ganglia: thalamus, caudate (left).
Hummed > natural speech (b) -- NS.
Natural speech, 2 pauses > 1 pause -- Temporoparietal: gyrus of Heschl, STG, rolandic operculum (left); STG, rolandic operculum, gyrus of Heschl (right).
Natural speech, 1 pause > 2 pauses -- NS.
Hummed sentences, 2 pauses > 1 pause -- Temporoparietal: supramarginal gyrus, STG, gyrus of Heschl (left).
Hummed sentences, 1 pause > 2 pauses -- NS.

Note: IFG, inferior frontal gyrus; MFG, middle frontal gyrus; MTG, middle temporal gyrus; NS, not significant; SMA, supplemental motor area; SPL, superior parietal lobule; STG, superior temporal gyrus. Coordinates were reported as given by SPM2, corresponding only approximately to Talairach--Tournoux space (Talairach and Tournoux 1988; Brett et al. 2001). Anatomical labels are given on the basis of the classification of the AAL (automated anatomical labeling) atlas (Tzourio-Mazoyer et al. 2002); cerebellar labels within the AAL atlas are based on Schmahmann et al. (1999). The first label of a cluster denotes the location of the maximum; the following labels denote further areas containing a majority of voxels of the activated cluster. (b) Masked with natural speech > baseline (hummed sentences > baseline) at P < 0.05, inclusive.
To identify these differences between ROIs more closely we
conducted additional ANOVAs separately for different ROIs.
Significant main effects for IPB were observed in all ROIs except
for the inferior frontal gyrus ROIs. The main effect speech type
was significant in all ROIs but the anterior and posterior
rolandic operculum. Significant main effects for hemisphere
were observed in all ROIs with the exception of the anterior
STG and the anterior rolandic operculum. The hemisphere × speech type interaction was significant in all but 3 ROIs: the anterior and posterior rolandic operculum and the gyrus of Heschl. A hemisphere × IPB interaction was observed in the posterior STG and the gyrus of Heschl. No ROI showed a significant speech type × IPB interaction. The triple interaction hemisphere × speech type × IPB was significant only in the anterior rolandic operculum.
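The omnibus and per-ROI ANOVAs can be reproduced with standard software once the averaged effect sizes are in long format. A sketch using statsmodels' AnovaRM, assuming one row per participant × ROI × hemisphere × speech type × IPB cell; the file, column, and ROI names are illustrative.

```python
# Sketch: repeated-measures ANOVA over ROI effect sizes with four within-
# participant factors (ROI, hemisphere, speech type, IPB), mirroring the
# omnibus analysis; a per-ROI follow-up just subsets the data before refitting.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = pd.read_csv("roi_effect_sizes.csv")
# expected columns: subject, roi, hemisphere, speech, ipb, effect

omnibus = AnovaRM(data, depvar="effect", subject="subject",
                  within=["roi", "hemisphere", "speech", "ipb"]).fit()
print(omnibus)

# Follow-up ANOVA within a single ROI (e.g., the posterior rolandic operculum):
post_rop = data[data.roi == "post_RolOp"]
print(AnovaRM(post_rop, depvar="effect", subject="subject",
              within=["hemisphere", "speech", "ipb"]).fit())
```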
To investigate differences with regard to lateralization for the
2 types of material in every ROI more closely, Newman--Keuls
post hoc tests were calculated for the interaction hemisphere × speech type. A stronger activation within the left hemisphere compared with the right was observed for natural speech in nearly all ROIs, except the anterior rolandic operculum and the anterior part of the STG. For hummed speech, a left-hemispheric dominance was observed in only 4 ROIs: the gyrus
of Heschl, the middle and posterior part of the STG, and the
posterior rolandic operculum. No ROI showed a significantly
stronger activation of the right hemisphere for hummed
sentences compared with natural speech.
Discussion
The aim of the present study was to identify the brain
areas involved in the processing of sentence-level prosody by
investigating the processing of IPBs. We will first discuss our
results with regard to differences in the processing of sentences
Figure 5. Results of the ROI analysis (bars with black stripes = left hemisphere, white bars = right hemisphere). Effect sizes were averaged over all voxels of each ROI. Error bars represent the standard error of the mean. n1, natural speech with 1 IPB; n2, natural speech with 2 IPBs; h1, hummed sentences with 1 IPB; h2, hummed sentences with 2 IPBs; IFG oper, inferior frontal gyrus, pars opercularis; IFG tri, inferior frontal gyrus, pars triangularis; ant/post RolOp, anterior/posterior part of the rolandic operculum; ant/mid/post STG, anterior/middle/posterior part of the STG; Heschl, gyrus of Heschl.
with 2 IPBs as compared with sentences with 1 IPB and later
turn to the general differences between natural and hummed
speech.
Brain Areas Involved in IPB Processing
In the whole-brain analysis, stronger activation was observed for
sentences with 2 IPBs than sentences with 1 IPB within the left
STG, for natural speech as well as for hummed sentences. An
additional focus within the STG on the right side was significant
for natural speech but failed to reach significance for hummed
speech. This indicates that processing an additional IPB activates the STG similarly for natural speech and hummed
sentences.
In the ROI analysis (pars opercularis and triangularis of the
inferior frontal gyrus, anterior, middle and posterior part of the
STG, gyrus of Heschl, and anterior and posterior rolandic
operculum), more activation for 2 than for 1 IPB was observed
only in the temporal ROIs, but not within the inferior frontal
gyrus. This indicates that prosody processing mainly involves
brain areas related to auditory processing. In addition, all but
one ROI, the posterior rolandic operculum, showed a significant main effect of or an interaction with speech type (natural
speech or hummed sentences). This result could be taken to
indicate that these regions are involved in the processing of the
more complex segmental information contained in natural
speech. The posterior rolandic operculum (bilaterally), on the
other hand, did not show stronger activation for natural speech
compared with hummed sentences: there was no main effect or
any interaction with speech type. This suggests that this region
might play a less pronounced role in the processing of the
specific spectral composition and the additional linguistic
information (semantics, syntax) contained in natural speech.
As this region, however, was more strongly activated by
materials containing 2 IPBs rather than one (main effect of
IPB), it could be speculated that this region might be specifically
involved in the processing of the prosodic information.
Interestingly, there was no interaction of IPB and speech type
in any of the ROIs investigated. This indicates that the activation
elicited by the presence of an additional IPB in the ROIs
investigated did not depend on the comprehensibility of the
utterance. Although IPBs can aid the understanding of an
utterance, we did not observe any evidence for an interaction
in the respective ROIs, not even within the left inferior frontal
gyrus, a region assumed to be involved in syntactic processing.
This finding should not lead to the conclusion, however, that
prosody and syntax do not interact. It is possible that syntax and
prosody are processed independently from each other in
different brain regions and that the interaction between both
types of information occurs in higher associative areas within
the brain. This is also suggested by a recent ERP study with
patients suffering from lesions in the corpus callosum (Friederici
et al. 2007).
So far, only one neuroimaging study has investigated the perception of IPBs (Strelnikov et al. 2006). In the condition with an IPB,
the IPB could be in one of 2 positions, changing the meaning of
the sentence. ‘‘To chop not # to saw’’ (‘‘Rubit nelzya, pilit’’)
means: ‘‘not to chop but to saw,’’ whereas ‘‘To chop # not to
saw’’ (‘‘Rubit, nelzya pilit’’) means ‘‘to chop and not to saw.’’ This
is a possible construction in Russian because, different from
Germanic languages such as German, Dutch, or English, the
negative word ‘‘not’’ may have scope to its left or right depending on the position of an IPB. In this condition, participants were
first presented with one of the 2 versions and had to select the
appropriate alternative (one needs to: chop/saw). In the
condition without IPB, participants were first presented with
a simple statement (‘‘Father bought him a coat’’) and then had to
select the appropriate alternative (‘‘His father bought him:
a coat/a watch’’). Strelnikov et al. reported stronger activation
for sentences containing an IPB (e.g., ‘‘To chop not # to saw,’’ or
‘‘To chop # not to saw’’) than for sentences without an IPB (e.g.,
‘‘Father bought him a coat’’) within the right posterior prefrontal cortex and an area within the right cerebellum. Different
from our results, Strelnikov et al. did not observe stronger
activation within the STG. However, as already outlined in the
introduction, the task used was different for the 2 sentence
types. Although the task consisted in both cases of a visually
presented question with 2 response alternatives (‘‘Father
bought him: a coat/a watch’’ and ‘‘One needs to: saw/chop,’’
for the 2 examples, respectively), the first type of materials
(sentences without IPBs) required a semantic judgement based
on the segmental content of the utterance, whereas the second
type of materials (sentences with an IPB) required a semantic
judgement based on the segmental content as well as the
prosodic information (IPB position) of the utterance. It is
therefore possible that the differences observed by Strelnikov
et al. (2006) might not only be due to the presence or absence
of IPBs in the materials but also to task variables.
To summarize our results so far, we observed a stronger
activation within the STG for sentences containing 2 IPBs
compared with sentences containing 1 IPB. The activation
extended to the gyrus of Heschl and the rolandic operculum.
To find out whether one of these areas might show a specialization with regard to prosody processing as compared with the
processing of the segmental content of speech, we also conducted an ROI analysis. In the ROI analysis, the posterior part of
the rolandic operculum was the only brain region that showed
a modulation of activation due to the presence of an additional
IPB independent of the amount of segmental information
contained in natural speech as compared with hummed
sentences.
The Processing of Hummed Sentences Compared with
Natural Speech
Hummed sentences also represent a human vocalization and
a complex auditory signal. Comparing the processing of natural
speech with hummed sentences can therefore yield information with regard to the brain areas processing the segmental
content of speech as compared with brain areas processing the
more basic aspects of speech. Comparing the processing of
hummed sentences to baseline gives an initial overview of the
brain areas involved in the processing of hummed speech, as
well as, potentially, in the processing of prosodic aspects of
speech. The strongest activations were observed within the
STG, extending to the rolandic operculum and the gyrus of
Heschl, areas involved in auditory processing. Strong activations
were also observed within the SMA, the right precentral gyrus,
and the cerebellum. These activations are most likely due to the
manual response required by the task. On the basis of the logic
of cognitive subtraction, subtracting hummed sentences from
natural speech should yield brain areas that are involved in the
processing of segmental information, lexico-semantics, and
syntax. Stronger activations for natural speech than for hummed
sentences were observed in the middle temporal gyrus, bilaterally, and within the left opercular and triangular part of
the inferior frontal gyrus, corresponding approximately to
Brodmann area 44 and 45. These frontotemporal activations
are related to the processing of syntactic and semantic information contained in natural speech. Similar activations have
been reported in other studies comparing natural speech to
unintelligible speech (Spitsyna et al. 2006). Natural speech also
activated the STG more strongly than hummed speech, possibly
reflecting the involvement of the auditory cortex in processing
the more complex spectral components carrying the segmental
information of natural speech. It should be noted that the
probe-monitoring task used here might have had some influence on the brain activation patterns observed. Our task
induced an attentional focus on lexico-semantic processing
rather than pure prosody processing that might also have
influenced the activations we observed in temporal areas for
hummed speech. In the following, we will further discuss our
findings on the background of the organization of the auditory
system with regard to its relation to speech and prosody
processing.
The Brain Areas Involved in the Processing of Complex
Sounds and Speech
The auditory association cortex seems to represent mainly
spectral and temporal properties of auditory stimuli rather than
more abstract high-level properties such as auditory objects
(see, for reviews, Griffiths and Giraud 2004; Griffiths et al. 2004).
The auditory association areas are assumed to be processing
more abstract properties than primary auditory cortex. Some of
these properties are assumed to be processed in separate
regions. Rauschecker and Tian (2000) proposed that anterior
regions subserve auditory object identification, whereas posterior regions process spatial information, similar to the ventral
(‘‘what’’) and dorsal (‘‘where’’) pathways of the visual system
(Ungerleider and Mishkin 1982). Although such an anterior-posterior distinction is supported in part by single-cell recordings in monkeys (Recanzone 2000; Tian et al. 2001) as well as by
lesion (Clarke et al. 2000; Adriani et al. 2003) and brain-imaging
studies (Maeder et al. 2001), the distinction is not clear-cut. The
functional nature of this anterior--posterior division is therefore
still under debate (Zatorre et al. 2002).
With regard to the processing of speech, the dual pathway
model (Rauschecker and Tian 2000) suggests that speech should
activate regions anterior to the primary auditory cortex as it is
a familiar and highly complex auditory signal with no relation to
spatial information. When intelligible speech is compared with
nonspeech, stronger left-lateralized activations obtain in anterior
and middle parts of the superior temporal sulcus, for single
words as well as for sentences (Binder et al. 2000; Scott et al. 2000;
Narain et al. 2003; Meyer et al. 2004). However, often also
activation within the posterior temporal and inferior parietal
regions is observed in imaging studies (Meyer et al. 2002;
Spitsyna et al. 2006). Although not directly derivable from the
dual pathway model, this finding is not altogether surprising. It
has long been known from lesion studies that posterior temporal
and inferior parietal regions are critical for supramodal language
comprehension (Geschwind 1965).
The stronger activations for natural speech than for hummed
sentences observed in the present study within the STG are
compatible with previous results. We observed stronger activation for natural than for hummed speech in middle and
posterior parts of the STG and the posterior part of the rolandic
operculum. With regard to the lack of activation in anterior
parts of the superior temporal lobe, it could be speculated that
hummed speech is recognized as a familiar human vocalization.
This familiarity might explain differences to results from studies
using speech materials that are rendered unintelligible artificially and sound more unfamiliar (Scott et al. 2000; Meyer et al.
2004).
It should be noted, however, that the activations observed
within the temporal cortex for speech may not be specific for
speech. Similar activations have been observed for other
complex auditory stimuli such as musical sounds and melodies
(Griffiths et al. 1998; Patterson et al. 2002; see, for a review,
Price et al. 2005). It is possible that differences between speech
and other complex auditory stimuli are subtle and go beyond
simple localization. A recent study by Tervaniemi et al. (2006),
for example, showed that the superior temporal region reacts
differently to changes in pitch or duration for speech syllables
than for musical sounds.
The Brain Areas Involved in the Perception of Prosody
Activations within the STGs have been regularly observed in
imaging studies when speech with or without prosodic modulation is presented and compared with baseline (e.g., Gandour
et al. 2003, 2004; Wildgruber et al. 2004; Hesling et al. 2005).
Regions outside the temporal lobe have also been observed to
show a modulation of activation dependent on prosodic properties but appear to vary considerably across different studies.
Prosody-related activations were observed to depend on the
specifics of the task (Plante et al. 2002; Tong et al. 2005),
the type of prosody involved (e.g., affective vs. linguistic,
Wildgruber et al. 2004), and the degree of propositional information contained in the stimulus materials (Gandour et al.
2003, 2004; Hesling et al. 2005; Tong et al. 2005). Although the
superior temporal region seems to be the main candidate for the
perception of prosodic modulation in speech, other areas
outside the temporal lobe are likely to be involved, and the
auditory association areas are likely to be only part of a more
extended processing network for prosodic aspects of speech.
Imaging studies investigating the processing of sentence-level
prosody often compared natural speech to speech stimuli either
with no segmental information or with no or little prosodic
information. One possibility is to remove the segmental information by filtering, preserving the intonational contour of
the sentence (Meyer et al. 2002, 2004). The results from these
studies point toward a frontotemporal network with a right-hemispheric dominance. The rolandic operculum, in particular
in the right hemisphere, has been identified as part of this
network. Another approach is to remove the intonational contour of sentences, for example, by high-pass filtering (Hermann
et al. 2003; Meyer et al. 2004) or to reduce prosody information
by speaking with little prosodic expressiveness (Hesling et al.
2005). The rationale for this approach is to identify prosody-related activations by subtracting low-prosody speech from
normal speech, thus ideally subtracting away activations related
to the processing of the segmental content of speech while
leaving prosody-related activations. The results from these
studies, however, are mixed. Meyer et al. (2004) did not observe
stronger activations for normal speech than for low-prosody
speech. In this study, the task required a prosody comparison
between 2 successive sentence stimuli in an experimental
setting in which flattened speech (no prosodic information) and
degraded speech (no segmental information) were presented in
a pseudorandomized order. Hesling et al. (2005) observed
different results for intelligible and nonintelligible (i.e., low-pass filtered) speech. In the case of high-prosody intelligible
speech, the right STG was more strongly activated than in the
case of low-prosody intelligible speech. High-prosody nonintelligible speech did not activate any brain regions more
strongly than low-prosody nonintelligible speech. These results
indicate that the additional prosodic information contained in
high-prosodic speech activates the STG, although not very
strongly. Finally, a study by Doherty et al. (2004) compared
sentences with a rising pitch (question) to sentences with
a falling pitch (statement). Although their results might not
exclusively reflect prosody processing, they observed, among other effects, a stronger activation within the STG, bilaterally, for
sentences with a rising pitch. These findings also indicate that
the STG might play a dominant role in the first stages of prosody
processing. The results of the present study, namely, that areas within and around the STG are involved in IPB processing, therefore agree well with previous findings.
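As an illustration of the low-pass filtering approach mentioned above, the following minimal sketch (in Python with scipy; the file name and the 400-Hz cutoff are illustrative assumptions, not the exact parameters of the cited studies) removes most segmental (formant) information from a recorded sentence while largely preserving its slow pitch and intensity contour:

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

# Read a mono 16-bit recording of a spoken sentence (hypothetical file).
fs, x = wavfile.read("sentence.wav")
x = x.astype(np.float64)

# 4th-order Butterworth low-pass at 400 Hz: formant cues are attenuated,
# while the fundamental frequency contour below the cutoff is retained.
sos = butter(4, 400.0, btype="lowpass", fs=fs, output="sos")
y = sosfiltfilt(sos, x)  # zero-phase filtering, introduces no phase delay

# Renormalize and write out the prosody-only stimulus.
y /= np.max(np.abs(y))
wavfile.write("sentence_lowpass.wav", fs, (y * 32767).astype(np.int16))

Listeners typically cannot recover the words from such a stimulus, but the intonational contour, including IPBs, remains audible.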
The Lateralization of Prosody Processing
Although a right-hemispheric dominance in prosody processing
was initially suggested based on evidence from patients, the
results with regard to the perception of sentence-level nonaffective prosody present a mixed picture. Some studies show
greater impairment in patients with right hemisphere damage than in patients with left hemisphere lesions and in controls (Bryan 1989), whereas others find
greater impairment in patients with left hemisphere damage
(Perkins et al. 1996; Pell and Baum 1997) or a more complicated
pattern of preserved function alongside impairments for both
patient groups compared with controls (Baum et al. 1997;
Imaizumi et al. 1998; Walker et al. 2001; Baum and Dwivedi
2003). A possible reason for these differences might be that the
patient groups comprise patients with nonoverlapping patterns
of brain damage, often in frontoparietal areas. Cognitive
functioning in these patients might therefore be compromised
in a number of domains, such as working memory and inference drawing, faculties required in complex tasks
such as prosodic disambiguation of sentences. Imaging studies
investigating the perception of sentence-level prosody with
healthy adults circumvent this problem and allow a separate
assessment of lateralization for different brain regions.
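As a point of reference for the results discussed below, hemispheric dominance within an ROI is often summarized by a lateralization index computed from the activation levels A_L and A_R of homologous left- and right-hemispheric regions (a standard convention in the imaging literature, not necessarily the exact measure used in the studies cited here):

LI = (A_L - A_R) / (A_L + A_R), with -1 <= LI <= 1,

where A_L and A_R may be, for example, mean parameter estimates or suprathreshold voxel counts; LI > 0 indicates left-hemispheric and LI < 0 right-hemispheric dominance.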
A number of imaging studies have investigated the lateralization of prosodic processing (Meyer et al. 2002; Kotz et al. 2003;
Meyer et al. 2004; Hesling et al. 2005). Meyer et al. (2002, 2004)
observed stronger right-hemispheric activations within frontotemporal areas for low-pass-filtered speech than for natural
speech, whereas Kotz et al. (2003), using natural speech stimuli,
observed predominantly bilateral activations. Hesling et al.
(2005) found activation within the right STG when high-prosody
intelligible speech was compared with low-prosody intelligible
speech. Another way to remove the propositional content of
speech while preserving its prosodic contour is to present
foreign language materials to listeners with no knowledge of this
language. In an experiment by Tong et al. (2005), bilateral activation was observed for English speakers listening to Chinese
language materials for 8 of 9 ROIs investigated. A stronger right-hemispheric activation was observed for only one ROI within the
middle frontal gyrus. Tong et al. also found mostly bilateral
activation (6 of 9 ROIs) for Chinese participants listening to
Chinese and a dependence of lateralization on the type of task.
This evidence indicates that prosody, when presented together with segmental information, is subserved by a bilaterally distributed network of frontotemporal brain areas.
In the present study, hummed sentence materials were presented as well as natural speech. If the right hemisphere were specialized in the processing of sentence-level prosody, the processing of hummed speech compared with baseline would be expected to show right-hemispheric lateralization. In the whole-brain analysis, bilateral
activation was observed in the STG and the precentral gyrus,
and stronger activation of the right hemisphere was observed
within the middle frontal gyrus for hummed sentences. In the
ROI analysis, 6 of 8 ROIs showed stronger activation within the left hemisphere for natural speech, whereas only 4 of 8 ROIs showed a left-hemispheric dominance for hummed sentences. This suggests that prosodic processing is organized more bilaterally than the processing of other properties of speech, such as syntax or lexico-semantics. Further insight into the lateralization of specific linguistic aspects of prosody can be derived from the lateralization of
IPB processing. An interaction between the number of IPBs and
lateralization was observed only in 2 ROIs (posterior STG, gyrus
of Heschl), with both ROIs showing a stronger modulation of
activation due to the number of IPBs in the left hemisphere, for
natural speech as well as hummed sentences. This could be
taken to indicate that these 2 temporal areas show a left-hemispheric dominance for prosody processing.
The left-hemispheric dominance in the temporal regions
observed in the present study might be due, to some degree, to
the task employed. Whereas other studies required participants to attend directly to the prosodic information of the speech stimuli (Meyer et al. 2002, 2004; Tong et al. 2005), a probe
detection task was used here that required participants, even in
the hummed speech condition, to attend to and memorize a naturally spoken word, carrying segmental information, that could appear unpredictably. The results of the present study are therefore
compatible with the functional or task-dependent class of
hypotheses. According to this class of hypotheses, the lateralization of prosody processing could shift from the right to the
left hemisphere when the task or the stimulus materials
promote attention to syntactic-semantic rather than prosodic properties of the stimuli.
Conclusion
This study aimed at identifying the brain areas involved in the
processing of sentence-level prosody and, in particular, the
processing of IPBs. Sentences with 2 IPBs activated the STG,
bilaterally, more strongly than sentences with 1 IPB. This
pattern of activation was very similar for natural speech and
hummed sentences. The results from the ROI analysis suggest
that the posterior rolandic operculum might play a specific role
in the processing of prosodic information because it was the
only ROI not showing an influence of the type of speech
materials (hummed sentences or natural speech). When comparing natural speech and hummed sentence materials, we
found natural speech to activate a number of areas in the left
hemisphere more strongly than hummed sentences. The left-hemispheric dominance of temporal activations observed for
hummed sentences, however, might be due to the attentional
focus on segmental information required by the task employed
in the present study.
Funding
Human Frontier Science Program (HFSP RGP5300/2002-C102)
to KA.
Notes
Conflict of Interest: None declared.
Address correspondence to email: [email protected].
References
Adriani M, Maeder P, Meuli R, Bellmann A, Frischknecht R, Villemure J-G,
Mayer J, Annoni J-M, Bogousslavsky J, Fornari E, et al. 2003. Sound
recognition and localization in man: specialized cortical networks and
effects of acute circumscribed lesions. Exp Brain Res. 153:591--604.
Austin J. 1975. How to do things with words. The William James
Lectures delivered at Harvard University in 1955. 2nd ed. Cambridge
(MA): Harvard University Press.
Baum S, Pell M, Leonard C, Gordon J. 1997. The ability of right- and left-hemisphere-damaged individuals to produce and interpret prosodic
cues marking phrasal boundaries. Lang Speech. 40:313--330.
Baum SR, Dwivedi VD. 2003. Sensitivity to prosodic structure in left- and
right-hemisphere-damaged individuals. Brain Lang. 87:278--289.
Binder JR, Frost JA, Hammeke TA, Bellgowan PSF, Springer JA, Kaufman
JN, Possing ET. 2000. Human temporal lobe activation by speech and
nonspeech sounds. Cereb Cortex. 10:512--528.
Brett M, Christoff K, Cusack R, Lancaster J. 2001. Using the Talairach
atlas with the MNI template. Neuroimage. 13:S85.
Bryan K. 1989. Language prosody and the right hemisphere. Aphasiology. 3:285--299.
Clarke S, Bellman A, Meuli R, Assal G, Steck AJ. 2000. Auditory agnosia and
auditory spatial deficits following left hemispheric lesions: evidence
for distinct processing pathways. Neuropsychologia. 38:797--807.
Cooper WE, Paccia-Cooper J. 1981. Syntax and speech. Cambridge (MA):
Harvard University Press.
de Rooij JJ. 1975. Prosody and the perception of syntactic boundaries.
IPO Annu Prog Rep. 10:36--39.
de Rooij JJ. 1976. Perception of prosodic boundaries. IPO Annu Prog
Rep. 11:20--24.
Doherty CP, West WC, Dilley LC, Shattuck-Hufnagel S, Caplan D. 2004.
Question/statement judgments: an fMRI study of intonation processing. Hum Brain Mapp. 23:85--98.
Frazier L, Carlson K, Clifton C. 2006. Prosodic phrasing is central to
language comprehension. Trends Cogn Sci. 10:244--249.
Friederici AD, Alter K. 2004. Lateralization of auditory language
functions: a dynamic dual pathway model. Brain Lang. 89:267--276.
Friederici AD, von Cramon DY, Kotz SA. 2007. Role of the corpus
callosum in speech comprehension: interfacing syntax and prosody. Neuron. 53:135--145.
Gandour J, Dzemidzic M, Wong D, Lowe M, Tong Y, Hsieh L,
Satthamnuwong N, Lurito J. 2003. Temporal integration of speech
prosody is shaped by language experience: an fMRI study. Brain Lang.
84:318--336.
Gandour J, Tong Y, Wong D, Talavage T, Dzemidzic M, Xu Y, Li X, Lowe
M. 2004. Hemispheric roles in the perception of speech prosody.
Neuroimage. 23:344--357.
Gandour J, Wong D, Lowe M, Dzemidzic M, Satthamnuwong N, Tong Y, Li
X. 2002. A cross-linguistic fMRI study of spectral and temporal cues
underlying phonological processing. J Cogn Neurosci. 14:1076--1087.
Geschwind N. 1965. Disconnexion syndromes in animals and man: part
I. Brain. 88:237--294.
Grabe E, Warren P, Nolan F. 1994. Resolving category ambiguities—
evidence from stress shift. Speech Commun. 15:101--114.
Griffiths TD, Büchel C, Frackowiak RSJ, Patterson RD. 1998. Analysis of
temporal structure in sound by the human brain. Nat Neurosci.
1:422--427.
Griffiths TD, Giraud AL. 2004. Auditory function. In: Frackowiak RSJ,
Friston KJ, Frith CD, Dolan RJ, Price CJ, Zeki S, Ashburner J, Penny W,
editors. Human brain function. 2nd ed. Amsterdam: Elsevier. p. 61--75.
Griffiths TD, Warren JD, Scott SK, Nelken I, King AJ. 2004. Cortical
processing of complex sound: a way forward? Trends Neurosci.
27:181--185.
Grosjean F, Hirt C. 1996. Using prosody to predict the end of sentences
in English and French: normal and brain-damaged subjects. Lang
Cogn Process. 11:107--134.
Herrmann CS, Friederici AD, Oertel U, Maess B, Hahne A, Alter K. 2003.
The brain generates its own sentence melody: a Gestalt phenomenon in speech perception. Brain Lang. 85:396--401.
Hesling I, Clement S, Bordessoules M, Allard M. 2005. Cerebral
mechanisms of prosodic integration: evidence from connected
speech. Neuroimage. 24:937--947.
Imaizumi S, Mori K, Kiritani S, Hosoi H, Tonoike M. 1998. Task-dependent laterality for cue decoding during spoken language
processing. Neuroreport. 9:899--903.
Isel F, Alter K, Friederici AD. 2005. Influence of prosodic information on
the processing of split particles: ERP evidence from spoken German.
J Cogn Neurosci. 17:154--167.
Kotz SA, Meyer M, Alter K, Besson M, von Cramon DY, Friederici AD.
2003. On the lateralization of emotional prosody: an event-related
functional MR investigation. Brain Lang. 86:366--376.
Kreiman J. 1982. Perception of sentence and paragraph boundaries in
natural conversation. J Phon. 10:163--175.
Larkey LS. 1983. Reiterant speech: an acoustic and perceptual validation.
J Acoust Soc Am. 73:1337--1345.
Liu TT, Frank LR, Wong EC, Buxton RB. 2001. Detection power,
estimation efficiency and predictability in event-related fMRI. Neuroimage. 13:759--773.
Maeder PP, Meuli RA, Adriani M, Bellmann A, Fornari E, Thiran J-P, Pittet
A, Clarke S. 2001. Distinct pathways involved in sound recognition
and localization: a human fMRI study. Neuroimage. 14:802--816.
Meyer M, Alter K, Friederici AD, Lohmann G, von Cramon DY. 2002.
FMRI reveals brain regions mediating slow prosodic modulations in
spoken sentences. Hum Brain Mapp. 17:73--88.
Meyer M, Steinhauer K, Alter K, Friederici AD, von Cramon DY. 2004.
Brain activity varies with modulation of dynamic pitch variance in
sentence melody. Brain Lang. 89:277--289.
Narain C, Scott SK, Wise RJS, Rosen S, Leff A, Iversen SD, Matthews PM.
2003. Defining a left-lateralized response specific to intelligible
speech using fMRI. Cereb Cortex. 13:1362--1368.
Nespor M, Vogel I. 1983. Prosodic structure above the word. In: Cutler A,
Ladd DR, editors. Prosody: models and measurements. Berlin
(Germany): Springer Verlag. p. 123--140.
Pannekamp A, Toepel U, Alter K, Hahne A, Friederici AD. 2005. Prosody-driven sentence processing: an event-related brain potential study.
J Cogn Neurosci. 17:407--421.
Patterson RD, Uppenkamp S, Johnsrude I, Griffiths TD. 2002. The
processing of temporal pitch and melody information in auditory
cortex. Neuron. 36:767--776.
Pell M, Baum S. 1997. The ability to perceive and comprehend intonation
in linguistic and affective contexts by brain-damaged adults. Brain
Lang. 57:80--99.
Perkins JM, Baran JA, Gandour J. 1996. Hemispheric specialization in
processing intonation contours. Aphasiology. 10:343--362.
Plante E, Creusere M, Sabin C. 2002. Dissociating sentential prosody
from sentence processing: activation interacts with task demands.
Neuroimage. 17:401--410.
Poeppel D. 2003. The analysis of speech in different temporal integration windows: cerebral lateralization as asymmetric sampling in
time. Speech Commun. 41:245--255.
Price C, Thierry G, Griffiths T. 2005. Speech-specific auditory processing: where is it? Trends Cogn Sci. 9:271--276.
Rauschecker JP, Tian B. 2000. Mechanisms and streams for processing of
‘what’ and ‘where’ in auditory cortex. Proc Natl Acad Sci USA.
97:11800--11806.
Recanzone GH. 2000. Spatial processing in the auditory cortex of the
macaque monkey. Proc Natl Acad Sci USA. 97:11829--11835.
Rilliard A, Aubergé V. 1998. Reiterant speech for the evaluation of
natural vs. synthetic prosody. Third ESCA/COCOSDA Workshop on
Speech Synthesis, Jenolan Caves House, Blue Mountains, Australia,
November 26--29, 1998. p. 87--92.
Schafer AJ, Speer SR, Warren P, White SD. 2000. Intonational disambiguation in sentence production and comprehension. J Psycholing Res.
29:169--182.
Schmahmann JD, Doyon J, McDonald D, Holmes C, Lavoie K, Hurwitz AS, Kabani N, Toga A, Evans A, Petrides M. 1999. Three-dimensional
MRI atlas of the human cerebellum in proportional stereotaxic
space. Neuroimage. 10:233--260.
Scott SK, Blank CC, Rosen S, Wise RJS. 2000. Identification of a pathway
for intelligible speech in the left temporal lobe. Brain. 123:
2400--2406.
Searle J. 1969. Speech acts: an essay in the philosophy of language. Cambridge (UK): Cambridge University Press.
Selkirk E. 1995. Sentence prosody: intonation, stress, and phrasing. In:
Goldsmith JA, editor. The handbook of phonological theory. Cambridge (MA): Blackwell. p. 550--569.
Selkirk E. 2000. The interaction of constraints on prosodic phrasing. In:
Horne M, editor. Prosody: theory and experiment. Dordrecht (The
Netherlands): Kluwer Academic Publishing. p. 231--262.
Spitsyna G, Warren JE, Scott SK, Turkheimer FE, Wise RJS. 2006.
Converging language streams in the human temporal lobe. J Neurosci. 26:7328--7336.
Stark CEL, Squire LR. 2001. When zero is not zero: the problem of
ambiguous baseline conditions in fMRI. Proc Natl Acad Sci USA.
98:12760--12766.
Steinhauer K. 2003. Electrophysiological correlates of prosody and
punctuation. Brain Lang. 86:142--164.
Steinhauer K, Alter K, Friederici AD. 1999. Brain potentials indicate
immediate use of prosodic cues in natural speech processing. Nat
Neurosci. 2:191--196.
Steinhauer K, Friederici AD. 2001. Prosodic boundaries, comma rules,
and brain responses: the closure positive shift in ERPs as a universal
marker for prosodic phrasing in listeners and readers. J Psycholing
Res. 30:267--295.
Strelnikov KN, Vorobyev VA, Chernigovskaya TV, Medvedev SV. 2006.
Prosodic clues to syntactic processing—a PET and ERP study.
Neuroimage. 29:1127--1134.
Talairach J, Tournoux P. 1988. Co-planar stereotaxic atlas of the human
brain. New York: Thieme.
Tervaniemi M, Szameitat A, Kruck S, Schröger E, Alter K, De Baene W,
Friederici AD. 2006. From air oscillations to music and speech:
functional magnetic resonance imaging evidence for fine-tuned
neural networks in audition. J Neurosci. 26:8647--8652.
’t Hart J, Collier R, Cohen A. 1990. A perceptual study of intonation: an
experimental-phonetic approach to speech melody. Cambridge
(UK): Cambridge University Press.
Tian B, Reser D, Durham A, Kustov A, Rauschecker J. 2001. Functional
specialization in rhesus monkey auditory cortex. Science. 292:
290--293.
Tomasi D, Ernst T, Caparelli EC, Chang L. 2006. Common deactivation
patterns during working memory and visual attention tasks: an intrasubject fMRI study at 4 Tesla. Hum Brain Mapp. 27:694--705.
Tong Y, Gandour J, Talavage T, Wong D, Dzemidzic M, Xu Y, Li X, Lowe
M. 2005. Neural circuitry underlying sentence-level linguistic
prosody. Neuroimage. 28:417--428.
Truckenbrodt H. 2005. A short report on intonation phrase boundaries
in German. Linguist Ber. 203:273--296.
Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O,
Delcroix N, Mazoyer B, Joliot M. 2002. Automated anatomical labelling
of activations in SPM using a macroscopic anatomical parcellation of
the MNI MRI single subject brain. Neuroimage. 15:273--289.
Ungerleider L, Mishkin M. 1982. Two cortical visual systems. In: Ingle DJ,
Goodale MA, Mansfield RJW, editors. Analysis of visual behavior. Cambridge
(MA): MIT Press. p. 549--586.
Van Lancker D. 1980. Cerebral lateralization of pitch cues in the
linguistic signal. Pap Linguist: Int J Hum Commun. 13:200--277.
Walker JP, Fongemie K, Daigle T. 2001. Prosodic facilitation in the
resolution of syntactic ambiguities in subjects with left and right
hemisphere damage. Brain Lang. 78:169--196.
Wildgruber D, Hertrich I, Riecker A, Erb M, Anders S, Grodd W,
Ackermann H. 2004. Distinct frontal regions subserve evaluation of
linguistic and emotional aspects of speech intonation. Cereb Cortex.
14:1384--1389.
Wong PCM. 2002. Hemispheric specialization of linguistic pitch
patterns. Brain Res Bull. 59:83--95.
Zatorre RJ, Belin P. 2001. Spectral and temporal processing in human
auditory cortex. Cereb Cortex. 11:946--953.
Zatorre RJ, Bouffard M, Ahad P, Belin P. 2002. Where is ‘where’ in the
human auditory cortex? Nat Neurosci. 5:905--909.