Download Extended PDF

Report Distinct Computational Principles Govern Multisensory Integration in Primary Sensory and Association Cortices Highlights d Multisensory interactions differ computationally across the cortical hierarchy d In visual cortex, spatial disparity controlled the influence of auditory signals d In parietal cortex, spatial disparity determined the effect of task-irrelevant signals d Only parietal cortices weighted signals by their reliability and task relevance Rohe & Noppeney, 2016, Current Biology 26, 509–514 February 22, 2016 ª2016 Elsevier Ltd All rights reserved http://dx.doi.org/10.1016/j.cub.2015.12.056 Authors Tim Rohe, Uta Noppeney Correspondence [email protected] (T.R.), [email protected] (U.N.) In Brief Rohe and Noppeney show that multisensory interactions are pervasive but governed by distinct computational principles across the cortical hierarchy. Critically, only parietal cortices integrated signals weighted by their bottom-up sensory reliabilities and top-down task relevance into multisensory spatial priority maps. Current Biology Report Distinct Computational Principles Govern Multisensory Integration in Primary Sensory and Association Cortices Tim Rohe1,2,* and Uta Noppeney1,3,* 1Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany of Psychiatry and Psychotherapy, University of Tübingen, Calwerstrasse 14, 72076 Tübingen, Germany 3Centre for Computational Neuroscience and Cognitive Robotics, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK *Correspondence: [email protected] (T.R.), [email protected] (U.N.) http://dx.doi.org/10.1016/j.cub.2015.12.056 2Department SUMMARY Human observers typically integrate sensory signals in a statistically optimal fashion into a coherent percept by weighting them in proportion to their reliabilities [1–4]. An emerging debate in neuroscience is to which extent multisensory integration emerges already in primary sensory areas or is deferred to higher-order association areas [5–9]. This fMRI study used multivariate pattern decoding to characterize the computational principles that define how auditory and visual signals are integrated into spatial representations across the cortical hierarchy. Our results reveal small multisensory influences that were limited to a spatial window of integration in primary sensory areas. By contrast, parietal cortices integrated signals weighted by their sensory reliabilities and task relevance in line with behavioral performance and principles of statistical optimality. Intriguingly, audiovisual integration in parietal cortices was attenuated for large spatial disparities when signals were unlikely to originate from a common source. Our results demonstrate that multisensory interactions in primary and association cortices are governed by distinct computational principles. In primary visual cortices, spatial disparity controlled the influence of non-visual signals on the formation of spatial representations, whereas in parietal cortices, it determined the influence of task-irrelevant signals. Critically, only parietal cortices integrated signals weighted by their bottom-up reliabilities and topdown task relevance into multisensory spatial priority maps to guide spatial orienting. RESULTS Our senses are exposed to a constant influx of signals. To make sense of this cacophony, the brain needs to solve two computational challenges: first, it needs to determine which signals emanate from a common source based on them co-occurring in time (e.g., temporal synchrony) and space (e.g., spatial disparity) [4, 10, 11]. Second, it needs to integrate signals from a common source into a statistically optimal percept by weighting them in proportion to their reliabilities [1–3]. To determine the functional relevance and computational principles that govern multisensory interactions across the cortical hierarchy, we presented five participants with synchronous audiovisual spatial signals that varied in their spatial disparity and visual reliability (Figures 1A and 1B). On each trial, participants reported their perceived location of the auditory or visual signal. The study was approved by the human research review committee of the University of Tübingen. Combining psychophysics and multivariate fMRI pattern decoding, we characterized how human observers integrate auditory and visual signals into spatial representations in terms of the audiovisual weight index wAV that quantifies the influence of the true auditory and visual locations on (1) the perceived/reported auditory and visual spatial estimates (i.e., participants’ behavioral localization responses; Figure 1C) and (2) the spatial estimates decoded from regions of interest along the auditory [12] and visual [13] dorsal processing hierarchy (Figure 2). This audiovisual weight index ranges from pure visual (90 ) to pure auditory (0 ) influence. We performed the statistics on the behavioral and neural audiovisual weight indices using a two (auditory versus visual report) 3 two (high versus low visual reliability) 3 two (large versus small spatial disparity) factorial design based on circular statistics [14]. Behavioral Results Our results demonstrate that participants integrated auditory and visual signals weighted by their reliabilities and task relevance (see Table 1; see Figure S1 for histograms of reported signal locations across all conditions). The relative influence of the visual signal on participants’ perceived location was greater for high relative to low visual reliability (main effect of visual reliability: p < 0.001; permutation testing of a likelihood ratio test statistic). Moreover, it was greater when the location of the visual signal needed to be reported than when the location of the auditory signal needed to be reported (main effect of task relevance: p < 0.001). Thus, participants flexibly adjusted the weights according to the task-relevant sensory modality. As a consequence, they reported different auditory and visual locations for identical audiovisual signals. Critically, this difference significantly increased for large (>6.6 ) relative to small (%6.6 ) spatial disparities. In other words, audiovisual Current Biology 26, 509–514, February 22, 2016 ª2016 Elsevier Ltd All rights reserved 509 Figure 1. Example Trial, Experimental Design, and Behavioral Data (A) In a ventriloquist paradigm, participants were presented with synchronous audiovisual signals originating from four possible locations along the azimuth. The visual signal was a cloud of white dots. The auditory signal was a brief burst of white noise. Participants localized either the auditory or the visual signal (n.b. for illustrational purposes the visual angles of the cloud have been scaled in a non-uniform fashion in this scheme). (B) The four-factorial experimental design manipulated (1) the location of the visual (V) signal ( 10 , 3.3 , 3.3 , and 10 ), (2) the location of the auditory (A) signal ( 10 , 3.3 , 3.3 , and 10 ), (3) the reliability of the visual signal (high [VR+] versus low [VR ] reliability as defined by the spread of the visual cloud), and (4) task relevance (auditory versus visual report). Using fMRI, we measured activation patterns to audiovisual signals of all experimental conditions from voxels of regions along the auditory and visual spatial-processing hierarchies. (C) Behavioral results: audiovisual weight index wAV (across-participants circular mean and double-bootstrapped 68% confidence interval; n = 5) was computed as the angle between the auditory and visual regression coefficients. Audiovisual weight index wAV as a function of audiovisual spatial disparity (small [%6.6 ] versus large [>6.6 ]), task relevance (auditory versus visual report), and visual reliability (high [VR+] versus low [VR ]) are shown. For a purely visual influence, wAV is 90 . For a purely auditory influence, it is 0 . See also Figure S1. integration broke down when auditory and visual signals were far apart and more likely to be caused by independent sources (i.e., a significant interaction between task relevance and spatial disparity; p = 0.015). fMRI Decoding across the Cortical Hierarchy To characterize how auditory and visual signals were integrated into spatial representations at the neural level, we combined fMRI with multivariate pattern decoding. Based on a support-vector regression model trained on audiovisual spatially congruent trials, we decoded a brain area’s spatial estimate of spatially disparate audiovisual signals. First, we ensured that we could decode the spatial estimate for congruent trials significantly better than chance in all eight regions of interest (Table S1). Using the same analysis approach as for behavioral localization responses, we then investigated how the neural audiovisual weight wAV index was affected by visual reliability, task-relevant sensory modality, and spatial disparity (Figures 2A–2D). As the two (auditory versus visual report) 3 two (high versus low visual reliability) 3 two (large versus small spatial disparity) repeatedmeasures analysis did not reveal a significant three-way interaction (Table 1), Figure 2 presents the neural audiovisual weights separately as a function of visual reliability, task relevance, and spatial disparity (Figures 2A–2C) and of both task relevance and spatial disparity (Figure 2D). Effect of Sensory Reliability on Audiovisual Integration First, we asked which regions integrate auditory and visual signals weighted by their reliability as expected from principles of statistical optimality [1–3] and participants’ behavioral localization responses (Figure 2A). Surprisingly, visual reliability did not significantly influence audiovisual weighting in lower visual or auditory areas. Only higher parietal cortices (IPS0–IPS4) were governed by the classical reliability-driven reweighting with more weight being given to the auditory signal when the visual signal was unreliable. Whereas IPS0–4 mainly represented the location of the visual signal for high visual reliability (i.e., the audiovisual weight index was approximately 90 ), its spatial estimate shifted toward the location of the concurrent auditory signal for low visual reliability. Effect of Task Relevance on Audiovisual Integration and Its Interaction with Spatial Disparity Next, we asked where auditory and visual signals were integrated into spatial representations weighted by their task relevance (Figure 2B). Whereas we found a marginally significant main effect of task relevance (i.e., visual versus auditory report) on the audiovisual weight index already in higher-order auditory areas (hA) encompassing the belt and the planum temporale, the effect emerged predominantly in higher-order association areas such as IPS0–4 (cf. Table 1). In these areas, the visual signal exerted a stronger influence on the decoded location during visual than auditory report. Thus, both planum temporale and IPS0–4 formed different spatial estimates for identical audiovisual stimuli depending on which sensory modality was attended and reported. Importantly, the difference between spatial estimates for auditory and visual report was further increased in IPS3–4, when the spatial disparity between auditory and visual signals was large (i.e., significant interaction between task relevance and spatial 510 Current Biology 26, 509–514, February 22, 2016 ª2016 Elsevier Ltd All rights reserved Effect of Spatial Disparity on Audiovisual Integration In contrast to the interaction between task relevance and spatial disparity that was found in parietal areas, we observed a main effect of spatial disparity in low-level visual areas V1 and, marginally significant, in V2 (cf. solid lines are below dotted lines in V1 and V2; Figure 2C; Table 1). Only for small spatial disparities auditory signals exerted an ‘‘attractive’’ influence on the spatial representations decoded from low-level visual areas (cf. solid lines below 90 in V1; p = 0.094 in a one-sample permutation test in Figure 2C). Likewise, we observed a limited but significant attractive influence of visual signals on spatial representations decoded from auditory areas for small spatial disparities (solid lines above 0 in A1 in Figure 2C; p = 0.032 for unidirectional hypothesis in a one-sample permutation test). These results suggest that integration in low-level sensory areas depends on auditory and visual signals co-occurring within a spatial window [15]. In short, spatial disparity controls the influence of the non-preferred sensory signals on the spatial estimates in low-level sensory areas. Note, however, that spatial disparity was inherently correlated with the eccentricity of the audiovisual signals by virtue of the factorial and spatially balanced nature of our design. Whereas signals were presented para-foveally or peripherally for smalldisparity trials, they were presented in the periphery for largedisparity trials. Interaction between Spatial Disparity and Visual Reliability For completeness, we also observed an interaction between reliability and spatial disparity in V3AB. This interaction results from a larger spatial window of integration for less-reliable sensory signals (see [16]). Basically, it is easier to determine that two signals come from different sources when the visual input is reliable. Figure 2. fMRI Results Audiovisual weight index as a function of visual reliability, task relevance, and disparity and its correlation with the corresponding behavioral weight index in the regions of interest. Audiovisual weight index wAV (across-participants circular mean and double-bootstrapped 68% confidence interval; n = 5) was computed as the angle between the auditory and visual regression coefficients. For a purely visual region, wAV is 90 . For a purely auditory region, it is 0 . Asterisks indicate the statistical significance of effects on wAV derived from a circular log-likelihood ratio statistic. (A) Audiovisual weight index wAV as a function of visual reliability (high [VR+] versus low [VR ]). (B) Audiovisual weight index wAV as a function of task relevance (auditory [A] versus visual [V] report). (C) Audiovisual weight index wAV as a function of audiovisual spatial disparity (small [%6.6 ; D ] versus large [>6.6 ; D+]). (D) Audiovisual weight index wAV in IPS3–4 as a function of task relevance and disparity. (E) Circular-circular correlation (across-participants mean after Fisher z-transformation ± SEM; n = 5) between the neural weight index wAV and the equivalent behavioral weight index in the regions of interest. See also Figure S2 and Tables S1 and S2. disparity; cf. Table 1). In other words, when audiovisual spatial disparity was large and signals were unlikely to emanate from a common event, audiovisual interactions broke down and IPS3–4 predominantly represented the location of the task-relevant signal. Thus, spatial disparity controlled the influence of the task-irrelevant signal on the spatial estimate in IPS3–4. Relation of Neural and Behavioral Weight Indices of Audiovisual Spatial Integration Finally, we asked how and where along the cortical hierarchies the neural audiovisual weights were related with the behavioral audiovisual weights. Hence, we computed the correlation between the neural and behavioral weight indices for each of the regions of interest. The correlation coefficient increased along the visual processing hierarchy culminating in IPS3–4 (Figure 2E). Likewise, in the auditory system, the correlation between neural and behavioral weights was enhanced in higher-order auditory areas relative to primary auditory cortex. To further investigate which region predominantly drove participants’ perceptual localization responses, we decoded the reported signal location from fMRI activation patterns while accounting for the physical signal location and visual reliability. IPS3–4 showed the greatest and selectively significant correlation coefficient between true and decoded reported locations (Table S1). Collectively, these results suggest that audiovisual integration processes in higher-order visual (in particular IPS3–4) and auditory areas are closely related to participants’ trial-by-trial perceived stimulus location. Controlling for Eye Movements and Hemifield of Signals as Potential Confounds To address potential concerns that our decoding results may be confounded by eye movements, we performed a series of Current Biology 26, 509–514, February 22, 2016 ª2016 Elsevier Ltd All rights reserved 511 Table 1. Statistical Significance of Main and Interaction Effects of the Factors Visual Reliability, Task Relevance, and Spatial Disparity for the Behavioral and Neural Audiovisual Weight Index wAV VR p TR p Sp VR 3 TR p VR 3 S p TR 3 S p Behavior <0.001* <0.001* 0.607 0.002* 0.186 0.015* VR 3 TR 3 S p 0.752 V1 1 0.942 0.005* 0.961 0.227 1 0.999 V2 0.227 1 0.059 0.92 0.824 0.424 0.847 V3 1 0.665 1 0.811 0.904 1 1 V3AB 0.974 0.745 0.589 0.992 0.040* 1 0.997 IPS0–2 0.022* 0.047* 1 0.103 0.430 1 1 IPS3–4 0.028* <0.001* 0.999 0.949 0.994 0.021* 0.999 hA 0.997 0.066 1 0.984 1 0.979 1 A1 0.433 0.468 0.678 0.979 1 1 0.910 p values are based on permutation tests using a circular log-likelihood ratio statistic. For the neural weight index wAV, they are corrected for multiple comparisons across the eight regions of interest. n = 5. Asterisks indicate significant p values. VR, visual reliability; TR, task relevance; S, spatial disparity. See also Tables S2 and S3. control analyses. First, we evaluated participants’ eye movements based on eye-tracking data recorded concurrently during fMRI acquisition. Fixation was well maintained throughout the experiment with post-stimulus saccades detected in only 2.293% ± 1.043% (mean ± SEM) of the trials. Moreover, four (visual location) 3 four (auditory location) 3 two (visual reliability) 3 two (task relevance) repeated measures ANOVAs performed separately for (1) % saccades or (2) % eye blinks revealed no significant main effects or interactions (Table S2). The repeated measures ANOVA on post-stimulus mean horizontal eye position (0–875 ms post-stimulus onset) revealed no significant effects either. Small trends were observed for the main effect of task relevance and visual local positions. As a further control analysis, we therefore re-performed the linear regression analyses to compute the neural weight index wAV (with fMRI-decoded spatial location as dependent variable; see the Supplemental Experimental Procedures) and included post-stimulus mean horizontal eye position as a nuisance covariate in addition to the true auditory and visual locations to predict the fMRI-decoded locations. This analysis basically replicated our initial results (Figure S2; Table S3). Finally, we investigated the effect of within/across hemifield of presentation on our results (i.e., wAV) by including a nuisance variable that coded whether the auditory and visual signals were presented in the same or different hemifield in the linear regression analysis (with fMRI-decoded spatial location as dependent variable). This analysis again basically replicated our initial results (Figure S2; Table S3). DISCUSSION This study combined psychophysics and multivariate fMRI pattern decoding to characterize how the brain integrates audiovisual signals into spatial representations along the auditory [12] and visual [13] processing hierarchies. Our results demonstrate that distinct computational principles govern audiovisual interactions in primary sensory and higher-order association areas. Accumulating evidence has demonstrated that multisensory integration is not deferred to association cortices [17–22] but starts already at the primary, putatively unisensory level [23–28] via thalamo-cortical mechanisms [27], direct connectivity between sensory areas [29], or top-down influences from higher-order association cortices [30]. Our data also reveal bidirectional audiovisual influences at the primary cortical level. In particular, the auditory location influenced the spatial estimate encoded in primary visual cortex. In line with the spatial principle of multisensory integration [15], a concurrent auditory (resp. visual) signal attracted the spatial estimate in V1 (resp. A1) only when the two signals co-occurred close in space. In other words, spatial disparity controlled the influence of auditory signals on spatial estimates in primary visual areas. Yet, even for low spatial disparity, audiovisual influences in primary sensory areas were relatively small when compared to parietal cortices. These findings dovetail nicely with previous neurophysiological studies showing about 15% ‘‘multisensory’’ neurons in primary sensory areas [31] but more than 50% in classical association areas such as intraparietal or superior temporal sulci [32]. Critically, our study did not only show that multisensory interactions increased progressively along the cortical hierarchy but that they changed their computational operations from primary sensory to higher-order parietal areas. Only higher-order parietal areas (IPS0–4) integrated auditory and visual signals weighted by their reliability in line with principles of statistical optimality [1–3]. Yet, despite profound audiovisual interactions in IPS0–4, sensory signals were not fused into one unified amodal spatial representation [1, 3]. Instead, the sensory weights in IPS0–4 depended on the sensory modality that needed to be reported (cf. Figure 2B). This context-dependent weighting of the auditory and visual signals led to different spatial estimates for identical audiovisual stimuli under visual and auditory report. Critically, spatial disparity increased this difference between the spatial estimates. When auditory and visual signals were far apart and hence likely to come from independent events, audiovisual integration was attenuated and IPS3–4 encoded predominantly the location of the task-relevant signal. Thus, IPS3–4 gracefully transitions between information integration and segregation depending on the probability of the two signals being generated by a common cause [4, 10]. A recent model-based fMRI study showed that IPS3–4 is more likely to encode spatial estimates formed by Bayesian causal inference [33] than by traditional forced fusion [1, 3] or full segregation models. Yet, the principles that drove this result remained 512 Current Biology 26, 509–514, February 22, 2016 ª2016 Elsevier Ltd All rights reserved unclear (for further discussion, see the Supplemental Experimental Procedures). Using ‘‘model-free’’ multivariate decoding, the current study reveals three fundamental computational principles that determine audiovisual integration in IPS3–4: IPS3–4 integrates sensory signals weighted by (1) their bottom-up sensory reliabilities and (2) their top-down task relevance into multisensory spatial priority maps. (3) Critically, these spatial priority maps take into account the causal structure of the environment (i.e., interaction of task relevance with spatial disparity). The influence of task relevance and its interaction with spatial disparity implicate different spatial estimates for auditory and visual report in particular when the two signals are far apart, as expected under Bayesian causal inference. Yet, as we manipulated task relevance in long blocks, attention may potentially have increased the reliability of the auditory (resp. visual) signal and thereby influenced its weight in the final IPS3–4 spatial estimate for visual (resp. auditory) report [34]. Future studies manipulating task relevance in a cuing paradigm may help to further disentangle these different explanatory mechanisms. Collectively, these computational operations enable IPS3–4 to form multisensory spatial priority maps that go functionally beyond traditional unisensory spatial priority maps [35, 36]. Multisensory spatial priority maps define attentional priority in space jointly based on the bottom-up salience and reliability of signals from multiple sensory modalities [22], their current task relevance, and their causal structure. Indeed, the behavioral relevance of the IPS3–4 spatial estimates was further indicated by the correlation between the neural and behavioral weights, which progressively increased along the auditory and visual processing hierarchies to culminate in IPS3–4 [22, 37]. Likewise, IPS3–4 was the only region where perceptual report was decoded significantly better than chance. Previous studies focusing selectively on the auditory ventriloquist illusion have highlighted the importance of the planum temporale in audiovisual integration at the perceptual level [21, 38]. Indeed, our study also revealed a high correlation between neural and behavioral weights indices in higher auditory areas. Yet, unlike IPS3–4, higher auditory areas integrated sensory signals weighted by their task relevance but were not significantly affected by visual reliability. As null results need to be interpreted with caution, future studies are needed to further define the computational similarities and differences of the planum temporale and IPS3–4 in audiovisual integration (for further discussion, see the Supplemental Experimental Procedures). In conclusion, our results reveal distinct computational principles governing multisensory interactions in primary and association cortices. In primary visual cortices, spatial disparity controlled the influence of non-visual signals on the formation of spatial estimates. In parietal cortices, it determined the influence of task-irrelevant signals. Critically, parietal cortices integrated signals weighted by their bottom-up sensory reliabilities and top-down task relevance into multisensory spatial priority maps guiding spatial orienting and effective interactions in a multisensory world. Moving beyond identifying multisensory influences toward characterizing their computational principles reveals a hierarchical organization of multisensory perception in human neocortex. Our results demonstrate that multisensory interactions are pervasive in human neocortex but subserve distinct computational tasks [5, 39]. SUPPLEMENTAL INFORMATION Supplemental Information includes Supplemental Experimental Procedures, two figures, and three tables and can be found with this article online at http://dx.doi.org/10.1016/j.cub.2015.12.056. AUTHOR CONTRIBUTIONS T.R. and U.N. conceived and designed the experiments. T.R. performed the experiments. T.R. and U.N. analyzed the data. T.R. and U.N. wrote the paper. ACKNOWLEDGMENTS This study was funded by the European Research Council (ERC-2012StG_20111109) and the Max Planck Society. We thank C. Frith and U. Beierholm for helpful comments on a previous draft of this manuscript, P. Dayan for helpful discussions, and P. Ehses for advice on the fMRI sequence. Received: July 31, 2015 Revised: November 6, 2015 Accepted: December 22, 2015 Published: February 4, 2016 REFERENCES 1. Ernst, M.O., and Banks, M.S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415, 429–433. 2. Fetsch, C.R., Pouget, A., DeAngelis, G.C., and Angelaki, D.E. (2012). Neural correlates of reliability-based cue weighting during multisensory integration. Nat. Neurosci. 15, 146–154. 3. Alais, D., and Burr, D. (2004). The ventriloquist effect results from nearoptimal bimodal integration. Curr. Biol. 14, 257–262. 4. Körding, K.P., Beierholm, U., Ma, W.J., Quartz, S., Tenenbaum, J.B., and Shams, L. (2007). Causal inference in multisensory perception. PLoS ONE 2, e943. 5. Ghazanfar, A.A., and Schroeder, C.E. (2006). Is neocortex essentially multisensory? Trends Cogn. Sci. 10, 278–285. 6. Kayser, C., Logothetis, N.K., and Panzeri, S. (2010). Visual enhancement of the information representation in auditory cortex. Curr. Biol. 20, 19–24. 7. Liang, M., Mouraux, A., Hu, L., and Iannetti, G.D. (2013). Primary sensory cortices contain distinguishable spatial patterns of activity for each sense. Nat. Commun. 4, 1979. 8. Meyer, K., Kaplan, J.T., Essex, R., Webber, C., Damasio, H., and Damasio, A. (2010). Predicting visual stimuli on the basis of activity in auditory cortices. Nat. Neurosci. 13, 667–668. 9. Vetter, P., Smith, F.W., and Muckli, L. (2014). Decoding sound and imagery content in early visual cortex. Curr. Biol. 24, 1256–1262. 10. Shams, L., and Beierholm, U.R. (2010). Causal inference in perception. Trends Cogn. Sci. 14, 425–432. 11. Wallace, M.T., Roberson, G.E., Hairston, W.D., Stein, B.E., Vaughan, J.W., and Schirillo, J.A. (2004). Unifying multisensory signals across time and space. Exp. Brain Res. 158, 252–258. 12. Tian, B., Reser, D., Durham, A., Kustov, A., and Rauschecker, J.P. (2001). Functional specialization in rhesus monkey auditory cortex. Science 292, 290–293. 13. Mishkin, M., Ungerleider, L.G., and Macko, K.A. (1983). Object vision and spatial vision: two cortical pathways. Trends Neurosci. 6, 414–417. 14. Anderson, C.M., and Wu, C.F.J. (1995). Measuring location effects from factorial experiments with a directional response. Int. Stat. Rev. 63, 345–363. 15. Stein, B.E., and Meredith, M.A. (1993). The Merging of the Senses (The MIT Press). 16. Rohe, T., and Noppeney, U. (2015). Sensory reliability shapes perceptual inference via two mechanisms. J. Vis. 15, 22. Current Biology 26, 509–514, February 22, 2016 ª2016 Elsevier Ltd All rights reserved 513 17. Sadaghiani, S., Maier, J.X., and Noppeney, U. (2009). Natural, metaphoric, and linguistic auditory direction signals have distinct influences on visual motion processing. J. Neurosci. 29, 6490–6499. 18. Beauchamp, M.S., Argall, B.D., Bodurka, J., Duyn, J.H., and Martin, A. (2004). Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nat. Neurosci. 7, 1190–1192. 19. Hein, G., Doehrmann, O., Müller, N.G., Kaiser, J., Muckli, L., and Naumer, M.J. (2007). Object familiarity and semantic congruency modulate responses in cortical audiovisual integration areas. J. Neurosci. 27, 7881– 7887. 20. Werner, S., and Noppeney, U. (2010). Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. J. Neurosci. 30, 2662–2675. 21. Bonath, B., Noesselt, T., Martinez, A., Mishra, J., Schwiecker, K., Heinze, H.J., and Hillyard, S.A. (2007). Neural basis of the ventriloquist illusion. Curr. Biol. 17, 1697–1703. 22. Macaluso, E., Driver, J., and Frith, C.D. (2003). Multimodal spatial representations engaged in human parietal cortex during both saccadic and manual spatial orienting. Curr. Biol. 13, 990–999. 23. Kayser, C., Petkov, C.I., Augath, M., and Logothetis, N.K. (2007). Functional imaging reveals visual modulation of specific fields in auditory cortex. J. Neurosci. 27, 1824–1835. 24. Noesselt, T., Rieger, J.W., Schoenfeld, M.A., Kanowski, M., Hinrichs, H., Heinze, H.J., and Driver, J. (2007). Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. J. Neurosci. 27, 11431–11441. 28. Lee, H., and Noppeney, U. (2014). Temporal prediction errors in visual and auditory cortices. Curr. Biol. 24, R309–R310. 29. Falchier, A., Clavagnier, S., Barone, P., and Kennedy, H. (2002). Anatomical evidence of multimodal integration in primate striate cortex. J. Neurosci. 22, 5749–5759. 30. Macaluso, E., and Driver, J. (2005). Multisensory spatial interactions: a window onto functional integration in the human brain. Trends Neurosci. 28, 264–271. 31. Bizley, J.K., Nodal, F.R., Bajo, V.M., Nelken, I., and King, A.J. (2007). Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cereb. Cortex 17, 2172–2189. 32. Dahl, C.D., Logothetis, N.K., and Kayser, C. (2009). Spatial organization of multisensory responses in temporal association cortex. J. Neurosci. 29, 11924–11932. 33. Rohe, T., and Noppeney, U. (2015). Cortical hierarchies perform Bayesian causal inference in multisensory perception. PLoS Biol. 13, e1002073. 34. Martinez-Trujillo, J.C., and Treue, S. (2004). Feature-based attention increases the selectivity of population responses in primate visual cortex. Curr. Biol. 14, 744–751. 35. Bisley, J.W., and Goldberg, M.E. (2010). Attention, intention, and priority in the parietal lobe. Annu. Rev. Neurosci. 33, 1–21. 36. Gottlieb, J.P., Kusunoki, M., and Goldberg, M.E. (1998). The representation of visual salience in monkey parietal cortex. Nature 391, 481–484. 25. Lewis, R., and Noppeney, U. (2010). Audiovisual synchrony improves motion discrimination via enhanced connectivity between early visual and auditory areas. J. Neurosci. 30, 12329–12339. 37. Snyder, L.H., Batista, A.P., and Andersen, R.A. (1997). Coding of intention in the posterior parietal cortex. Nature 386, 167–170. 26. Foxe, J.J., Morocz, I.A., Murray, M.M., Higgins, B.A., Javitt, D.C., and Schroeder, C.E. (2000). Multisensory auditory-somatosensory interactions in early cortical processing revealed by high-density electrical mapping. Brain Res. Cogn. Brain Res. 10, 77–83. 38. Bonath, B., Noesselt, T., Krauel, K., Tyll, S., Tempelmann, C., and Hillyard, S.A. (2014). Audio-visual synchrony modulates the ventriloquist illusion and its neural/spatial representation in the auditory cortex. Neuroimage 98, 425–434. 27. Lakatos, P., Chen, C.M., O’Connell, M.N., Mills, A., and Schroeder, C.E. (2007). Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53, 279–292. 39. Driver, J., and Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on ‘sensory-specific’ brain regions, neural responses, and judgments. Neuron 57, 11–23. 514 Current Biology 26, 509–514, February 22, 2016 ª2016 Elsevier Ltd All rights reserved Current Biology, Volume 26 Supplemental Information Distinct Computational Principles Govern Multisensory Integration in Primary Sensory and Association Cortices Tim Rohe and Uta Noppeney Supplemental Figures Figure S1 (Related to Figure 1). Histograms of the reported signal locations (across-participants mean ± SEM, N = 5) for the 64 experimental conditions. The four auditory locations are displayed along the columns and the four visual locations along the rows. Blue indicates auditory report; red indicates visual report. (A) High visual reliability. (B) Low visual reliability. Figure S2 (Related to Figure 2). Audiovisual weight index (after controlling for eye movements, left panel, or for whether auditory and visual signals were presented in common or different hemifields, right panel) in the regions of interest. Audiovisual weight index wAV (across-participants circular mean and double-bootstrapped 68% confidence interval, N = 5) was computed as the angle between the auditory and visual regression coefficients (atan(ßV/ßA)). In order to control for horizontal eye movements, we included the post-stimulus mean horizontal eye position as a nuisance covariate in addition to the true auditory and visual locations to predict the fMRI decoded locations (left panel, (A)-(D)). In an additional analysis, we included a nuisance variable that coded whether the auditory and visual signals were presented in common or different hemifields in addition to the true auditory and visual locations in the linear regression analysis to predict the fMRI decoded locations (right panel, (E)-(H)). For a purely visual region, wAV is 90°. For a purely auditory region, it is 0°. Asterisks indicate the statistical significance of effects on wAV derived from a circular log-likelihood ratio statistic. (A, E) Audiovisual weight index wAV as a function of visual reliability (high (VR+) vs. small (VR-)). (B, F) Audiovisual weight index wAV as a function of task-relevance (auditory (A) vs. visual (V) report). (C, G) Audiovisual weight index wAV as a function of audiovisual spatial disparity (small (≤ 6.6°; D-) vs. large (> 6.6°; D+)). (D, H) Audiovisual weight index wAV in IPS3-4 as a function of task-relevance and disparity. Supplemental Tables Table S1 (Related to Figure 2). Decoding accuracies (across-participants mean Fisher z-transformed correlation coefficients) for signal location in audiovisual congruent conditions (top) and for perceptual localization reports (bottom) in the regions of interest. Region V1 V2 V3 V3AB IPS0-2 IPS3-4 hA A1 Signal location z 0.700* 0.660* 0.720* 0.620* 0.400* 0.340* 0.410* 0.140* Perceptual localization z 0.141 0.255 0.208 0.036 0.338 0.355* 0.208 0.036 Correlation coefficients were Fisher z-transformed before averaging. * = p < 0.05 in one-sample bootstrap test after multiple comparison correction across the eight regions of interest. N =5. Table S2 (Related to Figure 2 and Table 1). Main and interaction effects of the factors visual signal location (LV), auditory signal location (LA), visual reliability (VR), and task- relevance (TR) on post-stimulus eye-movement indices in repeated measures ANOVAs. Percent saccades F df1 df2 Horizontal eye position p TR 3.82 1 4 0.12 VR 0.10 1 4 0.77 LV 1.86 1.1 4.4 0.24 LA 1.06 1.2 4.6 0.37 TR X VR 0.02 1 4 0.89 TR X LV 1.70 1.0 4.2 0.26 TR X LA 1.12 1.7 6.6 0.37 VR X LV 1.02 2.0 7.8 0.40 VR X LA 1.56 1.3 5.3 0.28 LV X LA 0.79 2.3 9.1 0.50 TR X VR X LV 0.37 1.6 6.3 0.66 TR X VR X LA 0.13 1.7 6.6 0.84 TR X LV X LA 0.97 2.3 9.3 0.43 VR X LV x LA 1.08 2.2 8.6 0.39 TR X VR X LV X LA 1.63 2.4 9.7 0.25 Note: p values are Greenhouse-Geisser corrected. N = 5. Percent blinks F df1 df2 p F df1 df2 p 7.73 1.82 5.38 3.25 1.38 5.84 1.69 3.08 1.10 1.92 0.27 3.06 0.62 1.28 0.82 1 1 1.0 1.1 1 1.0 1.0 1.6 1.1 2.0 1.4 1.4 2.4 2.2 2.1 4 4 4.2 4.3 4 4.2 4.2 6.3 4.3 8.2 5.5 5.7 9.8 8.6 8.2 0.05 0.25 0.08 0.14 0.31 0.07 0.26 0.12 0.36 0.21 0.69 0.13 0.59 0.33 0.48 0.04 0.46 1.56 0.88 3.29 0.79 0.57 3.26 0.75 1.79 0.11 1.01 1.63 1.17 1.49 1 1 1.5 1.5 1 1.9 1.4 2.0 1.7 2.5 2.3 1.5 2.6 2.2 2.2 4 4 6.1 6.1 4 7.4 5.7 7.9 6.6 10.0 9.4 5.9 10.3 8.7 8.6 0.86 0.53 0.28 0.43 0.14 0.48 0.54 0.09 0.49 0.22 0.92 0.39 0.24 0.36 0.28 Table S3 (Related to Table 1). Statistical significance of main and interaction effects of the factors visual reliability (VR), task- relevance (TR) and spatial disparity (S) for the audiovisual weight index wAV when effects of the post-stimulus mean horizontal position of the eyes were controlled (upper panel) or when effects of the hemifield of the audiovisual signals were controlled (lower panel). VR TR S p p p VR X TR p VR XS p TR X S p VR X TR XS p Control of eye movements V1 1 0.938 0.975 0.207 1 0.999 0.004 V2 0.253 1 0.065 0.953 0.811 0.496 0.861 V3 0.999 0.704 1 0.873 0.906 1 1 V3AB 0.981 0.769 0.678 0.994 1 0.999 0.041 IPS0-2 0.080 1 0.146 0.406 1 1 0.035 IPS3-4 0.998 0.986 0.965 0.993 0.016 0.001 0.020 hA 1 0.301 1 0.959 1 0.953 1 A1 0.430 0.478 0.697 0.977 1 1 0.893 Control of hemifield V1 1 0.942 0.961 0.227 1 0.999 0.005 V2 0.227 1 0.064 0.920 0.824 0.424 0.847 V3 1 0.665 1 0.811 0.904 1 1 V3AB 0.974 0.745 0.589 0.992 1 0.997 0.040 IPS0-2 1 0.103 0.430 1 1 0.022 0.049 IPS3-4 0.999 0.949 0.994 0.999 0.028 <0.001 0.021 hA 0.997 0.066 1 0.984 1 0.979 1 A1 0.433 0.468 0.678 0.979 1 1 0.91 Note: p values (corrected for multiple comparisons across the eight regions of interest) were derived from maximum permutation tests using a circular log-likelihood ratio statistic. N = 5. P values in bold indicate significant values. Supplemental Experimental Procedures Participants After giving written informed consent, six healthy volunteers without a history of neurological or psychiatric disorders (all university students or graduates; 2 female; mean age 28.8 years, range 22-36 years) participated in the fMRI study. All participants had normal or corrected-to normal vision and reported normal hearing. One participant was excluded because of excessive head motion (4.21 / 3.52 STD above the mean of the translational / rotational volume-wise head motion based on the included 5 participants). The data have previously been reported [S1]. The study was approved by the human research review committee of the University of Tübingen. Stimuli The visual stimulus was a cloud of 20 white dots (diameter: 0.43° visual angle) sampled from a bivariate Gaussian with a vertical standard deviation of 2.5° and a horizontal standard deviation of 2° or 14° presented on a black background (i.e., 100% contrast). The auditory stimulus was a burst of white noise with a 5 ms on/off ramp. To create a virtual auditory spatial signal, the noise was convolved with spatially specific head-related transfer functions (HRTFs) thereby providing binaural (interaural time and amplitude differences) and monoaural spatial filtering signals. The HRTFs were pseudo-individualized by matching participants’ head width, heights, depth and circumference to the anthropometry of participants in the CIPIC database [S2]. HRTFs from the available locations in the database were interpolated to the desired location of the auditory signal. Experimental design In a spatial ventriloquist paradigm, participants were presented with synchronous, yet spatially congruent or discrepant visual and auditory signals (Fig. 1A). On each trial, visual and auditory locations were independently sampled from four possible locations along the azimuth (i.e., -10°, -3.3°, 3.3° or 10°) leading to four levels of spatial disparity (i.e., 0°, 6.6°, 13.3° or 20°). In addition, we manipulated the reliability of the visual signal by setting the horizontal standard deviation of the Gaussian cloud to 2° (high reliability) or 14° (low reliability) visual angle. In an inter-sensory selective-attention paradigm, participants either reported their auditory or visual perceived stimulus location. Hence, the 4 x 4 x 2 x 2 factorial design manipulated (1) the location of the visual stimulus ({-10°, -3.3°, 3.3°, 10°}, i.e., the mean of the Gaussian) (2) the location of the auditory stimulus ({-10°, -3.3°, 3.3°, 10°}) (3) the reliability of the visual signal ({2°,14°}, standard deviation of the Gaussian) and (4) task-relevance (auditory- / visual-selective report) (Fig. 1B). To characterize the computational principles of information integration, we reorganized these conditions into a two (visual reliability: high vs. low) x two (taskrelevance: auditory vs. visual report) x two (spatial disparity: ≤ 6.6° vs. > 6.6°) factorial design for the statistical analysis of the behavioural and fMRI data. On each trial, synchronous audiovisual spatial signals were presented for 50 ms followed by a variable inter-stimulus fixation interval from 1.75-2.75 s. Participants localized the signal in the task-relevant sensory modality as accurately as possible by pushing one of four spatially corresponding buttons. Throughout the experiment, they fixated a central cross (1.6° diameter). To maximize design efficiency, stimuli and conditions were presented in a pseudo-randomized fashion. Only the factor task-relevance was held constant within a session and counterbalanced across sessions. In each session, each of the 32 audiovisual spatial stimuli was presented exactly 11 times. 5.9% null-events were interspersed in the sequence of 352 stimuli per session. Each participant completed 20 sessions (10 auditory and 10 visual localization tasks, i.e. 110 trials for each of the 64 conditions were included in total; apart from one participant that performed 9 auditory and 11 visual localization sessions). Before the fMRI study, the participants completed one practice session outside the scanner. The number of sessions of the main experiment (i.e., 10 sessions x 2 task-contexts = 20 sessions) was determined based on a prior independent pilot study with one single subject that participated in 33 sessions in total including 17 scanning sessions for the auditory localization task. Computing the decoding performance for an increasing number of sessions demonstrated that reliable decoding performance was obtained approximately for  10 sessions. Experimental setup Audiovisual stimuli were presented using Psychtoolbox 3.09 (www.psychtoolbox.org) [S3] running under MATLAB R2010a (MathWorks). Auditory stimuli were presented at ~75 dB SPL using MR-compatible headphones (MR Confon). Visual stimuli were back-projected onto a Plexiglas screen using an LCoS projector (JVC DLA-SX21). Participants viewed the screen through an extra-wide mirror mounted on the MR head coil resulting in a horizontal visual field of approx. 76° at a viewing distance of 26 cm. Participants performed the localization task using an MR-compatible custom-built button device. Participants’ eye movements and fixation were monitored by recording the participants’ pupil location using an MR-compatible custom-build infrared camera (sampling rate 50 Hz) mounted in front of the participants’ right eye and iView software 2.2.4 (SensoMotoric Instruments). General analysis strategy for behavioural and neural data To characterize how human observers integrate auditory and visual signals into spatial representations at the behavioural and neural level, we developed a common analysis strategy for both (i) reported auditory and visual spatial estimates (i.e., participants’ behavioural localization responses) and (ii) the spatial estimates decoded from regions of interest along the auditory and visual dorsal processing hierarchy. We quantified the influence of the auditory and visual signals on the reported (behavioural) or decoded (neural) auditory or visual spatial estimates using a linear regression model. In this regression model, the perceived/decoded spatial locations were predicted by the true spatial locations separately for auditory and visual signals for each of the eight conditions in the two (visual reliability: high vs. low) x two (task-relevance: auditory vs. visual report) x two (spatial disparity: ≤ 6.6° vs. > 6.6°) factorial design. Hence, the regression model included 16 regressors in total, i.e., 8 (conditions) x 2 (true auditory or visual spatial locations). The auditory (ßA) and visual (ßV) parameter estimates quantified the influence of auditory and visual signals on the perceived/decoded signal location for a particular condition. To obtain a summary index for the relative audiovisual weights, we computed wAV as the four-quadrant inverse tangens of the visual (ßV) and auditory (ßA) parameter estimates from the regression model for each of the eight conditions based on participants’ behavioural responses (Fig. 1C) and their decoded spatial estimates across the cortical hierarchy (Fig. 2A-D). For a purely visual region, wAV is 90°. For a purely auditory region, it is 0°. Confidence intervals of the mean wAV were computed using a double bootstrap [S4] (to account for the small number of participants) for circular measures. Permutation tests of circular indices for behavioural and neural data We performed the statistics on the behavioural and neural audiovisual weight indices using a two (auditory vs. visual report) x two (high vs. low visual reliability) x two (large vs. small spatial disparity) factorial design based on the likelihood ratio statistics for circular measures (LRTS) [S5]. Similar to an analysis of variance for linear data, LRTS computes the difference in log-likelihood functions for the full model that allows differences in the mean locations of circular measures between conditions and the reduced null model that does not model any mean differences between conditions. LRTS were computed separately for the main effects and interactions. To refrain from making any parametric assumptions, we evaluated the main effects of visual reliability, task-relevance, spatial disparity and their interactions in the factorial design using permutation testing of the LRTS. To account for the within-subject repeated measures design at the 2nd random effects level, permutations were constrained to occur within each participant. For the main effects of visual reliability, task-relevance and spatial disparity, wAV values were permuted within the levels of the non-tested factors (5000 random permutations). For tests of the two-way interactions, we permuted within the levels of the third factor the simple main effects of the two factors of interest which are exchangeable under the null-hypothesis of no interaction [S6] (1024 permutations possible). For tests of the three-way interaction, values were freely permuted across all conditions [S7] (5000 random permutations). Unless otherwise stated, results are reported at p < 0.05. Note that we ensured the validity of these permutation approaches in simulations showing that simulated p values converged to a nominal alpha-level of 0.05. Behavioural data To quantify the influence of the auditory and visual signals on the perceived signal location, we computed the audiovisual weight index wAV based on participants’ localization responses. We then evaluated the effect of visual reliability, task-relevance and spatial disparity on the behavioural audiovisual weight index using permutation testing of a likelihood ratio test statistic (LRTS) (Tab. 1, for details see above). MRI data acquisition A 3T Siemens Magnetom Trio MR scanner was used to acquire both T1-weighted anatomical images and T2*weighted axial echoplanar images (EPI) with BOLD contrast (gradient echo, parallel imaging using GRAPPA with an acceleration factor of 2, TR = 2480 ms, TE = 40 ms, flip angle = 90°, FOV = 192×192 mm2, image matrix 78×78, 42 transversal slices acquired interleaved in ascending direction, voxel size = 2.5×2.5×2.5 mm3 + 0.25 mm interslice gap). In total, 353 volumes times 20 sessions were acquired for the ventriloquist paradigm, 161 volumes times 2-4 sessions for the auditory localizer and 159 volumes times 10-16 sessions for the visual retinotopic localizer resulting in approximately 18 hours of scanning in total per participant assigned over 7-11 days. The first three volumes of each session were discarded to allow for T1 equilibration effects. fMRI data analysis Ventriloquist paradigm The fMRI data were analysed with SPM8 (www.fil.ion.ucl.ac.uk/spm) [S8]. Scans from each participant were corrected for slice timing, were realigned and unwarped to correct for head motion and spatially smoothed with a Gaussian kernel of 3 mm FWHM. The time series in each voxel was high-pass filtered to 1/128 Hz. All data were analysed in native participant space. The fMRI experiment was modelled in an event-related fashion with regressors being entered into the design matrix after convolving each event-related unit impulse with a canonical hemodynamic response function and its first temporal derivative. In addition to modelling the 32 conditions in our 4 (auditory locations) x 4 (visual locations) x 2 (visual reliability) factorial design, the general linear model included the realignment parameters as nuisance covariates to account for residual motion artefacts. The factor task-relevance (visual vs. auditory report) was modelled across sessions. The parameter estimates pertaining to the canonical hemodynamic response function defined the magnitude of the BOLD response to the audiovisual stimuli in each voxel. For the multivariate decoding analysis, we extracted the parameter estimates of the canonical hemodynamic response function for each condition and session from voxels of the regions of interest (= fMRI activation vectors) defined in separate auditory and retinotopic localizer experiments (see below). Each spatial fMRI activation vector for the 64 conditions in our 4x4x2x2 factorial design was based on 11 trials within a particular session. To avoid the effects of image-wide activity changes, each spatial fMRI activation vector was normalized to have mean zero and standard deviation one. For the multivariate decoding analysis, we used a linear support-vector regression model (SVR, as implemented in LIBSVM 3.14 [S9]). For each of the eight regions of interest in the visual [S10] and auditory [S11] processing hierarchies we trained the SVR model to learn the mapping from the fMRI activation vectors to the external spatial locations based on the audiovisually congruent conditions (including conditions of auditory and visual report) from all but one session. This learnt mapping from activation pattern to external spatial location was then used to decode the spatial location from the fMRI activation vectors of the spatially congruent and incongruent audiovisual conditions of the remaining session. In a leave-one-session-out crossvalidation scheme, the training-test procedure was repeated for all sessions. The SVRs’ parameters (C and ν) were optimized using a grid search within each cross-validation fold (i.e., nested cross-validation). We then asked two questions: First, we investigated whether the true signal location of audiovisual congruent signals can be decoded successfully from the fMRI activation patterns in our regions of interest. We evaluated decoding accuracy in terms of the Pearson correlation between the true and the decoded spatial location in audiovisual congruent conditions alone (n.b. we can compute audiovisual accuracy in a meaningful way only for the congruent audiovisual trials). The Fisher z-transformed correlation coefficients for each subject were entered into a one-sample t-test and tested against zero by bootstrapping (n = 1000 bootstraps) the onesample t statistic (Tab. S1) [S12]. To correct for multiple comparisons across the eight regions of interest, we compared the critical test statistic of each region against the bootstrap distribution of the maximum test statistic across the eight regions of interest [S13, S14]. Please note that the correlation coefficients, the main effects and interactions each tested for effects of interest and addressed a priori hypotheses. Therefore, we did not correct the p values for the number of tested effects in our behavioural and fMRI analysis, but only corrected our fMRI analysis for the number of regions investigated using the bootstrapping procedure described above. Upon reviewers’ suggestion, we also investigated audiovisual spatial representations in prefrontal regions, in particular in the frontal eye fields, in addition to our eight regions of interest along the auditory and visual processing hierarchies. However, the decoding accuracies for audiovisual congruent conditions were not significantly better than chance in any of the prefrontal regions. Therefore, we did not characterize the spatial representations in prefrontal areas further. Second, we investigated how the eight regions of interest in the visual [S10] and auditory [S11] processing hierarchies integrate auditory and visual signals into spatial representations. For this, we focused on the decoded spatial locations of the spatially congruent and incongruent conditions that provide information about how a brain region combines visual and auditory spatial signals into spatial representations as a result of our SVR decoding procedure. Using the same regression analysis approach as for behavioural localization responses, we quantified the influence of the true auditory and visual signal location on the decoded spatial estimates for each region (see section General analysis strategy for behavioural and neural data). To increase the efficiency of the parameter estimates in higher-order association areas, we pooled the decoded spatial locations in IPS0-2 and IPS3-4 in the regression model (see [S15] for a similar approach). We computed the ‘neural’ relative audiovisual weight wAV as the angle between the auditory and visual parameter estimates of the linear regression and computed confidence intervals of mean wAV using a double bootstrap [S4] (Fig. 2). Next, we evaluated the main effects of visual reliability, task-relevance, spatial disparity and their interactions using the same permutation testing of the LRTS that was used for the behavioural data [S5] (Tab. 1). To correct for multiple comparisons across the eight regions of interest, we compared the critical test statistic of each region against the permutation distribution of the maximum test statistic across the eight regions of interest [S13, S14]. Further, we identified multisensory influences in primary sensory areas (i.e., auditory influence on V1 and visual influence on A1) for small spatial discrepancies. More specifically, we tested whether wAV was smaller than 90° (in V1) or larger than 0° (in A1) for small spatial discrepancies (≤ 6.6°) [S16] using a one-sided one-sample permutation test, where we permuted the sign of the circular distance of wAV (averaged across taskrelevance and spatial disparity) from the critical value in each subject (i.e., 32 permutations). To correlate the neural and behavioural audiovisual weight indices, we computed the circular-circular correlation coefficient [S17] between neural and behavioural weight indices over the conditions in the two (visual reliability: high vs. low) x two (task-relevance: auditory vs. visual report) x two (spatial disparity: ≤ 6.6° vs. > 6.6°) factorial design for each participant (Fig. 2E). The correlation coefficients were averaged across participants after Fisher’s z-transformation. To further test which region drives participants’ perceptual localization reports, we decoded the reported signal location from fMRI data. Therefore, we supplemented our first-level general linear models (see above) with four regressors each coding for one of the four possible reported signal locations (e.g., regressor 1 was a dummy variable coding the onset of each trial where participants reported location 1). Thus, each regressor coded one response location unconfounded by the true physical location of the auditory and visual signal components. As before, we used a linear support-vector regression model with leave-one-session-out cross-validation to decode the reported signals locations from fMRI activation patterns of the parameter estimates pertaining to the four response regressors’ in the eight regions of interest. We tested whether the decoding accuracy as indexed by the correlation coefficient between the true spatial reports and the decoded spatial reports were better than chance using a bootstrap approach with multiple comparison correction as described for decoding accuracies above (Tab. S1). Auditory and visual retinotopic localizer Auditory and visual retinotopic localizers were used to define regions of interest along the auditory and visual processing hierarchies in a participant-specific fashion. In the auditory localizer, participants were presented with brief bursts of white noise at -10° or 10° visual angle (duration 500 ms, stimulus onset asynchrony 2 s). In a one-back task, participants indicated via a key press when the spatial location of the current trial was different from the previous trial. 20 s blocks of auditory conditions (i.e., 20 trials) alternated with 13 s fixation periods. The auditory locations were presented in a pseudorandomized fashion to optimize design efficiency. Similar to the main experiment, the auditory localizer sessions were modelled in an event-related fashion with the onset vectors of left and right auditory stimuli being entered into the design matrix after convolution with the hemodynamic response function and its first temporal derivative. Auditory responsive regions were defined as voxels in superior temporal and Heschl’s gyrus showing significant activations for auditory stimulation relative to fixation (p < 0.05, family-wise error corrected). Within these regions, we defined primary auditory cortex (A1) based on cytoarchitectonic probability maps [S18] and referred to the remainder (i.e., planum temporale and posterior superior temporal gyrus) as higher-order auditory cortex (hA). Standard phase-encoded retinotopic mapping [S19] was used to define visual regions of interest. Participants viewed a checkerboard background flickering at 7.5 Hz through a rotating wedge aperture of 70° width (polar angle mapping) or an expanding/contracting ring (eccentricity mapping). The periodicity of the apertures was 42 s. Visual responses were modelled by entering a sine and cosine convolved with the hemodynamic response function as regressors in a general linear model. The preferred polar angle was determined as the phase lag for each voxel which is the angle between the parameter estimates for the sine and the cosine. The preferred phase lags for each voxel were projected on the reconstructed, inflated cortical surface using Freesurfer 5.1.0 [S20]. Visual regions V1-V3, V3AB and IPS0-IPS4 were defined as phase reversal in angular retinotopic maps. IPS0-4 were defined as contiguous, approximately rectangular regions based on phase reversals along the anatomical IPS [S21]. For the decoding analyses, the auditory and visual regions were combined from the left and right hemispheres. Control analyses to account for eye movements and hemifield of signals as potential confounds Eye recordings were calibrated with standard eccentricities between ±3° and ±10° to determine the deviation from the fixation cross. Fixation position was post-hoc offset corrected. Eye position data were automatically corrected for blinks and converted to radial velocity. For each trial, the post-stimulus mean horizontal eye position and the number of saccades (defined by a radial eye-velocity threshold of 15° s-1 for a minimum of 60 ms duration and radial amplitude larger than 1°) were quantified (0-875 ms after stimulus onset). We analyzed the eye movement indices including percent saccades, percent eye blinks and the post-stimulus mean horizontal eye position in three separate four (visual location) x four (auditory location) x two (visual reliability) x two (task-relevance) repeated measures ANOVAs. As the analysis of mean horizontal eye position revealed a nonsignificant trend of task relevance (Tab. S2), we performed an additional control analysis where we included the post-stimulus mean horizontal eye position as a nuisance covariate into the regression model in addition to the true auditory and visual locations to predict the fMRI decoded locations (Fig. S2 and tab. S3). Finally, we investigated the effect of within/across hemifield of signal presentation on our results by including a nuisance covariate that coded whether the auditory and visual signals were presented in the same or different hemifield in the linear regression analysis (with fMRI decoded spatial location as dependent variable) (Fig. S2 and tab. S3). Bayesian Causal Inference: Qualitative predictions for effects of visual reliability, task-relevance and interaction between task-relevance and spatial disparity To form a reliable representation of the environment, the brain is challenged to integrate signals originating from a common source and segregate those from different sources [S22, S23]. Reliability-weighted integration is statistically optimal for signals coming from a common source. Yet, it fails to capture more natural situations where the brain is confronted with many signals that may or may not arise from a common source [S23]. Indeed, it is well-established that information integration breaks down for conflicting sensory signals [S16, S24] that are unlikely to emanate from a common source. Likewise, our study demonstrates that the perceived sound location is strongly influenced by the location of the visual signal, when spatial disparity is small. Yet, for large spatial disparities, this audiovisual bias is greatly attenuated [S25]. Bayesian Causal Inference provides a rational strategy of how the brain should arbitrate between information integration and segregation in the face of uncertainty about the causal structure of the world. Under the assumption of a common source auditory and visual spatial estimates should be combined weighted according to their reliabilities; under the hypothesis of different sources, they should be treated independently. Critically, on a particular instance the brain does not know the underlying causal structures. It needs to infer their probabilities from the sensory inputs based on audiovisual correspondences such as spatial discrepancy. The brain is then thought to compute a final estimate of the auditory (or visual) signal location by combining the spatial estimates under the two causal assumptions (i.e., common vs. independent sources) weighted by their probabilities [S22, S26]. Using model-based decoding in a spatial localization task, a recent study [S1] suggested that Bayesian Causal Inference can better account for participants’ behavioural localization responses and spatial estimates decoded from IPS3-4 than traditional forced fusion models [S27, S28]. More specifically, this previous study demonstrated that spatial estimates obtained from Bayesian Causal Inference can be more accurately decoded from IPS3-4 than spatial estimates from forced fusion models. Yet, there are several factors that could have led to higher decoding accuracies for Bayesian Causal Inference relative to forced fusion estimates. Therefore, the current study focuses on the fundamental computational principles that drive audiovisual integration in IPS3-4. Bayesian Causal Inference makes three qualitative key predictions. First, sensory signals should be integrated depending on their reliability in particular when spatial disparity is small (i.e., effect of visual reliability and potentially its interaction with disparity). Second, auditory and visual signals should only be partially integrated, because formally the Bayesian Causal Inference estimate combines the forced fusion and the full segregation estimate of the task-relevant signal. Hence, we would expect different spatial estimates for visual and auditory report (i.e., effect of task-relevance). Third, the difference between auditory and visual estimates should increase for large spatial disparities (i.e., interaction between task-relevance and spatial disparity). For large spatial disparities it is unlikely that the two signals come from a common source, so that the influence of the full segregation estimate on the final estimate is increased. Importantly, our results demonstrate that IPS3-4 is the only area that shows a qualitative profile of wAV weight index that is consistent with all of these three key predictions. These results strongly support the hypothesis that IPS3-4 integrates auditory and visual spatial signals in line with Bayesian Causal Inference. Moreover, they demonstrate how Bayesian Causal Inference enables the formation of multisensory spatial priority maps where spatial priority is jointly defined by the relative reliabilities and task-relevance of signals from multiple sensory modalities. Role of the planum temporale in audiovisual integration of spatial signals Previous studies [S29, S30] have implicated the planum temporale in the ventriloquist illusion by demonstrating that the univariate BOLD-response in the planum temporale correlates with participants’ perceived sound location. Likewise, our results suggest that the activation pattern in higher auditory areas including the planum temporale correlates with participants’ percept. Further, the neural and behavioural audiovisual weight indices correlate in higher auditory areas. We also observe a substantial effect of task-relevance in higher order auditory areas. Collectively, these results suggest that the planum temporale plays a critical role in integrating audiovisual signals into behaviourally relevant spatial estimates. However, in contrast to IPS3-4, higher auditory areas (e.g., planum temporale) do not appear to integrate audiovisual signals weighted by their reliability, which would be inconsistent with Bayesian Causal Inference. As null-results need to be interpreted with caution (in particular given non-significant effects of reliability in primary auditory areas), future studies will need to further investigate the computational similarities and differences between planum temporale and IPS3-4 in audiovisual spatial integration. We may also need to manipulate auditory reliability instead of visual reliability for direct comparison of reliability reweighting across auditory and visual areas. Potentially, planum temporale and IPS3-4, both higher-order sensory processing areas, play similar roles in integrating auditory and visual signals into spatial representations. Supplemental References S1. S2. S3. S4. S5. S6. S7. S8. S9. S10. S11. S12. S13. S14. S15. S16. S17. S18. S19. S20. S21. S22. S23. S24. S25. S26. S27. S28. S29. Rohe, T., and Noppeney, U. (2015). Cortical Hierarchies Perform Bayesian Causal Inference in Multisensory Perception. PLoS Biol 13, e1002073. Algazi, V.R., Duda, R.O., Thompson, D.M., and Avendano, C. (2001). The cipic hrtf database. In Applications of Signal Processing to Audio and Acoustics, 2001 IEEE Workshop on the. (IEEE), pp. 99-102. Brainard, D.H. (1997). The psychophysics toolbox. Spatial vision 10, 433-436. Martin, M.A. (1990). On the double bootstrap. In Computing Science and Statistics: Interface'90. Proceedings of the 22nd Symposium on the Interface. pp. 73-78. Anderson, C.M., and Wu, C.J. (1995). Measuring location effects from factorial experiments with a directional response. International Statistical Review 63, 345-363. Edgington, E., and Onghena, P. (2007). Randomization tests, (Boca Raton, FL: CRC Press). Gonzalez, L., and Manly, B. (1998). Analysis of variance by randomization with small data sets. Environmetrics 9, 53-65. Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.P., Frith, C.D., and Frackowiak, R.S.J. (1994). Statistical parametric maps in functional imaging: a general linear approach. Human brain mapping 2, 189-210. Chang, C.C., and Lin, C.J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2, 27. Mishkin, M., Ungerleider, L.G., and Macko, K.A. (1983). Object vision and spatial vision: two cortical pathways. Trends in neurosciences 6, 414-417. Tian, B., Reser, D., Durham, A., Kustov, A., and Rauschecker, J.P. (2001). Functional specialization in rhesus monkey auditory cortex. Science 292, 290-293. Efron, B., and Tibshirani, R.J. (1994). An introduction to the bootstrap, (London: Chapmann and Hall). Nichols, T.E., and Holmes, A.P. (2002). Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human brain mapping 15, 1-25. Nichols, T., and Hayasaka, S. (2003). Controlling the familywise error rate in functional neuroimaging: a comparative review. Statistical methods in medical research 12, 419-446. Liu, T., Hospadaruk, L., Zhu, D.C., and Gardner, J.L. (2011). Feature-specific attentional priority signals in human cortex. J Neurosci 31, 4484-4495. Stein, B.E., and Meredith, M.A. (1993). The merging of the senses, (Cambridge, MA: The MIT Press). Berens, P. (2009). CircStat: A MATLAB toolbox for circular statistics. Journal of Statistical Software 31, 1-21. Eickhoff, S.B., Stephan, K.E., Mohlberg, H., Grefkes, C., Fink, G.R., Amunts, K., and Zilles, K. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. NeuroImage 25, 1325-1335. Sereno, M.I., Dale, A.M., Reppas, J.B., Kwong, K.K., Belliveau, J.W., Brady, T.J., Rosen, B.R., and Tootell, R.B. (1995). Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. Science 268, 889-893. Dale, A.M., Fischl, B., and Sereno, M.I. (1999). Cortical surface-based analysis. I. Segmentation and surface reconstruction. NeuroImage 9, 179-194. Swisher, J.D., Halko, M.A., Merabet, L.B., McMains, S.A., and Somers, D.C. (2007). Visual topography of human intraparietal sulcus. J Neurosci 27, 5326-5337. Kording, K.P., Beierholm, U., Ma, W.J., Quartz, S., Tenenbaum, J.B., and Shams, L. (2007). Causal inference in multisensory perception. PloS one 2, e943. Shams, L., and Beierholm, U.R. (2010). Causal inference in perception. Trends Cogn Sci 14, 425-432. Welch, R.B., and Warren, D.H. (1980). Immediate perceptual response to intersensory discrepancy. Psychol Bull 88, 638-667. Bertelson, P., and Radeau, M. (1981). Cross-modal bias and perceptual fusion with auditory-visual spatial discordance. Attention, Perception, & Psychophysics 29, 578-584. Wozny, D.R., Beierholm, U.R., and Shams, L. (2010). Probability matching as a computational strategy used in perception. PLoS Comput Biol 6. Alais, D., and Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Curr Biol 14, 257-262. Ernst, M.O., and Banks, M.S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415, 429-433. Bonath, B., Noesselt, T., Krauel, K., Tyll, S., Tempelmann, C., and Hillyard, S.A. (2014). Audio-visual synchrony modulates the ventriloquist illusion and its neural/spatial representation in the auditory cortex. NeuroImage 98, 425-434. S30. Bonath, B., Noesselt, T., Martinez, A., Mishra, J., Schwiecker, K., Heinze, H.J., and Hillyard, S.A. (2007). Neural basis of the ventriloquist illusion. Curr Biol 17, 1697-1703.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Extended PDF