Bayesian Inference
Guillaume Flandin
Wellcome Trust Centre for Neuroimaging, University College London
SPM Course, Zurich, February 2008

[Figure: the SPM analysis pipeline (image time-series, realignment, normalisation to a template, smoothing with a kernel, general linear model with design matrix, parameter estimates, statistical parametric map, statistical inference with Gaussian field theory at p < 0.05), annotated with the places where Bayesian methods enter: Bayesian segmentation and normalisation, spatial priors on activation extent, posterior probability maps (PPMs), Dynamic Causal Modelling.]

Overview
• Introduction
  - Bayes’s rule
  - Gaussian case
  - Bayesian Model Comparison
• Bayesian inference
  - aMRI: Segmentation and Normalisation
  - fMRI: Posterior Probability Maps (PPMs), Spatial prior (1st level)
  - MEEG: Source reconstruction
• Summary

Classical approach shortcomings

In SPM, the p-value reflects the probability of getting the observed data in the effect’s absence. If sufficiently small, this p-value can be used to reject the null hypothesis that the effect is negligible.

  $p(f(Y) \mid H_0)$: probability of the data, given no activation

Shortcomings of this approach:
■ One can never accept the null hypothesis.
■ Given enough data, one can always demonstrate a significant effect at every voxel.

Solution: using the probability distribution of the activation given the data.
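The second shortcoming listed above can be illustrated with a small numerical sketch. It is not part of SPM: the function name `two_sided_p`, the effect size 0.01 and the unit standard deviation are assumptions chosen for illustration, and a simple z-test on a sample mean stands in for the voxel-wise statistic.

```python
import math

def two_sided_p(effect, sd, n):
    """Two-sided p-value of a z-test when the sample mean equals `effect`.

    Hypothetical helper: assumes known standard deviation `sd` and
    n independent observations, so z = effect * sqrt(n) / sd.
    """
    z = effect * math.sqrt(n) / sd
    # P(|Z| > z) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

# A fixed, negligible effect (1% of the standard deviation)
effect, sd = 0.01, 1.0
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p(effect, sd, n))
```

The effect size never changes, yet the p-value falls below 0.05 once n is large enough, which is why a small p-value alone says little about whether an effect is of a relevant size.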
Posterior probability $p(\theta \mid Y)$: the probability of the effect, given the observed data.

Bayes’s Rule

Given $p(Y)$, $p(\theta)$ and $p(Y, \theta)$, the conditional densities are given by

  $p(\theta \mid Y) = \frac{p(Y, \theta)}{p(Y)}$,  $p(Y \mid \theta) = \frac{p(Y, \theta)}{p(\theta)}$

Eliminating $p(Y, \theta)$ gives Bayes’s rule:

  $p(\theta \mid Y) = \frac{p(Y \mid \theta)\, p(\theta)}{p(Y)}$

that is, posterior = likelihood × prior / evidence.

Gaussian Case

Likelihood and prior:

  $y = \theta^{(1)} + \epsilon^{(1)}$,  $p(y \mid \theta^{(1)}) = N(\theta^{(1)}, (\lambda^{(1)})^{-1})$
  $\theta^{(1)} = \theta^{(2)} + \epsilon^{(2)}$,  $p(\theta^{(1)}) = N(\theta^{(2)}, (\lambda^{(2)})^{-1})$

Posterior:

  $p(\theta^{(1)} \mid y) = N(m, p^{-1})$
  $p = \lambda^{(1)} + \lambda^{(2)}$
  $m = \frac{\lambda^{(1)}}{p}\, y + \frac{\lambda^{(2)}}{p}\, \theta^{(2)}$

Here $p$ denotes the posterior precision. The posterior mean is an average of the data and the prior mean, weighted by their relative precisions (relative precision weighting).

[Figure: prior, likelihood and posterior densities; the posterior lies between the prior and the likelihood.]

Multivariate Gaussian

[Figure: multivariate case.]

Bayesian Inference

Three steps:
1. Formulation of a generative model: likelihood $p(Y \mid \theta)$ and prior distribution $p(\theta)$
2. Observation of data $Y$
3. Update of beliefs based upon observations, given a prior state of knowledge:

  $p(\theta \mid Y) = \frac{p(Y \mid \theta)\, p(\theta)}{p(Y)}$

Bayesian Model Comparison

Select the model $m$ with the highest probability given the data:

  $p(m \mid Y) = \frac{p(Y \mid m)\, p(m)}{p(Y)}$

Model evidence (marginal likelihood):

  $p(Y \mid m) = \int p(Y \mid \theta_m, m)\, p(\theta_m \mid m)\, d\theta_m$

The evidence balances accuracy against complexity.

Model comparison and Bayes factor:

  $B_{12} = \frac{p(Y \mid m_1)}{p(Y \mid m_2)}$

  B12          p(m1|Y) (%)   Evidence
  1 to 3       50-75         Weak
  3 to 20      75-95         Positive
  20 to 150    95-99         Strong
  >= 150       >= 99         Very strong

Bayes and Spatial Preprocessing: Normalisation

Deformation parameters $\theta$. Bayesian regularisation:

  $\log p(\theta \mid y) = \log p(y \mid \theta) + \log p(\theta) + \text{const}$

• $\log p(y \mid \theta)$: mean square difference between template and source image (goodness of fit)
• $\log p(\theta)$: squared distance between parameters and their expected values (regularisation), which penalises unlikely deformations

Bayes and Spatial Preprocessing

[Figure: template image compared with three registrations of the source image:
  Affine registration ($\chi^2 = 472.1$)
  Non-linear registration without Bayes constraints ($\chi^2 = 287.3$)
  Non-linear registration using Bayes ($\chi^2 = 302.7$)]

Without Bayesian constraints, the non-linear spatial normalisation can introduce unnecessary warps.

Bayes and Spatial Preprocessing: Segmentation

Empirical priors:
• Intensities are modelled by a mixture of K Gaussian distributions.
• Prior tissue probability maps are overlaid to assist the segmentation: the prior probability of each voxel belonging to a particular tissue type is derived from segmented images of 151 subjects.

Unified segmentation & normalisation

Circular relationship between segmentation & normalisation:
- Knowing which tissue type a voxel belongs to helps normalisation.
- Knowing where a voxel is (in standard space) helps segmentation.

Build a joint generative model:
- model how voxel intensities result from a mixture of tissue type distributions
- model how tissue types of one brain have to be spatially deformed to match those of another brain

Using a priori knowledge about the parameters: adopt a Bayesian approach and maximise the posterior probability.

Ashburner & Friston 2005, NeuroImage

Bayesian fMRI

General Linear Model:

  $Y = X\beta + \epsilon$ with $\epsilon \sim N(0, C_\epsilon)$

What are the priors?
• In “classical” SPM, no (flat) priors
• In “full” Bayes, priors might be from theoretical arguments or from independent data
• In “empirical” Bayes, priors derive from the same data, assuming a hierarchical model for generation of the data: parameters of one level can be made priors on the distribution of parameters at the lower level

Bayesian fMRI with spatial priors

Even without applied spatial smoothing, activation maps (and maps of e.g. AR coefficients) have spatial structure.
[Figure: contrast map and AR(1) coefficient map.]

Definition of a spatial prior via a Gaussian Markov Random Field: automatic spatial regularisation of the regression coefficients and the AR coefficients.

The Generative Model

General Linear Model with Auto-Regressive error terms (GLM-AR): $Y = X\beta + E$, where $E$ is an AR(p) process:

  $y_t = X_t \beta + \sum_{i=1}^{p} a_i e_{t-i} + \epsilon_t$

with priors

  $p(\beta_k) = N(0, \alpha_k^{-1} D^{-1})$
  $p(a_p) = N(0, \gamma_p^{-1} D^{-1})$

[Figure: graphical model linking $\alpha$, $\beta$, $\gamma$, $A$, $\lambda$ and $Y$.]

Spatial prior

Over the regression coefficients:

  $p(\beta_k) = N(0, \alpha_k^{-1} D^{-1})$

• Shrinkage prior (zero mean)
• $\alpha_k$: spatial precision, determines the amount of smoothness
• $D$: spatial kernel matrix

Gaussian Markov Random Field priors: $D$ has 1 on the diagonal elements $d_{ii}$, $d_{ij} > 0$ if voxels $i$ and $j$ are neighbours, and 0 elsewhere.

The same prior is used on the AR coefficients.

Prior, Likelihood and Posterior

The prior:

  $p(\beta, A, \lambda, \alpha, \gamma) = \prod_k p(\beta_k \mid \alpha_k)\, p(\alpha_k \mid q_1, q_2) \prod_p p(a_p \mid \gamma_p)\, p(\gamma_p \mid r_1, r_2) \prod_n p(\lambda_n \mid u_1, u_2)$

The likelihood:

  $p(Y \mid \beta, A, \lambda) = \prod_n p(y_n \mid \beta_n, a_n, \lambda_n)$

The posterior $p(\beta \mid Y)$? The posterior over $\beta$ doesn’t factorise over $k$ or $n$. Exact inference is intractable.

Variational Bayes

Approximate posteriors that allow for factorisation:

  $q(\beta, A, \lambda, \alpha, \gamma) = \prod_k q(\alpha_k \mid Y) \prod_p q(\gamma_p \mid Y) \prod_n q(\beta_n \mid Y)\, q(a_n \mid Y)\, q(\lambda_n \mid Y)$

Variational Bayes Algorithm

  Initialisation
  While (ΔF > tol)
    Update sufficient statistics for β
    Update sufficient statistics for A
    Update sufficient statistics for λ
    Update sufficient statistics for α
    Update sufficient statistics for γ
  End

Event related fMRI: familiar versus unfamiliar faces

[Figure: activation maps obtained with smoothing, with a global prior and with a spatial prior.]

Convergence & Sensitivity

[Figure: convergence plot of F against iteration number; ROC curves (sensitivity against 1 - specificity) for the global prior, the spatial prior and smoothing.]

SPM5 Interface

[Figure: screenshot of the SPM5 interface.]

Posterior Probability Maps

Posterior distribution: probability of getting an effect, given the data, $p(\beta \mid y)$.
• mean: size of effect
• precision: variability

Posterior probability map: images of the probability or confidence that an activation exceeds some specified threshold, given the data:

  $p(\beta > \gamma \mid y) \ge \alpha$

Two thresholds (here $\gamma$ and $\alpha$ denote thresholds, not the prior precisions of the GLM-AR model):
• activation threshold $\gamma$: percentage of whole brain mean signal (physiologically relevant size of effect)
• probability $\alpha$ that voxels must exceed to be displayed (e.g. 95%)

Posterior Probability Maps

[Figure: posterior probability distribution $p(\beta \mid Y)$ with the activation threshold $\gamma$ marked and the probability $\alpha$ shown as the area above it.]

Output images:
• Mean: Cbeta_*.img
• Std dev: SDbeta_*.img
• PPM: spmP_*.img

Bayesian Inference: PPMs and SPMs

  $p(\beta \mid y) \propto p(y \mid \beta)\, p(\beta)$  (posterior ∝ likelihood × prior)

• PPMs: Bayesian test on $p(\beta \mid y)$ with threshold $\gamma$. They show activations greater than a given size.
• SPMs: classical t-test on $t = f(y)$ with threshold $u$ under $p(t \mid \beta = 0)$. They show voxels with non-zero activations.

Example: auditory dataset

[Figure, Active > Rest: overlay of effect sizes at voxels where SPM is 99% sure that the effect size is greater than 2% of the global mean.]

[Figure, Active != Rest: overlay of $\chi^2$ statistics. This shows voxels where the activation is different between active and rest conditions, whether positive or negative.]

PPMs: Pros and Cons

Advantages:
■ One can infer that a cause DID NOT elicit a response.
■ SPMs conflate effect size and effect variability, whereas PPMs allow inference on the effect size of interest directly.

Disadvantages:
■ Use of priors over voxels is computationally demanding.
■ Practical benefits are yet to be established.
■ Threshold requires justification.

MEG/EEG Source Reconstruction (1)

Distributed source model:

  $Y = KJ + E$, with $Y$ [n×t], $K$ [n×p], $J$ [p×t], $E$ [n×t]

n: number of sensors
p: number of dipoles
t: number of time samples

Forward modelling maps the sources $J$ to the data $Y$ through $K$; the inverse procedure estimates $J$ from $Y$.
- under-determined system
- priors required: Bayesian framework

Mattout et al, 2006

MEG/EEG Source Reconstruction (2)

  $p(J \mid Y) \propto p(Y \mid J)\, p(J)$  (posterior ∝ likelihood × prior)

MAP estimate: minimise

  $U(J) = \| C_e^{-1/2} (Y - KJ) \|^2 + \| WJ \|^2$

where the first term comes from the likelihood and the second from the weighted minimum norm (WMN) prior

  $p(J) \sim N(0, C_j)$, $C_j^{-1} = W^T W$

2-level hierarchical model:

  $Y = KJ + E_1$, $E_1 \sim N(0, C_e)$
  $J = 0 + E_2$, $E_2 \sim N(0, C_p)$

Possible priors: minimum norm, smoothness prior, functional prior.

Mattout et al, 2006

Summary

■ Bayesian inference:
  - Incorporation of some prior beliefs
  - Preprocessing vs. modelling
  - Concept of Posterior Probability Maps
■ Variational Bayes for single-subject analyses: spatial prior on regression and AR coefficients.
■ Drawbacks: computation time (MCMC, Variational Bayes).
■ The Bayesian framework also allows Bayesian Model Comparison.

References
■ Penny and Friston, Classical and Bayesian Inference, Human Brain Function (2nd edition), 2003.
■ Friston et al., Classical and Bayesian Inference in Neuroimaging: Theory/Applications, NeuroImage, 2002.
■ Friston and Penny, Posterior Probability Maps and SPMs, NeuroImage, 2003.
■ Penny et al., Variational Bayesian Inference for fMRI time series, NeuroImage, 2003.
■ Penny et al., Bayesian fMRI time series analysis with spatial priors, NeuroImage, 2005.
■ Penny et al., Comparing Dynamic Causal Models, NeuroImage, 2004.
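
As a worked illustration of the Gaussian case and of the two PPM thresholds described in the slides, the following is a minimal single-voxel sketch. It is not SPM's implementation: the function names `gaussian_posterior` and `prob_exceeds` and all numerical values are assumptions made for the example, and the posterior is taken to be a scalar Gaussian.

```python
import math

def gaussian_posterior(y, lam1, theta2, lam2):
    """Posterior N(m, 1/p) for a Gaussian likelihood with precision lam1
    centred on the observation y and a Gaussian prior N(theta2, 1/lam2)."""
    p = lam1 + lam2                      # precisions add
    m = (lam1 * y + lam2 * theta2) / p   # precision-weighted mean
    return m, p

def prob_exceeds(m, p, gamma):
    """Posterior probability that the effect exceeds the activation
    threshold gamma, for a Gaussian posterior N(m, 1/p)."""
    sd = 1.0 / math.sqrt(p)
    return 0.5 * math.erfc((gamma - m) / (sd * math.sqrt(2)))

# Illustrative values only: observation 3.0 with precision 4,
# shrinkage prior centred on 0 with precision 1.
m, p = gaussian_posterior(y=3.0, lam1=4.0, theta2=0.0, lam2=1.0)

# PPM-style decision: display the voxel only if the probability of
# exceeding the activation threshold is at least 0.95.
ppm = prob_exceeds(m, p, gamma=2.0)
show = ppm >= 0.95
print(m, p, ppm, show)
```

With these values the posterior mean is pulled from the observation (3.0) towards the prior mean (0), and the voxel is not displayed because the posterior probability of exceeding the threshold stays below 0.95 even though the posterior mean lies above the threshold.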