Bayesian Inference
Guillaume Flandin
Wellcome Trust Centre for Neuroimaging
University College London
SPM Course
Zurich, February 2008
[Figure: the SPM analysis pipeline, with labels: image time-series, realignment, smoothing (kernel), normalisation (template), general linear model (design matrix), parameter estimates, statistical parametric map, statistical inference (Gaussian field theory, p < 0.05). Bayesian components marked on the pipeline: Bayesian segmentation and normalisation, spatial priors on activation extent, posterior probability maps (PPMs), Dynamic Causal Modelling.]
Overview
• Introduction
• Bayes’s rule
• Gaussian case
• Bayesian Model Comparison
• Bayesian inference
• aMRI: Segmentation and Normalisation
• fMRI: Posterior Probability Maps (PPMs)
  – Spatial prior (1st level)
• MEEG: Source reconstruction
• Summary
Classical approach shortcomings
In SPM, the p-value is the probability of obtaining data, summarised by a test statistic f(Y), at least as extreme as those observed if there were no effect. If it is sufficiently small, the null hypothesis that there is no effect can be rejected.

$p(f(Y) \mid H_0)$: probability of the data, given no activation

Shortcomings of this approach:
• One can never accept the null hypothesis
• Given enough data, one can always demonstrate a significant effect at every voxel

Solution: use the probability distribution of the activation given the data.

$p(\theta \mid Y)$: probability of the effect, given the observed data
→ Posterior probability
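Bayes’s Rule
Given p(Y), p(θ) and p(Y, θ), the conditional densities are given by

$p(\theta \mid Y) = \frac{p(Y, \theta)}{p(Y)}, \qquad p(Y \mid \theta) = \frac{p(Y, \theta)}{p(\theta)}$

Eliminating p(Y, θ) gives Bayes’s rule:

$p(\theta \mid Y) = \frac{p(Y \mid \theta)\, p(\theta)}{p(Y)}$

where p(θ | Y) is the posterior, p(Y | θ) the likelihood, p(θ) the prior and p(Y) the evidence.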
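Gaussian Case
Likelihood and Prior

$y = \theta^{(1)} + \epsilon^{(1)}, \qquad \theta^{(1)} = \theta^{(2)} + \epsilon^{(2)}$

$p(y \mid \theta^{(1)}) = N(\theta^{(1)}, (\lambda^{(1)})^{-1})$
$p(\theta^{(1)}) = N(\theta^{(2)}, (\lambda^{(2)})^{-1})$

Posterior

$p(\theta^{(1)} \mid y) = N(m, p^{-1})$
$p = \lambda^{(1)} + \lambda^{(2)}$
$m = \frac{\lambda^{(1)}}{p}\, y + \frac{\lambda^{(2)}}{p}\, \theta^{(2)}$

Relative Precision Weighting
[Figure: prior, likelihood and posterior densities; the posterior mean m lies between the prior mean θ^(2) and the data, weighted by their relative precisions.]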
Multivariate Gaussian
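As a rough numerical illustration of relative precision weighting, the sketch below computes the Gaussian posterior in the scalar case and in a multivariate analogue where the precisions become matrices. The function names and the example numbers are assumptions chosen for illustration; they are not taken from SPM.

```python
import numpy as np

def gaussian_posterior(y, lam1, theta2, lam2):
    """Scalar case: likelihood N(y; theta1, 1/lam1), prior N(theta1; theta2, 1/lam2).

    Returns the posterior mean m and posterior precision p.
    """
    p = lam1 + lam2                      # precisions add
    m = (lam1 * y + lam2 * theta2) / p   # precision-weighted average
    return m, p

def gaussian_posterior_mv(y, L1, mu, L2):
    """Multivariate analogue with precision matrices L1 (likelihood) and L2 (prior)."""
    P = L1 + L2
    m = np.linalg.solve(P, L1 @ y + L2 @ mu)
    return m, P

# Hypothetical numbers: a precise observation (lam1 = 4) and a vague prior (lam2 = 1).
m, p = gaussian_posterior(y=2.0, lam1=4.0, theta2=0.0, lam2=1.0)
print(m, p)  # m = (4*2 + 1*0) / 5 = 1.6, p = 5.0
```

The posterior mean sits closer to whichever of the data or the prior mean has the higher precision.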
Bayesian Inference
Three steps:
• Formulation of a generative model
  – likelihood p(Y | θ)
  – prior distribution p(θ)
• Observation of data Y
• Update of beliefs based upon observations, given a prior state of knowledge

$p(\theta \mid Y) = \frac{p(Y \mid \theta)\, p(\theta)}{p(Y)}$
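The three steps can be sketched on a grid of parameter values. The generative model here (Gaussian observations with unknown mean and a zero-mean Gaussian prior), the data values and all variable names are assumptions made up for the example.

```python
import numpy as np

# Step 1: generative model (assumed): y_i ~ N(theta, sigma^2), theta ~ N(0, tau^2)
sigma, tau = 1.0, 2.0
theta_grid = np.linspace(-10.0, 10.0, 4001)
dtheta = theta_grid[1] - theta_grid[0]
log_prior = -0.5 * (theta_grid / tau) ** 2

# Step 2: observation of data (hypothetical values)
y = np.array([1.2, 0.7, 1.9, 1.4])

# Step 3: update of beliefs, posterior = likelihood * prior / evidence
log_lik = (-0.5 * ((y[:, None] - theta_grid[None, :]) / sigma) ** 2).sum(axis=0)
log_post = log_lik + log_prior
post = np.exp(log_post - log_post.max())   # subtract the max for numerical stability
post /= post.sum() * dtheta                # normalising plays the role of dividing by p(Y)
post_mean = (theta_grid * post).sum() * dtheta

# Cross-check against the closed-form Gaussian result (precision weighting)
lam1, lam2 = len(y) / sigma**2, 1.0 / tau**2
analytic_mean = lam1 * y.mean() / (lam1 + lam2)
print(post_mean, analytic_mean)
```

A grid is only practical for one or two parameters, which is one reason approximate schemes are needed for larger models.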
Bayesian Model Comparison
Select the model m with the highest probability given the data:

$p(m \mid Y) = \frac{p(Y \mid m)\, p(m)}{p(Y)}$

Model evidence (marginal likelihood):

$p(Y \mid m) = \int p(Y \mid \theta_m, m)\, p(\theta_m \mid m)\, d\theta_m$

The evidence balances accuracy against complexity.

Model comparison and Bayes factor:

$B_{12} = \frac{p(Y \mid m_1)}{p(Y \mid m_2)}$

B12          p(m1|Y) (%)    Evidence
1 to 3       50-75          Weak
3 to 20      75-95          Positive
20 to 150    95-99          Strong
≥ 150        ≥ 99           Very strong
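The correspondence between the Bayes factor and p(m1|Y) in the table can be checked with a few lines of code, assuming the two models are equally probable a priori. The function names and the treatment of the category boundaries (lower bound inclusive) are assumptions for this sketch.

```python
def posterior_model_probability(b12, prior_m1=0.5):
    """p(m1 | Y) when only m1 and m2 are considered, given the Bayes factor B12."""
    prior_odds = prior_m1 / (1.0 - prior_m1)
    posterior_odds = b12 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

def evidence_category(b12):
    """Label the evidence for m1 over m2 following the table (boundaries are a convention)."""
    if b12 < 1:
        return "favours m2"
    if b12 < 3:
        return "Weak"
    if b12 < 20:
        return "Positive"
    if b12 < 150:
        return "Strong"
    return "Very strong"

for b in (3, 20, 150):
    print(b, round(posterior_model_probability(b), 3), evidence_category(b))
# 3 -> 0.75, 20 -> 0.952, 150 -> 0.993
```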
Bayes and Spatial Preprocessing
Normalisation
Bayesian regularisation of the deformation parameters θ:

$\log p(\theta \mid y) = \log p(y \mid \theta) + \log p(\theta) + \text{const}$

• log p(y | θ): mean square difference between template and source image (goodness of fit)
• log p(θ): squared distance between parameters and their expected values (regularisation)

[Figure: example of an unlikely deformation.]
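The trade-off between goodness of fit and regularisation can be sketched with a toy linear-Gaussian problem standing in for image registration. Everything below (the random basis B, the precisions, the function name) is an assumption for illustration and not the SPM normalisation code.

```python
import numpy as np

def map_estimate(B, y, theta0, noise_precision, prior_precision):
    """Maximise log p(y|theta) + log p(theta) for y = B theta + noise.

    The log likelihood penalises the squared difference y - B theta;
    the log prior penalises the squared distance of theta from theta0.
    """
    A = noise_precision * B.T @ B + prior_precision * np.eye(B.shape[1])
    b = noise_precision * B.T @ y + prior_precision * theta0
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
B = rng.normal(size=(30, 10))                 # hypothetical basis functions
theta_true = np.zeros(10)
theta_true[0] = 1.0
y = B @ theta_true + 0.5 * rng.normal(size=30)
theta0 = np.zeros(10)                         # expected parameter values

theta_ml = map_estimate(B, y, theta0, 4.0, 0.0)    # no prior: pure goodness of fit
theta_map = map_estimate(B, y, theta0, 4.0, 20.0)  # with the regularising prior

fit_ml = np.mean((y - B @ theta_ml) ** 2)
fit_map = np.mean((y - B @ theta_map) ** 2)
print(fit_ml, fit_map)
print(np.linalg.norm(theta_ml), np.linalg.norm(theta_map))
```

The unregularised fit always has the lower residual, but at the price of parameters that stray further from their expected values.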
Bayes and Spatial Preprocessing
Without Bayesian constraints, the non-linear spatial normalisation can introduce unnecessary warps.
[Figure: template image; affine registration (χ² = 472.1); non-linear registration using Bayes (χ² = 302.7); non-linear registration without Bayes constraints (χ² = 287.3).]
Bayes and Spatial Preprocessing
Segmentation
Empirical priors
• Intensities are modelled by a mixture of K Gaussian distributions.
• Prior probability maps of tissue membership are overlaid to assist the segmentation:
  – The prior probability of each voxel being of a particular type is derived from segmented images of 151 subjects.
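A minimal sketch of how a spatial prior map combines with a Gaussian mixture over intensities: each voxel's posterior class probability is proportional to its prior probability times the Gaussian likelihood of its intensity. The class means, standard deviations and prior values below are made-up numbers, and the function name is hypothetical.

```python
import numpy as np

def tissue_posteriors(intensity, means, sds, prior_maps):
    """Posterior class probabilities per voxel.

    intensity:  (V,) voxel intensities
    means, sds: (K,) parameters of the K Gaussians
    prior_maps: (V, K) prior probability of each class at each voxel
    """
    z = (intensity[:, None] - means[None, :]) / sds[None, :]
    lik = np.exp(-0.5 * z**2) / (sds[None, :] * np.sqrt(2.0 * np.pi))
    joint = prior_maps * lik
    return joint / joint.sum(axis=1, keepdims=True)

means = np.array([0.2, 0.5, 0.8])          # hypothetical class means
sds = np.array([0.1, 0.1, 0.1])
intensity = np.array([0.2, 0.5, 0.5])
prior_maps = np.array([
    [1 / 3, 1 / 3, 1 / 3],                 # uninformative prior
    [1 / 3, 1 / 3, 1 / 3],                 # uninformative prior
    [0.8, 0.1, 0.1],                       # prior strongly favours class 0 here
])
post = tissue_posteriors(intensity, means, sds, prior_maps)
print(post.round(3))
```

Voxels 1 and 2 have the same intensity, yet the third voxel receives a higher probability for class 0 because of its prior.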
Unified segmentation & normalisation
• Circular relationship between segmentation & normalisation:
  – Knowing which tissue type a voxel belongs to helps normalisation.
  – Knowing where a voxel is (in standard space) helps segmentation.
• Build a joint generative model:
  – model how voxel intensities result from a mixture of tissue type distributions
  – model how tissue types of one brain have to be spatially deformed to match those of another brain
• Using a priori knowledge about the parameters: adopt a Bayesian approach and maximise the posterior probability
Ashburner & Friston 2005, NeuroImage
Bayesian fMRI
General Linear Model:

$Y = X\beta + \epsilon$ with $\epsilon \sim N(0, C_\epsilon)$

What are the priors?
• In “classical” SPM, no (flat) priors
• In “full” Bayes, priors might be from theoretical arguments or from independent data
• In “empirical” Bayes, priors derive from the same data, assuming a hierarchical model for generation of the data: parameters of one level can be made priors on the distribution of parameters at the lower level
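The empirical Bayes idea can be illustrated with a two-level toy model in which the prior variance is estimated from the same data and then used to shrink each estimate. The model (one effect per voxel, known noise variance, moment-matching estimate of the prior variance) is an assumption chosen to keep the sketch short; it is not the estimation scheme used in SPM.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vox, tau2_true, sigma2 = 5000, 1.0, 4.0

# Level 2: effects drawn from a common prior; level 1: noisy observations of them
theta = rng.normal(0.0, np.sqrt(tau2_true), n_vox)
y = theta + rng.normal(0.0, np.sqrt(sigma2), n_vox)

# Empirical prior: var(y) = tau2 + sigma2, so tau2 is estimated from the data themselves
tau2_hat = max(y.var() - sigma2, 0.0)

# Posterior mean under the estimated prior shrinks each observation towards zero
shrink = tau2_hat / (tau2_hat + sigma2)
theta_eb = shrink * y

mse_raw = np.mean((y - theta) ** 2)
mse_eb = np.mean((theta_eb - theta) ** 2)
print(tau2_hat, shrink, mse_raw, mse_eb)
```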
Bayesian fMRI with spatial priors
Even without applied spatial smoothing, activation maps (and maps of e.g. AR coefficients) have spatial structure.
[Figure: maps of a contrast and of AR(1) coefficients.]
→ Definition of a spatial prior via a Gaussian Markov Random Field
→ Automatic spatial regularisation of regression coefficients and AR coefficients
The Generative Model
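General Linear Model with Auto-Regressive error terms (GLM-AR):
Y = Xβ + E where E is an AR(p) process

$y_t = X_t \beta + \sum_{i=1}^{p} a_i e_{t-i} + \epsilon_t$

Priors on the regression and AR coefficients:

$p(\beta_k) = N(0, \alpha_k^{-1} D^{-1})$
$p(a_p) = N(0, \gamma_p^{-1} D^{-1})$

[Figure: graphical model of the dependencies between hyperparameters, coefficients and the data Y.]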
Spatial prior
Over the regression coefficients:

$p(\beta_k) = N(0, \alpha_k^{-1} D^{-1})$ (shrinkage prior)

• α_k: spatial precision, determines the amount of smoothness
• D: spatial kernel matrix

Gaussian Markov Random Field priors: the entries of D are
• 1 on diagonal elements d_ii
• d_ij > 0 if voxels i and j are neighbours
• 0 elsewhere

Same prior on the AR coefficients.
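To get a feel for how a prior of the form N(0, α⁻¹D⁻¹) couples neighbouring voxels, the sketch below builds a precision structure for a one-dimensional chain of voxels and inspects the implied covariance. It uses a graph Laplacian plus a small ridge as D, which is an assumption made for illustration (a common GMRF construction); the exact entries of D in SPM may differ, and the names and values are hypothetical.

```python
import numpy as np

def chain_laplacian(n):
    """Graph Laplacian of a 1D chain: each voxel is a neighbour of the next one."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L

n = 50
D = chain_laplacian(n) + 1e-2 * np.eye(n)   # small ridge keeps D invertible
alpha = 10.0                                 # spatial precision (hypothetical value)

cov = np.linalg.inv(alpha * D)               # prior covariance alpha^-1 D^-1
cov = 0.5 * (cov + cov.T)                    # remove numerical asymmetry

rng = np.random.default_rng(0)
beta = rng.multivariate_normal(np.zeros(n), cov)   # one spatially smooth draw

neighbour_corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print(neighbour_corr)
```

Increasing α tightens the prior and shrinks the coefficients more strongly.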
Prior, Likelihood and Posterior
The prior:

$p(\beta, A, \lambda, \alpha, \gamma) = \left[\prod_k p(\beta_k \mid \alpha_k)\, p(\alpha_k \mid q_1, q_2)\right] \left[\prod_p p(a_p \mid \gamma_p)\, p(\gamma_p \mid r_1, r_2)\right] \left[\prod_n p(\lambda_n \mid u_1, u_2)\right]$

The likelihood:

$p(Y \mid \beta, A, \lambda) = \prod_n p(y_n \mid \beta_n, a_n, \lambda_n)$

The posterior?

$p(\beta \mid Y)$ ?

The posterior over β doesn’t factorise over k or n.
→ Exact inference is intractable.
Variational Bayes
Approximate posterior that allows for factorisation:

$q(\beta, A, \lambda, \alpha, \gamma) = \left[\prod_k q(\alpha_k \mid Y)\right] \left[\prod_p q(\gamma_p \mid Y)\right] \left[\prod_n q(\beta_n \mid Y)\, q(a_n \mid Y)\, q(\lambda_n \mid Y)\right]$
Variational Bayes Algorithm
Initialisation
While (ΔF > tol)
Update Suff. Stats. for β
Update Suff. Stats. for A
Update Suff. Stats. for λ
Update Suff. Stats. for α
Update Suff. Stats. for γ
End
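The same iterate-until-converged structure can be seen in a much smaller problem: Variational Bayes for the mean and precision of a univariate Gaussian with a factorised posterior q(μ)q(τ). This is a textbook toy example, not the GLM-AR scheme; the priors, the names and the stopping rule (change in the expected precision, used here as a simple stand-in for monitoring ΔF) are assumptions.

```python
import numpy as np

def vb_gaussian(x, mu0=0.0, lam0=1e-3, a0=1e-3, b0=1e-3, tol=1e-10, max_iter=100):
    """Factorised posterior q(mu) q(tau) for x_i ~ N(mu, 1/tau).

    Assumed priors: mu | tau ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0).
    Returns the parameters of q(mu) = N(mu_n, 1/lam_n), q(tau) = Gamma(a_n, b_n)
    and the number of iterations used.
    """
    n, xbar = len(x), np.mean(x)
    e_tau = 1.0                                   # initialisation
    for it in range(max_iter):
        # Update sufficient statistics for q(mu), given the current E[tau]
        mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
        lam_n = (lam0 + n) * e_tau
        # Update sufficient statistics for q(tau), given the current q(mu)
        a_n = a0 + 0.5 * (n + 1)
        b_n = b0 + 0.5 * (np.sum((x - mu_n) ** 2) + n / lam_n
                          + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
        e_tau_new = a_n / b_n
        converged = abs(e_tau_new - e_tau) < tol
        e_tau = e_tau_new
        if converged:
            break
    return mu_n, lam_n, a_n, b_n, it + 1

rng = np.random.default_rng(0)
x = rng.normal(2.0, 0.5, size=500)                # synthetic data
mu_n, lam_n, a_n, b_n, n_iter = vb_gaussian(x)
print(mu_n, a_n / b_n, n_iter)
```

Each pass updates one factor while holding the others fixed, which is the pattern of the loop over β, A, λ, α and γ.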
Event related fMRI: familiar versus unfamiliar faces
[Figure: activation maps obtained with smoothing, with a global prior and with a spatial prior.]
Convergence & Sensitivity
[Figure: convergence plot of F against iteration number, and ROC curves (sensitivity against 1 − specificity) comparing the global prior, the spatial prior and smoothing.]
SPM5 Interface
Posterior Probability Maps
Posterior distribution: probability of getting an effect, given the data

$p(\beta \mid y)$

mean: size of effect
precision: variability

Posterior probability map: images of the probability or confidence that an activation exceeds some specified threshold, given the data

$p(\beta > \gamma \mid y) \geq \alpha$

Two thresholds:
• activation threshold γ: percentage of whole brain mean signal (physiologically relevant size of effect)
• probability α that voxels must exceed to be displayed (e.g. 95%)
Posterior Probability Maps
[Figure: posterior probability distribution p(β | Y), with the activation threshold γ and the probability α = p(β > γ | y) shown as the area above the threshold.]
Images involved:
• Mean (Cbeta_*.img)
• Std dev (SDbeta_*.img)
• PPM (spmP_*.img)
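Given a posterior mean and standard deviation at each voxel, the two thresholds can be applied as in the sketch below. It assumes a Gaussian posterior at every voxel; the function names and the example values are hypothetical and do not correspond to SPM functions.

```python
import math

def posterior_exceedance(mean, sd, gamma):
    """p(beta > gamma | y) for a Gaussian posterior N(mean, sd^2)."""
    return 0.5 * math.erfc((gamma - mean) / (sd * math.sqrt(2.0)))

def ppm(means, sds, gamma, alpha=0.95):
    """Exceedance probability per voxel, and whether each voxel would be displayed."""
    probs = [posterior_exceedance(m, s, gamma) for m, s in zip(means, sds)]
    shown = [p >= alpha for p in probs]
    return probs, shown

# Three hypothetical voxels: posterior means and standard deviations of the effect
means = [3.0, 2.5, 0.5]
sds = [0.4, 1.0, 0.1]
probs, shown = ppm(means, sds, gamma=2.0, alpha=0.95)
print([round(p, 3) for p in probs], shown)
# The first voxel (probability about 0.994) is displayed; the other two are not.
```

The second voxel has a mean above the threshold but too much uncertainty to reach 95%, which shows how effect size and variability enter separately.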
Bayesian Inference
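$p(\beta \mid y) \propto p(y \mid \beta)\, p(\beta)$ (posterior ∝ likelihood × prior)

PPMs (Bayesian test): based on the posterior p(β | y) and the activation threshold γ. They show activations greater than a given size.

SPMs (classical T-test): based on the statistic t = f(y) and its null distribution p(t | β = 0), with threshold u. They show voxels with non-zero activations.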
Example: auditory dataset
Active > Rest: overlay of effect sizes at voxels where SPM is 99% sure that the effect size is greater than 2% of the global mean.
Active != Rest: overlay of χ² statistics. This shows voxels where the activation is different between active and rest conditions, whether positive or negative.
PPMs: Pros and Cons
Advantages
■ One can infer that a cause DID NOT elicit a response
■ SPMs conflate effect-size and effect-variability, whereas PPMs allow inference on the effect size of interest directly
Disadvantages
■ Use of priors over voxels is computationally demanding
■ Practical benefits are yet to be established
■ Threshold requires justification
MEG/EEG Source Reconstruction (1)
Distributed source model:

$Y = KJ + E$

with Y [n×t] the data, K [n×p] the forward model, J [p×t] the sources and E [n×t] the noise
n : number of sensors
p : number of dipoles
t : number of time samples

Forward modelling maps the sources J to the data Y; the inverse procedure estimates J from Y.
- under-determined system
- priors required
→ Bayesian framework
Mattout et al, 2006
MEG/EEG Source Reconstruction (2)
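$p(J \mid Y) \propto p(Y \mid J)\, p(J)$ (posterior ∝ likelihood × prior)

$U_{MAP}(J) = \| C_e^{-1/2} (Y - KJ) \|^2 + \lambda \| WJ \|^2$

The first term comes from the likelihood and the second from the WMN (weighted minimum norm) prior:

$p(J) \sim N(0, C_j), \qquad C_j^{-1} \propto W^T W$

Choices of prior: minimum norm, smoothness prior, functional prior

2-level hierarchical model:

$Y = KJ + E_1, \qquad E_1 \sim N(0, C_e)$
$J = 0 + E_2, \qquad E_2 \sim N(0, C_p)$

Mattout et al, 2006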
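For a linear model with Gaussian noise and a zero-mean Gaussian prior on the sources, the MAP estimate has a closed form, sketched below on synthetic data. The random forward matrix, the covariances and the function name are assumptions for illustration; real lead fields and covariance components would come from the forward model and the data.

```python
import numpy as np

def map_sources(Y, K, C_e, C_j):
    """MAP estimate of J in Y = K J + E, with E ~ N(0, C_e) and J ~ N(0, C_j).

    Sensor-space form: J_hat = C_j K^T (K C_j K^T + C_e)^{-1} Y,
    which only requires solving an n x n system (n = number of sensors).
    """
    S = K @ C_j @ K.T + C_e
    return C_j @ K.T @ np.linalg.solve(S, Y)

rng = np.random.default_rng(0)
n, p, t = 8, 20, 5                       # sensors, dipoles, time samples (n < p)
K = rng.normal(size=(n, p))              # hypothetical forward matrix
J_true = np.zeros((p, t))
J_true[3] = 1.0                          # a single active dipole
Y = K @ J_true + 0.1 * rng.normal(size=(n, t))

C_e = 0.01 * np.eye(n)                   # sensor noise covariance
C_j = np.eye(p)                          # identity prior covariance: minimum norm
J_hat = map_sources(Y, K, C_e, C_j)

# Equivalent parameter-space form, used here only as a cross-check
J_alt = np.linalg.solve(K.T @ np.linalg.inv(C_e) @ K + np.linalg.inv(C_j),
                        K.T @ np.linalg.inv(C_e) @ Y)
print(np.allclose(J_hat, J_alt))
```

Without the prior term the system would have no unique solution, since there are more dipoles than sensors.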
Summary
Bayesian inference:
• Incorporation of some prior beliefs
• Preprocessing vs. modelling
• Concept of Posterior Probability Maps
• Variational Bayes for single-subject analyses:
  – Spatial prior on regression and AR coefficients
Drawbacks:
• Computation time: MCMC, Variational Bayes
Bayesian framework also allows:
• Bayesian Model Comparison
References
■ Classical and Bayesian Inference, Penny and Friston,
Human Brain Function (2nd edition), 2003.
■ Classical and Bayesian Inference in Neuroimaging:
Theory/Applications, Friston et al., NeuroImage, 2002.
■ Posterior Probability Maps and SPMs, Friston and
Penny, NeuroImage, 2003.
■ Variational Bayesian Inference for fMRI time series,
Penny et al., NeuroImage, 2003.
■ Bayesian fMRI time series analysis with spatial priors,
Penny et al., NeuroImage, 2005.
■ Comparing Dynamic Causal Models, Penny et al,
NeuroImage, 2004.