Statistical Parametric
Mapping (SPM)
Talk I:
Spatial Pre-processing & Morphometry
Talk II:
General Linear Model
Talk III:
Experimental Design & Efficiency
Talk IV:
EEG/MEG
General Linear Model (GLM)
& Random Field Theory (RFT)
Rik Henson
With thanks to: Karl Friston, Andrew Holmes, Tom Nichols, Stefan Kiebel
Overview
[Pipeline figure: fMRI time-series → motion correction → smoothing (kernel) → spatial normalisation (standard template) → General Linear Model (design matrix) → parameter estimates → Statistical Parametric Map]
Some Terminology
• SPM (“Statistical Parametric Mapping”) is a massively
univariate approach - meaning that a statistic (e.g., T-value) is
calculated for every voxel - using the “General Linear Model”
• Experimental manipulations are specified in a model (“design
matrix”) which is fit to each voxel to estimate the size of the
experimental effects (“parameter estimates”) in that voxel…
• … on which one or more hypotheses (“contrasts”) are tested to
make statistical inferences (“p-values”), correcting for multiple
comparisons across voxels (using “Random Field Theory”)
• The parametric statistics assume continuous-valued data and
additive noise that conforms to a “Gaussian” distribution
(the “nonparametric” version, SnPM, eschews such assumptions)
Some Terminology
• SPM usually focuses on “functional specialisation” - i.e.
localising different functions to different regions in the brain
• One might also be interested in “functional integration” - how
different regions (voxels) interact
• Multivariate approaches work on whole images and can identify
spatial/temporal patterns over voxels, without necessarily
specifying a design matrix (PCA, ICA)...
• … or with an experimental design matrix (PLS, CVA), or with an
explicit anatomical model of connectivity between regions -
“effective connectivity” - eg using Dynamic Causal Modelling
Overview
1. General Linear Model
Design Matrix
Estimation/Contrasts
Covariates (eg global)
Estimability/Correlation
2. fMRI timeseries
Highpass filtering
HRF convolution
Autocorrelation (nonsphericity)
3. T and F-contrasts
4. Statistical Inference
Random Field Theory (FWE)
False Discovery Rate (FDR)
Posterior Probability Maps (PPM)
5. Mixed (Fixed & Random) Effects
General Linear Model…
• Parametric statistics:
- one sample t-test
- two sample t-test
- paired t-test
- Anova
- AnCova
- correlation
- linear regression
- multiple regression
- F-tests
- etc…
• …are all special cases of the General Linear Model
General Linear Model
• Equation for a single voxel (and, separately, for every voxel):
yj = xj1 b1 + … + xjL bL + ej
yj : data for scan, j = 1…J
xjl : explanatory variables / covariates / regressors, l = 1…L
bl : parameters / regression slopes / fixed effects
ej : residual errors, independent & identically distributed (“iid”)
Gaussian with mean zero and standard deviation σ: ej ~ N(0, σ2)
• Equivalent matrix form:
y = Xb + e
X : “design matrix” / model
Matrix Formulation
[Figure: the equation for scan j, stacked as simultaneous equations for scans 1..J - the rows of y and X are scans, the columns of X are regressors - that can be solved for the parameters b1..bL]
General Linear Model (Estimation)
• Estimate parameters by a least-squares fit to the data, y:
b^ = (XTX)-1XTy = X+y (OLS estimates)
• Fitted response is:
Y^ = Xb^
• Residual errors and estimated error variance are:
e^ = y - Y^
s^2 = e^Te^ / df
where df are the degrees of freedom (assuming iid):
df = J - rank(X) (= J - L if X full rank)
(equivalently, with the residual-forming matrix R = I - XX+:
e^ = Ry, df = trace(R))
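As an illustrative sketch (not SPM code), the OLS estimation above can be written directly with NumPy's pseudoinverse; the toy design (a constant plus a linear trend) and the simulated data are assumptions for the example:

```python
import numpy as np

J = 12                                              # number of scans
X = np.column_stack([np.ones(J), np.arange(J)])     # toy design: mean + linear trend
rng = np.random.default_rng(0)
y = X @ np.array([10.0, 0.5]) + rng.normal(0, 1, J) # simulated voxel data

Xp = np.linalg.pinv(X)        # X+ (pseudoinverse)
b_hat = Xp @ y                # b^ = X+ y  (OLS estimates)
Y_hat = X @ b_hat             # fitted response Y^ = X b^
e_hat = y - Y_hat             # residuals e^ = y - Y^
R = np.eye(J) - X @ Xp        # residual-forming matrix R = I - X X+
df = np.trace(R)              # df = trace(R) = J - rank(X)
s2_hat = e_hat @ e_hat / df   # estimated error variance s^2
```

Note that the residuals come out orthogonal to the design space, which is the geometric picture on the next slide.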
GLM Estimation – Geometric Perspective
[Figure: a three-scan example with design columns X1 = (x1, x2, x3)T and X2 = (1, 1, 1)T:
y1 = x1 b1 + 1 b2 + e1
y2 = x2 b1 + 1 b2 + e2
y3 = x3 b1 + 1 b2 + e3
The fitted response Y^ = b1 X1 + b2 X2 is the projection of the data (y1, y2, y3) onto the plane (“design space”) spanned by X1 and X2; the residuals e^ = (e^1, e^2, e^3)T are orthogonal to that plane]
General Linear Model (Inference)
• Specify a contrast (hypothesis), c, a linear combination of
parameter estimates, cTb^, eg:
cT = [1 -1 0 0]
• Calculate a T-statistic for that contrast (c a vector):
T = cTb^ / std(cTb^) = cTb^ / sqrt(s^2 cT(XTX)-1c)
…or an F-statistic (c a matrix):
F = [(e0Te0 - eTe) / (L - L0)] / [eTe / (J - L)]
where e0 and L0 are the residuals and rank, respectively, from
the reduced model specified by (the null space of) c, eg:
c = [ 2 -1 -1 0
     -1 2 -1 0
     -1 -1 2 0]
• The p-value is the probability of falsely rejecting the null
hypothesis, H0: cTb = 0
[Figure: T-distribution under H0, thresholded at u, showing p(t | b = 0), where t = f(y)]
Simple “ANOVA-like” Example
• 12 scans, 3 conditions (1-way ANOVA):
yj = x1j b1 + x2j b2 + x3j b3 + x4j b4 + ej
where the (dummy) variables are:
x1j = 0 or 1 : condition A (first 4 scans)
x2j = 0 or 1 : condition B (second 4 scans)
x3j = 0 or 1 : condition C (third 4 scans)
x4j = 1 : grand mean (so rank(X) = 3)
• T-contrasts:
[1 -1 0 0] tests whether A>B
[-1 1 0 0] tests whether B>A
• F-contrast:
[ 2 -1 -1 0
 -1 2 -1 0
 -1 -1 2 0] tests the main effect of A, B, C
In numbers, y = Xb^ + e^:
y  = [13.9 10.4 6.2 13.9 | 18.0 18.1 15.1 21.8 | 25.7 30.8 21.2 26.7]T
X  = [A B C mean], with A = 1 for scans 1-4 (else 0), B = 1 for scans 5-8, C = 1 for scans 9-12, and mean = 1 for all 12 scans
b^ = [-2.8 4.4 12.2 13.4]T
e^ = [2.8 -0.7 -4.9 2.8 | -0.2 -0.2 -3.2 3.5 | -0.4 4.7 -4.9 0.6]T
For c = [-1 1 0 0]:
T = 7.1 / sqrt(12.1 × 0.5) = 2.9, df = 12 - 3 = 9, p < .05
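The worked example can be reproduced numerically; this sketch (NumPy, not SPM itself) recovers the slide's minimum-norm parameter estimates via the pseudoinverse and a T-statistic of about 2.9:

```python
import numpy as np

y = np.array([13.9, 10.4, 6.2, 13.9, 18.0, 18.1, 15.1, 21.8,
              25.7, 30.8, 21.2, 26.7])
X = np.zeros((12, 4))
X[0:4, 0] = X[4:8, 1] = X[8:12, 2] = 1.0   # condition dummies A, B, C
X[:, 3] = 1.0                              # grand mean column => rank(X) = 3

b = np.linalg.pinv(X) @ y                  # minimum-norm parameter estimates
e = y - X @ b                              # residuals
df = 12 - np.linalg.matrix_rank(X)         # 12 - 3 = 9
s2 = e @ e / df                            # ~12.1

c = np.array([-1.0, 1.0, 0.0, 0.0])        # tests B > A
# for an estimable contrast, c'(X'X)+c gives the correct variance factor (0.5)
T = (c @ b) / np.sqrt(s2 * c @ np.linalg.pinv(X.T @ X) @ c)   # ~2.9
```

Because c sums to zero over the redundant columns, it is estimable despite the rank deficiency, and cTb^ is invariant to the choice of generalised inverse.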
Global Effects
• There may be variation in overall image intensity from scan to scan
• Such “global” changes may confound the local/regional changes
induced by the experiment
• Adjust for global effects by:
- AnCova (additive model) - PET?
- Proportional scaling - fMRI?
• Adjustment can improve statistics when the global effect is
orthogonal to the effects of interest (as here)…
• …but can also worsen them when the effects of interest are
correlated with the global (as next)
[Figure: regional activity plotted against global activity, with the AnCova (additive) fit and the proportional-scaling fit]
Simple ANCOVA Example
• 12 scans, 3 conditions, 1 confounding covariate:
yj = x1j b1 + x2j b2 + x3j b3 + x4j b4 + x5j b5 + ej
where the (dummy) variables are:
x1j = 0 or 1 : condition A (first 4 scans)
x2j = 0 or 1 : condition B (second 4 scans)
x3j = 0 or 1 : condition C (third 4 scans)
x4j = 1 : grand mean
x5j : global signal (mean over all voxels, further mean-corrected
over all scans)
• Other covariates (confounds) could be movement parameters, time
during the experiment, etc
• The global is correlated here with the conditions (and time)
In numbers, y = Xb^ + e^:
y  = [13.9 10.4 6.2 13.9 | 18.0 18.1 15.1 21.8 | 25.7 30.8 21.2 26.7]T
X  = [A B C mean global], as before but with a fifth column holding the global covariate (values 1…12 before mean-correction, ie increasing over scans)
b^ = [-3.0 4.5 12.7 14.3 -0.1]T
e^ = [2.6 -0.7 -4.9 2.9 | -0.3 -0.2 -3.1 3.7 | -0.5 4.6 -4.8 0.7]T
For c = [-1 1 0 0 0]:
T = 7.5 / sqrt(13.6 × 1.6) = 1.62, df = 12 - 4 = 8, p > .05
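This example too can be checked numerically; the sketch below (NumPy, not SPM; the global covariate is idealised as the values 1…12, mean-corrected, as in the slide) shows the same contrast now failing to reach significance:

```python
import numpy as np

y = np.array([13.9, 10.4, 6.2, 13.9, 18.0, 18.1, 15.1, 21.8,
              25.7, 30.8, 21.2, 26.7])
X = np.zeros((12, 5))
X[0:4, 0] = X[4:8, 1] = X[8:12, 2] = 1.0   # condition dummies A, B, C
X[:, 3] = 1.0                              # grand mean
g = np.arange(1.0, 13.0)                   # global signal, correlated with condition
X[:, 4] = g - g.mean()                     # mean-corrected covariate

b = np.linalg.pinv(X) @ y
e = y - X @ b
df = 12 - np.linalg.matrix_rank(X)         # 12 - 4 = 8
s2 = e @ e / df                            # ~13.6

c = np.array([-1.0, 1.0, 0.0, 0.0, 0.0])
T = (c @ b) / np.sqrt(s2 * c @ np.linalg.pinv(X.T @ X) @ c)   # ~1.6, p > .05
```

The contrast estimate barely changes (7.5 vs 7.1), but its variance grows because the covariate is correlated with the condition difference, so the T drops below threshold.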
Global Effects (fMRI)
• Two types of scaling: Grand Mean scaling and Global scaling
• Grand Mean scaling is automatic; global scaling is optional
• Grand Mean scaling scales by 100/mean over all voxels and ALL scans
(i.e., a single number per session)
• Global scaling scales by 100/mean over all voxels for EACH scan
(i.e., a different scaling factor for every scan)
• The problem with global scaling is that the TRUE global is not (normally) known…
• …we only estimate it by the mean over voxels
• So if there is a large signal change over many voxels, the global estimate
will be confounded by local changes
• This can produce artifactual deactivations in other regions after global scaling
• Since most sources of global variability in fMRI are low frequency (drift),
high-pass filtering may be sufficient, and most people do not use global scaling
A word on correlation/estimability
• If any column of X is a linear combination of the others (X is
rank deficient), some parameters cannot be estimated uniquely
(they are inestimable)…
• …which means some contrasts cannot be tested (eg, only those
whose weights sum to zero over the redundant columns)
• This has implications for whether the “baseline” (constant term)
is explicitly or implicitly modelled
[Figure: two parameterisations of the same two-condition design - an “implicit” baseline (columns A, B, A+B; rank(X) = 2) and an “explicit” baseline (columns A, A+B). Example contrasts: cm = [1 0 0] (condition mean) is inestimable, cd = [1 -1 0] (difference) is estimable; the parameter estimates differ between parameterisations (b1 = 1.6, b2 = 0.7 versus b1 = 0.9, b2 = 0.7) but the estimable difference contrast gives the same value, cd*b = 0.9]
A word on correlation/estimability (cont.)
• Rank deficiency might be thought of as perfect correlation
between (combinations of) regressors
[Figure: the “implicit” (A, B, A+B) and “explicit” (A, A+B) parameterisations are related by the reparameterisation T = [1 1; 0 1]: X(1) * T = X(2), and contrasts map as c(1)T * T = c(2)T, eg [1 -1] * T = [1 0]]
A word on correlation/estimability (cont.)
• When there is high (but not perfect) correlation between
regressors, parameters can be estimated…
• …but the estimates will be inefficient (ie highly variable)…
• …meaning some contrasts will not lead to very powerful tests
• SPM shows the pairwise correlation between regressors…
• …but this will NOT tell you that, eg, X1+X2 is highly
correlated with X3…
• …so some contrasts can still be inefficient/efficient, even
though pairwise correlations are low/high
[Figure: columns A, B, A+B convolved with the HRF; cm = [1 0 0] and cd = [1 -1 0] example contrasts]
A word on orthogonalisation
• To remove the correlation between two regressors, you can
explicitly orthogonalise one (X1) with respect to the other (X2):
X1^ = X1 – (X2X2+)X1
(Gram-Schmidt)
• Paradoxically, this will NOT change the parameter estimate for
X1, but it will for X2
• In other words, the parameter estimate for the orthogonalised
regressor is unchanged!
• This reflects the fact that parameter estimates automatically
reflect the orthogonal component of each regressor…
• …so there is no need to orthogonalise, UNLESS you have an a
priori reason for assigning the common variance to the other
regressor
[Figure: the data Y projected onto the plane spanned by X1 and X2; orthogonalising X1 to give X1^ leaves b1 unchanged but changes b2 to b2^]
A word on orthogonalisation (cont.)
[Figure: original model, regressors X1 and X2: b1 = 0.9, b2 = 0.7
Orthogonalise X2 (Model M1), regressors X1 and X2^: b1(M1) = 1.6, b2(M1) = 0.7
Orthogonalise X1 (Model M2), regressors X1^ and X2: b1(M2) = 0.9 = b1(M1) - b2(M1); b2(M2) = 1.1 ≈ (b1(M1) + b2(M1)) / 2
(the two models are related by the reparameterisation T = [1 1; -1 1])]
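A quick numerical check of the claim (a sketch with made-up correlated regressors, not SPM code): orthogonalising x1 with respect to x2 leaves x1's estimate unchanged, while x2's estimate absorbs the shared variance:

```python
import numpy as np

rng = np.random.default_rng(1)
x2 = rng.normal(size=50)
x1 = 0.6 * x2 + rng.normal(size=50)          # correlated with x2
y = 0.9 * x1 + 0.7 * x2 + rng.normal(size=50)

X = np.column_stack([x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]     # original estimates

x1o = x1 - x2 * (x2 @ x1) / (x2 @ x2)        # X1^ = X1 - (X2 X2+) X1 (Gram-Schmidt)
Xo = np.column_stack([x1o, x2])
bo = np.linalg.lstsq(Xo, y, rcond=None)[0]   # bo[0] == b[0]; bo[1] differs
```

This is exactly the Frisch-Waugh result: an OLS estimate already reflects only the component of its regressor orthogonal to the others.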
fMRI Analysis
1. Scans are treated as a timeseries…
…and can be filtered to remove low-frequency (1/f) noise
2. Effects of interest are convolved with the haemodynamic
response function (HRF), to capture the sluggish nature of the
BOLD response
3. Scans can no longer be treated as independent observations…
…they are typically temporally autocorrelated (for TRs < 8s)
(Epoch) fMRI example…
[Figure: a voxel timeseries modelled as y = b1 × (box-car function, unconvolved) + b2 × (baseline, ie mean) + e(t); in matrix form, y = Xb + e]
Low frequency noise
• Several causes of noise:
- Physical (scanner drifts)
- Physiological (aliased):
• cardiac (~1 Hz)
• respiratory (~0.25 Hz)
[Figure: power spectrum showing the low-frequency noise, the signal (eg an infinite 30s on-off design), and the highpass filter that removes the noise while sparing the signal]
(Epoch) fMRI example… with highpass filter
[Figure: the design matrix X augmented with a set of low-frequency (discrete cosine) regressors, b3…b9, implementing the highpass filter: y = Xb + e]
(Epoch) fMRI example… fitted and adjusted data
[Figure: raw fMRI timeseries, highpass filtered (and scaled); the fitted box-car and fitted high-pass filter; adjusted data; residuals]
Convolution with HRF
[Figure: fitting the unconvolved boxcar function leaves structured residuals; convolving the boxcar with the hæmodynamic response function (HRF) gives a better fit and residuals with less structure]
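A sketch of this convolution step (the double-gamma HRF parameters below - positive peak around 6 s, undershoot around 16 s, ratio 1/6 - are commonly used defaults, assumed here rather than taken from the slide):

```python
import numpy as np
from math import gamma

def hrf(t, peak=6.0, under=16.0, ratio=1.0 / 6):
    """Difference of two gamma densities: positive peak minus a late undershoot."""
    pdf = lambda t, h: t ** (h - 1) * np.exp(-t) / gamma(h)
    return pdf(t, peak) - ratio * pdf(t, under)

TR = 2.0
t = np.arange(0.0, 32.0, TR)                    # 32 s kernel, sampled every TR
h = hrf(t)

boxcar = np.tile(np.r_[np.ones(10), np.zeros(10)], 4)   # 20 s on / 20 s off epochs
regressor = np.convolve(boxcar, h)[:len(boxcar)]        # delayed, smoothed response
```

The convolved regressor lags the boxcar by several seconds, which is exactly why it fits the BOLD response better.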
Temporal autocorrelation…
• Because the data are typically correlated from one scan to the
next, one cannot assume the degrees of freedom (dfs) are simply
the number of scans minus the dfs used in the model - one needs
the “effective degrees of freedom”
• In other words, the residual errors are not independent:
y = Xb + e
e ~ N(0, s2V), V ≠ I, V = AA'
where A is the intrinsic autocorrelation
• Generalised least squares:
Ky = KXb + Ke
Ke ~ N(0, s2V), V = KAA'K'
(autocorrelation is a special case of “nonsphericity”…)
Temporal autocorrelation (History)
Ky = KXb + Ke, Ke ~ N(0, s2V), V = KAA'K'
• One method is to estimate A, using for example an AR(p) model, then let:
K = A-1 → V = I (allows OLS)
This “pre-whitening” is sensitive, but can be biased if K is mis-estimated
• Another method (SPM99) is to smooth the data with a known autocorrelation
that swamps any intrinsic autocorrelation:
K = S → V = SAA'S' ~ SS' (use GLS)
with effective degrees of freedom calculated by the Satterthwaite approximation:
df = trace(RV)2 / trace(RVRV)
This is more robust (providing the temporal smoothing is sufficient, eg 4s
FWHM Gaussian), but less sensitive
• The most recent method (SPM2/5) is to restrict K to the highpass filter, and
estimate the residual autocorrelation A using voxel-wide, one-step ReML…
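The pre-whitening idea (K = A-1) can be sketched with a known AR(1) autocorrelation; ρ = 0.4 is an arbitrary choice for illustration:

```python
import numpy as np

J, rho = 50, 0.4
idx = np.arange(J)
V = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1) correlation matrix, V = AA'
A = np.linalg.cholesky(V)                        # intrinsic autocorrelation A
K = np.linalg.inv(A)                             # whitening matrix K = A^-1

# after whitening, cov(Ke) is proportional to the identity, so OLS is valid:
V_white = K @ V @ K.T                            # ~ I
```

In practice one multiplies both the data and the design matrix by K before ordinary least squares; the sensitivity/bias trade-off is exactly the one discussed above.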
Nonsphericity and ReML
• Nonsphericity means (roughly) that the error covariance is not
a scaled identity:
Ce = cov(e) ≠ s2I
• Nonsphericity can be modelled by a set of variance components:
Ce = λ1Q1 + λ2Q2 + λ3Q3 + …
(the λi are hyper-parameters)
- Non-identical (inhomogeneous) errors, eg two groups of
subjects: Q1 and Q2 are block-diagonal identity components, one
per group
- Non-independent (autocorrelated) errors, eg white noise +
AR(1): Q1 is the identity, Q2 an AR(1)-like component
[Figure: scan-by-scan error covariance matrices cov(e) for the spherical and nonspherical cases]
Nonsphericity and ReML (cont.)
• Joint estimation of parameters and hyperparameters requires ReML
• ReML gives (Restricted) Maximum Likelihood (ML) estimates of the
(hyper)parameters, rather than Ordinary Least Squares (OLS) estimates:
b^OLS = (XTX)-1XTy (= X+y)
b^ML = (XTCe-1X)-1XTCe-1y
Ce = ReML(yyT, X, Q)
• ML estimates are more efficient and entail exact dfs (no
Satterthwaite approximation)…
• …but are computationally expensive: ReML is iterative (unless
there is only one hyper-parameter)
• To speed up:
- The correlation of errors (V) is estimated by pooling over voxels:
V = ReML(Σvoxels yjyjT, X, Q) = λ^1Q1 + λ^2Q2 + …
- The covariance of errors (s2V) is then estimated by a single,
voxel-specific scaling hyperparameter
Nonsphericity and ReML (cont.)
1. Voxels to be pooled are collected by a first pass through the
data (OLS)
(biased if the correlation structure is not stationary across voxels?)
2. The correlation structure V is estimated iteratively using
ReML once, pooling over all voxels
3. The remaining hyper-parameter is estimated using V and ReML
noniteratively, for each voxel
• The estimated nonsphericity is used to pre-whiten the data and
design matrix, W = V-1/2 (or by KW, if a highpass filter K is
present)
• (which is why design matrices in SPM change after estimation:
X → WX)
T-contrasts and F-contrasts
• A T-contrast is a directional test of a unidimensional quantity
(c a vector)
• An F-contrast is a non-directional test of a multidimensional
quantity (c a matrix)
• A [1 -1] T-contrast tests whether A>B (“one-tailed”, eg p1 < .05)
• A [1 -1] F-contrast tests whether A<>B (“two-tailed”, F = T2,
p2 = 2 × p1)
• F-contrasts can test more…
An F-contrast [1 0; 0 1] tests the sum of squares of A and B
(loosely the “union” of A and B; loosely “A and/or B”)
An F-contrast [1 -1 0; 0 1 -1] tests the “main effect” of a
3-level factor in an ANOVA
• Some further notes on F-contrasts:
1. Sign is irrelevant (for stats; it affects plots):
[1 0; 0 1] ≡ [-1 0; 0 -1] ≡ [1 0; 0 -1]
2. Scale and row order are irrelevant (for stats):
[1 0; 0 1] ≡ [2 0; 0 2] ≡ [0 1; 1 0]
3. Rank is relevant (for stats; it affects plots):
[1 -1 0; 0 1 -1] ≡ [1 1 -2; 1 -1 0]
≡ [2 -1 -1; -1 2 -1; -1 -1 2]
(the latter is SPM’s “effects of interest”)
[Figure: T/F-test decision spaces in the (A, B) plane for T: [1 -1], F: [1 -1] and F: [1 0; 0 1]]
T-contrasts
• A T statistic is the ratio of a contrast, cTb, to the variability of that contrast:
T = cTb / std(cTb) = cTb / sqrt(s2 cT((WX)T(WX))-1c)
• The map of T-values is output to the file spmT_*.img, where * is
the number in the Contrast Manager, eg spmT_0001.img
• The contrast itself (cTb, ie the numerator, ie the linear
combination of b's) is output to con_*.img, eg con_0001.img
• Note that T values are independent of the scaling of the contrast weights:
[1 1] ≡ [2 2]
…however, the contrast value (and hence the size of plots) is not,
so use [0.5 0.5] if you want to plot the average of A and B
The Full-Monty T-test
y = Xb + e, s2V = cov(e) (V estimated by ReML)
W = V-1/2 (pre-whitening)
b^ = (WX)+Wy
T = cTb^ / Std(cTb^)
Std(cTb^)2 = s^2 cT(WX)+((WX)+)Tc
s^2 = (Wy - WXb^)T(Wy - WXb^) / trace(R), R = I - WX(WX)+
(eg c = [1 0 0 … 0]T)
F-contrasts
• An F statistic is a ratio of variances, eg the additional variance captured
by the full model (X) relative to a “reduced” version of the model (X0)…
• …where the reduced model is specified by the (null space of the) F-contrast, c
• This is equivalent to asking how much greater the residual variance from the
reduced model (e0Te0) is relative to the full model (eTe):
F = [(e0Te0 - eTe) / (L - L0)] / [eTe / (J - L)]
where L and L0 are the ranks of the full and reduced models
• The map of F-values is output to the file spmF_*.img, where * is the number
in the Contrast Manager, eg spmF_0001.img
• The extra sum of squares is output to ess_*.img, eg ess_0001.img
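The extra-sum-of-squares F can be sketched directly (NumPy, with a simulated two-regressor toy design; all names and numbers are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
J = 40
X = np.column_stack([rng.normal(size=(J, 2)), np.ones(J)])  # two effects + mean
X0 = X[:, 2:]                                               # reduced model: mean only
y = X @ np.array([1.0, 0.5, 3.0]) + rng.normal(size=J)      # both effects present

def rss(M, y):
    """Residual sum of squares after projecting y onto the columns of M."""
    e = y - M @ (np.linalg.pinv(M) @ y)
    return e @ e

L, L0 = np.linalg.matrix_rank(X), np.linalg.matrix_rank(X0)
F = ((rss(X0, y) - rss(X, y)) / (L - L0)) / (rss(X, y) / (J - L))
```

Since both effects really are in the simulated data, the reduced model's residual variance is much larger and F comes out well above 1.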
F-contrasts (example)
Eg: do the movement parameters explain significant variability?
[Figure: full model X = [conditions | movement parameters] versus reduced model X0 = [conditions]; the F-contrast selects the six movement regressors:
c' = [0 0 1 0 0 0 0 0
      0 0 0 1 0 0 0 0
      0 0 0 0 1 0 0 0
      0 0 0 0 0 1 0 0
      0 0 0 0 0 0 1 0
      0 0 0 0 0 0 0 1]]
Multiple comparisons…
• If n = 100,000 voxels are tested, each with a probability
pu = 0.05 of falsely rejecting H0...
…then approximately n × pu (eg 5,000) will do so by chance
(false positives, or “type I” errors)
• Therefore one needs to “correct” the p-values for the number of
comparisons
• A severe correction would be a Bonferroni correction, where
pc = pu / n…
…but this is only appropriate when the n tests are independent…
…and SPMs are smooth, meaning that nearby voxels are correlated
=> Random Field Theory...
[Figure: SPM{t} of random Gaussian noise thresholded at pu = 0.05, smoothed with a 10mm FWHM kernel (2mm pixels)]
Random Field Theory (RFT)
• Consider the SPM as a lattice representation of a continuous
random field
• The “Euler characteristic” is a topological measure
(# “components” - # “holes”)
• The expected Euler characteristic depends on smoothness
• Smoothness is estimated from the covariance of the partial
derivatives of the residuals (expressed as “resels” or FWHM)
• Smoothness does not have to be stationary (for height
thresholding): it is estimated locally as “resels-per-voxel” (RPV)
FamilyWise Error
• Want a “Family-wise Error” (FWE) rate of, eg, 0.05: ie the
probability of one or more false positives anywhere in the volume:
FWER = P(FWE) = P( ∪i {Ti ≥ u} | H0 )
= P( maxi Ti ≥ u | H0 )
= P( one or more blobs | H0 )
≈ P( ECu ≥ 1 | H0 )
≈ E( ECu | H0 )
(ECu = Euler characteristic at threshold u)
• For a 3D Gaussian field:
E(ECu) = λ(Ω) |Λ|1/2 (u2 - 1) exp(-u2/2) / (2π)2
where λ(Ω) is the volume of the search region Ω ⊂ R3 and |Λ|1/2
its roughness (resels-per-voxel, RPV)
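The FWE-corrected height threshold is the u solving E(ECu) = 0.05. A sketch in resel form (the 3D term of the expected-EC formula for a Gaussian field; 100 resels is an assumed, illustrative search volume):

```python
import numpy as np

def expected_ec(u, resels):
    """Expected Euler characteristic of a thresholded 3D Gaussian field (3D term)."""
    return (resels * (4 * np.log(2)) ** 1.5 / (2 * np.pi) ** 2
            * (u ** 2 - 1) * np.exp(-u ** 2 / 2))

resels = 100.0
u = np.arange(2.0, 6.0, 0.001)
u_fwe = u[np.argmax(expected_ec(u, resels) <= 0.05)]   # first u with E(EC) <= 0.05
```

For 100 resels this gives a threshold of roughly u ≈ 4, noticeably lower than a Bonferroni correction over the raw voxel count would demand.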
And so much more! Levels of Inference
• Three levels of inference:
- extreme voxel values → voxel-level (height) inference,
eg voxel-level: P(t ≥ 4.37) = .048
- big suprathreshold clusters → cluster-level (extent) inference,
eg cluster-level: P(n ≥ 82, t ≥ u) = 0.029
- many suprathreshold clusters → set-level inference,
eg set-level: P(c ≥ 3, n ≥ k, t ≥ u) = 0.019
• Parameters (with example values):
“Height” threshold, u - t > 3.09
“Extent” threshold, k - 12 voxels
Dimension, D - 3
Volume, S - 323 voxels
Smoothness, FWHM - 4.7 voxels
[Figure: SPM with clusters of n = 1, 2, 32 and 82 suprathreshold voxels]
(Spatial) Specificity vs. Sensitivity
Small-volume correction (SVC)
• If you have an a priori region of interest, there is no need to
correct for the whole brain!
• But you can use RFT to correct for a Small Volume
• The volume can be based on:
- An anatomically-defined region
- A geometric approximation to the above (eg rhomboid/sphere)
- A functionally-defined mask (based on an ORTHOGONAL contrast!)
• Note that the extent of the correction depends on the shape
(surface area) as well as the size (volume) of the region (you may
want to smooth the volume if it is rough)
Caveats for Random Field Theory
• RFT treats the lattice of image data as an approximation to a
continuous random field, so you always need sufficient smoothness:
- FWHM smoothness 3-4 × voxel size
(more like ~10× for low-df images, else RFT is more conservative
than Bonferroni!)
• So for dfs < ~12, use more smoothing, or SnPM!
• For cluster-level inference:
- you need a reasonably strict initial height threshold
(eg p < .001 uncorrected)
- stationarity of smoothness is assumed
[Example SPM window]
False Discovery Rate (FDR)
• FWE (for a height threshold) is the probability of one or more
false positive voxels in the whole image (or small volume)
• The False Discovery Rate (FDR) is instead the expected proportion
of false positives among the voxels “declared active”:
- FDR = E(V/R)
- R voxels declared active, V of them falsely so
- the realized false discovery rate is V/R
[Figure: example images of noise, signal, and signal+noise]
Example
[Figure: ten simulated realizations under three thresholding regimes:
- Control of the per-comparison rate at 10%: the percentage of null pixels that are false positives is ~10% in each realization (11.3%, 11.3%, 12.5%, 10.8%, 11.5%, 10.0%, 10.7%, 11.2%, 10.2%, 9.5%)
- Control of the familywise error rate at 10%: a familywise error occurs in ~10% of realizations
- Control of the false discovery rate at 10%: the percentage of activated pixels that are false positives averages ~10% (6.7%, 10.4%, 14.9%, 9.3%, 16.2%, 13.8%, 14.0%, 10.5%, 12.2%, 8.7%)]
False Discovery Rate (FDR)
• For any threshold, all voxels can be cross-classified as truly
null or truly active, and declared active or not
• The realized FDR is rFDR = V0R / (V1R + V0R) = V0R / NR
(NR voxels declared active, of which V0R are truly null and V1R
truly active)
• But we only observe NR, not V1R and V0R…
- so we control the expected FDR = E(rFDR)
• Benjamini & Hochberg procedure:
- Select the desired limit q on the FDR
- Order the p-values, p(1) ≤ p(2) ≤ ... ≤ p(V)
- Let r be the largest i such that p(i) ≤ (i/V) × q/c(V)
- Reject all hypotheses p(1), ..., p(r)
• c(V) = 1 under “Positive Regression Dependency on Subsets”
• c(V) = Σi=1,...,V 1/i ≈ log(V) + 0.58 for an arbitrary
covariance structure
[Figure: sorted p-values p(i) plotted against i/V, with the line of slope q/c(V); r is the last point below the line]
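The Benjamini & Hochberg step-up procedure is easy to implement; a sketch (with c(V) = 1 and an illustrative list of p-values, not data from the slides):

```python
import numpy as np

def bh_threshold(pvals, q=0.05, cV=1.0):
    """Return the largest p-value cutoff satisfying p(i) <= (i/V) * q / c(V)."""
    p = np.sort(np.asarray(pvals))
    V = len(p)
    below = np.nonzero(p <= np.arange(1, V + 1) / V * q / cV)[0]
    return p[below[-1]] if below.size else 0.0

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
thr = bh_threshold(pvals, q=0.05)      # reject all hypotheses with p <= thr
```

Here only the two smallest p-values fall below the sloping line, so the adaptive threshold is 0.008; with more signal (more small p-values), the threshold rises.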
[Example SPM window]
Summary for False Discovery Rate
• Adaptive, ie the threshold depends on the amount of signal:
- More signal → more p(i) less than (i/V) × q/c(V)
- (If there is no signal, it gives the same control as FWE)
• ...but this data-dependence also means that it is difficult to
compare results across experiments
• Not as conventional as FWE? Sociology of science…
• Rough conclusion:
- FWE is more specific, less sensitive
(and very insensitive under some conditions)
- FDR is less specific, more sensitive
Classical vs Bayesian Inference
• In classical inference, the p-value represents the probability of
the data given that the parameter (contrast) is zero (the “null
hypothesis”): p(t | b = 0), where t = f(y)
• Two (possible) problems with this approach are:
- the inability to prove the null hypothesis
- very small effects can be significant
• Bayesian inference can instead be performed on the probability of
the parameter given the data: p(b | y)
• This “posterior” distribution can be “thresholded” at a
particular effect size, γ (eg 1% signal change)
[Figure: classical T-test, thresholding p(t | b = 0) at u, versus Bayesian test, thresholding p(b | y) at γ]
Parametric Empirical Bayes
• Bayes rule:
p(b | y) ∝ p(y | b) × p(b)
posterior (PPM) ∝ likelihood (SPM) × prior
• What are the priors?
- In “classical” SPM, no (flat) priors
- In “full” Bayes, priors could be theoretical constraints, or
come from independent data
- In “empirical” Bayes, priors derive from the same data, assuming
a hierarchical model for the generation of those data…
Hierarchical Models
• In a hierarchical model, the parameters at one level can be made
priors on the distribution of the parameters at the level below:
“Parametric Empirical Bayes” (Friston et al, 2002):
y = X(1) b(1) + e(1)
b(1) = X(2) b(2) + e(2)
…
b(n-1) = X(n) b(n) + e(n)
with Ce(i) = Σk λk(i) Qk(i)
• The parameters and hyperparameters at each level can be estimated
using the EM algorithm (a generalisation of ReML)
• (Note that the parameters and hyperparameters at the final level
do not differ from the classical framework)
• The second level could be subjects (a hidden option in SPM, given
the computational expense)…
• …or voxels (a user option in SPM)…
Posterior Probability Maps (PPMs)
• Bayes rule:
p(b | y) ∝ p(y | b) × p(b)
posterior (PPM) ∝ likelihood (SPM) × prior
• For PPMs in SPM5, the priors come from the distribution over voxels
• If the mean over voxels is removed, the prior mean can be set to
zero (a “shrinkage” prior)
• One can threshold the posterior for a given probability that the
contrast is greater than some effect size γ…
• …to give a posterior probability map (PPM)
[Figure: Bayesian test, thresholding p(b | y) at γ]
PPMs vs SPMs
[Figure: PPM (height threshold P = 0.95, extent threshold k = 0 voxels) alongside SPM{T39.0} (height threshold T = 5.50, extent threshold k = 0 voxels), for the same contrast and design matrix]
PPM:
• Activations greater than a certain amount
• Can infer no responses
• No fallacy of inference
• Inference independent of search volume
• Computationally expensive
SPM:
• Voxels with non-zero activations
• Cannot “prove the null hypothesis”
• Fallacy of inference (large df)
• Correct for search volume
• Computationally faster
Fixed vs. Random Effects
• Subjects can be treated as Fixed or Random variables
• If subjects are a Fixed variable in a single design matrix (SPM
“sessions”), the error term conflates within- and between-subject
variance
- But in fMRI (unlike PET) the between-scan variance is normally
much smaller than the between-subject variance
• If one wishes to make an inference from a subject sample to the
population, one needs to treat subjects as a Random variable, with
a proper mixture of within- and between-subject variance
• In SPM, this is achieved by a two-stage procedure:
1) (Contrasts of) parameters are estimated from a (Fixed Effect)
model for each subject
2) Images of these contrasts become the data for a second design
matrix (usually a simple t-test or ANOVA)
[Figure: multi-subject Fixed Effect model - Subjects 1-6 concatenated in one design matrix, error df ~ 300]
Two-stage “Summary Statistic” approach
[Figure: 1st level (within-subject): for each subject i = 1…6, estimate b^i (and within-subject error variance s^2i); the contrast images of cTb^i become the data for the 2nd level (between-subject): a one-sample t-test over N = 6 subjects (error df = 5), giving b^pop and an SPM{t}, eg thresholded at p < 0.001 (uncorrected); s^2w = within-subject error]
• WHEN there is the special case of n independent observations per
subject:
var(b^pop) = s2b / N + s2w / Nn
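A simulation sketch of the two-stage procedure (all numbers are made up for illustration): first-level contrast estimates per subject, then a one-sample t-test across subjects, whose variance term automatically mixes s2b/N and s2w/(Nn):

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 6, 20                 # subjects, independent observations per subject
s2_b, s2_w = 1.0, 4.0        # between- and within-subject variances (assumed)
b_pop = 0.5                  # true population effect (assumed)

# 1st level: each subject's contrast estimate = that subject's true effect
# plus within-subject noise averaged over n observations
b_subj = b_pop + rng.normal(0.0, np.sqrt(s2_b), N)
con = b_subj + rng.normal(0.0, np.sqrt(s2_w / n), N)

# 2nd level: one-sample t-test on the N contrast images (error df = N - 1 = 5)
b_pop_hat = con.mean()
t = b_pop_hat / (con.std(ddof=1) / np.sqrt(N))
# E[var(con)] = s2_b + s2_w/n, so var(b_pop_hat) = s2_b/N + s2_w/(N*n)
```

The sample variance of the per-subject contrasts estimates s2b + s2w/n, which is exactly the mixture of variances the random-effects inference requires.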
Limitations of the 2-stage approach
• The summary statistic approach is a special case, valid only when
each subject's design matrix is identical (“balanced designs”) and
the underlying error is identical
• In practice, the approach is reasonably robust to unbalanced
designs (Penny, 2004)
• More generally, exact solutions (“mixed effects”) can be obtained
using a hierarchical GLM and EM
• This is computationally expensive to perform at every voxel (so
it is not implemented in SPM…)
• …plus modelling of nonsphericity at the 2nd level can minimise
the potential bias of unbalanced designs
The End