Statistical Parametric Mapping (SPM)
• Talk I: Spatial Pre-processing & Morphometry
• Talk II: General Linear Model
• Talk III: Experimental Design & Efficiency
• Talk IV: EEG/MEG

General Linear Model (GLM) & Random Field Theory (RFT)
Rik Henson
With thanks to: Karl Friston, Andrew Holmes, Tom Nichols, Stefan Kiebel

Overview
[Figure: fMRI time-series → motion correction → spatial normalisation (standard template) → smoothing (kernel) → General Linear Model (design matrix) → parameter estimates → Statistical Parametric Map]

Some Terminology
• SPM ("Statistical Parametric Mapping") is a massively univariate approach - meaning that a statistic (e.g., T-value) is calculated for every voxel - using the "General Linear Model"
• Experimental manipulations are specified in a model (the "design matrix"), which is fit to each voxel to estimate the size of the experimental effects (the "parameter estimates") in that voxel…
• … on which one or more hypotheses ("contrasts") are tested to make statistical inferences ("p-values"), correcting for multiple comparisons across voxels (using "Random Field Theory")
• The parametric statistics assume continuous-valued data and additive noise that conforms to a "Gaussian" distribution (the "nonparametric" version, SnPM, eschews such assumptions)

Some Terminology
• SPM is usually focused on "functional specialisation" - i.e. localising different functions to different regions of the brain
• One might also be interested in "functional integration" - how different regions (voxels) interact
• Multivariate approaches work on whole images and can identify spatial/temporal patterns over voxels, without necessarily specifying a design matrix (PCA, ICA)…
• … or with an experimental design matrix (PLS, CVA), or with an explicit anatomical model of connectivity between regions - "effective connectivity" - eg using Dynamic Causal Modelling

Overview
1. General Linear Model: Design Matrix; Estimation/Contrasts; Covariates (eg global); Estimability/Correlation
2.
fMRI timeseries: Highpass filtering; HRF convolution; Autocorrelation (nonsphericity)
3. T and F-contrasts
4. Statistical Inference: Random Field Theory (FWE); False Discovery Rate (FDR); Posterior Probability Maps (PPM)
5. Mixed (Fixed & Random) Effects

General Linear Model…
• Parametric statistics - one-sample t-test, two-sample t-test, paired t-test, all cases of ANOVA, AnCova, correlation, linear regression, multiple regression, F-tests, etc… - are all special cases of the General Linear Model

General Linear Model
• Equation for a single voxel (and likewise for all voxels):
  yj = xj1 b1 + … + xjL bL + ej
  - yj : data for scan j = 1…J
  - xjl : explanatory variables / covariates / regressors, l = 1…L
  - bl : parameters / regression slopes / fixed effects
  - ej : residual errors, independent and identically distributed ("iid"): ej ~ N(0, s²), ie Gaussian with mean zero and standard deviation s
• Equivalent matrix form: y = Xb + e, where X is the "design matrix" / model

Matrix Formulation
• The equation for scan j, written simultaneously for scans 1…J, gives a matrix equation (rows = scans, columns = regressors)…
• …that can be solved for the parameters b1…bL

Overview
1. General Linear Model: Design Matrix; Estimation/Contrasts; Covariates (eg global); Estimability/Correlation
2. fMRI timeseries
3. T and F-contrasts
4. Statistical Inference
5.
Mixed (Fixed & Random) Effects

General Linear Model (Estimation)
• Estimate parameters by a least-squares fit to the data y:
  b^ = (XᵀX)⁻¹Xᵀy = X⁺y (Ordinary Least Squares, OLS, estimates)
• Fitted response: Y^ = Xb^
• Residual errors and estimated error variance:
  e^ = y − Y^,  s^² = e^ᵀe^ / df
  where df are the degrees of freedom (assuming iid errors):
  df = J − rank(X)  (= J − L if X is full rank)
  (equivalently, with residual-forming matrix R = I − XX⁺, e^ = Ry and df = trace(R))

GLM Estimation - Geometric Perspective
[Figure: the data vector y = (y1, y2, y3) is projected onto the "design space" spanned by the regressor columns X1 = (x1, x2, x3) and X2 = (1, 1, 1); the fitted response Y^ = b1X1 + b2X2 lies in that plane, and the residual vector e^ = (e^1, e^2, e^3)ᵀ is orthogonal to it]

General Linear Model (Inference)
• Specify a contrast (hypothesis) c, a linear combination of the parameter estimates, cᵀb^, eg c = [1 -1 0 0]ᵀ
• Calculate a T-statistic for that contrast (c a vector):
  T = cᵀb^ / std(cᵀb^) = cᵀb^ / sqrt(s^² cᵀ(XᵀX)⁻¹c)
• …or an F-statistic (c a matrix):
  F = [(ε0ᵀε0 − εᵀε) / (L − L0)] / [εᵀε / (J − L)]
  where ε0 and L0 are the residuals and rank, respectively, of the reduced model specified by c, eg
  c = [ 2 -1 -1 0
       -1  2 -1 0
       -1 -1  2 0 ]
• From the T-distribution, obtain the probability of falsely rejecting the null hypothesis H0: cᵀb = 0 (the "p-value")

Simple "ANOVA-like" Example
• 12 scans, 3 conditions (1-way ANOVA):
  yj = x1j b1 + x2j b2 + x3j b3 + x4j b4 + ej
  where the (dummy) variables are:
  x1j = [0,1] = condition A (first 4 scans)
  x2j = [0,1] = condition B (second 4 scans)
  x3j = [0,1] = condition C (third 4 scans)
  x4j = 1 = grand mean
  (rank(X) = 3, since the three condition columns sum to the grand-mean column)
• T-contrast: [1 -1 0 0] tests whether A>B; [-1 1 0 0] tests whether B>A
• F-contrast: [2 -1 -1 0; -1 2 -1 0; -1 -1 2 0] tests the main effect of A, B, C
• Worked fit for the example data y = (13.9, 10.4, 6.2, 13.9, 18.0, 18.1, 15.1, 21.8, 25.7, 30.8, 21.2, 26.7)ᵀ gives b^ = (-2.8, 4.4, 12.2, 13.4)ᵀ; for c = [-1 1 0 0]:
  T = 7.1 / sqrt(12.1 × 0.5), df = 12 − 3 = 9, T(9) = 2.9, p < .05

Overview
1.
General Linear Model: Design Matrix; Estimation/Contrasts; Covariates (eg global); Estimability/Correlation
2. fMRI timeseries
3. T and F-contrasts
4. Statistical Inference
5. Mixed (Fixed & Random) Effects

Global Effects
• There may be variation in overall image intensity from scan to scan
• Such "global" changes may confound the local / regional changes induced by the experiment
• Adjust for global effects by: AnCova (additive model) - appropriate for PET?; Proportional Scaling - appropriate for fMRI?
• This can improve the statistics when the global effect is orthogonal to the effects of interest…
• …but can also worsen them when the effects of interest are correlated with the global (as in the example below)

Simple AnCova Example
• 12 scans, 3 conditions, 1 confounding covariate:
  yj = x1j b1 + x2j b2 + x3j b3 + x4j b4 + x5j b5 + ej
  where the (dummy) variables are:
  x1j = [0,1] = condition A (first 4 scans)
  x2j = [0,1] = condition B (second 4 scans)
  x3j = [0,1] = condition C (third 4 scans)
  x4j = grand mean
  x5j = global signal (mean over all voxels, further mean-corrected over all scans)
• (Other covariates (confounds) could be movement parameters, time during the experiment, etc)
• The global is correlated here with the conditions (and with time): for the same example data the fit gives b^ = (-3.0, 4.5, 12.7, 14.3, -0.1)ᵀ, and for c = [-1 1 0 0 0]:
  T = 7.5 / sqrt(13.6 × 1.6), df = 12 − 4 = 8, T(8) = 1.62, p > .05
  - the effect that was significant without the global covariate is no longer so

Global Effects (fMRI)
• Two types of scaling: Grand Mean scaling and Global scaling
• Grand Mean scaling is automatic; global scaling is optional
• Grand Mean scaling scales by 100/mean over all voxels and ALL scans (ie, a single number per session)
• Global scaling scales by 100/mean over all voxels for EACH scan (ie, a different scaling factor for every scan)
• The problem with global scaling is that the TRUE global is not (normally) known…
• …we only estimate it by the mean over voxels
• So if there is a large signal change over many voxels, the global estimate will be confounded by local changes
• This can produce artifactual deactivations in other regions after global scaling
• Since most sources of global variability in fMRI are low frequency ("drift"), high-pass filtering may be sufficient, and most people do not use global scaling

Overview
1. General Linear Model: Design Matrix; Estimation/Contrasts; Covariates (eg global); Estimability/Correlation
2. fMRI timeseries
3. T and F-contrasts
4. Statistical Inference
5. Mixed (Fixed & Random) Effects

A word on correlation/estimability
• If any column of X is a linear combination of any others (X is rank deficient), some parameters cannot be estimated uniquely (they are "inestimable")…
• …which means some contrasts cannot be tested (eg, only contrasts that sum to zero over the dependent columns can be)
• This has implications for whether the "baseline" (constant term) is explicitly or implicitly modelled
• Example: with regressors A, B and a constant A+B ("explicit" baseline), rank(X) = 2, so the condition means cm = [1 0 0] are inestimable, but the difference cd = [1 -1 0] is estimable; with only A and the constant modelled ("implicit" baseline), cm = [1 0] estimates the A mean and cd = [1 1] the A+B effect
• (rank deficiency might be thought of as perfect correlation between regressors…)
• A transformation T maps one parameterisation onto the other: X(1)·T = X(2), and correspondingly c(1)·T = c(2), eg T = [1 1; 0 1] maps [1 -1] onto [1 0]

A word on correlation/estimability
•
When there is high (but not perfect) correlation between regressors, parameters can still be estimated…
• …but the estimates will be inefficient (ie highly variable)…
• …meaning some contrasts will not lead to very powerful tests
• SPM shows the pairwise correlation between regressors…
• …but this will NOT tell you that, eg, X1+X2 is highly correlated with X3…
• …so some contrasts can still be inefficient (or efficient), even though the pairwise correlations are low (or high)
[Figure: example regressors A, B and A+B convolved with the HRF, with contrasts cm = [1 0 0] and cd = [1 -1 0]]

A word on orthogonalisation
• To remove the correlation between two regressors, you can explicitly orthogonalise one (X1) with respect to the other (X2), eg by Gram-Schmidt:
  X1⊥ = X1 − (X2X2⁺)X1
• Paradoxically, this will NOT change the parameter estimate for X1, but it will change that for X2
• In other words, the parameter estimate for the orthogonalised regressor is unchanged!
• This reflects the fact that parameter estimates automatically reflect the orthogonal component of each regressor…
• …so there is no need to orthogonalise, UNLESS you have an a priori reason for assigning the common variance to the other regressor
• Worked example: with correlated X1 and X2, suppose b1 = 0.9 and b2 = 0.7. Orthogonalising X2 with respect to X1 (model M1) gives b1(M1) = 1.6, b2(M1) = 0.7; orthogonalising X1 with respect to X2 (model M2) gives b1(M2) = 0.9 (= b1(M1) − b2(M1)) and b2(M2) = 1.1 (≈ (b1(M1) + b2(M1))/2)

Overview
1. General Linear Model
2. fMRI timeseries: Highpass filtering; HRF convolution; Autocorrelation (nonsphericity)
3. T and F-contrasts
4. Statistical Inference
5. Mixed (Fixed & Random) Effects

fMRI Analysis
1. Scans are treated as a timeseries… …and can be filtered to remove low-frequency (1/f) noise
2.
Effects of interest are convolved with the haemodynamic response function (HRF), to capture the sluggish nature of the BOLD response
3. Scans can no longer be treated as independent observations… …they are typically temporally autocorrelated (for TRs < 8s)

(Epoch) fMRI example…
• A voxel timeseries is modelled as a box-car function plus a baseline (mean):
  y(t) = b1 × box-car(t) + b2 + e(t)   (box-car unconvolved)
• Equivalently, in matrix form: y = Xb + e

Low frequency noise
• Several causes of noise:
  - Physical (scanner drifts)
  - Physiological (aliased): cardiac (~1 Hz), respiratory (~0.25 Hz)
[Figure: power spectra of the noise, of the signal (eg an infinite 30s on-off box-car), and of the highpass filter]

(Epoch) fMRI example…
• …with a highpass filter, the design matrix gains a set of low-frequency regressors (b3…b9 here) alongside the box-car (b1) and mean (b2): y = Xb + e
[Figure: raw fMRI timeseries; adjusted data; fitted box-car; highpass-filtered (and scaled) data; fitted high-pass filter; residuals]

Overview
1. General Linear Model
2. fMRI timeseries: Highpass filtering; HRF convolution; Autocorrelation (nonsphericity)
3. T and F-contrasts
4. Statistical Inference
5. Mixed (Fixed & Random) Effects

Convolution with HRF
• An unconvolved box-car fit leaves structure in the residuals; convolving the box-car with the haemodynamic response function gives a better fit, and residuals with less structure
[Figure: unconvolved fit and residuals vs convolved fit and residuals; box-car function convolved with HRF]

Overview
1. General Linear Model
2. fMRI timeseries: Highpass filtering; HRF convolution; Autocorrelation (nonsphericity)
3. T and F-contrasts
4. Statistical Inference
5.
Mixed (Fixed & Random) Effects

Temporal autocorrelation…
• Because the data are typically correlated from one scan to the next, one cannot assume the degrees of freedom (dfs) are simply the number of scans minus the dfs used in the model - one needs the "effective degrees of freedom"
• In other words, the residual errors are not independent:
  y = Xb + e,  e ~ N(0, s²V),  V ≠ I,  V = AA′
  where A is the intrinsic autocorrelation
• Generalised least squares (GLS):
  Ky = KXb + Ke,  Ke ~ N(0, s²V),  V = KAA′K′
  (autocorrelation is a special case of "nonsphericity"…)

Temporal autocorrelation (History)
• One method is to estimate A, using, for example, an AR(p) model, then set:
  K = A⁻¹  =>  V = I (which allows OLS)
  This "pre-whitening" is sensitive, but can be biased if K is mis-estimated
• Another method (SPM99) is to smooth the data with a known autocorrelation that swamps any intrinsic autocorrelation:
  K = S  =>  V = SAA′S′ ≈ SS′ (use GLS)
  with effective degrees of freedom calculated by the Satterthwaite approximation:
  df = trace(RV)² / trace(RVRV)
  This is more robust (providing the temporal smoothing is sufficient, eg a 4s FWHM Gaussian), but less sensitive
• The most recent method (SPM2/5) is to restrict K to the highpass filter, and estimate the residual autocorrelation A using a voxel-wide, one-step ReML…

Nonsphericity and ReML
• Nonsphericity means (roughly) that the error covariance is not spherical:
  Ce = cov(e) ≠ s²I
• Nonsphericity can be modelled by a set of variance components:
  Ce = λ1Q1 + λ2Q2 + λ3Q3 + …   (the λi are hyperparameters)
  - Non-identical (inhomogeneous) errors: eg two groups of subjects with different variances (components Q1, Q2)
  - Non-independent (autocorrelated) errors: eg white noise + AR(1) (components Q1, Q2)

Nonsphericity and ReML
• Joint estimation of parameters and hyperparameters requires ReML
• ReML gives (Restricted) Maximum Likelihood (ML) estimates of the (hyper)parameters, rather than Ordinary Least Squares (OLS) estimates:
  b^OLS = (XᵀX)⁻¹Xᵀy (= X⁺y)
  b^ML = (XᵀCe⁻¹X)⁻¹XᵀCe⁻¹y,  where Ce = ReML(yyᵀ, X, Q) ≈ λ^1Q1 + λ^2Q2 + …
• ML estimates are more efficient, and entail exact dfs (no Satterthwaite approximation)…
• …but are computationally expensive: ReML is iterative (unless there is only one hyperparameter)
• To speed things up:
  - The correlation of the errors (V) is estimated by pooling over voxels: V = ReML(Σj yjyjᵀ, X, Q)
  - The covariance of the errors (s²V) is then estimated by a single, voxel-specific scaling hyperparameter

Nonsphericity and ReML
1. Voxels to be pooled are collected by a first pass through the data (OLS) (biased if the correlation structure is not stationary across voxels?)
2. The correlation structure V is estimated iteratively using ReML once, pooling over all voxels
3. The remaining hyperparameter is estimated using V and ReML non-iteratively, for each voxel
• The estimated nonsphericity is used to prewhiten the data and design matrix, W = V^(-1/2) (or KW, if a highpass filter K is present): X → WX
• (which is why design matrices in SPM change after estimation)

Overview
1. General Linear Model
2. fMRI timeseries
3. T and F-contrasts
4. Statistical Inference
5.
Mixed (Fixed & Random) Effects

T-contrasts and F-contrasts
• A T-contrast is a directional test of a unidimensional quantity (c = vector)
• An F-contrast is a non-directional test of a multidimensional quantity (c = matrix)
• A [1 -1] T-contrast tests whether A>B ("one-tailed", eg p1 < .05)
• A [1 -1] F-contrast tests whether A<>B ("two-tailed"; F = T², p2 = 2 × p1)
• F-contrasts can test more: an F-contrast [1 0; 0 1] tests the sum of squares of A and B (loosely the "union" of A and B; loosely "A and/or B"); an F-contrast [1 -1 0; 0 1 -1] tests the "main effect" of a 3-level factor in an ANOVA
• Some further notes on F-contrasts:
  1. Sign is irrelevant (for stats; it affects plots): [1 0; 0 1] ≡ [-1 0; 0 -1] ≡ [1 0; 0 -1]
  2. Scale and row order are irrelevant (for stats): [1 0; 0 1] ≡ [2 0; 0 2] ≡ [0 1; 1 0]
  3. Rank is relevant (for stats; it affects plots): [1 -1 0; 0 1 -1] ≡ [1 1 -2; 1 -1 0] ≡ [2 -1 -1; -1 2 -1; -1 -1 2] (the latter is SPM's "effects of interest")
[Figure: T/F-test decision spaces for T: [1 -1], F: [1 -1] and F: [1 0; 0 1]]

T-contrasts
• A T statistic is the ratio of a contrast, cᵀb^, to the variability of that contrast:
  T = cᵀb^ / std(cᵀb^) = cᵀb^ / sqrt(s²cᵀ((WX)ᵀ(WX))⁻¹c)
• The map of T-values is output to the file spmT_*.img, where * = the number in the Contrast Manager, eg spmT_0001.img
• The contrast itself (cᵀb^, ie the numerator, a linear combination of b's) is output to con_*.img, eg con_0001.img
• Note that T values are independent of the scaling of the contrast weights: [1 1] ≡ [2 2]…
• …however, the contrast value (and hence the size of plots) is not, so use [0.5 0.5] if you want to plot the average of A and B

The Full-Monty T-test
• Putting it all together, for the model y = Xb + e with cov(e) = s²V:
  W = V^(-1/2)   (V from ReML estimation)
  b^ = (WX)⁺Wy
  s^² = (Wy − WXb^)ᵀ(Wy − WXb^) / trace(RV),  R = I − WX(WX)⁺
  T = cᵀb^ / std(cᵀb^),  std(cᵀb^) = sqrt(s^² cᵀ((WX)ᵀ(WX))⁻¹c)
  eg c = [1 0 0 … 0]ᵀ

F-contrasts
• An F statistic is a ratio of variances, eg the additional variance captured by the full model (X) relative to a "reduced" version of the model (X0)…
• …where the reduced model is specified by (the null space of) the F-contrast c
• This is equivalent to the ratio of how much greater the residual variance is from the reduced model (ε0ᵀε0) than from the full model (εᵀε):
  F = [(ε0ᵀε0 − εᵀε) / (L − L0)] / [εᵀε / (J − L)]   (iid)
  where L and L0 are the dfs in the full and reduced models
• The map of F-values is output to the file spmF_*.img, where * = the number in the Contrast Manager, eg spmF_0001.img
• The extra sum of squares is output to ess_*.img, eg ess_0001.img

F-contrasts
• Eg: do the movement parameters explain significant variability? Compare the full model X = [X0 X1] with the reduced model X0 via the F-contrast selecting the movement-parameter columns:
  c′ = [0 0 1 0 0 0 0 0
       0 0 0 1 0 0 0 0
       0 0 0 0 1 0 0 0
       0 0 0 0 0 1 0 0
       0 0 0 0 0 0 1 0
       0 0 0 0 0 0 0 1]

Overview
1. General Linear Model
2. fMRI timeseries
3. T and F-contrasts
4. Statistical Inference: Random Field Theory (FWE); False Discovery Rate (FDR); Posterior Probability Maps (PPM)
5. Mixed (Fixed & Random) Effects

Multiple comparisons…
• If n = 100,000 voxels are tested with probability pu = 0.05 of falsely rejecting H0… …then approximately n × pu (eg 5,000) will do so by chance (false positives, or "type I" errors), even in pure random noise
• Therefore we need to "correct" p-values for the number of comparisons
• A severe correction would be Bonferroni, where pc = pu/n…
• …but this is only appropriate when the n tests are independent…
• …SPMs are smooth, meaning that nearby voxels are correlated => Random Field Theory...
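The scale of the multiple-comparisons problem, and the Bonferroni correction, can be illustrated numerically. A minimal sketch (the voxel count and the assumption of independent Gaussian noise come from the slide's example, not from any real analysis):

```python
import numpy as np
from scipy import stats

n_voxels = 100_000
p_u = 0.05  # uncorrected per-voxel threshold

# Expected number of false positives if every voxel is pure noise:
print(n_voxels * p_u)  # -> 5000.0

# Bonferroni: control FWE at 0.05 by testing each voxel at p_u / n,
# and find the corresponding one-tailed Z threshold (~4.9).
p_bonf = p_u / n_voxels
z_bonf = stats.norm.isf(p_bonf)
print(round(z_bonf, 2))

# Simulate independent noise images: how often does at least one voxel
# exceed the Bonferroni threshold?  Should be roughly 5% of realisations.
rng = np.random.default_rng(0)
n_rep = 200
fwe_hits = sum((rng.standard_normal(n_voxels) > z_bonf).any()
               for _ in range(n_rep))
print(fwe_hits / n_rep)
```

For truly independent tests Bonferroni is exact to first order; the point of the slide is that smooth SPMs violate the independence assumption, which is where Random Field Theory comes in.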
[Figure: SPM{t} of pure random noise thresholded at pu = 0.05; Gaussian smoothed to 10mm FWHM (2mm pixels)]

Random Field Theory (RFT)
• Consider the SPM as a lattice representation of a continuous random field
• The "Euler characteristic" (EC) is a topological measure: # "components" − # "holes"
• The expected EC depends on the smoothness of the field
• Smoothness is estimated from the covariance of the partial derivatives of the residuals (expressed as "resels" or FWHM)
• Smoothness does not have to be stationary (for height thresholding): it can be estimated locally as "resels-per-voxel" (RPV)

FamilyWise Error
• We want a "Family-Wise Error" (FWE) rate of, eg, 0.05: ie the probability of one or more false positives anywhere in the volume:
  FWER = P(FWE) = P(∪i {Ti ≥ u} | H0) = P(maxi Ti ≥ u | H0)
       = P(one or more blobs | H0) ≈ P(EC(u) ≥ 1 | H0) ≤ E[EC(u) | H0] = 5%
• For a 3D Gaussian field, the expected Euler characteristic is approximately:
  E[EC(u)] ≈ R(Ω) |Λ|^(1/2) (u² − 1) exp(−u²/2) / (2π)²
  where R(Ω) is the search-region volume and |Λ|^(1/2) its roughness (resels-per-voxel, RPV) - and so much more!

Levels of Inference
• Three levels of inference:
  - extreme voxel values → voxel-level (height) inference, eg P(t ≥ 4.37) = .048
  - big suprathreshold clusters → cluster-level (extent) inference, eg P(n ≥ 82, t ≥ u) = 0.029
  - many suprathreshold clusters → set-level inference, eg P(c ≥ 3, n ≥ k, t ≥ u) = 0.019
• Parameters: "height" threshold u (eg t > 3.09); "extent" threshold k (eg 12 voxels); dimension D (eg 3); volume S (eg 32³ voxels); smoothness (eg FWHM = 4.7 voxels)
[Figure: example SPM window, with clusters of n = 1, 2, 32 and 82 voxels]

(Spatial) Specificity vs. Sensitivity

Small-volume correction (SVC)
• If you have an a priori region of interest, there is no need to correct for the whole brain!
• You can instead use RFT to correct for a Small Volume
• The volume can be based on:
  - an anatomically-defined region
  - a geometric approximation to the above (eg rhomboid/sphere)
  - a functionally-defined mask (based on an ORTHOGONAL contrast!)
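The expected-EC formula above can be put to work to show why SVC helps: solving E[EC(u)] = 0.05 for u gives the FWE-corrected height threshold, and a smaller search volume (fewer resels) gives a lower threshold. A minimal sketch, keeping only the highest-order (3D) EC density and measuring volume in resels (which absorbs the roughness term); the resel counts are invented for illustration:

```python
import numpy as np
from scipy.optimize import brentq

def expected_ec(u, resels):
    """Expected Euler characteristic of a 3D Gaussian random field
    thresholded at Z = u (highest-order term only, volume in resels)."""
    return (resels * (4 * np.log(2))**1.5
            * (u**2 - 1) * np.exp(-u**2 / 2) / (2 * np.pi)**2)

# Hypothetical search volumes: ~1000 resels for a whole brain,
# vs ~20 resels for a small spherical volume of interest.
for resels in (1000, 20):
    # Solve E[EC(u)] = 0.05 for the FWE-corrected height threshold u
    u_fwe = brentq(lambda u: expected_ec(u, resels) - 0.05, 2, 10)
    print(resels, round(u_fwe, 2))
```

With these (made-up) numbers the whole-brain threshold comes out near Z ≈ 4.6 and the small-volume threshold near Z ≈ 3.5, illustrating the sensitivity gained by an a priori region of interest.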
• Note that the extent of the correction depends on the shape (surface area) as well as the size (volume) of the region (you may want to smooth the volume if it is rough)

Caveats for Random Field Theory
• RFT treats the lattice of image data as a continuous random field, so you always need sufficient smoothness:
  - FWHM smoothness of 3-4 × voxel size (more like ~10× for low-df images, else RFT is more conservative than Bonferroni!)
  - so for dfs < ~12, use more smoothing, or SnPM!
• For cluster-level inference:
  - you need a reasonably strict initial height threshold (eg p < .001 uncorrected)
  - stationarity of smoothness is assumed
[Figure: example SPM window]

Overview
1. General Linear Model
2. fMRI timeseries
3. T and F-contrasts
4. Statistical Inference: Random Field Theory (FWE); False Discovery Rate (FDR); Posterior Probability Maps (PPM)
5. Mixed (Fixed & Random) Effects

False Discovery Rate (FDR)
• FWE (for a height threshold) is the probability of one or more false positive voxels in the whole image (or small volume)
• The False Discovery Rate (FDR) is instead the expected proportion of false positives among the voxels "declared active":
  FDR = E(V/R), where R voxels are declared active, V of them falsely so
• The realised false discovery rate is V/R
[Figure: simulated Signal, Noise and Signal+Noise images under three thresholding regimes - controlling the per-comparison rate at 10% gives ~10-12% of null pixels as false positives in every realisation; controlling the familywise error rate at 10% gives a familywise error in ~10% of realisations; controlling the FDR at 10% gives ~7-16% of activated pixels as false positives, ~10% on average]

False Discovery Rate (FDR)
• For any threshold, all voxels can be cross-classified; the realised FDR is
  rFDR = V0R / (V1R + V0R) = V0R / NR
• But we can only observe NR; we don't know V1R and V0R…
• …so we control the expected FDR = E(rFDR)

Benjamini & Hochberg Procedure
• Select the desired limit q on the FDR
• Order the p-values: p(1) ≤ p(2) ≤ … ≤ p(V)
• Let r be the largest i such that p(i) ≤ (i/V) × q/c(V), where
  - c(V) = 1 under "Positive Regression Dependency on Subsets"
  - c(V) = Σi=1…V 1/i ≈ log(V) + 0.58 for an arbitrary covariance structure
• Reject all hypotheses p(1), …, p(r)
[Figure: ordered p-values p(i) plotted against i/V, with the criterion line (i/V) × q/c(V); example SPM window]

Summary for False Discovery Rate
• Adaptive, ie the threshold depends on the amount of signal:
  - more signal → more p(i) less than (i/V) × q/c(V)
  - (if there is no signal, it gives the same control as FWE)
• …but this data-dependence also means that it is difficult to compare results across experiments
• Not as conventional as FWE? Sociology of science…
• Rough conclusion: FWE is more specific, less sensitive (and very insensitive under some conditions); FDR is less specific, more sensitive

Overview
1. General Linear Model
2. fMRI timeseries
3. T and F-contrasts
4. Statistical Inference: Random Field Theory (FWE); False Discovery Rate (FDR); Posterior Probability Maps (PPM)
5. Mixed (Fixed & Random) Effects

Classical vs Bayesian Inference
• In classical inference, the p-value represents the probability of the data given that the parameter (contrast) is zero (the "null hypothesis"): P(T ≥ u | θ = 0)
• Two (possible) problems with this approach:
  - the inability to prove the null hypothesis
  - very small effects can be significant
• Bayesian inference can instead be performed on the probability of the parameter given the data: p(θ | y)
• This "posterior" distribution can be "thresholded" for a particular size of effect, γ (eg 1% signal change)

Parametric Empirical Bayes
• Bayes' rule: p(θ|y) ∝ p(y|θ) p(θ) - posterior (PPM) ∝ likelihood (SPM) × prior
• What are the priors?
– In "classical" SPM, no (flat) priors
– In "full" Bayes, priors could be theoretical constraints, or come from independent data
– In "empirical" Bayes, priors derive from the same data, assuming a hierarchical model for the generation of those data…

Hierarchical Models
• In a hierarchical model, the parameters of one level can be made priors on the distribution of the parameters at the level below: "Parametric Empirical Bayes" (Friston et al, 2002)
  y = X⁽¹⁾θ⁽¹⁾ + e⁽¹⁾
  θ⁽¹⁾ = X⁽²⁾θ⁽²⁾ + e⁽²⁾
  …
  θ⁽ⁿ⁻¹⁾ = X⁽ⁿ⁾θ⁽ⁿ⁾ + e⁽ⁿ⁾   (with Ce⁽ⁱ⁾ = Σk λk⁽ⁱ⁾ Qk⁽ⁱ⁾)
• The parameters and hyperparameters at each level can be estimated using the EM algorithm (a generalisation of ReML)
• (note that the parameters and hyperparameters at the final level do not differ from the classical framework)
• The second level could be subjects (a hidden option in SPM, given the computational expense)…
• …or voxels (a user option in SPM)

Posterior Probability Maps (PPMs)
• Bayes' rule again: p(θ|y) ∝ p(y|θ) p(θ) - posterior (PPM) ∝ likelihood (SPM) × prior
• For PPMs in SPM5, the priors come from the distribution over voxels
• If the mean over voxels is removed, the prior mean can be set to zero (a "shrinkage" prior)
• One can threshold the posterior for a given probability that the contrast is greater than γ…
• …to give a posterior probability map (PPM)

PPMs vs SPMs
[Figure: a PPM (height threshold P = 0.95, extent threshold k = 0 voxels) and an SPM{T39.0} (height threshold T = 5.50, extent threshold k = 0 voxels) for the same PET contrast, with their design matrices]
  PPM (Bayesian)                              SPM (classical)
  Activations greater than a certain amount   Voxels with non-zero activations
  Can infer no responses                      Cannot "prove the null hypothesis"
  No fallacy of inference                     Fallacy of inference (large df)
  Inference independent of search volume      Must correct for search volume
  Computationally expensive                   Computationally faster

Overview
1. General Linear Model
2. fMRI timeseries
3. T and F-contrasts
4. Statistical Inference
5. Mixed (Fixed & Random) Effects

Fixed vs. Random Effects
• Subjects can be Fixed or Random variables
• If subjects are a Fixed variable in a single design matrix (SPM "sessions"), the error term conflates within- and between-subject variance
  - but in fMRI (unlike PET) the between-scan variance is normally much smaller than the between-subject variance
• If one wishes to make an inference from a subject sample to the population, one needs to treat subjects as a Random variable, which requires a proper mixture of within- and between-subject variance
• In SPM, this is achieved by a two-stage procedure:
  1) (contrasts of) parameters are estimated from a (Fixed Effect) model for each subject
  2) images of these contrasts become the data for a second design matrix (usually a simple t-test or ANOVA)

Multi-subject Fixed Effect model
[Figure: design matrix concatenating subjects 1-6; error df ~ 300]

Two-stage "Summary Statistic" approach
• 1st level (within-subject): fit each subject's model, with within-subject error variance s²w, and compute contrast images of cᵀbi for subjects i = 1…N (eg N = 6)
• 2nd level (between-subject): a one-sample t-test on the N contrast images (error df = N − 1 = 5), eg thresholded at p < 0.001 (uncorrected) to give the group SPM{t}
• In the special case of n independent observations per subject:
  var(b^pop) = s²b/N + s²w/Nn

Limitations of the 2-stage approach
• The summary statistic approach is a special case, strictly valid only when each subject's design matrix is identical ("balanced designs") and the underlying error is identical
• In practice, the approach is reasonably robust to unbalanced designs (Penny, 2004)
• More generally, exact solutions ("mixed effects")
can be obtained using a hierarchical GLM and EM
• This is computationally expensive to perform at every voxel (and so is not implemented in SPM…)
• …plus modelling of nonsphericity at the 2nd level can minimise the potential bias of unbalanced designs

The End
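As a closing illustration, the two-stage "summary statistic" approach described above can be sketched in code. This is a toy simulation, not SPM code: the design (a box-car plus mean), the effect sizes, and the subject count are all invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# 1st level (within-subject): for each of N = 6 subjects, fit that
# subject's GLM by OLS and keep one contrast of the parameter
# estimates, c'b_hat (the "contrast image" reduced to one voxel).
N, J = 6, 100
X = np.column_stack([np.tile([1.0, 0.0], J // 2),  # box-car regressor
                     np.ones(J)])                  # mean (constant)
c = np.array([1.0, 0.0])
con = []
for subject in range(N):
    # True effect varies across subjects (between-subject variance)...
    beta_true = np.array([0.5 + 0.2 * rng.standard_normal(), 10.0])
    # ...and each subject's scans carry within-subject noise.
    y = X @ beta_true + rng.standard_normal(J)
    b_hat = np.linalg.pinv(X) @ y
    con.append(c @ b_hat)

# 2nd level (between-subject): the contrast values become the data for
# a one-sample t-test, whose error properly mixes within- and
# between-subject variance (df = N - 1 = 5).
t, p = stats.ttest_1samp(con, 0.0)
print(round(t, 2), p < 0.05)
```

The same two-stage logic applies voxel-wise in a real analysis, with the first-level contrast images entering the second-level design matrix.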