Advanced Statistical Inference Methods & Issues:
Multiple Testing, Nonparametrics, Conjunctions & Bayes

Thomas Nichols, Ph.D.
Department of Biostatistics, University of Michigan
http://www.sph.umich.edu/~nichols
OHBM fMRI Course, NIH Neuroinformatics / Human Brain Project, June 12, 2005
© 2005 Thomas Nichols

Overview
• Multiple Testing Problem
  – Which of my 100,000 voxels are “active”?
• Nonparametric Inference
  – Can I trust my P-value at this voxel?
• Conjunction Inference
  – Is this voxel “active” in all of my tasks?
• Bayesian vs. Classical Inference
  – What in the world is a posterior probability?

Hypothesis Testing
• Null hypothesis H0
• Test statistic T
  – t: the observed realization of T
• α level
  – Acceptable false positive rate
  – α = P( T > u | H0 )
  – Threshold u controls the false positive rate at level α
• P-value
  – Assessment of t assuming H0: P( T > t | H0 )
  – Probability of obtaining a statistic as large or larger in a new experiment
  – P(Data | Null), not P(Null | Data)

Hypothesis Testing in fMRI
• Massively univariate modeling
  – Fit a model at each voxel
  – Create statistic images of the effect
• Then find the signal in the image...

Assessing Statistic Images
Where’s the signal?
• High threshold (t > 5.5): good specificity, but poor power (risk of false negatives)
• Medium threshold (t > 3.5): a compromise
• Low threshold (t > 0.5): good power, but poor specificity (risk of false positives)
...but why threshold at all?!

Blue-sky inference: What we’d like
• Don’t threshold, model the signal!
  – Signal location?
    • Estimates and CI’s on (x,y,z) location
  – Signal magnitude?
    • Estimates and CI’s on % change
  – Spatial extent?
    • Estimates and CI’s on activation volume
    • Robust to choice of cluster definition
• ...but this requires an explicit spatial model

Blue-sky inference: What we need
• Need an explicit spatial model
• No routine spatial modeling methods exist
  – High-dimensional mixture modeling problem
  – Activations don’t look like Gaussian blobs
  – Need realistic shapes, sparse representation
• Some work by Hartvig et al., Penny et al.

Real-life inference: What we get
• Signal location
  – Local maximum – no inference
  – Center-of-mass – no inference
    • Sensitive to blob-defining threshold
• Signal magnitude
  – Local maximum intensity – P-values (& CI’s)
• Spatial extent
  – Cluster volume – P-value, no CI’s
    • Sensitive to blob-defining threshold

Voxel-level Inference
• Retain voxels above the α-level threshold u
• Gives the best spatial specificity
  – The null hypothesis at a single voxel can be rejected

Cluster-level Inference
• Two-step process
  – Define clusters by an arbitrary threshold uclus
  – Retain clusters larger than the α-level size threshold k
• Typically better sensitivity
• Worse spatial specificity
  – The null hypothesis of the entire cluster is rejected
  – Only means that one or more voxels in the cluster are active

Voxel-wise Inference & Multiple Testing Problem (MTP)
• Standard hypothesis test
  – Controls the Type I error of each test at, say, 5%
  – But what if I have 100,000 voxels?
    • 5,000 false positives on average!
  – Must control the false positive rate
    • Which false positive rate?
    • Chance of 1 or more Type I errors?
    • Proportion of Type I errors?
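The 5,000-false-positives figure above can be made concrete with a minimal simulation. This is a sketch under assumptions the slides do not make: independent standard-normal null statistics at every voxel (real smoothed fMRI statistic images are spatially correlated), and a one-sided Z threshold.

```python
import numpy as np

rng = np.random.default_rng(12345)
n_voxels = 100_000
u = 1.645  # one-sided 5% threshold for a Z statistic

# A pure-null "statistic image": every voxel is noise, no true activation.
z = rng.standard_normal(n_voxels)

# Testing each voxel separately at alpha = 0.05 yields roughly
# 0.05 * 100,000 = 5,000 false positives.
n_false_pos = int((z > u).sum())
print(n_false_pos)
```

The count fluctuates around 5,000 from run to run (binomial standard deviation of about 70), which is exactly why a per-voxel α gives no control over image-wide false positives.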
MTP Solutions: Measuring False Positives
• Familywise Error Rate (FWER)
  – Familywise error: the existence of one or more false positives
  – FWER is the probability of a familywise error
• False Discovery Rate (FDR)
  – R voxels declared active, V falsely so
  – Observed false discovery rate: V/R
  – FDR = E(V/R)

FWER MTP Solutions
• Bonferroni
• Maximum distribution methods
  – Random Field Theory
  – Permutation

FWER MTP Solutions: Bonferroni
• V voxels to test
• Corrected threshold
  – Threshold corresponding to α = 0.05/V
• Corrected P-value
  – min{ P-value × V, 1 }

FWER MTP Solutions: Controlling FWER with the Maximum
• FWER & the distribution of the maximum
    FWER = P(FWE)
         = P(one or more voxels ≥ u | H0)
         = P(max voxel ≥ u | H0)
• The 100(1−α)th percentile of the max distribution controls FWER

FWER MTP Solutions: Random Field Theory
• Euler Characteristic χu
  – Topological measure: #blobs − #holes
  – At high thresholds, χu just counts blobs of the suprathreshold set
  – FWER = P(max voxel ≥ u | H0)
         = P(one or more blobs | H0)     (no holes at high u)
         ≈ P(χu ≥ 1 | H0)                (never more than 1 blob)
         ≤ E(χu | H0)

RFT Details: Expected Euler Characteristic
    E(χu) ≈ λ(Ω) |Λ|^(1/2) (u² − 1) exp(−u²/2) / (2π)²
  – Ω: search region, Ω ⊂ R³
  – λ(Ω): volume
  – |Λ|^(1/2): roughness
• Assumptions
  – Multivariate normal
  – Stationary*
  – ACF twice differentiable at 0
• Only the very upper tail approximates 1 − Fmax(u)
  * Stationarity: results valid without stationarity, but more accurate when it holds

Random Field Intuition
• Corrected P-value for voxel value t
    Pc = P(max T > t) ≈ E(χt) ≈ λ(Ω) |Λ|^(1/2) t² exp(−t²/2)
• Statistic value t increases
  – Pc decreases (of course!)
• Search volume λ(Ω) increases
  – Pc increases (more severe MTP)
• Smoothness increases (|Λ|^(1/2) smaller)
  – Pc decreases (less severe MTP)

Random Field Theory Strengths & Weaknesses
• Closed-form results for E(χu)
  – Z, t, F, chi-squared continuous random fields
• Results depend only on volume & smoothness
• Smoothness assumed known
• Sufficient smoothness required
  – Results are for continuous random fields, not lattice image data
  – Smoothness estimate becomes biased when smoothness is low
• Multivariate normality
• Several layers of approximations

Real Data
• fMRI study of working memory [Marshuetz et al (2000)]
  – 12 subjects, block design, item recognition
  – Active: view five letters (e.g. “U B K D A”), 2s pause, view probe letter (“D”), respond (“yes”)
  – Baseline: view “XXXXX”, 2s pause, view “Y” or “N”, respond (“no”)
• Second-level RFX
  – Difference image A−B constructed for each subject
  – One-sample t test

Real Data: RFT Result
• Threshold
  – S = 110,776 voxels of 2×2×2 mm
  – 5.1 × 5.8 × 6.9 mm FWHM smoothness
  – u = 9.870
• Result
  – 5 voxels above the threshold
  – 0.0063 minimum FWE-corrected p-value

MTP Solutions: Measuring False Positives
• Familywise Error Rate (FWER)
  – Familywise error: the existence of one or more false positives
  – FWER is the probability of a familywise error
• False Discovery Rate (FDR)
  – R voxels declared active, V falsely so
  – Realized false discovery rate: V/R
  – FDR = E(V/R)

False Discovery Rate
• For any threshold, all voxels can be cross-classified:

                              Accept Null   Reject Null
    Null true (no effect)         V0A           V0R       m0
    Null false (true effect)      V1A           V1R       m1
                                  NA            NR        V

• Realized FDR
    rFDR = V0R/(V1R + V0R) = V0R/NR
  – If NR = 0, rFDR = 0
• But we can only observe NR; we don’t know V1R & V0R
  – So we control the expected rFDR: FDR = E(rFDR)

False Discovery Rate Illustration
[Simulated images: Noise, Signal, Signal+Noise; 10 realizations per method]
• Control of per-comparison rate at 10%
  – Percentage of null pixels that are false positives, per realization:
    11.3%, 11.3%, 12.5%, 10.8%, 11.5%, 10.0%, 10.7%, 11.2%, 10.2%, 9.5%
• Control of familywise error rate at 10%
  – A familywise error (FWE) occurs in about 1 realization in 10
• Control of false discovery rate at 10%
  – Percentage of activated (suprathreshold) pixels that are false positives, per realization:
    6.7%, 10.4%, 14.9%, 9.3%, 16.2%, 13.8%, 14.0%, 10.5%, 12.2%, 8.7%

Benjamini & Hochberg Procedure                    [JRSS-B (1995) 57:289-300]
• Select the desired limit q on FDR
• Order the p-values, p(1) ≤ p(2) ≤ ... ≤ p(V)
• Let r be the largest i such that p(i) ≤ (i/V) q
• Reject all hypotheses corresponding to p(1), ..., p(r)

Benjamini & Hochberg Procedure Details
• Method is valid under positive dependence (e.g. spatial smoothness)
  – Positive Regression Dependency on Subsets:
    P(X1 ≤ c1, X2 ≤ c2, ..., Xk ≤ ck | Xi = xi) is non-decreasing in xi
  – Only required of test statistics for which the null is true
• Special cases include
  – Independence
  – Multivariate normal with all positive correlations
  – Same, but studentized with a common standard error
• For arbitrary covariance structure
  – Replace q with q / c(V), where c(V) = Σ i=1..V 1/i ≈ log(V) + 0.5772
  [Benjamini & Yekutieli (2001). Ann. Stat. 29:1165-1188]

Adaptiveness of Benjamini & Hochberg FDR
[Plot: ordered p-values p(i) vs. fractional index i/V]
• When there is no signal: the P-value threshold approaches α/V (Bonferroni-like)
• When it is all signal: the P-value threshold approaches α
• ...FDR adapts to the amount of signal in the data

FDR Example
• Threshold
  – Indep/PosDep u = 3.83
• Result
  – 3,073 voxels above the Indep/PosDep u
  – <0.0001 minimum FDR-corrected p-value
• Compare: FWER permutation threshold = 7.67, 58 voxels

Overview: Nonparametric Inference – Can I trust my P-value at this voxel?
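The Benjamini & Hochberg step-up procedure described in the FDR slides can be sketched in a few lines of NumPy. This is a minimal illustration, not from any particular package; the function name is mine.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean array marking rejected hypotheses, controlling
    FDR at level q (valid under independence or positive dependence).
    """
    p = np.asarray(pvals, dtype=float)
    V = p.size
    order = np.argsort(p)                 # indices of the sorted p-values
    thresh = q * np.arange(1, V + 1) / V  # (i/V) * q for i = 1..V
    below = p[order] <= thresh
    reject = np.zeros(V, dtype=bool)
    if below.any():
        r = np.max(np.nonzero(below)[0])  # largest i with p(i) <= (i/V) q
        reject[order[: r + 1]] = True     # reject p(1), ..., p(r)
    return reject

# Example: with q = 0.10 the three smallest p-values are rejected,
# since p(3) = 0.03 <= (3/4) * 0.10 = 0.075 but p(4) = 0.5 > 0.10.
rej = benjamini_hochberg([0.01, 0.5, 0.03, 0.02], q=0.10)
```

Note the step-up character: all hypotheses up to rank r are rejected, even if some intermediate p(i) were above their own threshold.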
Nonparametric Permutation Test
• Parametric methods
  – Assume the distribution of the statistic under the null hypothesis
• Nonparametric methods
  – Use the data to find the distribution of the statistic under the null hypothesis
  – Any statistic!

Permutation Test Toy Example
• Data from a V1 voxel in a visual stimulation experiment
  – A: Active, flashing checkerboard; B: Baseline, fixation
  – 6 blocks, ABABAB; just consider the block averages...
        A        B        A        B        A        B
      103.00    90.48    99.93    87.83    99.76    96.06
• Null hypothesis H0
  – No experimental effect; the A & B labels are arbitrary
• Statistic
  – Mean difference

Permutation Test Toy Example
• Under H0
  – Consider all equivalent relabelings
  – Compute all possible statistic values
  – Find the 95th percentile of the permutation distribution
      AAABBB  4.82    ABABAB  9.45    BAAABB -1.48    BABBAA -6.86
      AABABB -3.25    ABABBA  6.97    BAABAB  1.10    BBAAAB  3.15
      AABBAB -0.67    ABBAAB  1.38    BAABBA -1.38    BBAABA  0.67
      AABBBA -3.15    ABBABA -1.10    BABAAB -6.97    BBABAA  3.25
      ABAABB  6.86    ABBBAA  1.48    BABABA -9.45    BBBAAA -4.82
  – The observed labeling (ABABAB, 9.45) gives the largest of the 20 values, so p = 1/20 = 0.05

Controlling FWER: Permutation Test
• Parametric methods
  – Assume the distribution of the max statistic under the null hypothesis
• Nonparametric methods
  – Use the data to find the distribution of the max statistic under the null hypothesis
  – Again, any max statistic!
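The toy permutation test above can be reproduced directly. The block averages are transcribed from the slide; note the mean difference computed from these rounded values comes out near 9.44, while the slide (presumably working from unrounded data) reports 9.45.

```python
from itertools import combinations

# Block averages transcribed from the slide (ABABAB design)
blocks = [103.00, 90.48, 99.93, 87.83, 99.76, 96.06]
observed_A = (0, 2, 4)  # indices of the A (active) blocks

def mean_diff(a_idx):
    """Mean of A blocks minus mean of B blocks for a given labeling."""
    a = [blocks[i] for i in a_idx]
    b = [blocks[i] for i in range(6) if i not in a_idx]
    return sum(a) / len(a) - sum(b) / len(b)

observed = mean_diff(observed_A)

# Under H0 the A/B labels are arbitrary: enumerate all C(6,3) = 20 relabelings
perm_stats = [mean_diff(a_idx) for a_idx in combinations(range(6), 3)]

# One-sided p-value: fraction of relabelings at least as large as observed
p = sum(s >= observed - 1e-12 for s in perm_stats) / len(perm_stats)
print(observed, p)  # observed is the unique maximum, so p = 1/20 = 0.05
```

Because the observed labeling picks out exactly the three largest block averages, no relabeling can exceed it, and the one-sided p-value is the smallest attainable with 20 relabelings: 0.05.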
Permutation Test & Exchangeability
• Exchangeability is fundamental
  – Definition: the distribution of the data is unperturbed by permutation
  – Under H0, exchangeability justifies permuting the data
  – Allows us to build the permutation distribution
• Subjects are exchangeable
  – Under H0, each subject’s A/B labels can be flipped
• fMRI scans are not exchangeable under H0
  – If there is no signal, can we permute over time?
  – No: permuting disrupts the temporal order and autocorrelation

Permutation Test & Exchangeability
• fMRI scans are not exchangeable
  – Permuting disrupts order, temporal autocorrelation
• Intrasubject fMRI permutation test
  – Must decorrelate the data and model before permuting
  – What is the correlation structure?
    • Usually must use a parametric model of the correlation
  – E.g. use wavelets to decorrelate [Bullmore et al 2001, HBM 12:61-78]
• Intersubject fMRI permutation test
  – Create a difference image for each subject
  – For each permutation, flip the sign of some subjects

Permutation Test Example
• fMRI study of working memory [Marshuetz et al (2000)]
  – 12 subjects, block design, item recognition
  – Active: view five letters (“U B K D A”), 2s pause, view probe letter (“D”), respond (“yes”)
  – Baseline: view “XXXXX”, 2s pause, view “Y” or “N”, respond (“no”)
• Second-level RFX
  – Difference image A−B constructed for each subject
  – One-sample, smoothed-variance t test

Permutation Test Example
• Permute!
  – 2^12 = 4,096 ways to flip the 12 A/B labels
  – For each, note the maximum of the t image
  – Build the permutation distribution of the maximum; threshold at its 95th percentile
[Figure: permutation distribution of maximum t; maximum intensity projection of thresholded t]

Permutation Test Example
• Compare with Bonferroni: α = 0.05/110,776
• Compare with parametric RFT
  – 110,776 2×2×2 mm voxels
  – 5.1 × 5.8 × 6.9 mm FWHM smoothness
  – 462.9 RESELs
• Results
  – t11 statistic, nonparametric threshold: uPerm = 7.67, 58 significant voxels
  – t11 statistic, RF & Bonferroni thresholds: uRF = 9.87, uBonf = 9.80, 5 significant voxels
  – Smoothed-variance t statistic, nonparametric threshold: 378 significant voxels

Does this Generalize? RFT vs. Bonf. vs. Perm.
                              No. Significant Voxels (0.05 Corrected)
                        df     RF    Bonf   t Perm   SmVar t Perm
  Verbal Fluency         4      0      0       0           0
  Location Switching     9      0      0     158         354
  Task Switching         9      4      6    2241        3447
  Faces: Main Effect    11    127    371     917        4088
  Faces: Interaction    11      0      0       0           0
  Item Recognition      11      5      5      58         378
  Visual Motion         11    626   1260    1480        4064
  Emotional Pictures    12      0      0       0           7
  Pain: Warning         22    127    116     221         347
  Pain: Anticipation    22     74     55     182         402

Overview: Conjunction Inference – Is this voxel “active” in all of my tasks?

Conjunction Inference
• Consider several working memory tasks
  – N-back tasks with different stimuli
  – Letter memory: D J P F D R A T F M R I B K
  – Number memory: 4 2 8 4 4 2 3 9 2 3 5 8 9 3 1 4
  – Shape memory
• Interested in a stimulus-generic response
  – What areas of the brain respond to all 3 tasks?
  – Don’t want areas that only respond in 1 or 2 tasks

Conjunction Inference
• For the working memory example, K = 3 statistic images
  – Letters: T1, Numbers: T2, Shapes: T3
• Test the conjunction null H0: at least one of the three effects is not present
  (the union of the individual nulls H0¹ ∪ H0² ∪ H0³)
• versus HA: all three effects are present
  (the intersection of the individual alternatives HA¹ ∩ HA² ∩ HA³)

Conjunction Inference Methods: Friston et al.
• Use the minimum of the K statistics
  – Idea: only declare a conjunction if all of the statistics are sufficiently large
  – min_k Tk ≥ u only when Tk ≥ u for all k

Valid Conjunction Inference with the Minimum Statistic
• For valid inference, compare the min statistic to uα
  – Assess the min_k Tk image as if it were just T1
  – E.g. u0.05 = 1.64 (or some corrected threshold)
• Equivalently, take an intersection mask
  – Threshold each statistic image at, say, 0.05 FWE corrected
  – Make a mask: 0 = below threshold, 1 = above threshold
  – Intersection of the masks: conjunction-significant voxels

Overview: Bayesian vs. Classical Inference – What in the world is a posterior probability?

Classical Statistics: Model
• Estimation: n = 12 subject fMRI study
• Data at one voxel
  – Ȳ, sample average % BOLD change
• Model
  – Ȳ ~ N(μ, σ/√n)
  – μ is the true population mean BOLD % change
• Likelihood p(Ȳ | μ)
  – Relative frequency of observing Ȳ for one given value of μ

Classical Statistics: MLE
• We don’t know μ in practice
• Maximum Likelihood Estimation
  – Find the μ that makes the data most likely
  – The MLE μ̂ is our estimate of μ
  – Here, the MLE of the population mean is simply the sample mean, ȳ (the actual value observed in the experiment)

Classical Statistical Inference
• 95% confidence interval
  – Ȳ ± 1.96 σ/√n
  – With many replications of the experiment, the CI will contain μ 95% of the time

Classical Statistics Redux
• Grounded in the long-run frequency of observable phenomena
  – Data, over theoretical replications
  – Inference: confidence intervals, P-values
• Estimation based on likelihood
• Parameters are fixed
  – Can’t talk about the probability of parameters
  – P( population mean > 0 ) ???
  – The true population mean % BOLD is either > 0 or not
  – The only way to know would be to scan everyone in the population

Bayesian Statistics
• Grounded in degrees of belief
  – “Belief” expressed with the grammar of probability
  – No problem making statements about unobservable parameters
• Parameters are regarded as random, not fixed
• Data are regarded as fixed, since you only have one dataset

Bayesian Essentials
• Prior distribution p(μ)
  – Expresses belief about the parameters before seeing the data
• Likelihood p(y | μ)
  – Same as with Classical
• Posterior distribution p(μ | y)
  – Expresses belief about the parameters after seeing the data
• Bayes’ Theorem tells how to combine the prior with the likelihood (data) to create the posterior:
    p(μ | y) = p(y | μ) p(μ) / ∫ p(y | μ′) p(μ′) dμ′  ∝  p(y | μ) p(μ)

Bayesian Statistics: From Prior to Posterior
• Prior p(μ): μ ~ N(μ0, σ0²)
  – μ0 = 0%: a priori belief that activation & deactivation are equally likely
  – σ0 = 1%: a priori belief that activation is small
• Data: ȳ = 5%
• Posterior ∝ Likelihood × Prior
  – Falls somewhere between the prior and the likelihood
[Plot: prior, likelihood, and posterior over population mean % BOLD change]

Bayesian Statistics: Posterior Inference
• All inference is based on the posterior
• E.g. the posterior mean (instead of the MLE):
    E(μ | y) = [ (1/σ0²) μ0 + (n/σ²) ȳ ] / [ (1/σ0²) + (n/σ²) ]
  – A weighted sum of the prior mean μ0 and the data mean ȳ
  – Weights based on prior & data precision

Bayesian Inference: Posterior Inference
• The posterior is just another distribution
  – Can ask any probability question
• E.g. “What’s the probability, after seeing the data, that μ > 0?”: P( μ > 0 | y )
  – Here P( μ > 0 | y ) ≈ 1
• “Credible intervals”
  – Here 4 ± 0.9 has 95% posterior probability
  – No reference to repetitions of the experiment

Bayesian vs. Classical
• Foundations
  – Classical: how observable statistics behave in the long run
  – Bayesian: measuring belief about unobservable parameters
• Inference
  – Classical: references other possible datasets not observed
    • Requires awkward explanations for CI’s & P-values
  – Bayesian: based on the posterior, a combination of prior and data
    • Allows intuitive probabilistic statements (posterior probabilities)

Bayesian vs. Classical
• Bayesian challenge: priors
  – I can set my prior to always find a result
  – “Objective” priors can be found; results are then often similar to Classical inference
• When are the two similar?
  – When n is large, the prior is overwhelmed by the likelihood
  – One-sided P-value ≈ posterior probability that μ > 0
  – Doesn’t work with the 2-sided P-value! [ P( μ ≠ 0 | y ) = 1 ]

Bayesian vs. Classical: SPM T vs. SPM PPM
• Auditory experiment
  – SPM{T}: voxels with T > 5.5 (height threshold T = 5.50, extent threshold k = 0 voxels)
  – PPM: voxels with posterior probability > 0.95 (height threshold P = 0.95, extent threshold k = 0 voxels)
• Qualitatively similar, but hard to equate the thresholds
[Slide: Alexis Roche, CEA, SHFJ]

Conclusions
• Multiple Testing Problem
  – Choose an MTP metric (FDR, FWE)
  – Use a powerful method that controls the metric
• Nonparametric Inference
  – More power for small-group FWE inferences
• Conjunction Inference
  – Use an intersection mask, or treat min_k Tk as a single T
• Bayesian Inference
  – Conceptually different, but simpler than Classical
  – Priors are controversial, but objective ones can be used

References
• Multiple Testing Problem
  – Worsley, Marrett, Neelin, Vandal, Friston and Evans. A Unified Statistical Approach for Determining Significant Signals in Images of Cerebral Activation. Human Brain Mapping, 4:58-73, 1996.
  – Nichols & Hayasaka. Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review. Statistical Methods in Medical Research, 12:419-446, 2003.
  – CR Genovese, N Lazar and TE Nichols. Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate. NeuroImage, 15:870-878, 2002.
• Nonparametric Inference
  – TE Nichols and AP Holmes. Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2002.
  – Bullmore, Long and Suckling. Colored noise and computational inference in neurophysiological (fMRI) time series analysis: resampling methods in time and wavelet domains. Human Brain Mapping, 12:61-78, 2001.
• Conjunction Inference
  – TE Nichols, M Brett, J Andersson, TD Wager, J-B Poline. Valid Conjunction Inference with the Minimum Statistic. NeuroImage, 2005.
  – KJ Friston, WD Penny and DE Glaser. Conjunction Revisited. NeuroImage, 25:661-667, 2005.
• Bayesian Inference
  – LR Frank, RB Buxton, EC Wong. Probabilistic analysis of functional magnetic resonance imaging data. Magnetic Resonance in Medicine, 39:132-148, 1998.
  – Friston, Penny, Phillips, Kiebel, Hinton and Ashburner. Classical and Bayesian inference in neuroimaging: theory. NeuroImage, 16:465-483, 2002. (See also pp. 484-512.)
  – Woolrich M, Behrens T, Beckmann C, Jenkinson M, and Smith S (2004). Multi-Level Linear Modelling for FMRI Group Analysis Using Bayesian Inference. NeuroImage, 21(4):1732-1747.

© 2005 Thomas Nichols