Fun with Bayes Factors
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016, Purdue University

Bayes Factor
The ratio of the likelihood of the data under the null to the likelihood under the alternative. There is nothing special about the null: a Bayes factor compares any two models. The likelihoods are averages across the different possible parameter values specified by each model's prior distribution.

What does it mean? Guidelines:
BF 1–3: Anecdotal evidence
BF 3–10: Substantial evidence
BF 10–30: Strong evidence
BF 30–100: Very strong evidence
BF > 100: Decisive evidence

Evidence for the null
BF01 > 1 implies (some) support for the null hypothesis. This gives evidence for "invariances," which is more or less impossible to get from NHST, so it is a useful measure.

Consider a recent study in Psychological Science: Liu, Wang, Wang & Jiang (2016), Conscious Access to Suppressed Threatening Information Is Modulated by Working Memory.

Working memory and face emotion
The study explored whether keeping a face in working memory influenced its visibility under continuous flash suppression. To ensure subjects kept the face in memory, they were tested on its identity. There were different types of face emotions (fearful, neutral), and there were no significant differences in correct (same/different) responses across emotions. Experiment 1: t(11) = -1.74, p = 0.110.

If we compute the JZS Bayes factor we get

> ttest.tstat(t=-1.74, n1=12, simple=TRUE)
      B10
0.9240776

which is anecdotal support for the null hypothesis. You would want B10 < 1/3 for substantial support for the null.

Replications
Experiment 3: t(11) = -1.62, p = .133
Experiment 4: t(13) = -1.37, p = .195
Converting to JZS Bayes factors suggests these are modest support for the null.

Experiment 3:
> ttest.tstat(t=-1.62, n1=12, simple=TRUE)
      B10
0.8033315

Experiment 4:
> ttest.tstat(t=-1.37, n1=14, simple=TRUE)
      B10
0.5857839

The null result matters
The authors wanted to demonstrate that faces with different emotions were equivalently represented in working memory but differently affected visibility during the flash-suppression part of a trial. In Experiment 1, reaction times for seeing a face during continuous flash suppression were shorter for fearful faces than for neutral faces (main effect of emotion: F(1, 11) = 5.06, p = 0.046), and reaction times were shorter when the emotion of the suppressed face matched the emotion of the face in working memory (main effect of congruency: F(1, 11) = 11.86, p = 0.005).

Main effects
We will talk about a Bayesian ANOVA later, but we can consider the t-test equivalent of these tests.

Effect of emotion:
> ttest.tstat(t=sqrt(5.06), n1=12, simple=TRUE)
     B10
1.769459
This suggests anecdotal support for the alternative hypothesis.

Effect of congruency:
> ttest.tstat(t=sqrt(11.86), n1=12, simple=TRUE)
     B10
9.664241
This suggests substantial support for the alternative hypothesis.

Evidence
It is generally harder to get convincing evidence (BF > 3 or BF > 10) than to get p < .05.
Interaction: F(1, 11) = 4.36, p = .061
Contrasts: RT for fearful faces was shorter when congruent with working memory, t(11) = -3.59, p = .004; RT for neutral faces was unaffected by congruency, t(11) = -0.45.

Bayesian interpretations of the t-tests:
> ttest.tstat(t=-3.59, n1=12, simple=TRUE)
     B10
11.94693
> ttest.tstat(t=-0.45, n1=12, simple=TRUE)
      B10
0.3136903

Substantial Evidence
For a two-sample t-test (n1=n2=10), BF > 3 corresponds to p < 0.022.
For a two-sample t-test (n1=n2=100), BF > 3 corresponds to p < 0.012.
For a two-sample t-test (n1=n2=1000), BF > 3 corresponds to p < 0.004.
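Correspondences like these are easy to check numerically. Below is a minimal sketch (assuming only the BayesFactor package; p_for_BF is a hypothetical helper, not a package function) that searches for the t value at which the default JZS Bayes factor reaches a criterion and converts it to a two-sided p value.

library(BayesFactor)

# Two-sided p value at which B10 reaches 'criterion' for a two-sample
# t-test with n observations per group, under the default JZS prior.
p_for_BF <- function(criterion = 3, n = 10, rscale = "medium") {
  df <- 2 * n - 2
  f <- function(t) ttest.tstat(t = t, n1 = n, n2 = n, rscale = rscale,
                               simple = TRUE) - criterion
  tcrit <- uniroot(f, interval = c(0.1, 20))$root  # t where B10 = criterion
  2 * pt(-tcrit, df = df)                          # corresponding p value
}

p_for_BF(criterion = 3, n = 10)    # roughly 0.02
p_for_BF(criterion = 10, n = 100)  # roughly 0.003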
Strong Evidence
For a two-sample t-test (n1=n2=10), BF > 10 corresponds to p < 0.004.
For a two-sample t-test (n1=n2=100), BF > 10 corresponds to p < 0.003.
For a two-sample t-test (n1=n2=1000), BF > 10 corresponds to p < 0.001.
Of course, if you change your prior you change these values (but not by much). Setting the scale parameter r = sqrt(2) (ultrawide) gives:
For a two-sample t-test (n1=n2=10), BF > 10 corresponds to p < 0.005.
For a two-sample t-test (n1=n2=100), BF > 10 corresponds to p < 0.0017.
For a two-sample t-test (n1=n2=1000), BF > 10 corresponds to p < 0.00054.

Bayesian meta-analysis
Rouder & Morey (2011) showed how to combine replication studies into a single JZS Bayes factor that accumulates information across experiments. For a one-sample (one-tailed) t-test, the meta-analytic Bayes factor is

BF10 = ∫ [ ∏_i g(t_i | ν_i, δ√n_i) ] f(δ) dδ / ∏_i g(t_i | ν_i, 0)

where f() is the Cauchy (or, for the one-tailed version, half-Cauchy) prior on the standardized effect δ, and g() is the non-central t density with ν_i = n_i - 1 degrees of freedom. It looks complicated, but it is easy enough to calculate.

Consider the null results on face emotion and memorability:
Experiment 1: t(11) = -1.74, p = 0.110
Experiment 3: t(11) = -1.62, p = .133
Experiment 4: t(13) = -1.37, p = .195

> tvalues <- c(-1.74, -1.62, -1.37)
> nvalues <- c(12, 12, 14)
> meta.ttestBF(t=tvalues, n1=nvalues)
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 4.414733 ±0%

Against denominator:
  Null, d = 0
---
Bayes factor type: BFmetat, JZS

Combined, these three individually unconvincing results give substantial support for the alternative!
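To see that the calculation really is easy, here is a sketch of the same meta-analytic Bayes factor computed by direct numerical integration (meta_bf is a hypothetical helper, not a package function; it uses the two-sided Cauchy prior with the default scale r = 0.707 and assumes a common standardized effect across studies, so it should approximately reproduce the meta.ttestBF() value above).

# Meta-analytic JZS Bayes factor by brute force: average the product of
# non-central t likelihoods over the Cauchy prior on delta, then divide
# by the likelihood of the data when delta = 0.
meta_bf <- function(t, n, r = sqrt(2)/2) {
  df <- n - 1
  num <- integrate(function(delta)
           sapply(delta, function(d)
             prod(dt(t, df = df, ncp = d * sqrt(n))) * dcauchy(d, scale = r)),
           lower = -Inf, upper = Inf)$value
  den <- prod(dt(t, df = df))
  num / den
}

meta_bf(t = c(-1.74, -1.62, -1.37), n = c(12, 12, 14))  # close to 4.41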
Linear regression
The BayesFactor library has several functions for linear regression. Consider the previously discussed Map Search data.

> MSdata <- read.csv(file="MapSearch.csv", header=TRUE, stringsAsFactors=FALSE)
> regular <- lm(formula = RT_ms ~ Color + Proximity + Size + Contrast, data=MSdata)
> summary(regular)

Call:
lm(formula = RT_ms ~ Color + Proximity + Size + Contrast, data = MSdata)

Residuals:
    Min      1Q  Median      3Q     Max
-289.36 -107.29  -20.39   92.34  510.95

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.073e+03  9.592e+01  11.183  < 2e-16 ***
Color       -1.928e-03  5.729e-04  -3.366 0.000994 ***
Proximity    1.974e+00  2.153e-01   9.170    7e-16 ***
Size         3.236e+01  1.359e+01   2.381 0.018684 *
Contrast    -1.450e+02  6.886e+01  -2.105 0.037108 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 157.9 on 135 degrees of freedom
Multiple R-squared: 0.4593, Adjusted R-squared: 0.4433
F-statistic: 28.67 on 4 and 135 DF, p-value: < 2.2e-16

Linear regression
regressionBF() compares all additive models to the intercept-only model. "RT_ms ~ ." means use all other variables in the data set as predictors.

> bf = regressionBF(RT_ms ~ ., data=MSdata)
> summary(bf)
Bayes factor analysis
--------------
[1] Color                                 : 53.34634     ±0%
[2] Proximity                             : 3.164296e+12 ±0.01%
[3] Size                                  : 1.784275     ±0%
[4] Contrast                              : 0.2139982    ±0%
[5] Color + Proximity                     : 2.992316e+13 ±0%
[6] Color + Size                          : 124.498      ±0%
[7] Color + Contrast                      : 22.93048     ±0%
[8] Proximity + Size                      : 1.412119e+13 ±0%
[9] Proximity + Contrast                  : 2.823525e+12 ±0%
[10] Size + Contrast                      : 0.4558166    ±0.01%
[11] Color + Proximity + Size             : 1.697263e+14 ±0%
[12] Color + Proximity + Contrast         : 9.524173e+13 ±0%
[13] Color + Size + Contrast              : 43.70297     ±0.01%
[14] Proximity + Size + Contrast          : 7.195322e+12 ±0%
[15] Color + Proximity + Size + Contrast  : 2.332274e+14 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS

Specific comparisons
Remember that each Bayes factor is a ratio of average likelihoods, so you can easily create other such ratios by division.

> bf[15]/bf[14]
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast : 32.41376 ±0%

Against denominator:
  RT_ms ~ Proximity + Size + Contrast
---
Bayes factor type: BFlinearModel, JZS

Interactions
regressionBF() does not handle interactions; once interactions among p independent variables are allowed, the number of possible models becomes rather unwieldy. With the function lmBF() you can specify particular models, which are compared against the intercept-only model.

> bf2 <- lmBF(RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Color, data=MSdata)
> summary(bf2)
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast + Contrast:Color : 4.599422e+13 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS

> bf3 <- lmBF(RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity, data=MSdata)
> summary(bf3)
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity : 5.495223e+12 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS

Compare models
Again, it is easy to generate new Bayes factors by division.

> bf2/bf3
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast + Contrast:Color : 8.369856 ±0%

Against denominator:
  RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity
---
Bayes factor type: BFlinearModel, JZS
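When many models are in play, it helps to pull all of the Bayes factors into an ordinary data frame or to plot them. A small sketch using the bf object from above (extractBF() and plot() are standard BayesFactor methods; bfs is just an illustrative name):

bfs <- extractBF(bf)          # one row per model, with bf and error columns
bfs[order(-bfs$bf), ][1:5, ]  # the five best additive models
plot(bf)                      # dot chart of all 15 Bayes factors versus the intercept-only model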
Compare models by BF
Generate multiple models and compare their Bayes factors (each relative to the intercept-only model). (bf4 is not shown on the slides; from the output it is evidently the lmBF() model that adds the Contrast:Size interaction.)

> CompareContrastBFs <- c(bf[15], bf2, bf3, bf4, bf[14])
> head(CompareContrastBFs)
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast                                                       : 2.332274e+14 ±0%
[2] Color + Proximity + Size + Contrast + Contrast:Size                                       : 1.001063e+14 ±0%
[3] Color + Proximity + Size + Contrast + Contrast:Color                                      : 4.599422e+13 ±0%
[4] Proximity + Size + Contrast                                                               : 7.195322e+12 ±0%
[5] Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity : 5.495223e+12 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS

Note that head() lists the models sorted by Bayes factor, while indexing follows the order used in c(), so CompareContrastBFs[4] below is bf4, the Contrast:Size model (2.332274e+14 / 1.001063e+14 ≈ 2.33).

> CompareContrastBFs[1]/CompareContrastBFs[4]
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast : 2.329798 ±0%

Against denominator:
  RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Size
---
Bayes factor type: BFlinearModel, JZS

Compare models with WAIC
We had previously done the same kind of thing in Stan (similar results, but not exactly the same):

> compare(MSLinearModel, MSLinearNoColorModel, MSColorContrastInteractionModel, MSSizeContrastInteractionModel, MSAllContrastInteractionModel)
                                   WAIC pWAIC dWAIC weight    SE  dSE
MSSizeContrastInteractionModel   1820.6   5.7   0.0   0.45 17.61   NA
MSLinearModel                    1821.8   5.6   1.2   0.24 18.17 1.87
MSColorContrastInteractionModel  1822.4   5.9   1.9   0.18 18.14 1.83
MSAllContrastInteractionModel    1823.1   7.0   2.6   0.13 17.45 1.35
MSLinearNoColorModel             1830.3   4.4   9.7   0.00 18.16 5.73

Posteriors
Models can generate posterior distributions. Consider the model that just uses Color as an independent variable (sig2 is the error variance).

> chainsColor <- posterior(bf[1], iterations=10000)
> plot(chainsColor)

[Figure: trace and density plots for Color, sig2, and g from plot(chainsColor).]

> summary(chainsColor)

Iterations = 1:10000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

            Mean        SD  Naive SE Time-series SE
Color -2.414e-03 7.154e-04 7.154e-06      7.498e-06
sig2   4.182e+04 5.095e+03 5.095e+01      5.195e+01
g      1.017e+00 1.166e+01 1.166e-01      1.166e-01

2. Quantiles for each variable:

            2.5%        25%        50%        75%      97.5%
Color -3.847e-03 -2.892e-03 -2.412e-03 -1.929e-03 -1.021e-03
sig2   3.290e+04  3.819e+04  4.152e+04  4.502e+04  5.266e+04
g      2.625e-02  7.437e-02  1.540e-01  3.826e-01  4.558e+00
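Because posterior() returns the actual draws (a coda mcmc object), you can summarize them however you like. A minimal sketch using the chainsColor draws from above (colorDraws is just an illustrative name):

colorDraws <- as.numeric(chainsColor[, "Color"])
mean(colorDraws)                        # posterior mean of the Color slope
quantile(colorDraws, c(0.025, 0.975))   # 95% credible interval
mean(colorDraws < 0)                    # posterior probability that the slope is negative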
Posteriors: full linear model

> chainsFullLinear <- posterior(bf[15], iterations=10000)
> summary(chainsFullLinear)

Iterations = 1:10000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                Mean        SD  Naive SE Time-series SE
Color     -1.850e-03 5.668e-04 5.668e-06      5.668e-06
Proximity  1.900e+00 2.186e-01 2.186e-03      2.324e-03
Size       3.093e+01 1.359e+01 1.359e-01      1.346e-01
Contrast  -1.412e+02 6.745e+01 6.745e-01      6.967e-01
sig2       2.521e+04 3.098e+03 3.098e+01      3.262e+01
g          3.074e-01 4.189e-01 4.189e-03      4.189e-03

2. Quantiles for each variable:

                2.5%        25%        50%        75%      97.5%
Color     -2.974e-03 -2.229e-03 -1.850e-03 -1.472e-03 -7.478e-04
Proximity  1.475e+00  1.751e+00  1.898e+00  2.050e+00  2.324e+00
Size       4.577e+00  2.168e+01  3.090e+01  4.002e+01  5.775e+01
Contrast  -2.749e+02 -1.854e+02 -1.408e+02 -9.664e+01 -8.085e+00
sig2       1.988e+04  2.304e+04  2.495e+04  2.709e+04  3.184e+04
g          6.425e-02  1.329e-01  2.072e-01  3.414e-01  1.158e+00

The slide shows the summary(regular) output from lm() (reproduced earlier) alongside for comparison: the posterior means are close to, but slightly shrunk relative to, the least-squares estimates (e.g., Color -1.850e-03 vs. -1.928e-03, Size 30.93 vs. 32.36, Contrast -141.2 vs. -145.0).

> plot(chainsFullLinear)

[Figure: trace and density plots for Color, Proximity, Size, Contrast, sig2, and g.]
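That comparison can also be made programmatically. A quick sketch, assuming the chainsFullLinear draws and the lm() fit regular from earlier (draws is just an illustrative name):

draws <- as.matrix(chainsFullLinear)   # 10000 x (number of parameters) matrix of posterior draws
cbind(posteriorMean = colMeans(draws[, c("Color", "Proximity", "Size", "Contrast")]),
      lmEstimate    = coef(regular)[c("Color", "Proximity", "Size", "Contrast")])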
Not happy with your result?
Suppose you get BF = 2.9, but you want BF > 3: gather more data! There is no problem with gathering more data, because you are comparing two models, not deciding whether one model should be rejected. Gathering more data changes the average likelihood of each model, so adding data simply gives you more evidence about the relative fit of the models to the data you have observed. Note that if you make decisions based on BF > 3 (or whatever cutoff), you might increase the Type I error rate, but that is not what a Bayesian analysis is about. Over the long run, the BF will get very large in favor of the true model (if it is one of the models being compared).

One-tailed tests
Consider the facial feedback data (dependent data), just to demonstrate things: compute the average rating for each participant in each condition, and drop the NoPen condition.

> FFdata <- read.csv(file="FacialFeedbackAvg.csv", header=TRUE, stringsAsFactors=FALSE)

A regular dependent t-test is a one-sample test on the difference scores:

> diffScores <- FFdata$PenInTeeth - FFdata$PenInLips
> t.test(diffScores, alternative="greater")

        One Sample t-test

data:  diffScores
t = 0.56122, df = 20, p-value = 0.2904
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 -3.266755       Inf
sample estimates:
mean of x
 1.575758

Bayesian t-test
> library(BayesFactor)
> bf <- ttestBF(x=diffScores)
> bf
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 0.2621909 ±0.03%

Against denominator:
  Null, mu = 0
---
Bayes factor type: BFoneSample, JZS

One-tailed Bayesian t-test
Specify an interval (range) of effect sizes:

> bfInterval <- ttestBF(x=diffScores, nullInterval=c(0, Inf))
> bfInterval
Bayes factor analysis
--------------
[1] Alt., r=0.707 0<d<Inf    : 0.3672615 ±0%
[2] Alt., r=0.707 !(0<d<Inf) : 0.1571203 ±0%

Against denominator:
  Null, mu = 0
---
Bayes factor type: BFoneSample, JZS

We get two tests! You just tested Model 1 (M1): delta > 0 against delta = 0, and Model 2 (M2): delta < 0 against delta = 0.

Directional test
Your null does not have to be a point. You can compare M1 and M2 by dividing their Bayes factors:

> bfInterval[1]/bfInterval[2]
Bayes factor analysis
--------------
[1] Alt., r=0.707 0<d<Inf : 2.337454 ±0%

Against denominator:
  Alternative, r = 0.707106781186548, mu =/= 0 !(0<d<Inf)
---
Bayes factor type: BFoneSample, JZS
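The complementary directional model can also be requested directly, which gives another route to the same comparison. A small sketch, assuming diffScores and bfInterval from above (bfNeg is just an illustrative name):

bfNeg <- ttestBF(x = diffScores, nullInterval = c(-Inf, 0))  # delta < 0 vs. the point null, plus its complement
bfInterval[1] / bfNeg[1]   # delta > 0 vs. delta < 0; numerically the same comparison as bfInterval[1]/bfInterval[2]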
Careful!
You still have to look at your data. One subject seems to have "given up" on the task, and removing this subject produces a rather different result.

> FFdata <- read.csv(file="FacialFeedbackAvg.csv", header=TRUE, stringsAsFactors=FALSE)
> FFdata <- FFdata[-c(18),]  # Removes row of non-responsive subject

Recomputing diffScores and bfInterval as before gives:

> bfInterval
Bayes factor analysis
--------------
[1] Alt., r=0.707 0<d<Inf    : 0.2114579 ±0.01%
[2] Alt., r=0.707 !(0<d<Inf) : 0.2567416 ±0.01%

Against denominator:
  Null, mu = 0
---
Bayes factor type: BFoneSample, JZS

Careful!
> chains <- posterior(bfInterval[1], iterations=1000)
> plot(chains)

[Figure: trace and density plots for mu, sig2, delta, and g from plot(chains).]

> summary(chains)

Iterations = 1:1000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean      SD Naive SE Time-series SE
mu      1.5257  1.2338 0.039016       0.041415
sig2  103.3307 35.2416 1.114436       1.014385
delta   0.1517  0.1193 0.003773       0.003773
g       1.5704  6.7063 0.212073       0.231978

2. Quantiles for each variable:

           2.5%      25%     50%      75%    97.5%
mu     0.072775  0.55931  1.2281   2.1894   4.4375
sig2  55.467055 79.26845 95.9279 121.5487 191.5553
delta  0.006707  0.05882  0.1254   0.2223   0.4489
g      0.072677  0.17978  0.3580   0.8343   9.3308

If you insist that the average difference score must be positive, the model does the best it can. That might not be very good!

Conclusions
JZS Bayes factors are easy to calculate, and the results are pretty easy to understand. You can do a lot with them:
Evidence for the null
Adding data
Posterior distributions
Different kinds of tests