Fun with Bayes Factors
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016
Purdue University
Bayes Factor

The ratio of the likelihood of the data under the null hypothesis to the likelihood under the alternative

 Nothing special about the null: it compares any two models

Likelihoods are averaged across the possible parameter values specified by each model's prior distribution
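
In symbols, with null model M0, alternative M1, parameters theta, and priors pi0 and pi1 (standard notation, not taken from the slides):

\[
\mathrm{BF}_{01}
= \frac{p(D \mid \mathcal{M}_0)}{p(D \mid \mathcal{M}_1)}
= \frac{\int p(D \mid \theta_0, \mathcal{M}_0)\,\pi_0(\theta_0)\,d\theta_0}
       {\int p(D \mid \theta_1, \mathcal{M}_1)\,\pi_1(\theta_1)\,d\theta_1}
\]

Each marginal likelihood averages the likelihood over that model's prior.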
What does it mean?

Guidelines
    BF        Evidence
    1 – 3     Anecdotal
    3 – 10    Substantial
    10 – 30   Strong
    30 – 100  Very strong
    >100      Decisive
Evidence for the null

BF01 > 1 implies (some) support for the null hypothesis

Evidence for “invariances”

This is more or less impossible for NHST

It is a useful measure

Consider a recent study in Psychological Science

Liu, Wang, Wang & Jiang (2016). Conscious Access to Suppressed Threatening Information Is Modulated by Working Memory
Working memory face emotion


Explored whether keeping a face in working memory influenced its visibility under continuous flash suppression

To ensure subjects kept the face in memory, they were tested for its identity
Working memory face emotion


Different types of face emotions: fearful face, neutral face

No significant differences in correct responses (same/different) across emotions:
 Experiment 1: t(11) = -1.74, p = 0.110

If we compute the JZS Bayes Factor we get

> ttest.tstat(t=-1.74, n1=12, simple=TRUE)
      B10
0.9240776

which is anecdotal support for the null hypothesis

You would want B10 < 1/3 for substantial support for the null
Replications

Experiment 3
 t(11) = -1.62, p = .133

Experiment 4
 t(13) = -1.37, p = .195

Converting to JZS Bayes Factors suggests these are modest support for the null

Experiment 3
> ttest.tstat(t=-1.62, n1=12, simple=TRUE)
      B10
0.8033315

Experiment 4
> ttest.tstat(t=-1.37, n1=14, simple=TRUE)
      B10
0.5857839
The null result matters

The authors wanted to demonstrate that faces with different emotions were equivalently represented in working memory

But differently affected visibility during the flash suppression part of a trial

Experiment 1:

Reaction times for seeing a face during continuous flash suppression were shorter for fearful faces than for neutral faces
 Main effect of emotion: F(1, 11) = 5.06, p = 0.046

Reaction times were shorter when the emotion of the face during continuous flash suppression matched the emotion of the face in working memory
 Main effect of congruency: F(1, 11) = 11.86, p = 0.005
Main effects

We will talk about a Bayesian ANOVA later, but we can consider the t-test equivalent of these tests (for an effect with one numerator degree of freedom, t = sqrt(F)):

Effect of emotion

> ttest.tstat(t=sqrt(5.06), n1=12, simple=TRUE)
      B10
1.769459

Suggests anecdotal support for the alternative hypothesis

Effect of congruency

> ttest.tstat(t=sqrt(11.86), n1=12, simple=TRUE)
      B10
9.664241

Suggests substantial support for the alternative hypothesis
Evidence

It is generally harder to get convincing evidence (BF > 3 or BF > 10) than to get p < .05

Interaction: F(1, 11) = 4.36, p = .061

Contrasts:
 RT for fearful faces shorter if congruent with working memory: t(11) = -3.59, p = .004
 RT for neutral faces unaffected by congruency: t(11) = -0.45

Bayesian interpretations of the t-tests:
> ttest.tstat(t=-3.59, n1=12, simple=TRUE)
      B10
11.94693
> ttest.tstat(t=-0.45, n1=12, simple=TRUE)
      B10
0.3136903
Substantial Evidence

For a two-sample t-test (n1=n2=10), a BF>3 corresponds to p<0.022

For a two-sample t-test (n1=n2=100), a BF>3 corresponds to p<0.012

For a two-sample t-test (n1=n2=1000), a BF>3 corresponds to p<0.004
Strong Evidence

For a two-sample t-test (n1=n2=10), a BF>10 corresponds to p<0.004

For a two-sample t-test (n1=n2=100), a BF>10 corresponds to p<0.003

For a two-sample t-test (n1=n2=1000), a BF>10 corresponds to p<0.001

Of course, if you change your prior you change these values
 (but not much)

Setting the scale parameter r=sqrt(2) (ultra wide) gives

For a two-sample t-test (n1=n2=10), a BF>10 corresponds to p<0.005

For a two-sample t-test (n1=n2=100), a BF>10 corresponds to p<0.0017

For a two-sample t-test (n1=n2=1000), a BF>10 corresponds to p<0.00054
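
These correspondences can be checked numerically: search for the t value at which the JZS BF crosses the criterion, then convert that t to a two-tailed p value. A sketch (bf.crossing is a hypothetical helper name; assumes BayesFactor is installed and the default medium prior):

library(BayesFactor)
# Find the two-sample t where BF10 reaches `criterion`, then report the
# two-tailed p value at that t
bf.crossing <- function(n, criterion=3) {
  f <- function(t) ttest.tstat(t=t, n1=n, n2=n, simple=TRUE) - criterion
  tcrit <- uniroot(f, interval=c(0.1, 10))$root
  2 * pt(tcrit, df=2*n - 2, lower.tail=FALSE)
}
bf.crossing(10)                # should land near the p < 0.022 quoted above
bf.crossing(10, criterion=10)  # near p < 0.004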
Bayesian meta-analysis

Rouder & Morey (2011) identified how to combine replication studies to produce a JZS Bayes Factor that accumulates the information across experiments

The formula for a one-sample, one-tailed t-test BF10 is given below

f( ) is the Cauchy (or half-Cauchy) prior distribution on effect size

g( ) is the noncentral t distribution

It looks complicated, but it is easy enough to calculate
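
Written out (a reconstruction following Rouder & Morey, 2011, using the f( ) and g( ) defined above):

\[
\mathrm{BF}_{10} =
\frac{\displaystyle \int_0^\infty \Big[ \prod_{i=1}^{M} g\!\left(t_i ;\, \nu_i,\ \delta \sqrt{n_i}\right) \Big]\, f(\delta ;\, r)\, d\delta}
     {\displaystyle \prod_{i=1}^{M} g\!\left(t_i ;\, \nu_i,\ 0\right)}
\]

where M is the number of studies, t_i and nu_i = n_i − 1 are each study's t value and degrees of freedom, g(t; nu, lambda) is the noncentral t density with noncentrality lambda, and f(delta; r) is the half-Cauchy prior on effect size delta with scale r.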
Bayesian meta-analysis

Consider the null results on face emotion and memorability

Experiment 1
 t(11) = -1.74, p = 0.110

Experiment 3
 t(11) = -1.62, p = .133

Experiment 4
 t(13) = -1.37, p = .195

Combined, these "null" results give substantial support for the alternative!

> tvalues <- c(-1.74, -1.62, -1.37)
> nvalues <- c(12, 12, 14)
> meta.ttestBF(t=tvalues, n1=nvalues)
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 4.414733 ±0%

Against denominator:
  Null, d = 0
---
Bayes factor type: BFmetat, JZS
Linear regression

The BayesFactor library has several functions for linear regression

Consider the previously discussed Map Search data

> MSdata <- read.csv(file="MapSearch.csv", header=TRUE, stringsAsFactors=FALSE)
> regular <- lm(formula = RT_ms ~ Color + Proximity + Size + Contrast, data=MSdata)
> summary(regular)

Call:
lm(formula = RT_ms ~ Color + Proximity + Size + Contrast, data = MSdata)

Residuals:
    Min      1Q  Median      3Q     Max
-289.36 -107.29  -20.39   92.34  510.95

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.073e+03  9.592e+01  11.183  < 2e-16 ***
Color       -1.928e-03  5.729e-04  -3.366 0.000994 ***
Proximity    1.974e+00  2.153e-01   9.170    7e-16 ***
Size         3.236e+01  1.359e+01   2.381 0.018684 *
Contrast    -1.450e+02  6.886e+01  -2.105 0.037108 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 157.9 on 135 degrees of freedom
Multiple R-squared: 0.4593, Adjusted R-squared: 0.4433
F-statistic: 28.67 on 4 and 135 DF, p-value: < 2.2e-16
Linear regression

regressionBF( ) compares all additive models to the intercept-only model

"RT_ms ~ ." means use all the other variables in the data set

> bf = regressionBF(RT_ms ~ ., data=MSdata)
> summary(bf)
Bayes factor analysis
--------------
[1] Color                                  : 53.34634     ±0%
[2] Proximity                              : 3.164296e+12 ±0.01%
[3] Size                                   : 1.784275     ±0%
[4] Contrast                               : 0.2139982    ±0%
[5] Color + Proximity                      : 2.992316e+13 ±0%
[6] Color + Size                           : 124.498      ±0%
[7] Color + Contrast                       : 22.93048     ±0%
[8] Proximity + Size                       : 1.412119e+13 ±0%
[9] Proximity + Contrast                   : 2.823525e+12 ±0%
[10] Size + Contrast                       : 0.4558166    ±0.01%
[11] Color + Proximity + Size              : 1.697263e+14 ±0%
[12] Color + Proximity + Contrast          : 9.524173e+13 ±0%
[13] Color + Size + Contrast               : 43.70297     ±0.01%
[14] Proximity + Size + Contrast           : 7.195322e+12 ±0%
[15] Color + Proximity + Size + Contrast   : 2.332274e+14 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS
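
The BF object can also be manipulated directly; BayesFactor objects support head( ) and division. A sketch (continuing with bf from above):

head(bf, n=3)   # the three models with the largest BFs against the intercept
bf / max(bf)    # re-express every model relative to the best one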
Specific comparisons

Remember that each Bayes factor is a ratio of average likelihoods

You can easily create other such ratios

> bf[15]/bf[14]
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast : 32.41376 ±0%

Against denominator:
  RT_ms ~ Proximity + Size + Contrast
---
Bayes factor type: BFlinearModel, JZS
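
The same number falls out of the BFs reported against the intercept-only model; extractBF( ) pulls the numeric values. A sketch:

# Ratio of the two intercept-relative BFs reproduces bf[15]/bf[14]
extractBF(bf[15])$bf / extractBF(bf[14])$bf   # about 32.4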
Interactions

regressionBF( ) does not handle interactions: for p independent variables, you would have a very large number of different models, which is rather unwieldy

With the function lmBF( ) you can specify particular models, which are compared against the intercept-only model

> bf2 <- lmBF(RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Color, data=MSdata)
> summary(bf2)
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast + Contrast:Color : 4.599422e+13 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS

> bf3 <- lmBF(RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity, data=MSdata)
> summary(bf3)
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity : 5.495223e+12 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS
Compare models

Again, it is easy to generate new Bayes Factors by division

> bf2/bf3
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast + Contrast:Color : 8.369856 ±0%

Against denominator:
  RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity
---
Bayes factor type: BFlinearModel, JZS
Compare models by BF

Generate multiple models and compare BFs (each shown relative to the intercept-only model)

(bf4 is not created on the earlier slides; from the output it is evidently the lmBF( ) model that adds the Contrast:Size interaction.)

> CompareContrastBFs <- c(bf[15], bf2, bf3, bf4, bf[14])
> head(CompareContrastBFs)
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast                                                       : 2.332274e+14 ±0%
[2] Color + Proximity + Size + Contrast + Contrast:Size                                       : 1.001063e+14 ±0%
[3] Color + Proximity + Size + Contrast + Contrast:Color                                      : 4.599422e+13 ±0%
[4] Proximity + Size + Contrast                                                               : 7.195322e+12 ±0%
[5] Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity : 5.495223e+12 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS

Note that head( ) sorts the display by BF, but indexing follows the order of the c( ) call, so CompareContrastBFs[4] is the Contrast:Size model:

> CompareContrastBFs[1]/CompareContrastBFs[4]
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast : 2.329798 ±0%

Against denominator:
  RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Size
---
Bayes factor type: BFlinearModel, JZS
Compare models with WAIC

We had previously done the same kind of thing in Stan (similar results, but not exactly the same)

> compare(MSLinearModel, MSLinearNoColorModel, MSColorContrastInteractionModel, MSSizeContrastInteractionModel, MSAllContrastInteractionModel)
                                  WAIC pWAIC dWAIC weight    SE  dSE
MSSizeContrastInteractionModel  1820.6   5.7   0.0   0.45 17.61   NA
MSLinearModel                   1821.8   5.6   1.2   0.24 18.17 1.87
MSColorContrastInteractionModel 1822.4   5.9   1.9   0.18 18.14 1.83
MSAllContrastInteractionModel   1823.1   7.0   2.6   0.13 17.45 1.35
MSLinearNoColorModel            1830.3   4.4   9.7   0.00 18.16 5.73
Posteriors

Models can generate posterior distributions

Consider the model that just uses Color as an independent variable

> chainsColor <- posterior(bf[1], iterations=10000)
> plot(chainsColor)

[Figure: trace and density plots for Color, sig2, and g; sig2 is the error variance]
Posteriors

> summary(chainsColor)

Iterations = 1:10000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

            Mean        SD  Naive SE Time-series SE
Color -2.414e-03 7.154e-04 7.154e-06      7.498e-06
sig2   4.182e+04 5.095e+03 5.095e+01      5.195e+01
g      1.017e+00 1.166e+01 1.166e-01      1.166e-01

2. Quantiles for each variable:

            2.5%        25%        50%        75%      97.5%
Color -3.847e-03 -2.892e-03 -2.412e-03 -1.929e-03 -1.021e-03
sig2   3.290e+04  3.819e+04  4.152e+04  4.502e+04  5.266e+04
g      2.625e-02  7.437e-02  1.540e-01  3.826e-01  4.558e+00
Posteriors

Full linear model:

> chainsFullLinear <- posterior(bf[15], iterations=10000)
> summary(chainsFullLinear)

Iterations = 1:10000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                Mean        SD  Naive SE Time-series SE
Color     -1.850e-03 5.668e-04 5.668e-06      5.668e-06
Proximity  1.900e+00 2.186e-01 2.186e-03      2.324e-03
Size       3.093e+01 1.359e+01 1.359e-01      1.346e-01
Contrast  -1.412e+02 6.745e+01 6.745e-01      6.967e-01
sig2       2.521e+04 3.098e+03 3.098e+01      3.262e+01
g          3.074e-01 4.189e-01 4.189e-03      4.189e-03

2. Quantiles for each variable:

                2.5%        25%        50%        75%      97.5%
Color     -2.974e-03 -2.229e-03 -1.850e-03 -1.472e-03 -7.478e-04
Proximity  1.475e+00  1.751e+00  1.898e+00  2.050e+00  2.324e+00
Size       4.577e+00  2.168e+01  3.090e+01  4.002e+01  5.775e+01
Contrast  -2.749e+02 -1.854e+02 -1.408e+02 -9.664e+01 -8.085e+00
sig2       1.988e+04  2.304e+04  2.495e+04  2.709e+04  3.184e+04
g          6.425e-02  1.329e-01  2.072e-01  3.414e-01  1.158e+00

Compare with the lm( ) summary shown earlier: the posterior means are slightly shrunk toward zero relative to the least-squares estimates
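
Credible intervals can also be read straight off the chains. A sketch (column name as in the summary above):

# 95% credible interval for the Contrast slope; matches the 2.5% and 97.5%
# quantiles in the summary
quantile(chainsFullLinear[, "Contrast"], probs=c(0.025, 0.975))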
Posteriors

> plot(chainsFullLinear)

[Figure: trace and density plots for Color, Proximity, Size, Contrast, sig2, and g]
Not happy with your result?

Suppose you get BF = 2.9, but you want BF > 3

Gather more data!

There's no problem with gathering more data, because you are comparing two models, not deciding whether one model should be rejected

Gathering more data changes the average likelihood of each model; adding data gives you more evidence about the relative fit of the models to the data you have observed

Note: if you make a decision based on BF > 3 (or whatever), then you might increase the Type I error rate
 That is not what a Bayesian analysis is about

However, over the long run, the BF will get very large in favor of the true model (if it is one of the models)
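
A toy simulation illustrates that long-run claim; with hypothetical data generated under a true effect, BF10 tends to grow as observations accumulate (a sketch, assuming BayesFactor is installed):

library(BayesFactor)
set.seed(1)
for (n in c(20, 80, 320)) {
  x <- rnorm(n, mean=0.5)   # hypothetical data; true effect d = 0.5
  print(ttestBF(x=x))       # BF10 against the point null grows with n
}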
One-tailed tests

Consider the facial feedback data

Dependent data

Just to demonstrate things:
 Compute the average rating for each participant, for each condition
 Drop the NoPen condition

> FFdata <- read.csv(file="FacialFeedbackAvg.csv", header=TRUE, stringsAsFactors=FALSE)
One-tailed tests

Regular dependent t-test

One-sample test on differences

> diffScores <- FFdata$PenInTeeth - FFdata$PenInLips
> t.test(diffScores, alternative="greater")

	One Sample t-test

data:  diffScores
t = 0.56122, df = 20, p-value = 0.2904
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 -3.266755       Inf
sample estimates:
mean of x
 1.575758
Bayesian t-test

> library(BayesFactor)
> bf <- ttestBF(x=diffScores)
> bf
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 0.2621909 ±0.03%

Against denominator:
  Null, mu = 0
---
Bayes factor type: BFoneSample, JZS
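
Taking the reciprocal reads the same result as support for the null. A sketch:

# B01 = 1/B10; about 3.8, i.e., substantial support for the null by the
# guidelines given earlier
1 / extractBF(bf)$bf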
One-tailed Bayesian t-test

Specify a range for the null hypothesis

> bfInterval <- ttestBF(x=diffScores, nullInterval=c(0, Inf))
> bfInterval
Bayes factor analysis
--------------
[1] Alt., r=0.707 0<d<Inf    : 0.3672615 ±0%
[2] Alt., r=0.707 !(0<d<Inf) : 0.1571203 ±0%

Against denominator:
  Null, mu = 0
---
Bayes factor type: BFoneSample, JZS

We get 2 tests!
Directional test

Your null does not have to be a point

You just tested Model 1 (M1): delta > 0 against delta = 0

You just tested Model 2 (M2): delta < 0 against delta = 0

You can compare M1 and M2 by dividing the BFs

> bfInterval[1]/bfInterval[2]
Bayes factor analysis
--------------
[1] Alt., r=0.707 0<d<Inf : 2.337454 ±0%

Against denominator:
  Alternative, r = 0.707106781186548, mu =/= 0 !(0<d<Inf)
---
Bayes factor type: BFoneSample, JZS
Careful!

You still have to look at your data

One subject seems to have “given up” on the task

Removing this subject produces a rather different result
Careful!

> FFdata <- read.csv(file="FacialFeedbackAvg.csv", header=TRUE, stringsAsFactors=FALSE)
> FFdata <- FFdata[-c(18),]   # Removes row of non-responsive subject
> bfInterval
Bayes factor analysis
--------------
[1] Alt., r=0.707 0<d<Inf    : 0.2114579 ±0.01%
[2] Alt., r=0.707 !(0<d<Inf) : 0.2567416 ±0.01%

Against denominator:
  Null, mu = 0
---
Bayes factor type: BFoneSample, JZS
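
The slide does not show the intermediate steps; presumably the difference scores and the interval Bayes factor were recomputed after dropping the row, along these lines:

diffScores <- FFdata$PenInTeeth - FFdata$PenInLips
bfInterval <- ttestBF(x=diffScores, nullInterval=c(0, Inf))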
Careful!

> chains <- posterior(bfInterval[1], iterations=1000)
> plot(chains)

[Figure: trace and density plots for mu, sig2, delta, and g]
Careful!

If you insist that the average difference scores must be positive, the model does the best it can

That might not be very good!

> summary(chains)

Iterations = 1:1000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean      SD Naive SE Time-series SE
mu      1.5257  1.2338 0.039016       0.041415
sig2  103.3307 35.2416 1.114436       1.014385
delta   0.1517  0.1193 0.003773       0.003773
g       1.5704  6.7063 0.212073       0.231978

2. Quantiles for each variable:

           2.5%      25%     50%      75%    97.5%
mu     0.072775  0.55931  1.2281   2.1894   4.4375
sig2  55.467055 79.26845 95.9279 121.5487 191.5553
delta  0.006707  0.05882  0.1254   0.2223   0.4489
g      0.072677  0.17978  0.3580   0.8343   9.3308
Conclusions

JZS Bayes Factors

Easy to calculate

Pretty easy to understand results

You can do a lot with them

Evidence for the null

Add data

Posterior distributions

Different kinds of tests