Fun with Bayes Factors
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016
Purdue University
Bayes Factor

The ratio of the likelihood of the data under the null hypothesis to the likelihood under the alternative

 Nothing special about the null: it compares any two models

Likelihoods are averaged across the possible parameter values specified by each model's prior distribution
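
In symbols, with null model M0, alternative M1, parameters theta, and priors pi0 and pi1 (standard notation, not taken from the slides):

\[
\mathrm{BF}_{01}
= \frac{p(D \mid \mathcal{M}_0)}{p(D \mid \mathcal{M}_1)}
= \frac{\int p(D \mid \theta_0, \mathcal{M}_0)\,\pi_0(\theta_0)\,d\theta_0}
       {\int p(D \mid \theta_1, \mathcal{M}_1)\,\pi_1(\theta_1)\,d\theta_1}
\]

Each marginal likelihood averages the likelihood over that model's prior.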
What does it mean?

Guidelines
    BF        Evidence
    1 – 3     Anecdotal
    3 – 10    Substantial
    10 – 30   Strong
    30 – 100  Very strong
    >100      Decisive
Evidence for the null

BF01 > 1 implies (some) support for the null hypothesis

Evidence for “invariances”

This is more or less impossible for NHST

It is a useful measure

Consider a recent study in Psychological Science

Liu, Wang, Wang & Jiang (2016). Conscious Access to Suppressed Threatening Information Is Modulated by Working Memory
Working memory face emotion


Explored whether keeping a face in working memory influenced its visibility under continuous flash suppression

To ensure subjects kept the face in memory, they were tested for its identity
Working memory face emotion


Different types of face emotions: fearful face, neutral face

No significant differences in correct responses (same/different) across emotions:
 Experiment 1: t(11) = -1.74, p = 0.110

If we compute the JZS Bayes Factor we get

> ttest.tstat(t=-1.74, n1=12, simple=TRUE)
      B10
0.9240776

which is anecdotal support for the null hypothesis

You would want B10 < 1/3 for substantial support for the null
Replications

Experiment 3
 t(11) = -1.62, p = .133

Experiment 4
 t(13) = -1.37, p = .195

Converting to JZS Bayes Factors suggests these are modest support for the null

Experiment 3
> ttest.tstat(t=-1.62, n1=12, simple=TRUE)
      B10
0.8033315

Experiment 4
> ttest.tstat(t=-1.37, n1=14, simple=TRUE)
      B10
0.5857839
The null result matters

The authors wanted to demonstrate that faces with different emotions were equivalently represented in working memory

But differently affected visibility during the flash suppression part of a trial

Experiment 1:

Reaction times for seeing a face during continuous flash suppression were shorter for fearful faces than for neutral faces
 Main effect of emotion: F(1, 11) = 5.06, p = 0.046

Reaction times were shorter when the emotion of the face during continuous flash suppression matched the emotion of the face in working memory
 Main effect of congruency: F(1, 11) = 11.86, p = 0.005
Main effects

We will talk about a Bayesian ANOVA later, but we can consider the t-test equivalent of these tests (for an effect with one numerator degree of freedom, t = sqrt(F)):

Effect of emotion

> ttest.tstat(t=sqrt(5.06), n1=12, simple=TRUE)
      B10
1.769459

Suggests anecdotal support for the alternative hypothesis

Effect of congruency

> ttest.tstat(t=sqrt(11.86), n1=12, simple=TRUE)
      B10
9.664241

Suggests substantial support for the alternative hypothesis
Evidence

It is generally harder to get convincing evidence (BF > 3 or BF > 10) than to get p < .05

Interaction: F(1, 11) = 4.36, p = .061

Contrasts:
 RT for fearful faces shorter if congruent with working memory: t(11) = -3.59, p = .004
 RT for neutral faces unaffected by congruency: t(11) = -0.45

Bayesian interpretations of the t-tests:
> ttest.tstat(t=-3.59, n1=12, simple=TRUE)
      B10
11.94693
> ttest.tstat(t=-0.45, n1=12, simple=TRUE)
      B10
0.3136903
Substantial Evidence

For a two-sample t-test (n1=n2=10), a BF>3 corresponds to p<0.022

For a two-sample t-test (n1=n2=100), a BF>3 corresponds to p<0.012

For a two-sample t-test (n1=n2=1000), a BF>3 corresponds to p<0.004
Strong Evidence

For a two-sample t-test (n1=n2=10), a BF>10 corresponds to p<0.004

For a two-sample t-test (n1=n2=100), a BF>10 corresponds to p<0.003

For a two-sample t-test (n1=n2=1000), a BF>10 corresponds to p<0.001

Of course, if you change your prior you change these values
 (but not much)

Setting the scale parameter r=sqrt(2) (ultra wide) gives

For a two-sample t-test (n1=n2=10), a BF>10 corresponds to p<0.005

For a two-sample t-test (n1=n2=100), a BF>10 corresponds to p<0.0017

For a two-sample t-test (n1=n2=1000), a BF>10 corresponds to p<0.00054
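
These correspondences can be checked numerically: search for the t value at which the JZS BF crosses the criterion, then convert that t to a two-tailed p value. A sketch (bf.crossing is a hypothetical helper name; assumes BayesFactor is installed and the default medium prior):

library(BayesFactor)
# Find the two-sample t where BF10 reaches `criterion`, then report the
# two-tailed p value at that t
bf.crossing <- function(n, criterion=3) {
  f <- function(t) ttest.tstat(t=t, n1=n, n2=n, simple=TRUE) - criterion
  tcrit <- uniroot(f, interval=c(0.1, 10))$root
  2 * pt(tcrit, df=2*n - 2, lower.tail=FALSE)
}
bf.crossing(10)                # should land near the p < 0.022 quoted above
bf.crossing(10, criterion=10)  # near p < 0.004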
Bayesian meta-analysis

Rouder & Morey (2011) identified how to combine replication studies to produce a JZS Bayes Factor that accumulates the information across experiments

The formula for a one-sample, one-tailed t-test BF10 is given below

f( ) is the Cauchy (or half-Cauchy) prior distribution on effect size

g( ) is the noncentral t distribution

It looks complicated, but it is easy enough to calculate
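
Written out (a reconstruction following Rouder & Morey, 2011, using the f( ) and g( ) defined above):

\[
\mathrm{BF}_{10} =
\frac{\displaystyle \int_0^\infty \Big[ \prod_{i=1}^{M} g\!\left(t_i ;\, \nu_i,\ \delta \sqrt{n_i}\right) \Big]\, f(\delta ;\, r)\, d\delta}
     {\displaystyle \prod_{i=1}^{M} g\!\left(t_i ;\, \nu_i,\ 0\right)}
\]

where M is the number of studies, t_i and nu_i = n_i − 1 are each study's t value and degrees of freedom, g(t; nu, lambda) is the noncentral t density with noncentrality lambda, and f(delta; r) is the half-Cauchy prior on effect size delta with scale r.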
Bayesian meta-analysis

Consider the null results on face emotion and memorability

Experiment 1
 t(11) = -1.74, p = 0.110

Experiment 3
 t(11) = -1.62, p = .133

Experiment 4
 t(13) = -1.37, p = .195

Combined, these "null" results give substantial support for the alternative!

> tvalues <- c(-1.74, -1.62, -1.37)
> nvalues <- c(12, 12, 14)
> meta.ttestBF(t=tvalues, n1=nvalues)
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 4.414733 ±0%

Against denominator:
  Null, d = 0
---
Bayes factor type: BFmetat, JZS
Linear regression

The BayesFactor library has several functions for linear regression

Consider the previously discussed Map Search data

> MSdata <- read.csv(file="MapSearch.csv", header=TRUE, stringsAsFactors=FALSE)
> regular <- lm(formula = RT_ms ~ Color + Proximity + Size + Contrast, data=MSdata)
> summary(regular)

Call:
lm(formula = RT_ms ~ Color + Proximity + Size + Contrast, data = MSdata)

Residuals:
    Min      1Q  Median      3Q     Max
-289.36 -107.29  -20.39   92.34  510.95

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.073e+03  9.592e+01  11.183  < 2e-16 ***
Color       -1.928e-03  5.729e-04  -3.366 0.000994 ***
Proximity    1.974e+00  2.153e-01   9.170    7e-16 ***
Size         3.236e+01  1.359e+01   2.381 0.018684 *
Contrast    -1.450e+02  6.886e+01  -2.105 0.037108 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 157.9 on 135 degrees of freedom
Multiple R-squared: 0.4593, Adjusted R-squared: 0.4433
F-statistic: 28.67 on 4 and 135 DF, p-value: < 2.2e-16
Linear regression

regressionBF( ) compares all additive models to the intercept-only model

"RT_ms ~ ." means use all the other variables in the data set

> bf = regressionBF(RT_ms ~ ., data=MSdata)
> summary(bf)
Bayes factor analysis
--------------
[1] Color                                  : 53.34634     ±0%
[2] Proximity                              : 3.164296e+12 ±0.01%
[3] Size                                   : 1.784275     ±0%
[4] Contrast                               : 0.2139982    ±0%
[5] Color + Proximity                      : 2.992316e+13 ±0%
[6] Color + Size                           : 124.498      ±0%
[7] Color + Contrast                       : 22.93048     ±0%
[8] Proximity + Size                       : 1.412119e+13 ±0%
[9] Proximity + Contrast                   : 2.823525e+12 ±0%
[10] Size + Contrast                       : 0.4558166    ±0.01%
[11] Color + Proximity + Size              : 1.697263e+14 ±0%
[12] Color + Proximity + Contrast          : 9.524173e+13 ±0%
[13] Color + Size + Contrast               : 43.70297     ±0.01%
[14] Proximity + Size + Contrast           : 7.195322e+12 ±0%
[15] Color + Proximity + Size + Contrast   : 2.332274e+14 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS
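
The BF object can also be manipulated directly; BayesFactor objects support head( ) and division. A sketch (continuing with bf from above):

head(bf, n=3)   # the three models with the largest BFs against the intercept
bf / max(bf)    # re-express every model relative to the best one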
Specific comparisons

Remember that each Bayes factor is a ratio of average likelihoods

You can easily create other such ratios

> bf[15]/bf[14]
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast : 32.41376 ±0%

Against denominator:
  RT_ms ~ Proximity + Size + Contrast
---
Bayes factor type: BFlinearModel, JZS
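
The same number falls out of the BFs reported against the intercept-only model; extractBF( ) pulls the numeric values. A sketch:

# Ratio of the two intercept-relative BFs reproduces bf[15]/bf[14]
extractBF(bf[15])$bf / extractBF(bf[14])$bf   # about 32.4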
Interactions

regressionBF( ) does not handle interactions: for p independent variables, you would have a very large number of different models, which is rather unwieldy

With the function lmBF( ) you can specify particular models, which are compared against the intercept-only model

> bf2 <- lmBF(RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Color, data=MSdata)
> summary(bf2)
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast + Contrast:Color : 4.599422e+13 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS

> bf3 <- lmBF(RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity, data=MSdata)
> summary(bf3)
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity : 5.495223e+12 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS
Compare models

Again, it is easy to generate new Bayes Factors by division

> bf2/bf3
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast + Contrast:Color : 8.369856 ±0%

Against denominator:
  RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity
---
Bayes factor type: BFlinearModel, JZS
Compare models by BF

Generate multiple models and compare BFs (each shown relative to the intercept-only model)

(bf4 is not created on the earlier slides; from the output it is evidently the lmBF( ) model that adds the Contrast:Size interaction.)

> CompareContrastBFs <- c(bf[15], bf2, bf3, bf4, bf[14])
> head(CompareContrastBFs)
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast                                                       : 2.332274e+14 ±0%
[2] Color + Proximity + Size + Contrast + Contrast:Size                                       : 1.001063e+14 ±0%
[3] Color + Proximity + Size + Contrast + Contrast:Color                                      : 4.599422e+13 ±0%
[4] Proximity + Size + Contrast                                                               : 7.195322e+12 ±0%
[5] Color + Proximity + Size + Contrast + Contrast:Color + Contrast:Size + Contrast:Proximity : 5.495223e+12 ±0%

Against denominator:
  Intercept only
---
Bayes factor type: BFlinearModel, JZS

Note that head( ) sorts the display by BF, but indexing follows the order of the c( ) call, so CompareContrastBFs[4] is the Contrast:Size model:

> CompareContrastBFs[1]/CompareContrastBFs[4]
Bayes factor analysis
--------------
[1] Color + Proximity + Size + Contrast : 2.329798 ±0%

Against denominator:
  RT_ms ~ Color + Proximity + Size + Contrast + Contrast:Size
---
Bayes factor type: BFlinearModel, JZS
Compare models with WAIC

We had previously done the same kind of thing in Stan (similar results, but not exactly the same)

> compare(MSLinearModel, MSLinearNoColorModel, MSColorContrastInteractionModel, MSSizeContrastInteractionModel, MSAllContrastInteractionModel)
                                  WAIC pWAIC dWAIC weight    SE  dSE
MSSizeContrastInteractionModel  1820.6   5.7   0.0   0.45 17.61   NA
MSLinearModel                   1821.8   5.6   1.2   0.24 18.17 1.87
MSColorContrastInteractionModel 1822.4   5.9   1.9   0.18 18.14 1.83
MSAllContrastInteractionModel   1823.1   7.0   2.6   0.13 17.45 1.35
MSLinearNoColorModel            1830.3   4.4   9.7   0.00 18.16 5.73
Posteriors

Models can generate posterior distributions

Consider the model that just uses Color as an independent variable

> chainsColor <- posterior(bf[1], iterations=10000)
> plot(chainsColor)

[Figure: trace and density plots for Color, sig2, and g; sig2 is the error variance]
Posteriors

> summary(chainsColor)

Iterations = 1:10000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

            Mean        SD  Naive SE Time-series SE
Color -2.414e-03 7.154e-04 7.154e-06      7.498e-06
sig2   4.182e+04 5.095e+03 5.095e+01      5.195e+01
g      1.017e+00 1.166e+01 1.166e-01      1.166e-01

2. Quantiles for each variable:

            2.5%        25%        50%        75%      97.5%
Color -3.847e-03 -2.892e-03 -2.412e-03 -1.929e-03 -1.021e-03
sig2   3.290e+04  3.819e+04  4.152e+04  4.502e+04  5.266e+04
g      2.625e-02  7.437e-02  1.540e-01  3.826e-01  4.558e+00
Posteriors

Full linear model:

> chainsFullLinear <- posterior(bf[15], iterations=10000)
> summary(chainsFullLinear)

Iterations = 1:10000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                Mean        SD  Naive SE Time-series SE
Color     -1.850e-03 5.668e-04 5.668e-06      5.668e-06
Proximity  1.900e+00 2.186e-01 2.186e-03      2.324e-03
Size       3.093e+01 1.359e+01 1.359e-01      1.346e-01
Contrast  -1.412e+02 6.745e+01 6.745e-01      6.967e-01
sig2       2.521e+04 3.098e+03 3.098e+01      3.262e+01
g          3.074e-01 4.189e-01 4.189e-03      4.189e-03

2. Quantiles for each variable:

                2.5%        25%        50%        75%      97.5%
Color     -2.974e-03 -2.229e-03 -1.850e-03 -1.472e-03 -7.478e-04
Proximity  1.475e+00  1.751e+00  1.898e+00  2.050e+00  2.324e+00
Size       4.577e+00  2.168e+01  3.090e+01  4.002e+01  5.775e+01
Contrast  -2.749e+02 -1.854e+02 -1.408e+02 -9.664e+01 -8.085e+00
sig2       1.988e+04  2.304e+04  2.495e+04  2.709e+04  3.184e+04
g          6.425e-02  1.329e-01  2.072e-01  3.414e-01  1.158e+00

Compare with the lm( ) summary shown earlier: the posterior means are slightly shrunk toward zero relative to the least-squares estimates
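
Credible intervals can also be read straight off the chains. A sketch (column name as in the summary above):

# 95% credible interval for the Contrast slope; matches the 2.5% and 97.5%
# quantiles in the summary
quantile(chainsFullLinear[, "Contrast"], probs=c(0.025, 0.975))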
Posteriors

> plot(chainsFullLinear)

[Figure: trace and density plots for Color, Proximity, Size, Contrast, sig2, and g]
Not happy with your result?

Suppose you get BF = 2.9, but you want BF > 3

Gather more data!

There's no problem with gathering more data, because you are comparing two models, not deciding whether one model should be rejected

Gathering more data changes the average likelihood of each model; adding data gives you more evidence about the relative fit of the models to the data you have observed

Note: if you make a decision based on BF > 3 (or whatever), then you might increase the Type I error rate
 That is not what a Bayesian analysis is about

However, over the long run, the BF will get very large in favor of the true model (if it is one of the models)
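
A toy simulation illustrates that long-run claim; with hypothetical data generated under a true effect, BF10 tends to grow as observations accumulate (a sketch, assuming BayesFactor is installed):

library(BayesFactor)
set.seed(1)
for (n in c(20, 80, 320)) {
  x <- rnorm(n, mean=0.5)   # hypothetical data; true effect d = 0.5
  print(ttestBF(x=x))       # BF10 against the point null grows with n
}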
One-tailed tests

Consider the facial feedback data

Dependent data

Just to demonstrate things:
 Compute the average rating for each participant, for each condition
 Drop the NoPen condition

> FFdata <- read.csv(file="FacialFeedbackAvg.csv", header=TRUE, stringsAsFactors=FALSE)
One-tailed tests

Regular dependent t-test

One-sample test on differences

> diffScores <- FFdata$PenInTeeth - FFdata$PenInLips
> t.test(diffScores, alternative="greater")

	One Sample t-test

data:  diffScores
t = 0.56122, df = 20, p-value = 0.2904
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 -3.266755       Inf
sample estimates:
mean of x
 1.575758
Bayesian t-test

> library(BayesFactor)
> bf <- ttestBF(x=diffScores)
> bf
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 0.2621909 ±0.03%

Against denominator:
  Null, mu = 0
---
Bayes factor type: BFoneSample, JZS
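
Taking the reciprocal reads the same result as support for the null. A sketch:

# B01 = 1/B10; about 3.8, i.e., substantial support for the null by the
# guidelines given earlier
1 / extractBF(bf)$bf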
One-tailed Bayesian t-test

Specify a range for the null hypothesis

> bfInterval <- ttestBF(x=diffScores, nullInterval=c(0, Inf))
> bfInterval
Bayes factor analysis
--------------
[1] Alt., r=0.707 0<d<Inf    : 0.3672615 ±0%
[2] Alt., r=0.707 !(0<d<Inf) : 0.1571203 ±0%

Against denominator:
  Null, mu = 0
---
Bayes factor type: BFoneSample, JZS

We get 2 tests!
Directional test

Your null does not have to be a point

You just tested Model 1 (M1): delta > 0 against delta = 0

You just tested Model 2 (M2): delta < 0 against delta = 0

You can compare M1 and M2 by dividing the BFs

> bfInterval[1]/bfInterval[2]
Bayes factor analysis
--------------
[1] Alt., r=0.707 0<d<Inf : 2.337454 ±0%

Against denominator:
  Alternative, r = 0.707106781186548, mu =/= 0 !(0<d<Inf)
---
Bayes factor type: BFoneSample, JZS
Careful!

You still have to look at your data

One subject seems to have “given up” on the task

Removing this subject produces a rather different result
Careful!

> FFdata <- read.csv(file="FacialFeedbackAvg.csv", header=TRUE, stringsAsFactors=FALSE)
> FFdata <- FFdata[-c(18),]   # Removes row of non-responsive subject
> bfInterval
Bayes factor analysis
--------------
[1] Alt., r=0.707 0<d<Inf    : 0.2114579 ±0.01%
[2] Alt., r=0.707 !(0<d<Inf) : 0.2567416 ±0.01%

Against denominator:
  Null, mu = 0
---
Bayes factor type: BFoneSample, JZS
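
The slide does not show the intermediate steps; presumably the difference scores and the interval Bayes factor were recomputed after dropping the row, along these lines:

diffScores <- FFdata$PenInTeeth - FFdata$PenInLips
bfInterval <- ttestBF(x=diffScores, nullInterval=c(0, Inf))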
Careful!

> chains <- posterior(bfInterval[1], iterations=1000)
> plot(chains)

[Figure: trace and density plots for mu, sig2, delta, and g]
Careful!

If you insist that the average difference scores must be positive, the model does the best it can

That might not be very good!

> summary(chains)

Iterations = 1:1000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean      SD Naive SE Time-series SE
mu      1.5257  1.2338 0.039016       0.041415
sig2  103.3307 35.2416 1.114436       1.014385
delta   0.1517  0.1193 0.003773       0.003773
g       1.5704  6.7063 0.212073       0.231978

2. Quantiles for each variable:

           2.5%      25%     50%      75%    97.5%
mu     0.072775  0.55931  1.2281   2.1894   4.4375
sig2  55.467055 79.26845 95.9279 121.5487 191.5553
delta  0.006707  0.05882  0.1254   0.2223   0.4489
g      0.072677  0.17978  0.3580   0.8343   9.3308
Conclusions

JZS Bayes Factors

Easy to calculate

Pretty easy to understand results

You can do a lot with them

Evidence for the null

Add data

Posterior distributions

Different kinds of tests