Download PPT slides for 08 November (Bayes Factors)



Transcript
JZS Bayes Factors
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016
Purdue University
AIC

For a two-sample t-test, the null hypothesis (reduced model) is that score i from group s (1 or 2) is defined as
X_is = mu + epsilon_is, with epsilon_is ~ Normal(0, sigma)
With the same mean mu for each group s
AIC


For a two-sample t-test, the alternative hypothesis (full
model) is that score i from group s (1 or 2) is defined as
X_is = mu_s + epsilon_is, with epsilon_is ~ Normal(0, sigma)
With a different mean mu_s for each group s
AIC

AIC and its variants (DIC, WAIC) are a way of comparing model
structures
 One mean or two means?
 Always uses maximum likelihood estimates of the parameters

Bayesian approaches identify a posterior distribution of parameter values

We should use that information!
Models of what?

We have been building models of trial-level scores
# The models below use the rethinking package (map and map2stan)
library(rethinking)

FFmodel1 <- map(
  alist(
    HappinessRating ~ dnorm(mu, sigma),
    mu <- a1*PenInTeeth + a2*NoPen + a3*PenInLips,
    a1 ~ dnorm(50, 100),
    a2 ~ dnorm(50, 100),
    a3 ~ dnorm(50, 100),
    sigma ~ dunif(0, 50)
  ), data = FFdata)

MSLinearModel <- map2stan(
  alist(
    RT_ms ~ dnorm(mu, sigma),
    mu <- a0 + a1*Proximity + a2*Size + a3*Color + a4*Contrast,
    a0 ~ dnorm(1000, 1000),
    a1 ~ dnorm(0, 20),
    a2 ~ dnorm(0, 50),
    a3 ~ dnorm(0, 1),
    a4 ~ dnorm(0, 500),
    sigma ~ dunif(0, 2000)
  ), data = MSdata)
Models of what?

We have been building models of trial-level scores

That is not the only option

In traditional hypothesis testing, we care more about effect sizes than
about individual scores
 Signal-to-noise ratio
 Of course, the effect size is derived from the individual scores

In some cases, it is enough to just model the effect size itself rather than
the individual scores
 Cohen’s d
 t-statistic
 p-value
 Correlation r
Models of means

It’s not really going to be practical, but let’s consider a case where we
assume that the population variance is known (and equals 1) and we
want to compare null and alternative hypotheses of fixed values
Models of means


The likelihood of any given observed mean value is derived from the
sampling distribution
Suppose n=100 (one sample)
Models of means

The likelihood of any given observed mean value is derived from the
sampling distribution
Suppose n=100 (one sample)
Suppose we observe a sample mean near the null value: the data are then more
likely under the null than under the alternative
Models of means

Suppose instead we observe a sample mean near the alternative value: the data
are then more likely under the alternative than under the null
Bayes Factor

The ratio of likelihood for the data under the null compared to the
alternative
 Or the other way around
BF01 = P(D | H0) / P(D | H1)
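The point-null versus point-alternative comparison can be sketched numerically. This is an illustrative script, not from the slides: the observed means (0.10, 0.25) and the point alternative (mu = 0.3) are made-up values.

```python
# Illustrative sketch of the known-variance setup: sigma = 1, n = 100,
# so the sampling distribution of the mean has SE = 1/sqrt(100) = 0.1.
from scipy import stats

se = 1.0 / 100 ** 0.5   # standard error of the mean

def bf01(xbar, mu_alt):
    """BF01: likelihood of the observed mean under H0 (mu = 0) vs. H1 (mu = mu_alt)."""
    return stats.norm.pdf(xbar, 0.0, se) / stats.norm.pdf(xbar, mu_alt, se)

print(bf01(0.10, 0.3))  # > 1: data more likely under the null
print(bf01(0.25, 0.3))  # < 1: data more likely under the alternative
```

With known variance, the Bayes factor is just a ratio of two normal densities evaluated at the observed mean.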
Decision depends on alternative

The likelihood of any given observed mean value is derived from the
sampling distribution
Suppose n=100 (one sample)
For the same observed sample mean, the data can be more likely under the null
than under the alternative when the alternative mean is placed far from the
observation
Decision depends on alternative

For a fixed sample mean, evidence for the alternative only occurs for
alternative population mean values within a limited range
For large alternative values, the observed sample mean is less likely than for
a null population value

The sample mean may be unlikely under both models
Rouder et al. (2009)
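The dependence on the alternative is easy to see numerically. In this illustrative sketch (values are made up, not from the slides), we hold the observed mean fixed at 0.15 with known sigma = 1 and n = 100, and vary the alternative's population mean.

```python
# Fix the observed mean and vary the point alternative: the Bayes factor
# favors the alternative only for a limited range of alternative means.
from scipy import stats

se = 1.0 / 100 ** 0.5   # standard error of the mean = 0.1
xbar = 0.15             # fixed observed sample mean (hypothetical)

def bf10(xbar, mu_alt):
    """Likelihood ratio favoring a point alternative over the null (mu = 0)."""
    return stats.norm.pdf(xbar, mu_alt, se) / stats.norm.pdf(xbar, 0.0, se)

for mu_alt in [0.1, 0.2, 0.3, 0.5, 1.0]:
    print(mu_alt, round(bf10(xbar, mu_alt), 4))
```

BF10 exceeds 1 only for alternatives near the observed mean; for large alternative values the observed mean is unlikely under both models, and the ratio swings back toward the null.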
Models of means

Typically, we do not hypothesize a specific value for the alternative, but a
range of plausible values
Likelihoods

For the null, we compute likelihood in the same way

Suppose n=100 (one sample)
Likelihoods


For the alternative, we have to consider each possible value of mu,
compute the likelihood of the sample mean for that value, and then
average across all possible values
Suppose n=100 (one sample)
Average Likelihood


For the alternative, we have to consider each possible value of mu,
compute the likelihood of the sample mean for that value, and then
average across all possible values
Suppose n=100 (one sample)
P(D | H1) = Integral of P(D | mu) x p(mu) dmu
 p(mu): prior for the value of mu
 P(D | mu): likelihood for a given value of mu (from the sampling distribution)
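The averaging step can be checked numerically. This is a sketch under made-up numbers (observed mean 0.25, prior mu ~ Normal(0, 0.3), known sigma = 1 with n = 100 so SE = 0.1); with a normal prior the average likelihood also has a closed form, which lets us verify the integral.

```python
# Average likelihood: integrate (likelihood for a given mu) x (prior for mu).
import numpy as np
from scipy import stats, integrate

se = 0.1                 # known sigma = 1, n = 100
xbar = 0.25              # observed sample mean (made up)
prior_sd = 0.3           # prior: mu ~ Normal(0, 0.3)

def integrand(mu):
    # likelihood for this value of mu (sampling distribution) x prior density
    return stats.norm.pdf(xbar, mu, se) * stats.norm.pdf(mu, 0.0, prior_sd)

avg_like, _ = integrate.quad(integrand, -np.inf, np.inf)
# Normal prior x normal likelihood has a closed-form marginal for comparison
closed_form = stats.norm.pdf(xbar, 0.0, np.sqrt(se**2 + prior_sd**2))
print(avg_like, closed_form)   # the two should agree

bf01 = stats.norm.pdf(xbar, 0.0, se) / avg_like
print(bf01)                    # < 1: the data favor the alternative here
```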
Bayes Factor

Ratio of the likelihood for the null compared to the (average) likelihood for
the alternative
BF01 = P(D | H0) / P(D | H1), where P(D | H1) is the average likelihood
Uncertainty

The prior standard deviation for mu establishes a range of plausible
values for mu: a narrow prior is less flexible, a broad prior more flexible

With a very narrow prior, you may not fit the data

With a very broad prior, you will fit well for some values of mu and poorly
for other values of mu

Uncertainty in the prior functions similarly to the penalty for parameters in
AIC
Penalty

Averaging acts like a penalty for extra parameters

Rouder et al. (2009)
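The penalty can be demonstrated directly. In this illustrative sketch (known sigma = 1, n = 100, observed mean 0.25, all made up), widening the prior on mu spreads probability over values that fit poorly, which lowers the average likelihood:

```python
# Averaging over a broader prior acts like a complexity penalty.
import numpy as np
from scipy import stats, integrate

se, xbar = 0.1, 0.25

def avg_likelihood(prior_sd):
    # average the likelihood of xbar over a Normal(0, prior_sd) prior on mu
    f = lambda mu: stats.norm.pdf(xbar, mu, se) * stats.norm.pdf(mu, 0.0, prior_sd)
    return integrate.quad(f, -np.inf, np.inf)[0]

for sd in [0.3, 1.0, 3.0, 10.0]:
    print(sd, avg_likelihood(sd))
```

The average likelihood shrinks as the prior broadens, much as AIC's penalty grows with extra parameters.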
Models of effect size

Consider the case of a two-sample t-test

We often care about the standardized effect size
delta = (mu_1 - mu_2) / sigma

Which we can estimate from data as:
d = (Xbar_1 - Xbar_2) / s
Models of effect size

If we were doing traditional hypothesis testing, we would compare a null
model:
H0: mu_1 = mu_2

Against an alternative:
H1: mu_1 ≠ mu_2

Equivalent statements can be made using the standardized effect size:
H0: delta = 0 versus H1: delta ≠ 0
 As long as the standard deviation is not zero
Priors on effect size

For the null, the prior is (again) a spike at zero
JZS Priors on effect size

For the alternative, a good choice is a Cauchy distribution (a t-distribution
with df = 1): Rouder et al. (2009)
JZS = Jeffreys, Zellner, and Siow
JZS Priors on effect size

It is a good choice because the integration for the alternative hypothesis
can be done numerically (Rouder et al., 2009):

BF01 = (1 + t^2/v)^(-(v+1)/2) / Integral[0, inf] (1 + N*g)^(-1/2) * [1 + t^2/((1 + N*g)*v)]^(-(v+1)/2) * (2*pi)^(-1/2) * g^(-3/2) * exp(-1/(2g)) dg

t is the t-value you use in a hypothesis test (from the data)

v is the "degrees of freedom" (from the data)

N is the (effective) sample size

This might not look easy, but it is simple to calculate with a computer
Variations of JZS Priors

Scale parameter “r”

Bigger values make for a broader prior
 More flexible!
 More penalty!
Variations of JZS Priors

Medium: r = sqrt(2)/2

Wide: r = 1

Ultrawide: r = sqrt(2)
How do we use it?

Super easy

My own web site for a two-sample t-test

http://psych.purdue.edu/~gfrancis/EquivalentStatistics/

Rouder’s web site:

http://pcl.missouri.edu/bayesfactor

In R

library(BayesFactor)

library(BayesFactor)

ttest.tstat(t = 2.2, n1 = 15, n2 = 15, simple = TRUE)
     B10
1.993006
What does it mean?

Guidelines
BF        Evidence
1–3       Anecdotal
3–10      Substantial
10–30     Strong
30–100    Very strong
>100      Decisive
How does it compare to NHST?

To get BF > 10, you need rather large t values

The required t values are bigger still with larger sample sizes
Conclusions

JZS Bayes Factors

Easy to calculate

Pretty easy to understand results

A bit arbitrary for setting up
 Why not other priors?
 How to pick scale factor?
 Criteria for interpretation are arbitrary

Fairly painless introduction to Bayesian methods

Some things to watch out for (next time)