Download PPT slides for 08 November (Bayes Factors)
JZS Bayes Factors
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016, Purdue University

AIC
- For a two-sample t-test, the null hypothesis (reduced model) is that a score i from group s (1 or 2) has the same mean for each group:
      X_is = mu + epsilon_is,  with epsilon_is ~ N(0, sigma^2)
- The alternative hypothesis (full model) is that each group has its own mean:
      X_is = mu_s + epsilon_is,  with epsilon_is ~ N(0, sigma^2)

AIC
- AIC and its variants (DIC, WAIC) are a way of comparing model structures: one mean or two means?
- AIC always uses maximum likelihood estimates of the parameters.
- Bayesian approaches identify a posterior distribution of parameter values. We should use that information!

Models of what?
- We have been building models of trial-level scores, for example (rethinking package):

    library(rethinking)

    # Trial-level model of happiness ratings
    FFmodel1 <- map(
      alist(
        HappinessRating ~ dnorm(mu, sigma),
        mu <- a1*PenInTeeth + a2*NoPen + a3*PenInLips,
        a1 ~ dnorm(50, 100),
        a2 ~ dnorm(50, 100),
        a3 ~ dnorm(50, 100),
        sigma ~ dunif(0, 50)
      ),
      data = FFdata
    )

    # Trial-level model of response times
    MSLinearModel <- map2stan(
      alist(
        RT_ms ~ dnorm(mu, sigma),
        mu <- a0 + a1*Proximity + a2*Size + a3*Color + a4*Contrast,
        a0 ~ dnorm(1000, 1000),
        a1 ~ dnorm(0, 20),
        a2 ~ dnorm(0, 50),
        a3 ~ dnorm(0, 1),
        a4 ~ dnorm(0, 500),
        sigma ~ dunif(0, 2000)
      ),
      data = MSdata
    )

Models of what?
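The one-mean versus two-mean comparison can be sketched numerically. Below is a minimal illustration in Python rather than the course's R: it simulates two groups (the data and variable names are hypothetical), fits the reduced and full models by maximum likelihood, and compares AIC = 2k - 2 ln L.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical two-group data with a real mean difference
g1 = rng.normal(0.0, 1.0, size=50)
g2 = rng.normal(1.0, 1.0, size=50)
scores = np.concatenate([g1, g2])

def normal_loglik(x, mu, sigma):
    """Log-likelihood of scores x under N(mu, sigma^2)."""
    return stats.norm.logpdf(x, loc=mu, scale=sigma).sum()

# Reduced (null) model: one common mean.
# MLEs: grand mean and the ML (divide-by-n) standard deviation.
mu_hat = scores.mean()
sigma_hat0 = scores.std()
ll_reduced = normal_loglik(scores, mu_hat, sigma_hat0)
aic_reduced = 2 * 2 - 2 * ll_reduced     # 2 parameters: mu, sigma

# Full (alternative) model: a separate mean for each group,
# with sigma estimated from the pooled residuals.
resid = np.concatenate([g1 - g1.mean(), g2 - g2.mean()])
sigma_hat1 = np.sqrt((resid ** 2).mean())
ll_full = (normal_loglik(g1, g1.mean(), sigma_hat1)
           + normal_loglik(g2, g2.mean(), sigma_hat1))
aic_full = 2 * 3 - 2 * ll_full           # 3 parameters: mu1, mu2, sigma

# With a genuine group difference, the full model wins despite its
# extra-parameter penalty.
print(aic_reduced, aic_full)
```

The full model always fits at least as well as the reduced model (it is nested), so the comparison hinges on whether the likelihood gain beats the penalty for the extra mean parameter.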
- We have been building models of trial-level scores, but that is not the only option.
- In traditional hypothesis testing, we care more about effect sizes (a signal-to-noise ratio) than about individual scores.
- Of course, the effect size is derived from the individual scores.
- In some cases, it is enough to model a summary of the data rather than the individual scores: Cohen's d, the t-statistic, a p-value, or a correlation r.

Models of means
- It is not really going to be practical, but let's consider a case where we assume the population variance is known (and equals 1) and we want to compare null and alternative hypotheses that each fix a specific value of the mean.
- The likelihood of any given observed sample mean is derived from the sampling distribution. Suppose n = 100 (one sample), so the sample mean has standard error 1/sqrt(100) = 0.1.
- Depending on where the observed sample mean falls, the data can be more likely under the null than under the alternative, or more likely under the alternative than under the null.
  [Figures: null and alternative sampling distributions with the observed sample mean marked, for several observed values.]

Bayes Factor
- The ratio of the likelihood of the data under the null compared to the alternative (or the other way around):
      BF01 = P(D | H0) / P(D | H1)

Decision depends on alternative
- For a fixed sample mean, evidence for the alternative only occurs for alternative population mean values in a certain range.
- For big alternative values, the observed sample mean is less likely than it is under the null population value.
- The sample mean may be unlikely under both models. (Rouder et al., 2009)

Models of means
- Typically, we do not hypothesize a specific value for the alternative, but a range of plausible values.

Likelihoods
- For the null, we compute the likelihood in the same way as before (suppose n = 100, one sample).
- For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values.

Average likelihood
- The average weights the likelihood for each value of mu (from the sampling distribution) by the prior for that value of mu:
      P(D | H1) = Integral of P(D | mu) p(mu) dmu

Bayes Factor
- The ratio of the likelihood for the null to the (average) likelihood for the alternative:
      BF01 = P(D | H0) / P(D | H1)

Uncertainty
- The prior standard deviation for mu
establishes a range of plausible values for mu: a small prior standard deviation makes a less flexible model, a large one a more flexible model.
- With a very narrow prior, you may not fit the data.
- With a very broad prior, you will fit well for some values of mu and poorly for other values of mu.
- Uncertainty in the prior functions similarly to the penalty for parameters in AIC: averaging across the prior acts like a penalty for extra parameters. (Rouder et al., 2009)

Models of effect size
- Consider the case of a two-sample t-test. We often care about the standardized effect size
      delta = (mu_1 - mu_2) / sigma
  which we can estimate from data as
      d = (xbar_1 - xbar_2) / s
- If we were doing traditional hypothesis testing, we would compare a null model, H0: mu_1 = mu_2, against an alternative, H1: mu_1 != mu_2.
- Equivalent statements can be made using the standardized effect size (H0: delta = 0 versus H1: delta != 0), as long as the standard deviation is not zero.

JZS priors on effect size
- For the null, the prior is (again) a spike at zero.
- For the alternative, a good choice is a Cauchy distribution (a t-distribution with df = 1) on delta. (Rouder et al., 2009; JZS = Jeffreys, Zellner, Siow)
- It is a good choice because the integration for the alternative hypothesis can be done numerically:
      BF01 = (1 + t^2/v)^(-(v+1)/2)
             / Integral from 0 to infinity of
               (1 + N g)^(-1/2) * (1 + t^2 / ((1 + N g) v))^(-(v+1)/2)
               * (2 pi)^(-1/2) * g^(-3/2) * e^(-1/(2g)) dg
  where t is the t-value you would use in a hypothesis test (from the data), v is the degrees of freedom (from the data), and N is the effective sample size.
- This might not look easy, but it is simple to calculate with a computer.

Variations of JZS priors
- The scale parameter r controls the breadth of the Cauchy prior. Bigger values make for a broader prior: more flexible, but more penalty!
- Common choices (as named in the BayesFactor package): medium r = sqrt(2)/2, wide r = 1, ultrawide r = sqrt(2).

How do we use it?
- Super easy.
- My own web site for a two-sample t-test: http://psych.purdue.edu/~gfrancis/EquivalentStatistics/
- Rouder's web site: http://pcl.missouri.edu/bayesfactor
- In R:

    library(BayesFactor)
    ttest.tstat(t = 2.2, n1 = 15, n2 = 15, simple = TRUE)
    #      B10
    # 1.993006

What does it mean?
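Before interpreting the number, it is worth seeing that the JZS integration really is simple to do on a computer. Below is a sketch in Python (the function name jzs_bf10 is mine; SciPy's quad performs the numerical integration) following the Rouder et al. (2009) formulation, checked against the BayesFactor result for t = 2.2, n1 = n2 = 15.

```python
import numpy as np
from scipy.integrate import quad

def jzs_bf10(t, n1, n2, r=np.sqrt(2) / 2):
    """JZS Bayes factor (alternative over null) for a two-sample t-test,
    following Rouder et al. (2009). r is the Cauchy prior scale
    (sqrt(2)/2 matches the BayesFactor package's 'medium' default)."""
    v = n1 + n2 - 2                # degrees of freedom
    N = n1 * n2 / (n1 + n2)        # effective sample size

    # Likelihood of the data under the null (delta = 0)
    null_lik = (1 + t**2 / v) ** (-(v + 1) / 2)

    # Average likelihood under the alternative: integrate over g, where
    # delta | g ~ N(0, g) and g ~ InverseGamma(1/2, r^2/2), which makes
    # the marginal prior on delta a Cauchy(0, r).
    def integrand(g):
        return ((1 + N * g) ** -0.5
                * (1 + t**2 / ((1 + N * g) * v)) ** (-(v + 1) / 2)
                * np.sqrt(r**2 / (2 * np.pi)) * g**-1.5
                * np.exp(-r**2 / (2 * g)))

    alt_lik, _ = quad(integrand, 0, np.inf)
    return alt_lik / null_lik

# Should land close to BayesFactor's ttest.tstat(t=2.2, n1=15, n2=15)
bf10 = jzs_bf10(2.2, 15, 15)
print(bf10)
```

A larger scale r spreads the prior over bigger effect sizes, which (as with the broad prior on mu earlier) drags the average likelihood down and pushes the Bayes factor toward the null.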
Guidelines

    BF          Evidence
    1 - 3       Anecdotal
    3 - 10      Substantial
    10 - 30     Strong
    30 - 100    Very strong
    > 100       Decisive

How does it compare to NHST?
- To get BF > 10, you need rather large t values, and bigger still with larger sample sizes.

Conclusions
- JZS Bayes factors are easy to calculate, and the results are pretty easy to understand.
- The setup is a bit arbitrary: Why not other priors? How do you pick the scale factor? The criteria for interpretation are also arbitrary.
- Still, a fairly painless introduction to Bayesian methods.
- Some things to watch out for (next time).
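The guideline table is easy to wrap in a small reporting helper. A sketch (the function name is mine; labels and cutoffs are those in the table above), where a Bayes factor below 1 is classified by its reciprocal as evidence for the other hypothesis:

```python
def evidence_label(bf):
    """Map a Bayes factor to the guideline evidence labels.
    Values below 1 are inverted and labeled as evidence for the
    other hypothesis."""
    if bf < 1:
        return evidence_label(1 / bf) + " (for the other hypothesis)"
    for cutoff, label in [(3, "Anecdotal"), (10, "Substantial"),
                          (30, "Strong"), (100, "Very strong")]:
        if bf < cutoff:
            return label
    return "Decisive"

print(evidence_label(1.993))   # the worked t-test example -> "Anecdotal"
```

Note that B10 = 1.993 from the earlier example, despite coming from a t value near the conventional significance cutoff, only counts as anecdotal evidence on this scale.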