Download PPT slides for 08 November (Bayes Factors)
JZS Bayes Factors
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016, Purdue University

AIC
- For a two-sample t-test, the null hypothesis (reduced model) is that a score i from group s (1 or 2) has the same mean for each group:
      X_is = mu + epsilon_is,  with epsilon_is ~ N(0, sigma^2)
- The alternative hypothesis (full model) is that each group has its own mean:
      X_is = mu_s + epsilon_is,  with epsilon_is ~ N(0, sigma^2)

AIC
- AIC and its variants (DIC, WAIC) are a way of comparing model structures: one mean or two means?
- AIC always uses maximum likelihood estimates of the parameters.
- Bayesian approaches identify a posterior distribution of parameter values. We should use that information!

Models of what?
- We have been building models of trial-level scores, for example (rethinking package):

    library(rethinking)

    # Trial-level model of happiness ratings
    FFmodel1 <- map(
      alist(
        HappinessRating ~ dnorm(mu, sigma),
        mu <- a1*PenInTeeth + a2*NoPen + a3*PenInLips,
        a1 ~ dnorm(50, 100),
        a2 ~ dnorm(50, 100),
        a3 ~ dnorm(50, 100),
        sigma ~ dunif(0, 50)
      ),
      data = FFdata
    )

    # Trial-level model of response times
    MSLinearModel <- map2stan(
      alist(
        RT_ms ~ dnorm(mu, sigma),
        mu <- a0 + a1*Proximity + a2*Size + a3*Color + a4*Contrast,
        a0 ~ dnorm(1000, 1000),
        a1 ~ dnorm(0, 20),
        a2 ~ dnorm(0, 50),
        a3 ~ dnorm(0, 1),
        a4 ~ dnorm(0, 500),
        sigma ~ dunif(0, 2000)
      ),
      data = MSdata
    )

Models of what?
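The one-mean versus two-mean comparison can be sketched numerically. Below is a minimal illustration in Python rather than the course's R: it simulates two groups (the data and variable names are hypothetical), fits the reduced and full models by maximum likelihood, and compares AIC = 2k - 2 ln L.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical two-group data with a real mean difference
g1 = rng.normal(0.0, 1.0, size=50)
g2 = rng.normal(1.0, 1.0, size=50)
scores = np.concatenate([g1, g2])

def normal_loglik(x, mu, sigma):
    """Log-likelihood of scores x under N(mu, sigma^2)."""
    return stats.norm.logpdf(x, loc=mu, scale=sigma).sum()

# Reduced (null) model: one common mean.
# MLEs: grand mean and the ML (divide-by-n) standard deviation.
mu_hat = scores.mean()
sigma_hat0 = scores.std()
ll_reduced = normal_loglik(scores, mu_hat, sigma_hat0)
aic_reduced = 2 * 2 - 2 * ll_reduced     # 2 parameters: mu, sigma

# Full (alternative) model: a separate mean for each group,
# with sigma estimated from the pooled residuals.
resid = np.concatenate([g1 - g1.mean(), g2 - g2.mean()])
sigma_hat1 = np.sqrt((resid ** 2).mean())
ll_full = (normal_loglik(g1, g1.mean(), sigma_hat1)
           + normal_loglik(g2, g2.mean(), sigma_hat1))
aic_full = 2 * 3 - 2 * ll_full           # 3 parameters: mu1, mu2, sigma

# With a genuine group difference, the full model wins despite its
# extra-parameter penalty.
print(aic_reduced, aic_full)
```

The full model always fits at least as well as the reduced model (it is nested), so the comparison hinges on whether the likelihood gain beats the penalty for the extra mean parameter.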
- We have been building models of trial-level scores, but that is not the only option.
- In traditional hypothesis testing, we care more about effect sizes (a signal-to-noise ratio) than about individual scores.
- Of course, the effect size is derived from the individual scores.
- In some cases, it is enough to model a summary of the data rather than the individual scores: Cohen's d, the t-statistic, a p-value, or a correlation r.

Models of means
- It is not really going to be practical, but let's consider a case where we assume the population variance is known (and equals 1) and we want to compare null and alternative hypotheses that each fix a specific value of the mean.
- The likelihood of any given observed sample mean is derived from the sampling distribution. Suppose n = 100 (one sample), so the sample mean has standard error 1/sqrt(100) = 0.1.
- Depending on where the observed sample mean falls, the data can be more likely under the null than under the alternative, or more likely under the alternative than under the null.
  [Figures: null and alternative sampling distributions with the observed sample mean marked, for several observed values.]

Bayes Factor
- The ratio of the likelihood of the data under the null compared to the alternative (or the other way around):
      BF01 = P(D | H0) / P(D | H1)

Decision depends on alternative
- For a fixed sample mean, evidence for the alternative only occurs for alternative population mean values in a certain range.
- For big alternative values, the observed sample mean is less likely than it is under the null population value.
- The sample mean may be unlikely under both models. (Rouder et al., 2009)

Models of means
- Typically, we do not hypothesize a specific value for the alternative, but a range of plausible values.

Likelihoods
- For the null, we compute the likelihood in the same way as before (suppose n = 100, one sample).
- For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values.

Average likelihood
- The average weights the likelihood for each value of mu (from the sampling distribution) by the prior for that value of mu:
      P(D | H1) = Integral of P(D | mu) p(mu) dmu

Bayes Factor
- The ratio of the likelihood for the null to the (average) likelihood for the alternative:
      BF01 = P(D | H0) / P(D | H1)

Uncertainty
- The prior standard deviation for mu
establishes a range of plausible values for mu: a small prior standard deviation makes a less flexible model, a large one a more flexible model.
- With a very narrow prior, you may not fit the data.
- With a very broad prior, you will fit well for some values of mu and poorly for other values of mu.
- Uncertainty in the prior functions similarly to the penalty for parameters in AIC: averaging across the prior acts like a penalty for extra parameters. (Rouder et al., 2009)

Models of effect size
- Consider the case of a two-sample t-test. We often care about the standardized effect size
      delta = (mu_1 - mu_2) / sigma
  which we can estimate from data as
      d = (xbar_1 - xbar_2) / s
- If we were doing traditional hypothesis testing, we would compare a null model, H0: mu_1 = mu_2, against an alternative, H1: mu_1 != mu_2.
- Equivalent statements can be made using the standardized effect size (H0: delta = 0 versus H1: delta != 0), as long as the standard deviation is not zero.

JZS priors on effect size
- For the null, the prior is (again) a spike at zero.
- For the alternative, a good choice is a Cauchy distribution (a t-distribution with df = 1) on delta. (Rouder et al., 2009; JZS = Jeffreys, Zellner, Siow)
- It is a good choice because the integration for the alternative hypothesis can be done numerically:
      BF01 = (1 + t^2/v)^(-(v+1)/2)
             / Integral from 0 to infinity of
               (1 + N g)^(-1/2) * (1 + t^2 / ((1 + N g) v))^(-(v+1)/2)
               * (2 pi)^(-1/2) * g^(-3/2) * e^(-1/(2g)) dg
  where t is the t-value you would use in a hypothesis test (from the data), v is the degrees of freedom (from the data), and N is the effective sample size.
- This might not look easy, but it is simple to calculate with a computer.

Variations of JZS priors
- The scale parameter r controls the breadth of the Cauchy prior. Bigger values make for a broader prior: more flexible, but more penalty!
- Common choices (as named in the BayesFactor package): medium r = sqrt(2)/2, wide r = 1, ultrawide r = sqrt(2).

How do we use it?
- Super easy.
- My own web site for a two-sample t-test: http://psych.purdue.edu/~gfrancis/EquivalentStatistics/
- Rouder's web site: http://pcl.missouri.edu/bayesfactor
- In R:

    library(BayesFactor)
    ttest.tstat(t = 2.2, n1 = 15, n2 = 15, simple = TRUE)
    #      B10
    # 1.993006

What does it mean?
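Before interpreting the number, it is worth seeing that the JZS integration really is simple to do on a computer. Below is a sketch in Python (the function name jzs_bf10 is mine; SciPy's quad performs the numerical integration) following the Rouder et al. (2009) formulation, checked against the BayesFactor result for t = 2.2, n1 = n2 = 15.

```python
import numpy as np
from scipy.integrate import quad

def jzs_bf10(t, n1, n2, r=np.sqrt(2) / 2):
    """JZS Bayes factor (alternative over null) for a two-sample t-test,
    following Rouder et al. (2009). r is the Cauchy prior scale
    (sqrt(2)/2 matches the BayesFactor package's 'medium' default)."""
    v = n1 + n2 - 2                # degrees of freedom
    N = n1 * n2 / (n1 + n2)        # effective sample size

    # Likelihood of the data under the null (delta = 0)
    null_lik = (1 + t**2 / v) ** (-(v + 1) / 2)

    # Average likelihood under the alternative: integrate over g, where
    # delta | g ~ N(0, g) and g ~ InverseGamma(1/2, r^2/2), which makes
    # the marginal prior on delta a Cauchy(0, r).
    def integrand(g):
        return ((1 + N * g) ** -0.5
                * (1 + t**2 / ((1 + N * g) * v)) ** (-(v + 1) / 2)
                * np.sqrt(r**2 / (2 * np.pi)) * g**-1.5
                * np.exp(-r**2 / (2 * g)))

    alt_lik, _ = quad(integrand, 0, np.inf)
    return alt_lik / null_lik

# Should land close to BayesFactor's ttest.tstat(t=2.2, n1=15, n2=15)
bf10 = jzs_bf10(2.2, 15, 15)
print(bf10)
```

A larger scale r spreads the prior over bigger effect sizes, which (as with the broad prior on mu earlier) drags the average likelihood down and pushes the Bayes factor toward the null.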
Guidelines

    BF          Evidence
    1 - 3       Anecdotal
    3 - 10      Substantial
    10 - 30     Strong
    30 - 100    Very strong
    > 100       Decisive

How does it compare to NHST?
- To get BF > 10, you need rather large t values, and bigger still with larger sample sizes.

Conclusions
- JZS Bayes factors are easy to calculate, and the results are pretty easy to understand.
- The setup is a bit arbitrary: Why not other priors? How do you pick the scale factor? The criteria for interpretation are also arbitrary.
- Still, a fairly painless introduction to Bayesian methods.
- Some things to watch out for (next time).
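The guideline table is easy to wrap in a small reporting helper. A sketch (the function name is mine; labels and cutoffs are those in the table above), where a Bayes factor below 1 is classified by its reciprocal as evidence for the other hypothesis:

```python
def evidence_label(bf):
    """Map a Bayes factor to the guideline evidence labels.
    Values below 1 are inverted and labeled as evidence for the
    other hypothesis."""
    if bf < 1:
        return evidence_label(1 / bf) + " (for the other hypothesis)"
    for cutoff, label in [(3, "Anecdotal"), (10, "Substantial"),
                          (30, "Strong"), (100, "Very strong")]:
        if bf < cutoff:
            return label
    return "Decisive"

print(evidence_label(1.993))   # the worked t-test example -> "Anecdotal"
```

Note that B10 = 1.993 from the earlier example, despite coming from a t value near the conventional significance cutoff, only counts as anecdotal evidence on this scale.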