JZS Bayes Factors
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016, Purdue University

AIC
- For a two-sample t-test, the null hypothesis (reduced model) is that a score from group s (1 or 2) is defined as X_is = mu + epsilon_is, with the same mean for each group s.
- The alternative hypothesis (full model) is that a score from group s (1 or 2) is defined as X_is = mu_s + epsilon_is, with a different mean for each group s.

AIC
- AIC and its variants (DIC, WAIC) are a way of comparing model structures: one mean or two means?
- They always use maximum likelihood estimates of the parameters.
- Bayesian approaches identify a posterior distribution of parameter values. We should use that information!

Models of what?
- We have been building models of trial-level scores, for example:

    # models specified with the rethinking package
    FFmodel1 <- map(
      alist(
        HappinessRating ~ dnorm(mu, sigma),
        mu <- a1*PenInTeeth + a2*NoPen + a3*PenInLips,
        a1 ~ dnorm(50, 100),
        a2 ~ dnorm(50, 100),
        a3 ~ dnorm(50, 100),
        sigma ~ dunif(0, 50)
      ),
      data = FFdata
    )

    MSLinearModel <- map2stan(
      alist(
        RT_ms ~ dnorm(mu, sigma),
        mu <- a0 + a1*Proximity + a2*Size + a3*Color + a4*Contrast,
        a0 ~ dnorm(1000, 1000),
        a1 ~ dnorm(0, 20),
        a2 ~ dnorm(0, 50),
        a3 ~ dnorm(0, 1),
        a4 ~ dnorm(0, 500),
        sigma ~ dunif(0, 2000)
      ),
      data = MSdata
    )

Models of what?
- Modeling trial-level scores is not the only option.
- In traditional hypothesis testing, we care more about effect sizes (the signal-to-noise ratio) than about individual scores.
- Of course, the effect size is derived from the individual scores.
- In some cases, it is enough to model the effect size itself rather than the individual scores: Cohen's d, the t-statistic, the p-value, or the correlation r.

Models of means
- It is not really practical, but consider a case where we assume the population variance is known (and equals 1) and we want to compare null and alternative hypotheses that each specify a fixed value for the mean.
- The likelihood of any given observed sample mean is derived from the sampling distribution of the mean. Suppose n = 100 (one sample).
- Depending on the value of the observed sample mean, the data may be more likely under the null than under the alternative, or more likely under the alternative than under the null.

Bayes Factor
- The Bayes factor is the ratio of the likelihood of the data under the null to the likelihood under the alternative (or the other way around):
  BF01 = P(D | H0) / P(D | H1)

Decision depends on alternative
- The likelihood of the observed sample mean is still derived from the sampling distribution (n = 100, one sample).
- For a fixed sample mean, evidence for the alternative only happens for alternative population mean values in a given range.
- For big alternative values, the observed sample mean is less likely than for a null population value.
- The sample mean may be unlikely under both models (Rouder et al., 2009).
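To make the point-hypothesis comparison concrete, here is a minimal R sketch (not from the slides) that computes the likelihood of an observed sample mean under a point null and a point alternative, using the setup above: known population standard deviation of 1 and n = 100. The alternative mean and the observed sample mean are made-up values for illustration.

    # Minimal sketch (illustrative values): likelihood of an observed sample
    # mean under a point null and a point alternative, known sigma = 1, n = 100.
    n    <- 100            # sample size, as on the slides
    se   <- 1 / sqrt(n)    # SD of the sampling distribution of the mean
    mu0  <- 0              # null population mean
    mu1  <- 0.3            # hypothetical alternative population mean (made up)
    xbar <- 0.25           # hypothetical observed sample mean (made up)

    like0 <- dnorm(xbar, mean = mu0, sd = se)   # P(D | H0)
    like1 <- dnorm(xbar, mean = mu1, sd = se)   # P(D | H1)
    c(BF01 = like0 / like1, BF10 = like1 / like0)

Moving xbar around, or moving mu1 far from the data, flips which hypothesis the data favor, which is the point of the slides above.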
Models of means
- Typically, we do not hypothesize a specific value for the alternative, but a range of plausible values.

Likelihoods
- For the null, we compute the likelihood in the same way as before (again suppose n = 100, one sample).
- For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values.

Average likelihood
- The average weights the likelihood for each given value of mu (from the sampling distribution) by the prior for that value of mu.

Bayes Factor
- The Bayes factor is now the ratio of the likelihood for the null to the average likelihood for the alternative:
  BF01 = P(D | H0) / P(D | H1)

Uncertainty
- The prior standard deviation for mu establishes the range of plausible values for mu: a narrow prior is less flexible, a broad prior is more flexible.
- With a very narrow prior, you may not fit the data.
- With a very broad prior, you will fit well for some values of mu and poorly for other values of mu.
- Uncertainty in the prior functions similarly to the penalty for parameters in AIC.

Penalty
- Averaging acts like a penalty for extra parameters (Rouder et al., 2009).

Models of effect size
- Consider the case of a two-sample t-test. We often care about the standardized effect size, delta = (mu1 - mu2) / sigma, which we can estimate from the data as Cohen's d = (xbar1 - xbar2) / s.

Models of effect size
- If we were doing traditional hypothesis testing, we would compare a null model (mu1 = mu2) against an alternative (mu1 != mu2).
- Equivalent statements can be made using the standardized effect size (delta = 0 versus delta != 0), as long as the standard deviation is not zero.

Priors on effect size
- For the null, the prior is (again) a spike at zero.

JZS priors on effect size
- For the alternative, a good choice is a Cauchy distribution (a t-distribution with df = 1) on the effect size (Rouder et al., 2009; Jeffreys, Zellner, Siow).
- It is a good choice because the integration for the alternative hypothesis can be done numerically. The resulting Bayes factor depends only on t, the t-value you would compute in a hypothesis test (from the data), and v, the degrees of freedom (from the data).
- This might not look easy, but it is simple to calculate with a computer.

Variations of JZS priors
- Scale parameter r: bigger values make for a broader prior. More flexible, but more penalty!
- Medium: r = sqrt(2)/2. Wide: r = 1. Ultrawide: r = sqrt(2).
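As a sketch of what "simple to calculate with a computer" means, the function below evaluates the one-dimensional integral for the JZS Bayes factor numerically with R's integrate(), following the form of the equation in Rouder et al. (2009). The function name jzsBF10 is mine, and the two-sample case uses the standard effective sample size n1*n2/(n1+n2); treat it as an illustrative sketch rather than a replacement for the BayesFactor package.

    # Sketch of the JZS Bayes factor computed from t and the sample sizes,
    # following the form of the integral in Rouder et al. (2009).
    # delta ~ Cauchy(0, r) is expressed as delta | g ~ Normal(0, g) with
    # g ~ inverse-gamma(1/2, r^2/2), leaving a single integral over g.
    jzsBF10 <- function(t, n1, n2 = NULL, r = sqrt(2) / 2) {
      if (is.null(n2)) {               # one-sample design
        N  <- n1
        nu <- n1 - 1
      } else {                         # two-sample design: effective N and df
        N  <- n1 * n2 / (n1 + n2)
        nu <- n1 + n2 - 2
      }
      m0 <- (1 + t^2 / nu)^(-(nu + 1) / 2)           # marginal likelihood under H0
      integrand <- function(g) {                     # H1: average over g
        (1 + N * g)^(-1 / 2) *
          (1 + t^2 / ((1 + N * g) * nu))^(-(nu + 1) / 2) *
          r / sqrt(2 * pi) * g^(-3 / 2) * exp(-r^2 / (2 * g))
      }
      m1 <- integrate(integrand, lower = 0, upper = Inf)$value
      m1 / m0                                        # BF10
    }

    jzsBF10(t = 2.2, n1 = 15, n2 = 15)   # should be close to the BayesFactor result below

Passing a different r shows how the scale of the Cauchy prior changes the Bayes factor.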
How do we use it?
- Super easy.
- My own web site for a two-sample t-test: http://psych.purdue.edu/~gfrancis/EquivalentStatistics/
- Rouder's web site: http://pcl.missouri.edu/bayesfactor
- In R:

    library(BayesFactor)
    ttest.tstat(t = 2.2, n1 = 15, n2 = 15, simple = TRUE)
    #      B10
    # 1.993006

What does it mean?
- Guidelines for interpreting the Bayes factor:
  BF 1–3: Anecdotal
  BF 3–10: Substantial
  BF 10–30: Strong
  BF 30–100: Very strong
  BF >100: Decisive

How does it compare to NHST?
- To get BF > 10, you need rather large t values, and bigger still with larger sample sizes.

Conclusions
- JZS Bayes factors are easy to calculate, and the results are pretty easy to understand.
- They are a bit arbitrary to set up: Why not other priors? How do you pick the scale factor? The criteria for interpretation are also arbitrary.
- Still, they are a fairly painless introduction to Bayesian methods.
- There are some things to watch out for (next time).
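As a closing illustration of the comparison to NHST, here is a small sketch (my addition, not from the slides) that uses ttest.tstat() from the BayesFactor package together with uniroot() to find the two-sample t value needed to reach BF10 = 3 or BF10 = 10 at several per-group sample sizes; the helper tNeeded() is hypothetical.

    # Sketch: how big must t be (two-sample, equal n) before BF10 reaches a
    # given threshold? tNeeded() is a hypothetical helper, not part of BayesFactor.
    library(BayesFactor)

    tNeeded <- function(targetBF, n, rscale = "medium") {
      f <- function(t) {
        ttest.tstat(t = t, n1 = n, n2 = n, rscale = rscale, simple = TRUE) - targetBF
      }
      uniroot(f, interval = c(0.1, 20))$root
    }

    for (n in c(15, 50, 200)) {
      cat(sprintf("n per group = %3d: t for BF10 = 3 is %.2f, for BF10 = 10 is %.2f\n",
                  n, tNeeded(3, n), tNeeded(10, n)))
    }

Since the conventional p = .05 cutoff for these designs sits near t = 2.0, the printed thresholds illustrate the slide's point that BF > 10 demands considerably stronger data.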