Let’s continue to do a Bayesian analysis
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016
Purdue University
Visual Search
A classic experiment in perception/attention involves visual search
Respond as quickly as possible whether an image contains a target (a green
circle) or not
Vary number of distractors: 4, 16, 32, 64
Vary type of distractors: feature (different color), conjunctive (different color or
shape)
Visual Search
Typical results: For conjunctive distractors, response time increases with the
number of distractors
Linear model
Suppose you want to model the search time on the
Conjunctive search trials when the target is Absent as a
linear equation
Let’s do it for a single participant
We are basically going through Section 4.4 of the text, but
using a new data set
Download files from the class web site and follow along in
class
We built our model using the map function
MAP estimates
Maximum a posteriori (MAP) model fit
Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
a ~ dnorm(1000, 500)
b ~ dnorm(0, 100)
sigma ~ dunif(0, 500)
MAP values:
         a         b     sigma
 830.54107  41.41798 333.16293
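As a quick sanity check on these estimates, the MAP line can be evaluated at any number of distractors. A minimal base-R sketch using the values above (the variable names are illustrative):

```r
# Predicted mean RT_ms at 35 distractors from the MAP line mu = a + b * N
a <- 830.54107
b <- 41.41798
mu_at_35_map <- a + b * 35
round(mu_at_35_map, 2)   # 2280.17 ms
```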
Posterior
You can estimate the posterior for a function by making draws from the posterior
For example, what is the posterior distribution of the predicted mean value for 35 distractors?
numVariableLines=10000
post<-extract.samples(VSmodel, n=numVariableLines)
mu_at_35 <- post$a + post$b * 35
[Figure: density of the 10,000 sampled values of mu | NumDistract=35]
You can ask all kinds of questions about predictions and so forth by just using probability
Posterior
What is the 89% highest posterior density interval of mu at NumberDistractors=35?
HPDI(mu_at_35, prob=0.89)
    |0.89     0.89|
 2155.519  2400.441
Why 89%? Because it is prime
Why 95% for a CI?
HPDI(mu_at_35, prob=0.95)
    |0.95     0.95|
 2128.870  2428.384
CI95 = (2111.8, 2450.9)
Why is the CI broader than the HPDI?
[Figure: density of mu | NumDistract=35 with the interval limits marked]
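As a sketch of what HPDI() is doing under the hood: sort the posterior draws and find the shortest window that contains the requested probability mass. This is a minimal base-R reimplementation, not the rethinking code; `hpdi` and `x` are illustrative names:

```r
# Shortest interval containing about `prob` of the samples:
# slide a window over the sorted draws and keep the narrowest one
hpdi <- function(samples, prob = 0.89) {
  sorted <- sort(samples)
  n <- length(sorted)
  m <- floor(prob * n)
  widths <- sorted[(m + 1):n] - sorted[1:(n - m)]
  i <- which.min(widths)
  c(sorted[i], sorted[i + m])
}

set.seed(1)
x <- rnorm(10000, mean = 2280, sd = 77)  # stand-in for the mu_at_35 draws
hpdi(x, prob = 0.89)
```

For a symmetric posterior the HPDI and the central (percentile) interval nearly coincide; for a skewed posterior the HPDI shifts toward the bulk of the density, which is why it is the "smallest" interval.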
HPDI vs CI
HPDI95= (2128.9, 2428.4)
CI95= (2111.8, 2450.9)
Pretty similar, so why bother? Different interpretations:
HPDI is the smallest set of values of mean RT_ms for
NumberDistractors=35 that contains 95% of the posterior probability
If the model is valid, the priors are appropriate, and so forth
CI is the smallest set of values that results from a process
that 95% of the time includes the true mean RT_ms for
NumberDistractors=35
If the model is valid and so forth
HPDI vs CI
What is the probability that the mean of RT_ms for NumberDistractors=35 is greater than 2400 ms?
Treat the posterior as a normal distribution:
mean(mu_at_35) = 2280.252
sd(mu_at_35) = 76.74657
Area greater than 2400 is 0.0593
Compute directly from the posterior samples:
length(mu_at_35[mu_at_35 > 2400])/length(mu_at_35) = 0.0584
HPDI is a description of the posterior distribution
CI is a description of the sample and an algorithm that connects it (probabilistically) to the true mean
[Figure: density of mu | NumDistract=35 with the region above 2400 marked]
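The normal-approximation number above can be reproduced with base R's pnorm, using the posterior mean and sd reported on the slide:

```r
# P(mu_at_35 > 2400) under a normal approximation to the posterior,
# using the summary statistics from the slide
p_above_2400 <- 1 - pnorm(2400, mean = 2280.252, sd = 76.74657)
round(p_above_2400, 4)   # 0.0593
```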
HPDI vs CI
You cannot do this with a CI because the CI is not a
summary of the posterior distribution
Instead, the limits of the CI are the values that are output by
a process that for 95% of random samples will produce limits
that contain the true mean
If that sounds kind of silly, then you are following along just fine
In practice, the limits of a CI may be similar to the limits of an
HPDI, but in principle, they could hardly be more different
And the limit values are sometimes not similar at all
It depends on the priors
Prediction uncertainty
Our linear model uses a and b to predict the mean RT_ms
for any given NumberDistractors value
There is uncertainty in this prediction, and we should
represent it for each value of NumberDistractors
NumberDistractors.seq<-seq(from=1, to=65, by=1)
Generates a vector [1, 2, 3, … 65]
mu<-link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq))
Provides a posterior distribution for mean predicted RT_ms for each value in the vector (a great big 2D matrix)
mu.mean <- apply(mu, 2, mean)
mu.HPDI <-apply(mu, 2, HPDI, prob=0.89)
A shortcut for applying a function (“mean” or “HPDI”) to the columns (dimension 2) of a matrix
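The apply() idiom works on any matrix; here is a toy base-R illustration (the matrix is made up, standing in for the samples-by-values output of link):

```r
# Rows play the role of posterior samples, columns the NumberDistractors values
mu_toy <- matrix(c(1, 2, 3,
                   4, 5, 6,
                   7, 8, 9,
                   10, 11, 12), nrow = 3)  # filled column by column
apply(mu_toy, 2, mean)   # one mean per column: 2 5 8 11
```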
Prediction uncertainty
Plot the raw data
plot(RT_ms ~ NumberDistractors, data=VSdata2)
Plot the MAP line
lines(NumberDistractors.seq, mu.mean)
For all practical purposes, this is the same as plotting the regression
line from the estimated coefficients, but it is estimated from the
sampled posterior distribution for different NumberDistractors values
Plot a shaded region for the 89% HPDI
shade(mu.HPDI, NumberDistractors.seq)
Prediction uncertainty
[Figure: RT_ms vs NumberDistractors, with the MAP line and the shaded 89% HPDI region]
Nice summary of predicting average RT_ms values
Predicting individual points
We have been predicting mean RT_ms values for each
NumberDistractors value
Our model is
Maximum a posteriori (MAP) model fit
Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
If we try to predict any given RT_ms (not just a mean) we
have to consider that we are sampling that value from a
population with a standard deviation of sigma
We need to consider all of the uncertainty
Predicting individual points
The model can just as easily generate individual simulated samples as
means
sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq))
We can identify the “middle” 89% of such simulated samples for each
NumberDistractors value
RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)
PI is a function from the rethinking library
And plot everything
dev.new()
plot(RT_ms ~ NumberDistractors, data=VSdata2)
lines(NumberDistractors.seq, mu.mean) # MAP line for means
shade(mu.HPDI, NumberDistractors.seq) # HPDI for means
shade(RT_ms.PI, NumberDistractors.seq) # PI for individual values
Predicting individual points
[Figure: RT_ms vs NumberDistractors, with the MAP line, the shaded HPDI for means, and the wider shaded 89% PI for individual values]
Here, everything looks fine to me
This kind of comparison is useful for “checking” on whether the model makes sense
Bayesian vs. Linear regression
Are the predictions the same?
For the best fitting line, we get nearly the same from the Bayesian MAP approach as from typical linear regression
[Figure: RT_ms vs NumberDistractors, with the Bayesian MAP line and the linear regression line nearly overlapping]
Bayesian vs. Linear regression
MAP: What is the probability of observing a random trial with RT_ms<1000 for NumberDistractors=15?
length(sim.RT_ms[, 15][sim.RT_ms[, 15]<1000])/length(sim.RT_ms[, 15])
0.104
[Figure: density of simulated RT_ms | NumDistract=15, with the region below 1000 marked]
Bayesian vs. Linear regression
Linear regression: What is the probability of observing a random trial with RT_ms<1000 for NumberDistractors=15?
Mean = 832.945 + 41.383 * 15 = 1453.69
RT_ms ~ N(1453.69, 348.8)
Area below 1000 is 0.0967
Why more likely to get this “rare” event in the Bayesian model?
[Figure: RT_ms vs NumberDistractors with the linear regression line]
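The 0.0967 figure is just a normal tail area and can be checked directly with pnorm:

```r
# P(RT_ms < 1000) under the linear-regression model RT_ms ~ N(1453.69, 348.8)
p_below_1000 <- pnorm(1000, mean = 1453.69, sd = 348.8)
round(p_below_1000, 4)   # 0.0967
```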
Bayesian vs. Linear regression
The difference is in the representation of uncertainty
Typical linear regression uses the best fitting straight line, and then
estimates RT_ms from that model
Any other choice would be worse (in terms of reducing error for the observed data)
The Bayesian MAP also has a best fitting straight line model, but its
prediction is a posterior distribution of many different straight line models
(with different parameters)
It estimates RT_ms from the full posterior distribution rather than just the “best fitting”
model
There is almost always uncertainty about the model, so there is uncertainty about the
values of RT_ms beyond the standard deviation in the regression equation
Predictions from typical linear regression ignore the uncertainty about
the model
They tend to be overly optimistic
Visual Search
Typical results: For conjunctive distractors, response time increases with the
number of distractors
Visual Search
Previously we fit a model to the Target absent condition
We can easily extend it to include the Target present condition
VSdata2<-subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$DistractorType=="Conjunction")
Define a dummy variable with value 0 if target is absent and 1 if the
target is present
VSdata2$TargetIsPresent <- ifelse(VSdata2$Target=="Present", 1, 0)
Visual Search
Define the model
VSmodel <- map(
alist( RT_ms ~ dnorm(mu, sigma),
mu <- a + (b*TargetIsPresent + (1 - TargetIsPresent)*b2)*NumberDistractors,
a ~ dnorm(1000, 500),
b ~ dnorm(0, 100),
b2 ~ dnorm(0, 100),
sigma ~ dunif(0, 2000)
), data=VSdata2 )
Note, parameter b is the slope for when the target is present and b2 is
the slope when the target is absent
Both conditions have the same model standard deviation and intercept
Model results
Maximum a posteriori (MAP) model fit
Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + (b * TargetIsPresent + (1 - TargetIsPresent) * b2) * NumberDistractors
a ~ dnorm(1000, 500)
b ~ dnorm(0, 100)
b2 ~ dnorm(0, 100)
sigma ~ dunif(0, 2000)
MAP values:
         a         b        b2     sigma
 858.59107  23.40139  40.78839 542.78734
Log-likelihood: -308.63
Compare with the model for Target absent only
MAP values:
         a         b     sigma
 624.42893  45.84785 357.47658
Model results
print(precis(VSmodel, corr=TRUE))
        Mean StdDev   5.5%   94.5%     a     b    b2 sigma
a     858.59 134.71 643.30 1073.89  1.00 -0.66 -0.66  0.03
b      23.40   4.39  16.38   30.42 -0.66  1.00  0.43 -0.02
b2     40.79   4.39  33.77   47.81 -0.66  0.43  1.00 -0.02
sigma 542.79  60.70 445.78  639.79  0.03 -0.02 -0.02  1.00
Best fitting lines
plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)
points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)
abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b"], col=col.alpha("red",1.0))
abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b2"], col=col.alpha("green",1.0))
numVariableLines=10000
numVariableLinesToPlot=20
post<-extract.samples(VSmodel, n=numVariableLines)
for(i in 1:numVariableLinesToPlot){
  abline(a=post$a[i], b=post$b[i], col=col.alpha("red",0.3), lty=5)
  abline(a=post$a[i], b=post$b2[i], col=col.alpha("green",0.3), lty=5)
}
[Figure: RT_ms vs NumberDistractors, with the MAP lines and 20 sampled posterior lines for each condition]
HPDI (Target absent)
# Plot HPDI for Target absent
dev.new()
plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)
# Define a sequence of NumberDistractors to compute predictions
NumberDistractors.seq<-seq(from=1, to=65, by=1)
# Use link to compute mu for each sample from the posterior and for each value in NumberDistractors.seq
mu_absent<-link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq, TargetIsPresent=0))
mu_absent.mean <- apply(mu_absent, 2, mean)
mu_absent.HPDI <- apply(mu_absent, 2, HPDI, prob=0.89)
# Plot the MAP line (same as the abline done previously from the estimated coefficients)
lines(NumberDistractors.seq, mu_absent.mean)
shade(mu_absent.HPDI, NumberDistractors.seq, col=col.alpha("green",0.3))
HPDI (Target present)
# Plot HPDI for Target present
points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)
# Use link to compute mu for each sample from the posterior and for each value in NumberDistractors.seq
mu_present<-link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq, TargetIsPresent=1))
mu_present.mean <- apply(mu_present, 2, mean)
mu_present.HPDI <- apply(mu_present, 2, HPDI, prob=0.89)
# Plot the MAP line
lines(NumberDistractors.seq, mu_present.mean)
shade(mu_present.HPDI, NumberDistractors.seq, col=col.alpha("red",0.3))
[Figure: RT_ms vs NumberDistractors, with shaded 89% HPDI bands for both conditions]
Prediction intervals (target absent)
# Prediction interval for RT_ms raw scores
# Target absent
# Generate many sample RT_ms scores for NumberDistractors.seq using the model
sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq, TargetIsPresent=0))
# Identify limits of the middle 89% of sampled values for each NumberDistractors (PI is a function from the rethinking library)
RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)
# Plot
dev.new()
plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)
lines(NumberDistractors.seq, mu_absent.mean) # MAP line for means
shade(mu_absent.HPDI, NumberDistractors.seq) # shaded HPDI for estimates of means
shade(RT_ms.PI, NumberDistractors.seq, col=col.alpha("green",0.3)) # shaded prediction interval for simulated RT_ms values
Prediction intervals (target present)
# Target present
# Generate many sample RT_ms scores for NumberDistractors.seq using the model
sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq, TargetIsPresent=1))
# Identify limits of the middle 89% of sampled values for each NumberDistractors (PI is a function from the rethinking library)
RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)
# Plot
points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)
lines(NumberDistractors.seq, mu_present.mean) # MAP line for means
shade(mu_present.HPDI, NumberDistractors.seq) # shaded HPDI for estimates of means
shade(RT_ms.PI, NumberDistractors.seq, col=col.alpha("red",0.3)) # shaded prediction interval for simulated RT_ms values
[Figure: RT_ms vs NumberDistractors, with shaded 89% prediction intervals for both conditions]
Conclusions
HPDI vs. CI
Predictions should consider uncertainty about the model
Bayesian analysis allows you to do this in a way that
cannot be done with typical linear regression
Extending the model to consider different slopes for
different conditions is straightforward