Let's continue to do a Bayesian analysis
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016, Purdue University

Visual Search
A classic experiment in perception/attention involves visual search.
Respond as quickly as possible whether an image contains a target (a green circle) or not.
Vary the number of distractors: 4, 16, 32, 64.
Vary the type of distractors: feature (different color) or conjunctive (different color or shape).

Visual Search
Typical results: for conjunctive distractors, response time increases with the number of distractors.

Linear model
Suppose you want to model the search time on the Conjunctive search trials when the target is Absent as a linear equation.
Let's do it for a single participant.
We are basically going through Section 4.4 of the text, but using a new data set.
Download the files from the class web site and follow along in class.
We built our model using the map function.

MAP estimates
Maximum a posteriori (MAP) model fit

Formula:
  RT_ms ~ dnorm(mu, sigma)
  mu <- a + b * NumberDistractors
  a ~ dnorm(1000, 500)
  b ~ dnorm(0, 100)
  sigma ~ dunif(0, 500)

MAP values:
          a         b     sigma
  830.54107  41.41798 333.16293

Posterior
You can estimate the posterior for a function by making draws from the posterior.
For example, what is the posterior distribution of the predicted mean value for 35 distractors?

  numVariableLines = 10000
  post <- extract.samples(VSmodel, n=numVariableLines)
  mu_at_35 <- post$a + post$b * 35

You can ask all kinds of questions about predictions and so forth by just using probability.
[Figure: density of the 10,000 samples of mu | NumDistract=35]

Posterior
What is the 89% highest posterior density interval of mu at NumberDistractors=35?

  HPDI(mu_at_35, prob=0.89)
     |0.89    0.89|
  2155.519 2400.441

Why 89%? Because it is prime. Why 95% for a CI?

  HPDI(mu_at_35, prob=0.95)
     |0.95    0.95|
  2128.870 2428.384

CI95 = (2111.8, 2450.9)
Why is the CI broader than the HPDI?

HPDI vs CI
HPDI95 = (2128.9, 2428.4)
CI95 = (2111.8, 2450.9)
Pretty similar, so why bother? They have different interpretations:
The HPDI is the smallest set of values of mean RT_ms for NumberDistractors=35 that has a 95% probability (if the model is valid, the priors are appropriate, and so forth).
The CI is the smallest set of values that results from a process that 95% of the time includes the true mean RT_ms for NumberDistractors=35 (if the model is valid, and so forth).

HPDI vs CI
The HPDI is a description of the posterior distribution.
The CI is a description of the sample and an algorithm that connects it (probabilistically) to the true mean.
What is the probability that the mean of RT_ms for NumberDistractors=35 is greater than 2400 ms?

  mean(mu_at_35) = 2280.252
  sd(mu_at_35) = 76.74657

Treat the posterior as a normal distribution: the area greater than 2400 is 0.0593.
Or compute directly from the posterior samples:

  length(mu_at_35[mu_at_35 > 2400]) / length(mu_at_35) = 0.0584

HPDI vs CI
You cannot do this with a CI, because the CI is not a summary of the posterior distribution.
Instead, the limits of the CI are the values output by a process that, for 95% of random samples, will produce limits that contain the true mean.
If that sounds kind of silly, then you are following along just fine.
In practice, the limits of a CI may be similar to the limits of an HPDI, but in principle they could hardly be more different.
And the limit values are sometimes not similar at all; it depends on the priors.

Prediction uncertainty
Our linear model uses a and b to predict the mean RT_ms for any given NumberDistractors value.
There is uncertainty in this prediction, and we should represent it for each value of NumberDistractors.

  # Generates the vector [1, 2, 3, ..., 65]
  NumberDistractors.seq <- seq(from=1, to=65, by=1)
  # Provides a posterior distribution of the mean predicted RT_ms for each
  # value in the vector (a great big 2D matrix)
  mu <- link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq))
  # apply is a shortcut for applying a function ("mean" or "HPDI") to the
  # columns (dimension 2) of a matrix
  mu.mean <- apply(mu, 2, mean)
  mu.HPDI <- apply(mu, 2, HPDI, prob=0.89)

Prediction uncertainty

  # Plot the raw data
  plot(RT_ms ~ NumberDistractors, data=VSdata2)
  # Plot the MAP line
  lines(NumberDistractors.seq, mu.mean)
  # Plot a shaded region for the 89% HPDI
  shade(mu.HPDI, NumberDistractors.seq)

For all practical purposes, this is the same as plotting the regression line from the estimated coefficients, but it is estimated from the sampled posterior distribution for different NumberDistractors values.

Prediction uncertainty
[Figure: RT_ms vs. NumberDistractors with the MAP line and shaded 89% HPDI]
A nice summary of predicting average RT_ms values.

Predicting individual points
We have been predicting mean RT_ms values for each NumberDistractors value. Our model is:

  RT_ms ~ dnorm(mu, sigma)
  mu <- a + b * NumberDistractors

If we try to predict any given RT_ms (not just a mean), we have to consider that we are sampling that value from a population with a standard deviation of sigma.
We need to consider all of the uncertainty.

Predicting individual points
The model can just as easily generate individual simulated samples as means:

  sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq))

We can identify the "middle" 89% of such simulated samples for each NumberDistractors value (PI is a function from the rethinking library):

  RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)

And plot everything:

  dev.new()
  plot(RT_ms ~ NumberDistractors, data=VSdata2)
  lines(NumberDistractors.seq, mu.mean)    # MAP line for means
  shade(mu.HPDI, NumberDistractors.seq)    # HPDI for means
  shade(RT_ms.PI, NumberDistractors.seq)   # PI for individual values

Predicting individual points
[Figure: data, MAP line, 89% HPDI band for means, and 89% PI for individual values]
Here, everything looks fine to me.
This kind of comparison is useful for "checking" whether the model makes sense.

Bayesian vs. Linear regression
Are the predictions the same?
For the best fitting line, we get nearly the same from the Bayesian MAP approach as from typical linear regression.
[Figure: best fitting lines from both approaches]

Bayesian vs. Linear regression
What is the probability of observing a random trial with RT_ms < 1000 for NumberDistractors=15?
MAP:

  length(sim.RT_ms[, 15][sim.RT_ms[, 15] < 1000]) / length(sim.RT_ms[, 15]) = 0.104

[Figure: density of simulated RT_ms | NumDistract=15]

Bayesian vs. Linear regression
What is the probability of observing a random trial with RT_ms < 1000 for NumberDistractors=15?
Linear regression:

  Mean = 832.945 + 41.383 * 15 = 1453.69
  RT_ms ~ N(1453.69, 348.8)
  P(RT_ms < 1000) = 0.0967

Why is this "rare" event more likely in the Bayesian model?

Bayesian vs. Linear regression
The difference is in the representation of uncertainty.
Typical linear regression uses the best fitting straight line and then estimates RT_ms from that model. Any other choice would be worse (in terms of reducing error for the observed data).
The Bayesian MAP also has a best fitting straight line model, but its prediction is a posterior distribution of many different straight line models (with different parameters). It estimates RT_ms from the full posterior distribution rather than just the "best fitting" model.
There is almost always uncertainty about the model, so there is uncertainty about the values of RT_ms beyond the standard deviation in the regression equation.
Predictions from typical linear regression ignore the uncertainty about the model; they tend to be overly optimistic.

Visual Search
Typical results: for conjunctive distractors, response time increases with the number of distractors.

Visual Search
Previously we fit a model to the Target absent condition. We can easily extend it to include the Target present condition.

  VSdata2 <- subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$DistractorType=="Conjunction")

Define a dummy variable with value 0 if the target is absent and 1 if the target is present:

  VSdata2$TargetIsPresent <- ifelse(VSdata2$Target=="Present", 1, 0)

Visual Search
Define the model:

  VSmodel <- map(
    alist(
      RT_ms ~ dnorm(mu, sigma),
      mu <- a + (b*TargetIsPresent + (1 - TargetIsPresent)*b2)*NumberDistractors,
      a ~ dnorm(1000, 500),
      b ~ dnorm(0, 100),
      b2 ~ dnorm(0, 100),
      sigma ~ dunif(0, 2000)
    ), data=VSdata2 )

Note: parameter b is the slope when the target is present, and b2 is the slope when the target is absent. Both conditions share the same model standard deviation and intercept.

Model results
Maximum a posteriori (MAP) model fit

Formula:
  RT_ms ~ dnorm(mu, sigma)
  mu <- a + (b * TargetIsPresent + (1 - TargetIsPresent) * b2) * NumberDistractors
  a ~ dnorm(1000, 500)
  b ~ dnorm(0, 100)
  b2 ~ dnorm(0, 100)
  sigma ~ dunif(0, 2000)

MAP values:
          a         b        b2     sigma
  858.59107  23.40139  40.78839 542.78734
Log-likelihood: -308.63

Compare with the model for Target absent only:
MAP values:
          a         b     sigma
  624.42893  45.84785 357.47658

Model results

  print(precis(VSmodel, corr=TRUE))
          Mean StdDev   5.5%   94.5%     a     b    b2 sigma
  a     858.59 134.71 643.30 1073.89  1.00 -0.66 -0.66  0.03
  b      23.40   4.39  16.38   30.42 -0.66  1.00  0.43 -0.02
  b2     40.79   4.39  33.77   47.81 -0.66  0.43  1.00 -0.02
  sigma 542.79  60.70 445.78  639.79  0.03 -0.02 -0.02  1.00

Best fitting lines

  plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)
  points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)
  abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b"], col=col.alpha("red",1.0))
  abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b2"], col=col.alpha("green",1.0))
  numVariableLines = 10000
  numVariableLinesToPlot = 20
  post <- extract.samples(VSmodel, n=numVariableLines)
  for(i in 1:numVariableLinesToPlot){
    abline(a=post$a[i], b=post$b[i], col=col.alpha("red",0.3), lty=5)
    abline(a=post$a[i], b=post$b2[i], col=col.alpha("green",0.3), lty=5)
  }

[Figure: data for both conditions with MAP lines and sampled posterior lines]

HPDI (Target absent)

  # Plot HPDI for TargetAbsent
  dev.new()
  plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)
  # Define a sequence of NumberDistractors to compute predictions
  NumberDistractors.seq <- seq(from=1, to=65, by=1)
  # Use link to compute mu for each sample from the posterior and for each
  # value in NumberDistractors.seq
  mu_absent <- link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq, TargetIsPresent=0))
  mu_absent.mean <- apply(mu_absent, 2, mean)
  mu_absent.HPDI <- apply(mu_absent, 2, HPDI, prob=0.89)
  # Plot the MAP line (same as the abline done previously from the coefficients)
  lines(NumberDistractors.seq, mu_absent.mean)
  shade(mu_absent.HPDI, NumberDistractors.seq, col=col.alpha("green",0.3))

HPDI (Target present)

  # Plot HPDI for TargetPresent
  points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)
  # Use link to compute mu for each sample from the posterior and for each
  # value in NumberDistractors.seq
  mu_present <- link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq, TargetIsPresent=1))
  mu_present.mean <- apply(mu_present, 2, mean)
  mu_present.HPDI <- apply(mu_present, 2, HPDI, prob=0.89)
  # Plot the MAP line
  lines(NumberDistractors.seq, mu_present.mean)
  shade(mu_present.HPDI, NumberDistractors.seq, col=col.alpha("red",0.3))

[Figure: data with MAP lines and 89% HPDI bands for both conditions]

Prediction intervals (target absent)

  # Prediction interval for RT_ms raw scores
  # Target absent
  # Generate many sample RT_ms scores for NumberDistractors.seq using the model
  sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq, TargetIsPresent=0))
  # Identify the limits of the middle 89% of sampled values for each
  # NumberDistractors (PI is a function from the rethinking library)
  RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)
  # Plot
  dev.new()
  plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)
  lines(NumberDistractors.seq, mu_absent.mean)                        # MAP line for means
  shade(mu_absent.HPDI, NumberDistractors.seq)                        # shaded HPDI for estimates of means
  shade(RT_ms.PI, NumberDistractors.seq, col=col.alpha("green",0.3))  # shaded prediction interval for simulated RT_ms values

Prediction intervals (target present)

  # Target present
  sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq, TargetIsPresent=1))
  RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)
  # Plot
  points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)
  lines(NumberDistractors.seq, mu_present.mean)                       # MAP line for means
  shade(mu_present.HPDI, NumberDistractors.seq)                       # shaded HPDI for estimates of means
  shade(RT_ms.PI, NumberDistractors.seq, col=col.alpha("red",0.3))    # shaded prediction interval for simulated RT_ms values

[Figure: data, MAP lines, HPDI bands, and prediction intervals for both conditions]

Conclusions
HPDI vs. CI: the HPDI summarizes the posterior distribution; the CI does not.
Predictions should consider uncertainty about the model.
Bayesian analysis allows you to do this in a way that cannot be done with typical linear regression.
Extending the model to consider different slopes for different conditions is straightforward.
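The HPDI reported above is just the narrowest interval that contains a given fraction of the posterior draws. A minimal sketch of that computation, in Python rather than R (NumPy stands in for the rethinking library's HPDI function; the normal draws are only an illustrative stand-in for the real posterior samples of mu_at_35):

```python
import numpy as np

def hpdi(samples, prob=0.89):
    """Narrowest interval containing `prob` of the samples."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.floor(prob * n))      # samples spanned by each candidate interval
    widths = x[k:] - x[:n - k]       # width of every interval covering k samples
    i = int(np.argmin(widths))       # start of the narrowest one
    return x[i], x[i + k]

rng = np.random.default_rng(1)
# Illustrative draws only: roughly matching the slide's posterior for
# mu at 35 distractors (mean ~2280, sd ~77), not the real samples
mu_at_35 = rng.normal(2280.0, 77.0, size=10_000)
lo, hi = hpdi(mu_at_35, prob=0.89)
```

On sorted draws, every window covering 89% of the samples is a candidate; the narrowest one is the HPDI, which is why it can never be wider than a central percentile interval of the same probability.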
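The two tail-probability calculations on the HPDI vs CI slide (a normal approximation vs. directly counting posterior samples) can be sketched in Python; the draws here are simulated from the slide's reported mean and sd, not the actual posterior:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
# Hypothetical draws matching the slide's reported summary:
# mean(mu_at_35) = 2280.252, sd(mu_at_35) = 76.747
mu_at_35 = rng.normal(2280.252, 76.747, size=10_000)

# 1) Treat the posterior as normal: area above 2400
m, s = mu_at_35.mean(), mu_at_35.std()
z = (2400 - m) / s
p_normal = 0.5 * (1 - erf(z / sqrt(2)))

# 2) Count posterior samples above 2400 directly
p_direct = (mu_at_35 > 2400).mean()
```

Both approaches land near the slide's values (0.0593 and 0.0584); the direct count makes no distributional assumption beyond the samples themselves.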
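What link and apply(mu, 2, ...) accomplish can be sketched with NumPy broadcasting; the posterior draws for a and b below are hypothetical values scattered around the slide's MAP estimates, not real model output:

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 10_000
# Hypothetical posterior draws around the MAP estimates a ~ 830.5 and
# b ~ 41.4 (the spreads are chosen for illustration only)
a = rng.normal(830.5, 60.0, size=n_draws)
b = rng.normal(41.4, 2.0, size=n_draws)

# link(): one predicted mean per (posterior draw, predictor value)
x = np.arange(1, 66)                        # NumberDistractors.seq
mu = a[:, None] + b[:, None] * x[None, :]   # shape (10000, 65)

# apply(mu, 2, mean): summarize each column
mu_mean = mu.mean(axis=0)
# Central 89% interval per column (a percentile stand-in for HPDI)
mu_lo, mu_hi = np.percentile(mu, [5.5, 94.5], axis=0)
```

Each column of the big matrix is a posterior distribution for the mean at one predictor value, so summarizing over columns gives the plotted line and band.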
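The reason the Bayesian predictive probability (0.104) exceeds the plug-in regression value (0.0967) is that each simulated RT comes from a different posterior draw of (a, b), which widens the predictive distribution beyond sigma alone. A Python sketch with hypothetical posterior spreads (the spreads and fixed sigma are assumptions for illustration):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
n = 200_000
# Hypothetical posterior spreads around the MAP estimates (a ~ 830, b ~ 41.4);
# sigma is held fixed here to keep the contrast simple
a = rng.normal(830.0, 60.0, size=n)
b = rng.normal(41.4, 2.0, size=n)
sigma = 333.0

# sim(): each simulated RT uses a different posterior draw of (a, b)
rt_sim = rng.normal(a + b * 15, sigma)
p_bayes = (rt_sim < 1000).mean()

# Plug-in regression: a single best-fitting line, RT ~ N(mean, sigma)
mean_plugin = 830.0 + 41.4 * 15
z = (1000 - mean_plugin) / sigma
p_plugin = 0.5 * (1 + erf(z / sqrt(2)))
```

The predictive variance is sigma squared plus the variance contributed by uncertainty in a and b, so the tail below 1000 ms picks up extra mass.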
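The dummy-variable trick in the mu formula simply switches between the two slopes. A small Python sketch of that formula using the slide's MAP estimates (a=858.59, b=23.40, b2=40.79):

```python
# Slide MAP estimates: a = 858.59, b = 23.40 (present), b2 = 40.79 (absent)
a, b, b2 = 858.59, 23.40, 40.79

def mu(number_distractors, target_is_present):
    # mu <- a + (b*TargetIsPresent + (1 - TargetIsPresent)*b2) * NumberDistractors
    slope = b * target_is_present + (1 - target_is_present) * b2
    return a + slope * number_distractors

present_32 = mu(32, 1)   # slope b applies
absent_32 = mu(32, 0)    # slope b2 applies
```

When TargetIsPresent is 1 the b2 term vanishes, and when it is 0 the b term vanishes, so one equation encodes two lines with a shared intercept.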
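The quantities precis reports (means, standard deviations, 5.5%/94.5% bounds, and parameter correlations) are all simple summaries of the posterior draws. A Python sketch, using hypothetical correlated normal draws for (a, b) built from the table's values (correlation -0.66):

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical correlated draws for (a, b), built from the precis table:
# means 858.59 and 23.40, sds 134.71 and 4.39, correlation -0.66
sd_a, sd_b, r = 134.71, 4.39, -0.66
cov = [[sd_a**2, r * sd_a * sd_b],
       [r * sd_a * sd_b, sd_b**2]]
draws = rng.multivariate_normal([858.59, 23.40], cov, size=20_000)

means = draws.mean(axis=0)                           # Mean column
sds = draws.std(axis=0)                              # StdDev column
lo, hi = np.percentile(draws, [5.5, 94.5], axis=0)   # 5.5% / 94.5% columns
corr = np.corrcoef(draws.T)[0, 1]                    # correlation entry
```

The negative a-b correlation in the table is the usual intercept-slope trade-off: steeper sampled lines pair with lower sampled intercepts.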
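The prediction interval for raw RT_ms scores is necessarily wider than the HPDI for the mean, because it adds trial-to-trial noise (sigma) on top of uncertainty about the line. A Python sketch with hypothetical draws for the target-absent condition, using the precis summaries as stand-in values:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
# Hypothetical posterior draws for the target-absent line, using the
# precis summaries (a: 858.59 +/- 134.71, b2: 40.79 +/- 4.39, sigma 542.79)
a = rng.normal(858.59, 134.71, size=n)
b2 = rng.normal(40.79, 4.39, size=n)
sigma = 542.79

x = 35                                   # one NumberDistractors value
mu_draws = a + b2 * x                    # uncertainty about the mean only
rt_draws = rng.normal(mu_draws, sigma)   # plus trial-to-trial noise

mean_width = np.subtract(*np.percentile(mu_draws, [94.5, 5.5]))
pred_width = np.subtract(*np.percentile(rt_draws, [94.5, 5.5]))
```

This is why the shaded prediction-interval band in the plots is much broader than the shaded HPDI band for the means.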