Let’s continue to do a Bayesian analysis
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016
Purdue University
Visual Search
A classic experiment in perception/attention involves visual search
Respond as quickly as possible whether an image contains a target (a green
circle) or not
Vary number of distractors: 4, 16, 32, 64
Vary type of distractors: feature (different color), conjunctive (different color or
shape)
Visual Search
Typical results: For conjunctive distractors, response time increases with the
number of distractors
Linear model
Suppose you want to model the search time on the
Conjunctive search trials when the target is Absent as a
linear equation
Let’s do it for a single participant
We are basically going through Section 4.4 of the text, but
using a new data set
Download files from the class web site and follow along in
class
We built our model using the map function
MAP estimates
Maximum a posteriori (MAP) model fit
Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
a ~ dnorm(1000, 500)
b ~ dnorm(0, 100)
sigma ~ dunif(0, 500)
MAP values:
         a         b     sigma
 830.54107  41.41798 333.16293
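As a quick sanity check on these estimates, the MAP line can be evaluated at any number of distractors. A minimal base-R sketch using the values above (the variable names are illustrative):

```r
# Predicted mean RT_ms at 35 distractors from the MAP line mu = a + b * N
a <- 830.54107
b <- 41.41798
mu_at_35_map <- a + b * 35
round(mu_at_35_map, 2)   # 2280.17 ms
```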
Posterior
You can estimate the posterior for a function by making draws from the posterior
For example, what is the posterior distribution of the predicted mean value for 35 distractors?
numVariableLines=10000
post<-extract.samples(VSmodel, n=numVariableLines)
mu_at_35 <- post$a + post$b * 35
[Figure: density of the 10,000 sampled values of mu | NumDistract=35]
You can ask all kinds of questions about predictions and so forth by just using probability
Posterior
What is the 89% highest posterior density interval of mu at NumberDistractors=35?
HPDI(mu_at_35, prob=0.89)
    |0.89     0.89|
 2155.519  2400.441
Why 89%? Because it is prime
Why 95% for a CI?
HPDI(mu_at_35, prob=0.95)
    |0.95     0.95|
 2128.870  2428.384
CI95 = (2111.8, 2450.9)
Why is the CI broader than the HPDI?
[Figure: density of mu | NumDistract=35 with the interval limits marked]
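As a sketch of what HPDI() is doing under the hood: sort the posterior draws and find the shortest window that contains the requested probability mass. This is a minimal base-R reimplementation, not the rethinking code; `hpdi` and `x` are illustrative names:

```r
# Shortest interval containing about `prob` of the samples:
# slide a window over the sorted draws and keep the narrowest one
hpdi <- function(samples, prob = 0.89) {
  sorted <- sort(samples)
  n <- length(sorted)
  m <- floor(prob * n)
  widths <- sorted[(m + 1):n] - sorted[1:(n - m)]
  i <- which.min(widths)
  c(sorted[i], sorted[i + m])
}

set.seed(1)
x <- rnorm(10000, mean = 2280, sd = 77)  # stand-in for the mu_at_35 draws
hpdi(x, prob = 0.89)
```

For a symmetric posterior the HPDI and the central (percentile) interval nearly coincide; for a skewed posterior the HPDI shifts toward the bulk of the density, which is why it is the "smallest" interval.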
HPDI vs CI
HPDI95= (2128.9, 2428.4)
CI95= (2111.8, 2450.9)
Pretty similar, so why bother? Different interpretations:
HPDI is the smallest set of values of mean RT_ms for
NumberDistractors=35 that contains 95% of the posterior probability
If the model is valid, the priors are appropriate, and so forth
CI is the smallest set of values that results from a process
that 95% of the time includes the true mean RT_ms for
NumberDistractors=35
If the model is valid and so forth
HPDI vs CI
What is the probability that the mean of RT_ms for NumberDistractors=35 is greater than 2400 ms?
Treat the posterior as a normal distribution:
mean(mu_at_35) = 2280.252
sd(mu_at_35) = 76.74657
Area greater than 2400 is 0.0593
Compute directly from the posterior samples:
length(mu_at_35[mu_at_35 > 2400])/length(mu_at_35) = 0.0584
HPDI is a description of the posterior distribution
CI is a description of the sample and an algorithm that connects it (probabilistically) to the true mean
[Figure: density of mu | NumDistract=35 with the region above 2400 marked]
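The normal-approximation number above can be reproduced with base R's pnorm, using the posterior mean and sd reported on the slide:

```r
# P(mu_at_35 > 2400) under a normal approximation to the posterior,
# using the summary statistics from the slide
p_above_2400 <- 1 - pnorm(2400, mean = 2280.252, sd = 76.74657)
round(p_above_2400, 4)   # 0.0593
```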
HPDI vs CI
You cannot do this with a CI because the CI is not a
summary of the posterior distribution
Instead, the limits of the CI are the values that are output by
a process that for 95% of random samples will produce limits
that contain the true mean
If that sounds kind of silly, then you are following along just fine
In practice, the limits of a CI may be similar to the limits of an
HPDI, but in principle, they could hardly be more different
And the limit values are sometimes not similar at all
It depends on the priors
Prediction uncertainty
Our linear model uses a and b to predict the mean RT_ms
for any given NumberDistractors value
There is uncertainty in this prediction, and we should
represent it for each value of NumberDistractors
NumberDistractors.seq<-seq(from=1, to=65, by=1)
Generates a vector [1, 2, 3, … 65]
mu<-link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq))
Provides a posterior distribution for mean predicted RT_ms for each value in the vector (a great big 2D matrix)
mu.mean <- apply(mu, 2, mean)
mu.HPDI <-apply(mu, 2, HPDI, prob=0.89)
A shortcut for applying a function (“mean” or “HPDI”) to the columns (dimension 2) of a matrix
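The apply() idiom works on any matrix; here is a toy base-R illustration (the matrix is made up, standing in for the samples-by-values output of link):

```r
# Rows play the role of posterior samples, columns the NumberDistractors values
mu_toy <- matrix(c(1, 2, 3,
                   4, 5, 6,
                   7, 8, 9,
                   10, 11, 12), nrow = 3)  # filled column by column
apply(mu_toy, 2, mean)   # one mean per column: 2 5 8 11
```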
Prediction uncertainty
Plot the raw data
plot(RT_ms ~ NumberDistractors, data=VSdata2)
Plot the MAP line
lines(NumberDistractors.seq, mu.mean)
For all practical purposes, this is the same as plotting the regression
line from the estimated coefficients, but it is estimated from the
sampled posterior distribution for different NumberDistractors values
Plot a shaded region for the 89% HPDI
shade(mu.HPDI, NumberDistractors.seq)
Prediction uncertainty
[Figure: RT_ms vs NumberDistractors, with the MAP line and the shaded 89% HPDI region]
Nice summary of predicting average RT_ms values
Predicting individual points
We have been predicting mean RT_ms values for each
NumberDistractors value
Our model is
Maximum a posteriori (MAP) model fit
Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
If we try to predict any given RT_ms (not just a mean) we
have to consider that we are sampling that value from a
population with a standard deviation of sigma
We need to consider all of the uncertainty
Predicting individual points
The model can just as easily generate individual simulated samples as
means
sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq))
We can identify the “middle” 89% of such simulated samples for each
NumberDistractors value
RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)
PI is a function from the rethinking library
And plot everything
dev.new()
plot(RT_ms ~ NumberDistractors, data=VSdata2)
lines(NumberDistractors.seq, mu.mean) # MAP line for means
shade(mu.HPDI, NumberDistractors.seq) # HPDI for means
shade(RT_ms.PI, NumberDistractors.seq) # PI for individual values
Predicting individual points
[Figure: RT_ms vs NumberDistractors, with the MAP line, the shaded HPDI for means, and the wider shaded 89% PI for individual values]
Here, everything looks fine to me
This kind of comparison is useful for “checking” on whether the model makes sense
Bayesian vs. Linear regression
Are the predictions the same?
For the best fitting line, we get nearly the same from the Bayesian MAP approach as from typical linear regression
[Figure: RT_ms vs NumberDistractors, with the Bayesian MAP line and the linear regression line nearly overlapping]
Bayesian vs. Linear regression
MAP: What is the probability of observing a random trial with RT_ms<1000 for NumberDistractors=15?
length(sim.RT_ms[, 15][sim.RT_ms[, 15]<1000])/length(sim.RT_ms[, 15])
0.104
[Figure: density of simulated RT_ms | NumDistract=15, with the region below 1000 marked]
Bayesian vs. Linear regression
Linear regression: What is the probability of observing a random trial with RT_ms<1000 for NumberDistractors=15?
Mean = 832.945 + 41.383 * 15 = 1453.69
RT_ms ~ N(1453.69, 348.8)
Area below 1000 is 0.0967
Why more likely to get this “rare” event in the Bayesian model?
[Figure: RT_ms vs NumberDistractors with the linear regression line]
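The 0.0967 figure is just a normal tail area and can be checked directly with pnorm:

```r
# P(RT_ms < 1000) under the linear-regression model RT_ms ~ N(1453.69, 348.8)
p_below_1000 <- pnorm(1000, mean = 1453.69, sd = 348.8)
round(p_below_1000, 4)   # 0.0967
```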
Bayesian vs. Linear regression
The difference is in the representation of uncertainty
Typical linear regression uses the best fitting straight line, and then
estimates RT_ms from that model
Any other choice would be worse (in terms of reducing error for the observed data)
The Bayesian MAP also has a best fitting straight line model, but its
prediction is a posterior distribution of many different straight line models
(with different parameters)
It estimates RT_ms from the full posterior distribution rather than just the “best fitting”
model
There is almost always uncertainty about the model, so there is uncertainty about the
values of RT_ms beyond the standard deviation in the regression equation
Predictions from typical linear regression ignore the uncertainty about
the model
They tend to be overly optimistic
Visual Search
Typical results: For conjunctive distractors, response time increases with the
number of distractors
Visual Search
Previously we fit a model to the Target absent condition
We can easily extend it to include the Target present condition
VSdata2<-subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$DistractorType=="Conjunction")
Define a dummy variable with value 0 if target is absent and 1 if the
target is present
VSdata2$TargetIsPresent <- ifelse(VSdata2$Target=="Present", 1, 0)
Visual Search
Define the model
VSmodel <- map(
alist( RT_ms ~ dnorm(mu, sigma),
mu <- a + (b*TargetIsPresent + (1 - TargetIsPresent)*b2)*NumberDistractors,
a ~ dnorm(1000, 500),
b ~ dnorm(0, 100),
b2 ~ dnorm(0, 100),
sigma ~ dunif(0, 2000)
), data=VSdata2 )
Note, parameter b is the slope for when the target is present and b2 is
the slope when the target is absent
Both conditions have the same model standard deviation and intercept
Model results
Maximum a posteriori (MAP) model fit
Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + (b * TargetIsPresent + (1 - TargetIsPresent) * b2) * NumberDistractors
a ~ dnorm(1000, 500)
b ~ dnorm(0, 100)
b2 ~ dnorm(0, 100)
sigma ~ dunif(0, 2000)
MAP values:
         a         b        b2     sigma
 858.59107  23.40139  40.78839 542.78734
Log-likelihood: -308.63
Compare with the model for Target absent only
MAP values:
         a         b     sigma
 624.42893  45.84785 357.47658
Model results
print(precis(VSmodel, corr=TRUE))
        Mean StdDev   5.5%   94.5%     a     b    b2 sigma
a     858.59 134.71 643.30 1073.89  1.00 -0.66 -0.66  0.03
b      23.40   4.39  16.38   30.42 -0.66  1.00  0.43 -0.02
b2     40.79   4.39  33.77   47.81 -0.66  0.43  1.00 -0.02
sigma 542.79  60.70 445.78  639.79  0.03 -0.02 -0.02  1.00
Best fitting lines
plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)
points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)
abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b"], col=col.alpha("red",1.0))
abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b2"], col=col.alpha("green",1.0))
numVariableLines=10000
numVariableLinesToPlot=20
post<-extract.samples(VSmodel, n=numVariableLines)
for(i in 1:numVariableLinesToPlot){
  abline(a=post$a[i], b=post$b[i], col=col.alpha("red",0.3), lty=5)
  abline(a=post$a[i], b=post$b2[i], col=col.alpha("green",0.3), lty=5)
}
[Figure: RT_ms vs NumberDistractors, with the MAP lines and 20 sampled posterior lines for each condition]
HPDI (Target absent)
# Plot HPDI for Target absent
dev.new()
plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)
# Define a sequence of NumberDistractors to compute predictions
NumberDistractors.seq<-seq(from=1, to=65, by=1)
# Use link to compute mu for each sample from the posterior and for each value in NumberDistractors.seq
mu_absent<-link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq, TargetIsPresent=0))
mu_absent.mean <- apply(mu_absent, 2, mean)
mu_absent.HPDI <- apply(mu_absent, 2, HPDI, prob=0.89)
# Plot the MAP line (same as the abline done previously from the estimated coefficients)
lines(NumberDistractors.seq, mu_absent.mean)
shade(mu_absent.HPDI, NumberDistractors.seq, col=col.alpha("green",0.3))
HPDI (Target present)
# Plot HPDI for Target present
points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)
# Use link to compute mu for each sample from the posterior and for each value in NumberDistractors.seq
mu_present<-link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq, TargetIsPresent=1))
mu_present.mean <- apply(mu_present, 2, mean)
mu_present.HPDI <- apply(mu_present, 2, HPDI, prob=0.89)
# Plot the MAP line
lines(NumberDistractors.seq, mu_present.mean)
shade(mu_present.HPDI, NumberDistractors.seq, col=col.alpha("red",0.3))
[Figure: RT_ms vs NumberDistractors, with shaded 89% HPDI bands for both conditions]
Prediction intervals (target absent)
# Prediction interval for RT_ms raw scores
# Target absent
# Generate many sample RT_ms scores for NumberDistractors.seq using the model
sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq, TargetIsPresent=0))
# Identify limits of the middle 89% of sampled values for each NumberDistractors (PI is a function from the rethinking library)
RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)
# Plot
dev.new()
plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)
lines(NumberDistractors.seq, mu_absent.mean) # MAP line for means
shade(mu_absent.HPDI, NumberDistractors.seq) # shaded HPDI for estimates of means
shade(RT_ms.PI, NumberDistractors.seq, col=col.alpha("green",0.3)) # shaded prediction interval for simulated RT_ms values
Prediction intervals (target present)
# Target present
# Generate many sample RT_ms scores for NumberDistractors.seq using the model
sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq, TargetIsPresent=1))
# Identify limits of the middle 89% of sampled values for each NumberDistractors (PI is a function from the rethinking library)
RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)
# Plot
points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)
lines(NumberDistractors.seq, mu_present.mean) # MAP line for means
shade(mu_present.HPDI, NumberDistractors.seq) # shaded HPDI for estimates of means
shade(RT_ms.PI, NumberDistractors.seq, col=col.alpha("red",0.3)) # shaded prediction interval for simulated RT_ms values
[Figure: RT_ms vs NumberDistractors, with shaded 89% prediction intervals for both conditions]
Conclusions
HPDI vs. CI
Predictions should consider uncertainty about the model
Bayesian analysis allows you to do this in a way that
cannot be done with typical linear regression
Extending the model to consider different slopes for
different conditions is straightforward