Download PPT slides for 22 September - Purdue Psychological Sciences

Let’s continue to do a
Bayesian analysis
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016
Purdue University
Visual Search

A classic experiment in perception/attention involves visual search
Respond as quickly as possible whether an image contains a target (a green circle) or not
Vary number of distractors: 4, 16, 32, 64
Vary type of distractors: feature (different color) or conjunctive (different color or shape)
Visual Search

Typical results: For conjunctive distractors, response time increases with the
number of distractors
Linear model

Suppose you want to model the search time on the Conjunctive search trials when the target is Absent as a linear equation
Let's do it for a single participant
We are basically going through Section 4.4 of the text, but using a new data set
Download files from the class web site and follow along in class
We built our model using the map function
MAP estimates

Maximum a posteriori (MAP) model fit

Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
a ~ dnorm(1000, 500)
b ~ dnorm(0, 100)
sigma ~ dunif(0, 500)

MAP values:
        a         b     sigma
830.54107  41.41798 333.16293
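The MAP line turns any number of distractors into a point prediction for mean RT. A minimal sketch in Python, plugging in the MAP estimates above (the function name is just illustrative):

```python
# MAP estimates from the fitted model (see the table above)
a, b = 830.54107, 41.41798

def predicted_mean_rt(num_distractors):
    """Point prediction of mean RT_ms from the MAP line: mu = a + b * n."""
    return a + b * num_distractors

print(predicted_mean_rt(35))  # about 2280 ms for 35 distractors
```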
Posterior

You can estimate the posterior for a function by making draws from the posterior

numVariableLines=10000
post<-extract.samples(VSmodel, n=numVariableLines)
mu_at_35 <- post$a + post$b * 35

For example, what is the posterior distribution of the predicted mean value for 35 distractors? The 10,000 samples answer that question.
You can ask all kinds of questions about predictions and so forth by just using probability

[Figure: density of the 10,000 samples of mu | NumDistract=35, spanning roughly 2000-2500 ms]
Posterior

What is the 89% highest posterior density interval of mu at NumberDistractors=35?

HPDI(mu_at_35, prob=0.89)
    |0.89     0.89|
 2155.519  2400.441

Why 89%? Because it is prime. (Why 95% for a CI?)

HPDI(mu_at_35, prob=0.95)
    |0.95     0.95|
 2128.870  2428.384

Why is the CI (2111.8, 2450.9) broader than the HPDI?

[Figure: density of mu | NumDistract=35 with interval limits marked]
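The HPDI is simply the narrowest window that contains the requested fraction of the posterior samples. Here is a minimal Python sketch of that idea (a hypothetical helper, not the rethinking implementation):

```python
import math

def hpdi(samples, prob=0.89):
    """Narrowest interval containing `prob` of the samples."""
    s = sorted(samples)
    n = len(s)
    m = int(math.ceil(prob * n))  # number of samples inside the interval
    # slide a window of m samples along the sorted list; keep the narrowest
    widths = [s[i + m - 1] - s[i] for i in range(n - m + 1)]
    best = widths.index(min(widths))
    return s[best], s[best + m - 1]

print(hpdi(range(100), prob=0.5))  # (0, 49): all 50-sample windows tie, first kept
```

Because it keeps the narrowest window, the HPDI shifts toward the bulk of a skewed posterior, unlike a central percentile interval.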
HPDI vs CI

HPDI95 = (2128.9, 2428.4)
CI95 = (2111.8, 2450.9)

Pretty similar, so why bother? Different interpretations:

HPDI is the smallest set of values of mean RT_ms for NumberDistractors=35 that have a 95% probability
  If the model is valid, the priors are appropriate, and so forth
CI is the smallest set of values that results from a process that 95% of the time includes the true mean RT_ms for NumberDistractors=35
  If the model is valid and so forth
HPDI vs CI

CI is a description of the sample and an algorithm that connects it (probabilistically) to the true mean
HPDI is a description of the posterior distribution

What is the probability that the mean of RT_ms for NumberDistractors=35 is greater than 2400 ms?

  Treat the posterior as a normal distribution:
    mean(mu_at_35) = 2280.252
    sd(mu_at_35) = 76.74657
    Area greater than 2400 is 0.0593

  Compute directly from the posterior samples:
    length(mu_at_35[mu_at_35 > 2400])/length(mu_at_35) = 0.0584

[Figure: density of mu | NumDistract=35]
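The normal-approximation answer is just a tail-area calculation. A quick Python check of the 0.0593 figure, using only the standard library:

```python
import math

def normal_tail_above(x, mu, sd):
    """P(X > x) for X ~ Normal(mu, sd), via the complementary error function."""
    return 0.5 * math.erfc((x - mu) / (sd * math.sqrt(2)))

# Posterior of mu at 35 distractors, treated as normal (values from the slide)
p = normal_tail_above(2400, mu=2280.252, sd=76.74657)
print(round(p, 4))  # about 0.059, matching the 0.0593 on the slide
```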
HPDI vs CI

You cannot do this with a CI because the CI is not a summary of the posterior distribution
Instead, the limits of the CI are the values that are output by a process that for 95% of random samples will produce limits that contain the true mean
  If that sounds kind of silly, then you are following along just fine

In practice, the limits of a CI may be similar to the limits of an HPDI, but in principle, they could hardly be more different
  And the limit values are sometimes not similar at all
  It depends on the priors
Prediction uncertainty

Our linear model uses a and b to predict the mean RT_ms for any given NumberDistractors value
There is uncertainty in this prediction, and we should represent it for each value of NumberDistractors

NumberDistractors.seq <- seq(from=1, to=65, by=1)
  Generates a vector [1, 2, 3, ..., 65]

mu <- link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq))
  Provides a posterior distribution of the mean predicted RT_ms for each value in the vector (a great big 2D matrix)

mu.mean <- apply(mu, 2, mean)
mu.HPDI <- apply(mu, 2, HPDI, prob=0.89)
  A shortcut way of applying a function ("mean" or "HPDI") to the columns (dimension 2) of a matrix
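For readers less familiar with apply(): it just maps a function over each column of the samples-by-predictions matrix. The same idea in Python, on a toy matrix (the numbers are illustrative, not real posterior draws):

```python
# Each row is one posterior sample; each column is one NumberDistractors value
mu = [
    [900.0, 1300.0, 1700.0],
    [920.0, 1340.0, 1760.0],
    [880.0, 1260.0, 1640.0],
]

# Column-wise mean: the analog of apply(mu, 2, mean)
mu_mean = [sum(col) / len(col) for col in zip(*mu)]
print(mu_mean)  # [900.0, 1300.0, 1700.0]
```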
Prediction uncertainty

Plot the raw data
  plot(RT_ms ~ NumberDistractors, data=VSdata2)

Plot the MAP line
  lines(NumberDistractors.seq, mu.mean)
  For all practical purposes, this is the same as plotting the regression line from the estimated coefficients, but it is estimated from the sampled posterior distribution for different NumberDistractors values

Plot a shaded region for the 89% HPDI
  shade(mu.HPDI, NumberDistractors.seq)
Prediction uncertainty

[Figure: RT_ms vs NumberDistractors with the MAP line and shaded 89% HPDI — a nice summary of predicting average RT_ms values]
Predicting individual points

We have been predicting mean RT_ms values for each NumberDistractors value
Our model is
  Maximum a posteriori (MAP) model fit
  Formula:
  RT_ms ~ dnorm(mu, sigma)
  mu <- a + b * NumberDistractors

If we try to predict any given RT_ms (not just a mean) we have to consider that we are sampling that value from a population with a standard deviation of sigma
  We need to consider all of the uncertainty
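Concretely, predicting an individual trial means drawing RT from Normal(a + b*n, sigma). A simplified Python sketch: it holds (a, b, sigma) fixed at values near the MAP estimates, whereas the real sim() also varies them across posterior draws:

```python
import random

random.seed(1)

# Stand-in values near the MAP estimates (illustrative; the real posterior varies these)
a, b, sigma = 830.54, 41.42, 333.16

def simulate_rt(num_distractors, n_sims=1000):
    """Draw individual RT_ms values around the predicted mean."""
    mu = a + b * num_distractors
    return [random.gauss(mu, sigma) for _ in range(n_sims)]

sims = simulate_rt(15)
print(len(sims))  # 1000 simulated trials at 15 distractors
```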
Predicting individual points

The model can just as easily generate individual simulated samples as means
  sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq))

We can identify the "middle" 89% of such simulated samples for each NumberDistractors value
  RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)
  PI is a function from the rethinking library

And plot everything
  dev.new()
  plot(RT_ms ~ NumberDistractors, data=VSdata2)
  lines(NumberDistractors.seq, mu.mean) # MAP line for means
  shade(mu.HPDI, NumberDistractors.seq) # HPDI for means
  shade(RT_ms.PI, NumberDistractors.seq) # PI for individual values
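Unlike the HPDI, PI() returns the central (percentile) interval: it trims an equal share of samples from each tail. A simplified Python version using nearest-rank percentiles (an approximation, not the rethinking implementation):

```python
def pi(samples, prob=0.89):
    """Central interval: cut (1 - prob)/2 of the samples from each tail (nearest rank)."""
    s = sorted(samples)
    tail = (1 - prob) / 2
    lo = int(tail * len(s))
    hi = int((1 - tail) * len(s)) - 1
    return s[lo], s[hi]

print(pi(range(100)))  # (5, 93) with this nearest-rank convention
```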
Predicting individual points

[Figure: RT_ms vs NumberDistractors with the MAP line, 89% HPDI for means, and 89% PI for individual values. Here, everything looks fine to me. This kind of comparison is useful for "checking" on whether the model makes sense.]
Bayesian vs. Linear regression

[Figure: RT_ms vs NumberDistractors. Are the predictions the same? For the best fitting line, we get nearly the same from the Bayesian MAP approach as from typical linear regression.]
Bayesian vs. Linear regression

What is the probability of observing a random trial with RT_ms<1000 for NumberDistractors=15?

MAP:
  length(sim.RT_ms[, 15][sim.RT_ms[, 15]<1000])/length(sim.RT_ms[, 15])
  0.104

[Figure: density of simulated RT_ms | NumDistract=15]
Bayesian vs. Linear regression

What is the probability of observing a random trial with RT_ms<1000 for NumberDistractors=15?

Linear regression:
  Mean = 832.945 + 41.383 * 15 = 1453.69
  RT_ms ~ N(1453.69, 348.8)
  0.0967

Why are we more likely to get this "rare" event in the Bayesian model?

[Figure: RT_ms vs NumberDistractors with the regression line]
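A quick Python check of the linear-regression numbers above (the mean at 15 distractors and the probability of RT_ms < 1000), using only the standard library:

```python
import math

def normal_cdf(x, mu, sd):
    """P(X < x) for X ~ Normal(mu, sd)."""
    return 0.5 * math.erfc((mu - x) / (sd * math.sqrt(2)))

# Linear-regression estimates from the slide
mean_at_15 = 832.945 + 41.383 * 15
p_fast = normal_cdf(1000, mu=mean_at_15, sd=348.8)
print(round(mean_at_15, 2), round(p_fast, 4))  # mean is 1453.69 ms; p is about 0.097
```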
Bayesian vs. Linear regression

The difference is in the representation of uncertainty
Typical linear regression uses the best fitting straight line, and then estimates RT_ms from that model
  Any other choice would be worse (in terms of reducing error for the observed data)

The Bayesian MAP also has a best fitting straight line model, but its prediction is a posterior distribution of many different straight line models (with different parameters)
  It estimates RT_ms from the full posterior distribution rather than just the "best fitting" model
  There is almost always uncertainty about the model, so there is uncertainty about the values of RT_ms beyond the standard deviation in the regression equation

Predictions from typical linear regression ignore the uncertainty about the model
  They tend to be overly optimistic
Visual Search

Typical results: For conjunctive distractors, response time increases with the
number of distractors
Visual Search

Previously we fit a model to the Target absent condition
We can easily extend it to include the Target present condition

VSdata2 <- subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$DistractorType=="Conjunction")

Define a dummy variable with value 0 if the target is absent and 1 if the target is present
VSdata2$TargetIsPresent <- ifelse(VSdata2$Target=="Present", 1, 0)
Visual Search

Define the model

VSmodel <- map(
  alist( RT_ms ~ dnorm(mu, sigma),
    mu <- a + (b*TargetIsPresent + (1-TargetIsPresent)*b2)*NumberDistractors,
    a ~ dnorm(1000, 500),
    b ~ dnorm(0, 100),
    b2 ~ dnorm(0, 100),
    sigma ~ dunif(0, 2000)
  ), data=VSdata2 )

Note, parameter b is the slope when the target is present and b2 is the slope when the target is absent
  Both conditions have the same model standard deviation and intercept
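The dummy-variable trick means the model applies slope b on present trials and b2 on absent trials, with a shared intercept a. A Python sketch of the mean function, plugging in the reported MAP values (a=858.59, b=23.40, b2=40.79):

```python
# MAP values for the two-slope model (b = present slope, b2 = absent slope)
a, b, b2 = 858.59107, 23.40139, 40.78839

def mean_rt(num_distractors, target_is_present):
    """mu = a + (b*present + (1 - present)*b2) * NumberDistractors"""
    slope = b * target_is_present + (1 - target_is_present) * b2
    return a + slope * num_distractors

print(mean_rt(35, 1))  # present trials: shallower slope
print(mean_rt(35, 0))  # absent trials: roughly twice the slope
```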
Model results

Maximum a posteriori (MAP) model fit

Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + (b * TargetIsPresent + (1 - TargetIsPresent) * b2) * NumberDistractors
a ~ dnorm(1000, 500)
b ~ dnorm(0, 100)
b2 ~ dnorm(0, 100)
sigma ~ dunif(0, 2000)

MAP values:
        a         b        b2     sigma
858.59107  23.40139  40.78839 542.78734

Log-likelihood: -308.63

Compare with the model for Target absent only:
MAP values:
        a         b     sigma
624.42893  45.84785 357.47658
Model results

print(precis(VSmodel, corr=TRUE))

        Mean  StdDev    5.5%   94.5%     a     b    b2  sigma
a     858.59  134.71  643.30 1073.89  1.00 -0.66 -0.66   0.03
b      23.40    4.39   16.38   30.42 -0.66  1.00  0.43  -0.02
b2     40.79    4.39   33.77   47.81 -0.66  0.43  1.00  -0.02
sigma 542.79   60.70  445.78  639.79  0.03 -0.02 -0.02   1.00
Best fitting lines

plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent" ), pch=1)
points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present" ), pch=15)

abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b"], col=col.alpha("red",1.0))
abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b2"], col=col.alpha("green",1.0))

numVariableLines=10000
numVariableLinesToPlot=20
post<-extract.samples(VSmodel, n=numVariableLines)

for(i in 1:numVariableLinesToPlot){
  abline(a=post$a[i], b=post$b[i], col=col.alpha("red",0.3), lty=5)
  abline(a=post$a[i], b=post$b2[i], col=col.alpha("green",0.3), lty=5)
}

[Figure: data points with MAP lines (red = target present, green = target absent) and 20 posterior sample lines per condition]
HPDI (Target absent)

# Plot HPDI for TargetAbsent
dev.new()
plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent" ), pch=1)

# Define a sequence of NumberDistractors to compute predictions
NumberDistractors.seq<-seq(from=1, to=65, by=1)

# use link to compute mu for each sample from the posterior and for each value in NumberDistractors.seq
mu_absent<-link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq, TargetIsPresent=0))
mu_absent.mean <- apply(mu_absent, 2, mean)
mu_absent.HPDI <- apply(mu_absent, 2, HPDI, prob=0.89)

# Plot the MAP line (same as the abline done previously from the coefficients)
lines(NumberDistractors.seq, mu_absent.mean)
shade(mu_absent.HPDI, NumberDistractors.seq, col=col.alpha("green",0.3))
HPDI (Target present)

# Plot HPDI for TargetPresent
points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present" ), pch=15)

# use link to compute mu for each sample from the posterior and for each value in NumberDistractors.seq
mu_present<-link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq, TargetIsPresent=1))
mu_present.mean <- apply(mu_present, 2, mean)
mu_present.HPDI <- apply(mu_present, 2, HPDI, prob=0.89)

# Plot the MAP line (same as the abline done previously from the coefficients)
lines(NumberDistractors.seq, mu_present.mean)
shade(mu_present.HPDI, NumberDistractors.seq, col=col.alpha("red",0.3))

[Figure: data points with MAP lines and shaded 89% HPDIs for both conditions]
Prediction intervals (target absent)

# Prediction interval for RT_ms raw scores
# Target absent
# generate many sample RT_ms scores for NumberDistractors.seq using the model
sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq, TargetIsPresent=0))

# Identify limits of the middle 89% of sampled values for each NumberDistractors (PI is a function from the rethinking library)
RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)

# Plot
dev.new()
plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent" ), pch=1)
lines(NumberDistractors.seq, mu_absent.mean) # MAP line for means
shade(mu_absent.HPDI, NumberDistractors.seq) # shaded HPDI for estimates of means
shade(RT_ms.PI, NumberDistractors.seq, col=col.alpha("green",0.3)) # shaded prediction interval for simulated RT_ms values
Prediction intervals (target present)

# Target present
# generate many sample RT_ms scores for NumberDistractors.seq using the model
sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq, TargetIsPresent=1))

# Identify limits of the middle 89% of sampled values for each NumberDistractors (PI is a function from the rethinking library)
RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)

# Plot
points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present" ), pch=15)
lines(NumberDistractors.seq, mu_present.mean) # MAP line for means
shade(mu_present.HPDI, NumberDistractors.seq) # shaded HPDI for estimates of means
shade(RT_ms.PI, NumberDistractors.seq, col=col.alpha("red",0.3)) # shaded prediction interval for simulated RT_ms values

[Figure: data points, MAP lines, shaded HPDIs, and prediction intervals for both conditions]
Conclusions

HPDI vs. CI
Predictions should consider uncertainty about the model
  Bayesian analysis allows you to do this in a way that cannot be done with typical linear regression
Extending the model to consider different slopes for different conditions is straightforward