Download PPT slides for 04 October - Purdue Psychological Sciences

Binomial regression
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016
Purdue University
Zenner Cards

Guess which card appears next:
[three successive slides, each showing a different Zenner card]
Data

Score indicates whether you predicted correctly (1) or not (0)
File ZennerCards.csv contains the data for 22 participants
Loading data

# load full data file
ZCdata <- read.csv(file="ZennerCards.csv", header=TRUE, stringsAsFactors=FALSE)

# load the rethinking library
library(rethinking)

# Dummy variables to indicate actual card type and guessed card type
ZCdata$CardIndex <- coerce_index(ZCdata$ActualCard)
ZCdata$SelectedCardIndex <- coerce_index(ZCdata$GuessedCard)
Binomial model

y_i is the number of observed outcomes (e.g., correct responses) from n draws when the probability of a correct response is p_i

We know n for any trial is 1

We estimate p_i

It is convenient to actually estimate the logit of p_i
Binomial regression

We will model the logit as a linear equation
Among other things, this ensures that our p_i value is always between 0 and 1 (as all probabilities must be)
 Makes it easier to identify priors

Our MAP estimate will give us logit(p_i); to get back p_i we have to invert
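As a minimal base-R sketch of this inversion (qlogis/plogis are base R's logit/logistic; rethinking's logistic() behaves like plogis):

# logit maps probabilities in (0, 1) onto the whole real line
logit_p <- function(p) log(p / (1 - p))      # same as qlogis(p)

# the logistic (inverse-logit) maps any real number back into (0, 1)
inv_logit <- function(x) 1 / (1 + exp(-x))   # same as plogis(x)

inv_logit(logit_p(0.2))   # recovers 0.2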
Model set up

# All cards and subjects treated the same
ZCmodel1 <- map(
    alist( Score ~ dbinom(1, p),
        logit(p) <- a,
        a ~ dnorm(0, 10)
    ), data=ZCdata )
Model results

Maximum a posteriori (MAP) model fit

Formula:
Score ~ dbinom(1, p)
logit(p) <- a
a ~ dnorm(0, 10)

MAP values:
a
-1.313936

Log-likelihood: -567.99

a is logit(p); to recover the probability, invert it:

a <- coef(ZCmodel1)["a"]
cat("Probability of correct response: ", logistic(a))

Probability of correct response: 0.211829
Model results

> precis(ZCmodel1)
   Mean StdDev  5.5% 94.5%
a -1.31   0.07 -1.43  -1.2
> logistic(-1.43)
[1] 0.1930987
> logistic(-1.2)
[1] 0.2314752
Differences across cards

# Different probabilities for different actual cards
ZCmodel2 <- map(
    alist( Score ~ dbinom(1, p),
        logit(p) <- a[CardIndex],
        a[CardIndex] ~ dnorm(0, 10)
    ), data=ZCdata )

print(precis(ZCmodel2, depth=2))

aValues2 <- c()
for(p in unique(ZCdata$CardIndex)){
    code <- sprintf("a[%d]", p)
    aValues2 <- c(aValues2, coef(ZCmodel2)[code])
}
cat("Probability of correct response: ", logistic(aValues2))
Differences across cards

Maximum a posteriori (MAP) model fit

Formula:
Score ~ dbinom(1, p)
logit(p) <- a[CardIndex]
a[CardIndex] ~ dnorm(0, 10)

MAP values:
     a[1]      a[2]      a[3]      a[4]      a[5]
-1.503652 -1.122784 -1.003628 -1.503610 -1.503713

Log-likelihood: -563.45

Probability of correct response: 0.1818814 0.1818725 0.1818878 0.2454952 0.2682286

> plot(coeftab(ZCmodel2))
[coeftab plot: estimates for a[1] through a[5]; x-axis "Estimate" from -1.8 to -0.8]
Differences across cards

compare(ZCmodel1, ZCmodel2)
           WAIC pWAIC dWAIC weight   SE  dSE
ZCmodel2 1137.0   5.0   0.0   0.61 35.9   NA
ZCmodel1 1137.8   0.9   0.9   0.39 35.6  6.2

A model that uses different probabilities for different cards is expected to do better than a model that ignores card type.
Does this suggest that people have some kind of predictive power?
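The weight column can be reproduced from the WAIC values themselves; a minimal base-R sketch using the dWAIC values above (compare() reports Akaike weights):

# Akaike weights: w_i = exp(-dWAIC_i / 2) / sum_j exp(-dWAIC_j / 2)
dWAIC <- c(ZCmodel2 = 0.0, ZCmodel1 = 0.9)
w <- exp(-dWAIC / 2) / sum(exp(-dWAIC / 2))
round(w, 2)   # 0.61 and 0.39, matching the weight column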
Differences in guesses

It might be better to see if participants' guesses improve model fit

# Different probabilities for different selected cards
ZCmodel4 <- map(
    alist( Score ~ dbinom(1, p),
        logit(p) <- a[SelectedCardIndex],
        a[SelectedCardIndex] ~ dnorm(0, 10)
    ), data=ZCdata )

Maximum a posteriori (MAP) model fit

Formula:
Score ~ dbinom(1, p)
logit(p) <- a[SelectedCardIndex]
a[SelectedCardIndex] ~ dnorm(0, 10)

MAP values:
     a[1]      a[2]      a[3]      a[4]      a[5]
-1.440575 -1.318974 -1.169226 -1.509158 -1.139059

Log-likelihood: -566.16

Probability of correct response: 0.1914564 0.2424932 0.1810636 0.2109891 0.2369949
Null model

Put a tight prior on a to be around logit(0.2)

# Null model (p=0.2)
ZCmodel0 <- map(
    alist( Score ~ dbinom(1, p),
        logit(p) <- a,
        a ~ dnorm(logit(0.2), 0.001)
    ), data=ZCdata )

Maximum a posteriori (MAP) model fit

Formula:
Score ~ dbinom(1, p)
logit(p) <- a
a ~ dnorm(logit(0.2), 0.001)

MAP values:
a
-1.386281

Log-likelihood: -568.46
p = 0.2000021
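With five card types, chance performance is 1/5 = 0.2, so the tight prior pins the intercept at its logit; a quick base-R sanity check (qlogis/plogis are base R's logit/logistic):

qlogis(0.2)        # logit(0.2) = -1.386294, essentially the MAP value above
plogis(-1.386281)  # back-transforms the MAP value to approximately 0.2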
Comparing models

All models

> compare(ZCmodel0, ZCmodel1, ZCmodel2, ZCmodel4)
           WAIC pWAIC dWAIC weight    SE  dSE
ZCmodel2 1136.6   4.8   0.0   0.42 35.87   NA
ZCmodel0 1136.9   0.0   0.3   0.36 37.57 6.49
ZCmodel1 1138.1   1.1   1.5   0.20 35.68 6.10
ZCmodel4 1142.1   4.9   5.5   0.03 35.97 5.97

Model with different probabilities for different cards slightly beats null model
 Indicates some response bias for some cards over other cards
Comparing models

Among models that might indicate prediction ability

> compare(ZCmodel0, ZCmodel1, ZCmodel4)
           WAIC pWAIC dWAIC weight    SE  dSE
ZCmodel0 1136.9   0.0   0.0   0.62 37.57   NA
ZCmodel1 1138.2   1.1   1.2   0.33 35.68 1.89
ZCmodel4 1142.3   5.0   5.3   0.04 35.82 4.39

Null model expected to best predict future data
 Its constraints (lack of flexibility) do not hurt much
 Flexibility of the other models is expected to fit noise in the data
» Which hurts model prediction
Participants?

Different probabilities for each participant

ZCmodel3 <- map(
    alist( Score ~ dbinom(1, p),
        logit(p) <- a[Participant],
        a[Participant] ~ dnorm(0, 10)
    ), data=ZCdata )

Maximum a posteriori (MAP) model fit

Formula:
Score ~ dbinom(1, p)
logit(p) <- a[Participant]
a[Participant] ~ dnorm(0, 10)

MAP values:
      a[1]       a[2]       a[3]       a[4]       a[5]       a[6]       a[7]       a[8]       a[9]      a[10]      a[11]      a[12]      a[13]
-1.9891274 -0.7529919 -1.3844860 -1.6559456 -1.5143936 -1.5142631 -1.2645434 -1.2642905 -0.7532682 -1.3842158 -1.1517115 -1.8120881 -1.5141990
     a[14]      a[15]      a[16]      a[17]      a[18]      a[19]      a[20]      a[21]      a[22]
-1.2643308 -1.5143083 -0.6625543 -1.0450897 -1.5136766 -1.1515694 -1.1511654 -1.9886197 -1.3844742

Log-likelihood: -556.92

Probability of correct response: 0.1203492 0.3201697 0.2002895 0.160307 0.1802886 0.1803079 0.2201928 0.2202362 0.3201096 0.2003328 0.2401766 0.140386 0.1803173 0.2202293 0.1803012 0.3401661 0.2601691 0.1803946 0.2402025 0.2402763 0.120403 0.2002914
Comparing models

> compare(ZCmodel0, ZCmodel1, ZCmodel4, ZCmodel3)
           WAIC pWAIC dWAIC weight    SE  dSE
ZCmodel0 1136.9   0.0   0.0   0.60 37.57   NA
ZCmodel1 1138.0   1.0   1.0   0.36 35.53 2.04
ZCmodel4 1142.7   5.2   5.8   0.03 35.93 4.33
ZCmodel3 1158.7  22.3  21.8   0.00 36.79 9.96

Little support for individual differences (at least relative to the other models)
Participants and cards?

ZCmodel5 <- map(
    alist( Score ~ dbinom(1, p),
        logit(p) <- a[SelectedCardIndex] + b[Participant],
        a[SelectedCardIndex] ~ dnorm(0, 10),
        b[Participant] ~ dnorm(0, 10)
    ), data=ZCdata )

Maximum a posteriori (MAP) model fit

Formula:
Score ~ dbinom(1, p)
logit(p) <- a[SelectedCardIndex] + b[Participant]
a[SelectedCardIndex] ~ dnorm(0, 10)
b[Participant] ~ dnorm(0, 10)

MAP values:
        a[1]         a[2]         a[3]         a[4]         a[5]
-1.251942949 -1.076632524 -0.952019133 -1.322105326 -0.916094125
        b[1]         b[2]         b[3]         b[4]         b[5]         b[6]
-0.909358282  0.348792015 -0.304016459 -0.541041776 -0.455564670 -0.433982656
        b[7]         b[8]         b[9]        b[10]        b[11]        b[12]        b[13]
-0.147331153 -0.207390235  0.375473532 -0.288378679 -0.049252023 -0.698171384 -0.433786977
       b[14]        b[15]        b[16]        b[17]
-0.193572279 -0.404844017  0.439403233  0.058472812
       b[18]        b[19]        b[20]        b[21]        b[22]
-0.380296031  0.003302607 -0.061581956 -0.898467930 -0.275281342

Log-likelihood: -554.8

[plot(coeftab(ZCmodel5)): estimates for a[1] through a[5] and b[1] through b[22]; x-axis "Estimate" from -4 to 4]
Looking at probabilities

aValues5 <- c()
for(p in unique(ZCdata$Participant)){
    for(p2 in unique(ZCdata$CardIndex)){
        code <- sprintf("b[%d]", p)
        code2 <- sprintf("a[%d]", p2)
        aValues5 <- c(aValues5, coef(ZCmodel5)[code] + coef(ZCmodel5)[code2])
    }
}
cat("Probability of correct response: ", logistic(aValues5))

Probability of correct response: 0.1032799 0.1387809 0.09696041 0.1206817 0.1345426 0.2884034 0.3618596
0.2742206 0.3256688 0.3536057 0.1742272 0.227917 0.1643623 0.2009048 0.2216571 0.1427072 0.1889058 0.1343367
0.1655259 0.1834627 0.1534873 0.202352 0.1445911 0.1776724 0.1966154 0.1563124 0.2058578 0.147281 0.1808476
0.2000468 0.1979313 0.2566554 0.1870283 0.2272397 0.2498617 0.1885693 0.2453655 0.1780675 0.2168662
0.2387746 0.2939099 0.3680432 0.2795627 0.3315553 0.3597278 0.1764885 0.2306805 0.1665214 0.2034271
0.2243667 0.213964 0.2758091 0.2024006 0.2449214 0.2686916 0.1245409 0.1659972 0.1170904 0.1449459 0.1610832
0.1563382 0.2058898 0.1473055 0.1808766 0.2000781 0.1906927 0.2479331 0.1800989 0.2192222 0.2412953
0.1601938 0.2106623 0.1509782 0.1852045 0.2047506 0.3073496 0.3830338 0.2926182 0.3458731 0.3745805
0.2326389 0.2978366 0.2203492 0.265386 0.2903785 0.1635239 0.2147732 0.1541519 0.1889375 0.2087767 0.2229356
0.2864289 0.2110176 0.2547703 0.279143 0.2118976 0.2733531 0.2004175 0.2426483 0.2662757 0.1042928 0.1400877
0.09791816 0.1218421 0.1358157 0.1784002 0.233013 0.1683472 0.2055577 0.2266543
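The nested loop above can also be written as a single vectorized outer sum; a sketch with hypothetical coefficient vectors standing in for coef(ZCmodel5) (plogis is base R's logistic):

# hypothetical values standing in for the fitted a and b coefficients
a <- c(-1.25, -1.08, -0.95, -1.32, -0.92)   # 5 selected-card effects
b <- rnorm(22, mean = -0.3, sd = 0.3)       # 22 participant effects

# all 110 participant-by-card logits at once, then invert to probabilities
p_hat <- plogis(outer(b, a, "+"))
dim(p_hat)   # 22 x 5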
Comparing models

> compare(ZCmodel0, ZCmodel1, ZCmodel4, ZCmodel3, ZCmodel5)
           WAIC pWAIC dWAIC weight    SE   dSE
ZCmodel0 1136.9   0.0   0.0   0.59 37.57    NA
ZCmodel1 1137.9   1.0   1.0   0.37 35.48  2.09
ZCmodel4 1142.2   4.9   5.3   0.04 35.74  4.43
ZCmodel3 1159.2  22.5  22.3   0.00 36.87  9.82
ZCmodel5 1162.6  26.3  25.6   0.00 37.19 10.89
Trial effects


Modify the code to look for a trial effect
Does a model that includes just trial effects (no card or
participant effects) beat the null model?