Binomial regression
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016
Purdue University
Zenner Cards
Guess which card appears next:
Data
Score indicates whether you predicted correctly (1) or not (0)
File ZennerCards.csv contains the data for 22 participants
Loading data
# load full data file
ZCdata <- read.csv(file="ZennerCards.csv", header=TRUE, stringsAsFactors=FALSE)
# load the rethinking library
library(rethinking)
# Dummy variables to indicate actual card type and guessed card type
ZCdata$CardIndex <- coerce_index(ZCdata$ActualCard)
ZCdata$SelectedCardIndex <- coerce_index(ZCdata$GuessedCard)
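As a small illustration of what coerce_index() does, it converts a character or factor variable into contiguous integer indices (1..k), which is what the a[CardIndex] notation below requires. A hypothetical sketch (the card labels here are made up; the exact integer assigned to each label depends on factor-level ordering):

```r
library(rethinking)

# made-up card labels, not the actual file contents
cards <- c("circle", "star", "wave", "circle", "star")

# returns an integer vector with one index per unique label
coerce_index(cards)
```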
Binomial model
yi is the number of observed outcomes (e.g., correct responses) out of n draws when the probability of a correct response is pi
Here n for any trial is 1: each trial is a single guess
We estimate pi
It is convenient to actually estimate the logit of pi, logit(pi) = log( pi / (1 - pi) )
Binomial regression
We will model the logit as a linear equation
Among other things, this ensures that our pi value is always between 0 and 1 (as all probabilities must be)
Makes it easier to identify priors
Our MAP estimate will give us logit(pi); to get back pi we have to invert the logit
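The rethinking package provides logit() and logistic() for this, but a minimal base-R sketch shows the math is just:

```r
# logit maps a probability in (0, 1) to the whole real line
logit <- function(p) log(p / (1 - p))

# the inverse (logistic) maps any real number back into (0, 1)
logistic <- function(x) 1 / (1 + exp(-x))

logistic(logit(0.2))   # recovers 0.2
logistic(0)            # a logit of 0 corresponds to p = 0.5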
Model set up
# All cards and subjects treated the same
ZCmodel1 <- map(
alist( Score ~ dbinom(1, p),
logit(p) <- a,
a ~ dnorm(0, 10)
), data= ZCdata )
Model results
Maximum a posteriori (MAP) model fit
Formula:
Score ~ dbinom(1, p)
logit(p) <- a
a ~ dnorm(0, 10)
MAP values:
a
-1.313936
Log-likelihood: -567.99

The parameter a is logit(p); to recover the probability, invert:
a = coef(ZCmodel1)["a"]
cat("Probability of correct response: ", logistic(a))
Probability of correct response: 0.211829
Model results
> precis(ZCmodel1)
Mean StdDev 5.5% 94.5%
a -1.31 0.07 -1.43 -1.2
> logistic(-1.43)
[1] 0.1930987
> logistic(-1.2)
[1] 0.2314752
Differences across cards
# Different probabilities for different actual cards
ZCmodel2 <- map(
alist( Score ~ dbinom(1, p),
logit(p) <- a[CardIndex],
a[CardIndex] ~ dnorm(0, 10)
), data= ZCdata )
print(precis(ZCmodel2, depth=2))
aValues2<-c()
for(p in unique(ZCdata$CardIndex)){
code<-sprintf("a[%d]", p)
aValues2<- c(aValues2, coef(ZCmodel2)[code])
}
cat("Probability of correct response: ", logistic(aValues2))
Differences across cards
Maximum a posteriori (MAP) model fit
Formula:
Score ~ dbinom(1, p)
logit(p) <- a[CardIndex]
a[CardIndex] ~ dnorm(0, 10)
MAP values:
     a[1]      a[2]      a[3]      a[4]      a[5]
-1.503652 -1.122784 -1.003628 -1.503610 -1.503713
Log-likelihood: -563.45
Probability of correct response: 0.1818814 0.1818725 0.1818878 0.2454952 0.2682286

> plot(coeftab(ZCmodel2))
[Plot: coefficient estimates a[1] through a[5] for ZCmodel2, with the Estimate axis running from about -1.8 to -0.8]
Differences across cards
compare(ZCmodel1, ZCmodel2)
           WAIC pWAIC dWAIC weight   SE  dSE
ZCmodel2 1137.0   5.0   0.0   0.61 35.9   NA
ZCmodel1 1137.8   0.9   0.9   0.39 35.6  6.2
A model that uses different probabilities for different cards is expected to
do better than a model that ignores card type.
Does this suggest that people have some kind of predictive power?
Differences in guesses
It might be better to see if participants’ guesses improve model fit
# Different probabilities for different selected cards
ZCmodel4 <- map(
alist( Score ~ dbinom(1, p),
logit(p) <- a[SelectedCardIndex],
a[SelectedCardIndex] ~ dnorm(0, 10)
), data= ZCdata )
Maximum a posteriori (MAP) model fit
Formula:
Score ~ dbinom(1, p)
logit(p) <- a[SelectedCardIndex]
a[SelectedCardIndex] ~ dnorm(0, 10)
MAP values:
     a[1]      a[2]      a[3]      a[4]      a[5]
-1.440575 -1.318974 -1.169226 -1.509158 -1.139059
Log-likelihood: -566.16
Probability of correct response: 0.1914564 0.2424932 0.1810636 0.2109891 0.2369949
Null model
Put a tight prior on a to be around logit(0.2)
# Null model (p=0.2)
ZCmodel0 <- map(
alist( Score ~ dbinom(1, p),
logit(p) <- a,
a ~ dnorm(logit(0.2), 0.001)
), data= ZCdata )
Maximum a posteriori (MAP) model fit
Formula:
Score ~ dbinom(1, p)
logit(p) <- a
a ~ dnorm(logit(0.2), 0.001)
MAP values:
a
-1.386281
Log-likelihood: -568.46
Note: logistic(-1.386281) = 0.2000021, so the tight prior pins p at essentially 0.2
Comparing models
All models
> compare(ZCmodel0, ZCmodel1, ZCmodel2, ZCmodel4)
           WAIC pWAIC dWAIC weight    SE   dSE
ZCmodel2 1136.6 4.8 0.0 0.42 35.87 NA
ZCmodel0 1136.9 0.0 0.3 0.36 37.57 6.49
ZCmodel1 1138.1 1.1 1.5 0.20 35.68 6.10
ZCmodel4 1142.1 4.9 5.5 0.03 35.97 5.97
Model with different probabilities for different cards slightly beats null
model
Indicates some response bias for some cards over other cards
Comparing models
Among models that might indicate prediction ability
> compare(ZCmodel0, ZCmodel1, ZCmodel4)
           WAIC pWAIC dWAIC weight    SE   dSE
ZCmodel0 1136.9 0.0 0.0 0.62 37.57 NA
ZCmodel1 1138.2 1.1 1.2 0.33 35.68 1.89
ZCmodel4 1142.3 5.0 5.3 0.04 35.82 4.39
Null model is expected to best predict future data
Its constraints (lack of flexibility) do not hurt much
The flexibility of the other models is expected to fit noise in the data
» Which hurts model prediction
Participants?
Different probabilities for each participant

ZCmodel3 <- map(
alist( Score ~ dbinom(1, p),
logit(p) <- a[Participant],
a[Participant] ~ dnorm(0, 10)
), data= ZCdata )

Maximum a posteriori (MAP) model fit
Formula:
Score ~ dbinom(1, p)
logit(p) <- a[Participant]
a[Participant] ~ dnorm(0, 10)
MAP values:
      a[1]       a[2]       a[3]       a[4]       a[5]       a[6]       a[7]       a[8]       a[9]      a[10]      a[11]      a[12]      a[13]
-1.9891274 -0.7529919 -1.3844860 -1.6559456 -1.5143936 -1.5142631 -1.2645434 -1.2642905 -0.7532682 -1.3842158 -1.1517115 -1.8120881 -1.5141990
     a[14]      a[15]      a[16]      a[17]      a[18]      a[19]      a[20]      a[21]      a[22]
-1.2643308 -1.5143083 -0.6625543 -1.0450897 -1.5136766 -1.1515694 -1.1511654 -1.9886197 -1.3844742
Log-likelihood: -556.92
Probability of correct response: 0.1203492 0.3201697 0.2002895 0.160307 0.1802886 0.1803079 0.2201928 0.2202362 0.3201096 0.2003328 0.2401766 0.140386 0.1803173 0.2202293 0.1803012 0.3401661 0.2601691 0.1803946 0.2402025 0.2402763 0.120403 0.2002914
Comparing models
> compare(ZCmodel0, ZCmodel1, ZCmodel4, ZCmodel3)
           WAIC pWAIC dWAIC weight    SE   dSE
ZCmodel0 1136.9 0.0 0.0 0.60 37.57 NA
ZCmodel1 1138.0 1.0 1.0 0.36 35.53 2.04
ZCmodel4 1142.7 5.2 5.8 0.03 35.93 4.33
ZCmodel3 1158.7 22.3 21.8 0.00 36.79 9.96
Little support for individual differences (at least relative to the other
models)
Participants and cards?

ZCmodel5 <- map(
alist( Score ~ dbinom(1, p),
logit(p) <- a[SelectedCardIndex] + b[Participant],
a[SelectedCardIndex] ~ dnorm(0, 10),
b[Participant] ~ dnorm(0, 10)
), data= ZCdata )

Maximum a posteriori (MAP) model fit
Formula:
Score ~ dbinom(1, p)
logit(p) <- a[SelectedCardIndex] + b[Participant]
a[SelectedCardIndex] ~ dnorm(0, 10)
b[Participant] ~ dnorm(0, 10)
MAP values:
        a[1]         a[2]         a[3]         a[4]         a[5]         b[1]         b[2]         b[3]         b[4]         b[5]         b[6]
-1.251942949 -1.076632524 -0.952019133 -1.322105326 -0.916094125 -0.909358282  0.348792015 -0.304016459 -0.541041776 -0.455564670 -0.433982656
        b[7]         b[8]         b[9]        b[10]        b[11]        b[12]        b[13]        b[14]        b[15]        b[16]        b[17]
-0.147331153 -0.207390235  0.375473532 -0.288378679 -0.049252023 -0.698171384 -0.433786977 -0.193572279 -0.404844017  0.439403233  0.058472812
       b[18]        b[19]        b[20]        b[21]        b[22]
-0.380296031  0.003302607 -0.061581956 -0.898467930 -0.275281342
Log-likelihood: -554.8

[Plot: coefficient estimates a[1] through a[5] and b[1] through b[22] for ZCmodel5, with the Estimate axis running from -4 to 4]
Looking at probabilities
aValues5<-c()
for(p in unique(ZCdata$Participant)){
for(p2 in unique(ZCdata$CardIndex)){
code<-sprintf("b[%d]", p)
code2<-sprintf("a[%d]", p2)
aValues5<- c(aValues5, coef(ZCmodel5)[code] + coef(ZCmodel5)[code2])
}
}
cat("Probability of correct response: ", logistic(aValues5))
Probability of correct response: 0.1032799 0.1387809 0.09696041 0.1206817 0.1345426 0.2884034 0.3618596
0.2742206 0.3256688 0.3536057 0.1742272 0.227917 0.1643623 0.2009048 0.2216571 0.1427072 0.1889058 0.1343367
0.1655259 0.1834627 0.1534873 0.202352 0.1445911 0.1776724 0.1966154 0.1563124 0.2058578 0.147281 0.1808476
0.2000468 0.1979313 0.2566554 0.1870283 0.2272397 0.2498617 0.1885693 0.2453655 0.1780675 0.2168662
0.2387746 0.2939099 0.3680432 0.2795627 0.3315553 0.3597278 0.1764885 0.2306805 0.1665214 0.2034271
0.2243667 0.213964 0.2758091 0.2024006 0.2449214 0.2686916 0.1245409 0.1659972 0.1170904 0.1449459 0.1610832
0.1563382 0.2058898 0.1473055 0.1808766 0.2000781 0.1906927 0.2479331 0.1800989 0.2192222 0.2412953
0.1601938 0.2106623 0.1509782 0.1852045 0.2047506 0.3073496 0.3830338 0.2926182 0.3458731 0.3745805
0.2326389 0.2978366 0.2203492 0.265386 0.2903785 0.1635239 0.2147732 0.1541519 0.1889375 0.2087767 0.2229356
0.2864289 0.2110176 0.2547703 0.279143 0.2118976 0.2733531 0.2004175 0.2426483 0.2662757 0.1042928 0.1400877
0.09791816 0.1218421 0.1358157 0.1784002 0.233013 0.1683472 0.2055577 0.2266543
Comparing models
> compare(ZCmodel0, ZCmodel1, ZCmodel4, ZCmodel3, ZCmodel5)
           WAIC pWAIC dWAIC weight    SE   dSE
ZCmodel0 1136.9   0.0   0.0   0.59 37.57    NA
ZCmodel1 1137.9 1.0 1.0 0.37 35.48 2.09
ZCmodel4 1142.2 4.9 5.3 0.04 35.74 4.43
ZCmodel3 1159.2 22.5 22.3 0.00 36.87 9.82
ZCmodel5 1162.6 26.3 25.6 0.00 37.19 10.89
Trial effects
Modify the code to look for a trial effect
Does a model that includes just trial effects (no card or
participant effects) beat the null model?
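One hedged sketch of what that modification could look like, assuming ZennerCards.csv contains a Trial column (the model name ZCmodelT and the slope name b are illustrative, not from the slides):

```r
# Hypothetical trial-effect model: logit(p) drifts linearly with trial number
ZCmodelT <- map(
  alist( Score ~ dbinom(1, p),
         logit(p) <- a + b*Trial,
         a ~ dnorm(0, 10),
         b ~ dnorm(0, 1)
  ), data= ZCdata )

# compare against the null model as in the earlier slides
compare(ZCmodel0, ZCmodelT)
```

A tighter prior on the slope b reflects that large per-trial drifts on the logit scale are implausible.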