Applied Bayesian Inference, KSU, April 29, 2012
§❹ The Bayesian Revolution:
Markov Chain Monte Carlo (MCMC)
Robert J. Tempelman
Simulation-based inference
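• Suppose you're interested in the following integral/expectation, where f(x) is a density and g(x) is a function:
$$E[g(x)] = \int_x g(x)\, f(x)\, dx$$
• You can draw random samples x1, x2, …, xn from f(x). Then compute
$$\hat{E}[g(x)] = \frac{1}{n}\sum_{i=1}^{n} g(x_i) \to E[g(x)] \quad \text{as } n \to \infty$$
• With Monte Carlo standard error:
$$\sqrt{\frac{1}{n}\cdot\frac{\sum_{i=1}^{n}\left(g(x_i)-\hat{E}[g(x)]\right)^2}{n-1}}$$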
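The Monte Carlo estimate and its standard error can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper name `mc_estimate` and the target E[x²] for x ~ N(0, 1) are assumptions chosen only because the true answer (1) is known.

```python
import math
import random


def mc_estimate(g, draw, n, rng):
    """Return (estimate, Monte Carlo standard error) of E[g(x)]."""
    values = [g(draw(rng)) for _ in range(n)]
    mean = sum(values) / n
    # Sample variance of g(x_i) with the n - 1 divisor, then divide by n.
    sample_var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(sample_var / n)


rng = random.Random(2012)
# Illustrative target (an assumption): E[x^2] = 1 when x ~ N(0, 1).
estimate, mcse = mc_estimate(lambda x: x * x, lambda r: r.gauss(0.0, 1.0), 100_000, rng)
print(estimate, mcse)
```

The standard error shrinks at rate 1/sqrt(n), so halving it takes roughly four times as many draws.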
Beauty of Monte Carlo methods
• You can determine the distribution of any
function of the random variable(s).
• Distribution summaries include:
– Means,
– Medians,
– Key percentiles (2.5%, 97.5%),
– Standard deviations,
– Etc.
• Generally more reliable than using “Delta
method” especially for highly non-normal
distributions.
Using method of composition for
sampling (Tanner, 1996).
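• Involves two stages of sampling.
• Example:
– Suppose Yi | λi ~ Poisson(λi):
$$\mathrm{Prob}(Y_i = y \mid \lambda_i) = \frac{e^{-\lambda_i}\lambda_i^{y}}{y!}$$
– In turn, λi | α, β ~ Gamma(α, β):
$$p(\lambda_i \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda_i^{\alpha-1}e^{-\beta\lambda_i}$$
– Then
$$\mathrm{Prob}(Y_i = y \mid \alpha, \beta) = \int_{R_{\lambda_i}} \mathrm{Prob}(Y_i = y \mid \lambda_i)\, p(\lambda_i \mid \alpha, \beta)\, d\lambda_i = \int_{R_{\lambda_i}} \frac{e^{-\lambda_i}\lambda_i^{y}}{y!}\,\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda_i^{\alpha-1}e^{-\beta\lambda_i}\, d\lambda_i = \frac{\Gamma(y+\alpha)}{y!\,\Gamma(\alpha)}\left(\frac{\beta}{1+\beta}\right)^{\alpha}\left(\frac{1}{1+\beta}\right)^{y}$$
i.e., a negative binomial distribution with mean α/β and variance (α/β)(1 + β⁻¹).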
Using method of composition for
sampling from negative binomial:
data new;
seed1 = 2;
alpha = 2; beta = 0.25;
do j = 1 to 10000;
call rangam(seed1,alpha,x);
lambda = x/beta;
call ranpoi(seed1,lambda,y);
output;
end;
run;
proc means mean var;
var y;
run;
1. Draw λi | α, β ~ Gamma(α, β).
2. Draw Yi | λi ~ Poisson(λi).
The MEANS Procedure
Variable    Mean      Variance
y           7.9749    39.2638
E(y) = α/β = 2/0.25 = 8
Var(y) = (α/β)(1 + β⁻¹) = 8 × (1 + 4) = 40
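The same two-stage draw can be sketched in Python with only the standard library. This is an illustrative translation rather than the author's code: the helper names are hypothetical, and since the standard library has no Poisson sampler, a simple multiplication-method sampler (Knuth) is assumed to be adequate for the moderate rates that arise here.

```python
import math
import random


def draw_poisson(lam, rng):
    """Knuth's multiplication method; fine for the moderate rates used here."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def draw_negbin_by_composition(alpha, beta, rng):
    # Stage 1: lambda ~ Gamma(shape alpha, rate beta), i.e. scale 1/beta.
    lam = rng.gammavariate(alpha, 1.0 / beta)
    # Stage 2: y | lambda ~ Poisson(lambda).
    return draw_poisson(lam, rng)


rng = random.Random(2)
alpha, beta = 2.0, 0.25
draws = [draw_negbin_by_composition(alpha, beta, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, var)
```

The sample mean and variance should land near the theoretical values 8 and 40, up to Monte Carlo error.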
Another example? Student t.
1. Draw λi | ν ~ Gamma(ν/2, ν/2).
2. Draw ti | λi ~ Normal(0, 1/λi).
Then t ~ Student t with ν degrees of freedom.
data new;
seed1 = 29523; df=4;
do j = 1 to 100000;
call rangam(seed1,df/2,x);
lambda = x/(df/2);
t = rannor(seed1)/sqrt(lambda);
output;
end;
run;
proc means mean var p5 p95;
var t;
run;
Variable    Mean        Variance    5th Pctl    95th Pctl
t           -0.00524    2.011365    -2.1376     2.122201
data new;
t5 = tinv(.05,4);
t95 = tinv(.95,4);
run;
proc print;
run;
Obs    t5         t95
1      -2.1319    2.13185
Expectation-Maximization (EM)
• Ok, I know that EM is NOT a simulation-based
inference procedure.
– However, it is based on data augmentation.
• Important progenitor of Markov Chain Monte Carlo
(MCMC) methods
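– Recall the plant genetics example
$$L(\theta \mid y) = \frac{n!}{y_1!\,y_2!\,y_3!\,y_4!}\left(\frac{2+\theta}{4}\right)^{y_1}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}$$
$$L(\theta \mid y) \propto \left(\frac{2+\theta}{4}\right)^{y_1}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}$$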
Data augmentation
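• Augment "data" by splitting the first cell into two cells with probabilities ½ and θ/4, giving 5 categories (x is the count in the ½ cell):
$$L(\theta \mid y, x) \propto \left(\frac{1}{2}\right)^{x}\left(\frac{\theta}{4}\right)^{y_1-x}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}$$
$$L(\theta \mid y, x) \propto \theta^{\,y_1-x+y_4}\,(1-\theta)^{\,y_2+y_3}$$
Looks like a Beta distribution to me!
$$p(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}$$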
Data augmentation (cont’d)
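• So joint distribution of "complete" data:
$$p(y, x \mid \theta) = \frac{n!}{x!\,(y_1-x)!\,y_2!\,y_3!\,y_4!}\left(\frac{2}{4}\right)^{x}\left(\frac{\theta}{4}\right)^{y_1-x}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}$$
• Consider the part just including the "missing data" (binomial):
$$p(x \mid y, \theta) = \binom{y_1}{x}\left(\frac{2}{\theta+2}\right)^{x}\left(\frac{\theta}{\theta+2}\right)^{y_1-x}$$
$$E(x \mid y, \theta) = y_1\,\frac{2}{\theta+2}$$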
Expectation-Maximization.
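• Start with complete log-likelihood:
$$\log L(\theta \mid y, x) = \log\left[\frac{n!}{x!\,(y_1-x)!\,y_2!\,y_3!\,y_4!}\left(\frac{2}{4}\right)^{x}\left(\frac{\theta}{4}\right)^{y_1-x}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}\right]$$
$$\log L(\theta \mid y, x) = \text{const} + (y_1 - x + y_4)\log\left(\frac{\theta}{4}\right) + (y_2 + y_3)\log\left(\frac{1-\theta}{4}\right)$$
• 1. Expectation (E-step), dropping terms that do not involve θ:
$$E_x\left[\log L(\theta \mid y, x)\right] = \left(y_1 - E(x) + y_4\right)\log(\theta) + (y_2+y_3)\log(1-\theta)$$
$$= \left(y_1 - y_1\frac{2}{\hat\theta^{[t]}+2} + y_4\right)\log(\theta) + (y_2+y_3)\log(1-\theta)$$
$$= \left(y_1\frac{\hat\theta^{[t]}}{\hat\theta^{[t]}+2} + y_4\right)\log(\theta) + (y_2+y_3)\log(1-\theta)$$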
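• 2. Maximization step
– Use first or second derivative methods to maximize E_x[log L(θ | y, x)]:
$$\frac{\partial E_x\left[\log L(\theta \mid y, x)\right]}{\partial\theta} = \frac{y_1\frac{\hat\theta^{[t]}}{\hat\theta^{[t]}+2} + y_4}{\theta} - \frac{y_2+y_3}{1-\theta}$$
– Set to 0:
$$\hat\theta^{[t+1]} = \frac{y_1\frac{\hat\theta^{[t]}}{\hat\theta^{[t]}+2} + y_4}{y_1\frac{\hat\theta^{[t]}}{\hat\theta^{[t]}+2} + y_2 + y_3 + y_4}$$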
Recall the data
Genotype      Probability         Data (Counts)
Prob(A_B_)    p1 = (2 + θ)/4      y1 = 1997
Prob(aaB_)    p2 = (1 − θ)/4      y2 = 906
Prob(A_bb)    p3 = (1 − θ)/4      y3 = 904
Prob(aabb)    p4 = θ/4            y4 = 32
0 ≤ θ ≤ 1
θ → 0: close linkage in repulsion
θ → 1: close linkage in coupling
PROC IML code:
proc iml;
y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
theta = 0.20; /* Starting value */
do iter = 1 to 20;
Ex2 = y1*(theta)/(theta+2); /* E-step */
theta = (Ex2+y4)/(Ex2+y2+y3+y4); /* M-step */
print iter theta;
end;
run;
Slower than Newton-Raphson/Fisher scoring… but generally more robust to poorer starting values.
iter    theta
1       0.1055303
2       0.0680147
3       0.0512031
4       0.0432646
5       0.0394234
6       0.0375429
7       0.036617
8       0.0361598
9       0.0359338
10      0.0358219
11      0.0357666
12      0.0357392
13      0.0357256
14      0.0357189
15      0.0357156
16      0.0357139
17      0.0357131
18      0.0357127
19      0.0357125
20      0.0357124
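For readers without SAS, the EM iteration can be sketched in Python. This is a direct transcription of the E-step and M-step of the IML code, under the assumption that the same starting value (0.20) and 20 iterations are used; the function name `em_linkage` is hypothetical.

```python
def em_linkage(y1, y2, y3, y4, theta=0.20, n_iter=20):
    """Run EM for theta in the linkage example; returns the list of iterates."""
    path = []
    for _ in range(n_iter):
        # E-step: expected count in the theta/4 part of the first cell, E(y1 - x).
        expected = y1 * theta / (theta + 2.0)
        # M-step: closed-form maximizer of the expected complete-data log-likelihood.
        theta = (expected + y4) / (expected + y2 + y3 + y4)
        path.append(theta)
    return path


path = em_linkage(1997, 906, 904, 32)
print(path[0], path[-1])
```

The first and last iterates should agree with the SAS listing (about 0.1055 and 0.0357).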
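How to derive an asymptotic standard error using EM?
• From Louis (1982):
$$-\frac{\partial^2 \log p(\theta \mid Y)}{\partial\theta^2} = \int_{R_X} -\frac{\partial^2 \log p(\theta \mid Y, X)}{\partial\theta^2}\, p(X \mid \theta, Y)\, dX \;-\; \mathrm{var}_{X \mid \theta, Y}\!\left(\frac{\partial \log p(\theta \mid Y, X)}{\partial\theta}\right)$$
• Given:
$$p(\theta \mid y, x) \propto \theta^{\,y_1-x+y_4}\,(1-\theta)^{\,y_2+y_3}$$
$$\frac{\partial \log p(\theta \mid y, x)}{\partial\theta} = \frac{y_1 - x + y_4}{\theta} - \frac{y_2+y_3}{1-\theta}$$
• Only x is random given y and θ, so at θ = θ̂:
$$\mathrm{var}\!\left(\frac{\partial \log p(\theta \mid y, x)}{\partial\theta}\right) = \frac{\mathrm{var}(x \mid y, \hat\theta)}{\hat\theta^{2}}$$
$$\mathrm{var}(x \mid y, \hat\theta) = y_1\left(\frac{2}{2+\hat\theta}\right)\left(\frac{\hat\theta}{2+\hat\theta}\right) = 1997\left(\frac{2}{2+.0357}\right)\left(\frac{.0357}{2+.0357}\right) = 34.42$$
$$\mathrm{var}\!\left(\frac{\partial \log p(\theta \mid y, x)}{\partial\theta}\right) = \frac{34.42}{.0357^{2}} = 26987.41$$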
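Finish off
$$-\frac{\partial^2 \log p(\theta \mid Y)}{\partial\theta^2} = \int_{R_X} -\frac{\partial^2 \log p(\theta \mid Y, X)}{\partial\theta^2}\, p(X \mid \theta, Y)\, dX \;-\; \mathrm{var}_{X \mid \theta, Y}\!\left(\frac{\partial \log p(\theta \mid Y, X)}{\partial\theta}\right)$$
The variance term equals 26987.41.
• Now
$$-\frac{\partial^2 \log p(\theta \mid y, x)}{\partial\theta^2} = \frac{y_1 - x + y_4}{\theta^{2}} + \frac{y_2+y_3}{(1-\theta)^{2}}$$
$$E_x\!\left[-\frac{\partial^2 \log p(\theta \mid y, x)}{\partial\theta^2}\right]_{\theta=\hat\theta} = \frac{y_1 - y_1\frac{2}{2+\hat\theta} + y_4}{\hat\theta^{2}} + \frac{y_2+y_3}{(1-\hat\theta)^{2}} = 54507.06$$
• Hence:
$$-\frac{\partial^2 \log p(\theta \mid Y)}{\partial\theta^2}\bigg|_{\theta=\hat\theta} = 54507.06 - 26987.41 = 27519.65$$
$$se(\hat\theta) = \sqrt{\frac{1}{27519.65}} = .0060$$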
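The arithmetic of the Louis (1982) calculation can be checked with a short Python sketch. It assumes the EM estimate θ̂ = 0.0357124 from the earlier iteration table; the variable names are illustrative, and small differences from the slide's figures reflect rounding of θ̂.

```python
import math

y1, y2, y3, y4 = 1997, 906, 904, 32
theta = 0.0357124  # EM estimate reported in the iteration table

p = 2.0 / (2.0 + theta)        # probability of the 1/2 part of cell 1, given theta
expected_x = y1 * p            # E(x | y, theta)
var_x = y1 * p * (1.0 - p)     # binomial variance of x | y, theta

# Expected complete-data information.
complete_info = (y1 - expected_x + y4) / theta**2 + (y2 + y3) / (1.0 - theta) ** 2
# Missing information: variance of the complete-data score, var(x) / theta^2.
missing_info = var_x / theta**2
observed_info = complete_info - missing_info
se = math.sqrt(1.0 / observed_info)
print(complete_info, missing_info, observed_info, se)
```

The resulting standard error should round to .0060, matching the slide.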
Stochastic Data Augmentation
(Tanner, 1996)
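• Posterior Identity
$$p(\theta \mid y) = \int_{R_x} p(\theta \mid y, x)\, p(x \mid y)\, dx$$
• Predictive Identity
$$p(x \mid y) = \int_{R_\theta} p(x \mid y, \theta)\, p(\theta \mid y)\, d\theta$$
• Implies (writing φ for the dummy variable of integration over the parameter space)
$$p(\theta \mid y) = \int_{R_x} p(\theta \mid y, x)\int_{R_\phi} p(x \mid y, \phi)\, p(\phi \mid y)\, d\phi\, dx$$
$$= \int_{R_\phi}\left[\int_{R_x} p(\theta \mid y, x)\, p(x \mid y, \phi)\, dx\right] p(\phi \mid y)\, d\phi$$
$$= \int_{R_\phi} K(\theta, \phi \mid y)\, p(\phi \mid y)\, d\phi$$
K(θ, φ | y): transition function for Markov chain.
Suggests an "iterative" method of composition approach for sampling.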
Sampling strategy from p(θ|y)
• Start somewhere: (starting value θ = θ[0])
– Cycle 1: Sample x[1] from p(x | y, θ = θ[0]); then sample θ[1] from p(θ | y, x = x[1])
– Cycle 2: Sample x[2] from p(x | y, θ = θ[1]); then sample θ[2] from p(θ | y, x = x[2])
– etc.
– It's like sampling from "E-steps" and "M-steps"
What are these
Full Conditional Densities (FCD) ?
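• Recall "complete" likelihood function
$$p(y, x \mid \theta) = \frac{n!}{x!\,(y_1-x)!\,y_2!\,y_3!\,y_4!}\left(\frac{2}{4}\right)^{x}\left(\frac{\theta}{4}\right)^{y_1-x}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}$$
• Assume prior on θ is "flat", p(θ) ∝ 1:
$$p(\theta \mid y, x) \propto p(y, x \mid \theta)$$
• FCD:
$$p(\theta \mid y, x) \propto \theta^{(y_1-x+y_4+1)-1}\,(1-\theta)^{(y_2+y_3+1)-1}$$
Beta(α = y1 − x + y4 + 1, β = y2 + y3 + 1)
$$p(x \mid y, \theta) \propto \left(\frac{2}{\theta+2}\right)^{x}\left(\frac{\theta}{\theta+2}\right)^{y_1-x}$$
Binomial(n = y1, p = 2/(θ + 2))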
IML code for Chained Data Augmentation Example
proc iml;
seed1=4;
ncycle = 10000; /* total number of samples */
theta = j(ncycle,1,0);
y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
beta = y2+y3+1;
theta[1] = ranuni(seed1); /* starting value: initial draw between 0 and 1 */
do cycle = 2 to ncycle;
p = 2/(2+theta[cycle-1]);
xvar= ranbin(seed1,y1,p);
alpha = y1+y4-xvar+1;
xalpha = rangam(seed1,alpha);
xbeta = rangam(seed1,beta);
theta[cycle] = xalpha/(xalpha+xbeta); /* Beta(alpha,beta) = Gamma(alpha,1)/(Gamma(alpha,1)+Gamma(beta,1)) */
end;
create parmdata var {theta xvar };
append;
run;
data parmdata;
set parmdata;
cycle = _n_;
run;
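A rough Python equivalent of the chained data augmentation sampler is sketched below. It is not the author's code: the function name, the shorter chain (2000 cycles), and the 200-cycle burn-in are assumptions made to keep the example quick, and the binomial draw is done as a plain sum of Bernoulli trials to stay within the standard library.

```python
import random


def chained_data_augmentation(y1, y2, y3, y4, n_cycles, rng):
    theta = rng.random()  # starting value drawn from Uniform(0, 1)
    samples = []
    for _ in range(n_cycles):
        p = 2.0 / (2.0 + theta)
        # x | y, theta ~ Binomial(y1, p), drawn as a sum of Bernoulli trials.
        x = sum(rng.random() < p for _ in range(y1))
        # theta | y, x ~ Beta(y1 - x + y4 + 1, y2 + y3 + 1) under the flat prior.
        theta = rng.betavariate(y1 - x + y4 + 1, y2 + y3 + 1)
        samples.append(theta)
    return samples


rng = random.Random(4)
samples = chained_data_augmentation(1997, 906, 904, 32, n_cycles=2000, rng=rng)
kept = samples[200:]  # discard an assumed burn-in of 200 cycles
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)
```

With a longer chain the posterior mean should settle near the value reported on the histogram slide (about 0.0367).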
Trace Plot
proc gplot data=parmdata;
plot theta*cycle;
run;
[Figure: trace plot of theta by cycle, with the "bad" starting value marked.]
Should discard the first "few" samples to ensure that one is truly sampling from p(θ|y).
Starting value should have no impact.
Burn-in? "Convergence in distribution".
How to decide on this stuff? Cowles and Carlin (1996)
Throw away the first 1000 samples as "burn-in"
Histogram of samples post burn-in
proc univariate data=parmdata ;
where cycle > 1000;
var theta ;
histogram/normal(color=red mu=0.0357 sigma=0.0060);
run;
[Figure: histogram of samples (Bayesian inference) with the red normal curve overlaid (asymptotic likelihood inference).]
N                      9000
Posterior Mean         0.03671503
Post. Std Deviation    0.00607971
Quantiles for Normal Distribution
Percent    Observed (Bayesian)    Asymptotic (Likelihood)
5.0        0.02702                0.02583
95.0       0.04728                0.04557
Zooming in on Trace Plot
Hints of autocorrelation. Expected with Markov Chain Monte Carlo simulation schemes.
Number of drawn samples is NOT equal to the number of independent draws.
The greater the autocorrelation… the greater the problem… need more samples!
Sample autocorrelation
proc arima data=parmdata
plots(only)=series(acf);
where cycle > 1000;
identify var= theta nlag=1000
outcov=autocov ;
run;
Autocorrelation Check for White Noise
To Lag    Chi-Square    DF    Pr > ChiSq    Autocorrelations
6         3061.39       6     <.0001        0.497  0.253  0.141  0.079  0.045  0.029
How to estimate the effective number
of independent samples (ESS)
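• Consider posterior mean based on m samples:
$$\hat\theta_m = \frac{1}{m}\sum_{i=1}^{m}\theta^{[i]}$$
With independent draws,
$$\mathrm{var}(\hat\theta_m) = \frac{1}{m}\,\mathrm{var}\left(\theta^{[i]}\right), \quad i = 1, 2, \ldots, m$$
but autocorrelated MCMC draws need the estimator below.
• Initial positive sequence estimator (Geyer, 1992; Sorensen and Gianola, 1995):
$$\widehat{\mathrm{var}}(\hat\theta_m) = \frac{-\hat\gamma_m(0) + 2\sum_{j=0}^{t}\hat\Gamma_m(j)}{m}$$
$$\hat\Gamma_m(j) = \hat\gamma_m(2j) + \hat\gamma_m(2j+1), \quad j = 0, 1, \ldots$$
(sum of adjacent lag autocovariances)
$$\hat\gamma_m(j) = \frac{1}{m}\sum_{i=1}^{m-j}\left(\theta^{[i]} - \hat\theta_m\right)\left(\theta^{[i+j]} - \hat\theta_m\right)$$
(lag-j autocovariance; γ̂m(0) is the variance)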
Initial positive sequence estimator
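$$\widehat{\mathrm{var}}(\hat\theta_m) = \frac{-\hat\gamma_m(0) + 2\sum_{j=0}^{t}\hat\Gamma_m(j)}{m}$$
• Choose t such that all Γ̂m(j) > 0, j = 0, 1, …, t
• SAS PROC MCMC chooses a slightly different cutoff (see documentation).
$$ESS = \frac{\hat\gamma_m(0)}{\widehat{\mathrm{var}}(\hat\theta_m)}$$
Extensive autocorrelation across lags… leads to smaller ESS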
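The initial positive sequence estimator can be sketched in Python as follows. The function names are hypothetical, the test chain is a simulated AR(1) series (an assumption, not the slide's data), and this sketch stops summing before the first non-positive pair, which is one reasonable reading of the cutoff rule.

```python
import random


def autocov(x, lag, mean):
    """Lag-`lag` sample autocovariance with divisor m."""
    m = len(x)
    return sum((x[i] - mean) * (x[i + lag] - mean) for i in range(m - lag)) / m


def effective_sample_size(x, max_lag=1000):
    """Return (ESS, Monte Carlo standard error of the mean)."""
    m = len(x)
    mean = sum(x) / m
    gamma0 = autocov(x, 0, mean)
    total = 0.0
    for j in range(max_lag // 2):
        pair = autocov(x, 2 * j, mean) + autocov(x, 2 * j + 1, mean)
        if pair <= 0:  # stop at the first non-positive sum of adjacent lags
            break
        total += pair
    var_mean = (-gamma0 + 2.0 * total) / m
    return gamma0 / var_mean, var_mean ** 0.5


rng = random.Random(1)
# Illustrative AR(1) chain with lag-1 autocorrelation 0.5 (an assumption).
chain, value = [], 0.0
for _ in range(20_000):
    value = 0.5 * value + rng.gauss(0.0, 1.0)
    chain.append(value)
ess, mcse = effective_sample_size(chain)
print(ess, mcse)
```

For an AR(1) chain with autocorrelation ρ, the ESS is roughly m(1 − ρ)/(1 + ρ), so about one third of the 20,000 draws here.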
SAS code
Recall: 9000 MCMC post burn-in cycles.
%macro ESS1(data,variable,startcycle,maxlag);
data _null_;
set &data nobs=_n;
call symputx('nsample',_n);
run;
proc arima data=&data ;
where cycle > &startcycle; /* cycle index variable, as created in parmdata */
identify var= &variable nlag=&maxlag outcov=autocov ;
run;
proc iml;
use autocov;
read all var{'COV'} into cov;
nsample = &nsample;
nlag2 = nrow(cov)/2;
Gamma = j(nlag2,1,0);
cutoff = 0;
t = 0;
do while (cutoff = 0);
t = t+1;
Gamma[t] = cov[2*(t-1)+1] + cov[2*(t-1)+2]; /* sum of adjacent lag autocovariances */
if Gamma[t] < 0 then cutoff = 1;
if t = nlag2 then
do;
print "Too much autocorrelation";
print "Specify a larger max lag"; stop;
end;
end;
varm = (-Cov[1] + 2*sum(Gamma)) / nsample;
ESS = Cov[1]/varm; /* effective sample size */
stdm = sqrt(varm); /* Monte Carlo standard error */
parameter = "&variable";
print parameter stdm ESS;
run;
%mend ESS1;
Executing %ESS1
• %ESS1(parmdata,theta,1000,1000);
Recall: 1000 MCMC burn-in cycles.
parameter    stdm         ESS
theta        0.0001116    2967.1289
i.e., information equivalent to drawing 2967 independent draws from the density.
How large of an ESS should I target?
• Routinely…in the thousands or greater.
• Depends on what you want to estimate.
– Recommend no less than 100 for estimating “typical”
location parameters: mean, median, etc.
– Several times that for “typical” dispersion parameters
like variance.
• Want to provide key percentiles?
– i.e., 2.5th , 97.5th percentiles? Need to have ESS in the
thousands!
– See Raftery and Lewis (1992) for further direction.
Worthwhile to consider this sampling
strategy?
• Not too much difference, if any, with likelihood inference.
• But how about smaller samples?
– e.g., y1=200, y2=91, y3=90, y4=3
– Different story
Gibbs sampling: origins
(Geman and Geman, 1984).
• Gibbs sampling was first developed in statistical physics in relation to a spatial inference problem
– Problem: the true image θ was corrupted by a stochastic process to produce an observable image y (data)
• Objective: restore or estimate the true image θ in the light of the observed image y.
– Inference on θ was based on the Markov random field joint posterior distribution, through successively drawing from updated FCD which were rather easy to specify.
– These FCD each happened to be Gibbs distributions.
• The misnomer has been used since to describe a rather general process.
Gibbs sampling
• Extension of chained data augmentation for the case of several unknown parameters.
• Consider p = 3 unknown parameters: θ1, θ2, θ3
• Joint posterior density p(θ1, θ2, θ3 | y)
• Gibbs sampling: MCMC sampling strategy where all FCD are recognizable:
p(θ1 | θ2, θ3, y)
p(θ2 | θ1, θ3, y)
p(θ3 | θ1, θ2, y)
Gibbs sampling: the process
1) Start with some "arbitrary" starting values (but within allowable parameter space): θ(0) = (θ1(0), θ2(0), θ3(0))'
2) Draw θ1(1) from p(θ1 | θ2 = θ2(0), θ3 = θ3(0), y)
3) Draw θ2(1) from p(θ2 | θ1 = θ1(1), θ3 = θ3(0), y)
4) Draw θ3(1) from p(θ3 | θ1 = θ1(1), θ2 = θ2(1), y)
Steps 2-4 constitute one cycle of Gibbs sampling.
One cycle = one random draw from p(θ1, θ2, θ3 | y)
5) Repeat steps 2)-4) m times.
m: length of Gibbs chain
General extension of Gibbs sampling
• When there are d parameters and/or blocks of parameters: θ' = [θ1' θ2' … θd']
• Again specify starting values: θ1(0), θ2(0), …, θd(0)
• Sample from the FCD's in cycle k+1:
Sample θ1(k+1) from p(θ1 | θ2(k), …, θd(k), y)
Sample θ2(k+1) from p(θ2 | θ1(k+1), θ3(k), …, θd(k), y)
…
Sample θd(k+1) from p(θd | θ1(k+1), θ2(k+1), …, θd−1(k+1), y)
Generically, sample θi from p(θi | θ−i, y)
• Throw away enough burn-in samples (k<m)
• θ(k+1), θ(k+2), …, θ(m) are a realization of a Markov chain with equilibrium distribution p(θ|y)
• The m−k joint samples θ(k+1), θ(k+2), …, θ(m) are then considered to be random drawings from the joint posterior density p(θ|y).
• Individually, the m−k samples θj(k+1), θj(k+2), …, θj(m) are random samples of θj from the marginal posterior density p(θj|y), j = 1, 2, …, d.
– i.e., θ−j are "nuisance" variables if interest is directed on θj
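Mixed model example with known variance components, flat prior on β.
• Recall:
$$\beta, u \mid \sigma_e^2, \sigma_u^2, y \sim N\left(\begin{bmatrix}\hat\beta\\ \hat u\end{bmatrix},\ \begin{bmatrix}X'R^{-1}X & X'R^{-1}Z\\ Z'R^{-1}X & Z'R^{-1}Z+G^{-1}\end{bmatrix}^{-1}\right)$$
– where
$$\begin{bmatrix}\hat\beta\\ \hat u\end{bmatrix} = \begin{bmatrix}X'R^{-1}X & X'R^{-1}Z\\ Z'R^{-1}X & Z'R^{-1}Z+G^{-1}\end{bmatrix}^{-1}\begin{bmatrix}X'R^{-1}y\\ Z'R^{-1}y\end{bmatrix}$$
• Write
$$\theta = \begin{bmatrix}\beta\\ u\end{bmatrix},\quad \hat\theta = \begin{bmatrix}\hat\beta\\ \hat u\end{bmatrix},\quad C = \begin{bmatrix}X'R^{-1}X & X'R^{-1}Z\\ Z'R^{-1}X & Z'R^{-1}Z+G^{-1}\end{bmatrix}$$
– i.e.
$$\theta \mid \sigma_e^2, \sigma_u^2, y \sim N\left(\hat\theta,\ C^{-1}\right)$$
ALREADY KNOW JOINT POSTERIOR DENSITY!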
FCD for mixed effects model with
known variance components
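• OK… really pointless to use MCMC here, but let's demonstrate. It can be shown that the FCD are:
$$\theta_i \mid y, \theta_{-i}, \sigma_e^2, \sigma_u^2 \sim N\left(\tilde\theta_i, \tilde v_i\right)$$
• where
$$\tilde\theta_i = \frac{b_i - \sum_{j=1, j\neq i}^{p+q} c_{ij}\theta_j}{c_{ii}}, \qquad \tilde v_i = \frac{1}{c_{ii}}$$
– b_i: ith row of
$$b = \begin{bmatrix}X'R^{-1}y\\ Z'R^{-1}y\end{bmatrix}$$
– c_ij: element in the ith row and jth column, and c_ii: ith diagonal element, of
$$C = \begin{bmatrix}X'R^{-1}X & X'R^{-1}Z\\ Z'R^{-1}X & Z'R^{-1}Z+G^{-1}\end{bmatrix}, \qquad \theta = \begin{bmatrix}\beta\\ u\end{bmatrix}$$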
Two ways to sample β and u
• 1. Block draw from θ | σe², σu², y ~ N(θ̂, C⁻¹)
– Faster MCMC mixing (less/no autocorrelation across MCMC cycles)
– But slower computing time (depending on dimension of θ).
• i.e. compute Cholesky of C
• Some alternative strategies available (Garcia-Cortes and Sorensen, 1995)
• 2. Series of univariate draws from θi | y, θ−i, σe², σu² ~ N(θ̃i, ṽi); i = 1, 2, …, p+q
– Faster computationally.
– Slower MCMC mixing
• Partial solution: "thinning the MCMC chain", e.g., save every 10 cycles rather than every cycle
Example: A split plot in time example
(Data from Kuehl, 2000, pg.493)
• Experiment designed to explore mechanisms
for early detection of phlebitis during
amiodarone therapy.
– Three intravenous treatments:
(A1) Amiodarone
(A2) the vehicle solution only
(A3) a saline solution.
– 5 rabbits/treatment in a completely randomized
design.
– 4 repeated measures/animal (30 min. intervals)
SAS data step
data ear;
input trt rabbit time temp;
y = temp; A = trt; B = time;
trtrabbit = compress(trt||'_'||rabbit);
wholeplot=trtrabbit;
cards;
1 1 1 -0.3
1 1 2 -0.2
1 1 3 1.2
1 1 4 3.1
1 2 1 -0.5
1 2 2 2.2
1 2 3 3.3
1 2 4 3.7
etc.
The data (“spaghetti plot”)
Profile (Interaction) means plots
A split plot model assumption for
repeated measures
[Figure: layout for Treatment 1. Rabbits 1, 2, and 3 are each measured at Times 1 to 4.]
RABBIT IS THE EXPERIMENTAL UNIT FOR TREATMENT
RABBIT IS THE BLOCK FOR TIME
Suppose CS assumption was
appropriate
CONDITIONAL SPECIFICATION: Model variation between experimental units (i.e. rabbits)
$$y_{ijk} = \mu + \alpha_i + u_{k(i)} + \beta_j + \alpha\beta_{ij} + e_{ijk}$$
$$u_{k(i)} \sim NIID\left(0, \sigma^2_{u(\alpha)}\right), \qquad e_{ijk} \sim NIID\left(0, \sigma^2_e\right)$$
– This is a partially nested or split-plot design.
• i.e., for treatments, rabbit is the experimental unit; for time, rabbit is the block!
Analytical (non-simulation) Inference
based on PROC MIXED
Let's assume "known" σ²u(α) = 0.10 and σ²e = 0.60.
Flat priors on fixed effects: p(β) ∝ 1.
title 'Split Plot in Time using Mixed';
title2 'Known Variance Components';
proc mixed data=ear noprofile;
class trt time rabbit;
model temp = trt time trt*time /solution;
random rabbit(trt);
parms (0.1) (0.6) /hold = 1,2;
ods output solutionf = solutionf;
run;
proc print data=solutionf;
where estimate ne 0;
run;
(Partial) Output
Obs    Effect       trt    time    Estimate    StdErr    DF
1      Intercept    _      _       0.2200      0.3742    12
2      trt          1      _       2.3600      0.5292    12
3      trt          2      _       -0.2200     0.5292    12
5      time         _      1       -0.9000     0.4899    36
6      time         _      2       0.02000     0.4899    36
7      time         _      3       -0.6400     0.4899    36
9      trt*time     1      1       -1.9200     0.6928    36
10     trt*time     1      2       -1.2200     0.6928    36
11     trt*time     1      3       -0.06000    0.6928    36
13     trt*time     2      1       0.3200      0.6928    36
14     trt*time     2      2       -0.5400     0.6928    36
15     trt*time     2      3       0.5800      0.6928    36
MCMC inference
• First set up dummy variables.
/* Based on the zero out last level restrictions */
proc transreg data=ear design order =data;
model class(trt|time / zero=last);
id y trtrabbit;
output out=recodedsplit;
run;
proc print data=recodedsplit (obs=10);
var intercept &_trgind;
run;
Corner parameterization is implicit in SAS linear models software.
Partial Output (First two rabbits)
Part of X matrix (full-rank)
Obs  _NAME_  Intercept  trt1  trt2  time1  time2  time3  trt1time1  trt1time2  trt1time3  trt2time1  trt2time2  trt2time3  trt  time  y     trtrabbit
1    -0.3    1          1     0     1      0      0      1          0          0          0          0          0          1    1     -0.3  1_1
2    -0.2    1          1     0     0      1      0      0          1          0          0          0          0          1    2     -0.2  1_1
3    1.2     1          1     0     0      0      1      0          0          1          0          0          0          1    3     1.2   1_1
4    3.1     1          1     0     0      0      0      0          0          0          0          0          0          1    4     3.1   1_1
5    -0.5    1          1     0     1      0      0      1          0          0          0          0          0          1    1     -0.5  1_2
6    2.2     1          1     0     0      1      0      0          1          0          0          0          0          1    2     2.2   1_2
7    3.3     1          1     0     0      0      1      0          0          1          0          0          0          1    3     3.3   1_2
8    3.7     1          1     0     0      0      0      0          0          0          0          0          0          1    4     3.7   1_2
9    -1.1    1          1     0     1      0      0      1          0          0          0          0          0          1    1     -1.1  1_3
10   2.4     1          1     0     0      1      0      0          1          0          0          0          0          1    2     2.4   1_3
MCMC using PROC IML
Full code available online.
proc iml;
seed = &seed;
nburnin = 5000; /* number of burn-in samples */
total = 200000; /* total number of Gibbs cycles beyond burn-in */
thin = 10; /* save every "thin"-th cycle */
ncycle = total/thin; /* leaving a total of ncycle saved samples */
Key subroutine (univariate sampling)
$$\theta_i \mid y, \theta_{-i}, \sigma_e^2, \sigma_u^2 \sim N\left(\tilde\theta_i, \tilde v_i\right);\quad i = 1, 2, \ldots, p+q$$
$$\tilde\theta_i = \frac{b_i - \sum_{j=1, j\neq i}^{p+q} c_{ij}\theta_j}{c_{ii}}, \qquad \tilde v_i = \frac{1}{c_{ii}}$$
start gibbs;
/* univariate Gibbs sampler */
do j = 1 to dim; /* dim = p + q */
/* generate from full conditionals for fixed and random effects */
solt = wry[j] - coeff[j,]*solution + coeff[j,j]*solution[j];
solt = solt/coeff[j,j];
vt = 1/coeff[j,j];
solution[j] = solt + sqrt(vt)*rannor(seed);
end;
finish gibbs;
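The univariate update can be sketched in Python on a toy system. The 2 × 2 coefficient matrix and right-hand side below are hypothetical (they are not the split-plot equations), chosen so that the exact posterior mean C⁻¹b = [0.2, 0.6] is easy to verify by hand.

```python
import random


def gibbs_cycle(coeff, rhs, solution, rng):
    """One cycle of univariate draws theta_i | rest ~ N(mean_i, 1 / c_ii)."""
    size = len(solution)
    for i in range(size):
        off_diagonal = sum(coeff[i][j] * solution[j] for j in range(size) if j != i)
        mean_i = (rhs[i] - off_diagonal) / coeff[i][i]
        sd_i = (1.0 / coeff[i][i]) ** 0.5
        solution[i] = rng.gauss(mean_i, sd_i)


# Hypothetical system: C = coeff, b = rhs, exact solution C^{-1} b = [0.2, 0.6].
coeff = [[2.0, 1.0], [1.0, 3.0]]
rhs = [1.0, 2.0]
rng = random.Random(7)
solution = [0.0, 0.0]
sums = [0.0, 0.0]
n_burn, n_keep = 1000, 20_000
for cycle in range(n_burn + n_keep):
    gibbs_cycle(coeff, rhs, solution, rng)
    if cycle >= n_burn:
        sums = [s + v for s, v in zip(sums, solution)]
posterior_means = [s / n_keep for s in sums]
print(posterior_means)
```

Because each element is updated in place, later draws within a cycle already condition on the new values of earlier elements, as in the IML subroutine.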
• Output samples to SAS data set called soldata
proc means mean median std data=soldata;
run;
ods graphics on;
%tadplot(data=soldata, var=_all_);
ods graphics off;
%tadplot is a SAS automacro suited for processing MCMC
samples.
Comparisons for fixed effects
MCMC (Some Monte Carlo error)
Variable      Mean      Median    Std Dev    N
int           0.218     0.218     0.374      20000
TRT1          2.365     2.368     0.526      20000
TRT2          -0.22     -0.215    0.532      20000
TIME1         -0.902    -0.903    0.495      20000
TIME2         0.0225    0.0203    0.491      20000
TIME3         -0.64     -0.643    0.488      20000
TRT1TIME1     -1.915    -1.916    0.692      20000
TRT1TIME2     -1.224    -1.219    0.69       20000
TRT1TIME3     -0.063    -0.066    0.696      20000
TRT2TIME1     0.321     0.316     0.701      20000
TRT2TIME2     -0.543    -0.54     0.696      20000
TRT2TIME3     0.58      0.589     0.694      20000
EXACT (PROC MIXED)
Effect       trt    time    Estimate    StdErr
Intercept    _      _       0.2200      0.3742
trt          1      _       2.3600      0.5292
trt          2      _       -0.2200     0.5292
time         _      1       -0.9000     0.4899
time         _      2       0.02000     0.4899
time         _      3       -0.6400     0.4899
trt*time     1      1       -1.9200     0.6928
trt*time     1      2       -1.2200     0.6928
trt*time     1      3       -0.06000    0.6928
trt*time     2      1       0.3200      0.6928
trt*time     2      2       -0.5400     0.6928
trt*time     2      3       0.5800      0.6928
%TADPLOT output on “intercept”.
[Figure panels: Trace Plot; Autocorrelation Plot; Posterior Density]
Marginal/Cell Means
• Effects on previous 2-3 slides not of particular
interest.
• Marginal means:
– Can derive using contrast vectors that are used to
compute least squares means in PROC
GLM/MIXED/GLIMMIX etc.
• lsmeans trt time trt*time / e;
– μAi: marginal mean for trt i
– μBj: marginal mean for time j
– μAiBj: cell mean for trt i time j.
Examples of marginal/cell means
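• Marginal means
$$\mu_{A1} = \mu + trt_1 + \frac{1}{n_{time}}\left(\sum_{j=1}^{n_{time}} time_j + \sum_{j=1}^{n_{time}} trt_1 time_j\right)$$
$$\mu_{B1} = \mu + time_1 + \frac{1}{n_{trt}}\left(\sum_{i=1}^{n_{trt}} trt_i + \sum_{i=1}^{n_{trt}} trt_i time_1\right)$$
• Cell mean
$$\mu_{A1B1} = \mu + trt_1 + time_1 + trt_1 time_1$$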
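As a worked check of the marginal and cell mean formulas, the sketch below plugs in the PROC MIXED fixed effect estimates from the known variance components analysis. The layout of the lists (last level of each factor set to zero) is an assumption that mirrors the corner parameterization; the function names are illustrative.

```python
mu = 0.22
trt = [2.36, -0.22, 0.0]              # trt 3 is the zeroed-out reference level
time = [-0.90, 0.02, -0.64, 0.0]      # time 4 is the zeroed-out reference level
trt_time = [
    [-1.92, -1.22, -0.06, 0.0],
    [0.32, -0.54, 0.58, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]


def cell_mean(i, j):
    return mu + trt[i] + time[j] + trt_time[i][j]


def trt_marginal_mean(i):
    # Average of the cell means across time levels.
    return sum(cell_mean(i, j) for j in range(len(time))) / len(time)


def time_marginal_mean(j):
    # Average of the cell means across treatment levels.
    return sum(cell_mean(i, j) for i in range(len(trt))) / len(trt)


print(trt_marginal_mean(0), time_marginal_mean(0), cell_mean(0, 0))
```

Up to floating-point rounding this gives 1.4, −0.5, and −0.24 for A1, B1, and A1B1. In an MCMC run the same functions would be applied to each saved sample to obtain posterior draws of the means.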
Marginal/cell (“LS”) means.
Left: MCMC (Monte Carlo error). Right: EXACT (PROC MIXED).
Variable    Mean      Median    Std Dev    trt    time    Estimate     Standard Error
A1          1.403     1.401     0.223      1      _       1.4          0.2236
A2          -0.293    -0.292    0.223      2      _       -0.29        0.2236
A3          -0.162    -0.162    0.224      3      _       -0.16        0.2236
B1          -0.501    -0.5      0.216      _      1       -0.5         0.216
B2          0.366     0.365     0.213      _      2       0.3667       0.216
B3          0.465     0.466     0.217      _      3       0.4667       0.216
B4          0.932     0.931     0.216      _      4       0.9333       0.216
A1B1        -0.234    -0.231    0.373      1      1       -0.24        0.3742
A1B2        1.382     1.382     0.371      1      2       1.38         0.3742
A1B3        1.88      1.878     0.374      1      3       1.88         0.3742
A1B4        2.583     2.583     0.372      1      4       2.58         0.3742
A2B1        -0.584    -0.585    0.375      2      1       -0.58        0.3742
A2B2        -0.524    -0.526    0.373      2      2       -0.52        0.3742
A2B3        -0.062    -0.058    0.373      2      3       -0.06        0.3742
A2B4        -0.003    -0.005    0.377      2      4       -3.61E-16    0.3742
A3B1        -0.684    -0.684    0.377      3      1       -0.68        0.3742
A3B2        0.24      0.242     0.374      3      2       0.24         0.3742
A3B3        -0.422    -0.423    0.376      3      3       -0.42        0.3742
A3B4        0.218     0.218     0.374      3      4       0.22         0.3742
Posterior densities of a1, b1, a1b1.
Dotted lines: normal
density inferences based
on PROC MIXED
Closed lines: MCMC
Generalized linear mixed models
(Probit Link Model)
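• Stage 1:
$$p(y \mid \beta, u) = \prod_{i=1}^{n}\Phi\left(x_i'\beta + z_i'u\right)^{y_i}\left[1 - \Phi\left(x_i'\beta + z_i'u\right)\right]^{1-y_i}$$
• Stage 2:
$$p(u \mid \sigma_u^2) = \frac{1}{(2\pi)^{q/2}\left|A\sigma_u^2\right|^{1/2}}\exp\left(-\frac{1}{2\sigma_u^2}u'A^{-1}u\right)$$
$$p(\beta) \propto \text{constant}$$
• Stage 3: p(σu²)
$$p(\beta, u, \sigma_u^2 \mid y) \propto p(y \mid \beta, u)\, p(u \mid \sigma_u^2)\, p(\beta)\, p(\sigma_u^2)$$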
Rethinking prior on β
• i.e. p(β) ∝ constant
– Might not be the best idea for binary data, especially when the data are "sparse"
• Animal breeders call this the "extreme category problem"
– e.g., if all of the responses in a fixed effects subclass are either 1 or 0, then the ML/PM of the corresponding marginal mean will approach −/+ ∞.
– PROC LOGISTIC has the FIRTH option for this very reason.
• Alternative: β ~ N(β0, Iσβ²)
– Typically, β0 = 0
– 16 < σβ² < 50 is probably sufficient on the underlying latent scale (conditionally N(0,1))
Recall Latent Variable Concept
(Albert and Chib, 1993)
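• Recall Prob(Yi = 1 | β, u) = Φ(xi'β + zi'u)
• Latent variable (liability) ℓi:
$$\ell_i \mid \beta, u \sim N\left(x_i'\beta + z_i'u,\ 1\right), \quad\text{with density } \phi\left(\ell_i - (x_i'\beta + z_i'u)\right)$$
• Suppose for animal i, xi'β + zi'u = −1. Then Prob(Yi = 1) = Φ(−1) = 0.159
[Figure: standard normal density plotted against liability, from −3 to 3.]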
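Data augmentation with ℓ = {ℓi}
• i.e.
$$p(\ell \mid \beta, u) = \prod_{i=1}^{n} p(\ell_i \mid \beta, u) = \prod_{i=1}^{n}\phi\left(\ell_i - (x_i'\beta + z_i'u)\right)$$
$$\mathrm{Prob}(Y_i = 1 \mid \ell_i) = \begin{cases}1 & \text{if } \ell_i > 0\\ 0 & \text{otherwise}\end{cases} \qquad \mathrm{Prob}(Y_i = 0 \mid \ell_i) = \begin{cases}1 & \text{if } \ell_i \le 0\\ 0 & \text{otherwise}\end{cases}$$
$$\mathrm{Prob}(Y_i = y_i \mid \ell_i) = I(\ell_i \le 0)\,I(y_i = 0) + I(\ell_i > 0)\,I(y_i = 1)$$
I(.): indicator function
The distribution of Y becomes degenerate (a point mass) conditional on ℓ.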
Rewrite hierarchical model
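• Stage 1a)
$$p(y \mid \ell) = \prod_{i=1}^{n}\left[I(\ell_i \le 0)\,I(y_i = 0) + I(\ell_i > 0)\,I(y_i = 1)\right]$$
• Stage 1b)
$$p(\ell \mid \beta, u) = \prod_{i=1}^{n} p(\ell_i \mid \beta, u) = \prod_{i=1}^{n}\phi\left(\ell_i - (x_i'\beta + z_i'u)\right)$$
• Those two stages define the likelihood function
$$p(y \mid \beta, u) = \int p(y, \ell \mid \beta, u)\, d\ell = \int p(y \mid \ell)\, p(\ell \mid \beta, u)\, d\ell$$
$$= \prod_{i=1}^{n}\int \mathrm{Prob}(Y_i = y_i \mid \ell_i)\, p(\ell_i \mid \beta, u)\, d\ell_i$$
$$= \prod_{i=1}^{n}\Phi\left(x_i'\beta + z_i'u\right)^{y_i}\left[1 - \Phi\left(x_i'\beta + z_i'u\right)\right]^{1-y_i}$$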
Joint Posterior Density
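• Now
$$p(\ell, \beta, u, \sigma_u^2 \mid y) \propto p(y \mid \ell)\, p(\ell \mid \beta, u)\, p(\beta)\, p(u \mid \sigma_u^2)\, p(\sigma_u^2)$$
• Let's for now assume known σu²:
$$p(\ell, \beta, u \mid y, \sigma_u^2) \propto p(y \mid \ell)\, p(\ell \mid \beta, u)\, p(\beta)\, p(u \mid \sigma_u^2)$$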
FCD
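• Liabilities:
$$p(\ell_i \mid \beta, u, \sigma_u^2, y) \propto \mathrm{Prob}(Y_i = y_i \mid \ell_i)\, p(\ell_i \mid \beta, u)$$
$$p(\ell_i \mid \beta, u, \sigma_u^2, y) \propto \phi\left(\ell_i - (x_i'\beta + z_i'u)\right) I(\ell_i > 0) \quad\text{if } y_i = 1$$
$$p(\ell_i \mid \beta, u, \sigma_u^2, y) \propto \phi\left(\ell_i - (x_i'\beta + z_i'u)\right) I(\ell_i \le 0) \quad\text{if } y_i = 0$$
i.e., draw from truncated normals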
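One simple way to draw from these truncated normals is the inverse-CDF method, sketched below in Python. This is an illustrative choice and not necessarily what the author's IML code does; `draw_liability` and the linear predictor value are hypothetical, and the clamping constant only guards against evaluating the inverse CDF at exactly 0 or 1.

```python
import random
from statistics import NormalDist

std_normal = NormalDist()


def draw_liability(eta, y, rng):
    """Draw l ~ N(eta, 1) truncated to l > 0 if y == 1, or l <= 0 if y == 0."""
    p_nonpositive = std_normal.cdf(-eta)  # P(l <= 0) for the untruncated normal
    if y == 1:
        u = rng.uniform(p_nonpositive, 1.0)
    else:
        u = rng.uniform(0.0, p_nonpositive)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return eta + std_normal.inv_cdf(u)


rng = random.Random(63)
eta = -1.0  # hypothetical value of x_i'beta + z_i'u
successes = [draw_liability(eta, 1, rng) for _ in range(1000)]
failures = [draw_liability(eta, 0, rng) for _ in range(1000)]
print(min(successes), max(failures))
```

Inverse-CDF sampling loses accuracy when the truncation point is far in the tail; dedicated rejection samplers are usually preferred in that case.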
FCD (cont’d)
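• Fixed and random effects
$$p(\beta, u \mid \sigma_u^2, \ell, y) \propto p(\ell \mid \beta, u)\, p(u \mid \sigma_u^2)$$
$$\propto \exp\left(-\frac{(\ell - X\beta - Zu)'(\ell - X\beta - Zu)}{2}\right)\exp\left(-\frac{u'A^{-1}u}{2\sigma_u^2}\right)$$
$$\beta, u \mid \sigma_u^2, \ell, y \sim N\left(\begin{bmatrix}\hat\beta\\ \hat u\end{bmatrix},\ \begin{bmatrix}X'X & X'Z\\ Z'X & Z'Z+G^{-1}\end{bmatrix}^{-1}\right), \qquad G^{-1} = A^{-1}\sigma_u^{-2}$$
where
$$\begin{bmatrix}\hat\beta\\ \hat u\end{bmatrix} = \begin{bmatrix}X'X & X'Z\\ Z'X & Z'Z+A^{-1}\sigma_u^{-2}\end{bmatrix}^{-1}\begin{bmatrix}X'\ell\\ Z'\ell\end{bmatrix}$$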
Alternative Sampling strategies for
fixed and random effects
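• 1. Joint multivariate draw from β, u | σu², y, ℓ
– faster mixing… but computationally expensive?
• 2. Univariate draws from FCD using partitioned matrix results.
– Refer to Slides # 36, 37, 49 (FCD for the mixed effects model; two ways to sample β and u; key subroutine for univariate sampling)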
– Slower mixing.
Recall “binarized” RCBD
Litter    Diet 1     Diet 2     Diet 3     Diet 4     Diet 5
1         79.5>75    80.9>75    79.1>75    88.6>75    95.9>75
2         70.9       81.8>75    70.9       88.6>75    85.9>75
3         76.8>75    86.4>75    90.5>75    89.1>75    83.2>75
4         75.9>75    75.5>75    62.7       91.4>75    87.7>75
5         77.3>75    77.3>75    69.5       75.0       74.5
6         66.4       73.2       86.4>75    79.5>75    72.7
7         59.1       77.7>75    72.7       85.0>75    90.9>75
8         64.1       72.3       73.6       75.9>75    60.0
9         74.5       81.4>75    64.5       75.5>75    83.6>75
10        67.3       82.3>75    65.9       70.5       63.2
MCMC analysis
• 5000 burn-in cycles
• 500,000 additional cycles
– Saving every 10: 50,000 saved cycles
• Full conditional univariate sampling on fixed
and random effects.
• "Known" σu² = 0.50.
• Remember… no σe².
Fixed Effect Comparison on inferences (conditional on "known" σu² = 0.50)
β = [μ α1 α2 α3 α4 α5]'
• MCMC
Variable     Mean      Median    Std Dev    N
intercept    0.349     0.345     0.506      50000
DIET1        -0.659    -0.654    0.64       50000
DIET2        0.761     0.75      0.682      50000
DIET3        -1        -0.993    0.649      50000
DIET4        0.76      0.753     0.686      50000
• PROC GLIMMIX
Solutions for Fixed Effects
Effect       diet    Estimate    Standard Error
Intercept            0.3097      0.4772
diet         1       -0.5935     0.5960
diet         2       0.6761      0.6408
diet         3       -0.9019     0.6104
diet         4       0.6775      0.6410
diet         5       0           .
Marginal Mean Comparisons
• Based on K'β
$$K' = \begin{bmatrix}1 & 1 & 0 & 0 & 0\\ 1 & 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 1 & 0\\ 1 & 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 0 & 0\end{bmatrix}$$
MCMC
Variable    Mean      Median    Std Dev    N
mm1         -0.31     -0.302    0.499      50000
mm2         1.11      1.097     0.562      50000
mm3         -0.651    -0.644    0.515      50000
mm4         1.109     1.092     0.563      50000
mm5         0.349     0.345     0.506      50000
PROC GLIMMIX
diet Least Squares Means
diet    Estimate    Standard Error
1       -0.2838     0.4768
2       0.9858      0.5341
3       -0.5922     0.4939
4       0.9872      0.5343
5       0.3097      0.4772
Diet 1 Marginal Mean (μ+α1)
Posterior Density discrepancy between MCMC and Empirical Bayes for μ+αi?
[Figure: Diet Marginal Means. Dotted lines: normal approximation based on PROC GLIMMIX. Closed lines: MCMC.]
Do we run the risk of overstating precision with conventional methods?
How about probabilities of success?
• i.e., Φ(K'β) or normal cdf of marginal means
MCMC
Variable    Mean     Median    Std Dev    N
prob1       0.391    0.381     0.173      20000
prob2       0.833    0.864     0.126      20000
prob3       0.282    0.26      0.157      20000
prob4       0.833    0.863     0.126      20000
prob5       0.623    0.635     0.173      20000
PROC GLIMMIX (DELTA METHOD)
diet    Estimate    Standard Error    Mean      Standard Error Mean
1       -0.2838     0.4768            0.3883    0.1827
2       0.9858      0.5341            0.8379    0.1311
3       -0.5922     0.4939            0.2769    0.1653
4       0.9872      0.5343            0.8382    0.1309
5       0.3097      0.4772            0.6216    0.1815
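The delta method columns can be reproduced with a short Python sketch: the probability is the normal cdf of the estimate, and its approximate standard error is the normal density at the estimate times the standard error of the estimate. The function name is hypothetical; the inputs are the diet 1 values from the PROC GLIMMIX output.

```python
from statistics import NormalDist

std_normal = NormalDist()


def delta_method_probability(estimate, std_err):
    """Return (Phi(estimate), first-order delta method standard error)."""
    prob = std_normal.cdf(estimate)
    prob_se = std_normal.pdf(estimate) * std_err
    return prob, prob_se


prob1, prob1_se = delta_method_probability(-0.2838, 0.4768)
print(round(prob1, 4), round(prob1_se, 4))
```

With MCMC no such linearization is needed: Φ is applied to each saved sample of the marginal mean and the resulting draws are summarized directly.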
Comparison of Posterior Densities
for Diet Marginal Mean Probabilities
Dotted lines: normal
approximation based on
PROC GLIMMIX
Closed lines: MCMC
Largest discrepancies along
the boundaries
Posterior density of Φ(μ+α1) & Φ(μ+α2)
[Figure panels: Φ(μ+α1); Φ(μ+α2)]
Posterior density of Φ(μ+α2) − Φ(μ+α1)
prob21_diff         Frequency    Percent
prob21_diff < 0     819          1.64
prob21_diff >= 0    49181        98.36
Probability(Φ(μ+α2) − Φ(μ+α1) < 0) = 0.0164
"Two-tailed" P-value = 2 × 0.0164 = 0.0328
How does that compare with PROC
GLIMMIX?
Estimates
Label                 Estimate    Standard Error    DF       t Value    Pr > |t|    Mean       Standard Error Mean
diet 1 lsmean         -0.2838     0.4768            10000    -0.60      0.5517      0.3883     0.1827
diet 2 lsmean         0.9858      0.5341            10000    1.85       0.0650      0.8379     0.1311
diet1 vs diet2 dif    -1.2697     0.6433            10000    -1.97      0.0484      Non-est    .
Recall, we assumed "known" σu²… hence a normal rather than t-distributed test statistic.
What if variance components
are not known?
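• Specify priors on variance components: Options?
– 1. Conjugate (Scaled Inverted Chi-Square), denoted as χ⁻²(νm, νm sm²):
$$p(\sigma_m^2 \mid \nu_m, s_m^2) = \frac{\left(\frac{\nu_m s_m^2}{2}\right)^{\nu_m/2}}{\Gamma\left(\frac{\nu_m}{2}\right)}\left(\sigma_m^2\right)^{-\left(\frac{\nu_m}{2}+1\right)}\exp\left(-\frac{\nu_m s_m^2}{2\sigma_m^2}\right);\quad m = u, e$$
– 2. Flat (and bounded as well?), i.e. νm = −2, sm² = 0:
$$p(\sigma_m^2) \propto 1;\quad m = u, e$$
– 3. Gelman's (2006) prior, i.e. νm = −1, sm² = 0:
$$p(\sigma_m) \sim \text{Uniform}(0, A) \;\Rightarrow\; p(\sigma_m^2) \propto \left(\sigma_m^2\right)^{-\frac{1}{2}};\quad 0 < \sigma_m^2 < A^2$$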
§ /77
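Relationship between Scaled Inverted Chi-Square & Inverted Gamma
• Scaled Inverted Chi-square:
$$p(\sigma^2 \mid \nu, s^2) = \frac{(\nu s^2/2)^{\nu/2}}{\Gamma(\nu/2)} (\sigma^2)^{-(\nu/2+1)} \exp\left(-\frac{\nu s^2}{2\sigma^2}\right)$$
$$E(\sigma^2 \mid \nu, s^2) = \frac{\nu s^2}{\nu - 2}; \quad \nu > 2$$
$$Var(\sigma^2 \mid \nu, s^2) = \frac{2\nu^2 (s^2)^2}{(\nu-2)^2(\nu-4)}; \quad \nu > 4$$
Gelman's prior: $\nu_m = -1,\ s_m^2 = 0$
• Inverted Gamma:
$$p(\sigma^2 \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} (\sigma^2)^{-(\alpha+1)} \exp\left(-\frac{\beta}{\sigma^2}\right)$$
$$E(\sigma^2) = \frac{\beta}{\alpha - 1}; \quad \alpha > 1$$
$$Var(\sigma^2) = \frac{\beta^2}{(\alpha-1)^2(\alpha-2)}; \quad \alpha > 2$$
Gelman's prior: $\alpha = -\tfrac{1}{2},\ \beta = 0$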
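Matching the exponents of the two kernels makes the correspondence between the two parameterizations explicit. This is a standard identity, worked out here as a check rather than taken from the slide:

```latex
\[
(\sigma^2)^{-(\nu/2+1)} \exp\!\left(-\frac{\nu s^2}{2\sigma^2}\right)
 = (\sigma^2)^{-(\alpha+1)} \exp\!\left(-\frac{\beta}{\sigma^2}\right)
 \quad\Longleftrightarrow\quad
 \alpha = \frac{\nu}{2}, \qquad \beta = \frac{\nu s^2}{2}
\]
\[
E(\sigma^2) = \frac{\beta}{\alpha-1} = \frac{\nu s^2/2}{(\nu-2)/2} = \frac{\nu s^2}{\nu-2},
\qquad
Var(\sigma^2) = \frac{\beta^2}{(\alpha-1)^2(\alpha-2)}
 = \frac{\nu^2 s^4/4}{\frac{(\nu-2)^2}{4}\cdot\frac{\nu-4}{2}}
 = \frac{2\nu^2 s^4}{(\nu-2)^2(\nu-4)}
\]
```

Setting ν = -1 and s² = 0 gives α = -1/2 and β = 0, the inverted gamma form of Gelman's prior.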
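Gibbs sampling and mixed effects models
• Recall the following hierarchical model:
$$p(y \mid \beta, u, \sigma_e^2) \propto (\sigma_e^2)^{-n/2} \exp\left(-\frac{(y - X\beta - Zu)'(y - X\beta - Zu)}{2\sigma_e^2}\right)$$
$$p(u \mid \sigma_u^2) \propto (\sigma_u^2)^{-q/2} \exp\left(-\frac{u'A^{-1}u}{2\sigma_u^2}\right)$$
$$p(\sigma_u^2 \mid \nu_u, s_u^2) \propto (\sigma_u^2)^{-(\nu_u/2+1)} \exp\left(-\frac{\nu_u s_u^2}{2\sigma_u^2}\right)$$
$$p(\sigma_e^2 \mid \nu_e, s_e^2) \propto (\sigma_e^2)^{-(\nu_e/2+1)} \exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)$$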
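Joint Posterior Density and FCD
$$p(\beta, u, \sigma_u^2, \sigma_e^2 \mid y) \propto (\sigma_e^2)^{-n/2} \exp\left(-\frac{(y - X\beta - Zu)'(y - X\beta - Zu)}{2\sigma_e^2}\right) (\sigma_u^2)^{-q/2} \exp\left(-\frac{u'A^{-1}u}{2\sigma_u^2}\right) (\sigma_u^2)^{-(\nu_u/2+1)} \exp\left(-\frac{\nu_u s_u^2}{2\sigma_u^2}\right) (\sigma_e^2)^{-(\nu_e/2+1)} \exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)$$
• FCD for β and u: same as before: normal
• FCD for VC: $\chi^{-2}$, with $e = y - X\beta - Zu$:
$$p(\sigma_e^2 \mid ELSE, y) = \chi^{-2}(\nu_e + n,\ e'e + \nu_e s_e^2)$$
$$p(\sigma_u^2 \mid ELSE, y) = \chi^{-2}(\nu_u + q,\ u'A^{-1}u + \nu_u s_u^2)$$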
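A draw from a scaled inverted chi-square FCD can be generated by dividing its second argument by a chi-square variate with the stated degrees of freedom. The sketch below is an illustration, not the course code: the function names are hypothetical, the defaults ν = -1 and s² = 0 correspond to Gelman's prior, and the example assumes A = I so that u'A⁻¹u reduces to a plain sum of squares.

```python
import random

def draw_scaled_inv_chi2(df, scale_sum, rng=random):
    """Draw from chi^{-2}(df, scale_sum): scale_sum divided by a chi-square(df) variate."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-square(df) is Gamma(shape=df/2, scale=2)
    return scale_sum / chi2

def draw_variance(effects, nu=-1.0, s2=0.0, rng=random):
    """One Gibbs draw of a variance component given the current effects.

    For sigma2_e pass the residuals e = y - X*beta - Z*u;
    for sigma2_u pass u (assumes A = I, so u'A^{-1}u is the sum of squares).
    """
    ss = sum(v * v for v in effects)
    return draw_scaled_inv_chi2(nu + len(effects), ss + nu * s2, rng)

random.seed(2012)
residuals = [0.4, -1.1, 0.7, 0.2, -0.5, 1.3, -0.9, 0.1]   # hypothetical current residuals
print(draw_variance(residuals))
```

Within a full Gibbs sampler this draw alternates with the normal draws for β and u, each conditional on the most recent values of everything else.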
Back to Split Plot in Time Example
• Empirical Bayes (EGLS based on REML)

/* Empirical Bayes: REML variance components, then EGLS for the fixed effects */
title 'Split Plot in Time using Mixed';
title2 'Unknown Variance Components';
proc mixed data=ear covtest;
   class trt time rabbit;
   model temp = trt time trt*time / solution;
   random rabbit(trt);                 /* whole-plot error */
   ods output solutionf = solutionf;   /* save fixed-effect solutions */
run;
proc print data=solutionf;
   where estimate ne 0;                /* skip levels whose estimate is set to zero */
run;

• Fully Bayes:
– 5000 burn-in cycles
– 200000 subsequent cycles
– Save every 10th cycle post burn-in
– Use Gelman's prior on VC
– Code available online
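The burn-in and thinning settings determine how many samples are saved: 200000 post burn-in cycles saved every 10th cycle leave 20000 samples, which matches the N reported in the MCMC summaries. A minimal sketch of that bookkeeping; `thin_chain` is a hypothetical helper, and keeping the last cycle of each block of 10 (rather than the first) is an assumption.

```python
def thin_chain(chain, burn_in, every):
    """Drop the first `burn_in` cycles, then keep every `every`-th cycle."""
    return chain[burn_in + every - 1::every]

# Cycle numbers 1..205000: 5000 burn-in cycles followed by 200000 subsequent cycles
cycles = list(range(1, 205001))
saved = thin_chain(cycles, burn_in=5000, every=10)
print(len(saved), saved[0], saved[-1])
```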
Variance component inference

PROC MIXED: Covariance Parameter Estimates
Cov Parm      Estimate   Standard Error   Z Value   Pr > Z
rabbit(trt)   0.08336    0.09910          0.84      0.2001
Residual      0.5783     0.1363           4.24      <.0001

MCMC
Variable   Mean    Median   Std Dev   N
sigmau     0.127   0.0869   0.141     20000
sigmae     0.632   0.611    0.15      20000
MCMC plots
[Figure: MCMC plots for the random effects variance and the residual variance]
Estimated effects ± se (sd)

PROC MIXED
Effect      trt   time   Estimate   StdErr
Intercept   _     _       0.22      0.3638
trt         1     _       2.36      0.5145
trt         2     _      -0.22      0.5145
time        _     1      -0.9       0.481
time        _     2       0.02      0.481
time        _     3      -0.64      0.481
trt*time    1     1      -1.92      0.6802
trt*time    1     2      -1.22      0.6802
trt*time    1     3      -0.06      0.6802
trt*time    2     1       0.32      0.6802
trt*time    2     2      -0.54      0.6802
trt*time    2     3       0.58      0.6802

MCMC
Variable    Mean     Median   Std Dev   N
intercept    0.217    0.214   0.388     20000
TRT1         2.363    2.368   0.55      20000
TRT2        -0.22    -0.219   0.55      20000
TIME1       -0.898   -0.893   0.499     20000
TIME2        0.0206   0.0248  0.502     20000
TIME3       -0.64    -0.635   0.501     20000
TRT1TIME1   -1.924   -1.931   0.708     20000
TRT1TIME2   -1.222   -1.22    0.71      20000
TRT1TIME3   -0.057   -0.057   0.715     20000
TRT2TIME1    0.318    0.315   0.711     20000
TRT2TIME2   -0.54    -0.541   0.711     20000
TRT2TIME3    0.585    0.589   0.71      20000
Marginal ("Least Squares") Means

PROC MIXED: Least Squares Means
Label   Effect     trt   time   Estimate   Standard Error   DF
A1      trt        1             1.4000    0.2135           12
        trt        2            -0.2900    0.2135           12
        trt        3            -0.1600    0.2135           12
B1      time             1      -0.5000    0.2100           36
        time             2       0.3667    0.2100           36
        time             3       0.4667    0.2100           36
        time             4       0.9333    0.2100           36
A1B1    trt*time   1     1      -0.2400    0.3638           36
        trt*time   1     2       1.3800    0.3638           36
        trt*time   1     3       1.8800    0.3638           36
        trt*time   1     4       2.5800    0.3638           36
        trt*time   2     1      -0.5800    0.3638           36
        trt*time   2     2      -0.5200    0.3638           36
        trt*time   2     3      -0.06000   0.3638           36
        trt*time   2     4       4.44E-16  0.3638           36
        trt*time   3     1      -0.6800    0.3638           36
        trt*time   3     2       0.2400    0.3638           36
        trt*time   3     3      -0.4200    0.3638           36
        trt*time   3     4       0.2200    0.3638           36

MCMC
Variable   Mean     Median   Std Dev
A1          1.399    1.401   0.24
A2         -0.292   -0.29    0.237
A3         -0.16    -0.161   0.236
B1         -0.502   -0.501   0.224
B2          0.364    0.363   0.222
B3          0.467    0.466   0.224
B4          0.934    0.936   0.222
A1B1       -0.244   -0.246   0.389
A1B2        1.378    1.379   0.391
A1B3        1.882    1.88    0.391
A1B4        2.581    2.584   0.391
A2B1       -0.586   -0.586   0.393
A2B2       -0.526   -0.525   0.385
A2B3       -0.058   -0.054   0.387
A2B4        0.0031   0.0017  0.386
A3B1       -0.676   -0.678   0.388
A3B2        0.239    0.241   0.386
A3B3       -0.422   -0.427   0.392
A3B4        0.219    0.216   0.385
Posterior Densities of A1, B1, A1B1
Dotted lines: t densities based on estimates and standard errors from PROC MIXED
Solid lines: MCMC
How about fully Bayesian inference in generalized linear mixed models?
• Probit link GLMM.
– Extensions to handle unknown variance components are exactly the same given the augmented liability variables.
• i.e. scaled-inverted chi-square conjugate to σ²u.
– No "overdispersion" (σ²e) to contend with for binary data.
• But stay tuned for binomial/Poisson data!
Analysis of "binarized" RCBD data.
• Empirical Bayes

/* Empirical Bayes: probit GLMM with litter as a random effect */
title 'Posterior inference conditional on unknown VC';
proc glimmix data=binarize;
   class litter diet;
   model y = diet / covb solution dist=bin link=probit;
   random litter;
   lsmeans diet / diff ilink;          /* ilink: report on the probability scale */
   estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / ilink;
   estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / ilink;
   estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0;
run;

• Fully Bayes
– 10000 burn-in cycles
– 200000 cycles thereafter
– Saving every 10th cycle
– Gelman's prior on VC.
Inferences on VC

PROC GLIMMIX: Covariance Parameter Estimates
Method    Estimate   Standard Error
RSPL      0.5783     0.5021
Laplace   0.6488     0.6410
Quad      0.6662     0.6573

MCMC (Analysis Variable: sigmau)
Mean    Median   Std Dev   N
2.048   1.468    2.128     20000
Inferences on marginal means (μ+αi)

PROC GLIMMIX, Method = Laplace: diet Least Squares Means
diet   Estimate   Standard Error   DF
1      -0.3024    0.5159           36
2       1.0929    0.5964           36
3      -0.6428    0.5335           36
4       1.0946    0.5976           36
5       0.3519    0.5294           36

MCMC
Variable   Mean     Median   Std Dev   N
mm1        -0.297   -0.301   0.643     20000
mm2         1.322    1.283   0.716     20000
mm3        -0.697   -0.69    0.662     20000
mm4         1.319    1.285   0.72      20000
mm5         0.465    0.442   0.671     20000

MCMC standard deviations are larger: they take into account uncertainty on variance components
Posterior Densities of (μ+αi)
Dotted lines: t36 densities based on estimates and standard errors from PROC GLIMMIX (method=laplace)
Solid lines: MCMC
MCMC inferences on probabilities of "success" (based on Φ(μ+αi))
MCMC inferences on marginal probabilities:
(based on $\Phi\left(\frac{\mu + \alpha_i}{\sqrt{1 + \sigma_u^2}}\right)$)
Potentially big issues with empirical Bayes inference … dependent upon quality of VC inference & asymptotics!
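The marginal probability divides the linear predictor by sqrt(1 + σ²u) before applying the normal cdf, so it is pulled toward 0.5 as the variance component grows. A minimal sketch, applied sample by sample to saved MCMC output; the function names and the toy inputs are assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def conditional_probability(eta):
    """Phi(eta): probability of success for a 'typical' random effect of zero."""
    return STD_NORMAL.cdf(eta)

def marginal_probability(eta, sigma2_u):
    """Phi(eta / sqrt(1 + sigma2_u)): probability averaged over the random effects."""
    return STD_NORMAL.cdf(eta / (1.0 + sigma2_u) ** 0.5)

def marginal_probability_samples(eta_samples, sigma2_u_samples):
    """Transform paired MCMC samples of eta and sigma2_u into marginal probabilities."""
    return [marginal_probability(e, s) for e, s in zip(eta_samples, sigma2_u_samples)]

# Hypothetical values: the same eta under a small and a large variance component
print(marginal_probability(1.0, 0.1), marginal_probability(1.0, 3.0))
```

Because each saved σ²u sample enters the transformation, the posterior of the marginal probability reflects uncertainty in the variance component, which a plug-in estimate does not.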
Inference on Diet 1 vs. Diet 2 probabilities

PROC GLIMMIX: Estimates
Label                Mean      Standard Error Mean
diet 1 lsmean        0.3812    0.1966
diet 2 lsmean        0.8628    0.1309
diet1 vs diet2 dif   Non-est   .
P-value = 0.0559

MCMC
Variable     Mean    Median   Std Dev   N
Prob diet1   0.4     0.382    0.212     20000
Prob diet2   0.857   0.899    0.137     20000
Prob diff    0.457   0.464    0.207     20000

MCMC
prob21_diff         Frequency   Percent
prob21_diff < 0     180         0.90
prob21_diff >= 0    19820       99.10

Probability(Φ(μ+α2) - Φ(μ+α1) < 0) = 0.0090 ("one-tailed")
Any formal comparisons between
GLS/REML/EB(M/PQL) and MCMC for GLMM?
• Check Browne and Draper (2006).
• Normal data (LMM)
– Generally, inferences based on GLS/REML and MCMC are
sufficiently close.
– Since GLS/REML is faster, it is the method of choice for
classical assumptions.
• Non-normal data (GLMM).
– Quasi-likelihood based methods are particularly problematic
in bias of point estimates and interval coverage of variance
components.
• Side effects on fixed effects inference.
– Bayesian methods with diffuse priors are well calibrated for
both properties for all parameters.
– Comparisons with Laplace not done yet.
A pragmatic take on using MCMC vs PL for GLMM
under classical assumptions?
• If datasets are too small to warrant asymptotic
considerations, then the experiment is likely to
be poorly powered.
– Otherwise, PL might ≈ MCMC inference.
• However, differences could depend on
dimensionality, deviation of data distribution
from normal, and complexity of design.
• The really big advantage of MCMC is in multistage hierarchical models (see later)
Implications of design on Fully Bayes
vs. PL inference for GLMM?
• RCBD: it is known for LMM that inferences on treatment differences in RCBD are resilient to estimates of block VC.
– Is inference on differences in treatment effects thereby insensitive to VC inferences in GLMM?
• Whole plot treatment factor comparisons in split plot designs?
• Greater sensitivity (i.e. whole plot VC).
– Sensitivity of inference for conditional versus "population-averaged" probabilities?
$$\Phi(x_i'\beta) \quad \text{vs.} \quad \Phi\left(\frac{x_i'\beta}{\sqrt{1 + \sigma_u^2}}\right)$$
Ordinal Categorical Data
• Back to the GF83 data.
– Gibbs sampling strategy laid out by Sorensen and
Gianola (1995); Albert and Chib (1993).
– Simple extensions to what was considered earlier
for linear/probit mixed models
$$Prob(Y_i = j \mid \ell_i, \tau_{j-1}, \tau_j) = \begin{cases} 1 & \text{if } \tau_{j-1} < \ell_i < \tau_j \\ 0 & \text{otherwise} \end{cases}$$
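Joint Posterior Density
• Stages
1A
$$p(y \mid \ell, \tau) = \prod_{i=1}^{n} Prob(Y_i = j \mid \ell_i, \tau) = \prod_{i=1}^{n} \left[ \sum_{j=1}^{c} I(\tau_{j-1} < \ell_i < \tau_j)\, I(Y_i = j) \right]$$
1B
$$p(\ell_i \mid \beta, u, \tau) = \phi\left(\ell_i - (x_i'\beta + z_i'u)\right)$$
2
$$p(\beta) \propto \text{constant} \quad \text{(or something diffuse)}$$
$$p(u \mid \sigma_u^2) = \frac{1}{(2\pi)^{q/2} \left|A\sigma_u^2\right|^{1/2}} \exp\left(-\frac{1}{2\sigma_u^2} u'A^{-1}u\right)$$
3
$$\sigma_u^2 \sim p(\sigma_u^2 \mid \nu_u, s_u^2) \propto (\sigma_u^2)^{-(\nu_u+1)} \exp\left(-\frac{b_u}{\sigma_u^2}\right)$$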
Anything different for FCD compared to probit binary?
• Liabilities:
$$p(\ell_i \mid \beta, u, y_i = j) \propto \phi\left(\ell_i - (x_i'\beta + z_i'u)\right) I(\tau_{j-1} < \ell_i < \tau_j)\, I(Y_i = j)$$
• Thresholds:
$$p(\tau_j \mid \tau_{-j}, ELSE) = U\left(\max(\ell \mid Y = j),\ \min(\ell \mid Y = j+1)\right)$$
– This leads to painfully slow mixing … a better strategy is based on Metropolis sampling (Cowles et al., 1996).
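The two FCDs above translate into a truncated normal draw for each liability and a uniform draw for each threshold. The sketch below is an illustration under stated assumptions, not the course code: categories are coded 1, 2, …, threshold j separates categories j and j+1, liabilities have unit residual variance, and the truncated normal is sampled by inverting the cdf. The function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def draw_liability(mean, lower, upper, rng=random):
    """Draw a unit-variance normal liability truncated to (lower, upper)."""
    p_lo = STD_NORMAL.cdf(lower - mean)
    p_hi = STD_NORMAL.cdf(upper - mean)
    p = p_lo + rng.random() * (p_hi - p_lo)   # uniform on the admissible cdf range
    p = min(max(p, 1e-12), 1.0 - 1e-12)       # keep inv_cdf away from exactly 0 or 1
    return mean + STD_NORMAL.inv_cdf(p)

def draw_threshold(j, liabilities, categories, rng=random):
    """Uniform draw of threshold j between the liabilities of categories j and j+1."""
    lower = max(l for l, y in zip(liabilities, categories) if y == j)
    upper = min(l for l, y in zip(liabilities, categories) if y == j + 1)
    return rng.uniform(lower, upper)

random.seed(1996)
# Hypothetical record in category 2 with current thresholds tau_1 = 0 and tau_2 = 1.2
print(draw_liability(mean=0.3, lower=0.0, upper=1.2))
print(draw_threshold(1, [-0.5, 0.2, 0.9, 1.4], [1, 1, 2, 2]))
```

With many records per category the interval for each threshold becomes very narrow, which is why this uniform update mixes slowly. Inverse-cdf sampling also loses accuracy for truncation regions far in the tails, so a production sampler would likely use a dedicated truncated normal generator.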
Fully Bayesian inference on GF83
• 5000 burn-in samples
• 50000 samples post burn-in
• Saving every 10th sample.
[Figure: diagnostic plots for σ²u]
Posterior Summaries
Variable         Mean     Median   Std Dev   5th Pctl   95th Pctl
intercept        -0.222   -0.198   0.669     -1.209     0.723
hy                0.236    0.223   0.396     -0.399     0.894
age              -0.036   -0.035   0.392     -0.69      0.598
sex              -0.172   -0.171   0.393     -0.818     0.48
sire1            -0.082   -0.042   0.587     -1         0.734
sire2             0.116    0.0491  0.572     -0.641     0.937
sire3             0.194    0.106   0.625     -0.64      1.217
sire4            -0.173   -0.11    0.606     -1.118     0.595
sigmau            1.362    0.202   8.658      0.0021    4.148
thresh2           0.83     0.804   0.302      0.383     1.366
probfemalecat1    0.598    0.609   0.188      0.265     0.885
probfemalecat2    0.827    0.864   0.148      0.53      0.986
probmalecat1      0.539    0.545   0.183      0.23      0.836
probmalecat2      0.79     0.821   0.154      0.491     0.974
Posterior densities of sex-specific cumulative probabilities (first two categories)
How would you interpret a "standard error" in this context?
Posterior densities of sex-specific probabilities (each category)
What if some FCD are not recognizable?
• Examples: Poisson mixed models, logistic mixed models.
• Hmmm.. Need a different strategy.
– Use Gibbs sampling whenever you can.
– Use Metropolis-Hastings sampling for FCD that are not recognizable.
• NEXT!
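For an FCD without a recognizable form, a Metropolis step can stand in for the Gibbs draw. The sketch below is an illustration only, not the course code: it uses a random-walk proposal (the symmetric special case of Metropolis-Hastings), and the target, a single log-mean under a Poisson likelihood with a N(0, prior_var) prior, as well as the names `log_fcd`, `metropolis_step`, `prior_var` and `step_sd`, are assumptions introduced here.

```python
import math
import random

def log_fcd(theta, counts, prior_var):
    """Log FCD of theta up to a constant: Poisson(exp(theta)) data, N(0, prior_var) prior."""
    log_lik = sum(y * theta - math.exp(theta) for y in counts)
    return log_lik - 0.5 * theta * theta / prior_var

def metropolis_step(theta, counts, prior_var, step_sd, rng=random):
    """One random-walk Metropolis update of theta; returns the new (or retained) value."""
    proposal = theta + rng.gauss(0.0, step_sd)
    log_ratio = log_fcd(proposal, counts, prior_var) - log_fcd(theta, counts, prior_var)
    # 1 - random() lies in (0, 1], so the logarithm is always defined
    if math.log(1.0 - rng.random()) < log_ratio:
        return proposal
    return theta

random.seed(105)
counts = [3, 5, 4, 6, 2]          # hypothetical Poisson counts
theta, chain = 0.0, []
for cycle in range(21000):
    theta = metropolis_step(theta, counts, prior_var=100.0, step_sd=0.3)
    if cycle >= 1000:             # discard burn-in
        chain.append(theta)
print(sum(chain) / len(chain))
```

In a mixed model this update would be applied only to the parameters whose FCD is not recognizable, with ordinary Gibbs draws used for the rest.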