Applied Bayesian Inference, KSU, April 29, 2012
§❹ The Bayesian Revolution: Markov Chain Monte Carlo (MCMC)
Robert J. Tempelman

Simulation-based inference
• Suppose you're interested in the following integral/expectation:
  $E[g(x)] = \int_x g(x)\,f(x)\,dx$, where $f(x)$ is a density and $g(x)$ is a function.
• You can draw random samples $x_1, x_2, \ldots, x_n$ from $f(x)$ and then compute
  $\hat{E}[g(x)] = \frac{1}{n}\sum_{i=1}^{n} g(x_i) \rightarrow E[g(x)]$ as $n \rightarrow \infty$,
  with Monte Carlo standard error
  $\sqrt{\dfrac{1}{n}\left[\dfrac{1}{n-1}\sum_{i=1}^{n}\left(g(x_i)-\hat{E}[g(x)]\right)^2\right]}$.
  (A small SAS illustration appears at the end of this subsection.)

Beauty of Monte Carlo methods
• You can determine the distribution of any function of the random variable(s).
• Distribution summaries include: means, medians, key percentiles (2.5%, 97.5%), standard deviations, etc.
• Generally more reliable than the "delta method", especially for highly non-normal distributions.

Using the method of composition for sampling (Tanner, 1996)
• Involves two stages of sampling.
• Example:
  – Suppose $Y_i|\lambda_i \sim \text{Poisson}(\lambda_i)$: $\Pr(Y_i=y|\lambda_i) = \dfrac{e^{-\lambda_i}\lambda_i^{y}}{y!}$
  – In turn, $\lambda_i|\alpha,\beta \sim \text{Gamma}(\alpha,\beta)$: $p(\lambda_i|\alpha,\beta) = \dfrac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda_i^{\alpha-1}e^{-\beta\lambda_i}$
  – Then
    $\Pr(Y_i=y|\alpha,\beta) = \int_{R_{\lambda_i}} \Pr(Y_i=y|\lambda_i)\,p(\lambda_i|\alpha,\beta)\,d\lambda_i = \dfrac{\Gamma(\alpha+y)}{y!\,\Gamma(\alpha)}\left(\dfrac{\beta}{1+\beta}\right)^{\alpha}\left(\dfrac{1}{1+\beta}\right)^{y}$,
    a negative binomial distribution with mean $\alpha/\beta$ and variance $(\alpha/\beta)(1+\beta^{-1})$.

Using the method of composition for sampling from the negative binomial
1. Draw $\lambda_i|\alpha,\beta \sim \text{Gamma}(\alpha,\beta)$.
2. Draw $Y_i|\lambda_i \sim \text{Poisson}(\lambda_i)$.

data new;
 seed1 = 2; alpha = 2; beta = 0.25;
 do j = 1 to 10000;
   call rangam(seed1,alpha,x);
   lambda = x/beta;
   call ranpoi(seed1,lambda,y);
   output;
 end;
run;
proc means mean var; var y; run;

The MEANS Procedure: y: Mean 7.9749, Variance 39.2638.
Check: E(y) = α/β = 2/0.25 = 8; Var(y) = (α/β)(1 + β⁻¹) = 8 × (1 + 4) = 40.

Another example? Student t.
1. Draw $\lambda_i|\nu \sim \text{Gamma}(\nu/2, \nu/2)$.
2. Draw $t_i|\lambda_i \sim \text{Normal}(0, 1/\lambda_i)$.
Then $t_i \sim$ Student $t_\nu$.

data new;
 seed1 = 29523; df = 4;
 do j = 1 to 100000;
   call rangam(seed1,df/2,x);
   lambda = x/(df/2);
   t = rannor(seed1)/sqrt(lambda);
   output;
 end;
run;
proc means mean var p5 p95; var t; run;

Output: t: Mean -0.00524, Variance 2.011365, 5th Pctl -2.1376, 95th Pctl 2.122201.
Compare with the exact t4 quantiles (the exact variance is ν/(ν−2) = 2):
data new; t5 = tinv(.05,4); t95 = tinv(.95,4); run;
proc print; run;
gives t5 = -2.1319, t95 = 2.13185.
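To make the Monte Carlo estimator and its standard error concrete, here is a minimal SAS sketch for one particular choice, g(x) = exp(x) with x ~ N(0,1), so that the true value is E[g(x)] = exp(0.5) ≈ 1.649. The example function is an assumption for illustration and is not from the original slides.

data mc;
 seed = 1234;
 do i = 1 to 100000;
   x = rannor(seed);   /* draw x from f(x) = N(0,1)                 */
   g = exp(x);         /* g(x); true E[g(x)] = exp(0.5) = 1.6487    */
   output;
 end;
run;
/* Mean estimates E[g(x)]; Std Error = Std Dev/sqrt(n) is the Monte Carlo standard error */
proc means data=mc n mean std stderr; var g; run;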
Expectation-Maximization (EM)
• OK, I know that EM is NOT a simulation-based inference procedure.
  – However, it is based on data augmentation.
• Important progenitor of Markov chain Monte Carlo (MCMC) methods.
• Recall the plant genetics example:
  $L(\theta|\mathbf{y}) = \frac{n!}{y_1!\,y_2!\,y_3!\,y_4!}\left(\frac{2+\theta}{4}\right)^{y_1}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}
   \propto (2+\theta)^{y_1}(1-\theta)^{y_2+y_3}\theta^{y_4}$

Data augmentation
• Augment the "data" by splitting the first cell into two cells with probabilities 1/2 and θ/4, giving 5 categories:
  $L(\theta|\mathbf{y},x) \propto \left(\frac{1}{2}\right)^{x}\left(\frac{\theta}{4}\right)^{y_1-x}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}
   \propto \theta^{y_1-x+y_4}(1-\theta)^{y_2+y_3}$
• Looks like a Beta distribution to me!
  $p(\phi|\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\phi^{\alpha-1}(1-\phi)^{\beta-1}$

Data augmentation (cont'd)
• So the joint distribution of the "complete" data is
  $p(\mathbf{y},x|\theta) = \frac{n!}{x!\,(y_1-x)!\,y_2!\,y_3!\,y_4!}\left(\frac{1}{2}\right)^{x}\left(\frac{\theta}{4}\right)^{y_1-x}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}$
• Consider the part involving just the "missing data" x:
  $p(x|\mathbf{y},\theta) \propto \left(\frac{2}{2+\theta}\right)^{x}\left(\frac{\theta}{2+\theta}\right)^{y_1-x}$, i.e. $x|\mathbf{y},\theta \sim \text{Binomial}\!\left(y_1,\frac{2}{2+\theta}\right)$, so $E[x|\mathbf{y},\theta]=\frac{2y_1}{2+\theta}$.

Expectation-Maximization
• Start with the complete log-likelihood:
  $\log L(\theta|\mathbf{y},x) = \text{const} + (y_1 - x + y_4)\log\theta + (y_2+y_3)\log(1-\theta)$
• 1. Expectation (E-step): replace x by its conditional expectation given y and the current estimate $\hat\theta^{[t]}$:
  $E_x\left[\log L(\theta|\mathbf{y},x)\right] = \left(\frac{y_1\hat\theta^{[t]}}{2+\hat\theta^{[t]}} + y_4\right)\log\theta + (y_2+y_3)\log(1-\theta) + \text{const}$

Maximization step
• 2. Maximization (M-step): use first- or second-derivative methods to maximize $E_x[\log L(\theta|\mathbf{y},x)]$:
  $\frac{\partial E_x[\log L(\theta|\mathbf{y},x)]}{\partial\theta} = \frac{\frac{y_1\hat\theta^{[t]}}{2+\hat\theta^{[t]}} + y_4}{\theta} - \frac{y_2+y_3}{1-\theta}$
• Setting this to 0 gives
  $\hat\theta^{[t+1]} = \frac{\frac{y_1\hat\theta^{[t]}}{2+\hat\theta^{[t]}} + y_4}{\frac{y_1\hat\theta^{[t]}}{2+\hat\theta^{[t]}} + y_2 + y_3 + y_4}$

Recall the data

Genotype  | Probability  | Data (counts)
A_B_      | (2+θ)/4      | y1 = 1997
aaB_      | (1−θ)/4      | y2 = 906
A_bb      | (1−θ)/4      | y3 = 904
aabb      | θ/4          | y4 = 32

0 ≤ θ ≤ 1; θ → 0: close linkage in repulsion; θ → 1: close linkage in coupling.

PROC IML code:
proc iml;
 y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
 theta = 0.20;                         /* starting value */
 do iter = 1 to 20;
   Ex2 = y1*theta/(theta+2);           /* E-step: y1 - E[x|y,theta] */
   theta = (Ex2+y4)/(Ex2+y2+y3+y4);    /* M-step */
   print iter theta;
 end;
run;

Iteration history (θ): 0.1055303, 0.0680147, 0.0512031, 0.0432646, 0.0394234, 0.0375429, 0.036617, 0.0361598, 0.0359338, 0.0358219, 0.0357666, 0.0357392, 0.0357256, 0.0357189, 0.0357156, 0.0357139, 0.0357131, 0.0357127, 0.0357125, 0.0357124.
Slower than Newton-Raphson/Fisher scoring, but generally more robust to poor starting values.

How to derive an asymptotic standard error using EM?
• From Louis (1982):
  $-\frac{\partial^2 \log p(\theta|\mathbf{y})}{\partial\theta^2} = E_x\!\left[-\frac{\partial^2 \log p(\theta|\mathbf{y},x)}{\partial\theta^2}\right] - \text{var}\!\left(\frac{\partial \log p(\theta|\mathbf{y},x)}{\partial\theta}\,\middle|\,\theta,\mathbf{y}\right)$
• Given $\frac{\partial \log p(\theta|\mathbf{y},x)}{\partial\theta} = \frac{y_1-x+y_4}{\theta} - \frac{y_2+y_3}{1-\theta}$,
  $\text{var}\!\left(\frac{\partial \log p(\theta|\mathbf{y},x)}{\partial\theta}\,\middle|\,\theta,\mathbf{y}\right) = \frac{\text{var}(x|\theta,\mathbf{y})}{\theta^2}$
• At $\hat\theta = 0.0357$: $\text{var}(x|\hat\theta,\mathbf{y}) = y_1\,\frac{2}{2+\hat\theta}\,\frac{\hat\theta}{2+\hat\theta} = 34.42$, so the variance term is $34.42/0.0357^2 = 26987.41$.

Finish off
• $E_x\!\left[-\frac{\partial^2\log p(\theta|\mathbf{y},x)}{\partial\theta^2}\right] = E_x\!\left[\frac{y_1-x+y_4}{\theta^2} + \frac{y_2+y_3}{(1-\theta)^2}\right] = \frac{\frac{y_1\hat\theta}{2+\hat\theta}+y_4}{\hat\theta^2} + \frac{y_2+y_3}{(1-\hat\theta)^2} = 54507.06$
• Hence $-\frac{\partial^2\log p(\theta|\mathbf{y})}{\partial\theta^2} = 54507.06 - 26987.41 = 27519.65$, so $\text{se}(\hat\theta) = \sqrt{\frac{1}{27519.65}} = 0.0060$.
  (A small numerical check of this result follows below.)
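The Louis (1982) result can be checked directly, since in this example the observed-data log-likelihood is available in closed form, $l(\theta) = y_1\log(2+\theta) + (y_2+y_3)\log(1-\theta) + y_4\log\theta + \text{const}$. A minimal IML sketch of that check (not part of the original slides):

proc iml;
 y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
 theta = 0.0357124;                               /* EM estimate */
 /* -l''(theta) for l(theta) = y1*log(2+theta) + (y2+y3)*log(1-theta) + y4*log(theta) */
 info = y1/(2+theta)##2 + (y2+y3)/(1-theta)##2 + y4/theta##2;
 se = sqrt(1/info);
 print info se;          /* info is approximately 27,520 and se approximately 0.0060 */
run;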
Stochastic Data Augmentation (Tanner, 1996)
• Posterior identity: $p(\theta|\mathbf{y}) = \int_{R_x} p(\theta|\mathbf{y},x)\,p(x|\mathbf{y})\,dx$
• Predictive identity: $p(x|\mathbf{y}) = \int_{R_\theta} p(x|\mathbf{y},\theta)\,p(\theta|\mathbf{y})\,d\theta$
• Together these imply
  $p(\theta|\mathbf{y}) = \int_{R_{\theta^*}}\left[\int_{R_x} p(\theta|\mathbf{y},x)\,p(x|\mathbf{y},\theta^*)\,dx\right]p(\theta^*|\mathbf{y})\,d\theta^* = \int_{R_{\theta^*}} K(\theta,\theta^*|\mathbf{y})\,p(\theta^*|\mathbf{y})\,d\theta^*$
• $K(\theta,\theta^*|\mathbf{y})$ is the transition function for a Markov chain, and this suggests an "iterative" method-of-composition approach to sampling.

Sampling strategy from p(θ|y)
• Start somewhere: starting value θ^[0].
  – Cycle 1: sample x^[1] from p(x|y, θ = θ^[0]); sample θ^[1] from p(θ|y, x = x^[1]).
  – Cycle 2: sample x^[2] from p(x|y, θ = θ^[1]); sample θ^[2] from p(θ|y, x = x^[2]).
  – etc.
• It's like alternating between sampling analogues of "E-steps" and "M-steps".

What are these full conditional densities (FCD)?
• Recall the "complete" likelihood function
  $p(\mathbf{y},x|\theta) = \frac{n!}{x!\,(y_1-x)!\,y_2!\,y_3!\,y_4!}\left(\frac{1}{2}\right)^{x}\left(\frac{\theta}{4}\right)^{y_1-x}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}$
• Assume the prior on θ is "flat": p(θ) ∝ 1, so p(θ|y, x) ∝ p(y, x|θ).
• FCD:
  $p(\theta|\mathbf{y},x) \propto \theta^{y_1-x+y_4}(1-\theta)^{y_2+y_3}$, i.e. Beta(α = y1 − x + y4 + 1, β = y2 + y3 + 1);
  $p(x|\mathbf{y},\theta) \propto \left(\frac{2}{2+\theta}\right)^{x}\left(\frac{\theta}{2+\theta}\right)^{y_1-x}$, i.e. Binomial(n = y1, p = 2/(θ + 2)).

IML code for the chained data augmentation example
proc iml;
 seed1 = 4;
 ncycle = 10000;                /* total number of samples */
 theta = j(ncycle,1,0);
 y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
 beta = y2+y3+1;
 theta[1] = ranuni(seed1);      /* starting value: initial draw between 0 and 1 */
 do cycle = 2 to ncycle;
   p = 2/(2+theta[cycle-1]);
   xvar = ranbin(seed1,y1,p);
   alpha = y1+y4-xvar+1;
   xalpha = rangam(seed1,alpha);
   xbeta  = rangam(seed1,beta);
   theta[cycle] = xalpha/(xalpha+xbeta);  /* Beta(a,b) = Gamma(a,1)/(Gamma(a,1)+Gamma(b,1)) */
 end;
 create parmdata var {theta xvar};
 append;
run;
data parmdata; set parmdata; cycle = _n_; run;

Trace plot ("bad" starting value)
proc gplot data=parmdata; plot theta*cycle; run;
• [Figure: trace plot of θ against cycle]
• One should discard the first "few" samples to ensure that one is truly sampling from p(θ|y); the starting value should have no impact ("burn-in"; "convergence in distribution").
• How do you decide on this stuff? See Cowles and Carlin (1996).
• Here: throw away the first 1000 samples as burn-in.

Histogram of samples post burn-in
proc univariate data=parmdata;
 where cycle > 1000;
 var theta;
 histogram / normal(color=red mu=0.0357 sigma=0.0060);
run;
• [Figure: posterior histogram (Bayesian inference) with asymptotic likelihood normal overlay]
• Bayesian (MCMC) inference: N = 9000, posterior mean = 0.03671503, posterior std deviation = 0.00607971.

Quantiles:
Percent | Observed (Bayesian) | Normal/asymptotic (likelihood)
5.0     | 0.02702             | 0.02583
95.0    | 0.04728             | 0.04557

Zooming in on the trace plot
• [Figure: zoomed trace plot]
• Hints of autocorrelation, as expected with Markov chain Monte Carlo simulation schemes.
• The number of drawn samples is NOT equal to the number of independent draws. The greater the autocorrelation, the greater the problem: you need more samples!

Sample autocorrelation
proc arima data=parmdata plots(only)=series(acf);
 where cycle > 1000;
 identify var=theta nlag=1000 outcov=autocov;
run;
Autocorrelation check for white noise: to lag 6, chi-square = 3061.39, DF = 6, Pr > ChiSq < .0001; autocorrelations 0.497, 0.253, 0.141, 0.079, 0.045, 0.029.
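The posterior mean, standard deviation, and percentiles reported above can be obtained from the saved samples in one pass; a minimal sketch using the data set and variable names from the code above:

proc univariate data=parmdata noprint;
 where cycle > 1000;                         /* discard burn-in */
 var theta;
 output out=postsum mean=postmean std=poststd
        pctlpts=2.5 50 97.5 pctlpre=q;       /* median and equal-tailed 95% credible interval */
run;
proc print data=postsum; run;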
How to estimate the effective number of independent samples (ESS)
• Consider the posterior mean based on m samples:
  $\hat\theta_m = \frac{1}{m}\sum_{i=1}^{m}\theta^{[i]}$.
  If the draws θ^[i], i = 1, 2, ..., m, were independent, then $\text{var}(\hat\theta_m) = \text{var}(\theta^{[i]})/m$; with MCMC they are not.
• Initial positive sequence estimator (Geyer, 1992; Sorensen and Gianola, 1995):
  $\widehat{\text{var}}(\hat\theta_m) = \frac{-\hat\gamma_m(0) + 2\sum_{j=0}^{t}\hat\Gamma_m(j)}{m}$,
  where $\hat\Gamma_m(j) = \hat\gamma_m(2j) + \hat\gamma_m(2j+1)$, j = 0, 1, ..., is the sum of adjacent lag autocovariances and
  $\hat\gamma_m(j) = \frac{1}{m}\sum_{i=1}^{m-j}\left(\theta^{[i]}-\hat\theta_m\right)\left(\theta^{[i+j]}-\hat\theta_m\right)$ is the lag-j autocovariance.
• Choose t such that all $\hat\Gamma_m(j) > 0$, j = 0, 1, ..., t (i.e. truncate at the first negative $\hat\Gamma_m$).
• SAS PROC MCMC chooses a slightly different cutoff (see its documentation).
• Then $ESS = \dfrac{\hat\gamma_m(0)}{\widehat{\text{var}}(\hat\theta_m)}$.
• Extensive autocorrelation across lags leads to smaller ESS.

SAS code (recall: 9000 post-burn-in MCMC cycles)
%macro ESS1(data,variable,startcycle,maxlag);
data _null_;
 set &data nobs=_n;
 call symputx('nsample',_n);
run;
proc arima data=&data;
 where cycle > &startcycle;
 identify var=&variable nlag=&maxlag outcov=autocov;
run;
proc iml;
 use autocov;
 read all var{'COV'} into cov;
 nsample = &nsample;
 nlag2 = nrow(cov)/2;
 Gamma = j(nlag2,1,0);
 cutoff = 0; t = 0;
 do while (cutoff = 0);
   t = t+1;
   Gamma[t] = cov[2*(t-1)+1] + cov[2*(t-1)+2];  /* sum of adjacent lag autocovariances */
   if Gamma[t] < 0 then cutoff = 1;
   if t = nlag2 then do;
     print "Too much autocorrelation";
     print "Specify a larger max lag";
     stop;
   end;
 end;
 varm = (-Cov[1] + 2*sum(Gamma)) / nsample;  /* variance of the posterior mean       */
 ESS = Cov[1]/varm;                          /* effective sample size                */
 stdm = sqrt(varm);                          /* Monte Carlo standard error           */
 parameter = "&variable";
 print parameter stdm ESS;
run;
%mend ESS1;

Executing %ESS1 (recall: 1000 MCMC burn-in cycles)
• %ESS1(parmdata,theta,1000,1000);
  parameter = theta, stdm = 0.0001116, ESS = 2967.1289
• i.e., the saved chain carries information equivalent to drawing roughly 2967 independent draws from the posterior density.

How large of an ESS should I target?
• Routinely, in the thousands or greater.
• It depends on what you want to estimate.
  – Recommend no less than 100 for estimating "typical" location parameters: mean, median, etc.
  – Several times that for "typical" dispersion parameters like variances.
• Want to report key percentiles (2.5th, 97.5th)? Then you need an ESS in the thousands!
  – See Raftery and Lewis (1992) for further direction.
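As a quick sanity check, the reported Monte Carlo standard error (stdm), the posterior standard deviation from the earlier PROC UNIVARIATE summary, and the ESS are mutually consistent with $ESS = \hat\gamma_m(0)/\widehat{\text{var}}(\hat\theta_m)$:

$$ESS \approx \frac{(0.00607971)^2}{(0.0001116)^2} = \frac{3.696\times 10^{-5}}{1.245\times 10^{-8}} \approx 2968,$$

which matches the reported ESS of 2967.1 up to rounding (here $\hat\gamma_m(0)$ is essentially the squared posterior standard deviation).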
Worthwhile to consider this sampling strategy?
• Not too much difference, if any, with likelihood inference here.
• But how about smaller samples?
  – e.g., y1 = 200, y2 = 91, y3 = 90, y4 = 3.
  – Different story.

Gibbs sampling: origins (Geman and Geman, 1984)
• Gibbs sampling was first developed in statistical physics in relation to a spatial inference problem.
  – Problem: a true image ξ was corrupted by a stochastic process to produce an observable image y (the data).
  – Objective: restore or estimate the true image ξ in light of the observed image y.
  – Inference on ξ was based on the Markov random field joint posterior distribution, through successively drawing from updated FCD, which were rather easy to specify.
  – These FCD each happened to be Gibbs distributions.
• The misnomer has been used since to describe a rather general process.

Gibbs sampling
• Extension of chained data augmentation to the case of several unknown parameters.
• Consider p = 3 unknown parameters θ1, θ2, θ3 with joint posterior density p(θ1, θ2, θ3 | y).
• Gibbs sampling: the MCMC sampling strategy in which all FCD are recognizable:
  p(θ1 | θ2, θ3, y), p(θ2 | θ1, θ3, y), p(θ3 | θ1, θ2, y).

Gibbs sampling: the process
1) Start with some "arbitrary" starting values θ2^(0), θ3^(0) (within the allowable parameter space).
2) Draw θ1^(1) from p(θ1 | θ2 = θ2^(0), θ3 = θ3^(0), y).
3) Draw θ2^(1) from p(θ2 | θ1 = θ1^(1), θ3 = θ3^(0), y).
4) Draw θ3^(1) from p(θ3 | θ1 = θ1^(1), θ2 = θ2^(1), y).
   Steps 2)-4) constitute one cycle of Gibbs sampling: one (correlated) draw from p(θ1, θ2, θ3 | y).
5) Repeat steps 2)-4) m times; m is the length of the Gibbs chain.

General extension of Gibbs sampling
• When there are d parameters and/or blocks of parameters, θ' = [θ1 θ2 ... θd]:
• Again specify starting values θ1^(0), θ2^(0), ..., θd^(0).
• Sample from the FCD in cycle k+1:
  – sample θ1^(k+1) from p(θ1 | θ2^(k), ..., θd^(k), y)
  – sample θ2^(k+1) from p(θ2 | θ1^(k+1), θ3^(k), ..., θd^(k), y)
  – ...
  – sample θd^(k+1) from p(θd | θ1^(k+1), θ2^(k+1), ..., θ_{d-1}^(k+1), y)
• Generically, sample θi from p(θi | θ_{-i}, y).

Burn-in and marginal inference
• Throw away enough burn-in samples (the first k of m).
• θ^(k+1), θ^(k+2), ..., θ^(m) are a realization of a Markov chain with equilibrium distribution p(θ|y).
• The m−k joint samples θ^(k+1), ..., θ^(m) are then considered to be random drawings from the joint posterior density p(θ|y).
• Individually, the m−k samples θj^(k+1), ..., θj^(m) are random samples of θj from the marginal posterior density p(θj|y), j = 1, 2, ..., d.
  – i.e., θ_{-j} are "nuisance" variables if interest is directed at θj.

Mixed model example with known variance components, flat prior on β
• Recall:
  $\boldsymbol\beta,\mathbf u\,|\,\sigma^2_e,\sigma^2_u,\mathbf y \sim N\!\left(\begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf u}\end{bmatrix},\ \begin{bmatrix}\mathbf{X'R^{-1}X} & \mathbf{X'R^{-1}Z}\\ \mathbf{Z'R^{-1}X} & \mathbf{Z'R^{-1}Z+G^{-1}}\end{bmatrix}^{-1}\right)$
  where
  $\begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf u}\end{bmatrix} = \begin{bmatrix}\mathbf{X'R^{-1}X} & \mathbf{X'R^{-1}Z}\\ \mathbf{Z'R^{-1}X} & \mathbf{Z'R^{-1}Z+G^{-1}}\end{bmatrix}^{-1}\begin{bmatrix}\mathbf{X'R^{-1}y}\\ \mathbf{Z'R^{-1}y}\end{bmatrix}$
• Write $\mathbf C = \begin{bmatrix}\mathbf{X'R^{-1}X} & \mathbf{X'R^{-1}Z}\\ \mathbf{Z'R^{-1}X} & \mathbf{Z'R^{-1}Z+G^{-1}}\end{bmatrix}$ and $\boldsymbol\theta = \begin{bmatrix}\boldsymbol\beta\\ \mathbf u\end{bmatrix}$, i.e.
  $\boldsymbol\theta\,|\,\sigma^2_e,\sigma^2_u,\mathbf y \sim N(\hat{\boldsymbol\theta}, \mathbf C^{-1})$.
• We ALREADY KNOW the joint posterior density!

FCD for the mixed effects model with known variance components
• OK, it's really pointless to use MCMC here, but let's demonstrate. It can be shown that the FCD are
  $\theta_i\,|\,\mathbf y,\boldsymbol\theta_{-i},\sigma^2_e,\sigma^2_u \sim N(\tilde\theta_i, \tilde v_i)$, i = 1, 2, ..., p+q,
  where $\tilde\theta_i = \dfrac{r_i - \sum_{j=1, j\neq i}^{p+q} c_{ij}\theta_j}{c_{ii}}$ and $\tilde v_i = \dfrac{1}{c_{ii}}$,
  with $r_i$ the ith element of the right-hand side $\begin{bmatrix}\mathbf{X'R^{-1}y}\\ \mathbf{Z'R^{-1}y}\end{bmatrix}$ and $c_{ij}$ the elements of C (ith row, ith column, ith diagonal element).

Two ways to sample β and u
1. Block draw from $\boldsymbol\theta\,|\,\sigma^2_e,\sigma^2_u,\mathbf y \sim N(\hat{\boldsymbol\theta}, \mathbf C^{-1})$ (a sketch is given below).
   – Faster MCMC mixing (less/no autocorrelation across MCMC cycles).
   – But slower computing time (depending on the dimension of θ), e.g., it requires the Cholesky factor of C.
   – Some alternative strategies are available (Garcia-Cortes and Sorensen, 1995).
2. Series of univariate draws from $\theta_i\,|\,\mathbf y,\boldsymbol\theta_{-i},\sigma^2_e,\sigma^2_u \sim N(\tilde\theta_i, \tilde v_i)$, i = 1, 2, ..., p+q.
   – Faster computationally.
   – Slower MCMC mixing.
   – Partial solution: "thinning" the MCMC chain, e.g., save every 10th cycle rather than every cycle.
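A minimal, self-contained IML sketch of option 1 (one block draw from N(θ̂, C⁻¹)); the 3×3 coefficient matrix and right-hand side here are made-up placeholders standing in for the mixed model equations, not values from the slides:

proc iml;
 seed  = 123;
 coeff = {4 1 0, 1 5 1, 0 1 6};      /* placeholder for C                         */
 rhs   = {10, 12, 9};                /* placeholder for [X'R^{-1}y; Z'R^{-1}y]    */
 thetahat = solve(coeff, rhs);       /* joint posterior mean                      */
 U = root(coeff);                    /* upper-triangular Cholesky: coeff = U`*U   */
 z = j(nrow(coeff),1,0);
 do i = 1 to nrow(z);
   z[i] = rannor(seed);              /* z ~ N(0, I)                               */
 end;
 theta = thetahat + solve(U, z);     /* var(solve(U,z)) = inv(coeff), so theta ~ N(thetahat, inv(C)) */
 print thetahat theta;
run;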
Example: a split plot in time (data from Kuehl, 2000, p. 493)
• Experiment designed to explore mechanisms for early detection of phlebitis during amiodarone therapy.
  – Three intravenous treatments: (A1) amiodarone, (A2) the vehicle solution only, (A3) a saline solution.
  – 5 rabbits per treatment in a completely randomized design.
  – 4 repeated measures per animal (30-minute intervals).

SAS data step
data ear;
 input trt rabbit time temp;
 y = temp;
 A = trt;
 B = time;
 trtrabbit = compress(trt||'_'||rabbit);
 wholeplot = trtrabbit;
 cards;
1 1 1 -0.3
1 1 2 -0.2
1 1 3 1.2
1 1 4 3.1
1 2 1 -0.5
1 2 2 2.2
1 2 3 3.3
1 2 4 3.7
(etc.)
;
run;

The data ("spaghetti plot")
• [Figure: ear-temperature profiles over time for each rabbit]

Profile (interaction) means plots
• [Figure: treatment-by-time profile means]

A split-plot model assumption for repeated measures
• [Figure: diagram of rabbits nested within treatment and times within rabbit]
• RABBIT IS THE EXPERIMENTAL UNIT FOR TREATMENT; RABBIT IS THE BLOCK FOR TIME.

Suppose a CS (compound symmetry) assumption was appropriate
• CONDITIONAL SPECIFICATION: model variation between experimental units (i.e. rabbits):
  $y_{ijk} = \mu + \alpha_i + u_{k(i)} + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$, with $u_{k(i)} \sim NIID(0,\sigma^2_{u(\alpha)})$ and $e_{ijk} \sim NIID(0,\sigma^2_e)$.
• This is a partially nested or split-plot design: for treatments, rabbit is the experimental unit; for time, rabbit is the block!

Analytical (non-simulation) inference based on PROC MIXED
Let's assume "known" σ²_{u(α)} = 0.10 and σ²_e = 0.60, and flat priors on the fixed effects, p(β) ∝ 1.
title 'Split Plot in Time using Mixed';
title2 'Known Variance Components';
proc mixed data=ear noprofile;
 class trt time rabbit;
 model temp = trt time trt*time / solution;
 random rabbit(trt);
 parms (0.1) (0.6) / hold=1,2;
 ods output solutionf=solutionf;
run;
proc print data=solutionf; where estimate ne 0; run;

(Partial) output
Effect    | trt | time | Estimate | StdErr | DF
Intercept | _   | _    | 0.2200   | 0.3742 | 12
trt       | 1   | _    | 2.3600   | 0.5292 | 12
trt       | 2   | _    | -0.2200  | 0.5292 | 12
time      | _   | 1    | -0.9000  | 0.4899 | 36
time      | _   | 2    | 0.02000  | 0.4899 | 36
time      | _   | 3    | -0.6400  | 0.4899 | 36
trt*time  | 1   | 1    | -1.9200  | 0.6928 | 36
trt*time  | 1   | 2    | -1.2200  | 0.6928 | 36
trt*time  | 1   | 3    | -0.06000 | 0.6928 | 36
trt*time  | 2   | 1    | 0.3200   | 0.6928 | 36
trt*time  | 2   | 2    | -0.5400  | 0.6928 | 36
trt*time  | 2   | 3    | 0.5800   | 0.6928 | 36

MCMC inference
• First set up dummy variables (the corner parameterization implicit in SAS linear models software):
/* based on the zero-out-last-level restrictions */
proc transreg data=ear design order=data;
 model class(trt|time / zero=last);
 id y trtrabbit;
 output out=recodedsplit;
run;
proc print data=recodedsplit (obs=10); var intercept &_trgind; run;

Partial output (first two rabbits)
• [Table: first 10 rows of the full-rank design matrix X — intercept, trt1, trt2, time1-time3, trt1*time1 through trt2*time3 — together with y and trtrabbit]

MCMC using PROC IML (full code available online)
proc iml;
 seed = &seed;
 nburnin = 5000;        /* number of burn-in samples                      */
 total = 200000;        /* total number of Gibbs cycles beyond burn-in    */
 thin = 10;             /* save every "thin"th cycle                      */
 ncycle = total/thin;   /* leaving a total of ncycle saved samples        */

Key subroutine (univariate sampling)
• Draws each element of θ = (β', u')' from its FCD $\theta_i\,|\,\mathbf y,\boldsymbol\theta_{-i},\sigma^2_e,\sigma^2_u \sim N(\tilde\theta_i,\tilde v_i)$, i = 1, 2, ..., p+q, with $\tilde\theta_i = (r_i - \sum_{j\neq i}c_{ij}\theta_j)/c_{ii}$ and $\tilde v_i = 1/c_{ii}$:
start gibbs;                       /* univariate Gibbs sampler */
 do j = 1 to dim;                  /* dim = p + q */
   /* generate from full conditionals for fixed and random effects */
   solt = wry[j] - coeff[j,]*solution + coeff[j,j]*solution[j];
   solt = solt/coeff[j,j];
   vt = 1/coeff[j,j];
   solution[j] = solt + sqrt(vt)*rannor(seed);
 end;
finish gibbs;
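The slides note that the full program is available online. For orientation only, here is a minimal sketch of the kind of IML driver loop that could surround the gibbs module, assuming coeff, wry, solution, dim, seed, nburnin, total, and thin have been set up as above; it is not the original course code.

 /* burn-in phase */
 do cycle = 1 to nburnin;
   run gibbs;
 end;
 /* sampling phase: keep every "thin"th cycle */
 nsaved  = 0;
 samples = j(total/thin, dim, 0);
 do cycle = 1 to total;
   run gibbs;
   if mod(cycle,thin) = 0 then do;
     nsaved = nsaved + 1;
     samples[nsaved,] = solution`;   /* store the current draw of (beta', u')' as a row */
   end;
 end;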
Post-processing the samples
• Output the samples to a SAS data set called soldata, then:
proc means mean median std data=soldata; run;
ods graphics on;
%tadplot(data=soldata, var=_all_);
ods graphics off;
• %tadplot is a SAS autocall macro suited for processing MCMC samples.

Comparisons for fixed effects

MCMC (some Monte Carlo error; N = 20000 saved samples each):
Variable    | Mean   | Median | Std Dev
intercept   | 0.218  | 0.218  | 0.374
TRT1        | 2.365  | 2.368  | 0.526
TRT2        | -0.22  | -0.215 | 0.532
TIME1       | -0.902 | -0.903 | 0.495
TIME2       | 0.0225 | 0.0203 | 0.491
TIME3       | -0.64  | -0.643 | 0.488
TRT1 TIME1  | -1.915 | -1.916 | 0.692
TRT1 TIME2  | -1.224 | -1.219 | 0.69
TRT1 TIME3  | -0.063 | -0.066 | 0.696
TRT2 TIME1  | 0.321  | 0.316  | 0.701
TRT2 TIME2  | -0.543 | -0.54  | 0.696
TRT2 TIME3  | 0.58   | 0.589  | 0.694

EXACT (PROC MIXED):
Effect    | trt | time | Estimate | StdErr
Intercept | _   | _    | 0.2200   | 0.3742
trt       | 1   | _    | 2.3600   | 0.5292
trt       | 2   | _    | -0.2200  | 0.5292
time      | _   | 1    | -0.9000  | 0.4899
time      | _   | 2    | 0.02000  | 0.4899
time      | _   | 3    | -0.6400  | 0.4899
trt*time  | 1   | 1    | -1.9200  | 0.6928
trt*time  | 1   | 2    | -1.2200  | 0.6928
trt*time  | 1   | 3    | -0.06000 | 0.6928
trt*time  | 2   | 1    | 0.3200   | 0.6928
trt*time  | 2   | 2    | -0.5400  | 0.6928
trt*time  | 2   | 3    | 0.5800   | 0.6928

%TADPLOT output on "intercept"
• [Figure: trace plot, autocorrelation plot, and posterior density for the intercept]

Marginal/cell means
• The effects on the previous slides are not of particular interest in themselves.
• Marginal means can be derived using the same contrast vectors used to compute least squares means in PROC GLM/MIXED/GLIMMIX etc., e.g.
  lsmeans trt time trt*time / e;
  – Ai: marginal mean for trt i; Bj: marginal mean for time j; AiBj: cell mean for trt i, time j.

Examples of marginal/cell means
• Marginal means:
  $A_1 = \mu + \alpha_1 + \frac{1}{n_{time}}\sum_{j=1}^{n_{time}}\beta_j + \frac{1}{n_{time}}\sum_{j=1}^{n_{time}}(\alpha\beta)_{1j}$
  $B_1 = \mu + \beta_1 + \frac{1}{n_{trt}}\sum_{i=1}^{n_{trt}}\alpha_i + \frac{1}{n_{trt}}\sum_{i=1}^{n_{trt}}(\alpha\beta)_{i1}$
• Cell mean:
  $A_1B_1 = \mu + \alpha_1 + \beta_1 + (\alpha\beta)_{11}$
• (A sketch of computing these from the saved samples follows below.)

Marginal/cell ("LS") means

MCMC (Monte Carlo error):
Variable | Mean   | Median | Std Dev
A1       | 1.403  | 1.401  | 0.223
A2       | -0.293 | -0.292 | 0.223
A3       | -0.162 | -0.162 | 0.224
B1       | -0.501 | -0.5   | 0.216
B2       | 0.366  | 0.365  | 0.213
B3       | 0.465  | 0.466  | 0.217
B4       | 0.932  | 0.931  | 0.216
A1B1     | -0.234 | -0.231 | 0.373
A1B2     | 1.382  | 1.382  | 0.371
A1B3     | 1.88   | 1.878  | 0.374
A1B4     | 2.583  | 2.583  | 0.372
A2B1     | -0.584 | -0.585 | 0.375
A2B2     | -0.524 | -0.526 | 0.373
A2B3     | -0.062 | -0.058 | 0.373
A2B4     | -0.003 | -0.005 | 0.377
A3B1     | -0.684 | -0.684 | 0.377
A3B2     | 0.24   | 0.242  | 0.374
A3B3     | -0.422 | -0.423 | 0.376
A3B4     | 0.218  | 0.218  | 0.374

EXACT (PROC MIXED):
trt:  1: 1.4 (0.2236);  2: -0.29 (0.2236);  3: -0.16 (0.2236)
time: 1: -0.5 (0.216);  2: 0.3667 (0.216);  3: 0.4667 (0.216);  4: 0.9333 (0.216)
trt*time: 1 1: -0.24;  1 2: 1.38;  1 3: 1.88;  1 4: 2.58;  2 1: -0.58;  2 2: -0.52;  2 3: -0.06;  2 4: -3.61E-16;  3 1: -0.68;  3 2: 0.24;  3 3: -0.42;  3 4: 0.22  (all with standard error 0.3742)

Posterior densities of A1, B1, A1B1
• [Figure: dotted lines: normal density inferences based on PROC MIXED; solid lines: MCMC]
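As an illustration of how a marginal-mean sample such as A1 can be formed cycle by cycle from the saved effect samples, here is a minimal DATA step sketch. It assumes soldata has one row per saved cycle with effect columns named intercept, trt1, time1-time3, and trt1time1-trt1time3; the actual column names depend on the PROC TRANSREG coding, so treat them as illustrative.

data lsmeans;
 set soldata;
 /* A1 = mu + trt1 + average over the 4 times of (time_j + trt1*time_j);       */
 /* time4 and trt1*time4 are 0 under the zero-out-last-level parameterization. */
 A1 = intercept + trt1
      + (time1 + time2 + time3 + 0)/4
      + (trt1time1 + trt1time2 + trt1time3 + 0)/4;
run;
proc means data=lsmeans mean median std; var A1; run;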
Generalized linear mixed models (probit link model)
• Stage 1:
  $p(\mathbf y|\boldsymbol\beta,\mathbf u) = \prod_{i=1}^{n}\Phi(\mathbf x_i'\boldsymbol\beta + \mathbf z_i'\mathbf u)^{y_i}\left[1-\Phi(\mathbf x_i'\boldsymbol\beta + \mathbf z_i'\mathbf u)\right]^{1-y_i}$
• Stage 2:
  $p(\mathbf u|\sigma^2_u) = \frac{1}{(2\pi\sigma^2_u)^{q/2}|\mathbf A|^{1/2}}\exp\!\left(-\frac{\mathbf u'\mathbf A^{-1}\mathbf u}{2\sigma^2_u}\right)$, and $p(\boldsymbol\beta) \propto$ constant.
• Stage 3: $p(\sigma^2_u)$.
• Then $p(\boldsymbol\beta,\mathbf u,\sigma^2_u|\mathbf y) \propto p(\mathbf y|\boldsymbol\beta,\mathbf u)\,p(\mathbf u|\sigma^2_u)\,p(\boldsymbol\beta)\,p(\sigma^2_u)$.

Rethinking the prior on β
• p(β) ∝ constant might not be the best idea for binary data, especially when the data are "sparse".
• Animal breeders call this the "extreme category problem":
  – e.g., if all of the responses in a fixed-effects subclass are either 1 or 0, then the ML/posterior mode of the corresponding marginal mean will approach ±∞.
  – PROC LOGISTIC has the FIRTH option for this very reason.
• Alternative: $\boldsymbol\beta \sim N(\boldsymbol\beta_0, \mathbf I\sigma^2_\beta)$, typically with β0 = 0; 16 < σ²β < 50 is probably sufficient on the underlying latent scale (conditionally N(0,1)).

Recall the latent variable concept (Albert and Chib, 1993)
• Suppose for animal i the liability is $\ell_i\,|\,\boldsymbol\beta,\mathbf u \sim N(\mathbf x_i'\boldsymbol\beta + \mathbf z_i'\mathbf u,\,1)$.
• Then $\Pr(Y_i=1|\boldsymbol\beta,\mathbf u) = \Pr(\ell_i>0) = 1-\Phi\!\left(-(\mathbf x_i'\boldsymbol\beta + \mathbf z_i'\mathbf u)\right) = \Phi(\mathbf x_i'\boldsymbol\beta + \mathbf z_i'\mathbf u)$.
• [Figure: standard normal density for the liability ℓi with the threshold at 0; in the illustrated example Pr(Yi = 1) = 0.159 is the area above the threshold]

Data augmentation with ℓ = {ℓi}, i = 1, ..., n
• $p(\boldsymbol\ell|\boldsymbol\beta,\mathbf u) = \prod_{i=1}^{n} p(\ell_i|\boldsymbol\beta,\mathbf u) = \prod_{i=1}^{n}\phi(\ell_i - \mathbf x_i'\boldsymbol\beta - \mathbf z_i'\mathbf u)$
• Conditional on ℓi, the distribution of Yi becomes degenerate (a point mass):
  $\Pr(Y_i=1|\ell_i) = I(\ell_i>0)$, $\Pr(Y_i=0|\ell_i) = I(\ell_i\le 0)$, i.e.
  $\Pr(Y_i=y_i|\ell_i) = I(\ell_i\le 0)\,I(y_i=0) + I(\ell_i>0)\,I(y_i=1)$, with I(·) the indicator function.

Rewrite the hierarchical model
• Stage 1a: $p(\mathbf y|\boldsymbol\ell) = \prod_{i=1}^{n}\left[I(\ell_i\le 0)\,I(y_i=0) + I(\ell_i>0)\,I(y_i=1)\right]$
• Stage 1b: $p(\boldsymbol\ell|\boldsymbol\beta,\mathbf u) = \prod_{i=1}^{n}\phi(\ell_i - \mathbf x_i'\boldsymbol\beta - \mathbf z_i'\mathbf u)$
• These two stages define the likelihood function:
  $p(\mathbf y|\boldsymbol\beta,\mathbf u) = \int p(\mathbf y|\boldsymbol\ell)\,p(\boldsymbol\ell|\boldsymbol\beta,\mathbf u)\,d\boldsymbol\ell = \prod_{i=1}^{n}\Phi(\mathbf x_i'\boldsymbol\beta+\mathbf z_i'\mathbf u)^{y_i}\left[1-\Phi(\mathbf x_i'\boldsymbol\beta+\mathbf z_i'\mathbf u)\right]^{1-y_i}$

Joint posterior density
• Now $p(\boldsymbol\ell,\boldsymbol\beta,\mathbf u,\sigma^2_u|\mathbf y) \propto p(\mathbf y|\boldsymbol\ell)\,p(\boldsymbol\ell|\boldsymbol\beta,\mathbf u)\,p(\boldsymbol\beta)\,p(\mathbf u|\sigma^2_u)\,p(\sigma^2_u)$.
• Let's for now assume σ²u is known:
  $p(\boldsymbol\ell,\boldsymbol\beta,\mathbf u|\mathbf y,\sigma^2_u) \propto p(\mathbf y|\boldsymbol\ell)\,p(\boldsymbol\ell|\boldsymbol\beta,\mathbf u)\,p(\boldsymbol\beta)\,p(\mathbf u|\sigma^2_u)$

FCD: liabilities
• $p(\ell_i|\boldsymbol\beta,\mathbf u,\sigma^2_u,\mathbf y) \propto \Pr(Y_i=y_i|\ell_i)\,p(\ell_i|\boldsymbol\beta,\mathbf u)$:
  – if yi = 1: $p(\ell_i|\cdot) \propto \phi(\ell_i - \mathbf x_i'\boldsymbol\beta - \mathbf z_i'\mathbf u)\,I(\ell_i>0)$
  – if yi = 0: $p(\ell_i|\cdot) \propto \phi(\ell_i - \mathbf x_i'\boldsymbol\beta - \mathbf z_i'\mathbf u)\,I(\ell_i\le 0)$
• i.e., draw from truncated normals (a sketch of one way to do this follows below).

FCD (cont'd): fixed and random effects
• $p(\boldsymbol\beta,\mathbf u|\sigma^2_u,\boldsymbol\ell,\mathbf y) \propto p(\boldsymbol\ell|\boldsymbol\beta,\mathbf u)\,p(\mathbf u|\sigma^2_u) \propto \exp\!\left(-\frac{(\boldsymbol\ell-\mathbf X\boldsymbol\beta-\mathbf Z\mathbf u)'(\boldsymbol\ell-\mathbf X\boldsymbol\beta-\mathbf Z\mathbf u)}{2}\right)\exp\!\left(-\frac{\mathbf u'\mathbf A^{-1}\mathbf u}{2\sigma^2_u}\right)$
• So
  $\boldsymbol\beta,\mathbf u\,|\,\sigma^2_u,\boldsymbol\ell,\mathbf y \sim N\!\left(\begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf u}\end{bmatrix},\ \begin{bmatrix}\mathbf{X'X} & \mathbf{X'Z}\\ \mathbf{Z'X} & \mathbf{Z'Z+G^{-1}}\end{bmatrix}^{-1}\right)$,
  where $\mathbf G^{-1} = \mathbf A^{-1}\sigma_u^{-2}$ and
  $\begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf u}\end{bmatrix} = \begin{bmatrix}\mathbf{X'X} & \mathbf{X'Z}\\ \mathbf{Z'X} & \mathbf{Z'Z+G^{-1}}\end{bmatrix}^{-1}\begin{bmatrix}\mathbf{X'}\boldsymbol\ell\\ \mathbf{Z'}\boldsymbol\ell\end{bmatrix}$

Alternative sampling strategies for fixed and random effects
1. Joint multivariate draw from β, u | σ²u, ℓ, y: faster mixing, but computationally expensive?
2. Univariate draws from the FCD using partitioned matrix results (as in the earlier mixed-model FCD and the gibbs subroutine): slower mixing.
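One standard way to make these truncated-normal liability draws is inversion of the normal CDF. A minimal IML sketch for a single observation, where the value of eta = x_i'β + z_i'u is a made-up placeholder (this is an illustration, not the original course code):

proc iml;
 seed = 777;
 eta = 0.4;                             /* placeholder for x_i'beta + z_i'u */
 y   = 1;                               /* observed binary response         */
 u  = ranuni(seed);
 p0 = probnorm(0 - eta);                /* Pr(liability <= 0)               */
 if y = 1 then l = eta + probit(p0 + u*(1 - p0));   /* N(eta,1) truncated to (0, infinity)  */
 else          l = eta + probit(u*p0);              /* N(eta,1) truncated to (-infinity, 0] */
 print eta y l;
run;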
Recall the "binarized" RCBD
• Weight-gain responses by litter (block) and diet; responses flagged ">75" are coded as successes:

Diet   | Litter 1 | 2       | 3       | 4       | 5       | 6       | 7       | 8    | 9       | 10
Diet 1 | 79.5>75  | 70.9    | 76.8>75 | 75.9>75 | 77.3>75 | 66.4    | 59.1    | 64.1 | 74.5    | 67.3
Diet 2 | 80.9>75  | 81.8>75 | 86.4>75 | 75.5>75 | 77.3>75 | 73.2    | 77.7>75 | 72.3 | 81.4>75 | 82.3>75
Diet 3 | 79.1>75  | 70.9    | 90.5>75 | 62.7    | 69.5    | 86.4>75 | 72.7    | 73.6 | 64.5    | 65.9
Diet 4 | 88.6>75  | 88.6>75 | 89.1>75 | 91.4>75 | 75.0    | 79.5>75 | 85.0>75 | 75.9>75 | 75.5>75 | 70.5
Diet 5 | 95.9>75  | 85.9>75 | 83.2>75 | 87.7>75 | 74.5    | 72.7    | 90.9>75 | 60.0 | 83.6>75 | 63.2

MCMC analysis
• 5000 burn-in cycles.
• 500,000 additional cycles, saving every 10th: 50,000 saved cycles.
• Full conditional univariate sampling of the fixed and random effects.
• "Known" σ²u = 0.50.
• Remember: there is no σ²e for binary probit data.

Fixed effect comparison of inferences (conditional on "known" σ²u = 0.50)
• β = (μ, α1, α2, α3, α4)', with α5 = 0 (corner parameterization).

MCMC (N = 50000):
Variable  | Mean   | Median | Std Dev
intercept | 0.349  | 0.345  | 0.506
DIET1     | -0.659 | -0.654 | 0.64
DIET2     | 0.761  | 0.75   | 0.682
DIET3     | -1     | -0.993 | 0.649
DIET4     | 0.76   | 0.753  | 0.686

PROC GLIMMIX (solutions for fixed effects):
Effect    | Estimate | Standard Error
Intercept | 0.3097   | 0.4772
diet 1    | -0.5935  | 0.5960
diet 2    | 0.6761   | 0.6408
diet 3    | -0.9019  | 0.6104
diet 4    | 0.6775   | 0.6410
diet 5    | 0        | .

Marginal mean comparisons
• Based on K'β with
  $\mathbf K' = \begin{bmatrix}1&1&0&0&0\\1&0&1&0&0\\1&0&0&1&0\\1&0&0&0&1\\1&0&0&0&0\end{bmatrix}$,
  so that the ith marginal mean is μ + αi (with α5 = 0).

MCMC (N = 50000):
mm1: -0.31, -0.302, 0.499;  mm2: 1.11, 1.097, 0.562;  mm3: -0.651, -0.644, 0.515;  mm4: 1.109, 1.092, 0.563;  mm5: 0.349, 0.345, 0.506  (mean, median, std dev)

PROC GLIMMIX (diet least squares means):
diet 1: -0.2838 (0.4768);  diet 2: 0.9858 (0.5341);  diet 3: -0.5922 (0.4939);  diet 4: 0.9872 (0.5343);  diet 5: 0.3097 (0.4772)

Diet 1 marginal mean (μ + α1)
• [Figure: MCMC diagnostics/density for the Diet 1 marginal mean]

Posterior density discrepancy between MCMC and empirical Bayes?
• [Figure: diet marginal means; dotted lines: normal approximation based on PROC GLIMMIX; solid lines: MCMC]
• Do we run the risk of overstating precision with conventional methods?

How about probabilities of success?
• i.e., Φ(K'β), the normal CDF of the marginal means (as sketched below).

MCMC (N = 20000):
prob1: 0.391, 0.381, 0.173;  prob2: 0.833, 0.864, 0.126;  prob3: 0.282, 0.26, 0.157;  prob4: 0.833, 0.863, 0.126;  prob5: 0.623, 0.635, 0.173  (mean, median, std dev)

PROC GLIMMIX (ILINK / delta method):
diet | Estimate | Standard Error | Mean   | Standard Error Mean
1    | -0.2838  | 0.4768         | 0.3883 | 0.1827
2    | 0.9858   | 0.5341         | 0.8379 | 0.1311
3    | -0.5922  | 0.4939         | 0.2769 | 0.1653
4    | 0.9872   | 0.5343         | 0.8382 | 0.1309
5    | 0.3097   | 0.4772         | 0.6216 | 0.1815

Comparison of posterior densities for diet marginal mean probabilities
• [Figure: dotted lines: normal approximation based on PROC GLIMMIX; solid lines: MCMC]
• Largest discrepancies along the boundaries.

Posterior density of Φ(μ+α1) & Φ(μ+α2)
• [Figure: posterior densities of the two probabilities]

Posterior density of Φ(μ+α2) − Φ(μ+α1)
prob21_diff      | Frequency | Percent
prob21_diff < 0  | 819       | 1.64
prob21_diff >= 0 | 49181     | 98.36
• Probability(Φ(μ+α2) − Φ(μ+α1) < 0) = 0.0164; "two-tailed" P-value = 2 × 0.0164 = 0.0328.
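A sketch of how the success probabilities and their difference could be computed cycle by cycle from the saved samples. The data set and column names (soldata, intercept, diet1, diet2) are illustrative, not taken from the original program:

data probs;
 set soldata;
 prob1 = probnorm(intercept + diet1);   /* Phi(mu + alpha_1) */
 prob2 = probnorm(intercept + diet2);   /* Phi(mu + alpha_2) */
 prob21_diff = prob2 - prob1;
 negdiff = (prob21_diff < 0);
run;
proc means data=probs mean median std; var prob1 prob2 prob21_diff; run;
proc freq data=probs; tables negdiff; run;   /* Pr(diff < 0) estimated by the relative frequency */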
How does that compare with PROC GLIMMIX?

Estimates:
Label              | Estimate | Standard Error | DF    | t Value | Pr > |t| | Mean    | Standard Error Mean
diet 1 lsmean      | -0.2838  | 0.4768         | 10000 | -0.60   | 0.5517   | 0.3883  | 0.1827
diet 2 lsmean      | 0.9858   | 0.5341         | 10000 | 1.85    | 0.0650   | 0.8379  | 0.1311
diet1 vs diet2 dif | -1.2697  | 0.6433         | 10000 | -1.97   | 0.0484   | Non-est | .
• Recall, we assumed "known" σ²u, hence a normal rather than t-distributed test statistic.

What if variance components are not known?
• Specify priors on the variance components. Options:
  1. Conjugate scaled inverted chi-square, denoted χ⁻²(νm, νm s²m):
     $p(\sigma^2_m|\nu_m,s^2_m) \propto (\sigma^2_m)^{-\left(\frac{\nu_m}{2}+1\right)}\exp\!\left(-\frac{\nu_m s^2_m}{2\sigma^2_m}\right)$; m = u, e.
  2. Flat (and bounded as well?): $p(\sigma^2_m) \propto 1$; m = u, e — which corresponds to νm = −2, s²m = 0.
  3. Gelman's (2006) prior: $p(\sigma_m) = \text{Uniform}(0, A)$, 0 < σm < A, i.e. $p(\sigma^2_m) \propto (\sigma^2_m)^{-1/2}$ — which corresponds to νm = −1, s²m = 0.

Relationship between the scaled inverted chi-square and the inverted gamma
• Scaled inverted chi-square:
  $p(\sigma^2|\nu,s^2) \propto (\sigma^2)^{-\left(\frac{\nu}{2}+1\right)}\exp\!\left(-\frac{\nu s^2}{2\sigma^2}\right)$,
  with $E(\sigma^2|\nu,s^2) = \frac{\nu s^2}{\nu-2}$ (ν > 2) and $\text{Var}(\sigma^2|\nu,s^2) = \frac{2\nu^2 s^4}{(\nu-2)^2(\nu-4)}$ (ν > 4).
  Gelman's prior: ν = −1, s² = 0.
• Inverted gamma:
  $p(\sigma^2|a,b) \propto (\sigma^2)^{-(a+1)}\exp\!\left(-\frac{b}{\sigma^2}\right)$,
  with $E(\sigma^2) = \frac{b}{a-1}$ (a > 1) and $\text{Var}(\sigma^2) = \frac{b^2}{(a-1)^2(a-2)}$ (a > 2).
  Gelman's prior: a = −1/2, b = 0.

Gibbs sampling and mixed effects models
• Recall the following hierarchical model:
  $p(\mathbf y|\boldsymbol\beta,\mathbf u,\sigma^2_e) \propto (\sigma^2_e)^{-n/2}\exp\!\left(-\frac{(\mathbf y-\mathbf X\boldsymbol\beta-\mathbf Z\mathbf u)'(\mathbf y-\mathbf X\boldsymbol\beta-\mathbf Z\mathbf u)}{2\sigma^2_e}\right)$
  $p(\mathbf u|\sigma^2_u) \propto (\sigma^2_u)^{-q/2}\exp\!\left(-\frac{\mathbf u'\mathbf A^{-1}\mathbf u}{2\sigma^2_u}\right)$
  $p(\sigma^2_u|\nu_u,s^2_u) \propto (\sigma^2_u)^{-\left(\frac{\nu_u}{2}+1\right)}\exp\!\left(-\frac{\nu_u s^2_u}{2\sigma^2_u}\right)$
  $p(\sigma^2_e|\nu_e,s^2_e) \propto (\sigma^2_e)^{-\left(\frac{\nu_e}{2}+1\right)}\exp\!\left(-\frac{\nu_e s^2_e}{2\sigma^2_e}\right)$

Joint posterior density and FCD
• $p(\boldsymbol\beta,\mathbf u,\sigma^2_e,\sigma^2_u|\mathbf y)$ is proportional to the product of the four terms above.
• FCD for β and u: same as before (normal).
• FCD for the variance components are scaled inverted chi-squares (a sketch of drawing from such an FCD follows below):
  $p(\sigma^2_e|\text{ELSE},\mathbf y) = \chi^{-2}\!\left(\nu_e+n,\ \mathbf e'\mathbf e + \nu_e s^2_e\right)$, with $\mathbf e = \mathbf y - \mathbf X\boldsymbol\beta - \mathbf Z\mathbf u$
  $p(\sigma^2_u|\text{ELSE},\mathbf y) = \chi^{-2}\!\left(\nu_u+q,\ \mathbf u'\mathbf A^{-1}\mathbf u + \nu_u s^2_u\right)$
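A scaled inverted chi-square draw such as σ²e | ELSE ~ χ⁻²(νe + n, e'e + νe s²e) can be generated by dividing the scale by a chi-square deviate. A minimal IML sketch, with placeholder values for n and e'e (this is an illustration, not the original course code):

proc iml;
 seed = 55;
 n   = 60;                          /* number of observations (placeholder)           */
 nue = -1; se2 = 0;                 /* Gelman-type hyperparameters: nu = -1, s2 = 0   */
 ete = 36.2;                        /* e`e from the current residuals (placeholder)   */
 g = rangam(seed,(n+nue)/2);        /* Gamma((n + nu_e)/2, 1)                          */
 chi2 = 2*g;                        /* chi-square deviate with (n + nu_e) df           */
 sigma2e = (ete + nue*se2)/chi2;    /* one draw from the scaled inverted chi-square    */
 print sigma2e;
run;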
Back to the split plot in time example
• Empirical Bayes (EGLS based on REML):
title 'Split Plot in Time using Mixed';
title2 'UnKnown Variance Components';
proc mixed data=ear covtest;
 class trt time rabbit;
 model temp = trt time trt*time / solution;
 random rabbit(trt);
 ods output solutionf=solutionf;
run;
proc print data=solutionf; where estimate ne 0; run;
• Fully Bayes (code available online): 5000 burn-in cycles; 200,000 subsequent cycles, saving every 10th post burn-in; Gelman's prior on the variance components.

Variance component inference
PROC MIXED (covariance parameter estimates):
Cov Parm    | Estimate | Standard Error | Z Value | Pr > Z
rabbit(trt) | 0.08336  | 0.09910        | 0.84    | 0.2001
Residual    | 0.5783   | 0.1363         | 4.24    | <.0001

MCMC (N = 20000):
Variable | Mean  | Median | Std Dev
sigmau   | 0.127 | 0.0869 | 0.141
sigmae   | 0.632 | 0.611  | 0.15

MCMC plots
• [Figure: trace/density plots for the random effects variance and the residual variance]

Estimated effects ± se (sd)
PROC MIXED:
Intercept 0.22 (0.3638); trt 1: 2.36 (0.5145); trt 2: -0.22 (0.5145); time 1: -0.9 (0.481); time 2: 0.02 (0.481); time 3: -0.64 (0.481); trt*time 1 1: -1.92; 1 2: -1.22; 1 3: -0.06; 2 1: 0.32; 2 2: -0.54; 2 3: 0.58 (all interaction SEs 0.6802)
MCMC (N = 20000; mean, median, std dev):
intercept 0.217, 0.214, 0.388; TRT1 2.363, 2.368, 0.55; TRT2 -0.22, -0.219, 0.55; TIME1 -0.898, -0.893, 0.499; TIME2 0.0206, 0.0248, 0.502; TIME3 -0.64, -0.635, 0.501; TRT1 TIME1 -1.924, -1.931, 0.708; TRT1 TIME2 -1.222, -1.22, 0.71; TRT1 TIME3 -0.057, -0.057, 0.715; TRT2 TIME1 0.318, 0.315, 0.711; TRT2 TIME2 -0.54, -0.541, 0.711; TRT2 TIME3 0.585, 0.589, 0.71

Marginal ("least squares") means
PROC MIXED least squares means:
trt 1: 1.4000 (0.2135, DF 12); trt 2: -0.2900 (0.2135, 12); trt 3: -0.1600 (0.2135, 12)
time 1: -0.5000; time 2: 0.3667; time 3: 0.4667; time 4: 0.9333 (all 0.2100, DF 36)
trt*time 1 1: -0.2400; 1 2: 1.3800; 1 3: 1.8800; 1 4: 2.5800; 2 1: -0.5800; 2 2: -0.5200; 2 3: -0.06000; 2 4: 4.44E-16; 3 1: -0.6800; 3 2: 0.2400; 3 3: -0.4200; 3 4: 0.2200 (all 0.3638, DF 36)
MCMC (mean, median, std dev):
A1 1.399, 1.401, 0.24; A2 -0.292, -0.29, 0.237; A3 -0.16, -0.161, 0.236
B1 -0.502, -0.501, 0.224; B2 0.364, 0.363, 0.222; B3 0.467, 0.466, 0.224; B4 0.934, 0.936, 0.222
A1B1 -0.244, -0.246, 0.389; A1B2 1.378, 1.379, 0.391; A1B3 1.882, 1.88, 0.391; A1B4 2.581, 2.584, 0.391; A2B1 -0.586, -0.586, 0.393; A2B2 -0.526, -0.525, 0.385; A2B3 -0.058, -0.054, 0.387; A2B4 0.0031, 0.0017, 0.386; A3B1 -0.676, -0.678, 0.388; A3B2 0.239, 0.241, 0.386; A3B3 -0.422, -0.427, 0.392; A3B4 0.219, 0.216, 0.385

Posterior densities of A1, B1, A1B1
• [Figure: dotted lines: t densities based on estimates/standard errors from PROC MIXED; solid lines: MCMC]

How about fully Bayesian inference in generalized linear mixed models?
• Probit link GLMM:
  – Extensions to handle unknown variance components are exactly the same given the augmented liability variables.
  – i.e., the scaled inverted chi-square is conjugate for σ²u.
• No "overdispersion" (σ²e) to contend with for binary data.
• But stay tuned for binomial/Poisson data!

Analysis of the "binarized" RCBD data
• Empirical Bayes:
title 'Posterior inference conditional on unknown VC';
proc glimmix data=binarize;
 class litter diet;
 model y = diet / covb solution dist=bin link=probit;
 random litter;
 lsmeans diet / diff ilink;
 estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / ilink;
 estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / ilink;
 estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0;
run;
• Fully Bayes: 10,000 burn-in cycles; 200,000 cycles thereafter, saving every 10th; Gelman's prior on the variance component.
Inferences on VC
PROC GLIMMIX, method = RSPL: covariance parameter estimate 0.5783, standard error 0.5021.
Method = Laplace: estimate 0.6488, standard error 0.6410.
Method = Quad: estimate 0.6662, standard error 0.6573.
MCMC analysis (sigmau): mean 2.048, median 1.468, std dev 2.128, N = 20000.

Inferences on marginal means (μ + αi)
Method = Laplace (diet least squares means):
diet 1: -0.3024 (0.5159, DF 36); diet 2: 1.0929 (0.5964, 36); diet 3: -0.6428 (0.5335, 36); diet 4: 1.0946 (0.5976, 36); diet 5: 0.3519 (0.5294, 36)
MCMC (mean, median, std dev; N = 20000 each):
mm1 -0.297, -0.301, 0.643; mm2 1.322, 1.283, 0.716; mm3 -0.697, -0.69, 0.662; mm4 1.319, 1.285, 0.72; mm5 0.465, 0.442, 0.671
• The MCMC posterior standard deviations are larger: they take into account the uncertainty on the variance components.

Posterior densities of (μ + αi)
• [Figure: dotted lines: t36 densities based on estimates and standard errors from PROC GLIMMIX (method=laplace); solid lines: MCMC]

MCMC inferences on probabilities of "success": Φ(μ + αi)
• [Figure: posterior densities of the conditional success probabilities]

MCMC inferences on marginal (population-averaged) probabilities: $\Phi\!\left(\dfrac{\mu+\alpha_i}{\sqrt{1+\sigma^2_u}}\right)$
• Potentially big issues with empirical Bayes inference here, which depends on the quality of the VC inference and on asymptotics!
• (A sketch of the per-cycle computation follows below.)
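A sketch of the per-cycle computation of a population-averaged probability, assuming the saved samples are in a data set soldata with columns intercept, diet1, and sigma2u holding the sampled σ²u (all names illustrative, not from the original program):

data margprob;
 set soldata;
 /* population-averaged probability for diet 1: Phi((mu + alpha_1)/sqrt(1 + sigma2_u)) */
 pa1 = probnorm((intercept + diet1)/sqrt(1 + sigma2u));
run;
proc means data=margprob mean median std; var pa1; run;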
Inference on Diet 1 vs. Diet 2 probabilities
PROC GLIMMIX (estimates, ILINK scale): diet 1 lsmean: mean 0.3812, standard error 0.1966; diet 2 lsmean: mean 0.8628, standard error 0.1309; diet1 vs diet2 dif: non-estimable on the probability scale, P-value = 0.0559.
MCMC (mean, median, std dev; N = 20000): Prob diet1: 0.4, 0.382, 0.212; Prob diet2: 0.857, 0.899, 0.137; Prob diff: 0.457, 0.464, 0.207.
prob21_diff      | Frequency | Percent
prob21_diff < 0  | 180       | 0.90
prob21_diff >= 0 | 19820     | 99.10
• Probability(Φ(μ+α2) − Φ(μ+α1) < 0) = 0.0090 ("one-tailed").

Any formal comparisons between GLS/REML/EB (M/PQL) and MCMC for GLMM?
• Check Browne and Draper (2006).
• Normal data (LMM):
  – Generally, inferences based on GLS/REML and MCMC are sufficiently close.
  – Since GLS/REML is faster, it is the method of choice under classical assumptions.
• Non-normal data (GLMM):
  – Quasi-likelihood based methods are particularly problematic with respect to bias of point estimates and interval coverage for variance components.
    • Side effects on fixed effects inference.
  – Bayesian methods with diffuse priors are well calibrated for both properties for all parameters.
  – Comparisons with Laplace have not been done yet.

A pragmatic take on using MCMC vs. PL for GLMM under classical assumptions
• If data sets are too small to warrant asymptotic considerations, then the experiment is likely to be poorly powered.
  – Otherwise, PL inference might ≈ MCMC inference.
• However, differences could depend on dimensionality, deviation of the data distribution from normal, and complexity of the design.
• The real big advantage of MCMC is multi-stage hierarchical models (see later).

Implications of design on fully Bayes vs. PL inference for GLMM?
• RCBD: it is known for the LMM that inferences on treatment differences in an RCBD are resilient to the estimate of the block VC.
  – Inference on differences in treatment effects may thereby be insensitive to VC inference in the GLMM as well?
• Whole-plot treatment factor comparisons in split-plot designs? Greater sensitivity (i.e., to the whole-plot VC).
• Sensitivity of inference for conditional, $\Phi(\mathbf x_i'\boldsymbol\beta)$, versus "population-averaged", $\Phi\!\left(\dfrac{\mathbf x_i'\boldsymbol\beta}{\sqrt{1+\sigma^2_u}}\right)$, probabilities?

Ordinal categorical data
• Back to the GF83 data.
  – Gibbs sampling strategy laid out by Sorensen and Gianola (1995); Albert and Chib (1993).
  – Simple extensions of what was considered earlier for linear/probit mixed models:
    $\Pr(Y_i=j|\ell_i,\tau_{j-1},\tau_j) = \begin{cases}1 & \text{if } \tau_{j-1} < \ell_i \le \tau_j\\ 0 & \text{otherwise}\end{cases}$

Joint posterior density
• Stage 1A: $p(\mathbf y|\boldsymbol\ell,\boldsymbol\tau) = \prod_{i=1}^{n}\prod_{j=1}^{c} I(\tau_{j-1} < \ell_i \le \tau_j)^{I(Y_i=j)}$
• Stage 1B: $p(\ell_i|\boldsymbol\beta,\mathbf u,\boldsymbol\tau) = \phi(\ell_i - \mathbf x_i'\boldsymbol\beta - \mathbf z_i'\mathbf u)$
• Stage 2: $p(\boldsymbol\beta) \propto$ constant (or something diffuse);
  $p(\mathbf u|\sigma^2_u) = \frac{1}{(2\pi\sigma^2_u)^{q/2}|\mathbf A|^{1/2}}\exp\!\left(-\frac{\mathbf u'\mathbf A^{-1}\mathbf u}{2\sigma^2_u}\right)$
• Stage 3: $p(\sigma^2_u|\nu_u,s^2_u) \propto (\sigma^2_u)^{-\left(\frac{\nu_u}{2}+1\right)}\exp\!\left(-\frac{\nu_u s^2_u}{2\sigma^2_u}\right)$

Anything different for the FCD compared to the binary probit?
• Liabilities: $p(\ell_i|\boldsymbol\beta,\mathbf u,y_i=j) \propto \phi(\ell_i - \mathbf x_i'\boldsymbol\beta - \mathbf z_i'\mathbf u)\,I(\tau_{j-1} < \ell_i \le \tau_j)$
• Thresholds: $p(\tau_j|\boldsymbol\tau_{-j},\text{ELSE}) \sim U\!\left(\max(\ell_i : Y_i=j),\ \min(\ell_i : Y_i=j+1)\right)$
  – This leads to painfully slow mixing; a better strategy is based on Metropolis sampling (Cowles et al., 1996).

Fully Bayesian inference on GF83
• 5000 burn-in samples; 50,000 samples post burn-in, saving every 10th.
• [Figure: diagnostic plots for σ²u]

Posterior summaries
Variable        | Mean   | Median | Std Dev | 5th Pctl | 95th Pctl
intercept       | -0.222 | -0.198 | 0.669   | -1.209   | 0.723
hy              | 0.236  | 0.223  | 0.396   | -0.399   | 0.894
age             | -0.036 | -0.035 | 0.392   | -0.69    | 0.598
sex             | -0.172 | -0.171 | 0.393   | -0.818   | 0.48
sire1           | -0.082 | -0.042 | 0.587   | -1       | 0.734
sire2           | 0.116  | 0.0491 | 0.572   | -0.641   | 0.937
sire3           | 0.194  | 0.106  | 0.625   | -0.64    | 1.217
sire4           | -0.173 | -0.11  | 0.606   | -1.118   | 0.595
sigmau          | 1.362  | 0.202  | 8.658   | 0.0021   | 4.148
thresh2         | 0.83   | 0.804  | 0.302   | 0.383    | 1.366
probfemalecat1  | 0.598  | 0.609  | 0.188   | 0.265    | 0.885
probfemalecat2  | 0.827  | 0.864  | 0.148   | 0.53     | 0.986
probmalecat1    | 0.539  | 0.545  | 0.183   | 0.23     | 0.836
probmalecat2    | 0.79   | 0.821  | 0.154   | 0.491    | 0.974

Posterior densities of sex-specific cumulative probabilities (first two categories)
• [Figure]
• How would you interpret a "standard error" in this context?

Posterior densities of sex-specific probabilities (each category)
• [Figure]

What if some FCD are not recognizable?
• Examples: Poisson mixed models, logistic mixed models.
• Hmm, we need a different strategy.
  – Use Gibbs sampling whenever you can.
  – Use Metropolis-Hastings sampling for FCD that are not recognizable.
• NEXT!