Supplementary Information
Validation of differential gene expression algorithms: Application comparing fold-change
estimation to hypothesis testing
Corey M. Yanofsky and David R. Bickel
A.1. Heuristic overview of the derivation of the Bayes factor
This document contains an explicit derivation of the Bayes factor used in the main paper for
both paired and unpaired data. In each case, there are two models for the data: the null model in
which the gene is equivalently expressed in the two conditions, and the alternative model in which
the gene is differentially expressed.
The derivation of the Bayes factor requires two components per model. The first component
is the probability distribution of the data conditional on some statistical parameters; this is termed
the likelihood function. The differential expression model will always have one extra parameter to
take into account the fact that the gene's expression level is different across conditions.
The data are always modeled as:
observed datum = average data level + measurement error.
Here, the average data level is an unknown parameter. Throughout, the measurement errors are
assumed to be independent and identically distributed as Gaussian random variables. That is, for
all j,
$$p(\epsilon_j \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}\,\epsilon_j^2\right],$$
where $\epsilon_j$ is the measurement error of the $j$th observation and $\sigma^2$ is the data variance.
The second component is the prior distribution of the model parameters, namely, the
baseline expression level of the gene and the experimental variability of the data. The prior
distribution summarizes everything that is known about the model parameters prior to observing
the data. Since the basic expression level of the gene and the variability of the data are unknown, we
use standard default priors for them.
The extra parameter in the alternative model measures the amount of differential
expression. Here, we use what has been called a unit-information prior distribution, that is, a prior
distribution that contains exactly as much information as one extra data point. The unit-information prior is weakly informative, so it will not unduly influence the results in favor of either
model.
To calculate the Bayes factor, we marginalize the model parameters; that is, we integrate
the likelihood function with respect to the prior distribution, resulting in a prior predictive
distribution. The marginalization removes the nuisance parameters from the expression. The Bayes
factor is the ratio of the prior predictive distributions under the null and alternative models.
A.2. Derivations
Bayes factor for paired data
Suppose $n = n'$ and $x'_{g,j}$ is paired with $x_{g,j}$. Let
$$y_j = x'_{g,j} - x_{g,j}. \tag{S1}$$
The hypothesis of equivalent expression is
$$M_0\!:\quad y_j = \epsilon_j,$$
and the hypothesis of differential expression is
$$M_1\!:\quad y_j = \alpha + \epsilon_j.$$
Prior distributions
For both models, we set
$$p(\sigma^2) \propto \frac{1}{\sigma^2}.$$
For $M_1$, we use the unit-information prior
$$p(\alpha \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\alpha - \mu_\alpha)^2\right].$$
In the main text of the paper, the prior mean $\mu_\alpha$ is set to zero.
Null model prior predictive distribution
The prior predictive distribution of the data under $M_0$ is
$$p(y \mid M_0) = \int_0^\infty p(\sigma^2) \prod_j p(y_j \mid \mu = 0, \sigma^2)\, d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-n/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+1} \exp\left(-\frac{n\overline{y^2}}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-n/2}\, \Gamma\!\left(\frac{n}{2}\right) \left(\frac{n\overline{y^2}}{2}\right)^{-n/2},$$
where $\overline{y^2} = \frac{1}{n}\sum_j y_j^2$.
Alternative model prior predictive distribution
We define in advance:
$$\hat{\alpha} = \frac{n\bar{y} + \mu_\alpha}{n+1},$$
$$\mathrm{SSR}_1 = (\mu_\alpha - \hat{\alpha})^2 + \sum_j (y_j - \hat{\alpha})^2,$$
where $\bar{y} = \frac{1}{n}\sum_j y_j$. After some algebra, we can derive
$$\mathrm{SSR}_1 = n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n}{n+1}\left(\bar{y} - \mu_\alpha\right)^2.$$
Here, $\mathrm{SSR}_1$ is the effective sum of squares of the residuals under $M_1$. It is the sum of the SSR using the maximum likelihood estimator $\alpha_{\mathrm{MLE}} = \bar{y}$ and a term that penalizes disagreement between the MLE and the prior mean.
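Because the "after some algebra" step is easy to get wrong, the identity for $\mathrm{SSR}_1$ can be checked symbolically. The following sketch is ours, not part of the paper; it encodes $\sum_j y_j = n\bar{y}$ and $\sum_j y_j^2 = n\overline{y^2}$ and confirms that the two expressions for $\mathrm{SSR}_1$ agree:

```python
import sympy as sp

n = sp.symbols('n', positive=True)
mu, ybar, y2bar = sp.symbols('mu_alpha ybar y2bar', real=True)

alpha_hat = (n * ybar + mu) / (n + 1)

# SSR_1 as defined: (mu_alpha - alpha_hat)^2 + sum_j (y_j - alpha_hat)^2,
# using sum_j y_j = n*ybar and sum_j y_j^2 = n*y2bar to expand the sum.
ssr1_def = (mu - alpha_hat)**2 + (n * y2bar - 2 * alpha_hat * n * ybar + n * alpha_hat**2)

# The closed form obtained "after some algebra".
ssr1_closed = n * (y2bar - ybar**2) + n / (n + 1) * (ybar - mu)**2

print(sp.simplify(ssr1_def - ssr1_closed))  # prints 0
```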
The prior predictive distribution of the data under $M_1$ is
$$p(y \mid M_1) = \int_0^\infty \int_{-\infty}^{\infty} p(\sigma^2)\, p(\alpha \mid \sigma^2) \prod_j p(y_j \mid \mu = \alpha, \sigma^2)\, d\alpha\, d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-(n+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+1}{2}+1} \left(\int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2\sigma^2}\left[(\alpha - \mu_\alpha)^2 + \sum_j (y_j - \alpha)^2\right]\right\} d\alpha\right) d\sigma^2.$$
Isolating the part of the integrand that is a quadratic expression in $\alpha$, we complete the square:
$$\begin{aligned}
(\alpha - \mu_\alpha)^2 + \sum_j (y_j - \alpha)^2
&= \alpha^2 - 2\alpha\mu_\alpha + \mu_\alpha^2 + n\alpha^2 - 2\alpha n\bar{y} + n\overline{y^2} \\
&= (n+1)\alpha^2 - 2\alpha(n\bar{y} + \mu_\alpha) + \left(n\overline{y^2} + \mu_\alpha^2\right) \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \mu_\alpha^2 + n\bar{y}^2 - \frac{n^2\bar{y}^2 + 2n\bar{y}\mu_\alpha + \mu_\alpha^2}{n+1} \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n\bar{y}^2 - 2n\bar{y}\mu_\alpha + n\mu_\alpha^2}{n+1} \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n}{n+1}\left(\bar{y} - \mu_\alpha\right)^2 \\
&= (n+1)(\alpha - \hat{\alpha})^2 + \mathrm{SSR}_1.
\end{aligned}$$
Substituting back into the integral, we have
$$p(y \mid M_1) = (2\pi)^{-(n+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+1}{2}+1} \exp\left(-\frac{\mathrm{SSR}_1}{2\sigma^2}\right) \left(\int_{-\infty}^{\infty} \exp\left[-\frac{n+1}{2\sigma^2}(\alpha - \hat{\alpha})^2\right] d\alpha\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-n/2} (n+1)^{-1/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+1} \exp\left(-\frac{\mathrm{SSR}_1}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-n/2}\, \Gamma\!\left(\frac{n}{2}\right) (n+1)^{-1/2} \left(\frac{\mathrm{SSR}_1}{2}\right)^{-n/2}.$$
The Bayes factor is
$$BF = \frac{p(y \mid M_0)}{p(y \mid M_1)} = \sqrt{n+1}\left(\frac{\mathrm{SSR}_1}{n\overline{y^2}}\right)^{n/2}. \tag{S2}$$
Equations (S1) and (S2) together are equivalent to equations (9), (11), and (12) of the main paper.
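For concreteness, the paired-data Bayes factor can be computed directly from equations (S1) and (S2). The sketch below is ours, not the paper's software; the function name and interface are illustrative, and it uses the main text's default prior mean $\mu_\alpha = 0$:

```python
import numpy as np

def bayes_factor_paired(x_treat, x_ctrl, mu_alpha=0.0):
    """Bayes factor of M0 (equivalent expression) over M1 (differential
    expression) for paired data, per equations (S1) and (S2)."""
    y = np.asarray(x_treat, dtype=float) - np.asarray(x_ctrl, dtype=float)  # (S1)
    n = len(y)
    y_bar = y.mean()
    y2_bar = np.mean(y**2)
    # Effective sum of squared residuals under M1 (section A.2).
    ssr1 = n * (y2_bar - y_bar**2) + n / (n + 1) * (y_bar - mu_alpha)**2
    return np.sqrt(n + 1) * (ssr1 / (n * y2_bar))**(n / 2)                  # (S2)

# Illustrative use: a clear shift between conditions yields BF << 1,
# i.e., evidence for differential expression.
rng = np.random.default_rng(1)
x_ctrl = rng.normal(5.0, 1.0, size=10)
x_treat = x_ctrl + 2.0 + rng.normal(0.0, 1.0, size=10)
print(bayes_factor_paired(x_treat, x_ctrl))
```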
Bayes factor for two-sample data
Suppose that $x'_{g,j}$ and $x_{g,j}$ are independent. Define
$$y_j = x_{g,j}, \quad j = 1, \ldots, n, \tag{S3}$$
$$y_{j+n} = x'_{g,j}, \quad j = 1, \ldots, n'. \tag{S4}$$
The hypothesis of equivalent expression is
$$M_0\!:\quad y_j = \beta + \epsilon_j, \quad j = 1, \ldots, n+n',$$
and the hypothesis of differential expression is
$$M_1\!:\quad y_j = \beta + \epsilon_j, \quad j = 1, \ldots, n,$$
$$\phantom{M_1\!:\quad} y_j = \alpha + \epsilon_j, \quad j = n+1, \ldots, n+n'.$$
Preliminaries
To fix notation, let
$$\overline{y^2} = \frac{1}{n+n'} \sum_{j=1}^{n+n'} y_j^2,$$
$$\bar{y} = \frac{1}{n+n'} \sum_{j=1}^{n+n'} y_j,$$
$$\bar{y}_T = \frac{1}{n'} \sum_{j=n+1}^{n+n'} y_j,$$
$$\bar{y}_C = \frac{1}{n} \sum_{j=1}^{n} y_j,$$
where $\bar{y}_T$ and $\bar{y}_C$ are the treatment-group and control-group means, respectively.
Before beginning the derivation of the Bayes factor, we note that the maximum likelihood estimates under $M_1$ are
$$\alpha_{\mathrm{MLE}} = \bar{y}_T,$$
$$\beta_{\mathrm{MLE}} = \bar{y}_C,$$
and the sum of squares of the residuals using the MLEs is
$$\mathrm{SSR}_{\mathrm{MLE}} = (n+n')\overline{y^2} - n\,\bar{y}_C^2 - n'\,\bar{y}_T^2.$$
Prior distributions
For both models, we set the prior for $(\beta, \sigma^2)$ to be
$$p(\beta, \sigma^2) \propto \frac{1}{\sigma^2}.$$
For the extra parameter in $M_1$, we use the unit-information prior centered at $\alpha = \beta$,
$$p(\alpha \mid \beta, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\alpha - \beta)^2\right].$$
Null model prior predictive distribution
The prior predictive probability of the data under $M_0$ is
$$p(y \mid M_0) = \int_0^\infty p(\beta, \sigma^2) \int_{-\infty}^{\infty} \prod_{j=1}^{n+n'} p(y_j \mid \beta, \sigma^2)\, d\beta\, d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-\frac{n+n'}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'}{2}+1} \exp\left(-\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2\sigma^2}\right) \left(\int_{-\infty}^{\infty} \exp\left(-\frac{(n+n')(\beta - \bar{y})^2}{2\sigma^2}\right) d\beta\right) d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n')^{-\frac{1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'-1}{2}+1} \exp\left(-\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n')^{-\frac{1}{2}}\, \Gamma\!\left(\frac{n+n'-1}{2}\right) \left[\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2}\right]^{-\frac{n+n'-1}{2}}.$$
Alternative model prior predictive distribution
We define in advance:
$$E(\alpha \mid \beta, y) = \frac{n'\bar{y}_T + \beta}{n'+1},$$
$$\hat{\beta} = \frac{(n+nn')\bar{y}_C + n'\bar{y}_T}{n+n'+nn'},$$
$$\mathrm{SSR}_1 = \mathrm{SSR}_{\mathrm{MLE}} + \frac{nn'}{n+n'+nn'}\left(\bar{y}_T - \bar{y}_C\right)^2.$$
As before, the effective sum of squares of the residuals under $M_1$ is the sum of the SSR using the maximum likelihood estimators and a penalty term for disagreement between the MLEs and the prior distribution.
Before dealing with the marginal probability of the data under $M_1$, we re-arrange the quadratic expression in $\alpha$ and $\beta$ to ease the integrations.
$$\begin{aligned}
&(\alpha - \beta)^2 + \sum_{j=1}^{n}(y_j - \beta)^2 + \sum_{j=n+1}^{n+n'}(y_j - \alpha)^2 \\
&\quad = \alpha^2 + \beta^2 - 2\alpha\beta + n\beta^2 - 2n\bar{y}_C\,\beta + n'\alpha^2 - 2n'\bar{y}_T\,\alpha + (n+n')\overline{y^2} \\
&\quad = (n+1)\beta^2 - 2n\bar{y}_C\,\beta + (n'+1)\alpha^2 - 2(n'\bar{y}_T + \beta)\alpha + (n+n')\overline{y^2} \\
&\quad = (n+1)\beta^2 - 2n\bar{y}_C\,\beta + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{(n'\bar{y}_T + \beta)^2}{n'+1} \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\left[\beta^2 - 2\,\frac{(n+nn')\bar{y}_C + n'\bar{y}_T}{n+n'+nn'}\,\beta\right] + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{n'^2}{n'+1}\,\bar{y}_T^2 \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{n'^2}{n'+1}\,\bar{y}_T^2 - \frac{\bigl[(n+nn')\bar{y}_C + n'\bar{y}_T\bigr]^2}{(n+n'+nn')(n'+1)} \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{n^2(n'+1)\bar{y}_C^2 + n'^2(n+1)\bar{y}_T^2 + 2nn'\bar{y}_C\bar{y}_T}{n+n'+nn'} \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - n\bar{y}_C^2 - n'\bar{y}_T^2 + \frac{nn'\bigl(\bar{y}_C^2 + \bar{y}_T^2 - 2\bar{y}_C\bar{y}_T\bigr)}{n+n'+nn'} \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + \mathrm{SSR}_{\mathrm{MLE}} + \frac{nn'}{n+n'+nn'}\bigl(\bar{y}_T - \bar{y}_C\bigr)^2 \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + \mathrm{SSR}_1.
\end{aligned}$$
The marginal probability of the data under $M_1$ is
$$p(y \mid M_1) = \int_0^\infty \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(\beta, \sigma^2)\, p(\alpha \mid \beta, \sigma^2) \prod_{j=1}^{n} p(y_j \mid \mu = \beta, \sigma^2) \prod_{j=n+1}^{n+n'} p(y_j \mid \mu = \alpha, \sigma^2)\, d\alpha\, d\beta\, d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'+1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'+1}{2}+1} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\sigma^2}\left((\alpha - \beta)^2 + \sum_{j=1}^{n}(y_j - \beta)^2 + \sum_{j=n+1}^{n+n'}(y_j - \alpha)^2\right)\right] d\alpha\, d\beta\, d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'+1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'+1}{2}+1} \exp\left(-\frac{\mathrm{SSR}_1}{2\sigma^2}\right) \left(\int_{-\infty}^{\infty} \exp\left[-\frac{n+n'+nn'}{2\sigma^2(n'+1)}\bigl(\beta - \hat{\beta}\bigr)^2\right] d\beta\right) \times \left(\int_{-\infty}^{\infty} \exp\left[-\frac{n'+1}{2\sigma^2}\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2\right] d\alpha\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n'+nn')^{-\frac{1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'-1}{2}+1} \exp\left(-\frac{\mathrm{SSR}_1}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n'+nn')^{-\frac{1}{2}}\, \Gamma\!\left(\frac{n+n'-1}{2}\right) \left(\frac{\mathrm{SSR}_1}{2}\right)^{-\frac{n+n'-1}{2}}.$$
The Bayes factor is
$$BF = \frac{p(y \mid M_0)}{p(y \mid M_1)} = \sqrt{\frac{n+n'+nn'}{n+n'}} \left(\frac{\mathrm{SSR}_1}{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}\right)^{\frac{n+n'-1}{2}}. \tag{S5}$$
Equations (S3), (S4) and (S5) together are equivalent to equations (10), (11), and (12) of the main
paper.
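The two-sample Bayes factor admits an equally direct implementation. Again the sketch is ours and the names are illustrative; it follows equations (S3)-(S5) and allows unequal group sizes $n$ and $n'$:

```python
import numpy as np

def bayes_factor_two_sample(x_treat, x_ctrl):
    """Bayes factor of M0 over M1 for two-sample data, per (S3)-(S5)."""
    x_ctrl = np.asarray(x_ctrl, dtype=float)
    x_treat = np.asarray(x_treat, dtype=float)
    n, n_pr = len(x_ctrl), len(x_treat)
    y = np.concatenate([x_ctrl, x_treat])          # (S3) and (S4)
    N = n + n_pr                                   # total sample size n + n'
    y_bar, y2_bar = y.mean(), np.mean(y**2)
    ybar_C, ybar_T = x_ctrl.mean(), x_treat.mean()
    # SSR under the MLEs plus the unit-information penalty term gives SSR_1.
    ssr_mle = N * y2_bar - n * ybar_C**2 - n_pr * ybar_T**2
    ssr1 = ssr_mle + (n * n_pr / (N + n * n_pr)) * (ybar_T - ybar_C)**2
    denom = N * (y2_bar - y_bar**2)
    return np.sqrt((N + n * n_pr) / N) * (ssr1 / denom)**((N - 1) / 2)  # (S5)
```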
B.1. Derivation of posterior distribution for sampling variances
Under the null hypothesis with non-paired data, the data have the same mean, but the null
hypothesis says nothing about sampling variances for the treatment and control data. In section A,
the variances were treated as identical. Here we treat them as unrelated. Suppose that
$$y_j = \beta + \sigma\epsilon_j, \quad j = 1, \ldots, n,$$
$$y_j = \beta + \sigma'\epsilon_j, \quad j = n+1, \ldots, n+n',$$
where the $\epsilon_j$ are independent and identically distributed Gaussian random variables with mean zero and variance 1, $\sigma^2$ is the variance of the control data, and $\sigma'^2$ is the sampling variance of the treatment data.
To calculate the posterior predictive variance of a new treatment data point minus a new control data point, we need (up to proportionality) the posterior distribution of $\sigma^2$ and $\sigma'^2$, which we derive here.
Prior distribution
We set
$$p(\beta, \sigma^2, \sigma'^2) \propto \frac{1}{\sigma^2 \sigma'^2}.$$
Posterior distribution
Define
$$\bar{y}_T = \frac{1}{n'} \sum_{j=n+1}^{n+n'} y_j,$$
$$\bar{y}_C = \frac{1}{n} \sum_{j=1}^{n} y_j,$$
$$\mathrm{SSR}' = \sum_{j=n+1}^{n+n'} \left(y_j - \bar{y}_T\right)^2,$$
$$\mathrm{SSR} = \sum_{j=1}^{n} \left(y_j - \bar{y}_C\right)^2.$$
Before dealing with the posterior distribution of $\sigma^2$ and $\sigma'^2$, we re-arrange the quadratic expression in $\beta$ to ease the integration that will follow.
$$\begin{aligned}
&\frac{n(\beta - \bar{y}_C)^2}{\sigma^2} + \frac{n'(\beta - \bar{y}_T)^2}{\sigma'^2} \\
&\quad = \frac{n}{\sigma^2}\left(\beta^2 - 2\beta\bar{y}_C + \bar{y}_C^2\right) + \frac{n'}{\sigma'^2}\left(\beta^2 - 2\beta\bar{y}_T + \bar{y}_T^2\right) \\
&\quad = \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\beta^2 - 2\beta\left(\frac{n\bar{y}_C}{\sigma^2} + \frac{n'\bar{y}_T}{\sigma'^2}\right) + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} \\
&\quad = \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta^2 - 2\beta\left(\frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right)\right] + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} \\
&\quad = \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right]^2 + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)}.
\end{aligned}$$
For the full set of parameters, the posterior distribution is
$$\begin{aligned}
p(\beta, \sigma^2, \sigma'^2 \mid y) &\propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{\sum_{j=1}^{n}(y_j - \beta)^2}{2\sigma^2} - \frac{\sum_{j=n+1}^{n+n'}(y_j - \beta)^2}{2\sigma'^2}\right] \\
&= (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left(\frac{n(\beta - \bar{y}_C)^2}{\sigma^2} + \frac{n'(\beta - \bar{y}_T)^2}{\sigma'^2} + \frac{\mathrm{SSR}}{\sigma^2} + \frac{\mathrm{SSR}'}{\sigma'^2}\right)\right] \\
&= (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left\{\left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right]^2 \right.\right. \\
&\qquad\qquad \left.\left. + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{\mathrm{SSR}}{\sigma^2} + \frac{\mathrm{SSR}'}{\sigma'^2}\right\}\right].
\end{aligned}$$
Next we marginalize $\beta$ to eliminate it from the posterior distribution:
$$p(\sigma^2, \sigma'^2 \mid y) = \int_{-\infty}^{\infty} p(\beta, \sigma^2, \sigma'^2 \mid y)\, d\beta,$$
$$\begin{aligned}
p(\sigma^2, \sigma'^2 \mid y) &\propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left(\frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{\mathrm{SSR}}{\sigma^2} + \frac{\mathrm{SSR}'}{\sigma'^2}\right)\right] \\
&\qquad \times \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2}\left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right]^2\right\} d\beta \\
&\propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)^{-\frac{1}{2}} \\
&\qquad \times \exp\left[-\frac{1}{2}\left(\frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{\mathrm{SSR}}{\sigma^2} + \frac{\mathrm{SSR}'}{\sigma'^2}\right)\right].
\end{aligned}$$
This is the final expression. It is not a standard distribution, but it is easy to generate Markov chain
Monte Carlo samples from it.
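One possible sampler is a random-walk Metropolis algorithm on the log variances. The sketch below is ours and only illustrative (the paper does not specify a sampler); the data, proposal scale, and chain length are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example data: first group = control, second = treatment.
y_C = rng.normal(0.0, 1.0, size=8)
y_T = rng.normal(0.0, 1.5, size=6)

n, n_pr = len(y_C), len(y_T)
ybar_C, ybar_T = y_C.mean(), y_T.mean()
SSR = np.sum((y_C - ybar_C) ** 2)
SSR_pr = np.sum((y_T - ybar_T) ** 2)

def log_post(s2, sp2):
    """Log of p(sigma^2, sigma'^2 | y), up to an additive constant,
    following the final expression of section B.1."""
    prec = n / s2 + n_pr / sp2  # coefficient of the quadratic in beta
    quad = (n * ybar_C**2 / s2 + n_pr * ybar_T**2 / sp2
            - (n * sp2 * ybar_C + n_pr * s2 * ybar_T) ** 2
            / (s2 * sp2 * (n * sp2 + n_pr * s2)))
    return (-(n / 2 + 1) * np.log(s2) - (n_pr / 2 + 1) * np.log(sp2)
            - 0.5 * np.log(prec)
            - 0.5 * (quad + SSR / s2 + SSR_pr / sp2))

# Random-walk Metropolis on (log sigma^2, log sigma'^2); the "+ u.sum()"
# terms are the Jacobian of the log transformation.
u = np.log([y_C.var(), y_T.var()])
draws = []
for _ in range(20000):
    prop = u + 0.4 * rng.standard_normal(2)
    log_accept = (log_post(*np.exp(prop)) + prop.sum()
                  - log_post(*np.exp(u)) - u.sum())
    if np.log(rng.random()) < log_accept:
        u = prop
    draws.append(np.exp(u))
draws = np.asarray(draws[5000:])  # discard burn-in
print("posterior means of (sigma^2, sigma'^2):", draws.mean(axis=0))
```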