Supplementary Information

Validation of differential gene expression algorithms: Application comparing fold-change estimation to hypothesis testing

Corey M. Yanofsky and David R. Bickel

A.1. Heuristic overview of the derivation of the Bayes factor

This document contains an explicit derivation of the Bayes factor used in the main paper for both paired and unpaired data. In each case, there are two models for the data: the null model, in which the gene is equivalently expressed in the two conditions, and the alternative model, in which the gene is differentially expressed.

The derivation of the Bayes factor requires two components per model. The first component is the probability distribution of the data conditional on some statistical parameters; this is termed the likelihood function. The differential expression model always has one extra parameter to account for the fact that the gene's expression level differs across conditions. The data are always modeled as: observed datum = average data level + measurement error. Here, the average data level is an unknown parameter. Throughout, the measurement errors are assumed to be independent and identically distributed as Gaussian random variables. That is, for all $j$,
\[
p(\epsilon_j \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{\epsilon_j^2}{2\sigma^2}\right],
\]
where $\epsilon_j$ is the measurement error of the $j$th observation and $\sigma^2$ is the data variance.

The second component is the prior distribution of the model parameters, namely, the baseline expression level of the gene and the experimental variability of the data. The prior distribution summarizes everything that is known about the model parameters prior to observing the data. Since the baseline expression level of the gene and the variability of the data are unknown, we use standard default priors for them. The extra parameter in the alternative model measures the amount of differential expression. Here, we use what has been called a unit-information prior distribution, that is, a prior distribution that contains exactly as much information as one extra data point. The unit-information prior is weakly informative, so it will not unduly influence the results in favor of either model.

To calculate the Bayes factor, we marginalize the model parameters; that is, we integrate the likelihood function with respect to the prior distribution, resulting in a prior predictive distribution. The marginalization removes the nuisance parameters from the expression. The Bayes factor is the ratio of the prior predictive distributions under the null and alternative models.

A.2. Derivations

Bayes factor for paired data

Suppose $n = n'$ and $x_{i,j}$ is paired with $x'_{i,j}$. Let
\[
y_j = x_{i,j} - x'_{i,j}. \tag{S1}
\]
The hypothesis of equivalent expression is M0: $y_j = \epsilon_j$, and the hypothesis of differential expression is M1: $y_j = \alpha + \epsilon_j$.

Prior distributions

For both models, we set
\[
p(\sigma^2) \propto \frac{1}{\sigma^2}.
\]
For M1, we use the unit-information prior
\[
p(\alpha \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\alpha - \mu_\alpha)^2\right].
\]
In the main text of the paper, the prior mean $\mu_\alpha$ is set to zero.

Null model prior predictive distribution

The prior predictive distribution of the data under M0 is
\[
p(y \mid M_0) = \int_0^\infty p(\sigma^2) \prod_j p(y_j \mid \mu = 0, \sigma^2)\, d\sigma^2,
\]
\[
p(y \mid M_0) = (2\pi)^{-n/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+1} \exp\left(-\frac{n\overline{y^2}}{2\sigma^2}\right) d\sigma^2,
\]
\[
p(y \mid M_0) = (2\pi)^{-n/2}\, \Gamma\!\left(\frac{n}{2}\right) 2^{n/2} \left(n\overline{y^2}\right)^{-n/2},
\]
where $\overline{y^2} = \frac{1}{n}\sum_j y_j^2$ and $\bar{y} = \frac{1}{n}\sum_j y_j$.

Alternative model prior predictive distribution

We define in advance:
\[
\hat{\alpha} = \frac{n\bar{y} + \mu_\alpha}{n+1},
\]
\[
SSR_1 = (\mu_\alpha - \hat{\alpha})^2 + \sum_j (y_j - \hat{\alpha})^2.
\]
After some algebra, we can derive
\[
SSR_1 = n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n}{n+1}(\bar{y} - \mu_\alpha)^2.
\]
Here, $SSR_1$ is the effective sum of squares of the residuals under M1. It is the sum of the SSR using the maximum likelihood estimator $\alpha_{MLE} = \bar{y}$ and a term that penalizes disagreement between the MLE and the prior mean.

The prior predictive distribution of the data under M1 is
\[
p(y \mid M_1) = \int_0^\infty \int_{-\infty}^\infty p(\sigma^2)\, p(\alpha \mid \sigma^2) \prod_j p(y_j \mid \mu = \alpha, \sigma^2)\, d\alpha\, d\sigma^2,
\]
\[
p(y \mid M_1) = (2\pi)^{-(n+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+1}{2}+1} \left(\int_{-\infty}^\infty \exp\left\{-\frac{1}{2\sigma^2}\left[(\alpha - \mu_\alpha)^2 + \sum_j (y_j - \alpha)^2\right]\right\} d\alpha\right) d\sigma^2.
\]
Isolating the part of the integrand that is a quadratic expression in $\alpha$, we complete the square:
\begin{align}
(\alpha - \mu_\alpha)^2 + \sum_j (y_j - \alpha)^2
&= \alpha^2 - 2\alpha\mu_\alpha + \mu_\alpha^2 + n\alpha^2 - 2\alpha n\bar{y} + n\overline{y^2} \\
&= (n+1)\alpha^2 - 2\alpha(n\bar{y} + \mu_\alpha) + \left(n\overline{y^2} + \mu_\alpha^2\right) \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \mu_\alpha^2 + n\bar{y}^2 - \frac{n^2\bar{y}^2 + 2n\bar{y}\mu_\alpha + \mu_\alpha^2}{n+1} \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \frac{1}{n+1}\left(n\bar{y}^2 - 2n\bar{y}\mu_\alpha + n\mu_\alpha^2\right) \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n}{n+1}(\bar{y} - \mu_\alpha)^2 \\
&= (n+1)(\alpha - \hat{\alpha})^2 + SSR_1.
\end{align}
Substituting back into the integral, we have
\[
p(y \mid M_1) = (2\pi)^{-(n+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+1}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) \left(\int_{-\infty}^\infty \exp\left[-\frac{n+1}{2\sigma^2}(\alpha - \hat{\alpha})^2\right] d\alpha\right) d\sigma^2,
\]
\[
p(y \mid M_1) = (2\pi)^{-n/2} (n+1)^{-1/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) d\sigma^2,
\]
\[
p(y \mid M_1) = (2\pi)^{-n/2}\, \Gamma\!\left(\frac{n}{2}\right) 2^{n/2} (n+1)^{-1/2} \left(SSR_1\right)^{-n/2}.
\]
The Bayes factor is
\[
BF = \frac{p(y \mid M_0)}{p(y \mid M_1)} = \sqrt{n+1}\left(\frac{SSR_1}{n\overline{y^2}}\right)^{n/2}. \tag{S2}
\]
Equations (S1) and (S2) together are equivalent to equations (9), (11), and (12) of the main paper.
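The closed form (S2) reduces to a few lines of code. Below is a minimal sketch in Python (the language, function name, and toy data are illustrative assumptions, not part of the paper) that evaluates the paired-data Bayes factor, with the prior mean defaulting to zero as in the main text.

```python
# a minimal sketch, assuming numpy; evaluates equation (S2) for paired data
import numpy as np

def paired_bayes_factor(x, x_prime, mu_alpha=0.0):
    """Bayes factor BF = p(y|M0)/p(y|M1) for paired expression data.

    x, x_prime : equal-length arrays of paired measurements; y = x - x_prime.
    Values > 1 favor equivalent expression (M0); values < 1 favor M1.
    """
    y = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    n = y.size
    ybar = y.mean()
    y2bar = np.mean(y**2)
    # effective residual sum of squares under M1: the SSR at the MLE (ybar)
    # plus a penalty for disagreement between the MLE and the prior mean
    ssr1 = n * (y2bar - ybar**2) + n / (n + 1) * (ybar - mu_alpha)**2
    return np.sqrt(n + 1) * (ssr1 / (n * y2bar))**(n / 2)

# usage: strongly differential pairs give a small Bayes factor
print(paired_bayes_factor([5.1, 4.8, 5.3, 5.0], [2.0, 2.2, 1.9, 2.1]))
```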
Bayes factor for two-sample data

Suppose that $x_{i,j}$ and $x'_{i,j}$ are independent. Define
\[
y_j = x_{i,j}, \quad j = 1, \ldots, n, \tag{S3}
\]
\[
y_{j+n} = x'_{i,j}, \quad j = 1, \ldots, n'. \tag{S4}
\]
The hypothesis of equivalent expression is M0: $y_j = \beta + \epsilon_j$, $j = 1, \ldots, n+n'$, and the hypothesis of differential expression is M1: $y_j = \beta + \epsilon_j$, $j = 1, \ldots, n$; $y_j = \alpha + \epsilon_j$, $j = n+1, \ldots, n+n'$.

Preliminaries

To fix notation, let
\[
\overline{y^2} = \frac{1}{n+n'}\sum_{j=1}^{n+n'} y_j^2, \quad
\bar{y} = \frac{1}{n+n'}\sum_{j=1}^{n+n'} y_j, \quad
\bar{y}_{n'} = \frac{1}{n'}\sum_{j=n+1}^{n+n'} y_j, \quad
\bar{y}_n = \frac{1}{n}\sum_{j=1}^{n} y_j.
\]
Before beginning the derivation of the Bayes factor, we note that the maximum likelihood estimates under M1 are
\[
\alpha_{MLE} = \bar{y}_{n'}, \quad \beta_{MLE} = \bar{y}_n,
\]
and the sum of squares of the residuals using the MLEs is
\[
SSR_{MLE} = (n+n')\overline{y^2} - n(\bar{y}_n)^2 - n'(\bar{y}_{n'})^2.
\]

Prior distributions

For both models, we set the prior for $(\beta, \sigma^2)$ to be
\[
p(\beta, \sigma^2) \propto \frac{1}{\sigma^2}.
\]
For the extra parameter in M1, we use the unit-information prior centered at $\alpha = \beta$,
\[
p(\alpha \mid \beta, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\alpha - \beta)^2\right].
\]

Null model prior predictive distribution

The prior predictive probability of the data under M0 is
\[
p(y \mid M_0) = \int_0^\infty p(\beta, \sigma^2) \int_{-\infty}^\infty \prod_{j=1}^{n+n'} p(y_j \mid \beta, \sigma^2)\, d\beta\, d\sigma^2,
\]
\[
p(y \mid M_0) = (2\pi)^{-\frac{n+n'}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'}{2}+1} \exp\left(-\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2\sigma^2}\right) \left(\int_{-\infty}^\infty \exp\left(-\frac{(n+n')(\beta - \bar{y})^2}{2\sigma^2}\right) d\beta\right) d\sigma^2,
\]
\[
p(y \mid M_0) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n')^{-\frac{1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'-1}{2}+1} \exp\left(-\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2\sigma^2}\right) d\sigma^2,
\]
\[
p(y \mid M_0) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n')^{-\frac{1}{2}}\, \Gamma\!\left(\frac{n+n'-1}{2}\right) 2^{\frac{n+n'-1}{2}} \left[(n+n')\left(\overline{y^2} - \bar{y}^2\right)\right]^{-\frac{n+n'-1}{2}}.
\]
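As a sanity check on the marginalization, the closed form above can be compared against brute-force numerical integration over $(\beta, \sigma^2)$. The sketch below (assuming numpy/scipy; the toy data and the finite integration limits are illustrative choices, not from the paper) does this on the log scale.

```python
# a minimal sanity check, assuming numpy/scipy: the closed-form null-model
# prior predictive versus direct numerical integration over (beta, sigma^2)
import numpy as np
from scipy import integrate
from scipy.special import gammaln

rng = np.random.default_rng(0)
y = rng.normal(5.0, 1.0, size=7)   # toy pooled data, N = n + n' = 7
N = len(y)
ybar, y2bar = y.mean(), np.mean(y**2)

# closed form: (2*pi)^{-(N-1)/2} N^{-1/2} Gamma((N-1)/2) 2^{(N-1)/2} [N(y2bar-ybar^2)]^{-(N-1)/2}
log_closed = (-(N - 1) / 2 * np.log(2 * np.pi) - 0.5 * np.log(N)
              + gammaln((N - 1) / 2) + (N - 1) / 2 * np.log(2)
              - (N - 1) / 2 * np.log(N * (y2bar - ybar**2)))

def integrand(beta, s2):
    # improper prior 1/s2 times the Gaussian likelihood of all N observations
    return (1 / s2) * np.exp(-np.sum((y - beta)**2) / (2 * s2)) \
           / (2 * np.pi * s2)**(N / 2)

# outer variable: sigma^2 in (1e-3, 50); inner: beta in ybar +/- 20
val, _ = integrate.dblquad(integrand, 1e-3, 50.0,
                           lambda s2: ybar - 20, lambda s2: ybar + 20)
print(np.log(val), log_closed)     # the two numbers should agree closely
```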
Alternative model prior predictive distribution

We define in advance:
\[
E(\alpha \mid \beta, y) = \frac{n'\bar{y}_{n'} + \beta}{n'+1},
\]
\[
\hat{\beta} = \frac{(n+nn')\bar{y}_n + n'\bar{y}_{n'}}{n+n'+nn'},
\]
\[
SSR_1 = SSR_{MLE} + \frac{nn'}{n+n'+nn'}\left(\bar{y}_n - \bar{y}_{n'}\right)^2.
\]
As before, the effective sum of squares of the residuals under M1 is the sum of the SSR using the maximum likelihood estimators and a penalty term for disagreement between the MLEs and the prior distribution.

Before dealing with the marginal probability of the data under M1, we rearrange the quadratic expression in $\alpha$ and $\beta$ to ease the integrations:
\begin{align}
(\alpha - \beta)^2 &+ \sum_{j=1}^{n}(y_j - \beta)^2 + \sum_{j=n+1}^{n+n'}(y_j - \alpha)^2 \\
&= \alpha^2 + \beta^2 - 2\alpha\beta + n\beta^2 - 2n\bar{y}_n\beta + n'\alpha^2 - 2n'\bar{y}_{n'}\alpha + (n+n')\overline{y^2} \\
&= (n+1)\beta^2 - 2n\bar{y}_n\beta + (n'+1)\alpha^2 - 2\left(n'\bar{y}_{n'} + \beta\right)\alpha + (n+n')\overline{y^2} \\
&= (n+1)\beta^2 - 2n\bar{y}_n\beta + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + (n+n')\overline{y^2} - \frac{\left(n'\bar{y}_{n'} + \beta\right)^2}{n'+1} \\
&= \frac{n+n'+nn'}{n'+1}\left[\beta^2 - 2\left(\frac{(n+nn')\bar{y}_n + n'\bar{y}_{n'}}{n+n'+nn'}\right)\beta\right] + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + (n+n')\overline{y^2} - \frac{n'^2}{n'+1}(\bar{y}_{n'})^2 \\
&= \frac{n+n'+nn'}{n'+1}\left(\beta - \hat{\beta}\right)^2 + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + (n+n')\overline{y^2} - \frac{n'^2}{n'+1}(\bar{y}_{n'})^2 - \frac{\left[(n+nn')\bar{y}_n + n'\bar{y}_{n'}\right]^2}{(n+n'+nn')(n'+1)} \\
&= \frac{n+n'+nn'}{n'+1}\left(\beta - \hat{\beta}\right)^2 + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + (n+n')\overline{y^2} - \frac{n^2(n'+1)(\bar{y}_n)^2 + n'^2(n+1)(\bar{y}_{n'})^2 + 2nn'\bar{y}_n\bar{y}_{n'}}{n+n'+nn'} \\
&= \frac{n+n'+nn'}{n'+1}\left(\beta - \hat{\beta}\right)^2 + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + (n+n')\overline{y^2} - n(\bar{y}_n)^2 - n'(\bar{y}_{n'})^2 + \frac{nn'(\bar{y}_n)^2 + nn'(\bar{y}_{n'})^2 - 2nn'\bar{y}_n\bar{y}_{n'}}{n+n'+nn'} \\
&= \frac{n+n'+nn'}{n'+1}\left(\beta - \hat{\beta}\right)^2 + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + SSR_{MLE} + \frac{nn'}{n+n'+nn'}\left(\bar{y}_n - \bar{y}_{n'}\right)^2 \\
&= \frac{n+n'+nn'}{n'+1}\left(\beta - \hat{\beta}\right)^2 + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + SSR_1.
\end{align}
The marginal probability of the data under M1 is
\[
p(y \mid M_1) = \int_0^\infty \int_{-\infty}^\infty \int_{-\infty}^\infty p(\beta, \sigma^2)\, p(\alpha \mid \beta, \sigma^2) \prod_{j=1}^{n} p(y_j \mid \mu = \beta, \sigma^2) \prod_{j=n+1}^{n+n'} p(y_j \mid \mu = \alpha, \sigma^2)\, d\alpha\, d\beta\, d\sigma^2,
\]
\[
p(y \mid M_1) = (2\pi)^{-\frac{n+n'+1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'+1}{2}+1} \int_{-\infty}^\infty \int_{-\infty}^\infty \exp\left[-\frac{1}{2\sigma^2}\left((\alpha - \beta)^2 + \sum_{j=1}^{n}(y_j - \beta)^2 + \sum_{j=n+1}^{n+n'}(y_j - \alpha)^2\right)\right] d\alpha\, d\beta\, d\sigma^2,
\]
\[
p(y \mid M_1) = (2\pi)^{-\frac{n+n'+1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'+1}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) \left(\int_{-\infty}^\infty \exp\left[-\frac{n+n'+nn'}{2\sigma^2(n'+1)}\left(\beta - \hat{\beta}\right)^2\right] d\beta\right) \left(\int_{-\infty}^\infty \exp\left[-\frac{n'+1}{2\sigma^2}\left(\alpha - E(\alpha \mid \beta, y)\right)^2\right] d\alpha\right) d\sigma^2,
\]
\[
p(y \mid M_1) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n'+nn')^{-\frac{1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'-1}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) d\sigma^2,
\]
\[
p(y \mid M_1) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n'+nn')^{-\frac{1}{2}}\, \Gamma\!\left(\frac{n+n'-1}{2}\right) 2^{\frac{n+n'-1}{2}} \left(SSR_1\right)^{-\frac{n+n'-1}{2}}.
\]
The Bayes factor is
\[
BF = \frac{p(y \mid M_0)}{p(y \mid M_1)} = \sqrt{\frac{n+n'+nn'}{n+n'}} \left(\frac{SSR_1}{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}\right)^{\frac{n+n'-1}{2}}. \tag{S5}
\]
Equations (S3), (S4), and (S5) together are equivalent to equations (10), (11), and (12) of the main paper.
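As with the paired case, (S5) is easy to compute directly. A minimal sketch, again assuming Python with numpy (the function name and example values are illustrative, not from the paper):

```python
# a minimal sketch, assuming numpy; evaluates equation (S5) for unpaired data
import numpy as np

def two_sample_bayes_factor(x, x_prime):
    """Bayes factor BF = p(y|M0)/p(y|M1) for unpaired expression data.

    x : n control values; x_prime : n' treatment values.
    """
    x = np.asarray(x, dtype=float)
    xp = np.asarray(x_prime, dtype=float)
    n, np_ = x.size, xp.size
    y = np.concatenate([x, xp])
    N = n + np_
    ybar, y2bar = y.mean(), np.mean(y**2)
    # SSR under the MLEs (group means), plus the unit-information penalty term
    ssr_mle = N * y2bar - n * x.mean()**2 - np_ * xp.mean()**2
    ssr1 = ssr_mle + n * np_ * (x.mean() - xp.mean())**2 / (N + n * np_)
    denom = N * (y2bar - ybar**2)   # total sum of squares about the pooled mean
    return np.sqrt((N + n * np_) / N) * (ssr1 / denom)**((N - 1) / 2)

# usage: clearly separated groups yield a small Bayes factor (evidence for M1)
print(two_sample_bayes_factor([5.1, 4.8, 5.3], [2.0, 2.2, 1.9, 2.1]))
```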
B.1. Derivation of posterior distribution for sampling variances

Under the null hypothesis with non-paired data, the data have the same mean, but the null hypothesis says nothing about the sampling variances of the treatment and control data. In Section A, the variances were treated as identical. Here we treat them as unrelated.

Suppose that
\[
y_j = \beta + \sigma\epsilon_j, \quad j = 1, \ldots, n,
\]
\[
y_j = \beta + \sigma'\epsilon_j, \quad j = n+1, \ldots, n+n',
\]
where the $\epsilon_j$ are independent and identically distributed Gaussian with mean zero and variance 1, $\sigma^2$ is the sampling variance of the control data, and $\sigma'^2$ is the sampling variance of the treatment data. To calculate the posterior predictive variance of a new treatment data point minus a new control data point, we need (up to proportionality) the posterior distribution of $\sigma^2$ and $\sigma'^2$, which we derive here.

Prior distribution

We set
\[
p(\beta, \sigma^2, \sigma'^2) \propto \frac{1}{\sigma^2\sigma'^2}.
\]

Posterior distribution

Define
\[
\bar{y}_{n'} = \frac{1}{n'}\sum_{j=n+1}^{n+n'} y_j, \quad
\bar{y}_n = \frac{1}{n}\sum_{j=1}^{n} y_j,
\]
\[
SSR' = \sum_{j=n+1}^{n+n'}\left(y_j - \bar{y}_{n'}\right)^2, \quad
SSR = \sum_{j=1}^{n}\left(y_j - \bar{y}_n\right)^2.
\]
Before dealing with the posterior distribution of $\sigma^2$ and $\sigma'^2$, we rearrange the quadratic expression in $\beta$ to ease the integration that will follow:
\begin{align}
\frac{n(\beta - \bar{y}_n)^2}{\sigma^2} + \frac{n'(\beta - \bar{y}_{n'})^2}{\sigma'^2}
&= \frac{n}{\sigma^2}\left(\beta^2 - 2\beta\bar{y}_n + \bar{y}_n^2\right) + \frac{n'}{\sigma'^2}\left(\beta^2 - 2\beta\bar{y}_{n'} + \bar{y}_{n'}^2\right) \\
&= \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\beta^2 - 2\beta\left(\frac{n\bar{y}_n}{\sigma^2} + \frac{n'\bar{y}_{n'}}{\sigma'^2}\right) + \frac{n\bar{y}_n^2}{\sigma^2} + \frac{n'\bar{y}_{n'}^2}{\sigma'^2} \\
&= \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \frac{n\sigma'^2\bar{y}_n + n'\sigma^2\bar{y}_{n'}}{n\sigma'^2 + n'\sigma^2}\right]^2 + \frac{n\bar{y}_n^2}{\sigma^2} + \frac{n'\bar{y}_{n'}^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_n + n'\sigma^2\bar{y}_{n'}\right)^2}{\left(n\sigma'^2 + n'\sigma^2\right)\sigma^2\sigma'^2}.
\end{align}
The last three terms simplify to $nn'(\bar{y}_n - \bar{y}_{n'})^2 / (n\sigma'^2 + n'\sigma^2)$, which we use below.

For the full set of parameters, the posterior distribution is
\begin{align}
p(\beta, \sigma^2, \sigma'^2 \mid y)
&\propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{\sum_{j=1}^{n}(y_j - \beta)^2}{2\sigma^2} - \frac{\sum_{j=n+1}^{n+n'}(y_j - \beta)^2}{2\sigma'^2}\right] \\
&= (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left(\frac{n(\beta - \bar{y}_n)^2}{\sigma^2} + \frac{n'(\beta - \bar{y}_{n'})^2}{\sigma'^2} + \frac{SSR}{\sigma^2} + \frac{SSR'}{\sigma'^2}\right)\right] \\
&= (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left\{\left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \frac{n\sigma'^2\bar{y}_n + n'\sigma^2\bar{y}_{n'}}{n\sigma'^2 + n'\sigma^2}\right]^2 + \frac{nn'(\bar{y}_n - \bar{y}_{n'})^2}{n\sigma'^2 + n'\sigma^2} + \frac{SSR}{\sigma^2} + \frac{SSR'}{\sigma'^2}\right\}\right].
\end{align}
Next we marginalize $\beta$ to eliminate it from the posterior distribution:
\[
p(\sigma^2, \sigma'^2 \mid y) = \int_{-\infty}^\infty p(\beta, \sigma^2, \sigma'^2 \mid y)\, d\beta,
\]
\[
p(\sigma^2, \sigma'^2 \mid y) \propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)^{-\frac{1}{2}} \exp\left[-\frac{1}{2}\left(\frac{SSR}{\sigma^2} + \frac{SSR'}{\sigma'^2} + \frac{nn'(\bar{y}_n - \bar{y}_{n'})^2}{n\sigma'^2 + n'\sigma^2}\right)\right].
\]
This is the final expression. It is not a standard distribution, but it is easy to generate Markov chain Monte Carlo samples from it.
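One simple way to generate such samples is a random-walk Metropolis algorithm on the log variances. The paper does not specify a sampler, so the sketch below (Python with numpy; the step size, iteration count, and toy data are illustrative assumptions) is one choice among many.

```python
# a minimal sketch, assuming numpy: random-walk Metropolis on (log s2, log s2p)
# targeting the joint posterior of the two sampling variances derived above
import numpy as np

def sample_variances(y_ctrl, y_trt, n_iter=20000, step=0.5, seed=0):
    y_ctrl, y_trt = np.asarray(y_ctrl, float), np.asarray(y_trt, float)
    n, npr = y_ctrl.size, y_trt.size
    ssr = np.sum((y_ctrl - y_ctrl.mean())**2)
    ssrp = np.sum((y_trt - y_trt.mean())**2)
    d2 = n * npr * (y_ctrl.mean() - y_trt.mean())**2   # n n' (ybar_n - ybar_n')^2

    def log_post(ls2, ls2p):
        # log posterior in log-variance space, including the Jacobian s2 * s2p
        s2, s2p = np.exp(ls2), np.exp(ls2p)
        return (-(n / 2 + 1) * ls2 - (npr / 2 + 1) * ls2p
                - 0.5 * np.log(n / s2 + npr / s2p)
                - 0.5 * (ssr / s2 + ssrp / s2p + d2 / (n * s2p + npr * s2))
                + ls2 + ls2p)

    rng = np.random.default_rng(seed)
    cur = np.log([y_ctrl.var(), y_trt.var()])   # start at the sample variances
    lp = log_post(*cur)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        prop = cur + step * rng.normal(size=2)
        lp_prop = log_post(*prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
            cur, lp = prop, lp_prop
        draws[t] = np.exp(cur)                     # back to the variance scale
    return draws                                   # columns: sigma^2, sigma'^2

# usage: posterior draws for unequal-variance toy data
draws = sample_variances([5.1, 4.8, 5.3, 5.2], [2.0, 3.0, 1.0, 2.5])
print(draws[10000:].mean(axis=0))                  # posterior means after burn-in
```

Sampling on the log scale keeps the proposed variances positive; the added $\log\sigma^2 + \log\sigma'^2$ term in the log posterior is the Jacobian of that reparameterization.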