Supplementary Information
Validation of differential gene expression algorithms: Application comparing fold-change
estimation to hypothesis testing
Corey M. Yanofsky and David R. Bickel
A.1. Heuristic overview of the derivation of the Bayes factor
This document contains an explicit derivation of the Bayes factor used in the main paper for
both paired and unpaired data. In each case, there are two models for the data: the null model in
which the gene is equivalently expressed in the two conditions, and the alternative model in which
the gene is differentially expressed.
The derivation of the Bayes factor requires two components per model. The first component
is the probability distribution of the data conditional on some statistical parameters; this is termed
the likelihood function. The differential expression model will always have one extra parameter to
take into account the fact that the gene's expression level is different across conditions.
The data are always modeled as:
observed datum = average data level + measurement error.
Here, the average data level is an unknown parameter. Throughout, the measurement errors are
assumed to be independent and identically distributed as Gaussian random variables. That is, for
all j,
$$p(\epsilon_j \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}\,\epsilon_j^2\right],$$
where $\epsilon_j$ is the measurement error of the $j$th observation and $\sigma^2$ is the data variance.
The second component is the prior distribution of the model parameters, namely, the
baseline expression level of the gene and the experimental variability of the data. The prior
distribution summarizes everything that is known about the model parameters prior to observing
the data. Since the basic expression level of the gene and the variability of the data are unknown, we
use standard default priors for them.
The extra parameter in the alternative model measures the amount of differential
expression. Here, we use what has been called a unit-information prior distribution, that is, a prior
distribution that contains exactly as much information as one extra data point. The unit-information prior is weakly informative, so it will not unduly influence the results in favor of either
model.
To calculate the Bayes factor, we marginalize the model parameters; that is, we integrate
the likelihood function with respect to the prior distribution, resulting in a prior predictive
distribution. The marginalization removes the nuisance parameters from the expression. The Bayes
factor is the ratio of the prior predictive distributions under the null and alternative models.
A.2. Derivations
Bayes factor for paired data
Suppose $n = n'$ and $x'_{g,j}$ is paired with $x_{g,j}$. Let
$$y_j = x'_{g,j} - x_{g,j}. \tag{S1}$$
The hypothesis of equivalent expression is
$$M_0\!:\quad y_j = \epsilon_j,$$
and the hypothesis of differential expression is
$$M_1\!:\quad y_j = \alpha + \epsilon_j.$$
Prior distributions
For both models, we set
$$p(\sigma^2) \propto \frac{1}{\sigma^2}.$$
For $M_1$, we use the unit-information prior
$$p(\alpha \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\alpha - \mu_\alpha)^2\right].$$
In the main text of the paper, the prior mean $\mu_\alpha$ is set to zero.
Null model prior predictive distribution
The prior predictive distribution of the data under $M_0$ is
$$p(y \mid M_0) = \int_0^\infty p(\sigma^2) \prod_j p(y_j \mid \mu = 0, \sigma^2)\, d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-n/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+1} \exp\left(-\frac{n\overline{y^2}}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-n/2}\, \Gamma\!\left(\frac{n}{2}\right) \left(\frac{n\overline{y^2}}{2}\right)^{-n/2},$$
where $\overline{y^2} = \frac{1}{n}\sum_j y_j^2$.
Alternative model prior predictive distribution
We define in advance:
$$\hat{\alpha} = \frac{n\bar{y} + \mu_\alpha}{n+1},$$
$$\mathrm{SSR}_1 = (\mu_\alpha - \hat{\alpha})^2 + \sum_j (y_j - \hat{\alpha})^2,$$
where $\bar{y} = \frac{1}{n}\sum_j y_j$. After some algebra, we can derive
$$\mathrm{SSR}_1 = n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n}{n+1}\left(\bar{y} - \mu_\alpha\right)^2.$$
Here, $\mathrm{SSR}_1$ is the effective sum of squares of the residuals under $M_1$. It is the sum of the SSR using the maximum likelihood estimator $\alpha_{\mathrm{MLE}} = \bar{y}$ and a term that penalizes disagreement between the MLE and the prior mean.
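Because the "after some algebra" step is easy to get wrong, the identity for $\mathrm{SSR}_1$ can be checked symbolically. The following sketch is ours, not part of the paper; it encodes $\sum_j y_j = n\bar{y}$ and $\sum_j y_j^2 = n\overline{y^2}$ and confirms that the two expressions for $\mathrm{SSR}_1$ agree:

```python
import sympy as sp

n = sp.symbols('n', positive=True)
mu, ybar, y2bar = sp.symbols('mu_alpha ybar y2bar', real=True)

alpha_hat = (n * ybar + mu) / (n + 1)

# SSR_1 as defined: (mu_alpha - alpha_hat)^2 + sum_j (y_j - alpha_hat)^2,
# using sum_j y_j = n*ybar and sum_j y_j^2 = n*y2bar to expand the sum.
ssr1_def = (mu - alpha_hat)**2 + (n * y2bar - 2 * alpha_hat * n * ybar + n * alpha_hat**2)

# The closed form obtained "after some algebra".
ssr1_closed = n * (y2bar - ybar**2) + n / (n + 1) * (ybar - mu)**2

print(sp.simplify(ssr1_def - ssr1_closed))  # prints 0
```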
The prior predictive distribution of the data under $M_1$ is
$$p(y \mid M_1) = \int_0^\infty \int_{-\infty}^{\infty} p(\sigma^2)\, p(\alpha \mid \sigma^2) \prod_j p(y_j \mid \mu = \alpha, \sigma^2)\, d\alpha\, d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-(n+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+1}{2}+1} \left(\int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2\sigma^2}\left[(\alpha - \mu_\alpha)^2 + \sum_j (y_j - \alpha)^2\right]\right\} d\alpha\right) d\sigma^2.$$
Isolating the part of the integrand that is a quadratic expression in $\alpha$, we complete the square:
$$\begin{aligned}
(\alpha - \mu_\alpha)^2 + \sum_j (y_j - \alpha)^2
&= \alpha^2 - 2\alpha\mu_\alpha + \mu_\alpha^2 + n\alpha^2 - 2\alpha n\bar{y} + n\overline{y^2} \\
&= (n+1)\alpha^2 - 2\alpha(n\bar{y} + \mu_\alpha) + \left(n\overline{y^2} + \mu_\alpha^2\right) \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \mu_\alpha^2 + n\bar{y}^2 - \frac{n^2\bar{y}^2 + 2n\bar{y}\mu_\alpha + \mu_\alpha^2}{n+1} \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n\bar{y}^2 - 2n\bar{y}\mu_\alpha + n\mu_\alpha^2}{n+1} \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n}{n+1}\left(\bar{y} - \mu_\alpha\right)^2 \\
&= (n+1)(\alpha - \hat{\alpha})^2 + \mathrm{SSR}_1.
\end{aligned}$$
Substituting back into the integral, we have
$$p(y \mid M_1) = (2\pi)^{-(n+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+1}{2}+1} \exp\left(-\frac{\mathrm{SSR}_1}{2\sigma^2}\right) \left(\int_{-\infty}^{\infty} \exp\left[-\frac{n+1}{2\sigma^2}(\alpha - \hat{\alpha})^2\right] d\alpha\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-n/2} (n+1)^{-1/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+1} \exp\left(-\frac{\mathrm{SSR}_1}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-n/2}\, \Gamma\!\left(\frac{n}{2}\right) (n+1)^{-1/2} \left(\frac{\mathrm{SSR}_1}{2}\right)^{-n/2}.$$
The Bayes factor is
$$BF = \frac{p(y \mid M_0)}{p(y \mid M_1)} = \sqrt{n+1}\left(\frac{\mathrm{SSR}_1}{n\overline{y^2}}\right)^{n/2}. \tag{S2}$$
Equations (S1) and (S2) together are equivalent to equations (9), (11), and (12) of the main paper.
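For concreteness, the paired-data Bayes factor can be computed directly from equations (S1) and (S2). The sketch below is ours, not the paper's software; the function name and interface are illustrative, and it uses the main text's default prior mean $\mu_\alpha = 0$:

```python
import numpy as np

def bayes_factor_paired(x_treat, x_ctrl, mu_alpha=0.0):
    """Bayes factor of M0 (equivalent expression) over M1 (differential
    expression) for paired data, per equations (S1) and (S2)."""
    y = np.asarray(x_treat, dtype=float) - np.asarray(x_ctrl, dtype=float)  # (S1)
    n = len(y)
    y_bar = y.mean()
    y2_bar = np.mean(y**2)
    # Effective sum of squared residuals under M1 (section A.2).
    ssr1 = n * (y2_bar - y_bar**2) + n / (n + 1) * (y_bar - mu_alpha)**2
    return np.sqrt(n + 1) * (ssr1 / (n * y2_bar))**(n / 2)                  # (S2)

# Illustrative use: a clear shift between conditions yields BF << 1,
# i.e., evidence for differential expression.
rng = np.random.default_rng(1)
x_ctrl = rng.normal(5.0, 1.0, size=10)
x_treat = x_ctrl + 2.0 + rng.normal(0.0, 1.0, size=10)
print(bayes_factor_paired(x_treat, x_ctrl))
```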
Bayes factor for two-sample data
Suppose that $x'_{g,j}$ and $x_{g,j}$ are independent. Define
$$y_j = x_{g,j}, \quad j = 1, \ldots, n, \tag{S3}$$
$$y_{j+n} = x'_{g,j}, \quad j = 1, \ldots, n'. \tag{S4}$$
The hypothesis of equivalent expression is
$$M_0\!:\quad y_j = \beta + \epsilon_j, \quad j = 1, \ldots, n+n',$$
and the hypothesis of differential expression is
$$M_1\!:\quad y_j = \beta + \epsilon_j, \quad j = 1, \ldots, n,$$
$$\phantom{M_1\!:\quad} y_j = \alpha + \epsilon_j, \quad j = n+1, \ldots, n+n'.$$
Preliminaries
To fix notation, let
$$\overline{y^2} = \frac{1}{n+n'} \sum_{j=1}^{n+n'} y_j^2,$$
$$\bar{y} = \frac{1}{n+n'} \sum_{j=1}^{n+n'} y_j,$$
$$\bar{y}_T = \frac{1}{n'} \sum_{j=n+1}^{n+n'} y_j,$$
$$\bar{y}_C = \frac{1}{n} \sum_{j=1}^{n} y_j,$$
where $\bar{y}_T$ and $\bar{y}_C$ are the treatment-group and control-group means, respectively.
Before beginning the derivation of the Bayes factor, we note that the maximum likelihood estimates under $M_1$ are
$$\alpha_{\mathrm{MLE}} = \bar{y}_T,$$
$$\beta_{\mathrm{MLE}} = \bar{y}_C,$$
and the sum of squares of the residuals using the MLEs is
$$\mathrm{SSR}_{\mathrm{MLE}} = (n+n')\overline{y^2} - n\,\bar{y}_C^2 - n'\,\bar{y}_T^2.$$
Prior distributions
For both models, we set the prior for $(\beta, \sigma^2)$ to be
$$p(\beta, \sigma^2) \propto \frac{1}{\sigma^2}.$$
For the extra parameter in $M_1$, we use the unit-information prior centered at $\alpha = \beta$,
$$p(\alpha \mid \beta, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\alpha - \beta)^2\right].$$
Null model prior predictive distribution
The prior predictive probability of the data under $M_0$ is
$$p(y \mid M_0) = \int_0^\infty p(\beta, \sigma^2) \int_{-\infty}^{\infty} \prod_{j=1}^{n+n'} p(y_j \mid \beta, \sigma^2)\, d\beta\, d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-\frac{n+n'}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'}{2}+1} \exp\left(-\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2\sigma^2}\right) \left(\int_{-\infty}^{\infty} \exp\left(-\frac{(n+n')(\beta - \bar{y})^2}{2\sigma^2}\right) d\beta\right) d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n')^{-\frac{1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'-1}{2}+1} \exp\left(-\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n')^{-\frac{1}{2}}\, \Gamma\!\left(\frac{n+n'-1}{2}\right) \left[\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2}\right]^{-\frac{n+n'-1}{2}}.$$
Alternative model prior predictive distribution
We define in advance:
$$E(\alpha \mid \beta, y) = \frac{n'\bar{y}_T + \beta}{n'+1},$$
$$\hat{\beta} = \frac{(n+nn')\bar{y}_C + n'\bar{y}_T}{n+n'+nn'},$$
$$\mathrm{SSR}_1 = \mathrm{SSR}_{\mathrm{MLE}} + \frac{nn'}{n+n'+nn'}\left(\bar{y}_T - \bar{y}_C\right)^2.$$
As before, the effective sum of squares of the residuals under $M_1$ is the sum of the SSR using the maximum likelihood estimators and a penalty term for disagreement between the MLEs and the prior distribution.
Before dealing with the marginal probability of the data under $M_1$, we re-arrange the quadratic expression in $\alpha$ and $\beta$ to ease the integrations.
$$\begin{aligned}
&(\alpha - \beta)^2 + \sum_{j=1}^{n}(y_j - \beta)^2 + \sum_{j=n+1}^{n+n'}(y_j - \alpha)^2 \\
&\quad = \alpha^2 + \beta^2 - 2\alpha\beta + n\beta^2 - 2n\bar{y}_C\,\beta + n'\alpha^2 - 2n'\bar{y}_T\,\alpha + (n+n')\overline{y^2} \\
&\quad = (n+1)\beta^2 - 2n\bar{y}_C\,\beta + (n'+1)\alpha^2 - 2(n'\bar{y}_T + \beta)\alpha + (n+n')\overline{y^2} \\
&\quad = (n+1)\beta^2 - 2n\bar{y}_C\,\beta + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{(n'\bar{y}_T + \beta)^2}{n'+1} \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\left[\beta^2 - 2\,\frac{(n+nn')\bar{y}_C + n'\bar{y}_T}{n+n'+nn'}\,\beta\right] + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{n'^2}{n'+1}\,\bar{y}_T^2 \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{n'^2}{n'+1}\,\bar{y}_T^2 - \frac{\bigl[(n+nn')\bar{y}_C + n'\bar{y}_T\bigr]^2}{(n+n'+nn')(n'+1)} \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{n^2(n'+1)\bar{y}_C^2 + n'^2(n+1)\bar{y}_T^2 + 2nn'\bar{y}_C\bar{y}_T}{n+n'+nn'} \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - n\bar{y}_C^2 - n'\bar{y}_T^2 + \frac{nn'\bigl(\bar{y}_C^2 + \bar{y}_T^2 - 2\bar{y}_C\bar{y}_T\bigr)}{n+n'+nn'} \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + \mathrm{SSR}_{\mathrm{MLE}} + \frac{nn'}{n+n'+nn'}\bigl(\bar{y}_T - \bar{y}_C\bigr)^2 \\
&\quad = \left(\frac{n+n'+nn'}{n'+1}\right)\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + \mathrm{SSR}_1.
\end{aligned}$$
The marginal probability of the data under $M_1$ is
$$p(y \mid M_1) = \int_0^\infty \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(\beta, \sigma^2)\, p(\alpha \mid \beta, \sigma^2) \prod_{j=1}^{n} p(y_j \mid \mu = \beta, \sigma^2) \prod_{j=n+1}^{n+n'} p(y_j \mid \mu = \alpha, \sigma^2)\, d\alpha\, d\beta\, d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'+1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'+1}{2}+1} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\sigma^2}\left((\alpha - \beta)^2 + \sum_{j=1}^{n}(y_j - \beta)^2 + \sum_{j=n+1}^{n+n'}(y_j - \alpha)^2\right)\right] d\alpha\, d\beta\, d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'+1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'+1}{2}+1} \exp\left(-\frac{\mathrm{SSR}_1}{2\sigma^2}\right) \left(\int_{-\infty}^{\infty} \exp\left[-\frac{n+n'+nn'}{2\sigma^2(n'+1)}\bigl(\beta - \hat{\beta}\bigr)^2\right] d\beta\right) \times \left(\int_{-\infty}^{\infty} \exp\left[-\frac{n'+1}{2\sigma^2}\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2\right] d\alpha\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n'+nn')^{-\frac{1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'-1}{2}+1} \exp\left(-\frac{\mathrm{SSR}_1}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n'+nn')^{-\frac{1}{2}}\, \Gamma\!\left(\frac{n+n'-1}{2}\right) \left(\frac{\mathrm{SSR}_1}{2}\right)^{-\frac{n+n'-1}{2}}.$$
The Bayes factor is
$$BF = \frac{p(y \mid M_0)}{p(y \mid M_1)} = \sqrt{\frac{n+n'+nn'}{n+n'}} \left(\frac{\mathrm{SSR}_1}{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}\right)^{\frac{n+n'-1}{2}}. \tag{S5}$$
Equations (S3), (S4) and (S5) together are equivalent to equations (10), (11), and (12) of the main
paper.
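The two-sample Bayes factor admits an equally direct implementation. Again the sketch is ours and the names are illustrative; it follows equations (S3)-(S5) and allows unequal group sizes $n$ and $n'$:

```python
import numpy as np

def bayes_factor_two_sample(x_treat, x_ctrl):
    """Bayes factor of M0 over M1 for two-sample data, per (S3)-(S5)."""
    x_ctrl = np.asarray(x_ctrl, dtype=float)
    x_treat = np.asarray(x_treat, dtype=float)
    n, n_pr = len(x_ctrl), len(x_treat)
    y = np.concatenate([x_ctrl, x_treat])          # (S3) and (S4)
    N = n + n_pr                                   # total sample size n + n'
    y_bar, y2_bar = y.mean(), np.mean(y**2)
    ybar_C, ybar_T = x_ctrl.mean(), x_treat.mean()
    # SSR under the MLEs plus the unit-information penalty term gives SSR_1.
    ssr_mle = N * y2_bar - n * ybar_C**2 - n_pr * ybar_T**2
    ssr1 = ssr_mle + (n * n_pr / (N + n * n_pr)) * (ybar_T - ybar_C)**2
    denom = N * (y2_bar - y_bar**2)
    return np.sqrt((N + n * n_pr) / N) * (ssr1 / denom)**((N - 1) / 2)  # (S5)
```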
B.1. Derivation of posterior distribution for sampling variances
Under the null hypothesis with non-paired data, the data have the same mean, but the null
hypothesis says nothing about sampling variances for the treatment and control data. In section A,
the variances were treated as identical. Here we treat them as unrelated. Suppose that
$$y_j = \beta + \sigma\epsilon_j, \quad j = 1, \ldots, n,$$
$$y_j = \beta + \sigma'\epsilon_j, \quad j = n+1, \ldots, n+n',$$
where the $\epsilon_j$ are independent and identically distributed Gaussian random variables with mean zero and variance 1, $\sigma^2$ is the variance of the control data, and $\sigma'^2$ is the sampling variance of the treatment data.
To calculate the posterior predictive variance of a new treatment data point minus a new control data point, we need (up to proportionality) the posterior distribution of $\sigma^2$ and $\sigma'^2$, which we derive here.
Prior distribution
We set
$$p(\beta, \sigma^2, \sigma'^2) \propto \frac{1}{\sigma^2 \sigma'^2}.$$
Posterior distribution
Define
$$\bar{y}_T = \frac{1}{n'} \sum_{j=n+1}^{n+n'} y_j,$$
$$\bar{y}_C = \frac{1}{n} \sum_{j=1}^{n} y_j,$$
$$\mathrm{SSR}' = \sum_{j=n+1}^{n+n'} \left(y_j - \bar{y}_T\right)^2,$$
$$\mathrm{SSR} = \sum_{j=1}^{n} \left(y_j - \bar{y}_C\right)^2.$$
Before dealing with the posterior distribution of $\sigma^2$ and $\sigma'^2$, we re-arrange the quadratic expression in $\beta$ to ease the integration that will follow.
$$\begin{aligned}
&\frac{n(\beta - \bar{y}_C)^2}{\sigma^2} + \frac{n'(\beta - \bar{y}_T)^2}{\sigma'^2} \\
&\quad = \frac{n}{\sigma^2}\left(\beta^2 - 2\beta\bar{y}_C + \bar{y}_C^2\right) + \frac{n'}{\sigma'^2}\left(\beta^2 - 2\beta\bar{y}_T + \bar{y}_T^2\right) \\
&\quad = \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\beta^2 - 2\beta\left(\frac{n\bar{y}_C}{\sigma^2} + \frac{n'\bar{y}_T}{\sigma'^2}\right) + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} \\
&\quad = \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta^2 - 2\beta\left(\frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right)\right] + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} \\
&\quad = \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right]^2 + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)}.
\end{aligned}$$
For the full set of parameters, the posterior distribution is
$$\begin{aligned}
p(\beta, \sigma^2, \sigma'^2 \mid y) &\propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{\sum_{j=1}^{n}(y_j - \beta)^2}{2\sigma^2} - \frac{\sum_{j=n+1}^{n+n'}(y_j - \beta)^2}{2\sigma'^2}\right] \\
&= (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left(\frac{n(\beta - \bar{y}_C)^2}{\sigma^2} + \frac{n'(\beta - \bar{y}_T)^2}{\sigma'^2} + \frac{\mathrm{SSR}}{\sigma^2} + \frac{\mathrm{SSR}'}{\sigma'^2}\right)\right] \\
&= (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left\{\left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right]^2 \right.\right. \\
&\qquad\qquad \left.\left. + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{\mathrm{SSR}}{\sigma^2} + \frac{\mathrm{SSR}'}{\sigma'^2}\right\}\right].
\end{aligned}$$
Next we marginalize $\beta$ to eliminate it from the posterior distribution:
$$p(\sigma^2, \sigma'^2 \mid y) = \int_{-\infty}^{\infty} p(\beta, \sigma^2, \sigma'^2 \mid y)\, d\beta,$$
$$\begin{aligned}
p(\sigma^2, \sigma'^2 \mid y) &\propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left(\frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{\mathrm{SSR}}{\sigma^2} + \frac{\mathrm{SSR}'}{\sigma'^2}\right)\right] \\
&\qquad \times \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2}\left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right]^2\right\} d\beta \\
&\propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)^{-\frac{1}{2}} \\
&\qquad \times \exp\left[-\frac{1}{2}\left(\frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{\mathrm{SSR}}{\sigma^2} + \frac{\mathrm{SSR}'}{\sigma'^2}\right)\right].
\end{aligned}$$
This is the final expression. It is not a standard distribution, but it is easy to generate Markov chain
Monte Carlo samples from it.
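One possible sampler is a random-walk Metropolis algorithm on the log variances. The sketch below is ours and only illustrative (the paper does not specify a sampler); the data, proposal scale, and chain length are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example data: first group = control, second = treatment.
y_C = rng.normal(0.0, 1.0, size=8)
y_T = rng.normal(0.0, 1.5, size=6)

n, n_pr = len(y_C), len(y_T)
ybar_C, ybar_T = y_C.mean(), y_T.mean()
SSR = np.sum((y_C - ybar_C) ** 2)
SSR_pr = np.sum((y_T - ybar_T) ** 2)

def log_post(s2, sp2):
    """Log of p(sigma^2, sigma'^2 | y), up to an additive constant,
    following the final expression of section B.1."""
    prec = n / s2 + n_pr / sp2  # coefficient of the quadratic in beta
    quad = (n * ybar_C**2 / s2 + n_pr * ybar_T**2 / sp2
            - (n * sp2 * ybar_C + n_pr * s2 * ybar_T) ** 2
            / (s2 * sp2 * (n * sp2 + n_pr * s2)))
    return (-(n / 2 + 1) * np.log(s2) - (n_pr / 2 + 1) * np.log(sp2)
            - 0.5 * np.log(prec)
            - 0.5 * (quad + SSR / s2 + SSR_pr / sp2))

# Random-walk Metropolis on (log sigma^2, log sigma'^2); the "+ u.sum()"
# terms are the Jacobian of the log transformation.
u = np.log([y_C.var(), y_T.var()])
draws = []
for _ in range(20000):
    prop = u + 0.4 * rng.standard_normal(2)
    log_accept = (log_post(*np.exp(prop)) + prop.sum()
                  - log_post(*np.exp(u)) - u.sum())
    if np.log(rng.random()) < log_accept:
        u = prop
    draws.append(np.exp(u))
draws = np.asarray(draws[5000:])  # discard burn-in
print("posterior means of (sigma^2, sigma'^2):", draws.mean(axis=0))
```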