Multiple hypothesis testing
CM226: Machine Learning for Bioinformatics.
Fall 2016
Sriram Sankararaman
Setting
In genomic studies, we test many hypotheses:
• Genome-wide association study: test every SNP in the genome (500,000 SNPs).
• Genome-wide expression study: test every gene in the genome (20,000 genes).
Evaluating tests of multiple hypotheses
Testing m hypotheses.
Assume that all null hypotheses are true.
We set α = 0.05 for each test.
How many hypotheses are rejected? Let Zi be the indicator that test i rejects its null hypothesis, and let Z = Σ_{i=1}^m Zi be the total number of rejections. Then

E[Z] = Σ_{i=1}^m E[Zi] = mα,

since each true null is rejected with probability α.
For large m, many false positives.
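
To make the mα arithmetic concrete, here is a minimal simulation sketch (not from the slides; the NumPy setup and random seed are my own choices): under the global null, p-values are Uniform(0, 1), so thresholding at α rejects roughly mα tests by chance.

```python
# Minimal sketch: testing m true nulls at level alpha yields about m * alpha
# false positives per replicate.
import numpy as np

rng = np.random.default_rng(0)
m, alpha, n_reps = 500_000, 0.05, 20

counts = []
for _ in range(n_reps):
    p = rng.uniform(size=m)                  # null p-values are Uniform(0, 1)
    counts.append(int(np.sum(p <= alpha)))   # rejections that occur by chance

print(np.mean(counts))  # close to m * alpha = 25,000
```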
Tests of multiple hypotheses
Setting
We perform m tests, where test i accepts or rejects null hypothesis H0i, and pi is the p-value obtained in test i.
                    Decision
Truth        H0          H1          Total
H0           U           V           m0
H1           T           S           m1
Total        Q           R           m
V: False positive (type-I error)
T: False negative (type-II error)
S: True positive
U: True negative
R: total number of rejections; m0, m1: number of true null and non-null hypotheses
Evaluating tests of multiple hypotheses
• FWER: Family-wise Error Rate
• FDR: False Discovery Rate
• Many others: Per-comparison error rate (PCER), k-FWER, False
Non-discovery rate
Which evaluation criterion should we choose?
It depends on the tradeoff between the different types of errors.
Outline
Family-wise Error Rate
False Discovery Rate
FWER
FWER: Probability that any true null hypothesis is rejected
FWER = P(V ≥ 1)
FWER control procedure
Given 0 ≤ α ≤ 1 and a set of p-values, output a list of rejected null
hypotheses subject to the constraint
FWER ≤ α.
Bonferroni procedure
Bonferroni procedure: Reject null hypotheses for which pi ≤ α/m.

FWER = Pr{ ∪_{i ∈ {true null hypotheses}} Reject hypothesis i at level α/m }
     ≤ Σ_{i ∈ {true null hypotheses}} Pr0{ Reject hypothesis i at level α/m }   (union bound)
     = m0 · (α/m)
     ≤ α
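
A short sketch of this rule in code (not from the slides; the function name and the small example are illustrative):

```python
# Bonferroni: reject H0_i whenever p_i <= alpha / m.
import numpy as np

def bonferroni_reject(pvalues, alpha=0.05):
    """Boolean mask of rejected null hypotheses, controlling FWER at alpha."""
    p = np.asarray(pvalues, dtype=float)
    return p <= alpha / p.size

# With m = 4 tests, only p-values at or below 0.05 / 4 = 0.0125 are rejected.
print(bonferroni_reject([0.001, 0.02, 0.03, 0.5]))  # [ True False False False]
```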
Bonferroni procedure
Pros:
• Bonferroni's procedure provides FWER control regardless of the dependence among the tests (or equivalently, the p-values) and regardless of how many of the hypotheses are truly null.
• Easy to implement.
• Computationally efficient. Data-independent.
Cons:
• Very conservative: in practice tests are often correlated rather than independent, so the union bound is loose.
• In an extreme case, assume perfect correlation among all tests. Then
rejecting hypotheses for which pi ≤ α controls FWER at α.
Bonferroni procedure
Workarounds
• Make assumptions about the number of null hypotheses or the
dependence of p-values.
• Estimate m_eff, the effective number of tests (one common heuristic is sketched below).
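
As one illustration of the m_eff idea (this particular eigenvalue-based heuristic, in the spirit of Nyholt 2004, is my own addition and is not prescribed by the slides):

```python
# Hypothetical sketch of an effective-number-of-tests heuristic: strongly
# correlated tests concentrate the eigenvalue spectrum, lowering m_eff.
import numpy as np

def effective_number_of_tests(genotypes):
    """genotypes: (n_samples, m_snps) array; returns an estimate of m_eff."""
    corr = np.corrcoef(genotypes, rowvar=False)   # m x m correlation of the tests
    eigvals = np.linalg.eigvalsh(corr)            # eigenvalues of that correlation matrix
    m = corr.shape[0]
    return 1 + (m - 1) * (1 - np.var(eigvals, ddof=1) / m)

# A Bonferroni-style threshold would then use alpha / m_eff instead of alpha / m.
```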
FDR
We can tolerate some false positives, especially if it is easy to do follow-up experiments.
The False Discovery Rate (FDR) is the expected proportion of false positive results among the rejected tests:
FDP = V/R if R > 0, and 0 otherwise (equivalently, FDP = V/(R ∨ 1))

FDR = E[FDP] = E[ V/R | R > 0 ] · P(R > 0)
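
A small sketch of these definitions (my own illustration, with made-up labels): given ground-truth null indicators and a set of rejections, the false discovery proportion is V/(R ∨ 1); the FDR is its expectation over repeated experiments.

```python
# FDP = V / (R ∨ 1): the fraction of rejections whose null hypothesis is true.
import numpy as np

def false_discovery_proportion(is_null, rejected):
    """is_null, rejected: boolean arrays of length m."""
    R = int(np.sum(rejected))
    V = int(np.sum(rejected & is_null))   # false positives among the rejections
    return V / max(R, 1)

# Example: 3 rejections, 1 of which is a true null -> FDP = 1/3.
print(false_discovery_proportion(
    np.array([True, True, False, False, True]),
    np.array([True, False, True, True, False])))
```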
FDR and FWER
• 1{V ≥ 1} ≥ FDP, so taking expectations gives FWER ≥ FDR.
• When all null hypotheses are true, every rejection is a false discovery, so FDP = 1{V ≥ 1} and FWER = FDR.
Any procedure that controls FWER also controls FDR.
A procedure that controls FDR but not FWER can be more powerful.
FDR control procedure
Let FDR(t) denote the FDR when we reject all null hypotheses with
pi ≤ t.
V(t) = |{i : i is a true null hypothesis, pi ≤ t}|
R(t) = |{i : pi ≤ t}|

FDR(t) = E[ V(t) / (R(t) ∨ 1) ] = E[FDP(t)]
FDR control procedure
FDR control procedure.
• Find tα = sup{t : FDR(t) ≤ α}.
• Reject all hypotheses i with pi ≤ tα .
We choose the largest t to increase our sensitivity while bounding the
expected proportion of false discoveries.
Controlling FDR
B-H procedure
Simulation setup (source: http://simulations.lpma-paris.fr/fdr_tutorial/):
m = 100, α = 0.2, π0 = 0.8
Xi = µi + εi, εi ∼ N(0, 1)
Hi = 0 ⇔ µi = 0, Hi = 1 ⇒ µi = 3
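
A sketch of this setup in code (the random seed and the use of one-sided z-tests are my assumptions; the slide only specifies the model). The resulting p-values are what the B-H procedure on the next slide is applied to.

```python
# Simulated data as described above: 100 tests, ~80% true nulls (mu = 0),
# signals with mu = 3, observations X_i = mu_i + eps_i, eps_i ~ N(0, 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
m, alpha, pi0 = 100, 0.2, 0.8

is_null = rng.uniform(size=m) < pi0      # H_i = 0 with probability pi0
mu = np.where(is_null, 0.0, 3.0)
x = mu + rng.standard_normal(m)          # X_i = mu_i + eps_i
pvalues = norm.sf(x)                     # one-sided p-value for H0: mu_i = 0
```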
Controlling FDR
Benjamini-Hochberg procedure (B-H) [1]
1: Let p(1), . . . , p(m) be the ordered p-values.
2: Let k̂ = max{1 ≤ k ≤ m : p(k) ≤ (k/m)·α}.
3: If k̂ exists, reject the hypotheses corresponding to p(1), . . . , p(k̂). Otherwise reject none.

Equivalently, we can write

tBH = max{p(k) : p(k) ≤ (k/m)·α}

and reject each null hypothesis i with pi ≤ tBH.

[1] Benjamini and Hochberg, JRSSB 1995
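
A sketch of the B-H procedure as stated above (the function name and the small example are illustrative, not from the slides):

```python
# Benjamini-Hochberg: find the largest k with p_(k) <= (k/m) * alpha and reject
# every hypothesis whose p-value is at most that order statistic (t_BH).
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Boolean rejection mask targeting FDR control at level alpha."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    sorted_p = np.sort(p)
    thresholds = np.arange(1, m + 1) / m * alpha      # (k/m) * alpha for k = 1..m
    below = sorted_p <= thresholds
    if not below.any():                               # k-hat does not exist
        return np.zeros(m, dtype=bool)
    k_hat = int(np.max(np.nonzero(below)[0])) + 1     # largest such k
    t_bh = sorted_p[k_hat - 1]                        # t_BH = p_(k-hat)
    return p <= t_bh

# m = 5, alpha = 0.05: thresholds are 0.01, 0.02, 0.03, 0.04, 0.05, so k-hat = 2,
# t_BH = 0.008, and the two smallest p-values are rejected.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))  # [ True  True False False False]
```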
B-H procedure
B-H provides an estimator of FDR(t) that is conservatively biased.
FDR(t) ≈ E[V(t)] / E[R(t) ∨ 1] = m0·t / E[R(t) ∨ 1]

m0 and the denominator are unknown in the above expression. Since we want to upper bound the FDR, it is okay to replace them with estimators that are bigger, giving the B-H estimate

F̂DR_BH(t) = m·t / (R(t) ∨ 1)
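
The plug-in estimate above, written out as a short sketch (names are illustrative); scanning t over the observed p-values and keeping the largest t whose estimate is ≤ α recovers the B-H rule.

```python
# F-hat-DR_BH(t) = m * t / (R(t) ∨ 1), a conservatively biased estimate of FDR(t).
import numpy as np

def fdr_hat_bh(pvalues, t):
    p = np.asarray(pvalues, dtype=float)
    R_t = int(np.sum(p <= t))           # number of rejections at threshold t
    return p.size * t / max(R_t, 1)

print(fdr_hat_bh([0.001, 0.008, 0.039, 0.041, 0.6], t=0.008))  # 5 * 0.008 / 2 = 0.02
```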
The B-H procedure controls FDR
Let r hypotheses be rejected at FDR level α using the B-H procedure, so that tBH = (r/m)·α and R(tBH) = r. Then

FDR(tBH) ≈ E[V(tBH)] / E[R(tBH) ∨ 1] = (m0 · (r/m) · α) / r = (m0/m) · α ≤ α

Notice that the B-H procedure is conservative in its control of the FDR: it actually controls the FDR at level (m0/m)·α ≤ α.
Example: testing for differentially expressed genes
• Hedenfalk et al. (2001) find genes that are differentially expressed between BRCA1-mutation positive tumors and BRCA2-mutation positive tumors.
• 3170 genes used for analysis.
• Compute a two-sample t-statistic for each gene i, followed by a p-value pi.
(Storey and Tibshirani, PNAS 2003)
Example: testing for differentially expressed genes
• Hedenfalk et al. (2001) use a p-value cutoff of 0.001 to find 51 genes out of 3226 that are differentially expressed. At this cutoff we expect about 3 false positives (3226 × 0.001 ≈ 3).
• B-H estimates that 94 genes are differentially expressed at an FDR of 0.05.
(Storey and Tibshirani, PNAS 2003)
Summary
• Multiple testing is a serious concern in genomic studies.
• There is a price to pay for the number of tests. On the other hand, the large number of tests can be an asset, allowing us to infer population parameters that we cannot learn from a single test.
• Two main quantities we would like to control: FWER and FDR.
• Which quantity we choose to control depends on the application.
• We saw procedures that control FWER (Bonferroni) and FDR (Benjamini-Hochberg).