Multiple hypothesis testing
CM226: Machine Learning for Bioinformatics, Fall 2016
Sriram Sankararaman

Setting

In genomic studies, we test many hypotheses:
• Genome-wide association study: test every SNP in the genome (≈ 500,000 SNPs).
• Genome-wide expression study: test every gene in the genome (≈ 20,000 genes).

Evaluating tests of multiple hypotheses

Suppose we test m hypotheses and all null hypotheses are true, with α = 0.05 for each test. How many hypotheses are rejected? Let Zi indicate that test i rejects its null. Then

    Z = Σ_{i=1}^m Zi,    E[Z] = Σ_{i=1}^m E[Zi] = mα.

For large m, we therefore expect many false positives.

Tests of multiple hypotheses

We perform m tests, where test i accepts or rejects the null hypothesis H0i, and pi is the p-value obtained in test i.

                 Accept H0    Reject H0    Total
    H0 true         U            V          m0
    H1 true         T            S          m1
    Total           Q            R          m

V: false positive (type-I error). T: false negative (type-II error). S: true positive. U: true negative.

Evaluating tests of multiple hypotheses

• FWER: family-wise error rate
• FDR: false discovery rate
• Many others: per-comparison error rate (PCER), k-FWER, false non-discovery rate

Which evaluation criterion to choose? It depends on the trade-off between the different types of errors.

Outline

• Family-wise error rate
• False discovery rate

FWER

FWER is the probability that any true null hypothesis is rejected:

    FWER = P(V ≥ 1).

FWER control procedure

Given 0 ≤ α ≤ 1 and a set of p-values, output a list of rejected null hypotheses subject to the constraint FWER ≤ α.

Bonferroni procedure

Reject the null hypotheses for which pi ≤ α/m.
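The Bonferroni rule can be sketched in a few lines of Python (the function name is mine, chosen for illustration):

```python
import numpy as np

def bonferroni_reject(pvals, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m, controlling FWER at level alpha."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    return pvals <= alpha / m

# With m = 3 tests at alpha = 0.05, the per-test threshold is 0.05 / 3 ~ 0.0167,
# so only the first p-value below is rejected.
mask = bonferroni_reject([0.001, 0.02, 0.04], alpha=0.05)
```

Note that the threshold shrinks linearly in the number of tests, which is the source of the conservativeness discussed next.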
To see that this controls the FWER, let I0 denote the set of true null hypotheses (|I0| = m0) and let Pr0 denote probability under the null. By the union bound,

    FWER = Pr( ∪_{i ∈ I0} {reject hypothesis i at level α/m} )
         ≤ Σ_{i ∈ I0} Pr0( reject hypothesis i at level α/m )
         = m0 · α/m
         ≤ α.

Bonferroni procedure

Pros:
• Bonferroni's procedure provides FWER control no matter what the dependence among the tests (equivalently, the p-values) and no matter how many of the hypotheses are truly null.
• Easy to implement.
• Computationally efficient and data-independent.

Cons:
• Very conservative. Tests are often correlated, so they are not independent.
• In an extreme case, assume perfect correlation among all tests. Then rejecting the hypotheses with pi ≤ α already controls FWER at α.

Workarounds:
• Make assumptions about the number of null hypotheses or the dependence of the p-values.
• Estimate meff, the effective number of tests.

FDR

We can tolerate some false positives, especially if it is easy to do follow-up experiments. The false discovery rate (FDR) is the expected proportion of false positives among the rejected tests. Define the false discovery proportion

    FDP = V/R if R > 0, and 0 otherwise; equivalently, FDP = V / (R ∨ 1).

Then

    FDR = E[FDP] = E[V/R | R > 0] · P(R > 0).

FDR and FWER

• 1{V ≥ 1} ≥ FDP, so FWER ≥ FDR.
• When all null hypotheses are true, FWER = FDR.

Any procedure that controls FWER also controls FDR. A procedure that controls FDR but not FWER can be more powerful.

FDR control procedure

Let FDR(t) denote the FDR when we reject all null hypotheses with pi ≤ t.
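When the ground truth is known, as in a simulation, FDR(t) can be estimated directly by averaging the false discovery proportion over replicates. A minimal sketch, under an assumed setup of my own (800 true nulls with μ = 0, 200 alternatives with μ = 3, one-sided z-tests):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulation: 800 true nulls (mu = 0) and 200 alternatives (mu = 3).
m, m0 = 1000, 800
mu = np.concatenate([np.zeros(m0), np.full(m - m0, 3.0)])
is_null = mu == 0

def fdp_at_threshold(pvals, is_null, t):
    """False discovery proportion V(t) / (R(t) v 1) when rejecting p_i <= t."""
    reject = pvals <= t
    V = int(np.sum(reject & is_null))
    R = int(np.sum(reject))
    return V / max(R, 1)

# Average the FDP over replicates to estimate FDR(t) = E[FDP(t)] at t = 0.05.
fdps = []
for _ in range(200):
    x = mu + rng.standard_normal(m)
    # one-sided z-test p-value: P(Z >= x) = erfc(x / sqrt(2)) / 2
    pvals = np.array([0.5 * math.erfc(v / math.sqrt(2)) for v in x])
    fdps.append(fdp_at_threshold(pvals, is_null, t=0.05))

est_fdr = float(np.mean(fdps))
```

Here rejecting at t = 0.05 yields roughly m0·t = 40 expected false positives against a much larger number of true positives, so the estimated FDR(0.05) comes out well below 0.05·m0/R yet far above the per-test level a naive reading might suggest.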
Formally,

    V(t) = |{i : i is a true null hypothesis, pi ≤ t}|
    R(t) = |{i : pi ≤ t}|
    FDR(t) = E[ V(t) / (R(t) ∨ 1) ] = E[FDP(t)].

FDR control procedure:
• Find tα = sup{ t : FDR(t) ≤ α }.
• Reject all hypotheses i with pi ≤ tα.

We choose the largest such t to increase our sensitivity while bounding the expected proportion of false discoveries.

Controlling FDR: the B-H procedure

Simulation setup (interactive demo: http://simulations.lpma-paris.fr/fdr_tutorial/):
m = 100, α = 0.2, π0 = 0.8; Xi ∼ μi + εi with εi ∼ N(0, 1); Hi = 0 ⇔ μi = 0 and Hi = 1 ⇒ μi = 3.

Benjamini–Hochberg procedure (B-H) [Benjamini and Hochberg, JRSSB 1995]

Let p(1), …, p(m) be the ordered p-values.
1. Let k̂ = max{ 1 ≤ k ≤ m : p(k) ≤ (k/m) α }.
2. If k̂ exists, reject the hypotheses corresponding to p(1), …, p(k̂). Otherwise reject none.

Equivalently, we can set

    tBH = max{ p(k) : p(k) ≤ (k/m) α }

and reject each null hypothesis i with pi ≤ tBH.

B-H procedure

B-H provides an estimator of FDR(t) that is conservatively biased:

    FDR(t) ≈ E[V(t)] / E[R(t) ∨ 1] = m0 t / E[R(t) ∨ 1].

Both m0 and the denominator are unknown in this expression. Since we want to upper-bound the FDR, it is acceptable to replace them with estimators that are larger:

    FDR_BH(t) = m t / (R(t) ∨ 1).

The B-H procedure controls FDR

Let r hypotheses be rejected at FDR level α using the B-H procedure, so that tBH ≈ rα/m. Then

    FDR(tBH) ≈ E[V(tBH)] / E[R(tBH) ∨ 1] = m0 (rα/m) / r = (m0/m) α ≤ α.

Notice that B-H is conservative in its control of the FDR: it achieves level (m0/m) α rather than α.

Example: testing for differentially expressed genes

• Hedenfalk et al. (2001) find genes that are differentially expressed between BRCA1-mutation-positive tumors and BRCA2-mutation-positive tumors.
• 3,170 genes used for analysis.
• Compute a two-sample t-statistic for each gene i, followed by a p-value pi. [Storey and Tibshirani, PNAS 2003]
• Hedenfalk et al. (2001) use a p-value cutoff of 0.001 to find 51 genes out of 3,226 that are differentially expressed; at this cutoff we expect about 3 false positives.
• B-H estimates that 94 genes are differentially expressed at an FDR of 0.05.

Summary

• Multiple testing is a serious concern in genomic studies.
• There is a price to pay for the number of tests. On the other hand, the large number of tests can be an asset, allowing us to infer population parameters that we cannot learn from a single test.
• Two main quantities we would like to control: FWER and FDR.
• Which quantity we choose to control depends on the application.
• We saw procedures to control both FWER and FDR.
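The contrast between the two families of procedures can be seen in a simulation in the spirit of the B-H demo above (m = 100, π0 = 0.8, alternatives with μ = 3). This is a sketch under my own assumptions, not the slides' code; at the same level α, the B-H step-up rule always rejects at least as many hypotheses as Bonferroni:

```python
import math
import numpy as np

rng = np.random.default_rng(42)

# Simulated data: m = 100 tests, 80 true nulls (mu = 0), 20 alternatives (mu = 3).
m, m0, alpha = 100, 80, 0.2
mu = np.concatenate([np.zeros(m0), np.full(m - m0, 3.0)])
x = mu + rng.standard_normal(m)
pvals = np.array([0.5 * math.erfc(v / math.sqrt(2)) for v in x])  # one-sided z-test

# Bonferroni (FWER control): reject p_i <= alpha / m.
bonf_reject = pvals <= alpha / m

# Benjamini-Hochberg (FDR control): find the largest k with p_(k) <= (k/m) * alpha
# and reject the k smallest p-values.
order = np.argsort(pvals)
below = pvals[order] <= np.arange(1, m + 1) / m * alpha
bh_reject = np.zeros(m, dtype=bool)
if below.any():
    k_hat = int(below.nonzero()[0].max())  # 0-based index of k-hat
    bh_reject[order[: k_hat + 1]] = True

print(int(bonf_reject.sum()), int(bh_reject.sum()))
```

The guarantee B-H ≥ Bonferroni holds deterministically: any p-value below α/m also satisfies p(k) ≤ (k/m)α for its rank k, so every Bonferroni rejection is also a B-H rejection.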