q-value
Tiffany Chao, Beth Johnson, Steven Lee

Hypothesis testing
● Test for each gene
  ○ null hypothesis: no differential expression
● Two kinds of errors
  ○ type I error (false positive): say that a gene is differentially expressed when it actually isn't; wrongly reject a true null hypothesis
  ○ type II error (false negative): say that a gene isn't differentially expressed when it actually is; fail to reject a false null hypothesis

Thinking about p-values
● Probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming the null hypothesis is true
● Minimum false positive rate at which an observed statistic can be called significant
● If the null hypothesis is simple, then a null p-value is uniformly distributed

Multiple comparison problem
● Even if we have useful approximations for our p-values, we still face the multiple comparison problem
● When performing many independent tests, p-values no longer have the same interpretation

Not only in genomics!
● "Statistical Comparisons of Classifiers over Multiple Datasets", Demsar, JMLR 2006
● "Permutation Tests for Studying Classifier Performance", Ojala, JMLR 2010
● "On Comparing Classifiers: Pitfalls to avoid and a recommended approach", Salzberg, 1997, Data Mining and Knowledge Discovery

Multiple hypothesis testing

                    Called significant   Called not significant   Total
Null true           F                    m0 - F                   m0
Alternative true    T                    m1 - T                   m1
Total               S                    m - S                    m

● Suppose we care about p-values ≤ 0.05?

Error rates (more on this later)
● Per comparison error rate (PCER)
  ○ E[F] / m
● Per family error rate (PFER)
  ○ E[F]
● Family-wise error rate (FWER)
  ○ Pr(F ≥ 1)
● False discovery rate (FDR)*
  ○ E[F/S] (and set F/S = 0 when S = 0) = E[F/S | S > 0] Pr(S > 0)
● Positive false discovery rate (pFDR)*
  ○ E[F/S | S > 0]

MHT error controlling procedure
● Suppose you test m hypotheses and get m p-values: p_1, p_2, p_3, ..., p_m
● A multiple hypothesis test error controlling procedure is a function T(p; α) such that rejecting all nulls with p_i ≤ T(p; α) implies that Error ≤ α
● Error is a population quantity (not random)

Weak and strong control
● Weak: T(p; α) is such that Error ≤ α only when m0 = m
● Strong: T(p; α) is such that Error ≤ α for any value of m0
  ○ note that m0 is not an argument for T(p; α)!
● Bonferroni correction provides strong control, but is too restrictive

Why FDR and q-value?
● To help us interpret these values, two pieces of information would be useful
  ○ Estimate of the overall proportion of features that are truly alternative (even if they cannot be precisely identified)
  ○ Measure of significance that can be associated with each feature so that thresholding the numbers at a particular value has an easy interpretation

FDR
● Would like an error measure that provides a balance between
  ○ Number of false positive features (F)
  ○ Number of true positive features (T)

FDR
● The false discovery rate is the expected value of the proportion of false positive features among all those called significant
* There is some possibility that S = 0, so some adjustment has to be made to the definition of FDR

Estimating FDR
● The FDR therefore depends on the threshold t that we use to determine significance

Estimating FDR
● Because we are considering many features (m is very large), we can approximate FDR(t) ≈ E[F(t)] / E[S(t)]

Estimating FDR
● We now need to approximate E[S(t)] and E[F(t)]
● To illustrate how FDR is determined: for m genes we have m p-values, denoted p_1, p_2, ..., p_m
● Define F(t) as the number of null p-values ≤ t and S(t) as the number of all p-values ≤ t
● S(t) can be counted directly for a given t

Estimating FDR
● Approximating F(t) is more difficult because we do not know how many of the values called significant were truly null
● Assuming null p-values are uniformly distributed, Pr(null p ≤ t) = t, so E[F(t)] = m0 · t (# of null features × probability that a null feature is called significant)

Estimating FDR
● We do not know the true value of m0 (# of null features), so we must estimate it
● Equivalently, we can estimate the proportion of features that are truly null (denoted by π0 = m0 / m)
● Assuming a uniform distribution for null p-values, we can estimate this quantity using a histogram

Estimating π0
● Find where the p-values look like a uniform distribution and set λ there
[figure: histogram of p-values]

Estimating π0
● Estimate π0 at a given λ as #{p_i > λ} / (m (1 - λ))
● Note that π0 does not depend on t

Estimating π0
● Can also fit a cubic function to the π0 vs λ data to determine π0(1) (because "most" of the p-values near 1 would be expected to be null)

FDR
● The estimate for the false discovery rate is FDR(t) ≈ π0 · m · t / #{p_i ≤ t}

Graphical Interpretation
[figure]

q-value definition
● for a given feature, the q-value is the expected FDR incurred if it is called significant
  ○ (every other feature with p_j ≤ p_i is also called significant)
● in practical terms: a q-value threshold is the "proportion of significant features that turn out to be false leads"

Graphical Interpretation
[figures]

q-value
● a measure of each feature's significance
● the p-value is in terms of the false positive rate, while the q-value is in terms of the FDR
  ○ this takes into account that thousands of features are simultaneously being tested (via FDR)
    ■ uses a better model of where the significant features are likely to be

p vs q
● Example:
  ○ m = 10000
● p-values:
  ○ a cutoff at .01 means you should expect up to about 100 false positives
  ○ a cutoff of .0001 means you should expect about 1 false positive, but at what cost?
● q-values:
  ○ set the q-value cutoff at .05, and expect about 5% of the genes called significant to be false positives

Algorithm for Determining q-Values
● Compute test statistic (p-value) for m genes
● Estimate π0
  ○ Using histogram
    ■ Find the region where p-values are uniform and set λ
    ■ Count the p-values > λ and divide by (1 - λ)m (m = number of p-values)
  ○ Using cubic spline
● For each p-value, calculate the FDR for each threshold t ≥ p
  ○ only choose t values for each unique p in the gene set
  ○ choose the minimum FDR as the q-value

q-value (cutoff)
[figure]

q-value accuracy
● assumes that the dependence between features will generally be weak
  ○ genes are actually dependent in pathways, which can be modeled as blocks
● if so, when m is large, calling all features with q ≤ alpha significant implies that FDR ≤ alpha
● the estimated q-value of each feature is greater than or equal to its true q-value
  ○ conservative is desirable

q-value summary
● A standard measure of significance that can be universally interpreted between studies
● better than using just p-values
  ○ arbitrary selection of alpha, where it is selected so that the expected number of false positives is < 1, throws away too many likely truly significant features

Questions?

FDR plug-in
● Create K permutations of the data, producing statistics t_j^k for features j = 1,...,M and permutations k = 1,...,K.
● For a range of cutoffs C, let [equation missing]
● Estimate the FDR by [equation missing]
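
The q-value algorithm outlined earlier (estimate π0, compute the estimated FDR at each observed p-value used as a threshold, then take the minimum over all thresholds t ≥ p) can be sketched in a few lines of Python with NumPy. This is only a minimal illustration under stated assumptions, not the presenters' implementation: the function names `estimate_pi0` and `qvalues` are hypothetical, π0 is estimated at a single fixed λ = 0.5 rather than with the cubic-spline fit, and the simulated p-values (9000 uniform nulls plus 1000 alternatives drawn from a Beta distribution) are invented purely for demonstration.

```python
import numpy as np


def estimate_pi0(pvals, lam=0.5):
    """Histogram-style estimate of the null proportion at a fixed lambda.

    Counts the p-values above lam and divides by m * (1 - lam).
    Assumption: a single fixed lam is used instead of a spline over many lam values.
    """
    pvals = np.asarray(pvals, dtype=float)
    pi0 = np.mean(pvals > lam) / (1.0 - lam)
    return min(pi0, 1.0)  # a proportion cannot exceed 1


def qvalues(pvals, lam=0.5):
    """Return (q-values in the original order, estimated pi0)."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    pi0 = estimate_pi0(pvals, lam)

    order = np.argsort(pvals)
    sorted_p = pvals[order]

    # S(t) for t equal to each observed p-value: number of p-values <= t
    # (side="right" handles tied p-values correctly).
    s = np.searchsorted(sorted_p, sorted_p, side="right")

    # Estimated FDR at each threshold: pi0 * m * t / S(t).
    fdr = pi0 * m * sorted_p / s

    # q-value = minimum estimated FDR over all thresholds t >= p,
    # computed as a running minimum from the largest p-value downward.
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q_sorted = np.minimum(q_sorted, 1.0)

    q = np.empty(m)
    q[order] = q_sorted  # restore the original feature order
    return q, pi0


# Illustrative simulation (made-up data): m = 10000 features.
rng = np.random.default_rng(0)
null_p = rng.uniform(size=9000)          # null p-values are uniform
alt_p = rng.beta(0.5, 20.0, size=1000)   # alternative p-values pile up near 0
pvals = np.concatenate([null_p, alt_p])

q, pi0 = qvalues(pvals)
print(f"estimated pi0: {pi0:.3f}")
print("called significant at p <= 0.01:", int(np.sum(pvals <= 0.01)))
print("called significant at q <= 0.05:", int(np.sum(q <= 0.05)))
```

Sorting once and taking a running minimum from the right avoids recomputing the FDR for every pair of feature and threshold, and it guarantees that q-values are non-decreasing in the p-values.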