
Significance Tests: P-values and Q-values

Outline
- Statistical significance in multiple testing
- Empirical distribution of test statistics
- Family-wide p-values
- Correlation and p-values
- False discovery rates

Tests and Test Statistics
- The t-test is fairly robust to skew, but not robust to outliers ("thick tails" of the distribution)
- Non-parametric tests are robust, but lose too much ability to detect differences (power)
- Robust tests can be useful
- Permutation tests are simple and easy to program
- Some authors use $s_i = (\bar{x}_{i,\mathrm{group1}} - \bar{x}_{i,\mathrm{group2}}) / (SD_i + q_{SD})$ rather than $t_i = (\bar{x}_{i,\mathrm{group1}} - \bar{x}_{i,\mathrm{group2}}) / SD_i$, to reduce the number of low fold-changes among highly significant scores

Distribution of test statistics
[Figure: quantile plots of t-statistics. Left: random distribution; right: experiment]

Distribution of Set of p-values

Multiple comparisons
- Suppose 10,000 genes on a chip, none actually differentially expressed
- Each gene has a 5% chance of exceeding the threshold score for a p-value of .05 (this is the definition of Type I error)
- On average, 500 genes should exceed the .05 threshold 'by chance'

Family-Wide Error Rate
- 'Corrected' p-value: probability of finding at least one false positive among all N tests
- Normally all tests use the same threshold
- Simplest correction (Bonferroni): $p_i^* = \min(N p_i, 1)$
- Fairly close to the true false positive rate in simulations of independent tests
- Too conservative in practice!
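As a rough illustration of the multiple-comparisons arithmetic and the Bonferroni correction above, the following Python sketch simulates 10,000 genes with no real differences. The function name `bonferroni`, the random seed, and the use of uniform random numbers to stand in for null p-values are assumptions made for this example; they are not part of the original slides.

```python
import numpy as np

def bonferroni(pvals):
    """Bonferroni-corrected p-values: p* = min(N * p, 1)."""
    pvals = np.asarray(pvals, dtype=float)
    return np.minimum(pvals.size * pvals, 1.0)

n_genes = 10_000   # genes on the chip, none differentially expressed
alpha = 0.05       # per-gene significance threshold

# Each null gene has probability alpha of falling below the threshold.
expected_false_positives = n_genes * alpha

# Assumption: under the null, p-values are uniform on [0, 1].
rng = np.random.default_rng(0)
null_pvals = rng.uniform(size=n_genes)

n_raw = int(np.sum(null_pvals < alpha))                    # uncorrected calls
n_corrected = int(np.sum(bonferroni(null_pvals) < alpha))  # after correction

print("expected false positives:", expected_false_positives)
print("uncorrected calls:", n_raw)
print("Bonferroni calls:", n_corrected)
```

With independent null p-values, the uncorrected count should land near 500, while the Bonferroni-corrected count should almost always be zero. This is the sense in which the correction controls the family-wide error rate at the price of power.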
P-Values from Correlated Genes

Null distribution from:

  Independent genes     Perfectly correlated genes     Highly correlated genes
  .5    .3    .9        .5    .3    .9                 .5    .3    .9
  .7    .03   .1        .5    .3    .9                 .45   .2    .95
  .4    .9    .05       .5    .3    .9                 .65   .25   .8
  .6    .8    .4        .5    .3    .9                 .4    .35   .75
  .2    .2    .9        .5    .3    .9                 .5    .4    .85

Rows: genes; columns: samples; entries: p-values from randomized distribution

The Effect of Correlation
- If all genes are uncorrelated, Sidak is exact
- If all genes were perfectly correlated, the p-values for one gene would be the p-values for all
  - No multiple-comparisons correction needed
- Typical gene data is highly correlated
  - The first eigenvalue of the SVD may account for more than half the variance
- More sensitive tests are possible if we can generate the joint null distribution of p-values

Re-formulating the Question
- Independent: ~5% of genes exceed the .05 threshold, all the time
- Perfectly correlated: all genes exceed the .05 threshold ~5% of the time
- Realistically correlated: a fraction .05 < f1 < 1 of genes exceeds the .05 threshold in a fraction .05 < f2 < 1 of the cases
- New question: for a given f1 and α, how likely is it that a fraction f1 of genes will exceed the α threshold?

Step-Down p-Values
- Calculate single-step p-values for genes: p1, …, pN
- Order the smallest k p-values: p(1), …, p(k)
- For each k, ask:
  - How likely are we to get k p-values less than p(k) if no differences are real?
  - Generate the null distribution by permutations
- More significant genes, at the same level of Type I error, compared with single-step procedures
- See Ge, et al, Test, 2003
- Bioconductor package multtest

False Discovery Rate
- At threshold t*, what fraction of the genes called significant are likely to be true positives?
- Illustration: 10,000 independent genes

  t       p       #sig    E(FP)   FDR*
  1.96    .05     600     500     87%
  2.57    .01     200     100     50%
  3.29    .001    40      10      20%

- In practice, use a permutation algorithm to compute the FDR

pFDR
- How to estimate the FDR?
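- FDR: E(#false positives / #positives | #positives > 0) × P(#positives > 0); the 'positive' False Discovery Rate (pFDR) is the conditional expectation alone, without the factor P(#positives > 0)
- Simes' inequality allows the FDR to be controlled using the p-values alone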
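The FDR illustration and the p-value-based estimate mentioned above can be sketched in a few lines of Python. The first function is just the ratio E(FP)/#sig for independent genes. The second is the Benjamini-Hochberg step-up procedure, a standard Simes-based way to control the FDR from p-values. It is swapped in here as an assumption, since the slides do not name a specific procedure. Both function names are hypothetical. Note that the plain ratio gives about 83%, 50% and 25% for the three rows of the illustration, so the FDR* column in the slides may reflect a different estimate or rounding.

```python
import numpy as np

def naive_fdr(n_tests, p_threshold, n_significant):
    """Expected false positives divided by observed positives (capped at 1)."""
    expected_fp = n_tests * p_threshold
    return min(expected_fp / n_significant, 1.0)

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of rejections under the Benjamini-Hochberg procedure.

    Rejects the k smallest p-values, where k is the largest rank
    with p_(k) <= q * k / N.
    """
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)                      # ascending p-value ranks
    ranked = pvals[order]
    below = ranked <= q * np.arange(1, n + 1) / n  # Simes-type thresholds
    rejected = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # last rank under its threshold
        rejected[order[:k + 1]] = True
    return rejected

# Rows of the illustration: (p-value threshold, number of significant genes).
for p, n_sig in [(0.05, 600), (0.01, 200), (0.001, 40)]:
    print(p, n_sig, round(naive_fdr(10_000, p, n_sig), 2))
```

For correlated genes, the expected number of false positives is better estimated from permutations, as the slides suggest, than from the product N × p used in `naive_fdr`.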
