Multiple Testing
Matthew Kowgier

Multiple Testing
• In statistics, the multiple comparisons (multiple testing) problem occurs when one considers a set of statistical inferences simultaneously.
  – Errors in inference
  – Hypothesis tests that incorrectly reject the null hypothesis

What a P-value Isn’t
• The P-value is NOT the probability of H0 given the data
• The P-value takes no account of the power of the study
  – Power is the probability of rejecting H0 when it is actually false (1 minus the probability of accepting H0 when it is false)

What a P-value IS
• “Informal measure of the compatibility of the data with the null hypothesis” – Jewell 2004
• If we repeated our experiment over and over again, each time taking a random sample of observable units (people), what proportion of the time could we expect to observe a result (test statistic) at least as extreme, by chance alone (that is, assuming the null hypothesis is true)?

Type I Error
• “False positive”: the error of rejecting a null hypothesis when it is actually true.
• The error of accepting an alternative hypothesis (the real hypothesis of interest) when the results can be attributed to chance.
• Occurs when we observe a difference when in truth there is none.
  – e.g., a court finding a person guilty of a crime that they did not actually commit.
• Try to set the Type I error rate (alpha) to 0.05 or 0.01
  – If the null hypothesis is true, there is only a 5 or 1 in 100 chance that chance variation alone leads us to reject it.

Type II Error
• “False negative”: the error of failing to reject a null hypothesis when the alternative hypothesis is true.
• The error of failing to observe a difference when in truth there is one.
  – e.g., a court finding a person not guilty of a crime that they did actually commit.

                         Actual Condition
Test Result              Affected                          Not Affected
Shows “infected”         True Positive                     False Positive (Type I Error)
Shows “not infected”     False Negative (Type II Error)    True Negative

How Stringent a P-value?
• P < 0.05
  – By chance alone, under the null hypothesis, we will observe a positive result (false positive) in 5% of our tests
  – 5/100
  – 50/1,000
  – 500/10,000
  – 5,000/100,000
  – 50,000/1,000,000

Genome Wide Association
• 12,000, 550,000, 1,000,000 SNPs
• Multiple diseases add tests
• Stratifying by sex, ethnicity, smoking status, etc. adds tests (and reduces power by effectively reducing sample size)
• Need to rethink our critical P-value

Not Accounting for Multiple Tests
• Invalid statistical conclusions
• Confidence intervals that don’t contain the population parameter
• Incorrect rejection of H0

Implications
• Clinical Trial
  – May result in approval of a drug as an improvement over existing drugs, when it is in fact equivalent to the existing drugs.
  – Could happen by chance that the new drug appears to be worse for some side-effect, when it is actually not worse for this side-effect.

Accounting for Multiple Testing
• Make standards for each comparison more stringent than for a single test
• Bonferroni correction
  – Adjust the allowable Type I error by dividing alpha by the number of tests
  – E.g. 20 tests: the p-value cut-off becomes 0.05/20 = 0.0025
  – E.g. 500,000 tests: the p-value cut-off becomes 0.05/500,000 = 0.0000001

Accounting for Multiple Testing
• Bonferroni is thought to be too stringent, particularly for GWAS
• False Discovery Rate (FDR)
  – Instead of controlling the chance of any false positives (as Bonferroni does), FDR controls the expected proportion of false positives among the results declared significant
  – An FDR threshold is determined from the observed p-value distribution, and hence is adaptive to the amount of signal in your data.

FDR
• The q-value replaces the p-value
• http://faculty.washington.edu/jstorey/qvalue/
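
The arithmetic on the slides above can be sketched in a few lines of Python. This is a minimal illustration and not the method of the q-value software linked above. The FDR part uses the Benjamini-Hochberg step-up procedure, one standard way to control FDR, which is swapped in here as an assumption. The function names and the ten example p-values are hypothetical, invented only for illustration.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of false positives when every null hypothesis is true."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cut-off after a Bonferroni correction."""
    return alpha / n_tests


def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at or below alpha / m."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p <= cutoff for p in p_values]


def benjamini_hochberg_reject(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure (an assumed stand-in for FDR control).

    Sort the m p-values, find the largest rank k with p_(k) <= (k / m) * fdr,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values from ten tests (made up for this example).
p_values = [0.0001, 0.004, 0.012, 0.019, 0.03, 0.2, 0.45, 0.6, 0.75, 0.9]

print("Expected false positives in 1,000,000 null tests at 0.05:",
      expected_false_positives(0.05, 1_000_000))
print("Bonferroni cut-off for 20 tests:", bonferroni_threshold(0.05, 20))
print("Rejected by Bonferroni:", sum(bonferroni_reject(p_values)))
print("Rejected by Benjamini-Hochberg:", sum(benjamini_hochberg_reject(p_values)))
```

On these made-up p-values, Bonferroni rejects 2 hypotheses and Benjamini-Hochberg rejects 4, which illustrates why Bonferroni is described as the more stringent of the two.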