Multiple Testing
Matthew Kowgier
Multiple Testing
• In statistics, the multiple comparisons/testing problem occurs when one considers a set of statistical inferences simultaneously.
– Errors in inference
– Hypothesis tests that incorrectly reject the null hypothesis
What a P-value isn’t
• P-value is NOT the probability of H0 given the data
• P-value takes no account of the power of the study
– Power: the probability of rejecting H0 when it is actually false
What a P-value IS
• “Informal measure of the compatibility of the data with the null hypothesis”
– Jewell 2004
• If we repeated our experiment over and over again, each time taking a random sample of observable units (people), what proportion of the time could we expect to observe a result (test statistic) at least as extreme, by chance alone? (See the simulation sketch below.)
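The repeated-sampling interpretation above can be made concrete with a small simulation. The Python sketch below is illustrative only: the two groups, their sizes, and the permutation scheme are assumptions added for this transcript, not part of the original slides. Group labels are reshuffled many times to mimic "chance alone", and the p-value is the proportion of reshuffles that produce a difference in means at least as extreme as the one actually observed.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two groups of 50 observations each.
group_a = rng.normal(loc=0.3, scale=1.0, size=50)
group_b = rng.normal(loc=0.0, scale=1.0, size=50)
observed = abs(group_a.mean() - group_b.mean())

# Mimic "repeating the experiment" under the null hypothesis by
# reshuffling the group labels and recomputing the test statistic.
pooled = np.concatenate([group_a, group_b])
n_sims = 10_000
null_diffs = np.empty(n_sims)
for i in range(n_sims):
    perm = rng.permutation(pooled)
    null_diffs[i] = abs(perm[:50].mean() - perm[50:].mean())

# P-value: proportion of chance-alone results at least as extreme.
p_value = (null_diffs >= observed).mean()
print(f"permutation p-value = {p_value:.4f}")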
Type I Error
• “False positive”: the error of rejecting a null hypothesis when it is actually true.
• The error of accepting an alternative hypothesis (the real hypothesis of interest) when the results can be attributed to chance.
• Occurs when we observe a difference when in truth there is none.
– e.g., a court finding a person guilty of a crime that they did not actually commit.
• Typically we set the Type I error rate to 0.05 or 0.01
– then there is only a 5 in 100 or 1 in 100 chance that the variation we are seeing is due to chance alone.
Type II Error
• “False negative”: the error of failing to reject a null hypothesis when the alternative hypothesis is true.
• The error of failing to observe a difference when in truth there is one.
– e.g., a court finding a person not guilty of a crime that they did actually commit.
                          Actual Condition
Test Result               Affected                          Not Affected
Shows “infected”          True Positive                     False Positive (Type I Error)
Shows “not infected”      False Negative (Type II Error)    True Negative
How Stringent a P-value?
• P < 0.05
– By chance alone, under the null hypothesis we will observe a positive result (false positive) in 5% of our tests (simulated in the sketch after this list)
– 5/100
– 50/1,000
– 500/10,000
– 5,000/100,000
– 50,000/1,000,000
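The 5/100, 50/1,000, … arithmetic above can be checked by simulation. In the hypothetical sketch below (the sample sizes and the use of a two-sample t-test are assumptions for illustration), every test compares two samples drawn from the same distribution, so the null hypothesis is true for every test; roughly 5% of them still come out "significant" at P < 0.05.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05

for n_tests in (100, 1_000, 10_000):
    false_positives = 0
    for _ in range(n_tests):
        # Both samples come from the same distribution, so H0 is true.
        x = rng.normal(size=30)
        y = rng.normal(size=30)
        _, p = stats.ttest_ind(x, y)
        if p < alpha:
            false_positives += 1
    print(f"{n_tests:>6} true-null tests -> {false_positives} false positives "
          f"(expected about {alpha * n_tests:.0f})")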
Genome Wide Association
• 12,000, 550,000, 1,000,000 SNPs
• Multiple diseases add tests
• Stratifying by sex, ethnicity, smoking status, etc. adds tests (and reduces power by effectively reducing sample size)
• Need to rethink our critical P-value
Not Accounting for Multiple Tests
• Invalid statistical conclusions
• Confidence intervals that don’t contain the population parameter
• Incorrect rejection of H0
Implications
• Clinical Trial
– May result in approval of a drug as an improvement over existing drugs, when it is in fact equivalent to the existing drugs.
– It could happen by chance that the new drug appears to be worse for some side-effect, when it is actually not worse for this side-effect.
Accounting for Multiple Testing
• Make standards for each comparison more stringent than for a single test
• Bonferroni correction (see the sketch below)
– Adjust the allowable Type I error by dividing alpha by the number of tests
– E.g. 20 tests – p-value cut-off becomes 0.05/20 = 0.0025
– E.g. 500,000 tests – p-value cut-off becomes 0.05/500,000 = 0.0000001
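A minimal sketch of the Bonferroni adjustment described above. The helper name and the example p-values are hypothetical, added for illustration; the rule itself is simply alpha divided by the number of tests.

import numpy as np

def bonferroni_significant(p_values, alpha=0.05):
    """Compare each p-value against alpha / (number of tests)."""
    p_values = np.asarray(p_values)
    cutoff = alpha / len(p_values)
    return p_values < cutoff, cutoff

# Hypothetical p-values from 20 tests.
p_vals = [0.001, 0.012, 0.030, 0.049] + [0.20] * 16
significant, cutoff = bonferroni_significant(p_vals)
print(f"cut-off = {cutoff:.4f}")                      # 0.05 / 20 = 0.0025
print(f"{significant.sum()} of {len(p_vals)} tests remain significant")

With no correction, four of these hypothetical tests would be called significant at 0.05; after the Bonferroni adjustment only one survives.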
Accounting for Multiple Testing
• Bonferroni is thought to be too stringent, particularly for GWAS
• False Discovery Rate (FDR)
– Instead of controlling the chance of any false positives (as Bonferroni does), FDR controls the expected proportion of false positives among the tests declared significant
– An FDR threshold is determined from the observed p-value distribution, and hence is adaptive to the amount of signal in your data.
FDR
• A q-value replaces the p-value (a Benjamini–Hochberg sketch follows below)
• http://faculty.washington.edu/jstorey/qvalue/
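The q-value software linked above is Storey's qvalue package. As a rough, language-neutral illustration of FDR control (an addition for this transcript, not that package's own code), the sketch below implements the closely related Benjamini–Hochberg step-up procedure: sort the m p-values, compare the i-th smallest against (i / m) * fdr, and reject every hypothesis up to the largest p-value that passes.

import numpy as np

def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure for FDR control."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    thresholds = (np.arange(1, m + 1) / m) * fdr
    passing = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passing.size > 0:
        # Reject every hypothesis up to the largest passing p-value.
        reject[order[: passing.max() + 1]] = True
    return reject

# Hypothetical p-values: a few strong signals among mostly null results.
p_vals = [0.0001, 0.0008, 0.004, 0.03, 0.2, 0.4, 0.6, 0.9]
print(benjamini_hochberg(p_vals, fdr=0.05))
# -> the three smallest p-values are declared discoveries

Unlike a fixed Bonferroni cut-off, the threshold here depends on the observed p-value distribution, which is what makes FDR control adaptive to the amount of signal in the data.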