P-values are Random Variables
Duncan Murdoch
Department of Statistical and Actuarial Sciences
University of Western Ontario
October 4, 2007
Outline
1. Motivation
2. What are p-values?
3. How should we teach them?
4. Examples
This is joint work with Yu-Ling Tsai and James Adcock.
Motivation
Teaching introductory statistics
I’ve been teaching hypothesis testing in introductory
statistics courses since 1988.
Over time I have gradually changed the way I teach
hypothesis testing and p-values; this talk describes my
current ideas.
A few recent events triggered the urge to write this up...
A trigger
On the R-help list in May 2006, regarding inconsistent results
(p = 0.7767, p = 0.9059, p = 0.1887) when running a normality
test on randomly generated data:
I mistakenly had thought the p-values would be more
stable since I am artificially creating a random normal
distribution. Is this expected for a normality test or is
this an issue with how rnorm is producing random
numbers? I guess if I run it many times, I would find
that I would get many large values for the p-value?
– Name withheld
A response
Discussion followed on why this was not a reasonable
expectation, including this:
We see this misunderstanding worryingly often.
Worrying because it reveals that a fundamental aspect
of statistical inference has not been grasped: that
p-values are designed to be (approximately) uniformly
distributed and fall below any given level with the
stated probability, when the null hypothesis is true.
– Peter Dalgaard
A second trigger
At her thesis defence, Yu-Ling presented histograms of
simulated p-values to illustrate deficiencies in some asymptotic
approximations:
[Figure: histograms of simulated p-values on the interval 0 to 1. Legible panel titles: Gamma, Regularized Score, Score.]
One of the examiners questioned this way of presenting the
results.
Advice on the web
On a medical school research methods course web page:
The t-test value for the stress test indicates that the
probability that the null hypothesis is true is smaller
than one-in-twenty.
I pointed out that this isn’t correct, and received the response:
[This] is written the way it is to give students a way to
make decisions about statistical results in journal
articles. It is not for people learning about statistics.
Thus, the interpretation of p-values is correct enough.
What are p-values?
The definition of a p-value
Given a null hypothesis H0, an alternative H1, and a test
statistic T, the p-value is
the probability, computed assuming that H0 is true,
that the test statistic would take a value as extreme or
more extreme than that actually observed.
– Moore, D. S. (2007), The Basic Practice of Statistics
In the typical case where large values of T are considered to be
extreme, this is
$p = P(T \ge t_{\mathrm{obs}} \mid H_0)$
Interpretation of a p-value
How should we interpret p?
the smaller the p-value, the stronger the evidence
against H0 provided by the data.
– Moore, D. S. (2007), The Basic Practice of Statistics
How are p-values interpreted in the wild?
The definition is $p = P(T \ge t_{\mathrm{obs}} \mid H_0)$. Some common
misconceptions (from Wikipedia):
1. the probability that the null hypothesis is true, i.e.
   $P(H_0 \mid \text{data})$.
2. the probability that a finding is “merely a fluke”.
3. the probability of falsely rejecting the null hypothesis, i.e.
   $P[H_0 \cap (t_{\mathrm{obs}} \ge t_{\mathrm{crit}})]$.
4. the probability that a replicating experiment would not yield
   the same conclusion.
How should we teach them?
How do we teach confidence intervals?
Definitions are awkward:
A level C confidence interval for a parameter is an
interval computed from sample data by a method that
has probability C of producing an interval containing
the true value of the parameter.
–Moore and McCabe (2003), Introduction to the
Practice of Statistics
But a picture tells the story...
[Figure: simulated confidence intervals; vertical axis "Simulation number" (ticks 5 to 20), horizontal axis "Interval" (ticks 1.0 to 3.0).]
We should emphasize that p-values are random
variables
Start by saying the p-value is simply a transformation of
the test statistic.
If the audience has enough mathematical sophistication,
give a formula:
$p = 1 - F(t_{\mathrm{obs}})$
where F(·) is the CDF of T under H0.
Show (or state) that this results in p ∼ Unif(0, 1) under H0.
Mention that a good T will tend to be larger under H1, so p
will be smaller.
THEN give Moore’s statement, as one justification for this
definition.
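The uniformity claim can be checked in a few lines. The sketch below assumes that T is continuous and that its null CDF F is strictly increasing, so that the inverse F^{-1} exists; the slide states the result without these conditions or a proof.

```latex
\begin{aligned}
P(p \le u \mid H_0) &= P\bigl(1 - F(T) \le u \mid H_0\bigr) \\
  &= P\bigl(F(T) \ge 1 - u \mid H_0\bigr) \\
  &= P\bigl(T \ge F^{-1}(1 - u) \mid H_0\bigr) \\
  &= 1 - F\bigl(F^{-1}(1 - u)\bigr) = u, \qquad 0 < u < 1 .
\end{aligned}
```

This is the CDF of Unif(0, 1). With discrete T the last step fails, which is why discrete data need separate treatment.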
Show pictures!
P-values are random variables, so it is natural to study their
distribution by simulation.
[Figure: two histograms of simulated p-values (density against p-values on 0 to 1), titled "10000 p-values under H0" and "10000 p-values under H1".]
Histograms are easily understood.
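A simulation of this kind can be sketched with the Python standard library. The setting is an assumption made for illustration, not the one behind the slide: a one-sided z-test of H0: µ = 0 with known σ = 1 and n = 10, with µ = 0.5 as an arbitrary alternative. The function names `simulate_p_values` and `text_histogram` are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_p_values(mu, n=10, n_sim=10_000, seed=1):
    """Simulate p-values for a one-sided z-test of H0: mu = 0 vs H1: mu > 0.

    Data are N(mu, 1) with known sigma = 1, T = sqrt(n) * xbar, and
    p = 1 - F(T), where F is the standard normal CDF (the null CDF of T).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_sim):
        xbar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        t_obs = xbar * n ** 0.5
        p_values.append(1.0 - std_normal.cdf(t_obs))
    return p_values


def text_histogram(p_values, bins=10, width=50):
    """Print a crude text histogram of p-values on [0, 1]; return bin counts."""
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1
    for i, c in enumerate(counts):
        bar = "#" * round(width * c / max(counts))
        print(f"{i / bins:.1f}-{(i + 1) / bins:.1f} {bar}")
    return counts


p_null = simulate_p_values(mu=0.0)  # H0 true
p_alt = simulate_p_values(mu=0.5)   # H1 true (arbitrary illustrative value)
print("10000 p-values under H0")
text_histogram(p_null)
print("10000 p-values under H1")
text_histogram(p_alt)
```

Under H0 the bars should be roughly equal in length; under H1 they should pile up near 0.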
Examples
One-sample t-test
Data: $X_1, \dots, X_4 \sim N(\mu, \sigma^2)$ i.i.d.
Hypotheses: H0: µ = 0 versus H1: µ > 0
Test statistic: $T = \bar{X} / (s/\sqrt{4}) \sim t(3)$ under H0
[Figure: histograms of simulated p-values (density against p-values on 0 to 1) in three panels: µ = 0, µ = 0.5, µ = 1.]
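The one-sample example can be reproduced approximately with the standard library alone, because the t(3) CDF has a closed form. The sketch assumes σ = 1 (the slide does not give σ) and summarizes each panel by the proportion of p-values below 0.05 rather than by a histogram; `t3_cdf` and `t_test_p_values` are hypothetical names.

```python
import math
import random


def t3_cdf(t):
    """CDF of Student's t distribution with 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


def t_test_p_values(mu, n_sim=10_000, sigma=1.0, seed=2007):
    """Simulate p = 1 - F(T) for H0: mu = 0 vs H1: mu > 0 with n = 4."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_sim):
        x = [rng.gauss(mu, sigma) for _ in range(4)]
        xbar = sum(x) / 4
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / 3)  # sample sd
        t_obs = xbar / (s / math.sqrt(4))
        p_values.append(1.0 - t3_cdf(t_obs))
    return p_values


results = {}
for mu in (0.0, 0.5, 1.0):
    p = t_test_p_values(mu)
    results[mu] = sum(pi < 0.05 for pi in p) / len(p)
    print(f"mu = {mu}: proportion of p-values below 0.05 = {results[mu]:.3f}")
```

At µ = 0 the proportion should be close to 0.05, and it should grow as µ moves into the alternative.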
Composite null hypotheses
When H0 is composite, it may not uniquely determine the
distribution of the p-value.
Hypotheses: H0: µ ≤ 0 versus H1: µ > 0
[Figure: histograms of simulated p-values (density against p-values on 0 to 1) in three panels: µ = −0.5, µ = 0, µ = 0.5.]
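Why the p-values lean towards 1 when µ < 0 can be sketched as follows. The argument assumes the same statistic $T = \bar{X}/(s/\sqrt{4})$ and $p = 1 - F(T)$, with F the t(3) CDF, as in the one-sample example; the symbol T' is introduced here for illustration only.

```latex
\begin{aligned}
T &= \frac{\bar{X}}{s/\sqrt{4}}
   = \underbrace{\frac{\bar{X} - \mu}{s/\sqrt{4}}}_{T' \,\sim\, t(3)}
     + \frac{\mu}{s/\sqrt{4}}
   \;\le\; T' \quad \text{when } \mu \le 0, \\
p &= 1 - F(T) \;\ge\; 1 - F(T') \sim \mathrm{Unif}(0, 1), \\
P(p \le u) &\le u \quad \text{for every } \mu \le 0,
   \text{ with equality at } \mu = 0 .
\end{aligned}
```

So the test keeps its level everywhere in the composite null, and the uniform distribution appears only on the boundary µ = 0.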
Violations of assumptions
If our assumptions are violated, the null distribution of p
may be distorted, but larger samples often improve the
approximations.
Example: Assume the data are N(µ, σ²), but they are really
Exponential(1).
1. One-sample t-test, H0: µ = 1 versus H1: µ ≠ 1
2. Two-sample t-test, H0: µ1 = µ2 versus H1: µ1 ≠ µ2
Note that H0 is true in both cases. Let’s look at the null
distributions.
[Figure: null distributions of p-values (density against p-values on 0 to 1) in four panels: one-sample t-test with n = 2, one-sample t-test with n = 10, two-sample t-test with n = 2, two-sample t-test with n = 10.]
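The one-sample half of this comparison can be sketched with the standard library. The sketch relies on a closed-form series for the t CDF that is valid only for odd degrees of freedom, which covers n = 2 and n = 10 (1 and 9 degrees of freedom) but not the two-sample case, so that case is left out. Normal data with mean 1 are simulated alongside as a check that the code behaves when the assumptions hold. All function names are hypothetical.

```python
import math
import random


def t_cdf_odd(t, df):
    """CDF of Student's t for odd df, via the closed-form trigonometric series."""
    theta = math.atan(t / math.sqrt(df))
    c = math.cos(theta)
    total, term, m = 0.0, c, 1
    while m <= df - 2:          # terms in cos^1, cos^3, ..., cos^(df-2)
        total += term
        term *= (m + 1) / (m + 2) * c * c
        m += 2
    return 0.5 + (theta + math.sin(theta) * total) / math.pi


def two_sided_p(x, mu0):
    """Two-sided one-sample t-test p-value; len(x) - 1 must be odd."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    t_obs = (xbar - mu0) / (s / math.sqrt(n))
    return 2.0 * (1.0 - t_cdf_odd(abs(t_obs), n - 1))


def draw_normal(rng):
    return rng.gauss(1.0, 1.0)        # assumptions hold, mean 1


def draw_exponential(rng):
    return rng.expovariate(1.0)       # Exponential(1), mean 1, so H0 is true


def rejection_rate(draw, n, mu0=1.0, n_sim=10_000, alpha=0.05, seed=1):
    """Proportion of simulated null p-values below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x = [draw(rng) for _ in range(n)]
        hits += two_sided_p(x, mu0) < alpha
    return hits / n_sim


rates = {}
for n in (2, 10):
    rates[n] = (rejection_rate(draw_normal, n), rejection_rate(draw_exponential, n))
    print(f"n = {n}: normal {rates[n][0]:.3f}, exponential {rates[n][1]:.3f}")
```

For the normal data the proportion should sit near the nominal 0.05; any departure in the exponential column is the distortion the slide describes.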
Discrete data
With discrete data, p-values inherit a discrete distribution.
You won’t see Unif(0, 1) under the null.
This makes display of simulated p-values harder, but the
empirical CDF is not too bad.
Test for independence in a 2 × 2 table
Example from Tamhane and Dunlop (2000), Statistics and Data
Analysis
          Prednisone   Prednisone + VCR   Total
Success       14              38            52
Failure        7               4            11
Total         21              42            63
Pearson’s chi-square p-value: 0.04608
Fisher’s exact p-value: 0.03232
What are the null distributions like?
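Both quoted p-values can be recomputed from the table with the standard library. The sketch makes two assumptions the slide does not state: that the Pearson test uses Yates' continuity correction, and that the two-sided Fisher p-value sums the probabilities of all tables no more probable than the observed one. If those conventions match the ones behind the slide, the output should agree with the quoted values.

```python
import math

# Rows: Success, Failure. Columns: Prednisone, Prednisone + VCR.
table = [[14, 38], [7, 4]]


def pearson_p_yates(tab):
    """Pearson chi-square p-value (1 df) with Yates' continuity correction."""
    (a, b), (c, d) = tab
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(tab):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += max(abs(obs - expected) - 0.5, 0.0) ** 2 / expected
    # For 1 df, P(chi-square > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def fisher_p(tab):
    """Two-sided Fisher exact p-value for a 2 x 2 table with fixed margins."""
    (a, b), (c, d) = tab
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(c1, x) * math.comb(n - c1, r1 - x) / math.comb(n, r1)

    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


print(f"Pearson (Yates) p-value: {pearson_p_yates(table):.5f}")
print(f"Fisher exact p-value:    {fisher_p(table):.5f}")
```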
Null tables with fixed margins.
[Figure: null distributions of p-values for "Pearson's test" and "Fisher's test"; each test has a "Density" panel and a "Proportion" panel plotted against p-values on 0 to 1.]
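With both margins fixed, the null distribution of Fisher's p-value needs no simulation: the top-left cell is hypergeometric, so every attainable p-value and its probability can be enumerated. The sketch below does this for the margins of the table above, assuming the usual two-sided convention (sum the probabilities of all tables no more probable than the observed one); the names are hypothetical.

```python
import math

R1, C1, N = 52, 21, 63  # successes, Prednisone patients, total (table margins)


def hyper_prob(x):
    """Null probability that x of the 21 Prednisone patients are successes."""
    return math.comb(C1, x) * math.comb(N - C1, R1 - x) / math.comb(N, R1)


support = range(max(0, R1 + C1 - N), min(R1, C1) + 1)
probs = {x: hyper_prob(x) for x in support}


def fisher_p(x):
    """Two-sided Fisher p-value for the table whose top-left cell is x."""
    return sum(q for q in probs.values() if q <= probs[x] * (1 + 1e-7))


# Exact null distribution: each attainable p-value paired with its probability.
null_dist = sorted((fisher_p(x), probs[x]) for x in support)


def ecdf(u):
    """P(p <= u) under H0 with both margins fixed."""
    return sum(q for p, q in null_dist if p <= u)


for p, q in null_dist:
    print(f"p = {p:.5f}  probability = {q:.5f}")
print(f"P(p <= 0.05) = {ecdf(0.05):.5f}")
```

The distribution is a step function with one jump per attainable table. By construction P(p ≤ u) never exceeds u, so the test is conservative at levels that are not attainable p-values.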
Null tables with independent rows, P(success) = 52/63
[Figure: null distributions of p-values for "Pearson's test" and "Fisher's test"; each test has a "Density" panel and a "Proportion" panel plotted against p-values on 0 to 1.]
Other examples
Explore robustness in other situations where the
assumptions are violated. Look for the effect of violations
on the power of the test.
Study Welch’s correction for unequal variances in a
two-sample t-test. What happens when the variances are
equal? What happens if we do not use it when we should?
Show Monte Carlo p-values when the null distribution is
only available by simulation. Explore bootstrap tests.
Explore other asymptotic approximations by studying the
distributions of nominal p-values.
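A Monte Carlo p-value can be sketched as follows. The setting is hypothetical and chosen only for illustration: H0 says ten observations are i.i.d. N(0, 1), the statistic is the sample range, and large values count as extreme. The estimate (1 + number of simulated statistics at least as large as the observed one) / (B + 1) is one common convention, not necessarily the one the talk had in mind.

```python
import random


def sample_range(x):
    """Test statistic: largest observation minus smallest."""
    return max(x) - min(x)


def null_range(rng, n=10):
    """Draw one value of the statistic under H0: data i.i.d. N(0, 1)."""
    return sample_range([rng.gauss(0.0, 1.0) for _ in range(n)])


def monte_carlo_p_value(t_obs, simulate_null_stat, n_null=999, seed=4):
    """Monte Carlo p-value when large values of the statistic are extreme."""
    rng = random.Random(seed)
    exceed = sum(simulate_null_stat(rng) >= t_obs for _ in range(n_null))
    return (1 + exceed) / (n_null + 1)


data_rng = random.Random(5)
observed = [data_rng.gauss(0.0, 1.0) for _ in range(10)]  # generated under H0
p_mc = monte_carlo_p_value(sample_range(observed), null_range)
print(f"Monte Carlo p-value: {p_mc:.3f}")
```

Because the estimate is itself random, repeating the call with different seeds gives a direct view of the p-value as a random variable.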
Still more examples
In multiple testing, illustrate the distribution of the smallest
of n p-values, and the distribution of Bonferroni-corrected
p-values.
Storey and Tibshirani (2003) used histograms of p-values
in a collection of genomewide tests in order to illustrate
false discovery rate calculations.
[Figure: "Density of observed p-values", a histogram on 0.0 to 1.0 with densities from 0 to 4.]
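The multiple-testing suggestion can be sketched by simulation. The sketch assumes n independent tests with every null hypothesis true, so each p-value is Unif(0, 1); the choice n = 20 is arbitrary. The exact values printed for comparison follow from P(min p ≤ u) = 1 − (1 − u)^n.

```python
import random


def smallest_p_values(n_tests, n_sim=10_000, seed=3):
    """Simulate the smallest of n_tests independent Unif(0, 1) p-values."""
    rng = random.Random(seed)
    return [min(rng.random() for _ in range(n_tests)) for _ in range(n_sim)]


n_tests = 20   # arbitrary illustrative choice
alpha = 0.05
p_min = smallest_p_values(n_tests)
p_bonf = [min(1.0, n_tests * p) for p in p_min]  # Bonferroni-corrected smallest p

raw_rate = sum(p < alpha for p in p_min) / len(p_min)
bonf_rate = sum(p < alpha for p in p_bonf) / len(p_bonf)
exact_raw = 1 - (1 - alpha) ** n_tests
exact_bonf = 1 - (1 - alpha / n_tests) ** n_tests

print(f"P(smallest p < {alpha}): simulated {raw_rate:.3f}, exact {exact_raw:.3f}")
print(f"P(Bonferroni p < {alpha}): simulated {bonf_rate:.3f}, exact {exact_bonf:.3f}")
```

A histogram of `p_min` is far from uniform, while the corrected values fall below the level at most about as often as the level itself.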
Conclusion
Many students end up with fallacious interpretations of
p-values, e.g. P(H0 |data).
We should look at histograms (or ECDF plots) of p-values
from simulations.
P-values are random variables!