STA111 - Lecture 19
1 Odds and Ends: p-values, testing, errors
The interpretation of a p-value of, say, 0.08 is: “If the null hypothesis is true, the probability of observing
data as extreme or more extreme than what was observed in the sample is 0.08”.
Remarks:
• A p-value is not the probability that the null hypothesis is true. Be careful, this is a very common
misunderstanding.
• The p-value does not quantify the size of the effect. A finding with a very small p-value can be
associated with an effect size that is so small that it is scientifically irrelevant.
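The second remark can be made concrete with a quick computation (not part of the lecture; the numbers are illustrative). With a huge sample, even a negligible effect produces a tiny p-value:

```python
# Sketch: a one-sided z-test where a scientifically negligible effect
# yields a tiny p-value purely because the sample size is enormous.
from statistics import NormalDist
from math import sqrt

def z_test_pvalue(xbar, mu0, sigma, n):
    """One-sided p-value for H0: mu = mu0 vs H1: mu > mu0 (z-test)."""
    z = sqrt(n) * (xbar - mu0) / sigma
    return 1 - NormalDist().cdf(z)

# Hypothetical numbers: observed effect of 0.01 sigma, n = 1,000,000.
p = z_test_pvalue(xbar=0.01, mu0=0.0, sigma=1.0, n=1_000_000)
print(p)  # far below 0.05, despite a negligible effect size
```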
Another important fact is that we can perform two-sided hypothesis tests by looking at confidence intervals.
If a 100(1 − α)% confidence interval for a parameter doesn’t include the hypothesized value under the null, we
reject it at significance level α (and if it does contain the “null value”, we don’t reject H0 ).
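The duality between confidence intervals and two-sided tests can be sketched as follows (Python is not part of the lecture, and the numbers are illustrative):

```python
# Sketch: a two-sided z-test carried out through a confidence interval.
from statistics import NormalDist
from math import sqrt

def z_confidence_interval(xbar, sigma, n, alpha):
    """100(1 - alpha)% confidence interval for a mean (known sigma)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for alpha = 0.05
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

# Illustrative data: test H0: mu = 10 vs H1: mu != 10 at alpha = 0.05.
xbar, sigma, n, alpha, mu0 = 10.4, 2.0, 100, 0.05, 10.0
lo, hi = z_confidence_interval(xbar, sigma, n, alpha)
reject = not (lo <= mu0 <= hi)   # reject H0 iff mu0 falls outside the CI
print(lo, hi, reject)
```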
When we covered one-sided tests (with alternative hypotheses of the type H1 : θ > θ0 or H1 : θ < θ0 ) we
assumed point null hypotheses H0 : θ = θ0 . This might have seemed unnatural and counterintuitive at the
time. It turns out that the tests don’t change if the null hypothesis is one-sided: the same test statistics
and procedures that we saw for H0 : θ = θ0 work for null hypotheses like H0 : θ ≤ θ0 or H0 : θ ≥ θ0 .
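One way to see why the point-null test also works for a composite null (this computation is my addition, with illustrative numbers): under H0 : µ ≤ µ0 , the rejection probability of the point-null test is largest at the boundary µ = µ0 , so the same critical value controls the Type I error everywhere in H0 .

```python
# Sketch: for H0: mu <= mu0 tested with the point-null z-test, the
# rejection probability P_mu(Z > z_alpha) peaks at the boundary mu = mu0.
from statistics import NormalDist
from math import sqrt

z_alpha = NormalDist().inv_cdf(0.95)   # alpha = 0.05
n, sigma, mu0 = 25, 1.0, 0.0

def rejection_prob(mu):
    # P(sqrt(n)(Xbar - mu0)/sigma > z_alpha) when the true mean is mu
    shift = sqrt(n) * (mu - mu0) / sigma
    return 1 - NormalDist().cdf(z_alpha - shift)

for mu in (-0.4, -0.2, 0.0):
    print(mu, rejection_prob(mu))
# the rate increases toward alpha = 0.05 as mu approaches mu0 from inside H0
```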
Lastly, remember the table:
                    H0 is true                       H1 is true
Don’t reject H0     OK                               Type II error (false negative)
Reject H0           Type I error (false positive)    OK
What is the probability of falsely rejecting the null if we set a significance level of α for our tests? The
answer is precisely α (do you see why?). What about the Type II error probability? This is the probability
of not being able to reject H0 when it is false (false negative). The next section is related to this error
probability.
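Both error probabilities can be estimated by simulation (my addition, with illustrative numbers): generate many samples under H0 to estimate the Type I error, and many samples under a specific alternative to estimate the Type II error.

```python
# Simulation sketch: estimate both error probabilities for a one-sided
# z-test of H0: mu = 0 vs H1: mu > 0 at alpha = 0.05 (illustrative setup).
import random
from statistics import NormalDist
from math import sqrt

random.seed(1)
z_alpha = NormalDist().inv_cdf(0.95)
n, sigma, trials = 30, 1.0, 20_000

def reject(mu):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    return sqrt(n) * xbar / sigma > z_alpha

type1 = sum(reject(0.0) for _ in range(trials)) / trials      # H0 is true
type2 = sum(not reject(0.5) for _ in range(trials)) / trials  # true mean is 0.5
print(type1)  # close to alpha = 0.05
print(type2)  # Type II error probability for this particular alternative
```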
2 Power, Sample Size Determination
The power of a test is defined as the probability of rejecting the null when it is false. In short, one can say
that the power is the probability of a true positive. As we have seen, significance levels control the probability
of observing a false positive. In practice, we would also like to make sure that our tests are able to reject
the null if it is false.
Let’s work out a simple example. Assume that we are doing a z-test for a mean with H0 : µ = µ0 vs
H1 : µ > µ0 . Suppose that the true value of the population average is µ1 > µ0 . The test rejects the null
hypothesis at significance level α if the Z-statistic √n(X̄n − µ0 )/σ is greater than zα , where zα is the
value such that PH0 (Z > zα ) = α (for instance, if α = 0.05, zα ≈ 1.64). Therefore, the power of the test
is

power = PH1 (reject null) = PH1 (Z > zα ) = PH1 (X̄n > µ0 + zα σ/√n) = PH1 (Z̃ > zα − √n(µ1 − µ0 )/σ),

where Z̃ ∼ Normal(0, 1). Recall that we want to maximize the power, so we want to maximize the probability
above. Let’s discuss the role of all the ingredients in the formula:
• Significance level: The significance level of the test comes in the zα term. Increasing the significance
level increases power because we can reject for greater p-values.
• Sample size: Increasing the sample size increases power.
• Population variance: High values of σ decrease power.
• Difference between µ1 and µ0 : The greater µ1 is with respect to µ0 , the easier it is to reject.
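The power formula above, and the role of each ingredient, can be sketched numerically (the code and numbers are my additions for illustration):

```python
# Sketch of the power formula for the one-sided z-test:
# power = P(Z > z_alpha - sqrt(n)(mu1 - mu0)/sigma).
from statistics import NormalDist
from math import sqrt

def power(alpha, n, sigma, mu0, mu1):
    """P(reject H0: mu = mu0) when the true mean is mu1 > mu0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_alpha - sqrt(n) * (mu1 - mu0) / sigma)

base = power(alpha=0.05, n=25, sigma=1.0, mu0=0.0, mu1=0.3)
print(base)
print(power(0.05, 100, 1.0, 0.0, 0.3))  # larger n      -> higher power
print(power(0.05, 25, 2.0, 0.0, 0.3))   # larger sigma  -> lower power
print(power(0.01, 25, 1.0, 0.0, 0.3))   # smaller alpha -> lower power
```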
This formula (and its analogues for other tests) is used for determining sample sizes in scientific experiments. These computations rely on population parameters that are unknown (the difference between the true and
hypothesized values, the population variance), so scientists perform them under conservative estimates.
Example: Suppose that a pharmaceutical company wants to run a clinical trial for testing a new drug. They measure
a continuous random variable (that can be modeled well with a Normal) that takes on positive values if
there is a “positive” treatment effect and negative values if there is a “negative” treatment effect. The FDA
requires that the probability of a false positive be set to α = 0.01. The null hypothesis of the trial is that
the drug is ineffective or detrimental and the alternative hypothesis is that the drug is beneficial. Previous
studies ensure that the population variance should be below 1. The pharmaceutical wants to design an
experiment such that the probability of rejecting the null is at least 0.8 if the average treatment effect is
equal to 0.5. What should be the sample size?
Given this information, we should substitute zα = 2.33, σ = 1, and µ1 − µ0 = 0.5, so
power = P (Z̃ > 2.33 − √n · 0.5) = 0.8.

The value k such that P (Z̃ > k) = 0.8 is k = −0.842. Solving for n gives n = [(2.33 + 0.842)/0.5]2 ≈ 40.2; rounding up, n = 41.
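Solving the power equation for n can be packaged into a short function (my addition; the inputs are the conservative values from the example). Rounding up guarantees the power requirement is met:

```python
# Sketch: smallest n achieving the target power for the one-sided z-test,
# from power = P(Z > z_alpha - sqrt(n) * delta / sigma) >= target.
from statistics import NormalDist
from math import ceil

def sample_size(alpha, target_power, sigma, delta):
    """Smallest n with power >= target_power, where delta = mu1 - mu0 > 0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(target_power)
    return ceil(((z_alpha + z_power) * sigma / delta) ** 2)

print(sample_size(alpha=0.01, target_power=0.8, sigma=1.0, delta=0.5))  # -> 41
```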
Exercise 1. (Same idea but with different numbers) Suppose that a pharmaceutical company wants to run a clinical
trial for testing a new drug. They measure a continuous random variable such that positive values represent
“positive” treatment effect and negative values represent “negative” treatment effect. The FDA requires that
the probability of a false positive should be set to α = 0.01. The null hypothesis of the test is that the drug
is ineffective or detrimental, and the alternative hypothesis is that the drug is beneficial. Previous studies
ensure that the population variance should be below 2. The pharmaceutical wants to design an experiment
such that the probability of rejecting the null is 0.9 if the average treatment effect is equal to 0.25. What
should be the sample size?
Exercise 2. Find the formula for the power of a z-test with null hypothesis H0 : µ = µ0 vs the alternative
H1 : µ ≠ µ0 . Interpret the equation that you get as a function of the significance level, sample size,
population variance and difference between the true value µ1 and the hypothesized value under the null µ0 .