STA111 - Lecture 19

1 Odds and Ends: p-values, testing, errors

The interpretation of a p-value of, say, 0.08 is: "If the null hypothesis is true, the probability of observing data as extreme or more extreme than what was observed in the sample is 0.08."

Remarks:

• A p-value is not the probability that the null hypothesis is true. Be careful: this is a very common misunderstanding.

• The p-value does not quantify the size of the effect. A finding with a very small p-value can be associated with an effect size so small that it is scientifically irrelevant.

Another important fact is that we can perform two-sided hypothesis tests by looking at confidence intervals. If a 100(1 − α)% confidence interval for a parameter doesn't include the hypothesized value under the null, we reject it at significance level α (and if it does contain the "null value", we don't reject H0).

When we covered one-sided tests (with alternative hypotheses of the type H1: θ > θ0 or H1: θ < θ0), we assumed point null hypotheses H0: θ = θ0. This might have seemed unnatural and counterintuitive at the time. It turns out that the tests don't change if the null hypothesis is one-sided: the same test statistics and procedures that we saw for H0: θ = θ0 work for null hypotheses like H0: θ ≤ θ0 or H0: θ ≥ θ0.

Lastly, remember the table:

                    H0 is true                       H1 is true
Don't Reject H0     OK                               Type II error (false negative)
Reject H0           Type I error (false positive)    OK

What is the probability of falsely rejecting the null if we set a significance level of α for our tests? The answer is precisely α (do you see why?). What about the Type II error probability? This is the probability of not being able to reject H0 when it is false (a false negative). The next section is related to this error probability.

2 Power, Sample Size Determination

The power of a test is defined as the probability of rejecting the null when it is false. In short, one can say that the power is the probability of a true positive. As we have seen, significance levels control the probability of observing a false positive. In practice, we would also like to make sure that our tests are able to reject the null if it is false.

Let's work out a simple example. Assume that we are doing a z-test for a mean with H0: µ = µ0 vs H1: µ > µ0. Suppose that the true value of the population average is µ1 > µ0. The test rejects the null hypothesis at significance level α if the Z-statistic √n(X̄n − µ0)/σ is greater than zα, where zα is the value such that PH0(Z > zα) = α (for instance, if α = 0.05, zα = 1.64). Therefore, the power of the test is

power = PH1(reject null) = PH1(Z > zα) = PH1(X̄n > µ0 + zα σ/√n) = PH1(Z̃ > zα − √n(µ1 − µ0)/σ),

where Z̃ ∼ Normal(0, 1). Recall that we want to maximize the power, so we want to maximize the probability above. Let's discuss the role of all the ingredients in the formula:

• Significance level: The significance level of the test comes in through the zα term. Increasing the significance level increases power because we can reject for greater p-values.

• Sample size: Increasing the sample size increases power.

• Population variance: High values of σ decrease power.

• Difference between µ1 and µ0: The greater µ1 is with respect to µ0, the easier it is to reject.

This formula (and analogous versions for other tests) is used for determining sample sizes in scientific experiments.
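To make the roles of these ingredients concrete, here is a minimal numerical sketch of the power formula above. It assumes scipy is available; the function name power_one_sided_z and the example values are illustrative choices, not part of the notes.

```python
# Sketch of the one-sided z-test power formula:
#   power = P(Z~ > z_alpha - sqrt(n) * (mu1 - mu0) / sigma)
from scipy.stats import norm

def power_one_sided_z(n, mu0, mu1, sigma, alpha):
    """Power of the z-test of H0: mu = mu0 vs H1: mu > mu0 when the true mean is mu1."""
    z_alpha = norm.ppf(1 - alpha)             # rejection cutoff, e.g. 1.64 for alpha = 0.05
    shift = (mu1 - mu0) * n ** 0.5 / sigma    # how far the statistic is pushed under H1
    return 1 - norm.cdf(z_alpha - shift)      # P(Z~ > z_alpha - shift)

# Illustration: power grows with n, alpha, and mu1 - mu0, and shrinks with sigma.
for n in (10, 25, 50, 100):
    print(n, round(power_one_sided_z(n, mu0=0.0, mu1=0.5, sigma=1.0, alpha=0.05), 3))
```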
These computations rely on population parameters that are unknown (the difference between the true value and the hypothesized value, and the population variance), so scientists make the computations under conservative estimates.

Example: Suppose that a pharmaceutical company wants to run a clinical trial to test a new drug. They measure a continuous random variable (which can be modeled well with a Normal) that takes on positive values if there is a "positive" treatment effect and negative values if there is a "negative" treatment effect. The FDA requires that the probability of a false positive be set to α = 0.01. The null hypothesis of the trial is that the drug is ineffective or detrimental, and the alternative hypothesis is that the drug is beneficial. Previous studies ensure that the population variance should be below 1. The pharmaceutical company wants to design an experiment such that the probability of rejecting the null is at least 0.8 if the average treatment effect is equal to 0.5. What should the sample size be?

Given this information, we substitute zα = 2.33, σ = 1, and µ1 − µ0 = 0.5, so

power = P(Z̃ > 2.33 − √n · 0.5) = 0.8.

The value k such that P(Z̃ > k) = 0.8 is k = −0.842. Solving 2.33 − √n · 0.5 = −0.842 for n, we get n = [(2.33 + 0.842)/0.5]² ≈ 40.

Exercise 1. (Same idea but with different numbers) Suppose that a pharmaceutical company wants to run a clinical trial to test a new drug. They measure a continuous random variable such that positive values represent a "positive" treatment effect and negative values represent a "negative" treatment effect. The FDA requires that the probability of a false positive be set to α = 0.01. The null hypothesis of the test is that the drug is ineffective or detrimental, and the alternative hypothesis is that the drug is beneficial. Previous studies ensure that the population variance should be below 2. The pharmaceutical company wants to design an experiment such that the probability of rejecting the null is 0.9 if the average treatment effect is equal to 0.25. What should the sample size be?

Exercise 2. Find the formula for the power of a z-test with null hypothesis H0: µ = µ0 vs the alternative H1: µ ≠ µ0. Interpret the equation that you get as a function of the significance level, sample size, population variance, and the difference between the true value µ1 and the hypothesized value under the null, µ0.
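As a numerical check on the worked example above (and a possible starting point for Exercise 1), here is a minimal sketch that inverts the power formula to obtain the required sample size. It assumes scipy is available; the function name sample_size_one_sided_z and its arguments are illustrative choices, not part of the notes.

```python
# Inverting power = P(Z~ > z_alpha - sqrt(n)*(mu1 - mu0)/sigma) >= target gives
#   sqrt(n) >= (z_alpha + z_power) * sigma / (mu1 - mu0),
# where z_power is the value with P(Z~ > z_power) = target (e.g. -0.842 flips sign here).
from scipy.stats import norm

def sample_size_one_sided_z(mu_diff, sigma, alpha, target_power):
    """n needed for a one-sided z-test at level alpha to reach the target power (round up in practice)."""
    z_alpha = norm.ppf(1 - alpha)        # 2.33 for alpha = 0.01
    z_power = norm.ppf(target_power)     # 0.842 for target power 0.8
    return ((z_alpha + z_power) * sigma / mu_diff) ** 2

# Worked example: alpha = 0.01, sigma = 1, mu1 - mu0 = 0.5, power 0.8 -> approximately 40.
print(sample_size_one_sided_z(mu_diff=0.5, sigma=1.0, alpha=0.01, target_power=0.8))
```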