Hypothesis Testing
10.2 Tests for Proportions
• Y = 17, then throw away the torque converter.
Let p denote the proportion of defectives produced by the machine. Before the installation of
the torque converter p was 0.10. Then we installed the torque converter. Did p change? Did it
go up or down? We use statistics to decide. Our method is to observe data and construct a 95%
confidence interval for p,
p̂ ± z_{α/2} √( p̂(1 − p̂) / n ).    (10.2.1)
If the confidence interval is
• [0.01, 0.05], then we are 95% confident that 0.01 ≤ p ≤ 0.05, so there is evidence that
the torque converter is helping.
• [0.15, 0.19], then we are 95% confident that 0.15 ≤ p ≤ 0.19, so there is evidence that
the torque converter is hurting.
• [0.07, 0.11], then there is not enough evidence to conclude that the torque converter is
doing anything at all, positive or negative.
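Interval (10.2.1) is straightforward to compute directly. Below is a minimal Python sketch (the text's own code is R; the function name and the sample counts here are illustrative assumptions, not from the text):

```python
from math import sqrt
from statistics import NormalDist

def prop_ci(x, n, conf=0.95):
    # Large-sample confidence interval for a proportion, as in (10.2.1):
    # phat +/- z_{alpha/2} * sqrt(phat * (1 - phat) / n)
    phat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half = z * sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

# Hypothetical data: 9 defectives observed in 100 parts
lo, hi = prop_ci(9, 100)
print(round(lo, 4), round(hi, 4))  # 0.0339 0.1461
```

Since this (hypothetical) interval covers p = 0.10, such data would give no evidence that the torque converter changed anything, matching the third bullet above.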
10.2.1 Terminology
The null hypothesis H0 is a “nothing” hypothesis, whose interpretation could be that nothing has changed, there is no difference, there is nothing special taking place, etc. In Example 10.1 the null hypothesis would be H0 : p = 0.10. The alternative hypothesis H1 is the hypothesis that something has changed; in this case, H1 : p ≠ 0.10. Our goal is to statistically test the hypothesis H0 : p = 0.10 versus the alternative H1 : p ≠ 0.10. Our procedure will be:
1. Go out and collect some data, in particular, a simple random sample of observations from the machine.
2. Suppose that H0 is true and construct a 100(1 − α)% confidence interval for p.
3. If the confidence interval does not cover p = 0.10, then we reject H0 . Otherwise, we fail to reject H0 .
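The three steps above can be sketched in a few lines of Python (a hedged illustration; the function name and the sample counts are hypothetical, not from the text):

```python
from math import sqrt
from statistics import NormalDist

def ci_test(x, n, p0, alpha=0.05):
    # Step 2: assume H0 and build a 100(1 - alpha)% confidence interval for p
    phat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(phat * (1 - phat) / n)
    lo, hi = phat - half, phat + half
    # Step 3: reject H0: p = p0 iff the interval does not cover p0
    return "reject H0" if not (lo <= p0 <= hi) else "fail to reject H0"

# Step 1 (hypothetical sample): 20 defectives in 100 parts, testing H0: p = 0.10
print(ci_test(20, 100, 0.10))  # reject H0
```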
Remark 10.2. Every time we make a decision it is possible to be wrong, and there are two
possible mistakes that we could make. We have committed a
Type I Error if we reject H0 when in fact H0 is true. This would be akin to convicting an
innocent person for a crime (s)he did not commit.
Type II Error if we fail to reject H0 when in fact H1 is true. This is analogous to a guilty
person escaping conviction.
Type I Errors are usually considered worse², and we design our statistical procedures to control the probability of making such a mistake. We define the

significance level of the test = ℙ(Type I Error) = α.    (10.2.2)

We want α to be small, which conventionally means, say, α = 0.05, α = 0.01, or α = 0.005 (but could mean anything, in principle).

² There is no mathematical difference between the errors, however. The bottom line is that we choose one type of error to control with an iron fist, and we try to minimize the probability of making the other type. That being said, null hypotheses are often designed to correspond to the “simpler” model, so it is often easier to analyze (and thereby control) the probabilities associated with Type I Errors.
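That α really is the probability of a Type I Error can be checked by simulation. The sketch below (a hypothetical setup, in Python rather than the text's R) repeatedly samples from a population where H0 is true and records how often the level-0.05 interval test rejects:

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)
p0, n, alpha = 0.10, 1000, 0.05   # H0: p = 0.10 is true in this simulation
z = NormalDist().inv_cdf(1 - alpha / 2)

reps, rejections = 1000, 0
for _ in range(reps):
    x = sum(random.random() < p0 for _ in range(n))  # sample with p = p0
    phat = x / n
    half = z * sqrt(phat * (1 - phat) / n)
    if not (phat - half <= p0 <= phat + half):
        rejections += 1  # H0 was true, so this is a Type I Error

print(rejections / reps)  # should be close to alpha = 0.05
```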
• The rejection region (also known as the critical region) for the test is the set of sample
values which would result in the rejection of H0 . For Example 10.1, the rejection region
would be all possible samples that result in a 95% confidence interval that does not cover
p = 0.10.
• The above example with H1 : p , 0.10 is called a two-sided test. Many times we are
interested in a one-sided test, which would look like H1 : p < 0.10 or H1 : p > 0.10.
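The distinction matters for where the critical region sits: two tails of area α/2 each, or one tail of area α. As a short Python sketch (mirroring the text's qnorm calls):

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist().inv_cdf

# Two-sided H1: p != 0.10 -- reject when |Z| > z_{alpha/2}
two_sided = z(1 - alpha / 2)   # about 1.96

# One-sided H1: p < 0.10 -- reject when Z < -z_{alpha}
lower = -z(1 - alpha)          # about -1.64

# One-sided H1: p > 0.10 -- reject when Z > z_{alpha}
upper = z(1 - alpha)           # about 1.64

print(round(two_sided, 2), round(lower, 2), round(upper, 2))
```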
We are ready for tests of hypotheses for one proportion.
Table here.
Don’t forget the assumptions.
Example 10.3. Find
1. The null and alternative hypotheses
2. Check your assumptions.
3. Define a critical region with an α = 0.05 significance level.
4. Calculate the value of the test statistic and state your conclusion.
Example 10.4. Suppose p = the proportion of students who are admitted to the graduate
school of the University of California at Berkeley, and suppose that a public relations officer
boasts that UCB has historically had a 40% acceptance rate for its graduate school. Consider
the data stored in the table UCBAdmissions from 1973. Assuming these observations constituted a simple random sample, are they consistent with the officer’s claim, or do they provide
evidence that the acceptance rate was significantly less than 40%? Use an α = 0.01 significance
level.
Our null hypothesis in this problem is H0 : p = 0.4 and the alternative hypothesis is
H1 : p < 0.4. We reject the null hypothesis if p̂ is too small, that is, if
(p̂ − 0.4) / √( 0.4(1 − 0.4)/n ) < −z_α ,    (10.2.3)
where α = 0.01 and −z0.01 is
> -qnorm(0.99)
[1] -2.326348
Our only remaining task is to find the value of the test statistic and see where it falls relative
to the critical value. We can find the number of people admitted and not admitted to the UCB
graduate school with the following.
> A <- as.data.frame(UCBAdmissions)
> head(A)
     Admit Gender Dept Freq
1 Admitted   Male    A  512
2 Rejected   Male    A  313
3 Admitted Female    A   89
4 Rejected Female    A   19
5 Admitted   Male    B  353
6 Rejected   Male    B  207
> xtabs(Freq ~ Admit, data = A)
Admit
Admitted Rejected
    1755     2771
Now we calculate the value of the test statistic.
> phat <- 1755/(1755 + 2771)
> (phat - 0.4)/sqrt(0.4 * 0.6/(1755 + 2771))
[1] -1.680919
Our test statistic is not less than −2.33, so it does not fall into the critical region. Therefore, we fail to reject the null hypothesis that the true proportion of students admitted to graduate school is 40%, and we say that the observed data are consistent with the officer’s claim at the α = 0.01 significance level.
Example 10.5. We are going to do Example 10.4 all over again. Everything will be exactly
the same except for one change. Suppose we choose significance level α = 0.05 instead of
α = 0.01. Are the 1973 data consistent with the officer’s claim?
Our null and alternative hypotheses are the same. Our observed test statistic is the same: it
was approximately −1.68. But notice that our critical value has changed: α = 0.05 and −z0.05
is
> -qnorm(0.95)
[1] -1.644854
Our test statistic is less than −1.64 so it now falls into the critical region! We now reject
the null hypothesis and conclude that the 1973 data provide evidence that the true proportion of
students admitted to the graduate school of UCB in 1973 was significantly less than 40%. The
data are not consistent with the officer’s claim at the α = 0.05 significance level.
What is going on, here? If we choose α = 0.05 then we reject the null hypothesis, but
if we choose α = 0.01 then we fail to reject the null hypothesis. Our final conclusion seems
to depend on our selection of the significance level. This is bad; for a particular test, we
never know whether our conclusion would have been different if we had chosen a different
significance level.
Or do we?
Clearly, for some significance levels we reject, and for some significance levels we do not. Where is the boundary? That is, what is the significance level at which we would reject for any larger significance level and fail to reject for any smaller one? This boundary value has a special name: it is called the p-value of the test.
Definition 10.6. The p-value, or observed significance level, of a hypothesis test is the probability, when the null hypothesis is true, of obtaining the observed value of the test statistic (such as p̂) or values more extreme – meaning, in the direction of the alternative hypothesis³.

³ Bickel and Doksum [7] state the definition particularly well: the p-value is “the smallest level of significance α at which an experimenter using [the test statistic] T would reject [H0 ] on the basis of the observed [sample] outcome x”.
Example 10.7. Calculate the p-value for the test in Examples 10.4 and 10.5.
The p-value for this test is the probability of obtaining a z-score equal to our observed test
statistic (which had z-score ≈ −1.680919) or more extreme, which in this example is less than
the observed test statistic. In other words, we want to know the area under a standard normal
curve on the interval (−∞, −1.680919]. We can get this easily with
> pnorm(-1.680919)
[1] 0.04638932
We see that the p-value is strictly between the significance levels α = 0.01 and α = 0.05.
This makes sense: it has to be bigger than α = 0.01 (otherwise we would have rejected H0 in
Example 10.4) and it must also be smaller than α = 0.05 (otherwise we would not have rejected
H0 in Example 10.5). Indeed, p-values are a characteristic indicator of whether or not we would
have rejected at assorted significance levels, and for this reason a statistician will often skip the
calculation of critical regions and critical values entirely. If (s)he knows the p-value, then (s)he
knows immediately whether or not (s)he would have rejected at any given significance level.
Thus, another way to phrase our significance test procedure is: we will reject H0 at the
α-level of significance if the p-value is less than α.
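This rephrased procedure is one comparison in code. Here is a Python sketch using the UCB numbers from Examples 10.4 and 10.5 (the text's pnorm becomes NormalDist().cdf here):

```python
from math import sqrt
from statistics import NormalDist

# UCB data from the text: 1755 admitted, 2771 rejected
admitted, rejected = 1755, 2771
n = admitted + rejected
phat = admitted / n
z = (phat - 0.4) / sqrt(0.4 * 0.6 / n)  # about -1.6809, as in the text

p_value = NormalDist().cdf(z)           # lower tail, since H1: p < 0.4
print(round(p_value, 4))                # about 0.0464

for alpha in (0.01, 0.05):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(alpha, decision)
```

As expected, one p-value settles both examples at once: we fail to reject at α = 0.01 and reject at α = 0.05.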
Remark 10.8. If we have two populations with proportions p1 and p2 then we can test the null
hypothesis H0 : p1 = p2 .
Table Here.
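The two-sample test of Remark 10.8 can likewise be sketched. The standard large-sample statistic pools the two samples to estimate the common proportion under H0 : p1 = p2 (the counts below are hypothetical, chosen only to illustrate):

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2):
    # Large-sample z statistic for H0: p1 = p2, using the pooled estimate
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided H1: p1 != p2
    return z, p_value

# Hypothetical: 40/200 defectives before vs 25/250 after a process change
z, p = two_prop_z(40, 200, 25, 250)
print(round(z, 3), round(p, 4))
```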