8. Hypothesis Testing

8.1 Tests of Population Mean

Consider an automatic machine which bottles cola into 2-liter (2000 cc) bottles. Because of changes in working conditions, wear and tear, and variations in the process, the exact amount put into bottles will vary. So the machine needs to be checked periodically to ensure that it puts 2000 cc on the average into each bottle. Three different cases are possible here. We shall see them one by one.

Consumer protection requires the average amount to be at least 2000 cc so that consumers get their money’s worth. In this case the hypotheses would be formulated as

H0: $\mu \ge 2000$
Ha: $\mu < 2000$

To test the null hypothesis, let us say a random sample of 49 bottles is taken, and the bottles are tested for the exact amounts of their contents. From sampling theory, we know that since the sample is large the sample mean, $\bar{X}$, will be approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$. If the computed sample mean, $\bar{x}$, is greater than 2000 there is nothing to complain about and the null hypothesis (H0) would not be rejected. But if $\bar{x}$ is less than 2000 there is reason to doubt H0, and the lower it gets the more doubtful we become. At some point we might consider $\bar{x}$ too low to accept H0, and reject it. Because the rejection of H0 occurs when $\bar{x}$ is low, which corresponds to the left tail of the (normal) distribution of $\bar{X}$, we call this case a 1-tailed test with rejection on the left.

In rejecting the null hypothesis, we might be committing a Type I error. The probability of this occurring, though, is not as clear cut as in the case of Acceptance Sampling (with n = 1) that we saw in the previous section. What should be the p-value here? The convention is to declare the area to the left of $\bar{x}$ on the normal distribution for $\bar{X}$ as the p-value. The p-value of a piece of evidence should therefore be interpreted as the probability, when H0 is true, of the evidence being as unfavorable to H0 as, or more unfavorable to H0 than, it currently is.
It is also customary to calculate the p-value on the standard normal distribution after converting $\bar{x}$ into its z-score using the formula

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

Specifically, the p-value will be the area under the standard normal curve to the left of the computed z value. Since the statistic z decides the outcome of the test, it is called the test statistic. The maximum allowable p-value is $\alpha$, and its complement, $(1 - \alpha)$, is called the confidence level. Whenever the p-value is less than $\alpha$, H0 is rejected.

Let us carry out a sample calculation for the cola example. We shall assume that from past experience $\sigma$ is known to be 7. Suppose $\bar{x}$ is 1998, then the test statistic $z = (1998 - 2000)/(7/\sqrt{49}) = -2$. The p-value is then calculated on the spreadsheet using the formula =NORMSDIST(-2) which yields 0.0228 or 2.28%. Thus H0 will be rejected if $\alpha$ is 5% or 10%, but not if it is 1%.

Figure 8.1.1 shows the sheet named “Tests of mu” in Hypothesis Testing.xls. When the input data are entered into the shaded cells of this template, the test statistic z and the p-values appear in the designated places. Note that you, as the template user, should know whether it is a 2-tailed or 1-tailed left/right test.

Figure 8.1.1. Testing $\mu$ [Workbook: Hypothesis Testing.xls; Sheet: Testing mu]

The data currently entered in the template correspond to the example calculation above. The p-value for the 1-tailed test with rejection on the left, 0.0228, appears in cell F9.

When the population is finite, a finite population correction can be applied. The population size N should be entered in cell J5. The corrected z and p-values appear in their places (not visible in Figure 8.1.1). Note that the p-value has decreased. Thus a hypothesis that is not rejected when the correction was not applied might be rejected when the correction is applied.

Often, the population standard deviation $\sigma$ is unknown. In this case, we can conduct a t-test if the population is normal.
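The cola calculation above can also be reproduced outside the spreadsheet. What follows is a minimal Python sketch, not the template's own implementation. The function name `left_tail_z_test` and its arguments are illustrative assumptions, and `statistics.NormalDist().cdf` stands in for NORMSDIST. The optional finite population correction uses the standard factor $\sqrt{(N - n)/(N - 1)}$, which is assumed (the text does not confirm it) to be what the template applies, and the population size N = 500 is a made-up value.

```python
import math
from statistics import NormalDist


def left_tail_z_test(x_bar, mu0, sigma, n, pop_size=None):
    """Return (z, p_value) for H0: mu >= mu0 against Ha: mu < mu0.

    pop_size is optional; when given, the standard error is multiplied by
    the usual finite population correction sqrt((N - n) / (N - 1)).
    (Assumption: the template uses this same standard correction.)
    """
    std_error = sigma / math.sqrt(n)
    if pop_size is not None:
        std_error *= math.sqrt((pop_size - n) / (pop_size - 1))
    z = (x_bar - mu0) / std_error
    p_value = NormalDist().cdf(z)  # area to the left of z, like NORMSDIST(z)
    return z, p_value


# Cola example from the text: x_bar = 1998, mu0 = 2000, sigma = 7, n = 49.
z, p = left_tail_z_test(1998, 2000, 7, 49)
print(f"z = {z:.4f}, p-value = {p:.4f}")

# Hypothetical finite population of N = 500 bottles (not from the text).
z_fpc, p_fpc = left_tail_z_test(1998, 2000, 7, 49, pop_size=500)
print(f"corrected z = {z_fpc:.4f}, corrected p-value = {p_fpc:.4f}")

# Compare the uncorrected p-value with the usual significance levels.
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```

Because the correction factor is less than 1 whenever n > 1, it shrinks the standard error, pushes z further into the left tail, and lowers the p-value, which is the behavior the text describes.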
In a t-test, we substitute the sample standard deviation s in place of $\sigma$. The statistic

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

follows Student’s t-distribution with $(n - 1)$ degrees of freedom. A t-distribution with degrees of freedom = df has mean zero, variance $df/(df - 2)$ for df > 2, and is symmetric. As df increases it approaches the normal distribution. The bottom half of the template seen in Figure 8.1.1 contains the t-test. Note that the input area contains s instead of $\sigma$.

The second concern with the automatic bottling machine is the profit motive. Putting more than 2000 cc on the average into each bottle would waste cola and reduce profits. A special action, of stopping and resetting the machine, is necessary when the average is greater than 2000. In this case the hypotheses will be formulated as

H0: $\mu \le 2000$
Ha: $\mu > 2000$

Here the rejection will occur when the sample mean is too far above 2000, that is, when the test statistic z or t goes too far into the right tail. This case is therefore a 1-tailed test with rejection on the right. On the template shown in Figure 8.1.1, the p-value for this case must be read off from cell G9 or G18.

The third concern with the bottling machine is that of process control. To be in control, the average amount bottled should be neither too much nor too little. It should be as near 2000 as possible. The special action of stopping and resetting the machine is necessary on both tails of the test statistic. Here the hypotheses will be formulated as

H0: $\mu = 2000$
Ha: $\mu \ne 2000$

Since the rejection of H0 occurs on both tails, this test is called a 2-tailed test. On the template, the p-value for this case appears in cell E9 or E18.

8.2 Tests of Population Proportion

8.2.1 The Test

A z-test can be used to test hypotheses about population proportion p. The test statistic z is calculated using the formula

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$

where $\hat{p}$ is the sample proportion and $p_0$ is the hypothesized value for p. The test may be 1-tailed or 2-tailed. The template is shown in Figure 8.2.1.
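The three kinds of test described in this section (rejection on the left, on the right, and on both tails) and the proportion statistic can be sketched together in Python. This is only an illustration under stated assumptions: the function names `p_values_from_z` and `proportion_z` are hypothetical, the sample figures (n = 100, sample proportion 0.60, $p_0$ = 0.50) are made up and not taken from the text, and the template's cell layout is not reproduced.

```python
import math
from statistics import NormalDist


def p_values_from_z(z):
    """Return the p-values of a z statistic for the three kinds of test."""
    std_normal = NormalDist()
    left = std_normal.cdf(z)              # 1-tailed, rejection on the left
    right = 1.0 - std_normal.cdf(z)       # 1-tailed, rejection on the right
    two_tailed = 2.0 * min(left, right)   # 2-tailed, rejection on both tails
    return {"left": left, "right": right, "two-tailed": two_tailed}


def proportion_z(p_hat, p0, n):
    """z statistic for a sample proportion p_hat against a hypothesized p0."""
    return (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)


# Hypothetical data (not from the text): sample proportion 0.60 with n = 100.
z = proportion_z(p_hat=0.60, p0=0.50, n=100)
results = p_values_from_z(z)
print(f"z = {z:.4f}")
for kind, p_value in results.items():
    print(f"{kind:>10}: p-value = {p_value:.4f}")
```

As with the template, the user still has to know which of the three p-values applies to the hypotheses being tested; the sketch simply reports all three.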
Figure 8.2.1. Testing p [Workbook: Hypothesis Testing.xls; Sheet: Testing p]

When the input data are entered in the shaded cells, z and p-values appear in their designated places. The bottom portion is for applying finite population correction. The data currently entered are from the Cheese Spread case of Bowerman/O'Connell. The test is 2-tailed and no finite population correction is necessary. The p-value is 0.0000, or almost zero. Hence the null hypothesis is rejected. If finite population correction is needed, the value for population size, N, should be entered in cell B13 and the p-value should be read off from the range E15:G15.

8.3 Tests of Population Variance

Figure 8.3.1. Testing $\sigma^2$ [Workbook: Hypothesis Testing.xls; Sheet: Testing Variance]

When random samples are drawn from a normally distributed population, the sample statistic $(n - 1)s^2/\sigma^2$ follows a $\chi^2$ distribution with $(n - 1)$ degrees of freedom. Thus a $\chi^2$ test can be done for testing hypotheses regarding $\sigma^2$. Figure 8.3.1 shows the template to be used for this test. When the input data are entered into the shaded cells, the $\chi^2$ value and the p-values appear in their designated places.

8.4 The Power of a Test

8.4.1 The Power of a $\mu$ Test

Figure 8.4.1 shows the template that can be used for calculating and plotting the Power of a test. On this template, the type of test is selected in the drop down box. The template assumes that the critical value(s) for the sample mean is (are) decided according to the $\alpha$ value in cell B6. For particular values of actual $\mu$, the probability of Type II error and the Power are calculated in the range E4:I5. As seen in the range E3:G5, when $\mu$ = 2.985, the power of the test is almost 1, when $\mu$ = 2.99, it is 0.9913 and when $\mu$ = 2.995, it is 0.6433. As $\mu$ approaches $\mu_0$, the null hypothesis becomes "less and less false" and more and more difficult to detect as being false. Hence the power decreases. At worst, the power equals the $\alpha$ value used for the hypothesis test.

Figure 8.4.1.
Power of a $\mu$ Test [Workbook: Hypothesis Testing.xls; Sheet: Power of a mu test]

In the same template, an accompanying plot of Power versus $\mu$ (known as the power curve) shows how power varies with $\mu$. To create this plot, enter a meaningful starting value for $\mu$ in cell L2. Note how the power starts from almost 1 and approaches 0.05 (the value of $\alpha$ used for the hypothesis test) as $\mu$ approaches $\mu_0$. [When $\mu = \mu_0$, the null hypothesis is true and power is meaningless. But the template has been programmed to return a value of 1 for power and zero for probability of Type II error.]

8.4.2 The Power of a p Test

Figure 8.4.2 shows the template. Its use is similar to that of the previous template. When the input data are entered in the shaded cells and the type of test is selected from the drop down box, the results appear in the range E4:I5. The power curve is also plotted in the same template (not shown in Figure). Enter a meaningful starting value for p in cell L2 to create this curve.

Figure 8.4.2. Power of a p Test [Workbook: Hypothesis Testing; Sheet: Power of a p Test]

8.5 Sample Size Determination

An important practical decision in hypothesis testing is sample size determination. The objective of hypothesis testing is to limit the chances of Type I and Type II errors to specified maximums and at the same time minimize the sample size. Sample size determination is thus an optimization problem with Type I and Type II error constraints. These constraints will be specified as, “under such and such condition the chance of Type I/II error should not exceed such and such %.”

8.5.1 Optimal Sample Size for Testing $\mu$

Figure 8.5.1. Determining n to Test $\mu$ [Workbook: Hypothesis Testing.xls; Sheet: n for testing mu]

The template is shown in Figure 8.5.1. This template uses the Solver for optimizing the sample size. Instructions for using the Solver are in the template. Some points to note while using this template are:

1.
If there is only one constraint each for Type I and Type II Errors, then the suggested value for n in the cell B21 itself would be optimal. The Solver can then be used to find the optimal C. The formula for optimal n under these circumstances is

$$\text{Minimum } n = \left\lceil \left( \frac{(|z_0| + |z_1|)\,\sigma}{\mu_0 - \mu_1} \right)^2 \right\rceil$$

where $z_0$ is the critical z implied by the Type I Error constraint when $\mu = \mu_0$, and $z_1$ and $\mu_1$ similarly correspond to Type II Error constraint. The symbol $\lceil\ \rceil$ means rounding up to the nearest integer. Because there is only one constraint each for Type I and Type II Errors, the suggested value of 94 is itself optimal for n.

2. The objective in the current setup is to minimize n. A problem may have a total cost function based on Type I and Type II Error costs which may have to be minimized.

3. If the Solver is unable to find an optimal solution, check all the data input, especially the Type I and Type II Error constraints. Make sure n and c have been set to the suggested values. After any correction, re-run the Solver.

8.5.2 Optimal Sample Size for Testing p

The case of finding the optimal sample size for testing p is very similar to the case of $\mu$ seen in the previous section. Figure 8.5.2 shows the template. After entering the necessary data in the shaded cells, one may use the Solver to find the solution.

Figure 8.5.2. Optimal n for Testing p [Workbook: Hypothesis Testing.xls; Sheet: n for testing p]

While using this template, the following points may be noted.

1. If there is only one constraint each for Type I and Type II Errors, then the suggested value for n in the cell B18 itself would be optimal. The Solver can then be used to find the optimal value for x-critical. The formula for optimal n under these circumstances is

$$\text{Minimum } n = \left\lceil \left( \frac{|z_0|\sqrt{p_0(1 - p_0)} + |z_1|\sqrt{p_1(1 - p_1)}}{p_0 - p_1} \right)^2 \right\rceil$$

where $z_0$ is the critical z implied by the Type I Error constraint when $p = p_0$, and $z_1$ and $p_1$ similarly correspond to Type II Error constraint. The symbol $\lceil\ \rceil$ means rounding up to the nearest integer.

2.
The objective in the current setup is to minimize n. A problem may have a total cost function based on Type I and Type II Error costs which may have to be minimized.

3. If the Solver is unable to find an optimal solution, check all the data input, especially the Type I and Type II Error constraints. Make sure n and x-critical have been set to the suggested values. After any correction, re-run the Solver.

4. Because the binomial distribution is approximated by the normal distribution, a necessary assumption here is that the sample is large. At times this may not be satisfied. In Acceptance Sampling problems especially, the sample size is very likely to be small. One should then use the binomial template discussed in the next section.

8.6 Exercises

1. Do exercises 8-18 to 8-20 in the textbook.
2. Do exercises 8-55, 8-57 in the textbook.
3. Do exercises 8-76, 8-77 in the textbook.
4. Do exercises 8-81, 8-82 in the textbook.

8.7 Projects

1. A producer and a consumer of pins want to design a sampling plan that would be acceptable to both of them. The producer wants a lot that contains 1% defectives to have at least 99.5% probability of acceptance, and a lot that contains 2% defectives to have at least 98% probability of acceptance. The consumer wants a lot that contains 8% defectives to have at most 10% probability of acceptance, a lot that contains 10% defectives to have at most 2% probability of acceptance and a lot that contains 12% defectives to have at most 0.5% probability of acceptance. Find the optimal value for sample size n and acceptance number c (x-critical).

2. A sampling plan has sample size n = 100 and acceptance number c = 3. The probability distribution of the % defective (p) of incoming lots is given by the table below.

p     Prob
1%    0.3
2%    0.4
3%    0.2
4%    0.1

i. When a random lot is received, what is its probability of acceptance under the given sampling plan? [Hint: Construct a joint probability table.]

ii.
Given that a lot is accepted, what is the probability that it has 1% defectives? 2% defectives?

iii. What is the expected % defective in a lot accepted under this sampling plan? [This quantity is known as the Average Outgoing Quality (AOQ).]