Engineering Statistics
Chapter 4 Hypothesis Testing
4B Testing on Variance & Proportion of Variances

Why test variance?
• In real life, things vary. Even under the strictest conditions, results differ from one another. This gives rise to variance.
• When we get a sample of data, apart from testing the mean to see whether it is reasonable, we also test the variance to see whether some factor has caused greater or smaller variation among the data.
• While a larger variance may indicate the influence of some undesirable element, certain factors also bring the data closer together, resulting in a smaller variance. Whether welcome or not, we need to know if these factors exist.

Differences
• In nature, variation exists. Even identical twins have some differences. This makes life interesting.
• In production, uniformity is usually desirable. Farmers try to find ways to grow tomatoes of the same size, or nearly so, so that they can grade them and pack them in standard packs.
• Manufacturers thrive on the uniformity of their products. It is the prerequisite of the conveyor-belt industry. It would be a nightmare if the cars coming off a production line were all different.

Nature against Production
• In real life, however, variation exists. When things come out uniform, it is a suspicious event.
• Hence if we find that a garden produces fruits of exactly the same size, it means that either there is interference in the process (for example, the fruits are kept in fixed-shape containers while growing) or the fruits are artificial.
• Another example of suspicious uniformity is when the signatures on many copies match exactly. This may be the result of tracing, in which case the signatures are not genuine.

Purpose of Variance Analysis
• Suppose we have introduced a procedure to make our product uniform. How can we be sure that the aim has been achieved, i.e. that the variance has become smaller?
• Conversely, if we find that the scores of students come out nearly the same, we wonder whether there has been copying among them, because normally we expect them to get different scores.
• These are the situations in which tests on variances are done.

Variation of variance
• When we collect sample data, the values nearly always vary on both sides of the mean. Such variation is part of the natural distribution. In general, we expect the variation (i.e. the standard deviation) to lie within a certain range. Both unusually large and unusually small variances are abnormal.
• Hence, in testing variance, we may test for large values or for small values. These correspond to right-tail and left-tail tests. Similarly, we may test whether the variance lies within a range, which is a two-tail test.

Modeling variance
• There are two ways in which variances are tested: either a sample variance against the assumed population variance, or the variances of two populations against each other.
• As we learned in Ch 3 (3B), the sample variance follows the $\chi^2$-distribution when compared to the population variance. Hence, we test the sample variance $s^2$ against the population variance $\sigma^2$ using the distribution $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.

Left and right-tails
• As there is a great difference between the left and right tails of a $\chi^2$-distribution, we need to read the two ends separately.
• As usual, if we are testing on the right tail, we read the $\chi^2$-value at $\alpha$. However, for the left tail, we read the value at $1-\alpha$.
• If we are comparing the variances of two samples, the distribution to be used is the F-distribution, with $n_1-1$, $n_2-1$ degrees of freedom.

Example 1
• The administrative manager of our company feels that when a customer's files have very different sizes, space is wasted. To avoid wasting the space allocated for common computer files in the office, he introduces a new format for all such files. After using the format for a month, the manager checks 25 such files and finds that the standard deviation of the sizes is 3.3 kb.
Previously the standard deviation was 5.4 kb. At the 90% level of confidence, test the hypothesis that the standard deviation has been reduced.

Example 1 (Analysis)
• Since there are 25 data points, we model the variance $s^2$ using a $\chi^2$-distribution with $\nu = 25-1 = 24$ degrees of freedom.
• The model is $(25-1)s^2/5.4^2 \sim \chi^2_{24}$.
• As we are testing whether the variance (hence the standard deviation) has gone down, we run a test on the left tail of the distribution.

The procedure
Null hypothesis: $\sigma^2 = 5.4^2$; Alternative hypothesis: $\sigma^2 < 5.4^2$.
Test statistic: $\chi^2 = 24s^2/5.4^2 \sim \chi^2_{24}$.
At 90% confidence, $\alpha = 0.1$, $1-\alpha = 0.9$, $\chi^2_{0.9,24} = 15.659$.
Hence the null hypothesis will be accepted if $\chi^2 \ge 15.659$.
$\chi^2 = 24 \times 3.3^2/5.4^2 = 8.963 < \chi^2_{\text{critical}}$.
So we reject the null hypothesis. The file standard deviation has indeed been reduced.
• As has been discussed before, the $\chi^2$-distribution is badly skewed. We accept the null hypothesis when $\chi^2$ is at least the critical value of 15.659. However, the value obtained is less than that, so we accept the alternative hypothesis.

The graph

Example 2
• The standard deviation of the price of a tray of 30 eggs is 32.5 sen. A check is made on the price at 38 outlets in a town, and the standard deviation is found to be 42.4 sen. At the 95% level of confidence, test the hypothesis that the prices in that town fluctuate more than 32.5 sen.
Solution: The distribution we should use is $(38-1)s^2/32.5^2 \sim \chi^2_{37}$.
• Unfortunately, the $\chi^2$ table does not provide for 37 degrees of freedom. We have to take the nearest tabulated value, i.e. $\chi^2_{40}$.

Example 2 (Analysis)
• The test is on whether the prices vary more in that town than in general. So our test is on the right tail.
Null hypothesis: $\sigma^2 = 32.5^2$; Alternative hypothesis: $\sigma^2 > 32.5^2$.
Test statistic: $\chi^2 = 37s^2/32.5^2 \sim \chi^2_{37}$.
(Note: Even though we are reading the value for $\chi^2_{40}$, technically we are still testing using $\chi^2_{37}$.)
At 95% confidence, $\alpha = 0.05$, $\chi^2_{0.05,37} \approx \chi^2_{0.05,40} = 55.758$.
Hence the null hypothesis will be accepted if $\chi^2 \le 55.758$.
In this case, the calculated value of $\chi^2$ is $37 \times 42.4^2/32.5^2 = 62.975 > \chi^2_{\text{critical}}$.
So we reject the null hypothesis. The standard deviation of the price of eggs in that town is higher than 32.5 sen.

Example 3
• A florist orders dendrobium orchids with a specified mean length of 85 cm and standard deviation of 5.8 cm. When she receives a supply, measurement of 22 stalks shows a mean of about 85.5 cm, which is acceptable. But the standard deviation is 6.7 cm, which differs from the specified value. At the 95% level of confidence, test the hypothesis that the standard deviation is not what is expected.
Solution: The distribution for the test is $(22-1)s^2/5.8^2 \sim \chi^2_{21}$.

Example 3 (Solution)
• Since we want to know whether the SD is as expected, we test on two tails.
Null hypothesis: $\sigma^2 = 5.8^2$; Alternative hypothesis: $\sigma^2 \ne 5.8^2$.
Test statistic: $\chi^2 = 21s^2/5.8^2 \sim \chi^2_{21}$.
At 95% confidence, $\alpha = 0.05$, $\alpha/2 = 0.025$, $1-\alpha/2 = 0.975$, $\chi^2_{0.025,21} = 35.479$, and $\chi^2_{0.975,21} = 10.283$.
Hence we shall accept the null hypothesis if $10.283 \le \chi^2 \le 35.479$.
$\chi^2 = 21 \times 6.7^2/5.8^2 = 28.023$, which is within the range of critical values.
Hence we accept the null hypothesis. This means the variance is acceptable.

Example 4

Example 5

Ratio of two variances
• When we compare the variances from two samples, the correct measure is the ratio of the two, tested using the F-distribution. As we saw in 3B, $s_1^2/s_2^2 \sim F_{\nu_1,\nu_2}$, where $n_1$ and $n_2$ are the sizes of the samples, $\nu_1 = n_1-1$, and $\nu_2 = n_2-1$.
• Similar to the test of one sample variance against the population variance, we may have a one-tail test (on the right or the left) or a two-tail test.

Example 6
• In order to compare the variation of income among the workers in two categories, a survey is conducted among 15 workers from category A and 31 from category C. It turns out that the standard deviation for A is RM 152.08 and that for C is RM 116.36. At the 95% level of confidence, test the hypothesis that the variation in incomes for the two groups is nearly the same.
Solution: We shall use the F-distribution with 14 numerator and 30 denominator degrees of freedom for the test, i.e. $s_1^2/s_2^2 \sim F_{14,30}$.
Example 6 (Solution)
• We run a two-tail test to find whether the variances are different.
Null hypothesis: $\sigma_1^2 = \sigma_2^2$; Alternative hypothesis: $\sigma_1^2 \ne \sigma_2^2$.
Test statistic: $F = s_1^2/s_2^2 \sim F_{14,30}$.
At 95% confidence, $\alpha = 0.05$, $\alpha/2 = 0.025$, $1-\alpha/2 = 0.975$.
From the table, $F_{0.025,14,30} = 2.31$ [actually $F_{0.025,15,30}$, the nearest tabulated entry]. However, for $F_{0.975,14,30}$, we need to calculate the value as follows:

Example 6 (contd)
• $F_{0.025,30,14} = 2.73$.
• $F_{0.975,14,30} = 1/F_{0.025,30,14} = 1/2.73 = 0.366$.
• Hence we shall accept the null hypothesis if $0.366 \le F_{\text{calculated}} \le 2.31$.
• $F_{\text{calculated}} = s_1^2/s_2^2 = 152.08^2/116.36^2 = 1.708$. This falls within the critical range. Hence we accept the null hypothesis. The variances (and hence the standard deviations) of the two categories of workers are not significantly different.

Example 7
• A medical director claims that the standard deviation of the time taken to treat patients has been cut down since he introduced a new procedure. The record shows that for 21 patients, the standard deviation of the treatment time is 6.8 minutes. In contrast, the standard deviation for 16 patients was 9.2 minutes before the procedure was used. At the 95% level of confidence, can you accept his claim?
Solution: Let $s_1$ and $s_2$ represent the standard deviations before and after the procedure was introduced.

Example 7 (Solution)
Null hypothesis: $\sigma_2^2 = \sigma_1^2$; Alternative hypothesis: $\sigma_2^2 < \sigma_1^2$.
Test statistic: $F = s_2^2/s_1^2 \sim F_{20,15}$.
This is a one-tail test. As we are testing for a reduction, we look at the left tail.
At 95% confidence, $\alpha = 0.05$ and $1-\alpha = 0.95$. From the table, $F_{0.05,15,20} = 2.20$. So $F_{0.95,20,15} = 1/2.20 = 0.455$.
Hence we shall accept the null hypothesis if $F_{\text{calculated}} \ge 0.455$, and reject it otherwise.
$F_{\text{calculated}} = 6.8^2/9.2^2 = 0.546 > F_{\text{critical}}$.
Hence we accept the null hypothesis. The change in the standard deviation is not significant at $\alpha = 0.05$.

Example 8
• A technician complains that the standard deviation of the ICs produced has actually gone up since the temperature in the production room was decreased by 2°C. This affects the quality of the ICs.
A measurement shows that the standard deviation of 12 new ICs is 32.2 μm, while 15 ICs from the previous batch have a standard deviation of 27.4 μm. At the 95% confidence level, can we support the technician?
Solution: Let $s_1$ and $s_2$ represent the standard deviations before and after the temperature was reduced.
Null hypothesis: $\sigma_2^2 = \sigma_1^2$; Alternative hypothesis: $\sigma_2^2 > \sigma_1^2$.

Example 8 (Solution)
Test statistic: $F = s_2^2/s_1^2 \sim F_{11,14}$.
This is a one-tail test on the right.
At 95% confidence, $\alpha = 0.05$. So we look for $F_{0.05,11,14}$. However, the table does not provide a value for $F_{0.05,11,14}$, so we read the values for $F_{0.05,10,14}$ (2.60) and $F_{0.05,12,14}$ (2.53) and take the average, giving 2.57.
Hence we shall accept the null hypothesis if $F_{\text{calculated}} \le 2.57$, and reject it otherwise.
$F_{\text{calculated}} = 32.2^2/27.4^2 = 1.381 < F_{\text{critical}}$.
So the null hypothesis is accepted. The data do not support the technician's complaint at this confidence level.
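
Checking the arithmetic
The test statistics in the examples above can be recomputed with a short script. The following is only a minimal sketch in Python, not part of the original slides: the function names (chi2_variance_stat, f_ratio_stat, reject_null) are hypothetical, and the critical values are typed in from the tables quoted in the examples rather than computed.

```python
def chi2_variance_stat(n, s, sigma0):
    """Chi-square statistic (n - 1) * s^2 / sigma0^2 for one sample variance.

    n: sample size, s: sample SD, sigma0: SD assumed under the null hypothesis.
    """
    return (n - 1) * s**2 / sigma0**2


def f_ratio_stat(s_num, s_den):
    """F statistic s_num^2 / s_den^2 for the ratio of two sample variances."""
    return s_num**2 / s_den**2


def reject_null(stat, lower=None, upper=None):
    """Return True when the statistic falls in the rejection region.

    Pass only `lower` for a left-tail test, only `upper` for a right-tail
    test, and both for a two-tail test. Critical values come from tables.
    """
    if lower is not None and stat < lower:
        return True
    if upper is not None and stat > upper:
        return True
    return False


# Example 1: left-tail test, table value 15.659 for 24 degrees of freedom.
ex1 = chi2_variance_stat(25, 3.3, 5.4)
print(round(ex1, 3), reject_null(ex1, lower=15.659))

# Example 3: two-tail test, table values 10.283 and 35.479 for 21 degrees of freedom.
ex3 = chi2_variance_stat(22, 6.7, 5.8)
print(round(ex3, 3), reject_null(ex3, lower=10.283, upper=35.479))

# Example 6: two-tail F test with limits 0.366 and 2.31.
ex6 = f_ratio_stat(152.08, 116.36)
print(round(ex6, 3), reject_null(ex6, lower=0.366, upper=2.31))

# Example 7: left-tail F test with limit 0.455 (post-procedure SD in the numerator).
ex7 = f_ratio_stat(6.8, 9.2)
print(round(ex7, 3), reject_null(ex7, lower=0.455))
```

Keeping the critical values as arguments mirrors the table-lookup procedure used in the examples. Where a statistics library is available, its quantile functions could replace the table values, but that is outside this sketch.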