Engineering Statistics
Chapter 3: Distribution of Samples
Distribution of sample statistics
3B - Variance

Sample Variance
• It is easy to estimate the mean of a sample, because it makes sense to expect the sample mean to be close to the population mean. This is true even for small samples.
• The sample variance, however, changes quite a lot depending on the size of the sample.
• If a sample of size $n$ is drawn from a normal population, then the ratio of the sample variance $s^2$ to the population variance $\sigma^2$, scaled by $n-1$, follows the $\chi^2$-distribution with $n-1$ degrees of freedom, i.e. $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.

Introducing $\chi^2$
• Statisticians found that the variance of a sample follows a certain distribution, called the $\chi^2$-distribution.
• This distribution is highly skewed to the right. Its shape depends very much on $n$, the size of the sample.
• We are seldom interested in the probability of a sample variance taking a certain value. Rather, we are usually more interested in, say, the lower and upper limits of the variance.

The Graph of $\chi^2$
[Figure: graph of the $\chi^2$-distribution]

$\chi^2$ tables
• The UTM $\chi^2$ tables list the values of $\chi^2$ at $\alpha$ = 0.001, 0.005, 0.010, 0.025, 0.05, 0.10, 0.25, 0.5, 0.70, 0.90, 0.95, 0.975, 0.99, 0.995 for $\nu$ = 1, 2, …, 120.
• Unlike the t-distributions, $\chi^2$-distributions are highly skewed, so the table gives values for both the left and right ends of the distribution.

$\chi^2$ table: $P(\chi^2 \ge k) = \alpha$
Case $\alpha < 0.5$: the value of $k$ is large.
Case $\alpha > 0.5$: the value of $k$ is small.

Interpreting $\chi^2$ values
ν \ α    0.001    0.005    0.010    0.025   …   0.975   0.990   0.995
1       10.827    7.879    6.635    5.024   …   0.001   0.000   0.000
3       16.266   12.838   11.345    9.348   …   0.216   0.115   0.072
6       22.457   18.548   16.812   14.449   …   1.237   0.872   0.676

Example: The values of $\chi^2$ are read separately for $\alpha$ near 0 and $\alpha$ near 1.
$\nu = 1$: $P(\chi^2 > 7.879) = 0.005$; $P(\chi^2 > 0.000) = 0.995$.
$\nu = 3$: $P(\chi^2 > 12.838) = 0.005$; $P(\chi^2 > 0.072) = 0.995$.
$\nu = 6$: $P(\chi^2 > 16.812) = 0.010$; $P(\chi^2 > 0.872) = 0.990$.

Example 1
The standard deviation of the life of a certain car battery is 5.8 months.
A repair shop just received a sample of 7 batteries. Find the 95% confidence interval of the standard deviation of the sample.
Solution: $(7-1)s^2/\sigma^2 \sim \chi^2_6$. At 95% confidence, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$. From the table, we read $\chi^2_{0.025,6} = 14.449$ and $\chi^2_{0.975,6} = 1.237$. So, with $\sigma = 5.8$,
$1.237 \le 6s^2/5.8^2 \le 14.449 \Rightarrow 2.63 \le s \le 9.00$.

Example 2
A pharmaceutical company claims that the standard deviation of its 200 mg Vitamin C tablets is 24 mg or less. If we check 21 such tablets, what is the 90% confidence interval of the standard deviation?
Solution: $(21-1)s^2/\sigma^2 \sim \chi^2_{20}$. At 90% confidence, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$. From the table, for 20 degrees of freedom, we read $\chi^2_{0.05,20} = 31.410$ and $\chi^2_{0.95,20} = 10.851$. So, taking $\sigma = 24$ mg,
$10.851 \le 20s^2/24^2 \le 31.410 \Rightarrow 17.68 \text{ mg} \le s \le 30.08 \text{ mg}$.

Example 3
• The annual report of HHH Restaurant shows that the mean sale of its franchises for the last quarter is RM 5.6 m, with a standard deviation of RM 1.25 m. Estimate the 95% confidence intervals for the (i) mean, and (ii) standard deviation for 20 restaurants managed by Ali & Co.

Example 3 (i) Solution
• Since the population mean and standard deviation are given, we use the normal distribution $\bar{X} \sim N(5.6,\ 1.25^2/20)$ to model the sample mean. At 95% confidence, $\alpha/2 = 0.025$ and $Z_{0.025} = 1.96$. So the interval for the sample mean is
$5.6 - 1.96\sqrt{1.25^2/20} \le \bar{X} \le 5.6 + 1.96\sqrt{1.25^2/20}$
$5.05 \le \bar{X} \le 6.15$
The range is from RM 5.05 m to RM 6.15 m.

Example 3 (ii) Solution
• We model the variance using $(n-1)s^2/\sigma^2 \sim \chi^2_{19}$. At 95% confidence, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$. From the $\chi^2$-table, we have $\chi^2_{0.025,19} = 32.852$ and $\chi^2_{0.975,19} = 8.907$. So the inequality is
$8.907 \le 19 s^2/1.25^2 \le 32.852$
$0.7325 \le s^2 \le 2.7016$
$0.856 \le s \le 1.644$.
The range for the standard deviation is RM 0.856 m to RM 1.644 m.

Population variance from Sample variance
• As for the mean, we usually need to estimate the variance of the population from a sample. In this case, we use the same $\chi^2_{n-1}$ distribution for $(n-1)s^2/\sigma^2$.
• The interval for $\sigma^2$ can be obtained directly from the inequality $\chi^2_{1-\alpha/2,\,n-1} \le (n-1)s^2/\sigma^2 \le \chi^2_{\alpha/2,\,n-1}$, or from the inverse inequality $1/\chi^2_{\alpha/2,\,n-1} \le \sigma^2/[(n-1)s^2] \le 1/\chi^2_{1-\alpha/2,\,n-1}$. The result is the same.

Example 4: Using $(n-1)s^2/\sigma^2$
From a sample of 10 food samples, it was found that the mean content of a certain poison is 13.5 g with standard deviation 3.7 g. At the 90% level, find the confidence intervals of the mean and SD of the poison content for the food.
Solution: For the mean, we use the t-distribution with 9 (= 10 - 1) degrees of freedom, since the sample size is small.
Mean: At the 90% level, $\alpha = 0.1$ and $\alpha/2 = 0.05$. Referring to Table 7, $t_{0.05,9} = 1.833$. Hence the mean should lie between $13.5 - 1.833 \times 3.7/\sqrt{10}$ and $13.5 + 1.833 \times 3.7/\sqrt{10}$. So we conclude that the mean is between 11.36 g and 15.64 g.
Variance: Using the standard symbols, we have $(n-1)s^2/\sigma^2 \sim \chi^2_9$. At the 90% level, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$. From the table, we find $\chi^2_{0.05,9} = 16.919$ and $\chi^2_{0.95,9} = 3.325$. So
$3.325 \le 9 \times 3.7^2/\sigma^2 \le 16.919$.
From this, we obtain $2.70 \text{ g} \le \sigma \le 6.09 \text{ g}$.

Example 4: Using $\sigma^2/[(n-1)s^2]$
• Instead of using the distribution for $(n-1)s^2/\sigma^2$, we can use the form $\sigma^2/[(n-1)s^2] \sim 1/\chi^2_9$. At the 90% level, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$. The relation for the interval is
$1/\chi^2_{0.05,9} \le \sigma^2/[(n-1)s^2] \le 1/\chi^2_{0.95,9}$.
From the table, we have
$1/16.919 \le \sigma^2/(9 \times 3.7^2) \le 1/3.325$.
From the inequality, we obtain the same range $2.70 \text{ g} \le \sigma \le 6.09 \text{ g}$.

Example 5
• Consumers complain that the price of food varies a lot depending on where you live. To ascertain the variation in the price of a plate of fried rice, a survey is made at 75 stalls across the country. The standard deviation turns out to be 68 sen. Based on this survey, estimate the standard deviation of the price of a plate of fried rice for the whole country at the 95% level.

Example 5 (Solution)
• We model the standard deviation using the $\chi^2_{74}$ distribution: $(n-1)s^2/\sigma^2 \sim \chi^2_{74}$. However, the table does not provide for $\chi^2_{74}$. The alternative is to use the nearest value, which is $\chi^2_{75}$.
• At 95%, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$. From the table, we read $\chi^2_{0.025,75}$ = ______ and $\chi^2_{0.975,75}$ = ______.
• Note: When $\nu$ is large, the $\chi^2$ values for neighbouring degrees of freedom differ very little. Using $\chi^2_{75}$ instead of $\chi^2_{74}$ will not cause any significant discrepancy.

Example 6
• In a health screening, 14 students have their weights and heights taken. Their BMI values are calculated as follows:
32 25 26 22 18 28 35 25 18 22 29 33 20 26
Based on this set of data, find the 90% confidence intervals for the mean and standard deviation of the BMI for all students.
Note: In this case, the raw sample data are given. We are to infer the population parameters from the sample data.

Example 6 (Solution)
We first find the mean and SD using the calculator as follows:
First put your calculator in SD mode.
Enter each number using the M+ (called DATA) key.
After that, tap SHIFT, 2. You see three displays: $\bar{x}$, $x\sigma_n$ and $x\sigma_{n-1}$. The two SDs are called the population and sample SD respectively. Since the data come from a sample, you need to choose the sample SD.
Thus: mean $\bar{x} = 25.64$, standard deviation $s = 5.37$.

Example 6 (Solution)
I. Confidence interval for the mean:
The sample size 14 is small, so we model the population mean $\mu$ using the t-distribution with 13 degrees of freedom. At 90% confidence, $\alpha = 0.1$ and $\alpha/2 = 0.05$, so $t_{0.05,13} = 1.771$.
So the confidence interval for the mean is
$25.64 - 1.771\sqrt{5.37^2/14}$ to $25.64 + 1.771\sqrt{5.37^2/14}$,
i.e. 23.10 to 28.18.

Example 6 (contd)
II. Confidence interval for the standard deviation (using the variance)
For the variance, the model is $(n-1)s^2/\sigma^2 \sim \chi^2_\nu$, which in this case is $13 \times 5.37^2/\sigma^2 \sim \chi^2_{13}$.
At 90% confidence, $\alpha = 0.1$, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$.
$\chi^2_{0.05,13} = 22.362$, $\chi^2_{0.95,13} = 5.892$.
This means that
$5.892 \le 13 \times 5.37^2/\sigma^2 \le 22.362$
$4.094 \le \sigma \le 7.977$.
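
The steps of Example 6 can also be sketched in Python. This is only an illustrative sketch and not part of the original notes: the function names `mean_interval` and `sigma_interval` are made up for this example, and the critical values are typed in by hand from the t and $\chi^2$ tables instead of being computed.

```python
import math
import statistics


def mean_interval(xbar, s, n, t_crit):
    """Two-sided interval for the population mean, using a tabulated t value."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def sigma_interval(s, n, chi2_upper, chi2_lower):
    """Two-sided interval for the population SD.

    chi2_upper is the table value at alpha/2 (the larger one) and
    chi2_lower is the table value at 1 - alpha/2 (the smaller one),
    both read for n - 1 degrees of freedom.
    """
    scaled = (n - 1) * s ** 2  # (n-1) s^2
    return math.sqrt(scaled / chi2_upper), math.sqrt(scaled / chi2_lower)


# BMI data from Example 6
bmi = [32, 25, 26, 22, 18, 28, 35, 25, 18, 22, 29, 33, 20, 26]
n = len(bmi)
xbar = statistics.mean(bmi)   # sample mean
s = statistics.stdev(bmi)     # sample SD (divides by n - 1)

# Table values for 13 degrees of freedom at the 90% level
mean_lo, mean_hi = mean_interval(xbar, s, n, t_crit=1.771)
sd_lo, sd_hi = sigma_interval(s, n, chi2_upper=22.362, chi2_lower=5.892)

print(f"mean = {xbar:.2f}, s = {s:.2f}")
print(f"90% interval for the mean: {mean_lo:.2f} to {mean_hi:.2f}")
print(f"90% interval for the SD:   {sd_lo:.2f} to {sd_hi:.2f}")
```

The code keeps the unrounded value of $s$, so its limits can differ slightly in the last decimal place from a hand calculation that rounds $s$ to 5.37.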
Ratio of two variances
• When variances $s_1^2$ and $s_2^2$ are obtained from two samples, either of the same population or of two comparable populations, the ratio $s_1^2/s_2^2$ follows the F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom: $s_1^2/s_2^2 \sim F_{\nu_1,\nu_2}$.

F-distribution
• The F-distribution has two parameters: $\nu_1$, called the numerator, and $\nu_2$, the denominator.
• Because F-distributions are strongly skewed, to an extent that depends on the degrees of freedom, tables are given only for the right tail, at $\alpha$ = 0.001, 0.01, 0.025 and 0.05.
• We need to determine the F values for 0.999, 0.99, 0.975 and 0.95 ourselves, using the fact that if $s_1^2/s_2^2 \sim F_{\nu_1,\nu_2}$, then $s_2^2/s_1^2 \sim F_{\nu_2,\nu_1}$.

Reading the F-distribution table
• To find the value of $F_{0.05,5,7}$, say, you first look for the 0.05 table. Next you read the top row. This shows the numerator values. Locate 5.
• On the left, the first column shows the denominators. Locate 7. On this row, under the numerator 5, we see 3.97. So $F_{0.05,5,7} = 3.97$.

Obtaining the F-value for $1-\alpha$
• We note that $s_1^2/s_2^2 \sim F_{\nu_1,\nu_2} \Leftrightarrow s_2^2/s_1^2 \sim F_{\nu_2,\nu_1}$.
• From this relation, we obtain $F_{1-\alpha,\nu_1,\nu_2}$ as $1/F_{\alpha,\nu_2,\nu_1}$.
• For example, $F_{0.01,7,5} = 10.46$, so $F_{0.99,5,7} = 1/10.46 = 0.0956$.
• Conversely, if you need $F_{0.95,4,8}$, then read $F_{0.05,8,4} = 6.04$, so $F_{0.95,4,8} = 1/6.04 = 0.166$.

Example 7
(i) The standard deviation of sugar levels among men is 1.23 units. Find the 95% confidence interval of the standard deviation of the sugar levels for a sample of 24 men.
(ii) The standard deviation of sugar levels among a sample of 16 men is 1.23 units. Find the 95% confidence interval of the standard deviation of the sugar levels for another sample of 26 men.

7 (i) Solution
• This is a revision example of the $\chi^2$-distribution. We note that the population variance $\sigma^2$ is $1.23^2$. By theory, $(24-1)s^2/\sigma^2 \sim \chi^2_{23}$.
• At 95% confidence, $\alpha = 0.05$, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$. For 23 degrees of freedom, $\chi^2_{0.025,23} = 38.076$ and $\chi^2_{0.975,23} = 11.689$. So
$11.689 \le 23s^2/1.23^2 \le 38.076$
$0.877 \le s \le 1.583$.

7 (ii) Solution
• Here $1.23^2$ is the first sample variance $s_1^2$.
By theory, $s_1^2/s_2^2 \sim F_{15,25}$.
• At 95% confidence, $\alpha = 0.05$, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$.
• $F_{0.025,15,25} = 2.41$.
• To calculate $F_{0.975,15,25}$, we first read $F_{0.025,25,15} = 2.69$. Hence $F_{0.975,15,25} = 1/2.69 = 0.372$. So
$0.372 \le 1.23^2/s_2^2 \le 2.41$
$0.792 \le s_2 \le 2.017$.

Example 8
(i) The standard deviation of the monthly pay of a group of 10 workers in a factory is RM 115.65. What is the 90% confidence interval of the standard deviation of the monthly pay of 8 workers in a similar factory?
(ii) The Tourism Council finds the standard deviation of the spending among 15 tourists at a resort to be RM 223.45. Find the 98% confidence interval for the standard deviation of spending among 20 tourists in a similar resort.

8 (i) Solution
• We take $115.65^2$ as the second sample variance $s_2^2$, and we seek $s_1$. By theory, $s_1^2/s_2^2 \sim F_{7,9}$.
• At 90% confidence, $\alpha = 0.10$, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$.
• $F_{0.05,7,9} = 3.29$.
• For $F_{0.95,7,9}$ we read $F_{0.05,9,7} = 3.68$, so $F_{0.95,7,9} = 1/3.68 = 0.272$.
• So
$0.272 \le s_1^2/115.65^2 \le 3.29$
$60.32 \le s_1 \le 209.77$.

8 (ii) Solution
• Again we take $223.45^2$ as the second sample variance $s_2^2$, and we seek $s_1$. By theory, $s_1^2/s_2^2 \sim F_{19,14}$.
• At 98% confidence, $\alpha = 0.02$, $\alpha/2 = 0.01$ and $1-\alpha/2 = 0.99$.
• Unfortunately, the F-table does not give values for $F_{19,14}$, and neither do we have $F_{14,19}$.
• In this case, we take the nearest values, i.e. $F_{0.01,20,15} = 3.37$, and for $F_{0.99,19,14}$ we read $F_{0.01,15,20} = 3.09$, so $F_{0.99,19,14} \approx 1/3.09 = 0.3236$.
• Hence
$0.3236 \le s_1^2/223.45^2 \le 3.37$
$127.11 \le s_1 \le 410.20$.
At 98% confidence, the range of the standard deviation is RM 127.11 to RM 410.20.

Wide range for Variance
• We note that, unlike those for the mean, the confidence intervals for the variance are rather wide. Increasing the size of the sample does not significantly reduce the range of the variance. This is in the nature of things: variations in individual values cancel each other, bringing the mean closer to the expected value, but the variation itself remains, and so the variance stays large.
• In fact, when we have an unexpectedly small range of variance, we should suspect that some unusual factors have caused values to converge. This means the data are not natural and are suspect.
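
The two-sample calculations in Examples 7 (ii) and 8 all follow the same pattern, which can be sketched in Python as below. This is a hedged illustration, not the author's method: the function names are hypothetical, and the F values are typed in from the table, not computed.

```python
import math


def lower_f_from_table(f_upper_swapped):
    """Return F_{1-alpha, v1, v2} given the tabulated F_{alpha, v2, v1}."""
    return 1.0 / f_upper_swapped


def s1_interval(s2, f_upper, f_upper_swapped):
    """Interval for s1 when s1^2 / s2^2 ~ F_{v1, v2} and s2 is known.

    f_upper         : table value F_{alpha/2, v1, v2}
    f_upper_swapped : table value F_{alpha/2, v2, v1}
    """
    f_lower = lower_f_from_table(f_upper_swapped)
    return s2 * math.sqrt(f_lower), s2 * math.sqrt(f_upper)


def s2_interval(s1, f_upper, f_upper_swapped):
    """Interval for s2 when s1^2 / s2^2 ~ F_{v1, v2} and s1 is known."""
    f_lower = lower_f_from_table(f_upper_swapped)
    return s1 / math.sqrt(f_upper), s1 / math.sqrt(f_lower)


# Example 7 (ii): s1 = 1.23, F_{0.025,15,25} = 2.41, F_{0.025,25,15} = 2.69
lo7, hi7 = s2_interval(1.23, f_upper=2.41, f_upper_swapped=2.69)
print(f"95% interval for s2: {lo7:.3f} to {hi7:.3f}")

# Example 8 (i): s2 = 115.65, F_{0.05,7,9} = 3.29, F_{0.05,9,7} = 3.68
lo8, hi8 = s1_interval(115.65, f_upper=3.29, f_upper_swapped=3.68)
print(f"90% interval for s1: RM {lo8:.2f} to RM {hi8:.2f}")
```

The code uses the unrounded reciprocal $1/F$, so its lower limits can differ slightly from a hand calculation that first rounds the reciprocal to three decimals. Example 8 (ii) can be handled the same way by passing the nearest tabulated values, 3.37 and 3.09, together with the sample SD of RM 223.45.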