LESSON SEVEN CONFIDENCE INTERVALS FOR MEANS AND PROPORTIONS

An interval estimate for $\mu$ of the form $\bar{x} \pm \text{a margin of error}$ provides the user with a measure of the uncertainty associated with the point estimate. One would expect the formula for the margin of error to take into consideration the factors that determine the variation in the values of the point estimate, such as the sample size $n$ and the population standard deviation $\sigma$.

Using the Central Limit Theorem ($n \ge 30$), we can calculate the interval that contains 95% of sample means:

$$\mu \pm 1{,}96\,\sigma_{\bar{x}} \quad \text{or} \quad \mu \pm \varepsilon, \quad \text{where } \varepsilon = 1{,}96\,\sigma_{\bar{x}}$$

and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ is the standard error of the mean. The interval $\mu \pm \varepsilon$ has a fixed centre, $\mu$, and total width $w = 2\varepsilon$, and it contains 95% of all sample means.

In estimation $\mu$ is unknown, so $\mu$ is replaced by its point estimate, $\bar{x}$. The substitution gives the interval

$$\bar{x} \pm 1{,}96\,\sigma_{\bar{x}} = \bar{x} \pm \varepsilon$$

The essential difference between the two intervals is that in the former the centre is fixed at $\mu$, whereas in the latter the centre is no longer fixed: it moves according to the value of each new point estimate, $\bar{x}$.

An interval estimate $\bar{x} \pm \varepsilon$ will contain $\mu$ if the sample mean $\bar{x}$ is one of the 95% of sample means that lie within the interval $\mu \pm \varepsilon$. An interval estimate will NOT contain $\mu$ if the sample mean $\bar{x}$ lies outside the interval $\mu \pm \varepsilon$. Each of the 95% of sample means that fall within a distance of $1{,}96\,\sigma_{\bar{x}}$ from $\mu$ results in an interval $\bar{x} \pm 1{,}96\,\sigma_{\bar{x}}$ that contains the population mean somewhere within it. Since 95% of such intervals contain $\mu$, we can state that we are 95% confident that the population mean, $\mu$, is in the interval.

The formula for an interval estimate for $\mu$ with any level of confidence may be deduced as a generalization of the 95% confidence interval above. In general, if we let the area in each tail be $\alpha/2$, then the corresponding Z-value is referred to as $Z_{\alpha/2}$; hence the margin of error is $\varepsilon = Z_{\alpha/2}\,\sigma_{\bar{x}}$.
Then $(1-\alpha) \times 100\%$ is called the "level of confidence" that the interval $\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$ contains $\mu$ somewhere within it. The $(1-\alpha)100\%$ confidence interval is given by the formula

$$\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$$

In some applications the population standard deviation $\sigma$ will be known. For example, the variation in the diameter of discs cut by a certain machine may have been established over a period of time. In other applications $\sigma$ will not be known. In such cases (provided $n \ge 30$), $\sigma$ is estimated by $s$, the point estimate calculated from the sample data. Hence, when $\sigma$ is unknown, the confidence interval for $\mu$ is

$$\bar{x} \pm Z_{\alpha/2}\,s_{\bar{x}}, \qquad s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

where $s_{\bar{x}}$ is the sample standard error of the mean.

EXAMPLE
An importer of herbs and spices claims that the average weight of packets of saffron is 20 g. A random sample of 36 packets of saffron is collected. From the sample, the average weight was calculated as 19,35 g. The population standard deviation of weights is known to be 1,8 g.
a) Calculate the 95% confidence interval for the population average weight, $\mu$.
b) Calculate the 99% confidence interval for the population average weight, $\mu$.
c) Estimate the range for the total weight of 50 packets of saffron with 95% confidence.

a) $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{1{,}8}{\sqrt{36}} = \dfrac{1{,}8}{6} = 0{,}3$ and $Z_{0,025} = 1{,}96$, so

$$\bar{x} \pm Z_{0,025}\,\sigma_{\bar{x}} = 19{,}35 \pm 1{,}96 \times 0{,}3 = 19{,}35 \pm 0{,}5880$$

From 18,762 to 19,938.

b) $Z_{0,005} = 2{,}5758$, so

$$\bar{x} \pm Z_{0,005}\,\sigma_{\bar{x}} = 19{,}35 \pm 2{,}5758 \times 0{,}3 = 19{,}35 \pm 0{,}7727$$

From 18,577 to 20,123.

c) Total weight of packets = number of packets $\times\ \bar{x}$. The mean weight per packet is between 18,762 and 19,938 g, with 95% confidence. Hence the total weight of 50 packets is between 938,1 and 996,9 g with 95% confidence.
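The saffron calculation can be reproduced with a short Python sketch. It is only an illustration: the function name `z_interval` is a hypothetical helper, and the Z-value is taken from the standard library's `statistics.NormalDist` rather than from printed tables, so the last digits may differ slightly from values computed with Z rounded to 1,96.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sd, n, confidence):
    """Two-sided z interval for a mean: mean +/- Z_{alpha/2} * sd / sqrt(n).

    `sd` is the known population standard deviation (or the sample
    standard deviation s when sigma is unknown and n >= 30).
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# Saffron example: x-bar = 19.35, sigma = 1.8, n = 36
low95, high95 = z_interval(19.35, 1.8, 36, 0.95)
low99, high99 = z_interval(19.35, 1.8, 36, 0.99)

print("95% CI:", round(low95, 3), round(high95, 3))
print("99% CI:", round(low99, 3), round(high99, 3))
# Part c): scale the 95% limits by the number of packets
print("Total for 50 packets:", round(50 * low95, 1), round(50 * high95, 1))
```

The same function covers the case where $\sigma$ is unknown: pass the sample standard deviation $s$ as `sd`, provided $n \ge 30$.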
One-sided confidence intervals

The lower limit, above which we are $(1-\alpha)100\%$ confident the population mean lies:

$$\bar{x} - Z_{\alpha}\frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \bar{x} - Z_{\alpha}\frac{s}{\sqrt{n}} \ (\text{if } \sigma \text{ is unknown})$$

The upper limit, below which we are $(1-\alpha)100\%$ confident the population mean lies:

$$\bar{x} + Z_{\alpha}\frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \bar{x} + Z_{\alpha}\frac{s}{\sqrt{n}} \ (\text{if } \sigma \text{ is unknown})$$

EXAMPLE
A property investor claims that the average rental income per room in student accommodation is at most £5.000 per year. The mean rent paid by a random sample of 36 students is £5.200, with a sample standard deviation of £735.
a) Calculate a 90% confidence interval for the true mean annual rental income.
b) Calculate the lower limit for a one-sided 95% confidence interval.

a) $s_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{735}{\sqrt{36}} = 122{,}5$ and $Z_{0,05} = 1{,}6449$, so

$$\bar{x} \pm Z_{0,05}\,s_{\bar{x}} = 5.200 \pm 1{,}6449 \times 122{,}5 = 5.200 \pm 201{,}5$$

From 4998,5 to 5401,5.

b) For a one-sided interval with $\alpha = 5\%$ the whole 5% lies in one tail, so $Z_{0,05} = 1{,}6449$. The 95% lower confidence limit is

$$\bar{x} - Z_{0,05}\,s_{\bar{x}} = 5.200 - 1{,}6449 \times 122{,}5 = 4998{,}5$$

Confidence intervals for proportions

We saw that the Central Limit Theorem (CLT) states that sample means are Normally distributed for $n \ge 30$, with mean $\mu$ and standard error $\sigma/\sqrt{n}$:

$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

Based on the CLT we derived the formula for the confidence interval for the mean as

$$\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$$

Similarly, the CLT states that sample proportions are Normally distributed for $n \ge 30$, with mean $\pi$ and standard error $\sqrt{\pi(1-\pi)/n}$:

$$p \sim N\left(\pi, \sqrt{\frac{\pi(1-\pi)}{n}}\right)$$

Based on the CLT, the formula for the confidence interval for the population proportion is given as

$$p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

EXAMPLE
In a poll of 200 voters, 88 stated that they will vote for the Green party candidate. Construct a 95% confidence interval. Comment on the precision of the interval.

$p = 88/200 = 0{,}44$ and $Z_{\alpha/2} = 1{,}96$, so

$$0{,}44 \pm 1{,}96\sqrt{\frac{0{,}44 \times 0{,}56}{200}} = 0{,}44 \pm 0{,}0688$$

The interval, from 0,3712 to 0,5088, is too wide to be of much practical use. Suppose that $p = 0{,}44$ but $n = 1.000$. The interval will be

$$0{,}44 \pm 1{,}96\sqrt{\frac{0{,}44 \times 0{,}56}{1.000}} = 0{,}44 \pm 0{,}0308$$

The precision of confidence intervals for means and proportions

It has already been noted that very wide interval estimates are of little practical use.
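The link between sample size and width can be explored numerically. The sketch below recomputes the one-sided rent limit and the margin of error for the poll at $n = 200$ and $n = 1.000$. The helper names (`z_value`, `lower_limit`, `proportion_margin`) are hypothetical, and 440 out of 1.000 is simply the count that keeps $p = 0{,}44$ in the supposed larger poll.

```python
from math import sqrt
from statistics import NormalDist


def z_value(tail_area):
    """Z-value that leaves `tail_area` in the upper tail of N(0, 1)."""
    return NormalDist().inv_cdf(1 - tail_area)


def lower_limit(mean, sd, n, confidence=0.95):
    """One-sided lower confidence limit: mean - Z_alpha * sd / sqrt(n)."""
    alpha = 1 - confidence
    return mean - z_value(alpha) * sd / sqrt(n)


def proportion_margin(successes, n, confidence=0.95):
    """Margin of error Z_{alpha/2} * sqrt(p(1-p)/n) for a sample proportion."""
    alpha = 1 - confidence
    p = successes / n
    return z_value(alpha / 2) * sqrt(p * (1 - p) / n)


# Rent example: x-bar = 5200, s = 735, n = 36, one-sided 95% lower limit
print("Lower limit:", round(lower_limit(5200, 735, 36), 1))

# Poll example: same proportion (0.44), two different sample sizes
for successes, n in ((88, 200), (440, 1000)):
    print("n =", n, "margin =", round(proportion_margin(successes, n), 4))
```

Multiplying the sample size by 5 divides the margin of error by $\sqrt{5} \approx 2{,}24$, not by 5, because $n$ enters the formula under a square root.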
It has been noted several times that increasing the sample size reduces the width of a confidence interval, that is, it improves the precision. To calculate the sample size required to give an interval estimate of a specified precision, return to the formulae used to calculate confidence intervals for means and proportions.

The precision of the confidence interval $\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$ can be written as

$$\varepsilon = Z_{\alpha/2}\,\sigma_{\bar{x}} = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Solving this equation for $n$ gives

$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{\varepsilon}\right)^2$$

This is the sample size for a $(1-\alpha)100\%$ confidence interval for $\mu$ with precision $\pm\varepsilon$.

Similarly, to calculate the sample size that will give a confidence interval for proportions with a specified precision ($\pm\varepsilon$), substitute the required value for $\varepsilon$ in the equation

$$\varepsilon = Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}, \quad \text{which gives} \quad n = \frac{Z_{\alpha/2}^2\,p(1-p)}{\varepsilon^2}$$

This is the sample size for a $(1-\alpha)100\%$ confidence interval for proportions with precision $\pm\varepsilon$. When $p$ is unknown, substitute $p = 0{,}5$: this maximises $p(1-p)$ and so gives the largest sample size that could be needed, which guarantees at least the required precision whatever the true proportion. In both cases, round the result up to the next whole number.

EXAMPLE
For the data in the saffron packets example, calculate the sample size that will give a 99% confidence interval for the population mean with a margin of error of ±0,5 when $\sigma = 1{,}8$.

$Z_{0,005} = 2{,}5758$, so

$$n = \left(\frac{2{,}5758 \times 1{,}8}{0{,}5}\right)^2 = 85{,}986$$

so a sample of 86 packets is required.

For the data of the Green party candidate, calculate the sample size that will give a 95% confidence interval with a margin of error of ±0,01 for the population proportion when $p$ is unknown.

$Z_{0,025} = 1{,}96$. Since $p$ is unknown, put $p = 0{,}5$:

$$n = \frac{(1{,}96)^2 \times 0{,}5 \times 0{,}5}{0{,}01^2} = 9604$$

Confidence intervals for differences between means and proportions

While the estimation of a single population mean or proportion is important, there are situations where we may be more interested in estimating the difference between two means or proportions. For example, we may be interested in whether the percentage that intend to vote for party B is higher than for party A, or whether commuting time is shorter by train than by car.
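Before moving on to differences, the two sample-size formulas from the precision section can be checked with a small Python sketch. The function names are hypothetical, the Z-values come from `statistics.NormalDist` rather than printed tables, and rounding up with `ceil` reflects the fact that a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, margin, confidence):
    """n = (Z_{alpha/2} * sigma / margin)^2, before rounding up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z * sigma / margin) ** 2


def sample_size_proportion(margin, confidence, p=0.5):
    """n = Z_{alpha/2}^2 * p(1-p) / margin^2, before rounding up.

    p defaults to 0.5, the value that maximises p(1-p), for use
    when the proportion is unknown.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p * (1 - p) / margin ** 2


# Saffron: sigma = 1.8, margin 0.5, 99% confidence
n_mean = sample_size_mean(1.8, 0.5, 0.99)
print("Mean:", round(n_mean, 3), "-> use n =", ceil(n_mean))

# Poll: margin 0.01, 95% confidence, p unknown
n_prop = sample_size_proportion(0.01, 0.95)
print("Proportion:", round(n_prop, 1), "-> use n =", ceil(n_prop))
```

Passing a prior estimate of the proportion as `p` (for example 0.44 from an earlier poll) gives a smaller required sample than the default 0.5.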
It was stated that the distribution of the difference between two independent Normal random variables is also Normal, with mean equal to the difference between the two means and variance equal to the sum of the variances. If

$$X_1 \sim N(\mu_1, \sigma_1^2) \quad \text{and} \quad X_2 \sim N(\mu_2, \sigma_2^2), \quad \text{then} \quad (X_1 - X_2) \sim N(\mu_1 - \mu_2,\ \sigma_1^2 + \sigma_2^2)$$

Similarly, sample means are Normally distributed ($n \ge 30$), so the distribution of the differences between every possible pair of sample means is given by

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

with $n_1$ and $n_2 \ge 30$. Hence, the $(1-\alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$ is

$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

If the sample sizes are 30 or more and $\sigma_1$ and $\sigma_2$ are unknown, they may be estimated by $s_1$ and $s_2$, and the confidence interval is

$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Strictly speaking, the t percentage point should be used when $\sigma$ is unknown, but the Z percentage point is a good approximation for large $n$.

Difference between proportions

The sample proportions are Normally distributed for $n_1$ and $n_2 \ge 30$ according to the Central Limit Theorem. Hence the difference between sample proportions is also Normally distributed:

$$p_1 - p_2 \sim N\left(\pi_1 - \pi_2,\ \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}\right)$$

The point estimate for the difference between two population proportions is $(p_1 - p_2)$, and the standard error for the difference between sample proportions is

$$s_{p_1 - p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

Hence the confidence interval for the difference between population proportions $(\pi_1 - \pi_2)$ is

$$(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

EXAMPLE
Designers of rowing equipment investigate the difference between the mean weights of male and female rowing teams. Random samples of male and female rowers are selected; the sample sizes, average weights and sample standard deviations are given below.

|               | Sample size | Sample mean | Sample standard dev. |
|---------------|-------------|-------------|----------------------|
| Male rowers   | 42          | 60,5        | 6,8                  |
| Female rowers | 30          | 52,6        | 4,5                  |

a) Calculate the 95% confidence interval for the difference in means between male and female rowers.
b) What inference can be drawn from your results about the difference between the population means, and about the difference between individuals in each population?

a) The difference between means is $(60{,}5 - 52{,}6) = 7{,}9$. The standard error is

$$\sqrt{\frac{(6{,}8)^2}{42} + \frac{(4{,}5)^2}{30}} = 1{,}3326$$

$Z_{0,025} = 1{,}96$. The confidence interval is $7{,}9 \pm 1{,}96 \times 1{,}3326 = 7{,}9 \pm 2{,}6119 = (5{,}2881;\ 10{,}5119)$.

b) We are 95% confident that the mean weight of male rowers exceeds the mean weight of female rowers by between 5,2881 and 10,5119. Although we are very confident that the mean for male rowers is greater than the mean for female rowers, we cannot assume that any individual male rower will be heavier than any individual female rower. This is because the variance for individual values is $n$ times greater than the variance for means.

EXAMPLE
The table below gives the results for polls taken in two localities.

|        | Sample size | Vote for Green party |
|--------|-------------|----------------------|
| Area A | 200         | 88                   |
| Area B | 160         | 54                   |

$p_A = 0{,}44$; $p_B = 0{,}3375$; $(p_A - p_B) = 0{,}1025$. The standard error is

$$\sqrt{\frac{0{,}44(1 - 0{,}44)}{200} + \frac{0{,}3375(1 - 0{,}3375)}{160}} = 0{,}0513$$

The 95% confidence interval is $0{,}1025 \pm 1{,}96 \times 0{,}0513 = 0{,}1025 \pm 0{,}1005 = (0{,}0020;\ 0{,}2030)$.
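Both two-sample intervals follow the same pattern: a point estimate of the difference plus or minus $Z_{\alpha/2}$ times a standard error formed from the sum of two variances. A minimal Python sketch, assuming independent large samples as in the text, is given below. The function names are hypothetical and the Z-value is computed with `statistics.NormalDist` instead of being read from tables.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_interval(m1, s1, n1, m2, s2, n2, confidence=0.95):
    """Large-sample z interval for mu1 - mu2 (independent samples)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # variances add, then one root
    diff = m1 - m2
    return diff - z * se, diff + z * se


def diff_proportions_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample z interval for pi1 - pi2 from two success counts."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Rowers: male (n=42, mean 60.5, s=6.8) vs female (n=30, mean 52.6, s=4.5)
low, high = diff_means_interval(60.5, 6.8, 42, 52.6, 4.5, 30)
print("Difference in means:", round(low, 4), round(high, 4))

# Polls: Area A 88 of 200, Area B 54 of 160
low, high = diff_proportions_interval(88, 200, 54, 160)
print("Difference in proportions:", round(low, 4), round(high, 4))
```

If the resulting interval excludes zero, the data support a difference between the two population parameters at the chosen confidence level.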