Confidence Interval Estimation
Week 8

Objectives
On completion of this module you should be able to:
• calculate and interpret confidence interval estimates for the mean and the proportion
• determine sample size for means and proportions
• understand the application of confidence intervals, particularly in auditing
• consider ethical issues relating to confidence interval estimation.

Confidence interval estimation
• Until now we have estimated population parameters using point estimates, that is, a single value.
• Last week we saw how these point estimates can vary from sample to sample.
• Using our understanding of the variability in the sampling distribution of the mean, we can now develop an interval estimate of the population mean.
• We construct this interval with a specified level of confidence (90%, 95% and 99% are common).

Confidence interval estimation of the mean (σ known)
• A confidence interval allows us to make an inference about the population based on data from a sample.
• Just as each sample results in a different point estimate of a parameter, each sample will also result in a different confidence interval.
• The level of confidence is given by (1 − α)100%, where α is the total proportion of the distribution lying in the tails outside the confidence interval (so for 95% confidence, α = 0.05).
• The chosen confidence level indicates how confident we can be (in the long run) that the interval resulting from our sample contains the true population parameter.
• For example, a 95% confidence level means that, in the long run, about 95 out of every 100 samples will produce an interval that contains the true population parameter value.
• The confidence interval for the mean (σ known) is:
  \( \bar{X} \pm Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \)  or  \( \bar{X} - Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \)
  where \( Z_{\alpha/2} \) is the standardised normal distribution value with an upper-tail probability of α/2.
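The formula above can be checked numerically. The sketch below is an illustration only (the function names `z_critical` and `z_interval` are hypothetical, not from the text). It uses the standard-library `statistics.NormalDist` to find \( Z_{\alpha/2} \) instead of reading a printed table, then applies \( \bar{X} \pm Z_{\alpha/2}\sigma/\sqrt{n} \) to the numbers from Example 8-1.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def z_critical(confidence: float) -> float:
    """Z_{alpha/2}: the standard normal value with upper-tail area alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(xbar: float, sigma: float, n: int, z: float) -> Tuple[float, float]:
    """Interval xbar +/- z * sigma / sqrt(n) for the mean when sigma is known."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        print(f"{level:.0%} confidence: Z = {z_critical(level):.4f}")
    # Example 8-1(a): sigma = 24, xbar = 150, n = 40, table value Z = 1.96
    low, high = z_interval(150, 24, 40, 1.96)
    print(f"95% CI: {low:.2f} to {high:.2f} minutes")
```

Running it prints Z = 1.6449, 1.9600 and 2.5758 for 90%, 95% and 99% confidence, followed by the 95% interval 142.56 to 157.44 minutes. The slides round the 99% value to 2.58, so hand calculations may differ slightly in the last decimal place.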
• For example, for a 95% confidence interval, 0.95/2 = 0.475 of the area lies on each side of the mean:
  [Figure: standard normal curve; area 0.475 on each side of 0 between −1.96 and +1.96, and 0.025 in each tail.]
• For a 99% confidence interval, 0.99/2 = 0.495:
  [Figure: standard normal curve; area 0.495 on each side of 0 between −2.58 and +2.58, and 0.005 in each tail.]

Example 8-1
A lecturer is interested in the amount of time it takes students to complete a particular assignment. It is known that the standard deviation is 24 minutes. The lecturer takes a random sample of 40 students and finds that the mean time to complete the assignment in this sample is 150 minutes.
(a) Set up a 95% confidence interval estimate for the population mean time taken to complete the assignment.

Solution 8-1
• We are given σ = 24, X̄ = 150 and n = 40.
• For a 95% confidence interval, Z = 1.96 and:
  \( \bar{X} \pm Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 150 \pm 1.96\dfrac{24}{\sqrt{40}} = 150 \pm 7.4377 \) (to 4 dec. pl.)
  \( 142.56 \le \mu \le 157.44 \)
• So the lecturer can be 95% confident that the true mean time taken to complete the assignment is between 142.56 and 157.44 minutes.
(b) Recalculate your answer to (a) based on a 90% confidence interval.
• For a 90% confidence interval, α = 0.10, so each tail holds 0.05 and Z = 1.645:
  [Figure: standard normal curve; area 0.45 on each side of 0 between −1.645 and +1.645, and 0.05 in each tail.]
  \( \bar{X} \pm Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 150 \pm 1.645\dfrac{24}{\sqrt{40}} = 150 \pm 6.2423 \) (to 4 dec. pl.)
  \( 143.76 \le \mu \le 156.24 \)
• So the lecturer can be 90% confident that the true mean time taken to complete the assignment is between 143.76 and 156.24 minutes.
(c) It was the lecturer's intention that this assignment take students no more than two hours to complete. Comment on how well the sampled students reflect this goal. Based on your results, what would you recommend to the lecturer?
• Even the wider interval (95%: 142.56 to 157.44) does not contain 120 minutes (2 hours).
• This sample indicates that the average time taken to complete the assignment is much greater than the lecturer intended.
• It seems unlikely that students are going to be able to complete the assignment in 2 hours.
• Recommend that the lecturer shortens the assignment or adjusts their expectations.
Some important points to note:
• In the long run, we would expect only 5% of samples to produce a 95% interval that does not contain the true mean time to complete the assignment.
• It is possible that this sample is one such case, BUT, given how far outside the interval the intended time fell, this seems unlikely.
• The confidence interval relates to the mean (average) time taken to complete the assignment.
• Some individual students will probably be able to complete the assignment within two hours, but this does not appear to be the case for the 'average' student.
(d) Suppose an extra class was offered to students which enabled them to better understand the goal of the assignment. This reduced the standard deviation for all students to only 12 minutes. What effect would this have on your answer to (a)?
• With σ = 12:
  \( \bar{X} \pm Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 150 \pm 1.96\dfrac{12}{\sqrt{40}} = 150 \pm 3.7188 \) (to 4 dec. pl.)
  \( 146.28 \le \mu \le 153.72 \)
• So the lecturer can be 95% confident that the true mean time taken to complete the assignment is between 146.28 and 153.72 minutes.
• Reducing the standard deviation results in a narrower confidence interval (halving σ halves the width).
• The extra class has enabled a more precise estimate of the true mean completion time (because completion times are less variable), BUT it has not reduced the assignment completion time (in this example)!

Confidence interval estimation of the mean (σ unknown)
• Just as we use X̄ to estimate μ, we use S to estimate σ.
• If X is a normally distributed random variable, then
  \( t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \)
  follows a t distribution with n − 1 degrees of freedom.
• The t distribution (or Student's t distribution) looks very similar to the normal distribution (bell-shaped and symmetrical) but has more area in the tails and less in the centre.
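To see why the heavier tails matter, here is a small simulation sketch (not from the text; the function name `tail_fractions`, the sample size n = 5, the number of repetitions and the seed are arbitrary choices for illustration). It draws repeated samples from a standard normal population (μ = 0, σ = 1) and counts how often the standardised sample mean falls beyond ±1.96, once using the known σ and once using the sample S.

```python
import random
from math import sqrt
from statistics import mean, stdev


def tail_fractions(n, reps=10_000, seed=1):
    """Draw `reps` samples of size n from N(0, 1) and return
    (fraction with |Z| > 1.96, fraction with |t| > 1.96)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    z_hits = 0
    t_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = mean(sample)  # population mean is 0, so xbar - mu = xbar
        z = xbar / (1.0 / sqrt(n))            # standardise with the known sigma = 1
        t = xbar / (stdev(sample) / sqrt(n))  # standardise with the sample S (n - 1 divisor)
        if abs(z) > 1.96:
            z_hits += 1
        if abs(t) > 1.96:
            t_hits += 1
    return z_hits / reps, t_hits / reps


if __name__ == "__main__":
    z_frac, t_frac = tail_fractions(5)
    print(f"n = 5: P(|Z| > 1.96) ~ {z_frac:.3f}, P(|t| > 1.96) ~ {t_frac:.3f}")
```

With n = 5 (4 degrees of freedom) the fraction for Z should come out close to 0.05, while the fraction for t is noticeably larger (roughly 0.12). This is why a larger critical value (about 2.776 for 95% confidence with 4 degrees of freedom) is needed when S replaces σ.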
• As the degrees of freedom increase, the t distribution approaches the standardised normal distribution.
• For sample sizes of 120 or more, there is little difference between Z and t (in which case the normal distribution is often used even when σ is unknown).
• Critical values for the t distribution (see Table E.3 in the text) depend on the degrees of freedom and the confidence level.
• Check that you can find the critical t values for yourself in the example that follows.
• Read the information on page 289 of the text, which discusses degrees of freedom.
• The confidence interval for the mean (σ unknown) is given by:
  \( \bar{X} \pm t_{n-1}\dfrac{S}{\sqrt{n}} \)  or  \( \bar{X} - t_{n-1}\dfrac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1}\dfrac{S}{\sqrt{n}} \)

Example 8-2
A group of students are concerned that bags of a particular brand of potato chips weigh less than the 50 grams claimed on the packaging. They take a random sample of 20 bags and find that the sample mean weight is 49.2 grams and the sample standard deviation is 1.1 grams. Calculate the 95% confidence interval based on their sample data. Do you think the students are justified in their belief? What potential problems are there with the way the students have conducted this experiment?

Solution 8-2
• We are given X̄ = 49.2 and S = 1.1.
• Since the population standard deviation is unknown (we only have a sample value), we use the t distribution in the confidence interval formula.
• The sample size is 20, so the degrees of freedom are n − 1 = 20 − 1 = 19.
  [Figure: t distribution with 19 degrees of freedom; central area 1 − α = 0.95 between t = −2.0930 and t = +2.0930, and α/2 = 0.025 in each tail.]
  \( \bar{X} \pm t_{n-1}\dfrac{S}{\sqrt{n}} = 49.2 \pm 2.0930\dfrac{1.1}{\sqrt{20}} = 49.2 \pm 0.5148 \)
  \( 48.69 \le \mu \le 49.71 \)
• So the students can be 95% confident that the true population mean weight of the bags of chips is between 48.69 and 49.71 grams.
• Since the whole interval lies below 50 grams, this sample suggests that the mean weight of the bags is less than 50 grams.
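The t interval for Example 8-2 can be reproduced with a short Python sketch. The function name `t_interval` is hypothetical. The critical value 2.0930 is passed in from the table rather than computed, because the Python standard library has no t-distribution quantile function (if SciPy is available, `scipy.stats.t.ppf(0.975, 19)` gives the same value to four decimal places).

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n) for the mean when sigma is unknown.

    t_crit is the t value with n - 1 degrees of freedom, read from a table.
    """
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    # Example 8-2: xbar = 49.2 g, S = 1.1 g, n = 20, t_19 = 2.0930 (95%)
    low, high = t_interval(49.2, 1.1, 20, 2.0930)
    print(f"95% CI: {low:.2f} to {high:.2f} grams")
    print("Is the 50 g claim inside the interval?", low <= 50 <= high)
```

This prints the interval 48.69 to 49.71 grams and `False` for the containment check, matching the conclusion that the claimed 50 grams lies above the interval.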
• We cannot be sure of this conclusion unless we know that this sample is representative of the population. For example:
  – Were the bags of chips randomly selected? (Consider the randomness of the choice of store, the location within stores, and the location of the store: town, city etc.)
  – Is a sample of 20 bags sufficiently large? (Doubtful!)
  – Were the weights of the bags measured accurately?
  – Are there factors that change the weight of the bags after packing (such as settling or changes in moisture content)?
• What other factors can you think of that might affect the results?

Confidence interval estimation of the proportion
• Recall that when both np and n(1 − p) are at least 5, the binomial distribution can be approximated by the normal distribution.
• In this case, the confidence interval estimate for the population proportion is:
  \( p_s \pm Z\sqrt{\dfrac{p_s(1 - p_s)}{n}} \)  or  \( p_s - Z\sqrt{\dfrac{p_s(1 - p_s)}{n}} \le p \le p_s + Z\sqrt{\dfrac{p_s(1 - p_s)}{n}} \)
  where
  – \( p_s = X/n \) = sample proportion (X = number of successes)
  – p = population proportion
  – Z = critical value from the standardised normal distribution
  – n = sample size

Example 8-3
A financial advice firm has been recommending a particular investment opportunity to a large number of its clients. It surveyed a random sample of 500 clients who took advantage of the investment and found that 408 of them are glad they made the investment. Construct both a 95% and a 99% confidence interval for the population proportion of clients who are glad they made the investment.

Solution 8-3
• \( p_s = \dfrac{X}{n} = \dfrac{408}{500} = 0.816 \)
• Both np and n(1 − p) are at least 5, so the normal distribution is appropriate.
• For the 95% confidence interval, Z = 1.96 and:
  \( p_s \pm Z\sqrt{\dfrac{p_s(1 - p_s)}{n}} = 0.816 \pm 1.96\sqrt{\dfrac{0.816(1 - 0.816)}{500}} = 0.816 \pm 0.033964 \) (to 6 dec. pl.)
  \( 0.782 \le p \le 0.850 \) (to 3 dec. pl.)
• So we can say with 95% confidence that the true population proportion of clients who were happy with the investment is between 0.782 and 0.850.
• For a 99% confidence interval, Z = 2.58 and:
  \( p_s \pm Z\sqrt{\dfrac{p_s(1 - p_s)}{n}} = 0.816 \pm 2.58\sqrt{\dfrac{0.816(1 - 0.816)}{500}} = 0.816 \pm 0.044708 \) (to 6 dec. pl.)
  \( 0.771 \le p \le 0.861 \) (to 3 dec. pl.)
• So we can say with 99% confidence that the true population proportion of clients who were happy with the investment is between 0.771 and 0.861.

Determining sample size
• Sample size has a major impact on the precision of statistical estimates.
• The chosen size is a balance between accuracy and cost.
• The statistician decides how large a sampling error is acceptable when estimating each of the parameters.
• Recall that a confidence interval for the mean is found via:
  \( \bar{X} \pm Z\dfrac{\sigma}{\sqrt{n}} \)
• The amount added to or subtracted from the sample mean is half the width of the interval; it represents the imprecision resulting from sampling error.
• The sampling error is therefore:
  \( e = Z\dfrac{\sigma}{\sqrt{n}} \)
• Rearranging this for n gives the sample size required to construct the confidence interval estimate for the mean:
  \( n = \dfrac{Z^2\sigma^2}{e^2} \)
• To determine sample size you must know:
  – the desired confidence level (which determines Z)
  – the acceptable sampling error (e)
  – the standard deviation (σ)

Example 8-4
A consumer watchdog organisation is interested in the mean amount charged per hour by accountants for their services. Based on studies in other similar countries, the standard deviation is believed to be $12.75. The organisation wants to estimate the mean amount charged per hour to within ±$4 with 95% confidence. What sample size is needed? If 99% confidence were required, what would the required sample size be?

Solution 8-4
• We are given e = 4 and σ = 12.75, and 95% confidence gives Z = 1.96.
• The sample size is therefore:
  \( n = \dfrac{Z^2\sigma^2}{e^2} = \dfrac{(1.96)^2(12.75)^2}{4^2} = 39.03 \approx 40 \)
• Important note: we always round up to the next whole number when determining sample size.
• So a sample of 40 accountants should be taken to be 95% confident that the estimate of the mean is within ±$4 of the true mean.
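Because the sample-size formula must always be rounded up, it is easy to slip when working by hand. A minimal Python sketch (the function name `sample_size_mean` is hypothetical) that uses `math.ceil` to round up, applied to Example 8-4:

```python
from math import ceil


def sample_size_mean(z, sigma, e):
    """Smallest n with Z * sigma / sqrt(n) <= e, i.e. n = Z^2 sigma^2 / e^2 rounded UP."""
    return ceil((z * sigma / e) ** 2)


if __name__ == "__main__":
    # Example 8-4: sigma = $12.75, e = $4
    print(sample_size_mean(1.96, 12.75, 4))  # 95% confidence
    print(sample_size_mean(2.58, 12.75, 4))  # 99% confidence
```

This prints 40 and 68. Using the unrounded 99% value 2.5758 gives about 67.41, which still rounds up to 68. One caveat of this sketch: if \( Z^2\sigma^2/e^2 \) works out to an exact whole number, floating-point rounding could occasionally push `ceil` up by one.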
• For 99% confidence, Z = 2.58.
• The sample size is therefore:
  \( n = \dfrac{Z^2\sigma^2}{e^2} = \dfrac{(2.58)^2(12.75)^2}{4^2} = 67.63 \approx 68 \)
• So a sample of 68 accountants should be taken to be 99% confident that the estimate of the mean is within ±$4 of the true mean.

Determining sample size
• Recall that a confidence interval for the proportion is found via:
  \( p_s \pm Z\sqrt{\dfrac{p_s(1 - p_s)}{n}} \)
• The sampling error is therefore:
  \( e = Z\sqrt{\dfrac{p(1 - p)}{n}} \)
• Rearranging this gives the sample size required to construct the confidence interval estimate for the proportion:
  \( n = \dfrac{Z^2 p(1 - p)}{e^2} \)
• To determine sample size you must know:
  – the desired confidence level (which determines Z)
  – the acceptable sampling error (e)
  – the true proportion of successes (p)
• Unfortunately we don't usually know p (that is normally what we are trying to estimate!).
• We can therefore:
  – use past information or relevant experience to provide an educated estimate of p, or
  – use p = 0.5, as this results in the largest sample size (often referred to as the 'most conservative estimate').

Example 8-5
The same group of students discussed in Example 8-2 discovered a flaw in their process for randomly selecting chip bags. They decide to conduct the experiment again, and this time to work out what sample size is needed to accurately estimate the population proportion of chip bags that are underweight. If they wish to be 95% confident that their estimated proportion is within ±0.025 of the population proportion, determine the required sample size.

Solution 8-5
• We have Z = 1.96 (95% confidence) and e = 0.025.
• Since we have no information about p, the most conservative estimate (p = 0.5) is chosen.
• The sample size should be:
  \( n = \dfrac{Z^2 p(1 - p)}{e^2} = \dfrac{(1.96)^2(0.5)(0.5)}{(0.025)^2} = 1536.64 \approx 1537 \)
• So the students need to sample 1537 bags of chips for their estimate of the proportion to be within ±0.025 of the population proportion with 95% confidence.
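The same idea carries over to proportions. The sketch below (function names are hypothetical) computes the sampling error \( e = Z\sqrt{p(1-p)/n} \), which reproduces the margins in Example 8-3, and the rearranged sample-size formula, which reproduces Example 8-5. The argument `p` defaults to the conservative value 0.5.

```python
from math import ceil, sqrt


def proportion_margin(p, n, z):
    """Sampling error e = Z * sqrt(p(1 - p) / n) for a proportion."""
    return z * sqrt(p * (1 - p) / n)


def sample_size_proportion(z, e, p=0.5):
    """n = Z^2 p(1 - p) / e^2, rounded UP; p = 0.5 gives the most conservative n."""
    return ceil(z ** 2 * p * (1 - p) / e ** 2)


if __name__ == "__main__":
    print(round(proportion_margin(0.816, 500, 1.96), 6))  # Example 8-3, 95% margin
    print(round(proportion_margin(0.816, 500, 2.58), 6))  # Example 8-3, 99% margin
    print(sample_size_proportion(1.96, 0.025))             # Example 8-5
    print(sample_size_proportion(1.96, 0.05))              # larger sampling error allowed
```

This prints 0.033964, 0.044708, 1537 and 385. Note how doubling the allowed error e cuts the required sample size to roughly a quarter, since n depends on \( 1/e^2 \).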
• Note that we could reduce the sample size dramatically by:
  – allowing a larger sampling error (e.g. e = 0.05 reduces the sample size to 385)
  – using a better estimate of p (perhaps based on the results of the first survey, if these are sufficiently reliable; we weren't given this information, however, so we can't try this here).
• But care is needed: these options might reduce the effectiveness of the survey!

Applications of confidence interval estimation in auditing
• Auditing makes use of statistical sampling to estimate the total amount.
• The point estimate for the population total is:
  \( \text{Total} = N\bar{X} \)
  where N is the population size.
• The confidence interval estimate for the total is:
  \( N\bar{X} \pm N\,t_{n-1}\dfrac{S}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}} \)

Example 8-6
A budget eyewear store (which sells frames for glasses, lens cleaner, cases, cloths etc.) is conducting its end-of-quarter inventory of stock. It was determined that there were 1296 items in stock, of which a sample of 100 was randomly selected. An audit found that the mean value of the merchandise in the sample was $196 and the standard deviation was $67.50. Based on this information, find the 95% confidence interval estimate of the total value of the merchandise in inventory at the end of the quarter.

Solution
• We are given N = 1296, n = 100, X̄ = 196 and S = 67.50, and 95% confidence gives:
  \( t_{n-1,\,\alpha/2} = t_{99,\,0.025} = 1.9842 \)
• The 95% confidence interval estimate is:
  \( N\bar{X} \pm N\,t_{n-1}\dfrac{S}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}} = 1296(196) \pm 1296(1.9842)\dfrac{67.5}{\sqrt{100}}\sqrt{\dfrac{1296 - 100}{1296 - 1}} = 254016 \pm 16681.1092 \) (to 4 dec. pl.)
  $237,334.89 ≤ Population total ≤ $270,697.11
• So the store can be 95% confident that the total value of the merchandise is between $237,334.89 and $270,697.11.
• Note that this sample has produced quite wide bounds on the total merchandise value, caused by a relatively large standard error.
• This result may indicate that the store needs to value every item in stock or, perhaps, simply take a larger sample!
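The auditing interval for a total is the t interval for the mean scaled up by N, with the finite population correction \( \sqrt{(N-n)/(N-1)} \) applied to the standard error. A Python sketch reproducing Example 8-6 (the function name `total_interval` is hypothetical, and the critical value 1.9842 is again taken from the table):

```python
from math import sqrt


def total_interval(N, n, xbar, s, t_crit):
    """Interval N*xbar +/- N * t * (s / sqrt(n)) * sqrt((N - n) / (N - 1))."""
    fpc = sqrt((N - n) / (N - 1))  # finite population correction
    margin = N * t_crit * (s / sqrt(n)) * fpc
    point = N * xbar               # point estimate of the population total
    return point - margin, point + margin


if __name__ == "__main__":
    # Example 8-6: N = 1296, n = 100, xbar = $196, S = $67.50, t_99 = 1.9842
    low, high = total_interval(1296, 100, 196, 67.50, 1.9842)
    print(f"${low:,.2f} to ${high:,.2f}")
```

This prints $237,334.89 to $270,697.11, that is, 254016 ± 16681.1092. Computing the bounds in code avoids the easy slip of subtracting the margin incorrectly by hand.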
Applications of confidence interval estimation in auditing
• Difference estimation is used in auditing when there are believed to be errors in a set of items being audited.
• It allows the magnitude of the errors to be estimated from a sample.
• The average difference is:
  \( \bar{D} = \dfrac{\sum_{i=1}^{n} D_i}{n} \)
  where \( D_i \) = audited value − original value.
• The standard deviation of the differences is:
  \( S_D = \sqrt{\dfrac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}} \)
• The confidence interval estimate for the total difference is:
  \( N\bar{D} \pm N\,t_{n-1}\dfrac{S_D}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}} \)
• Organisations are often interested in determining an upper bound on the proportion of a certain event (such as a non-complying item).
• This requires a one-sided confidence interval for a proportion, whose upper bound is:
  \( p_s + Z\sqrt{\dfrac{p_s(1 - p_s)}{n}}\sqrt{\dfrac{N - n}{N - 1}} \)
  where Z is the critical value with a right-hand tail probability of α.

Ethical issues and confidence interval estimation
Ethical issues to consider:
• Are confidence intervals reported alongside point estimates? (Can you find examples in the media where this has not happened?)
• Is the sample size stated?
• Is the confidence interval interpreted so that a non-statistician can clearly understand it?
• Is every effort made to avoid ambiguity or misleading conclusions?

After the lecture each week…
• Review the lecture material
• Complete all readings
• Complete all of the recommended problems (listed in the SG) from the textbook
• Complete at least some of the additional problems
• Consider (briefly) the discussion points before tutorials