Chapter 8: Sampling Distributions and Estimation (Part 1)
Sampling Variation
Estimators and Sampling Distributions
Sample Mean and the Central Limit Theorem
Confidence Interval for a Mean (μ) with Known σ
Confidence Interval for a Mean (μ) with Unknown σ
Confidence Interval for a Proportion (π)
McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc.

Sampling Variation
• Sample statistic: a random variable whose value depends on which population items happen to be included in the random sample.
• Depending on the sample size, the sample statistic may represent the population well or may differ greatly from it.
• This sampling variation is easy to illustrate.

Sampling Variation
• Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants.
• The sample means ($\bar{x}_i$) tend to be close to the population mean ($\mu = 520.78$).
• The dot plots show that the sample means vary much less than the individual sample items.

Estimators and Sampling Distributions: Some Terminology
• Estimator: a statistic derived from a sample to infer the value of a population parameter.
• Estimate: the value of the estimator in a particular sample.
• Population parameters are represented by Greek letters and the corresponding statistics by Roman letters.

Estimators and Sampling Distributions: Examples of Estimators

Estimators and Sampling Distributions: Sampling Distributions
• The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken.
• An estimator is a random variable, since samples vary.
• Sampling error $= \hat{\theta} - \theta$

Estimators and Sampling Distributions: Bias
• Bias is the difference between the expected value of the estimator and the true parameter.
• $\text{Bias} = E(\hat{\theta}) - \theta$
• An estimator is unbiased if $E(\hat{\theta}) = \theta$.
• On average, an unbiased estimator neither overstates nor understates the true parameter.

Estimators and Sampling Distributions: Bias
• Sampling error is random, whereas bias is systematic.
• An unbiased estimator avoids systematic error.
Figure 8.4

Estimators and Sampling Distributions: Efficiency
• Efficiency refers to the variance of the estimator's sampling distribution.
• A more efficient estimator has smaller variance.
Figure 8.5

Estimators and Sampling Distributions: Consistency
• A consistent estimator converges toward the parameter being estimated as the sample size increases.
Figure 8.6

Sample Mean and the Central Limit Theorem: Central Limit Theorem (CLT) for a Mean
• If a random sample of size n is drawn from a population with mean μ and standard deviation σ, the distribution of the sample mean $\bar{x}$ approaches a normal distribution with mean μ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ as the sample size increases.
• If the population is normal, the distribution of the sample mean is normal regardless of sample size.

Sample Mean and the Central Limit Theorem
• If the population is exactly normal, then the sample mean follows a normal distribution.
• As the sample size n increases, the distribution of sample means narrows in on the population mean μ.
• If the sample is large enough, the sample means will have approximately a normal distribution even if the population is not normal.
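The CLT statements above can be checked by simulation. The sketch below is a minimal illustration, assuming Python's standard library and an exponential population with rate 1 (mean μ = 1, standard deviation σ = 1, strongly right-skewed); that population and the sample sizes are choices made for illustration, not taken from the slides. It draws many samples of size n, then compares the spread of the sample means with σ/√n and tracks how the skewness of the sample means shrinks as n grows.

```python
import math
import random
import statistics


def sample_means(n, reps=5000, seed=1):
    """Means of `reps` random samples of size n.

    Assumed population (illustration only): exponential with rate 1,
    which is right-skewed with mean mu = 1 and standard deviation sigma = 1.
    """
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]


def skewness(xs):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / sd) ** 3 for x in xs)


SIGMA = 1.0
for n in (2, 10, 30):
    means = sample_means(n)
    print(f"n={n:2d}  mean of means={statistics.fmean(means):.3f}  "
          f"sd of means={statistics.stdev(means):.3f}  "
          f"sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f}  "
          f"skewness={skewness(means):.2f}")
```

The mean of the sample means stays near μ = 1 for every n, the standard deviation of the means tracks σ/√n, and the skewness falls toward 0, which is the "approximately normal even if the population is not normal" behavior described above.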
Sample Mean and the Central Limit Theorem: Illustrations of Central Limit Theorem
• Symmetric population
• Skewed population

Sample Mean and the Central Limit Theorem: Example, Bottle Filling: Variation in X

Sample Mean and the Central Limit Theorem: Sample Size and Standard Error
• The standard error declines as n increases, but at a decreasing rate.
• Make the interval $\mu \pm z\frac{\sigma}{\sqrt{n}}$ small by increasing n.
• The distribution of sample means collapses onto the true population mean μ as n increases.

Sample Mean and the Central Limit Theorem: Illustration: All Possible Samples from a Uniform Population
• Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}.
• The population parameters are $\mu = 1.5$ and $\sigma = 1.118$.
• All 16 possible samples of size n = 2, drawn with replacement, are listed along with their means.
• The population is uniform, yet the distribution of all possible sample means has a peaked triangular shape.
• The CLT's predictions for the mean and standard error are $\mu_{\bar{x}} = \mu = 1.5$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1.118/\sqrt{2} = 0.7905$.
• The mean of the sample means is
$\bar{\bar{x}} = \frac{1(0.0) + 2(0.5) + 3(1.0) + 4(1.5) + 3(2.0) + 2(2.5) + 1(3.0)}{16} = \frac{24}{16} = 1.5$
• The standard deviation of the sample means is
$\sigma_{\bar{x}} = \sqrt{\frac{1(0.0-1.5)^2 + 2(0.5-1.5)^2 + 3(1.0-1.5)^2 + 4(1.5-1.5)^2 + 3(2.0-1.5)^2 + 2(2.5-1.5)^2 + 1(3.0-1.5)^2}{16}} = \sqrt{\frac{10}{16}} \approx 0.7906$
which agrees with the CLT prediction up to rounding.

Confidence Interval for a Mean (μ) with Known σ: What is a Confidence Interval?
• A sample mean $\bar{x}$ is a point estimate of the population mean μ.
• A confidence interval for the mean is a range $\mu_{\text{lower}} < \mu < \mu_{\text{upper}}$.
• The confidence level is the probability that the confidence interval contains the true population mean.
• The confidence level (usually expressed as a %) is the area under the curve of the sampling distribution between the interval limits.

Confidence Interval for a Mean (μ) with Known σ: What is a Confidence Interval?
• The confidence interval for μ with known σ is
$\bar{x} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z\frac{\sigma}{\sqrt{n}}$

Confidence Interval for a Mean (μ) with Known σ: Choosing a Confidence Level
• A higher confidence level leads to a wider confidence interval.
• Greater confidence implies loss of precision.
• 95% confidence is most often used.

Confidence Interval for a Mean (μ) with Known σ: Interpretation
• A confidence interval either does or does not contain μ.
• The confidence level quantifies the risk.
• If we constructed 100 confidence intervals at the 95% level from independent samples, approximately 95 of them would contain μ and approximately 5 would not.

Confidence Interval for a Mean (μ) with Known σ: Is σ Ever Known?
• Yes, but not very often.
• In quality control applications with ongoing manufacturing processes, we may assume that σ stays the same over time.
• In this case, confidence intervals are used to construct control charts to track the mean of a process over time.

Confidence Interval for a Mean (μ) with Unknown σ: Student's t Distribution
• Use the Student's t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.
• The confidence interval for μ (unknown σ) is $\bar{x} \pm t\frac{s}{\sqrt{n}}$, that is,
$\bar{x} - t\frac{s}{\sqrt{n}} < \mu < \bar{x} + t\frac{s}{\sqrt{n}}$

Confidence Interval for a Mean (μ) with Unknown σ: Student's t Distribution
• t distributions are symmetric and shaped like the standard normal distribution.
• The t distribution depends on the size of the sample.
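Returning to the known-σ case, the repeated-sampling interpretation of the confidence level (out of 100 intervals, about 95 contain μ) can be checked with a simulation. This is a minimal sketch, assuming Python's standard library and a hypothetical normal population with μ = 100 and σ = 15 treated as known, with n = 25; none of these values come from the slides. Each replication builds the interval $\bar{x} \pm z\sigma/\sqrt{n}$ and records whether it contains μ.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu=100.0, sigma=15.0, n=25, level=0.95, reps=100, seed=42):
    """Fraction of known-sigma intervals xbar +/- z*sigma/sqrt(n) that contain mu.

    mu, sigma and n are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)          # same width for every sample
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps


print(coverage())              # fraction of 100 intervals containing mu
print(coverage(reps=10000))    # with many replications, close to 0.95
```

With only 100 intervals the count varies from run to run around 95; any single interval either contains μ or it does not, exactly as the interpretation slide says.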
Figure 8.11

Confidence Interval for a Mean (μ) with Unknown σ: Degrees of Freedom
• Degrees of freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic.
• Degrees of freedom tell how many observations are used to calculate s, less the number of intermediate estimates used in the calculation (here one, the sample mean $\bar{x}$):
$\nu = n - 1$

Confidence Interval for a Mean (μ) with Unknown σ: Degrees of Freedom
• As n increases, the t distribution approaches the shape of the normal distribution.
• For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than one based on z.

Confidence Interval for a Mean (μ) with Unknown σ: Comparison of z and t
• For very small samples, t-values differ substantially from the normal.
• As degrees of freedom increase, the t-values approach the normal z-values.
• For example, for n = 31, the degrees of freedom are $\nu = 31 - 1 = 30$.
• What would the t-value be for a 90% confidence interval? For ν = 30, t = 1.697, whereas the corresponding z-value is 1.645.

Confidence Interval for a Mean (μ) with Unknown σ: Example, GMAT Scores Again
• Here are the GMAT scores from 20 applicants to an MBA program (Figure 8.13).
• Construct a 90% confidence interval for the mean GMAT score of all MBA applicants. The sample statistics are $\bar{x} = 510$ and $s = 73.77$.
• Since σ is unknown, use Student's t for the confidence interval with ν = 20 − 1 = 19 d.f.
• First find the t-value for 90% confidence (upper-tail area 0.05) from Appendix D.
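The GMAT interval is plain arithmetic once the critical value is known. A minimal sketch, assuming Python's standard library and taking t = 1.729 (19 d.f., 90% confidence) from Appendix D rather than computing it; `t_interval` is a hypothetical helper name that works from the summary statistics $\bar{x} = 510$, s = 73.77 and n = 20.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n).

    t_crit must be looked up for n - 1 degrees of freedom (Appendix D).
    """
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# GMAT example: xbar = 510, s = 73.77, n = 20, t = 1.729 from Appendix D
lower, upper = t_interval(510, 73.77, 20, 1.729)
print(f"{lower:.2f} < mu < {upper:.2f}")  # 481.48 < mu < 538.52
```

The half-width here is 1.729 × 73.77/√20 ≈ 28.52, the margin added to and subtracted from $\bar{x}$.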
Confidence Interval for a Mean (μ) with Unknown σ
• For a 90% confidence interval, use Appendix D to find $t_{0.05} = 1.729$ (19 d.f.).

Confidence Interval for a Mean (μ) with Unknown σ: Example, GMAT Scores Again
• The 90% confidence interval is:
$\bar{x} - t\frac{s}{\sqrt{n}} < \mu < \bar{x} + t\frac{s}{\sqrt{n}}$
$510 - 1.729\frac{73.77}{\sqrt{20}} < \mu < 510 + 1.729\frac{73.77}{\sqrt{20}}$
$510 - 28.52 < \mu < 510 + 28.52$
• We are 90% confident that the true mean GMAT score is within the interval 481.48 < μ < 538.52.

Confidence Interval for a Mean (μ) with Unknown σ: Confidence Interval Width
• Confidence interval width reflects
- the sample size,
- the confidence level, and
- the standard deviation.
• To obtain a narrower interval and more precision,
- increase the sample size, or
- lower the confidence level (e.g., from 90% to 80% confidence).

Confidence Interval for a Mean (μ) with Unknown σ: A "Good" Sample
• Here are five different samples of 25 births from a population of N = 4,409 births, with their 95% CIs.
• An examination of the samples shows that sample 5 has an outlier (Figure 8.15).
• The outlier is a warning that the resulting confidence interval may not be trustworthy. In this case, a larger sample size is needed.

Confidence Interval for a Mean (μ) with Unknown σ: Using Appendix D
• Beyond ν = 50, Appendix D shows d.f. in steps of 5 or 10.
• If the table does not give the exact degrees of freedom, use the t-value for the next lower d.f.
• This is a conservative procedure, since it makes the interval slightly wider.
• For d.f. above 150, use the z-value.

Confidence Interval for a Mean (μ) with Unknown σ: Using Excel
• Use Excel's function =TINV(probability, d.f.) to obtain a two-tailed value of t. Here, "probability" is 1 minus the confidence level; for the GMAT example, =TINV(0.10, 19) returns 1.729. (Figure 8.17)

Confidence Interval for a Mean (μ) with Unknown σ: Using MegaStat
• MegaStat gives you a choice of z or t and does all calculations for you.
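Python's standard library has no counterpart to Excel's =TINV (SciPy users would typically call `scipy.stats.t.ppf(1 - probability / 2, df)`). The sketch below is a self-contained stand-in, assuming only the standard library: `tinv` is a hypothetical helper named after the Excel function, and the method (integrating the t density with Simpson's rule, then bisecting for the two-tailed critical value) is a swapped-in numerical technique, not how Excel computes it.

```python
import math
from statistics import NormalDist


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def central_area(t, df, steps=2000):
    """Area between -t and +t: Simpson's rule on [0, t], doubled by symmetry."""
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 2 * total * h / 3


def tinv(probability, df):
    """Two-tailed critical t: `probability` is split equally between both tails.

    Hypothetical helper mirroring Excel's TINV(probability, d.f.) arguments.
    Bisection searches [0, 100], enough for common confidence levels.
    """
    target = 1 - probability
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(tinv(0.10, 19), 3))              # 1.729 (GMAT example, 19 d.f.)
print(round(tinv(0.10, 30), 3))              # 1.697 (30 d.f.)
print(round(NormalDist().inv_cdf(0.95), 3))  # 1.645 (z for 90% confidence)
```

The last two lines reproduce the z versus t comparison: even at 30 d.f., t remains slightly larger than z, so the t-based interval is slightly wider.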
Figure 8.18

Confidence Interval for a Mean (μ) with Unknown σ: Using MINITAB
• MINITAB also gives confidence intervals for the median and standard deviation. (Figure 8.19)

Confidence Interval for a Proportion (π)
• A proportion is a mean of data whose only values are 0 or 1.
• The Central Limit Theorem (CLT) states that the distribution of a sample proportion p = x/n approaches a normal distribution with mean π and standard deviation
$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$
• p = x/n is a consistent estimator of π.

Confidence Interval for a Proportion (π): Illustration: Internet Hotel Reservations
• Management of the Pan-Asian Hotel System tracks the percent of hotel reservations made over the Internet.
• The binary data are:
1 = reservation is made over the Internet
0 = reservation is not made over the Internet
• After the data were collected, it was determined that the proportion of Internet reservations is π = .20.

Confidence Interval for a Proportion (π): Illustration: Internet Hotel Reservations
• Here are five random samples of n = 20. Each p is a point estimate of π.
• Notice the sampling variation in the value of p.

Confidence Interval for a Proportion (π): Applying the CLT
• The distribution of a sample proportion p = x/n is symmetric if π = .50 and, regardless of π, approaches symmetry as n increases.
• As n increases, the statistic p = x/n more closely resembles a continuous random variable.
• As n increases, the distribution becomes more symmetric and bell-shaped.
• As n increases, the range of the sample proportion p = x/n narrows.
• The sampling variation can be reduced by increasing the sample size n.

Confidence Interval for a Proportion (π): When Is It Safe to Assume Normality?
• Rule of Thumb: The sample proportion p = x/n may be assumed to be normal if both nπ > 10 and n(1 − π) > 10.
• Sample size needed to assume normality: see Table 8.9.

Confidence Interval for a Proportion (π): Standard Error of the Proportion
• The standard error of the proportion $\sigma_p$ depends on π as well as on n.
• It is largest when π is near .50 and smaller when π is near 0 or 1.
• The formula for the standard error is symmetric: π and 1 − π give the same value. (Figure 8.22)
• Enlarging n reduces the standard error $\sigma_p$, but at a diminishing rate. (Figure 8.23)

Confidence Interval for a Proportion (π): Confidence Interval for π
• The confidence interval for π is
$p \pm z\sqrt{\frac{\pi(1-\pi)}{n}}$
where z is based on the desired confidence.
• Since π is unknown, the confidence interval for π based on p = x/n (assuming a large sample) is
$p \pm z\sqrt{\frac{p(1-p)}{n}}$
• z can be chosen for any confidence level. For example, z = 1.645 for 90%, z = 1.960 for 95%, and z = 2.576 for 99% confidence.

Confidence Interval for a Proportion (π): Example, Auditing
• A sample of 75 retail in-store purchases showed that 24 were paid in cash. What is p?
$p = x/n = 24/75 = .32$
• Is p normally distributed?
$np = (75)(.32) = 24$
$n(1-p) = (75)(.68) = 51$
Both are > 10, so we may assume normality.
• The 95% confidence interval for the proportion of retail in-store purchases that are paid in cash is:
$p \pm z\sqrt{\frac{p(1-p)}{n}} = .32 \pm 1.96\sqrt{\frac{.32(1-.32)}{75}} = .32 \pm .106$
$.214 < \pi < .426$
• We are 95% confident that this interval contains the true population proportion.
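The auditing example combines two steps: check the rule of thumb, then compute $p \pm z\sqrt{p(1-p)/n}$. A minimal sketch, assuming Python's standard library; `proportion_ci` is a hypothetical helper, and, as in the example, it applies the rule of thumb to the sample proportion p because π is unknown.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, level=0.95):
    """Large-sample interval p +/- z*sqrt(p(1-p)/n) for a population proportion.

    Raises ValueError when the rule of thumb (np > 10 and n(1-p) > 10,
    checked with the sample proportion p) fails.
    """
    p = x / n
    if not (n * p > 10 and n * (1 - p) > 10):
        raise ValueError(f"np = {n * p:.1f}, n(1-p) = {n * (1 - p):.1f}: "
                         "normal approximation is doubtful")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # 1.960 for 95%
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


low, high = proportion_ci(24, 75)      # auditing example: 24 cash out of 75
print(f"{low:.3f} < pi < {high:.3f}")  # 0.214 < pi < 0.426
```

Calling `proportion_ci(2, 20)` raises an error because np = 2, the small-sample situation where the binomial distribution is the safer route.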
Confidence Interval for a Proportion (π): Narrowing the Interval
• The width of the confidence interval for π depends on
- the sample size,
- the confidence level, and
- the sample proportion p.
• To obtain a narrower interval (i.e., more precision), either
- increase the sample size, or
- reduce the confidence level.

Confidence Interval for a Proportion (π): Using Excel and MegaStat
• To find a confidence interval for a proportion in Excel, use (for example)
=0.15-NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
=0.15+NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
Since NORMSINV(.95) = 1.645, these formulas give the lower and upper limits of a 90% confidence interval for p = .15 and n = 200.
• In MegaStat, enter p and n to obtain the confidence interval for a proportion. (Figure 8.23)
• MegaStat always assumes normality.
• If the sample is small, the distribution of p may not be well approximated by the normal. Confidence limits around p can then be constructed by using the binomial distribution. (Figure 8.24)

Confidence Interval for a Proportion (π): Polls and Margin of Error
• In polls and surveys, the half-width of the confidence interval when p = .50 is called the margin of error.
• Margins of error are usually reported for a 95% confidence interval assuming p = .50, the value that makes the standard error largest, so the stated margin is conservative.
• Each reduction in the margin of error requires a disproportionately larger sample size.

Confidence Interval for a Proportion (π): Rule of Three
• If no events occur in n independent trials, the upper 95% confidence bound for π is approximately 3/n.

Very Quick Rule
• A Very Quick Rule (VQR) for a 95% confidence interval when p is near .50 is $p \pm 1/\sqrt{n}$.

Applied Statistics in Business & Economics
End of Chapter 8A
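The polling rules on the final slides (margin of error at p = .50, the Rule of Three, and the Very Quick Rule) can be compared numerically. A minimal sketch, assuming Python's standard library; the sample sizes 100, 400 and 1600 are arbitrary illustrations, and `exact_zero_event_bound` is a hypothetical helper that solves $(1-\pi)^n = 0.05$, the exact one-sided 95% bound that the Rule of Three approximates.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def margin_of_error(n, p=0.5, z=Z95):
    """Half-width z*sqrt(p(1-p)/n); p = .50 gives the largest (most conservative) value."""
    return z * math.sqrt(p * (1 - p) / n)


def rule_of_three(n):
    """Approximate upper 95% bound for pi when no events occur in n trials."""
    return 3 / n


def exact_zero_event_bound(n):
    """Exact one-sided 95% bound: the pi for which P(0 events) = (1 - pi)**n = 0.05."""
    return 1 - 0.05 ** (1 / n)


for n in (100, 400, 1600):
    print(f"n={n:5d}  margin={margin_of_error(n):.4f}  "
          f"VQR 1/sqrt(n)={1 / math.sqrt(n):.4f}  "
          f"rule of three={rule_of_three(n):.4f}  "
          f"exact={exact_zero_event_bound(n):.4f}")
```

Each fourfold increase in n only halves the margin of error, which is why smaller margins require disproportionately larger samples. The VQR's $1/\sqrt{n}$ slightly overstates the exact $1.96 \times 0.5/\sqrt{n} = 0.98/\sqrt{n}$, and 3/n sits just above the exact zero-event bound.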