CHAPTER 7 ESTIMATES AND SAMPLE SIZES

ESTIMATION: AN INTRODUCTION

Introduction
We have come a long way. We started by learning what statistics is and what the two areas of applied statistics are. In Chapter 1, we learned that:
1. Descriptive statistics consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures.
2. Inferential statistics consists of methods that use samples to make decisions or predictions about the population.
In Chapters 2 and 3, we focused on descriptive statistics and learned how to draw tables, how to graph data, and how to calculate numerical summary measures such as the mean, median, mode, variance, and standard deviation. Now, in Chapter 7, we focus on inferential statistics. We begin by discussing estimation.

Definition
Estimation is a process for assigning value(s) to a population parameter based on information collected from a sample.

There are many real-life examples in which estimation is used. For example, we may want to estimate the:
1. Mean fuel consumption for a particular model of car.
2. Proportion of students who completed the MAT 12 course with a passing grade in the past 10 years.
3. Proportion of female high school students who dropped out of school because of pregnancy.
4. Percentage of all California lawyers disbarred for committing a criminal offense.

Of course, we could conduct a census to find the true mean or proportion of the population in items 1 through 4. However, from what we now know about a census, it would be:
1. Expensive.
2. Difficult, because every member of the population must be reached or contacted.
3. Time consuming.
Because of these problems with a census, a representative sample is generally drawn from the population and the appropriate sample statistic is calculated. A value is then assigned to the population parameter based on the calculated value of the sample statistic.
The value assigned to the population parameter based on the value of the sample statistic is called an estimate of the population parameter.

For example, the Mathematics Department draws a sample of 50 students from all students who have taken MAT 12 in the past 10 years. The department records the number of students who passed and failed the course, and calculates the sample proportion, $\hat{p}$, of students who passed the course to be 0.65. If the department assigns the value of the sample proportion, $\hat{p}$, to the population proportion, $p$, then 0.65 is called an estimate of $p$ and $\hat{p}$ is called the estimator.

Summary
The estimation procedure involves the following steps:
• Draw a sample from the population.
• Collect the required information from each element of the sample.
• Calculate the value of the sample statistic.
• Assign the value to the corresponding population parameter.
Note: The sample must be a simple random sample.

7-2 ESTIMATING A POPULATION PROPORTION

The estimated value of a population parameter can be based on either a point estimate or an interval estimate.

Point Estimate - Definition
A point estimate is the value of a sample statistic used to estimate a population parameter.

So, if we use the sample proportion, $\hat{p}$, as a point estimate of $p$, then we can say that the proportion of all students who completed the MAT 12 course with a passing grade in the past 10 years is about 0.65. That is,

Point estimate of a population parameter = Value of the corresponding sample statistic

We discussed in Chapter 6 that the value of a sample statistic varies from one sample to another, even when the samples are of the same size and drawn from the same population. Therefore:
1. The value assigned to the population proportion, $p$, based on a point estimate depends on the sample drawn.
2. The value assigned to the population parameter is almost always different from the true value of the population parameter.
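The dependence of a point estimate on the sample drawn can be illustrated with a short simulation. This is only a sketch: the population below is hypothetical (10,000 students with a true passing proportion of 0.65, a value assumed purely for illustration), and the helper name `sample_proportion` is not from the course material.

```python
import random


def sample_proportion(population, n, rng):
    """Draw a simple random sample of size n (without replacement) and return p-hat."""
    sample = rng.sample(population, n)
    return sum(sample) / n


# Hypothetical population (assumption): 1 = passed, 0 = failed, true p = 0.65.
population = [1] * 6500 + [0] * 3500
rng = random.Random(12)  # fixed seed so the run is repeatable

# Five samples of the same size from the same population.
estimates = [sample_proportion(population, 50, rng) for _ in range(5)]
print(estimates)  # five point estimates of p; they generally differ from one another
```

Each value printed is a legitimate point estimate of $p$, which is why a single point estimate is usually reported together with an interval.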
An Interval Estimation

Definition: An interval estimate is an interval built around the point estimate, together with a probabilistic statement that the interval contains the corresponding population parameter.

Following on from our example, rather than saying that the proportion of all students who passed MAT 12 in the last 10 years is 0.65, we would:
1. Add and subtract a number to and from 0.65 to obtain an interval, and then
2. Say that the interval contains the population proportion, $p$.

Now, let us add and subtract 0.2. Then we obtain the interval

$0.65 - 0.2 = 0.45$ to $0.65 + 0.2 = 0.85$

1. We state that the population proportion, $p$, is likely to be contained in the interval 0.45 to 0.85.
2. Equivalently, we state that the proportion of all students who passed MAT 12 in the past 10 years is between 0.45 and 0.85.
3. The value 0.45 is called the lower limit and 0.85 is called the upper limit.
4. The number we subtracted from and added to the point estimate is called the margin of error.
5. The value of the margin of error depends on:
   a. The standard deviation, $\sigma_{\hat{p}}$, of the sample proportion, $\hat{p}$.
   b. The level of confidence that we would like to attach to the interval.
6. So,
   a. The larger $\sigma_{\hat{p}}$ is, the greater the margin of error.
   b. To be more confident that the population proportion is contained in the interval, we have to use a higher confidence level.
   c. We add a probabilistic statement, so the interval is based on the confidence level.
   d. An interval constructed based on the confidence level is called a confidence interval.
7. A confidence interval is defined as

   Confidence interval = Point estimate ± Margin of error

   [Figure: number line showing the lower limit .45, the point estimate $\hat{p}$ = .65, and the upper limit .85]
8. The confidence level associated with a confidence interval is defined as

   Confidence level = $(1 - \alpha)100\%$

   When expressed as a probability, it is called the confidence coefficient, $1 - \alpha$, where $\alpha$ is the significance level.
This formula means that we have $(1 - \alpha)100\%$ confidence that the interval contains the true population proportion.

7.3-7.4 ESTIMATION OF A POPULATION MEAN: $\sigma$ KNOWN

The three possible cases for constructing a confidence interval for the population mean, $\mu$, when $\sigma$ is known are as follows:

I. We use the standard normal distribution to construct the confidence interval for $\mu$, with $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ (assuming that $n/N \le 0.05$), if:
   1. The standard deviation, $\sigma$, is known.
   2. The sample size is small, $n < 30$.
   3. The population is normally distributed, or at least close to normally distributed with no outliers.

II. We use the standard normal distribution to construct the confidence interval for $\mu$, with $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ (assuming that $n/N \le 0.05$), if:
   1. The standard deviation, $\sigma$, is known.
   2. The sample size is large, $n \ge 30$.
   By the central limit theorem, the sampling distribution of the sample mean is then approximately normal. However, we may not be able to use the standard normal distribution if the population distribution is very different from the normal distribution.

III. We use a nonparametric method to construct the confidence interval if:
   a. The standard deviation, $\sigma$, is known.
   b. The sample size is small, $n < 30$.
   c. The population is not normally distributed, or its distribution is unknown.

The rest of this section deals with Cases I and II. We will not cover Case III.

Formula
The $(1 - \alpha)100\%$ confidence interval for $\mu$ under Cases I and II is

$\bar{x} \pm z\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$,

and the margin of error is $E = z\sigma_{\bar{x}}$.

[Figure: Three Possible Cases]

Let us revisit the definition of the confidence level. Remember that

Confidence level = $(1 - \alpha)100\%$, where $\alpha$ is the significance level.

The $(1 - \alpha)100\%$ confidence level is the area under the normal curve of $\bar{x}$ between two points that lie on either side of $\mu$ at equal distance from $\mu$.

How to determine z given the confidence level
1. To find the two locations for z, first note the following:
The area between the two z values is $1 - \alpha$. Since $z_1$ and $z_2$ are the same distance from the mean, $\mu$, the sum of the areas to the left of $z_1$ and to the right of $z_2$ is $1 - (1 - \alpha) = \alpha$. Since the area to the left of $z_1$ and the area to the right of $z_2$ are equal,

Area to the left of $z_1$ = Area to the right of $z_2$ = $\alpha/2$

2. Using Table A-2, we can find the values of $z_1$ and $z_2$ that correspond to the required areas.
3. Note that $z_1$ and $z_2$ have the same magnitude but opposite signs.

Interpretation of confidence level
Let us consider 20 samples of the same size taken from the same population. Then:
1. Let us calculate the sample mean, $\bar{x}$, for each sample.
2. Let us then calculate the confidence interval for $\mu$ around each sample mean, $\bar{x}$, based on a confidence level of 90%.
3. [Figure: normal curve of the sampling distribution of $\bar{x}$, with the confidence intervals around $\bar{x}_1$, $\bar{x}_2$, and $\bar{x}_3$]
4. In the context of this example, we say that 90% of such intervals, like those around $\bar{x}_1$ and $\bar{x}_2$, will include $\mu$, and 10%, like the interval around $\bar{x}_3$, will not.

Width of a Confidence Interval
As stated previously, the $(1 - \alpha)100\%$ confidence interval is defined as

$\bar{x} \pm z\sigma_{\bar{x}}$, where $z\sigma_{\bar{x}}$ is the margin of error.

The width of the confidence interval therefore depends on $z\sigma_{\bar{x}}$, which in turn depends on:
1. z, which depends on the confidence level, and
2. $\sigma$ and $n$, because $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

Since $\sigma$ is outside the control of the investigators, the width of the confidence interval can only be controlled through z and n. Thus, the width is governed by the following relationships:
1. The value of z increases as the confidence level increases.
2. The value of z decreases as the confidence level decreases.
3. With n remaining constant, the higher the confidence level, the larger the width of the confidence interval.
4. An increase in the sample size causes a decrease in the width of the confidence interval.
In conclusion, we can reduce the width of a confidence interval by lowering the confidence level or by increasing the sample size.

Determining the Sample Size for the Estimation of Mean
Because of the problems associated with conducting a census or even a sample survey, we need a way to determine a sample size that will produce the required results without wasting effort or financial resources on surveying a larger sample than necessary.

Starting from the margin of error and solving for n:

$E = z\dfrac{\sigma}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \dfrac{z\sigma}{E} \;\Rightarrow\; n = \dfrac{z^2\sigma^2}{E^2}$

So, to find the appropriate sample size, n, we need:
• The confidence level, which determines z.
• The required margin of error, E, which is half the width of the confidence interval.
So, given a predetermined margin of error, we can find the sample size that will produce the required results; the computed value is rounded up to the next whole number. Note that if $\sigma$ is not known, one could take a small preliminary sample, calculate the sample standard deviation, s, and use s in lieu of $\sigma$ in the formula.

ESTIMATION OF A POPULATION MEAN: $\sigma$ KNOWN

Example #1 – Problem 8.10
Find z for each of the following confidence levels.
a) 90%
b) 95%

Example #1 – Solution
a) Given: $1 - \alpha = .90$, so $\alpha = .10$ and $\alpha/2 = .05$.
From Table IV, the value of z that corresponds to an area of .05 to the left of $z_1$ is $-1.64$ or $-1.65$. Also, the value of z that corresponds to an area of .05 to the right of $z_2$ is 1.64 or 1.65. Thus, the value of z that corresponds to a confidence level of 90% is 1.64 or 1.65.
[Figure: normal curve with area .90 between $z_1$ and $z_2$ and area .05 in each tail]

b) Given: $1 - \alpha = .95$, so $\alpha = .05$ and $\alpha/2 = .025$.
From Table IV, the value of z that corresponds to an area of .025 to the left of $z_1$ is $-1.96$. Also, the value of z that corresponds to an area of .025 to the right of $z_2$ (or 0.975 to the left of $z_2$) is 1.96. Thus, the value of z that corresponds to a confidence level of 95% is 1.96.
[Figure: normal curve with area .95 between $z_1$ and $z_2$ and area .025 in each tail]

Example #2
A data set obtained from a sample has n = 81 and $\bar{x}$ = 48.25. It is known that $\sigma$ = 4.8.
a) What is the point estimate of $\mu$?
b) Make a 95% confidence interval for $\mu$.
c) What is the margin of error of estimate for part b?
Example #2 – Solution
Given: $n = 81$, $\bar{x} = 48.25$, $\sigma = 4.8$; the population is normally distributed.

a) What is the point estimate of $\mu$?
Point estimate of $\mu$ = $\bar{x}$ = 48.25

b) Make a 95% confidence interval for $\mu$.
The confidence level is 95%. Hence, the area in each tail of the normal curve is $\alpha/2 = .05/2 = 0.025$.
Since the population is normally distributed, we can use the normal distribution to make the confidence interval. From Table IV, the value of z that corresponds to an area of 0.025 in each tail of the curve is 1.96.
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{4.8}{\sqrt{81}} = 0.5333$
The confidence interval for $\mu$ is
$\bar{x} \pm z\sigma_{\bar{x}} = 48.25 \pm 1.96(.5333) = 48.25 \pm 1.05 = 47.20$ to $49.30$

c) What is the margin of error?
$E = z\sigma_{\bar{x}} = 1.96(.5333) = 1.05$

Example #3
The standard deviation for a population is $\sigma$ = 14.8. A sample of 25 observations selected from this population gave a mean equal to 143.72. The population is known to have a normal distribution.
a) Make a 99% confidence interval for $\mu$.
b) Construct a 95% confidence interval for $\mu$.
c) Determine a 90% confidence interval for $\mu$.
d) Does the width of the confidence intervals constructed in parts a through c decrease as the confidence level decreases? Explain your answer.

Example #3 – Solution
Given: $n = 25$, $\bar{x} = 143.72$, $\sigma = 14.8$; the population is normally distributed.
Since the sample is drawn from a normally distributed population,
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{14.8}{\sqrt{25}} = 2.96$

a) Make a 99% confidence interval for $\mu$.
The confidence level is 99%. Hence, the area in each tail of the normal curve is $\alpha/2 = .01/2 = 0.005$.
Since the population is normally distributed, we can use the normal distribution to make the confidence interval. From Table IV, the value of z that corresponds to an area of 0.005 in each tail of the curve is 2.57 or 2.58.
Thus, the 99% confidence interval for $\mu$ is
$\bar{x} \pm z\sigma_{\bar{x}} = 143.72 \pm 2.58(2.96) = 143.72 \pm 7.64 = 136.08$ to $151.36$

b) Make a 95% confidence interval for $\mu$.
The confidence level is 95%. Hence, the area in each tail of the normal curve is $\alpha/2 = .05/2 = 0.025$.
From Table IV, the value of z that corresponds to an area of 0.025 in each tail of the curve is 1.96.
Thus, the 95% confidence interval for $\mu$ is
$\bar{x} \pm z\sigma_{\bar{x}} = 143.72 \pm 1.96(2.96) = 143.72 \pm 5.80 = 137.92$ to $149.52$

c) Make a 90% confidence interval for $\mu$.
The confidence level is 90%. Hence, the area in each tail of the normal curve is $\alpha/2 = .10/2 = 0.05$.
From Table IV, the value of z that corresponds to an area of 0.05 in each tail of the curve is 1.64 or 1.65.
Thus, the 90% confidence interval for $\mu$ is
$\bar{x} \pm z\sigma_{\bar{x}} = 143.72 \pm 1.65(2.96) = 143.72 \pm 4.88 = 138.84$ to $148.60$

d) Yes. As the confidence level decreases, the z value decreases, and so does the width of the interval.

Example #4
For a population, the value of the standard deviation is 4.96. A sample of 32 observations taken from this population produced the following data.

74 85 72 73 86 81 77 60
83 78 79 88 76 73 84 78
81 72 82 81 79 83 88 86
78 83 87 82 80 84 76 74

a) What is the point estimate of $\mu$?
b) Make a 99% confidence interval for $\mu$.
c) What is the margin of error of estimate for part b?
Example #4 – Solution
Given: $n = 32$ and, from the given data, $\bar{x} = \dfrac{2543}{32} = 79.4688$.
Although the distribution of the population is not known, the sampling distribution of $\bar{x}$ is approximately normal by the central limit theorem, because the sample size is large ($n \ge 30$). Thus,
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{4.96}{\sqrt{32}} = .8768$

a) What is the point estimate of $\mu$?
Point estimate of $\mu$ = $\bar{x}$ = 79.4688

b) Make a 99% confidence interval for $\mu$.
The confidence level is 99%. Hence, the area in each tail of the normal curve is $\alpha/2 = .01/2 = 0.005$.
Since the sampling distribution of $\bar{x}$ is approximately normal, we can use the normal distribution to make the confidence interval. From Table IV, the value of z that corresponds to an area of 0.005 in each tail of the curve is 2.57 or 2.58.
Thus, the 99% confidence interval for $\mu$ is
$\bar{x} \pm z\sigma_{\bar{x}} = 79.4688 \pm 2.58(.8768) = 79.4688 \pm 2.2621 = 77.21$ to $81.73$

c) What is the margin of error?
$E = z\sigma_{\bar{x}} = 2.58(.8768) = 2.2621$

Example #5
For a population data set, $\sigma$ = 14.50.
a) What should the sample size be for a 98% confidence interval for $\mu$ to have a margin of error of estimate equal to 5.50?
b) What should the sample size be for a 95% confidence interval for $\mu$ to have a margin of error of estimate equal to 4.25?

Example #5 – Solution
Given: $\sigma = 14.50$

a) Given that the confidence level is 98% and $E = 5.50$, find the sample size.
The area in each tail under the normal curve is $\alpha/2 = .02/2 = 0.01$.
From Table IV, the value of z that corresponds to an area of 0.01 in each tail under the normal curve is 2.33. Thus,
$n = \dfrac{z^2\sigma^2}{E^2} = \dfrac{(2.33)^2(14.50)^2}{(5.5)^2} = 37.73 \approx 38$

b) Given that the confidence level is 95% and $E = 4.25$, find the sample size.
The area in each tail under the normal curve is $\alpha/2 = .05/2 = 0.025$.
From Table IV, the value of z that corresponds to an area of 0.025 in each tail under the normal curve is 1.96.
Thus,
$n = \dfrac{z^2\sigma^2}{E^2} = \dfrac{(1.96)^2(14.50)^2}{(4.25)^2} = 44.72 \approx 45$

Example #6
Inside the Box Corporation makes corrugated cardboard boxes. The label on one type of these boxes states that its breaking capacity is 75 pounds. Fifty-five randomly selected boxes of this type were loaded until they broke. The average breaking capacity of these boxes was found to be 78.52 pounds. Suppose that the standard deviation of the breaking capacities of all such boxes is 2.63 pounds. Calculate a 99% confidence interval for the average breaking capacity of all boxes of this type.

Example #6 – Solution
Given: $\bar{x} = 78.52$, $\sigma = 2.63$, $n = 55$
Since the sample is large ($n \ge 30$), we can assume that the sampling distribution of $\bar{x}$ is approximately normal. Then,
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{2.63}{\sqrt{55}} = .3546$
The confidence level is 99%. Hence, the area in each tail under the normal curve is $\alpha/2 = .01/2 = 0.005$.
Since the sampling distribution of $\bar{x}$ is approximately normal, we can use the normal distribution to make the confidence interval for $\mu$. From Table IV, the value of z that corresponds to an area of 0.005 in each tail under the normal curve is 2.57 or 2.58.
Thus, the 99% confidence interval for $\mu$ is
$\bar{x} \pm z\sigma_{\bar{x}} = 78.52 \pm 2.58(.3546) = 78.52 \pm .9149 = 77.61$ to $79.43$ pounds

ESTIMATION OF A POPULATION MEAN: $\sigma$ NOT KNOWN

The three possible cases for constructing a confidence interval for the population mean, $\mu$, when $\sigma$ is unknown are as follows:

I. We use the t distribution to construct the confidence interval for $\mu$ if:
   1. The standard deviation, $\sigma$, is unknown.
   2. The sample size is small, $n < 30$.
   3. The population is normally distributed.

II. We use the t distribution to construct the confidence interval for $\mu$ if:
   1. The standard deviation, $\sigma$, is unknown.
   2. The sample size is large, $n \ge 30$.

III. We use a nonparametric method to construct the confidence interval for $\mu$ if:
   1. The standard deviation, $\sigma$, is unknown.
   2. The sample size is small, $n < 30$.
   3. The population is not normally distributed.
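The case lists for $\sigma$ known and $\sigma$ not known reduce to a small decision rule, sketched below. The function name `choose_method` and its string return values are illustrative assumptions, not part of the course material, and the sketch does not model the caveat that a large sample may still be insufficient when the population is very far from normal.

```python
def choose_method(sigma_known: bool, n: int, population_normal: bool) -> str:
    """Return which approach the case lists prescribe for a confidence interval for the mean.

    population_normal should be False when the population is not normal
    or when its distribution is unknown.
    """
    large_sample = n >= 30
    # Case III in both lists: small sample from a non-normal (or unknown) population.
    if not large_sample and not population_normal:
        return "nonparametric"
    # Cases I and II: z when sigma is known, t when sigma is unknown.
    return "z" if sigma_known else "t"


print(choose_method(True, 81, True))    # sigma known, as in Example #2
print(choose_method(False, 16, True))   # sigma unknown, small normal sample
print(choose_method(False, 12, False))  # small sample, non-normal population
```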
[Figure: Three Possible Cases]

The t Distribution
• The t distribution is also called Student's t distribution.
• It is similar to the normal distribution because it has:
  1. A bell-shaped curve,
  2. A total area of 1.0 under the curve, and
  3. A mean, $\mu$, of zero.
• It is different from the standard normal distribution curve because:
  1. It has a lower height and a wider spread,
  2. The units are denoted by t, and
  3. Its standard deviation is $\sigma = \sqrt{df/(df - 2)}$ (defined for $df > 2$), which is always greater than 1.
  4. Here df is the number of degrees of freedom, defined as the number of observations that can be chosen freely. It is given by $df = n - 1$, where n is the sample size.
• The t distribution depends on only one parameter, df.
• As the sample size becomes larger, the t distribution approaches the standard normal distribution.

Figure 8.5 The t distribution for df = 9 and the standard normal distribution.

The t Distribution
1. Table A-3 lists the t value for a given number of degrees of freedom and a given area in the right tail under a t distribution curve.
2. Because of symmetry, this area is the same as the area in the left tail under the t distribution curve.

Steps to read the t distribution in Table V:
1. Locate the value of the degrees of freedom under the column labeled "df", and draw a horizontal line through the row.
2. Locate the required area among the column headings for areas in the right tail under the t distribution curve, and draw a vertical line through the column.
3. The entry where the horizontal line and the vertical line meet is the required t value.
4. For example, let us find the t value for a t distribution with a sample size of 9 and an area of 0.01 in the right tail of the t distribution curve. Here $df = n - 1 = 8$, and the entry in the row for df = 8 and the column for area .01 is t = 2.896.
The t Distribution

Example #7
Find the value of t for the t distribution for each of the following.
a) Area in the right tail = .05 and df = 12
b) Area in the left tail = .025 and n = 66

Example #7 – Solution
a) Given: area in the right tail = 0.05 and df = 12.
From Table V, the required t value is 1.782.
b) Given: area in the left tail = 0.025 and n = 66, so $df = n - 1 = 66 - 1 = 65$.
From Table V, the required t value is $-1.997$ (negative because the area is in the left tail).

Example #8
For each of the following, find the area in the appropriate tail of the t distribution.
a) t = 2.467 and df = 28
b) t = -1.672 and df = 58
c) t = -2.670 and n = 55

Example #8 – Solution
a) Given: t = 2.467 and df = 28.
From Table V, the required area in the right tail under the curve is 0.01.
b) Given: t = -1.672 and df = 58.
From Table V, the required area in the left tail under the curve is 0.05.
c) Given: t = -2.670 and n = 55, so $df = n - 1 = 55 - 1 = 54$.
From Table V, the required area in the left tail under the curve is 0.005.

Confidence Interval for μ Using the t Distribution
In Section 4.3, we defined $\sigma_{\bar{x}}$ as
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
However, since $\sigma$ is usually unknown, we calculate the sample standard deviation, s, and use it in lieu of $\sigma$, with $s_{\bar{x}}$ in place of $\sigma_{\bar{x}}$. The value of $s_{\bar{x}}$ is calculated as
$s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
Therefore, the $(1 - \alpha)100\%$ confidence interval for $\mu$ is
$\bar{x} \pm t s_{\bar{x}}$
and the margin of error is $E = t s_{\bar{x}}$.

Note: If df > 75, we can use either:
1. The entries in the last row of Table V, where $df = \infty$, or
2. A normal distribution to approximate the t distribution.

Confidence Interval for μ Using the t Distribution

Example #9
Find the value of t from the t distribution table for each of the following.
a) Confidence level = 99% and df = 13
b) Confidence level = 95% and n = 36

Example #9 – Solution
a) Given: confidence level = 99% and df = 13, so $\alpha = .01$ and $\alpha/2 = 0.005$.
From Table V, the required t value is 3.012.
b) Given: confidence level = 95% and n = 36, so $df = n - 1 = 35$, $\alpha = .05$, and $\alpha/2 = 0.025$.
From Table V, the required t value is 2.030.

Example #10 – Problem 8.47
A sample of 11 observations taken from a normally distributed population produced the following data.

-7.1 10.3 8.7 -3.6 -6.0 -7.5 5.2 3.7 9.8 -4.4 6.4

a) What is the point estimate of $\mu$?
b) Make a 95% confidence interval for $\mu$.
c) What is the margin of error of estimate for part b?

Example #10 – Solution
Given: $n = 11$

a) What is the point estimate of $\mu$?
Point estimate of $\mu$ = $\bar{x} = \dfrac{15.5}{11} = 1.4091$

b) Make a 95% confidence interval for $\mu$.
Since $\sigma$ is unknown, we first have to determine s from the data:

    x        x^2
  -7.1     50.41
  10.3    106.09
   8.7     75.69
  -3.6     12.96
  -6.0     36.00
  -7.5     56.25
   5.2     27.04
   3.7     13.69
   9.8     96.04
  -4.4     19.36
   6.4     40.96
  $\sum x = 15.5$    $\sum x^2 = 534.49$

$s = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\dfrac{534.49 - \frac{(15.5)^2}{11}}{10}} = 7.1600$

$s_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{7.1600}{\sqrt{11}} = 2.1588$

For a 95% confidence level, $\alpha = 0.05$ and $\alpha/2 = .025$.
The required t value for an area of .025 in each tail and df = 10 is 2.228.
The 95% confidence interval for $\mu$ is
$\bar{x} \pm t s_{\bar{x}} = 1.4091 \pm 2.228(2.1588) = 1.4091 \pm 4.8098 = -3.40$ to $6.22$

c) $E = t s_{\bar{x}} = 4.8098$

Example #11
A random sample of 16 airline passengers at the Bay City airport showed that the mean time spent waiting in line to check in at the ticket counter was 31 minutes, with a standard deviation of 7 minutes. Construct a 99% confidence interval for the mean time spent waiting in line by all passengers at this airport. Assume that such waiting times for all passengers are normally distributed.

Example #11 – Solution
Given: $n = 16$, $\bar{x} = 31$ minutes, $s = 7$ minutes, and confidence level = 99%, so df = 15. The population is normally distributed.
$s_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{7}{\sqrt{16}} = 1.75$
For a 99% confidence level, $\alpha = 0.01$ and $\alpha/2 = .005$.
The required t value for an area of .005 in each tail and df = 15 is 2.947.
The 99% confidence interval for $\mu$ is
$\bar{x} \pm t s_{\bar{x}} = 31 \pm 2.947(1.75) = 31 \pm 5.16 = 25.84$ to $36.16$ minutes

ESTIMATION OF A POPULATION PROPORTION: LARGE SAMPLES

We already learned that for a large sample size, that is, when $np > 5$ and $nq > 5$:
1. The sampling distribution of $\hat{p}$ is approximately normal.
2. The mean, $\mu_{\hat{p}}$, of the sampling distribution of $\hat{p}$ is equal to the population proportion, $p$.
3. The standard deviation, $\sigma_{\hat{p}}$, of the sampling distribution of the sample proportion, $\hat{p}$, is defined as
   $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$.

Since we may not know p and q, we will need to use $s_{\hat{p}}$ as an estimate of $\sigma_{\hat{p}}$:
$s_{\hat{p}} = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, where $\hat{p}$ is the point estimate of p and $\hat{q} = 1 - \hat{p}$.

The $(1 - \alpha)100\%$ confidence interval for p is
$\hat{p} \pm z s_{\hat{p}}$
and the margin of error is $E = z s_{\hat{p}}$.

DETERMINING THE SAMPLE SIZE FOR THE ESTIMATION OF PROPORTION

Given the confidence level and the values of $\hat{p}$ and $\hat{q}$, the sample size that will produce a predetermined margin of error, E, of the confidence interval estimate of p is
$n = \dfrac{z^2\hat{p}\hat{q}}{E^2}$

In case the values of $\hat{p}$ and $\hat{q}$ are not known, we can do one of the following:
1. Make the most conservative estimate of the sample size n by using $\hat{p} = .5$ and $\hat{q} = .5$.
2. Take a preliminary sample (of arbitrarily determined size), calculate $\hat{p}$ and $\hat{q}$ from this sample, and then use these values to find n.

Example #12
Check whether the sample size is large enough to use the normal distribution to make a confidence interval for p in each of the following cases.
a. n = 50, $\hat{p}$ = .25
b. n = 160, $\hat{p}$ = .03
Answers:
a. $n\hat{p} = (50)(.25) = 12.5$ and $n\hat{q} = (50)(.75) = 37.5$. Both are greater than 5, so the sample size is large enough to use the normal distribution.
b. $n\hat{p} = (160)(.03) = 4.8$, which is less than 5, so the sample size is not large enough to use the normal distribution.

Example #13
A sample of 200 observations selected from a population produced a sample proportion equal to .91. Make a 90% confidence interval for p.
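Answer: $n = 200$, $\hat{p} = .91$, $\hat{q} = 1 - .91 = .09$, and
$s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n} = \sqrt{(.91)(.09)/200} = .02023611$
The 90% confidence interval for p is
$\hat{p} \pm z s_{\hat{p}} = .91 \pm 1.65(.02023611) = .91 \pm .033 = .877$ to $.943$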
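The z-based calculations in this chapter can be checked numerically with a short sketch. It uses Python's standard library `statistics.NormalDist` in place of the printed z table, so z is not rounded to two decimals and intermediate values may differ slightly from the table-based answers. The function names are illustrative assumptions, not part of the course material.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence):
    """z value with area alpha/2 in each tail, e.g. about 1.96 for 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci_sigma_known(xbar, sigma, n, confidence):
    """Confidence interval x-bar +/- z * sigma / sqrt(n) for the mean, sigma known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def proportion_ci(p_hat, n, confidence):
    """Large-sample confidence interval p-hat +/- z * sqrt(p-hat * q-hat / n)."""
    margin = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_for_mean(sigma, margin, confidence):
    """Smallest whole n with z * sigma / sqrt(n) <= margin, i.e. n = z^2 sigma^2 / E^2 rounded up."""
    return ceil((z_critical(confidence) * sigma / margin) ** 2)


low, high = proportion_ci(0.91, 200, 0.90)       # last example of the chapter
print(round(low, 3), round(high, 3))             # 0.877 0.943
print(sample_size_for_mean(14.50, 5.50, 0.98))   # 38, as in Example #5 a)
print(sample_size_for_mean(14.50, 4.25, 0.95))   # 45, as in Example #5 b)
```

Rounding the sample size up rather than to the nearest integer matches the worked examples, where 37.73 becomes 38.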