Estimation

Let's return to our example of the random sample of 200 USC undergraduates. Remember that this is both a large and a random sample, and therefore the Central Limit Theorem applies to any statistic that we calculate from it. We ask these 200 randomly selected USC students to tell us their grade point average (GPA). We calculate the mean GPA for the sample and find it to be 2.58. Next, we calculate the standard deviation for these self-reported GPA values and find it to be 0.44.

How can we use these two simple univariate statistics from our random sample to estimate the probable GPA for the entire USC student body (i.e., the statistical universe)? Recall that the standard error is the standard deviation of the sampling distribution. The Central Limit Theorem tells us how to estimate it:

$$\hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{N}}$$

The standard error is estimated by dividing the standard deviation of the sample by the square root of the size of the sample. In our example,

$$\hat{\sigma}_{\bar{Y}} = \frac{0.44}{\sqrt{200}} = \frac{0.44}{14.142} = 0.031$$

First, we let our sample mean (Y-bar), 2.58, be our estimate of the mean of the sampling distribution of all possible mean GPAs from random samples of 200. Since the Central Limit Theorem applies in this situation, we know that the mean of the sampling distribution is exactly equal to the (unknown) mean GPA for the universe. Thus, 2.58 is the first element in our estimate of the mean of all USC undergraduate GPAs. This is called the point estimate.

However, we do not stop here. Estimation also consists of inferring TWO further values. These are known as the upper and lower confidence limits (UCL and LCL). We have no reason to expect that the universe (i.e., actual) GPA is identical to the sample GPA. However, we do know the likelihood of GPA values above and below 2.58. For example, we know that 95.44 percent of the probable true GPAs lie between z = –2.00 and z = +2.00. What we need to do is to CONVERT our known z-values (i.e., –2.00 and +2.00) into GPA scores.
To do this, we need to know the value of the conversion factor, just as we need to know the exchange rate if we wish to know how many Deutsche marks we can purchase for one US dollar. The value of this conversion factor, or exchange rate, is the standard error of the estimate. Here, the conversion rate (the standard error) is

$$\hat{\sigma}_{\bar{Y}} = 0.031$$

We can now make our estimates of the confidence limits (i.e., LCL and UCL), as follows:

$$\text{LCL} = \bar{Y} - (Z_{\alpha/2})(\hat{\sigma}_{\bar{Y}})$$

and

$$\text{UCL} = \bar{Y} + (Z_{\alpha/2})(\hat{\sigma}_{\bar{Y}})$$

We have everything that we need except the appropriate z-values from the table for the normal curve. We need to decide how confident we want to be that we have captured the “true” (unknown actual) universe mean GPA within our estimates. Let’s say that we want to be 95 percent sure that we have captured the “true” value within our limits. Dividing the normal curve into two equal halves (because it is symmetrical), this means that in each half (.5000) we are looking for the z-score that separates the 47.50 percent (.4750) of the area nearest the center (Column B in Table 8.1) from the 2.50 percent (.0250) of the area in the tail (Column C in the table). In Column A of this table, we find that the z-values are ±1.96.

We now know where we are on the underlying x-axis in z units. We do NOT yet know the GPA values at which this 95 percent area begins and ends. To find out, we need to use our “currency exchange rate,” the standard error of the estimate. Recall that we estimated its value to be 0.031 (above). In other words, when we travel a distance of 1 z on the x-axis, this is equivalent to traveling 0.031 GPA units. Since we need to travel slightly less than 2 z, the distance in GPA units is simply 1.96 × 0.031, or a distance of 0.061 GPA.
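The z-to-GPA conversion described above can be checked with a few lines of Python. This is a minimal sketch added for illustration, not part of the original lecture; the function names `standard_error` and `margin` are hypothetical, and the inputs are the sample values from the example (s = 0.44, N = 200, z = 1.96).

```python
import math

def standard_error(sample_sd: float, n: int) -> float:
    """Estimated standard error of the mean: s / sqrt(N)."""
    return sample_sd / math.sqrt(n)

def margin(z: float, se: float) -> float:
    """Distance covered by z standard errors, in the variable's own units."""
    return z * se

se = standard_error(0.44, 200)
print(round(se, 3))                # 0.031
print(round(margin(1.96, se), 3))  # 0.061
```

The sketch carries the unrounded standard error into the second step, which still rounds to the same 0.061 GPA units obtained from 1.96 × 0.031.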
Because we need to move both to the left and to the right of the mean GPA, we need to SUBTRACT this value from the mean of the sampling distribution to get the lower confidence limit and ADD this value to the mean of the sampling distribution to get the upper confidence limit. We presume that the value of the mean of the sampling distribution (the point estimate) is the sample mean, 2.58. Thus, the lower confidence limit is 2.52 and the upper confidence limit is 2.64. In other words, we are 95 percent confident that the “true” undergraduate GPA is equal to or greater than 2.52 and equal to or less than 2.64. Rendering this into equation form yields:

$$\text{LCL} = 2.58 - (1.96)(0.031) = 2.58 - 0.061 = 2.52$$

and

$$\text{UCL} = 2.58 + (1.96)(0.031) = 2.58 + 0.061 = 2.64$$

Let's say we want to be 99 percent confident. We would need to travel to the right AND to the left from the mean of the sampling distribution on the x-axis so as to encompass 99 percent of the area under the sampling distribution curve. This means covering 49.5 percent of the area in the right half and 49.5 percent in the left half (99 / 2 = 49.5). Let's look for the z-values for area 0.4950 in Column B of Table 8.1. We see, however, that there is no such value in the table. The closest values are 0.4949 and 0.4951. However, we know how to interpolate. The z-values that we are looking for are halfway between 2.57 and 2.58. Interpolating, we arrive at z = 2.575.

$$\text{LCL} = 2.58 - (2.575)(0.031) = 2.58 - 0.0798 = 2.50$$

and

$$\text{UCL} = 2.58 + (2.575)(0.031) = 2.58 + 0.0798 = 2.66$$

Notice that in being more confident of capturing the “true” GPA (i.e., in moving from 95 percent to 99 percent confident) we WIDEN the confidence limits. Remember that we can never be certain (100 percent confident) because the sampling distribution is asymptotic (its tails never reach the x-axis).

Now let’s estimate confidence intervals for proportions. (This example comes from Sirkin 1999, pp. 256-258.)
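Before turning to the polling example, the two GPA intervals above can be reproduced in Python. This sketch is an addition to the lecture, with hypothetical helper names (`z_for_confidence`, `mean_confidence_limits`). It assumes the standard library's `statistics.NormalDist` in place of the printed normal-curve table, so the 99 percent z-value comes out as about 2.576 rather than the interpolated 2.575.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence: float) -> float:
    """Two-tailed critical z: leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def mean_confidence_limits(mean: float, sample_sd: float, n: int,
                           confidence: float) -> tuple[float, float]:
    """Return (LCL, UCL) for a mean estimated from a large random sample."""
    se = sample_sd / math.sqrt(n)                 # standard error
    distance = z_for_confidence(confidence) * se  # z converted to GPA units
    return mean - distance, mean + distance

for level in (0.95, 0.99):
    lcl, ucl = mean_confidence_limits(2.58, 0.44, 200, level)
    z = z_for_confidence(level)
    print(f"{level:.0%}: z = {z:.3f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
# 95%: z = 1.960, LCL = 2.52, UCL = 2.64
# 99%: z = 2.576, LCL = 2.50, UCL = 2.66
```

The small difference between 2.575 and 2.576 does not change the limits at two decimal places.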
An overnight telephone poll of 900 randomly selected likely voters found that Candidate A would receive 53 percent of the vote if the election were to be held today. What is the estimated proportion of support for Candidate A in the universe of likely voters (and the “margin of error”)? The algorithm is:

$$\text{CLs} = P_s \pm Z_{\alpha/2}\sqrt{\frac{P_p Q_p}{n}}$$

For the upper confidence limit:

$$\text{UCL} = P_s + Z_{\alpha/2}\sqrt{\frac{P_p Q_p}{n}}$$

And for the lower confidence limit:

$$\text{LCL} = P_s - Z_{\alpha/2}\sqrt{\frac{P_p Q_p}{n}}$$

Recapitulation

1. In large random samples, the Central Limit Theorem is assumed to hold.
2. The standard error of the estimate can be calculated from the standard deviation.
3. The sample mean is used as the point estimate (i.e., the mean of the sampling distribution, hence the “true” universe value).
4. Confidence limits reflect the desired probability of capturing the “true” universe value.
5. Desired confidence equates to the two areas in the center of each half of the normal distribution.
6. The more confidence desired in capturing the “true” value in the universe, the wider the confidence intervals.
7. Estimations of proportions in the universe are calculated in a similar fashion.

Estimation Problem 1

A random sample of 175 college professors drank an average of 5.5 glasses of chardonnay per week with a standard deviation of 2.5 glasses. Based upon these sample statistics and the “Proportions of Area Under Standard Normal Curve” (Appendix 1, pp. 540-542), supply the following for the 95 percent confidence level:

1. The point estimate. __________
2. The standard error. __________
3. The upper confidence limit (UCL). __________
4. The lower confidence limit (LCL). __________

Answers to Estimation Problem 1

1. The point estimate. 5.5
2. The standard error. 0.189
3. The upper confidence limit (UCL). 5.870
4. The lower confidence limit (LCL). 5.130

Estimation Problem 2

Fifty-one percent of an overnight random sample of 600 likely voters favored George W. Bush over Al Gore. Assume that these were the only two candidates surveyed in this poll and that there were no “undecideds.” Supply the following statistics for the 95 percent confidence level:

1. Support for Bush in the universe. __________
2. Support for Gore in the universe. __________
3. The “margin of error.” __________
4. The upper confidence limit (UCL). __________
5. The lower confidence limit (LCL). __________
6. Is the race “too close to call”? __________

Answers to Estimation Problem 2

1. Support for Bush in the universe. .51
2. Support for Gore in the universe. .49
3. The “margin of error.” .04 (1.96 × .0204, where .0204 is the standard error)
4. The upper confidence limit (UCL). .55
5. The lower confidence limit (LCL). .47
6. Is the race “too close to call”? Yes (the interval includes .50)
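
The proportion formula given earlier can be applied to the overnight poll of 900 likely voters (53 percent for Candidate A), which the lecture sets up but does not work through. The following Python sketch is an illustration only: the function name `proportion_confidence_limits` is hypothetical, and it assumes that the sample proportion stands in for the unknown universe proportion when the standard error is estimated.

```python
import math

def proportion_confidence_limits(p_sample: float, n: int,
                                 z: float = 1.96) -> tuple[float, float, float]:
    """Return (margin of error, LCL, UCL) for a proportion.

    Assumption: the sample proportion is used in place of the unknown
    universe proportion under the square root.
    """
    se = math.sqrt(p_sample * (1 - p_sample) / n)  # standard error
    moe = z * se                                   # margin of error
    return moe, p_sample - moe, p_sample + moe

moe, lcl, ucl = proportion_confidence_limits(0.53, 900)
print(f"margin = {moe:.3f}, LCL = {lcl:.3f}, UCL = {ucl:.3f}")
# margin = 0.033, LCL = 0.497, UCL = 0.563
```

Under these assumptions the 95 percent interval for Candidate A runs from roughly 49.7 to 56.3 percent. Using the conservative value of .5 under the square root instead of the sample proportion gives nearly the same margin at this sample size.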