7. Point Estimation and Confidence Intervals for Means
Point Estimation and Confidence Intervals
Vartanian: SW 540

We use samples to estimate population figures because we do not know the population figures and analyzing entire populations is generally too costly. We will estimate such population figures as the mean, the standard deviation, and many others. We want our estimates to be unbiased and efficient. Note that there are a number of different ways to estimate population figures, that is, different types of estimation processes. We will choose those processes that are least biased and most efficient.

Unbiased Estimates

An unbiased estimate of a population figure is one whose expected value over repeated samples equals the population figure. In other words, the estimate from any single sample need not equal the population figure, but if we were to take many samples, the mean of those sample estimates would be the population figure. For example, if we're trying to estimate the mean value for years of work experience, the sampling distribution of an unbiased estimate would be centered on $\mu$, where $\mu$ is the population figure.

[Figure: sampling distribution of an unbiased estimate, centered on $\mu$]

The sampling distribution of a biased estimate would be centered away from $\mu$.

[Figure: sampling distribution of a biased estimate]

Efficient Estimates

An efficient estimate is one that has the smallest error around the population figure over repeated samples. An efficient estimate helps us home in on the true value of the population figure better than other estimates. A more efficient estimate looks like number 1 below, and a less efficient estimate looks like number 2 below.

[Figure: two sampling distributions; number 1 is the more efficient estimate, number 2 the less efficient one]

Confidence Intervals for the Mean

A confidence interval around a mean is a range of values that we are confident, at a stated level, contains the true value of the mean. That is, we have a sample and want to know, say, the range within which we are 95% confident the population mean lies. To determine a confidence interval for a mean, we need an estimate of the mean and an estimate of the standard error of the mean.
The standard error of the mean is estimated by taking the standard deviation of the sample and dividing it by the square root of the sample size:

$$\hat{\sigma}_{\bar{Y}} = \frac{s}{\sqrt{n}}$$

where $s$ is the sample standard deviation and $n$ is the sample size. The hat over the sigma indicates that we're using an estimate of the population standard error. With relatively large sample sizes (n>=30), we can state that we are 95% confident that the true population mean lies somewhere between

$$\bar{Y} - 1.96\,\hat{\sigma}_{\bar{Y}} \quad \text{and} \quad \bar{Y} + 1.96\,\hat{\sigma}_{\bar{Y}}$$

where $\bar{Y}$ is the sample mean. The 1.96 value comes from the z table. If you look in the z table, you will see that the tail probability that corresponds to z = 1.96 is .025. What this means is that if we go 1.96 standard error units from the mean, .025 of the distribution (or 2.5% of the distribution) will lie beyond that point in each tail. In other words, 5% of the distribution will be left in the two tails combined (2.5% in each tail), and we will be examining values that cover 95% of the distribution.

We could also examine a 99% confidence interval. To do this, we again look in the z table, this time for the z value whose tail probability is roughly .0050. Doubling this .0050 figure gives .010, or 1% of the distribution, as the total proportion in both tails. Thus, if we want a 99% confidence interval, we need the z value that leaves 1% of the distribution in the two tails combined. The z value we're looking for lies between 2.57 (.0051) and 2.58 (.0049). The book uses a z value of 2.58 for a 99% confidence interval. To then find the 99% confidence interval, we would use the following formula:

$$\bar{Y} \pm 2.58\,\hat{\sigma}_{\bar{Y}}$$

Example

If we found the mean to be 10 and the standard error to be 1, then we would be 95% confident that the true mean in the population is between 10-1.96*1 = 8.04 and 10+1.96*1 = 11.96. We would be 99% confident that the true mean lies between 10-2.58*1 = 7.42 and 10+2.58*1 = 12.58. The greater our confidence level, the wider the interval.
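The worked example above can be checked with a short Python sketch. It assumes Python 3.8 or later (for `statistics.NormalDist`), and the function names `std_error_of_mean` and `z_interval` are hypothetical, chosen only for illustration. The sketch computes the critical value from the normal distribution rather than using the rounded table values 1.96 and 2.58, so its results should agree with the figures in the example to two decimal places.

```python
import math
from statistics import NormalDist


def std_error_of_mean(sample_sd, n):
    """Estimated standard error of the mean: sample SD over sqrt(n)."""
    return sample_sd / math.sqrt(n)


def z_interval(estimate, std_error, confidence=0.95):
    """Large-sample interval: estimate +/- z * standard error."""
    # Two-sided critical value: leave (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_error
    return estimate - margin, estimate + margin


# Values from the example in the notes: mean 10, standard error 1.
lo95, hi95 = z_interval(10, 1, confidence=0.95)
lo99, hi99 = z_interval(10, 1, confidence=0.99)
print(f"95% interval: {lo95:.2f} to {hi95:.2f}")
print(f"99% interval: {lo99:.2f} to {hi99:.2f}")
```

When only the sample standard deviation and sample size are known, `std_error_of_mean` can supply the standard error that `z_interval` needs.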
To get a more precise (narrower) interval around the population mean, we must sacrifice some confidence that the interval contains the population mean. The larger our sample size, the lower the standard error of the estimate, and therefore the more precise our estimates.

Confidence Intervals for a Proportion

The standard error for a proportion is estimated by

$$\hat{\sigma}_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$

where $\hat{\pi}$ is the estimated proportion of cases in the condition, say the proportion of cases who are poor, or the proportion of cases that are married. The hats over these Greek letters indicate that they are estimates of population values taken from a sample. To then determine the confidence interval for the proportion for a sample of size 30 or more, use the following formula:

$$\hat{\pi} \pm z\,\hat{\sigma}_{\hat{\pi}}$$

where z is 1.96 for a 95% confidence interval and 2.58 for a 99% confidence interval.

Example: If the estimate of the proportion of females in the population was .52 and the standard error of this estimate was .05, then the 95% confidence interval runs from .52-1.96*.05 = .422 to .52+1.96*.05 = .618.

Choosing Sample Sizes for Proportions

Let's say that you want to determine the sample size needed for a 95% confidence interval that is within .03 of the true proportion. How large a sample will you choose? Whenever using a relatively large sample and a 95% confidence interval, you will choose 1.96, or almost 2 standard error units, for your z score. We determined this value from the analysis above. The sample proportion will fall within $1.96\,\hat{\sigma}_{\hat{\pi}}$ of the true proportion with a probability of .95. In other words, we'll take z times the standard error of the proportion, set this equal to the amount of error we are willing to accept, and then solve for n, the number of observations we need. Here, the number of observations, n, is determined by the following formula:

$$B = z\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$

Solving for n gives

$$n = \hat{\pi}(1-\hat{\pi})\left(\frac{z}{B}\right)^2$$

where B is the level of error (.03 in this case).
We must make an educated guess for the value of the proportion $\hat{\pi}$, or set it so that we ensure that the error does not exceed our stated value (.03 in this case). To do this, we would set $\hat{\pi}$ equal to .5, because this guarantees that our error will not exceed our stated error level. That is, if we set $\hat{\pi}$ equal to .5, $\hat{\pi}(1-\hat{\pi})$ takes on its highest possible value (.25), and thus n will be higher than for any other value of $\hat{\pi}$. (If we set $\hat{\pi}$ = .3, the n necessary to satisfy our condition would be smaller than if $\hat{\pi}$ were at .5.) So, in this case, the number of observations will be

$$n = (.5)(.5)\left(\frac{1.96}{.03}\right)^2 \approx 1067$$

In other words, we'll need about 1067 observations (1068 if we round up) to be 95% confident that the sample proportion falls within .03 of the true proportion.

To do this same type of procedure for the mean, you would have to know the population standard deviation, or have a good estimate of it. This isn't always easy to know.
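The sample-size calculation for proportions can be sketched in Python as follows. The function names `sample_size_for_proportion` and `margin_of_error` are hypothetical, chosen for illustration, and the sketch takes the rounded table value z = 1.96 as its default. It also shows why rounding up is the stricter choice: the raw result is slightly above 1067, so the notes' figure of 1067 leaves the error a hair over .03, while 1068 keeps it under.

```python
import math


def sample_size_for_proportion(error_bound, z=1.96, pi_guess=0.5):
    """Raw (unrounded) n from n = pi(1 - pi)(z / B)^2."""
    return pi_guess * (1 - pi_guess) * (z / error_bound) ** 2


def margin_of_error(n, z=1.96, pi_guess=0.5):
    """z times the standard error of a proportion for a sample of size n."""
    return z * math.sqrt(pi_guess * (1 - pi_guess) / n)


# Case from the notes: 95% confidence, within .03, guess of .5.
n_raw = sample_size_for_proportion(0.03)
n_needed = math.ceil(n_raw)  # round up so the error bound is not exceeded

# A guess of .3 instead of .5 calls for a smaller sample.
n_raw_guess_03 = sample_size_for_proportion(0.03, pi_guess=0.3)

print(f"raw n with guess .5: {n_raw:.2f}")
print(f"n rounded up:        {n_needed}")
print(f"raw n with guess .3: {n_raw_guess_03:.2f}")
print(f"margin at n=1067:    {margin_of_error(1067):.5f}")
print(f"margin at n=1068:    {margin_of_error(1068):.5f}")
```

Using .5 as the guess is the conservative design choice described in the notes: it maximizes $\hat{\pi}(1-\hat{\pi})$, so any other true proportion would need no more observations than this.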