Point Estimation and Confidence Intervals
Vartanian: SW 540
We use samples to estimate population figures because we do not know the population figures, and analyzing whole populations is generally too costly. We will estimate such population figures as the mean, the standard deviation, and many others. We want our estimates to be unbiased and efficient. Note that there are a number of different ways to estimate population figures, that is, different estimation processes. We will choose the processes that are least biased and most efficient.
Unbiased Estimates
An unbiased estimate of a population figure is one whose expected value, over repeated samples, equals the population figure. In other words, the estimate from any single sample need not equal the population figure, but if we were to take many samples, the mean of the estimates from those samples would equal the population figure.
For example, if we’re trying to estimate the mean value for years of work experience, an unbiased estimate would have a sampling distribution centered on μ, where μ is the population figure. A biased estimate would have a sampling distribution centered somewhere other than μ.
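A small simulation can make unbiasedness concrete. The sketch below assumes a hypothetical normal population of years of work experience with μ = 10 and σ = 4 (made-up values, not from these notes), draws many samples of size 25, and averages three estimates across the samples: the sample mean, a variance that divides by n, and a variance that divides by n − 1. The divide-by-n variance is a standard textbook example of a biased estimate and is included only for contrast.

```python
import random


def average_estimates(mu=10.0, sigma=4.0, n=25, reps=10_000, seed=1):
    """Average three estimates over `reps` repeated samples of size n.

    The population is an assumed (hypothetical) normal distribution with
    mean `mu` and standard deviation `sigma`.
    """
    rng = random.Random(seed)
    total_mean = total_var_n = total_var_n1 = 0.0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
        total_mean += m
        total_var_n += ss / n          # divides by n: biased (too small on average)
        total_var_n1 += ss / (n - 1)   # divides by n - 1: unbiased
    return total_mean / reps, total_var_n / reps, total_var_n1 / reps


avg_mean, avg_var_n, avg_var_n1 = average_estimates()
print(f"average sample mean:            {avg_mean:.3f}  (mu = 10)")
print(f"average variance, divide by n:  {avg_var_n:.3f}  (sigma^2 = 16)")
print(f"average variance, divide by n-1: {avg_var_n1:.3f}  (sigma^2 = 16)")
```

The average of the sample means lands very close to 10 and the n − 1 variance averages close to 16, while the divide-by-n version settles near 16 × 24/25 = 15.36, below the population figure.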
Efficient Estimates
An efficient estimate is one that has the smallest error around the population figure over repeated samples; that is, its sampling distribution has the smallest spread. An efficient estimate helps us home in on the true value of the population figure better than other estimates do.
Confi.Intervals.Means
Page 1
A more efficient estimate looks like number 1 in the figure, with a narrow sampling distribution, and a less efficient estimate looks like number 2, with a wider one.
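Efficiency can be illustrated the same way, by comparing the spread of two estimates of the same figure over repeated samples. The sketch below compares the sample mean and the sample median as estimates of the center of an assumed normal population (μ = 10, σ = 4, n = 25, all hypothetical). For this symmetric population both are centered on μ, but the mean has the smaller spread, so it is the more efficient estimate here. Using the median as the comparison estimate is our choice for illustration, not something from the notes.

```python
import random
import statistics


def spread_of_estimates(mu=10.0, sigma=4.0, n=25, reps=10_000, seed=2):
    """Return the standard deviation, across repeated samples, of the
    sample mean and of the sample median (hypothetical normal population)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(statistics.median(sample))
    return statistics.pstdev(means), statistics.pstdev(medians)


sd_mean, sd_median = spread_of_estimates()
print(f"spread of sample means:   {sd_mean:.3f}")    # close to sigma / sqrt(n) = 0.8
print(f"spread of sample medians: {sd_median:.3f}")  # noticeably larger
```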
Confidence Intervals for the Mean
A confidence interval around a mean is a range of values, computed from a sample, that is built so that a stated percentage of such intervals (say 95%) would contain the true population mean if we repeated the sampling many times. That is, we have a sample and want to find the values between which we can be, say, 95% confident that the population mean lies. To determine a confidence interval for a mean, we need an estimate of the mean and an estimate of the standard error of the mean. The standard error of the mean is estimated by taking the standard deviation of the sample, s, and dividing it by the square root of the sample size, n:

$$\hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{n}}$$
The hat over the sigma indicates that we’re using an estimate of the population standard error. With
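relatively large sample sizes (n ≥ 30), we can state that we are 95% confident that the true population
mean lies somewhere between

$$\bar{X} - 1.96\,\hat{\sigma}_{\bar{X}} \quad \text{and} \quad \bar{X} + 1.96\,\hat{\sigma}_{\bar{X}}$$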
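The recipe (estimate the mean, estimate the standard error as s/√n, then go 1.96 standard errors either way) can be sketched in Python as follows. The data values are made up for illustration, and the coverage check assumes a hypothetical normal population. Because it uses the sample standard deviation with z = 1.96, the coverage for n = 50 comes out slightly under 95%, which is one reason the large-sample condition matters.

```python
import math
import random
import statistics


def mean_ci(data, z=1.96):
    """Large-sample confidence interval for a mean: mean +/- z * s / sqrt(n)."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # stdev divides by n - 1
    return mean - z * se, mean + z * se


def coverage(mu=10.0, sigma=4.0, n=50, reps=5_000, seed=3):
    """Fraction of repeated-sample intervals that contain the true mean mu
    (hypothetical normal population)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        low, high = mean_ci([rng.gauss(mu, sigma) for _ in range(n)])
        if low <= mu <= high:
            hits += 1
    return hits / reps


# Hypothetical years-of-work-experience data (n = 30), for illustration only.
experience = [3, 7, 12, 5, 9, 15, 2, 8, 11, 6, 10, 4, 13, 7, 9,
              5, 14, 8, 6, 12, 3, 10, 7, 11, 9, 4, 16, 8, 6, 10]
low, high = mean_ci(experience)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
print(f"coverage over repeated samples: {coverage():.3f}")  # close to 0.95
```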
The 1.96 value comes from the z table. If you look in the z table, you will see that the tail area beyond z = 1.96 is .025. This means that if we go 1.96 standard errors above and below the mean, .025 of the distribution (2.5% of the distribution) lies in each tail. In other words, 5% of the distribution lies in the two tails combined (2.5% in each tail), and the interval covers the middle 95% of the distribution.
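Instead of a printed z table, Python’s standard-library `statistics.NormalDist` class can look up the same values: `inv_cdf` returns the z value with a given area to its left, and `cdf` returns the area to the left of a given z. A minimal sketch for the 95% case:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# z value that leaves 2.5% in the upper tail (97.5% to its left)
z_95 = std_normal.inv_cdf(0.975)

# area in one tail beyond z = 1.96, and in both tails combined
one_tail = 1.0 - std_normal.cdf(1.96)
both_tails = 2.0 * one_tail

print(f"z for 95%: {z_95:.4f}")                 # 1.9600
print(f"area beyond 1.96: {one_tail:.4f}")      # 0.0250
print(f"area in both tails: {both_tails:.4f}")  # 0.0500
```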
We could also examine a 99% confidence interval. To do this, we again look in the z table, this time for the z value that leaves roughly .0050 in each tail. Doubling this .0050 figure gives .010, or 1% of the distribution, as the combined proportion in both tails. Thus, for a 99% confidence interval we need the z values that leave 1% of the distribution in the two tails together. The z value we’re looking for lies between 2.57 (tail area .0051) and 2.58 (tail area .0049). The book uses a z value of 2.58 for a 99% confidence interval. To then find the 99% confidence interval, we would use the following formula:

$$\bar{X} - 2.58\,\hat{\sigma}_{\bar{X}} \quad \text{to} \quad \bar{X} + 2.58\,\hat{\sigma}_{\bar{X}}$$
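The same standard-library class reproduces the 99% lookup, including the table values on either side of the exact cutoff. This is only a sketch; the notes simply round to 2.58.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def upper_tail(z):
    """Area of the standard normal distribution to the right of z."""
    return 1.0 - std_normal.cdf(z)


z_99 = std_normal.inv_cdf(0.995)  # leaves .005 in each tail
print(f"exact z for 99%: {z_99:.4f}")               # 2.5758
print(f"tail beyond 2.57: {upper_tail(2.57):.4f}")  # 0.0051
print(f"tail beyond 2.58: {upper_tail(2.58):.4f}")  # 0.0049
```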
Example
If we found the mean to be 10 and the standard error to be 1, then we would be 95% confident that the
true mean in the population is between
10+1.96*1 = 11.96, and 10-1.96*1 = 8.04.
We would be 99% confident that the true mean lies between
10+2.58*1 = 12.58 and 10-2.58*1 = 7.42.
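The worked example can be reproduced with a small helper. `ci_from_estimates` is a hypothetical name used only here; it takes the estimated mean, its standard error, and a z value.

```python
def ci_from_estimates(estimate, std_error, z):
    """Return (lower, upper) = estimate -/+ z * std_error, rounded for display."""
    margin = z * std_error
    return round(estimate - margin, 2), round(estimate + margin, 2)


print(ci_from_estimates(10, 1, 1.96))  # (8.04, 11.96)
print(ci_from_estimates(10, 1, 2.58))  # (7.42, 12.58)
```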
The greater our confidence level, the wider the interval. In other words, to get a narrower, more precise interval around the population mean, we must accept less confidence that the interval contains the population mean. The larger our sample size, the smaller the standard error of the estimate, and therefore the more precise our estimates.
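Both trade-offs show up directly in the interval width, 2 × z × s/√n. The sketch below assumes a sample standard deviation of s = 5 (a made-up value) and prints widths for 95% and 99% intervals at several sample sizes. Quadrupling n halves the width, and moving from 95% to 99% confidence widens it.

```python
import math


def interval_width(s, n, z):
    """Full width of a large-sample interval: 2 * z * s / sqrt(n)."""
    return 2 * z * s / math.sqrt(n)


S = 5.0  # hypothetical sample standard deviation
for n in (30, 120, 480):
    w95 = interval_width(S, n, 1.96)
    w99 = interval_width(S, n, 2.58)
    print(f"n = {n:3d}: 95% width = {w95:.2f}, 99% width = {w99:.2f}")
```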
Confidence Intervals for a Proportion
The standard error for a proportion is estimated by

$$\hat{\sigma}_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$

where $\hat{\pi}$ is the estimated proportion of cases in the condition, say the proportion of cases who are poor, or the proportion of cases that are married. The hats over these Greek letters indicate that they are estimates of population figures taken from a sample.
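The proportion formula translates directly into code. In this sketch `se_proportion` is a hypothetical helper name, `p_hat` stands for the estimated proportion, and the sample values are invented for illustration.

```python
import math


def se_proportion(p_hat, n):
    """Estimated standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    if not 0.0 <= p_hat <= 1.0 or n <= 0:
        raise ValueError("p_hat must be in [0, 1] and n must be positive")
    return math.sqrt(p_hat * (1.0 - p_hat) / n)


# e.g. 40 of 100 sampled cases are married (hypothetical numbers)
print(round(se_proportion(0.40, 100), 4))  # 0.049
```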
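To then determine the 95% confidence interval for the proportion for a sample of size 30 or more, use the following formula:

$$\hat{\pi} - 1.96\,\hat{\sigma}_{\hat{\pi}} \quad \text{to} \quad \hat{\pi} + 1.96\,\hat{\sigma}_{\hat{\pi}}$$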
Example:
If the estimate of the proportion of females in the population was .52 and the standard error of this estimate was .05, then the 95% confidence interval is:
.52 + 1.96*.05 = .618 and .52 - 1.96*.05 = .422.
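A quick check of the example’s arithmetic, as a sketch (the helper name `proportion_ci` is ours, not the notes’):

```python
def proportion_ci(p_hat, std_error, z=1.96):
    """Return (lower, upper) for a large-sample proportion interval."""
    margin = z * std_error
    return round(p_hat - margin, 3), round(p_hat + margin, 3)


print(proportion_ci(0.52, 0.05))  # (0.422, 0.618)
```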
Choosing Sample Sizes for Proportions
Let’s say that you want a sample large enough that a 95% confidence interval will be within .03 of the true proportion. How large a sample do you need?
Whenever you use a relatively large sample and a 95% confidence interval, you will choose 1.96, or almost 2 standard error units, for your z score. We determined this value from the analysis above. The sample proportion will fall within

$$1.96\,\sigma_{\hat{\pi}} = 1.96\sqrt{\frac{\pi(1-\pi)}{n}}$$

of the true proportion with a probability of .95. In other words, we’ll take z times the standard error of the proportion, set this equal to the amount of error we are willing to accept, and then solve for n, the number of observations we need. Here, the number of observations, n, is determined by the following formula:

$$B = 1.96\sqrt{\frac{\pi(1-\pi)}{n}}$$

We could solve for n by using the following formula:

$$n = \frac{1.96^2\,\pi(1-\pi)}{B^2}$$

where B is the level of error (.03 in this case). We must make an educated guess for the value of $\pi$, or set it so that we ensure that the error does not exceed our stated value (.03 in this case). To do this, we would set $\pi$ equal to .5, because this makes sure that our error will not exceed our stated error level. That is, if we set $\pi$ equal to .5, $\pi(1-\pi)$ takes on its highest possible value (.25), and thus n will be higher than for any other value of $\pi$. (If we set $\pi$ = .3, the n necessary to satisfy our condition would be smaller than if $\pi$ were at .5.) So, in this case, the number of observations will be

$$n = \frac{1.96^2(.5)(.5)}{.03^2} = \frac{.9604}{.0009} \approx 1067.1$$

Because n must be a whole number at least this large, we round up. In other words, we’ll need at least 1068 observations to ensure that we are 95% confident that the sample proportion falls within .03 of the true proportion.
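The sample-size calculation can be sketched as a function. `sample_size_for_proportion` is a hypothetical name; it uses n = z²π(1 − π)/B² with π = .5 as the conservative default and rounds up, since n must be a whole number at least as large as the formula’s value (1067.1 here).

```python
import math


def sample_size_for_proportion(margin, z=1.96, p=0.5):
    """Smallest whole n with z * sqrt(p(1 - p) / n) <= margin."""
    return math.ceil(z ** 2 * p * (1.0 - p) / margin ** 2)


print(sample_size_for_proportion(0.03))         # 1068
print(sample_size_for_proportion(0.03, p=0.3))  # 897

# p(1 - p) is largest at p = .5, which is why .5 is the safe guess
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(p * (1 - p), 2))
```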
To do this same type of procedure for the mean, you would have to know the population standard
deviation, or a good estimate of this standard deviation. This isn’t always easy to know.
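For the mean, the analogous calculation sets z × σ/√n equal to the allowed error B and solves for n = (zσ/B)². This is the standard counterpart of the proportion formula rather than something stated in these notes, and the σ = 10 and B = 2 below are purely hypothetical; the result is only as good as the guess for σ.

```python
import math


def sample_size_for_mean(sigma, margin, z=1.96):
    """Smallest whole n with z * sigma / sqrt(n) <= margin, given a guess for sigma."""
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical: guess sigma = 10 years, want the mean within 2 years at 95%
print(sample_size_for_mean(sigma=10, margin=2))  # 97
```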