Estimation
Let's return to our example of the random sample of
200 USC undergraduates. Remember that this is both a
large and a random sample, and therefore the Central
Limit Theorem applies to any statistic that we calculate
from it. We ask these 200 randomly-selected USC
students to tell us their grade point average (GPA). We
calculate the mean GPA for the sample and find it to be
2.58. Next, we calculate the standard deviation for
these self-reported GPA values and find it to be 0.44.
How can we use these two simple univariate statistics
from our random sample to estimate the probable GPA
for the entire USC student body (i.e., the statistical
universe)?
Recall that the standard error is the
standard deviation of the sampling
distribution. The Central Limit Theorem
tells us how to estimate it:
$$\hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{N}}$$
The standard error is estimated by
dividing the standard deviation of the
sample by the square root of the size
of the sample. In our example,
$$\hat{\sigma}_{\bar{Y}} = \frac{0.44}{\sqrt{200}} = \frac{0.44}{14.142} = 0.031$$
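The same arithmetic can be sketched in a few lines of Python. The variable names are illustrative choices, not part of the original slides.

```python
import math

sample_sd = 0.44   # standard deviation of the 200 self-reported GPAs
sample_n = 200     # sample size

# Estimated standard error: sample standard deviation over sqrt(N)
standard_error = sample_sd / math.sqrt(sample_n)

print(f"{math.sqrt(sample_n):.3f}")  # 14.142
print(f"{standard_error:.3f}")       # 0.031
```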
First, we let our sample mean (Y-bar), 2.58,
be our estimate of the mean of the
sampling distribution of all possible mean
GPAs from random samples of 200. Since
the Central Limit Theorem applies in this
situation, we know that the mean of the
sampling distribution is exactly equal to the
(unknown) mean GPA for the universe.
Thus, 2.58 is the first element in our
estimate of the mean of all USC
undergraduate GPAs. This is called the
point estimate. However, we do not stop
here.
Estimation consists of inferring TWO values. These are
known as the upper and lower confidence limits (UCL
and LCL). We have no reason to expect that the
universe (i.e., actual) GPA is identical to the sample GPA.
However, we do know the likelihood of GPA values
above and below 2.58. For example, we know that 95.44
percent of the probable true GPAs lie between z = – 2.00
and z = + 2.00. What we need to do is to CONVERT our
known z-values (i.e., – 2.00 and + 2.00) into GPA scores.
To do this, we need to know the value of the conversion
factor, just the same as we need to know the value of the
exchange rate if we wish to know how many Deutsche
marks we can purchase for $1.00 US. The value of
this conversion factor, or exchange rate, is the standard
error of the estimate.
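The area between z = -2.00 and z = +2.00 can be checked without a printed table. This is a minimal sketch that assumes Python's standard-library `statistics.NormalDist` (Python 3.8 or later) as a stand-in for the table lookup. It gives 95.45 percent; the 95.44 figure quoted above comes from doubling a table entry that has already been rounded to four decimals (2 × .4772).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the standard normal curve between z = -2.00 and z = +2.00
area = standard_normal.cdf(2.0) - standard_normal.cdf(-2.0)

print(f"{area * 100:.2f}")  # 95.45
```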
Here, the conversion rate—the
standard error—is
$$\hat{\sigma}_{\bar{Y}} = 0.031$$
We can now make our estimates of
the confidence limits (i.e., LCL and
UCL), as follows:
$$\mathrm{LCL} = \bar{Y} - (Z_{\alpha/2})(\hat{\sigma}_{\bar{Y}})$$
and
$$\mathrm{UCL} = \bar{Y} + (Z_{\alpha/2})(\hat{\sigma}_{\bar{Y}})$$
We have everything that we need except
the appropriate z-values from the table for
the normal curve. We need to decide
how confident we want to be that we
have captured the “true” (unknown actual)
universe mean GPA within our estimates.
Let’s say that we want to be 95 percent
sure that we have captured the “true”
mean within our limits.
Dividing the normal curve into two equal
halves (because it is symmetrical), this
means that in each half (.5000) we are
looking for the z-score that divides that
half of the curve into the 47.50 percent
(.4750) of the area nearest the center
(Column B in Table 8.1) and the 2.50
percent (.0250) of the area in the tail
(Column C in the table). In Column A of
this table, we find that the z-values are
± 1.96.
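The table lookup can also be reproduced in code. The sketch below assumes Python's standard-library `statistics.NormalDist` in place of Table 8.1; the variable names are illustrative.

```python
from statistics import NormalDist

confidence = 0.95
tail_area = (1 - confidence) / 2   # 0.025 left over in each tail

# z-value that leaves tail_area to its right on the standard normal curve
z = NormalDist().inv_cdf(1 - tail_area)

print(f"{z:.2f}")  # 1.96
```

By symmetry, the lower z-value is the same number with a minus sign.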
We now know where we are on the underlying x-axis, in z units.
We do NOT yet know, in GPA units, where
this 95-percent area begins and ends. To find out, we
need to use our “currency exchange rate,” the standard
error of the estimate. Recall, we estimated its value to
be 0.031 (above). In other words, when we travel a
distance of 1 z on the x-axis, this is equivalent to
traveling 0.031 GPA units on the curve. Since we need
to travel slightly less than 2 z, the distance in GPA units
is simply 1.96 x 0.031, or a distance of 0.061 GPA.
Because we need to move both to the left and to the right
of the mean GPA, we need to SUBTRACT this value
from the mean of the sampling distribution to get the
lower confidence limit and ADD this value to the mean of
the sampling distribution to get the upper confidence
limit.
We presume that the value of the mean of the sampling
distribution—the point estimate—is the sample mean,
2.58. Thus, the lower confidence limit is 2.52 and the
upper confidence limit is 2.64. In other words, we are 95
percent confident that the “true” undergraduate GPA is
equal to or greater than 2.52 and equal to or less than
2.64. Written out step by step:
$$\begin{aligned} \mathrm{LCL} &= 2.58 - (1.96)(0.031) \\ &= 2.58 - 0.061 \\ &= 2.52 \end{aligned}$$
and
$$\begin{aligned} \mathrm{UCL} &= 2.58 + (1.96)(0.031) \\ &= 2.58 + 0.061 \\ &= 2.64 \end{aligned}$$
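As a check on the arithmetic, the two limits can be computed with a small helper. The function name `confidence_limits` and its parameters are hypothetical, introduced only for this sketch.

```python
import math

def confidence_limits(mean, sd, n, z):
    """Return (LCL, UCL) for a sample mean, given a z-value."""
    standard_error = sd / math.sqrt(n)   # the "exchange rate"
    margin = z * standard_error          # distance from the mean in GPA units
    return mean - margin, mean + margin

lcl, ucl = confidence_limits(mean=2.58, sd=0.44, n=200, z=1.96)
print(f"{lcl:.2f} {ucl:.2f}")  # 2.52 2.64
```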
Let's say we want to be 99 percent confident.
We would need to travel to the right AND to the
left from the mean of the sampling distribution
on the x-axis so as to encompass 99 percent of
the area under the sampling distribution curve.
This means covering 49.5 percent of the area in
the right half and 49.5 percent in the left half
(99 / 2 = 49.5).
Let's look for the z-values for area 0.4950 in Column B of
Table 8.1. We see, however, that there is no such value
in the table. The closest values are 0.4949 and 0.4951.
However, we know how to interpolate. The z-values that
we are looking for are half-way between 2.57 and 2.58.
Interpolating, we arrive at z = ± 2.575.
$$\begin{aligned} \mathrm{LCL} &= 2.58 - (2.575)(0.031) \\ &= 2.58 - 0.0798 \\ &= 2.50 \end{aligned}$$
and
$$\begin{aligned} \mathrm{UCL} &= 2.58 + (2.575)(0.031) \\ &= 2.58 + 0.0798 \\ &= 2.66 \end{aligned}$$
Notice that in being more confident in
capturing the “true” GPA (i.e., in moving
from 95 percent to 99 percent confident)
we WIDEN the confidence limits.
Remember that we can never be certain
(100 percent confident) because the
sampling distribution is asymptotic (never
ends).
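The widening can be seen by computing both intervals side by side. This sketch assumes Python's standard-library `statistics.NormalDist` for the z-values rather than the printed table, so the 99 percent z-value comes out as 2.576 instead of the interpolated 2.575; the limits agree to two decimals either way.

```python
import math
from statistics import NormalDist

mean, sd, n = 2.58, 0.44, 200
standard_error = sd / math.sqrt(n)

limits = {}
for confidence in (0.95, 0.99):
    # z-value that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lcl = mean - z * standard_error
    ucl = mean + z * standard_error
    limits[confidence] = (lcl, ucl)
    print(f"{confidence:.0%}  z = {z:.3f}  LCL = {lcl:.2f}  UCL = {ucl:.2f}")

# 95%  z = 1.960  LCL = 2.52  UCL = 2.64
# 99%  z = 2.576  LCL = 2.50  UCL = 2.66
```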
Now let’s estimate confidence intervals for
proportions. (This example comes from
Sirkin 1999, pp. 256-258.)
An overnight telephone poll of 900
randomly-selected likely voters found that
Candidate A would receive 53 percent of
the vote if the election were to be held
today. What is the estimated proportion
of support for Candidate A in the universe
of likely voters (and the “margin of error”)?
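The algorithm is:
$$\mathrm{CLs} = P_s \pm Z_{\alpha/2} \sqrt{\frac{P_p Q_p}{n}}$$
where P_s is the sample proportion, P_p is the proportion in the
universe, Q_p = 1 - P_p, and n is the sample size.
For the upper confidence limit:
$$\mathrm{UCL} = P_s + Z_{\alpha/2} \sqrt{\frac{P_p Q_p}{n}}$$
And for the lower confidence limit:
$$\mathrm{LCL} = P_s - Z_{\alpha/2} \sqrt{\frac{P_p Q_p}{n}}$$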
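Applied to the poll of 900 likely voters, the calculation might look like the sketch below. Two assumptions are made here that the slide does not state: the 95 percent confidence level (z = 1.96) is used, and because the universe proportion is unknown, the sample proportion (.53) stands in for it under the square root. The function name `proportion_limits` is hypothetical.

```python
import math

def proportion_limits(p_sample, n, z):
    """Return (LCL, UCL, margin of error) for a sample proportion."""
    # Assumption: the unknown universe proportion is replaced by p_sample.
    q_sample = 1 - p_sample
    standard_error = math.sqrt(p_sample * q_sample / n)
    margin = z * standard_error
    return p_sample - margin, p_sample + margin, margin

lcl, ucl, margin = proportion_limits(p_sample=0.53, n=900, z=1.96)
print(f"{margin:.3f} {lcl:.3f} {ucl:.3f}")  # 0.033 0.497 0.563
```

Under these assumptions the margin of error is about 3.3 percentage points, so the interval runs from roughly 49.7 to 56.3 percent.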
Recapitulation
1. In large random samples, the Central Limit
Theorem is assumed to hold.
2. The standard error of the estimate can be
calculated from the standard deviation.
3. The sample mean is used as the point
estimate (i.e., as the estimate of the mean of the sampling
distribution, hence of the “true” universe value).
4. Confidence limits reflect the desired
probability of capturing the “true” universe
value.
5. Desired confidence equates to the two areas
in the center of each half of the normal
distribution.
Recapitulation (continued)
6. The more confidence desired in capturing
the “true” value in the universe, the wider the
confidence intervals.
7. Estimations of proportions in the universe are
calculated in a similar fashion.
Estimation Problem 1
A random sample of 175 college professors drank an
average of 5.5 glasses of chardonnay per week with a
standard deviation of 2.5 glasses. Based upon these
sample statistics and the “Proportions of Area Under
Standard Normal Curve" (Appendix 1, pp. 540-542), supply
the following for the 95 percent confidence level:
1. The point estimate. __________
2. The standard error. __________
3. The upper confidence limit (UCL). __________
4. The lower confidence limit (LCL). __________
Answers to Estimation Problem 1
1. The point estimate. 5.5
2. The standard error. 0.189
3. The upper confidence limit (UCL). 5.870
4. The lower confidence limit (LCL). 5.130
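These answers can be verified with the same steps used for the GPA example. The sketch below is illustrative; it uses z = 1.96 for the 95 percent confidence level, and the variable names are not from the original.

```python
import math

mean, sd, n, z = 5.5, 2.5, 175, 1.96

standard_error = sd / math.sqrt(n)   # 2.5 / sqrt(175)
ucl = mean + z * standard_error
lcl = mean - z * standard_error

print(f"{standard_error:.3f} {ucl:.3f} {lcl:.3f}")  # 0.189 5.870 5.130
```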
Estimation Problem 2
Fifty-one percent of an overnight random sample of 600
likely voters favored George W. Bush over Al Gore.
Assume that these were the only two candidates surveyed
in this poll and that there were no “undecideds.”
Supply the following statistics for the 95 percent
confidence level:
1. Support for Bush in the universe. __________
2. Support for Gore in the universe. __________
3. The “margin of error.” __________
4. The upper confidence limit (UCL). __________
5. The lower confidence limit (LCL). __________
6. Is the race “too close to call”? __________
Answers to Estimation Problem 2
1. Support for Bush in the universe. .51
2. Support for Gore in the universe. .49
3. The “margin of error.” .04
4. The upper confidence limit (UCL). .55
5. The lower confidence limit (LCL). .47
6. Is the race “too close to call”? Yes