1 Reminder of Definitions
There are many means and standard deviations that arise in our discussions.
symbol   definition
µ        population mean
σ        population standard deviation
x̄        sample mean (different for each sample)
s        sample standard deviation (different for each sample)
µx̄       mean of sample mean
σx̄       standard deviation of sample mean
sx̄       estimate of σx̄ (different for each sample)
2 Unknown Population Standard Deviation and Small Sample Sizes
Throughout we assume that the sample size n is at most 5% of the population size N. This allows us to
ignore the finite population correction factor when talking about the standard deviation of the sample mean.
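As a rough check of the 5% rule, the sketch below compares the standard deviation of the sample mean with and without the finite population correction factor. The factor √((N − n)/(N − 1)) is the usual textbook form and is not spelled out in these notes. The function names and the illustrative values (N = 10000, n = 16, σ = 800) are assumptions made for this example only.

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def sd_of_sample_mean(sigma, n, N=None):
    """sigma / sqrt(n), multiplied by the correction factor when N is given."""
    base = sigma / math.sqrt(n)
    return base if N is None else base * fpc(N, n)

# Illustrative values (assumed): population of 10000, sample of 16, sigma = 800.
uncorrected = sd_of_sample_mean(800, 16)
corrected = sd_of_sample_mean(800, 16, N=10000)
print(round(uncorrected, 2), round(corrected, 2))

# Even at the 5% boundary (n = 500 out of N = 10000) the factor stays close to 1.
print(round(fpc(10000, 500), 4))
```

When n is a small fraction of N the factor is very close to 1, which is why it can reasonably be ignored under the 5% assumption.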
When the population standard deviation σ is not known, it can be approximated by the sample standard
deviation s. The standard deviation of the sample mean is σx̄ = σ/√n. We define the analogous quantity
sx̄ = s/√n.
First suppose that the sample size is at least 30. In this case, s is a good approximation of σ and thus sx̄
is a good approximation of σx̄. By the central limit theorem, (x̄ − µ)/σx̄ approximately follows a standard
normal distribution, no matter what the distribution of the underlying population data. Therefore,
(x̄ − µ)/sx̄ very nearly follows a standard normal distribution.
Now suppose that the sample size is at least 2 but less than 30. Because we cannot use the central limit
theorem here, we must assume that the population data set is normally distributed in order to conclude
that (x̄ − µ)/σx̄ follows a standard normal distribution. However, now it is no longer true that sx̄ is a reliable
approximation of σx̄. It turns out that, in this case, (x̄ − µ)/sx̄ follows a Student t distribution with n − 1
“degrees of freedom”.
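To see how much the choice of distribution matters for small samples, the sketch below compares upper-tail critical values of the Student t distribution with the corresponding standard normal value. It uses only the Python standard library: the t density is integrated numerically with Simpson's rule, and the critical value is found by bisection. The helper names (t_pdf, t_cdf, t_upper_critical) and the numerical settings are illustrative assumptions, not part of the notes.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=400):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half

def t_upper_critical(area, df):
    """Value t with upper-tail area P(T > t) = area, found by bisection."""
    lo, hi = 0.0, 100.0  # assumed search range, wide enough for these cases
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > area:
            lo = mid  # tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.95)  # standard normal value for upper-tail area 0.05
for df in (2, 5, 15, 29):
    print(df, round(t_upper_critical(0.05, df), 3), round(z, 3))
```

For every choice of degrees of freedom the t value is larger than the z value, and the gap shrinks as the degrees of freedom grow, which is consistent with the rule of thumb that the normal approximation is acceptable once n is at least 30.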
To solve confidence interval problems, we proceed essentially the same way as we did before. However, instead
of looking up z-scores on the standard normal table, we will be looking up t-scores on the Student t table.
Remember that the Student t table is arranged differently from the standard normal table.
We will be considering the following example. Suppose that there are a total of 10000 minivans on a road.
We obtain a sample of 16 minivans. The mean weight of the minivans in our sample is 4000 pounds. The
standard deviation of the sample is 800 pounds. Remember that computing this sample standard deviation
requires a division by 15 and not by 16.
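The division by n − 1 rather than n can be illustrated with a short sketch. The four weights below are hypothetical values made up for this example (the notes do not list the 16 individual weights), and the variable names are likewise assumptions.

```python
import math
import statistics

# Hypothetical weights in pounds, made up for illustration only.
weights = [3200, 4100, 4800, 3900]

n = len(weights)
mean = sum(weights) / n
squared_deviations = sum((w - mean) ** 2 for w in weights)

s = math.sqrt(squared_deviations / (n - 1))      # sample standard deviation
divide_by_n = math.sqrt(squared_deviations / n)  # what dividing by n would give

# statistics.stdev divides by n - 1; statistics.pstdev divides by n.
print(s, statistics.stdev(weights))
print(divide_by_n, statistics.pstdev(weights))
```

Dividing by n − 1 always gives the larger of the two values, and the difference is most noticeable for small samples.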
3 Confidence Intervals from Confidence Levels
Our goal here is to find a 90% confidence interval for the mean weight of the entire population of minivans.
First, we convert the confidence level to a t-score. Since our sample size is 16, we consider the Student t
distribution with 15 degrees of freedom. We want to find the value t such that the shaded area in the figure
on the left below is 0.9. Equivalently, the shaded area on the figure on the right below is 0.05.
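[Figure: two Student t curves with the horizontal axis marked at −t, 0 and t. Shaded areas: 0.9 (between −t and t) and 0.05 (one tail).]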
We locate the row of the Student t table for 15 degrees of freedom and the column for an area of 0.05. The
value in the body of the table is t = 1.753.
Second, we convert the t-score to a maximum sampling error. To do this, we need to compute sx̄ = 800/√16 =
800/4 = 200. Then E = t · sx̄ = 1.753 · 200 = 350.6.
Third, we convert the maximum sampling error to a confidence interval. The left endpoint of the interval is
x̄ − E = 4000 − 350.6 = 3649.4 and the right endpoint is x̄ + E = 4000 + 350.6 = 4350.6.
We say that we are 90% confident that the true mean weight of the entire population of minivans is between
3649.4 and 4350.6 pounds.
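The three steps of this section can be sketched in a few lines of Python. The function name and return layout are assumptions chosen for illustration, and the t-score 1.753 is taken from the table lookup described in the text rather than computed.

```python
import math

def t_confidence_interval(sample_mean, s, n, t_score):
    """Return (s_xbar, E, left, right) for a t-based confidence interval."""
    s_xbar = s / math.sqrt(n)   # estimate of the standard deviation of the sample mean
    E = t_score * s_xbar        # maximum sampling error
    return s_xbar, E, sample_mean - E, sample_mean + E

# Minivan example: sample mean 4000, s = 800, n = 16, t = 1.753 (15 degrees of freedom).
s_xbar, E, left, right = t_confidence_interval(4000, 800, 16, 1.753)
print(round(s_xbar, 1), round(E, 1), round(left, 1), round(right, 1))
```

Passing the t-score in as an argument keeps the table lookup separate from the arithmetic, which mirrors how the calculation is organized in the text.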
4 Confidence Levels from Confidence Intervals
Our goal here is to find out how confident we can be that the mean weight of the entire population of
minivans is between 3480 and 4520 pounds.
First, we convert the confidence interval to a maximum sampling error. The width of the confidence interval
is 4520 − 3480 = 1040, so E = 1040/2 = 520. Alternatively, you could have just observed that the left endpoint
of the interval is 520 less than the sample mean, and the right endpoint of the interval is 520 more than the
sample mean.
Next we convert the maximum sampling error to a t-score. In the last exercise, we computed sx̄ = 200. Then
t = E/sx̄ = 520/200 = 2.6.
Third, we convert the t-score to a confidence level. We locate the row of the Student t table for 15 degrees of
freedom. Next we try to locate t = 2.6 in the body of the table in this row. The closest match is 2.602 in the
column for an area of 0.01. This means that the shaded area in the figure on the left below is 0.01. Equivalently,
the shaded area on the figure on the right below is 1 − 2(0.01) = 0.98.
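[Figure: two Student t curves with the horizontal axis marked at −2.6, 0 and 2.6. Shaded areas: 0.01 (one tail) and 0.98 (between −2.6 and 2.6).]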
We say that we are 98% confident that the true mean weight of the entire population of minivans is between
3480 and 4520 pounds.
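The reverse calculation can also be checked numerically instead of with the table. The sketch below is one possible approach, using only the Python standard library: it integrates the Student t density with Simpson's rule to get the area between −t and t. The helper names (t_pdf, t_cdf) and the number of integration steps are assumptions made for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=400):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half

# Minivan example: interval 3480 to 4520, s_xbar = 200, 15 degrees of freedom.
E = (4520 - 3480) / 2                # maximum sampling error
t = E / 200                          # t-score
upper_tail = 1 - t_cdf(t, 15)        # area to the right of t
confidence_level = 1 - 2 * upper_tail
print(E, t, round(upper_tail, 3), round(confidence_level, 3))
```

Because t = 2.6 is slightly below the table entry 2.602, the computed tail area comes out marginally above 0.01, so the confidence level is very slightly under 98%. This agrees with the table-based answer to the precision the table allows.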