Elementary Statistics
Triola, Elementary Statistics 11/e
Unit 14 The Confidence Interval for Means, σ Unknown
Estimation
Unit 14 The Confidence Interval for Means, σ Unknown (Section 7-4)
We are now ready to begin our exploration of how we make estimates of the population mean. Before
we get started, I want to emphasize the importance of having collected a representative sample, i.e. one
that is a simple random sample. Without that, our estimates are useless.
The best estimate of the mean that is available to us is $\bar{x}$, the mean of our sample. However, we do
not expect $\bar{x}$ to equal $\mu$. Therefore this single estimate, while a good start, is of limited use
because we do not know how far off from $\mu$ we might be. What we need is a Lower Bound and an Upper
Bound such that we can have some confidence that $\mu$ falls between these two limits.
The first thing we need to quantify is the word “confidence”. As an example, let’s say that we want to be
95% confident, and we now explore what that means, other than “pretty darn sure”. First note that we
are working with averages, $\bar{x}$ and $\mu$. That means that the probability distribution we will be
working with is the sampling distribution of the mean, whose mean is $\mu_{\bar{x}}$ and whose standard
deviation is $\sigma_{\bar{x}}$. According to the Central Limit Theorem, $\mu_{\bar{x}} = \mu$ and
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, and so we will be working with those latter values.
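The relationship $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ can be checked with a short simulation. The sketch below is only an illustration: the population (normal, mean 50, standard deviation 10), the sample size of 25, and the number of repetitions are made-up values that do not come from this unit.

```python
import math
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
MU = 50.0
SIGMA = 10.0
N = 25            # size of each sample
REPS = 20_000     # number of samples drawn

rng = random.Random(14)  # fixed seed so the run is repeatable

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPS)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theory_sd = SIGMA / math.sqrt(N)   # sigma / sqrt(n)

print(f"mean of the sample means: {mean_of_means:.3f} (mu = {MU})")
print(f"sd of the sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theory_sd:.3f})")
```

The two printed pairs should be close to each other, which is what the formulas above predict.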
Now picture the sampling distribution with $\mu$ at its center. All possible $\bar{x}$ are in the sampling
distribution somewhere, and so if we find a value $E$ such that the interval $\mu \pm E$, which is centered
on $\mu$, captures 95% of the area under the curve, it will also capture 95% of all possible $\bar{x}$.
Unfortunately, I don’t have any good pictures to show this, so I’ll lecture on this in class. One last
piece: if 95% of the $\bar{x}$ lie within $E$ of $\mu$, then $\mu$ must lie within $E$ of 95% of the $\bar{x}$.
To put this another way, 95% of the time we take a sample, we’re going to get an $\bar{x}$ such that $\mu$
lies in $\bar{x} \pm E$. Therefore this is our 95% confidence interval, and $E$ is called the margin of error.
Now all we have to do is find $E$. Unfortunately, this is easier said than done. Let’s go back to our
sampling distribution. If $\mu \pm E$ encompasses 95% of the area under the normal curve, then $E$ must be
1.96 standard deviation units from $\mu$. Under any normal curve, the central 95% of the area is bounded
by $1.96\sigma$ on either side of the mean. Hence, $E = 1.96\,\sigma_{\bar{x}} = 1.96\,\frac{\sigma}{\sqrt{n}}$.
We’re done, right? Well, not exactly.
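To see what “95% confident” means in practice, the following sketch repeatedly draws samples from a population whose $\mu$ and $\sigma$ are known and counts how often the interval $\bar{x} \pm E$ contains $\mu$. The population values, the sample size, and the function name `coverage` are assumptions made up for this illustration.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, reps, seed=0):
    """Fraction of samples whose interval x_bar +/- E contains mu (sigma known)."""
    rng = random.Random(seed)
    e = 1.96 * sigma / math.sqrt(n)   # margin of error, E = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - e <= mu <= x_bar + e:
            hits += 1
    return hits / reps

# Hypothetical population and sample size, for illustration only.
rate = coverage(mu=50.0, sigma=10.0, n=25, reps=20_000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should land near 0.95. Note that this sketch uses the true $\sigma$, which is exactly what we do not have in a real problem.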
Remember, we set out to estimate $\mu$ because we didn’t know it. So how would we know $\sigma$ if we don’t
know $\mu$? Fortunately, for very large samples, say size 100 or larger, we can use $s$, the sample standard
deviation, in place of $\sigma$. Therefore, we can have $E = 1.96\,\frac{s}{\sqrt{n}}$, but what do we do
about smaller samples, say size 20 or 30?
This problem wasn’t solved until around the turn of the 20th century, when William Gosset, working for
the Guinness Brewery company, worked out a probability distribution that could be used to perform
quality control tests using small samples. He published it under the pseudonym “Student”, which is why it
is called the Student t distribution. This distribution is very similar in shape to the Normal
distribution, except that it is wider, i.e. it has a larger standard deviation. Furthermore, the size of
the standard deviation depends on the sample size: the smaller the sample, the larger the standard
deviation. Take a look at the following figure, which shows different shapes for the t-distribution as a
function of sample size and compares them to the Normal distribution.
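The effect of sample size on the width of the t-distribution can also be put in numbers. The sketch below relies on a standard result that is not derived in these notes: a t-distribution with df degrees of freedom (df = n − 1, explained later in this unit) has variance df / (df − 2) when df > 2. The function name `t_sd` is made up for this illustration.

```python
import math

def t_sd(n):
    """Standard deviation of the t-distribution for sample size n (needs n > 3)."""
    df = n - 1                        # degrees of freedom
    return math.sqrt(df / (df - 2))   # variance of t is df / (df - 2) when df > 2

for n in (5, 10, 20, 30, 100):
    print(f"n = {n:3d}: sd of t = {t_sd(n):.4f}  (standard normal sd = 1)")
```

The standard deviation is always above 1 and shrinks toward 1 as the sample grows, which is why large samples can get away with the Normal distribution.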
Here are a few more rules for working with the t-distribution. If you know for a fact, or you strongly
suspect (because you carefully examined the histogram of your sample) that the underlying population
is normally distributed, then the sample size is not that important other than its effect on the shape of
the t-curve. However, if you suspect that the underlying population is not all that normally shaped, then
your sample size should be at least 30.
Nomenclature
When we were working with the Standard Normal distribution, we called the horizontal axis the z-axis.
The z value on the right-hand border of the 95% area centered under the curve was called a critical value
and denoted $z_{\alpha/2}$. $\alpha$ is called the significance and is the sum of the areas of the tails,
i.e. the area under the curve to the left and right of the centered 95% area. Hence, in this case
$\alpha = 0.05$. See the figure below.
The critical value corresponding to a centered 95% area is $z_{\alpha/2} = 1.96$. (In Excel,
1.96 = NORM.S.INV(.975).)
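For readers working outside Excel, a rough Python counterpart of NORM.S.INV is `NormalDist().inv_cdf` from the standard library. The helper name `z_critical` below is made up for this sketch.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Critical value z_(alpha/2) for a central area equal to `confidence`."""
    alpha = 1.0 - confidence              # total area in the two tails
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.4f}")
```

For 95% confidence the argument passed to `inv_cdf` is 0.975, the same value given to NORM.S.INV above.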
We have a completely analogous situation when it comes to using the t-distribution. The axis is called
the t-axis, the critical value is $t_{\alpha/2}$, and $E = t_{\alpha/2}\,\frac{s}{\sqrt{n}}$. In order to
find $t_{\alpha/2}$ we need to know two things, the confidence level and the size of the sample.
There is one more twist, which involves the degrees of freedom. First, we work with $\alpha$, the
significance, which is one minus the confidence level expressed as a decimal. For a confidence level of
95% we have,
$\alpha = 1.0 - 0.95 = 0.05$
We also need to use the degrees of freedom, which is simply the sample size minus one,
$\text{Deg\_freedom} = n - 1$
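The two inputs above, $\alpha$ and the degrees of freedom, are all that is needed to find the critical value. The sketch below is one possible way to do this with only the Python standard library: it integrates the t density numerically and searches for the t whose two tails add up to $\alpha$. It is not how Excel computes the value, and the function names `t_pdf`, `central_area`, and `t_critical` are made up for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with `df` degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def central_area(t, df, steps=2000):
    """Area under the t curve between -t and +t (Simpson's rule; steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3   # doubled because the curve is symmetric about 0

def t_critical(alpha, df):
    """Two-tailed critical value: the t whose total tail area is `alpha`."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while central_area(hi, df) < target:   # widen until the target is bracketed
        hi *= 2
    for _ in range(60):                    # bisection
        mid = (lo + hi) / 2
        if central_area(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

confidence = 0.95
n = 20
alpha = 1.0 - confidence   # significance
df = n - 1                 # degrees of freedom
print(f"alpha = {alpha:.2f}, df = {df}, t = {t_critical(alpha, df):.4f}")
```

If SciPy is installed, `scipy.stats.t.ppf(1 - alpha / 2, df)` should return the same critical value with far less code.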
Worked Example
We receive a batch of 50,000 washers, and we wish to estimate the average inside diameter of the
washers. We carefully select a simple random sample of size 20 and find that the average inside
diameter is 24.78 mm with a standard deviation of 1.62 mm. We want to calculate a 95% confidence
interval for our estimate of the batch $\mu$. First we calculate $t_{\alpha/2}$ using T.INV.2T with
$\alpha = 0.05$ and $20 - 1 = 19$ degrees of freedom.
Note that the tool uses the word Probability instead of $\alpha$ or Significance.
We see that $t_{\alpha/2} = 2.0930$ and we proceed to calculate $E$,
$E = t_{\alpha/2}\,\frac{s}{\sqrt{n}} = \frac{(2.0930)(1.62)}{\sqrt{20}} = 0.7582$
Finally, the confidence interval is,
$\bar{x} \pm E = 24.78 \pm 0.76 = (24.02, 25.54)$
Below is the Excel spreadsheet that I used to calculate these values. If you double click on the table, you
will bring up a copy of Excel. Then if you select any of the cells, such as the value for t, you will see the
Excel formula in the formula bar, 𝒇𝒙 toward the top.
x        s       n     t           E         x-E         x+E
24.78    1.62    20    2.093024    0.7582    24.02182    25.53818
One last note. Suppose that the manufacturer of the washers had claimed that the average inside
diameter was 25.00 mm. On the basis of this sample, could you refute the claim? You could not, because
25.00 lies within (24.02, 25.54). Hence, there’s no reason to doubt the manufacturer’s claim.
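For anyone who prefers code to a spreadsheet, the worked example and the check of the manufacturer’s claim can be reproduced in a few lines of Python. This is a minimal sketch that takes the critical value 2.093024 as given from the Excel output; the variable names are made up for this illustration.

```python
import math

# Values from the worked example.
x_bar = 24.78   # sample mean, mm
s = 1.62        # sample standard deviation, mm
n = 20          # sample size
t = 2.093024    # critical value for 95% confidence and 19 degrees of freedom

E = t * s / math.sqrt(n)             # margin of error
lower, upper = x_bar - E, x_bar + E  # confidence interval limits

print(f"E = {E:.4f}")                             # E = 0.7582
print(f"interval = ({lower:.2f}, {upper:.2f})")   # interval = (24.02, 25.54)

claimed = 25.00  # manufacturer's claimed mean, mm
print("claim inside interval:", lower <= claimed <= upper)   # True
```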
Now it’s your turn to have some fun. You’ll want to open Excel 2010 and label some cells as I did.
Here’s the situation. Assume the population is normally distributed. For a sample size of 61, the
average weight loss was 4.0 kg with a standard deviation of 6.4 kg. Find a 99% confidence interval for
the mean of the population. Use Excel in exactly the same way I did. See the answer at the end of this
unit.
This is the end of Unit 14.
In class, you will get more practice with these
concepts by working exercises in MyMathLab.
Answers
x      s      n     t           E         x-E       x+E
4      6.4    61    2.660283    2.1799    1.8201    6.1799
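For reference, here is the arithmetic behind the answer table, written out with the same formulas as the worked example. A 99% confidence level gives $\alpha = 0.01$, and $n = 61$ gives 60 degrees of freedom; the critical value is the one shown in the table.

```latex
\begin{aligned}
E &= t_{\alpha/2}\,\frac{s}{\sqrt{n}} = \frac{(2.660283)(6.4)}{\sqrt{61}} \approx 2.1799 \\
\bar{x} \pm E &= 4.0 \pm 2.1799 = (1.8201,\ 6.1799)
\end{aligned}
```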