Download Slide 1 - The University of North Carolina at Chapel Hill

Transcript
Slide 1
Welcome to unit six, lesson one: confidence intervals. Using techniques of statistical
inference, we draw conclusions about a population based on our sample data. So we are
drawing inferences about our population using our sample. Statistical inference provides a
statement of how much confidence we can place in our conclusions. In this lesson we are
going to examine one type of statistical inference referred to as confidence intervals. In
the next lesson, we will examine tests of significance. A confidence interval is a good
descriptive tool to present along with the results of a test, so the two go hand in hand.
Slide 2
Suppose we are interested in estimating the mean diastolic blood pressure for women
between the ages of 50 and 60 living in North Carolina. We’d take a simple random
sample of, say, 200 women and measure their blood pressure. Suppose the average
diastolic blood pressure of these women was 117 mm of mercury. Now what can we say
about the diastolic blood pressure of the population? Remember, we’ve only
measured a random sample, but we’d like to make an inference about the population.
We know that X bar, the sample mean, is an unbiased estimate of MU, the population
mean. But what do we know about its reliability? How confident are we that this value
of 117 is close to the true population mean? A second sample might produce a mean of
115 or 120 millimeters of mercury.
Slide 3
Questions about variation are answered by examining the spread of the sample means. The
population has a mean of MU and a standard deviation of sigma, and the sample mean
follows a distribution with a mean of MU and a standard deviation of sigma
divided by the square root of n. If the population of diastolic blood pressure for women
50 to 60 years of age has mean MU and standard deviation sigma, then in repeated
samples of size 200 the sample mean has an approximately normal distribution with mean MU and
standard deviation sigma divided by the square root of 200. So sigma is the standard deviation of the
observations within a population, and sigma divided by the square root of n is the
standard deviation of a group of means. We can estimate sigma by finding the
standard deviation of the observations within a sample, and we can
estimate sigma divided by the square root of n by finding the standard deviation of a
group of sample means.
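The relationship described on this slide can be checked with a small simulation. This is only a sketch: the population values (mean 117, standard deviation 50), the normal shape of the population, and the function name `sd_of_sample_means` are assumptions chosen for illustration, not something given on the slide.

```python
import math
import random
import statistics


def sd_of_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the SD of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)


# Hypothetical normal population: mean 117, SD 50 (illustration only).
simulated = sd_of_sample_means(mu=117, sigma=50, n=200, num_samples=2000)
theoretical = 50 / math.sqrt(200)

print(f"SD of simulated sample means: {simulated:.2f}")
print(f"sigma / sqrt(n):              {theoretical:.2f}")
```

The two printed numbers should be close to each other, which is the sense in which sigma divided by the square root of n is the standard deviation of a group of means.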
Slide 4
Now suppose we know sigma, the population standard deviation. This might not be
realistic, but later lessons will address this problem. Suppose sigma
equals 50. Since the sample size is 200, the standard deviation of
the mean is 50 divided by the square root of 200, or about 3.5 millimeters of mercury.
The standard deviation of the mean is
frequently referred to as the standard error of the mean, or just the standard error.
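Written out with the sample size of 200 from the earlier slide and the assumed sigma of 50, the standard error works out as follows. This is a worked restatement of the arithmetic, not a formula quoted from the slide.

```latex
\mathrm{SE}(\bar{x}) \;=\; \frac{\sigma}{\sqrt{n}} \;=\; \frac{50}{\sqrt{200}} \;\approx\; 3.54 \ \text{mm Hg}
```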
Slide 5
Remember that in the normal distribution about 95 percent of the probability lies within
two standard deviations of the mean. Since the standard error of the mean is 3.5,
twice the standard error of the mean is 7, two times 3.5. To say that the sample mean
lies within 7 units of the population mean is the same as saying that the population mean
lies within 7 units of the sample mean. So about 95 percent of all samples will capture
the true population mean in the interval from X bar minus 7 to X bar plus 7. For our
sample, that interval runs from 117 minus 7 to 117 plus 7.
Slide 6
Now we’re ready to construct a confidence interval. We say that we are 95 percent
confident that the unknown mean diastolic blood pressure for 50 to 60 year old female
NC residents is between 117 minus twice the standard error of the mean and 117 plus
twice the standard error of the mean. Notice the third set of parentheses here on your
slide. This is the notation we use most often when displaying confidence intervals.
It presents the low number, then the high number. We interpret the interval as being 95 percent
confident that the unknown mean diastolic blood pressure lies between 110 and 124
millimeters of mercury. In other words, if we repeated the sampling procedure many
times, we would expect 95 percent of the resulting intervals to contain the true mean.
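The repeated-sampling interpretation can be illustrated with a simulation. This is a sketch under stated assumptions: in practice the true mean is unknown, so here we pretend it is 117 and that the population is normal with sigma 50, purely so that we can count how often an interval captures it. The function name `coverage_rate` is hypothetical.

```python
import math
import random
import statistics


def coverage_rate(mu, sigma, n, multiplier, num_samples, seed=1):
    """Fraction of intervals xbar +/- multiplier * SE that contain mu."""
    rng = random.Random(seed)
    margin = multiplier * sigma / math.sqrt(n)
    hits = 0
    for _ in range(num_samples):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / num_samples


# Assumed "true" mean of 117, for illustration only.
rate = coverage_rate(mu=117, sigma=50, n=200, multiplier=2, num_samples=2000)
print(f"Fraction of intervals containing the true mean: {rate:.3f}")
```

The printed fraction should land near 0.95. Each simulated sample produces a different interval, and it is the procedure, not any single interval, that succeeds about 95 percent of the time.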
Slide 7
A confidence interval consists of an estimate plus or minus a margin of error. The margin
of error for a 95 percent confidence interval, in this case, is twice the standard error. It
shows how accurate we believe our guess is, based on the variability of the estimate. The plus
and minus symbol is a shorthand way of indicating that we subtract the margin of
error from our estimate to obtain the lower bound for our confidence interval and then we
add the margin of error to our estimate to obtain the upper bound for our confidence
interval.
Slide 8
In the previous example, we used the fact that the sampling distribution of the sample
mean X bar is approximately normal, with a mean of MU and a standard
deviation of sigma divided by the square root of n, where n is the sample size. The sample mean of
diastolic blood pressure is exactly normally distributed if diastolic blood pressure is
normally distributed. So if the data are distributed normally, then the
means from those data are also distributed normally. Even if the distribution of diastolic
blood pressure is not normal, the central limit theorem from unit five states that with
large samples the mean is approximately normally distributed. By a large sample we mean
a sample size greater than 30. We will see a demonstration of the central limit theorem in the activity
for this lesson. We need our estimates to follow a normal distribution so that we can easily
compute confidence intervals. So it’s nice to have data that follow a normal distribution.
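As a rough illustration of the central limit theorem (separate from the lesson activity), the sketch below draws samples from a clearly non-normal, right-skewed population and checks how often the sample mean falls within 1.96 standard errors of the population mean. The exponential population with mean 1 and standard deviation 1, the sample size of 40, and the function name `fraction_within` are all assumptions made for this example.

```python
import math
import random
import statistics


def fraction_within(n, num_samples, z=1.96, seed=2):
    """Share of sample means within z standard errors of the population mean.

    The population is exponential with rate 1, so mu = 1 and sigma = 1.
    """
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    half_width = z * sigma / math.sqrt(n)
    inside = 0
    for _ in range(num_samples):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        if abs(xbar - mu) <= half_width:
            inside += 1
    return inside / num_samples


share = fraction_within(n=40, num_samples=5000)
print(f"Share of sample means within 1.96 SE of mu: {share:.3f}")
```

Even though the individual observations are far from normal, the share should come out near 0.95, which is what the normal approximation for the sample mean predicts.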
Slide 9
In constructing the confidence interval for the mean diastolic blood pressure, we
used the fact that 95 percent of the area under the normal curve is located within plus and minus
two standard deviations of the mean. This was an approximation. Above is the standard
normal curve, and we want area C to equal 95 percent, or .95. The curve is
symmetric, so if you fold it in half it’s a mirror image on either side, and the total area
under the curve must equal one. So we know we have .025 area to the right of C and .025 area to the
left of C: if you have 95 percent in the middle, you have 5 percent left over, half of it,
2.5 percent, in one tail and the other half in the other tail. Notice that the area to the left
of C equals one minus C, all divided by two. One minus .95 is .05, and .05 divided by two equals
.025. We are going to make use of this relationship very often.
Slide 10
Now we find the value from the standard normal distribution with .025 area to the left of it.
We look at Table T-2 in your text. Part of this table is reproduced in the slide here.
Look for the value .025 in the body of the table. This area is associated with a negative Z
value of minus 1.96. We read the ones place and the tenths place of the Z statistic from
the leftmost column, so you get the minus 1.9 from that first column. Then you
read across, and the hundredths place of the Z
statistic comes from the first row, so you go to the .06 column. Because the distribution is
symmetric, the matching positive value is Z equals 1.96. So 95 percent of the area under the standard normal curve is
between plus and minus 1.96 standard deviations of the mean. So the table here is telling
you that a Z value of negative 1.96 has two and a half percent, or .025, to the left of it, and
that’s how much we want in our left tail if we are going to have 95 percent in the
middle.
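If a printed table is not at hand, the same lookup can be done in software. The sketch below uses `NormalDist` from Python's standard `statistics` module; it is offered as a cross-check on the table, not as the method used in the course.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Table lookup in one direction: area to the left of a given z.
area_left = standard_normal.cdf(-1.96)

# Table lookup in the other direction: z with a given area to its left.
z_for_area = standard_normal.inv_cdf(0.025)

print(f"Area to the left of z = -1.96: {area_left:.4f}")
print(f"z with 0.025 area to its left: {z_for_area:.2f}")
```

Both results should agree with the table entry to the number of digits the table shows.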
Slide 11
We’re often interested in having area C equal to 95 percent. However, we need to
be able to determine the Z value associated with any area, not just 95 percent. To do
this you can use the method described in the previous slide. Here are some of
the most commonly used values. If you want to be 90 percent confident, that is, if you want
to have 90 percent in the middle of your curve, you have ten percent left over,
or five percent in each tail. The Z value with five percent to the
right of it is 1.645. As you saw before, when you have a C of 95
percent, with 95 percent of the area in the middle and two and a half percent in each tail, Z is
1.96, and if you want to be 99 percent confident your Z will be 2.576. You can always
look these up, but it’s nice to memorize these values or have them written down
because they are the ones that are most commonly used.
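The three commonly used values can be reproduced from the tail-area relationship, (1 minus C) divided by two. In this sketch the function name `critical_value` is hypothetical, and `NormalDist` from the standard library stands in for the printed table.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Return the positive z with `confidence` area between -z and +z."""
    tail_area = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.2f}: z = {critical_value(c):.3f}")
```

Rounded to three decimals, the results match the values quoted on the slide: 1.645, 1.960, and 2.576.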
Slide 12
Now we can state the general strategy for finding a confidence interval for a population
mean. Choose a simple random sample of size n from a population having an unknown
mean MU and a known standard deviation sigma. A level C confidence interval for MU
is, as you see in the formula on your slide, X bar plus and minus Z times sigma divided by
the square root of n. This interval is exact when the population distribution is normal and
is approximately correct for large n in other cases, that is, if you have data from other
distributions. As long as you have a larger sample size, 30 or more, this equation
will work fine for you.
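The slide itself is not reproduced in this transcript, so the formula below is a reconstruction from the spoken description. The notation z* for the critical value (the Z value with area C between minus z* and plus z*) is an assumption about how the slide writes it.

```latex
\bar{x} \;\pm\; z^{*}\,\frac{\sigma}{\sqrt{n}}
\qquad\text{that is,}\qquad
\left(\bar{x} - z^{*}\frac{\sigma}{\sqrt{n}},\;\; \bar{x} + z^{*}\frac{\sigma}{\sqrt{n}}\right)
```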
Slide 13
Now let’s see an example. Suppose you want a 90 percent confidence interval for the
mean height of tenth grade males at Chapel Hill High School. You take a simple random
sample of 30 tenth grade males. Assume we know the population standard deviation is 5 inches.
The sample mean is 64 inches. Using
the methods discussed previously, you can determine that the Z value associated with 90
percent confidence is 1.645. Now you can substitute the numbers into the equation: the
margin of error is 1.645 times 5 divided by the square root of 30, or about 1.5 inches. The
90 percent confidence interval is 62.5 inches to 65.5 inches. In other words, we are 90
percent confident that the true mean height
of tenth grade males at Chapel Hill High School is between
62.5 inches and 65.5 inches.
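The substitution in this example can be sketched in code. The function name `z_confidence_interval` is hypothetical; it assumes a known population standard deviation, as the slide does, and uses the standard library's `NormalDist` to find the Z value.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Values from the tenth grade height example.
low, high = z_confidence_interval(sample_mean=64, sigma=5, n=30, confidence=0.90)
print(f"90% confidence interval: ({low:.1f}, {high:.1f}) inches")
```

Rounded to one decimal place, the endpoints agree with the 62.5 and 65.5 inches stated in the example.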
Slide 14
It is desirable to have a narrow confidence interval because that means our estimate is
more precise. There are three ways you can decrease
the width of the confidence interval. You can lower the level of confidence, or you can
increase the sample size. Those are the first two ways. You can also reduce the
population variability, but it’s less obvious how you can do that; it’s usually
just given to be whatever it is. It can be achieved by carefully controlling the
measurement process or by restricting our attention to only part of a large population.
A certain part of your population might have less variability in height or IQ or something.
If you concentrate on just those people and make them your population, then your
variance will be smaller. But that may not work for your experiment, so it
depends on what works. Usually you either lower the level of confidence or increase the
sample size in order to end up with a narrower confidence interval.
Slide 15
Here we use the example of the mean height of tenth graders to illustrate the effect
confidence level has on the size of the confidence interval. So we’re going to see how
changing your confidence level will change the width of your confidence interval. At 90
percent confidence, the confidence interval is 62.5 to 65.5 inches, as we saw in a previous
slide. Often we use the margin of error to describe the size of a confidence interval. The
margin of error is half the width of the interval. You can also think of the
margin of error as the Z value times the standard error. Since you add the Z value times
the standard error to your sample
mean and subtract it from your sample mean, that value is half the width of your confidence
interval. So these are two different ways of
describing the same thing. At 90 percent confidence the margin of error is 1.5, at 95
percent confidence it is 1.79, and at 99 percent confidence it is 2.35. So you can see that
as you want to be more confident, the margin of error gets bigger and the confidence
interval gets wider. A wider interval contains more values, so you can be
more confident that the true mean
value is in that interval.
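The three margins of error quoted here can be recomputed from the height example (sigma of 5 inches, sample size 30). The function name `margin_of_error` is hypothetical, and the sketch assumes sigma is known.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """Z value times the standard error, for a known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)


# Sample size held at 30; only the confidence level changes.
margins_by_level = {
    level: margin_of_error(sigma=5, n=30, confidence=level)
    for level in (0.90, 0.95, 0.99)
}
for level, m in margins_by_level.items():
    print(f"{level:.0%} confidence: margin of error = {m:.2f} inches")
```

With everything else held fixed, raising the confidence level raises the Z value and therefore the margin of error.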
Slide 16
Here we use the example of the mean height of tenth graders to illustrate the effect sample
size has on the size of the confidence interval. For each example the confidence level
has been fixed at 90 percent. So we are 90 percent confident for each one, but how many
males do we have in our sample? As the sample size increases, we see that the margin of error
decreases. For a sample size of 30, the margin of error is 1.5; it
reduces to 1.1 with a sample size of 60 and to .8 with a sample
size of 100. So what you are seeing here is that if you can increase your sample size, this
will reduce the width of your confidence interval. But of course the trade-off is that the
larger the sample, the more costly your experiment.
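The effect of sample size can be tabulated the same way, holding the confidence level at 90 percent and sigma at 5 inches. As before, `margin_of_error` is a hypothetical helper name used only for this sketch.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """Z value times the standard error, for a known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)


# Confidence held at 90 percent; only the sample size changes.
margins_by_n = {
    n: margin_of_error(sigma=5, n=n, confidence=0.90) for n in (30, 60, 100)
}
for n, m in margins_by_n.items():
    print(f"n = {n:3d}: margin of error = {m:.1f} inches")
```

Because n enters through its square root, the gains shrink as the sample grows: cutting the margin of error in half requires four times the sample size.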
Slide 17
The first equation on the slide is the formula for the confidence interval. And Z times
sigma divided by the square root of n is the margin of error. If we call this value m and
solve for the sample size then we can determine the sample size needed to achieve a
particular margin of error. So the bottom equation here is the formula for the sample size
and will be illustrated with the example in the next slide. But basically we are saying if
we state what the margin of error is, then we can determine what size of sample we need
to collect to be able to have that size of margin of error.
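Solving m = Z times sigma divided by the square root of n for n gives n = (Z times sigma divided by m) squared, rounded up to a whole number. The sketch below applies that to the tenth grade height example with a target margin of error of 1 inch; that target and the function name `required_sample_size` are assumptions chosen for illustration and do not come from the slides.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence):
    """Smallest n whose margin of error is at most `margin` (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, because a fraction of an observation cannot be sampled.
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical target: margin of error of 1 inch at 90 percent confidence.
n_needed = required_sample_size(sigma=5, margin=1, confidence=0.90)
print(f"Sample size needed: {n_needed}")
```

Always round up rather than to the nearest integer, since rounding down would leave the margin of error slightly larger than the target.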
Slide 18
Let’s return to the problem of estimating the diastolic blood pressure. We know the
population standard deviation is 50 millimeters of mercury. Suppose we would like to
estimate the mean with a margin of error of 5 at 95 percent confidence. To determine
the sample size, we use the sample size formula. With Z equal to 1.96, we take 1.96 times
50 divided by 5, which is 19.6, and 19.6 squared is 384.16. Since we cannot sample a fractional part
of a person, we round the number up to 385. So we need a sample size of 385 to be able
to estimate our population mean with a margin of error of 5.
Slide 19
Now you can take the quiz to check your understanding of this material. This
concludes the tutorial for lesson 1 of unit 6.