Statistics Central Limit Theorem Unit Plan

Central Limit Theorem
The Central Limit Theorem (CLT) is one of the most
important ideas in Statistics. It allows us to model a wide
variety of phenomena and make astoundingly accurate
predictions. However, very specific conditions need to apply
in order for the CLT to be valid.
If you use the CLT where it is not valid, all sorts of disasters
may await. Consider yourself warned!!
Before we get to the CLT itself, we need a few definitions:
A sampling distribution of sample means is the probability
distribution of the sample means obtained when samples of a
fixed size n are repeatedly taken from a population (with
replacement).
Note that this set no longer consists of data points, like it
has for the entirety of the class up to now, but of the
means of various samples.
For example, take a data set A. Its elements are data points
and can be written thusly:
$A = \{x_1, x_2, x_3, x_4, \ldots\}$
However, we are now dealing with sets of means. Take the
set of means B. Its elements are means and can be written
thusly:
$B = \{\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \ldots\}$
The mean of a sampling distribution is the same as the
mean of the population from which it was drawn:
$\mu_{\bar{x}} = \mu$
The variance of a sampling distribution is the variance of
the population from which it was drawn, divided by n, the
sample size:
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$
And thus the standard deviation can be given as:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
We now have in mathematical form the very important
idea that we've been talking about for most of the year:
The larger the sample size, the less uncertainty in the
result. (Remember, standard deviation is a measure of risk
or uncertainty.)
By the way, the standard deviation of a set of sample
means has a special name: the standard error.
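
If you'd like to see the two formulas above in action, here is a
minimal simulation sketch. The population is made up for illustration
(10,000 values spread uniformly between 0 and 100), and the names
`population` and `sample_means` are my own, not from the textbook. It
draws many samples with replacement and compares the mean and standard
deviation of the sample means against $\mu$ and $\sigma / \sqrt{n}$.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption for this sketch):
# 10,000 values drawn uniformly between 0 and 100.
population = [random.uniform(0, 100) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)


def sample_means(pop, n, repeats):
    """Draw `repeats` samples of size n (with replacement); return their means."""
    return [statistics.fmean(random.choices(pop, k=n)) for _ in range(repeats)]


for n in (4, 25, 100):
    means = sample_means(population, n, repeats=5_000)
    observed_se = statistics.pstdev(means)   # spread of the sample means
    predicted_se = sigma / n ** 0.5          # sigma / sqrt(n)
    print(
        f"n={n:>3}  mean of means={statistics.fmean(means):.2f} (mu={mu:.2f})  "
        f"observed SE={observed_se:.3f}  predicted SE={predicted_se:.3f}"
    )
```

The observed and predicted standard errors should land close to each
other, and both should shrink as n grows.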
Now we are ready for the CLT itself. It states:
1.) If samples of a fixed size n, with $n \geq 30$, are drawn from
any population, then the set of sample means
approximates a normal distribution.
2.) OR, if samples of any fixed size are drawn from a
normally-distributed population, then the set of sample
means is also normally distributed.
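
Part 1 can be checked with a quick experiment. The sketch below
assumes a strongly skewed population (an exponential distribution,
chosen only for illustration) and uses a rough yardstick for
"looks normal": in a normal distribution about 68% of values fall
within one standard deviation of the mean. The helper names are
hypothetical.

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

# Share of a normal distribution within one standard deviation of its mean.
NORMAL_SHARE = NormalDist().cdf(1) - NormalDist().cdf(-1)


def fraction_within_one_sd(values):
    """Share of values lying within one standard deviation of their own mean."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(1 for v in values if abs(v - m) <= s) / len(values)


def draw_sample_mean(n):
    """Mean of n independent draws from the assumed exponential population."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])


raw_draws = [random.expovariate(1.0) for _ in range(5_000)]   # data points
means_n30 = [draw_sample_mean(30) for _ in range(5_000)]      # sample means

print(f"normal benchmark:        {NORMAL_SHARE:.3f}")
print(f"raw skewed data:         {fraction_within_one_sd(raw_draws):.3f}")
print(f"sample means, n = 30:    {fraction_within_one_sd(means_n30):.3f}")
```

The raw data should miss the benchmark by a wide margin, while the
sample means should sit close to it. This is only one rough check of
normality, not a proof.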
p. 251 Example 4:
First you have to read the graph and realize the population
we're concerned with is only very young drivers (between
15 and 19). The mean of this part of the sample is $\bar{x} = 25$
and we are told that the standard deviation of the
population is $\sigma = 1.5$. (The fact that we are just given this
parameter is the one unrealistic thing about this problem.)
The sample size is 50, which is greater than 30, so we're
justified in using the CLT, even though it doesn't tell us that
the original population was normal (it doesn't matter).
So we know that our sample mean of 25 came from
somewhere within a normally distributed set of all possible
sample means from this population. What are the mean
and standard deviation of this set? Well, we know that its
mean is still 25, just like the original data set, and its
standard deviation is given by
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{50}} \approx 0.2121$
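
The same arithmetic as a short Python check. The function name
`standard_error` is just a label chosen for this sketch.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sample means: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Values from Example 4: sigma = 1.5, n = 50.
print(round(standard_error(1.5, 50), 4))  # 0.2121
```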
We are now asked to answer the question "What is the
probability that the real mean ($\mu$) is somewhere between
24.7 and 25.5 minutes?" In other words,
$P(24.7 < \bar{x} < 25.5) =$ _____
Well, we know how to do these problems already! They're
just the "between" problems from the last section! Just
make sure you're using the "new" standard deviation,
0.2121, and not the "original"!! (This is the most common
mistake in this section.)
Evaluating the expression gives us $P = 0.9116$.
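
Here is a sketch of that evaluation using Python's built-in
`statistics.NormalDist`. It assumes the sampling distribution is
normal with mean 25 and standard deviation $1.5 / \sqrt{50}$, as set up
above. The 0.9116 figure matches what you get when the z-scores are
rounded to two decimals, the way a z-table forces you to; without that
rounding the result is about 0.912.

```python
from statistics import NormalDist

# Sampling distribution of the mean from Example 4 (mean 25, sigma 1.5, n 50).
sampling_dist = NormalDist(mu=25, sigma=1.5 / 50 ** 0.5)

# Direct evaluation, no rounding of z-scores.
p_exact = sampling_dist.cdf(25.5) - sampling_dist.cdf(24.7)

# Table-style evaluation: round each z-score to two decimals first.
z_low = round((24.7 - 25) / sampling_dist.stdev, 2)
z_high = round((25.5 - 25) / sampling_dist.stdev, 2)
p_table = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)

print(f"z-scores: {z_low}, {z_high}")
print(f"table-style P = {p_table:.4f}")
print(f"unrounded   P = {p_exact:.4f}")
```

If you plug in 1.5 instead of 0.2121 for the standard deviation (the
common mistake), the probability comes out far too small.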
In other words, based on our data, we are 91.16%
confident that the true mean is between 24.7 and 25.5
minutes. This is called a confidence interval. (Although
most confidence intervals are symmetric about the mean,
they don't have to be.)
Now try p.252 "Try it yourself" #4. Note that nothing
changes from Example 4 except the sample size goes from
50 to 100. Notice how that affects the confidence level for
the same range of bounds...
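
Once you've worked it by hand, a sketch like the one below can be used
to check your answer. It assumes the same mean (25) and population
standard deviation (1.5) as Example 4 and only changes n. The function
name `prob_between` is made up for this sketch, and it does not round
z-scores, so expect small differences from a table-based answer.

```python
from statistics import NormalDist


def prob_between(low, high, mu, sigma, n):
    """P(low < sample mean < high) when the sample means are normal
    with mean mu and standard deviation sigma / sqrt(n)."""
    dist = NormalDist(mu=mu, sigma=sigma / n ** 0.5)
    return dist.cdf(high) - dist.cdf(low)


for n in (50, 100):
    p = prob_between(24.7, 25.5, mu=25, sigma=1.5, n=n)
    print(f"n = {n:>3}: P(24.7 < x-bar < 25.5) = {p:.4f}")
```

The larger sample gives a smaller standard error, so more of the
sampling distribution falls inside the same bounds.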
Continue with examples 5,6
HW: p.254 #1-8, p.256-7 #21-34
Continue with CLT worksheets