Central Limit Theorem
The Central Limit Theorem (CLT) is one of the most
important ideas in Statistics. It allows us to model a wide
variety of phenomena and make astoundingly accurate
predictions. However, very specific conditions must hold
in order for the CLT to be valid.
If you use the CLT where it is not valid, all sorts of disasters
may await. Consider yourself warned!!
Before we get to the CLT itself, we need a few definitions:
A sampling distribution is the probability distribution of a
set of sample means when samples of a fixed size n are
repeatedly taken from a population (with replacement).
Note that this set no longer consists of data points, as it
has for the entire class up to now, but of the
means of various samples.
For example, take a data set A. Its elements are data points
and can be written thusly:
$A = \{x_1, x_2, x_3, x_4, \dots\}$
However, we are now dealing with sets of means. Take the
set of means B. Its elements are means and can be written
thusly:
$B = \{\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \dots\}$
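To make the set B concrete, here is a minimal sketch that builds it by simulation: it repeatedly draws samples of size n from a data set A with replacement and records each sample's mean. The population values, the function name sampling_distribution, and the seed are made up for illustration only.

```python
import random

def sampling_distribution(population, n, num_samples, seed=0):
    """Hypothetical helper: return a list of sample means (the set B).

    Each sample of size n is drawn WITH replacement, as in the definition.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = rng.choices(population, k=n)  # k draws with replacement
        means.append(sum(sample) / n)          # one element of B
    return means

# Hypothetical data set A (made-up data points)
A = [2, 4, 4, 5, 7, 9, 10, 15]
B = sampling_distribution(A, n=5, num_samples=1000)
print(B[:5])  # the first few sample means
```

Each element of B is a mean, not a data point, which is exactly the distinction drawn above.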
The mean of a sampling distribution is the same as the
mean of the population from which it was drawn:
$\mu_{\bar{x}} = \mu$
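One way to check this claim exactly, rather than by simulation, is to list every possible sample for a tiny population. The sketch below assumes a made-up population {1, 2, 6} and samples of size n = 2 drawn with replacement, so there are 3 × 3 = 9 possible ordered samples; fractions keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 6]  # hypothetical tiny population
n = 2                   # sample size

# Every ordered sample of size n drawn with replacement: 3**2 = 9 samples
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

mu = Fraction(sum(population), len(population))   # population mean
mu_xbar = sum(sample_means) / len(sample_means)   # mean of all sample means
print(mu, mu_xbar)  # 3 3
```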
The variance of a sampling distribution is the variance of
the population from which it was drawn, divided by n, the
sample size:
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$
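The same exhaustive listing can confirm the variance rule. This sketch reuses the hypothetical population {1, 2, 6} with n = 2 and computes population variances (dividing by the number of values, not by one less), since we are working with every possible sample rather than a sample of them.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 6]  # hypothetical tiny population
n = 2                   # sample size

def pop_variance(values):
    """Population variance: average squared deviation from the mean (divide by N)."""
    m = Fraction(sum(values), len(values))
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)

# All 9 possible sample means for samples of size 2 drawn with replacement
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

sigma2 = pop_variance(population)        # sigma^2 of the population
sigma2_xbar = pop_variance(sample_means) # variance of the sampling distribution
print(sigma2, sigma2_xbar, sigma2 / n)   # 14/3 7/3 7/3
```

The variance of the sample means (7/3) is exactly the population variance (14/3) divided by n = 2.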
And thus the standard deviation can be given as:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
We now have in mathematical form the very important
idea that we've been talking about for most of the year:
The larger the sample size, the less uncertainty in the
result. (Remember, standard deviation is a measure of risk
or uncertainty.)
By the way, the standard deviation of a set of sample
means has a special name: the standard error.
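A quick numeric sketch of "larger sample, less uncertainty": the standard error is sigma divided by the square root of n, so quadrupling the sample size cuts it in half. The population standard deviation of 10 and the function name standard_error are hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation of 10
for n in (1, 4, 25, 100):
    print(n, standard_error(10, n))  # 10.0, 5.0, 2.0, 1.0
```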
Now we are ready for the CLT itself. It states:
1.) If samples of a fixed size n, where $n \ge 30$, are drawn from
any population, then the set of sample means
approximately follows a normal distribution.
2.) OR, if samples of any fixed size are drawn from a
normally-distributed population, then the set of sample
means is also normally distributed.
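Part 1 can be seen in a simulation. The sketch below assumes a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration), takes 5000 samples of size 30, and checks that the sample means center near the population mean, spread out by about sigma divided by the square root of n, and fall within one standard error of the mean roughly 68% of the time, as a normal distribution would.

```python
import random
import statistics

rng = random.Random(42)

def draw_sample(n):
    """Hypothetical skewed population: exponential with mean 1 (so sigma = 1)."""
    return [rng.expovariate(1.0) for _ in range(n)]

n = 30
means = [statistics.fmean(draw_sample(n)) for _ in range(5000)]

se = 1.0 / n ** 0.5                  # predicted standard error: sigma / sqrt(n)
center = statistics.fmean(means)     # should be close to mu = 1
spread = statistics.pstdev(means)    # should be close to se
# Fraction of sample means within one standard error of mu (about 0.68 if normal)
within_1se = sum(abs(m - 1.0) < se for m in means) / len(means)

print(round(center, 3), round(spread, 3), round(se, 3), round(within_1se, 3))
```

Even though individual values from this population are far from normal, the means of samples of 30 behave very nearly normally.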
p. 251 Example 4:
First you have to read the graph and realize that the population
we're concerned with is only very young drivers (ages 15
to 19). The mean of this part of the sample is $\bar{x} = 25$ minutes,
and we are told that the standard deviation of the
population is $\sigma = 1.5$ minutes. (The fact that we are simply given this
parameter is the one unrealistic thing about this problem.)
The sample size is 50, which is greater than 30, so we're
justified in using the CLT, even though the problem doesn't tell us whether
the original population was normal (it doesn't matter).
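The decision "can I use the CLT here?" comes down to the two conditions stated above, which can be written as a small check. The function name clt_applies is hypothetical, and the n >= 30 cutoff is the rule of thumb used in this handout.

```python
def clt_applies(n, population_is_normal=False):
    """Hypothetical check of the two CLT conditions stated in this handout.

    Condition 1: n >= 30, any population.
    Condition 2: the population is known to be normal, any n.
    """
    return n >= 30 or population_is_normal

print(clt_applies(50))        # True: Example 4, n = 50, population shape unknown
print(clt_applies(12))        # False: small n, population not known to be normal
print(clt_applies(12, True))  # True: normal population, any sample size
```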
So we know that our sample mean of 25 came from
somewhere within a normally distributed set of all possible
sample means from this population. What are the mean
and standard deviation of this set? Well, we know that its
mean is still 25, just like the original data set, and its
standard deviation is given by
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{50}} \approx 0.2121$
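The arithmetic for Example 4's standard error, using the values given in the problem (sigma = 1.5 minutes, n = 50):

```python
import math

sigma = 1.5  # population standard deviation given in Example 4 (minutes)
n = 50       # sample size

se = sigma / math.sqrt(n)  # standard error of the mean
print(round(se, 4))        # 0.2121
```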
We are now asked to answer the question "What is the
probability that the real mean ($\mu$) is somewhere between
24.7 and 25.5 minutes?" In other words,
$P(24.7 < \bar{x} < 25.5) =$ _____
Well, we know how to do these problems already! They're
just the "between" problems from the last section! Just
make sure you're using the "new" standard deviation,
0.2121, and not the "original" 1.5!! (This is the most common
mistake in this section.)
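Converting each bound to a z-score gives $z = \frac{24.7 - 25}{0.2121} \approx -1.41$ and $z = \frac{25.5 - 25}{0.2121} \approx 2.36$. The standard normal table gives areas of 0.0793 and 0.9909, so $P = 0.9909 - 0.0793 = 0.9116$.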
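If you want to check the table lookup, here is a minimal sketch that computes the standard normal CDF with math.erf (the helper name phi is ours, not from the textbook). It shows both the table-style answer, with z rounded to two decimals and areas to four, and the value from unrounded z-scores, which differs only in the fourth decimal place.

```python
import math

def phi(z):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 25, 1.5, 50
se = sigma / math.sqrt(n)  # the "new" standard deviation, about 0.2121

z_low = (24.7 - mu) / se   # about -1.41
z_high = (25.5 - mu) / se  # about 2.36

# Table method: round z to two decimals and each area to four decimals
table_p = round(phi(round(z_high, 2)), 4) - round(phi(round(z_low, 2)), 4)

# Same calculation with unrounded z-scores
exact_p = phi(z_high) - phi(z_low)

print(round(table_p, 4), round(exact_p, 4))  # 0.9116 0.9121
```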
In other words, based on our data, we are 91.16%
confident that the true mean is between 24.7 and 25.5
minutes. This is called a confidence interval. (Although
most confidence intervals are symmetric about the mean,
they don't have to be.)
Now try p. 252 "Try it yourself" #4. Note that nothing
changes from Example 4 except that the sample size goes from
50 to 100. Notice how that affects the confidence level for
the same range of bounds...
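To check your work, the sketch below wraps the whole "between" calculation in one hypothetical helper, prob_between, which assumes the CLT applies so that the sample mean is normal with mean mu and standard deviation sigma divided by the square root of n. It then compares n = 50 with n = 100 for the same bounds. Values use unrounded z-scores, so a table lookup may differ slightly in the last digit.

```python
import math

def phi(z):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_between(low, high, mu, sigma, n):
    """P(low < x-bar < high), assuming x-bar ~ Normal(mu, sigma / sqrt(n))."""
    se = sigma / math.sqrt(n)
    return phi((high - mu) / se) - phi((low - mu) / se)

for n in (50, 100):
    print(n, round(prob_between(24.7, 25.5, 25, 1.5, n), 4))
```

Doubling the sample size shrinks the standard error from about 0.2121 to 0.15, so the same bounds now capture more of the sampling distribution.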
Continue with Examples 5 and 6.
HW: p.254 #1-8, p.256-7 #21-34
Continue with CLT worksheets