The Central Limit Theorem
Business Statistics
Plan for Today
• Sampling distribution of sample means
• The standard error of the mean
• The Central Limit Theorem
• The Law of Large Numbers
• z-scores for samples
• Examples
• The Central Limit Theorem for Sums
Example: sampling with replacement
• Population: { 3, 7, 8 }. Mean = 6, st.dev. = 2.16
• Possible samples of size 2:
{3,3}, {3,7}, {3,8}, {7,3}, {7,7}, {7,8}, {8,3}, {8,7}, {8,8}
The sample means: 3, 5, 5.5, 5, 7, 7.5, 5.5, 7.5, 8
Consider this as a new population, the population of
all samples of size 2. We can compute:
Mean = 6 -- the same as for the original population
St.dev. = 1.53 -- smaller by a factor of $\sqrt{2} \approx 1.41$
This is an example of a sampling distribution.
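The enumeration above can be reproduced with a short Python sketch that uses only the standard library. It lists all 9 ordered samples with `itertools.product`, takes their means, and treats both the original values and the 9 means as whole populations, hence `statistics.pstdev` rather than the sample standard deviation. The variable names are illustrative.

```python
from itertools import product
from statistics import mean, pstdev

# Population from the example
population = [3, 7, 8]

# All ordered samples of size 2, drawn with replacement (9 in total)
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

print("sample means:", sample_means)
# The 9 sample means are treated as a population in their own right
print("original: mean", mean(population), "st.dev.", round(pstdev(population), 2))
print("sample means: mean", mean(sample_means),
      "st.dev.", round(pstdev(sample_means), 2))
print("ratio of st.devs.:", round(pstdev(population) / pstdev(sample_means), 3))
```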
Sampling distribution of sample means
• Start with a distribution with mean $\mu$ and
standard deviation $\sigma$.
• Pick a number n, the size of the samples.
• For each sample of size n, compute its
sample mean $\bar{x}$. Do this many times:
$\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots$ to get a new
distribution: the distribution of sample means.
Denote by $\mu_{\bar{x}}$ its mean, and by $\sigma_{\bar{x}}$ its
standard deviation.
Sampling distribution of sample means
• Two formulas for such a distribution:
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
• Check them against the previous example: $\mu_{\bar{x}} = 6 = \mu$ and $\sigma_{\bar{x}} = 2.16/\sqrt{2} \approx 1.53$.
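A sketch that checks both formulas exactly, without simulation, for the population {3, 7, 8}: for each sample size n it enumerates every ordered sample drawn with replacement and compares the mean and standard deviation of the resulting sample means with $\mu$ and $\sigma/\sqrt{n}$. The number of samples grows as $3^n$, so only small n are used; the range 1 to 4 is an arbitrary choice.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [3, 7, 8]
mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation

# Compare the exact sampling distribution (all samples with replacement)
# with the formulas mu_xbar = mu and sigma_xbar = sigma / sqrt(n)
for n in range(1, 5):
    means = [mean(s) for s in product(population, repeat=n)]
    print(f"n={n}: mu_xbar={mean(means):.4f} (mu={mu}), "
          f"sigma_xbar={pstdev(means):.4f} vs sigma/sqrt(n)={sigma / sqrt(n):.4f}")
```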
Sampling distribution of sample means
• The quantity $\sigma_{\bar{x}}$, the standard
deviation of the sampling distribution, is
called the standard error of the mean.
• An important implication of the formula
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$ is that the sample size must be
quadrupled (multiplied by 4) to halve the
standard error. When designing statistical
studies where cost is a factor, this helps in
weighing cost–benefit tradeoffs.
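A minimal sketch of the cost-benefit arithmetic: solving $\sigma/\sqrt{n} \le E$ for n gives $n \ge (\sigma/E)^2$, so halving the target error E multiplies the required sample size by 4. The function name `required_sample_size` and the value $\sigma = 10$ are hypothetical, chosen only for illustration.

```python
from math import ceil

def required_sample_size(sigma: float, target_se: float) -> int:
    """Smallest n with sigma / sqrt(n) <= target_se (hypothetical helper)."""
    # sigma / sqrt(n) <= E  is equivalent to  n >= (sigma / E) ** 2
    return ceil((sigma / target_se) ** 2)

sigma = 10.0  # hypothetical population standard deviation
for target in (2.0, 1.0, 0.5):
    print(f"target standard error {target}: n = {required_sample_size(sigma, target)}")
```

With $\sigma = 10$, the targets 2, 1 and 0.5 give n = 25, 100 and 400: each halving of the error costs four times as many observations.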
Normal Distribution
Underlying Distribution
• When the original (or underlying) distribution is
normal, or nearly normal, then the distribution of
sample means will also be normal (or nearly so), even for small
values of the sample size n:
If $X \sim N(\mu, \sigma)$, then $\bar{X} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
• When the underlying distribution is arbitrary, then
the distribution of sample means will still approach
the normal distribution, but for larger values of n.
• As a rule of thumb, if the original distribution is symmetric,
then n ≥ 20 usually suffices. If it is not symmetric (skewed), then
the required value of n could be 30 or higher.
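A rough simulation sketch of this behavior, assuming an exponential underlying distribution with mean 1 as a stand-in for a skewed "arbitrary" distribution (its skewness is 2). The skewness of the sample means should shrink toward 0, the value for a normal distribution, as n grows; for this distribution the exact value is $2/\sqrt{n}$. The seed, the 20,000 repetitions and the helper names are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def skewness(values):
    """Skewness as the average cubed z-score (population form)."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

def simulate_sample_means(n, reps, rng):
    """Means of `reps` samples of size n.
    Assumption: exponential underlying distribution with mean 1 (skewed)."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(12345)  # fixed seed for reproducibility
results = {}
for n in (2, 5, 30):
    means = simulate_sample_means(n, 20_000, rng)
    results[n] = skewness(means)
    print(f"n={n:2d}: skewness of sample means ~ {results[n]:.2f} "
          f"(theory 2/sqrt(n) = {2 / sqrt(n):.2f})")
```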
[Figure: Uniform Distribution; Distributions of Sample Means]
The Central Limit Theorem
The sampling distribution of sample
means will approximately follow the
normal distribution $N(\mu, \sigma/\sqrt{n})$ for
big enough values of the sample size
n, regardless of the underlying
distribution.
The Central Limit Theorem
It does not matter what the distribution of the
original population is, or even whether you
know it. The important fact is that the
distribution of sample means tends to follow
the normal distribution.
The Central Limit Theorem (CLT) shows how
important the normal distribution is: it describes
the distribution of sample means for any
underlying distribution with a finite standard
deviation, provided the sample size is large enough.
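To see the theorem at work, the sketch below assumes a uniform underlying distribution on [0, 1] (mean 0.5, standard deviation $1/\sqrt{12}$), draws many samples of size n = 30, standardizes each sample mean with $\sigma/\sqrt{n}$, and counts how often it lands within ±1.96. Under an exact normal distribution that happens about 95% of the time. The seed and repetition count are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

# Assumption: Uniform(0, 1) as the underlying (non-normal) distribution
mu, sigma = 0.5, 1 / sqrt(12)
n, reps = 30, 20_000
rng = random.Random(2024)  # fixed seed for reproducibility

z_values = []
for _ in range(reps):
    x_bar = fmean(rng.random() for _ in range(n))
    z_values.append((x_bar - mu) / (sigma / sqrt(n)))

inside = sum(abs(z) <= 1.96 for z in z_values) / reps
normal_prob = NormalDist().cdf(1.96) - NormalDist().cdf(-1.96)
print(f"simulated P(|z| <= 1.96) = {inside:.3f}, normal model = {normal_prob:.3f}")
```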
The Law of Large Numbers
• When the sample size n gets progressively bigger,
the sample mean $\bar{x}$ gets closer and closer to the
population mean $\mu$.
• This law is a consequence of the CLT.
• It implies that the probability of $\bar{x}$ estimating $\mu$
within a given error increases as the sample
size n gets bigger.
• The probability of an error larger than any given
amount can be made as small as we wish by picking
a sufficiently large sample size n. (But larger
samples require more resources.)
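A small simulation sketch of the law, assuming a fair six-sided die (population mean $\mu = 3.5$) as the population; the seed and the checkpoints are arbitrary. The running sample mean settles near 3.5 as n grows, although a single run can wander noticeably at small n.

```python
import random

# Assumption: the population is a fair six-sided die, so mu = 3.5
rng = random.Random(7)  # fixed seed for reproducibility
checkpoints = {10, 100, 1_000, 10_000, 100_000}
total = 0
running_means = {}
for n in range(1, 100_001):
    total += rng.randint(1, 6)  # one roll of a fair die
    if n in checkpoints:
        running_means[n] = total / n
        print(f"n={n:>6}: sample mean = {running_means[n]:.4f} (mu = 3.5)")
```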
z-scores for samples
• When the CLT applies, and we consider a
sample of size n with mean $\bar{x}$, taken from a
population with mean $\mu$ and standard
deviation $\sigma$, its z-score is given by the formula:
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
• When computing with a calculator, use
parentheses appropriately: $(\bar{x} - \mu) \div (\sigma \div \sqrt{n})$
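The formula translates directly into a small helper; computing the denominator as its own expression avoids the precedence slip the calculator tip warns about. `z_score_of_mean` is a hypothetical name, and the usage line borrows the numbers of Example 1(b): $\mu = 69$, $\sigma = 2.5$, n = 9, $\bar{x} = 70$.

```python
from math import sqrt

def z_score_of_mean(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """z-score of a sample mean (hypothetical helper): (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / sqrt(n)  # denominator computed first, as with parentheses
    return (x_bar - mu) / standard_error

print(z_score_of_mean(70, 69, 2.5, 9))
```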
Example 1
• The heights of 18-year-old Canadian men are
approximately normally distributed with a mean of 69
inches and a standard deviation of 2.5 in.
(a) What is the probability that the height of a
randomly selected 18-year-old Canadian male is
between 68 and 70 inches?
(b) What is the probability that in a group of 9
randomly selected 18-year-old Canadian males, the
average height is between 68 and 70 inches?
Answer: (a) 31.08% (b) 76.98%
(do on board)
Why is there such a drastic difference in the answers?
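Both answers can be checked with `statistics.NormalDist` from Python's standard library in place of a z-table. This is a sketch: the exact normal CDF gives about 0.7699 for part (b), which matches the table-based 76.98% up to rounding. The two models also hint at the answer to the question: the average of 9 heights has standard error $2.5/\sqrt{9} \approx 0.83$ in, three times narrower than the spread of a single height.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 69, 2.5, 9

single = NormalDist(mu, sigma)              # height of one randomly chosen man
averages = NormalDist(mu, sigma / sqrt(n))  # mean height of 9 men (CLT model)

p_a = single.cdf(70) - single.cdf(68)
p_b = averages.cdf(70) - averages.cdf(68)
print(f"(a) {p_a:.2%}   (b) {p_b:.2%}")
```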
Example 2
• The scores on a stress test for factory employees
follow an unknown distribution with a mean of 32
and a standard deviation of 11.5.
(a) Find the probability that the average stress score
for a group of 50 employees is less than 27.
(b) Find the 90th percentile for the average stress
score for a group of 50 employees.
(Solve on board.)
Answers: (a) 0.11%
(b) $\bar{x} = \mu + z \cdot \dfrac{\sigma}{\sqrt{n}} = 32 + 1.28 \cdot \dfrac{11.5}{\sqrt{50}} \approx 34.08$
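A sketch checking both answers with `statistics.NormalDist`, which models the average of 50 scores as $N(32, 11.5/\sqrt{50})$ by the CLT. The call `inv_cdf(0.90)` plays the role of looking up $z \approx 1.28$ in a table, so its result may differ from the hand computation only in the last digits.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32, 11.5, 50
averages = NormalDist(mu, sigma / sqrt(n))  # CLT model for the mean of 50 scores

p_less_27 = averages.cdf(27)   # part (a): P(average < 27)
p90 = averages.inv_cdf(0.90)   # part (b): 90th percentile of the average
print(f"(a) {p_less_27:.2%}   (b) {p90:.2f}")
```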
Example 3
Practice: The lifespan of Typhoon X50 vacuum
cleaners under heavy usage is normally
distributed with a mean of 17 years and a
standard deviation of 8 years. A cleaning
company bought 25 brand new Typhoon X50
vacuum cleaners. What is the probability that
the average lifespan of this batch will be more
than 15 years?
Answer: 89.44%
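The practice answer can be verified the same way. This sketch models the average lifespan of the 25 cleaners as $N(17, 8/\sqrt{25}) = N(17, 1.6)$ and takes the upper tail above 15 years.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 17, 8, 25
averages = NormalDist(mu, sigma / sqrt(n))  # standard error = 8 / 5 = 1.6 years

p_more_15 = 1 - averages.cdf(15)  # upper tail: average lifespan above 15 years
print(f"P(average lifespan > 15 years) = {p_more_15:.2%}")
```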
Example 4
The average weight of a bag of flour is 2.047
kg, with a standard deviation of 0.013 kg.
Within what limits would the middle 90% of
the sampling distribution of sample means fall
for samples of size 10?
Solution: the middle 90% lies between the 5th and
95th percentiles. The z-score with an area of 0.45
between 0 and z is 1.645.
Thus, $\bar{x}$ falls between $\mu \pm 1.645 \cdot \sigma/\sqrt{n}$,
that is, between 2.0402 kg and 2.0538 kg.
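A sketch of the same computation with `statistics.NormalDist`: the middle 90% of the sampling distribution runs from its 5th to its 95th percentile, which `inv_cdf` returns directly instead of going through the table value 1.645.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.047, 0.013, 10
averages = NormalDist(mu, sigma / sqrt(n))  # sampling distribution of the mean

# Middle 90%: between the 5th and 95th percentiles
low, high = averages.inv_cdf(0.05), averages.inv_cdf(0.95)
print(f"{low:.4f} kg to {high:.4f} kg")
```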
The Central Limit Theorem for Sums
If you keep drawing larger and larger samples
and taking their sums, the sums form their
own distribution (the sampling distribution
of sums), which approaches a normal
distribution as the sample size increases. This
normal distribution has a mean equal to the
original mean multiplied by the sample size
and a standard deviation equal to the original
standard deviation multiplied by the square
root of the sample size:
$\sum x \sim N(n\mu, \sqrt{n}\,\sigma)$
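A simulation sketch of the version for sums, assuming a fair die ($\mu = 3.5$, $\sigma = \sqrt{35/12} \approx 1.71$) as the underlying distribution and samples of size n = 40. The seed and repetition count are arbitrary. The sums should have a mean close to $n\mu = 140$ and a standard deviation close to $\sqrt{n}\,\sigma \approx 10.8$.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

# Assumption: underlying distribution is one roll of a fair six-sided die
mu, sigma = 3.5, sqrt(35 / 12)
n, reps = 40, 20_000
rng = random.Random(99)  # fixed seed for reproducibility

# Each entry is the sum of one sample of n rolls
sums = [sum(rng.randint(1, 6) for _ in range(n)) for _ in range(reps)]

print(f"mean of sums    {fmean(sums):.2f}  vs n*mu          = {n * mu:.2f}")
print(f"st.dev. of sums {pstdev(sums):.2f}  vs sqrt(n)*sigma = {sqrt(n) * sigma:.2f}")
```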