The Central Limit Theorem
Business Statistics

Plan for Today
• Sampling distribution of sample means
• The standard error of the mean
• The Central Limit Theorem
• The Law of Large Numbers
• z-scores for samples
• Examples
• The Central Limit Theorem for Sums

Example: sampling with replacement
• Population: {3, 7, 8}. Mean = 6, st. dev. = 2.16
• Possible samples of size 2: {3,3}, {3,7}, {3,8}, {7,3}, {7,7}, {7,8}, {8,3}, {8,7}, {8,8}
• The sample means: 3, 5, 5.5, 5, 7, 7.5, 5.5, 7.5, 8
Consider this as a new population, the population of all samples of size 2. We can compute:
• Mean = 6, the same as for the original population
• St. dev. = 1.53, approximately 1.4 times smaller (a factor of $\sqrt{2} \approx 1.41$)
This is an example of a sampling distribution.

Sampling distribution of sample means
• Start with a distribution with mean $\mu$ and standard deviation $\sigma$.
• Pick a number n, the size of the samples.
• Draw a sample of size n and compute its sample mean $\bar{x}$. Do this many times: $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \dots$ to get a new distribution: the distribution of sample means. Denote by $\mu_{\bar{x}}$ its mean, and by $\sigma_{\bar{x}}$ its standard deviation.
• Two formulas hold for such a distribution: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
• See how they work in the previous example: $\mu_{\bar{x}} = 6 = \mu$ and $\sigma_{\bar{x}} = 2.16/\sqrt{2} \approx 1.53$.
• The quantity $\sigma_{\bar{x}}$, the standard deviation of the sampling distribution, is called the standard error of the mean.
• An important implication of the formula $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ is that the sample size must be quadrupled (multiplied by 4) to cut the standard error in half. When designing statistical studies where cost is a factor, this helps in weighing cost–benefit tradeoffs.
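The two formulas can be checked by brute force on the small population {3, 7, 8}. The following is a minimal Python sketch, assuming sampling with replacement and population (not sample) standard deviations; the variable names are illustrative only.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# Population from the example; samples are drawn with replacement.
population = [3, 7, 8]
n = 2  # sample size

mu = mean(population)
sigma = pstdev(population)  # population standard deviation

# Enumerate every ordered sample of size n and compute its mean.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# Treat the sample means as a new population.
mu_xbar = mean(sample_means)
sigma_xbar = pstdev(sample_means)

print(f"population:     mean = {mu}, st. dev. = {sigma:.2f}")
print(f"sample means:   mean = {mu_xbar}, st. dev. = {sigma_xbar:.2f}")
print(f"sigma / sqrt(n) = {sigma / sqrt(n):.2f}")
```

The last two printed standard deviations should agree, which is the standard error formula at work.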
Normal Underlying Distribution
• When the original (or underlying) distribution is normal, or nearly normal, the distribution of sample means is also normal, even for small values of the sample size n: if $X \sim N(\mu, \sigma)$, then $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$.
• When the underlying distribution is arbitrary, the distribution of sample means still approaches the normal distribution, but only for larger values of n.
• As a rule of thumb, if the original distribution is symmetric, then $n \ge 20$ usually suffices. If it is not symmetric, the required value of n could be 30 or higher.

[Figure: a uniform underlying distribution and the resulting distributions of sample means]

The Central Limit Theorem
• The sampling distribution of sample means approximately follows the normal distribution $N(\mu, \sigma/\sqrt{n})$ for large enough values of the sample size n, regardless of the underlying distribution.
• It does not matter what the distribution of the original population is, and you do not even need to know it. The important fact is that the distribution of sample means tends to follow the normal distribution.
• The Central Limit Theorem (CLT) shows how important the normal distribution is: it describes the distribution of sample means for any underlying distribution with a finite mean and standard deviation, provided the sample size is large enough.

The Law of Large Numbers
• As the sample size n gets progressively bigger, the sample mean $\bar{x}$ tends to get closer and closer to the population mean $\mu$.
• This law can be viewed as a consequence of the CLT, since the standard error $\sigma/\sqrt{n}$ shrinks toward 0 as n grows.
• It implies that the probability that $\bar{x}$ estimates $\mu$ within a given margin of error increases as the sample size n gets bigger.
• The probability of exceeding a given error can be made as small as we wish by picking a sufficiently large sample size n. (But larger samples require more resources.)
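A small simulation can make the CLT and the Law of Large Numbers concrete. The sketch below is only an illustration under stated assumptions: the underlying distribution is taken to be exponential with mean 1 and standard deviation 1 (a skewed choice that is not from the slides), and the number of repetitions and the helper name `sample_mean` are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed underlying distribution: exponential with mean 1 and st. dev. 1.
mu, sigma = 1.0, 1.0
repetitions = 2000  # number of samples drawn for each sample size

def sample_mean(n):
    """Mean of one random sample of size n from the exponential distribution."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

results = {}
for n in (1, 4, 16, 64):  # each step quadruples the sample size
    means = [sample_mean(n) for _ in range(repetitions)]
    results[n] = (mean(means), stdev(means))
    print(f"n = {n:2d}: mean of sample means = {results[n][0]:.3f}, "
          f"st. dev. = {results[n][1]:.3f}, "
          f"sigma/sqrt(n) = {sigma / sqrt(n):.3f}")
```

In a run like this, the mean of the sample means should stay near $\mu$ while their spread tracks $\sigma/\sqrt{n}$, roughly halving each time n is quadrupled, so the sample means concentrate around $\mu$ as n grows.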
z-scores for samples
• When the CLT applies, and we consider a sample of size n with mean $\bar{x}$, taken from a population with mean $\mu$ and standard deviation $\sigma$, its z-score is given by the formula: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
• When computing with a calculator, use parentheses appropriately: $(\bar{x} - \mu) \div (\sigma \div \sqrt{n})$

Example 1
• The heights of 18-year-old Canadian men are approximately normally distributed with a mean of 69 inches and a standard deviation of 2.5 inches.
(a) What is the probability that the height of a randomly selected 18-year-old Canadian male is between 68 and 70 inches?
(b) What is the probability that in a group of 9 randomly selected 18-year-old Canadian males, the average height is between 68 and 70 inches?
Answer: (a) 31.08% (b) 76.98% (do on board)
Why is there such a drastic difference in the answers?

Example 2
• The scores on a stress test for factory employees follow an unknown distribution with a mean of 32 and a standard deviation of 11.5.
(a) Find the probability that the average stress score for a group of 50 employees is less than 27.
(b) Find the 90th percentile for the average stress score for a group of 50 employees.
(Solve on board.)
Answers: (a) 0.11% (b) $\bar{x} = \mu + z \cdot \sigma/\sqrt{n} = 34.08$, with $z \approx 1.28$

Example 3
Practice: The lifespan of Typhoon X50 vacuum cleaners under heavy usage is normally distributed with a mean of 17 years and a standard deviation of 8 years. A cleaning company bought 25 brand new Typhoon X50 vacuum cleaners. What is the probability that the average lifespan of this batch will be more than 15 years?
Answer: 89.44%

Example 4
The average weight of a bag of flour is 2.047 kg with a standard deviation of 0.013 kg. Within what limits would the middle 90% of the sampling distribution of sample means fall for samples of size 10?
Solution: the middle 90% lies between the 5th and 95th percentiles. The z-score that leaves an area of 0.45 between the mean and z is 1.645. Thus, $\bar{x}$ falls between $\mu \pm 1.645 \cdot (\sigma/\sqrt{n})$, that is, between 2.0402 kg and 2.0538 kg.
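The answers to Examples 1 to 4 can be reproduced without a z-table. The following is a hedged sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper `sampling_dist` and the variable names are hypothetical, and results may differ from table-based answers in the last digit because no rounding of z is involved.

```python
from math import sqrt
from statistics import NormalDist

def sampling_dist(mu, sigma, n):
    """Normal model for the sample mean: N(mu, sigma / sqrt(n))."""
    return NormalDist(mu, sigma / sqrt(n))

# Example 1: heights with mu = 69, sigma = 2.5
single = NormalDist(69, 2.5)            # (a) one person
ex1a = single.cdf(70) - single.cdf(68)
group = sampling_dist(69, 2.5, 9)       # (b) mean of 9 people
ex1b = group.cdf(70) - group.cdf(68)

# Example 2: stress scores with mu = 32, sigma = 11.5, n = 50
stress = sampling_dist(32, 11.5, 50)
ex2a = stress.cdf(27)                   # P(sample mean < 27)
ex2b = stress.inv_cdf(0.90)             # 90th percentile of the sample mean

# Example 3: lifespans with mu = 17, sigma = 8, n = 25
vacuums = sampling_dist(17, 8, 25)
ex3 = 1 - vacuums.cdf(15)               # P(sample mean > 15)

# Example 4: bag weights with mu = 2.047, sigma = 0.013, n = 10
bags = sampling_dist(2.047, 0.013, 10)
ex4_low, ex4_high = bags.inv_cdf(0.05), bags.inv_cdf(0.95)

print(f"Example 1: (a) {ex1a:.2%}  (b) {ex1b:.2%}")
print(f"Example 2: (a) {ex2a:.2%}  (b) {ex2b:.2f}")
print(f"Example 3: {ex3:.2%}")
print(f"Example 4: {ex4_low:.4f} kg to {ex4_high:.4f} kg")
```

Comparing parts (a) and (b) of Example 1 shows why the answers differ so much: the standard error for n = 9 is one third of the population standard deviation, so the same interval covers far more of the distribution.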
The Central Limit Theorem for Sums
If you keep drawing larger and larger samples and taking their sums, the sums form their own distribution (the sampling distribution of sums), which approaches a normal distribution as the sample size increases. This normal distribution has a mean equal to the original mean multiplied by the sample size and a standard deviation equal to the original standard deviation multiplied by the square root of the sample size: $\sum X \sim N(n\mu, \sqrt{n}\,\sigma)$.
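The mean and standard deviation of the sums can be verified exactly on a small case. The sketch below reuses the population {3, 7, 8} from the first example and assumes sampling with replacement; the sample sizes tried and the variable names are illustrative choices, not part of the original slides.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# Population from the first example; samples are drawn with replacement.
population = [3, 7, 8]
mu = mean(population)
sigma = pstdev(population)  # population standard deviation

sum_stats = {}
for n in (2, 3, 4):
    # Sum of every ordered sample of size n.
    sums = [sum(s) for s in product(population, repeat=n)]
    sum_stats[n] = (mean(sums), pstdev(sums))
    print(f"n = {n}: mean of sums = {sum_stats[n][0]} (n*mu = {n * mu}), "
          f"st. dev. of sums = {sum_stats[n][1]:.3f} "
          f"(sqrt(n)*sigma = {sqrt(n) * sigma:.3f})")
```

This check confirms only the mean and standard deviation of the sums; the approach to a normal shape needs larger n than can be conveniently enumerated this way.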