Download Statistics Central Limit Theorem Unit Plan

Central Limit Theorem

The Central Limit Theorem (CLT) is one of the most important ideas in Statistics. It allows us to model a wide variety of phenomena and make astoundingly accurate predictions. However, very specific conditions need to apply in order for the CLT to be valid. If you use the CLT where it is not valid, all sorts of disasters may await. Consider yourself warned!!

Before we get to the CLT itself, we need a few definitions:

A sampling distribution is the probability distribution of a set of sample means when samples of a fixed size n are repeatedly taken from a population (with replacement). Note that this set no longer consists of data points, as it has for the entirety of the class up to now, but of the means of various samples.

For example, take a data set A. Its elements are data points and can be written as:

$A = \{x_1, x_2, x_3, x_4, \ldots\}$

However, we are now dealing with sets of means. Take the set of means B. Its elements are means and can be written as:

$B = \{\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \ldots\}$

The mean of a sampling distribution is the same as the mean of the population from which it was drawn:

$\mu_{\bar{x}} = \mu$

The variance of a sampling distribution is the variance of the population from which it was drawn, divided by n, the sample size:

$\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$

And thus the standard deviation can be given as:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

We now have in mathematical form the very important idea that we've been talking about for most of the year: the larger the sample size, the less uncertainty in the result. (Remember, standard deviation is a measure of risk or uncertainty.)

By the way, the standard deviation of a set of sample means has a special name: the standard error.

Now we are ready for the CLT itself. It states:

1.) If samples of a fixed size n, with $n \geq 30$, are drawn from any population with mean $\mu$ and standard deviation $\sigma$, then the set of sample means is approximately normally distributed.

2.) OR, if samples of any fixed size are drawn from a normally distributed population, then the set of sample means is also normally distributed.

p. 251 Example 4:

First you have to read the graph and realize the population we're concerned with is only very young drivers (between 15 and 19). The graph gives the mean for this group as 25 minutes, which we treat as the population mean, $\mu = 25$, and we are told that the standard deviation of the population is $\sigma = 1.5$ minutes. (The fact that we are just given this parameter is the one unrealistic thing about this problem.)

The sample size is 50, which is greater than 30, so we're justified in using the CLT, even though the problem doesn't tell us whether the original population was normal (it doesn't matter). So we know that the mean of any sample of 50 of these drivers comes from somewhere within an approximately normally distributed set of all possible sample means from this population.

What are the mean and standard deviation of this set? Well, we know that its mean is still 25, just like the original population, and its standard deviation is given by

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{1.5}{\sqrt{50}} \approx 0.2121$

We are now asked to answer the question "What is the probability that the mean of a sample of 50 of these drivers is somewhere between 24.7 and 25.5 minutes?" In other words,

$P(24.7 < \bar{x} < 25.5) = $ _____

Well, we know how to do these problems already! They're just the "between" problems from the last section! Just make sure you're using the "new" standard deviation, 0.2121, and not the "original" 1.5!! (This is the most common mistake in this section.)

Evaluating the expression with z-scores rounded to two decimal places ($z \approx -1.41$ and $z \approx 2.36$) gives us

$P \approx 0.9116$

In other words, there is about a 91.16% chance that the mean of a random sample of 50 of these drivers falls between 24.7 and 25.5 minutes. The same reasoning, run in the other direction to estimate an unknown $\mu$ from a sample mean, is what produces a confidence interval. (Although most confidence intervals are symmetric about the mean, they don't have to be.)

Now try p.252 "Try it yourself" #4. Note that nothing changes from Example 4 except the sample size goes from 50 to 100.
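The arithmetic in Example 4 can be checked with a short Python sketch. It is only an illustration: the function name `prob_sample_mean_between` is made up for these notes, and it assumes the mean of 25 minutes is the population mean and that the normal model from the CLT applies. Because it does not round the z-scores the way a printed table does, its answer may differ slightly from the 0.9116 quoted above.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def prob_sample_mean_between(mu, sigma, n, low, high):
    """Return P(low < x-bar < high) for the mean of a sample of size n.

    Assumes the sample means are (approximately) normal with mean mu
    and standard deviation sigma / sqrt(n), as the CLT describes.
    """
    standard_error = sigma / sqrt(n)  # the "new" standard deviation
    sampling_dist = NormalDist(mu=mu, sigma=standard_error)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


# Values from Example 4: mu = 25 minutes, sigma = 1.5 minutes, n = 50.
standard_error_50 = 1.5 / sqrt(50)
p_50 = prob_sample_mean_between(mu=25, sigma=1.5, n=50, low=24.7, high=25.5)

print("standard error:", round(standard_error_50, 4))
print("P(24.7 < x-bar < 25.5):", round(p_50, 4))
```

Calling the same function with `n=100` is one way to check your answer to "Try it yourself" #4 after working it by hand.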
Notice how the change in sample size affects the probability for the same pair of bounds…

Continue with examples 5, 6.

HW: p.254 #1-8, p.256-7 #21-34

Continue with CLT worksheets.
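For anyone who wants to see the CLT rather than take it on faith, the first part of the theorem and the standard error formula can be checked by simulation. The sketch below is an illustration, not part of the textbook: the population (an exponential distribution with mean 1 and standard deviation 1, which is strongly skewed), the sample size of 30, and the number of samples are all assumptions chosen for the demonstration.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
# It is strongly right-skewed, i.e. clearly NOT normal.
mu, sigma = 1.0, 1.0
n = 30                # size of each sample
num_samples = 10_000  # how many samples we draw

# Draw many samples of size n and record the mean of each one.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)    # should be close to mu
sd_of_means = pstdev(sample_means)    # should be close to sigma / sqrt(n)
predicted_se = sigma / sqrt(n)

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the mean; check that for the sample means.
within_one_se = sum(
    abs(m - mu) < predicted_se for m in sample_means
) / num_samples

print("mean of sample means:", round(mean_of_means, 3), "vs mu =", mu)
print("std dev of sample means:", round(sd_of_means, 3),
      "vs sigma/sqrt(n) =", round(predicted_se, 3))
print("fraction within one standard error:", round(within_one_se, 3))
```

Changing `n` to a small value such as 2 or 5 and rerunning is a quick way to watch the normal approximation get worse for a skewed population.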
