Chapter 18 – Like a Ton of Bricks …Hope You’re Insured.

Sampling Distributions Again

In this world, many distributions are not normally distributed. Some distributions might be, but we cannot verify it. Usually, in fact, we can only verify this with a census. A census is dumb.

These distributions are referred to as parent distributions. This is because they have babies. Self-respecting statisticians call the babies child distributions. I prefer to think of them as underloved babies, just like those poor, sweet robot babies.

Each sample size creates a separate child distribution. In other words, there is one distribution for samples of size 2 and another for samples of size 3. And another for samples of size 4. Also, samples of size 5 have a separate distribution.

Also 6. Also 7. Also 12. Also 18. Also 107. And many, many more. Like…one for each whole number, I suppose.

Each one of these child distributions is just a little closer to normal than the original parent distribution. The exception is a parent distribution which starts off normally distributed already. Its child distributions would still, however, also be normal.

The sampling distribution lets us calculate things based on a normal curve for distributions that might not even be normal. It is technically the collection of every possible mean from every single potential sample of that particular size. You would never want to create a sampling distribution ever. An actual sampling distribution is even dumber than a census. The concept of a sampling distribution, however, is handy.

These distributions focus on the average of the sample, and as such outliers tend to have their effect diluted. This can happen because outliers on opposite sides counterbalance each other. It can also happen because most of the elements of the sample will be more typical values.
This leads to a smaller standard deviation than the original distribution had.

Much in the way that other types of children will often resemble their parents, sampling distributions have some things in common with their parent distributions. Sampling distributions have the same mean as the original population. The idea is that the mean of all the sample means is the same as the original mean was in the first place. Waw!?!

Each individual number in the sampling distribution is the mean of a sample. So the sampling distribution is actually an enormous gathering of means. A sample mean convention, as it were. And if we took every single possible mean and averaged them, they would average to the original mean of the parent distribution.

We Care….Why?

These sampling distributions look a lot like Normie. In the absence of paternity tests, we cannot prove anything, however. The resemblance to Normie makes the math easier. Well, once we review how to do z-score type stuff.

Mean and Standard Deviation

We established that the sample means in each sampling distribution tend to centralize around the true mean of the parent distribution. This centralizing effect causes the standard deviation of the sampling distribution to be smaller. For a quantitative variable, in fact, we can take the original standard deviation and divide by the square root of n, which is the sample size.

For proportional data (which we will spend the next 4 chapters discussing) our standard deviation will be the square root of the quotient of the product of the proportion and its complement divided by the sample size. Waw?!? It is actually a simple formula: $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $p$ is the proportion and $q = 1 - p$ is its complement.

The Central Limit Theorem

The Central Limit Theorem is a huge deal in Statistics. If you were to go into a statistician’s bar and talk crap about this theorem, there would be a brawl. It basically states what we were just talking about, but I will recap it.
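A small simulation can make the mean and standard deviation claims above concrete. The sketch below is not from the slides: it assumes a made-up, strongly right-skewed parent population (exponential, mean about 10), and the names `parent`, `simulate_sample_means`, and `results` are hypothetical. It samples with replacement, which keeps the draws independent, and it only approximates a sampling distribution with 5,000 samples per size instead of every possible sample.

```python
import math
import random
import statistics


def simulate_sample_means(parent, n, draws, seed=0):
    """Draw `draws` random samples of size n (with replacement); return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(parent, k=n)) for _ in range(draws)]


# Hypothetical skewed parent population (an assumption for illustration only).
_rng = random.Random(42)
parent = [_rng.expovariate(1 / 10) for _ in range(20_000)]
mu = statistics.fmean(parent)        # parent mean
sigma = statistics.pstdev(parent)    # parent standard deviation

# For each sample size, store (mean of sample means, SD of sample means, sigma / sqrt(n)).
results = {}
for n in (2, 5, 30):
    means = simulate_sample_means(parent, n, draws=5_000, seed=n)
    results[n] = (
        statistics.fmean(means),
        statistics.pstdev(means),
        sigma / math.sqrt(n),
    )
    print(n, [round(value, 2) for value in results[n]])
```

For each sample size, the first printed number should sit close to the parent mean, and the second should sit close to the third, which is the parent standard deviation divided by the square root of n. The match is approximate because the simulation uses a finite number of samples.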
The Central Limit Theorem

Sampling distributions are more normal than their parent distributions, unless the parent was normally distributed.
◦ In which case the sampling distribution stays normal.
Sampling distributions have the same mean as the parent distribution and have a standard deviation which is the standard deviation of the parent distribution divided by the square root of the sample size.

This is important: a distribution that does not start normal will never have normal sampling distributions…they will only be close, at best. As a rule of thumb, a sample size of 30 is usually enough to make nearly any distribution roughly normal in its sampling distribution. There are also conditions to check.

Conditions?

In order to make our assumption of having a roughly normal sampling distribution something other than hubris or arrogance, we have to meet a few requirements. If we can do this, we can bypass the usual effect that assuming has on you and me. This is just plain exciting to me, since usually making assumptions just ends up biting you in the donkey.

The Conditions

Randomizing Condition: The sample must be random.
Independence Condition: Each thing in the sample must be independent.
The 10% Condition: The sample must be less than 10% of the population.
The Large Enough Sample Condition: The sample must be large enough.

That Last One Seemed Vague

True, it seemed vague. It is not actually vague, though. Instead, it changes based on which kind of variable we are looking at. We will clarify it when we get there, but the basic idea is that for proportional data it is having at least 10 successes and 10 failures, and for quantitative data you need to have a distribution that lacks extreme skew or outliers.

Standard Error

P.S. – Instead of calling the standard deviation of the sampling distribution a standard deviation, we will instead call it a standard error.
This is because it is not really a deviation unless we calculate every mean. Instead, it represents how far a specific sample's value tends to fall from the true mean due to sampling error.

Assignments

Chapter 18 – 25 and 27, then 5 and 17. Due Tuesday.
I will be giving more homework due Thursday so don’t fall behind.
Midterm project presentations are in just over two weeks.
There will be a chapter 18+19 Quiz next week.
Read chapter 19 for Monday.

Quiz Bulletpoints

Be able to use z-scores to find probabilities for individuals.
Be able to use z-scores to find probabilities for sample averages.
Be able to use z-scores to find probabilities for sample proportions.
Be able to find a confidence interval for the true proportion based on a sample.
Be able to find the sample size in order to get a desired margin of error.
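The standard error and the first three quiz skills can be tied together in a short sketch. It is an illustration under assumptions, not the course's official method: the function names are hypothetical, the example numbers (a parent with mean 100 and standard deviation 15, a true proportion of 0.5) are made up, and it assumes the conditions for a roughly normal model are already met.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def standard_error_mean(sigma, n):
    """Standard error of a sample mean: parent SD divided by sqrt(n)."""
    return sigma / math.sqrt(n)


def standard_error_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def prob_individual_above(x, mu, sigma):
    """P(one individual > x), using the parent SD (assumes a normal parent)."""
    z = (x - mu) / sigma
    return 1 - STANDARD_NORMAL.cdf(z)


def prob_sample_mean_above(x_bar, mu, sigma, n):
    """P(sample mean > x_bar), using the standard error instead of sigma."""
    z = (x_bar - mu) / standard_error_mean(sigma, n)
    return 1 - STANDARD_NORMAL.cdf(z)


def prob_sample_proportion_above(p_hat, p, n):
    """P(sample proportion > p_hat) for a true proportion p."""
    z = (p_hat - p) / standard_error_proportion(p, n)
    return 1 - STANDARD_NORMAL.cdf(z)


# Made-up example values, chosen only to show the mechanics.
p_individual = prob_individual_above(105, mu=100, sigma=15)
p_mean = prob_sample_mean_above(105, mu=100, sigma=15, n=36)
p_prop = prob_sample_proportion_above(0.6, p=0.5, n=100)
print(p_individual, p_mean, p_prop)
```

With these numbers the standard error of the mean is 15 / 6 = 2.5, so a sample mean of 105 sits at z = 2 even though an individual value of 105 sits at only z = 1/3. That gap is the whole reason sample averages are so much less likely than individuals to land far from the true mean.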