AP Statistics – Part IV: Randomness and Probability
Chapter 18: Sampling Distribution Models (page 410)
• Standard Deviation: the number of successes in n trials has σ = √(npq). Dividing by n gives the standard deviation of the sample proportion: √(npq)/n = √(pq)/√n = √(pq/n).
• A Normal model centered at p with a standard deviation of √(pq/n) is a good model for the collection of proportions found in many random samples of size n from a population with success probability p.
o N(p, √(pq/n))
o Remember the 68-95-99.7 Rule.
• Sampling error is the variability you'd expect to see from one sample to another (also called sampling variability).
• Summary: if we draw repeated random samples of the same size, n, from some population and measure the proportion, p-hat, for each sample, then the collection of these proportions will pile up around the underlying population proportion, p, in such a way that a histogram of the sample proportions can be modeled well by a Normal model.
o The only catch: the model gets better and better as the sample size gets bigger. Samples of size 1 or 2 won't work, but much larger samples have histograms that resemble Normal models.
• Assumptions:
1. The sampled values must be independent of each other.
2. The sample size, n, must be large enough.
• Check the following conditions BEFORE using the Normal model for the distribution of sample proportions:
1. Randomization Condition: the sample should be a simple random sample of the population. At the least, be confident that the sampling method was not biased and that the sample is representative.
2. 10% Condition: when sampling without replacement, the sample size, n, must be no larger than 10% of the population. Beyond this check, the only good protection against failures of independence is to think carefully about possible reasons the data might not be independent; no simple condition guarantees independence.
3. Success/Failure Condition: the sample must be big enough that both np and nq are at least 10, i.e., we expect at least 10 successes and 10 failures.
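The proportion model and its conditions can also be checked numerically. The Python sketch below is an illustration added to these notes, not part of the textbook: `sd_phat`, `normal_model_ok`, and `simulate_phats` are hypothetical helper names, and p = 0.3, n = 100 are arbitrary values. The Randomization Condition is a judgment about how data were collected, so the code does not (and cannot) check it.

```python
import math
import random
import statistics
from typing import List, Optional


def sd_phat(p: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(p*q/n)."""
    q = 1 - p
    return math.sqrt(p * q / n)


def normal_model_ok(p: float, n: int, population_size: Optional[int] = None) -> bool:
    """Check the Success/Failure and 10% Conditions.

    The Randomization Condition is not checked here; it must be judged
    from how the sample was collected.
    """
    q = 1 - p
    success_failure = n * p >= 10 and n * q >= 10
    # The 10% Condition only applies when sampling without replacement
    # from a finite population of known size.
    ten_percent = population_size is None or n <= 0.10 * population_size
    return success_failure and ten_percent


def simulate_phats(p: float, n: int, reps: int, seed: int = 0) -> List[float]:
    """Draw `reps` random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    # Each trial is a success with probability p; summing booleans counts successes.
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


if __name__ == "__main__":
    p, n = 0.3, 100  # arbitrary illustrative values
    phats = simulate_phats(p, n, reps=5000)
    print("conditions met (population 5000):", normal_model_ok(p, n, 5000))
    print("model SD:", sd_phat(p, n))
    print("simulated mean of p-hats:", statistics.mean(phats))
    print("simulated SD of p-hats:", statistics.stdev(phats))
    # Fraction of p-hats within 1 model SD of p (roughly 68% by the 68-95-99.7 Rule)
    within_1sd = sum(abs(x - p) <= sd_phat(p, n) for x in phats) / len(phats)
    print("within 1 SD:", within_1sd)
```

Here the model SD is √(0.3 × 0.7 / 100) ≈ 0.0458, and the simulated p-hats should center near 0.3 with a spread close to that value. Calling `normal_model_ok(0.02, 100)` returns False because np = 2 falls short of 10.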
• A proportion is no longer just something we compute for a set of data. We now see it as a random quantity that has a distribution, called the sampling distribution model for the proportion.
• The sampling model quantifies the variability, telling us how surprising any sample proportion is, and it lets us make informed decisions about how precise our estimate of the true proportion might be.
The Sampling Distribution Model for a Proportion
Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of p-hat is modeled by a Normal model with mean µ(p-hat) = p and standard deviation SD(p-hat) = √(pq/n).
• Step-by-Step: Page 415
• Law of Large Numbers: as the sample size gets larger, each sample average is more likely to be close to the population mean.
• Fundamental Theorem of Statistics: the sampling distribution of any mean becomes more nearly Normal as the sample size grows. All we need is for the observations to be independent and collected with randomization; we don't even care about the shape of the population distribution. This result is the Central Limit Theorem (CLT).
o The CLT tells us that means of repeated random samples tend to follow a Normal model as the sample size grows.
The Central Limit Theorem (CLT)
The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation.
The Sampling Distribution Model for a Mean
When a random sample is drawn from any population with mean µ and standard deviation σ, its sample mean ȳ has a sampling distribution with the same mean µ but whose standard deviation is σ/√n. No matter what population the random sample comes from, the shape of the sampling distribution is approximately Normal as long as the sample size is large enough. The larger the sample, the more closely the Normal model approximates the sampling distribution of the mean.
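A small simulation makes the CLT and the σ/√n rule concrete. This Python sketch is illustrative only: the population is assumed to be exponential with mean 1 and SD 1 (a strongly right-skewed choice made here, not in the notes), and `simulate_means` and `skewness` are hypothetical helper names. As n grows, the simulated means should center near µ = 1, their SD should track σ/√n, and their skewness should shrink toward 0, the value for a symmetric Normal shape.

```python
import math
import random
import statistics
from typing import List

# Assumed population: exponential with rate 1, so mean 1 and SD 1 (strongly right-skewed)
MU, SIGMA = 1.0, 1.0


def simulate_means(n: int, reps: int, seed: int = 0) -> List[float]:
    """Draw `reps` random samples of size n from the skewed population; return each sample mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(xs: List[float]) -> float:
    """Skewness of a list of values: about 0 for a symmetric shape, positive for right skew."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


if __name__ == "__main__":
    for n in (1, 5, 30, 100):
        means = simulate_means(n, reps=4000)
        print(
            f"n={n:>3}  mean={statistics.fmean(means):.3f} (mu={MU})  "
            f"SD={statistics.stdev(means):.3f}  sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f}  "
            f"skew={skewness(means):.2f}"
        )
```

Even though the population itself is far from Normal, the skewness of the sample means drops noticeably by n = 30, which is the "Watch out for small samples from skewed populations" hint in action.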
• When we have categorical data, we calculate the sample proportion, p-hat; its sampling distribution has a Normal model with mean at the true proportion, p, and standard deviation SD(p-hat) = √(pq/n).
• When we have quantitative data, we calculate a sample mean, ȳ; its sampling distribution has a Normal model with mean at the true mean, µ, and standard deviation SD(ȳ) = σ/√n.
• Assumptions and Conditions:
o The CLT requires few assumptions. Check the following conditions:
1. Randomization Condition: data values must be sampled randomly, or the concept of a sampling distribution makes no sense.
2. Independence Assumption: sampled values must be mutually independent. There's no general way to check whether the observations are independent; however, when the sample is drawn without replacement, you should check the 10% Condition.
3. Large Enough Sample Condition: there is no one-size-fits-all rule. If the population is unimodal and symmetric, a fairly small sample is OK; if it is strongly skewed, it takes a pretty large sample. For now, think about the sample size in the context of what you know about the population, then decide whether you believe the Large Enough Sample Condition has been met.
• Step-by-Step: Page 423
• Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error.
o SE(p-hat) = √[(p-hat)(q-hat)/n]
o SE(ȳ) = s/√n
• Hints:
o Don't confuse the sampling distribution with the distribution of the sample.
o Beware of observations that are not independent.
o Watch out for small samples from skewed populations.
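Because p and σ are usually unknown, the standard errors replace them with sample statistics. The Python sketch below computes both standard errors; the function names `se_phat` and `se_mean` and the sample data are hypothetical, chosen only to illustrate the arithmetic. `statistics.stdev` uses the n - 1 denominator, matching the sample standard deviation s.

```python
import math
import statistics
from typing import Sequence


def se_phat(successes: int, n: int) -> float:
    """SE(p-hat) = sqrt(p_hat * q_hat / n), using the observed proportion in place of p."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)


def se_mean(data: Sequence[float]) -> float:
    """SE(y-bar) = s / sqrt(n), using the sample SD s in place of sigma."""
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(data))


if __name__ == "__main__":
    # Hypothetical categorical result: 40 successes in a sample of 100
    print("SE(p-hat):", se_phat(40, 100))
    # Hypothetical quantitative sample
    sample = [2, 4, 4, 4, 5, 5, 7, 9]
    print("SE(y-bar):", se_mean(sample))
```

For these made-up numbers, SE(p-hat) = √(0.4 × 0.6 / 100) ≈ 0.049, and for the sample (mean 5, sum of squared deviations 32) SE(ȳ) = √(32/7)/√8 = √(4/7) ≈ 0.756.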