Download AP Statistics – Part IV: Randomness and Probability • Standard

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
AP Statistics – Part IV: Randomness and Probability
Chapter 18: Sampling Distribution Models (page 410)
• Standard Deviation: σ = √npq = √pq
n
√n
• A Normal model centered at p with a standard deviation of √(pq/n) is a good
model for a collection of proportions found for many random samples of size n
from a population with success probability p.
o N(p, √(pq/n))
o Remember the 68-95-99.7 Rule
• Sampling error is variability you’d expect to see from one sample to another.
Aka: sampling variability
• Summary: if we draw repeated random samples of the same size, n, from some
population and measure the proportion, p-hat, we get for each sample, then the
collection of these proportions will pile up around the underlying population
proportion, p, in such a way that a histogram of the sample proportions can be
modeled well by a Normal model.
o The only catch… the model becomes better and better as the sample
size gets bigger. Samples of size 1 or 2 aren’t going to work, but much
larger samples do have histograms that resemble normal models.
• Assumptions:
1. sampled values must be independent of each other
2. sample size, n, must be large enough
• Check the following conditions BEFORE using the Normal model to model the
distribution of sample proportions.
1. Randomization Condition: Sample should be a simple random
sample of the population. At least be confident that the sampling method
was not biased and that the sample should be representative.
2. 10% Condition: If no replacement, the sample size, n, must be no
larger than 10% of the population. The only good protection from
failures is to think carefully about possible reasons for the data to fail to
be independent. There are no simple conditions to check that guarantee
independence.
3. Success/Failure Condition: sample size has to be big enough so that
both np and nq are at least 10. Need to expect at least 10 success and 10
failures to have enough data.
• No longer is a proportion something we just compute for a set of data. We now
see it as a random quantity that has a distribution. We call that distribution the
sampling distribution model for the proportion.
• The sampling model quantifies the variability, telling us how surprising any
sample proportion is. And, it enables us to make informed decisions about how
precise our estimate of the true proportion might be.
The sampling distribution model for a proportion
Provided that the sampled values are independent and the sample size is large
enough, the sampling distribution of p-hat is modeled by a Normal model with
mean µ(p-hat) = p, and standard deviation SD(p-hat) = √(pq/n)
• Step-by-Step: Page 415
• Law of Large Numbers: as the sample size gets larger, each sample average is
more likely to be closer to the population mean.
• Fundamental Theorem of Statistics: The sampling distribution of any mean
becomes more nearly Normal as the sample size grows. All we need is for the
observations to be independent and collected with randomization. We don’t
even care about the shape of the population distribution. Central Limit
Theorem (CLT).
o CLT tells us that means of repeated random samples will tend to
follow a Normal model as the sample size grows.
The Central Limit Theorem (CLT)
The mean of a random sample has a sampling distribution whose shape can be
approximated by a Normal model. The larger the sample, the better the
approximation will be.
The Sampling Distribution Model for a Mean
When a random sample is drawn from any population with mean µ and standard
deviation σ, its sample mean ӯ has a sampling distribution with the same mean µ
but whose standard deviation is σ/√n. No matter what population the random
sample comes from, the shape of the sampling distribution is approximately
Normal as long as the sample size is large enough. The larger the sample used,
the more closely the Normal approximates the sampling distributions for the
mean.
• When we have categorical data, we calculate the sample proportion, p-hat; it’s
sampling distribution had a Normal model with a mean at the true proportion p,
and a standard deviation of SD(p-hat) = √(pq/n)
• When we have quantitative data, we calculate a sample mean, ӯ, its sampling
distribution has a Normal model with a mean at the true mean, µ, and a
standard deviation of SD(ӯ) = (σ/√n)
• Assumptions and Conditions:
o The CLT requires few assumptions. Check the following conditions:
1. Randomization Condition: data values must be sampled randomly or
the concept of a sampling distribution makes no sense.
2. Independent Assumption: sampled values must be mutually
independent.
There’s no way to check in general whether the
observations are independent. However, when the sample is drawn
without replacement you should check the 10% Condition.
3. Large enough sample condition: there is no one-size-fits-all rule.
Unimodal and symmetric – fairly small sample is OK.
Strongly skewed – takes a pretty large sample.
For now, just think about the sample size in the context of what
you know about the population. Then tell whether you believe
the Large enough sample condition has been met.
• Step-by-step: Page 423
• Whenever we estimate the standard deviation of a sampling distribution, we
call it a standard error.
o SE (p-hat) = √[(p-hat)(q-hat) / n]
o SE (ӯ) = s / √n
• Hints:
o Don’t confuse the sampling distribution with the distribution of
the sample.
o Beware of observations that are not independent.
o Watch out for small samples from skewed populations.