UNIT 3
Section 7
SAMPLING DISTRIBUTIONS
When we describe characteristics of a population, we refer to these as “parameters,” and when we
describe characteristics of a sample, we refer to these as “statistics.” (Characteristics include the mean,
proportion, standard deviation, etc.)
Average GPA of all PBHS students: 𝜇 = 3.59 (a parameter)
Average GPA of an SRS of PBHS students: 𝑥̅ = 3.561 (a statistic)
When information about all individuals of a population is collected to provide a parameter, the
distribution is a “population distribution.”
When information about a sample of individuals is collected to provide a statistic, the distribution
displays “sample data.”
We can then combine the values of the statistic taken from all possible SRSs of size n to provide a
“sampling distribution.” Sampling variability is the chance variation in the values of 𝑥̅ (or any other
statistic) from one sample to the next.
FOR EXAMPLE: Consider the graphs below: there are 200 chips in a bag, 100 red and 100 blue.
Suppose we want to look at the proportion of red chips randomly selected.
The population distribution shows the distribution when selecting all chips (left column); the
distributions of sample data show the results when drawing SRSs of size n = 20 from the bag of chips
(center column); and the sampling distribution shows the distribution of the proportions of red chips
collected from those samples (right column).
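The chip example can also be explored by simulation. The sketch below is one possible way to do it in Python and is not part of the original notes: it builds the bag of 200 chips, draws many SRSs of size n = 20, and collects the sample proportion of red chips from each. The number of repetitions (5,000), the seed, and the function name sample_proportion_red are arbitrary choices for illustration. A true sampling distribution uses all possible samples, so this only approximates it.

```python
import random
import statistics

def sample_proportion_red(bag, n, rng):
    """Draw an SRS of size n (without replacement); return the proportion of red chips."""
    sample = rng.sample(bag, n)
    return sample.count("red") / n

rng = random.Random(0)                 # fixed seed (arbitrary) so the run is repeatable
bag = ["red"] * 100 + ["blue"] * 100   # population from the example: p = 0.5

# Approximate the sampling distribution of p-hat with many repeated SRSs of size 20.
p_hats = [sample_proportion_red(bag, 20, rng) for _ in range(5000)]

mean_p_hat = statistics.mean(p_hats)   # center of the simulated sampling distribution
sd_p_hat = statistics.pstdev(p_hats)   # spread of the simulated sampling distribution
print(f"mean of p-hat values: {mean_p_hat:.3f}")
print(f"SD of p-hat values:   {sd_p_hat:.3f}")
```

The mean of the simulated values should land close to the population proportion 0.5, and a histogram of p_hats would play the role of the right-hand column in the example.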
Central Limit Theorem
The Central Limit Theorem states that, for any population with a finite standard deviation σ, the
sampling distribution of the sample mean gets closer and closer to a Normal distribution as the sample
size increases.
Regarding sample size: if your original population distribution is skewed, then your sampling distribution
will better approximate a Normal distribution if you use a bigger sample size. (If population is Normally
distributed, then you can “get away with” a smaller sample size.)
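The effect of sample size on a skewed population can be checked with a short simulation. The sketch below is illustrative only: it assumes an exponential population (a convenient right-skewed choice, not one named in these notes), and the helper names skewness and simulate_sample_means, the seed, and the sample sizes 2 and 50 are all made up for the example. It measures how lopsided the simulated sampling distribution of the mean is at each sample size.

```python
import random
import statistics

def skewness(values):
    """Mean of cubed z-scores; close to 0 for a symmetric (e.g. Normal) distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_sample_means(n, reps, rng):
    """Return `reps` sample means, each from n draws of a right-skewed population."""
    means = []
    for _ in range(reps):
        draws = [rng.expovariate(1.0) for _ in range(n)]  # ASSUMED population: exponential, mean 1
        means.append(sum(draws) / n)
    return means

rng = random.Random(1)  # fixed seed (arbitrary) so the run is repeatable
small = simulate_sample_means(2, 2000, rng)    # small samples
large = simulate_sample_means(50, 2000, rng)   # larger samples

skew_small = skewness(small)
skew_large = skewness(large)
print(f"skewness of sample means, n = 2:  {skew_small:.2f}")
print(f"skewness of sample means, n = 50: {skew_large:.2f}")
```

Under these assumptions the skewness for n = 50 should come out much closer to 0 than the skewness for n = 2, which is the pattern the Central Limit Theorem describes.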
The Normal distribution condition (#3 under “Conditions” ahead) requires, for proportions, that np and
n(1 − p) both be ≥ 10 and, for means, that n ≥ 30 or the population be Normally distributed, in order to
treat the sampling distribution as approximately Normal.
Bias (think mean/center of distribution)
Sample means ( x̄ ) and sample proportions ( p̂ ) are considered “unbiased estimators” of population
parameters μ and p, respectively. The mean of the sampling distribution of x̄ is the population mean μ,
and the mean of the sampling distribution of p̂ is the population proportion p.
Variability (think standard deviation/spread of distribution)
The variability of a statistic is described by the spread of the sampling distribution, and the spread is
determined by the sampling design and the sample size.
The larger the sample size, the lower the standard deviation and the less the spread. Statistics from larger
samples have less variability which increases the precision of the estimate.
The 10% Condition (#2 under “Conditions” ahead) requires the population size to be at least 10 times
the sample size. If the condition is met, the spread of the sampling distribution does not depend on the
size of the population. If not met, we cannot calculate the standard deviation of the sampling distribution
for p̂ or x̄ with the formulas in this section.
Sample Proportion, p̂ (to estimate population proportion p)
$N\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right)$
In order to determine what proportion (percentage) of a population falls in some category of a
categorical variable (such as the proportion of Americans who are registered Republicans or the
proportion of teenagers who own a car), we will find p̂ from an SRS to estimate the unknown parameter p.
Recall that the sampling distribution of p̂ describes how the sample proportion p̂ varies in all possible
samples of the same size from a population.

The mean of the sampling distribution of p̂ is equal to the population proportion p; therefore, p̂ is an
“unbiased estimator” of population proportion p. A statistic is considered to be an unbiased estimator if
the mean of its sampling distribution is equal to the value of the parameter being estimated.

When sample size n is large, the sampling distribution of p̂ is close to a Normal distribution. We will use
the Normal approximation when the Large Counts Condition is met (see below).
In order to make inferences about a population, certain assumptions/conditions must be met:
Assumptions
Independent sample values
Large enough sample size
Conditions
1. Random samples must be used; we will use SRS.
(Randomization in selecting subjects, assigning treatments, sampling methods, ...)
2. Population must be at least 10 times the sample size. (required for finding standard deviation)
3. Both np ≥ 10 AND n(1- p) ≥ 10. (required for Normal approximation of sampling distribution)
(Because the sample size n sits in the denominator under a square root in the formula for standard
deviation, a sample size four times larger is needed to reduce the standard deviation by one half.)
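The conditions and the "four times larger" remark can be worked through numerically. The sketch below is a minimal illustration, not part of the original notes: the function names are invented for the example, and p = 0.5 with n = 100 and n = 400 are arbitrary values chosen so the arithmetic is easy to follow.

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of the sampling distribution of p-hat: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def ten_percent_ok(population_size, n):
    """Condition 2: the population is at least 10 times the sample size."""
    return population_size >= 10 * n

def large_counts_ok(p, n):
    """Condition 3: both np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

p = 0.5  # illustrative population proportion (an assumption for this example)
for n in (100, 400):
    print(f"n = {n}: SD of p-hat = {sd_p_hat(p, n):.4f}, "
          f"large counts met: {large_counts_ok(p, n)}")

# Quadrupling n divides the standard deviation by sqrt(4) = 2.
ratio = sd_p_hat(p, 100) / sd_p_hat(p, 400)
print(f"SD(n=100) / SD(n=400) = {ratio:.1f}")
```

Going from n = 100 to n = 400 cuts the standard deviation in half, while only doubling n would reduce it by a factor of about 1.41.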
Sample Mean, x̄ (to estimate population mean μ)
$N\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right)$

In order to determine the mean (average) of some quantitative variable in a population, such as the
mean SAT score among high school seniors or the mean salary of U.S. adults, we will find x̄ from an SRS
to estimate the unknown parameter μ.
The mean of the sampling distribution of x̄ is μ, so x̄ is an “unbiased estimator” of μ. A statistic is
considered to be an unbiased estimator if the mean of its sampling distribution is equal to the value of
the parameter being estimated.

If the population is Normally distributed, then so is the sampling distribution of the sample mean x̄, even
when the sample size is small. If the population is not Normally distributed and the sample size is small,
then the sampling distribution of x̄ will resemble the population shape (left-skewed, bimodal, etc.).

However, according to the Central Limit Theorem, the shape of the sampling distribution will become
approximately Normal as sample size n increases, regardless of the shape of the population distribution.

In order to make inferences about a population, certain assumptions/conditions must be met:
Assumptions
Independent sample values
Large enough sample size
Conditions
1. Random samples must be used; we will use SRS.
(Randomization in selecting subjects, assigning treatments, sampling methods, ...)
2. Population must be at least 10 times the sample size. (required for finding standard deviation)
3. Population is Normally distributed OR n ≥ 30. (required for Normal approximation of sampling
distribution)
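Once the conditions are met, the Normal model for x̄ can be used to find probabilities. The sketch below is a hedged illustration using Python's standard statistics.NormalDist. It reuses the GPA mean 𝜇 = 3.59 and the sample result 3.561 from the start of this section, but the population standard deviation (0.4) and the sample size (36) are assumptions made up for the example, since the notes do not give them. The helper names are also invented.

```python
import math
from statistics import NormalDist

def sd_x_bar(sigma, n):
    """Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def normal_condition_ok(n, population_is_normal=False):
    """Condition 3 for means: population is Normal OR n >= 30."""
    return population_is_normal or n >= 30

mu = 3.59     # population mean GPA, from the notes
sigma = 0.4   # ASSUMED population standard deviation (not given in the notes)
n = 36        # ASSUMED sample size; condition 2 then needs at least 360 students

prob = None
if normal_condition_ok(n):
    sampling_dist = NormalDist(mu, sd_x_bar(sigma, n))  # model for x-bar
    prob = sampling_dist.cdf(3.561)                     # P(x-bar <= 3.561)
    print(f"SD of x-bar: {sampling_dist.stdev:.4f}")
    print(f"P(x-bar <= 3.561) is about {prob:.3f}")
```

With different assumed values for sigma or n the probability changes, so the printed number only illustrates the method and says nothing about the actual PBHS data.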