Chapter 7 Sampling Distribution Summary Statistics
Chapter 7 concerns summary statistics (a one-number summary of a random sample) and sampling distributions (the distribution of a summary statistic computed from many samples of a population, often displayed as a plot). The two summary statistics whose sampling distributions are analyzed in this chapter are the sample mean (Section 7.2) and the proportion of successes in a sample from a binomial distribution (Section 7.3).
In Section 7.1, sampling distributions are generated by simulation for samples of size n. The four steps are:
1. Take a random sample of fixed size n from the population.
2. Compute a summary statistic (often either the mean or proportion of
successes).
3. Repeat steps 1 and 2 many times.
4. Display the distribution of the summary statistics (sampling
distribution).
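The four simulation steps can be sketched in Python as below. This is a minimal sketch using only the standard library; the skewed population, the sample size n = 30, the 5,000 repetitions, and the helper name `sampling_distribution` are all made up for illustration, and each sample is drawn with replacement.

```python
import random
import statistics
from collections import Counter

def sampling_distribution(population, n, reps, statistic=statistics.fmean, seed=0):
    """Steps 1-3: draw `reps` random samples of size n and record a summary statistic for each."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    results = []
    for _ in range(reps):
        sample = rng.choices(population, k=n)  # step 1: random sample (with replacement)
        results.append(statistic(sample))      # step 2: compute the summary statistic
    return results                              # step 3: repeated `reps` times

# Hypothetical right-skewed population, made up for illustration only.
population = [1] * 50 + [2] * 25 + [5] * 15 + [20] * 10
means = sampling_distribution(population, n=30, reps=5000)

# Step 4: display the sampling distribution as a crude text histogram
# (one '#' per 50 sample means, grouped by the nearest whole number).
counts = Counter(round(m) for m in means)
for value in sorted(counts):
    print(f"{value:>3} {'#' * (counts[value] // 50)}")
print("mean of sample means:", round(statistics.mean(means), 2))
print("standard error:", round(statistics.stdev(means), 3))
```

Rerunning with a larger `n` is one way to watch the histogram become more symmetric and narrower.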
In analyzing the sampling distribution, we typically compute its mean ($\mu_{\bar{x}}$ if the summary statistic is the sample mean, $\mu_{\hat{p}}$ if it is the proportion of successes) and its standard deviation, which is called the standard error (SE) of the sampling distribution. The standard error is denoted $\sigma_{\bar{x}}$ if the summary statistic is the sample mean and $\sigma_{\hat{p}}$ if it is the proportion of successes. In general, if the original population is normal, the sampling distribution will also be normal. If the population is skewed, the sampling distribution becomes more and more nearly normal as the sample size, n, increases.
Reasonably likely values of a summary statistic are those in the middle 95% of the sampling distribution. Rare events are values in the lower 2.5% or upper 2.5% of the sampling distribution. If the sampling distribution is normal and is standardized (transformed to have a mean of 0 and a standard deviation of 1), the middle 95% corresponds to z-scores between -1.96 and +1.96 (approximately ±2 standard deviations).
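The ±1.96 cutoff can be checked numerically. This sketch assumes Python 3.8+ for `statistics.NormalDist`; the helper `is_rare` is hypothetical and simply standardizes a value and compares its z-score with 1.96.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standardized: mean 0, standard deviation 1

# z-score that leaves 2.5% in the upper tail, so 95% lies between -z and +z
z_crit = std_normal.inv_cdf(0.975)
print(round(z_crit, 2))  # 1.96

# Area under the standard normal curve between -1.96 and +1.96
middle = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
print(round(middle, 3))  # 0.95

def is_rare(value, mean, se, z=1.96):
    """Hypothetical helper: True if value falls in the lower or upper 2.5% tail."""
    return abs((value - mean) / se) > z

print(is_rare(0.30, mean=0.25, se=0.02))  # z is about 2.5, so True
```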
Section 7.2 Sampling Distribution of the Sample Mean
Section 7.2 considers cases for sampling distributions where the summary
statistic is the sample mean. The notations used for the population, individual
samples, and the sampling distribution are:
                      Population    Sample       Sampling
                      Parameter     Statistic    Distribution
Mean                  μ             x̄            μ_x̄
Standard deviation    σ             s            σ_x̄ or SE
Size                  N             n
The relationships between the population parameters and the sampling distribution are:

$$\mu_{\bar{x}} = \mu$$

(the mean of the sampling distribution equals the mean of the population)

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

(as the sample size, n, increases, the standard error decreases)

The Central Limit Theorem states that as the sample size, n, increases, the sampling distribution of the sample mean becomes more nearly normal, even when the population itself is not normal.

The property $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ is very useful because it lets us determine the standard error for samples of size n without having to perform a simulation.
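One way to see that $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ holds is to compare it with a standard error estimated by simulation. The exponential population, the seed, and the number of repetitions below are illustrative assumptions; samples are drawn with replacement, which is the setting in which the formula is exact.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed population (exponential with mean 10), for illustration only.
population = [rng.expovariate(1 / 10) for _ in range(100_000)]
sigma = statistics.pstdev(population)  # population standard deviation

results = {}
for n in (4, 16, 64):
    # Simulated sampling distribution of the mean for samples of size n.
    sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(4000)]
    simulated_se = statistics.stdev(sample_means)
    formula_se = sigma / math.sqrt(n)  # sigma_xbar = sigma / sqrt(n)
    results[n] = (simulated_se, formula_se)
    print(f"n={n:>2}: simulated SE {simulated_se:.3f}, sigma/sqrt(n) {formula_se:.3f}")
```

Quadrupling n should roughly halve the standard error in both columns.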
As an example of how this might be used, say you own a catering company and
you are determining how much of a certain beverage you should have available
for a job that involves catering 50 people. From previous experience, you find
that each person drinks an average of 0.25 liters with a standard deviation of
0.12 liters. What is the probability that the average amount of beverage
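consumed by each person will be 0.28 liters or more?

$$\mu_{\bar{x}} = \mu = 0.25 \text{ L}$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.12}{\sqrt{50}} = 0.01697$$

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{0.28 - 0.25}{0.01697} = 1.77$$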
A z-score of 1.77 corresponds to a cumulative probability (area to the left) of 0.9616, so there is a 1 - 0.9616 = 0.0384, or 3.84%, chance that the average consumption will be 0.28 L or more per person.
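The same calculation can be done in Python with `statistics.NormalDist` (Python 3.8+). This is a sketch: because it uses the unrounded z-score (about 1.768) instead of a table lookup for 1.77, its probability differs from 0.0384 in the fourth decimal place.

```python
import math
from statistics import NormalDist

mu, sigma, n = 0.25, 0.12, 50           # liters per person and sample size, from the example
se = sigma / math.sqrt(n)               # standard error of the mean
sampling_dist = NormalDist(mu, se)      # approximate sampling distribution of x-bar

z = (0.28 - mu) / se
p_upper = 1 - sampling_dist.cdf(0.28)   # P(x-bar >= 0.28)
print(round(se, 5), round(z, 2))        # 0.01697 1.77
print(round(p_upper, 4))
```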
What are the likely values (middle 95%) of the average amount of beverage you might expect to provide for each person? Remembering that the middle 95% corresponds to z-scores of ±1.96, we have:

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$

Rearranging this to solve for $\bar{x}$ and substituting ±1.96 for z gives:

$$\bar{x} = \mu_{\bar{x}} \pm 1.96\,\sigma_{\bar{x}}$$

Substituting, we have:

$$\bar{x} = 0.25 \pm 1.96(0.01697) = 0.217 \text{ to } 0.283 \text{ liters}$$
In other words, it is likely that the average amount of beverage consumed per person will be between 0.217 and 0.283 liters.
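A short sketch of the middle-95% interval for the average per person, again assuming Python 3.8+ `statistics.NormalDist`; here the cutoff is computed with `inv_cdf(0.975)` rather than typed in as 1.96.

```python
import math
from statistics import NormalDist

mu, sigma, n = 0.25, 0.12, 50
se = sigma / math.sqrt(n)               # standard error of the mean
z_crit = NormalDist().inv_cdf(0.975)    # about 1.96

low, high = mu - z_crit * se, mu + z_crit * se
print(f"{low:.3f} to {high:.3f} liters")  # 0.217 to 0.283 liters
```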
Finding probabilities involving sample totals
Some problems are stated in terms of a total value rather than an average value.
For example, instead of expressing our previous problem as:
What is the probability that the average amount of beverage consumed by
each of the 50 persons will be 0.28 liters or more?
we could have stated the problem as:
What is the probability that the total amount of beverage consumed by the
50 persons will be 14 liters or more? (because 50 × 0.28 = 14 liters)
Two methods are possible for handling this “total value” problem. The first is simply to divide the total by the sample size, n, to get an average per person, and then proceed as we did in the problem above:

$$\bar{x} = \frac{14}{50} = 0.28$$

The second method is to transform our equations for average values into equations for total values. The mean of the total, denoted $\mu_{SUM}$ in the text, is

$$\mu_{SUM} = n\mu$$

(in our example, 50 × 0.25 = 12.50 liters). The standard error of the total, denoted $\sigma_{SUM}$, is

$$\sigma_{SUM} = n\,\sigma_{\bar{x}} = n\,\frac{\sigma}{\sqrt{n}} = \sqrt{n}\,\sigma$$

The distribution of the total has the same shape as the sampling distribution of the mean, so it will still be approximately normal.
For our example problem, then, we have:
You own a catering company and are analyzing how much beverage will
be consumed. Based on previous experience, you have found that the
average amount of beverage consumed is 0.25 liters per person with a
standard deviation of 0.12 liters. What is the probability that the total
amount of beverage consumed for a catering event of 50 people will be
14 liters or more?

$$\mu_{SUM} = n\mu = 50 \times 0.25 = 12.50 \text{ liters}$$

$$\sigma_{SUM} = \sqrt{n}\,\sigma = \sqrt{50} \times 0.12 = 0.849$$

$$z = \frac{\text{sample sum} - \mu_{SUM}}{\sigma_{SUM}} = \frac{14 - 50 \times 0.25}{\sqrt{50} \times 0.12} = \frac{14 - 12.50}{0.849} = 1.77$$
Once again, a z-score of 1.77 corresponds to a cumulative probability of 0.9616, so there is a 1 - 0.9616 = 0.0384, or 3.84%, chance that the total consumption will be 14 liters or more.
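The two methods for totals can be compared directly. In this sketch (Python 3.8+, standard library only) both give the same probability, because dividing the total by n turns the distribution of the sum into the sampling distribution of the mean.

```python
import math
from statistics import NormalDist

mu, sigma, n, total = 0.25, 0.12, 50, 14.0

# Method 1: convert the total to an average, then use the sampling distribution of the mean.
p_mean = 1 - NormalDist(mu, sigma / math.sqrt(n)).cdf(total / n)

# Method 2: work with the total directly, using mu_SUM = n*mu and sigma_SUM = sqrt(n)*sigma.
mu_sum, sigma_sum = n * mu, math.sqrt(n) * sigma
p_sum = 1 - NormalDist(mu_sum, sigma_sum).cdf(total)

print(round(mu_sum, 2), round(sigma_sum, 3))  # 12.5 0.849
print(round(p_mean, 4), round(p_sum, 4))      # the two methods agree
```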
The catering company owner could also determine what range of total beverage consumption is likely (that is, the middle 95%):

$$\pm 1.96 = \frac{\text{sample sum} - \mu_{SUM}}{\sigma_{SUM}}$$

$$\text{sample sum} = \mu_{SUM} \pm 1.96\,\sigma_{SUM} = 12.50 \pm 1.96(0.849) = 10.84 \text{ to } 14.16$$
The owner of the catering company, then, can reasonably expect to need between 10.84 and 14.16 liters of the beverage.
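The likely range for the total can be sketched the same way as the interval for the mean, this time with $\mu_{SUM}$ and $\sigma_{SUM}$ (Python 3.8+ `statistics.NormalDist` assumed).

```python
import math
from statistics import NormalDist

mu, sigma, n = 0.25, 0.12, 50
mu_sum, sigma_sum = n * mu, math.sqrt(n) * sigma  # mean and standard error of the total
z_crit = NormalDist().inv_cdf(0.975)              # about 1.96

low, high = mu_sum - z_crit * sigma_sum, mu_sum + z_crit * sigma_sum
print(f"{low:.2f} to {high:.2f} liters")  # 10.84 to 14.16 liters
```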
Section 7.3 Sampling Distribution of the Sample Proportion
From Section 6.2, for a binomial distribution with proportion of successes p, we found that for a sample of size n the number of successes X has mean $\mu_X = np$ and standard deviation $\sigma_X = \sqrt{np(1-p)}$. If both np and n(1 – p) are ≥ 10, the sampling distribution will be approximately normal in shape.
If the summary statistic for each sample is the sample proportion, p̂ (called “p-hat”), defined as:

$$\hat{p} = \frac{\text{number of “successes”}}{\text{sample size}}$$
we have, for the sampling distribution of the sample proportion:

$$\mu_{\hat{p}} = \frac{\mu_X}{n} = \frac{np}{n} = p$$

(the mean of the sampling distribution of the sample proportion is always equal to p)

$$\sigma_{\hat{p}} = \frac{\sigma_X}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$$

(the spread decreases as the sample size, n, increases)

• As the sample size increases, the shape of the sampling distribution becomes more normal and is approximately normal if n is large enough.
• As a general rule of thumb, if both np and n(1 – p) are at least 10, the shape can be treated as normal.
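A simulation can be used to check $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. This sketch builds each sample from n independent success/failure trials; the values p = 0.6 and n = 40 are chosen only to match the seat-belt example that follows, and the helper names `phat_sampling_distribution` and `normal_ok` are hypothetical.

```python
import math
import random
import statistics

def phat_sampling_distribution(p, n, reps, seed=0):
    """Hypothetical helper: simulate `reps` sample proportions, each from n trials with success probability p."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

def normal_ok(p, n):
    """Hypothetical helper for the rule of thumb: np and n(1 - p) both at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10

p, n = 0.6, 40  # illustrative values
phats = phat_sampling_distribution(p, n, reps=5000)
print(round(statistics.mean(phats), 3), p)  # simulated mean of p-hat vs p
print(round(statistics.stdev(phats), 4), round(math.sqrt(p * (1 - p) / n), 4))  # simulated SE vs formula
print(normal_ok(p, n))  # True
```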
Example from book:
About 60% of Mississippians use seat belts. Suppose your class conducts a
survey of 40 randomly selected Mississippians.
a.) What is the chance that 75% or more of those selected wear seat belts?
np = 40(0.6) = 24 and n(1 – p) = 40(1 – 0.6) = 16, which are both ≥ 10, so we can treat the sampling distribution of p̂ as approximately normal.

$$\mu_{\hat{p}} = p = 0.60$$

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.6(1-0.6)}{40}} = 0.0775$$

The z-score for this is:

$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.75 - 0.6}{0.0775} = 1.94$$

which corresponds to a cumulative probability of 0.9738. That means there is a 1 - 0.9738 = 0.0262, or 2.62%, chance that in a sample of 40 Mississippians, 75% or more of them will wear seat belts.
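Part a.) can be reproduced with a short Python sketch using `statistics.NormalDist` (Python 3.8+). It relies on the normal approximation justified above and keeps unrounded intermediate values, so its probability (about 0.026) can differ slightly from the 0.0262 obtained with the rounded z-score of 1.94.

```python
import math
from statistics import NormalDist

p, n = 0.60, 40
se = math.sqrt(p * (1 - p) / n)             # standard error of p-hat
z = (0.75 - p) / se
p_upper = 1 - NormalDist(p, se).cdf(0.75)    # P(p-hat >= 0.75), normal approximation
print(round(se, 4), round(z, 2))             # 0.0775 1.94
print(round(p_upper, 4))
```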
b.) Would it be quite unusual to find that fewer than 25% of the Mississippians
selected wear seat belts?
For this case, $\hat{p} = 0.25$, so the z-score will be:

$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.25 - 0.6}{0.0775} = -4.52$$
Since any z-score outside the range -1.96 to +1.96 is unusual, it would be very unusual to find that fewer than 25% of a sample of 40 Mississippians wear seat belts.
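A sketch of part b.) under the same assumptions (normal approximation, Python 3.8+ `statistics.NormalDist`): it standardizes $\hat{p} = 0.25$ and checks whether the z-score falls outside ±1.96.

```python
import math
from statistics import NormalDist

p, n = 0.60, 40
se = math.sqrt(p * (1 - p) / n)   # standard error of p-hat
z = (0.25 - p) / se
p_lower = NormalDist().cdf(z)     # P(p-hat < 0.25) under the normal approximation
unusual = abs(z) > 1.96           # outside the middle 95%
print(round(z, 2), unusual)       # -4.52 True
print(f"{p_lower:.1e}")           # a very small probability
```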