Download Lesson Notes: Shape, Center, and Spread for Sample

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
Lesson Notes: Shape, Center, and
Spread for Sample Proportions
Deriving the Formulas
Letting X represent a random variable that can take on the value 1 with
probability p and the value 0 with probability 1 p, you can derive the mean
and standard deviation of X symbolically from the definitions of the mean and
standard deviation of a probability distribution.
X xi pi
0(1 p) 1(p)
p
___________
X xi X 2 pi
__________________________
(0 p)2 (1 p) (1 p)2 p
______________________
p2 (1 p) (1 p)2 p
__________________
(p)(1 p)(p 1 p)
________
p(1 p)
Because p̂ is the mean of n independent x’s, the mean________
of the sampling
______
p(1 p)
__
or _____
. See also
distribution of p̂ is p and the standard deviation is _______
n
n
E45 and E46.
p(1 p)
Notes for AP Teachers
Modeling Good Answers
Once students have completed E35 or E43, show them the model answer as an
example of what is expected on the AP Exam. The answer to E35 shows standard
use of a normal approximation to estimate probabilities. E43 has a discussion of the
effect of sample size on the shape of the sampling distribution. A PDF file containing
each question and its model answer is available at www.keypress.com/keyonline.
Solutions
Discussion
D11. a. The distribution for n 40 is the most
normal, and the distribution for n 10 is the
least normal. Notice that for n 10, np 10 0.6 6 10, and n(1 p) 10 0.4 4 10.
The guideline does not hold and the distribution
for n 10 is quite skewed. For n 20, np 12 10, but n(1 p) 8 10. The guideline does
not hold. This distribution is less skewed, but
the skew is still noticeable. When n 40, both
np 24 and n(1 p) 16 are more than 10. The
guideline does hold and the distribution looks
86
Section 7.3 Solutions
very symmetric and fairly smooth. It appears that
the guideline works quite well.
b. We would expect 60% of any size sample to be
wearing seat belts.
c. The distribution for n 10 has the largest
spread, and the distribution for n 40 has the
smallest spread.
d. Getting a sample with 75% or more
Mississippians would be more likely with a sample
size of 10. The area of the bars in the histogram
to the right of 0.75 gets smaller as the sample size
increases because with larger sample sizes, the
sample proportions tend to cluster closer to 0.6.
Statistics in Action Instructor’s Guide, Volume 2
© 2008 Key Curriculum Press
______
D12. a. For n 10, p̂ 0.6, p̂ _____
n
p(1 p)
_____
0.6 0.4
0.155
10
_____
______
For n 20, p̂ 0.6, p̂ _____
n
p(1 p)
_____
0.6 0.4
0.110
20
_____
______
For n 40, p̂ 0.6, p̂ _____
n
p(1 p)
_____
0.6 0.4
0.077
40
_____
b. Estimates will vary, but the answers from part a
should match quite well. In particular, the sampling
distributions of the sample proportion for all
sample sizes have a center that is close to 0.6. For
the sampling distribution of the sample proportion
for samples of size 10, the spread is large, with
reasonably likely sample proportions ranging from
about 0.3 to 0.9. For n 20, the spread is smaller,
with reasonably likely sample proportions ranging
from about 0.4 to 0.8. For n 40, the spread is small,
with reasonably likely sample proportions ranging
from about 0.45 to 0.75.
c. The mean using these formulas is 0.6000006,
the difference due to rounding error. The standard
deviation is 0.155, which matches the result from
the previous formula.
Practice
P16. First check the guideline: Both np 100 0.63 63 and n(1 p) 100 0.37 37 are greater
than 10, so the shape of the distribution will be
approximately
normal. So______________
sum np 63, and
_________
sum np(1 p) 100 0.63 0.37 4.83.
Find z-scores and probabilities for both 56 and 70
freshmen and subtract:
56 63 1.45, P(sum ⱕ 56) 0.074
z _______
4.83
70 63 1.45, P(sum ⱕ 70) 0.926
z _______
4.83
Then 0.926 0.074 0.852.
P ⫽ 0.852
56
63
70
Number of Freshmen
Alternatively, normalcdf56,70,63,4.83 0.853.
There is about an 85% probability that a
sample of 100 freshmen will contain between
56 and 70 freshmen who believe that dissent
Statistics in Action Instructor’s Guide, Volume 2
© 2008 Key Curriculum Press
is a critical component of the political
process.
P17. a. The distribution for n 100 is the most normal,
and the distribution for n 10 is the least normal.
Notice that for n 10, np 10 0.1 1 10,
and n(1 p) 10 0.9 9 10. The guideline
does not hold and the distribution for n 10 is
quite skewed. For n 20, np 2 10, but
n(1 p) 18 10. The guideline does not hold
and this distribution is less skewed, but the skew is
still very noticeable. When n 40, np 4 10 but
n(1 p) 36 10. The guideline does not hold
and this distribution is less skewed, but the skew
is still noticeable. For n 100, both np 10 and
n(1 p) 90 are at least 10. The guideline does
hold and the distribution looks very symmetric
and fairly smooth. The skew is visible for all but
n 100, where both np and n(1 p) are at least
10. It appears that the guideline works quite well.
b. For each sample size, the expected proportion
of drivers is 0.10.
c. The distribution for n 10 has the largest
spread, and the distribution for n 100 has the
smallest spread.
d. Finding 20% or more drivers failing a driving
test would be more likely in a sample of size 10.
The area of the bars in the histogram to the right
of 0.20 gets smaller as the sample size increases
because with larger sample sizes, the sample
proportions tend to cluster closer to 0.1.
P18. a. The mean for each distribution is p 0.10.
______
_____
0.1 0.9
For n 10, p̂ _____
_____
0.095
n
10
p(1 p)
_____
0.1 0.9
For n 20, p̂ _____
0.067
20
_____
0.1 0.9
For n 40, p̂ _____
0.047
40
_____
0.1 0.9
For n 100, p̂ _____
100 0.03
b. Estimates will vary, but the answers from part a
should match quite well. In particular, the sampling
distributions of the sample proportion for all
sample sizes have a center that is close to 0.1. For
the sampling distribution of the sample proportion
for samples of size 10, the spread is large, with
reasonably likely sample proportions ranging from
about 0 to 0.3. For n 20, the spread is smaller,
with reasonably likely sample proportions ranging
from about 0 to 0.23. For n 40, the spread is
small, with reasonably likely sample proportions
ranging from about 0.01 to 0.2. For n 100, the
spread is smallest, with reasonably likely sample
proportions ranging from about 0.04 to 0.16.
Section 7.3 Solutions
87
c. The mean using the table is 0.0999999,
different from 0.1 only because of rounding error.
The standard error is about 0.095, which is equal
to the result in part a.
P19. a. This sampling distribution can be considered
approximately normal because both np 53 and
n(1 p) 47 are 10 or greater. The sampling
distribution of p̂ has a mean of 0.53 and a standard
error of
________
p̂ p(1 p)
________
n
_____________
0.53(1 0.53)
____________
0.05
100
fall within approximately two standard errors of
the mean. This interval is
_________
p 1.96 p(1 p)
________
0.6 1.96 n
___________
0.6(1 0.6)
n
__________
The intervals produced for the various sample
sizes are shown below.
a. n 40: 0.6 0.152 or (0.448, 0.752)
b. n 100: 0.6 0.096 or (0.504, 0.696)
c. n 400: 0.6 0.048 or (0.552, 0.648)
Note again that the sample size must be increased
fourfold to cut the width of the interval in half.
Exercises
0.43 0.48 0.53 0.58 0.63
Sample Proportion (p ⫽ 0.53, n ⫽ 100)
b. No. As you can see from the sampling
distribution in part a, the probability of getting 9 (or
9%) or fewer women just by chance is almost 0. That
there are so few women in the U.S. Senate cannot
reasonably be attributed to chance alone, and so we
should look for other explanations (for example, that
fewer women go into politics, that women are not
able to raise as much money for their campaigns, or
that voters are reluctant to elect a woman senator).
P20. a. The sampling distribution of p̂ is approximately
normal because both np 134.85 and n(1 p) 300.15 are 10 or greater. With p 0.31 and
n 435, the sampling distribution for the sample
proportion has a mean of 0.31 and a standard error
of 0.022. A sample proportion of 0.53 is about 10
standard deviations above the mean of the sampling
distribution, and therefore it would be very unusual
to see such a result. The probability is close to 0.
E35. The sampling distributions for the sample
proportion and for the number of successes can
be considered approximately normal because both
np 920 and n(1 p) 80 are 10 or greater.
a. This sampling distribution is approximately
normal with mean and standard error
p̂ 0.92
________
p̂ p(1 p)
________
n
_____________
0.92(1 0.92)
____________
0.00858
1000
The sketch, with scale on the x-axis, is shown in
part c.
b. This sampling distribution is approximately
normal with mean and standard error
sum 1000(0.92) 920
_________
__________________
sum np(1 p) 1000(0.92)(1 0.92) 8.58
The sketch is shown in part d.
0.9
0.92 2.33.
__________
c. The z-score is z _________
0.92(1 0.92)
_________
1000
P ⫽ 0.0099
p̂ ⫽ 0.90
0.903 0.911 0.92 0.929 0.937
Sample Proportion (n ⫽ 1000)
0.266 0.288 0.310 0.332 0.354
Sample Proportion (p ⫽ 0.31, n ⫽ 435)
b. Yes, p̂ 231/435 0.53. As you saw in
part a, it is almost impossible for this many
representatives to be Republican if they were
chosen at random from the general population.
Possible reasons include the fact that people do
not always vote for their party’s candidate.
P21. Because the sampling distributions are
approximately normal, in each case about 95% of
the potential values of the sample proportion will
88
Section 7.3 Solutions
The probability is about 0.0099.
925 920
__________________
d. The z-score is z ______________
0.5828.
1000(0.92)(1 0.92)
P ⫽ 0.2800
sum ⫽ 925
902.8 911.4 920 928.6 937.2
Number of Han Chinese (n ⫽ 1000)
The probability is about 0.2800.
Statistics in Action Instructor’s Guide, Volume 2
© 2008 Key Curriculum Press
e. Rare events would be the totals that are outside
the interval 920 1.96(8.58); that is, larger than
937 or smaller than 903. Rare events would be the
proportions that are outside the interval 0.92 1.96(0.00858); that is, larger than 0.937 or smaller
than 0.903.
E36. a. Because np 500 0.223 111.5 and
n(1 p) 500 0.777 388.5 are both 10 or
more, the shape of the sampling distribution will
be approximately normal.
p̂ p 0.223
________
p̂ p(1 p)
________
n
___________
0.223 0.777
0.0186
500
___________
The sketch, with scale on the x-axis, is shown in
part c.
b. The shape was verified as approximately
normal in part a.
sum np 500 0.223 111.5
_________
_______________
sum np(1 p) 500 0.223 0.777 9.31
The sketch, with scale on the x-axis, is shown in
part d.
c. Using the sampling distribution for the sample
proportion, the z-score for a sample proportion of
0.2 is
(0.2 0.223)
z ___________ 1.24
0.0186
The probability of getting a sample proportion of
0.20 or fewer with one of these surnames is 0.108.
P ⫽ 0.108
0.1858 0.2044 0.223 0.2416 0.2602
Sample Proportion
d. Using the sampling distribution for the sample
sum, the z-score for 105 successes is
(105 111.5)
z ____________ 0.698
9.31
The probability of getting 105 people or more with
one of these surnames is 0.757.
sum ⫽ 105
P ⫽ 0.757
e. Rare events would be the totals that are
outside the interval 111.5 1.96(9.31); that is,
larger than 129.75 or smaller than 93.25. Rare
events would be the proportions that are outside
the interval 0.223 1.96(0.0186); that is, larger
than 0.259 or smaller than 0.187.
E37. a. The shape would be slightly skewed to the left
because np 100 0.92 92 and n(1 p) 100 0.08 8 are not both at least 10. The mean of
the sampling distribution would still be 0.92 and
the standard deviation would be greater because
of the smaller sample size.
b. As stated in part a, the shape will now be
slightly skewed to the left. The mean will be onetenth that of the mean for a sample of 1000, and
the standard___deviation will be smaller as well, by a
factor of 10 .
c. With n 100, the distribution is more spread
out and skewed left than the distribution for
n 1000. This means a larger percentage of the
sample proportions would be farther from p and,
in this case, in the direction of the skew or in the
left tail. Because the sample proportion of 0.9 is in
the left portion of the distribution or the part that
is with the skew, the probability of getting 90% or
fewer in the sample will be greater with a sample
size of 100 than with a sample size of 1000.
Note: You cannot accurately calculate this
probability using this procedure because the
distribution is not approximately normal.
E38. a. Because np 100 0.223 22.3 and n(1 p) 100 0.777 77.7 are both at least 10, the
shape will still be approximately normal. The
mean of the sampling distribution of the sample
proportion will still be 0.223, but the standard
__
deviation will be larger by a factor of 5 .
b. As shown in part a, the shape will still be
approximately normal. The mean of the sampling
distribution of the sample sum will be one-fifth as
large as that in E36. The standard
__ deviation will be
smaller, as well, by a factor of 5 .
c. The probability of getting 20% or fewer in
the sample will be greater with a sample size of
100 because the distribution is more spread out.
This means a larger percentage of the sample
proportions would be farther from p.
E39. a. 0.5
b. The sampling distribution can be considered
approximately normal because both np 25 and
n(1 p) 25 are 10 or greater. The sampling
distribution has mean and standard error
___________
92.88 102.19 111.5 120.81 130.12
Number of People
Statistics in Action Instructor’s Guide, Volume 2
© 2008 Key Curriculum Press
p̂ 0.5 and
p̂ 0.5(1 0.5)
__________
0.0707
50
Section 7.3 Solutions
89
The z-score for a sample proportion of 0.2 is
d. The z-score for a sample proportion of 0.54 is
0.2 0.5 4.2433
z ________
0.0707
The probability of getting 10 or fewer under the
median age is 0.000011.
c. These results are not at all what we would
expect from a random sample of 50 people. This
group is special in that they are all old enough to
have a job. Children are included in computing the
33.1 median age, but they do not hold jobs. Thus,
we would expect the median age of job holders to
be greater than the median age of all people in the
United States, as they are at Westvaco.
E40. For parts a and b of this problem, the sampling
distribution for the sample total can be considered
approximately normal because both np 50.3 and
n(1 p) 85.7 are 10 or greater.
For parts c to e of this problem, the sampling
distribution for the sample total can also be
considered approximately normal because both
np 1054.5 and n(1 p) 1795.5 are 10 or greater.
a. The sampling distribution is approximately
normal with mean and standard error
0.54
0.37 18.80
__________
z ____________
0.37 0.63
_________
2850
The probability is very close to 0.
e. The z-score for a sample proportion of 0.54
is 18.80 from part d. The probability of getting
54% or larger by chance alone is very close to 0.
You should conclude that this group is special
in some way. (In fact, Cal State Northridge is in
Los Angeles and many of its students come from
urban schools.)
E41. You are assuming that the 75 married women
were selected randomly from a population in
which 60% are employed. Both np 75 0.6 45
and n(1 p) 75 0.4 30 are at least 10, so the
sampling distribution of the sum will be approximately normal. The sampling distribution of the
sum has
sum np 45
_________
___________
sum np(1 p) 75 0.6 0.4 4.24
For 30 employed women in the sample, z sum 136 0.37 50.32
_____________
sum 136 0.37 0.63 5.63
(30 45)
______
3.536, and P(sum ⱕ 30) 0.0002.
4.24
For 40 employed women in the sample, z (40 45)
______
1.179, and P(sum ⱕ 40) 0.1192.
4.24
The sketch is shown in part b.
b. The z-score for 68 is
P(30 ⱕ sum ⱕ 40) 0.1192 0.0002 0.1190.
(68 50.32)
z ___________ 3.14
5.63
P ⫽ 0.1190
P ⫽ 0.9992
total ⫽ 68
30
39.06 44.69 50.32 55.95 61.58
Number of Students (n ⫽ 136)
The probability of getting 68 or fewer who need
remediation is about 0.9992.
c. This distribution is approximately normal with
mean and standard error
p̂ 0.37
________
p̂ 0.37 0.63
0.009
2850
________
40
45
Number of Employed Women
E42. Getting 15 or fewer disks with bad sectors is the
same thing as getting 85 or more without. The
sampling distribution for the number of disks
that have no bad sectors can be considered
approximately normal because both np 80 and
n(1 p) 20 are 10 or greater. The sampling
distribution has mean and standard error
sum 100(0.8) 80
_______________
sum 100(0.8)(1 0.8) 4
The z-score for 85 disks with no bad sectors is
85 80 1.25
z _______
4
0.352 0.361 0.370 0.379 0.388
Sample Proportion (n ⫽ 2850)
90
Section 7.3 Solutions
Statistics in Action Instructor’s Guide, Volume 2
© 2008 Key Curriculum Press
Frequency
P ⫽ 0.1056
sum ⫽ 85
72
76
80
84
88
Number of Disks (n ⫽ 100)
0.80
0.90
1.00
Sample Proportion (n ⫽ 50, p ⫽ 0.90)
Frequency
40
20
10
0.82
0.88
0.94
1.00
Sample Proportion (n ⫽ 100, p ⫽ 0.90)
Frequency
60
50
40
30
20
10
0
0.86 0.88 0.90 0.92 0.94
Sample Proportion (n ⫽ 500, p ⫽ 0.90)
Frequency
80
60
40
20
0
0.90 0.92 0.94 0.96 0.98 1.00 1.02
Sample Proportion (n ⫽ 50, p ⫽ 0.98)
Frequency
80
60
40
20
0
0.90 0.92 0.94 0.96 0.98 1.00 1.02
Sample Proportion (n ⫽ 100, p ⫽ 0.98)
100
80
60
40
20
0
0.900
Statistics in Action Instructor’s Guide, Volume 2
© 2008 Key Curriculum Press
30
0
Frequency
The probability of getting 85 or more with no bad
sectors is about 0.1056.
The assumption that underlies this answer is
that the 100 sampled disks are a random sample
from a population in which 80% of the disks have
no bad sectors.
E43. a. The means of the sampling distributions
definitely depend upon p, as the first three center
close to p 0.2 and the second three center close
to p 0.4. The sampling distributions have means
equal to p regardless of the sample size; the means
do not depend upon the sample size.
b. The spreads of the sampling distributions
decrease as n increases for both values of p. The
spreads also depend upon the value of p, however.
For each sample size, the spread for p 0.4 has a
larger standard error than the one for p 0.2.
c. For p 0.2, the shape is quite skewed for n 5
and some slight skewness remains at n 25. For
n 100, the shape is basically symmetric. For
p 0.4, the shape shows a slight skewness at n 5 but is fairly symmetric at n 25 and beyond.
The farther p is from 0.5, the more skewness in the
sampling distribution of the sample proportion.
d. The rule does not work well for samples of size
5 for either value of p but works well for samples
of size 25 or more for both values of p.
E44. The first three histograms are simulated sampling
distributions for p 0.90, and the last three are
for p 0.98. The sample sizes for each proportion
are 50, 100, and 500.
For p 0.90, the shape is skewed toward the
smaller values for n 50 but is approximately
normal when n 100 and n 500. For p 0.98, the shape is highly skewed for n 50 and
remains highly skewed for n 100. It becomes
approximately normal at n 500. The rule that
np and n(1 p) must both be at least 10 seems to
work quite well.
The important message: Whether the normal
model works well as an approximation depends
upon both n and p. If p is close to 0 or 1, the
sample size must be quite large to produce an
approximately normal sampling distribution for
the sample proportion.
70
60
50
40
30
20
10
0
0.940
0.920
0.960
0.980
Sample Proportion (n ⫽ 500, p ⫽ 0.98)
1.000
Section 7.3 Solutions
91
xf ____________
_
0 14 + 1 26 0.65
E45. a. x ____
n 40
26
This sample mean is the same as p̂ __
40 0.65
because each success adds 1 and each failure
adds 0, so both formulas count the number of
successes, then divide by the sample size.
b. ∑ xi pi 0 0.4 1 0.6 0.6.
The box on page 450 says p 0.6, which
is the same.
c. The sample proportion is a type of sample
mean, if the successes are given the value 1 and
failures the value 0.
92
Section 7.3 Solutions
___________
E46. a. xi X 2pi
___________________________
(0 0.6)2 0.4 (1 0.6)2 0.6
_________________
0.62 0.4 0.42 0.6
________________
0.6 0.4(0.6 0.4)
_______
0.6 0.4
____
0.24
________
_______
____
b. p(1 p) 0.6 0.4 0.24 .
Both formulas give the same value for the
standard error.
Statistics in Action Instructor’s Guide, Volume 2
© 2008 Key Curriculum Press