Download UNIT 6A - Gordon State College

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Foundations of statistics wikipedia , lookup

History of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Misuse of statistics wikipedia , lookup

Regression toward the mean wikipedia , lookup

Transcript
106 CHAPTER 6: PUTTING STATISTICS TO WORK
UNIT 6A
TIME OUT TO THINK
Pg. 373. In this case, saying that the average contract
offer is zero also seems a bit misleading (just as
did saying it is $700,000 if we used the mean).
Pg. 375. Assuming the administration uses 12 faculty
members to teach the 12 classes, they could
simply redistribute assignments so that each of the
3 courses uses 4 faculty teaching sections of 25
students each. The advantages and disadvantages
of such a change are a matter of opinion, but a few
factors to consider: redistributing means students
will have fairly small classes all around, which is
generally a good thing, but will not have the
benefit of the extra small class that they now have
in one case. Other factors may also be involved,
such as having only one qualified faculty member
for one of the courses.
Pg. 377(1st). In either a right-skewed or left-skewed
distribution, the mean is further away from the
mode (center) than the median, which shows that
the mean is affected by outliers and the median is
not.
Pg. 377(2nd). In statistics, it refers to a distortion away
from symmetry.
QUICK QUIZ
1. b. The mean of a data set is the sum of all the data
values divided by the number of data points.
2. b. The median of a data set is the middle value
when the data are placed in numerical order.
3. c. Outliers are data values that are much smaller or
larger than the other values.
4. a. Outliers that are larger than the rest of the data
set tend to pull the mean to the right of the
median, and this is the case for all possible data
sets that satisfy the conditions given.
5. c. Because the mean is significantly higher than
the median, the distribution is skewed to the right,
which implies some students drink considerably
more than 12 sodas per week. Specifically, it can
be shown that at least one student must drink more
than 16 sodas per week.
6. c. The high salaries of a few superstars pull the tail
of this distribution to the right.
7. a. High outliers tend to pull the mean to the right
of the median, and thus you would hope to get a
salary near the mean. There are data sets with high
outliers that can be constructed where the median
is higher than the mean, but these are rare.
8. a. In general, data sets with a narrow central peak
typically have less variation than those with broad,
spread out values (the presence of outliers can
confound the issue).
9. c. Driving in rush hour can go quickly on some
days, but is agonizingly slow on others, so the
time it takes to get downtown has high variation.
10. b. The mayor would like to see a high median and
low variation because that means there’s a tight,
central peak in the distribution that corresponds to
support for her.
DOES IT MAKE SENSE?
7. Makes sense. The two highest grades may be large
enough to balance the remaining seven lower
grades so that the third highest is the mean.
8. Does not make sense. The median for this exam is,
by definition, the mean of the two middle scores
(the fifth and sixth scores), and thus it could not be
the third highest score.
9. Makes sense. When very high outliers are present,
they tend to pull the mean to the right of the
median (right-skewed).
10. Does not make sense. The word “average” has
several meanings in mathematics, and the
management and employees may be using
different measures of the average (the mean and
the median, for example).
11. Does not make sense. When the mean, median,
and mode are all the same, it’s a sign that you
have a symmetric distribution of data.
12. Makes sense. Suppose the mean age of the general
population is 32 years. There’s no reason why a
college extension class (which typically attracts
older students) could not share the same mean.
BASIC SKILLS & CONCEPTS
13. The mean and median are 5.68 and 5.685,
respectively. Because there is an even number of
data points, the median is calculated by taking the
mean of the middle two values (add 5.67 and 5.70,
and divide by 2). Since no value occurs more than
once, there is no mode.
14. The mean, median, and mode are 98.44, 98.4, and
98.4, respectively. Because there is an even
number of data points, the median is calculated by
taking the mean of the middle two values (add
98.4 and 98.4, and divide by 2).
15. The mean and median are 201.14 and 200,
respectively. Since no value occurs more than
once, there is no mode.
16. The mean and median are 78.4 and 79,
respectively. This is a multi-modal data set, with
modes of 70, 73, 76, and 81.
17. The mean and median are 1.18 and 1.05,
respectively. Since no value occurs more than
once, there is no mode.
UNIT 6A: CHARACTERIZING DATA 107
18. The mean and median are 0.9194 and 0.920,
respectively. There is no mode.
19. The mean and median are 0.81237 and 0.8161,
respectively. The outlier is 0.7901, because it
varies from the others by a couple hundredths of a
pound, while the rest vary from each other only by
a few thousandths of a pound. Without the outlier,
the mean is 0.81608, and the median is 0.8163.
20. The mean is 11.125 years, and the median is 7
years. 27 is an outlier. Eliminating 27, the mean
and median are 5.83 and 3, respectively.
21. The median would be a better representation of the
average. A small number of households with high
earnings skew the distribution to the right,
affecting the mean.
22. This distribution is probably skewed to the right
by a small percentage of men who marry (for the
first time) very late in life. Therefore the median
would do a better job representing the average age
(the outliers affect the mean).
23. This distribution is probably skewed to the right
by a small percentage of people who change jobs
many times. Thus the median would do a better
job of representing the average, as the outliers
affect the mean.
24. More likely than not, this distribution is skewed to
the right by a few flights where many pieces of
luggage were lost. Because of this, the mean is
probably higher than the median, and thus the
airline would want to use the median as the best
representation of the average. Customers of the
airline might argue that the mean is a better
descriptor for the average.
25. This is a classic example of a symmetric
distribution, and thus the mean and the median are
almost identical (in a perfectly symmetric
distribution, they are identical) – either the mean
or the median could be used.
26. This distribution is probably skewed to the right
by those times when the wait is long, and therefore
the median is a better representation of the average
(certainly from the view of an advertisement for
the bank – customers might claim the mean is a
more honest way of describing the average wait
time).
27. a. There is one peak on the far left due to all the
people who scored low.
b. The distribution is right-skewed because the
scores tail off to the right.
c. The variation is large.
28. a. There would be two peaks, one representing the
figure skaters who use the rink in the morning,
who would have lower weights, and the other
representing the professional hockey players who
would have higher weights.
b. This distribution would be symmetric, as each
peak would be close to the same shape, centered
around a point in the middle of the data.
c. The variation would be moderate, since the
variation within the weights of hockey players and
within the weights of the figure skaters would not
be great.
29. a. There are likely two peaks in such a
distribution, one for the women, and one for the
men.
b. The distribution is symmetric because one
would find a tail on the left representing the
lightest people on the team, and a tail on the right
representing the heaviest people on the team.
c. Relative to a single-gendered set of athletes, the
variation would be large.
30. a. The distribution would not have a peak since all
digits, 0–9, have nearly the same frequency in the
case where the last digit is selected randomly.
b. This is a symmetric distribution because it is
uniform.
c. The data values are spread out evenly across the
distribution, so there is large variation.
31. a. Consider a typical data set for the monthly
sales: we expect the sales to be high in the winter
months, with very low sales (or even no sales) in
the summer. The number of shovels sold each
month, beginning in January, might look like this:
{25, 20, 15, 3, 0, 0, 0, 0, 0, 3, 15, 20}. To picture
this data set as a distribution, we create a
frequency histogram.
This histogram has two peaks, the second not as
dramatic as the first. Realize that although the
histogram would look different if we varied the
bin sizes or the hypothetical sales data, its basic
shape is captured in the diagram above. This is
due to the fact that a majority of the monthly sales
figures are expected to be low (corresponding to
the left peak), and the winter monthly sales figures
would be high (corresponding to the right peak).
108 CHAPTER 6: PUTTING STATISTICS TO WORK
b. The histogram shown in part a is right-skewed.
c. The variation is large because of the difference
between low sales in the summer, and high sales
in the winter.
32. a. There are likely two peaks in such a
distribution, one for the compact cars, and one for
the sports utility vehicles.
b. The frequency histogram in part a would be
symmetric.
c. The variation in car weights would be moderate
because the weights of compact cars and sports
utility vehicles would differ greatly, but there
would be little variation among the weights of
compact cars and among the weights of sports
utility vehicles.
33. a. There would be one peak in the distribution,
corresponding to the average price of detergent.
b. More likely than not, there would be no extreme
outliers in this data set, and it would be symmetric.
c. There would be moderate variation in the data
set because we don’t expect outliers. (The
detergent may not sell well next to 19 other brands
if its price is radically different. Of course this
begs the question, “But what if it’s much cheaper
than the other brands?” That’s not too likely to
happen for a commodity like detergent, because
once one company begins selling its detergent at a
low price, the others would be forced to lower
their prices to compete).
34. a. There would be one peak to the left,
corresponding to the ages of children, who would
visit the amusement parks in larger numbers than
adults.
b. The distribution is right-skewed because the
ages of adults visiting the amusement park would
tail off to the right.
c. The variation would be large, since there could
be infants and senior citizens visiting the
amusement park.
FURTHER APPLICATIONS
35. The distribution has two peaks (bimodal), with no
symmetry, and large variation. For tourists who
want to watch the geyser erupt, it means they may
have to wait around for a while (most likely
between 25 and 40 minutes, though perhaps as
long as 50 to 90 minutes, depending on when they
arrive at the site).
36. The distribution has one peak, and is right-skewed.
Most of the data is clustered around its peak,
though it has considerable spread, so it has
moderate variation. The histogram tells us that if a
computer chip is going to fail, it will most likely
fail soon, rather than late.
37. This distribution has one peak, is symmetric, and
has moderate variation. The graph says that
weights of rugby players are usually near the mean
weight, but that there are some light and some
heavy players.
38. a. As shown, the graph is not symmetric (though
the idea it represents – uniform distribution of
randomly chosen numbers – is symmetric).
b. No, the peaks appear randomly (though, again,
the smooth distribution one envisions when
numbers are chosen randomly could be described
as a distribution with no peaks).
c. No, the distribution would look different
because the numbers are chosen at random, and
the ups and downs of the graph would appear at
random (and different) locations.
d. The histogram would get smoother,
approaching a uniform distribution.
39. a. Sketches will vary. We cannot be sure of the
exact shape of the distribution with the given
parameters, as we don’t have all the raw data.
However, because the mean is larger than the
median, and because the data set has large outliers,
it would likely be right-skewed, with a single peak
at the mode.
b. About 150 families (50%) earned less than
$45,000 because that value is the median income.
c. We don’t have enough information to be certain
how many families earned more than $55,000, but
it is likely a little less than half, simply because the
mean is a measure of center for all distributions.
(Note that it is a little less than half because the
outliers tend to pull the mean right of the median).
40. a. Because the mode is 0 minutes, and the
distribution is right skewed (its mean is right of
the median, and there are high outliers), the graph
would have a single peak at 0 and would decrease
to the right. (Note that planes do not typically
depart earlier than scheduled – it would be unwise
to leave passengers behind – and thus there are no
negative delays).
b. 50% of the flights were delayed less than the
median time of 6 minutes due to the definition of
the median.
c. 50% of the flights were delayed more than the
median time of 6 minutes due to the definition of
the median.
TECHNOLOGY EXERCISES
48. a. The mean, median and mode for team A are
134, 125, and 115 respectively. The mean and
median for team B are 131 and 135, respectively.
There is no mode for Team B.
b. The coach of team A can claim that his team
has the greater mean weight.
UNIT 6B: MEASURES OF VARIATION 109
c. The coach of team B can claim that his team has
the greater median weight.
d. The mean weight of the teams combined is
132.5, which is equal to the mean of the two mean
weights of the teams. This is true since both teams
have the same number of players.
e. The median weight of the teams combined is
130, which is equal to the median of the two
median weights of the teams. This result is not
true in general.
UNIT 6B
8. a. Using the range rule of thumb, we see that the
range is about four times as large as the standard
deviation. (The worst-case scenario is when the
range and standard deviation are both zero).
9. b. Newborn infants and first grade boys have
similar heights, whereas there is considerable
variation in the heights of all elementary school
children.
10. c. Because the standard deviation for Garcia is the
largest, the data are more spread out, so Professor
Garcia must have had very high grades from more
students than the other two.
DOES IT MAKE SENSE?
TIME OUT TO THINK
Pg. 382. Examples and opinions will vary. Big Bank’s
three lines will have greater variation since
customers entering the bank can choose the
shortest line, change lines if the customer at the
front of the line has a time consuming transaction
to complete, and there is a chance that the person
at the front of all three lines has a lengthy
transaction.
Pg. 387. Students should notice the sense in which the
standard deviation is an average of the individual
deviations. Also, notice that the larger deviations
are more heavily weighted in the calculation as a
result of taking the squares.
QUICK QUIZ
1. a. The range is defined as the high value – low
value.
2. c. The five-number summary includes the low
value, the first quartile, median, and third quartile,
and the high value. Unless the mean happens to be
the same as the median, it would not be part of the
five-number summary.
3. a. Roughly half of any data set is contained
between the lower and upper quartiles (“roughly”
because with an odd number of data values, one
cannot break a data set into equal parts).
4. b. Consider the set {0, 0, 0, 0, 0, 0, 0, 0, 0, 100}.
Its mean is 10, which is larger than the upper
quartile of 0.
5. b. You need the high and low values to compute
the range, and you need all of the values to
compute the standard deviation, so the only thing
you can compute is a single deviation.
6. b. The standard deviation is defined in such a way
that it can be interpreted as the average distance of
a random value from the mean.
7. c. The standard deviation is defined as the square
root of the variance, and thus is always nonnegative.
7. Does not make sense. The range depends upon
only the low and high values, not on the middle
values.
8. Makes sense. The upper quartile includes the
highest score.
9. Makes sense. Consider the case where the 15
highest scores were 80, the next score was 68, and
the lowest 14 scores were 40. There are numerous
such cases that satisfy the conditions given.
10. Makes sense. Using the range rule of thumb, the
range is approximately four times as large as the
standard deviation. The only instance where the
range would not be larger than the standard
deviation is the case where both were 0.
11. Makes sense. The standard deviation describes the
spread of the data, and one would certainly expect
the heights of 5-year old children to have less
variation than the heights of children aged 3 to 15.
12. Does not make sense. The standard deviation
carries the same units as the mean.
BASIC SKILLS & CONCEPTS
13. The mean waiting time at Big Bank is (4.1 + 5.2 +
5.6 + 6.2 + 6.7 + 7.2 + 7.7 + 7.7 + 8.5 + 9.3 +
11.0) ÷ 11 = 79.2 ÷ 11 = 7.2. Because there are
11 values, the median is the sixth value = 7.2.
14. The mean waiting time at Best Bank is (6.6 + 6.7
+ 6.7 + 6.9 + 7.1 + 7.2 + 7.3 + 7.4 + 7.7 + 7.8 +
7.8) ÷ 11 = 79.2 ÷ 11 = 7.2. Because there are
11 values, the median is the sixth value = 7.2.
15. a. For the East Coast, the mean, median, and range
are 134.97, 123.45, and 117.8, respectively. For
the West Coast, the mean, median, and range are
145.82, 150.3, and 69.2, respectively.
b. For the East Coast, the five-number summary is
(98.2, 108.7, 123.45, 140.0, 216.0). For the West
Coast, it is (113.2, 122.7, 150.3, 156.0, 182.4).
The boxplots are not shown.
c. The standard deviation for the East Coast is
42.86. For the West Coast, it is 25.06.
110 CHAPTER 6: PUTTING STATISTICS TO WORK
d. For the East Coast, the standard deviation is
approximately 117.8 ÷ 4 = 29.45, which is a far
cry from the actual value of 42.86. This is due, in
part, to the outlier of New York City (216.0). For
the West Coast, the standard deviation is
approximately 69.2 ÷ 4 = 17.3, which is also well
off the mark.
e. The cost of living index is smaller, on average,
for the East Coast cities shown, though the
variation is higher (due in part to the outlier of
New York City).
16. a. For the coastal cities, the mean, median, and
range are 7943.6, 7832, and 5614, respectively.
The mean, median, and range for the non-coastal
cities are 6468.2, 6699, and 5358, respectively.
b. For the coastal cities, the five-number summary
is (4876, 7488, 7832, 8923, 10,490). For the noncoastal cities, it is (2899, 5683, 6699, 7434, 8257).
The boxplots are not shown.
c. The standard deviation for the coastal cities is
1593.4. It is 1533.9 for the non-coastal cities.
d. The standard deviation for the coastal cities is
approximately 5614 ÷ 4 = 1403.5, which
underestimates the actual value of 1593.4. For the
non-coastal cities, the standard deviation is
approximately 5358 ÷ 4 = 1339.5, which also
underestimates the actual value of 1533.9.
e. The mean and median taxes for non-coastal
cities are considerably smaller than the taxes paid
in coastal cities, though the variation is slightly
larger.
17. a. For the no treatment group, the mean, median,
and range are 0.184, 0.16, and 0.35, respectively.
For the treatment group, the mean, median, and
range are 1.334, 1.07, and 2.11, respectively.
b. The five-number summary for the no treatment
group is (0.02, 0.085, 0.16, 0.295, 0.37). It is
(0.27, 0.595, 1.07, 2.205, 2.38) for the treatment
group. The boxplots are not shown.
c. The standard deviation for the no treatment
group is 0.127; for the treatment group, it is 0.859.
d. The standard deviation for the no treatment
group is approximately 0.35 ÷ 4 = 0.0875, which
underestimates the actual value of 0.127. For the
treatment group, the standard deviation is
approximately 2.11 ÷ 4 = 0.5275, which is
underestimates the actual value of .859.
e. The average weight and standard deviation of
trees in the treatment group are higher that those
of the no treatment group.
18. a. The mean, median, and range for Beethoven’s
symphonies are 38.8, 36, and 42 minutes,
respectively. For Mahler, these numbers are 75,
80, and 44 minutes.
b. Beethoven’s five-number summary is (26, 29,
36, 45, 68). Mahler’s is (50, 62, 80, 87.5, 94). The
boxplots are not shown.
c. The standard deviation for Beethoven’s
symphonies is 13.13 minutes. It is 15.44 for
Mahler’s symphonies.
d. The standard deviation for Beethoven’s
symphonies is approximately 42 ÷ 4 = 10.5,
which underestimates the actual value of 13.13.
For Mahler, the standard deviation is
approximately 44 ÷ 4 = 11, which underestimates
the actual value of 15.44.
e. Mahler wrote symphonies that were
significantly longer, on average, than Beethoven’s,
and the variation is slightly larger for Mahler’s
symphonies.
FURTHER APPLICATIONS
19. a.
UNIT 6B: MEASURES OF VARIATION 111
b. The five-number summaries for each of the sets
shown are (in order): (9, 9, 9, 9, 9); (8, 8, 9, 10,
10); (8, 8, 9, 10, 10); and (6, 6, 9, 12, 12). The
boxplots are not shown.
c. The standard deviations for the sets are (in
order): 0.000, 0.816, 1.000, and 3.000.
d. Looking at just the answers to part c, we can tell
that the variation gets larger from set to set.
20. a.
21.
22.
23.
24.
25.
b. The five-number summaries for each of the sets
shown are (in order): (6, 6, 6, 6, 6); (5, 5, 6, 7, 7);
(5, 5, 6, 7, 7); and (3, 3, 6, 9, 9). The boxplots are
not shown.
c. The standard deviations for the sets are (in
order): 0.000, 0.816, 1.000, and 3.000.
d. Looking at just the answers to part c, we can tell
that the variation gets larger from set to set.
The first pizza shop has a slightly larger mean
delivery time, but much less variation in delivery
time when compared to the second shop. If you
want a reliable delivery time, choose the first
shop. If you don’t care when the pizza arrives,
choose the shop that offers cheaper pizza (you like
them equally well, so you may as well save some
money).
While Skyview Airlines runs a little ahead of
schedule (on average), it can also have long delays
(late arrivals). SkyHigh Airlines on average runs
slightly behind schedule, but it tends to have
smaller delays. If you want to avoid the possibility
of long delays, SkyHigh might be the better
choice.
A lower standard deviation means more certainty
in the return, and a lower risk.
Factory A has the lower average defect rate per
day, but on some days, it can be expected to
produce more defective chips than Factory B. The
process in Factory B is more uniform and
consistent, and thus one might argue that it is more
reliable.
Batting averages are less varied today than they
were in the past. Because the mean is unchanged,
batting averages above .350 are less common
today.
TECHNOLOGY EXERCISES
30. a. The mean, median, and standard deviation of
the data are 91.25, 93.5, and 8.85, respectively.
b. The mean, median, and standard deviation of
the data are 99.55, 94, and 31.3, respectively. The
mean and standard deviation are both greater.
c. The mean, median, and standard deviation of
the new data are 91.25, 93, and 8.48, respectively.
112 CHAPTER 6: PUTTING STATISTICS TO WORK
The mean remains the same.
standard deviation are smaller.
The median and
UNIT 6C
6.
TIME OUT TO THINK
Pg. 396. No; the 100th percentile is defined as a score
for which 100% of the data values are less than or
equal to it. Thus it must be the highest score, so no
one can score ABOVE the 100th percentile.
Pg. 397. The most likely explanation is that a
minimum height is thought necessary for the
army. If there is any maximum height, it must be
taller than most men (since men make up much of
the army), which means it is much taller than most
women. Opinions will vary on the issue of
fairness. This would be a good topic for a
discussion either during or outside of class.
7.
8.
QUICK QUIZ
1. b. A normal distribution is symmetric with one
peak, and this produces a bell-shaped curve.
2. a. The mean, median and mode are equal in a
normal distribution.
3. a. Data values farther from the mean correspond to
lower frequencies than those close to the mean,
due to the bell-shaped distribution.
4. c. Since most of the workers earn minimum wage,
the mode of the wage distribution is on the left,
and the distribution is right-skewed.
5. a. Roughly 68% of data values fall within one
standard deviation of the mean, and 68% is about
2/3.
6. c. In a normally distributed data set, 99.7% of all
data fall within 3 standard deviations of the mean.
7. a. Note that 43 mpg is 1 standard deviation above
the mean. Since 68% of the data lies within one
standard deviation of the mean, the remaining
32% is farther than 1 standard deviation from the
mean. The normal distribution is symmetric, so
half of 32%, or 16%, lies above 1 standard
deviation.
8. c. The z-score for 84 is (84 – 75)/6 = 1.5.
9. b. Table 6.3 shows that the percentile
corresponding with a standard score of –0.60 is
27.43%.
10. c. If your friend said his IQ was in the 75th
percentile, it would mean his IQ is larger than
75% of the population. It is impossible to have an
IQ that is larger than 102% of the population.
DOES IT MAKE SENSE?
5. Makes sense. Physical characteristics are often
normally distributed, and college basketball
9.
10.
players are tall, though not all of them, so it makes
sense that the mean is 6 '3" , with a standard
deviation of 3” (this would put approximately 95%
of the players within 5'9" to 6 '9" ).
Makes sense. Physical characteristics are often
normally distributed, and these statistics say that
approximately 95% of babies born at Belmont are
between 4.4 and 9.2 pounds (which is reasonable).
Does not make sense. With a mean of 6.8 pounds,
and a standard deviation of 7 pounds, you would
expect that a small percentage of babies would be
born 2 standard deviations above the mean. But
this would imply that some babies weigh more
than 20.8 pounds, which is much too heavy for a
newborn.
Does not make sense. The standard score is
computed for an individual data value, not for an
entire set of exams scores. Furthermore, a standard
score of 75 for a particular data value would mean
that value is 75 standard deviations away from the
mean, which is virtually impossible (almost all
data in a normally distributed data set is within 3
standard deviations of the mean).
Makes sense. A standard score of 2 or more
corresponds to percentiles of 97.72% and higher,
so while this could certainly happen, the teacher is
giving out As (on the final exam) to only 2.28% of
the class
Makes sense. If Jack is in the 50th percentile, he is
taller than half of the population (to which he is
being compared), and by definition, he is at the
median height.
BASIC SKILLS & CONCEPTS
11. Diagrams a and c are normal, and diagram c has a
larger standard deviation because its values are
more spread out.
12. Diagrams b and c are normal, and diagram b has a
larger standard deviation because its values are
more spread out.
13. Most trains would probably leave near to their
scheduled departure times, but a few could be
considerably delayed, which would result in a
right-skewed (and non-normal) distribution.
14. This distribution would be normal because a
machine fills the bags with approximately 40
pounds of lawn fertilizer, and the variation in these
weights would be random, which typically results
in normal distributions.
15. Random factors would be responsible for the
variation seen in the distances from the bull’s-eye,
so this distribution would be normal.
16. Distributions that measure athletic ability tend to
be normal.
UNIT 6C: THE NORMAL DISTRIBUTION 113
17. Easy exams would have scores that are leftskewed, so the scores would not be normally
distributed.
18. Performances in athletic events are often normally
distributed.
19. a. Half of any normally distributed data set lies
below the mean, so 50% of the scores are less than
100.
b. Note that 120 is one standard deviation above
the mean. Since 68% of the data lies within one
standard deviation of the mean (between 80 and
120), half of 68%, or 34%, lies between 100 and
120. Add 34% to 50% (the answer to part a) to
find that 84% of the scores are less than 120.
c. Note that 140 is two standard deviations above
the mean. Since 95% of the data lies within two
standard deviations above the mean (between 60
and 140), half of 95%, or 47.5%, lies between 100
and 140. Add 47.5% to 50% (see part a) to find
that 97.5% of the scores are below 140.
d. Note that 60 is two standard deviations below
the mean. Since 95% of the data lies within two
standard deviations of the mean (between 60 and
140), 5% of the data lies outside this range.
Because the normal distribution is symmetric, half
of 5%, or 2.5%, lies in the lower tail below 60.
e. A sketch of the normal distribution would show
that the amount of data lying above 60 is
equivalent to the amount of data lying below 140,
so the answer is 97.5% (see part c).
f. Note that 160 is three standard deviations above
the mean. Since 99.7% of the data lies within three
standard deviations of the mean (between 40 and
160), 0.3% of the data lies outside this range.
Because the normal distribution is symmetric, half
of 0.3%, or 0.15%, lies in the upper tail above 160.
g. A sketch of the normal distribution shows that
the percent of data lying above 80 is equivalent to
the percent lying below 120, so the answer is 84%
(see part b).
h. The percent of data between 60 and 140 is 95%,
because these are each two standard deviations
from the mean.
20. a. A heart rate of 55 is one standard deviation
below the mean. Since 68% of the data lies
between 55 and 85, 32% lie outside that range,
which implies 16% lies below 55 (due to
symmetry).
b. A heart rate of 40 is two standard deviations
below the mean. Since 95% of the data lies
between 40 and 100, 5% lie outside that range,
which implies 2.5% lies below 40 (due to
symmetry).
c. Half of 68% (or 34%) lies between 70 and 85,
50% lies below the mean of 70, so 84% of the data
is less than 85.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
d. Half of 95% (or 47.5%) lies between 70 and
100, and 50% lies below the mean of 70, so 97.5%
of the data lies below 100.
e. This portion of the distribution is equal to the
portion less than 55 because of symmetry, so the
answer is 16% (see part a).
f. The portion of the distribution greater than 55 is
equal to the portion less than 85 because of
symmetry, so the answer is 84% (see part c).
g. The portion of the distribution more than 40 is
equal to the portion less than 100 because of
symmetry, so the answer is 97.5% (see part d).
h. Both 55 and 85 are one standard deviation from
the mean, and thus 68% of the data lies between
them.
A score of 59 is one standard deviation below the
mean. Since 68% of normally distributed data lies
within one standard deviation of the mean, half of
68%, or 34%, lies between 59 and 67 due to
symmetry. The percent of data lying above 67 is
50%, so we add these two results to find that 84%
of the scores lie above 59.
A score of 83 is two standard deviations above the
mean. Since 95% of normally distributed data lies
within two standard deviations of the mean
(between 51 and 83), 5% lies outside this range.
Half of 5%, or 2.5%, will be in the upper tail,
above 83, so 100% – 2.5% = 97.5% of the scores
lie below 83.
Subtract two standard deviations from the mean to
get the cutoff score of 67 – 16 = 51. Since 95% of
the data lies within two standard deviations of the
mean, 5% lies outside this range. Half of 5%, or
2.5%, will be below 51, and will fail the exam.
A score of 75 is one standard deviation above the
mean. Since 68% of the data lies within one
standard deviation of the mean (between 59 and
75), 32% of the data is outside this range. Half of
32%, or 16%, will be in the lower tail of the
distribution. Since 68% + 16% = 84%, 84% ×
500 = 420 of the students will score below 75.
The standard score for 67 is z = (67 – 67)/8 = 0.
The standard score for 59 is z = (59 – 67)/8 = –1.
The standard score for 55 is z = (55 – 67)/8 = –1.5
The standard score for 88 is z = (88 – 67)/8 =
2.625.
a. z = 1. The corresponding percentile is 84.13%.
b. z = 0.5, and the corresponding percentile is
69.15%.
c. z = –1.5, and the corresponding percentile is
6.68%.
a. z = –0.5, and the corresponding percentile is
30.85%.
b. z = –2, and the corresponding percentile is
2.28%.
114 CHAPTER 6: PUTTING STATISTICS TO WORK
c. z = 1.2, and the corresponding percentile is
88.48%.
31. a. The 20th percentile corresponds to a standard
score of about z = –0.83, which means the data
value is about 0.83 standard deviations below the
mean.
b. The 80th percentile corresponds to a standard
score of about z = 0.84, which means the data
value is 0.84 standard deviations above the mean.
c. The 63rd percentile corresponds to a standard
score of about z = 0.33, which means the data
value is 0.33 standard deviations above the mean.
32. a. The 10th percentile corresponds to a standard
score of about z = –1.28, which means the data
value is 1.28 standard deviations below the mean.
b. The 35th percentile corresponds to a standard
score of about z = –0.38, which means the data
value is 0.83 standard deviations below the mean.
c. The 88th percentile corresponds to a standard
score of about z = 1.18, which means the data
value is 1.18 standard deviations above the mean.
39.
40.
41.
42.
43.
FURTHER APPLICATIONS
33. About 68% of births occur within 15 days of the
mean, because this range is one standard deviation
from the mean.
34. Within one month of the due date corresponds to
two standard deviations on either side of the mean,
which implies 95% of the births occur within one
month of the mean.
35. Since 68% of births occur within 15 days of the
due date, 32% occur outside this range, and half of
32%, or 16%, occur more than 15 days after the
due date.
36. Since 95% of births occur within 30 days of the
due date, 5% occur outside this range, and half of
5%, or 2.5%, occur more than 30 days after the
due date.
37. a. The standard score is z = (66 – 63.6)/2.5 = 0.96,
which corresponds to an approximate percentile of
83%.
b. z = (62 – 63.6)/2.5 = –0.64, which corresponds
to an approximate percentile of 26%.
c. z = (61.5 – 63.6)/2.5 = –0.84, which
corresponds to an approximate percentile of 20%.
d. z = (64.7 – 63.6)/2.5 = 0.44, which corresponds
to an approximate percentile of 67%.
38. a. The standard score is z = (22 – 26.2)/4.7 = –
0.89, which corresponds to an approximate
percentile of 19%.
b. z = (28 – 26.2)/4.7 = 0.38, which corresponds to
an approximate percentile of 65%.
c. For a BMI of 25, z = (25 – 26.2)/4.7 = –0.26,
which corresponds to an approximate percentile of
40%, so 60% of American men are overweight.
For a BMI of 30, z = (30 – 26.2)/4.7 = 0.81, which
44.
45.
46.
47.
corresponds to an approximate percentile of 79%,
so 21% of American men are obese.
It is not likely because heights of eighth-graders
would be normally distributed, and a standard
deviation of 40 inches would imply some of the
eighth graders (about 16%) are taller than 95
inches, which is preposterous.
Assuming the scores are out of 100, it is not likely.
A standard deviation of 50 would imply some of
the scores are higher than 100.
The 90th percentile corresponds to a z-score of
about 1.3, which means a data value is 1.3
standard deviations above the mean. For the verbal
portion of the GRE, this translates into a score of
462 + 1.3(119) = 617. For the quantitative portion,
it translates into a score of 584 + 1.3(151) = 780.
The standard score for 600 is z = (600 – 462)/119
= 1.16, which corresponds to about the 87th
percentile. This is the percent of students who
score below 600, so the percent that score above
600 is 13%.
The standard score for 600 is z = (600 – 584)/151
= 0.106, which corresponds to about the 54th
percentile. This is the percent of students who
score below 600, so the percent that score above
600 is 46%.
The standard score for 700 is z = (700 – 584)/151
= 0.77, which is 0.77 standard deviations above
the mean. Your verbal score would also need to be
0.77 standard deviations above the mean to be in
the same percentile, which translates to a score of
462 + 0.77(198) = 554.
The z-score for an 800 on the quantitative exam is
z = (800 – 584)/151 = 1.43, or about 1.4. This
corresponds to the 91.92 percentile, so about 92%
score below 800, which means 8% score 800.
The z-score for an 800 on the verbal exam is z =
(800 – 462)/119 = 2.84, which corresponds to an
approximate percentile of 99.7%. Since 99.7%
score below 800, about 0.3% score 800.
For the verbal exam, z = (400 – 462)/119 = –0.52.
, which corresponds to an approximate percentile
of 30%. Thus about 30% score below 400. For the
quantitative exam, z = (400 – 584)/151 = –1.22,
which corresponds to an approximate percentile of
11%. Thus about 11% score below 400.
TECHNOLOGY EXERCISES
53. a. z = (72 – 69.7)/2.7 = 0.85, which, from table
6.3, corresponds to an approximate percentile of
80%.
b. z = (65 – 69.7)/2.7 = –1.74;
NORMDIST(64,69.7,2.7,TRUE)
gives
the
percentile as 4.0865%, which corresponds to an
approximate
percentile
of
4%.
UNIT 6D: STATISTICAL INFERENCE 115
c. 6 '4" = 76"; NORMDIST(76,69.7,2.7,TRUE),
gives the percentile as 99.0185%, so
approximately 1% of American adult males are
taller than 6 '4".
d. 5' 4" = 64"; NORMDIST(64,69.7,2.7,TRUE),
gives the percentile as 1.7381%, so approximately
1.7% of American adult males are shorter than
5' 4".
e. 6 '7" = 79"; NORMDIST(79,69.7,2.7,TRUE),
gives the percentile as 99.97139%, so
approximately (100% – 99.97139%)
×
115,000,000 = 33,000 of American adult males are
as tall as Kobe Bryant.
UNIT 6D
TIME OUT TO THINK
Pg. 403. The probability is 0.01, or 1 in 100, that the
results occurred by chance. This is certainly strong
evidence in support of the herbal remedy, but by
no means does it constitute proof.
Pg. 405. Opinions will vary, but 95% confidence
seems reasonable for most surveys, which have no
need to be perfect. However, for something of
great importance, such as deciding an election, it
is far too low. That is why elections must be
decided by letting everyone vote, not through
statistical sampling.
Pg. 407. The null hypothesis is “the suspect is
innocent.” A guilty verdict means rejecting the
null hypothesis, concluding that the suspect is
NOT innocent. A “not guilty” verdict means not
rejecting the null hypothesis, in which case we
continue to assume that the suspect is innocent.
QUICK QUIZ
1. c. A divorce rate of 60% is statistically significant
because such large variation from the national rate
of 30% is not likely to occur by chance alone.
2. b. In order to determine whether a result is
statistically significant, we must be able to
compare it to a known quantity (such as the effect
on a control group in this case).
3. a. Statistical significance at the 0.01 level means
an observed difference between the effectiveness
of the remedy and the placebo would occur by
chance in fewer than 1 out of 100 trials of the
experiment.
4. c. If the statistical significance is at the 0.05 level,
we would expect that 1 in 20 experiments would
show the pill working better than the placebo by
chance alone.
5. b. The confidence interval is found by subtracting
and adding 3% to the results of the poll (65%).
6. c. The central limit theorem tells us that the true
proportion is within the margin of error 95% of the
time.
1
. Set
7. c. The margin of error is approximately
n
this equal to 4% (0.04), and solve for n to find that
the sample size was 625. If we quadruple the
sample size to 2500, the margin of error will be
1
= 0.02 , or 2%.
approximately
2500
8. a. The null hypothesis is often taken to be the
assumption that there is no difference in the things
being compared.
9. c. If you cannot reject the null hypothesis, the only
thing you can be sure of is that you don’t have
evidence to support the alternative hypothesis.
10. b. If the difference in gas prices can be explained
by chance in only 1 out of 100 experiments,
you’ve found a result that is statistically
significant at the 0.01 level.
DOES IT MAKE SENSE?
9. Makes sense. If the number of people cured by the
new drug is not significantly larger than the
number cured by the old drug, the difference could
be explained by chance alone, which means the
results are not statistically significant.
10. Does not make sense. While results significant at
the 0.05 level offer some evidence that the Magic
Diet Pill works, we cannot remove all doubt that
the results came about by chance alone.
11. Does not make sense. The margin of error is based
upon the number of people surveyed (the sample
size), and thus both Agency A and B should have
the same margin of error.
12. Makes sense. The margin of error is based upon
the number of people surveyed (the sample size),
1
, it decreases
and because it is approximately
n
as n increases.
13. Makes sense. The alternative hypothesis is the
claim that is accepted if the null hypothesis is
rejected.
14. Does not make sense. One cannot prove the null
hypothesis – it can be rejected, in which case there
is evidence for the alternative hypothesis, or we
can not reject it, which only means we do not have
evidence to support the alternative hypothesis.
BASIC SKILLS & CONCEPTS
15. Most of the time you would expect around 10
sixes in 60 rolls of a standard die. It would be very
rare to get only 2 sixes, and thus this is statistically
significant.
116 CHAPTER 6: PUTTING STATISTICS TO WORK
16. Most of the time you would expect around 50
heads in 100 tosses of a coin, and since 48 is well
within the bounds of what would be expected by
chance alone, this is not statistically significant.
17. Note that 11/200 is about 5% – this is exactly what
you would expect by chance alone from an airline
that has 95% of its flights on time.
18. It is very unlikely that a basketball player that
makes 92% of free throws would miss 18 in a row.
This is statistically significant. (In chapter 7, you
will learn that probability of this occurring is
practically zero.)
19. There aren’t many winners in the Power Ball
lottery, and yet there are a good number of 7-11
stores. It would be very unlikely that ten winners
in a row purchased their tickets at the same store,
so this is statistically significant.
20. The likelihood of 42 people in a group of 45 all
sharing the same birthday is extremely small, so
this is statistically significant.
21. The result is significant at far less than the 0.01
level (which means it is significant at both the
0.01 and 0.05 level), because the probability of
finding the stated results is less than 1 in 1 million
by chance alone. This gives us good evidence to
support the alternative hypothesis that the
accepted value for temperature is wrong.
22. This result is significant at the 1/10,000 = 0.0001
level, so it is significant at both the 0.01 and 0.05
levels. Because chance alone does not do a good
job of explaining the difference found in length of
hospital stays, it would be reasonable to conclude
that seat belts reduce the severity of injuries
(assuming that the severity of injury is directly
correlated with length of hospital stay).
23. The results of the study are significant at neither
the 0.05 level nor the 0.01 level, and thus the
improvement is not statistically significant.
24. The difference found in the weights would be
expected by chance alone in only 1 out of 100
such surveys.
25. The margin of error is 1/ 1012 = 0.03 , or 3%,
and the 95% confidence interval is 29% to 35%.
This means we can be 95% confident that the
actual percentage is between 29% and 35%.
26. The margin of error is 1/ 65, 000 = 0.004 , or
0.4%, and the 95% confidence interval is 8.5% to
9.3%. This means we can be 95% confident that
the actual percentage is between 8.5% and 9.3%.
27. The margin of error is 1/ 540 = 0.043 , or 4.3%,
and the 95% confidence interval is 59.7% to
68.3% for the satisfied group, and 59.7% to 68.3%
for the dissatisfied group. This means we can be
95% confident that the actual percentage is
between and 59.7% and 68.3%.
28. The margin of error is 1/ 1110 = 0.03 , or 3%,
and the 95% confidence interval is 56% to 62%.
This means we can be 95% confident that the
actual percentage is between and 56% and 62%.
29. The margin of error is 1/ 1003 = 0.03 , or 3%,
and the 95% confidence interval is 45% to 51%.
This means we can be 95% confident that the
actual percentage is between and 45% and 51%.
30. The margin of error is 1/ 2818 = 0.02 , or 2%,
and the 95% confidence intervals are 71% to 75%
and 49% to 53%. This means we can be 95%
confident that the actual percentage is between and
71% and 75%.
31. The margin of error is 1/ 1100 = 0.03 , or 3%,
and the 95% confidence intervals are 38% to 44%
and 12% to 16%. This means we can be 95%
confident that the actual percentage is between and
38% and 44%.
32. The margin of error is 1/ 1229 = 0.03 , or 3%,
and the 95% confidence intervals are 25% to 31%
and 12% and 18%.
33. a. Null hypothesis: percentage of high school
graduates = 85%. Alternative hypothesis:
percentage of high school students > 85%.
b. Rejecting the null hypothesis means there is
evidence that the percentage of high school
graduates exceeds 85%. Failing to reject the null
hypothesis means there is insufficient evidence to
conclude that the percentage of high school
graduates exceeds 85%.
34. a. Null hypothesis: vitamin C in tablets = 500 mg.
Alternate hypothesis: vitamin C in tablets < 500
mg.
b. Rejecting the null hypothesis means that there is
evidence that the tablets contain less than 500 mg.
Failing to reject the null hypothesis means there is
insufficient evidence to conclude that the tablets
contain less than 500 mg.
35. a. Null hypothesis: mean teacher salary = $47,750.
Alternative hypothesis: mean teacher salary >
$47,750.
b. Rejecting the null hypothesis means there is
evidence that the mean teacher salary in the state
exceeds $47,750. Failing to reject the null
hypothesis means there is insufficient evidence to
conclude that the mean teacher salary exceeds
$47,750.
36. a. Null hypothesis: water usage = 1675 gallons per
month. Alternative hypothesis: water usage > 1675
gallons per month.
b. Rejecting the null hypothesis means there is
evidence that the water usage exceeds 1675
gallons per month. Failing to reject the null
hypothesis means there is insufficient evidence to
UNIT 6D: STATISTICAL INFERENCE 117
37.
38.
39.
40.
41.
42.
43.
44.
conclude that water usage exceeds 1675 gallons
per month.
a. Null hypothesis: percentage of underrepresented
students = 20%. Alternative hypothesis:
percentage of underrepresented students < 20%.
b. Rejecting the null hypothesis means there is
evidence that the percentage of underrepresented
students is less than 20%. Failing to reject the null
hypothesis means there is insufficient evidence to
conclude that the percentage of underrepresented
students is less than 20%.
a. Null hypothesis: pollution levels = EPA
minimum. Alternative hypothesis: pollution levels
< EPA minimum.
b. Rejecting the null hypothesis means there is
evidence that the pollution levels are less than the
EPA minimum. Failing to reject the null
hypothesis means there is insufficient evidence to
conclude that the pollution levels are less than the
EPA minimum.
Null hypothesis: mean annual mileage of cars in
the fleet = 11,725 miles. Alternative hypothesis:
mean annual mileage of cars in the fleet > 11,725.
The result is significant at the 0.01 level, and
provides good evidence for rejecting the null
hypothesis.
Null hypothesis: proportion of supporting voters =
0.5. Alternative hypothesis: proportion of
supporting voters > 0.5. The result is not
significant at the 0.05 level, and there are no
grounds for rejecting the null hypothesis.
Null hypothesis: mean stay = 2.1 days. Alternative
hypothesis: mean stay > 2.1 days. The result is not
significant at the 0.05 level, and there are no
grounds for rejecting the null hypothesis.
Null hypothesis: mean bounce height = 90.10
inches. Alternative hypothesis: mean bounce
height < 90.10 inches. The result is significant at
the 0.05 level, and provides evidence for rejecting
the null hypothesis.
Null hypothesis: mean income = $50,000.
Alternative hypothesis: mean income > $50,000.
The result is significant at the 0.01 level, and
provides good evidence for rejecting the null
hypothesis.
Null hypothesis: mean ownership = 7.5 years.
Alternative hypothesis: mean ownership < 7.5
years. The result is not significant at the 0.05
level, and there are no grounds for rejecting the
null hypothesis.
FURTHER APPLICATIONS
45. The margin of error is 1/ 13, 000 = 0.009 , or
0.9%, and the 95% confidence interval is 64.1% to
65.9%. This means we can be 95% confident that
the actual percentage of viewers who watched
American Idol was between 64.1% and 65.9%.
46. Null hypothesis: mean birth weight = 3.38 kg.
Alternative hypothesis: mean birth weight > 3,38
kg. The result is significant at the 0.01 level, and
provides good evidence for rejecting the null
hypothesis.
47. The margin of error is approximately 1/ n . If we
want to decrease the margin of error by a factor of
2, we must increase the sample size by a factor of
1
1
1 1
4, because
=
= ⋅
.
4n 2 n 2 n
48. The margin of error is approximately 1/ n . If we
want to decrease the margin of error by a factor of
10, we must increase the sample size by a factor of
1
1
1 1
100, because
=
= ⋅
.
10
n
100n 10 n
49. The margin of error, 3%, is consistent with the
sample size because 1/ 1019 = 0.03 = 3%.