CHAPTER 4 ANSWERS
Section 4.1
Statistical Literacy and Critical Thinking
1
An outlier in a data set is a value that is much higher or much lower than
almost all other values. This definition is not exact enough to clearly and
objectively determine whether a value is an outlier. Although there are
statistical tools that can aid in such a determination, some judgment is
also required.
2
The median will do a better job of describing the income of a typical person
in the class. The professor’s salary will simply be the largest value in the
list of salaries, so it will not affect the median. The mean, however, will
be greatly affected by the professor’s outlier salary since the mean is
obtained by summing all of the salaries and dividing by the number of
salaries, 25.
3
It is not likely that the result will be a good estimate of the mean
commuting time for all workers. This procedure treats all of the states
equally, regardless of the number of commuters and the number of large
cities. Those states with more commuters should be weighted more heavily in
determining a mean for all commuters.
4
No. The numbers on the jerseys are just labels for the names of the players.
They do not measure or count anything, so the mean would be a meaningless
statistic.
5
This statement does not make sense. There is only one mean for a data set.
6
This statement is sensible. A set of data may have more than one mode. For
example, the set of data 65.2, 65.2, 72.3, 75.0, 72.3, 81.4 has two numbers
that occur more often than any other values: 65.2 and 72.3.
7
This statement is sensible. It is possible for a set of data to have the
same values for the mean, median, and mode. For example, the data set
consisting of 4, 6, 6, 6, 8 has mean, median, and mode all equal to 6.
8
The statement does not make sense. A mean calculated from the original raw
data does not have to be equal to a mean calculated from a frequency table
for the same data. The reason is that when using a frequency table to
compute a mean, all of the values in a bin are assumed to be equal when, in
fact, they are usually not equal.
9
The median best describes the average income of adults in a large city since
it is not affected by the very large incomes of a relatively small group of
people. Half of the adults will have incomes below the median and half will
have incomes above the median.
10
Since oranges packed in a large box are usually pre-sorted so that they are
similar in size, either the mean or the median will provide a good average.
If the box contains a random collection of oranges just picked, the mean
would be a better (and quicker) average to use since only the total weight
and total number of oranges are needed to find it.
11
The median would best describe the average number of times that people change
jobs since it would not be influenced by the large numbers of changes made by
a few people.
12
The mean would best describe the average number of pieces of lost luggage per
flight. There will likely be very few large numbers to influence the value
of the mean, and the mean also reflects the total number of pieces lost.
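The two small data sets quoted in the answers on multiple modes and on equal mean, median, and mode can be checked directly. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original solutions.

```python
from statistics import mean, median, multimode

# Data set quoted above in which two values tie for most frequent.
two_modes = [65.2, 65.2, 72.3, 75.0, 72.3, 81.4]
modes = multimode(two_modes)  # every value that has the highest count

# Data set quoted above in which mean, median, and mode coincide.
same_center = [4, 6, 6, 6, 8]
centers = (mean(same_center), median(same_center), multimode(same_center))

print("modes of first set:", modes)
print("mean, median, modes of second set:", centers)
```

`multimode` is used rather than `mode` because it returns every tied value, which matches the idea that a data set may have more than one mode.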
Concepts and Applications
13
For the mean, total the 8 numbers and divide by 8. Round the answer to one
more decimal place than shown in the data.
Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley
Mean = (53 + 52 + 75 + 62 + 68 + 58 + 49 + 49)/8 = 466/8 = 58.3
For the median, first put the eight numbers in increasing order. Since there
is an even number of data values, the median is the average of the middle two
(fourth and fifth) numbers. Thus median = (53 + 58)/2 = 55.5.
The mode is the number that occurs most often. Since 49 occurs twice and no
other number occurs more than once, 49 is the mode.
14
For the mean, total the 10 numbers and divide by 10. Round the answer to one
more decimal place than shown in the data.
Mean = (98.6 + 98.6 + 98.0 + 98.0 + 99.0 + 98.4 + 98.4 + 98.4 + 98.4 + 98.6)/10
     = 984.4/10 = 98.44
For the median, first put the ten numbers in increasing order. Since there
is an even number of data values, the median is the average of the middle two
(fifth and sixth) numbers. Since the ordered list is 98.0, 98.0, 98.4, 98.4,
98.4, 98.4, 98.6, 98.6, 98.6, 99.0, the median = (98.4 + 98.4)/2 = 98.4.
The mode is the number that occurs most often. Since 98.4 occurs four times
and no other number occurs more than three times, 98.4 is the mode.
15
For the mean, total the 12 numbers and divide by 12. Round the answer to one
more decimal place than shown in the data.
Mean = (0.27 + 0.17 + 0.17 + 0.16 + 0.13 + 0.24 + 0.29 + 0.24 + 0.14 + 0.16
        + 0.12 + 0.16)/12 = 2.25/12 = 0.188
For the median, first put the twelve numbers in increasing order. Since
there is an even number of data values, the median is the average of the
middle two (sixth and seventh) numbers. Thus median = (0.16 + 0.17)/2 =
0.165.
The mode is the number that occurs most often. Since 0.16 occurs three times
and no other number occurs more than twice, 0.16 is the mode.
16
For the mean, total the 12 numbers and divide by 12. Round the answer to one
more decimal place than shown in the data.
Mean = (98 + 92 + 95 + 87 + 96 + 90 + 65 + 92 + 95 + 93 + 98 + 94)/12
     = 1095/12 = 91.3 min.
For the median, first put the twelve numbers in increasing order. Since
there is an even number of data values, the median is the average of the
middle two (sixth and seventh) numbers. Thus median = (93 + 94)/2 = 93.5
minutes.
The mode is the number that occurs most often. Since 92, 95, and 98 each
occur two times and no other number occurs more than once, 92, 95, and 98
minutes are all modes.
17
For the mean, total the 11 numbers and divide by 11. Round the answer to one
more decimal place than shown in the data.
Mean = (0.72 + 0.90 + 0.84 + 0.68 + 0.84 + 0.90 + 0.92 + 0.84 + 0.64 + 0.84
        + 0.76)/11 = 8.88/11 = 0.807 mm.
For the median, first put the eleven numbers in increasing order. Since
there is an odd number of data values, the median is the middle (sixth)
number. Thus median = 0.84 mm.
The mode is the number that occurs most often. Since 0.84 occurs four times
and no other number occurs more than twice, 0.84 mm is the mode.
18
For the mean, total the 15 ages and divide by 15. Round the answer to one
more decimal place than shown in the data.
Mean = (57 + 61 + 57 + 57 + 58 + 57 + 61 + 54 + 68 + 51 + 49 + 64 + 50 + 48
        + 65)/15 = 857/15 = 57.1.
For the median, put the fifteen numbers in increasing order. Since there is
an odd number of data values, the median is the middle (eighth) number. Thus
the median = 57.
The mode is the number that occurs most often. Since 57 occurs four times
and no other number occurs more than twice, the mode is 57.
19
For the mean, total the 11 weights and divide by 11. Round the answer to one
more decimal place than shown in the data.
Mean = (0.957 + 0.912 + 0.842 + 0.925 + 0.939 + 0.886 + 0.914 + 0.913 + 0.958
        + 0.947 + 0.920)/11 = 10.113/11 = 0.9194 g.
For the median, first put the eleven numbers in increasing order. Since
there is an odd number of data values, the median is the middle (sixth)
number. Thus median = 0.920 g.
The mode is the number that occurs most often. Since no value occurs more
than once, there is no mode.
20
For the mean, total the 22 weights and divide by 22. Round the answer to one
more decimal place than shown in the data.
The sum of the 22 weights is 123.61 g, so the mean is 123.61/22 = 5.619 g.
For the median, first put the 22 numbers in increasing order. Since there
is an even number of data values, the median is the average of the middle two
(eleventh and twelfth) numbers. Thus median = (5.59 + 5.60)/2 = 5.595 g.
The mode is the number that occurs most often. Since 5.58 occurs three times
and no other number occurs more than twice, 5.58 g is the mode.
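The mean, median, and mode procedure used in these answers can be sketched in a few lines. The example below uses the twelve times from answer 16; the variable names are illustrative assumptions. The unrounded mean is reported on purpose, because the solutions round by hand to one more decimal place than the data, while Python's built-in `round` rounds exact halves to the nearest even digit.

```python
from statistics import mean, median, multimode

# The twelve times (in minutes) listed in answer 16.
times = [98, 92, 95, 87, 96, 90, 65, 92, 95, 93, 98, 94]

times_mean = mean(times)                 # total divided by the number of values
times_median = median(times)             # average of the 6th and 7th ordered values
times_modes = sorted(multimode(times))   # all values tied for most frequent

print("mean:", times_mean)
print("median:", times_median)
print("modes:", times_modes)
```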
21
a)
For the mean, total the seven areas and divide by 7.
The sum of the seven areas is 1,103,100 square miles, so the mean is
1,103,100/7 = 157,586 square miles.
For the median, first put the seven numbers in increasing order.
Since there is an odd number of data values, the median is the middle
(fourth) number. Thus median = 104,100 square miles.
b)
Alaska is an outlier on the high end. Without Alaska, the mean is
487,900/6 = 81,317 square miles. The median is the average of the
third and fourth values = (53,200 + 104,100)/2 = 78,650 square miles.
c)
Connecticut is an outlier on the low end. Without Connecticut, the
mean is 1,097,600/6 = 182,933 square miles. The median is the average
of the third and fourth values = (104,100 + 114,000)/2 = 109,050 square
miles.
22
a)
The mean equals the total weight/7 cans = 5.6866/7 = 0.8124 pounds.
The median is the fourth smallest number in the ordered list of seven
weights, or 0.8161 pounds.
b)
0.7901 is an outlier since it is considerably lower than all of the
other six values.
c)
If the outlier is excluded, the mean becomes 4.8965/6 = 0.8161 and the
median becomes the average of the third and fourth numbers in the
ordered list of six numbers, or (0.8161 + 0.8165)/2 = 0.8163.
23
a)
Mean = (70 + 75 + 80 + 70)/4 = 295/4 = 73.75
b)
Since the mean will be the total divided by five, the total will need
to be 5 x 75 = 375 in order for the mean to be 75 after the next quiz.
Since the total is already 295 after the first four quizzes, the next
quiz will need to be 375 - 295 = 80.
c)
If you achieve a score of 100, your mean score will be (295 + 100)/5 =
395/5 = 79. Thus it’s not possible to have a mean score higher than 79
after the next quiz.
24
a)
Mean = (60 + 70 + 65 + 85 + 85)/5 = 365/5 = 73.0
b)
Since the mean will be the total divided by six, the total will need to
be 6 x 75 = 450 in order for the mean to be 75 after the next quiz.
Since the total is already 365 after the first five quizzes, the next
quiz will need to be 450 - 365 = 85.
c)
If you achieve a score of 100, your mean score will be (365 + 100)/6 =
465/6 = 77.5. Thus 77.5 is the maximum mean score that you could have
after the next quiz.
25
Since the mean equals the total divided by 6, you must have a total of 480 in
order to have a mean of 80. If you get 90 on the next quiz, you will have a
total of 570 for seven quizzes for a mean of 570/7 = 81.4. The maximum mean
score that you could have after the next quiz would result if you scored a
100. This would make your total 580 and your mean would be 580/7 = 82.9.
The minimum mean score that you could have after the next quiz would result
if you scored a zero. In that case, your new mean would be 480/7 = 68.6.
26
The number of hits that she has so far is 30 x .300 = 9. If she gets a hit
in her next at-bat, she will have 10 hits in 31 at-bats. Her new batting
average will be 10/31 = .323.
27
The mean score of your students is (55 + 60 + 68 + 70 + 87 + 88 + 95)/7 =
523/7 = 74.7. The median score is 70. Thus if the “average” score reported
by the district is a mean, your fourth graders are above average; if it is a
median, the fourth graders are below average.
28
The mean height (in inches) of your players is (77 + 78 + 78 + 84 + 86)/5 =
403/5 = 80.6" or 6' 8.6". The median height is 78" or 6' 6". The answer to
the question depends on the meaning of “average.” If the “average” height
reported by the league is a mean, your team is above average height; if it is
a median, the team is below average height.
29
The mean weight of all of the peaches is the total weight divided by the
total number of peaches or (18 + 22 + 24) pounds/(50 + 55 + 60) = 64/165 =
0.39 pounds.
30
No. The classes are not of equal size. If we think of the two percentages
as points out of 100, then the first class had a total number of points equal
to 25 x 86 = 2150 while the second had 30 x 84 = 2520. The mean for the two
classes combined is the total number of points divided by the total number of
students or 4670/55 = 84.91.
31
Each student is taking three classes with enrollments of 20 each and one
class with an enrollment of 100, so the mean size of each student’s classes
is 160/4 = 40. There are three classes with 100 students each and 45 classes
with 20 students each, making a total enrollment of 1200 students in 48
classes. Thus the mean enrollment per class is 1200/48 = 25. Both means are
correct, but they describe different things. The principal’s mean provides
the mean class size per class since it takes into account all classes taken
by all students, while the parents’ mean provides the mean class size per
student.
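The two class-size means in answer 31 come from averaging over different units: classes in one case, a student's schedule in the other. A minimal sketch of both calculations follows, using only the numbers given in the answer; the variable names are illustrative.

```python
# Class sizes from answer 31: three classes of 100 and 45 classes of 20.
class_sizes = [100] * 3 + [20] * 45

# Mean per class: every class counts once, whatever its size.
mean_per_class = sum(class_sizes) / len(class_sizes)

# Mean per student: each student's schedule, as described in the answer,
# is three 20-student classes and one 100-student class.
schedule = [20, 20, 20, 100]
mean_per_student = sum(schedule) / len(schedule)

print("mean class size per class:", mean_per_class)
print("mean class size per student:", mean_per_student)
```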
32
This requires a weighted mean of the grades where the weights are the
percentages. Therefore,
Mean = [(15)(75) + (20)(90) + (40)(85) + (25)(72)]/(15 + 20 + 40 + 25)
     = 8125/100 = 81.25
33
Batting Average = (Total Number of Hits)/(Total Number of At Bats)
                = (2 + 0 + 3)/(4 + 3 + 5) = 5/12 = 0.417
This number gives the mean number of hits per time at bat.
34
No. Suppose that the player had 400 hits in 1000 at-bats (.400 average)
followed by 2 hits in 4 at-bats. The player now has 402 hits in 1004 at-bats
for an average of about .4004 (which would still be reported as a .400
average).
35
No. The average would be 10% only if the two farms produced exactly the same
number of eggs. To demonstrate that the average might not be 10%, suppose
one farm had 8% of 1000 eggs (80 eggs) with salmonella, while the other had
12% of 3000 eggs (360 eggs) with salmonella. Altogether, there are 440 eggs
out of 4000 with salmonella, giving a percentage of 440/4000 or 11%, not 10%.
36
a)
Batting Average = (Total Number of Hits)/(Total Number of At Bats)
                = (3 + 2 + 2)/(5 + 4 + 5) = 7/14 = 0.500
b)
Slugging Average = (Total Number of Bases)/(Total Number of At Bats)
                 = (3 + 4 + 6)/(5 + 4 + 5) = 13/14 = 0.929
c)
Yes. For example, if a player has 2 home runs in 4 at-bats, the
slugging percentage is 8/4 = 2.000.
37
Each share of stock gets one vote. If a Yes vote counts 1 point and a No
vote counts 0 points, then the outcome of the vote is (400 x 1 + 600 x 0) /
(400 + 600) = 400/1000 = 0.400. This represents the average number of Yes
votes per vote. Since the average is less than 0.5, the vote fails.
Alternatively, we can just say that there are 400 Yes votes and 600 No votes,
so the item on which the vote is being taken does not pass.
38
This is a weighted mean with the course credits being the weights. Thus,
GPA = [(5)(4) + (3)(3) + (3)(2) + (3)(1)]/(5 + 3 + 3 + 3) = 38/14 = 2.71
39
The data are at the nominal level of measurement, so the only measure of
center that makes sense is the mode. The mode is 1, indicating that the
smooth-yellow peas occur more than any other phenotype.
40
The population center has been moving westward with time, reflecting
increased population in western states relative to eastern states.
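The weighted means in the course-grade and GPA answers follow one pattern: multiply each value by its weight, total the products, and divide by the total weight. A minimal sketch follows; the helper name `weighted_mean` is a hypothetical choice for illustration, and the numbers are the ones shown in the solutions.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Course grade: scores weighted by percentages 15, 20, 40, and 25.
course_grade = weighted_mean([75, 90, 85, 72], [15, 20, 40, 25])

# GPA: grade points weighted by course credits 5, 3, 3, and 3.
gpa = weighted_mean([4, 3, 2, 1], [5, 3, 3, 3])

print("course grade:", course_grade)
print("GPA:", round(gpa, 2))
```

The same helper covers the salmonella example: weighting 8% and 12% by 1000 and 3000 eggs gives 11%, not the unweighted 10%.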
Section 4.2
Statistical Literacy and Critical Thinking
1
A graph is symmetric if its left half is a mirror image of its right half.
2
This distribution is uniform (or rectangular). The distribution is
symmetric, not skewed, and there are no modes.
3
The students in the statistics class have satisfied some prerequisites for
college, and perhaps also for the class. It is therefore probable that
people with lower IQ scores are not present in the class as they would be for
the randomly selected adults. Thus there will be less variability in the IQ
scores of the students than in the scores of the randomly selected adults.
A graph of the distribution of the student IQ scores will be concentrated in
a narrower range (less spread) than the graph for IQ scores of randomly
selected adults.
4
Skewness refers to a lack of symmetry, with the graph more spread out on one
side than on the other.
5
This statement is not sensible. A distribution can have any number of modes
and still be symmetric.
6
This statement is not sensible. With a symmetric distribution, the mean and
median are always equal.
7
This statement does not make sense. If the distribution is uniform, the
graph of the distribution is a horizontal straight line, so there cannot be a
mode consisting of a single value.
8
This statement makes sense. A distribution can be left-skewed with a single
mode.
Concepts and Applications
9
[Histogram: Times Between Eruptions of Old Faithful; horizontal axis:
Minutes, 45 to 110]
This distribution has two modes (at 50 and 80 minutes), is left-skewed, and
has wide variation.
10
[Histogram: Failure Time of Computer Chips; horizontal axis: Times (months),
1 to 12]
This distribution has one mode (at 1 month), is right-skewed, and has
moderate variation.
11
[Histogram: Weight of Rugby Players; horizontal axis: Weight (kg)]
This distribution is single-peaked, is nearly symmetric, and has moderate
variation.
12
[Histogram: Weights of a Sample of Pennies; horizontal axis: Weights (grams),
2.48 to 3.12]
The distribution is bimodal and is roughly symmetric. The gap between the
left portion of the distribution and the right portion reflects the fact that
this graph actually includes two different populations: pennies made before
1983 and pennies made in 1983 or later.
13
a)
The distribution of incomes will have a shape similar to the one
sketched, but its exact shape cannot be determined from the information
given. Since the mean is greater than the median, it will be
right-skewed.
[Sketch: right-skewed curve of Frequency against Income, marking the
median (35,000), the mean (41,000), and the maximum (250,000)]
b)
About 50% or 150 (half of 300) of the families earned less than $35,000
since that is the value of the median.
c)
No. It depends on the precise distribution. All we can determine is
that less than half of the families earned more than $41,000.
14
a)
More than half of the days (183 or more) had no rainfall, so the
minimum and the median are both zero.
b)
The distribution is right-skewed. There are many days with no rainfall
(the mode is 0), probably quite a few with a little rainfall, and maybe
a small number with greater rainfall. The mean is only 0.083 inches,
so there was a total rainfall for the year of only about 30 inches.
c)
No. Since there was zero rainfall on more than half of the days, it
rained on fewer than half of the days (182 or fewer).
15
a)
We would expect one mode of $0 because an income of $0 is probably the
most common value.
b)
Right-skewed. Most of the incomes will be at or near zero, with only a
few that are much greater, including the instructor’s.
16
a)
The distribution is likely to have one mode.
b)
The distribution is likely to be nearly symmetric, perhaps a little
left-skewed since there is a possible range of 75 points below the mean
and only 25 points above it.
17
a)
The distribution is likely to have one mode.
b)
The distribution is likely to be nearly symmetric.
18
a)
The distribution is likely to have one or two modes; there might be one
mode for linemen, fullbacks, and linebackers, and a second mode for
running backs, wide receivers, and defensive backs.
b)
The distribution will likely be nearly symmetric since there are nearly
equal numbers of the two groups of players mentioned above.
19
a)
The distribution is likely to have two modes since figure skaters tend
to be smaller than hockey players.
b)
We can’t know the skewness for certain without knowing how many skaters
are in each group. Assuming equal numbers in each group, the
distribution will probably be right-skewed since no professional figure
skaters are heavy, while some of the hockey players may be light.
20
a)
There will be two modes since SUVs will be heavier than compacts.
b)
It is symmetric due to the equal number of cars in each group.
21
a)
The distribution is likely to have one mode.
b)
The distribution is likely to be right-skewed since most flights will
leave with little or no delay, but a few flights will have long
delays. Delays shorter than zero are not possible.
22
a)
The distribution will have one mode somewhere near the speed limit.
b)
It will be right-skewed since there will be a few people who exceed the
speed limit, but even fewer who are much below the limit.
23
a)
The distribution is likely to have one mode (there’s no reason to
suspect that museum goers must be either old or young).
b)
It will be right-skewed since younger adults may also have children
with them and retirees are more likely to be alone.
24
a)
The distribution is likely to have one mode.
b)
The distribution is likely to be right-skewed. There will be a greater
percentage of young people and families with children.
25
a)
Since there will always be exactly 4 players, there is one mode.
b)
The distribution is likely to be symmetric.
26
a)
The distribution is likely to have one mode.
b)
The distribution is likely to be symmetric.
27
a)
The distribution is likely to have one mode.
b)
The distribution is likely to be right-skewed since this is the right
tail of a distribution that is already skewed to the right.
28
a)
The distribution is likely to have one mode since this is similar to
the income distribution of all adults.
b)
The distribution is likely to be right-skewed.
29
a)
The distribution is likely to have one mode.
b)
The distribution is likely to be right-skewed since there are fewer
stars than there are “journeyman” ball players.
30
a)
The distribution is likely to have one mode.
b)
The distribution is likely to be right-skewed since low average players
tend to not make the team.
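Several answers in this section infer skewness from how the mean compares with the median, as in the income answer where a mean above the median indicates a right-skewed distribution. The sketch below turns that comparison into a small helper. The function name `skew_hint` and the income values are hypothetical, chosen only for illustration, and the comparison is a rule of thumb rather than a formal test of skewness.

```python
from statistics import mean, median

def skew_hint(data):
    """Rough direction of skew from comparing the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "roughly symmetric"

# Hypothetical incomes: one very large value pulls the mean above the median.
incomes = [20_000, 28_000, 33_000, 35_000, 37_000, 45_000, 250_000]

print(skew_hint(incomes))
print(skew_hint([4, 6, 6, 6, 8]))
```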
Section 4.3
Statistical Literacy and Critical Thinking
1
The standard deviation is based on how much values deviate from the mean.
2
The movie patrons are likely to have more variation in their IQs than the
students in a physics class. The students are likely to be a more
homogeneous group since they have been filtered by being in college and
additionally by being able to satisfy the mathematical prerequisites for the
class.
3
This statement is incorrect because it defines the standard deviation in
terms of the minimum and the maximum values, but the standard deviation uses
every value in its computation.
4
It means that about 25% of the values are at or below 93.2 and about 75% are
above 93.2.
5
This statement does not make sense. The median is the 50th percentile, so it
cannot be the 60th percentile.
6
This statement makes sense. Since annual incomes have a distribution that is
right-skewed, the mean will be larger than the median.
7
This statement makes sense. The annual incomes of the instructors are likely
to be in a smaller range than the incomes of physicians since the group of
physicians may include those in general practice, pediatricians, brain
surgeons, internists, etc.
8
This statement does not make sense. The standard deviation is a type of
average, so it does not necessarily become larger as the sample size
increases.
Concepts and Applications
9
Range = highest value - lowest value = 75 - 49 = 26 seconds
The Mean is 58.25 seconds.

Time    Deviation = Time - Mean    Deviation²
53            -5.25                  27.5625
52            -6.25                  39.0625
75            16.75                 280.5625
62             3.75                  14.0625
68             9.75                  95.0625
58            -0.25                   0.0625
49            -9.25                  85.5625
49            -9.25                  85.5625
                            Sum =   627.5000

Standard Deviation = sqrt(Sum/(8 - 1)) = sqrt(627.5/7) = 9.5 seconds
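The deviation-table method used throughout these answers (subtract the mean, square, total, divide by n - 1, take the square root) can be sketched as follows with the eight times from answer 9. The variable names are illustrative; `statistics.stdev` from the standard library is included only as a cross-check, since it also divides by n - 1.

```python
from math import sqrt
from statistics import stdev

# The eight times (in seconds) from answer 9.
times = [53, 52, 75, 62, 68, 58, 49, 49]

n = len(times)
mean_time = sum(times) / n                           # 466/8
squared_devs = [(t - mean_time) ** 2 for t in times]  # Deviation² column
sum_sq = sum(squared_devs)                           # bottom row of the table
std_dev = sqrt(sum_sq / (n - 1))                     # divide by n - 1, then root

print("mean:", mean_time)
print("sum of squared deviations:", sum_sq)
print("standard deviation:", round(std_dev, 1))
print("library cross-check:", round(stdev(times), 1))
```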
10
Range = highest value - lowest value = 99.0 - 98.0 = 1.0 degrees
The Mean is 98.44 degrees.

Temperature    Deviation = Temperature - Mean    Deviation²
98.6                  0.16                         0.0256
98.6                  0.16                         0.0256
98.0                 -0.44                         0.1936
98.0                 -0.44                         0.1936
99.0                  0.56                         0.3136
98.4                 -0.04                         0.0016
98.4                 -0.04                         0.0016
98.4                 -0.04                         0.0016
98.4                 -0.04                         0.0016
98.6                  0.16                         0.0256
                                         Sum =     0.7840

Standard Deviation = sqrt(Sum/(10 - 1)) = sqrt(0.7840/9) = 0.30 degrees
11
Range = highest value - lowest value = 0.29 - 0.12 = 0.17
The Mean is 0.1875.

Concentration    Deviation = Concentration - Mean    Deviation²
0.27                    0.0825                         0.006806
0.17                   -0.0175                         0.000306
0.17                   -0.0175                         0.000306
0.16                   -0.0275                         0.000756
0.13                   -0.0575                         0.003306
0.24                    0.0525                         0.002756
0.29                    0.1025                         0.010506
0.24                    0.0525                         0.002756
0.14                   -0.0475                         0.002256
0.16                   -0.0275                         0.000756
0.12                   -0.0675                         0.004556
0.16                   -0.0275                         0.000756
                                             Sum =     0.035825

Standard Deviation = sqrt(Sum/(12 - 1)) = sqrt(0.035825/11) = 0.057
12
Range = highest value - lowest value = 98 - 65 = 33 minutes
The Mean is 91.25 minutes.

Time    Deviation = Time - Mean    Deviation²
98             6.75                  45.5625
92             0.75                   0.5625
95             3.75                  14.0625
87            -4.25                  18.0625
96             4.75                  22.5625
90            -1.25                   1.5625
65           -26.25                 689.0625
92             0.75                   0.5625
95             3.75                  14.0625
93             1.75                   3.0625
98             6.75                  45.5625
94             2.75                   7.5625
                            Sum =   862.2500

Standard Deviation = sqrt(Sum/(12 - 1)) = sqrt(862.25/11) = 8.9 minutes.
13
Range = highest value - lowest value = 0.92 - 0.64 = 0.28 mm
The Mean is 0.807273 mm.

Length    Deviation = Length - Mean    Deviation²
0.72           -0.087273                0.007617
0.90            0.092727                0.008598
0.84            0.032727                0.001071
0.68           -0.127273                0.016198
0.84            0.032727                0.001071
0.90            0.092727                0.008598
0.92            0.112727                0.012707
0.84            0.032727                0.001071
0.64           -0.167273                0.027980
0.84            0.032727                0.001071
0.76           -0.047273                0.002235
                              Sum =     0.088218

Standard Deviation = sqrt(Sum/(11 - 1)) = sqrt(0.088218/10) = 0.094 mm.
14
Range = highest value - lowest value = 68 - 48 = 20 years
The Mean is 57.13333 years.

Age    Deviation = Age - Mean    Deviation²
57          -0.13333               0.01778
61           3.86667              14.95111
57          -0.13333               0.01778
57          -0.13333               0.01778
58           0.86667               0.75111
57          -0.13333               0.01778
61           3.86667              14.95111
54          -3.13333               9.81778
68          10.86667             118.08444
51          -6.13333              37.61778
49          -8.13333              66.15111
64           6.86667              47.15111
50          -7.13333              50.88444
48          -9.13333              83.41778
65           7.86667              61.88444
                         Sum =   505.73333

Standard Deviation = sqrt(Sum/(15 - 1)) = sqrt(505.7333/14) = 6.0 years.
15
Range = highest value - lowest value = 0.958 - 0.842 = 0.116 g
The Mean is 0.919364 g.

Weight    Deviation = Weight - Mean    Deviation²
0.957           0.037636                0.001416
0.912          -0.007364                0.000054
0.842          -0.077364                0.005985
0.925           0.005636                0.000032
0.939           0.019636                0.000386
0.886          -0.033364                0.001113
0.914          -0.005364                0.000029
0.913          -0.006364                0.000040
0.958           0.038636                0.001493
0.947           0.027636                0.000764
0.920           0.000636                0.000000
                              Sum =     0.011313

Standard Deviation = sqrt(Sum/(11 - 1)) = sqrt(0.011313/10) = 0.0336 g.
16
Range = highest value - lowest value = 5.84 - 5.52 = 0.32 g.
The Mean is 5.618636 g.

Weight    Deviation = Weight - Mean    Deviation²
5.60           -0.01864                 0.000347
5.63            0.01136                 0.000129
5.58           -0.03864                 0.001493
5.56           -0.05864                 0.003438
5.66            0.04136                 0.001711
5.58           -0.03864                 0.001493
5.57           -0.04864                 0.002365
5.59           -0.02864                 0.000820
5.67            0.05136                 0.002638
5.61           -0.00864                 0.000075
5.84            0.22136                 0.049002
5.73            0.11136                 0.012402
5.53           -0.08864                 0.007856
5.58           -0.03864                 0.001493
5.52           -0.09864                 0.009729
5.65            0.03136                 0.000984
5.57           -0.04864                 0.002365
5.71            0.09136                 0.008347
5.59           -0.02864                 0.000820
5.53           -0.08864                 0.007856
5.63            0.01136                 0.000129
5.68            0.06136                 0.003765
                              Sum =     0.119259

Standard Deviation = sqrt(Sum/(22 - 1)) = sqrt(0.119259/21) = 0.075 g.
17
The mean word lengths are 3.9 for Cat on a Hot Tin Roof (ROOF) and 3.1 for
The Cat in the Hat (HAT).

            ROOF                               HAT
Length  Dev. = Length - Mean  Deviation²   Length  Dev. = Length - Mean  Deviation²
 2          -1.9                 3.61        3         -0.1                 0.01
 6           2.1                 4.41        3         -0.1                 0.01
 2          -1.9                 3.61        3         -0.1                 0.01
 2          -1.9                 3.61        3         -0.1                 0.01
 1          -2.9                 8.41        5          1.9                 3.61
 4           0.1                 0.01        2         -1.1                 1.21
 4           0.1                 0.01        3         -0.1                 0.01
 2          -1.9                 3.61        3         -0.1                 0.01
 4           0.1                 0.01        3         -0.1                 0.01
 2          -1.9                 3.61        2         -1.1                 1.21
 3          -0.9                 0.81        4          0.9                 0.81
 8           4.1                16.81        2         -1.1                 1.21
 4           0.1                 0.01        2         -1.1                 1.21
 2          -1.9                 3.61        3         -0.1                 0.01
 2          -1.9                 3.61        2         -1.1                 1.21
 7           3.1                 9.61        3         -0.1                 0.01
 7           3.1                 9.61        5          1.9                 3.61
 2          -1.9                 3.61        3         -0.1                 0.01
 3          -0.9                 0.81        4          0.9                 0.81
11           7.1                50.41        4          0.9                 0.81
                      Sum =    129.80                            Sum =     15.80

Standard Deviation = sqrt(Sum/(20 - 1)) = sqrt(129.8/19) = 2.6 for Cat on a Hot Tin Roof
Standard Deviation = sqrt(Sum/(20 - 1)) = sqrt(15.8/19) = 0.9 for The Cat in the Hat
Cat on a Hot Tin Roof: Range = 11 - 1 = 10; Standard Deviation = 2.6.
The Cat in the Hat: Range = 5 - 2 = 3; Standard Deviation = 0.9.
There is much less variation among the word lengths in The Cat in the Hat.
18
The means are 21.3 and 29.5, respectively, for eastbound and westbound.

          Eastbound                          Westbound
Age   Dev. = Age - Mean   Deviation²    Age   Dev. = Age - Mean   Deviation²
24          2.7              7.29       41         11.5            132.25
24          2.7              7.29       24         -5.5             30.25
34         12.7            161.29       32          2.5              6.25
15         -6.3             39.69       26         -3.5             12.25
19         -2.3              5.29       39          9.5             90.25
22          0.7              0.49       45         15.5            240.25
18         -3.3             10.89       24         -5.5             30.25
20         -1.3              1.69       21         -8.5             72.25
20         -1.3              1.69       22         -7.5             56.25
17         -4.3             18.49       21         -8.5             72.25
                  Sum =    254.10                       Sum =      742.50

Standard Deviation = sqrt(Sum/(10 - 1)) = sqrt(254.1/9) = 5.3 for eastbound stowaways
Standard Deviation = sqrt(Sum/(10 - 1)) = sqrt(742.5/9) = 9.1 for westbound stowaways
Eastbound: Range = 34 - 15 = 19.0; St. Dev. = 5.3
Westbound: Range = 45 - 21 = 24.0; St. Dev. = 9.1
The variation in ages for the westbound stowaways appears to be substantially
larger than the variation for eastbound stowaways.
19
The mean errors are 0.5 for the one-day forecasts and -0.35714 for the
five-day forecasts.

            One Day                              Five Days
Error  Dev. = Error - Mean  Deviation²    Error  Dev. = Error - Mean  Deviation²
  2          1.5               2.25          0        0.35714            0.12755
  2          1.5               2.25         -3       -2.64286            6.98469
  0         -0.5               0.25          2        2.35714            5.55612
  0         -0.5               0.25          5        5.35714           28.69898
 -3         -3.5              12.25         -6       -5.64286           31.84184
 -2         -2.5               6.25         -9       -8.64286           74.69898
  1          0.5               0.25          4        4.35714           18.98469
 -2         -2.5               6.25         -1       -0.64286            0.41327
  8          7.5              56.25          6        6.35714           40.41327
  1          0.5               0.25         -2       -1.64286            2.69898
  0         -0.5               0.25         -2       -1.64286            2.69898
 -1         -1.5               2.25         -1       -0.64286            0.41327
  0         -0.5               0.25          6        6.35714           40.41327
  1          0.5               0.25         -4       -3.64286           13.27041
                    Sum =     89.50                          Sum =     267.2143

Standard Deviation = sqrt(Sum/(14 - 1)) = sqrt(89.50/13) = 2.6 for one-day forecasts.
Standard Deviation = sqrt(Sum/(14 - 1)) = sqrt(267.2143/13) = 4.5 for five-day forecasts.
One Day: Range = 8 - (-3) = 11.0; St. Dev. = 2.6
Five Days: Range = 6 - (-9) = 15.0; St. Dev. = 4.5
The variation in errors for the five-day forecasts of the high temperature
appears to be substantially larger than the variation for one-day forecasts
of the high temperature.
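The comparison of the one-day and five-day forecast errors can be reproduced with a small helper that reports the two measures of variation used in this section. The function name `spread` is a hypothetical choice; `statistics.stdev` computes the sample standard deviation, dividing by n - 1 as the tables in these answers do.

```python
from statistics import stdev

def spread(data):
    """Return (range, sample standard deviation) for a list of numbers."""
    return max(data) - min(data), stdev(data)

# Forecast errors from answer 19.
one_day = [2, 2, 0, 0, -3, -2, 1, -2, 8, 1, 0, -1, 0, 1]
five_day = [0, -3, 2, 5, -6, -9, 4, -1, 6, -2, -2, -1, 6, -4]

one_day_range, one_day_sd = spread(one_day)
five_day_range, five_day_sd = spread(five_day)

print("one day:  range", one_day_range, "st. dev.", round(one_day_sd, 1))
print("five day: range", five_day_range, "st. dev.", round(five_day_sd, 1))
```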
20
Weight
0.15
0.02
0.16
0.37
0.22
No treatment
Dev. =
Weight - Mean
-0.034
-0.164
-0.024
0.186
0.036
Sum =
Deviation2
Weight
0.001156
0.026896
0.000576
0.034596
0.001296
0.063364
2.03
0.27
0.92
1.07
2.38
Standard Deviation=
Sum
5-1
0.063364
4
Standard Deviation=
Sum
5-1
2.95172
4
Treatment
Dev. =
Weight – Mean
0.696
-1.064
-0.414
-0.264
1.046
Sum =
0.126
0.859
Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley
Deviation2
0.484416
1.132096
0.171396
0.069696
1.094116
0.859029
Fr eque nc y
23
8
7
6
5
4
3
2
1
0
6
7
8
9
10
11
12
6
7
8
9
10
11
12
6
7
8
9
10
11
12
6
7
8
9
10
11
12
8
7
Frequency
22
No treatment: Range = 0.37 – 0.02 = 0.0350; St. Dev. = 0.126
Treatment: Range = 2.38 – 0.27 = 2.110; St. Dev. = 0.859
The variation in weights of trees with no treatment appears to be
the variation in weights of the treated trees.
a)
25/465 = 0.054, so the M&M is in the 5th percentile.
b)
322/465 = 0.692, so the M&M is in the 69th percentile.
c)
224/465 = 0.482, so the M&M is in the 48th percentile.
a)
38/76 = 0.500, so age 38 is in the 50th percentile.
b)
20/76 = 0.263, so age 29 is in the 26th percentile.
c)
71/76 = 0.934, so age 71 is in the 91rd percentile.
a)
6
5
4
3
2
1
0
Fr eque nc y
21
CHAPTER 4, DESCRIBING DATA
8
7
6
5
4
3
2
1
0
8
Frequenc y
66
7
6
5
4
3
2
1
0
Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley
less than
SECTION 4.3, MEASURES OF VARIATION
b)
Set 1
9
9
9
9
9
Low value
Lower quartile
Median
Upper quartile
High value
Set 2
8
8
9
10
10
Set 3
8
8
9
10
10
Boxplot of 1, 2, 3, 4
12
11
Data
10
9
8
7
6
1
c)
2
3
4
Set 1
Value
Deviation =
Value - Mean
Deviation2
9
0
0
9
0
0
9
0
0
9
0
0
9
0
0
9
0
0
9
0
0
Sum = 0
Standard Deviation=
Sum
7-1
0
6
0
Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley
Set 4
6
6
9
12
12
67
68
CHAPTER 4, DESCRIBING DATA
Set 2
Value    Deviation = Value - Mean    Deviation²
8        -1                          1
8        -1                          1
9         0                          0
9         0                          0
9         0                          0
10        1                          1
10        1                          1
                               Sum = 4
Standard Deviation = √(Sum/(7-1)) = √(4/6) = 0.816

Set 3
Value    Deviation = Value - Mean    Deviation²
8        -1                          1
8        -1                          1
8        -1                          1
9         0                          0
10        1                          1
10        1                          1
10        1                          1
                               Sum = 6
Standard Deviation = √(Sum/(7-1)) = √(6/6) = 1

Set 4
Value    Deviation = Value - Mean    Deviation²
6        -3                          9
6        -3                          9
6        -3                          9
9         0                          0
12        3                          9
12        3                          9
12        3                          9
                               Sum = 54
Standard Deviation = √(Sum/(7-1)) = √(54/6) = 3
d)
The standard deviation takes all of the data into account and increases
as the data become more spread out around the mean.
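The deviation tables for the four sets can be checked with a short Python sketch. The helper name `sample_std` is hypothetical (it is not from the text); it follows the same recipe as the tables: subtract the mean, square, sum, divide by n - 1, and take the square root.

```python
import math

def sample_std(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))

# The four seven-value data sets from the tables above.
sets = {
    "Set 1": [9, 9, 9, 9, 9, 9, 9],
    "Set 2": [8, 8, 9, 9, 9, 10, 10],
    "Set 3": [8, 8, 8, 9, 10, 10, 10],
    "Set 4": [6, 6, 6, 9, 12, 12, 12],
}

for name, values in sets.items():
    print(name, round(sample_std(values), 3))
```

All four sets share the mean 9, so the growing results reflect only how far the values sit from that mean.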
24
a)
[Histograms of Set 1, Set 2, Set 3, and Set 4: frequency (0 to 8) versus data value (3 to 9).]
b)
In each set, the median is the 4th value in the ordered list, the lower
quartile is the middle value of the lowest three values (2nd in the
overall list), and the upper quartile is the middle value of the
highest three values (6th in the overall list).
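The quartile rule described here (median of the values below the middle position, median of the values above it) can be sketched in Python. The function names are hypothetical, and other textbooks and software use slightly different quartile conventions, so this follows only the rule as stated in these answers.

```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """Low value, lower quartile, median, upper quartile, high value."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]              # values below the median position
    upper = data[len(data) - half:]  # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Set 4 of this exercise: seven values, so the quartiles are the
# 2nd and 6th values in the ordered list.
print(five_number_summary([3, 3, 3, 6, 9, 9, 9]))
```

For a list of nine values, the same function averages the two middle values of the lowest four and of the highest four, which is the rule used in the later exercises of this section.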
                  Set 1   Set 2   Set 3   Set 4
Low value           6       5       5       3
Lower quartile      6       5       5       3
Median              6       6       6       6
Upper quartile      6       7       7       9
High value          6       7       7       9
[Boxplot of Set 1, Set 2, Set 3, Set 4: data axis from 3 to 9.]
c)
Set 1
Value    Deviation = Value - Mean    Deviation²
6         0                          0
6         0                          0
6         0                          0
6         0                          0
6         0                          0
6         0                          0
6         0                          0
                               Sum = 0
Standard Deviation = √(Sum/(7-1)) = √(0/6) = 0
Set 2
Value    Deviation = Value - Mean    Deviation²
5        -1                          1
5        -1                          1
6         0                          0
6         0                          0
6         0                          0
7         1                          1
7         1                          1
                               Sum = 4
Standard Deviation = √(Sum/(7-1)) = √(4/6) = 0.816

Set 3
Value    Deviation = Value - Mean    Deviation²
5        -1                          1
5        -1                          1
5        -1                          1
6         0                          0
7         1                          1
7         1                          1
7         1                          1
                               Sum = 6
Standard Deviation = √(Sum/(7-1)) = √(6/6) = 1

Set 4
Value    Deviation = Value - Mean    Deviation²
3        -3                          9
3        -3                          9
3        -3                          9
6         0                          0
9         3                          9
9         3                          9
9         3                          9
                               Sum = 54
Standard Deviation = √(Sum/(7-1)) = √(54/6) = 3
d)
The standard deviation takes all of the data into account and increases
as the data become more spread out around the mean.
25
a)
For the faculty,
Mean = (2+3+1+0+1+2+4+3+3+2+1)/11 = 22/11 = 2.0 years.
Median equals the sixth number in the ordered list and is 2 years.
Range = 4 - 0 = 4 years.
For the students,
Mean = (5+6+8+2+7+10+1+4+6+10+9)/11 = 68/11 = 6.2 years.
Median equals the sixth number in the ordered list and is 6 years.
Range = 10 - 1 = 9 years.
b)
The lower quartile is the middle value of the lowest 5 values in each
data set and the upper quartile is the middle value of the highest 5
values in each data set.
                  Faculty   Students
Low Value            0         1
Lower quartile       1         4
Median               2         6
Upper quartile       3         9
High Value           4        10
[Boxplot of Faculty, Students: data axis from 0 to 10.]
c) Computation of standard deviation
Faculty                                         Students
Value   Deviation = Value - Mean   Deviation²   Value   Deviation = Value - Mean   Deviation²
2        0                          0           5       -1.18                       1.3924
3        1                          1           6       -0.18                       0.0324
1       -1                          1           8        1.82                       3.3124
0       -2                          4           2       -4.18                      17.4724
1       -1                          1           7        0.82                       0.6724
2        0                          0           10       3.82                      14.5924
4        2                          4           1       -5.18                      26.8324
3        1                          1           4       -2.18                       4.7524
3        1                          1           6       -0.18                       0.0324
2        0                          0           10       3.82                      14.5924
1       -1                          1           9        2.82                       7.9524
2                                  14           6.18                               91.6364
The means for faculty and students are given in bold at the bottoms of the
first and fourth columns respectively. The deviations for faculty are
obtained by subtracting the mean from each number in the first column.
Similarly for students. The squared deviations are then placed in the third
and sixth columns, and their totals are shown in bold at the bottom of the
columns. The standard deviations are then found by dividing the sum of the
squared deviations by n-1 = 11-1 = 10 and taking the square root. Thus, the
standard deviations are
Faculty: Standard Deviation = √(Sum/(11-1)) = √(14/10) = 1.2
Students: Standard Deviation = √(Sum/(11-1)) = √(91.6364/10) = 3.03
d)
By the range rule, the standard deviation is approximately range/4,
which for the faculty is 4/4 = 1 and for the students is 9/4 = 2.25.
Both estimates are low, but are reasonably close.
e)
The students have a higher mean age for their cars and a much greater
variation in ages.
26
a)
For the school zone,
Mean = (20+18+23+21+19+18+17+24+25)/9 = 185/9 = 20.6 mph.
Median equals the fifth number in the ordered list and is 20 mph.
Range = 25 - 17 = 8 mph
For the downtown intersection,
Mean = (29+31+35+24+31+26+36+31+28)/9 = 271/9 = 30.1 mph.
Median equals the fifth number in the ordered list and is 31 mph.
Range = 36 - 24 = 12 mph
b)
The lower quartile is the average of the two middle values of the
lowest 4 values in each data set. The upper quartile is the average of
the two middle values of the highest 4 values in each data set.
                  School   Downtown
Low Value           17        24
Lower quartile      18        27
Median              20        31
Upper quartile      23.5      33
High Value          25        36
[Boxplot of School, Downtown: data axis from 15 to 35.]
c) Computation of standard deviation
School Zone                                     Downtown
Value   Deviation = Value - Mean   Deviation²   Value   Deviation = Value - Mean   Deviation²
20      -0.6                        0.36        29      -1.1                        1.21
18      -2.6                        6.76        31       0.9                        0.81
23       2.4                        5.76        35       4.9                       24.01
21       0.4                        0.16        24      -6.1                       37.21
19      -1.6                        2.56        31       0.9                        0.81
18      -2.6                        6.76        26      -4.1                       16.81
17      -3.6                       12.96        36       5.9                       34.81
24       3.4                       11.56        31       0.9                        0.81
25       4.4                       19.36        28      -2.1                        4.41
20.6                               66.24        30.1                              120.89
The means for the school zone and the downtown intersection are given in bold
at the bottoms of the first and fourth columns respectively. The deviations
for the school zone are obtained by subtracting the mean from each number in
the first column. For downtown, the deviations are obtained by subtracting
the mean from each number in the fourth column. The squared deviations are
then placed in the third and sixth columns, and their totals are shown in
bold at the bottom of the columns. The standard deviations are then found by
dividing the sum of the squared deviations by n-1 = 9-1 = 8 and taking the
square root. Thus, the standard deviations are
School Zone: Standard Deviation = √(Sum/(9-1)) = √(66.24/8) = 2.88
Downtown: Standard Deviation = √(Sum/(9-1)) = √(120.89/8) = 3.89
d)
By the range rule, the standard deviation is approximately range/4,
which for the school zone is 8/4 = 2 and for downtown is 12/4 = 3.
Both estimates are low, but are reasonably close.
e)
The average speed is higher downtown and the variation is slightly
greater downtown.
27
a)
For the first seven Presidents,
Mean = (57+61+57+57+58+57+61)/7 = 408/7 = 58.3 years.
Median equals the fourth number in the ordered list and is 57 years.
Range = 61 - 57 = 4 years
For the last seven Presidents,
Mean = (56+61+52+69+64+46+54)/7 = 402/7 = 57.4 years.
Median equals the fourth number in the ordered list and is 56 years.
Range = 69 - 46 = 23 years
b)
The lower quartile is the middle value of the lowest 3 values in each
data set. The upper quartile is the middle value of the highest 3
values in each data set.
                  First 7   Last 7
Low Value           57        46
Lower quartile      57        52
Median              57        56
Upper quartile      61        64
High Value          61        69
[Boxplot of First 7, Last 7: data axis from 45 to 70.]
c) Computation of standard deviation
First 7                                         Last 7
Value   Deviation = Value - Mean   Deviation²   Value   Deviation = Value - Mean   Deviation²
57      -1.3                        1.69        56      -1.4                        1.96
61       2.7                        7.29        61       3.6                       12.96
57      -1.3                        1.69        52      -5.4                       29.16
57      -1.3                        1.69        69      11.6                      134.56
58      -0.3                        0.09        64       6.6                       43.56
57      -1.3                        1.69        46     -11.4                      129.96
61       2.7                        7.29        54      -3.4                       11.56
58.3                               21.43        57.4                              363.72
The means for the first 7 and the last 7 presidents are given in bold at the
bottoms of the first and fourth columns respectively. The deviations for the
first 7 are obtained by subtracting the mean from each number in the first
column. Similarly, deviations for the last 7 are obtained by subtracting the
mean from each number in the fourth column. The squared deviations are then
placed in the third and sixth columns, and their totals are shown in bold at
the bottom of the columns. The standard deviations are then found by
dividing the sum of the squared deviations by n-1 = 7-1 = 6 and taking the
square root. Thus, the standard deviations are
First 7 Presidents: Standard Deviation = √(Sum/(7-1)) = √(21.43/6) = 1.9
Last 7 Presidents: Standard Deviation = √(Sum/(7-1)) = √(363.72/6) = √60.62 = 7.8
d)
By the range rule, the standard deviation is approximately range/4,
which for the first 7 presidents is 4/4 = 1.00 and for the last 7
presidents is 23/4 = 5.75. Both estimates are low, but are reasonably
close.
e)
The average ages of the first 7 and last 7 presidents are about the
same, but the variation is over three times greater among the last 7
presidents.
28
a)
For Beethoven’s symphonies,
Mean = (28+36+50+33+30+40+38+26+68)/9 = 349/9 = 38.8 minutes.
Median equals the fifth number in the ordered list and is 36 minutes.
Range = 68 - 26 = 42 minutes
For Mahler’s symphonies,
Mean = (52+85+94+50+72+72+80+90+80)/9 = 675/9 = 75.0 minutes.
Median equals the fifth number in the ordered list and is 80 minutes.
Range = 94 - 50 = 44 minutes
b)
The lower quartile is the average of the two middle values in the
lowest 4 values of the data set. The upper quartile is the average of
the two middle values in the highest 4 values of the data set.
                  Beethoven   Mahler
Low Value            26         50
Lower quartile       29         62
Median               36         80
Upper quartile       45         87.5
High Value           68         94
[Boxplot of Beethoven, Mahler: data axis from 20 to 100.]
c) Computation of standard deviation
Beethoven                                       Mahler
Value   Deviation = Value - Mean   Deviation²   Value   Deviation = Value - Mean   Deviation²
28      -10.8                      116.64       52      -23                        529
36       -2.8                        7.84       85       10                        100
50       11.2                      125.44       94       19                        361
33       -5.8                       33.64       50      -25                        625
30       -8.8                       77.44       72       -3                          9
40        1.2                        1.44       72       -3                          9
38       -0.8                        0.64       80        5                         25
26      -12.8                      163.84       90       15                        225
68       29.2                      852.64       80        5                         25
38.8                              1379.56       75.0                              1908
The mean lengths for Beethoven’s and Mahler’s symphonies are given in
bold at the bottoms of the first and fourth columns respectively. The
deviations for Beethoven’s are obtained by subtracting the mean from
each number in the first column. Similarly for Mahler’s. The squared
deviations are then placed in the third and sixth columns, and their
totals are shown in bold at the bottom of the columns. The standard
deviations are then found by dividing the sum of the squared deviations
by n-1 = 9-1 = 8 and taking the square root. Thus, the standard
deviations are
Beethoven: Standard Deviation = √(Sum/(9-1)) = √(1379.56/8) = 13.13
Mahler: Standard Deviation = √(Sum/(9-1)) = √(1908/8) = 15.44
d)
By the range rule, the standard deviation is approximately range/4,
which for Beethoven’s symphonies is 42/4 = 10.5 and for Mahler’s is
44/4 = 11.0. Both estimates are low, but are reasonably close.
e)
The average length of Mahler’s symphonies is much greater than that of
Beethoven’s, but the variation is about the same for both composers.
29
30
31
32
The second shop has a slightly lower average delivery time, but its standard
deviation is so large that you risk the pizza being delivered 20 to 40
minutes late. It could, of course, arrive much earlier than you expected as
well. If you need to know the arrival time quite closely, you should order
from the first shop, particularly since the average delivery time is only
three minutes longer.
Kevin, who has the larger standard deviation, is more likely to serve up more
very small servings and is more likely to generate more complaints.
A lower standard deviation means more certainty in the value of the portfolio
and less risk.
The batting averages are more closely bunched today than in the past. Since
the overall average has remained at 0.260, averages above 0.350 should be
less common today.
1
2
3
4
5
6
7
8
Section 4.4
Statistical Literacy and Critical Thinking
A false positive occurs when the test indicates drug use for someone who does
not actually use drugs. A false negative occurs when the test indicates that
drugs are not used, but the subject actually does use drugs.
Cancer tests are not perfect. Some test results may be positive even though
the patient does not have cancer.
A polygraph test can be positive (indicating that the subject is lying) even
though the subject is telling the truth. If most people tested are not
lying, a small percentage of false positive results can still be a fairly
large number. Meanwhile, although a small number of liars may produce a high
percentage of true positives, the actual number of true positives may be
quite small. Thus, among the positives, there may be more false positives
than true positives, resulting in a high proportion of false accusations.
Yes. For example, consider the table below.
Quarterback   Half 1        Half 2        Game
A             25/42=0.60    2/3=0.67      27/45=0.60
B             5/10=0.50     59/90=0.66    64/100=0.64
Quarterback A has the higher completion percentage in each half, but
Quarterback B has the higher completion percentage for the entire game.
[Probably, no quarterback has thrown 100 passes in a game, but the example
illustrates how this result can happen.]
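The reversal in the quarterback table can be verified with a short Python sketch. The variable and function names are hypothetical. Exact fractions are used so that the close second-half comparison (2/3 versus 59/90) is not affected by rounding.

```python
from fractions import Fraction

# (completions, attempts) for each half, taken from the table above.
halves = {
    "A": [(25, 42), (2, 3)],
    "B": [(5, 10), (59, 90)],
}

def rate(completions, attempts):
    """Completion rate as an exact fraction."""
    return Fraction(completions, attempts)

def overall(pairs):
    """Whole-game rate: total completions over total attempts."""
    return Fraction(sum(c for c, _ in pairs), sum(a for _, a in pairs))

a_higher_each_half = all(
    rate(*halves["A"][i]) > rate(*halves["B"][i]) for i in range(2)
)
b_higher_overall = overall(halves["B"]) > overall(halves["A"])
print(a_higher_each_half, b_higher_overall)
```

Both comparisons hold at once because B's attempts are concentrated in the half where both quarterbacks completed a larger share of passes.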
This statement makes sense. When both people have the same number of scores
in each category, if one person has a higher average in each category, that
person will also have a higher average overall.
This statement is not true. It is similar to the quarterback example in
Exercise 4 above. [Substitute Ann for A and Bret for B in Exercise 4 and you
can see that it is possible for Ann to have the higher average in each half
of the season, but Bret is higher overall for the whole season.]
This statement does not make sense. These are two entirely different
probabilities.
This statement does not make sense. If the test is 90% accurate, it means
that 90% of drug users will test positive and 90% of non-users will test
negative. It does not mean that 90% of those who test positive are drug
users. This situation is similar to the mammogram example in the text. Even
though that test was 85% accurate, only about 5% of patients with positive
test results actually have cancer.
Concepts and Applications
9
10
11
Josh had the higher batting average in the first and second halves of the
season, but Jude had 80 hits in 200 at bats (.400 average) for the entire
season while Josh had 85 hits in 220 at bats (.386 average), so Jude had the
higher overall batting average. This is an illustration of Simpson’s Paradox
and it can happen because of the unequal numbers of at bats for the players
in both halves of the season.
Allan had the higher completion percentage in both halves of the game, but
Abner had 14 for 31 (45% completions) while Allan had 11 for 26 (42%
completions) for the entire game, so Abner had the higher completion
percentage for the entire game. This is an illustration of Simpson’s Paradox
and it can happen because of the unequal numbers of passes thrown for the
players in both halves of the game.
a)
New Jersey had the higher scores in both racial categories, but
Nebraska had the higher overall average across both racial categories.
b)
This can happen because of the unequal percentages of whites and non-whites
in the two states.
c)
The overall average for New Jersey is a weighted average with the
weights being the percentages of whites and non-whites in New Jersey.
Thus
Mean = (66(283) + 34(252))/(66 + 34) = 27246/100 = 272.46, or about 272.
12
13
14
15
16
a)
The average SAT scores in all five grade categories went down from 1988
to 1998 (average scores were lower in 1998).
b)
The overall average SAT scores went up from 1988 to 1998 (average
scores were higher in 1998).
c)
This is an illustration of Simpson’s Paradox and it can happen because
of the unequal percentages of students in each of the grade categories
during the two years.
a)
The death rates in New York City were
Whites: 8400/4,675,000 = 0.001797
Non-whites: 500/92,000 = 0.005435
Overall: 8900/4,767,000 = 0.001867
b)
The death rates in Richmond were
Whites: 130/81,000 = 0.001605
Non-whites: 160/47,000 = 0.003404
Overall: 290/128,000 = 0.002266
c)
New York City had higher TB death rates than Richmond for both whites
and non-whites in 1910, but Richmond had the higher overall TB death
rate. This is an illustration of Simpson’s Paradox and it can happen
because of the unequal proportions of whites and non-whites in the two
cities, New York City being 98% white (and 2% non-white) while Richmond
was 63% white (and 37% non-white).
The Gazelles had the higher mean improvement in both categories, but the
Cheetahs had the higher overall mean improvement. This could happen if the
percentages of the teams participating in weight training were different for
the two teams. For the Gazelles, let x represent the proportion of the team
that participated in weight training. Then 1 – x is the proportion that did
not. The overall team average improvement is a weighted average of the two
group averages with the weights being x and 1 – x. Thus
x(10) + (1 – x)(2) = 6.0. Simplifying, this becomes 8x + 2 = 6.0 or 8x = 4.
From this, we see that x must be 0.5, so 50% of the Gazelles participated in
weight training. Similarly for the Cheetahs, we have x(9) + (1 – x)(1) = 6.2.
Simplifying, 8x + 1 = 6.2 or 8x = 5.2. From this we see that x = 5.2/8 = 0.65, so 65% of
the Cheetahs participated in weight training.
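The same algebra can be written once as a small Python sketch. The function name `participation_share` is hypothetical. It solves x(a) + (1 – x)(b) = m for x, where a and b are the mean improvements with and without weight training and m is the overall team mean.

```python
def participation_share(mean_with, mean_without, overall_mean):
    """Solve x*mean_with + (1 - x)*mean_without = overall_mean for x."""
    return (overall_mean - mean_without) / (mean_with - mean_without)

gazelles = participation_share(10, 2, 6.0)  # from 8x + 2 = 6.0
cheetahs = participation_share(9, 1, 6.2)   # from 8x + 1 = 6.2
print(round(gazelles, 2), round(cheetahs, 2))
```

The Cheetahs' larger share in the higher-scoring group is what lifts their overall mean above the Gazelles', even though the Gazelles lead within each group.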
a)
Spelman College has a home record of 10/29 = 0.345, while Morehouse
College has a home record of 9/28 = 0.321. For away games, Spelman has a
record of 12/16 = 0.750, while Morehouse has a record of 56/76 = 0.737.
Thus Spelman has the better record both home and away.
b)
Spelman’s overall record is 22/45 = 0.489, while Morehouse’s overall
record is 65/104 = 0.625, so Morehouse has the better overall record.
c)
At the end of the season, it makes no difference where games were won
and lost, so the only record that should be used for comparisons is the
overall record. Thus Morehouse College has the better team.
a)
Among women, Drug B cured 101/900 = 0.112 while Drug A cured 5/100 =
0.050. Among men, Drug B cured 196/200 = 0.980, while Drug A cured
400/800 = 0.500. Thus, Drug B did better among women and among men.
b)
Overall, Drug A cured 405/900 = 0.450, while Drug B cured 297/1100 =
0.270. Thus, Drug A did better overall.
c)
In this case, you might want to look at the results from the patients’
viewpoints. A woman would probably prefer Drug B because its cure rate
was more than twice that of Drug A for women. Similarly, a man would
probably prefer Drug B because its cure rate was almost twice that of
Drug A for men. The cure rates for both drugs are very different for
men and women, so using the overall rate for comparison doesn’t make
much sense.
17
18
19
20
a)
Of the 2,000 employees, 1% or 20 use drugs. The polygraph test should
detect 90% of those 20, or 18. The other 2 go undetected. These
figures are shown in the first column. Among the 1980 non-users, the
test should be negative for 90% of them or 1782. The test should find
the remaining 198 to be lying. These figures are shown in the second
column.
b)
The number accused of lying is 18 + 198 = 216. Only 18 of these were
actually lying while 198 were telling the truth. Thus, 198 out of 216,
or 91.7%, were falsely accused.
c)
The number found to be truthful is 2 + 1782 = 1784. Of these, 1782, or
99.9%, really were truthful.
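The table entries in this answer can be rebuilt with a brief Python sketch. The function name and argument names are hypothetical. As in the answer, "90% accurate" is taken to mean that the test is right for 90% of users and for 90% of non-users.

```python
def screening_counts(population, prevalence, accuracy):
    """Return (true positives, false positives, false negatives, true negatives)."""
    users = round(population * prevalence)
    non_users = population - users
    true_pos = round(users * accuracy)       # users correctly flagged
    false_neg = users - true_pos             # users the test misses
    true_neg = round(non_users * accuracy)   # non-users correctly cleared
    false_pos = non_users - true_neg         # non-users wrongly flagged
    return true_pos, false_pos, false_neg, true_neg

tp, fp, fn, tn = screening_counts(2000, 0.01, 0.90)
falsely_accused_share = fp / (tp + fp)  # share of positives that are wrong
truthful_share = tn / (tn + fn)         # share of negatives that are right
print(tp, fp, fn, tn)
print(round(falsely_accused_share, 3), round(truthful_share, 3))
```

Because non-users outnumber users 99 to 1, even a 10% error rate among non-users produces far more false positives than there are true positives.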
a)
Out of 4,000 people, 1.5% or 60 people should have the disease. This
is the total in the first column. Since the test is 80% accurate, it
should detect 80% of those 60 people, or 48. This is the number of
positive tests in the first column. Of the remaining 3,940 who do not
have the disease, the test should be negative for 80%, or 3,152.
b)
This is the number of negative tests in the first column. The rest of
the table follows automatically by addition and subtraction.
c)
Of the 836 who test positive, 48, or 5.7%, actually have the disease.
This is the proportion of those having the disease given that they have
tested positive. This is not the same as the proportion of people who
test positive (80%) given that they have the disease.
d)
You should describe the patient’s chance of having the disease as about
6%, or 1 chance in 16. This is higher than the 1.5% incidence rate of
the disease. If the test is going to be useful at all in diagnosing
the disease, this is what one would hope for. If the rate of true
positives is not higher than the incidence rate, then the test is not
revealing anything.
a)
A higher percentage of women applicants were hired for both the white-collar
and blue-collar positions. This suggests that the company hires
women preferentially. Overall, there were 300 female and 600 male
applicants. Forty females were hired for white-collar positions (20%
of 200) and 85 were hired for blue-collar positions (85% of 100).
Therefore, 125 of the 300 female applicants (41.7%) were hired overall.
Thirty males were hired for white-collar positions (15% of 200) and 300
were hired for blue-collar positions (75% of 400). Therefore, 330 of
the 600 male applicants (55%) were hired overall. The female
applicants had a higher success rate in both categories of jobs, but
the male applicants had a higher success rate overall. This apparent
paradox is a result of the fact that males and females did not apply
for the two kinds of jobs in equal numbers. That is not something that
the company can control. Two-thirds of the women applied for white-collar
positions and two-thirds of the men applied for blue-collar
positions.
Treatment A had the better success rate in both trials. Overall, Treatment A
was successful in 40 + 85 = 125 cases out of 300 cases (41.7%), while
Treatment B was successful in 30 + 300 = 330 cases out of 600 cases (55%), so
Treatment B was the more successful treatment overall. This apparent paradox
is a result that can happen when the number of patients using the two
treatments is quite different in the two trials. In this case, because the
success rates are very different for both drugs in the two trials, one must
assume that there was something very different about the subjects in the two
trials. Perhaps only females were tested in the first trial and only males
in the second, or maybe the subjects were all young in one trial and all old
in the other, or possibly the subjects were all in late stages of the disease
in the first trial and all in early stages of the disease in the second
trial. If the circumstances of the trials were not similar in some way such
as these, the results should not be combined and the results of the
individual trials should be taken into account when prescribing treatment.
21
a)
Of the 5000 people in the at-risk sample, 475 + 25 = 500 are infected.
This is 10% of the sample. Of the 20,000 people in the general
population sample, 57 + 3 = 60 are infected. This is 0.003 or 0.3% of
the sample. Thus the table reflects the estimated incidence rates. Of
those infected in the at-risk population, 475 out of 500 tested
positive (95%), and of those infected in the general population, 57 out
of 60 tested positive (95%).
b)
In the at-risk population, 95% (475 out of 500) of those with HIV test
positive. Of those who test positive, 475 out of 700 (67.9%) have HIV.
These are different percentages because the first is the proportion of
those with HIV who test positive, while the second is the proportion of
those who test positive who have HIV.
c)
The chance of the patient having HIV is about 68%, which is
considerably higher than the incidence rate of 10% for the at-risk
population. If the test is any good, one would expect that the proportion
of those who test positive who actually have HIV would be higher than
the incidence rate. If it isn’t, the test is not revealing anything
useful.
d)
In the general population, 95% (57 out of 60) of those with HIV test
positive. Of those who test positive, 57 out of 1054 (5.4%) have HIV.
These are different percentages because the first is the proportion of
those with HIV who test positive, while the second is the proportion of
those who test positive who have HIV.
e)
The chance of the patient having HIV is about 5.4%, which is
considerably higher than the incidence rate of 0.3% for the general
population. If the test is any good, one would expect that the proportion
of those who test positive who actually have HIV would be higher than
the incidence rate.
Chapter 4 Review Exercises
1
a)
Red: Mean = (0.751 + 0.841 + ... + 0.905)/13 = 0.8635
Green: Mean = (0.925 + 0.914 + ... + 0.881)/19 = 0.8635
Red: The median is the seventh number in the ordered list of 13 data
values, which is 0.8590.
Green: The median is the tenth number in the ordered list of 19 data
values, which is 0.8650.
Red: Range = maximum – minimum = 0.966 – 0.751 = 0.2150
Green: Range = maximum – minimum = 1.015 – 0.778 = 0.2370
b)
Red      Deviation    Deviation²    Green    Deviation    Deviation²
0.751    -0.11254     0.012665      0.925     0.061474    0.003779
0.841    -0.02254     0.000508      0.914     0.050474    0.002548
0.856    -0.00754     0.000057      0.881     0.017474    0.000305
0.799    -0.06454     0.004165      0.865     0.00147     0.000002
0.966     0.102462    0.010498      0.865     0.00147     0.000002
0.859    -0.00454     0.000021      1.015     0.15147     0.022944
0.857    -0.00654     0.000043      0.876     0.01247     0.000156
0.942     0.078462    0.006156      0.809    -0.05453     0.002973
0.873     0.009462    0.000090      0.865     0.00147     0.000002
0.809    -0.05454     0.002974      0.848    -0.01553     0.000241
0.890     0.026462    0.000700      0.940     0.07647     0.005848
0.878     0.014462    0.000209      0.833    -0.03053     0.000932
0.905     0.041462    0.001719      0.845    -0.01853     0.000343
                                    0.852    -0.01153     0.000133
                                    0.778    -0.08553     0.007315
                                    0.814    -0.04953     0.002453
                                    0.791    -0.07253     0.005260
                                    0.810    -0.05353     0.002865
                                    0.881     0.01747     0.000305
              Sum =   0.039805                    Sum =   0.058407
The deviations for Red M&Ms are obtained by subtracting the mean from
each number in the first column. For Green M&Ms, the deviations are
obtained by subtracting the mean from each number in the fourth column.
The squared deviations are then placed in the third and sixth columns,
and their totals are shown in bold at the bottom of the columns. The
standard deviations are then found by dividing the sum of the squared
deviations by n-1 = 13-1 = 12 for the Red M&Ms and taking the square
root. For the Green M&Ms, divide the total of the squared deviations
by n-1 = 19-1 = 18 and take the square root. Thus, the standard
deviations are
Red M&Ms: Standard Deviation = √(Sum/(n-1)) = √(0.039805/(13-1)) = 0.0576
Green M&Ms: Standard Deviation = √(Sum/(n-1)) = √(0.058407/(19-1)) = 0.0570
c)
Five number summaries
                 Red      Green
Minimum          0.7510   0.7780
First Quartile   0.8250   0.8140
Median           0.8590   0.8650
Third Quartile   0.8975   0.8810
Maximum          0.9660   1.0150
[Boxplot of Red, Green: data axis from 0.75 to 1.05.]
For the Red, the first quartile is the average of the two middle values of
the lowest six values of the data set, 0.809 and 0.841, while the third
quartile is the average of the two middle values of the highest six values,
0.890 and 0.905. For the Green, the first quartile is the middle value of
the lowest nine values of the data set, 0.814, while the third quartile is
the middle value of the highest nine values, 0.881.
d)
By the range rule, the standard deviation is approximately range/4,
which for Red M&Ms is 0.215/4 = 0.054 and for Green M&Ms is 0.237/4 =
0.059. Both estimates are very close to the real values, in part
because there are no extreme outliers and the sample sizes are
reasonably large.
e)
The means and medians are close for the Red and Green M&Ms. The ranges
and standard deviations are also close. Therefore, there is not much
difference in either the center or the variation in the distributions
of the Red M&Ms and the Green ones.
2
a)
There are 32 values in the combined sample. After ordering the values
from smallest to largest, 0.845 is the eleventh value. There are 10
values smaller than 0.845. Since 10/32 = 0.3125, the value 0.845 is in
the 31st percentile.
b)
The mode is 0.865 g. It occurs three times in the combined list and no
other value occurs more than twice.
3
a)
Zero. The mean will be the same as each of the 50 values. That means
that each deviation from the mean will be zero and their squares will
all be zero. The sum of the squares of the deviations will be zero,
and therefore the standard deviation will also be zero.
b)
This is a toss-up. While both batteries are equally likely to achieve
a life length of 48 months, the batteries with a standard deviation of
2 months are likely to come closer to lasting exactly 48 months. Some
of the batteries with a 6 month standard deviation will likely fail
well before the 48 months is up, but an equal number of them will last
somewhat beyond the 48 month period.
c)
The outlier pulls the mean either up or down, depending on whether it
is above or below the mean, respectively.
d)
The outlier has no effect on the median since the median is found as
the average of the two middle values in the ordered list (25th and 26th
in a sample of size 50). The outlier would be either first or last in
the list.
e)
The outlier increases the range since one of the two numbers used to
compute the range will be the outlier.
f)
The outlier increases the standard deviation since one of the squared
deviations will be larger than it would be if there were no
outliers.
Chapter 4 Quiz
1
2
3
4
5
6
7
8
9
10
This value is the mean.
The standard deviation is the only statistic in the list that is a measure of
variation. All of the others are measures of center.
No. It is an estimate based on only the largest and smallest values, whereas
the actual standard deviation is based on all of the values in the sample.
Any one of the statements could be correct. If all of the values are
different, there is no mode. If two values occur equally often, but more
often than any of the other values, there are two modes. If three values
occur equally often, but more often than any of the other values, there are
three modes.
The 20th percentile must be less than the 30th percentile.
The median is greater than the first quartile.
The third quartile is greater than the first quartile.
The mean could be equal to the median, but it doesn’t have to be.
Since all of the values are different, the maximum and minimum values cannot
be the same. Therefore, the range cannot be zero.
The range rule of thumb says that the standard deviation is approximately 1/4
of the range. If the standard deviation is 10, the range is 40. Assuming
that the distribution of the values is symmetric, the high value will be
greater than the mean by 20 and the low value will be less than the mean by
20. In that case the likely low value will be about 30 and the high value
about 70.
The range is 10 – 2 = 8, so the standard deviation is estimated to be about
8/4 = 2.0
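The two range-rule answers above can be sketched in Python. The function names are hypothetical, and the mean of 50 used below is an assumption inferred from the quoted low and high values of about 30 and 70; it is not stated in the answer itself.

```python
def range_rule_std(low, high):
    """Range rule of thumb: standard deviation is roughly range / 4."""
    return (high - low) / 4

def likely_extremes(mean, std):
    """Rough low and high values for a symmetric distribution: mean -/+ 2 std."""
    return mean - 2 * std, mean + 2 * std

print(range_rule_std(2, 10))    # range 10 - 2 = 8, so about 8/4
print(likely_extremes(50, 10))  # assumed mean of 50, standard deviation of 10
```

The rule is only an estimate, since it uses just the largest and smallest values.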
Since all of the values are the same, the mean will also be 5.8, making all
of the deviations from the mean equal to zero. When you square and sum the
deviations, the result is zero, so the standard deviation is zero.
The range = maximum – minimum = 9.0 – 2.0 = 7.0.
The five number summary consists of the minimum value, the first quartile,
the median, the third quartile, and the maximum value.