Chapter 2 Solutions
Page 15 of 28
2.50
a. The median is 55. The mean is about 105.
b. The median is a more representative "average" than the mean here. Notice in the stem-and-leaf plot on
p.32 of the text that a clear majority of the students owned fewer than 105 CDs so the value 105 is not a
representative "average." The large outlier has inflated the mean.
c. Yes, the relationship between the mean and median is what you would expect. The data are skewed to
the right and there is an extremely large outlier. In general, both of these characteristics cause the mean to
be larger than the median.
2.51
a. The five-number summary created using the methods described on p.36 of the text is shown below. There were
n=60 ages, so the lower quartile is the median of the 30 lowest ages and the upper quartile is the median of
the 30 highest ages.
CEO ages (years)
Median: 50
Quartiles: 45.5 and 57
Extremes: 32 and 74
The five-number summary shows that the median age of the 60 CEOs of small companies is 50 years. The
middle ½ of the CEOs have ages between 45.5 and 57 years. The youngest CEO is 32 years old. The
oldest CEO is 74 years old.
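The median-of-halves method described in part (a) can be sketched in Python. This is a minimal illustration, not the text's own procedure: the function name and the sample ages are hypothetical, and leaving the middle value out of both halves when n is odd is an assumption, since conventions differ.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Quartiles are the medians of the lower and upper halves of the sorted
    data. When n is odd, the middle value is left out of both halves
    (an assumption; textbooks differ on this point).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]       # the half lowest values
    upper_half = data[n - half:]   # the half highest values
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])

# Hypothetical ages for illustration only; these are NOT the CEO data.
ages = [32, 41, 45, 46, 50, 50, 55, 57, 61, 74]
print(five_number_summary(ages))
```

With n = 60 values, as in the exercise, each half contains 30 values, which matches the description in part (a).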
2.52
The boxplot can be made either with a horizontal axis (as shown here) or a vertical axis (as in Figure 2.8 of
the text).
Figure for Exercise 2.52
2.53
The mean of the CEO ages is 51.47 years. The median is 50 years. The mean and the median are similar.
This is expected because the data are more or less symmetric in shape.
2.54
a. Mean ± St. Dev is 7 ± 1.7, or 5.3 to 8.7.
b. Mean ± 2 St. Dev is 7 ± (2)(1.7), or 3.6 to 10.4.
c. Mean ± 3 St. Dev is 7 ± (3)(1.7), or 1.9 to 12.1.
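The three intervals in Exercise 2.54 follow one pattern, mean ± k standard deviations. A short Python sketch of that arithmetic, using the mean 7 and standard deviation 1.7 from the solution (the function name is hypothetical):

```python
def sd_interval(mean, sd, k):
    """Return the interval mean - k*sd to mean + k*sd as a (low, high) pair."""
    return (mean - k * sd, mean + k * sd)

for k in (1, 2, 3):
    low, high = sd_interval(7, 1.7, k)
    print(f"mean ± {k} st. dev.: {low:.1f} to {high:.1f}")
```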
2.55
Figure for Exercise 2.55
2.56
a. z = (200−170)/20 = 1.5.
b. z = (140−170)/20 = −1.5.
c. z = (170−170)/20 = 0.
d. z = (230−170)/20 = 3.
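The four z-scores in Exercise 2.56 all use z = (value − mean)/standard deviation with mean 170 and standard deviation 20. A minimal Python sketch (the function name is hypothetical):

```python
def z_score(value, mean, sd):
    """Standardized score: how many standard deviations value lies from mean."""
    return (value - mean) / sd

for value in (200, 140, 170, 230):
    print(value, z_score(value, mean=170, sd=20))
```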
2.57
a. Mean =20; Standard deviation = 1.581.
b. Mean =20; Standard deviation = 0.
c. Mean = 20; Standard deviation = 33.09.
2.58
a. 98-41 = 57.
b. Standard deviation ≈ Range/6 = 57/6 = 9.5.
2.59
The interval is −6.1 to 22.7 hours. The interval includes negative values, which are impossible times. Thus,
the interval based on an assumption of a bell-shaped curve would not reflect reality.
2.60
The Empirical Rule says that 68% of values fall within 1 standard deviation of the mean, 95% fall within 2
standard deviations of the mean, and 99.7% fall within 3 standard deviations of the mean. Of the 103 handspan measurements for women, 74 or 72% are within 1 standard deviation of the mean (18.2 to 21.8 cm).
100 of the 103 or 97% are within 2 standard deviations. 101 of the 103 or 98% are within 3 standard
deviations. These data seem to fit pretty well with the Empirical Rule.
2.61
A histogram or dotplot of the ages at death for the first ladies shows that the data are approximately bell-shaped. This may be a little surprising because the data are a mixture of many different distributions. Due
to advancement in medicine and other areas, the mean age at death has been increasing over time. The
mean age at death is higher now than in the 1800’s.
2.62
a. The First Ladies may constitute a population rather than a sample. They lived in unique circumstances,
so it is hard to view these women as a representative sample from any larger population. And, they can't be
considered to be a sample from a larger population of First Ladies because future First Ladies will have
different circumstances affecting life expectancy.
b. If the First Ladies are viewed as a population, the population standard deviation is σ = 14.76 years. In
Excel, this can be found with the command "=STDEVP( )" and many calculators have a key for the
population standard deviation. See p.43 of the text for a discussion of the population standard deviation. If
the argument is made in part (a) that the First Ladies constitute a sample, the correct answer here is that the
sample standard deviation is s = 14.97 years.
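The difference between the two standard deviations in part (b) is only the divisor: n for a population, n − 1 for a sample. A small Python sketch using the standard library's `pstdev` and `stdev`; the ages below are hypothetical and are not the First Ladies data.

```python
from statistics import pstdev, stdev

# Hypothetical ages at death, for illustration only.
ages = [52, 64, 71, 78, 83, 89]
n = len(ages)

sigma = pstdev(ages)  # population standard deviation (divides by n)
s = stdev(ages)       # sample standard deviation (divides by n - 1)

print(f"population sigma = {sigma:.2f}")
print(f"sample s         = {s:.2f}")
# The two are related by s = sigma * sqrt(n / (n - 1)), so s is always
# slightly larger than sigma for the same data.
print(f"sigma * sqrt(n/(n-1)) = {sigma * (n / (n - 1)) ** 0.5:.2f}")
```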
2.63
Outliers affect the standard deviation. This happens because the calculation uses the deviation from the
mean for every value. An outlier has a large deviation from the mean, so it inflates the standard deviation.
Extreme values generally do not affect the quartiles, and consequently they generally don't affect the
interquartile range. Remember that a quartile is determined by counting through the ordered data to a
particular location, so the exact size of the largest or smallest observations doesn't matter.
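The point of Exercise 2.63 can be checked numerically. In the Python sketch below, the data are hypothetical and the quartiles are computed as medians of the lower and upper halves (an assumption about the method). Making the largest value ten times larger changes the standard deviation a great deal but leaves the interquartile range alone.

```python
from statistics import median, stdev

def iqr(values):
    """Interquartile range, with quartiles taken as medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[len(data) - half:]) - median(data[:half])

# Hypothetical data: the second list only makes the largest value more extreme.
base = [18, 19, 20, 20, 21, 22, 23, 60]
stretched = base[:-1] + [600]

print("st. dev.:", round(stdev(base), 2), "->", round(stdev(stretched), 2))
print("IQR:     ", iqr(base), "->", iqr(stretched))
```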
2.64
You expect women’s heights to have a bell-shaped curve because it is more common for a woman to have a
height close to the mean than far from the mean. Generally, the further a height is from the mean (in either
direction), the fewer the number of women with that height. The ages at marriage for women will probably
not follow a bell-shaped curve. Most of the ages will be in the 20’s, but the data will not be symmetric. The ages
can only be as low as the law permits (15, maybe). The other direction extends much farther from the mean:
some women do not get married until they are 40 or 50.
2.65
A categorical variable cannot have a bell-shaped distribution. A variable must be quantitative for it to be
possible to have a distribution with any particular shape. For a categorical variable, the raw data are
category labels without a meaningful numerical ordering. The ordering of bars in a bar chart is arbitrary
and could be done in many different ways. So, with a categorical variable, there is no inherent shape to the
distribution.
2.66
a. If the two possible outliers are ignored, the data appear to be more or less bell-shaped so the Empirical
Rule may hold.
b. The Empirical Rule implies that the range should span about 4 to 6 standard deviations. About 95% of
the data will be within 2 standard deviations (plus or minus) of the mean and about 99.7% of a data set
should be within 3 standard deviations (plus or minus) of the mean. Here, range = maximum − minimum =
23.25 – 12.5 = 10.75 cm. This span is equal to 10.75/1.8 = 5.97 standard deviations so it is consistent with
the Empirical Rule.
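The check in Exercise 2.66(b) expresses the range in units of the standard deviation. A short Python sketch of that arithmetic, using the values quoted in the solution (the function name is hypothetical):

```python
def range_in_sds(minimum, maximum, sd):
    """Express the range (maximum - minimum) as a number of standard deviations."""
    return (maximum - minimum) / sd

span = range_in_sds(minimum=12.5, maximum=23.25, sd=1.8)
print(f"range = {23.25 - 12.5} cm = {span:.2f} standard deviations")
# The Empirical Rule suggests a span of roughly 4 to 6 standard deviations.
print("within 4 to 6 standard deviations:", 4 <= span <= 6)
```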
2.67
a. If the two lowest values are deleted, the mean will increase and the standard deviation will decrease.
b. The Empirical Rule for mean ± 3 standard deviations says that 99.7% of the values will be between
20.2 ± 3(1.45) or 15.85 and 24.55 cm. All of the data, or 100% of the values, are within this interval.
c. Looking at the figures, it seems like the Empirical Rule should hold when the outliers are removed. The
data look fairly symmetric without those two values. If the outliers are not removed, the Empirical Rule
may hold, but not as well, since the data seem more skewed to the left with those two points included.
d. There may be justification for removing the outliers if a convincing argument can be made that they are
errors. The value of 12.5 may really be an incorrect entry of 21.5. The value of 13 may really be an
incorrect entry of 18 or 23. Assuming the original surveys were available, this could be checked. Or, you
could see if either of these women is extremely short or had any other odd measurements.
2.68
This will differ for each student. The calculation is z = (height − mean)/s. Use the mean and standard
deviation relevant to your gender. Note that the z-score will be negative if the height is less than the mean.
Notice also that if the height equals the mean, the result is z = 0.
2.69
a. If a z-score is 0, the value must equal the mean.
b. Begin by setting the formula for a z-score equal to 1:
(observed value − mean)/(standard deviation) = 1
Two steps of algebra lead to observed value = mean + 1 standard deviation.
Another strategy is to substitute observed value = mean + 1 standard deviation into the z-score formula.
Algebraic simplification leads to z = 1.
2.70
a. z = (value − mean)/(standard deviation) = (450 − 500)/100 = −0.5, and the proportion below is .3085.
b. z = (value − mean)/(standard deviation) = (36.5 − 34)/1 = 2.5, and the proportion below is .9938.
c. z = (79 − 75)/8 = 0.5, and the proportion below is .6915.
d. z = (79 − 75)/4 = 1, and the proportion below is .8413.
2.71
You should be more satisfied if the standard deviation was 5. This would mean you scored 2 standard
deviations above the mean and, if scores are bell-shaped, only about 2.5% of students are expected to score
higher.
2.72
The only possible set of numbers is {50, 50, 50, 50, 50, 50, 50} because a standard deviation of 0 means
there is no variability.
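The z-scores and "proportion below" values in Exercise 2.70 can be reproduced in software as well as from a table. A minimal Python sketch, assuming a normal curve and using `statistics.NormalDist` from the standard library (the function name is hypothetical):

```python
from statistics import NormalDist

def proportion_below(value, mean, sd):
    """Return (z, proportion of a normal curve that lies below value)."""
    z = (value - mean) / sd
    return z, NormalDist().cdf(z)  # NormalDist() is the standard normal curve

# (value, mean, standard deviation) for parts a through d of Exercise 2.70
cases = [(450, 500, 100), (36.5, 34, 1), (79, 75, 8), (79, 75, 4)]
for part, (value, mean, sd) in zip("abcd", cases):
    z, p = proportion_below(value, mean, sd)
    print(f"{part}. z = {z:.2f}, proportion below = {p:.4f}")
```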
Chapter 8 Solutions
Page 6 of 15
8.38
The answers for this exercise can be found using any of the methods discussed in Section 8.4,
including the use of Minitab or Excel.
a. P(X = 4) = .2051
b. P(X ≥ 4) = 1 − P(X ≤ 3) = 1 − .6496 = .3504
c. P(X ≤ 3) = .6496
d. P(X = 0) = .5905
e. P(X ≥ 1) = 1 − P(X = 0) = 1 − .5905 = .4095
8.39
a. Note that 1/4 of 1,000 is 250 so the desired probability is P(X≥250). n = 1000 and p = the
proportion of adults in the United States living with a partner, but not married at the time of the
sampling. The value of p is not known.
b. The desired probability is P(X≥110), n = 500, and p = .20.
c. Note that 70% of 20 is 14 so the desired probability is P(X≥14). n = 20, and p = .50.
8.40
The formulas are µ = np and σ = √(np(1 − p)).
a. µ = 10(1/2) = 5 and σ = √(10(.5)(1 − .5)) = 1.5811
b. µ = 100(1/4) = 25 and σ = √(100(.25)(1 − .25)) = 4.33
c. µ = 2500(1/5) = 500 and σ = √(2500(.2)(1 − .2)) = 20
d. µ = 1(1/10) = .1 and σ = √(1(.1)(1 − .1)) = .3
e. µ = 30(.4) = 12 and σ = √(30(.4)(1 − .4)) = 2.683
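The five answers in Exercise 8.40 apply the formulas for the mean np and the standard deviation (the square root of np(1 − p)) of a binomial random variable. A short Python sketch that reproduces them (the function name is hypothetical):

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Mean n*p and standard deviation sqrt(n*p*(1-p)) of a binomial variable."""
    return n * p, sqrt(n * p * (1 - p))

# (n, p) for parts a through e of Exercise 8.40
cases = [(10, 1/2), (100, 1/4), (2500, 1/5), (1, 1/10), (30, 0.4)]
for part, (n, p) in zip("abcde", cases):
    mu, sigma = binomial_mean_sd(n, p)
    print(f"{part}. mu = {mu:g}, sigma = {sigma:.4f}")
```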
8.41
For n = 2 and p = .5, P(X = 0) = .25, P(X = 1) = .5, and P(X = 2) = .25. These can be found in
several ways. One way is to list possible outcomes, which are {SS, SF, FS, and FF}, recognize
that all outcomes are equally likely, and then tabulate the distribution of X = number of successes.
µ = E(X) = ∑ x p(x) = (0×.25) + (1×.5) + (2×.25) = 1.
σ = √(∑ (x − µ)² p(x)) = √((0 − 1)²(.25) + (1 − 1)²(.5) + (2 − 1)²(.25)) = √0.5 = .7071.
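The listing approach described in Exercise 8.41 can be sketched in Python: enumerate every sequence of successes (S) and failures (F), add up the probability for each count of successes, and then apply the formulas for µ and σ. The function name is hypothetical.

```python
from itertools import product
from math import sqrt

def binomial_pmf_by_listing(n, p):
    """Build the distribution of X = number of successes by listing outcomes."""
    pmf = {k: 0.0 for k in range(n + 1)}
    for outcome in product("SF", repeat=n):  # e.g. ('S', 'F') for n = 2
        successes = outcome.count("S")
        pmf[successes] += p ** successes * (1 - p) ** (n - successes)
    return pmf

pmf = binomial_pmf_by_listing(2, 0.5)
mu = sum(x * px for x, px in pmf.items())
sigma = sqrt(sum((x - mu) ** 2 * px for x, px in pmf.items()))

print(pmf)
print(f"mu = {mu}, sigma = {sigma:.4f}")
```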
8.42
a. P(0 ≤ X ≤ 30) = .5 because the interval from 0 to 30 is one-half of the interval of possible
outcomes (0 to 60) and the distribution is uniform.
b. P(30≤X≤ 60) = .5 by the same reasoning as in part (a).
8.43
a. (1.5 − 0)/1 = 1.5.
b. (4 − 10)/6 = −1.
c. (0 − 10)/5 = −2.
d. (−25 − (−10))/15 = −1.
8.44
Table A.1 can be used to find the answers.
a. .5000.
b. .3632.
c. .6368.
d. .9750.
e. .0099.
f. .9951.
g. .9505.
8.45
a. Answer = .8413. For 200 lbs, z = (200 − 180)/20 = 1. P(Z ≤ 1) = .8413.
b. Answer = .2266. For 165 lbs, z = (165 − 180)/20 = −0.75. P(Z ≤ −0.75) = .2266.
c. Answer = .7734. This is the “opposite” event to part (b), so the calculation is 1 − .2266 = .7734.
8.46
a. X is a uniform random variable (and it is continuous).
b. X ranges from 0 to 100 and the area under any density curve is 1, so f(x) = 1/100=.01 for all x
between 0 and 100. This creates a rectangle (with area=1) similar to Figure 8.2.
Note: f(x) = 0 for any x not between 0 and 100.
c. P(X≤15 seconds) is the area of the rectangle from 0 to 15 seconds. The interval width is 15 and
the height is 1/100, so the answer is (15)(1/100) = .15.
d. P(X≥40 seconds) is the area of the rectangle between 40 and 100. The interval width is 60 and
the height is 1/100 so the answer is (60)(1/100)= .60.
e.
Figure for Exercise 8.46e
f. The expected value or mean is 50. The distribution is symmetric, so the mean equals the
median. For a uniform random variable, the median is at the middle of the range of possible
values.
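For the uniform random variable in Exercise 8.46, every probability is the area of a rectangle: interval width times the height 1/100. A minimal Python sketch of that calculation (the function name is hypothetical):

```python
def uniform_prob(a, b, low, high):
    """P(a <= X <= b) for X uniform on [low, high]: width times height."""
    a, b = max(a, low), min(b, high)  # clip the interval to the possible values
    if b <= a:
        return 0.0
    return (b - a) / (high - low)

p_at_most_15 = uniform_prob(0, 15, low=0, high=100)     # part c
p_at_least_40 = uniform_prob(40, 100, low=0, high=100)  # part d
mean = (0 + 100) / 2                                    # part f: middle of the range

print(p_at_most_15, p_at_least_40, mean)
```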
8.47
This will differ for each student.
8.48
a. The rectangle has height =1/10=0.1 because the range of X is 20−10=10.
Figure for Exercise 8.48a
b. Figure for Exercise 8.48b
Note: The range of this normal curve was determined using the fact that about 99.7% of the area
will be in the range mean ± 3 standard deviations.
c. Figure for Exercise 8.48c
Note: The range of this normal curve was determined using the fact that about 99.7% of the area
will be in the range mean ± 3 standard deviations.
8.49
a. P(Z ≤ −1.4) = .0808
b. P(Z ≤ 1.4) = .9192
c. P(−1.4 ≤ Z ≤ 1.4) = P(Z ≤ 1.4) − P(Z ≤ −1.4) = .9192 − .0808 = .8384
d. P(Z ≥ 1.4) = 1 − P(Z ≤ 1.4) = 1 − .9192 = .0808. Equivalently, P(Z ≥ 1.4) = P(Z ≤ −1.4) = .0808.
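The standard normal probabilities in Exercise 8.49 can also be computed with `statistics.NormalDist` instead of Table A.1. This is a sketch of one alternative, not the method the text uses. Software values can differ from table-based answers in the fourth decimal place, because the table entries are rounded before they are subtracted.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

below_neg = Z.cdf(-1.4)              # part a: P(Z <= -1.4)
below_pos = Z.cdf(1.4)               # part b: P(Z <= 1.4)
between = below_pos - below_neg      # part c: P(-1.4 <= Z <= 1.4)
above_pos = 1 - below_pos            # part d: P(Z >= 1.4)

for label, p in [("a", below_neg), ("b", below_pos),
                 ("c", between), ("d", above_pos)]:
    print(f"{label}. {p:.4f}")
```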
8.50
a. .0001. Use the “In the Extreme” portion of Table A.1.
b. P(−3.72 ≤Z ≤ 3.72) = P(Z≤ 3.72)−P(Z≤ −3.72) = .9999 − .0001 = .9998.
c. About 0. This is far beyond the usual range of a standard normal curve.
8.51
a. z* = −1.96. If using Table A.1, look for .025 within the interior part of the table.
b. z* = 1.96. If using Table A.1, look for .975 within the interior part of the table. Or, note that the
area to the right of z* must be .025, so by the symmetry of the standard normal curve the answer
is the positive version of the answer for part (a).
c. z* = 1.96 because if .95 is in the central area, .975 must be the area to the left of z*. This means
the answer is the same as for part (b).
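Looking up an area in the interior of Table A.1 to find z* is an inverse lookup. In Python the same step can be sketched with `NormalDist.inv_cdf`, shown here as a software alternative to the table:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve

z_lower = Z.inv_cdf(0.025)  # part a: area .025 to the left of z*
z_upper = Z.inv_cdf(0.975)  # parts b and c: area .975 to the left of z*

print(f"a. z* = {z_lower:.2f}")
print(f"b. z* = {z_upper:.2f}")
# Check of part c: the central area between the two values is about .95.
print(f"central area = {Z.cdf(z_upper) - Z.cdf(z_lower):.2f}")
```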
8.52
a. Define event A as “Z ≤ a”; thus, A^c (the complement of A) is “Z > a”. So, P(Z > a) = 1 – P(Z ≤ a) by Rule 1.
b. Define event A as “Z ≤ a”. Define the event B as “a ≤ Z ≤ b”. Events A and B are mutually
exclusive because a value cannot be both less than a and between a and b at the same time,
assuming a is less than b. By Rule 2b, P(A or B) = P(A) + P(B) = P(Z ≤ a) + P(a ≤ Z ≤ b). The
event “Z ≤ a” or “a ≤ Z ≤ b” is equivalent to “Z ≤ b” so P(Z≤b) = P(Z ≤ a) + P(a ≤ Z ≤ b). One
step of algebra leads to P(a ≤ Z ≤ b) = P(Z ≤ b) – P(Z ≤ a).
8.53
a. Note that 500 is the mean, and the distribution is symmetric, so P(X ≤ 500) = .5 (because the
probability is .5 on each side of the mean).
b. For 650, z = (650 − 500)/100 = 1.5 so P(X ≤ 650) = P(Z ≤ 1.5) = .9332
c. For 700, z = (700 − 500)/100 = 2 so P(X ≥ 700) = P(Z ≥ 2) = 1 − P(Z ≤ 2) = 1 − .9772 = .0228.
Equivalently, P(Z ≥ 2) = P(Z ≤ −2) = .0228
d. P(500 ≤ X ≤ 700) = P(0 ≤ Z ≤ 2) = P(Z ≤ 2) − P(Z ≤ 0) = .9772 − .5 = .4772
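The four parts of Exercise 8.53 can be checked without standardizing by hand. The Python sketch below uses `statistics.NormalDist` with the mean 500 and standard deviation 100 that appear in the z-score calculations above:

```python
from statistics import NormalDist

X = NormalDist(mu=500, sigma=100)

p_a = X.cdf(500)               # P(X <= 500)
p_b = X.cdf(650)               # P(X <= 650)
p_c = 1 - X.cdf(700)           # P(X >= 700)
p_d = X.cdf(700) - X.cdf(500)  # P(500 <= X <= 700)

for label, p in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(f"{label}. {p:.4f}")
```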
8.54
a. For 65, z = 0 because 65 is the mean while for 62, z = (62 − 65)/2.7 = −1.11.
So, P(62 ≤ X ≤ 65) = P(−1.11 ≤ Z ≤ 0) = P(Z ≤ 0) − P(Z ≤ −1.11) = .5 − .1335 = .3665
b. For 60, z = (60 − 65)/2.7 = −1.85 while for 70, z = (70 − 65)/2.7 = 1.85.
So, P(60 ≤ X ≤ 70) = P(−1.85 ≤ Z ≤ 1.85) = P(Z ≤ 1.85) − P(Z ≤ −1.85) = .9678 − .0322 = .9356
c. P(X ≤ 70) = P(Z ≤ 1.85) = .9678
d. P(X ≥ 60) = P(Z ≥ −1.85) = P(Z ≤ 1.85) = .9678
e. X is either less than or equal to 60 or greater than or equal to 70 so the answer can be computed
as P(X ≤ 60) + P(X ≥ 70) = P(Z ≤ −1.85) + P(Z ≥ 1.85) = .0322 + .0322 = .0644
8.55
The value of z* for which P(Z ≥ z*) = .10 is about 1.28. (If 10% are taller, then 90% are shorter
so if using Table A.1, look for .90 in the interior part of the table.) The answer is 1.28 standard
deviations above the mean, which is (1.28×2.7) + 65 = 68.5 inches. The percentile ranking for a
height of 68.5 inches is .90 or 90%.
8.56
The value of z* for which P(Z ≤ z*) = .25 is about −0.675 which means the answer is 0.675
standard deviations below the mean. This height is (−0.675×2.7) + 65 = 63.2 inches. The
percentile ranking for a height of 63.2 inches is .25 or 25%.
8.57
a. This will differ for each student. Suppose, for example, that the student is a male with a right
handspan of 23 cm. In that case, z = (23 − 22.5)/1.5 = 0.33.
b. The answer will differ for each student. For the example given in the solution for part (a), the
proportion of males with a handspan smaller than 23 cm is P(Z ≤ 0.33) = .6293.
8.58
a. For 10, z = (10 − 18)/6 = −1.33 so P(X < 10) = P(Z < −1.33) = .0918
b. For 30, z = (30 − 18)/6 = 2 so P(X > 30) = P(Z > 2) = 1 − P(Z ≤ 2) = 1 − .9772 = .0228.
Equivalently, P(Z > 2) = P(Z < −2) = .0228.
c. For 21, z = (21 − 18)/6 = 0.5 while for 15, z = −0.5. So, P(15 ≤ X ≤ 21) = P(−0.5 ≤ Z ≤ 0.5) =
P(Z ≤ 0.5) − P(Z ≤ −0.5) = .6915 − .3085 = .3830
d. P(X > 35) = P(Z > (35 − 18)/6) = P(Z > 2.83) = 1 − P(Z ≤ 2.83) = 1 − .9977 = .0023.
Equivalently, P(Z > 2.83) = P(Z < −2.83) = .0023.
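The percentile calculations in Exercises 8.55 and 8.56 run the z-score formula backwards: find z* for the given area, then convert to a height with (z* × standard deviation) + mean. A Python sketch of the same steps, using the mean 65 and standard deviation 2.7 from those solutions and `NormalDist.inv_cdf` in place of Table A.1:

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=2.7)  # mean and st. dev. from the solutions

# 8.55: 10% are taller, so 90% are shorter (the 90th percentile).
height_90 = heights.inv_cdf(0.90)
# 8.56: 25% are shorter (the 25th percentile).
height_25 = heights.inv_cdf(0.25)

print(f"90th percentile: {height_90:.1f} inches")
print(f"25th percentile: {height_25:.1f} inches")

# The same result by hand: z* from the standard normal, then rescale.
z_star = NormalDist().inv_cdf(0.90)
print(f"z* = {z_star:.2f}, height = {z_star * 2.7 + 65:.1f} inches")
```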