Chapter Three
3.1
For a data set with an odd number of observations, first we rank the data set in increasing (or
decreasing) order and then find the value of the middle term. This value is the median.
For a data set with an even number of observations, first we rank the data set in increasing (or
decreasing) order and then find the average of the two middle terms. The average gives the median.
3.3
Suppose the 2002 sales (in millions of dollars) of five companies are: 10, 21, 14, 410, and 8. The mean
for the data set is:
Mean = (10 + 21 + 14 + 410 + 8) / 5 = $92.60 million
Now, if we drop the outlier (410), the mean is:
Mean = (10 + 21 + 14 + 8) / 4 = $13.25 million
This shows how an outlier can affect the value of the mean.
3.5
The mode can assume more than one value for a data set. Examples 3–8 and 3–9 of the text present
such cases.
3.7
For a symmetric histogram (with one peak), the values of the mean, median, and mode are all equal.
Figure 3.2 of the text shows this case. For a histogram that is skewed to the right, the value of the mode
is the smallest and the value of the mean is the largest. The median lies between the mode and the
mean. Such a case is presented in Figure 3.3 of the text. For a histogram that is skewed to the left, the
value of the mean is the smallest, the value of the mode is the largest, and the value of the median lies
between the mean and the mode. Figure 3.4 of the text exhibits this case.
3.9
Σx = 5 + (–7) + 2 + 0 + (–9) + 16 + 10 + 7 = 24
(N + 1) / 2 = (8 + 1) / 2 = 4.5
µ = (Σx) / N = 24 / 8 = 3
Median = value of the 4.5th term in ranked data = (2 + 5) / 2 = 3.50
This data set has no mode.
TI-83: Enter the data in L1 (as shown in Chapter 1). Then press STAT, then highlight CALC and 1:
(1- Var Stats) and press ENTER. Now press 2ND, then the number 1, and finally press ENTER.
1-Var Stats (first screen): x̄ = 3, Σx = 24, Σx² = 564, Sx = 8.383657572, σx = 7.842193571, n = 8
1-Var Stats (second screen): n = 8, minX = –9, Q1 = –3.5, Med = 3.5, Q3 = 8.5, maxX = 16
On the first screen, the entry x̄ gives the mean. Pressing the down arrow brings up the additional information on the second screen, where Med gives the median. The mode must be identified by hand.
MINITAB: Enter the data in the spreadsheet (as shown in Chapter 1). Then select STAT, Basic Statistics, and Display Descriptive Statistics. Since this population variable did not have a name, it is labeled Data. Highlight the variable name and click on it. Once it appears in the box marked VARIABLES, click on STATISTICS. Now make sure there is a check mark beside Mean and Median. Note: you can eliminate the check marks beside all the others if desired. The value below the title Mean is the mean, and the value below the title Median is the median of this data set.
Excel: Enter the data in the spreadsheet (as shown in Chapter 1). Then select INSERT and FUNCTION. For the mean, select AVERAGE and then insert the cell range; if you use column A it is something like (A1:A8). For the median, select MEDIAN and then insert the cell range; again, if you use column A it is something like (A1:A8). Note: in the second calculation make sure you DO NOT include the result of the first calculation in your cell range. It is also good practice to label your calculations; in this instance the names appear after each result for clarity.
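Python: The same summary measures can be checked with the standard library. This is a minimal illustrative sketch (not one of the text's software notes), assuming the Exercise 3.9 data have been typed in as a list.

from statistics import mean, median
from collections import Counter

data = [5, -7, 2, 0, -9, 16, 10, 7]   # Exercise 3.9 data

print("mean   =", mean(data))    # 3
print("median =", median(data))  # 3.5, the average of the two middle ranked terms

# A data set has no mode when no value occurs more than once.
counts = Counter(data)
top = max(counts.values())
modes = [value for value, count in counts.items() if count == top] if top > 1 else []
print("mode(s) =", modes if modes else "no mode")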
3.11
x̄ = (Σx) / n = 9907 / 12 = $825.58
(n +1) / 2 = (12 + 1) / 2 = 6.5
Median = value of the 6.5th term in ranked data set = (769 + 798) / 2 = $783.50
3.13
x̄ = (Σx) / n = 16.682 / 12 = $1.390
(n + 1) / 2 = (12 + 1) / 2 = 6.5
Median = (1.360 + 1.351) / 2 = $1.356
This data set has no mode.
3.15
x̄ = (Σx) / n = 85.81 / 12 = $7.15
(n +1) / 2 = (12 + 1) / 2 = 6.5
Median = (6.99 + 7.03) / 2 = $7.01
3.17
μ = (Σx) / N = 35,629 / 6 = $5938.17 thousand
(n +1) / 2 = (6 +1) / 2 = 3.5
Median = (750 + 8500) / 2 = $4625 thousand
This data set has no mode because no value appears more than once.
3.19
x̄ = (Σx) / n = 64 / 10 = 6.40 hours
(n +1) / 2 = (10 +1) / 2 = 5.5
Median = (7 + 7) / 2 =7 hours
Mode = 0 and 7 hours
3.21
x̄ = (Σx) / n = 294 / 10 = 29.4 computer terminals
(n +1) / 2 = (10 +1) / 2 = 5.5
Median = (28 + 29) / 2 = 28.5 computer terminals
Mode = 23 computer terminals
3.23
x̄ = (Σx) / n = 257 / 13 = 19.77 newspapers
a.
(n + 1) / 2 = (13 + 1) / 2 = 7
Median = 12 newspapers
b. Yes, 92 is an outlier. When we drop this value,
Mean = 165 / 12 = 13.75 newspapers
(n + 1) / 2 = (12 + 1) / 2 = 6.5
Median = (11+12) / 2 = 11.5 newspapers
As we observe, the mean is affected more by the outlier.
c. The median is a better measure because it is not sensitive to outliers.
3.25
From the given information: n₁ = 10, n₂ = 8, x̄₁ = $95, x̄₂ = $104
x̄ = (n₁x̄₁ + n₂x̄₂) / (n₁ + n₂) = [(10)(95) + (8)(104)] / (10 + 8) = 1782 / 18 = $99
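Python: The combined mean of two groups is easy to verify in code. A short sketch using the Exercise 3.25 figures (the variable names are just illustrative).

n1, n2 = 10, 8            # sample sizes of the two groups
xbar1, xbar2 = 95, 104    # sample means of the two groups

combined_mean = (n1 * xbar1 + n2 * xbar2) / (n1 + n2)
print(combined_mean)      # 99.0, i.e. $99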
3.27
Total money spent by 10 persons = Σx = n(x̄) = 10(85.50) = $855
3.29
Sum of the ages of six persons = 6 x 46 = 276 years, so the age of the sixth person = 276 – (57 + 39 + 44 + 51 + 37) = 48 years.
3.31
For Data Set I:
Mean = 123 / 5 = 24.60
For Data Set II: Mean = 158 / 5 = 31.60
The mean of the second data set is greater than the mean of the first data set by 7.
3.33
The ranked data are: 19 23 26 31 38 39 47 49 53 67
By dropping 19 and 67, we obtain: Σx = 23 + 26 + 31 + 38 + 39 + 47 + 49 + 53 = 306
10% Trimmed Mean = (Σx) / n = 306 / 8 = 38.25 years
MINITAB: Enter the data in the spreadsheet (as shown in Chapter 1). Then select STAT, Basic Statistics, and Display Descriptive Statistics. Since this population variable did not have a name, it is labeled Data. Highlight the variable name and click on it. Once it appears in the box marked VARIABLES, click on STATISTICS. Now make sure there is a check mark beside trimmed mean. Note: you can eliminate the check marks beside all the others if desired. Below the title is the trimmed mean, which is technically the 5% trimmed mean but in this example is identical to the 10% trimmed mean because of the number of elements in the population. The mean and median can be included as well to show the difference between the three measures in this instance.
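Python: The 10% trimmed mean can also be computed directly. A minimal sketch, assuming the ten ranked ages from Exercise 3.33; the hypothetical helper drops 10% of the observations from each end of the ranked data and averages the rest.

ages = [19, 23, 26, 31, 38, 39, 47, 49, 53, 67]

def trimmed_mean(values, trim_fraction):
    # Drop trim_fraction of the observations from each end of the ranked data.
    data = sorted(values)
    k = int(len(data) * trim_fraction)
    kept = data[k:len(data) - k] if k > 0 else data
    return sum(kept) / len(kept)

print(trimmed_mean(ages, 0.10))   # 38.25 years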
3.35
From the given information: x₁ = 73, x₂ = 67, x₃ = 85, w₁ = w₂ = 1, w₃ = 2
Weighted mean = (Σxw) / (Σw) = [(1)(73) + (1)(67) + (2)(85)] / 4 = 310 / 4 = 77.5

3.37
Suppose the monthly incomes of five families are: $1445, $2310, $967, $3195, and $24,500.
Then, Range = Largest value – Smallest value = 24,500 – 967 = $23,533
Now, if we drop the outlier ($24,500) and calculate the range, then:
Range = Largest value – Smallest value = 3195 – 967 = $2228
Thus, when we drop the outlier ($24,500), the range decreases from $23,533 to $2228.
This exhibits the sensitivity of the range with respect to outliers.
3.39
The value of the standard deviation is zero when all values in a data are the same. For example,
suppose the scores of a sample of six students in an examination are: 87 87 87 87 87 87
As this data set has no variation, the value of the standard deviation is zero for these observations. This
is shown below.
Σx = 522 and Σx² = 45,414
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[45,414 – (522)²/6] / (6 – 1)} = 0

3.41
Range = Largest value – Smallest value = 16 – (–9) = 25
Σx = 24, Σx² = 564, and N = 8
σ² = [Σx² – (Σx)²/N] / N = [564 – (24)²/8] / 8 = [564 – 72] / 8 = 61.5
σ = √61.5 = 7.84
TI-83: Enter the data in L1 (as shown in Chapter 1). Then press STAT, then highlight CALC and 1:
(1- Var Stats) and press ENTER. Now press 2ND, then the number 1, and finally press ENTER.
1-Var Stats (first screen): x̄ = 3, Σx = 24, Σx² = 564, Sx = 8.383657572, σx = 7.842193571, n = 8
1-Var Stats (second screen): n = 8, minX = –9, Q1 = –3.5, Med = 3.5, Q3 = 8.5, maxX = 16
On the first screen, the entry σx gives the population standard deviation. (Note: if this were a sample we would use Sx, the sample standard deviation.) Square the population standard deviation to get the population variance. Scrolling down brings up the second screen, where the smallest data point is named minX and the largest maxX. By subtracting minX from maxX you obtain the range.
MINITAB: Enter the data in the spreadsheet (as shown in Chapter 1). Then select STAT, Basic Statistics, and Display Descriptive Statistics. Since this population variable did not have a name, it is labeled Data. Highlight the variable name and click on it. Once it appears in the box marked VARIABLES, click on STATISTICS. Now make sure there is a check mark beside sum, sum of squares, standard deviation, variance, and range. Note: you can eliminate the check marks beside all the others if desired. Below each title is the appropriate statistic for this data set. CAUTION: Do not use the standard deviation reported here; it is calculated for a sample and NOT a population. Instead, use the sum and the sum of squares to calculate the population variance and the population standard deviation.
Excel: Enter the data in the spreadsheet (as shown in Chapter 1). Then select INSERT and FUNCTION. For the range, select MAX and insert the cell range; if you use column A it is something like (A1:A8). Next type a minus sign, select MIN, insert the cell range, and press ENTER. For the population variance, select VARP and insert the cell range (the population standard deviation is the square root of this result). Note: in these calculations make sure you DO NOT include the results of previous calculations in your cell range. It is also good practice to label your calculations; in this instance the names appear to the left of each result for clarity.
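Python: The population measures for this exercise can be reproduced with the statistics module. A brief sketch, assuming the eight data values from Exercise 3.41; pvariance and pstdev divide by N, matching σx on the TI-83, whereas variance and stdev would give the sample versions.

from statistics import pvariance, pstdev

data = [5, -7, 2, 0, -9, 16, 10, 7]

print("range    =", max(data) - min(data))      # 25
print("variance =", round(pvariance(data), 1))  # 61.5
print("std dev  =", round(pstdev(data), 2))     # 7.84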
3.43
a. x̄ = (Σx) / n = 72 / 8 = 9 shoplifters caught
Shoplifters caught     Deviations from the Mean
7                      7 – 9 = –2
10                     10 – 9 = 1
8                      8 – 9 = –1
3                      3 – 9 = –6
15                     15 – 9 = 6
12                     12 – 9 = 3
6                      6 – 9 = –3
11                     11 – 9 = 2
                       Sum = 0
The sum of the deviations from the mean is zero.
b. Range = Largest value – Smallest value = 15 – 3 = 12
Σx = 72, Σx² = 748, and n = 8
s² = [Σx² – (Σx)²/n] / (n – 1) = [748 – (72)²/8] / (8 – 1) = 14.2857
s = √14.2857 = 3.78
3.45
Σx = 81, Σx² = 699, and n = 12
Range = Largest value – Smallest value = 15 – 2 = 13 thefts
s² = [Σx² – (Σx)²/n] / (n – 1) = [699 – (81)²/12] / (12 – 1) = 13.8409
s = √13.8409 = 3.72 thefts

3.47
Range = Largest value – Smallest value = 41 – 14 = 27 pieces
s² = [Σx² – (Σx)²/n] / (n – 1) = [9171 – (291)²/10] / (10 – 1) = 78.1
s = √78.1 = 8.84 pieces

3.49
Range = Largest value – Smallest value = 25 – 5 = 20 pounds
s² = [Σx² – (Σx)²/n] / (n – 1) = [2480 – (174)²/15] / (15 – 1) = 32.9714
s = √32.9714 = 5.74 pounds

3.51
Range = Largest value – Smallest value = 23 – (–7) = 30º Fahrenheit
s² = [Σx² – (Σx)²/n] / (n – 1) = [1552 – (80)²/8] / (8 – 1) = 107.4286
s = √107.4286 = 10.36º Fahrenheit

3.53
Range = Largest value – Smallest value = 83.4 – 31.2 = $52.2 billion
s² = [Σx² – (Σx)²/n] / (n – 1) = [23,615.58 – (459.6)²/10] / (10 – 1) = 276.9293
s = √276.9293 = $16.64 billion

3.55
From the given data: Σx = 96, Σx² = 1152, and n = 8
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[1152 – (96)²/8] / (8 – 1)} = √(0/7) = 0
The standard deviation is zero because all these data values are the same and there is no variation
among them.
3.57
For the yearly salaries of all employees:
CV = (σ /μ) × 100% = (3,820 /42,350) × 100 = 9.02%
For the years of schooling of these employees: CV = (σ / μ) × 100% = (2 /15) × 100 = 13.33%
The relative variation in salaries is lower than that in years of schooling.
MINITAB: Note that under Display Descriptive Statistics you can select the coefficient of variation for a sample. However, in this instance we do not have the individual elements of the sample, only the mean and standard deviation, so it is much easier to do the calculation by hand.
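Python: Because only the mean and standard deviation are given, the coefficient of variation is a one-line calculation. A hypothetical helper, shown only as a sketch, using the Exercise 3.57 figures.

def coefficient_of_variation(std_dev, mean):
    # CV expresses the standard deviation as a percentage of the mean.
    return std_dev / mean * 100

print(round(coefficient_of_variation(3820, 42350), 2))  # 9.02  (salaries)
print(round(coefficient_of_variation(2, 15), 2))        # 13.33 (years of schooling)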
3.59
For Data Set I: Σx = 123, Σx² = 3883, and n = 5
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[3883 – (123)²/5] / (5 – 1)} = √214.300 = 14.64
For Data Set II: Σx = 158, Σx² = 5850, and n = 5
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[5850 – (158)²/5] / (5 – 1)} = √214.300 = 14.64
The standard deviations of the two data sets are equal.
3.61
Chebyshev’s theorem is applied to find the area under a distribution curve between two points that are
on opposite sides of the mean and at the same distance from the mean. According to this theorem, for
any number k greater than 1, at least (1 – (1/k2)) of the data values lie within k standard deviations of
the mean.
3.63
For the interval x̄ ± 2s: k = 2, and 1 – 1/k² = 1 – 1/(2)² = 1 – .25 = .75 or 75%. Thus, at least 75% of the observations fall in the interval x̄ ± 2s.
For the interval x̄ ± 2.5s: k = 2.5, and 1 – 1/k² = 1 – 1/(2.5)² = 1 – .16 = .84 or 84%. Thus, at least 84% of the observations fall in the interval x̄ ± 2.5s.
For the interval x̄ ± 3s: k = 3, and 1 – 1/k² = 1 – 1/(3)² = 1 – .11 = .89 or 89%. Thus, at least 89% of the observations fall in the interval x̄ ± 3s.
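Python: Chebyshev's bound 1 – 1/k² is simple to tabulate. A minimal illustrative sketch reproducing the three values used above.

def chebyshev_lower_bound(k):
    # Minimum proportion of observations within k standard deviations of the mean (k > 1).
    return 1 - 1 / k**2

for k in (2, 2.5, 3):
    print(k, chebyshev_lower_bound(k))
# 2 -> 0.75, 2.5 -> 0.84, 3 -> 0.888... (rounded to .89 in the solutions)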
3.65
Approximately 68% of the observations fall in the interval μ ± 1σ, approximately 95% fall in the interval μ ± 2σ, and about 99.7% fall in the interval μ ± 3σ.
3.67
Each of the two values is 40 minutes from μ = 220. Hence, k = 40 / 20 = 2
a. 1 – 1/k² = 1 – 1/(2)² = 1 – .25 = .75 or 75%.
Thus, at least 75% of the runners ran the race in 180 to 260 minutes.
b. Each of the two values is 60 minutes from μ = 220. Hence, k = 60 / 20 = 3 and
1 – 1/k² = 1 – 1/(3)² = 1 – .11 = .89 or 89%.
Thus, at least 89% of the runners ran the race in 160 to 280 minutes.
c. Each of the two values is 50 minutes from μ = 220. Hence, k = 50 / 20 = 2.5 and
1 – 1/k² = 1 – 1/(2.5)² = 1 – .16 = .84 or 84%.
Thus, at least 84% of the runners ran this race in 170 to 270 minutes.
3.69
μ = $8367 and σ = $2400
a. i. Each of the two values is $4800 from μ = $8367. Hence, k = 4800 / 2400 = 2 and
1 – 1/k² = 1 – 1/(2)² = 1 – .25 = .75 or 75%.
Thus, at least 75% of all households have credit card debt between $3567 and $13,167.
ii. Each of the two values is $6000 from μ = $8367. Hence, k = 6000 / 2400 = 2.5 and
1 – 1/k² = 1 – 1/(2.5)² = 1 – .16 = .84 or 84%.
Thus, at least 84% of all households have credit card debt between $2367 and $14,367.
b. 1 – 1/k² = .89 gives 1/k² = 1 – .89 = .11, or k² = 1/.11, so k = 3.
μ – 3σ = 8367 – 3(2400) = $1167 and μ + 3σ = 8367 + 3(2400) = $15,567
Thus, the required interval is $1167 to $15,567.
3.71
μ = 44 months and σ = 3 months.
a. The interval 41 to 47 months is μ – σ to μ + σ. Hence, approximately 68% of the batteries have a life of 41 to 47 months.
b. The interval 38 to 50 months is μ – 2σ to μ + 2σ. Hence, approximately 95% of the batteries have a life of 38 to 50 months.
c. The interval 35 to 53 months is μ – 3σ to μ + 3σ. Hence, approximately 99.7% of the batteries have a life of 35 to 53 months.
3.73
µ = 16 hours of housework per week and σ = 3.5 hours
a. i. The interval 12.5 to 19.5 hours is µ – σ to µ + σ. Hence, approximately 68% of all men in the
U.S. spent 12.5 to 19.5 hours per week on housework in 2002.
ii. The interval 9 to 23 hours is µ –2σ to µ + 2σ. Hence, approximately 95% of all men in the U.S.
spent 9 to 23 hours per week on housework in 2002.
b. The interval that contains 99.7% of U.S. men’s housework hours is µ –3σ to µ + 3σ. Hence, this
interval is 16 – 3(3.5) to 16 + 3(3.5) or 5.5 to 26.5 hours of housework per week.
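Python: The empirical-rule intervals are just µ ± kσ for k = 1, 2, 3. A short sketch, assuming the Exercise 3.73 figures (µ = 16, σ = 3.5).

mu, sigma = 16, 3.5
coverage = {1: "about 68%", 2: "about 95%", 3: "about 99.7%"}

for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    print(k, low, high, coverage[k])
# 1: 12.5 to 19.5 hours, 2: 9.0 to 23.0 hours, 3: 5.5 to 26.5 hours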
3.75
To find the three quartiles:
1. Rank the given data set in increasing order.
2. Find the median by the procedure in Section 3.1.2. The median is the second quartile, Q2.
3. The first quartile, Q1, is the value of the middle term among the (ranked) observations that are less
than Q2.
4. The third quartile, Q3, is the value of the middle term among the (ranked) observations that are greater than Q2.
Examples 3–20 and 3–21 of the text exhibit how to calculate the three quartiles for data sets with an even and odd number of observations, respectively.
3.77
Given a data set of n values, to find the kth percentile (Pk):
1. Rank the given data in increasing order.
2. Calculate kn/100. Then Pk is approximately the value of the term at position kn/100 in the ranking. If kn/100 falls between two consecutive integers a and b, it may be necessary to average the ath and bth values in the ranking to obtain Pk; a sketch of this rule in code is given below.
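Python: The ranking rule above can be written directly in code. A minimal sketch with a hypothetical helper name; note that library routines (numpy.percentile, Excel's PERCENTILE) use interpolation rules that may give slightly different answers, and the solutions sometimes round kn/100 to the nearest whole term instead of averaging.

def kth_percentile(values, k):
    # Rank the data, locate position k*n/100, and average the a-th and b-th
    # ranked terms when that position falls between the integers a and b.
    data = sorted(values)
    pos = k * len(data) / 100
    if pos == int(pos):
        return data[int(pos) - 1]
    a, b = int(pos), int(pos) + 1
    return (data[a - 1] + data[b - 1]) / 2

data = [3, 5, 6, 6, 7, 9, 9, 10, 11, 12, 14, 15]   # Exercise 3.83 data
print(kth_percentile(data, 55))   # position 6.6 -> average of the 6th and 7th terms = 9.0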
3.79
The ranked data are:
5 5 7 8 8 9 10 10 11 11 12 14 18 21 25
a. The three quartiles are: Q1 = 8, Q2 = 10, and Q3 = 14
IQR = Q3 – Q1 = 14 – 8 = 6
TI-83: Enter the data in L1 (as shown in Chapter 1). Then press STAT, then highlight CALC and
1: (1- Var Stats) and press ENTER. Now press 2ND, then the number 1, and finally press ENTER.
1-Var Stats (first screen): x̄ = 11.6, Σx = 174, Σx² = 2480, Sx = 5.742075284, σx = 5.54737175, n = 15
1-Var Stats (second screen): n = 15, minX = 5, Q1 = 8, Med = 10, Q3 = 14, maxX = 25
On the second screen, the entry Q1 gives the first quartile, Med gives the second quartile (the median), and Q3 gives the third quartile. By subtracting Q1 from Q3 you obtain the interquartile range.
MINITAB: Enter the data in the spreadsheet (as shown in Chapter 1). Then select STAT, Basic Statistics, and Display Descriptive Statistics. The variable is labeled Pounds.
Highlight the variable name and click on it. Once it appears in the box marked VARIABLES: then
click on STATISTICS. Now make sure there is a check mark beside first quartile, median, third
quartile, and interquartile range. Note: you can eliminate the check marks beside all the others if
desired. Below each title is the appropriate statistic for this data set.
Excel: Enter the data in the spreadsheet (as shown in Chapter 1). Then select INSERT and
FUNCTION. For the first quartile, select Quartile, then insert the cell range like (A1:A15) which is
called array in this menu, and then insert the number 1. For the median or second quartile, select
Quartile, then insert the array, and then type the number 2. For the third quartile, select Quartile,
then insert the array, and then press the number 3. For the interquartile range simply subtract Q1
from Q3. Note: It is a good practice to identify your calculations and in this instance the names
appear to the left of each for clarity.
Caution: In some instances, because of the way the function is programmed and the number of values in the series, Excel's calculation of the quartiles, and thus of the interquartile range, is WRONG. Unfortunately, this is one of those instances: in this example Q3, and therefore the IQR, are incorrect!
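Python: Given this caution, it can help to script the text's own quartile rule (the median of the lower and upper halves of the ranked data). A sketch with a hypothetical helper, assuming the Exercise 3.79 data; built-in routines such as statistics.quantiles interpolate differently and may not match.

from statistics import median

def textbook_quartiles(values):
    # Q2 is the median; Q1 and Q3 are the medians of the observations below and
    # above Q2 (the middle term itself is excluded when n is odd).
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

data = [5, 5, 7, 8, 8, 9, 10, 10, 11, 11, 12, 14, 18, 21, 25]
q1, q2, q3 = textbook_quartiles(data)
print(q1, q2, q3, q3 - q1)   # 8 10 14 6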
b. kn/100 = 82(15) / 100 = 12.30 ≈ 12
Thus, the 82nd percentile can be approximated by the value of the 12th term in the ranked data, which is 14. Therefore, P82 = 14.
MINITAB: Enter the data in the spreadsheet (as shown in Chapter 1). Then select STAT, Reliability/Survival, Distribution Analysis, and Parametric Analysis. The variable is labeled Pounds. Highlight the variable name and click on it. Once it appears in the box marked VARIABLES, click on ESTIMATE and enter the percentile as a number, not a decimal, in the Estimate Percentiles box. The result appears in the Session window.
Excel: Enter the data in the spreadsheet (as shown in Chapter 1). Then select INSERT and
FUNCTION. For the percentile, select Percentile, then insert the cell range like (A1:A15) which is
called array in this menu, and then insert the percentage in decimal form.
c. Six values in the given data are smaller than 10. Hence, percentile rank of 10 = (6/15) × 100 = 40%
3.81
The ranked data are:
41 42 43 44 44 45 46 46 47 47 48 48
48 49 50 50 51 51 52 52 52 53 53 54 56
a. The three quartiles are: Q1 = (45 + 46) / 2 = 45.5, Q2 = 48, and Q3 = (52 + 52) / 2 = 52
IQR = Q3 – Q1 = 52 – 45.5 =6.5
b. kn/100 = 53(25)/100 = 13.25 ≈ 13
Thus, the 53rd percentile can be approximated by the value of the thirteenth term in the ranked data, which is 48. Therefore, P53 = 48.
c. Fourteen values in the given data are less than 50. Therefore,
Percentile rank of 50 =(14/25) × 100 = 56%
3.83
The ranked data are: 3 5 6 6 7 9 9 10 11 12 14 15
a. The three quartiles are:
Q1 = (6+6)/2 = 6, Q2 = (9 +9)/ 2 = 9, and Q3 = (11 +12) /2 = 11.5
IQR = Q3 – Q1 = 11.5 – 6 = 5.5
The value 10 lies between Q2 and Q3, which means it lies in the third 25% group from the bottom in
the ranked data set.
b. kn/100 = 55(12)/100 = 6.6
Thus, the 55th percentile can be approximated by the average of the sixth and seventh terms in the ranked data. Therefore, P55 = (9 + 9)/2 = 9.
c. Four values in the given data set are less than 7. Hence, percentile rank of 7 is (4/12) × 100 =
33.33%.
3.85
The ranked data are:
3 3 4 5 5 6 7 7 8 8 8 9 9 10 10 11 11 12 12 16
a. The three quartiles are: Q1 = (5 +6) / 2 = 5.5, Q2 = (8+ 8)/ 2 = 8, and Q3 = (10 + 11) /2 = 10.5
IQR = Q3 – Q1 = 10.5 – 5.5 = 5
The value 4 lies below Q1, which indicates that it is in the bottom 25% of the values in the (ranked) data set.
b. kn/ 100 = 25(20) / 100 = 5
Thus, the 25th percentile may be approximated by the value of the fifth term in the ranked data,
which is 5. Therefore, P25 = 5. Thus, the number of new cars sold at this dealership is less than 5 for
approximately 25% of the days in this sample.
c. Thirteen values in the given data are less than 10. Hence, percentile rank of 10 = (13/20) × 100 =
65%. Thus, on 65% of the days in the sample, this dealership sold fewer than 10 cars.
3.87
A box–and–whisker plot is based on five summary measures: the median, the first quartile, the third
quartile, and the smallest and largest value in the data set between the lower and upper inner fences.
3.89
The ranked data are: 3 6 7 8 11 13 14 15 16 18 19 23 26 29 30 31 33 42 62 75
For the data,
Median = (18+19) /2 = 18.5, Q1 = (11 +13) / 2 = 12, and Q3 = (30+31) /2= 30.5,
IQR = Q3 – Q1 = 30.5 – 12 = 18.5, 1.5 x IQR = 1.5 x 18.5 = 27.75,
Lower inner fence = Q1 – 27.75 = 12 – 27.75= – 15.75,
Upper inner fence = Q3 + 27.75= 30.5 + 27.75= 58.25
The smallest and the largest values within the two inner fences are 3 and 42, respectively.
There are two outliers, 62 and 75. To classify them, we compute:
3.0 x IQR = 3.0 x 18.5 = 55.5. Hence, the upper outer fence is: Q3 + 55.5 = 86
Since 62 and 75 are both less than 86, they are within the upper outer fence and are called mild
outliers.
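Python: The fence arithmetic generalizes easily. An illustrative sketch that classifies mild outliers (beyond an inner fence but inside the outer fence) and extreme outliers (beyond an outer fence), using the Exercise 3.89 data and quartiles.

def classify_outliers(values, q1, q3):
    # Inner fences lie 1.5 * IQR beyond the quartiles, outer fences 3 * IQR beyond.
    iqr = q3 - q1
    lo_in, hi_in = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    lo_out, hi_out = q1 - 3.0 * iqr, q3 + 3.0 * iqr
    mild = [x for x in values if (lo_out <= x < lo_in) or (hi_in < x <= hi_out)]
    extreme = [x for x in values if x < lo_out or x > hi_out]
    return (lo_in, hi_in), mild, extreme

data = [3, 6, 7, 8, 11, 13, 14, 15, 16, 18, 19, 23, 26, 29, 30, 31, 33, 42, 62, 75]
inner, mild, extreme = classify_outliers(data, q1=12, q3=30.5)
print(inner)    # (-15.75, 58.25)
print(mild)     # [62, 75] -- both inside the upper outer fence at 86, so mild outliers
print(extreme)  # []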
MINITAB: Enter the data in spreadsheet (from Chapter 1). Then select Graph, Boxplot, Simple, and
OK. Next place the variable name or the column number in the Graph Variables box and click OK.
TI-83: Enter the data in L1 (as shown in Chapter 1). Then press 2nd, STAT PLOT, turn on Plot1 if it is off, and press the number 1. Now, in the second row under TYPE, highlight the first graph (you get there by pressing the right arrow four times), then press the down arrow to go to the word Mark and select your symbol for outliers. Finally, press ZOOM and use the down arrow to go to number nine, ZoomStat, and your box-and-whisker plot appears.
3.91
Median = 22,000, Q1 = 17,200, Q3 = 35,100, IQR = Q3 – Q1 = 35,100 – 17,200 = 17,900,
1.5 x IQR = 1.5 x 17,900 = 26,850,
Lower inner fence = Q1 – 26,850 = 17,200 – 26,850 = –9650,
Upper inner fence = Q3 + 26,850 = 35,100 + 26,850 = 61,950
The smallest and the largest values within the two inner fences are 9400 and 50,300, respectively. The
data set contains no outliers.
The data are skewed to the right.
3.93
Median = 48, Q1 = 45.5, Q3 = 52, IQR = Q3 – Q1 = 52 –45.5 = 6.5, 1.5 x IQR = 1.5 x 6.5 = 9.75,
Lower inner fence = Q1 – 9.75 = 45.5 – 9.75 = 35.75,
Upper inner fence = Q3 + 9.75 = 52 + 9.75 = 61.75
The smallest and largest values within the two inner fences are 41 and 56, respectively. There are no
outliers.
The data are skewed slightly to the right.
3.95
Median = 9, Q1 = 6, Q3 = 11.5, IQR = Q3 – Q1 = 11.5– 6 = 5.5, 1.5 x IQR = 1.5 x 5.5 = 8.25,
Lower inner fence = Q1 – 8.25 = 6 – 8.25 = –2.75,
Upper inner fence = Q3 + 8.25 = 11.5 + 8.25 = 19.75
The smallest and largest values within the two inner fences are 3 and 15, respectively. There are no
outliers.
The data are nearly symmetric.
3.97
Median = 7.5, Q1 = 5, Q3 = 9.5, IQR = Q3 – Q1 = 9.5 – 5 = 4.5, 1.5 x IQR = 1.5 x 4.5 = 6.75,
Lower inner fence = Q1 – 6.75 = – 1.75
and
Upper inner fence = Q3 + 6.75 = 16.25
The smallest and largest values within the two inner fences are 3 and 12, respectively. There are no
outliers.
The data are nearly symmetric.
3.99
a. x̄ = (Σx) / n = 1,471,311 / 10 = $147,131.10
Median = $147,195 and Mode = $125,000
b. Range = Largest value – Smallest value = 170,000 – 125,000 = $45,000
s² = [Σx² – (Σx)²/n] / (n – 1) = [218,975,790,881 – (1,471,311)²/10] / (10 – 1) = 277,798,334.3222
s = √277,798,334.3222 = $16,667.28
3.101
a. x̄ = (Σx) / n = 88 / 12 = 7.33 citations
(n + 1) /2 = (12 + 1) /2 = 6.5
Median = (7 + 8) /2 = 7.5 citations
Mode = 4, 7, and 8 citations
b. Range = Largest value – Smallest value = 14 – 0 = 14 citations
s² = [Σx² – (Σx)²/n] / (n – 1) = [834 – (88)²/12] / (12 – 1) = 17.1515
s = √17.1515 = 4.14 citations
c. The values of the summary measures in parts a and b are sample statistics because the data are
based on a sample of 12 drivers.
3.103
a. i. Each of the two values is $900 from µ = $1100. Hence, k = 900/300 = 3 and
1 – 1/k² = 1 – 1/(3)² = 1 – .11 = .89 or 89%.
Thus, at least 89% of households will have holiday expenditures between $200 and $2000.
ii. Each of the two values is $600 from µ = $1100. Hence, k = 600/300 = 2 and
1 – 1/k² = 1 – 1/(2)² = 1 – .25 = .75 or 75%.
Thus, at least 75% of households will have holiday expenditures between $500 and $1700.
b. 1 – 1/k² = .84 gives 1/k² = 1 – .84 = .16, or k² = 1/.16, so k = 2.5
The required interval is:
µ – kσ to µ + kσ = {1100 – 2.5(300)} to {1100 + 2.5(300)} = $350 to $1850
3.105
µ = $134,000 and σ = $12,000
a. i. The interval $98,000 to $170,000 is µ – 3σ to µ + 3σ. Thus, approximately 99.7% of CPAs
have salaries between $98,000 and $170,000.
ii. The interval $110,000 to $158,000 is µ – 2σ to µ + 2σ. Thus, approximately 95% of CPAs have
salaries between $110,000 and $158,000.
b. The interval that contains salaries of 68% of all such CPAs is µ – σ to µ + σ. Hence, this interval is:
(134,000 – 12,000) to (134,000 + 12,000) = $122,000 to $146,000.
3.107
The ranked data are:
0 3 4 4 7 7 8 8 9 11 13 14
a. The three quartiles are: Q1= (4+ 4)/ 2 = 4, Q2 = (7 +8)/ 2 = 7.5, Q3 = (9 + 11)/2 = 10
IQR = Q3 – Q1 = 10 – 4 = 6
The value 4 is equal to Q1, which indicates that approximately 25% of the values in the data set are
less than this value.
b. kn/100 = 70(12)/100 = 8.4. Thus, the 70th percentile may be approximated by the mean of the eighth and ninth terms in the ranked data. Therefore, P70 = (8 + 9)/2 = 8.5. Thus, approximately 70% of these drivers had fewer than 8.5 citations.
c. One value in the given data is less than 3. Hence, the percentile rank of 3 = (1 /12) × 100 = 8.33%.
Thus, 8.33% of these drivers had fewer than 3 citations.
3.109
Median = (11 + 12) /2 = 11.5, Q1 = 8 , Q3 = 16 , IQR = Q3 – Q1 = 16 – 8 = 8
1.5 x IQR = 1.5 x 8 = 12
Lower inner fence = Q1 – 12 = 8 – 12 = –4
Upper inner fence = Q3 + 12 = 16 + 12 = 28
The smallest and largest values in the data set within the two inner fences are 2 and 24, respectively.
The data are skewed to the right.
The values 33 and 42 are outliers. The upper outer fence is Q3 + 3(IQR) = 16 +3(8) = 40. Since 42
is greater than 40, it is an extreme outlier.
3.111
a. Let y = the amount that Jeffery suggests. Then, to ensure the outcome Jeffery wants, we need
[y + 12,000(5)] / 6 = 20,000
y + 12,000(5) = 6(20,000)
y + 60,000 = 120,000
y = 60,000
So Jeffery would have to suggest $60,000 be awarded to the plaintiff.
b. To prevent a juror like Jeffery from having an undue influence on the amount of damages to be awarded to the plaintiff, the jury could revise its procedure by throwing out any amounts that are outliers and then recalculating the mean, or by using the median, or by using a trimmed mean.
3.113
a. To calculate how much time the trip requires, divide miles driven by miles per hour for each 100
mile segment. Time = 100 / 52+ 100 / 65 +100/ 58= 1.92 + 1.54 + 1.72 = 5.18 hours.
b. Linda’s average speed for the 300-mile trip is not equal to (52 + 65 + 58) / 3 = 58.33 MPH. This
would assume that she spent an equal amount of time on each 100–mile segment, which is not true,
because her average speed is different on each segment. Linda’s average speed for the entire 300–
mile trip is given by (miles driven) / (elapsed time) = 300 / 5.18 = 57.92 MPH.
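Python: Part b is the classic case where averaging the three speeds is wrong; the correct average speed is total distance divided by total time. A brief sketch with the Exercise 3.113 numbers (exact segment times, so the results differ slightly from the hand-rounded 5.18 hours and 57.92 MPH above).

segment_miles = [100, 100, 100]
speeds_mph = [52, 65, 58]

total_time = sum(d / v for d, v in zip(segment_miles, speeds_mph))
average_speed = sum(segment_miles) / total_time

print(round(total_time, 2))     # 5.19 hours (5.18 when each segment time is rounded first)
print(round(average_speed, 2))  # 57.85 MPH (57.92 with the rounded times)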
3.115
a. Total amount spent per month by the 2000 shoppers = (14 × 8× 1100) + (18 × 11× 900) = $301,400
b. Total number of trips per month by the 2000 shoppers = (8 × 1100) + (11 × 900) = 18,700
Mean number of trips per month per shopper = 18,700/ 2000 = 9.35 trips
c. Total amount spent per month by the 2000 shoppers = $301,400, from part a
Mean amount spent per person per month = 301,400 / 2000 = $150.70
3.117
Total distance for the first 100 students = 100 x 8.73 = 873 miles
Total distance for all 103 students = 873 + 11.5 + 7.6 + 10.0 = 902.1 miles
Mean distance for all 103 students = 902.1 / 103 = 8.76 miles
3.119
a. Since we are dealing with a normal distribution and we know that 16% of all students scored above
85, which is µ + 15, we must also have that 16% of all students scored below µ – 15 = 55.
Therefore, the remaining 68% of students scored between 55 and 85. By the empirical rule, we
know that 68% of the scores fall in the interval (µ – σ) to (µ + σ), so we have µ – σ = 70 – σ = 55
and µ + σ = 70 + σ = 85. Thus, σ = 15.
b. We know that 95% of the scores are between 60 and 80 and that µ = 70. By the empirical rule, 95%
of the scores fall in the interval (µ – 2σ) to (µ + 2σ). So 60 = µ – 2σ = 70 – 2σ and 80 = µ + 2σ =70
+ 2σ. Therefore 10 = 2σ and so σ = 5.
3.121
a. Mean = $600.35, Median = $90, and Mode = $0, s = $2347.33,
Most of the losses are zero or near zero; however there is a relatively infrequent occurrence of very
large losses.
b. The mean is the largest.
c. Q1 = $0, Q3 = $272.50, IQR = $272.50, 1.5 x IQR = $408.75
Lower Inner fence is Q1 – 408.75 = 0 – 408.75 = –408.75
Upper Inner Fence is Q3 + 408.75 = 272.50 +408.75 = 681.25
The smallest and largest values within the two inner fences are 0 and 501, respectively. There are three outliers, at 1127, 3709, and 14,589.
The box–and–whisker plot and the histogram for the given data show that the data are skewed to the right.
d. Because the data are skewed to the right, the insurance company should use the mean when
considering the center of the data as it is more affected by the extreme values. The insurance
company would want to use a measure that takes into consideration the possibility of extremely
large losses. The standard deviation should be used as a measure of variation.
3.123
a. Since x̄ = (Σx)/n, we have: n = (Σx)/x̄ = 12,372 / 51.55 = 240 pieces of luggage.
b. Since x̄ = (Σx)/n, Σx = n(x̄). Thus, the total score for the seven students is 7 x 81 = 567. Let x = the seventh student’s score. Then x + 81 + 75 + 93 + 88 + 82 + 85 = 567. Hence, x + 504 = 567, so x = 567 – 504 = 63.
3.125
For all students: n = 44, Σx = 6597, and Σx² = 1,030,639
x̄ = (Σx)/n = 6597/44 = 149.93 pounds and Median = 147.50 pounds
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[1,030,639 – (6597)²/44] / (44 – 1)} = 31.0808 pounds
For men only: n = 22, Σx = 3848, and Σx² = 680,724
x̄ = (Σx)/n = 3848/22 = 174.91 pounds and Median = 179 pounds
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[680,724 – (3848)²/22] / (22 – 1)} = 19.1160 pounds
For women: n = 22, Σx = 2749, and Σx² = 349,915
x̄ = (Σx)/n = 2749/22 = 124.95 pounds and Median = 123 pounds
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[349,915 – (2749)²/22] / (22 – 1)} = 17.4778 pounds
In this case, the median may be more informative than the mean, since it is less influenced by
extremely high or low weights.
As one might expect, the mean and median weights for men are higher than those of women. For the
entire group, the mean and median weights are about midway between the corresponding values for
men and women.
The standard deviations are roughly the same for men and women. The standard deviation for the
whole group is much larger than for men or women only, due to the fact that it includes the lower
weights of women and the heavier weights of men.
3.127
The data, ranked in increasing order, are: 3 6 9 10 11 12 15 15 18 21 25 26 38 41 62
a. x̄ = 20.80 thousand miles, Median = 15 thousand miles, Mode = 15 thousand miles
b. Range = 59 thousand miles, s² = 249.03, s = 15.78 thousand miles
c. Q1 = 10 thousand miles and Q3 = 26 thousand miles
d. IQR = Q3 – Q1 = 26 – 10 = 16 thousand miles
Since the interquartile range is based on the middle 50% of the observations it is not affected by
outliers. The standard deviation, however, is strongly affected by outliers. Thus, the inter–quartile
range is preferable in applications in which a measure of variation is required that is unaffected by
extreme values.
Self–Review Test for Chapter Three
1. b   2. a and d   3. c   4. c   5. b   6. b   7. a
8. a   9. b   10. a   11. b   12. c   13. a   14. a
15.
For the given data: n = 10, Σx = 109, and Σx² = 1775
x̄ = (Σx)/n = 109/10 = 10.9
(n + 1)/2 = (10 + 1)/2 = 5.5
Median = (7 + 9)/2 = 8
Mode = 6
Range = Largest value – Smallest value = 28 – 2 = 26
s² = [Σx² – (Σx)²/n] / (n – 1) = [1775 – (109)²/10] / (10 – 1) = 65.2111
s = √65.2111 = 8.08
16.
Suppose the 2002 gross sales (in millions of dollars) of six companies are:
1.2 1.9 .5 2.1 3.4 110.5
Then, Σx = 1.2 + 1.9 + .5 + 2.1 + 3.4 + 110.5 = 119.6
Mean = (Σx)/n = 119.6 / 6 = $19.93 million
The value of $110.5 million is an outlier. When we drop it:
Σx = 1.2 + 1.9 + .5 + 2.1 + 3.4 = 9.1
Mean = (Σx)/n = 9.1 / 5 = $1.82 million
Thus, when we drop the value of 110.5 million, which is an outlier, the value of the mean decreases
from $19.93 million to $1.82 million.
17.
Reconsider the data on the 2002 gross sales (in millions of dollars) of six companies given in Problem
16, which are reproduced below.
1.2 1.9 .5 2.1 3.4 110.5
Then, Range = Largest value – Smallest value = 110.5 – .5 = $110 million
When we drop the value of $110.5 million, which is an outlier: Range = 3.4 – .5 = $2.9 million
Thus, when we drop the value of $110.5 million, which is an outlier, the value of the range decreases
from $110 million to $2.9 million.
18.
The value of the standard deviation is zero when all the values in a data set are the same. For example,
suppose the heights (in inches) of five women are: 67 67 67 67 67
This data set has no variation. As shown below, the value of the standard deviation is zero for this data set. For these data: n = 5, Σx = 335, and Σx² = 22,445.
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[22,445 – (335)²/5] / (5 – 1)} = √(0/4) = 0

19.
a. i. Each of the two values is 5.5 years from µ = 7.3 years. Hence, k = 5.5 / 2.2 = 2.5 and
1 – 1/k² = 1 – 1/(2.5)² = 1 – .16 = .84 or 84%
Thus, at least 84% of the cars are 1.8 to 12.8 years old.
ii. Each of the two values is 6.6 years from µ = 7.3 years. Hence, k = 6.6 / 2.2 = 3 and
1 – 1/k² = 1 – 1/(3)² = 1 – .11 = .89 or 89%
Thus, at least 89% of the cars are .7 to 13.9 years old.
b. 1 – 1/k² = .75 gives 1/k² = 1 – .75 = .25, or k² = 1/.25, so k = 2
Thus, the required interval is µ – kσ to µ + kσ = {7.3 – 2(2.2)} to {7.3 + 2(2.2)} = 2.9 to 11.7 years.
20.
a. µ = 7.3 years and σ = 2.2 years
i. The interval 5.1 to 9.5 years is µ – σ to µ + σ. Hence, approximately 68% of the cars are 5.1 to 9.5 years old.
ii. The interval .7 to 13.9 years is µ – 3σ to µ + 3σ. Hence, approximately 99.7% of the cars are .7 to 13.9 years old.
b. The interval that contains ages of 95% of the cars will be µ – 2σ to µ + 2σ. Hence, this interval is: µ
– 2σ to µ + 2σ = {7.3 – 2(2.2)} to {7.3 + 2(2.2)} = 2.9 to 11.7 years. Thus, approximately 95% of
the cars are 2.9 to 11.7 years old.
21.
The ranked data are: 0 1 2 3 4 5 7 8 10 11 12 13 14 15 20
a.
The three quartiles are: Q1 = 3, Q2 = 8, and Q3 = 13. IQR = Q3 – Q1 = 13 – 3 = 10. The value 4 lies
between Q1 and Q2, which indicates that this value is in the second 25% group from the bottom in the ranked data.
b.
kn/100 = 60(15)/100 = 9. Thus, the 60th percentile may be represented by the value of the ninth term in the ranked data, which is 10. Therefore, P60 = 10. Thus, approximately 60% of the half-hour time periods had fewer than 10 passengers set off the metal detectors during this day.
c.
Ten values in the given data are less than 12. Hence, percentile rank of 12 = (10/15) × 100 =
66.67%. Thus, 66.67% of the half hour time periods had fewer than 12 passengers set off the metal
detectors during this day.
22.
The ranked data are:
0 1 2 3 4 5 7 8 10 11 12 13 14 15 20
Q1 = 3, Q2 = 8, and Q3 = 13. IQR = Q3 – Q1 = 13 – 3 = 10, 1.5 × IQR= 1.5 × 10 =15
Lower inner fence = Q1 – 15 = 3 – 15 = –12
Upper inner fence = Q3 + 15 = 13 + 15 = 28
The smallest and largest values in the data set within the two inner fences are 0 and 20, respectively.
The data set does not contain any outliers.
The data are skewed slightly to the right.
23.
From the given information: n₁ = 15, n₂ = 20, x̄₁ = $435, x̄₂ = $490
x̄ = (n₁x̄₁ + n₂x̄₂) / (n₁ + n₂) = [(15)(435) + (20)(490)] / (15 + 20) = 16,325 / 35 = $466.43

24.
Sum of the GPAs of five students = 5 × 3.21 = 16.05
Sum of the GPAs of four students = 3.85 + 2.67 + 3.45 + 2.91 = 12.88
GPA of the fifth student = 16.05 – 12.88 = 3.17
25.
The ranked data are: 58 149 163 166 179 193 207 238 287 2534
Thus, to find the 10% trimmed mean, we drop the smallest value and the largest value (10% of 10 is 1)
and find the mean of the remaining 8 values. For these 8 values,
x = 149 + 163 + 166 + 179 + 193 + 207 + 238 + 287 = 1582
10% trimmed mean = (x) / 8 = 1582 / 8 = $197.75 thousand = $197,750. The 10% trimmed mean is a
better summary measure for these data than the mean of all 10 values because it eliminates the effect of
the outliers, 58 and 2534.
26.
a. For Data Set I: x̄ = (Σx)/n = 79/4 = 19.75
For Data Set II: x̄ = (Σx)/n = 67/4 = 16.75
The mean of Data Set II is smaller than the mean of Data Set I by 3.
b. For Data Set I: Σx = 79, Σx² = 1945, and n = 4
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[1945 – (79)²/4] / (4 – 1)} = 11.32
c. For Data Set II: Σx = 67, Σx² = 1507, and n = 4
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[1507 – (67)²/4] / (4 – 1)} = 11.32
The standard deviations of the two data sets are equal.