Chapter Three

3.1 For a data set with an odd number of observations, first we rank the data set in increasing (or decreasing) order and then find the value of the middle term. This value is the median. For a data set with an even number of observations, first we rank the data set in increasing (or decreasing) order and then find the average of the two middle terms. This average is the median.

3.3 Suppose the 2002 sales (in millions of dollars) of five companies are: 10, 21, 14, 410, and 8. The mean for the data set is:
Mean = (10 + 21 + 14 + 410 + 8) / 5 = $92.60 million
Now, if we drop the outlier (410), the mean is:
Mean = (10 + 21 + 14 + 8) / 4 = $13.25 million
This shows how an outlier can affect the value of the mean.

3.5 The mode can assume more than one value for a data set. Examples 3–8 and 3–9 of the text present such cases.

3.7 For a symmetric histogram (with one peak), the values of the mean, median, and mode are all equal. Figure 3.2 of the text shows this case. For a histogram that is skewed to the right, the value of the mode is the smallest and the value of the mean is the largest. The median lies between the mode and the mean. Such a case is presented in Figure 3.3 of the text. For a histogram that is skewed to the left, the value of the mean is the smallest, the value of the mode is the largest, and the value of the median lies between the mean and the mode. Figure 3.4 of the text exhibits this case.

3.9 Σx = 5 + (–7) + 2 + 0 + (–9) + 16 + 10 + 7 = 24
µ = Σx / N = 24 / 8 = 3
(N + 1) / 2 = (8 + 1) / 2 = 4.5
Median = value of the 4.5th term in ranked data = (2 + 5) / 2 = 3.50
This data set has no mode.

TI-83: Enter the data in L1 (as shown in Chapter 1). Then press STAT, highlight CALC and 1: (1-Var Stats), and press ENTER. Now press 2ND, then the number 1, and finally press ENTER.
First screen:
1-Var Stats
x̄ = 3
Σx = 24
Σx² = 564
Sx = 8.383657572
σx = 7.842193571
↓n = 8

Second screen:
1-Var Stats
↑n = 8
minX = –9
Q1 = –3.5
Med = 3.5
Q3 = 8.5
maxX = 16

On the first screen, the second row, which starts with x̄, is the mean. The arrow indicates additional information below, so pressing the down arrow until the end gives you the second screen, where Med= gives the median. The mode needs to be identified by hand.

MINITAB: Enter the data in the spreadsheet (as shown in Chapter 1). Then select STAT, Basic Statistics, and Display Descriptive Statistics. Since this population variable did not have a name, it is labeled Data. Highlight the variable name and click on it. Once it appears in the box marked VARIABLES:, click on STATISTICS. Now make sure there is a check mark beside Mean and Median. Note: you can remove the check marks beside all the others if desired. Below the title Mean is the mean and below the title Median is the median of this data set.

Excel: Enter the data in the spreadsheet (as shown in Chapter 1). Then select INSERT and FUNCTION. For the mean, select AVERAGE and then insert the cell range. If you use column A, it is something like (A1:A8). For the median, select MEDIAN and then insert the cell range. If you use column A, it is something like (A1:A8). Note: in the second calculation make sure you DO NOT include the result of the first calculation in your cell range. It is also good practice to label your calculations; in this instance the names appear after each result for clarity.

Mann – Introductory Statistics, Fifth Edition, Student Solutions Manual

3.11 x̄ = Σx / n = 9907 / 12 = $825.58
(n + 1) / 2 = (12 + 1) / 2 = 6.5
Median = value of the 6.5th term in the ranked data set = (769 + 798) / 2 = $783.50

3.13 x̄ = Σx / n = 16.682 / 12 = $1.390
(n + 1) / 2 = (12 + 1) / 2 = 6.5
Median = (1.351 + 1.360) / 2 = $1.356
This data set has no mode.
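As a cross-check on the hand computations in Exercise 3.9, the mean, median, and mode can also be reproduced in a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative and not part of the text, and the mode rule (a value counts as a mode only if it occurs more than once) is the convention the solutions above appear to follow.

```python
import statistics
from collections import Counter

# Data from Exercise 3.9, treated as a population of N = 8 values.
data = [5, -7, 2, 0, -9, 16, 10, 7]

mean = statistics.mean(data)      # sum of values divided by N
median = statistics.median(data)  # average of the two middle ranked values when N is even

# A value is reported as a mode only if it occurs more than once;
# otherwise the data set is said to have no mode.
counts = Counter(data)
top = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == top) if top > 1 else []

print(mean, median, modes)
```

For this data set the script agrees with the values found by hand: a mean of 3, a median of 3.5, and an empty list of modes.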
3.15 x̄ = Σx / n = 85.81 / 12 = $7.15
(n + 1) / 2 = (12 + 1) / 2 = 6.5
Median = (6.99 + 7.03) / 2 = $7.01

3.17 µ = Σx / N = 35,629 / 6 = $5938.17 thousand
(N + 1) / 2 = (6 + 1) / 2 = 3.5
Median = (750 + 8500) / 2 = $4625 thousand
This data set has no mode because no value appears more than once.

3.19 x̄ = Σx / n = 64 / 10 = 6.40 hours
(n + 1) / 2 = (10 + 1) / 2 = 5.5
Median = (7 + 7) / 2 = 7 hours
Mode = 0 and 7 hours

3.21 x̄ = Σx / n = 294 / 10 = 29.4 computer terminals
(n + 1) / 2 = (10 + 1) / 2 = 5.5
Median = (28 + 29) / 2 = 28.5 computer terminals
Mode = 23 computer terminals

3.23 a. x̄ = Σx / n = 257 / 13 = 19.77 newspapers
(n + 1) / 2 = (13 + 1) / 2 = 7
Median = 12 newspapers
b. Yes, 92 is an outlier. When we drop this value,
Mean = 165 / 12 = 13.75 newspapers
(n + 1) / 2 = (12 + 1) / 2 = 6.5
Median = (11 + 12) / 2 = 11.5 newspapers
As we observe, the mean is affected more by the outlier.
c. The median is a better measure because it is not sensitive to outliers.

3.25 From the given information: n1 = 10, n2 = 8, x̄1 = $95, x̄2 = $104
x̄ = (n1 x̄1 + n2 x̄2) / (n1 + n2) = [(10)(95) + (8)(104)] / (10 + 8) = 1782 / 18 = $99

3.27 Total money spent by 10 persons = Σx = n x̄ = 10(85.50) = $855

3.29 Sum of the ages of six persons = 6 × 46 = 276 years, so the age of the sixth person = 276 – (57 + 39 + 44 + 51 + 37) = 48 years.

3.31 For Data Set I: Mean = 123 / 5 = 24.60
For Data Set II: Mean = 158 / 5 = 31.60
The mean of the second data set is greater than the mean of the first data set by 7.

3.33 The ranked data are: 19 23 26 31 38 39 47 49 53 67
By dropping 19 and 67, we obtain: Σx = 23 + 26 + 31 + 38 + 39 + 47 + 49 + 53 = 306
10% Trimmed Mean = Σx / n = 306 / 8 = 38.25 years

MINITAB: Enter the data in the spreadsheet (as shown in Chapter 1). Then select STAT, Basic Statistics, and Display Descriptive Statistics. Since this population variable did not have a name, it is labeled Data. Highlight the variable name and click on it. Once it appears in the box marked VARIABLES:, click on STATISTICS.
Now make sure there is a check mark beside trimmed mean. Note: you can remove the check marks beside all the others if desired. Below the title Trimmed Mean is the trimmed mean, which is technically the 5% trimmed mean but in this example is identical to the 10% trimmed mean because of the number of elements in the population. In the accompanying screenshot, the mean and median were included to show the difference among the three measures in this instance.

3.35 From the given information: x1 = 73, x2 = 67, x3 = 85, w1 = w2 = 1, w3 = 2
Weighted mean = Σxw / Σw = [(1)(73) + (1)(67) + (2)(85)] / 4 = 310 / 4 = 77.5

3.37 Suppose the monthly incomes of five families are: $1445 $2310 $967 $3195 $24,500
Then, Range = Largest value – Smallest value = 24,500 – 967 = $23,533
Now, if we drop the outlier ($24,500) and calculate the range, then:
Range = Largest value – Smallest value = 3195 – 967 = $2228
Thus, when we drop the outlier ($24,500), the range decreases from $23,533 to $2228. This exhibits the sensitivity of the range to outliers.

3.39 The value of the standard deviation is zero when all values in a data set are the same. For example, suppose the scores of a sample of six students in an examination are: 87 87 87 87 87 87
As this data set has no variation, the value of the standard deviation is zero for these observations. This is shown below.
Σx = 522 and Σx² = 45,414
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[45,414 – (522)²/6] / (6 – 1)} = √[(45,414 – 45,414) / 5] = 0

3.41 Range = Largest value – Smallest value = 16 – (–9) = 25
Σx = 24, Σx² = 564, and N = 8
σ² = [Σx² – (Σx)²/N] / N = [564 – (24)²/8] / 8 = (564 – 72) / 8 = 61.5
σ = √61.5 = 7.84

TI-83: Enter the data in L1 (as shown in Chapter 1). Then press STAT, highlight CALC and 1: (1-Var Stats), and press ENTER. Now press 2ND, then the number 1, and finally press ENTER.
First screen:
1-Var Stats
x̄ = 3
Σx = 24
Σx² = 564
Sx = 8.383657572
σx = 7.842193571
↓n = 8

Second screen:
1-Var Stats
↑n = 8
minX = –9
Q1 = –3.5
Med = 3.5
Q3 = 8.5
maxX = 16

On the first screen, the sixth row, which starts with σx, is the population standard deviation. (Note: if this were a sample we would use Sx, as it is the sample standard deviation.) Square the population standard deviation to get the population variance. The arrow indicates additional information below, so pressing the down arrow until the end gives you the second screen, where the smallest data point is named minX and the largest is named maxX. By subtracting minX from maxX you obtain the range.

MINITAB: Enter the data in the spreadsheet (as shown in Chapter 1). Then select STAT, Basic Statistics, and Display Descriptive Statistics. Since this population variable did not have a name, it is labeled Data. Highlight the variable name and click on it. Once it appears in the box marked VARIABLES:, click on STATISTICS. Now make sure there is a check mark beside sum, sum of squares, standard deviation, variance, and range. Note: you can remove the check marks beside all the others if desired. Below each title is the appropriate statistic for this data set. CAUTION: Do not use the standard deviation reported here; it is calculated for a sample and NOT a population. Instead use the sum and the sum of squares to calculate the population variance and the population standard deviation.

Excel: Enter the data in the spreadsheet (as shown in Chapter 1). Then select INSERT and FUNCTION. For the range, select MAX and then insert the cell range. If you use column A, it is something like (A1:A8). Next type a minus sign, select MIN, insert the cell range, and finally press ENTER. For the population variance, select VARP and then insert the cell range. For the population standard deviation, select STDEVP and then insert the cell range. Note: in the calculations make sure you DO NOT include the results of previous calculations in your cell range.
It is also good practice to label your calculations; in this instance the names appear to the left of each result for clarity.

3.43 a. x̄ = Σx / n = 72 / 8 = 9 shoplifters caught

Shoplifters caught    Deviations from the mean
7                     7 – 9 = –2
10                    10 – 9 = 1
8                     8 – 9 = –1
3                     3 – 9 = –6
15                    15 – 9 = 6
12                    12 – 9 = 3
6                     6 – 9 = –3
11                    11 – 9 = 2
                      Sum = 0

The sum of the deviations from the mean is zero.
b. Range = Largest value – Smallest value = 15 – 3 = 12
Σx = 72, Σx² = 748, and n = 8
s² = [Σx² – (Σx)²/n] / (n – 1) = [748 – (72)²/8] / (8 – 1) = 14.2857 and s = √14.2857 = 3.78

3.45 Σx = 81, Σx² = 699, and n = 12
Range = Largest value – Smallest value = 15 – 2 = 13 thefts
s² = [Σx² – (Σx)²/n] / (n – 1) = [699 – (81)²/12] / (12 – 1) = 13.8409 and s = √13.8409 = 3.72 thefts

3.47 Range = Largest value – Smallest value = 41 – 14 = 27 pieces
s² = [Σx² – (Σx)²/n] / (n – 1) = [9171 – (291)²/10] / (10 – 1) = 78.1 and s = √78.1 = 8.84 pieces

3.49 Range = Largest value – Smallest value = 25 – 5 = 20 pounds
s² = [Σx² – (Σx)²/n] / (n – 1) = [2480 – (174)²/15] / (15 – 1) = 32.9714 and s = √32.9714 = 5.74 pounds

3.51 Range = Largest value – Smallest value = 23 – (–7) = 30° Fahrenheit
s² = [Σx² – (Σx)²/n] / (n – 1) = [1552 – (80)²/8] / (8 – 1) = 107.4286 and s = √107.4286 = 10.36° Fahrenheit

3.53 Range = Largest value – Smallest value = 83.4 – 31.2 = $52.2 billion in sales
s² = [Σx² – (Σx)²/n] / (n – 1) = [23,615.58 – (459.6)²/10] / (10 – 1) = 276.9293 and s = √276.9293 = $16.64 billion

3.55 From the given data: Σx = 96, Σx² = 1152, and n = 8
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[1152 – (96)²/8] / (8 – 1)} = √[(1152 – 1152) / 7] = 0
The standard deviation is zero because all these data values are the same and there is no variation among them.

3.57 For the yearly salaries of all employees: CV = (σ / µ) × 100% = (3820 / 42,350) × 100% = 9.02%
For the years of schooling of these employees: CV = (σ / µ) × 100% = (2 / 15) × 100% = 13.33%
The relative variation in salaries is lower than that in years of schooling.

MINITAB: Note that under Display Descriptive Statistics, you can select coefficient of variation for a sample.
However, in this instance we do not have the elements of the sample but only the mean and standard deviation, so it is much easier to do the calculation by hand.

3.59 For Data Set I: Σx = 123, Σx² = 3883, and n = 5
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[3883 – (123)²/5] / (5 – 1)} = √214.300 = 14.64
For Data Set II: Σx = 158, Σx² = 5850, and n = 5
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[5850 – (158)²/5] / (5 – 1)} = √214.300 = 14.64
The standard deviations of the two data sets are equal.

3.61 Chebyshev's theorem is applied to find the area under a distribution curve between two points that are on opposite sides of the mean and at the same distance from the mean. According to this theorem, for any number k greater than 1, at least (1 – 1/k²) of the data values lie within k standard deviations of the mean.

3.63 For the interval x̄ ± 2s: k = 2, and 1 – 1/k² = 1 – 1/(2)² = 1 – .25 = .75 or 75%. Thus, at least 75% of the observations fall in the interval x̄ ± 2s.
For the interval x̄ ± 2.5s: k = 2.5, and 1 – 1/k² = 1 – 1/(2.5)² = 1 – .16 = .84 or 84%. Thus, at least 84% of the observations fall in the interval x̄ ± 2.5s.
For the interval x̄ ± 3s: k = 3, and 1 – 1/k² = 1 – 1/(3)² = 1 – .11 = .89 or 89%. Thus, at least 89% of the observations fall in the interval x̄ ± 3s.

3.65 Approximately 68% of the observations fall in the interval µ ± 1σ, approximately 95% fall in the interval µ ± 2σ, and about 99.7% fall in the interval µ ± 3σ.

3.67 a. Each of the two values is 40 minutes from µ = 220. Hence, k = 40 / 20 = 2 and 1 – 1/k² = 1 – 1/(2)² = 1 – .25 = .75 or 75%.
Thus, at least 75% of the runners ran the race in 180 to 260 minutes.
b. Each of the two values is 60 minutes from µ = 220. Hence, k = 60 / 20 = 3 and 1 – 1/k² = 1 – 1/(3)² = 1 – .11 = .89 or 89%.
Thus, at least 89% of the runners ran the race in 160 to 280 minutes.
c. Each of the two values is 50 minutes from µ = 220. Hence, k = 50 / 20 = 2.5 and 1 – 1/k² = 1 – 1/(2.5)² = 1 – .16 = .84 or 84%.
Thus, at least 84% of the runners ran this race in 170 to 270 minutes.

3.69 µ = $8367 and σ = $2400
a. i. Each of the two values is $4800 from µ = $8367. Hence, k = 4800 / 2400 = 2 and 1 – 1/k² = 1 – 1/(2)² = 1 – .25 = .75 or 75%.
Thus, at least 75% of all households have credit card debt between $3567 and $13,167.
ii. Each of the two values is $6000 from µ = $8367. Hence, k = 6000 / 2400 = 2.5 and 1 – 1/k² = 1 – 1/(2.5)² = 1 – .16 = .84 or 84%.
Thus, at least 84% of all households have credit card debt between $2367 and $14,367.
b. 1 – 1/k² = .89 gives 1/k² = 1 – .89 = .11, or k² = 1/.11, so k ≈ 3.
µ – 3σ = 8367 – 3(2400) = $1167 and µ + 3σ = 8367 + 3(2400) = $15,567
Thus, the required interval is $1167 to $15,567.

3.71 µ = 44 months and σ = 3 months.
a. The interval 41 to 47 months is µ – σ to µ + σ. Hence, approximately 68% of the batteries have a life of 41 to 47 months.
b. The interval 38 to 50 months is µ – 2σ to µ + 2σ. Hence, approximately 95% of the batteries have a life of 38 to 50 months.
c. The interval 35 to 53 months is µ – 3σ to µ + 3σ. Hence, approximately 99.7% of the batteries have a life of 35 to 53 months.

3.73 µ = 16 hours of housework per week and σ = 3.5 hours
a. i. The interval 12.5 to 19.5 hours is µ – σ to µ + σ. Hence, approximately 68% of all men in the U.S. spent 12.5 to 19.5 hours per week on housework in 2002.
ii. The interval 9 to 23 hours is µ – 2σ to µ + 2σ. Hence, approximately 95% of all men in the U.S. spent 9 to 23 hours per week on housework in 2002.
b. The interval that contains 99.7% of U.S. men's housework hours is µ – 3σ to µ + 3σ. Hence, this interval is 16 – 3(3.5) to 16 + 3(3.5), or 5.5 to 26.5 hours of housework per week.

3.75 To find the three quartiles:
1. Rank the given data set in increasing order.
2. Find the median by the procedure in Section 3.1.2. The median is the second quartile, Q2.
3. The first quartile, Q1, is the value of the middle term among the (ranked) observations that are less than Q2.
4.
The third quartile, Q3, is the value of the middle term among the (ranked) observations that are greater than Q2.
Examples 3–20 and 3–21 of the text show how to calculate the three quartiles for data sets with an even and an odd number of observations, respectively.

3.77 Given a data set of n values, to find the kth percentile (Pk):
1. Rank the given data in increasing order.
2. Calculate kn/100. Then Pk is approximately the value of the (kn/100)th term in the ranking. If kn/100 falls between two consecutive integers a and b, it may be necessary to average the ath and bth values in the ranking to obtain Pk.

3.79 The ranked data are: 5 5 7 8 8 9 10 10 11 11 12 14 18 21 25
a. The three quartiles are: Q1 = 8, Q2 = 10, and Q3 = 14
IQR = Q3 – Q1 = 14 – 8 = 6

TI-83: Enter the data in L1 (as shown in Chapter 1). Then press STAT, highlight CALC and 1: (1-Var Stats), and press ENTER. Now press 2ND, then the number 1, and finally press ENTER.

First screen:
1-Var Stats
x̄ = 11.6
Σx = 174
Σx² = 2480
Sx = 5.742075284
σx = 5.54737175
↓n = 15

Second screen:
1-Var Stats
↑n = 15
minX = 5
Q1 = 8
Med = 10
Q3 = 14
maxX = 25

On the second screen, the fourth row, which starts with Q1, is the first quartile. The second quartile is just below it, labeled Med, and the third quartile is in the next row, labeled Q3. By subtracting Q1 from Q3 you obtain the interquartile range.

MINITAB: Enter the data in the spreadsheet (as shown in Chapter 1). Then select STAT, Basic Statistics, and Display Descriptive Statistics. The variable is labeled Pounds. Highlight the variable name and click on it. Once it appears in the box marked VARIABLES:, click on STATISTICS. Now make sure there is a check mark beside first quartile, median, third quartile, and interquartile range. Note: you can remove the check marks beside all the others if desired. Below each title is the appropriate statistic for this data set.
Excel: Enter the data in the spreadsheet (as shown in Chapter 1). Then select INSERT and FUNCTION. For the first quartile, select QUARTILE, then insert the cell range, such as (A1:A15), which is called the array in this menu, and then type the number 1. For the median or second quartile, select QUARTILE, insert the array, and then type the number 2. For the third quartile, select QUARTILE, insert the array, and then type the number 3. For the interquartile range, simply subtract Q1 from Q3. Note: It is good practice to label your calculations; in this instance the names appear to the left of each result for clarity. Caution: For some data set sizes, the quartiles that Excel calculates, and therefore the interquartile range, do not match the method used in the text because of the way the function is programmed. Unfortunately, this is one of those instances. In this example Q3 and therefore IQR are incorrect!

b. kn/100 = 82(15) / 100 = 12.30 ≈ 12
Thus, the 82nd percentile can be approximated by the value of the 12th term in the ranked data, which is 14. Therefore, P82 = 14.

MINITAB: Enter the data in the spreadsheet (as shown in Chapter 1). Then select STAT, Reliability/Survival, Distribution Analysis, and Parametric Analysis. The variable is labeled Pounds. Highlight the variable name and click on it. Once it appears in the box marked VARIABLES:, click on ESTIMATE and enter the percentile as a whole number, not a decimal, in the Estimate Percentiles box. The result appears in the session window.

Excel: Enter the data in the spreadsheet (as shown in Chapter 1). Then select INSERT and FUNCTION. For the percentile, select PERCENTILE, then insert the cell range, such as (A1:A15), which is called the array in this menu, and then insert the percentage in decimal form.

c. Six values in the given data are smaller than 10.
Hence, percentile rank of 10 = (6/15) × 100 = 40%

3.81 The ranked data are: 41 42 43 44 44 45 46 46 47 47 48 48 48 49 50 50 51 51 52 52 52 53 53 54 56
a. The three quartiles are: Q1 = (45 + 46) / 2 = 45.5, Q2 = 48, and Q3 = (52 + 52) / 2 = 52
IQR = Q3 – Q1 = 52 – 45.5 = 6.5
b. kn/100 = 53(25) / 100 = 13.25 ≈ 13
Thus, the 53rd percentile can be approximated by the value of the thirteenth term in the ranked data, which is 48. Therefore, P53 = 48.
c. Fourteen values in the given data are less than 50. Therefore, percentile rank of 50 = (14/25) × 100 = 56%

3.83 The ranked data are: 3 5 6 6 7 9 9 10 11 12 14 15
a. The three quartiles are: Q1 = (6 + 6) / 2 = 6, Q2 = (9 + 9) / 2 = 9, and Q3 = (11 + 12) / 2 = 11.5
IQR = Q3 – Q1 = 11.5 – 6 = 5.5
The value 10 lies between Q2 and Q3, which means it lies in the third 25% group from the bottom in the ranked data set.
b. kn/100 = 55(12) / 100 = 6.6
Thus, the 55th percentile can be approximated by the average of the sixth and seventh terms in the ranked data. Therefore, P55 = (9 + 9) / 2 = 9.
c. Four values in the given data set are less than 7. Hence, percentile rank of 7 = (4/12) × 100 = 33.33%.

3.85 The ranked data are: 3 3 4 5 5 6 7 7 8 8 8 9 9 10 10 11 11 12 12 16
a. The three quartiles are: Q1 = (5 + 6) / 2 = 5.5, Q2 = (8 + 8) / 2 = 8, and Q3 = (10 + 11) / 2 = 10.5
IQR = Q3 – Q1 = 10.5 – 5.5 = 5
The value 4 lies below Q1, which indicates that it is in the bottom 25% of the values in the (ranked) data set.
b. kn/100 = 25(20) / 100 = 5
Thus, the 25th percentile may be approximated by the value of the fifth term in the ranked data, which is 5. Therefore, P25 = 5. Thus, the number of new cars sold at this dealership is less than 5 for approximately 25% of the days in this sample.
c. Thirteen values in the given data are less than 10. Hence, percentile rank of 10 = (13/20) × 100 = 65%. Thus, on 65% of the days in the sample, this dealership sold fewer than 10 cars.
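The quartile and percentile-rank steps used in Exercises 3.79 through 3.85 can be sketched in Python as follows. The function names `quartiles` and `percentile_rank` are illustrative only, and the code assumes the positional rule applied in these solutions: Q1 and Q3 are the medians of the lower and upper halves of the ranked data, with the middle observation left out when n is odd. Statistical packages often use other interpolation rules, so their quartiles may differ.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]
    # For odd n the middle observation belongs to neither half.
    upper = ranked[half + 1:] if n % 2 else ranked[half:]
    return (statistics.median(lower),
            statistics.median(ranked),
            statistics.median(upper))

def percentile_rank(values, x):
    """Percentage of observations strictly less than x."""
    below = sum(1 for v in values if v < x)
    return below * 100 / len(values)

# Ranked data from Exercise 3.85 (cars sold per day).
cars = [3, 3, 4, 5, 5, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 16]
q1, q2, q3 = quartiles(cars)
iqr = q3 - q1
rank_of_10 = percentile_rank(cars, 10)
print(q1, q2, q3, iqr, rank_of_10)
```

Run on the data of Exercise 3.85, this reproduces Q1 = 5.5, Q2 = 8, Q3 = 10.5, IQR = 5, and a percentile rank of 65% for the value 10.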
3.87 A box-and-whisker plot is based on five summary measures: the median, the first quartile, the third quartile, and the smallest and largest values in the data set between the lower and upper inner fences.

3.89 The ranked data are: 3 6 7 8 11 13 14 15 16 18 19 23 26 29 30 31 33 42 62 75
For the data, Median = (18 + 19) / 2 = 18.5, Q1 = (11 + 13) / 2 = 12, and Q3 = (30 + 31) / 2 = 30.5,
IQR = Q3 – Q1 = 30.5 – 12 = 18.5, 1.5 × IQR = 1.5 × 18.5 = 27.75,
Lower inner fence = Q1 – 27.75 = 12 – 27.75 = –15.75,
Upper inner fence = Q3 + 27.75 = 30.5 + 27.75 = 58.25
The smallest and the largest values within the two inner fences are 3 and 42, respectively. There are two outliers, 62 and 75. To classify them, we compute: 3.0 × IQR = 3.0 × 18.5 = 55.5. Hence, the upper outer fence is: Q3 + 55.5 = 86
Since 62 and 75 are both less than 86, they are within the upper outer fence and are called mild outliers.

MINITAB: Enter the data in the spreadsheet (from Chapter 1). Then select Graph, Boxplot, Simple, and OK. Next place the variable name or the column number in the Graph Variables box and click OK.

TI-83: Enter the data in L1 (as shown in Chapter 1). Then press 2ND, STAT PLOT, and the number 1, turning on Plot1 if it is off. Now highlight the first graph in the second row under TYPE (note that you get there by pressing the right arrow four times), then press the down arrow to go to the word Mark and select your symbol for outliers. Finally press ZOOM and use the down arrow to go to number nine, ZoomStat, and your box-and-whisker plot appears.

3.91 Median = 22,000, Q1 = 17,200, Q3 = 35,100, IQR = Q3 – Q1 = 35,100 – 17,200 = 17,900,
1.5 × IQR = 1.5 × 17,900 = 26,850,
Lower inner fence = Q1 – 26,850 = 17,200 – 26,850 = –9650,
Upper inner fence = Q3 + 26,850 = 35,100 + 26,850 = 61,950
The smallest and the largest values within the two inner fences are 9400 and 50,300, respectively.
The data set contains no outliers. The data are skewed to the right.

3.93 Median = 48, Q1 = 45.5, Q3 = 52, IQR = Q3 – Q1 = 52 – 45.5 = 6.5,
1.5 × IQR = 1.5 × 6.5 = 9.75,
Lower inner fence = Q1 – 9.75 = 45.5 – 9.75 = 35.75,
Upper inner fence = Q3 + 9.75 = 52 + 9.75 = 61.75
The smallest and largest values within the two inner fences are 41 and 56, respectively. There are no outliers. The data are skewed slightly to the right.

3.95 Median = 9, Q1 = 6, Q3 = 11.5, IQR = Q3 – Q1 = 11.5 – 6 = 5.5,
1.5 × IQR = 1.5 × 5.5 = 8.25,
Lower inner fence = Q1 – 8.25 = 6 – 8.25 = –2.25,
Upper inner fence = Q3 + 8.25 = 11.5 + 8.25 = 19.75
The smallest and largest values within the two inner fences are 3 and 15, respectively. There are no outliers. The data are nearly symmetric.

3.97 Median = 7.5, Q1 = 5, Q3 = 9.5, IQR = Q3 – Q1 = 9.5 – 5 = 4.5,
1.5 × IQR = 1.5 × 4.5 = 6.75,
Lower inner fence = Q1 – 6.75 = –1.75 and Upper inner fence = Q3 + 6.75 = 16.25
The smallest and largest values within the two inner fences are 3 and 12, respectively. There are no outliers. The data are nearly symmetric.

3.99 a. x̄ = Σx / n = 1,471,311 / 10 = $147,131.10
Median = $147,195 and Mode = $125,000
b. Range = Largest value – Smallest value = 170,000 – 125,000 = $45,000
s² = [Σx² – (Σx)²/n] / (n – 1) = [218,975,790,881 – (1,471,311)²/10] / (10 – 1) = 277,798,334.3222
s = √277,798,334.3222 = $16,667.28

3.101 a. x̄ = Σx / n = 88 / 12 = 7.33 citations
(n + 1) / 2 = (12 + 1) / 2 = 6.5
Median = (7 + 8) / 2 = 7.5 citations
Mode = 4, 7, and 8 citations
b. Range = Largest value – Smallest value = 14 – 0 = 14 citations
s² = [Σx² – (Σx)²/n] / (n – 1) = [834 – (88)²/12] / (12 – 1) = 17.1515 and s = √17.1515 = 4.14 citations
c. The values of the summary measures in parts a and b are sample statistics because the data are based on a sample of 12 drivers.

3.103 a. i. Each of the two values is $900 from µ = $1100. Hence, k = 900 / 300 = 3 and 1 – 1/k² = 1 – 1/(3)² = 1 – .11 = .89 or 89%.
Thus, at least 89% of households will have holiday expenditures between $200 and $2000.
ii. Each of the two values is $600 from µ = $1100. Hence, k = 600 / 300 = 2 and 1 – 1/k² = 1 – 1/(2)² = 1 – .25 = .75 or 75%.
Thus, at least 75% of households will have holiday expenditures between $500 and $1700.
b. 1 – 1/k² = .84 gives 1/k² = 1 – .84 = .16, or k² = 1/.16, so k = 2.5
The required interval is: µ – kσ to µ + kσ = {1100 – 2.5(300)} to {1100 + 2.5(300)} = $350 to $1850

3.105 µ = $134,000 and σ = $12,000
a. i. The interval $98,000 to $170,000 is µ – 3σ to µ + 3σ. Thus, approximately 99.7% of CPAs have salaries between $98,000 and $170,000.
ii. The interval $110,000 to $158,000 is µ – 2σ to µ + 2σ. Thus, approximately 95% of CPAs have salaries between $110,000 and $158,000.
b. The interval that contains salaries of 68% of all such CPAs is µ – σ to µ + σ. Hence, this interval is: (134,000 – 12,000) to (134,000 + 12,000) = $122,000 to $146,000.

3.107 The ranked data are: 0 3 4 4 7 7 8 8 9 11 13 14
a. The three quartiles are: Q1 = (4 + 4) / 2 = 4, Q2 = (7 + 8) / 2 = 7.5, Q3 = (9 + 11) / 2 = 10
IQR = Q3 – Q1 = 10 – 4 = 6
The value 4 is equal to Q1, which indicates that approximately 25% of the values in the data set are less than this value.
b. kn/100 = 70(12) / 100 = 8.4. Thus, the 70th percentile may be approximated by the mean of the eighth and ninth terms in the ranked data. Therefore, P70 = (8 + 9) / 2 = 8.5. Thus, approximately 70% of these drivers had fewer than 8.5 citations.
c. One value in the given data is less than 3. Hence, the percentile rank of 3 = (1/12) × 100 = 8.33%. Thus, 8.33% of these drivers had fewer than 3 citations.
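The Chebyshev calculations in Exercise 3.103 follow one pattern: convert a distance from the mean into k standard deviations, then evaluate 1 – 1/k². A minimal Python sketch of that pattern is given below; the function names `chebyshev_lower_bound` and `interval` are hypothetical helpers introduced only for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

def interval(mu, sigma, k):
    """Endpoints mu - k*sigma and mu + k*sigma."""
    return mu - k * sigma, mu + k * sigma

# Exercise 3.103: mu = $1100, sigma = $300.
mu, sigma = 1100, 300

k = 900 / sigma                    # part a(i): both endpoints are $900 from the mean
bound = chebyshev_lower_bound(k)   # about 0.89, i.e. at least 89%

low, high = interval(mu, sigma, 2.5)  # part b: k = 2.5 corresponds to at least 84%
print(round(bound, 2), low, high)
```

The unrounded bound for k = 3 is 8/9, or about 88.9%; the solutions round 1/9 to .11 and report 89%.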
3.109 Median = (11 + 12) / 2 = 11.5, Q1 = 8, Q3 = 16, IQR = Q3 – Q1 = 16 – 8 = 8
1.5 × IQR = 1.5 × 8 = 12
Lower inner fence = Q1 – 12 = 8 – 12 = –4
Upper inner fence = Q3 + 12 = 16 + 12 = 28
The smallest and largest values in the data set within the two inner fences are 2 and 24, respectively. The data are skewed to the right. The values 33 and 42 are outliers. The upper outer fence is Q3 + 3(IQR) = 16 + 3(8) = 40. Since 42 is greater than 40, it is an extreme outlier.

3.111 a. Let y = amount that Jeffery suggests. Then, to ensure the outcome Jeffery wants, we need
[y + 12,000(5)] / 6 = 20,000
y + 12,000(5) = 6(20,000)
y + 60,000 = 120,000
y = 60,000
So Jeffery would have to suggest that $60,000 be awarded to the plaintiff.
b. To prevent a juror like Jeffery from having an undue influence on the amount of damages awarded to the plaintiff, the jury could revise its procedure by throwing out any amounts that are outliers and then recalculating the mean, or by using the median, or by using a trimmed mean.

3.113 a. To calculate how much time the trip requires, divide miles driven by miles per hour for each 100-mile segment.
Time = 100 / 52 + 100 / 65 + 100 / 58 = 1.92 + 1.54 + 1.72 = 5.18 hours.
b. Linda's average speed for the 300-mile trip is not equal to (52 + 65 + 58) / 3 = 58.33 MPH. This would assume that she spent an equal amount of time on each 100-mile segment, which is not true, because her average speed is different on each segment. Linda's average speed for the entire 300-mile trip is given by (miles driven) / (elapsed time) = 300 / 5.18 = 57.92 MPH.

3.115 a. Total amount spent per month by the 2000 shoppers = (14 × 8 × 1100) + (18 × 11 × 900) = $301,400
b. Total number of trips per month by the 2000 shoppers = (8 × 1100) + (11 × 900) = 18,700
Mean number of trips per month per shopper = 18,700 / 2000 = 9.35 trips
c.
Total amount spent per month by the 2000 shoppers = $301,400, from part a
Mean amount spent per person per month = 301,400 / 2000 = $150.70

3.117 Total distance for the first 100 students = 100 × 8.73 = 873 miles
Total distance for all 103 students = 873 + 11.5 + 7.6 + 10.0 = 902.1 miles
Mean distance for all 103 students = 902.1 / 103 = 8.76 miles

3.119 a. Since we are dealing with a normal distribution and we know that 16% of all students scored above 85, which is µ + 15, it must also be true that 16% of all students scored below µ – 15 = 55. Therefore, the remaining 68% of students scored between 55 and 85. By the empirical rule, we know that 68% of the scores fall in the interval (µ – σ) to (µ + σ), so we have µ – σ = 70 – σ = 55 and µ + σ = 70 + σ = 85. Thus, σ = 15.
b. We know that 95% of the scores are between 60 and 80 and that µ = 70. By the empirical rule, 95% of the scores fall in the interval (µ – 2σ) to (µ + 2σ). So 60 = µ – 2σ = 70 – 2σ and 80 = µ + 2σ = 70 + 2σ. Therefore 10 = 2σ and so σ = 5.

3.121 a. Mean = $600.35, Median = $90, Mode = $0, and s = $2347.33. Most of the losses are zero or near zero; however, there is a relatively infrequent occurrence of very large losses.
b. The mean is the largest.
c. Q1 = $0, Q3 = $272.50, IQR = $272.50, 1.5 × IQR = $408.75
Lower inner fence = Q1 – 408.75 = 0 – 408.75 = –408.75
Upper inner fence = Q3 + 408.75 = 272.50 + 408.75 = 681.25
The smallest and largest values within the two inner fences are 0 and 501, respectively. There are three outliers, at 1127, 3709, and 14,589. Below are the box-and-whisker plot and the histogram for the given data. The data are skewed to the right.
d. Because the data are skewed to the right, the insurance company should use the mean when considering the center of the data, as it is more affected by the extreme values.
The insurance company would want to use a measure that takes into consideration the possibility of extremely large losses. The standard deviation should be used as a measure of variation.

3.123 a. Since x̄ = Σx / n, we have: n = Σx / x̄ = 12,372 / 51.55 = 240 pieces of luggage.
b. Since x̄ = Σx / n, Σx = n x̄. Thus, the total score for the seven students is 7 × 81 = 567. Let x = seventh student's score. Then x + 81 + 75 + 93 + 88 + 82 + 85 = 567. Hence, x + 504 = 567, so x = 567 – 504 = 63.

3.125 For all students: n = 44, Σx = 6597, Σx² = 1,030,639, and Median = 147.50 pounds
x̄ = Σx / n = 6597 / 44 = 149.93 pounds
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[1,030,639 – (6597)²/44] / (44 – 1)} = 31.0808 pounds
For men only: n = 22, Σx = 3848, Σx² = 680,724, and Median = 179 pounds
x̄ = Σx / n = 3848 / 22 = 174.91 pounds
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[680,724 – (3848)²/22] / (22 – 1)} = 19.1160 pounds
For women: n = 22, Σx = 2749, Σx² = 349,915, and Median = 123 pounds
x̄ = Σx / n = 2749 / 22 = 124.95 pounds
s = √{[Σx² – (Σx)²/n] / (n – 1)} = √{[349,915 – (2749)²/22] / (22 – 1)} = 17.4778 pounds
In this case, the median may be more informative than the mean, since it is less influenced by extremely high or low weights. As one might expect, the mean and median weights for men are higher than those for women. For the entire group, the mean and median weights are about midway between the corresponding values for men and women. The standard deviations are roughly the same for men and women. The standard deviation for the whole group is much larger than for men or women only, because it includes both the lower weights of women and the heavier weights of men.

3.127 The ranked data are: 3 6 9 10 11 12 15 15 18 21 25 26 38 41 62
a. x̄ = 20.80 thousand miles, Median = 15 thousand miles, Mode = 15 thousand miles
b. Range = 59 thousand miles, s² = 249.03, s = 15.78 thousand miles
c. Q1 = 10 thousand miles and Q3 = 26 thousand miles
d.
IQR = Q3 – Q1 = 26 – 10 = 16 thousand miles
Since the interquartile range is based on the middle 50% of the observations, it is not affected by outliers. The standard deviation, however, is strongly affected by outliers. Thus, the interquartile range is preferable in applications in which a measure of variation is required that is unaffected by extreme values.

Self–Review Test for Chapter Three

1. b  2. a and d  3. c  4. c  5. b  6. b  7. a  8. a  9. b  10. a  11. b  12. c  13. a  14. a

15. For the given data: n = 10, Σx = 109, Σx² = 1775
x̄ = Σx / n = 109 / 10 = 10.9
(n + 1) / 2 = (10 + 1) / 2 = 5.5
Median = (7 + 9) / 2 = 8
Mode = 6
Range = Largest value – Smallest value = 28 – 2 = 26
s² = (Σx² – (Σx)²/n) / (n – 1) = (1775 – (109)²/10) / (10 – 1) = 65.2111
s = √65.2111 = 8.08

16. Suppose the 2002 gross sales (in millions of dollars) of six companies are:
1.2 1.9 .5 2.1 3.4 110.5
Then, Σx = 1.2 + 1.9 + .5 + 2.1 + 3.4 + 110.5 = 119.6
Mean = Σx / n = 119.6 / 6 = $19.93 million
The value of $110.5 million is an outlier. When we drop it:
Σx = 1.2 + 1.9 + .5 + 2.1 + 3.4 = 9.1
Mean = Σx / n = 9.1 / 5 = $1.82 million
Thus, when we drop the value of $110.5 million, which is an outlier, the value of the mean decreases from $19.93 million to $1.82 million.

17. Reconsider the data on the 2002 gross sales (in millions of dollars) of six companies given in Problem 16, which are reproduced below.
1.2 1.9 .5 2.1 3.4 110.5
Then, Range = Largest value – Smallest value = 110.5 – .5 = $110 million
When we drop the value of $110.5 million, which is an outlier:
Range = 3.4 – .5 = $2.9 million
Thus, when we drop the value of $110.5 million, which is an outlier, the value of the range decreases from $110 million to $2.9 million.

18. The value of the standard deviation is zero when all the values in a data set are the same. For example, suppose the heights (in inches) of five women are:
67 67 67 67 67
This data set has no variation.
As shown below, the value of the standard deviation is zero for this data set. For these data: n = 5, Σx = 335, and Σx² = 22,445.
s = √[(Σx² – (Σx)²/n) / (n – 1)] = √[(22,445 – (335)²/5) / (5 – 1)] = √[(22,445 – 22,445) / 4] = 0

19. a. i. Each of the two values is 5.5 years from µ = 7.3 years. Hence, k = 5.5 / 2.2 = 2.5 and
1 – 1/k² = 1 – 1/(2.5)² = 1 – .16 = .84 or 84%
Thus, at least 84% of the cars are 1.8 to 12.8 years old.
ii. Each of the two values is 6.6 years from µ = 7.3 years. Hence, k = 6.6 / 2.2 = 3 and
1 – 1/k² = 1 – 1/(3)² = 1 – .11 = .89 or 89%
Thus, at least 89% of the cars are .7 to 13.9 years old.
b. 1 – 1/k² = .75 gives 1/k² = 1 – .75 = .25, or k² = 1/.25 = 4, or k = 2
Thus, the required interval is µ – kσ to µ + kσ = {7.3 – 2(2.2)} to {7.3 + 2(2.2)} = 2.9 to 11.7 years.

20. a. µ = 7.3 years and σ = 2.2 years
i. The interval 5.1 to 9.5 years is µ – σ to µ + σ. Hence, approximately 68% of the cars are 5.1 to 9.5 years old.
ii. The interval .7 to 13.9 years is µ – 3σ to µ + 3σ. Hence, approximately 99.7% of the cars are .7 to 13.9 years old.
b. The interval that contains the ages of 95% of the cars will be µ – 2σ to µ + 2σ. Hence, this interval is:
µ – 2σ to µ + 2σ = {7.3 – 2(2.2)} to {7.3 + 2(2.2)} = 2.9 to 11.7 years.
Thus, approximately 95% of the cars are 2.9 to 11.7 years old.

21. The ranked data are:
0 1 2 3 4 5 7 8 10 11 12 13 14 15 20
a. The three quartiles are: Q1 = 3, Q2 = 8, and Q3 = 13. IQR = Q3 – Q1 = 13 – 3 = 10. The value 4 lies between Q1 and Q2, which indicates that this value is in the second from the bottom 25% group in the ranked data.
b. kn/100 = 60(15)/100 = 9. Thus, the 60th percentile may be represented by the value of the ninth term in the ranked data, which is 10. Therefore, P60 = 10. Thus, approximately 60% of the half hour time periods had fewer than 10 passengers set off the metal detectors during this day.
c. Ten values in the given data are less than 12.
Hence, percentile rank of 12 = (10/15) × 100 = 66.67%. Thus, 66.67% of the half hour time periods had fewer than 12 passengers set off the metal detectors during this day.

22. The ranked data are:
0 1 2 3 4 5 7 8 10 11 12 13 14 15 20
Q1 = 3, Q2 = 8, and Q3 = 13. IQR = Q3 – Q1 = 13 – 3 = 10, 1.5 × IQR = 1.5 × 10 = 15
Lower inner fence = Q1 – 15 = 3 – 15 = –12
Upper inner fence = Q3 + 15 = 13 + 15 = 28
The smallest and largest values in the data set within the two inner fences are 0 and 20, respectively. The data do not contain any outliers. The data are skewed slightly to the right.

23. From the given information: n1 = 15, n2 = 20, x̄1 = $435, x̄2 = $490
x̄ = (n1 x̄1 + n2 x̄2) / (n1 + n2) = [(15)(435) + (20)(490)] / (15 + 20) = 16,325 / 35 = $466.43

24. Sum of the GPAs of five students = 5 × 3.21 = 16.05
Sum of the GPAs of four students = 3.85 + 2.67 + 3.45 + 2.91 = 12.88
GPA of the fifth student = 16.05 – 12.88 = 3.17

25. The ranked data are:
58 149 163 166 179 193 207 238 287 2534
Thus, to find the 10% trimmed mean, we drop the smallest value and the largest value (10% of 10 is 1) and find the mean of the remaining 8 values. For these 8 values,
Σx = 149 + 163 + 166 + 179 + 193 + 207 + 238 + 287 = 1582
10% trimmed mean = Σx / 8 = 1582 / 8 = $197.75 thousand = $197,750.
The 10% trimmed mean is a better summary measure for these data than the mean of all 10 values because it eliminates the effect of the outliers, 58 and 2534.

26. a. For Data Set I: x̄ = Σx / n = 79 / 4 = 19.75
For Data Set II: x̄ = Σx / n = 67 / 4 = 16.75
The mean of Data Set II is smaller than the mean of Data Set I by 3.
b. For Data Set I: Σx = 79, Σx² = 1945, and n = 4
s = √[(Σx² – (Σx)²/n) / (n – 1)] = √[(1945 – (79)²/4) / (4 – 1)] = 11.32
c. For Data Set II: Σx = 67, Σx² = 1507, and n = 4
s = √[(Σx² – (Σx)²/n) / (n – 1)] = √[(1507 – (67)²/4) / (4 – 1)] = 11.32
The standard deviations of the two data sets are equal.
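
Several of the solutions above compute the sample standard deviation from the summary values n, Σx, and Σx² rather than from the raw data. The short Python sketch below shows one way to check that arithmetic. The function name sample_sd_from_sums is a hypothetical helper introduced here for illustration and is not part of the text; the inputs are the summary values quoted in Problem 26.

```python
import math


def sample_sd_from_sums(n, sum_x, sum_x2):
    """Sample standard deviation from summary sums.

    Uses the shortcut formula s = sqrt((sum_x2 - sum_x**2 / n) / (n - 1)).
    The function name and signature are illustrative assumptions.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return math.sqrt(variance)


# Summary values quoted in Problem 26
s_one = sample_sd_from_sums(4, 79, 1945)  # Data Set I
s_two = sample_sd_from_sums(4, 67, 1507)  # Data Set II
print(round(s_one, 2), round(s_two, 2))
```

Both calls should agree with the value 11.32 reported in Problem 26. The same helper can be applied to the summary values in Problems 15 and 18 and in Exercise 3.125.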