
Berenson CH03 22/12/09 11:55 AM Page 1

SOLUTIONS

CHAPTER 3: NUMERICAL DESCRIPTIVE MEASURES

Learning Objectives: In this chapter, you learn:
• To calculate and interpret numerical descriptive measures of central tendency, variation and shape for numerical data
• To calculate and interpret descriptive summary measures for a population
• To construct and interpret a box-and-whisker plot
• To calculate and interpret the covariance and the coefficient of correlation for bivariate data.

Solutions:

3.1 (a) Mean = 6; Median = 7; There is no mode.
(b) Range = 7; Variance = 8.5; Interquartile range = 5.5; Standard deviation = 2.9; Coefficient of variation = (2.915/6) × 100% = 48.6%.
(c) Z scores: 0.343, –0.686, 1.029, 0.686, –1.372. None of the Z scores is larger than 3.0 or smaller than –3.0. There is no outlier.

3.2 (a) Mean = 7; Median = 7; Mode = 7.
(b) Range = 9; Variance = 10.8; Interquartile range = 5; Standard deviation = 3.286; Coefficient of variation = (3.286/7) × 100% = 46.94%.
(c) Z scores: 0, –0.913, 0.609, 0, –1.217, 1.522. None of the Z scores is larger than 3.0 or smaller than –3.0. There is no outlier.

3.3 (a) Mean = 6; Median = 7; Mode = 7.
(b) Range = 12; Variance = 16; Interquartile range = 6; Standard deviation = 4; Coefficient of variation = (4/6) × 100% = 66.67%.

3.4 (a) Mean = 2; Median = 7; Mode = 7.
(b) Range = 17; Variance = 62; Interquartile range = 14.5; Standard deviation = 7.874; Coefficient of variation = (7.874/2) × 100% = 393.7%.

3.5 \( \bar{R}_G = [(1 + 0.1)(1 + 0.3)]^{1/2} - 1 = 19.58\% \).
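The geometric mean rate of return in 3.5 compounds the growth factors and then takes the root that matches the number of periods. A minimal Python sketch of that calculation follows; the function name `geometric_mean_return` is illustrative and does not come from the text.

```python
import math

def geometric_mean_return(returns):
    """Geometric mean rate of return; `returns` are per-period rates as decimals."""
    growth = math.prod(1 + r for r in returns)   # compound growth factor
    return growth ** (1 / len(returns)) - 1      # n-th root, minus 1

# The two rates used in 3.5: 10% and 30%.
rate = geometric_mean_return([0.10, 0.30])
print(f"{rate:.2%}")
```

Because the rates are compounded rather than averaged, the result is a little below the arithmetic mean of 20%.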
3.6 (a)
                      Grade X    Grade Y
Mean                  575        575.4
Median                575        575
Standard deviation    6.4        2.1

(b) If quality is measured by the average inner diameter, Grade X tyres provide slightly better quality because X's mean and median are both equal to the expected value, 575 mm. If, however, quality is measured by consistency, Grade Y provides better quality because, even though Y's mean is only slightly larger than the mean for Grade X, Y's standard deviation is much smaller. The range in values for Grade Y is 5 mm, compared to 16 mm for Grade X.

3.7 (a) & (b)
Rolls and Lean Burgers
Mean: \( \bar{X} \) = 11.7
Rank of the median is 11.5, Median = 8.15
3 modes: 5.1, 5.4 and 8.1
Rank of the first quartile is 5.75, which rounds to 6, so Q1 = 5.4
Rank of the third quartile is 17.25, which rounds to 17, so Q3 = 19.6
Variance \( S^2 \) = 69.8314
Standard deviation S = 8.3565…
Range = 30.9
Interquartile range = 14.2
Coefficient of variation CV = 71.42…%

Salads
Mean: \( \bar{X} \) = 15.12
Median = 15.8, no mode
Rank of the first quartile is 1.5, so Q1 = 5.3
Rank of the third quartile is 4.5, so Q3 = 24.6
Variance \( S^2 \) = 99.692
Standard deviation S = 9.9845…
Range = 24
Interquartile range = 19.3
Coefficient of variation CV = 66.035…%

Traditional Items
Mean: \( \bar{X} \) = 31.85
Median = 23.05, no mode
Rank of the first quartile is 1.75, so Q1 = 20.8
Rank of the third quartile is 5.25, so Q3 = 39.3
Variance \( S^2 \) = 318.019
Standard deviation S = 17.833…
Range = 45.3
Interquartile range = 18.5
Coefficient of variation CV = 55.99…%

(c) As expected, the traditional items have the highest average fat content, followed by salads, with rolls and lean burgers the least. However, the traditional items vary the most in their fat content, as indicated by the large standard deviation, so some of these items have less fat than some of the 'healthier' options.
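The quartile ranks quoted in 3.7 (5.75 rounds to 6, 17.25 rounds to 17, a rank of 1.5 falls between two values) look like the usual textbook convention: compute the rank (n + 1)/4 or 3(n + 1)/4, take the value at that rank if it is a whole number, average the two neighbouring values if the rank ends in .5, and otherwise round to the nearest whole rank. The Python sketch below implements that convention on the assumption that this is the rule intended; the function names and the sample list are hypothetical and are not data from the problem.

```python
import math

def quartile_rank(n, q):
    """Rank of quartile q (1 or 3) among n ranked values: q(n + 1)/4."""
    return q * (n + 1) / 4

def value_at_rank(sorted_values, rank):
    """Resolve a possibly fractional rank to a value (assumed rounding rule)."""
    lower = math.floor(rank)
    frac = rank - lower
    if frac == 0:                       # whole-number rank: take that value
        return sorted_values[lower - 1]
    if frac == 0.5:                     # halfway: average the two neighbours
        return (sorted_values[lower - 1] + sorted_values[lower]) / 2
    nearest = lower if frac < 0.5 else lower + 1   # otherwise round the rank
    return sorted_values[nearest - 1]

# Hypothetical ranked data with n = 22, the sample size implied for rolls and lean burgers.
data = list(range(1, 23))
q1 = value_at_rank(data, quartile_rank(len(data), 1))   # rank 5.75 -> 6th value
q3 = value_at_rank(data, quartile_rank(len(data), 3))   # rank 17.25 -> 17th value
print(q1, q3)
```

Ranks of the form (n + 1)/4 are always multiples of 0.25, so the exact comparisons against 0 and 0.5 are safe in floating point.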
3.6 (c)
                      Grade X    Grade Y, Altered
Mean                  575        577.4
Median                575        575
Standard deviation    6.4        6.1

If the fifth Grade Y tyre measures 588 mm rather than 578 mm, Y's average inner diameter becomes 577.4 mm, which is larger than X's average inner diameter, and Y's standard deviation swells from 2.07 mm to 6.11 mm. In this case, X's tyres provide better quality in terms of the average inner diameter, with only slightly more variation among the tyres than Y's.

3.8 (a) Mean: \( \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{39{,}600}{18} = 2{,}200 \), so mean daily sales is $2,200.
Rank of the median is \( \frac{n+1}{2} = \frac{18+1}{2} = 9.5 \), so Median = \( \frac{2{,}330 + 2{,}390}{2} = 2{,}360 \).
2 modes: $2,390 and $2,400.
Rank of the first quartile is \( \frac{n+1}{4} = \frac{18+1}{4} = 4.75 \), round to 5, so Q1 = 1,525.
Rank of the third quartile is \( \frac{3(n+1)}{4} = \frac{3(18+1)}{4} = 14.25 \), round to 14, so Q3 = 2,545.
(b) Variance: \( S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1} = \frac{96{,}601{,}350 - 18(2{,}200)^2}{17} = 557{,}726.470\ldots \)
Standard deviation: \( S = \sqrt{S^2} = \sqrt{557{,}726.470\ldots} = 746.810\ldots \)
Range = 3,580 – 1,350 = 2,230
Interquartile range = 2,545 – 1,525 = 1,020
Coefficient of variation: \( CV = \left(\frac{S}{\bar{X}}\right) 100\% = \frac{746.810\ldots}{2{,}200} \, 100\% = 33.94\ldots\% \)
(c) The mean of daily sales is $2,200 and the median is $2,360, suggesting that daily sales may be skewed to the left (only a few days with low sales), as mean < median. Furthermore, daily sales are fairly varied, ranging from $1,350 to $3,580, with the middle 50% of days having sales between $1,525 and $2,545. The standard deviation of $746 implies that the majority of days have sales within $746 of the mean of $2,200.

3.9 (a) Mean 2.45 minutes; Median 2.5 minutes; Mode 1.4 minutes; First quartile 1.4 minutes; Third quartile 3.1 minutes.
(b) Variance 2.271...; Standard deviation 1.507... minutes; Range 5.5 minutes; Interquartile range 1.7 minutes; Coefficient of variation 61.55...%

Time    Z score
0.6     –1.025
0.9     –0.725
1.4     –0.225
1.4     –0.225
1.5     –0.125
2.4     0.775
2.6     0.975
2.7     1.075
2.8     1.175
3.1     1.475
3.9     2.275
6.1     4.475

(c) 6.1 is an outlier. The positive Z scores are generally larger than the negative Z scores, so the data are skewed to the right, in spite of the mean and median being very close.
(d) The average time to serve a customer is approximately 2.5 minutes, but this varies from 0.6 minutes to 6.1 minutes, probably indicating that some customers just need a yes or no answer or a brochure while others need detailed information.

3.10 (a) and (b)
Manufacturer                 A           B
Mean                         7377.33     8260.90
First quartile               6667        7569
Median                       7316.5      8140.5
Third quartile               8091        9036
Standard deviation           856.229     909.829
Sample variance              733128      827789
Coefficient of variation     11.61%      11.01%
Range                        3187        3034
Interquartile range          1424        1467
Minimum                      5544        6701
Maximum                      8731        9744

(c) Manufacturer B's bulbs last longer on average, as the mean and median are higher; however, the lifetimes are more varied, as the standard deviation and interquartile range are also larger.

3.11 Excel output:
Price ($)    Z score
340          –0.1121
450          1.4576
450          1.4576
280          –0.9684
220          –1.8246
340          –0.1121
290          –0.8257
370          0.3160
400          0.7441
310          –0.5402
340          –0.1121
430          1.1722
270          –1.1111
380          0.4587

(a) mean = $348; median = $340; 1st quartile = $290; 3rd quartile = $400.
(b) variance = 4910; standard deviation = $70; range = $230; interquartile range = $110; CV = 20.14%. None of the Z scores is less than –3 or greater than 3. There is no outlier in the price.
(c) The distribution of digital camera prices is fairly symmetrical.
(d) The mean price is $348 while the middle ranked price is $340. The average scatter of price around the mean is $70. The middle 50% of prices span $110, while the difference between the highest and the lowest price is $230.

3.12 (a) and (b)
Sample Data:                     Distance (km)
Arithmetic Mean                  25.46
Median                           26
Mode                             27
Range                            43
Variance (Sample)                117.6
Standard deviation (Sample)      10.844

(c) The mean, median and mode are all similar. So, on average, people drive approximately 26 km to work. Furthermore, the distances driven to work are clustered in the interval 15 km to 37 km.

3.13 (a) Mean = 473.5; Median = 451. There is no mode. The median seems to be a better descriptive measure of the data, since it is closer to the observed values than is the mean. Also, the outlier of 1,049 affects the mean.
(b) Range = 785; Variance = 44,422.44; Standard deviation = 210.77
(c) From the manufacturer's viewpoint, the worst measure would be to compute the percentage of batteries that last over 400 hours (8/13 = .61). The median (451) and the mean (473.5) are both over 400, and would be better measures for the manufacturer to use in advertisements.
(d) (a), (b)
                      Original Data    Altered Data
Mean                  473.5            550.4
Median                451              492
Mode                  none             none
Range                 785              1,078
Variance              44,422.44        99,435.26
Standard deviation    210.77           315.33

(c) From the manufacturer's viewpoint, the worst measure remains the percentage of batteries that last over 400 hours (9/13 = .69). The median (492) and the mean (550.38) are both well over 400, and would be better measures for the manufacturer to use in advertisements. The shape of the distribution of the original data is right-skewed, since the mean is larger than the median.
The shape of the distribution of the altered data set is right-skewed as well, since its mean is also larger than its median.

3.14 (a) Mean = 4.287; Median = 4.5; Q1 = 3.20; Q3 = 5.55.
(b) Variance = 2.683; Standard deviation = 1.638; Range = 6.08; Interquartile range = 2.35; Coefficient of variation = 38.21%
Z scores: –0.05, 0.77, –0.77, 0.51, 0.30, –1.19, –0.46, –0.66, 0.13, 1.11, –2.39, 0.51, 1.33, 1.16, –0.30
There are no outliers.
(c) Since the mean is less than the median, the distribution may be left-skewed.
(d) The mean and median are both under 5 minutes and the distribution may be left-skewed, meaning that there are more unusually low observations than there are high observations. But six of the 15 bank customers sampled (or 40%) had wait times in excess of 5 minutes. So, although the customer is more likely to be served in less than 5 minutes, the manager may have been overconfident in responding that the customer would 'almost certainly' not wait longer than 5 minutes for service.

3.15 (a) Mean = 7.114; Median = 6.68; Q1 = 5.64; Q3 = 8.73.
(b) Variance = 4.336; Standard deviation = 2.082; Range = 6.67; Interquartile range = 3.09; Coefficient of variation = 29.27%.
(c) Since the mean is greater than the median, the distribution may be right-skewed.
(d) The mean and median are both well over 5 minutes and the distribution may be right-skewed, meaning that there are more unusually high observations than low. Further, 13 of the 15 bank customers sampled (or 86.7%) had wait times in excess of 5 minutes. So, the customer is more likely to experience a wait time in excess of 5 minutes. The manager overstated the bank's service record in responding that the customer would 'almost certainly' not wait longer than 5 minutes for service.
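Several of the solutions above (3.1, 3.2, 3.11, 3.14) screen for outliers by checking whether any Z score falls outside ±3. The Python sketch below shows one way to run that screen, using the 14 camera prices listed in the Excel output for 3.11. The variable names are illustrative, and the ±3 cut-off is simply the one these solutions use.

```python
import statistics

# Camera prices ($) as listed in the Excel output for 3.11.
prices = [340, 450, 450, 280, 220, 340, 290, 370, 400, 310, 340, 430, 270, 380]

mean = statistics.mean(prices)           # sample mean
variance = statistics.variance(prices)   # sample variance (n - 1 divisor)
std_dev = statistics.stdev(prices)       # sample standard deviation
cv = std_dev / mean * 100                # coefficient of variation, in percent

# Z score: distance from the mean in units of the sample standard deviation.
z_scores = [(p - mean) / std_dev for p in prices]

# Flag any observation more than 3 standard deviations from the mean.
outliers = [p for p, z in zip(prices, z_scores) if abs(z) > 3]

print(round(mean), round(variance), round(std_dev), round(cv, 2))
print([round(z, 4) for z in z_scores])
print(outliers)
```

The same pattern applies to any of the samples in this chapter; only the list of observations changes.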
3.16
                      Asking Price
Mean                  472440
Median                457000
First quartile        397000
Third quartile        529000
Standard deviation    102394.989
Sample variance       1.0485E+10
Kurtosis              1.92131411
Range                 524000
Interquartile range   132000

The mean asking price is $472,440. From the histogram for asking prices, Problem 2.54, we can see that $472,440 is in the upper of the two central classes, $400,000 to $500,000. We would expect the mean to be higher than the "centre" of the asking prices, as there are two extreme values, the asking prices over $600,000, which will have affected the mean.

3.17 (a)
Annual Return
Year                        Hang Seng    S&P/ASX 200
2002                        –18.2%       –12.1%
2003                        34.9%        9.7%
2004                        13.2%        22.8%
2005                        4.5%         17.6%
2006                        34.2%        19.0%
2007                        39.3%        11.8%
Geometric rate of return    16.0%        10.8%

(b) The average rate of return for the Hang Seng is higher; however, it is also more variable.

3.18 (a)
                             Historical crediting rate for           Average returns to
                             year ending 30 June (%)                 30 June 2008 (% p.a.)
Superannuation Fund          2008     2007    2006    2005    2004   3 year    5 year
Conservative Balanced        –3.8     11.7    11.4    11.6    10.9   6.18%     8.18%
Balanced                     –5.9     14.8    14.4    13.7    14.3   7.31%     9.94%
High Growth                  –10.6    19.7    17.6    18.1    16.8   7.96%     11.66%
Socially Responsible, HG     –8.5     17.6    20.7    15.9    17.5   9.11%     12.08%

(b) The average rate of return is lower over the previous 3 years than over the previous 5 years because of the negative returns in the year ending 30 June 2008. The average returns for High Growth and Socially Responsible shares are the highest and are similar; however, these funds had the largest losses in the year ending 30 June 2008.

3.19 (a) Population Mean = 6. (b) σ² = 9.4; σ = 3.1.

3.20 (a) Population Mean = 6. (b) σ = 1.67; σ² = 2.8.

3.21 (a)
Population Data              Male     Female
Mean                         97.92    36.17
Variance (Pop)               93.58    38.97
Standard Deviation (Pop)     9.67     6.24

(b) Male
µm ± σm = 97.92 ± 9.67 = (88.24, 107.59). 9 out of 12, or 75%, of months are in this range.
µm ± 2σm = 97.92 ± 2 × 9.67 = (78.57, 117.26). 100% of months are in this range.
Female
µf ± σf = 36.17 ± 6.24 = (29.92, 42.41). 7 out of 12, or 58.3%, of months are in this range.
µf ± 2σf = 36.17 ± 2 × 6.24 = (23.68, 48.65). 100% of months are in this range.
(c) The proportion within one standard deviation of the mean for the male distribution is higher than what would be expected from the empirical rule, while the proportion within one standard deviation of the mean for the female distribution is lower than what would be expected from the empirical rule. Therefore, these distributions may not be mound shaped.

3.22 (a) As we have weekly sales for all 52 weeks in the year, this is population data.
Population Data              Forgive      Rejoice
Mean                         564.873      495.638
Variance (Pop)               47778.799    66121.428
Standard Deviation (Pop)     218.584      257.141

(b) Weekly sales in the previous year were higher and less varied for Forgive chocolates than for Rejoice chocolates. However, as the population standard deviation is large for both products, we can conclude that the weekly quantity sold for each product was highly variable.

(c) & (d)
Forgive                          Lower Value    Upper Value    Number    Percentage
Within 1 standard deviation      346.29         783.46         35        67.31%
Within 2 standard deviations     127.71         1002.04        51        98.08%
Within 3 standard deviations     –90.88         1220.62        52        100.00%

Rejoice                          Lower Value    Upper Value    Number    Percentage
Within 1 standard deviation      238.50         752.78         41        78.85%
Within 2 standard deviations     –18.64         1009.92        50        96.15%
Within 3 standard deviations     –275.78        1267.06        52        100.00%

Last year, for the regional city store, sales were on average 564.9 kg and 495.6 kg for Forgive and Rejoice chocolates respectively. Furthermore, in a typical week sales were between 346.3 kg and 783.5 kg for Forgive chocolates and between 238.5 kg and 752.8 kg for Rejoice chocolates. The percentage of weeks with sales within one, two, and three standard deviations of the mean approximately follows the empirical rule for Forgive chocolates.
Therefore, the distribution of the quantity sold weekly of Forgive chocolates may be mound shaped, with most weeks' sales being close to the mean of 564.9 kg and a few weeks having very low or high sales. However, Rejoice chocolates have a larger percentage of weeks within one standard deviation than that given by the empirical rule, so their distribution may not be mound shaped. As all values are within three standard deviations of the mean, there are no outliers. However, there is one possible outlier for Forgive, sales of 1039.2 kg in Week 1 (New Year), and two for Rejoice, sales of 1031.4 kg and 1056.3 kg in weeks 49 and 51 (leading up to Christmas). As you would expect sales of Rejoice chocolates to increase over the Christmas period, and possibly Forgive chocolates are needed after New Year's parties, these quantities are probably not outliers.

3.23 (a) and (b) As the data is all employee ages, this is population data.
Ages    Z score
19      –0.43
19      –0.43
45      2.94
20      –0.30
21      –0.17
21      –0.17
18      –0.56
20      –0.30
23      0.09
17      –0.69

Mean                  22.30
Variance              59.810
Standard Deviation    7.734

(c) The mean age of all employees is 22.3 years with a standard deviation of 7.7 years. However, only 20% (2 out of 10) of employees have ages above the mean, so the mean is not a good measure of the 'typical age' of an employee. Furthermore, 90% (9 out of 10) of employees have ages within one standard deviation of the mean. There is one extreme age of 45 years, 2.9 standard deviations above the mean, which has unduly affected the mean.

3.24 (a) mean = 35. On average, the 30 employees worked 35 hours last week.
(b) variance = 188.45, standard deviation = 13.73. The average squared distance of the 30 employees' working hours from the mean is 188.45. If the distribution is approximately symmetrical, about 68% of the work hours will be within 13.73 of the mean value of 35.
(c) Since the median = 37.5, it is highly likely that the distribution is not symmetrical. According to the Chebyshev rule, at least 75% of the values will be found within 2 standard deviations of the mean. 93% of the values are within 2 standard deviations of the mean and 100% of the values are within 3 standard deviations of the mean.
(d) The preceding suggests there are no outliers within the sample set.

3.25
mj       fj       mj fj      (mj – X̄)² fj
5        10       50         4,000
15       20       300        2,000
25       40       1,000      0
35       20       700        2,000
45       10       450        4,000
         n = 100  ∑(mj fj) = 2,500    ∑(mj – X̄)² fj = 12,000

(a) \( \bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{2{,}500}{100} = 25 \)
(b) \( S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n-1}} = 11.01 \)

3.26
mj       fj       mj fj      (mj – X̄)² fj
5        40       200        5,760
15       25       375        100
25       15       375        960
35       15       525        4,860
45       5        225        3,920
         n = 100  ∑(mj fj) = 1,700    ∑(mj – X̄)² fj = 15,600

(a) \( \bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{1{,}700}{100} = 17 \)
(b) \( S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n-1}} = 12.55 \)

3.27 Excel output for March:
mj        fj      mj fj      (mj – X̄)² fj
1,000     6       6,000      83,030,400
3,000     13      39,000     38,459,200
5,000     17      85,000     1,332,800
7,000     10      70,000     51,984,000
9,000     4       36,000     73,273,600
11,000    0       0          0
          n = 50  ∑(mj fj) = 236,000    ∑(mj – X̄)² fj = 2.48E+08

Excel output for April:
mj        fj      mj fj      (mj – X̄)² fj
1,000     10      10,000     1.16E+08
3,000     14      42,000     27,440,000
5,000     13      65,000     4,680,000
7,000     10      70,000     67,600,000
9,000     0       0          0
11,000    3       33,000     1.31E+08
          n = 50  ∑(mj fj) = 220,000    ∑(mj – X̄)² fj = 3.46E+08

(a) March: \( \bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{236{,}000}{50} = 4{,}720 \)
April: \( \bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{220{,}000}{50} = 4{,}400 \)
(b) March: \( S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n-1}} = 2{,}250.08 \)
April: \( S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n-1}} = 2{,}657.30 \)
(c) The arithmetic mean has declined by $320 while the standard deviation has increased by $407.22.

3.32 (a) Five-number Summary
                  A         B
Minimum           5544      6701
First Quartile    6667      7569
Median            7316.5    8140.5
Third Quartile    8091      9036
Maximum           8731      9744

(b) [Box-and-whisker plot: CFL Light Bulbs, Manufacturer A and Manufacturer B]
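The grouped-data calculations in 3.25 to 3.27 all follow the same two steps: weight each class midpoint by its frequency to get the mean, then weight the squared deviations from that mean and divide by n − 1. A short Python sketch of this approximation, checked against the frequency tables for 3.25, 3.26 and the March and April data in 3.27; the function name `grouped_mean_sd` is illustrative only.

```python
import math

def grouped_mean_sd(midpoints, freqs):
    """Approximate sample mean and standard deviation from a frequency distribution."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    # Sum of squared deviations of midpoints from the mean, weighted by frequency.
    ss = sum((m - mean) ** 2 * f for m, f in zip(midpoints, freqs))
    return mean, math.sqrt(ss / (n - 1))

mids_small = [5, 15, 25, 35, 45]
mids_large = [1_000, 3_000, 5_000, 7_000, 9_000, 11_000]

mean_325, sd_325 = grouped_mean_sd(mids_small, [10, 20, 40, 20, 10])
mean_326, sd_326 = grouped_mean_sd(mids_small, [40, 25, 15, 15, 5])
mean_mar, sd_mar = grouped_mean_sd(mids_large, [6, 13, 17, 10, 4, 0])
mean_apr, sd_apr = grouped_mean_sd(mids_large, [10, 14, 13, 10, 0, 3])

for label, mean, sd in [("3.25", mean_325, sd_325), ("3.26", mean_326, sd_326),
                        ("March", mean_mar, sd_mar), ("April", mean_apr, sd_apr)]:
    print(label, mean, round(sd, 2))
```

Because every observation in a class is treated as if it sat at the class midpoint, these are approximations to the statistics of the underlying raw data.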
3.28 (a) Five-number summary: 2 3 7 8.5 9
(b) [Box-and-whisker plot]
The distribution is left-skewed. There is a longer distance from Q1 to Q2 than from Q2 to Q3, confirming our conclusion that the data are left-skewed.

3.29 (a) Five-number summary: 3 4 7 9 12
(b) [Box-and-whisker plot]
The distribution is almost symmetrical. The data set is almost symmetrical since the median line almost divides the box in half, but the whiskers show right skewness.

3.30 (a) Five-number summary: 0 3 7 9 12
(b) [Box-and-whisker plot]
The distribution is left-skewed. The box-and-whisker plot shows a longer left box from Q1 to Q2 than from Q2 to Q3, visually confirming our conclusion that the data are left-skewed.

3.31 (a) Five-number summary: –8 –6.5 7 8 9
(b) [Box-and-whisker plot]
The distribution is left-skewed. The box-and-whisker plot shows a longer left box from Q1 to Q2 than from Q2 to Q3, visually confirming our conclusion that the data are left-skewed.

3.32 (b) [Box-and-whisker plot axis: Life in Hours]
Manufacturer B's bulbs generally last longer.

3.33 (a) From the solution of Problem 3.8, the five-number summary is 1350 1525 2360 2545 3580
(b) [Box-and-whisker plot: Daily sales ($)]
The results are inconsistent. The right-hand whisker is far longer than the left-hand whisker, and the distance from the median to Xlargest is greater than the distance from the median to Xsmallest, indicating right skewness. However, the left-hand side of the box is far longer than the right-hand side, indicating left skewness.

3.34 (a) From the solution of Problem 3.7, the five-number summaries are
                  Rolls and Lean Burgers    Salads    Traditional Items
Minimum           3.7                       3.9       19.8
First quartile    5.4                       5.3       20.8
Median            8.15                      15.8      23.05
Third quartile    19.6                      24.6      39.3
Maximum           34.6                      27.9      65.1

(b) [Box-and-whisker plots: Fat Content per Serve (grams) for Traditional Items, Salads, and Rolls and Lean Burgers]
(c) The traditional items have the highest average fat content, followed by salads, with rolls and lean burgers the least.
However, the fat content of traditional items and of rolls and lean burgers varies more than that of the salads. Therefore, some rolls and lean burgers have more fat than any salad, and some traditional items have less fat than some 'healthier' options. Rolls and lean burgers and the traditional items are both skewed to the right, while the fat content of salads seems fairly symmetric.

3.35 (a) Commercial district: Five-number summary: 0.38 3.2 4.5 5.55 6.46
Residential area: Five-number summary: 3.82 5.64 6.68 8.73 10.49
(b) Commercial district: [Box-and-whisker plot: Waiting Time]
The distribution is skewed to the left.
Residential area: [Box-and-whisker plot: Waiting Time]
The distribution is skewed slightly to the right.
(c) The central tendency of the waiting times for the bank branch located in the commercial district of a city is lower than that of the branch located in the residential area. There are a few longer than normal waiting times for the branch located in the residential area, whereas there are a few exceptionally short waiting times for the branch located in the commercial area.

3.36 (a) cov(X,Y) = 65.2909.
(b) \( S_X^2 = 21.7636 \), \( S_Y^2 = 195.8727 \)
\( r = \frac{\mathrm{cov}(X,Y)}{\sqrt{S_X^2}\sqrt{S_Y^2}} = \frac{65.2909}{\sqrt{21.7636}\sqrt{195.8727}} = +1.0 \)
(c) There is a perfect positive linear relationship between X and Y; all the points lie exactly on a straight line with a positive slope.

3.37 Let X = number of sales staff; Y = sales $ million.
(a) \( SS_{XY} = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} = 10{,}569 - (10 \times 22 \times 46) = 449 \)
\( \mathrm{cov}(X,Y) = \frac{SS_{XY}}{n-1} = \frac{449}{9} = 49.88\ldots \)
We can conclude that there is a positive linear relationship between the number of people in a sales team and the sales generated.
\( SS_X = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = 5{,}022 - (10 \times 22^2) = 182 \)
\( SS_Y = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 = 22{,}822 - (10 \times 46^2) = 1{,}662 \)
\( r = \frac{SS_{XY}}{\sqrt{SS_X}\sqrt{SS_Y}} = \frac{449}{\sqrt{182}\sqrt{1{,}662}} = 0.816\ldots \)
(b) We can conclude that there is a fairly strong positive linear relationship between the number of people in a sales team and the sales generated.

3.38 (a) cov(ULP, Diesel) = 18.5301…; r = 0.6814…
(b) In NSW on this day there was a moderately strong positive linear relationship between petrol and diesel prices: where petrol prices are high, so are diesel prices.

3.39 (a) [Scatter plot: Annual Water Usage, Local Restaurants; vertical axis Annual Water Usage (kilolitres), horizontal axis Number of Seats]
(b) cov(size, water usage) = 10.4029…; r = 0.7941…
(c) From the results of parts (a) and (b) we can conclude that there is a moderately strong positive linear relationship between size and annual water usage.

3.40 All of the data collected would be considered to be the population. Let X = Exports; Y = Imports.
(a) \( SS_{XY} = \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) = 1{,}181{,}005{,}626 \)
\( \mathrm{cov}(X,Y) = \frac{SS_{XY}}{N} = \frac{1{,}181{,}005{,}626}{42} = 28{,}119{,}181 \)
(b) \( r = \frac{SS_{XY}}{\sqrt{SS_X}\sqrt{SS_Y}} = \frac{1{,}181{,}005{,}626}{\sqrt{1{,}638{,}229{,}311}\sqrt{1{,}501{,}163{,}228}} = 0.7531 \)
(c) The correlation coefficient is more valuable for expressing the relationship between exports and imports, as it does not depend on the measurement units.
(d) We can conclude that there is a relatively strong positive linear relationship between exports and imports.

3.41 Central tendency or location refers to the fact that most sets of data show a distinct tendency to group or cluster about a certain central point.

3.42 The arithmetic mean is a simple average of all the values, but is subject to the effect of extreme values. The median is the middle ranked value, but varies more from sample to sample than the arithmetic mean, although it is less susceptible to extreme values. The mode is the most common value, but is extremely variable from sample to sample.
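The covariance and correlation in 3.37 are computed from summary sums rather than from the raw pairs. The Python sketch below reproduces that shortcut, using the sums and means quoted in 3.37 (n = 10, ∑XY = 10,569, ∑X² = 5,022, ∑Y² = 22,822, X̄ = 22, Ȳ = 46). The function name and parameter names are illustrative, not taken from the text.

```python
import math

def cov_and_r(n, sum_xy, sum_x2, sum_y2, mean_x, mean_y):
    """Sample covariance and correlation coefficient from summary sums."""
    ss_xy = sum_xy - n * mean_x * mean_y   # sum of cross-products about the means
    ss_x = sum_x2 - n * mean_x ** 2        # sum of squares for X
    ss_y = sum_y2 - n * mean_y ** 2        # sum of squares for Y
    cov = ss_xy / (n - 1)                  # sample covariance
    r = ss_xy / math.sqrt(ss_x * ss_y)     # coefficient of correlation
    return cov, r

cov, r = cov_and_r(n=10, sum_xy=10_569, sum_x2=5_022, sum_y2=22_822,
                   mean_x=22, mean_y=46)
print(round(cov, 2), round(r, 3))
```

For population data, as in 3.40, the covariance divides by N instead of n − 1, while the correlation coefficient is unchanged because the divisor cancels.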
3.43 The first quartile is the value below which 1/4 of the total ranked observations will fall, the median is the value that divides the total ranked observations into two equal halves, and the third quartile is the observation above which 1/4 of the total ranked observations will fall.

3.44 Variation is the amount of dispersion, or 'spread', in the data.

3.45 The Z score measures how many sample standard deviations an observation in a data set is away from the sample mean.

3.46 The range is a simple measure, but only measures the difference between the extremes. The interquartile range measures the range of the centre fifty percent of the data. The standard deviation measures variation around the mean while the variance measures the squared variation around the mean, and these are the only measures that take into account each observation. The coefficient of variation measures the variation around the mean relative to the mean.

3.47 The Chebyshev rule applies to any type of distribution, while the empirical rule applies only to data sets that are approximately bell-shaped. The empirical rule is more accurate than the Chebyshev rule in approximating the concentration of data around the mean.

3.48 (a) mean = 5.5014; median = 5.515; first quartile = 5.44; third quartile = 5.57.
(b) range = 0.52; interquartile range = 0.13; variance = 0.0112; standard deviation = 0.10583; coefficient of variation = 1.924%.
(c) The mean weight of the tea bags in the sample is 5.5014 grams while the middle ranked weight is 5.515. The company should be concerned about the central tendency because that is where the majority of the weight will cluster around. The average of the squared differences between the weights in the sample and the sample mean is 0.0112, whereas the square root of it is 0.106 gram. The difference between the lightest and the heaviest tea bags in the sample is 0.52.
50% of the tea bags in the sample weigh between 5.44 and 5.57 grams. According to the empirical rule, about 68% of the tea bags produced will have weight that falls within 0.106 gram around 5.5014 grams. The company producing the tea bags should be concerned about the variation because tea bags will not weigh exactly the same due to various factors in the production process, e.g. temperature and humidity inside the factory, differences in the density of the tea, etc. Having some idea about the amount of variation will enable the company to adjust the production process accordingly.
(d) [Box-and-whisker plot: Teabags]
The data is slightly left skewed.
(e) On average, the weight of the tea bags is quite close to the target of 5.5 grams. Even though the mean weight is close to the target weight of 5.5 grams, the standard deviation of 0.106 indicates that about 75% of the tea bags will fall within 0.212 gram around the target weight of 5.5 grams. The interquartile range of 0.13 also indicates that half of the tea bags in the sample fall in an interval 0.13 gram around the median weight of 5.515 grams. The process can be adjusted to reduce the variation of the weight around the target mean.

3.49 (a) and (b)
Sample Data                          NSW ULP    NSW Diesel    QLD ULP    QLD Diesel
Number of Data Points                40         40            40         40
Minimum                              137.90     172.90        133.50     164.90
Maximum                              164.90     187.90        169.90     192.90
Total                                6064.50    7278.00       5809.30    6947.90
Arithmetic mean                      151.61     181.95        145.23     173.70
Median                               151.90     180.90        144.90     172.90
Mode                                 151.90     179.90        137.90     170.90
First Quartile                       144.9      179.9         137.9      170.9
Third Quartile                       158.9      185.9         149.9      176.9
Range                                27.00      15.00         36.40      28.00
Inter Quartile Range                 14.00      6.00          12.00      6.00
Variance (Sample)                    53.609     13.793        68.349     23.006
Standard Deviation (Sample)          7.322      3.714         8.267      4.796
Coefficient of Variation (Sample)    4.83%      2.04%         5.69%      2.76%

We can conclude that on this day, petrol and diesel prices were on average higher but less variable in NSW than in Queensland. Also, diesel prices are higher but less variable than unleaded petrol prices.
(c) [Box-and-whisker plots: Fuel Prices, August 2008, for Qld Diesel, Qld ULP, NSW Diesel and NSW ULP]
Petrol and diesel prices are skewed to the right in Queensland but symmetric and skewed to the left respectively in New South Wales. Furthermore, the box-and-whisker plots show that prices for both petrol and diesel are generally higher in New South Wales than in Queensland.
(d) cov(ULP, Diesel) = 26.3039…; r = 0.6633…
In Queensland on this day there was a moderately strong positive linear relationship between petrol and diesel prices: where petrol prices are high, so are diesel prices.

3.50 (a) & (b)
Sample Data:                         Total Mark
Number of Data Points                55
Minimum                              14
Maximum                              94
Total                                3343
Arithmetic mean                      60.782
Median                               63
Mode                                 Multi
First Quartile                       56
Third Quartile                       73
Range                                80
Inter Quartile Range                 17
Variance (Sample)                    327.581
Standard Deviation (Sample)          18.099
Coefficient of Variation (Sample)    0.298

The sample mean 60.782 is slightly less than the median 63, which indicates that the data may be skewed to the left with a few small values (marks). From the standard deviation of 18 we can conclude that the majority of marks, or a typical student's mark, will be in the range of approximately 61 ± 18, that is, 43 to 79. 50% of students have marks in the range 56 to 73, with 25% less than 56 and 25% more than 73.
(c) [Box-and-whisker plot: Total Mark out of 100]
Data seems skewed to the left, as the distance from the median to the lowest mark is longer than the distance from the median to the highest mark, and the left-hand whisker is longer than the right-hand whisker. However, the right-hand side of the box is longer than the left-hand side.
(d) \( \mathrm{cov}(X,Y) = \frac{1948.8}{49} = 39.771\ldots \)
\( r = \frac{\mathrm{cov}(X,Y)}{S_X S_Y} = \frac{39.771\ldots}{7.48\ldots \times 6.69\ldots} = 0.7939\ldots \)
(e) There is a strong positive linear relationship between a student's semester mark and their exam mark.

3.51 (a)
                      2001 Males    2001 Females    2006 Males    2006 Females
Mean                  35.3          37.1            36.4          38.1
Standard deviation    21.8          22.7            22.2          22.9

Note: these are population parameters and are approximations, as calculated from the frequency distribution with the midpoint of the last class (85 and over) estimated to be 87.
(b) These statistics show that the average age for females is higher than for males and slightly more varied. Furthermore, the Australian population is growing older, with the average age of both males and females increasing from 2001 to 2006. Ages are also slightly more varied in 2006 than in 2001.

3.52 (a), (b)
                            Plant A    Plant B
Mean                        9.382      11.354
Median                      8.515      11.96
Q1                          7.29       6.25
Q3                          11.42      14.25
Range                       17.2       23.42
Interquartile range         4.13       8
Variance                    15.981     26.277
Standard deviation          3.998      5.126
Coefficient of variation    42.61%     45.15%

(c) [Box-and-whisker plot: Plants A and B]
The distribution of processing times for Plant A is right-skewed.
(d) Processing times for Plants A and B are quite different. Plant B has a greater range of processing times, much more dispersion among data values, a higher median, a higher value for the third quartile, and a greater extreme value than Plant A.

3.53 (a) Other charts are also appropriate.
[Box-and-whisker plots: Annual Household Income ($'000) and Monthly Account ($)]
(b)
Sample Data:                         Income $'000    Amount $
Number of Data Points                40              40
Minimum                              11              0.47
Maximum                              245             182.2
Total                                3,622           2,104.99
Arithmetic mean                      90.55           52.625
Median                               74.5            47.18
Mode                                 Multi           Nil
First Quartile                       40              5.47
Third Quartile                       122             89.15
Range                                234             181.73
Inter Quartile Range                 77.5            78.475
Variance (Sample)                    3,950.613       1,906.184
Standard Deviation (Sample)          62.854          43.66
Coefficient of Variation (Sample)    0.684           0.83

(c) From the box-and-whisker plots we can see that the distributions of both household income and monthly account are skewed to the right. This is supported by the mean being higher than the median in both cases, due to the mean being affected by a few large values. The annual incomes of 25% of households are less than $40,000 and 25% are more than $122,000, while the monthly long distance call accounts of 25% of households are less than $5.47 and 25% are more than $89.15.
(d) [Scatter plot: Monthly Long Distance Account ($) against Annual Household Income ($'000)]
(e) r = 0.565….
(f) The scatter plot and the value of the correlation coefficient show that there is a weak positive linear relationship between a household's income and the monthly long distance phone account.
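Solution 3.47 contrasts the Chebyshev rule with the empirical rule, and the 75% figure quoted in 3.48 (e) for two standard deviations matches the Chebyshev bound. As a hedged illustration, the Python sketch below computes the Chebyshev lower bound 1 − 1/k² and prints it beside the usual empirical-rule percentages (68%, 95%, 99.7%); the names `chebyshev_lower_bound` and `EMPIRICAL_RULE` are illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the Chebyshev rule is only informative for k > 1")
    return 1 - 1 / k ** 2

# Approximate shares for bell-shaped data under the empirical rule.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    print(f"k = {k}: Chebyshev at least {bound:.2%}, empirical rule about {EMPIRICAL_RULE[k]:.1%}")
```

The Chebyshev figures are guaranteed minimums for any distribution, which is why they are lower than the bell-shaped approximations.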
3.54 (a) and (b)
Capital City Suburb: Rejoice Chocolates
                            Rejoice
Mean                        554.76
First Quartile              441.9
Median                      490.35
Third Quartile              596.6
Mode                        #N/A
Standard Deviation          216.898
Sample Variance             47044.927
Range                       747.3
Interquartile Range         154.7
Coefficient of Variation    39.10%
Minimum                     385.2
Maximum                     1132.5
Sum                         5547.6
Count                       10

For the capital city suburban store, the mean weekly sales for Rejoice chocolates is 554.76 kg. Furthermore, for 50% of weeks sales are between 441.9 kg and 596.6 kg. As the median is less than the mean, the distribution may be skewed to the right. The range is large compared to the interquartile range.

(c)
Rejoice     Z-score
385.2       –0.8
412.0       –0.7
441.9       –0.5
445.2       –0.5
453.4       –0.5
527.3       –0.1
545.1       0.0
596.6       0.2
608.4       0.2
1,132.5     2.7

As 90% of the values are within one standard deviation of the mean, the distribution does not follow the empirical rule, so it is not a symmetric mound shaped distribution. Also, the weekly sales of 1132.5 kg may be a possible outlier, so should be investigated.

(d) [Box-and-whisker plot: Weekly Sales, kg]
The distribution of weekly sales of Rejoice chocolates for this store is skewed to the right.

3.55 (a) Geometric mean change in CPI for Australia 2004 to 2008 is 3.12%. Geometric mean change in CPI for New Zealand 2002 to 2006 is 3.04%.
(b) For the given period New Zealand's annual inflation rate was on average slightly less than Australia's annual rate of 3.12%.

3.56 (a) Forgive: r = 0.9260…; Rejoice: r = 0.9701…
For both products there is a very strong positive linear relationship between weekly quantity sold and associated costs.
(b) r = –0.3420…; there is a weak negative linear relationship between Forgive and Rejoice chocolates. When sales are high for one, sales tend to be low for the other.

3.57 Much of this output is not valid.
Examples
• Gender and major are categorical variables, so box-and-whisker plots are not valid for this data, nor are the calculated statistics (mean, median, etc.).
• Height and grade point value are numerical variables, so pie charts are not valid for this data.

3.58 Answers will vary.
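The empirical-rule check described in 3.54 (counting how many observations fall within one, two or three standard deviations of the mean) can be sketched in Python using the ten weekly sales figures listed there. The helper name `share_within` is hypothetical, and the benchmark percentages are the usual 68%, 95% and 99.7% for bell-shaped data.

```python
import statistics

# Weekly sales (kg) of Rejoice chocolates, as listed in 3.54.
sales = [385.2, 412.0, 441.9, 445.2, 453.4, 527.3, 545.1, 596.6, 608.4, 1132.5]

mean = statistics.mean(sales)
sd = statistics.stdev(sales)   # sample standard deviation (n - 1 divisor)

def share_within(values, centre, spread, k):
    """Share of values lying within k spreads of the centre."""
    inside = [v for v in values if abs(v - centre) <= k * spread]
    return len(inside) / len(values)

for k, expected in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    observed = share_within(sales, mean, sd, k)
    print(f"within {k} SD: observed {observed:.0%}, empirical rule about {expected:.1%}")
```

With only ten observations the comparison is rough, but a one-standard-deviation share far above 68% is the signal the solution uses to doubt a mound shape.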
