Averages and spread 2
Teacher guidance notes: worked examples incorporating solutions

1. Raw data

Raw data refers to a data set where each data value is known individually. All of the different measures of location and dispersion can be calculated from raw data.

Example

In an experiment, scientists from CERN were tested for their level of radiation exposure (millisievert, mSv) over a year. The results were as follows:

12.3  21.2  19.0  13.1  17.1  18.1  24.0  15.1  15.4  21.7  18.2

a. Calculate the following numerical measures.

i. Mean

$$\bar{x} = \frac{12.3 + 21.2 + 19.0 + 13.1 + 17.1 + 18.1 + 24.0 + 15.1 + 15.4 + 21.7 + 18.2}{11} = \frac{195.2}{11} = 17.745 = 17.7 \text{ mSv (1dp)}.$$

ii. Median

If there are n data values in ascending order, the median is the $\left(\frac{n+1}{2}\right)$th data value. In this case, n = 11, so the median is the 6th value.

In ascending order:

12.3  13.1  15.1  15.4  17.1  18.1  18.2  19.0  21.2  21.7  24.0

The median is the middle value, 18.1 mSv.

iii. Mode

Each data value appears once, so there is no mode. If the data were rounded to the nearest whole number (12, 13, 15, 15, 17, 18, 18, 19, 21, 22, 24) there would be twin modes at 15 and 18. The mode is of limited use here.

iv. Interquartile range

If there are n data values in ascending order, the lower quartile is the $\left(\frac{n+1}{4}\right)$th data value and the upper quartile is the $\left(\frac{3(n+1)}{4}\right)$th data value. Here, n = 11, so the quartiles are the 3rd and 9th values.

In ascending order:

12.3  13.1  15.1  15.4  17.1  18.1  18.2  19.0  21.2  21.7  24.0

The upper and lower quartiles are 21.2 and 15.1, so the interquartile range is 21.2 − 15.1 = 6.1 mSv.

v. Range

The range is the difference between the highest data value (24.0) and the lowest (12.3).

Range = 24.0 − 12.3 = 11.7 mSv.

vi. Variance

The variance is the average of the squared deviations from the mean:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2.$$

$$\sum x^2 = 12.3^2 + 21.2^2 + 19.0^2 + 13.1^2 + 17.1^2 + 18.1^2 + 24.0^2 + 15.1^2 + 15.4^2 + 21.7^2 + 18.2^2 = 3596.66$$

So

$$\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{3596.66}{11} - \left(17.7\dot{4}\dot{5}\right)^2 = 326.9691 - 314.9012 = 12.0679.$$

vii. Standard deviation

The standard deviation is the square root of the variance:

$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{12.0679} = 3.4739 = 3.5 \text{ mSv (1dp)}.$$

b. Interpretation

The safe level for one year's exposure is 20.0 mSv. Explain whether the following statement is correct, using the measures you have just calculated.

"The scientists at CERN are working within the safe levels of radioactive exposure."

The statement is incorrect, since the upper quartile of 21.2 mSv is above the safe limit, which means that about a quarter of the scientists exceeded it (three of the eleven readings are above 20.0 mSv). The safe limit should apply to each scientist individually, rather than to the average scientist's exposure.

2. Frequency distribution

A frequency distribution is a compact way of describing raw data when some of the readings occur more than once. Instead of listing all the data values individually, a frequency distribution lists each different value along with its frequency (the number of times it occurs).

Averages and measures of spread can be calculated by reconstructing the raw data, but for the mean, variance and standard deviation it is easier to adapt the formulas to take frequencies into account:

$$\bar{x} = \frac{\sum fx}{n} \quad \text{where } n = \sum f$$

$$\sigma^2 = \frac{\sum fx^2}{n} - \bar{x}^2$$

$$\sigma = \sqrt{\frac{\sum fx^2}{n} - \bar{x}^2}$$

These are best calculated from a table, as in the following example.

Example

Isabella went up and down the street to find out how many parking spaces each house has. Here are her results:

Number of parking spaces    Frequency
1                           15
2                           27
3                            8
4                            3

a. Calculate the mean, variance and standard deviation.
This can be done by extending the table:

Number of parking spaces, x    Frequency, f    fx     fx²
1                              15              15     15
2                              27              54     108
3                               8              24     72
4                               3              12     48
Totals                         n = Σf = 53     Σfx = 105    Σfx² = 243

Mean:

$$\bar{x} = \frac{\sum fx}{n} = \frac{105}{53} = 1.981132 = 1.98 \text{ (3SF)}.$$

Variance:

$$\sigma^2 = \frac{\sum fx^2}{n} - \bar{x}^2 = \frac{243}{53} - (1.981132)^2 = 4.584906 - 3.924884 = 0.660021.$$

Standard deviation:

$$\sigma = \sqrt{\frac{\sum fx^2}{n} - \bar{x}^2} = \sqrt{0.660021} = 0.812417 = 0.81 \text{ (2dp)}.$$

The mean number of parking spaces is 2.0 and the standard deviation is 0.8 parking spaces (both to 1dp).

b. Calculate the median and interquartile range.

If there are n data values in ascending order, the median is the $\left(\frac{n+1}{2}\right)$th data value. In this case, n = 53, so the median is the 27th value.

To work this out, it is necessary to imagine (but not write out) the raw data in ascending order. The raw data looks like this:

1, 1, 1, ..., 1, 1 (15 times), 2, 2, 2, ..., 2, 2 (27 times), 3, 3, 3, ..., 3, 3 (8 times), 4, 4, 4 (3 times)

The first 15 values are 1s and the 16th to 42nd values are 2s. The 27th value is therefore one of the 2s, so the median is 2.

The quartiles are the $\left(\frac{n+1}{4}\right)$th and $\left(\frac{3(n+1)}{4}\right)$th data values. Here, n = 53, so the quartiles are the 13.5th and 40.5th values.

The 40.5th value is the average of the 40th and 41st values, both of which are 2s (the 2s run from the 16th to the 42nd value). The upper quartile is therefore 2.

The 13.5th value is the average of the 13th and 14th values, both of which are 1s. The lower quartile is therefore 1.

The interquartile range is 2 − 1 = 1.

c. Calculate the mode and the range.

The mode is the data value with the highest frequency. In this case the mode is 2.

The range is the difference between the highest and lowest data values. These are 4 and 1, so the range is 4 − 1 = 3.

3. Grouped data

Sometimes data is grouped, either because the data is continuous or because there are a lot of different data values. For example, we might group together all the values between 0 and 10 in one group, those from 10 to 20 in another and so on.
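As a rough illustration of grouping, the sketch below counts how many values fall into each interval of width 10, using the radiation readings from the raw data example as input. The helper name `group_counts` and the convention that each interval includes its lower boundary but not its upper one are assumptions made for this illustration.

```python
from collections import Counter

def group_counts(values, width):
    """Count the values in each interval k*width <= x < (k+1)*width.

    Hypothetical helper for illustration; only non-empty intervals are returned.
    """
    counts = Counter(int(v // width) for v in values)
    return {(k * width, (k + 1) * width): counts[k] for k in sorted(counts)}

# Radiation readings (mSv) from the raw data example.
readings = [12.3, 21.2, 19.0, 13.1, 17.1, 18.1, 24.0, 15.1, 15.4, 21.7, 18.2]

for (low, high), frequency in group_counts(readings, 10).items():
    print(f"{low} <= x < {high}: frequency {frequency}")
```

With a width of 10, eight readings fall in 10 ≤ x < 20 and three in 20 ≤ x < 30. Once the data is stored this way, the individual readings can no longer be recovered from the table.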
Grouped data looks a bit like a frequency distribution but, instead of having the frequency for each data value, we have the frequency for each group of data.

We describe the grouping using intervals: 0 ≤ x < 10, 10 ≤ x < 25, etc. The intervals can be different sizes, but we need to be careful that they do not overlap or leave gaps. For example:

the intervals 0 ≤ x ≤ 10, 10 ≤ x < 25 overlap, since 10 could go into either group;
the intervals 0 ≤ x ≤ 9, 10 ≤ x ≤ 24 leave a gap, since 9.5 does not fit into either group.

Once data has been grouped, we do not know the individual data values. This means we can only estimate the averages and measures of spread.

Grouped data is dealt with in much the same way as a frequency distribution, but there is one extra step: we have to decide what to use as the x-value of each group. So that everyone does this in the same way, we agree to use the midpoint of each group. We do not know the data values, so we assume that they are all at the midpoint.

The midpoint is calculated by averaging the group boundaries. For example, using the intervals 0 ≤ x < 10, 10 ≤ x < 25, the first midpoint is at $\frac{0+10}{2} = 5$ and the second midpoint is at $\frac{10+25}{2} = 17.5$.

Example

A survey is conducted to look into the amount of money the average customer spends at a supermarket checkout. This was done with a sample of 100 people. The information was then grouped into the following intervals:

Amount spent (£)    Frequency
5 ≤ x < 25          10
25 ≤ x < 40         13
40 ≤ x < 70         12
70 ≤ x < 100        29
100 ≤ x < 150       23
150 ≤ x < 200       13

a. Estimate the mean and standard deviation of this data.

Note that the interval 5 ≤ x < 25 has a frequency of 10. This tells us that 10 people spent between £5 and £24.99, since £25 would be included in the next group up.
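The two rules for grouped data (no gaps or overlaps, and the midpoint as the x-value of each group) can be checked mechanically. The following is a minimal Python sketch, assuming each interval is stored as a (lower, upper) pair meaning lower ≤ x < upper; the function names are illustrative only.

```python
# Supermarket intervals from the example, each meaning lower <= x < upper.
intervals = [(5, 25), (25, 40), (40, 70), (70, 100), (100, 150), (150, 200)]

def check_contiguous(intervals):
    """Return True if every interval starts exactly where the previous one ends.

    With 'lower <= x < upper' intervals this rules out both gaps and overlaps.
    """
    return all(prev[1] == nxt[0] for prev, nxt in zip(intervals, intervals[1:]))

def midpoints(intervals):
    """Average the two boundaries of each interval."""
    return [(low + high) / 2 for low, high in intervals]

print(check_contiguous(intervals))  # True
print(midpoints(intervals))         # [15.0, 32.5, 55.0, 85.0, 125.0, 175.0]
```

These midpoints are the x-values used to estimate the mean and standard deviation.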
The estimates can be made by extending the table:

Amount spent (£)    Midpoint, x    Frequency, f    fx        fx²
5 ≤ x < 25          15.0           10              150.0     2250.00
25 ≤ x < 40         32.5           13              422.5     13731.25
40 ≤ x < 70         55.0           12              660.0     36300.00
70 ≤ x < 100        85.0           29              2465.0    209525.00
100 ≤ x < 150       125.0          23              2875.0    359375.00
150 ≤ x < 200       175.0          13              2275.0    398125.00
Totals                             n = 100         Σfx = 8847.5    Σfx² = 1019306.25

Mean:

$$\bar{x} = \frac{\sum fx}{n} = \frac{8847.5}{100} = 88.475 = £88.48$$

Variance:

$$\sigma^2 = \frac{\sum fx^2}{n} - \bar{x}^2 = \frac{1019306.25}{100} - (88.475)^2 = 10193.0625 - 7827.825625 = 2365.236875$$

Standard deviation:

$$\sigma = \sqrt{\frac{\sum fx^2}{n} - \bar{x}^2} = \sqrt{2365.236875} = 48.6337 = £48.63$$

b. Estimate the median and interquartile range of this data.

The estimates can be made by working out the cumulative frequency table. This indicates how many data values lie below each group boundary value:

Amount spent (£)    Endpoint, x    Cumulative frequency
                    5              0
5 ≤ x < 25          25             10
25 ≤ x < 40         40             23
40 ≤ x < 70         70             35
70 ≤ x < 100        100            64
100 ≤ x < 150       150            87
150 ≤ x < 200       200            100

A cumulative frequency graph has cumulative frequency on the y-axis and data values on the x-axis. The lower quartile, median and upper quartile are the data values on the x-axis corresponding to the cumulative frequencies of $\frac{n}{4}$, $\frac{n}{2}$ and $\frac{3n}{4}$ respectively on the y-axis.

In this case, where n = 100, we can read off the lower quartile, median and upper quartile as the values on the x-axis corresponding to cumulative frequencies of 25, 50 and 75.

LQ = £45
Median = £86
UQ = £124

The median is £86 to the nearest pound.

The interquartile range is £124 − £45 = £79, to the nearest pound.
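The grouped-data estimates can be reproduced with a short Python sketch. It assumes the cumulative frequency points are joined by straight lines, so that reading a value off the graph becomes linear interpolation within a group; a hand-drawn smooth curve could give slightly different readings. The function names are illustrative, not part of the original notes.

```python
from math import sqrt

# Supermarket data: (lower, upper, frequency), each meaning lower <= x < upper.
groups = [(5, 25, 10), (25, 40, 13), (40, 70, 12),
          (70, 100, 29), (100, 150, 23), (150, 200, 13)]

def grouped_mean_sd(groups):
    """Estimate the mean and standard deviation using interval midpoints."""
    n = sum(f for _, _, f in groups)
    sum_fx = sum(f * (low + high) / 2 for low, high, f in groups)
    sum_fx2 = sum(f * ((low + high) / 2) ** 2 for low, high, f in groups)
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    return mean, sqrt(variance)

def value_at(groups, target):
    """Data value at which the cumulative frequency reaches `target`,
    assuming straight lines between the cumulative frequency points."""
    cumulative = 0
    for low, high, f in groups:
        if cumulative + f >= target:
            return low + (target - cumulative) / f * (high - low)
        cumulative += f
    raise ValueError("target is larger than the total frequency")

n = sum(f for _, _, f in groups)
mean, sd = grouped_mean_sd(groups)
lq, median, uq = (value_at(groups, n * k / 4) for k in (1, 2, 3))

print(f"mean = {mean:.3f}, standard deviation = {sd:.2f}")
print(f"LQ = {lq:.0f}, median = {median:.0f}, UQ = {uq:.0f}")
```

Under the straight-line assumption the quartiles round to £45, £86 and £124, in agreement with the values read from the graph.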