Chapter 6 Outline solutions

In all these solutions, don't be put off if you have minor disagreements with my answers. These can be due to rounding of figures within calculations, and are nothing to get excited about. You should be able to tell whether your answers are near enough to be correct, or whether they are sufficiently different to suggest that you've actually made a mistake. Definite signs that something has gone wrong are a negative number under the square root sign in a standard deviation calculation, a mean which is outside the range of the original data, etc!

1. (a) 1.5, 2, 2, 3, 3.5, 4, 4, 4, 6, 7

It's easiest to arrange the data as a table, and to use the formulae for ungrouped data:

x     1.5    2   2   3   3.5     4    4    4    6    7    Total  37
x²    2.25   4   4   9   12.25   16   16   16   36   49   Total  164.5

Mean, x̄ = Σx / n = 37 / 10 = 3.7 minutes

Standard deviation = √(Σx²/n − x̄²) = √(164.5/10 − 3.7²) = 1.66 minutes

The median will be the (10 + 1)/2 = 5.5th value. That's halfway between 3.5 and 4, which is 3.75 minutes. The quartiles are at the 11/4 = 2.75th and the 3 × 11/4 = 8.25th positions, giving a value ¾ of the way between 2 and 2 (that's 2 minutes) and one ¼ of the way between 4 and 6 (that's 4.5 minutes).

2. Here we need to set up the following table. We've had to make a decision about closing the top class, so I've assumed that no call lasts more than 20 minutes. If you've made a different assumption you will have different answers:

Duration (mins)    No. of calls f      x       fx        fx²
less than 1               6           0.5      3         1.5
1 but under 2            19           1.5     28.5      42.75
2 but under 3            32           2.5     80       200
3 but under 5            65           4      260      1040
5 but under 10           36           7.5    270      2025
10 or more               11          15      165      2475
Total                   169                  806.5    5784.25

and to use the formulae for grouped data.

Mean = Σfx / Σf = 806.5 / 169 = 4.77 minutes.

Standard deviation = √(Σfx²/Σf − x̄²) = √(5784.25/169 − 4.77²) = 3.38 minutes (using the unrounded mean, 4.7722; with the mean rounded to 4.77 you would get 3.39).
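If you want to check the arithmetic in questions 1 and 2 by machine, the short Python sketch below is one way to do it. It is only an illustration: the function names `mean_and_sd` and `value_at_position` are made up for this example, and the position rule it uses for the median and quartiles is the (n + 1)/2, (n + 1)/4 and 3(n + 1)/4 rule applied above.

```python
import math


def mean_and_sd(values, freqs=None):
    """Return (mean, sd) using sd = sqrt(sum(f*x^2)/sum(f) - mean^2).

    With freqs=None every value counts once (ungrouped data).
    """
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / n
    # max(..., 0.0) guards against a tiny negative value caused by rounding
    return mean, math.sqrt(max(mean_of_squares - mean ** 2, 0.0))


def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position such as 2.75."""
    lower = math.floor(position)
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + (position - lower) * (above - below)


# Question 1(a): ungrouped data, already sorted
times = [1.5, 2, 2, 3, 3.5, 4, 4, 4, 6, 7]
n = len(times)
mean1, sd1 = mean_and_sd(times)
median1 = value_at_position(times, (n + 1) / 2)
q1 = value_at_position(times, (n + 1) / 4)
q3 = value_at_position(times, 3 * (n + 1) / 4)

# Question 2: class midpoints (top class assumed closed at 20 minutes)
midpoints = [0.5, 1.5, 2.5, 4, 7.5, 15]
calls = [6, 19, 32, 65, 36, 11]
mean2, sd2 = mean_and_sd(midpoints, calls)

print(round(mean1, 2), round(sd1, 2), median1, q1, q3)
print(round(mean2, 2), round(sd2, 2))
```

The script carries the unrounded mean through to the standard deviation, which is why small differences from a hand calculation with rounded intermediate figures are to be expected.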
For the median and quartiles we need the cumulative table:

Duration (mins)    Cumulative frequency
less than 1               6
less than 2              25
less than 3              57
less than 5             122
less than 10            158
less than 20            169

Median = (169 + 1)/2 = 85th value (it would be OK here to use the 169/2 = 84.5th value since the sample is large; the discrepancy in the answers will be small). The 85th value comes in the class 3 but less than 5. The proportion calculation as shown in the book gives:

median = 3 + (85 − 57)/(122 − 57) × 2 = 3 + (28/65) × 2 = 3.86 minutes.

Quartile 1 is the 170/4 = 42.5th value, lying in the class 2 but less than 3, so we get

Q1 = 2 + (42.5 − 25)/(57 − 25) × 1 = 2.55 minutes,

and in the same way Q3 = 127.5th value = 5.76 minutes.

The mean is bigger than the median here, and Q3 is further from the median than Q1, because the distribution has a long tail at the upper end, and those high values 'pull' the mean and Q3 upwards. So the median probably gives a better picture of the 'typical' length of a phone call.

3. Use the same formulae and table layout as above, except that we don't need to use a class midpoint for x here, since the values (marks) are discrete. You should find that mean = 1192/189 = 6.31 marks, standard deviation = 1.79 marks, median = 6 marks, Q1 = 5 marks and Q3 = 8 marks (there is no need to use the interpolation formula to calculate median and quartiles in this case, since the marks are discrete).

4. Working out the class mid-point is always tricky with age data. The point is that we go on quoting our age as, say, 70 until the day before our 71st birthday. Thus the class 61-70 actually includes everyone up to 70 years and 364 days, so it's really '61 but less than 71'. Its mid-point is thus 66. The same applies for all the other classes. For example, 71-75 is really '71 but under 76', thus having mid-point 73.5.
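The mid-point rule for age classes can be written as a one-line calculation. The sketch below is only an illustration (the function name `age_class_midpoint` is invented here); it applies the rule to the seven age classes used in question 4.

```python
def age_class_midpoint(first_age, last_age):
    """Mid-point of an age class quoted in completed years.

    A class written 'first_age-last_age' really runs from first_age up to
    (but not including) last_age + 1, so that is the upper limit used.
    """
    true_upper_limit = last_age + 1
    return (first_age + true_upper_limit) / 2


# Age classes from question 4, as (first age, last age) pairs
classes = [(61, 70), (71, 75), (76, 80), (81, 85), (86, 90), (91, 95), (96, 100)]
midpoints = [age_class_midpoint(first, last) for first, last in classes]
print(midpoints)
```

For 61-70 this gives 66 and for 71-75 it gives 73.5, matching the values worked out above.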
We then have the table below:

Age range    No. of residents f     x        fx         fx²
61-70                3             66       198        13068
71-75               12             73.5     882        64827
76-80               16             78.5    1256        98596
81-85               27             83.5    2254.5     188250.75
86-90               18             88.5    1593       140980.5
91-95                7             93.5     654.5      61195.75
96-100               2             98.5     197        19404.5
Total               85                     7035       586322.5

so mean = 7035/85 = 82.76 years, standard deviation = 6.92 years.

The cumulative frequency table is:

Age range    Cumulative frequency
under 71             3
under 76            15
under 81            31
under 86            58
under 91            76
under 96            83
under 101           85

(again, the limits are 71, 76 etc., not 70, 75 etc.) So median = 43rd value = 81 + (12/27) × 5 years = 83.22 years. Similarly Q1 = 78.03 years, and Q3 = 87.81 years. Here the mean is slightly pulled downwards by the small proportion of younger people in the sample.
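The proportion (interpolation) calculation for the median and quartiles of grouped data can be sketched in Python as follows. This is an illustration rather than the book's own method written out: the function name `grouped_quantile` is invented, the positions use the (n + 1)/2, (n + 1)/4 and 3(n + 1)/4 rule from these solutions, and the lowest class limits (61 years for the ages, 0 minutes for the calls in question 2) are assumptions.

```python
def grouped_quantile(upper_limits, cum_freqs, position, lowest_limit):
    """Interpolate the value at a given position within grouped data.

    upper_limits[i] is the true upper limit of class i, and cum_freqs[i]
    is the number of observations below that limit.
    """
    lower_limit, cum_below = lowest_limit, 0
    for upper, cum in zip(upper_limits, cum_freqs):
        if position <= cum:
            width = upper - lower_limit
            share = (position - cum_below) / (cum - cum_below)
            return lower_limit + share * width
        lower_limit, cum_below = upper, cum
    raise ValueError("position exceeds the total frequency")


# Question 4: true upper limits are 71, 76, ... (not 70, 75, ...)
age_limits = [71, 76, 81, 86, 91, 96, 101]
age_cum = [3, 15, 31, 58, 76, 83, 85]
n_ages = age_cum[-1]
age_median = grouped_quantile(age_limits, age_cum, (n_ages + 1) / 2, 61)
age_q1 = grouped_quantile(age_limits, age_cum, (n_ages + 1) / 4, 61)
age_q3 = grouped_quantile(age_limits, age_cum, 3 * (n_ages + 1) / 4, 61)

# Question 2: top class assumed closed at 20 minutes
call_limits = [1, 2, 3, 5, 10, 20]
call_cum = [6, 25, 57, 122, 158, 169]
n_calls = call_cum[-1]
call_median = grouped_quantile(call_limits, call_cum, (n_calls + 1) / 2, 0)
call_q1 = grouped_quantile(call_limits, call_cum, (n_calls + 1) / 4, 0)
call_q3 = grouped_quantile(call_limits, call_cum, 3 * (n_calls + 1) / 4, 0)

print(round(age_median, 2), round(age_q1, 2), round(age_q3, 2))
print(round(call_median, 2), round(call_q1, 2), round(call_q3, 2))
```

Working from the cumulative table rather than the raw class frequencies keeps the code close to the hand calculation: each step of the loop corresponds to reading one row of the "under ..." table.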