AS-Level Maths: Statistics 1 for Edexcel
S1.2 Calculating means and standard deviations
© Boardworks Ltd 2005

Contents
Calculating means
Calculating standard deviations
Coding

Means

Mean

The mean is the most widely used average in statistics. It is found by adding up all the values in the data and dividing by how many values there are.

Notation: If the data values are $x_1, x_2, x_3, \ldots, x_n$, then the mean is

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$

Here $\bar{x}$ is the symbol for the mean, and $\sum x_i$ means the total of all the x values.

Note: The mean takes into account every piece of data, so it is affected by outliers in the data. The median is preferred over the mean if the data contains outliers or is skewed.

If data are presented in a frequency table:

Value        x_1    x_2    ...    x_n
Frequency    f_1    f_2    ...    f_n

then the mean is

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_n f_n}{\sum f_i} = \frac{\sum x_i f_i}{\sum f_i}$$

Example: The table shows the results of a survey into household size. Find the mean size.

To find the mean, we add a third column, x × f, to the table.

Household size, x    Frequency, f    x × f
1                    20              20
2                    28              56
3                    25              75
4                    19              76
5                    16              80
6                    6               36
TOTAL                114             343

Mean = 343 ÷ 114 = 3.01 (3 s.f.)

Standard deviation

There are three commonly used measures of spread (or dispersion): the range, the inter-quartile range and the standard deviation.

The standard deviation is widely used in statistics to measure spread. It is based on all the values in the data, so it is sensitive to the presence of outliers in the data.
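As a brief aside before the formulae for spread, the frequency-table mean from the household-size example can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `frequency_mean` is made up for illustration.

```python
def frequency_mean(values, freqs):
    """Mean of data given as a frequency table: sum(x*f) / sum(f).

    Hypothetical helper for illustration only.
    """
    total_xf = sum(x * f for x, f in zip(values, freqs))
    total_f = sum(freqs)
    return total_xf / total_f


# Household-size survey from the example above.
household_size = [1, 2, 3, 4, 5, 6]
frequency = [20, 28, 25, 19, 16, 6]

mean_size = frequency_mean(household_size, frequency)
print(round(mean_size, 2))  # 343 / 114, which rounds to 3.01
```

The sketch mirrors the extra x × f column in the table: each value is multiplied by its frequency, the products are totalled, and the total is divided by the total frequency.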
The variance is related to the standard deviation:

variance = (standard deviation)²

The following formulae can be used to find the variance and s.d.:

$$\text{variance} = \frac{\sum (x_i - \bar{x})^2}{n} \qquad \text{s.d.} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

Example: The mid-day temperatures (in °C) recorded for one week in June were:

21, 23, 24, 19, 19, 20, 21

First we find the mean:

$$\bar{x} = \frac{21 + 23 + \cdots + 21}{7} = \frac{147}{7} = 21\,°\text{C}$$

Then we tabulate the deviations from the mean and their squares:

x_i    x_i - x̄    (x_i - x̄)²
21     0          0
23     2          4
24     3          9
19     -2         4
19     -2         4
20     -1         1
21     0          0
Total:            22

So variance = 22 ÷ 7 = 3.143
So s.d. = 1.77°C (3 s.f.)

There is an alternative formula which is usually a more convenient way to find the variance. Starting from

$$\text{variance} = \frac{\sum (x_i - \bar{x})^2}{n}$$

we expand the sum, using $\sum x_i = n\bar{x}$:

$$\sum (x_i - \bar{x})^2 = \sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2) = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 = \sum x_i^2 - 2\bar{x}(n\bar{x}) + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2$$

Therefore,

$$\text{variance} = \frac{\sum x_i^2}{n} - \bar{x}^2 \qquad \text{and} \qquad \text{s.d.} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2}$$

Example (continued): Looking again at the temperature data for June:

21, 23, 24, 19, 19, 20, 21

We know that $\bar{x} = \frac{147}{7} = 21\,°\text{C}$.

Also, $\sum x_i^2 = 21^2 + 23^2 + \cdots + 21^2 = 3109$.

So,

$$\text{variance} = \frac{\sum x_i^2}{n} - \bar{x}^2 = \frac{3109}{7} - 21^2 = 3.143 \qquad \text{s.d.} = 1.77\,°\text{C}$$

Note: Essentially the standard deviation is a measure of how close the values are to the mean value.

Calculating standard deviation from a table

When the data is presented in a frequency table, the formula for finding the standard deviation needs to be adjusted slightly:

$$\text{s.d.} = \sqrt{\frac{\sum f_i x_i^2}{\sum f_i} - \bar{x}^2}$$

Example: A class of 20 students were asked how many times they exercise in a normal week. Find the mean and the standard deviation.

The table can be extended with x × f and x² × f columns to help find the mean and the s.d.

No. of times exercise taken, x    Frequency, f    x × f    x² × f
0                                 5               0        0
1                                 3               3        3
2                                 5               10       20
3                                 4               12       36
4                                 2               8        32
5                                 1               5        25
TOTAL:                            20              38       116

$$\bar{x} = \frac{38}{20} = 1.9 \qquad \text{s.d.} = \sqrt{\frac{\sum f_i x_i^2}{\sum f_i} - \bar{x}^2} = \sqrt{\frac{116}{20} - 1.9^2} = 1.48$$

If data is presented in a grouped frequency table, it is only possible to estimate the mean and the standard deviation. This is because the exact data values are not known. An estimate is obtained by using the mid-point of an interval to represent each of the values in that interval.

Example: The table shows the annual mileage for the employees of an insurance company. Estimate the mean and standard deviation.

Annual mileage, x         Frequency
0 ≤ x < 5000              6
5000 ≤ x < 10,000         17
10,000 ≤ x < 15,000       14
15,000 ≤ x < 20,000       5
20,000 ≤ x < 30,000       3

Using the mid-point of each interval as x:

Mileage            Frequency, f    Mid-point, x    f × x      f × x²
0 – 5000           6               2500            15,000     37,500,000
5000 – 10,000      17              7500            127,500    956,250,000
10,000 – 15,000    14              12,500          175,000    2,187,500,000
15,000 – 20,000    5               17,500          87,500     1,531,250,000
20,000 – 30,000    3               25,000          75,000     1,875,000,000
TOTAL              45                              480,000    6,587,500,000

$$\bar{x} = \frac{480{,}000}{45} = 10{,}667 \text{ miles}$$

$$\text{s.d.} = \sqrt{\frac{6{,}587{,}500{,}000}{45} - 10{,}666.67^2} = 5711 \text{ miles}$$

(The unrounded mean is used inside the square root; substituting the rounded value 10,667 gives 5710.)

Notes about standard deviation

Here are some notes to consider about standard deviation.

For many roughly symmetrical, bell-shaped distributions, about two-thirds of the data will lie within 1 standard deviation of the mean (approximately 68% for a normal distribution), whilst nearly all the data values (about 95%) will lie within 2 standard deviations of the mean.

Values that lie more than 2 standard deviations from the mean are sometimes classed as outliers. Any such values should be treated carefully.

Standard deviation is measured in the same units as the original data. Variance is measured in the same units squared.

Most calculators have a built-in function which will find the standard deviation for you. Learn how to use this facility on your calculator.
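Alongside a calculator, the worked examples in this section can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original slides; the function names are hypothetical. It computes the standard deviation from the definition, from the alternative formula, and from a frequency table (using mid-points for grouped data), always dividing by n as in the formulae above.

```python
from math import sqrt


def sd_from_definition(values):
    """s.d. = sqrt( sum((x - mean)^2) / n )."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_alternative(values):
    """s.d. = sqrt( sum(x^2) / n - mean^2 )."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum(x * x for x in values) / n - mean ** 2)


def frequency_mean_sd(values, freqs):
    """Mean and s.d. for a frequency table (values may be mid-points)."""
    total_f = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total_f
    mean_of_squares = sum(x * x * f for x, f in zip(values, freqs)) / total_f
    return mean, sqrt(mean_of_squares - mean ** 2)


# June mid-day temperatures (°C).
temperatures = [21, 23, 24, 19, 19, 20, 21]
print(round(sd_from_definition(temperatures), 2))  # 1.77
print(round(sd_alternative(temperatures), 2))      # 1.77

# Weekly exercise frequency table.
exercise_mean, exercise_sd = frequency_mean_sd([0, 1, 2, 3, 4, 5],
                                               [5, 3, 5, 4, 2, 1])
print(round(exercise_mean, 2), round(exercise_sd, 2))  # 1.9 1.48

# Grouped annual mileage, using interval mid-points (an estimate only).
mileage_mean, mileage_sd = frequency_mean_sd(
    [2500, 7500, 12500, 17500, 25000], [6, 17, 14, 5, 3])
print(round(mileage_mean), round(mileage_sd))  # 10667 5711
```

Both temperature calculations agree, as the derivation of the alternative formula predicts. Note that these functions divide by n; many software libraries also offer a version that divides by n − 1, which gives a slightly larger value.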
Examination-style question

The ages of the people in a cinema queue one Monday afternoon are shown in the stem-and-leaf diagram:

2 | 3 6
3 | 1 6 6
4 | 1 2 5 6 9
5 | 0 4 7
6 | 1

Key: 2 | 3 means 23 years old

a) Explain why the diagram suggests that the mean and standard deviation can be sensibly used as measures of location and spread respectively.

b) Calculate the mean and the standard deviation of the ages.

c) The mean and the standard deviation of the ages of the people in the queue on Monday evening were 29 and 6.2 respectively. Compare the ages of the people queuing at the cinema in the afternoon with those in the evening.

Solution:

a) The mean and the standard deviation are appropriate, as the distribution of ages is roughly symmetrical and there are no outliers.

b) $\sum x_i = 597$, so

$$\bar{x} = \frac{597}{14} = 42.64286 \approx 42.6$$

$\sum x_i^2 = 27{,}131$, so

$$\text{s.d.} = \sqrt{\frac{27{,}131}{14} - 42.64286^2} = 10.9$$

c) The cinemagoers in the evening had a smaller mean age, meaning that they were, on average, younger than those in the afternoon. The standard deviation for the ages in the evening was also smaller, suggesting that the evening audience were closer together in age.

Combining sets of data

Sometimes in examination questions you are asked to pool two sets of data together.

Example: Six male and five female students sit an A-level examination. The mean marks were 52% and 57% for the males and females respectively. The standard deviations were 14 and 18 respectively. Find the combined mean and the standard deviation for the marks of all 11 students.

Let $x_1, \ldots, x_6$ be the marks of the 6 male students.
Let $y_1, \ldots, y_5$ be the marks of the 5 female students.

To find the overall mean, we first need to find the total marks for all 11 students.
As $\bar{x} = 52$: $\sum x = 6 \times 52 = 312$
As $\bar{y} = 57$: $\sum y = 5 \times 57 = 285$

Therefore $\sum x + \sum y = 312 + 285 = 597$

So the combined mean is:

$$\frac{597}{11} = 54.2727\ldots \approx 54.3\%$$

To find the overall standard deviation, we need to find the total of the marks squared for all 11 students.

Notice that the formula

$$\text{s.d.} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2}$$

rearranges to give

$$\sum x^2 = n(\text{s.d.}^2 + \bar{x}^2)$$

As $\text{s.d.}_x = 14$: $\sum x^2 = 6 \times (14^2 + 52^2) = 17{,}400$
As $\text{s.d.}_y = 18$: $\sum y^2 = 5 \times (18^2 + 57^2) = 17{,}865$

Therefore, $\sum x^2 + \sum y^2 = 35{,}265$

So the combined s.d. is:

$$\sqrt{\frac{35{,}265}{11} - 54.27^2} = 16.1\% \text{ (to 3 s.f.)}$$

Coding

Coding is a technique that can simplify the numerical effort required in finding a mean or standard deviation.

Adding
If a number b is added to each piece of data, the mean value is also increased by b. The standard deviation is unchanged.

Multiplying
If each piece of data is multiplied by a, the mean value is multiplied by a. The standard deviation is also multiplied by a (for positive a; in general it is multiplied by |a|).

More formally, if $y_i = ax_i + b$ then:

$$\bar{y} = a\bar{x} + b \qquad \text{s.d.}_y = a \times \text{s.d.}_x$$

Example: Find the mean and the standard deviation of the values in the table. Use the transformation below to help you.

$$y = \frac{1}{10}x - 5$$

Using the given transformation, add a y column to the table.

x     Frequency    y
50    3            0
60    5            1
70    7            2
80    4            3
90    1            4

Then work with the y values:

y        Frequency, f    y × f    y² × f
0        3               0        0
1        5               5        5
2        7               14       28
3        4               12       36
4        1               4        16
Total    20              35       85

To find the mean:

$$\bar{y} = \frac{35}{20} = 1.75$$

To find the s.d.:

$$\text{s.d.} = \sqrt{\frac{\sum f_i y_i^2}{\sum f_i} - \bar{y}^2} = \sqrt{\frac{85}{20} - 1.75^2} = 1.09$$

You have now found the mean and standard deviation of y. To find them for the x values, you must reverse the coding.
We can rearrange

$$y = \frac{1}{10}x - 5$$

to get

$$x = 10y + 50$$

Therefore the mean of x is:

$$\bar{x} = 10\bar{y} + 50 = 10 \times 1.75 + 50 = 67.5$$

And the standard deviation of x is: 10 × 1.09 = 10.9

Note how the coding helped to simplify the calculations by making the numbers smaller.
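The coding example can also be checked numerically. The following is a minimal Python sketch, not part of the original slides; the helper `frequency_mean_sd` and the variable names are made up for illustration. It codes the x values, finds the mean and s.d. of y, reverses the coding, and compares the result with a direct calculation on x.

```python
from math import sqrt


def frequency_mean_sd(values, freqs):
    """Mean and s.d. for a frequency table (hypothetical helper)."""
    total_f = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / total_f
    mean_of_squares = sum(v * v * f for v, f in zip(values, freqs)) / total_f
    return mean, sqrt(mean_of_squares - mean ** 2)


x_values = [50, 60, 70, 80, 90]
freqs = [3, 5, 7, 4, 1]

# Coding from the example: y = x/10 - 5, equivalently x = 10y + 50.
scale, shift = 10, 50
y_values = [(x - shift) / scale for x in x_values]

mean_y, sd_y = frequency_mean_sd(y_values, freqs)

# Reverse the coding: the mean is scaled and shifted, the s.d. is only scaled.
mean_x = scale * mean_y + shift
sd_x = abs(scale) * sd_y

# Direct calculation on the original x values, for comparison.
direct_mean_x, direct_sd_x = frequency_mean_sd(x_values, freqs)

print(mean_y, round(sd_y, 2))    # 1.75 1.09
print(mean_x, round(sd_x, 1))    # 67.5 10.9
print(round(direct_mean_x, 1), round(direct_sd_x, 1))  # 67.5 10.9
```

The decoded values match the direct calculation, which illustrates that adding a constant shifts only the mean, while multiplying by a constant scales both the mean and the standard deviation.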