
Spread of Data

Whilst we have been looking at averages (means, medians and modes), these tell us little without knowing how the data spreads out around these central measures.

The quartiles mentioned in the last section can be used to describe how data spreads around the median. They give us the range of the middle 50% of the data. This is also known as the interquartile range and is often represented in a boxplot.

The variance and the standard deviation are used in a similar way to describe how the data spreads around the mean. These will be used frequently in topics throughout S1 and S2. The standard deviation is the square root of the variance.

The following data is the ages of children in a play school:

1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5

Find the mean age. Now calculate how much each child's age differs from the mean. How would you program a robot to calculate this? How would you differentiate between above and below the mean? If we want to know how their ages spread around the mean, what is wrong with just adding these differences together and dividing by 16?

How the data varies about the mean is known as the variance. We can calculate it for lists of data by comparing each result to the mean. Subtract the mean from each piece of data and square the result (to make it positive, so the differences don't cancel each other out). Add the squares together and divide by how many pieces of data there are.

The data above can be put into a frequency table:

Age (x)        1  2  3  4  5
Frequency (f)  3  5  2  5  1

How might we find the variance using this table rather than finding the difference between each piece of data?

This isn't the quickest method, although it does make the most sense:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$

The formula can be manipulated into this form, which is a lot easier to calculate:

$$\sigma^2 = \frac{\sum x^2}{n} - \mu^2$$

With a frequency table each value of x is counted f times, so the sums become $\sum f(x - \mu)^2$ and $\sum f x^2$, and n becomes $\sum f$.

Because we have squared x, the units x is measured in have also been squared. This poses a problem when interpreting the spread of data, so we take the square root of the variance to give the standard deviation.
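As a rough illustration of the calculation described above, the following Python sketch works out the variance of the play school ages three ways: from the definition, from the rearranged form, and from a frequency table. The variable names are illustrative and not part of the original slides, and the population form (dividing by n) is assumed throughout.

```python
from collections import Counter
from math import sqrt

# Ages of the 16 children in the play school (data from the slides).
ages = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5]

n = len(ages)
mean = sum(ages) / n
print(mean)  # 2.75

# Differences from the mean: negative below the mean, positive above it.
deviations = [x - mean for x in ages]
# Simply adding them is no use as a measure of spread: they cancel out.
print(sum(deviations))  # 0.0

# Definition: the mean of the squared deviations (assumes the population form, divide by n).
variance_definition = sum(d ** 2 for d in deviations) / n

# Rearranged form: mean of the squares minus the square of the mean.
variance_shortcut = sum(x ** 2 for x in ages) / n - mean ** 2

# Frequency table version: each age x occurs f times.
freq = Counter(ages)
total_f = sum(freq.values())
mean_from_table = sum(f * x for x, f in freq.items()) / total_f
variance_table = sum(f * x ** 2 for x, f in freq.items()) / total_f - mean_from_table ** 2

# The standard deviation is back in the original units (years).
std_dev = sqrt(variance_definition)

print(variance_definition, variance_shortcut, variance_table)  # 1.5625 1.5625 1.5625
print(std_dev)  # 1.25
```

All three routes give the same variance, which is the point of the rearrangement: the shortcut needs only the totals of x and x squared rather than every individual difference.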
Once you've matched them up, decide what the black and red sets of data are examples of.

Weights of the 10 babies born in Arrowe Park hospital on 3/11/06
Heights of year 11 students at St Anselms College
Size of houses in Bebington
Children who attend churches on the Wirral
Students in UK
Babies born in November 2006
Number of bedrooms in the houses in Oaklands Drive
Age of 12 children in St Andrews Sunday School

Finding the sample mean is exactly the same as finding the population mean. However, the sample variance must be calculated differently:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Compare the differences and similarities between this formula and the one for populations. Remember we use $\sigma^2$ for populations and $s^2$ for samples.

Page 40 Exercise 3C Q3, 4 and 6
Page 43 Exercise 3D Q5 and 6
Page 49 Mixed Exercise 3F Q5 and 6

Which average would be the best way to represent this continuous data?

1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 6, 11, 22

Why? Group this data into the classes 1-2, 3-4, 5-7, 8-12 and 13-22. Now sketch a histogram for it.

[Figure: example of skewed data]

This data is an example of positive skew. Although the skew can be seen in this example, other data sets may be only slightly skewed and the skew less noticeable. Drawing a histogram is also a long-winded way to decide on skew. There are other methods which use the values we have been learning to calculate on this course.

Work out Q1, Q2 and Q3 for this data. By calculating the differences between the median and the two quartiles, decide which side of the boxplot is bigger (left or right). We can use this technique to identify negative and positive skew.

Other measures of skewness include:

positive skew: mode < median < mean
negative skew: mean < median < mode

or, if we want to quantify the skewness as well as identify its direction, we may use:

$$\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$

The closer the value is to zero, the nearer the distribution is to being symmetrical, as the normal distribution is.

There are other ways to interpret measures of location (averages) and spread (IQR and s.d.).
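To illustrate the skewness measures listed above, here is a minimal Python sketch applied to the skewed data set from the slides. It uses the standard library `statistics` module. Using the population standard deviation (`pstdev`) is an assumption, since the slides do not say which form is intended, and the variable names are illustrative only.

```python
import statistics

# The skewed data set from the slides.
data = [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 6, 11, 22]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Assumption: population standard deviation (divide by n).
sd = statistics.pstdev(data)

# Direction of skew from the ordering of the three averages.
if mode < median < mean:
    direction = "positive"
elif mean < median < mode:
    direction = "negative"
else:
    direction = "unclear"

# Size and direction of skew: 3(mean - median) / standard deviation.
skew = 3 * (mean - median) / sd

print(mode, median, round(mean, 2))  # 1 3 4.33
print(direction)                     # positive
print(round(skew, 2))                # 0.75
```

A value of about 0.75 is well above zero, in line with the long right-hand tail created by 11 and 22.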
Often these values are of most use when comparing two or more data sets rather than when used on their own.

The coefficient of variation, V, is defined as:

$$V = 100 \times \frac{\text{standard deviation}}{\text{mean}}$$

This gives the dispersion as a percentage of the mean.

The quartile coefficient of variation, QV, is:

$$QV = 100 \times \frac{0.5(Q_3 - Q_1)}{Q_2}$$

This last one is more useful when you have outliers that you feel are distorting your data and you wish to ignore them.

Lastly, we should add a way to quantify outliers, which continue to be mentioned. Again, there are many rules people use, but one that Edexcel uses is: if

$$\frac{x - \text{mean}}{\text{standard deviation}}$$

is greater than 2 or less than -2, then that value of x is considered an outlier.

Page 71 Exercise 4F Q1 - 4
Page 72 Mixed Exercise 4G Q1, 3, 4, 7 and 8
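The two coefficients and the outlier rule can be sketched in Python as follows, reusing the skewed data set from earlier in the slides. Several choices here are assumptions rather than anything the slides specify: the population standard deviation is used, the quartiles come from the default method of `statistics.quantiles` (a textbook method may give slightly different values on other data), and the outlier rule is read as applying to the standardised value (x - mean) / standard deviation.

```python
import statistics

# The skewed data set used earlier in the slides.
data = [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 6, 11, 22]

mean = statistics.mean(data)
sd = statistics.pstdev(data)  # assumption: population form (divide by n)

# Assumption: quartiles from the module's default ("exclusive") method.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Coefficient of variation: standard deviation as a percentage of the mean.
v = 100 * sd / mean

# Quartile coefficient of variation: half the IQR as a percentage of the median.
qv = 100 * 0.5 * (q3 - q1) / q2

# Assumed outlier rule: standardised value above 2 or below -2.
outliers = [x for x in data if abs((x - mean) / sd) > 2]

print(q1, q2, q3)   # 1.0 3.0 4.0
print(round(v, 1))  # 123.7
print(qv)           # 50.0
print(outliers)     # [22]
```

V is much larger than QV here because the standard deviation is inflated by the value 22, while the quartiles ignore it. This is the situation in which the slides suggest QV is the more useful measure.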
