Introductory Mathematics & Statistics
Chapter 13: Measures of Variation

Copyright 2010 McGraw-Hill Australia Pty Ltd. PowerPoint slides to accompany Croucher, Introductory Mathematics and Statistics, 5e

Learning Objectives
• Calculate common measures of variation (including the range, interquartile range, mean deviation and standard deviation) from grouped and ungrouped data
• Calculate and interpret the coefficient of variation

13.1 Introduction
• A measure of central tendency in itself is not sufficient to describe a set of data adequately
• A measure of dispersion (or spread) of the data is usually required
• This measure gives an indication of the internal variation of the data, that is, the extent to which data items vary from one another or from a central point
• Some reasons for requiring a measure of dispersion of a set of data:
  – As an indication of the reliability of the average value
  – To assist in controlling unwanted variation

13.2 The range
• The simplest measure of dispersion is the range
• It is the difference between the largest and smallest values in a set of data
  Range = largest observation – smallest observation
• Examples of uses of range include
  – Temperature fluctuations on a given day
  – Movement of share prices
  – Acceptable range of systolic and diastolic blood pressures

13.2 The range (cont…)
• Range is considered primitive as it considers only the extreme values, which may not be useful indicators of the bulk of the population
• Extreme values, called outliers, may often result from errors of measurement
• Outliers are defined as values
that are inconsistent with the rest of the data
• Although the range is the quickest and easiest measure of dispersion to calculate, it should be interpreted with some caution

13.3 The interquartile range (midspread)
• Measures the range of the middle 50% of the values only
• Is defined as the difference between the upper and lower quartiles
  Interquartile range = upper quartile – lower quartile = $Q_3 - Q_1$
• May be calculated from grouped frequency distributions that contain open-ended class intervals
• It is usually only used with a large number of observations

13.4 The mean deviation
• The mean deviation takes into account the actual value of each observation
• It measures the ‘average’ distance of each observation away from the mean of the data
• It gives an equal weight to each observation
• It is generally more sensitive than the range or interquartile range, since a change in any value will affect it

13.4 The mean deviation (cont…)
• The residual measures the actual deviation (or distance) of each observation from the mean
• A set of x values has a mean of $\bar{x}$
• The residual of a particular x-value is:
  Residual $= x - \bar{x}$

Example
If the mean for a set of data is 3.22, find the residual for an observation of 4.38

Solution
The residual of 4.38 is 4.38 – 3.22 = 1.16

Note: A residual can be negative.
A negative residual shows that the observation is below the mean

13.4 The mean deviation (cont…)
• The mean deviation is defined as the mean of these absolute deviations:
  Mean deviation $= \dfrac{\sum |x - \bar{x}|}{n}$
• To calculate the mean deviation
  Step 1: Calculate the mean of the data
  Step 2: Subtract the mean from each observation and record the resulting differences
  Step 3: Write down the absolute value of each of the differences found in Step 2 (ignore their signs)
  Step 4: Calculate the mean of the absolute values of the differences found in Step 3

13.4 The mean deviation (cont…)
Example
The batting scores of a cricketer were recorded over 10 completed innings to date. His scores were:
32, 27, 38, 25, 20, 32, 34, 28, 40, 29
Calculate the mean deviation of the cricketer’s scores

Solution
Step 1
$\bar{x} = \dfrac{32 + 27 + \dots + 29}{10} = 30.5$
The cricketer’s average number of runs is 30.5

13.4 The mean deviation (cont…)
Steps 2 and 3 are completed in the table

  Score (x)   Deviation from mean ($x - \bar{x}$)   Absolute value of deviation ($|x - \bar{x}|$)
  32          +1.5                                  1.5
  27          -3.5                                  3.5
  …           …                                     …
  29          -1.5                                  1.5
  Total       0                                     47.0

Step 4
Mean deviation $= \dfrac{\sum |x - \bar{x}|}{n} = \dfrac{47.0}{10} = 4.7$

13.4 The mean deviation (cont…)
• Calculation of the mean deviation from a frequency distribution
  – If the data are in the form of a frequency distribution, the mean deviation can be calculated as
    Mean deviation $= \dfrac{\sum f\,|x - \bar{x}|}{\sum f}$
    where
    f = the frequency of an observation x
    $\sum f$ = the sum of the frequencies = n

13.5 The standard deviation
• The most commonly used measure of dispersion is the standard deviation
• It takes into account every observation and measures the ‘average deviation’ of observations from the mean
• It works with squares of residuals, not absolute values, and is therefore easier to use in further calculations
• The values of the mean deviation and standard deviation should be reasonably close, since they are both measuring the variation of the observations from their mean

13.5 The standard deviation (cont…)
• Population standard deviation
  – Uses squares of the residuals, which will eliminate the effect of the signs, since squares of numbers cannot be negative
  Step 1: Find the sum of the squares of the residuals
  Step 2: Find their mean
  Step 3: Take the square root of this mean
  Standard deviation $\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{N}}$
  where N = the size of the population
  The square of the population standard deviation is called the variance.
  Variance $= \sigma^2$

13.5 The standard deviation (cont…)
• Sample standard deviation
  – It is rare to calculate the value of $\sigma$, since populations are usually very large
  – It is far more likely that the sample standard deviation (denoted by s) will be needed.
  Sample standard deviation $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
  – where n is the number of observations in the sample

13.5 The standard deviation (cont…)
• A note on the use of (n − 1) in formulae
  – If the value of n is large, it will only make a slight difference to the answer whether you divide by n or (n − 1)
  – To calculate the value of s from a sample, the calculator key will usually be indicated by a label such as $\sigma_{n-1}$ or $x\sigma_{n-1}$, written either on it or near it
  – To calculate the value of $\sigma$ from a population, the calculator key will usually be indicated by a label such as $\sigma_n$ or $x\sigma_n$, written either on it or near it

13.5 The standard deviation (cont…)
• Important points about the standard deviation
  – The standard deviation cannot be negative
  – The standard deviation of a set of data is zero if, and only if, the observations are of equal value
  – The standard deviation can never exceed the range of the data
  – The more scattered the data, the greater the standard deviation
  – The square of the standard deviation is called the variance

13.5 The standard deviation (cont…)
• Calculation of the sample standard deviation
  Step 1: Calculate the mean $\bar{x}$
  Step 2: For each x-value, find the value of the residual $(x - \bar{x})$
  Step 3: Square the residuals to obtain $(x - \bar{x})^2$
  Step 4: Calculate the sum of the squares of the residuals
  Step 5: Divide the sum found in Step 4 by (n – 1)
  Step 6: Take the square root of the quantity found in Step 5: this is the sample standard deviation

13.5 The standard deviation (cont…)
• Calculation of the standard deviation from a frequency distribution
  – If the data are in the form of a frequency distribution, such as:

    No. of units   Frequency (f)
    1              85
    2              192
    3              123
    Total          400

  – calculate the standard deviation using:
    $s = \sqrt{\dfrac{\sum f\,(x - \bar{x})^2}{\sum f - 1}}$

13.5 The standard deviation (cont…)
• Calculation of the standard deviation from a grouped frequency distribution
  – When calculating s from a grouped frequency distribution, we should assume that the observations in each class interval are concentrated at the midpoint of the interval
    $s = \sqrt{\dfrac{\sum f\,(m - \bar{x})^2}{\sum f - 1}}$
  – where
    $\bar{x}$ = the estimated mean of the sample
    m = the midpoint of the class interval
    f = the frequency of the class interval

13.6 The coefficient of variation
• This is a measure of relative variability used to:
  – measure changes that have occurred in a population over time
  – compare variability of two populations that are expressed in different units of measurement
• It is expressed as a percentage rather than in terms of the units of the particular data

13.6 The coefficient of variation (cont…)
• The formula for the coefficient of variation (V) is:
  $V = \dfrac{s}{\bar{x}} \times 100\%$
  where
  $\bar{x}$ = the mean of the sample
  s = the standard deviation of the sample

13.6 The coefficient of variation (cont…)
Example
Calculate the coefficient of variation for the price of 400 g cans of pet food, given that the mean is 81 cents and s = 6.77 cents. Interpret the results.
Solution
$V = \dfrac{s}{\bar{x}} \times 100\% = \dfrac{6.77}{81} \times 100\% = 8.36\%$
This means that the standard deviation of the price of a 400 g can of pet food is 8.36% of the mean price.

13.7 Remarks
• Among the more important characteristics of the standard deviation are:
  – It is the most frequently used measure of dispersion, and because of its mathematical properties it has widespread use in problems involving statistical inference
  – If the mean cannot be calculated, neither can the standard deviation
  – Its value is affected by the value of every observation in the data
  – If the data have a number of extreme values, the value of the standard deviation may be distorted so as not to be a good ‘representative’ measure of dispersion

Summary
• Among the more important characteristics of the standard deviation are:
  – It is the most frequently used measure of dispersion, and because of its mathematical properties it has widespread use in problems involving statistical inference.
  – If the mean cannot be calculated, neither can the standard deviation.
  – Its value is affected by the value of every observation in the data.
  – If the data have a number of extreme values, the value of the standard deviation may be distorted so as not to be a good ‘representative’ measure of dispersion.
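
Worked sketch (not part of the original slides)
The measures in this chapter can be checked with a few lines of Python. The sketch below is illustrative only: the function names (value_range, mean_deviation, sample_std, population_std, coefficient_of_variation) are hypothetical helpers chosen for this example, and the data are the cricketer's scores and the pet food figures from the examples above.

```python
import math


def mean(values):
    """Arithmetic mean of the observations."""
    return sum(values) / len(values)


def value_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)


def mean_deviation(values):
    """Mean of the absolute residuals |x - mean|."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def sample_std(values):
    """Sample standard deviation s: sum of squared residuals divided by (n - 1)."""
    m = mean(values)
    sum_sq = sum((x - m) ** 2 for x in values)
    return math.sqrt(sum_sq / (len(values) - 1))


def population_std(values):
    """Population standard deviation: sum of squared residuals divided by N."""
    m = mean(values)
    sum_sq = sum((x - m) ** 2 for x in values)
    return math.sqrt(sum_sq / len(values))


def coefficient_of_variation(s, m):
    """V = (s / mean) x 100, expressed as a percentage."""
    return s / m * 100


# Cricketer's scores from the mean deviation example
scores = [32, 27, 38, 25, 20, 32, 34, 28, 40, 29]

print(f"mean             = {mean(scores):.1f}")            # 30.5
print(f"range            = {value_range(scores)}")         # 20
print(f"mean deviation   = {mean_deviation(scores):.1f}")  # 4.7
print(f"sample s         = {sample_std(scores):.2f}")
print(f"population sigma = {population_std(scores):.2f}")

# Pet food example: mean 81 cents, s = 6.77 cents
print(f"V = {coefficient_of_variation(6.77, 81):.2f}%")    # 8.36%
```

For these scores the sample standard deviation comes out slightly larger than the population value, and both are of the same order as the mean deviation of 4.7, in line with the remark that the two measures should be reasonably close.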