Download Statistics Measures of Variation Unit Plan

Measures of Variation Unit Plan

I. Basic Measures of Variation

Consider these two sets:

A: {5, 5, 5, 5, 5, 5}
B: {0, 0, 0, 10, 10, 10}

They have the same mean, median, and midrange (all equal to 5), yet they are very different. Clearly something other than Measures of Central Tendency is needed to fully describe a data set. We will use Measures of Variation to quantify the obvious differences between sets A and B. These measures attempt to quantify the idea of "average difference from the average."

In this unit we will look at five basic Measures of Variation: Range, Deviation, Sum of Squares, Variance, and Standard Deviation.

We already know how to find the range from when we made Frequency Tables: it is the maximum minus the minimum.

The next Measure of Variation (MOV) we will look at is deviation. Each data point in a set has its own deviation, found by the formula:

$$d_x = x - \text{mean}$$

A single deviation, however, tells us little about the variation of the set as a whole. To describe the whole set, we make a data table with three columns, called a variation table. The first column is x (our data points), the second is d (their deviations), and the third is $d^2$ (the squared deviations). We can sum the d column as a check, because the deviations always add up to zero (apart from rounding):

$$\sum d = 0 \quad \text{(check)}$$

Next, we square the deviations and add them up to find the Sum of Squares, which has the symbol $SS_x$. In formula notation:

$$SS_x = \sum d^2$$

We square the deviations to get rid of the negatives (remember, the square of a negative number is positive). One might think that simply taking absolute values would be easier, but absolute values cause problems later on, so we square instead.

Next is the tricky part: to calculate the next MOV, variance, it matters whether you have a population or a sample.

If you have a population, you divide the Sum of Squares by N, the population size. The symbol is $\sigma^2$, "little sigma squared." That is:

$$\sigma^2 = \frac{SS_x}{N}$$

If you have a sample, you divide the Sum of Squares by n − 1, one less than the sample size n.
The symbol is $s^2$, "s squared." That is:

$$s^2 = \frac{SS_x}{n - 1}$$

The reason for the difference between these calculations is something called Degrees of Freedom. It is an advanced topic, and you don't need to worry about it for this course. However, if you're curious, a good way to begin understanding Degrees of Freedom is to think about what would happen if you tried to find the variation of a population by taking a sample of size 1. Would your result have any meaning?

Finally, we have to undo the squaring we put into these calculations. Doing this gives us the standard deviation, which is the MOV we will use later on in the course. Remember, a square root undoes a square, so:

$$\sigma = \sqrt{\sigma^2} \quad \text{(population standard deviation)}$$

$$s = \sqrt{s^2} \quad \text{(sample standard deviation)}$$

Collect a data set from the class and find its variance and standard deviation.

HW: Basic Measures of Variation Topic Practice

II. Estimating the Variance of a Frequency Table

Just as we could estimate the mean of a frequency table, we can estimate its variance and standard deviation. The basic idea is the same: let the class midpoints stand in for the data points. Find the estimated mean in the same way as before:

$$\text{mean (est.)} = \frac{\sum (xf)}{N \text{ or } n}$$

Remember, here x stands for the class midpoint, since we don't know the original data points.

Now find the deviation of each class using the same formula as before, $d = x - \text{mean}$, except that x is now the midpoint. We square the deviations as before, and, as in the estimated-mean formula, we must weight each square by the frequency of its class. So the formula for Sum of Squares becomes:

$$SS_x \text{ (est.)} = \sum (d^2 f)$$

After that, variance and standard deviation are calculated as above: divide by N for a population or by n − 1 for a sample, then take the square root.

Give an example. A similar technique can be used for other forms of grouped data that are not frequency distributions.

HW: Estimating the Variance of a Frequency Table Topic Practice

III. Chebychev's Theorem

This is the first important theorem we will learn in Statistics.
It states that, for any k > 1, the proportion of data within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$. That is:

$$P \ge 1 - \frac{1}{k^2}$$

Sometimes the value of k is not immediately apparent. In such cases, this formula can be helpful:

$$k = \frac{|\text{mean} - \text{bound}|}{\text{standard deviation}}$$

There are also times when P is given but k is unknown. In these cases we use our Algebra skills to solve $P = 1 - \frac{1}{k^2}$ for k, which gives $k = \frac{1}{\sqrt{1 - P}}$.

Give some examples of Chebychev-style problems.

HW: Chebychev's Theorem Topic Practice

IV. Using Measures of Variation to Classify and Compare Data Sets

We can use our basic measures of variation in two ways: to quantify our notion of skew from the last unit, which lets us determine which of two sets is more skewed, and to compare the variation of sets whose data are measured in different units.

1.) Pearson's Index of Skew

We can now quantify the idea of skew we learned in the last unit with the following formula (this P is unrelated to the P in Chebychev's Theorem):

$$P = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$

A positive value indicates a right (positive) skew, a negative value indicates a left (negative) skew, and the larger the absolute value, the more skewed the set.

2.) Coefficient of Variation

It is improper to compare the variation of two sets directly if the data were collected in different units. For example, speeding data collected in the US would be in miles per hour, while data collected in Canada would be in kilometers per hour. To get around this problem, we use the Coefficient of Variation (CV). The formula is:

$$CV = \frac{\text{standard deviation}}{\text{mean}}$$

This measure is "unitless," so we can use it to compare the variation of sets with different units. CV is usually expressed as a percent, so multiply the result of the formula by 100.

HW: p. 90-91 #45,50

V. Unit Review

HW: Measures of Variation Worksheets

Measures of Variation Test
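To accompany Section I, here is a minimal Python sketch of the variation-table steps (deviation, Sum of Squares, variance, standard deviation), applied to sets A and B from the start of the unit. The function names are hypothetical helpers written for this illustration, and the sketch assumes the data arrive as a plain list of numbers. Python's standard `statistics` module is used only as a cross-check.

```python
import math
import statistics

# Hypothetical helper names, written for this illustration only.

def variation_table(data):
    """Return the mean, the d column, and the d^2 column of a variation table."""
    mean = sum(data) / len(data)
    d = [x - mean for x in data]      # deviation of each data point
    d2 = [dev ** 2 for dev in d]      # squared deviations
    return mean, d, d2

def sum_of_squares(data):
    _, _, d2 = variation_table(data)
    return sum(d2)                    # SS_x = sum of d^2

def population_variance(data):
    return sum_of_squares(data) / len(data)          # sigma^2 = SS_x / N

def sample_variance(data):
    if len(data) < 2:
        raise ValueError("a sample variance needs at least 2 data points")
    return sum_of_squares(data) / (len(data) - 1)    # s^2 = SS_x / (n - 1)

A = [5, 5, 5, 5, 5, 5]
B = [0, 0, 0, 10, 10, 10]

mean_b, d_b, d2_b = variation_table(B)
print(mean_b, d_b, sum(d_b))                 # 5.0 [-5.0, -5.0, -5.0, 5.0, 5.0, 5.0] 0.0
print(sum_of_squares(A), sum_of_squares(B))  # 0.0 150.0
print(population_variance(B), math.sqrt(population_variance(B)))     # 25.0 5.0
print(sample_variance(B), round(math.sqrt(sample_variance(B)), 3))   # 30.0 5.477

# Cross-check: these should agree with the two variances above
print(statistics.pvariance(B), statistics.variance(B))
```

The sample version refuses a list of length 1 because n − 1 would be 0, which connects to the Degrees of Freedom question in Section I.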
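To accompany Section II, the frequency-table estimate can be sketched the same way: each class midpoint stands in for every data point in its class, and each squared deviation is weighted by its class frequency. The table below (midpoints 5, 15, 25 with frequencies 2, 3, 5) is made up for illustration, and the function name is a hypothetical helper.

```python
import math

# Hypothetical helper name, written for this illustration only.
def estimate_from_frequency_table(midpoints, freqs, sample=True):
    """Estimate mean, SS_x, variance, and standard deviation from grouped data."""
    n = sum(freqs)
    # mean (est.) = sum(x f) / n, where x is the class midpoint
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    # SS_x (est.) = sum(d^2 f), with d = midpoint - mean
    ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))
    # Divide by n - 1 for a sample, or by N for a population
    variance = ss / (n - 1) if sample else ss / n
    return mean, ss, variance, math.sqrt(variance)

# Made-up frequency table: class midpoints and their frequencies
midpoints = [5, 15, 25]
freqs = [2, 3, 5]

mean, ss, var_s, s = estimate_from_frequency_table(midpoints, freqs)
print(mean, ss)                       # 18.0 610.0
print(round(var_s, 2), round(s, 2))   # 67.78 8.23

pop = estimate_from_frequency_table(midpoints, freqs, sample=False)
print(pop[2])                         # 61.0
```

Because the midpoints replace the real data values, these numbers are estimates, not the exact variance of the original data.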
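To accompany Section III, here is a small sketch of the three kinds of Chebychev calculations: the guaranteed proportion for a given k, k from a bound, and k from a given proportion (using $k = 1/\sqrt{1 - P}$). The function names and the data set (mean 50, standard deviation 5) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical helper names, written for this illustration only.

def chebychev_min_proportion(k):
    """Smallest proportion of data guaranteed within k SDs of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1 for a useful bound")
    return 1 - 1 / k ** 2

def k_from_bound(mean, bound, sd):
    """k = |mean - bound| / standard deviation."""
    return abs(mean - bound) / sd

def k_from_proportion(p):
    """Solve p = 1 - 1/k^2 for k: 1/k^2 = 1 - p, so k = 1 / sqrt(1 - p)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return 1 / math.sqrt(1 - p)

# Hypothetical data set with mean 50 and standard deviation 5:
# what proportion is guaranteed to lie between 40 and 60?
k = k_from_bound(50, 40, 5)
print(k, chebychev_min_proportion(k))          # 2.0 0.75
print(k_from_proportion(0.75))                 # 2.0
print(round(chebychev_min_proportion(3), 3))   # 0.889
```

Read the first result as "at least 75% of the values lie between 40 and 60"; the theorem gives a lower bound, not an exact proportion.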
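To accompany Section IV, both formulas take summary statistics as inputs, so a sketch is short. The function names are hypothetical, and the means, medians, and standard deviations below (including the US and Canada speeds) are invented for illustration, not real measurements.

```python
# Hypothetical helper names, written for this illustration only.

def pearson_skew(mean, median, sd):
    """Pearson's Index of Skew: 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

def coefficient_of_variation(mean, sd):
    """CV as a percent: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return 100 * sd / mean

# Invented summary statistics
print(pearson_skew(60, 57, 6))   # 1.5  (mean > median: skewed right)
print(pearson_skew(60, 63, 6))   # -1.5 (mean < median: skewed left)

us_cv = coefficient_of_variation(65, 5)        # speeds in miles per hour
canada_cv = coefficient_of_variation(100, 10)  # speeds in kilometers per hour
print(round(us_cv, 1), round(canada_cv, 1))    # 7.7 10.0
```

Since CV is unitless, the two percentages can be compared directly even though the raw data use different units; with these invented numbers, the Canadian speeds vary more relative to their mean.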
