Measures of Variation Unit Plan

I. Basic Measures of Variation

Consider these two sets:

$A: \{5, 5, 5, 5, 5, 5\}$
$B: \{0, 0, 0, 10, 10, 10\}$

They have the same mean, median and midrange, yet they are very different. Clearly something other than Measures of Central Tendency is needed to fully describe the nature of data sets. We will use Measures of Variation to quantify the obvious differences between sets A and B. These measures attempt to quantify the concept of "average difference from the average."

In this unit we will look at five basic Measures of Variation: Range, Deviation, Sum of Squares, Variance and Standard Deviation.

We already know how to find range from when we made Frequency Tables: it is the maximum minus the minimum.

The next Measure of Variation we will look at is deviation. Each data point in a set has its own deviation. It is found by the following formula:

$d_x = x - \text{mean}$

But this tells us little about the variation of a set as a whole. To do that, we need to make a data table of three columns, called a variation table. The first two columns are x (our data points) and d (their deviations). We can sum this second column as a check. The sum of the d column should always be zero:

$\Sigma(d) = 0$ (check)

Then we square the deviations (this is the third column) and add them up to find the Sum of Squares, which has the symbol $SS_x$. In formula notation:

$SS_x = \Sigma(d^2)$

The reason we square the deviations is to get rid of the negatives (remember, the square of a negative number is positive). One might think simply taking the absolute value would be easier, but doing this causes problems later on, so we square instead.

Next is the tricky part: to calculate the next MOV, variance, it matters whether you have a population or a sample.

If you have a population, you divide Sum of Squares by N, the population size. The symbol is $\sigma^2$, "little sigma squared." That is:

$\sigma^2 = \dfrac{SS_x}{N}$

If you have a sample, you divide Sum of Squares by one less than the sample size (n).
The symbol is $s^2$, "s squared." That is:

$s^2 = \dfrac{SS_x}{n - 1}$

The reason for the difference in these calculations is something called Degrees of Freedom. It is an advanced topic and you don't need to worry about it for this course. However, if you're curious, to begin to understand Degrees of Freedom, think about what would happen if you tried to find the variation of a population by taking a sample of size 1. Would your result have any meaning?

Next, we have to get rid of the squares we put into these calculations. Doing this will give us standard deviation, which is the MOV we will be using later on in the course. Remember, square root cancels square, so:

$\sigma = \sqrt{\sigma^2}$ (population standard deviation)
$s = \sqrt{s^2}$ (sample standard deviation)

Collect a data set from the class and find its variance and standard deviation.

HW: Basic Measures of Variation Topic Practice

II. Estimating the Variance of a Frequency Table

Just as we could estimate the mean of a frequency table, we can estimate its variance and standard deviation. The basic idea is the same: let the midpoints stand in for data points. Find the estimated mean in the same way as in the last section:

$\text{mean (est.)} = \dfrac{\Sigma(xf)}{n \text{ or } N}$

Here, remember, x stands for midpoint, as we don't know the original data points! Now, find the deviation of each class using the same formula as before (d = x − mean), except now x is the midpoint. We can square the deviations as before, and as in the estimated mean formula, we must weight each square by the frequency of its class. So the formula for Sum of Squares changes to:

$SS_x \text{ (est.)} = \Sigma(d^2 f)$

After that, variance and standard deviation can be calculated as above.

Give an example.

A similar technique can be used for other forms of grouped data that are not frequency distributions.

HW: Estimating the Variance of a Frequency Table Topic Practice

III. Chebychev's Theorem

This is the first important theorem we will be learning in Statistics.
It states that the portion of data within k standard deviations of the mean must be at least 1 − 1/k² (for k greater than 1). That is:

$P \geq 1 - \dfrac{1}{k^2}$

Sometimes it is not immediately apparent what the value of k is. In such cases, this formula can be helpful:

$k = \dfrac{|\text{mean} - \text{value}|}{\text{standard deviation}}$

There are also times when P is given but k is unknown. In these cases we have to remember some Algebra skills and solve for k.

Give some examples of Chebychev-style problems.

HW: Chebychev's Theorem Topic Practice

IV. Using Measures of Variation to Classify and Compare Data Sets

We can use our basic measures of variation in two ways: to quantify our notion of skew from the last unit, so that we can determine which of two sets is more skewed, and to compare the variation of sets whose data are in different units.

1.) Pearson's Index of Skew

We can now quantify the idea of skew we learned in the last section with the following formula:

$\text{Pearson's index} = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

2.) Coefficient of Variation

It is improper to compare the variation of two sets if the data were collected in different units (for example, speeding data collected in the US vs. that in Canada: one would be in miles per hour and one in kilometers per hour). To get around this problem, we have the Coefficient of Variation (CV). The formula is:

$CV = \dfrac{\text{standard deviation}}{\text{mean}}$

This measure is "unitless," so we can use it to compare the variation of sets with different units. CV is usually expressed as a percent, so multiply the answer you get from the formula by 100.

HW: p. 90-91 #45, 50

V. Unit Review

HW: Measures of Variation Worksheets

Measures of Variation Test
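
For review, the formulas from sections I through IV can be checked with a short program. The following is a minimal Python sketch and is not part of the original unit: every function name is hypothetical and chosen only for illustration, and the `sample` flag is our own assumption for switching between dividing by n − 1 (sample) and by N (population). It uses sets A and B from section I.

```python
import math

def mean(data):
    return sum(data) / len(data)

def median(data):
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    # Odd count: middle value. Even count: average of the two middle values.
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def sum_of_squares(data):
    m = mean(data)
    deviations = [x - m for x in data]          # d = x - mean
    assert abs(sum(deviations)) < 1e-9          # check: the d column sums to zero
    return sum(d * d for d in deviations)       # SS_x = sum of d^2

def variance(data, sample=True):
    # Divide by n - 1 for a sample, by N for a population.
    divisor = len(data) - 1 if sample else len(data)
    return sum_of_squares(data) / divisor

def std_dev(data, sample=True):
    return math.sqrt(variance(data, sample))

def grouped_variance(midpoints, freqs, sample=True):
    # Estimate from a frequency table: class midpoints stand in for data points.
    n = sum(freqs)
    est_mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    ss = sum((x - est_mean) ** 2 * f for x, f in zip(midpoints, freqs))
    return ss / (n - 1 if sample else n)

def chebyshev_bound(k):
    # Minimum portion of data within k standard deviations of the mean (k > 1).
    return 1 - 1 / k ** 2

def pearson_skew(data, sample=True):
    # Undefined when the standard deviation is zero (for example, set A).
    return 3 * (mean(data) - median(data)) / std_dev(data, sample)

def coefficient_of_variation(data, sample=True):
    # Expressed as a percent.
    return 100 * std_dev(data, sample) / mean(data)

set_a = [5, 5, 5, 5, 5, 5]
set_b = [0, 0, 0, 10, 10, 10]

print(max(set_b) - min(set_b))           # range of B: 10
print(sum_of_squares(set_a))             # 0.0
print(sum_of_squares(set_b))             # 150.0
print(variance(set_b, sample=False))     # 25.0 (population)
print(variance(set_b))                   # 30.0 (sample)
print(std_dev(set_b, sample=False))      # 5.0
print(chebyshev_bound(2))                # 0.75
```

Sets A and B share a mean of 5, yet the sum of squares is 0 for A and 150 for B, which is exactly the difference the Measures of Central Tendency could not show. Treating B as a frequency table with midpoints 0 and 10, each with frequency 3, `grouped_variance([0, 10], [3, 3], sample=False)` gives the same population variance of 25.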