Measures of Variation

Recap: Measures of Central Tendency (Mean, Median, and Mode)

Mean
- Measure of central tendency
- Most common measure
- Acts as "balance point"
- Affected by extreme values ("outliers")

Median
- Measure of central tendency
- Middle value in ordered sequence
- If n is odd, middle value of sequence
- If n is even, average of 2 middle values
- Not affected by extreme values

The mean is generally used unless extreme values (outliers) exist; then the median is often used, since the median is not sensitive to extreme values.

Mode
- A measure of central tendency
- Value that occurs most often
- Not affected by extreme values
- Used for either numerical or nominal data
- There may be no mode
- There may be several modes

Measures of Variation
- Range: difference between the extreme values (max - min); the actual values (min to max) are most often reported in the literature rather than the difference
- Variance: measure of variation in a sample of data; the mean squared deviation of the values from the mean, often referred to as the mean square or MS
- Standard deviation: square root of the variance; measures the amount of variation of values around the mean
- Standard error: measure of variability of sample means around a population mean
- Coefficient of variation: used to compare variability among different variables whose values differ in magnitude (elephant weight versus mouse weight)

Measures of variation give information on the spread or variability of the data values. The data sets have the same mean, median, and mode, yet clearly differ.

Range
- Simplest measure of variation
- Difference between the largest and the smallest observations
- Range = Maximum - Minimum
- Team I has a range of 6 inches; Team II has a range of 17 inches.

Disadvantages of the Range
- Ignores the way in which data are distributed
- Only uses two entries from the data set
- Sensitive to outliers
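The recap and the range definition above can be sketched in Python. The slide's data table did not survive in this transcript, so the heights below are hypothetical values, chosen only so that the ranges match the 6 and 17 inches quoted above. The names `team_1`, `team_2`, `value_range`, and `summarize` are likewise illustrative.

```python
from statistics import mean, median, multimode

# Hypothetical heights in inches (assumption: the original table is not
# in this transcript; these values only reproduce the quoted ranges).
team_1 = [72, 73, 76, 76, 78]
team_2 = [67, 72, 76, 76, 84]


def value_range(data):
    """Range = maximum - minimum."""
    return max(data) - min(data)


def summarize(data):
    """Return the three measures of central tendency plus the range."""
    return {
        "mean": mean(data),
        "median": median(data),
        "modes": multimode(data),  # a list, since several modes may exist
        "range": value_range(data),
    }


for name, heights in (("Team I", team_1), ("Team II", team_2)):
    print(name, summarize(heights))
```

With these assumed heights, both teams share the same mean, median, and mode, and only the range tells them apart. `multimode` requires Python 3.8 or later.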
Variance and Standard Deviation
- Measures of variation
- Most common measures
- Consider how data are distributed
- Use all the entries in a data set
- Show variation about the mean

First you must know what is meant by the deviation of an entry in a data set. The deviation of an entry x in a population data set is the difference between the entry and the mean μ of the data set.

Deviation of x = x - μ

Example: Find the deviation of each player's height for Team I.

Notice that the sum of the deviations is zero. Because this is true for any data set, it does not make sense to find the average of the deviations. To overcome this problem, you can square each deviation. In a population data set, the mean of the squares of the deviations is called the population variance.

The population variance of a population data set of N entries is

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

The symbol σ is the lowercase Greek letter sigma. The portion $\sum (x - \mu)^2$ of the variance formula is referred to as the sum of squares and is denoted by SSx.

Example: Find the population variance of the players' heights for Team I.

Notice the unit of the answer: inches squared, even though we were not working with area. This is the disadvantage of the variance: its units are usually not meaningful. You can return to the original unit of the data by using the standard deviation.

The population standard deviation of a population data set of N entries is the square root of the population variance:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

Example: Find the population standard deviation of the players' heights for Team I.

Example: Find the population variance and standard deviation of Team II.

Since the population standard deviation of Team I is less than that of Team II, the heights of Team I are, as a whole, closer to the mean height of the team.

Spread of data
- Variances and standard deviations can be used to determine the spread of the data. If the variance or standard deviation is large, the data are more dispersed.
- This information is useful in comparing two or more data sets to determine which is more variable.

The sample variance and sample standard deviation of a sample data set of n entries are

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where $\bar{x}$ is the sample mean. Notice that when you find the population variance, you divide by N, the number of entries. When you find the sample variance, you divide by n - 1, one less than the number of entries.

Example: The heights of the two teams represent a sample of the heights of all basketball players. Find the variance and standard deviation of this sample.
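The population and sample calculations described above can be sketched in Python as follows. The heights are hypothetical (the slide's table is not in this transcript; the values are only consistent with the quoted ranges of 6 and 17 inches), and the helper names are illustrative. The only difference between the two variance functions is the divisor: N for a population, n - 1 for a sample.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical heights in inches (assumption: not taken from the slides).
team_1 = [72, 73, 76, 76, 78]
team_2 = [67, 72, 76, 76, 84]


def deviations(data):
    """Deviation of each entry: x - mean."""
    mu = sum(data) / len(data)
    return [x - mu for x in data]


def sum_of_squares(data):
    """SSx: sum of the squared deviations from the mean."""
    return sum(d ** 2 for d in deviations(data))


def population_variance(data):
    """Divide SSx by N, the number of entries."""
    return sum_of_squares(data) / len(data)


def sample_variance(data):
    """Divide SSx by n - 1, one less than the number of entries."""
    return sum_of_squares(data) / (len(data) - 1)


# The deviations sum to zero (up to floating-point rounding).
assert math.isclose(sum(deviations(team_1)), 0.0, abs_tol=1e-9)

for name, heights in (("Team I", team_1), ("Team II", team_2)):
    var = population_variance(heights)
    print(name, "population variance:", var, "std dev:", math.sqrt(var))

# Treat both teams together as one sample of all basketball players.
sample = team_1 + team_2
s2 = sample_variance(sample)
print("sample variance:", s2, "sample std dev:", math.sqrt(s2))

# Cross-check the hand-written formulas against the standard library.
assert math.isclose(population_variance(team_1), pvariance(team_1))
assert math.isclose(math.sqrt(population_variance(team_1)), pstdev(team_1))
assert math.isclose(s2, variance(sample))
assert math.isclose(math.sqrt(s2), stdev(sample))
```

With these assumed heights, Team I's population variance comes out smaller than Team II's, which matches the comparison made in the slides. In practice the `statistics` functions `pvariance`, `pstdev`, `variance`, and `stdev` can be called directly; the hand-written versions are shown only to make the divisors explicit.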