Stat 1010: standard deviation

4.3 Measures of Variation (part 2)

- How much variation is there in the data?
- Look for the spread of the distribution.
- What do we mean by "spread"?
- Part 1: Range, the Quartiles, and IQR
- Part 2: Standard deviation (and Variance)

Limitations of measures of spread: Range

- The range only considers the most extreme values (min and max). The middle observations do not affect the range at all.
  - These distributions can have the same range:

[Figure: two dot plots, labeled 1) and 2), each on an axis from 0 to 10]

Limitations of measures of spread: 5-number summary

- The 5-number summary and the quartiles allow some of the middle numbers to contribute to the measure of spread (better).
  - But we often like to summarize the spread with a single value (not 5 numbers).
- Can we find a measure of spread that allows EVERY observation to contribute to the measure AND is a single value?

Yes!

- The standard deviation.
  - Involves a computation that includes EVERY observation.
  - Provides a single summary value of the spread of a distribution.

Standard Deviation

- Based on the deviation from the mean.
  - This is a distance from the center.
- What is a 'deviation'?

$$(x_i - \bar{x})$$

Here $x_i$ is one observed value (observation $i$) and $\bar{x}$ is the mean computed from ALL observations (measure of center).

Golf Scores (n=6)

46, 44, 50, 43, 47, 52

$$\bar{x} = \frac{282}{6} = 47 \text{ strokes}$$

[Figure: dot plot of the six golf scores on an axis from 40 to 55]

Deviations (distances from the mean): Graphically

[Figure: dot plot on an axis from 40 to 55 with the deviations -4, -3, -1, +3, and +5 marked as distances from the mean]

Deviations (distances from the mean): Numerically

Observed value - mean = deviation

43 - 47 = -4
44 - 47 = -3
46 - 47 = -1
47 - 47 = 0
50 - 47 = +3
52 - 47 = +5

Standard Deviation

- Standard deviation is a measure of roughly the average of all the distances (absolute deviations) from the mean.
- We can think of it as the average distance the observations are from the mean.
- Larger standard deviation → more spread.
- Smaller standard deviation → less spread.

Standard Deviation

- Because the actual mean of the deviations is always zero, we instead focus on the mean of the squared deviations.
- Mean of the deviations:

$$\frac{-4 + 5 + (-3) + 3 + (-1) + 0}{6} = \frac{0}{6} = 0$$

  - Also, we divide by n-1 rather than n for technical reasons.
  - And we take the square root so we can work in our original units (not squared units).

Standard Deviation

- The letter s usually represents the standard deviation.

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

$$s = \sqrt{\frac{\sum (\text{deviations from the mean})^2}{\text{total number of observations} - 1}}$$

Standard Deviation

- Golf score example.

$$s = \sqrt{\frac{16 + 9 + 1 + 25 + 9 + 0}{6 - 1}} = \sqrt{\frac{60}{5}} = \sqrt{12} \approx 3.5 \text{ strokes}$$

The first term uses the 1st observation ($x_1 = 43$) and the mean ($\bar{x} = 47$): $(43 - 47)^2 = (-4)^2 = 16$. The 6 in the denominator is the number of observations altogether.

Technical note: Variance

- A commonly discussed measure of spread.
- Almost the average squared deviation.
- Denoted as $s^2$ (where s is the standard deviation).

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Variance

- Golf score example.
  - Notice that the units of the variance are strokes squared.

$$s^2 = \frac{16 + 9 + 1 + 25 + 9 + 0}{5} = \frac{60}{5} = 12 \text{ strokes}^2$$

Standard deviation is nicer to work with than variance because it goes back to strokes (original units, not squared units).

Center & Spread Choice (General guidelines)

- For symmetrical distributions (like the bell-shaped normal distribution)
  - Use Mean (center) and Standard Deviation (spread)
  - Both of these measures are affected by outliers, so we don't usually use them for skewed distributions, but they're very useful for symmetrical distributions.
- For skewed distributions
  - Use Median (center) and 5-Number Summary (spread)
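
The golf score computation from these slides can be checked with a short Python sketch. It is not part of the original slides, and the variable names (`scores`, `deviations`, `variance`, `std_dev`) are chosen here only for illustration. It follows the same steps: find the mean, take each deviation from the mean, square the deviations, divide by n-1, and take the square root. The last lines compare against Python's standard-library `statistics` module, which also uses the n-1 divisor for `variance` and `stdev`.

```python
import math
import statistics

# Golf scores from the slides (n = 6)
scores = [46, 44, 50, 43, 47, 52]
n = len(scores)

# Step 1: the mean (measure of center)
mean = sum(scores) / n

# Step 2: deviations from the mean; these always sum to zero
deviations = [x - mean for x in scores]

# Step 3: squared deviations, so negatives and positives do not cancel
squared_deviations = [d ** 2 for d in deviations]

# Step 4: variance divides by n - 1, not n (units: strokes squared)
variance = sum(squared_deviations) / (n - 1)

# Step 5: the square root returns to the original units (strokes)
std_dev = math.sqrt(variance)

print("mean:", mean)
print("sum of deviations:", sum(deviations))
print("variance:", variance)
print("standard deviation:", round(std_dev, 1))

# Cross-check with the standard library (sample formulas, n - 1 divisor)
print("statistics.variance:", statistics.variance(scores))
print("statistics.stdev:", round(statistics.stdev(scores), 1))
```

Doing the steps by hand in code mirrors the slide formulas; in practice, calling `statistics.stdev(scores)` directly gives the same sample standard deviation.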