Variation

Measures of variation quantify how spread out the data are. Variation is one of the core ideas in statistics.

Super-simple measure of variation

Range = highest value - lowest value

The range is not good for much, but it gives us some idea of how spread out the data are.

Standard Deviation

Standard deviation is a measure of variation based on the mean. Because of this, it can be strongly influenced by outliers, just like the mean.
Standard deviation is always positive or 0 (zero only if all the data values are the same).
The standard deviation has the same units as the data.

Calculating Standard Deviation: Definitional Formula

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

Notice that we are measuring the variation of the data from the mean. This formula is for the sample standard deviation, and is based on the sample mean $\bar{x}$ and the sample size $n$.

Calculating Standard Deviation: Shortcut Formula

$$ s = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}} $$

The advantage: no need to calculate the mean first.
The disadvantage: it doesn't make as much sense.

Example: Definitional Form

Data: 7, 8, 10, 11, 13, 25, so $n = 6$ and $\bar{x} = 74/6 \approx 12.3$.

x      x - x̄                (x - x̄)^2
7      7 - 12.3 = -5.3       28.09
8      8 - 12.3 = -4.3       18.49
10     10 - 12.3 = -2.3      5.29
11     11 - 12.3 = -1.3      1.69
13     13 - 12.3 = 0.7       0.49
25     25 - 12.3 = 12.7      161.29
Sum                          215.34

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{215.34}{5}} \approx 6.6 $$

Example: Shortcut Form

x      x^2
7      49
8      64
10     100
11     121
13     169
25     625
Sums:  74     1128

$$ s = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}} = \sqrt{\frac{6(1128) - 74^2}{6(6 - 1)}} = \sqrt{\frac{1292}{30}} \approx 6.6 $$

Population Standard Deviation

If we have the population data, we can calculate the population standard deviation. To distinguish it, we use a different symbol:

$$ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} $$

Here $\mu$ is the population mean and $N$ is the population size.

Variance

Variance is the square of the standard deviation.
Sample variance: $s^2$
Population variance: $\sigma^2$

Understanding Standard Deviation

Main idea: the bigger the value, the more spread out the data are. The smaller the value, the closer together the data are.

Rule of Thumb

To very roughly approximate $s$:

$$ s \approx \frac{\text{range}}{4} $$

Rough interpretation: "most" data will be within two standard deviations of the mean. In other words:

Approximate highest value $\approx \bar{x} + 2s$
Approximate lowest value $\approx \bar{x} - 2s$

Empirical Rule

For data sets with a bell-shaped distribution:
About 68% of the data fall within one standard deviation of the mean.
About 95% of the data fall within two standard deviations of the mean.
About 99.7% of the data fall within three standard deviations of the mean.

Example

For a particular fast-food store, the time people have to wait at the drive-through has a bell-shaped distribution with $\bar{x} = 3.5$ min and $s = 0.7$ min. Then:

About 68% of people wait between $\bar{x} - s = 2.8$ min and $\bar{x} + s = 4.2$ min.
About 95% of people wait between $\bar{x} - 2s = 2.1$ min and $\bar{x} + 2s = 4.9$ min.
Almost everyone (99.7%) waits between $\bar{x} - 3s = 1.4$ min and $\bar{x} + 3s = 5.6$ min.

Homework 2.5: 3, 9, 21, 23, 25, 33
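The worked examples above can be checked with a short script. The following is a minimal Python sketch, not part of the original slides. The function names (sample_sd_definitional, sample_sd_shortcut, empirical_rule_intervals) are made up for illustration. It computes the sample standard deviation with both formulas, the range / 4 estimate, and the empirical-rule intervals for the drive-through example.

```python
import math


def sample_sd_definitional(data):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))


def sample_sd_shortcut(data):
    """Sample standard deviation: sqrt((n*sum(x^2) - (sum x)^2) / (n(n - 1)))."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


def empirical_rule_intervals(mean, sd):
    """Return {k: (mean - k*sd, mean + k*sd)} for k = 1, 2, 3."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


# Data from the worked examples in the slides.
data = [7, 8, 10, 11, 13, 25]

print(round(sample_sd_definitional(data), 1))  # 6.6
print(round(sample_sd_shortcut(data), 1))      # 6.6

# Rule of thumb: s is very roughly range / 4.
print((max(data) - min(data)) / 4)             # 4.5

# Drive-through example: mean 3.5 min, standard deviation 0.7 min.
for k, (low, high) in empirical_rule_intervals(3.5, 0.7).items():
    print(k, round(low, 1), round(high, 1))
```

Both formulas give the same value, since the shortcut is an algebraic rearrangement of the definitional form. For this data set the range / 4 estimate (4.5) is well below the computed value (6.6), which is consistent with the slides calling it a very rough approximation. The value 25 sits far from the rest of the data and pulls the standard deviation up.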