Descriptive Statistics II: Variability
9/3

Attendance Question
Really, how many courses?
A: 4   B: 5   C: 6   D: 7   E: 8

Variability
• Central tendency locates middle of distribution
• How are scores distributed around that point?
• Low variability vs high variability
• Ways to measure variability
  – Range
  – Interquartile range
  – Variance
  – Standard deviation
[Figure: two histograms, Frequency (rats) by Maze Attempts]

Why Variability is Important
• Inference
  – Reliability of estimators
• For its own sake
  – Consistency (manufacturing, sports, etc.)
  – Diversity (attitudes, strategies)
[Figure: sets of scores with m = 100, shown with sample means M = 87.3, 99.1, 111.6, 100.5, 99.5]

Range
• Distance from minimum to maximum: $\text{range} = \max(X) - \min(X)$
  – Measurement unit or precision is added in the examples
  – Maze attempts, 1 to 11: $(11 - 1) + 1 = 11$
  – Height in inches, X = [66.2, 78.6, 69.6, 65.3, 62.7]: $78.6 - 62.7 + .1 = 16.0$
• Sample range depends on n
• More useful as population parameter
  – Theoretical property of measurement variable
  – E.g. memory test: min and max possible
  – Rough guidelines, e.g. height
[Figures: Frequency (rats) by Maze Attempts; Frequency (students) by Height (Inches)]

Interquartile range
• Quartiles
  – Values of X based on dividing data into quarters
  – 1st quartile: greater than 1/4 of data
  – 3rd quartile: greater than 3/4 of data
  – 2nd quartile = median
• Interquartile range
  – Difference between 1st and 3rd quartiles
  – Like range, but for middle half of distribution
  – Not sensitive to n, so more stable
• Example: X = [1,1,2,2,2,3,3,4,4,4,4,5,5,5,5,6,6,6,6,6,7,7,7,8]
  – 1st quartile = 3
  – 3rd quartile = 6
  – Interquartile range: $6 - 3 = 3$

Variance
• Most sophisticated statistic for variability
• Based on distance of each datum to the mean: $|X - m|$
• Could compute average of distances
• Instead do squared distance
• Average squared difference from mean: $\sigma^2 = \frac{\sum (X - m)^2}{N}$
• Example: X = [88, 94, 108, 122, 133, 145], m = 115
  – Squared distances: $27^2 = 729$, $21^2 = 441$, $7^2 = 49$, $7^2 = 49$, $18^2 = 324$, $30^2 = 900$
  – $\sigma^2 = \frac{729 + 441 + 49 + 49 + 324 + 900}{6} = \frac{2492}{6} = 415.3$

Why squared difference?
• Special property of mean
  – Given population X and some single value $\hat{X}$, define Mean Squared Error: $\text{MSE} = \frac{\sum (X - \hat{X})^2}{N}$
  – What $\hat{X}$ minimizes MSE?
• Mean minimizes error: MSE is smallest at $\hat{X} = m$
• Variance is intrinsic, unavoidable error: it is the MSE that remains at $\hat{X} = m$
[Figure: MSE plotted against $\hat{X}$ for X = [88, 94, 108, 122, 133, 145], vertical axis from 0 to 2000, minimum at $\hat{X} = m$]

Alternative formula for Variance
• $\sigma^2 = \frac{\sum X^2}{N} - m^2$
• Mean of squares minus square of mean
• If every score equals the mean, the variance is zero:
  – X = [m, m, m, …]
  – $X^2 = [m^2, m^2, m^2, …]$
  – $\text{Mean}(X^2) = m^2$
  – $\sigma^2 = m^2 - m^2 = 0$
• As scores move away from m (some up, some down), $\sum X^2$ increases but m stays same

Standard deviation
• Typical difference between X and m
• Again, based on $(X - m)^2$
• Variance is average squared deviation, $\frac{\sum (X - m)^2}{N}$, so sqrt(variance) is standard deviation: $\sigma = \sqrt{\frac{\sum (X - m)^2}{N}}$
• Example: X = [5, 3, 7, 6, 4, 6, 8, 7, 4, 2, 3, 5], m = 5
  – X – m = [0, -2, 2, 1, -1, 1, 3, 2, -1, -3, -2, 0]
  – Square: $(X - m)^2$ = [0, 4, 4, 1, 1, 1, 9, 4, 1, 9, 4, 0]
  – Average: $\frac{\sum (X - m)^2}{N} = \frac{38}{12} = 3.2$
  – Square-root: $\sqrt{\frac{\sum (X - m)^2}{N}} = 1.8$
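
Worked check in code
The four measures can be recomputed from the example data on the slides. The following is a minimal Python sketch, not part of the original lecture. The function names, and the optional `unit` argument standing in for "measurement unit or precision", are illustrative assumptions. Quartiles use the default method of the standard-library `statistics.quantiles`; other quartile conventions can give slightly different values on other data sets.

```python
import math
import statistics

# Example data copied from the slides.
HEIGHTS = [66.2, 78.6, 69.6, 65.3, 62.7]  # inches
MAZE = [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5,
        5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 8]
SCORES = [88, 94, 108, 122, 133, 145]
ATTEMPTS = [5, 3, 7, 6, 4, 6, 8, 7, 4, 2, 3, 5]


def value_range(xs, unit=0.0):
    """max(X) - min(X), optionally plus one measurement unit (assumed parameter)."""
    return max(xs) - min(xs) + unit


def interquartile_range(xs):
    """3rd quartile minus 1st quartile, using statistics.quantiles defaults."""
    q1, _median, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1


def mean(xs):
    return sum(xs) / len(xs)


def mse(xs, guess):
    """Mean squared error of a single guess for every score in xs."""
    return sum((x - guess) ** 2 for x in xs) / len(xs)


def variance(xs):
    """Population variance: the MSE when the guess is the mean."""
    return mse(xs, mean(xs))


def variance_alt(xs):
    """Alternative formula: mean of squares minus square of mean."""
    return sum(x * x for x in xs) / len(xs) - mean(xs) ** 2


def std_dev(xs):
    """Population standard deviation: square root of the variance."""
    return math.sqrt(variance(xs))


if __name__ == "__main__":
    print("range (heights):", round(value_range(HEIGHTS), 1))
    print("range plus precision:", round(value_range(HEIGHTS, unit=0.1), 1))
    print("IQR (maze data):", interquartile_range(MAZE))
    print("variance (scores):", round(variance(SCORES), 1))
    print("alternative formula:", round(variance_alt(SCORES), 1))
    # Guesses on either side of the mean give a larger MSE than the mean.
    for guess in (100, 115, 130):
        print("MSE at", guess, "=", round(mse(SCORES, guess), 1))
    print("variance (attempts):", round(variance(ATTEMPTS), 1))
    print("standard deviation:", round(std_dev(ATTEMPTS), 1))
```

Writing `variance` as `mse` evaluated at the mean mirrors the "Why squared difference?" slide: the variance is the error left over when the best single guess is used. The sketch divides by N throughout, matching the population formulas on the slides rather than a sample (N – 1) estimate.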