Summary of Prev. Lecture

Central Tendency
- Mode: highest frequency; used with nominal or categorical data
- Median: middle value; avoids the influence of outliers
- Mean
  - Arithmetic Mean: First and Second Moment
  - Geometric Mean
  - Weighted Mean

Distribution Descriptor 2
Geography, Jinmu Choi

1. Measure of Dispersion (2)
2. Range and Percentile (2)
3. Mean Deviation, Variance, Std. Dev. (3)
4. Weighted Var. and Std. Dev., CV (3)
5. Skewness and Kurtosis (2)
Summary and Next…

Dispersion

- Dispersion: how the values are concentrated or scattered around the mean and along the value line
  - Very similar to the mean
  - Quite different from the mean
  - Just scattered around
- Xa: 1, 3, 5, 7, 9, 11, 13: Mean = ___, Range = ___
- Xb: -11, -5, 1, 7, 13, 19, 25: Mean = ___, Range = ___

Dispersion Measures

- Magnitude of dispersion
  - Range: Maximum - Minimum
  - Percentiles
  - Mean deviations
  - Standard deviations
- Direction and Sharpness
  - Skewness
  - Kurtosis

Range

- Range: Maximum - Minimum
  - The greater the range in a data series, the more dispersed the data are
  - It shows only how far the values are scattered, not how they are distributed within that span
- Xb: -11, -5, 1, 7, 13, 19, 25: Mean = ___, Range = ___
- Xc: -11, -10, 6, 7, 8, 24, 25: Mean = ___, Range = ___

Percentiles

- Milestones within the range of data
  - Sort the values and count 1/4, 1/2, 3/4 of the total observations from the minimum
  - Median = 1/2 of the observations from the minimum = 50th percentile
- Xb: -11, -5, 1, 7, 13, 19, 25: Mean = ___, Range = ___, Percentile = ___
- Xc: -11, -10, 6, 7, 8, 24, 25: Mean = ___, Range = ___, Percentile = ___

Mean Deviation

- Dispersion using all values
- The average absolute difference between each value and the mean

$$D = \frac{\sum_{i=1}^{n} \left| x_i - \bar{x} \right|}{n}$$

- Xa: 1, 3, 5, 7, 9, 11, 13: Mean Dev. = 3.4286
- Xb: -11, -5, 1, 7, 13, 19, 25: Mean Dev. = 10.2857
- Concerns only the distance of the values from the mean, not the direction
- Values 1, 2, 3, 4, 5, 6, 7, 8, 9: Mean = 5, Mean Dev. = 2.22…
- Values 1, 2, 3, 4, 5, 6, 7, 8, 18: Mean = 6, Mean Dev. = 3.33…

Variance

- Squared difference from the mean
- Population variance

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n} = \frac{\sum_{i=1}^{n} x_i^2}{n} - \mu^2$$

- Sample variance

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n - 1)}$$

Standard Deviation

- Square root of the averaged squared deviation
  - Returns the measure to the magnitude (scale) of the original dataset
  - Example: Mean: 201.23, Var.: 88432.30, Std. Dev.: 297.38

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}} \qquad S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

- For data resembling a normal distribution, the standard deviation gives:
  - About 68% of the data values: $\bar{x} - \sigma \le x \le \bar{x} + \sigma$
  - About 95% of the data values: $\bar{x} - 2\sigma \le x \le \bar{x} + 2\sigma$
  - About 99.7% of the data values: $\bar{x} - 3\sigma \le x \le \bar{x} + 3\sigma$
- Values 1, 2, 3, 4, 5, 6, 7, 8, 9: Mean = 5, Std. Dev. = 2.58…
- Values 1, 2, 3, 4, 5, 6, 7, 8, 18: Mean = 6, Std. Dev. = 4.76…

Weighted Variance

- Variance for grouped data, where $k$ is the number of groups (classes), $f_i$ is the frequency of group $i$, $x_i$ is its mid value, and $\bar{x}_w = \sum_{i=1}^{k} f_i x_i / \sum_{i=1}^{k} f_i$ is the weighted mean

$$\sigma_w^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i x_i^2}{\sum_{i=1}^{k} f_i} - \bar{x}_w^2$$

- The sample version $S_w^2$ uses $\sum_{i=1}^{k} f_i - 1$ in the denominator, as in the unweighted case
- Procedure
  - Get the range for each group (class)
  - Get the mid value for each group (class)
  - Put the mid value in place of each observation
  - Calculate the variance using the list of mid values

Range       Mid value   Observations
4~50        27          10
50~200      125         3
200~1000    600         3

Weighted Standard Deviation

- Square root of the weighted variance

$$\sigma_w = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{k} f_i}} = \sqrt{\frac{\sum_{i=1}^{k} f_i x_i^2}{\sum_{i=1}^{k} f_i} - \bar{x}_w^2}$$

- Unweighted vs. weighted statistics
  - Unweighted variance: 88432.30
  - Unweighted std. dev.: 297.38
  - Weighted variance: 1537.7615
  - Weighted std. dev.: 39.21
- Why do they differ?
  - The variation within each group has been removed, because every observation is replaced by its group's mid value

Coefficient of Variation

- Problem of the mean, variance, and standard deviation: they are sensitive to scale
  - X: 1, 3, 5, 7, 9, 11, 13: mean 7, std. dev. 4
  - Y: 10, 30, 50, 70, 90, 110, 130: mean 70, std. dev. 40
- Coefficient of variation
  - Checks whether two datasets differ only in scale

$$CV = \frac{\sigma}{\mu} \qquad CV = \frac{S}{\bar{x}}$$

  - X and Y above have the same CV (4/7 = 40/70 ≈ 0.57), so they differ only in scale
- Mean: the center of the data
- Standard deviation: how much dispersion the data have
- Both (CV): difference in magnitude, for comparing multiple datasets

Skewness

- Third moment statistic: directional bias of the distribution of the data

$$Sk = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n \sigma^3}$$

- Use the frequency distribution (histogram)
  - X axis: numerical range
  - Y axis: frequency
- Positive skewness: Bulk < Mean
- Negative skewness: Mean < Bulk

Kurtosis

- Fourth moment statistic: sharpness of the distribution of the data

$$K = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n \sigma^4} - 3$$

- Use the histogram
  - X axis: numerical range
  - Y axis: frequency
- The fourth-moment ratio of a normal distribution is 3, so subtracting 3 gives:
  - Normal distribution: K = 0
  - High kurtosis (sharp peak): K > 0
  - Low kurtosis (flat): K < 0

Summary

- Dispersion
  - Range: gives the boundary
  - Percentile: gives the clustering of observations
  - Mean Deviation: magnitude of dispersion
  - Variance and Standard Deviation: magnitude of dispersion
  - Weighted Variance and Standard Deviation: dispersion of grouped values
  - Coefficient of Variation: removes scale differences
- Direction and Sharpness
  - Skewness: direction from the mean
  - Kurtosis: sharpness compared to the normal distribution

Next

- Lab3: Additional Statistics and MAUP
- Lecture 4: Relationship Descriptor 1. Correlation Analysis (Ch 3, pp.94-107)
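
The measures covered in this lecture can be tried out with a short Python sketch. It is only an illustration, not the lab's own code: the function names (`value_range`, `mean_deviation`, `weighted_variance`, `coefficient_of_variation`, `standardized_moment`, `skewness`, `kurtosis`) are made up for this example, and the population forms (dividing by n) are assumed because they reproduce the worked values on the slides, such as a standard deviation of 4 for 1, 3, 5, 7, 9, 11, 13. Quartiles come from `statistics.quantiles` with its default method; other percentile conventions give slightly different cut points.

```python
import math
import statistics

# Example series from the slides.
XA = [1, 3, 5, 7, 9, 11, 13]
XB = [-11, -5, 1, 7, 13, 19, 25]
XC = [-11, -10, 6, 7, 8, 24, 25]


def value_range(xs):
    """Range: maximum minus minimum."""
    return max(xs) - min(xs)


def mean_deviation(xs):
    """Average absolute difference between each value and the mean."""
    m = statistics.fmean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def weighted_variance(mids, freqs):
    """Population variance of grouped data (class mid values and frequencies)."""
    total = sum(freqs)
    w_mean = sum(f * x for x, f in zip(mids, freqs)) / total
    return sum(f * (x - w_mean) ** 2 for x, f in zip(mids, freqs)) / total


def coefficient_of_variation(xs):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(xs) / statistics.fmean(xs)


def standardized_moment(xs, order):
    """Sum of (x - mean)^order divided by n * sigma^order."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** order for x in xs) / (len(xs) * sd ** order)


def skewness(xs):
    """Third moment statistic: positive when the bulk lies below the mean."""
    return standardized_moment(xs, 3)


def kurtosis(xs):
    """Fourth moment statistic minus 3, so a normal distribution scores 0."""
    return standardized_moment(xs, 4) - 3


if __name__ == "__main__":
    for name, xs in (("Xa", XA), ("Xb", XB), ("Xc", XC)):
        print(name,
              "mean:", statistics.fmean(xs),
              "range:", value_range(xs),
              "quartiles:", statistics.quantiles(xs, n=4),
              "mean dev.:", round(mean_deviation(xs), 4),
              "std. dev.:", round(statistics.pstdev(xs), 4))

    # Grouped-data example: mid values 27, 125, 600 with 10, 3, 3 observations.
    w_var = weighted_variance([27, 125, 600], [10, 3, 3])
    print("weighted variance:", w_var, "weighted std. dev.:", math.sqrt(w_var))

    print("CV of Xa:", coefficient_of_variation(XA))
    print("skewness of 1..8, 18:", skewness([1, 2, 3, 4, 5, 6, 7, 8, 18]))
    print("kurtosis of 1..9:", kurtosis(list(range(1, 10))))
```

Running the script fills in the blanks left on the Dispersion, Range, and Percentiles slides, and shows that Xb and Xc share the same mean and range while their mean deviations and standard deviations differ.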