Jan. 29

“Statistics” for one quantitative variable…
  Mean and standard deviation (last week!)
  “Robust” measures of location (median and its friends)
  Quartiles, IQR, five-number summary, box plots
  Percentiles
  Transforming data…
    Rescale: Y = cX
    Recenter: Y = X + a
    Other transformations
    Adding variables to each other
  Standardizing data…
  Population vs. sample

NH polls, 1/26/04: errors
[Figure: Errors from 1/26 NH polls]

A statistic is anything that can be computed from data.

STATISTICS of a single quantitative variable
  MEAN
  MEDIAN
  QUARTILES (Q1, Q3)
  Five-number summary
  Boxplots
  Interquartile range
  PERCENTILES / QUANTILES / FRACTILES
  STANDARD DEVIATION
  VARIANCE

Statistics of one variable…
  Median: the middle value when the values are ranked from smallest to largest (or, when there are two middle values, their average). “Robust.”
  Other measures of center: trimmed mean, midmean, geometric mean, “RMS mean”

Number of Colleges (raw data, n = 72):
1 2 1 2 12 1 1 1 9 1 1 5 1 7 8 6 1 1 10 1 5 5 7 8
1 6 1 10 4 1 1 10 10 5 7 7 1 5 14 8 1 6 1 1 5 8 1 14
1 1 5 6 6 7 5 13 14 12 5 7 1 8 1 12 12 6 9 8 7 1 8 6

Number of Colleges, sorted (read down each column, then across):
 1  1  2  6  8 12
 1  1  4  6  8 12
 1  1  5  6  8 12
 1  1  5  6  8 13
 1  1  5  6  8 14
 1  1  5  7  8 14
 1  1  5  7  9 14
 1  1  5  7  9
 1  1  5  7 10
 1  1  5  7 10
 1  1  5  7 10
 1  1  6  7 10
 1  2  6  8 12

Mean vs. Median
  Large tails affect the mean more than the median. So:
    Right-skewed distribution: mean is to the right of the median
    Left-skewed distribution: mean is to the left of the median

[Figure: Colleges, Datadesk histogram]
  Colleges: median = 5, mean = 5.36
  Salaries: median = 60,000, mean = 106,875

So, which measure of “center” is best?
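To make the measures of center concrete, here is a minimal Python sketch, assuming Python 3.8+ and its standard-library `statistics` module, that computes them for the Number of Colleges data. The helper names `trimmed_mean`, `midmean` and `rms` are illustrative, not from any library. The trimming rule (drop int(p·n) values from each end) and treating the midmean as a 25% trimmed mean are assumptions; texts differ in how they handle fractional counts.

```python
import math
import statistics

# "Number of Colleges" data from the slides (n = 72)
colleges = [1, 2, 1, 2, 12, 1, 1, 1, 9, 1, 1, 5, 1, 7, 8, 6, 1, 1, 10, 1, 5, 5, 7, 8,
            1, 6, 1, 10, 4, 1, 1, 10, 10, 5, 7, 7, 1, 5, 14, 8, 1, 6, 1, 1, 5, 8, 1, 14,
            1, 1, 5, 6, 6, 7, 5, 13, 14, 12, 5, 7, 1, 8, 1, 12, 12, 6, 9, 8, 7, 1, 8, 6]

def trimmed_mean(values, proportion):
    """Mean after dropping int(proportion * n) values from EACH end of the sorted data
    (an assumed trimming convention)."""
    xs = sorted(values)
    k = int(proportion * len(xs))
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

def midmean(values):
    """Mean of the middle half of the data, taken here as a 25% trimmed mean."""
    return trimmed_mean(values, 0.25)

def rms(values):
    """Root-mean-square: square root of the mean of the squared values."""
    return math.sqrt(sum(x * x for x in values) / len(values))

mean = statistics.mean(colleges)
median = statistics.median(colleges)   # average of the 36th and 37th ranked values
centers = {
    "mean": mean,
    "median": median,
    "10% trimmed mean": trimmed_mean(colleges, 0.10),
    "midmean": midmean(colleges),
    "geometric mean": statistics.geometric_mean(colleges),
    "RMS": rms(colleges),
}
for name, value in centers.items():
    print(f"{name:>16}: {value:.3f}")
```

The mean (about 5.36) sits above the median (5), as the slide predicts for right-skewed data. For positive data the geometric mean can never exceed the ordinary mean, and the RMS can never fall below it.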
All the measures agree (roughly) when the distribution is symmetric.
The mean has attractive mathematical properties.
The mean is also tied to the total (total = n × mean), if that is what you care about.
The median may be more “typical” when the distribution is not symmetric.
A measure is “robust” if it works reasonably well under a wide variety of circumstances. Medians are robust.

Computing percentiles
To calculate the 20th percentile:
  1. Rank the values from smallest to largest.
  2. Compute 20% of n: 20% of 72 = 14.4.
  3. Count off that many values, starting from the lowest.
  The value at which you stop is the 20th percentile.
What if you stop between values?
(Use the sorted Number of Colleges data shown earlier.)

QUARTILES
  Lower quartile (Q1) = 25th percentile
  Upper quartile (Q3) = 75th percentile
  (What’s Q2?)
INTERQUARTILE RANGE (IQR) = Q3 - Q1

Five-number summary (top to bottom):
  maximum (or, say, the 95th percentile)
  Q3
  median
  Q1
  minimum (or, say, the 5th percentile)

Linear Transformations
If you MULTIPLY or DIVIDE a variable by a constant, Y = cX or Y = X/c, then:
  measures of center are multiplied or divided by c
  measures of spread are multiplied or divided by |c|
If you ADD or SUBTRACT a constant, Y = X + a or Y = X - a, then:
  measures of center are increased (decreased) by a
  measures of spread are UNCHANGED.

More transformations
ADDING VARIABLES: W = X + Y
  Mean(W) = Mean(X) + Mean(Y)
  Standard deviation of W: anything can happen; it depends on how X and Y vary together.
OTHER TRANSFORMATIONS: Y = X²? Y = log(X)?
  NO RELIABLE RULES for mean or std. dev.

Standardized Variables
Write x̄ and S for the mean and standard deviation of X.
Then form the transformed variable Z = (X - x̄) / S.
Then mean(Z) = 0 and std dev(Z) = 1.
Z answers the question: how many standard deviations is this value above (or below) the mean?
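The counting method for percentiles, the five-number summary, and the transformation rules can all be checked numerically. Below is a minimal Python sketch on the Number of Colleges data, assuming Python's standard-library `statistics` module and the sample standard deviation (n - 1 divisor). How to resolve a count that stops between two values, or exactly on one, is an assumed convention here (textbooks and software differ). The helper names `percentile`, `five_number_summary` and `iqr`, and the constants c = -2, a = 3, are illustrative only.

```python
import math
import statistics

# "Number of Colleges" data from the slides (n = 72)
colleges = [1, 2, 1, 2, 12, 1, 1, 1, 9, 1, 1, 5, 1, 7, 8, 6, 1, 1, 10, 1, 5, 5, 7, 8,
            1, 6, 1, 10, 4, 1, 1, 10, 10, 5, 7, 7, 1, 5, 14, 8, 1, 6, 1, 1, 5, 8, 1, 14,
            1, 1, 5, 6, 6, 7, 5, 13, 14, 12, 5, 7, 1, 8, 1, 12, 12, 6, 9, 8, 7, 1, 8, 6]

def percentile(values, p):
    """p-th percentile by counting off p% of n values from the bottom.

    Assumed convention: if the count stops between two values, take the next
    value up; if it stops exactly on a value, average that value with the next
    one (so p = 50 gives the usual median).
    """
    xs = sorted(values)
    n = len(xs)
    pos = p * n / 100            # e.g. 20% of 72 = 14.4
    k = math.floor(pos)
    if pos == k:                 # stopped exactly on the k-th value
        if k == 0:
            return xs[0]
        if k >= n:
            return xs[-1]
        return (xs[k - 1] + xs[k]) / 2
    return xs[k]                 # between the k-th and (k+1)-th: take the (k+1)-th

def five_number_summary(values):
    return (min(values), percentile(values, 25), percentile(values, 50),
            percentile(values, 75), max(values))

def iqr(values):
    return percentile(values, 75) - percentile(values, 25)

p20 = percentile(colleges, 20)
summary = five_number_summary(colleges)
print("20th percentile:", p20)
print("min, Q1, median, Q3, max:", summary)
print("IQR:", iqr(colleges))

# Linear transformation Y = cX + a: center follows the rule, spread scales by |c|
c, a = -2, 3
y = [c * x + a for x in colleges]
assert math.isclose(statistics.mean(y), c * statistics.mean(colleges) + a)
assert math.isclose(statistics.stdev(y), abs(c) * statistics.stdev(colleges))
assert iqr(y) == abs(c) * iqr(colleges)   # the shift a drops out of the spread

# Adding variables: means add, but SD(W) depends on how the two variables move together
print("SD(X + X): ", statistics.stdev([x + x for x in colleges]))   # twice SD(X)
print("SD(X + -X):", statistics.stdev([x - x for x in colleges]))   # 0

# Standardizing: Z = (X - x̄) / S
xbar = statistics.mean(colleges)
s = statistics.stdev(colleges)
z = [(x - xbar) / s for x in colleges]
assert math.isclose(statistics.mean(z), 0.0, abs_tol=1e-12)
assert math.isclose(statistics.stdev(z), 1.0)
print("z-score of 14 colleges:", round((14 - xbar) / s, 2))
```

Under this convention the count of 14.4 stops between the 14th and 15th ranked values, both equal to 1, so the 20th percentile is 1. The five-number summary works out to minimum 1, Q1 = 1, median = 5, Q3 = 8, maximum 14, giving IQR = 7.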