
VARIABILITY
Distributions
Measuring dispersion
Variance and standard deviation

Review: Distribution
An arrangement of cases according to their score or value on one or more variables
• Categorical variable
• Continuous variable

Case no.  Age  Height  M/F
1         23   68      M
2         22   64      F
3         23   69      F
4         25   71      M
5         27   64      F
6         22   72      M
7         24   65      F
8         23   66      M
9         23   66      F
10        25   68      F
11        21   68      M
12        21   62      F
13        24   71      M
14        27   66      F
15        21   62      F
16        25   56      F
17        22   71      M
18        22   70      M
19        25   66      F
20        26   60      F
21        21   52      F
22        31   70      F
23        24   71      M
24        31   61      F
25        23   72      M
26        27   71      F
27        25   71      M
28        26   64      F
29        22   66      F
30        29   69      M
31        24   67      F

Summary statistics: Age mean = 24; Height mean = 67; M = 39%, F = 61%

Dispersion
How do cases “disperse” (arrange themselves) around the mean?
[Figure; axis label: officers]

Three statistics that measure dispersion
Measure how cases “disperse” (arrange themselves) around the mean

• Average deviation

$$\frac{\sum |x - \bar{x}|}{n}$$

   – Average distance between the mean and the values (scores) for each case
   – Uses absolute distances (no + or -)
   – Affected by extreme scores
   – We’ll never use it in class

• Variance (s²): A sample’s cumulative dispersion

$$s^2 = \frac{\sum (x - \bar{x})^2}{n}$$

   – In place of n we always use n-1 (our sample sizes are always small)

• Standard deviation (s): A standardized form of variance, comparable between samples

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

   – In place of n we always use n-1 (our sample sizes are always small)
   – Square root of the variance
   – Expresses dispersion in units of equal size for that particular distribution
   – Less inflated by extreme scores than the variance, because the square root undoes the squaring

Variability exercise
Sample 1 (n=10): Random sample of patrol officers, each scored 1-5 on a cynicism scale

Officer  Score  Mean  Diff.  Sq.
1        3      2.9   .1     .01
2        3      2.9   .1     .01
3        3      2.9   .1     .01
4        3      2.9   .1     .01
5        3      2.9   .1     .01
6        3      2.9   .1     .01
7        3      2.9   .1     .01
8        1      2.9   -1.9   3.61
9        2      2.9   -.9    .81
10       5      2.9   2.1    4.41
Sum                          8.90

Variance (sum of squares / n-1): s² = 8.90 / 9 = .99
Standard deviation (sq. root of variance): s = .99

[Figure: Sample 1 scores around the mean. This is not an acceptable graph; it’s only to illustrate dispersion.]

Sample 2 (n=10): Another random sample of patrol officers, each scored 1-5 on a cynicism scale

Officer  Score  Mean  Diff.  Sq.
1        2      ___   ___    ___
2        1      ___   ___    ___
3        1      ___   ___    ___
4        2      ___   ___    ___
5        3      ___   ___    ___
6        3      ___   ___    ___
7        3      ___   ___    ___
8        3      ___   ___    ___
9        4      ___   ___    ___
10       2      ___   ___    ___
Sum                          ____

Variance: s² = ____
Standard deviation: s = ____
Compute ...

Two random samples of patrol officers, each scored 1-5 on a cynicism scale

         Sample 1 (n=10)            Sample 2 (n=10)
Officer  Score  Mean  Diff.  Sq.    Score  Mean  Diff.  Sq.
1        3      2.9   .1     .01    2      2.4   -.4    .16
2        3      2.9   .1     .01    1      2.4   -1.4   1.96
3        3      2.9   .1     .01    1      2.4   -1.4   1.96
4        3      2.9   .1     .01    2      2.4   -.4    .16
5        3      2.9   .1     .01    3      2.4   .6     .36
6        3      2.9   .1     .01    3      2.4   .6     .36
7        3      2.9   .1     .01    3      2.4   .6     .36
8        1      2.9   -1.9   3.61   3      2.4   .6     .36
9        2      2.9   -.9    .81    4      2.4   1.6    2.56
10       5      2.9   2.1    4.41   2      2.4   -.4    .16
Sum                          8.90                       8.40

Variance (sum of squares / n-1): Sample 1 s² = .99; Sample 2 s² = .93
Standard deviation (sq. root of variance): Sample 1 s = .99; Sample 2 s = .97

[Figures: scores of each sample around its mean. These are not acceptable graphs; they’re only used here to illustrate how the scores disperse around the mean.]

VARIABILITY
Shape of distributions
Flat, peaked, normal

“Flat” distributions
• Dispersion (aka, “variability”): How scores or values arrange themselves around the mean
• When scores are more dispersed (i.e., “variability” is greater) a distribution’s shape gets flatter
   – Greater distance between most scores and the mean
   – Many scores are at a considerable distance from the mean
   – The mean loses value as a “summary statistic”

[Figure: arrests, flat distribution; the mean (3.65) is a poor descriptor]

“Peaked” and “normal” distributions
• Dispersion (aka, “variability”): How scores or values arrange themselves around the mean
• Peaked: If most scores cluster about a certain value the shape of the distribution is called “peaked”
• Normal: If the clustering of scores is around the mean the distribution is called “normal”
   – In social science research it turns out that scores or values for many variables are normally or near-normally distributed
   – This allows use of the mean to describe the underlying datasets
   – That’s why means are called a “summary statistic”: they can “summarize” the values of samples or populations

[Figure: arrests, peaked distribution (but not “normal”); the mean (2.3) is not a good descriptor]
[Figure: arrests, peaked and “normal” distribution; the mean (3.0) is a good descriptor]

Characteristics of normal distributions
• Unimodal and symmetrical: shapes on both sides of the mean are identical
   – 68.26 percent of the area “under” the curve, meaning 68.26 percent of the cases, falls within one “standard deviation” (+/- 1 SD) from the mean
• The fact that a distribution is “normal” or “near-normal” does NOT imply that the mean is of any particular value. All it implies is that scores distribute themselves around the mean “normally”.
   – Means depend on the data. In this distribution the mean could be any value.
   – By definition, the standard deviation score that corresponds with the mean of a normal distribution, whatever the mean might be, is zero: the mean lies 0 standard deviations from itself.

[Figure: normal curve, labeled “Mean (whatever it is)” and “Standard deviation (always 0 at the mean)”]

How well do means represent (summarize) a sample?
13 officers scored on numbers of tickets written in one week

If variable “no. of tickets” were “normally” distributed, most cases would fall inside a bell-shaped curve. Here they don’t.
In a normal distribution about 68% of cases would fall within 1 SD of the mean: 13 × .68 ≈ 9 cases.
But here only 7 cases (Officers D-J) do, while nearly as many (6) don’t. Scores are very dispersed, making the distribution mostly flat.
So here the mean is NOT a good shortcut for describing how officers performed.

[Figure: frequency by number of tickets, Officers A-M; -1 SD = 2.13, mean = 4.46, +1 SD = 6.79]

Officer(s)  Tickets each
A           1
B, C        2
D, E        3
F, G        4
H, I        5
J           6
K, L        7
M           9

Mean = 4.46   SD = 2.33

13 officers scored on numbers of tickets written in one week

Here most cases do fall inside the bell-shaped curve. Variable “no. of tickets” seems near-normally distributed.
Here, 9 of 13 cases (Officers C-K) do fall within 1 SD of the mean.
The distribution is near-normal because most officers wrote close to the same number of tickets. The cases “cluster” around the mean.
So, for this sample the mean is a decent summary statistic, a good shortcut for describing officer performance.

[Figure: frequency by number of tickets, Officers A-M; -1 SD = 2.59, mean = 4.69, +1 SD = 6.79]

Officer(s)  Tickets each
A           1
B           2
C           3
D, E, F     4
G, H, I     5
J, K        6
L           7
M           9

Mean = 4.69   SD = 2.1

Going beyond description…
• When variables are normally or near-normally distributed, the mean, variance and standard deviation can help describe datasets
• But they are also useful in explaining why things change; that is, in testing hypotheses
• You want to test the hypothesis that college-educated cops are more effective: college → greater effectiveness
   – Independent variable: college (Y/N)
   – Dependent variable: effectiveness (scale 1-5)
• You go to the XYZ police dept., draw two samples of patrol officers (one of college grads, the other of non-college grads) and test each officer for effectiveness. On a scale of 1 (ineffective) to 5 (highly effective) this is how they scored:
   – 10 college grads (mean 3.7)
   – 10 non-college (mean 2.8)
• The difference between means is in the hypothesized direction. But does that “prove” that college grads are more effective?
• To determine whether the difference in means is “statistically significant,” meaning large enough that it is unlikely to be a chance result and so supports the value of education, we need to know each sample’s variance. Don’t worry, we’ll cover this later!

[Figure: Are college-educated cops more effective? College grads vs. non-college grads]

Exam information
• You must bring a regular, non-scientific calculator with no functions beyond a square root key.
• You will be asked to apply concepts including research question, hypothesis and variables to the “college education and police job performance” article.
• You will be given data and asked to create graph(s) depicting the distribution of a single variable.
• You will compute basic statistics, including mean, median, mode and standard deviation. All computations must be shown on the answer sheet.
• You will be given the formula for variance (s²). You must use and display the procedure described in the slides and practiced in class for manually calculating variance (s²) and its square root, known as standard deviation (s).
• This is a relatively brief exam. You will have one hour to complete it. We will then take a break and move on to the next topic.
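
Checking manual calculations
The manual procedure from the variability exercise (subtract the mean from each score, square the differences, sum them, divide by n-1, take the square root) can be cross-checked with a short script. What follows is only a sketch for checking practice answers, not part of the exam procedure. It is written in Python, and the function and variable names (sample_variance, sample_sd, count_within_one_sd, tickets_flat, tickets_near_normal) are made up for illustration. The data are the two cynicism samples and the two ticket samples from the slides.

```python
from math import sqrt


def sample_variance(scores):
    """Sum of squared differences from the mean, divided by n-1."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    return sum_of_squares / (n - 1)


def sample_sd(scores):
    """Standard deviation: the square root of the variance."""
    return sqrt(sample_variance(scores))


def count_within_one_sd(scores):
    """Number of cases that fall between mean - 1 SD and mean + 1 SD."""
    mean = sum(scores) / len(scores)
    sd = sample_sd(scores)
    return sum(1 for x in scores if mean - sd <= x <= mean + sd)


# Cynicism scores (scale 1-5) for the two samples of patrol officers
sample_1 = [3, 3, 3, 3, 3, 3, 3, 1, 2, 5]
sample_2 = [2, 1, 1, 2, 3, 3, 3, 3, 4, 2]

# Tickets written in one week by Officers A-M in the two ticket examples
tickets_flat = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 9]
tickets_near_normal = [1, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 9]

datasets = [
    ("Sample 1", sample_1),
    ("Sample 2", sample_2),
    ("Tickets (flat)", tickets_flat),
    ("Tickets (near-normal)", tickets_near_normal),
]

for name, scores in datasets:
    mean = sum(scores) / len(scores)
    print(
        f"{name}: mean = {mean:.2f}, "
        f"s2 = {sample_variance(scores):.2f}, "
        f"s = {sample_sd(scores):.2f}, "
        f"within 1 SD = {count_within_one_sd(scores)} of {len(scores)}"
    )
```

Run as written, the script should reproduce the slide values: s² = .99 and s = .99 for Sample 1, s² = .93 and s = .97 for Sample 2, and 7 of 13 versus 9 of 13 officers within 1 SD of the mean for the two ticket examples.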
