DESCRIBING DATA: 2
Numerical summaries of data using measures of central tendency and dispersion

Central tendency: Mode

Table 1. Undergraduate Majors

  Major                f
  Anthropology        97
  Economics          104
  Geography           57
  Political Science  110
  Sociology           82

The mode is the most frequent category: Political Science (f = 110).

Bimodal Distributions

Table 1. Undergraduate Majors

  Major                f
  Anthropology        97
  Economics          110
  Geography           57
  Political Science  110
  Sociology           82

Economics and Political Science are tied at f = 110, so the distribution has two modes.

Mode for Grouped Frequency Distributions Based on Interval Data

  Mean daily temp. (degrees)   Place A (f)   Place B (f)
  10-19.9                           5             0
  20-29.9                           5             5
  30-39.9                          20            10
  40-49.9                          30            15
  50-59.9                          20            30
  60-69.9                          20            40

Mode = midpoint of the modal class interval (Place A: 45, the midpoint of 40-49.9; Place B: 65, the midpoint of 60-69.9).

Median
• The point in the distribution above which and below which exactly half the observations lie (the 50th percentile)
• Calculation depends on whether the number of observations is odd or even.

  Distribution 1 (n = 5): 198 179 172 167 154           Median = 172 (the middle value)
  Distribution 2 (n = 6): 197 193 189 187 183 179       Median = 188 (midway between 189 and 187)

MEDIAN for grouped frequency distributions based on interval data

  Mean daily temp. (degrees)    f     Cumulative f
  10-19.9                       5           5
  20-29.9                       5          10
  30-39.9                      20          30
  40-49.9                      30          60
  50-59.9                      20          80
  60-69.9                      20         100

Median = 40 + ((20/30) * 10) = 40 + 6.67 = 46.67

Here 40 is the lower limit of the class that contains the 50th observation (n/2 = 50), 20 is the number of observations still needed beyond the 30 that fall below that class, 30 is the frequency of that class, and 10 is the class width.

ARITHMETIC MEAN

$\bar{Y} = \left( \sum y_i \right) / n$

$\bar{y} = (1 + 1 + 3 + 3 + 6 + 7 + 7) / 7 = 28 / 7 = 4$

Mean for Grouped Data

  Mean daily temp. (degrees)    f     Midpoint of interval   f times midpoint
  10-19.9                       5             15                    75
  20-29.9                       5             25                   125
  30-39.9                      20             35                   700
  40-49.9                      30             45                  1350
  50-59.9                      20             55                  1100
  60-69.9                      20             65                  1300
  Totals                      100                                 4650

Mean = sum of weighted midpoints / n = 4650/100 = 46.5

Mean is the balancing point of the distribution

[Figure: scores marked with X along a number line from 0 to 9, with the MEAN labeled as the balance point.]

Key Properties of the Mean
• The sum of the differences between the individual scores and the mean equals 0: $\sum (Y - \bar{Y}) = 0$
• The sum of the squared differences between the individual scores and the mean equals a minimum value: $\sum (Y - \bar{Y})^2 = \text{minimum}$

Weaknesses of each measure of central tendency
• MODE: ignores all other information about values except the most frequent one
• MEDIAN: ignores the LOCATION of scores above or below the midpoint
• MEAN: is the most sensitive to extreme values

Impacts of skewed distributions

[Figure: skewed distribution with the positions of the mean, mode, and median marked.]

Measures of Dispersion

Poverty Households (%) in 2 suburbs by tract

  Suburb A (less dispersion)   Suburb B (more dispersion)
            24                           28
            23                           25
            22                           22
            21                           19
            20                           16
  Mean = 22                    Mean = 22

Range
• Highest value minus the lowest value
• Problem: ignores all the other values between the two extreme values

Interquartile range
• Based on the quartiles (25th percentile and 75th percentile of a distribution)
• Interquartile range = Q3 - Q1
• Semi-interquartile range = (Q3 - Q1)/2
• Eliminates the effect of extreme scores by excluding them

Graphic representation: Box Plot

[Figure: box plots of infant mortality rate by region: Africa (N = 52), Asia (N = 44), Latin America (N = 37).]

Variance
• A measure of dispersion based on the second property of the mean discussed earlier: $\sum (Y - \bar{Y})^2 = \text{minimum}$

Step 1: Calculate the total sum of squares around the mean

    Y     $Y - \bar{Y}$   $(Y - \bar{Y})^2$
   10         -5               25
   12         -3                9
   14         -1                1
   15          0                0
   16         +1                1
   18         +3                9
   20         +5               25
  Mean = 105/7 = 15         Sum = 70

Step 2: Take an average of this total variation

$s^2 = \sum (Y - \bar{Y})^2 / (n - 1)$

Why n - 1 rather than simply n? The normal procedure involves estimating the variance of a population using data from a sample. Samples, especially small samples, are less likely to include the extreme scores in the population, so they tend to underestimate the population variance. Dividing by n - 1 compensates for this underestimate.

Step 3: Take the square root of the variance

$s = \sqrt{ \sum (Y - \bar{Y})^2 / (n - 1) }$

Purpose: expresses dispersion in the original units of measurement, not units of measurement squared.
Like variance: the larger the value, the greater the variability.

Coefficient of Variation (V)

V = (standard deviation / mean)

Value: allows you to compare dispersion across groups with very different mean values, or across variables with very different measurement scales.
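
The mode, median, and mean examples for ungrouped data earlier in these notes can be checked with a short script. This is a minimal sketch in Python using the standard `statistics` module; the helper `modes` and all variable names are illustrative assumptions, not part of the original slides.

```python
import statistics

# Frequency table from "Table 1. Undergraduate Majors".
majors = {
    "Anthropology": 97,
    "Economics": 104,
    "Geography": 57,
    "Political Science": 110,
    "Sociology": 82,
}
# Bimodal version of the table: Economics is also 110.
majors_bimodal = {**majors, "Economics": 110}


def modes(freq_table):
    """Return every category that shares the highest frequency."""
    top = max(freq_table.values())
    return [category for category, f in freq_table.items() if f == top]


# Median examples: odd n takes the middle value, even n averages the middle two.
distribution_1 = [198, 179, 172, 167, 154]
distribution_2 = [197, 193, 189, 187, 183, 179]

# Scores used in the arithmetic mean example.
scores = [1, 1, 3, 3, 6, 7, 7]

print("Modes:", modes(majors))
print("Modes (bimodal table):", modes(majors_bimodal))
print("Median, distribution 1:", statistics.median(distribution_1))
print("Median, distribution 2:", statistics.median(distribution_2))
print("Mean:", statistics.mean(scores))
```

Run as written, this should reproduce the values in the notes: a single mode (Political Science), two modes in the bimodal table, medians of 172 and 188, and a mean of 4.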
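
The grouped-data calculations for the temperature table (mode as the midpoint of the modal class, the interpolated median, and the mean from weighted midpoints) can be sketched as follows. The function names are hypothetical, and the sketch assumes each class such as 10-19.9 is treated as the interval from 10 up to (but not including) 20, which gives a class width of 10 and the midpoints 15, 25, and so on that the notes use.

```python
from itertools import accumulate

# Class intervals from the temperature table, assumed to be [lower, lower + width).
lower_limits = [10, 20, 30, 40, 50, 60]
width = 10
place_a = [5, 5, 20, 30, 20, 20]
place_b = [0, 5, 10, 15, 30, 40]


def grouped_mode(lowers, width, freqs):
    """Midpoint of the class with the highest frequency (first one if tied)."""
    i = freqs.index(max(freqs))
    return lowers[i] + width / 2


def grouped_median(lowers, width, freqs):
    """Interpolate within the class that contains the (n/2)th observation."""
    half = sum(freqs) / 2
    cumulative = list(accumulate(freqs))
    # First class whose cumulative frequency reaches n/2.
    i = next(k for k, cf in enumerate(cumulative) if cf >= half)
    below = cumulative[i - 1] if i > 0 else 0
    return lowers[i] + (half - below) / freqs[i] * width


def grouped_mean(lowers, width, freqs):
    """Sum of (frequency times class midpoint) divided by n."""
    weighted = sum(f * (lo + width / 2) for lo, f in zip(lowers, freqs))
    return weighted / sum(freqs)


print("Mode, Place A:", grouped_mode(lower_limits, width, place_a))
print("Mode, Place B:", grouped_mode(lower_limits, width, place_b))
print("Median, Place A:", round(grouped_median(lower_limits, width, place_a), 2))
print("Mean, Place A:", grouped_mean(lower_limits, width, place_a))
```

For the Place A frequencies, the median step evaluates 40 + ((50 - 30) / 30) * 10, the same expression as in the notes, and the mean step evaluates 4650 / 100.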
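
The three variance steps and the coefficient of variation can be followed in code. The sketch below is one possible Python version, not the author's own procedure; `spread_summary` and its dictionary keys are illustrative names. It divides by n - 1, as in Step 2, and uses the worked example (10 through 20) plus the two suburbs from the dispersion table.

```python
import math


def spread_summary(values):
    """Range, sum of squares, sample variance, standard deviation, and V."""
    n = len(values)
    mean = sum(values) / n
    deviations = [y - mean for y in values]
    # Step 1: total sum of squares around the mean.
    sum_of_squares = sum(d ** 2 for d in deviations)
    # Step 2: average the variation, dividing by n - 1.
    variance = sum_of_squares / (n - 1)
    # Step 3: square root returns to the original units.
    sd = math.sqrt(variance)
    return {
        "mean": mean,
        "range": max(values) - min(values),
        "sum_of_deviations": sum(deviations),
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation, V
    }


worked_example = [10, 12, 14, 15, 16, 18, 20]
suburb_a = [24, 23, 22, 21, 20]
suburb_b = [28, 25, 22, 19, 16]

for label, data in [
    ("Worked example", worked_example),
    ("Suburb A", suburb_a),
    ("Suburb B", suburb_b),
]:
    s = spread_summary(data)
    print(
        f"{label}: mean={s['mean']:.2f} range={s['range']} "
        f"variance={s['variance']:.2f} sd={s['sd']:.2f} V={s['cv']:.3f}"
    )
```

For the worked example the sum of squares is 70, so the variance is 70/6 (about 11.67) and the standard deviation is about 3.42. The two suburbs share a mean of 22, but Suburb B shows the larger range, variance, and V, matching the "more dispersion" label in the table.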