Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Outline and Review of Chapter 3 Measures of Central Tendency In the last chapter, we explored how creating grouped frequency distribution tables allows us to condense our information; instead of trying to show each and every individual score, we can condense groups to make our distributions easier to describe and visualize. Measures of central tendency (Mean, Median, and Mode) serve a similar purpose. They are single numbers that tell your reader something about the distribution without showing them the distribution. They represent convenient ways to describe and compare data sets. Let’s return to the normal distribution to consider the most common measure of central tendency, the Mean. The Shape of a Normal Distribution Normal distributions are normal; they are nice and symmetrical with the most common values clustered around the mean and less common values falling far from the mean. Normal distributions are easy to describe. If you know that a distribution is perfectly normal, there are only two questions to ask; What is the mean? and What is the width?(The Standard Deviation provides a way to report the width of a distribution and we will discuss it in the next chapter.) This is why scientific reports often report the mean and standard deviation of a set of data. With those two numbers, you can imagine the shape of any data set without needing to know the list of all the individual scores. Figure 3.1. Frequency distribution histograms of two sets of data with identical means (M=11) but different widths, or standard deviations. The width of the distribution on the left is larger than the width of the group on the right. As stated, a set of data are often described with only the mean and standard deviation. There are other measures of central tendency, but they are unnecessary when describing normal distributions, and here’s why: The Median is the score that divides a distribution in half. If you have an odd number of scores, it will be the one precisely in the middle, if you have an even number of scores, it is the average of the middle two. (Calculating the true median is actually a little more complicated than that, but this definition is acceptable in most cases). Look at the distributions in Figure 3.1 and see if you can determine the median. For this and any perfectly symmetrical distribution, the median will equal the mean. It is not necessary to report the median for a normal distribution. The Mode is the most common score in a distribution and is usually easy to spot in a frequency distribution histogram; it will always be the tallest bar in the distribution. For a perfectly normal distribution, the mean will be the most common score and therefore equal the mode. It is not necessary to report the mode of a normal distribution. Skewed Distributions You may be asking yourself why we bother with these other measures of central tendency. The simple answer is that not all distributions are normal. It is appropriate to report the median and mode of a distribution when they are not the same as the mean. Skewed distributions are one kind of non-normal distribution and the mean, median, and mode can tell you a lot about the direction and the degree of the skew. If you can imagine taking a normal distribution and stretching it to the right (in the direction of higher or more positive numbers), the mean (blue line) will be pulled away from the median (green line) and mode (red line). If the distribution is skewed positively (like the one below) the mean will be greater than the median and mode, if the distribution is negatively skewed, the mean will be less than the median and mode. Figure 3.2. Positively skewed distribution of 150 quiz scores. If a distribution is skewed, it usually means that there is some kind of firm boundary that the scores cannot go below or above. In the case of figure 3.2, the results represent 150 scores from a particularly difficult quiz. Half the students got scores at or below 6 but it is impossible for anyone to get a score lower than 0 so the scores are clustered at the low end of the scale. The opposite would happen for a very easy quiz. If you tell someone that your exam mean was 7.1 and that the mode was 4, they should understand that the distribution was positively skewed. The more data you present, the clearer the picture should be. Now consider how misleading it can be to report only one measure of central tendency. This happens all the time when people argue about things like the distribution of annual household incomes in the United States. Economists almost never use mean household income because the distribution is dramatically skewed in the positive direction doing so includes individuals who have annual incomes in the hundreds of millions or billions. Mean household income does not change dramatically from year to year. Median household income is a much better indicator of the “typical” American household since at least half of the US household are living at or above that level and this number does change dramatically with changes in the economy. A politician who wanted to exaggerate the wealth of the typical household would be inclined to report the mean, whereas an economist who wanted to understand how changes in the economy affect the typical household would follow the median. It will always be more complete and honest to report both. Figure 3.3. Frequency distribution histogram of U.S. household incomes in 2010. Multi-Modal Distributions The mode is the most common score in a distribution but, in more general terms, it represents any peak in a frequency distribution histogram. If the groups don’t contain consistently fewer and fewer members as scores get farther and farther from the mean, you could argue that you have a multi-modal distribution. Multi-modal distributions often indicate that your group actually has two subgroups. Let’s consider a distribution that is similar to the one you saw in Figure 3.2. Imagine that a class in Spanish Literature includes mostly students who learned Spanish in high school but also includes a handful of students who have been speaking Spanish all their lives. If the literature is read from the original Spanish, the native speakers might have better comprehension of the original material and will, therefore, perform better than the non-native speakers on a quiz. The bi-modal distribution below reflects the mixture of two distributions; the positively skewed distribution of the non-native speakers and the smaller normal distribution of the native speakers clustered at the high end of the scale. We see two modes because our scores include two distinct groups. Figure 3.4. Bi-modal distribution of Spanish Literature quiz scores from a class including native and nonnative speakers. Important notes: 1. A bi-modal distribution doesn’t necessarily have to have two modes of exactly the same size. Although one mode (4) is much larger than the other (18), each one is what statisticians would call a local peak or local maximum. 2. In this case, the mean (M=9.14) and median (7) are very poor indicators of a typical scores. In a dramatically bi-modal distribution, it is even possible that nobody in the group even has a score that matches the mean or median (Can you think of how this could happen?) Summary: Measures of central tendency are single numbers used to describe distributions. When describing a normal – or any other symmetrical - distribution, the mean is sufficient. However, for skewed and multi-modal distributions, it is helpful, and often ethical, to report the median and mode.