Download Ch03

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Outline and Review of Chapter 3
Measures of Central Tendency
In the last chapter, we explored how creating grouped frequency distribution tables allows us to
condense our information; instead of trying to show each and every individual score, we can
condense groups to make our distributions easier to describe and visualize.
Measures of central tendency (Mean, Median, and Mode) serve a similar purpose. They are
single numbers that tell your reader something about the distribution without showing them the
distribution. They represent convenient ways to describe and compare data sets. Let’s return to
the normal distribution to consider the most common measure of central tendency, the Mean.
The Shape of a Normal Distribution
Normal distributions are normal; they are nice and symmetrical with the most common values
clustered around the mean and less common values falling far from the mean. Normal
distributions are easy to describe. If you know that a distribution is perfectly normal, there are
only two questions to ask; What is the mean? and What is the width?(The Standard Deviation
provides a way to report the width of a distribution and we will discuss it in the next chapter.)
This is why scientific reports often report the mean and standard deviation of a set of data. With
those two numbers, you can imagine the shape of any data set without needing to know the list of
all the individual scores.
Figure 3.1. Frequency distribution histograms of two sets of data with identical means (M=11)
but different widths, or standard deviations. The width of the distribution on the left is larger
than the width of the group on the right.
As stated, a set of data are often described with only the mean and standard deviation. There are
other measures of central tendency, but they are unnecessary when describing normal
distributions, and here’s why:
The Median is the score that divides a distribution in half. If you have an odd number of scores,
it will be the one precisely in the middle, if you have an even number of scores, it is the average
of the middle two. (Calculating the true median is actually a little more complicated than that,
but this definition is acceptable in most cases). Look at the distributions in Figure 3.1 and see if
you can determine the median. For this and any perfectly symmetrical distribution, the median
will equal the mean. It is not necessary to report the median for a normal distribution.
The Mode is the most common score in a distribution and is usually easy to spot in a frequency
distribution histogram; it will always be the tallest bar in the distribution. For a perfectly normal
distribution, the mean will be the most common score and therefore equal the mode. It is not
necessary to report the mode of a normal distribution.
Skewed Distributions
You may be asking yourself why we bother with these other measures of central tendency. The
simple answer is that not all distributions are normal. It is appropriate to report the median and
mode of a distribution when they are not the same as the mean. Skewed distributions are one
kind of non-normal distribution and the mean, median, and mode can tell you a lot about the
direction and the degree of the skew. If you can imagine taking a normal distribution and
stretching it to the right (in the direction of higher or more positive numbers), the mean (blue
line) will be pulled away from the median (green line) and mode (red line). If the distribution is
skewed positively (like the one below) the mean will be greater than the median and mode, if the
distribution is negatively skewed, the mean will be less than the median and mode.
Figure 3.2. Positively skewed distribution of 150 quiz scores.
If a distribution is skewed, it usually means that there is some kind of firm boundary that the
scores cannot go below or above. In the case of figure 3.2, the results represent 150 scores from
a particularly difficult quiz. Half the students got scores at or below 6 but it is impossible for
anyone to get a score lower than 0 so the scores are clustered at the low end of the scale. The
opposite would happen for a very easy quiz. If you tell someone that your exam mean was 7.1
and that the mode was 4, they should understand that the distribution was positively skewed.
The more data you present, the clearer the picture should be.
Now consider how misleading it can be to report only one measure of central tendency. This
happens all the time when people argue about things like the distribution of annual household
incomes in the United States. Economists almost never use mean household income because the
distribution is dramatically skewed in the positive direction doing so includes individuals who
have annual incomes in the hundreds of millions or billions. Mean household income does not
change dramatically from year to year. Median household income is a much better indicator of
the “typical” American household since at least half of the US household are living at or above
that level and this number does change dramatically with changes in the economy. A politician
who wanted to exaggerate the wealth of the typical household would be inclined to report the
mean, whereas an economist who wanted to understand how changes in the economy affect the
typical household would follow the median. It will always be more complete and honest to
report both.
Figure 3.3. Frequency distribution histogram of U.S. household incomes in 2010.
Multi-Modal Distributions
The mode is the most common score in a distribution but, in more general terms, it represents
any peak in a frequency distribution histogram. If the groups don’t contain consistently fewer
and fewer members as scores get farther and farther from the mean, you could argue that you
have a multi-modal distribution.
Multi-modal distributions often indicate that your group actually has two subgroups. Let’s
consider a distribution that is similar to the one you saw in Figure 3.2. Imagine that a class in
Spanish Literature includes mostly students who learned Spanish in high school but also includes
a handful of students who have been speaking Spanish all their lives. If the literature is read from
the original Spanish, the native speakers might have better comprehension of the original
material and will, therefore, perform better than the non-native speakers on a quiz. The bi-modal
distribution below reflects the mixture of two distributions; the positively skewed distribution of
the non-native speakers and the smaller normal distribution of the native speakers clustered at the
high end of the scale. We see two modes because our scores include two distinct groups.
Figure 3.4. Bi-modal distribution of Spanish Literature quiz scores from a class including native
and nonnative speakers.
Important notes:
1. A bi-modal distribution doesn’t necessarily have to have two modes of exactly the same
size. Although one mode (4) is much larger than the other (18), each one is what
statisticians would call a local peak or local maximum.
2. In this case, the mean (M=9.14) and median (7) are very poor indicators of a typical
scores. In a dramatically bi-modal distribution, it is even possible that nobody in the
group even has a score that matches the mean or median (Can you think of how this
could happen?)
Summary: Measures of central tendency are single numbers used to describe distributions.
When describing a normal – or any other symmetrical - distribution, the mean is sufficient.
However, for skewed and multi-modal distributions, it is helpful, and often ethical, to report the
median and mode.