Measures of Central Tendency and Variability
Central Tendency
The central tendency of a distribution is, roughly speaking, its middle. It is often useful to be able to
describe the central tendency of a distribution in terms of a single value. For example, it might be useful to say that the mean score on a statistics exam was 80, or that the median home price in Pacific Palisades is $1,000,000. We refer to these values as measures of central tendency.
The Mean
The mean of a set of scores (abbreviated M) is probably the most common and useful measure of central
tendency. It is just the sum of the scores divided by the total number of scores.
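If you want to check this by computer, a couple of lines of Python will do it. The exam scores below are hypothetical, not part of the handout's own examples:

    scores = [85, 70, 90, 75, 80]        # hypothetical exam scores
    mean = sum(scores) / len(scores)     # 400 / 5 = 80.0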
The Median
The median of a set of scores is the “middle” score, when the scores are arranged from lowest to highest.
When there is an odd number of scores, there is one score that falls right in the middle, and it is the
median. When there is an even number of scores, there are two middle scores, in which case the median
is the value that is halfway between those two scores.
Consider the following set of scores: 3, 9, 2, 10, 6, 6, 4, 3, 6, 5. To find the median, start by arranging
them from lowest to highest: 2, 3, 3, 4, 5, 6, 6, 6, 9, 10. Then find the middle score or scores. Because
there is an even number of scores, there are two middle scores: 5 and 6. So the median is 5.50.
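The same steps can be written in a few lines of Python, using the scores above (just an illustration of the procedure):

    scores = [3, 9, 2, 10, 6, 6, 4, 3, 6, 5]
    ordered = sorted(scores)            # [2, 3, 3, 4, 5, 6, 6, 6, 9, 10]
    n = len(ordered)
    if n % 2 == 1:                      # odd number of scores: take the middle one
        median = ordered[n // 2]
    else:                               # even number: halfway between the two middle scores
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    # median is 5.5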
The Mean vs. the Median
For roughly symmetrical distributions, the mean and median are about the same. (For perfectly
symmetrical distributions, they are identical.) For skewed distributions, however, they can be quite
different. Positively skewed distributions have a few relatively high scores that tend to pull the mean in a
positive direction. Negatively skewed distributions have a few relatively low scores that tend to pull the
mean in a negative direction. These same extreme scores also pull the median in a positive or negative
direction, but to a much lesser extent.
So although the mean is more common and—as we will see—plays a much larger role in more advanced
statistical procedures, for highly skewed distributions the median is often the preferred measure of central
tendency. For example, house prices in different areas are usually described in terms of medians rather
than means because means would be strongly affected by the relatively few extremely expensive houses.
In practice, though, there is no reason that you cannot always compute and report both the mean and
median as measures of central tendency.
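Here is a small, made-up illustration of that point in Python: one extremely expensive house pulls the mean far upward but barely moves the median.

    prices = [300_000, 320_000, 350_000, 380_000, 4_000_000]   # hypothetical house prices
    mean_price = sum(prices) / len(prices)            # 1,070,000 -- pulled up by the outlier
    median_price = sorted(prices)[len(prices) // 2]   # 350,000 -- hardly affected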
The Mode
The mode of a set of scores is the most common score. The mode of the set of scores above is 6.
Unlike the mean and the median, the mode need not be unique: a set of scores can have more than one mode.
The mode is often a useful measure of central tendency to compute and report in conjunction with the
mean or median. For example, it might be interesting to know that the mean of a set of mood ratings was
7.30, and also that the mode—the most common rating—was 8. Note also that the mode is the only
measure of central tendency available to describe categorical variables. For example, we can say that
brown is the modal hair color of CSUF students, or psychology is the modal major of students in Psych
144. But there is no way to compute the mean hair color or the median major. (Interestingly, for
categorical variables, the mode is not really a measure of central tendency because the distribution of a
categorical variable has no real center.)
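One simple way to find the mode (or modes) by computer is to count how often each score occurs. A Python sketch using the scores from earlier:

    from collections import Counter

    scores = [3, 9, 2, 10, 6, 6, 4, 3, 6, 5]
    counts = Counter(scores)                 # how many times each score occurs
    highest = max(counts.values())
    modes = [score for score, c in counts.items() if c == highest]   # [6]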
“Averages”
The word “average” refers to the mean, median, mode, or any other measure of central tendency. It is a
generic term. For this reason, when you hear statements about “averages,” you should ask yourself,
“Which average?”
Another issue with the word “average” is that it has a completely non-statistical meaning—something like
“typical” or “ordinary.” So if you hear that “the average American eats 30 pounds of onions per year,” it is
unclear whether 30 pounds is really an average in the statistical sense, or just an amount that would not
be unusual for a “typical” American.
Variability
Another important feature of a frequency distribution (in addition to its overall shape and its central
tendency) is its variability. Are the scores all very similar to each other? Or are they quite different from
each other?
The Range
The simplest measure of variability is the range, which is the difference between the highest and the
lowest scores. In most applications, it is best to report the actual highest and lowest scores, as opposed
to just the range. For example, I might tell a class that their exams included a high score of 90 and a low
score of 50, which is more informative than just saying that the range was 40 (because that could also
mean 80 to 40, or 70 to 30, …).
The range is easy to understand and useful in a lot of situations, but it can be somewhat misleading when
there are outliers. For example, a set of exam scores might range from 90 to 20, making it seem as
though there was a high degree of variability. In fact, all the scores might have been between 80 and 90,
except for the one score of 20.
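In code, the range is just the highest score minus the lowest. A hypothetical example that also shows how a single outlier can inflate it:

    exam_scores = [85, 88, 90, 83, 87, 20]           # hypothetical scores with one outlier
    low, high = min(exam_scores), max(exam_scores)
    score_range = high - low                         # 90 - 20 = 70, even though most scores are 83-90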
The Standard Deviation
The standard deviation (abbreviated SD or s) is, roughly, the average amount by which the scores differ
from the mean. Consider the following scores: 45, 55, 50, 53, 47. Their mean is 50, and they differ from
the mean by 5, 5, 0, 3, and 3. So on average, they differ from the mean by a shade over 3. Now
consider another set of scores: 20, 80, 50, 38, 62. Their mean is also 50, but they differ from the mean
by 30, 30, 0, 12, and 12. So on average, they differ from the mean by about 17. These two sets of
scores have the same means, but they have quite different standard deviations.
Unfortunately, the standard deviation is not just the mean absolute difference between the scores and the
mean. It is just a bit more complicated. Here is how to compute it. 1) Find the mean. 2) Subtract the
mean from each score (or each score from the mean; it does not matter). 3) Square each of these
differences. 4) Find the mean of these squared differences. 5) Find the square root of this mean. Voila!
You have the standard deviation.
For now, when you compute a standard deviation, do it using a table like the one below.
X (Mood Ratings)     M       X – M    (X – M)²
       5            6.00      –1          1
       4            6.00      –2          4
       8            6.00      +2          4
       2            6.00      –4         16
       8            6.00      +2          4
       7            6.00      +1          1
       2            6.00      –4         16
       9            6.00      +3          9
       7            6.00      +1          1
       8            6.00      +2          4

ΣX = 60                                Σ(X – M)² = 60
M = 60 / 10 = 6.00                     SD² = 60 / 10 = 6.00
                                       SD = √6.00 = 2.45
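You can also check the five steps above by computer. A minimal Python sketch using the mood ratings from the table (the pvariance and pstdev functions use the same "divide by N" definition as the table):

    from statistics import pvariance, pstdev

    ratings = [5, 4, 8, 2, 8, 7, 2, 9, 7, 8]         # the mood ratings from the table
    m = sum(ratings) / len(ratings)                  # step 1: the mean, 6.00
    squared_diffs = [(x - m) ** 2 for x in ratings]  # steps 2 and 3
    variance = sum(squared_diffs) / len(ratings)     # step 4: 6.00
    sd = variance ** 0.5                             # step 5: about 2.45
    # pvariance(ratings) and pstdev(ratings) give the same results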
The Variance
The variance (abbreviated SD² or s²) is another measure of variability. It is just the mean of the squared
differences, before you take the square root to get the standard deviation. In other words, you actually
compute the variance on your way to computing the standard deviation. In the table above, the variance
is 6.00. (It is just a coincidence, though, that the variance is the same as the mean.)
The variance is generally not used as a descriptive measure of variability, at least in part because it is in
squared units. Consider five people’s heights in inches: 72, 65, 60, 74, 66. The variance is 25.44. But
25.44 what? It is obviously not inches, because 25.44 is way bigger than even the range. The answer is
that it is 25.44 square inches. But because we are not used to thinking in terms of square inches, this is
not a very meaningful measure of variability. So we take the square root of the variance to get the
standard deviation, which is 5.04 inches.
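A quick Python check of the heights example, using the same "divide by N" definitions as above:

    from statistics import pvariance, pstdev

    heights = [72, 65, 60, 74, 66]   # inches
    pvariance(heights)               # 25.44 (square inches)
    pstdev(heights)                  # about 5.04 (inches)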
The variance is useful, though, in doing inferential statistics, especially an important method called
analysis of variance (ANOVA).
More on the Standard Deviation
z scores
The z score of a particular score is the number of standard deviations it is above or below the mean. You
find the z score by subtracting the mean from the score in question and dividing by the standard
deviation. For example, if a distribution of IQ scores has a mean of 100 and a standard deviation of 15,
then a score of 115 has a z score of +1: (115 – 100) / 15 = +1. This just means that it is one standard
deviation above the mean. Similarly, a score of 90 has a z score of (90 – 100) / 15 = –0.67; it is two-thirds of a standard deviation below the mean.
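In code, a z score is just (score minus mean) divided by the standard deviation. For example, with the IQ figures above:

    def z_score(x, mean, sd):
        """Number of standard deviations that x falls above (+) or below (-) the mean."""
        return (x - mean) / sd

    z_score(115, 100, 15)   # +1.0
    z_score(90, 100, 15)    # about -0.67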
Relationship to the Normal Distribution
A normal distribution is a unimodal, symmetrical, “bell-shaped” distribution (although not every
symmetrical, bell-shaped distribution is normal), and many variables are normally distributed or nearly so.
This is important because we know that in a normal distribution, about 34% of cases score between the
mean and one standard deviation above the mean, and about 34% of cases score between the mean and
one standard deviation below the mean. So about two-thirds of cases fall within a standard deviation of
the mean. We can also say that about two-thirds of cases have z scores between –1 and +1. We also
know that about 14% of cases fall between one and two standard deviations above the mean, and about
14% of cases fall between one and two standard deviations below the mean. So about 96% of cases fall
within two standard deviations of the mean. In other words, the vast majority of cases have z scores
between –2 and +2.
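These percentages can be checked against the standard normal curve. Here is a small Python sketch using the erf-based formula for the normal cumulative distribution (the 34% and 14% figures above are rounded; the exact proportions come out to about 68% within one standard deviation and about 95% within two):

    from math import erf, sqrt

    def normal_cdf(z):
        """Proportion of a standard normal distribution falling below z."""
        return 0.5 * (1 + erf(z / sqrt(2)))

    within_one_sd = normal_cdf(1) - normal_cdf(-1)   # about 0.68
    within_two_sd = normal_cdf(2) - normal_cdf(-2)   # about 0.95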
Create a graph of a unimodal (normal-shaped) distribution and mark each of the six standard deviation levels: –3, –2, –1, +1, +2, +3.