CHAPTER THREE: Measures of Central Tendency
Sometimes a single, most representative number is used to depict a distribution for
summary purposes. But what is a most representative number? It is a measure of
CENTRAL TENDENCY, the place within a distribution where most members tend to
occur. This chapter discusses the three types of representative numbers, as well as the
computation and application of each.
MEASURES OF CENTRAL TENDENCY: There are three methods of
measuring the central tendency of a distribution.
Types of measures:
THE MODE (Mo): The most frequently occurring score in a distribution. It tends
to fall in the middle when a distribution is symmetrical and single-peaked, but there
is no guarantee of that, so the mode can be very misleading. In fact, when all the
members occur the same number of times, there is no mode. The mode is the only
measure of central tendency available when the data is nominal. When data is
represented in a frequency distribution, the mode is simply the score with the highest
frequency. This shortcut does not apply to a grouped frequency distribution, where
individual scores are no longer listed.
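The mode rules above can be sketched in a few lines of Python, assuming the scores arrive as a plain list. The function name `modes` is hypothetical. It returns every score tied for the highest frequency, so bimodal data stays visible, and an empty list when all distinct scores occur equally often (the "no mode" case). Because it only counts values, it also works on nominal data such as category labels.

```python
from collections import Counter

def modes(scores):
    """Return every score tied for the highest frequency (hypothetical helper).

    Returns an empty list when all distinct scores occur equally often,
    following the rule that such a distribution has no mode.
    """
    counts = Counter(scores)
    if not counts:
        return []
    top = max(counts.values())
    # More than one distinct score, all with the same count: no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(s for s, c in counts.items() if c == top)

print(modes([10, 10, 10, 10, 60]))    # [10]
print(modes([1, 2, 3]))               # []
print(modes(["red", "blue", "red"]))  # ['red']
```

Note that Python's standard `statistics.multimode` follows a different convention: when every value occurs equally often, it returns all of them rather than reporting no mode.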
THE MEDIAN (Md): The 50th percentile, or exact middle, of a distribution. The
median also falls near the center of the distribution. It is relatively insensitive to
extreme numbers, or OUTLIERS, so it may not be the true balance point of the
distribution. Still, it is not completely oblivious to outliers (each outlier still counts
as one score above or below the middle), and it is generally not as misleading as the
mode. The median requires at least ordinal data, because scores must be ranked to find
percentiles. When data is collected into a cumulative percentile frequency
distribution, the median is the score at the 50th percentile rank.
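The ranking step behind the median can be sketched as follows. This assumes the common convention of averaging the two middle scores when N is even; the chapter does not state a convention, and averaging is only meaningful for interval or ratio data, not purely ordinal labels. The name `median` is hypothetical here, though Python's standard `statistics.median` uses the same even-N convention.

```python
def median(scores):
    """Middle score of the ranked data; mean of the two middle scores when N is even."""
    ranked = sorted(scores)          # the median requires ranked (at least ordinal) data
    n = len(ranked)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median([10, 10, 10, 10, 60]))    # 10
print(median([10, 10, 10, 10, 6000]))  # 10: the outlier's size does not move the median
print(median([1, 2, 3, 4]))            # 2.5
```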
THE MEAN (Mn): The arithmetic average of a distribution, or mean, is the
balanced center of a distribution. Denoted by X̄ ("X-bar"), its formula is
$\bar{X} = \frac{\sum X}{N}$. The sum of the distances of all the scores from the mean
is always equal to zero. This is true even when only one score is greater than the mean
and all the other scores in the distribution are less than the mean. In that case, the one
larger score simply lies much farther from the mean. Consider the set X = 10, 10, 10, 10,
and 60. Since the mean is 20, each of the four small scores is ten points below the mean,
while the largest score is 40 points above it.
Additional columns can depict the distance of each score from X̄:

X           X − X̄ (distance from the mean)
60          60 − 20 =  40
10          10 − 20 = −10
10          10 − 20 = −10
10          10 − 20 = −10
10          10 − 20 = −10
∑X = 100    ∑(X − X̄) = 0
*Note: The sum of the distances of the scores from the mean is always zero.
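The mean formula and the zero-sum property can be checked with a short sketch. The helper names `mean` and `deviations` are hypothetical; the data is the chapter's example set.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by N."""
    return sum(scores) / len(scores)

def deviations(scores):
    """Distance of each score from the mean (X - X-bar)."""
    m = mean(scores)
    return [x - m for x in scores]

data = [60, 10, 10, 10, 10]
print(mean(data))             # 20.0
print(deviations(data))       # [40.0, -10.0, -10.0, -10.0, -10.0]
print(sum(deviations(data)))  # 0.0
```

With non-integer means, floating-point rounding can leave a tiny residue (on the order of 1e-15) instead of an exact zero, so comparisons are best made with a small tolerance.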
Furthermore, the mean lends itself to more complex calculations and requires a
continuous (interval or ratio) scale of data. Unfortunately, the mean is extremely
sensitive to outliers. That is what is happening in the example above: the mean is
supposed to measure the central tendency of the distribution, yet the mean is 20 and not
one of the scores is 20. When the distribution is symmetrical, or balanced in shape, the
mean falls where most of the members of the distribution are. Regardless of whether the
distribution is symmetrical or not, the sum of the squared distances from the mean is
never larger than the sum of the squared distances from the mode, the median, or any
other value. Interestingly, when a distribution is perfectly symmetrical and has a single
peak, all three measures of central tendency are equal to each other.
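The comparison of distances can be illustrated on the chapter's example set. This hypothetical sketch computes both squared and absolute distances from the mean (20) and from the median and mode (both 10). Squared distances are minimized by the mean, while absolute distances are minimized by the median, which is why the property has to be stated in terms of squared distances.

```python
def sum_sq(scores, center):
    """Sum of squared distances of the scores from a chosen center."""
    return sum((x - center) ** 2 for x in scores)

def sum_abs(scores, center):
    """Sum of absolute distances of the scores from a chosen center."""
    return sum(abs(x - center) for x in scores)

data = [60, 10, 10, 10, 10]
mean_value, median_value = 20, 10   # the mode is also 10 for this set

print(sum_sq(data, mean_value), sum_sq(data, median_value))    # 2000 2500
print(sum_abs(data, mean_value), sum_abs(data, median_value))  # 80 50
```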
Let's consider a cumulative percentile frequency distribution (CPDF). Notice that the sum
of X, ∑X, and the number of scores, N, seem to be computed differently. That is because a
frequency distribution lists each possible score only once, alongside how often it occurs,
so adding up the X column no longer reproduces the original data set unless every score
appears exactly once. By adding an Xf column, in which each possible score is multiplied by
its frequency in the f column, you can correct this problem. Note that N still equals ∑f,
but now ∑X = ∑Xf. Further, the median is the score at the 50th percentile, and the mode is
simply the score with the greatest number in the f column.
Xf     X     f     cf     %     c%
60     60    1     5      20    100
0      50    0     4      0     80
0      40    0     4      0     80
0      30    0     4      0     80
0      20    0     4      0     80
40     10    4     4      80    80
∑X = ∑Xf = 100
N = ∑f = 5
X̄ = ∑Xf / N = 100 / 5 = 20
Mo = 10, Md = 10.125 (interpolated within the real limits of the score 10, 9.5 to 10.5)
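The summary values for the frequency table can be reproduced with a short sketch. It assumes each score X occupies real limits from X − 0.5 to X + 0.5; that is the assumption that yields Md = 10.125 through the usual interpolation formula, median = lower real limit + ((N/2 − cf below) / f) × interval width. The names `freq`, `interpolated_median`, and `half_width` are hypothetical.

```python
# Frequency table from the chapter: score X -> frequency f
freq = {60: 1, 50: 0, 40: 0, 30: 0, 20: 0, 10: 4}

n = sum(freq.values())                        # N = sum of f
sum_xf = sum(x * f for x, f in freq.items())  # sum of X = sum of Xf
mean = sum_xf / n
mode = max(freq, key=freq.get)                # score with the largest f (first one on ties)

def interpolated_median(freq, half_width=0.5):
    """Median interpolated within the real limits of the median score.

    Assumes each score X spans X - half_width to X + half_width.
    """
    total = sum(freq.values())
    target = total / 2
    cf_below = 0
    for x in sorted(freq):                    # walk the scores from smallest to largest
        f = freq[x]
        if f and cf_below + f >= target:
            lower = x - half_width
            width = 2 * half_width
            return lower + (target - cf_below) / f * width
        cf_below += f
    raise ValueError("empty frequency table")

print(n, sum_xf, mean, mode, interpolated_median(freq))  # 5 100 20.0 10 10.125
```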
DISTRIBUTION SHAPE: The shape of a distribution is depicted with a frequency polygon.
A SKEW, or pull in one direction, disrupts the symmetry of that distribution.
Consider these shapes.
SYMMETRY: A distribution is described as symmetrical when the curve of the
polygon depicts a balanced image. A normal curve is balanced and mesokurtic,
and all its measures of central tendency are equal.
SKEW: A skew, or tail, in a distribution can be pulled toward either the smaller or the
larger scores.
NEGATIVE SKEW: When the majority of scores are large and the exceptions
are small, the distribution has a tail pointing toward the smaller scores.
POSITIVE SKEW: When most of the numbers in the set are relatively small
and the exceptions are relatively large, the tail points toward the
larger numbers.
SYMMETRY AND CENTRAL TENDENCY: The position of the measures of central
tendency, in relation to each other, can indicate the shape of the distribution.
When the curve is symmetrical: Mn = Md.
When the curve is normal: Mo = Mn = Md.
When the curve has -skew: Mo > Md > Mn.
When the curve has +skew: Mo < Md < Mn.
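Because the mean is pulled toward the tail, the ordering above suggests a simple rule of thumb: a mean above the median points to a positive skew, and a mean below it to a negative skew. The sketch below applies that rule using Python's standard `statistics` module; `skew_direction` is a hypothetical name, and equal mean and median is reported only as "no skew detected", since it does not by itself prove the curve is symmetrical.

```python
from statistics import mean, median

def skew_direction(scores):
    """Rule-of-thumb skew direction from the mean-versus-median comparison."""
    mn, md = mean(scores), median(scores)
    if mn > md:
        return "positive"   # mean pulled toward a tail of large scores
    if mn < md:
        return "negative"   # mean pulled toward a tail of small scores
    return "no skew detected"

print(skew_direction([10, 10, 10, 10, 60]))  # positive
print(skew_direction([90, 90, 90, 90, 40]))  # negative
```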
KURTOSIS: The degree of peakedness of the curve in a polygon. There are three types:
Platykurtic polygons are flat (A).
Mesokurtic polygons are of medium height (B).
Leptokurtic polygons are pointed (C).
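One common way to put a number on these shapes is moment-based excess kurtosis, the fourth central moment divided by the squared second central moment, minus 3, which is 0 for a normal curve. The sketch below uses that measure as an assumption (the chapter describes kurtosis only by shape), and the `tolerance` band that counts as "mesokurtic" is an arbitrary, hypothetical choice.

```python
def excess_kurtosis(scores):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal curve)."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n
    m4 = sum((x - m) ** 4 for x in scores) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all scores are equal")
    return m4 / m2 ** 2 - 3

def kurtosis_type(scores, tolerance=0.5):
    """Label a data set; the tolerance band for 'mesokurtic' is an assumption."""
    k = excess_kurtosis(scores)
    if k < -tolerance:
        return "platykurtic"   # flatter than a normal curve
    if k > tolerance:
        return "leptokurtic"   # more peaked than a normal curve
    return "mesokurtic"

print(kurtosis_type([1, 2, 3, 4, 5]))             # platykurtic
print(kurtosis_type([0] * 8 + [10, -10]))         # leptokurtic
```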