Central Tendency
CENTRAL TENDENCY
Mean, Median, and Mode
© aSup-2007
OVERVIEW
 The general purpose of descriptive statistical
methods is to organize and summarize a set
of scores
 Perhaps the most common method for
summarizing and describing a distribution is
to find a single value that defines the average
score and can serve as a representative for the
entire distribution
 In statistics, the concept of an average or
representative score is called central tendency
OVERVIEW
 The purpose of central tendency is to provide a
single summary figure that best describes the
central location of an entire distribution of
observations
 It also helps simplify the comparison of two or
more groups tested under different conditions
 Three measures are most commonly used in
education and the behavioral sciences: the mode,
the median, and the arithmetic mean
The MODE
 A common meaning of mode is ‘fashionable’,
and it has much the same implication in
statistics
 In an ungrouped distribution, the mode is the
score that occurs with the greatest frequency
 In grouped data, it is taken as the midpoint of
the class interval that contains the greatest
number of scores
 The symbol for the mode is Mo
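The two definitions of the mode can be sketched in a few lines of Python. This is only an illustration: the function names `modes` and `grouped_mode` and the sample data are assumptions made for the example, not part of the original slides.

```python
from collections import Counter

def modes(scores):
    """Return every score tied for the greatest frequency, in sorted order."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(x for x, f in counts.items() if f == top)

def grouped_mode(intervals):
    """Midpoint of the class interval that holds the greatest number of scores.

    `intervals` maps (lower, upper) score limits to frequencies.
    """
    (lower, upper), _ = max(intervals.items(), key=lambda item: item[1])
    return (lower + upper) / 2

# Illustrative data only
ungrouped_mo = modes([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
grouped_mo = grouped_mode({(0, 4): 3, (5, 9): 5, (10, 14): 7,
                           (15, 19): 4, (20, 24): 1})
print(ungrouped_mo, grouped_mo)
```

Returning a list rather than a single value is a design choice that leaves room for distributions with more than one mode.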
The MEDIAN
 The median of a distribution is the point along
the scale of possible scores below which 50%
of the scores fall; it is therefore another name for
P50, the 50th percentile
 Thus, the median is the value that divides the
distribution into halves
 Its symbol is Mdn
The ARITHMETIC MEAN
 The arithmetic mean is the sum of all the
scores in the distribution divided by the total
number of scores
 Many people call this measure the average,
but we will avoid this term because it is
sometimes used indiscriminately for any
measure of central tendency
 For brevity, the arithmetic mean is usually
called the mean
The ARITHMETIC MEAN
 Some symbolism is needed to express the mean
mathematically. We will use the capital letter X as a
collective term to specify a particular set of scores (be
sure to use capital letters; lower-case letters are used
in a different way)
 We identify an individual score in the distribution by
a subscript, such as X1 (the first score), X8 (the eighth
score), and so forth
 Recall that n stands for the number of scores in a
sample and N for the number of scores in a population
Properties of the Mode
 The mode is easy to obtain, but it is not very stable
from sample to sample
 Further, when quantitative data are grouped, the
mode may be strongly affected by the width and
location of the class intervals
 There may be more than one mode for a particular set
of scores. In a rectangular distribution the ultimate is
reached: every score shares the honor! For these
reasons, the mean or the median is often preferred
with numerical data
 However, the mode is the only measure that can be
used for data that have the character of a nominal
scale
Properties of the Median
Properties of the Mean
 Unlike the other measures of central tendency,
the mean is responsive to the exact position of
each score in the distribution
 Inspect the basic formula ΣX/n. Increasing or
decreasing the value of any score changes ΣX and
thus also changes the value of the mean
 The mean may be thought of as the balance point
of the distribution, to use a mechanical analogy.
There is an algebraic way of stating that the mean
is the balance point:
$$\sum (X - \bar{X}) = 0$$
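The balance-point property can be checked numerically. The short sketch below uses made-up scores (an assumption for illustration) chosen so that the mean is a whole number and the arithmetic stays exact.

```python
scores = [1, 2, 6, 6, 10]             # hypothetical scores; the sum is 25
mean = sum(scores) / len(scores)       # sum of X divided by n

# Deviation of each score from the mean
deviations = [x - mean for x in scores]

positive_total = sum(d for d in deviations if d > 0)
negative_total = sum(d for d in deviations if d < 0)

print(mean)                            # 5.0
print(deviations)                      # [-4.0, -3.0, 1.0, 1.0, 5.0]
print(positive_total, negative_total)  # 7.0 -7.0
print(sum(deviations))                 # 0.0
```

The positive deviations (7.0) and the negative deviations (-7.0) cancel, which is what the balance-point equation states.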
Properties of the Mean
 The sum of the negative deviations from the mean
exactly equals, in absolute value, the sum of the
positive deviations
 The mean is more sensitive to the presence (or
absence) of scores at the extremes of the
distribution than are the median or (ordinarily)
the mode
 When a measure of central tendency should
reflect the total of the scores, the mean is the
best choice because it is the only measure
based on this quantity
The MEAN of Ungrouped Data
 The mean (M), commonly known as the
arithmetic average, is computed by adding all
the scores in the distribution and dividing by
the number of scores or cases
$$M = \frac{\sum X}{N}$$
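As a minimal sketch, the formula translates directly into code. The helper name `arithmetic_mean` and the scores are assumptions for illustration; Python's standard `statistics.mean` is shown alongside as a cross-check.

```python
from statistics import mean as library_mean

def arithmetic_mean(scores):
    """M = (sum of X) / N for a non-empty list of scores."""
    return sum(scores) / len(scores)

scores = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]   # illustrative scores: sum 33, N = 10
m = arithmetic_mean(scores)
print(m)                      # 3.3
print(library_mean(scores))   # same value from the standard library
```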
The MEAN of Grouped Data
 When data come to us grouped, or
 when they are too lengthy for comfortable addition
without the aid of a calculating machine, or
 when we are going to group them for other
purposes anyway,
 we find it more convenient to apply another formula
for the mean:
$$M = \frac{\sum f X_c}{N}$$
where Xc is the midpoint of each class interval and f is
the frequency of that interval

X          Xc    f    f.Xc
20 - 24    22    1    22
15 - 19    17    4    68
10 - 14    12    7    84
5 - 9       7    5    35
0 - 4       2    3     6
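The table's arithmetic can be carried through in a short sketch. The intervals, midpoints, and frequencies are copied from the table; the variable names are assumptions made for the example.

```python
# (class interval, midpoint Xc, frequency f), as listed in the table
table = [
    ("20 - 24", 22, 1),
    ("15 - 19", 17, 4),
    ("10 - 14", 12, 7),
    ("5 - 9", 7, 5),
    ("0 - 4", 2, 3),
]

n = sum(f for _, _, f in table)                 # N = sum of f
sum_f_xc = sum(f * xc for _, xc, f in table)    # sum of f.Xc
grouped_mean = sum_f_xc / n

print(n, sum_f_xc, grouped_mean)                # 20 215 10.75
```

With these figures N = 20 and the sum of f.Xc is 215, so the grouped mean comes out at 10.75.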
The MEDIAN of Ungrouped Data
 Method 1: When N is an odd number
 list the scores in order (lowest to highest),
and the median is the middle score in the list
 Method 2: When N is an even number
 list the scores in order (lowest to highest),
and then locate the median by finding the
point halfway between the middle two scores
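Methods 1 and 2 can be sketched as a single function. The name `median_ungrouped` and the sample scores are assumptions for illustration only.

```python
def median_ungrouped(scores):
    """Median of a non-empty list of scores using Method 1 (odd N) or Method 2 (even N)."""
    ordered = sorted(scores)            # list the scores lowest to highest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]          # odd N: the middle score
    # even N: the point halfway between the middle two scores
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_case = median_ungrouped([3, 5, 8, 10, 11])       # N = 5
even_case = median_ungrouped([3, 3, 4, 5, 7, 8])     # N = 6
print(odd_case, even_case)                           # 8 4.5
```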
The MEDIAN of Ungrouped Data
 Method 3: When there are several scores with
the same value in the middle of the
distribution
 1, 2, 2, 3, 4, 4, 4, 4, 4, 5
 There are 10 scores (an even number), so you
normally would use method 2 and average the
middle pair to determine the median
 By this method, the median would be 4
[Figure: two frequency distribution graphs, each with the score X on the horizontal axis and the frequency f (0 to 5) on the vertical axis]
The MEDIAN of Grouped Data
 There are 10 scores (an even number), so you
normally would use method 2 and average the
middle pair to determine the median. By this method
the median would be 4
 In many ways, this is a perfectly legitimate value for
the median. However, when you look closely at the
distribution of scores, you probably get the clear
impression that X = 4 is not in the middle: only four
of the ten scores fall below 4, and only one falls above it
 The problem comes from the tendency to interpret
the score of 4 as meaning exactly 4.00 instead of
meaning an interval from 3.5 to 4.5
How to count the median?
$$\mathrm{Mdn} = X_{\mathrm{LRL}} + \frac{0.5N - f_{\mathrm{BELOW\ LRL}}}{f_{\mathrm{TIED}}}$$
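A worked sketch of this formula follows. It reads X LRL as the lower real limit of the tied score, f BELOW LRL as the number of scores below that limit, and f TIED as the number of scores tied at that value. These readings are an interpretation of the slide's labels, and the 0.5 offset assumes whole-number scores with an interval width of 1. The function name is hypothetical.

```python
def interpolated_median(scores):
    """Median by interpolation within the interval of the tied middle score.

    Assumes whole-number scores, so the real limits of a score X are X - 0.5 and X + 0.5.
    """
    ordered = sorted(scores)
    n = len(ordered)
    tied_value = ordered[(n - 1) // 2]            # score sitting at the middle of the list
    lower_real_limit = tied_value - 0.5           # X LRL
    f_below = sum(1 for x in ordered if x < tied_value)
    f_tied = ordered.count(tied_value)
    return lower_real_limit + (0.5 * n - f_below) / f_tied

mdn = interpolated_median([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
print(mdn)
```

For the ten scores 1, 2, 2, 3, 4, 4, 4, 4, 4, 5, the lower real limit is 3.5, four scores fall below it, and five are tied at 4, so Mdn = 3.5 + (5 - 4)/5 = 3.7 rather than 4.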
THE MODE
 The word MODE means the most common
observation among a group of scores
 In a frequency distribution, the mode is the
score or category that has the greatest
frequency
SELECTING A MEASURE OF CENTRAL TENDENCY
 How do you decide which measure of central
tendency to use? The answer depends on
several factors
 Note that the mean is usually the preferred
measure of central tendency. Because the mean
uses every score in the distribution, it
typically produces a good representative value
 The goal of central tendency is to find the
single value that best represents the entire
distribution
SELECTING A MEASURE OF CENTRAL TENDENCY
 Besides being a good representative, the mean
has the added advantage of being closely
related to variance and standard deviation, the
most common measures of variability
 This relationship makes the mean a valuable
measure for purposes of inferential statistics
 For these reasons, and others, the mean
generally is considered to be the best of the
three measures of central tendency
SELECTING A MEASURE OF CENTRAL TENDENCY
 But there are specific situations in which it is
impossible to compute a mean or in which the
mean is not particularly representative
 It is under these conditions that the mode and the
median are used
WHEN TO USE THE MEDIAN
1. Extreme scores or skewed distribution
When a distribution has one or a few extreme
scores, that is, scores that are very different in
value from most of the others, then the mean
may not be a good representative of the
majority of the distribution.
The problem comes from the fact that one or
two extreme values can have a large
influence and cause the mean to be displaced
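The displacement can be seen in a small sketch using hypothetical scores (invented for this example) and Python's standard `statistics` module.

```python
from statistics import mean, median

typical = [10, 11, 12, 13, 14]     # hypothetical scores
with_extreme = typical + [100]     # the same scores plus one hypothetical extreme score

for label, scores in [("typical", typical), ("with extreme", with_extreme)]:
    print(label, "mean =", round(mean(scores), 2), "median =", median(scores))
```

Adding the single extreme score moves the mean from 12 to about 26.67, a value larger than five of the six scores, while the median only shifts from 12 to 12.5.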
WHEN TO USE THE MEDIAN
2. Undetermined values
Occasionally, we will encounter a situation in
which an individual has an unknown or
undetermined score
Person   Time (min.)
1        8
2        11
3        12
4        13
5        17
6        Never finished
Notice that person 6 never
completed the puzzle. After one
hour, this person still showed no
sign of solving the puzzle, so the
experimenter stopped him or her
WHEN TO USE THE MEDIAN
2. Undetermined values
There are two important points to be noted:
 The experimenter should not throw out this
individual’s score. The whole purpose of using a
sample is to gain a picture of the population, and this
individual tells us that part of the population
cannot solve this puzzle
 This person should not be given a score of X = 60
minutes. Even though the experimenter stopped the
individual after 1 hour, the person did not finish the
puzzle. The score that is recorded is the amount of
time needed to finish. For this individual, we do not
know how long this is
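One way to sketch this situation in code is to record the unfinished time as infinity. This is an assumption made purely for illustration: it only places person 6 at the top of the ordered list without claiming any particular finishing time. The median is then still defined, while the mean is not usable.

```python
import math
from statistics import median

# Times in minutes for persons 1 to 6; math.inf stands in for "never finished"
times = [8, 11, 12, 13, 17, math.inf]

mdn = median(times)                     # halfway between the 3rd and 4th ordered scores
naive_mean = sum(times) / len(times)    # infinite, so not a usable summary

print(mdn)          # 12.5
print(naive_mean)   # inf
```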
WHEN TO USE THE MEDIAN
3. Open-ended distribution
A distribution is said to be open-ended when there
is no upper limit (or lower limit) for one of the
categories
Number of
children (X)   f
5 or more      3
4              2
3              2
2              3
1              6
0              4
Notice that it is impossible to
compute a mean for these data
because you cannot find ΣX
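Although ΣX cannot be found, the median can still be located from the cumulative frequencies. The sketch below uses the frequencies from the table; the helper `category_at` is a hypothetical name introduced for this example.

```python
from itertools import accumulate

categories = ["0", "1", "2", "3", "4", "5 or more"]   # ordered lowest to highest
frequencies = [4, 6, 3, 2, 2, 3]

n = sum(frequencies)                                   # 20 families
cumulative = list(accumulate(frequencies))             # [4, 10, 13, 15, 17, 20]

def category_at(position):
    """Category holding the score at a 1-based position in the ordered list."""
    for category, running_total in zip(categories, cumulative):
        if position <= running_total:
            return category
    raise ValueError("position exceeds the number of scores")

# N is even, so the median lies between the 10th and 11th ordered scores
middle_pair = (category_at(n // 2), category_at(n // 2 + 1))
print(n, middle_pair)                                  # 20 ('1', '2')
```

The 10th score is a 1 and the 11th is a 2, so the median is 1.5 children, and the open-ended "5 or more" category never enters the calculation.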
WHEN TO USE THE MEDIAN
4. Ordinal scale
When scores are measured on an ordinal scale,
the median is always appropriate and is
usually the preferred measure of central
tendency
WHEN TO USE THE MODE
 Nominal scales
Because nominal scales do not measure quantity, it is
impossible to compute a mean or a median for data
from a nominal scale
 Discrete variables, which consist of indivisible categories
 Describing shape
The mode identifies the location of the peak(s). If you
are told a set of exam scores has a mean of 72 and a
mode of 80, you should have a better picture of the
distribution than would be available from the mean alone
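For nominal data, finding the mode amounts to counting labels. The sketch below uses invented category labels (an assumption for illustration) and the standard `collections.Counter`.

```python
from collections import Counter

# Hypothetical nominal data: the categories are labels with no quantity or order
favorite_colors = ["red", "blue", "green", "blue", "red", "blue"]

counts = Counter(favorite_colors)
mode_label, mode_frequency = counts.most_common(1)[0]

print(mode_label, mode_frequency)   # blue 3
```

No sum or ordering of the labels is needed, which is why the mode remains available when the mean and the median are not.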
CENTRAL TENDENCY AND THE SHAPE
OF THE DISTRIBUTION
 Because the mean, the median, and the mode
are all trying to measure the same thing
(central tendency), it is reasonable to expect
that these three values should be related
 There are situations in which all three
measures have exactly the same value, and
others in which they differ
 The relationships among the mean, median,
and mode are determined by the shape of the
distribution
SYMMETRICAL DISTRIBUTION SHAPE
 For a symmetrical distribution, the right-hand
side will be a mirror image of the left-hand
side
 By definition, the mean and the median will be
exactly at the center because exactly half of the
area in the graph will be on either side of the
center
 Thus, for any symmetrical distribution, the
mean and the median will be the same
SYMMETRICAL DISTRIBUTION SHAPE
 If a symmetrical distribution has only one
mode, it will also be exactly in the center of the
distribution. All three measures of central
tendency will have the same value
 A bimodal distribution will have the mean and
the median together in the center with the
modes on each side
 A rectangular distribution has no mode
because all X values occur with the same
frequency. Still, the mean and the median will
be in the center and equivalent in value
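These relationships can be checked on two small symmetrical data sets. The scores and the helper `summarize` are assumptions invented for this sketch.

```python
from collections import Counter
from statistics import mean, median

def summarize(scores):
    """Return (mean, median, list of modes) for a list of scores."""
    counts = Counter(scores)
    top = max(counts.values())
    modes = sorted(x for x, f in counts.items() if f == top)
    return mean(scores), median(scores), modes

unimodal = [1, 2, 2, 3, 3, 3, 4, 4, 5]   # symmetrical, one peak at 3
bimodal = [1, 2, 2, 2, 3, 4, 4, 4, 5]    # symmetrical, peaks at 2 and 4

print(summarize(unimodal))   # mean 3, median 3, modes [3]
print(summarize(bimodal))    # mean 3, median 3, modes [2, 4]
```

In both cases the mean and the median sit together at the center (3). The single mode coincides with them in the first set, while the two modes fall on either side of the center in the second.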
[Figure: symmetrical distribution shape]
[Figure: positively skewed distribution]
[Figure: negatively skewed distribution]