Measures of Central Tendency
Levin and Fox
Elementary Statistics In Social Research
Chapter 3
1
Measures of central tendency:
Measures of central tendency are numbers that describe what is average or
typical in a distribution
We will focus on three measures of central tendency:
– The Mode
– The Median
– The Mean (average)
Our choice of an appropriate measure of central tendency depends on three
factors: (a) the level of measurement, (b) the shape of the distribution,
(c) the purpose of the research.
2
The Mode
The Mode:
The mode is the most frequent, most typical or most common value or category
in a distribution.
Example: There are more Protestants in the US than people of any other
religion.
The mode is always a category or score, not a frequency. The mode is the only
measure of central tendency available to nominal-level variables.
The mode is not necessarily the category with the majority (that is, 50% or more)
of cases. It is simply the category in which the largest number (or proportion)
of cases falls.
3
Let’s Practice!
Look at the figure below and identify the mode.
[Figure: pie chart of respondents' self-rated health]
4
A Review of Mode
The pie chart shows answers of 1998 GSS respondents to the question,
“Would you say your own health, in general, is excellent, good, fair, or poor?”
Note that the highest percentage (49%) of respondents is associated with the
answer “good.”
The answer “good” is the mode.
Remember: The mode is used to describe nominal variables!
5
A Review of Mode
Another Mode Example:
Our question is the following:
“What is the most common foreign language spoken in the United States today, as
determined by the mode?”
To answer this question, let’s look at a list of the ten most commonly spoken
foreign languages in the United States and the number of people who speak
each foreign language:
6
Ten Most Common Foreign Languages Spoken in the United States, 1990

Language      Number of Speakers
Spanish       17,339,000
French         1,702,000
German         1,547,000
Italian        1,309,000
Chinese        1,249,000
Tagalog          843,000
Polish           723,000
Korean           626,000
Vietnamese       507,000
Portuguese       430,000

Source: U.S. Bureau of the Census, Statistical Abstract of the United
States, 2000, Table 51.
7
A Review of Mode
Is the mode 17,339,000?
NO!
Recall: The mode is the category or score, not the frequency!!
Thus, the mode is Spanish.
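The point that the mode is the category rather than its frequency can be sketched in a few lines of Python. This is only an illustration: the variable names are assumptions, and the counts are copied from the language table above.

```python
# Speaker counts copied from the 1990 language table above.
speakers = {
    "Spanish": 17_339_000,
    "French": 1_702_000,
    "German": 1_547_000,
    "Italian": 1_309_000,
    "Chinese": 1_249_000,
    "Tagalog": 843_000,
    "Polish": 723_000,
    "Korean": 626_000,
    "Vietnamese": 507_000,
    "Portuguese": 430_000,
}

# The mode is the category (dictionary key) with the largest frequency (value).
mode_category = max(speakers, key=speakers.get)
mode_frequency = speakers[mode_category]

print(mode_category)   # Spanish
print(mode_frequency)  # 17339000 (the frequency, which is NOT the mode)
```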
8
The Mode
Some additional points to consider about modes:
Some distributions have two modes where two response categories have the
highest frequencies.
Such distributions are said to be bimodal.
NOTE: When the two highest frequencies are quite close, but not identical,
the distribution is still “essentially” bimodal. In these instances, report
both the “true” mode and the other category with nearly the same frequency.
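A minimal sketch of detecting a bimodal distribution, assuming Python 3.8 or later for `statistics.multimode`. The scores are hypothetical, invented only for illustration. Note that `multimode` reports exact ties only; deciding that two nearly equal frequencies are "essentially" bimodal remains a judgment call for the researcher.

```python
from statistics import multimode

# Hypothetical scores (not from the slides): 2 and 5 each occur three times.
scores = [1, 2, 2, 2, 3, 4, 5, 5, 5, 6]

# multimode returns every value tied for the highest frequency,
# in the order each value first appears in the data.
modes = multimode(scores)

print(modes)            # [2, 5]
print(len(modes) == 2)  # True, so the distribution is bimodal
```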
9
Example of a Bimodal Frequency Distribution
10
The Median
The Median:
The median is the score that divides the distribution into two equal parts so
that half of the cases are above it and half are below it.
The median can be calculated for both ordinal and interval levels of
measurement, but not for nominal data.
It must be emphasized that the median is the exact middle of a
distribution.
So, now let’s look at ways we can find the median in sorted data:
11
In some cases, we can find the median by simple inspection.
Let’s look at the responses (A) to the question: “Think about the economy, how
would you rate economic conditions in the country today?”

A
Response     Respondent
Poor         Jim
Good         Sue
Only Fair    Bob
Poor         Jorge
Excellent    Karen
Total (N)    5

First, we arrange the responses (B) in order from lowest to highest (or highest
to lowest).
Since we have an odd number of cases, let’s find the middle case.

B
Response     Respondent
Poor         Jim
Poor         Jorge
Only Fair    Bob
Good         Sue
Excellent    Karen
Total (N)    5
12
Calculating the median:
Respondent   Response
Jim          Poor
Jorge        Poor
Bob          Only Fair
Sue          Good
Karen        Excellent

We can find the median through visual inspection or through calculation.
When N is odd, we can find the middle case by adding 1 to N and dividing
by 2: (N + 1) ÷ 2.
Since N is 5, you calculate (5 + 1) ÷ 2 = 3.
The middle case is thus the third case (Bob), so the median response is
“Only Fair.”
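The (N + 1) ÷ 2 rule for ordinal data can be sketched as follows. The `ORDER` list and the variable names are assumptions introduced for illustration; the responses come from the slide.

```python
# Ordinal ranking of the response categories, lowest to highest.
ORDER = ["Poor", "Only Fair", "Good", "Excellent"]

# Responses in their original (unsorted) order.
responses = {
    "Jim": "Poor",
    "Sue": "Good",
    "Bob": "Only Fair",
    "Jorge": "Poor",
    "Karen": "Excellent",
}

# Sort respondents by the rank of their response (ties keep their original order).
ranked = sorted(responses.items(), key=lambda item: ORDER.index(item[1]))

n = len(ranked)
middle_position = (n + 1) // 2  # (5 + 1) / 2 = 3; exact only because n is odd
name, median_response = ranked[middle_position - 1]  # convert to a 0-based index

print(middle_position, name, median_response)  # 3 Bob Only Fair
```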
13
Calculating the median:
Another example:
The following is a list of the number of hate crimes reported in the nine
largest U.S. states for 1997.
State            Number
California       1831
Florida          93
Virginia         105
New Jersey       694
New York         853
Ohio             265
Pennsylvania     168
Texas            333
North Carolina   42
TOTAL            N = 9
14
Calculating the median:
Finding the Median State for Hate Crimes
1. Order the cases from lowest to highest.
2. In this situation, we need the 5th case: (9 + 1) ÷ 2 = 5, which is Ohio.
Remember: (N + 1) ÷ 2.

State            Number
North Carolina   42
Florida          93
Virginia         105
Pennsylvania     168
Ohio             265
Texas            333
New Jersey       694
New York         853
California       1831
N = 9
15
Finding the Median State for Hate Crimes out of Eight States
1. Order the cases from lowest to highest.
2. For an even number of cases, there will be two middle cases.
3. In this instance, the median falls halfway between both cases.
4. However, the circumstances being explained should determine if you use the
two middle cases or the point halfway between both cases for your explanation.
5. The median is always that point above which 50% of cases fall and below which
50% of cases fall.

State            Number
North Carolina   42
Florida          93
Virginia         105
Pennsylvania     168
Ohio             265
Texas            333
New Jersey       694
New York         853
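Both hate-crime examples can be checked with Python's standard `statistics.median`, which returns the middle case when N is odd and the point halfway between the two middle cases when N is even. This is a sketch: the variable names are assumptions, and the eight-state list is the nine-state list with California omitted, as in the table above.

```python
from statistics import median

# Reported hate crimes, 1997, copied from the slides.
hate_crimes = {
    "California": 1831, "Florida": 93, "Virginia": 105,
    "New Jersey": 694, "New York": 853, "Ohio": 265,
    "Pennsylvania": 168, "Texas": 333, "North Carolina": 42,
}

# Odd N (nine states): the median is the 5th ordered case.
nine = sorted(hate_crimes.values())
median_nine = median(nine)

# Even N (eight states, California omitted): halfway between the
# 4th and 5th ordered cases, (168 + 265) / 2.
eight = sorted(v for state, v in hate_crimes.items() if state != "California")
median_eight = median(eight)

print(median_nine)   # 265
print(median_eight)  # 216.5
```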
16
The median in frequency distributions:
So now, let’s find the median in frequency distributions:
Often the data are arranged in frequency distributions.
The procedure is a bit more involved:
– We have to find the category associated with the observation located in
the middle of the distribution.
– To do this, we construct a cumulative percentage distribution.
So, let’s take a look at a frequency distribution…
17
Table: Political Views of GSS Respondents, 1988
Political Views          Frequency (f)   Cf      Percentage   C%
Extremely Liberal        32              32      2.4          2.4
Liberal                  175             207     12.9         15.3
Slightly Liberal         189             396     13.9         29.2
Moderate                 502             898     37.0         66.2
Slightly Conservative    211             1109    15.6         81.8
Conservative             203             1312    15.0         96.8
Extremely Conservative   44              1356    3.2          100.00
Total                    1356                    100.00
18
Cumulative Percentage Distribution:
We construct a cumulative percentage distribution to help locate the middle of
the distribution.
The observation located in the middle of the distribution is the one that has the
cumulative percentage value equal to 50%.
– Notice that 29.2% of the observations are accumulated below the
category of “moderate” and that 66.2% are accumulated up to and
including the category “moderate.”
The median is the value of the category associated with this observation.
This middle observation falls within the category “moderate,” so the median for
this distribution is “moderate.”
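The cumulative-percentage procedure can be sketched as a short loop that stops at the first category whose cumulative percentage reaches 50%. The function name `median_category` is hypothetical; the frequencies are those of the political views table.

```python
# (category, frequency) pairs in ordinal order, from the political views table.
categories = [
    ("Extremely Liberal", 32),
    ("Liberal", 175),
    ("Slightly Liberal", 189),
    ("Moderate", 502),
    ("Slightly Conservative", 211),
    ("Conservative", 203),
    ("Extremely Conservative", 44),
]

def median_category(rows):
    """Return the first category whose cumulative percentage reaches 50%."""
    total = sum(freq for _, freq in rows)
    cumulative = 0
    for name, freq in rows:
        cumulative += freq                   # cumulative frequency (Cf)
        cum_pct = 100 * cumulative / total   # cumulative percentage (C%)
        if cum_pct >= 50:
            return name, round(cum_pct, 1)
    raise ValueError("rows must contain at least one case")

name, cum_pct = median_category(categories)
print(name, cum_pct)  # Moderate 66.2
```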
19
Table: Political Views of GSS Respondents, 1988
Political Views          Frequency (f)   Cf      Percentage   C%
Extremely Liberal        32              32      2.4          2.4
Liberal                  175             207     12.9         15.3
Slightly Liberal         189             396     13.9         29.2
Moderate                 502             898     37.0         66.2
Slightly Conservative    211             1109    15.6         81.8
Conservative             203             1312    15.0         96.8
Extremely Conservative   44              1356    3.2          100.00
Total                    1356                    100.00

Cumulative percentage range for “Moderate”: 29.2-66.2
20
The Mean
The Mean:
The mean is what most people call the average. To find the mean of any distribution,
simply add up all the scores and divide by the total number of scores.
Here is the formula for calculating the mean:
$$ \bar{X} = \frac{\sum X}{N} $$
where $\bar{X}$ = mean (read as X bar)
$\sum$ = sum (expressed as the Greek letter sigma)
$X$ = raw score in a set of scores
$N$ = total number of scores in a set
21
Finding the Mean
Communicable Diseases -> Tuberculosis (as of 22 March 2007) ->
Case detection rate (MDG indicator 24) -> DOTS all new case
detection rate (%) -> Total
(Periodicity: Year, Applied Time Period: from 2005 to 2005)

Country                                    2005
Bangladesh                                 37
Bhutan                                     44
Democratic People's Republic of Korea      103
India                                      58
Indonesia                                  47
Maldives                                   76
Myanmar                                    119
Nepal                                      64
Sri Lanka                                  71
Thailand                                   61
Timor-Leste                                71

© World Health Organization, 2008. All rights reserved
22
Finding the Mean
Finding the Mean:
To find the mean tuberculosis case detection rate reported by the WHO for this
region in 2005,
– Add up the rates for all of the countries in the region and
– Divide the sum by the total number of countries.
$$ \bar{X} = \frac{\sum X}{N} $$
Thus, the mean rate is (751 ÷ 11) = 68.273.
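The arithmetic can be reproduced with a minimal Python sketch. The variable names are assumptions; the rates are the 2005 values from the WHO table.

```python
# 2005 DOTS case detection rates (%), copied from the WHO table.
rates = {
    "Bangladesh": 37, "Bhutan": 44,
    "Democratic People's Republic of Korea": 103,
    "India": 58, "Indonesia": 47, "Maldives": 76, "Myanmar": 119,
    "Nepal": 64, "Sri Lanka": 71, "Thailand": 61, "Timor-Leste": 71,
}

total = sum(rates.values())  # sum of all scores
n = len(rates)               # number of scores (countries)
mean_rate = total / n        # mean = sum of scores / N

print(total, n, round(mean_rate, 3))  # 751 11 68.273
```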

23
Using a formula to calculate the mean:
The Usefulness of Formulas:
The mean introduces the usefulness of a formula, which may be defined as a
shorthand way to explain what operations we need to follow to
obtain a certain result.
Again, the formula that defines the mean is:
$$ \bar{X} = \frac{\sum X}{N} $$
where $\bar{X}$ = mean (read as X bar)
$\sum$ = sum (expressed as the Greek letter sigma)
$X$ = raw score in a set of scores
$N$ = total number of scores in a set
24
Deviation:
The deviation indicates the distance and direction of any raw score from the
mean.
To find the deviation of a particular score, we simply subtract the mean from
the score:
$$ \text{Deviation} = X - \bar{X} $$
where $X$ = any raw score in the distribution
$\bar{X}$ = mean of the distribution
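A small sketch of computing deviations. The scores are hypothetical, chosen so the mean is a whole number. The sign of each deviation gives the direction from the mean and its size gives the distance; as a general property of the mean, the deviations sum to zero.

```python
# Hypothetical raw scores (not from the slides).
scores = [1, 2, 3, 4, 10]

mean = sum(scores) / len(scores)  # 20 / 5 = 4.0

# Deviation = X - mean: negative values fall below the mean, positive above.
deviations = [x - mean for x in scores]

print(deviations)       # [-3.0, -2.0, -1.0, 0.0, 6.0]
print(sum(deviations))  # 0.0
```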

25
The Weighted Mean
When groups differ in size, you can’t just sum their means and divide by the
number of groups. Instead, you must weight each group mean by its size:
$$ \bar{X}_w = \frac{\sum N_{group} \bar{X}_{group}}{N_{total}} $$
where $\bar{X}_{group}$ = mean of a particular group
$N_{group}$ = number in a particular group
$N_{total}$ = number in all groups combined
$\bar{X}_w$ = weighted mean
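To see why the weighting matters, here is a sketch with two hypothetical groups of unequal size. The group means and sizes are invented for illustration and do not come from the slides.

```python
# Hypothetical (group mean, group size) pairs.
groups = [(80.0, 10), (90.0, 30)]

n_total = sum(n for _, n in groups)

# Weighted mean: sum of (N_group * mean_group), divided by N_total.
weighted_mean = sum(n * m for m, n in groups) / n_total

# Naive approach the slide warns against: averaging the group means directly.
unweighted_mean = sum(m for m, _ in groups) / len(groups)

print(weighted_mean)    # 87.5
print(unweighted_mean)  # 85.0
```

The larger group pulls the weighted mean toward its own mean of 90, which the unweighted figure ignores.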
26
Time to practice!
Reasons Why Homeowners get a Home Equity Line of Credit.
Consolidate debts: 26
Invest in other real estate: 3
Home improvements/repairs: 45
Other purposes: 9
Purchase auto: 9
Pay for education or medical: 4
27
So what do you do? And then?
We want to know the mode (Mo), the median (Mdn), and the mean ($\bar{X}$).
First, let’s arrange the scores from highest to lowest.

Reason                         Score
Home improvements/repairs      45
Consolidate debts              26
Other purposes                 9
Purchase auto                  9
Pay for education or medical   4
Invest in other real estate    3
Total                          96
28
What’s the most frequent case (Mo)?
– Home improvements/repairs, the category with the largest value (45).
What is the middlemost score (Mdn)?
– 9, because (N + 1) ÷ 2, or (6 + 1) ÷ 2 = 3.5, places the median halfway
between the third and fourth scores, which are both 9.
What is the mean ($\bar{X}$)?
– 16, because the sum of the scores is 96 and we divide this by 6 to get 16.

Reason                         Score
Home improvements/repairs      45
Consolidate debts              26
Other purposes                 9
Purchase auto                  9
Pay for education or medical   4
Invest in other real estate    3
Total (N = 6)                  96
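The practice answers can be checked with a short sketch using Python's standard `statistics` module. The variable names are assumptions, and the modal category is taken here as the reason with the largest value.

```python
from statistics import mean, median

# Values copied from the home equity line of credit slide.
reasons = {
    "Consolidate debts": 26,
    "Invest in other real estate": 3,
    "Home improvements/repairs": 45,
    "Other purposes": 9,
    "Purchase auto": 9,
    "Pay for education or medical": 4,
}

# Modal category: the reason with the largest value.
modal_category = max(reasons, key=reasons.get)

# Median and mean of the six values, ordered from highest to lowest.
scores = sorted(reasons.values(), reverse=True)  # [45, 26, 9, 9, 4, 3]
mdn = median(scores)  # halfway between the 3rd and 4th scores
avg = mean(scores)    # 96 / 6

print(modal_category)  # Home improvements/repairs
print(mdn, avg)        # 9.0 16
```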

29
So what does this tell us?
The mode is the peak of the curve.
The mean is found closest to the tail, where the relatively few extreme cases
will be found.
The median is found between the mode and mean or is aligned with them in
a normal distribution.
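As a rough check on how extreme cases pull the mean toward the tail, the sketch below reuses the nine-state hate-crime counts from earlier in the deck. One very large value (California) drags the mean well above the median. The variable names are assumptions.

```python
from statistics import mean, median

# Nine-state hate-crime counts (1997) from the earlier median example.
counts = [42, 93, 105, 168, 265, 333, 694, 853, 1831]

avg = mean(counts)    # 4384 / 9, pulled upward by the extreme case (1831)
mdn = median(counts)  # the 5th ordered case

print(round(avg, 1), mdn)  # 487.1 265
print(avg > mdn)           # True: the mean sits closer to the long right tail
```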
30
Did you know?
The shape or form of a distribution can influence the researcher’s choice of a
measure of central tendency.
Why is that? Well, let’s see…
31
Chapter Three: Review
32
Review: The Mode
The Mode:
The mode is the category with the largest frequency (or percentage) in the
distribution.
The mode is always a category or score, not a frequency.
The mode is not necessarily the category with the majority (that is, 50% or more)
of cases.
It is simply the category in which the largest number (or proportion) of cases falls.
33
Review: The Median
The Median:
The median is the score that divides the distribution into two equal parts so
that half of the cases are above it and half are below it.
The median can be calculated for both ordinal and interval levels of
measurement, but not for nominal data.
It must be emphasized that the median is the exact middle of a
distribution.
34
Review: The median:
Respondent   Response
Jim          Poor
Jorge        Poor
Bob          Only Fair
Sue          Good
Karen        Excellent

Calculating the median:
We can find the median through visual inspection or through calculation.
When N is odd, we can find the middle case by adding 1 to N and dividing
by 2: (N + 1) ÷ 2.
Since N is 5, you calculate (5 + 1) ÷ 2 = 3.
The middle case is thus the third case (Bob), so the median response is
“Only Fair.”
35
Review: The Mean
The Mean:
The mean is what most people call the average. To find the mean of any distribution,
simply add up all the scores and divide by the total number of scores.
Here is the formula for calculating the mean:
$$ \bar{X} = \frac{\sum X}{N} $$
where $\bar{X}$ = mean (read as X bar)
$\sum$ = sum (expressed as the Greek letter sigma)
$X$ = raw score in a set of scores
$N$ = total number of scores in a set
36
Review: Measures of Central Tendency
Reasons Why Homeowners get a Home Equity Line of Credit.
Consolidate debts: 26
Invest in other real estate: 3
Home improvements/repairs: 45
Other purposes: 9
Purchase auto: 9
Pay for education or medical: 4
37
Review: Measures of Central Tendency
We want to know the mode (Mo), the median (Mdn), and the mean ($\bar{X}$).
First, let’s arrange the scores from highest to lowest.

Reason                         Score
Home improvements/repairs      45
Consolidate debts              26
Other purposes                 9
Purchase auto                  9
Pay for education or medical   4
Invest in other real estate    3
Total                          96
38
What’s the most frequent case (Mo)?
– Counting the six values as scores, 9 is the most frequent: Other purposes
and Purchase auto both have the score of 9.
What is the middlemost score (Mdn)?
– 9, because the two middle scores are both 9: 9 + 9 = 18, and if we divide
18 by 2, we get 9.
What is the mean ($\bar{X}$)?
– 16, because the sum of the scores is 96 and we divide this by 6 to get 16.

Reason                         Score
Home improvements/repairs      45
Consolidate debts              26
Other purposes                 9
Purchase auto                  9
Pay for education or medical   4
Invest in other real estate    3
Total (N = 6)                  96
39
Review: Shape of the Distribution
Choosing a Measure of Central Tendency
The shape or form of a distribution can influence the researcher’s choice of a
measure of central tendency.
40
Review: Shape of the Distribution
The mode is the peak of the curve.
The mean is found closest to the tail, where the relatively few extreme cases
will be found.
The median is found between the mode and mean or is aligned with them in
a normal (symmetrical, unimodal) distribution.
41