Exam 2: Review
G 201
Statistics for Political Science
Exam 2: Review Topics
Chapter 3: Central Tendency
1. Mode, Median, Mean (Definition, Formula for each)
2. Skewed Distribution
3. Symmetrical Distributions
Chapter 4: Variability
1. Range (Definition, Formula)
2. Deviation (Definition, Formula)
3. Variance (Definition, Formula)
4. Standard Deviation (Definition, Formula)
Measures of central tendency:
Measures of central tendency are numbers that describe what is average or
typical in a distribution.
We will focus on three measures of central tendency:
– The Mode
– The Median
– The Mean (average)
Our choice of an appropriate measure of central tendency depends on three
factors: (a) the level of measurement, (b) the shape of the distribution,
and (c) the purpose of the research.
The Mode
The mode is the most frequent, most typical, or most common value or category
in a distribution.
Example: There are more Protestants in the US than people of any other
religion, so “Protestant” is the modal religious category.
The mode is always a category or score, not a frequency.
The mode is not necessarily the category with the majority (that is, 50% or more)
of cases. It is simply the category in which the largest number (or proportion)
of cases falls.
Ten Most Common Foreign Languages Spoken in the United States, 1990
Language      Number of Speakers
Spanish       17,339,000
French         1,702,000
German         1,547,000
Italian        1,309,000
Chinese        1,249,000
Tagalog          843,000
Polish           723,000
Korean           626,000
Vietnamese       507,000
Portuguese       430,000
Source: U.S. Bureau of the Census, Statistical Abstract of the United
States, 2000, Table 51.
A Review of Mode
Is the mode 17,339,000?
NO!
Recall: The mode is the category or score, not the frequency!
17,339,000 is the number of Spanish speakers (the frequency), so the mode is Spanish.
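Because the mode is a category and not a frequency, one way to make the distinction concrete is to store the table as a mapping from category to count and ask for the key with the largest count. This is a minimal Python sketch that assumes the table above has been typed in as a dictionary. The name `speakers` is only an illustrative choice.

```python
# Number of speakers by language, from the 1990 Census table above.
speakers = {
    "Spanish": 17_339_000,
    "French": 1_702_000,
    "German": 1_547_000,
    "Italian": 1_309_000,
    "Chinese": 1_249_000,
    "Tagalog": 843_000,
    "Polish": 723_000,
    "Korean": 626_000,
    "Vietnamese": 507_000,
    "Portuguese": 430_000,
}

# The mode is the category (the key), not the largest frequency (the value).
mode = max(speakers, key=speakers.get)
highest_frequency = speakers[mode]

print(mode)               # Spanish
print(highest_frequency)  # 17339000
```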
The Mode
Some additional points to consider about modes:
Some distributions have two modes, where two response categories share the
highest frequency.
Such distributions are said to be bimodal.
NOTE: When the two most frequent scores or categories are quite close, but not
identical, in frequency, the distribution is still “essentially” bimodal. In
these instances, report both the “true” mode and the other high-frequency
category.
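Python's standard library can help with both cases described above: `statistics.multimode` returns every value tied for the highest frequency, and `collections.Counter.most_common` lists the top categories with their counts, so you can see whether the runner-up is close. The responses below are invented purely for illustration; they do not come from the slides.

```python
from collections import Counter
from statistics import multimode

# Hypothetical responses, made up for illustration only.
exact_tie = ["Agree", "Disagree", "Agree", "Disagree", "Neutral"]
near_tie = ["Agree"] * 21 + ["Disagree"] * 20 + ["Neutral"] * 5

# Two categories share the highest frequency: the distribution is bimodal.
print(multimode(exact_tie))  # ['Agree', 'Disagree']

# Only one "true" mode, but the runner-up is close in frequency,
# so the distribution is still essentially bimodal.
print(multimode(near_tie))               # ['Agree']
print(Counter(near_tie).most_common(2))  # [('Agree', 21), ('Disagree', 20)]
```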
[Figure: Example of a Bimodal Frequency Distribution]
The Median
The median is the score that divides the distribution into two equal parts so
that half of the cases are above it and half are below it.
The median can be calculated for both ordinal and interval levels of
measurement, but not for nominal data.
It must be emphasized that the median is the exact middle of a
distribution.
So, now let’s look at ways we can find the median in sorted data:
In some cases, we can find the median by simple inspection.
Let’s look at the responses (A) to the question: “Think about the economy, how
would you rate economic conditions in the country today?”
A (unsorted)
Response      Respondent
Poor          Jim
Good          Sue
Only Fair     Bob
Poor          Jorge
Excellent     Karen
Total (N) = 5
First, we sort the responses (B) in order from lowest to highest (or highest
to lowest). Since we have an odd number of cases, let’s find the middle case.
B (sorted)
Response      Respondent
Poor          Jim
Poor          Jorge
Only Fair     Bob
Good          Sue
Excellent     Karen
Total (N) = 5
Calculating the median:
Case   Respondent   Response
1      Jim          Poor
2      Jorge        Poor
3      Bob          Only Fair
4      Sue          Good
5      Karen        Excellent
We can find the median either through visual inspection or through calculation.
When N is odd, we can find the middle case by adding 1 to N and dividing by 2:
(N + 1) ÷ 2.
Since N is 5, you calculate (5 + 1) ÷ 2 = 3.
The middle case is thus the third case (Bob), so the median response is
“Only Fair.”
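With ordinal responses, the sort order comes from the response scale, not from the alphabet. This minimal Python sketch assumes the four categories rank Poor < Only Fair < Good < Excellent, as in the sorted list (B). It then uses the (N + 1) ÷ 2 rule to pick the middle case, which works here because N is odd.

```python
# Ordinal response scale, lowest to highest (assumed from the sorted list B).
scale = ["Poor", "Only Fair", "Good", "Excellent"]

responses = {
    "Jim": "Poor",
    "Sue": "Good",
    "Bob": "Only Fair",
    "Jorge": "Poor",
    "Karen": "Excellent",
}

# Sort respondents by the rank of their answer on the scale (ties keep order).
ordered = sorted(responses.items(), key=lambda item: scale.index(item[1]))

n = len(ordered)
position = (n + 1) // 2  # (N + 1) / 2 = 3 when N = 5 (odd N only)
name, median_response = ordered[position - 1]  # positions count from 1

print(ordered)
print(position, name, median_response)  # 3 Bob Only Fair
```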
Calculating the median:
Another example:
The following is a list of the number of hate crimes reported in the nine
largest U.S. states for 1997.
State            Number
California       1831
Florida          93
Virginia         105
New Jersey       694
New York         853
Ohio             265
Pennsylvania     168
Texas            333
North Carolina   42
TOTAL            N = 9
Finding the Median State for Hate Crimes
1. Order the cases from lowest to highest.
2. Find the middle case. In this situation, we need the 5th case:
   (9 + 1) ÷ 2 = 5,
   which is Ohio.
Remember: (N + 1) ÷ 2.
State            Number
North Carolina   42
Florida          93
Virginia         105
Pennsylvania     168
Ohio             265
Texas            333
New Jersey       694
New York         853
California       1831
N = 9
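The (N + 1) ÷ 2 rule gives the median's position in the sorted list. Here is a small Python function that applies it to both odd and even N. If the position is a whole number, it returns that case. If the position ends in .5, it averages the two neighbouring cases. This is only a sketch, and the name `median_by_position` is hypothetical.

```python
def median_by_position(values):
    """Median via the (N + 1) / 2 position rule on sorted scores."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    position = (n + 1) / 2  # 1-based position, e.g. 5.0 or 4.5
    lower = ordered[int(position) - 1]  # case at the whole-number part
    if position.is_integer():
        return lower
    upper = ordered[int(position)]  # next case up
    return (lower + upper) / 2

hate_crimes_1997 = {
    "California": 1831, "Florida": 93, "Virginia": 105,
    "New Jersey": 694, "New York": 853, "Ohio": 265,
    "Pennsylvania": 168, "Texas": 333, "North Carolina": 42,
}

print(median_by_position(hate_crimes_1997.values()))  # 265 (Ohio)
```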
Finding the Median Number of Hate Crimes out of Eight States
Here the same list is used without California, which leaves eight states.
Order the cases from lowest to highest.
For an even number of cases, there will be two middle cases: here,
Pennsylvania (168) and Ohio (265).
In this instance, the median falls halfway between both cases:
(168 + 265) ÷ 2 = 216.5.
However, the circumstances being explained should determine whether you use
the two middle cases or the point halfway between them in your explanation.
State            Number
North Carolina   42
Florida          93
Virginia         105
Pennsylvania     168
Ohio             265
Texas            333
New Jersey       694
New York         853
Finding the Median Number of Hate Crimes out of Eight States
Using the position formula: (8 + 1) ÷ 2 = 4.5.
The median is at position 4.5, halfway between the 4th case (Pennsylvania,
168) and the 5th case (Ohio, 265), which gives 216.5.
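Python's standard `statistics` module covers both of the reporting choices mentioned above. `median` gives the point halfway between the two middle cases, and `median_low` and `median_high` give the two middle cases themselves. Here is a short sketch using the eight-state list, without California.

```python
from statistics import median, median_high, median_low

# Hate crimes reported in 1997, the eight states without California.
eight_states = [42, 93, 105, 168, 265, 333, 694, 853]

print(median(eight_states))       # 216.5, halfway between the middle cases
print(median_low(eight_states))   # 168 (Pennsylvania)
print(median_high(eight_states))  # 265 (Ohio)
```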
The Median (Mdn): Examples
Odd Number of Cases: the median is exactly in the middle.
12, 17, 13, 11, 16, 25, 20 (not ordered)
11, 12, 13, 16, 17, 20, 25 (ordered: lowest to highest)
N = 7
(N + 1) ÷ 2 = (7 + 1) ÷ 2 = 4, so the median is the 4th score.
Mdn = 16
The Median (Mdn): Examples
Even Number of Cases: the median is the point above and below which 50% of
the cases fall.
17, 12, 16, 13, 11, 25, 20, 26 (not ordered)
11, 12, 13, 16, 17, 20, 25, 26 (ordered: lowest to highest)
N = 8
(N + 1) ÷ 2 = (8 + 1) ÷ 2 = 4.5, so the median lies halfway between the 4th
score (16) and the 5th score (17).
Mdn = (16 + 17) ÷ 2 = 16.5
The Mean
The mean is what most people call the average. To find the mean of any
distribution, simply add up all the scores and divide by the total number of
scores.
Here is the formula for calculating the mean:
$$\bar{X} = \frac{\sum X}{N}$$
where $\bar{X}$ = mean (read as “X bar”)
$\sum$ = sum (expressed as the Greek capital letter sigma)
$X$ = raw score in a set of scores
$N$ = total number of scores in a set
Finding the Mean
Communicable Diseases: Tuberculosis (as of 22 March 2007)
Country                                   2005
Bangladesh                                37
Bhutan                                    44
Democratic People's Republic of Korea     103
India                                     58
Indonesia                                 47
Maldives                                  76
Myanmar                                   119
Nepal                                     64
Sri Lanka                                 71
Thailand                                  61
Timor-Leste                               71
Total (ΣX)                                751
n (cases) = 11
© World Health Organization, 2008. All rights reserved.
Finding the Mean:
To find the mean tuberculosis rate reported by the WHO for this region in 2005:
– Add up the values for all of the countries in the region, and
– Divide the sum by the total number of cases (here, the 11 countries).
$$\bar{X} = \frac{\sum X}{N}$$
Thus, the mean rate is 751 ÷ 11 = 68.273.
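Below, the mean formula is applied to the WHO table in a few lines of Python. The values are copied from the table above. The sketch sums them to get ΣX, counts them to get N, and divides.

```python
# 2005 tuberculosis figures from the WHO table above.
tb_2005 = {
    "Bangladesh": 37, "Bhutan": 44,
    "Democratic People's Republic of Korea": 103, "India": 58,
    "Indonesia": 47, "Maldives": 76, "Myanmar": 119, "Nepal": 64,
    "Sri Lanka": 71, "Thailand": 61, "Timor-Leste": 71,
}

total = sum(tb_2005.values())  # sum of X
n = len(tb_2005)               # N
mean = total / n               # X-bar = sum of X / N

print(total, n, round(mean, 3))  # 751 11 68.273
```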
Using a formula to calculate the mean:
The Usefulness of Formulas:
The mean introduces the usefulness of a formula, which may be defined as a
shorthand way of stating which operations we need to follow to obtain a
certain result.
Again, the formula that defines the mean is:
$$\bar{X} = \frac{\sum X}{N}$$
Deviation:
The deviation indicates the distance and direction of any raw score from the
mean.
To find the deviation of a particular score, we simply subtract the mean from
the score:
$$\text{Deviation} = X - \bar{X}$$
where $X$ = any raw score in the distribution
$\bar{X}$ = mean of the distribution
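If you apply the deviation formula to every score, you get one signed distance per score. This short Python sketch uses the weeks-on-unemployment scores that appear later in these notes, whose mean is 5. Scores above the mean have positive deviations and scores below it have negative ones.

```python
weeks = [9, 8, 6, 4, 2, 1]
mean = sum(weeks) / len(weeks)  # 30 / 6 = 5.0

# Deviation = X - X-bar: positive above the mean, negative below it.
deviations = [x - mean for x in weeks]
print(deviations)  # [4.0, 3.0, 1.0, -1.0, -3.0, -4.0]
```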
So what does this tell us?
In a skewed distribution:
– The mode is the peak of the curve.
– The mean is found closest to the tail, where the relatively few extreme cases
will be found.
– The median is found between the mode and the mean.
In a normal (symmetrical) distribution, the mode, median, and mean are aligned.
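To see this ordering in a skewed distribution, here is a small positively skewed data set, invented for illustration. Most of its scores are low, but a few large values form a long right tail. The sketch uses the standard `statistics` module.

```python
from statistics import mean, median, mode

# Invented, positively skewed scores: most are low, two are extreme.
scores = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

print(mode(scores))    # 2   (peak of the distribution)
print(median(scores))  # 3.0 (between the mode and the mean)
print(mean(scores))    # 4.6 (pulled toward the tail)
```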
Did you know?
The shape or form of a distribution can influence the researcher’s choice of a
measure of central tendency.
Why is that? Well, let’s see…
Chapter 4: Measures of Variability
Measures of variability tell us:
• The extent to which the scores differ from each other or how
spread out the scores are.
• How accurately the measure of central tendency describes the
distribution.
• The shape of the distribution.
Just what is variability?
Variability is the spread or dispersion of scores.
Measuring Variability
There are several ways to measure variability, including:
1) The Range
2) The Deviation
3) The Standard Deviation
4) The Variance
Measures of Variability
Range: The range is a measure of the distance between the highest and lowest
scores.
$$R = H - L$$
where $H$ = highest score and $L$ = lowest score.
Temperature Example:
City       High    Low    Range
Honolulu   89°     65°    24°
Phoenix    106°    41°    65°
Okay, so now you tell me the range…
This table indicates the number of metropolitan areas, as defined by the
Census Bureau, in seven states.
State        Metropolitan Areas
Delaware     3
Idaho        4
Nebraska     4
Kansas       5
Iowa         4
Montana      3
California   9
What is the range in the number of metropolitan areas in these seven states?
– R = H - L
– R = 9 - 3
– R = 6
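R = H − L is just the maximum minus the minimum. Here is a minimal Python sketch that checks the metropolitan-area answer and the two temperature examples. The helper name `value_range` is hypothetical. It avoids the name `range` because that is a Python built-in.

```python
# Number of metropolitan areas by state, from the table above.
metro_areas = {
    "Delaware": 3, "Idaho": 4, "Nebraska": 4, "Kansas": 5,
    "Iowa": 4, "Montana": 3, "California": 9,
}

def value_range(values):
    """R = H - L: highest score minus lowest score."""
    values = list(values)
    return max(values) - min(values)

print(value_range(metro_areas.values()))  # 9 - 3 = 6
print(value_range([89, 65]))              # Honolulu: 24
print(value_range([106, 41]))             # Phoenix: 65
```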
The Variance
Remember that the deviation is the distance of any given score from its mean:
$(X - \bar{X})$.
The variance takes into account every score.
But if we were to simply add up the deviations, the positive and negative
deviations would cancel each other out, because the sum of the actual
deviations is always zero:
$$\sum (X - \bar{X}) = 0$$
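The claim that deviations always sum to zero holds for any data set, not only for tidy examples. Below is a quick Python check that uses exact fractions from the standard `fractions` module, so rounding cannot hide anything. It uses the tuberculosis figures, whose mean (751/11) is not a whole number.

```python
from fractions import Fraction

tb_2005 = [37, 44, 103, 58, 47, 76, 119, 64, 71, 61, 71]
mean = Fraction(sum(tb_2005), len(tb_2005))  # exactly 751/11

deviations = [x - mean for x in tb_2005]
print(sum(deviations))  # 0: positive and negative deviations cancel
```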
So, what should we do?
We square the actual deviations and then add them together:
$$\sum (X - \bar{X})^2$$
– Remember: When you square a negative number, it becomes positive!
So, $s^2$ = the sum of squared deviations divided by the number of scores.
The variance provides information about the relative variability of the scores.
Variance: Weeks on Unemployment:
Step 1: Calculate the mean.
Step 2: Calculate the deviations.
Step 3: Calculate the sum of the squared deviations.
X (weeks)   Deviation (X - X̄)   Squared deviation (X - X̄)²
9           9 - 5 = 4           4² = 16
8           8 - 5 = 3           3² = 9
6           6 - 5 = 1           1² = 1
4           4 - 5 = -1          (-1)² = 1
2           2 - 5 = -3          (-3)² = 9
1           1 - 5 = -4          (-4)² = 16
ΣX = 30                         Σ(X - X̄)² = 52
X̄ = 30 ÷ 6 = 5
The Variance
The mean of the squared deviations is the variance, symbolized by $s^2$:
$$s^2 = \frac{\sum (X - \bar{X})^2}{N}$$
where $s^2$ = variance
$\sum (X - \bar{X})^2$ = sum of the squared deviations from the mean
$N$ = total number of scores
Variance: Weeks on Unemployment:
Steps 1 to 3 are as in the table above: X̄ = 30 ÷ 6 = 5 and Σ(X - X̄)² = 52.
Step 4: Calculate the mean of the squared deviations:
$$s^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{52}{6} = 8.67 \text{ (weeks squared)}$$
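Here are the four steps above written out in Python for the weeks-on-unemployment data. This is a plain sketch of the definitional formula, which divides by N as these notes do. The standard library's `statistics.pvariance` uses the same N denominator and should give the same value.

```python
from statistics import pvariance

weeks = [9, 8, 6, 4, 2, 1]
n = len(weeks)
mean = sum(weeks) / n  # Step 1: X-bar = 30 / 6 = 5.0

deviations = [x - mean for x in weeks]  # Step 2: X - X-bar
squared = [d ** 2 for d in deviations]  # Step 3: (X - X-bar) squared
sum_of_squares = sum(squared)           # sum of squared deviations = 52.0
variance = sum_of_squares / n           # Step 4: divide by N

print(sum_of_squares, round(variance, 2))  # 52.0 8.67
print(round(pvariance(weeks), 2))          # 8.67, also divides by N
```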
What is a standard deviation?
Standard Deviation:
It is the typical (standard) difference (deviation) of an observation from
the mean.
Think of it as the average distance a data point is from the mean, although
this is not strictly true: it is the square root of the average squared
distance from the mean.
The standard deviation is calculated by taking the square root of the
variance:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
Standard Deviation: Weeks on Unemployment:
Steps 1 to 4 are as above: Σ(X - X̄)² = 52, so
$$s^2 = \frac{52}{6} = 8.67 \text{ (weeks squared)}$$
Step 5: Calculate the square root of the variance:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{8.67} = 2.94 \text{ (weeks)}$$
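Step 5 in Python is a single square root. This sketch also checks the result against the standard `statistics` module. Keep in mind that `statistics.pstdev` divides by N, as these notes do, whereas `statistics.stdev` divides by N − 1 (the sample formula), so the two give different answers.

```python
import math
from statistics import pstdev, stdev

weeks = [9, 8, 6, 4, 2, 1]
n = len(weeks)
mean = sum(weeks) / n
variance = sum((x - mean) ** 2 for x in weeks) / n  # 52 / 6

s = math.sqrt(variance)  # Step 5: square root of the variance
print(round(s, 2))              # 2.94
print(round(pstdev(weeks), 2))  # 2.94, population formula (divides by N)
print(round(stdev(weeks), 2))   # 3.22, sample formula (divides by N - 1)
```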
Raw Score Calculations
Here is how you calculate the variance using raw scores:
$$s^2 = \frac{\sum X^2}{N} - \bar{X}^2$$
Here is how you calculate the standard deviation using raw scores:
$$s = \sqrt{\frac{\sum X^2}{N} - \bar{X}^2}$$
Variance: Weeks on Unemployment:
X (weeks)   X²
9           9² = 81
8           8² = 64
6           6² = 36
4           4² = 16
2           2² = 4
1           1² = 1
ΣX = 30     ΣX² = 202
Step 1: Calculate the mean: X̄ = 30 ÷ 6 = 5, so X̄² = 25.
Step 2: Square the raw scores and add them: ΣX² = 202.
Step 3: Calculate the variance: s² = 202 ÷ 6 - 25 = 33.67 - 25 = 8.67.
Step 4: Calculate the standard deviation: s = √8.67 = 2.94.
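This Python sketch follows the raw-score steps above. It then confirms that the shortcut gives the same variance as the deviation formula. The two can differ only by floating-point rounding, which is why the check uses a tolerance rather than exact equality.

```python
weeks = [9, 8, 6, 4, 2, 1]
n = len(weeks)

mean = sum(weeks) / n                        # Step 1: X-bar = 5.0, squared = 25.0
sum_x_squared = sum(x ** 2 for x in weeks)   # Step 2: sum of X squared = 202
variance = sum_x_squared / n - mean ** 2     # Step 3: 33.67 - 25 = 8.67
s = variance ** 0.5                          # Step 4: square root of 8.67

# The deviation formula should give the same variance.
deviation_variance = sum((x - mean) ** 2 for x in weeks) / n

print(sum_x_squared, round(variance, 2), round(s, 2))  # 202 8.67 2.94
print(abs(variance - deviation_variance) < 1e-9)       # True
```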
Standard Deviation: Applications
Standard deviation also allows us to:
1) Measure the baseline of a frequency polygon.
2) Find the distance between raw scores and the mean in a standardized way,
which permits comparisons between raw scores within a distribution as well as
between different distributions.
Standard Deviation: Baseline of a Frequency Polygon.
The baseline of a frequency polygon can be measured in units of standard
deviation.
Example:
$\bar{X} = 80$
$s = 5$
Thus, the raw score 85 lies one standard deviation above the mean (+1s),
since 85 - 80 = 5 = 1s.
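To measure the baseline in standard deviation units, divide a score's deviation by s. This short sketch uses the slide's mean of 80 and s = 5. The helper name `sd_units` is hypothetical, and the quantity it computes is often called a z score.

```python
def sd_units(x, mean, s):
    """How many standard deviations raw score x lies from the mean."""
    return (x - mean) / s

print(sd_units(85, mean=80, s=5))  # 1.0  (+1s)
print(sd_units(75, mean=80, s=5))  # -1.0 (-1s)
print(sd_units(90, mean=80, s=5))  # 2.0  (+2s)
```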
Standard Deviation: The Normal Range
Unless a distribution is highly skewed, approximately two-thirds of its scores
will fall within one standard deviation above and below the mean.
Example: Reading Levels
Words per minute.
$\bar{X} = 120$
$s = 25$
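The "normal range" of one standard deviation on either side of the mean runs from X̄ − s to X̄ + s. This minimal sketch applies that to the reading-level example (X̄ = 120, s = 25 words per minute). The helper name `one_sd_range` is hypothetical. By the rule of thumb above, roughly two-thirds of readers would be expected to fall in this interval if the distribution is not highly skewed.

```python
def one_sd_range(mean, s):
    """Interval from one standard deviation below to one above the mean."""
    return mean - s, mean + s

low, high = one_sd_range(120, 25)
print(low, high)  # 95 145 (words per minute)
```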