```
Chapter 3
Measures of Central Tendency and Dispersion

In this presentation
• Concepts of Central Tendency and Dispersion
• Mode, Median, Mean
• Variation Ratio, Range and Interquartile Range, Variance and Standard Deviation
• Choosing a Measure of Central Tendency and Dispersion

The Concept of Central Tendency
• Central Tendency = most typical, central, or common score of a variable.
• Three measures of Central Tendency:
– Mode: The most common score.
– Median: The score of the middle case.
– Mean: The average score.

The Concept of Central Tendency (continued)
• Mode, median, and mean are three different statistics.
• They report three different kinds of information and will have the same value only in certain specific situations.
• They vary in terms of:
– Level-of-measurement considerations
– How they define central tendency

The Concept of Dispersion
• Dispersion = variety, diversity, amount of variation between scores.
• The greater the dispersion of a variable, the greater the range of scores and the greater the differences between scores.

The Concept of Dispersion (continued)
• The taller curve has less dispersion.
• The flatter curve has more dispersion.

The Concept of Dispersion: Examples
• Students in a given class tend to differ in their final exam marks.
• Canadians tend to differ in their incomes.
• Countries are diverse in their average incomes (e.g., the average income of Americans is higher [at about \$49,000 in 2011] than that of Canadians [\$40,000 in 2011]).

Measures of Central Tendency and Dispersion for Nominal Variables
• Mode
• Variation Ratio

Mode
• The most common score.
• Can be used with variables at all three levels of measurement.
• Most often used with nominal-level variables.

Finding the Mode
1. Count the number of times each score occurred.
2. The score that occurs most often is the mode.
– If the variable is presented in a frequency distribution, the mode is the category with the highest frequency.
– If the variable is presented in a line chart, the mode is the highest peak.

Finding the Mode: An Example

Variation Ratio (v)
• Variation Ratio (v) is one of only a few measures of dispersion for nominal-level variables.
• v provides a quick, easy way to quantify dispersion.
• Variation Ratio is simply the proportion of cases not in the modal category. That is:
$v = 1 - \frac{f_m}{n}$
where $f_m$ is the number of cases in the modal category and n is the total number of cases.
• v has a lower limit of 0.00 (no variation: all cases are in the mode) and increases toward 1.00 as the proportion of cases in the mode decreases.
• Thus, the larger the v, the more dispersion in a variable.
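
Variation Ratio: A Worked Sketch
• The figures here are hypothetical, chosen only to illustrate the definition; they are not taken from the presentation's data. The symbols $f_m$ (cases in the modal category) and n (total cases) are notation assumed for this sketch.
• Since v is the proportion of cases not in the modal category, $v = 1 - \frac{f_m}{n}$.
• Suppose n = 200 cases, with 120 in the modal category: $v = 1 - \frac{120}{200} = 0.40$.
• If all 200 cases were in the modal category: $v = 1 - \frac{200}{200} = 0.00$, the lower limit.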

Variation Ratio: An Example
Conclusion: Canadian society has grown increasingly diverse in its ethnocultural composition, and will be quite heterogeneous by 2017.

Measures of Central Tendency and Dispersion for Ordinal Variables
• Median
• Range/Interquartile Range

Median (Md)
• The exact center of a distribution of scores.
• The score of the middle case.
• Can be used with variables measured at the ordinal or interval-ratio levels.
– Cannot be used for nominal-level variables.

Finding the Median
1. Arrange the cases in order, from high to low.
2. Locate the middle case.
– If n is odd: the median is the score of the middle case.
– If n is even: the median is the average of the scores of the two middle cases.
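
Finding the Median: A Worked Sketch
• The scores below are hypothetical and serve only to illustrate the two rules.
• Odd n (n = 5): arranged from high to low, the scores are 90, 81, 70, 62, 55. The middle case is the third one, so Md = 70.
• Even n (n = 4): arranged from high to low, the scores are 90, 81, 70, 62. The two middle cases are 81 and 70, so $Md = \frac{81 + 70}{2} = 75.5$.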

Finding the Median: Odd Number of Cases

Finding the Median: Even Number of Cases

The Range (R)
• Range (R) = High Score - Low Score
• Quick and easy indication of variability.
• Can be used with ordinal or interval-ratio variables.
• Limited because it is based on only two scores:
1. It is distorted by atypically high or low scores (often referred to as outliers).
2. It gives no information about variation between the high and low scores.
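
The Range: A Worked Sketch
• The scores below are hypothetical, picked to show how a single outlier distorts R.
• Scores: 10, 12, 13, 15, 95. Then R = 95 - 10 = 85.
• Without the atypical score of 95, the remaining scores give R = 15 - 10 = 5.
• One case changes R from 5 to 85, even though four of the five scores lie within 5 points of each other.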

Interquartile Range (Q)
• Interquartile Range (Q) = distance from the third quartile (Q3) to the first quartile (Q1), or symbolically, Q = Q3 - Q1.
• A quartile is a type of percentile.
• A percentile is the point below which a specific percentage of cases fall. Thus:
– the first quartile, Q1, is the point below which 25% of the cases fall;
– the third quartile, Q3, is the point below which 75% of the cases fall.
• Q avoids some problems of R by focusing only on the middle 50 percent of scores.

Measures of Central Tendency and Dispersion for Interval-Ratio Variables
• Mean
• Standard Deviation

Mean
• Reports the average score of a distribution.
• By far the most commonly used measure of central tendency.
• Requires variables measured at the interval-ratio level.
• Symbolized as X̄ ("X-bar") for a sample and µ for a population.

Finding the Mean
• The calculation is straightforward: add the scores and then divide by the number of scores.
• The mathematical formula for the sample mean is:*
$\bar{X} = \frac{\sum X_i}{n}$
where $X_i$ is each score and n is the number of scores.
*The population mean, µ, is calculated using the same method.

Finding the Mean: An Example
The mean of these five scores is 78.

Characteristics of the Mean
1. All scores cancel out around the mean.
2. The mean is the point of minimized variation (the "least squares" principle).
3. The mean uses all the scores.

1. All Scores Cancel Out Around the Mean

2. Mean is the Point of Minimized Variation
• As illustrated in the table above:
• If we square and sum the differences between the scores and the mean (78), we get a total of 388.
• If we performed the same operation with any other number (say, the value 77), the sum will always be greater than 388.
• For example, the sum of the squared differences around 77 is 393.
Hence, the mean is the point in a distribution around which the variation of the scores (as indicated by the squared differences) is minimized: we call this the "least squares" principle.
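
Least Squares: Why the Sum Always Grows
• A short derivation, added as a sketch. The symbols $X_i$ (each score), $\bar{X}$ (the mean), n (the number of scores), and c (any other value) are notation assumed here.
• For any value c: $\sum (X_i - c)^2 = \sum (X_i - \bar{X})^2 + n(\bar{X} - c)^2$, because the cross term $2(\bar{X} - c)\sum (X_i - \bar{X})$ is zero (all scores cancel out around the mean).
• The added term $n(\bar{X} - c)^2$ is positive whenever c differs from the mean, so the sum is smallest at the mean.
• Check with the figures above (n = 5, mean = 78, c = 77): $388 + 5(78 - 77)^2 = 393$.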

3. Mean is Affected by Every Score
• Conclusion:
The strength of the mean is that it uses all the available information from the variable.
However, its weakness is that it is affected by every score.
• If there are some very high or low scores, the mean may be misleading.

Means, Medians, and Skew
• When a distribution has a few very high or low scores (outliers), the mean will be pulled in the direction of the extreme scores.
– For a positive skew, the mean will be greater than the median.
– For a negative skew, the mean will be less than the median.
• When an interval-ratio variable has a pronounced skew, the median may be the more trustworthy measure of central tendency.
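
Means, Medians, and Skew: A Worked Sketch
• The incomes below (in thousands of dollars) are hypothetical and only illustrate the pull of an outlier.
• Scores: 30, 35, 40, 45, 200.
• Mean: $\frac{30 + 35 + 40 + 45 + 200}{5} = \frac{350}{5} = 70$.
• Median: the middle case of the five ordered scores, 40.
• The single high score gives a positive skew, so the mean (70) is greater than the median (40), and four of the five cases fall below the mean.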

Means, Medians, and Skew (continued)

Standard Deviation
• A measure of the degree of dispersion of the data from the mean.
• Specifically, the "average" distance of each score from the mean.
• The lowest value possible is 0 (no dispersion).
• Symbolized as s for a sample and σ (sigma) for a population.
• The square of the standard deviation is the variance, symbolized as s² for a sample and σ² for a population.
• Is used in combination with the mean to describe a "Normal" distribution (Ch. 4).

Standard Deviation (continued)
• Meets the criteria for a good measure of dispersion:
1. Uses all scores in the distribution.
2. Describes the average or typical deviation of the scores.
3. Increases in value as the distribution of scores becomes more diverse.
• As with the mean, the standard deviation requires variables measured at the interval-ratio level.
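
Formula for Sample Standard Deviation, s*
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}$
where $X_i$ is each score, $\bar{X}$ is the sample mean, and n is the number of scores.
*The population standard deviation, σ, is calculated using the same method.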

Computing Standard Deviation
• To solve:
1. Subtract the mean from each score.
2. Square the deviations.
3. Sum the squared deviations.
4. Divide the sum of the squared deviations by the number of scores.
5. Find the square root of the result.
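
Computing Standard Deviation: A Worked Sketch
• The eight scores below are hypothetical; they were chosen so the arithmetic comes out in whole numbers. The division is by the number of scores, as in step 4 above.
• Scores: 2, 4, 4, 4, 5, 5, 7, 9. Mean: $\frac{40}{8} = 5$.
1. Deviations from the mean: -3, -1, -1, -1, 0, 0, 2, 4.
2. Squared deviations: 9, 1, 1, 1, 0, 0, 4, 16.
3. Sum of the squared deviations: 32.
4. Divide by the number of scores: $\frac{32}{8} = 4$ (this is the variance).
5. Square root: $s = \sqrt{4} = 2$.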