Making Sense of Statistics:
A Conceptual Overview
Sixth Edition
Fred Pyrczak
PowerPoints by Pamela Pitman Brown, PhD, CPG
Pyrczak Publishing
Part B: Descriptive Statistics
Section 6
FREQUENCIES, PERCENTAGES, AND
PROPORTIONS
Frequency
• Number of participants or cases
• Symbol for frequency is f (lowercase f, italicized)
• N meaning number of participants
is also used to stand for frequency.
Frequency
• If a report says
f = 23 for a score of 99
then you know that 23 participants had a score of 99
Percentage
• Indicates number per 100 who have a
certain characteristic. So when we say
“percent” we are really saying “per 100.”
• Symbol %
Percentage
• 44% of the registered voters in a town are
registered Democrats. You know that for each 100
registered voters, 44 are Democrats.
Calculating Frequencies
Example: 44% of registered voters in town are Democrats.
Convert 44% to decimal form (move decimal 2 places to left).
44% = .44
If the town has 2,200 registered voters,
to determine the f (# of Democrats), do this:
.44 X 2,200 = 968 Democrats
Calculating Frequencies
The previous statement tells you that for every 100
registered voters there are 44 Democrats.
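The percentage-to-frequency conversion above can be sketched in a few lines of Python. This is only an illustration using the figures from the slide; the variable names are made up for the example.

```python
# Figures from the slide: 44% of 2,200 registered voters are Democrats.
percent_democrats = 44
registered_voters = 2200

# Move the decimal two places to the left (divide by 100) ...
proportion = percent_democrats / 100

# ... then multiply by the total to get the frequency (f).
# round() guards against tiny floating-point error in the product.
frequency = round(proportion * registered_voters)

print(frequency)  # 968
```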
Calculating Percentage
Example: There are 84
gifted children in a
school. Of those 84
gifted children, 22 are
afraid of the dark.
Calculating Percentage
What percentage of the gifted children in the school are afraid of
the dark?
# of children afraid of dark ÷ total # of gifted children
22 ÷ 84 = .2619
.2619 X 100 = 26.19%
Based on the information given by the sample, if you asked 100 children
from the same population (gifted), you would expect about 26 of them to
be afraid of the dark!
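As a rough check of the arithmetic for the gifted-children example, the same two steps (divide, then multiply by 100) might look like this in Python. The variable names are illustrative only.

```python
# Figures from the slide: 22 of 84 gifted children are afraid of the dark.
afraid_of_dark = 22
total_gifted = 84

# Step 1: divide the part by the whole to get a proportion (part of 1).
proportion = afraid_of_dark / total_gifted

# Step 2: multiply by 100 to express it "per 100" (a percentage).
percentage = proportion * 100

print(f"proportion = {proportion:.4f}")   # proportion = 0.2619
print(f"percentage = {percentage:.2f}%")  # percentage = 26.19%
```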
Proportion
• 22 ÷ 84 =.2619
• .2619 is the proportion of gifted children afraid
of the dark. REMEMBER: Proportion is part of 1
(one)
The above statement tells us that
twenty-six hundredths
of the children are afraid of the dark.
• PROPORTIONS ARE KIND OF HARD TO
INTERPRET!
Reporting f Along With %
A news article says that 8% of foreign language students at
Whatsamatta U are Russian majors. BUT what if you knew
that
f = 12 (8%)
12 students study Russian
out of 150 total foreign
language students
at Whatsamatta U
Reporting f Along With %
Whatsamatta U has 150 foreign language students (FLS)
Big State U has 350 foreign language students (FLS)
                 Total # of FLS    Russian majors
Whatsamatta U    N = 150           N = 12 (8%)
Big State U      N = 350           N = 14 (4%)
*So who has the most FLS?
Looking at f (indicated by N), who has the most Russian majors?
Looking at %, who has the most Russian majors?
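To see why f and % can point in different directions, here is a small Python sketch using the two universities' figures from the slide. The dictionary layout and variable names are assumptions made for the illustration.

```python
# (Russian majors, total foreign language students) from the slide.
schools = {
    "Whatsamatta U": (12, 150),
    "Big State U": (14, 350),
}

for name, (russian_majors, total_fls) in schools.items():
    pct = russian_majors / total_fls * 100
    print(f"{name}: f = {russian_majors} ({pct:.0f}%)")

# The school with the most Russian majors by raw frequency ...
most_by_f = max(schools, key=lambda s: schools[s][0])
# ... is not the school with the highest percentage.
most_by_pct = max(schools, key=lambda s: schools[s][0] / schools[s][1])

print("Most by f:", most_by_f)
print("Most by %:", most_by_pct)
```

Reporting both numbers lets the reader see that Big State U has more Russian majors even though its percentage is lower.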
Section 6 Questions
1. What does frequency mean?
2. What is the symbol for frequency?
3. For what does N stand?
4. If 21% of kindergarten children are afraid of
monsters, how many out of each 100 are afraid?
5. Suppose you read that 20% of a population of
1,000 was opposed to a city council resolution.
How many are opposed?
6. What statistic is part of 1?
7. According to this section, are “percentages” or
“proportions” easier to interpret?
8. Why is it a good idea to report the underlying
frequencies when reporting percentages?
Section 7
SHAPES OF DISTRIBUTIONS
Frequency Distribution
A frequency distribution is a table that shows how many participants have each score.
The frequency (f) associated with each score (X) is shown.
Distribution of Depression Scores
X     f
22    1
21    3
20    4
19    8
18    5
17    2
16    0
15    1
N=24
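A frequency distribution like the one above could be tallied from raw scores with Python's `collections.Counter`. The raw score list here is an assumption: it is rebuilt from the table, since the slide shows only the finished tally.

```python
from collections import Counter

# Raw scores rebuilt from the table (assumed; the slide lists only the tally).
scores = [22] + [21] * 3 + [20] * 4 + [19] * 8 + [18] * 5 + [17] * 2 + [15]

counts = Counter(scores)

print("X   f")
# Walk from the highest score to the lowest, as in the table.
for x in range(max(scores), min(scores) - 1, -1):
    # Counter returns 0 for a score nobody earned (here, 16).
    print(f"{x}  {counts[x]}")

n = sum(counts.values())
print(f"N = {n}")
```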
Frequency Polygon
8 participants
had score of 19
3 participants
had score of 21
Normal Distribution
• Also called a bell-shaped curve
Positive Skew
Many people have
low income, so
curve is high on
the left
Only a few people
have higher income,
so curve is lower on
the right
Negative Skew
Only a few
people have low
scores
More people have
higher scores, so
curve is high on the right.
Bimodal Distribution
Bimodal distributions are rare in research.
Section 7 Questions
1. What is the name of a table that shows how many
participants have each score?
2. What does a frequency polygon show?
3. What is the most important type of curve?
4. Which type of distribution is often found in
nature?
5. In a distribution with a negative skew, is the long
tail pointing to the “left” or to the “right”?
6. When plotted, income in large populations usually
has what type of skew?
7. What is the name of the type of distribution that
has two high points?
8. Which type of distribution is found much less
frequently in research than the others?
Section 8
THE MEAN: AN AVERAGE
MEAN
• Most frequently used average
• So widely used that people refer to the mean as the average
Symbols for mean:
• X̄, called X-bar (usually in mathematical stats)
• M and m, used in academic journals
M is used for the mean of a population, and m is used for
the mean of a sample drawn from the population.
The mean is the balance point in a distribution of scores.
Specifically, it is the point around which all the deviations
sum to zero.
**The mean is sensitive to extreme scores!
Computing the Mean
Computing the mean is easy! You probably learned it in 4th – 6th
grade, BUT it has been a while. So how is it done?
Mean= Average
SUM (add all scores) and divide by the number of scores
Let’s look at an example:
Scores: 5, 6, 7, 10, 12, 15
Sum of scores: 55
Number of scores: 6
Computation of mean: 55/6 = 9.1666... = 9.17
*notice that answer was rounded to 2 decimal places
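The same computation, sketched in Python with the six scores from the example (variable names are illustrative):

```python
scores = [5, 6, 7, 10, 12, 15]

total = sum(scores)        # add all scores
count = len(scores)        # number of scores
mean = total / count       # divide the sum by the number of scores

print(total, count)        # 55 6
print(round(mean, 2))      # 9.17 (rounded to 2 decimal places)
```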
Deviation From the Mean
• This is easy, too!
• Subtract the mean from each score to find the deviation
from the mean.
• The mean is the balance point in a distribution of scores.
Specifically, it is the point around which all the deviations
sum to zero.
• Example:
• Scores: 7, 11, 11, 14, 17
• Sum of scores is 60
• Number of scores: 5
• Computation of the mean: 60/5 = 12
Compute Deviations
From the Mean
Scores and Their Deviations From Their Mean
Score    Mean     Deviation
7        12.00    -5
11       12.00    -1
11       12.00    -1
14       12.00    +2
17       12.00    +5
Sum of deviations = 0
Substitution of Another
Number for Mean
Score    Mean     Deviation
7        10.00    -3
11       10.00    +1
11       10.00    +1
14       10.00    +4
17       10.00    +7
Sum of deviations = 10
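The balance-point property can be checked with a short Python sketch: deviations taken from the mean sum to zero, while deviations taken from another number (10, as in the table above) do not. The helper function name is made up for this illustration.

```python
def sum_of_deviations(scores, point):
    """Add up (score - point) for every score."""
    return sum(score - point for score in scores)

scores = [7, 11, 11, 14, 17]
mean = sum(scores) / len(scores)          # 60 / 5 = 12.0

around_mean = sum_of_deviations(scores, mean)
around_ten = sum_of_deviations(scores, 10)

print(around_mean)   # 0.0
print(around_ten)    # 10
```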
Major Drawback of the Mean
• Pulled in the direction of extreme scores.
• Extremely high scores will pull the mean higher.
• Extremely low scores will pull the mean lower.
Example:
Group A: 1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8, 10, 10, 10, 11
Mean for Group A = 5.52
Group B: 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 9, 10, 10, 150, 200
Mean for Group B = 21.24
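A quick Python check of the two group means follows. The last step, dropping the two largest scores from Group B, is a hypothetical comparison added for illustration and is not part of the slide.

```python
from statistics import mean

group_a = [1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8, 10, 10, 10, 11]
group_b = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 9, 10, 10, 150, 200]

mean_a = mean(group_a)
mean_b = mean(group_b)
print(round(mean_a, 2))   # 5.52
print(round(mean_b, 2))   # 21.24

# Hypothetical: remove the two extreme scores (150 and 200) from Group B
# to see how much they alone pull the mean upward.
group_b_trimmed = sorted(group_b)[:-2]
mean_b_trimmed = mean(group_b_trimmed)
print(round(mean_b_trimmed, 2))   # 5.05
```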
Extreme Scores
• A distribution that has some extreme scores at one
end but not the other is called a skewed
distribution.
Limitations of the Mean
• Mean is almost always inappropriate for describing
the average of a highly skewed distribution.
• The mean is only appropriate for interval or ratio
scales of measurement.
Synonym of Average
• Measure of central tendency
Section 8 Questions
1. How is the mean computed?
2. What are the most commonly used symbols for the mean in
academic journals?
3. For a given distribution, if you subtract the mean from each
score to get deviations and then sum the deviations, what will
the sum of the deviations equal?
4. Refer to the example in this section of contributions given to
charity. Explain why the mean for Group B is much higher than
the mean for Group A.
5. If most participants have similar scores but there are a few very
high scores, what effect will the very high scores have on the
mean?
6. Is the mean usually appropriate for describing the average of a
highly skewed distribution?
7. For which scales of measurement is the mean appropriate?
8. The term measure of central tendency is synonymous with what
other term?
Section 9
MEAN, MEDIAN, AND MODE
Mean
Most frequently used average
The mean is the balance point in a distribution of scores.
Specifically, it is the point around which all the deviations sum
to zero.
**The mean is sensitive to extreme scores!!
Median
• Alternative average
• 50% of scores are above the median, and 50% of
scores are below the median
• Middle point of the distribution
**Unlike the mean, the median is insensitive to
extreme scores! Median is an appropriate average
for describing the typical participants in a highly
skewed distribution.
Determine the Median
• Scores (in order from low to high)
61 61 72 77 80 81 82 85 89 90 92
There are 11 scores. Which one is the middle
score?
Determine the Median
• Scores (in order from low to high)
3 3 7 10 12 15
There are 6 scores. Which one is the middle
score?
With an even number of scores, take the sum of the 2 middle scores
(7 + 10 = 17) and divide by 2 (17 ÷ 2 = 8.5)
Determine the Median
• Scores (in order from low to high)
3 3 7 10 12 229
There are 6 scores. Which one is the middle
score?
Take the sum of the 2 middle scores (7 + 10 = 17) and
divide by 2 (17 ÷ 2 = 8.5)
* Median is insensitive to extreme scores!
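The three median examples can be reproduced with `statistics.median` from Python's standard library. The comparison with the mean at the end is added for illustration; the variable names are assumptions.

```python
from statistics import mean, median

odd_count = [61, 61, 72, 77, 80, 81, 82, 85, 89, 90, 92]
even_count = [3, 3, 7, 10, 12, 15]
with_extreme = [3, 3, 7, 10, 12, 229]

# Odd number of scores: the median is the single middle score.
print(median(odd_count))      # 81

# Even number of scores: the median is halfway between the two middle scores.
print(median(even_count))     # 8.5
print(median(with_extreme))   # 8.5, unchanged by the extreme score

# The mean, by contrast, is pulled toward the extreme score.
print(round(mean(even_count), 2))    # 8.33
print(round(mean(with_extreme), 2))  # 44.0
```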
Mode
• Most frequently occurring score
• This is easy, too!
What is the mode for
the following scores?
Scores (arranged from low to high)
2 2 4 6 7 7 7 9 10 12
Seven occurs the most, so 7 is the mode for
the scores.
What is the mode for
the following scores?
Scores (arranged from low to high)
17 19 20 20 22 23 23 28
There are 2 modes!
This is one disadvantage of using mode; there may
be more than one mode in the distribution.
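Both mode examples can be checked with `statistics.multimode` (available in Python 3.8 and later), which returns every score tied for most frequent rather than just one. This is a sketch for illustration; the variable names are made up.

```python
from statistics import multimode

one_mode = [2, 2, 4, 6, 7, 7, 7, 9, 10, 12]
two_modes = [17, 19, 20, 20, 22, 23, 23, 28]

modes_first = multimode(one_mode)
modes_second = multimode(two_modes)

print(modes_first)    # [7]
print(modes_second)   # [20, 23]
```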
How do I choose which
average to use?
• Mean
• Usually the most appropriate
• NOT appropriate for nominal or ordinal data
• ONLY can be used with interval or ratio levels of
measurement
• Almost always inappropriate for use with highly
skewed distributions
How do I choose which
average to use?
• Median
• Choose median when mean is not appropriate
• NOT appropriate for nominal data
How do I choose which
average to use?
• Mode
• APPROPRIATE for nominal data
Positions of Averages in Skewed
Distributions
Section 9 Questions
1. Which average always has 50% of the cases below it?
2. Which average is defined as the most frequently occurring score?
3. Which average is defined as the middle point in a distribution?
4. If you read that the median equals 42 on a test, what percentage of the participants have
scores higher than 42?
5. What is the mode of the following scores? 11, 13, 16, 16, 18, 21, 25
6. Is the mean appropriate for describing highly skewed distributions?
7. This is a guideline from this section: "Choose the median when the mean is
inappropriate." What is the exception to this guideline?
8. For describing nominal data, what is an alternative to reporting the mode?
9. In a distribution with a negative skew, does the "mean" or the "median" have a higher
value?
10. In a distribution with a positive skew, does the "mean" or the "median" have a higher
value?
Section 10
RANGE AND INTERQUARTILE RANGE
Variability
• differences among the scores of participants
• Synonyms are spread & dispersion
NO Variability
Measures of Variability
• Range
• Interquartile range
Range
Difference between the highest score and the lowest score.
2 5 7 7 8 8 10 12 12 15 17 20
To calculate the range, we can subtract the lowest score from
the highest score.
20 – 2 = 18
So we say “the range is 18”
Range
Difference between the highest score and the lowest score.
2 5 7 7 8 8 10 12 12 15 17 20
OR we can simply state that
“the scores range from 2 to 20!”
What is the weakness of using the
range?
EXTREME SCORES
We also call these extreme scores
OUTLIERS
Interquartile Range (IQR)
• The range of the middle 50% of the participants
2 2 2 2.5 3 4 4 5 5 5 5.5 6 6 20
(Lowest 25% | MIDDLE 50% | Highest 25%)
The range for the middle 50% is only 3 points
5.5 – 2.5 = 3
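The slide does not say how the boundaries 2.5 and 5.5 were found. One common convention, assumed here, takes the median of the lower half and the median of the upper half of the ordered scores; for these 14 scores it gives the same boundaries. Other quartile conventions can give slightly different values. The function name is made up for this sketch.

```python
from statistics import median

def iqr_median_of_halves(scores):
    """Return (q1, q3, iqr) using the median-of-halves convention.

    Assumes at least two scores. With an odd number of scores,
    the middle score is left out of both halves.
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    return q1, q3, q3 - q1

scores = [2, 2, 2, 2.5, 3, 4, 4, 5, 5, 5, 5.5, 6, 6, 20]

q1, q3, iqr = iqr_median_of_halves(scores)
full_range = max(scores) - min(scores)

print(q1, q3, iqr)   # 2.5 5.5 3.0
print(full_range)    # 18, inflated by the outlier (20)
```

The outlier of 20 stretches the range to 18, while the interquartile range stays at 3.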
Section 10 Questions
1. What is the name of the group of statistics designed to concisely describe the amount of
variability in a set of scores?
2. What are the two synonyms for variability?
3. If all participants have the same score on a test, what should be said about the variability in
the set of scores?
4. If the differences among a set of scores are great, do we say that there is "much variability"
or "little variability"?
5. What is the definition of the range?
6. What is a weakness of the range?
7. What is the outlier in the following set of scores? 2, 31, 33, 35, 36, 38, 39
8. What is the outlier in the following set of scores? 50, 50, 52, 53, 56, 57, 75
9. As a general rule, is the range appropriate for describing a distribution of scores with
outliers?
10. What is the definition of the interquartile range?
11. Is the interquartile range unduly affected by outliers?
12. When the median is reported as the average, it is also customary to report which measure
of variability?
Section 11
STANDARD DEVIATION
Standard Deviation
• Most frequently used measure of variability.
• Variability is AKA spread and dispersion
• Symbol: S (upper case S, italicized) (population)
• Symbol: s (lower case s, italicized) (sample)
• AKA sd or SD
Standard Deviation
• Statistic that provides an overall measurement of
how much participants’ scores differ from the
mean score of their group.
• Special type of average of the deviation of the
scores from their mean.
Standard Deviation
• The more spread out participants’ scores are
around the mean, the larger the standard
deviation.
Standard Deviation
Example 1:
Scores for Group A: 0, 0, 5, 5, 10, 15,15, 20, 20
M= 10.00, S= 7.45
Greater variability, larger S
Example 2:
Scores for Group B: 8, 8, 9, 9, 10, 11, 11, 12, 12
M= 10.00, S= 1.49
Standard Deviation
Example 3:
Scores for Group C: 10, 10, 10, 10, 10, 10, 10, 10, 10
M= 10.00, S= 0.00
NO variability, so S = 0
Scores for Group A: 0, 0, 5, 5, 10, 15,15, 20, 20
M= 10.00, S= 7.45
Scores for Group B: 8, 8, 9, 9, 10, 11, 11, 12, 12
M= 10.00, S= 1.49
Scores for Group C: 10, 10, 10, 10, 10, 10, 10, 10, 10
M= 10.00, S= 0.00
Here is what we CAN say:
Each Group has an M= 10.00
1. Group A has more variability than Group B or C.
2. Group B has more variability than Group C.
3. Group C has NO variability.
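The M and S values for the three groups can be reproduced with Python's `statistics` module. The slide's S values match `pstdev`, which divides by the number of scores (the population formula); the choice of `pstdev` over the sample version `stdev` is inferred from those values, not stated on the slide.

```python
from statistics import mean, pstdev

groups = {
    "A": [0, 0, 5, 5, 10, 15, 15, 20, 20],
    "B": [8, 8, 9, 9, 10, 11, 11, 12, 12],
    "C": [10, 10, 10, 10, 10, 10, 10, 10, 10],
}

results = {}
for name, scores in groups.items():
    m = mean(scores)
    # pstdev: square root of the average squared deviation from the mean.
    s = pstdev(scores)
    results[name] = (round(m, 2), round(s, 2))
    print(f"Group {name}: M = {m:.2f}, S = {s:.2f}")
```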
S & Normal Curve
REMEMBER THIS!!
About 2/3 of the cases (68%) lie within one standard
deviation unit of the mean in a normal distribution.
AND DON’T FORGET THIS!!
“Within one standard deviation unit” means one unit on
both sides of the mean!!
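The 68% figure can be checked with `statistics.NormalDist` (Python 3.8 and later). The mean of 50 and standard deviation of 10 below are hypothetical values chosen for the illustration; the share within one standard deviation is the same for any normal distribution.

```python
from statistics import NormalDist

# Hypothetical normal distribution: M = 50, S = 10.
m, s = 50, 10
dist = NormalDist(mu=m, sigma=s)

# "Within one standard deviation unit" means from M - S up to M + S.
lower, upper = m - s, m + s
share = dist.cdf(upper) - dist.cdf(lower)

print(lower, upper)        # 40 60
print(f"{share:.1%}")      # 68.3%, i.e. about 2/3 of the cases
```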
Normal Curve S=10
Normal Curve S=5
Sample Statement Reporting M & S
Group A has a higher mean (M = 67.89, S = 8.77)
than Group B (M = 60.23, S = 8.54).
Section 11 Questions
1. The term variability refers to what?
2. Is the standard deviation a frequently used measure of variability?
3. The standard deviation provides an overall measurement of how much participants' scores
differ from what other statistic?
4. If the differences among a set of scores are small, this indicates which of the following?
A. There is much variability B. There is little variability
5. What is the symbol for the standard deviation when a population has been studied?
6. Will the scores for "Group D" or "Group E" below have a larger standard deviation if the
two standard deviations are computed?
Group D: 23, 24, 25, 27, 27, 26; Group E: 10, 19, 20, 21, 25, 30, 40
7. If all the participants in a group have the same score, what is the value of the standard
deviation of the scores?
8. If you read the following statistics in a research report, which group should you conclude
has the greatest variability?
Group F: M = 30.23, S = 2.14; Group G: M = 25.99, S = 3.0; Group H: M = 22.43, S = 4.79
9. What percentage of the cases in a normal curve lies within one standard deviation unit of
the mean (i.e., within one unit above and one unit below the mean)?
10. Suppose M = 30 and S= 3 for a normal distribution of scores. What percentage of the
cases lies between scores of 27 and 30?
11. Suppose M = 80 and S= 10 for a normal distribution of scores. About 68% of the cases lie
between what two scores?