Psych 230
Psychological Measurement and Statistics
Pedro Wolf
September 2, 2009
Previously on “let’s learn statistics in five weeks”
• the logic of research
– samples, populations, and variables
• descriptive and inferential statistics
– statistics and parameters
• understanding experiments
– experimental and correlational studies
– independent and dependent variables
• characteristics of scores
– nominal, ordinal, interval, and ratio scales
– continuous and discrete
Which Scale?
• Does the variable have an intrinsic value?
– NO: Nominal
– YES: go to the next question
• Does the variable have equal values between scores?
– NO: Ordinal
– YES: go to the next question
• Does the variable have a real zero point?
– NO: Interval
– YES: Ratio
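The three questions can be read as a small decision procedure. The sketch below is only an illustration: the function name and the parameter names are made up here, and the example answers in the comments are judgment calls rather than something taken from the slides.

```python
def measurement_scale(has_intrinsic_value, equal_intervals, real_zero):
    """Walk the three yes/no questions in the order the slide asks them."""
    if not has_intrinsic_value:
        return "nominal"
    if not equal_intervals:
        return "ordinal"
    if not real_zero:
        return "interval"
    return "ratio"


# Illustrative answers (assumed for the example):
vehicle_type = measurement_scale(False, False, False)  # categories only
class_rank = measurement_scale(True, False, False)     # ordered, unequal gaps
speed = measurement_scale(True, True, True)            # zero means no speed
print(vehicle_type, class_rank, speed)
```

The later questions are never consulted once an earlier one is answered NO, which mirrors the flow of the chart.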
Continuous
• A continuous scale allows for fractional amounts
– it ‘continues’ between the whole-number amounts
– decimals make sense
• Examples:
– Height
– Weight
– IQ
Discrete
• In a discrete scale, only whole-number amounts can
be measured
– decimals do not make sense
– usually, nominal and ordinal scales are discrete
– some interval and ratio variables are also discrete
• number of children in a family
• Special type of discrete variable: dichotomous
– only two amounts or categories
– pass/fail; living/dead; male/female
Today…
• Why graphical representations of data?
• Stem and leaf plots
• Box plots
• Frequency
– what is it
– how a frequency distribution is created
• Graphing frequency distributions
– bar graphs, histograms, polygons
• Types of distribution
– normal, skewed, bimodal
• Relative frequency and the normal curve
– percentiles, area under the normal curve
“… look at the data” (Robert Bolles, 1998)
• Raw data are often messy, overwhelming, and uninterpretable.
• Many data sets can have thousands of measurements and hundreds of variables.
• Graphical representations of data can make data interpretable.
• Looking at the data can inspire ideas.
What in the world could these data mean?
Imagine over 30,000 observations
Time                    Lat         Long
930485:23:06.8600001    32.20497    -111.028
930497:04:34.77         32.20482    -111.028
930497:04:59.7599998    32.20487    -111.028
930497:05:46.7600002    32.20485    -111.029
930497:06:05.7600002    32.20578    -111.029
930497:06:16.7600002    32.20678    -111.029
930497:06:28.7599998    32.20698    -111.028
930497:09:31.77         32.20687    -110.999
930497:09:58.77         32.2055     -110.993
930497:10:07.77         32.20555    -110.992
930497:10:37.77         32.20687    -110.986
930497:11:38.77         32.20672    -110.979
After plotting those data
• By plotting the data and superimposing them on map data, suddenly the previous slide’s data can tell a story
• Of course not all data can tell such a story
• People have developed various ways to visualize their data graphically
Stem and Leaf Plots
stem | leaves    count
   5 | 46799     5
   6 | 34688     5
   7 | 2256      4
   8 | 148       3
   9 |           0
  10 | 6         1
N = 18
• data - 54, 56, 57, 59, 59, 63, 64, 66, 68, 68, 72 …
• preserves the data intact and is a way to see the distribution
• numbers on the left of the line are called the stems and represent the leading digit(s) of each of the numbers
• numbers on the right of the line are called the leaves and represent the individual numbers
– each leaf indicates its value by completing the stem (stem 5 with leaf 4 is 54)
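A stem and leaf plot can be built mechanically by splitting each score into its leading digits and its last digit. This is a minimal sketch, assuming non-negative whole-number scores; the function name is hypothetical, and the 18 scores are read back off the stems and leaves of the plot above.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Return the rows of a stem and leaf plot for non-negative integers."""
    leaves = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)  # 54 -> stem 5, leaf 4
        leaves[stem].append(str(leaf))
    rows = []
    # Keep empty stems (such as 9) so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        rows.append(f"{stem:>2} | {''.join(leaves[stem])}")
    return rows


scores = [54, 56, 57, 59, 59, 63, 64, 66, 68, 68,
          72, 72, 75, 76, 81, 84, 88, 106]
for row in stem_and_leaf(scores):
    print(row)
```

Because every leaf is kept, the original scores can be recovered from the printed rows, which is the sense in which the plot preserves the data.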
Box Plots
•Each of the lines in a box plot
represents either quartiles or the
range of the data.
•In this particular plot the dots
represent outliers.
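The numbers behind a box plot can be computed directly. The sketch below uses the 18 scores from the stem and leaf plot and makes two assumptions the slide does not state: quartiles are computed with Python's "inclusive" interpolation method, and outliers are points more than 1.5 times the interquartile range beyond the quartiles (a common convention, not necessarily the one used for the plotted figure).

```python
from statistics import quantiles


def box_plot_summary(data):
    """Quartiles, range, and outliers under the 1.5 * IQR convention."""
    data = sorted(data)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3,
        "max": data[-1], "outliers": outliers,
    }


scores = [54, 56, 57, 59, 59, 63, 64, 66, 68, 68,
          72, 72, 75, 76, 81, 84, 88, 106]
summary = box_plot_summary(scores)
print(summary)
```

Different software uses slightly different quartile rules, so the box edges may shift a little from one package to another.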
Frequency distributions - why?
• Standard method for graphing data
– easy way of visualizing group data
• Introduction to the Normal Distribution
– underlies all of the statistical tests we will be studying
this semester
– understanding the concepts behind statistical testing will
make life a lot easier later on
Frequency
Frequency - some definitions
• Raw scores are the scores we initially measure in a
study
• The number of times a score occurs in a set is the
score’s frequency
• A distribution is the general name for any organized
set of data
• A frequency distribution organizes the scores based
on each score’s frequency
• N is the total number of scores in the data
Understanding Frequency Distributions
• A frequency distribution table shows the number of
times each score occurs in a set of data
• The symbol for a score’s frequency is simply f
• N = ∑f
Raw Scores
• The following is a data set of raw scores. We will
use these raw scores to construct a frequency
distribution table.
14 14 13 15 11 15
13 10 12 13 14 13
14 15 17 14 14 15
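As a rough sketch, the frequency distribution table for these raw scores can be tallied with Python's standard library. The variable names are illustrative; the table lists every value from the highest score down to the lowest, including values that never occur (f = 0).

```python
from collections import Counter

raw_scores = [14, 14, 13, 15, 11, 15, 13, 10, 12,
              13, 14, 13, 14, 15, 17, 14, 14, 15]

frequency = Counter(raw_scores)  # maps each score X to its frequency f

# One row per possible value, highest first; missing values get f = 0.
table = [(x, frequency[x])
         for x in range(max(raw_scores), min(raw_scores) - 1, -1)]

n = sum(f for _, f in table)  # N = sum of f

print("X   f")
for x, f in table:
    print(f"{x:<3} {f}")
print("N =", n)
```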
Frequency Distribution Table
Frequency Distribution Table - Example
• Make a frequency distribution table for the
following scores:
5, 7, 4, 5, 6, 5, 4
X (value)    f (frequency)
7            1
6            1
5            3
4            2
Learning more about our data
• What are the values for N and ∑X for the scores
below?
14 14 13 15 11 15
13 10 12 13 14 13
14 15 17 14 14 15
Results via Frequency Distribution Table
What is N?
N = ∑f = 1 + 0 + 4 + 6 + 4 + 1 + 1 + 1 = 18
What is ∑X?
(17 * 1) = 17
(16 * 0) = 0
(15 * 4) = 60
(14 * 6) = 84
(13 * 4) = 52
(12 * 1) = 12
(11 * 1) = 11
(10 * 1) = 10
__________
Total = 246
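The same arithmetic can be checked in a few lines. This is a sketch using the frequencies listed above; the dictionary name is illustrative.

```python
# Frequency table from the slide: score X -> frequency f
frequency_table = {17: 1, 16: 0, 15: 4, 14: 6, 13: 4, 12: 1, 11: 1, 10: 1}

n = sum(frequency_table.values())                        # N = sum of f
sum_x = sum(x * f for x, f in frequency_table.items())   # sum of X = sum of (X * f)

print("N =", n)
print("sum of X =", sum_x)
```

Multiplying each value by its frequency gives the same total as adding up all 18 raw scores one at a time.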
Graphing Frequency Distributions
Graphing Frequency Distributions
• A frequency distribution graph shows the scores on
the X axis and their frequency on the Y axis
• Why?
– Because it’s not easy to make sense of this:
• On a scale of 0-10, how excited are you about this class: __________
– 0 = absolutely dreading it
– 10 = extremely excited/highlight of my semester
• Data (raw scores)
5 7 2 3 5 5 5 8 7 7 4 5 10 7 5 4 5 5 7 3 6 2 6 3 5 5 7 2 4 6 3 7 5 5 7 3 5 6 5 5 8 6 7 5 3 5 7 2 3 5 4 5 4 8 3 6 5 5
5 1 2 4 7 5 5 4 3 3 7 5 8 6 3 5 10 0 6 6 3 8 5 4 3 2 4 6 3 7 5 5 7 5 7 5 10 7 5 4 5 5 7 6 3 8 1 5 5 6 4 9 8 5 8 5 7
5 10 7 5 4 5 5 7 4 8 4 5 8 5 5 7 5 5 5 2 4 6 3 7 5 2 4 6 3 7 5 8 6 3 5 10 0 6 7 2 8 8 5 5 8 6 3 6 2 6 3 5 5 7 2 5 10
7 5 4 5 5 7 5 7 5 10 7 5 4 5 5 5 7 2 3 3 7 5 8 6 3 5 10 0 6
Graphing Frequency Distributions
X     f
10    4
9     7
8     35
7     40
6     33
5     43
4     11
3     11
2     3
1     6
0     4
[Figure: histogram of the responses; x-axis: Excited about course (0=no, 10=yes); y-axis: Frequency, 0 to 50]
Graphing Frequency Distributions
• A frequency distribution graph shows the scores on
the X axis and their frequency on the Y axis
• The type of measurement scale (nominal, ordinal,
interval, or ratio) determines whether we use:
– a bar graph
– a histogram
– a frequency polygon
Graphs - bar graph
• A frequency bar graph is used for nominal and ordinal data
– values on the x-axis
– frequencies on the y-axis
– in a bar graph, bars do not touch
Graphs - histogram
• A histogram is used for a small range of different interval or ratio scores
– values on the x-axis
– frequencies on the y-axis
– in a histogram, adjacent bars touch
Graphs - frequency polygon
• A frequency polygon is used for a large range of different scores
– in a frequency polygon, there are many scores on the x-axis
Constructing a Frequency Distribution
• Step 1: make a frequency table
• Step 2: put values along x-axis (bottom of page)
• Step 3: put a scale of frequencies along y-axis (left
edge of page)
• Step 4 (bar graphs and histograms)
– make a bar for each value
• Step 4 (frequency polygons)
– mark a point above each value with a height for the
frequency of that value
– connect the points with lines
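The steps above can be sketched without any plotting library by drawing the graph sideways in plain text: each value gets a row, and the length of its bar is its frequency. The function name is hypothetical, and the scores are the small example set 5, 7, 4, 5, 6, 5, 4 used earlier.

```python
from collections import Counter


def text_histogram(scores):
    """Step 1: tally frequencies. Steps 2-4: one bar of '#' per value."""
    frequency = Counter(scores)
    rows = []
    for value in range(min(scores), max(scores) + 1):
        f = frequency[value]
        rows.append(f"{value:>3} | {'#' * f} ({f})")
    return rows


for row in text_histogram([5, 7, 4, 5, 6, 5, 4]):
    print(row)
```

In a real graph the values would run along the bottom of the page and the bars would rise vertically; the text version only rotates the picture.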
Graphing - example
• A researcher observes driving behavior on a road,
noting the gender of drivers, type of vehicle driven,
and the speed at which they are traveling. Which
type of graph should be used for each variable?
• Gender?
– nominal: bar graph
• Vehicle Type?
– nominal: bar graph
• Speed?
– ratio: frequency polygon
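The rule of thumb behind these answers can be written as a small lookup. This is a sketch only: the function and parameter names are made up, and because the slides do not say how many different scores count as a "large range", that judgment is left to the caller as a yes/no flag.

```python
def choose_graph(scale, many_different_scores=False):
    """Pick a graph type from the measurement scale, per the slides' rule."""
    scale = scale.lower()
    if scale in ("nominal", "ordinal"):
        return "bar graph"
    if scale in ("interval", "ratio"):
        # Small range of different scores -> histogram; large -> polygon.
        return "frequency polygon" if many_different_scores else "histogram"
    raise ValueError(f"unknown measurement scale: {scale!r}")


print(choose_graph("nominal"))                            # gender, vehicle type
print(choose_graph("ratio", many_different_scores=True))  # speed
```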
Use and Misuse of Graphs
[Figure: Number of Felonies by Year, 2000 to 2003; y-axis from 0 to 600]
Use and Misuse of Graphs
• Which graph is correct?
[Figure: two graphs of Number of Felonies by Year, 2000 to 2003; one y-axis runs from 210 to 240, the other from 0 to 600]
• Neither does a very good job at summarizing the
data
• Beware of graphing tricks
Types of Distributions
Distributions
• Frequency tables, bar graphs, histograms and frequency polygons describe frequency distributions
Distributions - Why?
• Describing the shape of a frequency distribution is important for both descriptive and inferential statistics
• The benefit of descriptive statistics is being able to
understand a set of data without examining every
score
Distributions : The Normal Curve
• It turns out that many, many variables have a
distribution that looks the same. This has been called
the ‘normal distribution’.
• A bell-shaped curve
• Symmetrical
• Extreme scores have a low frequency
– extreme scores: scores that are relatively far above or far
below the middle score
The Ideal Normal Curve
• Symmetrical
• Most scores in middle range
• Few extreme scores
• In theory, tails never reach the x-axis
Normal Curve - height
[Figure: histogram of responses to "How tall are you (in inches)?"; x-axis: Height (inches), 52.5 to 80.0; y-axis: Frequency, 0 to 40]
Normal Curve - hours slept
[Figure: histogram; x-axis: Hours of Sleep last night; y-axis: Frequency, 0 to 60]
Normal Curve - GPA
[Figure: histogram; x-axis: GPA, 1.75 to 4.50; y-axis: Frequency, 0 to 50]
Normal Distributions
• While the scores in the population may approximate
a normal distribution, it is not necessarily so for a
sample of scores
[Figure: histogram of responses to "How tall are you (in inches)?" (N=10); x-axis: Height (inches), 61.5 to 72.5; y-axis: Frequency, 0.0 to 3.0]
Skewed Distributions
• A skewed distribution is not symmetrical. It has only
one pronounced tail
• A distribution may be either negatively skewed or
positively skewed
• Negative or positive depends on whether the tail points toward the low (negative) end or the high (positive) end of the scale
– the side with the longer tail describes the distribution
• Tail on negative side : negatively skewed
• Tail on positive side : positively skewed
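The direction of skew can also be checked numerically. The slides give no formula, so the sketch below assumes one common choice, the average cubed z-score (the moment coefficient of skewness): a negative result points to a tail on the low side, a positive result to a tail on the high side. The two small data sets are invented for illustration.

```python
from statistics import fmean, pstdev


def skewness(scores):
    """Average cubed z-score; the sign gives the direction of the longer tail."""
    mean = fmean(scores)
    sd = pstdev(scores)  # population standard deviation
    return sum(((x - mean) / sd) ** 3 for x in scores) / len(scores)


tail_on_high_side = [1, 1, 1, 2, 2, 3, 10]   # made-up scores
tail_on_low_side = [10, 10, 10, 9, 9, 8, 1]  # mirror image of the above

print(skewness(tail_on_high_side) > 0)  # positively skewed
print(skewness(tail_on_low_side) < 0)   # negatively skewed
```

The function needs at least two different scores, since a standard deviation of zero would make the z-scores undefined.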
Negatively Skewed Distributions
• Tail on negative side: negatively skewed
• Contains extreme low scores
• Does not contain extreme high scores
• Can occur due to a “ceiling effect”
Positively Skewed Distributions
• Tail on positive side: positively skewed
• Contains extreme high scores
• Does not contain extreme low scores
• Can occur due to a “floor effect”
[Figure: histogram; x-axis: Rank in Family, 1 to 6; y-axis: Frequency, 0 to 100]
Bimodal Distributions
• a distribution containing two distinct humps (symmetrical in the idealized case)
Bimodal - birth month
[Figure: bar graph of responses to "What month were you born?"; x-axis: Month Born, Jan to Dec; y-axis: Frequency, 0 to 25]
Distributions - data
• How many alcoholic drinks do you have per week?
[Figure: histogram; x-axis: Alcoholic drinks per week, 0.5 to 24.5; y-axis: Frequency, 0 to 100]
• Positively skewed
Distributions - data
• How much did you spend on textbooks for this semester?
[Figure: histogram; x-axis: Spent on Textbooks ($), 50 to 900; y-axis: Frequency, 0 to 60]
• Normal
– one outlier
Kurtosis
• meso- (as in mesokurtic): forming chiefly scientific terms with the sense ‘middle, intermediate’
• lepto- (as in leptokurtic): small, fine, thin, delicate
• platy- (as in platykurtic): forming nouns and adjectives, particularly in biology and anatomy, with the sense ‘broad, flat’
Relative Frequency and the Normal Curve
Relative Frequency
• Another way to organize scores is by relative
frequency
• Relative frequency is the proportion of times that a particular score occurs
– remember: a proportion is a number between 0 and 1
• Simple frequency: the number of times a score
occurs
• Relative frequency: the proportion of times a score
occurs
Relative Frequency - Why?
• We are still asking how often certain scores occurred
• Sometimes, relative frequency is easier to interpret
than simple frequency
• Example:
• 82 people in the class reported drinking no alcohol weekly
– Simple frequency
• 0.42 of the class (42%) reported drinking no alcohol
– Relative frequency
Relative Frequency
• The formula for a score’s relative frequency is:
relative frequency = f / N
Relative Frequency Distribution
Example
• Using the following data set, find the relative
frequency of the score 12
14 14 13 15 11 15
13 10 12 13 14 13
14 15 17 14 14 15
Example
• The frequency table for this set of data is:
X     f
17    1
16    0
15    4
14    6
13    4
12    1
11    1
10    1
N = 18
Example
• The frequency for the score of 12 is 1, and N = 18
• Therefore, the relative frequency of 12 is:
relative frequency = f / N = 1 / 18 ≈ 0.06
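The same calculation works for any score in the set. A minimal sketch, with an illustrative function name, applied to the 18 raw scores from the example:

```python
from collections import Counter

raw_scores = [14, 14, 13, 15, 11, 15, 13, 10, 12,
              13, 14, 13, 14, 15, 17, 14, 14, 15]


def relative_frequency(scores, value):
    """f / N: the proportion of all scores that equal `value`."""
    f = Counter(scores)[value]
    return f / len(scores)


print(round(relative_frequency(raw_scores, 12), 2))
print(round(relative_frequency(raw_scores, 14), 2))
```

A score that never occurs, such as 16, gets a relative frequency of 0.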
Relative Frequencies
• We can also add relative frequencies together.
– For example, what proportion of people scored a passing mark in this exam (>3): 0.28 + 0.33 + 0.17 = 0.78
Value    Frequency    Relative Frequency
6        5            5/18 = 0.28
5        6            6/18 = 0.33
4        3            3/18 = 0.17
3        2            2/18 = 0.11
2        1            1/18 = 0.06
1        1            1/18 = 0.06
         N = 18       Total = 1.00
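Adding relative frequencies amounts to adding the frequencies first and dividing by N once. A short sketch using the table above (the variable names are illustrative):

```python
# Exam results from the table: value -> frequency
exam_frequencies = {6: 5, 5: 6, 4: 3, 3: 2, 2: 1, 1: 1}
n = sum(exam_frequencies.values())

# Proportion of people with a passing mark (value > 3)
passing = sum(f for value, f in exam_frequencies.items() if value > 3) / n

print(n)
print(round(passing, 2))
```

Working from the unrounded fractions (14/18) avoids the small rounding drift that can appear when already-rounded proportions are summed.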
Relative Frequency and the Normal Curve
• When the data are normally distributed (as many variables approximately are), we can use the normal curve directly to determine relative frequency.
• There is a known proportion of scores above or below any point
– for example, exactly 0.50 of the scores lie above the mean
• The proportion of the total area under the normal curve at certain scores corresponds to the relative frequency of those scores.
[Figure: normal distribution showing the area under the curve to the left of selected scores]
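The area to the left of a score can be looked up with Python's standard library. In this sketch the mean of 100 and standard deviation of 15 are assumed values chosen only for illustration; they do not come from the slides.

```python
from statistics import NormalDist

# Hypothetical normally distributed scores: mean 100, standard deviation 15
scores = NormalDist(mu=100, sigma=15)

below_mean = scores.cdf(100)        # area under the curve left of the mean
below_115 = scores.cdf(115)         # area left of one SD above the mean
above_115 = 1 - scores.cdf(115)     # remaining area to the right

print(below_mean)
print(round(below_115, 4))
print(round(above_115, 4))
```

The cumulative area is the relative frequency of all scores at or below that point, so half the area lies on each side of the mean.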
Percentiles
• A percentile is the percent of all scores in the data that are at or below a score
– Example: 98th percentile - 98% of the scores lie at or below this score.
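Using the "at or below" definition, a score's percentile in a set of raw scores can be counted directly. A minimal sketch with a hypothetical function name, applied to the 18 raw scores used earlier in the lecture:

```python
raw_scores = [14, 14, 13, 15, 11, 15, 13, 10, 12,
              13, 14, 13, 14, 15, 17, 14, 14, 15]


def percentile_of(scores, score):
    """Percent of all scores that are at or below `score`."""
    at_or_below = sum(1 for x in scores if x <= score)
    return 100 * at_or_below / len(scores)


print(round(percentile_of(raw_scores, 14), 1))
print(percentile_of(raw_scores, 17))
```

Textbooks and software sometimes count only the scores strictly below, so small samples can give slightly different answers under other definitions.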
Homework
• Complete exercises 1, 6, and 9 for chapter 3.
• Read chapters 4 and 5 for next week.