Download EPSY 439 - Texas A&M University

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
EPSY 439
Variability
Range
Variance
Standard Deviation
z-score
Concept Map
Variability - Chapter 3
Index of
Vari abil ity
D is tribution
Standard
D ev iation
R ange
D epends only
on ex tr eme
s c ores
Single Sc ore
Same mean,
s ame r ange differer nt form
D ev iation
Sc ores :(X- Mean)
C omputation:
R aw- Sc ore
z Sc ores
Graphing
Standard
D ev iations
Meas ure of a
s ample's
v ariablity
Population
v ariability
C omputation:
D ev iation-Sc ore
Varianc e
C omputation:
D ev iation-Sc ore
D es c ribes the relation
of an X to the Mean with
res pec t to the v ar iability of
the dis tribution
Es timate of a
population's
v ariability
R epres ents # of SD s
a s c ore is abov e or
below the Mean
Giv es y ou a
w ay to c ompar e
raw s c ores
Standard Sc ore
R ange betw een
-3 and +3
C omputation:
R aw- Sc ore
C onc ept of N -1
May 24, 2017
Copyright 2000 - Robert J. Hall
2
Overview
The mean, as a single value, is a useful measure
for describing


the center of a distribution and
where the most frequent scores in a distribution are
located.
It does not tell us very much, however, about


scores away from the center of the distribution
and/or
scores that occur infrequently.
May 24, 2017
Copyright 2000 - Robert J. Hall
3
Overview
Thus, to describe a data set accurately, we
need to know not only where scores are
centered but also about how much
individual scores within the distribution differ
from one another. Measures of variability,
then, provide context (spread/consistency)
for measures of center.
May 24, 2017
Copyright 2000 - Robert J. Hall
4
Overview
Three different distributions have the same mean score
Sample A
0
2
6
10
12
S` X = 6
Sample B
8
7
6
5
4
S` X = 6
Sample C
6
6
6
6
6
S` X = 6
Each of the three samples has a mean of 6, so if you did not look at
the raw scores, you might think that we have three identical
distributions. Clearly, they are not the same.
May 24, 2017
Copyright 2000 - Robert J. Hall
5
Overview
Distance between the locations of scores in three distributions
X
Sample A
Sample B
Sample C
`X = 6
`X = 6
`X = 6
0, 2, 6, 10, 12
4, 5, 6, 7, 8
6, 6, 6, 6, 6
XXXXX
X
X
X
X
X
X
X
X
X
0 2 4 6 8 10 12
0 2 4 6 8 10 12
0 2 4 6 8 10 12
Here you can see differences that separate scores in the three samples.
May 24, 2017
Copyright 2000 - Robert J. Hall
6
Statistical Model of Variability
The statistical model for abnormality requires
two statistical concepts:


a measure of the average and
a measure of the deviation from the average
– We have examined the standard techniques for defining
the average score; we will now focus on methods for
measuring deviations.
– The fact that scores deviate from average means that they
are variable.
Variability, then, is one of the most
basic statistical concepts.
May 24, 2017
Copyright 2000 - Robert J. Hall
7
Meaning of Variability
The term variability has much the same
meaning in statistics as it has in everyday
language; to say that things are variable
means that they are not all the same.

In statistics, our goal is to measure the
amount of variability for a particular set of
scores, a distribution.
– Are scores all clustered together, or are they
scattered over a wide range of values?
May 24, 2017
Copyright 2000 - Robert J. Hall
8
Measuring Variability
A good measure of variability should:

provide an accurate picture of the spread of
the distribution, and
 give an indication of how well an individual
score (or group of scores) represents the
entire distribution.
– In inferential statistics, where we often use small
samples to answer general questions about large
populations, variability is a particularly important
concern. Why?
May 24, 2017
Copyright 2000 - Robert J. Hall
9
Types of Variability
In this section we will consider three
different measures of variability:

the range,
– interquartile range

the standard deviation, and
 the variance
Of these three, the standard deviation is by
far the most important.
May 24, 2017
Copyright 2000 - Robert J. Hall
10
Range
 The range is the distance between the
largest score (Xmax) and the smallest
score in the distribution (Xmin).

In determining this distance, you must also
take into account the real limits of the
maximum and minimum X values.
 The range, therefore, is computed as:
range = URL Xmax  LRL Xmin
May 24, 2017
Copyright 2000 - Robert J. Hall
11
Range
 The range is the most obvious way of
describing how spread out the scores
are in the distribution.
*** Problem ***

Range is completely determined by the
two extreme values and ignores the other
scores in the distribution.
May 24, 2017
Copyright 2000 - Robert J. Hall
12
Range
Example:

The following two distributions have
exactly the same range (10 points in each
case) but
– Distribution 1 clusters scores at the upper end
of the range
– Distribution 2 spreads scores out over the
range
Distribution 1: 1, 8, 9, 9, 10, 10
Distribution 2: 1, 2, 4, 6, 8, 10
May 24, 2017
Copyright 2000 - Robert J. Hall
13
Range
Summary

Because the range does not consider all
the scores in the distribution, it often does
not give an accurate description of the
variability for the entire distribution.
 Generally, the range is considered to be a
crude and unreliable measure of
variability.
– John Tukey has developed ways looking at data that
give prominence to the dispersion of data using what is
called the boxplot and interquartile range.
May 24, 2017
Copyright 2000 - Robert J. Hall
14
Interquartile Range
 A distribution can be divided into four
equal parts using quartiles.

The first quartile (Q1) is the score that
separates the lower 25% of the distribution
from the rest.
 The second quartile (Q2 or median) is the
score that has exactly two quarters, or
50%, of the distribution below it.
May 24, 2017
Copyright 2000 - Robert J. Hall
15
Interquartile Range

Finally, the third quartile (Q3) is the score
that divides the bottom three-fourths of the
distribution from the top quarter.
 The interquartile range is the distance
between the first quartile and the third
quartile:
interquartile range = Q3 - Q1
May 24, 2017
Copyright 2000 - Robert J. Hall
16
Interquartile Range
Frequency distribution for a set of 16 scores
4.0
Interquartile range
(3.5 points)
Bottom
3.0
25%
Top 25%
2.0
1.0
Std. Dev = 2.50
Mean = 6.3
0.0
N = 16.00
2.0
4.0
3.0
6.0
5.0
8.0
7.0
Q1 = 4.5
10.0
9.0
11.0
Q3 = 8
SCORE
May 24, 2017
Copyright 2000 - Robert J. Hall
17
Boxplot
Max
Q2 = 6
Valu e
Q3 = 8
Q1 = 4.5
Min
12
11
10
9
8
7
6
5
4
3
2
1
0
N=
IQR = 3.5
16
SCORE
May 24, 2017
Copyright 2000 - Robert J. Hall
18
Construction of Boxplot
 Draw a number line that includes the
range of observations.
 Compute Q1, Q2, and Q3.
 Above the line drawn in the first step,
draw a box extending from Q1 to Q3.
 Inside the box, draw a line at the
median.
May 24, 2017
Copyright 2000 - Robert J. Hall
19
Construction of Boxplot
 To identify the outliers, compute the
lower and upper fences, which are
located at 1.5 (IQR) below Q1 and
above Q3. That is, the lower fence is
fL= Q1 - 1.5(IQR)
and the upper fence is
fU= Q3 + 1.5(IQR)
May 24, 2017
Copyright 2000 - Robert J. Hall
20
Construction of Boxplot
 Observations located beyond the fence
are classified as outliers and are
identified with an asterisk (*).

If there are no outliers, extend horizontal
line segments (whiskers) from the ends of
the box to the smallest and largest
observations.
 If there are outliers, extend the whiskers to
the smallest and largest non-outliers.
May 24, 2017
Copyright 2000 - Robert J. Hall
21
May 24, 2017
Copyright 2000 - Robert J. Hall
22
Example
Temperatures at time of space shuttle launches:
53
57
58
63
66
67
67
67
68
69
70
70
70
70
72
73
75
75
76
76
78
79
80
81
Five number summary:
Low
53
May 24, 2017
Q1
66.84
M
70
Q3
75.5
Copyright 2000 - Robert J. Hall
High
81
23
Example - Frequency Distribution
T
e
Median:
70.5
m
a
u
li
la
r
r
u
r
c
c
c
e
e
e
e
V
5
3
a
1
2
2
2
5
7
1
2
2
3
5
8
1
2
2
5
6
3
1
2
2
7
6
6
1
2
2
8
6
7
3
5
5
3
6
8
1
2
2
5
6
9
1
2
2
7
7
0
4
7
7
3
7
2
1
2
2
5
7
3
1
2
2
7
7
5
2
3
3
0
7
6
2
3
3
3
7
8
1
2
2
5
7
9
1
2
2
7
8
0
1
2
2
8
8
1
1
2
2
0
T
o
4
0
0
T
o
4
0
1
58.3
70.0
?
50
69.5
41.7
16.6
50  41.7
8 .3

 .50
16.6
16.6
.50 * 1  .5
69.5  0.5  70.0
May 24, 2017
Copyright 2000 - Robert J. Hall
24
Example - Frequency Distribution
T
e
First Quartile:
67.5
33.3
m
a
u
li
la
r
r
u
r
c
c
c
e
e
e
e
V
5
3
a
1
2
2
2
5
7
1
2
2
3
5
8
1
2
2
5
6
3
1
2
2
7
6
6
1
2
2
8
6
7
3
5
5
3
6
8
1
2
2
5
6
9
1
2
2
7
7
0
4
7
7
3
7
2
1
2
2
5
7
3
1
2
2
7
7
5
2
3
3
0
7
6
2
3
3
3
7
8
1
2
2
5
7
9
1
2
2
7
8
0
1
2
2
8
8
1
1
2
2
0
T
o
4
0
0
T
o
4
0
1
66.84
?
25
66.5
20.8
12.5
25  20.8 4.2

 .34
12.5
12.5
.34 * 1  .34
66.5  0.34  66.84
May 24, 2017
Copyright 2000 - Robert J. Hall
25
Example - Frequency Distribution
T
e
Third Quartile:
75.5
75.0
m
a
u
li
la
r
r
u
r
c
c
c
e
e
e
e
V
5
3
a
1
2
2
2
5
7
1
2
2
3
5
8
1
2
2
5
6
3
1
2
2
7
6
6
1
2
2
8
6
7
3
5
5
3
6
8
1
2
2
5
6
9
1
2
2
7
7
0
4
7
7
3
7
2
1
2
2
5
7
3
1
2
2
7
7
5
2
3
3
0
7
6
2
3
3
3
7
8
1
2
2
5
7
9
1
2
2
7
8
0
1
2
2
8
8
1
1
2
2
0
T
o
4
0
0
T
o
4
0
1
8.3
74.5
66.7
75.0  66.7 8.3

 1.00
8.3
8.3
1.00 * 1  1.00
74.5  1.00  75.5
May 24, 2017
Copyright 2000 - Robert J. Hall
26
Example - Summary Statistics
S
u
t
e
r
N
V
S
4
M
S
0
M
S
0
M
S
0
M
S
0
S
S
2
V
S
7
S
S
5
S
2
E
K
S
4
S
8
E
R
S
8
P
2
S
0
5
S
0
7
S
5
May 24, 2017
Copyright 2000 - Robert J. Hall
27
Example - Histogram
Histogram
6
5
Frequency
4
3
2
Std. Dev = 7.22
1
Mean = 70.0
N = 24.00
0
52.5
57.5
55.0
62.5
60.0
67.5
65.0
72.5
70.0
77.5
75.0
80.0
Space Shuttle Launch Temperatures
May 24, 2017
Copyright 2000 - Robert J. Hall
28
Example - Boxplot
90
80
70
60
1
50
N=
O1 indicates that case number
1, 53o, is an outlier in this data
set. Case 1 was the
temperature at the time that
the Challenger space shuttle
was launched. On the morning
of January 28, 1986
Challenger exploded and
crashed 73 seconds after take
off. There were no survivors.
A simple boxplot such as this
might have raised enough of a
red flag for mission control to
abort the launch.
24
Launch Temperature
May 24, 2017
Copyright 2000 - Robert J. Hall
29
Victims of Challenger Disaster
Seven astronauts were killed when the space shuttle Challenger
suddenly exploded shortly after launch on January 28, 1986. The
crewmembers who died during the mission are shown here.
May 24, 2017
Copyright 2000 - Robert J. Hall
30
May 24, 2017
Copyright 2000 - Robert J. Hall
31
Standard Deviation - Index of
Variability
 The most commonly used measure of
the variability of scores in a distribution
is the standard deviation.

It is an index of the variability (spread) of
scores about the mean of a distribution,
something like the average dispersion or
deviation of the scores (X) about their
mean (`X ).
– The greater the spread of scores, the greater
the standard deviation.
May 24, 2017
Copyright 2000 - Robert J. Hall
32
Standard Deviation - Illustration
– For example, in the figure below, the scores in
distribution A cluster about the mean of the
distribution, and the scores in distribution B are
spread out.

The standard deviation for distribution A, then, would
be less than the standard deviation for distribution B.
A
B
May 24, 2017
Copyright 2000 - Robert J. Hall
33
Standard Deviation - Definition
 More formally, the standard deviation may
be defined as the square root of the
average squared deviation of scores from
the mean of the distribution, measured in
units of the original score.

This definition is reflected in the deviation
score formula.
S
May 24, 2017
2
(
X

X
)


N
Copyright 2000 - Robert J. Hall
2
x

N
34
Standard Deviation - Explanation
 Major limitation of the range as a
measure of the variability of scores in a
distribution is that it is based on two
extreme, often unstable, scores.

Another way to measure the spread of
scores in a distribution might be to consider
the variability of each of the scores in the
distribution about the mean -- the most
stable measure of central tendency -- from
each of the scores in the distribution.
May 24, 2017
Copyright 2000 - Robert J. Hall
35
Standard Deviation - Deviation
Scores
 This process produces deviation scores
that can be represented as:
x  X X

These deviation scores provide a measure
of the distance of each raw score from the
mean of the distribution.
May 24, 2017
Copyright 2000 - Robert J. Hall
36
Standard Deviation - S(X - `X)
 Since we want a measure that takes all
deviation scores into account, the most
obvious next step would be to sum up
the deviation scores.

Since the mean is the balance point of the
distribution, however, we know that this
attempt is doomed to failure:
 ( X X )   x  0
May 24, 2017
Copyright 2000 - Robert J. Hall
37
Standard Deviation - S(X - `X)2
 How do we get around this problem?

Square each deviation score to eliminate
the sign and to preserve the relative
distance between scores and the mean.
 This would yield:
 ( X X )
May 24, 2017
2
 x  0
Copyright 2000 - Robert J. Hall
2
38
Standard Deviation - Control for N
 We’re finished, right?

No.
 We still have a problem comparing across
distributions because different distributions
are likely to have different numbers of
people.
 To control for the size of the distribution,
we get a measure of the “average” amount
of variability about the mean of the
distribution.
May 24, 2017
Copyright 2000 - Robert J. Hall
39
Standard Deviation - Average
Deviation
 This average can be obtained by
dividing the sum of squared deviation
scores by N:
 ( X X )
N
May 24, 2017
2
x


Copyright 2000 - Robert J. Hall
2
N
40
Standard Deviation - Average
Dispersion
 This formula will give us something like
the average dispersion of scores in the
distribution about the mean; it is defined
as the variance.
 The last problem is to return the
measure of variability back to the
original score scale.

This problem can be solved by taking the
square root.
May 24, 2017
Copyright 2000 - Robert J. Hall
41
Standard Deviation - Square Root
 ( X X )
N
May 24, 2017
2

Copyright 2000 - Robert J. Hall
x
2
N
42
Standard Deviation - Example
(Deviation Score)
X
1
3
5
7
9
Mean
5
5
5
5
5
x
-4
-2
0
2
4
S(X - `X)2
2
x
16
4
0
4
16
x
2
N
40

8
5
2
x
  40
x
N
May 24, 2017
2
 8  2.83
Copyright 2000 - Robert J. Hall
43
Standard Deviation - Example (Raw
Score)
X
1
3
5
7
9
25
X2
1
9
25
49
81
165
S
X
252
165 
5  165  125 
5
5
May 24, 2017
Copyright 2000 - Robert J. Hall

X


2
2
N
N
40
 8  2.83
5
44
Organizational Chart - Variability
Describing variability
(differences between scores)
Descriptive measures are used
to describe a known sample of
scores or a population
In formulas final division uses N
To describe sample
variance compute
Sx2
To describe
population variance
compute sx2
Taking square root gives Taking square root gives
Sample standard
deviation
Sx
May 24, 2017
Population
standard deviation
sx
Inferential measures are used
to estimate the population
based on a sample
In formulas final division uses N - 1
To estimate
population variance
compute sx2
Taking square root gives
Estimate population
standard deviation
sx
Copyright 2000 - Robert J. Hall
45