Stat 1010: standard deviation
4.3 Measures of Variation (part 2)
• How much variation is there in the data?
• Look for the spread of the distribution.
• What do we mean by “spread”?
• Part 1: Range, the Quartiles, and IQR
• Part 2: Standard deviation (and Variance)
Limitations of measures of spread: Range
• The range only considers the most extreme values (min and max). The middle observations do not affect the range at all.
  ◦ These distributions can have the same range:
[Figure: two dot plots, labeled 1) and 2), each drawn on an axis from 0 to 10]
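As a rough illustration of the point above, the sketch below compares two small data sets. The values in `clustered` and `spread_out` are hypothetical (the slide's plotted data are not recoverable from the transcript); they are chosen only so that both sets share the same minimum and maximum.

```python
# Hypothetical data sets (not from the slide): same min and max, different middles.
clustered = [0, 4, 5, 5, 6, 10]    # middle values bunched near 5
spread_out = [0, 1, 2, 8, 9, 10]   # middle values pushed toward the extremes


def value_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)


# Both ranges are 10, even though the middle observations differ.
print(value_range(clustered))
print(value_range(spread_out))
```

Because only the minimum and maximum enter the calculation, changing any middle value leaves the range untouched.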
Limitations of measures of spread: 5-number summary
• The 5-number summary and the quartiles allow some of the middle numbers to contribute to the measure of spread (better).
  ◦ But we often like to summarize the spread with a single value (not 5 numbers).
• Can we find a measure of spread that allows EVERY observation to contribute to the measure AND is a single value?
Yes!
• The standard deviation.
  ◦ Involves a computation that includes EVERY observation.
  ◦ Provides a single summary value of the spread of a distribution.
Standard Deviation
• Based on the deviation from the mean.
  ◦ This is a distance from the center.
• What is a ‘deviation’?
$$(x_i - \bar{x})$$
Here $x_i$ is one observed value (observation $i$), and $\bar{x}$ is the mean computed from ALL observations (measure of center).
Golf Scores (n=6)
46, 44, 50, 43, 47, 52
$$\bar{x} = \frac{282}{6} = 47 \text{ strokes}$$
[Figure: dot plot of the six golf scores on an axis from 40 to 55]
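The mean of the golf scores can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the variable names are illustrative.

```python
# Golf scores from the slide (n = 6).
golf_scores = [46, 44, 50, 43, 47, 52]

total = sum(golf_scores)   # sum of all observations
n = len(golf_scores)       # number of observations
mean = total / n           # sample mean (x-bar)

print(f"mean = {total}/{n} = {mean} strokes")
```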
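Deviations (distances from the mean)
Graphically
[Figure: dot plot of the golf scores on an axis from 40 to 55, with the deviations –4, –3, –1, +3, and +5 drawn as distances from the mean]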
Deviations (distances from the mean)
Observed value – mean = deviation
Numerically
43 – 47 = –4
44 – 47 = –3
46 – 47 = –1
47 – 47 = 0
50 – 47 = +3
52 – 47 = +5
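The same deviations can be computed programmatically. The sketch below sorts the scores first only so that the printed lines appear in increasing order; the names are illustrative.

```python
# Golf scores from the slide.
golf_scores = [46, 44, 50, 43, 47, 52]
mean = sum(golf_scores) / len(golf_scores)

# Deviation = observed value minus the mean, one per observation.
ordered = sorted(golf_scores)
deviations = [x - mean for x in ordered]

for x, d in zip(ordered, deviations):
    print(f"{x} - {mean:g} = {d:+g}")
```

A negative deviation means the score is below the mean, and a positive deviation means it is above.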
Standard Deviation
• Standard deviation is a measure of the average of all the distances (or absolute deviations) from the mean.
• We can think of it as the average distance the observations are from the mean.
• Larger standard deviation ⇒ more spread.
• Smaller standard deviation ⇒ less spread.
Standard Deviation
• Because the actual mean of the deviations is always zero, we instead focus on the mean of the squared deviations.
• Mean of the deviations:
$$\frac{-4 + 5 + (-3) + 3 + (-1) + 0}{6} = \frac{0}{6} = 0$$
  ◦ Also, we divide by n-1 rather than n for technical reasons.
  ◦ And we take the square root so we can work in our original units (not squared units).
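A short sketch can confirm that the raw deviations cancel out, which is why they are squared before averaging. It uses the golf scores from the earlier slide; the variable names are illustrative.

```python
# Golf scores from the slide.
golf_scores = [46, 44, 50, 43, 47, 52]
mean = sum(golf_scores) / len(golf_scores)

deviations = [x - mean for x in golf_scores]

# Positive and negative deviations cancel, so their mean is zero.
mean_deviation = sum(deviations) / len(deviations)

# Squaring removes the signs, so the squared deviations do not cancel.
squared_deviations = [d ** 2 for d in deviations]
sum_squared = sum(squared_deviations)

print(mean_deviation)
print(sum_squared)
```

The sum of squared deviations is the quantity that gets divided by n-1 in the next step.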
Standard Deviation
• The letter s usually represents the standard deviation.
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
$$s = \sqrt{\frac{\sum (\text{deviations from the mean})^2}{\text{total number of observations} - 1}}$$
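Standard Deviation
• Golf score example.
Taking 43 as the 1st observation ($x_1$), with mean $\bar{x} = 47$, its squared deviation is $(43 - 47)^2 = (-4)^2 = 16$. With 6 observations altogether:
$$s = \sqrt{\frac{16 + 9 + 1 + 25 + 9 + 0}{6 - 1}} = \sqrt{\frac{60}{5}} = \sqrt{12} \approx 3.5 \text{ strokes}$$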
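The golf score calculation can be reproduced with a small function. This is a minimal sketch: `sample_std` is an illustrative name, not something from the course, and the result is cross-checked against `statistics.stdev` from the Python standard library, which also divides by n-1.

```python
import math
import statistics


def sample_std(data):
    """Sample standard deviation: sqrt of (sum of squared deviations / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    sum_squared = sum((x - mean) ** 2 for x in data)
    return math.sqrt(sum_squared / (n - 1))


golf_scores = [46, 44, 50, 43, 47, 52]
s = sample_std(golf_scores)

print(round(s, 2))                                # about 3.46, which the slide rounds to 3.5
print(round(statistics.stdev(golf_scores), 2))    # library value for comparison
```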
Technical note: Variance
• A commonly discussed measure of spread.
• Almost the average squared deviation.
• Denoted as $s^2$ (where s is the standard deviation).
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Variance
• Golf score example.
$$s^2 = \frac{16 + 9 + 1 + 25 + 9 + 0}{5} = \frac{60}{5} = 12 \text{ strokes}^2$$
  ◦ Notice the units of the variance are strokes².
Standard deviation is nicer to work with than variance because
it goes back to strokes (original units, not squared units).
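The relationship between variance and standard deviation can be sketched with the standard library. `statistics.variance` computes the sample variance (dividing by n-1), matching the formula on the slides; the variable names are illustrative.

```python
import statistics

golf_scores = [46, 44, 50, 43, 47, 52]

# Sample variance, in squared units (strokes squared).
variance = statistics.variance(golf_scores)

# Taking the square root returns to the original units (strokes).
std_dev = variance ** 0.5

print(variance)
print(round(std_dev, 2))
```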
Center & Spread Choice
(General guidelines)
• For symmetrical distributions (like the bell-shaped normal distribution)
  ◦ Use Mean (center) and Standard Deviation (spread)
  ◦ Both of these measures are affected by outliers, so we don’t usually use them for skewed distributions, but they’re very useful for symmetrical distributions.
• For skewed distributions
  ◦ Use Median (center) and 5-Number Summary (spread)
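To see why the mean and standard deviation are sensitive to outliers, the sketch below changes one golf score to an extreme value. The data set `with_outlier` is hypothetical (the score of 82 is invented for illustration and does not appear in the slides).

```python
import statistics

golf_scores = [46, 44, 50, 43, 47, 52]
# Hypothetical: the highest score (52) replaced by an extreme value (82).
with_outlier = [46, 44, 50, 43, 47, 82]

for label, data in [("original", golf_scores), ("with outlier", with_outlier)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    std_dev = statistics.stdev(data)   # sample standard deviation (n - 1)
    print(f"{label}: mean={mean}, median={median}, s={std_dev:.2f}")
```

In this sketch the median is unchanged by the single extreme score, while the mean and standard deviation both shift noticeably, which is consistent with the guideline above.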