Statistical Reasoning for Everyday Life
Intro to Probability and Statistics
Mr. Spering – Room 113
4.3 Measures of Variation

“Top of the Muffin to
you!” ?????
Variation:
• Describes how widely data are spread out about the center of a distribution.

How would you expect the variation to differ between the running times of theatre movies and the running times of television sitcoms?

• Theatre movie times: more variation
• Television sitcoms: less variation, usually 30 or 60 minutes

How do we investigate variation?
• Study all of the raw data…
• Range…
• Quartiles…
• Five-number summary (BOXPLOT or BOX-and-WHISKER)…
• Interquartile range…
• Semi-quartile range…
• Percentiles…
• MAD…
• Variance & Standard Deviation…

Today:
• Semi-quartile range…
• Percentiles…
• MAD…
• Variance & Standard Deviation…
MAD???
65th
Percentile!

Semi-quartile range

The semi-quartile range (also called the semi-interquartile range) is another measure of spread. It is calculated as one half the difference between the upper quartile (Q3) and the lower quartile (Q1). The formula for the semi-quartile range is:
(Q3 – Q1) ÷ 2
Since half the values in a distribution lie between Q1 and Q3, the semi-quartile range is one-half the distance needed to cover the middle half of the values. In a symmetric distribution, an interval stretching from one semi-quartile range below the median to one semi-quartile range above the median contains one-half of the values. This is not true for a skewed distribution.
The semi-quartile range is hardly affected by extreme values, so it is a good measure of spread for skewed distributions. It is rarely used for data sets that have normal distributions; for those, the standard deviation is used instead. We will discuss standard deviation later.




EXAMPLE: Find the semi-quartile range of the data.

Semi-quartile range = (Q3 – Q1) ÷ 2

4.1, 5.2, 5.6, 6.2, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0
Q1 = 5.6 (median of the lower five values)
Q3 = 8.5 (median of the upper five values)
Semi-quartile range = (8.5 – 5.6) ÷ 2 = 1.45
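The calculation above can be checked with a short Python sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, which matches the Q1 and Q3 shown in the example; the function names `quartiles` and `semi_quartile_range` are illustrative only. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of values, the middle value is left
    out of both halves. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(upper)

def semi_quartile_range(values):
    """Half the distance between the upper and lower quartiles."""
    q1, q3 = quartiles(values)
    return (q3 - q1) / 2

data = [4.1, 5.2, 5.6, 6.2, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
q1, q3 = quartiles(data)
sqr = semi_quartile_range(data)
print(q1, q3, round(sqr, 2))  # 5.6 8.5 1.45
```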






Percentiles
The nth percentile of a data set is (an estimate of) a value separating the bottom n% of the values from the top (100 – n)%. A data value that lies between two percentiles is often said to lie in the lower percentile. You can approximate the percentile of any data value with the following formula:
$$\text{percentile} = \frac{\text{number of values} \le \text{this data value}}{\text{total number of values}} \times 100$$

EXAMPLE: Percentiles.
What percentile is the lowest score, Q1, Q2, Q3, and the highest score?
• 4.1, 5.2, 5.6, 6.2, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0
• Lowest number = 0th percentile
• Q1 = 5.6 = 25th percentile
• Q2 = (7.2 + 7.7)/2 = 7.45 = 50th percentile
• Q3 = 8.5 = 75th percentile
• Highest number = 100th percentile

EXAMPLE: Percentiles.
What percentile is 9.3?
• 4.1, 5.2, 5.6, 6.2, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0
• 9.3 is the 9th number out of ten once the numbers are set in ascending order. Therefore, 9 of the ten values are less than or equal to it, which puts it at the 90th percentile.
Note: One quartile spans 25 percentiles, one decile spans 10 percentiles, and one quintile spans 20 percentiles.
Think about it:
P25 = Q1, P50 = D5 = Q2 = median value, P75 = Q3, P100 = D10 = Q4, P10 =
D1, P20 = D2, P30 = D3, P40 = D4, P60 = D6, P70 = D7, P80 = D8, P90 = D9
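As a rough check of the 9.3 example, here is a minimal Python sketch of the percentile approximation. It assumes the count in the formula is "values less than or equal to this data value"; the function name `percentile_of` is made up for illustration. Under that assumption the lowest value comes out as the 10th percentile rather than the 0th, so the formula is only an approximation at the extremes.

```python
def percentile_of(value, values):
    """Approximate percentile: share of values <= value, times 100."""
    count = sum(1 for v in values if v <= value)
    return 100 * count / len(values)

data = [4.1, 5.2, 5.6, 6.2, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
p_93 = percentile_of(9.3, data)
p_max = percentile_of(11.0, data)
print(p_93, p_max)  # 90.0 100.0
```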


MAD: Mean Absolute Deviation
MAD is the mean of the absolute differences between the “sample mean” and the data values.
$$\mathrm{MAD} = \frac{\sum |x - \bar{x}|}{n}$$

Example: Find the MAD (Mean Absolute Deviation).

DATA: 10, 1, 3, 3, 3, 4, 5, 6, 5, 10
Mean = 5
$\sum |x - \bar{x}|$ = 5 + 4 + 2 + 2 + 2 + 1 + 0 + 1 + 0 + 5 = 22

MAD = 22/10 = 2.2
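The same MAD calculation can be sketched in Python. This is only an illustration of the definition (mean of the absolute deviations from the sample mean); the function name `mean_absolute_deviation` is hypothetical.

```python
from statistics import mean

def mean_absolute_deviation(values):
    """Mean of the absolute differences between each value and the mean."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)

data = [10, 1, 3, 3, 3, 4, 5, 6, 5, 10]
mad = mean_absolute_deviation(data)
print(mean(data), mad)  # 5 2.2
```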

Variance…

The variance of a random variable is a measure of statistical dispersion found by averaging the squared distances of its possible values from the mean. Whereas the mean describes the location of a distribution, the variance captures its scale, or degree of being spread out. The unit of variance is the square of the unit of the original variable.

Example: Find the Variance.


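DATA: 10, 1, 3, 3, 3, 4, 5, 6, 5, 10
Mean = 5
Squared deviations: 5², 4², 2², 2², 2², 1², 0², 1², 0², 5²
∑ = 25 + 16 + 4 + 4 + 4 + 1 + 0 + 1 + 0 + 25 = 80

Variance (s²) = 80/(10 – 1) = 80/9 = 8 8/9 ≈ 8.89
(For a sample, the sum of squared deviations is divided by n – 1 rather than n.)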
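A short Python sketch of the variance example follows. It assumes the sample variance (division by n – 1), which is what the 80/9 in the example implies, and cross-checks the hand calculation against `statistics.variance` from the standard library.

```python
from statistics import mean, variance

data = [10, 1, 3, 3, 3, 4, 5, 6, 5, 10]

center = mean(data)
# Squared deviation of each value from the mean.
squared_deviations = [(x - center) ** 2 for x in data]
total = sum(squared_deviations)

# Sample variance: divide by n - 1, not n.
s2 = total / (len(data) - 1)
print(total, round(s2, 2))  # 80 8.89
```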


Standard Deviation…

The most widely used measure of statistical dispersion.
Standard deviation was developed because there is a problem with variance. Recall that the deviations were squared, which means the units were squared too. To get back to the same units as the original data values, the square root must be taken.




Example: Find the Standard Deviation
DATA: 10, 1, 3, 3, 3, 4, 5, 6, 5, 10
Mean = 5
Squared deviations: 5², 4², 2², 2², 2², 1², 0², 1², 0², 5²
∑ = 25 + 16 + 4 + 4 + 4 + 1 + 0 + 1 + 0 + 25 = 80
Variance (s²) = 80/9 = 8 8/9 ≈ 8.89

Standard Deviation = $\sqrt{\text{Variance}} = \sqrt{8\tfrac{8}{9}} \approx 2.981$
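The square-root step can be verified with a brief Python sketch. It assumes the sample standard deviation (square root of the n – 1 variance) and compares the manual result with `statistics.stdev`.

```python
from math import sqrt
from statistics import stdev

data = [10, 1, 3, 3, 3, 4, 5, 6, 5, 10]

sample_variance = 80 / 9          # from the worked example
s = sqrt(sample_variance)         # back to the original units
print(round(s, 3))                # 2.981
```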

The Range Rule of Thumb…

The standard deviation s is approximately related to the range of a distribution by the following:
$$s \approx \frac{\text{range}}{4}$$

We can use this rule of thumb to estimate the low and high
values:
low value ≈ mean – 2 × standard deviation
high value ≈ mean + 2 × standard deviation

The range rule of thumb does not work well when low and high
values are extreme outliers. Therefore, use careful judgment in
deciding whether the range rule of thumb is applicable.
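To get a feel for how rough the rule is, the sketch below applies range ÷ 4 to the ten-value data set used in the earlier MAD and variance examples and compares it with the sample standard deviation. Reusing that data set here is a choice made for illustration, not something the slides do.

```python
from statistics import stdev

data = [10, 1, 3, 3, 3, 4, 5, 6, 5, 10]

# Range rule of thumb: standard deviation is roughly range / 4.
estimate = (max(data) - min(data)) / 4
actual = stdev(data)
print(estimate, round(actual, 3))  # 2.25 2.981
```

The estimate is in the right neighborhood but noticeably low for this small data set, which is why the rule is only a rule of thumb.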
The Range Rule of Thumb…
EXAMPLE:
The mean score on the mathematics SAT for women is 496, and the standard deviation is 108. Use the range rule of thumb to estimate the minimum and maximum scores for women on the mathematics SAT.
low value ≈ mean – 2 × standard deviation = 496 – (2 × 108) = 280 minimum
high value ≈ mean + 2 × standard deviation = 496 + (2 × 108) = 712 maximum
Is this reasonable?
Yes. Scores below 280 or above 712 are unusual on the SAT.
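The SAT estimate can be reproduced with a minimal Python sketch of the rule of thumb. The function name `usual_range` is hypothetical; it simply returns mean ± 2 standard deviations.

```python
def usual_range(mean, standard_deviation):
    """Range rule of thumb: (low, high) = mean -/+ 2 standard deviations."""
    low = mean - 2 * standard_deviation
    high = mean + 2 * standard_deviation
    return low, high

low, high = usual_range(496, 108)
print(low, high)  # 280 712
```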

HOMEWORK:



Pg 174 # 3
Pg 175 # 9, 10, and 14
Pg 176 # 24, pg 176 # 25-27 all (Letters c, d only)