Measures of Variation
Upward Bound Statistics
2014 Summer Academy
What is variation?
• Definition: Variation is how far the data are spread out from the mean.
• You can have a lot of variation or only a little.
• It is especially easy to see on a bell curve: a wide, flat curve has more variation than a tall, narrow one.
• Two measures: standard deviation and variance.
Standard Deviation
• Two kinds: sample standard deviation ($s$) and population standard deviation ($\sigma$). Don't worry about the second one for now!
• The formula is
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
where $x_i$ is each data value, $\bar{x}$ is the mean, and $n$ is the total number of data points.
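A minimal Python sketch of the sample formula, assuming the data arrive as a plain list of numbers; the function name `sample_std_dev` is hypothetical. Python's built-in `statistics.stdev` computes the same quantity with the same n − 1 divisor.

```python
import math

def sample_std_dev(data):
    """Sample standard deviation s: divide the squared deviations by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(data) / n                                # x-bar
    squared_devs = sum((x - mean) ** 2 for x in data)   # sum of (x_i - x-bar)^2
    return math.sqrt(squared_devs / (n - 1))            # square root of the average
```

For example, `sample_std_dev([2, 4, 4, 4, 5, 5, 7, 9])` has mean 5 and squared deviations summing to 32, so it returns the square root of 32/7, about 2.14.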
Standard Deviation Example
Find the standard deviation of these US gas prices in different cities from July 5, 2014.
Springfield, MA: $3.73
Hinsdale, NH: $3.63
Columbus, OH: $3.56
Detroit, MI: $3.73
Chicago, IL: $3.59
Cheyenne, WY: $3.37
Denver, CO: $3.52
Los Angeles, CA: $4.17
Omaha, NE: $3.21
Seattle, WA: $3.59
Portland, OR: $3.67
Baltimore, MD: $3.65
New York, NY: $3.88
Austin, TX: $3.71
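As a sketch, Python's standard `statistics` module can check this example, assuming the prices are typed in as a plain list in the order listed. `stdev` divides by n − 1 like the formula above; `pstdev` divides by n, which is the population version.

```python
import statistics

# Gas prices from the slide (July 5, 2014), in the same order as the list.
prices = [3.73, 3.63, 3.56, 3.73, 3.59, 3.37, 3.52,
          4.17, 3.21, 3.59, 3.67, 3.65, 3.88, 3.71]

mean = statistics.mean(prices)      # sum of prices / 14
s = statistics.stdev(prices)        # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(prices)   # population standard deviation (divides by n)

print(f"mean  = {mean:.4f}")
print(f"s     = {s:.4f}")
print(f"sigma = {sigma:.4f}")
```

With these 14 prices, the mean is about 3.6436, s comes out to about 0.2226, and the population value to about 0.2145.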
Variance
• Two kinds: sample variance ($s^2$) and population variance ($\sigma^2$). Don't worry about the second one for now!
• The formula is
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
where $x_i$ is each data value, $\bar{x}$ is the mean, and $n$ is the total number of data points.
• The variance is the square of the standard deviation, so to get the variance, find the standard deviation and square it.
• Why have both? The standard deviation is the more useful one for interpreting data (explained below), but the variance is larger than the standard deviation whenever s > 1 (and smaller whenever s < 1).
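A small sketch confirming that the variance formula and "square the standard deviation" agree, using a made-up data set (the numbers are hypothetical, not from the slides); `sample_variance` is our own helper name.

```python
import math
import statistics

def sample_variance(data):
    """Sample variance s^2: sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical example values
v = sample_variance(data)         # 32/7, about 4.571

# Squaring the sample standard deviation gives the same number (up to rounding),
# and so does the standard library's statistics.variance.
assert math.isclose(v, statistics.stdev(data) ** 2)
assert math.isclose(v, statistics.variance(data))
print(v)
```

Here s is about 2.14, which is greater than 1, so the variance (about 4.57) is the larger of the two.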
Variance Example
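Given the standard deviation of .2145 from the previous problem, find the variance. (Note: .2145 is what you get by dividing by n, the population formula; the sample formula, dividing by n − 1, gives s ≈ .2226.)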
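Worked out, the calculation is a single squaring. For comparison, the sum of squared deviations of the 14 gas prices is about 0.6441, so the sample formula (dividing by n − 1 = 13) gives a slightly larger value:
$$(0.2145)^2 \approx 0.0460 \qquad\qquad s^2 = \frac{0.6441}{14 - 1} \approx 0.0495$$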
Okay, why is this useful?
• The standard deviation can give us important information about data that fit a bell curve, at least approximately.
• According to the empirical rule, in a population:
  - about 68% of all values fall within 1 standard deviation of the mean;
  - about 95% of all values fall within 2 standard deviations of the mean;
  - about 99.7% of all values fall within 3 standard deviations of the mean.
• Probability
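Where do 68%, 95%, and 99.7% come from? Assuming the bell curve is a normal distribution, a short sketch with Python's `statistics.NormalDist` computes the exact share of values within k standard deviations of the mean:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

within_k = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations of the mean.
    within_k[k] = z.cdf(k) - z.cdf(-k)
    print(f"within {k} standard deviation(s): {within_k[k]:.2%}")
```

The empirical rule's percentages are these exact normal-curve areas, rounded.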
Empirical Rule Example
Heights of women have a bell-shaped distribution
with a mean of 63.6 inches and a standard deviation
of 2.5 inches. Using the empirical rule, what is the
approximate percentage of women between
a) 61.1 inches and 66.1 inches?
b) 56.1 inches and 71.1 inches?
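A sketch of the empirical-rule lookup for this example, assuming the interval is centered on the mean (both parts are); `empirical_rule_percent` is a hypothetical helper, not a library function.

```python
# Empirical rule: number of standard deviations -> approximate percentage.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_rule_percent(low, high, mean, sd):
    """Approximate % of a bell-shaped population between low and high.

    Only handles intervals symmetric about the mean that span exactly
    1, 2, or 3 standard deviations, which is all the empirical rule covers.
    """
    k_low = round((mean - low) / sd, 6)    # rounding hides floating-point noise
    k_high = round((high - mean) / sd, 6)
    if k_low != k_high or k_low not in EMPIRICAL_RULE:
        raise ValueError("interval is not mean ± 1, 2, or 3 standard deviations")
    return EMPIRICAL_RULE[int(k_low)]

print(empirical_rule_percent(61.1, 66.1, 63.6, 2.5))  # part a): 63.6 ± 1(2.5)
print(empirical_rule_percent(56.1, 71.1, 63.6, 2.5))  # part b): 63.6 ± 3(2.5)
```

Part a) is 1 standard deviation on each side (about 68%), and part b) is 3 standard deviations on each side (about 99.7%).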
Usefulness #2
• The standard deviation also lets us use Chebyshev's theorem, which applies to ALL data sets, not just those with a bell-shaped curve.
• For any data set, the fraction of values lying within $k$ standard deviations of the mean is always at least
$$1 - \frac{1}{k^2},$$
where $k$ is any number greater than 1.
• omg, what did you just say?
• If k = 2, the expression evaluates to 1 − 1/4 = 3/4, or 75%. That means at least 75% of any data set lies within 2 standard deviations of the mean.
• If k = 3, it evaluates to 1 − 1/9 = 8/9, or about 88.9%. That means at least 8/9 (about 89%) of any data set lies within 3 standard deviations of the mean.
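The two cases above can be reproduced exactly with Python's `fractions` module; `chebyshev_bound` is an illustrative helper name, and it assumes k > 1 as the theorem does here.

```python
from fractions import Fraction

def chebyshev_bound(k):
    """Minimum fraction of ANY data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    k = Fraction(k)          # exact arithmetic, so 8/9 stays 8/9
    return 1 - 1 / k ** 2

for k in (2, 3):
    bound = chebyshev_bound(k)
    print(k, bound, f"{float(bound):.1%}")
```

Using exact fractions makes the k = 3 case read as 8/9 rather than a rounded decimal.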
Chebyshev’s Theorem Example
If heights of women have a mean of 63.6 inches and a standard deviation of 2.5 inches, what can you conclude from Chebyshev's theorem about the percentage of women between 58.6 inches and 68.6 inches?
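A sketch of the Chebyshev calculation for this example, assuming the interval is symmetric about the mean (58.6 and 68.6 are each 5 inches, or 2 standard deviations, away from 63.6):

```python
from fractions import Fraction

mean, sd = 63.6, 2.5      # values from the example
low, high = 58.6, 68.6

# Both ends are the same distance from the mean, so k = distance / sd.
k = round((high - mean) / sd, 6)
assert k == round((mean - low) / sd, 6), "interval must be symmetric about the mean"

bound = 1 - Fraction(1) / Fraction(k) ** 2   # Chebyshev: at least 1 - 1/k^2
print(f"k = {k}; at least {bound} ({float(bound):.0%}) of women are in this range")
```

Chebyshev gives only a lower bound (at least 75%) because it assumes nothing about the shape of the data; if the heights are bell-shaped, the empirical rule would instead suggest about 95%.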
• StDev in Excel – 2 ways