Descriptive Statistics
Measures of Variation

Essentials

Measures of Variation



Range

Variance

Standard Deviation

Interquartile Range (in Measures of Position)
Empirical Rule

Chebychev’s Theorem (in Additional Topics)

Example
Additional Topics
Essentials: Measures of Variation
(Variation – a must for statistical analysis.)

Know the types of measures used to look at variation and the type of data
to which they apply.

Be able to calculate the range, standard deviation, and interquartile
range.

Be able to determine the distance away from the mean a given value lies
in terms of standard deviations (think z-score).

Be able to apply the Empirical Rule and Chebychev’s Theorem to
specific situations.
Measures of Variation

Range

Variance

Standard Deviation

Interquartile Range

(IQR; see Measures of Position)
Range

The Range of a data set is the difference between the highest
value and the lowest value.

Example: Given the following data values, identify the range of the
distribution.
Values: 2, 4, 6, 8, 10
Range = 10 – 2 = 8
Variance

For a sample the variance is a measure of variation equal to the
sum of the squared deviation scores divided by n-1. It is also the
square of the standard deviation.
Sample Variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Sample Standard Deviation

Standard deviation is a measure of the typical amount an entry
deviates (or varies) from the mean.

The more the entries are spread out, the greater the standard
deviation.

Sample Standard Deviation (s):
Definition Formula:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Calculation Formula:
$$s = \sqrt{\frac{n \sum x^2 - \left( \sum x \right)^2}{n(n - 1)}}$$
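The two formulas for the sample standard deviation should give the same result. The sketch below is an illustration rather than part of the slides: the function names are made up for this example, and the data are the values 2, 4, 6, 8, 10 from the range example.

```python
import math

def stdev_definition(values):
    """Sample standard deviation from the definition formula."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

def stdev_calculation(values):
    """Sample standard deviation from the calculation (shortcut) formula."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

values = [2, 4, 6, 8, 10]  # data from the range example
print(stdev_definition(values))
print(stdev_calculation(values))
```

Both calls return the square root of 10, about 3.16. The calculation formula needs only the running totals of x and x², which is why it is convenient for hand computation.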
Source: Larson/Farber 4th ed.
Anatomy of the Standard Deviation
The Standard Deviation is the most commonly used measure of dispersion (how spread out the data are).
The value of the Standard Deviation tells us how closely the observations in a data set are
clustered around the mean. A small value of the Standard Deviation indicates that the
values of the data set are spread over a relatively small range around the mean. A large value of
the Standard Deviation indicates that the values are spread over a
relatively large range around the mean.
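A small sketch can make this concrete. The two data sets below are hypothetical values chosen for illustration (they are not from the slides). Both have a mean of 120, but their standard deviations differ by a factor of ten.

```python
import statistics

tight = [118, 119, 120, 121, 122]    # hypothetical values clustered near the mean
spread = [100, 110, 120, 130, 140]   # hypothetical values far from the mean

for name, data in (("tight", tight), ("spread", spread)):
    # statistics.stdev computes the sample standard deviation (divides by n - 1)
    print(name, statistics.mean(data), round(statistics.stdev(data), 2))
```

The standard deviations come out to about 1.58 and 15.81, even though the means are identical.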
[Figure: two frequency histograms. First panel: Mean = 120, Standard Deviation = 2, n = 500; horizontal axis from about 112 to 127. Second panel: Mean = 120, Standard Deviation = 20, n = 500; horizontal axis from about 80 to 180. Vertical axis in both panels: Frequency.]
NOTATION
When we refer to the Population Standard Deviation, it is denoted by σ.
When we refer to the Sample Standard Deviation, it is denoted by s.
Interpreting Standard Deviation: Empirical Rule
(68 – 95 – 99.7 Rule)
For data with a (symmetric) bell-shaped distribution, the
standard deviation has the following characteristics:
• About 68.26% of the data lie within one standard deviation of the
mean.
• About 95.44% of the data lie within two standard deviations of the
mean.
• About 99.74% of the data lie within three standard deviations of
the mean.
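The percentages in the Empirical Rule can be checked against an exactly normal distribution. That is an assumption of this sketch, since real bell-shaped data are only approximately normal. The function name `share_within` is hypothetical; `math.erf` is the standard-library error function.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
```

This gives about 68.27%, 95.45% and 99.73%. The figures quoted above differ in the last digit, most likely because they were taken from a rounded z-table.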
[Figure: bell-shaped curve with the horizontal axis marked at x̄ – 3s, x̄ – 2s, x̄ – s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s. The areas between adjacent marks are, from left to right, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, so about 68% of the data lie within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. Source: Larson/Farber 4th ed.]
Example: Using the Empirical Rule
In a survey conducted by the National Center for
Health Statistics, the sample mean height of women
in the United States (ages 20-29) was 64 inches, with
a sample standard deviation of 2.71 inches. Estimate
the percent of the women whose heights are between
64 inches and 69.42 inches.
Source: Larson/Farber 4th ed.
Solution: Using the Empirical Rule
• Because the distribution is bell-shaped, you can use the
Empirical Rule.
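[Figure: bell-shaped curve of heights in inches, marked at x̄ – 3s = 55.87, x̄ – 2s = 58.58, x̄ – s = 61.29, x̄ = 64, x̄ + s = 66.71, x̄ + 2s = 69.42, x̄ + 3s = 72.13. The region from 64 to 69.42 covers the 34% and 13.5% bands.]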
34% + 13.5% = 47.5% of women are between 64 and 69.42 inches
tall. (64 + 2.71 = 66.71 and 66.71 + 2.71 = 69.42 inches, so 69.42 inches is two standard deviations above the mean.)
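The same reasoning can be sketched in code. The helper `z_score` and the `band_percent` list are assumptions made for this illustration. The band percentages are the approximate Empirical Rule values, and the sketch only handles an interval that starts at the mean and ends a whole number of standard deviations above it.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

mean, std_dev = 64, 2.71  # sample statistics from the example (inches)

# Approximate share of data in each one-standard-deviation band above the mean,
# taken from the Empirical Rule diagram.
band_percent = [34.0, 13.5, 2.35]

z_low = z_score(64, mean, std_dev)
z_high = z_score(69.42, mean, std_dev)
bands = round(z_high - z_low)  # number of whole bands between the two heights
print(z_low, round(z_high, 2), sum(band_percent[:bands]))
```

The z-scores are 0 and 2, so the interval covers the first two bands above the mean, 34% + 13.5% = 47.5%.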
Source: Larson/Farber 4th ed.
ADDITIONAL TOPICS
Range Rule of Thumb

To obtain a rough estimate of the standard deviation, s, divide the range by 4:
$$s \approx \frac{\text{range}}{4}$$
Conversely, the “minimum” value would be approximately equal to
the mean – 2*(standard deviation). The “maximum” value would
be approximately equal to the mean + 2*(standard deviation).
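A minimal sketch of the rule in both directions, using the height statistics from the Empirical Rule example (mean 64 inches, standard deviation 2.71 inches). The function names are hypothetical, and the results are rough estimates only.

```python
def estimate_std_dev(minimum, maximum):
    """Rough standard deviation: range divided by 4."""
    return (maximum - minimum) / 4

def estimate_min_max(mean, std_dev):
    """Rough 'minimum' and 'maximum': mean minus/plus 2 standard deviations."""
    return mean - 2 * std_dev, mean + 2 * std_dev

low, high = estimate_min_max(64, 2.71)
print(round(low, 2), round(high, 2))
print(round(estimate_std_dev(low, high), 2))
```

The estimated "minimum" and "maximum" are 58.58 and 69.42 inches. Feeding them back into the range rule recovers 2.71, as expected, because the two directions are inverses of each other.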
Population Variance & Standard Deviation

The population variance, σ² (sigma-squared), is a measure of
variation equal to the sum of the squared deviation scores divided
by N. It is also the square of the population standard deviation.
Population Variance:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Population Standard Deviation:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
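The only difference between the population and sample formulas is the divisor (N versus n – 1). Python's standard `statistics` module provides both versions. The sketch below applies them to the values 2, 4, 6, 8, 10 from the range example, treating the same numbers first as a whole population and then as a sample.

```python
import statistics

values = [2, 4, 6, 8, 10]

population_variance = statistics.pvariance(values)  # divides by N
sample_variance = statistics.variance(values)       # divides by n - 1
print(population_variance, sample_variance)
print(statistics.pstdev(values), statistics.stdev(values))
```

The sum of squared deviations is 40 in both cases, so the population variance is 40/5 = 8 and the sample variance is 40/4 = 10.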
Chebyshev’s Theorem


The Empirical Rule applies if the distribution of the
data is approximately bell-shaped.
Chebyshev’s Theorem applies to distributions
regardless of shape. It states that the proportion
(fraction) of data lying within K standard deviations
of the mean is always at least $1 - \frac{1}{K^2}$, where K is
any number greater than 1.

When K = 2: At least $1 - \frac{1}{2^2} = \frac{3}{4}$ (75%) of
all values lie within 2 standard
deviations of the mean.
When K = 3: At least $1 - \frac{1}{3^2} = \frac{8}{9}$ (88.9%) of
all values lie within 3 standard
deviations of the mean.
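The bound is simple enough to compute directly. This is a minimal sketch; the function name `chebyshev_lower_bound` is made up for the illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, round(100 * chebyshev_lower_bound(k), 1))
```

For k = 2 and k = 3 this gives 75.0% and 88.9%. These are lower bounds: for bell-shaped data the actual shares are much higher (about 95% and 99.7%).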
Example: Using Chebychev’s Theorem
The age distribution for Florida is shown in the
histogram. Apply Chebychev’s Theorem to the
data using k = 2. What can you conclude?
Source: Larson/Farber 4th ed.
Solution: Using Chebychev’s Theorem
Given k = 2:
Two S.D. below the mean = μ – 2σ = 39.2 – 2(24.8) = -10.4
(use 0 since age can’t be negative)
Two S.D. above the mean = μ + 2σ = 39.2 + 2(24.8) = 88.8
At least 75% of the population of Florida is between 0 and
88.8 years old.
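The interval in this solution can be reproduced with a short sketch. The function and its `lower_limit` parameter are hypothetical, added here to mirror the "use 0 since age can't be negative" step.

```python
def chebyshev_interval(mean, std_dev, k, lower_limit=None):
    """Interval mean ± k standard deviations, optionally clipped from below."""
    low = mean - k * std_dev
    high = mean + k * std_dev
    if lower_limit is not None:
        low = max(low, lower_limit)  # e.g. ages cannot be negative
    return low, high

low, high = chebyshev_interval(39.2, 24.8, k=2, lower_limit=0)
print(low, round(high, 1))
```

With μ = 39.2 and σ = 24.8, the unclipped lower end is about -10.4, so the interval becomes 0 to 88.8 years.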
Source: Larson/Farber 4th ed.
End of Slides
Empirical Rule (68-95-99.7 Rule)
The Empirical Rule states that if the
distribution of the data is approximately bell-shaped, then:
• Approx. 68.26% of the observations fall within
1 standard deviation of the mean.
• Approx. 95.44% of the observations fall within
2 standard deviations of the mean.
• Approx. 99.74% of the observations fall within
3 standard deviations of the mean.

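[Figure: Empirical Rule bell curve with the horizontal axis marked at x̄ – 3s, x̄ – 2s, x̄ – s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s. 68% of data are within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. Band areas from left to right: 0.1%, 2.4%, 13.5%, 34%, 34%, 13.5%, 2.4%, 0.1%.]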
Chebychev’s Theorem

The portion of any data set lying within k standard deviations (k > 1) of
the mean is at least:
$$1 - \frac{1}{k^2}$$
• k = 2: In any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, or 75%,
of the data lie within 2 standard deviations of the
mean.
• k = 3: In any data set, at least $1 - \frac{1}{3^2} = \frac{8}{9}$, or 88.9%,
of the data lie within 3 standard deviations of the
mean.
Source: Larson/Farber 4th ed.
Interquartile Range (IQR)
• The Interquartile Range is a measure of variation. It is the
difference between the first quartile, Q1 (25th percentile), and the
third quartile, Q3 (75th percentile): IQR = Q3 – Q1.
• The Interquartile Range enables us to determine the existence of
outliers.

Outliers exist in a data set if any of the values are
• Less than Q1 – 1.5(IQR)
or
• Greater than Q3 + 1.5(IQR)
Outliers

An Outlier is a value (or values) that is located
very far away from almost all of the other
values in a data set. An outlier can:
• Have a dramatic effect on the mean.
• Have a dramatic effect on the standard deviation.
• Have an effect so dramatic on the scale of a
histogram that the true nature of the distribution is
totally obscured.
Finding a Standard Deviation From a
Frequency Table
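To find a standard deviation when data is presented in the form of a frequency
table, use
$$s = \sqrt{\frac{\sum \left( f \cdot x_m^2 \right) - \left[ \left( \sum f \cdot x_m \right)^2 / n \right]}{n - 1}}$$
As was the case when the mean from a frequency table was calculated, $x_m$ is the
class midpoint. Here f is the class frequency and n = Σf is the total number of observations.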
Standard Deviation

Standard Deviation – a measure of variation of values about the mean.

Sample Standard Deviation
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum \left( x^2 \right) - \left[ \left( \sum x \right)^2 / n \right]}{n - 1}}$$

Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
What is the correct pronunciation of the capital of Kentucky?
A) “Loui’ville
B) “Lewis”ville