Section 4.4: Interpreting Center and Variability: Chebyshev’s Rule, the Empirical Rule, and z-scores
• To interpret the mean and standard deviation together, it is useful to describe how far a particular observation is from the mean in terms of standard deviations.
• Suppose we have a data set of scores on a
standardized test with mean of 100 and
standard deviation of 15. We can make the
following statements:
– Because 100 – 15 = 85, we say that a score of 85 is “1 standard deviation below the mean.” Similarly, 100 + 15 = 115 is “1 standard deviation above the mean.”
– Because 2 standard deviations is 2(15) = 30, and 100 + 30 = 130 and 100 – 30 = 70, scores between 70 and 130 are within 2 standard deviations of the mean.
– Because 100 + (3)(15) = 145, scores above 145 exceed the mean by more than 3 standard deviations.
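The arithmetic in these statements can be sketched in a few lines of Python. The helper name `sd_band` is illustrative and not part of the slides; the mean of 100 and standard deviation of 15 are the values from the example.

```python
def sd_band(mean, sd, k):
    """Return the (low, high) endpoints of the interval within k standard deviations of the mean."""
    return mean - k * sd, mean + k * sd

mean, sd = 100, 15  # standardized test example from the text
for k in (1, 2, 3):
    low, high = sd_band(mean, sd, k)
    print(f"within {k} standard deviation(s): {low} to {high}")
```

A score outside the k = 3 band on the high side (above 145) is the case described in the last statement.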
Chebyshev’s Rule
Consider any number k, where k ≥ 1. Then the percentage of observations that are within k standard deviations of the mean is at least

$$100\left(1 - \frac{1}{k^2}\right)\%.$$
For specific values of k, Chebyshev’s Rule reads:
• At least 75% of the observations are within 2 standard deviations of the mean.
• At least 89% of the observations are within 3 standard deviations of the mean.
• At least 90% of the observations are within 3.16 standard deviations of the mean.
• At least 94% of the observations are within 4 standard deviations of the mean.
• At least 96% of the observations are within 5 standard deviations of the mean.
• At least 99% of the observations are within 10 standard deviations of the mean.
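The percentages listed for specific values of k follow directly from the formula 100(1 – 1/k²)%. A minimal sketch, assuming a hypothetical helper named `chebyshev_bound`:

```python
def chebyshev_bound(k):
    """Lower bound (in percent) on the share of observations within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("Chebyshev's Rule is stated here for k >= 1")
    return 100 * (1 - 1 / k**2)

# The values of k used in the list above
for k in (2, 3, 3.16, 4, 5, 10):
    print(f"k = {k}: at least {chebyshev_bound(k):.2f}%")
```

Printing two decimal places shows where the rounded figures in the list come from, for example 88.89% for k = 3 and 93.75% for k = 4.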
Example – Chebyshev’s Rule
• Consider the student age data
17 18 18 18 18 18 19 19 19 19
19 19 19 19 19 19 19 19 19 19
19 19 19 19 19 19 20 20 20 20
20 20 20 20 20 20 21 21 21 21
21 21 21 21 21 21 21 21 21 21
22 22 22 22 22 22 22 22 22 22
22 23 23 23 23 23 23 24 24 24
25 26 28 28 30 37 38 44 47
Color code: each value is marked as within 1, 2, 3, 4, or 5 standard deviations of the mean.
Example continued
Interval                                    Chebyshev’s   Actual
within 1 standard deviation of the mean     ≥ 0%          72/79 = 91.1%
within 2 standard deviations of the mean    ≥ 75%         75/79 = 94.9%
within 3 standard deviations of the mean    ≥ 88.8%       76/79 = 96.2%
within 4 standard deviations of the mean    ≥ 93.8%       77/79 = 97.5%
within 5 standard deviations of the mean    ≥ 96.0%       79/79 = 100%
• Notice that Chebyshev’s Rule gives very conservative lower bounds: the guaranteed values fall well short of the actual percentages, especially for small k.
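The comparison between Chebyshev’s bounds and the actual percentages can be reproduced from the 79 student ages. The sketch below assumes the standard deviation is the sample standard deviation (n – 1 divisor), which the slides do not state explicitly; the names `age_counts` and `percent_within` are illustrative.

```python
from statistics import mean, stdev

# Student ages from the example, entered as (age, number of students) pairs
age_counts = [(17, 1), (18, 5), (19, 20), (20, 10), (21, 14), (22, 11),
              (23, 6), (24, 3), (25, 1), (26, 1), (28, 2), (30, 1),
              (37, 1), (38, 1), (44, 1), (47, 1)]
ages = [age for age, count in age_counts for _ in range(count)]

m, s = mean(ages), stdev(ages)  # assumption: sample standard deviation

def percent_within(data, center, spread, k):
    """Percentage of observations within k standard deviations of the center."""
    inside = sum(1 for x in data if abs(x - center) <= k * spread)
    return 100 * inside / len(data)

for k in range(1, 6):
    bound = 100 * (1 - 1 / k**2)  # Chebyshev's lower bound
    actual = percent_within(ages, m, s, k)
    print(f"k = {k}: Chebyshev >= {bound:.1f}%, actual {actual:.1f}%")
```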
Empirical Rule
• If the histogram of values in a data set is reasonably symmetric and unimodal (specifically, is reasonably approximated by a normal curve), then
1. Approximately 68% of the observations are within 1 standard deviation of the mean.
2. Approximately 95% of the observations are within 2 standard deviations of the mean.
3. Approximately 99.7% of the observations are within 3 standard deviations of the mean.
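The three percentages in the Empirical Rule are the areas under a normal curve within 1, 2, and 3 standard deviations of its mean. As a rough check, assuming an exactly normal distribution, that area can be computed with the error function from Python’s standard library; the function name `normal_percent_within` is illustrative.

```python
from math import erf, sqrt

def normal_percent_within(k):
    """Percentage of a normal distribution lying within k standard deviations of its mean."""
    return 100 * erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_percent_within(k):.2f}%")
```

Real data are only approximately normal, which is why the rule is stated with "approximately."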
Z-Score
The z-score corresponding to a particular observation in a data set is

$$\text{z-score} = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$$
• The z-score is how many standard deviations the observation is from the mean.
• A positive z-score indicates the observation is above the mean, and a negative z-score indicates the observation is below the mean.
• Computing the z-score is often referred to as standardization, and the z-score is called a standardized score.
The formula used with sample data is

$$\text{z-score} = \frac{x - \bar{x}}{s}$$
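As a small illustration of standardization, the sketch below applies the z-score formula to the standardized test example from the start of the section (mean 100, standard deviation 15). The function name `z_score` is an assumption made for this example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Scores from the standardized test example: mean 100, standard deviation 15
for score in (85, 100, 130):
    print(f"score {score}: z = {z_score(score, 100, 15):+.2f}")
```

The signs match the interpretation above: 85 is below the mean (negative z-score) and 130 is above it (positive z-score).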
Example
A sample of GPAs of 38 statistics students appears below (sorted in increasing order):
2.00 2.25 2.36 2.37 2.50 2.50 2.60
2.67 2.70 2.70 2.75 2.78 2.80 2.80
2.82 2.90 2.90 3.00 3.02 3.07 3.15
3.20 3.20 3.20 3.23 3.29 3.30 3.30
3.42 3.46 3.48 3.50 3.50 3.58 3.75
3.80 3.83 3.97
Mean = 3.0434 and s = 0.4720
• The following stem-and-leaf display indicates that the GPA data is reasonably symmetric and unimodal.
2 | 0
2 | 233
2 | 55
2 | 667777
2 | 88899
3 | 0001
3 | 2222233
3 | 444555
3 | 7
3 | 889
Stem: Units digit
Leaf: Tenths digit
Using the formula

$$\text{z-score} = \frac{x - \bar{x}}{s}$$

we compute the z-scores and color code the values as we did in an earlier example.
-2.21 -1.68 -1.45 -1.43 -1.15 -1.15
-0.94 -0.79 -0.73 -0.73 -0.62 -0.56
-0.52 -0.52 -0.47 -0.30 -0.30 -0.09
-0.05 0.06 0.23 0.33 0.33 0.33
0.40 0.52 0.54 0.54 0.80 0.88
0.93 0.97 0.97 1.14 1.50 1.60
1.67 1.96
Interval                                    Empirical Rule   Actual
within 1 standard deviation of the mean     ≈ 68%            27/38 = 71%
within 2 standard deviations of the mean    ≈ 95%            37/38 = 97%
within 3 standard deviations of the mean    ≈ 99.7%          38/38 = 100%
• Notice that the empirical rule gives
reasonably good estimates for this
example.
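The z-scores and the actual counts for the GPA example can be recomputed from the 38 values. This is a sketch under the assumption that s is the sample standard deviation, as the notation suggests; the variable names are illustrative.

```python
from statistics import mean, stdev

gpas = [2.00, 2.25, 2.36, 2.37, 2.50, 2.50, 2.60,
        2.67, 2.70, 2.70, 2.75, 2.78, 2.80, 2.80,
        2.82, 2.90, 2.90, 3.00, 3.02, 3.07, 3.15,
        3.20, 3.20, 3.20, 3.23, 3.29, 3.30, 3.30,
        3.42, 3.46, 3.48, 3.50, 3.50, 3.58, 3.75,
        3.80, 3.83, 3.97]

x_bar, s = mean(gpas), stdev(gpas)           # sample mean and sample standard deviation
z_scores = [(x - x_bar) / s for x in gpas]   # standardize every observation

# Count how many observations have |z| <= k for k = 1, 2, 3
within = {k: sum(1 for z in z_scores if abs(z) <= k) for k in (1, 2, 3)}

print(f"mean = {x_bar:.4f}, s = {s:.4f}")
for k, count in within.items():
    print(f"within {k} standard deviation(s): {count}/{len(gpas)} = {100 * count / len(gpas):.0f}%")
```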
Comparison of Chebyshev’s Rule and the Empirical Rule
• The following refers to the weights in the sample of 79 students. Notice that the stem-and-leaf display suggests the data distribution is unimodal but positively skewed because of the outliers on the high side. Nevertheless, the results for the Empirical Rule are good.
10 | 3
11 | 37
12 | 011444555
13 | 000000455589
14 | 000000000555
15 | 000000555567
16 | 000005558
17 | 0000005555
18 | 0358
19 | 5
20 | 00
21 | 0
22 | 55
23 | 79
Stem: Hundreds & tens digits
Leaf: Units digit
Interval                                    Chebyshev’s Rule   Empirical Rule   Actual
within 1 standard deviation of the mean     ≥ 0%               ≈ 68%            56/79 = 70.9%
within 2 standard deviations of the mean    ≥ 75%              ≈ 95%            75/79 = 94.9%
within 3 standard deviations of the mean    ≥ 88.8%            ≈ 99.7%          79/79 = 100%
• Notice that even with moderate positive skewing
of the data, the Empirical Rule gave a much
more usable and meaningful result.
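The three-way comparison for the weight data can be checked by expanding the stem-and-leaf display back into individual weights (stem = hundreds and tens digits, leaf = units digit). The sketch below assumes the sample standard deviation is used; the names `stem_leaf` and `counts` are illustrative.

```python
from statistics import mean, stdev

# Stem-and-leaf display of the 79 weights: each character in a string is one leaf
stem_leaf = {
    10: "3", 11: "37", 12: "011444555", 13: "000000455589",
    14: "000000000555", 15: "000000555567", 16: "000005558",
    17: "0000005555", 18: "0358", 19: "5", 20: "00", 21: "0",
    22: "55", 23: "79",
}
weights = [10 * stem + int(leaf) for stem, leaves in stem_leaf.items() for leaf in leaves]

x_bar, s = mean(weights), stdev(weights)  # assumption: sample standard deviation
empirical = {1: 68.0, 2: 95.0, 3: 99.7}   # approximate percentages from the Empirical Rule

# Number of weights within k standard deviations of the mean
counts = {k: sum(1 for w in weights if abs(w - x_bar) <= k * s) for k in (1, 2, 3)}

for k, count in counts.items():
    chebyshev = 100 * (1 - 1 / k**2)
    actual = 100 * count / len(weights)
    print(f"k = {k}: Chebyshev >= {chebyshev:.1f}%, Empirical ~ {empirical[k]}%, actual {actual:.1f}%")
```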