```
Last Update: 16th March 2011
SESSION 19 & 20
Measures of Dispersion
Measures of Variability
- Grouped Data -
Lecturer: Florian Boehlandt
University:
Domain: http://www.hedge-fundanalysis.net/pages/vega.php
Learning Objectives
All measures for grouped data:
1. Measures of relative standing: Median, Quartiles, Deciles and Percentiles
2. Measures of dispersion: Range
3. Measures of variability: Variance and Standard Deviation
4. Empirical Rule and Chebycheff’s Theorem
5. Coefficient of Variation
Percentiles
We can determine any percentile for grouped data using the following formula:
[Formula not preserved in this transcript]
For quartiles, the formula ‘simplifies’ to:
[Formula not preserved in this transcript]
where m = 1, 2, 3 or 4 for the first, second, third and fourth quartile.
Calculation of Percentile
1. Calculate the less-than cumulative frequencies f(<) from the observed frequencies f
2. Use the following formula to determine the location of the Pth percentile:
   Lp = (n + 1) * (P / 100)
3. Locate the interval Lp falls into
4. Determine the following parameters:
   P      The percentile (e.g. 25 for the first quartile)
   n      Sample size
   OLP    The lower limit of the interval Lp falls into
   C      Class width
   f(<)   The cumulative frequency of the interval preceding the one Lp falls into
   fLP    The observed frequency of the interval Lp falls into
5. Apply the formula for the Pth percentile
Percentile: An example
Let us assume the following grouped data is to
be assessed:
Interval     f    f(<)
40 to 49     6      6
50 to 59    14     20
60 to 69    11     31
70 to 79     6     37
80 to 89     3     40

C = 10, n = 40
C = Upper + 1 – Lower
C = 49 + 1 – 40 = 10
Percentile: An example
If the data are interval-scaled (as student marks approximately are), intervals written with inequalities may be more appropriate. This example comes from your student manual. The intervals below, which include inequalities, may be somewhat more intuitive:

Interval       f    f(<)
40 to <50      6      6
50 to <60     14     20
60 to <70     11     31
70 to <80      6     37
80 to <90      3     40

C = 10, n = 40
C = Upper – Lower
C = 50 – 40 = 10
Solution – Step 1
Interval        f    f(<)
40 to < 50      6      6
50 to < 60     14     20
60 to < 70     11     31
70 to < 80      6     37
80 to < 90      3     40

C = 10, n = 40, P = 25
Lp = (40 + 1) * (25 / 100) = 10.25

Use the location formula to determine which interval the 25th percentile (the first quartile) falls into. Since 6 < 10.25 < 20, the percentile interval is 50 to < 60. Beware that the percentile interval is to be looked up in the cumulative frequency column, not the interval column!
Solution – Step 2
Interval        f    f(<)
40 to < 50      6      6
50 to < 60     14     20
60 to < 70     11     31
70 to < 80      6     37
80 to < 90      3     40

C     = 10
n     = 40
P     = 25
Lp    = 10.25
OLP   = 50
fLP   = 14
f(<)  = 6

These are all the parameters required for the percentile formula for grouped data.
[Formula and numerical result not preserved in this transcript]
It is left as an exercise to confirm that the formula for Q1 yields the same result.
Variance
Using the midpoints allows us to calculate the variance
of grouped data as well. In the case of interval data, as
with the mean, the original data is to be preferred to
the grouped data. For ordinal or nominal data the
variance has no probabilistic meaning! Measures of
relative standing (i.e. percentiles) may be used for
ordinal data. There are no measures of variability for
nominal data (Example: 1 = married, 2 = single, 3 =
divorced, 4 = widowed).
Calculation of Variance
1. Determine the interval midpoints x
2. Multiply the observed frequencies f by the interval midpoints (fx)
3. Sum the results from 2. and divide by n
(Steps 1 to 3 are identical to calculating the mean for grouped data)
4. Square x and multiply by f, yielding fx^2
5. Sum the results from 4.
6. Use the following formula to determine the variance for grouped data (sample):
   s^2 = (Σfx^2 – n * x̄^2) / (n – 1)
And for the population:
   σ^2 = (Σfx^2 – N * μ^2) / N
Note that x denotes the midpoints here and not the actual observations.
Variance: An example
Let us assume the following grouped data is to
be assessed:
Interval        f
40 to < 49      6
50 to < 59     14
60 to < 69     11
70 to < 79      6
80 to < 89      3

C = 9, n = 40
Solution – Step 1
Interval        f     x       fx       x^2        fx^2
40 to < 49      6    44.5    267.0    1980.25    11881.50
50 to < 59     14    54.5    763.0    2970.25    41583.50
60 to < 69     11    64.5    709.5    4160.25    45762.75
70 to < 79      6    74.5    447.0    5550.25    33301.50
80 to < 89      3    84.5    253.5    7140.25    21420.75
Total          40           2440.0              153950.00
Average                       61.0

x̄ = 61, n = 40, Σfx^2 = 153950
Solution – Step 2
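Using the sample formula with Σfx^2 = 153950, n = 40 and x̄ = 61 yields:
s^2 = (153950 – 40 * 61^2) / (40 – 1) = 5110 / 39 ≈ 131.03
As before, the square root yields the standard deviation: s ≈ 11.45.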
Empirical Rule
[Figure: bell-shaped curve centred on x̄; 68.26% of observations lie between x̄ – 1s and x̄ + 1s, and 95.44% between x̄ – 2s and x̄ + 2s]
In normal bell-shaped frequency distribution polygons, we find the following:
1. Approx. 68.2% of all observations fall within one standard deviation of the mean
2. Approx. 95.4% of all observations fall within two standard deviations of the mean
3. Approx. 99.7% of all observations fall within three standard deviations of the mean
Chebycheff’s Theorem
Chebycheff’s Theorem is a more general alternative to the empirical rule; it applies to histograms of all shapes. The proportion of observations that lie within k standard deviations of the mean is at least:
1 – 1 / k^2    for k > 1
where k denotes the number of standard deviations away from the mean.
Chebycheff’s Theorem - Example
k       Formula        Chebycheff    Empirical
k = 1   not defined    n.a.          68.2%
k = 2   1 – 1/4        75%           95.4%
k = 3   1 – 1/9        88.9%         99.7%
k = 4   1 – 1/16       93.75%        n.a.
The Empirical Rule provides approximate proportions under the assumption of a bell-shaped normal distribution, whereas Chebycheff’s Theorem provides lower bounds on the proportions for any type of distribution. Consequently, Chebycheff guarantees a smaller proportion within the same number of standard deviations (e.g. at least 75% rather than approx. 95.4% for k = 2). Chebycheff is not relevant to your examination!
Coefficient of Variation
The coefficient of variation of a set of observations is the standard deviation divided by their mean:
Sample:      CV = s / x̄
Population:  CV = σ / μ
By relating the standard deviation to the mean, one can make a statement about the relative variability of the data. Compare a standard deviation of 10 with a mean of 100 (CV = 0.1, or 10%) and with a mean of 1,000,000 (CV = 0.00001, or 0.001%)!
```
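The grouped-data calculations walked through in the transcript can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's own material: the function names are hypothetical, and because the transcript's percentile formula is not preserved, the sketch assumes the common linear-interpolation form OLP + C * (Lp – f(<)) / fLP with Lp = (n + 1) * (P / 100). Some textbooks interpolate at n * P / 100 instead, which gives a slightly different value. The variance uses the computational form (Σfx^2 – n * x̄^2) / (n – 1) with interval midpoints.

```python
from math import sqrt


def grouped_percentile(lower_limits, class_width, freqs, p):
    """Estimate the p-th percentile of grouped data by linear interpolation.

    ASSUMPTION: the interpolation position is Lp itself, i.e.
    O_Lp + C * (Lp - f(<)) / f_Lp; the transcript's formula is not preserved.
    """
    n = sum(freqs)
    location = (n + 1) * p / 100  # Lp, as defined in the transcript
    cumulative = 0                # f(<): cumulative frequency before the interval
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and location <= cumulative + f:  # Lp falls into this interval
            return lower + class_width * (location - cumulative) / f
        cumulative += f
    raise ValueError("percentile location lies beyond the last interval")


def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum of f * x divided by n."""
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)


def grouped_variance(midpoints, freqs, sample=True):
    """Variance of grouped data from midpoints (sample or population)."""
    n = sum(freqs)
    mean = grouped_mean(midpoints, freqs)
    sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))
    return (sum_fx2 - n * mean ** 2) / (n - 1 if sample else n)


# Data from the worked examples in the transcript.
freqs = [6, 14, 11, 6, 3]
q1 = grouped_percentile([40, 50, 60, 70, 80], 10, freqs, 25)

midpoints = [44.5, 54.5, 64.5, 74.5, 84.5]
mean = grouped_mean(midpoints, freqs)
variance = grouped_variance(midpoints, freqs)
std_dev = sqrt(variance)
cv = std_dev / mean  # coefficient of variation (sample)

print(f"Q1 estimate: {q1:.2f}")
print(f"mean: {mean:.1f}, variance: {variance:.2f}, "
      f"std dev: {std_dev:.2f}, CV: {cv:.3f}")
```

Passing `sample=False` to `grouped_variance` divides by n instead of n – 1, which corresponds to the population formula.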