Numerical Descriptive Measures
STATISTICS – Lecture no. 8
Jiřı́ Neubauer
Department of Econometrics FEM UO Brno
office 69a, tel. 973 442029
email:[email protected]
19. 11. 2009
Measures of Location
Measures of Dispersion
Measures of Concentration
Numerical Descriptive Measures
measures of location (center)
measures of dispersion (variation)
measures of concentration
Arithmetic mean
The most important aspect of studying the distribution of a
sample of measurements is locating the position of a central value
about which the measurements are distributed.
Definition
The arithmetic mean (average) of a set of n measurements
$x_1, x_2, \dots, x_n$ is given by the formula
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i . \]
If the data are organized in a frequency distribution table, we can
calculate the mean by the formula
\[ \bar{x} = \frac{1}{n} \sum_{j=1}^{k} n_j \cdot x_j , \]
where $n_1, n_2, \dots, n_k$ are the frequencies of the distinct values
$x_1, x_2, \dots, x_k$ of the variable and $n = n_1 + n_2 + \dots + n_k$.
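The frequency-table formula can be checked against the plain definition with a short sketch. The data set is the one used in the lecture's examples; the function name `mean_from_frequencies` is an illustrative assumption, not something from the lecture.

```python
from collections import Counter

def mean_from_frequencies(freq):
    """Arithmetic mean from a {value x_j: frequency n_j} table."""
    n = sum(freq.values())                      # n = n_1 + ... + n_k
    return sum(n_j * x_j for x_j, n_j in freq.items()) / n

data = [1, 2, 5, 6, 7, 8, 8, 9]
freq = Counter(data)                            # e.g. the value 8 has frequency 2

mean_raw = sum(data) / len(data)                # definition on the raw data
mean_freq = mean_from_frequencies(freq)         # frequency-table formula
print(mean_raw, mean_freq)
```

Both computations give the same value, because the frequency table only groups equal terms of the sum.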
Elementary properties of the arithmetic mean:
the sum of deviations between the values and the mean is
equal to zero
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0, \]
if the variable is constant then the mean is equal to this
constant
\[ \frac{1}{n} \sum_{i=1}^{n} c = c, \]
if we add a constant c to the values of the variable, then
\[ \frac{1}{n} \sum_{i=1}^{n} (x_i + c) = c + \bar{x}, \]
if we multiply the values of the variable by a constant c, then
\[ \frac{1}{n} \sum_{i=1}^{n} c \cdot x_i = c \cdot \bar{x}. \]
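The properties of the arithmetic mean can be verified numerically. The following is a minimal sketch on the lecture's example data; the constant `c = 3.0` is an arbitrary value chosen only for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

data = [1, 2, 5, 6, 7, 8, 8, 9]
c = 3.0                                   # arbitrary constant (assumption)
x_bar = mean(data)

sum_of_deviations = sum(x - x_bar for x in data)   # should be 0
mean_of_constant = mean([c] * len(data))           # should be c
mean_shifted = mean([x + c for x in data])         # should be c + x_bar
mean_scaled = mean([c * x for x in data])          # should be c * x_bar

print(math.isclose(sum_of_deviations, 0.0, abs_tol=1e-12))
print(math.isclose(mean_of_constant, c))
print(math.isclose(mean_shifted, c + x_bar))
print(math.isclose(mean_scaled, c * x_bar))
```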
Harmonic mean
Definition
The harmonic mean of a set of n measurements $x_1, x_2, \dots, x_n$ is
given by the formula
\[ \bar{x}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} . \]
In situations involving rates and ratios, for example average speed
over equal distances, the harmonic mean is often the appropriate average.
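A small sketch of the rates case: the speeds and the leg length below are hypothetical numbers, not taken from the lecture. If equal distances are driven at different speeds, the overall average speed (total distance divided by total time) equals the harmonic mean of the speeds, not the arithmetic mean.

```python
import math

def harmonic_mean(values):
    """n divided by the sum of reciprocals."""
    return len(values) / sum(1.0 / x for x in values)

speeds_kmh = [60.0, 40.0]     # hypothetical speeds on two legs
leg_km = 120.0                # hypothetical length of each leg

total_time_h = sum(leg_km / v for v in speeds_kmh)         # 2 h + 3 h
average_speed = len(speeds_kmh) * leg_km / total_time_h    # 240 km / 5 h

arithmetic = sum(speeds_kmh) / len(speeds_kmh)
print(average_speed, harmonic_mean(speeds_kmh), arithmetic)
print(math.isclose(average_speed, harmonic_mean(speeds_kmh)))
```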
Geometric mean
Definition
The geometric mean of a set of n measurements $x_1, x_2, \dots, x_n$ is
given by the formula
\[ \bar{x}_G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} . \]
The geometric mean may be more appropriate than the arithmetic
mean for describing percentage growth.
Suppose an orange tree yields 100 oranges one year, then 180, 210
and 300 in the following years, so the growth is 80 %, 16.7 % and
42.9 % in the respective years. Using the arithmetic mean, we can
calculate an average growth of 46.5 % (80 % + 16.7 % + 42.9 %
divided by 3). However, if we start with 100 oranges and let the yield
grow by 46.5 % per year for three years, the result is 314 oranges, not 300.
The geometric mean of the growth factors 1.800, 1.167 and 1.429 is 1.442,
i.e. an average growth of 44.2 %, which does lead to 300 oranges.
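The figures of the orange-tree example can be reproduced with a few lines. This is only a sketch; the variable names are illustrative.

```python
import math

yields = [100, 180, 210, 300]
# Year-over-year growth factors: 1.8, 1.1667, 1.4286
factors = [later / earlier for earlier, later in zip(yields, yields[1:])]
years = len(factors)

arithmetic_factor = sum(factors) / years
geometric_factor = math.prod(factors) ** (1 / years)

# Yield after three years of constant growth at each "average" rate
final_arithmetic = yields[0] * arithmetic_factor ** years
final_geometric = yields[0] * geometric_factor ** years

print(f"arithmetic: {(arithmetic_factor - 1) * 100:.1f} % -> {final_arithmetic:.0f} oranges")
print(f"geometric:  {(geometric_factor - 1) * 100:.1f} % -> {final_geometric:.0f} oranges")
```

Only the geometric mean of the growth factors reproduces the observed final yield, since growth compounds multiplicatively.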
Example
Calculate the arithmetic, harmonic and geometric mean of 1, 2, 5,
6, 7, 8, 8, 9.
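Arithmetic mean
\[ \bar{x} = \frac{1+2+5+6+7+8+8+9}{8} = 5.75. \]
Harmonic mean
\[ \bar{x}_H = \frac{8}{\frac{1}{1}+\frac{1}{2}+\frac{1}{5}+\frac{1}{6}+\frac{1}{7}+\frac{1}{8}+\frac{1}{8}+\frac{1}{9}} \doteq 3.375. \]
Geometric mean
\[ \bar{x}_G = \sqrt[8]{1 \cdot 2 \cdot 5 \cdot 6 \cdot 7 \cdot 8 \cdot 8 \cdot 9} \doteq 4.709. \]
Notice that $\bar{x}_H \le \bar{x}_G \le \bar{x}$.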
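The three means of this example can be cross-checked with Python's standard `statistics` module (Python 3.8 or later is assumed for `geometric_mean`). This is a verification sketch, not part of the original lecture.

```python
import statistics

data = [1, 2, 5, 6, 7, 8, 8, 9]

arithmetic = statistics.mean(data)
harmonic = statistics.harmonic_mean(data)
geometric = statistics.geometric_mean(data)

print(round(arithmetic, 3), round(harmonic, 3), round(geometric, 3))
# The ordering harmonic <= geometric <= arithmetic holds for positive data
print(harmonic <= geometric <= arithmetic)
```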
Quantile
Definition
The quantile $x_p$ is a value of the variable such that
100p % of the values of the ordered sample (or population) are smaller
than or equal to $x_p$ and 100(1 - p) % of the values of the ordered
sample (or population) are larger than or equal to $x_p$.
The quantile is not uniquely defined, so different calculation methods
can give different values.
Let us have the data set 2 5 7 10 12 13 18 21.
Possible methods of calculation
Sort the data in ascending order. Find the sequential index $i_p$
of the quantile $x_p$, which fulfils the inequality
\[ np < i_p < np + 1. \]
The quantile $x_p$ is then equal to the value of the variable with the
sequential index $i_p$, that is $x_p = x_{(i_p)}$. If $np$ and $np + 1$
are integers, we calculate the quantile as the arithmetic mean of
$x_{(np)}$ and $x_{(np+1)}$,
\[ x_p = \frac{x_{(np)} + x_{(np+1)}}{2} . \]
The statistical software STATISTICA uses this method.
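The index rule above can be sketched as follows. The function name, the tolerance used to decide whether np is an integer, and the clamping for p = 0 and p = 1 are assumptions of this sketch; it implements the rule as described in the lecture and makes no claim about STATISTICA's actual code.

```python
import math

def quantile_index_rule(data, p):
    """Quantile by the rule np < i_p < np + 1 (1-based sequential indices)."""
    x = sorted(data)
    n = len(x)
    position = n * p
    nearest = round(position)
    # Assumption: treat np as an integer if it is within rounding error of one
    if math.isclose(position, nearest, abs_tol=1e-9) and 1 <= nearest < n:
        # np and np + 1 are integers: average x_(np) and x_(np+1)
        return (x[nearest - 1] + x[nearest]) / 2
    # Otherwise exactly one integer lies strictly between np and np + 1
    i_p = math.floor(position) + 1
    i_p = min(max(i_p, 1), n)      # assumption: clamp for p = 0 and p = 1
    return x[i_p - 1]

data = [2, 5, 7, 10, 12, 13, 18, 21]
probabilities = (0.10, 0.25, 0.50, 0.75, 0.90)
quantiles = [quantile_index_rule(data, p) for p in probabilities]
print(quantiles)
```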
According to MATLAB
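We calculate
\[ \bar{i}_p = \frac{np + (np + 1)}{2} = \frac{2np + 1}{2}, \]
determining the location of the quantile. Using linear
interpolation we get
\[ x_p = x_{([\bar{i}_p])} + \left( x_{([\bar{i}_p]+1)} - x_{([\bar{i}_p])} \right) \left( \bar{i}_p - [\bar{i}_p] \right), \]
where $[\cdot]$ denotes the integer part of the number. If $\bar{i}_p < 1$
then $x_p = x_{(1)}$, if $\bar{i}_p > n$ then $x_p = x_{(n)}$.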
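A minimal sketch of this interpolation rule, written directly from the formulas above. The function name is an assumption, and the code does not call MATLAB or claim to reproduce its implementation in every edge case.

```python
import math

def quantile_midpoint_interpolation(data, p):
    """Linear interpolation at the location (2np + 1) / 2 (1-based)."""
    x = sorted(data)
    n = len(x)
    location = (2 * n * p + 1) / 2
    if location <= 1:
        return x[0]                 # below the first order statistic
    if location >= n:
        return x[-1]                # at or above the last order statistic
    k = math.floor(location)        # integer part of the location
    return x[k - 1] + (x[k] - x[k - 1]) * (location - k)

data = [2, 5, 7, 10, 12, 13, 18, 21]
probabilities = (0.10, 0.25, 0.50, 0.75, 0.90)
quantiles = [round(quantile_midpoint_interpolation(data, p), 10)
             for p in probabilities]
print(quantiles)
```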
According to EXCEL
We assign the values $0, \frac{1}{n-1}, \frac{2}{n-1}, \dots, \frac{n-2}{n-1}, 1$
to the data sorted in ascending order. If $p$ is a multiple of
$\frac{1}{n-1}$, the quantile $x_p$ is equal to the value corresponding
to that multiple. If $p$ is not a multiple of $\frac{1}{n-1}$, we use
linear interpolation.
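The rule described above amounts to interpolating at the fractional position p(n - 1) among the sorted values, counted from zero. The sketch below implements that description; the function name is an assumption and no spreadsheet is called.

```python
import math
import statistics

def quantile_inclusive(data, p):
    """Interpolate between sorted values placed at 0, 1/(n-1), ..., 1."""
    x = sorted(data)
    n = len(x)
    position = p * (n - 1)          # 0-based fractional position
    k = math.floor(position)
    if k >= n - 1:
        return x[-1]                # p = 1 (or a single observation)
    return x[k] + (x[k + 1] - x[k]) * (position - k)

data = [2, 5, 7, 10, 12, 13, 18, 21]
probabilities = (0.10, 0.25, 0.50, 0.75, 0.90)
quantiles = [round(quantile_inclusive(data, p), 10) for p in probabilities]
print(quantiles)

# Cross-check of the quartiles with the standard library's inclusive method
print(statistics.quantiles(data, n=4, method="inclusive"))
```

For the quartiles, the result agrees with `statistics.quantiles(..., method="inclusive")` from the Python standard library, which uses the same positions.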
Quantiles $x_p$ of the data set obtained by the three methods:

p            0.10   0.25   0.50   0.75    0.90
STATISTICA   2      6      11     15.5    21
MATLAB       2.9    6      11     15.5    20.1
EXCEL        4.1    6.5    11     14.25   18.9
Example
Calculate the median, lower and upper quartile and lower and
upper decile of 1, 2, 5, 6, 7, 8, 8, 9.
The size of the data set is n = 8. The median is the middle value
of the data sorted in ascending order. Here there is not one middle
value but two (6 and 7), so we calculate the median as
\[ \tilde{x} = x_{0.50} = \frac{6+7}{2} = 6.5. \]
Interpretation: 50 % of the ordered values are smaller than or equal
to 6.5, i.e. they do not exceed the value 6.5.
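Lower quartile $x_{0.25}$. Using the formula
\[ np < i_p < np + 1 \]
we get $8 \cdot 0.25 < i_p < 8 \cdot 0.25 + 1 \Leftrightarrow 2 < i_p < 3$.
Both bounds are integers, so we take the mean of the two neighbouring values,
\[ x_{0.25} = \frac{x_{(2)} + x_{(3)}}{2} = \frac{2+5}{2} = 3.5. \]
Analogously for the upper decile $x_{0.90}$:
$8 \cdot 0.90 < i_p < 8 \cdot 0.90 + 1 \Leftrightarrow 7.2 < i_p < 8.2$,
so we get $i_p = 8$ and $x_{0.90} = x_{(8)} = 9$.
We say that 25 % of the ordered values are smaller than or equal to 3.5.
Analogously, 90 % of the values do not exceed 9.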
Mode
Definition
The mode $\hat{x}$ is the value of the variable with the highest frequency.
In the case of a continuous variable (data), the mode is the value
where the histogram reaches its peak.
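For discrete data the definition can be tried out with the standard library. This sketch uses `statistics.multimode` (Python 3.8 or later assumed), which returns every value that shares the highest frequency; the second data set is a made-up illustration of a sample with two modes.

```python
import statistics

data = [1, 2, 5, 6, 7, 8, 8, 9]
modes = statistics.multimode(data)              # only 8 occurs twice

two_peaks = statistics.multimode([1, 1, 2, 3, 3])   # hypothetical bimodal sample
print(modes, two_peaks)
```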
Figure: Non-homogeneous sample
Measures of Dispersion
Means, quantiles and the mode are measures of location: they describe
one property of a frequency distribution, its location.
Another important property is dispersion (variation), which we
describe by several measures of variation.
Figure: Two samples with different variation
Range of Variation
Definition
The range of variation R is defined as the difference between the
largest and the smallest value of the variable,
\[ R = x_{\max} - x_{\min} . \]
It is the simplest but also the crudest measure of variation. It indicates
the width of the interval that contains all the values.
Interquantile Ranges
Definition
the interquartile range
\[ R_Q = x_{0.75} - x_{0.25} \]
the interdecile range
\[ R_D = x_{0.90} - x_{0.10} \]
the interpercentile range
\[ R_C = x_{0.99} - x_{0.01} \]
The interquartile range indicates the width of the interval that
contains the middle 50 % of the values of the ordered sample. By analogy,
the interdecile range and the interpercentile range indicate the width
of the interval that contains the middle 80 % and 98 % of the values of
the ordered sample, respectively.
Example
We have calculated the quantiles of the data 2, 5, 7, 10, 12, 13, 18
and 21. According to STATISTICA: $x_{0.10} = 2$, $x_{0.25} = 6$,
$x_{0.50} = 11$, $x_{0.75} = 15.5$, $x_{0.90} = 21$.
The range of variation is $R = x_{\max} - x_{\min} = 21 - 2 = 19$.
The interquartile range is $R_Q = x_{0.75} - x_{0.25} = 15.5 - 6 = 9.5$.
The interdecile range is $R_D = x_{0.90} - x_{0.10} = 21 - 2 = 19$.
Quantile Deviations
Definition
the quartile deviation
\[ Q = R_Q / 2 \]
the decile deviation
\[ D = R_D / 8 \]
the percentile deviation
\[ C = R_C / 98 \]
Example
Calculate the quartile and the decile deviation of 2, 5, 7, 10, 12,
13, 18 and 21.
The quartile deviation is $Q = R_Q / 2 = 9.5 / 2 = 4.75$.
The decile deviation is $D = R_D / 8 = 19 / 8 = 2.375$.
This means that the average width of the two middle quartile intervals
is 4.75 and the average width of the eight middle decile intervals is 2.375.
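The ranges and quantile deviations of this data set follow from a few subtractions. The sketch below takes the quantiles reported in the lecture for the STATISTICA method as given inputs rather than recomputing them; the variable names are illustrative.

```python
data = [2, 5, 7, 10, 12, 13, 18, 21]

# Quantiles as reported in the lecture (STATISTICA method), used as inputs
x_p = {0.10: 2, 0.25: 6, 0.75: 15.5, 0.90: 21}

range_of_variation = max(data) - min(data)
interquartile_range = x_p[0.75] - x_p[0.25]
interdecile_range = x_p[0.90] - x_p[0.10]

quartile_deviation = interquartile_range / 2   # two middle quartile intervals
decile_deviation = interdecile_range / 8       # eight middle decile intervals

print(range_of_variation, interquartile_range, interdecile_range)
print(quartile_deviation, decile_deviation)
```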
Average Deviation
Definition
The average deviation is defined as the arithmetic mean of the
absolute deviations
\[ d_{\bar{x}} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| . \]
Example
Find the average deviation of a data set 1, 2, 5, 6, 7, 8, 8 and 9.
The arithmetic mean is $\bar{x} = 5.75$. We obtain
\[ d_{\bar{x}} = \frac{|1 - 5.75| + |2 - 5.75| + |5 - 5.75| + |6 - 5.75| + |7 - 5.75| + |8 - 5.75| + |8 - 5.75| + |9 - 5.75|}{8} = 2.3125. \]
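The same calculation as a short sketch; the function name `average_deviation` is an illustrative choice.

```python
def average_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n

data = [1, 2, 5, 6, 7, 8, 8, 9]
d = average_deviation(data)
print(d)
```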
Variance
Definition
The variance $s_n^2$ is defined as the arithmetic mean of squares of
deviations
\[ s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 . \]
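Expanding the square gives
\[ s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
       = \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \bar{x}^2 \right)
       = \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \right)
       = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2
       = \overline{x^2} - \bar{x}^2 . \]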
Elementary properties of the variance:
if the variable is constant and is equal to c, then the variance
is zero
\[ \frac{1}{n} \sum_{i=1}^{n} (c - c)^2 = 0, \]
if we add a constant c to the values of the variable, then
\[ \frac{1}{n} \sum_{i=1}^{n} [(x_i + c) - (\bar{x} + c)]^2 = s_n^2 , \]
if we multiply the values of the variable by a constant c, then
\[ \frac{1}{n} \sum_{i=1}^{n} (c \cdot x_i - c \cdot \bar{x})^2 = c^2 \cdot s_n^2 . \]
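The properties of the variance can be checked numerically in the same way as those of the mean. A minimal sketch on the lecture's example data follows; the constant `c = 3.0` is arbitrary and chosen only for illustration.

```python
import math

def variance(values):
    """Mean of squared deviations from the mean (divisor n)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / n

data = [1, 2, 5, 6, 7, 8, 8, 9]
c = 3.0                                          # arbitrary constant (assumption)

var_data = variance(data)
var_constant = variance([c] * len(data))         # should be 0
var_shifted = variance([x + c for x in data])    # should equal var_data
var_scaled = variance([c * x for x in data])     # should equal c**2 * var_data

print(var_data, var_constant, var_shifted, var_scaled)
```

Shifting the data leaves the variance unchanged, while scaling by c multiplies it by c squared.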
Standard Deviation
Definition
The square root of the variance is called the standard deviation
\[ s_n = \sqrt{s_n^2} . \]
Sample Variance and Standard Deviation
Definition
The sample variance $s^2$ is defined by the formula
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 , \]
the square root of the sample variance is called the sample standard
deviation
\[ s = \sqrt{s^2} . \]
Comparing the two definitions, it follows that
\[ s_n^2 = \frac{n-1}{n} \, s^2 . \]
Example
Calculate the variance, the standard deviation, the sample variance
and the sample standard deviation of the data set 1, 2, 5, 6, 7, 8,
8 and 9.
The arithmetic mean is $\bar{x} = 5.75$.
\[ s_n^2 = \frac{(1 - 5.75)^2 + (2 - 5.75)^2 + (5 - 5.75)^2 + (6 - 5.75)^2 + (7 - 5.75)^2 + (8 - 5.75)^2 + (8 - 5.75)^2 + (9 - 5.75)^2}{8} = 7.4375. \]
The variance can also be calculated by the formula $s_n^2 = \overline{x^2} - \bar{x}^2$.
\[ \overline{x^2} = \frac{1}{n} \sum_{i=1}^{n} x_i^2 = \frac{1^2 + 2^2 + 5^2 + 6^2 + 7^2 + 8^2 + 8^2 + 9^2}{8} = 40.5, \]
\[ s_n^2 = \overline{x^2} - \bar{x}^2 = 40.5 - 5.75^2 = 7.4375. \]
The standard deviation is
\[ s_n = \sqrt{s_n^2} = \sqrt{7.4375} \doteq 2.72718. \]
To get the sample variance we apply the formula
\[ s^2 = \frac{n}{n-1} \, s_n^2 = \frac{8}{7} \cdot 7.4375 = 8.5. \]
The sample standard deviation is
\[ s = \sqrt{s^2} = \sqrt{8.5} \doteq 2.91548. \]
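The four results of this example can be cross-checked with Python's standard `statistics` module. In that module `pvariance` and `pstdev` use the divisor n (the lecture's $s_n^2$ and $s_n$), while `variance` and `stdev` use n - 1 (the lecture's $s^2$ and $s$). This is a verification sketch only.

```python
import math
import statistics

data = [1, 2, 5, 6, 7, 8, 8, 9]
n = len(data)

var_n = statistics.pvariance(data)     # divisor n
std_n = statistics.pstdev(data)
var_sample = statistics.variance(data) # divisor n - 1
std_sample = statistics.stdev(data)

print(var_n, round(std_n, 5), var_sample, round(std_sample, 5))
# Relation between the two variances: s_n^2 = (n - 1) / n * s^2
print(math.isclose(var_n, (n - 1) / n * var_sample))
```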
Moments
Definition
The r-th moment is defined by the formula
\[ m'_r = \frac{1}{n} \sum_{i=1}^{n} x_i^r , \]
The r-th central moment is defined by the formula
\[ m_r = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^r . \]
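Both kinds of moments are one-line sums. The sketch below uses illustrative function names and shows how the first raw moment is the mean and the second central moment is the variance with divisor n.

```python
def raw_moment(values, r):
    """r-th moment: mean of x_i ** r."""
    return sum(x ** r for x in values) / len(values)

def central_moment(values, r):
    """r-th central moment: mean of (x_i - mean) ** r."""
    x_bar = raw_moment(values, 1)
    return sum((x - x_bar) ** r for x in values) / len(values)

data = [1, 2, 5, 6, 7, 8, 8, 9]
print(raw_moment(data, 1))        # the arithmetic mean
print(raw_moment(data, 2))        # the mean of squares
print(central_moment(data, 1))    # zero: deviations from the mean cancel
print(central_moment(data, 2))    # the variance with divisor n
```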
Sample Skewness
Definition
The sample skewness is defined by the formula
\[ a_3 = \frac{m_3}{m_2^{3/2}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s_n^3} = \frac{m_3}{s_n^3} . \]
Figure: Frequency distribution with the different sample skewness
Sample Kurtosis
Definition
The sample kurtosis is defined by the formula
\[ a_4 = \frac{m_4}{m_2^2} - 3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s_n^4} - 3 . \]
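Skewness and kurtosis as defined above can be computed from the central moments. This is a minimal sketch with illustrative function names, applied to the data set used in the lecture's earlier examples and to a small symmetric data set chosen for contrast.

```python
def central_moment(values, r):
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** r for x in values) / n

def skewness(values):
    """a_3 = m_3 / m_2 ** (3/2)"""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """a_4 = m_4 / m_2 ** 2 - 3"""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

data = [1, 2, 5, 6, 7, 8, 8, 9]
symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric sample

a3, a4 = skewness(data), kurtosis(data)
print(round(a3, 4), round(a4, 4))
print(skewness(symmetric))         # zero for a symmetric sample
```

The negative skewness of the example data reflects its longer tail on the left (the values 1 and 2 lie far below the mean of 5.75).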
Figure: Frequency distribution with the different sample kurtosis
Note
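Excel functions SKEW and KURT calculate skewness and kurtosis
by the formulas
\[ a_3^* = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3 , \]
\[ a_4^* = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} . \]
We can derive
\[ a_3 = \frac{n-2}{\sqrt{n(n-1)}} \cdot a_3^* , \qquad
   a_4 = \frac{(n-2)(n-3)}{n^2 - 1} \cdot a_4^* - \frac{6}{n+1} . \]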
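The conversion between the two pairs of coefficients can be verified numerically. The sketch below implements the formulas exactly as written in this note (it does not call a spreadsheet), then checks that the conversion reproduces $a_3$ and $a_4$ computed from the central moments. All function names are illustrative assumptions.

```python
import math

def central_moment(values, r):
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** r for x in values) / n

def standardized_power_sum(values, r):
    """Sum of ((x_i - mean) / s) ** r with the sample standard deviation s."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    return sum(((x - x_bar) / s) ** r for x in values)

def adjusted_skewness(values):     # a_3^* from the note
    n = len(values)
    return n / ((n - 1) * (n - 2)) * standardized_power_sum(values, 3)

def adjusted_kurtosis(values):     # a_4^* from the note
    n = len(values)
    factor = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return factor * standardized_power_sum(values, 4) - correction

data = [1, 2, 5, 6, 7, 8, 8, 9]
n = len(data)

a3 = central_moment(data, 3) / central_moment(data, 2) ** 1.5
a4 = central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

a3_converted = (n - 2) / math.sqrt(n * (n - 1)) * adjusted_skewness(data)
a4_converted = ((n - 2) * (n - 3) / (n ** 2 - 1) * adjusted_kurtosis(data)
                - 6 / (n + 1))

print(math.isclose(a3, a3_converted), math.isclose(a4, a4_converted))
```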