Numerical Descriptive Measures

STATISTICS – Lecture no. 8
Jiří Neubauer
Department of Econometrics, FEM UO Brno
office 69a, tel. 973 442029, email: [email protected]
19. 11. 2009

Numerical Descriptive Measures
- measures of location (center)
- measures of dispersion (variation)
- measures of concentration

Arithmetic mean

The most important aspect of studying the distribution of a sample of measurements is locating the position of a central value about which the measurements are distributed.

Definition. The arithmetic mean (average) of a set of n measurements $x_1, x_2, \dots, x_n$ is given by the formula
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i . \]

If the data are organized in a frequency distribution table, we can calculate the mean by the formula
\[ \bar{x} = \frac{1}{n} \sum_{j=1}^{k} n_j \cdot x_j , \]
where $n_1, n_2, \dots, n_k$ are the frequencies of the distinct values $x_1, x_2, \dots, x_k$ of the variable and $n = n_1 + n_2 + \dots + n_k$.
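
The two formulas for the arithmetic mean can be sketched in Python as follows. This is only an illustration: the function names and the small data set are chosen for this sketch and are not part of the lecture. It shows that the mean computed from raw values and the mean computed from a frequency table agree.

```python
from collections import Counter


def arithmetic_mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


def mean_from_frequencies(freq):
    """Mean from a frequency table given as {value x_j: frequency n_j}."""
    n = sum(freq.values())  # total number of observations
    return sum(n_j * x_j for x_j, n_j in freq.items()) / n


# Illustrative data (an assumption of this sketch); the value 8 occurs twice.
data = [1, 2, 5, 6, 7, 8, 8, 9]
table = Counter(data)  # frequency table built from the raw data

print(arithmetic_mean(data))
print(mean_from_frequencies(table))
```

Both calls return the same number, because the frequency formula only groups equal summands of the first formula.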

Elementary properties of the arithmetic mean:
- the sum of the deviations of the values from the mean is equal to zero,
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0, \]
- if the variable is constant, then the mean is equal to this constant,
\[ \frac{1}{n} \sum_{i=1}^{n} c = c, \]
- if we add a constant c to the values of the variable, then
\[ \frac{1}{n} \sum_{i=1}^{n} (x_i + c) = c + \bar{x}, \]
- if we multiply the values of the variable by a constant c, then
\[ \frac{1}{n} \sum_{i=1}^{n} c \cdot x_i = c \cdot \bar{x}. \]

Harmonic mean

Definition. The harmonic mean of a set of n measurements $x_1, x_2, \dots, x_n$ is given by the formula
\[ \bar{x}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} . \]

In certain situations, especially many situations involving rates and ratios, the harmonic mean is the appropriate average.
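
A typical rate situation is an average speed over equal distances. The sketch below is a hypothetical illustration (the speeds and the distance are invented for this example, not taken from the lecture): a car drives the same distance once at 40 km/h and once at 60 km/h, and the true average speed, total distance over total time, coincides with the harmonic mean of the two speeds rather than with their arithmetic mean.

```python
def harmonic_mean(values):
    """Harmonic mean: n divided by the sum of reciprocals (values must be nonzero)."""
    return len(values) / sum(1 / x for x in values)


speeds = [40.0, 60.0]  # hypothetical speeds in km/h, one per leg of the trip
distance = 120.0       # hypothetical length of each leg in km

# Average speed from first principles: total distance / total time.
total_time = sum(distance / v for v in speeds)
average_speed = len(speeds) * distance / total_time

print(harmonic_mean(speeds))       # agrees with average_speed up to rounding
print(average_speed)
print(sum(speeds) / len(speeds))   # the arithmetic mean overstates the speed
```

The Python standard library also offers `statistics.harmonic_mean`, which could be used instead of the hand-written function.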

Geometric mean

Definition. The geometric mean of a set of n measurements $x_1, x_2, \dots, x_n$ is given by the formula
\[ \bar{x}_G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} . \]

The geometric mean may be more appropriate than the arithmetic mean for describing percentage growth. Suppose an orange tree yields 100 oranges one year, then 180, 210 and 300 in the following years, so the growth is 80 %, 16.7 % and 42.9 % in the respective years. Using the arithmetic mean, we would calculate an average growth of 46.5 % (80 % + 16.7 % + 42.9 %, divided by 3). However, if we start with 100 oranges and let the yield grow by 46.5 % a year for three years, the result is 314 oranges, not 300. The geometric mean of the three growth factors, $\sqrt[3]{300/100} = \sqrt[3]{3} \doteq 1.442$, corresponds to an average growth of 44.2 %, which leads to exactly 300 oranges after three years.

Example

Calculate the arithmetic, harmonic and geometric mean of 1, 2, 5, 6, 7, 8, 8, 9.

Arithmetic mean
\[ \bar{x} = \frac{1+2+5+6+7+8+8+9}{8} = 5.75. \]
Harmonic mean
\[ \bar{x}_H = \frac{8}{\frac{1}{1} + \frac{1}{2} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \frac{1}{8} + \frac{1}{9}} \doteq 3.375. \]
Geometric mean
\[ \bar{x}_G = \sqrt[8]{1 \cdot 2 \cdot 5 \cdot 6 \cdot 7 \cdot 8 \cdot 8 \cdot 9} \doteq 4.709. \]
Notice that $\bar{x}_H \le \bar{x}_G \le \bar{x}$.

Quantile

Definition. The quantile $x_p$ is a value of the variable such that 100p % of the values of the ordered sample (or population) are smaller than or equal to $x_p$ and 100(1 − p) % of the values of the ordered sample (or population) are larger than or equal to $x_p$.

The quantile is not uniquely defined.

Let us have the data set 2, 5, 7, 10, 12, 13, 18, 21.

Possible methods of calculation

Sort the data in ascending order.
Find the sequential index $i_p$ of the quantile $x_p$, which satisfies the inequality
\[ np < i_p < np + 1. \]
The quantile $x_p$ is then equal to the value of the variable with the sequential index $i_p$, that is $x_p = x_{(i_p)}$. If np (and hence np + 1) is an integer, we calculate the quantile as the arithmetic mean of $x_{(np)}$ and $x_{(np+1)}$,
\[ x_p = \frac{x_{(np)} + x_{(np+1)}}{2} . \]
The statistical software STATISTICA uses this method.

According to MATLAB

We calculate
\[ \bar{i}_p = \frac{np + np + 1}{2} = \frac{2np + 1}{2}, \]
which determines the location of the quantile. Using linear interpolation we get
\[ x_p = x_{([\bar{i}_p])} + \left( x_{([\bar{i}_p]+1)} - x_{([\bar{i}_p])} \right) \left( \bar{i}_p - [\bar{i}_p] \right), \]
where [·] denotes the integer part of the number. If $\bar{i}_p < 1$ then $x_p = x_{(1)}$; if $\bar{i}_p > n$ then $x_p = x_{(n)}$.

According to EXCEL

We assign the values $0, \frac{1}{n-1}, \frac{2}{n-1}, \dots, \frac{n-2}{n-1}, 1$ to the data sorted in ascending order. If p is a multiple of $\frac{1}{n-1}$, the quantile $x_p$ is equal to the value corresponding to that multiple. If p is not a multiple of $\frac{1}{n-1}$, we use linear interpolation.

Quantiles $x_p$ of the data set 2, 5, 7, 10, 12, 13, 18, 21 according to the three methods:

p            0.10   0.25   0.50   0.75    0.90
STATISTICA   2      6      11     15.5    21
MATLAB       2.9    6      11     15.5    20.1
EXCEL        4.1    6.5    11     14.25   18.9

Example

Calculate the median, the lower and upper quartile and the lower and upper decile of 1, 2, 5, 6, 7, 8, 8, 9.

The size of the data set is n = 8. The median is the middle value of the data sorted in ascending order. Here there is not one middle value but two (6 and 7), so we calculate the median as
\[ \tilde{x} = x_{0.50} = \frac{6+7}{2} = 6.5. \]
Interpretation: 50 % of the ordered values are smaller than or equal to 6.5, that is, they do not exceed 6.5.

Lower quartile $x_{0.25}$. Using the inequality $np < i_p < np + 1$ we get
\[ 8 \cdot 0.25 < i_p < 8 \cdot 0.25 + 1 \iff 2 < i_p < 3. \]
No integer lies strictly between 2 and 3, because np = 2 is an integer, so we take the arithmetic mean of the two neighbouring values:
\[ x_{0.25} = \frac{x_{(2)} + x_{(3)}}{2} = \frac{2+5}{2} = 3.5. \]
Analogously for the upper decile $x_{0.90}$:
\[ 8 \cdot 0.90 < i_p < 8 \cdot 0.90 + 1 \iff 7.2 < i_p < 8.2, \]
so $i_p = 8$ and $x_{0.90} = x_{(8)} = 9$.

We say that 25 % of the ordered values are smaller than or equal to 3.5. Analogously, 90 % of the values do not exceed 9.

Mode

Definition. The mode $\hat{x}$ is the value of the variable with the highest frequency. In the case of a continuous variable, the mode is the value where the histogram reaches its peak.

Figure: Non-homogeneous sample

Measures of Dispersion

Means, quantiles and the mode (measures of location) describe one property of a frequency distribution: its location.
Another important property is dispersion (variation), which we describe by several measures of variation.

Figure: Two samples with different variation

Range of Variation

Definition. The range of variation R is defined as the difference between the largest and the smallest value of the variable,
\[ R = x_{\max} - x_{\min} . \]
It is the simplest but also the crudest measure of variation. It indicates the width of the interval that contains all values.

Interquantile Ranges

Definition.
- the interquartile range $R_Q = x_{0.75} - x_{0.25}$
- the interdecile range $R_D = x_{0.90} - x_{0.10}$
- the interpercentile range $R_C = x_{0.99} - x_{0.01}$

The interquartile range indicates the width of the interval which contains the middle 50 % of the values of the ordered sample. By analogy, the interdecile and the interpercentile range indicate the width of the interval which contains the middle 80 % and 98 % of the values of the ordered sample, respectively.
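
The three quantile rules described earlier, and the interquantile ranges built on them, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are invented here, the functions implement the rules as the lecture attributes them to STATISTICA, MATLAB and EXCEL (they do not call those programs), and the handling of p = 0 and p = 1 in the first rule is an assumption, since the text does not cover those cases.

```python
import math


def quantile_statistica(data, p):
    """Rule np < i_p < np + 1; average two neighbours when np is an integer."""
    x = sorted(data)
    n = len(x)
    np_ = n * p
    if abs(np_ - round(np_)) < 1e-9:      # np is (numerically) an integer
        k = int(round(np_))
        lo = x[max(k, 1) - 1]             # x_(np), clamped at the edges (assumption)
        hi = x[min(k + 1, n) - 1]         # x_(np+1), clamped at the edges (assumption)
        return (lo + hi) / 2
    i = math.floor(np_) + 1               # the only integer strictly between np and np + 1
    return x[i - 1]


def interpolate_at(data, position):
    """Linear interpolation at a 1-based fractional position, clamped to the data."""
    x = sorted(data)
    n = len(x)
    if position <= 1:
        return x[0]
    if position >= n:
        return x[-1]
    k = math.floor(position)              # integer part of the position
    return x[k - 1] + (x[k] - x[k - 1]) * (position - k)


def quantile_matlab(data, p):
    """Position (2np + 1) / 2, as described in the 'According to MATLAB' rule."""
    return interpolate_at(data, (2 * len(data) * p + 1) / 2)


def quantile_excel(data, p):
    """Positions 0, 1/(n-1), ..., 1 mapped onto ranks 1..n, then interpolation."""
    return interpolate_at(data, 1 + p * (len(data) - 1))


data = [2, 5, 7, 10, 12, 13, 18, 21]
for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(p, quantile_statistica(data, p),
          round(quantile_matlab(data, p), 4),
          round(quantile_excel(data, p), 4))

# Interquartile and interdecile range based on the first rule.
r_q = quantile_statistica(data, 0.75) - quantile_statistica(data, 0.25)
r_d = quantile_statistica(data, 0.90) - quantile_statistica(data, 0.10)
print(r_q, r_d)
```

Because the three rules place the quantile at different positions in the ordered sample, the interquantile ranges also depend on the rule chosen.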

Example

We have calculated the quantiles of the data 2, 5, 7, 10, 12, 13, 18 and 21. According to STATISTICA: $x_{0.10} = 2$, $x_{0.25} = 6$, $x_{0.50} = 11$, $x_{0.75} = 15.5$, $x_{0.90} = 21$.

The range of variation is $R = x_{\max} - x_{\min} = 21 - 2 = 19$.
The interquartile range is $R_Q = x_{0.75} - x_{0.25} = 15.5 - 6 = 9.5$.
The interdecile range is $R_D = x_{0.90} - x_{0.10} = 21 - 2 = 19$.

Quantile Deviations

Definition.
- the quartile deviation $Q = R_Q / 2$
- the decile deviation $D = R_D / 8$
- the percentile deviation $C = R_C / 98$

Example

Calculate the quartile and the decile deviation of 2, 5, 7, 10, 12, 13, 18 and 21.

The quartile deviation is $Q = R_Q / 2 = 9.5 / 2 = 4.75$. The decile deviation is $D = R_D / 8 = 19 / 8 = 2.375$. This means that the average width of the two middle quartile intervals is 4.75 and the average width of the eight middle decile intervals is 2.375.

Average Deviation

Definition. The average deviation is defined as the arithmetic mean of the absolute deviations,
\[ d_{\bar{x}} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| . \]

Example

Find the average deviation of the data set 1, 2, 5, 6, 7, 8, 8 and 9. The arithmetic mean is $\bar{x} = 5.75$. We obtain
\[ d_{\bar{x}} = \frac{|1 - 5.75| + |2 - 5.75| + |5 - 5.75| + |6 - 5.75| + |7 - 5.75| + |8 - 5.75| + |8 - 5.75| + |9 - 5.75|}{8} = 2.3125. \]

Variance

Definition. The variance $s_n^2$ is defined as the arithmetic mean of the squared deviations,
\[ s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 . \]

Expanding the square gives a computational formula:
\[ s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \bar{x}^2 \right) = \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \right) = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \overline{x^2} - \bar{x}^2 . \]

Elementary properties of the variance:
- if the variable is constant and equal to c, then the variance is zero,
\[ \frac{1}{n} \sum_{i=1}^{n} (c - c)^2 = 0, \]
- if we add a constant c to the values of the variable, then the variance does not change,
\[ \frac{1}{n} \sum_{i=1}^{n} [(x_i + c) - (\bar{x} + c)]^2 = s_n^2 , \]
- if we multiply the values of the variable by a constant c, then
\[ \frac{1}{n} \sum_{i=1}^{n} (c \cdot x_i - c \cdot \bar{x})^2 = c^2 \cdot s_n^2 . \]

Standard Deviation

Definition. The square root of the variance is called the standard deviation,
\[ s_n = \sqrt{s_n^2} . \]

Sample Variance and Standard Deviation

Definition. The sample variance $s^2$ is defined by the formula
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 ; \]
the square root of the sample variance is called the sample standard deviation,
\[ s = \sqrt{s^2} . \]
It is obvious that
\[ s_n^2 = \frac{n-1}{n} s^2 . \]

Example

Calculate the variance, the standard deviation, the sample variance and the sample standard deviation of the data set 1, 2, 5, 6, 7, 8, 8 and 9.

The arithmetic mean is $\bar{x} = 5.75$.
\[ s_n^2 = \frac{(1 - 5.75)^2 + (2 - 5.75)^2 + (5 - 5.75)^2 + (6 - 5.75)^2 + (7 - 5.75)^2 + (8 - 5.75)^2 + (8 - 5.75)^2 + (9 - 5.75)^2}{8} = 7.4375. \]

The variance can also be calculated by the formula $s_n^2 = \overline{x^2} - \bar{x}^2$:
\[ \overline{x^2} = \frac{1}{n} \sum_{i=1}^{n} x_i^2 = \frac{1^2 + 2^2 + 5^2 + 6^2 + 7^2 + 8^2 + 8^2 + 9^2}{8} = 40.5, \]
\[ s_n^2 = \overline{x^2} - \bar{x}^2 = 40.5 - 5.75^2 = 7.4375. \]
The standard deviation is
\[ s_n = \sqrt{s_n^2} = \sqrt{7.4375} \doteq 2.72718. \]

To get the sample variance we apply the formula
\[ s^2 = \frac{n}{n-1} s_n^2 = \frac{8}{7} \cdot 7.4375 = 8.5. \]
The sample standard deviation is
\[ s = \sqrt{s^2} = \sqrt{8.5} \doteq 2.91548. \]
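
The measures of dispersion from the examples above can be checked with a short Python sketch. The function names are chosen for this illustration only. The sketch computes the average deviation, the variance by its definition and by the formula "mean of squares minus squared mean", and the sample variance obtained by rescaling with n/(n − 1). The standard library functions `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import math
import statistics


def mean(values):
    return sum(values) / len(values)


def average_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def variance_n(values):
    """Variance s_n^2: arithmetic mean of the squared deviations."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def variance_shortcut(values):
    """Same quantity computed as (mean of squares) - (mean squared)."""
    return mean([x * x for x in values]) - mean(values) ** 2


def sample_variance(values):
    """Sample variance s^2 = n / (n - 1) * s_n^2."""
    n = len(values)
    return n / (n - 1) * variance_n(values)


data = [1, 2, 5, 6, 7, 8, 8, 9]

print(average_deviation(data))
print(variance_n(data), variance_shortcut(data))
print(math.sqrt(variance_n(data)))
print(sample_variance(data), math.sqrt(sample_variance(data)))

# Cross-check against the standard library (population and sample variance).
print(statistics.pvariance(data), statistics.variance(data))
```

The shortcut formula is convenient for hand calculation, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definition-based version is usually the safer default in code.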

Moments

Definition. The r-th moment is defined by the formula
\[ m_r' = \frac{1}{n} \sum_{i=1}^{n} x_i^r . \]
The r-th central moment is defined by the formula
\[ m_r = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^r . \]

Sample Skewness

Definition. The sample skewness is defined by the formula
\[ a_3 = \frac{m_3}{m_2^{3/2}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s_n^3} = \frac{m_3}{s_n^3} . \]

Figure: Frequency distributions with different sample skewness

Sample Kurtosis

Definition. The sample kurtosis is defined by the formula
\[ a_4 = \frac{m_4}{m_2^2} - 3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s_n^4} - 3 . \]

Figure: Frequency distributions with different sample kurtosis

Note

The Excel functions SKEW and KURT calculate skewness and kurtosis by the formulas
\[ a_3^* = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3 , \]
\[ a_4^* = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} . \]
We can derive
\[ a_3 = \frac{n-2}{\sqrt{n(n-1)}} \cdot a_3^* , \qquad a_4 = \frac{(n-2)(n-3)}{n^2 - 1} \cdot a_4^* - \frac{6}{n+1} . \]
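
The relations between the moment-based coefficients and the adjusted formulas from the note can be checked numerically. The sketch below is an illustration only: the function names are invented here, and `skew_adjusted` and `kurt_adjusted` implement the two formulas exactly as written in the note rather than calling Excel. The data set is the one used in the earlier examples.

```python
import math


def central_moment(values, r):
    """r-th central moment m_r = (1/n) * sum((x_i - mean)^r)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** r for x in values) / n


def skewness(values):
    """Sample skewness a_3 = m_3 / m_2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Sample kurtosis a_4 = m_4 / m_2^2 - 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


def _standardized_power_sum(values, r):
    """sum(((x_i - mean) / s)^r) with s the sample standard deviation (divisor n - 1)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))
    return sum(((x - m) / s) ** r for x in values)


def skew_adjusted(values):
    """a_3* as given in the note (needs n >= 3)."""
    n = len(values)
    return n / ((n - 1) * (n - 2)) * _standardized_power_sum(values, 3)


def kurt_adjusted(values):
    """a_4* as given in the note (needs n >= 4)."""
    n = len(values)
    factor = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return factor * _standardized_power_sum(values, 4) - correction


data = [1, 2, 5, 6, 7, 8, 8, 9]
n = len(data)

# Conversions stated in the note: a_3 and a_4 recovered from a_3* and a_4*.
a3_from_adjusted = (n - 2) / math.sqrt(n * (n - 1)) * skew_adjusted(data)
a4_from_adjusted = ((n - 2) * (n - 3) / (n ** 2 - 1) * kurt_adjusted(data)
                    - 6 / (n + 1))

print(skewness(data), a3_from_adjusted)
print(kurtosis(data), a4_from_adjusted)
```

Each printed pair should agree up to floating-point rounding, which confirms the two conversion formulas for this data set.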